1. Pultar F, Thürlemann M, Gordiy I, Doloszeski E, Riniker S. Neural Network Potential with Multiresolution Approach Enables Accurate Prediction of Reaction Free Energies in Solution. J Am Chem Soc 2025; 147:6835-6856. [PMID: 39961342 DOI: 10.1021/jacs.4c17015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/27/2025]
Abstract
We present the design and implementation of a novel neural network potential (NNP) and its combination with an electrostatic embedding scheme, commonly used within the context of hybrid quantum-mechanical/molecular-mechanical (QM/MM) simulations. Substituting a computationally expensive QM Hamiltonian with an NNP of the same accuracy greatly reduces the computational cost and enables efficient sampling in prospective MD simulations, overcoming the main limitation faced by traditional QM/MM setups. The model relies on the recently introduced anisotropic message passing (AMP) formalism to compute atomic interactions and encode symmetries found in QM systems. AMP is shown to be highly efficient in terms of both data and computational cost and can readily be scaled to sample systems involving more than 350 solute and 40,000 solvent atoms for hundreds of nanoseconds using umbrella sampling. Most deviations of AMP predictions from the underlying DFT ground truth lie within chemical accuracy (1 kcal mol⁻¹ = 4.184 kJ mol⁻¹). The performance and broad applicability of our approach are showcased by calculating the free-energy surface of alanine dipeptide, the preferred ligation states of nickel phosphine complexes, and the dissociation free energies of charged pyridine and quinoline dimers. Results with this ML/MM approach show excellent agreement with experimental data and reach chemical accuracy in most cases. In contrast, free energies calculated with static DFT paired with implicit solvent models, or with QM/MM MD simulations using cheaper semiempirical methods, deviate up to ten times more from the experimental ground truth and sometimes even fail to reproduce qualitative trends.
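The abstract equates chemical accuracy with 4.184 kJ mol⁻¹, i.e. 1 kcal mol⁻¹. As a minimal sketch (the function name and the numbers are hypothetical and not taken from the paper), the share of predictions meeting that threshold can be counted like this:

```python
# Hypothetical helper (not from the paper): counts how many predictions
# fall within chemical accuracy of a reference value.
KJ_PER_KCAL = 4.184                       # 1 kcal mol^-1 = 4.184 kJ mol^-1
CHEMICAL_ACCURACY_KJ = 1.0 * KJ_PER_KCAL  # threshold in kJ mol^-1


def fraction_within_chemical_accuracy(predicted_kj, reference_kj):
    """Return the fraction of |predicted - reference| values <= 4.184 kJ mol^-1."""
    if len(predicted_kj) != len(reference_kj):
        raise ValueError("predicted and reference must have the same length")
    if not predicted_kj:
        raise ValueError("at least one value is required")
    within = sum(
        1
        for pred, ref in zip(predicted_kj, reference_kj)
        if abs(pred - ref) <= CHEMICAL_ACCURACY_KJ
    )
    return within / len(predicted_kj)


# Made-up free energies in kJ mol^-1; deviations are 1.0, 5.0, 0.5 and 4.0.
print(fraction_within_chemical_accuracy([10.0, 20.0, 30.0, 40.0],
                                        [11.0, 25.0, 30.5, 44.0]))  # 0.75
```

Only the 5.0 kJ mol⁻¹ deviation exceeds the threshold, so three of the four made-up predictions count as chemically accurate.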
Affiliation(s)
- Felix Pultar
- Department of Chemistry and Applied Biosciences, ETH Zürich, Vladimir-Prelog-Weg 2, Zürich 8093, Switzerland
- Moritz Thürlemann
- Department of Chemistry and Applied Biosciences, ETH Zürich, Vladimir-Prelog-Weg 2, Zürich 8093, Switzerland
- Igor Gordiy
- Department of Chemistry and Applied Biosciences, ETH Zürich, Vladimir-Prelog-Weg 2, Zürich 8093, Switzerland
- Eva Doloszeski
- Department of Chemistry and Applied Biosciences, ETH Zürich, Vladimir-Prelog-Weg 2, Zürich 8093, Switzerland
- Sereina Riniker
- Department of Chemistry and Applied Biosciences, ETH Zürich, Vladimir-Prelog-Weg 2, Zürich 8093, Switzerland
2. Simeon G, Mirarchi A, Pelaez RP, Galvelis R, De Fabritiis G. Broadening the Scope of Neural Network Potentials through Direct Inclusion of Additional Molecular Attributes. J Chem Theory Comput 2025; 21:1831-1837. [PMID: 39933873 DOI: 10.1021/acs.jctc.4c01625] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/13/2025]
Abstract
Most state-of-the-art neural network potentials do not account for molecular attributes other than atomic numbers and positions, which by design limits their range of applicability. In this work, we demonstrate the importance of including additional electronic attributes in neural network potential representations with a minimal architectural change to TensorNet, a state-of-the-art equivariant model based on Cartesian rank-2 tensor representations. By performing experiments on both custom-made and public benchmark data sets, we show that this modification resolves input degeneracy issues that stem from using atomic numbers and positions alone, while enhancing the model's predictive accuracy across diverse chemical systems with different charge or spin states. This is accomplished without tailored strategies or the inclusion of physics-based energy terms, while maintaining efficiency and accuracy. These findings should furthermore encourage researchers to train and use models that incorporate these additional representations.
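The input-degeneracy argument can be made concrete with a toy example. The sketch below assumes a hypothetical descriptor built only from atomic numbers and interatomic distances; it is not TensorNet, whose actual change is made within its learned representation. Appending total charge and spin multiplicity as global attributes is simply one illustrative way to show why such inputs are needed:

```python
import math


def toy_descriptor(atomic_numbers, positions, charge=None, spin_multiplicity=None):
    """Hypothetical rotation- and translation-invariant toy descriptor.

    It lists (Z_i, Z_j, distance) for every atom pair. If charge and/or spin
    multiplicity are given, they are appended as global molecular attributes.
    """
    pairs = []
    for i in range(len(atomic_numbers)):
        for j in range(i + 1, len(atomic_numbers)):
            z_low, z_high = sorted((atomic_numbers[i], atomic_numbers[j]))
            distance = round(math.dist(positions[i], positions[j]), 6)
            pairs.append((z_low, z_high, distance))
    descriptor = tuple(sorted(pairs))
    if charge is None and spin_multiplicity is None:
        return descriptor
    return descriptor + (("charge", charge), ("spin", spin_multiplicity))


# One water-like geometry, read as neutral H2O (singlet) and as H2O+ (doublet).
Z = [8, 1, 1]
R = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]

# From atomic numbers and positions alone, both states get identical inputs.
neutral = toy_descriptor(Z, R)
cation = toy_descriptor(Z, R)  # the charge state has no way to enter
print(neutral == cation)  # True: degenerate input

# Adding charge and spin multiplicity lifts the degeneracy.
print(toy_descriptor(Z, R, 0, 1) == toy_descriptor(Z, R, 1, 2))  # False
```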
Affiliation(s)
- Guillem Simeon
- Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C Dr. Aiguader 88, 08003 Barcelona, Spain
- Antonio Mirarchi
- Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C Dr. Aiguader 88, 08003 Barcelona, Spain
- Raul P Pelaez
- Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C Dr. Aiguader 88, 08003 Barcelona, Spain
- Raimondas Galvelis
- Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C Dr. Aiguader 88, 08003 Barcelona, Spain
- Acellera Labs, C Dr Trueta 183, 08005 Barcelona, Spain
- Gianni De Fabritiis
- Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C Dr. Aiguader 88, 08003 Barcelona, Spain
- Acellera Labs, C Dr Trueta 183, 08005 Barcelona, Spain
- Institució Catalana de Recerca I Estudis Avançats (ICREA), Passeig Lluis Companys 23, 08010 Barcelona, Spain
3. Poltavsky I, Puleva M, Charkin-Gorbulin A, Fonseca G, Batatia I, Browning NJ, Chmiela S, Cui M, Frank JT, Heinen S, Huang B, Käser S, Kabylda A, Khan D, Müller C, Price AJA, Riedmiller K, Töpfer K, Ko TW, Meuwly M, Rupp M, Csányi G, von Lilienfeld OA, Margraf JT, Müller KR, Tkatchenko A. Crash testing machine learning force fields for molecules, materials, and interfaces: molecular dynamics in the TEA challenge 2023. Chem Sci 2025; 16:3738-3754. [PMID: 39911337 PMCID: PMC11791520 DOI: 10.1039/d4sc06530a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2024] [Accepted: 12/25/2024] [Indexed: 02/07/2025]
Abstract
We present the second part of the rigorous evaluation of modern machine learning force fields (MLFFs) within the TEA Challenge 2023. This study provides an in-depth analysis of the performance of MACE, SO3krates, sGDML, SOAP/GAP, and FCHL19* in modeling molecules, molecule-surface interfaces, and periodic materials. We compare observables obtained from molecular dynamics (MD) simulations using different MLFFs under identical conditions. Where applicable, density-functional theory (DFT) or experiment serves as a reference to reliably assess the performance of the ML models. In the absence of DFT benchmarks, we conduct a comparative analysis based on results from various MLFF architectures. Our findings indicate that, at the current stage of MLFF development, the choice of ML model is in the hands of the practitioner. When a problem falls within the scope of a given MLFF architecture, the resulting simulations exhibit weak dependency on the specific architecture used. Instead, emphasis should be placed on developing complete, reliable, and representative training datasets. Nonetheless, long-range noncovalent interactions remain challenging for all MLFF models, necessitating special caution in simulations of physical systems where such interactions are prominent, such as molecule-surface interfaces. The findings presented here reflect the state of MLFF models as of October 2023.
Affiliation(s)
- Igor Poltavsky
- Department of Physics and Materials Science, University of Luxembourg L-1511 Luxembourg Luxembourg
- Mirela Puleva
- Department of Physics and Materials Science, University of Luxembourg L-1511 Luxembourg Luxembourg
- Institute for Advanced Studies, University of Luxembourg Campus Belval L-4365 Esch-sur-Alzette Luxembourg
- Anton Charkin-Gorbulin
- Department of Physics and Materials Science, University of Luxembourg L-1511 Luxembourg Luxembourg
- Laboratory for Chemistry of Novel Materials, University of Mons B-7000 Mons Belgium
- Grégory Fonseca
- Department of Physics and Materials Science, University of Luxembourg L-1511 Luxembourg Luxembourg
- Ilyes Batatia
- Department of Engineering, University of Cambridge Trumpington Street Cambridge CB2 1PZ UK
- Stefan Chmiela
- Machine Learning Group, Technical University Berlin Berlin Germany
- BIFOLD, Berlin Institute for the Foundations of Learning and Data Berlin Germany
- Mengnan Cui
- Fritz-Haber-Institut der Max-Planck-Gesellschaft Berlin Germany
- J Thorben Frank
- Machine Learning Group, Technical University Berlin Berlin Germany
- BIFOLD, Berlin Institute for the Foundations of Learning and Data Berlin Germany
- Stefan Heinen
- Vector Institute for Artificial Intelligence Toronto ON M5S 1M1 Canada
- Bing Huang
- Wuhan University, Department of Chemistry and Molecular Sciences 430072 Wuhan China
- Silvan Käser
- Department of Chemistry, University of Basel Klingelbergstrasse 80 CH-4056 Basel Switzerland
- Adil Kabylda
- Department of Physics and Materials Science, University of Luxembourg L-1511 Luxembourg Luxembourg
- Danish Khan
- Vector Institute for Artificial Intelligence Toronto ON M5S 1M1 Canada
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto St. George Campus Toronto ON Canada
- Carolin Müller
- Friedrich-Alexander-Universität Erlangen-Nürnberg, Computer-Chemistry-Center Nägelsbachstraße 25 91052 Erlangen Germany
- Alastair J A Price
- Department of Chemistry, University of Toronto St. George campus Toronto ON Canada
- Acceleration Consortium, University of Toronto 80 St George St Toronto ON M5S 3H6 Canada
- Kai Riedmiller
- Heidelberg Institute for Theoretical Studies Heidelberg Germany
- Kai Töpfer
- Department of Chemistry, University of Basel Klingelbergstrasse 80 CH-4056 Basel Switzerland
- Tsz Wai Ko
- Department of NanoEngineering, University of California San Diego 9500 Gilman Dr, Mail Code 0448 La Jolla CA 92093-0448 USA
- Markus Meuwly
- Department of Chemistry, University of Basel Klingelbergstrasse 80 CH-4056 Basel Switzerland
- Matthias Rupp
- Luxembourg Institute of Science and Technology (LIST) L-4362 Esch-sur-Alzette Luxembourg
- Gábor Csányi
- Department of Engineering, University of Cambridge Trumpington Street Cambridge CB2 1PZ UK
- O Anatole von Lilienfeld
- Machine Learning Group, Technical University Berlin Berlin Germany
- BIFOLD, Berlin Institute for the Foundations of Learning and Data Berlin Germany
- Vector Institute for Artificial Intelligence Toronto ON M5S 1M1 Canada
- Department of Chemistry, University of Toronto St. George campus Toronto ON Canada
- Acceleration Consortium, University of Toronto 80 St George St Toronto ON M5S 3H6 Canada
- Department of Materials Science and Engineering, University of Toronto St. George campus Toronto ON Canada
- Department of Physics, University of Toronto, St. George campus Toronto ON Canada
- Johannes T Margraf
- University of Bayreuth, Bavarian Center for Battery Technology (BayBatt) Bayreuth Germany
- Klaus-Robert Müller
- Machine Learning Group, Technical University Berlin Berlin Germany
- BIFOLD, Berlin Institute for the Foundations of Learning and Data Berlin Germany
- University of Bayreuth, Bavarian Center for Battery Technology (BayBatt) Bayreuth Germany
- Department of Artificial Intelligence, Korea University Seoul South Korea
- Max Planck Institut für Informatik Saarbrücken Germany
- Google DeepMind Berlin Germany
- Alexandre Tkatchenko
- Department of Physics and Materials Science, University of Luxembourg L-1511 Luxembourg Luxembourg
- Institute for Advanced Studies, University of Luxembourg Campus Belval L-4365 Esch-sur-Alzette Luxembourg
4. Poltavsky I, Charkin-Gorbulin A, Puleva M, Fonseca G, Batatia I, Browning NJ, Chmiela S, Cui M, Frank JT, Heinen S, Huang B, Käser S, Kabylda A, Khan D, Müller C, Price AJA, Riedmiller K, Töpfer K, Ko TW, Meuwly M, Rupp M, Csányi G, von Lilienfeld OA, Margraf JT, Müller KR, Tkatchenko A. Crash testing machine learning force fields for molecules, materials, and interfaces: model analysis in the TEA Challenge 2023. Chem Sci 2025; 16:3720-3737. [PMID: 39935506 PMCID: PMC11809572 DOI: 10.1039/d4sc06529h] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/26/2024] [Accepted: 12/25/2024] [Indexed: 02/13/2025]
Abstract
Atomistic simulations are routinely employed in academia and industry to study the behavior of molecules, materials, and their interfaces. Central to these simulations are force fields (FFs), whose development is challenged by intricate interatomic interactions at different spatio-temporal scales and by the vast expanse of chemical space. Machine learning (ML) FFs, trained on quantum-mechanical energies and forces, have shown the capacity to achieve sub-kcal mol⁻¹ Å⁻¹ accuracy while maintaining computational efficiency. The TEA Challenge 2023 rigorously evaluated commonly used MLFFs across diverse applications, highlighting their strengths and weaknesses. Participants trained their models using provided datasets, and the results were systematically analyzed to assess the ability of MLFFs to reproduce potential energy surfaces, handle incomplete reference data, manage multi-component systems, and model complex periodic structures. This publication describes the datasets, outlines the proposed challenges, and presents a detailed analysis of the accuracy, stability, and efficiency of the MACE, SO3krates, sGDML, SOAP/GAP, and FCHL19* architectures in molecular dynamics simulations. These are the models of the MLFF developers who participated in the TEA Challenge 2023. All results presented correspond to the state of the ML architectures as of October 2023. A comprehensive analysis of the molecular dynamics results obtained with the different MLFFs will be presented in the second part of this manuscript.
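Force accuracy of this kind is usually reported as a mean absolute error over force components. A minimal sketch of that metric, assuming per-atom force triples in kcal mol⁻¹ Å⁻¹ (the function name and the forces are hypothetical; this is a generic metric, not the TEA Challenge evaluation code):

```python
def force_mae(pred_forces, ref_forces):
    """Mean absolute error over all Cartesian force components.

    Inputs are lists of per-atom (Fx, Fy, Fz) triples in the same units,
    e.g. kcal mol^-1 Å^-1.
    """
    if len(pred_forces) != len(ref_forces) or not pred_forces:
        raise ValueError("need equal, non-zero numbers of atoms")
    errors = []
    for f_pred, f_ref in zip(pred_forces, ref_forces):
        if len(f_pred) != 3 or len(f_ref) != 3:
            raise ValueError("each force needs exactly three components")
        errors.extend(abs(a - b) for a, b in zip(f_pred, f_ref))
    return sum(errors) / len(errors)


# Made-up forces (kcal mol^-1 Å^-1) for a single two-atom frame.
pred = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
ref = [(0.5, 0.5, 0.5), (1.0, 1.0, 1.0)]
mae = force_mae(pred, ref)
print(mae, mae < 1.0)  # 0.25 True: "sub-kcal" in this toy case
```

Averaging over components rather than over per-atom force norms is one common convention; studies differ, so the convention should be checked before comparing reported numbers.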
Affiliation(s)
- Igor Poltavsky
- Department of Physics and Materials Science, University of Luxembourg L-1511 Luxembourg Luxembourg
- Anton Charkin-Gorbulin
- Department of Physics and Materials Science, University of Luxembourg L-1511 Luxembourg Luxembourg
- Laboratory for Chemistry of Novel Materials, University of Mons B-7000 Mons Belgium
- Mirela Puleva
- Department of Physics and Materials Science, University of Luxembourg L-1511 Luxembourg Luxembourg
- Institute for Advanced Studies, University of Luxembourg Campus Belval L-4365 Esch-sur-Alzette Luxembourg
- Grégory Fonseca
- Department of Physics and Materials Science, University of Luxembourg L-1511 Luxembourg Luxembourg
- Ilyes Batatia
- Department of Engineering, University of Cambridge Trumpington Street Cambridge CB2 1PZ UK
- Stefan Chmiela
- Machine Learning Group, Technical University Berlin Berlin Germany
- BIFOLD, Berlin Institute for the Foundations of Learning and Data Berlin Germany
- Mengnan Cui
- Fritz-Haber-Institut der Max-Planck-Gesellschaft Berlin Germany
- J Thorben Frank
- Machine Learning Group, Technical University Berlin Berlin Germany
- BIFOLD, Berlin Institute for the Foundations of Learning and Data Berlin Germany
- Stefan Heinen
- Vector Institute for Artificial Intelligence Toronto ON M5S 1M1 Canada
- Bing Huang
- Wuhan University, Department of Chemistry and Molecular Sciences 430072 Wuhan China
- Silvan Käser
- Department of Chemistry, University of Basel Klingelbergstrasse 80 CH-4056 Basel Switzerland
- Adil Kabylda
- Department of Physics and Materials Science, University of Luxembourg L-1511 Luxembourg Luxembourg
- Danish Khan
- Vector Institute for Artificial Intelligence Toronto ON M5S 1M1 Canada
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto St George Campus Toronto ON Canada
- Carolin Müller
- Friedrich-Alexander-Universität Erlangen-Nürnberg, Computer-Chemistry-Center Nägelsbachstraße 25 91052 Erlangen Germany
- Alastair J A Price
- Department of Chemistry, University of Toronto St George Campus Toronto ON Canada
- Acceleration Consortium, University of Toronto 80 St George St Toronto ON M5S 3H6 Canada
- Kai Riedmiller
- Heidelberg Institute for Theoretical Studies Heidelberg Germany
- Kai Töpfer
- Department of Chemistry, University of Basel Klingelbergstrasse 80 CH-4056 Basel Switzerland
- Tsz Wai Ko
- Department of NanoEngineering, University of California San Diego 9500 Gilman Dr, Mail Code 0448 La Jolla CA 92093-0448 USA
- Markus Meuwly
- Department of Chemistry, University of Basel Klingelbergstrasse 80 CH-4056 Basel Switzerland
- Matthias Rupp
- Luxembourg Institute of Science and Technology (LIST) L-4362 Esch-sur-Alzette Luxembourg
- Gábor Csányi
- Department of Engineering, University of Cambridge Trumpington Street Cambridge CB2 1PZ UK
- O Anatole von Lilienfeld
- Machine Learning Group, Technical University Berlin Berlin Germany
- BIFOLD, Berlin Institute for the Foundations of Learning and Data Berlin Germany
- Vector Institute for Artificial Intelligence Toronto ON M5S 1M1 Canada
- Department of Chemistry, University of Toronto St George Campus Toronto ON Canada
- Acceleration Consortium, University of Toronto 80 St George St Toronto ON M5S 3H6 Canada
- Department of Materials Science and Engineering, University of Toronto St George Campus Toronto ON Canada
- Department of Physics, University of Toronto St George Campus Toronto ON Canada
- Johannes T Margraf
- University of Bayreuth, Bavarian Center for Battery Technology (BayBatt) Bayreuth Germany
- Klaus-Robert Müller
- Machine Learning Group, Technical University Berlin Berlin Germany
- BIFOLD, Berlin Institute for the Foundations of Learning and Data Berlin Germany
- Department of Artificial Intelligence, Korea University Seoul South Korea
- Max Planck Institut für Informatik Saarbrücken Germany
- Google DeepMind Berlin Germany
- Alexandre Tkatchenko
- Department of Physics and Materials Science, University of Luxembourg L-1511 Luxembourg Luxembourg
- Institute for Advanced Studies, University of Luxembourg Campus Belval L-4365 Esch-sur-Alzette Luxembourg
5. Chen J, Gao Q, Huang M, Yu K. Application of modern artificial intelligence techniques in the development of organic molecular force fields. Phys Chem Chem Phys 2025; 27:2294-2319. [PMID: 39820957 DOI: 10.1039/d4cp02989e] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/19/2025]
Abstract
The molecular force field (FF) determines the accuracy of molecular dynamics (MD) and is one of the major bottlenecks limiting the application of MD in molecular design. Recently, artificial intelligence (AI) techniques, such as machine-learning potentials (MLPs), have been rapidly reshaping the landscape of MD. Meanwhile, organic molecular systems have unique characteristics and require more careful treatment in model construction, optimization, and validation. Although an accurate and generic organic molecular force field is still missing, significant progress has been made with the help of AI, which points to a promising future. In this review, we provide an overview of the various types of AI techniques used in molecular FF development and discuss both the advantages and weaknesses of these methodologies. We show how AI methods provide unprecedented capabilities in many tasks such as potential fitting, atom typing, and automatic optimization. At the same time, more effort is needed to improve model transferability, develop more comprehensive databases, and establish more standardized validation procedures. With these discussions, we hope to inspire further efforts to solve the existing problems, eventually leading to the birth of next-generation generic organic FFs.
Affiliation(s)
- Junmin Chen
- Institute of Materials Research (IMR), Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China
- Tsinghua-Berkeley Shenzhen Institute (TBSI), Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China
- Qian Gao
- Institute of Materials Research (IMR), Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China
- Miaofei Huang
- Institute of Materials Research (IMR), Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China
- Kuang Yu
- Institute of Materials Research (IMR), Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China
- Tsinghua-Berkeley Shenzhen Institute (TBSI), Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China
6. Kang S. How graph neural network interatomic potentials extrapolate: Role of the message-passing algorithm. J Chem Phys 2024; 161:244102. [PMID: 39713997 DOI: 10.1063/5.0234287] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2024] [Accepted: 12/08/2024] [Indexed: 12/24/2024]
Abstract
Graph neural network interatomic potentials (GNN-IPs) are gaining significant attention due to their capability of learning from large datasets. Specifically, universal interatomic potentials based on GNN, usually trained with crystalline geometries, often exhibit remarkable extrapolative behavior toward untrained domains, such as surfaces and amorphous configurations. However, the origin of this extrapolation capability is not well understood. This work provides a theoretical explanation of how GNN-IPs extrapolate to untrained geometries. First, we demonstrate that GNN-IPs can capture non-local electrostatic interactions through the message-passing algorithm, as evidenced by tests on toy models and density-functional theory data. We find that GNN-IP models, SevenNet and MACE, accurately predict electrostatic forces in untrained domains, indicating that they have learned the exact functional form of the Coulomb interaction. Based on these results, we suggest that the ability to learn non-local electrostatic interactions, coupled with the embedding nature of GNN-IPs, explains their extrapolation ability. We find that the universal GNN-IP, SevenNet-0, effectively infers non-local Coulomb interactions in untrained domains but fails to extrapolate the non-local forces arising from the kinetic term, which supports the suggested theory. Finally, we address the impact of hyperparameters on the extrapolation performance of universal potentials, such as SevenNet-0 and MACE-MP-0, and discuss the limitations of the extrapolation capabilities.
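The mechanism the abstract points to, non-local information reaching an atom through repeated message passing, can be illustrated with a minimal sketch. It assumes a plain distance-cutoff graph (the helper names are hypothetical, and this is not SevenNet or MACE code): each layer extends an atom's receptive field by one graph hop, so T layers with cutoff r_c can couple atoms up to roughly T·r_c apart.

```python
import math


def cutoff_graph(positions, cutoff):
    """Adjacency sets: atoms i and j are neighbors if their distance <= cutoff."""
    neighbors = {i: set() for i in range(len(positions))}
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            if math.dist(positions[i], positions[j]) <= cutoff:
                neighbors[i].add(j)
                neighbors[j].add(i)
    return neighbors


def receptive_field(neighbors, atom, n_layers):
    """Atoms whose initial features can reach `atom` after n_layers of message passing."""
    reached = {atom}
    frontier = {atom}
    for _ in range(n_layers):
        # One message-passing step extends the reach by one graph hop.
        frontier = {nb for a in frontier for nb in neighbors[a]} - reached
        reached |= frontier
    return reached


# A linear chain of six atoms, 1.0 Å apart, with a 1.5 Å cutoff.
chain = [(float(i), 0.0, 0.0) for i in range(6)]
graph = cutoff_graph(chain, cutoff=1.5)
print(sorted(receptive_field(graph, 0, n_layers=1)))  # [0, 1]
print(sorted(receptive_field(graph, 0, n_layers=3)))  # [0, 1, 2, 3]: atom 3 is 3.0 Å away
```

With three layers, atom 0 receives information from atom 3 although it lies twice the cutoff away; this only shows where information can flow, not whether a trained model learns the correct long-range physics.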
Affiliation(s)
- Sungwoo Kang
- Computational Science Research Center, Korea Institute of Science and Technology (KIST), Seoul 02792, Republic of Korea
7. Maennel H, Unke OT, Müller KR. Complete and Efficient Covariants for Three-Dimensional Point Configurations with Application to Learning Molecular Quantum Properties. J Phys Chem Lett 2024; 15:12513-12519. [PMID: 39670428 DOI: 10.1021/acs.jpclett.4c02376] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/14/2024]
Abstract
When physical properties of molecules are modeled with machine learning, it is desirable to incorporate SO(3)-covariance. While such models based on low-body-order features are not complete, we formulate and prove general completeness properties for higher-order methods and show that 6k − 5 of these features are enough for up to k atoms. We also find that the Clebsch-Gordan operations commonly used in these methods can be replaced by matrix multiplications without sacrificing completeness, lowering the scaling from O(l⁶) to O(l³) in the degree l of the features. We apply this to quantum chemistry, but the proposed methods are generally applicable to problems involving three-dimensional point configurations.
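SO(3)-covariance means that rotating the input rotates a vector-valued output in the same way, f(Rx) = R f(x). The minimal check below assumes a hypothetical toy vector (l = 1) feature; it is neither complete nor the matrix-multiplication construction proposed in the paper, and only illustrates the property the abstract refers to:

```python
import math


def rotation_z(theta):
    """3x3 rotation matrix about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def apply(matrix, vector):
    """Product of a 3x3 matrix with a 3-vector."""
    return [sum(matrix[i][k] * vector[k] for k in range(3)) for i in range(3)]


def vector_feature(positions, center=0):
    """Hypothetical vector feature of atom `center`: relative position
    vectors of all other atoms, each weighted by exp(-distance)."""
    origin = positions[center]
    feature = [0.0, 0.0, 0.0]
    for idx, pos in enumerate(positions):
        if idx == center:
            continue
        rel = [p - o for p, o in zip(pos, origin)]
        weight = math.exp(-math.dist(pos, origin))  # rotation-invariant weight
        feature = [f + weight * r for f, r in zip(feature, rel)]
    return feature


points = [(0.0, 0.0, 0.0), (1.0, 0.2, -0.3), (-0.4, 0.9, 0.5)]
R = rotation_z(0.7)
rotated_points = [apply(R, p) for p in points]

# Covariance: feature(R x) == R feature(x), up to floating-point error.
lhs = vector_feature(rotated_points)
rhs = apply(R, vector_feature(points))
print(all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(lhs, rhs)))  # True
```

The feature is covariant because distances (and hence weights) are rotation-invariant while the relative vectors rotate linearly; completeness, the paper's main concern, is a separate and much stronger requirement.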
Affiliation(s)
- Hartmut Maennel
- Google DeepMind Zürich, Brandschenkestraße 110, 8002 Zürich, Switzerland
- Oliver T Unke
- Google DeepMind Berlin, Tucholskystraße 2, 10117 Berlin, Germany
- Klaus-Robert Müller
- Google DeepMind, https://deepmind.google/
- TU Berlin, Machine Learning Group, Marchstraße 23, 10587 Berlin, Germany
- Berlin Institute for the Foundations of Learning and Data, Ernst-Reuter-Platz 7, 10587 Berlin, Germany
- Max Planck Institute for Informatics Saarbrücken, Saarland Informatics Campus, Building E1 4, 66123 Saarbrücken, Germany
- Department of Artificial Intelligence, Korea University, Seoul 136-713, Korea
8. Wu Y, Xia J, Zhang Y, Jiang B. Simple and Efficient Equivariant Message-Passing Neural Network Model for Non-local Potential Energy Surfaces. J Phys Chem A 2024; 128:11061-11067. [PMID: 39665419 DOI: 10.1021/acs.jpca.4c06669] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/13/2024]
Abstract
Machine learning potentials have become increasingly successful in atomistic simulations. Many of these potentials are based on an atomistic representation of the local environment, but an efficient description of nonlocal interactions that extend beyond a typical local environment remains a challenge. Herein, we propose a simple and efficient equivariant model, EquiREANN, to effectively represent nonlocal potential energy surfaces. It relies on a physically inspired message-passing framework in which the fundamental descriptors are linear combinations of atomic orbitals, and both the invariant orbital coefficients and the equivariant orbital functions are iteratively updated. We demonstrate that EquiREANN describes the subtle potential energy variations caused by nonlocal structural changes with high accuracy and at little extra computational cost compared with an invariant message-passing model. Our work offers a general approach to creating equivariant message-passing adaptations of other advanced local many-body descriptors.
Affiliation(s)
- Yibin Wu
- Hefei National Laboratory for Physical Science at the Microscale, Department of Chemical Physics, University of Science and Technology of China, Hefei, Anhui 230026, China
- Junfan Xia
- Hefei National Laboratory for Physical Science at the Microscale, Department of Chemical Physics, University of Science and Technology of China, Hefei, Anhui 230026, China
- Yaolong Zhang
- Department of Chemistry and Chemical Biology, Center for Computational Chemistry, University of New Mexico, Albuquerque, New Mexico 87131, United States
- Bin Jiang
- Key Laboratory of Precision and Intelligent Chemistry, Department of Chemical Physics, University of Science and Technology of China, Hefei, Anhui 230026, China
9. Kulichenko M, Nebgen B, Lubbers N, Smith JS, Barros K, Allen AEA, Habib A, Shinkle E, Fedik N, Li YW, Messerly RA, Tretiak S. Data Generation for Machine Learning Interatomic Potentials and Beyond. Chem Rev 2024; 124:13681-13714. [PMID: 39572011 DOI: 10.1021/acs.chemrev.4c00572] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/26/2024]
Abstract
The field of data-driven chemistry is undergoing an evolution, driven by innovations in machine learning (ML) models for predicting molecular properties and behavior. Recent strides in ML interatomic potentials (MLIPs) have paved the way for accurate modeling of diverse chemical and structural properties at the atomic level. The key determinant of MLIP reliability remains the quality of the training data. A paramount challenge lies in constructing training sets that capture specific domains of the vast chemical and structural space. This Review navigates the intricate landscape of essential components and integrity of training data that ensure the extensibility and transferability of the resulting models. We delve into the details of active learning, discussing its various facets and implementations. We outline different types of uncertainty quantification applied to atomistic data acquisition and the correlations between estimated uncertainty and true error. The role of atomistic data samplers in generating diverse and informative structures is highlighted. Furthermore, we discuss data acquisition via modified and surrogate potential energy surfaces as an innovative approach to diversifying training data. The Review also provides a list of publicly available data sets that cover essential domains of chemical space.
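One widely used uncertainty-quantification strategy in active learning is query by committee: train several models and send the structures on which they disagree most to the reference method. The minimal sketch below assumes the committee's predictions are already available; the function names, threshold, and numbers are hypothetical. Because the Review stresses the correlation between estimated uncertainty and true error, disagreement should be read as a proxy for error, not as the error itself.

```python
import statistics


def committee_disagreement(predictions):
    """Population standard deviation of one structure's ensemble predictions."""
    return statistics.pstdev(predictions)


def select_for_labeling(candidates, threshold):
    """Return IDs of candidate structures whose ensemble disagreement exceeds threshold.

    `candidates` maps a (hypothetical) structure ID to the energies predicted
    by each committee member; selected structures would then be labeled with
    the reference method (e.g. DFT) and added to the training set.
    """
    return sorted(
        sid for sid, preds in candidates.items()
        if committee_disagreement(preds) > threshold
    )


# Made-up predictions (e.g. in eV) from a three-model committee.
pool = {
    "a": [1.0, 1.0, 1.0],  # models agree: little to learn
    "b": [0.0, 2.0, 4.0],  # strong disagreement
    "c": [1.0, 1.1, 0.9],  # mild disagreement
}
print(select_for_labeling(pool, threshold=0.5))  # ['b']
```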
Affiliation(s)
- Maksim Kulichenko
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Benjamin Nebgen
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Nicholas Lubbers
- Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Justin S Smith
- NVIDIA Corporation, Santa Clara, California 95051, United States
- Kipton Barros
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Alice E A Allen
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Adela Habib
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Emily Shinkle
- Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Nikita Fedik
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Ying Wai Li
- Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Richard A Messerly
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Sergei Tretiak
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Center for Integrated Nanotechnologies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
10. Röcken S, Burnet AF, Zavadlav J. Predicting solvation free energies with an implicit solvent machine learning potential. J Chem Phys 2024; 161:234101. [PMID: 39679504 DOI: 10.1063/5.0235189] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/26/2024] [Accepted: 11/29/2024] [Indexed: 12/17/2024]
Abstract
Machine learning (ML) potentials are a powerful tool in molecular modeling, enabling ab initio accuracy at comparatively small computational cost. Nevertheless, all-atom simulations employing the best-performing graph neural network architectures are still too expensive for applications requiring extensive sampling, such as free energy computations. Implicit solvent models could provide the necessary speed-up due to reduced degrees of freedom and faster dynamics. Here, we introduce a Solvation Free Energy Path Reweighting (ReSolv) framework to parameterize an implicit solvent ML potential for small organic molecules that accurately predicts the hydration free energy, an essential parameter in drug design and pollutant modeling. By learning from a combination of experimental hydration free energy data and ab initio data of molecules in vacuum, ReSolv bypasses the need for intractable ab initio data of molecules in explicit bulk solvent and does not have to resort to less accurate data-generating models. On the FreeSolv dataset, ReSolv achieves a mean absolute error close to the average experimental uncertainty, significantly outperforming standard explicit solvent force fields. Compared with the explicit solvent ML potential, ReSolv offers a computational speedup of four orders of magnitude and attains closer agreement with experiments. The presented framework paves the way for deep molecular models that are more accurate yet computationally more cost-effective than classical atomistic models.
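The abstract does not spell out ReSolv's path-reweighting formulation. As a generic illustration of how a free-energy difference can be obtained by reweighting sampled energies, the sketch below implements the textbook Zwanzig exponential-averaging estimator, ΔF = −kT ln⟨exp(−ΔU/kT)⟩₀. This is a swapped-in standard technique, not the ReSolv method, and the function name and sample values are hypothetical.

```python
import math

KB_KCAL = 0.0019872043  # Boltzmann constant in kcal mol^-1 K^-1


def zwanzig_delta_f(delta_u, temperature=298.15):
    """Free-energy difference by exponential averaging (Zwanzig / FEP).

    `delta_u` holds U_1 - U_0 (kcal mol^-1) evaluated on samples drawn from
    state 0. A log-sum-exp shift keeps the exponentials numerically stable.
    """
    if not delta_u:
        raise ValueError("need at least one sample")
    kt = KB_KCAL * temperature
    exponents = [-du / kt for du in delta_u]
    shift = max(exponents)
    mean_exp = sum(math.exp(x - shift) for x in exponents) / len(exponents)
    return -kt * (shift + math.log(mean_exp))


# If every sample has the same energy gap, dF equals that gap.
print(round(zwanzig_delta_f([1.5, 1.5, 1.5]), 10))  # 1.5
# Otherwise dF <= <dU> (Jensen's inequality).
print(zwanzig_delta_f([0.0, 1.0, 2.0]) <= 1.0)  # True
```

Exponential averaging converges poorly when the two states overlap little, which is one reason practical methods use intermediate states or more elaborate reweighting schemes.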
Affiliation(s)
- Sebastien Röcken
- Multiscale Modeling of Fluid Materials, Department of Engineering Physics and Computation, TUM School of Engineering and Design, Technical University of Munich, Munich, Germany
- Anton F Burnet
- Multiscale Modeling of Fluid Materials, Department of Engineering Physics and Computation, TUM School of Engineering and Design, Technical University of Munich, Munich, Germany
- Julija Zavadlav
- Multiscale Modeling of Fluid Materials, Department of Engineering Physics and Computation, TUM School of Engineering and Design, Technical University of Munich, Munich, Germany

11
Hölzer C, Oerder R, Grimme S, Hamaekers J. ConfRank: Improving GFN-FF Conformer Ranking with Pairwise Training. J Chem Inf Model 2024; 64:8909-8925. [PMID: 39565928 DOI: 10.1021/acs.jcim.4c01524] [Indexed: 11/22/2024]
Abstract
Conformer ranking is a crucial task for drug discovery, with methods for generating conformers often based on molecular (meta)dynamics or sophisticated sampling techniques. These methods are constrained by the underlying force computation in terms of runtime and energy-ranking accuracy, limiting their effectiveness for large-scale screening applications. To address these ranking limitations, we introduce ConfRank, a machine learning-based approach that enhances conformer ranking using pairwise training. We demonstrate its performance on GFN-FF-generated conformer ensembles, using the DimeNet++ architecture trained on pairs drawn from 159 760 uncharged organic compounds from the GEOM data set at the r2SCAN-3c reference level. Rather than predicting on single molecules only, this approach captures relative energy differences between conformers, significantly improving the overall conformational ranking and outperforming GFN-FF and GFN2-xTB. On the test data set, the pairwise RMSD of the relative energy difference between two conformers is reduced from 5.65 to 0.71 kcal mol-1, so that up to 81% of all lowest-lying conformers are identified correctly (GFN-FF: 10%, GFN2-xTB: 47%). The ConfRank approach is cost-effective, allowing scalable deployment on both CPU and GPU and achieving runtime speedups of up to 2 orders of magnitude compared to GFN2-xTB. Out-of-sample investigations on CREST-generated conformer ensembles from the QM9 data set and on conformers taken from an extended GMTKN55 data set show promising results for the robustness of this approach. Here, the Spearman rank correlation coefficient improves to 0.90 (GFN-FF: 0.39, GFN2-xTB: 0.84), reducing the probability of an incorrect sign flip in pairwise energy comparisons from 32% to 7%. On the extended GMTKN55 subsets, the pairwise MAD (RMSD) is reduced on almost all subsets, by up to 62% (58%), with an average improvement of 30% (29%).
Moreover, an exemplary case study on vancomycin shows similar performance, indicating applicability to larger (bio)molecular structures. Furthermore, we motivate the pairwise training approach from a theoretical perspective, highlighting that while pairwise training can degrade single-sample predictions of absolute energies by ML models, it significantly enhances conformer ranking performance. The data and models used in this study are available at https://github.com/grimme-lab/confrank.
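The point that pairwise training targets relative rather than absolute energies can be made concrete with a small sketch. The metrics below (pairwise RMSD of energy differences and the fraction of conformer pairs whose ordering is flipped) follow the quantities named in the abstract, but their exact definitions here are assumptions, not taken from the ConfRank code. A constant offset in the predictions leaves every pairwise quantity unchanged, which is one way to see how a pairwise-trained model can rank conformers well while its absolute energies drift.

```python
import itertools
import math
from typing import Sequence


def pairwise_differences(energies: Sequence[float]) -> list[float]:
    """Energy differences E_i - E_j for every conformer pair i < j."""
    return [ei - ej for ei, ej in itertools.combinations(energies, 2)]


def pairwise_rmsd(pred: Sequence[float], ref: Sequence[float]) -> float:
    """RMSD between predicted and reference pairwise energy differences."""
    dp, dr = pairwise_differences(pred), pairwise_differences(ref)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(dp, dr)) / len(dp))


def sign_flip_rate(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Fraction of pairs whose predicted energy ordering contradicts the reference."""
    dp, dr = pairwise_differences(pred), pairwise_differences(ref)
    flips = sum(1 for a, b in zip(dp, dr) if a * b < 0)
    return flips / len(dp)


if __name__ == "__main__":
    ref = [0.0, 1.2, 3.5, 2.0]       # hypothetical reference energies (kcal/mol)
    pred = [10.1, 11.0, 13.9, 12.2]  # offset by ~10 but ordered correctly
    print(pairwise_rmsd(pred, ref), sign_flip_rate(pred, ref))
```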
Affiliation(s)
- Christian Hölzer
- Mulliken Center for Theoretical Chemistry, University of Bonn, Beringstr. 4, 53115 Bonn, Germany
- Rick Oerder
- Institute for Numerical Simulation, Friedrich-Hirzebruch-Allee 7, 53115 Bonn, Germany
- Fraunhofer Institute for Algorithms and Scientific Computing SCAI, Schloss Birlinghoven 1, 53757 Sankt Augustin, Germany
- Stefan Grimme
- Mulliken Center for Theoretical Chemistry, University of Bonn, Beringstr. 4, 53115 Bonn, Germany
- Jan Hamaekers
- Fraunhofer Institute for Algorithms and Scientific Computing SCAI, Schloss Birlinghoven 1, 53757 Sankt Augustin, Germany

12
Sharma P, Chowdhury PR, Jain A, Patwari GN. Machine Learned Potential Enables Molecular Dynamics Simulation to Predict the Experimental Branching Ratios in the NO Release Channel of Nitroaromatic Compounds. J Phys Chem A 2024; 128:10137-10142. [PMID: 39550764 DOI: 10.1021/acs.jpca.4c04703] [Indexed: 11/19/2024]
Abstract
This study employs a machine learning (ML) model using the Gaussian process regression algorithm to generate potential energy surfaces (PES) from density functional theory calculations, facilitating the investigation of photodissociation dynamics of nitroaromatic compounds, resulting in NO release. The experimentally observed trends in the slow-to-fast branching ratios of the NO moiety were captured by estimating the branching ratio between the two distinct reaction pathways, viz., roaming and oxaziridine mechanisms, calculated from molecular dynamics simulations performed on a reduced two-dimensional T1 surface. The qualitative agreement between the calculated and experimental results suggests that the mechanism dictating NO release is primarily governed by the dynamics on the T1 surface.
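To illustrate how Gaussian process regression turns a set of DFT single points into a smooth surface, here is a minimal zero-mean GPR with a squared-exponential kernel over two coordinates. The kernel, length scale, noise term, and toy "energies" are assumptions for illustration; the authors' kernel, coordinates, and hyperparameter optimization are not given in the abstract. Pure Python keeps the sketch dependency-free; a production fit would typically use a library such as scikit-learn and a Cholesky solve.

```python
import math
from typing import Callable, Sequence

Point = Sequence[float]


def rbf_kernel(p: Point, q: Point, length: float = 1.0) -> float:
    """Squared-exponential (RBF) kernel with unit signal variance."""
    d2 = sum((a - b) ** 2 for a, b in zip(p, q))
    return math.exp(-0.5 * d2 / length**2)


def solve(matrix: list[list[float]], rhs: list[float]) -> list[float]:
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def fit_gpr(points: list[Point], energies: list[float],
            length: float = 1.0, noise: float = 1e-8) -> Callable[[Point], float]:
    """Return the GP posterior mean E(x) = k(x, X) @ (K + noise*I)^-1 y."""
    gram = [[rbf_kernel(p, q, length) + (noise if i == j else 0.0)
             for j, q in enumerate(points)] for i, p in enumerate(points)]
    weights = solve(gram, energies)

    def predict(x: Point) -> float:
        return sum(w * rbf_kernel(x, p, length) for w, p in zip(weights, points))

    return predict


if __name__ == "__main__":
    # Hypothetical 2D grid of (coordinate 1, coordinate 2) with toy energies
    grid = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
    toy_energies = [0.0, 1.0, 1.0, 2.0, 1.0]
    surface = fit_gpr(grid, toy_energies)
    print(round(surface((0.25, 0.75)), 3))
```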
Affiliation(s)
- Pooja Sharma
- Department of Chemistry, Indian Institute of Technology Bombay, Mumbai 400076, India
- Prahlad Roy Chowdhury
- Department of Chemistry, Indian Institute of Technology Bombay, Mumbai 400076, India
- Amber Jain
- Department of Chemistry, Indian Institute of Technology Bombay, Mumbai 400076, India
- G Naresh Patwari
- Department of Chemistry, Indian Institute of Technology Bombay, Mumbai 400076, India

13
Cheng X, Wu C, Xu J, Han Y, Xie W, Hu P. Leveraging Machine Learning Potentials for In-Situ Searching of Active Sites in Heterogeneous Catalysis. Precision Chemistry 2024; 2:570-586. [PMID: 39611023 PMCID: PMC11600352 DOI: 10.1021/prechem.4c00051] [Received: 06/24/2024] [Revised: 08/19/2024] [Accepted: 08/20/2024] [Indexed: 11/30/2024]
Abstract
This Perspective explores the integration of machine learning potentials (MLPs) in the research of heterogeneous catalysis, focusing on their role in identifying in situ active sites and enhancing the understanding of catalytic processes. MLPs utilize extensive databases from high-throughput density functional theory (DFT) calculations to train models that predict atomic configurations, energies, and forces with near-DFT accuracy. These capabilities allow MLPs to handle significantly larger systems and extend simulation times beyond the limitations of traditional ab initio methods. Coupled with global optimization algorithms, MLPs enable systematic investigations across vast structural spaces, making substantial contributions to the modeling of catalyst surface structures under reactive conditions. The review aims to provide a broad introduction to recent advancements and practical guidance on employing MLPs and also showcases several exemplary cases of MLP-driven discoveries related to surface structure changes under reactive conditions and the nature of active sites in heterogeneous catalysis. The prevailing challenges faced by this approach are also discussed.
Affiliation(s)
- Xiran Cheng
- School of Physical Science and Technology, ShanghaiTech University, Shanghai 201210, China
- Chenyu Wu
- School of Physical Science and Technology, ShanghaiTech University, Shanghai 201210, China
- Key Laboratory of Mesoscopic Chemistry of MOE, School of Chemistry and Chemical Engineering, Nanjing University, Nanjing 210023, China
- Jiayan Xu
- School of Chemistry and Chemical Engineering, The Queen’s University of Belfast, Belfast BT9 5AG, U.K.
- Yulan Han
- School of Physical Science and Technology, ShanghaiTech University, Shanghai 201210, China
- School of Chemistry and Chemical Engineering, The Queen’s University of Belfast, Belfast BT9 5AG, U.K.
- Wenbo Xie
- School of Physical Science and Technology, ShanghaiTech University, Shanghai 201210, China
- P. Hu
- School of Physical Science and Technology, ShanghaiTech University, Shanghai 201210, China
- School of Chemistry and Chemical Engineering, The Queen’s University of Belfast, Belfast BT9 5AG, U.K.

14
Chen Y, Yan W, Wang Z, Wu J, Xu X. Constructing Accurate and Efficient General-Purpose Atomistic Machine Learning Model with Transferable Accuracy for Quantum Chemistry. J Chem Theory Comput 2024; 20:9500-9511. [PMID: 39480759 DOI: 10.1021/acs.jctc.4c01151] [Indexed: 11/02/2024]
Abstract
Density functional theory (DFT) has been a cornerstone of computational science, providing powerful insights into structure-property relationships for molecules and materials through first-principles quantum-mechanical (QM) calculations. However, the advent of atomistic machine learning (ML) is reshaping the landscape by enabling large-scale dynamics simulations and high-throughput screening at DFT-equivalent accuracy with drastically reduced computational cost. Yet, the development of general-purpose atomistic ML models as surrogates for QM calculations faces several challenges, particularly in terms of model capacity, data efficiency, and transferability across chemically diverse systems. This work introduces a novel extension of the polarizable atom interaction neural network (XPaiNN) to address these challenges. Two distinct training strategies are employed: direct learning and Δ-ML on top of a semiempirical QM method. Both are implemented within the same framework, allowing a detailed comparison of their results. The XPaiNN models, in particular the Δ-ML variant, not only show competitive performance on standard benchmarks but also prove effective compared with other ML models and QM methods on a broad set of downstream tasks, including noncovalent interactions, reaction energetics, barrier heights, geometry optimization, and reaction thermodynamics. This work represents a significant step toward accurate and efficient general-purpose atomistic ML models capable of handling complex chemical systems with transferable accuracy.
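The Δ-ML strategy trains a model on the residual between the reference method and a cheap baseline, then adds the learned correction back to the baseline at prediction time. The sketch below shows only that bookkeeping, with a deliberately trivial linear correction in one hypothetical descriptor; XPaiNN learns the correction with a neural network extending PaiNN, and the descriptor, function names, and numbers here are illustrative assumptions.

```python
from typing import Callable, Sequence


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def train_delta_model(descriptors: Sequence[float],
                      baseline: Sequence[float],
                      reference: Sequence[float]) -> Callable[[float, float], float]:
    """Delta-learning: fit only the residual reference - baseline."""
    residuals = [r - b for r, b in zip(reference, baseline)]
    slope, intercept = fit_line(descriptors, residuals)

    def predict(descriptor: float, baseline_energy: float) -> float:
        # Final estimate = cheap baseline + learned correction
        return baseline_energy + slope * descriptor + intercept

    return predict


if __name__ == "__main__":
    d = [1.0, 2.0, 3.0, 4.0]                  # hypothetical descriptor values
    e_semiempirical = [10.0, 20.0, 30.0, 40.0]
    e_dft = [11.5, 22.0, 32.5, 43.0]          # toy reference energies
    model = train_delta_model(d, e_semiempirical, e_dft)
    print(model(5.0, 50.0))
```

The appeal of this design is that the residual is usually smaller and smoother than the total energy, so the learned part has less to capture.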
Affiliation(s)
- Yicheng Chen
- Department of Chemistry, Collaborative Innovation Center of Chemistry for Energy Materials, Shanghai Key Laboratory of Molecular Catalysis and Innovative Materials, MOE Key Laboratory of Computational Physical Sciences, Fudan University, Shanghai 200433, People's Republic of China
- Wenjie Yan
- Department of Chemistry, Collaborative Innovation Center of Chemistry for Energy Materials, Shanghai Key Laboratory of Molecular Catalysis and Innovative Materials, MOE Key Laboratory of Computational Physical Sciences, Fudan University, Shanghai 200433, People's Republic of China
- Zhanfeng Wang
- Department of Chemistry, Collaborative Innovation Center of Chemistry for Energy Materials, Shanghai Key Laboratory of Molecular Catalysis and Innovative Materials, MOE Key Laboratory of Computational Physical Sciences, Fudan University, Shanghai 200433, People's Republic of China
- Jianming Wu
- Department of Chemistry, Collaborative Innovation Center of Chemistry for Energy Materials, Shanghai Key Laboratory of Molecular Catalysis and Innovative Materials, MOE Key Laboratory of Computational Physical Sciences, Fudan University, Shanghai 200433, People's Republic of China
- Xin Xu
- Department of Chemistry, Collaborative Innovation Center of Chemistry for Energy Materials, Shanghai Key Laboratory of Molecular Catalysis and Innovative Materials, MOE Key Laboratory of Computational Physical Sciences, Fudan University, Shanghai 200433, People's Republic of China
- Hefei National Laboratory, Hefei 230088, People's Republic of China

15
Tang Z, Li H, Lin P, Gong X, Jin G, He L, Jiang H, Ren X, Duan W, Xu Y. A deep equivariant neural network approach for efficient hybrid density functional calculations. Nat Commun 2024; 15:8815. [PMID: 39394190 PMCID: PMC11470148 DOI: 10.1038/s41467-024-53028-4] [Received: 03/04/2024] [Accepted: 09/24/2024] [Indexed: 10/13/2024]
Abstract
Hybrid density functional calculations are essential for accurate description of electronic structure, yet their widespread use is restricted by the substantial computational cost. Here we develop DeepH-hybrid, a deep equivariant neural network method for learning the hybrid-functional Hamiltonian as a function of material structure, which circumvents the time-consuming self-consistent field iterations and enables the study of large-scale materials with hybrid-functional accuracy. Our extensive experiments demonstrate good reliability as well as effective transferability and efficiency of the method. As a notable application, DeepH-hybrid is applied to study large-supercell Moiré-twisted materials, offering the first case study on how the inclusion of exact exchange affects flat bands in magic-angle twisted bilayer graphene. The work generalizes deep-learning electronic structure methods to beyond conventional density functional theory, facilitating the development of deep-learning-based ab initio methods.
Affiliation(s)
- Zechen Tang
- State Key Laboratory of Low Dimensional Quantum Physics and Department of Physics, Tsinghua University, 100084, Beijing, China
- He Li
- State Key Laboratory of Low Dimensional Quantum Physics and Department of Physics, Tsinghua University, 100084, Beijing, China
- Institute for Advanced Study, Tsinghua University, 100084, Beijing, China
- Peize Lin
- Beijing National Laboratory for Condensed Matter Physics, Institute of Physics, Chinese Academy of Sciences, 100190, Beijing, China
- Songshan Lake Materials Laboratory, 523808, Dongguan, Guangdong, China
- Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, 230026, Hefei, Anhui, China
- Xiaoxun Gong
- State Key Laboratory of Low Dimensional Quantum Physics and Department of Physics, Tsinghua University, 100084, Beijing, China
- School of Physics, Peking University, 100871, Beijing, China
- Gan Jin
- Key Laboratory of Quantum Information, University of Science and Technology of China, 230026, Hefei, Anhui, China
- Lixin He
- Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, 230026, Hefei, Anhui, China
- Key Laboratory of Quantum Information, University of Science and Technology of China, 230026, Hefei, Anhui, China
- Hong Jiang
- College of Chemistry and Molecular Engineering, Peking University, 100871, Beijing, China
- Xinguo Ren
- Beijing National Laboratory for Condensed Matter Physics, Institute of Physics, Chinese Academy of Sciences, 100190, Beijing, China
- Songshan Lake Materials Laboratory, 523808, Dongguan, Guangdong, China
- Wenhui Duan
- State Key Laboratory of Low Dimensional Quantum Physics and Department of Physics, Tsinghua University, 100084, Beijing, China
- Institute for Advanced Study, Tsinghua University, 100084, Beijing, China
- Frontier Science Center for Quantum Information, Beijing, China
- Yong Xu
- State Key Laboratory of Low Dimensional Quantum Physics and Department of Physics, Tsinghua University, 100084, Beijing, China
- Frontier Science Center for Quantum Information, Beijing, China
- RIKEN Center for Emergent Matter Science (CEMS), Wako, Saitama, 351-0198, Japan

16
Eastman P, Pritchard BP, Chodera JD, Markland TE. Nutmeg and SPICE: Models and Data for Biomolecular Machine Learning. J Chem Theory Comput 2024; 20:8583-8593. [PMID: 39318326 PMCID: PMC11753618 DOI: 10.1021/acs.jctc.4c00794] [Indexed: 09/26/2024]
Abstract
We describe version 2 of the SPICE data set, a collection of quantum chemistry calculations for training machine learning potentials. It expands on the original data set by adding much more sampling of chemical space and more data on noncovalent interactions. We train a set of potential energy functions called Nutmeg on it. They are based on the TensorNet architecture. They use a novel mechanism to improve performance on charged and polar molecules, injecting precomputed partial charges into the model to provide a reference for the large-scale charge distribution. Evaluation of the new models shows that they do an excellent job of reproducing energy differences between conformations even on highly charged molecules or ones that are significantly larger than the molecules in the training set. They also produce stable molecular dynamics trajectories and are fast enough to be useful for routine simulation of small molecules.
Affiliation(s)
- Peter Eastman
- Department of Chemistry, Stanford University, Stanford, CA 94305, USA
- Benjamin P. Pritchard
- Molecular Sciences Software Institute, Virginia Polytechnic Institute and State University, Blacksburg, VA 24060, USA
- John D. Chodera
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA

17
Zaporozhets I, Musil F, Kapil V, Clementi C. Accurate nuclear quantum statistics on machine-learned classical effective potentials. J Chem Phys 2024; 161:134102. [PMID: 39352405 DOI: 10.1063/5.0226764] [Received: 07/03/2024] [Accepted: 09/13/2024] [Indexed: 10/03/2024]
Abstract
The contribution of nuclear quantum effects (NQEs) to the properties of various hydrogen-bonded systems, including biomolecules, is increasingly recognized. Despite the development of many acceleration techniques, the computational overhead of incorporating NQEs in complex systems is sizable, particularly at low temperatures. In this work, we leverage deep learning and multiscale coarse-graining techniques to mitigate the computational burden of path integral molecular dynamics (PIMD). In particular, we employ a machine-learned potential to accurately represent corrections to classical potentials, thereby significantly reducing the computational cost of simulating NQEs. We validate our approach using four distinct systems: Morse potential, Zundel cation, single water molecule, and bulk water. Our framework allows us to accurately compute position-dependent static properties, as demonstrated by the excellent agreement obtained between the machine-learned potential and computationally intensive PIMD calculations, even in the presence of strong NQEs. This approach opens the way to the development of transferable machine-learned potentials capable of accurately reproducing NQEs in a wide range of molecular systems.
Affiliation(s)
- Iryna Zaporozhets
- Department of Physics, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
- Department of Chemistry, Rice University, Houston, Texas 77005, USA
- Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, USA
- Félix Musil
- Department of Physics, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
- Venkat Kapil
- Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
- Department of Physics and Astronomy, University College, London WC1E 6BT, United Kingdom
- Thomas Young Centre and London Centre for Nanotechnology, London WC1E 6BT, United Kingdom
- Cecilia Clementi
- Department of Physics, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
- Department of Chemistry, Rice University, Houston, Texas 77005, USA
- Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, USA

18
Kubečka J, Ayoubi D, Tang Z, Knattrup Y, Engsvang M, Wu H, Elm J. Accurate modeling of the potential energy surface of atmospheric molecular clusters boosted by neural networks. Environmental Science: Advances 2024; 3:1438-1451. [PMID: 39176037 PMCID: PMC11334116 DOI: 10.1039/d4va00255e] [Received: 07/01/2024] [Accepted: 08/09/2024] [Indexed: 08/24/2024]
Abstract
The computational cost of accurate quantum chemistry (QC) calculations of large molecular systems can often be prohibitively high. Machine learning offers a lower computational cost than QC methods while maintaining their accuracy. In this study, we employ the polarizable atom interaction neural network (PaiNN) architecture to train and model the potential energy surface of molecular clusters relevant to atmospheric new particle formation, such as sulfuric acid-ammonia clusters. We compare PaiNN with previous kernel ridge regression modeling for the Clusteromics I-V data sets. We showcase three models capable of predicting electronic binding energies and interatomic forces with mean absolute errors of <0.3 kcal mol-1 and <0.2 kcal mol-1 Å-1, respectively. Furthermore, we demonstrate that the error of the modeled properties remains below the chemical accuracy of 1 kcal mol-1 even for clusters vastly larger than those in the training database (up to (H2SO4)15(NH3)15 clusters, containing 30 molecules). Consequently, we emphasize the potential applications of these models for faster and more thorough configurational sampling and for boosting molecular dynamics studies of large atmospheric molecular clusters.
Affiliation(s)
- Jakub Kubečka
- Department of Chemistry, Aarhus University, Langelandsgade 140, 8000 Aarhus C, Denmark
- Daniel Ayoubi
- Department of Chemistry, Aarhus University, Langelandsgade 140, 8000 Aarhus C, Denmark
- Zeyuan Tang
- Center for Interstellar Catalysis, Department of Physics and Astronomy, Aarhus University, Ny Munkegade 120, 8000 Aarhus C, Denmark
- Yosef Knattrup
- Department of Chemistry, Aarhus University, Langelandsgade 140, 8000 Aarhus C, Denmark
- Morten Engsvang
- Department of Chemistry, Aarhus University, Langelandsgade 140, 8000 Aarhus C, Denmark
- Haide Wu
- Department of Chemistry, Aarhus University, Langelandsgade 140, 8000 Aarhus C, Denmark
- Jonas Elm
- Department of Chemistry, Aarhus University, Langelandsgade 140, 8000 Aarhus C, Denmark

19
Zhu Y, Peng J, Xu C, Lan Z. Unsupervised Machine Learning in the Analysis of Nonadiabatic Molecular Dynamics Simulation. J Phys Chem Lett 2024; 15:9601-9619. [PMID: 39270134 DOI: 10.1021/acs.jpclett.4c01751] [Indexed: 09/15/2024]
Abstract
All-atom, full-dimensional simulations of nonadiabatic molecular dynamics (NAMD) in large realistic systems have received considerable research interest in recent years. However, such NAMD simulations normally generate an enormous amount of time-dependent, high-dimensional data, posing a significant challenge for result analysis. Considerable effort has therefore been devoted to developing novel, easy-to-use analysis tools based on unsupervised machine learning (ML) methods, both for identifying photoinduced reaction channels and for comprehensively understanding complicated molecular motions in NAMD simulations. Here, we survey recent advances in this field, focusing in particular on how unsupervised ML methods can be used to analyze trajectory-based NAMD simulation results. Our purpose is to offer a comprehensive discussion of several essential components of this analysis protocol, including the selection of ML methods, the construction of molecular descriptors, the establishment of analytical frameworks, their advantages and limitations, and persistent challenges.
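As one concrete instance of the unsupervised analysis surveyed here, the sketch below clusters per-trajectory descriptors with k-means (Lloyd's algorithm), so that trajectories ending in different product channels fall into separate groups. k-means is only one of several methods such analyses use; the two-component descriptor (two hypothetical bond lengths at the end of each trajectory), the deterministic farthest-point initialization, and the data are assumptions.

```python
import math
from typing import Sequence

Vector = tuple[float, ...]


def kmeans(points: Sequence[Vector], k: int, max_iter: int = 100) -> tuple[list[int], list[Vector]]:
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centers: list[Vector] = [tuple(points[0])]
    while len(centers) < k:
        # Next seed: the point farthest from every center chosen so far
        centers.append(tuple(max(points, key=lambda p: min(math.dist(p, c) for c in centers))))
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        # Update step: move each center to the mean of its members
        new_centers: list[Vector] = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center in place
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


if __name__ == "__main__":
    # Hypothetical descriptors: (bond length A, bond length B) at trajectory end, in Å
    final_geometries = [(1.2, 3.1), (1.3, 3.0), (1.25, 3.2), (3.4, 1.1), (3.5, 1.2)]
    labels, _ = kmeans(final_geometries, k=2)
    print(labels)
```

In practice the descriptor choice matters more than the clustering algorithm: a descriptor that does not separate the channels cannot be rescued by any clustering method.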
Affiliation(s)
- Yifei Zhu
- MOE Key Laboratory of Environmental Theoretical Chemistry, SCNU Environmental Research Institute, Guangdong Provincial Key Laboratory of Chemical Pollution and Environmental Safety, School of Environment, South China Normal University, Guangzhou 510006, P. R. China
- Jiawei Peng
- MOE Key Laboratory of Environmental Theoretical Chemistry, SCNU Environmental Research Institute, Guangdong Provincial Key Laboratory of Chemical Pollution and Environmental Safety, School of Environment, South China Normal University, Guangzhou 510006, P. R. China
- Chao Xu
- MOE Key Laboratory of Environmental Theoretical Chemistry, SCNU Environmental Research Institute, Guangdong Provincial Key Laboratory of Chemical Pollution and Environmental Safety, School of Environment, South China Normal University, Guangzhou 510006, P. R. China
- Zhenggang Lan
- MOE Key Laboratory of Environmental Theoretical Chemistry, SCNU Environmental Research Institute, Guangdong Provincial Key Laboratory of Chemical Pollution and Environmental Safety, School of Environment, South China Normal University, Guangzhou 510006, P. R. China

20
Joll K, Schienbein P, Rosso KM, Blumberger J. Machine learning the electric field response of condensed phase systems using perturbed neural network potentials. Nat Commun 2024; 15:8192. [PMID: 39294144 PMCID: PMC11411082 DOI: 10.1038/s41467-024-52491-3] [Received: 03/28/2024] [Accepted: 09/11/2024] [Indexed: 09/20/2024]
Abstract
The interaction of condensed phase systems with external electric fields is of major importance in a myriad of processes in nature and technology, ranging from the field-directed motion of cells (galvanotaxis), to geochemistry and the formation of ice phases on planets, to field-directed chemical catalysis and energy storage and conversion systems including supercapacitors, batteries and solar cells. Molecular simulation in the presence of electric fields would give important atomistic insight into these processes but applications of the most accurate methods such as ab-initio molecular dynamics (AIMD) are limited in scope by their computational expense. Here we introduce Perturbed Neural Network Potential Molecular Dynamics (PNNP MD) to push back the accessible time and length scales of such simulations. We demonstrate that important dielectric properties of liquid water including the field-induced relaxation dynamics, the dielectric constant and the field-dependent IR spectrum can be machine learned up to surprisingly high field strengths of about 0.2 V Å-1 without loss in accuracy when compared to ab-initio molecular dynamics. This is remarkable because, in contrast to most previous approaches, the two neural networks on which PNNP MD is based are exclusively trained on molecular configurations sampled from zero-field MD simulations, demonstrating that the networks not only interpolate but also reliably extrapolate the field response. PNNP MD is based on rigorous theory yet it is simple, general, modular, and systematically improvable allowing us to obtain atomistic insight into the interaction of a wide range of condensed phase systems with external electric fields.
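The field response can be illustrated at first order in a homogeneous field ℰ: if the energy is written as E(R, ℰ) ≈ E₀(R) − μ(R)·ℰ, the force on each atom gains a term given by the derivative of the dipole with respect to that atom's coordinates (the atomic polar tensor). This sketch assumes that linear coupling; how PNNP MD parameterizes it is not detailed in the abstract, so the function name and array layout are hypothetical, and in practice the zero-field forces and tensors would come from the two trained networks rather than being written by hand.

```python
from typing import Sequence

Vec3 = Sequence[float]
# apt[i][a][b] = d mu_b / d R_{i,a}: derivative of dipole component b w.r.t. coordinate a of atom i
Tensor3x3 = Sequence[Sequence[float]]


def field_perturbed_forces(zero_field_forces: Sequence[Vec3],
                           apt: Sequence[Tensor3x3],
                           field: Vec3) -> list[list[float]]:
    """First-order forces F_{i,a}(E) = F_{i,a}(0) + sum_b (d mu_b / d R_{i,a}) E_b.

    Follows from E(R, field) = E0(R) - mu(R) . field and F = -dE/dR.
    Units assumed consistent, e.g. eV/Å forces, elementary-charge tensors, V/Å field.
    """
    forces = []
    for f0, tensor in zip(zero_field_forces, apt):
        forces.append([f0[a] + sum(tensor[a][b] * field[b] for b in range(3)) for a in range(3)])
    return forces


if __name__ == "__main__":
    # Hypothetical two-atom example: isotropic effective charges +0.4 e and -0.4 e
    f0 = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    q = 0.4
    apt = [[[q if a == b else 0.0 for b in range(3)] for a in range(3)],
           [[-q if a == b else 0.0 for b in range(3)] for a in range(3)]]
    print(field_perturbed_forces(f0, apt, field=[0.0, 0.0, 0.2]))
```

For a bare point charge q the tensor reduces to q times the identity and the extra force is simply qℰ, which is a useful sanity check on the sign convention.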
Affiliation(s)
- Kit Joll
- Department of Physics and Astronomy and Thomas Young Centre, University College London, London, UK
- Philipp Schienbein
- Department of Physics and Astronomy and Thomas Young Centre, University College London, London, UK
- Department of Physics, Imperial College London, South Kensington, London, UK
- Kevin M Rosso
- Pacific Northwest National Laboratory, Richland, Washington, USA
- Jochen Blumberger
- Department of Physics and Astronomy and Thomas Young Centre, University College London, London, UK

21
Yang Z, Zhao YM, Wang X, Liu X, Zhang X, Li Y, Lv Q, Chen CYC, Shen L. Scalable crystal structure relaxation using an iteration-free deep generative model with uncertainty quantification. Nat Commun 2024; 15:8148. [PMID: 39289379 PMCID: PMC11408520 DOI: 10.1038/s41467-024-52378-3] [Received: 04/14/2024] [Accepted: 09/02/2024] [Indexed: 09/19/2024]
Abstract
In computational molecular and materials science, determining equilibrium structures is the crucial first step for accurate subsequent property calculations. However, the recent discovery of millions of new crystals and super large twisted structures has challenged traditional computational methods, both ab initio and machine-learning-based, due to their computationally intensive iterative processes. To address these scalability issues, here we introduce DeepRelax, a deep generative model capable of performing geometric crystal structure relaxation rapidly and without iterations. DeepRelax learns the equilibrium structural distribution, enabling it to predict relaxed structures directly from their unrelaxed ones. The ability to perform structural relaxation at the millisecond level per structure, combined with the scalability of parallel processing, makes DeepRelax particularly useful for large-scale virtual screening. We demonstrate DeepRelax's reliability and robustness by applying it to five diverse databases, including oxides, Materials Project, two-dimensional materials, van der Waals crystals, and crystals with point defects. DeepRelax consistently shows high accuracy and efficiency, validated by density functional theory calculations. Finally, we enhance its trustworthiness by integrating uncertainty quantification. This work significantly accelerates computational workflows, offering a robust and trustworthy machine-learning method for material discovery and advancing the application of AI for science.
Collapse
Affiliation(s)
- Ziduo Yang
- Department of Mechanical Engineering, National University of Singapore, Singapore, Singapore
- Artificial Intelligence Medical Research Center, School of Intelligent Systems Engineering, Shenzhen Campus of Sun Yat-sen University, Shenzhen, China
- AI for Science (AI4S)-Preferred Program, School of Electronic and Computer Engineering, Peking University Shenzhen Graduate School, Shenzhen, China
| | - Yi-Ming Zhao
- Department of Mechanical Engineering, National University of Singapore, Singapore, Singapore
| | - Xian Wang
- Department of Physics, National University of Singapore, Singapore, Singapore
| | - Xiaoqing Liu
- Department of Mechanical Engineering, National University of Singapore, Singapore, Singapore
| | - Xiuying Zhang
- Department of Mechanical Engineering, National University of Singapore, Singapore, Singapore
| | - Yifan Li
- Department of Mechanical Engineering, National University of Singapore, Singapore, Singapore
| | - Qiujie Lv
- Department of Mechanical Engineering, National University of Singapore, Singapore, Singapore
- Artificial Intelligence Medical Research Center, School of Intelligent Systems Engineering, Shenzhen Campus of Sun Yat-sen University, Shenzhen, China
- Calvin Yu-Chian Chen
- AI for Science (AI4S)-Preferred Program, School of Electronic and Computer Engineering, Peking University Shenzhen Graduate School, Shenzhen, China.
- State Key Laboratory of Chemical Oncogenomics, School of Chemical Biology and Biotechnology, Peking University Shenzhen Graduate School, Shenzhen, China.
- Department of Medical Research, China Medical University Hospital, Taichung, Taiwan.
- Department of Bioinformatics and Medical Engineering, Asia University, Taichung, Taiwan.
- Guangdong L-Med Biotechnology Co., Ltd., Meizhou, Guangdong, China.
- Lei Shen
- Department of Mechanical Engineering, National University of Singapore, Singapore, Singapore.
- National University of Singapore (Chongqing) Research Institute, Chongqing, China.

22
Hou P, Tian Y, Meng X. Improving Molecular-Dynamics Simulations for Solid-Liquid Interfaces with Machine-Learning Interatomic Potentials. Chemistry 2024; 30:e202401373. [PMID: 38877181 DOI: 10.1002/chem.202401373] [Received: 04/07/2024] [Revised: 06/13/2024] [Accepted: 06/14/2024] [Indexed: 06/16/2024]
Abstract
Emerging developments in artificial intelligence have opened up vast possibilities for material simulation. By fitting machine learning algorithms to first-principles data, machine learning interatomic potentials (MLIPs) can effectively balance accuracy and efficiency in molecular dynamics (MD) simulations, serving as powerful tools for various complex physicochemical systems. This has generated unprecedented enthusiasm among researchers for applying the technology in multiple fields to revisit major scientific problems that have remained controversial owing to the limitations of previous computational methods. Herein, we introduce the evolution of MLIPs, provide valuable application examples for solid-liquid interfaces, and present current challenges. As the many remaining difficulties in the accuracy, efficiency, and versatility of MLIPs are resolved, this booming technique, combined with molecular simulation methods, will provide a fundamental and valuable understanding of interdisciplinary scientific challenges spanning materials, physics, and chemistry.
Affiliation(s)
- Pengfei Hou
- Key Laboratory of Physics and Technology for Advanced Batteries (Ministry of Education), College of Physics, Jilin University, Changchun, 130012, China
- Key Laboratory of Material Simulation Methods and Software of Ministry of Education, College of Physics, Jilin University, Changchun, 130012, China
- Yumiao Tian
- Key Laboratory of Physics and Technology for Advanced Batteries (Ministry of Education), College of Physics, Jilin University, Changchun, 130012, China
- Key Laboratory of Material Simulation Methods and Software of Ministry of Education, College of Physics, Jilin University, Changchun, 130012, China
- Xing Meng
- Key Laboratory of Physics and Technology for Advanced Batteries (Ministry of Education), College of Physics, Jilin University, Changchun, 130012, China
- Key Laboratory of Material Simulation Methods and Software of Ministry of Education, College of Physics, Jilin University, Changchun, 130012, China

23
Glick ZL, Metcalf DP, Glick CS, Spronk SA, Koutsoukas A, Cheney DL, Sherrill CD. A physics-aware neural network for protein-ligand interactions with quantum chemical accuracy. Chem Sci 2024; 15:13313-13324. [PMID: 39183910 PMCID: PMC11339967 DOI: 10.1039/d4sc01029a] [Received: 02/14/2024] [Accepted: 07/09/2024] [Indexed: 08/27/2024]
Abstract
Quantifying intermolecular interactions with quantum chemistry (QC) is useful for many chemical problems, including understanding the nature of protein-ligand interactions. Unfortunately, QC computations on protein-ligand systems are too computationally expensive for most use cases. The flourishing field of machine-learned (ML) potentials is a promising solution, but it is limited by an inability to easily capture long range, non-local interactions. In this work we develop an atomic-pairwise neural network (AP-Net) specialized for modeling intermolecular interactions. This model benefits from a number of physical constraints, including a two-component equivariant message passing neural network architecture that predicts interaction energies via an intermediate prediction of monomer electron densities. The AP-Net model is trained on a comprehensive dataset composed of paired ligand and protein fragments. This model accurately predicts QC-quality interaction energies of protein-ligand systems at a computational cost reduced by orders of magnitude. Applications of the AP-Net model to molecular crystal structure prediction are explored, as well as limitations in modeling highly polarizable systems.
Affiliation(s)
- Zachary L Glick
- School of Chemistry and Biochemistry, School of Computational Science and Engineering, Georgia Institute of Technology Atlanta Georgia 30332-0400 USA
- Derek P Metcalf
- School of Chemistry and Biochemistry, School of Computational Science and Engineering, Georgia Institute of Technology Atlanta Georgia 30332-0400 USA
- Caroline S Glick
- School of Chemistry and Biochemistry, School of Computational Science and Engineering, Georgia Institute of Technology Atlanta Georgia 30332-0400 USA
- Steven A Spronk
- Molecular Structure and Design, Bristol Myers Squibb Company P.O. Box 5400 Princeton New Jersey 08543 USA
- Alexios Koutsoukas
- Molecular Structure and Design, Bristol Myers Squibb Company P.O. Box 5400 Princeton New Jersey 08543 USA
- Daniel L Cheney
- Molecular Structure and Design, Bristol Myers Squibb Company P.O. Box 5400 Princeton New Jersey 08543 USA
- C David Sherrill
- School of Chemistry and Biochemistry, School of Computational Science and Engineering, Georgia Institute of Technology Atlanta Georgia 30332-0400 USA

24
van Gerwen P, Briling KR, Bunne C, Somnath VR, Laplaza R, Krause A, Corminboeuf C. 3DReact: Geometric Deep Learning for Chemical Reactions. J Chem Inf Model 2024; 64:5771-5785. [PMID: 39007724 PMCID: PMC11323278 DOI: 10.1021/acs.jcim.4c00104] [Received: 01/21/2024] [Revised: 07/03/2024] [Accepted: 07/08/2024] [Indexed: 07/16/2024]
Abstract
Geometric deep learning models, which incorporate the relevant molecular symmetries within the neural network architecture, have considerably improved the accuracy and data efficiency of predictions of molecular properties. Building on this success, we introduce 3DReact, a geometric deep learning model to predict reaction properties from three-dimensional structures of reactants and products. We demonstrate that the invariant version of the model is sufficient for existing reaction data sets. We illustrate its competitive performance on the prediction of activation barriers on the GDB7-22-TS, Cyclo-23-TS, and Proparg-21-TS data sets in different atom-mapping regimes. We show that, compared to existing models for reaction property prediction, 3DReact offers a flexible framework that exploits atom-mapping information, if available, as well as geometries of reactants and products (in an invariant or equivariant fashion). Accordingly, it performs systematically well across different data sets, atom-mapping regimes, as well as both interpolation and extrapolation tasks.
Affiliation(s)
- Puck van Gerwen
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- National Center for Competence in Research - Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Ksenia R. Briling
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Charlotte Bunne
- National Center for Competence in Research - Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Learning & Adaptive Systems Group, Department of Computer Science, ETH Zurich, 8092 Zurich, Switzerland
- Vignesh Ram Somnath
- National Center for Competence in Research - Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Learning & Adaptive Systems Group, Department of Computer Science, ETH Zurich, 8092 Zurich, Switzerland
- Ruben Laplaza
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- National Center for Competence in Research - Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Andreas Krause
- National Center for Competence in Research - Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Learning & Adaptive Systems Group, Department of Computer Science, ETH Zurich, 8092 Zurich, Switzerland
- Clemence Corminboeuf
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- National Center for Competence in Research - Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland

25
Frank JT, Unke OT, Müller KR, Chmiela S. A Euclidean transformer for fast and stable machine learned force fields. Nat Commun 2024; 15:6539. [PMID: 39107296 PMCID: PMC11303804 DOI: 10.1038/s41467-024-50620-6] [Received: 08/25/2023] [Accepted: 07/10/2024] [Indexed: 08/10/2024]
Abstract
Recent years have seen vast progress in the development of machine learned force fields (MLFFs) based on ab-initio reference calculations. Despite achieving low test errors, the reliability of MLFFs in molecular dynamics (MD) simulations is facing growing scrutiny due to concerns about instability over extended simulation timescales. Our findings suggest a potential connection between robustness to cumulative inaccuracies and the use of equivariant representations in MLFFs, but the computational cost associated with these representations can limit this advantage in practice. To address this, we propose a transformer architecture called SO3KRATES that combines sparse equivariant representations (Euclidean variables) with a self-attention mechanism that separates invariant and equivariant information, eliminating the need for expensive tensor products. SO3KRATES achieves a unique combination of accuracy, stability, and speed that enables insightful analysis of quantum properties of matter on extended time and system size scales. To showcase this capability, we generate stable MD trajectories for flexible peptides and supra-molecular structures with hundreds of atoms. Furthermore, we investigate the PES topology for medium-sized chainlike molecules (e.g., small peptides) by exploring thousands of minima. Remarkably, SO3KRATES demonstrates the ability to strike a balance between the conflicting demands of stability and the emergence of new minimum-energy conformations beyond the training data, which is crucial for realistic exploration tasks in the field of biochemistry.
Affiliation(s)
- J Thorben Frank
- Machine Learning Group, TU Berlin, Berlin, Germany
- BIFOLD, Berlin Institute for the Foundations of Learning and Data, Berlin, Germany
- Klaus-Robert Müller
- Machine Learning Group, TU Berlin, Berlin, Germany.
- BIFOLD, Berlin Institute for the Foundations of Learning and Data, Berlin, Germany.
- Google DeepMind, Berlin, Germany.
- Department of Artificial Intelligence, Korea University, Seoul, Korea.
- Max Planck Institut für Informatik, Saarbrücken, Germany.
- Stefan Chmiela
- Machine Learning Group, TU Berlin, Berlin, Germany.
- BIFOLD, Berlin Institute for the Foundations of Learning and Data, Berlin, Germany.

26
Plé T, Adjoua O, Lagardère L, Piquemal JP. FeNNol: An efficient and flexible library for building force-field-enhanced neural network potentials. J Chem Phys 2024; 161:042502. [PMID: 39051830 DOI: 10.1063/5.0217688] [Received: 05/06/2024] [Accepted: 06/28/2024] [Indexed: 07/27/2024]
Abstract
Neural network interatomic potentials (NNPs) have recently proven to be powerful tools to accurately model complex molecular systems while bypassing the high numerical cost of ab initio molecular dynamics simulations. In recent years, numerous advances in model architectures as well as the development of hybrid models combining machine-learning (ML) with more traditional, physically motivated, force-field interactions have considerably increased the design space of ML potentials. In this paper, we present FeNNol, a new library for building, training, and running force-field-enhanced neural network potentials. It provides a flexible and modular system for building hybrid models, allowing us to easily combine state-of-the-art embeddings with ML-parameterized physical interaction terms without the need for explicit programming. Furthermore, FeNNol leverages the automatic differentiation and just-in-time compilation features of the Jax Python library to enable fast evaluation of NNPs, shrinking the performance gap between ML potentials and standard force-fields. This is demonstrated with the popular ANI-2x model reaching simulation speeds nearly on par with the AMOEBA polarizable force-field on commodity GPUs (graphics processing units). We hope that FeNNol will facilitate the development and application of new hybrid NNP architectures for a wide range of molecular simulation problems.
Affiliation(s)
- Thomas Plé
- Sorbonne Université, LCT, UMR 7616 CNRS, 75005 Paris, France
- Olivier Adjoua
- Sorbonne Université, LCT, UMR 7616 CNRS, 75005 Paris, France
- Louis Lagardère
- Sorbonne Université, LCT, UMR 7616 CNRS, 75005 Paris, France

27
Biriukov D, Vácha R. Pathways to a Shiny Future: Building the Foundation for Computational Physical Chemistry and Biophysics in 2050. ACS Phys Chem Au 2024; 4:302-313. [PMID: 39069976 PMCID: PMC11274290 DOI: 10.1021/acsphyschemau.4c00003] [Received: 01/07/2024] [Revised: 03/15/2024] [Accepted: 03/18/2024] [Indexed: 07/30/2024]
Abstract
In the last quarter-century, the field of molecular dynamics (MD) has undergone a remarkable transformation, propelled by substantial enhancements in software, hardware, and underlying methodologies. In this Perspective, we contemplate the future trajectory of MD simulations and what they might look like by the year 2050. We spotlight the pivotal role of artificial intelligence (AI) in shaping the future of MD and the broader field of computational physical chemistry. We outline critical strategies and initiatives that are essential for the seamless integration of such technologies. Our discussion delves into topics like multiscale modeling, adept management of the ever-increasing data deluge, the establishment of centralized simulation databases, and the autonomous refinement, cross-validation, and self-expansion of these repositories. The successful implementation of these advancements requires scientific transparency, a cautiously optimistic approach to interpreting AI-driven simulations and their analysis, and a mindset that prioritizes knowledge-motivated research alongside AI-enhanced big data exploration. While history reminds us that the trajectory of technological progress can be unpredictable, this Perspective offers guidance on preparedness and proactive measures, aiming to steer future advancements in the most beneficial and successful direction.
Affiliation(s)
- Denys Biriukov
- CEITEC - Central European Institute of Technology, Masaryk University, Kamenice 753/5, 625 00 Brno, Czech Republic
- National Centre for Biomolecular Research, Faculty of Science, Masaryk University, Kamenice 753/5, 625 00 Brno, Czech Republic
- Robert Vácha
- CEITEC - Central European Institute of Technology, Masaryk University, Kamenice 753/5, 625 00 Brno, Czech Republic
- National Centre for Biomolecular Research, Faculty of Science, Masaryk University, Kamenice 753/5, 625 00 Brno, Czech Republic
- Department of Condensed Matter Physics, Faculty of Science, Masaryk University, Kotlářská 267/2, 611 37 Brno, Czech Republic

28
Slootman E, Poltavsky I, Shinde R, Cocomello J, Moroni S, Tkatchenko A, Filippi C. Accurate Quantum Monte Carlo Forces for Machine-Learned Force Fields: Ethanol as a Benchmark. J Chem Theory Comput 2024; 20:6020-6027. [PMID: 39003522 PMCID: PMC11270822 DOI: 10.1021/acs.jctc.4c00498] [Received: 04/15/2024] [Revised: 05/31/2024] [Accepted: 06/03/2024] [Indexed: 07/15/2024]
Abstract
Quantum Monte Carlo (QMC) is a powerful method to calculate accurate energies and forces for molecular systems. In this work, we demonstrate how we can obtain accurate QMC forces for the fluxional ethanol molecule at room temperature by using either multideterminant Jastrow-Slater wave functions in variational Monte Carlo or just a single determinant in diffusion Monte Carlo. The excellent performance of our protocols is assessed against high-level coupled cluster calculations on a diverse set of representative configurations of the system. Finally, we train machine-learning force fields on the QMC forces and compare them to models trained on coupled cluster reference data, showing that a force field based on the diffusion Monte Carlo forces with a single determinant can faithfully reproduce coupled cluster power spectra in molecular dynamics simulations.
Affiliation(s)
- E. Slootman
- MESA+ Institute for Nanotechnology, University of Twente, P.O. Box 217, 7500 AE Enschede, The Netherlands
- I. Poltavsky
- Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- R. Shinde
- MESA+ Institute for Nanotechnology, University of Twente, P.O. Box 217, 7500 AE Enschede, The Netherlands
- J. Cocomello
- MESA+ Institute for Nanotechnology, University of Twente, P.O. Box 217, 7500 AE Enschede, The Netherlands
- S. Moroni
- CNR-IOM DEMOCRITOS, Istituto Officina dei Materiali, and SISSA Scuola Internazionale Superiore di Studi Avanzati, Via Bonomea 265, I-34136 Trieste, Italy
- A. Tkatchenko
- Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- C. Filippi
- MESA+ Institute for Nanotechnology, University of Twente, P.O. Box 217, 7500 AE Enschede, The Netherlands

29
Fallani A, Medrano Sandonas L, Tkatchenko A. Inverse mapping of quantum properties to structures for chemical space of small organic molecules. Nat Commun 2024; 15:6061. [PMID: 39025883 PMCID: PMC11258234 DOI: 10.1038/s41467-024-50401-1] [Received: 09/14/2023] [Accepted: 07/01/2024] [Indexed: 07/20/2024]
Abstract
Computer-driven molecular design combines the principles of chemistry, physics, and artificial intelligence to identify chemical compounds with tailored properties. While quantum-mechanical (QM) methods, coupled with machine learning, already offer a direct mapping from 3D molecular structures to their properties, effective methodologies for the inverse mapping in chemical space remain elusive. We address this challenge by demonstrating the possibility of parametrizing a chemical space with a finite set of QM properties. Our proof-of-concept implementation achieves an approximate property-to-structure mapping, the QIM model (which stands for "Quantum Inverse Mapping"), by forcing a variational auto-encoder with a property encoder to obtain a common internal representation for both structures and properties. After validating this mapping for small drug-like molecules, we illustrate its capabilities with an explainability study as well as by the generation of de novo molecular structures with targeted properties and transition pathways between conformational isomers. Our findings thus provide a proof-of-principle demonstration aiming to enable the inverse property-to-structure design in diverse chemical spaces.
Affiliation(s)
- Alessio Fallani
- Department of Physics and Materials Science, University of Luxembourg, L-1511, Luxembourg City, Luxembourg.
- Leonardo Medrano Sandonas
- Department of Physics and Materials Science, University of Luxembourg, L-1511, Luxembourg City, Luxembourg.
- Institute for Materials Science and Max Bergmann Center of Biomaterials, TU Dresden, 01062, Dresden, Germany.
- Alexandre Tkatchenko
- Department of Physics and Materials Science, University of Luxembourg, L-1511, Luxembourg City, Luxembourg.

30
Atz K, Nippa DF, Müller AT, Jost V, Anelli A, Reutlinger M, Kramer C, Martin RE, Grether U, Schneider G, Wuitschik G. Geometric deep learning-guided Suzuki reaction conditions assessment for applications in medicinal chemistry. RSC Med Chem 2024; 15:2310-2321. [PMID: 39026644 PMCID: PMC11253849 DOI: 10.1039/d4md00196f] [Received: 03/21/2024] [Accepted: 05/25/2024] [Indexed: 07/20/2024]
Abstract
Suzuki cross-coupling reactions are considered a valuable tool for constructing carbon-carbon bonds in small molecule drug discovery. However, the synthesis of chemical matter often represents a time-consuming and labour-intensive bottleneck. We demonstrate how machine learning methods trained on high-throughput experimentation (HTE) data can be leveraged to enable fast reaction condition selection for novel coupling partners. We show that the trained models support chemists in determining suitable catalyst-solvent-base combinations for individual transformations, including an evaluation of the need for HTE screening. We introduce an algorithm for designing 96-well plates optimized towards reaction yields and discuss the model performance of zero- and few-shot machine learning. The best-performing machine learning model achieved a three-category classification accuracy of 76.3% (±0.2%) and a binary-classification F1-score of 79.1% (±0.9%). Validation on eight reactions revealed an area under the receiver operating characteristic curve (ROC-AUC) of 0.82 (±0.07) for few-shot machine learning. On the other hand, zero-shot machine learning models achieved a mean ROC-AUC value of 0.63 (±0.16). This study positively advocates the application of few-shot machine learning-guided reaction condition selection for HTE campaigns in medicinal chemistry and highlights practical applications as well as challenges associated with zero-shot machine learning.
Affiliation(s)
- Kenneth Atz
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd. Grenzacherstrasse 124 4070 Basel Switzerland
- David F Nippa
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd. Grenzacherstrasse 124 4070 Basel Switzerland
- Alex T Müller
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd. Grenzacherstrasse 124 4070 Basel Switzerland
- Vera Jost
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd. Grenzacherstrasse 124 4070 Basel Switzerland
- Andrea Anelli
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd. Grenzacherstrasse 124 4070 Basel Switzerland
- Michael Reutlinger
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd. Grenzacherstrasse 124 4070 Basel Switzerland
- Christian Kramer
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd. Grenzacherstrasse 124 4070 Basel Switzerland
- Rainer E Martin
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd. Grenzacherstrasse 124 4070 Basel Switzerland
- Uwe Grether
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd. Grenzacherstrasse 124 4070 Basel Switzerland
- Gisbert Schneider
- Department of Chemistry and Applied Biosciences, ETH Zurich Vladimir-Prelog-Weg 4 8093 Zurich Switzerland
- Georg Wuitschik
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd. Grenzacherstrasse 124 4070 Basel Switzerland

31
Medrano Sandonas L, Van Rompaey D, Fallani A, Hilfiker M, Hahn D, Perez-Benito L, Verhoeven J, Tresadern G, Kurt Wegner J, Ceulemans H, Tkatchenko A. Dataset for quantum-mechanical exploration of conformers and solvent effects in large drug-like molecules. Sci Data 2024; 11:742. [PMID: 38972891 PMCID: PMC11228031 DOI: 10.1038/s41597-024-03521-8] [Received: 03/18/2024] [Accepted: 06/13/2024] [Indexed: 07/09/2024]
Abstract
We here introduce the Aquamarine (AQM) dataset, an extensive quantum-mechanical (QM) dataset that contains the structural and electronic information of 59,783 low- and high-energy conformers of 1,653 molecules with a total number of atoms ranging from 2 to 92 (mean: 50.9), and containing up to 54 (mean: 28.2) non-hydrogen atoms. To gain insights into the solvent effects as well as collective dispersion interactions for drug-like molecules, we have performed QM calculations supplemented with a treatment of many-body dispersion (MBD) interactions of structures and properties in the gas phase and implicit water. Thus, AQM contains over 40 global and local physicochemical properties (including ground-state and response properties) per conformer computed at the tightly converged PBE0+MBD level of theory for gas-phase molecules, whereas PBE0+MBD with the modified Poisson-Boltzmann (MPB) model of water was used for solvated molecules. By addressing both molecule-solvent and dispersion interactions, the AQM dataset can serve as a challenging benchmark for state-of-the-art machine learning methods for property modeling and de novo generation of large (solvated) molecules with pharmaceutical and biological relevance.
Affiliation(s)
- Leonardo Medrano Sandonas
- Department of Physics and Materials Science, University of Luxembourg, L-1511, Luxembourg City, Luxembourg.
- Institute for Materials Science and Max Bergmann Center of Biomaterials, TU Dresden, 01062, Dresden, Germany.
- Dries Van Rompaey
- Drug Discovery Data Sciences (D3S), Janssen Pharmaceutica NV, Turnhoutseweg 30, 2340, Beerse, Belgium.
- Alessio Fallani
- Department of Physics and Materials Science, University of Luxembourg, L-1511, Luxembourg City, Luxembourg
- Drug Discovery Data Sciences (D3S), Janssen Pharmaceutica NV, Turnhoutseweg 30, 2340, Beerse, Belgium
- Mathias Hilfiker
- Department of Physics and Materials Science, University of Luxembourg, L-1511, Luxembourg City, Luxembourg
- David Hahn
- Computational Chemistry, Janssen Pharmaceutica NV, Turnhoutseweg 30, 2340, Beerse, Belgium
- Laura Perez-Benito
- Computational Chemistry, Janssen Pharmaceutica NV, Turnhoutseweg 30, 2340, Beerse, Belgium
- Jonas Verhoeven
- Drug Discovery Data Sciences (D3S), Janssen Pharmaceutica NV, Turnhoutseweg 30, 2340, Beerse, Belgium
- Gary Tresadern
- Computational Chemistry, Janssen Pharmaceutica NV, Turnhoutseweg 30, 2340, Beerse, Belgium
- Joerg Kurt Wegner
- Drug Discovery Data Sciences (D3S), Janssen Pharmaceutica NV, Turnhoutseweg 30, 2340, Beerse, Belgium
- Drug Discovery Data Sciences (D3S), Johnson & Johnson Innovative Medicine, 301 Binney Street, MA 02142, Cambridge, USA
- Hugo Ceulemans
- Drug Discovery Data Sciences (D3S), Janssen Pharmaceutica NV, Turnhoutseweg 30, 2340, Beerse, Belgium
- Alexandre Tkatchenko
- Department of Physics and Materials Science, University of Luxembourg, L-1511, Luxembourg City, Luxembourg.

32
Aldossary A, Campos-Gonzalez-Angulo JA, Pablo-García S, Leong SX, Rajaonson EM, Thiede L, Tom G, Wang A, Avagliano D, Aspuru-Guzik A. In Silico Chemical Experiments in the Age of AI: From Quantum Chemistry to Machine Learning and Back. Adv Mater 2024; 36:e2402369. [PMID: 38794859 DOI: 10.1002/adma.202402369] [Received: 02/15/2024] [Revised: 04/28/2024] [Indexed: 05/26/2024]
Abstract
Computational chemistry is an indispensable tool for understanding molecules and predicting chemical properties. However, traditional computational methods face significant challenges due to the difficulty of solving the Schrödinger equation and the increasing computational cost with the size of the molecular system. In response, there has been a surge of interest in applying artificial intelligence (AI) and machine learning (ML) techniques to in silico experiments. Integrating AI and ML into computational chemistry increases the scalability and speed of the exploration of chemical space. However, challenges remain, particularly regarding the reproducibility and transferability of ML models. This review highlights the evolution of ML in learning from, complementing, or replacing traditional computational chemistry for energy and property predictions. Starting from models trained entirely on numerical data, it follows the path toward the ideal model, one that incorporates or learns the physical laws of quantum mechanics. This paper also reviews existing computational methods and ML models and their intertwining, outlines a roadmap for future research, and identifies areas for improvement and innovation. Ultimately, the goal is to develop AI architectures capable of predicting accurate and transferable solutions to the Schrödinger equation, thereby revolutionizing in silico experiments within chemistry and materials science.
Affiliation(s)
- Abdulrahman Aldossary
- Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, ON, M5S 3H6, Canada
- Sergio Pablo-García
- Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, ON, M5S 3H6, Canada
- Department of Computer Science, University of Toronto, 40 St. George Street, Toronto, ON, M5S 2E4, Canada
- Shi Xuan Leong
- Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, ON, M5S 3H6, Canada
- Ella Miray Rajaonson
- Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, ON, M5S 3H6, Canada
- Vector Institute for Artificial Intelligence, 661 University Ave. Suite 710, Toronto, ON, M5G 1M1, Canada
- Luca Thiede
- Department of Computer Science, University of Toronto, 40 St. George Street, Toronto, ON, M5S 2E4, Canada
- Vector Institute for Artificial Intelligence, 661 University Ave. Suite 710, Toronto, ON, M5G 1M1, Canada
- Gary Tom
- Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, ON, M5S 3H6, Canada
- Vector Institute for Artificial Intelligence, 661 University Ave. Suite 710, Toronto, ON, M5G 1M1, Canada
- Andrew Wang
- Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, ON, M5S 3H6, Canada
- Davide Avagliano
- Chimie ParisTech, PSL University, CNRS, Institute of Chemistry for Life and Health Sciences (iCLeHS UMR 8060), Paris, F-75005, France
- Alán Aspuru-Guzik
- Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, ON, M5S 3H6, Canada
- Department of Computer Science, University of Toronto, 40 St. George Street, Toronto, ON, M5S 2E4, Canada
- Vector Institute for Artificial Intelligence, 661 University Ave. Suite 710, Toronto, ON, M5G 1M1, Canada
- Department of Materials Science & Engineering, University of Toronto, 184 College St., Toronto, ON, M5S 3E4, Canada
- Department of Chemical Engineering & Applied Chemistry, University of Toronto, 200 College St., Toronto, ON, M5S 3E5, Canada
- Lebovic Fellow, Canadian Institute for Advanced Research (CIFAR), 66118 University Ave., Toronto, M5G 1M1, Canada
- Acceleration Consortium, 80 St George St, Toronto, M5S 3H6, Canada
33
Yang Y, Zhang S, Ranasinghe KD, Isayev O, Roitberg AE. Machine Learning of Reactive Potentials. Annu Rev Phys Chem 2024; 75:371-395. [PMID: 38941524 DOI: 10.1146/annurev-physchem-062123-024417] [Indexed: 06/30/2024]
Abstract
In the past two decades, machine learning potentials (MLPs) have driven significant developments in chemical, biological, and material sciences. The construction and training of MLPs enable fast and accurate simulations and analysis of thermodynamic and kinetic properties. This review focuses on the application of MLPs to reaction systems with consideration of bond breaking and formation. We review the development of MLP models, primarily with neural network and kernel-based algorithms, and recent applications of reactive MLPs (RMLPs) to systems at different scales. We show how RMLPs are constructed, how they speed up the calculation of reactive dynamics, and how they facilitate the study of reaction trajectories, reaction rates, free energy calculations, and many other calculations. Different data sampling strategies applied in building RMLPs are also discussed with a focus on how to collect structures for rare events and how to further improve their performance with active learning.
Affiliation(s)
- Yinuo Yang
- Department of Chemistry, University of Florida, Gainesville, Florida
- Shuhao Zhang
- Department of Chemistry, Carnegie Mellon University, Pittsburgh, Pennsylvania
- Olexandr Isayev
- Department of Chemistry, Carnegie Mellon University, Pittsburgh, Pennsylvania
- Adrian E Roitberg
- Department of Chemistry, University of Florida, Gainesville, Florida
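The review above mentions active learning as a way to collect structures for rare events. One common selection criterion is query by committee: structures on which an ensemble of potentials disagrees most are sent for new reference calculations. The sketch below is a minimal illustration under stated assumptions; the function name `select_for_labeling`, the flat-coordinate structure representation, and the threshold values are hypothetical and not taken from Yang et al.

```python
import statistics
from typing import Callable, List, Sequence

# Hypothetical representation: a structure is a flat list of coordinates.
Structure = Sequence[float]
EnergyModel = Callable[[Structure], float]


def select_for_labeling(
    candidates: Sequence[Structure],
    committee: Sequence[EnergyModel],
    threshold: float,
) -> List[int]:
    """Return indices of candidates whose committee energy spread exceeds `threshold`.

    The population standard deviation of the committee's energy predictions
    serves as the uncertainty proxy (query by committee).
    """
    if len(committee) < 2:
        raise ValueError("a committee needs at least two models")
    selected = []
    for index, structure in enumerate(candidates):
        energies = [model(structure) for model in committee]
        if statistics.pstdev(energies) > threshold:
            selected.append(index)
    return selected


# Toy committee: three "models" whose predictions diverge more far from the origin.
toy_committee = [
    lambda x: sum(x),
    lambda x: 1.01 * sum(x),
    lambda x: sum(v * v for v in x),
]
print(select_for_labeling([[0.1, 0.1], [2.0, 3.0]], toy_committee, threshold=0.5))
```

With real reactive MLPs, the committee members would be independently trained models, and the spread is often evaluated on forces as well as energies.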
34
Pelaez RP, Simeon G, Galvelis R, Mirarchi A, Eastman P, Doerr S, Thölke P, Markland TE, De Fabritiis G. TorchMD-Net 2.0: Fast Neural Network Potentials for Molecular Simulations. J Chem Theory Comput 2024; 20:4076-4087. [PMID: 38743033 DOI: 10.1021/acs.jctc.4c00253] [Indexed: 05/16/2024]
Abstract
Achieving a balance between computational speed, prediction accuracy, and universal applicability in molecular simulations has been a persistent challenge. This paper presents substantial advancements in TorchMD-Net software, a pivotal step forward in the shift from conventional force fields to neural network-based potentials. The evolution of TorchMD-Net into a more comprehensive and versatile framework is highlighted, incorporating cutting-edge architectures such as TensorNet. This transformation is achieved through a modular design approach, encouraging customized applications within the scientific community. The most notable enhancement is a significant improvement in computational efficiency, achieving a very remarkable acceleration in the computation of energy and forces for TensorNet models, with performance gains ranging from 2× to 10× over previous, nonoptimized, iterations. Other enhancements include highly optimized neighbor search algorithms that support periodic boundary conditions and smooth integration with existing molecular dynamics frameworks. Additionally, the updated version introduces the capability to integrate physical priors, further enriching its application spectrum and utility in research. The software is available at https://github.com/torchmd/torchmd-net.
Affiliation(s)
- Raul P Pelaez
- Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C Dr. Aiguader 88, 08003 Barcelona, Spain
- Guillem Simeon
- Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C Dr. Aiguader 88, 08003 Barcelona, Spain
- Raimondas Galvelis
- Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C Dr. Aiguader 88, 08003 Barcelona, Spain
- Acellera Labs, C Dr Trueta 183, 08005 Barcelona, Spain
- Antonio Mirarchi
- Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C Dr. Aiguader 88, 08003 Barcelona, Spain
- Peter Eastman
- Department of Chemistry, Stanford University, Stanford, California 94305, United States
- Stefan Doerr
- Acellera Labs, C Dr Trueta 183, 08005 Barcelona, Spain
- Thomas E Markland
- Department of Chemistry, Stanford University, Stanford, California 94305, United States
- Gianni De Fabritiis
- Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C Dr. Aiguader 88, 08003 Barcelona, Spain
- Acellera Labs, C Dr Trueta 183, 08005 Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Passeig Lluis Companys 23, 08010 Barcelona, Spain
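TorchMD-Net 2.0 is described as providing optimized neighbor search with periodic boundary conditions. The geometric rule any such search must respect is the minimum-image convention. The brute-force O(N²) sketch below illustrates that rule for an orthorhombic box only; it is not TorchMD-Net's implementation or API, and the function name `neighbor_pairs` is hypothetical. Optimized codes typically replace the double loop with cell lists or similar spatial binning.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Sequence[float]


def neighbor_pairs(
    positions: Sequence[Vec3], box: Vec3, cutoff: float
) -> List[Tuple[int, int, float]]:
    """Brute-force neighbor list under the minimum-image convention.

    Assumes an orthorhombic box with edge lengths `box` and a cutoff no larger
    than half the shortest edge, so each pair has at most one image in range.
    """
    if cutoff > 0.5 * min(box):
        raise ValueError("cutoff must not exceed half the shortest box edge")
    pairs = []
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            dist_sq = 0.0
            for k in range(3):
                d = positions[j][k] - positions[i][k]
                # Shift by whole box lengths to reach the nearest periodic image.
                d -= box[k] * round(d / box[k])
                dist_sq += d * d
            if dist_sq < cutoff * cutoff:
                pairs.append((i, j, math.sqrt(dist_sq)))
    return pairs


# Two atoms near opposite faces of a 10-unit box are neighbors through the boundary.
print(neighbor_pairs([(0.5, 0.0, 0.0), (9.5, 0.0, 0.0), (5.0, 5.0, 5.0)], (10.0, 10.0, 10.0), 2.0))
```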
35
Pelaez RP, Simeon G, Galvelis R, Mirarchi A, Eastman P, Doerr S, Thölke P, Markland TE, De Fabritiis G. TorchMD-Net 2.0: Fast Neural Network Potentials for Molecular Simulations. ARXIV 2024:arXiv:2402.17660v3. [PMID: 38463504 PMCID: PMC10925388] [Indexed: 03/12/2024]
Abstract
Achieving a balance between computational speed, prediction accuracy, and universal applicability in molecular simulations has been a persistent challenge. This paper presents substantial advancements in the TorchMD-Net software, a pivotal step forward in the shift from conventional force fields to neural network-based potentials. The evolution of TorchMD-Net into a more comprehensive and versatile framework is highlighted, incorporating cutting-edge architectures such as TensorNet. This transformation is achieved through a modular design approach, encouraging customized applications within the scientific community. The most notable enhancement is a significant improvement in computational efficiency, achieving a very remarkable acceleration in the computation of energy and forces for TensorNet models, with performance gains ranging from 2x to 10x over previous, non-optimized, iterations. Other enhancements include highly optimized neighbor search algorithms that support periodic boundary conditions and smooth integration with existing molecular dynamics frameworks. Additionally, the updated version introduces the capability to integrate physical priors, further enriching its application spectrum and utility in research. The software is available at https://github.com/torchmd/torchmd-net.
Affiliation(s)
- Raul P Pelaez
- Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C Dr. Aiguader 88, 08003 Barcelona, Spain
- Guillem Simeon
- Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C Dr. Aiguader 88, 08003 Barcelona, Spain
- Raimondas Galvelis
- Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C Dr. Aiguader 88, 08003 Barcelona, Spain
- Acellera Labs, C Dr Trueta 183, 08005, Barcelona, Spain
- Antonio Mirarchi
- Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C Dr. Aiguader 88, 08003 Barcelona, Spain
- Peter Eastman
- Department of Chemistry, Stanford University, Stanford, CA 94305, USA
- Stefan Doerr
- Acellera Labs, C Dr Trueta 183, 08005, Barcelona, Spain
- Thomas E Markland
- Department of Chemistry, Stanford University, Stanford, CA 94305, USA
- Gianni De Fabritiis
- Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C Dr. Aiguader 88, 08003 Barcelona, Spain
- Acellera Labs, C Dr Trueta 183, 08005, Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Passeig Lluis Companys 23, 08010 Barcelona, Spain
36
Que ZX, Li SZ, Huang B, Yang ZX, Zhang WB. Ultra-flat bands at large twist angles in group-V twisted bilayer materials. J Chem Phys 2024; 160:194710. [PMID: 38767261 DOI: 10.1063/5.0197757] [Received: 01/14/2024] [Accepted: 05/01/2024] [Indexed: 05/22/2024]
Abstract
Flat bands in 2D twisted materials are key to realizing exotic correlation-related phenomena. However, flat bands have usually been achieved only in large systems with very small twist angles, which greatly increases both computational and experimental complexity. In this work, we propose group-V twisted bilayer materials, including P, As, and Sb in the β phase, with large twist angles. The band structures of twisted bilayer materials with up to 2524 atoms were investigated with the deep learning method DeepH, which significantly reduces the computational time. Our results show that the bandgap and the flat-band width of twisted bilayer β-P, β-As, and β-Sb decrease gradually as the twist angle decreases, and an ultra-flat band with a bandwidth approaching 0 eV is achieved. Interestingly, we find that a twist angle of 9.43° is sufficient for β-As to reach band flatness comparable to that of twisted bilayer graphene at the magic angle of 1.08°. Moreover, the bandgap decreases with decreasing interlayer distance while the flat band is preserved, which suggests the interlayer distance as an effective means of tuning the bandgap of flat-band systems. Our research provides a feasible platform for exploring physical phenomena related to flat bands in twisted layered 2D materials.
Affiliation(s)
- Zhi-Xiong Que
- Hunan Provincial Key Laboratory of Flexible Electronic Materials Genome Engineering, School of Physics and Electronic Sciences, Changsha University of Science and Technology, Changsha 410114, China
- Shu-Zong Li
- Hunan Provincial Key Laboratory of Flexible Electronic Materials Genome Engineering, School of Physics and Electronic Sciences, Changsha University of Science and Technology, Changsha 410114, China
- Bo Huang
- Hunan Provincial Key Laboratory of Flexible Electronic Materials Genome Engineering, School of Physics and Electronic Sciences, Changsha University of Science and Technology, Changsha 410114, China
- Zhi-Xiong Yang
- Hunan Provincial Key Laboratory of Flexible Electronic Materials Genome Engineering, School of Physics and Electronic Sciences, Changsha University of Science and Technology, Changsha 410114, China
- Wei-Bing Zhang
- Hunan Provincial Key Laboratory of Flexible Electronic Materials Genome Engineering, School of Physics and Electronic Sciences, Changsha University of Science and Technology, Changsha 410114, China
37
Wang G, Wang C, Zhang X, Li Z, Zhou J, Sun Z. Machine learning interatomic potential: Bridge the gap between small-scale models and realistic device-scale simulations. iScience 2024; 27:109673. [PMID: 38646181 PMCID: PMC11033164 DOI: 10.1016/j.isci.2024.109673] [Indexed: 04/23/2024]
Abstract
Machine learning interatomic potentials (MLIPs) overcome the high computational cost of density functional theory and the relatively low accuracy of classical large-scale molecular dynamics, enabling more efficient and precise simulations in materials research and design. In this review, the current state of the four essential stages of MLIP development is discussed: data generation methods, material structure descriptors, six machine learning algorithms, and available software. Furthermore, applications of MLIPs in various fields are surveyed, notably phase-change memory materials, structure searching, material property prediction, and pre-trained universal models. Finally, future perspectives are presented, including standard datasets, transferability, generalization, and the trade-off between accuracy and complexity in MLIPs.
Affiliation(s)
- Guanjie Wang
- School of Materials Science and Engineering, Beihang University, Beijing 100191, China
- School of Integrated Circuit Science and Engineering, Beihang University, Beijing 100191, China
- Changrui Wang
- School of Materials Science and Engineering, Beihang University, Beijing 100191, China
- Xuanguang Zhang
- School of Materials Science and Engineering, Beihang University, Beijing 100191, China
- Zefeng Li
- School of Materials Science and Engineering, Beihang University, Beijing 100191, China
- Jian Zhou
- School of Materials Science and Engineering, Beihang University, Beijing 100191, China
- Zhimei Sun
- School of Materials Science and Engineering, Beihang University, Beijing 100191, China
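Among the material structure descriptors covered by reviews of this kind, atom-centred symmetry functions are a widely used example. The sketch below computes a Behler-Parrinello-style radial function, G_i = sum_j exp(-eta (r_ij - r_s)^2) f_c(r_ij), with the cosine cutoff f_c(r) = 0.5 [cos(pi r / r_c) + 1] for r < r_c and 0 otherwise. The choice of this particular descriptor, the function names, and the parameter values are illustrative assumptions, not details taken from Wang et al.

```python
import math
from typing import Sequence

Vec3 = Sequence[float]


def cosine_cutoff(r: float, r_c: float) -> float:
    """Decays smoothly from 1 at r = 0 to 0 at r = r_c; zero beyond the cutoff."""
    return 0.5 * (math.cos(math.pi * r / r_c) + 1.0) if r < r_c else 0.0


def radial_g2(positions: Sequence[Vec3], i: int, eta: float, r_s: float, r_c: float) -> float:
    """Radial symmetry function for atom i: sum over neighbors of a Gaussian times the cutoff."""
    total = 0.0
    for j, pos_j in enumerate(positions):
        if j == i:
            continue
        r_ij = math.dist(positions[i], pos_j)
        total += math.exp(-eta * (r_ij - r_s) ** 2) * cosine_cutoff(r_ij, r_c)
    return total


# With eta = 0 each neighbor contributes only its cutoff weight (0.5 at r = r_c / 2);
# the atom at r = 10 lies outside the cutoff and contributes nothing.
print(radial_g2([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0)], i=0, eta=0.0, r_s=0.0, r_c=2.0))
```

In descriptor-based MLIPs, a vector of such functions with several eta and r_s values, usually combined with angular terms, typically forms the per-atom input to the regression model.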
38
Dombrowski AK, Gerken JE, Muller KR, Kessel P. Diffeomorphic Counterfactuals With Generative Models. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2024; 46:3257-3274. [PMID: 38055368 DOI: 10.1109/tpami.2023.3339980] [Indexed: 12/08/2023]
Abstract
Counterfactuals can explain classification decisions of neural networks in a human interpretable way. We propose a simple but effective method to generate such counterfactuals. More specifically, we perform a suitable diffeomorphic coordinate transformation and then perform gradient ascent in these coordinates to find counterfactuals which are classified with great confidence as a specified target class. We propose two methods to leverage generative models to construct such suitable coordinate systems that are either exactly or approximately diffeomorphic. We analyze the generation process theoretically using Riemannian differential geometry and validate the quality of the generated counterfactuals using various qualitative and quantitative measures.
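The core loop described above, gradient ascent on the target-class probability in the coordinates of an invertible generator, can be illustrated in a toy setting. Here an invertible linear map x = A z + b stands in for an exactly diffeomorphic generative model, and a logistic-regression model stands in for the neural network classifier. Both substitutions, along with all function names and parameter values, are assumptions for illustration; the authors' method relies on learned generative models.

```python
import math
from typing import List, Sequence, Tuple

Matrix2 = Sequence[Sequence[float]]
Vec2 = Sequence[float]


def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def generator(z: Vec2, A: Matrix2, b: Vec2) -> List[float]:
    """Linear map x = A z + b from latent to data space (invertible if det A != 0)."""
    return [A[0][0] * z[0] + A[0][1] * z[1] + b[0],
            A[1][0] * z[0] + A[1][1] * z[1] + b[1]]


def target_probability(x: Vec2, w: Vec2, c: float) -> float:
    """Logistic classifier: probability that x belongs to the target class."""
    return sigmoid(w[0] * x[0] + w[1] * x[1] + c)


def counterfactual(z0: Vec2, A: Matrix2, b: Vec2, w: Vec2, c: float,
                   lr: float = 0.5, target: float = 0.99,
                   max_steps: int = 1000) -> Tuple[List[float], float]:
    """Gradient ascent on log p(target | x = g(z)) in latent coordinates z."""
    # The logit is w . (A z + b) + c, so its gradient with respect to z is A^T w.
    grad_logit = [A[0][0] * w[0] + A[1][0] * w[1],
                  A[0][1] * w[0] + A[1][1] * w[1]]
    z = list(z0)
    for _ in range(max_steps):
        x = generator(z, A, b)
        p = target_probability(x, w, c)
        if p >= target:
            return x, p
        # d log(sigmoid(t)) / dt = 1 - sigmoid(t), so the step is (1 - p) * A^T w.
        z = [z[k] + lr * (1.0 - p) * grad_logit[k] for k in range(2)]
    x = generator(z, A, b)
    return x, target_probability(x, w, c)


x_cf, p_cf = counterfactual([-1.0, 0.0], A=[[2.0, 0.0], [0.0, 1.0]], b=[0.0, 0.0], w=[1.0, 0.0], c=0.0)
print(x_cf, p_cf)
```

Because the search runs in z rather than directly in x, every iterate stays on the generator's image, which is the intuition behind constraining counterfactuals to the data manifold.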
39
Zhang S, Makoś MZ, Jadrich RB, Kraka E, Barros K, Nebgen BT, Tretiak S, Isayev O, Lubbers N, Messerly RA, Smith JS. Exploring the frontiers of condensed-phase chemistry with a general reactive machine learning potential. Nat Chem 2024; 16:727-734. [PMID: 38454071 PMCID: PMC11087274 DOI: 10.1038/s41557-023-01427-3] [Received: 04/13/2023] [Accepted: 12/12/2023] [Indexed: 03/09/2024]
Abstract
Atomistic simulation has a broad range of applications from drug design to materials discovery. Machine learning interatomic potentials (MLIPs) have become an efficient alternative to computationally expensive ab initio simulations. For this reason, chemistry and materials science would greatly benefit from a general reactive MLIP, that is, an MLIP that is applicable to a broad range of reactive chemistry without the need for refitting. Here we develop a general reactive MLIP (ANI-1xnr) through automated sampling of condensed-phase reactions. ANI-1xnr is then applied to study five distinct systems: carbon solid-phase nucleation, graphene ring formation from acetylene, biofuel additives, combustion of methane and the spontaneous formation of glycine from early earth small molecules. In all studies, ANI-1xnr closely matches experiment (when available) and/or previous studies using traditional model chemistry methods. As such, ANI-1xnr proves to be a highly general reactive MLIP for C, H, N and O elements in the condensed phase, enabling high-throughput in silico reactive chemistry experimentation.
Affiliation(s)
- Shuhao Zhang
- Department of Chemistry, Mellon College of Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM, USA
- Małgorzata Z Makoś
- Computational and Theoretical Chemistry Group, Department of Chemistry, Southern Methodist University, Dallas, TX, USA
- Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, NM, USA
- Ryan B Jadrich
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM, USA
- Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, NM, USA
- Elfi Kraka
- Computational and Theoretical Chemistry Group, Department of Chemistry, Southern Methodist University, Dallas, TX, USA
- Kipton Barros
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM, USA
- Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, NM, USA
- Benjamin T Nebgen
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM, USA
- Sergei Tretiak
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM, USA
- Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, NM, USA
- Olexandr Isayev
- Department of Chemistry, Mellon College of Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Nicholas Lubbers
- Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, NM, USA
- Richard A Messerly
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM, USA
- Justin S Smith
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM, USA
- NVIDIA Corp., Santa Clara, CA, USA
40
Wan K, He J, Shi X. Construction of High Accuracy Machine Learning Interatomic Potential for Surface/Interface of Nanomaterials-A Review. ADVANCED MATERIALS (DEERFIELD BEACH, FLA.) 2024; 36:e2305758. [PMID: 37640376 DOI: 10.1002/adma.202305758] [Received: 06/15/2023] [Revised: 08/24/2023] [Indexed: 08/31/2023]
Abstract
The inherent discontinuity and unique dimensional attributes of nanomaterial surfaces and interfaces bestow them with various exceptional properties. These properties, however, also introduce difficulties for both experimental and computational studies. The advent of machine learning interatomic potential (MLIP) addresses some of the limitations associated with empirical force fields, presenting a valuable avenue for accurate simulations of these surfaces/interfaces of nanomaterials. Central to this approach is the idea of capturing the relationship between system configuration and potential energy, leveraging the proficiency of machine learning (ML) to precisely approximate high-dimensional functions. This review offers an in-depth examination of MLIP principles and their execution and elaborates on their applications in the realm of nanomaterial surface and interface systems. The prevailing challenges faced by this potent methodology are also discussed.
Affiliation(s)
- Kaiwei Wan
- Laboratory of Theoretical and Computational Nanoscience, National Center for Nanoscience and Technology, Chinese Academy of Sciences, Beijing, 100190, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Jianxin He
- Laboratory of Theoretical and Computational Nanoscience, National Center for Nanoscience and Technology, Chinese Academy of Sciences, Beijing, 100190, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Xinghua Shi
- Laboratory of Theoretical and Computational Nanoscience, National Center for Nanoscience and Technology, Chinese Academy of Sciences, Beijing, 100190, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
41
Chen M, Jiang X, Zhang L, Chen X, Wen Y, Gu Z, Li X, Zheng M. The emergence of machine learning force fields in drug design. Med Res Rev 2024; 44:1147-1182. [PMID: 38173298 DOI: 10.1002/med.22008] [Received: 08/19/2023] [Revised: 11/29/2023] [Accepted: 12/05/2023] [Indexed: 01/05/2024]
Abstract
In the field of molecular simulation for drug design, traditional molecular mechanics force fields and quantum chemical theories have been instrumental but limited in terms of scalability and computational efficiency. To overcome these limitations, machine learning force fields (MLFFs) have emerged as a powerful tool capable of balancing accuracy with efficiency. MLFFs learn the relationship between molecular structures and potential energy directly, bypassing the need for a preconceived functional form of the interactions. Their accuracy depends on the machine learning models used and on the quality and volume of the training data sets. With recent advances in equivariant neural networks and high-quality datasets, the performance of MLFFs has improved significantly. This review explores MLFFs, emphasizing their potential in drug design. It elucidates MLFF principles, provides development and validation guidelines, and highlights successful MLFF implementations. It also addresses potential challenges in developing and applying MLFFs. The review concludes by illuminating the path ahead for MLFFs, outlining the challenges to be overcome and the opportunities to be harnessed. It is intended to inspire researchers to embrace MLFFs as a new tool for performing molecular simulations in drug design.
Affiliation(s)
- Mingan Chen
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- School of Physical Science and Technology, ShanghaiTech University, Shanghai, China
- Lingang Laboratory, Shanghai, China
- Xinyu Jiang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- School of Pharmacy, University of Chinese Academy of Sciences, Beijing, China
- Lehan Zhang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- School of Pharmacy, University of Chinese Academy of Sciences, Beijing, China
- Xiaoxu Chen
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- School of Pharmacy, University of Chinese Academy of Sciences, Beijing, China
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, China
- Yiming Wen
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- School of Pharmacy, University of Chinese Academy of Sciences, Beijing, China
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, China
- Zhiyong Gu
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- School of Pharmacy, University of Chinese Academy of Sciences, Beijing, China
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, China
- Xutong Li
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- School of Pharmacy, University of Chinese Academy of Sciences, Beijing, China
- Mingyue Zheng
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- School of Pharmacy, University of Chinese Academy of Sciences, Beijing, China
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, China
42
Jin H, Merz KM. Modeling Zinc Complexes Using Neural Networks. J Chem Inf Model 2024; 64:3140-3148. [PMID: 38587510 PMCID: PMC11040731 DOI: 10.1021/acs.jcim.4c00095] [Received: 01/17/2024] [Revised: 03/04/2024] [Accepted: 03/28/2024] [Indexed: 04/09/2024]
Abstract
Understanding the energetic landscapes of large molecules is necessary for the study of chemical and biological systems. Recently, deep learning has greatly accelerated the development of models based on quantum chemistry, making it possible to build potential energy surfaces and explore chemical space. However, most of this work has focused on organic molecules due to the simplicity of their electronic structures as well as the availability of data sets. In this work, we build a deep learning architecture to model the energetics of zinc organometallic complexes. To achieve this, we have compiled a configurationally and conformationally diverse data set of zinc complexes using metadynamics to overcome the limitations of traditional sampling methods. In terms of the neural network potentials, our results indicate that for zinc complexes, partial charges play an important role in modeling the long-range interactions with a neural network. Our developed model outperforms semiempirical methods in predicting the relative energy of zinc conformers, yielding a mean absolute error (MAE) of 1.32 kcal/mol with reference to the double-hybrid PWPB95 method.
Affiliation(s)
- Hongni Jin
- Department of Chemistry, Michigan State University, East Lansing, Michigan 48824, United States
- Kenneth M. Merz
- Department of Chemistry, Michigan State University, East Lansing, Michigan 48824, United States
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
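Jin and Merz report that partial charges play an important role in modeling long-range interactions in zinc complexes. One common way such charges enter an energy expression is a pairwise Coulomb term. The sketch below evaluates that term for given charges in vacuum, with no cutoff, damping, or periodicity. How the published model predicts and uses its charges is not described in the abstract, so the function name, the hypothetical example charges, and the approximate Coulomb constant in kcal mol^-1 Å e^-2 are assumptions for illustration.

```python
import math
from itertools import combinations
from typing import Sequence

# Approximate Coulomb constant in kcal mol^-1 Å e^-2 (force fields use slightly different values).
COULOMB_KCAL_ANGSTROM = 332.0637


def coulomb_energy(charges: Sequence[float], positions: Sequence[Sequence[float]]) -> float:
    """Pairwise Coulomb energy (kcal/mol) of point charges (units of e) at positions in Å."""
    if len(charges) != len(positions):
        raise ValueError("need one charge per position")
    energy = 0.0
    for i, j in combinations(range(len(charges)), 2):
        r_ij = math.dist(positions[i], positions[j])
        energy += COULOMB_KCAL_ANGSTROM * charges[i] * charges[j] / r_ij
    return energy


# Hypothetical charges: a +1 / -1 pair 2 Å apart.
print(coulomb_energy([1.0, -1.0], [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]))
```

When the charges are themselves predicted from each atom's environment, as in charge-aware NNPs, the energy depends on geometry both through r_ij and through the charges.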
43
Zills F, Schäfer MR, Segreto N, Kästner J, Holm C, Tovey S. Collaboration on Machine-Learned Potentials with IPSuite: A Modular Framework for Learning-on-the-Fly. J Phys Chem B 2024; 128:3662-3676. [PMID: 38568231 DOI: 10.1021/acs.jpcb.3c07187] [Indexed: 04/19/2024]
Abstract
The field of machine learning potentials has experienced a rapid surge in progress, thanks to advances in machine learning theory, algorithms, and hardware capabilities. While the underlying methods are continuously evolving, the infrastructure for their deployment has lagged. The community, due to these rapid developments, frequently finds itself split into groups built around different implementations of machine-learned potentials. In this work, we introduce IPSuite, a Python-driven software package designed to connect different methods and algorithms from the comprehensive field of machine-learned potentials into a single platform while also providing a collaborative infrastructure, helping ensure reproducibility. Furthermore, the data management infrastructure of the IPSuite code enables simple model sharing and deployment in simulations. Currently, IPSuite supports six state-of-the-art machine learning approaches for the fitting of interatomic potentials as well as a variety of methods for the selection of training data, running of ab initio calculations, learning-on-the-fly strategies, model evaluation, and simulation deployment.
Affiliation(s)
- Fabian Zills
- Institute for Computational Physics, University of Stuttgart, 70569 Stuttgart, Germany
- Moritz René Schäfer
- Institute for Theoretical Chemistry, University of Stuttgart, 70569 Stuttgart, Germany
- Nico Segreto
- Institute for Theoretical Chemistry, University of Stuttgart, 70569 Stuttgart, Germany
- Johannes Kästner
- Institute for Theoretical Chemistry, University of Stuttgart, 70569 Stuttgart, Germany
- Christian Holm
- Institute for Computational Physics, University of Stuttgart, 70569 Stuttgart, Germany
- Samuel Tovey
- Institute for Computational Physics, University of Stuttgart, 70569 Stuttgart, Germany
44
Unke OT, Stöhr M, Ganscha S, Unterthiner T, Maennel H, Kashubin S, Ahlin D, Gastegger M, Medrano Sandonas L, Berryman JT, Tkatchenko A, Müller KR. Biomolecular dynamics with machine-learned quantum-mechanical force fields trained on diverse chemical fragments. SCIENCE ADVANCES 2024; 10:eadn4397. [PMID: 38579003 PMCID: PMC11809612 DOI: 10.1126/sciadv.adn4397] [Received: 12/10/2023] [Accepted: 02/29/2024] [Indexed: 04/07/2024]
Abstract
The GEMS method enables molecular dynamics simulations of large heterogeneous systems at ab initio quality.
Affiliation(s)
- Oliver T. Unke
- Google DeepMind, Tucholskystraße 2, 10117 Berlin, Germany and Brandschenkestrasse 110, 8002 Zürich, Switzerland
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- DFG Cluster of Excellence “Unifying Systems in Catalysis” (UniSysCat), Technische Universität Berlin, 10623 Berlin, Germany
- Martin Stöhr
- Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Stefan Ganscha
- Google DeepMind, Tucholskystraße 2, 10117 Berlin, Germany and Brandschenkestrasse 110, 8002 Zürich, Switzerland
- Thomas Unterthiner
- Google DeepMind, Tucholskystraße 2, 10117 Berlin, Germany and Brandschenkestrasse 110, 8002 Zürich, Switzerland
- Hartmut Maennel
- Google DeepMind, Tucholskystraße 2, 10117 Berlin, Germany and Brandschenkestrasse 110, 8002 Zürich, Switzerland
- Sergii Kashubin
- Google DeepMind, Tucholskystraße 2, 10117 Berlin, Germany and Brandschenkestrasse 110, 8002 Zürich, Switzerland
- Daniel Ahlin
- Google DeepMind, Tucholskystraße 2, 10117 Berlin, Germany and Brandschenkestrasse 110, 8002 Zürich, Switzerland
- Michael Gastegger
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- DFG Cluster of Excellence “Unifying Systems in Catalysis” (UniSysCat), Technische Universität Berlin, 10623 Berlin, Germany
- BASLEARN — TU Berlin/BASF Joint Lab for Machine Learning, Technische Universität Berlin, 10587 Berlin, Germany
- Leonardo Medrano Sandonas
- Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Joshua T. Berryman
- Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Alexandre Tkatchenko
- Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Klaus-Robert Müller
- Google DeepMind, Tucholskystraße 2, 10117 Berlin, Germany and Brandschenkestrasse 110, 8002 Zürich, Switzerland
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- Department of Artificial Intelligence, Korea University, Anam-dong, Seongbuk-gu, Seoul 02841, Korea
- Max Planck Institute for Informatics, Stuhlsatzenhausweg, 66123 Saarbrücken, Germany
- BIFOLD — Berlin Institute for the Foundations of Learning and Data, Berlin, Germany

45
Jin H, Merz KM. Modeling Fe(II) Complexes Using Neural Networks. J Chem Theory Comput 2024; 20:2551-2558. [PMID: 38439716 PMCID: PMC10976644 DOI: 10.1021/acs.jctc.4c00063] [Received: 01/17/2024] [Revised: 02/18/2024] [Accepted: 02/22/2024] [Indexed: 03/06/2024]
Abstract
We report an Fe(II) data set of more than 23,000 conformers in both low-spin (LS) and high-spin (HS) states. This data set was generated to develop a neural network model capable of predicting the energy and the LS/HS energy splitting as a function of the conformation of an Fe(II) organometallic complex. To achieve this, we propose a type of scaled electronic embedding that implicitly covers long-range interactions in our neural network for Fe(II) organometallic complexes. For total energy prediction, the lowest MAE is 0.037 eV, while the lowest MAE for the splitting energy is 0.030 eV. Compared to baseline models that incorporate only short-range interactions, our scaled electronic embeddings improve the accuracy of both total-energy and splitting-energy predictions by over 70%. Relative to semiempirical methods, our proposed models reduce the MAE by 2 orders of magnitude.
Affiliation(s)
- Hongni Jin
- Department of Chemistry, Michigan State University, East Lansing, Michigan 48824, United States
- Kenneth M. Merz
- Department of Chemistry, Michigan State University, East Lansing, Michigan 48824, United States
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States

46
Käser S, Meuwly M. Numerical Accuracy Matters: Applications of Machine Learned Potential Energy Surfaces. J Phys Chem Lett 2024:3419-3424. [PMID: 38506827 DOI: 10.1021/acs.jpclett.3c03405] [Indexed: 03/21/2024]
Abstract
The role of numerical accuracy in training and evaluating neural network-based potential energy surfaces (PESs) is examined for different experimental observables. For observables that require third- and fourth-order derivatives of the potential energy with respect to Cartesian coordinates, single-precision arithmetic, as typically used in ML-based approaches, is insufficient and leads to roughness of the underlying PES, as is explicitly demonstrated. Increasing the numerical accuracy to double precision gives a smooth PES with numerically stable higher-order derivatives that yield meaningful anharmonic frequencies and tunneling splittings, as demonstrated for H2CO and malonaldehyde. For molecular dynamics simulations, which require only first-order derivatives, single-precision arithmetic appears to be sufficient.
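As a rough toy illustration of why precision matters for higher-order derivatives (this is not the authors' benchmark; the one-dimensional Morse potential, the step size `h`, and the helper names are assumptions chosen for this sketch), the snippet below estimates a third derivative by central finite differences, once from energies kept in double precision and once from the same energies rounded to single precision. Rounding errors of order 1e-8 in each energy are divided by 2h³ = 2e-9, so the single-precision estimate is off by order unity, while the double-precision estimate stays close to the analytic value.

```python
import math
import struct


def to_float32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def morse(r: float, d_e: float = 1.0, a: float = 1.0, r_e: float = 1.0) -> float:
    """Toy 1D Morse potential in arbitrary units, standing in for a PES."""
    return d_e * (1.0 - math.exp(-a * (r - r_e))) ** 2


def morse_third_derivative(r: float) -> float:
    """Analytic d^3V/dr^3 for the default parameters (d_e = a = r_e = 1)."""
    y = r - 1.0
    return 2.0 * math.exp(-y) - 8.0 * math.exp(-2.0 * y)


def third_derivative_fd(energy, r: float, h: float) -> float:
    """Central finite difference for the third derivative (error O(h^2))."""
    return (energy(r + 2 * h) - 2 * energy(r + h)
            + 2 * energy(r - h) - energy(r - 2 * h)) / (2 * h ** 3)


r0, h = 1.5, 1e-3
exact = morse_third_derivative(r0)
# Energies kept in double precision.
fd64 = third_derivative_fd(morse, r0, h)
# Same energies, but each one rounded to single precision before differencing.
fd32 = third_derivative_fd(lambda r: to_float32(morse(r)), r0, h)

print(f"exact           : {exact:.6f}")
print(f"double precision: {fd64:.6f}  (error {abs(fd64 - exact):.1e})")
print(f"single precision: {fd32:.6f}  (error {abs(fd32 - exact):.1e})")
```

Enlarging `h` trades this rounding amplification for truncation error, so the attainable accuracy of third- and fourth-order derivatives is bounded much more tightly in single precision than in double precision.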
Affiliation(s)
- Silvan Käser
- Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland
- Markus Meuwly
- Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland

47
Li H, Tang Z, Fu J, Dong WH, Zou N, Gong X, Duan W, Xu Y. Deep-Learning Density Functional Perturbation Theory. Phys Rev Lett 2024; 132:096401. [PMID: 38489617 DOI: 10.1103/physrevlett.132.096401] [Received: 09/06/2023] [Revised: 01/01/2024] [Accepted: 01/31/2024] [Indexed: 03/17/2024]
Abstract
Calculating perturbation response properties of materials from first principles provides a vital link between theory and experiment, but is bottlenecked by its high computational cost. Here, a general framework is proposed to perform density functional perturbation theory (DFPT) calculations with neural networks, greatly improving computational efficiency. Automatic differentiation is applied to the neural networks, enabling accurate computation of derivatives. The high efficiency and good accuracy of the approach are demonstrated by studying electron-phonon coupling and related physical quantities. This work brings deep-learning density functional theory and DFPT into a unified framework, creating opportunities for developing ab initio artificial intelligence.
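The abstract credits automatic differentiation with accurate derivatives. As a generic illustration only (not the authors' implementation; the `Dual` class, the weights, and the two-unit network are hypothetical), forward-mode automatic differentiation with dual numbers carries a derivative alongside every value and applies the product and chain rules at each operation, so the result matches the analytic derivative to floating-point rounding without any step-size choice.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """Forward-mode AD value: val + der * eps, with eps**2 == 0."""
    val: float
    der: float = 0.0

    @staticmethod
    def lift(x) -> "Dual":
        # Plain numbers are treated as constants (zero derivative).
        return x if isinstance(x, Dual) else Dual(float(x))

    def __add__(self, other):
        o = Dual.lift(other)
        return Dual(self.val + o.val, self.der + o.der)

    __radd__ = __add__

    def __mul__(self, other):
        o = Dual.lift(other)
        # Product rule: (u v)' = u' v + u v'
        return Dual(self.val * o.val, self.der * o.val + self.val * o.der)

    __rmul__ = __mul__


def tanh(x: Dual) -> Dual:
    # Chain rule: d tanh(u) = (1 - tanh(u)**2) du
    t = math.tanh(x.val)
    return Dual(t, (1.0 - t * t) * x.der)


# Hypothetical toy "network": one hidden layer with two tanh units.
W1 = (0.7, -1.3)
B1 = (0.1, 0.4)
W2 = (1.5, 0.8)


def tiny_net(x: Dual) -> Dual:
    return sum(w2 * tanh(w1 * x + b1) for w1, b1, w2 in zip(W1, B1, W2))


x0 = 0.3
out = tiny_net(Dual(x0, 1.0))  # seed dx/dx = 1 to get d(output)/dx
# Hand-derived derivative of the same network, for comparison.
analytic = sum(w2 * w1 * (1.0 - math.tanh(w1 * x0 + b1) ** 2)
               for w1, b1, w2 in zip(W1, B1, W2))
print(out.val, out.der, analytic)
```

Because derivatives are propagated together with values, no finite-difference step has to be tuned; production frameworks typically implement the same idea in reverse mode to obtain derivatives with respect to many inputs at once.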
Affiliation(s)
- He Li
- State Key Laboratory of Low Dimensional Quantum Physics and Department of Physics, Tsinghua University, Beijing 100084, China
- Institute for Advanced Study, Tsinghua University, Beijing 100084, China
- Zechen Tang
- State Key Laboratory of Low Dimensional Quantum Physics and Department of Physics, Tsinghua University, Beijing 100084, China
- Jingheng Fu
- State Key Laboratory of Low Dimensional Quantum Physics and Department of Physics, Tsinghua University, Beijing 100084, China
- Wen-Han Dong
- State Key Laboratory of Low Dimensional Quantum Physics and Department of Physics, Tsinghua University, Beijing 100084, China
- Nianlong Zou
- State Key Laboratory of Low Dimensional Quantum Physics and Department of Physics, Tsinghua University, Beijing 100084, China
- Xiaoxun Gong
- State Key Laboratory of Low Dimensional Quantum Physics and Department of Physics, Tsinghua University, Beijing 100084, China
- School of Physics, Peking University, Beijing 100871, China
- Wenhui Duan
- State Key Laboratory of Low Dimensional Quantum Physics and Department of Physics, Tsinghua University, Beijing 100084, China
- Institute for Advanced Study, Tsinghua University, Beijing 100084, China
- Frontier Science Center for Quantum Information, Beijing, China
- Yong Xu
- State Key Laboratory of Low Dimensional Quantum Physics and Department of Physics, Tsinghua University, Beijing 100084, China
- Frontier Science Center for Quantum Information, Beijing, China
- RIKEN Center for Emergent Matter Science (CEMS), Wako, Saitama 351-0198, Japan

48
Célerse F, Wodrich MD, Vela S, Gallarati S, Fabregat R, Juraskova V, Corminboeuf C. From Organic Fragments to Photoswitchable Catalysts: The OFF-ON Structural Repository for Transferable Kernel-Based Potentials. J Chem Inf Model 2024; 64:1201-1212. [PMID: 38319296 PMCID: PMC10900300 DOI: 10.1021/acs.jcim.3c01953] [Received: 12/07/2023] [Revised: 01/18/2024] [Accepted: 01/22/2024] [Indexed: 02/07/2024]
Abstract
Structurally and conformationally diverse databases are needed to train accurate neural networks or kernel-based potentials capable of exploring the complex free energy landscape of flexible functional organic molecules. Curating such databases for species beyond "simple" drug-like compounds or molecules composed of well-defined building blocks (e.g., peptides) is challenging, as it requires thorough chemical space mapping and evaluation of both chemical and conformational diversities. Here, we introduce the OFF-ON (organic fragments from organocatalysts that are non-modular) database, a repository of 7,869 equilibrium and 67,457 nonequilibrium geometries of organic compounds and dimers aimed at describing conformationally flexible functional organic molecules, with an emphasis on photoswitchable organocatalysts. The relevance of this database is then demonstrated by training a local kernel regression model on a low-cost semiempirical baseline and comparing it with a PBE0-D3 reference for several known catalysts, notably the free energy surfaces of exemplary photoswitchable organocatalysts. Our results demonstrate that the OFF-ON data set offers reliable predictions for simulating the conformational behavior of virtually any (photoswitchable) organocatalyst or organic compound composed of H, C, N, O, F, and S atoms, thereby opening a computationally feasible route to explore complex free energy surfaces in order to rationalize and predict catalytic behavior.
Affiliation(s)
- Frédéric Célerse
- Laboratory for Computational Molecular Design (LCMD), Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne 1015, Switzerland
- Matthew D. Wodrich
- Laboratory for Computational Molecular Design (LCMD), Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne 1015, Switzerland
- National Center for Competence in Research-Catalysis (NCCR-Catalysis), Ecole Polytechnique Fédérale de Lausanne, Lausanne 1015, Switzerland
- Sergi Vela
- Laboratory for Computational Molecular Design (LCMD), Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne 1015, Switzerland
- Simone Gallarati
- Laboratory for Computational Molecular Design (LCMD), Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne 1015, Switzerland
- Raimon Fabregat
- Laboratory for Computational Molecular Design (LCMD), Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne 1015, Switzerland
- Veronika Juraskova
- Laboratory for Computational Molecular Design (LCMD), Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne 1015, Switzerland
- Clémence Corminboeuf
- Laboratory for Computational Molecular Design (LCMD), Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne 1015, Switzerland
- National Center for Competence in Research-Catalysis (NCCR-Catalysis), Ecole Polytechnique Fédérale de Lausanne, Lausanne 1015, Switzerland
- National Centre for Computational Design and Discovery of Novel Materials (MARVEL), Ecole Polytechnique Fédérale de Lausanne, Lausanne 1015, Switzerland

49
Sidorov P, Tsuji N. A Primer on 2D Descriptors in Selectivity Modeling for Asymmetric Catalysis. Chemistry 2024; 30:e202302837. [PMID: 38010242 DOI: 10.1002/chem.202302837] [Received: 08/31/2023] [Revised: 11/21/2023] [Accepted: 11/23/2023] [Indexed: 11/29/2023]
Abstract
Machine learning has permeated all fields of research, including chemistry, and is now an integral part of the design of novel compounds with desired properties. In the field of asymmetric catalysis, the preference is still for models based on a physical understanding of the catalysis phenomenon and the electronic and steric properties of catalysts. However, such models require quantum chemical calculations and are thus limited by their computational cost. Here, we highlight recent advances in modeling catalyst selectivity using the 2D structures of catalysts and substrates. Although they have a less explicit mechanistic connection to the modeled property, 2D descriptors, such as topological indices, molecular fingerprints, and fragments, offer the tremendous advantages of low cost and high calculation speed. This makes them well suited to in silico screening of large amounts of data. We provide an overview of the common quantitative structure-property relationship (QSPR) workflow, model-building and validation techniques, applications of these methodologies in asymmetric catalysis design, and an outlook on improving the understanding of 2D-based models.
Affiliation(s)
- Pavel Sidorov
- Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Sapporo, 001-0021, Japan
- Nobuya Tsuji
- Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Sapporo, 001-0021, Japan

50
Witman MD, Bartelt NC, Ling S, Guan PW, Way L, Allendorf MD, Stavila V. Phase Diagrams of Alloys and Their Hydrides via On-Lattice Graph Neural Networks and Limited Training Data. J Phys Chem Lett 2024; 15:1500-1506. [PMID: 38299540 DOI: 10.1021/acs.jpclett.3c03369] [Indexed: 02/02/2024]
Abstract
Efficient prediction of sampling-intensive thermodynamic properties is needed to evaluate material performance and to permit high-throughput materials modeling for a diverse array of technology applications. Surrogate modeling strategies such as cluster expansion alleviate the prohibitive computational expense of high-throughput configurational sampling with density functional theory (DFT), being many orders of magnitude more efficient, but they can be difficult to construct for systems with high compositional complexity. We therefore employ minimal-complexity graph neural network models that accurately predict, and can even extrapolate to out-of-training-distribution, formation energies of DFT-relaxed structures from an ideal (unrelaxed) crystallographic representation. This enables the large-scale sampling necessary for various thermodynamic property predictions that might otherwise be intractable, and it can be achieved with small training data sets. Two exemplars, optimizing the thermodynamic stability of low-density high-entropy alloys and modulating the plateau pressure of hydrogen in metal alloys, demonstrate the power of this approach, which can be extended to a variety of materials discovery and modeling problems.
Affiliation(s)
- Matthew D Witman
- Sandia National Laboratories, Livermore, California 94551-0969, United States
- Norman C Bartelt
- Sandia National Laboratories, Livermore, California 94551-0969, United States
- Sanliang Ling
- Advanced Materials Research Group, Faculty of Engineering, University of Nottingham, University Park, Nottingham NG7 2RD, U.K.
- Pin-Wen Guan
- Sandia National Laboratories, Livermore, California 94551-0969, United States
- Lauren Way
- Sandia National Laboratories, Livermore, California 94551-0969, United States
- Mark D Allendorf
- Sandia National Laboratories, Livermore, California 94551-0969, United States
- Vitalie Stavila
- Sandia National Laboratories, Livermore, California 94551-0969, United States