1
Chen J, Gao Q, Huang M, Yu K. Application of modern artificial intelligence techniques in the development of organic molecular force fields. Phys Chem Chem Phys 2025; 27:2294-2319. [PMID: 39820957 DOI: 10.1039/d4cp02989e] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/19/2025]
Abstract
The molecular force field (FF) determines the accuracy of molecular dynamics (MD) and is one of the major bottlenecks that limits the application of MD in molecular design. Recently, artificial intelligence (AI) techniques, such as machine-learning potentials (MLPs), have been rapidly reshaping the landscape of MD. Meanwhile, organic molecular systems feature unique characteristics and require more careful treatment in model construction, optimization, and validation. While an accurate and generic organic molecular force field is still missing, significant progress has been made with the help of AI, which points to a promising future. In this review, we provide an overview of the various types of AI techniques used in molecular FF development and discuss both the advantages and weaknesses of these methodologies. We show how AI methods provide unprecedented capabilities in many tasks such as potential fitting, atom typification, and automatic optimization. Meanwhile, it is also worth noting that more effort is needed to improve the transferability of the model, develop a more comprehensive database, and establish more standardized validation procedures. With these discussions, we hope to inspire more efforts to solve the existing problems, eventually leading to the birth of next-generation generic organic FFs.
Affiliation(s)
- Junmin Chen
  - Institute of Materials Research (IMR), Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China.
  - Tsinghua-Berkeley Shenzhen Institute (TBSI), Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China
- Qian Gao
  - Institute of Materials Research (IMR), Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China.
- Miaofei Huang
  - Institute of Materials Research (IMR), Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China.
- Kuang Yu
  - Institute of Materials Research (IMR), Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China.
  - Tsinghua-Berkeley Shenzhen Institute (TBSI), Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China
2
Frank JT, Unke OT, Müller KR, Chmiela S. A Euclidean transformer for fast and stable machine learned force fields. Nat Commun 2024; 15:6539. [PMID: 39107296 PMCID: PMC11303804 DOI: 10.1038/s41467-024-50620-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/25/2023] [Accepted: 07/10/2024] [Indexed: 08/10/2024] Open
Abstract
Recent years have seen vast progress in the development of machine learned force fields (MLFFs) based on ab-initio reference calculations. Despite achieving low test errors, the reliability of MLFFs in molecular dynamics (MD) simulations is facing growing scrutiny due to concerns about instability over extended simulation timescales. Our findings suggest a potential connection between robustness to cumulative inaccuracies and the use of equivariant representations in MLFFs, but the computational cost associated with these representations can limit this advantage in practice. To address this, we propose a transformer architecture called SO3KRATES that combines sparse equivariant representations (Euclidean variables) with a self-attention mechanism that separates invariant and equivariant information, eliminating the need for expensive tensor products. SO3KRATES achieves a unique combination of accuracy, stability, and speed that enables insightful analysis of quantum properties of matter on extended time and system size scales. To showcase this capability, we generate stable MD trajectories for flexible peptides and supra-molecular structures with hundreds of atoms. Furthermore, we investigate the PES topology for medium-sized chainlike molecules (e.g., small peptides) by exploring thousands of minima. Remarkably, SO3KRATES demonstrates the ability to strike a balance between the conflicting demands of stability and the emergence of new minimum-energy conformations beyond the training data, which is crucial for realistic exploration tasks in the field of biochemistry.
Affiliation(s)
- J Thorben Frank
  - Machine Learning Group, TU Berlin, Berlin, Germany
  - BIFOLD, Berlin Institute for the Foundations of Learning and Data, Berlin, Germany
- Klaus-Robert Müller
  - Machine Learning Group, TU Berlin, Berlin, Germany.
  - BIFOLD, Berlin Institute for the Foundations of Learning and Data, Berlin, Germany.
  - Google DeepMind, Berlin, Germany.
  - Department of Artificial Intelligence, Korea University, Seoul, Korea.
  - Max Planck Institut für Informatik, Saarbrücken, Germany.
- Stefan Chmiela
  - Machine Learning Group, TU Berlin, Berlin, Germany.
  - BIFOLD, Berlin Institute for the Foundations of Learning and Data, Berlin, Germany.
3
Biriukov D, Vácha R. Pathways to a Shiny Future: Building the Foundation for Computational Physical Chemistry and Biophysics in 2050. ACS PHYSICAL CHEMISTRY AU 2024; 4:302-313. [PMID: 39069976 PMCID: PMC11274290 DOI: 10.1021/acsphyschemau.4c00003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2024] [Revised: 03/15/2024] [Accepted: 03/18/2024] [Indexed: 07/30/2024]
Abstract
In the last quarter-century, the field of molecular dynamics (MD) has undergone a remarkable transformation, propelled by substantial enhancements in software, hardware, and underlying methodologies. In this Perspective, we contemplate the future trajectory of MD simulations and how they might look in the year 2050. We spotlight the pivotal role of artificial intelligence (AI) in shaping the future of MD and the broader field of computational physical chemistry. We outline critical strategies and initiatives that are essential for the seamless integration of such technologies. Our discussion delves into topics like multiscale modeling, adept management of the ever-increasing data deluge, the establishment of centralized simulation databases, and the autonomous refinement, cross-validation, and self-expansion of these repositories. The successful implementation of these advancements requires scientific transparency, a cautiously optimistic approach to interpreting AI-driven simulations and their analysis, and a mindset that prioritizes knowledge-motivated research alongside AI-enhanced big data exploration. While history reminds us that the trajectory of technological progress can be unpredictable, this Perspective offers guidance on preparedness and proactive measures, aiming to steer future advancements in the most beneficial and successful direction.
Affiliation(s)
- Denys Biriukov
  - CEITEC - Central European Institute of Technology, Masaryk University, Kamenice 753/5, 625 00 Brno, Czech Republic
  - National Centre for Biomolecular Research, Faculty of Science, Masaryk University, Kamenice 753/5, 625 00 Brno, Czech Republic
- Robert Vácha
  - CEITEC - Central European Institute of Technology, Masaryk University, Kamenice 753/5, 625 00 Brno, Czech Republic
  - National Centre for Biomolecular Research, Faculty of Science, Masaryk University, Kamenice 753/5, 625 00 Brno, Czech Republic
  - Department of Condensed Matter Physics, Faculty of Science, Masaryk University, Kotlářská 267/2, 611 37 Brno, Czech Republic
4
Riemann A, Rankin L, Henry D. Atomic Charge Dependency of Spiropyran/Merocyanine Adsorption as a Precursor to Surface Isomerization Reactions. ACS OMEGA 2024; 9:798-810. [PMID: 38222550 PMCID: PMC10785610 DOI: 10.1021/acsomega.3c06712] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2023] [Revised: 11/29/2023] [Accepted: 12/11/2023] [Indexed: 01/16/2024]
Abstract
This computational study investigates the adsorption of various spiropyran and merocyanine isomers on a NaCl substrate using a combination of density functional theory (DFT) and molecular mechanics (MM) calculations. Four different charge methods were used to determine the partial atomic charges for the adsorbate molecules, including Mulliken population analysis and three electrostatic potential (ESP) methods (Merz-Kollman, ChelpG, and Hu-Lu-Yang), while three different force fields (AMBER 3, CHARMM 27, and MM+) were employed for the MM calculations. The results show that the various DFT charge methods produced similar outcomes for the molecules' partial atomic charges, with some exceptions for individual atoms and methods. Additionally, it was found that the ESP charge methods were more sensitive to the conformer orientation than the Mulliken approach. The adsorption behavior of merocyanine conformers with the central bond in trans orientation (T-conformers) was similar for various configurations, with the molecule adsorbing mostly flat with its aromatic rings almost parallel to the substrate. However, C-conformers (with their central bond in cis orientation) and spiropyran isomers exhibited inconsistent adsorption behavior, mostly because only some of the aromatic rings contributed to the adsorption behavior. Due to additional van der Waals interactions of more aromatic rings, the adsorption energies for T-conformers are consistently 0.2-0.3 eV higher than for C-conformers and for spiropyran. The study found that the adsorption geometries and energies of stable T-conformers were independent of the partial atomic charge scheme and force field used, and C-conformers show parameter-dependent behavior upon adsorption, leading to metastable configurations. These findings indicate viable pathways during the spiropyran-merocyanine isomerization reactions. 
Therefore, the results provide initial insights into the possibility of switching spiropyran isomers into merocyanine isomers and vice versa after adsorption onto substrates.
Affiliation(s)
- Andreas Riemann
  - Department of Physics & Astronomy, Western Washington University, 516 High Street, Bellingham, Washington 98225, United States
- Lauren Rankin
  - Department of Physics & Astronomy, Western Washington University, 516 High Street, Bellingham, Washington 98225, United States
- Dylan Henry
  - Department of Physics & Astronomy, Western Washington University, 516 High Street, Bellingham, Washington 98225, United States
5
Wu S, Yang X, Zhao X, Li Z, Lu M, Xie X, Yan J. Applications and Advances in Machine Learning Force Fields. J Chem Inf Model 2023; 63:6972-6985. [PMID: 37751546 DOI: 10.1021/acs.jcim.3c00889] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/28/2023]
Abstract
Force fields (FFs) form the basis of molecular simulations and have significant implications in diverse fields such as materials science, chemistry, physics, and biology. A suitable FF is required to accurately describe system properties. However, an off-the-shelf FF may not be suitable for certain specialized systems, and researchers often need to tailor an FF to fit specific requirements. Before machine learning (ML) techniques were applied to construct FFs, mainstream FFs were primarily first-principles force fields (FPFFs) or empirical FFs. However, FPFFs suffer from high cost and empirical FFs from low accuracy, so there is growing interest in using ML as an effective and precise tool for reconciling this trade-off in developing FFs. In this review, we introduce the fundamental principles of ML and FFs in the context of machine learning force fields (MLFF). We also discuss the advantages and applications of MLFF compared to traditional FFs, as well as the MLFF toolkits widely employed in numerous applications.
Affiliation(s)
- Shiru Wu
  - Key Laboratory of Flexible Electronics (KLOFE) & Institute of Advanced Materials (IAM), Nanjing Tech University (Nanjing Tech), Nanjing 211816, P. R. China
- Xiaowei Yang
  - Key Laboratory of Flexible Electronics (KLOFE) & Institute of Advanced Materials (IAM), Nanjing Tech University (Nanjing Tech), Nanjing 211816, P. R. China
- Xun Zhao
  - Key Laboratory of Flexible Electronics (KLOFE) & Institute of Advanced Materials (IAM), Nanjing Tech University (Nanjing Tech), Nanjing 211816, P. R. China
- Zhipu Li
  - Key Laboratory of Flexible Electronics (KLOFE) & Institute of Advanced Materials (IAM), Nanjing Tech University (Nanjing Tech), Nanjing 211816, P. R. China
- Min Lu
  - Key Laboratory of Flexible Electronics (KLOFE) & Institute of Advanced Materials (IAM), Nanjing Tech University (Nanjing Tech), Nanjing 211816, P. R. China
- Xiaoji Xie
  - Key Laboratory of Flexible Electronics (KLOFE) & Institute of Advanced Materials (IAM), Nanjing Tech University (Nanjing Tech), Nanjing 211816, P. R. China
- Jiaxu Yan
  - Key Laboratory of Flexible Electronics (KLOFE) & Institute of Advanced Materials (IAM), Nanjing Tech University (Nanjing Tech), Nanjing 211816, P. R. China
  - Changchun Institute of Optics, Fine Mechanics & Physics (CIOMP), Chinese Academy of Sciences, Changchun 130033, P. R. China
  - University of Chinese Academy of Sciences, Chinese Academy of Sciences, Beijing 100049, P. R. China
6
Thürlemann M, Riniker S. Hybrid classical/machine-learning force fields for the accurate description of molecular condensed-phase systems. Chem Sci 2023; 14:12661-12675. [PMID: 38020395 PMCID: PMC10646964 DOI: 10.1039/d3sc04317g] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/17/2023] [Accepted: 10/24/2023] [Indexed: 12/01/2023] Open
Abstract
Electronic structure methods offer, in principle, accurate predictions of molecular properties; however, their applicability is limited by computational costs. Empirical methods are cheaper, but come with inherent approximations and are dependent on the quality and quantity of training data. The rise of machine learning (ML) force fields (FFs) exacerbates limitations related to training data even further, especially for condensed-phase systems for which the generation of large and high-quality training datasets is difficult. Here, we propose a hybrid ML/classical FF model that is parametrized exclusively on high-quality ab initio data of dimers and monomers in vacuum but is transferable to condensed-phase systems. The proposed hybrid model combines our previous ML-parametrized classical model with ML corrections for situations where classical approximations break down, thus combining the robustness and efficiency of classical FFs with the flexibility of ML. Extensive validation on benchmarking datasets and experimental condensed-phase data, including organic liquids and small-molecule crystal structures, showcases how the proposed approach may promote FF development and unlock the full potential of classical FFs.
Affiliation(s)
- Moritz Thürlemann
  - Department of Chemistry and Applied Biosciences, ETH Zürich, Vladimir-Prelog-Weg 2, Zürich 8093, Switzerland
- Sereina Riniker
  - Department of Chemistry and Applied Biosciences, ETH Zürich, Vladimir-Prelog-Weg 2, Zürich 8093, Switzerland
7
Gandolfi M, Ceotto M. Molecular Dynamics of Artificially Pair-Decoupled Systems: An Accurate Tool for Investigating the Importance of Intramolecular Couplings. J Chem Theory Comput 2023; 19:6093-6108. [PMID: 37698951 PMCID: PMC10536992 DOI: 10.1021/acs.jctc.3c00553] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2023] [Indexed: 09/14/2023]
Abstract
We propose a numerical technique to accurately simulate the vibrations of organic molecules in the gas phase, when pairs of atoms (or, in general, groups of degrees of freedom) are artificially decoupled, so that their motion is instantaneously decorrelated. The numerical technique we have developed is a symplectic integration algorithm that never requires computation of the force but requires estimates of the Hessian matrix. The theory we present to support our technique postulates a pair-decoupling Hamiltonian function, which parametrically depends on a decoupling coefficient α ∈ [0, 1]. The closer α is to 0, the more decoupled the selected atoms. We test the correctness of our numerical method on small molecular systems, and we apply it to study the vibrational spectroscopic features of salicylic acid at the Density Functional Theory ab initio level on a fitted potential. Our pair-decoupled simulations of salicylic acid show that decoupling hydrogen-bonded atoms does not significantly influence the frequencies of stretching modes but enormously enhances the out-of-plane wagging and twisting motions of the hydroxyl and carboxyl groups, to the point that the carboxyl and hydroxyl groups may overcome high potential energy barriers and change the salicylic acid conformation after a short simulation time. In addition, we found that the acidity of salicylic acid is more influenced by the dynamical couplings of the proton of the carboxylic group with the carbon ring than with the hydroxyl group.
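As a rough illustration of what a decoupling coefficient α ∈ [0, 1] can do to vibrational frequencies, the Python sketch below treats two unit-mass harmonic oscillators and scales the off-diagonal Hessian element by α. This is a toy under stated assumptions, not the authors' pair-decoupling Hamiltonian (the abstract does not give its form); the function name `decoupled_frequencies` and its parameters are hypothetical.

```python
import math


def decoupled_frequencies(k1, k2, coupling, alpha):
    """Normal-mode frequencies of two unit-mass oscillators.

    Toy assumption: the Hessian is [[k1, alpha*coupling], [alpha*coupling, k2]],
    so alpha = 1 keeps the full coupling and alpha = 0 removes it entirely.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    off_diagonal = alpha * coupling
    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    mean = 0.5 * (k1 + k2)
    half_gap = math.hypot(0.5 * (k1 - k2), off_diagonal)
    low, high = mean - half_gap, mean + half_gap
    if low <= 0.0:
        raise ValueError("Hessian must be positive definite")
    # With unit masses, frequency = sqrt(eigenvalue).
    return math.sqrt(low), math.sqrt(high)
```

At α = 0 the two frequencies reduce to those of the isolated oscillators, which mirrors the statement that smaller α means more decoupled atoms.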
Affiliation(s)
- Michele Gandolfi
  - Dipartimento di Chimica, Università degli Studi di Milano, via Golgi 19, 20133 Milano, Italy
- Michele Ceotto
  - Dipartimento di Chimica, Università degli Studi di Milano, via Golgi 19, 20133 Milano, Italy
8
Lanjan A, Moradi Z, Srinivasan S. Computational Framework Combining Quantum Mechanics, Molecular Dynamics, and Deep Neural Networks to Evaluate the Intrinsic Properties of Materials. J Phys Chem A 2023; 127:6603-6613. [PMID: 37497552 DOI: 10.1021/acs.jpca.3c02887] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/28/2023]
Abstract
The design and evaluation of future nanomaterials with specific properties is a challenging task as the current traditional methods rely on trial-and-error approaches that are time-consuming and expensive. On the computational front, design tools such as molecular dynamics (MD) simulations help us reduce cost and time. However, nonbonded potential parameters, the key input parameters for an MD simulation, are usually not available for designing and studying new materials. To resolve this, quantum mechanics (QM) calculations can be used to evaluate the system's energy as a function of the nonbonded distances, and the resulting data set can be fit to a generic potential equation to obtain the fitting constants (potential parameters). However, fitting this massive data set containing thousands of unknown parameters using traditional mathematical formulations is not feasible. Hence, most computational frameworks in the literature utilize several simplifications, leading to a severe loss of accuracy. To address this deficiency, we propose a multi-scale framework that couples QM calculations and MD with advanced deep neural networks to determine the potential parameters. This advanced framework has been extensively validated by employing it to predict properties such as the density, boiling point, and melting point of five different types of molecules that are well-understood, namely, the polar molecule H₂O, ionic compound LiPF₆, ethanol (C₂H₅OH), long-chain molecule C₈H₁₈, and the complex molecular system ethylene carbonate (EC).
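The step of fitting energy-versus-distance data to a generic potential equation can be sketched in Python for the simplest case. The example assumes a single 12-6 Lennard-Jones pair term, which is linear in its two coefficients and so can be fit in closed form; the paper's actual functional form and its neural-network fitting are not reproduced here, and the names `lj_energy` and `fit_lj` are illustrative.

```python
def lj_energy(r, epsilon, sigma):
    """12-6 Lennard-Jones pair energy at separation r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def fit_lj(distances, energies):
    """Least-squares fit of E(r) = A/r^12 - B/r^6; returns (epsilon, sigma).

    The model is linear in A and B, so the 2x2 normal equations are
    solved directly with Cramer's rule (no external libraries needed).
    """
    s_xx = s_xy = s_yy = s_xe = s_ye = 0.0
    for r, e in zip(distances, energies):
        x = r ** -12       # basis function multiplying A
        y = -(r ** -6)     # basis function multiplying B
        s_xx += x * x
        s_xy += x * y
        s_yy += y * y
        s_xe += x * e
        s_ye += y * e
    det = s_xx * s_yy - s_xy * s_xy
    a = (s_xe * s_yy - s_ye * s_xy) / det
    b = (s_ye * s_xx - s_xe * s_xy) / det
    # A = 4*eps*sigma^12 and B = 4*eps*sigma^6, hence:
    sigma = (a / b) ** (1.0 / 6.0)
    epsilon = b * b / (4.0 * a)
    return epsilon, sigma
```

In practice the energies would come from QM scans rather than from `lj_energy` itself, and a system with thousands of coupled parameters no longer admits this kind of closed-form solution, which is the difficulty the abstract points to.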
Affiliation(s)
- Amirmasoud Lanjan
  - Department of Mechanical Engineering, McMaster University, Hamilton, Ontario L8S 4K1, Canada
- Zahra Moradi
  - W Booth School of Engineering Practice and Technology, McMaster University, Hamilton, Ontario L8S 4K1, Canada
- Seshasai Srinivasan
  - Department of Mechanical Engineering, McMaster University, Hamilton, Ontario L8S 4K1, Canada
  - W Booth School of Engineering Practice and Technology, McMaster University, Hamilton, Ontario L8S 4K1, Canada
9
Zhang P, Yang W. Toward a general neural network force field for protein simulations: Refining the intramolecular interaction in protein. J Chem Phys 2023; 159:024118. [PMID: 37431910 PMCID: PMC10481389 DOI: 10.1063/5.0142280] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 01/12/2023] [Accepted: 06/22/2023] [Indexed: 07/12/2023] Open
Abstract
Molecular dynamics (MD) is an extremely powerful, highly effective, and widely used approach to understanding the nature of chemical processes in atomic details for proteins. The accuracy of results from MD simulations is highly dependent on force fields. Currently, molecular mechanical (MM) force fields are mainly utilized in MD simulations because of their low computational cost. Quantum mechanical (QM) calculation has high accuracy, but it is exceedingly time consuming for protein simulations. Machine learning (ML) provides the capability for generating accurate potential at the QM level without increasing much computational effort for specific systems that can be studied at the QM level. However, the construction of general machine learned force fields, needed for broad applications and large and complex systems, is still challenging. Here, general and transferable neural network (NN) force fields based on CHARMM force fields, named CHARMM-NN, are constructed for proteins by training NN models on 27 fragments partitioned from the residue-based systematic molecular fragmentation (rSMF) method. The NN for each fragment is based on atom types and uses new input features that are similar to MM inputs, including bonds, angles, dihedrals, and non-bonded terms, which enhance the compatibility of CHARMM-NN to MM MD and enable the implementation of CHARMM-NN force fields in different MD programs. While the main part of the energy of the protein is based on rSMF and NN, the nonbonded interactions between the fragments and with water are taken from the CHARMM force field through mechanical embedding. The validations of the method for dipeptides on geometric data, relative potential energies, and structural reorganization energies demonstrate that the CHARMM-NN local minima on the potential energy surface are very accurate approximations to QM, showing the success of CHARMM-NN for bonded interactions. 
However, the MD simulations on peptides and proteins indicate that more accurate methods to represent protein-water interactions in fragments and non-bonded interactions between fragments should be considered in the future improvement of CHARMM-NN, which can increase the accuracy of approximation beyond the current mechanical embedding QM/MM level.
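The MM-like input features named in this abstract (bonds, angles, dihedrals) are internal coordinates that can be computed from Cartesian positions. The Python sketch below shows one common way to do so; it is not the CHARMM-NN featurization itself, and the function names and the sign convention of the dihedral are assumptions.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def bond_length(p0, p1):
    """Distance between two atoms."""
    d = _sub(p1, p0)
    return math.sqrt(_dot(d, d))


def bond_angle(p0, p1, p2):
    """Angle at p1 formed by p0-p1-p2, in degrees."""
    u, v = _sub(p0, p1), _sub(p2, p1)
    cos_theta = _dot(u, v) / math.sqrt(_dot(u, u) * _dot(v, v))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle p0-p1-p2-p3 in degrees (0 = cis, 180 = trans)."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))
```

Features of this kind are invariant to rotation and translation of the molecule, which is one reason they are convenient inputs for both MM force fields and neural networks.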
Affiliation(s)
- Pan Zhang
  - Department of Chemistry, Duke University, Durham, North Carolina 27708, USA
- Weitao Yang
  - Department of Chemistry, Duke University, Durham, North Carolina 27708, USA
10
Goldman N, Fried LE, Lindsey RK, Pham CH, Dettori R. Enhancing the accuracy of density functional tight binding models through ChIMES many-body interaction potentials. J Chem Phys 2023; 158:144112. [PMID: 37061479 DOI: 10.1063/5.0141616] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 04/17/2023] Open
Abstract
Semi-empirical quantum models such as Density Functional Tight Binding (DFTB) are attractive methods for obtaining quantum simulation data at longer time and length scales than possible with standard approaches. However, application of these models can require lengthy effort due to the lack of a systematic approach for their development. In this work, we discuss the use of the Chebyshev Interaction Model for Efficient Simulation (ChIMES) to create rapidly parameterized DFTB models, which exhibit strong transferability due to the inclusion of many-body interactions that might otherwise be inaccurate. We apply our modeling approach to silicon polymorphs and review previous work on titanium hydride. We also review the creation of a general purpose DFTB/ChIMES model for organic molecules and compounds that approaches hybrid functional and coupled cluster accuracy with two orders of magnitude fewer parameters than similar neural network approaches. In all cases, DFTB/ChIMES yields similar accuracy to the underlying quantum method with orders of magnitude improvement in computational cost. Our developments provide a way to create computationally efficient and highly accurate simulations over varying extreme thermodynamic conditions, where physical and chemical properties can be difficult to interrogate directly, and there is historically a significant reliance on theoretical approaches for interpretation and validation of experimental results.
Affiliation(s)
- Nir Goldman
  - Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory, Livermore, California 94550, USA
- Laurence E Fried
  - Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory, Livermore, California 94550, USA
- Rebecca K Lindsey
  - Department of Chemical Engineering, University of Michigan, Ann Arbor, Michigan 48109, USA
- C Huy Pham
  - Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory, Livermore, California 94550, USA
- R Dettori
  - Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory, Livermore, California 94550, USA
11
Chmiela S, Vassilev-Galindo V, Unke OT, Kabylda A, Sauceda HE, Tkatchenko A, Müller KR. Accurate global machine learning force fields for molecules with hundreds of atoms. SCIENCE ADVANCES 2023; 9:eadf0873. [PMID: 36630510 PMCID: PMC9833674 DOI: 10.1126/sciadv.adf0873] [Citation(s) in RCA: 44] [Impact Index Per Article: 22.0] [Received: 09/29/2022] [Accepted: 11/28/2022] [Indexed: 05/25/2023]
Abstract
Global machine learning force fields, with the capacity to capture collective interactions in molecular systems, now scale up to a few dozen atoms due to considerable growth of model complexity with system size. For larger molecules, locality assumptions are introduced, with the consequence that nonlocal interactions are not described. Here, we develop an exact iterative approach to train global symmetric gradient domain machine learning (sGDML) force fields (FFs) for several hundred atoms, without resorting to any potentially uncontrolled approximations. All atomic degrees of freedom remain correlated in the global sGDML FF, allowing the accurate description of complex molecules and materials that present phenomena with far-reaching characteristic correlation lengths. We assess the accuracy and efficiency of sGDML on a newly developed MD22 benchmark dataset containing molecules from 42 to 370 atoms. The robustness of our approach is demonstrated in nanosecond path-integral molecular dynamics simulations for supramolecular complexes in the MD22 dataset.
Affiliation(s)
- Stefan Chmiela
  - Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
  - Berlin Institute for the Foundations of Learning and Data – BIFOLD, Germany
- Valentin Vassilev-Galindo
  - Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Oliver T. Unke
  - Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
  - Google Research, Brain Team, Berlin, Germany
- Adil Kabylda
  - Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Huziel E. Sauceda
  - Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
  - Berlin Institute for the Foundations of Learning and Data – BIFOLD, Germany
  - Departamento de Materia Condensada, Instituto de Física, Universidad Nacional Autónoma de México, Cd. de México C.P. 04510, Mexico
  - BASLEARN - TU Berlin/BASF Joint Lab for Machine Learning, Technische Universität Berlin, 10587 Berlin, Germany
- Alexandre Tkatchenko
  - Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Klaus-Robert Müller
  - Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
  - Berlin Institute for the Foundations of Learning and Data – BIFOLD, Germany
  - Google Research, Brain Team, Berlin, Germany
  - Max Planck Institute for Informatics, Stuhlsatzenhausweg, 66123 Saarbrücken, Germany
  - Department of Artificial Intelligence, Korea University, Anam-dong, Seongbuk-gu, Seoul 02841, Korea
12
Antila HS, Kav B, Miettinen MS, Martinez-Seara H, Jungwirth P, Ollila OHS. Emerging Era of Biomolecular Membrane Simulations: Automated Physically-Justified Force Field Development and Quality-Evaluated Databanks. J Phys Chem B 2022. [DOI: 10.1021/acs.jpcb.2c01954] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/12/2022]
Affiliation(s)
- Hanne S. Antila
  - Department of Biomaterials, Max Planck Institute of Colloids and Interfaces, 14424 Potsdam, Germany
- Batuhan Kav
  - Institute of Biological Information Processing, Structural Biochemistry (IBI-7), Forschungszentrum Jülich, Wilhelm-Johnen-Str., 52425 Jülich, Germany
- Markus S. Miettinen
  - Computational Biology Unit, Department of Informatics, University of Bergen, 5008 Bergen, Norway
  - Department of Chemistry, University of Bergen, 5020 Bergen, Norway
- Hector Martinez-Seara
  - Institute of Organic Chemistry and Biochemistry, Czech Academy of Sciences, Flemingovo nam. 2, 16000 Prague 6, Czech Republic
- Pavel Jungwirth
  - Institute of Organic Chemistry and Biochemistry, Czech Academy of Sciences, Flemingovo nam. 2, 16000 Prague 6, Czech Republic
- O. H. Samuli Ollila
  - Institute of Biotechnology, University of Helsinki, Helsinki 00014, Finland
13
Winkler L, Müller KR, Sauceda HE. High-fidelity molecular dynamics trajectory reconstruction with bi-directional neural networks. MACHINE LEARNING: SCIENCE AND TECHNOLOGY 2022. [DOI: 10.1088/2632-2153/ac6ec6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/12/2022] Open
Abstract
Molecular dynamics (MD) simulations are a cornerstone in science, enabling the investigation of a system’s thermodynamics all the way to analyzing intricate molecular interactions. In general, creating extended molecular trajectories can be a computationally expensive process, for example, when running ab-initio simulations. Hence, repeating such calculations to either obtain more accurate thermodynamics or to get a higher resolution in the dynamics generated by a fine-grained quantum interaction can be time- and computational resource-consuming. In this work, we explore different machine learning methodologies to increase the resolution of MD trajectories on-demand within a post-processing step. As a proof of concept, we analyse the performance of bi-directional neural networks (NNs) such as neural ODEs, Hamiltonian networks, recurrent NNs and long short-term memories, as well as the uni-directional variants as a reference, for MD simulations (here: the MD17 dataset). We have found that Bi-LSTMs are the best performing models; by utilizing the local time-symmetry of thermostated trajectories they can even learn long-range correlations and display high robustness to noisy dynamics across molecular complexity. Our models can reach accuracies of up to 10⁻⁴ Å in trajectory interpolation, which leads to the faithful reconstruction of several unseen high-frequency molecular vibration cycles. This renders the learned and reference trajectories indistinguishable. The results reported in this work can serve (1) as a baseline for larger systems, as well as (2) for the construction of better MD integrators.
|
14
|
Pham CH, Lindsey RK, Fried LE, Goldman N. High-Accuracy Semiempirical Quantum Models Based on a Minimal Training Set. J Phys Chem Lett 2022; 13:2934-2942. [PMID: 35343698 DOI: 10.1021/acs.jpclett.2c00453] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 06/14/2023]
Abstract
A great need exists for computationally efficient quantum simulation approaches that can achieve an accuracy similar to high-level theories at a fraction of the computational cost. In this regard, we have leveraged a machine-learned interaction potential based on Chebyshev polynomials to improve density functional tight binding (DFTB) models for organic materials. The benefit of our approach is two-fold: (1) many-body interactions can be corrected for in a systematic and rapidly tunable process, and (2) high-level quantum accuracy for a broad range of compounds can be achieved with ∼0.3% of data required for one advanced deep learning potential. Our model exhibits both transferability and extensibility through comparison to quantum chemical results for organic clusters, solid carbon phases, and molecular crystal phase stability rankings. Our efforts thus allow for high-throughput physical and chemical predictions with up to coupled-cluster accuracy for systems that are computationally intractable with standard approaches.
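As a rough illustration of the kind of Chebyshev-based correction this abstract describes, the sketch below fits a two-body energy term as a Chebyshev series by linear least squares. It is not the authors' model: the function names, the synthetic Morse-like reference curve, the distance window, and the polynomial degree are all assumptions chosen for illustration, and the many-body terms mentioned in the abstract are omitted.

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb


def to_unit_interval(r, r_min, r_cut):
    """Linearly map distances in [r_min, r_cut] onto [-1, 1]."""
    return 2.0 * (r - r_min) / (r_cut - r_min) - 1.0


def fit_pair_correction(r, e_ref, r_min, r_cut, degree):
    """Least-squares fit of Chebyshev coefficients to reference pair energies."""
    x = to_unit_interval(r, r_min, r_cut)
    design = cheb.chebvander(x, degree)  # shape (n_samples, degree + 1)
    coeffs, *_ = np.linalg.lstsq(design, e_ref, rcond=None)
    return coeffs


def eval_pair_correction(r, coeffs, r_min, r_cut):
    """Evaluate the fitted Chebyshev series at distances r."""
    return cheb.chebval(to_unit_interval(r, r_min, r_cut), coeffs)


# Synthetic reference data (assumption): a Morse-like pair energy curve.
R_MIN, R_CUT = 0.8, 4.0
r = np.linspace(R_MIN, R_CUT, 200)
e_ref = (1.0 - np.exp(-1.5 * (r - 1.2))) ** 2 - 1.0

coeffs = fit_pair_correction(r, e_ref, R_MIN, R_CUT, degree=12)
max_err = np.max(np.abs(eval_pair_correction(r, coeffs, R_MIN, R_CUT) - e_ref))
print(f"max fit error: {max_err:.2e}")
```

Because the model is linear in its coefficients, the whole fit reduces to a single least-squares solve; this may be part of why such expansions can be retuned quickly, although the abstract itself does not spell out the fitting procedure.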
Affiliation(s)
- Cong Huy Pham
  - Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory, Livermore, California 94550, United States
- Rebecca K Lindsey
  - Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory, Livermore, California 94550, United States
- Laurence E Fried
  - Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory, Livermore, California 94550, United States
- Nir Goldman
  - Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory, Livermore, California 94550, United States
  - Department of Chemical Engineering, University of California, Davis, California 95616, United States
|
15
|
Ketkaew R, Creazzo F, Luber S. Machine Learning-Assisted Discovery of Hidden States in Expanded Free Energy Space. J Phys Chem Lett 2022; 13:1797-1805. [PMID: 35171614 DOI: 10.1021/acs.jpclett.1c04004] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 06/14/2023]
Abstract
Collective variables (CVs) are crucial parameters in enhanced sampling calculations and strongly impact the quality of the obtained free energy surface. However, many existing CVs are unique to and dependent on the system they are constructed with, making the developed CV non-transferable to other systems. Herein, we develop a non-instructor-led deep autoencoder neural network (DAENN) for discovering general-purpose CVs. The DAENN is used to train a model by learning molecular representations upon unbiased trajectories that contain only the reactant conformers. The prior knowledge of nonconstraint reactants coupled with the here-introduced topology variable and loss-like penalty function are only required to make the biasing method able to expand its configurational (phase) space to unexplored energy basins. Our developed autoencoder is efficient and relatively inexpensive to use in terms of a priori knowledge, enabling one to automatically search for hidden CVs of the reaction of interest.
Affiliation(s)
- Rangsiman Ketkaew
  - Department of Chemistry, University of Zurich, Winterthurerstrasse 190, CH-8057 Zürich, Switzerland
- Fabrizio Creazzo
  - Department of Chemistry, University of Zurich, Winterthurerstrasse 190, CH-8057 Zürich, Switzerland
- Sandra Luber
  - Department of Chemistry, University of Zurich, Winterthurerstrasse 190, CH-8057 Zürich, Switzerland
|
16
|
Unke OT, Chmiela S, Gastegger M, Schütt KT, Sauceda HE, Müller KR. SpookyNet: Learning force fields with electronic degrees of freedom and nonlocal effects. Nat Commun 2021; 12:7273. [PMID: 34907176 PMCID: PMC8671403 DOI: 10.1038/s41467-021-27504-0] [Citation(s) in RCA: 116] [Impact Index Per Article: 29.0] [Received: 07/12/2021] [Accepted: 11/16/2021] [Indexed: 01/12/2023]
Abstract
Machine-learned force fields combine the accuracy of ab initio methods with the efficiency of conventional force fields. However, current machine-learned force fields typically ignore electronic degrees of freedom, such as the total charge or spin state, and assume chemical locality, which is problematic when molecules have inconsistent electronic states, or when nonlocal effects play a significant role. This work introduces SpookyNet, a deep neural network for constructing machine-learned force fields with explicit treatment of electronic degrees of freedom and nonlocality, modeled via self-attention in a transformer architecture. Chemically meaningful inductive biases and analytical corrections built into the network architecture allow it to properly model physical limits. SpookyNet improves upon the current state-of-the-art (or achieves similar performance) on popular quantum chemistry data sets. Notably, it is able to generalize across chemical and conformational space and can leverage the learned chemical insights, e.g. by predicting unknown spin states, thus helping to close a further important remaining gap for today's machine learning models in quantum chemistry.
Affiliation(s)
- Oliver T Unke
  - Machine Learning Group, Technische Universität Berlin, 10587, Berlin, Germany
  - DFG Cluster of Excellence "Unifying Systems in Catalysis" (UniSysCat), Technische Universität Berlin, 10623, Berlin, Germany
- Stefan Chmiela
  - Machine Learning Group, Technische Universität Berlin, 10587, Berlin, Germany
- Michael Gastegger
  - Machine Learning Group, Technische Universität Berlin, 10587, Berlin, Germany
  - DFG Cluster of Excellence "Unifying Systems in Catalysis" (UniSysCat), Technische Universität Berlin, 10623, Berlin, Germany
- Kristof T Schütt
  - Machine Learning Group, Technische Universität Berlin, 10587, Berlin, Germany
- Huziel E Sauceda
  - Machine Learning Group, Technische Universität Berlin, 10587, Berlin, Germany
  - BASLEARN, BASF-TU joint Lab, Technische Universität Berlin, 10587, Berlin, Germany
- Klaus-Robert Müller
  - Machine Learning Group, Technische Universität Berlin, 10587, Berlin, Germany
  - Department of Artificial Intelligence, Korea University, Anam-dong, Seongbuk-gu, Seoul, 02841, Korea
  - Max Planck Institute for Informatics, Stuhlsatzenhausweg, 66123, Saarbrücken, Germany
  - BIFOLD-Berlin Institute for the Foundations of Learning and Data, Berlin, Germany
  - Google Research, Brain team, Berlin, Germany
|
17
|
Pandey S, Qu J, Stevanović V, St. John P, Gorai P. Predicting energy and stability of known and hypothetical crystals using graph neural network. PATTERNS (NEW YORK, N.Y.) 2021; 2:100361. [PMID: 34820646 PMCID: PMC8600245 DOI: 10.1016/j.patter.2021.100361] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 04/15/2021] [Revised: 07/31/2021] [Accepted: 09/09/2021] [Indexed: 11/28/2022]
Abstract
The discovery of new inorganic materials in unexplored chemical spaces necessitates calculating total energy quickly and with sufficient accuracy. Machine learning models that provide such a capability for both ground-state (GS) and higher-energy structures would be instrumental in accelerated screening. Here, we demonstrate the importance of a balanced training dataset of GS and higher-energy structures to accurately predict total energies using a generic graph neural network architecture. Using ∼ 16,500 density functional theory calculations from the National Renewable Energy Laboratory (NREL) Materials Database and ∼ 11,000 calculations for hypothetical structures as our training database, we demonstrate that our model satisfactorily ranks the structures in the correct order of total energies for a given composition. Furthermore, we present a thorough error analysis to explain failure modes of the model, including both prediction outliers and occasional inconsistencies in the training data. By examining intermediate layers of the model, we analyze how the model represents learned structures and properties.
Affiliation(s)
- Shubham Pandey
  - Department of Metallurgical and Materials Engineering, Colorado School of Mines, Golden, CO 80401, USA
- Jiaxing Qu
  - Mechanical Science and Engineering, University of Illinois, Urbana, IL 61801, USA
- Vladan Stevanović
  - Department of Metallurgical and Materials Engineering, Colorado School of Mines, Golden, CO 80401, USA
- Peter St. John
  - National Renewable Energy Laboratory, Golden, CO 80401, USA
- Prashun Gorai
  - Department of Metallurgical and Materials Engineering, Colorado School of Mines, Golden, CO 80401, USA
|
18
|
Abstract
Chemical compound space (CCS), the set of all theoretically conceivable combinations of chemical elements and (meta-)stable geometries that make up matter, is colossal. The first-principles based virtual sampling of this space, for example, in search of novel molecules or materials which exhibit desirable properties, is therefore prohibitive for all but the smallest subsets and simplest properties. We review studies aimed at tackling this challenge using modern machine learning techniques based on (i) synthetic data, typically generated using quantum mechanics based methods, and (ii) model architectures inspired by quantum mechanics. Such Quantum mechanics based Machine Learning (QML) approaches combine the numerical efficiency of statistical surrogate models with an ab initio view on matter. They rigorously reflect the underlying physics in order to reach universality and transferability across CCS. While state-of-the-art approximations to quantum problems impose severe computational bottlenecks, recent QML based developments indicate the possibility of substantial acceleration without sacrificing the predictive power of quantum mechanics.
Affiliation(s)
- Bing Huang
  - Faculty of Physics, University of Vienna, 1090 Vienna, Austria
- O. Anatole von Lilienfeld
  - Faculty of Physics, University of Vienna, 1090 Vienna, Austria
  - Institute of Physical Chemistry and National Center for Computational Design and Discovery of Novel Materials (MARVEL), Department of Chemistry, University of Basel, 4056 Basel, Switzerland
|
19
|
Unke O, Chmiela S, Sauceda HE, Gastegger M, Poltavsky I, Schütt KT, Tkatchenko A, Müller KR. Machine Learning Force Fields. Chem Rev 2021; 121:10142-10186. [PMID: 33705118 PMCID: PMC8391964 DOI: 10.1021/acs.chemrev.0c01111] [Citation(s) in RCA: 463] [Impact Index Per Article: 115.8] [Received: 10/12/2020] [Indexed: 12/27/2022]
Abstract
In recent years, the use of machine learning (ML) in computational chemistry has enabled numerous advances previously out of reach due to the computational complexity of traditional electronic-structure methods. One of the most promising applications is the construction of ML-based force fields (FFs), with the aim to narrow the gap between the accuracy of ab initio methods and the efficiency of classical FFs. The key idea is to learn the statistical relation between chemical structure and potential energy without relying on a preconceived notion of fixed chemical bonds or knowledge about the relevant interactions. Such universal ML approximations are in principle only limited by the quality and quantity of the reference data used to train them. This review gives an overview of applications of ML-FFs and the chemical insights that can be obtained from them. The core concepts underlying ML-FFs are described in detail, and a step-by-step guide for constructing and testing them from scratch is given. The text concludes with a discussion of the challenges that remain to be overcome by the next generation of ML-FFs.
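The key idea stated in this abstract, learning a map from structure to potential energy and deriving forces from it, can be sketched on a toy diatomic. The example below uses Gaussian kernel ridge regression on the bond length and differentiates the kernel analytically to obtain the force. The Morse reference curve, the kernel width, the regularization strength, and all function names are assumptions for illustration and are not taken from the review.

```python
import numpy as np


def morse_energy(r, depth=1.0, width=1.5, r_eq=1.2):
    """Toy reference energy standing in for ab initio data (assumption)."""
    return depth * (1.0 - np.exp(-width * (r - r_eq))) ** 2


def morse_force(r, depth=1.0, width=1.5, r_eq=1.2):
    """Analytic force -dE/dr of the toy reference, used only for checking."""
    decay = np.exp(-width * (r - r_eq))
    return -2.0 * depth * width * (1.0 - decay) * decay


def gaussian_kernel(a, b, sigma):
    """Gaussian kernel matrix between two sets of bond lengths."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / sigma) ** 2)


def fit(r_train, e_train, sigma, lam):
    """Solve (K + lam * I) alpha = E for the regression weights."""
    kernel = gaussian_kernel(r_train, r_train, sigma)
    return np.linalg.solve(kernel + lam * np.eye(r_train.size), e_train)


def predict_energy(r, r_train, alpha, sigma):
    """Predicted energy: weighted sum of kernels centred on training points."""
    return gaussian_kernel(r, r_train, sigma) @ alpha


def predict_force(r, r_train, alpha, sigma):
    """Predicted force: negative analytic derivative of the energy model."""
    diff = r[:, None] - r_train[None, :]
    d_kernel = -(diff / sigma**2) * np.exp(-0.5 * (diff / sigma) ** 2)
    return -(d_kernel @ alpha)


SIGMA, LAM = 0.3, 1e-8  # hyperparameters picked by hand (assumption)
r_train = np.linspace(0.8, 3.0, 40)
alpha = fit(r_train, morse_energy(r_train), SIGMA, LAM)

# Evaluate away from the edges of the training range.
r_test = np.linspace(1.0, 2.8, 50)
energy_err = np.max(
    np.abs(predict_energy(r_test, r_train, alpha, SIGMA) - morse_energy(r_test))
)
force_err = np.max(
    np.abs(predict_force(r_test, r_train, alpha, SIGMA) - morse_force(r_test))
)
print(f"max energy error: {energy_err:.2e}, max force error: {force_err:.2e}")
```

Taking the force as the derivative of a single learned energy keeps the two quantities mutually consistent, which is the property an MD integrator relies on. Real ML-FFs replace the scalar bond length with descriptors that respect rotational, translational, and permutational symmetry.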
Affiliation(s)
- Oliver T. Unke
  - Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
  - DFG Cluster of Excellence “Unifying Systems in Catalysis” (UniSysCat), Technische Universität Berlin, 10623 Berlin, Germany
- Stefan Chmiela
  - Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- Huziel E. Sauceda
  - Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
  - BASLEARN, BASF-TU Joint Lab, Technische Universität Berlin, 10587 Berlin, Germany
- Michael Gastegger
  - Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
  - DFG Cluster of Excellence “Unifying Systems in Catalysis” (UniSysCat), Technische Universität Berlin, 10623 Berlin, Germany
  - BASLEARN, BASF-TU Joint Lab, Technische Universität Berlin, 10587 Berlin, Germany
- Igor Poltavsky
  - Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Kristof T. Schütt
  - Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- Alexandre Tkatchenko
  - Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Klaus-Robert Müller
  - Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
  - BIFOLD-Berlin Institute for the Foundations of Learning and Data, Berlin, Germany
  - Department of Artificial Intelligence, Korea University, Anam-dong, Seongbuk-gu, Seoul 02841, Korea
  - Max Planck Institute for Informatics, Stuhlsatzenhausweg, 66123 Saarbrücken, Germany
  - Google Research, Brain Team, Berlin, Germany
|
20
|
Ceriotti M, Clementi C, Anatole von Lilienfeld O. Machine learning meets chemical physics. J Chem Phys 2021; 154:160401. [PMID: 33940847 DOI: 10.1063/5.0051418] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Indexed: 12/13/2022]
Abstract
Over recent years, the use of statistical learning techniques applied to chemical problems has gained substantial momentum. This is particularly apparent in the realm of physical chemistry, where the balance between empiricism and physics-based theory has traditionally been rather in favor of the latter. In this guest Editorial for the special topic issue on "Machine Learning Meets Chemical Physics," a brief rationale is provided, followed by an overview of the topics covered. We conclude by making some general remarks.
Affiliation(s)
- Michele Ceriotti
  - Laboratory of Computational Science and Modeling, IMX, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Cecilia Clementi
  - Department of Physics, Freie Universität Berlin, Arnimallee 14, 14195 Berlin, Germany
|
21
|
Sauceda HE, Vassilev-Galindo V, Chmiela S, Müller KR, Tkatchenko A. Dynamical strengthening of covalent and non-covalent molecular interactions by nuclear quantum effects at finite temperature. Nat Commun 2021; 12:442. [PMID: 33469007 PMCID: PMC7815839 DOI: 10.1038/s41467-020-20212-1] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 06/12/2020] [Accepted: 11/12/2020] [Indexed: 11/08/2022]
Abstract
Nuclear quantum effects (NQE) tend to generate delocalized molecular dynamics due to the inclusion of the zero point energy and its coupling with the anharmonicities in interatomic interactions. Here, we present evidence that NQE often enhance electronic interactions and, in turn, can result in dynamical molecular stabilization at finite temperature. The underlying physical mechanism promoted by NQE depends on the particular interaction under consideration. First, the effective reduction of interatomic distances between functional groups within a molecule can enhance the n → π* interaction by increasing the overlap between molecular orbitals or by strengthening electrostatic interactions between neighboring charge densities. Second, NQE can localize methyl rotors by temporarily changing molecular bond orders and leading to the emergence of localized transient rotor states. Third, for noncovalent van der Waals interactions the strengthening comes from the increase of the polarizability given the expanded average interatomic distances induced by NQE. The implications of these boosted interactions include counterintuitive hydroxyl-hydroxyl bonding, hindered methyl rotor dynamics, and molecular stiffening which generates smoother free-energy surfaces. Our findings yield new insights into the versatile role of nuclear quantum fluctuations in molecules and materials.
Affiliation(s)
- Huziel E Sauceda
  - Department of Physics and Materials Science, University of Luxembourg, L-1511, Luxembourg City, Luxembourg
  - Machine Learning Group, Technische Universität Berlin, 10587, Berlin, Germany
  - BASLEARN, BASF-TU joint Lab, Technische Universität Berlin, 10587, Berlin, Germany
- Valentin Vassilev-Galindo
  - Department of Physics and Materials Science, University of Luxembourg, L-1511, Luxembourg City, Luxembourg
- Stefan Chmiela
  - Machine Learning Group, Technische Universität Berlin, 10587, Berlin, Germany
- Klaus-Robert Müller
  - Machine Learning Group, Technische Universität Berlin, 10587, Berlin, Germany
  - Department of Artificial Intelligence, Korea University, Anam-dong, Seongbuk-gu, Seoul, 02841, Korea
  - Max Planck Institute for Informatics, Stuhlsatzenhausweg, 66123, Saarbrücken, Germany
  - Google Research, Brain team, Berlin, Germany
- Alexandre Tkatchenko
  - Department of Physics and Materials Science, University of Luxembourg, L-1511, Luxembourg City, Luxembourg
|