1
Airas J, Zhang B. Scaling Graph Neural Networks to Large Proteins. J Chem Theory Comput 2025; 21:2055-2066. [PMID: 39913331 DOI: 10.1021/acs.jctc.4c01420] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/26/2025]
Abstract
Graph neural network (GNN) architectures have emerged as promising force field models, exhibiting high accuracy in predicting complex energies and forces based on atomic identities and Cartesian coordinates. To expand the applicability of GNNs, and machine learning force fields more broadly, optimizing their computational efficiency is critical, especially for large biomolecular systems in classical molecular dynamics simulations. In this study, we address key challenges in existing GNN benchmarks by introducing a dataset, DISPEF, which comprises large, biologically relevant proteins. DISPEF includes 207,454 proteins with sizes up to 12,499 atoms and features diverse chemical environments, spanning folded and disordered regions. The implicit solvation free energies, used as training targets, represent a particularly challenging case due to their many-body nature, providing a stringent test for evaluating the expressiveness of machine learning models. We benchmark the performance of seven GNNs on DISPEF, emphasizing the importance of directly accounting for long-range interactions to enhance model transferability. Additionally, we present a novel multiscale architecture, termed Schake, which delivers transferable and computationally efficient energy and force predictions for large proteins. Our findings offer valuable insights and tools for advancing GNNs in protein modeling applications.
Affiliation(s)
- Justin Airas
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139-4307, United States
- Bin Zhang
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139-4307, United States
2
Tao Y, Giese TJ, York DM. Electronic and Nuclear Quantum Effects on Proton Transfer Reactions of Guanine-Thymine (G-T) Mispairs Using Combined Quantum Mechanical/Molecular Mechanical and Machine Learning Potentials. Molecules 2024; 29:2703. [PMID: 38893576 PMCID: PMC11173453 DOI: 10.3390/molecules29112703] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/01/2024] [Revised: 05/30/2024] [Accepted: 06/04/2024] [Indexed: 06/21/2024] Open
Abstract
Rare tautomeric forms of nucleobases can lead to Watson-Crick-like (WC-like) mispairs in DNA, but the process of proton transfer is fast and difficult to detect experimentally. NMR studies show evidence for the existence of short-time WC-like guanine-thymine (G-T) mispairs; however, the mechanism of proton transfer and the degree to which nuclear quantum effects play a role are unclear. We use a B-DNA helix exhibiting a wGT mispair as a model system to study tautomerization reactions. We perform ab initio (PBE0/6-31G*) quantum mechanical/molecular mechanical (QM/MM) simulations to examine the free energy surface for tautomerization. We demonstrate that while the ab initio QM/MM simulations are accurate, considerable sampling is required to achieve high precision in the free energy barriers. To address this problem, we develop a QM/MM machine learning potential correction (QM/MM-ΔMLP) that is able to improve the computational efficiency, greatly extend the accessible time scales of the simulations, and enable practical application of path integral molecular dynamics to examine nuclear quantum effects. We find that the inclusion of nuclear quantum effects has only a modest effect on the mechanistic pathway but leads to a considerable lowering of the free energy barrier for the GT*⇌G*T equilibrium. Our results enable a rationalization of observed experimental data and the prediction of populations of rare tautomeric forms of nucleobases and rates of their interconversion in B-DNA.
3
Gandolfi M, Ceotto M. Molecular Dynamics of Artificially Pair-Decoupled Systems: An Accurate Tool for Investigating the Importance of Intramolecular Couplings. J Chem Theory Comput 2023; 19:6093-6108. [PMID: 37698951 PMCID: PMC10536992 DOI: 10.1021/acs.jctc.3c00553] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2023] [Indexed: 09/14/2023]
Abstract
We propose a numerical technique to accurately simulate the vibrations of organic molecules in the gas phase, when pairs of atoms (or, in general, groups of degrees of freedom) are artificially decoupled, so that their motion is instantaneously decorrelated. The numerical technique we have developed is a symplectic integration algorithm that never requires computation of the force but requires estimates of the Hessian matrix. The theory we present to support our technique postulates a pair-decoupling Hamiltonian function, which parametrically depends on a decoupling coefficient α ∈ [0, 1]. The closer α is to 0, the more decoupled the selected atoms. We test the correctness of our numerical method on small molecular systems, and we apply it to study the vibrational spectroscopic features of salicylic acid at the Density Functional Theory ab initio level on a fitted potential. Our pair-decoupled simulations of salicylic acid show that decoupling hydrogen-bonded atoms does not significantly influence the frequencies of stretching modes but enormously enhances the out-of-plane wagging and twisting motions of the hydroxyl and carboxyl groups, to the point that the carboxyl and hydroxyl groups may overcome high potential energy barriers and change the salicylic acid conformation after a short simulation time. In addition, we found that the acidity of salicylic acid is more influenced by the dynamical couplings of the proton of the carboxylic group with the carbon ring than with the hydroxyl group.
Affiliation(s)
- Michele Gandolfi
  - Dipartimento di Chimica, Università degli Studi di Milano, via Golgi 19, 20133 Milano, Italy
- Michele Ceotto
  - Dipartimento di Chimica, Università degli Studi di Milano, via Golgi 19, 20133 Milano, Italy
4
Kraka E, Antonio JJ, Freindorf M. Reaction mechanism - explored with the unified reaction valley approach. Chem Commun (Camb) 2023; 59:7151-7165. [PMID: 37233449 DOI: 10.1039/d3cc01576a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/27/2023]
Abstract
One of the ultimate goals of chemistry is to understand and manipulate chemical reactions, which implies the ability to monitor the reaction and its underlying mechanism at an atomic scale. In this article, we introduce the Unified Reaction Valley Approach (URVA) as a tool for elucidating reaction mechanisms, complementing existing computational procedures. URVA combines the concept of the potential energy surface with vibrational spectroscopy and describes a chemical reaction via the reaction path and the surrounding reaction valley traced out by the reacting species on the potential energy surface on their way from the entrance to the exit channel, where the products are located. The key feature of URVA is the focus on the curving of the reaction path. Moving along the reaction path, any electronic structure change of the reacting species is registered by a change in the normal vibrational modes spanning the reaction valley and their coupling with the path, which recovers the curvature of the reaction path. This leads to a unique curvature profile for each chemical reaction, with curvature minima reflecting minimal change and curvature maxima indicating the location of important chemical events such as bond breaking/formation, charge polarization and transfer, rehybridization, etc. A decomposition of the path curvature into internal coordinate components, or other coordinates of relevance for the reaction under consideration, provides comprehensive insight into the origin of the chemical changes taking place. After giving an overview of current experimental and computational efforts to gain insight into the mechanism of a chemical reaction and presenting the theoretical background of URVA, we illustrate how URVA works for three diverse processes: (i) [1,3] hydrogen transfer reactions; (ii) α-keto-amino inhibitor for SARS-CoV-2 Mpro; (iii) Rh-catalyzed cyanation. We hope that this article will inspire our computational colleagues to add URVA to their repertoire and will serve as an incubator for new reaction mechanisms to be studied in collaboration with our experimental experts in the field.
Affiliation(s)
- Elfi Kraka
  - Computational and Theoretical Chemistry Group (CATCO), Department of Chemistry, Southern Methodist University, 3215 Daniel Ave, Dallas, TX 75275-0314, USA.
- Juliana J Antonio
  - Computational and Theoretical Chemistry Group (CATCO), Department of Chemistry, Southern Methodist University, 3215 Daniel Ave, Dallas, TX 75275-0314, USA.
- Marek Freindorf
  - Computational and Theoretical Chemistry Group (CATCO), Department of Chemistry, Southern Methodist University, 3215 Daniel Ave, Dallas, TX 75275-0314, USA.
5
Chmiela S, Vassilev-Galindo V, Unke OT, Kabylda A, Sauceda HE, Tkatchenko A, Müller KR. Accurate global machine learning force fields for molecules with hundreds of atoms. Sci Adv 2023; 9:eadf0873. [PMID: 36630510 PMCID: PMC9833674 DOI: 10.1126/sciadv.adf0873] [Citation(s) in RCA: 44] [Impact Index Per Article: 22.0] [Received: 09/29/2022] [Accepted: 11/28/2022] [Indexed: 05/25/2023]
Abstract
Global machine learning force fields, with the capacity to capture collective interactions in molecular systems, now scale up to a few dozen atoms due to considerable growth of model complexity with system size. For larger molecules, locality assumptions are introduced, with the consequence that nonlocal interactions are not described. Here, we develop an exact iterative approach to train global symmetric gradient domain machine learning (sGDML) force fields (FFs) for several hundred atoms, without resorting to any potentially uncontrolled approximations. All atomic degrees of freedom remain correlated in the global sGDML FF, allowing the accurate description of complex molecules and materials that present phenomena with far-reaching characteristic correlation lengths. We assess the accuracy and efficiency of sGDML on a newly developed MD22 benchmark dataset containing molecules from 42 to 370 atoms. The robustness of our approach is demonstrated in nanosecond path-integral molecular dynamics simulations for supramolecular complexes in the MD22 dataset.
Affiliation(s)
- Stefan Chmiela
  - Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
  - Berlin Institute for the Foundations of Learning and Data – BIFOLD, Germany
- Valentin Vassilev-Galindo
  - Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Oliver T. Unke
  - Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
  - Google Research, Brain Team, Berlin, Germany
- Adil Kabylda
  - Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Huziel E. Sauceda
  - Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
  - Berlin Institute for the Foundations of Learning and Data – BIFOLD, Germany
  - Departamento de Materia Condensada, Instituto de Física, Universidad Nacional Autónoma de México, Cd. de México C.P. 04510, Mexico
  - BASLEARN - TU Berlin/BASF Joint Lab for Machine Learning, Technische Universität Berlin, 10587 Berlin, Germany
- Alexandre Tkatchenko
  - Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Klaus-Robert Müller
  - Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
  - Berlin Institute for the Foundations of Learning and Data – BIFOLD, Germany
  - Google Research, Brain Team, Berlin, Germany
  - Max Planck Institute for Informatics, Stuhlsatzenhausweg, 66123 Saarbrücken, Germany
  - Department of Artificial Intelligence, Korea University, Anam-dong, Seongbuk-gu, Seoul 02841, Korea
6
Bokhimi X. Learning the Use of Artificial Intelligence in Heterogeneous Catalysis. Front Chem Eng 2021. [DOI: 10.3389/fceng.2021.740270] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/17/2022] Open
Abstract
We describe the use of artificial intelligence techniques in heterogeneous catalysis. This description is intended to give readers some clues for the use of these techniques in their research or industrial processes related to hydrodesulfurization. Since the description corresponds to supervised learning, first of all, we give a brief introduction to this type of learning, emphasizing the variables X and Y that define it. For each description, there is a particular emphasis on highlighting these variables. This emphasis will help define them when one works on a new application. The descriptions that we present relate to the construction of learning machines that infer adsorption energies, surface areas, adsorption isotherms of nanoporous materials, novel catalysts, and the sulfur content after hydrodesulfurization. These learning machines can predict adsorption energies with mean absolute errors of 0.15 eV for a diverse chemical space. They predict more precise surface areas of porous materials than the BET technique and can calculate their isotherms much faster than the Monte Carlo method. These machines can also predict new catalysts by learning from the catalytic behavior of materials generated through atomic substitutions. When the machines learn from the variables associated with a hydrodesulfurization process, they can predict the sulfur content in the final product.
7
Sauceda HE, Vassilev-Galindo V, Chmiela S, Müller KR, Tkatchenko A. Dynamical strengthening of covalent and non-covalent molecular interactions by nuclear quantum effects at finite temperature. Nat Commun 2021; 12:442. [PMID: 33469007 PMCID: PMC7815839 DOI: 10.1038/s41467-020-20212-1] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 06/12/2020] [Accepted: 11/12/2020] [Indexed: 11/08/2022] Open
Abstract
Nuclear quantum effects (NQE) tend to generate delocalized molecular dynamics due to the inclusion of the zero point energy and its coupling with the anharmonicities in interatomic interactions. Here, we present evidence that NQE often enhance electronic interactions and, in turn, can result in dynamical molecular stabilization at finite temperature. The underlying physical mechanism promoted by NQE depends on the particular interaction under consideration. First, the effective reduction of interatomic distances between functional groups within a molecule can enhance the n → π* interaction by increasing the overlap between molecular orbitals or by strengthening electrostatic interactions between neighboring charge densities. Second, NQE can localize methyl rotors by temporarily changing molecular bond orders and leading to the emergence of localized transient rotor states. Third, for noncovalent van der Waals interactions the strengthening comes from the increase of the polarizability given the expanded average interatomic distances induced by NQE. The implications of these boosted interactions include counterintuitive hydroxyl-hydroxyl bonding, hindered methyl rotor dynamics, and molecular stiffening which generates smoother free-energy surfaces. Our findings yield new insights into the versatile role of nuclear quantum fluctuations in molecules and materials.
Affiliation(s)
- Huziel E Sauceda
  - Department of Physics and Materials Science, University of Luxembourg, L-1511, Luxembourg City, Luxembourg.
  - Machine Learning Group, Technische Universität Berlin, 10587, Berlin, Germany.
  - BASLEARN, BASF-TU joint Lab, Technische Universität Berlin, 10587, Berlin, Germany.
- Valentin Vassilev-Galindo
  - Department of Physics and Materials Science, University of Luxembourg, L-1511, Luxembourg City, Luxembourg
- Stefan Chmiela
  - Machine Learning Group, Technische Universität Berlin, 10587, Berlin, Germany
- Klaus-Robert Müller
  - Machine Learning Group, Technische Universität Berlin, 10587, Berlin, Germany.
  - Department of Artificial Intelligence, Korea University, Anam-dong, Seongbuk-gu, Seoul, 02841, Korea.
  - Max Planck Institute for Informatics, Stuhlsatzenhausweg, 66123, Saarbrücken, Germany.
  - Google Research, Brain team, Berlin, Germany.
- Alexandre Tkatchenko
  - Department of Physics and Materials Science, University of Luxembourg, L-1511, Luxembourg City, Luxembourg.
8
Sauceda HE, Gastegger M, Chmiela S, Müller KR, Tkatchenko A. Molecular force fields with gradient-domain machine learning (GDML): Comparison and synergies with classical force fields. J Chem Phys 2020; 153:124109. [DOI: 10.1063/5.0023005] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 12/12/2022] Open
Affiliation(s)
- Huziel E. Sauceda
  - Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg, Luxembourg
  - Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
  - BASLEARN, BASF-TU Joint Lab, Technische Universität Berlin, 10587 Berlin, Germany
- Michael Gastegger
  - Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
  - BASLEARN, BASF-TU Joint Lab, Technische Universität Berlin, 10587 Berlin, Germany
  - DFG Cluster of Excellence “Unifying Systems in Catalysis” (UniSysCat), Technische Universität Berlin, 10623 Berlin, Germany
- Stefan Chmiela
  - Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- Klaus-Robert Müller
  - Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
  - Department of Artificial Intelligence, Korea University, Anam-dong, Seongbuk-gu, Seoul 136-713, South Korea
  - Max Planck Institute for Informatics, Stuhlsatzenhausweg, 66123 Saarbrücken, Germany
  - Google Research, Brain Team, Berlin, Germany
- Alexandre Tkatchenko
  - Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg, Luxembourg