1
Sharma A, Sanvito S. Quantum-accurate machine learning potentials for metal-organic frameworks using temperature driven active learning. npj Computational Materials 2024; 10:237. [PMID: 39391672 PMCID: PMC11461275 DOI: 10.1038/s41524-024-01427-y] [Received: 05/14/2024] [Accepted: 09/26/2024] [Indexed: 10/12/2024]
Abstract
Understanding structural flexibility of metal-organic frameworks (MOFs) via molecular dynamics simulations is crucial to design better MOFs. Density functional theory (DFT) and quantum-chemistry methods provide highly accurate molecular dynamics, but the computational overheads limit their use in long time-dependent simulations. In contrast, classical force fields struggle with the description of coordination bonds. Here we develop DFT-accurate machine-learning spectral neighbor analysis potentials for two representative MOFs. Their structural and vibrational properties are then studied and closely compared with available experimental data. Most importantly, we demonstrate an active-learning algorithm, based on mapping the relevant internal coordinates, which drastically reduces the number of training data to be computed at the DFT level. Thus, the workflow presented here appears as an efficient strategy for the study of flexible MOFs with DFT accuracy, but at a fraction of the DFT computational cost.
Affiliation(s)
- Abhishek Sharma
- School of Physics, AMBER and CRANN Institute, Trinity College, Dublin 2, Ireland
- Stefano Sanvito
- School of Physics, AMBER and CRANN Institute, Trinity College, Dublin 2, Ireland
2
Hou YF, Zhang Q, Dral PO. Surprising Dynamics Phenomena in the Diels-Alder Reaction of C60 Uncovered with AI. J Org Chem 2024. [PMID: 39358911 DOI: 10.1021/acs.joc.4c01763] [Indexed: 10/04/2024]
Abstract
We performed an extensive artificial intelligence-accelerated quasi-classical molecular dynamics investigation of the time-resolved mechanism of the Diels-Alder reaction of fullerene C60 with 2,3-dimethyl-1,3-butadiene. In a substantial fraction (10%) of reactive trajectories, the larger C60 noncovalently attracts the 2,3-dimethyl-1,3-butadiene long before the barrier so that the diene undergoes a series of complex motions including roaming, somersaults, twisting, and twisting somersaults around the fullerene until it aligns itself to pass over the barrier. These complicated processes could be easily missed in typically performed quantum chemical simulations with shorter and fewer trajectories. After the barrier is passed, the bonds take longer to form compared to the simplest prototypical Diels-Alder reaction of ethene with 1,3-butadiene despite high similarities in transition states and barrier widths evaluated with intrinsic reaction coordinate (IRC) calculations. C60 is mainly responsible for these differences as its reaction with 1,3-butadiene is similar to the reaction with 2,3-dimethyl-1,3-butadiene, the only substantial difference being that the extra methyl groups double the probability of the prolonged alignment phase in dynamics. These additional calculations of C60 with 1,3-butadiene could be performed via active learning more easily by reusing the data generated for the other two reactions, showing the potential for larger-scale exploration of the effects of different substrates in the same types of reactions.
Affiliation(s)
- Yi-Fan Hou
- State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
- Quanhao Zhang
- State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
- Pavlo O Dral
- State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
- Institute of Physics, Faculty of Physics, Astronomy, and Informatics, Nicolaus Copernicus University in Toruń, ul. Grudziądzka 5, Toruń 87-100, Poland
3
Hou YF, Zhang L, Zhang Q, Ge F, Dral PO. Physics-Informed Active Learning for Accelerating Quantum Chemical Simulations. J Chem Theory Comput 2024. [PMID: 39264419 DOI: 10.1021/acs.jctc.4c00821] [Indexed: 09/13/2024]
Abstract
Quantum chemical simulations can be greatly accelerated by constructing machine learning potentials, which is often done using active learning (AL). The usefulness of the constructed potentials is often limited by the high effort required and their insufficient robustness in the simulations. Here, we introduce the end-to-end AL for constructing robust data-efficient potentials with affordable investment of time and resources and minimum human interference. Our AL protocol is based on the physics-informed sampling of training points, automatic selection of initial data, uncertainty quantification, and convergence monitoring. The versatility of this protocol is shown in our implementation of quasi-classical molecular dynamics for simulating vibrational spectra, conformer search of a key biochemical molecule, and time-resolved mechanism of the Diels-Alder reaction. These investigations took us days instead of weeks of pure quantum chemical calculations on a high-performance computing cluster.
Affiliation(s)
- Yi-Fan Hou
- State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
- Lina Zhang
- State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
- Quanhao Zhang
- State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
- Fuchun Ge
- State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
- Pavlo O Dral
- State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
- Institute of Physics, Faculty of Physics, Astronomy, and Informatics, Nicolaus Copernicus University in Toruń, ul. Grudziądzka 5, Toruń 87-100, Poland
4
Si Y, Ou H, Jin X, Gu M, Sheng S, Peng W, Yang D, Zhan X, Zhang L, Yu Q, Liu X, Liu Y. G protein pathway suppressor 2 suppresses aerobic glycolysis through RACK1-mediated HIF-1α degradation in breast cancer. Free Radic Biol Med 2024; 222:478-492. [PMID: 38942092 DOI: 10.1016/j.freeradbiomed.2024.06.021] [Received: 03/25/2024] [Revised: 06/16/2024] [Accepted: 06/25/2024] [Indexed: 06/30/2024]
Abstract
Aerobic glycolysis has been recognized as a hallmark of human cancer. G protein pathway suppressor 2 (GPS2) is a negative regulator of the G protein-MAPK pathway and a core subunit of the NCoR/SMRT transcriptional co-repressor complex. However, how its biological properties intersect with cellular metabolism in breast cancer (BC) development remains poorly elucidated. Here, we report that GPS2 expression is low in BC tissues and negatively correlated with poor prognosis. Both in vitro and in vivo studies demonstrate that GPS2 suppresses malignant progression of BC. Moreover, GPS2 suppresses aerobic glycolysis in BC cells. Mechanistically, GPS2 destabilizes HIF-1α to reduce the transcription of its downstream glycolytic regulators (PGK1, PGAM1, ENO1, PKM2, LDHA, PDK1, PDK2, and PDK4), and then suppresses cellular aerobic glycolysis. Notably, receptor for activated C kinase 1 (RACK1) is identified as a key ubiquitin ligase for GPS2 to promote HIF-1α degradation. GPS2 stabilizes the binding of HIF-1α to RACK1 by directly binding to RACK1, resulting in polyubiquitination and instability of HIF-1α. Amino acid residues 70-92 of the GPS2 N-terminus bind RACK1. A 23-amino-acid-long GPS2-derived peptide was developed based on this N-terminal region, which promotes the interaction of RACK1 with HIF-1α, downregulates HIF-1α expression and significantly suppresses BC tumorigenesis in vitro and in vivo. In conclusion, our findings indicate that GPS2 decreases the stability of HIF-1α, which in turn suppresses aerobic glycolysis and tumorigenesis in BC, suggesting that targeting HIF-1α degradation and treating with peptides may be a promising approach to treat BC.
Affiliation(s)
- Yuan Si
- Laboratory of Molecular Target Therapy of Cancer, Institute of Basic Medical Sciences, Hubei University of Medicine, Shiyan, Hubei, China; Hubei Key Laboratory of Wudang Local Chinese Medicine Research, Hubei University of Medicine, Shiyan, Hubei, China.
- Hongling Ou
- Laboratory of Molecular Target Therapy of Cancer, Institute of Basic Medical Sciences, Hubei University of Medicine, Shiyan, Hubei, China; Hubei Key Laboratory of Embryonic Stem Cell Research, Hubei University of Medicine, Shiyan, Hubei, China
- Xin Jin
- Laboratory of Molecular Target Therapy of Cancer, Institute of Basic Medical Sciences, Hubei University of Medicine, Shiyan, Hubei, China; Institute of Modern Biology, Nanjing University, Nanjing, Jiangsu, China
- Manxiang Gu
- Laboratory of Molecular Target Therapy of Cancer, Institute of Basic Medical Sciences, Hubei University of Medicine, Shiyan, Hubei, China; Hubei Key Laboratory of Wudang Local Chinese Medicine Research, Hubei University of Medicine, Shiyan, Hubei, China
- Songran Sheng
- Laboratory of Molecular Target Therapy of Cancer, Institute of Basic Medical Sciences, Hubei University of Medicine, Shiyan, Hubei, China; Hubei Key Laboratory of Wudang Local Chinese Medicine Research, Hubei University of Medicine, Shiyan, Hubei, China
- Wenkang Peng
- Laboratory of Molecular Target Therapy of Cancer, Institute of Basic Medical Sciences, Hubei University of Medicine, Shiyan, Hubei, China; Hubei Key Laboratory of Wudang Local Chinese Medicine Research, Hubei University of Medicine, Shiyan, Hubei, China
- Dan Yang
- Laboratory of Molecular Target Therapy of Cancer, Institute of Basic Medical Sciences, Hubei University of Medicine, Shiyan, Hubei, China; Hubei Key Laboratory of Embryonic Stem Cell Research, Hubei University of Medicine, Shiyan, Hubei, China
- Xiangrong Zhan
- Laboratory of Molecular Target Therapy of Cancer, Institute of Basic Medical Sciences, Hubei University of Medicine, Shiyan, Hubei, China; Hubei Key Laboratory of Embryonic Stem Cell Research, Hubei University of Medicine, Shiyan, Hubei, China
- Liang Zhang
- Laboratory of Molecular Target Therapy of Cancer, Institute of Basic Medical Sciences, Hubei University of Medicine, Shiyan, Hubei, China; Hubei Key Laboratory of Embryonic Stem Cell Research, Hubei University of Medicine, Shiyan, Hubei, China
- Qingqing Yu
- Laboratory of Molecular Target Therapy of Cancer, Institute of Basic Medical Sciences, Hubei University of Medicine, Shiyan, Hubei, China; Hubei Key Laboratory of Wudang Local Chinese Medicine Research, Hubei University of Medicine, Shiyan, Hubei, China
- Xuewen Liu
- Laboratory of Molecular Target Therapy of Cancer, Institute of Basic Medical Sciences, Hubei University of Medicine, Shiyan, Hubei, China; Hubei Key Laboratory of Wudang Local Chinese Medicine Research, Hubei University of Medicine, Shiyan, Hubei, China; Hubei Key Laboratory of Embryonic Stem Cell Research, Hubei University of Medicine, Shiyan, Hubei, China.
- Ying Liu
- Laboratory of Molecular Target Therapy of Cancer, Institute of Basic Medical Sciences, Hubei University of Medicine, Shiyan, Hubei, China; Hubei Key Laboratory of Wudang Local Chinese Medicine Research, Hubei University of Medicine, Shiyan, Hubei, China; Hubei Key Laboratory of Embryonic Stem Cell Research, Hubei University of Medicine, Shiyan, Hubei, China.
5
Xu H, Zhao Y, Zhang Y, Han J, Zan P, He S, Bo X. Deep active learning with high structural discriminability for molecular mutagenicity prediction. Commun Biol 2024; 7:1071. [PMID: 39217273 PMCID: PMC11366013 DOI: 10.1038/s42003-024-06758-6] [Received: 10/09/2023] [Accepted: 08/21/2024] [Indexed: 09/04/2024]
Abstract
Assessing mutagenicity is essential in drug discovery, as mutagenic compounds may cause cancer and germ cell damage. Although in silico methods have been proposed for mutagenicity prediction, their performance is hindered by the scarcity of labeled molecules, and experimental mutagenicity testing can be time-consuming and costly. One solution to reduce the annotation cost is active learning, where the algorithm actively selects the most valuable molecules from a vast chemical space and presents them to the oracle (e.g., a human expert) for annotation, thereby rapidly improving the model's predictive performance with a smaller annotation cost. In this paper, we propose muTOX-AL, a deep active learning framework, which can actively explore the chemical space and identify the most valuable molecules, resulting in competitive performance with a small number of labeled samples. The experimental results show that, compared to the random sampling strategy, muTOX-AL can reduce the number of training molecules by about 57%. Additionally, muTOX-AL exhibits outstanding molecular structural discriminability, allowing it to pick molecules with high structural similarity but opposite properties.
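The selection step that this abstract describes, in which the algorithm picks the most valuable unlabeled molecules to send to the oracle, can be illustrated with a generic uncertainty-sampling sketch. This is not the muTOX-AL selection strategy, which the abstract does not specify. The function names, the use of predicted probabilities as the uncertainty signal, and the example values are all assumptions made for illustration.

```python
import numpy as np


def select_most_uncertain(probabilities, k):
    """Return indices of the k items whose predicted positive-class
    probability is closest to 0.5 (hypothetical uncertainty criterion)."""
    probabilities = np.asarray(probabilities, dtype=float)
    distance_from_boundary = np.abs(probabilities - 0.5)
    # A stable sort keeps the original order among equally uncertain items.
    order = np.argsort(distance_from_boundary, kind="stable")
    return order[:k].tolist()


def active_learning_round(pool_probabilities, labeled_indices, k):
    """Pick k still-unlabeled pool items to send to the oracle for annotation."""
    unlabeled = [i for i in range(len(pool_probabilities))
                 if i not in labeled_indices]
    picks = select_most_uncertain(
        [pool_probabilities[i] for i in unlabeled], k)
    # Map positions within the unlabeled subset back to pool indices.
    return [unlabeled[j] for j in picks]


if __name__ == "__main__":
    # Illustrative model outputs for a pool of five molecules (made-up values).
    pool = [0.95, 0.52, 0.10, 0.45, 0.70]
    print(active_learning_round(pool, labeled_indices={1}, k=2))
```

In a full loop, the selected molecules would be labeled, added to the training set, and the model retrained before the next round.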
Affiliation(s)
- Huiyan Xu
- Shanghai Key Laboratory of Power Station Automation Technology, School of Mechatronics Engineering and Automation, Shanghai University, Shanghai, China
- Academy of Military Medical Sciences, Beijing, China
- Yanpeng Zhao
- Academy of Military Medical Sciences, Beijing, China
- Yixin Zhang
- Academy of Military Medical Sciences, Beijing, China
- Junshan Han
- Academy of Military Medical Sciences, Beijing, China
- Peng Zan
- Shanghai Key Laboratory of Power Station Automation Technology, School of Mechatronics Engineering and Automation, Shanghai University, Shanghai, China.
- Song He
- Academy of Military Medical Sciences, Beijing, China.
- Xiaochen Bo
- Academy of Military Medical Sciences, Beijing, China.
6
Jin Y, Perez-Lemus GR, Zubieta Rico PF, de Pablo JJ. Improving Machine Learned Force Fields for Complex Fluids through Enhanced Sampling: A Liquid Crystal Case Study. J Phys Chem A 2024; 128:7257-7268. [PMID: 39150905 DOI: 10.1021/acs.jpca.4c01546] [Indexed: 08/18/2024]
Abstract
Machine learned force fields offer the potential for faster execution times while retaining the accuracy of traditional DFT calculations, making them promising candidates for molecular simulations in cases where reliable classical force fields are not available. Some of the challenges associated with machine learned force fields include simulation stability over extended periods of time and ensuring that the statistical and dynamical properties of the underlying simulated systems are correctly captured. In this work, we propose a systematic training pipeline for such force fields that leads to improved model quality, compared to that achieved by traditional data generation and training approaches. That pipeline relies on the use of enhanced sampling techniques, and it is demonstrated here in the context of a liquid crystal, which exemplifies many of the challenges that are encountered in fluids and materials with complex free energy landscapes. Our results indicate that, whereas the majority of traditional machine learned force field training approaches lead to molecular dynamics simulations that are only stable over hundred-picosecond trajectories, our approach allows for stable simulations over tens of nanoseconds for organic molecular systems comprising thousands of atoms.
Affiliation(s)
- Yezhi Jin
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637-1476, United States
- Gustavo R Perez-Lemus
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637-1476, United States
- Pablo F Zubieta Rico
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637-1476, United States
- Juan J de Pablo
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637-1476, United States
7
Zhang H, Juraskova V, Duarte F. Modelling chemical processes in explicit solvents with machine learning potentials. Nat Commun 2024; 15:6114. [PMID: 39030199 PMCID: PMC11271496 DOI: 10.1038/s41467-024-50418-6] [Received: 08/06/2023] [Accepted: 07/08/2024] [Indexed: 07/21/2024]
Abstract
Solvent effects influence all stages of the chemical processes, modulating the stability of intermediates and transition states, as well as altering reaction rates and product ratios. However, accurately modelling these effects remains challenging. Here, we present a general strategy for generating reactive machine learning potentials to model chemical processes in solution. Our approach combines active learning with descriptor-based selectors and automation, enabling the construction of data-efficient training sets that span the relevant chemical and conformational space. We apply this strategy to investigate a Diels-Alder reaction in water and methanol. The generated machine learning potentials enable us to obtain reaction rates that are in agreement with experimental data and analyse the influence of these solvents on the reaction mechanism. Our strategy offers an efficient approach to the routine modelling of chemical reactions in solution, opening up avenues for studying complex chemical processes in an efficient manner.
Affiliation(s)
- Hanwen Zhang
- Chemistry Research Laboratory, Oxford, United Kingdom
8
Dong J, Wang S, Cui W, Sun X, Guo H, Yan H, Vogel H, Wang Z, Yuan S. Machine Learning Deciphered Molecular Mechanistics with Accurate Kinetic and Thermodynamic Prediction. J Chem Theory Comput 2024; 20:4499-4513. [PMID: 38394691 DOI: 10.1021/acs.jctc.3c01412] [Indexed: 02/25/2024]
Abstract
Time-lagged independent component analysis (tICA) and the Markov state model (MSM) have been extensively employed for extracting conformational dynamics and kinetic community networks from unbiased trajectory ensembles. However, these techniques may not be the optimal choice for elucidating transition mechanisms within low-dimensional representations, especially for intricate biosystems. Unraveling the association mechanism in such complex systems always necessitates permutations of several essential independent components or collective variables, a process that is inherently obscure and may require empirical knowledge for selection. To address these challenges, we have implemented an integrated unsupervised dimension reduction model: uniform manifold approximation and projection (UMAP) with hierarchical density-based spatial clustering of applications with noise (HDBSCAN). This approach effectively generates low-dimensional configurational embeddings. The hierarchical application of this architecture, in conjunction with MSM, reveals global kinetic connectivity while identifying local conformational states. Consequently, our methodology establishes a multiscale mechanistic elucidation framework. Leveraging the benefits of the uniform sample distribution and a denoising approach, our model demonstrates robustness in preserving global and local data structures compared to traditional dimension reduction methods in the field of MD analysis. The interpretability of hyperparameter selection and compatibility with downstream tasks are cross-validated across various simulation data sets, utilizing both computational evaluation metrics and experimental kinetic observables. Furthermore, the predicted Mcl1-BH3 association kinetics (0.76 s⁻¹) is in close agreement with surface plasmon resonance experiments (0.12 s⁻¹), affirming the plausibility of the identified pathway composed of representative conformations.
We anticipate that the devised workflow will serve as a foundational framework for studying recognition patterns in complex biological systems. Its contributions extend to the exploration of protein functional dynamics and rational drug design, offering a potent avenue for advancing research in these domains.
Affiliation(s)
- Junlin Dong
- Research Center for Computer-Aided Drug Discovery, Institute of Biomedicine and Biotechnology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Shiyu Wang
- Research Center for Computer-Aided Drug Discovery, Institute of Biomedicine and Biotechnology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- AlphaMol Science Ltd, Shenzhen 518055, China
- Wenqiang Cui
- Research Center for Computer-Aided Drug Discovery, Institute of Biomedicine and Biotechnology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Xiaolin Sun
- Research Center for Computer-Aided Drug Discovery, Institute of Biomedicine and Biotechnology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Haojie Guo
- Research Center for Computer-Aided Drug Discovery, Institute of Biomedicine and Biotechnology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Hailu Yan
- School of Biological Sciences, College of Science and Engineering, University of Edinburgh, Edinburgh EH8 9YL, U.K
- Horst Vogel
- Research Center for Computer-Aided Drug Discovery, Institute of Biomedicine and Biotechnology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Zhi Wang
- Artificial Intelligence Department, Zhejiang Financial College, Hangzhou 310018, China
- Shuguang Yuan
- Research Center for Computer-Aided Drug Discovery, Institute of Biomedicine and Biotechnology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- AlphaMol Science Ltd, Shenzhen 518055, China
9
Yang Y, Zhang S, Ranasinghe KD, Isayev O, Roitberg AE. Machine Learning of Reactive Potentials. Annu Rev Phys Chem 2024; 75:371-395. [PMID: 38941524 DOI: 10.1146/annurev-physchem-062123-024417] [Indexed: 06/30/2024]
Abstract
In the past two decades, machine learning potentials (MLPs) have driven significant developments in chemical, biological, and material sciences. The construction and training of MLPs enable fast and accurate simulations and analysis of thermodynamic and kinetic properties. This review focuses on the application of MLPs to reaction systems with consideration of bond breaking and formation. We review the development of MLP models, primarily with neural network and kernel-based algorithms, and recent applications of reactive MLPs (RMLPs) to systems at different scales. We show how RMLPs are constructed, how they speed up the calculation of reactive dynamics, and how they facilitate the study of reaction trajectories, reaction rates, free energy calculations, and many other calculations. Different data sampling strategies applied in building RMLPs are also discussed with a focus on how to collect structures for rare events and how to further improve their performance with active learning.
Affiliation(s)
- Yinuo Yang
- Department of Chemistry, University of Florida, Gainesville, Florida
- Shuhao Zhang
- Department of Chemistry, Carnegie Mellon University, Pittsburgh, Pennsylvania
- Olexandr Isayev
- Department of Chemistry, Carnegie Mellon University, Pittsburgh, Pennsylvania
- Adrian E Roitberg
- Department of Chemistry, University of Florida, Gainesville, Florida
10
Duignan TT. The Potential of Neural Network Potentials. ACS Physical Chemistry Au 2024; 4:232-241. [PMID: 38800721 PMCID: PMC11117678 DOI: 10.1021/acsphyschemau.4c00004] [Received: 01/18/2024] [Revised: 03/04/2024] [Accepted: 03/05/2024] [Indexed: 05/29/2024]
Abstract
In the next half-century, physical chemistry will likely undergo a profound transformation, driven predominantly by the combination of recent advances in quantum chemistry and machine learning (ML). Specifically, equivariant neural network potentials (NNPs) are a breakthrough new tool that are already enabling us to simulate systems at the molecular scale with unprecedented accuracy and speed, relying on nothing but fundamental physical laws. The continued development of this approach will realize Paul Dirac's 80-year-old vision of using quantum mechanics to unify physics with chemistry and providing invaluable tools for understanding materials science, biology, earth sciences, and beyond. The era of highly accurate and efficient first-principles molecular simulations will provide a wealth of training data that can be used to build automated computational methodologies, using tools such as diffusion models, for the design and optimization of systems at the molecular scale. Large language models (LLMs) will also evolve into increasingly indispensable tools for literature review, coding, idea generation, and scientific writing.
11
France-Lanord A, Vroylandt H, Salanne M, Rotenberg B, Saitta AM, Pietrucci F. Data-Driven Path Collective Variables. J Chem Theory Comput 2024; 20:3069-3084. [PMID: 38619076 DOI: 10.1021/acs.jctc.4c00123] [Indexed: 04/16/2024]
Abstract
Identifying optimal collective variables to model transformations using atomic-scale simulations is a long-standing challenge. We propose a new method for the generation, optimization, and comparison of collective variables that can be thought of as a data-driven generalization of the path collective variable concept. It consists of a kernel ridge regression of the committor probability, which encodes a transformation's progress. The resulting collective variable is one-dimensional, interpretable, and differentiable, making it appropriate for enhanced sampling simulations requiring biasing. We demonstrate the validity of the method on two different applications: a precipitation model and the association of Li⁺ and F⁻ in water. For the former, we show that global descriptors such as the permutation invariant vector allow reaching an accuracy far from the one achieved via simpler, more intuitive variables. For the latter, we show that information correlated with the transformation mechanism is contained in the first solvation shell only and that inertial effects prevent the derivation of optimal collective variables from the atomic positions only.
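As a rough illustration of what a kernel ridge regression of the committor can look like, the sketch below fits committor estimates on a toy one-dimensional descriptor with a Gaussian kernel. It is not the authors' implementation: the kernel choice, the hyperparameters `length_scale` and `ridge`, the function names, and the sigmoid-shaped toy data are all assumptions made for illustration.

```python
import numpy as np


def gaussian_kernel(X, Y, length_scale):
    """Gaussian (RBF) kernel matrix between descriptor sets X (n, d) and Y (m, d)."""
    squared_distances = np.sum((X[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    return np.exp(-squared_distances / (2.0 * length_scale ** 2))


def fit_committor_krr(X, q, length_scale=0.5, ridge=1e-3):
    """Solve (K + ridge * I) alpha = q for the regression weights alpha."""
    K = gaussian_kernel(X, X, length_scale)
    return np.linalg.solve(K + ridge * np.eye(len(X)), q)


def predict_committor(X_train, alpha, X_new, length_scale=0.5):
    """Evaluate the fitted collective variable at new descriptor values."""
    return gaussian_kernel(X_new, X_train, length_scale) @ alpha


if __name__ == "__main__":
    # Toy data: one descriptor, with a sigmoid standing in for committor
    # estimates that would normally come from shooting simulations.
    X = np.linspace(-2.0, 2.0, 21).reshape(-1, 1)
    q = 1.0 / (1.0 + np.exp(-3.0 * X[:, 0]))
    alpha = fit_committor_krr(X, q)
    print(predict_committor(X, alpha, np.array([[-1.5], [0.0], [1.5]])))
```

Because the prediction is a weighted sum of smooth kernels, it is differentiable with respect to the descriptors, which is the property the abstract highlights for biased enhanced sampling.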
Affiliation(s)
- Arthur France-Lanord
- Institut des Sciences du Calcul et des Données, ISCD, Sorbonne Université, F-75005 Paris, France
- Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, Sorbonne Université, F-75005 Paris, France
- Hadrien Vroylandt
- Institut des Sciences du Calcul et des Données, ISCD, Sorbonne Université, F-75005 Paris, France
- Mathieu Salanne
- Physicochimie des Électrolytes et Nanosystèmes Interfaciaux, Sorbonne Université, CNRS, 4 Place Jussieu, F-75005 Paris, France
- Institut Universitaire de France (IUF), 75231 Paris, France
- Benjamin Rotenberg
- Physicochimie des Électrolytes et Nanosystèmes Interfaciaux, Sorbonne Université, CNRS, 4 Place Jussieu, F-75005 Paris, France
- A Marco Saitta
- Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, Sorbonne Université, F-75005 Paris, France
- Fabio Pietrucci
- Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, Sorbonne Université, F-75005 Paris, France
12
Pan X, Snyder R, Wang JN, Lander C, Wickizer C, Van R, Chesney A, Xue Y, Mao Y, Mei Y, Pu J, Shao Y. Training machine learning potentials for reactive systems: A Colab tutorial on basic models. J Comput Chem 2024; 45:638-647. [PMID: 38082539 PMCID: PMC10923003 DOI: 10.1002/jcc.27269] [Received: 08/31/2023] [Revised: 11/10/2023] [Accepted: 11/11/2023] [Indexed: 01/18/2024]
Abstract
In the last several years, there has been a surge in the development of machine learning potential (MLP) models for describing molecular systems. We are interested in a particular area of this field - the training of system-specific MLPs for reactive systems - with the goal of using these MLPs to accelerate free energy simulations of chemical and enzyme reactions. To help new members in our labs become familiar with the basic techniques, we have put together a self-guided Colab tutorial (https://cc-ats.github.io/mlp_tutorial/), which we expect to be also useful to other young researchers in the community. Our tutorial begins with the introduction of simple feedforward neural network (FNN) and kernel-based (using Gaussian process regression, GPR) models by fitting the two-dimensional Müller-Brown potential. Subsequently, two simple descriptors are presented for extracting features of molecular systems: symmetry functions (including the ANI variant) and embedding neural networks (such as DeepPot-SE). Lastly, these features will be fed into FNN and GPR models to reproduce the energies and forces for the molecular configurations in a Claisen rearrangement reaction.
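The first step the abstract describes, fitting a kernel model to the two-dimensional Müller-Brown potential, can be sketched as follows. This is a minimal illustration and not the tutorial's own code: it uses the standard Müller-Brown parameters and a plain NumPy Gaussian-process mean predictor, and the grid sizes, domain, kernel length scale, and noise term are all assumed values chosen only for the example.

```python
import numpy as np

# Standard Müller-Brown parameters (four exponential terms).
A = np.array([-200.0, -100.0, -170.0, 15.0])
a = np.array([-1.0, -1.0, -6.5, 0.7])
b = np.array([0.0, 0.0, 11.0, 0.6])
c = np.array([-10.0, -10.0, -6.5, 0.7])
x0 = np.array([1.0, 0.0, -0.5, -1.0])
y0 = np.array([0.0, 0.5, 1.5, 1.0])


def muller_brown(xy):
    """Evaluate the Müller-Brown potential at points of shape (n, 2)."""
    xy = np.atleast_2d(xy)
    dx = xy[:, 0:1] - x0  # shape (n, 4)
    dy = xy[:, 1:2] - y0
    return np.sum(A * np.exp(a * dx**2 + b * dx * dy + c * dy**2), axis=1)


def rbf_kernel(X1, X2, length_scale):
    """Squared-exponential kernel between two point sets."""
    d2 = np.sum((X1[:, None, :] - X2[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * d2 / length_scale**2)


def make_grid(n, lo=(-1.5, -0.5), hi=(1.0, 2.0)):
    """Regular n x n grid over an assumed rectangular domain."""
    gx, gy = np.meshgrid(np.linspace(lo[0], hi[0], n),
                         np.linspace(lo[1], hi[1], n))
    return np.column_stack([gx.ravel(), gy.ravel()])


# Training data: energies on a 20 x 20 grid (assumed sampling).
X_train = make_grid(20)
y_train = muller_brown(X_train)
y_mean = y_train.mean()

# GPR posterior mean: solve (K + noise * I) alpha = y - mean.
length_scale, noise = 0.25, 1e-6  # assumed hyperparameters, not tuned
K = rbf_kernel(X_train, X_train, length_scale)
alpha = np.linalg.solve(K + noise * np.eye(len(X_train)), y_train - y_mean)


def predict(X):
    """Predict energies at new points from the fitted kernel model."""
    return rbf_kernel(np.atleast_2d(X), X_train, length_scale) @ alpha + y_mean


# Evaluate on a different grid to estimate the interpolation error.
X_test = make_grid(15)
rmse = np.sqrt(np.mean((predict(X_test) - muller_brown(X_test)) ** 2))
print(f"test RMSE: {rmse:.3f}")
```

In practice the hyperparameters would be optimized (for example by maximizing the marginal likelihood) rather than fixed by hand as they are here.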
Affiliation(s)
- Xiaoliang Pan
- Department of Chemistry and Biochemistry, University of Oklahoma, Norman, OK 73019, USA
| | - Ryan Snyder
- Department of Chemistry and Chemical Biology, Indiana University-Purdue University Indianapolis, Indianapolis, IN 46202, USA
| | - Jia-Ning Wang
- State Key Laboratory of Precision Spectroscopy, School of Physics and Electronic Science, East China Normal University, Shanghai 200241, China
| | - Chance Lander
- Department of Chemistry and Biochemistry, University of Oklahoma, Norman, OK 73019, USA
| | - Carly Wickizer
- Department of Chemistry and Biochemistry, University of Oklahoma, Norman, OK 73019, USA
| | - Richard Van
- Department of Chemistry and Biochemistry, University of Oklahoma, Norman, OK 73019, USA
- Laboratory of Computational Biology, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD 20824, USA
| | - Andrew Chesney
- Department of Chemistry and Biochemistry, University of Oklahoma, Norman, OK 73019, USA
| | - Yuanfei Xue
- State Key Laboratory of Precision Spectroscopy, School of Physics and Electronic Science, East China Normal University, Shanghai 200241, China
| | - Yuezhi Mao
- Department of Chemistry and Biochemistry, San Diego State University, San Diego, CA 92182, USA
| | - Ye Mei
- State Key Laboratory of Precision Spectroscopy, School of Physics and Electronic Science, East China Normal University, Shanghai 200241, China
- NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
- Collaborative Innovation Center of Extreme Optics, Shanxi University, Taiyuan, Shanxi 030006, China
| | - Jingzhi Pu
- Department of Chemistry and Chemical Biology, Indiana University-Purdue University Indianapolis, Indianapolis, IN 46202, USA
| | - Yihan Shao
- Department of Chemistry and Biochemistry, University of Oklahoma, Norman, OK 73019, USA
| |
|
13
|
Guo J, Sours T, Holton S, Sun C, Kulkarni AR. Screening Cu-Zeolites for Methane Activation Using Curriculum-Based Training. ACS Catal 2024; 14:1232-1242. [PMID: 38327646 PMCID: PMC10845107 DOI: 10.1021/acscatal.3c05275] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Revised: 12/15/2023] [Accepted: 12/18/2023] [Indexed: 02/09/2024]
Abstract
Machine learning (ML), when used synergistically with atomistic simulations, has recently emerged as a powerful tool for accelerated catalyst discovery. However, the application of these techniques has been limited by the lack of interpretable and transferable ML models. In this work, we propose a curriculum-based training (CBT) philosophy to systematically develop reactive machine learning potentials (rMLPs) for high-throughput screening of zeolite catalysts. Our CBT approach combines several different types of calculations to gradually teach the ML model about the relevant regions of the reactive potential energy surface. The resulting rMLPs are accurate, transferable, and interpretable. We further demonstrate the effectiveness of this approach by exhaustively screening thousands of [CuOCu]²⁺ sites across hundreds of Cu-zeolites for the industrially relevant methane activation reaction. Specifically, this large-scale analysis of the entire International Zeolite Association (IZA) database identifies a set of previously unexplored zeolites (i.e., MEI, ATN, EWO, and CAS) that show the highest ensemble-averaged rates for [CuOCu]²⁺-catalyzed methane activation. We believe that this CBT philosophy can be generally applied to other zeolite-catalyzed reactions and, subsequently, to other types of heterogeneous catalysts. Thus, this represents an important step toward overcoming the long-standing barriers within the computational heterogeneous catalysis community.
Affiliation(s)
- Jiawei Guo
- Department of Chemical Engineering, University of California, Davis, California 95616, United States
| | - Tyler Sours
- Department of Chemical Engineering, University of California, Davis, California 95616, United States
| | - Sam Holton
- Department of Chemical Engineering, University of California, Davis, California 95616, United States
| | - Chenghan Sun
- Department of Chemical Engineering, University of California, Davis, California 95616, United States
| | - Ambarish R. Kulkarni
- Department of Chemical Engineering, University of California, Davis, California 95616, United States
| |
|
14
|
Stark W, Westermayr J, Douglas-Gallardo OA, Gardner J, Habershon S, Maurer RJ. Machine Learning Interatomic Potentials for Reactive Hydrogen Dynamics at Metal Surfaces Based on Iterative Refinement of Reaction Probabilities. THE JOURNAL OF PHYSICAL CHEMISTRY. C, NANOMATERIALS AND INTERFACES 2023; 127:24168-24182. [PMID: 38148847 PMCID: PMC10749455 DOI: 10.1021/acs.jpcc.3c06648] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2023] [Revised: 11/12/2023] [Accepted: 11/15/2023] [Indexed: 12/28/2023]
Abstract
The reactive chemistry of molecular hydrogen at surfaces, notably dissociative sticking and hydrogen evolution, plays a crucial role in energy storage and fuel cells. Theoretical studies can help to decipher underlying mechanisms and reaction design, but studying dynamics at surfaces is computationally challenging due to the complex electronic structure at interfaces and the high sensitivity of dynamics to reaction barriers. In addition, ab initio molecular dynamics, based on density functional theory, is too computationally demanding to accurately predict reactive sticking or desorption probabilities, as it requires averaging over tens of thousands of initial conditions. High-dimensional machine learning-based interatomic potentials are starting to be more commonly used in gas-surface dynamics, yet robust approaches to generate reliable training data and assess how model uncertainty affects the prediction of dynamic observables are not well established. Here, we employ ensemble learning to adaptively generate training data while assessing model performance with full uncertainty quantification (UQ) for reaction probabilities of hydrogen scattering on different copper facets. We use this approach to investigate the performance of two message-passing neural networks, SchNet and PaiNN. Ensemble-based UQ and iterative refinement allow us to expose the shortcomings of the invariant pairwise-distance-based feature representation in the SchNet model for gas-surface dynamics.
Affiliation(s)
- Wojciech G. Stark
- Department of Chemistry, University of Warwick, Gibbet Hill Road, Coventry CV4 7AL, U.K.
| | - Julia Westermayr
- Department of Chemistry, University of Warwick, Gibbet Hill Road, Coventry CV4 7AL, U.K.
| | | | - James Gardner
- Department of Chemistry, University of Warwick, Gibbet Hill Road, Coventry CV4 7AL, U.K.
| | - Scott Habershon
- Department of Chemistry, University of Warwick, Gibbet Hill Road, Coventry CV4 7AL, U.K.
| | - Reinhard J. Maurer
- Department of Chemistry, University of Warwick, Gibbet Hill Road, Coventry CV4 7AL, U.K.
- Department of Physics, University of Warwick, Gibbet Hill Road, Coventry CV4 7AL, U.K.
| |
|
15
|
Guan X, Heindel JP, Ko T, Yang C, Head-Gordon T. Using machine learning to go beyond potential energy surface benchmarking for chemical reactivity. NATURE COMPUTATIONAL SCIENCE 2023; 3:965-974. [PMID: 38177593 DOI: 10.1038/s43588-023-00549-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/19/2023] [Accepted: 10/04/2023] [Indexed: 01/06/2024]
Abstract
We train an equivariant machine learning (ML) model to predict energies and forces for hydrogen combustion under conditions of finite temperature and pressure. This challenging case for reactive chemistry illustrates that ML potential energy surfaces are difficult to make complete, owing to overreliance on chemical intuition about which data are important for training. Instead, a 'negative design' data acquisition strategy using metadynamics as part of an active learning workflow helps to create an ML model that avoids unforeseen high-energy or unphysical configurations. This strategy converges the potential energy surface rapidly enough that it becomes more efficient to call the external ab initio source whenever the query-by-committee models disagree, and so advance the molecular dynamics in time without the need for ML retraining. With the hybrid ML-physics model we realize a two orders of magnitude reduction in cost, allowing for prediction of the free-energy change in the transition-state mechanism for several hydrogen combustion reaction channels.
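The query-by-committee fallback mentioned in the abstract can be illustrated with a toy one-dimensional sketch. Everything here is an assumption made for illustration and not the authors' workflow: the "reference" is a sine function standing in for an ab initio call, the committee is a set of cubic polynomial fits to bootstrap resamples, and the disagreement threshold of 0.05 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)


def reference(x):
    """Stand-in for an expensive ab initio calculation (hypothetical)."""
    return np.sin(x)


# Training data covers only the interval [0, 3].
x_train = np.linspace(0.0, 3.0, 30)
y_train = reference(x_train) + rng.normal(0.0, 0.01, x_train.size)

# Committee: cubic fits to bootstrap resamples of the training set.
committee = []
for _ in range(8):
    idx = rng.integers(0, x_train.size, x_train.size)
    committee.append(np.polyfit(x_train[idx], y_train[idx], deg=3))


def committee_predict(x):
    """Return the committee mean and standard deviation at x."""
    preds = np.array([np.polyval(coeffs, x) for coeffs in committee])
    return preds.mean(axis=0), preds.std(axis=0)


def energy(x, threshold=0.05):
    """Use the ML mean when the committee agrees, else call the reference."""
    mean, std = committee_predict(x)
    if std > threshold:
        return reference(x), "reference"
    return mean, "ml"


for x in (1.5, 6.0):
    value, source = energy(x)
    print(f"x = {x}: E = {value:.3f} from {source}")
```

Inside the sampled interval the committee members agree and the cheap model is used; far outside it their predictions diverge and the call is routed to the reference method, with no retraining step in the loop.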
Affiliation(s)
- Xingyi Guan
- Kenneth S. Pitzer Theory Center and Department of Chemistry, Berkeley, CA, USA
- Chemical Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Joseph P Heindel
- Kenneth S. Pitzer Theory Center and Department of Chemistry, Berkeley, CA, USA
- Chemical Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Taehee Ko
- Department of Mathematics, Penn State University, University Park, PA, USA
| | - Chao Yang
- Applied Mathematics and Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Teresa Head-Gordon
- Kenneth S. Pitzer Theory Center and Department of Chemistry, Berkeley, CA, USA.
- Chemical Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA.
- Departments of Bioengineering and Chemical and Biomolecular Engineering, University of California, Berkeley, CA, USA.
| |
|
16
|
Fedik N, Nebgen B, Lubbers N, Barros K, Kulichenko M, Li YW, Zubatyuk R, Messerly R, Isayev O, Tretiak S. Synergy of semiempirical models and machine learning in computational chemistry. J Chem Phys 2023; 159:110901. [PMID: 37712780 DOI: 10.1063/5.0151833] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2023] [Accepted: 07/11/2023] [Indexed: 09/16/2023]
Abstract
Catalyzed by enormous success in the industrial sector, many research programs have been exploring data-driven, machine learning approaches. Performance can be poor when the model is extrapolated to new regions of chemical space, e.g., new bonding types or new many-body interactions. Another important limitation is the spatial locality assumption in model architecture, and this limitation cannot be overcome with larger or more diverse datasets. The outlined challenges are primarily associated with the lack of electronic structure information in surrogate models such as interatomic potentials. Given the fast development of machine learning and computational chemistry methods, we expect some limitations of surrogate models to be addressed in the near future; nevertheless, the spatial locality assumption will likely remain a limiting factor for their transferability. Here, we suggest focusing on an equally important effort: the design of physics-informed models that leverage domain knowledge and employ machine learning only as a corrective tool. In the context of materials science, we will focus on semi-empirical quantum mechanics, using machine learning to predict corrections to the reduced-order Hamiltonian model parameters. The resulting models are broadly applicable, retain the speed of semiempirical chemistry, and frequently achieve accuracy on par with much more expensive ab initio calculations. These early results indicate that future work, in which machine learning and quantum chemistry methods are developed jointly, may provide the best of all worlds for chemistry applications that demand both high accuracy and high numerical efficiency.
Affiliation(s)
- Nikita Fedik
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
- Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
| | - Benjamin Nebgen
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
| | - Nicholas Lubbers
- Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
| | - Kipton Barros
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
- Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
| | - Maksim Kulichenko
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
| | - Ying Wai Li
- Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
| | - Roman Zubatyuk
- Department of Chemistry, Mellon College of Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA
| | - Richard Messerly
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
| | - Olexandr Isayev
- Department of Chemistry, Mellon College of Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA
| | - Sergei Tretiak
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
- Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
- Center for Integrated Nanotechnologies Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
| |
|
17
|
van der Oord C, Sachs M, Kovács DP, Ortner C, Csányi G. Hyperactive learning for data-driven interatomic potentials. NPJ COMPUTATIONAL MATERIALS 2023; 9:168. [PMID: 38666057 PMCID: PMC11041776 DOI: 10.1038/s41524-023-01104-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 11/07/2022] [Accepted: 08/02/2023] [Indexed: 04/28/2024]
Abstract
Data-driven interatomic potentials have emerged as a powerful tool for approximating ab initio potential energy surfaces. The most time-consuming step in creating these interatomic potentials is typically the generation of a suitable training database. To aid this process, hyperactive learning (HAL), an accelerated active learning scheme, is presented as a method for rapid automated training database assembly. HAL adds a biasing term to a physically motivated sampler (e.g., molecular dynamics) that drives atomic structures towards regions of high uncertainty, in turn generating unseen or valuable training configurations. The proposed HAL framework is used to develop atomic cluster expansion (ACE) interatomic potentials for the AlSi10 alloy and polyethylene glycol (PEG) polymer starting from roughly a dozen initial configurations. The HAL-generated ACE potentials are shown to be able to determine macroscopic properties, such as melting temperature and density, with close to experimental accuracy.
|
18
|
Batzner S. Biasing energy surfaces towards the unknown. NATURE COMPUTATIONAL SCIENCE 2023; 3:190-191. [PMID: 38177879 DOI: 10.1038/s43588-023-00420-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/06/2024]
Affiliation(s)
- Simon Batzner
- John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, USA.
| |
|