1
Prat A, Abdel Aty H, Bastas O, Kamuntavičius G, Paquet T, Norvaišas P, Gasparotto P, Tal R. HydraScreen: A Generalizable Structure-Based Deep Learning Approach to Drug Discovery. J Chem Inf Model 2024; 64:5817-5831. [PMID: 39037942 DOI: 10.1021/acs.jcim.4c00481] [Indexed: 07/24/2024]
Abstract
We propose HydraScreen, a deep-learning framework for safe and robust accelerated drug discovery. HydraScreen utilizes a state-of-the-art 3D convolutional neural network designed for the effective representation of molecular structures and interactions in protein-ligand binding. We designed an end-to-end pipeline for high-throughput screening and lead optimization, targeting applications in structure-based drug design. We assessed our approach using established public benchmarks based on the CASF-2016 core set, achieving top-tier results in affinity and pose prediction (Pearson's r = 0.86, RMSE = 1.15, Top-1 = 0.95). We introduced a novel approach for interaction profiling, aimed at detecting potential biases within both the model and data sets. This approach not only enhanced interpretability but also reinforced the impartiality of our methodology. Finally, we demonstrated HydraScreen's ability to generalize effectively across novel proteins and ligands through a temporal split. We also provide insights into potential avenues for future development aimed at enhancing the robustness of machine learning scoring functions. HydraScreen (accessible at http://hydrascreen.ro5.ai/paper) provides a user-friendly GUI and a public API, facilitating the easy-access assessment of protein-ligand complexes.
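The three headline numbers (Pearson's r, RMSE, Top-1) can be made concrete with a short sketch. It assumes that `predicted` and `measured` are arrays of binding affinities, that Top-1 is the docking-power success rate, that each complex's candidate poses carry a score where higher is better, and that a pose counts as correct when its RMSD to the crystal pose is below 2 Å, the cutoff commonly used in CASF docking-power tests. The function names are illustrative and are not part of HydraScreen's API.

```python
import numpy as np


def pearson_r(predicted, measured):
    """Pearson correlation between predicted and measured affinities."""
    return float(np.corrcoef(predicted, measured)[0, 1])


def rmse(predicted, measured):
    """Root-mean-square error, in the units of the affinity values."""
    diff = np.asarray(predicted, dtype=float) - np.asarray(measured, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def top1_success_rate(pose_scores, pose_rmsds, cutoff=2.0):
    """Fraction of complexes whose best-scored pose lies below `cutoff` Å RMSD.

    pose_scores[i] and pose_rmsds[i] hold the scores (assumed: higher = better)
    and the RMSDs of the candidate poses for complex i.
    """
    hits = [
        np.asarray(rmsds)[np.argmax(scores)] < cutoff
        for scores, rmsds in zip(pose_scores, pose_rmsds)
    ]
    return float(np.mean(hits))


print(pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
print(rmse([1.0, 2.0], [1.0, 4.0]))
print(top1_success_rate([[0.1, 0.9], [0.8, 0.2]], [[0.5, 3.0], [1.0, 4.0]]))
```

In the last call the first complex's top-scored pose is 3.0 Å away (a miss) and the second's is 1.0 Å away (a hit), so the success rate is one half.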
Affiliation(s)
- Alvaro Prat
- AI Chemistry, Ro5 2801 Gateway Drive, Irving, 75063 Texas, United States
- Hisham Abdel Aty
- AI Chemistry, Ro5 2801 Gateway Drive, Irving, 75063 Texas, United States
- Orestis Bastas
- AI Chemistry, Ro5 2801 Gateway Drive, Irving, 75063 Texas, United States
- Tanya Paquet
- AI Chemistry, Ro5 2801 Gateway Drive, Irving, 75063 Texas, United States
- Povilas Norvaišas
- AI Chemistry, Ro5 2801 Gateway Drive, Irving, 75063 Texas, United States
- Piero Gasparotto
- AI Chemistry, Ro5 2801 Gateway Drive, Irving, 75063 Texas, United States
- Roy Tal
- AI Chemistry, Ro5 2801 Gateway Drive, Irving, 75063 Texas, United States
2
van Gerwen P, Briling KR, Bunne C, Somnath VR, Laplaza R, Krause A, Corminboeuf C. 3DReact: Geometric Deep Learning for Chemical Reactions. J Chem Inf Model 2024; 64:5771-5785. [PMID: 39007724 DOI: 10.1021/acs.jcim.4c00104] [Indexed: 07/16/2024]
Abstract
Geometric deep learning models, which incorporate the relevant molecular symmetries within the neural network architecture, have considerably improved the accuracy and data efficiency of predictions of molecular properties. Building on this success, we introduce 3DReact, a geometric deep learning model to predict reaction properties from three-dimensional structures of reactants and products. We demonstrate that the invariant version of the model is sufficient for existing reaction data sets. We illustrate its competitive performance on the prediction of activation barriers on the GDB7-22-TS, Cyclo-23-TS, and Proparg-21-TS data sets in different atom-mapping regimes. We show that, compared to existing models for reaction property prediction, 3DReact offers a flexible framework that exploits atom-mapping information, if available, as well as geometries of reactants and products (in an invariant or equivariant fashion). Accordingly, it performs systematically well across different data sets, atom-mapping regimes, as well as both interpolation and extrapolation tasks.
Affiliation(s)
- Puck van Gerwen
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- National Center for Competence in Research - Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Ksenia R Briling
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Charlotte Bunne
- National Center for Competence in Research - Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Learning & Adaptive Systems Group, Department of Computer Science, ETH Zurich, 8092 Zurich, Switzerland
- Vignesh Ram Somnath
- National Center for Competence in Research - Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Learning & Adaptive Systems Group, Department of Computer Science, ETH Zurich, 8092 Zurich, Switzerland
- Ruben Laplaza
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- National Center for Competence in Research - Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Andreas Krause
- National Center for Competence in Research - Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Learning & Adaptive Systems Group, Department of Computer Science, ETH Zurich, 8092 Zurich, Switzerland
- Clemence Corminboeuf
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- National Center for Competence in Research - Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
3
Gantzer P, Staub R, Harabuchi Y, Maeda S, Varnek A. Chemography-guided analysis of a reaction path network for ethylene hydrogenation with a model Wilkinson's catalyst. Mol Inform 2024:e202400063. [PMID: 39121023 DOI: 10.1002/minf.202400063] [Received: 02/21/2024] [Revised: 07/11/2024] [Accepted: 07/19/2024] [Indexed: 08/11/2024]
Abstract
Visualization and analysis of large chemical reaction networks become challenging when conventional graph-based approaches are used. As an alternative, we propose the chemical cartography ("chemography") approach, which describes the data distribution on a 2-dimensional map. Here, the Generative Topographic Mapping (GTM) algorithm, an advanced chemography approach, has been applied to visualize the reaction path network of a simplified Wilkinson's catalyst-catalyzed hydrogenation containing some 10⁵ structures generated with the Artificial Force Induced Reaction (AFIR) method, using either Density Functional Theory or a Neural Network Potential (NNP) for the potential energy surface calculations. Using new atom-permutation-invariant 3D descriptors for structure encoding, we demonstrated that GTM can cluster structures that share the same 2D representation, visualize the potential energy surface, provide insight into the reaction path exploration as a function of time, and compare reaction path networks obtained with different methods of energy assessment.
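The abstract does not specify the authors' descriptors. As a generic illustration of what atom-permutation invariance means, the sketch below uses a deliberately simple stand-in: the sorted list of all interatomic distances, which does not change when atoms are renumbered (or when the structure is rotated or translated). It ignores element types and is not a complete descriptor, so distinct structures can map to the same vector; it is not the descriptor used in the paper.

```python
import numpy as np


def sorted_distance_descriptor(positions):
    """Permutation-invariant 3D descriptor: the sorted interatomic distances.

    positions: (n_atoms, 3) Cartesian coordinates. Element types are ignored
    in this simplified stand-in.
    """
    x = np.asarray(positions, dtype=float)
    diff = x[:, None, :] - x[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    upper = np.triu_indices(len(x), k=1)  # each pair once, no self-distances
    return np.sort(dist[upper])


coords = np.array([[0.0, 0.0, 0.0], [1.1, 0.0, 0.0], [0.0, 1.5, 0.3]])
permuted = coords[[2, 0, 1]]  # renumber the atoms
rot_z = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
rotated = coords @ rot_z.T  # 90-degree rotation about z

print(np.allclose(sorted_distance_descriptor(coords), sorted_distance_descriptor(permuted)))
print(np.allclose(sorted_distance_descriptor(coords), sorted_distance_descriptor(rotated)))
```

Sorting is what removes the dependence on atom ordering; a practical descriptor would also need to encode element identities and be more discriminating than a plain distance list.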
Affiliation(s)
- Philippe Gantzer
- Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Sapporo, Hokkaido, 001-0021, Japan
- Ruben Staub
- Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Sapporo, Hokkaido, 001-0021, Japan
- Yu Harabuchi
- Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Sapporo, Hokkaido, 001-0021, Japan
- Satoshi Maeda
- Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Sapporo, Hokkaido, 001-0021, Japan
- Alexandre Varnek
- Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Sapporo, Hokkaido, 001-0021, Japan
- Laboratory of Chemoinformatics, UMR 7140, CNRS, University of Strasbourg, Strasbourg, 67081, France
4
Panchagnula K, Graf D, Johnson ER, Thom AJW. Targeting spectroscopic accuracy for dispersion bound systems from ab initio techniques: Translational eigenstates of Ne@C70. J Chem Phys 2024; 161:054308. [PMID: 39092939 DOI: 10.1063/5.0223298] [Received: 06/13/2024] [Accepted: 07/17/2024] [Indexed: 08/04/2024]
Abstract
We investigate the endofullerene system Ne@C70 by constructing a three-dimensional Potential Energy Surface (PES) describing the translational motion of the Ne atom. This is constructed from electronic structure calculations from a plethora of methods, including MP2, SCS-MP2, SOS-MP2, RPA@PBE, and C(HF)-RPA, which were previously used for He@C60 in Panchagnula et al. [J. Chem. Phys. 160, 104303 (2024)], alongside B86bPBE-25X-XDM and B86bPBE-50X-XDM. The reduction in symmetry moving from C60 to C70 introduces a double well potential along the anisotropic direction, which forms a test of the sensitivity and effectiveness of the electronic structure methods. The nuclear Hamiltonian is diagonalized using a symmetrized double minimum basis set outlined in Panchagnula and Thom [J. Chem. Phys. 159, 164308 (2023)], with translational energies having error bars ±1 and ±2 cm⁻¹. We find no consistency between electronic structure methods as they find a range of barrier heights and minima positions of the double well and different translational eigenspectra, which also differ from the Lennard-Jones (LJ) PES given in Mandziuk and Bačić [J. Chem. Phys. 101, 2126-2140 (1994)]. We find that generating effective LJ parameters for each electronic structure method cannot reproduce the full PES nor recreate the eigenstates, and this suggests that the LJ form of the PES, while simple, may not be best suited to describe these systems. Even though MP2 and RPA@PBE performed best for He@C60, due to the lack of concordance between all electronic structure methods, we require more experimental data in order to properly validate the choice.
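The double-well behaviour can be illustrated with a one-dimensional toy model. This is not the authors' symmetrized double-minimum basis or their 3D PES: it is a finite-difference sketch in reduced units (hbar = m = 1) with a hypothetical quartic potential V(x) = h (x^2 - 1)^2, whose minima at x = ±1 are separated by a barrier of height h. Diagonalizing the discretized Hamiltonian shows the characteristic near-degenerate tunneling doublet, whose splitting depends strongly on the barrier height and the positions of the minima; this sensitivity is one reason a double well is a demanding test of the underlying PES.

```python
import numpy as np


def double_well_levels(barrier=10.0, x_max=3.0, n_points=801, n_levels=4):
    """Lowest eigenvalues of H = -1/2 d^2/dx^2 + V(x) in reduced units (hbar = m = 1).

    V(x) = barrier * (x**2 - 1)**2 is a toy symmetric double well with minima
    at x = +/-1. The wavefunction is taken to vanish just outside [-x_max, x_max].
    """
    x = np.linspace(-x_max, x_max, n_points)
    dx = x[1] - x[0]
    potential = barrier * (x ** 2 - 1.0) ** 2
    # Three-point finite-difference Laplacian for the kinetic-energy operator.
    diagonal = 1.0 / dx ** 2 + potential
    off_diagonal = np.full(n_points - 1, -0.5 / dx ** 2)
    hamiltonian = np.diag(diagonal) + np.diag(off_diagonal, 1) + np.diag(off_diagonal, -1)
    return np.linalg.eigvalsh(hamiltonian)[:n_levels]  # ascending order


levels = double_well_levels()
tunneling_splitting = levels[1] - levels[0]
vibrational_gap = levels[2] - levels[1]
print(levels, tunneling_splitting, vibrational_gap)
```

Rerunning with a different `barrier` shows how quickly the splitting changes, which mirrors why methods that disagree on barrier height also disagree on the translational eigenspectrum.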
Affiliation(s)
- K Panchagnula
- Yusuf Hamied Department of Chemistry, University of Cambridge, Cambridge, United Kingdom
- D Graf
- Yusuf Hamied Department of Chemistry, University of Cambridge, Cambridge, United Kingdom
- Department of Chemistry, University of Munich (LMU), Munich, Germany
- E R Johnson
- Yusuf Hamied Department of Chemistry, University of Cambridge, Cambridge, United Kingdom
- Department of Chemistry, Dalhousie University, 6243 Alumni Crescent, Halifax, Nova Scotia B3H 4R2, Canada
- A J W Thom
- Yusuf Hamied Department of Chemistry, University of Cambridge, Cambridge, United Kingdom
5
Bigi F, Pozdnyakov SN, Ceriotti M. Wigner kernels: Body-ordered equivariant machine learning without a basis. J Chem Phys 2024; 161:044116. [PMID: 39056390 DOI: 10.1063/5.0208746] [Received: 03/16/2024] [Accepted: 06/10/2024] [Indexed: 07/28/2024]
Abstract
Machine-learning models based on a point-cloud representation of a physical object are ubiquitous in scientific applications and particularly well-suited to the atomic-scale description of molecules and materials. Among the many different approaches that have been pursued, the description of local atomic environments in terms of their discretized neighbor densities has been used widely and very successfully. We propose a novel density-based method, which involves computing "Wigner kernels." These are fully equivariant and body-ordered kernels that can be computed iteratively at a cost that is independent of the basis used to discretize the density and grows only linearly with the maximum body-order considered. Wigner kernels represent the infinite-width limit of feature-space models, whose dimensionality and computational cost instead scale exponentially with the increasing order of correlations. We present several examples of the accuracy of models based on Wigner kernels in chemical applications, for both scalar and tensorial targets, reaching an accuracy that is competitive with state-of-the-art deep-learning architectures. We discuss the broader relevance of these findings to equivariant geometric machine-learning.
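The contrast between feature-space cost and kernel cost can be seen in the simplest scalar (non-equivariant) setting. This is only an analogue of the Wigner-kernel iteration: it omits the Clebsch-Gordan couplings needed for equivariance. Taking nu tensor products of a density-like feature vector x gives len(x)**nu features, yet the inner product of two such products equals (x·y)**nu, which can be built up one correlation order at a time at a cost that does not depend on len(x). The vectors below are arbitrary illustrative values.

```python
from functools import reduce

import numpy as np


def tensor_features(x, nu):
    """Explicit nu-fold tensor-product features; length grows as len(x) ** nu."""
    return reduce(np.kron, [np.asarray(x, dtype=float)] * nu)


def iterative_kernel(x, y, nu):
    """The same inner product computed in kernel space, one order per step."""
    k1 = float(np.dot(x, y))
    k = k1
    for _ in range(nu - 1):
        k *= k1  # raise the correlation order by one; cost independent of len(x)
    return k


x = np.array([1.0, 2.0, 0.5])
y = np.array([0.5, -1.0, 2.0])
explicit = tensor_features(x, 3) @ tensor_features(y, 3)  # 27-dimensional features
print(explicit, iterative_kernel(x, y, 3))
```

The explicit route needs 3**3 = 27 features here and would need len(x)**nu in general, while the kernel route takes nu - 1 multiplications.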
Affiliation(s)
- Filippo Bigi
- Laboratory of Computational Science and Modeling, Institut des Matériaux, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Sergey N Pozdnyakov
- Laboratory of Computational Science and Modeling, Institut des Matériaux, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Michele Ceriotti
- Laboratory of Computational Science and Modeling, Institut des Matériaux, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
6
Bi S, Knijff L, Lian X, van Hees A, Zhang C, Salanne M. Modeling of Nanomaterials for Supercapacitors: Beyond Carbon Electrodes. ACS Nano 2024; 18:19931-19949. [PMID: 39053903 PMCID: PMC11308780 DOI: 10.1021/acsnano.4c01787] [Received: 02/05/2024] [Revised: 04/08/2024] [Accepted: 04/23/2024] [Indexed: 07/27/2024]
Abstract
Capacitive storage devices allow for fast charge and discharge cycles, making them the perfect complements to batteries for high power applications. Many materials display interesting capacitive properties when they are put in contact with ionic solutions despite their very different structures and (surface) reactivity. Among them, nanocarbons are the most important for practical applications, but many nanomaterials have recently emerged, such as conductive metal-organic frameworks, 2D materials, and a wide variety of metal oxides. These heterogeneous and complex electrode materials are difficult to model with conventional approaches. However, the development of computational methods, the incorporation of machine learning techniques, and the increasing power in high performance computing now allow us to tackle these types of systems. In this Review, we summarize the current efforts in this direction. We show that depending on the nature of the materials and of the charging mechanisms, different methods, or combinations of them, can provide desirable atomic-scale insight on the interactions at play. We mainly focus on two important aspects: (i) the study of ion adsorption in complex nanoporous materials, which require the extension of constant potential molecular dynamics to multicomponent systems, and (ii) the characterization of Faradaic processes in pseudocapacitors, that involves the use of electronic structure-based methods. We also discuss how recently developed simulation methods will allow bridges to be made between double-layer capacitors and pseudocapacitors for future high power electricity storage devices.
Affiliation(s)
- Sheng Bi
- Physicochimie des Électrolytes et Nanosystèmes Interfaciaux, Sorbonne Université, CNRS, F-75005 Paris, France
- Réseau sur le Stockage Electrochimique de l’Energie (RS2E), FR CNRS 3459, 80039 Amiens Cedex, France
- Lisanne Knijff
- Department of Chemistry - Ångström Laboratory, Uppsala University, Lägerhyddsvägen 1, BOX 538, Uppsala 75121, Sweden
- Xiliang Lian
- Physicochimie des Électrolytes et Nanosystèmes Interfaciaux, Sorbonne Université, CNRS, F-75005 Paris, France
- Réseau sur le Stockage Electrochimique de l’Energie (RS2E), FR CNRS 3459, 80039 Amiens Cedex, France
- Alicia van Hees
- Department of Chemistry - Ångström Laboratory, Uppsala University, Lägerhyddsvägen 1, BOX 538, Uppsala 75121, Sweden
- Chao Zhang
- Department of Chemistry - Ångström Laboratory, Uppsala University, Lägerhyddsvägen 1, BOX 538, Uppsala 75121, Sweden
- Wallenberg Initiative Materials Science for Sustainability, Uppsala University, 75121 Uppsala, Sweden
- Mathieu Salanne
- Réseau sur le Stockage Electrochimique de l’Energie (RS2E), FR CNRS 3459, 80039 Amiens Cedex, France
- Institut Universitaire de France (IUF), 75231 Paris, France
7
Roy S, Dürholt JP, Asche TS, Zipoli F, Gómez-Bombarelli R. Learning a reactive potential for silica-water through uncertainty attribution. Nat Commun 2024; 15:6030. [PMID: 39019930 PMCID: PMC11254924 DOI: 10.1038/s41467-024-50407-9] [Received: 07/31/2023] [Accepted: 07/03/2024] [Indexed: 07/19/2024]
Abstract
The reactivity of silicates in aqueous solution is relevant to various chemistries, ranging from silicate minerals in geology to the C-S-H phase in cement, nanoporous zeolite catalysts, and highly porous precipitated silica. While simulations of chemical reactions can provide insight at the molecular level, balancing accuracy and scale in reactive simulations in the condensed phase is a challenge. Here, we demonstrate how a machine-learning reactive interatomic potential based on the PaiNN architecture can accurately capture silicate-water reactivity. The model was trained on a dataset comprising 400,000 energies and forces of molecular clusters at the ωB97X-D3/def2-TZVP level. To ensure the robustness of the model, we introduce a general active learning strategy based on the attribution of the model uncertainty, which automatically isolates uncertain regions of bulk simulations to be calculated as small clusters. The potential reproduces static and dynamic properties of liquid water and solid crystalline silicates, despite having been trained exclusively on cluster data. Furthermore, we use enhanced sampling simulations to accurately recover the self-ionization reactivity of water and the acidity of silicate oligomers, and finally study the silicate dimerization reaction in water at neutral conditions, finding that the reaction occurs through a flanking mechanism.
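The cluster-extraction step of the active-learning loop can be sketched as follows, under explicit assumptions: per-atom uncertainty is approximated by the spread of per-atom energies across a hypothetical ensemble of models (a common proxy, not necessarily the authors' attribution scheme), atoms above a chosen threshold become cluster centres, and each cluster contains every atom within a cutoff radius of its centre. Periodic boundary conditions, cluster capping and the quantum-chemistry relabelling step are omitted, and the threshold and cutoff values are illustrative.

```python
import numpy as np


def per_atom_uncertainty(atomic_energies):
    """Standard deviation across ensemble members for each atom.

    atomic_energies: (n_models, n_atoms) per-atom energy predictions.
    """
    return np.asarray(atomic_energies, dtype=float).std(axis=0)


def carve_uncertain_clusters(positions, uncertainty, threshold, cutoff):
    """Return centre indices and, for each centre, the atoms within `cutoff` of it.

    No periodic images are considered in this simplified sketch.
    """
    positions = np.asarray(positions, dtype=float)
    centres = np.flatnonzero(uncertainty > threshold)
    clusters = []
    for c in centres:
        dist = np.linalg.norm(positions - positions[c], axis=1)
        clusters.append(np.flatnonzero(dist <= cutoff))
    return centres, clusters


positions = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [5.0, 0.0, 0.0], [5.8, 0.0, 0.0]])
ensemble = np.array([[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]])  # models disagree on atom 2
sigma = per_atom_uncertainty(ensemble)
centres, clusters = carve_uncertain_clusters(positions, sigma, threshold=0.1, cutoff=1.5)
print(sigma, centres, clusters)
```

Only atom 2 exceeds the threshold, so a single cluster is carved around it, containing atom 2 and its neighbour 0.8 units away; the two distant atoms are left out.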
Affiliation(s)
- Swagata Roy
- Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Thomas S Asche
- Evonik Operations GmbH, Essen, North Rhine-Westphalia, Germany
- Federico Zipoli
- IBM Research Europe, Säumerstrasse 4, 8803, Rüschlikon, Switzerland
- Rafael Gómez-Bombarelli
- Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
8
de Blasio P, Elsborg J, Vegge T, Flores E, Bhowmik A. CALiSol-23: Experimental electrolyte conductivity data for various Li-salts and solvent combinations. Sci Data 2024; 11:750. [PMID: 38987528 PMCID: PMC11237020 DOI: 10.1038/s41597-024-03575-8] [Received: 11/15/2023] [Accepted: 06/26/2024] [Indexed: 07/12/2024]
Abstract
Ion transport in non-aqueous electrolytes is crucial for high-performance lithium-ion battery (LIB) development. The design of superior electrolytes requires extensive experimentation across the compositional space. To support data-driven accelerated electrolyte discovery efforts, we curated and analyzed a large dataset covering a wide range of experimentally recorded ionic conductivities for various combinations of lithium salts, solvents, concentrations, and temperatures. The dataset is named 'Conductivity Atlas for Lithium salts and Solvents' (CALiSol-23). Comprehensive datasets are lacking but are critical to building chemistry-agnostic machine learning models for conductivity, as well as for data-driven electrolyte optimization tasks. CALiSol-23 was derived from an exhaustive review of the literature on experimental measurements of non-aqueous electrolyte conductivity. The final dataset consists of 13,825 individual data points from 27 different experimental articles, in total covering 38 solvents, a broad temperature range, and 14 lithium salts. CALiSol-23 can help expedite the development of machine learning models that improve understanding of the complexities of ion transport and streamline the optimization of non-aqueous electrolyte mixtures.
Affiliation(s)
- Paolo de Blasio
- Technical University of Denmark, Department of Energy Conversion and Storage, Kgs. Lyngby, 2800, Denmark
- Jonas Elsborg
- Technical University of Denmark, Department of Energy Conversion and Storage, Kgs. Lyngby, 2800, Denmark
- Tejs Vegge
- Technical University of Denmark, Department of Energy Conversion and Storage, Kgs. Lyngby, 2800, Denmark
- Eibar Flores
- Technical University of Denmark, Department of Energy Conversion and Storage, Kgs. Lyngby, 2800, Denmark
- SINTEF Industry, Sustainable Energy Technology, Trondheim, 7034, Norway
- Arghya Bhowmik
- Technical University of Denmark, Department of Energy Conversion and Storage, Kgs. Lyngby, 2800, Denmark
9
Fisher KE, Herbst MF, Marzouk YM. Multitask methods for predicting molecular properties from heterogeneous data. J Chem Phys 2024; 161:014114. [PMID: 38958501 DOI: 10.1063/5.0201681] [Received: 01/31/2024] [Accepted: 06/12/2024] [Indexed: 07/04/2024]
Abstract
Data generation remains a bottleneck in training surrogate models to predict molecular properties. We demonstrate that multitask Gaussian process regression overcomes this limitation by leveraging both expensive and cheap data sources. In particular, we consider training sets constructed from coupled-cluster (CC) and density functional theory (DFT) data. We report that multitask surrogates can predict at CC-level accuracy with a reduction in data generation cost of over an order of magnitude. Of note, our approach allows the training set to include DFT data generated by a heterogeneous mix of exchange-correlation functionals without imposing any artificial hierarchy on functional accuracy. More generally, the multitask framework can accommodate a wider range of training set structures (including the full disparity between the different levels of fidelity) than existing kernel approaches based on Δ-learning, although we show that the accuracy of the two approaches can be similar. Consequently, multitask regression can be a tool for reducing data generation costs even further by opportunistically exploiting existing data sources.
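A minimal sketch of multitask Gaussian process regression, assuming the intrinsic coregionalization model, which is one common multitask kernel (the abstract does not say which construction the authors use): the covariance between an observation of task i at x and task j at x' is B[i, j] * k(x, x'), with k a squared-exponential kernel. The hyperparameters (B, length scale, noise) are fixed by hand rather than optimized, and the "expensive" and "cheap" functions are synthetic stand-ins for CC and DFT data.

```python
import numpy as np


def rbf(xa, xb, length=1.0):
    """Squared-exponential kernel between row-wise inputs xa (n, d) and xb (m, d)."""
    sq_dist = ((xa[:, None, :] - xb[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / length ** 2)


def multitask_gp_mean(X, tasks, y, X_star, tasks_star, B, length=1.0, noise=1e-6):
    """Posterior mean of an intrinsic-coregionalization multitask GP.

    tasks and tasks_star are integer task labels indexing the task covariance B.
    """
    K = B[np.ix_(tasks, tasks)] * rbf(X, X, length) + noise * np.eye(len(y))
    K_star = B[np.ix_(tasks_star, tasks)] * rbf(X_star, X, length)
    return K_star @ np.linalg.solve(K, y)


# Task 0 = expensive ("CC-like"), task 1 = cheap ("DFT-like"); synthetic data.
x_cheap = np.linspace(0.0, 5.0, 12)[:, None]
x_exp = np.array([[0.5], [2.0], [3.5], [4.5]])
y_cheap = np.sin(x_cheap[:, 0])
y_exp = np.sin(x_exp[:, 0]) + 0.1 * x_exp[:, 0]

X = np.vstack([x_exp, x_cheap])
tasks = np.array([0] * len(x_exp) + [1] * len(x_cheap))
y = np.concatenate([y_exp, y_cheap])
B = np.array([[1.0, 0.9], [0.9, 1.0]])  # assumed task covariance (positive definite)

x_query = np.linspace(0.0, 5.0, 6)[:, None]
pred_expensive = multitask_gp_mean(X, tasks, y, x_query, np.zeros(6, dtype=int), B)
print(pred_expensive)
```

The off-diagonal entry of B is what lets the many cheap points inform predictions for the expensive task; in practice B and the kernel hyperparameters would be learned by maximizing the marginal likelihood.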
Affiliation(s)
- K E Fisher
- Department of Aeronautics and Astronautics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- M F Herbst
- Mathematics for Materials Modelling, Institute of Mathematics and Institute of Materials, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- National Centre for Computational Design and Discovery of Novel Materials (MARVEL), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Y M Marzouk
- Department of Aeronautics and Astronautics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
10
Lange J, Anelli A, Alsenz J, Kuentz M, O’Dwyer PJ, Saal W, Wyttenbach N, Griffin BT. Comparative Analysis of Chemical Descriptors by Machine Learning Reveals Atomistic Insights into Solute-Lipid Interactions. Mol Pharm 2024; 21:3343-3355. [PMID: 38780534 PMCID: PMC11220795 DOI: 10.1021/acs.molpharmaceut.4c00080] [Received: 01/23/2024] [Revised: 05/07/2024] [Accepted: 05/07/2024] [Indexed: 05/25/2024]
Abstract
This study explores drug solubility in lipid excipients, an area that remains complex despite recent advances in understanding and predicting solubility from molecular structure. To this end, this research investigated novel descriptor sets, employing machine learning techniques to understand the determinants governing interactions between solutes and medium-chain triglycerides (MCTs). Quantitative structure-property relationship (QSPR) models were constructed on an extended solubility data set comprising 182 experimental values of structurally diverse drug molecules, including both development and marketed drugs, to extract meaningful property relationships. Four classes of molecular descriptors, ranging from traditional representations to complex geometrical descriptions, were assessed and compared in terms of their predictive accuracy and interpretability: two-dimensional (2D) and three-dimensional (3D) descriptors, Abraham solvation parameters, extended connectivity fingerprints (ECFPs), and the smooth overlap of atomic positions (SOAP) descriptor. Across three distinct regularized regression algorithms and various preprocessing schemes, the SOAP descriptor enabled the construction of the best-performing model in terms of interpretability and accuracy. Its atom-centered character allowed contributions to be estimated at the atomic level, thereby enabling the ranking of prevalent molecular motifs by their influence on drug solubility in MCTs. On a separate test set, the 2D and 3D, SOAP, and Abraham solvation descriptors gave high predictive accuracy (RMSE = 0.50), whereas the model trained on ECFP4 descriptors was less accurate. 
Lastly, uncertainty estimations for each model were introduced to assess their applicability domains and provide information on where the models may extrapolate in chemical space and, thus, where more data may be necessary to refine a data-driven approach to predict solubility in MCTs. Overall, the presented approaches further enable computationally informed formulation development by introducing a novel in silico approach for rational drug development and prediction of dose loading in lipids.
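Because SOAP is atom-centered, a linear model on molecule-level features formed by summing atomic feature vectors can be split back into per-atom contributions. The sketch below assumes exactly that construction (sum pooling and closed-form ridge regression without an intercept) on random synthetic "atomic descriptors"; the authors' pooling, preprocessing and regression algorithms may differ.

```python
import numpy as np


def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression weights (no intercept): (X^T X + alpha I)^-1 X^T y."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)


def atomic_contributions(atom_features, weights):
    """Per-atom contributions of a linear model on summed atom-centered features."""
    return atom_features @ weights


rng = np.random.default_rng(0)
n_features = 5
# Each synthetic molecule is an (n_atoms, n_features) array of atomic descriptors.
molecules = [rng.normal(size=(rng.integers(3, 9), n_features)) for _ in range(30)]
X = np.stack([m.sum(axis=0) for m in molecules])  # sum pooling over atoms
true_w = rng.normal(size=n_features)
y = X @ true_w  # noiseless synthetic target

w = ridge_fit(X, y, alpha=1e-8)
contrib = atomic_contributions(molecules[0], w)
print(contrib, contrib.sum(), X[0] @ w)
```

Since the model is linear and the pooling is a sum, the per-atom contributions add up exactly to the molecule's prediction; ranking atoms or motifs by these contributions is what makes such a model interpretable at the atomic level.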
Affiliation(s)
- Justus Johann Lange
- School of Pharmacy, University College Cork, College Road, Cork T12 R229, Cork County, Ireland
- Andrea Anelli
- Roche Pharma Research and Early Development, Therapeutic Modalities, Roche Innovation Center Basel, F. Hoffmann-La Roche Limited, Grenzacherstrasse 124, Basel 4070, Switzerland
- Jochem Alsenz
- Roche Pharma Research and Early Development, Therapeutic Modalities, Roche Innovation Center Basel, F. Hoffmann-La Roche Limited, Grenzacherstrasse 124, Basel 4070, Switzerland
- Martin Kuentz
- Institute of Pharma Technology, University of Applied Sciences and Arts Northwestern Switzerland, Hofackerstrasse 30, Muttenz CH-4231, Basel City, Switzerland
- Patrick J. O’Dwyer
- School of Pharmacy, University College Cork, College Road, Cork T12 R229, Cork County, Ireland
- Wiebke Saal
- Roche Pharma Research and Early Development, Therapeutic Modalities, Roche Innovation Center Basel, F. Hoffmann-La Roche Limited, Grenzacherstrasse 124, Basel 4070, Switzerland
- Nicole Wyttenbach
- Roche Pharma Research and Early Development, Therapeutic Modalities, Roche Innovation Center Basel, F. Hoffmann-La Roche Limited, Grenzacherstrasse 124, Basel 4070, Switzerland
- Brendan T. Griffin
- School of Pharmacy, University College Cork, College Road, Cork T12 R229, Cork County, Ireland
11
Aldossary A, Campos-Gonzalez-Angulo JA, Pablo-García S, Leong SX, Rajaonson EM, Thiede L, Tom G, Wang A, Avagliano D, Aspuru-Guzik A. In Silico Chemical Experiments in the Age of AI: From Quantum Chemistry to Machine Learning and Back. Adv Mater 2024; 36:e2402369. [PMID: 38794859 DOI: 10.1002/adma.202402369] [Received: 02/15/2024] [Revised: 04/28/2024] [Indexed: 05/26/2024]
Abstract
Computational chemistry is an indispensable tool for understanding molecules and predicting chemical properties. However, traditional computational methods face significant challenges due to the difficulty of solving the Schrödinger equation and the computational cost that increases with the size of the molecular system. In response, there has been a surge of interest in leveraging artificial intelligence (AI) and machine learning (ML) techniques for in silico experiments. Integrating AI and ML into computational chemistry increases the scalability and speed of chemical space exploration. However, challenges remain, particularly regarding the reproducibility and transferability of ML models. This review highlights the evolution of ML in learning from, complementing, or replacing traditional computational chemistry for energy and property predictions, tracing a path from models trained entirely on numerical data toward the ideal model that incorporates or learns the physical laws of quantum mechanics. This paper also reviews existing computational methods and ML models and their intertwining, outlines a roadmap for future research, and identifies areas for improvement and innovation. Ultimately, the goal is to develop AI architectures capable of predicting accurate and transferable solutions to the Schrödinger equation, thereby revolutionizing in silico experiments within chemistry and materials science.
Affiliation(s)
- Abdulrahman Aldossary
- Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, ON, M5S 3H6, Canada
- Sergio Pablo-García
- Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, ON, M5S 3H6, Canada
- Department of Computer Science, University of Toronto, 40 St. George Street, Toronto, ON, M5S 2E4, Canada
- Shi Xuan Leong
- Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, ON, M5S 3H6, Canada
- Ella Miray Rajaonson
- Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, ON, M5S 3H6, Canada
- Vector Institute for Artificial Intelligence, 661 University Ave. Suite 710, Toronto, ON, M5G 1M1, Canada
- Luca Thiede
- Department of Computer Science, University of Toronto, 40 St. George Street, Toronto, ON, M5S 2E4, Canada
- Vector Institute for Artificial Intelligence, 661 University Ave. Suite 710, Toronto, ON, M5G 1M1, Canada
- Gary Tom
- Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, ON, M5S 3H6, Canada
- Vector Institute for Artificial Intelligence, 661 University Ave. Suite 710, Toronto, ON, M5G 1M1, Canada
- Andrew Wang
- Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, ON, M5S 3H6, Canada
- Davide Avagliano
- Chimie ParisTech, PSL University, CNRS, Institute of Chemistry for Life and Health Sciences (iCLeHS UMR 8060), Paris, F-75005, France
- Alán Aspuru-Guzik
- Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, ON, M5S 3H6, Canada
- Department of Computer Science, University of Toronto, 40 St. George Street, Toronto, ON, M5S 2E4, Canada
- Vector Institute for Artificial Intelligence, 661 University Ave. Suite 710, Toronto, ON, M5G 1M1, Canada
- Department of Materials Science & Engineering, University of Toronto, 184 College St., Toronto, ON, M5S 3E4, Canada
- Department of Chemical Engineering & Applied Chemistry, University of Toronto, 200 College St., Toronto, ON, M5S 3E5, Canada
- Lebovic Fellow, Canadian Institute for Advanced Research (CIFAR), 66118 University Ave., Toronto, M5G 1M1, Canada
- Acceleration Consortium, 80 St George St, Toronto, M5S 3H6, Canada
12
|
Noda K, Shibuta Y. Predicting long-term trends in physical properties from short-term molecular dynamics simulations using long short-term memory. J Phys Condens Matter 2024; 36:385902. [PMID: 38870994 DOI: 10.1088/1361-648x/ad5821] [Received: 03/13/2024] [Accepted: 06/12/2024] [Indexed: 06/15/2024]
Abstract
This study proposes a novel long short-term memory (LSTM)-based model for predicting future physical properties from partial data of a molecular dynamics (MD) simulation. The model extracts latent vectors from the atomic coordinates of MD snapshots using a graph convolutional network, uses an LSTM to learn temporal trends in the latent vectors, and makes one-step-ahead predictions of physical properties through fully connected layers. Validated on MD simulations of Ni solid-liquid systems, the model with residual connections achieved accurate one-step-ahead predictions of the time variation of the potential energy during solidification and melting. Feeding predicted values back recursively enabled long-term prediction from just the first 20 snapshots of the MD simulation. The prediction captured the bending of the potential energy at low temperatures, which signals the completion of solidification, even though the short-time MD data do not exhibit such bending. Remarkably, for long-time prediction over 900 ps, the computation time was reduced to 1/700th of that of a full MD simulation of the same duration. This approach shows the potential to significantly reduce the computational cost of predicting physical properties by making efficient use of MD simulation data.
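The recursive long-term prediction described above can be illustrated with a minimal sketch. Here the paper's graph-convolution plus LSTM predictor is replaced by a hypothetical stand-in, a least-squares AR(1) model (`fit_ar1`). The sketch only shows the rollout: each prediction is appended to the input window for the next step. The function names and the synthetic relaxation series are assumptions, not the authors' code.

```python
from typing import Callable, List, Sequence

# A one-step predictor maps a window of past values to the next value.
OneStep = Callable[[Sequence[float]], float]


def fit_ar1(series: Sequence[float]) -> OneStep:
    """Hypothetical stand-in for the trained model: least-squares fit of x[t+1] = a*x[t] + b."""
    xs, ys = series[:-1], series[1:]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    # This AR(1) stand-in only looks at the most recent value in the window.
    return lambda window: a * window[-1] + b


def recursive_forecast(seed: Sequence[float], one_step: OneStep,
                       n_steps: int, window: int) -> List[float]:
    """Roll a one-step-ahead predictor forward by feeding its own outputs back in."""
    history = list(seed)
    predictions: List[float] = []
    for _ in range(n_steps):
        nxt = one_step(history[-window:])  # predict the next value from the latest window
        predictions.append(nxt)
        history.append(nxt)  # the prediction becomes input for the following step
    return predictions


# Synthetic "potential energy" relaxing toward -4.0; only the first 20 snapshots are "simulated".
seed = [-4.0 + 0.9 ** t for t in range(20)]
forecast = recursive_forecast(seed, fit_ar1(seed), n_steps=100, window=5)
```

Because errors are fed back, any bias in the one-step model compounds over the rollout. This is one reason the accuracy of the one-step predictor matters so much for long-horizon forecasts.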
Affiliation(s)
- Kota Noda
- Department of Materials Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
- Yasushi Shibuta
- Department of Materials Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
13
Weymuth T, Unsleber JP, Türtscher PL, Steiner M, Sobez JG, Müller CH, Mörchen M, Klasovita V, Grimmel SA, Eckhoff M, Csizi KS, Bosia F, Bensberg M, Reiher M. SCINE-Software for chemical interaction networks. J Chem Phys 2024; 160:222501. [PMID: 38857173 DOI: 10.1063/5.0206974] [Received: 03/05/2024] [Accepted: 05/09/2024] [Indexed: 06/12/2024]
Abstract
The software for chemical interaction networks (SCINE) project aims to push quantum chemical calculations on molecular structures to a new level. Calculations on individual structures, and on simple relations between them, have become routine in chemistry, and new developments have pushed the frontier of the field toward high-throughput calculations. Chemical relations may arise from a search for specific molecular properties in a molecular design effort, or they can be defined by a set of elementary reaction steps that form a chemical reaction network. The software modules of SCINE have been designed to facilitate such studies. Their features are as follows:

(i) General applicability of the methodologies, ranging from electronic structure (no restriction to specific elements of the periodic table) to microkinetic modeling (with few restrictions on molecularity), together with full modularity. SCINE modules can be run as stand-alone programs or be exchanged for external software packages that serve a similar purpose. This increases the options for computational campaigns and provides alternatives for tasks that are hard or impossible to accomplish with certain programs.

(ii) High stability and autonomous operation, so that control and steering by an operator are as easy as possible.

(iii) Easy embedding into complex heterogeneous environments for molecular structures, whether taken individually or in the context of a reaction network.

A graphical user interface unites all modules and ensures interoperability. All components of the software are open source and free of charge.
Affiliation(s)
- Thomas Weymuth
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
- Jan P Unsleber
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
- Paul L Türtscher
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
- Miguel Steiner
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
- Jan-Grimo Sobez
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
- Charlotte H Müller
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
- Maximilian Mörchen
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
- Veronika Klasovita
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
- Stephanie A Grimmel
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
- Marco Eckhoff
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
- Katja-Sophia Csizi
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
- Francesco Bosia
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
- Moritz Bensberg
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
- Markus Reiher
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
14
Yang Y, Zhang S, Ranasinghe KD, Isayev O, Roitberg AE. Machine Learning of Reactive Potentials. Annu Rev Phys Chem 2024; 75:371-395. [PMID: 38941524 DOI: 10.1146/annurev-physchem-062123-024417] [Indexed: 06/30/2024]
Abstract
In the past two decades, machine learning potentials (MLPs) have driven significant developments in the chemical, biological, and materials sciences. Once constructed and trained, MLPs enable fast and accurate simulation and analysis of thermodynamic and kinetic properties. This review focuses on the application of MLPs to reactive systems in which bonds break and form. We review the development of MLP models, primarily those based on neural network and kernel algorithms, and recent applications of reactive MLPs (RMLPs) to systems at different scales. We show how RMLPs are constructed, how they speed up the calculation of reactive dynamics, and how they facilitate the study of reaction trajectories, reaction rates, free energies, and many other quantities. Different data sampling strategies for building RMLPs are also discussed, with a focus on how to collect structures for rare events and how to further improve RMLP performance with active learning.
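A common way to realize the active-learning step mentioned above is query by committee. Several independently trained potentials predict the energy of each candidate structure, and the structures on which they disagree most are sent for reference (for example, DFT) labeling. The sketch below is a hypothetical illustration of that selection rule; it is not a method taken from the review. The toy "models" and the one-number "structures" are assumptions.

```python
import statistics
from typing import Callable, List, Sequence

# Hypothetical: a "model" maps a structure descriptor to a predicted energy.
Model = Callable[[Sequence[float]], float]


def committee_disagreement(models: Sequence[Model], x: Sequence[float]) -> float:
    """Population standard deviation of the committee's energy predictions for one structure."""
    return statistics.pstdev([m(x) for m in models])


def select_for_labeling(models: Sequence[Model], pool: Sequence[Sequence[float]],
                        k: int, threshold: float = 0.0) -> List[int]:
    """Return indices of up to k pool structures with the largest disagreement above threshold."""
    scored = sorted(((committee_disagreement(models, x), i) for i, x in enumerate(pool)),
                    reverse=True)
    return [i for score, i in scored[:k] if score > threshold]


# Toy committee: three models that agree near x = 0 and diverge as |x| grows,
# mimicking poor training coverage of rare, strongly distorted configurations.
committee = [lambda x, s=s: s * x[0] ** 2 for s in (0.9, 1.0, 1.1)]
pool = [[0.1], [2.0], [1.0]]
chosen = select_for_labeling(committee, pool, k=2)  # largest disagreement first
```

The threshold guards against wasting reference calculations on structures that the committee already describes consistently. In practice it would be tuned to the accuracy target of the potential.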
Affiliation(s)
- Yinuo Yang
- Department of Chemistry, University of Florida, Gainesville, Florida
- Shuhao Zhang
- Department of Chemistry, Carnegie Mellon University, Pittsburgh, Pennsylvania
- Olexandr Isayev
- Department of Chemistry, Carnegie Mellon University, Pittsburgh, Pennsylvania
- Adrian E Roitberg
- Department of Chemistry, University of Florida, Gainesville, Florida
15
Zarrouk T, Ibragimova R, Bartók AP, Caro MA. Experiment-Driven Atomistic Materials Modeling: A Case Study Combining X-Ray Photoelectron Spectroscopy and Machine Learning Potentials to Infer the Structure of Oxygen-Rich Amorphous Carbon. J Am Chem Soc 2024; 146:14645-14659. [PMID: 38749497 PMCID: PMC11140750 DOI: 10.1021/jacs.4c01897] [Received: 02/06/2024] [Revised: 05/02/2024] [Accepted: 05/03/2024] [Indexed: 05/30/2024]
Abstract
An important yet challenging aspect of atomistic materials modeling is reconciling experimental and computational results. Conventional approaches generate numerous configurations through molecular dynamics or Monte Carlo structure optimization and select the one that most closely matches experiment. However, this process is inefficient and not guaranteed to succeed. We introduce a general method that combines atomistic machine learning (ML) with experimental observables to produce atomistic structures that are compatible with experiment by design. We use this approach together with grand-canonical Monte Carlo within a modified Hamiltonian formalism to generate configurations that agree with experimental data and are chemically sound (low in energy). We apply it to oxygenated amorphous carbon (a-COx), an intriguing carbon-based material, to understand its atomistic structure and to answer the question of how much oxygen can be added to carbon before it fully decomposes into CO and CO2. We use an ML-based X-ray photoelectron spectroscopy (XPS) model trained on GW and density functional theory (DFT) data together with an ML interatomic potential. With these, we identify a-COx structures whose predicted XPS spectra are consistent with experiment and that are also energetically favorable with respect to DFT. Employing a network analysis, we accurately deconvolve the XPS spectrum into motif contributions. This reveals the inaccuracies inherent in experimental XPS interpretation and gives us atomistic insight into the structure of a-COx. The method generalizes to multiple experimental observables and allows the atomistic structure of materials to be elucidated directly from experimental data. It thereby enables experiment-driven materials modeling with a degree of realism previously out of reach.
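The idea of biasing the sampling toward experiment-compatible structures can be sketched as a Metropolis acceptance test on a modified energy, E_mod = E_pot + λ·(spectral mismatch). This is a simplified, hypothetical illustration. It uses canonical rather than grand-canonical acceptance, a squared-error mismatch between a predicted and an experimental spectrum on a shared grid, and an arbitrary weight λ. The paper's actual formalism and XPS model are not reproduced here.

```python
import math
import random
from typing import Sequence


def modified_energy(e_pot: float, predicted_spectrum: Sequence[float],
                    experimental_spectrum: Sequence[float], weight: float) -> float:
    """Potential energy plus a weighted squared mismatch to the experimental spectrum (assumed form)."""
    mismatch = sum((p - e) ** 2 for p, e in zip(predicted_spectrum, experimental_spectrum))
    return e_pot + weight * mismatch


def metropolis_accept(e_old: float, e_new: float, kT: float, rng: random.Random) -> bool:
    """Standard Metropolis criterion, applied here to the modified energy."""
    if e_new <= e_old:
        return True
    return rng.random() < math.exp(-(e_new - e_old) / kT)


# A trial structure that is slightly lower in potential energy but fits the spectrum worse
# ends up higher in modified energy, so it is accepted only with a Boltzmann-like probability.
target = [0.0, 1.0, 0.5]
e_current = modified_energy(-10.0, [0.0, 0.9, 0.5], target, weight=5.0)
e_trial = modified_energy(-10.2, [0.4, 0.3, 0.9], target, weight=5.0)
```

The weight sets the balance between chemical soundness (low energy) and agreement with experiment. Too large a weight can favor unphysical structures that merely reproduce the spectrum.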
Affiliation(s)
- Tigany Zarrouk
- Department of Chemistry and Materials Science, Aalto University, Espoo 02150, Finland
- Rina Ibragimova
- Department of Chemistry and Materials Science, Aalto University, Espoo 02150, Finland
- Albert P. Bartók
- Department of Physics, University of Warwick, Coventry CV4 7AL, U.K.
- Warwick Centre for Predictive Modelling, School of Engineering, University of Warwick, Coventry CV4 7AL, U.K.
- Miguel A. Caro
- Department of Chemistry and Materials Science, Aalto University, Espoo 02150, Finland
16
Wang G, Wang C, Zhang X, Li Z, Zhou J, Sun Z. Machine learning interatomic potential: Bridge the gap between small-scale models and realistic device-scale simulations. iScience 2024; 27:109673. [PMID: 38646181 PMCID: PMC11033164 DOI: 10.1016/j.isci.2024.109673] [Indexed: 04/23/2024]
Abstract
Machine learning interatomic potentials (MLIPs) overcome both the high computational cost of density functional theory and the relatively low accuracy of classical large-scale molecular dynamics. They thereby enable more efficient and precise simulations in materials research and design. This review discusses the current state of the four essential stages of MLIP development: data generation methods, material structure descriptors, six distinct machine learning algorithms, and available software. It also surveys applications of MLIPs in various fields, notably phase-change memory materials, structure searching, material property prediction, and pre-trained universal models. Finally, it discusses future perspectives, including standard datasets, transferability, generalization, and the trade-off between accuracy and complexity in MLIPs.
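As an illustration of what a material structure descriptor can look like, the sketch below computes a Behler-Parrinello-style radial symmetry function, G2 = Σ_j exp(-η (r_ij - r_s)²) f_c(r_ij), with a cosine cutoff f_c. This is one common descriptor family, chosen here for illustration only; the review covers several. The parameter values and the toy atomic environment are arbitrary assumptions.

```python
import math
from typing import Iterable, Sequence


def cosine_cutoff(r: float, r_cut: float) -> float:
    """Smooth cutoff f_c(r) = 0.5*(cos(pi*r/r_cut) + 1) inside r_cut, zero beyond it."""
    if r >= r_cut:
        return 0.0
    return 0.5 * (math.cos(math.pi * r / r_cut) + 1.0)


def radial_symmetry_function(center: Sequence[float], neighbors: Iterable[Sequence[float]],
                             eta: float, r_s: float, r_cut: float) -> float:
    """G2 = sum over neighbors j of exp(-eta * (r_ij - r_s)**2) * f_c(r_ij)."""
    total = 0.0
    for pos in neighbors:
        r = math.dist(center, pos)
        total += math.exp(-eta * (r - r_s) ** 2) * cosine_cutoff(r, r_cut)
    return total


# The value depends only on interatomic distances, so it is invariant to
# rigid translations (and rotations) of the whole local environment.
atom = (0.0, 0.0, 0.0)
env = [(1.0, 0.0, 0.0), (0.0, 1.5, 0.0), (0.0, 0.0, 7.0)]  # the last neighbor lies beyond r_cut
g = radial_symmetry_function(atom, env, eta=0.5, r_s=1.0, r_cut=5.0)
shifted = radial_symmetry_function((2.0, 2.0, 2.0),
                                   [(x + 2.0, y + 2.0, z + 2.0) for x, y, z in env],
                                   eta=0.5, r_s=1.0, r_cut=5.0)
```

In a full MLIP, a vector of such functions with different (η, r_s) values, usually together with angular terms, describes each atom's environment. That vector is the input to the regression model.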
Affiliation(s)
- Guanjie Wang
- School of Materials Science and Engineering, Beihang University, Beijing 100191, China
- School of Integrated Circuit Science and Engineering, Beihang University, Beijing 100191, China
- Changrui Wang
- School of Materials Science and Engineering, Beihang University, Beijing 100191, China
- Xuanguang Zhang
- School of Materials Science and Engineering, Beihang University, Beijing 100191, China
- Zefeng Li
- School of Materials Science and Engineering, Beihang University, Beijing 100191, China
- Jian Zhou
- School of Materials Science and Engineering, Beihang University, Beijing 100191, China
- Zhimei Sun
- School of Materials Science and Engineering, Beihang University, Beijing 100191, China
17
van Gerwen P, Briling KR, Calvino Alonso Y, Franke M, Corminboeuf C. Benchmarking machine-readable vectors of chemical reactions on computed activation barriers. Digit Discov 2024; 3:932-943. [PMID: 38756222 PMCID: PMC11094696 DOI: 10.1039/d3dd00175j] [Received: 09/06/2023] [Accepted: 02/28/2024] [Indexed: 05/18/2024]
Abstract
In recent years, there has been a surge of interest in predicting computed activation barriers to accelerate the automated exploration of reaction networks. Consequently, various predictive approaches have emerged, ranging from graph-based models to methods based on the three-dimensional structures of reactants and products. In parallel, many representations have been developed to predict experimental targets, and these may hold promise for barrier prediction as well. Here, we bring these efforts together and benchmark various methods on three diverse datasets for the prediction of computed activation barriers. The methods are Morgan fingerprints, DRFP, CGR representation-based Chemprop, SLATM_d, B2R2_l, EquiReact, and the BERT + RXNFP language model.
Affiliation(s)
- Puck van Gerwen
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne 1015 Lausanne Switzerland
- National Center for Competence in Research-Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne 1015 Lausanne Switzerland
- Ksenia R Briling
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne 1015 Lausanne Switzerland
- Yannick Calvino Alonso
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne 1015 Lausanne Switzerland
- Malte Franke
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne 1015 Lausanne Switzerland
- Clemence Corminboeuf
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne 1015 Lausanne Switzerland
- National Center for Competence in Research-Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne 1015 Lausanne Switzerland
18
Steiner M, Reiher M. A human-machine interface for automatic exploration of chemical reaction networks. Nat Commun 2024; 15:3680. [PMID: 38693117 PMCID: PMC11063077 DOI: 10.1038/s41467-024-47997-9] [Received: 08/31/2023] [Accepted: 04/15/2024] [Indexed: 05/03/2024]
Abstract
Autonomous reaction network exploration algorithms offer a systematic way to explore the mechanisms of complex chemical processes. However, the resulting reaction networks are so vast that exploring all potentially accessible intermediates is computationally too demanding. This renders brute-force explorations infeasible. At the same time, explorations with completely predefined intermediates or hard-wired chemical constraints, such as element-specific coordination numbers, are not flexible enough for complex chemical systems. Here, we introduce a STEERING WHEEL to guide an otherwise unbiased automated exploration. The STEERING WHEEL algorithm is intuitive, generally applicable, and enables one to focus on specific regions of an emerging network. It also allows for guiding automated data generation in the context of mechanism exploration, catalyst design, and other chemical optimization challenges. The algorithm is demonstrated for reaction mechanism elucidation of transition metal catalysts. We highlight how to explore catalytic cycles in a systematic and reproducible way. The exploration objectives are fully adjustable, allowing one to harness the STEERING WHEEL both for structure-specific (accurate) calculations and for broad high-throughput screening of possible reaction intermediates.
Affiliation(s)
- Miguel Steiner
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 2, 8093, Zurich, Switzerland
- ETH Zurich, NCCR Catalysis, Vladimir-Prelog-Weg 2, 8093, Zurich, Switzerland
- Markus Reiher
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 2, 8093, Zurich, Switzerland
- ETH Zurich, NCCR Catalysis, Vladimir-Prelog-Weg 2, 8093, Zurich, Switzerland
19
Wan K, He J, Shi X. Construction of High Accuracy Machine Learning Interatomic Potential for Surface/Interface of Nanomaterials-A Review. Adv Mater 2024; 36:e2305758. [PMID: 37640376 DOI: 10.1002/adma.202305758] [Received: 06/15/2023] [Revised: 08/24/2023] [Indexed: 08/31/2023]
Abstract
The inherent discontinuity and unique dimensional attributes of nanomaterial surfaces and interfaces give rise to various exceptional properties. These properties, however, also make both experimental and computational studies difficult. Machine learning interatomic potentials (MLIPs) address some of the limitations of empirical force fields and offer a valuable route to accurate simulations of nanomaterial surfaces and interfaces. Central to this approach is capturing the relationship between system configuration and potential energy, leveraging the ability of machine learning (ML) to approximate high-dimensional functions precisely. This review offers an in-depth examination of MLIP principles and their implementation and elaborates on their applications to nanomaterial surface and interface systems. The prevailing challenges faced by this powerful methodology are also discussed.
Affiliation(s)
- Kaiwei Wan
- Laboratory of Theoretical and Computational Nanoscience, National Center for Nanoscience and Technology, Chinese Academy of Sciences, Beijing, 100190, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Jianxin He
- Laboratory of Theoretical and Computational Nanoscience, National Center for Nanoscience and Technology, Chinese Academy of Sciences, Beijing, 100190, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Xinghua Shi
- Laboratory of Theoretical and Computational Nanoscience, National Center for Nanoscience and Technology, Chinese Academy of Sciences, Beijing, 100190, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
20
Ge F, Wang R, Qu C, Zheng P, Nandi A, Conte R, Houston PL, Bowman JM, Dral PO. Tell Machine Learning Potentials What They Are Needed For: Simulation-Oriented Training Exemplified for Glycine. J Phys Chem Lett 2024; 15:4451-4460. [PMID: 38626460 DOI: 10.1021/acs.jpclett.4c00746] [Indexed: 04/18/2024]
Abstract
Machine learning potentials (MLPs) are widely applied as an efficient alternative for representing potential energy surfaces (PESs) in many chemical simulations. MLPs are often evaluated by the root-mean-square error (RMSE) on a test set drawn from the same distribution as the training data. Here, we systematically investigate the relationship between such test errors and the accuracy of simulations with MLPs, using the example of a full-dimensional, global PES for the amino acid glycine. Our results show that test-set errors do not unambiguously reflect MLP performance in different simulation tasks, such as relative conformer energies, barriers, vibrational levels, and zero-point vibrational energies. We also offer an easily accessible solution for improving MLP quality in a simulation-oriented manner, which yields the most precise relative conformer energies and barriers. This solution also passed the stringent test of diffusion Monte Carlo simulations.
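Toy numbers show why a low test-set RMSE need not translate into accurate relative conformer energies. The values below are invented for illustration and are not taken from the paper. A model that is off by a constant shift has a sizable RMSE but exact relative energies. A model with a smaller RMSE can still distort a conformer gap. The helper names are hypothetical.

```python
import math
from typing import List, Sequence


def rmse(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Root-mean-square error over paired predictions and references."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))


def relative_energy_errors(pred: Sequence[float], ref: Sequence[float],
                           anchor: int = 0) -> List[float]:
    """Errors in energies measured relative to a chosen anchor conformer."""
    return [(p - pred[anchor]) - (r - ref[anchor]) for p, r in zip(pred, ref)]


reference = [0.0, 1.0, 2.0]        # toy conformer energies (arbitrary units)
shifted_model = [0.5, 1.5, 2.5]    # constant offset everywhere
scattered_model = [0.2, 0.8, 2.2]  # smaller pointwise errors, but of different signs

for name, pred in (("shifted", shifted_model), ("scattered", scattered_model)):
    print(name, round(rmse(pred, reference), 3),
          [round(e, 3) for e in relative_energy_errors(pred, reference)])
```

Here the model with the lower RMSE has the larger error in the conformer gap. This mirrors, in miniature, why simulation-relevant quantities need to be checked directly.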
Affiliation(s)
- Fuchun Ge
- State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
- Ran Wang
- State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
- Chen Qu
- Independent Researcher, Toronto, Ontario M9B0E3, Canada
- Peikun Zheng
- State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
- Apurba Nandi
- Department of Chemistry and Cherry L. Emerson Center for Scientific Computation, Emory University, Atlanta, Georgia 30322, United States
- Department of Physics and Materials Science, University of Luxembourg, Luxembourg City L-1511, Luxembourg
- Riccardo Conte
- Dipartimento di Chimica, Università degli Studi di Milano, via Golgi 19, 20133 Milano, Italy
- Paul L Houston
- Department of Chemistry and Chemical Biology, Cornell University, Ithaca, New York 14853, United States
- Joel M Bowman
- Department of Chemistry and Cherry L. Emerson Center for Scientific Computation, Emory University, Atlanta, Georgia 30322, United States
- Pavlo O Dral
- State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
21
Žugec I, Geilhufe RM, Lončarić I. Global machine learning potentials for molecular crystals. J Chem Phys 2024; 160:154106. [PMID: 38624120 DOI: 10.1063/5.0196232] [Received: 01/05/2024] [Accepted: 03/29/2024] [Indexed: 04/17/2024]
Abstract
Molecular crystals are difficult to model with accurate first-principles methods because of their large unit cells. At the same time, accurate modeling is required because polymorphs often differ in energy by only 1 kJ/mol. Machine learning interatomic potentials promise the accuracy of the baseline first-principles method at a cost lower by orders of magnitude. Using existing databases of density functional theory calculations for molecular crystals and molecules, we train global machine learning interatomic potentials that are usable for any molecular crystal. We test the potentials on experimental benchmarks and show that they perform better than classical force fields and, in some cases, comparably to density functional theory calculations.
Affiliation(s)
- Ivan Žugec
- Centro de Física de Materiales CFM/MPC (CSIC-UPV/EHU), Donostia-San Sebastián, Spain
- R Matthias Geilhufe
- Department of Physics, Chalmers University of Technology, Gothenburg, Sweden
- Ivor Lončarić
- Ruđer Bošković Institute, Bijenička 54, Zagreb, Croatia
22
Gou Q, Liu J, Su H, Guo Y, Chen J, Zhao X, Pu X. Exploring an accurate machine learning model to quickly estimate stability of diverse energetic materials. iScience 2024; 27:109452. [PMID: 38523799 PMCID: PMC10960145 DOI: 10.1016/j.isci.2024.109452] [Received: 12/18/2023] [Revised: 01/27/2024] [Accepted: 03/06/2024] [Indexed: 03/26/2024]
Abstract
High energy and low sensitivity have been the focus in developing new energetic materials (EMs). However, a quick and accurate method for evaluating the stability of diverse EMs has been lacking. Here, we develop a highly accurate machine learning model for predicting the bond dissociation energy (BDE) of EMs. A reliable and representative BDE dataset of EMs is constructed by collecting 778 experimental energetic compounds and performing quantum mechanics calculations. To characterize the BDE of EMs adequately, a hybrid feature representation is proposed that couples the local target bond with global structural characteristics. To alleviate the limitation of the small dataset, pairwise difference regression is used for data augmentation, with the advantages of reducing systematic errors and improving diversity. Benefiting from these improvements, the XGBoost model achieves the best prediction accuracy, with an R² of 0.98 and an MAE of 8.8 kJ mol⁻¹, significantly outperforming competing models.
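Pairwise difference regression, used above for data augmentation, trains a model on differences between pairs of samples rather than on raw targets. Thus n training molecules yield n(n-1) ordered pairs. The prediction for a new molecule is the average of y_j + Δ̂(x, x_j) over the training anchors j. The sketch below shows only the pair construction and the aggregation step. The feature layout [x_i, x_j, x_i - x_j] is one common choice assumed here. The "difference model" is a hypothetical placeholder for the XGBoost regressor used in the paper.

```python
from itertools import permutations
from statistics import mean
from typing import Callable, List, Sequence, Tuple

# Hypothetical placeholder for a trained regressor that predicts y_i - y_j from pair features.
DiffModel = Callable[[List[float]], float]


def pair_features(xi: Sequence[float], xj: Sequence[float]) -> List[float]:
    """Assumed pair layout: [x_i, x_j, x_i - x_j]."""
    return list(xi) + list(xj) + [a - b for a, b in zip(xi, xj)]


def make_pairs(X: Sequence[Sequence[float]],
               y: Sequence[float]) -> Tuple[List[List[float]], List[float]]:
    """Augment n samples into n*(n-1) ordered pairs with difference targets y_i - y_j."""
    feats: List[List[float]] = []
    targets: List[float] = []
    for i, j in permutations(range(len(X)), 2):
        feats.append(pair_features(X[i], X[j]))
        targets.append(y[i] - y[j])
    return feats, targets


def predict_from_differences(x_new: Sequence[float], X_train: Sequence[Sequence[float]],
                             y_train: Sequence[float], diff_model: DiffModel) -> float:
    """Average y_j + predicted (y_new - y_j) over all training anchors j."""
    return mean(yj + diff_model(pair_features(x_new, xj)) for xj, yj in zip(X_train, y_train))


X = [[1.0], [2.0], [4.0]]
y = [2.0, 4.0, 8.0]              # toy targets following y = 2x
pairs, diffs = make_pairs(X, y)  # 3 samples -> 6 ordered pairs


def exact_difference_model(f: List[float]) -> float:
    # In this 1-D toy problem the last feature is x_new - x_j, and the true difference is 2*(x_new - x_j).
    return 2.0 * f[-1]


estimate = predict_from_differences([3.0], X, y, exact_difference_model)
```

Averaging over many anchors tends to cancel errors that are shared across molecules. This is consistent with the abstract's point that the approach reduces systematic errors.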
Affiliation(s)
- Qiaolin Gou
- College of Chemistry, Sichuan University, Chengdu 610064, China
- Jing Liu
- College of Chemistry, Sichuan University, Chengdu 610064, China
- Haoming Su
- College of Chemistry, Sichuan University, Chengdu 610064, China
- Yanzhi Guo
- College of Chemistry, Sichuan University, Chengdu 610064, China
- Jiayi Chen
- College of Chemistry, Sichuan University, Chengdu 610064, China
- Xueyan Zhao
- Institute of Chemical Materials, China Academy of Engineering Physics, Mianyang 621900, China
- Xuemei Pu
- College of Chemistry, Sichuan University, Chengdu 610064, China
23
Pang K, Wen M, Chang X, Xu Y, Chu Q, Chen D. The thermal decomposition mechanism of RDX/AP composites: ab initio neural network MD simulations. Phys Chem Chem Phys 2024; 26:11545-11557. [PMID: 38532730 DOI: 10.1039/d3cp05709g] [Indexed: 03/28/2024]
Abstract
A neural network potential (NNP) is developed to investigate the decomposition mechanism of RDX, AP, and their composites. Using an ab initio dataset, the NNP is evaluated in terms of atomic energies and forces and shows strong agreement with ab initio calculations. Numerical stability tests across a range of timesteps reveal excellent stability compared with state-of-the-art ReaxFF models. The thermal decomposition of pure RDX, pure AP, and RDX/AP composites is then simulated with the NNP to explore the coupling between RDX and AP. The results reveal a two-way interaction: AP accelerates RDX decomposition, particularly at low temperatures, and RDX promotes AP decomposition. Analysis of RDX trajectories at the RDX/AP interface uncovers a three-part decomposition mechanism. The three parts are N-N bond cleavage, H transfer with AP to form a Cl-containing acid, and chain-breaking reactions that generate small molecules such as N2, CO, and CO2. The presence of AP enhances H-transfer reactions, which contributes to its role in promoting RDX decomposition. This work studies the reaction kinetics of RDX/AP composites from an atomistic point of view. The approach can be widely used to establish reaction kinetics models of composite systems containing energetic materials.
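A numerical stability test across timesteps, like the one mentioned above, typically monitors energy conservation in constant-energy (NVE) molecular dynamics. The minimal sketch below uses velocity Verlet on a one-dimensional harmonic oscillator as a hypothetical stand-in for the NNP force call. The function name and parameters are illustrative. A real test would use the potential's own forces on the full system.

```python
def max_relative_energy_error(dt: float, n_steps: int, k: float = 1.0, m: float = 1.0,
                              x0: float = 1.0, v0: float = 0.0) -> float:
    """Largest |E(t) - E(0)| / E(0) observed during an NVE velocity-Verlet run."""
    def force(x: float) -> float:
        return -k * x  # hypothetical stand-in for the NNP force evaluation

    x, v = x0, v0
    e0 = 0.5 * m * v * v + 0.5 * k * x * x
    a = force(x) / m
    worst = 0.0
    for _ in range(n_steps):
        x += v * dt + 0.5 * a * dt * dt  # position update
        a_new = force(x) / m             # forces at the new position
        v += 0.5 * (a + a_new) * dt      # velocity update with the averaged acceleration
        a = a_new
        e = 0.5 * m * v * v + 0.5 * k * x * x
        worst = max(worst, abs(e - e0) / e0)
    return worst


# Energy error grows with the timestep; for this oscillator (omega = 1) the
# integrator becomes unstable once dt exceeds 2 / omega.
for dt in (0.01, 0.1, 0.5):
    print(f"dt = {dt}: max relative energy error = {max_relative_energy_error(dt, 1000):.2e}")
```

For a learned potential, an unexpectedly large or growing drift at a modest timestep often points to rough regions of the fitted energy surface rather than to the integrator itself.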
Affiliation(s)
- Kehui Pang
- State Key Laboratory of Explosion Science and Safety Protection, Beijing Institute of Technology, Beijing 100081, P. R. China
- Mingjie Wen
- State Key Laboratory of Explosion Science and Safety Protection, Beijing Institute of Technology, Beijing 100081, P. R. China
- Xiaoya Chang
- State Key Laboratory of Explosion Science and Safety Protection, Beijing Institute of Technology, Beijing 100081, P. R. China
- Yabei Xu
- State Key Laboratory of Explosion Science and Safety Protection, Beijing Institute of Technology, Beijing 100081, P. R. China
- Qingzhao Chu
- State Key Laboratory of Explosion Science and Safety Protection, Beijing Institute of Technology, Beijing 100081, P. R. China
- Dongping Chen
- State Key Laboratory of Explosion Science and Safety Protection, Beijing Institute of Technology, Beijing 100081, P. R. China
24
Cignoni E, Suman D, Nigam J, Cupellini L, Mennucci B, Ceriotti M. Electronic Excited States from Physically Constrained Machine Learning. ACS Cent Sci 2024; 10:637-648. [PMID: 38559300 PMCID: PMC10979507 DOI: 10.1021/acscentsci.3c01480] [Received: 11/30/2023] [Revised: 01/16/2024] [Accepted: 01/30/2024] [Indexed: 04/04/2024]
Abstract
Data-driven techniques are increasingly used to replace electronic-structure calculations of matter. In this context, a relevant question is whether machine learning (ML) should be applied directly to predict the desired properties or combined explicitly with physically grounded operations. We present an example of an integrated modeling approach in which a symmetry-adapted ML model of an effective Hamiltonian is trained to reproduce electronic excitations from a quantum-mechanical calculation. The resulting model can make predictions for molecules that are much larger and more complex than those on which it is trained and allows for dramatic computational savings by indirectly targeting the outputs of well-converged calculations while using a parametrization corresponding to a minimal atom-centered basis. These results emphasize the merits of intertwining data-driven techniques with physical approximations, improving the transferability and interpretability of ML models without affecting their accuracy and computational efficiency and providing a blueprint for developing ML-augmented electronic-structure methods.
Affiliation(s)
- Edoardo Cignoni
- Dipartimento di Chimica e Chimica Industriale, Università di Pisa, 56126 Pisa, Italy
- Divya Suman
- Laboratory of Computational Science and Modeling, Institut des Matériaux, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Jigyasa Nigam
- Laboratory of Computational Science and Modeling, Institut des Matériaux, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Lorenzo Cupellini
- Dipartimento di Chimica e Chimica Industriale, Università di Pisa, 56126 Pisa, Italy
- Benedetta Mennucci
- Dipartimento di Chimica e Chimica Industriale, Università di Pisa, 56126 Pisa, Italy
- Michele Ceriotti
- Laboratory of Computational Science and Modeling, Institut des Matériaux, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States

25
Liu S. Harvesting Chemical Understanding with Machine Learning and Quantum Computers. ACS Phys Chem Au 2024; 4:135-142. [PMID: 38560751 PMCID: PMC10979482 DOI: 10.1021/acsphyschemau.3c00067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2023] [Revised: 12/29/2023] [Accepted: 01/02/2024] [Indexed: 04/04/2024]
Abstract
It is tenable to argue that nobody can predict the future with certainty, yet one can learn from the past and make informed projections for the years ahead. In this Perspective, we overview the status of how theory and computation can be exploited to obtain chemical understanding from wave function theory and density functional theory, and then outlook the likely impact of machine learning (ML) and quantum computers (QC) to appreciate traditional chemical concepts in decades to come. It is maintained that the development and maturation of ML and QC methods in theoretical and computational chemistry represent two paradigm shifts about how the Schrödinger equation can be solved. New chemical understanding can be harnessed in these two new paradigms by making respective use of ML features and QC qubits. Before that happens, however, we still have hurdles to face and obstacles to overcome in both ML and QC arenas. Possible pathways to tackle these challenges are proposed. We anticipate that hierarchical modeling, in contrast to multiscale modeling, will emerge and thrive, becoming the workhorse of in silico simulations in the next few decades.

26
Miao L, Jia W, Cao X, Jiao L. Computational chemistry for water-splitting electrocatalysis. Chem Soc Rev 2024; 53:2771-2807. [PMID: 38344774 DOI: 10.1039/d2cs01068b] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/19/2024]
Abstract
Electrocatalytic water splitting driven by renewable electricity has attracted great interest in recent years for producing hydrogen with high-purity. However, the practical applications of this technology are limited by the development of electrocatalysts with high activity, low cost, and long durability. In the search for new electrocatalysts, computational chemistry has made outstanding contributions by providing fundamental laws that govern the electron behavior and enabling predictions of electrocatalyst performance. This review delves into theoretical studies on electrochemical water-splitting processes. Firstly, we introduce the fundamentals of electrochemical water electrolysis and subsequently discuss the current advancements in computational methods and models for electrocatalytic water splitting. Additionally, a comprehensive overview of benchmark descriptors is provided to aid in understanding intrinsic catalytic performance for water-splitting electrocatalysts. Finally, we critically evaluate the remaining challenges within this field.
Affiliation(s)
- Licheng Miao
- Key Laboratory of Advanced Energy Materials Chemistry (Ministry of Education), Collaborative Innovation Center of Chemical Science and Engineering (Tianjin), College of Chemistry, Nankai University, Tianjin 300071, China.
- Wenqi Jia
- Key Laboratory of Advanced Energy Materials Chemistry (Ministry of Education), Collaborative Innovation Center of Chemical Science and Engineering (Tianjin), College of Chemistry, Nankai University, Tianjin 300071, China.
- Xuejie Cao
- Key Laboratory of Advanced Energy Materials Chemistry (Ministry of Education), Collaborative Innovation Center of Chemical Science and Engineering (Tianjin), College of Chemistry, Nankai University, Tianjin 300071, China.
- Lifang Jiao
- Key Laboratory of Advanced Energy Materials Chemistry (Ministry of Education), Collaborative Innovation Center of Chemical Science and Engineering (Tianjin), College of Chemistry, Nankai University, Tianjin 300071, China.

27
Panchagnula K, Graf D, Albertani FEA, Thom AJW. Translational eigenstates of He@C60 from four-dimensional ab initio potential energy surfaces interpolated using Gaussian process regression. J Chem Phys 2024; 160:104303. [PMID: 38465682 DOI: 10.1063/5.0197903] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2024] [Accepted: 02/22/2024] [Indexed: 03/12/2024] [Open access]
Abstract
We investigate the endofullerene system 3He@C60 with a four-dimensional potential energy surface (PES) to include the three He translational degrees of freedom and C60 cage radius. We compare second order Møller-Plesset perturbation theory (MP2), spin component scaled-MP2, scaled opposite spin-MP2, random phase approximation (RPA)@Perdew, Burke, and Ernzerhof (PBE), and corrected Hartree-Fock-RPA to calibrate and gain confidence in the choice of electronic structure method. Due to the high cost of these calculations, the PES is interpolated using Gaussian Process Regression (GPR), owing to its effectiveness with sparse training data. The PES is split into a two-dimensional radial surface, to which corrections are applied to achieve an overall four-dimensional surface. The nuclear Hamiltonian is diagonalized to generate the in-cage translational/vibrational eigenstates. The degeneracy of the three-dimensional harmonic oscillator energies with principal quantum number n is lifted due to the anharmonicity in the radial potential. The (2l + 1)-fold degeneracy of the angular momentum states is also weakly lifted, due to the angular dependence in the potential. We calculate the fundamental frequency to range between 96 and 110 cm-1 depending on the electronic structure method used. Error bars of the eigenstate energies were calculated from the GPR and are on the order of ∼±1.5 cm-1. Wavefunctions are also compared by considering their overlap and Hellinger distance to the one-dimensional empirical potential. As with the energies, the two ab initio methods MP2 and RPA@PBE show the best agreement. While MP2 has better agreement than RPA@PBE, due to its higher computational efficiency and comparable performance, we recommend RPA as an alternative electronic structure method of choice to MP2 for these systems.
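The abstract relies on Gaussian process regression (GPR) for two things: interpolating an expensive potential energy surface from sparse ab initio points, and obtaining error bars on the result. A minimal one-dimensional sketch can illustrate both, under these assumptions:

- a zero-mean GP;
- a squared-exponential (RBF) kernel;
- fixed, hand-picked hyperparameters (`length_scale`, `variance`, `noise`).

The function names and toy data are hypothetical. This is not the authors' four-dimensional PES model or their fitting procedure.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between two 1D arrays of coordinates."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / length_scale**2)


def gpr_predict(x_train, y_train, x_test, length_scale=1.0, variance=1.0, noise=1e-8):
    """Posterior mean and standard deviation of a zero-mean GP at x_test.

    Hyperparameters are fixed here (an assumption); in practice they are
    usually fitted, e.g. by maximising the marginal likelihood.
    """
    # Training covariance plus a small diagonal noise/jitter term for stability
    K = rbf_kernel(x_train, x_train, length_scale, variance)
    K += noise * np.eye(len(x_train))
    L = np.linalg.cholesky(K)
    # alpha = K^{-1} y via the Cholesky factor (general solver used for brevity)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    K_s = rbf_kernel(x_train, x_test, length_scale, variance)
    mean = K_s.T @ alpha
    # Predictive variance: k(x*, x*) - k*^T K^{-1} k*
    v = np.linalg.solve(L, K_s)
    var = variance - np.sum(v**2, axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))


# Toy 1D function sampled at a few points (illustrative data only)
x_train = np.linspace(0.0, 4.0, 5)
y_train = np.sin(x_train)
mean, std = gpr_predict(x_train, y_train, np.array([0.5, 2.5, 50.0]))
```

The returned standard deviation plays the role of the GPR error bar. It shrinks near training points and grows back toward `sqrt(variance)` far from the data. This is why a sparse but well-placed set of expensive calculations can still give a usable surface with quantified uncertainty.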
Affiliation(s)
- K Panchagnula
- Yusuf Hamied Department of Chemistry, University of Cambridge, Cambridge, United Kingdom
- D Graf
- Yusuf Hamied Department of Chemistry, University of Cambridge, Cambridge, United Kingdom
- F E A Albertani
- Yusuf Hamied Department of Chemistry, University of Cambridge, Cambridge, United Kingdom
- A J W Thom
- Yusuf Hamied Department of Chemistry, University of Cambridge, Cambridge, United Kingdom

28
Pathirage PDVS, Phillips JT, Vogiatzis KD. Exploration of the Two-Electron Excitation Space with Data-Driven Coupled Cluster. J Phys Chem A 2024. [PMID: 38422511 DOI: 10.1021/acs.jpca.3c06600] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/02/2024]
Abstract
Computational cost limits the applicability of post-Hartree-Fock methods such as coupled-cluster on larger molecular systems. The data-driven coupled-cluster (DDCC) method applies machine learning to predict the coupled-cluster two-electron amplitudes (t2) using data from second-order perturbation theory (MP2). One major limitation of the DDCC models is the size of training sets that increases exponentially with the system size. Effective sampling of the amplitude space can resolve this issue. Five different amplitude selection techniques that reduce the amount of data used for training were evaluated, an approach that also prevents model overfitting and increases the portability of data-driven coupled-cluster singles and doubles to more complex molecules or larger basis sets. In combination with a localized orbital formalism to predict the CCSD t2 amplitudes, we have achieved a 10-fold error reduction for energy calculations.
Affiliation(s)
- P D Varuna S Pathirage
- Department of Chemistry, University of Tennessee, Knoxville, Tennessee 37996-1600, United States
- Justin T Phillips
- Department of Chemistry, University of Tennessee, Knoxville, Tennessee 37996-1600, United States
- Konstantinos D Vogiatzis
- Department of Chemistry, University of Tennessee, Knoxville, Tennessee 37996-1600, United States

29
Liu Y, Liu X, Cao B. Graph attention neural networks for mapping materials and molecules beyond short-range interatomic correlations. J Phys Condens Matter 2024; 36:215901. [PMID: 38306704 DOI: 10.1088/1361-648x/ad2584] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2023] [Accepted: 02/02/2024] [Indexed: 02/04/2024]
Abstract
Bringing advances in machine learning to chemical science is leading to a revolutionary change in the way of accelerating materials discovery and atomic-scale simulations. Currently, most successful machine learning schemes can be largely traced to the use of localized atomic environments in the structural representation of materials and molecules. However, this may undermine the reliability of machine learning models for mapping complex systems and describing long-range physical effects because of the lack of non-local correlations between atoms. To overcome such limitations, here we report a graph attention neural network as a unified framework to map materials and molecules into a generalizable and interpretable representation that combines local and non-local information of atomic environments from multiple scales. As an exemplary study, our model is applied to predict the electronic structure properties of metal-organic frameworks (MOFs) which have notable diversity in compositions and structures. The results show that our model achieves the state-of-the-art performance. The clustering analysis further demonstrates that our model enables high-level identification of MOFs with spatial and chemical resolution, which would facilitate the rational design of promising reticular materials. Furthermore, the application of our model in predicting the heat capacity of complex nanoporous materials, a critical property in a carbon capture process, showcases its versatility and accuracy in handling diverse physical properties beyond electronic structures.
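The abstract does not spell out the architecture. However, the core operation of a graph attention layer (in the style of the GAT layer of Veličković et al.) is a learned, softmax-normalised weighting of neighbour features. The NumPy sketch below shows a single attention head.

The adjacency matrix decides which atom pairs may attend to each other. Widening it beyond a short distance cutoff is one way non-local correlations could enter such a model. The weights are random placeholders and the function name is hypothetical. This is not the authors' multi-scale model.

```python
import numpy as np


def gat_layer(h, adj, W, a, negative_slope=0.2):
    """Single-head graph attention layer (GAT-style sketch).

    h:   (N, F) node (atom) features
    adj: (N, N) adjacency; nonzero means node j is visible to node i.
         Should include self-loops so every row has at least one neighbour.
    W:   (F, F_out) shared linear projection
    a:   (2 * F_out,) attention vector
    Returns the aggregated features (N, F_out) and attention weights (N, N).
    """
    z = h @ W
    f_out = z.shape[1]
    # e_ij = LeakyReLU(a^T [z_i || z_j]), split into source and target terms
    e = (z @ a[:f_out])[:, None] + (z @ a[f_out:])[None, :]
    e = np.where(e > 0, e, negative_slope * e)
    # Mask out non-neighbours so they get zero weight after the softmax
    e = np.where(adj != 0, e, -np.inf)
    e = e - e.max(axis=1, keepdims=True)  # numerical stability
    alpha = np.exp(e)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)
    # Attention-weighted sum of neighbour features (a nonlinearity usually follows)
    return alpha @ z, alpha


rng = np.random.default_rng(0)
h = rng.normal(size=(3, 4))                        # 3 atoms, 4 features each
adj = np.array([[1, 1, 0], [1, 1, 1], [0, 1, 1]])  # 3-atom chain with self-loops
W = rng.normal(size=(4, 2))
a = rng.normal(size=4)
out, alpha = gat_layer(h, adj, W, a)
```

Each row of `alpha` sums to one over that atom's allowed neighbours, which makes the weights directly inspectable. This is one reason attention-based models are often described as interpretable.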
Affiliation(s)
- Yuanbin Liu
- Key Laboratory for Thermal Science and Power Engineering of Ministry of Education, Department of Engineering Mechanics, Tsinghua University, Beijing 100084, People's Republic of China
- Inorganic Chemistry Laboratory, Department of Chemistry, University of Oxford, Oxford, OX1 3QR, United Kingdom
- Xin Liu
- School of Chemical Engineering and Advanced Materials, The University of Adelaide, Adelaide, SA 5005, Australia
- Key Laboratory of Engineering Dielectric and Applications of Ministry of Education, School of Electrical and Electronic Engineering, Harbin University of Science and Technology, Harbin 150080, People's Republic of China
- Bingyang Cao
- Key Laboratory for Thermal Science and Power Engineering of Ministry of Education, Department of Engineering Mechanics, Tsinghua University, Beijing 100084, People's Republic of China

30
He X, Li M, Rong C, Zhao D, Liu W, Ayers PW, Liu S. Some Recent Advances in Density-Based Reactivity Theory. J Phys Chem A 2024; 128:1183-1196. [PMID: 38329898 DOI: 10.1021/acs.jpca.3c07997] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/10/2024]
Abstract
Establishing a chemical reactivity theory in density functional theory (DFT) language has been our intense research interest in the past two decades, exemplified by the determination of steric effect and stereoselectivity, evaluation of electrophilicity and nucleophilicity, identification of strong and weak interactions, and formulation of cooperativity, frustration, and principle of chirality hierarchy. In this Featured Article, we first overview the four density-based frameworks in DFT to appreciate chemical understanding, including conceptual DFT, use of density associated quantities, information-theoretic approach, and orbital-free DFT, and then present a few recent advances of these frameworks as well as new applications from our studies. To that end, we will introduce the relationship among these frameworks, determining the entire spectrum of interactions with Pauli energy derivatives, performing topological analyses with information-theoretic quantities, and extending the density-based frameworks to excited states. Applications to examine physiochemical properties in external electric fields and to evaluate polarizability for proteins and crystals are discussed. A few possible directions for future development are followed, with the special emphasis on its merger with machine learning.
Affiliation(s)
- Xin He
- Qingdao Institute for Theoretical and Computational Sciences, Institute of Frontier and Interdisciplinary Science, Shandong University, Qingdao, Shandong 266237, China
- Meng Li
- Key Laboratory of Chemical Biology and Traditional Chinese Medicine Research (Ministry of Education of China), Hunan Normal University, Changsha, Hunan 410081, China
- Chunying Rong
- Key Laboratory of Chemical Biology and Traditional Chinese Medicine Research (Ministry of Education of China), Hunan Normal University, Changsha, Hunan 410081, China
- Dongbo Zhao
- Institute of Biomedical Research, Yunnan University, Kunming 650500, China
- Wenjian Liu
- Qingdao Institute for Theoretical and Computational Sciences, Institute of Frontier and Interdisciplinary Science, Shandong University, Qingdao, Shandong 266237, China
- Paul W Ayers
- Department of Chemistry and Chemical Biology, McMaster University, Hamilton, ON L8S, Canada
- Shubin Liu
- Research Computing Center, University of North Carolina, Chapel Hill, North Carolina 27599-3420, United States
- Department of Chemistry, University of North Carolina, Chapel Hill, North Carolina 27599-3290, United States

31
Li R, Zhou C, Singh A, Pei Y, Henkelman G, Li L. Local-environment-guided selection of atomic structures for the development of machine-learning potentials. J Chem Phys 2024; 160:074109. [PMID: 38380745 DOI: 10.1063/5.0187892] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2023] [Accepted: 01/26/2024] [Indexed: 02/22/2024] [Open access]
Abstract
Machine learning potentials (MLPs) have attracted significant attention in computational chemistry and materials science due to their high accuracy and computational efficiency. The proper selection of atomic structures is crucial for developing reliable MLPs. Insufficient or redundant atomic structures can impede the training process and potentially result in a poor quality MLP. Here, we propose a local-environment-guided screening algorithm for efficient dataset selection in MLP development. The algorithm utilizes a local environment bank to store unique local environments of atoms. The dissimilarity between a particular local environment and those stored in the bank is evaluated using the Euclidean distance. A new structure is selected only if its local environment is significantly different from those already present in the bank. Consequently, the bank is then updated with all the new local environments found in the selected structure. To demonstrate the effectiveness of our algorithm, we applied it to select structures for a Ge system and a Pd13H2 particle system. The algorithm reduced the training data size by around 80% for both without compromising the performance of the MLP models. We verified that the results were independent of the selection and ordering of the initial structures. We also compared the performance of our method with the farthest point sampling algorithm, and the results show that our algorithm is superior in both robustness and computational efficiency. Furthermore, the generated local environment bank can be continuously updated and can potentially serve as a growing database of feature local environments, aiding in efficient dataset maintenance for constructing accurate MLPs.
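The selection loop described in this abstract can be sketched in a few lines of NumPy. The following assumptions are not taken from the paper:

- Each structure arrives as a precomputed array of per-atom local-environment descriptors; the descriptor itself is left open.
- A structure counts as new if at least one of its environments lies farther than a fixed Euclidean `threshold` from every environment already in the bank.
- Only those novel environments are added to the bank. Environments within the same structure are not deduplicated against each other here.

The function name and threshold value are hypothetical.

```python
import numpy as np


def select_structures(structures, threshold):
    """Greedy local-environment-guided structure selection (sketch).

    structures: list of (n_atoms, n_features) arrays, one descriptor row
                per atomic local environment (descriptor is an assumption).
    threshold:  minimum Euclidean distance for an environment to be "new".
    Returns the indices of selected structures and the final environment bank.
    """
    bank = np.empty((0, structures[0].shape[1]))
    selected = []
    for idx, envs in enumerate(structures):
        if bank.shape[0] == 0:
            novel = np.ones(len(envs), dtype=bool)  # empty bank: everything is new
        else:
            # Distance from each environment to its nearest neighbour in the bank
            dists = np.linalg.norm(envs[:, None, :] - bank[None, :, :], axis=-1)
            novel = dists.min(axis=1) > threshold
        if novel.any():
            selected.append(idx)
            # Update the bank with the new environments found in this structure
            bank = np.vstack([bank, envs[novel]])
    return selected, bank


structures = [
    np.array([[0.0, 0.0], [1.0, 0.0]]),
    np.array([[0.05, 0.0]]),             # near-duplicate of a banked environment
    np.array([[5.0, 5.0], [0.0, 0.0]]),  # one novel environment, one repeat
]
selected, bank = select_structures(structures, threshold=0.5)
```

The second structure is skipped because its only environment is already represented. This is the mechanism by which redundant configurations are filtered out of the training set. The pairwise-distance step scales with the bank size, so large banks would in practice call for a nearest-neighbour index.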
Affiliation(s)
- Renzhe Li
- Shenzhen Key Laboratory of Micro/Nano-Porous Functional Materials (SKLPM), Department of Materials Science and Engineering, Southern University of Science and Technology, Shenzhen 518055, People's Republic of China
- College of Chemistry, Xiangtan University, Xiangtan 411105, Hunan Province, People's Republic of China
- Chuan Zhou
- Shenzhen Key Laboratory of Micro/Nano-Porous Functional Materials (SKLPM), Department of Materials Science and Engineering, Southern University of Science and Technology, Shenzhen 518055, People's Republic of China
- Akksay Singh
- Shenzhen Key Laboratory of Micro/Nano-Porous Functional Materials (SKLPM), Department of Materials Science and Engineering, Southern University of Science and Technology, Shenzhen 518055, People's Republic of China
- Department of Chemistry, The University of Texas at Austin, Austin, Texas 78712, USA
- Institute for Computational Engineering and Sciences, The University of Texas at Austin, Austin, Texas 78712, USA
- Yong Pei
- College of Chemistry, Xiangtan University, Xiangtan 411105, Hunan Province, People's Republic of China
- Graeme Henkelman
- Department of Chemistry, The University of Texas at Austin, Austin, Texas 78712, USA
- Institute for Computational Engineering and Sciences, The University of Texas at Austin, Austin, Texas 78712, USA
- Lei Li
- Shenzhen Key Laboratory of Micro/Nano-Porous Functional Materials (SKLPM), Department of Materials Science and Engineering, Southern University of Science and Technology, Shenzhen 518055, People's Republic of China

32
Patel RA, Webb MA. Data-Driven Design of Polymer-Based Biomaterials: High-throughput Simulation, Experimentation, and Machine Learning. ACS Appl Bio Mater 2024; 7:510-527. [PMID: 36701125 DOI: 10.1021/acsabm.2c00962] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Indexed: 01/27/2023]
Abstract
Polymers, with the capacity to tunably alter properties and response based on manipulation of their chemical characteristics, are attractive components in biomaterials. Nevertheless, their potential as functional materials is also inhibited by their complexity, which complicates rational or brute-force design and realization. In recent years, machine learning has emerged as a useful tool for facilitating materials design via efficient modeling of structure-property relationships in the chemical domain of interest. In this Spotlight, we discuss the emergence of data-driven design of polymers that can be deployed in biomaterials with particular emphasis on complex copolymer systems. We outline recent developments, as well as our own contributions and takeaways, related to high-throughput data generation for polymer systems, methods for surrogate modeling by machine learning, and paradigms for property optimization and design. Throughout this discussion, we highlight key aspects of successful strategies and other considerations that will be relevant to the future design of polymer-based biomaterials with target properties.
Affiliation(s)
- Roshan A Patel
- Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey 08540, United States
- Michael A Webb
- Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey 08540, United States

33
Briling K, Calvino Alonso Y, Fabrizio A, Corminboeuf C. SPAHM(a,b): Encoding the Density Information from Guess Hamiltonian in Quantum Machine Learning Representations. J Chem Theory Comput 2024; 20:1108-1117. [PMID: 38227222 PMCID: PMC10867806 DOI: 10.1021/acs.jctc.3c01040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Revised: 12/20/2023] [Accepted: 12/26/2023] [Indexed: 01/17/2024]
Abstract
Recently, we introduced a class of molecular representations for kernel-based regression methods─the spectrum of approximated Hamiltonian matrices (SPAHM)─that takes advantage of lightweight one-electron Hamiltonians traditionally used as a self-consistent field initial guess. The original SPAHM variant is built from occupied-orbital energies (i.e., eigenvalues) and naturally contains all of the information about nuclear charges, atomic positions, and symmetry requirements. Its advantages were demonstrated on data sets featuring a wide variation of charge and spin, for which traditional structure-based representations commonly fail. SPAHM(a,b), as introduced here, expand the eigenvalue SPAHM into local and transferable representations. They rely upon one-electron density matrices to build fingerprints from atomic and bond density overlap contributions inspired from preceding state-of-the-art representations. The performance and efficiency of SPAHM(a,b) is assessed on the predictions for data sets of prototypical organic molecules (QM7) of different charges and azoheteroarene dyes in an excited state. Overall, both SPAHM(a) and SPAHM(b) outperform state-of-the-art representations on difficult prediction tasks such as the atomic properties of charged open-shell species and of π-conjugated systems.
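The original eigenvalue SPAHM described in this abstract uses the occupied-orbital energies of a cheap guess Hamiltonian as the molecular representation. The generic sketch below rests on these assumptions:

- The one-electron Hamiltonian `H` and overlap `S` in an atomic-orbital basis are already available from a quantum-chemistry code (not shown).
- Vectors are zero-padded to a common length so that molecules of different sizes can be compared in a kernel.

The function name and padding convention are assumptions. The density-matrix-based SPAHM(a,b) variants introduced in this paper are not reproduced.

```python
import numpy as np


def eigenvalue_representation(H, S, n_occ, max_len):
    """Occupied-orbital energies of a guess Hamiltonian as a fixed-length vector.

    Solves the generalized problem H C = S C e via Loewdin orthogonalisation,
    keeps the n_occ lowest eigenvalues and zero-pads the result to max_len.
    """
    # S^{-1/2} from the eigendecomposition of the symmetric positive-definite overlap
    s_vals, s_vecs = np.linalg.eigh(S)
    S_inv_sqrt = s_vecs @ np.diag(s_vals ** -0.5) @ s_vecs.T
    H_orth = S_inv_sqrt @ H @ S_inv_sqrt
    energies = np.linalg.eigvalsh(H_orth)  # returned in ascending order
    rep = np.zeros(max_len)
    rep[:n_occ] = energies[:n_occ]
    return rep


# Toy 3-orbital "Hamiltonian" in an orthonormal basis (illustrative only)
H = np.diag([-1.0, -0.5, 0.3])
S = np.eye(3)
rep = eigenvalue_representation(H, S, n_occ=2, max_len=5)
```

Because the guess Hamiltonian depends on nuclear charges, positions and the electron count, the same geometry with a different charge or spin gives a different vector. This is the property the abstract contrasts with purely structure-based representations.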
Affiliation(s)
- Ksenia R. Briling
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Yannick Calvino Alonso
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Alberto Fabrizio
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- National Centre for Computational Design and Discovery of Novel Materials (MARVEL), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Clemence Corminboeuf
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- National Centre for Computational Design and Discovery of Novel Materials (MARVEL), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland

34
Nicolle A, Deng S, Ihme M, Kuzhagaliyeva N, Ibrahim EA, Farooq A. Mixtures Recomposition by Neural Nets: A Multidisciplinary Overview. J Chem Inf Model 2024; 64:597-620. [PMID: 38284618 DOI: 10.1021/acs.jcim.3c01633] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/30/2024]
Abstract
Artificial Neural Networks (ANNs) are transforming how we understand chemical mixtures, providing an expressive view of the chemical space and multiscale processes. Their hybridization with physical knowledge can bridge the gap between predictivity and understanding of the underlying processes. This overview explores recent progress in ANNs, particularly their potential in the 'recomposition' of chemical mixtures. Graph-based representations reveal patterns among mixture components, and deep learning models excel in capturing complexity and symmetries when compared to traditional Quantitative Structure-Property Relationship models. Key components, such as Hamiltonian networks and convolution operations, play a central role in representing multiscale mixtures. The integration of ANNs with Chemical Reaction Networks and Physics-Informed Neural Networks for inverse chemical kinetic problems is also examined. The combination of sensors with ANNs shows promise in optical and biomimetic applications. A common ground is identified in the context of statistical physics, where ANN-based methods iteratively adapt their models by blending their initial states with training data. The concept of mixture recomposition unveils a reciprocal inspiration between ANNs and reactive mixtures, highlighting learning behaviors influenced by the training environment.
Affiliation(s)
- Andre Nicolle
- Aramco Fuel Research Center, Rueil-Malmaison 92852, France
- Sili Deng
- Massachusetts Institute of Technology, Cambridge 02139, Massachusetts, United States
- Matthias Ihme
- Stanford University, Stanford 94305, California, United States
- Emad Al Ibrahim
- King Abdullah University of Science and Technology, Thuwal 23955, Saudi Arabia
- Aamir Farooq
- King Abdullah University of Science and Technology, Thuwal 23955, Saudi Arabia

35
Sahre MJ, von Rudorff GF, Marquetand P, von Lilienfeld OA. Transferability of atomic energies from alchemical decomposition. J Chem Phys 2024; 160:054106. [PMID: 38341696 DOI: 10.1063/5.0187298] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2023] [Accepted: 01/09/2024] [Indexed: 02/13/2024] [Open access]
Abstract
We study alchemical atomic energy partitioning as a method to estimate atomization energies from atomic contributions, which are defined in physically rigorous and general ways through the use of the uniform electron gas as a joint reference. We analyze quantitatively the relation between atomic energies and their local environment using a dataset of 1325 organic molecules. The atomic energies are transferable across various molecules, enabling the prediction of atomization energies with a mean absolute error of 23 kcal/mol, comparable to simple statistical estimates but potentially more robust given their grounding in the physics-based decomposition scheme. A comparative analysis with other decomposition methods highlights its sensitivity to electrostatic variations, underlining its potential as a representation of the environment as well as in studying processes like diffusion in solids characterized by significant electrostatic shifts.
Affiliation(s)
- Michael J Sahre
- Vienna Doctoral School in Chemistry (DoSChem) and Institute of Theoretical Chemistry and Faculty of Physics, University of Vienna, 1090 Vienna, Austria
- Guido Falk von Rudorff
- Department of Chemistry, University Kassel, Heinrich-Plett-Str. 40, 34132 Kassel, Germany
- Center for Interdisciplinary Nanostructure Science and Technology (CINSaT), Heinrich-Plett-Straße 40, 34132 Kassel, Germany
- Philipp Marquetand
- Faculty of Chemistry, Institute of Theoretical Chemistry, University of Vienna, Währinger Str. 17, 1090 Vienna, Austria
- O Anatole von Lilienfeld
- Vienna Doctoral School in Chemistry (DoSChem) and Institute of Theoretical Chemistry and Faculty of Physics, University of Vienna, 1090 Vienna, Austria
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, St. George Campus, Toronto, M5S 3H6 Ontario, Canada
- Department of Materials Science and Engineering, University of Toronto, St. George Campus, Toronto, M5S 3E4 Ontario, Canada
- Vector Institute for Artificial Intelligence, Toronto, M5S 1M1 Ontario, Canada
- ML Group, Technische Universität Berlin and Institute for the Foundations of Learning and Data, 10587 Berlin, Germany
- Berlin Institute for the Foundations of Learning and Data, 10587 Berlin, Germany
- Department of Physics, University of Toronto, St. George Campus, Toronto, M5S 1A7 Ontario, Canada

36
Kapil V, Kovács DP, Csányi G, Michaelides A. First-principles spectroscopy of aqueous interfaces using machine-learned electronic and quantum nuclear effects. Faraday Discuss 2024; 249:50-68. [PMID: 37799072 PMCID: PMC10845015 DOI: 10.1039/d3fd00113j] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2023] [Accepted: 07/18/2023] [Indexed: 10/07/2023]
Abstract
Vibrational spectroscopy is a powerful approach to visualising interfacial phenomena. However, extracting structural and dynamical information from vibrational spectra is a challenge that requires first-principles simulations, including non-Condon and quantum nuclear effects. We address this challenge by developing a machine-learning enhanced first-principles framework to speed up predictive modelling of infrared, Raman, and sum-frequency generation spectra. Our approach uses machine learning potentials that encode quantum nuclear effects to generate quantum trajectories using simple molecular dynamics efficiently. In addition, we reformulate bulk and interfacial selection rules to express them unambiguously in terms of the derivatives of polarisation and polarisabilities of the whole system and predict these derivatives efficiently using fully-differentiable machine learning models of dielectric response tensors. We demonstrate our framework's performance by predicting the IR, Raman, and sum-frequency generation spectra of liquid water, ice and the water-air interface by achieving near quantitative agreement with experiments at nearly the same computational efficiency as pure classical methods. Finally, to aid the experimental discovery of new phases of nanoconfined water, we predict the temperature-dependent vibrational spectra of monolayer water across the solid-hexatic-liquid phases transition.
Affiliation(s)
- Venkat Kapil
- Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, UK.
- Gábor Csányi
- Engineering Laboratory, University of Cambridge, Cambridge, CB2 1PZ, UK
- Angelos Michaelides
- Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, UK.

37
Back S, Aspuru-Guzik A, Ceriotti M, Gryn'ova G, Grzybowski B, Gu GH, Hein J, Hippalgaonkar K, Hormázabal R, Jung Y, Kim S, Kim WY, Moosavi SM, Noh J, Park C, Schrier J, Schwaller P, Tsuda K, Vegge T, von Lilienfeld OA, Walsh A. Accelerated chemical science with AI. Digit Discov 2024; 3:23-33. [PMID: 38239898 PMCID: PMC10793638 DOI: 10.1039/d3dd00213f] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2023] [Accepted: 12/06/2023] [Indexed: 01/22/2024]
Abstract
In light of the pressing need for practical materials and molecular solutions to renewable energy and health problems, to name just two examples, one wonders how to accelerate research and development in the chemical sciences, so as to address the time it takes to bring materials from initial discovery to commercialization. Artificial intelligence (AI)-based techniques, in particular, are having a transformative and accelerating impact on many, if not most, technological domains. To shed light on these questions, the authors and participants gathered in person for the ASLLA Symposium on the theme of 'Accelerated Chemical Science with AI' at Gangneung, Republic of Korea. We present the findings, ideas, comments, and often contentious opinions expressed during four panel discussions related to the respective general topics: 'Data', 'New applications', 'Machine learning algorithms', and 'Education'. All discussions were recorded, transcribed into text using OpenAI's Whisper, and summarized using LG AI Research's EXAONE LLM, followed by revision by all authors. For the broader benefit of current researchers, educators in higher education, and academic bodies such as associations, publishers, librarians, and companies, we provide chemistry-specific recommendations and summarize the resulting conclusions.
Affiliation(s)
- Seoin Back
- Department of Chemical and Biomolecular Engineering, Institute of Emergent Materials, Sogang University Seoul Republic of Korea
- Alán Aspuru-Guzik
- Departments of Chemistry, Computer Science, University of Toronto St. George Campus Toronto ON Canada
- Acceleration Consortium and Vector Institute for Artificial Intelligence Toronto ON M5S 1M1 Canada
- Michele Ceriotti
- Laboratory of Computational Science and Modeling (COSMO), École Polytechnique Fédérale de Lausanne Lausanne Switzerland
- Ganna Gryn'ova
- Heidelberg Institute for Theoretical Studies (HITS gGmbH) 69118 Heidelberg Germany
- Interdisciplinary Center for Scientific Computing, Heidelberg University 69120 Heidelberg Germany
- Bartosz Grzybowski
- Center for Algorithmic and Robotized Synthesis (CARS), Institute for Basic Science (IBS) Ulsan Republic of Korea
- Institute of Organic Chemistry, Polish Academy of Sciences Warsaw Poland
- Department of Chemistry, Ulsan National Institute of Science and Technology Ulsan Republic of Korea
- Geun Ho Gu
- Department of Energy Engineering, Korea Institute of Energy Technology (KENTECH) Naju 58330 Republic of Korea
- Jason Hein
- Department of Chemistry, University of British Columbia Vancouver BC V6T 1Z1 Canada
- Kedar Hippalgaonkar
- School of Materials Science and Engineering, Nanyang Technological University 50 Nanyang Avenue Singapore 639798 Singapore
- Institute of Materials Research and Engineering, Agency for Science Technology and Research 2 Fusionopolis Way, 08-03 Singapore 138634 Singapore
- Yousung Jung
- Department of Chemical and Biomolecular Engineering, KAIST Daejeon Republic of Korea
- School of Chemical and Biological Engineering, Interdisciplinary Program in Artificial Intelligence, Seoul National University 1 Gwanak-ro, Gwanak-gu Seoul 08826 Republic of Korea
- Seonah Kim
- Department of Chemistry, Colorado State University 1301 Center Avenue Fort Collins CO 80523 USA
- Woo Youn Kim
- Department of Chemistry, KAIST Daejeon Republic of Korea
- Seyed Mohamad Moosavi
- Chemical Engineering & Applied Chemistry, University of Toronto Toronto Ontario M5S 3E5 Canada
- Juhwan Noh
- Chemical Data-Driven Research Center, Korea Research Institute of Chemical Technology Daejeon 34114 Republic of Korea
- Joshua Schrier
- Department of Chemistry, Fordham University The Bronx NY 10458 USA
- Philippe Schwaller
- Laboratory of Artificial Chemical Intelligence (LIAC) & National Centre of Competence in Research (NCCR) Catalysis, École Polytechnique Fédérale de Lausanne Lausanne Switzerland
- Koji Tsuda
- Graduate School of Frontier Sciences, The University of Tokyo Kashiwa Chiba 277-8561 Japan
- Center for Basic Research on Materials, National Institute for Materials Science Tsukuba Ibaraki 305-0044 Japan
- RIKEN Center for Advanced Intelligence Project Tokyo 103-0027 Japan
- Tejs Vegge
- Department of Energy Conversion and Storage, Technical University of Denmark 301 Anker Engelunds vej, Kongens Lyngby Copenhagen 2800 Denmark
- O Anatole von Lilienfeld
- Acceleration Consortium and Vector Institute for Artificial Intelligence Toronto ON M5S 1M1 Canada
- Departments of Chemistry, Materials Science and Engineering, and Physics, University of Toronto, St George Campus Toronto ON Canada
- Machine Learning Group, Technische Universität Berlin and Berlin Institute for the Foundations of Learning and Data 10587 Berlin Germany
- Aron Walsh
- Department of Materials, Imperial College London London SW7 2AZ UK
- Department of Physics, Ewha Women's University Seoul Republic of Korea
38
Eckhoff M, Diedrich JV, Mücke M, Proppe J. Quantitative Structure-Reactivity Relationships for Synthesis Planning: The Benzhydrylium Case. J Phys Chem A 2024; 128:343-354. [PMID: 38113457 PMCID: PMC10788916 DOI: 10.1021/acs.jpca.3c07289] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Revised: 11/28/2023] [Accepted: 12/01/2023] [Indexed: 12/21/2023]
Abstract
Selective and feasible reactions are among the top targets in synthesis planning. Mayr's approach to quantifying chemical reactivity has greatly facilitated the planning process, but reactivity parameters for new compounds require time-consuming experiments. In the past decade, data-driven modeling has been gaining momentum in the field, as it shows promise in terms of efficient reactivity prediction. However, state-of-the-art models use quantum chemical data as input, which precludes real-time planning in organic synthesis. Here, we present a novel data-driven workflow for predicting reactivity parameters of molecules that takes only structural information as input, enabling de facto real-time reactivity predictions. We use the well-understood chemical space of benzhydrylium ions as an example to demonstrate the functionality of our approach and the performance of the resulting quantitative structure-reactivity relationships (QSRRs). Our results suggest that it is straightforward to build low-cost QSRR models that are accurate, interpretable, and transferable to unexplored systems within a given scope of application. Moreover, our QSRR approach suggests that Hammett σ parameters are only approximately additive.
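The additivity claim in the last sentence can be made concrete with the simplest baseline it would be tested against. That baseline is an ordinary least-squares fit of a reactivity parameter against the summed Hammett σ of each compound's substituents. The sketch below is a hypothetical illustration, not the paper's QSRR workflow. The σ inputs are illustrative and the reactivities are synthetic, constructed to be exactly additive so the fit has a known answer. With real data, systematic residuals from such a fit are one way non-additivity would show up.

```python
def fit_additive_hammett(sigma_lists, reactivities):
    """Least-squares fit of reactivity = rho * sum(sigma) + intercept."""
    x = [sum(s) for s in sigma_lists]  # additive assumption: substituent effects simply add
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(reactivities) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, reactivities))
    rho = sxy / sxx
    intercept = mean_y - rho * mean_x
    residuals = [yi - (rho * xi + intercept) for xi, yi in zip(x, reactivities)]
    return rho, intercept, residuals


# Illustrative sigma values per compound and synthetic, exactly additive reactivities.
sigmas = [[0.0, 0.0], [-0.27, 0.0], [-0.27, -0.27], [0.23, 0.0], [0.23, -0.27]]
ys = [2.0 + 3.0 * sum(s) for s in sigmas]
rho, intercept, residuals = fit_additive_hammett(sigmas, ys)
print(rho, intercept)
```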
Affiliation(s)
- Maike Eckhoff
- Institute of Physical and Theoretical Chemistry, TU Braunschweig, Braunschweig 38106, Germany
- Johannes V. Diedrich
- Institute of Physical and Theoretical Chemistry, TU Braunschweig, Braunschweig 38106, Germany
- Institute of Physical Chemistry, University of Göttingen, Göttingen 37077, Germany
- Maike Mücke
- Institute of Physical and Theoretical Chemistry, TU Braunschweig, Braunschweig 38106, Germany
- Institute of Physical Chemistry, University of Göttingen, Göttingen 37077, Germany
- Jonny Proppe
- Institute of Physical and Theoretical Chemistry, TU Braunschweig, Braunschweig 38106, Germany
39
Herringer NSM, Dasetty S, Gandhi D, Lee J, Ferguson AL. Permutationally Invariant Networks for Enhanced Sampling (PINES): Discovery of Multimolecular and Solvent-Inclusive Collective Variables. J Chem Theory Comput 2024; 20:178-198. [PMID: 38150421 DOI: 10.1021/acs.jctc.3c00923] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/29/2023]
Abstract
The typically rugged nature of molecular free-energy landscapes can frustrate efficient sampling of the thermodynamically relevant phase space due to the presence of high free-energy barriers. Enhanced sampling techniques can improve phase space exploration by accelerating sampling along particular collective variables (CVs). A number of techniques exist for the data-driven discovery of CVs parametrizing the important large-scale motions of the system. A challenge to CV discovery is learning CVs invariant to the symmetries of the molecular system, frequently rigid translation, rigid rotation, and permutational relabeling of identical particles. Of these, permutational invariance has proved a persistent challenge in frustrating the data-driven discovery of multimolecular CVs in systems of self-assembling particles and solvent-inclusive CVs for solvated systems. In this work, we integrate permutation invariant vector (PIV) featurizations with autoencoding neural networks to learn nonlinear CVs invariant to translation, rotation, and permutation and perform interleaved rounds of CV discovery and enhanced sampling to iteratively expand the sampling of configurational phase space and obtain converged CVs and free-energy landscapes. We demonstrate the permutationally invariant network for enhanced sampling (PINES) approach in applications to the self-assembly of a 13-atom argon cluster, association/dissociation of a NaCl ion pair in water, and hydrophobic collapse of a C45H92 n-pentatetracontane polymer chain. We make the approach freely available as a new module within the PLUMED2 enhanced sampling libraries.
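The core idea of a permutation-invariant vector (PIV) fits in a few lines. Every interparticle distance passes through a smooth switching function, and the results are sorted. Relabelling identical particles then only reorders entries, and the sort puts them back in place. The minimal version below assumes a single particle species and one rational switching function with illustrative parameters. Full PIV featurizations typically also group distances into blocks by particle-pair type, which this sketch omits.

```python
import math
from itertools import combinations


def switching(r, r0=1.0, n=6, m=12):
    """Rational switching function s(r) = (1 - (r/r0)^n) / (1 - (r/r0)^m)."""
    x = r / r0
    if abs(x - 1.0) < 1e-9:
        return n / m  # analytic limit at r = r0
    return (1.0 - x**n) / (1.0 - x**m)


def piv(coords, r0=1.0):
    """Sorted switched pair distances: invariant to translation, rotation and relabelling."""
    values = [switching(math.dist(a, b), r0) for a, b in combinations(coords, 2)]
    return sorted(values)


cluster = [(0.0, 0.0, 0.0), (1.1, 0.0, 0.0), (0.0, 0.9, 0.0), (0.5, 0.5, 1.2)]
relabelled = [cluster[2], cluster[0], cluster[3], cluster[1]]
shifted = [(x + 3.0, y - 1.0, z + 0.5) for x, y, z in cluster]
print(piv(cluster) == piv(relabelled))
```

Sorting discards particle identities, which is exactly what permutational invariance requires. The trade-off is that the sorted vector has kinks where two entries cross.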
Affiliation(s)
- Siva Dasetty
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
- Diya Gandhi
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
- Junhee Lee
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
- Andrew L Ferguson
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
40
Bodenschatz CJ, Saidi WA, Stokes JL, Webster RI, Costa G. Theoretical Prediction of Thermal Expansion Anisotropy for Y2Si2O7 Environmental Barrier Coatings Using a Deep Neural Network Potential and Comparison to Experiment. MATERIALS (BASEL, SWITZERLAND) 2024; 17:286. [PMID: 38255454 PMCID: PMC10817232 DOI: 10.3390/ma17020286] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2023] [Revised: 12/14/2023] [Accepted: 12/27/2023] [Indexed: 01/24/2024]
Abstract
Environmental barrier coatings (EBCs) are an enabling technology for silicon carbide (SiC)-based ceramic matrix composites (CMCs) in extreme environments such as gas turbine engines. However, the development of new coating systems is hindered by the large design space and difficulty in predicting the properties for these materials. Density Functional Theory (DFT) has successfully been used to model and predict some thermodynamic and thermo-mechanical properties of high-temperature ceramics for EBCs, although these calculations are challenging due to their high computational costs. In this work, we use machine learning to train a deep neural network potential (DNP) for Y2Si2O7, which is then applied to calculate the thermodynamic and thermo-mechanical properties at near-DFT accuracy much faster and using fewer computational resources than DFT. We use this DNP to predict the phonon-based thermodynamic properties of Y2Si2O7 with good agreement to DFT and experiments. We also utilize the DNP to calculate the anisotropic, lattice direction-dependent coefficients of thermal expansion (CTEs) for Y2Si2O7. Molecular dynamics trajectories using the DNP correctly predict the anisotropy of the CTE, in good agreement with diffraction experiments. In the future, this DNP could be applied to accelerate additional property calculations for Y2Si2O7 compared to DFT or experiments.
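Direction-dependent CTEs follow from how each lattice parameter changes with temperature: alpha_a = (1/a)(da/dT), applied separately to a, b and c. The sketch below assumes the lattice parameters have already been averaged from constant-pressure MD runs at a few temperatures. The numbers are synthetic placeholders, not Y2Si2O7 data, and the function name is hypothetical.

```python
def axial_cte(temps, lengths):
    """alpha(T) = (1/L) dL/dT at interior temperatures, via central differences."""
    alphas = []
    for i in range(1, len(temps) - 1):
        dl_dt = (lengths[i + 1] - lengths[i - 1]) / (temps[i + 1] - temps[i - 1])
        alphas.append(dl_dt / lengths[i])
    return alphas


temps = [300.0, 500.0, 700.0, 900.0]  # K
# Synthetic lattice parameters (angstrom) expanding at different rates along each axis.
lattice = {
    axis: [l0 * (1.0 + rate * (t - 300.0)) for t in temps]
    for axis, l0, rate in [("a", 7.0, 4e-6), ("b", 9.0, 7e-6), ("c", 5.0, 2e-6)]
}
cte = {axis: axial_cte(temps, lengths) for axis, lengths in lattice.items()}
for axis, alphas in cte.items():
    print(axis, [f"{a:.2e}" for a in alphas])
```

Central differences are exact for the linear toy data here. Noisy MD averages would usually be smoothed first, for example by a low-order polynomial fit of L(T).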
Affiliation(s)
- Cameron J. Bodenschatz
- Environmental Effects and Coatings Branch, NASA John H. Glenn Research Center at Lewis Field, Cleveland, OH 44135, USA
- Wissam A. Saidi
- National Energy Technology Laboratory, Pittsburgh, PA 15236, USA
- Mechanical Engineering and Materials Science, University of Pittsburgh, Pittsburgh, PA 15261, USA
- Jamesa L. Stokes
- Environmental Effects and Coatings Branch, NASA John H. Glenn Research Center at Lewis Field, Cleveland, OH 44135, USA
- Rebekah I. Webster
- Environmental Effects and Coatings Branch, NASA John H. Glenn Research Center at Lewis Field, Cleveland, OH 44135, USA
- Gustavo Costa
- Environmental Effects and Coatings Branch, NASA John H. Glenn Research Center at Lewis Field, Cleveland, OH 44135, USA
41
Baldwin WJ, Liang X, Klarbring J, Dubajic M, Dell'Angelo D, Sutton C, Caddeo C, Stranks SD, Mattoni A, Walsh A, Csányi G. Dynamic Local Structure in Caesium Lead Iodide: Spatial Correlation and Transient Domains. SMALL (WEINHEIM AN DER BERGSTRASSE, GERMANY) 2024; 20:e2303565. [PMID: 37736694 DOI: 10.1002/smll.202303565] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 04/28/2023] [Revised: 07/31/2023] [Indexed: 09/23/2023]
Abstract
Metal halide perovskites are multifunctional semiconductors with tunable structures and properties. They are highly dynamic crystals with complex octahedral tilting patterns and strongly anharmonic atomic behavior. In the higher temperature, higher symmetry phases of these materials, several complex structural features are observed. The local structure can differ greatly from the average structure and there is evidence that dynamic 2D structures of correlated octahedral motion form. An understanding of the underlying complex atomistic dynamics is, however, still lacking. In this work, the local structure of the inorganic perovskite CsPbI3 is investigated using a new machine learning force field based on the atomic cluster expansion framework. Through analysis of the temporal and spatial correlation observed during large-scale simulations, it is revealed that the low frequency motion of octahedral tilts implies a double-well effective potential landscape, even well into the cubic phase. Moreover, dynamic local regions of lower symmetry are present within both higher symmetry phases. These regions are planar and the length and timescales of the motion are reported. Finally, the spatial arrangement of these features and their interactions are investigated and visualized, providing a comprehensive picture of local structure in the higher symmetry phases.
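An effective potential such as the double well described above is commonly read off a sampled distribution by Boltzmann inversion, F(theta) = -k_B T ln P(theta) + const. The sketch below applies this to a histogram of tilt angles. The counts are synthetic, the function names are hypothetical, and the paper's own analysis may proceed differently.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def boltzmann_inversion(counts, temperature):
    """Free energy F = -kT ln(P) per histogram bin (eV), shifted so the minimum is zero.

    Empty bins are assigned an infinite free energy.
    """
    kt = K_B_EV * temperature
    peak = max(counts)
    return [-kt * math.log(c / peak) if c > 0 else math.inf for c in counts]


def local_minima(values):
    """Indices of interior bins lower than both neighbours."""
    return [i for i in range(1, len(values) - 1)
            if values[i] < values[i - 1] and values[i] < values[i + 1]]


# Synthetic tilt-angle histogram: two peaks either side of zero tilt (bin 4).
counts = [5, 40, 90, 40, 30, 40, 90, 40, 5]
free_energy = boltzmann_inversion(counts, temperature=600.0)
print(local_minima(free_energy))
```

Two minima separated by a maximum at zero tilt is the signature of a double well. The barrier height in k_B T units is ln(P_max / P_center).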
Affiliation(s)
- William J Baldwin
- Department of Engineering, University of Cambridge, Cambridge, CB2 1PZ, UK
- Xia Liang
- Department of Materials, Imperial College London, London, SW7 2AZ, UK
- Johan Klarbring
- Department of Materials, Imperial College London, London, SW7 2AZ, UK
- Department of Physics, Chemistry and Biology (IFM), Linköping University, Linköping, SE-581 83, Sweden
- Milos Dubajic
- Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge, CB3 0AS, UK
- Christopher Sutton
- Department of Chemistry and Biochemistry, University of South Carolina, Columbia, SC, 29208, USA
- Claudia Caddeo
- CNR-IOM, Unità di Cagliari, Monserrato, Cagliari, 09042, Italy
- Samuel D Stranks
- Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge, CB3 0AS, UK
- Aron Walsh
- Department of Materials, Imperial College London, London, SW7 2AZ, UK
- Gábor Csányi
- Department of Engineering, University of Cambridge, Cambridge, CB2 1PZ, UK
42
Stark W, Westermayr J, Douglas-Gallardo OA, Gardner J, Habershon S, Maurer RJ. Machine Learning Interatomic Potentials for Reactive Hydrogen Dynamics at Metal Surfaces Based on Iterative Refinement of Reaction Probabilities. THE JOURNAL OF PHYSICAL CHEMISTRY. C, NANOMATERIALS AND INTERFACES 2023; 127:24168-24182. [PMID: 38148847 PMCID: PMC10749455 DOI: 10.1021/acs.jpcc.3c06648] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2023] [Revised: 11/12/2023] [Accepted: 11/15/2023] [Indexed: 12/28/2023]
Abstract
The reactive chemistry of molecular hydrogen at surfaces, notably dissociative sticking and hydrogen evolution, plays a crucial role in energy storage and fuel cells. Theoretical studies can help to decipher underlying mechanisms and reaction design, but studying dynamics at surfaces is computationally challenging due to the complex electronic structure at interfaces and the high sensitivity of dynamics to reaction barriers. In addition, ab initio molecular dynamics, based on density functional theory, is too computationally demanding to accurately predict reactive sticking or desorption probabilities, as it requires averaging over tens of thousands of initial conditions. High-dimensional machine learning-based interatomic potentials are starting to be more commonly used in gas-surface dynamics, yet robust approaches to generate reliable training data and assess how model uncertainty affects the prediction of dynamic observables are not well established. Here, we employ ensemble learning to adaptively generate training data while assessing model performance with full uncertainty quantification (UQ) for reaction probabilities of hydrogen scattering on different copper facets. We use this approach to investigate the performance of two message-passing neural networks, SchNet and PaiNN. Ensemble-based UQ and iterative refinement allow us to expose the shortcomings of the invariant pairwise-distance-based feature representation in the SchNet model for gas-surface dynamics.
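Ensemble-based uncertainty quantification of this kind can be sketched generically. Each ensemble member yields its own reaction probability, and the spread across members serves as the uncertainty. Configurations on which the members disagree most become candidates for new reference calculations (query by committee). The helper names and data below are hypothetical, and the paper's SchNet/PaiNN workflow is not reproduced.

```python
import statistics


def ensemble_reaction_probability(outcomes_per_model):
    """Mean and sample standard deviation of the reaction probability across models.

    outcomes_per_model: one list of booleans per ensemble member (True = reacted).
    """
    probs = [sum(outcomes) / len(outcomes) for outcomes in outcomes_per_model]
    return statistics.mean(probs), statistics.stdev(probs)


def select_for_labelling(configs, ensemble_energies, k):
    """Return the k configurations whose ensemble energy predictions disagree most."""
    spread = [statistics.pstdev(energies) for energies in ensemble_energies]
    ranked = sorted(range(len(configs)), key=lambda i: spread[i], reverse=True)
    return [configs[i] for i in ranked[:k]]


outcomes = [
    [True] * 2 + [False] * 8,  # model 1: P = 0.2
    [True] * 4 + [False] * 6,  # model 2: P = 0.4
    [True] * 3 + [False] * 7,  # model 3: P = 0.3
]
mean_p, std_p = ensemble_reaction_probability(outcomes)
picked = select_for_labelling(
    ["A", "B", "C"], [[1.0, 1.0, 1.0], [0.0, 2.0, 4.0], [1.0, 2.0, 3.0]], k=2
)
print(round(mean_p, 3), round(std_p, 3), picked)
```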
Affiliation(s)
- Wojciech G. Stark
- Department of Chemistry, University of Warwick, Gibbet Hill Road, Coventry CV4 7AL, U.K.
- Julia Westermayr
- Department of Chemistry, University of Warwick, Gibbet Hill Road, Coventry CV4 7AL, U.K.
- James Gardner
- Department of Chemistry, University of Warwick, Gibbet Hill Road, Coventry CV4 7AL, U.K.
- Scott Habershon
- Department of Chemistry, University of Warwick, Gibbet Hill Road, Coventry CV4 7AL, U.K.
- Reinhard J. Maurer
- Department of Chemistry, University of Warwick, Gibbet Hill Road, Coventry CV4 7AL, U.K.
- Department of Physics, University of Warwick, Gibbet Hill Road, Coventry CV4 7AL, U.K.
43
Ying P, Fan Z. Combining the D3 dispersion correction with the neuroevolution machine-learned potential. JOURNAL OF PHYSICS. CONDENSED MATTER : AN INSTITUTE OF PHYSICS JOURNAL 2023; 36:125901. [PMID: 38052090 DOI: 10.1088/1361-648x/ad1278] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/08/2023] [Accepted: 12/05/2023] [Indexed: 12/07/2023]
Abstract
Machine-learned potentials (MLPs) have become a popular approach to modeling interatomic interactions in atomistic simulations, but to keep the computational cost under control, a relatively short cutoff must be imposed, which puts serious restrictions on the capability of the MLPs for modeling relatively long-ranged dispersion interactions. In this paper, we propose to combine the neuroevolution potential (NEP) with the popular D3 correction to achieve a unified NEP-D3 model that can simultaneously model relatively short-ranged bonded interactions and relatively long-ranged dispersion interactions. We show that improved descriptions of the binding and sliding energies in bilayer graphene can be obtained by the NEP-D3 approach compared to the pure NEP approach. We implement the D3 part into the GPUMD package such that it can be used out of the box for many exchange-correlation functionals. As a realistic application, we show that dispersion interactions result in approximately a 10% reduction in thermal conductivity for three typical metal-organic frameworks.
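The NEP-D3 combination is additive: the total energy is the short-range machine-learned energy plus a D3 dispersion term. The sketch below shows only the two-body part of D3 with Becke-Johnson (BJ) damping, for atom pairs whose C6 and C8 coefficients are simply given. Real D3 makes these coefficients depend on coordination numbers, uses functional-specific damping parameters, and can add a three-body term. The default parameters here are placeholders, not values for any particular functional. Atomic units are assumed, and the function names are hypothetical.

```python
import math


def d3_bj_pair_energy(r, c6, c8, s6=1.0, s8=1.0, a1=0.4, a2=4.0):
    """Two-body D3(BJ) dispersion energy of one atom pair (atomic units assumed).

    E = -(s6 * C6 / (r^6 + R^6) + s8 * C8 / (r^8 + R^8)),  R = a1 * sqrt(C8 / C6) + a2.
    Default s6, s8, a1, a2 are placeholders, not fitted values.
    """
    r_damp = a1 * math.sqrt(c8 / c6) + a2  # BJ damping keeps E finite as r -> 0
    return -(s6 * c6 / (r**6 + r_damp**6) + s8 * c8 / (r**8 + r_damp**8))


def mlp_plus_d3(e_mlp, pairs):
    """Total energy = short-range MLP energy + sum of pairwise D3 terms.

    pairs: iterable of (r, C6, C8) tuples, one per atom pair within the D3 cutoff.
    """
    return e_mlp + sum(d3_bj_pair_energy(r, c6, c8) for r, c6, c8 in pairs)


print(d3_bj_pair_energy(0.0, c6=10.0, c8=250.0))
print(mlp_plus_d3(-5.0, [(6.0, 10.0, 250.0), (12.0, 10.0, 250.0)]))
```

Because the dispersion term is added outside the MLP, the MLP cutoff can stay short while the analytic D3 term covers the longer range.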
Affiliation(s)
- Penghua Ying
- Department of Physical Chemistry, School of Chemistry, Tel Aviv University, Tel Aviv 6997801, Israel
- Zheyong Fan
- College of Physical Science and Technology, Bohai University, Jinzhou 121013, People's Republic of China
44
Teng C, Huang D, Donahue E, Bao JL. Exploring torsional conformer space with physical prior mean function-driven meta-Gaussian processes. J Chem Phys 2023; 159:214111. [PMID: 38051097 DOI: 10.1063/5.0176709] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Accepted: 11/12/2023] [Indexed: 12/07/2023]
Abstract
We present a novel approach for systematically exploring the conformational space of small molecules with multiple internal torsions. Identifying unique conformers through a systematic conformational search is important for obtaining accurate thermodynamic functions (e.g., free energy), encompassing contributions from the ensemble of all local minima. Traditional geometry optimizers focus on one structure at a time, lacking transferability from the local potential-energy surface (PES) around a specific minimum to optimize other conformers. In this work, we introduce a physics-driven meta-Gaussian processes (meta-GPs) method that not only enables efficient exploration of target PES for locating local minima but, critically, incorporates physical surrogates that can be applied universally across the optimization of all conformers of the same molecule. Meta-GPs construct surrogate PESs based on the optimization history of prior conformers, dynamically selecting the most suitable prior mean function (representing prior knowledge in Bayesian learning) as a function of the optimization progress. We systematically benchmarked the performance of multiple GP variants for brute-force conformational search of amino acids. Our findings highlight the superior performance of meta-GPs in terms of efficiency, comprehensiveness of conformer discovery, and the distribution of conformers compared to conventional non-surrogate optimizers and other non-meta-GPs. Furthermore, we demonstrate that by concurrently optimizing, training GPs on the fly, and learning PESs, meta-GPs exhibit the capacity to generate high-quality PESs in the torsional space without extensive training data. This represents a promising avenue for physics-based transfer learning via meta-GPs with adaptive priors in exploring torsional conformer space.
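A prior mean function enters GP regression through the posterior mean, mu(x*) = m(x*) + k*^T (K + s^2 I)^-1 (y - m(X)). The GP therefore only has to learn the residual between the data and the physical prior m. The sketch below fits a one-dimensional torsional profile using a periodic kernel (an assumed choice) and a fixed, hypothetical cosine prior. The dynamic selection among priors that makes the method "meta" is not shown.

```python
import math


def periodic_kernel(a, b, length=1.0):
    """Periodic kernel with period 2*pi, suitable for a torsion angle in radians."""
    return math.exp(-2.0 * math.sin((a - b) / 2.0) ** 2 / length**2)


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def gp_posterior_mean(xs, ys, x_new, prior_mean, noise=1e-6, length=1.0):
    """GP posterior mean with a non-zero prior mean: the GP models y - prior_mean(x)."""
    k = [[periodic_kernel(a, b, length) + (noise if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    residuals = [y - prior_mean(x) for x, y in zip(xs, ys)]
    weights = solve(k, residuals)
    correction = sum(periodic_kernel(x_new, xi, length) * w for xi, w in zip(xs, weights))
    return prior_mean(x_new) + correction


def torsion_prior(phi):
    """Hypothetical physics-based prior: a threefold cosine torsion term."""
    return 0.5 * (1.0 + math.cos(3.0 * phi))


angles = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
energies = [torsion_prior(p) + 0.1 * math.sin(p) for p in angles]  # synthetic data
print(gp_posterior_mean(angles, energies, 2.5, torsion_prior))
```

If the prior already matches the data, the residuals vanish and the GP adds nothing. A good physical prior therefore leaves less for the GP to learn from limited data.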
Affiliation(s)
- Chong Teng
- Department of Chemistry, Boston College, Chestnut Hill, Massachusetts 02467, USA
- Daniel Huang
- Department of Computer Science, San Francisco State University, San Francisco, California 94132, USA
- Elizabeth Donahue
- Department of Chemistry, Boston College, Chestnut Hill, Massachusetts 02467, USA
- Junwei Lucas Bao
- Department of Chemistry, Boston College, Chestnut Hill, Massachusetts 02467, USA
45
Ko TW, Ong SP. Recent advances and outstanding challenges for machine learning interatomic potentials. NATURE COMPUTATIONAL SCIENCE 2023; 3:998-1000. [PMID: 38177726 DOI: 10.1038/s43588-023-00561-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/06/2024]
Affiliation(s)
- Tsz Wai Ko
- Department of NanoEngineering, University of California San Diego, La Jolla, CA, USA
- Shyue Ping Ong
- Department of NanoEngineering, University of California San Diego, La Jolla, CA, USA
46
Xia S, Chen E, Zhang Y. Integrated Molecular Modeling and Machine Learning for Drug Design. J Chem Theory Comput 2023; 19:7478-7495. [PMID: 37883810 PMCID: PMC10653122 DOI: 10.1021/acs.jctc.3c00814] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 07/26/2023] [Revised: 10/10/2023] [Accepted: 10/11/2023] [Indexed: 10/28/2023]
Abstract
Modern therapeutic development often involves several stages that are interconnected, and multiple iterations are usually required to bring a new drug to the market. Computational approaches have increasingly become an indispensable part of helping reduce the time and cost of the research and development of new drugs. In this Perspective, we summarize our recent efforts on integrating molecular modeling and machine learning to develop computational tools for modulator design, including a pocket-guided rational design approach based on AlphaSpace to target protein-protein interactions, delta machine learning scoring functions for protein-ligand docking as well as virtual screening, and state-of-the-art deep learning models to predict calculated and experimental molecular properties based on molecular mechanics optimized geometries. Meanwhile, we discuss remaining challenges and promising directions for further development and use a retrospective example of the FDA-approved kinase inhibitor erlotinib to demonstrate the use of these newly developed computational tools.
Affiliation(s)
- Song Xia
- Department of Chemistry, New York University, New York, New York 10003, United States
- Eric Chen
- Department of Chemistry, New York University, New York, New York 10003, United States
- Yingkai Zhang
- Department of Chemistry, New York University, New York, New York 10003, United States
- Simons Center for Computational Physical Chemistry at New York University, New York, New York 10003, United States
- NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
47
Nandi S, Vegge T, Bhowmik A. MultiXC-QM9: Large dataset of molecular and reaction energies from multi-level quantum chemical methods. Sci Data 2023; 10:783. [PMID: 37938558 PMCID: PMC10632468 DOI: 10.1038/s41597-023-02690-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/27/2023] [Accepted: 10/25/2023] [Indexed: 11/09/2023]
Abstract
Well-curated extensive datasets have helped spur intense molecular machine learning (ML) method development activities over the last few years, encouraging nonchemists to be part of the effort as well. The QM9 dataset is one of the benchmark databases for small molecules, with molecular energies based on the B3LYP functional. G4MP2-based energies of these molecules were published later. To enable a wide variety of ML tasks like transfer learning, delta learning, multitask learning, etc. with QM9 molecules, in this article, we introduce a new dataset with QM9 molecule energies estimated with 76 different DFT functionals and three different basis sets (228 energy numbers for each molecule). We additionally enumerated all possible A ↔ B monomolecular interconversions within the QM9 dataset and provided the reaction energies based on these 76 functionals and three basis sets. Lastly, we also provide the bond changes for all the 162 million reactions with the dataset to enable structure- and bond-based reaction energy prediction tools based on ML.
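Two of the counts in this abstract follow from simple bookkeeping. First, 76 functionals × 3 basis sets gives the 228 energies per molecule. Second, a monomolecular A ↔ B interconversion must conserve atoms, so only molecules sharing a molecular formula (isomers) can be paired. The sketch below counts such pairs from a list of formulas. The abstract does not say whether A → B and B → A are counted separately, so that choice is an assumption exposed as a flag. The toy formula list is hypothetical.

```python
from collections import Counter

N_FUNCTIONALS = 76
N_BASIS_SETS = 3
ENERGIES_PER_MOLECULE = N_FUNCTIONALS * N_BASIS_SETS  # 228, matching the abstract


def count_interconversions(formulas, ordered=False):
    """Count A <-> B pairs among molecules that share a molecular formula.

    ordered=False counts each unordered pair once; ordered=True counts A->B and B->A.
    """
    unordered = sum(n * (n - 1) // 2 for n in Counter(formulas).values())
    return 2 * unordered if ordered else unordered


# Each entry stands for a distinct molecule; three C3H8O isomers and two C3H6O isomers.
toy = ["C3H8O", "C3H8O", "C3H8", "CH4O", "C3H8O", "C3H6O", "C3H6O"]
print(ENERGIES_PER_MOLECULE, count_interconversions(toy), count_interconversions(toy, ordered=True))
```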
Affiliation(s)
- Surajit Nandi
- Department of Energy Conversion and Storage, Technical University of Denmark, Anker Engelunds Vej 301, 2800 Kongens Lyngby, Copenhagen, Denmark
- Tejs Vegge
- Department of Energy Conversion and Storage, Technical University of Denmark, Anker Engelunds Vej 301, 2800 Kongens Lyngby, Copenhagen, Denmark
- Arghya Bhowmik
- Department of Energy Conversion and Storage, Technical University of Denmark, Anker Engelunds Vej 301, 2800 Kongens Lyngby, Copenhagen, Denmark
48
Klawohn S, Darby JP, Kermode JR, Csányi G, Caro MA, Bartók AP. Gaussian approximation potentials: Theory, software implementation and application examples. J Chem Phys 2023; 159:174108. [PMID: 37929869 DOI: 10.1063/5.0160898] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2023] [Accepted: 09/12/2023] [Indexed: 11/07/2023]
Abstract
Gaussian Approximation Potentials (GAPs) are a class of Machine Learned Interatomic Potentials routinely used to model materials and molecular systems on the atomic scale. The software implementation provides the means for both fitting models using ab initio data and using the resulting potentials in atomic simulations. Details of the GAP theory, algorithms and software are presented, together with detailed usage examples to help new and existing users. We review some recent developments to the GAP framework, including Message Passing Interface parallelisation of the fitting code enabling its use on thousands of central processing unit cores and compression of descriptors to eliminate the poor scaling with the number of different chemical elements.
Affiliation(s)
- Sascha Klawohn
- Warwick Centre for Predictive Modelling, School of Engineering, University of Warwick, Coventry CV4 7AL, United Kingdom
- James P Darby
- Warwick Centre for Predictive Modelling, School of Engineering, University of Warwick, Coventry CV4 7AL, United Kingdom
- James R Kermode
- Warwick Centre for Predictive Modelling, School of Engineering, University of Warwick, Coventry CV4 7AL, United Kingdom
- Gábor Csányi
- Engineering Laboratory, University of Cambridge, Cambridge CB2 1PZ, United Kingdom
- Miguel A Caro
- Department of Chemistry and Materials Science, Aalto University, 02150 Espoo, Finland
- Albert P Bartók
- Department of Physics, University of Warwick, Coventry CV4 7AL, United Kingdom
- Warwick Centre for Predictive Modelling, School of Engineering, University of Warwick, Coventry CV4 7AL, United Kingdom
49
Huguenin-Dumittan K, Loche P, Haoran N, Ceriotti M. Physics-Inspired Equivariant Descriptors of Nonbonded Interactions. J Phys Chem Lett 2023; 14:9612-9618. [PMID: 37862712 PMCID: PMC10626632 DOI: 10.1021/acs.jpclett.3c02375] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/24/2023] [Accepted: 10/13/2023] [Indexed: 10/22/2023]
Abstract
One essential ingredient in many machine learning (ML) based methods for atomistic modeling of materials and molecules is the use of locality. While allowing better system-size scaling, this systematically neglects long-range (LR) effects such as electrostatic or dispersion interactions. We present an extension of the long distance equivariant (LODE) framework that can handle diverse LR interactions in a consistent way and seamlessly integrates with preexisting methods by building new sets of atom centered features. We provide a direct physical interpretation of these using the multipole expansion, which allows for simpler and more efficient implementations. The framework is applied to simple toy systems as proof of concept and a heterogeneous set of molecular dimers to push the method to its limits. By generalizing LODE to arbitrary asymptotic behaviors, we provide a coherent approach to treat arbitrary two- and many-body nonbonded interactions in the data-driven modeling of matter.
Affiliation(s)
- Kevin K. Huguenin-Dumittan
- Laboratory of Computational Science and Modeling, IMX, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
| | - Philip Loche
- Laboratory
of Computational Science and Modeling, IMX,
École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
| | - Ni Haoran
- Laboratory
of Computational Science and Modeling, IMX,
École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
| | - Michele Ceriotti
- Laboratory
of Computational Science and Modeling, IMX,
École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
50
Illarionov A, Sakipov S, Pereyaslavets L, Kurnikov IV, Kamath G, Butin O, Voronina E, Ivahnenko I, Leontyev I, Nawrocki G, Darkhovskiy M, Olevanov M, Cherniavskyi YK, Lock C, Greenslade S, Sankaranarayanan SKRS, Kurnikova MG, Potoff J, Kornberg RD, Levitt M, Fain B. Combining Force Fields and Neural Networks for an Accurate Representation of Chemically Diverse Molecular Interactions. J Am Chem Soc 2023; 145:23620-23629. [PMID: 37856313 PMCID: PMC10623557 DOI: 10.1021/jacs.3c07628] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 07/17/2023] [Indexed: 10/21/2023]
Abstract
A key goal of molecular modeling is the accurate reproduction of the true quantum mechanical potential energy of arbitrary molecular ensembles with a tractable classical approximation. The challenges are that analytical expressions found in general purpose force fields struggle to faithfully represent the intermolecular quantum potential energy surface at close distances and in strong interaction regimes; that the more accurate neural network approximations do not capture crucial physics concepts, e.g., nonadditive inductive contributions and application of electric fields; and that the ultra-accurate narrowly targeted models have difficulty generalizing to the entire chemical space. We therefore designed a hybrid wide-coverage intermolecular interaction model consisting of an analytically polarizable force field combined with a short-range neural network correction for the total intermolecular interaction energy. Here, we describe the methodology and apply the model to accurately determine the properties of water, the free energy of solvation of neutral and charged molecules, and the binding free energy of ligands to proteins. The correction is subtyped for distinct chemical species to match the underlying force field, to segment and reduce the amount of quantum training data, and to increase accuracy and computational speed. For the systems considered, the hybrid ab initio parametrized Hamiltonian reproduces the two-body dimer quantum mechanics (QM) energies to within 0.03 kcal/mol and the nonadditive many-molecule contributions to within 2%. Simulations of molecular systems using this interaction model run at speeds of several nanoseconds per day.
Affiliation(s)
- Alexey Illarionov
- InterX Inc. (a Subsidiary of NeoTX Therapeutics Ltd.), 805 Allston Way, Berkeley, California 94710, United States
- Serzhan Sakipov
- InterX Inc. (a Subsidiary of NeoTX Therapeutics Ltd.), 805 Allston Way, Berkeley, California 94710, United States
- Leonid Pereyaslavets
- InterX Inc. (a Subsidiary of NeoTX Therapeutics Ltd.), 805 Allston Way, Berkeley, California 94710, United States
- Igor V. Kurnikov
- InterX Inc. (a Subsidiary of NeoTX Therapeutics Ltd.), 805 Allston Way, Berkeley, California 94710, United States
- Ganesh Kamath
- InterX Inc. (a Subsidiary of NeoTX Therapeutics Ltd.), 805 Allston Way, Berkeley, California 94710, United States
- Oleg Butin
- InterX Inc. (a Subsidiary of NeoTX Therapeutics Ltd.), 805 Allston Way, Berkeley, California 94710, United States
- Ekaterina Voronina
- InterX Inc. (a Subsidiary of NeoTX Therapeutics Ltd.), 805 Allston Way, Berkeley, California 94710, United States
- Lomonosov MSU, Skobeltsyn Institute of Nuclear Physics, Moscow, 119991, Russia
- Ilya Ivahnenko
- InterX Inc. (a Subsidiary of NeoTX Therapeutics Ltd.), 805 Allston Way, Berkeley, California 94710, United States
- Igor Leontyev
- InterX Inc. (a Subsidiary of NeoTX Therapeutics Ltd.), 805 Allston Way, Berkeley, California 94710, United States
- Grzegorz Nawrocki
- InterX Inc. (a Subsidiary of NeoTX Therapeutics Ltd.), 805 Allston Way, Berkeley, California 94710, United States
- Mikhail Darkhovskiy
- InterX Inc. (a Subsidiary of NeoTX Therapeutics Ltd.), 805 Allston Way, Berkeley, California 94710, United States
- Michael Olevanov
- InterX Inc. (a Subsidiary of NeoTX Therapeutics Ltd.), 805 Allston Way, Berkeley, California 94710, United States
- Lomonosov MSU, Dept. of Physics, Moscow, 119991, Russia
- Yevhen K. Cherniavskyi
- InterX Inc. (a Subsidiary of NeoTX Therapeutics Ltd.), 805 Allston Way, Berkeley, California 94710, United States
- Christopher Lock
- InterX Inc. (a Subsidiary of NeoTX Therapeutics Ltd.), 805 Allston Way, Berkeley, California 94710, United States
- Department of Neurology and Neurological Sciences, Stanford University School of Medicine, Palo Alto, California 94304, United States
- Sean Greenslade
- InterX Inc. (a Subsidiary of NeoTX Therapeutics Ltd.), 805 Allston Way, Berkeley, California 94710, United States
- Subramanian KRS Sankaranarayanan
- Center for Nanoscale Materials, Argonne National Lab, Argonne, Illinois 60439, United States
- Department of Mechanical and Industrial Engineering, University of Illinois, Chicago, Illinois 60607, United States
- Maria G. Kurnikova
- Department of Chemistry, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Jeffrey Potoff
- Department of Chemical Engineering and Materials Science, Wayne State University, Detroit, Michigan 48202, United States
- Roger D. Kornberg
- Department of Structural Biology, Stanford University School of Medicine, Stanford, California 94304, United States
- Michael Levitt
- Department of Structural Biology, Stanford University School of Medicine, Stanford, California 94304, United States
- Boris Fain
- InterX Inc. (a Subsidiary of NeoTX Therapeutics Ltd.), 805 Allston Way, Berkeley, California 94710, United States