1
Aldossary A, Campos-Gonzalez-Angulo JA, Pablo-García S, Leong SX, Rajaonson EM, Thiede L, Tom G, Wang A, Avagliano D, Aspuru-Guzik A. In Silico Chemical Experiments in the Age of AI: From Quantum Chemistry to Machine Learning and Back. Advanced Materials (Deerfield Beach, Fla.) 2024:e2402369. [PMID: 38794859 DOI: 10.1002/adma.202402369] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2024] [Revised: 04/28/2024] [Indexed: 05/26/2024]
Abstract
Computational chemistry is an indispensable tool for understanding molecules and predicting chemical properties. However, traditional computational methods face significant challenges due to the difficulty of solving the Schrödinger equation and the increasing computational cost with the size of the molecular system. In response, there has been a surge of interest in applying artificial intelligence (AI) and machine learning (ML) techniques to in silico experiments. Integrating AI and ML into computational chemistry increases the scalability and speed of the exploration of chemical space. However, challenges remain, particularly regarding the reproducibility and transferability of ML models. This review highlights the evolution of ML in learning from, complementing, or replacing traditional computational chemistry for energy and property predictions. Starting from models trained entirely on numerical data, a journey is set forth toward the ideal model incorporating or learning the physical laws of quantum mechanics. This paper also reviews existing computational methods and ML models and their intertwining, outlines a roadmap for future research, and identifies areas for improvement and innovation. Ultimately, the goal is to develop AI architectures capable of predicting accurate and transferable solutions to the Schrödinger equation, thereby revolutionizing in silico experiments within chemistry and materials science.
Affiliation(s)
- Abdulrahman Aldossary
  - Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, ON, M5S 3H6, Canada
- Sergio Pablo-García
  - Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, ON, M5S 3H6, Canada
  - Department of Computer Science, University of Toronto, 40 St. George Street, Toronto, ON, M5S 2E4, Canada
- Shi Xuan Leong
  - Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, ON, M5S 3H6, Canada
- Ella Miray Rajaonson
  - Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, ON, M5S 3H6, Canada
  - Vector Institute for Artificial Intelligence, 661 University Ave. Suite 710, Toronto, ON, M5G 1M1, Canada
- Luca Thiede
  - Department of Computer Science, University of Toronto, 40 St. George Street, Toronto, ON, M5S 2E4, Canada
  - Vector Institute for Artificial Intelligence, 661 University Ave. Suite 710, Toronto, ON, M5G 1M1, Canada
- Gary Tom
  - Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, ON, M5S 3H6, Canada
  - Vector Institute for Artificial Intelligence, 661 University Ave. Suite 710, Toronto, ON, M5G 1M1, Canada
- Andrew Wang
  - Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, ON, M5S 3H6, Canada
- Davide Avagliano
  - Chimie ParisTech, PSL University, CNRS, Institute of Chemistry for Life and Health Sciences (iCLeHS UMR 8060), Paris, F-75005, France
- Alán Aspuru-Guzik
  - Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, ON, M5S 3H6, Canada
  - Department of Computer Science, University of Toronto, 40 St. George Street, Toronto, ON, M5S 2E4, Canada
  - Vector Institute for Artificial Intelligence, 661 University Ave. Suite 710, Toronto, ON, M5G 1M1, Canada
  - Department of Materials Science & Engineering, University of Toronto, 184 College St., Toronto, ON, M5S 3E4, Canada
  - Department of Chemical Engineering & Applied Chemistry, University of Toronto, 200 College St., Toronto, ON, M5S 3E5, Canada
  - Lebovic Fellow, Canadian Institute for Advanced Research (CIFAR), 66118 University Ave., Toronto, M5G 1M1, Canada
  - Acceleration Consortium, 80 St George St, Toronto, M5S 3H6, Canada
2
Khanifaev J, Schrader T, Perlt E. The effect of machine learning predicted anharmonic frequencies on thermodynamic properties of fluid hydrogen fluoride. J Chem Phys 2024; 160:124302. [PMID: 38516969 DOI: 10.1063/5.0195386] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/02/2024] [Accepted: 03/02/2024] [Indexed: 03/23/2024] Open
Abstract
Anharmonic effects play a crucial role in determining thermochemical properties of liquids and gases. For such extended phases, the inclusion of anharmonicity in reliable electronic structure methods is computationally extremely demanding, and hence, anharmonic effects are often lacking in thermochemical calculations. In this study, we apply the quantum cluster equilibrium method to transfer density functional theory calculations at the cluster level to the macroscopic, liquid, and gaseous phase of hydrogen fluoride. This allows us to include anharmonicity, either via vibrational self-consistent field calculations for smaller clusters or using a regression model for larger clusters. We obtain the structural composition of the fluid phases in terms of the population of different clusters as well as isobaric heat capacities as an example for thermodynamic properties. We study the role of anharmonicities for these analyses and observe that, in particular, the dominating structural motifs are rather sensitive to the anharmonicity in vibrational frequencies. The regression model proves to be a promising way to get access to anharmonic features, and the extension to more sophisticated machine-learning models is promising.
Affiliation(s)
- Jamoliddin Khanifaev
  - Otto Schott Institute of Materials Research, Friedrich Schiller University Jena, 07743 Jena, Germany
- Tim Schrader
  - Otto Schott Institute of Materials Research, Friedrich Schiller University Jena, 07743 Jena, Germany
- Eva Perlt
  - Otto Schott Institute of Materials Research, Friedrich Schiller University Jena, 07743 Jena, Germany
3
Schröder B, Rauhut G. From the Automated Calculation of Potential Energy Surfaces to Accurate Infrared Spectra. J Phys Chem Lett 2024; 15:3159-3169. [PMID: 38478898 PMCID: PMC10961845 DOI: 10.1021/acs.jpclett.4c00186] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2024] [Revised: 02/20/2024] [Accepted: 02/28/2024] [Indexed: 03/22/2024]
Abstract
Advances in the development of quantum chemical methods and progress in multicore architectures in computer science made the simulation of infrared spectra of isolated molecules competitive with respect to established experimental methods. Although it is mainly the multidimensional potential energy surface that controls the accuracy of these calculations, the subsequent vibrational structure calculations need to be carefully converged in order to yield accurate results. As both aspects need to be considered in a balanced way, we focus on approaches for molecules of up to 12-15 atoms with respect to both parts, which have been automated to some extent so that they can be employed in routine applications. Alternatives to machine learning will be discussed, which appear to be attractive, as long as local regions of the potential energy surface are sufficient. The automatization of these methods is still in its infancy, and the generalization to molecules with large amplitude motions or molecular clusters is far from trivial, but many systems relevant for astrophysical studies are already in reach.
Affiliation(s)
- Benjamin Schröder
  - Institute of Physical Chemistry, University of Goettingen, Tammannstrasse 6, Göttingen 37077, Germany
- Guntram Rauhut
  - Institute for Theoretical Chemistry, University of Stuttgart, Pfaffenwaldring 55, Stuttgart 70569, Germany
4
Käser S, Meuwly M. Numerical Accuracy Matters: Applications of Machine Learned Potential Energy Surfaces. J Phys Chem Lett 2024:3419-3424. [PMID: 38506827 DOI: 10.1021/acs.jpclett.3c03405] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/21/2024]
Abstract
The role of numerical accuracy in training and evaluating neural network-based potential energy surfaces is examined for different experimental observables. For observables that require third- and fourth-order derivatives of the potential energy with respect to Cartesian coordinates, single-precision arithmetic, as is typically used in ML-based approaches, is insufficient and leads to roughness of the underlying PES, as is explicitly demonstrated. Increasing the numerical accuracy to double precision gives a smooth PES with higher-order derivatives that are numerically stable and yield meaningful anharmonic frequencies and tunneling splitting, as is demonstrated for H₂CO and malonaldehyde. For molecular dynamics simulations, which only require first-order derivatives, single-precision arithmetic appears to be sufficient, though.
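Illustrative note (not from the paper): the precision effect described in this abstract can be mimicked with a toy calculation. The sketch below rounds the energies of a one-dimensional Morse potential to single precision and compares finite-difference derivatives. The potential parameters, the energy offset of -100 (standing in for a large total energy), the step size, and the use of finite differences are all assumptions made for illustration; the paper itself deals with neural network PESs, not with this model.

```python
import math
import struct


def to_single(x: float) -> float:
    """Round a double-precision float to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def energy(r: float) -> float:
    """Toy Morse potential (D = a = r0 = 1) shifted by a hypothetical offset of -100."""
    return -100.0 + (1.0 - math.exp(-(r - 1.0))) ** 2


def energy_single(r: float) -> float:
    """Same energy, but stored in single precision."""
    return to_single(energy(r))


def first_derivative(f, x: float, h: float) -> float:
    """Central finite-difference estimate of df/dx."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def fourth_derivative(f, x: float, h: float) -> float:
    """Central finite-difference estimate of the fourth derivative."""
    stencil = f(x - 2 * h) - 4 * f(x - h) + 6 * f(x) - 4 * f(x + h) + f(x + 2 * h)
    return stencil / h**4


H = 1e-2  # hypothetical displacement step

# Analytic references for the toy potential.
exact_d1 = 2 * math.exp(-0.5) - 2 * math.exp(-1.0)  # dV/dr at r = 1.5
exact_d4 = 14.0                                      # d4V/dr4 at r = 1.0

d1_single = first_derivative(energy_single, 1.5, H)
d4_double = fourth_derivative(energy, 1.0, H)
d4_single = fourth_derivative(energy_single, 1.0, H)

print(f"first derivative, single precision:  {d1_single:.4f} (exact {exact_d1:.4f})")
print(f"fourth derivative, double precision: {d4_double:.3f} (exact {exact_d4:.1f})")
print(f"fourth derivative, single precision: {d4_single:.3f} (exact {exact_d4:.1f})")
```

In this toy setting the first derivative survives the rounding, while the fourth derivative does not, because the stencil divides rounding errors by the fourth power of the step.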
Affiliation(s)
- Silvan Käser
  - Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland
- Markus Meuwly
  - Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland
5
Shi J, Albreiki F, Colón YJ, Srivastava S, Whitmer JK. Transfer Learning Facilitates the Prediction of Polymer-Surface Adhesion Strength. J Chem Theory Comput 2023; 19:4631-4640. [PMID: 37068204 DOI: 10.1021/acs.jctc.2c01314] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/19/2023]
Abstract
Machine learning (ML) accelerates the exploration of material properties and their links to the structure of the underlying molecules. In previous work [Shi et al. ACS Applied Materials & Interfaces 2022, 14, 37161-37169.], ML models were applied to predict the adhesive free energy of polymer-surface interactions with high accuracy from the knowledge of the sequence data, demonstrating successes in inverse-design of polymer sequence for known surface compositions. While the method was shown to be successful in designing polymers for a known surface, extensive data sets were needed for each specific surface in order to train the surrogate models. Ideally, one should be able to infer information about similar surfaces without having to regenerate a full complement of adhesion data for each new case. In the current work, we demonstrate a transfer learning (TL) technique using a deep neural network to improve the accuracy of ML models trained on small data sets by pretraining on a larger database from a related system and fine-tuning the weights of all layers with a small amount of additional data. The shared knowledge from the pretrained model facilitates the prediction accuracy significantly on small data sets. We also explore the limits of database size on accuracy and the optimal tuning of network architecture and parameters for our learning tasks. While applied to a relatively simple coarse-grained (CG) polymer model, the general lessons of this study apply to detailed modeling studies and the broader problems of inverse materials design.
Affiliation(s)
- Jiale Shi
  - Department of Chemical and Biomolecular Engineering, University of Notre Dame, Notre Dame, Indiana 46556, United States
- Fahed Albreiki
  - Department of Chemical and Biomolecular Engineering, University of California, Los Angeles, Los Angeles, California 90095, United States
- Yamil J Colón
  - Department of Chemical and Biomolecular Engineering, University of Notre Dame, Notre Dame, Indiana 46556, United States
- Samanvaya Srivastava
  - Department of Chemical and Biomolecular Engineering, University of California, Los Angeles, Los Angeles, California 90095, United States
  - California NanoSystems Institute, Center for Biological Physics, University of California, Los Angeles, Los Angeles, California 90095, United States
  - Institute for Carbon Management, University of California, Los Angeles, Los Angeles, California 90095, United States
  - Center for Biological Physics, University of California, Los Angeles, Los Angeles, California 90095, United States
- Jonathan K Whitmer
  - Department of Chemical and Biomolecular Engineering, University of Notre Dame, Notre Dame, Indiana 46556, United States
  - Department of Chemistry and Biochemistry, University of Notre Dame, Notre Dame, Indiana 46556, United States
6
Tang Z, Bromley ST, Hammer B. A machine learning potential for simulating infrared spectra of nanosilicate clusters. J Chem Phys 2023; 158:2895243. [PMID: 37290080 DOI: 10.1063/5.0150379] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2023] [Accepted: 05/23/2023] [Indexed: 06/10/2023] Open
Abstract
The use of machine learning (ML) in chemical physics has enabled the construction of interatomic potentials having the accuracy of ab initio methods and a computational cost comparable to that of classical force fields. Training an ML model requires an efficient method for the generation of training data. Here, we apply an accurate and efficient protocol to collect training data for constructing a neural network-based ML interatomic potential for nanosilicate clusters. Initial training data are taken from normal modes and farthest point sampling. Later on, the set of training data is extended via an active learning strategy in which new data are identified by the disagreement between an ensemble of ML models. The whole process is further accelerated by parallel sampling over structures. We use the ML model to run molecular dynamics simulations of nanosilicate clusters with various sizes, from which infrared spectra with anharmonicity included can be extracted. Such spectroscopic data are needed for understanding the properties of silicate dust grains in the interstellar medium and in circumstellar environments.
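Illustrative note (not from the paper): the active learning step described in this abstract, where new training data are identified by the disagreement between an ensemble of models, can be sketched as follows. The ensemble members, the candidate points, and the threshold are hypothetical stand-ins; the paper uses neural network potentials and atomic structures rather than one-dimensional functions.

```python
from statistics import pstdev


def make_model(c: float):
    """Hypothetical ensemble member: agrees with x**2 on [0, 1], deviates beyond it."""
    return lambda x: x**2 + c * max(0.0, x - 1.0) ** 2


def select_by_disagreement(models, candidates, threshold):
    """Return the candidates whose ensemble standard deviation exceeds the threshold."""
    selected = []
    for x in candidates:
        predictions = [model(x) for model in models]
        if pstdev(predictions) > threshold:
            selected.append(x)
    return selected


# Three members that differ only outside the (assumed) training region [0, 1].
ensemble = [make_model(c) for c in (-0.3, 0.0, 0.3)]
candidates = [0.0, 0.5, 1.0, 2.0, 3.0]

# Candidates with high disagreement would be sent for new reference calculations.
to_label = select_by_disagreement(ensemble, candidates, threshold=0.1)
print(to_label)
```

Only the points outside the region where the members agree are selected, which is the behavior an ensemble-based selection criterion relies on.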
Affiliation(s)
- Zeyuan Tang
  - Center for Interstellar Catalysis, Department of Physics and Astronomy, Aarhus University, Ny Munkegade 120, Aarhus C 8000, Denmark
- Stefan T Bromley
  - Departament de Ciència de Materials i Química Física and Institut de Química Teòrica i Computacional (IQTCUB), Universitat de Barcelona, c/Martí i Franquès 1-11, 08028 Barcelona, Spain
  - Institució Catalana de Recerca i Estudis Avançats (ICREA), Passeig Lluis Companys 23, 08010 Barcelona, Spain
- Bjørk Hammer
  - Center for Interstellar Catalysis, Department of Physics and Astronomy, Aarhus University, Ny Munkegade 120, Aarhus C 8000, Denmark
7
Käser S, Vazquez-Salazar LI, Meuwly M, Töpfer K. Neural network potentials for chemistry: concepts, applications and prospects. Digital Discovery 2023; 2:28-58. [PMID: 36798879 PMCID: PMC9923808 DOI: 10.1039/d2dd00102k] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 09/23/2022] [Accepted: 12/20/2022] [Indexed: 12/24/2022]
Abstract
Artificial Neural Networks (NN) are already heavily involved in methods and applications for frequent tasks in the field of computational chemistry such as representation of potential energy surfaces (PES) and spectroscopic predictions. This perspective provides an overview of the foundations of neural network-based full-dimensional potential energy surfaces, their architectures, underlying concepts, their representation and applications to chemical systems. Methods for data generation and training procedures for PES construction are discussed and means for error assessment and refinement through transfer learning are presented. A selection of recent results illustrates the latest improvements regarding accuracy of PES representations and system size limitations in dynamics simulations, but also NN application enabling direct prediction of physical results without dynamics simulations. The aim is to provide an overview for the current state-of-the-art NN approaches in computational chemistry and also to point out the current challenges in enhancing reliability and applicability of NN methods on a larger scale.
Affiliation(s)
- Silvan Käser
  - Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland
- Markus Meuwly
  - Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland
- Kai Töpfer
  - Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland
8
Bac S, Patra A, Kron KJ, Mallikarjun Sharada S. Recent Advances toward Efficient Calculation of Higher Nuclear Derivatives in Quantum Chemistry. J Phys Chem A 2022; 126:7795-7805. [DOI: 10.1021/acs.jpca.2c05459] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Affiliation(s)
- Selin Bac
  - Mork Family Department of Chemical Engineering and Materials Science, University of Southern California, Los Angeles, California 90089, United States
- Abhilash Patra
  - Mork Family Department of Chemical Engineering and Materials Science, University of Southern California, Los Angeles, California 90089, United States
- Kareesa J. Kron
  - Mork Family Department of Chemical Engineering and Materials Science, University of Southern California, Los Angeles, California 90089, United States
- Shaama Mallikarjun Sharada
  - Mork Family Department of Chemical Engineering and Materials Science, University of Southern California, Los Angeles, California 90089, United States
  - Department of Chemistry, University of Southern California, Los Angeles, California 90089, United States
9
Käser S, Richardson JO, Meuwly M. Transfer Learning for Affordable and High-Quality Tunneling Splittings from Instanton Calculations. J Chem Theory Comput 2022; 18:6840-6850. [DOI: 10.1021/acs.jctc.2c00790] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Affiliation(s)
- Silvan Käser
  - Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland
- Markus Meuwly
  - Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland
10
Ruth M, Gerbig D, Schreiner PR. Machine Learning of Coupled Cluster (T)-Energy Corrections via Delta (Δ)-Learning. J Chem Theory Comput 2022; 18:4846-4855. [PMID: 35816588 DOI: 10.1021/acs.jctc.2c00501] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/12/2022]
Abstract
Accurate thermochemistry is essential in many chemical disciplines, such as astro-, atmospheric, or combustion chemistry. These areas often involve fleetingly existent intermediates whose thermochemistry is difficult to assess. Whenever direct calorimetric experiments are infeasible, accurate computational estimates of relative molecular energies are required. However, high-level computations, often using coupled cluster theory, are generally resource-intensive. To expedite the process using machine learning techniques, we generated a database of energies for small organic molecules at the CCSD(T)/cc-pVDZ, CCSD(T)/aug-cc-pVDZ, and CCSD(T)/cc-pVTZ levels of theory. Leveraging the power of deep learning by employing graph neural networks, we are able to predict the effect of perturbatively included triples (T), that is, the difference between CCSD and CCSD(T) energies, with a mean absolute error of 0.25, 0.25, and 0.28 kcal mol⁻¹ (R² of 0.998, 0.997, and 0.998) with the cc-pVDZ, aug-cc-pVDZ, and cc-pVTZ basis sets, respectively. Our models were further validated by application to three validation sets taken from the S22 Database as well as to a selection of known theoretically challenging cases.
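Illustrative note (not from the paper): Δ-learning, as described in this abstract, trains a model on the difference between a cheaper and a more expensive level of theory and adds the predicted difference back to the cheap result. The sketch below uses a one-variable least-squares fit as a stand-in for the graph neural network, a heavy-atom count as a hypothetical descriptor, and made-up energies; none of these values come from the paper.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y ~ slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up training data: (descriptor, low-level energy, high-level energy).
train = [
    (2, -20.00, -20.05),
    (3, -30.00, -30.07),
    (4, -40.00, -40.09),
    (5, -50.00, -50.11),
]
descriptors = [d for d, _, _ in train]
deltas = [high - low for _, low, high in train]  # the quantity the model learns

slope, intercept = fit_line(descriptors, deltas)


def predict_high(descriptor, low_energy):
    """Delta-learning prediction: cheap baseline plus learned correction."""
    return low_energy + slope * descriptor + intercept


print(predict_high(6, -60.00))
```

The design point is that the correction is usually smaller and smoother than the total energy, so a modest model can be enough to learn it.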
Affiliation(s)
- Marcel Ruth
  - Institute of Organic Chemistry, Justus Liebig University, Heinrich-Buff-Ring 17, 35392 Giessen, Germany
- Dennis Gerbig
  - Institute of Organic Chemistry, Justus Liebig University, Heinrich-Buff-Ring 17, 35392 Giessen, Germany
- Peter R Schreiner
  - Institute of Organic Chemistry, Justus Liebig University, Heinrich-Buff-Ring 17, 35392 Giessen, Germany
11
Meuwly M. Atomistic Simulations for Reactions and Vibrational Spectroscopy in the Era of Machine Learning - Quo Vadis? J Phys Chem B 2022; 126:2155-2167. [PMID: 35286087 DOI: 10.1021/acs.jpcb.2c00212] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 11/29/2022]
Abstract
Atomistic simulations using accurate energy functions can provide molecular-level insight into functional motions of molecules in the gas and in the condensed phase. This Perspective delineates the present status of the field from the efforts of others and some of our own work and discusses open questions and future prospects. The combination of physics-based long-range representations using multipolar charge distributions and kernel representations for the bonded interactions is shown to provide realistic models for the exploration of the infrared spectroscopy of molecules in solution. For reactions, empirical models connecting dedicated energy functions for the reactant and product states allow statistically meaningful sampling of conformational space whereas machine-learned energy functions are superior in accuracy. The future combination of physics-based models with machine-learning techniques and integration into all-purpose molecular simulation software provides a unique opportunity to bring such dynamics simulations closer to reality.
Affiliation(s)
- Markus Meuwly
  - Department of Chemistry, University of Basel, Klingelbergstrasse 80, 4056 Basel, Switzerland
12
Gupta AK, Raghavachari K. Three-Dimensional Convolutional Neural Networks Utilizing Molecular Topological Features for Accurate Atomization Energy Predictions. J Chem Theory Comput 2022; 18:2132-2143. [PMID: 35226496 DOI: 10.1021/acs.jctc.1c00504] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/30/2022]
Abstract
Deep learning methods provide a novel way to establish a correlation between two quantities. In this context, computer vision techniques such as three-dimensional (3D)-convolutional neural networks become a natural choice to associate a molecular property with its structure due to the inherent 3D nature of a molecule. However, traditional 3D input data structures are intrinsically sparse in nature, which tend to induce instabilities during the learning process, which in turn may lead to underfitted results. To address this deficiency, in this project, we propose to use quantum-chemically derived molecular topological features, namely, localized orbital locator and electron localization function, as molecular descriptors, which provide a relatively denser input representation in a 3D space. Such topological features provide a detailed picture of the atomic and electronic configuration and interatomic interactions in the molecule and hence are ideal for predicting properties that are highly dependent on the physical or electronic structure of the molecule. Herein, we demonstrate the efficacy of our proposed model by applying it to the task of predicting atomization energies for the QM9-G4MP2 data set, which contains ∼134k molecules. Furthermore, we incorporated the Δ-machine learning approach into our model, which enabled us to reach beyond benchmark accuracy levels (∼1.0 kJ mol⁻¹). As a result, we consistently obtain impressive mean absolute errors of the order 0.1 kcal mol⁻¹ (∼0.42 kJ mol⁻¹) versus the G4(MP2) theory using relatively modest models, which could potentially be improved further in a systematic manner using additional compute resources.
Affiliation(s)
- Ankur Kumar Gupta
  - Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
- Krishnan Raghavachari
  - Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
13
Salehi SM, Käser S, Töpfer K, Diamantis P, Pfister R, Hamm P, Rothlisberger U, Meuwly M. Hydration dynamics and IR spectroscopy of 4-fluorophenol. Phys Chem Chem Phys 2022; 24:26046-26060. [PMID: 36268728 PMCID: PMC9627945 DOI: 10.1039/d2cp02857c] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Abstract
Halogenated groups are relevant in pharmaceutical applications and potentially useful spectroscopic probes for infrared spectroscopy. In this work, the structural dynamics and infrared spectroscopy of para-fluorophenol (F-PhOH) and phenol (PhOH) are investigated in the gas phase and in water using a combination of experiment and molecular dynamics (MD) simulations. The gas phase and solvent dynamics around F-PhOH and PhOH are characterized from atomistic simulations using empirical energy functions with point charges or multipoles for the electrostatics, Machine Learning (ML) based parametrizations, and with full ab initio (QM) and mixed Quantum Mechanical/Molecular Mechanics (QM/MM) simulations with a particular focus on the CF- and OH-stretch region. The CF-stretch band is heavily mixed with other modes, whereas the OH-stretch in solution displays a characteristic high-frequency peak around 3600 cm⁻¹, most likely associated with the –OH group of PhOH and F-PhOH, together with a characteristic progression below 3000 cm⁻¹ due to coupling with water modes, which is also reproduced by several of the simulations. Solvent and radial distribution functions indicate that the CF-site is largely hydrophobic except for simulations using point charges, which renders them unsuited for correctly describing hydration and dynamics around fluorinated sites. The hydrophobic character of the CF-group is particularly relevant for applications in pharmaceutical chemistry with a focus on local hydration and interaction with the surrounding protein.
Affiliation(s)
- Seyedeh Maryam Salehi
  - Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland
- Silvan Käser
  - Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland
- Kai Töpfer
  - Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland
- Polydefkis Diamantis
  - Laboratory of Computational Chemistry and Biochemistry, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland
- Rolf Pfister
  - Department of Chemistry, University of Zurich, Switzerland
- Peter Hamm
  - Department of Chemistry, University of Zurich, Switzerland
- Ursula Rothlisberger
  - Laboratory of Computational Chemistry and Biochemistry, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland
- Markus Meuwly
  - Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland
14
Käser S, Meuwly M. Transfer learned potential energy surfaces: accurate anharmonic vibrational dynamics and dissociation energies for the formic acid monomer and dimer. Phys Chem Chem Phys 2021; 24:5269-5281. [PMID: 34792523 PMCID: PMC8890265 DOI: 10.1039/d1cp04393e] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/21/2022]
Abstract
The vibrational dynamics of the formic acid monomer (FAM) and dimer (FAD) is investigated from machine-learned potential energy surfaces at the MP2 (PES_MP2) and transfer-learned (PES_TL) to the CCSD(T) levels of theory. The normal mode (MAEs of 17.6 and 25.1 cm⁻¹) and second order vibrational perturbation theory (VPT2, MAEs of 6.7 and 17.1 cm⁻¹) frequencies from PES_TL for all modes below 2000 cm⁻¹ for FAM and FAD agree favourably with experiment. For the OH stretch mode the experimental frequencies are overestimated by more than 150 cm⁻¹ for both FAM and FAD from normal mode calculations. Conversely, VPT2 calculations on PES_TL for FAM reproduce the experimental OH frequency to within 22 cm⁻¹. For FAD the VPT2 calculations find the high-frequency OH stretch at 3011 cm⁻¹, compared with an experimentally reported, broad (∼100 cm⁻¹) absorption band with center frequency estimated at ∼3050 cm⁻¹. In agreement with earlier reports, MD simulations at higher temperature shift the position of the OH-stretch in FAM to the red, consistent with improved sampling of the anharmonic regions of the PES. However, for FAD the OH-stretch shifts to the blue and for temperatures higher than 1000 K the dimer partly or fully dissociates using PES_TL. Including zero-point energy corrections from diffusion Monte Carlo simulations for FAM and FAD and corrections due to basis set superposition and completeness errors yields a dissociation energy of D₀ = −14.23 ± 0.08 kcal mol⁻¹ compared with an experimentally determined value of −14.22 ± 0.12 kcal mol⁻¹. Neural network based PESs are constructed for formic acid monomer and dimer at the MP2 and transfer learned to the CCSD(T) level of theory. The PESs are used to study the vibrational dynamics and dissociation energy of the molecules.
Affiliation(s)
- Silvan Käser
  - Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland
- Markus Meuwly
  - Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland
15
Abstract
Computational methods have emerged as a powerful tool to augment traditional experimental molecular catalyst design by providing useful predictions of catalyst performance and decreasing the time needed for catalyst screening. In this perspective, we discuss three approaches for computational molecular catalyst design: (i) the reaction mechanism-based approach that calculates all relevant elementary steps, finds the rate and selectivity determining steps, and ultimately makes predictions on catalyst performance based on kinetic analysis, (ii) the descriptor-based approach where physical/chemical considerations are used to find molecular properties as predictors of catalyst performance, and (iii) the data-driven approach where statistical analysis as well as machine learning (ML) methods are used to obtain relationships between available data/features and catalyst performance. Following an introduction to these approaches, we cover their strengths and weaknesses and highlight some recent key applications. Furthermore, we present an outlook on how the currently applied approaches may evolve in the near future by addressing how recent developments in building automated computational workflows and implementing advanced ML models hold promise for reducing human workload, eliminating human bias, and speeding up computational catalyst design at the same time. Finally, we provide our viewpoint on how some of the challenges associated with the up-and-coming approaches driven by automation and ML may be resolved.
Affiliation(s)
- Ademola Soyemi
- Department of Chemical and Biological Engineering, The University of Alabama, Tuscaloosa, AL 35487, USA.
- Tibor Szilvási
- Department of Chemical and Biological Engineering, The University of Alabama, Tuscaloosa, AL 35487, USA.
|
16
|
Vazquez-Salazar LI, Boittier ED, Unke OT, Meuwly M. Impact of the Characteristics of Quantum Chemical Databases on Machine Learning Prediction of Tautomerization Energies. J Chem Theory Comput 2021; 17:4769-4785. [PMID: 34288675 DOI: 10.1021/acs.jctc.1c00363] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 12/27/2022]
Abstract
An essential aspect for adequate predictions of chemical properties by machine learning models is the database used for training them. However, studies that analyze how the content and structure of the databases used for training impact the prediction quality are scarce. In this work, we analyze and quantify the relationships learned by a machine learning model (Neural Network) trained on five different reference databases (QM9, PC9, ANI-1E, ANI-1, and ANI-1x) to predict tautomerization energies from molecules in Tautobase. For this, the effects of characteristics such as the number of heavy atoms in a molecule, the number of atoms of a given element, bond composition, or initial geometry on the quality of the predictions are considered. The results indicate that training on a chemically diverse database is crucial for obtaining good results and also that conformational sampling can partly compensate for limited coverage of chemical diversity. The overall best-performing reference database (ANI-1x) performs on average 1 kcal/mol better than PC9, which, however, contains about 2 orders of magnitude fewer reference structures. On the other hand, PC9 is chemically more diverse by a factor of ∼5 as quantified by the number of atom-in-molecule-based fragments (amons) it contains compared with the ANI family of databases. A quantitative measure for deficiencies is the Kullback-Leibler divergence between reference and target distributions. It is explicitly demonstrated that when certain types of bonds need to be covered in the target database (Tautobase) but are undersampled in the reference databases, the resulting predictions are poor. Examples of this include the poor performance of all databases analyzed in predicting C(sp2)-C(sp2) double bonds close to heteroatoms and azoles containing N-N and N-O bonds.
Analysis of the results with a Tree MAP algorithm provides deeper understanding of specific deficiencies in predicting tautomerization energies by the reference datasets due to inadequate coverage of chemical space. This information can be used either to improve existing databases or to generate new databases of sufficient diversity for a range of machine learning (ML) applications in chemistry.
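The abstract names the Kullback-Leibler divergence between reference and target distributions as a quantitative measure of database deficiencies. The following is a minimal sketch of that idea, not the authors' implementation: it assumes the distributions are normalized histograms over bond types, takes the direction D_KL(target || reference), and floors empty reference bins at a small `eps`. The function name `kl_divergence` and all bond counts are hypothetical, chosen only for illustration.

```python
import math


def kl_divergence(target_counts, reference_counts, eps=1e-9):
    """D_KL(target || reference) for two histograms given as {category: count}.

    Assumption: reference bins with zero probability are floored at `eps`
    so the logarithm stays finite; this is a common shortcut, not a
    renormalized smoothing scheme.
    """
    target_total = sum(target_counts.values())
    reference_total = sum(reference_counts.values())
    divergence = 0.0
    for category, count in target_counts.items():
        p = count / target_total
        if p == 0.0:
            continue  # terms with p = 0 contribute nothing
        q = reference_counts.get(category, 0) / reference_total
        divergence += p * math.log(p / max(q, eps))
    return divergence


# Hypothetical bond-type counts (illustrative only, not taken from the paper).
target = {"C-C": 50, "C=C": 30, "N-N": 20}
well_covered = {"C-C": 500, "C=C": 300, "N-N": 200}
undersampled = {"C-C": 700, "C=C": 299, "N-N": 1}

kl_well_covered = kl_divergence(target, well_covered)
kl_undersampled = kl_divergence(target, undersampled)
print(f"well covered: {kl_well_covered:.3f}")
print(f"undersampled: {kl_undersampled:.3f}")
```

In this toy case the reference histogram that undersamples N-N bonds yields the larger divergence, which mirrors the abstract's point that bond types needed in the target but rare in the reference database are associated with poor predictions.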
Affiliation(s)
- Eric D Boittier
- Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland.
- Oliver T Unke
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany; DFG Cluster of Excellence "Unifying Systems in Catalysis" (UniSysCat), Technische Universität Berlin, 10623 Berlin, Germany.
- Markus Meuwly
- Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland; Department of Chemistry, Brown University, Providence, Rhode Island 02912, United States.
|