1
Stuyver T. TS-tools: Rapid and automated localization of transition states based on a textual reaction SMILES input. J Comput Chem 2024; 45:2308-2317. [PMID: 38850166 DOI: 10.1002/jcc.27374] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2024] [Revised: 03/08/2024] [Accepted: 03/20/2024] [Indexed: 06/10/2024]
Abstract
Here, TS-tools is presented, a Python package facilitating the automated localization of transition states (TS) based on a textual reaction SMILES input. TS searches can either be performed at xTB or DFT level of theory, with the former yielding guesses at marginal computational cost, and the latter directly yielding accurate structures at greater expense. On a benchmarking dataset of mono- and bimolecular reactions, TS-tools reaches an excellent success rate of 95% already at xTB level of theory. For tri- and multimolecular reaction pathways - which are typically not benchmarked when developing new automated TS search approaches, yet are relevant for various types of reactivity, cf. solvent- and autocatalysis and enzymatic reactivity - TS-tools retains its ability to identify TS geometries, though a DFT treatment becomes essential in many cases. Throughout the presented applications, a particular emphasis is placed on solvation-induced mechanistic changes, another issue that received limited attention in the automated TS search literature so far.
Affiliation(s)
- Thijs Stuyver
  - Ecole Nationale Supérieure de Chimie de Paris, Université PSL, CNRS, Institute of Chemistry for Life and Health Sciences, Paris, France

2
Joll K, Schienbein P, Rosso KM, Blumberger J. Machine learning the electric field response of condensed phase systems using perturbed neural network potentials. Nat Commun 2024; 15:8192. [PMID: 39294144 PMCID: PMC11411082 DOI: 10.1038/s41467-024-52491-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2024] [Accepted: 09/11/2024] [Indexed: 09/20/2024] Open
Abstract
The interaction of condensed phase systems with external electric fields is of major importance in a myriad of processes in nature and technology, ranging from the field-directed motion of cells (galvanotaxis), to geochemistry and the formation of ice phases on planets, to field-directed chemical catalysis and energy storage and conversion systems including supercapacitors, batteries and solar cells. Molecular simulation in the presence of electric fields would give important atomistic insight into these processes but applications of the most accurate methods such as ab-initio molecular dynamics (AIMD) are limited in scope by their computational expense. Here we introduce Perturbed Neural Network Potential Molecular Dynamics (PNNP MD) to push back the accessible time and length scales of such simulations. We demonstrate that important dielectric properties of liquid water including the field-induced relaxation dynamics, the dielectric constant and the field-dependent IR spectrum can be machine learned up to surprisingly high field strengths of about 0.2 V Å⁻¹ without loss in accuracy when compared to ab-initio molecular dynamics. This is remarkable because, in contrast to most previous approaches, the two neural networks on which PNNP MD is based are exclusively trained on molecular configurations sampled from zero-field MD simulations, demonstrating that the networks not only interpolate but also reliably extrapolate the field response. PNNP MD is based on rigorous theory yet it is simple, general, modular, and systematically improvable allowing us to obtain atomistic insight into the interaction of a wide range of condensed phase systems with external electric fields.
Affiliation(s)
- Kit Joll
  - Department of Physics and Astronomy and Thomas Young Centre, University College London, London, UK
- Philipp Schienbein
  - Department of Physics and Astronomy and Thomas Young Centre, University College London, London, UK
  - Department of Physics, Imperial College London, South Kensington, London, UK
- Kevin M Rosso
  - Pacific Northwest National Laboratory, Richland, Washington, USA
- Jochen Blumberger
  - Department of Physics and Astronomy and Thomas Young Centre, University College London, London, UK

3
Malenfant-Thuot O, Ryczko K, Tamblyn I, Côté M. Efficient determination of Born-effective charges, LO-TO splitting, and Raman tensors of solids with a real-space atom-centered deep learning approach. J Phys Condens Matter 2024; 36:425901. [PMID: 39019077 DOI: 10.1088/1361-648x/ad64a2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/11/2024] [Accepted: 07/17/2024] [Indexed: 07/19/2024]
Abstract
We introduce a deep neural network (DNN) framework called the Real-space Atomic Decomposition NETwork (radnet), which is capable of making accurate predictions of polarization and of electronic dielectric permittivity tensors in solids and aims to address limitations of previously available machine learning models for Raman predictions in periodic systems. This framework builds on previous, atom-centered approaches while utilizing deep convolutional neural networks. We report excellent accuracies on direct predictions for two prototypical examples: GaAs and BN. We then use automatic differentiation to efficiently calculate the Born-effective charges, longitudinal optical-transverse optical (LO-TO) splitting frequencies, and Raman tensors of these materials. We compute the Raman spectra, and find agreement with ab initio results. Lastly, we explore ways to generalize the predictions of polarization while taking into account periodic boundary conditions and symmetries.
Affiliation(s)
- Olivier Malenfant-Thuot
  - Département de physique et Institut Courtois, Université de Montréal, Montréal, Québec, Canada
- Kevin Ryczko
  - Department of Physics, University of Ottawa, Ottawa, Ontario, Canada
  - Vector Institute for Artificial Intelligence, Toronto, Ontario, Canada
  - SandboxAQ, Palo Alto, CA, United States of America
- Isaac Tamblyn
  - Department of Physics, University of Ottawa, Ottawa, Ontario, Canada
  - Vector Institute for Artificial Intelligence, Toronto, Ontario, Canada
- Michel Côté
  - Département de physique et Institut Courtois, Université de Montréal, Montréal, Québec, Canada

4
Domenichini G. Extending the definition of atomic basis sets to atoms with fractional nuclear charge. J Chem Phys 2024; 160:124107. [PMID: 38526100 DOI: 10.1063/5.0196383] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/06/2024] [Accepted: 03/10/2024] [Indexed: 03/26/2024] Open
Abstract
Alchemical transformations have shown that perturbation theory can also be applied to changes in the atomic nuclear charges of a molecule. The alchemical path that connects two different chemical species involves the conceptualization of a non-physical system in which an atom possesses a non-integer nuclear charge. A correct quantum mechanical treatment of these systems is limited by the fact that finite-size atomic basis sets do not define exponents and contraction coefficients for fractional-charge atoms. This paper proposes a solution to this problem and shows that a smooth interpolation of the atomic orbital coefficients and exponents across the periodic table is a convenient way to produce accurate alchemical predictions, even with small basis sets.
Affiliation(s)
- Giorgio Domenichini
  - Faculty of Physics, University of Vienna, Kolingasse 14-16, 1090 Vienna, Austria

5
Villard J, Bircher MP, Rothlisberger U. Structure and dynamics of liquid water from ab initio simulations: adding Minnesota density functionals to Jacob's ladder. Chem Sci 2024; 15:4434-4451. [PMID: 38516095 PMCID: PMC10952088 DOI: 10.1039/d3sc05828j] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Accepted: 02/12/2024] [Indexed: 03/23/2024] Open
Abstract
The accurate representation of the structural and dynamical properties of water is essential for simulating the unique behavior of this ubiquitous solvent. Here we assess the current status of describing liquid water using ab initio molecular dynamics, with a special focus on the performance of all the later generation Minnesota functionals. Findings are contextualized within the current knowledge on DFT for describing bulk water under ambient conditions and compared to experimental data. We find that, contrary to the prevalent idea that local and semilocal functionals overstructure water and underestimate dynamical properties, M06-L, revM06-L, and M11-L understructure water, while MN12-L and MN15-L overdistance water molecules due to weak cohesive effects. This can be attributed to a weakening of the hydrogen bond network, which leads to dynamical fingerprints that are over fast. While most of the hybrid Minnesota functionals (M06, M08-HX, M08-SO, M11, MN12-SX, and MN15) also yield understructured water, their dynamical properties generally improve over their semilocal counterparts. It emerges that exact exchange is a crucial component for accurately describing hydrogen bonds, which ultimately leads to corrections in both the dynamical and structural properties. However, an excessive amount of exact exchange strengthens hydrogen bonds and causes overstructuring and slow dynamics (M06-HF). As a compromise, M06-2X is the best performing Minnesota functional for water, and its D3 corrected variant shows very good structural agreement. From previous studies considering nuclear quantum effects (NQEs), the hybrid revPBE0-D3, and the rung-5 RPA (RPA@PBE) have been identified as the only two approximations that closely agree with experiments. Our results suggest that the M06-2X(-D3) functionals have the potential to further improve the reproduction of experimental properties when incorporating NQEs through path integral approaches. 
This work provides further proof that accurate modeling of water interactions requires the inclusion of both exact exchange and balanced (non-local) correlation, highlighting the need for higher rungs on Jacob's ladder to achieve predictive simulations of complex biological systems in aqueous environments.
Affiliation(s)
- Justin Villard
  - Laboratory of Computational Chemistry and Biochemistry, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland
- Martin P Bircher
  - Computational and Soft Matter Physics, Universität Wien, A-1090 Wien, Austria
- Ursula Rothlisberger
  - Laboratory of Computational Chemistry and Biochemistry, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland

6
Sharma V, Singh A, Chauhan S, Sharma PK, Chaudhary S, Sharma A, Porwal O, Fuloria NK. Role of Artificial Intelligence in Drug Discovery and Target Identification in Cancer. Curr Drug Deliv 2024; 21:870-886. [PMID: 37670704 DOI: 10.2174/1567201821666230905090621] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2022] [Revised: 03/08/2023] [Accepted: 03/24/2023] [Indexed: 09/07/2023]
Abstract
Drug discovery and development (DDD) is a highly complex process that necessitates precise monitoring and extensive data analysis at each stage. Furthermore, the DDD process is both time-consuming and costly. To tackle these concerns, artificial intelligence (AI) technology can be used, which facilitates rapid and precise analysis of extensive datasets within a limited timeframe. The pathophysiology of cancer is complicated and requires extensive research for novel drug discovery and development. The first stage in the process of drug discovery and development involves identifying targets. Cell structure and molecular functioning are complex due to the vast number of molecules that function constantly, performing various roles. Furthermore, scientists are continually discovering novel cellular mechanisms and molecules, expanding the range of potential targets. Accurately identifying the correct target is a crucial step in the preparation of a treatment strategy. Various forms of AI, such as machine learning, neural-based learning, deep learning, and network-based learning, are currently being utilised in applications, online services, and databases. These technologies facilitate the identification and validation of targets, ultimately contributing to the success of projects. This review focuses on the different types and subcategories of AI databases utilised in the field of drug discovery and target identification for cancer.
Affiliation(s)
- Vishal Sharma
  - Department of Pharmacy, Galgotias University, Greater Noida, Uttar Pradesh, 201310, India
- Amit Singh
  - Department of Pharmacy, Galgotias University, Greater Noida, Uttar Pradesh, 201310, India
- Sanjana Chauhan
  - Department of Pharmacy, Galgotias University, Greater Noida, Uttar Pradesh, 201310, India
- Pramod Kumar Sharma
  - Department of Pharmacy, Galgotias University, Greater Noida, Uttar Pradesh, 201310, India
- Shubham Chaudhary
  - Department of Pharmacy, Galgotias University, Greater Noida, Uttar Pradesh, 201310, India
- Astha Sharma
  - Department of Pharmacy, Galgotias University, Greater Noida, Uttar Pradesh, 201310, India
- Omji Porwal
  - Department of Pharmacognosy, Faculty of Pharmacy, Tishk International University, Erbil 44001, Iraq

7
Domenichini G, Dellago C. Molecular Hessian matrices from a machine learning random forest regression algorithm. J Chem Phys 2023; 159:194111. [PMID: 37982481 DOI: 10.1063/5.0169384] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Accepted: 10/27/2023] [Indexed: 11/21/2023] Open
Abstract
In this article, we present a machine learning model to obtain fast and accurate estimates of the molecular Hessian matrix. In this model, based on a random forest, the second derivatives of the energy with respect to redundant internal coordinates are learned individually. The internal coordinates together with their specific representation guarantee rotational and translational invariance. The model is trained on a subset of the QM7 dataset but is shown to be applicable to larger molecules picked from the QM9 dataset. From the predicted Hessian, it is also possible to obtain reasonable estimates of the vibrational frequencies, normal modes, and zero point energies of the molecules.
Affiliation(s)
- Giorgio Domenichini
  - Faculty of Physics, University of Vienna, Kolingasse 14-16, 1090 Vienna, Austria
- Christoph Dellago
  - Faculty of Physics, University of Vienna, Kolingasse 14-16, 1090 Vienna, Austria

8
Zhang Y, Jiang B. Universal machine learning for the response of atomistic systems to external fields. Nat Commun 2023; 14:6424. [PMID: 37827998 PMCID: PMC10570356 DOI: 10.1038/s41467-023-42148-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 04/16/2023] [Accepted: 10/01/2023] [Indexed: 10/14/2023] Open
Abstract
Machine learned interatomic interaction potentials have enabled efficient and accurate molecular simulations of closed systems. However, external fields, which can greatly change the chemical structure and/or reactivity, have been seldom included in current machine learning models. This work proposes a universal field-induced recursively embedded atom neural network (FIREANN) model, which integrates a pseudo field vector-dependent feature into atomic descriptors to represent system-field interactions with rigorous rotational equivariance. This "all-in-one" approach correlates various response properties like dipole moment and polarizability with the field-dependent potential energy in a single model, very suitable for spectroscopic and dynamics simulations in molecular and periodic systems in the presence of electric fields. Especially for periodic systems, we find that FIREANN can overcome the intrinsic multiple-value issue of the polarization by training atomic forces only. These results validate the universality and capability of the FIREANN method for efficient first-principles modeling of complicated systems in strong external fields.
Affiliation(s)
- Yaolong Zhang
  - Key Laboratory of Precision and Intelligent Chemistry, Department of Chemical Physics, Key Laboratory of Surface and Interface Chemistry and Energy Catalysis of Anhui Higher Education Institutes, University of Science and Technology of China, Hefei, Anhui, 230026, China
  - École Polytechnique Fédérale de Lausanne, 1015, Lausanne, Switzerland
- Bin Jiang
  - Key Laboratory of Precision and Intelligent Chemistry, Department of Chemical Physics, Key Laboratory of Surface and Interface Chemistry and Energy Catalysis of Anhui Higher Education Institutes, University of Science and Technology of China, Hefei, Anhui, 230026, China
  - Hefei National Laboratory, University of Science and Technology of China, Hefei, 230088, China

9
Litman Y, Lan J, Nagata Y, Wilkins DM. Fully First-Principles Surface Spectroscopy with Machine Learning. J Phys Chem Lett 2023; 14:8175-8182. [PMID: 37671886 PMCID: PMC10510433 DOI: 10.1021/acs.jpclett.3c01989] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Accepted: 08/29/2023] [Indexed: 09/07/2023]
Abstract
Our current understanding of the structure and dynamics of aqueous interfaces at the molecular level has grown substantially due to the continuous development of surface-specific spectroscopies, such as vibrational sum-frequency generation (VSFG). As in other vibrational spectroscopies, we must turn to atomistic simulations to extract all of the information encoded in the VSFG spectra. The high computational cost associated with existing methods means that they have limitations in representing systems with complex electronic structure or in achieving statistical convergence. In this work, we combine high-dimensional neural network interatomic potentials and symmetry-adapted Gaussian process regression to overcome these constraints. We show that it is possible to model VSFG signals with fully ab initio accuracy using machine learning and illustrate the versatility of our approach on the water/air interface. Our strategy allows us to identify the main sources of theoretical inaccuracy and establish a clear pathway toward the modeling of surface-sensitive spectroscopy of complex interfaces.
Affiliation(s)
- Yair Litman
  - Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, U.K.
  - Max Planck Institute for Polymer Research, Ackermannweg 10, 55128 Mainz, Germany
- Jinggang Lan
  - Department of Chemistry, New York University, New York, New York 10003, United States
  - Simons Center for Computational Physical Chemistry at New York University, New York, New York 10003, United States
- Yuki Nagata
  - Max Planck Institute for Polymer Research, Ackermannweg 10, 55128 Mainz, Germany
- David M. Wilkins
  - Centre for Quantum Materials and Technologies, School of Mathematics and Physics, Queen’s University Belfast, Belfast BT7 1NN, Northern Ireland, United Kingdom

10
Williams AH, Zhan CG. Staying Ahead of the Game: How SARS-CoV-2 has Accelerated the Application of Machine Learning in Pandemic Management. BioDrugs 2023; 37:649-674. [PMID: 37464099 DOI: 10.1007/s40259-023-00611-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/28/2023] [Indexed: 07/20/2023]
Abstract
In recent years, machine learning (ML) techniques have garnered considerable interest for their potential use in accelerating the rate of drug discovery. With the emergence of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic, the utilization of ML has become even more crucial in the search for effective antiviral medications. The pandemic has presented the scientific community with a unique challenge, and the rapid identification of potential treatments has become an urgent priority. Researchers have been able to accelerate the process of identifying drug candidates, repurposing existing drugs, and designing new compounds with desirable properties using machine learning in drug discovery. To train predictive models, ML techniques in drug discovery rely on the analysis of large datasets, including both experimental and clinical data. These models can be used to predict the biological activities, potential side effects, and interactions with specific target proteins of drug candidates. This strategy has proven to be an effective method for identifying potential coronavirus disease 2019 (COVID-19) and other disease treatments. This paper offers a thorough analysis of the various ML techniques implemented to combat COVID-19, including supervised and unsupervised learning, deep learning, and natural language processing. The paper discusses the impact of these techniques on pandemic drug development, including the identification of potential treatments, the understanding of the disease mechanism, and the creation of effective and safe therapeutics. The lessons learned can be applied to future outbreaks and drug discovery initiatives.
Affiliation(s)
- Alexander H Williams
  - Molecular Modeling and Biopharmaceutical Center, University of Kentucky, 789 South Limestone Street, Lexington, KY, 40536, USA
  - Department of Pharmaceutical Sciences, College of Pharmacy, University of Kentucky, 789 South Limestone Street, Lexington, KY, 40536, USA
  - GSK Upper Providence, 1250 S. Collegeville Road, Collegeville, PA, 19426, USA
- Chang-Guo Zhan
  - Molecular Modeling and Biopharmaceutical Center, University of Kentucky, 789 South Limestone Street, Lexington, KY, 40536, USA
  - Department of Pharmaceutical Sciences, College of Pharmacy, University of Kentucky, 789 South Limestone Street, Lexington, KY, 40536, USA

11
Snyder R, Kim B, Pan X, Shao Y, Pu J. Bridging semiempirical and ab initio QM/MM potentials by Gaussian process regression and its sparse variants for free energy simulation. J Chem Phys 2023; 159:054107. [PMID: 37530109 PMCID: PMC10400118 DOI: 10.1063/5.0156327] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 04/28/2023] [Accepted: 07/10/2023] [Indexed: 08/03/2023] Open
Abstract
Free energy simulations that employ combined quantum mechanical and molecular mechanical (QM/MM) potentials at ab initio QM (AI) levels are computationally highly demanding. Here, we present a machine-learning-facilitated approach for obtaining AI/MM-quality free energy profiles at the cost of efficient semiempirical QM/MM (SE/MM) methods. Specifically, we use Gaussian process regression (GPR) to learn the potential energy corrections needed for an SE/MM level to match an AI/MM target along the minimum free energy path (MFEP). Force modification using gradients of the GPR potential allows us to improve configurational sampling and update the MFEP. To adaptively train our model, we further employ the sparse variational GP (SVGP) and streaming sparse GPR (SSGPR) methods, which efficiently incorporate previous sample information without significantly increasing the training data size. We applied the QM-(SS)GPR/MM method to the solution-phase SN2 Menshutkin reaction, NH₃ + CH₃Cl → CH₃NH₃⁺ + Cl⁻, using AM1/MM and B3LYP/6-31+G(d,p)/MM as the base and target levels, respectively. For 4000 configurations sampled along the MFEP, the iteratively optimized AM1-SSGPR-4/MM model reduces the energy error in AM1/MM from 18.2 to 4.4 kcal/mol. Although not explicitly fitting forces, our method also reduces the key internal force errors from 25.5 to 11.1 kcal/mol/Å and from 30.2 to 10.3 kcal/mol/Å for the N-C and C-Cl bonds, respectively. Compared to the uncorrected simulations, the AM1-SSGPR-4/MM method lowers the predicted free energy barrier from 28.7 to 11.7 kcal/mol and decreases the reaction free energy from -12.4 to -41.9 kcal/mol, bringing these results into closer agreement with their AI/MM and experimental benchmarks.
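The energy-correction idea in this abstract (learn the difference between a cheap base level and an expensive target level, then add it back to the base level) can be illustrated on a toy problem. The sketch below is only an illustration under stated assumptions: the one-dimensional coordinate, the two energy functions `e_low` and `e_high`, the RBF kernel settings, and all function names are invented here and do not come from the paper. It uses a plain dense GPR written with NumPy; the sparse SVGP/SSGPR variants and the force modification mentioned in the abstract are not shown.

```python
import numpy as np


def rbf_kernel(xa, xb, length_scale=0.5):
    """Squared-exponential kernel between two 1-D arrays of points."""
    diff = xa[:, None] - xb[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)


def fit_gpr(x_train, y_train, noise=1e-6):
    """Return the GPR posterior-mean predictor (zero prior mean)."""
    gram = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    weights = np.linalg.solve(gram, y_train)
    return lambda x_new: rbf_kernel(x_new, x_train) @ weights


def e_low(x):
    """Hypothetical cheap (base-level) energy along a toy coordinate."""
    return x**4 - 2.0 * x**2


def e_high(x):
    """Hypothetical expensive (target-level) energy along the same coordinate."""
    return e_low(x) + 0.3 * np.sin(2.0 * x) + 0.5 * x


# Train on the energy difference only, not on the total energy.
x_train = np.linspace(-1.5, 1.5, 12)
correction = fit_gpr(x_train, e_high(x_train) - e_low(x_train))

# Apply the learned correction on top of the cheap energy at unseen points.
x_test = np.linspace(-1.4, 1.4, 50)
e_corrected = e_low(x_test) + correction(x_test)

err_before = np.mean(np.abs(e_low(x_test) - e_high(x_test)))
err_after = np.mean(np.abs(e_corrected - e_high(x_test)))
print(f"mean abs error before: {err_before:.3f}, after: {err_after:.5f}")
```

Learning the difference rather than the target energy itself is the relevant design choice in this toy: the difference is assumed to be smoother than either energy, so a small training set is enough for the corrected curve to track the target.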
Affiliation(s)
- Ryan Snyder
  - Department of Chemistry and Chemical Biology, Indiana University-Purdue University Indianapolis, 402 N Blackford St., Indianapolis, Indiana 46202, USA
- Bryant Kim
  - Department of Chemistry and Chemical Biology, Indiana University-Purdue University Indianapolis, 402 N Blackford St., Indianapolis, Indiana 46202, USA
- Xiaoliang Pan
  - Department of Chemistry and Biochemistry, University of Oklahoma, 101 Stephenson Pkwy, Norman, Oklahoma 73019, USA
- Yihan Shao
  - Department of Chemistry and Biochemistry, University of Oklahoma, 101 Stephenson Pkwy, Norman, Oklahoma 73019, USA
- Jingzhi Pu
  - Department of Chemistry and Chemical Biology, Indiana University-Purdue University Indianapolis, 402 N Blackford St., Indianapolis, Indiana 46202, USA

12
Sahre MJ, von Rudorff GF, von Lilienfeld OA. Quantum Alchemy Based Bonding Trends and Their Link to Hammett's Equation and Pauling's Electronegativity Model. J Am Chem Soc 2023; 145:5899-5908. [PMID: 36862462 DOI: 10.1021/jacs.2c13393] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 03/03/2023]
Abstract
We present an intuitive and general analytical approximation estimating the energy of covalent single and double bonds between participating atoms in terms of their respective nuclear charges with just three parameters, $E_{AB} \approx a - b Z_A Z_B + c\,(Z_A^{7/3} + Z_B^{7/3})$. The functional form of our expression models an alchemical atomic energy decomposition between participating atoms A and B. After calibration, reasonably accurate bond dissociation energy estimates are obtained for hydrogen-saturated diatomics composed of p-block elements coming from the same row 2 ≤ n ≤ 4 in the periodic table. Corresponding changes in bond dissociation energies due to substitution of atom B by C can be obtained via simple formulas. While being of different functional form and origin, our model is as simple and accurate as Pauling's well-known electronegativity model. Analysis indicates that the model's response in covalent bonding to variation in nuclear charge is near-linear, which is consistent with Hammett's equation.
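The three-parameter expression quoted in this abstract is simple enough to evaluate directly. The following sketch assumes the form E_AB ≈ a − b·Z_A·Z_B + c·(Z_A^(7/3) + Z_B^(7/3)) exactly as stated; the parameter values `A`, `B`, `C` are arbitrary placeholders chosen for illustration (the abstract gives no calibrated values or units), and the function names are hypothetical. The substitution formula is obtained here by subtracting the expression for E_AB from that for E_AC, so the constant a cancels; whether it matches the "simple formulas" of the paper is an assumption.

```python
def bond_energy(z_a, z_b, a, b, c):
    """Evaluate E_AB ~ a - b*Z_A*Z_B + c*(Z_A^(7/3) + Z_B^(7/3))."""
    return a - b * z_a * z_b + c * (z_a ** (7 / 3) + z_b ** (7 / 3))


def substitution_shift(z_a, z_b, z_c, b, c):
    """E_AC - E_AB from the same expression; the constant a cancels."""
    return -b * z_a * (z_c - z_b) + c * (z_c ** (7 / 3) - z_b ** (7 / 3))


# Placeholder parameters (NOT calibrated values from the paper).
A, B, C = 1.0, 0.01, 0.001

# Nuclear charges: carbon = 6, nitrogen = 7, oxygen = 8.
e_cn = bond_energy(6, 7, A, B, C)
e_co = bond_energy(6, 8, A, B, C)
shift = substitution_shift(6, 7, 8, B, C)

print(f"E_CN = {e_cn:.6f}, E_CO = {e_co:.6f}, E_CO - E_CN = {shift:.6f}")
```

Because the expression is symmetric in A and B, `bond_energy(6, 7, ...)` and `bond_energy(7, 6, ...)` agree up to floating-point rounding, and the shift depends only on b and c.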
Affiliation(s)
- Michael J Sahre
  - Faculty of Physics, University of Vienna, Vienna, 1090, Austria
  - Vienna Doctoral School in Chemistry (DoSChem), University of Vienna, Vienna, 1090, Austria
- O Anatole von Lilienfeld
  - Vector Institute for Artificial Intelligence, Toronto, M5S 1M1, Canada
  - Departments of Chemistry, Materials Science and Engineering, and Physics, University of Toronto, St. George Campus, Toronto, M5R 0A3, Canada
  - Machine Learning Group, Technische Universität Berlin and Institute for the Foundations of Learning and Data, Berlin, 10587, Germany

13
Arab F, Nazari F, Illas F. Artificial Neural Network-Derived Unified Six-Dimensional Potential Energy Surface for Tetra Atomic Isomers of the Biogenic [H, C, N, O] System. J Chem Theory Comput 2023; 19:1186-1196. [PMID: 36735891 PMCID: PMC9979606 DOI: 10.1021/acs.jctc.2c00915] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/05/2023]
Abstract
Recognition of different structural patterns in different potential energy surface regions, such as in isomerizing quasilinear tetra atomic molecules, is important for understanding the details of underlying physics and chemistry. In this respect, using three variants of artificial neural networks (ANNs), we investigated the six-dimensional (6-D) singlet potential energy surfaces (PES) of tetra atomic isomers of the biogenic [H, C, N, O] system. At first, we constructed a separate ANN potential for each of the studied isomers. In the next step, a comparative assessment of the separate ANN models led to the setting up of a unified 6-D singlet PES equally and accurately describing all studied isomers. The constructed unified model yields relative energies comparable to those obtained either from the gold standard CCSD(T) method or from separate ANNs for each of the studied isomers. The accuracy of the unified singlet PES is on the order of 10⁻⁴ Hartree (0.1 kcal/mol). The developed PES in this work captures the main features of nonlinear and quasilinear tetra atomic isomers of this biogenic system.
Affiliation(s)
- Fatemeh Arab
  - Department of Chemistry, Institute for Advanced Studies in Basic Sciences, Zanjan 45137-66731, Iran
- Fariba Nazari
  - Department of Chemistry, Institute for Advanced Studies in Basic Sciences, Zanjan 45137-66731, Iran
  - Center of Climate Change and Global Warming, Institute for Advanced Studies in Basic Sciences, Zanjan 45137-66731, Iran
- Francesc Illas
  - Departament de Ciència de Materials i Química Física & Institut de Química Teòrica i Computacional (IQTCUB), Universitat de Barcelona, C/Martí i Franquès 1, 08028 Barcelona, Spain

14
Thie A, Menger MF, Faraji S. HOAX: a hyperparameter optimisation algorithm explorer for neural networks. Mol Phys 2023. [DOI: 10.1080/00268976.2023.2172732] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/05/2023]
Affiliation(s)
- Albert Thie
  - Zernike Institute for Advanced Materials, Faculty of Science and Engineering, University of Groningen, Groningen, The Netherlands
- Maximilian F.S.J. Menger
  - Zernike Institute for Advanced Materials, Faculty of Science and Engineering, University of Groningen, Groningen, The Netherlands
- Shirin Faraji
  - Zernike Institute for Advanced Materials, Faculty of Science and Engineering, University of Groningen, Groningen, The Netherlands

15
Heinen S, von Rudorff GF, von Lilienfeld OA. Transition state search and geometry relaxation throughout chemical compound space with quantum machine learning. J Chem Phys 2022; 157:221102. [PMID: 36546806 DOI: 10.1063/5.0112856] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 12/24/2022] Open
Abstract
We use energies and forces predicted within response operator based quantum machine learning (OQML) to perform geometry optimization and transition state search calculations with legacy optimizers but without the need for subsequent re-optimization with quantum chemistry methods. For randomly sampled initial coordinates of small organic query molecules, we report systematic improvement of equilibrium and transition state geometry output as training set sizes increase. Out-of-sample SN2 reactant complexes and transition state geometries have been predicted using the LBFGS and the QST2 algorithms with a root-mean-square deviation (RMSD) of 0.16 and 0.4 Å, respectively, after training on up to 200 reactant complex relaxations and transition state search trajectories from the QMrxn20 dataset. For geometry optimizations, we have also considered relaxation paths of up to 5595 constitutional isomers with sum formula C₇H₁₀O₂ from the QM9 database. Using the resulting OQML models with an LBFGS optimizer reproduces the minimum geometry with an RMSD of 0.14 Å, only using ∼6000 training points obtained from normal mode sampling along the optimization paths of the training compounds without the need for active learning. For converged equilibrium and transition state geometries, subsequent vibrational normal mode frequency analysis indicates deviation from MP2 reference results by on average 14 and 26 cm⁻¹, respectively. While the numerical cost for OQML predictions is negligible in comparison to density functional theory or MP2, the number of steps until convergence is typically larger in either case. The success rate for reaching convergence, however, improves systematically with training set size, underscoring OQML's potential for universal applicability.
Affiliation(s)
- Stefan Heinen
- University of Vienna, Faculty of Physics, Kolingasse 14-16, AT-1090 Wien, Austria
|
16
|
Browning NJ, Faber FA, Anatole von Lilienfeld O. GPU-accelerated approximate kernel method for quantum machine learning. J Chem Phys 2022; 157:214801. [PMID: 36511559 DOI: 10.1063/5.0108967] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 11/15/2022] Open
Abstract
We introduce Quantum Machine Learning (QML)-Lightning, a PyTorch package containing graphics processing unit (GPU)-accelerated approximate kernel models, which can yield trained models within seconds. QML-Lightning includes a cost-efficient GPU implementation of FCHL19, which together can provide energy and force predictions with competitive accuracy on a microsecond per atom timescale. Using modern GPU hardware, we report learning curves of energies and forces as well as timings as numerical evidence for select legacy benchmarks from atomistic simulation including QM9, MD-17, and 3BPA.
Affiliation(s)
- Nicholas J Browning
- Institute of Physical Chemistry and National Center for Computational Design and Discovery of Novel Materials, Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland
| | - Felix A Faber
- Department of Physics, University of Cambridge, Cambridge, United Kingdom
|
17
|
Mazouin B, Schöpfer AA, von Lilienfeld OA. Selected machine learning of HOMO-LUMO gaps with improved data-efficiency. Mater Adv 2022; 3:8306-8316. [PMID: 36561279 PMCID: PMC9662596 DOI: 10.1039/d2ma00742h] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 06/27/2022] [Accepted: 09/12/2022] [Indexed: 06/17/2023]
Abstract
Despite their relevance for organic electronics, quantum machine learning (QML) models of molecular electronic properties, such as HOMO-LUMO gaps, often struggle to achieve satisfying data-efficiency as measured by decreasing prediction errors for increasing training set sizes. We demonstrate that partitioning training sets into different chemical classes prior to training results in independently trained QML models with overall reduced training data needs. For organic molecules drawn from the previously published QM7 and QM9 data sets, we have identified and exploited three relevant classes corresponding to compounds containing either aromatic rings and carbonyl groups, or single unsaturated bonds, or saturated bonds. The selected QML models of band-gaps (considered at GW and hybrid DFT levels of theory) reach mean absolute prediction errors of ∼0.1 eV for up to an order of magnitude fewer training molecules than for QML models trained on randomly selected molecules. Comparison to Δ-QML models of band-gaps indicates that selected QML models exhibit superior data-efficiency. Our findings suggest that selected QML, e.g. based on simple classifications prior to training, could help to successfully tackle challenging quantum property screening tasks of large libraries with high fidelity and low computational burden.
Affiliation(s)
- Bernard Mazouin
- University of Vienna, Faculty of Physics and Vienna Doctoral School in Physics Kolingasse 14-16 1090 Vienna Austria
| | | | - O Anatole von Lilienfeld
- Departments of Chemistry, Materials Science and Engineering, and Physics, University of Toronto St. George Campus Toronto ON Canada
- Vector Institute for Artificial Intelligence Toronto ON M5S 1M1 Canada
- Machine Learning Group, Technische Universität Berlin and Institute for the Foundations of Learning and Data 10587 Berlin Germany
|
18
|
Schmitz N, Müller KR, Chmiela S. Algorithmic Differentiation for Automated Modeling of Machine Learned Force Fields. J Phys Chem Lett 2022; 13:10183-10189. [PMID: 36279418 PMCID: PMC9639201 DOI: 10.1021/acs.jpclett.2c02632] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2022] [Accepted: 10/20/2022] [Indexed: 05/09/2023]
Abstract
Reconstructing force fields (FFs) from atomistic simulation data is a challenge since accurate data can be highly expensive. Here, machine learning (ML) models can help to be data economic as they can be successfully constrained using the underlying symmetry and conservation laws of physics. However, so far, every descriptor newly proposed for an ML model has required a cumbersome and mathematically tedious remodeling. We therefore propose using modern techniques from algorithmic differentiation within the ML modeling process, effectively enabling the usage of novel descriptors or models fully automatically at an order of magnitude higher computational efficiency. This paradigmatic approach enables not only a versatile usage of novel representations and the efficient computation of larger systems (all of high value to the FF community) but also the simple inclusion of further physical knowledge, such as higher-order information (e.g., Hessians, more complex partial differential equations constraints etc.), even beyond the presented FF domain.
Affiliation(s)
| | - Klaus-Robert Müller
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- BIFOLD - Berlin Institute for the Foundations of Learning and Data, 10587 Berlin, Germany
- Department of Artificial Intelligence, Korea University, Seongbuk-gu, Seoul 02841, Korea
- Max Planck Institute for Informatics, Stuhlsatzenhausweg, 66123 Saarbrücken, Germany
- Google Research, Brain Team, 10117 Berlin, Germany
| | - Stefan Chmiela
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- BIFOLD - Berlin Institute for the Foundations of Learning and Data, 10587 Berlin, Germany
|
19
|
Snyder R, Kim B, Pan X, Shao Y, Pu J. Facilitating ab initio QM/MM free energy simulations by Gaussian process regression with derivative observations. Phys Chem Chem Phys 2022; 24:25134-25143. [PMID: 36222412 PMCID: PMC11095978 DOI: 10.1039/d2cp02820d] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 06/16/2023]
Abstract
In combined quantum mechanical and molecular mechanical (QM/MM) free energy simulations, how to synthesize the accuracy of ab initio (AI) methods with the speed of semiempirical (SE) methods for a cost-effective QM treatment remains a long-standing challenge. In this work, we present a machine-learning-facilitated method for obtaining AI/MM-quality free energy profiles through efficient SE/MM simulations. In particular, we use Gaussian process regression (GPR) to learn the energy and force corrections needed for SE/MM to match with AI/MM results during molecular dynamics simulations. Force matching is enabled in our model by including energy derivatives into the observational targets through the extended-kernel formalism. We demonstrate the effectiveness of this method on the solution-phase SN2 Menshutkin reaction using AM1/MM and B3LYP/6-31+G(d,p)/MM as the base and target levels, respectively. Trained on only 80 configurations sampled along the minimum free energy path (MFEP), the resulting GPR model reduces the average energy error in AM1/MM from 18.2 to 5.8 kcal mol⁻¹ for the 4000-sample testing set, with the average force error on the QM atoms decreased from 14.6 to 3.7 kcal mol⁻¹ Å⁻¹. Free energy sampling with the GPR corrections applied (AM1-GPR/MM) produces a free energy barrier of 14.4 kcal mol⁻¹ and a reaction free energy of -34.1 kcal mol⁻¹, in closer agreement with the AI/MM benchmarks and experimental results.
Affiliation(s)
- Ryan Snyder
- Department of Chemistry and Chemical Biology, Indiana University-Purdue University Indianapolis, 402 N. Blackford St., Indianapolis, IN 46202, USA.
| | - Bryant Kim
- Department of Chemistry and Chemical Biology, Indiana University-Purdue University Indianapolis, 402 N. Blackford St., Indianapolis, IN 46202, USA.
| | - Xiaoliang Pan
- Department of Chemistry and Biochemistry, University of Oklahoma, 101 Stephenson Pkwy, Norman, OK 73019, USA.
| | - Yihan Shao
- Department of Chemistry and Biochemistry, University of Oklahoma, 101 Stephenson Pkwy, Norman, OK 73019, USA.
| | - Jingzhi Pu
- Department of Chemistry and Chemical Biology, Indiana University-Purdue University Indianapolis, 402 N. Blackford St., Indianapolis, IN 46202, USA.
|
20
|
Kuntz D, Wilson AK. Machine learning, artificial intelligence, and chemistry: how smart algorithms are reshaping simulation and the laboratory. Pure Appl Chem 2022. [DOI: 10.1515/pac-2022-0202] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Machine learning and artificial intelligence are increasingly gaining in prominence through image analysis, language processing, and automation, to name a few applications. Machine learning is also making profound changes in chemistry. From revisiting decades-old analytical techniques for the purpose of creating better calibration curves, to assisting and accelerating traditional in silico simulations, to automating entire scientific workflows, to being used as an approach to deduce underlying physics of unexplained chemical phenomena, machine learning and artificial intelligence are reshaping chemistry, accelerating scientific discovery, and yielding new insights. This review provides an overview of machine learning and artificial intelligence from a chemist’s perspective and focuses on a number of examples of the use of these approaches in computational chemistry and in the laboratory.
Affiliation(s)
- David Kuntz
- Department of Chemistry, University of North Texas, Denton, TX 76201, USA
| | - Angela K. Wilson
- Department of Chemistry, Michigan State University, East Lansing, MI 48824, USA
|
21
|
Beckmann R, Brieuc F, Schran C, Marx D. Infrared Spectra at Coupled Cluster Accuracy from Neural Network Representations. J Chem Theory Comput 2022; 18:5492-5501. [PMID: 35998360 DOI: 10.1021/acs.jctc.2c00511] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Indexed: 12/22/2022]
Abstract
Infrared spectroscopy is key to elucidating molecular structures, monitoring reactions, and observing conformational changes, while providing information on both structural and dynamical properties. This makes the accurate prediction of infrared spectra based on first-principle theories a highly desirable pursuit. Molecular dynamics simulations have proven to be a particularly powerful approach for this task, albeit requiring the computation of energies, forces and dipole moments for a large number of molecular configurations as a function of time. This explains why highly accurate first-principles methods, such as coupled cluster theory, have so far been inapplicable for the prediction of fully anharmonic vibrational spectra of large systems at finite temperatures. Here, we push cutting-edge machine learning techniques forward by using neural network representations of energies, forces, and in particular dipoles to predict such infrared spectra fully at "gold standard" coupled cluster accuracy as demonstrated for protonated water clusters as large as the protonated water hexamer, in its extended Zundel configuration. Furthermore, we show that this methodology can be used beyond the scope of the data considered during the development of the neural network models, allowing for the computation of finite-temperature infrared spectra of large systems inaccessible to explicit coupled cluster calculations. This substantially expands the hitherto existing limits of accuracy, speed, and system size for theoretical spectroscopy and opens up a multitude of avenues for the prediction of vibrational spectra and the understanding of complex intra- and intermolecular couplings.
Affiliation(s)
- Richard Beckmann
- Lehrstuhl für Theoretische Chemie, Ruhr-Universität Bochum, 44780 Bochum, Germany
| | - Fabien Brieuc
- Lehrstuhl für Theoretische Chemie, Ruhr-Universität Bochum, 44780 Bochum, Germany
| | - Christoph Schran
- Lehrstuhl für Theoretische Chemie, Ruhr-Universität Bochum, 44780 Bochum, Germany
| | - Dominik Marx
- Lehrstuhl für Theoretische Chemie, Ruhr-Universität Bochum, 44780 Bochum, Germany
|
22
|
Sun J, Cheng L, Miller TF. Molecular Dipole Moment Learning via Rotationally Equivariant Gaussian Process Regression with Derivatives in Molecular-orbital-based Machine Learning. J Chem Phys 2022; 157:104109. [DOI: 10.1063/5.0101280] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/14/2022] Open
Abstract
This study extends the accurate and transferable molecular-orbital-based machine learning (MOB-ML) approach to modeling the contribution of electron correlation to dipole moments at the cost of Hartree-Fock computations. A molecular-orbital-based (MOB) pairwise decomposition of the correlation part of the dipole moment is applied, and these pair dipole moments could be further regressed as a universal function of molecular orbitals (MOs). The dipole MOB features consist of the energy MOB features and their responses to electric fields. An interpretable and rotationally equivariant Gaussian process regression (GPR) with derivatives algorithm is introduced to learn the dipole moment more efficiently. The proposed problem setup, feature design, and ML algorithm are shown to provide highly-accurate models for both dipole moment and energies on water and fourteen small molecules. To demonstrate the ability of MOB-ML to function as generalized density-matrix functionals for molecular dipole moments and energies of organic molecules, we further apply the proposed MOB-ML approach to train and test the molecules from the QM9 dataset. The application of local scalable GPR with Gaussian mixture model unsupervised clustering (GMM/GPR) scales up MOB-ML to a large-data regime while retaining the prediction accuracy. In addition, compared with literature results, MOB-ML provides the best test MAEs of 4.21 mDebye and 0.045 kcal/mol for dipole moment and energy models, respectively, when training on 110000 QM9 molecules. The excellent transferability of the resulting QM9 models is also illustrated by the accurate predictions for four different series of peptides.
Affiliation(s)
- Jiace Sun
- Chemistry and Chemical Engineering, California Institute of Technology, United States of America
| | - Lixue Cheng
- Chemistry, California Institute of Technology, United States of America
| | - Thomas F Miller
- Division of Chemistry and Chemical Engineering, California Institute of Technology, United States of America
|
23
|
Qiao Z, Christensen AS, Welborn M, Manby FR, Anandkumar A, Miller TF. Informing geometric deep learning with electronic interactions to accelerate quantum chemistry. Proc Natl Acad Sci U S A 2022; 119:e2205221119. [PMID: 35901215 PMCID: PMC9351474 DOI: 10.1073/pnas.2205221119] [Citation(s) in RCA: 28] [Impact Index Per Article: 14.0] [Received: 04/01/2022] [Accepted: 06/06/2022] [Indexed: 01/30/2023] Open
Abstract
Predicting electronic energies, densities, and related chemical properties can facilitate the discovery of novel catalysts, medicines, and battery materials. However, existing machine learning techniques are challenged by the scarcity of training data when exploring unknown chemical spaces. We overcome this barrier by systematically incorporating knowledge of molecular electronic structure into deep learning. By developing a physics-inspired equivariant neural network, we introduce a method to learn molecular representations based on the electronic interactions among atomic orbitals. Our method, OrbNet-Equi, leverages efficient tight-binding simulations and learned mappings to recover high-fidelity physical quantities. OrbNet-Equi accurately models a wide spectrum of target properties while being several orders of magnitude faster than density functional theory. Despite only using training samples collected from readily available small-molecule libraries, OrbNet-Equi outperforms traditional semiempirical and machine learning-based methods on comprehensive downstream benchmarks that encompass diverse main-group chemical processes. Our method also describes interactions in challenging charge-transfer complexes and open-shell systems. We anticipate that the strategy presented here will help to expand opportunities for studies in chemistry and materials science, where the acquisition of experimental or reference training data is costly.
Affiliation(s)
- Zhuoran Qiao
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA 91125
| | | | | | | | - Anima Anandkumar
- Division of Engineering and Applied Science, California Institute of Technology, Pasadena, CA 91125
- Nvidia Corporation, Santa Clara, CA 95051
| | - Thomas F. Miller
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA 91125
- Entos, Inc., Los Angeles, CA 90027
|
24
|
Cheng L, Sun J, Miller TF. Accurate Molecular-Orbital-Based Machine Learning Energies via Unsupervised Clustering of Chemical Space. J Chem Theory Comput 2022; 18:4826-4835. [PMID: 35858242 DOI: 10.1021/acs.jctc.2c00396] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 11/29/2022]
Abstract
We introduce an unsupervised clustering algorithm to improve training efficiency and accuracy in predicting energies using molecular-orbital-based machine learning (MOB-ML). This work determines clusters via the Gaussian mixture model (GMM) in an entirely automatic manner and simplifies an earlier supervised clustering approach [J. Chem. Theory Comput. 2019, 15, 6668] by eliminating both the necessity for user-specified parameters and the training of an additional classifier. Unsupervised clustering results from GMM have the advantages of accurately reproducing chemically intuitive groupings of frontier molecular orbitals and exhibiting improved performance with an increasing number of training examples. The resulting clusters from supervised or unsupervised clustering are further combined with scalable Gaussian process regression (GPR) or linear regression (LR) to learn molecular energies accurately by generating a local regression model in each cluster. Among all four combinations of regressors and clustering methods, GMM combined with scalable exact GPR (GMM/GPR) is the most efficient training protocol for MOB-ML. The numerical tests of molecular energy learning on thermalized data sets of drug-like molecules demonstrate the improved accuracy, transferability, and learning efficiency of GMM/GPR over other training protocols for MOB-ML, i.e., supervised regression clustering combined with GPR (RC/GPR) and GPR without clustering. GMM/GPR also provides the best molecular energy predictions compared with ones from the literature on the same benchmark data sets. With a lower scaling, GMM/GPR has a 10.4-fold speedup in wall-clock training time compared with scalable exact GPR with a training size of 6500 QM7b-T molecules.
Affiliation(s)
- Lixue Cheng
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States
| | - Jiace Sun
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States
| | - Thomas F Miller
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States
|
25
|
Isert C, Atz K, Jiménez-Luna J, Schneider G. QMugs, quantum mechanical properties of drug-like molecules. Sci Data 2022; 9:273. [PMID: 35672335 PMCID: PMC9174255 DOI: 10.1038/s41597-022-01390-7] [Citation(s) in RCA: 39] [Impact Index Per Article: 19.5] [Received: 08/15/2021] [Accepted: 05/17/2022] [Indexed: 12/16/2022] Open
Abstract
Machine learning approaches in drug discovery, as well as in other areas of the chemical sciences, benefit from curated datasets of physical molecular properties. However, there currently is a lack of data collections featuring large bioactive molecules alongside first-principle quantum chemical information. The open-access QMugs (Quantum-Mechanical Properties of Drug-like Molecules) dataset fills this void. The QMugs collection comprises quantum mechanical properties of more than 665 k biologically and pharmacologically relevant molecules extracted from the ChEMBL database, totaling ~2 M conformers. QMugs contains optimized molecular geometries and thermodynamic data obtained via the semi-empirical method GFN2-xTB. Atomic and molecular properties are provided on both the GFN2-xTB and on the density-functional levels of theory (DFT, ωB97X-D/def2-SVP). QMugs features molecules of significantly larger size than previously-reported collections and comprises their respective quantum mechanical wave functions, including DFT density and orbital matrices. This dataset is intended to facilitate the development of models that learn from molecular data on different levels of theory while also providing insight into the corresponding relationships between molecular structure and biological activity.
Affiliation(s)
- Clemens Isert
- Department of Chemistry and Applied Biosciences, RETHINK, ETH Zurich, 8093, Zurich, Switzerland
| | - Kenneth Atz
- Department of Chemistry and Applied Biosciences, RETHINK, ETH Zurich, 8093, Zurich, Switzerland
| | - José Jiménez-Luna
- Department of Chemistry and Applied Biosciences, RETHINK, ETH Zurich, 8093, Zurich, Switzerland.
- Department of Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Straße 65, 88397, Biberach an der Riss, Germany.
| | - Gisbert Schneider
- Department of Chemistry and Applied Biosciences, RETHINK, ETH Zurich, 8093, Zurich, Switzerland.
- ETH Singapore SEC Ltd, 1 CREATE Way, #06-01 CREATE Tower, Singapore, 138602, Singapore.
|
26
|
Abstract
We propose to relax geometries throughout chemical compound space (CCS) using alchemical perturbation density functional theory (APDFT). APDFT refers to perturbation theory involving changes in nuclear charges within approximate solutions to Schrödinger's equation. We give an analytical formula to calculate the mixed second order energy derivatives with respect to both nuclear charges and nuclear positions (named "alchemical force"), within the restricted Hartree-Fock case. We have implemented and studied the formula for its use in geometry relaxation of various reference and target molecules. We have also analysed the convergence of the alchemical force perturbation series, as well as basis set effects. Interpolating alchemically predicted energies, forces, and Hessian to a Morse potential yields more accurate geometries and equilibrium energies than when performing a standard Newton-Raphson step. Our numerical predictions for small molecules including BF, CO, N2, CH4, NH3, H2O, and HF yield mean absolute errors of equilibrium energies and bond lengths smaller than 10 mHa and 0.01 Bohr for 4th order APDFT predictions, respectively. Our alchemical geometry relaxation still preserves the combinatorial efficiency of APDFT: Based on a single coupled perturbed Hartree-Fock derivative for benzene we provide numerical predictions of equilibrium energies and relaxed structures of all the 17 iso-electronic charge-neutral BN-doped mutants with averaged absolute deviations of ∼27 mHa and ∼0.12 Bohr, respectively.
|
27
|
Rosenberger D, Barros K, Germann TC, Lubbers N. Machine learning of consistent thermodynamic models using automatic differentiation. Phys Rev E 2022; 105:045301. [PMID: 35590626 DOI: 10.1103/physreve.105.045301] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/01/2021] [Accepted: 03/09/2022] [Indexed: 06/15/2023]
Abstract
We propose a data-driven method to describe consistent equations of state (EOS) for arbitrary systems. Complex EOS are traditionally obtained by fitting suitable analytical expressions to thermophysical data. A key aspect of EOS is that the relationships between state variables are given by derivatives of the system free energy. In this work, we model the free energy with an artificial neural network and utilize automatic differentiation to directly learn the derivatives of the free energy. We demonstrate this approach on two different systems, the analytic van der Waals EOS and published data for the Lennard-Jones fluid, and we show that it is advantageous over direct learning of thermodynamic properties (i.e., not as derivatives of the free energy but as independent properties), in terms of both accuracy and the exact preservation of the Maxwell relations. Furthermore, the method implicitly provides the free energy of a system without explicit integration.
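The idea that state variables follow from derivatives of the free energy can be illustrated with a small sketch. It is not the authors' neural-network model: it assumes the analytic van der Waals form of the molar Helmholtz free energy and uses a hand-written forward-mode automatic differentiation class (`Dual`, a hypothetical helper) to recover the pressure as P = -(dF/dV) at fixed T. The parameter values are illustrative only.

```python
import math

R = 8.314462618  # gas constant in J/(mol K)


class Dual:
    """Dual number val + der*eps with eps**2 = 0 (forward-mode autodiff)."""

    def __init__(self, val, der=0.0):
        self.val = val
        self.der = der

    @staticmethod
    def lift(x):
        return x if isinstance(x, Dual) else Dual(float(x))

    def __add__(self, other):
        o = Dual.lift(other)
        return Dual(self.val + o.val, self.der + o.der)

    __radd__ = __add__

    def __neg__(self):
        return Dual(-self.val, -self.der)

    def __sub__(self, other):
        return self + (-Dual.lift(other))

    def __rsub__(self, other):
        return Dual.lift(other) + (-self)

    def __mul__(self, other):
        o = Dual.lift(other)
        return Dual(self.val * o.val, self.der * o.val + self.val * o.der)

    __rmul__ = __mul__

    def __truediv__(self, other):
        o = Dual.lift(other)
        return Dual(self.val / o.val,
                    (self.der * o.val - self.val * o.der) / (o.val * o.val))

    def __rtruediv__(self, other):
        return Dual.lift(other) / self


def dlog(x):
    """Natural logarithm that propagates the derivative."""
    x = Dual.lift(x)
    return Dual(math.log(x.val), x.der / x.val)


def vdw_free_energy(volume, temperature, a, b):
    """Volume-dependent part of the molar van der Waals Helmholtz free energy."""
    return -R * temperature * dlog(volume - b) - a / volume


def pressure(volume, temperature, a, b):
    """P = -(dF/dV)_T, obtained by seeding the volume with derivative 1."""
    f = vdw_free_energy(Dual(volume, 1.0), temperature, a, b)
    return -f.der


# Illustrative parameters (assumed, SI units): a in Pa m^6/mol^2, b in m^3/mol.
a, b, T, V = 0.364, 4.267e-5, 300.0, 1.0e-3
p_autodiff = pressure(V, T, a, b)
p_analytic = R * T / (V - b) - a / V**2  # van der Waals equation of state
```

Because the pressure is derived from a single free-energy function rather than fitted independently, thermodynamic consistency holds by construction, which is the property the abstract emphasizes.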
Affiliation(s)
- David Rosenberger
- Los Alamos National Laboratory, Theoretical Division, Physics and Chemistry of Materials Group, Los Alamos, New Mexico 87545, USA
| | - Kipton Barros
- Los Alamos National Laboratory, Theoretical Division, Physics and Chemistry of Materials Group, Los Alamos, New Mexico 87545, USA
| | - Timothy C Germann
- Los Alamos National Laboratory, Theoretical Division, Physics and Chemistry of Materials Group, Los Alamos, New Mexico 87545, USA
| | - Nicholas Lubbers
- Los Alamos National Laboratory, Computer, Computational & Statistical Sciences Division, Information Sciences Group, Los Alamos, New Mexico 87545, USA
|
28
|
Staacke CG, Wengert S, Kunkel C, Csányi G, Reuter K, Margraf JT. Kernel charge equilibration: efficient and accurate prediction of molecular dipole moments with a machine-learning enhanced electron density model. Mach Learn Sci Technol 2022. [DOI: 10.1088/2632-2153/ac568d] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/11/2022] Open
Abstract
State-of-the-art machine learning (ML) interatomic potentials use local representations of atomic environments to ensure linear scaling and size-extensivity. This implies a neglect of long-range interactions, most prominently related to electrostatics. To overcome this limitation, we herein present a ML framework for predicting charge distributions and their interactions termed kernel charge equilibration (kQEq). This model is based on classical charge equilibration (QEq) models expanded with an environment-dependent electronegativity. In contrast to previously reported neural network models with a similar concept, kQEq takes advantage of the linearity of both QEq and Kernel Ridge Regression to obtain a closed-form linear algebra expression for training the models. Furthermore, we avoid the ambiguity of charge partitioning schemes by using dipole moments as reference data. As a first application, we show that kQEq can be used to generate accurate and highly data-efficient models for molecular dipole moments.
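The closed-form character of classical charge equilibration can be sketched as follows. This is a minimal illustration, not the kQEq implementation: it assumes the usual quadratic QEq energy E(q) = Σ χ_i q_i + ½ Σ q_i A_ij q_j with a fixed total charge, takes the electronegativities χ as given inputs (in kQEq they would come from a kernel model of the atomic environment), and uses made-up numbers for χ and the hardness/interaction matrix A. The helper names `solve_linear` and `qeq_charges` are hypothetical.

```python
def solve_linear(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def qeq_charges(chi, hardness_matrix, total_charge=0.0):
    """Minimize the quadratic QEq energy subject to sum(q) = total_charge.

    Stationarity gives chi_i + sum_j A_ij q_j = mu for every atom, which
    together with the charge constraint is one linear system.
    """
    n = len(chi)
    kkt = [list(hardness_matrix[i]) + [1.0] for i in range(n)]
    kkt.append([1.0] * n + [0.0])
    rhs = [-c for c in chi] + [total_charge]
    solution = solve_linear(kkt, rhs)
    charges, neg_mu = solution[:n], solution[n]
    return charges, -neg_mu


# Hypothetical three-atom example in arbitrary units; the middle atom is
# the most electronegative one.
chi = [4.0, 8.0, 4.0]
A = [[10.0, 2.0, 1.0],
     [2.0, 14.0, 2.0],
     [1.0, 2.0, 10.0]]
charges, mu = qeq_charges(chi, A)
```

Since the charges depend linearly on χ, a model that is itself linear in its regression weights (as kernel ridge regression is) keeps the whole training problem linear, which is the point the abstract makes about the closed-form expression.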
|
29
|
Cools-Ceuppens M, Dambre J, Verstraelen T. Modeling Electronic Response Properties with an Explicit-Electron Machine Learning Potential. J Chem Theory Comput 2022; 18:1672-1691. [PMID: 35171606 DOI: 10.1021/acs.jctc.1c00978] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Indexed: 01/11/2023]
Abstract
Explicit-electron force fields introduce electrons or electron pairs as semiclassical particles in force fields or empirical potentials, which are suitable for molecular dynamics simulations. Even though semiclassical electrons are a drastic simplification compared to a quantum-mechanical electronic wave function, they still retain a relatively detailed electronic model compared to conventional polarizable and reactive force fields. The ability of explicit-electron models to describe chemical reactions and electronic response properties has already been demonstrated, yet the description of short-range interactions for a broad range of chemical systems remains challenging. In this work, we present the electron machine learning potential (eMLP), a new explicit electron force field in which the short-range interactions are modeled with machine learning. The electron pair particles will be located at well-defined positions, derived from localized molecular orbitals or Wannier centers, naturally imposing the correct dielectric and piezoelectric behavior of the system. The eMLP is benchmarked on two newly constructed data sets: eQM7, an extension of the QM7 data set for small molecules, and a data set for the crystalline β-glycine. It is shown that the eMLP can predict dipole moments, polarizabilities, and IR-spectra of unseen molecules with high precision. Furthermore, a variety of response properties, for example, stiffness or piezoelectric constants, can be accurately reproduced.
Affiliation(s)
- Maarten Cools-Ceuppens
- Center for Molecular Modeling (CMM), Ghent University, Technologiepark-Zwijnaarde 46, B-9052 Gent, Belgium
| | - Joni Dambre
- IDLab, Electronics and Information Systems Department, Ghent University-imec, Technologiepark-Zwijnaarde 126, B-9052 Gent, Belgium
| | - Toon Verstraelen
- Center for Molecular Modeling (CMM), Ghent University, Technologiepark-Zwijnaarde 46, B-9052 Gent, Belgium
|
30
|
Mouvet F, Villard J, Bolnykh V, Rothlisberger U. Recent Advances in First-Principles Based Molecular Dynamics. Acc Chem Res 2022; 55:221-230. [PMID: 35026115 DOI: 10.1021/acs.accounts.1c00503] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Indexed: 12/26/2022]
Abstract
First-principles molecular dynamics (FPMD) and its quantum mechanical-molecular mechanical (QM/MM) extensions are powerful tools to follow the real-time dynamics of a broad variety of systems in their ground as well as electronically excited states. The continued advances in computational power have enabled simulations of QM regions of larger sizes for more extended time scales. In addition, development of the parallel algorithms has boosted the performance of QM/MM methods even on existing computer architectures. In the case of density functional-based FPMD, systems of several hundreds to thousands of atoms can now be customarily simulated for tens to hundreds of picoseconds. In spite of this progress, the time scale limitations remain severe, especially when high-rung exchange-correlation functionals or high-level wave function based quantum mechanical methods are used. To ameliorate this, a large number of enhanced sampling methods have been introduced but most of the approaches that have been developed to increase the efficiency of FPMD based simulations sacrifice the real-time dynamics in favor of enhancing sampling. Here, we present some recent advances in boosting the efficiency of FPMD based simulations while keeping the full dynamic information. These include a highly efficient recent implementation of FPMD-based QM/MM simulations that not only enables fully flexible combinations of different electronic structure methods and force fields via a highly efficient communication library, it also fully exploits parallelism for both quantum and classical descriptions. The second type of acceleration methods we discuss is a large family of specially devised multiple-time-step algorithms that make use of suitable breakups of the total nuclear forces into fast components that can be calculated via lower level methods and slowly varying correction forces evaluated with a high-level method at long time intervals. 
The computational gain of this scheme mostly depends on the cost difference between the two methods, and advantageous combinations can yield large speedups without compromising the accuracy of the high-level method. Finally, the third class of FPMD acceleration methods presented here comprises machine learning models to accelerate FPMD and their powerful combinations with multiple-time-step techniques. The combination of all the approaches enables substantial speedups of FPMD simulations of several orders of magnitude while fully preserving the real-time dynamics and accuracy.
Affiliation(s)
- François Mouvet
- Laboratory of Computational Chemistry and Biochemistry, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, CH-1015 Lausanne, Switzerland
| | - Justin Villard
- Laboratory of Computational Chemistry and Biochemistry, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, CH-1015 Lausanne, Switzerland
| | - Viacheslav Bolnykh
- Laboratory of Computational Chemistry and Biochemistry, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, CH-1015 Lausanne, Switzerland
| | - Ursula Rothlisberger
- Laboratory of Computational Chemistry and Biochemistry, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, CH-1015 Lausanne, Switzerland
| |
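The multiple-time-step idea summarized in the abstract above (fast forces from a cheap method at every small step, a slowly varying correction from the expensive method at long intervals) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the 1D harmonic toy forces, the constants `K_FAST` and `K_CORR`, and the impulse-style (r-RESPA-like) velocity Verlet splitting are not taken from the cited paper.

```python
# Toy multiple-time-step (MTS) integrator for one particle in 1D.
# Hypothetical model: the low-level method sees only a stiff spring,
# the high-level method sees the same spring plus a small correction.

K_FAST = 4.0  # hypothetical stiffness captured by the cheap method
K_CORR = 0.1  # hypothetical extra stiffness seen only by the expensive method

calls = {"low": 0, "high": 0}  # force-evaluation counters


def force_low(x):
    """Cheap, low-level force (evaluated at every inner step)."""
    calls["low"] += 1
    return -K_FAST * x


def force_high(x):
    """Expensive, high-level force (evaluated once per outer step)."""
    calls["high"] += 1
    return -(K_FAST + K_CORR) * x


def slow_correction(x):
    """Slowly varying correction: high-level minus low-level force."""
    return force_high(x) - force_low(x)


def mts_run(x, v, mass, dt_inner, n_inner, n_outer):
    """Propagate with the correction applied as half kicks around n_inner cheap steps."""
    dt_outer = n_inner * dt_inner
    f_slow = slow_correction(x)
    for _ in range(n_outer):
        v += 0.5 * dt_outer * f_slow / mass       # half kick, slow correction
        f_fast = force_low(x)
        for _ in range(n_inner):                  # velocity Verlet on fast force
            v += 0.5 * dt_inner * f_fast / mass
            x += dt_inner * v
            f_fast = force_low(x)
            v += 0.5 * dt_inner * f_fast / mass
        f_slow = slow_correction(x)               # one expensive call per outer step
        v += 0.5 * dt_outer * f_slow / mass       # half kick, slow correction
    return x, v


def total_energy(x, v, mass):
    """Energy on the high-level surface (kinetic + harmonic potential)."""
    return 0.5 * mass * v * v + 0.5 * (K_FAST + K_CORR) * x * x


x0, v0, mass = 1.0, 0.0, 1.0
e_start = total_energy(x0, v0, mass)
x_end, v_end = mts_run(x0, v0, mass, dt_inner=0.01, n_inner=5, n_outer=200)
e_end = total_energy(x_end, v_end, mass)
print("relative energy drift:", abs(e_end - e_start) / e_start)
print("force evaluations:", calls)
```

In this sketch the expensive force is evaluated once per outer step while the cheap force carries the inner steps, which is where the cost saving would come from when the two methods differ strongly in price.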
|
31
|
Pinheiro M, Ge F, Ferré N, Dral PO, Barbatti M. Choosing the right molecular machine learning potential. Chem Sci 2021; 12:14396-14413. [PMID: 34880991 PMCID: PMC8580106 DOI: 10.1039/d1sc03564a] [Citation(s) in RCA: 55] [Impact Index Per Article: 18.3] [Received: 06/29/2021] [Accepted: 09/14/2021] [Indexed: 11/21/2022] Open
Abstract
Quantum-chemistry simulations based on potential energy surfaces of molecules provide invaluable insight into the physicochemical processes at the atomistic level and yield such important observables as reaction rates and spectra. Machine learning potentials promise to significantly reduce the computational cost and hence enable otherwise unfeasible simulations. However, the surging number of such potentials begs the question of which one to choose or whether we still need to develop yet another one. Here, we address this question by evaluating the performance of popular machine learning potentials in terms of accuracy and computational cost. In addition, we deliver structured information for non-specialists in machine learning to guide them through the maze of acronyms, recognize each potential's main features, and judge what they could expect from each one.
Affiliation(s)
- Max Pinheiro
- Aix Marseille University, CNRS, ICR Marseille France
| | - Fuchun Ge
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, College of Chemistry and Chemical Engineering, Xiamen University China
| | - Nicolas Ferré
- Aix Marseille University, CNRS, ICR Marseille France
| | - Pavlo O Dral
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, College of Chemistry and Chemical Engineering, Xiamen University China
| | - Mario Barbatti
- Aix Marseille University, CNRS, ICR Marseille France
- Institut Universitaire de France 75231 Paris France
| |
|
32
|
Gastegger M, Schütt KT, Müller KR. Machine learning of solvent effects on molecular spectra and reactions. Chem Sci 2021; 12:11473-11483. [PMID: 34567501 PMCID: PMC8409491 DOI: 10.1039/d1sc02742e] [Citation(s) in RCA: 45] [Impact Index Per Article: 15.0] [Received: 05/19/2021] [Accepted: 07/22/2021] [Indexed: 01/13/2023] Open
Abstract
Fast and accurate simulation of complex chemical systems in environments such as solutions is a long standing challenge in theoretical chemistry. In recent years, machine learning has extended the boundaries of quantum chemistry by providing highly accurate and efficient surrogate models of electronic structure theory, which previously have been out of reach for conventional approaches. Those models have long been restricted to closed molecular systems without accounting for environmental influences, such as external electric and magnetic fields or solvent effects. Here, we introduce the deep neural network FieldSchNet for modeling the interaction of molecules with arbitrary external fields. FieldSchNet offers access to a wealth of molecular response properties, enabling it to simulate a wide range of molecular spectra, such as infrared, Raman and nuclear magnetic resonance. Beyond that, it is able to describe implicit and explicit molecular environments, operating as a polarizable continuum model for solvation or in a quantum mechanics/molecular mechanics setup. We employ FieldSchNet to study the influence of solvent effects on molecular spectra and a Claisen rearrangement reaction. Based on these results, we use FieldSchNet to design an external environment capable of lowering the activation barrier of the rearrangement reaction significantly, demonstrating promising venues for inverse chemical design.
Affiliation(s)
- Michael Gastegger
- Machine Learning Group, Technische Universität Berlin 10587 Berlin Germany
| | - Kristof T Schütt
- Machine Learning Group, Technische Universität Berlin 10587 Berlin Germany
- Berlin Institute for the Foundations of Learning and Data 10587 Berlin Germany
| | - Klaus-Robert Müller
- Machine Learning Group, Technische Universität Berlin 10587 Berlin Germany
- Berlin Institute for the Foundations of Learning and Data 10587 Berlin Germany
- Department of Artificial Intelligence, Korea University Anam-dong, Seongbuk-gu Seoul 02841 Korea
- Max-Planck-Institut für Informatik 66123 Saarbrücken Germany
| |
|
33
|
Deringer VL, Bartók AP, Bernstein N, Wilkins DM, Ceriotti M, Csányi G. Gaussian Process Regression for Materials and Molecules. Chem Rev 2021; 121:10073-10141. [PMID: 34398616 PMCID: PMC8391963 DOI: 10.1021/acs.chemrev.1c00022] [Citation(s) in RCA: 249] [Impact Index Per Article: 83.0] [Received: 01/08/2021] [Indexed: 12/18/2022]
Abstract
We provide an introduction to Gaussian process regression (GPR) machine-learning methods in computational materials science and chemistry. The focus of the present review is on the regression of atomistic properties: in particular, on the construction of interatomic potentials, or force fields, in the Gaussian Approximation Potential (GAP) framework; beyond this, we also discuss the fitting of arbitrary scalar, vectorial, and tensorial quantities. Methodological aspects of reference data generation, representation, and regression, as well as the question of how a data-driven model may be validated, are reviewed and critically discussed. A survey of applications to a variety of research questions in chemistry and materials science illustrates the rapid growth in the field. A vision is outlined for the development of the methodology in the years to come.
Affiliation(s)
- Volker L. Deringer
- Department of Chemistry, Inorganic Chemistry Laboratory, University of Oxford, Oxford OX1 3QR, United Kingdom
| | - Albert P. Bartók
- Department of Physics and Warwick Centre for Predictive Modelling, School of Engineering, University of Warwick, Coventry CV4 7AL, United Kingdom
| | - Noam Bernstein
- Center for Computational Materials Science, U.S. Naval Research Laboratory, Washington D.C. 20375, United States
| | - David M. Wilkins
- Atomistic Simulation Centre, School of Mathematics and Physics, Queen’s University Belfast, Belfast BT7 1NN, Northern Ireland, United Kingdom
| | - Michele Ceriotti
- Laboratory of Computational Science and Modeling, IMX, École Polytechnique Fédérale de Lausanne, Lausanne 1015, Switzerland
- National Centre for Computational Design and Discovery of Novel Materials (MARVEL), École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
| | - Gábor Csányi
- Engineering Laboratory, University of Cambridge, Cambridge CB2 1PZ, United Kingdom
| |
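As a rough illustration of the Gaussian process regression that the review above introduces, the following sketch fits a scalar property on a handful of 1D inputs and predicts the posterior mean. It is a generic textbook GPR with a squared-exponential kernel, not the GAP framework itself; the kernel choice, hyperparameters, toy data (`train_x`, `train_y`), and all function names are assumptions made for illustration.

```python
import math


def sq_exp_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between two scalar inputs."""
    return variance * math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def solve_cholesky(lower, rhs):
    """Solve (L L^T) alpha = rhs by forward and backward substitution."""
    n = len(rhs)
    z = [0.0] * n
    for i in range(n):
        z[i] = (rhs[i] - sum(lower[i][k] * z[k] for k in range(i))) / lower[i][i]
    alpha = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(lower[k][i] * alpha[k] for k in range(i + 1, n))
        alpha[i] = (z[i] - tail) / lower[i][i]
    return alpha


def gpr_fit(xs, ys, noise=1e-8):
    """Return weights alpha = (K + noise * I)^-1 y for the training set."""
    n = len(xs)
    gram = [
        [sq_exp_kernel(xs[i], xs[j]) + (noise if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    return solve_cholesky(cholesky(gram), ys)


def gpr_predict(x_new, xs, alpha):
    """Posterior mean at x_new (zero prior mean)."""
    return sum(a * sq_exp_kernel(x_new, x) for a, x in zip(alpha, xs))


# Hypothetical toy data: a smooth scalar property sampled at four points.
train_x = [0.0, 1.0, 2.0, 3.0]
train_y = [math.sin(x) for x in train_x]
alpha = gpr_fit(train_x, train_y)
print("prediction at x = 1.5:", gpr_predict(1.5, train_x, alpha))
```

With a very small noise term the posterior mean interpolates the training targets, and far from the data it falls back to the zero prior mean, which is one reason validation away from the training set matters.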
|
34
|
Westermayr J, Marquetand P. Machine Learning for Electronically Excited States of Molecules. Chem Rev 2021; 121:9873-9926. [PMID: 33211478 PMCID: PMC8391943 DOI: 10.1021/acs.chemrev.0c00749] [Citation(s) in RCA: 173] [Impact Index Per Article: 57.7] [Received: 07/17/2020] [Indexed: 12/11/2022]
Abstract
Electronically excited states of molecules are at the heart of photochemistry, photophysics, as well as photobiology and also play a role in material science. Their theoretical description requires highly accurate quantum chemical calculations, which are computationally expensive. In this review, we focus on not only how machine learning is employed to speed up such excited-state simulations but also how this branch of artificial intelligence can be used to advance this exciting research field in all its aspects. Discussed applications of machine learning for excited states include excited-state dynamics simulations, static calculations of absorption spectra, as well as many others. In order to put these studies into context, we discuss the promises and pitfalls of the involved machine learning techniques. Since the latter are mostly based on quantum chemistry calculations, we also provide a short introduction into excited-state electronic structure methods and approaches for nonadiabatic dynamics simulations and describe tricks and problems when using them in machine learning for excited states of molecules.
Affiliation(s)
- Julia Westermayr
- Institute of Theoretical Chemistry, Faculty of Chemistry, University of Vienna, Währinger Strasse 17, 1090 Vienna, Austria
| | - Philipp Marquetand
- Institute of Theoretical Chemistry, Faculty of Chemistry, University of Vienna, Währinger Strasse 17, 1090 Vienna, Austria
- Vienna Research Platform on Accelerating Photoreaction Discovery, University of Vienna, Währinger Strasse 17, 1090 Vienna, Austria
- Data Science @ Uni Vienna, University of Vienna, Währinger Strasse 29, 1090 Vienna, Austria
| |
|
35
|
Keith JA, Vassilev-Galindo V, Cheng B, Chmiela S, Gastegger M, Müller KR, Tkatchenko A. Combining Machine Learning and Computational Chemistry for Predictive Insights Into Chemical Systems. Chem Rev 2021; 121:9816-9872. [PMID: 34232033 PMCID: PMC8391798 DOI: 10.1021/acs.chemrev.1c00107] [Citation(s) in RCA: 218] [Impact Index Per Article: 72.7] [Received: 02/04/2021] [Indexed: 12/23/2022]
Abstract
Machine learning models are poised to make a transformative impact on chemical sciences by dramatically accelerating computational algorithms and amplifying insights available from computational chemistry methods. However, achieving this requires a confluence and coaction of expertise in computer science and physical sciences. This Review is written for new and experienced researchers working at the intersection of both fields. We first provide concise tutorials of computational chemistry and machine learning methods, showing how insights involving both can be achieved. We follow with a critical review of noteworthy applications that demonstrate how computational chemistry and machine learning can be used together to provide insightful (and useful) predictions in molecular and materials modeling, retrosyntheses, catalysis, and drug design.
Affiliation(s)
- John A. Keith
- Department of Chemical and Petroleum Engineering Swanson School of Engineering, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, United States
| | - Valentin Vassilev-Galindo
- Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
| | - Bingqing Cheng
- Accelerate Programme for Scientific Discovery, Department of Computer Science and Technology, 15 J. J. Thomson Avenue, Cambridge CB3 0FD, United Kingdom
| | - Stefan Chmiela
- Department of Software Engineering and Theoretical Computer Science, Technische Universität Berlin, 10587, Berlin, Germany
| | - Michael Gastegger
- Department of Software Engineering and Theoretical Computer Science, Technische Universität Berlin, 10587, Berlin, Germany
| | - Klaus-Robert Müller
- Machine Learning Group, Technische Universität Berlin, 10587, Berlin, Germany
- Department of Artificial Intelligence, Korea University, Anam-dong, Seongbuk-gu, Seoul, 02841, Korea
- Max-Planck-Institut für Informatik, 66123 Saarbrücken, Germany
- Google Research, Brain Team, 10117 Berlin, Germany
| | - Alexandre Tkatchenko
- Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
| |
|
36
|
Abstract
Chemical compound space (CCS), the set of all theoretically conceivable combinations of chemical elements and (meta-)stable geometries that make up matter, is colossal. The first-principles based virtual sampling of this space, for example, in search of novel molecules or materials which exhibit desirable properties, is therefore prohibitive for all but the smallest subsets and simplest properties. We review studies aimed at tackling this challenge using modern machine learning techniques based on (i) synthetic data, typically generated using quantum mechanics based methods, and (ii) model architectures inspired by quantum mechanics. Such Quantum mechanics based Machine Learning (QML) approaches combine the numerical efficiency of statistical surrogate models with an ab initio view on matter. They rigorously reflect the underlying physics in order to reach universality and transferability across CCS. While state-of-the-art approximations to quantum problems impose severe computational bottlenecks, recent QML based developments indicate the possibility of substantial acceleration without sacrificing the predictive power of quantum mechanics.
Affiliation(s)
- Bing Huang
- Faculty of Physics, University of Vienna, 1090 Vienna, Austria
| | - O. Anatole von Lilienfeld
- Faculty of Physics, University of Vienna, 1090 Vienna, Austria
- Institute of Physical Chemistry and National Center for Computational Design and Discovery of Novel Materials (MARVEL), Department of Chemistry, University of Basel, 4056 Basel, Switzerland
| |
|
37
|
Unke O, Chmiela S, Sauceda HE, Gastegger M, Poltavsky I, Schütt KT, Tkatchenko A, Müller KR. Machine Learning Force Fields. Chem Rev 2021; 121:10142-10186. [PMID: 33705118 PMCID: PMC8391964 DOI: 10.1021/acs.chemrev.0c01111] [Citation(s) in RCA: 419] [Impact Index Per Article: 139.7] [Received: 10/12/2020] [Indexed: 12/27/2022]
Abstract
In recent years, the use of machine learning (ML) in computational chemistry has enabled numerous advances previously out of reach due to the computational complexity of traditional electronic-structure methods. One of the most promising applications is the construction of ML-based force fields (FFs), with the aim to narrow the gap between the accuracy of ab initio methods and the efficiency of classical FFs. The key idea is to learn the statistical relation between chemical structure and potential energy without relying on a preconceived notion of fixed chemical bonds or knowledge about the relevant interactions. Such universal ML approximations are in principle only limited by the quality and quantity of the reference data used to train them. This review gives an overview of applications of ML-FFs and the chemical insights that can be obtained from them. The core concepts underlying ML-FFs are described in detail, and a step-by-step guide for constructing and testing them from scratch is given. The text concludes with a discussion of the challenges that remain to be overcome by the next generation of ML-FFs.
Affiliation(s)
- Oliver T. Unke
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- DFG Cluster of Excellence “Unifying Systems in Catalysis” (UniSysCat), Technische Universität Berlin, 10623 Berlin, Germany
| | - Stefan Chmiela
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
| | - Huziel E. Sauceda
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- BASLEARN, BASF-TU Joint Lab, Technische Universität Berlin, 10587 Berlin, Germany
| | - Michael Gastegger
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- DFG Cluster of Excellence “Unifying Systems in Catalysis” (UniSysCat), Technische Universität Berlin, 10623 Berlin, Germany
- BASLEARN, BASF-TU Joint Lab, Technische Universität Berlin, 10587 Berlin, Germany
| | - Igor Poltavsky
- Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
| | - Kristof T. Schütt
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
| | - Alexandre Tkatchenko
- Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
| | - Klaus-Robert Müller
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- BIFOLD−Berlin Institute for the Foundations of Learning and Data, Berlin, Germany
- Department of Artificial Intelligence, Korea University, Anam-dong, Seongbuk-gu, Seoul 02841, Korea
- Max Planck Institute for Informatics, Stuhlsatzenhausweg, 66123 Saarbrücken, Germany
- Google Research, Brain Team, Berlin, Germany
| |
|
39
|
Westermayr J, Maurer RJ. Physically inspired deep learning of molecular excitations and photoemission spectra. Chem Sci 2021; 12:10755-10764. [PMID: 34447563 PMCID: PMC8372319 DOI: 10.1039/d1sc01542g] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 03/17/2021] [Accepted: 06/29/2021] [Indexed: 12/29/2022] Open
Abstract
Modern functional materials consist of large molecular building blocks with significant chemical complexity which limits spectroscopic property prediction with accurate first-principles methods. Consequently, a targeted design of materials with tailored optoelectronic properties by high-throughput screening is bound to fail without efficient methods to predict molecular excited-state properties across chemical space. In this work, we present a deep neural network that predicts charged quasiparticle excitations for large and complex organic molecules with a rich elemental diversity and a size well out of reach of accurate many body perturbation theory calculations. The model exploits the fundamental underlying physics of molecular resonances as eigenvalues of a latent Hamiltonian matrix and is thus able to accurately describe multiple resonances simultaneously. The performance of this model is demonstrated for a range of organic molecules across chemical composition space and configuration space. We further showcase the model capabilities by predicting photoemission spectra at the level of the GW approximation for previously unseen conjugated molecules.
Affiliation(s)
- Julia Westermayr
- Department of Chemistry, University of Warwick Gibbet Hill Road Coventry CV4 7AL UK
| | - Reinhard J Maurer
- Department of Chemistry, University of Warwick Gibbet Hill Road Coventry CV4 7AL UK
| |
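The abstract above describes resonances as eigenvalues of a latent Hamiltonian matrix. A minimal toy sketch of that idea follows, assuming a hypothetical model that outputs the three independent entries of a symmetric 2×2 matrix; the numbers, the names, and the 2×2 size are illustrative assumptions only and do not reproduce the cited network.

```python
import math


def symmetric_2x2_eigenvalues(h11, h12, h22):
    """Closed-form eigenvalues of the symmetric matrix [[h11, h12], [h12, h22]]."""
    mean = 0.5 * (h11 + h22)
    radius = math.hypot(0.5 * (h11 - h22), h12)
    return mean - radius, mean + radius


# Hypothetical model outputs (arbitrary energy units) for one molecule:
# two on-site terms and one coupling term of a latent Hamiltonian.
latent = {"h11": -9.0, "h12": 0.5, "h22": -7.0}

e_low, e_high = symmetric_2x2_eigenvalues(**latent)
print("predicted resonances:", e_low, e_high)
```

Because both predicted levels come from one symmetric matrix, they are obtained together and are ordered by construction, which is one way to read the abstract's remark about describing multiple resonances simultaneously.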
|
40
|
Musil F, Grisafi A, Bartók AP, Ortner C, Csányi G, Ceriotti M. Physics-Inspired Structural Representations for Molecules and Materials. Chem Rev 2021; 121:9759-9815. [PMID: 34310133 DOI: 10.1021/acs.chemrev.1c00021] [Citation(s) in RCA: 153] [Impact Index Per Article: 51.0] [Indexed: 12/11/2022]
Abstract
The first step in the construction of a regression model or a data-driven analysis, aiming to predict or elucidate the relationship between the atomic-scale structure of matter and its properties, involves transforming the Cartesian coordinates of the atoms into a suitable representation. The development of atomic-scale representations has played, and continues to play, a central role in the success of machine-learning methods for chemistry and materials science. This review summarizes the current understanding of the nature and characteristics of the most commonly used structural and chemical descriptions of atomistic structures, highlighting the deep underlying connections between different frameworks and the ideas that lead to computationally efficient and universally applicable models. It emphasizes the link between properties, structures, their physical chemistry, and their mathematical description, provides examples of recent applications to a diverse set of chemical and materials science problems, and outlines the open questions and the most promising research directions in the field.
Affiliation(s)
- Felix Musil
- Laboratory of Computational Science and Modeling, IMX, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- National Centre for Computational Design and Discovery of Novel Materials (MARVEL), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
| | - Andrea Grisafi
- Laboratory of Computational Science and Modeling, IMX, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
| | - Albert P Bartók
- Department of Physics and Warwick Centre for Predictive Modelling, School of Engineering, University of Warwick, Coventry CV4 7AL, United Kingdom
| | - Christoph Ortner
- University of British Columbia, Vancouver, British Columbia V6T 1Z2, Canada
| | - Gábor Csányi
- Engineering Laboratory, University of Cambridge, Trumpington Street, Cambridge CB2 1PZ, United Kingdom
| | - Michele Ceriotti
- Laboratory of Computational Science and Modeling, IMX, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- National Centre for Computational Design and Discovery of Novel Materials (MARVEL), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
| |
|
41
|
Westermayr J, Gastegger M, Schütt KT, Maurer RJ. Perspective on integrating machine learning into computational chemistry and materials science. J Chem Phys 2021; 154:230903. [PMID: 34241249 DOI: 10.1063/5.0047760] [Citation(s) in RCA: 67] [Impact Index Per Article: 22.3] [Indexed: 01/12/2023] Open
Abstract
Machine learning (ML) methods are being used in almost every conceivable area of electronic structure theory and molecular simulation. In particular, ML has become firmly established in the construction of high-dimensional interatomic potentials. Not a day goes by without another proof of principle being published on how ML methods can represent and predict quantum mechanical properties-be they observable, such as molecular polarizabilities, or not, such as atomic charges. As ML is becoming pervasive in electronic structure theory and molecular simulation, we provide an overview of how atomistic computational modeling is being transformed by the incorporation of ML approaches. From the perspective of the practitioner in the field, we assess how common workflows to predict structure, dynamics, and spectroscopy are affected by ML. Finally, we discuss how a tighter and lasting integration of ML methods with computational chemistry and materials science can be achieved and what it will mean for research practice, software development, and postgraduate training.
Affiliation(s)
- Julia Westermayr
- Department of Chemistry, University of Warwick, Gibbet Hill Road, Coventry CV4 7AL, United Kingdom
| | - Michael Gastegger
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
| | - Kristof T Schütt
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
| | - Reinhard J Maurer
- Department of Chemistry, University of Warwick, Gibbet Hill Road, Coventry CV4 7AL, United Kingdom
| |
|
42
|
von Rudorff GF, von Lilienfeld OA. Simplifying inverse materials design problems for fixed lattices with alchemical chirality. Sci Adv 2021; 7:eabf1173. [PMID: 34138735 PMCID: PMC8133750 DOI: 10.1126/sciadv.abf1173] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/06/2020] [Accepted: 03/25/2021] [Indexed: 05/03/2023]
Abstract
Brute-force compute campaigns relying on demanding ab initio calculations routinely search for previously unknown materials in chemical compound space (CCS), the vast set of all conceivable stable combinations of elements and structural configurations. Here, we demonstrate that four-dimensional chirality arising from antisymmetry of alchemical perturbations dissects CCS and defines approximate ranks, which reduce its formal dimensionality and break down its combinatorial scaling. The resulting "alchemical" enantiomers have the same electronic energy up to the third order, independent of respective covalent bond topology, imposing relevant constraints on chemical bonding. Alchemical chirality deepens our understanding of CCS and enables the establishment of trends without empiricism for any materials with fixed lattices. We demonstrate the efficacy for three cases: (i) new rules for electronic energy contributions to chemical bonding; (ii) analysis of the electron density of BN-doped benzene; and (iii) ranking over 2000 and 4 million BN-doped naphthalene and picene derivatives, respectively.
Affiliation(s)
- Guido Falk von Rudorff
- University of Vienna, Faculty of Physics, Kolingasse 14-16, 1090 Vienna, Austria
- Institute of Physical Chemistry and National Center for Computational Design and Discovery of Novel Materials (MARVEL), Department of Chemistry, University of Basel, 4056 Basel, Switzerland
| | - O Anatole von Lilienfeld
- University of Vienna, Faculty of Physics, Kolingasse 14-16, 1090 Vienna, Austria
- Institute of Physical Chemistry and National Center for Computational Design and Discovery of Novel Materials (MARVEL), Department of Chemistry, University of Basel, 4056 Basel, Switzerland
| |
|
43
|
Affiliation(s)
- Jörg Behler
- Universität Göttingen, Institut für Physikalische Chemie, Theoretische Chemie, Tammannstraße 6, 37077 Göttingen, Germany
| |
|
44
|
Lee SJR, Husch T, Ding F, Miller TF. Analytical gradients for molecular-orbital-based machine learning. J Chem Phys 2021; 154:124120. [PMID: 33810669 DOI: 10.1063/5.0040782] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 11/14/2022] Open
Abstract
Molecular-orbital-based machine learning (MOB-ML) enables the prediction of accurate correlation energies at the cost of obtaining molecular orbitals. Here, we present the derivation, implementation, and numerical demonstration of MOB-ML analytical nuclear gradients, which are formulated in a general Lagrangian framework to enforce orthogonality, localization, and Brillouin constraints on the molecular orbitals. The MOB-ML gradient framework is general with respect to the regression technique (e.g., Gaussian process regression or neural networks) and the MOB feature design. We show that MOB-ML gradients are highly accurate compared to other ML methods on the ISO17 dataset while only being trained on energies for hundreds of molecules compared to energies and gradients for hundreds of thousands of molecules for the other ML methods. The MOB-ML gradients are also shown to yield accurate optimized structures at a computational cost for the gradient evaluation that is comparable to a density-corrected density functional theory calculation.
Affiliation(s)
- Sebastian J R Lee
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, USA
| | - Tamara Husch
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, USA
| | - Feizhi Ding
- Entos, Inc., Los Angeles, California 90027, USA
| | - Thomas F Miller
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, USA
| |
|
45
|
Musil F, Veit M, Goscinski A, Fraux G, Willatt MJ, Stricker M, Junge T, Ceriotti M. Efficient implementation of atom-density representations. J Chem Phys 2021; 154:114109. [DOI: 10.1063/5.0044689] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Indexed: 11/14/2022] Open
Affiliation(s)
- Félix Musil
  - Laboratory of Computational Science and Modeling, Institute of Materials, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
  - National Center for Computational Design and Discovery of Novel Materials (MARVEL), Lausanne, Switzerland
- Max Veit
  - Laboratory of Computational Science and Modeling, Institute of Materials, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
  - National Center for Computational Design and Discovery of Novel Materials (MARVEL), Lausanne, Switzerland
- Alexander Goscinski
  - Laboratory of Computational Science and Modeling, Institute of Materials, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Guillaume Fraux
  - Laboratory of Computational Science and Modeling, Institute of Materials, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Michael J. Willatt
  - Laboratory of Computational Science and Modeling, Institute of Materials, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Markus Stricker
  - Laboratory for Multiscale Mechanics Modeling, Institute of Mechanical Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
  - Interdisciplinary Centre for Advanced Materials Simulation, Ruhr-University Bochum, Universitätsstraße 150, 44801 Bochum, Germany
- Till Junge
  - Laboratory for Multiscale Mechanics Modeling, Institute of Mechanical Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Michele Ceriotti
  - Laboratory of Computational Science and Modeling, Institute of Materials, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
46
Vassilev-Galindo V, Fonseca G, Poltavsky I, Tkatchenko A. Challenges for machine learning force fields in reproducing potential energy surfaces of flexible molecules. J Chem Phys 2021; 154:094119. [DOI: 10.1063/5.0038516] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Indexed: 12/12/2022] Open
Affiliation(s)
- Valentin Vassilev-Galindo
  - Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Gregory Fonseca
  - Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Igor Poltavsky
  - Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Alexandre Tkatchenko
  - Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
47
Husch T, Sun J, Cheng L, Lee SJR, Miller TF. Improved accuracy and transferability of molecular-orbital-based machine learning: Organics, transition-metal complexes, non-covalent interactions, and transition states. J Chem Phys 2021; 154:064108. [DOI: 10.1063/5.0032362] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Indexed: 02/02/2023] Open
Affiliation(s)
- Tamara Husch
  - Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, USA
- Jiace Sun
  - Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, USA
- Lixue Cheng
  - Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, USA
- Sebastian J. R. Lee
  - Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, USA
- Thomas F. Miller
  - Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, USA
48
Elm J. Toward a Holistic Understanding of the Formation and Growth of Atmospheric Molecular Clusters: A Quantum Machine Learning Perspective. J Phys Chem A 2021; 125:895-902. [PMID: 33378191 DOI: 10.1021/acs.jpca.0c09762] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Indexed: 11/28/2022]
Abstract
The formation of atmospheric molecular clusters is an important stage in forming new particles in the atmosphere. Despite being a highly focused research area, the exact chemical species involved in the initial steps in new particle formation remain elusive. In this Perspective the main challenges and recent progression in the field are outlined with a special emphasis on the chemical complexity of the puzzle and prospect of modeling larger clusters. In general, there is a high demand for accurate and more complete quantum chemical data sets that can be applied in cluster distribution dynamics models and coupled to atmospheric chemical transport models. A view on how the community could reach this goal by applying data-driven machine learning approaches for more efficient exploration of cluster configurations is presented. A path toward larger clusters and direct molecular dynamics simulations of cluster formation and growth using machine learning models is discussed.
Affiliation(s)
- Jonas Elm
  - Department of Chemistry and iClimate, Aarhus University, Langelandsgade 140, Aarhus, Denmark
49
Hoja J, Medrano Sandonas L, Ernst BG, Vazquez-Mayagoitia A, DiStasio RA, Tkatchenko A. QM7-X, a comprehensive dataset of quantum-mechanical properties spanning the chemical space of small organic molecules. Sci Data 2021; 8:43. [PMID: 33531509 PMCID: PMC7854709 DOI: 10.1038/s41597-021-00812-2] [Citation(s) in RCA: 44] [Impact Index Per Article: 14.7] [Received: 07/03/2020] [Accepted: 12/18/2020] [Indexed: 01/31/2023] Open
Abstract
We introduce QM7-X, a comprehensive dataset of 42 physicochemical properties for ≈4.2 million equilibrium and non-equilibrium structures of small organic molecules with up to seven non-hydrogen (C, N, O, S, Cl) atoms. To span this fundamentally important region of chemical compound space (CCS), QM7-X includes an exhaustive sampling of (meta-)stable equilibrium structures-comprised of constitutional/structural isomers and stereoisomers, e.g., enantiomers and diastereomers (including cis-/trans- and conformational isomers)-as well as 100 non-equilibrium structural variations thereof to reach a total of ≈4.2 million molecular structures. Computed at the tightly converged quantum-mechanical PBE0+MBD level of theory, QM7-X contains global (molecular) and local (atom-in-a-molecule) properties ranging from ground state quantities (such as atomization energies and dipole moments) to response quantities (such as polarizability tensors and dispersion coefficients). By providing a systematic, extensive, and tightly-converged dataset of quantum-mechanically computed physicochemical properties, we expect that QM7-X will play a critical role in the development of next-generation machine-learning based models for exploring greater swaths of CCS and performing in silico design of molecules with targeted properties.
Affiliation(s)
- Johannes Hoja
  - Department of Physics and Materials Science, University of Luxembourg, L-1511, Luxembourg City, Luxembourg
  - Institute of Chemistry, University of Graz, 8010, Graz, Austria
- Leonardo Medrano Sandonas
  - Department of Physics and Materials Science, University of Luxembourg, L-1511, Luxembourg City, Luxembourg
- Brian G Ernst
  - Department of Chemistry and Chemical Biology, Cornell University, Ithaca, NY, 14853, USA
- Robert A DiStasio
  - Department of Chemistry and Chemical Biology, Cornell University, Ithaca, NY, 14853, USA
- Alexandre Tkatchenko
  - Department of Physics and Materials Science, University of Luxembourg, L-1511, Luxembourg City, Luxembourg
50
Menger MFS, Ehrmaier J, Faraji S. PySurf: A Framework for Database Accelerated Direct Dynamics. J Chem Theory Comput 2020; 16:7681-7689. [PMID: 33231447 PMCID: PMC7726901 DOI: 10.1021/acs.jctc.0c00825] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 08/07/2020] [Indexed: 11/28/2022]
Abstract
The greatest restriction to the theoretical study of the dynamics of photoinduced processes is computationally expensive electronic structure calculations. Machine learning algorithms have the potential to reduce the number of these computations significantly. Here, PySurf is introduced as an innovative code framework, which is specifically designed for rapid prototyping and development tasks for data science applications in computational chemistry. It comes with powerful Plugin and Workflow engines, which allows intuitive customization for individual tasks. Data is automatically stored through the database framework, which enables additional interpolation of properties in previously evaluated regions of the conformational space. To illustrate the potential of the framework, a code for nonadiabatic surface hopping simulations based on the Landau-Zener algorithm is presented here. Deriving gradients from the interpolated potential energy surfaces allows for full-dimensional nonadiabatic surface hopping simulations using only adiabatic energies (energy only). Simulations of a pyrazine model and ab initio-based calculations of the SO2 molecule show that energy-only calculations with PySurf are able to correctly predict the nonadiabatic dynamics of these prototype systems. The results reveal the degree of sophistication, which can be achieved by the database accelerated energy-only surface hopping simulations being competitive to commonly used semiclassical approaches.
Affiliation(s)
- Maximilian F. S. J. Menger
  - Zernike Institute for Advanced Materials, Faculty of Science and Engineering, University of Groningen, Nijenborgh 4, 9747AG Groningen, The Netherlands
- Johannes Ehrmaier
  - Zernike Institute for Advanced Materials, Faculty of Science and Engineering, University of Groningen, Nijenborgh 4, 9747AG Groningen, The Netherlands
- Shirin Faraji
  - Zernike Institute for Advanced Materials, Faculty of Science and Engineering, University of Groningen, Nijenborgh 4, 9747AG Groningen, The Netherlands