1
Zhou J, Huang M. Navigating the landscape of enzyme design: from molecular simulations to machine learning. Chem Soc Rev 2024; 53:8202-8239. [PMID: 38990263 DOI: 10.1039/d4cs00196f] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/12/2024]
Abstract
Global environmental issues and sustainable development call for new technologies for fine chemical synthesis and waste valorization. Biocatalysis has attracted great attention as the alternative to the traditional organic synthesis. However, it is challenging to navigate the vast sequence space to identify those proteins with admirable biocatalytic functions. The recent development of deep-learning based structure prediction methods such as AlphaFold2 reinforced by different computational simulations or multiscale calculations has largely expanded the 3D structure databases and enabled structure-based design. While structure-based approaches shed light on site-specific enzyme engineering, they are not suitable for large-scale screening of potential biocatalysts. Effective utilization of big data using machine learning techniques opens up a new era for accelerated predictions. Here, we review the approaches and applications of structure-based and machine-learning guided enzyme design. We also provide our view on the challenges and perspectives on effectively employing enzyme design approaches integrating traditional molecular simulations and machine learning, and the importance of database construction and algorithm development in attaining predictive ML models to explore the sequence fitness landscape for the design of admirable biocatalysts.
Affiliation(s)
- Jiahui Zhou
  - School of Chemistry and Chemical Engineering, Queen's University, David Keir Building, Stranmillis Road, Belfast BT9 5AG, Northern Ireland, UK.
- Meilan Huang
  - School of Chemistry and Chemical Engineering, Queen's University, David Keir Building, Stranmillis Road, Belfast BT9 5AG, Northern Ireland, UK.
2
Plé T, Adjoua O, Lagardère L, Piquemal JP. FeNNol: An efficient and flexible library for building force-field-enhanced neural network potentials. J Chem Phys 2024; 161:042502. [PMID: 39051830 DOI: 10.1063/5.0217688] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2024] [Accepted: 06/28/2024] [Indexed: 07/27/2024] Open
Abstract
Neural network interatomic potentials (NNPs) have recently proven to be powerful tools to accurately model complex molecular systems while bypassing the high numerical cost of ab initio molecular dynamics simulations. In recent years, numerous advances in model architectures as well as the development of hybrid models combining machine-learning (ML) with more traditional, physically motivated, force-field interactions have considerably increased the design space of ML potentials. In this paper, we present FeNNol, a new library for building, training, and running force-field-enhanced neural network potentials. It provides a flexible and modular system for building hybrid models, allowing us to easily combine state-of-the-art embeddings with ML-parameterized physical interaction terms without the need for explicit programming. Furthermore, FeNNol leverages the automatic differentiation and just-in-time compilation features of the Jax Python library to enable fast evaluation of NNPs, shrinking the performance gap between ML potentials and standard force-fields. This is demonstrated with the popular ANI-2x model reaching simulation speeds nearly on par with the AMOEBA polarizable force-field on commodity GPUs (graphics processing units). We hope that FeNNol will facilitate the development and application of new hybrid NNP architectures for a wide range of molecular simulation problems.
Affiliation(s)
- Thomas Plé
  - Sorbonne Université, LCT, UMR 7616 CNRS, 75005 Paris, France
- Olivier Adjoua
  - Sorbonne Université, LCT, UMR 7616 CNRS, 75005 Paris, France
- Louis Lagardère
  - Sorbonne Université, LCT, UMR 7616 CNRS, 75005 Paris, France
3
Zubatyuk R, Biczysko M, Ranasinghe K, Moriarty NW, Gokcan H, Kruse H, Poon BK, Adams PD, Waller MP, Roitberg AE, Isayev O, Afonine PV. AQuaRef: Machine learning accelerated quantum refinement of protein structures. BIORXIV: THE PREPRINT SERVER FOR BIOLOGY 2024:2024.07.21.604493. [PMID: 39071315 PMCID: PMC11275739 DOI: 10.1101/2024.07.21.604493] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/30/2024]
Abstract
Cryo-EM and X-ray crystallography provide crucial experimental data for obtaining atomic-detail models of biomacromolecules. Refining these models relies on library-based stereochemical restraints, which, in addition to being limited to known chemical entities, do not include meaningful noncovalent interactions, relying solely on nonbonded repulsions. Quantum mechanical (QM) calculations could alleviate these issues but are too expensive for large molecules. We present a novel AI-enabled Quantum Refinement (AQuaRef) based on AIMNet2 neural network potential mimicking QM at substantially lower computational costs. By refining 41 cryo-EM and 30 X-ray structures, we show that this approach yields atomic models with superior geometric quality compared to standard techniques, while maintaining an equal or better fit to experimental data.
4
Shirani H, Hashemianzadeh SM. Quantum-level machine learning calculations of Levodopa. Comput Biol Chem 2024; 112:108146. [PMID: 39067350 DOI: 10.1016/j.compbiolchem.2024.108146] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2024] [Revised: 06/20/2024] [Accepted: 07/08/2024] [Indexed: 07/30/2024]
Abstract
Many drug molecules contain functional groups, resulting in a torsional barrier corresponding to rotation around the bond linking the fragments. In medicinal chemistry and pharmaceutical sciences, inclusive of drug design studies, the exact calculation of the potential energy surface (PES) of these molecular torsions is extremely important and precious. Machine learning (ML), including deep learning (DL), is currently one of the most rapidly evolving tools in computer-aided drug discovery and molecular simulations. In this work, we used ANI-1x neural network potential as a quantum-level ML to predict the PESs of the L-3,4-dihydroxyphenylalanine (Levodopa) antiparkinsonian drug molecule. The electronic energies and structural parameters calculated by density functional theory (DFT) using the wB97X method and all possible Pople's basis sets indicated the 6-31G(d) basis set, when used with the wB97X functional, exhibits behavior similar to that of the ANI-1x model. The vibrational frequencies investigation showed a linear correlation between DFT and ML data. All ANI-1x calculations were completed quickly in a very short computing time. From this perspective, we expect the ANI-1x dataset applied in this work to be appreciably efficient and effective in computational structure-based drug design studies.
Affiliation(s)
- Hossein Shirani
  - Molecular Simulation Research Laboratory, Department of Chemistry, Iran University of Science and Technology, P.O. Box 16846-13114, Tehran, Iran.
- Seyed Majid Hashemianzadeh
  - Molecular Simulation Research Laboratory, Department of Chemistry, Iran University of Science and Technology, P.O. Box 16846-13114, Tehran, Iran.
5
Medrano Sandonas L, Van Rompaey D, Fallani A, Hilfiker M, Hahn D, Perez-Benito L, Verhoeven J, Tresadern G, Kurt Wegner J, Ceulemans H, Tkatchenko A. Dataset for quantum-mechanical exploration of conformers and solvent effects in large drug-like molecules. Sci Data 2024; 11:742. [PMID: 38972891 PMCID: PMC11228031 DOI: 10.1038/s41597-024-03521-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2024] [Accepted: 06/13/2024] [Indexed: 07/09/2024] Open
Abstract
We here introduce the Aquamarine (AQM) dataset, an extensive quantum-mechanical (QM) dataset that contains the structural and electronic information of 59,783 low- and high-energy conformers of 1,653 molecules with a total number of atoms ranging from 2 to 92 (mean: 50.9), and containing up to 54 (mean: 28.2) non-hydrogen atoms. To gain insights into the solvent effects as well as collective dispersion interactions for drug-like molecules, we have performed QM calculations supplemented with a treatment of many-body dispersion (MBD) interactions of structures and properties in the gas phase and implicit water. Thus, AQM contains over 40 global and local physicochemical properties (including ground-state and response properties) per conformer computed at the tightly converged PBE0+MBD level of theory for gas-phase molecules, whereas PBE0+MBD with the modified Poisson-Boltzmann (MPB) model of water was used for solvated molecules. By addressing both molecule-solvent and dispersion interactions, the AQM dataset can serve as a challenging benchmark for state-of-the-art machine learning methods for property modeling and de novo generation of large (solvated) molecules with pharmaceutical and biological relevance.
Affiliation(s)
- Leonardo Medrano Sandonas
  - Department of Physics and Materials Science, University of Luxembourg, L-1511, Luxembourg City, Luxembourg.
  - Institute for Materials Science and Max Bergmann Center of Biomaterials, TU Dresden, 01062, Dresden, Germany.
- Dries Van Rompaey
  - Drug Discovery Data Sciences (D3S), Janssen Pharmaceutica NV, Turnhoutseweg 30, 2340, Beerse, Belgium.
- Alessio Fallani
  - Department of Physics and Materials Science, University of Luxembourg, L-1511, Luxembourg City, Luxembourg
  - Drug Discovery Data Sciences (D3S), Janssen Pharmaceutica NV, Turnhoutseweg 30, 2340, Beerse, Belgium
- Mathias Hilfiker
  - Department of Physics and Materials Science, University of Luxembourg, L-1511, Luxembourg City, Luxembourg
- David Hahn
  - Computational Chemistry, Janssen Pharmaceutica NV, Turnhoutseweg 30, 2340, Beerse, Belgium
- Laura Perez-Benito
  - Computational Chemistry, Janssen Pharmaceutica NV, Turnhoutseweg 30, 2340, Beerse, Belgium
- Jonas Verhoeven
  - Drug Discovery Data Sciences (D3S), Janssen Pharmaceutica NV, Turnhoutseweg 30, 2340, Beerse, Belgium
- Gary Tresadern
  - Computational Chemistry, Janssen Pharmaceutica NV, Turnhoutseweg 30, 2340, Beerse, Belgium
- Joerg Kurt Wegner
  - Drug Discovery Data Sciences (D3S), Janssen Pharmaceutica NV, Turnhoutseweg 30, 2340, Beerse, Belgium
  - Drug Discovery Data Sciences (D3S), Johnson & Johnson Innovative Medicine, 301 Binney Street, MA 02142, Cambridge, USA
- Hugo Ceulemans
  - Drug Discovery Data Sciences (D3S), Janssen Pharmaceutica NV, Turnhoutseweg 30, 2340, Beerse, Belgium
- Alexandre Tkatchenko
  - Department of Physics and Materials Science, University of Luxembourg, L-1511, Luxembourg City, Luxembourg.
6
Aldossary A, Campos-Gonzalez-Angulo JA, Pablo-García S, Leong SX, Rajaonson EM, Thiede L, Tom G, Wang A, Avagliano D, Aspuru-Guzik A. In Silico Chemical Experiments in the Age of AI: From Quantum Chemistry to Machine Learning and Back. ADVANCED MATERIALS (DEERFIELD BEACH, FLA.) 2024; 36:e2402369. [PMID: 38794859 DOI: 10.1002/adma.202402369] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2024] [Revised: 04/28/2024] [Indexed: 05/26/2024]
Abstract
Computational chemistry is an indispensable tool for understanding molecules and predicting chemical properties. However, traditional computational methods face significant challenges due to the difficulty of solving the Schrödinger equations and the increasing computational cost with the size of the molecular system. In response, there has been a surge of interest in leveraging artificial intelligence (AI) and machine learning (ML) techniques to in silico experiments. Integrating AI and ML into computational chemistry increases the scalability and speed of the exploration of chemical space. However, challenges remain, particularly regarding the reproducibility and transferability of ML models. This review highlights the evolution of ML in learning from, complementing, or replacing traditional computational chemistry for energy and property predictions. Starting from models trained entirely on numerical data, a journey set forth toward the ideal model incorporating or learning the physical laws of quantum mechanics. This paper also reviews existing computational methods and ML models and their intertwining, outlines a roadmap for future research, and identifies areas for improvement and innovation. Ultimately, the goal is to develop AI architectures capable of predicting accurate and transferable solutions to the Schrödinger equation, thereby revolutionizing in silico experiments within chemistry and materials science.
Affiliation(s)
- Abdulrahman Aldossary
  - Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, ON, M5S 3H6, Canada
- Sergio Pablo-García
  - Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, ON, M5S 3H6, Canada
  - Department of Computer Science, University of Toronto, 40 St. George Street, Toronto, ON, M5S 2E4, Canada
- Shi Xuan Leong
  - Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, ON, M5S 3H6, Canada
- Ella Miray Rajaonson
  - Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, ON, M5S 3H6, Canada
  - Vector Institute for Artificial Intelligence, 661 University Ave. Suite 710, Toronto, ON, M5G 1M1, Canada
- Luca Thiede
  - Department of Computer Science, University of Toronto, 40 St. George Street, Toronto, ON, M5S 2E4, Canada
  - Vector Institute for Artificial Intelligence, 661 University Ave. Suite 710, Toronto, ON, M5G 1M1, Canada
- Gary Tom
  - Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, ON, M5S 3H6, Canada
  - Vector Institute for Artificial Intelligence, 661 University Ave. Suite 710, Toronto, ON, M5G 1M1, Canada
- Andrew Wang
  - Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, ON, M5S 3H6, Canada
- Davide Avagliano
  - Chimie ParisTech, PSL University, CNRS, Institute of Chemistry for Life and Health Sciences (iCLeHS UMR 8060), Paris, F-75005, France
- Alán Aspuru-Guzik
  - Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, ON, M5S 3H6, Canada
  - Department of Computer Science, University of Toronto, 40 St. George Street, Toronto, ON, M5S 2E4, Canada
  - Vector Institute for Artificial Intelligence, 661 University Ave. Suite 710, Toronto, ON, M5G 1M1, Canada
  - Department of Materials Science & Engineering, University of Toronto, 184 College St., Toronto, ON, M5S 3E4, Canada
  - Department of Chemical Engineering & Applied Chemistry, University of Toronto, 200 College St., Toronto, ON, M5S 3E5, Canada
  - Lebovic Fellow, Canadian Institute for Advanced Research (CIFAR), 66118 University Ave., Toronto, M5G 1M1, Canada
  - Acceleration Consortium, 80 St George St, Toronto, M5S 3H6, Canada
7
Bassani CL, van Anders G, Banin U, Baranov D, Chen Q, Dijkstra M, Dimitriyev MS, Efrati E, Faraudo J, Gang O, Gaston N, Golestanian R, Guerrero-Garcia GI, Gruenwald M, Haji-Akbari A, Ibáñez M, Karg M, Kraus T, Lee B, Van Lehn RC, Macfarlane RJ, Mognetti BM, Nikoubashman A, Osat S, Prezhdo OV, Rotskoff GM, Saiz L, Shi AC, Skrabalak S, Smalyukh II, Tagliazucchi M, Talapin DV, Tkachenko AV, Tretiak S, Vaknin D, Widmer-Cooper A, Wong GCL, Ye X, Zhou S, Rabani E, Engel M, Travesset A. Nanocrystal Assemblies: Current Advances and Open Problems. ACS NANO 2024; 18:14791-14840. [PMID: 38814908 DOI: 10.1021/acsnano.3c10201] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2024]
Abstract
We explore the potential of nanocrystals (a term used equivalently to nanoparticles) as building blocks for nanomaterials, and the current advances and open challenges for fundamental science developments and applications. Nanocrystal assemblies are inherently multiscale, and the generation of revolutionary material properties requires a precise understanding of the relationship between structure and function, the former being determined by classical effects and the latter often by quantum effects. With an emphasis on theory and computation, we discuss challenges that hamper current assembly strategies and to what extent nanocrystal assemblies represent thermodynamic equilibrium or kinetically trapped metastable states. We also examine dynamic effects and optimization of assembly protocols. Finally, we discuss promising material functions and examples of their realization with nanocrystal assemblies.
Affiliation(s)
- Carlos L Bassani
  - Institute for Multiscale Simulation, Friedrich-Alexander-Universität Erlangen-Nürnberg, 91058 Erlangen, Germany
- Greg van Anders
  - Department of Physics, Engineering Physics, and Astronomy, Queen's University, Kingston, Ontario K7L 3N6, Canada
- Uri Banin
  - Institute of Chemistry and the Center for Nanoscience and Nanotechnology, The Hebrew University of Jerusalem, Jerusalem 91904, Israel
- Dmitry Baranov
  - Division of Chemical Physics, Department of Chemistry, Lund University, SE-221 00 Lund, Sweden
- Qian Chen
  - University of Illinois, Urbana, Illinois 61801, USA
- Marjolein Dijkstra
  - Soft Condensed Matter & Biophysics, Debye Institute for Nanomaterials Science, Utrecht University, 3584 CC Utrecht, The Netherlands
- Michael S Dimitriyev
  - Department of Polymer Science and Engineering, University of Massachusetts, Amherst, Massachusetts 01003, USA
  - Department of Materials Science and Engineering, Texas A&M University, College Station, Texas 77843, USA
- Efi Efrati
  - Department of Physics of Complex Systems, Weizmann Institute of Science, Rehovot 76100, Israel
  - James Franck Institute, The University of Chicago, Chicago, Illinois 60637, USA
- Jordi Faraudo
  - Institut de Ciencia de Materials de Barcelona (ICMAB-CSIC), Campus de la UAB, E-08193 Bellaterra, Barcelona, Spain
- Oleg Gang
  - Department of Chemical Engineering, Columbia University, New York, New York 10027, USA
  - Department of Applied Physics and Applied Mathematics, Columbia University, New York, New York 10027, USA
  - Center for Functional Nanomaterials, Brookhaven National Laboratory, Upton, New York 11973, USA
- Nicola Gaston
  - The MacDiarmid Institute for Advanced Materials and Nanotechnology, Department of Physics, The University of Auckland, Auckland 1142, New Zealand
- Ramin Golestanian
  - Max Planck Institute for Dynamics and Self-Organization (MPI-DS), 37077 Göttingen, Germany
  - Rudolf Peierls Centre for Theoretical Physics, University of Oxford, Oxford OX1 3PU, UK
- G Ivan Guerrero-Garcia
  - Facultad de Ciencias de la Universidad Autónoma de San Luis Potosí, 78295 San Luis Potosí, México
- Michael Gruenwald
  - Department of Chemistry, University of Utah, Salt Lake City, Utah 84112, USA
- Amir Haji-Akbari
  - Department of Chemical and Environmental Engineering, Yale University, New Haven, Connecticut 06511, USA
- Maria Ibáñez
  - Institute of Science and Technology Austria (ISTA), 3400 Klosterneuburg, Austria
- Matthias Karg
  - Heinrich-Heine-Universität Düsseldorf, 40225 Düsseldorf, Germany
- Tobias Kraus
  - INM - Leibniz-Institute for New Materials, 66123 Saarbrücken, Germany
  - Saarland University, Colloid and Interface Chemistry, 66123 Saarbrücken, Germany
- Byeongdu Lee
  - X-ray Science Division, Argonne National Laboratory, Lemont, Illinois 60439, USA
- Reid C Van Lehn
  - Department of Chemical and Biological Engineering, University of Wisconsin-Madison, Madison, Wisconsin 53717, USA
- Robert J Macfarlane
  - Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02142, USA
- Bortolo M Mognetti
  - Center for Nonlinear Phenomena and Complex Systems, Université Libre de Bruxelles, 1050 Brussels, Belgium
- Arash Nikoubashman
  - Leibniz-Institut für Polymerforschung Dresden e.V., 01069 Dresden, Germany
  - Institut für Theoretische Physik, Technische Universität Dresden, 01069 Dresden, Germany
- Saeed Osat
  - Max Planck Institute for Dynamics and Self-Organization (MPI-DS), 37077 Göttingen, Germany
- Oleg V Prezhdo
  - Department of Chemistry, University of Southern California, Los Angeles, CA 90089, USA
  - Department of Physics and Astronomy, University of Southern California, Los Angeles, California 90089, USA
- Grant M Rotskoff
  - Department of Chemistry, Stanford University, Stanford, California 94305, USA
- Leonor Saiz
  - Department of Biomedical Engineering, University of California, Davis, California 95616, USA
- An-Chang Shi
  - Department of Physics & Astronomy, McMaster University, Hamilton, Ontario L8S 4M1, Canada
- Sara Skrabalak
  - Department of Chemistry, Indiana University, Bloomington, Indiana 47405, USA
- Ivan I Smalyukh
  - Department of Physics and Chemical Physics Program, University of Colorado, Boulder, Colorado 80309, USA
  - International Institute for Sustainability with Knotted Chiral Meta Matter, Hiroshima University, Higashi-Hiroshima City 739-0046, Japan
- Mario Tagliazucchi
  - Universidad de Buenos Aires, Ciudad Universitaria, C1428EHA Ciudad Autónoma de Buenos Aires, Buenos Aires 1428 Argentina
- Dmitri V Talapin
  - Department of Chemistry, James Franck Institute and Pritzker School of Molecular Engineering, The University of Chicago, Chicago, Illinois 60637, USA
  - Center for Nanoscale Materials, Argonne National Laboratory, Argonne, IL 60439, USA
- Alexei V Tkachenko
  - Center for Functional Nanomaterials, Brookhaven National Laboratory, Upton, New York 11973, USA
- Sergei Tretiak
  - Theoretical Division and Center for Integrated Nanotechnologies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
- David Vaknin
  - Iowa State University and Ames Lab, Ames, Iowa 50011, USA
- Asaph Widmer-Cooper
  - ARC Centre of Excellence in Exciton Science, School of Chemistry, University of Sydney, Sydney, New South Wales 2006, Australia
  - The University of Sydney Nano Institute, University of Sydney, Sydney, New South Wales 2006, Australia
- Gerard C L Wong
  - Department of Bioengineering, University of California, Los Angeles, California 90095, USA
  - Department of Chemistry and Biochemistry, University of California, Los Angeles, California 90095, USA
  - Department of Microbiology, Immunology & Molecular Genetics, University of California, Los Angeles, CA 90095, USA
  - California NanoSystems Institute, University of California, Los Angeles, CA 90095, USA
- Xingchen Ye
  - Department of Chemistry, Indiana University, Bloomington, Indiana 47405, USA
- Shan Zhou
  - Department of Nanoscience and Biomedical Engineering, South Dakota School of Mines and Technology, Rapid City, South Dakota 57701, USA
- Eran Rabani
  - Department of Chemistry, University of California and Materials Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA
  - The Raymond and Beverly Sackler Center of Computational Molecular and Materials Science, Tel Aviv University, Tel Aviv 69978, Israel
- Michael Engel
  - Institute for Multiscale Simulation, Friedrich-Alexander-Universität Erlangen-Nürnberg, 91058 Erlangen, Germany
- Alex Travesset
  - Iowa State University and Ames Lab, Ames, Iowa 50011, USA
8
Yang Y, Zhang S, Ranasinghe KD, Isayev O, Roitberg AE. Machine Learning of Reactive Potentials. Annu Rev Phys Chem 2024; 75:371-395. [PMID: 38941524 DOI: 10.1146/annurev-physchem-062123-024417] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/30/2024]
Abstract
In the past two decades, machine learning potentials (MLPs) have driven significant developments in chemical, biological, and material sciences. The construction and training of MLPs enable fast and accurate simulations and analysis of thermodynamic and kinetic properties. This review focuses on the application of MLPs to reaction systems with consideration of bond breaking and formation. We review the development of MLP models, primarily with neural network and kernel-based algorithms, and recent applications of reactive MLPs (RMLPs) to systems at different scales. We show how RMLPs are constructed, how they speed up the calculation of reactive dynamics, and how they facilitate the study of reaction trajectories, reaction rates, free energy calculations, and many other calculations. Different data sampling strategies applied in building RMLPs are also discussed with a focus on how to collect structures for rare events and how to further improve their performance with active learning.
Affiliation(s)
- Yinuo Yang
  - Department of Chemistry, University of Florida, Gainesville, Florida
- Shuhao Zhang
  - Department of Chemistry, Carnegie Mellon University, Pittsburgh, Pennsylvania
- Olexandr Isayev
  - Department of Chemistry, Carnegie Mellon University, Pittsburgh, Pennsylvania
- Adrian E Roitberg
  - Department of Chemistry, University of Florida, Gainesville, Florida
9
Guibourg P, Dontot L, Anglade PM, Gervais B. DFTB Simulation of Charged Clusters Using Machine Learning Charge Inference. J Chem Theory Comput 2024; 20:4007-4018. [PMID: 38690586 DOI: 10.1021/acs.jctc.4c00107] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/02/2024]
Abstract
We present a modification to self-consistent charge density functional-based tight binding (SCC-DFTB), which allows computation based on approximate atomic charges. We obtain these charges by means of a machine learning (ML) process that combines a Coulomb model with a neural network. This allows us to avoid the SCC cycles in the SCC-DFTB calculation while maintaining its accuracy. The main input of the model is the atomic positions characterized by a set of atom-centered symmetry functions. The charge inference from our ML algorithm is as close as 10⁻² units of charge from the exact SCC solution. Our ML-DFTB approach provides a good approximation of the density matrix and of the energy and forces with only a single diagonalization. This is a significant computational saving with respect to the complete SCC algorithm, which allows us to investigate a bigger ensemble of atoms. We show the quality of our approach in the case of charged silicon carbide (SiC) clusters. The ML-DFTB potential energy surface (PES) mimics the SCC-DFTB PES rather well, despite its simplicity. This allows us to obtain the same geometric structure ordering with respect to energy for small clusters. The dissociation barriers for ion emission are well-reproduced, which opens the way to investigating ion field emission and charged cluster stability. The ML-DFTB approach is obviously not limited to charged clusters or SiC materials. It opens a new route to investigate larger clusters than those investigated by standard SCC-DFTB, as well as surface and solid-state chemistry at the atomic level.
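As an illustration of the atom-centered symmetry functions named in this abstract, the following is a minimal sketch of one radial descriptor. The abstract does not give the functional form or parameters the authors used, so the Gaussian-times-cosine-cutoff form below (a commonly used radial symmetry function) is an assumption, and the function names, the three-atom geometry, and the values of `eta`, `r_s`, and `r_c` are hypothetical.

```python
import math


def cutoff(r, r_c):
    """Cosine cutoff: equals 1 at r = 0 and falls smoothly to 0 at r = r_c."""
    if r >= r_c:
        return 0.0
    return 0.5 * (math.cos(math.pi * r / r_c) + 1.0)


def radial_symmetry_function(positions, i, eta, r_s, r_c):
    """Radial descriptor of atom i: sum over all other atoms of a Gaussian
    in the interatomic distance, damped by the cutoff function."""
    total = 0.0
    for j, pos_j in enumerate(positions):
        if j == i:
            continue
        r_ij = math.dist(positions[i], pos_j)
        total += math.exp(-eta * (r_ij - r_s) ** 2) * cutoff(r_ij, r_c)
    return total


# Hypothetical three-atom geometry in arbitrary length units (not from the paper).
positions = [(0.0, 0.0, 0.0), (1.8, 0.0, 0.0), (0.0, 1.8, 0.0)]
g_values = [
    radial_symmetry_function(positions, i, eta=1.0, r_s=1.8, r_c=5.0)
    for i in range(len(positions))
]
```

Because the descriptor depends only on interatomic distances, it is unchanged by translating or rotating the whole cluster, which is the property that makes such functions usable as neural network inputs.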
Affiliation(s)
- Paul Guibourg
  - Laboratoire Cimap, UMR6252, Université de Caen Normandie, École Nationale Supérieure d'Ingénieures de Caen, Commissariat à l'Énergie Atomique, Centre National de la Recherche Scientifique, 6 Boulevard Du Maréchal Juin, 14050 Caen Cedex, France
- Léo Dontot
  - Laboratoire Cimap, UMR6252, Université de Caen Normandie, École Nationale Supérieure d'Ingénieures de Caen, Commissariat à l'Énergie Atomique, Centre National de la Recherche Scientifique, 6 Boulevard Du Maréchal Juin, 14050 Caen Cedex, France
- Pierre-Matthieu Anglade
  - Laboratoire Cimap, UMR6252, Université de Caen Normandie, École Nationale Supérieure d'Ingénieures de Caen, Commissariat à l'Énergie Atomique, Centre National de la Recherche Scientifique, 6 Boulevard Du Maréchal Juin, 14050 Caen Cedex, France
- Benoit Gervais
  - Laboratoire Cimap, UMR6252, Université de Caen Normandie, École Nationale Supérieure d'Ingénieures de Caen, Commissariat à l'Énergie Atomique, Centre National de la Recherche Scientifique, 6 Boulevard Du Maréchal Juin, 14050 Caen Cedex, France
10
Riedmiller K, Reiser P, Bobkova E, Maltsev K, Gryn'ova G, Friederich P, Gräter F. Substituting density functional theory in reaction barrier calculations for hydrogen atom transfer in proteins. Chem Sci 2024; 15:2518-2527. [PMID: 38362411 PMCID: PMC10866341 DOI: 10.1039/d3sc03922f] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2023] [Accepted: 01/10/2024] [Indexed: 02/17/2024] Open
Abstract
Hydrogen atom transfer (HAT) reactions are important in many biological systems. As these reactions are hard to observe experimentally, it is of high interest to shed light on them using simulations. Here, we present a machine learning model based on graph neural networks for the prediction of energy barriers of HAT reactions in proteins. As input, the model uses exclusively non-optimized structures as obtained from classical simulations. It was trained on more than 17 000 energy barriers calculated using hybrid density functional theory. We built and evaluated the model in the context of HAT in collagen, but we show that the same workflow can easily be applied to HAT reactions in other biological or synthetic polymers. We obtain for relevant reactions (small reaction distances) a model with good predictive power (R² ∼ 0.9 and mean absolute error of <3 kcal mol⁻¹). As the inference speed is high, this model enables evaluations of dozens of chemical situations within seconds. When combined with molecular dynamics in a kinetic Monte-Carlo scheme, the model paves the way toward reactive simulations.
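The two accuracy figures quoted in this abstract can be made concrete with a small sketch. It assumes that R² denotes the coefficient of determination (the abstract does not say which definition was used), and the barrier values are made-up numbers for illustration only, not data from the paper.

```python
def mean_absolute_error(reference, predicted):
    """Average absolute deviation between reference and predicted values."""
    return sum(abs(r - p) for r, p in zip(reference, predicted)) / len(reference)


def r_squared(reference, predicted):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_ref = sum(reference) / len(reference)
    ss_res = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    return 1.0 - ss_res / ss_tot


# Hypothetical energy barriers in kcal/mol (illustrative values only).
dft_barriers = [10.0, 20.0, 30.0, 40.0]
model_barriers = [12.0, 19.0, 33.0, 38.0]

mae = mean_absolute_error(dft_barriers, model_barriers)
r2 = r_squared(dft_barriers, model_barriers)
```

The mean absolute error carries the units of the barriers, whereas R² is dimensionless and depends on how widely the reference barriers are spread, so the two numbers are best read together.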
Affiliation(s)
- Kai Riedmiller
  - Heidelberg Institute for Theoretical Studies Heidelberg Germany
- Patrick Reiser
  - Institute of Theoretical Informatics, Karlsruhe Institute of Technology Engler-Bunte-Ring 8 Karlsruhe 76131 Germany
  - Institute of Nanotechnology, Karlsruhe Institute of Technology Hermann-von-Helmholtz-Platz 1: 76344 Eggenstein-Leopoldshafen Germany
- Kiril Maltsev
  - Heidelberg Institute for Theoretical Studies Heidelberg Germany
- Ganna Gryn'ova
  - Heidelberg Institute for Theoretical Studies Heidelberg Germany
  - Interdisciplinary Center for Scientific Computing, Heidelberg University Heidelberg Germany
- Pascal Friederich
  - Institute of Theoretical Informatics, Karlsruhe Institute of Technology Engler-Bunte-Ring 8 Karlsruhe 76131 Germany
  - Institute of Nanotechnology, Karlsruhe Institute of Technology Hermann-von-Helmholtz-Platz 1: 76344 Eggenstein-Leopoldshafen Germany
- Frauke Gräter
  - Heidelberg Institute for Theoretical Studies Heidelberg Germany
  - Interdisciplinary Center for Scientific Computing, Heidelberg University Heidelberg Germany
11
Briling K, Calvino Alonso Y, Fabrizio A, Corminboeuf C. SPAHM(a,b): Encoding the Density Information from Guess Hamiltonian in Quantum Machine Learning Representations. J Chem Theory Comput 2024; 20:1108-1117. [PMID: 38227222 PMCID: PMC10867806 DOI: 10.1021/acs.jctc.3c01040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Revised: 12/20/2023] [Accepted: 12/26/2023] [Indexed: 01/17/2024]
Abstract
Recently, we introduced a class of molecular representations for kernel-based regression methods, the spectrum of approximated Hamiltonian matrices (SPAHM), that takes advantage of lightweight one-electron Hamiltonians traditionally used as a self-consistent field initial guess. The original SPAHM variant is built from occupied-orbital energies (i.e., eigenvalues) and naturally contains all of the information about nuclear charges, atomic positions, and symmetry requirements. Its advantages were demonstrated on data sets featuring a wide variation of charge and spin, for which traditional structure-based representations commonly fail. SPAHM(a,b), as introduced here, expand the eigenvalue SPAHM into local and transferable representations. They rely upon one-electron density matrices to build fingerprints from atomic and bond density overlap contributions inspired from preceding state-of-the-art representations. The performance and efficiency of SPAHM(a,b) is assessed on the predictions for data sets of prototypical organic molecules (QM7) of different charges and azoheteroarene dyes in an excited state. Overall, both SPAHM(a) and SPAHM(b) outperform state-of-the-art representations on difficult prediction tasks such as the atomic properties of charged open-shell species and of π-conjugated systems.
Affiliation(s)
- Ksenia R. Briling
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
| | - Yannick Calvino Alonso
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
| | - Alberto Fabrizio
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- National Centre for Computational Design and Discovery of Novel Materials (MARVEL), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
| | - Clemence Corminboeuf
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- National Centre for Computational Design and Discovery of Novel Materials (MARVEL), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
| |
|
12
|
Vennelakanti V, Kilic IB, Terrones GG, Duan C, Kulik HJ. Machine Learning Prediction of the Experimental Transition Temperature of Fe(II) Spin-Crossover Complexes. J Phys Chem A 2024; 128:204-216. [PMID: 38148525 DOI: 10.1021/acs.jpca.3c07104] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/28/2023]
Abstract
Spin-crossover (SCO) complexes are materials that exhibit changes in the spin state in response to external stimuli, with potential applications in molecular electronics. It is challenging to know a priori how to design ligands to achieve the delicate balance of entropic and enthalpic contributions needed to tailor a transition temperature close to room temperature. We leverage the SCO complexes from the previously curated SCO-95 data set [Vennelakanti et al. J. Chem. Phys. 159, 024120 (2023)] to train three machine learning (ML) models for transition temperature (T1/2) prediction using graph-based revised autocorrelations as features. We perform feature selection using random forest-ranked recursive feature addition (RF-RFA) to identify the features essential to model transferability. Of the ML models considered, the full feature set RF and recursive feature addition RF models perform best, achieving moderate correlation to experimental T1/2 values. We then compare ML T1/2 predictions to those from three previously identified best-performing density functional approximations (DFAs) which accurately predict SCO behavior across SCO-95, finding that the ML models predict T1/2 more accurately than the best-performing DFAs. In addition, we study ML model predictions for a set of 18 SCO complexes for which only estimated T1/2 values are available. Upon excluding outliers from this set, the RF-RFA RF model shows a strong correlation to estimated T1/2 values with a Pearson's r of 0.82. In contrast, DFA-predicted T1/2 values have large errors and show no correlation to estimated T1/2 values over the same set of complexes. Overall, our study demonstrates slightly superior performance of ML models in comparison with some of the best-performing DFAs, and we expect ML models to improve further as larger data sets of SCO complexes are curated and become available for model training.
Affiliation(s)
- Vyshnavi Vennelakanti
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Irem B Kilic
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Gianmarco G Terrones
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Chenru Duan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Heather J Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| |
|
13
|
Zhao Q, Anstine DM, Isayev O, Savoie BM. Δ² machine learning for reaction property prediction. Chem Sci 2023; 14:13392-13401. [PMID: 38033903 PMCID: PMC10686042 DOI: 10.1039/d3sc02408c] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2023] [Accepted: 07/11/2023] [Indexed: 12/02/2023] Open
Abstract
The emergence of Δ-learning models, whereby machine learning (ML) is used to predict a correction to a low-level energy calculation, provides a versatile route to accelerate high-level energy evaluations at a given geometry. However, Δ-learning models are inapplicable to reaction properties like heats of reaction and activation energies that require both a high-level geometry and energy evaluation. Here, a Δ²-learning model is introduced that can predict high-level activation energies based on low-level critical-point geometries. The Δ² model uses an atom-wise featurization typical of contemporary ML interatomic potentials (MLIPs) and is trained on a dataset of ∼167 000 reactions, using the GFN2-xTB energy and critical-point geometry as a low-level input and the B3LYP-D3/TZVP energy calculated at the B3LYP-D3/TZVP critical point as a high-level target. The excellent performance of the Δ² model on unseen reactions demonstrates the surprising ease with which the model implicitly learns the geometric deviations between the low-level and high-level geometries that condition the activation energy prediction. The transferability of the Δ² model is validated on several external testing sets where it shows near chemical accuracy, illustrating the benefits of combining ML models with readily available physical-based information from semi-empirical quantum chemistry calculations. Fine-tuning of the Δ² model on a small number of Gaussian-4 calculations produced a 35% accuracy improvement over DFT activation energy predictions while retaining xTB-level cost. The Δ² model approach proves to be an efficient strategy for accelerating chemical reaction characterization with minimal sacrifice in prediction accuracy.
Affiliation(s)
- Qiyuan Zhao
- Davidson School of Chemical Engineering, Purdue University West Lafayette IN 47906 USA
| | - Dylan M Anstine
- Department of Chemistry, Carnegie Mellon University Pittsburgh PA 15213 USA
| | - Olexandr Isayev
- Department of Chemistry, Carnegie Mellon University Pittsburgh PA 15213 USA
| | - Brett M Savoie
- Davidson School of Chemical Engineering, Purdue University West Lafayette IN 47906 USA
| |
|
14
|
Wu S, Yang X, Zhao X, Li Z, Lu M, Xie X, Yan J. Applications and Advances in Machine Learning Force Fields. J Chem Inf Model 2023; 63:6972-6985. [PMID: 37751546 DOI: 10.1021/acs.jcim.3c00889] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/28/2023]
Abstract
Force fields (FFs) form the basis of molecular simulations and have significant implications in diverse fields such as materials science, chemistry, physics, and biology. A suitable FF is required to accurately describe system properties. However, an off-the-shelf FF may not be suitable for certain specialized systems, and researchers often need to tailor the FF that fits specific requirements. Before applying machine learning (ML) techniques to construct FFs, the mainstream FFs were primarily based on first-principles force fields (FPFF) and empirical FFs. However, the drawbacks of FPFF and empirical FFs are high cost and low accuracy, respectively, so there is a growing interest in using ML as an effective and precise tool for reconciling this trade-off in developing FFs. In this review, we introduce the fundamental principles of ML and FFs in the context of machine learning force fields (MLFF). We also discuss the advantages and applications of MLFF compared to traditional FFs, as well as the MLFF toolkits widely employed in numerous applications.
Affiliation(s)
- Shiru Wu
- Key Laboratory of Flexible Electronics (KLOFE) & Institute of Advanced Materials (IAM), Nanjing Tech University (Nanjing Tech), Nanjing 211816, P. R. China
| | - Xiaowei Yang
- Key Laboratory of Flexible Electronics (KLOFE) & Institute of Advanced Materials (IAM), Nanjing Tech University (Nanjing Tech), Nanjing 211816, P. R. China
| | - Xun Zhao
- Key Laboratory of Flexible Electronics (KLOFE) & Institute of Advanced Materials (IAM), Nanjing Tech University (Nanjing Tech), Nanjing 211816, P. R. China
| | - Zhipu Li
- Key Laboratory of Flexible Electronics (KLOFE) & Institute of Advanced Materials (IAM), Nanjing Tech University (Nanjing Tech), Nanjing 211816, P. R. China
| | - Min Lu
- Key Laboratory of Flexible Electronics (KLOFE) & Institute of Advanced Materials (IAM), Nanjing Tech University (Nanjing Tech), Nanjing 211816, P. R. China
| | - Xiaoji Xie
- Key Laboratory of Flexible Electronics (KLOFE) & Institute of Advanced Materials (IAM), Nanjing Tech University (Nanjing Tech), Nanjing 211816, P. R. China
| | - Jiaxu Yan
- Key Laboratory of Flexible Electronics (KLOFE) & Institute of Advanced Materials (IAM), Nanjing Tech University (Nanjing Tech), Nanjing 211816, P. R. China
- Changchun Institute of Optics, Fine Mechanics & Physics (CIOMP), Chinese Academy of Sciences, Changchun 130033, P. R. China
- University of Chinese Academy of Sciences, Chinese Academy of Sciences, Beijing 100049, P. R. China
| |
|
15
|
Plé T, Lagardère L, Piquemal JP. Force-field-enhanced neural network interactions: from local equivariant embedding to atom-in-molecule properties and long-range effects. Chem Sci 2023; 14:12554-12569. [PMID: 38020379 PMCID: PMC10646944 DOI: 10.1039/d3sc02581k] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 05/22/2023] [Accepted: 10/03/2023] [Indexed: 12/01/2023] Open
Abstract
We introduce FENNIX (Force-Field-Enhanced Neural Network InteraXions), a hybrid approach between machine-learning and force-fields. We leverage state-of-the-art equivariant neural networks to predict local energy contributions and multiple atom-in-molecule properties that are then used as geometry-dependent parameters for physically-motivated energy terms which account for long-range electrostatics and dispersion. Using high-accuracy ab initio data (small organic molecules/dimers), we trained a first version of the model. Exhibiting accurate gas-phase energy predictions, FENNIX is transferable to the condensed phase. It is able to produce stable Molecular Dynamics simulations, including nuclear quantum effects, for water predicting accurate liquid properties. The extrapolating power of the hybrid physically-driven machine learning FENNIX approach is exemplified by computing: (i) the solvated alanine dipeptide free energy landscape; (ii) the reactive dissociation of small molecules.
Affiliation(s)
- Thomas Plé
- Sorbonne Université, LCT, UMR 7616 CNRS F-75005 Paris France thomas.ple@sorbonne-université louis.lagardere@sorbonne-université jean-philip.piquemal@sorbonne-université
| | - Louis Lagardère
- Sorbonne Université, LCT, UMR 7616 CNRS F-75005 Paris France thomas.ple@sorbonne-université louis.lagardere@sorbonne-université jean-philip.piquemal@sorbonne-université
| | - Jean-Philip Piquemal
- Sorbonne Université, LCT, UMR 7616 CNRS F-75005 Paris France thomas.ple@sorbonne-université louis.lagardere@sorbonne-université jean-philip.piquemal@sorbonne-université
| |
|
16
|
Liu Z, Moroz YS, Isayev O. The challenge of balancing model sensitivity and robustness in predicting yields: a benchmarking study of amide coupling reactions. Chem Sci 2023; 14:10835-10846. [PMID: 37829036 PMCID: PMC10566507 DOI: 10.1039/d3sc03902a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2023] [Accepted: 09/12/2023] [Indexed: 10/14/2023] Open
Abstract
Accurate prediction of reaction yield is the holy grail for computer-assisted synthesis prediction, but current models have failed to generalize to large literature datasets. To understand the causes and inspire future design, we systematically benchmarked the yield prediction task. We carefully curated and augmented a literature dataset of 41 239 amide coupling reactions, each with information on reactants, products, intermediates, yields, and reaction contexts, and provided 3D structures for the molecules. We calculated molecular features related to 2D and 3D structure information, as well as physical and electronic properties. These descriptors were paired with 4 categories of machine learning methods (linear, kernel, ensemble, and neural network), yielding valuable benchmarks about feature and model performance. Despite the excellent performance on a high-throughput experiment (HTE) dataset (R² around 0.9), no method gave satisfactory results on the literature data. The best performance was an R² of 0.395 ± 0.020 using the stack technique. Error analysis revealed that reactivity cliff and yield uncertainty are among the main reasons for incorrect predictions. Removing reactivity cliffs and uncertain reactions boosted the R² to 0.457 ± 0.006. These results highlight that yield prediction models must be sensitive to the reactivity change due to the subtle structure variance, as well as be robust to the uncertainty associated with yield measurements.
Affiliation(s)
- Zhen Liu
- Department of Chemistry, Mellon College of Science, Carnegie Mellon University Pittsburgh PA 15213 USA
| | - Yurii S Moroz
- Enamine Ltd Kyïv 02660 Ukraine
- Chemspace LLC Kyïv 02094 Ukraine
- Taras Shevchenko National University of Kyïv Kyïv 01601 Ukraine
| | - Olexandr Isayev
- Department of Chemistry, Mellon College of Science, Carnegie Mellon University Pittsburgh PA 15213 USA
| |
|
17
|
Fedik N, Nebgen B, Lubbers N, Barros K, Kulichenko M, Li YW, Zubatyuk R, Messerly R, Isayev O, Tretiak S. Synergy of semiempirical models and machine learning in computational chemistry. J Chem Phys 2023; 159:110901. [PMID: 37712780 DOI: 10.1063/5.0151833] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2023] [Accepted: 07/11/2023] [Indexed: 09/16/2023] Open
Abstract
Catalyzed by enormous success in the industrial sector, many research programs have been exploring data-driven, machine learning approaches. Performance can be poor when the model is extrapolated to new regions of chemical space, e.g., new bonding types, new many-body interactions. Another important limitation is the spatial locality assumption in model architecture, and this limitation cannot be overcome with larger or more diverse datasets. The outlined challenges are primarily associated with the lack of electronic structure information in surrogate models such as interatomic potentials. Given the fast development of machine learning and computational chemistry methods, we expect some limitations of surrogate models to be addressed in the near future; nevertheless, the spatial locality assumption will likely remain a limiting factor for their transferability. Here, we suggest focusing on an equally important effort: the design of physics-informed models that leverage the domain knowledge and employ machine learning only as a corrective tool. In the context of material science, we will focus on semi-empirical quantum mechanics, using machine learning to predict corrections to the reduced-order Hamiltonian model parameters. The resulting models are broadly applicable, retain the speed of semiempirical chemistry, and frequently achieve accuracy on par with much more expensive ab initio calculations. These early results indicate that future work, in which machine learning and quantum chemistry methods are developed jointly, may provide the best of all worlds for chemistry applications that demand both high accuracy and high numerical efficiency.
Affiliation(s)
- Nikita Fedik
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
- Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
| | - Benjamin Nebgen
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
| | - Nicholas Lubbers
- Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
| | - Kipton Barros
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
- Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
| | - Maksim Kulichenko
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
| | - Ying Wai Li
- Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
| | - Roman Zubatyuk
- Department of Chemistry, Mellon College of Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA
| | - Richard Messerly
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
| | - Olexandr Isayev
- Department of Chemistry, Mellon College of Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA
| | - Sergei Tretiak
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
- Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
- Center for Integrated Nanotechnologies Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
| |
|
18
|
Chen K, Kunkel C, Cheng B, Reuter K, Margraf JT. Physics-inspired machine learning of localized intensive properties. Chem Sci 2023; 14:4913-4922. [PMID: 37181767 PMCID: PMC10171074 DOI: 10.1039/d3sc00841j] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2023] [Accepted: 04/10/2023] [Indexed: 05/16/2023] Open
Abstract
Machine learning (ML) has been widely applied to chemical property prediction, most prominently for the energies and forces in molecules and materials. The strong interest in predicting energies in particular has led to a 'local energy'-based paradigm for modern atomistic ML models, which ensures size-extensivity and a linear scaling of computational cost with system size. However, many electronic properties (such as excitation energies or ionization energies) do not necessarily scale linearly with system size and may even be spatially localized. Using size-extensive models in these cases can lead to large errors. In this work, we explore different strategies for learning intensive and localized properties, using HOMO energies in organic molecules as a representative test case. In particular, we analyze the pooling functions that atomistic neural networks use to predict molecular properties, and suggest an orbital weighted average (OWA) approach that enables the accurate prediction of orbital energies and locations.
Affiliation(s)
- Ke Chen
- Fritz-Haber-Institut der Max-Planck-Gesellschaft Faradayweg 4-6 D-14195 Berlin Germany
- Chair for Theoretical Chemistry and Catalysis Research Center, Technische Universität München Lichtenbergstraße 4 D-85747 Garching Germany
- Institute of Science and Technology Am Campus 1 3400 Klosterneuburg Austria
| | - Christian Kunkel
- Fritz-Haber-Institut der Max-Planck-Gesellschaft Faradayweg 4-6 D-14195 Berlin Germany
| | - Bingqing Cheng
- Institute of Science and Technology Am Campus 1 3400 Klosterneuburg Austria
| | - Karsten Reuter
- Fritz-Haber-Institut der Max-Planck-Gesellschaft Faradayweg 4-6 D-14195 Berlin Germany
- Chair for Theoretical Chemistry and Catalysis Research Center, Technische Universität München Lichtenbergstraße 4 D-85747 Garching Germany
| | - Johannes T Margraf
- Fritz-Haber-Institut der Max-Planck-Gesellschaft Faradayweg 4-6 D-14195 Berlin Germany
| |
|
19
|
Bhat V, Callaway CP, Risko C. Computational Approaches for Organic Semiconductors: From Chemical and Physical Understanding to Predicting New Materials. Chem Rev 2023. [PMID: 37141497 DOI: 10.1021/acs.chemrev.2c00704] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/06/2023]
Abstract
While a complete understanding of organic semiconductor (OSC) design principles remains elusive, computational methods, ranging from techniques based in classical and quantum mechanics to more recent data-enabled models, can complement experimental observations and provide deep physicochemical insights into OSC structure-processing-property relationships, offering new capabilities for in silico OSC discovery and design. In this Review, we trace the evolution of these computational methods and their application to OSCs, beginning with early quantum-chemical methods to investigate resonance in benzene and building to recent machine-learning (ML) techniques and their application to ever more sophisticated OSC scientific and engineering challenges. Along the way, we highlight the limitations of the methods and how sophisticated physical and mathematical frameworks have been created to overcome those limitations. We illustrate applications of these methods to a range of specific challenges in OSCs derived from π-conjugated polymers and molecules, including predicting charge-carrier transport, modeling chain conformations and bulk morphology, estimating thermomechanical properties, and describing phonons and thermal transport, to name a few. Through these examples, we demonstrate how advances in computational methods accelerate the deployment of OSCs in wide-ranging technologies, such as organic photovoltaics (OPVs), organic light-emitting diodes (OLEDs), organic thermoelectrics, organic batteries, and organic (bio)sensors. We conclude by providing an outlook for the future development of computational techniques to discover and assess the properties of high-performing OSCs with greater accuracy.
Affiliation(s)
- Vinayak Bhat
- Department of Chemistry & Center for Applied Energy Research, University of Kentucky, Lexington, Kentucky 40506-0055, United States
| | - Connor P Callaway
- Department of Chemistry & Center for Applied Energy Research, University of Kentucky, Lexington, Kentucky 40506-0055, United States
| | - Chad Risko
- Department of Chemistry & Center for Applied Energy Research, University of Kentucky, Lexington, Kentucky 40506-0055, United States
| |
|
20
|
Li CH, Tabor DP. Reorganization Energy Predictions with Graph Neural Networks Informed by Low-Cost Conformers. J Phys Chem A 2023; 127:3484-3489. [PMID: 37017992 PMCID: PMC10848248 DOI: 10.1021/acs.jpca.2c09030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/26/2022] [Revised: 02/21/2023] [Indexed: 04/06/2023]
Abstract
A critical bottleneck for the design of high-conductivity organic materials is finding molecules with low reorganization energy. To enable high-throughput virtual screening campaigns for many types of organic electronic materials, a fast reorganization energy prediction method compared to density functional theory is needed. However, the development of low-cost machine-learning-based models for calculating the reorganization energy has proven to be challenging. In this paper, we combine a 3D graph-based neural network (GNN) recently benchmarked for drug design applications, ChIRo, with low-cost conformational features for reorganization energy predictions. By comparing the performance of ChIRo to another 3D GNN, SchNet, we find evidence that the bond-invariant property of ChIRo enables the model to learn from low-cost conformational features more efficiently. Through an ablation study with a 2D GNN, we find that using low-cost conformational features on top of 2D features informs the model for making more accurate predictions. Our results demonstrate the feasibility of reorganization energy predictions on the benchmark QM9 data set without needing DFT-optimized geometries and demonstrate the types of features needed for robust models that work on diverse chemical spaces. Furthermore, we show that ChIRo informed with low-cost conformational features achieves comparable performance with the previously reported structure-based model on π-conjugated hydrocarbon molecules. We expect this class of methods can be applied to the high-throughput screening of high-conductivity organic electronics candidates.
Affiliation(s)
- Cheng-Han Li
- Department of Chemistry, Texas A&M University, College Station, Texas 77843, United States
| | - Daniel P. Tabor
- Department of Chemistry, Texas A&M University, College Station, Texas 77843, United States
| |
|
21
|
Abstract
Advances in machine learned interatomic potentials (MLIPs), such as those using neural networks, have resulted in short-range models that can infer interaction energies with near ab initio accuracy and orders of magnitude reduced computational cost. For many atom systems, including macromolecules, biomolecules, and condensed matter, model accuracy can become reliant on the description of short- and long-range physical interactions. The latter terms can be difficult to incorporate into an MLIP framework. Recent research has produced numerous models with considerations for nonlocal electrostatic and dispersion interactions, leading to a large range of applications that can be addressed using MLIPs. In light of this, we present a Perspective focused on key methodologies and models being used where the presence of nonlocal physics and chemistry is crucial for describing system properties. The strategies covered include MLIPs augmented with dispersion corrections, electrostatics calculated with charges predicted from atomic environment descriptors, the use of self-consistency and message passing iterations to propagate nonlocal system information, and charges obtained via equilibration schemes. We aim to provide a pointed discussion to support the development of machine learning-based interatomic potentials for systems where contributions from only nearsighted terms are deficient.
Affiliation(s)
- Dylan M Anstine
- Department of Chemistry, Mellon College of Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
| | - Olexandr Isayev
- Department of Chemistry, Mellon College of Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
| |
|
22
|
Zeng J, Tao Y, Giese TJ, York DM. QDπ: A Quantum Deep Potential Interaction Model for Drug Discovery. J Chem Theory Comput 2023; 19:1261-1275. [PMID: 36696673 PMCID: PMC9992268 DOI: 10.1021/acs.jctc.2c01172] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Indexed: 01/27/2023]
Abstract
We report QDπ-v1.0 for modeling the internal energy of drug molecules containing H, C, N, and O atoms. The QDπ model is in the form of a quantum mechanical/machine learning potential correction (QM/Δ-MLP) that uses a fast third-order self-consistent density-functional tight-binding (DFTB3/3OB) model that is corrected to a quantitatively high-level of accuracy through a deep-learning potential (DeepPot-SE). The model has the advantage that it is able to properly treat electrostatic interactions and handle changes in charge/protonation states. The model is trained against reference data computed at the ωB97X/6-31G* level (as in the ANI-1x data set) and compared to several other approximate semiempirical and machine learning potentials (ANI-1x, ANI-2x, DFTB3, MNDO/d, AM1, PM6, GFN1-xTB, and GFN2-xTB). The QDπ model is demonstrated to be accurate for a wide range of intra- and intermolecular interactions (despite its intended use as an internal energy model) and has shown to perform exceptionally well for relative protonation/deprotonation energies and tautomers. An example application to model reactions involved in RNA strand cleavage catalyzed by protein and nucleic acid enzymes illustrates QDπ has average errors less than 0.5 kcal/mol, whereas the other models compared have errors over an order of magnitude greater. Taken together, this makes QDπ highly attractive as a potential force field model for drug discovery.
Affiliation(s)
- Jinzhe Zeng
- Laboratory for Biomolecular Simulation Research, Institute for Quantitative Biomedicine and Department of Chemistry and Chemical Biology, Rutgers University, Piscataway, NJ 08854, USA
| | - Yujun Tao
- Laboratory for Biomolecular Simulation Research, Institute for Quantitative Biomedicine and Department of Chemistry and Chemical Biology, Rutgers University, Piscataway, NJ 08854, USA
| | - Timothy J. Giese
- Laboratory for Biomolecular Simulation Research, Institute for Quantitative Biomedicine and Department of Chemistry and Chemical Biology, Rutgers University, Piscataway, NJ 08854, USA
| | - Darrin M. York
- Laboratory for Biomolecular Simulation Research, Institute for Quantitative Biomedicine and Department of Chemistry and Chemical Biology, Rutgers University, Piscataway, NJ 08854, USA
|
23
|
Eastman P, Behara PK, Dotson DL, Galvelis R, Herr JE, Horton JT, Mao Y, Chodera JD, Pritchard BP, Wang Y, De Fabritiis G, Markland TE. SPICE, A Dataset of Drug-like Molecules and Peptides for Training Machine Learning Potentials. Sci Data 2023; 10:11. [PMID: 36599873 PMCID: PMC9813265 DOI: 10.1038/s41597-022-01882-6] [Citation(s) in RCA: 20] [Impact Index Per Article: 20.0] [Received: 10/05/2022] [Accepted: 12/01/2022] [Indexed: 01/05/2023] Open
Abstract
Machine learning potentials are an important tool for molecular simulation, but their development is held back by a shortage of high quality datasets to train them on. We describe the SPICE dataset, a new quantum chemistry dataset for training potentials relevant to simulating drug-like small molecules interacting with proteins. It contains over 1.1 million conformations for a diverse set of small molecules, dimers, dipeptides, and solvated amino acids. It includes 15 elements, charged and uncharged molecules, and a wide range of covalent and non-covalent interactions. It provides both forces and energies calculated at the ωB97M-D3(BJ)/def2-TZVPPD level of theory, along with other useful quantities such as multipole moments and bond orders. We train a set of machine learning potentials on it and demonstrate that they can achieve chemical accuracy across a broad region of chemical space. It can serve as a valuable resource for the creation of transferable, ready to use potential functions for use in molecular simulations.
Affiliation(s)
- Peter Eastman
- Department of Chemistry, Stanford University, Stanford, CA, 94305, USA.
| | - Pavan Kumar Behara
- Department of Pharmaceutical Sciences, University of California, Irvine, CA, 92697, USA
| | - David L Dotson
- The Open Force Field Initiative, Open Molecular Software Foundation, Davis, CA, 95616, USA
| | | | - John E Herr
- Department of Chemistry and Biochemistry, University of Notre Dame, Notre Dame, IN, 46556, USA
| | - Josh T Horton
- School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne, NE1 7RU, United Kingdom
| | - Yuezhi Mao
- Department of Chemistry, Stanford University, Stanford, CA, 94305, USA
| | - John D Chodera
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, 10065, USA
| | - Benjamin P Pritchard
- Molecular Sciences Software Institute, Virginia Polytechnic Institute and State University, Blacksburg, VA, 24060, USA
| | - Yuanqing Wang
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, 10065, USA
- Graduate Program in Physiology, Biophysics, and Systems Biology, Weill Cornell Graduate School of Medical Sciences, New York, NY, 10065, USA
| | - Gianni De Fabritiis
- Acellera Labs, Doctor Trueta 183, 08005, Barcelona, Spain
- Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), Carrer Dr. Aiguader 88, 08003, Barcelona, Spain and ICREA, Passeig Lluis Companys 23, 08010, Barcelona, Spain
| | - Thomas E Markland
- Department of Chemistry, Stanford University, Stanford, CA, 94305, USA
|
24
|
Liu Z, Zubatiuk T, Roitberg A, Isayev O. Auto3D: Automatic Generation of the Low-Energy 3D Structures with ANI Neural Network Potentials. J Chem Inf Model 2022; 62:5373-5382. [PMID: 36112860 DOI: 10.1021/acs.jcim.2c00817] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Indexed: 12/13/2022]
Abstract
Computational programs accelerate the chemical discovery processes but often need proper three-dimensional molecular information as part of the input. Getting optimal molecular structures is challenging because it requires enumerating and optimizing a huge space of stereoisomers and conformers. We developed the Python-based Auto3D package for generating the low-energy 3D structures using SMILES as the input. Auto3D is based on state-of-the-art algorithms and can automatize the isomer enumeration and duplicate filtering process, 3D building process, geometry optimization, and ranking process. Tested on 50 molecules with multiple unspecified stereocenters, Auto3D is guaranteed to find the stereoconfiguration that yields the lowest-energy conformer. With Auto3D, we provide an extension of the ANI model. The new model, dubbed ANI-2xt, is trained on a tautomer-rich data set. ANI-2xt is benchmarked with DFT methods on geometry optimization and electronic and Gibbs free energy calculations. Compared with ANI-2x, ANI-2xt provides a 42% error reduction for tautomeric reaction energy calculations when using the gold-standard coupled-cluster calculation as the reference. ANI-2xt can accurately predict the energies and is several orders of magnitude faster than DFT methods.
Affiliation(s)
- Zhen Liu
- Department of Chemistry, Mellon College of Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
| | - Tetiana Zubatiuk
- Department of Chemistry, Mellon College of Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
| | - Adrian Roitberg
- Department of Chemistry, University of Florida, Gainesville, Florida 32611, United States
| | - Olexandr Isayev
- Department of Chemistry, Mellon College of Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
|
25
|
Cheng L, Sun J, Deustua JE, Bhethanabotla VC, Miller TF. Molecular-orbital-based machine learning for open-shell and multi-reference systems with kernel addition Gaussian process regression. J Chem Phys 2022; 157:154105. [PMID: 36272799 DOI: 10.1063/5.0110886] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/14/2022] Open
Abstract
We introduce a novel machine learning strategy, kernel addition Gaussian process regression (KA-GPR), in molecular-orbital-based machine learning (MOB-ML) to learn the total correlation energies of general electronic structure theories for closed- and open-shell systems. The learning efficiency of MOB-ML(KA-GPR) is the same as the original MOB-ML method for the smallest Criegee molecule, which is a closed-shell molecule with multi-reference character. In addition, the prediction accuracies of different small free radicals could reach the chemical accuracy of 1 kcal/mol by training on one example structure. Accurate potential energy surfaces for the H10 chain (closed-shell) and water OH bond dissociation (open-shell) could also be generated by MOB-ML(KA-GPR). To explore the breadth of chemical systems that KA-GPR can describe, we further apply MOB-ML to accurately predict the large benchmark datasets for closed- (QM9, QM7b-T, and GDB-13-T) and open-shell (QMSpin) molecules.
Affiliation(s)
- Lixue Cheng
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, USA
| | - Jiace Sun
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, USA
| | - J Emiliano Deustua
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, USA
| | - Vignesh C Bhethanabotla
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, USA
| | - Thomas F Miller
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, USA
|
26
|
Nie W, Liu D, Li S, Yu H, Fu Y. Nucleophilicity Prediction Using Graph Neural Networks. J Chem Inf Model 2022; 62:4319-4328. [PMID: 36097394 DOI: 10.1021/acs.jcim.2c00696] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 11/30/2022]
Abstract
The quantitative relationship between chemical reaction rates and nucleophilicity parameters plays a crucial role in organic chemistry. In this regard, the formula proposed by Mayr et al. and the constructed reactivity database are important representatives. However, the determination of Mayr's nucleophilicity parameter N often requires time-consuming experiments with reference electrophiles in the solvent. Several machine learning (ML)-based models have been proposed to realize the data-driven prediction of N in recent years. However, in addition to DFT-calculated electronic descriptors, most of them also use a set of artificially predefined structural descriptors as input, which may result in a biased representation of the nucleophile's structural information depending on how the descriptors are defined. Compared with traditional ML algorithms, graph neural networks (GNNs) can naturally take the molecule's structural information into account by applying the message passing technique. We herein propose a SchNet-based GNN model that takes only the molecular conformation and solvent type as input. The model achieves a performance comparable to the previous benchmark study on 10-fold cross-validation of 894 data points (R² = 0.91, RMSE = 2.25). To enhance the model's ability to capture the molecule's electronic information, some DFT-calculated parameters are then incorporated into the model via graph global features, and substantial improvement is achieved in the prediction precision (R² = 0.95, RMSE = 1.63). These results demonstrate that both structural and electronic information are important for the prediction of N, and that GNNs can integrate these two kinds of information more effectively.
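The abstract refers to the formula by Mayr et al. without writing it out. It is commonly stated as log k = s_N (N + E), with N the nucleophilicity parameter, s_N a nucleophile-specific sensitivity parameter, and E the electrophilicity parameter of the reaction partner. The sketch below assumes that standard form. The function names and the parameter values are illustrative only and are not taken from the paper or from the reactivity database.

```python
def mayr_log_k(n_param: float, s_n: float, e_param: float) -> float:
    """Return log10 of the rate constant from log k = s_N * (N + E)."""
    return s_n * (n_param + e_param)


def mayr_rate_constant(n_param: float, s_n: float, e_param: float) -> float:
    """Return the rate constant k implied by the assumed Mayr relationship."""
    return 10.0 ** mayr_log_k(n_param, s_n, e_param)


if __name__ == "__main__":
    # Illustrative values: N = 15, s_N = 0.8, E = -10.
    print(mayr_log_k(15.0, 0.8, -10.0))
    print(mayr_rate_constant(15.0, 0.8, -10.0))
```

Because N enters the exponent, an error of one unit in a predicted N shifts the estimated rate constant by a factor of 10 raised to the power s_N, which is why the reported RMSE values matter in practice.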
Affiliation(s)
- Wan Nie
- Hefei National Laboratory for Physical Sciences at the Microscale, CAS Key Laboratory of Urban Pollutant Conversion, Anhui Province Key Laboratory of Biomass Clean Energy, Center for Excellence in Molecular Synthesis of CAS, Institute of Energy, Hefei Comprehensive National Science Center, University of Science and Technology of China, Hefei 230026, China; Department of Computer Science, City University of Hong Kong, Hong Kong 999077, China
| | - Deguang Liu
- Hefei National Laboratory for Physical Sciences at the Microscale, CAS Key Laboratory of Urban Pollutant Conversion, Anhui Province Key Laboratory of Biomass Clean Energy, Center for Excellence in Molecular Synthesis of CAS, Institute of Energy, Hefei Comprehensive National Science Center, University of Science and Technology of China, Hefei 230026, China
| | - Shuaicheng Li
- Department of Computer Science, City University of Hong Kong, Hong Kong 999077, China
| | - Haizhu Yu
- Department of Chemistry and Centre for Atomic Engineering of Advanced Materials, Anhui Province Key Laboratory of Chemistry for Inorganic/Organic Hybrid Functionalized Materials, Anhui University, Hefei 230601, China
| | - Yao Fu
- Hefei National Laboratory for Physical Sciences at the Microscale, CAS Key Laboratory of Urban Pollutant Conversion, Anhui Province Key Laboratory of Biomass Clean Energy, Center for Excellence in Molecular Synthesis of CAS, Institute of Energy, Hefei Comprehensive National Science Center, University of Science and Technology of China, Hefei 230026, China
|
27
|
Fedik N, Zubatyuk R, Kulichenko M, Lubbers N, Smith JS, Nebgen B, Messerly R, Li YW, Boldyrev AI, Barros K, Isayev O, Tretiak S. Extending machine learning beyond interatomic potentials for predicting molecular properties. Nat Rev Chem 2022; 6:653-672. [PMID: 37117713 DOI: 10.1038/s41570-022-00416-3] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Accepted: 07/15/2022] [Indexed: 11/09/2022]
Abstract
Machine learning (ML) is becoming a method of choice for modelling complex chemical processes and materials. ML provides a surrogate model trained on a reference dataset that can be used to establish a relationship between a molecular structure and its chemical properties. This Review highlights developments in the use of ML to evaluate chemical properties such as partial atomic charges, dipole moments, spin and electron densities, and chemical bonding, as well as to obtain a reduced quantum-mechanical description. We overview several modern neural network architectures, their predictive capabilities, generality and transferability, and illustrate their applicability to various chemical properties. We emphasize that learned molecular representations resemble quantum-mechanical analogues, demonstrating the ability of the models to capture the underlying physics. We also discuss how ML models can describe non-local quantum effects. Finally, we conclude by compiling a list of available ML toolboxes, summarizing the unresolved challenges and presenting an outlook for future development. The observed trends demonstrate that this field is evolving towards physics-based models augmented by ML, which is accompanied by the development of new methods and the rapid growth of user-friendly ML frameworks for chemistry.
|
28
|
Karandashev K, von Lilienfeld OA. An orbital-based representation for accurate quantum machine learning. J Chem Phys 2022; 156:114101. [DOI: 10.1063/5.0083301] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/14/2022] Open
Abstract
We introduce an electronic structure based representation for quantum machine learning (QML) of electronic properties throughout chemical compound space. The representation is constructed using computationally inexpensive ab initio calculations and explicitly accounts for changes in the electronic structure. We demonstrate the accuracy and flexibility of resulting QML models when applied to property labels, such as total potential energy, HOMO and LUMO energies, ionization potential, and electron affinity, using as datasets for training and testing entries from the QM7b, QM7b-T, QM9, and LIBE libraries. For the latter, we also demonstrate the ability of this approach to account for molecular species of different charge and spin multiplicity, resulting in QML models that infer total potential energies based on geometry, charge, and spin as input.
Affiliation(s)
| | - O. Anatole von Lilienfeld
- Faculty of Physics, University of Vienna, Kolingasse 14-16, AT-1090 Wien, Austria
- Department of Chemistry, Institute of Physical Chemistry and National Center for Computational Design and Discovery of Novel Materials (MARVEL), University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland
|
29
|
Cools-Ceuppens M, Dambre J, Verstraelen T. Modeling Electronic Response Properties with an Explicit-Electron Machine Learning Potential. J Chem Theory Comput 2022; 18:1672-1691. [PMID: 35171606 DOI: 10.1021/acs.jctc.1c00978] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 01/11/2023]
Abstract
Explicit-electron force fields introduce electrons or electron pairs as semiclassical particles in force fields or empirical potentials, which are suitable for molecular dynamics simulations. Even though semiclassical electrons are a drastic simplification compared to a quantum-mechanical electronic wave function, they still retain a relatively detailed electronic model compared to conventional polarizable and reactive force fields. The ability of explicit-electron models to describe chemical reactions and electronic response properties has already been demonstrated, yet the description of short-range interactions for a broad range of chemical systems remains challenging. In this work, we present the electron machine learning potential (eMLP), a new explicit electron force field in which the short-range interactions are modeled with machine learning. The electron pair particles will be located at well-defined positions, derived from localized molecular orbitals or Wannier centers, naturally imposing the correct dielectric and piezoelectric behavior of the system. The eMLP is benchmarked on two newly constructed data sets: eQM7, an extension of the QM7 data set for small molecules, and a data set for the crystalline β-glycine. It is shown that the eMLP can predict dipole moments, polarizabilities, and IR-spectra of unseen molecules with high precision. Furthermore, a variety of response properties, for example, stiffness or piezoelectric constants, can be accurately reproduced.
Affiliation(s)
- Maarten Cools-Ceuppens
- Center for Molecular Modeling (CMM), Ghent University, Technologiepark-Zwijnaarde 46, B-9052 Gent, Belgium
| | - Joni Dambre
- IDLab, Electronics and Information Systems Department, Ghent University-imec, Technologiepark-Zwijnaarde 126, B-9052 Gent, Belgium
| | - Toon Verstraelen
- Center for Molecular Modeling (CMM), Ghent University, Technologiepark-Zwijnaarde 46, B-9052 Gent, Belgium
|
30
|
Gallegos M, Guevara-Vela JM, Pendás ÁM. NNAIMQ: A neural network model for predicting QTAIM charges. J Chem Phys 2022; 156:014112. [PMID: 34998318 DOI: 10.1063/5.0076896] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/15/2022] Open
Abstract
Atomic charges provide crucial information about the electronic structure of a molecular system. Among the different definitions of these descriptors, the one proposed by the Quantum Theory of Atoms in Molecules (QTAIM) is particularly attractive given its invariance against orbital transformations, although the computational cost associated with its calculation limits its applicability. Given that Machine Learning (ML) techniques have been shown to accelerate the computation of a number of quantum mechanical observables by orders of magnitude, in this work we take advantage of ML knowledge to develop an intuitive and fast neural network model (NNAIMQ) for the computation of QTAIM charges for C, H, O, and N atoms with high accuracy. Our model has been trained and tested using data from quantum chemical calculations in more than 45 000 molecular environments of the near-equilibrium CHON chemical space. The reliability and performance of NNAIMQ have been analyzed in a variety of scenarios, from equilibrium geometries to molecular dynamics simulations. Altogether, NNAIMQ yields remarkably small prediction errors, well below the 0.03 electron limit in the general case, while accelerating the calculation of QTAIM charges by several orders of magnitude.
Affiliation(s)
- Miguel Gallegos
- Depto. Química Física y Analítica, Universidad de Oviedo, 33006 Oviedo, Spain
| | - José Manuel Guevara-Vela
- Institute of Chemistry, National Autonomous University of Mexico, Circuito Exterior, Ciudad Universitaria, Delegación Coyoacán, Mexico City C.P. 04510, Mexico
| | - Ángel Martín Pendás
- Depto. Química Física y Analítica, Universidad de Oviedo, 33006 Oviedo, Spain
|
31
|
Abstract
In the past two decades, machine learning potentials (MLPs) have reached a level of maturity that now enables applications to large-scale atomistic simulations of a wide range of systems in chemistry, physics, and materials science. Different machine learning algorithms have been used with great success in the construction of these MLPs. In this review, we discuss an important group of MLPs relying on artificial neural networks to establish a mapping from the atomic structure to the potential energy. In spite of this common feature, there are important conceptual differences among MLPs, which concern the dimensionality of the systems, the inclusion of long-range electrostatic interactions, global phenomena like nonlocal charge transfer, and the type of descriptor used to represent the atomic structure, which can be either predefined or learnable. A concise overview is given along with a discussion of the open challenges in the field. Expected final online publication date for the Annual Review of Physical Chemistry, Volume 73 is April 2022. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
Affiliation(s)
- Emir Kocer
- Institut für Physikalische Chemie, Theoretische Chemie, Universität Göttingen, Göttingen, Germany
| | - Tsz Wai Ko
- Institut für Physikalische Chemie, Theoretische Chemie, Universität Göttingen, Göttingen, Germany
| | - Jörg Behler
- Institut für Physikalische Chemie, Theoretische Chemie, Universität Göttingen, Göttingen, Germany
|
32
|
Unke OT, Chmiela S, Gastegger M, Schütt KT, Sauceda HE, Müller KR. SpookyNet: Learning force fields with electronic degrees of freedom and nonlocal effects. Nat Commun 2021; 12:7273. [PMID: 34907176 PMCID: PMC8671403 DOI: 10.1038/s41467-021-27504-0] [Citation(s) in RCA: 87] [Impact Index Per Article: 29.0] [Received: 07/12/2021] [Accepted: 11/16/2021] [Indexed: 01/12/2023] Open
Abstract
Machine-learned force fields combine the accuracy of ab initio methods with the efficiency of conventional force fields. However, current machine-learned force fields typically ignore electronic degrees of freedom, such as the total charge or spin state, and assume chemical locality, which is problematic when molecules have inconsistent electronic states, or when nonlocal effects play a significant role. This work introduces SpookyNet, a deep neural network for constructing machine-learned force fields with explicit treatment of electronic degrees of freedom and nonlocality, modeled via self-attention in a transformer architecture. Chemically meaningful inductive biases and analytical corrections built into the network architecture allow it to properly model physical limits. SpookyNet improves upon the current state-of-the-art (or achieves similar performance) on popular quantum chemistry data sets. Notably, it is able to generalize across chemical and conformational space and can leverage the learned chemical insights, e.g. by predicting unknown spin states, thus helping to close a further important remaining gap for today's machine learning models in quantum chemistry.
Affiliation(s)
- Oliver T Unke
- Machine Learning Group, Technische Universität Berlin, 10587, Berlin, Germany.
- DFG Cluster of Excellence "Unifying Systems in Catalysis" (UniSysCat), Technische Universität Berlin, 10623, Berlin, Germany.
| | - Stefan Chmiela
- Machine Learning Group, Technische Universität Berlin, 10587, Berlin, Germany
| | - Michael Gastegger
- Machine Learning Group, Technische Universität Berlin, 10587, Berlin, Germany
- DFG Cluster of Excellence "Unifying Systems in Catalysis" (UniSysCat), Technische Universität Berlin, 10623, Berlin, Germany
| | - Kristof T Schütt
- Machine Learning Group, Technische Universität Berlin, 10587, Berlin, Germany
| | - Huziel E Sauceda
- Machine Learning Group, Technische Universität Berlin, 10587, Berlin, Germany
- BASLEARN, BASF-TU joint Lab, Technische Universität Berlin, 10587, Berlin, Germany
| | - Klaus-Robert Müller
- Machine Learning Group, Technische Universität Berlin, 10587, Berlin, Germany.
- Department of Artificial Intelligence, Korea University, Anam-dong, Seongbuk-gu, Seoul, 02841, Korea.
- Max Planck Institute for Informatics, Stuhlsatzenhausweg, 66123, Saarbrücken, Germany.
- BIFOLD-Berlin Institute for the Foundations of Learning and Data, Berlin, Germany.
- Google Research, Brain team, Berlin, Germany.
|
33
|
Zheng P, Zubatyuk R, Wu W, Isayev O, Dral PO. Artificial intelligence-enhanced quantum chemical method with broad applicability. Nat Commun 2021; 12:7022. [PMID: 34857738 PMCID: PMC8640006 DOI: 10.1038/s41467-021-27340-2] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Received: 07/20/2021] [Accepted: 11/10/2021] [Indexed: 11/08/2022] Open
Abstract
High-level quantum mechanical (QM) calculations are indispensable for accurate explanation of natural phenomena on the atomistic level. Their staggering computational cost, however, poses great limitations, which luckily can be lifted to a great extent by exploiting advances in artificial intelligence (AI). Here we introduce the general-purpose, highly transferable artificial intelligence-quantum mechanical method 1 (AIQM1). It approaches the accuracy of the gold-standard coupled cluster QM method with the high computational speed of approximate low-level semiempirical QM methods for neutral, closed-shell species in the ground state. AIQM1 can provide accurate ground-state energies for diverse organic compounds as well as geometries close to experiment, even for challenging systems such as large conjugated compounds (fullerene C60). This opens an opportunity to investigate chemical compounds with previously unattainable speed and accuracy, as we demonstrate by determining geometries of polyyne molecules, a task difficult for both experiment and theory. Notably, our method's accuracy is also good for ions and excited-state properties, although the neural network part of AIQM1 was never fitted to these properties.
Affiliation(s)
- Peikun Zheng
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, China
| | - Roman Zubatyuk
- Department of Chemistry, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
| | - Wei Wu
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, China
| | - Olexandr Isayev
- Department of Chemistry, Carnegie Mellon University, Pittsburgh, PA, 15213, USA.
| | - Pavlo O Dral
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, China.
|
34
|
Zlobin A, Diankin I, Pushkarev S, Golovin A. Probing the Suitability of Different Ca2+ Parameters for Long Simulations of Diisopropyl Fluorophosphatase. Molecules 2021; 26:5839. [PMID: 34641383 PMCID: PMC8510429 DOI: 10.3390/molecules26195839] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/26/2021] [Revised: 09/23/2021] [Accepted: 09/24/2021] [Indexed: 11/16/2022] Open
Abstract
Organophosphate hydrolases are promising as potential biotherapeutic agents to treat poisoning with pesticides or nerve gases. However, these enzymes often need to be further engineered in order to become useful in practice. One example of such enhancement is the alteration of enantioselectivity of diisopropyl fluorophosphatase (DFPase). Molecular modeling techniques offer a unique opportunity to address this task rationally by providing a physical description of the substrate-binding process. However, DFPase is a metalloenzyme, and correct modeling of metal cations is a challenging task generally coming with a tradeoff between simulation speed and accuracy. Here, we probe several molecular mechanical parameter combinations for their ability to empower long simulations needed to achieve a quantitative description of substrate binding. We demonstrate that a combination of the Amber19sb force field with the recently developed 12-6 Ca2+ models allows us to both correctly model DFPase and obtain new insights into the DFP binding process.
Affiliation(s)
- Alexander Zlobin
- Faculty of Bioengineering, Lomonosov Moscow State University, 119234 Moscow, Russia
- Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, 117997 Moscow, Russia
- Sirius University of Science and Technology, 354340 Sochi, Russia
| | - Igor Diankin
- Faculty of Bioengineering, Lomonosov Moscow State University, 119234 Moscow, Russia
- Sirius University of Science and Technology, 354340 Sochi, Russia
| | - Sergey Pushkarev
- Faculty of Bioengineering, Lomonosov Moscow State University, 119234 Moscow, Russia
| | - Andrey Golovin
- Faculty of Bioengineering, Lomonosov Moscow State University, 119234 Moscow, Russia
- Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, 117997 Moscow, Russia
- Sirius University of Science and Technology, 354340 Sochi, Russia
|
35
|
Abstract
Chemical compound space (CCS), the set of all theoretically conceivable combinations of chemical elements and (meta-)stable geometries that make up matter, is colossal. The first-principles based virtual sampling of this space, for example, in search of novel molecules or materials which exhibit desirable properties, is therefore prohibitive for all but the smallest subsets and simplest properties. We review studies aimed at tackling this challenge using modern machine learning techniques based on (i) synthetic data, typically generated using quantum mechanics based methods, and (ii) model architectures inspired by quantum mechanics. Such Quantum mechanics based Machine Learning (QML) approaches combine the numerical efficiency of statistical surrogate models with an ab initio view on matter. They rigorously reflect the underlying physics in order to reach universality and transferability across CCS. While state-of-the-art approximations to quantum problems impose severe computational bottlenecks, recent QML based developments indicate the possibility of substantial acceleration without sacrificing the predictive power of quantum mechanics.
Affiliation(s)
- Bing Huang
- Faculty of Physics, University of Vienna, 1090 Vienna, Austria
| | - O. Anatole von Lilienfeld
- Faculty of Physics, University of Vienna, 1090 Vienna, Austria
- Institute of Physical Chemistry and National Center for Computational Design and Discovery of Novel Materials (MARVEL), Department of Chemistry, University of Basel, 4056 Basel, Switzerland
|