1
van Gerwen P, Briling KR, Bunne C, Somnath VR, Laplaza R, Krause A, Corminboeuf C. 3DReact: Geometric Deep Learning for Chemical Reactions. J Chem Inf Model 2024; 64:5771-5785. [PMID: 39007724 DOI: 10.1021/acs.jcim.4c00104] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/16/2024]
Abstract
Geometric deep learning models, which incorporate the relevant molecular symmetries within the neural network architecture, have considerably improved the accuracy and data efficiency of predictions of molecular properties. Building on this success, we introduce 3DReact, a geometric deep learning model to predict reaction properties from three-dimensional structures of reactants and products. We demonstrate that the invariant version of the model is sufficient for existing reaction data sets. We illustrate its competitive performance on the prediction of activation barriers on the GDB7-22-TS, Cyclo-23-TS, and Proparg-21-TS data sets in different atom-mapping regimes. We show that, compared to existing models for reaction property prediction, 3DReact offers a flexible framework that exploits atom-mapping information, if available, as well as geometries of reactants and products (in an invariant or equivariant fashion). Accordingly, it performs systematically well across different data sets, atom-mapping regimes, as well as both interpolation and extrapolation tasks.
Affiliation(s)
- Puck van Gerwen
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
  - National Center for Competence in Research - Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Ksenia R Briling
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Charlotte Bunne
  - National Center for Competence in Research - Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
  - Learning & Adaptive Systems Group, Department of Computer Science, ETH Zurich, 8092 Zurich, Switzerland
- Vignesh Ram Somnath
  - National Center for Competence in Research - Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
  - Learning & Adaptive Systems Group, Department of Computer Science, ETH Zurich, 8092 Zurich, Switzerland
- Ruben Laplaza
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
  - National Center for Competence in Research - Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Andreas Krause
  - National Center for Competence in Research - Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
  - Learning & Adaptive Systems Group, Department of Computer Science, ETH Zurich, 8092 Zurich, Switzerland
- Clemence Corminboeuf
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
  - National Center for Competence in Research - Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
2
Briling K, Calvino Alonso Y, Fabrizio A, Corminboeuf C. SPAHM(a,b): Encoding the Density Information from Guess Hamiltonian in Quantum Machine Learning Representations. J Chem Theory Comput 2024; 20:1108-1117. [PMID: 38227222 PMCID: PMC10867806 DOI: 10.1021/acs.jctc.3c01040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Revised: 12/20/2023] [Accepted: 12/26/2023] [Indexed: 01/17/2024]
Abstract
Recently, we introduced a class of molecular representations for kernel-based regression methods, the spectrum of approximated Hamiltonian matrices (SPAHM), that takes advantage of lightweight one-electron Hamiltonians traditionally used as a self-consistent field initial guess. The original SPAHM variant is built from occupied-orbital energies (i.e., eigenvalues) and naturally contains all of the information about nuclear charges, atomic positions, and symmetry requirements. Its advantages were demonstrated on data sets featuring a wide variation of charge and spin, for which traditional structure-based representations commonly fail. The SPAHM(a,b) representations introduced here expand the eigenvalue SPAHM into local and transferable representations. They rely upon one-electron density matrices to build fingerprints from atomic and bond density overlap contributions, inspired by preceding state-of-the-art representations. The performance and efficiency of SPAHM(a,b) are assessed on predictions for data sets of prototypical organic molecules (QM7) of different charges and azoheteroarene dyes in an excited state. Overall, both SPAHM(a) and SPAHM(b) outperform state-of-the-art representations on difficult prediction tasks such as the atomic properties of charged open-shell species and of π-conjugated systems.
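The eigenvalue-based idea in this abstract (a representation built from occupied-orbital energies of a one-electron Hamiltonian) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `eigenvalue_representation`, the toy matrix, the zero-padding to a fixed length, and the assumption of an orthonormal basis (so that an ordinary rather than a generalized eigenproblem is solved) are all illustrative choices.

```python
import numpy as np


def eigenvalue_representation(hamiltonian, n_occupied, length):
    """Return the lowest `n_occupied` eigenvalues, zero-padded to `length`.

    Hypothetical helper: assumes a real symmetric one-electron Hamiltonian
    expressed in an orthonormal basis.
    """
    h = np.asarray(hamiltonian, dtype=float)
    if h.ndim != 2 or h.shape[0] != h.shape[1] or not np.allclose(h, h.T):
        raise ValueError("expected a real symmetric square matrix")
    if n_occupied > h.shape[0] or n_occupied > length:
        raise ValueError("n_occupied exceeds matrix size or output length")
    # eigvalsh returns eigenvalues in ascending order, so the first
    # n_occupied entries play the role of occupied-orbital energies.
    occupied = np.linalg.eigvalsh(h)[:n_occupied]
    vector = np.zeros(length)
    vector[:n_occupied] = occupied
    return vector


# Toy 3x3 "guess Hamiltonian" (made-up numbers, arbitrary units).
toy_h = np.array([
    [-2.0, 0.1, 0.0],
    [0.1, -1.0, 0.2],
    [0.0, 0.2, 0.5],
])
rep = eigenvalue_representation(toy_h, n_occupied=2, length=4)
```

Padding to a common length is one simple way to make vectors from molecules with different electron counts comparable inside a kernel; the abstract itself does not specify this detail.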
Affiliation(s)
- Ksenia R. Briling
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Yannick Calvino Alonso
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Alberto Fabrizio
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
  - National Centre for Computational Design and Discovery of Novel Materials (MARVEL), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Clemence Corminboeuf
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
  - National Centre for Computational Design and Discovery of Novel Materials (MARVEL), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
3
Chen Z, Yam VWW. Encoding Hole-Particle Information in the Multi-Channel MolOrbImage for Machine-Learned Excited-State Energies of Large Photofunctional Materials. J Am Chem Soc 2023; 145:24098-24107. [PMID: 37874942 DOI: 10.1021/jacs.3c07766] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2023]
Abstract
We present a novel class of one-electron multi-channel molecular orbital images (MolOrbImages) designed for the prediction of excited-state energetics in conjunction with the state-of-the-art VGG-type machine-learning architecture. By representing hole and particle states in the excitation process as channels of MolOrbImages, the revised VGG model achieves excellent prediction accuracy for both low-lying singlet and triplet states, with mean absolute errors (MAEs) of <0.08 and <0.1 eV for QM9 molecules and large photofunctional materials with up to 560 atoms, respectively. Remarkably, the model demonstrates exceptional performance (MAE < 1 kcal/mol) for the T1 state of QM9 molecules, making it a non-system-specific model that approaches chemical accuracy. The general rules attained, for instance, the improved performance with well-defined MO energies and the reduced overfitting concern via the inclusion of physically insightful hole-particle information, provide invaluable guidelines for the further design of orbital-based descriptors targeting molecular excited states.
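Because the abstract quotes errors in both eV and kcal/mol, a small conversion sketch may help compare them. It assumes the standard factor 1 eV ≈ 23.0605 kcal/mol; the function names are illustrative and not taken from the paper.

```python
# 1 eV per molecule corresponds to about 23.0605 kcal/mol.
EV_TO_KCAL_PER_MOL = 23.0605


def ev_to_kcal_per_mol(energy_ev):
    """Convert an energy in eV to kcal/mol."""
    return energy_ev * EV_TO_KCAL_PER_MOL


def kcal_per_mol_to_ev(energy_kcal):
    """Convert an energy in kcal/mol to eV."""
    return energy_kcal / EV_TO_KCAL_PER_MOL


mae_qm9_kcal = ev_to_kcal_per_mol(0.08)      # about 1.84 kcal/mol
mae_large_kcal = ev_to_kcal_per_mol(0.1)     # about 2.31 kcal/mol
chemical_accuracy_ev = kcal_per_mol_to_ev(1.0)  # about 0.043 eV
```

On this scale, the quoted bounds of 0.08 and 0.1 eV correspond to roughly 1.8 and 2.3 kcal/mol, while the 1 kcal/mol threshold is about 0.043 eV.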
Affiliation(s)
- Ziyong Chen
  - Institute of Molecular Functional Materials and Department of Chemistry, The University of Hong Kong, Pokfulam Road, Hong Kong, China
- Vivian Wing-Wah Yam
  - Institute of Molecular Functional Materials and Department of Chemistry, The University of Hong Kong, Pokfulam Road, Hong Kong, China
  - Hong Kong Quantum AI Lab Ltd., Hong Kong Science Park, Hong Kong, China
4
Ng WP, Liang Q, Yang J. Low-Data Deep Quantum Chemical Learning for Accurate MP2 and Coupled-Cluster Correlations. J Chem Theory Comput 2023; 19:5439-5449. [PMID: 37506400 DOI: 10.1021/acs.jctc.3c00518] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/30/2023]
Abstract
Accurate ab initio prediction of electronic energies by explicitly solving post-Hartree-Fock equations is very expensive for macromolecules. We here exploit the physically justified local correlation feature in a compact basis of small molecules and construct an expressive low-data deep neural network (dNN) model to obtain machine-learned electron correlation energies on par with MP2 and CCSD levels of theory for more complex molecules and different datasets that are not represented in the training set. We show that our dNN-powered model is data efficient and makes highly transferable predictions across alkanes of various lengths, organic molecules with non-covalent and biomolecular interactions, as well as water clusters of different sizes and morphologies. In particular, by training on 800 (H2O)8 clusters with the local correlation descriptors, accurate MP2/cc-pVTZ correlation energies up to (H2O)128 can be predicted with a small random error within chemical accuracy from exact values, while a majority of prediction deviations are attributed to an intrinsically systematic error. Our results reveal that an extremely compact local correlation feature set, which is poor for any direct post-Hartree-Fock calculations, nevertheless has a prominent advantage in retaining important electron correlation patterns for making accurate transferable predictions across distinct molecular compositions, bond types, and geometries.
Affiliation(s)
- Wai-Pan Ng
  - Department of Chemistry, The University of Hong Kong, Hong Kong 999077, P. R. China
  - Hong Kong Quantum AI Lab Limited, Hong Kong 999077, P. R. China
- Qiujiang Liang
  - Department of Chemistry, The University of Hong Kong, Hong Kong 999077, P. R. China
- Jun Yang
  - Department of Chemistry, The University of Hong Kong, Hong Kong 999077, P. R. China
  - Hong Kong Quantum AI Lab Limited, Hong Kong 999077, P. R. China
5
Chen Z, Yam VWW. Machine-Learned Electronically Excited States with the MolOrbImage Generated from the Molecular Ground State. J Phys Chem Lett 2023; 14:1955-1961. [PMID: 36787423 DOI: 10.1021/acs.jpclett.3c00014] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 06/18/2023]
Abstract
We present a general machine learning framework for probing the electronic state properties using the novel quantum descriptor MolOrbImage. Each pixel of the MolOrbImage records the quantum information generated by the integration of the physical operator with a pair of bra and ket molecular orbital (MO) states. Inspired by the success of deep convolutional neural networks (NNs) in computer vision, we have implemented the convolutional-layer-dominated MO-NN model. Using the orbital energy and electron repulsion integral MolOrbImages, the MO-NN model achieves promising prediction accuracies against the ADC(2)/cc-pVTZ reference for transition energies to both low-lying singlet [mean absolute error (MAE) < 0.16 eV] and triplet (MAE < 0.14 eV) states. An apparent improvement in the prediction of oscillator strength, which has been shown to be challenging previously, has been demonstrated in this study. Moreover, the transferability test indicates the remarkable extrapolation capacity of the MO-NN model to describe the out of data set systems.
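The description of a pixel as an operator integrated between a bra and a ket molecular orbital amounts to a matrix element in the MO basis. A minimal sketch under stated assumptions follows: the operator is available as a matrix in an atomic-orbital basis, the MO coefficients are stored column-wise, and all names and numbers are hypothetical rather than the authors' code.

```python
import numpy as np


def mo_operator_image(ao_operator, mo_coefficients):
    """Transform an AO-basis operator matrix into the MO basis.

    Pixel (i, j) is sum over mu, nu of C[mu, i] * O[mu, nu] * C[nu, j],
    i.e. the matrix element of the operator between MO i and MO j.
    Assumes real MO coefficients, with one MO per column of C.
    """
    o = np.asarray(ao_operator, dtype=float)
    c = np.asarray(mo_coefficients, dtype=float)
    return c.T @ o @ c


# Toy two-orbital example with made-up numbers.
toy_operator = np.array([
    [-1.0, 0.3],
    [0.3, 0.5],
])
angle = np.pi / 4
toy_mos = np.array([
    [np.cos(angle), -np.sin(angle)],
    [np.sin(angle), np.cos(angle)],
])
image = mo_operator_image(toy_operator, toy_mos)
```

The resulting square array can be treated like a single-channel image and passed to a convolutional model; how the published MolOrbImages are ordered, normalized, or stacked into channels is not specified in the abstract.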
Affiliation(s)
- Ziyong Chen
  - Institute of Molecular Functional Materials and Department of Chemistry, The University of Hong Kong, Pokfulam Road, Hong Kong 999077, China
- Vivian Wing-Wah Yam
  - Institute of Molecular Functional Materials and Department of Chemistry, The University of Hong Kong, Pokfulam Road, Hong Kong 999077, China
  - Hong Kong Quantum AI Lab Ltd., Hong Kong Science Park, Hong Kong 999077, China
6
Cheng L, Sun J, Deustua JE, Bhethanabotla VC, Miller TF. Molecular-orbital-based machine learning for open-shell and multi-reference systems with kernel addition Gaussian process regression. J Chem Phys 2022; 157:154105. [PMID: 36272799 DOI: 10.1063/5.0110886] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/14/2022]
Abstract
We introduce a novel machine learning strategy, kernel addition Gaussian process regression (KA-GPR), in molecular-orbital-based machine learning (MOB-ML) to learn the total correlation energies of general electronic structure theories for closed- and open-shell systems. The learning efficiency of MOB-ML(KA-GPR) is the same as the original MOB-ML method for the smallest Criegee molecule, which is a closed-shell molecule with multi-reference character. In addition, the prediction accuracies of different small free radicals could reach the chemical accuracy of 1 kcal/mol by training on one example structure. Accurate potential energy surfaces for the H10 chain (closed-shell) and water OH bond dissociation (open-shell) could also be generated by MOB-ML(KA-GPR). To explore the breadth of chemical systems that KA-GPR can describe, we further apply MOB-ML to accurately predict the large benchmark datasets for closed- (QM9, QM7b-T, and GDB-13-T) and open-shell (QMSpin) molecules.
Affiliation(s)
- Lixue Cheng
  - Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, USA
- Jiace Sun
  - Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, USA
- J Emiliano Deustua
  - Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, USA
- Vignesh C Bhethanabotla
  - Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, USA
- Thomas F Miller
  - Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, USA
7
Sun J, Cheng L, Miller TF. Molecular Dipole Moment Learning via Rotationally Equivariant Gaussian Process Regression with Derivatives in Molecular-orbital-based Machine Learning. J Chem Phys 2022; 157:104109. [DOI: 10.1063/5.0101280] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/14/2022]
Abstract
This study extends the accurate and transferable molecular-orbital-based machine learning (MOB-ML) approach to modeling the contribution of electron correlation to dipole moments at the cost of Hartree-Fock computations. A molecular-orbital-based (MOB) pairwise decomposition of the correlation part of the dipole moment is applied, and these pair dipole moments could be further regressed as a universal function of molecular orbitals (MOs). The dipole MOB features consist of the energy MOB features and their responses to electric fields. An interpretable and rotationally equivariant Gaussian process regression (GPR) with derivatives algorithm is introduced to learn the dipole moment more efficiently. The proposed problem setup, feature design, and ML algorithm are shown to provide highly accurate models for both dipole moment and energies on water and fourteen small molecules. To demonstrate the ability of MOB-ML to function as generalized density-matrix functionals for molecular dipole moments and energies of organic molecules, we further apply the proposed MOB-ML approach to train and test on molecules from the QM9 dataset. The application of local scalable GPR with Gaussian mixture model unsupervised clustering (GMM/GPR) scales up MOB-ML to a large-data regime while retaining the prediction accuracy. In addition, compared with literature results, MOB-ML provides the best test MAEs of 4.21 mDebye and 0.045 kcal/mol for dipole moment and energy models, respectively, when training on 110,000 QM9 molecules. The excellent transferability of the resulting QM9 models is also illustrated by the accurate predictions for four different series of peptides.
Affiliation(s)
- Jiace Sun
  - Chemistry and Chemical Engineering, California Institute of Technology, United States of America
- Lixue Cheng
  - Chemistry, California Institute of Technology, United States of America
- Thomas F Miller
  - Division of Chemistry and Chemical Engineering, California Institute of Technology, United States of America