1
Xu Y, Jin Y, García Sánchez JS, Pérez-Lemus GR, Zubieta Rico PF, Delferro M, de Pablo JJ. A Molecular View of Methane Activation on Ni(111) through Enhanced Sampling and Machine Learning. J Phys Chem Lett 2024; 15:9852-9862. [PMID: 39298736 DOI: 10.1021/acs.jpclett.4c02237] [Indexed: 09/22/2024]
Abstract
A combination of machine learned interatomic potentials (MLIPs) and enhanced sampling simulations is used to investigate the activation of methane on a Ni(111) surface. The work entails the development and iterative refinement of MLIPs, initially trained on a data set constructed via ab initio molecular dynamics simulations, supplemented by adaptive biasing forces, to enrich the sampling of catalytically relevant configurations. Our results reveal that upon incorporation of collective variables that capture the behavior of the reactant molecule, as well as additional frames that describe the dynamic response of the catalytic surface, it is possible to enhance considerably the accuracy of predicted energies and forces. By employing enhanced sampling schemes in the refinement of the MLIP, we systematically explore the potential energy surface, leading to a refined MLIP capable of predicting density functional theory-level energies and forces and replicating key geometric characteristics of the catalytic system. The resulting free energy landscapes at several temperatures provide a detailed view of the thermodynamics and dynamics of methane activation. Specifically, as methane approaches and dissociates on the catalytic surface, the process involves the dynamic interplay of CH4 and the Ni catalyst that includes both enthalpic and entropic contributions. The progression toward the transition state involves a CH4 moiety that is increasingly restrained in its ability to rotate or translate, while the stage following the transition state is characterized by a notable rise of the Ni atom that interacts with the cleaved C-H bond. This leads to an increase in the mobility of the adsorbed species, a feature that becomes more pronounced at higher temperatures.
Affiliation(s)
- Yinan Xu
  - Pritzker School of Molecular Engineering, The University of Chicago, 640 South Ellis Avenue, Chicago, Illinois 60637, United States
- Yezhi Jin
  - Pritzker School of Molecular Engineering, The University of Chicago, 640 South Ellis Avenue, Chicago, Illinois 60637, United States
- Jireh S García Sánchez
  - Pritzker School of Molecular Engineering, The University of Chicago, 640 South Ellis Avenue, Chicago, Illinois 60637, United States
- Gustavo R Pérez-Lemus
  - Pritzker School of Molecular Engineering, The University of Chicago, 640 South Ellis Avenue, Chicago, Illinois 60637, United States
- Pablo F Zubieta Rico
  - Pritzker School of Molecular Engineering, The University of Chicago, 640 South Ellis Avenue, Chicago, Illinois 60637, United States
- Massimiliano Delferro
  - Chemical Sciences and Engineering Division, Argonne National Laboratory, 9700 South Cass Avenue, Lemont, Illinois 60439, United States
- Juan J de Pablo
  - Pritzker School of Molecular Engineering, The University of Chicago, 640 South Ellis Avenue, Chicago, Illinois 60637, United States
  - Materials Science Division, Argonne National Laboratory, 9700 South Cass Avenue, Lemont, Illinois 60439, United States
2
Shirani H, Hashemianzadeh SM. Quantum-level machine learning calculations of Levodopa. Comput Biol Chem 2024; 112:108146. [PMID: 39067350 DOI: 10.1016/j.compbiolchem.2024.108146] [Received: 02/14/2024] [Revised: 06/20/2024] [Accepted: 07/08/2024] [Indexed: 07/30/2024]
Abstract
Many drug molecules contain functional groups, resulting in a torsional barrier corresponding to rotation around the bond linking the fragments. In medicinal chemistry and pharmaceutical sciences, including drug design studies, the accurate calculation of the potential energy surface (PES) of these molecular torsions is extremely important. Machine learning (ML), including deep learning (DL), is currently one of the most rapidly evolving tools in computer-aided drug discovery and molecular simulations. In this work, we used the ANI-1x neural network potential as a quantum-level ML method to predict the PESs of the antiparkinsonian drug molecule L-3,4-dihydroxyphenylalanine (Levodopa). The electronic energies and structural parameters calculated by density functional theory (DFT) using the wB97X functional and all possible Pople basis sets indicated that the 6-31G(d) basis set, when used with the wB97X functional, exhibits behavior similar to that of the ANI-1x model. The investigation of vibrational frequencies showed a linear correlation between DFT and ML data. All ANI-1x calculations were completed in a very short computing time. From this perspective, we expect the ANI-1x dataset applied in this work to be appreciably efficient and effective in computational structure-based drug design studies.
Affiliation(s)
- Hossein Shirani
  - Molecular Simulation Research Laboratory, Department of Chemistry, Iran University of Science and Technology, P.O. Box 16846-13114, Tehran, Iran
- Seyed Majid Hashemianzadeh
  - Molecular Simulation Research Laboratory, Department of Chemistry, Iran University of Science and Technology, P.O. Box 16846-13114, Tehran, Iran
3
Gupta AK, Stulajter MM, Shaidu Y, Neaton JB, de Jong WA. Equivariant Neural Networks Utilizing Molecular Clusters for Accurate Molecular Crystal Lattice Energy Predictions. ACS Omega 2024; 9:40269-40282. [PMID: 39346862 PMCID: PMC11425815 DOI: 10.1021/acsomega.4c07434] [Received: 08/13/2024] [Revised: 08/27/2024] [Accepted: 09/02/2024] [Indexed: 10/01/2024]
Abstract
Equivariant neural networks have emerged as prominent models in advancing the construction of interatomic potentials due to their remarkable data efficiency and generalization capabilities for out-of-distribution data. Here, we expand the utility of these networks to the prediction of crystal structures consisting of organic molecules. Traditional methods for computing crystal structure properties, such as plane-wave quantum chemical methods based on density functional theory (DFT), are prohibitively resource-intensive, often necessitating compromises in accuracy and the choice of exchange-correlation functional. We present an approach that leverages the efficiency and transferability of equivariant neural networks, specifically Allegro, to predict molecular crystal structure energies at a reduced computational cost. Our neural network is trained on molecular clusters using a highly accurate Gaussian-type orbital (GTO)-based method as the target level of theory, eliminating the need for costly periodic DFT calculations, while providing access to all families of exchange-correlation functionals and post-Hartree-Fock methods. The trained model exhibits remarkable accuracy in predicting lattice energies, aligning closely with those computed by plane-wave based DFT methods, thus representing significant cost reductions. Furthermore, the Allegro network was seamlessly integrated with the USPEX framework, accelerating the discovery of low-energy crystal structures during crystal structure prediction.
Affiliation(s)
- Ankur K Gupta
  - Applied Mathematics and Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- Miko M Stulajter
  - Applied Mathematics and Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- Yusuf Shaidu
  - Department of Physics, University of California Berkeley, Berkeley, California 94720, United States
  - Materials Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- Jeffrey B Neaton
  - Department of Physics, University of California Berkeley, Berkeley, California 94720, United States
  - Materials Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
  - Kavli Energy NanoSciences Institute at Berkeley, Berkeley, California 94720, United States
- Wibe A de Jong
  - Applied Mathematics and Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
4
Kneiding H, Balcells D. Augmenting genetic algorithms with machine learning for inverse molecular design. Chem Sci 2024:d4sc02934h. [PMID: 39296997 PMCID: PMC11404003 DOI: 10.1039/d4sc02934h] [Received: 05/03/2024] [Accepted: 09/09/2024] [Indexed: 09/21/2024] Open
Abstract
Evolutionary and machine learning methods have been successfully applied to the generation of molecules and materials exhibiting desired properties. The combination of these two paradigms in inverse design tasks can yield powerful methods that explore massive chemical spaces more efficiently, improving the quality of the generated compounds. However, such synergistic approaches are still an incipient area of research and appear underexplored in the literature. This perspective covers different ways of incorporating machine learning approaches into evolutionary learning frameworks, with the overall goal of increasing the optimization efficiency of genetic algorithms. In particular, machine learning surrogate models for faster fitness function evaluation, discriminator models to control population diversity on-the-fly, machine learning based crossover operations, and evolution in latent space are discussed. The further potential of these synergistic approaches in generative tasks is also assessed, outlining promising directions for future developments.
Affiliation(s)
- Hannes Kneiding
  - Hylleraas Centre for Quantum Molecular Sciences, Department of Chemistry, University of Oslo P.O. Box 1033, Blindern 0315 Oslo Norway
- David Balcells
  - Hylleraas Centre for Quantum Molecular Sciences, Department of Chemistry, University of Oslo P.O. Box 1033, Blindern 0315 Oslo Norway
5
Rothchild D, Rosen AS, Taw E, Robinson C, Gonzalez JE, Krishnapriyan AS. Investigating the behavior of diffusion models for accelerating electronic structure calculations. Chem Sci 2024; 15:13506-13522. [PMID: 39183908 PMCID: PMC11339969 DOI: 10.1039/d3sc05877h] [Received: 11/03/2023] [Accepted: 07/11/2024] [Indexed: 08/27/2024] Open
Abstract
We present an investigation of diffusion models for molecular generation, with the aim of better understanding how their predictions compare to the results of physics-based calculations. The investigation into these models is driven by their potential to significantly accelerate electronic structure calculations using machine learning, without requiring expensive first-principles datasets for training interatomic potentials. We find that the inference process of a popular diffusion model for de novo molecular generation is divided into an exploration phase, where the model chooses the atomic species, and a relaxation phase, where it adjusts the atomic coordinates to find a low-energy geometry. As training proceeds, we show that the model initially learns about the first-order structure of the potential energy surface, and then later learns about higher-order structure. We also find that the relaxation phase of the diffusion model can be re-purposed to sample the Boltzmann distribution over conformations and to carry out structure relaxations. For structure relaxations, the model finds geometries with ∼10× lower energy than those produced by a classical force field for small organic molecules. Initializing a density functional theory (DFT) relaxation at the diffusion-produced structures yields a >2× speedup to the DFT relaxation when compared to initializing at structures relaxed with a classical force field.
Affiliation(s)
- Daniel Rothchild
  - Department of Electrical Engineering and Computer Science, University of California Berkeley USA
- Andrew S Rosen
  - Department of Materials Science and Engineering, University of California Berkeley USA
  - Materials Science Division, Lawrence Berkeley National Laboratory USA
- Eric Taw
  - Department of Chemical and Biomolecular Engineering, University of California Berkeley USA
  - Materials Science Division, Lawrence Berkeley National Laboratory USA
- Connie Robinson
  - Department of Chemistry, University of California Berkeley USA
- Joseph E Gonzalez
  - Department of Electrical Engineering and Computer Science, University of California Berkeley USA
- Aditi S Krishnapriyan
  - Department of Electrical Engineering and Computer Science, University of California Berkeley USA
  - Department of Chemical and Biomolecular Engineering, University of California Berkeley USA
6
Plé T, Adjoua O, Lagardère L, Piquemal JP. FeNNol: An efficient and flexible library for building force-field-enhanced neural network potentials. J Chem Phys 2024; 161:042502. [PMID: 39051830 DOI: 10.1063/5.0217688] [Received: 05/06/2024] [Accepted: 06/28/2024] [Indexed: 07/27/2024] Open
Abstract
Neural network interatomic potentials (NNPs) have recently proven to be powerful tools to accurately model complex molecular systems while bypassing the high numerical cost of ab initio molecular dynamics simulations. In recent years, numerous advances in model architectures as well as the development of hybrid models combining machine-learning (ML) with more traditional, physically motivated, force-field interactions have considerably increased the design space of ML potentials. In this paper, we present FeNNol, a new library for building, training, and running force-field-enhanced neural network potentials. It provides a flexible and modular system for building hybrid models, allowing us to easily combine state-of-the-art embeddings with ML-parameterized physical interaction terms without the need for explicit programming. Furthermore, FeNNol leverages the automatic differentiation and just-in-time compilation features of the JAX Python library to enable fast evaluation of NNPs, shrinking the performance gap between ML potentials and standard force-fields. This is demonstrated with the popular ANI-2x model reaching simulation speeds nearly on par with the AMOEBA polarizable force-field on commodity GPUs (graphics processing units). We hope that FeNNol will facilitate the development and application of new hybrid NNP architectures for a wide range of molecular simulation problems.
Affiliation(s)
- Thomas Plé
  - Sorbonne Université, LCT, UMR 7616 CNRS, 75005 Paris, France
- Olivier Adjoua
  - Sorbonne Université, LCT, UMR 7616 CNRS, 75005 Paris, France
- Louis Lagardère
  - Sorbonne Université, LCT, UMR 7616 CNRS, 75005 Paris, France
7
Li T, Huls NJ, Lu S, Hou P. Unsupervised manifold embedding to encode molecular quantum information for supervised learning of chemical data. Commun Chem 2024; 7:133. [PMID: 38862828 PMCID: PMC11166954 DOI: 10.1038/s42004-024-01217-z] [Received: 04/26/2024] [Accepted: 06/03/2024] [Indexed: 06/13/2024] Open
Abstract
Molecular representation is critical in chemical machine learning. It governs the complexity of model development and the fulfillment of training data to avoid either over- or under-fitting. As electronic structures and associated attributes are the root cause for molecular interactions and their manifested properties, we have sought to examine the local electron information on a molecular manifold to understand and predict molecular interactions. Our efforts led to the development of a lower-dimensional representation of a molecular manifold, Manifold Embedding of Molecular Surface (MEMS), to embody surface electronic quantities. By treating a molecular surface as a manifold and computing its embeddings, the embedded electronic attributes retain the chemical intuition of molecular interactions. MEMS can be further featurized as input for chemical learning. Our solubility prediction with MEMS demonstrated the feasibility of both shallow and deep learning by neural networks, suggesting that MEMS is expressive and robust against dimensionality reduction.
Affiliation(s)
- Tonglei Li
  - Department of Industrial and Molecular Pharmaceutics, Purdue University, West Lafayette, 47907, IN, USA
- Nicholas J Huls
  - Department of Industrial and Molecular Pharmaceutics, Purdue University, West Lafayette, 47907, IN, USA
- Shan Lu
  - Department of Industrial and Molecular Pharmaceutics, Purdue University, West Lafayette, 47907, IN, USA
- Peng Hou
  - Department of Industrial and Molecular Pharmaceutics, Purdue University, West Lafayette, 47907, IN, USA
8
Wang G, Wang C, Zhang X, Li Z, Zhou J, Sun Z. Machine learning interatomic potential: Bridge the gap between small-scale models and realistic device-scale simulations. iScience 2024; 27:109673. [PMID: 38646181 PMCID: PMC11033164 DOI: 10.1016/j.isci.2024.109673] [Indexed: 04/23/2024] Open
Abstract
Machine learning interatomic potential (MLIP) overcomes the challenges of high computational costs in density-functional theory and the relatively low accuracy in classical large-scale molecular dynamics, facilitating more efficient and precise simulations in materials research and design. In this review, the current state of the four essential stages of MLIP is discussed, including data generation methods, material structure descriptors, six unique machine learning algorithms, and available software. Furthermore, the applications of MLIP in various fields are investigated, notably in phase-change memory materials, structure searching, material property prediction, and pre-trained universal models. Finally, future perspectives are reported, covering standard datasets, transferability, generalization, and the trade-off between accuracy and complexity in MLIPs.
Affiliation(s)
- Guanjie Wang
  - School of Materials Science and Engineering, Beihang University, Beijing 100191, China
  - School of Integrated Circuit Science and Engineering, Beihang University, Beijing 100191, China
- Changrui Wang
  - School of Materials Science and Engineering, Beihang University, Beijing 100191, China
- Xuanguang Zhang
  - School of Materials Science and Engineering, Beihang University, Beijing 100191, China
- Zefeng Li
  - School of Materials Science and Engineering, Beihang University, Beijing 100191, China
- Jian Zhou
  - School of Materials Science and Engineering, Beihang University, Beijing 100191, China
- Zhimei Sun
  - School of Materials Science and Engineering, Beihang University, Beijing 100191, China
9
Helal H, Firoz J, Bilbrey JA, Sprueill H, Herman KM, Krell MM, Murray T, Roldan ML, Kraus M, Li A, Das P, Xantheas SS, Choudhury S. Acceleration of Graph Neural Network-Based Prediction Models in Chemistry via Co-Design Optimization on Intelligence Processing Units. J Chem Inf Model 2024; 64:1568-1580. [PMID: 38382011 DOI: 10.1021/acs.jcim.3c01312] [Indexed: 02/23/2024]
Abstract
Atomic structure prediction and associated property calculations are the bedrock of chemical physics. Since high-fidelity ab initio modeling techniques for computing the structure and properties can be prohibitively expensive, this motivates the development of machine-learning (ML) models that make these predictions more efficiently. Training graph neural networks over large atomistic databases introduces unique computational challenges, such as the need to process millions of small graphs with variable size and support communication patterns that are distinct from learning over large graphs, such as social networks. We demonstrate a novel hardware-software codesign approach to scale up the training of atomistic graph neural networks (GNN) for structure and property prediction. First, to eliminate redundant computation and memory associated with alternative padding techniques and to improve throughput via minimizing communication, we formulate the effective coalescing of the batches of variable-size atomistic graphs as the bin packing problem and introduce a hardware-agnostic algorithm to pack these batches. In addition, we propose hardware-specific optimizations, including a planner and vectorization for the gather-scatter operations targeted for Graphcore's Intelligence Processing Unit (IPU), as well as model-specific optimizations such as merged communication collectives and optimized softplus. Putting these all together, we demonstrate the effectiveness of the proposed codesign approach by providing an implementation of a well-established atomistic GNN on the Graphcore IPUs. We evaluate the training performance on multiple atomistic graph databases with varying degrees of graph counts, sizes, and sparsity. We demonstrate that such a codesign approach can reduce the training time of atomistic GNNs and can improve their performance by up to 1.5× compared to the baseline implementation of the model on the IPUs. 
Additionally, we compare our IPU implementation with an Nvidia GPU-based implementation and show that our atomistic GNN implementation on the IPUs can run 1.8× faster on average compared to the execution time on the GPUs.
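The abstract above frames the batching of variable-size atomistic graphs as a bin packing problem. The abstract does not say which packing algorithm the authors use, so the following is only a minimal sketch of the general idea using the classic first-fit-decreasing heuristic as a stand-in. The function names `pack_graphs` and `padding_fraction`, the node-count capacity of 16, and the example sizes are all hypothetical.

```python
from typing import List


def pack_graphs(num_nodes: List[int], capacity: int) -> List[List[int]]:
    """Group graph indices into packs whose total node count is <= capacity.

    Uses first-fit decreasing (an assumed stand-in, not the paper's algorithm):
    visit graphs from largest to smallest and place each one into the first
    pack that still has room, opening a new pack only when none fits.
    """
    order = sorted(range(len(num_nodes)), key=lambda i: num_nodes[i], reverse=True)
    packs: List[List[int]] = []  # graph indices per pack
    free: List[int] = []         # remaining node slots per pack
    for i in order:
        n = num_nodes[i]
        if n > capacity:
            raise ValueError(f"graph {i} has {n} nodes, above capacity {capacity}")
        for p, room in enumerate(free):
            if n <= room:
                packs[p].append(i)
                free[p] -= n
                break
        else:
            # No existing pack can hold this graph, so open a new one.
            packs.append([i])
            free.append(capacity - n)
    return packs


def padding_fraction(num_nodes: List[int], packs: List[List[int]], capacity: int) -> float:
    """Fraction of allocated node slots that hold padding rather than real nodes."""
    return 1.0 - sum(num_nodes) / (len(packs) * capacity)


# Hypothetical node counts for eight small molecular graphs.
sizes = [9, 3, 5, 12, 7, 4, 6, 2]
packs = pack_graphs(sizes, capacity=16)
waste = padding_fraction(sizes, packs, capacity=16)
```

For comparison, padding every graph in this toy example to the largest size (12 nodes) would allocate 96 slots for 48 real nodes, whereas packing fills fixed-size packs with far less padding. That is the kind of redundant computation and memory the abstract says packing is meant to eliminate.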
Affiliation(s)
- Hatem Helal
  - Graphcore, Kett House, Station Rd, Cambridge CB1 2JH, U.K.
- Jesun Firoz
  - Advanced Computing, Mathematics and Data Division, Pacific Northwest National Laboratory, 1100 Dexter Ave N, Seattle, Washington 98109, United States
- Jenna A Bilbrey
  - Artificial Intelligence and Data Analytics Division, Pacific Northwest National Laboratory, 902 Battelle Boulevard, Richland, Washington 99352, United States
- Henry Sprueill
  - Artificial Intelligence and Data Analytics Division, Pacific Northwest National Laboratory, 902 Battelle Boulevard, Richland, Washington 99352, United States
- Kristina M Herman
  - Department of Chemistry, University of Washington, Seattle, Washington 98185, United States
- Tom Murray
  - Graphcore, Kett House, Station Rd, Cambridge CB1 2JH, U.K.
- Mike Kraus
  - Graphcore, Kett House, Station Rd, Cambridge CB1 2JH, U.K.
- Ang Li
  - Advanced Computing, Mathematics and Data Division, Pacific Northwest National Laboratory, 902 Battelle Boulevard, Richland, Washington 99352, United States
- Payel Das
  - IBM Research, Yorktown Heights, New York 10598, United States
- Sotiris S Xantheas
  - Department of Chemistry, University of Washington, Seattle, Washington 98185, United States
  - Advanced Computing, Mathematics and Data Division, Pacific Northwest National Laboratory, 902 Battelle Boulevard, Richland, Washington 99352, United States
- Sutanay Choudhury
  - Advanced Computing, Mathematics and Data Division, Pacific Northwest National Laboratory, 902 Battelle Boulevard, Richland, Washington 99352, United States
10
Pathirage PDVS, Phillips JT, Vogiatzis KD. Exploration of the Two-Electron Excitation Space with Data-Driven Coupled Cluster. J Phys Chem A 2024. [PMID: 38422511 DOI: 10.1021/acs.jpca.3c06600] [Indexed: 03/02/2024]
Abstract
Computational cost limits the applicability of post-Hartree-Fock methods such as coupled-cluster on larger molecular systems. The data-driven coupled-cluster (DDCC) method applies machine learning to predict the coupled-cluster two-electron amplitudes (t2) using data from second-order perturbation theory (MP2). One major limitation of the DDCC models is the size of training sets that increases exponentially with the system size. Effective sampling of the amplitude space can resolve this issue. Five different amplitude selection techniques that reduce the amount of data used for training were evaluated, an approach that also prevents model overfitting and increases the portability of data-driven coupled-cluster singles and doubles to more complex molecules or larger basis sets. In combination with a localized orbital formalism to predict the CCSD t2 amplitudes, we have achieved a 10-fold error reduction for energy calculations.
Affiliation(s)
- P D Varuna S Pathirage
  - Department of Chemistry, University of Tennessee, Knoxville, Tennessee 37996-1600, United States
- Justin T Phillips
  - Department of Chemistry, University of Tennessee, Knoxville, Tennessee 37996-1600, United States
- Konstantinos D Vogiatzis
  - Department of Chemistry, University of Tennessee, Knoxville, Tennessee 37996-1600, United States
11
Isert C, Atz K, Riniker S, Schneider G. Exploring protein-ligand binding affinity prediction with electron density-based geometric deep learning. RSC Adv 2024; 14:4492-4502. [PMID: 38312732 PMCID: PMC10835705 DOI: 10.1039/d3ra08650j] [Received: 12/18/2023] [Accepted: 01/19/2024] [Indexed: 02/06/2024] Open
Abstract
Rational structure-based drug design relies on accurate predictions of protein-ligand binding affinity from structural molecular information. Although deep learning-based methods for predicting binding affinity have shown promise in computational drug design, certain approaches have faced criticism for their potential to inadequately capture the fundamental physical interactions between ligands and their macromolecular targets or for being susceptible to dataset biases. Herein, we propose to include bond-critical points based on the electron density of a protein-ligand complex as a fundamental physical representation of protein-ligand interactions. Employing a geometric deep learning model, we explore the usefulness of these bond-critical points to predict absolute binding affinities of protein-ligand complexes, benchmark model performance against existing methods, and provide a critical analysis of this new approach. The models achieved root-mean-squared errors of 1.4-1.8 log units on the PDBbind dataset, and 1.0-1.7 log units on the PDE10A dataset, not indicating significant advantages over benchmark methods, and thus rendering the utility of electron density for deep learning models context-dependent. The relationship between intermolecular electron density and corresponding binding affinity was analyzed, and Pearson correlation coefficients r > 0.7 were obtained for several macromolecular targets.
Affiliation(s)
- Clemens Isert
  - ETH Zurich, Department of Chemistry and Applied Biosciences Vladimir-Prelog-Weg 4 8093 Zurich Switzerland +41 44 633 73 27
- Kenneth Atz
  - ETH Zurich, Department of Chemistry and Applied Biosciences Vladimir-Prelog-Weg 4 8093 Zurich Switzerland +41 44 633 73 27
- Sereina Riniker
  - ETH Zurich, Department of Chemistry and Applied Biosciences Vladimir-Prelog-Weg 4 8093 Zurich Switzerland +41 44 633 73 27
- Gisbert Schneider
  - ETH Zurich, Department of Chemistry and Applied Biosciences Vladimir-Prelog-Weg 4 8093 Zurich Switzerland +41 44 633 73 27
12
Venturella C, Hillenbrand C, Li J, Zhu T. Machine Learning Many-Body Green's Functions for Molecular Excitation Spectra. J Chem Theory Comput 2024; 20:143-154. [PMID: 38150268 DOI: 10.1021/acs.jctc.3c01146] [Indexed: 12/28/2023]
Abstract
We present a machine learning (ML) framework for predicting Green's functions of molecular systems, from which photoemission spectra and quasiparticle energies at quantum many-body level can be obtained. Kernel ridge regression is adopted to predict self-energy matrix elements on compact imaginary frequency grids from static and dynamical mean-field electronic features, which gives direct access to real-frequency many-body Green's functions through analytic continuation and Dyson's equation. Feature and self-energy matrices are represented in a symmetry-adapted intrinsic atomic orbital plus projected atomic orbital basis to enforce rotational invariance. We demonstrate good transferability and high data efficiency of the proposed ML method across molecular sizes and chemical species by showing accurate predictions of density of states (DOS) and quasiparticle energies at the level of many-body perturbation theory (GW) or full configuration interaction. For the ML model trained on 48 out of 1995 molecules randomly sampled from the QM7 and QM9 data sets, we report the mean absolute errors of ML-predicted highest occupied and lowest unoccupied molecular orbital energies to be 0.13 and 0.10 eV, respectively, compared to GW@PBE0. We further showcase the capability of this method by applying the same ML model to predict DOS for significantly larger organic molecules with up to 44 heavy atoms.
Affiliation(s)
- Christian Venturella
  - Department of Chemistry, Yale University, New Haven, Connecticut 06520, United States
- Jiachen Li
  - Department of Chemistry, Yale University, New Haven, Connecticut 06520, United States
- Tianyu Zhu
  - Department of Chemistry, Yale University, New Haven, Connecticut 06520, United States
13
Wang Y, Wang T, Li S, He X, Li M, Wang Z, Zheng N, Shao B, Liu TY. Enhancing geometric representations for molecules with equivariant vector-scalar interactive message passing. Nat Commun 2024; 15:313. [PMID: 38182565 PMCID: PMC10770089 DOI: 10.1038/s41467-023-43720-2] [Received: 05/04/2023] [Accepted: 11/16/2023] [Indexed: 01/07/2024] Open
Abstract
Geometric deep learning has been revolutionizing the molecular modeling field. Although state-of-the-art neural network models are approaching ab initio accuracy for molecular property prediction, their applications, such as drug discovery and molecular dynamics (MD) simulation, have been hindered by insufficient utilization of geometric information and high computational costs. Here we propose an equivariant geometry-enhanced graph neural network called ViSNet, which elegantly extracts geometric features and efficiently models molecular structures with low computational costs. Our proposed ViSNet outperforms state-of-the-art approaches on multiple MD benchmarks, including MD17, revised MD17 and MD22, and achieves excellent chemical property prediction on QM9 and Molecule3D datasets. Furthermore, through a series of simulations and case studies, ViSNet can efficiently explore the conformational space and provide reasonable interpretability to map geometric representations to molecular structures.
Affiliation(s)
- Yusong Wang
  - Microsoft Research AI4Science, 100080, Beijing, China
  - National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, National Engineering Research Center for Visual Information and Applications, and Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, 710049, Xi'an, China
- Tong Wang
  - Microsoft Research AI4Science, 100080, Beijing, China
- Shaoning Li
  - Microsoft Research AI4Science, 100080, Beijing, China
- Xinheng He
  - Microsoft Research AI4Science, 100080, Beijing, China
  - The CAS Key Laboratory of Receptor Research and State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 201203, Shanghai, China
  - University of Chinese Academy of Sciences, 100049, Beijing, China
- Mingyu Li
  - Microsoft Research AI4Science, 100080, Beijing, China
  - Medicinal Chemistry and Bioinformatics Center, School of Medicine, Shanghai Jiaotong University, Shanghai, 200025, China
- Zun Wang
  - Microsoft Research AI4Science, 100080, Beijing, China
- Nanning Zheng
  - National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, National Engineering Research Center for Visual Information and Applications, and Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, 710049, Xi'an, China
- Bin Shao
  - Microsoft Research AI4Science, 100080, Beijing, China
- Tie-Yan Liu
  - Microsoft Research AI4Science, 100080, Beijing, China
|
14
|
Stark W, Westermayr J, Douglas-Gallardo OA, Gardner J, Habershon S, Maurer RJ. Machine Learning Interatomic Potentials for Reactive Hydrogen Dynamics at Metal Surfaces Based on Iterative Refinement of Reaction Probabilities. The Journal of Physical Chemistry. C, Nanomaterials and Interfaces 2023; 127:24168-24182. [PMID: 38148847 PMCID: PMC10749455 DOI: 10.1021/acs.jpcc.3c06648] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2023] [Revised: 11/12/2023] [Accepted: 11/15/2023] [Indexed: 12/28/2023]
Abstract
The reactive chemistry of molecular hydrogen at surfaces, notably dissociative sticking and hydrogen evolution, plays a crucial role in energy storage and fuel cells. Theoretical studies can help to decipher underlying mechanisms and reaction design, but studying dynamics at surfaces is computationally challenging due to the complex electronic structure at interfaces and the high sensitivity of dynamics to reaction barriers. In addition, ab initio molecular dynamics, based on density functional theory, is too computationally demanding to accurately predict reactive sticking or desorption probabilities, as it requires averaging over tens of thousands of initial conditions. High-dimensional machine learning-based interatomic potentials are starting to be more commonly used in gas-surface dynamics, yet robust approaches to generate reliable training data and assess how model uncertainty affects the prediction of dynamic observables are not well established. Here, we employ ensemble learning to adaptively generate training data while assessing model performance with full uncertainty quantification (UQ) for reaction probabilities of hydrogen scattering on different copper facets. We use this approach to investigate the performance of two message-passing neural networks, SchNet and PaiNN. Ensemble-based UQ and iterative refinement allow us to expose the shortcomings of the invariant pairwise-distance-based feature representation in the SchNet model for gas-surface dynamics.
Affiliation(s)
- Wojciech G. Stark
  - Department of Chemistry, University of Warwick, Gibbet Hill Road, Coventry CV4 7AL, U.K.
- Julia Westermayr
  - Department of Chemistry, University of Warwick, Gibbet Hill Road, Coventry CV4 7AL, U.K.
- James Gardner
  - Department of Chemistry, University of Warwick, Gibbet Hill Road, Coventry CV4 7AL, U.K.
- Scott Habershon
  - Department of Chemistry, University of Warwick, Gibbet Hill Road, Coventry CV4 7AL, U.K.
- Reinhard J. Maurer
  - Department of Chemistry, University of Warwick, Gibbet Hill Road, Coventry CV4 7AL, U.K.
  - Department of Physics, University of Warwick, Gibbet Hill Road, Coventry CV4 7AL, U.K.
|
15
|
Hasan MN, Ray M, Saha A. Landscape of In Silico Tools for Modeling Covalent Modification of Proteins: A Review on Computational Covalent Drug Discovery. J Phys Chem B 2023; 127:9663-9684. [PMID: 37921534 DOI: 10.1021/acs.jpcb.3c04710] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/04/2023]
Abstract
Covalent drug discovery has been a challenging research area given the struggle of finding a sweet balance between selectivity and reactivity for these drugs, the lack of which often leads to off-target activities and hence undesirable side effects. However, there has been a resurgence in covalent drug design following the success of several covalent drugs such as boceprevir (2011), ibrutinib (2013), neratinib (2017), dacomitinib (2018), zanubrutinib (2019), and many others. Design of covalent drugs includes many crucial factors, where "evaluation of the binding affinity" and "a detailed mechanistic understanding on covalent inhibition" are at the top of the list. Well-defined experimental techniques are available to elucidate these factors; however, often they are expensive and/or time-consuming and hence not suitable for high throughput screens. Recent developments in in silico methods provide promise in this direction. In this report, we review a set of recent publications that focused on developing and/or implementing novel in silico techniques in "Computational Covalent Drug Discovery (CCDD)". We also discuss the advantages and disadvantages of these approaches along with what improvements are required to make it a great tool in medicinal chemistry in the near future.
Affiliation(s)
- Md Nazmul Hasan
  - Department of Chemistry and Biochemistry, University of Wisconsin-Milwaukee, Milwaukee, Wisconsin 53211, United States
- Manisha Ray
  - Department of Chemistry and Biochemistry, Loyola University Chicago, Chicago, Illinois 60660, United States
- Arjun Saha
  - Department of Chemistry and Biochemistry, University of Wisconsin-Milwaukee, Milwaukee, Wisconsin 53211, United States
|
16
|
Plé T, Lagardère L, Piquemal JP. Force-field-enhanced neural network interactions: from local equivariant embedding to atom-in-molecule properties and long-range effects. Chem Sci 2023; 14:12554-12569. [PMID: 38020379 PMCID: PMC10646944 DOI: 10.1039/d3sc02581k] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 05/22/2023] [Accepted: 10/03/2023] [Indexed: 12/01/2023] Open
Abstract
We introduce FENNIX (Force-Field-Enhanced Neural Network InteraXions), a hybrid approach between machine-learning and force-fields. We leverage state-of-the-art equivariant neural networks to predict local energy contributions and multiple atom-in-molecule properties that are then used as geometry-dependent parameters for physically-motivated energy terms which account for long-range electrostatics and dispersion. Using high-accuracy ab initio data (small organic molecules/dimers), we trained a first version of the model. Exhibiting accurate gas-phase energy predictions, FENNIX is transferable to the condensed phase. It is able to produce stable Molecular Dynamics simulations, including nuclear quantum effects, for water predicting accurate liquid properties. The extrapolating power of the hybrid physically-driven machine learning FENNIX approach is exemplified by computing: (i) the solvated alanine dipeptide free energy landscape; (ii) the reactive dissociation of small molecules.
Affiliation(s)
- Thomas Plé
  - Sorbonne Université, LCT, UMR 7616 CNRS F-75005 Paris France thomas.ple@sorbonne-université louis.lagardere@sorbonne-université jean-philip.piquemal@sorbonne-université
- Louis Lagardère
  - Sorbonne Université, LCT, UMR 7616 CNRS F-75005 Paris France thomas.ple@sorbonne-université louis.lagardere@sorbonne-université jean-philip.piquemal@sorbonne-université
- Jean-Philip Piquemal
  - Sorbonne Université, LCT, UMR 7616 CNRS F-75005 Paris France thomas.ple@sorbonne-université louis.lagardere@sorbonne-université jean-philip.piquemal@sorbonne-université
|
17
|
Chen Z, Wing-Wah Yam V. Encoding Hole-Particle Information in the Multi-Channel MolOrbImage for Machine-Learned Excited-State Energies of Large Photofunctional Materials. J Am Chem Soc 2023; 145:24098-24107. [PMID: 37874942 DOI: 10.1021/jacs.3c07766] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2023]
Abstract
We present a novel class of one-electron multi-channel molecular orbital images (MolOrbImages) designed for the prediction of excited-state energetics in conjunction with the state-of-the-art VGG-type machine-learning architecture. By representing hole and particle states in the excitation process as channels of MolOrbImages, the revised VGG model achieves excellent prediction accuracy for both low-lying singlet and triplet states, with mean absolute errors (MAEs) of <0.08 and <0.1 eV for QM9 molecules and large photofunctional materials with up to 560 atoms, respectively. Remarkably, the model demonstrates exceptional performance (MAE < 1 kcal/mol) for the T1 state of QM9 molecules, making it a non-system-specific model that approaches chemical accuracy. The general rules attained, for instance, the improved performance with well-defined MO energies and the reduced overfitting concern via the inclusion of physically insightful hole-particle information, provide invaluable guidelines for the further design of orbital-based descriptors targeting molecular excited states.
Affiliation(s)
- Ziyong Chen
  - Institute of Molecular Functional Materials and Department of Chemistry, The University of Hong Kong, Pokfulam Road, Hong Kong, China
- Vivian Wing-Wah Yam
  - Institute of Molecular Functional Materials and Department of Chemistry, The University of Hong Kong, Pokfulam Road, Hong Kong, China
  - Hong Kong Quantum AI Lab Ltd., Hong Kong Science Park, Hong Kong, China
|
18
|
Watson L, Pope T, Jay RM, Banerjee A, Wernet P, Penfold TJ. A Δ-learning strategy for interpretation of spectroscopic observables. Structural Dynamics (Melville, N.Y.) 2023; 10:064101. [PMID: 37941993 PMCID: PMC10629969 DOI: 10.1063/4.0000215] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2023] [Accepted: 10/17/2023] [Indexed: 11/10/2023]
Abstract
Accurate computations of experimental observables are essential for interpreting the high information content held within x-ray spectra. However, for complicated systems this can be difficult, a challenge compounded when dynamics becomes important owing to the large number of calculations required to capture the time-evolving observable. While machine learning architectures have been shown to represent a promising approach for rapidly predicting spectral lineshapes, achieving simultaneously accurate and sufficiently comprehensive training data is challenging. Herein, we introduce Δ-learning for x-ray spectroscopy. Instead of directly learning the structure-spectrum relationship, the Δ-model learns the structure dependent difference between a higher and lower level of theory. Consequently, once developed these models can be used to translate spectral shapes obtained from lower levels of theory to mimic those corresponding to higher levels of theory. Ultimately, this achieves accurate simulations with a much reduced computational burden as only the lower level of theory is computed, while the model can instantaneously transform this to a spectrum equivalent to a higher level of theory. Our present model, demonstrated herein, learns the difference between TDDFT(BLYP) and TDDFT(B3LYP) spectra. Its effectiveness is illustrated using simulations of Rh L3-edge spectra tracking the C-H activation of octane by a cyclopentadienyl rhodium carbonyl complex.
Affiliation(s)
- Luke Watson
  - Chemistry, School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne NE1 7RU, United Kingdom
- Thomas Pope
  - Chemistry, School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne NE1 7RU, United Kingdom
- Raphael M. Jay
  - Department of Physics and Astronomy, Uppsala University, 751 20 Uppsala, Sweden
- Ambar Banerjee
  - Department of Physics and Astronomy, Uppsala University, 751 20 Uppsala, Sweden
- Philippe Wernet
  - Department of Physics and Astronomy, Uppsala University, 751 20 Uppsala, Sweden
- Thomas J. Penfold
  - Chemistry, School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne NE1 7RU, United Kingdom
|
19
|
Ng WP, Liang Q, Yang J. Low-Data Deep Quantum Chemical Learning for Accurate MP2 and Coupled-Cluster Correlations. J Chem Theory Comput 2023; 19:5439-5449. [PMID: 37506400 DOI: 10.1021/acs.jctc.3c00518] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/30/2023]
Abstract
Accurate ab initio prediction of electronic energies is very expensive for macromolecules by explicitly solving post-Hartree-Fock equations. We here exploit the physically justified local correlation feature in a compact basis of small molecules and construct an expressive low-data deep neural network (dNN) model to obtain machine-learned electron correlation energies on par with MP2 and CCSD levels of theory for more complex molecules and different datasets that are not represented in the training set. We show that our dNN-powered model is data efficient and makes highly transferable predictions across alkanes of various lengths, organic molecules with non-covalent and biomolecular interactions, as well as water clusters of different sizes and morphologies. In particular, by training 800 (H2O)8 clusters with the local correlation descriptors, accurate MP2/cc-pVTZ correlation energies up to (H2O)128 can be predicted with a small random error within chemical accuracy from exact values, while a majority of prediction deviations are attributed to an intrinsically systematic error. Our results reveal that an extremely compact local correlation feature set, which is poor for any direct post-Hartree-Fock calculations, has however a prominent advantage in reserving important electron correlation patterns for making accurate transferable predictions across distinct molecular compositions, bond types, and geometries.
Affiliation(s)
- Wai-Pan Ng
  - Department of Chemistry, The University of Hong Kong, Hong Kong 999077, P. R. China
  - Hong Kong Quantum AI Lab Limited, Hong Kong 999077, P. R. China
- Qiujiang Liang
  - Department of Chemistry, The University of Hong Kong, Hong Kong 999077, P. R. China
- Jun Yang
  - Department of Chemistry, The University of Hong Kong, Hong Kong 999077, P. R. China
  - Hong Kong Quantum AI Lab Limited, Hong Kong 999077, P. R. China
|
20
|
Wang H, Fu T, Du Y, Gao W, Huang K, Liu Z, Chandak P, Liu S, Van Katwyk P, Deac A, Anandkumar A, Bergen K, Gomes CP, Ho S, Kohli P, Lasenby J, Leskovec J, Liu TY, Manrai A, Marks D, Ramsundar B, Song L, Sun J, Tang J, Veličković P, Welling M, Zhang L, Coley CW, Bengio Y, Zitnik M. Scientific discovery in the age of artificial intelligence. Nature 2023; 620:47-60. [PMID: 37532811 DOI: 10.1038/s41586-023-06221-2] [Citation(s) in RCA: 113] [Impact Index Per Article: 113.0] [Received: 03/30/2022] [Accepted: 05/16/2023] [Indexed: 08/04/2023]
Abstract
Artificial intelligence (AI) is being increasingly integrated into scientific discovery to augment and accelerate research, helping scientists to generate hypotheses, design experiments, collect and interpret large datasets, and gain insights that might not have been possible using traditional scientific methods alone. Here we examine breakthroughs over the past decade that include self-supervised learning, which allows models to be trained on vast amounts of unlabelled data, and geometric deep learning, which leverages knowledge about the structure of scientific data to enhance model accuracy and efficiency. Generative AI methods can create designs, such as small-molecule drugs and proteins, by analysing diverse data modalities, including images and sequences. We discuss how these methods can help scientists throughout the scientific process and the central issues that remain despite such advances. Both developers and users of AI tools need a better understanding of when such approaches need improvement, and challenges posed by poor data quality and stewardship remain. These issues cut across scientific disciplines and require developing foundational algorithmic approaches that can contribute to scientific understanding or acquire it autonomously, making them critical areas of focus for AI innovation.
Affiliation(s)
- Hanchen Wang
  - Department of Engineering, University of Cambridge, Cambridge, UK
  - Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, USA
  - Department of Research and Early Development, Genentech Inc, South San Francisco, CA, USA
  - Department of Computer Science, Stanford University, Stanford, CA, USA
- Tianfan Fu
  - Department of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA, USA
- Yuanqi Du
  - Department of Computer Science, Cornell University, Ithaca, NY, USA
- Wenhao Gao
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Kexin Huang
  - Department of Computer Science, Stanford University, Stanford, CA, USA
- Ziming Liu
  - Department of Physics, Massachusetts Institute of Technology, Cambridge, MA, USA
- Payal Chandak
  - Harvard-MIT Program in Health Sciences and Technology, Cambridge, MA, USA
- Shengchao Liu
  - Mila - Quebec AI Institute, Montreal, Quebec, Canada
  - Université de Montréal, Montreal, Quebec, Canada
- Peter Van Katwyk
  - Department of Earth, Environmental and Planetary Sciences, Brown University, Providence, RI, USA
  - Data Science Institute, Brown University, Providence, RI, USA
- Andreea Deac
  - Mila - Quebec AI Institute, Montreal, Quebec, Canada
  - Université de Montréal, Montreal, Quebec, Canada
- Anima Anandkumar
  - Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, USA
  - NVIDIA, Santa Clara, CA, USA
- Karianne Bergen
  - Department of Earth, Environmental and Planetary Sciences, Brown University, Providence, RI, USA
  - Data Science Institute, Brown University, Providence, RI, USA
- Carla P Gomes
  - Department of Computer Science, Cornell University, Ithaca, NY, USA
- Shirley Ho
  - Center for Computational Astrophysics, Flatiron Institute, New York, NY, USA
  - Department of Astrophysical Sciences, Princeton University, Princeton, NJ, USA
  - Department of Physics, Carnegie Mellon University, Pittsburgh, PA, USA
  - Department of Physics and Center for Data Science, New York University, New York, NY, USA
- Joan Lasenby
  - Department of Engineering, University of Cambridge, Cambridge, UK
- Jure Leskovec
  - Department of Computer Science, Stanford University, Stanford, CA, USA
- Arjun Manrai
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Debora Marks
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Le Song
  - BioMap, Beijing, China
  - Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, United Arab Emirates
- Jimeng Sun
  - University of Illinois at Urbana-Champaign, Champaign, IL, USA
- Jian Tang
  - Mila - Quebec AI Institute, Montreal, Quebec, Canada
  - HEC Montréal, Montreal, Quebec, Canada
  - CIFAR AI Chair, Toronto, Ontario, Canada
- Petar Veličković
  - Google DeepMind, London, UK
  - Department of Computer Science and Technology, University of Cambridge, Cambridge, UK
- Max Welling
  - University of Amsterdam, Amsterdam, Netherlands
  - Microsoft Research Amsterdam, Amsterdam, Netherlands
- Linfeng Zhang
  - DP Technology, Beijing, China
  - AI for Science Institute, Beijing, China
- Connor W Coley
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA
- Yoshua Bengio
  - Mila - Quebec AI Institute, Montreal, Quebec, Canada
  - Université de Montréal, Montreal, Quebec, Canada
- Marinka Zitnik
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Harvard Data Science Initiative, Cambridge, MA, USA
  - Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University, Cambridge, MA, USA
|
21
|
Domingo L, Carlo G, Borondo F. Taking advantage of noise in quantum reservoir computing. Sci Rep 2023; 13:8790. [PMID: 37258528 PMCID: PMC10232431 DOI: 10.1038/s41598-023-35461-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 02/01/2023] [Accepted: 05/18/2023] [Indexed: 06/02/2023] Open
Abstract
The biggest challenge that quantum computing and quantum machine learning are currently facing is the presence of noise in quantum devices. As a result, big efforts have been put into correcting or mitigating the induced errors. But can these two fields benefit from noise? Surprisingly, we demonstrate that under some circumstances, quantum noise can be used to improve the performance of quantum reservoir computing, a prominent and recent quantum machine learning algorithm. Our results show that the amplitude damping noise can be beneficial to machine learning, while the depolarizing and phase damping noises should be prioritized for correction. This critical result sheds new light on the physical mechanisms underlying quantum devices, providing solid practical prescriptions for a successful implementation of quantum information processing on present-day hardware.
Affiliation(s)
- L Domingo
  - Instituto de Ciencias Matemáticas (ICMAT), Campus de Cantoblanco; Nicolás Cabrera, 13-15, 28049, Madrid, Spain
  - Departamento de Química, Universidad Autónoma de Madrid, Cantoblanco, 28049, Madrid, Spain
  - Grupo de Sistemas Complejos, Universidad Politécnica de Madrid, 28035, Madrid, Spain
- G Carlo
  - Departamento de Física, Comisión Nacional de Energía Atómica, CONICET, Av. del Libertador 8250, 1429, Buenos Aires, Argentina
- F Borondo
  - Departamento de Química, Universidad Autónoma de Madrid, Cantoblanco, 28049, Madrid, Spain
|
22
|
Gong X, Li H, Zou N, Xu R, Duan W, Xu Y. General framework for E(3)-equivariant neural network representation of density functional theory Hamiltonian. Nat Commun 2023; 14:2848. [PMID: 37208320 PMCID: PMC10199065 DOI: 10.1038/s41467-023-38468-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/01/2022] [Accepted: 04/27/2023] [Indexed: 05/21/2023] Open
Abstract
The combination of deep learning and ab initio calculation has shown great promise in revolutionizing future scientific research, but how to design neural network models incorporating a priori knowledge and symmetry requirements remains a key challenge. Here we propose an E(3)-equivariant deep-learning framework to represent the density functional theory (DFT) Hamiltonian as a function of material structure, which can naturally preserve the Euclidean symmetry even in the presence of spin-orbit coupling. Our DeepH-E3 method enables efficient electronic structure calculation at ab initio accuracy by learning from DFT data of small-sized structures, making the routine study of large-scale supercells (>10^4 atoms) feasible. The method can reach sub-meV prediction accuracy at high training efficiency, showing state-of-the-art performance in our experiments. The work is not only of general significance to deep-learning method development but also creates opportunities for materials research, such as building a Moiré-twisted material database.
Affiliation(s)
- Xiaoxun Gong
  - State Key Laboratory of Low Dimensional Quantum Physics and Department of Physics, Tsinghua University, 100084, Beijing, China
  - School of Physics, Peking University, 100871, Beijing, China
- He Li
  - State Key Laboratory of Low Dimensional Quantum Physics and Department of Physics, Tsinghua University, 100084, Beijing, China
  - Institute for Advanced Study, Tsinghua University, 100084, Beijing, China
  - Tencent Quantum Laboratory, Tencent, 518057, Shenzhen, Guangdong, China
- Nianlong Zou
  - State Key Laboratory of Low Dimensional Quantum Physics and Department of Physics, Tsinghua University, 100084, Beijing, China
- Runzhang Xu
  - State Key Laboratory of Low Dimensional Quantum Physics and Department of Physics, Tsinghua University, 100084, Beijing, China
- Wenhui Duan
  - State Key Laboratory of Low Dimensional Quantum Physics and Department of Physics, Tsinghua University, 100084, Beijing, China
  - Institute for Advanced Study, Tsinghua University, 100084, Beijing, China
  - Tencent Quantum Laboratory, Tencent, 518057, Shenzhen, Guangdong, China
  - Frontier Science Center for Quantum Information, Beijing, China
- Yong Xu
  - State Key Laboratory of Low Dimensional Quantum Physics and Department of Physics, Tsinghua University, 100084, Beijing, China
  - Tencent Quantum Laboratory, Tencent, 518057, Shenzhen, Guangdong, China
  - Frontier Science Center for Quantum Information, Beijing, China
  - RIKEN Center for Emergent Matter Science (CEMS), Wako, Saitama, 351-0198, Japan
|
23
|
Huang B, von Lilienfeld OA, Krogel JT, Benali A. Toward DMC Accuracy Across Chemical Space with Scalable Δ-QML. J Chem Theory Comput 2023; 19:1711-1721. [PMID: 36857531 DOI: 10.1021/acs.jctc.2c01058] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 03/03/2023]
Abstract
In the past decade, quantum diffusion Monte Carlo (DMC) has been demonstrated to successfully predict the energetics and properties of a wide range of molecules and solids by numerically solving the electronic many-body Schrödinger equation. With O(N^3) scaling with the number of electrons N, DMC has the potential to be a reference method for larger systems that are not accessible to more traditional methods such as CCSD(T). Assessing the accuracy of DMC for smaller molecules becomes the stepping stone in making the method a reference for larger systems. We show that when coupled with quantum machine learning (QML)-based surrogate methods, the computational burden can be alleviated such that quantum Monte Carlo (QMC) shows clear potential to undergird the formation of high-quality descriptions across chemical space. We discuss three crucial approximations necessary to accomplish this: the fixed-node approximation, universal and accurate references for chemical bond dissociation energies, and scalable minimal amons-set-based QML (AQML) models. Numerical evidence presented includes converged DMC results for over 1000 small organic molecules with up to five heavy atoms used as amons and 50 medium-sized organic molecules with nine heavy atoms to validate the AQML predictions. Numerical evidence collected for Δ-AQML models suggests that already modestly sized QMC training data sets of amons suffice to predict total energies with near chemical accuracy throughout chemical space.
Affiliation(s)
- Bing Huang
  - Faculty of Physics, University of Vienna, Kolingasse 14-16, 1090 Vienna, Austria
- O Anatole von Lilienfeld
  - Departments of Chemistry, Materials Science and Engineering, and Physics, University of Toronto, St. George Campus, Toronto, ON M5S 3H6, Canada
  - Machine Learning Group, Technische Universität Berlin and Institute for the Foundations of Learning and Data, 10587 Berlin, Germany
  - Vector Institute for Artificial Intelligence, Toronto, ON M5S 1M1, Canada
- Jaron T Krogel
  - Materials Science and Technology Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, United States
- Anouar Benali
  - Computational Science Division, Argonne National Laboratory, Argonne, Illinois 60439, United States
|
24
|
Chen Y, Ou Y, Zheng P, Huang Y, Ge F, Dral PO. Benchmark of general-purpose machine learning-based quantum mechanical method AIQM1 on reaction barrier heights. J Chem Phys 2023; 158:074103. [PMID: 36813722 DOI: 10.1063/5.0137101] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 02/01/2023] Open
Abstract
Artificial intelligence-enhanced quantum mechanical method 1 (AIQM1) is a general-purpose method that was shown to achieve high accuracy for many applications with a speed close to its baseline semiempirical quantum mechanical (SQM) method ODM2*. Here, we evaluate the hitherto unknown performance of out-of-the-box AIQM1 without any refitting for reaction barrier heights on eight datasets, including a total of ∼24 thousand reactions. This evaluation shows that AIQM1's accuracy strongly depends on the type of transition state and ranges from excellent for rotation barriers to poor for, e.g., pericyclic reactions. AIQM1 clearly outperforms its baseline ODM2* method and, even more so, a popular universal potential, ANI-1ccx. Overall, however, AIQM1 accuracy largely remains similar to SQM methods (and B3LYP/6-31G* for most reaction types) suggesting that it is desirable to focus on improving AIQM1 performance for barrier heights in the future. We also show that the built-in uncertainty quantification helps in identifying confident predictions. The accuracy of confident AIQM1 predictions is approaching the level of popular density functional theory methods for most reaction types. Encouragingly, AIQM1 is rather robust for transition state optimizations, even for the type of reactions it struggles with the most. Single-point calculations with high-level methods on AIQM1-optimized geometries can be used to significantly improve barrier heights, which cannot be said for its baseline ODM2* method.
Collapse
Affiliation(s)
- Yuxinxin Chen: State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Yanchi Ou: State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Peikun Zheng: State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Yaohuang Huang: State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Fuchun Ge: State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Pavlo O Dral: State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
|
25
|
Computational Studies of Aflatoxin B1 (AFB1): A Review. Toxins (Basel) 2023; 15:toxins15020135. [PMID: 36828449 PMCID: PMC9967988 DOI: 10.3390/toxins15020135] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2022] [Revised: 01/25/2023] [Accepted: 02/03/2023] [Indexed: 02/11/2023] Open
Abstract
Aflatoxin B1 (AFB1) exhibits the most potent mutagenic and carcinogenic activity among aflatoxins. For this reason, AFB1 is recognized as a human group 1 carcinogen by the International Agency for Research on Cancer. Consequently, it is essential to determine its properties and behavior in different chemical systems. The chemical properties of AFB1 can be explored using computational chemistry, which has been employed as a complement to experimental investigations. The present review covers in silico studies (semiempirical, Hartree-Fock, DFT, molecular docking, and molecular dynamics) conducted from the first computational study in 1974 to the present (2022). The studies are organized into the following groups: (a) molecular properties of AFB1 (structure, energy, solvent effects, ground and excited states, atomic charges, among others); (b) theoretical investigations of AFB1 (degradation, quantification, reactivity, among others); (c) molecular interactions with inorganic compounds (Ag+, Zn2+, and Mg2+); (d) molecular interactions with environmental compounds (clays); and (e) molecular interactions with biological compounds (DNA, enzymes, cyclodextrins, glucans, among others). Accordingly, this work provides stakeholders with knowledge of the toxicity of different AFB1 derivatives, the structure-activity relationships manifested by the bonds between AFB1 and DNA or proteins, and the strategies that have been employed to quantify, detect, and eliminate the AFB1 molecule.
|
26
|
Bigi F, Huguenin-Dumittan KK, Ceriotti M, Manolopoulos DE. A smooth basis for atomistic machine learning. J Chem Phys 2022; 157:234101. [PMID: 36550032 DOI: 10.1063/5.0124363] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022] Open
Abstract
Machine learning frameworks based on correlations of interatomic positions begin with a discretized description of the density of other atoms in the neighborhood of each atom in the system. Symmetry considerations support the use of spherical harmonics to expand the angular dependence of this density, but there is, as of yet, no clear rationale to choose one radial basis over another. Here, we investigate the basis that results from the solution of the Laplacian eigenvalue problem within a sphere around the atom of interest. We show that this generates a basis of controllable smoothness within the sphere (in the same sense as plane waves provide a basis with controllable smoothness for a problem with periodic boundaries) and that a tensor product of Laplacian eigenstates also provides a smooth basis for expanding any higher-order correlation of the atomic density within the appropriate hypersphere. We consider several unsupervised metrics of the quality of a basis for a given dataset and show that the Laplacian eigenstate basis has a performance that is much better than some widely used basis sets and competitive with data-driven bases that numerically optimize each metric. Finally, we investigate the role of the basis in building models of the potential energy. In these tests, we find that a combination of the Laplacian eigenstate basis and target-oriented heuristics leads to equal or improved regression performance when compared to both heuristic and data-driven bases in the literature. We conclude that the smoothness of the basis functions is a key aspect of successful atomic density representations.
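As a minimal illustration of the construction described in this abstract, the Python sketch below builds the l = 0 radial functions that solve the Laplacian eigenvalue problem inside a sphere of radius `r_cut`. It assumes Dirichlet boundary conditions (the functions vanish at the cutoff), which the abstract does not state; under that assumption the l = 0 solutions are sin(nπr/r_cut)/r with eigenvalue −(nπ/r_cut)². The function names and the cutoff value are illustrative and not taken from the paper.

```python
import numpy as np

def laplacian_radial_l0(n, r, r_cut):
    """Normalized l = 0 Laplacian eigenfunction in a sphere of radius r_cut.

    Assumes Dirichlet boundary conditions, R_n(r_cut) = 0:
        R_n(r) = sqrt(2 / r_cut) * sin(n * pi * r / r_cut) / r
    """
    k = n * np.pi / r_cut
    # np.sinc(x) = sin(pi * x) / (pi * x), so k * sinc(k r / pi) = sin(k r) / r
    # and the expression stays finite at r = 0.
    return np.sqrt(2.0 / r_cut) * k * np.sinc(k * np.asarray(r, dtype=float) / np.pi)

def radial_overlap(n, m, r_cut, num=20001):
    """Overlap integral of R_n and R_m with weight r^2 on [0, r_cut] (trapezoidal rule)."""
    r = np.linspace(0.0, r_cut, num)
    f = laplacian_radial_l0(n, r, r_cut) * laplacian_radial_l0(m, r, r_cut) * r**2
    dr = r[1] - r[0]
    return float(np.sum(0.5 * (f[1:] + f[:-1])) * dr)
```

Calling `radial_overlap(n, m, r_cut)` for equal and unequal indices checks that the functions are orthonormal. Truncating the basis at a maximum n bounds the eigenvalue, which is the sense in which the smoothness of the expansion is controllable.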
Affiliation(s)
- Filippo Bigi: Physical and Theoretical Chemistry Laboratory, South Parks Road, Oxford OX1 3QZ, United Kingdom
- Kevin K Huguenin-Dumittan: Laboratory of Computational Science and Modeling, Institute of Materials, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Michele Ceriotti: Laboratory of Computational Science and Modeling, Institute of Materials, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- David E Manolopoulos: Physical and Theoretical Chemistry Laboratory, South Parks Road, Oxford OX1 3QZ, United Kingdom
|
27
|
Cheng L, Sun J, Deustua JE, Bhethanabotla VC, Miller TF. Molecular-orbital-based machine learning for open-shell and multi-reference systems with kernel addition Gaussian process regression. J Chem Phys 2022; 157:154105. [PMID: 36272799 DOI: 10.1063/5.0110886] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/14/2022] Open
Abstract
We introduce a novel machine learning strategy, kernel addition Gaussian process regression (KA-GPR), in molecular-orbital-based machine learning (MOB-ML) to learn the total correlation energies of general electronic structure theories for closed- and open-shell systems. The learning efficiency of MOB-ML(KA-GPR) is the same as that of the original MOB-ML method for the smallest Criegee molecule, which is a closed-shell molecule with multi-reference character. In addition, the prediction accuracies for different small free radicals could reach the chemical accuracy of 1 kcal/mol by training on one example structure. Accurate potential energy surfaces for the H10 chain (closed-shell) and water OH bond dissociation (open-shell) could also be generated by MOB-ML(KA-GPR). To explore the breadth of chemical systems that KA-GPR can describe, we further apply MOB-ML to make accurate predictions on large benchmark datasets of closed- (QM9, QM7b-T, and GDB-13-T) and open-shell (QMSpin) molecules.
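The abstract does not give the KA-GPR equations. The Python sketch below is therefore not the authors' formulation but a generic illustration of one way kernels can be added in Gaussian process regression when only a total quantity is observed: each system is a set of local feature vectors, and the kernel between two systems is taken to be the sum of a base Gaussian kernel over all pairs of their parts. All names (`rbf`, `summed_kernel`, `fit_predict`) and hyperparameters are hypothetical.

```python
import numpy as np

def rbf(x, y, length=1.0):
    """Gaussian base kernel between two sets of feature vectors, shape (len(x), len(y))."""
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length**2)

def summed_kernel(groups_a, groups_b, length=1.0):
    """Kernel between systems: base kernel summed over all pairs of parts."""
    return np.array([[rbf(a, b, length).sum() for b in groups_b] for a in groups_a])

def fit_predict(train_groups, train_totals, test_groups, length=1.0, noise=1e-8):
    """GPR posterior mean for total values, trained only on totals per system."""
    K = summed_kernel(train_groups, train_groups, length)
    K[np.diag_indices_from(K)] += noise  # small jitter for numerical stability
    alpha = np.linalg.solve(K, np.asarray(train_totals, dtype=float))
    return summed_kernel(test_groups, train_groups, length) @ alpha
```

With this kind of kernel the model is trained on totals alone, yet it can predict a system containing a different number of parts, because the sum over parts is built into the kernel rather than into the training labels.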
Affiliation(s)
- Lixue Cheng: Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, USA
- Jiace Sun: Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, USA
- J Emiliano Deustua: Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, USA
- Vignesh C Bhethanabotla: Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, USA
- Thomas F Miller: Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, USA
|