1
Heid E, Schörghuber J, Wanzenböck R, Madsen GKH. Spatially Resolved Uncertainties for Machine Learning Potentials. J Chem Inf Model 2024; 64:6377-6387. [PMID: 39110874 DOI: 10.1021/acs.jcim.4c00904] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/27/2024]
Abstract
Machine learning potentials have become an essential tool for atomistic simulations, yielding results close to ab initio simulations at a fraction of computational cost. With recent improvements on the achievable accuracies, the focus has now shifted on the data set composition itself. The reliable identification of erroneously predicted configurations to extend a given data set is therefore of high priority. Yet, uncertainty estimation techniques have achieved mixed results for machine learning potentials. Consequently, a general and versatile method to correlate energy or atomic force uncertainties with the model error has remained elusive to date. In the current work, we show that epistemic uncertainty cannot correlate with model error by definition but can be aggregated over groups of atoms to yield a strong correlation. We demonstrate that our method correctly estimates prediction errors both globally per structure and locally resolved per atom. The direct correlation of local uncertainty and local error is used to design an active learning framework based on identifying local subregions of a large simulation cell and performing ab initio calculations only for the subregion subsequently. We successfully utilized this method to perform active learning in the low-data regime for liquid water.
Affiliation(s)
- Esther Heid
  - Institute of Materials Chemistry, TU Wien, A-1060 Vienna, Austria
- Ralf Wanzenböck
  - Institute of Materials Chemistry, TU Wien, A-1060 Vienna, Austria
- Georg K H Madsen
  - Institute of Materials Chemistry, TU Wien, A-1060 Vienna, Austria
2
Shermukhamedov S, Mamurjonova D, Maihom T, Probst M. Structure to Property: Chemical Element Embeddings for Predicting Electronic Properties of Crystals. J Chem Inf Model 2024; 64:5762-5770. [PMID: 39007646 PMCID: PMC11323004 DOI: 10.1021/acs.jcim.3c01990] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2023] [Revised: 07/05/2024] [Accepted: 07/08/2024] [Indexed: 07/16/2024]
Abstract
We present a new general-purpose machine learning model that is able to predict a variety of crystal properties, including Fermi level energy and band gap, as well as spectral ones such as electronic densities of states. The model is based on atomic representations that enable it to effectively capture complex information about each atom and its surrounding environment in a crystal. The accuracy achieved for band gaps exceeds results previously published. By design, our model is not restricted to the electronic properties discussed here but can be extended to fit diverse chemical descriptors. Its advantages are (a) its low computational requirements, making it an efficient tool for high-throughput screening of materials; and (b) the simplicity and flexibility of its architecture, facilitating implementation and interpretation, especially for researchers in the field of computational chemistry.
Affiliation(s)
- Dilorom Mamurjonova
  - Department of Inorganic Chemistry, Tashkent Chemical Technological Institute, 100011 Tashkent, Uzbekistan
- Thana Maihom
  - School of Molecular Science and Engineering, Vidyasirimedhi Institute of Science and Technology, 21201 Rayong, Thailand
  - Division of Chemistry, Department of Physical and Material Sciences, Faculty of Liberal Arts and Science, Kasetsart University, Kamphaeng Saen Campus, 73140 Nakhon Pathom, Thailand
- Michael Probst
  - Institute of Ion Physics and Applied Physics, University of Innsbruck, 6020 Innsbruck, Austria
  - School of Molecular Science and Engineering, Vidyasirimedhi Institute of Science and Technology, 21201 Rayong, Thailand
3
Sriram A, Choi S, Yu X, Brabson LM, Das A, Ulissi Z, Uyttendaele M, Medford AJ, Sholl DS. The Open DAC 2023 Dataset and Challenges for Sorbent Discovery in Direct Air Capture. ACS Central Science 2024; 10:923-941. [PMID: 38799660 PMCID: PMC11117325 DOI: 10.1021/acscentsci.3c01629] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/29/2024]
Abstract
Direct air capture (DAC) of CO2 with porous adsorbents such as metal-organic frameworks (MOFs) has the potential to aid large-scale decarbonization. Previous screening of MOFs for DAC relied on empirical force fields and ignored adsorbed H2O and MOF deformation. We performed quantum chemistry calculations overcoming these restrictions for thousands of MOFs. The resulting data enable efficient descriptions using machine learning.
Affiliation(s)
- Anuroop Sriram
  - Fundamental AI Research, Meta AI, Meta, Menlo Park, California 94025, United States
- Sihoon Choi
  - Fundamental AI Research, Meta AI, Meta, Menlo Park, California 94025, United States
  - School of Chemical and Biomolecular Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
- Xiaohan Yu
  - School of Chemical and Biomolecular Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
- Logan M. Brabson
  - School of Chemical and Biomolecular Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
- Abhishek Das
  - Fundamental AI Research, Meta AI, Meta, Menlo Park, California 94025, United States
- Zachary Ulissi
  - Fundamental AI Research, Meta AI, Meta, Menlo Park, California 94025, United States
- Matt Uyttendaele
  - Fundamental AI Research, Meta AI, Meta, Menlo Park, California 94025, United States
- Andrew J. Medford
  - School of Chemical and Biomolecular Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
- David S. Sholl
  - School of Chemical and Biomolecular Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
  - Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831-2008, United States
4
Wang Y, Sorkun MC, Brocks G, Er S. ML-Aided Computational Screening of 2D Materials for Photocatalytic Water Splitting. J Phys Chem Lett 2024:4983-4991. [PMID: 38691841 DOI: 10.1021/acs.jpclett.4c00425] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/03/2024]
Abstract
The exploration of two-dimensional (2D) materials with exceptional physical and chemical properties is essential for the advancement of solar water splitting technologies. However, the discovery of 2D materials is currently heavily reliant on fragmented studies with limited opportunities for fine-tuning the chemical composition and electronic features of compounds. Starting from the V2DB digital library as a resource of 2D materials, we set up and execute a funnel approach that incorporates multiple screening steps to uncover potential candidates for photocatalytic water splitting. The initial screening step is based upon machine learning (ML) predicted properties, and subsequent steps involve first-principles modeling of increasing complexity, going from density functional theory (DFT) to hybrid-DFT to GW calculations. Ensuring that at each stage more complex calculations are only applied to the most promising candidates, our study introduces an effective screening methodology that may serve as a model for accelerating 2D materials discovery within a large chemical space. Our screening process yields a selection of 11 promising 2D photocatalysts.
Affiliation(s)
- Yatong Wang
  - DIFFER - Dutch Institute for Fundamental Energy Research, De Zaale 20, Eindhoven 5612 AJ, The Netherlands
  - Materials Simulation and Modeling, Department of Applied Physics, Eindhoven University of Technology, Eindhoven 5600 MB, The Netherlands
- Murat Cihan Sorkun
  - DIFFER - Dutch Institute for Fundamental Energy Research, De Zaale 20, Eindhoven 5612 AJ, The Netherlands
- Geert Brocks
  - Materials Simulation and Modeling, Department of Applied Physics, Eindhoven University of Technology, Eindhoven 5600 MB, The Netherlands
  - Computational Chemical Physics, Faculty of Science and Technology and MESA+ Institute for Nanotechnology, University of Twente, P.O. Box 217, 7500 AE Enschede, The Netherlands
- Süleyman Er
  - DIFFER - Dutch Institute for Fundamental Energy Research, De Zaale 20, Eindhoven 5612 AJ, The Netherlands
5
Roth JP, Bajorath J. Relationship between prediction accuracy and uncertainty in compound potency prediction using deep neural networks and control models. Sci Rep 2024; 14:6536. [PMID: 38503823 PMCID: PMC10950896 DOI: 10.1038/s41598-024-57135-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2023] [Accepted: 03/14/2024] [Indexed: 03/21/2024] Open
Abstract
The assessment of prediction variance or uncertainty contributes to the evaluation of machine learning models. In molecular machine learning, uncertainty quantification is an evolving area of research where currently no standard approaches or general guidelines are available. We have carried out a detailed analysis of deep neural network variants and simple control models for compound potency prediction to study relationships between prediction accuracy and uncertainty. For comparably accurate predictions obtained with models of different complexity, highly variable prediction uncertainties were detected using different metrics. Furthermore, a strong dependence of prediction characteristics and uncertainties on potency levels of test compounds was observed, often leading to over- or under-confident model decisions with respect to the expected variance of predictions. Moreover, neural network models responded very differently to training set modifications. Taken together, our findings indicate that there is only little, if any correlation between compound potency prediction accuracy and uncertainty, especially for deep neural network models, when predictions are assessed on the basis of currently used metrics for uncertainty quantification.
Affiliation(s)
- Jannik P Roth
  - Department of Life Science Informatics and Data Science, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 5/6, 53115, Bonn, Germany
- Jürgen Bajorath
  - Department of Life Science Informatics and Data Science, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 5/6, 53115, Bonn, Germany
6
Kumar S, Jing X, Pask JE, Medford AJ, Suryanarayana P. Kohn-Sham accuracy from orbital-free density functional theory via Δ-machine learning. J Chem Phys 2023; 159:244106. [PMID: 38147461 DOI: 10.1063/5.0180541] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2023] [Accepted: 11/30/2023] [Indexed: 12/28/2023] Open
Abstract
We present a Δ-machine learning model for obtaining Kohn-Sham accuracy from orbital-free density functional theory (DFT) calculations. In particular, we employ a machine-learned force field (MLFF) scheme based on the kernel method to capture the difference between Kohn-Sham and orbital-free DFT energies/forces. We implement this model in the context of on-the-fly molecular dynamics simulations and study its accuracy, performance, and sensitivity to parameters for representative systems. We find that the formalism not only improves the accuracy of Thomas-Fermi-von Weizsäcker orbital-free energies and forces by more than two orders of magnitude but is also more accurate than MLFFs based solely on Kohn-Sham DFT while being more efficient and less sensitive to model parameters. We apply the framework to study the structure of molten Al0.88Si0.12, the results suggesting no aggregation of Si atoms, in agreement with a previous Kohn-Sham study performed at an order of magnitude smaller length and time scales.
Affiliation(s)
- Shashikant Kumar
  - College of Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, USA
- Xin Jing
  - College of Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, USA
  - College of Computing, Georgia Institute of Technology, Atlanta, Georgia 30332, USA
- John E Pask
  - Physics Division, Lawrence Livermore National Laboratory, Livermore, California 94550, USA
- Andrew J Medford
  - College of Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, USA
- Phanish Suryanarayana
  - College of Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, USA
  - College of Computing, Georgia Institute of Technology, Atlanta, Georgia 30332, USA
7
Ghahremanpour MM, Saar A, Tirado-Rives J, Jorgensen WL. Ensemble Geometric Deep Learning of Aqueous Solubility. J Chem Inf Model 2023; 63:7338-7349. [PMID: 37990484 DOI: 10.1021/acs.jcim.3c01536] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2023]
Abstract
Geometric deep learning is one of the main workhorses for harnessing the power of big data to predict molecular properties such as aqueous solubility, which is key to the pharmacokinetic improvement of drug candidates. Two ensembles of graph neural network architectures were built, one based on spectral convolution and the other on spatial convolution. The pretrained models, denoted respectively as SolNet-GCN and SolNet-GAT, significantly outperformed the existing neural networks benchmarked on a validation set of 207 molecules. The SolNet-GCN model demonstrated the best performance on both the training and validation sets, with RMSE values of 0.53 and 0.72 log molar unit and Pearson r2 values of 0.95 and 0.75, respectively. Further, the ranking power of the SolNet models agreed well with a QM-based thermodynamic cycle approach at the PBE-vdW level of theory on a series of benzophenylurea derivatives and a series of benzodiazepine derivatives. Nevertheless, testing the resultant models on a set of inhibitors of the macrophage migration inhibitory factor (MIF) illustrated that the inclusion of atomic attributes to discriminate atoms with a higher tendency to form intermolecular hydrogen bonds in the crystalline state and to identify planar or nonplanar substructures can be beneficial for the prediction of aqueous solubility.
Affiliation(s)
- Anastasia Saar
  - Department of Chemistry, Yale University, New Haven, Connecticut 06520-8107, United States
- Julian Tirado-Rives
  - Department of Chemistry, Yale University, New Haven, Connecticut 06520-8107, United States
- William L Jorgensen
  - Department of Chemistry, Yale University, New Haven, Connecticut 06520-8107, United States
8
Luo Y, Liu Y, Peng J. Calibrated geometric deep learning improves kinase-drug binding predictions. Nat Mach Intell 2023; 5:1390-1401. [PMID: 38962391 PMCID: PMC11221792 DOI: 10.1038/s42256-023-00751-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 12/20/2022] [Accepted: 09/29/2023] [Indexed: 07/05/2024]
Abstract
Protein kinases regulate various cellular functions and hold significant pharmacological promise in cancer and other diseases. Although kinase inhibitors are one of the largest groups of approved drugs, much of the human kinome remains unexplored but potentially druggable. Computational approaches, such as machine learning, offer efficient solutions for exploring kinase-compound interactions and uncovering novel binding activities. Despite the increasing availability of three-dimensional (3D) protein and compound structures, existing methods predominantly focus on exploiting local features from one-dimensional protein sequences and two-dimensional molecular graphs to predict binding affinities, overlooking the 3D nature of the binding process. Here we present KDBNet, a deep learning algorithm that incorporates 3D protein and molecule structure data to predict binding affinities. KDBNet uses graph neural networks to learn structure representations of protein binding pockets and drug molecules, capturing the geometric and spatial characteristics of binding activity. In addition, we introduce an algorithm to quantify and calibrate the uncertainties of KDBNet's predictions, enhancing its utility in model-guided discovery in chemical or protein space. Experiments demonstrated that KDBNet outperforms existing deep learning models in predicting kinase-drug binding affinities. The uncertainties estimated by KDBNet are informative and well-calibrated with respect to prediction errors. When integrated with a Bayesian optimization framework, KDBNet enables data-efficient active learning and accelerates the exploration and exploitation of diverse high-binding kinase-drug pairs.
Affiliation(s)
- Yunan Luo
  - School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA, USA
  - These authors contributed equally: Yunan Luo, Yang Liu
- Yang Liu
  - Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL, USA
  - These authors contributed equally: Yunan Luo, Yang Liu
- Jian Peng
  - Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL, USA
9
Moriarty A, Kobayashi T, Salvalaglio M, Angeli P, Striolo A, McRobbie I. Analyzing the Accuracy of Critical Micelle Concentration Predictions Using Deep Learning. J Chem Theory Comput 2023; 19:7371-7386. [PMID: 37815387 DOI: 10.1021/acs.jctc.3c00868] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/11/2023]
Abstract
This paper presents a novel approach to predicting critical micelle concentrations (CMCs) by using graph neural networks (GNNs) augmented with Gaussian processes (GPs). The proposed model uses learned latent space representations of molecules to predict CMCs and estimate uncertainties. The performance of the model on a data set containing nonionic, cationic, anionic, and zwitterionic molecules is compared against a linear model that works with extended connectivity fingerprints (ECFPs). The GNN-based model performs slightly better than the linear ECFP model when there is enough well-balanced training data and achieves predictive accuracy that is comparable to published models that were evaluated on a smaller range of surfactant chemistries. We illustrate the applicability domain of our model using a molecular cartogram to visualize the latent space, which helps to identify molecules for which predictions are likely to be erroneous. In addition to accurately predicting CMCs for some surfactant classes, the proposed approach can provide valuable insights into the molecular properties that influence CMCs.
Affiliation(s)
- Alexander Moriarty
  - Department of Chemical Engineering, University College London, London WC1E 7JE, U.K.
- Takeshi Kobayashi
  - Department of Chemical Engineering, University College London, London WC1E 7JE, U.K.
- Matteo Salvalaglio
  - Department of Chemical Engineering, University College London, London WC1E 7JE, U.K.
- Panagiota Angeli
  - Department of Chemical Engineering, University College London, London WC1E 7JE, U.K.
- Alberto Striolo
  - Department of Chemical Engineering, University College London, London WC1E 7JE, U.K.
  - School of Sustainable Chemical, Biological and Materials Engineering, University of Oklahoma, Norman, Oklahoma 73019-0390, United States
10
Snyder R, Kim B, Pan X, Shao Y, Pu J. Bridging semiempirical and ab initio QM/MM potentials by Gaussian process regression and its sparse variants for free energy simulation. J Chem Phys 2023; 159:054107. [PMID: 37530109 PMCID: PMC10400118 DOI: 10.1063/5.0156327] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 04/28/2023] [Accepted: 07/10/2023] [Indexed: 08/03/2023] Open
Abstract
Free energy simulations that employ combined quantum mechanical and molecular mechanical (QM/MM) potentials at ab initio QM (AI) levels are computationally highly demanding. Here, we present a machine-learning-facilitated approach for obtaining AI/MM-quality free energy profiles at the cost of efficient semiempirical QM/MM (SE/MM) methods. Specifically, we use Gaussian process regression (GPR) to learn the potential energy corrections needed for an SE/MM level to match an AI/MM target along the minimum free energy path (MFEP). Force modification using gradients of the GPR potential allows us to improve configurational sampling and update the MFEP. To adaptively train our model, we further employ the sparse variational GP (SVGP) and streaming sparse GPR (SSGPR) methods, which efficiently incorporate previous sample information without significantly increasing the training data size. We applied the QM-(SS)GPR/MM method to the solution-phase SN2 Menshutkin reaction, NH3 + CH3Cl → CH3NH3+ + Cl-, using AM1/MM and B3LYP/6-31+G(d,p)/MM as the base and target levels, respectively. For 4000 configurations sampled along the MFEP, the iteratively optimized AM1-SSGPR-4/MM model reduces the energy error in AM1/MM from 18.2 to 4.4 kcal/mol. Although not explicitly fitting forces, our method also reduces the key internal force errors from 25.5 to 11.1 kcal/mol/Å and from 30.2 to 10.3 kcal/mol/Å for the N-C and C-Cl bonds, respectively. Compared to the uncorrected simulations, the AM1-SSGPR-4/MM method lowers the predicted free energy barrier from 28.7 to 11.7 kcal/mol and decreases the reaction free energy from -12.4 to -41.9 kcal/mol, bringing these results into closer agreement with their AI/MM and experimental benchmarks.
Affiliation(s)
- Ryan Snyder
  - Department of Chemistry and Chemical Biology, Indiana University-Purdue University Indianapolis, 402 N Blackford St., Indianapolis, Indiana 46202, USA
- Bryant Kim
  - Department of Chemistry and Chemical Biology, Indiana University-Purdue University Indianapolis, 402 N Blackford St., Indianapolis, Indiana 46202, USA
- Xiaoliang Pan
  - Department of Chemistry and Biochemistry, University of Oklahoma, 101 Stephenson Pkwy, Norman, Oklahoma 73019, USA
- Yihan Shao
  - Department of Chemistry and Biochemistry, University of Oklahoma, 101 Stephenson Pkwy, Norman, Oklahoma 73019, USA
- Jingzhi Pu
  - Department of Chemistry and Chemical Biology, Indiana University-Purdue University Indianapolis, 402 N Blackford St., Indianapolis, Indiana 46202, USA
11
Heid E, McGill CJ, Vermeire FH, Green WH. Characterizing Uncertainty in Machine Learning for Chemistry. J Chem Inf Model 2023; 63:4012-4029. [PMID: 37338239 PMCID: PMC10336963 DOI: 10.1021/acs.jcim.3c00373] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 03/08/2023] [Indexed: 06/21/2023]
Abstract
Characterizing uncertainty in machine learning models has recently gained interest in the context of machine learning reliability, robustness, safety, and active learning. Here, we separate the total uncertainty into contributions from noise in the data (aleatoric) and shortcomings of the model (epistemic), further dividing epistemic uncertainty into model bias and variance contributions. We systematically address the influence of noise, model bias, and model variance in the context of chemical property predictions, where the diverse nature of target properties and the vast chemical space give rise to many different distinct sources of prediction error. We demonstrate that different sources of error can each be significant in different contexts and must be individually addressed during model development. Through controlled experiments on data sets of molecular properties, we show important trends in model performance associated with the level of noise in the data set, size of the data set, model architecture, molecule representation, ensemble size, and data set splitting. In particular, we show that 1) noise in the test set can limit a model's observed performance when the actual performance is much better, 2) using size-extensive model aggregation structures is crucial for extensive property prediction, and 3) ensembling is a reliable tool for uncertainty quantification and improvement specifically for the contribution of model variance. We develop general guidelines on how to improve an underperforming model when falling into different uncertainty contexts.
Affiliation(s)
- Esther Heid
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
  - Institute of Materials Chemistry, TU Wien, 1060 Vienna, Austria
- Charles J. McGill
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
  - Department of Chemical and Life Science Engineering, Virginia Commonwealth University, Richmond, Virginia 23284, United States
- Florence H. Vermeire
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
  - Department of Chemical Engineering, KU Leuven, Celestijnenlaan 200F, B-3001 Leuven, Belgium
- William H. Green
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
12
Bang K, Hong D, Park Y, Kim D, Han SS, Lee HM. Machine learning-enabled exploration of the electrochemical stability of real-scale metallic nanoparticles. Nat Commun 2023; 14:3004. [PMID: 37230963 PMCID: PMC10213026 DOI: 10.1038/s41467-023-38758-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/04/2022] [Accepted: 05/10/2023] [Indexed: 05/27/2023] Open
Abstract
Surface Pourbaix diagrams are critical to understanding the stability of nanomaterials in electrochemical environments. Their construction based on density functional theory is, however, prohibitively expensive for real-scale systems, such as several nanometer-size nanoparticles (NPs). Herein, with the aim of accelerating the accurate prediction of adsorption energies, we developed a bond-type embedded crystal graph convolutional neural network (BE-CGCNN) model in which four bonding types were treated differently. Owing to the enhanced accuracy of the bond-type embedding approach, we demonstrate the construction of reliable Pourbaix diagrams for very large-size NPs involving up to 6525 atoms (approximately 4.8 nm in diameter), which enables the exploration of electrochemical stability over various NP sizes and shapes. BE-CGCNN-based Pourbaix diagrams well reproduce the experimental observations with increasing NP size. This work suggests a method for accelerated Pourbaix diagram construction for real-scale and arbitrarily shaped NPs, which would significantly open up an avenue for electrochemical stability studies.
Affiliation(s)
- Kihoon Bang
  - Department of Materials Science and Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea
  - Computational Science Research Center, Korea Institute of Science and Technology (KIST), Seoul, 02792, Republic of Korea
- Doosun Hong
  - Department of Materials Science and Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea
- Youngtae Park
  - Department of Materials Science and Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea
- Donghun Kim
  - Computational Science Research Center, Korea Institute of Science and Technology (KIST), Seoul, 02792, Republic of Korea
- Sang Soo Han
  - Computational Science Research Center, Korea Institute of Science and Technology (KIST), Seoul, 02792, Republic of Korea
- Hyuck Mo Lee
  - Department of Materials Science and Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea
13
Pepper N, Thomas M, De Ath G, Olivier E, Cannon R, Everson R, Dodwell T. A probabilistic model for aircraft in climb using monotonic functional Gaussian process emulators. Proc Math Phys Eng Sci 2023. [DOI: 10.1098/rspa.2022.0607] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/09/2023] Open
Abstract
Ensuring vertical separation is a key means of maintaining safe separation between aircraft in congested airspace. Aircraft trajectories are modelled in the presence of significant epistemic uncertainty, leading to discrepancies between observed trajectories and the predictions of deterministic models, hampering the task of planning to ensure safe separation. In this paper, a probabilistic model is presented, for the purpose of emulating the trajectories of aircraft in climb and bounding the uncertainty of the predicted trajectory. A monotonic, functional representation exploits the spatio-temporal correlations in the radar observations. Through the use of Gaussian process emulators, features that parameterize the climb are mapped directly to functional outputs, providing a fast approximation, while ensuring that the resulting trajectory is monotonic. The model was applied as a probabilistic digital twin for aircraft in climb and baselined against the Base of Aircraft Data, a deterministic model widely used in industry. When applied to an unseen test dataset, the probabilistic model was found to provide a mean prediction that was 20.56% more accurate, as measured by the mean absolute error, with data-driven credible intervals that were 9.54% sharper.
Affiliation(s)
- Nick Pepper
  - The Alan Turing Institute, The British Library, London, UK
- George De Ath
  - Department of Computer Science, University of Exeter, Exeter, UK
- Enrico Olivier
  - Department of Computer Science, University of Exeter, Exeter, UK
- Richard Everson
  - Department of Computer Science, University of Exeter, Exeter, UK
- Tim Dodwell
  - Department of Computer Science, University of Exeter, Exeter, UK
  - digiLab, Exeter, UK
14
Bridging the complexity gap in computational heterogeneous catalysis with machine learning. Nat Catal 2023. [DOI: 10.1038/s41929-023-00911-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/03/2023]
15
Chen Y, Ou Y, Zheng P, Huang Y, Ge F, Dral PO. Benchmark of general-purpose machine learning-based quantum mechanical method AIQM1 on reaction barrier heights. J Chem Phys 2023; 158:074103. [PMID: 36813722 DOI: 10.1063/5.0137101] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 02/01/2023] Open
Abstract
Artificial intelligence-enhanced quantum mechanical method 1 (AIQM1) is a general-purpose method that was shown to achieve high accuracy for many applications with a speed close to its baseline semiempirical quantum mechanical (SQM) method ODM2*. Here, we evaluate the hitherto unknown performance of out-of-the-box AIQM1 without any refitting for reaction barrier heights on eight datasets, including a total of ∼24 thousand reactions. This evaluation shows that AIQM1's accuracy strongly depends on the type of transition state and ranges from excellent for rotation barriers to poor for, e.g., pericyclic reactions. AIQM1 clearly outperforms its baseline ODM2* method and, even more so, a popular universal potential, ANI-1ccx. Overall, however, AIQM1 accuracy largely remains similar to SQM methods (and B3LYP/6-31G* for most reaction types) suggesting that it is desirable to focus on improving AIQM1 performance for barrier heights in the future. We also show that the built-in uncertainty quantification helps in identifying confident predictions. The accuracy of confident AIQM1 predictions is approaching the level of popular density functional theory methods for most reaction types. Encouragingly, AIQM1 is rather robust for transition state optimizations, even for the type of reactions it struggles with the most. Single-point calculations with high-level methods on AIQM1-optimized geometries can be used to significantly improve barrier heights, which cannot be said for its baseline ODM2* method.
Affiliation(s)
- Yuxinxin Chen
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
| | - Yanchi Ou
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
| | - Peikun Zheng
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
| | - Yaohuang Huang
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
| | - Fuchun Ge
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
| | - Pavlo O Dral
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
|
16
|
Exploring catalytic reaction networks with machine learning. Nat Catal 2023. [DOI: 10.1038/s41929-022-00896-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/27/2023]
|
17
|
Xu J, Xie W, Han Y, Hu P. Atomistic Insights into the Oxidation of Flat and Stepped Platinum Surfaces Using Large-Scale Machine Learning Potential-Based Grand-Canonical Monte Carlo. ACS Catal 2022. [DOI: 10.1021/acscatal.2c03976] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
Affiliation(s)
- Jiayan Xu
- School of Chemistry and Chemical Engineering, Queen’s University Belfast, Belfast BT9 5AG, U.K.
| | - Wenbo Xie
- School of Chemistry and Chemical Engineering, Queen’s University Belfast, Belfast BT9 5AG, U.K.
| | - Yulan Han
- School of Chemistry and Chemical Engineering, Queen’s University Belfast, Belfast BT9 5AG, U.K.
| | - P. Hu
- School of Chemistry and Chemical Engineering, Queen’s University Belfast, Belfast BT9 5AG, U.K.
|
18
|
Korolev V, Nevolin I, Protsenko P. A universal similarity based approach for predictive uncertainty quantification in materials science. Sci Rep 2022; 12:14931. [PMID: 36056050 PMCID: PMC9440040 DOI: 10.1038/s41598-022-19205-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2022] [Accepted: 08/25/2022] [Indexed: 11/08/2022] Open
Abstract
Immense effort has been exerted in the materials informatics community towards enhancing the accuracy of machine learning (ML) models; however, the uncertainty quantification (UQ) of state-of-the-art algorithms also demands further development. Most prominent UQ methods are model-specific or are related to the ensembles of models; therefore, there is a need to develop a universal technique that can be readily applied to a single model from a diverse set of ML algorithms. In this study, we suggest a new UQ measure known as the Δ-metric to address this issue. The presented quantitative criterion was inspired by the k-nearest neighbor approach adopted for applicability domain estimation in chemoinformatics. It surpasses several UQ methods in accurately ranking the predictive errors and could be considered a low-cost option for a more advanced deep ensemble strategy. We also evaluated the performance of the presented UQ measure on various classes of materials, ML algorithms, and types of input features, thus demonstrating its universality.
Affiliation(s)
- Vadim Korolev
- Department of Chemistry, Lomonosov Moscow State University, Moscow, 119991, Russia
| | - Iurii Nevolin
- Frumkin Institute of Physical Chemistry and Electrochemistry, Russian Academy of Sciences, Moscow, 119071, Russia
| | - Pavel Protsenko
- Department of Chemistry, Lomonosov Moscow State University, Moscow, 119991, Russia
|
19
|
Schmähling F, Martin J, Elster C. A framework for benchmarking uncertainty in deep regression. Appl Intell 2022. [DOI: 10.1007/s10489-022-03908-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]
Abstract
We propose a framework for the assessment of uncertainty quantification in deep regression. The framework is based on regression problems where the regression function is a linear combination of nonlinear functions. Basically, any level of complexity can be realized through the choice of the nonlinear functions and the dimensionality of their domain. Results of an uncertainty quantification for deep regression are compared against those obtained by a statistical reference method. The reference method utilizes knowledge about the underlying nonlinear functions and is based on Bayesian linear regression using a prior reference. The flexibility, together with the availability of a reference solution, makes the framework suitable for defining benchmark sets for uncertainty quantification. Reliability of uncertainty quantification is assessed in terms of coverage probabilities, and accuracy through the size of calculated uncertainties. We illustrate the proposed framework by applying it to current approaches for uncertainty quantification in deep regression. In addition, results for three real-world regression tasks are presented.
|
20
|
Kolluru A, Shuaibi M, Palizhati A, Shoghi N, Das A, Wood B, Zitnick CL, Kitchin JR, Ulissi ZW. Open Challenges in Developing Generalizable Large-Scale Machine-Learning Models for Catalyst Discovery. ACS Catal 2022. [DOI: 10.1021/acscatal.2c02291] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/14/2022]
Affiliation(s)
- Adeesh Kolluru
- Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
| | - Muhammed Shuaibi
- Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
| | - Aini Palizhati
- Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
| | - Nima Shoghi
- Fundamental AI Research at Meta AI, Menlo Park, California 94025, United States
| | - Abhishek Das
- Fundamental AI Research at Meta AI, Menlo Park, California 94025, United States
| | - Brandon Wood
- Fundamental AI Research at Meta AI, Menlo Park, California 94025, United States
| | - C. Lawrence Zitnick
- Fundamental AI Research at Meta AI, Menlo Park, California 94025, United States
| | - John R. Kitchin
- Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
| | - Zachary W. Ulissi
- Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
|
21
|
Xu W, Reuter K, Andersen M. Predicting binding motifs of complex adsorbates using machine learning with a physics-inspired graph representation. Nat Comput Sci 2022; 2:443-450. [PMID: 38177870 DOI: 10.1038/s43588-022-00280-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 01/27/2022] [Accepted: 06/17/2022] [Indexed: 01/06/2024]
Abstract
Computational screening in heterogeneous catalysis relies increasingly on machine learning models for predicting key input parameters due to the high cost of computing these directly using first-principles methods. This becomes especially relevant when considering complex materials spaces such as alloys, or complex reaction mechanisms with adsorbates that may exhibit bi- or higher-dentate adsorption motifs. Here we present a data-efficient approach to the prediction of binding motifs and associated adsorption enthalpies of complex adsorbates at transition metals and their alloys based on a customized Wasserstein Weisfeiler-Lehman graph kernel and Gaussian process regression. The model shows good predictive performance, not only for the elemental transition metals on which it was trained, but also for an alloy based on these transition metals. Furthermore, incorporation of minimal new training data allows for predicting an out-of-domain transition metal. We believe the model may be useful in active learning approaches, for which we present an ensemble uncertainty estimation approach.
Affiliation(s)
- Wenbin Xu
- Chair for Theoretical Chemistry and Catalysis Research Center, Technische Universität München, Garching, Germany
- Fritz-Haber-Institut der Max-Planck-Gesellschaft, Berlin, Germany
| | - Karsten Reuter
- Fritz-Haber-Institut der Max-Planck-Gesellschaft, Berlin, Germany
| | - Mie Andersen
- Aarhus Institute of Advanced Studies, Aarhus University, Aarhus, Denmark
- Department of Physics and Astronomy-Center for Interstellar Catalysis, Aarhus University, Aarhus, Denmark
|
22
|
Freschi V, Lattanzi E. Evaluation of a sampling approach for computationally efficient uncertainty quantification in regression learning models. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-07455-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
Abstract
The capability of effectively quantifying the uncertainty associated to a given prediction is an important task in many applications that range from drug design to autonomous driving, providing valuable information to many downstream decision-making processes. The increasing capacity of novel machine learning models, and the growing amount of data on which these systems are trained poses however significant issues to be addressed. Recent research advocated the need for evaluating learning systems not only according to traditional accuracy metrics but also according to the computational complexity required to design them, toward a perspective of sustainability and inclusivity. In this work, we present an empirical investigation aimed at assessing the impact of uniform sampling on the reduction in computational requirements, the quality of regression, and on its uncertainty quantification. We performed several experiments with recent state-of-the-art methods characterized by statistical guarantees whose performances have been measured according to different metrics for evaluating uncertainty quantification (i.e., coverage and length of prediction intervals) and regression (i.e., errors measures and correlation). Experimental results highlight possible interesting trade-offs between computation time, regression and uncertainty evaluation quality, thus confirming the viability of sampling-based approaches to overcome computational bottlenecks without significantly affecting the quality of predictions.
|
23
|
Mou T, Han X, Zhu H, Xin H. Machine learning of lateral adsorbate interactions in surface reaction kinetics. Curr Opin Chem Eng 2022. [DOI: 10.1016/j.coche.2022.100825] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/03/2022]
|
24
|
Pernot P. The long road to calibrated prediction uncertainty in computational chemistry. J Chem Phys 2022; 156:114109. [DOI: 10.1063/5.0084302] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 01/23/2023] Open
Abstract
Uncertainty quantification (UQ) in computational chemistry (CC) is still in its infancy. Very few CC methods are designed to provide a confidence level on their predictions, and most users still rely improperly on the mean absolute error as an accuracy metric. The development of reliable UQ methods is essential, notably for CC to be used confidently in industrial processes. A review of the CC-UQ literature shows that there is no common standard procedure to report or validate prediction uncertainty. I consider here analysis tools using concepts (calibration and sharpness) developed in meteorology and machine learning for the validation of probabilistic forecasters. These tools are adapted to CC-UQ and applied to datasets of prediction uncertainties provided by composite methods, Bayesian ensembles methods, and machine learning and a posteriori statistical methods.
Affiliation(s)
- Pascal Pernot
- Institut de Chimie Physique, UMR8000 CNRS, Université Paris-Saclay, 91405 Orsay, France
|
25
|
Steiner M, Reiher M. Autonomous Reaction Network Exploration in Homogeneous and Heterogeneous Catalysis. Top Catal 2022; 65:6-39. [PMID: 35185305 PMCID: PMC8816766 DOI: 10.1007/s11244-021-01543-9] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Accepted: 11/17/2021] [Indexed: 12/11/2022]
Abstract
Autonomous computations that rely on automated reaction network elucidation algorithms may pave the way to make computational catalysis on a par with experimental research in the field. Several advantages of this approach are key to catalysis: (i) automation allows one to consider orders of magnitude more structures in a systematic and open-ended fashion than what would be accessible by manual inspection. Eventually, full resolution in terms of structural varieties and conformations as well as with respect to the type and number of potentially important elementary reaction steps (including decomposition reactions that determine turnover numbers) may be achieved. (ii) Fast electronic structure methods with uncertainty quantification warrant high efficiency and reliability in order to not only deliver results quickly, but also to allow for predictive work. (iii) A high degree of autonomy reduces the amount of manual human work, processing errors, and human bias. Although being inherently unbiased, it is still steerable with respect to specific regions of an emerging network and with respect to the addition of new reactant species. This allows for a high fidelity of the formalization of some catalytic process and for surprising in silico discoveries. In this work, we first review the state of the art in computational catalysis to embed autonomous explorations into the general field from which it draws its ingredients. We then elaborate on the specific conceptual issues that arise in the context of autonomous computational procedures, some of which we discuss at an example catalytic system.
Affiliation(s)
- Miguel Steiner
- Laboratory of Physical Chemistry, ETH Zurich, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
| | - Markus Reiher
- Laboratory of Physical Chemistry, ETH Zurich, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
|
26
|
Busk J, Bjørn Jørgensen P, Bhowmik A, Schmidt MN, Winther O, Vegge T. Calibrated uncertainty for molecular property prediction using ensembles of message passing neural networks. Mach Learn Sci Technol 2021. [DOI: 10.1088/2632-2153/ac3eb3] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 12/13/2022] Open
Abstract
Data-driven methods based on machine learning have the potential to accelerate computational analysis of atomic structures. In this context, reliable uncertainty estimates are important for assessing confidence in predictions and enabling decision making. However, machine learning models can produce badly calibrated uncertainty estimates and it is therefore crucial to detect and handle uncertainty carefully. In this work we extend a message passing neural network designed specifically for predicting properties of molecules and materials with a calibrated probabilistic predictive distribution. The method presented in this paper differs from previous work by considering both aleatoric and epistemic uncertainty in a unified framework, and by recalibrating the predictive distribution on unseen data. Through computer experiments, we show that our approach results in accurate models for predicting molecular formation energies with well calibrated uncertainty in and out of the training data distribution on two public molecular benchmark datasets, QM9 and PC9. The proposed method provides a general framework for training and evaluating neural network ensemble models that are able to produce accurate predictions of properties of molecules with well calibrated uncertainty estimates.
|
27
|
Allotey J, Butler KT, Thiyagalingam J. Entropy-based active learning of graph neural network surrogate models for materials properties. J Chem Phys 2021; 155:174116. [PMID: 34742215 DOI: 10.1063/5.0065694] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 12/14/2022] Open
Abstract
Graph neural networks trained on experimental or calculated data are becoming an increasingly important tool in computational materials science. Networks once trained are able to make highly accurate predictions at a fraction of the cost of experiments or first-principles calculations of comparable accuracy. However, these networks typically rely on large databases of labeled experiments to train the model. In scenarios where data are scarce or expensive to obtain, this can be prohibitive. By building a neural network that provides confidence on the predicted properties, we are able to develop an active learning scheme that can reduce the amount of labeled data required by identifying the areas of chemical space where the model is most uncertain. We present a scheme for coupling a graph neural network with a Gaussian process to featurize solid-state materials and predict properties including a measure of confidence in the prediction. We then demonstrate that this scheme can be used in an active learning context to speed up the training of the model by selecting the optimal next experiment for obtaining a data label. Our active learning scheme can double the rate at which the performance of the model on a test dataset improves with additional data compared to choosing the next sample at random. This type of uncertainty quantification and active learning has the potential to open up new areas of materials science, where data are scarce and expensive to obtain, to the transformative power of graph neural networks.
Affiliation(s)
- Johannes Allotey
- School of Physics, University of Bristol, Bristol BS8 1TL, United Kingdom
| | - Keith T Butler
- Scientific Machine Learning Research Group, Scientific Computing Department, Rutherford Appleton Laboratory, Science and Technology Facilities Council, Didcot OX11 0DQ, United Kingdom
| | - Jeyan Thiyagalingam
- Scientific Machine Learning Research Group, Scientific Computing Department, Rutherford Appleton Laboratory, Science and Technology Facilities Council, Didcot OX11 0DQ, United Kingdom
|
28
|
Deringer VL, Bartók AP, Bernstein N, Wilkins DM, Ceriotti M, Csányi G. Gaussian Process Regression for Materials and Molecules. Chem Rev 2021; 121:10073-10141. [PMID: 34398616 PMCID: PMC8391963 DOI: 10.1021/acs.chemrev.1c00022] [Citation(s) in RCA: 245] [Impact Index Per Article: 81.7] [Received: 01/08/2021] [Indexed: 12/18/2022]
Abstract
We provide an introduction to Gaussian process regression (GPR) machine-learning methods in computational materials science and chemistry. The focus of the present review is on the regression of atomistic properties: in particular, on the construction of interatomic potentials, or force fields, in the Gaussian Approximation Potential (GAP) framework; beyond this, we also discuss the fitting of arbitrary scalar, vectorial, and tensorial quantities. Methodological aspects of reference data generation, representation, and regression, as well as the question of how a data-driven model may be validated, are reviewed and critically discussed. A survey of applications to a variety of research questions in chemistry and materials science illustrates the rapid growth in the field. A vision is outlined for the development of the methodology in the years to come.
Affiliation(s)
- Volker L. Deringer
- Department of Chemistry, Inorganic Chemistry Laboratory, University of Oxford, Oxford OX1 3QR, United Kingdom
| | - Albert P. Bartók
- Department of Physics and Warwick Centre for Predictive Modelling, School of Engineering, University of Warwick, Coventry CV4 7AL, United Kingdom
| | - Noam Bernstein
- Center for Computational Materials Science, U.S. Naval Research Laboratory, Washington D.C. 20375, United States
| | - David M. Wilkins
- Atomistic Simulation Centre, School of Mathematics and Physics, Queen’s University Belfast, Belfast BT7 1NN, Northern Ireland, United Kingdom
| | - Michele Ceriotti
- Laboratory of Computational Science and Modeling, IMX, École Polytechnique Fédérale de Lausanne, Lausanne 1015, Switzerland
- National Centre for Computational Design and Discovery of Novel Materials (MARVEL), École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
| | - Gábor Csányi
- Engineering Laboratory, University of Cambridge, Cambridge CB2 1PZ, United Kingdom
|
29
|
Soleimany AP, Amini A, Goldman S, Rus D, Bhatia SN, Coley CW. Evidential Deep Learning for Guided Molecular Property Prediction and Discovery. ACS Cent Sci 2021; 7:1356-1367. [PMID: 34471680 PMCID: PMC8393200 DOI: 10.1021/acscentsci.1c00546] [Citation(s) in RCA: 40] [Impact Index Per Article: 13.3] [Received: 05/05/2021] [Indexed: 05/12/2023]
Abstract
While neural networks achieve state-of-the-art performance for many molecular modeling and structure-property prediction tasks, these models can struggle with generalization to out-of-domain examples, exhibit poor sample efficiency, and produce uncalibrated predictions. In this paper, we leverage advances in evidential deep learning to demonstrate a new approach to uncertainty quantification for neural network-based molecular structure-property prediction at no additional computational cost. We develop both evidential 2D message passing neural networks and evidential 3D atomistic neural networks and apply these networks across a range of different tasks. We demonstrate that evidential uncertainties enable (1) calibrated predictions where uncertainty correlates with error, (2) sample-efficient training through uncertainty-guided active learning, and (3) improved experimental validation rates in a retrospective virtual screening campaign. Our results suggest that evidential deep learning can provide an efficient means of uncertainty quantification useful for molecular property prediction, discovery, and design tasks in the chemical and physical sciences.
Affiliation(s)
- Ava P. Soleimany
- Harvard-MIT Division of Health Sciences and Technology, MIT, Cambridge, Massachusetts 02139, United States
- Graduate Program in Biophysics, Harvard University, Boston, Massachusetts 02115, United States
- Microsoft Research New England, Cambridge, Massachusetts 02142, United States
| | - Alexander Amini
- Department of Electrical Engineering and Computer Science, MIT, Cambridge, Massachusetts 02139, United States
| | - Samuel Goldman
- Computational and Systems Biology, MIT, Cambridge, Massachusetts 02139, United States
| | - Daniela Rus
- Department of Electrical Engineering and Computer Science, MIT, Cambridge, Massachusetts 02139, United States
| | - Sangeeta N. Bhatia
- Harvard-MIT Division of Health Sciences and Technology, MIT, Cambridge, Massachusetts 02139, United States
- Department of Electrical Engineering and Computer Science, MIT, Cambridge, Massachusetts 02139, United States
- Howard Hughes Medical Institute, Cambridge, Massachusetts 02139, United States
| | - Connor W. Coley
- Department of Chemical Engineering, MIT, Cambridge, Massachusetts 02139, United States
|
30
|
Cesar de Azevedo L, Pinheiro GA, Quiles MG, Da Silva JLF, Prati RC. Systematic Investigation of Error Distribution in Machine Learning Algorithms Applied to the Quantum-Chemistry QM9 Data Set Using the Bias and Variance Decomposition. J Chem Inf Model 2021; 61:4210-4223. [PMID: 34387994 DOI: 10.1021/acs.jcim.1c00503] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/15/2022]
Abstract
Most machine learning applications in quantum-chemistry (QC) data sets rely on a single statistical error parameter such as the mean square error (MSE) to evaluate their performance. However, this approach has limitations or can even yield incorrect interpretations. Here, we report a systematic investigation of the two components of the MSE, i.e., the bias and variance, using the QM9 data set. To this end, we experiment with three descriptors, namely (i) symmetry functions (SF, with two-body and three-body functions), (ii) many-body tensor representation (MBTR, with two- and three-body terms), and (iii) smooth overlap of atomic positions (SOAP), to evaluate the prediction process's performance using different numbers of molecules in training samples and the effect of bias and variance on the final MSE. Overall, low sample sizes are related to higher MSE. Moreover, the bias component strongly influences the larger MSEs. Furthermore, there is little agreement among molecules with higher errors (outliers) across different descriptors. However, there is a high prevalence among the outliers intersection set and the convex hull volume of geometric coordinates (VCH). According to the obtained results with the distribution of MSE (and its components bias and variance) and the appearance of outliers, it is suggested to use ensembles of models with a low bias to minimize the MSE, more specifically when using a small number of molecules in the training set.
Affiliation(s)
- Luis Cesar de Azevedo
- Center of Mathematics, Computation and Cognition, Federal University of ABC, Av. dos Estados, 5001, 09210-580 Santo André, SP, Brazil
| | - Gabriel A Pinheiro
- Institute of Science and Technology, Federal University of São Paulo (Unifesp), 12247-014 São José dos Campos, SP, Brazil
| | - Marcos G Quiles
- Institute of Science and Technology, Federal University of São Paulo (Unifesp), 12247-014 São José dos Campos, SP, Brazil
| | - Juarez L F Da Silva
- São Carlos Institute of Chemistry, University of São Paulo, PO Box 780, 13560-970 São Carlos, SP, Brazil
| | - Ronaldo C Prati
- Center of Mathematics, Computation and Cognition, Federal University of ABC, Av. dos Estados, 5001, 09210-580 Santo André, SP, Brazil
|
31
|
Ward L, Dandu N, Blaiszik B, Narayanan B, Assary RS, Redfern PC, Foster I, Curtiss LA. Graph-Based Approaches for Predicting Solvation Energy in Multiple Solvents: Open Datasets and Machine Learning Models. J Phys Chem A 2021; 125:5990-5998. [PMID: 34191512 DOI: 10.1021/acs.jpca.1c01960] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/28/2022]
Abstract
The solvation properties of molecules, often estimated using quantum chemical simulations, are important in the synthesis of energy storage materials, drugs, and industrial chemicals. Here, we develop machine learning models of solvation energies to replace expensive quantum chemistry calculations with inexpensive-to-compute message-passing neural network models that require only the molecular graph as inputs. Our models are trained on a new database of solvation energies for 130,258 molecules taken from the QM9 dataset computed in five solvents (acetone, ethanol, acetonitrile, dimethyl sulfoxide, and water) via an implicit solvent model. Our best model achieves a mean absolute error of 0.5 kcal/mol for molecules with nine or fewer non-hydrogen atoms and 1 kcal/mol for molecules with between 10 and 14 non-hydrogen atoms. We make the entire dataset of 651,290 computed entries openly available and provide simple web and programmatic interfaces to enable others to run our solvation energy model on new molecules. This model calculates the solvation energies for molecules using only the SMILES string and also provides an estimate of whether each molecule is within the domain of applicability of our model. We envision that the dataset and models will provide the functionality needed for the rapid screening of large chemical spaces to discover improved molecules for many applications.
Affiliation(s)
- Logan Ward
  - Data Science and Learning Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
- Naveen Dandu
  - Materials Science Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
- Ben Blaiszik
  - Data Science and Learning Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
  - Globus, University of Chicago, Chicago, Illinois 60637, United States
- Badri Narayanan
  - Materials Science Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
  - Department of Mechanical Engineering, University of Louisville, Louisville, Kentucky 40292, United States
- Rajeev S Assary
  - Materials Science Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
- Paul C Redfern
  - Materials Science Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
- Ian Foster
  - Data Science and Learning Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
  - Globus, University of Chicago, Chicago, Illinois 60637, United States
  - Department of Computer Science, University of Chicago, Chicago, Illinois 60637, United States
- Larry A Curtiss
  - Materials Science Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
32
Xu J, Cao XM, Hu P. Accelerating Metadynamics-Based Free-Energy Calculations with Adaptive Machine Learning Potentials. J Chem Theory Comput 2021; 17:4465-4476. [PMID: 34100605 DOI: 10.1021/acs.jctc.1c00261] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Indexed: 12/12/2022]
Abstract
There is an increasing demand for free-energy calculations using ab initio molecular dynamics these days. Metadynamics (MetaD) is frequently utilized to reconstruct the free-energy surface, but it is often computationally intractable for the first-principles calculations. Machine learning potentials (MLPs) have become popular alternatives. However, the training could be a long and arduous process before using them in practical applications. To accelerate MetaD use with MLPs for the free-energy calculation in an easy manner, we propose the adaptive machine learning potential-accelerated metadynamics (AMLP-MetaD). In this method, the MLP in the form of a Gaussian approximation potential (GAP) can adapt itself based on its uncertainty estimation, which decides whether to accept the model prediction or recalculate it with a reference method (usually density functional theory) for further training during the MetaD simulation. We demonstrate that the free-energy landscape similar to the ab initio one can be obtained using AMLP-MetaD with a 10-time speedup. Moreover, the quality of the free-energy results can be deeply improved using Δ-MLP, which is the GAP-corrected density functional tight binding in our case. We exemplify this novel method with two model systems, CO adsorption on the Pt13 cluster and the Pt(111) surface, which are of vital importance in heterogeneous catalysis. The successful application in these two tests highlights that our proposed method can be used in both cluster and periodic systems and for up to two collective variables.
Affiliation(s)
- Jiayan Xu
  - School of Chemistry and Chemical Engineering, Queen's University Belfast, Belfast BT9 5AG, U.K.
- Xiao-Ming Cao
  - Key Laboratory for Advanced Materials, Centre for Computational Chemistry and Research Institute of Industrial Catalysis, School of Chemistry and Molecular Engineering, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, P. R. China
- P Hu
  - School of Chemistry and Chemical Engineering, Queen's University Belfast, Belfast BT9 5AG, U.K.
33
Xu J, Cao XM, Hu P. Perspective on computational reaction prediction using machine learning methods in heterogeneous catalysis. Phys Chem Chem Phys 2021; 23:11155-11179. [PMID: 33972971 DOI: 10.1039/d1cp01349a] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Indexed: 12/17/2022]
Abstract
Heterogeneous catalysis plays a significant role in the modern chemical industry. Towards the rational design of novel catalysts, understanding reactions over surfaces is the most essential aspect. Typical industrial catalytic processes such as syngas conversion and methane utilisation can generate a large reaction network comprising thousands of intermediates and reaction pairs. This complexity not only arises from the permutation of transformations between species but also from the extra reaction channels offered by distinct surface sites. Despite the success in investigating surface reactions at the atomic scale, the huge computational expense of ab initio methods hinders the exploration of such complicated reaction networks. With the proliferation of catalysis studies, machine learning as an emerging tool can take advantage of the accumulated reaction data to emulate the output of ab initio methods towards swift reaction prediction. Here, we briefly summarise the conventional workflow of reaction prediction, including reaction network generation, ab initio thermodynamics and microkinetic modelling. An overview of the frequently used regression models in machine learning is presented. As a promising alternative to full ab initio calculations, machine learning interatomic potentials are highlighted. Furthermore, we survey applications assisted by these methods for accelerating reaction prediction, exploring reaction networks, and computational catalyst design. Finally, we envisage future directions in computationally investigating reactions and implementing machine learning algorithms in heterogeneous catalysis.
Affiliation(s)
- Jiayan Xu
  - Key Laboratory for Advanced Materials and Joint International Research Laboratory of Precision Chemistry and Molecular Engineering, Feringa Nobel Prize Scientist Joint Research Center, Frontiers Science Center for Materiobiology and Dynamic Chemistry, Centre for Computational Chemistry and Research Institute of Industrial Catalysis, School of Chemistry and Molecular Engineering, East China University of Science and Technology, 130 Meilong Road, Shanghai, 200237, P. R. China
  - School of Chemistry and Chemical Engineering, Queen's University Belfast, Belfast BT9 5AG, UK
- Xiao-Ming Cao
  - Key Laboratory for Advanced Materials and Joint International Research Laboratory of Precision Chemistry and Molecular Engineering, Feringa Nobel Prize Scientist Joint Research Center, Frontiers Science Center for Materiobiology and Dynamic Chemistry, Centre for Computational Chemistry and Research Institute of Industrial Catalysis, School of Chemistry and Molecular Engineering, East China University of Science and Technology, 130 Meilong Road, Shanghai, 200237, P. R. China
- P Hu
  - Key Laboratory for Advanced Materials and Joint International Research Laboratory of Precision Chemistry and Molecular Engineering, Feringa Nobel Prize Scientist Joint Research Center, Frontiers Science Center for Materiobiology and Dynamic Chemistry, Centre for Computational Chemistry and Research Institute of Industrial Catalysis, School of Chemistry and Molecular Engineering, East China University of Science and Technology, 130 Meilong Road, Shanghai, 200237, P. R. China
  - School of Chemistry and Chemical Engineering, Queen's University Belfast, Belfast BT9 5AG, UK
34
Wan S, Sinclair RC, Coveney PV. Uncertainty quantification in classical molecular dynamics. Philosophical Transactions. Series A, Mathematical, Physical, and Engineering Sciences 2021; 379:20200082. [PMID: 33775140 PMCID: PMC8059622 DOI: 10.1098/rsta.2020.0082] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Accepted: 11/02/2020] [Indexed: 05/24/2023]
Abstract
Molecular dynamics simulation is now a widespread approach for understanding complex systems on the atomistic scale. It finds applications from physics and chemistry to engineering, life and medical science. In the last decade, the approach has begun to advance from being a computer-based means of rationalizing experimental observations to producing apparently credible predictions for a number of real-world applications within industrial sectors such as advanced materials and drug discovery. However, key aspects concerning the reproducibility of the method have not kept pace with the speed of its uptake in the scientific community. Here, we present a discussion of uncertainty quantification for molecular dynamics simulation designed to endow the method with better error estimates that will enable it to be used to report actionable results. The approach adopted is a standard one in the field of uncertainty quantification, namely using ensemble methods, in which a sufficiently large number of replicas are run concurrently, from which reliable statistics can be extracted. Indeed, because molecular dynamics is intrinsically chaotic, the need to use ensemble methods is fundamental and holds regardless of the duration of the simulations performed. We discuss the approach and illustrate it in a range of applications from materials science to ligand-protein binding free energy estimation. This article is part of the theme issue 'Reliability and reproducibility in computational science: implementing verification, validation and uncertainty quantification in silico'.
Affiliation(s)
- Shunzhou Wan
  - Centre for Computational Science, University College London, Gordon Street, London WC1H 0AJ, UK
- Robert C. Sinclair
  - Centre for Computational Science, University College London, Gordon Street, London WC1H 0AJ, UK
- Peter V. Coveney
  - Centre for Computational Science, University College London, Gordon Street, London WC1H 0AJ, UK
  - Institute for Informatics, Science Park 904, University of Amsterdam, 1098 XH Amsterdam, The Netherlands
35
Chanussot L, Das A, Goyal S, Lavril T, Shuaibi M, Riviere M, Tran K, Heras-Domingo J, Ho C, Hu W, Palizhati A, Sriram A, Wood B, Yoon J, Parikh D, Zitnick CL, Ulissi Z. Open Catalyst 2020 (OC20) Dataset and Community Challenges. ACS Catal 2021. [DOI: 10.1021/acscatal.0c04525] [Citation(s) in RCA: 67] [Impact Index Per Article: 22.3] [Indexed: 02/08/2023]
Affiliation(s)
- Lowik Chanussot
  - Facebook AI Research (FAIR), 1 Hacker Way, Menlo Park, California 94025, United States
- Abhishek Das
  - Facebook AI Research (FAIR), 1 Hacker Way, Menlo Park, California 94025, United States
- Siddharth Goyal
  - Facebook AI Research (FAIR), 1 Hacker Way, Menlo Park, California 94025, United States
- Thibaut Lavril
  - Facebook AI Research (FAIR), 1 Hacker Way, Menlo Park, California 94025, United States
- Muhammed Shuaibi
  - Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Morgane Riviere
  - Facebook AI Research (FAIR), 1 Hacker Way, Menlo Park, California 94025, United States
- Kevin Tran
  - Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Javier Heras-Domingo
  - Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Caleb Ho
  - Facebook AI Research (FAIR), 1 Hacker Way, Menlo Park, California 94025, United States
- Weihua Hu
  - Computer Science Department, Stanford University, Stanford, California 94305, United States
- Aini Palizhati
  - Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Anuroop Sriram
  - Facebook AI Research (FAIR), 1 Hacker Way, Menlo Park, California 94025, United States
- Brandon Wood
  - National Energy Research Scientific Computing Center (NERSC), Berkeley, California 94720, United States
- Junwoong Yoon
  - Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Devi Parikh
  - Facebook AI Research (FAIR), 1 Hacker Way, Menlo Park, California 94025, United States
  - School of Interactive Computing, Georgia Tech, Atlanta, Georgia 30332, United States
- C. Lawrence Zitnick
  - Facebook AI Research (FAIR), 1 Hacker Way, Menlo Park, California 94025, United States
- Zachary Ulissi
  - Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
  - Scott Institute for Energy Innovation, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
36
Tran K, Neiswanger W, Broderick K, Xing E, Schneider J, Ulissi ZW. Computational catalyst discovery: Active classification through myopic multiscale sampling. J Chem Phys 2021; 154:124118. [PMID: 33810693 DOI: 10.1063/5.0044989] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 12/18/2022]
Abstract
The recent boom in computational chemistry has enabled several projects aimed at discovering useful materials or catalysts. We acknowledge and address two recurring issues in the field of computational catalyst discovery. First, calculating macro-scale catalyst properties is not straightforward when using ensembles of atomic-scale calculations [e.g., density functional theory (DFT)]. We attempt to address this issue by creating a multi-scale model that estimates bulk catalyst activity using adsorption energy predictions from both DFT and machine learning models. The second issue is that many catalyst discovery efforts seek to optimize catalyst properties, but optimization is an inherently exploitative objective that is in tension with the explorative nature of early-stage discovery projects. In other words, why invest so much time finding a "best" catalyst when it is likely to fail for some other, unforeseen problem? We address this issue by relaxing the catalyst discovery goal into a classification problem: "What is the set of catalysts that is worth testing experimentally?" Here, we present a catalyst discovery method called myopic multiscale sampling, which combines multiscale modeling with automated selection of DFT calculations. It is an active classification strategy that seeks to classify catalysts as "worth investigating" or "not worth investigating" experimentally. Our results show an ∼7-16 times speedup in catalyst classification relative to random sampling. These results were based on offline simulations of our algorithm on two different datasets: a larger, synthesized dataset and a smaller, real dataset.
Affiliation(s)
- Kevin Tran
  - Chemical Engineering Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15217, USA
- Willie Neiswanger
  - Machine Learning Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15217, USA
- Kirby Broderick
  - Chemical Engineering Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15217, USA
- Eric Xing
  - Machine Learning Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15217, USA
- Jeff Schneider
  - Machine Learning Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15217, USA
- Zachary W Ulissi
  - Chemical Engineering Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15217, USA
37
Imbalzano G, Zhuang Y, Kapil V, Rossi K, Engel EA, Grasselli F, Ceriotti M. Uncertainty estimation for molecular dynamics and sampling. J Chem Phys 2021; 154:074102. [DOI: 10.1063/5.0036522] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Indexed: 11/14/2022]
Affiliation(s)
- Giulio Imbalzano
  - Laboratory of Computational Science and Modeling, IMX, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Yongbin Zhuang
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials, Xiamen University, Xiamen 361005, China
- Venkat Kapil
  - Laboratory of Computational Science and Modeling, IMX, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
  - Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
- Kevin Rossi
  - Laboratory of Computational Science and Modeling, IMX, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
  - Laboratory of Nanochemistry for Energy, ISIC, École Polytechnique Fédérale de Lausanne, 1950 Sion, Switzerland
- Edgar A. Engel
  - Laboratory of Computational Science and Modeling, IMX, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Federico Grasselli
  - Laboratory of Computational Science and Modeling, IMX, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Michele Ceriotti
  - Laboratory of Computational Science and Modeling, IMX, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
38
Shuaibi M, Sivakumar S, Chen RQ, Ulissi ZW. Enabling robust offline active learning for machine learning potentials using simple physics-based priors. Machine Learning: Science and Technology 2021. [DOI: 10.1088/2632-2153/abcc44] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 12/31/2022]
39
Hirschfeld L, Swanson K, Yang K, Barzilay R, Coley CW. Uncertainty Quantification Using Neural Networks for Molecular Property Prediction. J Chem Inf Model 2020; 60:3770-3780. [PMID: 32702986 DOI: 10.1021/acs.jcim.0c00502] [Citation(s) in RCA: 72] [Impact Index Per Article: 18.0] [Indexed: 12/17/2022]
Abstract
Uncertainty quantification (UQ) is an important component of molecular property prediction, particularly for drug discovery applications where model predictions direct experimental design and where unanticipated imprecision wastes valuable time and resources. The need for UQ is especially acute for neural models, which are becoming increasingly standard yet are challenging to interpret. While several approaches to UQ have been proposed in the literature, there is no clear consensus on the comparative performance of these models. In this paper, we study this question in the context of regression tasks. We systematically evaluate several methods on five regression data sets using multiple complementary performance metrics. Our experiments show that none of the methods we tested is unequivocally superior to all others, and none produces a particularly reliable ranking of errors across multiple data sets. While we believe that these results show that existing UQ methods are not sufficient for all common use cases and further research is needed, we conclude with a practical recommendation as to which existing techniques seem to perform well relative to others.
Affiliation(s)
- Lior Hirschfeld
  - Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, Massachusetts 02139, United States
- Kyle Swanson
  - Department of Pure Mathematics and Mathematical Statistics, University of Cambridge, Cambridge CB3 0WB, U.K.
- Kevin Yang
  - Department of Electrical Engineering and Computer Sciences, University of California Berkeley, Berkeley, California 94720, United States
- Regina Barzilay
  - Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, Massachusetts 02139, United States
- Connor W Coley
  - Department of Chemical Engineering, MIT, Cambridge, Massachusetts 02139, United States