1. Gould T, Chan B, Dale SG, Vuckovic S. Identifying and embedding transferability in data-driven representations of chemical space. Chem Sci 2024; 15:11122-11133. [PMID: 39027290 PMCID: PMC11253166 DOI: 10.1039/d4sc02358g] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2024] [Accepted: 06/02/2024] [Indexed: 07/20/2024]
Abstract
Transferability, especially in the context of model generalization, is a paradigm of all scientific disciplines. However, the rapid advancement of machine learned model development threatens this paradigm, as it can be difficult to understand how transferability is embedded (or missed) in complex models developed using large training data sets. Two related open problems are how to identify, without relying on human intuition, what makes training data transferable; and how to embed transferability into training data. To solve both problems for ab initio chemical modelling, an indispensable tool in everyday chemistry research, we introduce a transferability assessment tool (TAT) and demonstrate it on a controllable data-driven model for developing density functional approximations (DFAs). We reveal that human intuition in the curation of training data introduces chemical biases that can hamper the transferability of data-driven DFAs. We use our TAT to motivate three transferability principles; one of which introduces the key concept of transferable diversity. Finally, we propose data curation strategies for general-purpose machine learning models in chemistry that identify and embed the transferability principles.
Affiliation(s)
- Tim Gould
  - Queensland Micro- and Nanotechnology Centre, Griffith University, Nathan, Qld 4111, Australia
- Bun Chan
  - Graduate School of Engineering, Nagasaki University, Bunkyo 1-14, Nagasaki 852-8521, Japan
- Stephen G Dale
  - Queensland Micro- and Nanotechnology Centre, Griffith University, Nathan, Qld 4111, Australia
  - Institute of Functional Intelligent Materials, National University of Singapore, 4 Science Drive 2, Singapore 117544
- Stefan Vuckovic
  - Department of Chemistry, University of Fribourg, Fribourg, Switzerland
2. Fisher KE, Herbst MF, Marzouk YM. Multitask methods for predicting molecular properties from heterogeneous data. J Chem Phys 2024; 161:014114. [PMID: 38958501 DOI: 10.1063/5.0201681] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2024] [Accepted: 06/12/2024] [Indexed: 07/04/2024]
Abstract
Data generation remains a bottleneck in training surrogate models to predict molecular properties. We demonstrate that multitask Gaussian process regression overcomes this limitation by leveraging both expensive and cheap data sources. In particular, we consider training sets constructed from coupled-cluster (CC) and density functional theory (DFT) data. We report that multitask surrogates can predict at CC-level accuracy with a reduction in data generation cost by over an order of magnitude. Of note, our approach allows the training set to include DFT data generated by a heterogeneous mix of exchange-correlation functionals without imposing any artificial hierarchy on functional accuracy. More generally, the multitask framework can accommodate a wider range of training set structures (including the full disparity between the different levels of fidelity) than existing kernel approaches based on Δ-learning, although we show that the accuracy of the two approaches can be similar. Consequently, multitask regression can be a tool for reducing data generation costs even further by opportunistically exploiting existing data sources.
Affiliation(s)
- K E Fisher
  - Department of Aeronautics and Astronautics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- M F Herbst
  - Mathematics for Materials Modelling, Institute of Mathematics and Institute of Materials, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
  - National Centre for Computational Design and Discovery of Novel Materials (MARVEL), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Y M Marzouk
  - Department of Aeronautics and Astronautics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
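The Δ-learning baseline named in the abstract of entry 2 can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a one-dimensional toy problem in which the hypothetical functions `cheap` and `expensive` stand in for DFT-level and CC-level values, and it uses a small hand-written kernel ridge regressor rather than the Gaussian process models of the paper. The idea shown is only that the model learns the correction (expensive minus cheap) and adds it back to the cheap value at prediction time.

```python
import numpy as np

def rbf_kernel(a, b, length=0.5):
    """Gaussian (RBF) kernel matrix between two 1D arrays of inputs."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-d2 / (2.0 * length**2))

def fit_krr(x, y, length=0.5, reg=1e-6):
    """Kernel ridge regression: solve (K + reg * I) alpha = y."""
    k = rbf_kernel(x, x, length)
    return np.linalg.solve(k + reg * np.eye(len(x)), y)

def predict_krr(x_train, alpha, x_new, length=0.5):
    """Evaluate the fitted kernel model at new inputs."""
    return rbf_kernel(x_new, x_train, length) @ alpha

# Hypothetical stand-ins for a cheap and an expensive level of theory.
def cheap(x):
    return np.sin(3.0 * x)

def expensive(x):
    return np.sin(3.0 * x) + 0.3 * x

# Only a few "expensive" training points are assumed to be available.
x_train = np.linspace(0.0, 2.0, 8)
alpha = fit_krr(x_train, expensive(x_train) - cheap(x_train))

# Delta-learning prediction: cheap value plus the learned correction.
x_test = np.linspace(0.1, 1.9, 50)
delta_pred = cheap(x_test) + predict_krr(x_train, alpha, x_test)
mae_delta = float(np.mean(np.abs(delta_pred - expensive(x_test))))
print(f"MAE of delta-learning sketch: {mae_delta:.2e}")
```

In this toy setting the correction is much smoother than the target itself, which is the situation in which learning the difference is expected to help; whether that holds for real DFT and CC data depends on the system and the property.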
3. Vennelakanti V, Kilic IB, Terrones GG, Duan C, Kulik HJ. Machine Learning Prediction of the Experimental Transition Temperature of Fe(II) Spin-Crossover Complexes. J Phys Chem A 2024; 128:204-216. [PMID: 38148525 DOI: 10.1021/acs.jpca.3c07104] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/28/2023]
Abstract
Spin-crossover (SCO) complexes are materials that exhibit changes in the spin state in response to external stimuli, with potential applications in molecular electronics. It is challenging to know a priori how to design ligands to achieve the delicate balance of entropic and enthalpic contributions needed to tailor a transition temperature close to room temperature. We leverage the SCO complexes from the previously curated SCO-95 data set [Vennelakanti et al. J. Chem. Phys. 159, 024120 (2023)] to train three machine learning (ML) models for transition temperature (T₁/₂) prediction using graph-based revised autocorrelations as features. We perform feature selection using random forest-ranked recursive feature addition (RF-RFA) to identify the features essential to model transferability. Of the ML models considered, the full feature set RF and recursive feature addition RF models perform best, achieving moderate correlation to experimental T₁/₂ values. We then compare ML T₁/₂ predictions to those from three previously identified best-performing density functional approximations (DFAs) which accurately predict SCO behavior across SCO-95, finding that the ML models predict T₁/₂ more accurately than the best-performing DFAs. In addition, we study ML model predictions for a set of 18 SCO complexes for which only estimated T₁/₂ values are available. Upon excluding outliers from this set, the RF-RFA RF model shows a strong correlation to estimated T₁/₂ values with a Pearson's r of 0.82. In contrast, DFA-predicted T₁/₂ values have large errors and show no correlation to estimated T₁/₂ values over the same set of complexes. Overall, our study demonstrates slightly superior performance of ML models in comparison with some of the best-performing DFAs, and we expect ML models to improve further as larger data sets of SCO complexes are curated and become available for model training.
Affiliation(s)
- Vyshnavi Vennelakanti
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Irem B Kilic
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Gianmarco G Terrones
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Chenru Duan
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Heather J Kulik
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
4. Rasmussen MH, Duan C, Kulik HJ, Jensen JH. Uncertain of uncertainties? A comparison of uncertainty quantification metrics for chemical data sets. J Cheminform 2023; 15:121. [PMID: 38111020 PMCID: PMC10729461 DOI: 10.1186/s13321-023-00790-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2023] [Accepted: 11/28/2023] [Indexed: 12/20/2023]
Abstract
With the increasingly important role of machine learning (ML) models in chemical research, the need to attach a level of confidence to model predictions naturally arises. Several methods for obtaining uncertainty estimates have been proposed in recent years, but consensus on how to evaluate them has yet to be established, and different studies on uncertainties generally use different metrics to evaluate them. We compare three of the most popular validation metrics (Spearman's rank correlation coefficient, the negative log likelihood (NLL) and the miscalibration area) to the error-based calibration introduced by Levi et al. (Sensors 2022, 22, 5540). Importantly, metrics such as the NLL and Spearman's rank correlation coefficient bear little information in themselves. We therefore introduce reference values obtained through errors simulated directly from the uncertainty distribution. The different metrics target different properties and we show how to interpret them, but we generally find the best overall validation to be done based on the error-based calibration plot introduced by Levi et al. Finally, we illustrate the sensitivity of ranking-based methods (e.g. Spearman's rank correlation coefficient) towards test set design by using the same toy model on different test sets and obtaining vastly different metrics (0.05 vs. 0.65).
Affiliation(s)
- Maria H Rasmussen
  - Department of Chemistry, University of Copenhagen, Copenhagen, Denmark
- Chenru Duan
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, USA
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, USA
- Heather J Kulik
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, USA
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, USA
- Jan H Jensen
  - Department of Chemistry, University of Copenhagen, Copenhagen, Denmark
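The idea in the abstract of entry 4, comparing an observed metric against a reference value obtained from errors simulated directly from the uncertainty distribution, can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the authors' code: the predicted uncertainties `sigmas` are synthetic, errors are assumed to be Gaussian with zero mean, and the helper names (`spearman_rho`, `gaussian_nll`, `simulated_reference`) are hypothetical. The rank correlation is computed with NumPy only and ignores ties, which is reasonable for continuous data.

```python
import numpy as np

def spearman_rho(a, b):
    """Spearman rank correlation (no tie correction; assumes continuous data)."""
    ranks_a = np.argsort(np.argsort(a)).astype(float)
    ranks_b = np.argsort(np.argsort(b)).astype(float)
    return float(np.corrcoef(ranks_a, ranks_b)[0, 1])

def gaussian_nll(errors, sigmas):
    """Mean negative log likelihood of errors under N(0, sigma_i^2)."""
    return float(np.mean(0.5 * np.log(2.0 * np.pi * sigmas**2)
                         + errors**2 / (2.0 * sigmas**2)))

def simulated_reference(sigmas, n_draws=200, seed=0):
    """Average metrics over errors drawn from the uncertainty distribution itself."""
    rng = np.random.default_rng(seed)
    rhos, nlls = [], []
    for _ in range(n_draws):
        sim_errors = rng.normal(0.0, sigmas)  # one simulated error per prediction
        rhos.append(spearman_rho(np.abs(sim_errors), sigmas))
        nlls.append(gaussian_nll(sim_errors, sigmas))
    return float(np.mean(rhos)), float(np.mean(nlls))

# Synthetic example: errors really do follow the predicted uncertainties,
# so the observed metrics should sit close to their simulated references.
rng = np.random.default_rng(42)
sigmas = rng.uniform(0.1, 1.0, size=500)
errors = rng.normal(0.0, sigmas)

rho_obs = spearman_rho(np.abs(errors), sigmas)
nll_obs = gaussian_nll(errors, sigmas)
rho_ref, nll_ref = simulated_reference(sigmas)
print(f"Spearman: observed {rho_obs:.2f}, reference {rho_ref:.2f}")
print(f"NLL:      observed {nll_obs:.2f}, reference {nll_ref:.2f}")
```

The reference Spearman value is well below 1 even for perfectly calibrated uncertainties, which is one way to see why the raw coefficient carries little information on its own.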
5. Ariyarathna IR, Cho Y, Duan C, Kulik HJ. Gas-phase and solid-state electronic structure analysis and DFT benchmarking of HfCO. Phys Chem Chem Phys 2023; 25:26632-26639. [PMID: 37767841 DOI: 10.1039/d3cp03550f] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/29/2023]
Abstract
Ab initio multi-reference configuration interaction (MRCI) and coupled cluster singles doubles and perturbative triples [CCSD(T)] levels of theory were used to study ground and excited electronic states of HfCO. We report potential energy curves, dissociation energies (Dₑ), excitation energies, harmonic vibrational frequencies, and chemical bonding patterns of HfCO. The ³Σ⁻ ground state of HfCO has a 1σ²2σ²1π² electron configuration and a ∼30 kcal mol⁻¹ dissociation energy with respect to its lowest-energy fragments Hf(³F) + CO(X¹Σ⁺). We further evaluated the Dₑ of its isovalent HfCX (X = S, Se, Te, Po) series and observed that they increase linearly from the lighter HfCO to the heavier HfCPo with the dipole moment of the CX ligand. The same linear relationship was observed for TiCX and ZrCX. We utilized the CCSD(T) benchmark values of Dₑ, excitation energy, and ionization energy (IE) to evaluate density functional theory (DFT) errors with 23 exchange-correlation functionals spanning GGA, meta-GGA, global GGA hybrid, meta-GGA hybrid, range-separated hybrid, and double-hybrid functional families. The global GGA hybrid B3LYP and range-separated hybrid ωB97X performed well at representing the ground state properties of HfCO (i.e., Dₑ and IE). Finally, we extended our DFT analysis to the interaction of a CO molecule with a Hf surface and observed that the surface chemisorption energy and the gas-phase molecular dissociation energy are very similar for some DFAs but not others, suggesting moderate transferability of the benchmarks on these molecules to the solid state.
Affiliation(s)
- Isuru R Ariyarathna
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Yeongsu Cho
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Chenru Duan
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Heather J Kulik
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
6. Kirschbaum T, von Seggern B, Dzubiella J, Bande A, Noé F. Machine Learning Frontier Orbital Energies of Nanodiamonds. J Chem Theory Comput 2023; 19:4461-4473. [PMID: 37053438 DOI: 10.1021/acs.jctc.2c01275] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 04/15/2023]
Abstract
Nanodiamonds have a wide range of applications including catalysis, sensing, tribology, and biomedicine. To leverage nanodiamond design via machine learning, we introduce the new data set ND5k, consisting of 5089 diamondoid and nanodiamond structures and their frontier orbital energies. ND5k structures are optimized via tight-binding density functional theory (DFTB) and their frontier orbital energies are computed using density functional theory (DFT) with the PBE0 hybrid functional. From this data set we derive a qualitative design suggestion for nanodiamonds in photocatalysis. We also compare recent machine learning models for predicting frontier orbital energies of structures similar to those they were trained on (interpolation on ND5k), and we test their ability to extrapolate predictions to larger structures. For both the interpolation and extrapolation tasks, we find the best performance using the equivariant message passing neural network PaiNN. The second-best results are achieved with a message passing neural network using a tailored set of atomic descriptors proposed here.
Affiliation(s)
- Thorren Kirschbaum
  - Helmholtz-Zentrum Berlin für Materialien und Energie GmbH, Hahn-Meitner-Platz 1, 14109 Berlin, Germany
  - Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
- Börries von Seggern
  - Helmholtz-Zentrum Berlin für Materialien und Energie GmbH, Hahn-Meitner-Platz 1, 14109 Berlin, Germany
  - Department of Biology, Chemistry and Pharmacy, Freie Universität Berlin, Arnimallee 22, 14195 Berlin, Germany
- Joachim Dzubiella
  - Institute of Physics, Albert-Ludwigs-Universität Freiburg, Hermann-Herder-Straße 3, 79104 Freiburg im Breisgau, Germany
- Annika Bande
  - Helmholtz-Zentrum Berlin für Materialien und Energie GmbH, Hahn-Meitner-Platz 1, 14109 Berlin, Germany
- Frank Noé
  - Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
  - Microsoft Research AI4Science, Karl-Liebknecht Str. 32, 10178 Berlin, Germany
  - Department of Physics, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
  - Department of Chemistry, Rice University, 6100 Main Street, Houston, Texas 77005, United States
7. Cytter Y, Nandy A, Duan C, Kulik HJ. Insights into the deviation from piecewise linearity in transition metal complexes from supervised machine learning models. Phys Chem Chem Phys 2023; 25:8103-8116. [PMID: 36876903 DOI: 10.1039/d3cp00258f] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/25/2023]
Abstract
Virtual high-throughput screening (VHTS) and machine learning (ML) with density functional theory (DFT) suffer from inaccuracies from the underlying density functional approximation (DFA). Many of these inaccuracies can be traced to the lack of derivative discontinuity that leads to a curvature in the energy with electron addition or removal. Over a dataset of nearly one thousand transition metal complexes typical of VHTS applications, we computed and analyzed the average curvature (i.e., deviation from piecewise linearity) for 23 density functional approximations spanning multiple rungs of "Jacob's ladder". While we observe the expected dependence of the curvatures on Hartree-Fock exchange, we note limited correlation of curvature values between different rungs of "Jacob's ladder". We train ML models (i.e., artificial neural networks or ANNs) to predict the curvature and the associated frontier orbital energies for each of these 23 functionals and then interpret differences in curvature among the different DFAs through analysis of the ML models. Notably, we observe spin to play a much more important role in determining the curvature of range-separated and double hybrids in comparison to semi-local functionals, explaining why curvature values are weakly correlated between these and other families of functionals. Over a space of 187.2k hypothetical compounds, we use our ANNs to pinpoint DFAs for which representative transition metal complexes have near-zero curvature with low uncertainty, demonstrating an approach to accelerate screening of complexes with targeted optical gaps.
Affiliation(s)
- Yael Cytter
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Aditya Nandy
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Chenru Duan
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Heather J Kulik
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
8. Duan C, Nandy A, Terrones GG, Kastner DW, Kulik HJ. Active Learning Exploration of Transition-Metal Complexes to Discover Method-Insensitive and Synthetically Accessible Chromophores. JACS Au 2023; 3:391-401. [PMID: 36873700 PMCID: PMC9976347 DOI: 10.1021/jacsau.2c00547] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2022] [Revised: 11/15/2022] [Accepted: 11/16/2022] [Indexed: 06/18/2023]
Abstract
Transition-metal chromophores with earth-abundant transition metals are an important design target for their applications in lighting and nontoxic bioimaging, but their design is challenged by the scarcity of complexes that simultaneously have well-defined ground states and optimal target absorption energies in the visible region. Machine learning (ML) accelerated discovery could overcome such challenges by enabling the screening of a larger space but is limited by the fidelity of the data used in ML model training, which is typically from a single approximate density functional. To address this limitation, we search for consensus in predictions among 23 density functional approximations across multiple rungs of "Jacob's ladder". To accelerate the discovery of complexes with absorption energies in the visible region while minimizing the effect of low-lying excited states, we use two-dimensional (2D) efficient global optimization to sample candidate low-spin chromophores from multimillion complex spaces. Despite the scarcity (i.e., ∼0.01%) of potential chromophores in this large chemical space, we identify candidates with high likelihood (i.e., >10%) of computational validation as the ML models improve during active learning, representing a 1000-fold acceleration in discovery. Absorption spectra of promising chromophores from time-dependent density functional theory verify that 2/3 of candidates have the desired excited-state properties. The observation that constituent ligands from our leads have demonstrated interesting optical properties in the literature exemplifies the effectiveness of our construction of a realistic design space and active learning approach.
Affiliation(s)
- Chenru Duan
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Aditya Nandy
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Gianmarco G. Terrones
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- David W. Kastner
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
  - Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Heather J. Kulik
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
9. Liu Y, Sulaiman HF, Johnson BR, Ma R, Gao Y, Fernando H, Amarasekara A, Ashley-Oyewole A, Fan H, Ingram HN, Briggs JM. QM/MM study of N501 involved intermolecular interaction between SARS-CoV-2 receptor binding domain and antibody of human origin. Comput Biol Chem 2023; 102:107810. [PMID: 36610304 PMCID: PMC9811887 DOI: 10.1016/j.compbiolchem.2023.107810] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/25/2022] [Revised: 01/02/2023] [Accepted: 01/03/2023] [Indexed: 01/06/2023]
Abstract
The intermolecular interaction between the key residue N501 of the epitope on the SARS-CoV-2 receptor binding domain (RBD) and the screening antibody B38 was studied using QM/MM and QM approaches. The QM/MM-optimized geometry shows an X-H---Y angle of 165° for the O-H---O hydrogen bond between S30 of the mAb light chain and N501 of the RBD. High-level MP2 calculations indicate that the interaction between RBD N501 and S30 of the B38 Fab light chain provides a relatively strong attractive interaction of -3.32 kcal/mol, whereas the hydrogen bond between RBD Q498 and S30 was quantified as 0.10 kcal/mol. The ESP partial charge on the hydroxyl hydrogen atom of S30 drops from 0.38 a.u. to 0.31 a.u., indicating that 0.07 a.u. is shared from the oxygen lone pair of N501 upon hydrogen bond formation. The NBO occupancy of the hydrogen atom also decreases from 25.79% to 22.93% in the hydroxyl H-O NBO bond of S30. However, the minor change in the NBO hybridization of the hydroxyl oxygen of S30, from sp3.00 to sp3.05, implies rigidity of the tetrahedral hydrogen bond geometry in the relatively dynamic protein complex. The O-H---O angle of 165° is close to, but not exactly, linear. The structural requirement for sp3 hybridization of the hydroxyl oxygen on S30 and the dimensions of the protein likely prevent O-H---O from adopting a linear geometry. The hydrogen bond strengths were also calculated using a variety of DFT methods, and the result of -3.33 kcal/mol from the M06L method is the closest to that of the MP2 calculation. The results of this work may aid COVID-19 vaccine and drug screening.
Affiliation(s)
- Yuemin Liu
  - Department of Chemistry, Prairie View A&M University, Prairie View, TX 77446, the United States of America
  - Department of Chemistry, Rice University, Houston, TX 77005, the United States of America
  - Corresponding author at: Department of Chemistry, Prairie View A&M University, Prairie View, TX 77446, the United States of America
- Hana F. Sulaiman
  - Department of Chemistry, Prairie View A&M University, Prairie View, TX 77446, the United States of America
- Bruce R. Johnson
  - Department of Chemistry, Rice University, Houston, TX 77005, the United States of America
- Rulong Ma
  - Department of Biology and Biochemistry, University of Houston, Houston, TX 77004, the United States of America
- Yunxiang Gao
  - Department of Chemistry, Prairie View A&M University, Prairie View, TX 77446, the United States of America
- Harshica Fernando
  - Department of Chemistry, Prairie View A&M University, Prairie View, TX 77446, the United States of America
- Ananda Amarasekara
  - Department of Chemistry, Prairie View A&M University, Prairie View, TX 77446, the United States of America
- Andrea Ashley-Oyewole
  - Department of Chemistry, Prairie View A&M University, Prairie View, TX 77446, the United States of America
- Huajun Fan
  - College of Chemical Engineering, Sichuan University Science and Engineering, Zigong, Sichuan 643000, PR China
- Heaven N. Ingram
  - Department of Chemistry, Prairie View A&M University, Prairie View, TX 77446, the United States of America
- James M. Briggs
  - Department of Biology and Biochemistry, University of Houston, Houston, TX 77004, the United States of America
10. Duan C, Nandy A, Meyer R, Arunachalam N, Kulik HJ. A transferable recommender approach for selecting the best density functional approximations in chemical discovery. Nat Comput Sci 2023; 3:38-47. [PMID: 38177951 DOI: 10.1038/s43588-022-00384-0] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 08/07/2022] [Accepted: 11/23/2022] [Indexed: 01/06/2024]
Abstract
Approximate density functional theory has become indispensable owing to its balanced cost-accuracy trade-off, including in large-scale screening. To date, however, no density functional approximation (DFA) with universal accuracy has been identified, leading to uncertainty in the quality of data generated from density functional theory. With electron density fitting and Δ-learning, we build a DFA recommender that selects the DFA with the lowest expected error with respect to the gold standard (but cost-prohibitive) coupled cluster theory in a system-specific manner. We demonstrate this recommender approach on the evaluation of vertical spin splitting energies of transition metal complexes. Our recommender predicts top-performing DFAs and yields excellent accuracy (about 2 kcal mol⁻¹) for chemical discovery, outperforming both individual Δ-learning models and the best conventional single-functional approach from a set of 48 DFAs. By demonstrating transferability to diverse synthesized compounds, our recommender potentially addresses the accuracy versus scope dilemma broadly encountered in computational chemistry.
Affiliation(s)
- Chenru Duan
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA, USA
- Aditya Nandy
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA, USA
- Ralf Meyer
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Naveen Arunachalam
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Heather J Kulik
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA, USA
11. Ranjbar M, Nowroozi A, Nakhaei E. The first principle study of chalcogen bonds, pnicogen bond and their mutual effects in a set of complexes between the triazine with SHF and PH2F ligands. Comput Theor Chem 2022. [DOI: 10.1016/j.comptc.2022.113867] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
12. Nandy A, Duan C, Kulik HJ. Audacity of huge: overcoming challenges of data scarcity and data quality for machine learning in computational materials discovery. Curr Opin Chem Eng 2022. [DOI: 10.1016/j.coche.2021.100778] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 01/22/2023]
13. Ariyarathna IR, Duan C, Kulik HJ. Understanding the chemical bonding of ground and excited states of HfO and HfB with correlated wavefunction theory and density functional approximations. J Chem Phys 2022; 156:184113. [PMID: 35568536 DOI: 10.1063/5.0090128] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 11/14/2022]
Abstract
Knowledge of the chemical bonding of HfO and HfB ground and low-lying electronic states provides essential insights into a range of catalysts and materials that contain Hf-O or Hf-B moieties. Here, we carry out high-level multi-reference configuration interaction theory and coupled cluster quantum chemical calculations on these systems. We compute full potential energy curves, excitation energies, ionization energies, electronic configurations, and spectroscopic parameters with large quadruple-ζ and quintuple-ζ quality correlation consistent basis sets. We also investigate equilibrium chemical bonding patterns and effects of correlating core electrons on property predictions. Differences in the ground state electron configuration of HfB(X⁴Σ⁻) and HfO(X¹Σ⁺) lead to a significantly stronger bond in HfO than HfB, as judged by both dissociation energies and equilibrium bond distances. We extend our analysis to the chemical bonding patterns of the isovalent HfX (X = O, S, Se, Te, and Po) series and observe similar trends. We also note a linear trend between the decreasing value of the dissociation energy (Dₑ) from HfO to HfPo and the singlet-triplet energy gap (ΔE_S-T) of the molecule. Finally, we compare these benchmark results to those obtained using density functional theory (DFT) with 23 exchange-correlation functionals spanning multiple rungs of "Jacob's ladder." When comparing DFT errors to coupled cluster reference values on dissociation energies, excitation energies, and ionization energies of HfB and HfO, we observe semi-local generalized gradient approximations to significantly outperform more complex and high-cost functionals.
Affiliation(s)
- Isuru R Ariyarathna
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Chenru Duan
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Heather J Kulik
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
14
|
Duan C, Chu DBK, Nandy A, Kulik HJ. Detection of multi-reference character imbalances enables a transfer learning approach for virtual high throughput screening with coupled cluster accuracy at DFT cost. Chem Sci 2022; 13:4962-4971. [PMID: 35655882 PMCID: PMC9067623 DOI: 10.1039/d2sc00393g] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 01/21/2022] [Accepted: 04/04/2022] [Indexed: 01/08/2023] Open
Abstract
Appropriately identifying and treating molecules and materials with significant multi-reference (MR) character is crucial for achieving high data fidelity in virtual high-throughput screening (VHTS). Despite development of numerous MR diagnostics, the extent to which a single value of such a diagnostic indicates the MR effect on a chemical property prediction is not well established. We evaluate MR diagnostics for over 10 000 transition-metal complexes (TMCs) and compare to those for organic molecules. We observe that only some MR diagnostics are transferable from one chemical space to another. By studying the influence of MR character on chemical properties (i.e., MR effect) that involve multiple potential energy surfaces (i.e., adiabatic spin splitting, ΔE_H-L, and ionization potential, IP), we show that differences in MR character are more important than the cumulative degree of MR character in predicting the magnitude of an MR effect. Motivated by this observation, we build transfer learning models to predict CCSD(T)-level adiabatic ΔE_H-L and IP from lower levels of theory. By combining these models with uncertainty quantification and multi-level modeling, we introduce a multi-pronged strategy that accelerates data acquisition by at least a factor of three while achieving coupled cluster accuracy (i.e., to within 1 kcal mol⁻¹ MAE) for robust VHTS.
Affiliation(s)
- Chenru Duan
- Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge MA 02139 USA
- Department of Chemistry, Massachusetts Institute of Technology Cambridge MA 02139 USA
| | - Daniel B K Chu
- Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge MA 02139 USA
| | - Aditya Nandy
- Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge MA 02139 USA
- Department of Chemistry, Massachusetts Institute of Technology Cambridge MA 02139 USA
| | - Heather J Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge MA 02139 USA
|
15
|
Tarzia A, Jelfs KE. Unlocking the computational design of metal-organic cages. Chem Commun (Camb) 2022; 58:3717-3730. [PMID: 35229861 PMCID: PMC8932387 DOI: 10.1039/d2cc00532h] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 01/26/2022] [Accepted: 02/22/2022] [Indexed: 12/11/2022]
Abstract
Metal-organic cages are macrocyclic structures that can possess an intrinsic void that can hold molecules for encapsulation, adsorption, sensing, and catalysis applications. As metal-organic cages may be composed of nearly any combination of organic and metal-containing components, cages can form with diverse shapes and sizes, allowing for tuning toward targeted properties. Therefore, their near-infinite design space is almost impossible to explore through experimentation alone, and computational design can play a crucial role in exploring new systems. Although high-throughput computational design and screening workflows have long been known as powerful tools in drug and materials discovery, their application in exploring metal-organic cages is more recent. We show examples of structure prediction and host-guest/catalytic property evaluation of metal-organic cages. These examples are facilitated by advances in methods that handle metal-containing systems with improved accuracy and are the beginning of the development of automated cage design workflows. We finally outline a scope for how high-throughput computational methods can assist and drive experimental decisions as the field pushes toward functional and complex metal-organic cages. In particular, we highlight the importance of considering realistic, flexible systems.
Affiliation(s)
- Andrew Tarzia
- Department of Chemistry, Molecular Sciences Research Hub, Imperial College London, White City Campus, Wood Lane, London, W12 0BZ, UK.
| | - Kim E Jelfs
- Department of Chemistry, Molecular Sciences Research Hub, Imperial College London, White City Campus, Wood Lane, London, W12 0BZ, UK.
|
16
|
Vitillo JG, Cramer CJ, Gagliardi L. Multireference Methods are Realistic and Useful Tools for Modeling Catalysis. Isr J Chem 2022. [DOI: 10.1002/ijch.202100136] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/31/2022]
Affiliation(s)
- Jenny G. Vitillo
- Department of Science and High Technology and INSTM Università degli Studi dell'Insubria Via Valleggio 9 I-22100 Como Italy
| | - Christopher J. Cramer
- Underwriters Laboratories Inc. 333 Pfingsten Road Northbrook Illinois 60602 United States
| | - Laura Gagliardi
- Department of Chemistry Pritzker School of Molecular Engineering James Franck Institute University of Chicago Chicago Illinois 60637 United States
|
17
|
Duan C, Nandy A, Kulik HJ. Machine Learning for the Discovery, Design, and Engineering of Materials. Annu Rev Chem Biomol Eng 2022; 13:405-429. [PMID: 35320698 DOI: 10.1146/annurev-chembioeng-092320-120230] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
Machine learning (ML) has become a part of the fabric of high-throughput screening and computational discovery of materials. Despite its increasingly central role, challenges remain in fully realizing the promise of ML. This is especially true for the practical acceleration of the engineering of robust materials and the development of design strategies that surpass trial and error or high-throughput screening alone. Depending on the quantity being predicted and the experimental data available, ML can either outperform physics-based models, be used to accelerate such models, or be integrated with them to improve their performance. We cover recent advances in algorithms and in their application that are starting to make inroads toward (a) the discovery of new materials through large-scale enumerative screening, (b) the design of materials through identification of rules and principles that govern materials properties, and (c) the engineering of practical materials by satisfying multiple objectives. We conclude with opportunities for further advancement to realize ML as a widespread tool for practical computational materials design. Expected final online publication date for the Annual Review of Chemical and Biomolecular Engineering, Volume 13 is October 2022. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
Affiliation(s)
- Chenru Duan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
| | - Aditya Nandy
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
| | - Heather J Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
|
18
|
Chalcogen Bonding in the Molecular Dimers of WCh2 (Ch = S, Se, Te): On the Basic Understanding of the Local Interfacial and Interlayer Bonding Environment in 2D Layered Tungsten Dichalcogenides. Int J Mol Sci 2022; 23:ijms23031263. [PMID: 35163185 PMCID: PMC8835845 DOI: 10.3390/ijms23031263] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 11/28/2021] [Revised: 01/14/2022] [Accepted: 01/17/2022] [Indexed: 01/28/2023] Open
Abstract
Layered two-dimensional transition metal dichalcogenides and their heterostructures are of current interest, owing to the diversity of their applications in many areas of materials nanoscience and technologies. With this in mind, we have examined the three molecular dimers of the tungsten dichalcogenide series, (WCh2)2 (Ch = S, Se, Te), using density functional theory to provide insight into which interactions, and their specific characteristics, are responsible for the interfacial/interlayer region in the room temperature 2H phase of WCh2 crystals. Our calculations at various levels of theory suggested that the Te···Te chalcogen bonding in (WTe2)2 is weak, whereas the Se···Se and S···S bonding interactions in (WSe2)2 and (WS2)2, respectively, are of the van der Waals type. The presence and character of Ch···Ch chalcogen bonding interactions in the dimers of (WCh2)2 are examined with a number of theoretical approaches and discussed, including charge-density-based approaches, such as the quantum theory of atoms in molecules, interaction region indicator, independent gradient model, and reduced density gradient non-covalent index approaches. The charge-density-based topological features are shown to be concordant with the results that originate from the extrema of potential on the electrostatic surfaces of WCh2 monomers. A natural bond orbital analysis has enabled us to suggest a number of weak hyperconjugative charge transfer interactions between the interacting monomers that are responsible for the geometry of the (WCh2)2 dimers at equilibrium. 
In addition to other features, we demonstrate that there is no so-called van der Waals gap between the monolayers in two-dimensional layered transition metal tungsten dichalcogenides, which are gapless, and that the (WCh2)2 dimers may be prototypes for a basic understanding of the physical chemistry of the chemical bonding environments associated with the local interfacial/interlayer regions in layered 2H-WCh2 nanoscale systems.
|
19
|
Harper DR, Kulik HJ. Computational Scaling Relationships Predict Experimental Activity and Rate-Limiting Behavior in Homogeneous Water Oxidation. Inorg Chem 2022; 61:2186-2197. [PMID: 35037756 DOI: 10.1021/acs.inorgchem.1c03376] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/28/2022]
Abstract
While computational screening with first-principles density functional theory (DFT) is essential for evaluating candidate catalysts, limitations in accuracy typically prevent the prediction of experimentally relevant activities. Exemplary of these challenges are homogeneous water oxidation catalysts (WOCs) where differences in experimental conditions or small changes in ligand structure can alter rate constants by over an order of magnitude. Here, we compute mechanistically relevant electronic and energetic properties for 19 mononuclear Ru transition-metal complexes (TMCs) from three experimental water oxidation catalysis studies. We discover that 15 of these TMCs have experimental activities that correlate with a single property, the ionization potential of the Ru(II)-O2 catalytic intermediate. This scaling parameter allows the quantitative understanding of activity trends and provides insight into the rate-limiting behavior. We use this approach to rationalize differences in activity with different experimental conditions, and we qualitatively analyze the source of distinct behavior for different electronic states in the other four catalysts. Comparison to closely related single-atom catalysts and modified WOCs enables rationalization of the source of rate enhancement in these WOCs.
Affiliation(s)
- Daniel R Harper
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Heather J Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
|