1. Stuyver T. TS-tools: Rapid and automated localization of transition states based on a textual reaction SMILES input. J Comput Chem 2024; 45:2308-2317. [PMID: 38850166 DOI: 10.1002/jcc.27374] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2024] [Revised: 03/08/2024] [Accepted: 03/20/2024] [Indexed: 06/10/2024]
Abstract
Here, TS-tools is presented, a Python package facilitating the automated localization of transition states (TS) based on a textual reaction SMILES input. TS searches can either be performed at xTB or DFT level of theory, with the former yielding guesses at marginal computational cost, and the latter directly yielding accurate structures at greater expense. On a benchmarking dataset of mono- and bimolecular reactions, TS-tools reaches an excellent success rate of 95% already at xTB level of theory. For tri- and multimolecular reaction pathways - which are typically not benchmarked when developing new automated TS search approaches, yet are relevant for various types of reactivity, cf. solvent- and autocatalysis and enzymatic reactivity - TS-tools retains its ability to identify TS geometries, though a DFT treatment becomes essential in many cases. Throughout the presented applications, a particular emphasis is placed on solvation-induced mechanistic changes, another issue that received limited attention in the automated TS search literature so far.
Affiliation(s)
- Thijs Stuyver
- Ecole Nationale Supérieure de Chimie de Paris, Université PSL, CNRS, Institute of Chemistry for Life and Health Sciences, Paris, France
2. Wetthasinghe ST, Garashchuk SV, Rassolov VA. Stability Trends in Disubstituted Cobaltocenium Based on the Analysis of the Machine Learning Models. J Phys Chem A 2023; 127:10701-10708. [PMID: 38015632 DOI: 10.1021/acs.jpca.3c05668] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2023]
Abstract
Cobaltocenium derivatives have shown great potential as components of anion exchange membranes in fuel cells because they exhibit excellent thermal and alkaline stability under operating conditions while allowing for high anion mobility. The properties of the cobaltocenium-anion complexes can be chemically tuned through the substituent groups on the cyclopentadienyl (Cp) rings of the cation CoCp2+. However, the synthesis and characterization of the full range of possible derivatives are very challenging and time-consuming, and while the computational tools can greatly expedite this process, full screening of the electronic structure at a high level of theory is still computationally intensive. Therefore, in this work, we consider machine learning (ML) modeling as a tool for predicting the stability of disubstituted [CoCp2]OH complexes measured by their bond-dissociation energy (BDE). The relevant process here is the dissociation of the cobaltocenium-hydroxide complex into fragments [CoCpY']OH and CpY, where Y and Y' each represent one out of 42 substituent groups of experimental interest. In agreement with the previous ML study of 120 mono- and selected disubstituted species [Wetthasinghe et al. J. Phys. Chem. A (2022) 126], our analysis of the data set expanded to all possible disubstituted cobaltoceniums points to the highest occupied and lowest unoccupied molecular orbitals, along with the Hirshfeld charge on the singly substituted benzene, as the key features predicting the BDE of the unseen complexes. Based on the examination of the outliers, the acidity of substituents ((CO)NH2 in our case) is found to be of special significance for the cobaltocenium stability and for the model development. Moreover, we demonstrate that upon the data set refinement, the conventional ML models are capable of predicting the BDE close to 1 kcal/mol based on the properties of just the fragments, thereby greatly reducing the total number of species and of the computational time of each calculation. Such a fragment-based "combinatorial" approach to the BDE modeling is noteworthy, since the geometry optimization of complexes in solution is conceptually challenging and computationally demanding, even when leveraging high-performance computing resources.
Affiliation(s)
- Shehani T Wetthasinghe
- Department of Chemistry and Biochemistry, University of South Carolina, Columbia, South Carolina 29208, United States
- Sophya V Garashchuk
- Department of Chemistry and Biochemistry, University of South Carolina, Columbia, South Carolina 29208, United States
- Vitaly A Rassolov
- Department of Chemistry and Biochemistry, University of South Carolina, Columbia, South Carolina 29208, United States
3. Adamji H, Nandy A, Kevlishvili I, Román-Leshkov Y, Kulik HJ. Computational Discovery of Stable Metal-Organic Frameworks for Methane-to-Methanol Catalysis. J Am Chem Soc 2023. [PMID: 37339429 DOI: 10.1021/jacs.3c03351] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 06/22/2023]
Abstract
The challenge of direct partial oxidation of methane to methanol has motivated the targeted search of metal-organic frameworks (MOFs) as a promising class of materials for this transformation because of their site-isolated metals with tunable ligand environments. Thousands of MOFs have been synthesized, yet relatively few have been screened for their promise in methane conversion. We developed a high-throughput virtual screening workflow that identifies MOFs from a diverse space of experimental MOFs that have not been studied for catalysis, yet are thermally stable, synthesizable, and have promising unsaturated metal sites for C-H activation via a terminal metal-oxo species. We carried out density functional theory calculations of the radical rebound mechanism for methane-to-methanol conversion on models of the secondary building units (SBUs) from 87 selected MOFs. While we showed that oxo formation favorability decreases with increasing 3d filling, consistent with prior work, previously observed scaling relations between oxo formation and hydrogen atom transfer (HAT) are disrupted by the greater diversity in our MOF set. Accordingly, we focused on Mn MOFs, which favor oxo intermediates without disfavoring HAT or leading to high methanol release energies, a key feature for methane hydroxylation activity. We identified three Mn MOFs comprising unsaturated Mn centers bound to weak-field carboxylate ligands in planar or bent geometries with promising methane-to-methanol kinetics and thermodynamics. The energetic spans of these MOFs are indicative of promising turnover frequencies for methane to methanol that warrant further experimental catalytic studies.
Affiliation(s)
- Husain Adamji
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Aditya Nandy
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Ilia Kevlishvili
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Yuriy Román-Leshkov
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Heather J Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
4. Cytter Y, Nandy A, Duan C, Kulik HJ. Insights into the deviation from piecewise linearity in transition metal complexes from supervised machine learning models. Phys Chem Chem Phys 2023; 25:8103-8116. [PMID: 36876903 DOI: 10.1039/d3cp00258f] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/25/2023]
Abstract
Virtual high-throughput screening (VHTS) and machine learning (ML) with density functional theory (DFT) suffer from inaccuracies from the underlying density functional approximation (DFA). Many of these inaccuracies can be traced to the lack of derivative discontinuity that leads to a curvature in the energy with electron addition or removal. Over a dataset of nearly one thousand transition metal complexes typical of VHTS applications, we computed and analyzed the average curvature (i.e., deviation from piecewise linearity) for 23 density functional approximations spanning multiple rungs of "Jacob's ladder". While we observe the expected dependence of the curvatures on Hartree-Fock exchange, we note limited correlation of curvature values between different rungs of "Jacob's ladder". We train ML models (i.e., artificial neural networks or ANNs) to predict the curvature and the associated frontier orbital energies for each of these 23 functionals and then interpret differences in curvature among the different DFAs through analysis of the ML models. Notably, we observe spin to play a much more important role in determining the curvature of range-separated and double hybrids in comparison to semi-local functionals, explaining why curvature values are weakly correlated between these and other families of functionals. Over a space of 187.2k hypothetical compounds, we use our ANNs to pinpoint DFAs for which representative transition metal complexes have near-zero curvature with low uncertainty, demonstrating an approach to accelerate screening of complexes with targeted optical gaps.
Affiliation(s)
- Yael Cytter
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Aditya Nandy
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Chenru Duan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Heather J Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
5. Duan C, Nandy A, Terrones GG, Kastner DW, Kulik HJ. Active Learning Exploration of Transition-Metal Complexes to Discover Method-Insensitive and Synthetically Accessible Chromophores. JACS Au 2023; 3:391-401. [PMID: 36873700 PMCID: PMC9976347 DOI: 10.1021/jacsau.2c00547] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2022] [Revised: 11/15/2022] [Accepted: 11/16/2022] [Indexed: 06/18/2023]
Abstract
Transition-metal chromophores with earth-abundant transition metals are an important design target for their applications in lighting and nontoxic bioimaging, but their design is challenged by the scarcity of complexes that simultaneously have well-defined ground states and optimal target absorption energies in the visible region. Machine learning (ML) accelerated discovery could overcome such challenges by enabling the screening of a larger space but is limited by the fidelity of the data used in ML model training, which is typically from a single approximate density functional. To address this limitation, we search for consensus in predictions among 23 density functional approximations across multiple rungs of "Jacob's ladder". To accelerate the discovery of complexes with absorption energies in the visible region while minimizing the effect of low-lying excited states, we use two-dimensional (2D) efficient global optimization to sample candidate low-spin chromophores from multimillion complex spaces. Despite the scarcity (i.e., ∼0.01%) of potential chromophores in this large chemical space, we identify candidates with high likelihood (i.e., >10%) of computational validation as the ML models improve during active learning, representing a 1000-fold acceleration in discovery. Absorption spectra of promising chromophores from time-dependent density functional theory verify that 2/3 of candidates have the desired excited-state properties. The observation that constituent ligands from our leads have demonstrated interesting optical properties in the literature exemplifies the effectiveness of our construction of a realistic design space and active learning approach.
Affiliation(s)
- Chenru Duan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Aditya Nandy
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Gianmarco G. Terrones
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- David W. Kastner
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Heather J. Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
6. Short MAS, Tovee CA, Willans CE, Nguyen BN. High-throughput computational workflow for ligand discovery in catalysis with the CSD. Catal Sci Technol 2023. [DOI: 10.1039/d3cy00083d] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/30/2023]
Abstract
A novel semi-automated, high-throughput computational workflow for ligand/catalyst discovery based on the Cambridge Structural Database is reported.
7. Duan C, Nandy A, Meyer R, Arunachalam N, Kulik HJ. A transferable recommender approach for selecting the best density functional approximations in chemical discovery. Nat Comput Sci 2023; 3:38-47. [PMID: 38177951 DOI: 10.1038/s43588-022-00384-0] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 08/07/2022] [Accepted: 11/23/2022] [Indexed: 01/06/2024]
Abstract
Approximate density functional theory has become indispensable owing to its balanced cost-accuracy trade-off, including in large-scale screening. To date, however, no density functional approximation (DFA) with universal accuracy has been identified, leading to uncertainty in the quality of data generated from density functional theory. With electron density fitting and Δ-learning, we build a DFA recommender that selects the DFA with the lowest expected error with respect to the gold standard (but cost-prohibitive) coupled cluster theory in a system-specific manner. We demonstrate this recommender approach on the evaluation of vertical spin splitting energies of transition metal complexes. Our recommender predicts top-performing DFAs and yields excellent accuracy (about 2 kcal mol⁻¹) for chemical discovery, outperforming both individual Δ-learning models and the best conventional single-functional approach from a set of 48 DFAs. By demonstrating transferability to diverse synthesized compounds, our recommender potentially addresses the accuracy versus scope dilemma broadly encountered in computational chemistry.
Affiliation(s)
- Chenru Duan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA, USA
- Aditya Nandy
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA, USA
- Ralf Meyer
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Naveen Arunachalam
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Heather J Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA, USA
8. Duan C, Ladera AJ, Liu JCL, Taylor MG, Ariyarathna IR, Kulik HJ. Exploiting Ligand Additivity for Transferable Machine Learning of Multireference Character across Known Transition Metal Complex Ligands. J Chem Theory Comput 2022; 18:4836-4845. [PMID: 35834742 DOI: 10.1021/acs.jctc.2c00468] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 11/28/2022]
Abstract
Accurate virtual high-throughput screening (VHTS) of transition metal complexes (TMCs) remains challenging due to the possibility of high multireference (MR) character that complicates property evaluation. We compute MR diagnostics for over 5,000 ligands present in previously synthesized octahedral mononuclear transition metal complexes in the Cambridge Structural Database (CSD). To accomplish this task, we introduce an iterative approach for consistent ligand charge assignment for ligands in the CSD. Across this set, we observe that the MR character correlates linearly with the inverse value of the averaged bond order over all bonds in the molecule. We then demonstrate that ligand additivity of the MR character holds in TMCs, which suggests that the TMC MR character can be inferred from the sum of the MR character of the ligands. Encouraged by this observation, we leverage ligand additivity and develop a ligand-derived machine learning representation to train neural networks to predict the MR character of TMCs from properties of the constituent ligands. This approach yields models with excellent performance and superior transferability to unseen ligand chemistry and compositions.
Affiliation(s)
- Chenru Duan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Adriana J Ladera
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Julian C-L Liu
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Michael G Taylor
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Isuru R Ariyarathna
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Heather J Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
9. Duan C, Nandy A, Adamji H, Roman-Leshkov Y, Kulik HJ. Machine Learning Models Predict Calculation Outcomes with the Transferability Necessary for Computational Catalysis. J Chem Theory Comput 2022; 18:4282-4292. [PMID: 35737587 DOI: 10.1021/acs.jctc.2c00331] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
Virtual high-throughput screening (VHTS) and machine learning (ML) have greatly accelerated the design of single-site transition-metal catalysts. VHTS of catalysts, however, is often accompanied with a high calculation failure rate and wasted computational resources due to the difficulty of simultaneously converging all mechanistically relevant reactive intermediates to expected geometries and electronic states. We demonstrate a dynamic classifier approach, i.e., a convolutional neural network that monitors geometry optimizations on the fly, and exploit its good performance and transferability in identifying geometry optimization failures for catalyst design. We show that the dynamic classifier performs well on all reactive intermediates in the representative catalytic cycle of the radical rebound mechanism for the conversion of methane to methanol despite being trained on only one reactive intermediate. The dynamic classifier also generalizes to chemically distinct intermediates and metal centers absent from the training data without loss of accuracy or model confidence. We rationalize this superior model transferability as arising from the use of electronic structure and geometric information generated on-the-fly from density functional theory calculations and the convolutional layer in the dynamic classifier. When used in combination with uncertainty quantification, the dynamic classifier saves more than half of the computational resources that would have been wasted on unsuccessful calculations for all reactive intermediates being considered.
Affiliation(s)
- Chenru Duan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Aditya Nandy
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Husain Adamji
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Yuriy Roman-Leshkov
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Heather J Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
10. Nandy A, Duan C, Goffinet C, Kulik HJ. New Strategies for Direct Methane-to-Methanol Conversion from Active Learning Exploration of 16 Million Catalysts. JACS Au 2022; 2:1200-1213. [PMID: 35647589 PMCID: PMC9135396 DOI: 10.1021/jacsau.2c00176] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 03/16/2022] [Revised: 04/12/2022] [Accepted: 04/15/2022] [Indexed: 05/03/2023]
Abstract
Despite decades of effort, no earth-abundant homogeneous catalysts have been discovered that can selectively oxidize methane to methanol. We exploit active learning to simultaneously optimize methane activation and methanol release calculated with machine learning-accelerated density functional theory in a space of 16 M candidate catalysts including novel macrocycles. By constructing macrocycles from fragments inspired by synthesized compounds, we ensure synthetic realism in our computational search. Our large-scale search reveals that low-spin Fe(II) compounds paired with strong-field (e.g., P or S-coordinating) ligands have among the best energetic tradeoffs between hydrogen atom transfer (HAT) and methanol release. This observation contrasts with prior efforts that have focused on high-spin Fe(II) with weak-field ligands. By decoupling equatorial and axial ligand effects, we determine that negatively charged axial ligands are critical for more rapid release of methanol and that higher-valency metals [i.e., M(III) vs M(II)] are likely to be rate-limited by slow methanol release. With full characterization of barrier heights, we confirm that optimizing for HAT does not lead to large oxo formation barriers. Energetic span analysis reveals designs for an intermediate-spin Mn(II) catalyst and a low-spin Fe(II) catalyst that are predicted to have good turnover frequencies. Our active learning approach to optimize two distinct reaction energies with efficient global optimization is expected to be beneficial for the search of large catalyst spaces where no prior designs have been identified and where linear scaling relationships between reaction energies or barriers may be limited or unknown.
Affiliation(s)
- Aditya Nandy
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Chenru Duan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Conrad Goffinet
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Heather J. Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
11. Duan C, Chu DBK, Nandy A, Kulik HJ. Detection of multi-reference character imbalances enables a transfer learning approach for virtual high throughput screening with coupled cluster accuracy at DFT cost. Chem Sci 2022; 13:4962-4971. [PMID: 35655882 PMCID: PMC9067623 DOI: 10.1039/d2sc00393g] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 01/21/2022] [Accepted: 04/04/2022] [Indexed: 01/08/2023]
Abstract
Appropriately identifying and treating molecules and materials with significant multi-reference (MR) character is crucial for achieving high data fidelity in virtual high-throughput screening (VHTS). Despite development of numerous MR diagnostics, the extent to which a single value of such a diagnostic indicates the MR effect on a chemical property prediction is not well established. We evaluate MR diagnostics for over 10 000 transition-metal complexes (TMCs) and compare to those for organic molecules. We observe that only some MR diagnostics are transferable from one chemical space to another. By studying the influence of MR character on chemical properties (i.e., MR effect) that involve multiple potential energy surfaces (i.e., adiabatic spin splitting, ΔE_H-L, and ionization potential, IP), we show that differences in MR character are more important than the cumulative degree of MR character in predicting the magnitude of an MR effect. Motivated by this observation, we build transfer learning models to predict CCSD(T)-level adiabatic ΔE_H-L and IP from lower levels of theory. By combining these models with uncertainty quantification and multi-level modeling, we introduce a multi-pronged strategy that accelerates data acquisition by at least a factor of three while achieving coupled cluster accuracy (i.e., to within 1 kcal mol⁻¹ MAE) for robust VHTS.
Affiliation(s)
- Chenru Duan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Daniel B K Chu
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Aditya Nandy
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Heather J Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
12. Nandy A, Terrones G, Arunachalam N, Duan C, Kastner DW, Kulik HJ. MOFSimplify, machine learning models with extracted stability data of three thousand metal-organic frameworks. Sci Data 2022; 9:74. [PMID: 35277533 PMCID: PMC8917177 DOI: 10.1038/s41597-022-01181-0] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Received: 09/30/2021] [Accepted: 01/17/2022] [Indexed: 11/09/2022]
Abstract
We report a workflow and the output of a natural language processing (NLP)-based procedure to mine the extant metal–organic framework (MOF) literature describing structurally characterized MOFs and their solvent removal and thermal stabilities. We obtain over 2,000 solvent removal stability measures from text mining and 3,000 thermal decomposition temperatures from thermogravimetric analysis data. We assess the validity of our NLP methods and the accuracy of our extracted data by comparing to a hand-labeled subset. Machine learning (ML, i.e. artificial neural network) models trained on this data using graph- and pore-geometry-based representations enable prediction of stability on new MOFs with quantified uncertainty. Our web interface, MOFSimplify, provides users access to our curated data and enables them to harness that data for predictions on new MOFs. MOFSimplify also encourages community feedback on existing data and on ML model predictions for community-based active learning for improved MOF stability models.
Measurement(s): thermal decomposition. Technology Type(s): thermogravimetry.
Affiliation(s)
- Aditya Nandy
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Gianmarco Terrones
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Naveen Arunachalam
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Chenru Duan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- David W Kastner
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Heather J Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
13. Duan C, Nandy A, Kulik HJ. Machine Learning for the Discovery, Design, and Engineering of Materials. Annu Rev Chem Biomol Eng 2022; 13:405-429. [PMID: 35320698 DOI: 10.1146/annurev-chembioeng-092320-120230] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
Machine learning (ML) has become a part of the fabric of high-throughput screening and computational discovery of materials. Despite its increasingly central role, challenges remain in fully realizing the promise of ML. This is especially true for the practical acceleration of the engineering of robust materials and the development of design strategies that surpass trial and error or high-throughput screening alone. Depending on the quantity being predicted and the experimental data available, ML can either outperform physics-based models, be used to accelerate such models, or be integrated with them to improve their performance. We cover recent advances in algorithms and in their application that are starting to make inroads toward (a) the discovery of new materials through large-scale enumerative screening, (b) the design of materials through identification of rules and principles that govern materials properties, and (c) the engineering of practical materials by satisfying multiple objectives. We conclude with opportunities for further advancement to realize ML as a widespread tool for practical computational materials design. Expected final online publication date for the Annual Review of Chemical and Biomolecular Engineering, Volume 13 is October 2022. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
Affiliation(s)
- Chenru Duan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA; Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
| | - Aditya Nandy
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA; Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
| | - Heather J Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
| |
|
14
|
Harper DR, Nandy A, Arunachalam N, Duan C, Janet JP, Kulik HJ. Representations and strategies for transferable machine learning improve model performance in chemical discovery. J Chem Phys 2022; 156:074101. [DOI: 10.1063/5.0082964] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 11/14/2022] Open
Affiliation(s)
- Daniel R Harper
- Massachusetts Institute of Technology, United States of America
| | - Aditya Nandy
- Massachusetts Institute of Technology, United States of America
| | | | - Chenru Duan
- Massachusetts Institute of Technology, United States of America
| | | | - Heather J. Kulik
- Dept of Chemical Engineering, Massachusetts Institute of Technology, United States of America
| |
|
15
|
Nandy A, Duan C, Kulik HJ. Using Machine Learning and Data Mining to Leverage Community Knowledge for the Engineering of Stable Metal-Organic Frameworks. J Am Chem Soc 2021; 143:17535-17547. [PMID: 34643374 DOI: 10.1021/jacs.1c07217] [Citation(s) in RCA: 41] [Impact Index Per Article: 13.7] [Indexed: 01/05/2023]
Abstract
Although the tailored metal active sites and porous architectures of MOFs hold great promise for engineering challenges ranging from gas separations to catalysis, a lack of understanding of how to improve their stability limits their use in practice. To overcome this limitation, we extract thousands of published reports of the key aspects of MOF stability necessary for their practical application: the ability to withstand high temperatures without degrading and the capacity to be activated by removal of solvent molecules. From nearly 4000 manuscripts, we use natural language processing and image analysis to obtain over 2000 solvent-removal stability measures and 3000 thermal degradation temperatures. We analyze the relationships between stability properties and the chemical and geometric structures in this set to identify limits of prior heuristics derived from smaller sets of MOFs. By training predictive machine learning (ML, i.e., Gaussian process and artificial neural network) models to encode the structure-property relationships with graph- and pore-structure-based representations, we are able to make predictions of stability orders of magnitude faster than conventional physics-based modeling or experiment. Interpretation of important features in ML models provides insights that we use to identify strategies to engineer increased stability into typically unstable 3d-transition-metal-containing MOFs that are frequently targeted for catalytic applications. We expect our approach to accelerate the time to discovery of stable, practical MOF materials for a wide range of applications.
Affiliation(s)
- Aditya Nandy
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States; Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Chenru Duan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States; Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Heather J Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| |
|
16
|
Duan C, Chen S, Taylor MG, Liu F, Kulik HJ. Machine learning to tame divergent density functional approximations: a new path to consensus materials design principles. Chem Sci 2021; 12:13021-13036. [PMID: 34745533 PMCID: PMC8513898 DOI: 10.1039/d1sc03701c] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 07/07/2021] [Accepted: 09/01/2021] [Indexed: 01/17/2023] Open
Abstract
Virtual high-throughput screening (VHTS) with density functional theory (DFT) and machine-learning (ML)-acceleration is essential in rapid materials discovery. By necessity, efficient DFT-based workflows are carried out with a single density functional approximation (DFA). Nevertheless, properties evaluated with different DFAs can be expected to disagree for cases with challenging electronic structure (e.g., open-shell transition-metal complexes, TMCs) for which rapid screening is most needed and accurate benchmarks are often unavailable. To quantify the effect of DFA bias, we introduce an approach to rapidly obtain property predictions from 23 representative DFAs spanning multiple families, “rungs” (e.g., semi-local to double hybrid) and basis sets on over 2000 TMCs. Although computed property values (e.g., spin state splitting and frontier orbital gap) differ by DFA, high linear correlations persist across all DFAs. We train independent ML models for each DFA and observe convergent trends in feature importance, providing DFA-invariant, universal design rules. We devise a strategy to train artificial neural network (ANN) models informed by all 23 DFAs and use them to predict properties (e.g., spin-splitting energy) of over 187k TMCs. By requiring consensus of the ANN-predicted DFA properties, we improve correspondence of computational lead compounds with literature-mined, experimental compounds over the typically employed single-DFA approach. Machine learning (ML)-based feature analysis reveals universal design rules regardless of density functional choices. Using the consensus among multiple functionals, we identify robust lead complexes in ML-accelerated chemical discovery.
Affiliation(s)
- Chenru Duan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Shuxin Chen
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Michael G Taylor
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Fang Liu
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Heather J Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| |
|
17
|
Automated Construction and Optimization Combined with Machine Learning to Generate Pt(II) Methane C–H Activation Transition States. Top Catal 2021. [DOI: 10.1007/s11244-021-01506-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/26/2022]
|
18
|
Nandy A, Duan C, Taylor MG, Liu F, Steeves AH, Kulik HJ. Computational Discovery of Transition-metal Complexes: From High-throughput Screening to Machine Learning. Chem Rev 2021; 121:9927-10000. [PMID: 34260198 DOI: 10.1021/acs.chemrev.1c00347] [Citation(s) in RCA: 70] [Impact Index Per Article: 23.3] [Indexed: 12/13/2022]
Abstract
Transition-metal complexes are attractive targets for the design of catalysts and functional materials. The behavior of the metal-organic bond, while very tunable for achieving target properties, is challenging to predict and necessitates searching a wide and complex space to identify needles in haystacks for target applications. This review will focus on the techniques that make high-throughput search of transition-metal chemical space feasible for the discovery of complexes with desirable properties. The review will cover the development, promise, and limitations of "traditional" computational chemistry (i.e., force field, semiempirical, and density functional theory methods) as it pertains to data generation for inorganic molecular discovery. The review will also discuss the opportunities and limitations in leveraging experimental data sources. We will focus on how advances in statistical modeling, artificial intelligence, multiobjective optimization, and automation accelerate discovery of lead compounds and design rules. The overall objective of this review is to showcase how bringing together advances from diverse areas of computational chemistry and computer science have enabled the rapid uncovering of structure-property relationships in transition-metal chemistry. We aim to highlight how unique considerations in motifs of metal-organic bonding (e.g., variable spin and oxidation state, and bonding strength/nature) set them and their discovery apart from more commonly considered organic molecules. We will also highlight how uncertainty and relative data scarcity in transition-metal chemistry motivate specific developments in machine learning representations, model training, and in computational chemistry. Finally, we will conclude with an outlook of areas of opportunity for the accelerated discovery of transition-metal complexes.
Affiliation(s)
- Aditya Nandy
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States; Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Chenru Duan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States; Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Michael G Taylor
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Fang Liu
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Adam H Steeves
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Heather J Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| |
|
19
|
Duan C, Liu F, Nandy A, Kulik HJ. Putting Density Functional Theory to the Test in Machine-Learning-Accelerated Materials Discovery. J Phys Chem Lett 2021; 12:4628-4637. [PMID: 33973793 DOI: 10.1021/acs.jpclett.1c00631] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Indexed: 05/20/2023]
Abstract
Accelerated discovery with machine learning (ML) has begun to provide the advances in efficiency needed to overcome the combinatorial challenge of computational materials design. Nevertheless, ML-accelerated discovery both inherits the biases of training data derived from density functional theory (DFT) and leads to many attempted calculations that are doomed to fail. Many compelling functional materials and catalytic processes involve strained chemical bonds, open-shell radicals and diradicals, or metal-organic bonds to open-shell transition-metal centers. Although promising targets, these materials present unique challenges for electronic structure methods and combinatorial challenges for their discovery. In this Perspective, we describe the advances needed in accuracy, efficiency, and approach beyond what is typical in conventional DFT-based ML workflows. These challenges have begun to be addressed through ML models trained to predict the results of multiple methods or the differences between them, enabling quantitative sensitivity analysis. For DFT to be trusted for a given data point in a high-throughput screen, it must pass a series of tests. ML models that predict the likelihood of calculation success and detect the presence of strong correlation will enable rapid diagnoses and adaptation strategies. These "decision engines" represent the first steps toward autonomous workflows that avoid the need for expert determination of the robustness of DFT-based materials discoveries.
Affiliation(s)
- Chenru Duan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Fang Liu
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Aditya Nandy
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Heather J Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| |
|
20
|
Affiliation(s)
- Heather J. Kulik
- Department of Chemical Engineering Massachusetts Institute of Technology 77 Massachusetts Ave Rm 66–464 Cambridge MA 02139 USA
| |
|
21
|
Janet JP, Duan C, Nandy A, Liu F, Kulik HJ. Navigating Transition-Metal Chemical Space: Artificial Intelligence for First-Principles Design. Acc Chem Res 2021; 54:532-545. [PMID: 33480674 DOI: 10.1021/acs.accounts.0c00686] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Indexed: 12/22/2022]
Abstract
The variability of chemical bonding in open-shell transition-metal complexes not only motivates their study as functional materials and catalysts but also challenges conventional computational modeling tools. Here, tailoring ligand chemistry can alter preferred spin or oxidation states as well as electronic structure properties and reactivity, creating vast regions of chemical space to explore when designing new materials atom by atom. Although first-principles density functional theory (DFT) remains the workhorse of computational chemistry in mechanism deduction and property prediction, it is of limited use here. DFT is both far too computationally costly for widespread exploration of transition-metal chemical space and also prone to inaccuracies that limit its predictive performance for localized d electrons in transition-metal complexes. These challenges starkly contrast with the well-trodden regions of small-organic-molecule chemical space, where the analytical forms of molecular mechanics force fields and semiempirical theories have for decades accelerated the discovery of new molecules, accurate DFT functional performance has been demonstrated, and gold-standard methods from correlated wavefunction theory can predict experimental results to chemical accuracy. The combined promise of transition-metal chemical space exploration and lack of established tools has mandated a distinct approach. In this Account, we outline the path we charted in exploration of transition-metal chemical space starting from the first machine learning (ML) models (i.e., artificial neural network and kernel ridge regression) and representations for the prediction of open-shell transition-metal complex properties.
The distinct importance of the immediate coordination environment of the metal center as well as the lack of low-level methods to accurately predict structural properties in this coordination environment first motivated and then benefited from these ML models and representations. Once developed, the recipe for prediction of geometric, spin state, and redox potential properties was straightforwardly extended to a diverse range of other properties, including in catalysis, computational "feasibility", and the gas separation properties of periodic metal-organic frameworks. Interpretation of selected features most important for model prediction revealed new ways to encapsulate design rules and confirmed that models were robustly mapping essential structure-property relationships. Encountering the special challenge of ensuring that good model performance could generalize to new discovery targets motivated investigation of how to best carry out model uncertainty quantification. Distance-based approaches, whether in model latent space or in carefully engineered feature space, provided intuitive measures of the domain of applicability. With all of these pieces together, ML can be harnessed as an engine to tackle the large-scale exploration of transition-metal chemical space needed to satisfy multiple objectives using efficient global optimization methods. In practical terms, bringing these artificial intelligence tools to bear on the problems of transition-metal chemical space exploration has resulted in ML-model assessments of large, multimillion compound spaces in minutes and validated new design leads in weeks instead of decades.
Affiliation(s)
- Jon Paul Janet
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Chenru Duan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Aditya Nandy
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Fang Liu
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Heather J. Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| |
|
22
|
Balcells D, Skjelstad BB. tmQM Dataset-Quantum Geometries and Properties of 86k Transition Metal Complexes. J Chem Inf Model 2020; 60:6135-6146. [PMID: 33166143 PMCID: PMC7768608 DOI: 10.1021/acs.jcim.0c01041] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Received: 09/03/2020] [Indexed: 12/19/2022]
Abstract
We report the transition metal quantum mechanics (tmQM) data set, which contains the geometries and properties of a large transition metal-organic compound space. tmQM comprises 86,665 mononuclear complexes extracted from the Cambridge Structural Database, including Werner, bioinorganic, and organometallic complexes based on a large variety of organic ligands and 30 transition metals (the 3d, 4d, and 5d from groups 3 to 12). All complexes are closed-shell, with a formal charge in the range {+1, 0, -1}e. The tmQM data set provides the Cartesian coordinates of all metal complexes optimized at the GFN2-xTB level, and their molecular size, stoichiometry, and metal node degree. The quantum properties were computed at the DFT(TPSSh-D3BJ/def2-SVP) level and include the electronic and dispersion energies, highest occupied molecular orbital (HOMO) and lowest unoccupied molecular orbital (LUMO) energies, HOMO/LUMO gap, dipole moment, and natural charge of the metal center; GFN2-xTB polarizabilities are also provided. Pairwise representations showed the low correlation between these properties, providing nearly continuous maps with unusual regions of the chemical space, for example, complexes combining large polarizabilities with wide HOMO/LUMO gaps and complexes combining low-energy HOMO orbitals with electron-rich metal centers. The tmQM data set can be exploited in the data-driven discovery of new metal complexes, including predictive models based on machine learning. These models may have a strong impact on the fields in which transition metal chemistry plays a key role, for example, catalysis, organic synthesis, and materials science. tmQM is an open data set that can be downloaded free of charge from https://github.com/bbskjelstad/tmqm.
Affiliation(s)
- David Balcells
- Hylleraas Centre for Quantum Molecular Sciences, Department of Chemistry, University of Oslo, P.O. Box 1033, Blindern, 0315 Oslo, Norway
| | - Bastian Bjerkem Skjelstad
- Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Sapporo 001-0021, Japan
| |
|
23
|
Nandy A, Kulik HJ. Why Conventional Design Rules for C–H Activation Fail for Open-Shell Transition-Metal Catalysts. ACS Catal 2020. [DOI: 10.1021/acscatal.0c04300] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 12/12/2022]
Affiliation(s)
- Aditya Nandy
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Heather J. Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| |
|
24
|
Moosavi S, Jablonka KM, Smit B. The Role of Machine Learning in the Understanding and Design of Materials. J Am Chem Soc 2020; 142:20273-20287. [PMID: 33170678 PMCID: PMC7716341 DOI: 10.1021/jacs.0c09105] [Citation(s) in RCA: 89] [Impact Index Per Article: 22.3] [Received: 08/24/2020] [Indexed: 12/21/2022]
Abstract
Developing algorithmic approaches for the rational design and discovery of materials can enable us to systematically find novel materials, which can have huge technological and social impact. However, such rational design requires a holistic perspective over the full multistage design process, which involves exploring immense materials spaces, their properties, and process design and engineering as well as a techno-economic assessment. The complexity of exploring all of these options using conventional scientific approaches seems intractable. Instead, novel tools from the field of machine learning can potentially solve some of our challenges on the way to rational materials design. Here we review some of the chief advancements of these methods and their applications in rational materials design, followed by a discussion on some of the main challenges and opportunities we currently face together with our perspective on the future of rational materials design and discovery.
Affiliation(s)
- Seyed Mohamad Moosavi
- Laboratory of Molecular Simulation, Institut des Sciences et Ingénierie Chimiques, École Polytechnique Fédérale de Lausanne (EPFL), Rue de l’Industrie 17, CH-1951 Sion, Valais, Switzerland
| | - Kevin Maik Jablonka
- Laboratory of Molecular Simulation, Institut des Sciences et Ingénierie Chimiques, École Polytechnique Fédérale de Lausanne (EPFL), Rue de l’Industrie 17, CH-1951 Sion, Valais, Switzerland
| | - Berend Smit
- Laboratory of Molecular Simulation, Institut des Sciences et Ingénierie Chimiques, École Polytechnique Fédérale de Lausanne (EPFL), Rue de l’Industrie 17, CH-1951 Sion, Valais, Switzerland
| |
|
25
|
Zöllner MS, Saghatchi A, Mujica V, Herrmann C. Influence of Electronic Structure Modeling and Junction Structure on First-Principles Chiral Induced Spin Selectivity. J Chem Theory Comput 2020; 16:7357-7371. [PMID: 33167619 DOI: 10.1021/acs.jctc.0c00621] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Indexed: 12/12/2022]
Abstract
We have carried out a comprehensive study of the influence of electronic structure modeling and junction structure description on the first-principles calculation of the spin polarization in molecular junctions caused by the chiral induced spin selectivity (CISS) effect. We explore the limits and the sensitivity to modeling decisions of a Landauer/Green's function/two-component density functional theory approach to CISS. We find that although the CISS effect is entirely attributed in the literature to molecular spin filtering, spin-orbit coupling being partially inherited from the metal electrodes plays an important role in our calculations on ideal carbon helices, even though this effect cannot explain the experimental conductance results. Its magnitude depends considerably on the shape, size, and material of the metal clusters modeling the electrodes. Also, a pronounced dependence on the specific description of exchange interaction and spin-orbit coupling is manifest in our approach. This is important because the interplay between exchange effects and spin-orbit coupling may play an important role in the description of the junction magnetic response. Our calculations are relevant for the whole field of spin-polarized electron transport and electron transfer, because there is still an open discussion in the literature about the detailed underlying mechanism and the magnitude of physical parameters that need to be included to achieve a consistent description of the CISS effect: seemingly good quantitative agreement between simulation and the experiment can be caused by error compensation, because spin polarization as contained in a Landauer/Green's function/two-component density functional theory approach depends strongly on computational and structural parameters.
Affiliation(s)
| | - Aida Saghatchi
- Department of Chemistry, University of Hamburg, 20146 Hamburg, Germany
| | - Vladimiro Mujica
- School of Molecular Sciences, Arizona State University, Tempe, Arizona 85287-1604, United States; Kimika Fakultatea, Euskal Herriko Unibertsitatea and Donostia International Physics Center (DIPC), Donostia, Euskadi P.K. 1072, 20080, Spain
| | - Carmen Herrmann
- Department of Chemistry, University of Hamburg, 20146 Hamburg, Germany
| |
|
26
|
Liu F, Duan C, Kulik HJ. Rapid Detection of Strong Correlation with Machine Learning for Transition-Metal Complex High-Throughput Screening. J Phys Chem Lett 2020; 11:8067-8076. [PMID: 32864977 DOI: 10.1021/acs.jpclett.0c02288] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Indexed: 06/11/2023]
Abstract
Despite its widespread use in chemical discovery, approximate density functional theory (DFT) is poorly suited to many targets, such as those containing open-shell, 3d transition metals that can be expected to have strong multireference (MR) character. For discovery workflows to be predictive, we need automated, low-cost methods that can distinguish the regions of chemical space where DFT should be applied from those where it should not. We curate more than 4800 open-shell transition-metal complexes up to hundreds of atoms in size from prior high-throughput DFT studies and evaluate affordable, finite-temperature DFT fractional occupation number (FON)-based MR diagnostics. We show that intuitive measures of strong correlation (i.e., the HOMO-LUMO gap) are not predictive of MR character as judged by FON-based diagnostics. Analysis of independently trained machine learning (ML) models to predict HOMO-LUMO gaps and FON-based diagnostics reveals differences in the metal and ligand sensitivity of the two quantities. We use our trained ML models to rapidly evaluate MR character over a space of ∼187000 theoretical complexes, identifying large-scale trends in spin-state-dependent MR character and finding small HOMO-LUMO gap complexes while ensuring low MR character.
Affiliation(s)
- Fang Liu
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Chenru Duan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Heather J Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| |
|
27
|
Bahlke MP, Mogos N, Proppe J, Herrmann C. Exchange Spin Coupling from Gaussian Process Regression. J Phys Chem A 2020; 124:8708-8723. [DOI: 10.1021/acs.jpca.0c05983] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 12/13/2022]
Affiliation(s)
- Marc Philipp Bahlke
- Department of Chemistry, University of Hamburg, Martin-Luther-King-Platz 6, 20146 Hamburg, Germany
| | - Natnael Mogos
- Department of Chemistry, University of Hamburg, Martin-Luther-King-Platz 6, 20146 Hamburg, Germany
| | - Jonny Proppe
- Institute of Physical Chemistry, Georg-August University, Tammannstr. 6, 37077 Göttingen, Germany
| | - Carmen Herrmann
- Department of Chemistry, University of Hamburg, Martin-Luther-King-Platz 6, 20146 Hamburg, Germany
| |
|
28
|
Jablonka K, Ongari D, Moosavi SM, Smit B. Big-Data Science in Porous Materials: Materials Genomics and Machine Learning. Chem Rev 2020; 120:8066-8129. [PMID: 32520531 PMCID: PMC7453404 DOI: 10.1021/acs.chemrev.0c00004] [Citation(s) in RCA: 154] [Impact Index Per Article: 38.5] [Received: 01/03/2020] [Indexed: 12/16/2022]
Abstract
By combining metal nodes with organic linkers we can potentially synthesize millions of possible metal-organic frameworks (MOFs). The fact that we have so many materials opens many exciting avenues but also creates new challenges. We simply have too many materials to be processed using conventional, brute-force methods. In this review, we show that having so many materials allows us to use big-data methods as a powerful technique to study these materials and to discover complex correlations. The first part of the review gives an introduction to the principles of big-data science. We show how to select appropriate training sets, survey approaches that are used to represent these materials in feature space, and review different learning architectures, as well as evaluation and interpretation strategies. In the second part, we review how the different approaches of machine learning have been applied to porous materials. In particular, we discuss applications in the field of gas storage and separation, the stability of these materials, their electronic properties, and their synthesis. Given the increasing interest of the scientific community in machine learning, we expect this list to rapidly expand in the coming years.
Affiliation(s)
- Kevin Maik Jablonka
- Laboratory of Molecular Simulation (LSMO), Institut des Sciences et Ingénierie Chimiques (ISIC), École Polytechnique Fédérale de Lausanne (EPFL), Sion, Switzerland
| | - Daniele Ongari
- Laboratory of Molecular Simulation (LSMO), Institut des Sciences et Ingénierie Chimiques (ISIC), École Polytechnique Fédérale de Lausanne (EPFL), Sion, Switzerland
| | - Seyed Mohamad Moosavi
- Laboratory of Molecular Simulation (LSMO), Institut des Sciences et Ingénierie Chimiques (ISIC), École Polytechnique Fédérale de Lausanne (EPFL), Sion, Switzerland
| | - Berend Smit
- Laboratory of Molecular Simulation (LSMO), Institut des Sciences et Ingénierie Chimiques (ISIC), École Polytechnique Fédérale de Lausanne (EPFL), Sion, Switzerland
| |
|
29
|
Duan C, Liu F, Nandy A, Kulik HJ. Semi-supervised Machine Learning Enables the Robust Detection of Multireference Character at Low Cost. J Phys Chem Lett 2020; 11:6640-6648. [PMID: 32692570 DOI: 10.1021/acs.jpclett.0c02018] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Indexed: 05/26/2023]
Abstract
Multireference (MR) diagnostics are common tools for identifying strongly correlated electronic structure that makes single-reference (SR) methods (e.g., density functional theory or DFT) insufficient for accurate property prediction. However, MR diagnostics typically require computationally demanding correlated wave function theory (WFT) calculations, and diagnostics often disagree or fail to predict MR effects on properties. To overcome these challenges, we introduce a semi-supervised machine learning (ML) approach with virtual adversarial training (VAT) of an MR classifier using 15 WFT and DFT MR diagnostics as inputs. In semi-supervised learning, only the most extreme SR or MR points are labeled, and the remaining point labels are learned. The resulting VAT model outperforms the alternatives, as quantified by the distinct property distributions of SR- and MR-classified molecules. To reduce the cost of generating inputs to the VAT model, we leverage the VAT model's robustness to noisy inputs by replacing WFT MR diagnostics with regression predictions in an MR decision engine workflow that preserves excellent performance. We demonstrate the transferability of our approach to larger molecules and those with distinct chemical composition from the training set. This MR decision engine demonstrates promise as a low-cost, high-accuracy approach to the automatic detection of strong correlation for predictive high-throughput screening.
Affiliation(s)
- Chenru Duan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Fang Liu
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Aditya Nandy
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Heather J Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| |
|
30
|
Duan C, Liu F, Nandy A, Kulik HJ. Data-Driven Approaches Can Overcome the Cost-Accuracy Trade-Off in Multireference Diagnostics. J Chem Theory Comput 2020; 16:4373-4387. [PMID: 32536161 DOI: 10.1021/acs.jctc.0c00358] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Indexed: 12/23/2022]
Abstract
High-throughput computational screening typically employs methods (i.e., density functional theory or DFT) that can fail to describe challenging molecules, such as those with strongly correlated electronic structure. In such cases, multireference (MR) correlated wavefunction theory (WFT) would be the appropriate choice but remains more challenging to carry out and automate than single-reference (SR) WFT or DFT. Numerous diagnostics have been proposed for identifying when MR character is likely to have an effect on the predictive power of SR calculations, but conflicting conclusions about diagnostic performance have been reached on small data sets. We compute 15 MR diagnostics, ranging from affordable DFT-based to more costly MR-WFT-based diagnostics, on a set of 3165 equilibrium and distorted small organic molecules containing up to six heavy atoms. Conflicting MR character assignments and low pairwise linear correlations among diagnostics are also observed over this set. We evaluate the ability of existing diagnostics to predict the percent recovery of the correlation energy, %Ecorr. None of the DFT-based diagnostics are nearly as predictive of %Ecorr as the best WFT-based diagnostics. To overcome the limitation of this cost-accuracy trade-off, we develop machine learning (ML, i.e., kernel ridge regression) models to predict WFT-based diagnostics from a combination of DFT-based diagnostics and a new, size-independent 3D geometric representation. The ML-predicted diagnostics correlate as well with MR effects as their computed (i.e., with WFT) values, significantly improving over the DFT-based diagnostics on which the models were trained. These ML models thus provide a promising approach to improve upon DFT-based diagnostic accuracy while remaining suitably low cost for high-throughput screening.
Affiliation(s)
- Chenru Duan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Fang Liu
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Aditya Nandy
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Heather J Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| |
|
31
|
Coley CW, Eyke NS, Jensen KF. Autonomous Discovery in the Chemical Sciences Part I: Progress. Angew Chem Int Ed Engl 2020; 59:22858-22893. [DOI: 10.1002/anie.201909987] [Citation(s) in RCA: 100] [Impact Index Per Article: 25.0] [Received: 08/06/2019] [Indexed: 01/05/2023]
Affiliation(s)
- Connor W. Coley
- Department of Chemical Engineering Massachusetts Institute of Technology Cambridge MA 02139 USA
| | - Natalie S. Eyke
- Department of Chemical Engineering Massachusetts Institute of Technology Cambridge MA 02139 USA
| | - Klavs F. Jensen
- Department of Chemical Engineering Massachusetts Institute of Technology Cambridge MA 02139 USA
| |
|
32
|
Coley CW, Eyke NS, Jensen KF. Autonome Entdeckung in den chemischen Wissenschaften, Teil I: Fortschritt [Autonomous Discovery in the Chemical Sciences, Part I: Progress]. Angew Chem Int Ed Engl 2020. [DOI: 10.1002/ange.201909987] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 12/14/2022]
Affiliation(s)
- Connor W. Coley
- Department of Chemical Engineering Massachusetts Institute of Technology Cambridge MA 02139 USA
| | - Natalie S. Eyke
- Department of Chemical Engineering Massachusetts Institute of Technology Cambridge MA 02139 USA
| | - Klavs F. Jensen
- Department of Chemical Engineering Massachusetts Institute of Technology Cambridge MA 02139 USA
| |
|
33
|
Friederich P, Dos Passos Gomes G, De Bin R, Aspuru-Guzik A, Balcells D. Machine learning dihydrogen activation in the chemical space surrounding Vaska's complex. Chem Sci 2020; 11:4584-4601. [PMID: 33224459 PMCID: PMC7659707 DOI: 10.1039/d0sc00445f] [Citation(s) in RCA: 72] [Impact Index Per Article: 18.0] [Received: 01/23/2020] [Accepted: 04/06/2020] [Indexed: 12/15/2022] Open
Abstract
Homogeneous catalysis using transition metal complexes is ubiquitously used for organic synthesis, as well as technologically relevant in applications such as water splitting and CO₂ reduction. The key steps underlying homogeneous catalysis require a specific combination of electronic and steric effects from the ligands bound to the metal center. Finding the optimal combination of ligands is a challenging task due to the exceedingly large number of possibilities and the non-trivial ligand-ligand interactions. The classic example of Vaska's complex, trans-[Ir(PPh₃)₂(CO)(Cl)], illustrates this scenario. The ligands of this species activate iridium for the oxidative addition of hydrogen, yielding the dihydride cis-[Ir(H)₂(PPh₃)₂(CO)(Cl)] complex. Despite the simplicity of this system, thousands of derivatives can be formulated for the activation of H₂, with a limited number of ligands belonging to the same general categories found in the original complex. In this work, we show how DFT and machine learning (ML) methods can be combined to enable the prediction of reactivity within large chemical spaces containing thousands of complexes. In a space of 2574 species derived from Vaska's complex, data from DFT calculations are used to train and test ML models that predict the H₂-activation barrier. In contrast to experiments and calculations requiring several days to be completed, the ML models were trained and used on a laptop on a time-scale of minutes. As a first approach, we combined Bayesian-optimized artificial neural networks (ANN) with features derived from autocorrelation and deltametric functions. The resulting ANNs achieved high accuracies, with mean absolute errors (MAE) between 1 and 2 kcal mol⁻¹, depending on the size of the training set. By using a Gaussian process (GP) model trained with a set of selected features, including fingerprints, accuracy was further enhanced. Remarkably, this GP model minimized the MAE below 1 kcal mol⁻¹, by using only 20% or less of the data available for training. The gradient boosting (GB) method was also used to assess the relevance of the features, which was used for both feature selection and model interpretation purposes. Features accounting for chemical composition, atom size and electronegativity were found to be the most determinant in the predictions. Further, the ligand fragments with the strongest influence on the H₂-activation barrier were identified.
Affiliation(s)
- Pascal Friederich
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, Ontario M5S 3H6, Canada
- Institute of Nanotechnology, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, 76344 Eggenstein-Leopoldshafen, Germany
- Department of Computer Science, University of Toronto, 214 College St., Toronto, Ontario M5T 3A1, Canada
| | - Gabriel Dos Passos Gomes
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, Ontario M5S 3H6, Canada
- Department of Computer Science, University of Toronto, 214 College St., Toronto, Ontario M5T 3A1, Canada
| | - Riccardo De Bin
- Department of Mathematics, University of Oslo, P. O. Box 1053, Blindern, N-0316, Oslo, Norway
| | - Alán Aspuru-Guzik
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, Ontario M5S 3H6, Canada
- Department of Computer Science, University of Toronto, 214 College St., Toronto, Ontario M5T 3A1, Canada
- Vector Institute for Artificial Intelligence, 661 University Ave. Suite 710, Toronto, Ontario M5G 1M1, Canada
- Lebovic Fellow, Canadian Institute for Advanced Research (CIFAR), 661 University Ave, Toronto, ON M5G 1M1, Canada
| | - David Balcells
- Hylleraas Centre for Quantum Molecular Sciences, Department of Chemistry, University of Oslo, P. O. Box 1033, Blindern, N-0315, Oslo, Norway
| |
|
34
|
Janet JP, Ramesh S, Duan C, Kulik HJ. Accurate Multiobjective Design in a Space of Millions of Transition Metal Complexes with Neural-Network-Driven Efficient Global Optimization. ACS Central Science 2020; 6:513-524. [PMID: 32342001 PMCID: PMC7181321 DOI: 10.1021/acscentsci.0c00026] [Citation(s) in RCA: 83] [Impact Index Per Article: 20.8] [Received: 01/08/2020] [Indexed: 05/20/2023]
Abstract
The accelerated discovery of materials for real world applications requires the achievement of multiple design objectives. The multidimensional nature of the search necessitates exploration of multimillion compound libraries over which even density functional theory (DFT) screening is intractable. Machine learning (e.g., artificial neural network, ANN, or Gaussian process, GP) models for this task are limited by training data availability and predictive uncertainty quantification (UQ). We overcome such limitations by using efficient global optimization (EGO) with the multidimensional expected improvement (EI) criterion. EGO balances exploitation of a trained model with acquisition of new DFT data at the Pareto front, the region of chemical space that contains the optimal trade-off between multiple design criteria. We demonstrate this approach for the simultaneous optimization of redox potential and solubility in candidate M(II)/M(III) redox couples for redox flow batteries from a space of 2.8 M transition metal complexes designed for stability in practical redox flow battery (RFB) applications. We show that a multitask ANN with latent-distance-based UQ surpasses the generalization performance of a GP in this space. With this approach, ANN prediction and EI scoring of the full space are achieved in minutes. Starting from ca. 100 representative points, EGO improves both properties by over 3 standard deviations in only five generations. Analysis of lookahead errors confirms rapid ANN model improvement during the EGO process, achieving suitable accuracy for predictive design in the space of transition metal complexes. The ANN-driven EI approach achieves at least 500-fold acceleration over random search, identifying a Pareto-optimal design in around 5 weeks instead of 50 years.
Affiliation(s)
- Jon Paul Janet
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Sahasrajit Ramesh
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Chenru Duan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Heather J. Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Phone: 617-253-4584
| |
|
35
|
Taylor MG, Yang T, Lin S, Nandy A, Janet JP, Duan C, Kulik HJ. Seeing Is Believing: Experimental Spin States from Machine Learning Model Structure Predictions. J Phys Chem A 2020; 124:3286-3299. [PMID: 32223165 PMCID: PMC7311053 DOI: 10.1021/acs.jpca.0c01458] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Indexed: 12/11/2022]
Abstract
Determination of ground-state spins of open-shell transition-metal complexes is critical to understanding catalytic and materials properties but also challenging with approximate electronic structure methods. As an alternative approach, we demonstrate how structure alone can be used to guide assignment of ground-state spin from experimentally determined crystal structures of transition-metal complexes. We first identify the limits of distance-based heuristics from distributions of metal–ligand bond lengths of over 2000 unique mononuclear Fe(II)/Fe(III) transition-metal complexes. To overcome these limits, we employ artificial neural networks (ANNs) to predict spin-state-dependent metal–ligand bond lengths and classify experimental ground-state spins based on agreement of experimental structures with the ANN predictions. Although the ANN is trained on hybrid density functional theory data, we exploit the method-insensitivity of geometric properties to enable assignment of ground states for the majority (ca. 80–90%) of structures. We demonstrate the utility of the ANN by data-mining the literature for spin-crossover (SCO) complexes, which have experimentally observed temperature-dependent geometric structure changes, by correctly assigning almost all (>95%) spin states in the 46 Fe(II) SCO complex set. This approach represents a promising complement to more conventional energy-based spin-state assignment from electronic structure theory at the low cost of a machine learning model.
Affiliation(s)
- Michael G Taylor
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Tzuhsiung Yang
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Sean Lin
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Aditya Nandy
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Jon Paul Janet
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Chenru Duan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Heather J Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| |
|
36
|
Heinen S, Schwilk M, von Rudorff GF, von Lilienfeld OA. Machine learning the computational cost of quantum chemistry. Machine Learning: Science and Technology 2020. [DOI: 10.1088/2632-2153/ab6ac4] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 12/11/2022]
|
37
|
Abstract
As the quantum chemistry (QC) community embraces machine learning (ML), the number of new methods and applications based on the combination of QC and ML is surging. In this Perspective, a view of the current state of affairs in this new and exciting research field is offered, challenges of using machine learning in quantum chemistry applications are described, and potential future developments are outlined. Specifically, examples of how machine learning is used to improve the accuracy and accelerate quantum chemical research are shown. Generalization and classification of existing techniques are provided to ease the navigation in the sea of literature and to guide researchers entering the field. The emphasis of this Perspective is on supervised machine learning.
Affiliation(s)
- Pavlo O Dral
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
| |
|
38
|
Affiliation(s)
- Marco Foscato
- Department of Chemistry, University of Bergen, Allégaten 41, N-5007 Bergen, Norway
| | - Vidar R. Jensen
- Department of Chemistry, University of Bergen, Allégaten 41, N-5007 Bergen, Norway
| |
|
39
|
Patrizi B, Cozza C, Pietropaolo A, Foggi P, Siciliani de Cumis M. Synergistic Approach of Ultrafast Spectroscopy and Molecular Simulations in the Characterization of Intramolecular Charge Transfer in Push-Pull Molecules. Molecules 2020; 25:E430. [PMID: 31968694 PMCID: PMC7024558 DOI: 10.3390/molecules25020430] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 11/26/2019] [Revised: 01/14/2020] [Accepted: 01/17/2020] [Indexed: 11/28/2022] Open
Abstract
The comprehensive characterization of Intramolecular Charge Transfer (ICT) arising in push-pull molecules with a delocalized π-system of electrons is noteworthy for a bespoke design of organic materials, spanning widespread applications from photovoltaics to nanomedicine imaging devices. Photo-induced ICT is characterized by structural reorganizations, which allow the molecule to adapt to the new electronic density distribution. Herein, we discuss recent photophysical advances combined with recent progress in the computational chemistry of photoactive molecular ensembles. We focus the discussion on femtosecond Transient Absorption Spectroscopy (TAS) enabling us to follow the transition from a Locally Excited (LE) state to the ICT and to understand how the environment polarity influences radiative and non-radiative decay mechanisms. In many cases, the charge transfer transition is accompanied by structural rearrangements, such as twisting or planarization of the molecule. The possibility of an accurate prediction of the charge transfer occurring in complex molecules and molecular materials represents an enormous advantage in guiding new molecular and materials design. We briefly report on recent advances in ultrafast multidimensional spectroscopy, in particular, Two-Dimensional Electronic Spectroscopy (2DES), in unraveling the ICT nature of push-pull molecular systems. A theoretical description at the atomistic level of photo-induced molecular transitions can predict with reasonable accuracy the properties of photoactive molecules. In this framework, the review includes a discussion on the advances from simulation and modeling, which have provided, over the years, significant information on photoexcitation, emission, charge-transport, and decay pathways. Density Functional Theory (DFT) coupled with the Time-Dependent (TD) framework can describe electronic properties and dynamics for a limited system size. More recently, Machine Learning (ML) or deep learning approaches, as well as free-energy simulations containing excited state potentials, can speed up the calculations with transferable accuracy to more complex molecules with extended system size. A perspective on combining ultrafast spectroscopy with molecular simulations is foreseen for optimizing the design of photoactive compounds with tunable properties.
Affiliation(s)
- Barbara Patrizi
- National Institute of Optics-National Research Council (INO-CNR), Via Madonna del Piano 10, 50019 Sesto Fiorentino, Italy
- European Laboratory for Non-Linear Spectroscopy (LENS), Via Nello Carrara 1, 50019 Sesto Fiorentino, Italy
| | - Concetta Cozza
- Dipartimento di Scienze della Salute, Università di Catanzaro, Viale Europa, 88100 Catanzaro, Italy
| | - Adriana Pietropaolo
- Dipartimento di Scienze della Salute, Università di Catanzaro, Viale Europa, 88100 Catanzaro, Italy
| | - Paolo Foggi
- National Institute of Optics-National Research Council (INO-CNR), Via Madonna del Piano 10, 50019 Sesto Fiorentino, Italy
- European Laboratory for Non-Linear Spectroscopy (LENS), Via Nello Carrara 1, 50019 Sesto Fiorentino, Italy
- Dipartimento di Chimica, Biologia e Biotecnologie, Università di Perugia, Via Elce di Sotto 8, 06123 Perugia, Italy
| | | |
|
40
|
Nandy A, Chu DBK, Harper DR, Duan C, Arunachalam N, Cytter Y, Kulik HJ. Large-scale comparison of 3d and 4d transition metal complexes illuminates the reduced effect of exchange on second-row spin-state energetics. Phys Chem Chem Phys 2020; 22:19326-19341. [DOI: 10.1039/d0cp02977g] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 01/05/2023]
Abstract
The origin of distinct 3d vs. 4d transition metal complex sensitivity to exchange is explored over a large data set.
Affiliation(s)
- Aditya Nandy
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, USA
- Department of Chemistry
| | - Daniel B. K. Chu
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, USA
| | - Daniel R. Harper
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, USA
- Department of Chemistry
| | - Chenru Duan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, USA
- Department of Chemistry
| | - Naveen Arunachalam
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, USA
| | - Yael Cytter
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, USA
| | - Heather J. Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, USA
| |
|
41
|
|
42
|
Toyao T, Maeno Z, Takakusagi S, Kamachi T, Takigawa I, Shimizu KI. Machine Learning for Catalysis Informatics: Recent Applications and Prospects. ACS Catal 2019. [DOI: 10.1021/acscatal.9b04186] [Citation(s) in RCA: 189] [Impact Index Per Article: 37.8] [Indexed: 12/16/2022]
Affiliation(s)
- Takashi Toyao
- Institute for Catalysis, Hokkaido University, N-21, W-10, Sapporo 001-0021, Japan
- Elements Strategy Initiative for Catalysts and Batteries, Kyoto University, Katsura, Kyoto 615-8520, Japan
| | - Zen Maeno
- Institute for Catalysis, Hokkaido University, N-21, W-10, Sapporo 001-0021, Japan
| | - Satoru Takakusagi
- Institute for Catalysis, Hokkaido University, N-21, W-10, Sapporo 001-0021, Japan
| | - Takashi Kamachi
- Elements Strategy Initiative for Catalysts and Batteries, Kyoto University, Katsura, Kyoto 615-8520, Japan
- Department of Life, Environment and Materials Science, Fukuoka Institute of Technology, 3-30-1Wajiro-Higashi, Higashi-ku, Fukuoka 811-0295, Japan
| | - Ichigaku Takigawa
- RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo 103-0027, Japan
- Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21 Nishi 10, Kita-ku, Sapporo, Hokkaido 001-0021, Japan
| | - Ken-ichi Shimizu
- Institute for Catalysis, Hokkaido University, N-21, W-10, Sapporo 001-0021, Japan
- Elements Strategy Initiative for Catalysts and Batteries, Kyoto University, Katsura, Kyoto 615-8520, Japan
| |
|
43
|
Glavatskikh M, Leguy J, Hunault G, Cauchy T, Da Mota B. Dataset's chemical diversity limits the generalizability of machine learning predictions. J Cheminform 2019; 11:69. [PMID: 33430991 PMCID: PMC6852905 DOI: 10.1186/s13321-019-0391-2] [Citation(s) in RCA: 48] [Impact Index Per Article: 9.6] [Received: 07/26/2019] [Accepted: 10/28/2019] [Indexed: 01/18/2023] Open
Abstract
The QM9 dataset has become the gold standard for Machine Learning (ML) predictions of various chemical properties. QM9 is based on the GDB, which is a combinatorial exploration of the chemical space. ML molecular predictions have been recently published with an accuracy on par with Density Functional Theory calculations. Such ML models need to be tested and generalized on real data. PC9, a new QM9 equivalent dataset (only H, C, N, O and F and up to 9 “heavy” atoms) of the PubChemQC project is presented in this article. A statistical study of bonding distances and chemical functions shows that this new dataset encompasses more chemical diversity. Kernel Ridge Regression, Elastic Net and the Neural Network model provided by SchNet have been used on both datasets. The overall accuracy in energy prediction is higher for the QM9 subset. However, a model trained on PC9 shows a stronger ability to predict energies of the other dataset.
Affiliation(s)
- Marta Glavatskikh
- LERIA, University of Angers, 2 Bd Lavoisier, 49045, Angers, France
- Laboratoire MOLTECH-Anjou, UMR CNRS 6200, SFR MATRIX, UNIV Angers, 2 Bd Lavoisier, 49045, Angers, France
| | - Jules Leguy
- LERIA, University of Angers, 2 Bd Lavoisier, 49045, Angers, France
| | - Gilles Hunault
- LERIA, University of Angers, 2 Bd Lavoisier, 49045, Angers, France
- HIFIH, EA 3859, Institut de Biologie en Santé PBH-IRIS, CHU, University of Angers, 4, Rue Larrey, 49933, Angers, France
| | - Thomas Cauchy
- Laboratoire MOLTECH-Anjou, UMR CNRS 6200, SFR MATRIX, UNIV Angers, 2 Bd Lavoisier, 49045, Angers, France.
| | - Benoit Da Mota
- LERIA, University of Angers, 2 Bd Lavoisier, 49045, Angers, France
| |
|
44
|
Janet JP, Duan C, Yang T, Nandy A, Kulik HJ. A quantitative uncertainty metric controls error in neural network-driven chemical discovery. Chem Sci 2019; 10:7913-7922. [PMID: 31588334 PMCID: PMC6764470 DOI: 10.1039/c9sc02298h] [Citation(s) in RCA: 81] [Impact Index Per Article: 16.2] [Received: 05/11/2019] [Accepted: 07/11/2019] [Indexed: 12/14/2022] Open
Abstract
Machine learning (ML) models, such as artificial neural networks, have emerged as a complement to high-throughput screening, enabling characterization of new compounds in seconds instead of hours. The promise of ML models to enable large-scale chemical space exploration can only be realized if it is straightforward to identify when molecules and materials are outside the model's domain of applicability. Established uncertainty metrics for neural network models are either costly to obtain (e.g., ensemble models) or rely on feature engineering (e.g., feature space distances), and each has limitations in estimating prediction errors for chemical space exploration. We introduce the distance to available data in the latent space of a neural network ML model as a low-cost, quantitative uncertainty metric that works for both inorganic and organic chemistry. The calibrated performance of this approach exceeds widely used uncertainty metrics and is readily applied to models of increasing complexity at no additional cost. Tightening latent distance cutoffs systematically drives down predicted model errors below training errors, thus enabling predictive error control in chemical discovery or identification of useful data points for active learning.
Affiliation(s)
- Jon Paul Janet
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Tel: +1-617-253-4584
- Chenru Duan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Tel: +1-617-253-4584
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Tzuhsiung Yang
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Tel: +1-617-253-4584
- Aditya Nandy
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Tel: +1-617-253-4584
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Heather J Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Tel: +1-617-253-4584
45
Herr JE, Koh K, Yao K, Parkhill J. Compressing physics with an autoencoder: Creating an atomic species representation to improve machine learning models in the chemical sciences. J Chem Phys 2019; 151:084103. [DOI: 10.1063/1.5108803] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 02/06/2023] Open
Affiliation(s)
- John E. Herr
- Department of Chemistry and Biochemistry, The University of Notre Dame du Lac, 251 Nieuwland Science Hall, Notre Dame, Indiana 46556, USA
- Kevin Koh
- Department of Chemistry and Biochemistry, The University of Notre Dame du Lac, 251 Nieuwland Science Hall, Notre Dame, Indiana 46556, USA
- Kun Yao
- Department of Chemistry and Biochemistry, The University of Notre Dame du Lac, 251 Nieuwland Science Hall, Notre Dame, Indiana 46556, USA
- John Parkhill
- Department of Chemistry and Biochemistry, The University of Notre Dame du Lac, 251 Nieuwland Science Hall, Notre Dame, Indiana 46556, USA
46
Kulik HJ. Making machine learning a useful tool in the accelerated discovery of transition metal complexes. Wiley Interdisciplinary Reviews: Computational Molecular Science 2019. [DOI: 10.1002/wcms.1439] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Indexed: 12/31/2022]
Affiliation(s)
- Heather J. Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts
47
Nandy A, Zhu J, Janet JP, Duan C, Getman RB, Kulik HJ. Machine Learning Accelerates the Discovery of Design Rules and Exceptions in Stable Metal–Oxo Intermediate Formation. ACS Catal 2019. [DOI: 10.1021/acscatal.9b02165] [Citation(s) in RCA: 52] [Impact Index Per Article: 10.4] [Indexed: 12/19/2022]
Affiliation(s)
- Jiazhou Zhu
- Department of Chemical & Biomolecular Engineering, Clemson University, Clemson, South Carolina 29634, United States
- Rachel B. Getman
- Department of Chemical & Biomolecular Engineering, Clemson University, Clemson, South Carolina 29634, United States
48
Sawatlon B, Wodrich MD, Meyer B, Fabrizio A, Corminboeuf C. Data Mining the C−C Cross‐Coupling Genome. ChemCatChem 2019. [DOI: 10.1002/cctc.201900597] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 11/09/2022]
Affiliation(s)
- Boodsarin Sawatlon
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
- Matthew D. Wodrich
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
- Benjamin Meyer
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
- National Centre for Computational Design and Discovery of Novel Materials (MARVEL), Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
- Alberto Fabrizio
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
- National Centre for Computational Design and Discovery of Novel Materials (MARVEL), Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
- Clémence Corminboeuf
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
- National Centre for Computational Design and Discovery of Novel Materials (MARVEL), Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland