1. Lee I, Lee J, Kim M, Park J, Kim H, Lee S, Min K. Uncovering the Relationship between Metal Elements and Mechanical Stability for Metal-Organic Frameworks. ACS Appl Mater Interfaces 2024; 16:52162-52178. [PMID: 39308060 DOI: 10.1021/acsami.4c07775] [Indexed: 10/04/2024]
Abstract
Assessing the mechanical robustness of metal-organic frameworks (MOFs) is crucial to enhance their applicability in various fields. Although considerable research has been conducted on the relationship between the mechanical properties of MOFs and their structural features (such as pore size, surface area, and topology), the insufficient exploration of metal elements has prevented researchers from fully understanding their mechanical behavior. To plug this knowledge gap, we constructed a database of mechanical properties for 20,342 MOFs included in the QMOF database using molecular simulations to investigate the impact of metal elements on mechanical stability. Through Shapley additive explanations (SHAP) analysis, we found that Co and Ln could enhance the structural stability of MOFs. We validated these findings using newly generated hypothetical MOFs. Notably, we adopted an interpretable machine learning technique to analyze the contribution of remarkably diverse metal elements in the 20,342 MOFs to the mechanical properties of each MOF. We anticipate that this research will serve as a valuable tool for future studies on identifying mechanically robust MOFs suitable for various industrial applications.
Affiliation(s)
- Inhyo Lee
  - School of Mechanical Engineering, Soongsil University, 369 Sangdo-ro, Dongjak-gu, Seoul 06978, Republic of Korea
- Jaejun Lee
  - Department of Mechanical Engineering, Pohang University of Science and Technology, 77 Cheongam-ro, Pohang 37673, Republic of Korea
- Minseon Kim
  - School of Mechanical Engineering, Soongsil University, 369 Sangdo-ro, Dongjak-gu, Seoul 06978, Republic of Korea
- Jaejung Park
  - Department of Mechanical Engineering, Pohang University of Science and Technology, 77 Cheongam-ro, Pohang 37673, Republic of Korea
- Heekyu Kim
  - Department of Mechanical Engineering, Korea Advanced Institute of Science and Technology (KAIST), 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
- Seungchul Lee
  - Department of Mechanical Engineering, Korea Advanced Institute of Science and Technology (KAIST), 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
- Kyoungmin Min
  - School of Mechanical Engineering, Soongsil University, 369 Sangdo-ro, Dongjak-gu, Seoul 06978, Republic of Korea
2. Wu X, Jiang J. Precision-engineered metal-organic frameworks: fine-tuning reverse topological structure prediction and design. Chem Sci 2024:d4sc05616g. [PMID: 39345765 PMCID: PMC11423560 DOI: 10.1039/d4sc05616g] [Received: 08/21/2024] [Accepted: 09/18/2024] [Indexed: 10/01/2024] Open
Abstract
Digital discoveries of metal-organic frameworks (MOFs) have been significantly advanced by the reverse topological approach (RTA). The node-and-linker assembly strategy allows predictable reticulations predefined by in silico coordination templates; however, reticular equivalents lead to substantial combinatorial explosion due to the infinite design space of building units (BUs). Here, we develop a fine-tuned RTA for the structure prediction of MOFs by integrating precise topological constraints and leveraging reticular chemistry, thus transcending traditional exhaustive trial-and-error assembly. From an extensive array of chemically realistic BUs, we subsequently design a database of 94 823 precision-engineered MOFs (PE-MOFs) and further optimize their structures. The PE-MOFs are assessed for post-combustion CO2 capture in the presence of H2O and top-performing candidates are identified by integrating three stability criteria (activation, water and thermal stabilities). This study highlights the potential of synergizing PE with the RTA to enhance efficiency and precision for computational design of MOFs and beyond.
Affiliation(s)
- Xiaoyu Wu
  - Department of Chemical and Biomolecular Engineering, National University of Singapore 117576 Singapore
- Jianwen Jiang
  - Department of Chemical and Biomolecular Engineering, National University of Singapore 117576 Singapore
3. Jin H, Merz KM. LigandDiff: de Novo Ligand Design for 3D Transition Metal Complexes with Diffusion Models. J Chem Theory Comput 2024; 20:4377-4384. [PMID: 38743854 PMCID: PMC11137811 DOI: 10.1021/acs.jctc.4c00232] [Received: 02/23/2024] [Revised: 05/06/2024] [Accepted: 05/07/2024] [Indexed: 05/16/2024]
Abstract
Transition metal complexes are a class of compounds with varied and versatile properties, making them of great technological importance. Their applications cover a wide range of fields, either as metallodrugs in medicine or as materials, catalysts, batteries, solar cells, etc. The demand for the novel design of transition metal complexes with new properties remains of great interest. However, the traditional high-throughput screening approach is inherently expensive and laborious since it depends on human expertise. Here, we present LigandDiff, a generative model for the de novo design of novel transition metal complexes. Unlike the existing methods that simply extract and combine ligands with the metal to get new complexes, LigandDiff aims at designing configurationally novel ligands from scratch, which opens new pathways for the discovery of organometallic complexes. Moreover, it overcomes the limitations of current methods, where the diversity of new complexes highly relies on the diversity of available ligands, while LigandDiff can design numerous novel ligands without human intervention. Our results indicate that LigandDiff designs unique and novel ligands under different contexts, and these generated ligands are synthetically accessible. Moreover, LigandDiff shows good transferability by generating successful ligands for any transition metal complex.
Affiliation(s)
- Hongni Jin
  - Department of Chemistry, Michigan State University, East Lansing, Michigan 48824, United States
- Kenneth M. Merz
  - Department of Chemistry, Michigan State University, East Lansing, Michigan 48824, United States
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
4. Sgueglia G, Vrettas MD, Chino M, De Simone A, Lombardi A. MetalHawk: Enhanced Classification of Metal Coordination Geometries by Artificial Neural Networks. J Chem Inf Model 2024; 64:2356-2367. [PMID: 37956388 PMCID: PMC11005052 DOI: 10.1021/acs.jcim.3c00873] [Received: 06/08/2023] [Revised: 09/29/2023] [Accepted: 10/26/2023] [Indexed: 11/15/2023]
Abstract
The chemical properties of metal complexes are strongly dependent on the number and geometrical arrangement of ligands coordinated to the metal center. Existing methods for determining either coordination number or geometry rely on a trade-off between accuracy and computational costs, which hinders their application to the study of large structure data sets. Here, we propose MetalHawk (https://github.com/vrettasm/MetalHawk), a machine learning-based approach to perform simultaneous classification of metal site coordination number and geometry through artificial neural networks (ANNs), which were trained using the Cambridge Structural Database (CSD) and Metal Protein Data Bank (MetalPDB). We demonstrate that the CSD-trained model can be used to classify sites belonging to the most common coordination numbers and geometry classes with balanced accuracy equal to 96.51% for CSD-deposited metal sites. The CSD-trained model was also found to be capable of classifying bioinorganic metal sites from the MetalPDB database, with balanced accuracy equal to 84.29% on the whole PDB data set and to 91.66% on manually reviewed sites in the PDB validation set. Moreover, we report evidence that the output vectors of the CSD-trained model can be considered as a proxy indicator of metal-site distortions, showing that these can be interpreted as a low-dimensional representation of subtle geometrical features present in metal site structures.
Affiliation(s)
- Gianmattia Sgueglia
  - Department of Chemical Sciences, University of Naples Federico II, Via Cintia 21, 80126 Napoli, Italy
- Michail D. Vrettas
  - Department of Pharmacy, University of Naples Federico II, Via Domenico Montesano 49, 80131 Napoli, Italy
- Marco Chino
  - Department of Chemical Sciences, University of Naples Federico II, Via Cintia 21, 80126 Napoli, Italy
- Alfonso De Simone
  - Department of Pharmacy, University of Naples Federico II, Via Domenico Montesano 49, 80131 Napoli, Italy
- Angela Lombardi
  - Department of Chemical Sciences, University of Naples Federico II, Via Cintia 21, 80126 Napoli, Italy
5. Kirkland JK, Kumawat J, Shaban Tameh M, Tolman T, Lambert AC, Lief GR, Yang Q, Ess DH. Machine Learning Models for Predicting Zirconocene Properties and Barriers. J Chem Inf Model 2024; 64:775-784. [PMID: 38259142 DOI: 10.1021/acs.jcim.3c01575] [Indexed: 01/24/2024]
Abstract
Zr metallocenes have significant potential to be highly tunable polyethylene catalysts through modification of the aromatic ligand framework. Here we report the development of multiple machine learning models using a large library (>700 systems) of DFT-calculated zirconocene properties and barriers for ethylene polymerization. We show that very accurate machine learning models are possible for HOMO-LUMO gaps of precatalysts but the performance significantly depends on the machine learning algorithm and type of featurization, such as fingerprints, Coulomb matrices, smooth overlap of atomic positions, or persistence images. Surprisingly, the description of the bonding hapticity, the number of direct connections between Zr and the ligand aromatic carbons, only has a moderate influence on the performance of most models. Despite robust models for HOMO-LUMO gaps, these types of machine learning models based on structure connectivity type features perform poorly in predicting ethylene migratory insertion barrier heights. Therefore, we developed several relatively robust and accurate machine learning models for barrier heights that are based on quantum-chemical descriptors (QCDs). The quantitative accuracy of these models depends on which potential energy surface structure QCDs were harvested from. This revealed a Hammett-type principle to naturally emerge showing that QCDs from the π-coordination complexes provide much better descriptions of the transition states than other potential-energy structures. Feature importance analysis of the QCDs provides several fundamental principles that influence zirconocene catalyst reactivity.
Affiliation(s)
- Justin K Kirkland
  - Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah 84604, United States
- Jugal Kumawat
  - Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah 84604, United States
- Maliheh Shaban Tameh
  - Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah 84604, United States
- Tyson Tolman
  - Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah 84604, United States
- Allison C Lambert
  - Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah 84604, United States
- Graham R Lief
  - Research and Technology, Chevron Phillips Chemical Company, Highways 60 & 123, Bartlesville, Oklahoma 74003, United States
- Qing Yang
  - Research and Technology, Chevron Phillips Chemical Company, Highways 60 & 123, Bartlesville, Oklahoma 74003, United States
- Daniel H Ess
  - Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah 84604, United States
6. Vennelakanti V, Kilic IB, Terrones GG, Duan C, Kulik HJ. Machine Learning Prediction of the Experimental Transition Temperature of Fe(II) Spin-Crossover Complexes. J Phys Chem A 2024; 128:204-216. [PMID: 38148525 DOI: 10.1021/acs.jpca.3c07104] [Indexed: 12/28/2023]
Abstract
Spin-crossover (SCO) complexes are materials that exhibit changes in the spin state in response to external stimuli, with potential applications in molecular electronics. It is challenging to know a priori how to design ligands to achieve the delicate balance of entropic and enthalpic contributions needed to tailor a transition temperature close to room temperature. We leverage the SCO complexes from the previously curated SCO-95 data set [Vennelakanti et al. J. Chem. Phys. 159, 024120 (2023)] to train three machine learning (ML) models for transition temperature (T1/2) prediction using graph-based revised autocorrelations as features. We perform feature selection using random forest-ranked recursive feature addition (RF-RFA) to identify the features essential to model transferability. Of the ML models considered, the full feature set RF and recursive feature addition RF models perform best, achieving moderate correlation to experimental T1/2 values. We then compare ML T1/2 predictions to those from three previously identified best-performing density functional approximations (DFAs) which accurately predict SCO behavior across SCO-95, finding that the ML models predict T1/2 more accurately than the best-performing DFAs. In addition, we study ML model predictions for a set of 18 SCO complexes for which only estimated T1/2 values are available. Upon excluding outliers from this set, the RF-RFA RF model shows a strong correlation to estimated T1/2 values with a Pearson's r of 0.82. In contrast, DFA-predicted T1/2 values have large errors and show no correlation to estimated T1/2 values over the same set of complexes. Overall, our study demonstrates slightly superior performance of ML models in comparison with some of the best-performing DFAs, and we expect ML models to improve further as larger data sets of SCO complexes are curated and become available for model training.
Affiliation(s)
- Vyshnavi Vennelakanti
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Irem B Kilic
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Gianmarco G Terrones
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Chenru Duan
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Heather J Kulik
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
7. Garrison A, Heras-Domingo J, Kitchin JR, dos Passos Gomes G, Ulissi ZW, Blau SM. Applying Large Graph Neural Networks to Predict Transition Metal Complex Energies Using the tmQM_wB97MV Data Set. J Chem Inf Model 2023; 63:7642-7654. [PMID: 38049389 PMCID: PMC10751796 DOI: 10.1021/acs.jcim.3c01226] [Received: 08/03/2023] [Revised: 11/08/2023] [Accepted: 11/20/2023] [Indexed: 12/06/2023]
Abstract
Machine learning (ML) methods have shown promise for discovering novel catalysts but are often restricted to specific chemical domains. Generalizable ML models require large and diverse training data sets, which exist for heterogeneous catalysis but not for homogeneous catalysis. The tmQM data set, which contains properties of 86,665 transition metal complexes calculated at the TPSSh/def2-SVP level of density functional theory (DFT), provided a promising training data set for homogeneous catalyst systems. However, we find that ML models trained on tmQM consistently underpredict the energies of a chemically distinct subset of the data. To address this, we present the tmQM_wB97MV data set, which filters out several structures in tmQM found to be missing hydrogens and recomputes the energies of all other structures at the ωB97M-V/def2-SVPD level of DFT. ML models trained on tmQM_wB97MV show no pattern of consistently incorrect predictions and much lower errors than those trained on tmQM. The ML models tested on tmQM_wB97MV were, from best to worst, GemNet-T > PaiNN ≈ SpinConv > SchNet. Performance consistently improves when using only neutral structures instead of the entire data set. However, while models saturate with only neutral structures, more data continue to improve the models when including charged species, indicating the importance of accurately capturing a range of oxidation states in future data generation and model development. Furthermore, a fine-tuning approach in which weights were initialized from models trained on OC20 led to drastic improvements in model performance, indicating transferability between ML strategies of heterogeneous and homogeneous systems.
Affiliation(s)
- Aaron G. Garrison
  - Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Javier Heras-Domingo
  - Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- John R. Kitchin
  - Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Gabriel dos Passos Gomes
  - Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
  - Department of Chemistry, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
  - Wilton E. Scott Institute for Energy Innovation, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Zachary W. Ulissi
  - Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
  - Wilton E. Scott Institute for Energy Innovation, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Samuel M. Blau
  - Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
8. Kevlishvili I, Duan C, Kulik HJ. Classification of Hemilabile Ligands Using Machine Learning. J Phys Chem Lett 2023:11100-11109. [PMID: 38051982 DOI: 10.1021/acs.jpclett.3c02828] [Indexed: 12/07/2023]
Abstract
Hemilabile ligands have the capacity to partially disengage from a metal center, providing a strategy to balance stability and reactivity in catalysis, but they are not straightforward to identify. We identify ligands in the Cambridge Structural Database that have been crystallized with distinct denticities and are thus identifiable as hemilabile ligands. We implement a semi-supervised learning approach using a label-spreading algorithm to augment a small negative set that is supported by heuristic rules of ligand and metal co-occurrence. We show that a heuristic based on coordinating atom identity alone is not sufficient to identify whether a ligand is hemilabile, and our trained machine-learning classification models are instead needed to predict whether a bi-, tri-, or tetradentate ligand is hemilabile with high accuracy and precision. Feature importance analysis of our models shows that the second, third, and fourth coordination spheres all play important roles in ligand hemilability.
Affiliation(s)
- Ilia Kevlishvili
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Chenru Duan
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Heather J Kulik
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
9. Xiang Y, Tang YH, Gong Z, Liu H, Wu L, Lin G, Sun H. Efficient Exploration of Chemical Compound Space Using Active Learning for Prediction of Thermodynamic Properties of Alkane Molecules. J Chem Inf Model 2023; 63:6515-6524. [PMID: 37857374 DOI: 10.1021/acs.jcim.3c01430] [Indexed: 10/21/2023]
Abstract
We introduce an exploratory active learning (AL) algorithm using Gaussian process regression and marginalized graph kernel (GPR-MGK) to sample chemical compound space (CCS) at minimal cost. Targeting 251,728 enumerated alkane molecules with 4-19 carbon atoms, we applied the AL algorithm to select a diverse and representative set of molecules and then conducted high-throughput molecular simulations on these selected molecules. To demonstrate the power of the AL algorithm, we built directed message-passing neural networks (D-MPNN) using simulation data as the training set to predict liquid densities, heat capacities, and vaporization enthalpies of the CCS. Validations show that D-MPNN models built on the smallest training set considered in this work, which consists of 313 molecules or 0.124% of the original CCS, predict the properties with R² > 0.99 against the computational data and R² > 0.94 against the experimental data. The advantage of the presented AL algorithm is that the predicted uncertainty of GPR depends on only the molecular structures, which renders it compatible with high-throughput data generation.
Affiliation(s)
- Yan Xiang
  - School of Chemistry and Chemical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
- Yu-Hang Tang
  - Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
  - NVIDIA Corporation, Santa Clara, California 95051, United States
- Zheng Gong
  - School of Chemistry and Chemical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
- Hongyi Liu
  - School of Chemistry and Chemical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
- Liang Wu
  - School of Chemistry and Chemical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
- Guang Lin
  - Department of Mathematics & School of Mechanical Engineering, Purdue University, West Lafayette, Indiana 47907, United States
- Huai Sun
  - School of Chemistry and Chemical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
10. Lambor SM, Kasiraju S, Vlachos DG. CKineticsDB: An Extensible and FAIR Data Management Framework and Datahub for Multiscale Modeling in Heterogeneous Catalysis. J Chem Inf Model 2023; 63:4342-4354. [PMID: 37436913 DOI: 10.1021/acs.jcim.3c00123] [Indexed: 07/14/2023]
Abstract
A great advantage of computational research is its reproducibility and reusability. However, an enormous amount of computational research data in heterogeneous catalysis is barricaded due to logistical limitations. Sufficient provenance and characterization of data and computational environment, with uniform organization and easy accessibility, can allow the development of software tools for integration across the multiscale modeling workflow. Here, we develop the Chemical Kinetics Database, CKineticsDB, a state-of-the-art datahub for multiscale modeling, designed to be compliant with the FAIR guiding principles for scientific data management. CKineticsDB utilizes a MongoDB back-end for extensibility and adaptation to varying data formats, with a referencing-based data model to reduce redundancy in storage. We have developed a Python software program for data processing operations and with built-in features to extract data for common applications. CKineticsDB evaluates the incoming data for quality and uniformity, retains curated information from simulations, enables accurate regeneration of publication results, optimizes storage, and allows the selective retrieval of files based on domain-relevant catalyst and simulation parameters. CKineticsDB provides data from multiple scales of theory (ab initio calculations, thermochemistry, and microkinetic models) to accelerate the development of new reaction pathways, kinetic analysis of reaction mechanisms, and catalysis discovery, along with several data-driven applications.
Affiliation(s)
- Siddhant M Lambor
  - RAPID Manufacturing Institute, Delaware Energy Institute, University of Delaware, Newark, Delaware 19716, United States
- Sashank Kasiraju
  - RAPID Manufacturing Institute, Delaware Energy Institute, University of Delaware, Newark, Delaware 19716, United States
- Dionisios G Vlachos
  - RAPID Manufacturing Institute, Delaware Energy Institute, University of Delaware, Newark, Delaware 19716, United States
  - Department of Chemical and Biomolecular Engineering and Catalysis Center for Energy Innovation (CCEI), University of Delaware, Newark, Delaware 19716, United States
11. Adamji H, Nandy A, Kevlishvili I, Román-Leshkov Y, Kulik HJ. Computational Discovery of Stable Metal-Organic Frameworks for Methane-to-Methanol Catalysis. J Am Chem Soc 2023. [PMID: 37339429 DOI: 10.1021/jacs.3c03351] [Indexed: 06/22/2023]
Abstract
The challenge of direct partial oxidation of methane to methanol has motivated the targeted search of metal-organic frameworks (MOFs) as a promising class of materials for this transformation because of their site-isolated metals with tunable ligand environments. Thousands of MOFs have been synthesized, yet relatively few have been screened for their promise in methane conversion. We developed a high-throughput virtual screening workflow that identifies MOFs from a diverse space of experimental MOFs that have not been studied for catalysis, yet are thermally stable, synthesizable, and have promising unsaturated metal sites for C-H activation via a terminal metal-oxo species. We carried out density functional theory calculations of the radical rebound mechanism for methane-to-methanol conversion on models of the secondary building units (SBUs) from 87 selected MOFs. While we showed that oxo formation favorability decreases with increasing 3d filling, consistent with prior work, previously observed scaling relations between oxo formation and hydrogen atom transfer (HAT) are disrupted by the greater diversity in our MOF set. Accordingly, we focused on Mn MOFs, which favor oxo intermediates without disfavoring HAT or leading to high methanol release energies, a key feature for methane hydroxylation activity. We identified three Mn MOFs comprising unsaturated Mn centers bound to weak-field carboxylate ligands in planar or bent geometries with promising methane-to-methanol kinetics and thermodynamics. The energetic spans of these MOFs are indicative of promising turnover frequencies for methane to methanol that warrant further experimental catalytic studies.
Affiliation(s)
- Husain Adamji
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Aditya Nandy
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Ilia Kevlishvili
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Yuriy Román-Leshkov
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Heather J Kulik
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
12. Oliveira FL, Cleeton C, Neumann Barros Ferreira R, Luan B, Farmahini AH, Sarkisov L, Steiner M. CRAFTED: An exploratory database of simulated adsorption isotherms of metal-organic frameworks. Sci Data 2023; 10:230. [PMID: 37081024 PMCID: PMC10119274 DOI: 10.1038/s41597-023-02116-z] [Received: 10/19/2022] [Accepted: 03/28/2023] [Indexed: 04/22/2023] Open
Abstract
Grand Canonical Monte Carlo is an important method for performing molecular-level simulations and assisting the study and development of nanoporous materials for gas capture applications. These simulations are based on the use of force fields and partial charges to model the interaction between the adsorbate molecules and the solid framework. The choice of the force field parameters and partial charges can significantly impact the results obtained; however, there are very few databases available to support a comprehensive impact evaluation. Here, we present a database of simulations of CO2 and N2 adsorption isotherms on 690 metal-organic frameworks taken from the CoRE MOF 2014 database. We performed simulations with two force fields (UFF and DREIDING), six partial charge schemes (no charges, Qeq, EQeq, MPNN, PACMOF, and DDEC), and three temperatures (273, 298, 323 K). The resulting isotherms compose the Charge-dependent, Reproducible, Accessible, Forcefield-dependent, and Temperature-dependent Exploratory Database (CRAFTED) of adsorption isotherms.
Affiliation(s)
- Felipe Lopes Oliveira
- IBM Research, Av. República do Chile, 330, CEP 20031-170, Rio de Janeiro, RJ, Brazil
- Department of Organic Chemistry, Instituto de Química, Universidade Federal do Rio de Janeiro, Rio de Janeiro, RJ, Brazil
| | - Conor Cleeton
- Department of Chemical Engineering, Engineering A, the University of Manchester, Manchester, M13 9PL, United Kingdom
| | | | - Binquan Luan
- IBM Research, 1101 Kitchawan Road, Yorktown Heights, 10598, NY, United States of America
| | - Amir H Farmahini
- Department of Chemical Engineering, Engineering A, the University of Manchester, Manchester, M13 9PL, United Kingdom
| | - Lev Sarkisov
- Department of Chemical Engineering, Engineering A, the University of Manchester, Manchester, M13 9PL, United Kingdom
| | - Mathias Steiner
- IBM Research, Av. República do Chile, 330, CEP 20031-170, Rio de Janeiro, RJ, Brazil
| |
|
13
|
Cytter Y, Nandy A, Duan C, Kulik HJ. Insights into the deviation from piecewise linearity in transition metal complexes from supervised machine learning models. Phys Chem Chem Phys 2023; 25:8103-8116. [PMID: 36876903 DOI: 10.1039/d3cp00258f] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/25/2023]
Abstract
Virtual high-throughput screening (VHTS) and machine learning (ML) with density functional theory (DFT) suffer from inaccuracies from the underlying density functional approximation (DFA). Many of these inaccuracies can be traced to the lack of derivative discontinuity that leads to a curvature in the energy with electron addition or removal. Over a dataset of nearly one thousand transition metal complexes typical of VHTS applications, we computed and analyzed the average curvature (i.e., deviation from piecewise linearity) for 23 density functional approximations spanning multiple rungs of "Jacob's ladder". While we observe the expected dependence of the curvatures on Hartree-Fock exchange, we note limited correlation of curvature values between different rungs of "Jacob's ladder". We train ML models (i.e., artificial neural networks or ANNs) to predict the curvature and the associated frontier orbital energies for each of these 23 functionals and then interpret differences in curvature among the different DFAs through analysis of the ML models. Notably, we observe spin to play a much more important role in determining the curvature of range-separated and double hybrids in comparison to semi-local functionals, explaining why curvature values are weakly correlated between these and other families of functionals. Over a space of 187.2k hypothetical compounds, we use our ANNs to pinpoint DFAs for which representative transition metal complexes have near-zero curvature with low uncertainty, demonstrating an approach to accelerate screening of complexes with targeted optical gaps.
Affiliation(s)
- Yael Cytter
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Aditya Nandy
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Chenru Duan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Heather J Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| |
|
14
|
Zaverkin V, Holzmüller D, Bonfirraro L, Kästner J. Transfer learning for chemically accurate interatomic neural network potentials. Phys Chem Chem Phys 2023; 25:5383-5396. [PMID: 36748821 DOI: 10.1039/d2cp05793j] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/01/2023]
Abstract
Developing machine learning-based interatomic potentials from ab initio electronic structure methods remains a challenging task for computational chemistry and materials science. This work studies the capability of transfer learning, in particular discriminative fine-tuning, for efficiently generating chemically accurate interatomic neural network potentials on organic molecules from the MD17 and ANI data sets. We show that pre-training the network parameters on data obtained from density functional calculations considerably improves the sample efficiency of models trained on more accurate ab initio data. Additionally, we show that fine-tuning with energy labels alone can suffice to obtain accurate atomic forces and run large-scale atomistic simulations, provided a well-designed fine-tuning data set. We also investigate possible limitations of transfer learning, especially regarding the design and size of the pre-training and fine-tuning data sets. Finally, we provide GM-NN potentials pre-trained and fine-tuned on the ANI-1x and ANI-1ccx data sets, which can easily be fine-tuned on and applied to organic molecules.
Affiliation(s)
- Viktor Zaverkin
- Faculty of Chemistry, Institute for Theoretical Chemistry, University of Stuttgart, Germany.
| | - David Holzmüller
- Faculty of Mathematics and Physics, Institute for Stochastics and Applications, University of Stuttgart, Germany.
| | - Luca Bonfirraro
- Faculty of Chemistry, Institute for Theoretical Chemistry, University of Stuttgart, Germany.
| | - Johannes Kästner
- Faculty of Chemistry, Institute for Theoretical Chemistry, University of Stuttgart, Germany.
| |
|
15
|
Cao Z, Magar R, Wang Y, Barati Farimani A. MOFormer: Self-Supervised Transformer Model for Metal-Organic Framework Property Prediction. J Am Chem Soc 2023; 145:2958-2967. [PMID: 36706365 PMCID: PMC10041520 DOI: 10.1021/jacs.2c11420] [Citation(s) in RCA: 28] [Impact Index Per Article: 28.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/27/2022] [Indexed: 01/28/2023]
Abstract
Metal-organic frameworks (MOFs) are materials with a high degree of porosity that can be used for many applications. However, the chemical space of MOFs is enormous due to the large variety of possible combinations of building blocks and topology. Discovering the optimal MOFs for specific applications requires an efficient and accurate search over countless potential candidates. Previous high-throughput screening methods using computational simulations like DFT can be time-consuming. Such methods also require the 3D atomic structures of MOFs, which adds one extra step when evaluating hypothetical MOFs. In this work, we propose a structure-agnostic deep learning method based on the Transformer model, named as MOFormer, for property predictions of MOFs. MOFormer takes a text string representation of MOF (MOFid) as input, thus circumventing the need of obtaining the 3D structure of a hypothetical MOF and accelerating the screening process. By comparing to other descriptors such as Stoichiometric-120 and revised autocorrelations, we demonstrate that MOFormer can achieve state-of-the-art structure-agnostic prediction accuracy on all benchmarks. Furthermore, we introduce a self-supervised learning framework that pretrains the MOFormer via maximizing the cross-correlation between its structure-agnostic representations and structure-based representations of the crystal graph convolutional neural network (CGCNN) on >400k publicly available MOF data. Benchmarks show that pretraining improves the prediction accuracy of both models on various downstream prediction tasks. Furthermore, we revealed that MOFormer can be more data-efficient on quantum-chemical property prediction than structure-based CGCNN when training data is limited. Overall, MOFormer provides a novel perspective on efficient MOF property prediction using deep learning.
Affiliation(s)
- Zhonglin Cao
- Department of Mechanical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
| | - Rishikesh Magar
- Department of Mechanical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
| | - Yuyang Wang
- Department of Mechanical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
| | - Amir Barati Farimani
- Department of Mechanical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Machine Learning Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
| |
|
16
|
Terrones GG, Duan C, Nandy A, Kulik HJ. Low-cost machine learning prediction of excited state properties of iridium-centered phosphors. Chem Sci 2023; 14:1419-1433. [PMID: 36794185 PMCID: PMC9906783 DOI: 10.1039/d2sc06150c] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/07/2022] [Accepted: 01/05/2023] [Indexed: 01/07/2023] Open
Abstract
Prediction of the excited state properties of photoactive iridium complexes challenges ab initio methods such as time-dependent density functional theory (TDDFT) both from the perspective of accuracy and of computational cost, complicating high-throughput virtual screening (HTVS). We instead leverage low-cost machine learning (ML) models and experimental data for 1380 iridium complexes to perform these prediction tasks. We find the best-performing and most transferable models to be those trained on electronic structure features from low-cost density functional tight binding calculations. Using artificial neural network (ANN) models, we predict the mean emission energy of phosphorescence, the excited state lifetime, and the emission spectral integral for iridium complexes with accuracy competitive with or superseding that of TDDFT. We conduct feature importance analysis to determine that high cyclometalating ligand ionization potential correlates to high mean emission energy, while high ancillary ligand ionization potential correlates to low lifetime and low spectral integral. As a demonstration of how our ML models can be used for HTVS and the acceleration of chemical discovery, we curate a set of novel hypothetical iridium complexes and use uncertainty-controlled predictions to identify promising ligands for the design of new phosphors while retaining confidence in the quality of the ANN predictions.
Affiliation(s)
- Gianmarco G Terrones
- Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge MA 02139 USA
| | - Chenru Duan
- Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge MA 02139 USA
- Department of Chemistry, Massachusetts Institute of Technology Cambridge MA 02139 USA
| | - Aditya Nandy
- Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge MA 02139 USA
- Department of Chemistry, Massachusetts Institute of Technology Cambridge MA 02139 USA
| | - Heather J Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge MA 02139 USA
- Department of Chemistry, Massachusetts Institute of Technology Cambridge MA 02139 USA
| |
|
17
|
Albavera-Mata A, Trickey SB, Hennig RG. Mean Value Ensemble Hubbard-U Correction for Spin-Crossover Molecules. J Phys Chem Lett 2022; 13:12049-12054. [PMID: 36542415 DOI: 10.1021/acs.jpclett.2c03388] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/17/2023]
Abstract
High-throughput searches for spin-crossover molecules require Hubbard-U corrections to common density functional exchange-correlation (XC) approximations. However, the Ueff values obtained from linear response or based on previous studies overcorrect the spin-crossover energies. We demonstrate that employing a linearly mixed ensemble average spin state as the reference configuration for the linear response calculation of Ueff resolves this issue. Validation on a commonly used set of spin-crossover complexes shows that these ensemble Ueff values consistently are smaller than those calculated directly on a pure spin state, irrespective of whether that be low- or high-spin. Adiabatic crossover energies using this methodology for a generalized gradient approximation XC functional are closer to the expected target energy range than with conventional Ueff values. Based on the observation that the Ueff correction is similar for different complexes that share transition metals with the same oxidation state, we devise a set of recommended averaged Ueff values for high-throughput calculations.
Affiliation(s)
- Angel Albavera-Mata
- Center for Molecular Magnetic Quantum Materials, Quantum Theory Project, University of Florida, Gainesville, Florida 32611, United States
- Department of Materials Science and Engineering, University of Florida, Gainesville, Florida 32611, United States
| | - S B Trickey
- Center for Molecular Magnetic Quantum Materials, Quantum Theory Project, University of Florida, Gainesville, Florida 32611, United States
- Department of Physics and Department of Chemistry, University of Florida, Gainesville, Florida 32611, United States
| | - Richard G Hennig
- Center for Molecular Magnetic Quantum Materials, Quantum Theory Project, University of Florida, Gainesville, Florida 32611, United States
- Department of Materials Science and Engineering, University of Florida, Gainesville, Florida 32611, United States
| |
|
18
|
van Beek B, Zito J, Visscher L, Infante I. CAT: A Compound Attachment Tool for the Construction of Composite Chemical Compounds. J Chem Inf Model 2022; 62:5525-5535. [PMID: 36314636 PMCID: PMC9976287 DOI: 10.1021/acs.jcim.2c00690] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Abstract
The continuous improvement of computer architectures allows for the simulation of molecular systems of growing sizes. However, such calculations still require the input of initial structures, which are also becoming increasingly complex. In this work, we present CAT, a Compound Attachment Tool (source code available at https://github.com/nlesc-nano/CAT) and Python package for the automatic construction of composite chemical compounds, which supports the functionalization of organic, inorganic, and hybrid organic-inorganic materials. The CAT workflow consists in defining the anchoring sites on the reference material, usually a large molecular system denoted as a scaffold, and on the molecular species that are attached to it, i.e., the ligands. Usually, ligands are pre-optimized in a conformation biased toward more linear structures to minimize interligand(s) steric interactions, a bias that is important when multiple ligands are attached onto the scaffold. The resulting superstructure(s) are then stored in various formats that can be used afterward in quantum chemical calculations or classical force field-based simulations.
Affiliation(s)
- Bas van Beek
- Division of Theoretical Chemistry, Faculty of Science, Vrije Universiteit Amsterdam, de Boelelaan 1083, Amsterdam 1081 HV, the Netherlands
| | - Juliette Zito
- Dipartimento di Chimica e Chimica Industriale, Università degli Studi di Genova, Via Dodecaneso 31, Genova 16146, Italy
- Department of Nanochemistry, Istituto Italiano di Tecnologia, Via Morego 30, Genova 16163, Italy
| | - Lucas Visscher
- Division of Theoretical Chemistry, Faculty of Science, Vrije Universiteit Amsterdam, de Boelelaan 1083, Amsterdam 1081 HV, the Netherlands
| | - Ivan Infante
- Department of Nanochemistry, Istituto Italiano di Tecnologia, Via Morego 30, Genova 16163, Italy
- BCMaterials, Basque Center for Materials, Applications, and Nanostructures, UPV/EHU Science Park, Leioa 48940, Spain
- Ikerbasque Basque Foundation for Science Bilbao 48009, Spain
| |
|
19
|
Cheng L, Sun J, Miller TF. Accurate Molecular-Orbital-Based Machine Learning Energies via Unsupervised Clustering of Chemical Space. J Chem Theory Comput 2022; 18:4826-4835. [PMID: 35858242 DOI: 10.1021/acs.jctc.2c00396] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Abstract
We introduce an unsupervised clustering algorithm to improve training efficiency and accuracy in predicting energies using molecular-orbital-based machine learning (MOB-ML). This work determines clusters via the Gaussian mixture model (GMM) in an entirely automatic manner and simplifies an earlier supervised clustering approach [J. Chem. Theory Comput. 2019, 15, 6668] by eliminating both the necessity for user-specified parameters and the training of an additional classifier. Unsupervised clustering results from GMM have the advantages of accurately reproducing chemically intuitive groupings of frontier molecular orbitals and exhibiting improved performance with an increasing number of training examples. The resulting clusters from supervised or unsupervised clustering are further combined with scalable Gaussian process regression (GPR) or linear regression (LR) to learn molecular energies accurately by generating a local regression model in each cluster. Among all four combinations of regressors and clustering methods, GMM combined with scalable exact GPR (GMM/GPR) is the most efficient training protocol for MOB-ML. The numerical tests of molecular energy learning on thermalized data sets of drug-like molecules demonstrate the improved accuracy, transferability, and learning efficiency of GMM/GPR over other training protocols for MOB-ML, i.e., supervised regression clustering combined with GPR (RC/GPR) and GPR without clustering. GMM/GPR also provides the best molecular energy predictions compared with ones from the literature on the same benchmark data sets. With a lower scaling, GMM/GPR has a 10.4-fold speedup in wall-clock training time compared with scalable exact GPR with a training size of 6500 QM7b-T molecules.
Affiliation(s)
- Lixue Cheng
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States
| | - Jiace Sun
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States
| | - Thomas F Miller
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States
| |
|
20
|
Duan C, Ladera AJ, Liu JCL, Taylor MG, Ariyarathna IR, Kulik HJ. Exploiting Ligand Additivity for Transferable Machine Learning of Multireference Character across Known Transition Metal Complex Ligands. J Chem Theory Comput 2022; 18:4836-4845. [PMID: 35834742 DOI: 10.1021/acs.jctc.2c00468] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
Abstract
Accurate virtual high-throughput screening (VHTS) of transition metal complexes (TMCs) remains challenging due to the possibility of high multireference (MR) character that complicates property evaluation. We compute MR diagnostics for over 5,000 ligands present in previously synthesized octahedral mononuclear transition metal complexes in the Cambridge Structural Database (CSD). To accomplish this task, we introduce an iterative approach for consistent ligand charge assignment for ligands in the CSD. Across this set, we observe that the MR character correlates linearly with the inverse value of the averaged bond order over all bonds in the molecule. We then demonstrate that ligand additivity of the MR character holds in TMCs, which suggests that the TMC MR character can be inferred from the sum of the MR character of the ligands. Encouraged by this observation, we leverage ligand additivity and develop a ligand-derived machine learning representation to train neural networks to predict the MR character of TMCs from properties of the constituent ligands. This approach yields models with excellent performance and superior transferability to unseen ligand chemistry and compositions.
Affiliation(s)
- Chenru Duan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Adriana J Ladera
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Julian C-L Liu
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Michael G Taylor
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Isuru R Ariyarathna
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Heather J Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| |
|
21
|
Duan C, Nandy A, Adamji H, Roman-Leshkov Y, Kulik HJ. Machine Learning Models Predict Calculation Outcomes with the Transferability Necessary for Computational Catalysis. J Chem Theory Comput 2022; 18:4282-4292. [PMID: 35737587 DOI: 10.1021/acs.jctc.2c00331] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Abstract
Virtual high-throughput screening (VHTS) and machine learning (ML) have greatly accelerated the design of single-site transition-metal catalysts. VHTS of catalysts, however, is often accompanied with a high calculation failure rate and wasted computational resources due to the difficulty of simultaneously converging all mechanistically relevant reactive intermediates to expected geometries and electronic states. We demonstrate a dynamic classifier approach, i.e., a convolutional neural network that monitors geometry optimizations on the fly, and exploit its good performance and transferability in identifying geometry optimization failures for catalyst design. We show that the dynamic classifier performs well on all reactive intermediates in the representative catalytic cycle of the radical rebound mechanism for the conversion of methane to methanol despite being trained on only one reactive intermediate. The dynamic classifier also generalizes to chemically distinct intermediates and metal centers absent from the training data without loss of accuracy or model confidence. We rationalize this superior model transferability as arising from the use of electronic structure and geometric information generated on-the-fly from density functional theory calculations and the convolutional layer in the dynamic classifier. When used in combination with uncertainty quantification, the dynamic classifier saves more than half of the computational resources that would have been wasted on unsuccessful calculations for all reactive intermediates being considered.
Affiliation(s)
- Chenru Duan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Aditya Nandy
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Husain Adamji
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Yuriy Roman-Leshkov
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Heather J Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| |
|
22
|
Alkhatib II, Albà CG, Darwish AS, Llovell F, Vega LF. Searching for Sustainable Refrigerants by Bridging Molecular Modeling with Machine Learning. Ind Eng Chem Res 2022; 61:7414-7429. [PMID: 35673400 PMCID: PMC9165071 DOI: 10.1021/acs.iecr.2c00719] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/03/2022] [Revised: 04/30/2022] [Accepted: 05/06/2022] [Indexed: 11/29/2022]
Abstract
We present here a novel integrated approach employing machine learning algorithms for predicting thermophysical properties of fluids. The approach allows obtaining molecular parameters to be used in the polar soft-statistical associating fluid theory (SAFT) equation of state using molecular descriptors obtained from the conductor-like screening model for real solvents (COSMO-RS). The procedure is used for modeling 18 refrigerants including hydrofluorocarbons, hydrofluoroolefins, and hydrochlorofluoroolefins. The training dataset included six inputs obtained from COSMO-RS and five outputs from polar soft-SAFT parameters, with the accurate algorithm training ensured by its high statistical accuracy. The predicted molecular parameters were used in polar soft-SAFT for evaluating the thermophysical properties of the refrigerants such as density, vapor pressure, heat capacity, enthalpy of vaporization, and speed of sound. Predictions provided a good level of accuracy (AADs = 1.3-10.5%) compared to experimental data, and within a similar level of accuracy using parameters obtained from standard fitting procedures. Moreover, the predicted parameters provided a comparable level of predictive accuracy to parameters obtained from standard procedure when extended to modeling selected binary mixtures. The proposed approach enables bridging the gap in the data of thermodynamic properties of low global warming potential refrigerants, which hinders their technical evaluation and hence their final application.
Affiliation(s)
- Ismail I. I. Alkhatib
- Research and Innovation Center on CO2 and Hydrogen (RICH), Khalifa University, PO Box 127788 Abu Dhabi, United Arab Emirates
- Chemical Engineering Department, Khalifa University, PO Box 127788 Abu Dhabi, United Arab Emirates
| | - Carlos G. Albà
- Department of Chemical Engineering, ETSEQ, Universitat Rovira i Virgili (URV), Avinguda Països Catalans 26, 43007 Tarragona, Spain
| | - Ahmad S. Darwish
- Chemical Engineering Department, Khalifa University, PO Box 127788 Abu Dhabi, United Arab Emirates
| | - Fèlix Llovell
- Department of Chemical Engineering, ETSEQ, Universitat Rovira i Virgili (URV), Avinguda Països Catalans 26, 43007 Tarragona, Spain
| | - Lourdes F. Vega
- Research and Innovation Center on CO2 and Hydrogen (RICH), Khalifa University, PO Box 127788 Abu Dhabi, United Arab Emirates
- Chemical Engineering Department, Khalifa University, PO Box 127788 Abu Dhabi, United Arab Emirates
| |
|
23
|
Cytter Y, Nandy A, Bajaj A, Kulik HJ. Ligand Additivity and Divergent Trends in Two Types of Delocalization Errors from Approximate Density Functional Theory. J Phys Chem Lett 2022; 13:4549-4555. [PMID: 35579948 DOI: 10.1021/acs.jpclett.2c01026] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/15/2023]
Abstract
The predictive accuracy of density functional theory (DFT) is hampered by delocalization errors, especially for correlated systems such as transition-metal complexes. Two complementary strategies have been developed to reduce delocalization error: eliminating the global curvature with change in charge, and applying a linear response Hubbard U as a measure of local curvature at a metal center at fixed charge in a DFT+U framework. We investigate the relationship between the two delocalization error measures as the ligand field strength is varied with the number of strong-field ligands in a series of heteroleptic complexes or by geometrically constraining the metal-ligand bond length in homoleptic octahedral complexes. We show that across these sets of complexes an inverse relationship generally exists between global and local curvatures. We find that effects of ligand substitution on both measures of delocalization are typically additive, but the quantities seldom coincide.
Affiliation(s)
- Yael Cytter
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Aditya Nandy
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Akash Bajaj
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Heather J Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| |
|
24
|
Nandy A, Duan C, Goffinet C, Kulik HJ. New Strategies for Direct Methane-to-Methanol Conversion from Active Learning Exploration of 16 Million Catalysts. JACS AU 2022; 2:1200-1213. [PMID: 35647589 PMCID: PMC9135396 DOI: 10.1021/jacsau.2c00176] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/16/2022] [Revised: 04/12/2022] [Accepted: 04/15/2022] [Indexed: 05/03/2023]
Abstract
Despite decades of effort, no earth-abundant homogeneous catalysts have been discovered that can selectively oxidize methane to methanol. We exploit active learning to simultaneously optimize methane activation and methanol release calculated with machine learning-accelerated density functional theory in a space of 16 M candidate catalysts including novel macrocycles. By constructing macrocycles from fragments inspired by synthesized compounds, we ensure synthetic realism in our computational search. Our large-scale search reveals that low-spin Fe(II) compounds paired with strong-field (e.g., P or S-coordinating) ligands have among the best energetic tradeoffs between hydrogen atom transfer (HAT) and methanol release. This observation contrasts with prior efforts that have focused on high-spin Fe(II) with weak-field ligands. By decoupling equatorial and axial ligand effects, we determine that negatively charged axial ligands are critical for more rapid release of methanol and that higher-valency metals [i.e., M(III) vs M(II)] are likely to be rate-limited by slow methanol release. With full characterization of barrier heights, we confirm that optimizing for HAT does not lead to large oxo formation barriers. Energetic span analysis reveals designs for an intermediate-spin Mn(II) catalyst and a low-spin Fe(II) catalyst that are predicted to have good turnover frequencies. Our active learning approach to optimize two distinct reaction energies with efficient global optimization is expected to be beneficial for the search of large catalyst spaces where no prior designs have been identified and where linear scaling relationships between reaction energies or barriers may be limited or unknown.
Affiliation(s)
- Aditya Nandy
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Chenru Duan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Conrad Goffinet
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Heather J. Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| |
|
25
|
Duan C, Chu DBK, Nandy A, Kulik HJ. Detection of multi-reference character imbalances enables a transfer learning approach for virtual high throughput screening with coupled cluster accuracy at DFT cost. Chem Sci 2022; 13:4962-4971. [PMID: 35655882 PMCID: PMC9067623 DOI: 10.1039/d2sc00393g] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 01/21/2022] [Accepted: 04/04/2022] [Indexed: 01/08/2023]
Abstract
Appropriately identifying and treating molecules and materials with significant multi-reference (MR) character is crucial for achieving high data fidelity in virtual high-throughput screening (VHTS). Despite development of numerous MR diagnostics, the extent to which a single value of such a diagnostic indicates the MR effect on a chemical property prediction is not well established. We evaluate MR diagnostics for over 10,000 transition-metal complexes (TMCs) and compare to those for organic molecules. We observe that only some MR diagnostics are transferable from one chemical space to another. By studying the influence of MR character on chemical properties (i.e., MR effect) that involve multiple potential energy surfaces (i.e., adiabatic spin splitting, ΔE_H-L, and ionization potential, IP), we show that differences in MR character are more important than the cumulative degree of MR character in predicting the magnitude of an MR effect. Motivated by this observation, we build transfer learning models to predict CCSD(T)-level adiabatic ΔE_H-L and IP from lower levels of theory. By combining these models with uncertainty quantification and multi-level modeling, we introduce a multi-pronged strategy that accelerates data acquisition by at least a factor of three while achieving coupled cluster accuracy (i.e., to within 1 kcal mol⁻¹ MAE) for robust VHTS.
Affiliation(s)
- Chenru Duan
- Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge MA 02139 USA
- Department of Chemistry, Massachusetts Institute of Technology Cambridge MA 02139 USA
| | - Daniel B K Chu
- Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge MA 02139 USA
| | - Aditya Nandy
- Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge MA 02139 USA
- Department of Chemistry, Massachusetts Institute of Technology Cambridge MA 02139 USA
| | - Heather J Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge MA 02139 USA
| |
|
26
|
Genin SN, Ryabinkin IG, Paisley NR, Whelan SO, Helander MG, Hudson ZM. Estimating Phosphorescent Emission Energies in Ir(III) Complexes Using Large‐Scale Quantum Computing Simulations. Angew Chem Int Ed Engl 2022; 61:e202116175. [DOI: 10.1002/anie.202116175] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/26/2021] [Indexed: 11/11/2022]
Affiliation(s)
- Scott N. Genin
- OTI Lumionics Inc. 100 College St. #351 Toronto Ontario M5G 1L5 Canada
| | - Ilya G. Ryabinkin
- OTI Lumionics Inc. 100 College St. #351 Toronto Ontario M5G 1L5 Canada
| | - Nathan R. Paisley
- Department of Chemistry The University of British Columbia 2036 Main Mall Vancouver British Columbia V6T 1Z1 Canada
| | - Sarah O. Whelan
- OTI Lumionics Inc. 100 College St. #351 Toronto Ontario M5G 1L5 Canada
| | | | - Zachary M. Hudson
- Department of Chemistry The University of British Columbia 2036 Main Mall Vancouver British Columbia V6T 1Z1 Canada
| |
|
27
|
Tarzia A, Jelfs KE. Unlocking the computational design of metal-organic cages. Chem Commun (Camb) 2022; 58:3717-3730. [PMID: 35229861 PMCID: PMC8932387 DOI: 10.1039/d2cc00532h] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 01/26/2022] [Accepted: 02/22/2022] [Indexed: 12/11/2022]
Abstract
Metal-organic cages are macrocyclic structures that can possess an intrinsic void that can hold molecules for encapsulation, adsorption, sensing, and catalysis applications. As metal-organic cages may be comprised from nearly any combination of organic and metal-containing components, cages can form with diverse shapes and sizes, allowing for tuning toward targeted properties. Therefore, their near-infinite design space is almost impossible to explore through experimentation alone and computational design can play a crucial role in exploring new systems. Although high-throughput computational design and screening workflows have long been known as powerful tools in drug and materials discovery, their application in exploring metal-organic cages is more recent. We show examples of structure prediction and host-guest/catalytic property evaluation of metal-organic cages. These examples are facilitated by advances in methods that handle metal-containing systems with improved accuracy and are the beginning of the development of automated cage design workflows. We finally outline a scope for how high-throughput computational methods can assist and drive experimental decisions as the field pushes toward functional and complex metal-organic cages. In particular, we highlight the importance of considering realistic, flexible systems.
Affiliation(s)
- Andrew Tarzia
- Department of Chemistry, Molecular Sciences Research Hub, Imperial College London, White City Campus, Wood Lane, London, W12 0BZ, UK.
| | - Kim E Jelfs
- Department of Chemistry, Molecular Sciences Research Hub, Imperial College London, White City Campus, Wood Lane, London, W12 0BZ, UK.
| |
|
28
|
Genin SN, Ryabinkin IG, Paisley NR, Whelan SO, Helander MG, Hudson ZM. Estimating Phosphorescent Emission Energies in Ir(III) Complexes Using Large‐Scale Quantum Computing Simulations. Angew Chem Int Ed Engl 2022. [DOI: 10.1002/ange.202116175] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/07/2022]
Affiliation(s)
- Scott N. Genin
- OTI Lumionics Inc. 100 College St. #351 Toronto Ontario M5G 1L5 Canada
| | - Ilya G. Ryabinkin
- OTI Lumionics Inc. 100 College St. #351 Toronto Ontario M5G 1L5 Canada
| | - Nathan R. Paisley
- Department of Chemistry The University of British Columbia 2036 Main Mall Vancouver British Columbia V6T 1Z1 Canada
| | - Sarah O. Whelan
- OTI Lumionics Inc. 100 College St. #351 Toronto Ontario M5G 1L5 Canada
| | | | - Zachary M. Hudson
- Department of Chemistry The University of British Columbia 2036 Main Mall Vancouver British Columbia V6T 1Z1 Canada
| |
|
29
|
Abramov YA, Sun G, Zeng Q. Emerging Landscape of Computational Modeling in Pharmaceutical Development. J Chem Inf Model 2022; 62:1160-1171. [PMID: 35226809 DOI: 10.1021/acs.jcim.1c01580] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Indexed: 02/06/2023]
Abstract
Computational chemistry applications have become an integral part of the drug discovery workflow over the past 35 years. However, computational modeling in support of drug development has remained a relatively uncharted territory for a significant part of both academic and industrial communities. This review considers the computational modeling workflows for three key components of drug preclinical and clinical development, namely, process chemistry, analytical research and development, as well as drug product and formulation development. An overview of the computational support for each step of the respective workflows is presented. Additionally, in context of solid form design, special consideration is given to modern physics-based virtual screening methods. This covers rational approaches to polymorph, coformer, counterion, and solvent virtual screening in support of solid form selection and design.
Affiliation(s)
- Yuriy A Abramov
- XtalPi, Inc., 245 Main St., Cambridge, Massachusetts 02142, United States
- Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina 27599, United States
| | - Guangxu Sun
- XtalPi, Inc., Shenzhen Jingtai Technology Co., Ltd., Floor 3, Sf Industrial Plant, No. 2 Hongliu road, Fubao Community, Fubao Street, Futian District, Shenzhen 518100, China
| | - Qun Zeng
- XtalPi, Inc., Shenzhen Jingtai Technology Co., Ltd., Floor 3, Sf Industrial Plant, No. 2 Hongliu road, Fubao Community, Fubao Street, Futian District, Shenzhen 518100, China
| |
|
30
|
Duan C, Nandy A, Kulik HJ. Machine Learning for the Discovery, Design, and Engineering of Materials. Annu Rev Chem Biomol Eng 2022; 13:405-429. [PMID: 35320698 DOI: 10.1146/annurev-chembioeng-092320-120230] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
Machine learning (ML) has become a part of the fabric of high-throughput screening and computational discovery of materials. Despite its increasingly central role, challenges remain in fully realizing the promise of ML. This is especially true for the practical acceleration of the engineering of robust materials and the development of design strategies that surpass trial and error or high-throughput screening alone. Depending on the quantity being predicted and the experimental data available, ML can either outperform physics-based models, be used to accelerate such models, or be integrated with them to improve their performance. We cover recent advances in algorithms and in their application that are starting to make inroads toward (a) the discovery of new materials through large-scale enumerative screening, (b) the design of materials through identification of rules and principles that govern materials properties, and (c) the engineering of practical materials by satisfying multiple objectives. We conclude with opportunities for further advancement to realize ML as a widespread tool for practical computational materials design. Expected final online publication date for the Annual Review of Chemical and Biomolecular Engineering, Volume 13 is October 2022. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
Affiliation(s)
- Chenru Duan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
| | - Aditya Nandy
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
| | - Heather J Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
| |
|
31
|
Kalikadien AV, Pidko EA, Sinha V. ChemSpaX: exploration of chemical space by automated functionalization of molecular scaffold. DIGITAL DISCOVERY 2022; 1:8-25. [PMID: 35340336 PMCID: PMC8887922 DOI: 10.1039/d1dd00017a] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 10/02/2021] [Accepted: 12/23/2021] [Indexed: 12/19/2022]
Abstract
Exploration of the local chemical space of molecular scaffolds by post-functionalization (PF) is a promising route to discover novel molecules with desired structure and function. PF with rationally chosen substituents based on known electronic and steric properties is a commonly used experimental and computational strategy in screening, design and optimization of catalytic scaffolds. Automated generation of reasonably accurate geometric representations of post-functionalized molecular scaffolds is highly desirable for data-driven applications. However, automated PF of transition metal (TM) complexes remains challenging. In this work a Python-based workflow, ChemSpaX, that is aimed at automating the PF of a given molecular scaffold with special emphasis on TM complexes, is introduced. In three representative applications of ChemSpaX by comparing with DFT and DFT-B calculations, we show that the generated structures have a reasonable quality for use in computational screening applications. Furthermore, we show that ChemSpaX generated geometries can be used in machine learning applications to accurately predict DFT computed HOMO-LUMO gaps for transition metal complexes. ChemSpaX is open-source and aims to bolster and democratize the efforts of the scientific community towards data-driven chemical discovery.
Affiliation(s)
- Adarsh V Kalikadien
- Inorganic Systems Engineering, Department of Chemical Engineering, Faculty of Applied Sciences, Delft University of Technology Van der Maasweg 9 2629 HZ Delft The Netherlands
| | - Evgeny A Pidko
- Inorganic Systems Engineering, Department of Chemical Engineering, Faculty of Applied Sciences, Delft University of Technology Van der Maasweg 9 2629 HZ Delft The Netherlands
| | - Vivek Sinha
- Inorganic Systems Engineering, Department of Chemical Engineering, Faculty of Applied Sciences, Delft University of Technology Van der Maasweg 9 2629 HZ Delft The Netherlands
| |
|
32
|
Ren S, Fonseca E, Perry W, Cheng HP, Zhang XG, Hennig RG. Ligand Optimization of Exchange Interaction in Co(II) Dimer Single Molecule Magnet by Machine Learning. J Phys Chem A 2022; 126:529-535. [PMID: 35068152 DOI: 10.1021/acs.jpca.1c08950] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/15/2022]
Abstract
Designing single-molecule magnets (SMMs) for potential applications in quantum computing and high-density data storage requires tuning their magnetic properties, especially the strength of the magnetic interaction. These properties can be characterized by first-principles calculations based on density functional theory (DFT). In this work, we study the experimentally synthesized Co(II) dimer (Co2(C5NH5)4(μ-PO2(CH2C6H5)2)3) SMM with the goal to control the exchange energy, ΔEJ, between the Co atoms through tuning of the capping ligands. The experimentally synthesized Co(II) dimer molecule has a very small ΔEJ < 1 meV. We assemble a DFT data set of 1081 ligand substitutions for the Co(II) dimer. The ligand exchange provides a broad range of exchange energies, ΔEJ, from +50 to -200 meV, with 80% of the ligands yielding a small ΔEJ < 10 meV. We identify descriptors for the classification and regression of ΔEJ using gradient boosting machine learning models. We compare one-hot encoded, structure-based, and chemical descriptors consisting of the HOMO/LUMO energies of the individual ligands and the maximum electronegativity difference and bond order for the ligand atom connecting to Co. We observe a similar overall performance with the chemical descriptors outperforming the other descriptors. We show that the exchange coupling, ΔEJ, is correlated to the difference in the average bridging angle between the ferromagnetic and antiferromagnetic states, similar to the Goodenough-Kanamori rules.
Affiliation(s)
- Sijin Ren
- Department of Physics, University of Florida, Gainesville, Florida 32611, United States
- Department of Materials Science and Engineering, University of Florida, Gainesville, Florida 32611, United States
- Quantum Theory Project, University of Florida, Gainesville, Florida 32611, United States
| | - Eric Fonseca
- Department of Materials Science and Engineering, University of Florida, Gainesville, Florida 32611, United States
- Quantum Theory Project, University of Florida, Gainesville, Florida 32611, United States
| | - William Perry
- Department of Physics, University of Florida, Gainesville, Florida 32611, United States
- Quantum Theory Project, University of Florida, Gainesville, Florida 32611, United States
| | - Hai-Ping Cheng
- Department of Physics, University of Florida, Gainesville, Florida 32611, United States
- Quantum Theory Project, University of Florida, Gainesville, Florida 32611, United States
| | - Xiao-Guang Zhang
- Department of Physics, University of Florida, Gainesville, Florida 32611, United States
- Quantum Theory Project, University of Florida, Gainesville, Florida 32611, United States
| | - Richard G Hennig
- Department of Materials Science and Engineering, University of Florida, Gainesville, Florida 32611, United States
- Quantum Theory Project, University of Florida, Gainesville, Florida 32611, United States
| |
|
33
|
Harper DR, Nandy A, Arunachalam N, Duan C, Janet JP, Kulik HJ. Representations and strategies for transferable machine learning improve model performance in chemical discovery. J Chem Phys 2022; 156:074101. [DOI: 10.1063/5.0082964] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 11/14/2022]
Affiliation(s)
- Daniel R Harper
- Massachusetts Institute of Technology, United States of America
| | - Aditya Nandy
- Massachusetts Institute of Technology, United States of America
| | | | - Chenru Duan
- Massachusetts Institute of Technology, United States of America
| | | | - Heather J. Kulik
- Dept of Chemical Engineering, Massachusetts Institute of Technology, United States of America
| |
|
34
|
Harper DR, Kulik HJ. Computational Scaling Relationships Predict Experimental Activity and Rate-Limiting Behavior in Homogeneous Water Oxidation. Inorg Chem 2022; 61:2186-2197. [PMID: 35037756 DOI: 10.1021/acs.inorgchem.1c03376] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/28/2022]
Abstract
While computational screening with first-principles density functional theory (DFT) is essential for evaluating candidate catalysts, limitations in accuracy typically prevent the prediction of experimentally relevant activities. Exemplary of these challenges are homogeneous water oxidation catalysts (WOCs) where differences in experimental conditions or small changes in ligand structure can alter rate constants by over an order of magnitude. Here, we compute mechanistically relevant electronic and energetic properties for 19 mononuclear Ru transition-metal complexes (TMCs) from three experimental water oxidation catalysis studies. We discover that 15 of these TMCs have experimental activities that correlate with a single property, the ionization potential of the Ru(II)-O2 catalytic intermediate. This scaling parameter allows the quantitative understanding of activity trends and provides insight into the rate-limiting behavior. We use this approach to rationalize differences in activity with different experimental conditions, and we qualitatively analyze the source of distinct behavior for different electronic states in the other four catalysts. Comparison to closely related single-atom catalysts and modified WOCs enables rationalization of the source of rate enhancement in these WOCs.
Affiliation(s)
- Daniel R Harper
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Heather J Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| |
|
35
|
Majumdar S, Moosavi SM, Jablonka KM, Ongari D, Smit B. Diversifying Databases of Metal Organic Frameworks for High-Throughput Computational Screening. ACS APPLIED MATERIALS & INTERFACES 2021; 13:61004-61014. [PMID: 34910455 PMCID: PMC8719320 DOI: 10.1021/acsami.1c16220] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 08/24/2021] [Accepted: 12/03/2021] [Indexed: 05/19/2023]
Abstract
By combining metal nodes and organic linkers, an infinite number of metal organic frameworks (MOFs) can be designed in silico. Therefore, when making new databases of such hypothetical MOFs, we need to ensure that they not only contribute toward the growth of the count of structures but also add different chemistries to the existing databases. In this study, we designed a database of ∼20,000 hypothetical MOFs, which are diverse in terms of their chemical design space: metal nodes, organic linkers, functional groups, and pore geometries. Using machine learning techniques, we visualized and quantified the diversity of these structures. We find that on adding the structures of our database, the overall diversity metrics of hypothetical databases improve, especially in terms of the chemistry of metal nodes. We then assessed the usefulness of diverse structures by evaluating their performance, using grand-canonical Monte Carlo simulations, in two important environmental applications: post-combustion carbon capture and hydrogen storage. We find that many of these structures perform better than widely used benchmark materials such as Zeolite-13X (for post-combustion carbon capture) and MOF-5 (for hydrogen storage). All the structures developed in this study, and their properties, are provided on the Materials Cloud to encourage further use of these materials for other applications.
|
36
|
Liu M, Nazemi A, Taylor MG, Nandy A, Duan C, Steeves AH, Kulik HJ. Large-Scale Screening Reveals That Geometric Structure Matters More Than Electronic Structure in the Bioinspired Catalyst Design of Formate Dehydrogenase Mimics. ACS Catal 2021. [DOI: 10.1021/acscatal.1c04624] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 01/16/2023]
Affiliation(s)
- Mingjie Liu
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Azadeh Nazemi
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Michael G. Taylor
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Aditya Nandy
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Chenru Duan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Adam H. Steeves
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Heather J. Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| |
|
37
|
Nandy A, Duan C, Kulik HJ. Using Machine Learning and Data Mining to Leverage Community Knowledge for the Engineering of Stable Metal-Organic Frameworks. J Am Chem Soc 2021; 143:17535-17547. [PMID: 34643374 DOI: 10.1021/jacs.1c07217] [Citation(s) in RCA: 41] [Impact Index Per Article: 13.7] [Indexed: 01/05/2023]
Abstract
Although the tailored metal active sites and porous architectures of MOFs hold great promise for engineering challenges ranging from gas separations to catalysis, a lack of understanding of how to improve their stability limits their use in practice. To overcome this limitation, we extract thousands of published reports of the key aspects of MOF stability necessary for their practical application: the ability to withstand high temperatures without degrading and the capacity to be activated by removal of solvent molecules. From nearly 4000 manuscripts, we use natural language processing and image analysis to obtain over 2000 solvent-removal stability measures and 3000 thermal degradation temperatures. We analyze the relationships between stability properties and the chemical and geometric structures in this set to identify limits of prior heuristics derived from smaller sets of MOFs. By training predictive machine learning (ML, i.e., Gaussian process and artificial neural network) models to encode the structure-property relationships with graph- and pore-structure-based representations, we are able to make predictions of stability orders of magnitude faster than conventional physics-based modeling or experiment. Interpretation of important features in ML models provides insights that we use to identify strategies to engineer increased stability into typically unstable 3d-transition-metal-containing MOFs that are frequently targeted for catalytic applications. We expect our approach to accelerate the time to discovery of stable, practical MOF materials for a wide range of applications.
Affiliation(s)
- Aditya Nandy
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Chenru Duan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Heather J Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| |
|
38
|
Craig MJ, García-Melchor M. Applying Active Learning to the Screening of Molecular Oxygen Evolution Catalysts. Molecules 2021; 26:6362. [PMID: 34770771 PMCID: PMC8588390 DOI: 10.3390/molecules26216362] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/02/2021] [Revised: 10/01/2021] [Accepted: 10/19/2021] [Indexed: 11/16/2022]
Abstract
The oxygen evolution reaction (OER) can enable green hydrogen production; however, the state-of-the-art catalysts for this reaction are composed of prohibitively expensive materials. In addition, cheap catalysts have associated overpotentials that render the reaction inefficient. This impels the search to discover novel catalysts for this reaction computationally. In this communication, we present machine learning algorithms to enhance the hypothetical screening of molecular OER catalysts. By predicting calculated binding energies using Gaussian process regression (GPR) models and applying active learning schemes, we provide evidence that our algorithm can improve computational efficiency by guiding simulations towards candidates with promising OER descriptor values. Furthermore, we derive an acquisition function that, when maximized, can identify catalysts that can exhibit theoretical overpotentials that circumvent the constraints imposed by linear scaling relations by attempting to enforce a specific mechanism. Finally, we provide a brief perspective on the appropriate sets of molecules to consider when screening complexes that could be stable and active for this reaction.
|
39
|
Raman G. Study of the Relationship between Synthesis Descriptors and the Type of Zeolite Phase Formed in ZSM‐43 Synthesis by Using Machine Learning. ChemistrySelect 2021. [DOI: 10.1002/slct.202102890] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/07/2022]
Affiliation(s)
- Ganesan Raman
- Reliance Research & Development Center Reliance Corporate Park, Reliance Industries Limited Thane-Belapur Road, Ghansoli Navi Mumbai India 400701
| |
|
40
|
Taylor MG, Nandy A, Lu CC, Kulik HJ. Deciphering Cryptic Behavior in Bimetallic Transition-Metal Complexes with Machine Learning. J Phys Chem Lett 2021; 12:9812-9820. [PMID: 34597514 DOI: 10.1021/acs.jpclett.1c02852] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/13/2023]
Abstract
We demonstrate an alternative, data-driven approach to uncovering structure-property relationships for the rational design of heterobimetallic transition-metal complexes that exhibit metal-metal bonding. We tailor graph-based representations of the metal-local environment for these complexes for use in multiple linear regression and kernel ridge regression (KRR) models. We curate a set of 28 experimentally characterized complexes to develop a multiple linear regression model for oxidation potentials. We achieve good accuracy (mean absolute error of 0.25 V) and preserve transferability to unseen experimental data with a new ligand structure. We also train a KRR model on a subset of 330 structurally characterized heterobimetallics to predict the degree of metal-metal bonding. This KRR model predicts relative metal-metal bond lengths in the test set to within 5%, and analysis of key features reveals the fundamental atomic contributions (e.g., the valence electron configuration) that most strongly influence the behavior of these complexes. Our work provides guidance for rational bimetallic design, suggesting that properties, including the formal shortness ratio, should be transferable from one period to another.
Affiliation(s)
- Michael G Taylor
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Aditya Nandy
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Connie C Lu
- Department of Chemistry, University of Minnesota, Minneapolis, Minnesota 55455, United States
| | - Heather J Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| |
|
41
|
Duan C, Chen S, Taylor MG, Liu F, Kulik HJ. Machine learning to tame divergent density functional approximations: a new path to consensus materials design principles. Chem Sci 2021; 12:13021-13036. [PMID: 34745533 PMCID: PMC8513898 DOI: 10.1039/d1sc03701c] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 07/07/2021] [Accepted: 09/01/2021] [Indexed: 01/17/2023]
Abstract
Virtual high-throughput screening (VHTS) with density functional theory (DFT) and machine-learning (ML)-acceleration is essential in rapid materials discovery. By necessity, efficient DFT-based workflows are carried out with a single density functional approximation (DFA). Nevertheless, properties evaluated with different DFAs can be expected to disagree for cases with challenging electronic structure (e.g., open-shell transition-metal complexes, TMCs) for which rapid screening is most needed and accurate benchmarks are often unavailable. To quantify the effect of DFA bias, we introduce an approach to rapidly obtain property predictions from 23 representative DFAs spanning multiple families, “rungs” (e.g., semi-local to double hybrid) and basis sets on over 2000 TMCs. Although computed property values (e.g., spin state splitting and frontier orbital gap) differ by DFA, high linear correlations persist across all DFAs. We train independent ML models for each DFA and observe convergent trends in feature importance, providing DFA-invariant, universal design rules. We devise a strategy to train artificial neural network (ANN) models informed by all 23 DFAs and use them to predict properties (e.g., spin-splitting energy) of over 187k TMCs. By requiring consensus of the ANN-predicted DFA properties, we improve correspondence of computational lead compounds with literature-mined, experimental compounds over the typically employed single-DFA approach. Machine learning (ML)-based feature analysis reveals universal design rules regardless of density functional choices. Using the consensus among multiple functionals, we identify robust lead complexes in ML-accelerated chemical discovery.
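The idea of requiring consensus across functionals can be sketched as a simple agreement-fraction filter. This is only an illustration under stated assumptions: the abstract does not give the actual consensus criterion, so the target window, the `min_agreement` threshold, the function name `consensus_leads`, and all complex and functional labels below are hypothetical.

```python
def consensus_leads(predictions, low, high, min_agreement=0.8):
    """Keep complexes whose predicted property falls inside [low, high]
    for at least `min_agreement` of the functionals (assumed criterion)."""
    leads = []
    for complex_id, per_dfa in predictions.items():
        hits = sum(low <= value <= high for value in per_dfa.values())
        if hits / len(per_dfa) >= min_agreement:
            leads.append(complex_id)
    return leads


# Made-up per-functional predictions, for illustration only.
predictions = {
    "complex_A": {"dfa_1": 1.0, "dfa_2": -2.0, "dfa_3": 3.0},
    "complex_B": {"dfa_1": 12.0, "dfa_2": -1.0, "dfa_3": 20.0},
    "complex_C": {"dfa_1": -4.0, "dfa_2": 4.5, "dfa_3": 0.5},
}
print(consensus_leads(predictions, low=-5.0, high=5.0, min_agreement=1.0))
# prints ['complex_A', 'complex_C']
```

A single-functional screen would correspond to passing one entry per complex; the filter then reduces to an ordinary window test.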
Affiliation(s)
- Chenru Duan
- Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge MA 02139 USA +1-617-253-4584
- Department of Chemistry, Massachusetts Institute of Technology Cambridge MA 02139 USA
| | - Shuxin Chen
- Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge MA 02139 USA +1-617-253-4584
- Department of Chemistry, Massachusetts Institute of Technology Cambridge MA 02139 USA
| | - Michael G Taylor
- Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge MA 02139 USA +1-617-253-4584
| | - Fang Liu
- Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge MA 02139 USA +1-617-253-4584
| | - Heather J Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge MA 02139 USA +1-617-253-4584
| |
|
42
|
Smith BA, Vogiatzis KD. σ-Donation and π-Backdonation Effects in Dative Bonds of Main-Group Elements. J Phys Chem A 2021; 125:7956-7966. [PMID: 34477393 DOI: 10.1021/acs.jpca.1c05956] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/29/2022]
Abstract
The nature of donor-acceptor interactions is important for the understanding of dative bonding and can provide vital insights into many chemical processes. Here, we have performed a computational study to elucidate substantial differences between different types of dative interactions. For this purpose, a data set of 20 molecular complexes stabilized by dative bonds was developed (DAT20). A benchmark study that considers many popular density functionals with respect to accurate quantum chemical interaction energies and geometries revealed two different trends between the complexes of DAT20. This behavior was further explored by means of frontier molecular orbitals, extended-transition-state natural orbitals for chemical valence (ETS-NOCV), and natural energy decomposition analysis (NEDA). These methods revealed the extent of the forward and backdonation between the donor and acceptor molecules and how they influence the total interaction energies and molecular geometries. A new classification of dative bonds is suggested.
Affiliation(s)
- Brett A Smith
- Department of Chemistry, University of Tennessee, Knoxville, Tennessee 37996, United States
| | | |
|
43
|
Sun W, Zheng Y, Zhang Q, Yang K, Chen H, Cho Y, Fu J, Odunmbaku O, Shah AA, Xiao Z, Lu S, Chen S, Li M, Qin B, Yang C, Frauenheim T, Sun K. Artificial Intelligence Designer for Highly-Efficient Organic Photovoltaic Materials. J Phys Chem Lett 2021; 12:8847-8854. [PMID: 34494851 DOI: 10.1021/acs.jpclett.1c02554] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/13/2023]
Abstract
Designing efficient organic photovoltaic (OPV) materials purposefully is still challenging and time-consuming. It is of paramount importance in material development to identify basic functional units that play the key roles in material performance and subsequently establish the substructure-property relationship. Herein, we describe an automatic design framework based on an in-house designed La FREMD Fingerprint and machine learning (ML) algorithms for highly efficient OPV donor molecules. The key building blocks are identified, and a library consisting of 18 960 new molecules is generated within this framework. By investigating the chemical structures of materials with different performance, guidance on designing efficient OPV materials is proposed. Furthermore, the most promising candidates exhibit a predicted power conversion efficiency (PCE) value of over 15% when combined with acceptor Y6. Density functional theory (DFT) studies show these candidate materials possess exceptional potential for efficient charge carrier transport. The proposed framework demonstrates the ability to design new materials based on the substructure-property relationship built by ML, which provides an alternative methodology for applying ML in new material discovery.
Affiliation(s)
- Wenbo Sun
- MOE Key Laboratory of Low-grade Energy Utilization Technologies and Systems, School of Energy and Power Engineering, Chongqing University, 174 Shazhengjie, Shapingba, Chongqing 400044, China
- Bremen Center for Computational Materials Science, University of Bremen, Am Fallturm 1, Bremen 28359, Germany
| | - Yujie Zheng
- MOE Key Laboratory of Low-grade Energy Utilization Technologies and Systems, School of Energy and Power Engineering, Chongqing University, 174 Shazhengjie, Shapingba, Chongqing 400044, China
| | - Qi Zhang
- MOE Key Laboratory of Low-grade Energy Utilization Technologies and Systems, School of Energy and Power Engineering, Chongqing University, 174 Shazhengjie, Shapingba, Chongqing 400044, China
| | - Ke Yang
- MOE Key Laboratory of Low-grade Energy Utilization Technologies and Systems, School of Energy and Power Engineering, Chongqing University, 174 Shazhengjie, Shapingba, Chongqing 400044, China
- Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, 266 Fang Zheng Road, Beibei, Chongqing 400714, China
| | - Haiyan Chen
- Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, 266 Fang Zheng Road, Beibei, Chongqing 400714, China
| | - Yongjoon Cho
- Department of Energy Engineering, School of Energy and Chemical Engineering, Perovtronics Research Center, Low Dimensional Carbon Materials Center, Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Republic of Korea
| | - Jiehao Fu
- Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, 266 Fang Zheng Road, Beibei, Chongqing 400714, China
| | - Omololu Odunmbaku
- MOE Key Laboratory of Low-grade Energy Utilization Technologies and Systems, School of Energy and Power Engineering, Chongqing University, 174 Shazhengjie, Shapingba, Chongqing 400044, China
| | - Akeel A Shah
- MOE Key Laboratory of Low-grade Energy Utilization Technologies and Systems, School of Energy and Power Engineering, Chongqing University, 174 Shazhengjie, Shapingba, Chongqing 400044, China
| | - Zeyun Xiao
- Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, 266 Fang Zheng Road, Beibei, Chongqing 400714, China
| | - Shirong Lu
- Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, 266 Fang Zheng Road, Beibei, Chongqing 400714, China
| | - Shanshan Chen
- MOE Key Laboratory of Low-grade Energy Utilization Technologies and Systems, School of Energy and Power Engineering, Chongqing University, 174 Shazhengjie, Shapingba, Chongqing 400044, China
| | - Meng Li
- MOE Key Laboratory of Low-grade Energy Utilization Technologies and Systems, School of Energy and Power Engineering, Chongqing University, 174 Shazhengjie, Shapingba, Chongqing 400044, China
| | - Bo Qin
- College of Chemistry and Chemical Engineering, Chongqing University, Chongqing 400044, China
| | - Changduk Yang
- Department of Energy Engineering, School of Energy and Chemical Engineering, Perovtronics Research Center, Low Dimensional Carbon Materials Center, Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Republic of Korea
| | - Thomas Frauenheim
- Bremen Center for Computational Materials Science, University of Bremen, Am Fallturm 1, Bremen 28359, Germany
- Computational Science Research Center (CSRC) Beijing and Computational Science Applied Research (CSAR) Institute Shenzhen, Shenzhen 518110, China
| | - Kuan Sun
- MOE Key Laboratory of Low-grade Energy Utilization Technologies and Systems, School of Energy and Power Engineering, Chongqing University, 174 Shazhengjie, Shapingba, Chongqing 400044, China
| |
|
44
|
Automated Construction and Optimization Combined with Machine Learning to Generate Pt(II) Methane C–H Activation Transition States. Top Catal 2021. [DOI: 10.1007/s11244-021-01506-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/26/2022]
|
45
|
Nandy A, Duan C, Taylor MG, Liu F, Steeves AH, Kulik HJ. Computational Discovery of Transition-metal Complexes: From High-throughput Screening to Machine Learning. Chem Rev 2021; 121:9927-10000. [PMID: 34260198 DOI: 10.1021/acs.chemrev.1c00347] [Citation(s) in RCA: 70] [Impact Index Per Article: 23.3] [Indexed: 12/13/2022]
Abstract
Transition-metal complexes are attractive targets for the design of catalysts and functional materials. The behavior of the metal-organic bond, while very tunable for achieving target properties, is challenging to predict and necessitates searching a wide and complex space to identify needles in haystacks for target applications. This review will focus on the techniques that make high-throughput search of transition-metal chemical space feasible for the discovery of complexes with desirable properties. The review will cover the development, promise, and limitations of "traditional" computational chemistry (i.e., force field, semiempirical, and density functional theory methods) as it pertains to data generation for inorganic molecular discovery. The review will also discuss the opportunities and limitations in leveraging experimental data sources. We will focus on how advances in statistical modeling, artificial intelligence, multiobjective optimization, and automation accelerate discovery of lead compounds and design rules. The overall objective of this review is to showcase how bringing together advances from diverse areas of computational chemistry and computer science has enabled the rapid uncovering of structure-property relationships in transition-metal chemistry. We aim to highlight how unique considerations in motifs of metal-organic bonding (e.g., variable spin and oxidation state, and bonding strength/nature) set them and their discovery apart from more commonly considered organic molecules. We will also highlight how uncertainty and relative data scarcity in transition-metal chemistry motivate specific developments in machine learning representations, model training, and in computational chemistry. Finally, we will conclude with an outlook on areas of opportunity for the accelerated discovery of transition-metal complexes.
Affiliation(s)
- Aditya Nandy
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Chenru Duan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Michael G Taylor
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Fang Liu
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Adam H Steeves
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Heather J Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| |
|
46
|
Abstract
Computational methods have emerged as a powerful tool to augment traditional experimental molecular catalyst design by providing useful predictions of catalyst performance and decreasing the time needed for catalyst screening. In this perspective, we discuss three approaches for computational molecular catalyst design: (i) the reaction mechanism-based approach that calculates all relevant elementary steps, finds the rate and selectivity determining steps, and ultimately makes predictions on catalyst performance based on kinetic analysis, (ii) the descriptor-based approach where physical/chemical considerations are used to find molecular properties as predictors of catalyst performance, and (iii) the data-driven approach where statistical analysis as well as machine learning (ML) methods are used to obtain relationships between available data/features and catalyst performance. Following an introduction to these approaches, we cover their strengths and weaknesses and highlight some recent key applications. Furthermore, we present an outlook on how the currently applied approaches may evolve in the near future by addressing how recent developments in building automated computational workflows and implementing advanced ML models hold promise for reducing human workload, eliminating human bias, and speeding up computational catalyst design at the same time. Finally, we provide our viewpoint on how some of the challenges associated with the up-and-coming approaches driven by automation and ML may be resolved.
Affiliation(s)
- Ademola Soyemi
- Department of Chemical and Biological Engineering, The University of Alabama, Tuscaloosa, AL 35487, USA.
| | - Tibor Szilvási
- Department of Chemical and Biological Engineering, The University of Alabama, Tuscaloosa, AL 35487, USA.
| |
|
47
|
Tynes M, Gao W, Burrill DJ, Batista ER, Perez D, Yang P, Lubbers N. Pairwise Difference Regression: A Machine Learning Meta-algorithm for Improved Prediction and Uncertainty Quantification in Chemical Search. J Chem Inf Model 2021; 61:3846-3857. [PMID: 34347460 DOI: 10.1021/acs.jcim.1c00670] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 11/28/2022]
Abstract
Machine learning (ML) plays a growing role in the design and discovery of chemicals, aiming to reduce the need to perform expensive experiments and simulations. ML for such applications is promising but difficult, as models must generalize to vast chemical spaces from small training sets and must have reliable uncertainty quantification metrics to identify and prioritize unexplored regions. Ab initio computational chemistry and chemical intuition alike often take advantage of differences between chemical conditions, rather than their absolute structure or state, to generate more reliable results. We have developed an analogous comparison-based approach for ML regression, called pairwise difference regression (PADRE), which is applicable to arbitrary underlying learning models and operates on pairs of input data points. During training, the model learns to predict differences between all possible pairs of input points. During prediction, the test points are paired with all training set points, giving rise to a set of predictions that can be treated as a distribution of which the mean is treated as a final prediction and the dispersion is treated as an uncertainty measure. Pairwise difference regression was shown to reliably improve the performance of the random forest algorithm across five chemical ML tasks. Additionally, the pair-derived dispersion is both well correlated with model error and performs well in active learning. We also show that this method is competitive with state-of-the-art neural network techniques. Thus, pairwise difference regression is a promising tool for candidate selection algorithms used in chemical discovery.
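The training and prediction procedure described in this abstract can be sketched in a few lines of pure Python. This is a minimal illustration, not the authors' implementation: the pair featurization (both points plus their difference), the toy 1-nearest-neighbour base learner, the use of the standard deviation as the dispersion measure, and all class and function names are assumptions made for the example.

```python
from itertools import product
from statistics import mean, stdev


def pair_features(a, b):
    """Concatenate both points and their element-wise difference (assumed)."""
    return tuple(a) + tuple(b) + tuple(ai - bi for ai, bi in zip(a, b))


class NearestNeighbour:
    """Toy 1-nearest-neighbour regressor standing in for any base learner."""

    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict_one(self, x):
        def sq_dist(row):
            return sum((r - v) ** 2 for r, v in zip(row, x))
        best = min(range(len(self.X)), key=lambda i: sq_dist(self.X[i]))
        return self.y[best]


class PairwiseDifferenceRegressor:
    """Learn y_i - y_j on all training pairs; predict via all training anchors."""

    def __init__(self, base):
        self.base = base

    def fit(self, X, y):
        self.X, self.y = [tuple(x) for x in X], list(y)
        idx = range(len(self.X))
        pairs = list(product(idx, idx))
        feats = [pair_features(self.X[i], self.X[j]) for i, j in pairs]
        diffs = [self.y[i] - self.y[j] for i, j in pairs]
        self.base.fit(feats, diffs)
        return self

    def predict(self, x):
        # One estimate per training point: its label plus the predicted difference.
        estimates = [yj + self.base.predict_one(pair_features(x, xj))
                     for xj, yj in zip(self.X, self.y)]
        spread = stdev(estimates) if len(estimates) > 1 else 0.0
        return mean(estimates), spread


# Made-up one-dimensional data, for illustration only.
X = [(0.0,), (1.0,), (2.0,), (3.0,)]
y = [0.0, 2.0, 4.0, 6.0]
model = PairwiseDifferenceRegressor(NearestNeighbour()).fit(X, y)
print(model.predict((1.0,)))
```

Because training uses all ordered pairs, the number of training rows grows quadratically with the data set size, which is worth keeping in mind when swapping in a heavier base learner.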
Affiliation(s)
- Michael Tynes
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
| | - Wenhao Gao
- Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Daniel J Burrill
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
| | - Enrique R Batista
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
| | - Danny Perez
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
| | - Ping Yang
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
| | - Nicholas Lubbers
- Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
| |
|
48
|
Kerner J, Dogan A, von Recum H. Machine learning and big data provide crucial insight for future biomaterials discovery and research. Acta Biomater 2021; 130:54-65. [PMID: 34087445 DOI: 10.1016/j.actbio.2021.05.053] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 11/30/2020] [Revised: 05/24/2021] [Accepted: 05/25/2021] [Indexed: 02/06/2023]
Abstract
Machine learning methods have been widely adopted in a variety of fields including engineering, science, and medicine, revolutionizing how data is collected, used, and stored. Their implementation has led to a drastic increase in the number of computational models for the prediction of various numerical, categorical, or association events given input variables. We aim to examine recent advances in the use of machine learning when applied to the biomaterial field. Specifically, quantitative structure-property relationships offer the unique ability to correlate microscale molecular descriptors to larger macroscale material properties. These new models can be broken down further into four categories: regression, classification, association, and clustering. We examine recent approaches and new uses of machine learning in the three major categories of biomaterials: metals, polymers, and ceramics for rapid property prediction and trend identification. While current research is promising, limitations in the form of a lack of standardized reporting and available databases complicate the implementation of described models. Herein, we hope to provide a snapshot of the current state of the field and a beginner's guide to navigating the intersection of biomaterials research and machine learning. STATEMENT OF SIGNIFICANCE: Machine learning and its methods have found a variety of uses beyond the field of computer science but have largely been neglected by those in the realm of biomaterials. Through the use of more computational methods, biomaterials development can be expedited while reducing the need for standard trial and error methods. Within, we introduce four basic models that readers can potentially apply to their current research as well as current applications within the field. Furthermore, we hope that this article may act as a "call to action" for readers to realize and address the current lack of implementation within the biomaterials field.
Affiliation(s)
- Jacob Kerner
- Case Western Reserve University; 10900 Euclid Ave., Cleveland Ohio 44106.
| | - Alan Dogan
- Case Western Reserve University; 10900 Euclid Ave., Cleveland Ohio 44106.
| | - Horst von Recum
- Case Western Reserve University; 10900 Euclid Ave., Cleveland Ohio 44106.
| |
|
49
|
Vennelakanti V, Nandy A, Kulik HJ. The Effect of Hartree-Fock Exchange on Scaling Relations and Reaction Energetics for C–H Activation Catalysts. Top Catal 2021. [DOI: 10.1007/s11244-021-01482-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 12/20/2022]
|
50
|
Turcani L, Tarzia A, Szczypiński FT, Jelfs KE. stk: An extendable Python framework for automated molecular and supramolecular structure assembly and discovery. J Chem Phys 2021; 154:214102. [PMID: 34240979 DOI: 10.1063/5.0049708] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Indexed: 12/17/2022] Open
Abstract
Computational software workflows are emerging as all-in-one solutions to speed up the discovery of new materials. Many computational approaches require the generation of realistic structural models for property prediction and candidate screening. However, molecular and supramolecular materials represent classes of materials with many potential applications for which there is no go-to database of existing structures or general protocol for generating structures. Here, we report a new version of the supramolecular toolkit, stk, an open-source, extendable, and modular Python framework for general structure generation of (supra)molecular structures. Our construction approach works on arbitrary building blocks and topologies and minimizes the input required from the user, making stk user-friendly and applicable to many material classes. This version of stk includes metal-containing structures and rotaxanes as well as general implementation and interface improvements. Additionally, this version includes built-in tools for exploring chemical space with an evolutionary algorithm and tools for database generation and visualization. The latest version of stk is freely available at github.com/lukasturcani/stk.
Affiliation(s)
- Lukas Turcani
- Department of Chemistry, Molecular Sciences Research Hub, Imperial College London, White City Campus, Wood Lane, London W12 0BZ, United Kingdom
| | - Andrew Tarzia
- Department of Chemistry, Molecular Sciences Research Hub, Imperial College London, White City Campus, Wood Lane, London W12 0BZ, United Kingdom
| | - Filip T Szczypiński
- Department of Chemistry, Molecular Sciences Research Hub, Imperial College London, White City Campus, Wood Lane, London W12 0BZ, United Kingdom
| | - Kim E Jelfs
- Department of Chemistry, Molecular Sciences Research Hub, Imperial College London, White City Campus, Wood Lane, London W12 0BZ, United Kingdom
| |
|