1
Nayak SK, Yamijala SSRKC. Computing accurate bond dissociation energies of emerging per- and polyfluoroalkyl substances: Achieving chemical accuracy using connectivity-based hierarchy schemes. Journal of Hazardous Materials 2024; 468:133804. [PMID: 38377911 DOI: 10.1016/j.jhazmat.2024.133804] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Revised: 02/06/2024] [Accepted: 02/14/2024] [Indexed: 02/22/2024]
Abstract
Understanding the bond dissociation energies (BDEs) of per- and polyfluoroalkyl substances (PFAS) helps in devising their efficient degradation pathways. However, there is only limited experimental data on the PFAS BDEs, and there are uncertainties associated with the BDEs computed using density functional theory. Although quantum chemical methods like the G4 composite method can provide highly accurate BDEs (< 1 kcal mol⁻¹), they are limited to small system sizes. To address DFT's accuracy limitations and G4's system size constraints, we examined the connectivity-based hierarchy (CBH) scheme and found that it can provide BDEs that are reasonably close to the G4 accuracy while retaining the computational efficiency of DFT. To further improve the accuracy, we modified the CBH scheme and demonstrated that BDEs calculated using it have a mean-absolute deviation of 0.7 kcal mol⁻¹ from G4 BDEs. To validate the reliability of this new scheme, we computed the ground state free energies of seven PFAS compounds and BDEs for 44 C-C and C-F bonds at the G4 level of theory. Our results suggest that the modified CBH scheme can accurately compute the BDEs of both small and large PFAS at near G4 level accuracy, offering promise for more effective PFAS degradation strategies.
Affiliation(s)
- Samir Kumar Nayak
- Department of Chemistry, Indian Institute of Technology Madras, Chennai 600036, India; Centre for Atomistic Modelling and Materials Design, Indian Institute of Technology Madras, Chennai 600036, India
- Sharma S R K C Yamijala
- Department of Chemistry, Indian Institute of Technology Madras, Chennai 600036, India; Centre for Atomistic Modelling and Materials Design, Indian Institute of Technology Madras, Chennai 600036, India; Centre for Molecular Materials and Functions, Indian Institute of Technology Madras, Chennai 600036, India; Centre for Quantum Information, Communication, and Computing, Indian Institute of Technology Madras, Chennai 600036, India.
2
Sanchez AJ, Maier S, Raghavachari K. Leveraging DFT and Molecular Fragmentation for Chemically Accurate pKa Prediction Using Machine Learning. J Chem Inf Model 2024; 64:712-723. [PMID: 38301279 DOI: 10.1021/acs.jcim.3c01923] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/03/2024]
Abstract
We present a quantum mechanical/machine learning (ML) framework based on random forest to accurately predict the pKas of complex organic molecules using inexpensive density functional theory (DFT) calculations. By including physics-based features from low-level DFT calculations and structural features from our connectivity-based hierarchy (CBH) fragmentation protocol, we can correct the systematic error associated with DFT. The generalizability and performance of our model are evaluated on two benchmark sets (SAMPL6 and Novartis). We believe the carefully curated input of physics-based features lessens the model's data dependence and need for complex deep learning architectures, without compromising the accuracy of the test sets. As a point of novelty, our work extends the applicability of CBH, employing it for the generation of viable molecular descriptors for ML.
Affiliation(s)
- Alec J Sanchez
- Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
- Sarah Maier
- Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
- Krishnan Raghavachari
- Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
3
Fedorov R, Gryn’ova G. Unlocking the Potential: Predicting Redox Behavior of Organic Molecules, from Linear Fits to Neural Networks. J Chem Theory Comput 2023; 19:4796-4814. [PMID: 37463673 PMCID: PMC10414033 DOI: 10.1021/acs.jctc.3c00355] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2023] [Indexed: 07/20/2023]
Abstract
Redox-active organic molecules, i.e., molecules that can relatively easily accept and/or donate electrons, are ubiquitous in biology, chemical synthesis, and electronic and spintronic devices, such as solar cells and rechargeable batteries, etc. Choosing the best candidates from an essentially infinite chemical space for experimental testing in a target application requires efficient screening approaches. In this Review, we discuss modern in silico techniques for predicting reduction and oxidation potentials of organic molecules that go beyond conventional first-principles computations and thermodynamic cycles. Approaches ranging from simple linear fits based on molecular orbital energy approximation and energy difference approximation to advanced regression and neural network machine learning algorithms employing complex descriptors of molecular compositions, geometries, and electronic structures are examined in conjunction with relevant literature examples. We discuss the interplay between ab initio data and machine learning (ML), i.e., whether it is better to base predictions on low-level quantum-chemical results corrected with ML or to bypass first-principles computations entirely and instead rely on elaborate deep learning architectures. Finally, we list currently available data sets of redox-active organic molecules and their experimental and/or computed properties to facilitate the development of screening platforms and rational design of redox-active organic molecules.
Affiliation(s)
- Rostislav Fedorov
- Heidelberg Institute for Theoretical Studies (HITS gGmbH), 69118 Heidelberg, Germany; Interdisciplinary Center for Scientific Computing, Heidelberg University, 69120 Heidelberg, Germany
- Ganna Gryn’ova
- Heidelberg Institute for Theoretical Studies (HITS gGmbH), 69118 Heidelberg, Germany; Interdisciplinary Center for Scientific Computing, Heidelberg University, 69120 Heidelberg, Germany
4
Raghavachari K, Maier S, Collins EM, Debnath S, Sengupta A. Approaching Coupled Cluster Accuracy with Density Functional Theory Using the Generalized Connectivity-Based Hierarchy. J Chem Theory Comput 2023. [PMID: 37338997 DOI: 10.1021/acs.jctc.3c00301] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 06/22/2023]
Abstract
This Perspective reviews connectivity-based hierarchy (CBH), a systematic hierarchy of error-cancellation schemes developed in our group with the goal of achieving chemical accuracy using inexpensive computational techniques ("coupled cluster accuracy with DFT"). The hierarchy is a generalization of Pople's isodesmic bond separation scheme that is based only on the structure and connectivity and is applicable to any organic and biomolecule consisting of covalent bonds. It is formulated as a series of rungs involving increasing levels of error cancellation on progressively larger fragments of the parent molecule. The method and our implementation are discussed briefly. Examples are given for the applications of CBH involving (1) energies of complex organic rearrangement reactions, (2) bond energies of biofuel molecules, (3) redox potentials in solution, (4) pKa predictions in the aqueous medium, and (5) theoretical thermochemistry combining CBH with machine learning. They clearly show that near-chemical accuracy (1-2 kcal/mol) is achieved for a variety of applications with DFT methods irrespective of the underlying density functional used. They demonstrate conclusively that seemingly disparate results, often seen with different density functionals in many chemical applications, are due to an accumulation of systematic errors in the smaller local molecular fragments that can be easily corrected with higher-level calculations on those small units. This enables the method to achieve the accuracy of the high level of theory (e.g., coupled cluster) while the cost remains that of DFT. The advantages and limitations of the method are discussed along with areas of ongoing developments.
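The error-cancellation idea in this abstract can be illustrated with a minimal sketch. It assumes the usual form of a fragment-based correction, in which the low-level energy of the parent molecule is shifted by the high-minus-low energy differences of the fragments, weighted by their stoichiometric coefficients in the hierarchy reaction. The function name `cbh_corrected_energy`, the dictionary layout, and all numerical values are hypothetical placeholders, not the authors' implementation or real computed energies. The example uses Pople's isodesmic bond separation reaction for propane (propane + methane → 2 ethane), which the abstract names as the scheme that CBH generalizes.

```python
from typing import Mapping


def cbh_corrected_energy(
    e_low_parent: float,
    fragment_coeffs: Mapping[str, int],
    e_low_frag: Mapping[str, float],
    e_high_frag: Mapping[str, float],
) -> float:
    """Estimate the high-level energy of a parent molecule.

    Assumes the low-level error in the parent equals the coefficient-weighted
    sum of (high - low) errors in the fragments, i.e. that the hierarchy
    reaction energy is the same at both levels of theory.
    """
    correction = sum(
        coeff * (e_high_frag[name] - e_low_frag[name])
        for name, coeff in fragment_coeffs.items()
    )
    return e_low_parent + correction


# Bond separation reaction: propane + methane -> 2 ethane.
# Products carry positive coefficients, the extra reactant a negative one.
coeffs = {"ethane": 2, "methane": -1}

# Placeholder energies in arbitrary units (NOT real computed values).
low = {"ethane": -60.0, "methane": -30.0}
high = {"ethane": -60.5, "methane": -30.25}
propane_low = -100.0

estimate = cbh_corrected_energy(propane_low, coeffs, low, high)
print(estimate)
```

Only the small fragments need the expensive high-level calculation; the parent molecule is evaluated at the low level only, which is why the cost stays close to that of DFT.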
Affiliation(s)
- Krishnan Raghavachari
- Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
- Sarah Maier
- Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
- Eric M Collins
- Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
- Sibali Debnath
- Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
- Arkajyoti Sengupta
- Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
5
Lee S, Lee G, Park S, Yim D, Yim T, Kim J, Kim H. Theoretical Protocol Based on Long-Range Corrected Density Functional Theory and Tuning of Range-Split Parameter for Two-Electron Two-Proton Reduction of Phenylazocarboxylates. J Phys Chem A 2022; 126:2430-2436. [PMID: 35412306 DOI: 10.1021/acs.jpca.1c10637] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Abstract
A theoretical protocol based on long-range corrected density functional theory is suggested for a highly accurate estimation of the two-electron two-proton (2e2p) reduction potential of ethyl 2-phenylazocarboxylate derivatives. Geometry optimization and single-point energy refinement with ωB97X-D are recommended. The impact of polarization and diffusion functions in the basis sets on the 2e2p reduction potential is discussed. Further improvements can be achieved by tuning the range-split parameter based on the linear relationship between the Hammett constant of phenyl substituents and the optimal ω value that most accurately reproduces the experiments. The suggested protocol can accurately predict the 2e2p reduction potential of five ethyl 2-phenylazocarboxylate derivatives. Based on these findings, 22 additional candidates are suggested to enlarge the electrochemical window and to increase the selectivity of 2e2p reactions. This study contributes to the development of a theoretical approach to accurately estimate the 2e2p reduction potential of azo groups.
Affiliation(s)
- Serin Lee
- Incheon National University and Research Institute of Basic Sciences, Incheon National University, Incheon 22012, South Korea
- Giseung Lee
- Incheon National University and Research Institute of Basic Sciences, Incheon National University, Incheon 22012, South Korea
- Sanggil Park
- Incheon National University and Research Institute of Basic Sciences, Incheon National University, Incheon 22012, South Korea
- Daniel Yim
- Incheon National University and Research Institute of Basic Sciences, Incheon National University, Incheon 22012, South Korea
- Taeeun Yim
- Incheon National University and Research Institute of Basic Sciences, Incheon National University, Incheon 22012, South Korea
- Jinho Kim
- Incheon National University and Research Institute of Basic Sciences, Incheon National University, Incheon 22012, South Korea
- Hyungjun Kim
- Incheon National University and Research Institute of Basic Sciences, Incheon National University, Incheon 22012, South Korea
6
Collins EM, Raghavachari K. A Fragmentation-Based Graph Embedding Framework for QM/ML. J Phys Chem A 2021; 125:6872-6880. [PMID: 34342449 DOI: 10.1021/acs.jpca.1c06152] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 01/20/2023]
Abstract
We introduce a new fragmentation-based molecular representation framework "FragGraph" for QM/ML methods involving embedding fragment-wise fingerprints onto molecular graphs. Our model is specifically designed for delta machine learning (Δ-ML) with the central goal of correcting the deficiencies of approximate methods such as DFT to achieve high accuracy. Our framework is based on a judicious combination of ideas from fragmentation, error cancellation, and a state-of-the-art deep learning architecture. Broadly, we develop a general graph-network framework for molecular machine learning by incorporating the inherent advantages prebuilt into error cancellation methods such as the generalized Connectivity-Based Hierarchy. More specifically, we develop a QM/ML representation through a fragmentation-based attributed graph representation encoded with fragment-wise molecular fingerprints. The utility of our representation is demonstrated through a graph network fingerprint encoder in which a global fingerprint is generated through message passing of local neighborhoods of fragment-wise fingerprints, effectively augmenting standard fingerprints to also include the inbuilt molecular graph structure. On the 130k-GDB9 dataset, our method predicts an out-of-sample mean absolute error significantly lower than 1 kJ/mol compared to target G4(MP2) calculated energies, rivaling current deep learning methods with reduced computational scaling.
Affiliation(s)
- Eric M Collins
- Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
- Krishnan Raghavachari
- Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
7
Collins EM, Raghavachari K. Effective Molecular Descriptors for Chemical Accuracy at DFT Cost: Fragmentation, Error-Cancellation, and Machine Learning. J Chem Theory Comput 2020; 16:4938-4950. [PMID: 32678593 DOI: 10.1021/acs.jctc.0c00236] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 02/08/2023]
Abstract
Recent advances in theoretical thermochemistry have allowed the study of small organic and bio-organic molecules with high accuracy. However, applications to larger molecules are still impeded by the steep scaling problem of highly accurate quantum mechanical (QM) methods, forcing the use of approximate, more cost-effective methods at a greatly reduced accuracy. One of the most successful strategies to mitigate this error is the use of systematic error-cancellation schemes, in which highly accurate QM calculations can be performed on small portions of the molecule to construct corrections to an approximate method. Herein, we build on ideas from fragmentation and error-cancellation to introduce a new family of molecular descriptors for machine learning modeled after the Connectivity-Based Hierarchy (CBH) of generalized isodesmic reaction schemes. The best performing descriptor ML(CBH-2) is constructed from fragments preserving only the immediate connectivity of all heavy (non-H) atoms of a molecule along with overlapping regions of fragments in accordance with the inclusion-exclusion principle. Our proposed approach offers a simple, chemically intuitive grouping of atoms, tuned with an optimal amount of error-cancellation, and outperforms previous structure-based descriptors using a much smaller input vector length. For a wide variety of density functionals, DFT+ΔML(CBH-2) models, trained on a set of small- to medium-sized organic HCNOSCl-containing molecules, achieved an out-of-sample MAE within 0.5 kcal/mol and 2σ (95%) confidence interval of <1.5 kcal/mol compared to accurate G4 reference values at DFT cost.
Affiliation(s)
- Eric M Collins
- Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
- Krishnan Raghavachari
- Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
8
Neugebauer H, Bohle F, Bursch M, Hansen A, Grimme S. Benchmark Study of Electrochemical Redox Potentials Calculated with Semiempirical and DFT Methods. J Phys Chem A 2020; 124:7166-7176. [DOI: 10.1021/acs.jpca.0c05052] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Indexed: 12/21/2022]
Affiliation(s)
- Hagen Neugebauer
- Mulliken Center for Theoretical Chemistry, Institute for Physical and Theoretical Chemistry, University of Bonn, Beringstr. 4, 53115 Bonn, Germany
- Fabian Bohle
- Mulliken Center for Theoretical Chemistry, Institute for Physical and Theoretical Chemistry, University of Bonn, Beringstr. 4, 53115 Bonn, Germany
- Markus Bursch
- Mulliken Center for Theoretical Chemistry, Institute for Physical and Theoretical Chemistry, University of Bonn, Beringstr. 4, 53115 Bonn, Germany
- Andreas Hansen
- Mulliken Center for Theoretical Chemistry, Institute for Physical and Theoretical Chemistry, University of Bonn, Beringstr. 4, 53115 Bonn, Germany
- Stefan Grimme
- Mulliken Center for Theoretical Chemistry, Institute for Physical and Theoretical Chemistry, University of Bonn, Beringstr. 4, 53115 Bonn, Germany