1
Collins EM, Raghavachari K. Stepping-Stone CBH: Benchmark and Application of a Multilayered Isodesmic-Based Correction Scheme. J Chem Theory Comput 2024; 20:3543-3550. [PMID: 38630625 DOI: 10.1021/acs.jctc.3c01330] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/19/2024]
Abstract
We present a generalization of the connectivity-based hierarchy (CBH) of isodesmic-based correction schemes to a multilayered fragmentation platform for overall cost reduction while retaining high accuracy. The newly developed multilayered CBH approach, called stepping-stone CBH (SSCBH), is benchmarked on a diverse set of 959 medium-sized organic molecules. Applying SSCBH corrections to the PBEh-D3 density functional resulted in an average error of 0.76 kcal/mol for the full test set compared to accurate CCSD(T)-quality enthalpies and an even lower error of 0.44 kcal/mol on a subset containing only acyclic molecules. These results rival the traditional CBH-3 approach at a greatly reduced cost, allowing larger fragment corrections to be made at the MP2 level of theory rather than with G4. Our SSCBH approach will enable more widespread applications of CBH methods to a broader range of organic and biomolecular systems.
Affiliation(s)
- Eric M Collins
- Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
- Krishnan Raghavachari
- Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
2
Nayak SK, Yamijala SSRKC. Computing accurate bond dissociation energies of emerging per- and polyfluoroalkyl substances: Achieving chemical accuracy using connectivity-based hierarchy schemes. Journal of Hazardous Materials 2024; 468:133804. [PMID: 38377911 DOI: 10.1016/j.jhazmat.2024.133804] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Revised: 02/06/2024] [Accepted: 02/14/2024] [Indexed: 02/22/2024]
Abstract
Understanding the bond dissociation energies (BDEs) of per- and polyfluoroalkyl substances (PFAS) helps in devising their efficient degradation pathways. However, there is only limited experimental data on the PFAS BDEs, and there are uncertainties associated with the BDEs computed using density functional theory. Although quantum chemical methods like the G4 composite method can provide highly accurate BDEs (< 1 kcal mol⁻¹), they are limited to small system sizes. To address DFT's accuracy limitations and G4's system size constraints, we examined the connectivity-based hierarchy (CBH) scheme and found that it can provide BDEs that are reasonably close to the G4 accuracy while retaining the computational efficiency of DFT. To further improve the accuracy, we modified the CBH scheme and demonstrated that BDEs calculated using it have a mean-absolute deviation of 0.7 kcal mol⁻¹ from G4 BDEs. To validate the reliability of this new scheme, we computed the ground state free energies of seven PFAS compounds and BDEs for 44 C-C and C-F bonds at the G4 level of theory. Our results suggest that the modified CBH scheme can accurately compute the BDEs of both small and large PFAS at near G4 level accuracy, offering promise for more effective PFAS degradation strategies.
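The two quantities this abstract relies on, a bond dissociation energy and a mean-absolute deviation against a reference method, can be illustrated with a minimal Python sketch. The function names and all numerical values below are hypothetical and are not taken from the paper; the BDE is written in its usual form as the energy of the two radical fragments minus that of the intact molecule.

```python
def bond_dissociation_energy(e_molecule, e_radical_a, e_radical_b):
    """BDE for R-X -> R. + X. as E(R.) + E(X.) - E(R-X), all in the same units."""
    return e_radical_a + e_radical_b - e_molecule


def mean_absolute_deviation(predicted, reference):
    """Mean of |predicted - reference| over paired values."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


# Hypothetical BDEs in kcal/mol, for illustration only (not data from the paper).
cbh_bdes = [100.0, 120.5, 98.0]
g4_bdes = [101.0, 120.0, 98.5]
mad = mean_absolute_deviation(cbh_bdes, g4_bdes)
print(f"MAD = {mad:.2f} kcal/mol")
```

A deviation computed this way is what a statement such as "a mean-absolute deviation of 0.7 kcal mol⁻¹ from G4 BDEs" summarizes over the full set of bonds.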
Affiliation(s)
- Samir Kumar Nayak
- Department of Chemistry, Indian Institute of Technology Madras, Chennai 600036, India; Centre for Atomistic Modelling and Materials Design, Indian Institute of Technology Madras, Chennai 600036, India
- Sharma S R K C Yamijala
- Department of Chemistry, Indian Institute of Technology Madras, Chennai 600036, India; Centre for Atomistic Modelling and Materials Design, Indian Institute of Technology Madras, Chennai 600036, India; Centre for Molecular Materials and Functions, Indian Institute of Technology Madras, Chennai 600036, India; Centre for Quantum Information, Communication, and Computing, Indian Institute of Technology Madras, Chennai 600036, India
3
Sanchez AJ, Maier S, Raghavachari K. Leveraging DFT and Molecular Fragmentation for Chemically Accurate pKa Prediction Using Machine Learning. J Chem Inf Model 2024; 64:712-723. [PMID: 38301279 DOI: 10.1021/acs.jcim.3c01923] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/03/2024]
Abstract
We present a quantum mechanical/machine learning (ML) framework based on random forest to accurately predict the pKas of complex organic molecules using inexpensive density functional theory (DFT) calculations. By including physics-based features from low-level DFT calculations and structural features from our connectivity-based hierarchy (CBH) fragmentation protocol, we can correct the systematic error associated with DFT. The generalizability and performance of our model are evaluated on two benchmark sets (SAMPL6 and Novartis). We believe the carefully curated input of physics-based features lessens the model's data dependence and need for complex deep learning architectures, without compromising the accuracy of the test sets. As a point of novelty, our work extends the applicability of CBH, employing it for the generation of viable molecular descriptors for ML.
Affiliation(s)
- Alec J Sanchez
- Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
- Sarah Maier
- Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
- Krishnan Raghavachari
- Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
4
Raghavachari K, Maier S, Collins EM, Debnath S, Sengupta A. Approaching Coupled Cluster Accuracy with Density Functional Theory Using the Generalized Connectivity-Based Hierarchy. J Chem Theory Comput 2023. [PMID: 37338997 DOI: 10.1021/acs.jctc.3c00301] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 06/22/2023]
Abstract
This Perspective reviews connectivity-based hierarchy (CBH), a systematic hierarchy of error-cancellation schemes developed in our group with the goal of achieving chemical accuracy using inexpensive computational techniques ("coupled cluster accuracy with DFT"). The hierarchy is a generalization of Pople's isodesmic bond separation scheme that is based only on the structure and connectivity and is applicable to any organic and biomolecule consisting of covalent bonds. It is formulated as a series of rungs involving increasing levels of error cancellation on progressively larger fragments of the parent molecule. The method and our implementation are discussed briefly. Examples are given for the applications of CBH involving (1) energies of complex organic rearrangement reactions, (2) bond energies of biofuel molecules, (3) redox potentials in solution, (4) pKa predictions in the aqueous medium, and (5) theoretical thermochemistry combining CBH with machine learning. They clearly show that near-chemical accuracy (1-2 kcal/mol) is achieved for a variety of applications with DFT methods irrespective of the underlying density functional used. They demonstrate conclusively that seemingly disparate results, often seen with different density functionals in many chemical applications, are due to an accumulation of systematic errors in the smaller local molecular fragments that can be easily corrected with higher-level calculations on those small units. This enables the method to achieve the accuracy of the high level of theory (e.g., coupled cluster) while the cost remains that of DFT. The advantages and limitations of the method are discussed along with areas of ongoing developments.
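The error-cancellation idea described in this abstract can be sketched in a few lines of Python. The sketch assumes the standard isodesmic argument: if the reaction "parent + reactant fragments → product fragments" is described well at the low level, then the high-level energy of the parent is approximately its low-level energy plus the (high − low) differences on the product fragments, minus the same differences on the reactant fragments. The `Fragment` type, the function name, and all energies are hypothetical and are not the authors' implementation.

```python
from typing import NamedTuple, Sequence


class Fragment(NamedTuple):
    """One fragment of the error-cancellation reaction (hypothetical container)."""
    coeff: int      # stoichiometric coefficient in the reaction
    e_low: float    # energy at the inexpensive level (e.g. DFT)
    e_high: float   # energy at the accurate level (e.g. a composite method)


def cbh_corrected_energy(e_low_parent: float,
                         products: Sequence[Fragment],
                         reactants: Sequence[Fragment]) -> float:
    """Low-level parent energy plus fragment-based (high - low) corrections."""
    def correction(fragments: Sequence[Fragment]) -> float:
        return sum(f.coeff * (f.e_high - f.e_low) for f in fragments)

    return e_low_parent + correction(products) - correction(reactants)


# Hypothetical energies in arbitrary units, shaped like Pople's bond
# separation reaction for propane: C3H8 + CH4 -> 2 C2H6.
e_low_parent = -100.0
products = [Fragment(coeff=2, e_low=-60.0, e_high=-60.5)]   # 2 x ethane
reactants = [Fragment(coeff=1, e_low=-30.0, e_high=-30.2)]  # 1 x methane
e_corrected = cbh_corrected_energy(e_low_parent, products, reactants)
print(e_corrected)
```

Because the expensive calculations are needed only on the small fragments, the cost of treating the parent molecule stays at that of the low-level method, which is the point the abstract makes.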
Affiliation(s)
- Krishnan Raghavachari
- Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
- Sarah Maier
- Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
- Eric M Collins
- Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
- Sibali Debnath
- Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
- Arkajyoti Sengupta
- Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
5
Liang Y, Zheng W, Xie H, Zha X, Wang T. A quantum chemistry study on C–H homolytic bond dissociation enthalpies of five-membered and six-membered heterocyclic compounds. J Indian Chem Soc 2022. [DOI: 10.1016/j.jics.2022.100527] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
6
Poskrebyshev GA. The Corrected Values of $\Delta_{r}H^{o}(\mathrm{C}_{a}\mathrm{H}_{b}\mathrm{O}_{d},\ a \le 16)$ of Atomization of the Aromatic Compounds and Their Uncertainties Determined Using Several Quantum Mechanical Approaches. ChemistrySelect 2022. [DOI: 10.1002/slct.202104502] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/08/2022]
Affiliation(s)
- Gregory A. Poskrebyshev
- V.L. Tal'rose Institute of Energy Problems for Chemical Physics at Federal Research Center for Chemical Physics, Russian Academy of Sciences, Leninsky prosp., bldg. 38–2, 119334 Moscow, Russia
7
Collins EM, Raghavachari K. A Fragmentation-Based Graph Embedding Framework for QM/ML. J Phys Chem A 2021; 125:6872-6880. [PMID: 34342449 DOI: 10.1021/acs.jpca.1c06152] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 01/20/2023]
Abstract
We introduce a new fragmentation-based molecular representation framework "FragGraph" for QM/ML methods involving embedding fragment-wise fingerprints onto molecular graphs. Our model is specifically designed for delta machine learning (Δ-ML) with the central goal of correcting the deficiencies of approximate methods such as DFT to achieve high accuracy. Our framework is based on a judicious combination of ideas from fragmentation, error cancellation, and a state-of-the-art deep learning architecture. Broadly, we develop a general graph-network framework for molecular machine learning by incorporating the inherent advantages prebuilt into error cancellation methods such as the generalized Connectivity-Based Hierarchy. More specifically, we develop a QM/ML representation through a fragmentation-based attributed graph representation encoded with fragment-wise molecular fingerprints. The utility of our representation is demonstrated through a graph network fingerprint encoder in which a global fingerprint is generated through message passing of local neighborhoods of fragment-wise fingerprints, effectively augmenting standard fingerprints to also include the inbuilt molecular graph structure. On the 130k-GDB9 dataset, our method predicts an out-of-sample mean absolute error significantly lower than 1 kJ/mol compared to target G4(MP2) calculated energies, rivaling current deep learning methods with reduced computational scaling.
Affiliation(s)
- Eric M Collins
- Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
- Krishnan Raghavachari
- Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
8
Collins EM, Raghavachari K. Effective Molecular Descriptors for Chemical Accuracy at DFT Cost: Fragmentation, Error-Cancellation, and Machine Learning. J Chem Theory Comput 2020; 16:4938-4950. [PMID: 32678593 DOI: 10.1021/acs.jctc.0c00236] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 02/08/2023]
Abstract
Recent advances in theoretical thermochemistry have allowed the study of small organic and bio-organic molecules with high accuracy. However, applications to larger molecules are still impeded by the steep scaling problem of highly accurate quantum mechanical (QM) methods, forcing the use of approximate, more cost-effective methods at a greatly reduced accuracy. One of the most successful strategies to mitigate this error is the use of systematic error-cancellation schemes, in which highly accurate QM calculations can be performed on small portions of the molecule to construct corrections to an approximate method. Herein, we build on ideas from fragmentation and error-cancellation to introduce a new family of molecular descriptors for machine learning modeled after the Connectivity-Based Hierarchy (CBH) of generalized isodesmic reaction schemes. The best performing descriptor ML(CBH-2) is constructed from fragments preserving only the immediate connectivity of all heavy (non-H) atoms of a molecule along with overlapping regions of fragments in accordance with the inclusion-exclusion principle. Our proposed approach offers a simple, chemically intuitive grouping of atoms, tuned with an optimal amount of error-cancellation, and outperforms previous structure-based descriptors using a much smaller input vector length. For a wide variety of density functionals, DFT+ΔML(CBH-2) models, trained on a set of small- to medium-sized organic HCNOSCl-containing molecules, achieved an out-of-sample MAE within 0.5 kcal/mol and 2σ (95%) confidence interval of <1.5 kcal/mol compared to accurate G4 reference values at DFT cost.
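The fragment bookkeeping behind a CBH-2-style descriptor can be sketched as follows. This is a simplified illustration, not the authors' code: it assumes the molecule is given as a list of heavy-atom bonds with hypothetical integer atom labels, takes each atom-centered fragment (a heavy atom plus its bonded heavy neighbors) with coefficient +1, and treats the overlaps as the heavy-atom bonds, each with coefficient −1, in the spirit of the inclusion-exclusion principle. Hydrogens and fragment canonicalization (mapping atom sets to chemical species) are omitted.

```python
from collections import Counter


def cbh2_fragments(bonds):
    """Return {fragment atom set: net coefficient} for a heavy-atom bond list."""
    neighbors = {}
    for a, b in bonds:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    counts = Counter()
    # Atom-centered fragments: each heavy atom with its immediate neighbors.
    for atom, nbrs in neighbors.items():
        counts[frozenset({atom} | nbrs)] += 1
    # Overlapping regions: one bond fragment subtracted per heavy-atom bond.
    for a, b in bonds:
        counts[frozenset((a, b))] -= 1

    # Fragments whose contributions cancel exactly are dropped.
    return {frag: c for frag, c in counts.items() if c != 0}


# n-Butane heavy-atom skeleton C0-C1-C2-C3 (hypothetical labels).
butane = [(0, 1), (1, 2), (2, 3)]
fragments = cbh2_fragments(butane)
for frag, coeff in fragments.items():
    print(sorted(frag), coeff)
```

For the butane skeleton this yields two three-carbon fragments with coefficient +1 and one two-carbon fragment with coefficient −1, which corresponds to the familiar balance of two propane units minus one ethane unit. In this scheme every heavy atom is counted exactly once overall. A descriptor vector could then be built by counting the net coefficient of each distinct fragment type.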
Affiliation(s)
- Eric M Collins
- Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
- Krishnan Raghavachari
- Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
9
Guo H, Yang X, Zwier T. Virtual Issue on Combustion Chemistry. J Phys Chem A 2020; 124:5995-5996. [PMID: 32698590 DOI: 10.1021/acs.jpca.0c05674] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
10
Maier S, Thapa B, Raghavachari K. G4 accuracy at DFT cost: unlocking accurate redox potentials for organic molecules using systematic error cancellation. Phys Chem Chem Phys 2020; 22:4439-4452. [DOI: 10.1039/c9cp06622e] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 11/21/2022]
Abstract
This study presents a cost-effective error cancellation protocol to predict the redox potentials of 46 organic molecules with near-G4 accuracy.
Affiliation(s)
- Sarah Maier
- Department of Chemistry, Indiana University, Bloomington, USA
- Bishnu Thapa
- Department of Chemistry, Indiana University, Bloomington, USA