1
Yang Y, Sun S, Yang S, Yang Q, Lu X, Wang X, Yu Q, Huo X, Qian X. Structural annotation of unknown molecules in a miniaturized mass spectrometer based on a transformer enabled fragment tree method. Commun Chem 2024; 7:109. [PMID: 38740942 DOI: 10.1038/s42004-024-01189-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2023] [Accepted: 04/26/2024] [Indexed: 05/16/2024]
Abstract
Structural annotation of small molecules in tandem mass spectrometry has always been a central challenge in mass spectrometry analysis, especially when using a miniaturized mass spectrometer for on-site testing. Here, we propose the Transformer enabled Fragment Tree (TeFT) method, which combines various types of fragmentation tree models and a deep learning Transformer module. It aims to generate the specific structure of molecules de novo solely from mass spectra. Evaluation on different open-source databases indicated that the proposed model successfully recognized the majority of molecular structures of compounds in the test. TeFT was also validated on a miniaturized mass spectrometer with low-resolution spectra for 16 flavonoid alcohols, achieving complete structure prediction for 8 substances. Finally, TeFT confirmed the structure of the compound contained in a Chinese medicine substance called the Anweiyang capsule. These results indicate that the TeFT method is suitable for annotating fragmentation peaks with clear fragmentation rules, particularly when applied to on-site mass spectrometry with lower mass resolution.
Affiliation(s)
- Yiming Yang
  - Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China
- Shuang Sun
  - Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China
- Shuyuan Yang
  - Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China
- Qin Yang
  - Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China
- Xinqiong Lu
  - CHIN Instrument (Hefei) Co., Ltd., Hefei, 231200, China
- Xiaohao Wang
  - Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China
- Quan Yu
  - Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China
- Xinming Huo
  - Key Laboratory of Sensing Technology and Biomedical Instruments of Guangdong Province, School of Biomedical Engineering, Shenzhen Campus of Sun Yat-sen University, Shenzhen, 518107, China
- Xiang Qian
  - Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China
2
Dral PO. AI in computational chemistry through the lens of a decade-long journey. Chem Commun (Camb) 2024; 60:3240-3258. [PMID: 38444290 DOI: 10.1039/d4cc00010b] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/07/2024]
Abstract
This article gives a perspective on the progress of AI tools in computational chemistry through the lens of the author's decade-long contributions put in the wider context of the trends in this rapidly expanding field. This progress over the last decade is tremendous: while a decade ago we had a glimpse of what was to come through many proof-of-concept studies, now we witness the emergence of many AI-based computational chemistry tools that are mature enough to make faster and more accurate simulations increasingly routine. Such simulations in turn allow us to validate and even revise experimental results, deepen our understanding of the physicochemical processes in nature, and design better materials, devices, and drugs. The rapid introduction of powerful AI tools gives rise to unique challenges and opportunities that are discussed in this article too.
Affiliation(s)
- Pavlo O Dral
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
3
Sanchez AJ, Maier S, Raghavachari K. Leveraging DFT and Molecular Fragmentation for Chemically Accurate pKa Prediction Using Machine Learning. J Chem Inf Model 2024; 64:712-723. [PMID: 38301279 DOI: 10.1021/acs.jcim.3c01923] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/03/2024]
Abstract
We present a quantum mechanical/machine learning (ML) framework based on random forest to accurately predict the pKas of complex organic molecules using inexpensive density functional theory (DFT) calculations. By including physics-based features from low-level DFT calculations and structural features from our connectivity-based hierarchy (CBH) fragmentation protocol, we can correct the systematic error associated with DFT. The generalizability and performance of our model are evaluated on two benchmark sets (SAMPL6 and Novartis). We believe the carefully curated input of physics-based features lessens the model's data dependence and need for complex deep learning architectures, without compromising the accuracy of the test sets. As a point of novelty, our work extends the applicability of CBH, employing it for the generation of viable molecular descriptors for ML.
Affiliation(s)
- Alec J Sanchez
  - Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
- Sarah Maier
  - Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
- Krishnan Raghavachari
  - Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
4
Shilpa S, Kashyap G, Sunoj RB. Recent Applications of Machine Learning in Molecular Property and Chemical Reaction Outcome Predictions. J Phys Chem A 2023; 127:8253-8271. [PMID: 37769193 DOI: 10.1021/acs.jpca.3c04779] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Indexed: 09/30/2023]
Abstract
Burgeoning developments in machine learning (ML) and its rapidly growing adaptations in chemistry are noteworthy. Motivated by the successful deployments of ML in the realm of molecular property prediction (MPP) and chemical reaction prediction (CRP), herein we highlight some of its most recent applications in predictive chemistry. We present a nonmathematical and concise overview of the progression of ML implementations, ranging from an ensemble-based random forest model to advanced graph neural network algorithms. Similarly, the prospects of various feature engineering and feature learning approaches that work in conjunction with ML models are described. Highly accurate predictions reported in MPP tasks (e.g., lipophilicity, solubility, distribution coefficient), using methods such as D-MPNN, MolCLR, SMILES-BERT, and MolBERT, offer promising avenues in molecular design and drug discovery. Whereas MPP pertains to a given molecule, ML applications in chemical reactions present a different level of challenge, primarily arising from the simultaneous involvement of multiple molecules and their diverse roles in a reaction setting. The reported RMSEs in MPP tasks range from 0.287 to 2.20, while those for yield predictions are well over 4.9 in the lower end, reaching thresholds of >10.0 in several examples. Our Review concludes with a set of persisting challenges in dealing with reaction data sets and an overall optimistic outlook on benefits of ML-driven workflows for various MPP as well as CRP tasks.
Affiliation(s)
- Shilpa Shilpa
  - Department of Chemistry, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India
- Gargee Kashyap
  - Department of Chemistry, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India
- Raghavan B Sunoj
  - Department of Chemistry, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India
  - Centre for Machine Intelligence and Data Science, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India
5
Dandu NK, Ward L, Assary RS, Redfern PC, Curtiss LA. Accurate Prediction of Adiabatic Ionization Potentials of Organic Molecules using Quantum Chemistry Assisted Machine Learning. J Phys Chem A 2023. [PMID: 37406209 DOI: 10.1021/acs.jpca.3c00823] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/07/2023]
Abstract
In previous work (Dandu et al., J. Phys. Chem. A, 2022, 126, 4528-4536), we were successful in predicting accurate atomization energies of organic molecules using machine learning (ML) models, obtaining an accuracy as low as 0.1 kcal/mol compared to the G4MP2 method. In this work, we extend the use of these ML models to adiabatic ionization potentials on data sets of energies generated using quantum chemical calculations. Atomic specific corrections that were found to improve atomization energies from quantum chemical calculations have also been used in this study to improve ionization potentials. The quantum chemical calculations were performed on 3405 molecules containing eight or fewer non-hydrogen atoms derived from the QM9 data set, using the B3LYP functional with the 6-31G(2df,p) basis set for optimization. Low-fidelity IPs for these structures were obtained using two density functional methods: B3LYP/6-31+G(2df,p) and ωB97XD/6-311+G(3df,2p). Highly accurate G4MP2 calculations were performed on these optimized structures to obtain high-fidelity IPs to use in ML models based on the low-fidelity IPs. Our best performing ML methods gave IPs of organic molecules within a mean absolute deviation of 0.035 eV from the G4MP2 IPs for the whole data set. This work demonstrates that ML predictions assisted by quantum chemical calculations can be used to successfully predict IPs of organic molecules for use in high throughput screening.
Affiliation(s)
- Naveen K Dandu
  - Materials Science Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
  - Joint Center for Energy Storage Research (JCESR), Argonne National Laboratory, Lemont, Illinois 60439, United States
  - Chemical Engineering Department, University of Illinois-Chicago, Chicago, Illinois 60608, United States
- Logan Ward
  - Joint Center for Energy Storage Research (JCESR), Argonne National Laboratory, Lemont, Illinois 60439, United States
  - Data Science and Learning Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
- Rajeev S Assary
  - Materials Science Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
  - Joint Center for Energy Storage Research (JCESR), Argonne National Laboratory, Lemont, Illinois 60439, United States
- Paul C Redfern
  - Materials Science Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
- Larry A Curtiss
  - Materials Science Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
  - Joint Center for Energy Storage Research (JCESR), Argonne National Laboratory, Lemont, Illinois 60439, United States
6
Raghavachari K, Maier S, Collins EM, Debnath S, Sengupta A. Approaching Coupled Cluster Accuracy with Density Functional Theory Using the Generalized Connectivity-Based Hierarchy. J Chem Theory Comput 2023. [PMID: 37338997 DOI: 10.1021/acs.jctc.3c00301] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 06/22/2023]
Abstract
This Perspective reviews connectivity-based hierarchy (CBH), a systematic hierarchy of error-cancellation schemes developed in our group with the goal of achieving chemical accuracy using inexpensive computational techniques ("coupled cluster accuracy with DFT"). The hierarchy is a generalization of Pople's isodesmic bond separation scheme that is based only on the structure and connectivity and is applicable to any organic and biomolecule consisting of covalent bonds. It is formulated as a series of rungs involving increasing levels of error cancellation on progressively larger fragments of the parent molecule. The method and our implementation are discussed briefly. Examples are given for the applications of CBH involving (1) energies of complex organic rearrangement reactions, (2) bond energies of biofuel molecules, (3) redox potentials in solution, (4) pKa predictions in the aqueous medium, and (5) theoretical thermochemistry combining CBH with machine learning. They clearly show that near-chemical accuracy (1-2 kcal/mol) is achieved for a variety of applications with DFT methods irrespective of the underlying density functional used. They demonstrate conclusively that seemingly disparate results, often seen with different density functionals in many chemical applications, are due to an accumulation of systematic errors in the smaller local molecular fragments that can be easily corrected with higher-level calculations on those small units. This enables the method to achieve the accuracy of the high level of theory (e.g., coupled cluster) while the cost remains that of DFT. The advantages and limitations of the method are discussed along with areas of ongoing developments.
Affiliation(s)
- Krishnan Raghavachari
  - Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
- Sarah Maier
  - Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
- Eric M Collins
  - Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
- Sibali Debnath
  - Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
- Arkajyoti Sengupta
  - Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
7
Collins EM, Raghavachari K. Interpretable Graph-Network-Based Machine Learning Models via Molecular Fragmentation. J Chem Theory Comput 2023; 19:2804-2810. [PMID: 37134275 DOI: 10.1021/acs.jctc.2c01308] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 05/05/2023]
Abstract
Chemists have long benefitted from the ability to understand and interpret the predictions of computational models. With the current shift to more complex deep learning models, in many situations that utility is lost. In this work, we expand on our previous work on computational thermochemistry and propose an interpretable graph network, FragGraph(nodes), that provides decomposed predictions into fragment-wise contributions. We demonstrate the usefulness of our model in predicting a correction to density functional theory (DFT)-calculated atomization energies using Δ-learning. Our model predicts G4(MP2)-quality thermochemistry with an accuracy of <1 kJ mol⁻¹ for the GDB9 dataset. Besides the high accuracy of our predictions, we observe trends in the fragment corrections which quantitatively describe the deficiencies of B3LYP. Node-wise predictions significantly outperform our previous model predictions from a global state vector. This effect is most pronounced as we explore the generality by predicting on more diverse test sets, indicating that node-wise predictions are less sensitive to extending machine learning models to larger molecules.
Affiliation(s)
- Eric M Collins
  - Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
- Krishnan Raghavachari
  - Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
8
Ruth M, Gerbig D, Schreiner PR. Machine Learning of Coupled Cluster (T)-Energy Corrections via Delta (Δ)-Learning. J Chem Theory Comput 2022; 18:4846-4855. [PMID: 35816588 DOI: 10.1021/acs.jctc.2c00501] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/12/2022]
Abstract
Accurate thermochemistry is essential in many chemical disciplines, such as astro-, atmospheric, or combustion chemistry. These areas often involve fleetingly existent intermediates whose thermochemistry is difficult to assess. Whenever direct calorimetric experiments are infeasible, accurate computational estimates of relative molecular energies are required. However, high-level computations, often using coupled cluster theory, are generally resource-intensive. To expedite the process using machine learning techniques, we generated a database of energies for small organic molecules at the CCSD(T)/cc-pVDZ, CCSD(T)/aug-cc-pVDZ, and CCSD(T)/cc-pVTZ levels of theory. Leveraging the power of deep learning by employing graph neural networks, we are able to predict the effect of perturbatively included triples (T), that is, the difference between CCSD and CCSD(T) energies, with a mean absolute error of 0.25, 0.25, and 0.28 kcal mol⁻¹ (R² of 0.998, 0.997, and 0.998) with the cc-pVDZ, aug-cc-pVDZ, and cc-pVTZ basis sets, respectively. Our models were further validated by application to three validation sets taken from the S22 Database as well as to a selection of known theoretically challenging cases.
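Illustrative note (not part of the cited abstract): the Δ-learning idea described above, learning only the difference between a cheap and an expensive energy and adding it back to the cheap value, can be sketched in a few lines. This is a minimal sketch under loud assumptions: a one-feature least-squares line stands in for the graph neural network the authors used, the descriptor `x` is hypothetical, and all numbers are synthetic rather than taken from the paper.

```python
# Minimal Delta-learning sketch: E_high is approximated as E_low + model(x).
# ASSUMPTION: a one-feature linear fit replaces the paper's graph neural network.
# ASSUMPTION: all data below are synthetic and in arbitrary units.

def fit_delta(x, e_low, e_high):
    """Least-squares line through the corrections (e_high - e_low) versus x."""
    delta = [h - l for h, l in zip(e_high, e_low)]
    n = len(x)
    mean_x = sum(x) / n
    mean_d = sum(delta) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxd = sum((xi - mean_x) * (di - mean_d) for xi, di in zip(x, delta))
    slope = sxd / sxx
    intercept = mean_d - slope * mean_x
    return slope, intercept

def predict_high(x, e_low, slope, intercept):
    """Add the learned correction back onto the low-level energies."""
    return [l + slope * xi + intercept for xi, l in zip(x, e_low)]

# Synthetic training set: x is a hypothetical descriptor (e.g. a size measure).
x_train = [1.0, 2.0, 3.0, 4.0, 5.0]
e_low_train = [-10.0, -20.5, -29.8, -41.2, -50.1]
e_high_train = [l + 0.5 * xi - 0.1 for xi, l in zip(x_train, e_low_train)]

# Held-out molecules generated with the same synthetic correction.
x_test = [6.0, 7.0]
e_low_test = [-60.3, -70.9]
e_high_test = [l + 0.5 * xi - 0.1 for xi, l in zip(x_test, e_low_test)]

slope, intercept = fit_delta(x_train, e_low_train, e_high_train)
pred = predict_high(x_test, e_low_test, slope, intercept)
mae = sum(abs(p - t) for p, t in zip(pred, e_high_test)) / len(pred)
print(f"slope={slope:.3f} intercept={intercept:.3f} MAE={mae:.2e}")
```

Because the model only has to capture the correction, which is typically much smaller and smoother than the total energy, a modest model and data set can be enough; the low-level calculation carries most of the physics.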
Affiliation(s)
- Marcel Ruth
  - Institute of Organic Chemistry, Justus Liebig University, Heinrich-Buff-Ring 17, 35392 Giessen, Germany
- Dennis Gerbig
  - Institute of Organic Chemistry, Justus Liebig University, Heinrich-Buff-Ring 17, 35392 Giessen, Germany
- Peter R Schreiner
  - Institute of Organic Chemistry, Justus Liebig University, Heinrich-Buff-Ring 17, 35392 Giessen, Germany
9
Zheng P, Yang W, Wu W, Isayev O, Dral PO. Toward Chemical Accuracy in Predicting Enthalpies of Formation with General-Purpose Data-Driven Methods. J Phys Chem Lett 2022; 13:3479-3491. [PMID: 35416675 DOI: 10.1021/acs.jpclett.2c00734] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Indexed: 06/14/2023]
Abstract
Enthalpies of formation and reaction are important thermodynamic properties that have a crucial impact on the outcome of chemical transformations. Here we implement the calculation of enthalpies of formation with a general-purpose ANI-1ccx neural network atomistic potential. We demonstrate on a wide range of benchmark sets that both ANI-1ccx and our other general-purpose data-driven method AIQM1 approach the coveted chemical accuracy of 1 kcal/mol with the speed of semiempirical quantum mechanical methods (AIQM1) or faster (ANI-1ccx). Remarkably, this is achieved without specifically training the machine learning parts of ANI-1ccx or AIQM1 on formation enthalpies. Importantly, we show that these data-driven methods provide statistical means for uncertainty quantification of their predictions, which we use to detect and eliminate outliers and revise reference experimental data. Uncertainty quantification may also help in the systematic improvement of such data-driven methods.
Affiliation(s)
- Peikun Zheng
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Wudi Yang
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Wei Wu
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Olexandr Isayev
  - Department of Chemistry, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Pavlo O Dral
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
10
Orr-Ewing AJ, Crawford TD, Zanni MT, Hartland G, Shea JE. A Venue for Advances in Experimental and Theoretical Methods in Physical Chemistry. J Phys Chem A 2022; 126:177-179. [PMID: 35045707 DOI: 10.1021/acs.jpca.1c10457] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Affiliation(s)
- Andrew J Orr-Ewing
  - School of Chemistry, University of Bristol, Cantock's Close, Bristol BS8 1TS, U.K.
- T Daniel Crawford
  - Department of Chemistry, Virginia Tech, Blacksburg, Virginia 24061, United States
  - Molecular Sciences Software Institute, 1880 Pratt Drive, Suite 1100, Blacksburg, Virginia 24060, United States
- Martin T Zanni
  - Department of Chemistry, University of Wisconsin-Madison, 1101 University Avenue, Madison, Wisconsin 53706, United States
- Gregory Hartland
  - University of Notre Dame, Notre Dame, Indiana 46556, United States
- Joan-Emma Shea
  - Department of Chemistry and Biochemistry, University of California, Santa Barbara, Santa Barbara, California 93106, United States
  - Department of Physics, University of California, Santa Barbara, Santa Barbara, California 93106, United States