1. Tian Z, Dai Y, Hu F, Shen Z, Xu H, Zhang H, Xu J, Hu Y, Diao Y, Li H. Enhancing Chemical Reaction Monitoring with a Deep Learning Model for NMR Spectra Image Matching to Target Compounds. J Chem Inf Model 2024; 64:5624-5633. [PMID: 38979856 DOI: 10.1021/acs.jcim.4c00522]
Abstract
In the synthetic laboratory, researchers typically rely on nuclear magnetic resonance (NMR) spectra to elucidate the structures of synthesized products and confirm whether they match the desired target compounds. As chemical synthesis moves toward intelligent and continuous operation, efficient computer-assisted structure elucidation (CASE) techniques are needed to replace time-consuming manual analysis. However, current CASE methods typically aim to derive exact chemical structures from spectroscopic data and suffer from low accuracy, high computational cost, and reliance on chemical libraries. In carefully designed synthesis reactions, researchers are mainly concerned with confirming from the NMR spectra that the target product was obtained, rather than with identifying whatever product was actually formed. For this purpose, we developed a binary classification model, termed MatCS, that directly predicts whether NMR spectra images (1H NMR and 13C NMR) match the molecular structure of the target compound. After evaluating various feature extraction methods, MatCS combines Graph Attention Networks and Graph Convolutional Networks to learn structural features from molecular graphs, and uses a pretrained ResNet101 network with a Convolutional Block Attention Module to extract features from NMR spectra images. On the challenging Testsim data set, in which spectra of similar molecular structures are hard to distinguish, MatCS achieves an F1-score of 0.81 and an AUC of 0.87. It also performed well on an external SDBS data set of experimental NMR spectra, showing substantial potential for structural verification in real automated chemical synthesis.
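For readers less familiar with the two reported metrics, the sketch below shows how an F1-score and a ROC AUC are computed for a binary match/no-match classifier. It is a generic illustration, not MatCS code: the labels and scores are invented toy values, and the 0.5 decision threshold is an assumption.

```python
from typing import Sequence


def f1_score(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """F1 for binary match (1) / no-match (0) predictions at a score threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need both positive and negative examples")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


labels = [1, 1, 1, 0, 0, 0]  # 1 = spectrum matches the target structure (toy data)
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]  # hypothetical classifier outputs
print(round(f1_score(labels, scores), 3))  # 0.667
print(round(roc_auc(labels, scores), 3))   # 0.889
```

The pairwise form of AUC is quadratic in the number of examples; it is used here for transparency, and a rank-based formula would scale better on large test sets.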
Affiliation(s)
- ZiJing Tian
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Yan Dai
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Feng Hu
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- ZiHao Shen
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- HongLing Xu
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- HongWen Zhang
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- JinHang Xu
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- YuTing Hu
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- YanYan Diao
  - Innovation Center for AI and Drug Discovery, School of Pharmacy, East China Normal University, Shanghai 200062, China
  - Lingang Laboratory, Shanghai 200031, China
- HongLin Li
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
  - Innovation Center for AI and Drug Discovery, School of Pharmacy, East China Normal University, Shanghai 200062, China
  - Lingang Laboratory, Shanghai 200031, China
2. Specht T, Arweiler J, Stüber J, Münnemann K, Hasse H, Jirasek F. Automated nuclear magnetic resonance fingerprinting of mixtures. Magn Reson Chem 2024; 62:286-297. [PMID: 37515509 DOI: 10.1002/mrc.5381] [Received: 04/21/2023] [Revised: 06/30/2023] [Accepted: 07/03/2023]
Abstract
Nuclear magnetic resonance (NMR) spectroscopy is a powerful tool for qualitative and quantitative analysis. However, for complex mixtures, determining the speciation from NMR spectra can be tedious and sometimes even infeasible. Identifying and quantifying structural groups in a mixture from NMR spectra, on the other hand, is much easier than doing the same for components. We call this group-based approach "NMR fingerprinting." In this work, we show that NMR fingerprinting can even be automated, requiring no expert knowledge and only standard NMR spectra, namely 13C, 1H, and 13C DEPT NMR spectra. Our approach is based on the machine-learning method of support vector classification (SVC), which was trained here on thousands of labeled pure-component NMR spectra from open-source data banks. We demonstrate the applicability of automated NMR fingerprinting using test mixtures whose spectra were recorded on a simple benchtop NMR spectrometer. The results of the NMR fingerprinting agree remarkably well with the ground truth, which was known from the gravimetric preparation of the samples. To facilitate application of the method, we provide an interactive website (https://nmr-fingerprinting.de) to which spectral information can be uploaded and which returns the NMR fingerprint. NMR fingerprinting can be used in many ways, for example for process monitoring or for thermodynamic modeling with group-contribution methods, or simply as a first step in species analysis.
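The abstract leaves the quantification step implicit. As a hedged illustration of why group amounts are comparatively easy to obtain, the sketch below assumes that 1H peak areas have already been assigned to structural groups (the identification step the paper addresses with SVC) and converts them into group mole fractions using the proportionality between signal area and proton count. The function name and the toy mixture, with idealized integrals, are hypothetical; this is not the authors' pipeline, which also uses 13C and DEPT spectra.

```python
from typing import Dict


def group_mole_fractions(integrals: Dict[str, float],
                         protons_per_group: Dict[str, int]) -> Dict[str, float]:
    """Convert 1H peak areas assigned to structural groups into group mole fractions.

    In quantitative 1H NMR a signal's area is proportional to the number of
    contributing protons, so area / protons-per-group is proportional to the
    amount of that group in the sample.
    """
    amounts = {g: area / protons_per_group[g] for g, area in integrals.items()}
    total = sum(amounts.values())
    if total <= 0:
        raise ValueError("total integral must be positive")
    return {g: a / total for g, a in amounts.items()}


# Hypothetical equimolar methanol (CH3OH) + ethanol (CH3CH2OH) mixture with
# idealized integrals: 2 CH3 groups (6 H), 1 CH2 group (2 H), 2 OH groups (2 H).
areas = {"CH3": 6.0, "CH2": 2.0, "OH": 2.0}
protons = {"CH3": 3, "CH2": 2, "OH": 1}
print(group_mole_fractions(areas, protons))  # {'CH3': 0.4, 'CH2': 0.2, 'OH': 0.4}
```

Note that no knowledge of which molecules are present is needed for this step, which is the point of the group-based view.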
Affiliation(s)
- Thomas Specht
  - Laboratory of Engineering Thermodynamics (LTD), RPTU Kaiserslautern, Kaiserslautern, Germany
- Justus Arweiler
  - Laboratory of Engineering Thermodynamics (LTD), RPTU Kaiserslautern, Kaiserslautern, Germany
- Johannes Stüber
  - Laboratory of Engineering Thermodynamics (LTD), RPTU Kaiserslautern, Kaiserslautern, Germany
- Kerstin Münnemann
  - Laboratory of Engineering Thermodynamics (LTD), RPTU Kaiserslautern, Kaiserslautern, Germany
- Hans Hasse
  - Laboratory of Engineering Thermodynamics (LTD), RPTU Kaiserslautern, Kaiserslautern, Germany
- Fabian Jirasek
  - Laboratory of Engineering Thermodynamics (LTD), RPTU Kaiserslautern, Kaiserslautern, Germany
3. Hu G, Qiu M. Machine learning-assisted structure annotation of natural products based on MS and NMR data. Nat Prod Rep 2023; 40:1735-1753. [PMID: 37519196 DOI: 10.1039/d3np00025g]
Abstract
Covering: up to March 2023
Machine learning (ML) has emerged as a popular tool for analyzing the structures of natural products (NPs). This review summarizes recent advances in ML-assisted analysis of mass spectrometry (MS) and nuclear magnetic resonance (NMR) data for establishing the chemical structures of NPs. First, ML-based MS/MS analyses that rely on library matching are discussed, in which ML algorithms are used to calculate similarity, predict MS/MS fragments, and generate molecular fingerprints. Then, ML-assisted MS/MS structural annotation without library matching is reviewed. Next, applications of ML algorithms to NMR-based structural studies of NPs are discussed from four perspectives: NMR prediction, functional group identification, structural categorization, and quantum chemical calculation. Finally, the review concludes with a discussion of the challenges and trends in ML-based structure determination of NPs.
Affiliation(s)
- Guilin Hu
  - State Key Laboratory of Phytochemistry and Plant Resources in West China, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming 650201, Yunnan, China
  - University of the Chinese Academy of Sciences, Beijing 100049, People's Republic of China
- Minghua Qiu
  - State Key Laboratory of Phytochemistry and Plant Resources in West China, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming 650201, Yunnan, China
  - University of the Chinese Academy of Sciences, Beijing 100049, People's Republic of China
4. Li C, Cong Y, Deng W. Identifying molecular functional groups of organic compounds by deep learning of NMR data. Magn Reson Chem 2022; 60:1061-1069. [PMID: 35674984 DOI: 10.1002/mrc.5292] [Received: 10/01/2021] [Revised: 06/02/2022] [Accepted: 06/06/2022]
Abstract
We preprocess raw nuclear magnetic resonance (NMR) spectra and extract key features for subsequent substructure pattern recognition using two different methods, called equidistant sampling and peak sampling. We also provide a strategy to address the class imbalance frequently encountered in statistical modeling of NMR data sets, and we build two conventional models, a support vector machine (SVM) and a K-nearest neighbor (KNN) classifier, to assess the two feature extraction methods. Our results show that models using features from peak sampling outperform those using equidistant sampling. We then build a recurrent neural network (RNN) model trained on data obtained by peak sampling. Finally, a detailed comparison with the traditional machine learning SVM and KNN models shows that the RNN deep learning model is easier to tune (hyperparameter optimization) and generalizes better.
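The abstract does not define the two sampling schemes in detail. A minimal sketch of one plausible reading follows, assuming a 1D spectrum stored as parallel lists of chemical shifts (ppm) and intensities: equidistant sampling keeps the maximum intensity in each fixed-width window, and peak sampling keeps only local maxima above a threshold. The function names, the max-per-bin rule, and the `min_height` parameter are illustrative assumptions, not the authors' definitions.

```python
from typing import List, Sequence, Tuple


def equidistant_sampling(ppm: Sequence[float], intensity: Sequence[float],
                         start: float, stop: float, n_bins: int) -> List[float]:
    """Fixed-length features: max intensity in each of n_bins equal-width ppm windows."""
    width = (stop - start) / n_bins
    bins = [0.0] * n_bins
    for x, y in zip(ppm, intensity):
        if start <= x < stop:
            k = min(int((x - start) / width), n_bins - 1)  # guard against float round-up
            bins[k] = max(bins[k], y)
    return bins


def peak_sampling(ppm: Sequence[float], intensity: Sequence[float],
                  min_height: float) -> List[Tuple[float, float]]:
    """Variable-length features: (ppm, height) of local maxima at or above min_height."""
    peaks = []
    for i in range(1, len(intensity) - 1):
        y = intensity[i]
        if y >= min_height and y > intensity[i - 1] and y >= intensity[i + 1]:
            peaks.append((ppm[i], y))
    return peaks


# Toy spectrum with two peaks, sampled every 0.5 ppm from 0.0 to 4.5 ppm
ppm = [0.5 * i for i in range(10)]
intensity = [0.0, 1.0, 5.0, 1.0, 0.0, 0.0, 2.0, 8.0, 2.0, 0.0]
print(equidistant_sampling(ppm, intensity, 0.0, 5.0, 5))  # [1.0, 5.0, 0.0, 8.0, 2.0]
print(peak_sampling(ppm, intensity, 3.0))                 # [(1.0, 5.0), (3.5, 8.0)]
```

Equidistant sampling always returns `n_bins` values, many of them baseline, whereas peak sampling returns a short, variable-length list of positions and heights; the abstract does not say how the peak list was turned into model input.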
Affiliation(s)
- Chongcan Li
  - School of Mathematics and Statistics, Gansu Key Laboratory of Applied Mathematics and Complex Systems, Lanzhou University, Lanzhou, China
- Yong Cong
  - College of Chemistry and Chemical Engineering, State Key Laboratory of Applied Organic Chemistry, Key Laboratory of Nonferrous Metals Chemistry and Resources Utilization, Lanzhou University, Lanzhou, China
- Weihua Deng
  - School of Mathematics and Statistics, Gansu Key Laboratory of Applied Mathematics and Complex Systems, Lanzhou University, Lanzhou, China
5. Xu Z, Gu S, Li Y, Wu J, Zhao Y. Recognition-Enabled Automated Analyte Identification via 19F NMR. Anal Chem 2022; 94:8285-8292. [PMID: 35622989 DOI: 10.1021/acs.analchem.2c00642]
Abstract
Nuclear magnetic resonance (NMR) is an indispensable tool for structural elucidation and noninvasive analysis. Automated identification of analytes by NMR is highly sought after in metabolism research and disease diagnosis; however, the process is often complicated by signal overlap and by the sample matrix. Here we report a detection scheme based on 19F NMR spectroscopy and dynamic recognition, which simplifies the detection signal and mitigates the influence of the matrix on detection. We demonstrate that this approach can not only detect and differentiate capsaicin and dihydrocapsaicin in complex real-world samples but also quantify the ibuprofen content of sustained-release capsules. Using the 19F signals obtained with a set of three 19F probes, automated analyte identification is achieved, reducing the risk of misidentification caused by structural similarity.
Affiliation(s)
- Zhenchuang Xu
  - Key Laboratory of Organofluorine Chemistry, Shanghai Institute of Organic Chemistry, University of Chinese Academy of Sciences, Chinese Academy of Sciences, 345 Ling-Ling Road, Shanghai 200032, China
- Siyi Gu
  - Key Laboratory of Organofluorine Chemistry, Shanghai Institute of Organic Chemistry, University of Chinese Academy of Sciences, Chinese Academy of Sciences, 345 Ling-Ling Road, Shanghai 200032, China
- Yipeng Li
  - Key Laboratory of Organofluorine Chemistry, Shanghai Institute of Organic Chemistry, University of Chinese Academy of Sciences, Chinese Academy of Sciences, 345 Ling-Ling Road, Shanghai 200032, China
- Jian Wu
  - Instrumental Analysis Center, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, 345 Ling-Ling Road, Shanghai 200032, China
- Yanchuan Zhao
  - Key Laboratory of Organofluorine Chemistry, Shanghai Institute of Organic Chemistry, University of Chinese Academy of Sciences, Chinese Academy of Sciences, 345 Ling-Ling Road, Shanghai 200032, China
  - Key Laboratory of Energy Regulation Materials, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, 345 Ling-Ling Road, Shanghai 200032, China
6. Huang Z, Chen MS, Woroch CP, Markland TE, Kanan MW. A framework for automated structure elucidation from routine NMR spectra. Chem Sci 2021; 12:15329-15338. [PMID: 34976353 PMCID: PMC8635205 DOI: 10.1039/d1sc04105c] [Received: 07/27/2021] [Accepted: 11/08/2021]
Abstract
Methods to automate structure elucidation that can be applied broadly across chemical structure space have the potential to greatly accelerate chemical discovery. NMR spectroscopy is the most widely used and arguably the most powerful method for elucidating the structures of organic molecules. Here we introduce a machine learning (ML) framework that provides a quantitative probabilistic ranking of the most likely structural connectivity of an unknown compound, given routine experimental one-dimensional 1H and/or 13C NMR spectra. Specifically, our ML-based algorithm takes input NMR spectra and (i) predicts the presence of specific substructures out of hundreds of substructures it has learned to identify; (ii) annotates the spectrum by labeling peaks with predicted substructures; and (iii) uses the substructures to construct candidate constitutional isomers and assigns them a probabilistic ranking. Using experimental spectra and molecular formulae for molecules containing up to 10 non-hydrogen atoms, the correct constitutional isomer was the highest-ranking prediction made by our model in 67.4% of cases and among the top ten predictions in 95.8% of cases. This advance will aid in solving the structures of unknown compounds and thus further the development of automated structure elucidation tools that could enable fully autonomous reaction discovery platforms. A machine learning model and graph generator accurately predicted the presence of nearly 1000 substructures and the connectivity of small organic molecules from experimental 1D NMR data.
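The 67.4% and 95.8% figures are top-1 and top-10 accuracies over ranked lists of candidate isomers. A minimal sketch of that metric follows, using invented rankings of C2H6O and C3H8O isomers written as SMILES strings in place of real model output; comparing SMILES by exact string equality assumes they were canonicalized consistently beforehand.

```python
from typing import Sequence


def top_k_accuracy(ranked_candidates: Sequence[Sequence[str]],
                   truths: Sequence[str], k: int) -> float:
    """Fraction of compounds whose true structure appears among the first k ranked candidates."""
    if len(ranked_candidates) != len(truths) or not truths:
        raise ValueError("need one ranking per true structure, and at least one")
    hits = sum(1 for cands, truth in zip(ranked_candidates, truths) if truth in cands[:k])
    return hits / len(truths)


# Invented rankings (best first) for four unknowns; not output of the published model
rankings = [
    ["CCO", "COC"],              # C2H6O: ethanol ranked first
    ["COC", "CCO"],              # C2H6O: ethanol ranked second
    ["CCCO", "CC(C)O", "CCOC"],  # C3H8O: 1-propanol ranked first
    ["CCCO", "CC(C)O", "CCOC"],  # C3H8O: methyl ethyl ether ranked third
]
truths = ["CCO", "CCO", "CCCO", "CCOC"]
print(top_k_accuracy(rankings, truths, 1))  # 0.5
print(top_k_accuracy(rankings, truths, 2))  # 0.75
print(top_k_accuracy(rankings, truths, 3))  # 1.0
```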
Affiliation(s)
- Zhaorui Huang
  - Department of Chemistry, Stanford University Stanford CA 94305 USA
- Michael S Chen
  - Department of Chemistry, Stanford University Stanford CA 94305 USA
- Matthew W Kanan
  - Department of Chemistry, Stanford University Stanford CA 94305 USA
7. Beniddir MA, Kang KB, Genta-Jouve G, Huber F, Rogers S, van der Hooft JJJ. Advances in decomposing complex metabolite mixtures using substructure- and network-based computational metabolomics approaches. Nat Prod Rep 2021; 38:1967-1993. [PMID: 34821250 PMCID: PMC8597898 DOI: 10.1039/d1np00023c] [Received: 04/07/2021]
Abstract
Covering: up to the end of 2020
Recently introduced computational metabolome mining tools have started to positively impact the chemical and biological interpretation of untargeted metabolomics analyses. We believe these advances make it possible to start decomposing complex metabolite mixtures into substructure and chemical class information, thereby supporting pivotal tasks in metabolomics analysis, including metabolite annotation, comparison of metabolic profiles, and network analyses. In this review, we highlight and explain key tools and emerging strategies from 2015 to the end of 2020. Most of these tools process and analyze fragmentation data from liquid chromatography coupled to mass spectrometry. We start by defining what substructures are, how they relate to molecular fingerprints, and how recognizing them helps to decompose complex mixtures. We continue with chemical classes, which are based on the presence or absence of particular molecular scaffolds and/or functional groups and are thus intrinsically related to substructures. We discuss novel tools to mine substructures, annotate chemical compound classes, and create mass spectral networks from metabolomics data, and demonstrate them using two case studies. We also review and speculate about the opportunities that NMR spectroscopy-based metabolome mining of complex metabolite mixtures offers for discovering substructures and chemical classes. Finally, we describe the main benefits and limitations of the current tools and the strategies that rely on them, and our vision of how this exciting field can develop toward repository-scale metabolomics analyses. Complementary sources of structural information from genomics analyses and well-curated taxonomic records are also discussed.
Many research fields, such as natural products discovery, pharmacokinetic and drug metabolism studies, and environmental metabolomics, increasingly rely on untargeted metabolomics to gain biochemical and biological insights. The technical advances described here will benefit all of these metabolomics disciplines by transforming spectral data into knowledge that can answer biological questions.
Affiliation(s)
- Mehdi A Beniddir
  - Université Paris-Saclay, CNRS, BioCIS, 5 rue J.-B Clément, 92290 Châtenay-Malabry, France
- Kyo Bin Kang
  - Research Institute of Pharmaceutical Sciences, College of Pharmacy, Sookmyung Women's University, Seoul 04310, Republic of Korea
- Grégory Genta-Jouve
  - Laboratoire de Chimie-Toxicologie Analytique et Cellulaire (C-TAC), UMR CNRS 8038, CiTCoM, Université de Paris, 4, Avenue de l'Observatoire, 75006, Paris, France
  - Laboratoire Ecologie, Evolution, Interactions des Systèmes Amazoniens (LEEISA), USR 3456, Université De Guyane, CNRS Guyane, 275 Route de Montabo, 97334 Cayenne, French Guiana, France
- Florian Huber
  - Netherlands eScience Center, 1098 XG Amsterdam, The Netherlands
- Simon Rogers
  - School of Computing Science, University of Glasgow, Glasgow G12 8QQ, UK