1
Shang H, Wu Q, Wu J, Zhou S, Wang Z, Wang H, Yin J. Study on breast cancerization and isolated diagnosis in situ by HOF-ATR-MIR spectroscopy with deep learning. Spectrochimica Acta. Part A, Molecular and Biomolecular Spectroscopy 2024; 319:124546. [PMID: 38824755 DOI: 10.1016/j.saa.2024.124546] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2024] [Revised: 05/20/2024] [Accepted: 05/26/2024] [Indexed: 06/04/2024]
Abstract
Mid-infrared (MIR) spectroscopy can characterize the content and structural changes of macromolecular components in different breast tissues, which can be used for feature extraction and model training by machine learning to achieve accurate classification and recognition of different breast tissues. In parallel, the one-dimensional convolutional neural network (1D-CNN) stands out in the field of deep learning for its ability to efficiently process sequential data, such as spectroscopic signals. In this study, MIR spectra of breast tissue were collected in situ by coupling the self-developed MIR hollow optical fiber attenuated total reflection (HOF-ATR) probe with a Fourier transform infrared (FTIR) spectrometer. Staging analysis was conducted on the changes in macromolecular content and structure in breast cancer tissues. For the first time, a ternary classification model was established based on 1D-CNN for recognizing normal, paracancerous and cancerous tissues. The final prediction results reveal that the 1D-CNN model based on baseline correction (BC) and data augmentation yields more precise classification results, with a total accuracy of 95.09%, exhibiting better discrimination ability than the machine learning models SVM-DA (90.00%), SVR (88.89%), PCA-FDA (67.78%) and PCA-KNN (70.00%). The experimental results suggest that the application of 1D-CNN enables accurate classification and recognition of different breast tissues, and that it can be considered a precise, efficient and intelligent novel method for breast cancer diagnosis.
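To make the role of the 1D-CNN more concrete, the following is a minimal pure-Python sketch of the operations such a network applies to a spectrum: a 1D convolution, a ReLU activation, and max pooling. It is not the authors' model. The toy spectrum, the fixed kernel, and the function names `conv1d`, `relu`, and `max_pool` are assumptions made for illustration; a real network would learn its kernels from training data.

```python
def conv1d(signal, kernel):
    """Valid (no padding) 1D cross-correlation, as used in CNN layers."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def relu(values):
    """Keep positive responses, zero out negative ones."""
    return [max(0, v) for v in values]


def max_pool(values, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


# Hypothetical toy "spectrum" with two absorption peaks (not real MIR data).
spectrum = [0, 0, 1, 3, 1, 0, 0, 2, 0, 0]
# Hand-picked kernel that responds to local maxima; a trained 1D-CNN would
# learn its kernel weights instead.
peak_kernel = [-1, 2, -1]

feature_map = max_pool(relu(conv1d(spectrum, peak_kernel)), size=2)
print(feature_map)  # [0, 4, 0, 4]
```

The two nonzero entries of the pooled feature map mark the two peaks of the toy spectrum. In a classifier, several such feature maps would typically be flattened and passed to fully connected layers.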
Affiliation(s)
- Hui Shang
  - Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
- Qingxia Wu
  - Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
- Jinjin Wu
  - Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
- Suwei Zhou
  - Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
- Zihan Wang
  - Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
- Huijie Wang
  - Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
- Jianhua Yin
  - Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
2
Lu XY, Wu HP, Ma H, Li H, Li J, Liu YT, Pan ZY, Xie Y, Wang L, Ren B, Liu GK. Deep Learning-Assisted Spectrum-Structure Correlation: State-of-the-Art and Perspectives. Anal Chem 2024; 96:7959-7975. [PMID: 38662943 DOI: 10.1021/acs.analchem.4c01639] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/22/2024]
Abstract
Spectrum-structure correlation is playing an increasingly crucial role in spectral analysis and has undergone significant development in recent decades. With the advancement of spectrometers, high-throughput detection has triggered explosive growth in spectral data, and the extension of research from small molecules to biomolecules brings with it a massive chemical space. Facing this evolving landscape, conventional chemometrics has become ill-equipped, and deep-learning-assisted chemometrics is rapidly emerging as a flourishing approach with a superior ability to extract latent features and make precise predictions. In this review, molecular and spectral representations and the fundamentals of deep learning are first introduced. We then summarize how deep learning has helped establish the correlation between spectrum and molecular structure over the past 5 years, by empowering spectral prediction (i.e., forward structure-spectrum correlation) and further enabling library matching and de novo molecular generation (i.e., inverse spectrum-structure correlation). Finally, we highlight the most important persisting open issues, along with potential solutions. With the fast development of deep learning, an ultimate solution for establishing spectrum-structure correlation is expected soon, which would trigger substantial development in various disciplines.
Affiliation(s)
- Xin-Yu Lu
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
  - Tan Kah Kee Innovation Laboratory, Xiamen 361005, P. R. China
- Hao-Ping Wu
  - State Key Laboratory of Marine Environmental Science, Fujian Provincial Key Laboratory for Coastal Ecology and Environmental Studies, Center for Marine Environmental Chemistry & Toxicology, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian 361102, P. R. China
- Hao Ma
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
  - Tan Kah Kee Innovation Laboratory, Xiamen 361005, P. R. China
- Hui Li
  - Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, Xiamen 361005, P. R. China
- Jia Li
  - Institute of Artificial Intelligence, Xiamen University, Xiamen 361005, P. R. China
- Yan-Ti Liu
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
  - Tan Kah Kee Innovation Laboratory, Xiamen 361005, P. R. China
- Zheng-Yan Pan
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
- Yi Xie
  - School of Informatics, Xiamen University, Xiamen 361005, P. R. China
- Lei Wang
  - Pen-Tung Sah Institute of Micro-Nano Science and Technology, Xiamen University, Xiamen 361005, P. R. China
- Bin Ren
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
  - Tan Kah Kee Innovation Laboratory, Xiamen 361005, P. R. China
- Guo-Kun Liu
  - State Key Laboratory of Marine Environmental Science, Fujian Provincial Key Laboratory for Coastal Ecology and Environmental Studies, Center for Marine Environmental Chemistry & Toxicology, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian 361102, P. R. China
3
Venetos MC, Elkin M, Delaney C, Hartwig JF, Persson KA. Deconvolution and Analysis of the 1H NMR Spectra of Crude Reaction Mixtures. J Chem Inf Model 2024; 64:3008-3020. [PMID: 38573053 DOI: 10.1021/acs.jcim.3c01864] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/05/2024]
Abstract
Nuclear magnetic resonance (NMR) spectroscopy is an important analytical technique in synthetic organic chemistry, but its integration into high-throughput experimentation workflows has been limited by the necessity of manually analyzing the NMR spectra of new chemical entities. Current efforts to automate the analysis of NMR spectra rely on comparisons to databases of reported spectra for known compounds and, therefore, are incompatible with the exploration of new chemical space. By reframing the NMR spectrum of a reaction mixture as a joint probability distribution, we have used Hamiltonian Monte Carlo Markov Chain and density functional theory to fit the predicted NMR spectra to those of crude reaction mixtures. This approach enables the deconvolution and analysis of the spectra of mixtures of compounds without relying on reported spectra. The utility of our approach to analyze crude reaction mixtures is demonstrated with the experimental spectra of reactions that generate a mixture of isomers, such as Wittig olefination and C-H functionalization reactions. The correct identification of compounds in a reaction mixture and their relative concentrations is achieved with a mean absolute error as low as 1%.
Affiliation(s)
- Maxwell C Venetos
  - Department of Materials Science and Engineering, University of California, Berkeley, California 94720, United States
- Masha Elkin
  - Department of Chemistry, University of California, Berkeley, California 94720, United States
- Connor Delaney
  - Department of Chemistry, University of California, Berkeley, California 94720, United States
- John F Hartwig
  - Department of Chemistry, University of California, Berkeley, California 94720, United States
- Kristin A Persson
  - Department of Materials Science and Engineering, University of California, Berkeley, California 94720, United States
  - Molecular Foundry, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
4
Guo Z, Fan Y, Yu C, Lu H, Zhang Z. GCMSFormer: A Fully Automatic Method for the Resolution of Overlapping Peaks in Gas Chromatography-Mass Spectrometry. Anal Chem 2024; 96:5878-5886. [PMID: 38560891 DOI: 10.1021/acs.analchem.3c05772] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
Gas chromatography-mass spectrometry (GC-MS) is one of the most important instruments for analyzing volatile organic compounds. However, the complexity of real samples and the limitations of chromatographic separation capabilities lead to coeluting compounds without ideal separation. In this study, a Transformer-based automatic resolution method (GCMSFormer) is proposed to resolve mass spectra from GC-MS peaks in an end-to-end manner, predicting the mass spectra of components directly from the raw overlapping peaks data. Furthermore, orthogonal projection resolution (OPR) was integrated into GCMSFormer to resolve minor components. The GCMSFormer model was trained, validated, and tested using 100,000 augmented data. It achieves 99.88% of the bilingual evaluation understudy (BLEU) value on the test set, significantly higher than the 97.68% BLEU value of the baseline sequence-to-sequence model long short-term memory (LSTM). GCMSFormer was also compared with two nondeep learning resolution tools (MZmine and AMDIS) and two deep learning resolution tools (PARAFAC2 with DL and MSHub/GNPS) on a real plant essential oil GC-MS data set. Their resolution results were compared on evaluation metrics, including the number of compounds resolved, mass spectral match score, correlation coefficient, explained variance, and resolution speed. The results demonstrate that GCMSFormer has better resolution performance, higher automation, and faster resolution speed. In summary, GCMSFormer is an end-to-end, fast, fully automatic, and accurate method for analyzing GC-MS data of complex samples.
Affiliation(s)
- Zixuan Guo
  - College of Chemistry and Chemical Engineering, Central South University, Hunan, Changsha 410083, China
- Yingjie Fan
  - College of Chemistry and Chemical Engineering, Central South University, Hunan, Changsha 410083, China
- Chuanxiu Yu
  - College of Chemistry and Chemical Engineering, Central South University, Hunan, Changsha 410083, China
- Hongmei Lu
  - College of Chemistry and Chemical Engineering, Central South University, Hunan, Changsha 410083, China
- Zhimin Zhang
  - College of Chemistry and Chemical Engineering, Central South University, Hunan, Changsha 410083, China
5
Hu G, Qiu M. Machine learning-assisted structure annotation of natural products based on MS and NMR data. Nat Prod Rep 2023; 40:1735-1753. [PMID: 37519196 DOI: 10.1039/d3np00025g] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/01/2023]
Abstract
Covering: up to March 2023. Machine learning (ML) has emerged as a popular tool for analyzing the structures of natural products (NPs). This review presents a summary of the recent advancements in ML-assisted mass spectrometry (MS) and nuclear magnetic resonance (NMR) data analysis to establish the chemical structures of NPs. First, ML-based MS/MS analyses that rely on library matching are discussed, which involve the use of ML algorithms to calculate similarity, predict MS/MS fragments, and form molecular fingerprints. Then, ML-assisted MS/MS structural annotation without library matching is reviewed. Furthermore, cases of ML algorithms assisting structural studies of NPs based on NMR are discussed from four perspectives: NMR prediction, functional group identification, structural categorization and quantum chemical calculation. Finally, the review concludes with a discussion of the challenges and trends associated with the structural establishment of NPs based on ML algorithms.
Affiliation(s)
- Guilin Hu
  - State Key Laboratory of Phytochemistry and Plant Resources in West China, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming 650201, Yunnan, China
  - University of the Chinese Academy of Sciences, Beijing 100049, People's Republic of China
- Minghua Qiu
  - State Key Laboratory of Phytochemistry and Plant Resources in West China, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming 650201, Yunnan, China
  - University of the Chinese Academy of Sciences, Beijing 100049, People's Republic of China
6
Wang Y, Wei W, Du W, Cai J, Liao Y, Lu H, Kong B, Zhang Z. Deep-Learning-Based Mixture Identification for Nuclear Magnetic Resonance Spectroscopy Applied to Plant Flavors. Molecules 2023; 28:7380. [PMID: 37959799 PMCID: PMC10648966 DOI: 10.3390/molecules28217380] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2023] [Revised: 10/25/2023] [Accepted: 10/30/2023] [Indexed: 11/15/2023] Open
Abstract
Nuclear magnetic resonance (NMR) is a crucial technique for analyzing mixtures consisting of small molecules, providing non-destructive, fast, reproducible, and unbiased benefits. However, it is challenging to perform mixture identification because of the offset of chemical shifts and peak overlaps that often exist in mixtures such as plant flavors. Here, we propose a deep-learning-based mixture identification method (DeepMID) that can be used to identify plant flavors (mixtures) in a formulated flavor (mixture consisting of several plant flavors) without the need to know the specific components in the plant flavors. A pseudo-Siamese convolutional neural network (pSCNN) and a spatial pyramid pooling (SPP) layer were used to solve the problems due to their high accuracy and robustness. The DeepMID model is trained, validated, and tested on an augmented data set containing 50,000 pairs of formulated and plant flavors. We demonstrate that DeepMID can achieve excellent prediction results in the augmented test set: ACC = 99.58%, TPR = 99.48%, FPR = 0.32%; and two experimentally obtained data sets: one shows ACC = 97.60%, TPR = 92.81%, FPR = 0.78% and the other shows ACC = 92.31%, TPR = 80.00%, FPR = 0.00%. In conclusion, DeepMID is a reliable method for identifying plant flavors in formulated flavors based on NMR spectroscopy, which can assist researchers in accelerating the design of flavor formulations.
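The spatial pyramid pooling (SPP) layer mentioned in the abstract can be illustrated with a short sketch. SPP pools a feature sequence of any length into a fixed-length vector by taking the maximum over progressively finer sets of bins. The version below is a simplified 1D, pure-Python illustration, not the DeepMID implementation; the function name `spp_1d` and the pyramid levels `(1, 2, 4)` are assumptions.

```python
def spp_1d(features, levels=(1, 2, 4)):
    """Max-pool a 1D feature sequence into sum(levels) values.

    For each pyramid level, the sequence is split into that many bins
    (neighbouring bins may overlap by one element when the length is not
    divisible) and the maximum of each bin is kept.
    """
    n = len(features)
    if n == 0:
        raise ValueError("features must not be empty")
    pooled = []
    for bins in levels:
        for b in range(bins):
            start = (b * n) // bins                   # floor(b * n / bins)
            end = ((b + 1) * n + bins - 1) // bins    # ceil((b + 1) * n / bins)
            pooled.append(max(features[start:end]))
    return pooled


# Two hypothetical feature sequences of different lengths yield vectors of
# the same length (1 + 2 + 4 = 7), which is what lets a fixed-size dense
# layer follow convolutional layers fed with variable-length spectra.
print(spp_1d([1, 2, 3, 4, 5, 6, 7, 8]))  # [8, 4, 8, 2, 4, 6, 8]
print(spp_1d([3, 1, 4, 1, 5]))           # [5, 4, 5, 3, 4, 4, 5]
```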
Affiliation(s)
- Yufei Wang
  - College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Weiwei Wei
  - Technology Center, China Tobacco Hunan Industrial Co., Ltd., Changsha 410014, China
- Wen Du
  - Technology Center, China Tobacco Hunan Industrial Co., Ltd., Changsha 410014, China
- Jiaxiao Cai
  - Technology Center, China Tobacco Hunan Industrial Co., Ltd., Changsha 410014, China
- Yuxuan Liao
  - College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Hongmei Lu
  - College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
- Bo Kong
  - Technology Center, China Tobacco Hunan Industrial Co., Ltd., Changsha 410014, China
- Zhimin Zhang
  - College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
7
Sampiron EG, Calsavara LL, Baldin VP, Montaholi DC, Leme ALD, Namba DY, Alves Olher VG, Caleffi-Ferraciolli KR, Cardoso RF, Siqueira VLD, Vandresen F, Scodro RBDL. Isoniazid-N-acylhydrazones as promising compounds for the anti-tuberculosis treatment. Tuberculosis (Edinb) 2023; 141:102363. [PMID: 37311289 DOI: 10.1016/j.tube.2023.102363] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2023] [Revised: 05/29/2023] [Accepted: 06/05/2023] [Indexed: 06/15/2023]
Abstract
Tuberculosis (TB), a disease caused by Mycobacterium tuberculosis complex, still presents significant numbers of incidence and mortality, in addition to several cases of drug resistance. Resistance, especially to isoniazid, which is one of the main drugs used in the treatment, has increased. In this context, N-acylhydrazones derived from isoniazid have shown important anti-Mycobacterium tuberculosis activity. Hence, this work aimed to determine the anti-TB potential of 11 isoniazid-N-acylhydrazones (INH-acylhydrazones). For this purpose, the determination of minimum inhibitory concentration (MIC) against M. tuberculosis H37Rv and clinical isolates was carried out. Drug combination, minimum bactericidal concentration, cytotoxicity, and in silico parameters were also performed. INH-acylhydrazones (2), (8), and (9) had MIC for M. tuberculosis H37Rv similar to or lower than isoniazid, and bactericidal activity was observed. In addition, these compounds showed low cytotoxicity, with a selectivity index greater than 3,000. Interesting results were also obtained in the drug combination assay, with synergistic combinations with isoniazid, ethambutol, and rifampicin. In the in silico study, INH-acylhydrazones behaved similarly to INH, but with improvements in some aspects. Based on these findings, it is concluded that compounds (2), (8), and (9) are considered promising scaffolds and warrant further investigation for designing future antimicrobial drugs.
Affiliation(s)
- Eloísa Gibin Sampiron
  - Postgraduate Program in Health Sciences, State University of Maringá (UEM), Maringá, Paraná, 87020-900, Brazil
- Débora Cássia Montaholi
  - Postgraduate Program in Health Sciences, State University of Maringá (UEM), Maringá, Paraná, 87020-900, Brazil
- Danillo Yuji Namba
  - Department of Chemistry, Federal Technological University of Paraná, Londrina, Paraná, 86057-970, Brazil
- Katiany Rizzieri Caleffi-Ferraciolli
  - Postgraduate Program in Bioscience and Physiopathology, UEM, Maringá, Paraná, 87020-900, Brazil; Department of Clinical Analysis and Biomedicine, UEM, Maringá, Paraná, 87020-900, Brazil
- Rosilene Fressatti Cardoso
  - Postgraduate Program in Health Sciences, State University of Maringá (UEM), Maringá, Paraná, 87020-900, Brazil; Postgraduate Program in Bioscience and Physiopathology, UEM, Maringá, Paraná, 87020-900, Brazil; Department of Clinical Analysis and Biomedicine, UEM, Maringá, Paraná, 87020-900, Brazil
- Vera Lucia Dias Siqueira
  - Postgraduate Program in Bioscience and Physiopathology, UEM, Maringá, Paraná, 87020-900, Brazil; Department of Clinical Analysis and Biomedicine, UEM, Maringá, Paraná, 87020-900, Brazil
- Fábio Vandresen
  - Department of Chemistry, Federal Technological University of Paraná, Londrina, Paraná, 86057-970, Brazil
- Regiane Bertin de Lima Scodro
  - Postgraduate Program in Health Sciences, State University of Maringá (UEM), Maringá, Paraná, 87020-900, Brazil; Department of Clinical Analysis and Biomedicine, UEM, Maringá, Paraná, 87020-900, Brazil
8
Tian X, Wang P, Tian Y, Zhang R, Jiang Z, Gao J. Classification method based on Siamese-like neural network for inter-species blood Raman spectra similarity measure. Journal of Biophotonics 2023; 16:e202200377. [PMID: 36906736 DOI: 10.1002/jbio.202200377] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2022] [Revised: 03/07/2023] [Accepted: 03/08/2023] [Indexed: 06/07/2023]
Abstract
Analysis of blood species is an extremely important part in customs inspection, forensic investigation, wildlife protection and other fields. In this study, a classification method based on Siamese-like neural network (SNN) for interspecies blood (22 species) was proposed to measure Raman Spectra similarity. The average accuracy was above 99.20% in the test set of spectra (known species) that did not appear in the training set. This model could detect species not represented in the dataset underlying the model. After adding new species to the training set, we can update the training based on the original model without retraining the model from scratch. For species with lower accuracy, SNN model can be trained intensively in the form of enriched training data for that species. A single model can achieve both multiple-classification and binary classification functions. Moreover, SNN showed higher accuracy rates when trained with smaller datasets compared to other methods.
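The similarity-based classification described in this abstract can be sketched as follows: a query spectrum is compared against one reference per known species, and it is labeled "unknown" when no similarity reaches a threshold. This is a simplified stand-in rather than the authors' SNN. Cosine similarity on hand-written vectors replaces the learned similarity measure, and the names `cosine_similarity` and `classify`, the reference vectors, and the threshold of 0.8 are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(query, references, threshold=0.8):
    """Return the most similar reference label, or "unknown" below threshold."""
    best_label, best_score = "unknown", threshold
    for label, reference in references.items():
        score = cosine_similarity(query, reference)
        if score >= best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical embedding vectors; a trained network would produce these
# from Raman spectra.
references = {"human": [1.0, 0.0, 0.2], "dog": [0.0, 1.0, 0.1]}

print(classify([0.9, 0.1, 0.2], references))    # human
print(classify([-1.0, -1.0, 0.0], references))  # unknown

# A new species is supported by adding a reference, without changing the
# comparison logic.
references["cat"] = [-1.0, -0.9, 0.0]
print(classify([-1.0, -1.0, 0.0], references))  # cat
```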
Affiliation(s)
- Xianli Tian
  - School of Biomedical Engineering (Suzhou), Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, 230026, China
  - Jiangsu Key Laboratory of Medical Optics, Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Suzhou, Jiangsu, 215163, China
- Peng Wang
  - Jiangsu Key Laboratory of Medical Optics, Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Suzhou, Jiangsu, 215163, China
- Yubing Tian
  - School of Biomedical Engineering (Suzhou), Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, 230026, China
  - Jiangsu Key Laboratory of Medical Optics, Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Suzhou, Jiangsu, 215163, China
- Rui Zhang
  - School of Biomedical Engineering (Suzhou), Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, 230026, China
  - Jiangsu Key Laboratory of Medical Optics, Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Suzhou, Jiangsu, 215163, China
- Zhehan Jiang
  - School of Biomedical Engineering (Suzhou), Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, 230026, China
  - Jiangsu Key Laboratory of Medical Optics, Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Suzhou, Jiangsu, 215163, China
- Jing Gao
  - School of Biomedical Engineering (Suzhou), Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, 230026, China
  - Jiangsu Key Laboratory of Medical Optics, Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Suzhou, Jiangsu, 215163, China
9
Baxter JR, Holland DC, Gavranich B, Nicolle D, Hayton JB, Avery VM, Carroll AR. NMR Fingerprints of Formyl Phloroglucinol Meroterpenoids and Their Application to the Investigation of Eucalyptus gittinsii subsp. gittinsii. Journal of Natural Products 2023; 86:1317-1334. [PMID: 37171174 DOI: 10.1021/acs.jnatprod.3c00139] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/13/2023]
Abstract
NMR fingerprints provide powerful tools to identify natural products in complex mixtures. Principal component analysis and machine learning using 1H and 13C NMR data, alongside structural information from 180 published formyl phloroglucinols, have generated diagnostic NMR fingerprints to categorize subclasses within this group. This resulted in the reassignment of 167 NMR chemical shifts ascribed to 44 compounds. Three pyrano-diformyl phloroglucinols, euglobal In-1 and psiguadiols E and G, contained 1H and 13C NMR data inconsistent with their predicted phloroglucinol subclass. Subsequent reinterpretation of their 2D NMR data combined with DFT 13C NMR chemical shift and ECD calculations led to their structure revisions. Direct covariance processing of HMBC data permitted 1H resonances for individual compounds in mixtures to be associated, and analysis of their 1H/13C HMBC correlations using the fingerprint tool further classified components into phloroglucinol subclasses. NMR fingerprinting HMBC data obtained for six eucalypt flower extracts identified three subclasses of pyrano-acyl-formyl phloroglucinols from Eucalyptus gittinsii subsp. gittinsii. New, eucalteretial F and (+)-eucalteretial B, and known, (-)-euglobal VII and eucalrobusone C, compounds, each belonging to predicted subclasses, were isolated and characterized. Staphylococcus aureus and Plasmodium falciparum screening revealed eucalrobusone C as the most potent antiplasmodial formyl phloroglucinol to date.
Affiliation(s)
- James R Baxter
  - School of Environment and Science, Griffith University, Gold Coast, Qld 4222, Australia
- Darren C Holland
  - School of Environment and Science, Griffith University, Gold Coast, Qld 4222, Australia
- Brody Gavranich
  - School of Environment and Science, Griffith University, Gold Coast, Qld 4222, Australia
- Dean Nicolle
  - Currency Creek Arboretum, PO Box 808, Melrose Park, SA 5039, Australia
- Joshua B Hayton
  - School of Environment and Science, Griffith University, Gold Coast, Qld 4222, Australia
- Vicky M Avery
  - Griffith Institute for Drug Discovery, Griffith University, Brisbane, Qld 4111, Australia
  - Discovery Biology, Griffith University, Brisbane, QLD 4111, Australia
- Anthony R Carroll
  - School of Environment and Science, Griffith University, Gold Coast, Qld 4222, Australia
  - Griffith Institute for Drug Discovery, Griffith University, Brisbane, Qld 4111, Australia
10
Wang W, Ma LH, Maletic-Savatic M, Liu Z. NMRQNet: a deep learning approach for automatic identification and quantification of metabolites using Nuclear Magnetic Resonance (NMR) in human plasma samples. bioRxiv: The Preprint Server for Biology 2023:2023.03.01.530642. [PMID: 36909516 PMCID: PMC10002723 DOI: 10.1101/2023.03.01.530642] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/06/2023]
Abstract
Nuclear Magnetic Resonance is a powerful platform that reveals the metabolomics profiles within biofluids or tissues and contributes to personalized treatments in medical practice. However, data volume and complexity hinder the exploration of NMR spectra. In addition, the lack of fast and accurate computational tools that can handle the automatic identification and quantification of essential metabolites from NMR spectra slows the wide application of these techniques in clinical settings. We present NMRQNet, a deep-learning-based pipeline for automatic identification and quantification of dominant metabolite candidates within human plasma samples. The estimated relative concentrations could be further applied in statistical analysis to extract potential biomarkers. We evaluate our method on multiple plasma samples, including species from mice to humans, curated using three anticoagulants, covering healthy and patient conditions in neurological disorder disease, greatly expanding the metabolomics analytical space in plasma. NMRQNet accurately reconstructed the original spectra and obtained significantly better quantification results than earlier computational methods. NMRQNet also proposed relevant metabolite biomarkers that could potentially explain the risk factors associated with the condition. NMRQNet, with improved prediction performance, highlights the limitations of existing approaches and has shown strong application potential for future metabolomics disease studies using plasma samples.
Affiliation(s)
- Wanli Wang
  - Jan and Dan Duncan Neurological Research Institute at Texas Children’s Hospital, Houston, TX, 77030, USA
  - Graduate Program of Quantitative & Computational Biosciences, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
- Li-Hua Ma
  - Advanced Technology Cores, Baylor College of Medicine, Houston, TX, 77030, USA
- Mirjana Maletic-Savatic
  - Jan and Dan Duncan Neurological Research Institute at Texas Children’s Hospital, Houston, TX, 77030, USA
  - Department of Pediatrics-Neurology, Baylor College of Medicine, Houston, TX, 77030, USA
- Zhandong Liu
  - Jan and Dan Duncan Neurological Research Institute at Texas Children’s Hospital, Houston, TX, 77030, USA
  - Department of Pediatrics-Neurology, Baylor College of Medicine, Houston, TX, 77030, USA
11
Shang H, Shang L, Wu J, Xu Z, Zhou S, Wang Z, Wang H, Yin J. NIR spectroscopy combined with 1D-convolutional neural network for breast cancerization analysis and diagnosis. Spectrochimica Acta. Part A, Molecular and Biomolecular Spectroscopy 2023; 287:121990. [PMID: 36327802 DOI: 10.1016/j.saa.2022.121990] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 06/20/2022] [Revised: 10/05/2022] [Accepted: 10/10/2022] [Indexed: 06/16/2023]
Abstract
Near-infrared (NIR) spectroscopy with deep penetration can characterize the composition of biological tissue based on the vibration of the X-H group in a rapid and high-specificity way. Deep learning is proven helpful for rapid and automatic identification of tissue cancerization. In this study, NIR spectroscopic detection equipped with the lab-made NIR probe was performed to in situ explore the change of molecular compositions in breast cancerization, where the diffused NIR spectra were efficiently collected at different locations of cancerous and paracancerous areas. The breast cancerous-paracancerous discriminant model was established based on one-dimensional convolutional neural network (1D-CNN). By optimizing the structure of the neural network, the high classification accuracy (94.67%), recall/sensitivity (95.33%), specificity (94.00%), precision (94.08%) and F1 score (0.9470) were achieved, showing the better discrimination ability and reliability than the K-Nearest Neighbor (KNN, 88.34%, 98.21%, 76.11%, 83.59%, 0.9031) and Fisher Discriminant Analysis (FDA, 90.00%, 96.43%, 81.82%, 87.10%, 0.9153) methods. The experimental results indicate that the application of 1D-CNN can discriminate the cancerous and paracancerous breast tissues, and provide an intelligent method for clinical locating, diagnosis and treatment of breast cancer.
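The five figures reported for each model (accuracy, recall/sensitivity, specificity, precision, F1 score) all follow from a binary confusion matrix. The sketch below shows the relationships. The counts are hypothetical: they assume a balanced test set of 150 cancerous and 150 paracancerous spectra, and were chosen only because they happen to reproduce the percentages reported for the 1D-CNN. The abstract does not give the actual confusion matrix, and the function name `binary_metrics` is an assumption.

```python
def binary_metrics(tp, fn, tn, fp):
    """Standard binary classification metrics from confusion-matrix counts."""
    recall = tp / (tp + fn)            # sensitivity: positives found
    specificity = tn / (tn + fp)       # negatives correctly rejected
    precision = tp / (tp + fp)         # predicted positives that are correct
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts (cancerous = positive class), not taken from the paper.
m = binary_metrics(tp=143, fn=7, tn=141, fp=9)
for name in ("accuracy", "recall", "specificity", "precision"):
    print(f"{name}: {100 * m[name]:.2f}%")
print(f"f1: {m['f1']:.4f}")
```

With these counts the script prints 94.67%, 95.33%, 94.00%, 94.08% and 0.9470, matching the values quoted for the 1D-CNN.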
Affiliation(s)
- Hui Shang
- Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
- Linwei Shang
- Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
- Jinjin Wu
- Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
- Zhibing Xu
- Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
- Suwei Zhou
- Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
- Zihan Wang
- Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
- Huijie Wang
- Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
- Jianhua Yin
- Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
12
Fan Y, Yu C, Lu H, Chen Y, Hu B, Zhang X, Su J, Zhang Z. Deep learning-based method for automatic resolution of gas chromatography-mass spectrometry data from complex samples. J Chromatogr A 2023; 1690:463768. [PMID: 36641940 DOI: 10.1016/j.chroma.2022.463768] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2022] [Revised: 12/21/2022] [Accepted: 12/28/2022] [Indexed: 12/31/2022]
Abstract
Modern gas chromatography-mass spectrometry (GC-MS) is the workhorse for high-throughput profiling of volatile compounds in complex samples. It can produce a considerable amount of two-dimensional data, and automatic methods are required to distill chemical information from raw GC-MS data efficiently. In this study, we proposed an Automatic Resolution method (AutoRes) based on pseudo-Siamese convolutional neural networks (pSCNN) to extract the meaningful features swamped by noise, baseline drift, retention time shifts, and overlapped peaks. Two pSCNN models were each trained with 400,000 augmented spectral pairs. They predict the selective region (pSCNN1) and the elution region (pSCNN2) of compounds in an untargeted manner. The accuracies of the pSCNN1 and pSCNN2 models on their test sets are 99.9% and 92.6%, respectively. The chromatographic profile of each component was then automatically resolved by full rank resolution (FRR) based on the regions predicted by these models. The performance of AutoRes was evaluated on simulated and plant essential oil datasets. Compared to AMDIS and MZmine, AutoRes resolves more reasonable mass spectra, chromatograms, and peak areas for identifying and quantifying compounds. When resolving mass spectra from overlapped peaks in Set Ⅰ and Set Ⅱ of the plant essential oil dataset and matching them against the NIST17 library, the average match scores of AutoRes (925 and 936) exceeded those of AMDIS (909 and 925) and MZmine (888 and 916). AutoRes extracted peak areas and mass spectra automatically from 10 GC-MS files of plant essential oils, and the entire process was completed in 8 min without any prior information or manual intervention. It is implemented in Python and is available as an open-source package at https://github.com/dyjfan/AutoRes.
Affiliation(s)
- Yingjie Fan
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, Hunan, China
- Chuanxiu Yu
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, Hunan, China
- Hongmei Lu
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, Hunan, China
- Yi Chen
- Yunnan Academy of Tobacco Agricultural Sciences, Kunming 650021, Yunnan, China
- Binbin Hu
- Yunnan Academy of Tobacco Agricultural Sciences, Kunming 650021, Yunnan, China
- Xingren Zhang
- Yunnan Academy of Tobacco Agricultural Sciences, Kunming 650021, Yunnan, China; Baoshan City Branch of Yunnan Tobacco Company, Baoshan 678000, Yunnan, China
- Jiaen Su
- Dali Prefecture Branch of Yunnan Tobacco Company, Dali 671000, Yunnan, China
- Zhimin Zhang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, Hunan, China
13
Unsupervised Analysis of Small Molecule Mixtures by Wavelet-Based Super-Resolved NMR. MOLECULES (BASEL, SWITZERLAND) 2023; 28:molecules28020792. [PMID: 36677850 PMCID: PMC9866129 DOI: 10.3390/molecules28020792] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2022] [Revised: 12/27/2022] [Accepted: 01/03/2023] [Indexed: 01/15/2023]
Abstract
Resolving small molecule mixtures by nuclear magnetic resonance (NMR) spectroscopy has long been of great interest because of its precision, reproducibility, and efficiency. However, spectral analysis of such mixtures is often highly challenging due to overlapping resonance lines and limited chemical shift windows. Existing experimental and theoretical methods that produce shift NMR spectra to address the problem have limited applicability owing to sensitivity issues, inconsistency, and/or the requirement of prior knowledge. Recently, we resolved the problem by decoupling multiplet structures in NMR spectra using the wavelet packet transform (WPT) technique. In this work, we developed a scheme for deploying the method to generate highly resolved WPT NMR spectra and to predict the composition of the corresponding molecular mixtures from their ¹H NMR spectra in an automated fashion. The four-step spectral analysis scheme consists of calculating the WPT spectrum, peak matching with a WPT shift NMR library, and two optimization steps that produce the predicted molecular composition of a mixture. The robustness of the method was tested on an augmented dataset of 1000 molecular mixtures, each containing 3 to 7 molecules. The method successfully predicted the constituent molecules with a median true positive rate of 1.0 across the varying compositions and a median false positive rate of 0.04. The approach can be scaled easily to much larger datasets.
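To make the reported true and false positive rates concrete, the sketch below scores one predicted mixture composition against the known one. It rests on assumptions: the abstract does not define the rates, so the common set-based definitions are used here, with the false positive rate taken over library molecules that are absent from the mixture. The library size, the molecule names and the function `composition_rates` are all hypothetical.

```python
def composition_rates(predicted, actual, library):
    """True/false positive rates for a predicted set of mixture components.

    Assumed definitions (not stated in the abstract):
      TPR = correctly predicted molecules / molecules actually present
      FPR = wrongly predicted molecules / library molecules actually absent
    """
    predicted, actual, library = set(predicted), set(actual), set(library)
    absent = library - actual
    tpr = len(predicted & actual) / len(actual)
    fpr = len(predicted & absent) / len(absent) if absent else 0.0
    return tpr, fpr


# Hypothetical 30-entry reference library and a 5-component mixture.
library = [f"mol_{i}" for i in range(30)]
actual = {"mol_0", "mol_1", "mol_2", "mol_3", "mol_4"}
# The prediction recovers every true component plus one spurious molecule.
predicted = actual | {"mol_9"}

tpr, fpr = composition_rates(predicted, actual, library)
print(f"TPR = {tpr:.2f}, FPR = {fpr:.2f}")
```

In this illustrative case every true component is found and one of the 25 absent library molecules is wrongly included. The median over a dataset of mixtures would be taken across such per-mixture values.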