1
Zhao Y, Lv W, Zhang Y, Tang M, Wang H. Enhanced data preprocessing with novel window function in Raman spectroscopy: Leveraging feature selection and machine learning for raspberry origin identification. Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy 2024; 323:124913. [PMID: 39126867 DOI: 10.1016/j.saa.2024.124913] [Received: 04/05/2024] [Revised: 07/14/2024] [Accepted: 07/29/2024] [Indexed: 08/12/2024]
Abstract
In this study, a simple and accurate approach is proposed for enhancing the origin identification of raspberry samples using a combination of innovative Raman spectral preprocessing techniques, feature selection, and machine learning algorithms. A window function was introduced and combined with a baseline removal technique to preprocess the Raman spectral data, reducing the dimensionality of the raw data while ensuring the quality of the processed data. An optimization process was conducted to determine the optimal parameter for the window function; a binning window width of 5 yielded the highest accuracy. Of the three feature selection techniques applied, the information gain model performed best at extracting discriminative spectral features. Finally, ten different machine learning algorithms were employed to construct predictive models, and the optimal models were selected. Linear Support Vector Classifier (LinearSVC), Multi-Layer Perceptron Classifier (MLPClassifier), and Linear Discriminant Analysis (LDA) achieved accuracy, precision, recall, and F1 values above 0.96, while the Random Vector Functional Link Network Classifier (RVFLClassifier) surpassed 0.93 on these performance metrics. These results demonstrate the effectiveness of the proposed approach in identifying the origin of raspberry samples with high accuracy and robustness, providing a valuable tool for agricultural product authentication and quality control.
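The binning step mentioned in the abstract can be illustrated with a minimal sketch. The abstract only states that a binning window of width 5 worked best; the choice to average non-overlapping windows, the handling of a shorter trailing window, and the name `bin_spectrum` are assumptions made here for illustration and are not taken from the paper.

```python
def bin_spectrum(intensities, width=5):
    """Average consecutive, non-overlapping windows of `width` points.

    Assumption: windows do not overlap and a shorter trailing window
    is averaged over the points it actually contains.
    """
    if width < 1:
        raise ValueError("width must be a positive integer")
    binned = []
    for start in range(0, len(intensities), width):
        window = intensities[start:start + width]
        binned.append(sum(window) / len(window))
    return binned


# Toy spectrum of 12 points: two full windows plus a trailing window of 2.
spectrum = [float(i) for i in range(12)]
print(bin_spectrum(spectrum))  # → [2.0, 7.0, 10.5]
```

With a width of 5, a spectrum of n points shrinks to roughly n/5 features, which is the dimensionality reduction the abstract refers to.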
Affiliation(s)
- Yaju Zhao
  - Zhejiang Engineering Research Institute of Food & Drug Quality and Safety, Zhejiang Gongshang University, Hangzhou 310018, PR China
- Wei Lv
  - Zhejiang Engineering Research Institute of Food & Drug Quality and Safety, Zhejiang Gongshang University, Hangzhou 310018, PR China
- Yinsheng Zhang
  - Zhejiang Engineering Research Institute of Food & Drug Quality and Safety, Zhejiang Gongshang University, Hangzhou 310018, PR China
- Minmin Tang
  - Jiangsu Provincial Product Quality Supervision and Inspection Institute, Nanjing 210007, PR China
- Haiyan Wang
  - Zhejiang Engineering Research Institute of Food & Drug Quality and Safety, Zhejiang Gongshang University, Hangzhou 310018, PR China
2
Cai Y, Yao Z, Cheng X, He Y, Li S, Pan J. Deep metric learning framework combined with Gramian angular difference field image generation for Raman spectra classification based on a handheld Raman spectrometer. Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy 2023; 303:123085. [PMID: 37454497 DOI: 10.1016/j.saa.2023.123085] [Received: 05/02/2023] [Revised: 06/16/2023] [Accepted: 06/26/2023] [Indexed: 07/18/2023]
Abstract
Rapid identification of unknown material samples using portable or handheld Raman spectroscopy detection equipment is becoming a common analytical tool. However, the design and implementation of a set of Raman spectroscopy-based devices for substance identification must include spectral sampling of standard reference substance samples, resolution matching between different devices, and the training process of the corresponding classification models. The process of selecting a suitable classification model is frequently time-consuming, and when the number of classes of substances to be recognised increases dramatically, recognition accuracy decreases dramatically. In this paper, we propose a fast classification method for Raman spectra based on deep metric learning networks combined with the Gramian angular difference field (GADF) image generation approach. First, we uniformly convert Raman spectra acquired at different resolutions into GADF images of the same resolution, addressing spectral dimension disparities induced by resolution differences in different Raman spectroscopy detection devices. Second, a network capable of implementing nonlinear distance measurements between GADF images of different classes of substances is designed based on a deep metric learning approach. The Raman spectra of 450 different mineral classes obtained from the RRUFF database were converted into GADF images and used to train this deep metric learning network. Finally, the trained network can be installed on an embedded computing platform and used in conjunction with portable or handheld Raman spectroscopic detection sensors to perform material identification tasks at various scales. A series of experiments demonstrate that our trained deep metric learning network outperforms existing mainstream machine learning models on classification tasks of different sizes. 
For the two tasks of Raman spectral classification of natural minerals of 260 classes and Raman spectral classification of pathogenic bacteria of 8 classes with significant noise, our suggested model achieved 98.05% and 90.13% classification accuracy, respectively. Finally, we also deployed the model in a handheld Raman spectrometer and conducted identification experiments on 350 samples of chemical substances attributed to 32 classes, achieving a classification accuracy of 99.14%. These results demonstrate that our method can greatly improve the efficiency of developing Raman spectroscopy-based substance detection devices and can be widely used in tasks of unknown substance identification.
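As a rough illustration of the GADF conversion named in the abstract, the sketch below uses the standard definition of the Gramian angular difference field: rescale the series to [-1, 1], map each value to an angle with arccos, and fill the image with the sine of pairwise angle differences. The function names are hypothetical, and the resampling that brings spectra of different resolutions to a common image size is not shown because the abstract does not describe it.

```python
import math


def rescale(values):
    """Linearly rescale a series to the interval [-1, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("a constant series cannot be rescaled")
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]


def gadf(values):
    """Return the GADF image as a list of rows: G[i][j] = sin(phi_i - phi_j)."""
    scaled = rescale(values)
    # Clamp before arccos to guard against tiny rounding overshoots.
    phi = [math.acos(max(-1.0, min(1.0, v))) for v in scaled]
    return [[math.sin(a - b) for b in phi] for a in phi]


image = gadf([1.0, 2.0, 3.0])
for row in image:
    print([round(v, 3) for v in row])
```

A spectrum of n points gives an n × n image that is antisymmetric with a zero diagonal, so a classifier that expects images can be applied to one-dimensional spectra.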
Affiliation(s)
- Yaoyi Cai
  - College of Engineering and Design, Hunan Normal University, Changsha, Hunan 410083, PR China; Xiangji Haidun Technology Co., Ltd., Changsha, Hunan 410199, PR China
- Zekai Yao
  - College of Engineering and Design, Hunan Normal University, Changsha, Hunan 410083, PR China
- Xi Cheng
  - College of Engineering and Design, Hunan Normal University, Changsha, Hunan 410083, PR China
- Yixuan He
  - State Key Laboratory of Developmental Biology of Freshwater Fish, College of Life Sciences, Hunan Normal University, Changsha, Hunan 410083, PR China
- Shiwen Li
  - College of Engineering and Design, Hunan Normal University, Changsha, Hunan 410083, PR China
- Jiaji Pan
  - College of Engineering and Design, Hunan Normal University, Changsha, Hunan 410083, PR China; State Key Laboratory of Developmental Biology of Freshwater Fish, College of Life Sciences, Hunan Normal University, Changsha, Hunan 410083, PR China
3
Vazquez-Osorio N, Castro-Ramos J, Sánchez-Escobar JJ. Matching Pursuit for Denoising Raman Spectra, Based on Genetic Algorithm and Hermite Atoms. Applied Spectroscopy 2023; 77:1009-1024. [PMID: 37448352 DOI: 10.1177/00037028231179744] [Indexed: 07/15/2023]
Abstract
Due to its various advantages, Raman spectroscopy has become a powerful tool in different fields of science and engineering; however, in specific applications, this technique's limiting factor is closely related to the inherent noise of the Raman spectra. To eliminate the noise of a Raman spectrum while preserving its position, intensity, and width characteristics, we propose using a genetic matching pursuit-Hermite atoms (GMP-HAs) algorithm in this work. This algorithm helps recover Raman spectra immersed in Gaussian noise with the least number of atoms. The noise-free Raman signal is reconstructed with the GMP-HAs algorithm, transforming the typical best-matching atom search into an optimization problem. Specifically, we maximize the fitness function, defined as the correlation between the current residual and Hermite atoms, with the genetic algorithm MI-LXPM encoded in a real domain, and avoid local maxima by adding a stopping criterion based on an exponential adjustment according to the algorithm's behavior in the presence of noise. Simulated and biological Raman spectra are used to evaluate the proposed algorithm and compare its performance with typically known methods for denoising, such as the Savitzky-Golay filter (SG) and basis pursuit denoising. Using the signal-to-noise ratio (S/N) metric resulted in a 0.31 dB advantage in the S/N product for the proposed algorithm with respect to SG. Additionally, it is shown that the algorithm uses only 25.3% of the number of atoms needed by the matching pursuit algorithm. The results indicate that the GMP-HAs algorithm has better denoising capabilities, and at the same time, the Raman spectra are decomposed with fewer atoms compared to known sparse algorithms.
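For orientation, the sketch below shows plain matching pursuit, the baseline the abstract compares against, not the GMP-HAs algorithm itself. The best-matching atom is found here by exhaustive search over a small dictionary, whereas the paper replaces that search with a genetic algorithm over Hermite atoms. The function name, the toy dictionary, and the fixed iteration count are assumptions for illustration.

```python
def matching_pursuit(signal, atoms, n_iter):
    """Greedy matching pursuit over a dictionary of unit-norm atoms.

    Each iteration picks the atom with the largest absolute correlation
    with the current residual and subtracts its contribution.
    Assumption: every atom has unit Euclidean norm.
    """
    residual = list(signal)
    coeffs = [0.0] * len(atoms)
    for _ in range(n_iter):
        # Correlation (inner product) of the residual with every atom.
        corr = [sum(r * a for r, a in zip(residual, atom)) for atom in atoms]
        best = max(range(len(atoms)), key=lambda i: abs(corr[i]))
        coeffs[best] += corr[best]
        residual = [r - corr[best] * a for r, a in zip(residual, atoms[best])]
    return coeffs, residual


# Toy dictionary: the standard basis in three dimensions (hypothetical).
atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
coeffs, residual = matching_pursuit([3.0, 0.0, -2.0], atoms, n_iter=2)
print(coeffs)    # → [3.0, 0.0, -2.0]
print(residual)
```

The correlation computed inside the loop plays the role of the fitness function described in the abstract. Stopping after a small number of iterations leaves the unexplained part of the signal, ideally mostly noise, in the residual.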
Affiliation(s)
- Noe Vazquez-Osorio
  - Coordinación de Óptica, Instituto Nacional de Astrofísica, Óptica y Electrónica, Puebla, Mexico
- J Castro-Ramos
  - Coordinación de Óptica, Instituto Nacional de Astrofísica, Óptica y Electrónica, Puebla, Mexico
4
Zhang Y, Jin L, Guo F, Ni X, Zhao Y, Cheng Y, Wang H. Matrix Factorization-Based Dimensionality Reduction Algorithms: A Comparative Study on Spectroscopic Profiling Data. Anal Chem 2022; 94:13385-13395. [PMID: 36130041 DOI: 10.1021/acs.analchem.2c01922] [Indexed: 11/29/2022]
Abstract
Spectroscopic profiling data used in analytical chemistry can be very high-dimensional. Dimensionality reduction (DR) is an effective way to handle the potential "curse of dimensionality" problem. Among the existing DR algorithms, many can be categorized as a matrix factorization (MF) problem, which decomposes the original data matrix X into the product of a low-dimensional matrix W and a dictionary matrix H. First, this paper provides a theoretical reformulation of relevant DR algorithms under a unified MF perspective, including PCA (principal component analysis), NMF (non-negative matrix factorization), LAE (linear autoencoder), RP (random projection), SRP (sparse random projection), VQ (vector quantization), AA (archetypal analysis), and ICA (independent component analysis). From this perspective, an open-source toolkit has been developed to integrate all of the above algorithms with a unified API. Second, we conducted a comparative study of MF-based DR algorithms. In a case study of TOF (time-of-flight) mass spectra, the eight algorithms extracted three components from the original 27,619 features. The results are compared by a set of DR quality metrics, e.g., reconstruction error, pairwise distance/ranking property, computational cost, and local and global structure preservation. Finally, based on the case study result, we summarized guidelines for DR algorithm selection. (1) For reconstruction quality, choose ICA. In the case study, ICA, PCA, and NMF have high reconstruction qualities (reconstruction error < 2%), ICA being the best. (2) To keep the pairwise topological structure, choose PCA. PCA best preserves the pairwise distance/ranking property. (3) For edge computing and IoT scenarios, choose RP or SRP if reconstruction is not required and the JL-lemma condition is met. The RP family has the best computational performance in the experiment, almost 10-100 times faster than its peers.
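To make the low-dimensional matrix W concrete for the cheapest family in the comparison, here is a minimal sketch of Gaussian random projection written with the standard library only. It is not the paper's toolkit or its API; the function name, the symbol R for the random matrix, and the 1/sqrt(k) scaling convention are assumptions for illustration.

```python
import random


def random_projection(X, k, seed=0):
    """Project the rows of X (n x d) to k dimensions with a Gaussian matrix.

    Returns (W, R), where W = X R is the n x k low-dimensional matrix and
    R is the d x k random matrix with entries drawn from N(0, 1/k).
    """
    rng = random.Random(seed)
    d = len(X[0])
    scale = 1.0 / k ** 0.5
    R = [[rng.gauss(0.0, 1.0) * scale for _ in range(k)] for _ in range(d)]
    W = [[sum(row[j] * R[j][c] for j in range(d)) for c in range(k)]
         for row in X]
    return W, R


# Toy data: 3 samples with 6 features each, reduced to 2 components.
X = [[1.0, 0.0, 2.0, 0.0, 1.0, 3.0],
     [0.0, 1.0, 0.0, 2.0, 1.0, 0.0],
     [2.0, 2.0, 1.0, 1.0, 0.0, 1.0]]
W, R = random_projection(X, k=2)
print(len(W), len(W[0]))  # → 3 2
```

Because R is drawn at random rather than fitted to the data, the only cost is one matrix product, which is consistent with the speed advantage the abstract reports for the RP family.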
Affiliation(s)
- Yinsheng Zhang
  - School of Management and E-Business, Zhejiang Gongshang University, Hangzhou 310018, China; School of Information Sciences, University of Illinois at Urbana-Champaign, Champaign, Illinois 61820-6211, United States
- Ling Jin
  - School of Management and E-Business, Zhejiang Gongshang University, Hangzhou 310018, China
- Fangjie Guo
  - School of Management and E-Business, Zhejiang Gongshang University, Hangzhou 310018, China
- Xiaofeng Ni
  - School of Management and E-Business, Zhejiang Gongshang University, Hangzhou 310018, China
- Yaju Zhao
  - School of Management and E-Business, Zhejiang Gongshang University, Hangzhou 310018, China
- Yongbo Cheng
  - School of Management Science and Engineering, Nanjing University of Finance and Economics, Nanjing 210023, China
- Haiyan Wang
  - School of Management and E-Business, Zhejiang Gongshang University, Hangzhou 310018, China
5
Zhang ZY. The statistical fusion identification of dairy products based on extracted Raman spectroscopy. RSC Adv 2020; 10:29682-29687. [PMID: 35518240 PMCID: PMC9056169 DOI: 10.1039/d0ra06318e] [Received: 07/20/2020] [Accepted: 07/28/2020] [Indexed: 11/21/2022] [Open access]
Abstract
At present, practical and rapid identification techniques for dairy products are still scarce. Taking different brands of pasteurized milk as an example, they are all milky white in appearance, and their Raman spectra are very similar, so it is not feasible to identify them directly using the naked eye. In the current work, a clear feature extraction and fusion strategy based on a combination of Raman spectroscopy and a support vector machine (SVM) algorithm was demonstrated. The results showed a 58% average recognition accuracy rate for dairy products based on the original Raman full spectral data and up to nearly 70% based on a single spectral interval. Data normalization processing effectively improved the recognition accuracy rate. The average recognition accuracy rate of dairy products reached 91% based on the normalized Raman full spectral data or nearly 85% based on a normalized single spectral interval. The fusion of multispectral feature regions yielded high accuracy and operation efficiency. After screening and optimizing based on the SVM algorithm, the best spectral feature intervals were determined to be 335–354 cm⁻¹, 435–454 cm⁻¹, 485–540 cm⁻¹, 820–915 cm⁻¹, 1155–1185 cm⁻¹, 1300–1414 cm⁻¹, and 1415–1520 cm⁻¹ under the experimental conditions, and the average identification accuracy rate here reached 93%. The developed scheme has the advantages of clear feature extraction and fusion, and short identification time, and it provides a technical reference for food quality control.
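The normalization and interval-fusion steps described in the abstract can be sketched as follows. The interval bounds are the ones listed in the abstract; everything else is an assumption for illustration, including the use of min-max scaling (the abstract does not name the normalization method), the function names, and the toy spectrum. The SVM classification stage is not shown.

```python
# Feature intervals in cm^-1, as listed in the abstract.
INTERVALS = [(335, 354), (435, 454), (485, 540), (820, 915),
             (1155, 1185), (1300, 1414), (1415, 1520)]


def min_max_normalize(intensities):
    """Scale intensities to [0, 1]. Assumption: min-max scaling."""
    lo, hi = min(intensities), max(intensities)
    if hi == lo:
        raise ValueError("a constant spectrum cannot be normalized")
    return [(v - lo) / (hi - lo) for v in intensities]


def fuse_intervals(wavenumbers, intensities, intervals=INTERVALS):
    """Keep only points whose wavenumber falls inside any feature interval."""
    return [y for x, y in zip(wavenumbers, intensities)
            if any(lo <= x <= hi for lo, hi in intervals)]


# Hypothetical toy spectrum: six (wavenumber, intensity) points.
wavenumbers = [300, 340, 450, 600, 1400, 1600]
intensities = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
features = fuse_intervals(wavenumbers, min_max_normalize(intensities))
print(features)
```

The fused vector contains only the points at 340, 450, and 1400 cm⁻¹, so a downstream classifier sees a much shorter input than the full spectrum.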
Affiliation(s)
- Zheng-Yong Zhang
  - State Key Laboratory of Dairy Biotechnology, Shanghai Engineering Research Center of Dairy Biotechnology, Dairy Research Institute, Bright Dairy & Food Co., Ltd, Shanghai 200436