1
Gao C, Fan Q, Zhao P, Sun C, Dang R, Feng Y, Hu B, Wang Q. Spectral encoder to extract the efficient features of Raman spectra for reliable and precise quantitative analysis. SPECTROCHIMICA ACTA. PART A, MOLECULAR AND BIOMOLECULAR SPECTROSCOPY 2024; 312:124036. [PMID: 38367343 DOI: 10.1016/j.saa.2024.124036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2023] [Revised: 02/04/2024] [Accepted: 02/10/2024] [Indexed: 02/19/2024]
Abstract
Raman spectroscopy has become a powerful analytical tool in high demand for applications such as microorganism sample analysis, food quality control, environmental science, and pharmaceutical analysis, owing to its non-invasiveness, simplicity, rapidity and ease of use. Quantitative analysis is a crucial application of Raman spectroscopy. However, the entire process of quantitative modeling largely relies on the extraction of effective spectral features, particularly for measurements on complex samples or in environments with poor spectral signal quality. In this paper, we propose a method that uses a spectral encoder to extract effective spectral features, which can significantly enhance the reliability and precision of quantitative analysis. We built a latent encoded feature regression model: while an autoencoder reconstructs the spectrometer output, the latent features are extracted from its intermediate bottleneck layer. These latent features are then fed into a deep regression model for component concentration prediction. In detailed ablation and comparative experiments, our proposed model outperforms common methods on single-component and multi-component mixture datasets, markedly improving regression precision without requiring user-selected parameters while eliminating the interference of irrelevant and redundant information. Furthermore, in-depth analysis reveals that the latent encoded features possess strong nonlinear feature representation capabilities, low computational costs, wide adaptability, and robustness against noise interference. This highlights their effectiveness in spectral regression tasks and indicates their potential in other application fields.
Sufficient experimental results show that our proposed method provides a novel and effective feature extraction approach for spectral analysis, which is simple, suitable for various methods, and can meet the measurement needs of different real-world scenarios.
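The pipeline described in this abstract (an autoencoder whose bottleneck output feeds a regressor) can be illustrated with a deliberately small sketch. It is not the authors' model: it assumes a linear autoencoder with a one-dimensional bottleneck trained by plain gradient descent, followed by an ordinary least-squares line instead of a deep regression network. The band shape `peak`, the training concentrations, and all function names are hypothetical.

```python
# Minimal sketch, assuming a LINEAR autoencoder with a 1-D bottleneck and
# synthetic single-component spectra. Not the model from the paper.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

peak = [0.1, 0.5, 1.0, 0.5, 0.1]             # hypothetical band shape
concentrations = [0.2, 0.4, 0.6, 0.8, 1.0]   # hypothetical training labels
spectra = [[c * p for p in peak] for c in concentrations]

def train_autoencoder(spectra, steps=2000, lr=0.05):
    """Gradient descent on the mean squared reconstruction error."""
    d, n = len(spectra[0]), len(spectra)
    w = [0.1] * d   # encoder weights: z = w . x (the latent feature)
    v = [0.1] * d   # decoder weights: x_hat = z * v
    for _ in range(steps):
        grad_w = [0.0] * d
        grad_v = [0.0] * d
        for x in spectra:
            z = dot(w, x)
            residual = [z * vj - xj for vj, xj in zip(v, x)]
            rv = dot(residual, v)
            for j in range(d):
                grad_v[j] += 2.0 * z * residual[j] / n
                grad_w[j] += 2.0 * rv * x[j] / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        v = [vj - lr * g for vj, g in zip(v, grad_v)]
    return w, v

def fit_line(zs, ys):
    """Ordinary least squares for y = slope * z + intercept."""
    n = len(zs)
    mz, my = sum(zs) / n, sum(ys) / n
    slope = sum((z - mz) * (y - my) for z, y in zip(zs, ys)) / sum(
        (z - mz) ** 2 for z in zs
    )
    return slope, my - slope * mz

w, v = train_autoencoder(spectra)
latent = [dot(w, x) for x in spectra]          # bottleneck features
slope, intercept = fit_line(latent, concentrations)

def predict(spectrum):
    """Encode a spectrum, then regress concentration on the latent feature."""
    return slope * dot(w, spectrum) + intercept

predicted = predict([0.5 * p for p in peak])   # unseen concentration 0.5
```

Because the synthetic spectra scale linearly with concentration, the single latent feature is enough here; real spectra with noise, baselines, and overlapping bands are the setting in which the abstract argues for a nonlinear encoder.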
Affiliation(s)
- Chi Gao
- Key Laboratory of Spectral Imaging Technology, Xi'an Institute of Optics and Precision Mechanics of the Chinese Academy of Sciences, Shaanxi, 710076, China; The Key Laboratory of Biomedical Spectroscopy of Xi'an, Shaanxi, 710076, China; University of Chinese Academy of Sciences, Beijing, 100049, China
- Qi Fan
- Key Laboratory of Spectral Imaging Technology, Xi'an Institute of Optics and Precision Mechanics of the Chinese Academy of Sciences, Shaanxi, 710076, China; The Key Laboratory of Biomedical Spectroscopy of Xi'an, Shaanxi, 710076, China
- Peng Zhao
- Key Laboratory of Spectral Imaging Technology, Xi'an Institute of Optics and Precision Mechanics of the Chinese Academy of Sciences, Shaanxi, 710076, China; The Key Laboratory of Biomedical Spectroscopy of Xi'an, Shaanxi, 710076, China; University of Chinese Academy of Sciences, Beijing, 100049, China
- Chao Sun
- Key Laboratory of Spectral Imaging Technology, Xi'an Institute of Optics and Precision Mechanics of the Chinese Academy of Sciences, Shaanxi, 710076, China; The Key Laboratory of Biomedical Spectroscopy of Xi'an, Shaanxi, 710076, China
- Ruochen Dang
- Key Laboratory of Spectral Imaging Technology, Xi'an Institute of Optics and Precision Mechanics of the Chinese Academy of Sciences, Shaanxi, 710076, China; The Key Laboratory of Biomedical Spectroscopy of Xi'an, Shaanxi, 710076, China; University of Chinese Academy of Sciences, Beijing, 100049, China
- Yutao Feng
- Key Laboratory of Spectral Imaging Technology, Xi'an Institute of Optics and Precision Mechanics of the Chinese Academy of Sciences, Shaanxi, 710076, China
- Bingliang Hu
- Key Laboratory of Spectral Imaging Technology, Xi'an Institute of Optics and Precision Mechanics of the Chinese Academy of Sciences, Shaanxi, 710076, China; The Key Laboratory of Biomedical Spectroscopy of Xi'an, Shaanxi, 710076, China
- Quan Wang
- Key Laboratory of Spectral Imaging Technology, Xi'an Institute of Optics and Precision Mechanics of the Chinese Academy of Sciences, Shaanxi, 710076, China; The Key Laboratory of Biomedical Spectroscopy of Xi'an, Shaanxi, 710076, China
3
Ma H, Pan H, Pan D, Ni H, Feng X, Liu X, Chen Y, Wu Y, Luo N. Rapid monitoring approaches for concentration process of lanqin oral solution by near-infrared spectroscopy and chemometric models. SPECTROCHIMICA ACTA. PART A, MOLECULAR AND BIOMOLECULAR SPECTROSCOPY 2020; 242:118792. [PMID: 32805551 DOI: 10.1016/j.saa.2020.118792] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/12/2020] [Revised: 07/21/2020] [Accepted: 07/30/2020] [Indexed: 06/11/2023]
Abstract
Qualitative and quantitative detection methods based on near-infrared spectroscopy (NIRs) have been proposed for process analysis of traditional Chinese medicine in recent years. In this study, rapid monitoring methods were developed for quality control of the concentration process of lanqin oral solution (LOS). The partial least squares regression (PLSR) method was adopted to construct quantitative models for epigoitrin, geniposide, baicalin, berberine hydrochloride and density. In parallel, a genetic algorithm combined with an extreme learning machine (GA-ELM) was applied for the first time in qualitative analysis of NIRs to identify the end point of the concentration process. The PLSR models performed satisfactorily, with relative standard errors of calibration of 3.80%, 3.75%, 3.79%, 11.5% and 1.22% for epigoitrin, geniposide, baicalin, berberine hydrochloride and density, respectively, and residual predictive deviation values higher than 3. For qualitative analysis, the GA-ELM model obtained 100% prediction accuracy. The PLSR quantitative models and the end point discrimination model constructed by GA-ELM meet the requirements of practical applications. The results indicate that NIRs in combination with chemometrics has great potential for improving production efficiency.
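The two figures of merit quoted in this abstract can be computed as in the sketch below. The abstract does not give the formulas, so the definitions are assumptions based on common chemometric usage: the relative standard error is the root of the summed squared residuals divided by the summed squared reference values, as a percentage, and the residual predictive deviation (RPD) is the standard deviation of the reference values divided by the root mean squared error. The function names and sample values are hypothetical.

```python
import math
import statistics

def relative_standard_error_percent(reference, predicted):
    """Assumed definition: 100 * sqrt(sum((p - r)^2) / sum(r^2))."""
    squared_residuals = sum((p - r) ** 2 for r, p in zip(reference, predicted))
    squared_reference = sum(r ** 2 for r in reference)
    return 100.0 * math.sqrt(squared_residuals / squared_reference)

def residual_predictive_deviation(reference, predicted):
    """Assumed definition: sample SD of reference values divided by RMSE."""
    n = len(reference)
    rmse = math.sqrt(sum((p - r) ** 2 for r, p in zip(reference, predicted)) / n)
    return statistics.stdev(reference) / rmse

# Hypothetical reference assay values and model predictions.
reference = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.1, 3.9, 5.1]

rse = relative_standard_error_percent(reference, predicted)
rpd = residual_predictive_deviation(reference, predicted)
```

An RPD above 3, the threshold cited in the abstract, means the spread of the reference values is more than three times the typical prediction error.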
Affiliation(s)
- Hui Ma
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Hongye Pan
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Dongyue Pan
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Hongfei Ni
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Xuejing Feng
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Xuesong Liu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Yong Chen
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Yongjiang Wu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Niu Luo
- Suzhou ZeDaXingBang Pharmaceutical Co., Ltd., Suzhou 215000, China
4
Zhang S, Li X, Fan C, Wu Z, Liu Q. Application of Machine Learning Techniques to Predict Protein Phosphorylation Sites. LETT ORG CHEM 2019. [DOI: 10.2174/1570178615666180907150928] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/22/2022]
Abstract
Protein phosphorylation is one of the most important post-translational modifications of proteins. Almost all processes that regulate the life activities of an organism, as well as almost all physiological and pathological processes, involve protein phosphorylation. In this paper, we summarize the specific implementation and application of the methods used in protein phosphorylation site prediction, such as the support vector machine algorithm, random forest, Jensen-Shannon divergence combined with quadratic discriminant analysis, the Adaboost algorithm, increment of diversity with quadratic discriminant analysis, the modified CKSAAP algorithm, a Bayes classifier combined with phosphorylation sequences enrichment analysis, least absolute shrinkage and selection operator, stochastic search variable selection, partial least squares and deep learning. On this basis, we use the k-nearest neighbor algorithm with the BLOSUM80 matrix method to predict phosphorylation sites. First, we construct the dataset and remove redundant positive and negative samples, that is, protein sequences with more than 30% similarity. Next, the proposed method is evaluated by four metrics: sensitivity (Sn), specificity (Sp), accuracy (ACC) and Matthews correlation coefficient (MCC). Finally, tenfold cross-validation is employed to evaluate this method. The result, which is verified by tenfold cross-validation, shows that the average values of Sn, Sp, ACC and MCC for the three types of amino acid (serine, threonine, and tyrosine) are 90.44%, 86.95%, 88.74% and 0.7742, respectively. A comparison with the predictive performance of PhosphoSVM and Musite reveals that the proposed method predicts better and has the advantages of simplicity, practicality and low time complexity in classification.
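The four evaluation metrics named in this abstract follow from the counts in a binary confusion matrix. The sketch below uses the standard definitions; the function name and the example counts are hypothetical and are not taken from the paper.

```python
import math

def classification_metrics(tp, fn, tn, fp):
    """Return (Sn, Sp, ACC, MCC) from binary confusion-matrix counts."""
    sn = tp / (tp + fn)                      # sensitivity: recall on positives
    sp = tn / (tn + fp)                      # specificity: recall on negatives
    acc = (tp + tn) / (tp + fn + tn + fp)    # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally reported as 0 when a margin of the table is empty.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sn, sp, acc, mcc

# Hypothetical counts for one cross-validation fold.
sn, sp, acc, mcc = classification_metrics(tp=90, fn=10, tn=87, fp=13)
```

In a tenfold cross-validation these values would be computed per fold and averaged, which is how the abstract reports them.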
Affiliation(s)
- Shengli Zhang
- School of Mathematics and Statistics, Xidian University, Xi'an 710071, China
- Xian Li
- School of Mathematics and Statistics, Xidian University, Xi'an 710071, China
- Chengcheng Fan
- School of Mathematics and Statistics, Xidian University, Xi'an 710071, China
- Zhehui Wu
- School of Mathematics and Statistics, Xidian University, Xi'an 710071, China
- Qian Liu
- Centre for Biostatistics, School of Health Sciences, The University of Manchester, Manchester, M13 9PL, United Kingdom
6
Bian H, Gao J. Error analysis of the spectral shift for partial least squares models in Raman spectroscopy. OPTICS EXPRESS 2018; 26:8016-8027. [PMID: 29715775 DOI: 10.1364/oe.26.008016] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 01/08/2018] [Accepted: 03/06/2018] [Indexed: 05/28/2023]
Abstract
Raman spectroscopy paired with the partial least squares (PLS) method is commonly used for quantitative or qualitative analysis of complex samples. However, spectral shift caused by different Raman instruments, different environments or different measurement times decreases the accuracy of the PLS model. In this work, processing algorithms that improve accuracy by removing noise, background and other varying sources of spectral interference were first reviewed. The error induced by the spectral shift was then analyzed and formulas for the error were derived. The formulas were used to calculate the theoretical error in the example of discriminating human and nonhuman blood. A comparison of the actual errors obtained from the mathematical method and from experiment with the theoretical value demonstrated the effectiveness of the equation. Compensating the nonhuman blood results by the average error improved the accuracy. Finally, the non-uniform sampling of the Raman shift by the charge-coupled device (CCD) was taken into account in the error equation, yielding an accurate error equation. This work could help improve the stability of PLS models when the spectrometer exhibits spectral shift in Raman spectroscopy.
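The abstract does not reproduce the derived error formulas, so the sketch below only illustrates the general idea under stated assumptions: for a linear model with regression vector b, a small wavenumber shift δ changes the prediction by approximately −δ Σ b_i x′(ν_i), a first-order Taylor expansion of the spectrum. The Gaussian band, the regression vector, and the shift value are all hypothetical, and this is not the paper's equation.

```python
import math

def gaussian_band(nu, center=1000.0, width=10.0):
    """Hypothetical Raman band: unit-height Gaussian on a wavenumber axis."""
    return math.exp(-((nu - center) ** 2) / (2.0 * width ** 2))

def band_derivative(nu, center=1000.0, width=10.0):
    """Analytic derivative of the Gaussian band with respect to wavenumber."""
    return -(nu - center) / width ** 2 * gaussian_band(nu, center, width)

axis = [950.0 + i for i in range(101)]                  # 950 to 1050 cm^-1
coefficients = [(nu - 1000.0) / 100.0 for nu in axis]   # hypothetical regression vector

def predict(shift=0.0):
    """Linear prediction b . x when the band is displaced by `shift` cm^-1."""
    return sum(b * gaussian_band(nu - shift) for b, nu in zip(coefficients, axis))

shift = 0.5
actual_error = predict(shift) - predict(0.0)
first_order_error = -shift * sum(
    b * band_derivative(nu) for b, nu in zip(coefficients, axis)
)
```

Comparing `actual_error` with `first_order_error` mirrors the abstract's comparison of measured and theoretical errors, and the estimate can be subtracted from a prediction as a simple compensation.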