1. Huang H, Fang Z, Xu Y, Lu G, Feng C, Zeng M, Tian J, Ping Y, Han Z, Zhao Z. Stacking and ridge regression-based spectral ensemble preprocessing method and its application in near-infrared spectral analysis. Talanta 2024; 276:126242. [PMID: 38761656 DOI: 10.1016/j.talanta.2024.126242] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2024] [Revised: 05/08/2024] [Accepted: 05/09/2024] [Indexed: 05/20/2024]
Abstract
Spectral preprocessing techniques can, to a certain extent, eliminate irrelevant information, such as current noise and stray light from spectral data, thereby enhancing the performance of prediction models. However, current preprocessing techniques mostly attempt to find the best single preprocessing method or their combination, overlooking the complementary information among different preprocessing methods. These preprocessing techniques fail to maximize the utilization of useful information in spectral data and restrict the performance of prediction models. This study proposed a spectral ensemble preprocessing method based on the rapidly developing ensemble learning methods in recent years and the ridge regression (RR) model, named stacking preprocessing ridge regression (SPRR), to address the aforementioned issues. Different from conventional ensemble learning methods, the proposed SPRR method applied multiple different preprocessing techniques to the original spectral data, generating multiple preprocessed datasets. These datasets were then individually inputted into RR base models for training. Ultimately, RR still served as the meta-model, integrating the output results of each RR base model through stacking. This approach not only produced diversity in base models but also achieved higher accuracy and lower computational complexity by using a single type of base model. On the apple spectral dataset collected by our team, correlation analysis showed significant complementary information among the data produced by different preprocessing techniques. This provided robust theoretical support for the proposed SPRR method. 
By introducing the currently popular averaging ensemble preprocessing method in a comparative experiment, the results of applying the proposed SPRR method to six datasets (apple, meat, wheat, olive oil, tablet, and corn) demonstrated that compared to the single preprocessing method and averaging ensemble preprocessing method, SPRR yielded the best accuracy and reliability for all six datasets. Furthermore, under the same conditions of the training and test datasets, the proposed SPRR method demonstrated better performance than the four commonly used ensemble preprocessing methods.
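The stacking scheme described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the preprocessing set (raw, SNV, first difference), the ridge penalty `alpha`, the fold count, every function name, and the synthetic data are assumptions chosen only to make the idea concrete.

```python
import numpy as np

def snv(X):
    """Standard normal variate: center and scale each spectrum (row)."""
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)

def first_diff(X):
    """First difference along the wavelength axis (a crude derivative)."""
    return np.diff(X, axis=1)

def ridge_fit(X, y, alpha):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return w, y_mean - x_mean @ w

def ridge_predict(model, X):
    w, b = model
    return X @ w + b

def sprr_fit(X, y, preprocessors, alpha=1.0, n_folds=5, seed=0):
    """Train one ridge base model per preprocessing and a ridge meta-model."""
    views = [p(X) for p in preprocessors]
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), n_folds)
    # Out-of-fold base predictions become the meta-model's input features.
    Z = np.zeros((len(y), len(views)))
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        for j, V in enumerate(views):
            base = ridge_fit(V[train], y[train], alpha)
            Z[test_idx, j] = ridge_predict(base, V[test_idx])
    meta = ridge_fit(Z, y, alpha)
    base_models = [ridge_fit(V, y, alpha) for V in views]  # refit on all data
    return base_models, meta

def sprr_predict(model, X, preprocessors):
    base_models, meta = model
    Z = np.column_stack(
        [ridge_predict(m, p(X)) for m, p in zip(base_models, preprocessors)]
    )
    return ridge_predict(meta, Z)

# Synthetic "spectra" (hypothetical data, only to exercise the code).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 30)).cumsum(axis=1)
y = X[:, 10] - X[:, 20] + rng.normal(scale=0.1, size=60)
preps = [lambda A: A, snv, first_diff]
model = sprr_fit(X[:40], y[:40], preps)
pred = sprr_predict(model, X[40:], preps)
```

All three preprocessing steps here act on each spectrum independently, so applying them before the fold split does not leak information between folds; a preprocessing that is fitted on the data (MSC against a mean spectrum, for example) would need to be fitted inside each fold.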
Affiliation(s)
- Haowen Huang
  - College of New Materials and New Energies, Shenzhen Technology University, Shenzhen, 518118, PR China
- Zile Fang
  - College of New Materials and New Energies, Shenzhen Technology University, Shenzhen, 518118, PR China
- Yuelong Xu
  - College of New Materials and New Energies, Shenzhen Technology University, Shenzhen, 518118, PR China
- Guosheng Lu
  - College of New Materials and New Energies, Shenzhen Technology University, Shenzhen, 518118, PR China
- Can Feng
  - College of New Materials and New Energies, Shenzhen Technology University, Shenzhen, 518118, PR China
- Min Zeng
  - College of New Materials and New Energies, Shenzhen Technology University, Shenzhen, 518118, PR China
- Jiaju Tian
  - College of New Materials and New Energies, Shenzhen Technology University, Shenzhen, 518118, PR China
- Yongfu Ping
  - College of New Materials and New Energies, Shenzhen Technology University, Shenzhen, 518118, PR China
- Zhuolin Han
  - College of New Materials and New Energies, Shenzhen Technology University, Shenzhen, 518118, PR China
- Zhigang Zhao
  - College of New Materials and New Energies, Shenzhen Technology University, Shenzhen, 518118, PR China
2. Li K, Ding C, Zhang J, Du B, Song X, Wang G, Li Q, Zhang Y, Zhang Z. Accurate identification of methanol and ethanol gasoline types and rapid detection of the alcohol content using effective chemical information. Talanta 2024; 274:125961. [PMID: 38555768 DOI: 10.1016/j.talanta.2024.125961] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2023] [Revised: 03/18/2024] [Accepted: 03/20/2024] [Indexed: 04/02/2024]
Abstract
Methanol and ethanol gasoline are two emerging clean energy sources with different characteristics. To achieve the qualitative identification and quantitative analysis of the alcohols present in methanol and ethanol gasoline, effective chemical information (ECI) models based on the characteristic spectral bands of the near-infrared (NIR) spectra of the methanol and ethanol molecules were developed using the partial least squares discriminant analysis (PLS-DA) and partial least squares (PLS) algorithms. The ECI model was further compared with models built from the full wavenumber (Full) spectra, variable importance in projection (VIP) spectra, and Monte Carlo uninformative variable elimination (MC-UVE) spectra to determine the predictive performance of the ECI model. Among the various qualitative identification models, the ECI-PLS-DA model, which is built using the differences in molecular chemical information between methanol and ethanol, exhibited sensitivity, specificity and accuracy values of 100%. The ECI-PLS-DA model accurately identified methanol gasoline and ethanol gasoline with different contents. In the quantitative analysis, the methanol gasoline and ethanol gasoline ECI-PLS models exhibited the smallest root mean squared errors of prediction (RMSEP), 0.18 and 0.21% (v/v), respectively, compared to the other models. Meanwhile, the F-test and t-test results revealed that the NIR method employing the ECI-PLS model showed no significant difference compared to the standard method. Compared with the other spectral models examined herein, the ECI model demonstrated the highest recognition success and determination accuracy. This study therefore established a highly accurate and rapid model for qualitative identification and quantitative analysis based on chemical structures. It is expected that this model could be extended to the NIR analysis of other physicochemical properties of fuel.
Affiliation(s)
- Ke Li
  - Center for Environmental Metrology, National Institute of Metrology, Beijing, 100029, China
- Chaomin Ding
  - College of Environmental Sciences and Engineering, Dalian Maritime University, Dalian, 116026, China
- Jin Zhang
  - College of Environmental Sciences and Engineering, Dalian Maritime University, Dalian, 116026, China
- Biao Du
  - Beijing Yixingyuan Petrochemical Technology Co. Ltd., Beijing, 101301, China
- Xiaoping Song
  - Center for Environmental Metrology, National Institute of Metrology, Beijing, 100029, China
- Guixuan Wang
  - Beijing Yixingyuan Petrochemical Technology Co. Ltd., Beijing, 101301, China
- Qi Li
  - Center for Environmental Metrology, National Institute of Metrology, Beijing, 100029, China
- Yinglan Zhang
  - Leibniz Institut für Polymerforschung Dresden e.V., Hohe Straße 6, Dresden, 01069, Germany; Institut für Werkstoffwissenschaft, Technische Universität Dresden, Dresden, 01062, Germany
- Zhengdong Zhang
  - Center for Environmental Metrology, National Institute of Metrology, Beijing, 100029, China
3. Biancolillo A, Scappaticci C, Foschi M, Rossini C, Marini F. Coupling of NIR Spectroscopy and Chemometrics for the Quantification of Dexamethasone in Pharmaceutical Formulations. Pharmaceuticals (Basel) 2023; 16:309. [PMID: 37259451 PMCID: PMC9961082 DOI: 10.3390/ph16020309] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/29/2022] [Revised: 02/12/2023] [Accepted: 02/14/2023] [Indexed: 11/07/2023] Open
Abstract
Counterfeit or substandard drugs are pharmaceutical formulations in which the active pharmaceutical ingredients (APIs) have been replaced or the ingredients do not comply with the drug leaflet. With the outbreak of the COVID-19 pandemic, fraud associated with the preparation of substandard or counterfeit drugs is expected to grow, undermining health systems already weakened by the state of emergency. Analytical chemistry plays a key role in tackling this problem and in implementing strategies that permit the recognition of non-compliant drugs. In light of this, the present work represents a feasibility study for the development of an NIR-based tool for the quantification of dexamethasone in mixtures of excipients (starch and lactose). Two different regression strategies were tested. The first, based on the coupling of NIR spectra and Partial Least Squares (PLS), provided good results (root mean square error in prediction (RMSEP) of 720 mg/kg), but the most accurate was the second, a strategy exploiting sequential preprocessing through orthogonalization (SPORT), which led (on the external set of mixtures) to an $R^2_{pred}$ of 0.9044 and an RMSEP of 450 mg/kg. Finally, Variable Importance in Projection (VIP) was applied to interpret the obtained results and determine which spectral regions contribute most to the SPORT model.
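The two figures of merit quoted in this abstract can be computed as in the sketch below. The function names and the sample values are hypothetical, and the R² shown uses the test-set mean in the denominator; some chemometrics packages use the calibration-set mean instead, and the abstract does not say which convention was used.

```python
import math

def rmsep(y_true, y_pred):
    """Root mean square error of prediction over an external test set."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r2_pred(y_true, y_pred):
    """Coefficient of determination on the test set: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical reference and predicted concentrations in mg/kg.
reference = [1000.0, 2000.0, 3000.0, 4000.0]
predicted = [1100.0, 1900.0, 3100.0, 3900.0]
print(rmsep(reference, predicted))    # 100.0
print(r2_pred(reference, predicted))  # 0.992
```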
Affiliation(s)
- Alessandra Biancolillo
  - Department of Physical and Chemical Sciences, University of L’Aquila, Via Vetoio snc, Coppito, 67100 L’Aquila, Italy
- Claudia Scappaticci
  - Department of Physical and Chemical Sciences, University of L’Aquila, Via Vetoio snc, Coppito, 67100 L’Aquila, Italy
- Martina Foschi
  - Department of Physical and Chemical Sciences, University of L’Aquila, Via Vetoio snc, Coppito, 67100 L’Aquila, Italy
- Claudia Rossini
  - Department of Chemistry, University of Rome “La Sapienza”, Piazzale Aldo Moro 5, 00185 Rome, Italy
- Federico Marini
  - Department of Chemistry, University of Rome “La Sapienza”, Piazzale Aldo Moro 5, 00185 Rome, Italy
4. Chen P, Liu D, Wang X, Zhang Q, Chu X. Rapid determination of viscosity and viscosity index of lube base oil based on near-infrared spectroscopy and new transformation formula. Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy 2023; 287:122079. [PMID: 36368267 DOI: 10.1016/j.saa.2022.122079] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2022] [Revised: 10/20/2022] [Accepted: 11/01/2022] [Indexed: 06/16/2023]
Abstract
Viscosity and viscosity index are key product properties in the lubricating oil production process. Rapid and even online analysis of viscosity and viscosity index through near-infrared (NIR) spectroscopy combined with chemometrics helps to optimize the production process. However, because of nonlinear effects, the commonly used linear multivariate calibration methods are not effective. In this work, the feasibility of four existing viscosity linear transformation formulas for establishing NIR models was studied, and a new viscosity linear transformation formula was developed based on the viscosity-gravity constant. The experimental results showed that three of the four existing viscosity linear transformation formulas improved the viscosity prediction of base oil to some extent, but none performed as well as the newly established formula. For viscosity index, modeling directly with the reference viscosity index was much more accurate than calculating it from predicted viscosity values. Both the viscosity and viscosity index prediction results of NIR analysis were in good agreement with the results of the reference method, indicating that the determination can meet the needs of rapid and online analysis in the industrial field.
Affiliation(s)
- Pu Chen
  - Research Institute of Petroleum Processing Co., Ltd., Beijing 100083, China
- Dan Liu
  - Research Institute of Petroleum Processing Co., Ltd., Beijing 100083, China
- Xiaowei Wang
  - Research Institute of Petroleum Processing Co., Ltd., Beijing 100083, China
- Qundan Zhang
  - Research Institute of Petroleum Processing Co., Ltd., Beijing 100083, China
- Xiaoli Chu
  - Research Institute of Petroleum Processing Co., Ltd., Beijing 100083, China
5. Liu Z, Shen T, Zhang J, Li Z, Zhao Y, Zuo Z, Zhang J, Wang Y. A Novel Multi-Preprocessing Integration Method for the Qualitative and Quantitative Assessment of Wild Medicinal Plants: Gentiana rigescens as an Example. Frontiers in Plant Science 2021; 12:759248. [PMID: 34691133 PMCID: PMC8531481 DOI: 10.3389/fpls.2021.759248] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2021] [Accepted: 09/15/2021] [Indexed: 06/13/2023]
Abstract
The over-exploitation of wild resources has raised growing concern over the quality of wild medicinal plants, creating the need for a rapid method to evaluate them. In this study, the content of total secoiridoids (gentiopicroside, swertiamarin, and sweroside) of Gentiana rigescens from 37 different regions in southwest China was analyzed by high performance liquid chromatography (HPLC). Furthermore, Fourier transform infrared (FT-IR) spectroscopy was adopted to trace the geographical origin (331 individuals) and predict the content of total secoiridoids (273 individuals). In traditional FT-IR analysis, only one scatter correction technique is selected from a series of preprocessing candidates to decrease the impact of light scattering. Nevertheless, different scatter correction techniques may carry complementary information, so using a single scatter correction technique is sub-optimal. Hence, the emerging ensemble approach to preprocessing fusion, sequential preprocessing through orthogonalization (SPORT), was carried out to fuse the complementary information linked to different preprocessing methods. The results suggested that, compared with the best results obtained with a single scatter correction, SPORT increased the accuracy of the test set by 12.8% in qualitative analysis and decreased the RMSEP by 66.7% in quantitative analysis.
Affiliation(s)
- Zhimin Liu
  - Medicinal Plants Research Institute, Yunnan Academy of Agricultural Sciences, Kunming, China
  - School of Agriculture, Yunnan University, Kunming, China
- Tao Shen
  - Medicinal Plants Research Institute, Yunnan Academy of Agricultural Sciences, Kunming, China
  - College of Chemistry, Biological and Environment, Yuxi Normal University, Yuxi, China
- Ji Zhang
  - Medicinal Plants Research Institute, Yunnan Academy of Agricultural Sciences, Kunming, China
- Zhimin Li
  - Medicinal Plants Research Institute, Yunnan Academy of Agricultural Sciences, Kunming, China
- Yanli Zhao
  - Medicinal Plants Research Institute, Yunnan Academy of Agricultural Sciences, Kunming, China
- Zhitian Zuo
  - Medicinal Plants Research Institute, Yunnan Academy of Agricultural Sciences, Kunming, China
- Jinyu Zhang
  - Medicinal Plants Research Institute, Yunnan Academy of Agricultural Sciences, Kunming, China
  - School of Agriculture, Yunnan University, Kunming, China
- Yuanzhong Wang
  - Medicinal Plants Research Institute, Yunnan Academy of Agricultural Sciences, Kunming, China
6. Yang X, Ou Q, Qian K, Yang J, Bai Z, Yang W, Shi Y, Liu G. Diagnosis of Lung Cancer by ATR-FTIR Spectroscopy and Chemometrics. Front Oncol 2021; 11:753791. [PMID: 34660320 PMCID: PMC8515056 DOI: 10.3389/fonc.2021.753791] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 08/12/2021] [Accepted: 09/15/2021] [Indexed: 01/06/2023] Open
Abstract
Lung cancer is the leading cause of cancer-related death in the world. Early diagnosis has great significance for the survival of patients with lung cancer. In this paper, attenuated total reflectance Fourier transform infrared (ATR-FTIR) spectroscopy combined with chemometrics was used to study serum samples from patients with lung cancer and healthy people. The results of spectral band area comparison showed that the concentrations of protein, lipid and nucleic acid molecules in the serum of patients with lung cancer were increased compared with those in healthy people. The original spectra were preprocessed to improve the accuracy of principal component regression (PCR) and partial least squares-discriminant analysis (PLS-DA) models. PLS-DA results for first derivative spectral data in the nucleic acid band (1250-1000 cm$^{-1}$) showed 80% sensitivity, 91.89% specificity and 87.10% accuracy, with a high $R_c^2$ of 0.8949 and $R_v^2$ of 0.8153, and a low RMSEC of 0.3136 and RMSEV of 0.4180. This shows that ATR-FTIR spectroscopy combined with chemometrics might be developed as a simple method for clinical screening and diagnosis of lung cancer.
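The sensitivity, specificity and accuracy figures follow from a two-class confusion matrix, as sketched below. The counts used here are an assumption: the abstract does not state them, and they are simply one set of values that reproduces the reported percentages.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)            # true positive rate
    specificity = tn / (tn + fp)            # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# Hypothetical counts (not given in the abstract): 25 patients, 37 controls.
sens, spec, acc = classification_metrics(tp=20, fn=5, tn=34, fp=3)
print(f"{sens:.2%} {spec:.2%} {acc:.2%}")  # 80.00% 91.89% 87.10%
```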
Affiliation(s)
- Xien Yang
  - School of Physics and Electronic Information, Yunnan Normal University, Kunming, China
- Quanhong Ou
  - School of Physics and Electronic Information, Yunnan Normal University, Kunming, China
- Kai Qian
  - Department of Thoracic Surgery, The First People’s Hospital of Yunnan Province, Kunming, China
- Jianru Yang
  - Department of Clinical Laboratory, Affiliated Hospital of Zunyi Medical University, Zunyi, China
- Zhixun Bai
  - Department of Internal Medicine, The Second Affiliated Hospital of Zunyi Medical University, Zunyi, China
- Weiye Yang
  - School of Physics and Electronic Information, Yunnan Normal University, Kunming, China
- Youming Shi
  - School of Physics and Electronic Engineering, Qujing Normal University, Qujing, China
- Gang Liu
  - School of Physics and Electronic Information, Yunnan Normal University, Kunming, China
7. Westad F. A retrospective look at cross model validation and its applicability in vibrational spectroscopy. Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy 2021; 255:119676. [PMID: 33765535 DOI: 10.1016/j.saa.2021.119676] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2020] [Revised: 01/11/2021] [Accepted: 03/01/2021] [Indexed: 06/12/2023]
Abstract
This paper presents how Cross Model Validation (CMV), also known as double cross validation, can be applied efficiently for variable selection in spectroscopic applications. The chosen applications are FT-IR spectroscopic measurements of mixtures of marzipan and NIR spectra of diesel fuels. Standard Normal Variate (SNV) is applied as a spectral pre-treatment to reduce baseline effects in the FT-IR spectra, whereas a 2nd derivative is applied for the diesel fuels. Variable selection based on jack-knifing and frequency of significance from Cross Model Validation is employed to identify non-relevant spectral regions and to provide a relevant subset for model optimization. The results show a high degree of correspondence between the objectively found wavelength bands and the chemical interpretation reported in the literature. In addition, the stability of the models due to conservative validation with respect to predictive performance is exemplified. Finally, an example is shown of how downweighting variables ensures optimal prediction ability and detailed model interpretation.
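The double cross-validation loop structure can be sketched as below. This is a simplified stand-in, not the procedure from the paper: where the paper selects variables by jack-knifing in the inner loop, this sketch selects a ridge penalty instead, purely to show how the inner (selection) and outer (assessment) loops are kept apart. All names, the candidate penalties, and the synthetic data are assumptions.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return w, y_mean - x_mean @ w

def kfold(n, k, rng):
    """Split shuffled indices 0..n-1 into k roughly equal folds."""
    return np.array_split(rng.permutation(n), k)

def squared_error(X, y, train, test, alpha):
    w, b = ridge_fit(X[train], y[train], alpha)
    return np.sum((X[test] @ w + b - y[test]) ** 2)

def inner_select(X, y, alphas, k, rng):
    """Inner loop: pick the penalty with the lowest cross-validated error."""
    folds = kfold(len(y), k, rng)
    everyone = np.arange(len(y))
    scores = [
        sum(squared_error(X, y, np.setdiff1d(everyone, f), f, a) for f in folds)
        for a in alphas
    ]
    return alphas[int(np.argmin(scores))]

def double_cv(X, y, alphas, outer_k=5, inner_k=4, seed=0):
    """Outer loop: assess a model whose settings were chosen without the test fold."""
    rng = np.random.default_rng(seed)
    everyone = np.arange(len(y))
    total, chosen = 0.0, []
    for test in kfold(len(y), outer_k, rng):
        train = np.setdiff1d(everyone, test)
        alpha = inner_select(X[train], y[train], alphas, inner_k, rng)
        total += squared_error(X, y, train, test, alpha)
        chosen.append(alpha)
    return float(np.sqrt(total / len(y))), chosen

# Synthetic data (hypothetical), only to exercise the loops.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 8))
y = X @ rng.normal(size=8) + rng.normal(scale=0.2, size=50)
alphas = [0.01, 1.0, 100.0]
rmse_outer, chosen = double_cv(X, y, alphas)
```

Because each outer test fold never takes part in the selection step, the outer error is a more conservative estimate than the inner cross-validated error of the selected setting.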
Affiliation(s)
- Frank Westad
  - Norwegian University of Science and Technology, Department of Engineering Cybernetics, O. S. Bragstads plass 2D, 7034 Trondheim, Norway
8. Yang X, Bao N, Li W, Liu S, Fu Y, Mao Y. Soil Nutrient Estimation and Mapping in Farmland Based on UAV Imaging Spectrometry. Sensors 2021; 21:s21113919. [PMID: 34204160 PMCID: PMC8201019 DOI: 10.3390/s21113919] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/19/2021] [Revised: 05/27/2021] [Accepted: 06/04/2021] [Indexed: 12/19/2022]
Abstract
Soil nutrient content is one of the most important properties for improving farmland quality and productivity. Imaging spectrometry has the potential for rapid acquisition and real-time monitoring of soil characteristics. This study aims to explore the preprocessing and modeling methods for hyperspectral images obtained from an unmanned aerial vehicle (UAV) platform for estimating the soil organic matter (SOM) and soil total nitrogen (STN) in farmland. The results showed that: (1) Multiplicative Scattering Correction (MSC) performed better in reducing image scattering noise than Standard Normal Variate (SNV) transformation or spectral derivatives, and it yielded a result with higher correlation and lower signal-to-noise ratio; (2) the proposed feature selection method, combining the Successive Projections Algorithm (SPA) and the Competitive Adaptive Reweighted Sampling algorithm (CARS), could provide selective preference for hyperspectral bands. Using this method, 24 and 22 feature bands were selected for SOM and STN estimation, respectively; (3) the particle swarm optimization (PSO) algorithm was employed to obtain optimized input weights and bias values of the extreme learning machine (ELM) model for more accurate prediction of SOM and STN. The improved PSO-ELM model based on the selected preference bands achieved higher prediction accuracy ($R^2$ of 0.73 and RPD of 1.91 for SOM; $R^2$ of 0.63 and RPD of 1.53 for STN) than the support vector machine (SVM), partial least squares regression (PLSR), and ELM models. This study provides an important guideline for monitoring soil nutrients for precision agriculture with imaging spectrometry.
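Multiplicative scatter correction, the preprocessing highlighted in result (1), can be sketched as follows. This is a generic textbook form, not the code used in the study: the function name, the choice of the mean spectrum as reference, and the synthetic spectra are assumptions.

```python
import numpy as np

def msc(X, reference=None):
    """Multiplicative scatter correction of spectra (rows) against a reference."""
    ref = X.mean(axis=0) if reference is None else reference
    corrected = np.empty(X.shape, dtype=float)
    for i, spectrum in enumerate(X):
        # Least-squares fit: spectrum ≈ intercept + slope * ref.
        slope, intercept = np.polyfit(ref, spectrum, 1)
        # Remove the additive offset and the multiplicative scaling.
        corrected[i] = (spectrum - intercept) / slope
    return corrected

# Synthetic spectra (hypothetical): one shape with different offsets and gains.
shape = np.linspace(0.0, 1.0, 50) ** 2
X = np.array([a + b * shape for a, b in [(0.1, 0.8), (0.0, 1.0), (-0.2, 1.3)]])
X_msc = msc(X)
```

In this idealized case every spectrum is an exact offset-and-gain copy of the same shape, so after correction all rows coincide with the mean spectrum; real spectra keep their chemical differences and lose only the scatter-like component.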
Affiliation(s)
- Xiaoyu Yang
  - College of Resources and Civil Engineering, Northeastern University, Shenyang 110819, China
- Nisha Bao
  - College of Resources and Civil Engineering, Northeastern University, Shenyang 110819, China
- Wenwen Li
  - School of Geographical Sciences and Urban Planning, Arizona State University, Tempe, AZ 85287, USA
- Shanjun Liu
  - College of Resources and Civil Engineering, Northeastern University, Shenyang 110819, China
- Yanhua Fu
  - JangHo Architecture College, Northeastern University, Shenyang 110169, China
- Yachun Mao
  - College of Resources and Civil Engineering, Northeastern University, Shenyang 110819, China