1
|
Canova LDS, Vallese FD, Pistonesi MF, de Araújo Gomes A. An improved successive projections algorithm version to variable selection in multiple linear regression. Anal Chim Acta 2023; 1274:341560. [PMID: 37455078 DOI: 10.1016/j.aca.2023.341560] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2023] [Revised: 06/07/2023] [Accepted: 06/22/2023] [Indexed: 07/18/2023]
Abstract
The aim of the successive projections algorithm (SPA) is to enhance the accuracy of multiple linear regression (MLR) by minimizing the impact of collinearity in the calibration data set. Combining SPA with MLR as a variable selection approach has resulted in the SPA-MLR method, which has been reported in the literature to produce models with good prediction ability compared to conventional full-spectrum models obtained with partial least squares (PLS) in some cases. This paper proposes adding a filter step to the current version of the SPA algorithm to reduce the number of uninformative variables before the projection phase and to assist the algorithm in selecting the best variables in subsequent steps. The proposed fSPA-MLR algorithm is evaluated in two case studies involving the near-infrared spectrometric analysis of pharmaceutical tablet and diesel/biodiesel mixture samples. Compared to PLS, the fSPA-MLR models demonstrate similar or better performance. Moreover, the fSPA-MLR models outperform the original SPA-MLR in both cross-validation and external prediction. The fSPA-MLR models deliver superior results regardless of the pre-processing algorithm tested, including first-derivative Savitzky-Golay (SG) and Standard Normal Variate (SNV), and even with raw spectral data.
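For readers unfamiliar with the projection phase mentioned in this abstract, the following is a minimal sketch of the classic SPA idea: starting from one column, repeatedly pick the column that retains the largest norm after being projected onto the subspace orthogonal to the columns already chosen. The function name `spa_select` and the toy data are illustrative assumptions. The filter step of fSPA is not described in the abstract and is not reproduced here, nor is the subsequent choice of starting variable and subset size by MLR validation.

```python
def spa_select(X, start, n_select):
    """Projection phase of SPA (sketch): return indices of n_select columns.

    X is a list of rows; `start` is the index of the initial column.
    Assumes no selected column is constant (its centred norm must be > 0).
    """
    n_rows, n_cols = len(X), len(X[0])

    # Work on mean-centred columns, stored column-wise.
    cols = []
    for j in range(n_cols):
        col = [X[i][j] for i in range(n_rows)]
        mean = sum(col) / n_rows
        cols.append([v - mean for v in col])

    selected = [start]
    for _ in range(n_select - 1):
        ref = cols[selected[-1]]
        ref_norm2 = sum(v * v for v in ref)
        best, best_norm2 = None, -1.0
        for j in range(n_cols):
            if j in selected:
                continue
            # Remove from column j its component along the reference column.
            coef = sum(a * b for a, b in zip(cols[j], ref)) / ref_norm2
            cols[j] = [a - coef * b for a, b in zip(cols[j], ref)]
            norm2 = sum(v * v for v in cols[j])
            if norm2 > best_norm2:
                best, best_norm2 = j, norm2
        selected.append(best)
    return selected


# Toy data: column 1 is almost a multiple of column 0, column 2 is not.
X_demo = [
    [1.0, 2.0, 1.0],
    [2.0, 4.0, -1.0],
    [3.0, 6.0, 1.0],
    [4.0, 8.1, -1.0],
]
print(spa_select(X_demo, start=0, n_select=2))
```

In the toy data the nearly collinear column is skipped in favour of the less redundant one, which is the behaviour the abstract credits with reducing collinearity effects in MLR.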
Affiliation(s)
- Luciana Dos Santos Canova
- Instituto de Química, IQ, Universidade Federal do Rio Grande do Sul, Av. Bento Gonçalves, 9500 Agronomia, 91501970, Porto Alegre, RS, Brazil
| | - Federico Danilo Vallese
- Dpto. de Química, Universidad Nacional del Sur, INQUISUR, Av. Alem 1253, B8000CPB, Bahía Blanca, Buenos Aires, Argentina
| | - Marcelo Fabian Pistonesi
- Dpto. de Química, Universidad Nacional del Sur, INQUISUR, Av. Alem 1253, B8000CPB, Bahía Blanca, Buenos Aires, Argentina
| | - Adriano de Araújo Gomes
- Instituto de Química, IQ, Universidade Federal do Rio Grande do Sul, Av. Bento Gonçalves, 9500 Agronomia, 91501970, Porto Alegre, RS, Brazil.
| |
|
2
|
Chemometrics-assisted inductively coupled plasma-optical emission spectrometry method for determination of natural zinc isotopes. J Radioanal Nucl Chem 2023. [DOI: 10.1007/s10967-022-08756-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/14/2023]
|
3
|
Gheidari D, Mehrdad M, Ghahremani M. Azole Compounds as Inhibitors of Candida albicans: QSAR Modelling. Front Chem 2021; 9:774416. [PMID: 34912782 PMCID: PMC8667819 DOI: 10.3389/fchem.2021.774416] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/11/2021] [Accepted: 11/03/2021] [Indexed: 01/13/2023]
Abstract
Candida albicans is a pathogenic opportunistic yeast found in the human gut flora. It may also live outside the human body, causing diseases ranging from minor to deadly. Candida albicans begins as a budding yeast that can become hyphae in response to a variety of environmental or biological triggers. The hyphal form is responsible for the development of multidrug-resistant biofilms, although both forms have been associated with virulence. Here, we propose linear and SPA-linear quantitative structure-activity relationship (QSAR) modeling and prediction of Candida albicans inhibitors. A data set consisting of 60 derivatives of benzoxazoles, benzimidazoles, and oxazolo(4,5-b)pyridines was used. After leverage analysis was applied to detect outlier molecules, the number of compounds was reduced to 55. The SPA-MLR model shows superiority over multiple linear regression (MLR), accounting for 90% of the Q² of the antifungal activity of the derivatives. This paper focuses on investigating the role of SPA-MLR in model development. The accuracy of the SPA-MLR model was assessed using leave-one-out (LOO) cross-validation. The mean effect of descriptors and sensitivity analysis show that RDF090u is the most important parameter affecting the behavior of the inhibitors of Candida albicans.
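The leave-one-out Q² mentioned in this abstract can be illustrated with a short sketch. It assumes the common definition Q² = 1 − PRESS / SS_tot, with SS_tot taken about the overall mean of the responses (some authors use the training-fold mean instead). The helper names `fit_mlr`, `predict`, and `loo_q2` and the toy data are hypothetical, and the abstract does not state how the authors computed their value.

```python
def fit_mlr(X, y):
    """Least-squares coefficients [intercept, b1, ...] via the normal equations."""
    A = [[1.0] + list(row) for row in X]
    k = len(A[0])
    # Augmented system (A^T A | A^T y).
    M = [[sum(r[i] * r[j] for r in A) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(A, y))] for i in range(k)]
    for c in range(k):  # Gaussian elimination with partial pivoting
        p = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, k):
            f = M[r][c] / M[c][c]
            for j in range(c, k + 1):
                M[r][j] -= f * M[c][j]
    coef = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        tail = sum(M[i][j] * coef[j] for j in range(i + 1, k))
        coef[i] = (M[i][k] - tail) / M[i][i]
    return coef


def predict(coef, row):
    return coef[0] + sum(b * v for b, v in zip(coef[1:], row))


def loo_q2(X, y):
    """Leave-one-out Q^2: refit without sample i, then predict sample i."""
    press = 0.0
    for i in range(len(y)):
        coef = fit_mlr(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(coef, X[i])) ** 2
    mean_y = sum(y) / len(y)
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    return 1.0 - press / ss_tot


# Toy descriptor/activity data (illustrative only).
X_demo = [[0.0], [1.0], [2.0], [3.0], [4.0]]
y_demo = [1.1, 2.9, 5.2, 6.8, 9.0]
print(round(loo_q2(X_demo, y_demo), 3))
```

Solving the normal equations directly is adequate for a handful of descriptors; with many or nearly collinear descriptors a QR- or SVD-based solver would be the safer choice.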
Affiliation(s)
- Davood Gheidari
- Department of Chemistry, Faculty of Science, University of Guilan, Rasht, Iran
| | - Morteza Mehrdad
- Department of Chemistry, Faculty of Science, University of Guilan, Rasht, Iran
| | - Mahboubeh Ghahremani
- Department of Chemistry and Biochemistry, Texas Tech University, Lubbock, TX, United States
| |
|
4
|
Silva DIO, Vilar WTS, Pontes MJC. Chemometric-assisted UV spectrophotometric method for determination of N,N-diethyl-3-methylbenzamide in insect repellents. SPECTROCHIMICA ACTA. PART A, MOLECULAR AND BIOMOLECULAR SPECTROSCOPY 2020; 241:118660. [PMID: 32653822 DOI: 10.1016/j.saa.2020.118660] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2020] [Revised: 06/24/2020] [Accepted: 06/27/2020] [Indexed: 06/11/2023]
Abstract
In recent years, outbreaks of vector-borne diseases have caused great concern to the population, especially those diseases transmitted by mosquitoes. Repellents appear as an affordable alternative for prevention, making it increasingly important to control the quality of these products, since the content of the active ingredients is directly related to the efficiency and the protection time provided by the repellent. This paper proposes an analytical method for determining the DEET (N,N-diethyl-3-methylbenzamide) content in insect repellent lotions using UV spectroscopy. For this purpose, five regression strategies were evaluated: (a) Partial Least Squares (PLS) using the full spectrum; (b) interval PLS (iPLS); and Multiple Linear Regression (MLR) with variable selection by (c) the Genetic Algorithm (MLR/GA), (d) the Successive Projections Algorithm (MLR/SPA), and (e) Stepwise (MLR/SW). Appropriate predictions were obtained, with RMSEP values between 0.88 and 0.93% w/w. No systematic error was observed and no significant differences were found between the predicted and reference values, according to a paired t-test at the 95% confidence level. The results demonstrated the potential of UV spectroscopy combined with multivariate calibration to determine DEET content in repellents as a fast, simple strategy with a suitable correlation between the values estimated by the model and the reference values.
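The paired t-test mentioned in this abstract compares predicted and reference values sample by sample. A minimal sketch of the test statistic follows; the function name `paired_t_statistic` and the numbers are made up for illustration and are not data from the paper. The resulting value would be compared against the two-sided critical t value for n − 1 degrees of freedom at the chosen confidence level.

```python
import math
import statistics


def paired_t_statistic(predicted, reference):
    """t = mean(d) / (sd(d) / sqrt(n)), where d = predicted - reference."""
    diffs = [p - r for p, r in zip(predicted, reference)]
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / std_err


# Hypothetical DEET contents (% w/w); illustrative values only.
predicted = [10.1, 9.8, 10.3, 10.0]
reference = [10.0, 10.0, 10.0, 10.0]
t_value = paired_t_statistic(predicted, reference)
print(round(t_value, 3))
```

If the absolute value of the statistic is below the tabulated critical value, the mean difference is not significant at that level, which is the sense in which the abstract reports no systematic error.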
|
5
|
Wang YJ, Li LQ, Shen SS, Liu Y, Ning JM, Zhang ZZ. Rapid detection of quality index of postharvest fresh tea leaves using hyperspectral imaging. JOURNAL OF THE SCIENCE OF FOOD AND AGRICULTURE 2020; 100:3803-3811. [PMID: 32201954 DOI: 10.1002/jsfa.10393] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 12/20/2019] [Revised: 02/25/2020] [Accepted: 03/21/2020] [Indexed: 06/10/2023]
Abstract
BACKGROUND The quality of fresh tea leaves after harvest determines, to some extent, the quality and price of commercial tea. A fast and accurate method to evaluate the quality of fresh tea leaves is required. RESULTS In this study, the potential of hyperspectral imaging in the range of 328-1115 nm for the rapid prediction of moisture, total nitrogen, crude fiber contents, and quality index value was investigated. Ninety samples of eight tea-leaf varieties and two picking standards were tested. Quantitative partial least squares regression (PLSR) models were established using the full spectrum, whereas multiple linear regression (MLR) models were developed using characteristic wavelengths selected by a successive projections algorithm (SPA) and competitive adaptive reweighted sampling. The results showed that the SPA-MLR models for moisture, total nitrogen, crude fiber contents, and quality index value yielded the best performance, with coefficients of determination for prediction (R²p) of 0.9357, 0.8543, 0.8188, and 0.9168; root mean square errors of 0.3437, 0.1097, 0.3795, and 1.0358; and residual prediction deviations of 4.00, 2.56, 2.31, and 3.51, respectively. CONCLUSION The results suggested that the hyperspectral imaging technique coupled with chemometrics was a promising tool for the rapid and nondestructive measurement of tea-leaf quality, and had the potential to support the development of multispectral imaging systems for future online detection of tea-leaf quality. © 2020 Society of Chemical Industry.
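The three figures of merit reported in this abstract can be computed from reference and predicted values as sketched below. The sketch assumes the common conventions R²p = 1 − SSE/SST, RMSEP = sqrt(SSE/n), and RPD = SD of the reference values (n − 1 denominator) divided by RMSEP. Conventions differ between papers, and the abstract does not say which ones the authors used. The function name and data are illustrative.

```python
import math


def prediction_metrics(y_ref, y_pred):
    """Return (R2p, RMSEP, RPD) for a prediction set."""
    n = len(y_ref)
    mean_ref = sum(y_ref) / n
    sse = sum((r - p) ** 2 for r, p in zip(y_ref, y_pred))  # residual sum of squares
    sst = sum((r - mean_ref) ** 2 for r in y_ref)           # total sum of squares
    r2p = 1.0 - sse / sst
    rmsep = math.sqrt(sse / n)
    sd_ref = math.sqrt(sst / (n - 1))                       # sample standard deviation
    return r2p, rmsep, sd_ref / rmsep


# Illustrative values only, not data from the paper.
y_ref = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.1, 1.9, 3.2, 3.8, 5.0]
r2p, rmsep, rpd = prediction_metrics(y_ref, y_pred)
print(round(r2p, 4), round(rmsep, 4), round(rpd, 2))
```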
Affiliation(s)
- Yu-Jie Wang
- State Key Laboratory of Tea Plant Biology and Utilization, Anhui Agricultural University, Hefei, China
| | - Lu-Qing Li
- State Key Laboratory of Tea Plant Biology and Utilization, Anhui Agricultural University, Hefei, China
| | - Shan-Shan Shen
- State Key Laboratory of Tea Plant Biology and Utilization, Anhui Agricultural University, Hefei, China
| | - Ying Liu
- State Key Laboratory of Tea Plant Biology and Utilization, Anhui Agricultural University, Hefei, China
| | - Jing-Ming Ning
- State Key Laboratory of Tea Plant Biology and Utilization, Anhui Agricultural University, Hefei, China
| | - Zheng-Zhu Zhang
- State Key Laboratory of Tea Plant Biology and Utilization, Anhui Agricultural University, Hefei, China
| |
|
6
|
Wei G, Li Y, Zhang Z, Chen Y, Chen J, Yao Z, Lao C, Chen H. Estimation of soil salt content by combining UAV-borne multispectral sensor and machine learning algorithms. PeerJ 2020; 8:e9087. [PMID: 32377459 PMCID: PMC7194094 DOI: 10.7717/peerj.9087] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 10/31/2019] [Accepted: 04/08/2020] [Indexed: 11/20/2022]
Abstract
Soil salinization is a global problem closely related to sustainable socio-economic development. Compared with frequently used satellite-borne sensors, unmanned aerial vehicles (UAVs) equipped with multispectral sensors provide an opportunity to monitor soil salinization with on-demand high spatial and temporal resolution. This study aims to quantitatively estimate soil salt content (SSC) using UAV-borne multispectral imagery, and to explore the deep mining of multispectral data. For this purpose, a total of 60 soil samples (0–20 cm) were collected from the Shahaoqu Irrigation Area in Inner Mongolia, China. Multispectral data were obtained from the UAV sensor, and 22 spectral covariates (6 spectral bands and 16 spectral indices) were constructed from them. The sensitive spectral covariates were selected by means of gray relational analysis (GRA), the successive projections algorithm (SPA), and variable importance in projection (VIP), and estimation models were built from the selected covariates using back propagation neural network (BPNN) regression, support vector regression (SVR), and random forest (RF) regression. The performance of the models was assessed by the coefficient of determination (R²), root mean squared error (RMSE), and ratio of performance to deviation (RPD). The results showed that the estimation accuracy of the models was markedly improved by the three variable selection methods, with VIP outperforming GRA and GRA outperforming SPA. The model accuracy of the three machine learning algorithms also differed significantly: RF > SVR > BPNN. All 12 SSC estimation models could be used to quantitatively estimate SSC (RPD > 1.4), while the VIP-RF model achieved the highest accuracy (R²c = 0.835, R²p = 0.812, RPD = 2.299). The results of this study showed that a UAV-borne multispectral sensor is a feasible instrument for SSC estimation, and provide a reference for further similar research.
Affiliation(s)
- Guangfei Wei
- College of Water Resources and Architectural Engineering, Northwest A&F University, Yangling, China.,Key Laboratory of Agricultural Soil and Water Engineering in Arid and Semiarid Areas of Ministry of Education, Northwest A&F University, Yangling, China
| | - Yu Li
- College of Water Resources and Architectural Engineering, Northwest A&F University, Yangling, China
| | - Zhitao Zhang
- College of Water Resources and Architectural Engineering, Northwest A&F University, Yangling, China.,Key Laboratory of Agricultural Soil and Water Engineering in Arid and Semiarid Areas of Ministry of Education, Northwest A&F University, Yangling, China
| | - Yinwen Chen
- Department of Foreign Languages, Northwest A&F University, Yangling, China
| | - Junying Chen
- College of Water Resources and Architectural Engineering, Northwest A&F University, Yangling, China.,Key Laboratory of Agricultural Soil and Water Engineering in Arid and Semiarid Areas of Ministry of Education, Northwest A&F University, Yangling, China
| | - Zhihua Yao
- College of Water Resources and Architectural Engineering, Northwest A&F University, Yangling, China.,Key Laboratory of Agricultural Soil and Water Engineering in Arid and Semiarid Areas of Ministry of Education, Northwest A&F University, Yangling, China
| | - Congcong Lao
- College of Water Resources and Architectural Engineering, Northwest A&F University, Yangling, China.,Key Laboratory of Agricultural Soil and Water Engineering in Arid and Semiarid Areas of Ministry of Education, Northwest A&F University, Yangling, China
| | - Huifang Chen
- College of Water Resources and Architectural Engineering, Northwest A&F University, Yangling, China.,Key Laboratory of Agricultural Soil and Water Engineering in Arid and Semiarid Areas of Ministry of Education, Northwest A&F University, Yangling, China
| |
|
7
|
FT-IR and Raman spectroscopy data fusion with chemometrics for simultaneous determination of chemical quality indices of edible oils during thermal oxidation. Lebensm Wiss Technol 2020. [DOI: 10.1016/j.lwt.2019.108906] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Indexed: 11/18/2022]
|
8
|
Liang L, Wei L, Fang G, Xu F, Deng Y, Shen K, Tian Q, Wu T, Zhu B. Prediction of holocellulose and lignin content of pulp wood feedstock using near infrared spectroscopy and variable selection. SPECTROCHIMICA ACTA. PART A, MOLECULAR AND BIOMOLECULAR SPECTROSCOPY 2020; 225:117515. [PMID: 31521985 DOI: 10.1016/j.saa.2019.117515] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Received: 05/24/2019] [Revised: 09/03/2019] [Accepted: 09/04/2019] [Indexed: 05/25/2023]
Abstract
Wood is the main feedstock source for the pulp and paper industry. However, variations in chemical composition across multispecies and multisource feedstock heavily affect production continuity and stability. As a rapid and non-destructive analysis technique, near infrared (NIR) spectroscopy provides an alternative for on-line analysis of wood properties and feedstock quality control. Herein, near infrared spectroscopy coupled with partial least squares (PLS) regression was used to predict the holocellulose and lignin contents of various wood species, including poplars, eucalyptus, and acacias. To obtain more accurate and robust prediction models, several variable selection methods for optimizing NIR spectral variables were compared, including competitive adaptive reweighted sampling (CARS), Monte Carlo-uninformative variable elimination (MC-UVE), the successive projections algorithm (SPA), and the genetic algorithm (GA). The results indicated that the CARS method was relatively more efficient than the other methods at eliminating uninformative variables and at enhancing the predictive performance of the models. CARS-PLS models showed significantly higher robustness and accuracy for each property while using the fewest variables in cross validation and external validation, demonstrating their applicability and reliability for predicting multispecies feedstock properties.
Affiliation(s)
- Long Liang
- Institute of Chemical Industry of Forest Products, Chinese Academy of Forestry; Key Lab. of Biomass Energy and Material, Jiangsu Province; Co-Innovation Center of Efficient Processing and Utilization of Forest Resources, Jiangsu Province; Key Lab. of Chemical Engineering of Forest Products, National Forestry and Grassland Administration; National Engineering Lab. for Biomass Chemical Utilization, Nanjing, 210042, China; Beijing Key Laboratory of Lignocellulosic Chemistry, Beijing Forestry University, Beijing, 100083, China
| | - Lulu Wei
- Institute of Chemical Industry of Forest Products, Chinese Academy of Forestry; Key Lab. of Biomass Energy and Material, Jiangsu Province; Co-Innovation Center of Efficient Processing and Utilization of Forest Resources, Jiangsu Province; Key Lab. of Chemical Engineering of Forest Products, National Forestry and Grassland Administration; National Engineering Lab. for Biomass Chemical Utilization, Nanjing, 210042, China
| | - Guigan Fang
- Institute of Chemical Industry of Forest Products, Chinese Academy of Forestry; Key Lab. of Biomass Energy and Material, Jiangsu Province; Co-Innovation Center of Efficient Processing and Utilization of Forest Resources, Jiangsu Province; Key Lab. of Chemical Engineering of Forest Products, National Forestry and Grassland Administration; National Engineering Lab. for Biomass Chemical Utilization, Nanjing, 210042, China.
| | - Feng Xu
- Beijing Key Laboratory of Lignocellulosic Chemistry, Beijing Forestry University, Beijing, 100083, China.
| | - Yongjun Deng
- Institute of Chemical Industry of Forest Products, Chinese Academy of Forestry; Key Lab. of Biomass Energy and Material, Jiangsu Province; Co-Innovation Center of Efficient Processing and Utilization of Forest Resources, Jiangsu Province; Key Lab. of Chemical Engineering of Forest Products, National Forestry and Grassland Administration; National Engineering Lab. for Biomass Chemical Utilization, Nanjing, 210042, China
| | - Kuizhong Shen
- Institute of Chemical Industry of Forest Products, Chinese Academy of Forestry; Key Lab. of Biomass Energy and Material, Jiangsu Province; Co-Innovation Center of Efficient Processing and Utilization of Forest Resources, Jiangsu Province; Key Lab. of Chemical Engineering of Forest Products, National Forestry and Grassland Administration; National Engineering Lab. for Biomass Chemical Utilization, Nanjing, 210042, China
| | - Qingwen Tian
- Institute of Chemical Industry of Forest Products, Chinese Academy of Forestry; Key Lab. of Biomass Energy and Material, Jiangsu Province; Co-Innovation Center of Efficient Processing and Utilization of Forest Resources, Jiangsu Province; Key Lab. of Chemical Engineering of Forest Products, National Forestry and Grassland Administration; National Engineering Lab. for Biomass Chemical Utilization, Nanjing, 210042, China
| | - Ting Wu
- Institute of Chemical Industry of Forest Products, Chinese Academy of Forestry; Key Lab. of Biomass Energy and Material, Jiangsu Province; Co-Innovation Center of Efficient Processing and Utilization of Forest Resources, Jiangsu Province; Key Lab. of Chemical Engineering of Forest Products, National Forestry and Grassland Administration; National Engineering Lab. for Biomass Chemical Utilization, Nanjing, 210042, China
| | - Beiping Zhu
- Institute of Chemical Industry of Forest Products, Chinese Academy of Forestry; Key Lab. of Biomass Energy and Material, Jiangsu Province; Co-Innovation Center of Efficient Processing and Utilization of Forest Resources, Jiangsu Province; Key Lab. of Chemical Engineering of Forest Products, National Forestry and Grassland Administration; National Engineering Lab. for Biomass Chemical Utilization, Nanjing, 210042, China
| |
|
9
|
Abstract
This review presents a retrospective of studies carried out over 10 years (2006–2016) using spectroscopic methods as a research tool in the field of virology. Spectroscopic analyses are sensitive to variations in the biochemical composition of the sample, are non-destructive and fast, and require minimal sample preparation, making spectroscopic techniques tools of great interest in biological studies. Important chemometric algorithms that have been used in virological studies are also highlighted as a good alternative for analyzing spectra and for discriminating and classifying samples. Techniques that have not yet been used in the field of virology are also suggested. This methodology emerges as a new and promising field of research, and may be used in the near future as a diagnostic tool for detecting diseases caused by viruses.
|
10
|
Non-destructive internal quality assessment of eggs using a synthesis of hyperspectral imaging and multivariate analysis. J FOOD ENG 2015. [DOI: 10.1016/j.jfoodeng.2015.02.013] [Citation(s) in RCA: 43] [Impact Index Per Article: 4.8] [Indexed: 11/20/2022]
|
11
|
Esteki M, Nouroozi S, Shahsavari Z. A fast and direct spectrophotometric method for the simultaneous determination of methyl paraben and hydroquinone in cosmetic products using successive projections algorithm. Int J Cosmet Sci 2015; 38:25-34. [DOI: 10.1111/ics.12241] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 02/28/2015] [Accepted: 05/03/2015] [Indexed: 11/29/2022]
Affiliation(s)
- M. Esteki
- Department of Chemistry; University of Zanjan; Zanjan 45195-313 Iran
| | - S. Nouroozi
- Department of Chemistry; University of Zanjan; Zanjan 45195-313 Iran
| | - Z. Shahsavari
- Department of Chemistry; University of Zanjan; Zanjan 45195-313 Iran
| |
|
12
|
Unfolded partial least squares/residual bilinearization combined with the Successive Projections Algorithm for interval selection: enhanced excitation-emission fluorescence data modeling in the presence of the inner filter effect. Anal Bioanal Chem 2015; 407:5649-59. [DOI: 10.1007/s00216-015-8745-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 12/17/2014] [Revised: 03/31/2015] [Accepted: 04/27/2015] [Indexed: 11/24/2022]
|
13
|
Dankowska A, Małecka M, Kowalewski W. Detection of plant oil addition to cheese by synchronous fluorescence spectroscopy. Dairy Sci Technol 2015; 95:413-424. [PMID: 26097644 PMCID: PMC4471384 DOI: 10.1007/s13594-015-0218-5] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Received: 10/02/2014] [Revised: 02/04/2015] [Accepted: 02/19/2015] [Indexed: 11/29/2022]
Abstract
The fraudulent addition of plant oils during the manufacturing of hard cheeses is a real issue for the dairy industry. Considering the importance of monitoring adulterations of genuine cheeses, the potential of fluorescence spectroscopy for the detection of cheese adulteration with plant oils was investigated. Synchronous fluorescence spectra were collected within the range of 240 to 700 nm with different wavelength intervals. The lowest detection limits of adulteration, 3.0 and 4.4%, were observed for wavelength intervals of 60 and 80 nm, respectively. Multiple linear regression models were used to calculate the level of adulteration, with the lowest root mean square error of prediction and root mean square error of cross validation equalling 1.5 and 1.8%, respectively, for the measurement acquired at the wavelength interval of 60 nm. Lower classification errors were obtained for the successive projections algorithm-linear discriminant analysis (SPA-LDA) method than for the principal component analysis (PCA)-LDA method. The lowest classification error rates equalled 3.8% (∆λ = 10 and 30 nm) and 0.0% (∆λ = 60 nm) for the PCA-LDA and SPA-LDA classification methods, respectively. The applied technique is useful for detecting the addition of plant fat to hard cheese.
Affiliation(s)
- Anna Dankowska
- Faculty of Commodity Science, Poznań University of Economics, Poznań, Poland
| | - Maria Małecka
- Faculty of Commodity Science, Poznań University of Economics, Poznań, Poland
| | - Wojciech Kowalewski
- Faculty of Mathematics and Computer Science, Adam Mickiewicz University, Poznań, Poland
| |
|
14
|
Liu K, Chen X, Li L, Chen H, Ruan X, Liu W. A consensus successive projections algorithm – multiple linear regression method for analyzing near infrared spectra. Anal Chim Acta 2015; 858:16-23. [DOI: 10.1016/j.aca.2014.12.033] [Citation(s) in RCA: 59] [Impact Index Per Article: 6.6] [Received: 09/22/2014] [Revised: 12/10/2014] [Accepted: 12/16/2014] [Indexed: 11/26/2022]
|
15
|
Dankowska A, Małecka M, Kowalewski W. Application of synchronous fluorescence spectroscopy with multivariate data analysis for determination of butter adulteration. Int J Food Sci Technol 2014. [DOI: 10.1111/ijfs.12594] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Indexed: 11/28/2022]
Affiliation(s)
- Anna Dankowska
- Faculty of Commodity Science; Poznań University of Economics; Poznań Poland
| | - Maria Małecka
- Faculty of Commodity Science; Poznań University of Economics; Poznań Poland
| | - Wojciech Kowalewski
- Faculty of Mathematics and Computer Science; Adam Mickiewicz University; Poznań Poland
| |
|
16
|
Estimating Soil Organic Carbon Using VIS/NIR Spectroscopy with SVMR and SPA Methods. REMOTE SENSING 2014. [DOI: 10.3390/rs6042699] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.8] [Indexed: 11/17/2022]
|
17
|
The Successive Projections Algorithm for interval selection in trilinear partial least-squares with residual bilinearization. Anal Chim Acta 2014; 811:13-22. [DOI: 10.1016/j.aca.2013.12.022] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 09/21/2013] [Revised: 12/12/2013] [Accepted: 12/18/2013] [Indexed: 11/17/2022]
|
18
|
Shi T, Chen Y, Liu H, Wang J, Wu G. Soil organic carbon content estimation with laboratory-based visible-near-infrared reflectance spectroscopy: feature selection. APPLIED SPECTROSCOPY 2014; 68:831-837. [PMID: 25061784 DOI: 10.1366/13-07294] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Indexed: 06/03/2023]
Abstract
This study, with Yixing (Jiangsu Province, China) and Honghu (Hubei Province, China) as study areas, aimed to compare the successive projection algorithm (SPA) and the genetic algorithm (GA) in spectral feature selection for estimating soil organic carbon (SOC) contents with visible-near-infrared (Vis-NIR) reflectance spectroscopy, and further to assess whether the spectral features selected from one site could be applied to another site. The SOC content and Vis-NIR reflectance spectra of soil samples were measured in the laboratory. Savitzky-Golay smoothing and log10(1/R) (where R is reflectance) were used for spectral preprocessing. The reflectance spectra were resampled using different spacing intervals ranging from 2 to 10 nm. Then, SPA and GA were used to select the spectral features of SOC. Partial least squares regression (PLSR) models using the full spectrum and using the spectral features selected by SPA (SPA-PLSR) and GA (GA-PLSR) were calibrated and validated using independent datasets. Moreover, the spectral features selected from one study area were applied to the other area. For both study areas, SPA-PLSR and GA-PLSR improved estimation accuracy and reduced the number of spectral variables compared with full-spectrum PLSR in estimating SOC contents; GA-PLSR obtained better estimation results than SPA-PLSR, whereas SPA was simpler than GA; and the spectral features selected from Yixing could be applied well to Honghu, but not the reverse. These results indicated that SPA and GA could reduce the spectral variables and improve the performance of the PLSR model, and that GA performed better than SPA in estimating SOC contents. However, SPA is simpler and faster than GA in selecting the spectral features of SOC. The spectral features selected from one dataset could be applied to a target dataset when the source dataset contains sufficient information to describe the variability of the samples in the target dataset.
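The log10(1/R) preprocessing named in this abstract converts reflectance to apparent absorbance. The sketch below shows that transform together with one simple way of resampling a spectrum to a coarser spacing. The abstract does not say how the resampling was done, so keeping every k-th point is an assumption made here for illustration, as are the function names and the toy spectrum.

```python
import math


def to_absorbance(reflectance):
    """Apparent absorbance A = log10(1 / R) for each reflectance value R in (0, 1]."""
    return [math.log10(1.0 / r) for r in reflectance]


def resample_every(wavelengths, values, step):
    """Keep every `step`-th point (assumed scheme; the source does not specify one)."""
    return wavelengths[::step], values[::step]


# Toy spectrum sampled at 1 nm spacing (illustrative values only).
wavelengths = list(range(400, 410))
reflectance = [0.10, 0.12, 0.15, 0.20, 0.25, 0.30, 0.40, 0.50, 0.80, 1.00]

absorbance = to_absorbance(reflectance)
wl_2nm, abs_2nm = resample_every(wavelengths, absorbance, step=2)
print(wl_2nm)
print([round(a, 3) for a in abs_2nm])
```

Averaging within each interval rather than decimating would reduce noise at the cost of slightly smoothing narrow features; either choice yields the coarser variable grids the abstract describes.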
Affiliation(s)
- Tiezhu Shi
- School of Resource and Environmental Science and Key Laboratory of Geographic Information System of the Ministry of Education, Wuhan University, 430079, Wuhan, China
|
19
|
Application of a new SPA-SVM coupling method for QSPR study of electrophoretic mobilities of some organic and inorganic compounds. CHINESE CHEM LETT 2013. [DOI: 10.1016/j.cclet.2013.06.002] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 11/18/2022]
|
20
|
de Araújo Gomes A, Galvão RKH, de Araújo MCU, Véras G, da Silva EC. The successive projections algorithm for interval selection in PLS. Microchem J 2013. [DOI: 10.1016/j.microc.2013.03.015] [Citation(s) in RCA: 57] [Impact Index Per Article: 5.2] [Indexed: 11/24/2022]
|
21
|
Dankowska A, Małecka M, Kowalewski W. Discrimination of edible olive oils by means of synchronous fluorescence spectroscopy with multivariate data analysis. GRASAS Y ACEITES 2013. [DOI: 10.3989/gya.012613] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Indexed: 11/05/2022]
|
22
|
Shahlaei M. Descriptor selection methods in quantitative structure-activity relationship studies: a review study. Chem Rev 2013; 113:8093-103. [PMID: 23822589 DOI: 10.1021/cr3004339] [Citation(s) in RCA: 116] [Impact Index Per Article: 10.5] [Indexed: 11/29/2022]
Affiliation(s)
- Mohsen Shahlaei
- Department of Medicinal Chemistry and Novel Drug Delivery Research Center, School of Pharmacy, Kermanshah University of Medical Sciences , Kermanshah 81746-73461, Iran
| |
|
23
|
Soares SFC, Gomes AA, Araujo MCU, Filho ARG, Galvão RKH. The successive projections algorithm. Trends Analyt Chem 2013. [DOI: 10.1016/j.trac.2012.09.006] [Citation(s) in RCA: 142] [Impact Index Per Article: 12.9] [Indexed: 11/30/2022]
|
24
|
Screening analysis of biodiesel feedstock using UV–vis, NIR and synchronous fluorescence spectrometries and the successive projections algorithm. Talanta 2012; 97:579-83. [DOI: 10.1016/j.talanta.2012.04.056] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.3] [Received: 01/28/2012] [Revised: 04/25/2012] [Accepted: 04/28/2012] [Indexed: 11/24/2022]
|
25
|
Xu H, Qi B, Sun T, Fu X, Ying Y. Variable selection in visible and near-infrared spectra: Application to on-line determination of sugar content in pears. J FOOD ENG 2012. [DOI: 10.1016/j.jfoodeng.2011.09.022] [Citation(s) in RCA: 90] [Impact Index Per Article: 7.5] [Indexed: 11/15/2022]
|
26
|
A modification of the successive projections algorithm for spectral variable selection in the presence of unknown interferents. Anal Chim Acta 2011; 689:22-8. [DOI: 10.1016/j.aca.2011.01.022] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.8] [Received: 10/19/2010] [Revised: 12/24/2010] [Accepted: 01/12/2011] [Indexed: 11/18/2022]
|
27
|
A Non-invasive Method for Screening Sodium Hydroxymethanesulfonate in Wheat Flour by Near-Infrared Spectroscopy. FOOD ANAL METHOD 2011. [DOI: 10.1007/s12161-011-9198-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 10/18/2022]
|
28
|
de Lira LFB, de Albuquerque MS, Pacheco JGA, Fonseca TM, Cavalcanti EHDS, Stragevitch L, Pimentel MF. Infrared spectroscopy and multivariate calibration to monitor stability quality parameters of biodiesel. Microchem J 2010. [DOI: 10.1016/j.microc.2010.02.014] [Citation(s) in RCA: 59] [Impact Index Per Article: 4.2] [Indexed: 11/15/2022]
|
29
|
Zou X, Zhao J, Mao H, Shi J, Yin X, Li Y. Genetic algorithm interval partial least squares regression combined successive projections algorithm for variable selection in near-infrared quantitative analysis of pigment in cucumber leaves. APPLIED SPECTROSCOPY 2010; 64:786-794. [PMID: 20615293 DOI: 10.1366/000370210791666246] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Indexed: 05/29/2023]
Abstract
Variable (or wavelength) selection plays an important role in the quantitative analysis of near-infrared (NIR) spectra. A method based on a genetic algorithm interval partial least squares regression (GAiPLS) combined with the successive projections algorithm (SPA) was proposed for variable selection in NIR spectroscopy. GAiPLS was used to select informative interval regions across the spectrum, and then SPA was employed to select the most informative variables and to minimize collinearity between those variables in the model. The performance of the proposed method was compared with the full-spectrum model, conventional interval partial least squares regression (iPLS), and backward interval partial least squares regression (BiPLS) for modeling the NIR data sets of pigments in cucumber leaf samples. The multiple linear regression (MLR) model was obtained with eight variables for chlorophylls and five variables for carotenoids selected by SPA. When the SPA model was applied to the prediction of the validation set, the correlation coefficients of the predicted value by MLR and the measured value for the validation data set (r(p)) of chlorophylls and carotenoids were 0.917 and 0.932, respectively. Results show that the proposed method was able to select important wavelengths from the NIR spectra and makes the prediction more robust and accurate in quantitative analysis.
Affiliation(s)
- Xiaobo Zou
- Agricultural Product Nondestructive Detection Lab, School of Food and Biological Engineering, Jiangsu University, Zhenjiang, Jiangsu 212013, China.
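The SPA step summarized in the Zou et al. abstract above (picking variables so that collinearity among them is minimized) can be illustrated with a minimal sketch of the projection phase as it is commonly described in the SPA literature. This is not the authors' code: the function name `spa_chain`, its parameters, and the toy matrix are hypothetical, and the sketch leaves out the GAiPLS interval pre-selection and the validation-based choice of the starting variable and chain length.

```python
import numpy as np

def spa_chain(X, start, n_select):
    """Build one SPA chain of column indices from X (samples x variables).

    Assumes n_select does not exceed the rank of X, so the column picked
    at each step has non-zero norm.
    """
    Xp = np.asarray(X, dtype=float).copy()
    selected = [start]
    for _ in range(n_select - 1):
        v = Xp[:, selected[-1]]
        # Project every column onto the subspace orthogonal to the
        # most recently selected column.
        Xp = Xp - np.outer(v, v @ Xp) / (v @ v)
        norms = np.linalg.norm(Xp, axis=0)
        norms[selected] = -1.0  # already-chosen columns cannot be picked again
        # The column with the largest residual norm is the least collinear
        # with everything selected so far.
        selected.append(int(np.argmax(norms)))
    return selected

# Toy data (made up): column 1 is an exact multiple of column 0.
X_demo = np.array([[1.0, 2.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0, 0.0],
                   [0.0, 0.0, 0.0, 0.5]])
print(spa_chain(X_demo, start=0, n_select=3))  # [0, 2, 3]
```

In the toy matrix the collinear column 1 is never chosen, because its projection orthogonal to column 0 has zero norm.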
|
30
|
Acebal CC, Grünhut M, Lista AG, Fernández Band BS. Successive projections algorithm applied to spectral data for the simultaneous determination of flavour enhancers. Talanta 2010; 82:222-6. [DOI: 10.1016/j.talanta.2010.04.024] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 03/12/2010] [Revised: 04/12/2010] [Accepted: 04/14/2010] [Indexed: 11/27/2022]
|
31
|
Variables selection methods in near-infrared spectroscopy. Anal Chim Acta 2010; 667:14-32. [DOI: 10.1016/j.aca.2010.03.048] [Citation(s) in RCA: 651] [Impact Index Per Article: 46.5] [Received: 11/30/2009] [Revised: 03/21/2010] [Accepted: 03/23/2010] [Indexed: 02/07/2023]
|
32
|
Application of a hybrid variable selection method for the classification of rapeseed oils based on 1H NMR spectral analysis. Eur Food Res Technol 2010. [DOI: 10.1007/s00217-010-1241-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 10/19/2022]
|
33
|
Goudarzi N, Goodarzi M, Araujo MCU, Galvão RKH. QSPR modeling of soil sorption coefficients (K(OC)) of pesticides using SPA-ANN and SPA-MLR. JOURNAL OF AGRICULTURAL AND FOOD CHEMISTRY 2009; 57:7153-7158. [PMID: 19722589 DOI: 10.1021/jf9008839] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.4] [Indexed: 05/28/2023]
Abstract
A quantitative structure-property relationship (QSPR) study was conducted to predict the adsorption coefficients of some pesticides. The successive projection algorithm feature selection (SPA) strategy was used as descriptor selection and model development method. Modeling of the relationship between selected molecular descriptors and adsorption coefficient data was achieved by linear (multiple linear regression; MLR) and nonlinear (artificial neural network; ANN) methods. The QSPR models were validated by cross-validation as well as application of the models to predict the K(OC) of external set compounds, which did not contribute to model development steps. Both linear and nonlinear methods provided accurate predictions, although more accurate results were obtained by the ANN model. The root-mean-square errors of test set obtained by MLR and ANN models were 0.3705 and 0.2888, respectively.
Affiliation(s)
- Nasser Goudarzi
- Faculty of Chemistry, Shahrood University of Technology, P.O. Box 316, Shahrood, Iran.
|
34
|
Successive projections algorithm improving the multivariate simultaneous direct spectrophotometric determination of five phenolic compounds in sea water. Microchem J 2007. [DOI: 10.1016/j.microc.2006.04.021] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.9] [Indexed: 11/21/2022]
|
35
|
Luis ML, Fraga JM, Jiménez F, Jiménez AI, Hernández OM, Arias JJ. Selection of Wavelength Range and Number of Factors to be Used in the PLS Treatment of Spectrophotometric Data. ANAL LETT 2007. [DOI: 10.1080/00032710600867598] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
|
36
|
Abstract
The applicability of genetic algorithms for solving multicomponent analyses is systematically examined. As a genetic algorithm (GA), the basic proposal of Goldberg is implemented in a straightforward manner to simulate multicomponent analyses in analogy to the well-established UV-vis or IR methods, especially multicomponent regression. The main focus of the study is to investigate the behavior of the genetic algorithm in order to compare it with the well-known behavior of multicomponent regression. A remarkable difference between the two methods is that the genetic algorithm method does not need any calibration procedure because of its pure searching characteristic. As important features of multicomponent systems, the degree of signal overlap (selectivity), the behavior of systems with known and unknown component numbers and qualities, and linear as well as nonlinear relationships between the analytical signal and concentration are varied within the simulations. According to multicomponent regression, recovering concentrations by a genetic algorithm is of limited applicability with the exception of systems at a low degree of signal overlap. On the other hand, the recovery of a probe spectrum in the analytical process always gives satisfactory results independent of the features of the probe system. The genetic algorithm obviously shows autoadaptive behavior in probe spectrum recovery. The quality and quantity of the resulting components may dramatically differ from the given probe, although the resulting spectrum is nearly the same. In such cases, the resulting component mixture can be interpreted as an imitation of the probe. As well as probe spectra, theoretically designed spectra can also be autoadapted by genetic algorithms. The only limitation is that the desired spectrum must, of course, be incorporated into the search space defined by the involved components. Furthermore, a spectral signal is only one single property of a chemical compound or mixture.
Because of the nonlinear search characteristic of genetic algorithms, any other chemical or physical property can also be treated as a desired property. Therefore, the conclusion of the study is well-founded that an old challenge of applied chemistry, namely, the development of new chemical products with desired properties, seems to be reachable under the control of genetic algorithms.
Affiliation(s)
- Peter Zinn
- Ruhr-Universität Bochum, Lehrstuhl für Analytische Chemie, 44780 Bochum, Germany.
|
37
|
El-Sayed AAY, El-Salem NA. Recent Developments of Derivative Spectrophotometry and Their Analytical Applications. ANAL SCI 2005; 21:595-614. [PMID: 15984192 DOI: 10.2116/analsci.21.595] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.8] [Indexed: 11/23/2022]
Abstract
Articles about the development of derivative spectrophotometric methods and analytical applications of derivative spectrophotometry (DS) published in the last nine years (since 1994) are reviewed.
Affiliation(s)
- Abdel-Aziz Y El-Sayed
- Department of Chemistry, Faculty of Science, Al-Azhar University (Assiut Branch), Assiut, Egypt.
|
38
|
Breitkreitz MC, Raimundo IMJ, Rohwedder JJR, Pasquini C, Dantas Filho HA, José GE, Araújo MCU. Determination of total sulfur in diesel fuel employing NIR spectroscopy and multivariate calibration. Analyst 2003; 128:1204-7. [PMID: 14529031 DOI: 10.1039/b305265f] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.2] [Indexed: 11/21/2022]
Abstract
A method for sulfur determination in diesel fuel employing near infrared spectroscopy, variable selection and multivariate calibration is described. The performances of principal component regression (PCR) and partial least square (PLS) chemometric methods were compared with those shown by multiple linear regression (MLR), performed after variable selection based on the genetic algorithm (GA) or the successive projection algorithm (SPA). Ninety seven diesel samples were divided into three sets (41 for calibration, 30 for internal validation and 26 for external validation), each of them covering the full range of sulfur concentrations (from 0.07 to 0.33% w/w). Transflectance measurements were performed from 850 to 1800 nm. Although principal component analysis identified the presence of three groups, PLS, PCR and MLR provided models whose predicting capabilities were independent of the diesel type. Calibration with PLS and PCR employing all the 454 wavelengths provided root mean square errors of prediction (RMSEP) of 0.036% and 0.043% for the validation set, respectively. The use of GA and SPA for variable selection provided calibration models based on 19 and 9 wavelengths, with a RMSEP of 0.031% (PLS-GA), 0.022% (MLR-SPA) and 0.034% (MLR-GA). As the ASTM 4294 method allows a reproducibility of 0.05%, it can be concluded that a method based on NIR spectroscopy and multivariate calibration can be employed for the determination of sulfur in diesel fuels. Furthermore, the selection of variables can provide more robust calibration models and SPA provided more parsimonious models than GA.
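The comparison in the abstract above rests on fitting an MLR model on a few selected wavelengths and scoring it by RMSEP on samples held out from calibration. The following is a minimal sketch of that calculation, assuming ordinary least squares with an intercept; the helper names `fit_mlr`, `predict_mlr`, and `rmsep` and all numbers are hypothetical and are not taken from the paper.

```python
import numpy as np

def fit_mlr(X_cal, y_cal):
    """Ordinary least-squares MLR with an intercept; returns [b0, b1, ..., bk]."""
    X_cal = np.asarray(X_cal, dtype=float)
    A = np.column_stack([np.ones(X_cal.shape[0]), X_cal])
    coef, *_ = np.linalg.lstsq(A, np.asarray(y_cal, dtype=float), rcond=None)
    return coef

def predict_mlr(coef, X):
    """Apply the fitted intercept and slopes to new samples."""
    return coef[0] + np.asarray(X, dtype=float) @ coef[1:]

def rmsep(y_true, y_pred):
    """Root mean square error of prediction over an external set."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

# Made-up calibration data: two "selected wavelengths" and a noise-free
# linear response y = 0.5 + 1.0*x1 - 2.0*x2.
X_cal = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y_cal = 0.5 + X_cal @ np.array([1.0, -2.0])
coef = fit_mlr(X_cal, y_cal)

# Made-up external set, scored with the fitted model.
X_val = np.array([[2.0, 1.0], [0.5, 0.5]])
y_val = 0.5 + X_val @ np.array([1.0, -2.0])
print(rmsep(y_val, predict_mlr(coef, X_val)))
```

Because the toy response is exactly linear, the printed RMSEP is zero up to floating-point rounding; real spectra would give a non-zero value in the units of the reference method.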
|
39
|
Hadjiloucas S, Galvão RKH, Bowen JW. Analysis of spectroscopic measurements of leaf water content at terahertz frequencies using linear transforms. JOURNAL OF THE OPTICAL SOCIETY OF AMERICA. A, OPTICS, IMAGE SCIENCE, AND VISION 2002; 19:2495-509. [PMID: 12469746 DOI: 10.1364/josaa.19.002495] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Indexed: 05/20/2023]
Abstract
We provide a unified framework for a range of linear transforms that can be used for the analysis of terahertz spectroscopic data, with particular emphasis on their application to the measurement of leaf water content. The use of linear transforms for filtering, regression, and classification is discussed. For illustration, a classification problem involving leaves at three stages of drought and a prediction problem involving simulated spectra are presented. Issues resulting from scaling the data set are discussed. Using Lagrange multipliers, we arrive at the transform that yields the maximum separation between the spectra and show that this optimal transform is equivalent to computing the Euclidean distance between the samples. The optimal linear transform is compared with the average for all the spectra as well as with the Karhunen-Loève transform to discriminate a wet leaf from a dry leaf. We show that taking several principal components into account is equivalent to defining new axes in which data are to be analyzed. The procedure shows that the coefficients of the Karhunen-Loève transform are well suited to the process of classification of spectra. This is in line with expectations, as these coefficients are built from the statistical properties of the data set analyzed.
Affiliation(s)
- Sillas Hadjiloucas
- Department of Cybernetics, The University of Reading, PO Box 225, Whiteknights, Reading, RG6 6AY UK.
|