1
Chen H, Tan C, Lin Z, Chen M, Cheng B. Applying virtual sample generation and ensemble modeling for improving the spectral diagnosis of cancer. SPECTROCHIMICA ACTA. PART A, MOLECULAR AND BIOMOLECULAR SPECTROSCOPY 2024; 318:124518. [PMID: 38796889 DOI: 10.1016/j.saa.2024.124518] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2024] [Revised: 05/11/2024] [Accepted: 05/23/2024] [Indexed: 05/29/2024]
Abstract
Cancer diagnosis plays a key role in facilitating treatment and improving patient survival rates. The combination of near-infrared (NIR) spectroscopy with data-driven algorithms offers a rapid and cost-effective approach to this task. Because clinical cases are limited, tumor samples are usually fewer than normal samples, and the resulting dataset exhibits class imbalance, which has a serious impact on the performance of diagnostic models. To deal with class imbalance and improve sensitivity, this work investigates the feasibility of combining NIR spectroscopy with virtual sample generation (VSG) and an ensemble strategy for developing diagnostic models. Based on a preliminary experiment, several learning algorithms, such as discriminant analysis (DA) and partial least squares-discriminant analysis (PLS-DA), were selected for constructing prediction models. Three VSG algorithms, the synthetic minority oversampling technique (SMOTE), Borderline-SMOTE, and adaptive synthetic sampling (ADASYN), were used in the experiments. A fixed subset of 27 cancer samples and 54 normal samples was held out as the test set. Three training sets containing 5, 10, or 25 minority-class samples and 54 majority-class samples were used for model development. The experimental results indicate that, with the PLS-DA algorithm, all VSG approaches significantly improve the sensitivity of cancer diagnosis for every training set, regardless of the number of minority samples, with ADASYN performing best. This suggests that the integration of NIR, PLS-DA, and ADASYN is a promising toolkit for developing diagnostic methods.
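The interpolation step at the heart of SMOTE can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function name `smote`, the toy three-point "spectra", the neighbour count `k=3`, and the choice to top up the 5-sample minority class to 54 samples are all assumptions made for the example.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new virtual samples by interpolating between a minority
    sample and one of its k nearest minority-class neighbours.
    (Hypothetical helper written for this sketch.)"""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [x for j, x in enumerate(minority) if j != i]
        others.sort(key=lambda x: math.dist(base, x))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy "spectra" with three wavelengths each (made-up numbers).
minority = [
    [0.10, 0.20, 0.30],
    [0.12, 0.22, 0.28],
    [0.09, 0.25, 0.31],
    [0.11, 0.19, 0.33],
    [0.13, 0.21, 0.29],
]
# Assumed balancing target: 5 real + 49 virtual = 54 minority samples.
new_samples = smote(minority, n_new=49)
```

Each virtual sample lies on a line segment between two real minority samples, so it never leaves the range spanned by the observed data. Borderline-SMOTE and ADASYN use the same interpolation but differ in which minority samples serve as bases and how many virtual samples each one receives.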
Affiliation(s)
- Hui Chen
  - Key Lab of Process Analysis and Control of Sichuan Universities, Yibin University, Yibin, Sichuan 644000, China; Hospital, Yibin University, Yibin, Sichuan 644000, China
- Chao Tan
  - Key Lab of Process Analysis and Control of Sichuan Universities, Yibin University, Yibin, Sichuan 644000, China; College of Materials and Chemical Engineering, Yibin University, Yibin, Sichuan 644000, China
- Zan Lin
  - Department of Knee Sports Injury, Sichuan Province Orthopedic Hospital, Chengdu, Sichuan 610041, China
- Maoxian Chen
  - Key Lab of Process Analysis and Control of Sichuan Universities, Yibin University, Yibin, Sichuan 644000, China
- Bin Cheng
  - Key Lab of Process Analysis and Control of Sichuan Universities, Yibin University, Yibin, Sichuan 644000, China
2
Bian X, Liu Y, Zhang R, Sun H, Liu P, Tan X. Rapid quantification of grapeseed oil multiple adulterations using near-infrared spectroscopy coupled with a novel double ensemble modeling method. SPECTROCHIMICA ACTA. PART A, MOLECULAR AND BIOMOLECULAR SPECTROSCOPY 2024; 311:124016. [PMID: 38354676 DOI: 10.1016/j.saa.2024.124016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2023] [Revised: 02/04/2024] [Accepted: 02/07/2024] [Indexed: 02/16/2024]
Abstract
As a high-quality edible oil, grapeseed oil is often adulterated with cheaper, lower-quality vegetable oils. A novel ensemble modeling method combined with near-infrared (NIR) spectroscopy is proposed for the quantitative analysis of grapeseed oil adulteration. The method, named MC-WOA-PLS, combines Monte Carlo (MC) sampling and the whale optimization algorithm (WOA) to build numerous partial least squares (PLS) sub-models. A total of 80 adulterated grapeseed oil samples were prepared by mixing grapeseed oil with soybean oil, palm oil, cottonseed oil, and corn oil at the designed mass percentages. NIR spectra of the 80 samples were measured in transmittance mode over the range 12,000-4000 cm⁻¹. The parameters of MC-WOA-PLS, including the number of latent variables (LVs) in PLS, the number of WOA iterations, the number of whales, the number of PLS sub-models, and the percentage of training subsets, were optimized. To evaluate the prediction performance of the model, the root mean square error of calibration (RMSEC), root mean square error of cross-validation (RMSECV), root mean square error of prediction (RMSEP), correlation coefficient (R), residual predictive deviation (RPD), and standard deviation (S.D.) were used. Compared with PLS, standard normal variate-PLS (SNV-PLS), uninformative variable elimination-PLS (UVE-PLS), Monte Carlo uninformative variable elimination-PLS (MCUVE-PLS), randomization test-PLS (RT-PLS), variable importance in projection-PLS (VIP-PLS), and WOA-PLS, MC-WOA-PLS achieves the best prediction accuracy and stability for quantification of the five pure oils in adulterated grapeseed oil samples.
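Two of the figures of merit listed above are easy to state in code. The sketch below assumes the common definitions: RMSE as the square root of the mean squared residual, and RPD as the standard deviation of the reference values divided by the RMSE on the same set. Whether the paper uses the sample (n - 1) or population (n) standard deviation is not stated, so the `statistics.stdev` choice here is an assumption, and the numbers are made up.

```python
import math
import statistics


def rmse(y_true, y_pred):
    """Root mean square error between reference and predicted values."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def rpd(y_true, y_pred):
    """Residual predictive deviation: spread of the reference values
    relative to the prediction error (sample SD assumed)."""
    return statistics.stdev(y_true) / rmse(y_true, y_pred)


# Made-up reference and predicted mass percentages for four samples.
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [11.0, 19.0, 31.0, 39.0]

error = rmse(y_true, y_pred)
ratio = rpd(y_true, y_pred)
```

The same `rmse` function yields RMSEC, RMSECV, or RMSEP depending on whether the predictions come from the calibration set, cross-validation, or an independent prediction set.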
Affiliation(s)
- Xihui Bian
  - State Key Laboratory of Separation Membranes and Membrane Processes, School of Chemical Engineering and Technology, Tiangong University, Tianjin 300387, PR China; NMPA Key Laboratory for Technology Research and Evaluation of Drug Products, Shandong University, Jinan 250012, PR China
- Yuxia Liu
  - State Key Laboratory of Separation Membranes and Membrane Processes, School of Chemical Engineering and Technology, Tiangong University, Tianjin 300387, PR China
- Rongling Zhang
  - State Key Laboratory of Separation Membranes and Membrane Processes, School of Chemical Engineering and Technology, Tiangong University, Tianjin 300387, PR China
- Hao Sun
  - State Key Laboratory of Separation Membranes and Membrane Processes, School of Chemical Engineering and Technology, Tiangong University, Tianjin 300387, PR China
- Peng Liu
  - State Key Laboratory of Separation Membranes and Membrane Processes, School of Chemical Engineering and Technology, Tiangong University, Tianjin 300387, PR China
- Xiaoyao Tan
  - State Key Laboratory of Separation Membranes and Membrane Processes, School of Chemical Engineering and Technology, Tiangong University, Tianjin 300387, PR China
3
Li M, Lai W, Li R, Zhou J, Liu Y, Yu T, Zhang T, Tang H, Li H. Novel Random Forest Ensemble Modeling Strategy Combined with Quantitative Structure-Property Relationship for Density Prediction of Energetic Materials. ACS OMEGA 2023; 8:2752-2759. [PMID: 36687054 PMCID: PMC9850487 DOI: 10.1021/acsomega.2c07436] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2022] [Accepted: 12/23/2022] [Indexed: 06/17/2023]
Abstract
With the further development of the concept of green chemistry, the new generation of energetic materials tends to exhibit detonation properties such as higher insensitivity, higher density, and higher energy. Precise molecular design and green, efficient synthesis of energetic materials will therefore be serious challenges. To predict the detonation performance of energetic materials accurately, an ensemble modeling strategy is proposed that combines Monte Carlo (MC) sampling, random forest (RF) improved by variable importance measurement (VIM), and the quantitative structure-property relationship (QSPR); it was successfully used for density prediction of energetic materials. First, the structures of 162 energetic compounds were optimized with Gaussian software, and molecular descriptors were calculated with CODESSA software from the optimized structures. Then, the MCVIMRF_Med ensemble model was constructed from these molecular descriptors and the corresponding densities of the energetic compounds. The joint X-Y distance algorithm (SPXY) was used to partition the data set, and MC sampling was then used to divide the calibration set into multiple subsets for constructing the ensemble model. The subset size and the number of iterations of the MCVIMRF_Med ensemble model were optimized through MC cross-validation. With these optimized parameters, the final output strategy of the ensemble model was optimized, and an output optimization method based on median screening was proposed and successfully applied to improve the prediction performance of the MCVIMRF_Med ensemble model. To further investigate the MCVIMRF_Med ensemble model, its performance was compared with that of partial least squares, RF, VIMRF, and MCVIMRF calibration models. The results show that the MCVIMRF_Med ensemble model achieves a better prediction of the density of energetic materials, with R²CV of 0.9596, RMSECV of 0.0437 g/cm³, R²P of 0.9768, RMSEP of 0.0578 g/cm³, and a relative analysis deviation of the prediction set of 3.951. Therefore, the MCVIMRF_Med ensemble modeling strategy combined with QSPR is an effective approach for the density prediction of energetic materials. This work is expected to provide new research ideas and technical support for accurate prediction of the detonation performance of energetic materials.
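The abstract does not spell out the median screening rule, so the following is only one plausible reading, offered as a sketch: keep the sub-model predictions closest to the ensemble median and average those. The function name `median_screened_output`, the `keep_fraction` parameter, and the example densities are all hypothetical.

```python
import statistics


def median_screened_output(predictions, keep_fraction=0.5):
    """Average the sub-model predictions that lie closest to the median.
    (An assumed interpretation of 'median screening', not the paper's rule.)"""
    med = statistics.median(predictions)
    # Sort predictions by their absolute distance from the median.
    ranked = sorted(predictions, key=lambda p: abs(p - med))
    n_keep = max(1, round(keep_fraction * len(ranked)))
    return statistics.fmean(ranked[:n_keep])


# Made-up density predictions (g/cm^3) from five sub-models; one is an outlier.
sub_model_predictions = [1.80, 1.82, 1.79, 1.81, 2.40]
combined = median_screened_output(sub_model_predictions, keep_fraction=0.6)
plain_mean = statistics.fmean(sub_model_predictions)
```

In this toy case the screened output stays near 1.81 while the plain mean is pulled upward by the outlying sub-model, which illustrates why a median-based output rule can stabilize an ensemble.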
Affiliation(s)
- Maogang Li
  - Key Laboratory of Synthetic and Natural Functional Molecule of the Ministry of Education, College of Chemistry & Materials Science, Northwest University, Xi’an 710127, China
- Weipeng Lai
  - Xi’an Modern Chemistry Research Institute, Xi’an 710065, China
- Ruirui Li
  - Guangzhou University of Chinese Medicine, Guangzhou 510006, China
- Jiajun Zhou
  - Key Laboratory of Synthetic and Natural Functional Molecule of the Ministry of Education, College of Chemistry & Materials Science, Northwest University, Xi’an 710127, China
- Yingzhe Liu
  - Xi’an Modern Chemistry Research Institute, Xi’an 710065, China
- Tao Yu
  - Xi’an Modern Chemistry Research Institute, Xi’an 710065, China
- Tianlong Zhang
  - Key Laboratory of Synthetic and Natural Functional Molecule of the Ministry of Education, College of Chemistry & Materials Science, Northwest University, Xi’an 710127, China
- Hongsheng Tang
  - Key Laboratory of Synthetic and Natural Functional Molecule of the Ministry of Education, College of Chemistry & Materials Science, Northwest University, Xi’an 710127, China
- Hua Li
  - Key Laboratory of Synthetic and Natural Functional Molecule of the Ministry of Education, College of Chemistry & Materials Science, Northwest University, Xi’an 710127, China
  - College of Chemistry and Chemical Engineering, Xi’an Shiyou University, Xi’an 710065, China
4
Yu S, Liu J. Ensemble calibration model of near-infrared spectroscopy based on functional data analysis. SPECTROCHIMICA ACTA. PART A, MOLECULAR AND BIOMOLECULAR SPECTROSCOPY 2022; 280:121569. [PMID: 35780759 DOI: 10.1016/j.saa.2022.121569] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2022] [Revised: 05/26/2022] [Accepted: 06/25/2022] [Indexed: 06/15/2023]
Abstract
As a nondestructive detection technology, near-infrared spectroscopy has been widely applied in various fields, and research on the associated data processing has attracted growing attention. In contrast to existing discrete data models, this paper proposes FDA-EM-PLS (functional data analysis-ensemble learning-partial least squares), an ensemble calibration model for near-infrared spectroscopy based on functional data analysis. First, the near-infrared spectrum of each sample is divided into several intervals, and functional data analysis is carried out on each interval. Then, the samples are clustered according to the generated functions, which not only reduces the influence of noise but also provides a theoretical basis for selecting variables. Further, Monte Carlo sampling is used to generate training subsets from the clustered samples for ensemble learning, which not only addresses the small-sample problem but also improves the robustness of the model. The experimental results show that the absolute relative error of FDA-EM-PLS is less than 10% for both the corn and soil data.
Affiliation(s)
- Shaohui Yu
  - School of Mathematics and Statistics, Hefei Normal University, Hefei 230061, China
- Jing Liu
  - Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230031, China
5
Wang HP, Chen P, Dai JW, Liu D, Li JY, Xu YP, Chu XL. Recent advances of chemometric calibration methods in modern spectroscopy: Algorithms, strategy, and related issues. Trends Analyt Chem 2022. [DOI: 10.1016/j.trac.2022.116648] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Indexed: 12/14/2022]
6
Zhang X, Li S, Shan Y, Li P, Jiang L, Liu X, Fan W. Accurate nondestructive prediction of soluble solids content in citrus by near‐infrared diffuse reflectance spectroscopy with characteristic variable selection. J FOOD PROCESS PRES 2022. [DOI: 10.1111/jfpp.16480] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Affiliation(s)
- Xinxin Zhang
  - College of Food Science and Technology, Hunan Provincial Key Laboratory of Food Science and Biotechnology, Hunan Agricultural University, Changsha 410128, P. R. China
- Shangke Li
  - College of Food Science and Technology, Hunan Provincial Key Laboratory of Food Science and Biotechnology, Hunan Agricultural University, Changsha 410128, P. R. China
- Yang Shan
  - Hunan Agricultural Product Processing Institute, Hunan Provincial Key Laboratory for Fruits and Vegetables Storage Processing and Quality Safety, Hunan Academy of Agricultural Sciences, Changsha 410125, P. R. China
- Pao Li
  - College of Food Science and Technology, Hunan Provincial Key Laboratory of Food Science and Biotechnology, Hunan Agricultural University, Changsha 410128, P. R. China
  - Hunan Agricultural Product Processing Institute, Hunan Provincial Key Laboratory for Fruits and Vegetables Storage Processing and Quality Safety, Hunan Academy of Agricultural Sciences, Changsha 410125, P. R. China
- Liwen Jiang
  - College of Food Science and Technology, Hunan Provincial Key Laboratory of Food Science and Biotechnology, Hunan Agricultural University, Changsha 410128, P. R. China
- Xia Liu
  - College of Food Science and Technology, Hunan Provincial Key Laboratory of Food Science and Biotechnology, Hunan Agricultural University, Changsha 410128, P. R. China
- Wei Fan
  - College of Food Science and Technology, Hunan Provincial Key Laboratory of Food Science and Biotechnology, Hunan Agricultural University, Changsha 410128, P. R. China
7
Spectral variable selection based on least absolute shrinkage and selection operator with ridge-adding homotopy. CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS 2022. [DOI: 10.1016/j.chemolab.2021.104487] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 02/08/2023]
8
Wang K, Bian X, Zheng M, Liu P, Lin L, Tan X. Rapid determination of hemoglobin concentration by a novel ensemble extreme learning machine method combined with near-infrared spectroscopy. SPECTROCHIMICA ACTA. PART A, MOLECULAR AND BIOMOLECULAR SPECTROSCOPY 2021; 263:120138. [PMID: 34304011 DOI: 10.1016/j.saa.2021.120138] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/11/2021] [Revised: 06/23/2021] [Accepted: 06/29/2021] [Indexed: 06/13/2023]
Abstract
A novel ensemble extreme learning machine (ELM) approach that combines Monte Carlo (MC) sampling and the least absolute shrinkage and selection operator (LASSO), named MC-LASSO-ELM, is proposed to determine the hemoglobin concentration of blood. It employs MC sampling to randomly select samples from the training set and then LASSO to choose variables from the selected samples, establishing many ELM sub-models. The final prediction is obtained by combining the predictions of these sub-models. Combined with near-infrared spectroscopy, MC-LASSO-ELM is used to determine the hemoglobin concentration of blood. Compared with ELM, MC-ELM, and LASSO-ELM, MC-LASSO-ELM achieves the best stability and the highest accuracy.
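The Monte Carlo sub-model idea can be sketched without any chemometrics library. In the illustration below a one-variable least-squares line stands in for the ELM sub-model and the LASSO variable-selection step is omitted entirely, so this shows only the sampling-and-combining skeleton. The names `fit_line` and `mc_ensemble_predict`, the 70% subset fraction, the 50 sub-models, and the toy data are assumptions for the example.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares line; a simple stand-in for an ELM sub-model."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def mc_ensemble_predict(xs, ys, x_new, n_models=50, fraction=0.7, seed=0):
    """Fit one sub-model per random training subset, then combine them.
    Returns the mean prediction and the spread across sub-models."""
    rng = random.Random(seed)
    n_sub = max(2, int(fraction * len(xs)))
    preds = []
    for _ in range(n_models):
        # Monte Carlo step: draw a random subset without replacement.
        idx = rng.sample(range(len(xs)), n_sub)
        slope, intercept = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        preds.append(slope * x_new + intercept)
    return statistics.fmean(preds), statistics.stdev(preds)


# Toy, noise-free data: response = 2 * absorbance + 1 (made-up relationship).
xs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
ys = [2 * x + 1 for x in xs]
mean_pred, spread = mc_ensemble_predict(xs, ys, x_new=0.45)
```

With real, noisy spectra the sub-models disagree, and the spread across them gives a rough indication of the stability that the abstract reports alongside accuracy.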
Affiliation(s)
- Kaiyi Wang
  - State Key Laboratory of Separation Membranes and Membrane Processes, Tiangong University, Tianjin 300387, PR China; Tianjin Key Laboratory of Green Chemical Process Engineering, School of Chemical Engineering and Technology, Tiangong University, Tianjin 300387, PR China
- Xihui Bian
  - State Key Laboratory of Separation Membranes and Membrane Processes, Tiangong University, Tianjin 300387, PR China; Tianjin Key Laboratory of Green Chemical Process Engineering, School of Chemical Engineering and Technology, Tiangong University, Tianjin 300387, PR China; Key Lab of Process Analysis and Control of Sichuan Universities, Yibin University, 644000, PR China
- Meng Zheng
  - Tianjin Key Laboratory of Green Chemical Process Engineering, School of Chemical Engineering and Technology, Tiangong University, Tianjin 300387, PR China
- Peng Liu
  - State Key Laboratory of Separation Membranes and Membrane Processes, Tiangong University, Tianjin 300387, PR China; Tianjin Key Laboratory of Green Chemical Process Engineering, School of Chemical Engineering and Technology, Tiangong University, Tianjin 300387, PR China
- Ligang Lin
  - State Key Laboratory of Separation Membranes and Membrane Processes, Tiangong University, Tianjin 300387, PR China
- Xiaoyao Tan
  - State Key Laboratory of Separation Membranes and Membrane Processes, Tiangong University, Tianjin 300387, PR China; Tianjin Key Laboratory of Green Chemical Process Engineering, School of Chemical Engineering and Technology, Tiangong University, Tianjin 300387, PR China
9
Ensemble Modeling on Near-Infrared Spectra as Rapid Tool for Assessment of Soil Health Indicators for Sustainable Food Production Systems. SOIL SYSTEMS 2021. [DOI: 10.3390/soilsystems5040069] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/16/2022]
Abstract
A novel total ensemble (TE) algorithm was developed and compared with random forest optimization (RFO), gradient boosted machines (GBM), partial least squares (PLS), Cubist and Bayesian additive regression tree (BART) algorithms to predict numerous soil health indicators in soils with diverse climate-smart land uses at different soil depths. The study investigated how land-use practices affect several soil health indicators. Good predictions using the ensemble method were obtained for total carbon (R² = 0.87; RMSE = 0.39; RPIQ = 1.36 and RPD = 1.51), total nitrogen (R² = 0.82; RMSE = 0.03; RPIQ = 2.00 and RPD = 1.60), and exchangeable bases, m3. Cu, m3. Fe, m3. B, m3. Mn, exchangeable Na, Ca (R² > 0.70). The performances of the algorithms were in the order TE > Cubist > BART > PLS > GBM > RFO. Soil properties differed significantly among land uses and between soil depths. In Kenya, however, soil pH was not significant, except at depths of 45–100 cm, while the Fe levels in Tanzanian grassland were significantly high at all depths. Ugandan agroforestry had a substantially high concentration of ExCa at 0–15 cm. The total ensemble method gave better predictions than the other algorithms. Climate-smart land-use practices to preserve soil quality can be adopted for sustainable food production systems.
10
Sun A, Jia W, Hei D, Qiu M, Cheng C, Li J. A full spectral analysis method for the gamma spectrum: weighted library least squares. ANALYTICAL METHODS : ADVANCING METHODS AND APPLICATIONS 2021; 13:4718-4723. [PMID: 34580692 DOI: 10.1039/d1ay01319j] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/13/2023]
Abstract
The traditional library least squares (LLS) approach is affected by the inconsistency of the statistical uncertainties of different channels in a gamma spectrum, which leads to large fluctuations in the analysis results. This work proposes a weighted library least squares (WLLS) approach that uses the square root of the count to weight the regression objective function, and reports a verification experiment based on prompt gamma neutron activation analysis (PGNAA). The results showed that, after weighting by the square root of the count, the fluctuation level of statistical uncertainty in the spectrum was reduced from 44.34 to 2.25. With the WLLS approach, the average standard deviation of the results was reduced to at least 0.37 times that of the LLS approach.
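A weighted library fit of this kind can be sketched for two library components. The example assumes that "weighting by the square root of the count" means dividing each channel's residual by the square root of its count, which is the usual Poisson-uncertainty weighting and is equivalent to a weight of 1/count on the squared residual; the paper's exact objective function may differ. The function name `wlls_two_component` and the library and spectrum values are made up.

```python
def wlls_two_component(spectrum, lib_a, lib_b):
    """Weighted least-squares fit of spectrum ~ ca * lib_a + cb * lib_b.
    Each squared residual is weighted by 1/count (assumed Poisson variance)."""
    # Guard against empty channels so the weight stays finite.
    w = [1.0 / max(c, 1.0) for c in spectrum]
    saa = sum(wi * a * a for wi, a in zip(w, lib_a))
    sbb = sum(wi * b * b for wi, b in zip(w, lib_b))
    sab = sum(wi * a * b for wi, a, b in zip(w, lib_a, lib_b))
    say = sum(wi * a * y for wi, a, y in zip(w, lib_a, spectrum))
    sby = sum(wi * b * y for wi, b, y in zip(w, lib_b, spectrum))
    # Solve the 2x2 weighted normal equations with Cramer's rule.
    det = saa * sbb - sab * sab
    ca = (say * sbb - sab * sby) / det
    cb = (saa * sby - sab * say) / det
    return ca, cb


# Made-up single-element library spectra over five channels.
lib_a = [1.0, 4.0, 2.0, 0.5, 0.2]
lib_b = [0.3, 0.5, 1.0, 3.0, 1.5]
# Noise-free mixed spectrum built from known coefficients 100 and 40.
spectrum = [100 * a + 40 * b for a, b in zip(lib_a, lib_b)]
ca, cb = wlls_two_component(spectrum, lib_a, lib_b)
```

On noise-free data the weighted and unweighted fits agree; the weighting matters once counting noise is present, because low-count channels then no longer carry the same influence as high-count ones.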
Affiliation(s)
- AiYun Sun
  - Department of Nuclear Science and Technology, College of Materials Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, 210016, China
- WenBao Jia
  - Department of Nuclear Science and Technology, College of Materials Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, 210016, China
  - Collaborative Innovation Centre of Radiation Medicine of Jiangsu Higher Education Institutions, Suzhou, 215000, China
- DaQian Hei
  - School of Nuclear Science and Technology, Lanzhou University, Lanzhou, 730000, China
- MengCheng Qiu
  - Department of Nuclear Science and Technology, College of Materials Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, 210016, China
- Can Cheng
  - Department of Nuclear Science and Technology, College of Materials Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, 210016, China
- JiaTong Li
  - School of Physical Science and Technology, Lanzhou University, Lanzhou, 730000, China