1
|
Ren J, Xiong Y, Chen X, Hao Y. Comparative Analysis of Machine Learning and Deep Learning Algorithms for Assessing Agricultural Product Quality Using NIRS. Sensors (Basel, Switzerland) 2024; 24:5438. [PMID: 39205133 PMCID: PMC11360223 DOI: 10.3390/s24165438] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2024] [Revised: 08/17/2024] [Accepted: 08/19/2024] [Indexed: 09/04/2024]
Abstract
The success of near-infrared spectroscopy (NIRS) analysis hinges on the precision and robustness of the calibration model. Shallow learning (SL) algorithms such as partial least squares discriminant analysis (PLS-DA) often fail to capture the interrelationships between adjacent spectral variables, and their results are easily affected by spectral noise, which limits the breadth and depth of NIRS applications. Deep learning (DL) methods, with their capacity to discern intricate features from limited samples, have been progressively integrated into NIRS. In this paper, two discriminant analysis problems (wheat kernels and Yali pears) and several representative calibration models were used to study model robustness and effectiveness. Additionally, this article proposed a near-infrared calibration model based on the Gramian angular difference field method and coordinate attention convolutional neural networks (G-CACNNs). The results show that, compared with SL, spectral preprocessing has a smaller impact on the analysis accuracy of consensus learning (CL) and DL, and that DL has the highest analysis accuracy when modeling with the original spectrum. The accuracy of G-CACNNs in the two discrimination tasks was 98.48% and 99.39%. Finally, this research compared the performance of various models under noise to evaluate the robustness and noise resistance of the proposed method.
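The Gramian angular difference field step named in this abstract can be illustrated with a minimal sketch. It assumes the common definition (rescale the signal to [-1, 1], take the arccosine, then form sin(φ_i − φ_j)); the function name `gadf` and the toy input are hypothetical, and the authors' exact preprocessing and the CNN stage are not reproduced here.

```python
import numpy as np

def gadf(spectrum):
    """Gramian angular difference field of a 1-D signal (illustrative sketch)."""
    x = np.asarray(spectrum, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:
        raise ValueError("spectrum must not be constant")
    # Rescale to [-1, 1] so that every value is a valid cosine.
    x = 2.0 * (x - lo) / (hi - lo) - 1.0
    x = np.clip(x, -1.0, 1.0)  # guard against round-off outside [-1, 1]
    phi = np.arccos(x)         # angular encoding of each spectral point
    # Entry (i, j) is sin(phi_i - phi_j): an antisymmetric 2-D image.
    return np.sin(phi[:, None] - phi[None, :])

# Toy absorbance values, made up for illustration only.
image = gadf([0.12, 0.35, 0.80, 0.55, 0.20])
```

The resulting square matrix is what an image-based network would consume; a spectrum with n points gives an n × n image, so real spectra are often downsampled first.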
Affiliation(s)
- Jiwen Ren
- School of Mechatronics and Vehicle Engineering, East China Jiaotong University, Nanchang 330013, China; (J.R.); (Y.X.)
| | - Yuming Xiong
- School of Mechatronics and Vehicle Engineering, East China Jiaotong University, Nanchang 330013, China; (J.R.); (Y.X.)
| | - Xinyu Chen
- Optoelectronics Department of Changzhou Institute of Technology, Changzhou 213000, China;
| | - Yong Hao
- School of Mechatronics and Vehicle Engineering, East China Jiaotong University, Nanchang 330013, China; (J.R.); (Y.X.)
| |
|
2
|
Li MX, Shi YB, Zhang JB, Wan X, Fang J, Wu Y, Fu R, Li Y, Li L, Su LL, Ji D, Lu TL, Bian ZH. Rapid evaluation of Ziziphi Spinosae Semen and its adulterants based on the combination of FT-NIR and multivariate algorithms. Food Chem X 2023; 20:101022. [PMID: 38144802 PMCID: PMC10740088 DOI: 10.1016/j.fochx.2023.101022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2023] [Revised: 11/09/2023] [Accepted: 11/19/2023] [Indexed: 12/26/2023] [Open Access]
Abstract
Ziziphi Spinosae Semen (ZSS) is a valued seed renowned for its sedative and sleep-enhancing properties. However, the price increase has been accompanied by adulteration. In this study, chromaticity analysis and Fourier transform near-infrared (FT-NIR) spectroscopy combined with multivariate algorithms were employed to identify adulteration and quantitatively predict the adulteration ratio. The findings suggested that a chromaticity extractor was insufficient for identifying the adulteration ratio. The raw FT-NIR spectra of the ZMS and HAS adulterants were processed by SNV + CARS and 1d + SG + ICO, respectively, and the average accuracy of the machine learning classification model improved from 77.06% to 97.58%. Furthermore, the R² values of the calibration and prediction sets of the two quantitative regression models for the adulteration ratio were greater than 0.99, demonstrating excellent linearity and predictive accuracy. Overall, this study demonstrated that FT-NIR combined with multivariate algorithms provides an effective approach to addressing the growing issue of ZSS adulteration.
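Of the preprocessing steps listed in this abstract, standard normal variate (SNV) is the simplest to show. The sketch below assumes the usual definition (each spectrum is centered and scaled by its own mean and standard deviation); the function name `snv` and the toy matrix are hypothetical, and the variable selection steps (CARS, ICO) and the derivative and smoothing steps are not shown.

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: scale each spectrum (row) independently."""
    X = np.asarray(spectra, dtype=float)
    mean = X.mean(axis=1, keepdims=True)
    # Sample standard deviation per spectrum; a constant spectrum would
    # give zero here and must be handled by the caller.
    std = X.std(axis=1, ddof=1, keepdims=True)
    return (X - mean) / std

# Two made-up spectra that differ only by an offset and a scale factor.
raw = np.array([[0.10, 0.20, 0.40, 0.30],
                [1.20, 1.40, 1.80, 1.60]])
corrected = snv(raw)
```

Because the second toy spectrum is an affine copy of the first, both rows become identical after SNV, which is the kind of scatter-related offset and scaling the method is meant to remove.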
Affiliation(s)
- Ming-xuan Li
- College of Pharmacy, Nanjing University of Chinese Medicine, Nanjing, 210023, China
| | - Ya-bo Shi
- College of Pharmacy, Nanjing University of Chinese Medicine, Nanjing, 210023, China
| | - Jiu-ba Zhang
- College of Pharmacy, Nanjing University of Chinese Medicine, Nanjing, 210023, China
| | - Xin Wan
- College of Pharmacy, Nanjing University of Chinese Medicine, Nanjing, 210023, China
| | - Jun Fang
- College of Pharmacy, Nanjing University of Chinese Medicine, Nanjing, 210023, China
| | - Yi Wu
- College of Pharmacy, Nanjing University of Chinese Medicine, Nanjing, 210023, China
| | - Rao Fu
- College of Pharmacy, Nanjing University of Chinese Medicine, Nanjing, 210023, China
| | - Yu Li
- College of Pharmacy, Nanjing University of Chinese Medicine, Nanjing, 210023, China
| | - Lin Li
- College of Pharmacy, Nanjing University of Chinese Medicine, Nanjing, 210023, China
| | - Lian-lin Su
- College of Pharmacy, Nanjing University of Chinese Medicine, Nanjing, 210023, China
| | - De Ji
- College of Pharmacy, Nanjing University of Chinese Medicine, Nanjing, 210023, China
| | - Tu-lin Lu
- College of Pharmacy, Nanjing University of Chinese Medicine, Nanjing, 210023, China
| | - Zhen-hua Bian
- Department of Pharmacy, Wuxi TCM Hospital Affiliated to Nanjing University of Chinese Medicine, Wuxi, 214071, China
- Jiangsu CM Clinical Innovation Center of Degenerative Bone & Joint Disease, Wuxi TCM Hospital Affiliated to Nanjing University of Chinese Medicine, Wuxi, 214071, China
| |
|
3
|
Zhang J, Zhou X, Li B. PFCE2: A versatile parameter-free calibration enhancement framework for near-infrared spectroscopy. Spectrochimica Acta. Part A, Molecular and Biomolecular Spectroscopy 2023; 301:122978. [PMID: 37295380 DOI: 10.1016/j.saa.2023.122978] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2023] [Revised: 05/29/2023] [Accepted: 06/02/2023] [Indexed: 06/12/2023]
Abstract
Near-infrared (NIR) spectroscopy is a widely used technique for chemical analysis, but it faces challenges of calibration transfer, maintenance, and enhancement among different instruments and conditions. The parameter-free calibration enhancement (PFCE) framework was developed to address these challenges with non-supervised (NS), semi-supervised (SS), and full-supervised (FS) methods. This study presented PFCE2, an updated version of the PFCE framework that incorporates two new constraints and a new method to improve the robustness and efficiency of calibration enhancement. First, normalized L2 and L1 constraints were introduced to replace the correlation coefficient (Corr) constraint used in the original PFCE. These constraints preserve the parameter-free feature of PFCE and impose smoothness or sparsity on the model coefficients. Second, multitask PFCE (MT-PFCE) was proposed within the framework to address calibration enhancement among multiple instruments, making the framework applicable to all possible calibration transfer situations. Demonstrations conducted on three NIR datasets of tablets, plant leaves, and corn showed that the PFCE methods with the new L2 and L1 constraints can give more accurate and robust predictions than the Corr constraint, especially when the standard sample size is small. Moreover, MT-PFCE could refine all models in the involved scenarios at once, leading to significant enhancement in model performance compared to the original PFCE method with the same data requirements. Finally, the applicable situations of the PFCE framework and other analogous calibration transfer methods were summarized, helping users choose suitable methods for their application. The source code written in both MATLAB and Python is available at https://github.com/JinZhangLab/PFCE and https://pypi.org/project/pynir/, respectively.
Affiliation(s)
- Jin Zhang
- Key Laboratory of Environmental Pollution Monitoring and Disease Control, Ministry of Education, School of Public Health, Guizhou Medical University, Guiyang 550025, China
| | - Xu Zhou
- Key Laboratory of Environmental Pollution Monitoring and Disease Control, Ministry of Education, School of Public Health, Guizhou Medical University, Guiyang 550025, China
| | - Boyan Li
- Key Laboratory of Environmental Pollution Monitoring and Disease Control, Ministry of Education, School of Public Health, Guizhou Medical University, Guiyang 550025, China.
| |
|
4
|
Geng Y, Ni H, Shen H, Wang H, Wu J, Pan K, Wu Y, Chen Y, Luo Y, Xu T, Liu X. Feasibility of an NIR spectral calibration transfer algorithm based on optimized feature variables to predict tobacco samples in different states. Analytical Methods: Advancing Methods and Applications 2023; 15:719-728. [PMID: 36722963 DOI: 10.1039/d2ay01805e] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/18/2023]
Abstract
The prediction accuracy of calibration models for near-infrared (NIR) spectroscopy typically relies on the morphology and homogeneity of the samples. To enable non-destructive and rapid analysis of non-homogeneous tobacco samples, a method that can predict tobacco filament samples using reliable models based on the corresponding tobacco powder is proposed here. First, to establish a simple and robust calibration model with excellent performance, the key feature variables were screened, starting from full-wavelength PLSR (Full-PLSR), by three methods, namely competitive adaptive reweighted sampling (CARS), variable combination population analysis-iteratively retaining informative variables (VCPA-IRIV), and variable combination population analysis-genetic algorithm (VCPA-GA). The partial least squares regression (PLSR) models for predicting the total sugar content in tobacco were established based on the three optimal wavelength sets and named CARS-PLSR, VCPA-IRIV-PLSR and VCPA-GA-PLSR, respectively. Subsequently, they were combined with different calibration transfer algorithms, including calibration transfer based on canonical correlation analysis (CTCCA), slope/bias correction (S/B) and the non-supervised parameter-free framework for calibration enhancement (NS-PFCE), to find the best prediction model for the tobacco filament samples. Compared with the other two transfer algorithms, NS-PFCE performed the best under various wavelength conditions. The prediction results indicated that the most successful approach for predicting the tobacco filament samples was VCPA-IRIV-PLSR coupled with the NS-PFCE method, which obtained the highest determination coefficient (Rp² = 0.9340) and the lowest root mean square error of the prediction set (RMSEP = 0.8425). VCPA-IRIV simplifies the calibration model and improves the efficiency of model transfer (31 variables). Furthermore, it ensures the prediction accuracy of the tobacco filament samples when combined with NS-PFCE. In summary, calibration transfer based on optimized feature variables can eliminate prediction errors caused by sample morphological differences and proves to be a more beneficial method for online application in the tobacco industry.
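Slope/bias correction, one of the transfer methods compared in this abstract, together with the two reported figures of merit (Rp² and RMSEP), can be sketched as follows. This is a minimal illustration under the usual textbook definitions, not the authors' code; the function names and all numbers are hypothetical, and the synthetic values are chosen so the correction is exact.

```python
import numpy as np

def fit_slope_bias(pred_on_standards, ref_on_standards):
    """Fit reference = slope * prediction + bias on a few standard samples."""
    slope, bias = np.polyfit(pred_on_standards, ref_on_standards, deg=1)
    return slope, bias

def rmsep(y_true, y_pred):
    """Root mean square error of prediction."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    """Determination coefficient: 1 - SS_residual / SS_total."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Synthetic example: the primary (powder) model systematically
# mispredicts the secondary (filament) samples by a linear distortion.
reference = np.array([10.0, 12.0, 14.0, 16.0])  # made-up reference values
raw_pred = (reference - 1.0) / 2.0               # distorted predictions
slope, bias = fit_slope_bias(raw_pred, reference)
corrected = slope * raw_pred + bias
```

S/B only adjusts the predicted values, so it cannot repair differences in spectral shape between sample states; that limitation is one reason methods that update the model coefficients are compared against it.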
Affiliation(s)
- Yingrui Geng
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China.
| | - Hongfei Ni
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China.
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Hangzhou 310018, China
| | - Huanchao Shen
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China.
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Hangzhou 310018, China
| | - Hui Wang
- Technology Center, China Tobacco Zhejiang Industrial Co., Ltd, Hangzhou 310008, China
| | - Jizhong Wu
- Technology Center, China Tobacco Zhejiang Industrial Co., Ltd, Hangzhou 310008, China
| | - Keyu Pan
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China.
| | - Yongjiang Wu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China.
| | - Yong Chen
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China.
| | - Yingjie Luo
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China.
| | - Tengfei Xu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China.
| | - Xuesong Liu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China.
| |
|
5
|
Zhang H, Tan H, Lin B, Yang X, Sun Z, Zhong L, Gao L, Li L, Dong Q, Nie L, Zang H. Improved Principal Component Analysis (IPCA): A Novel Method for Quantitative Calibration Transfer between Different Near-Infrared Spectrometers. Molecules (Basel, Switzerland) 2023; 28:406. [PMID: 36615595 PMCID: PMC9823907 DOI: 10.3390/molecules28010406] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 11/03/2022] [Revised: 12/25/2022] [Accepted: 12/30/2022] [Indexed: 01/04/2023]
Abstract
Given the labor-intensive nature of model establishment, model transfer has become an important topic in the study of near-infrared (NIR) spectroscopy. Recently, many new algorithms have been proposed for the model transfer of spectra collected by the same types of instruments under different situations. However, in a practical scenario, we need to deal with model transfer between different types of instruments. To expand model applicability, we must develop a method that can transfer spectra acquired from different types of NIR spectrometers with different wavenumbers or absorbance. Therefore, in our study, we propose a new methodology based on improved principal component analysis (IPCA) for calibration transfer between different types of spectrometers. We adopted three datasets for method evaluation, including public pharmaceutical tablets (dataset 1), corn data (dataset 2), and the spectra of eight batches of samples acquired from the plasma ethanol precipitation process collected by FT-NIR and MicroNIR spectrometers (dataset 3). In the calibration transfer for the public datasets, IPCA displayed results comparable with the classical calibration transfer method using piecewise direct standardization (PDS), indicating its ability to transfer spectra collected from the same types of instruments. In the calibration transfer for dataset 3, our proposed IPCA method achieved a successful bi-transfer between the spectra acquired from the benchtop and micro-instruments with/without wavelength region selection. Furthermore, our proposed method enabled improvements in prediction ability rather than degradation of the models built with the original micro spectra. Therefore, our proposed method has no limitations on the spectrum for model transfer between different types of NIR instruments, thus allowing a wide application range, which could provide a supporting technology for the practical application of NIR spectroscopy.
Affiliation(s)
- Hui Zhang
- NMPA Key Laboratory for Technology Research and Evaluation of Drug Products, School of Pharmaceutical Sciences, Cheeloo College of Medicine, Shandong University, Wenhuaxi Road 44, Jinan 250012, China
- National Glycoengineering Research Center, Shandong University, Qingdao 266237, China
- NMPA Key Laboratory for Quality Research and Evaluation of Carbohydrate-Based Medicine, Shandong University, Qingdao 266237, China
- Shandong Provincial Technology Innovation Center of Carbohydrate, Shandong University, Qingdao 266237, China
| | - Haining Tan
- National Glycoengineering Research Center, Shandong University, Qingdao 266237, China
- NMPA Key Laboratory for Quality Research and Evaluation of Carbohydrate-Based Medicine, Shandong University, Qingdao 266237, China
- Shandong Provincial Technology Innovation Center of Carbohydrate, Shandong University, Qingdao 266237, China
| | - Boran Lin
- NMPA Key Laboratory for Technology Research and Evaluation of Drug Products, School of Pharmaceutical Sciences, Cheeloo College of Medicine, Shandong University, Wenhuaxi Road 44, Jinan 250012, China
| | - Xiangchun Yang
- NMPA Key Laboratory for Technology Research and Evaluation of Drug Products, School of Pharmaceutical Sciences, Cheeloo College of Medicine, Shandong University, Wenhuaxi Road 44, Jinan 250012, China
| | - Zhongyu Sun
- NMPA Key Laboratory for Technology Research and Evaluation of Drug Products, School of Pharmaceutical Sciences, Cheeloo College of Medicine, Shandong University, Wenhuaxi Road 44, Jinan 250012, China
| | - Liang Zhong
- NMPA Key Laboratory for Technology Research and Evaluation of Drug Products, School of Pharmaceutical Sciences, Cheeloo College of Medicine, Shandong University, Wenhuaxi Road 44, Jinan 250012, China
| | - Lele Gao
- NMPA Key Laboratory for Technology Research and Evaluation of Drug Products, School of Pharmaceutical Sciences, Cheeloo College of Medicine, Shandong University, Wenhuaxi Road 44, Jinan 250012, China
| | - Lian Li
- NMPA Key Laboratory for Technology Research and Evaluation of Drug Products, School of Pharmaceutical Sciences, Cheeloo College of Medicine, Shandong University, Wenhuaxi Road 44, Jinan 250012, China
| | - Qin Dong
- NMPA Key Laboratory for Technology Research and Evaluation of Drug Products, School of Pharmaceutical Sciences, Cheeloo College of Medicine, Shandong University, Wenhuaxi Road 44, Jinan 250012, China
| | - Lei Nie
- NMPA Key Laboratory for Technology Research and Evaluation of Drug Products, School of Pharmaceutical Sciences, Cheeloo College of Medicine, Shandong University, Wenhuaxi Road 44, Jinan 250012, China
- Correspondence: (L.N.); (H.Z.); Tel.: +86-531-8838-2330 (L.N.); +86-531-8838-0268 (H.Z.)
| | - Hengchang Zang
- NMPA Key Laboratory for Technology Research and Evaluation of Drug Products, School of Pharmaceutical Sciences, Cheeloo College of Medicine, Shandong University, Wenhuaxi Road 44, Jinan 250012, China
- Correspondence: (L.N.); (H.Z.); Tel.: +86-531-8838-2330 (L.N.); +86-531-8838-0268 (H.Z.)
| |
|
6
|
Geng Y, Shen H, Ni H, Tian Y, Zhao Z, Chen Y, Liu X. Non-destructive determination of total sugar content in tobacco filament based on calibration transfer with parameter free adjustment. Microchem J 2022. [DOI: 10.1016/j.microc.2022.107797] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2022]
|
7
|
Zhang J, Yuan T, Wei S, Feng Z, Li B, Huang H. New strategy for clinical etiologic diagnosis of acute ischemic stroke and blood biomarker discovery based on machine learning. RSC Adv 2022; 12:14716-14723. [PMID: 35702238 PMCID: PMC9109259 DOI: 10.1039/d2ra02022j] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2022] [Accepted: 05/09/2022] [Indexed: 12/02/2022] [Open Access]
Abstract
Acute ischemic stroke (AIS) is a syndrome characterized by high morbidity, prevalence, mortality, recurrence and disability. The longer the delay before proper treatment of a stroke, the greater the likelihood of brain damage and disability. Computed tomography and nuclear magnetic resonance are the primary choices for fast diagnosis of AIS in the early stage, which can provide certain information about infarction location and degree, and even the vascular distribution of lesions responsible for strokes. However, this is quite difficult to achieve in small clinics or at-home diagnoses. Hematology tests can quickly obtain a large number of pathology-related indicators, and offer an effective method for rapid AIS diagnosis when combined with machine learning. To explore a reliable, predictable method for early clinical etiologic diagnosis of AIS, a retrospective study was conducted on 456 AIS patients at the early stage and 28 reference subjects without the symptoms of AIS, using significant traits selected from 64 clinical and blood traits in conjunction with machine learning strategies. Five representative biomarkers were closely related to cardioembolic (CE), 22 to large artery atherosclerosis (LAA), and 15 to small vessel occlusion (SVO) strokes, respectively. With these biomarkers, different etiologic subtypes of stroke patients were determined with high accuracy of >0.73, sensitivity of >0.73, and specificity of >0.70, which was comparable to the accuracy obtained in the emergency department by clinical diagnosis. The proposed method may offer an alternative strategy for the etiologic diagnosis of AIS at the early stage when integrating significant blood traits into machine learning. A rapid and safe strategy was proposed for clinical etiologic diagnosis of acute ischemic stroke at the early stage using clinical hematology traits and machine learning. Blood biomarkers were effectively identified.
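The three figures of merit quoted in this abstract (accuracy, sensitivity, specificity) follow from the confusion-matrix counts. The sketch below assumes a one-vs-rest view of a single subtype, which is an assumption about how the reported values were computed; the function name `binary_metrics` and the labels are made up for illustration.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity and specificity for one class versus the rest."""
    tp = fp = tn = fn = 0
    for truth, guess in zip(y_true, y_pred):
        if truth == positive and guess == positive:
            tp += 1          # subtype present and detected
        elif truth == positive:
            fn += 1          # subtype present but missed
        elif guess == positive:
            fp += 1          # subtype absent but flagged
        else:
            tn += 1          # subtype absent and not flagged
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
    }

# Made-up labels: 1 marks the subtype of interest, 0 marks any other case.
metrics = binary_metrics([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])
```

With strongly imbalanced groups, such as 456 patients against 28 reference subjects, accuracy alone can be misleading, which is why sensitivity and specificity are reported alongside it.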
Affiliation(s)
- Jin Zhang
- School of Public Health/Key Laboratory of Endemic and Ethnic Diseases, Ministry of Education & Key Laboratory of Medical Molecular Biology of Guizhou Province, Guizhou Medical University Guiyang 550025 China
| | - Ting Yuan
- Center for Clinical Laboratories, The Affiliated Hospital of Guizhou Medical University Guiyang 550014 China; School of Clinical Laboratory Science, Guizhou Medical University Guiyang 550025 China
| | - Sixi Wei
- Center for Clinical Laboratories, The Affiliated Hospital of Guizhou Medical University Guiyang 550014 China; School of Clinical Laboratory Science, Guizhou Medical University Guiyang 550025 China
| | - Zhanhui Feng
- Neurological Department, The Affiliated Hospital of Guizhou Medical University Guiyang 550014 China
| | - Boyan Li
- School of Public Health/Key Laboratory of Endemic and Ethnic Diseases, Ministry of Education & Key Laboratory of Medical Molecular Biology of Guizhou Province, Guizhou Medical University Guiyang 550025 China
| | - Hai Huang
- Center for Clinical Laboratories, The Affiliated Hospital of Guizhou Medical University Guiyang 550014 China; School of Clinical Laboratory Science, Guizhou Medical University Guiyang 550025 China
| |
|
8
|
Wang HP, Chen P, Dai JW, Liu D, Li JY, Xu YP, Chu XL. Recent advances of chemometric calibration methods in modern spectroscopy: Algorithms, strategy, and related issues. Trends Analyt Chem 2022. [DOI: 10.1016/j.trac.2022.116648] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Indexed: 12/14/2022]
|
9
|
Piecewise direct standardization assisted with second-order calibration methods to solve signal instability in high-performance liquid chromatography-diode array detection systems. J Chromatogr A 2022; 1667:462851. [DOI: 10.1016/j.chroma.2022.462851] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/16/2021] [Revised: 01/23/2022] [Accepted: 01/24/2022] [Indexed: 11/21/2022]
|
10
|
Shan P, Li Z, Wang Q, He Z, Wang S, Zhao Y, Wu Z, Peng S. Self-organizing maps-based generalized feature set selection for model adaption without reference data for batch process. Anal Chim Acta 2021; 1188:339205. [PMID: 34794558 DOI: 10.1016/j.aca.2021.339205] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/31/2021] [Revised: 10/19/2021] [Accepted: 10/20/2021] [Indexed: 12/01/2022]
Abstract
When Fourier transform infrared spectroscopy (FTIR) techniques combined with multivariate calibration are used to measure the key process features or analyte concentrations during a batch process, model adaption is indispensable for maintaining the predictability of a primary calibration model in new secondary batches. Many model adaption methods conforming to the actual application scenario of batch processes have been proposed. Here we report on a novel standard-free model adaption method without reference measurement called variable selection strategy with self-organizing maps (VSSOM). First, it uses self-organizing maps (SOM) to classify the whole set of spectral variables into multiple classes according to the spectra from the primary batch and the secondary batch, respectively, forming the corresponding primary feature subsets and secondary feature subsets. Second, non-empty candidate feature subsets are generated by intersecting each primary feature subset with each secondary feature subset. Third, the candidate feature subset with the minimum root mean square error of cross-validation (RMSECV) for the primary calibration set is selected as the optimal feature subset. In other words, VSSOM aims to create a stable and consistent feature subset across different batches by selecting the best features within the intersections between the primary and secondary feature subsets. Two batch process datasets (γ-polyglutamic acid fermentation and paeoniflorin extraction) are presented for comparing the VSSOM method with no-transfer partial least squares (PLS), boxcar signal transfer (BST), successive projection algorithm (SPA), transfer component analysis (TCA) and domain-invariant iterative partial least squares (DIPALS). Experimental results show that VSSOM has superior performance and comparable prediction performance in all the scenarios.
Affiliation(s)
- Peng Shan
- College of Information Science and Engineering, Northeastern University, Shenyang, 110819, Liaoning Province, China.
| | - Zhigang Li
- College of Information Science and Engineering, Northeastern University, Shenyang, 110819, Liaoning Province, China
| | - Qiaoyun Wang
- College of Information Science and Engineering, Northeastern University, Shenyang, 110819, Liaoning Province, China
| | - Zhonghai He
- College of Information Science and Engineering, Northeastern University, Shenyang, 110819, Liaoning Province, China
| | - Shuyu Wang
- College of Information Science and Engineering, Northeastern University, Shenyang, 110819, Liaoning Province, China
| | - Yuhui Zhao
- School of Computer Science and Engineering, Northeastern University, Shenyang, 110819, Liaoning Province, China
| | - Zhui Wu
- College of Information Science and Engineering, Northeastern University, Shenyang, 110819, Liaoning Province, China
| | - Silong Peng
- Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China
| |
|
11
|
Zhang J, Yang L, Tian Z, Zhao W, Sun C, Zhu L, Huang M, Guo G, Liang G. Large-Scale Screening of Antifungal Peptides Based on Quantitative Structure-Activity Relationship. ACS Med Chem Lett 2021; 13:99-104. [PMID: 35059128 PMCID: PMC8762751 DOI: 10.1021/acsmedchemlett.1c00556] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 10/10/2021] [Accepted: 12/06/2021] [Indexed: 01/16/2023] [Open Access]
Abstract
Antifungal peptides are effective, biocompatible, and biodegradable, and thus, they are promising to be the next generation of drugs for treating infections caused by fungi. The identification processes of highly active peptides, however, are still time-consuming and labor-intensive. Quantitative structure-activity relationships (QSARs) have dramatically facilitated the discovery of many bioactive drug molecules without a priori knowledge. In this study, we have established an effective QSAR protocol for screening antifungal peptides. The screening protocol integrates an accurate antifungal peptide classification model and four activity prediction models against specified target fungi. A demonstrative application was performed on more than three million candidate peptides, and three outstanding peptides were identified. The whole screening took only a few days, which was much faster than our previous experimental screening works. In conclusion, the protocol is useful and effective for reducing repetitive laboratory efforts in antifungal peptide discovery. The prediction server (antifungal Web server) is freely available at www.chemoinfolab.com/antifungal.
Affiliation(s)
- Jin Zhang
- School of Public Health, Guizhou Medical University, Guiyang 550025, China
| | - Longbing Yang
- School of Basic Medical Sciences, Guizhou Medical University, Guiyang 550025, China
| | - Zhuqing Tian
- School of Basic Medical Sciences, Guizhou Medical University, Guiyang 550025, China
| | - Wenjing Zhao
- School of Basic Medical Sciences, Guizhou Medical University, Guiyang 550025, China
| | - Chaoqin Sun
- School of Basic Medical Sciences, Guizhou Medical University, Guiyang 550025, China
| | - Lijuan Zhu
- School of Basic Medical Sciences, Guizhou Medical University, Guiyang 550025, China
| | - Mingjiao Huang
- School of Basic Medical Sciences, Guizhou Medical University, Guiyang 550025, China
| | - Guo Guo
- School of Basic Medical Sciences, Guizhou Medical University, Guiyang 550025, China
- The Key and Characteristic Laboratory of Modern Pathogen Biology, Guizhou Medical University, Guiyang 550025, China
- Translational Medicine Research Center, Guizhou Medical University, Guiyang 550025, China
- Guo Guo: School of Basic Medical Sciences, The Key and Characteristic Laboratory of Modern Pathogen Biology, Translational Medicine Research Center, Guizhou Medical University, Guiyang 550025, China.
| | - Guiyou Liang
- Translational Medicine Research Center, Guizhou Medical University, Guiyang 550025, China
| |
|
12
|
Mishra P. Chemometric approaches for calibrating high-throughput spectral imaging setups to support digital plant phenotyping by calibrating and transferring spectral models from a point spectrometer. Anal Chim Acta 2021; 1187:339154. [PMID: 34753582 DOI: 10.1016/j.aca.2021.339154] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/13/2021] [Revised: 09/27/2021] [Accepted: 10/05/2021] [Indexed: 01/13/2023]
Abstract
Visible and near-infrared (Vis-NIR) spectral imaging is emerging as a potential tool to support high-throughput digital agricultural plant phenotyping. One of the uses of spectral imaging is to predict non-destructively the chemical constituents in plants, such as nitrogen content, which can be related to the functional status of plants. However, high-throughput spectral imaging requires extensive calibration before use, just as any other spectral sensor does. Calibrating the high-throughput spectral imaging setup can be a challenging task, as the resources needed to run experiments in high-throughput setups are far greater than those needed for measurements with point spectrometers. Hence, to supply a resource-efficient approach to calibrate spectral cameras integrated with high-throughput plant phenotyping setups, this study proposes the use of chemometric calibration transfer (CT) and model update. The main idea was to use a point spectrometer to develop the primary model and transfer it to the spectral cameras integrated into the high-throughput setups. The potential of the approach was shown using a real Vis-NIR dataset related to nitrogen prediction in wheat plants measured with a point spectrometer, tabletop spectral cameras and spectral cameras integrated with a high-throughput plant phenotyping setup. For CT and model update, direct standardization and parameter-free calibration enhancement approaches were explored. A key aim of this study was to use and compare only techniques that do not require any further optimization, as they can be easily implemented by plant biologists in future applications. The proposed approach based on the transfer of point spectroscopy models to spectral cameras in a high-throughput setup can allow spectral calibrations to be sharable and widely applicable, thus helping the global digital plant phenotyping community.
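Direct standardization, one of the two transfer approaches explored in this abstract, can be sketched in its basic least-squares form. The sketch assumes both instruments measured the same standard samples on the same number of spectral channels; the function names and the synthetic data are hypothetical, and the study's actual pipeline is not reproduced.

```python
import numpy as np

def fit_ds(secondary_std, primary_std):
    """Least-squares matrix F such that secondary_std @ F approximates primary_std."""
    return np.linalg.pinv(secondary_std) @ primary_std

def apply_ds(secondary_spectra, F):
    """Map spectra from the secondary instrument into the primary domain."""
    return np.asarray(secondary_spectra, dtype=float) @ F

# Synthetic demo: 20 standard samples, 5 channels, and a made-up linear
# distortion standing in for the difference between the two instruments.
rng = np.random.default_rng(0)
primary_std = rng.normal(size=(20, 5))
distortion = np.eye(5) + 0.1 * rng.normal(size=(5, 5))
secondary_std = primary_std @ distortion

F = fit_ds(secondary_std, primary_std)
mapped = apply_ds(secondary_std, F)  # should match primary_std here
```

In this toy case the distortion is exactly linear and there are more samples than channels, so the mapping is recovered exactly. Real spectra usually have far more channels than standard samples, in which case the pseudoinverse returns a minimum-norm solution that may overfit.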
Affiliation(s)
- Puneet Mishra
- Wageningen University & Research, Food and Biobased Research, Bornse Weilanden 9, P.O. Box 17, 6700AA, Wageningen, the Netherlands.
|
13
|
Predicting sensitivity of recently harvested tomatoes and tomato sepals to future fungal infections. Sci Rep 2021; 11:23109. [PMID: 34848748 PMCID: PMC8633320 DOI: 10.1038/s41598-021-02302-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/04/2021] [Accepted: 11/10/2021] [Indexed: 11/08/2022] Open
Abstract
Tomato is an important commercial product which is perishable by nature and highly susceptible to fungal incidence once it is harvested. Not all tomatoes are equally vulnerable to pathogenic fungi, and early detection of the vulnerable ones can help in taking timely preventive actions, ranging from isolating tomato batches to adjusting storage conditions, but also in making the right business decisions, such as dynamic pricing based on quality or better shelf-life estimates. More importantly, early detection of vulnerable produce can help in taking timely actions to minimize potential post-harvest losses. This paper investigates near-infrared (NIR) hyperspectral imaging (1000-1700 nm) and machine learning to build models that automatically predict the susceptibility of sepals of recently harvested tomatoes to future fungal infections. Hyperspectral images of newly harvested tomatoes (cultivar Brioso) from 5 different growers were acquired before the onset of any visible fungal infection. After imaging, the tomatoes were placed under controlled conditions suited for fungal germination and growth for a 4-day period, and then imaged using normal color cameras. All sepals in the color images were ranked for fungal severity using crowdsourcing, and the final severity of each sepal was fused using principal component analysis. A novel hyperspectral data processing pipeline is presented which was used to automatically segment the tomato sepals from spectral images with multiple tomatoes connected via a truss. The key modelling question addressed in this research is whether there is a correlation between the hyperspectral data captured at harvest and the fungal infection observed 4 days later. Using 10-fold and group k-fold cross-validation, XG-Boost and Random Forest based regression models were trained on the features derived from the hyperspectral data corresponding to each sepal in the training set and tested on a hold-out test set. The best model achieved a Pearson correlation of 0.837, showing that there is a strong linear correlation between the NIR spectra and the future fungal severity of the sepal. The sepal-specific predictions were aggregated to predict the susceptibility of individual tomatoes, and a correlation of 0.92 was found. Besides modelling, the focus is also on model interpretation, particularly on understanding which spectral features are most relevant to model prediction. Two approaches to model interpretation were explored, feature importance and SHAP (SHapley Additive exPlanations), both leading to the similar conclusion that the NIR range between 1390-1420 nm contributes most to the model's final decision.
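The group k-fold idea and the Pearson correlation mentioned in this abstract can be sketched in plain Python as follows. This is only an illustration: treating the grower as the grouping variable is an assumption (the abstract does not say what the groups were), and the function names and the toy grower labels are hypothetical.

```python
from math import sqrt

def group_k_fold(groups, n_splits):
    """Yield (train_idx, test_idx) pairs in which every group appears
    in exactly one test fold and never in both sides of a split."""
    unique = sorted(set(groups))
    if n_splits > len(unique):
        raise ValueError("n_splits cannot exceed the number of groups")
    # Assign whole groups to folds in round-robin order.
    fold_of = {g: i % n_splits for i, g in enumerate(unique)}
    for fold in range(n_splits):
        test = [i for i, g in enumerate(groups) if fold_of[g] == fold]
        train = [i for i, g in enumerate(groups) if fold_of[g] != fold]
        yield train, test

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Toy example (assumption): two sepals from each of five growers.
growers = ["A", "A", "B", "B", "C", "C", "D", "D", "E", "E"]
splits = list(group_k_fold(growers, n_splits=5))
```

Keeping all samples of a group on one side of each split avoids scoring a model on samples that share a source with its training data, which plain k-fold does not guarantee.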
|
14
|
Mishra P, Woltering E. Handling batch-to-batch variability in portable spectroscopy of fresh fruit with minimal parameter adjustment. Anal Chim Acta 2021; 1177:338771. [PMID: 34482899 DOI: 10.1016/j.aca.2021.338771] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 03/30/2021] [Revised: 05/17/2021] [Accepted: 06/15/2021] [Indexed: 10/21/2022]
Abstract
Near-infrared (NIR) spectroscopy models for fresh fruit quality prediction often fail when used on a new batch or scenario containing variability that was absent from the primary calibration. To handle the new variability, model updating is often required. In this study, to address the challenge of updating NIR models related to fresh fruit quality properties, the use of a semi-supervised parameter-free calibration enhancement (PFCE) approach was proposed. Model updating with PFCE was shown in two ways: first, where the model on the primary batch was updated individually for each new fruit batch, and second, where the model was sequentially updated for the next batches. Furthermore, for the first time, a case of updating an instrument-transferred model was also presented. The PFCE approach was shown in two real cases related to moisture and total soluble solids prediction in pear and kiwi fruit. In the case of pear, the model was later updated for 3 new measurement batches, while, for kiwi, a commercial model was updated to incorporate the variability of a new experiment carried out with a new instrument in the laboratory environment. For each modelling demonstration, the performance was benchmarked against partial least-squares (PLS) regression analysis on the primary batch. The results showed that the models updated with a semi-supervised approach kept a high predictive performance on new measurement batches, without any extra parameter optimization. An instrument-transferred model was also updated to maintain its performance on different batches. Further, the sequential updating approach was found to perform better than the update for individual batches, as the models were able to learn from multiple batches. Model updating with a semi-supervised approach can allow the NIR spectroscopy of fresh fruit to be scalable, where models can be shared between scientific or application communities.
Affiliation(s)
- Puneet Mishra
- Wageningen Food and Biobased Research, Bornse Weilanden 9, P.O. Box 17, 6700AA, Wageningen, the Netherlands.
- Ernst Woltering
- Wageningen Food and Biobased Research, Bornse Weilanden 9, P.O. Box 17, 6700AA, Wageningen, the Netherlands; Horticulture and Product Physiology Group, Wageningen University, Droevendaalsesteeg 1, P.O. Box 630, 6700AP, Wageningen, the Netherlands
|