1. Zhang T, Liu Z, Ma Q, Hu D, Dai Y, Zhang X, Zhou Z. Identification of Dendrobium Using Laser-Induced Breakdown Spectroscopy in Combination with a Multivariate Algorithm Model. Foods 2024; 13:1676. [PMID: 38890910 PMCID: PMC11172223 DOI: 10.3390/foods13111676] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2024] [Revised: 05/23/2024] [Accepted: 05/25/2024] [Indexed: 06/20/2024]
Abstract
Dendrobium, a highly effective traditional Chinese medicinal herb, varies significantly in efficacy and price among varieties, so efficient classification of Dendrobium is crucial. However, most existing identification methods cannot achieve non-destructiveness and high efficiency at the same time, which makes it difficult for them to meet the needs of industrial production. In this study, we combined Laser-Induced Breakdown Spectroscopy (LIBS) with multivariate models to classify 10 varieties of Dendrobium. LIBS spectral data for each Dendrobium variety were collected from three circular medicinal blocks. During the data analysis phase, the LIBS spectral data were first preprocessed using Gaussian filtering and stacked correlation coefficient feature selection. Subsequently, the constructed fusion model was used for classification. The results demonstrate that the classification accuracy for the 10 Dendrobium varieties reached 100%. Compared to Support Vector Machine (SVM), Random Forest (RF), and K-Nearest Neighbors (KNN), our method improved classification accuracy by 14%, 20%, and 20%, respectively. It also outperformed the same three models (SVM, RF, and KNN) with added Principal Component Analysis (PCA) by 10%, 10%, and 17%, respectively. These results support the strong performance of our classification method. Finally, visualization of the entire research process with t-distributed Stochastic Neighbor Embedding (t-SNE) further enhances the interpretability of the model. By combining LIBS and machine learning, this study achieves efficient classification of Dendrobium and provides a feasible solution for the identification of Dendrobium and of traditional Chinese medicinal herbs more broadly.
Affiliation(s)
- Tingsong Zhang
  - College of Opto-Electro-Mechanical Engineering, Zhejiang A&F University, Hangzhou 311300, China
- Ziyuan Liu
  - College of Opto-Electro-Mechanical Engineering, Zhejiang A&F University, Hangzhou 311300, China
- Qing Ma
  - College of Opto-Electro-Mechanical Engineering, Zhejiang A&F University, Hangzhou 311300, China
- Dong Hu
  - College of Opto-Electro-Mechanical Engineering, Zhejiang A&F University, Hangzhou 311300, China
- Yujia Dai
  - College of Opto-Electro-Mechanical Engineering, Zhejiang A&F University, Hangzhou 311300, China
- Xinfeng Zhang
  - State Key Laboratory of Subtropical Silviculture, Zhejiang A&F University, Hangzhou 311300, China
- Zhu Zhou
  - College of Opto-Electro-Mechanical Engineering, Zhejiang A&F University, Hangzhou 311300, China

2. Morger A, Garcia de Lomana M, Norinder U, Svensson F, Kirchmair J, Mathea M, Volkamer A. Studying and mitigating the effects of data drifts on ML model performance at the example of chemical toxicity data. Sci Rep 2022; 12:7244. [PMID: 35508546 PMCID: PMC9068909 DOI: 10.1038/s41598-022-09309-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 09/28/2021] [Accepted: 03/17/2022] [Indexed: 11/09/2022]
Abstract
Machine learning models are widely applied to predict molecular properties or the biological activity of small molecules on a specific protein. Models can be integrated in a conformal prediction (CP) framework, which adds a calibration step to estimate the confidence of the predictions. CP models have the advantage of ensuring a predefined error rate under the assumption that the test and calibration sets are exchangeable. In cases where the test data have drifted away from the descriptor space of the training data, or where assay setups have changed, this assumption might not be fulfilled and the models are not guaranteed to be valid. In this study, the performance of internally valid CP models when applied to either newer time-split data or to external data was evaluated. In detail, temporal data drifts were analysed based on twelve datasets from the ChEMBL database. In addition, discrepancies between models trained on publicly available data and applied to proprietary data for the liver toxicity and MNT in vivo endpoints were investigated. In most cases, a drastic decrease in the validity of the models was observed when they were applied to the time-split or external (holdout) test sets. To overcome the decrease in model validity, a strategy for updating the calibration set with data more similar to the holdout set was investigated. Updating the calibration set generally improved the validity, restoring it completely to its expected value in many cases. The restored validity is the first requisite for applying the CP models with confidence. However, the increased validity comes at the cost of a decrease in model efficiency, as more predictions are identified as inconclusive. This study presents a strategy to recalibrate CP models to mitigate the effects of data drifts. Updating the calibration sets without having to retrain the model has proven to be a useful approach to restore the validity of most models.
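
The trade-off described in this abstract (validity restored by a new calibration set, at the cost of efficiency) can be illustrated with a small sketch. This is not the authors' code: the function names, the toy nonconformity scores, the labels `toxic`/`non_toxic`, and the definition of efficiency as the fraction of single-label prediction sets are all assumptions made for illustration, and the sketch uses a plain (not class-conditional) inductive conformal predictor.

```python
def p_value(calibration_scores, score):
    """Fraction of calibration nonconformity scores at least as large as `score`,
    counting the test example itself (the usual inductive CP p-value)."""
    n_at_least = sum(s >= score for s in calibration_scores)
    return (n_at_least + 1) / (len(calibration_scores) + 1)


def predict_sets(calibration_scores, test_scores, significance):
    """Keep every label whose p-value exceeds the significance level.
    `test_scores` is a list of dicts mapping label -> nonconformity score."""
    return [
        {label for label, score in scores.items()
         if p_value(calibration_scores, score) > significance}
        for scores in test_scores
    ]


def validity_and_efficiency(prediction_sets, true_labels):
    """Validity: share of sets containing the true label.
    Efficiency (assumed definition): share of single-label sets."""
    n = len(true_labels)
    validity = sum(y in s for s, y in zip(prediction_sets, true_labels)) / n
    efficiency = sum(len(s) == 1 for s in prediction_sets) / n
    return validity, efficiency


# Hypothetical holdout set: the last two compounds have drifted, so the
# underlying model is less sure about (and partly wrong on) their labels.
holdout_scores = [
    {"toxic": 0.12, "non_toxic": 0.92},
    {"toxic": 0.93, "non_toxic": 0.22},
    {"toxic": 0.52, "non_toxic": 0.37},
    {"toxic": 0.32, "non_toxic": 0.57},
]
holdout_labels = ["toxic", "non_toxic", "toxic", "non_toxic"]

# Calibration scores from data resembling the training set (original) and
# from data resembling the holdout set (updated). The model is NOT retrained.
original_calibration = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45]
updated_calibration = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90]

before = validity_and_efficiency(
    predict_sets(original_calibration, holdout_scores, 0.2), holdout_labels)
after = validity_and_efficiency(
    predict_sets(updated_calibration, holdout_scores, 0.2), holdout_labels)
print("original calibration:", before)
print("updated calibration: ", after)
```

In this toy setting the updated calibration set raises validity while the two drifted compounds receive both labels, which is the "inconclusive" outcome the abstract associates with reduced efficiency.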
Affiliation(s)
- Andrea Morger
  - In Silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité Universitätsmedizin Berlin, Berlin, 10117, Germany
- Marina Garcia de Lomana
  - BASF SE, 67056, Ludwigshafen, Germany
  - Division of Pharmaceutical Chemistry, Department of Pharmaceutical Sciences, University of Vienna, Vienna, 1090, Austria
- Ulf Norinder
  - Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, 751 24, Sweden
  - Dept Computer and Systems Sciences, Stockholm University, Kista, 164 07, Sweden
  - MTM Research Centre, School of Science and Technology, 701 82, Örebro, Sweden
- Fredrik Svensson
  - Alzheimer's Research UK UCL Drug Discovery Institute, London, WC1E 6BT, UK
- Johannes Kirchmair
  - Division of Pharmaceutical Chemistry, Department of Pharmaceutical Sciences, University of Vienna, Vienna, 1090, Austria
- Andrea Volkamer
  - In Silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité Universitätsmedizin Berlin, Berlin, 10117, Germany

3. Wang Y, Zhang M, Wu R, Wang H, Luo Z, Li G. Speech neuromuscular decoding based on spectrogram images using conformal predictors with Bi-LSTM. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.03.025] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 01/09/2023]

4. Zhang M, Wang Y, Wei Z, Yang M, Luo Z, Li G. Inductive conformal prediction for silent speech recognition. J Neural Eng 2020; 17. [PMID: 32120355 DOI: 10.1088/1741-2552/ab7ba0] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 09/21/2019] [Accepted: 03/02/2020] [Indexed: 12/14/2022]
Abstract
OBJECTIVE: Silent speech recognition based on surface electromyography has been studied for years. Though some progress in feature selection and classification has been achieved, one major problem remains: how to provide confident or reliable predictions. APPROACH: Inductive conformal prediction (ICP) is a suitable and effective method to tackle this problem. This paper applies ICP with random forest as the underlying algorithm to provide confidence and reliability. We also propose a method, test time data augmentation, which uses ICP to exploit unlabelled data in order to improve prediction performance. MAIN RESULTS: Using ICP, p-values and confidence regions for individual predictions are obtained with a guaranteed error rate. Test time data augmentation also yields relatively better conformal predictions as more unlabelled training data accumulate. Additionally, the validity and efficiency of ICP under different significance levels are demonstrated and evaluated on a silent speech recognition dataset obtained with our own device. SIGNIFICANCE: These results show the viability and effectiveness of ICP in silent speech recognition. Moreover, ICP has the potential to be a powerful method for confidence predictions to ensure reliability, both in data augmentation and in online prediction.
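
To make the "p-values and confidence regions" in this abstract concrete, here is a minimal sketch of how an inductive conformal predictor turns class probabilities into a prediction region. It is an illustration under assumptions, not the paper's implementation: the nonconformity score (one minus the predicted class probability), the function names, the word labels, and all numbers are hypothetical. In the paper the probabilities would come from a random forest.

```python
def icp_p_value(calibration_scores, test_score):
    """p-value of a candidate label: share of calibration nonconformity
    scores that are >= the test score, with the test example counted too."""
    n_at_least = sum(1 for s in calibration_scores if s >= test_score)
    return (n_at_least + 1) / (len(calibration_scores) + 1)


def prediction_region(calibration_scores, class_probabilities, significance):
    """Return {label: p_value} for every label not rejected at `significance`.
    Assumed nonconformity score: 1 - predicted probability of the label."""
    region = {}
    for label, probability in class_probabilities.items():
        p = icp_p_value(calibration_scores, 1.0 - probability)
        if p > significance:
            region[label] = p
    return region


# Hypothetical nonconformity scores of held-out calibration examples,
# each computed for that example's true label.
calibration_scores = [0.05, 0.10, 0.20, 0.30, 0.40, 0.55, 0.70, 0.80, 0.97]

# Hypothetical class probabilities for one test utterance.
test_probabilities = {"word_a": 0.85, "word_b": 0.10, "word_c": 0.05}

region = prediction_region(calibration_scores, test_probabilities, significance=0.25)
print(region)
```

Lowering the significance level widens the region (more labels survive), which is how the guaranteed error rate is traded against the informativeness of each prediction.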
Affiliation(s)
- Ming Zhang
  - State Key Laboratory of Industrial Control Technology, Institute of Cyber Systems and Control, Zhejiang University, Hangzhou, 310058, China
- You Wang
  - State Key Laboratory of Industrial Control Technology, Institute of Cyber Systems and Control, Zhejiang University, Hangzhou, Zhejiang, China
- Zhang Wei
  - State Key Laboratory of Industrial Control Technology, Institute of Cyber Systems and Control, Zhejiang University, Hangzhou, Zhejiang, China
- Meng Yang
  - Department of Computer Science and Technology, School of Mechanical Electronic and Information Engineering, China University of Mining and Technology - Beijing Campus, Beijing, China
- Zhiyuan Luo
  - Department of Computer Science, Royal Holloway University of London, Egham, Surrey, United Kingdom
- Guang Li
  - State Key Laboratory of Industrial Control Technology, Institute of Cyber Systems and Control, Zhejiang University, Hangzhou, Zhejiang, China

5. Li P, Ren Z, Shao K, Tan H, Niu Z. Research on Distinguishing Fish Meal Quality Using Different Characteristic Parameters Based on Electronic Nose Technology. Sensors 2019; 19:2146. [PMID: 31075849 PMCID: PMC6540599 DOI: 10.3390/s19092146] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 04/10/2019] [Revised: 04/26/2019] [Accepted: 05/07/2019] [Indexed: 11/16/2022]
Abstract
In this paper, an independently developed portable electronic nose was employed to detect and classify fish meal of different qualities. An SPME-GC-MS (solid phase microextraction gas chromatography mass spectrometry) analysis of fish meal is also presented. Because the electronic nose produces a large amount of raw feature data, a reasonable selection of the original features was necessary before processing in order to reduce the dimension. The integral value, wavelet energy value, maximum gradient value, average differential value, relation steady-state response average value, and variance value were selected as six different characteristic parameters to study fish meal samples with different storage time grades. Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) were employed together with five recognition methods for the classification: the multilayer perceptron neural network, random forest (RF), k nearest neighbor, support vector machine, and Bayesian classification methods. The results showed that the RF classification method had the highest accuracy rate among the classification algorithms. The highest accuracy rate for distinguishing fish meal samples of different qualities was achieved using the integral value, stable value, and average differential value. The lowest accuracy rate was achieved using the maximum gradient value. These findings show that the electronic nose can identify fish meal samples with different storage times.
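
As a rough illustration of how some of the characteristic parameters named in this abstract could be computed from a single sensor response curve, the sketch below uses simple textbook definitions. These definitions are assumptions, not the paper's formulas: the integral is taken by the trapezoidal rule, the gradient is a finite difference between consecutive samples, the steady-state value is the mean of the last few samples, and the wavelet energy feature is omitted. The function name, parameters, and sample curve are hypothetical.

```python
from statistics import mean, pvariance


def extract_features(response, dt=1.0, steady_window=3):
    """Compute simple summary features from one sensor response curve.

    response: sensor readings sampled every `dt` seconds.
    steady_window: number of trailing samples treated as the steady state
    (an assumed choice).
    """
    # Finite-difference gradient between consecutive samples.
    diffs = [(b - a) / dt for a, b in zip(response, response[1:])]
    return {
        # Area under the curve by the trapezoidal rule.
        "integral": sum((a + b) / 2.0 * dt for a, b in zip(response, response[1:])),
        "max_gradient": max(diffs),
        "average_differential": mean(diffs),
        "steady_state_mean": mean(response[-steady_window:]),
        # Population variance of the whole curve.
        "variance": pvariance(response),
    }


# Hypothetical response: the sensor rises and then settles at a plateau.
response = [0.0, 1.0, 3.0, 4.0, 4.0, 4.0]
features = extract_features(response)
for name, value in features.items():
    print(f"{name}: {value:.4f}")
```

Each sensor in the array would yield one such feature vector, and the vectors (or a reduced form of them after PCA or LDA) would then be passed to a classifier.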
Affiliation(s)
- Pei Li
  - College of Engineering, Huazhong Agricultural University, Wuhan 430070, China.
- Zouhong Ren
  - College of Engineering, Huazhong Agricultural University, Wuhan 430070, China.
- Kaiyi Shao
  - College of Engineering, Huazhong Agricultural University, Wuhan 430070, China.
- Hequn Tan
  - College of Engineering, Huazhong Agricultural University, Wuhan 430070, China.
  - Key Laboratory of Agricultural Equipment in Mid-lower Yangtze River, Ministry of Agriculture, Wuhan 430070, China.
- Zhiyou Niu
  - College of Engineering, Huazhong Agricultural University, Wuhan 430070, China.
  - Key Laboratory of Agricultural Equipment in Mid-lower Yangtze River, Ministry of Agriculture, Wuhan 430070, China.