1
Idris NF, Ismail MA, Jaya MIM, Ibrahim AO, Abulfaraj AW, Binzagr F. Stacking with Recursive Feature Elimination-Isolation Forest for classification of diabetes mellitus. PLoS One 2024; 19:e0302595. [PMID: 38718024 PMCID: PMC11078423 DOI: 10.1371/journal.pone.0302595] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2023] [Accepted: 04/08/2024] [Indexed: 05/12/2024]
Abstract
Diabetes mellitus is one of the oldest diseases known to humankind, with descriptions dating back to ancient Egypt. It is a chronic metabolic disorder that heavily burdens healthcare providers worldwide because the number of patients rises steadily every year. Worryingly, diabetes affects not only the aging population but also children. Controlling the disease is crucial, as diabetes can lead to many health complications. As technology has advanced, computing has been integrated into healthcare systems. Artificial intelligence helps healthcare become more efficient in diagnosing diabetes patients, improves healthcare delivery, and makes care more patient-centric. Among the advanced data mining techniques in artificial intelligence, stacking is one of the most prominent methods applied in the diabetes domain. Hence, this study investigates the potential of stacking ensembles. It aims to reduce the high complexity inherent in stacking, which lengthens training time, and to reduce the outliers in the diabetes data in order to improve classification performance. To address these concerns, a novel machine learning method called Stacking Recursive Feature Elimination-Isolation Forest was introduced for diabetes prediction. Stacking is combined with Recursive Feature Elimination to design an efficient model for diabetes diagnosis that uses fewer features as resources. The method also uses Isolation Forest for outlier removal. The study uses accuracy, precision, recall, F1 measure, training time, and standard deviation to assess classification performance. The proposed method achieved an accuracy of 79.077% on the PIMA Indians Diabetes dataset and 97.446% on the Diabetes Prediction dataset, outperforming many existing methods and demonstrating its effectiveness in the diabetes domain.
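The three ingredients named in this abstract (Isolation Forest for outlier removal, Recursive Feature Elimination for feature reduction, and a stacking ensemble) can be sketched with scikit-learn as below. This is only an illustration under stated assumptions, not the authors' implementation: the data are synthetic, and the base learners, meta-learner, contamination rate, and number of retained features are all arbitrary choices that the abstract does not specify.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import IsolationForest, RandomForestClassifier, StackingClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a tabular diabetes dataset (8 features, binary label).
X, y = make_classification(n_samples=600, n_features=8, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Step 1: Isolation Forest labels inliers as 1 and outliers as -1.
# Only the training split is filtered; the 5% contamination rate is an assumption.
iso = IsolationForest(contamination=0.05, random_state=0)
inlier_mask = iso.fit_predict(X_train) == 1
X_clean, y_clean = X_train[inlier_mask], y_train[inlier_mask]

# Step 2: RFE keeps a reduced feature subset (5 of 8 here, an assumption).
# Step 3: a stacking ensemble is trained on the reduced features.
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
    ],
    final_estimator=LogisticRegression(max_iter=1000),  # meta-learner
    cv=5,  # out-of-fold base predictions feed the meta-learner
)
model = make_pipeline(
    RFE(LogisticRegression(max_iter=1000), n_features_to_select=5),
    stack,
)
model.fit(X_clean, y_clean)

accuracy = model.score(X_test, y_test)
n_selected = int(model.named_steps["rfe"].support_.sum())
print(f"kept {X_clean.shape[0]} of {X_train.shape[0]} training rows, "
      f"{n_selected} features, test accuracy {accuracy:.3f}")
```

Placing RFE in front of the stack means every base learner trains on fewer columns, which is one plausible way to lower training cost; whether the published method orders the steps this way is not stated in the abstract.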
Affiliation(s)
- Nur Farahaina Idris
  - Faculty of Computing, Universiti Malaysia Pahang Al-Sultan Abdullah, Pekan, Pahang, Malaysia
- Mohd Arfian Ismail
  - Faculty of Computing, Universiti Malaysia Pahang Al-Sultan Abdullah, Pekan, Pahang, Malaysia
  - Centre of Excellence for Artificial Intelligence & Data Science, Universiti Malaysia Pahang Al-Sultan Abdullah, Lebuhraya Tun Razak, Gambang, Malaysia
- Mohd Izham Mohd Jaya
  - Faculty of Computing, Universiti Malaysia Pahang Al-Sultan Abdullah, Pekan, Pahang, Malaysia
- Ashraf Osman Ibrahim
  - Creative Advanced Machine Intelligence Research Centre, Faculty of Computing and Informatics, Universiti Malaysia Sabah, Sabah, Malaysia
- Anas W. Abulfaraj
  - Department of Information Systems, King Abdulaziz University, Rabigh, Saudi Arabia
- Faisal Binzagr
  - Department of Computer Science, King Abdulaziz University, Rabigh, Saudi Arabia
2
Chiu CC, Wu CM, Chien TN, Kao LJ, Li C, Jiang HL. Applying an Improved Stacking Ensemble Model to Predict the Mortality of ICU Patients with Heart Failure. J Clin Med 2022; 11:6460. [PMID: 36362686 PMCID: PMC9659015 DOI: 10.3390/jcm11216460] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 09/17/2022] [Revised: 10/21/2022] [Accepted: 10/26/2022] [Indexed: 08/31/2023]
Abstract
Cardiovascular diseases have been identified as one of the top three causes of death worldwide, with onset and deaths mostly due to heart failure (HF). In the ICU, where patients with HF are at increased risk of death and consume significant medical resources, early and accurate prediction of the time of death for high-risk patients would enable them to receive appropriate and timely medical care. The data for this study were obtained from the MIMIC-III database, from which we collected vital signs and test results for 6699 HF patients during the first 24 h of their first ICU admission. To predict the mortality of HF patients in ICUs more precisely, an integrated stacking model is proposed and applied in this paper. In the first stage of classification, the datasets were passed to first-level classifiers: RF, SVC, KNN, LGBM, Bagging, and AdaBoost. The decisions of these six classifiers were then fused to construct and optimize the stacked set of second-level classifiers. The results indicate that our model obtained an accuracy of 95.25% and an AUROC of 82.55% in predicting the mortality of HF patients, which demonstrates the capability and efficiency of our method. In addition, the results revealed that platelets, glucose, and blood urea nitrogen were the clinical features with the greatest impact on model prediction. The results of this analysis not only improve healthcare professionals' understanding of patients' conditions but also allow for better use of healthcare resources.
Affiliation(s)
- Chih-Chou Chiu
  - Department of Business Management, National Taipei University of Technology, Taipei 106, Taiwan
- Chung-Min Wu
  - Department of Business Management, National Taipei University of Technology, Taipei 106, Taiwan
- Te-Nien Chien
  - College of Management, National Taipei University of Technology, Taipei 106, Taiwan
- Ling-Jing Kao
  - Department of Business Management, National Taipei University of Technology, Taipei 106, Taiwan
- Chengcheng Li
  - College of Management, National Taipei University of Technology, Taipei 106, Taiwan
- Han-Ling Jiang
  - Alliance Manchester Business School, University of Manchester, Manchester M15 6PB, UK
3
Bai X, Zhou Y, Feng X, Tao M, Zhang J, Deng S, Lou B, Yang G, Wu Q, Yu L, Yang Y, He Y. Evaluation of rice bacterial blight severity from lab to field with hyperspectral imaging technique. Frontiers in Plant Science 2022; 13:1037774. [PMID: 36340356 PMCID: PMC9627309 DOI: 10.3389/fpls.2022.1037774] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2022] [Accepted: 10/03/2022] [Indexed: 06/16/2023]
Abstract
Hyperspectral imaging combined with machine learning is a powerful tool for evaluating disease phenotypes in rice disease-resistance breeding. However, current studies are almost all carried out in the lab environment, and their results are difficult to apply in the field. In this paper, we used visible/near-infrared hyperspectral images to analyze the severity of rice bacterial blight (BB) and proposed a novel disease index construction strategy (NDSCI) for field application. A purpose-designed long short-term memory network with an attention mechanism evaluated BB severity robustly, and the attention block could filter important wavelengths. The best results were obtained from the fusion of important wavelengths and color features, with an accuracy of 0.94. NDSCI was then constructed from the important wavelengths and color features related to BB severity. The correlation coefficient of NDSCI extended to the field data reached -0.84, showing good scalability. This work overcomes the limitations of environmental conditions and sheds new light on the rapid measurement of phenotypes in disease-resistance breeding.
Affiliation(s)
- Xiulin Bai
  - College of Biosystems Engineering and Food Science, Zhejiang University, Hangzhou, China
- Yujie Zhou
  - Zhuji Agricultural Technology Extension Center, Zhuji, China
- Xuping Feng
  - College of Biosystems Engineering and Food Science, Zhejiang University, Hangzhou, China
- Mingzhu Tao
  - College of Biosystems Engineering and Food Science, Zhejiang University, Hangzhou, China
- Jinnuo Zhang
  - College of Biosystems Engineering and Food Science, Zhejiang University, Hangzhou, China
- Shuiguang Deng
  - College of Computer Science and Technology, Zhejiang University, Hangzhou, China
- Binggan Lou
  - College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Guofeng Yang
  - College of Biosystems Engineering and Food Science, Zhejiang University, Hangzhou, China
- Qingguan Wu
  - College of Biosystems Engineering and Food Science, Zhejiang University, Hangzhou, China
- Li Yu
  - Agricultural Experiment Station & Agricultural Sci-Tech Park Management Committee, Zhejiang University, Hangzhou, China
- Yong Yang
  - State Key Laboratory for Managing Biotic and Chemical Threats to the Quality and Safety of Agro-Products, Key Laboratory of Biotechnology for Plant Protection, Ministry of Agriculture and Rural Affairs, Zhejiang Provincial Key Laboratory of Biotechnology for Plant Protection, Institute of Virology and Biotechnology, Zhejiang Academy of Agricultural Science, Hangzhou, China
- Yong He
  - College of Biosystems Engineering and Food Science, Zhejiang University, Hangzhou, China
4
Fitoussi R, Faure MO, Beauchef G, Achard S. Human skin responses to environmental pollutants: A review of current scientific models. Environmental Pollution (Barking, Essex: 1987) 2022; 306:119316. [PMID: 35469928 DOI: 10.1016/j.envpol.2022.119316] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 12/25/2021] [Revised: 04/11/2022] [Accepted: 04/14/2022] [Indexed: 06/14/2023]
Abstract
Whatever the exposure route, chemical, physical and biological pollutants modify the whole organism response, leading to nerve, cardiac, respiratory, reproductive, and skin system pathologies. Skin acts as a barrier for preventing pollutant modifications. This review aims to present the available scientific models, which help investigate the impact of pollution on the skin. The research question was "Which experimental models illustrate the impact of pollution on the skin in humans?" The review covered a period of 10 years following a PECO statement on in vitro, ex vivo, in vivo and in silico models. Of 582 retrieved articles, 118 articles were eligible. In oral and inhalation routes, dermal exposure had an important impact at both local and systemic levels. Healthy skin models included primary cells, cell lines, co-cultures, reconstructed human epidermis, and skin explants. In silico models estimated skin exposure and permeability. All pollutants affected the skin by altering elasticity, thickness, the structure of epidermal barrier strength, and dermal extracellular integrity. Some specific models concerned wound healing or the skin aging process. Underlying mechanisms were an exacerbated inflammatory skin reaction with the modulation of several cytokines and oxidative stress responses, ending with apoptosis. Pathological skin models revealed the consequences of environmental pollutants on psoriasis, atopic dermatitis, and tumour development. Finally, scientific models were used for evaluating the safety and efficacy of potential skin formulations in preventing the skin aging process or skin irritation after repeated contact. The review gives an overview of scientific skin models used to assess the effects of pollutants. Chemical and physical pollutants were mainly represented while biological contaminants were little studied. In future developments, cell hypoxia and microbiota models may be considered as more representative of clinical situations. Models considering humidity and temperature variations may reflect the impact of these changes.
Affiliation(s)
- Marie-Odile Faure
  - Scientific Consulting For You, 266 avenue Daumesnil, 75012, Paris, France
- Sophie Achard
  - HERA Team (Health Environmental Risk Assessment), INSERM UMR1153, CRESS-INRAE, Université Paris Cité, Faculté de Pharmacie, 4 avenue de l'Observatoire, 75270 CEDEX 06, Paris, France
5
Exploration of Black Boxes of Supervised Machine Learning Models: A Demonstration on Development of Predictive Heart Risk Score. Computational Intelligence and Neuroscience 2022; 2022:5475313. [PMID: 35602638 PMCID: PMC9119773 DOI: 10.1155/2022/5475313] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2022] [Accepted: 04/26/2022] [Indexed: 11/18/2022]
Abstract
Machine learning (ML) often provides high-performance models that support decision-makers in various fields. However, this high performance comes at the expense of interpretability, which practitioners have criticized and which has become a significant hindrance to the application of these models. Therefore, black-box ML models are not recommended for highly sensitive decisions. We propose a novel methodology that takes complex supervised ML models and transforms them into simple, interpretable, transparent statistical models. The methodology resembles stacking ensemble ML: the best ML models are used as base learners to compute relative feature weights. An index built from these weights is then used as a single covariate in a simple logistic regression model to estimate the likelihood of an event. We tested this methodology on a primary dataset related to cardiovascular diseases (CVDs), the leading cause of mortality in recent times. Early risk assessment with accurate but interpretable risk prediction models is therefore an important means of reducing the burden of CVDs and their related mortality. We developed ML models based on an artificial neural network and support vector machines and transformed them into a simple statistical model and heart risk scores. These simplified models were found to be transparent, reliable, valid, interpretable, and approximate in their predictions. The findings of this study suggest that complex supervised ML models can be efficiently transformed into simple statistical models that can also be validated.
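The idea of collapsing feature weights into one index and using that index as the single covariate of a logistic regression can be sketched in plain Python. Everything concrete here is hypothetical: the weights, the toy rows and labels, and the helper names `risk_index`, `fit_logistic_1d`, and `predict_risk` are illustrative assumptions, and the abstract does not say how the weights are derived or how the index is built.

```python
import math

def risk_index(row, weights):
    """Collapse one record's features into a single weighted score."""
    return sum(w * x for w, x in zip(weights, row))

def fit_logistic_1d(xs, ys, lr=0.1, epochs=2000):
    """Fit P(y=1 | x) = sigmoid(a + b*x) by batch gradient descent on log-loss."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            grad_a += p - y
            grad_b += (p - y) * x
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

def predict_risk(x, a, b):
    """Estimated event probability for an index value x."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))

# Hypothetical relative feature weights, e.g. averaged importances from base learners.
weights = [0.5, 0.3, 0.2]

# Toy records with features scaled to [0, 1]; label 1 marks an event.
rows = [
    [0.1, 0.2, 0.0], [0.2, 0.1, 0.3], [0.4, 0.3, 0.2],
    [0.6, 0.7, 0.5], [0.8, 0.9, 0.7], [0.9, 0.6, 1.0],
]
labels = [0, 0, 0, 1, 1, 1]

index = [risk_index(r, weights) for r in rows]
a, b = fit_logistic_1d(index, labels)
risks = [predict_risk(x, a, b) for x in index]
```

The fitted model has only two parameters, an intercept `a` and a slope `b`, so the effect of the index on the estimated risk can be read off directly, which is the interpretability argument the abstract makes.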
6
Feng Y, Wang X, Zhang J. A heterogeneous ensemble learning method for neuroblastoma survival prediction. IEEE J Biomed Health Inform 2021; 26:1472-1483. [PMID: 33848254 DOI: 10.1109/jbhi.2021.3073056] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 11/10/2022]
Abstract
Neuroblastoma is a pediatric cancer with high morbidity and mortality. Accurate survival prediction for patients with neuroblastoma plays an important role in the formulation of treatment plans. In this study, we proposed a heterogeneous ensemble learning method to predict the survival of neuroblastoma patients and extracted decision rules from the proposed method to assist doctors in making decisions. After data preprocessing, five heterogeneous base learners were developed: decision tree, random forest, support vector machine based on a genetic algorithm, extreme gradient boosting, and light gradient boosting machine. Subsequently, a heterogeneous feature selection method was devised to obtain the optimal feature subset for each base learner, and each subset guided the construction of its base learner as a priori knowledge. Furthermore, an area under curve-based ensemble mechanism was proposed to integrate the five heterogeneous base learners. Finally, the proposed method was compared with mainstream machine learning methods across different indicators, and valuable information was extracted from it using partial dependence plot analysis and a rule-extraction method. Experimental results show that the proposed method achieves an accuracy of 91.64%, recall of 91.14%, and AUC of 91.35% and is significantly better than the mainstream machine learning methods. In addition, interpretable rules with accuracy higher than 0.900 and predicted responses are extracted from the proposed method. Our study can effectively improve the performance of clinical decision support systems to improve the survival of neuroblastoma patients.
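One simple reading of an "area under curve-based ensemble mechanism" is to weight each base learner's predicted probability by its validation AUC. The sketch below shows that reading only; the abstract does not give the actual combination rule, so the normalization scheme, the learner names, and the toy scores are all assumptions.

```python
def auc(y_true, scores):
    """AUC as the chance a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def auc_weighted_ensemble(val_labels, val_scores, new_scores):
    """Weight each learner by its validation AUC, normalized so the weights sum to 1."""
    aucs = {name: auc(val_labels, s) for name, s in val_scores.items()}
    total = sum(aucs.values())
    weights = {name: a / total for name, a in aucs.items()}
    n = len(next(iter(new_scores.values())))
    combined = [sum(weights[m] * new_scores[m][i] for m in weights) for i in range(n)]
    return weights, combined

# Toy validation outputs for two hypothetical base learners.
val_labels = [1, 0, 1, 0]
val_scores = {
    "tree": [0.9, 0.2, 0.8, 0.1],  # ranks every positive above every negative
    "svm": [0.6, 0.7, 0.8, 0.3],   # misranks one positive/negative pair
}
# Predicted survival probabilities for two new patients.
new_scores = {"tree": [0.9, 0.2], "svm": [0.5, 0.4]}

weights, combined = auc_weighted_ensemble(val_labels, val_scores, new_scores)
```

Learners that rank validation cases better receive more say in the final probability, while weaker learners still contribute rather than being dropped.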
7
Shen T, Yu H, Wang YZ. Discrimination of Gentiana and Its Related Species Using IR Spectroscopy Combined with Feature Selection and Stacked Generalization. Molecules 2020; 25:1442. [PMID: 32210010 PMCID: PMC7144467 DOI: 10.3390/molecules25061442] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 02/16/2020] [Revised: 03/15/2020] [Accepted: 03/20/2020] [Indexed: 01/09/2023]
Abstract
Gentiana is one of the largest genera of Gentianoideae; most of its species have potential pharmaceutical value and are used in local traditional medicine. Because phytochemical composition and bioactive compounds differ among species, it is crucial to accurately identify authentic Gentiana species. In this paper, the feasibility of using infrared spectroscopy combined with chemometric analysis to identify Gentiana and its related species was studied. A total of 180 batches of raw spectral fingerprints were obtained from 18 species of Gentiana and Tripterospermum by near-infrared (NIR: 10,000-4000 cm⁻¹) and Fourier transform mid-infrared (MIR: 4000-600 cm⁻¹) spectroscopy. Firstly, principal component analysis (PCA) was used to explore the natural grouping of the 180 samples. Secondly, random forest (RF), support vector machine (SVM), and K-nearest neighbors (KNN) models were built using the full spectra (1487 NIR variables and 1214 FT-MIR variables, respectively). Based on the results for the calibration and prediction sets, the MIR-SVM model had a higher classification accuracy than the other models. Five feature selection strategies, VIP (variable importance in the projection), Boruta, GARF (genetic algorithm combined with random forest), GASVM (genetic algorithm combined with support vector machine), and Venn diagram calculation, were then used to reduce the number of variables for modeling. Finally, 101 NIR and 73 FT-MIR bands were selected as the feature variables, respectively. Thirdly, stacking models were built on the optimal spectral dataset. Most of the stacking models performed better than the full-spectra models. RF and SVM as base learners, combined with an SVM meta-classifier, formed the optimal stacked generalization strategy. For the SG-Ven-MIR-SVM model, the accuracy (ACC) of both the calibration set and the validation set was 100%. Sensitivity (SE), specificity (SP), efficiency (EFF), Matthews correlation coefficient (MCC), and Cohen's kappa coefficient (K) were all 1, which showed that the model had the best authenticity identification performance. These results indicate that stacked generalization combined with feature selection is probably an important technique for improving the predictive accuracy of classification models and avoiding overfitting. The results can provide a valuable reference for the safety and effectiveness of the clinical application of medicinal Gentiana.
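The "Venn diagram calculation" step can be read as keeping the spectral bands on which several feature selection methods agree. The sketch below assumes that reading; the helper `venn_consensus`, the `min_votes` parameter, and the wavenumber values are hypothetical and are not taken from the paper.

```python
def venn_consensus(selections, min_votes=None):
    """Return features chosen by at least min_votes methods (default: by all of them)."""
    if min_votes is None:
        min_votes = len(selections)
    counts = {}
    for chosen in selections.values():
        for feature in set(chosen):  # set() guards against duplicate entries
            counts[feature] = counts.get(feature, 0) + 1
    return sorted(f for f, c in counts.items() if c >= min_votes)

# Hypothetical wavenumbers (cm^-1) picked by each selection method.
selections = {
    "VIP": [4000, 3996, 2920, 1650, 1050],
    "Boruta": [3996, 2920, 1650, 1050, 900],
    "GARF": [3996, 2920, 1650, 1050, 3400],
    "GASVM": [2920, 1650, 720, 1050],
}

common = venn_consensus(selections)                # bands all four methods agree on
relaxed = venn_consensus(selections, min_votes=3)  # bands at least three methods agree on
```

Requiring full agreement gives the smallest and most conservative band set; lowering `min_votes` trades some of that agreement for a larger set of candidate variables.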
Affiliation(s)
- Tao Shen
  - Yunnan Herbal Laboratory, Institute of Herb Biotic Resources, School of Life and Sciences, Yunnan University, Kunming 650091, China
  - The International Joint Research Center for Sustainable Utilization of Cordyceps Bioresources in China (Yunnan) and Southeast Asia, Yunnan University, Kunming 650091, China
  - College of Chemistry, Biological and Environment, Yuxi Normal University, Yu’xi 653100, China
- Hong Yu
  - Yunnan Herbal Laboratory, Institute of Herb Biotic Resources, School of Life and Sciences, Yunnan University, Kunming 650091, China
  - The International Joint Research Center for Sustainable Utilization of Cordyceps Bioresources in China (Yunnan) and Southeast Asia, Yunnan University, Kunming 650091, China
  - Correspondence: ; Tel.: +86-1370-067-6633
- Yuan-Zhong Wang
  - Medicinal Plants Research Institute, Yunnan Academy of Agricultural Sciences, Kunming 650200, China