1
Ejiyi CJ, Qin Z, Ukwuoma CC, Nneji GU, Monday HN, Ejiyi MB, Ejiyi TU, Okechukwu U, Bamisile OO. Comparative performance analysis of Boruta, SHAP, and Borutashap for disease diagnosis: A study with multiple machine learning algorithms. NETWORK (BRISTOL, ENGLAND) 2024:1-38. [PMID: 38511557 DOI: 10.1080/0954898x.2024.2331506] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2023] [Accepted: 03/13/2024] [Indexed: 03/22/2024]
Abstract
Interpretable machine learning models are instrumental in disease diagnosis and clinical decision-making, shedding light on relevant features. In this study, Boruta, SHAP (SHapley Additive exPlanations), and BorutaShap were employed for feature selection, each contributing to the identification of crucial features. The selected features were then used to train six machine learning algorithms (LR, SVM, ETC, AdaBoost, RF, and LGBM) on diverse medical datasets obtained from public sources after rigorous preprocessing. The performance of each feature selection technique was evaluated across multiple ML models using accuracy, precision, recall, and F1-score. Among the three techniques, SHAP showed superior performance, achieving average accuracies of 80.17%, 85.13%, 90.00%, and 99.55% on the diabetes, cardiovascular, statlog, and thyroid disease datasets, respectively. LGBM emerged as the most effective algorithm, with an average accuracy of 91.00% for most disease states. Moreover, SHAP enhanced the interpretability of the models, providing valuable insights into the underlying mechanisms driving disease diagnosis. This comprehensive study contributes significant insights into feature selection techniques and machine learning algorithms for disease diagnosis, benefiting researchers and practitioners in the medical field. Further exploration of feature selection methods and algorithms holds promise for advancing disease diagnosis methodologies, paving the way for more accurate and interpretable diagnostic models.
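The four evaluation metrics named in this abstract can all be derived from confusion-matrix counts. The following is a minimal sketch, assuming binary labels with 1 as the positive class; the function name `binary_metrics` and the sample labels are hypothetical and are not taken from the paper.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for binary labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    total = tp + fp + fn + tn
    # Guard every ratio against a zero denominator.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Illustrative labels only (not data from the study).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
```

For multi-class datasets these per-class values would typically be averaged (macro or weighted); which averaging the study used is not stated in the abstract.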
Affiliation(s)
- Chukwuebuka Joseph Ejiyi
- School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu, China
- Zhen Qin
- School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu, China
- Chiagoziem Chima Ukwuoma
- School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu, China
- Grace Ugochi Nneji
- Software Engineering Department, Sino-British Collaborative Education, Chengdu University of Technology, Oxford Brookes University, Chengdu, China
- Happy Nkanta Monday
- Software Engineering Department, Sino-British Collaborative Education, Chengdu University of Technology, Oxford Brookes University, Chengdu, China
- Thomas Ugochukwu Ejiyi
- Department of Pure and Industrial Chemistry, University of Nigeria Nsukka, Enugu, Nigeria
- Olusola O Bamisile
- Sichuan Industrial Internet Intelligent Monitoring and Application Engineering Technology Research Centre, Chengdu University of Technology, Chengdu, China
2
Lee S, Joshi GP, Son CH, Lee G. Combining Gaussian Process with Hybrid Optimal Feature Decision in Cuffless Blood Pressure Estimation. Diagnostics (Basel) 2023; 13:diagnostics13040736. [PMID: 36832226 PMCID: PMC9955403 DOI: 10.3390/diagnostics13040736] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2022] [Revised: 02/06/2023] [Accepted: 02/08/2023] [Indexed: 02/17/2023] Open
Abstract
Noninvasive blood pressure estimation is crucial for cardiovascular and hypertension patients. Cuffless blood pressure estimation has received much attention recently for continuous blood pressure monitoring. This paper proposes a new methodology that combines the Gaussian process with hybrid optimal feature decision (HOFD) in cuffless blood pressure estimation. First, one of the feature selection methods, robust neighbor component analysis (RNCA), minimum redundancy maximum relevance (MRMR), or the F-test, is chosen based on the proposed hybrid optimal feature decision. After that, a filter-based RNCA algorithm uses the training dataset to obtain weighted functions by minimizing the loss function. Next, the Gaussian process (GP) algorithm is used as the evaluation criterion to determine the best feature subset. Hence, combining GP with HOFD leads to an effective feature selection process. Combining the Gaussian process with the RNCA algorithm yields root mean square errors (RMSEs) for SBP (10.75 mmHg) and DBP (8.02 mmHg) that are lower than those of the conventional algorithms. The experimental results indicate that the proposed algorithm is very effective.
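The selection step described in this abstract, scoring candidate feature subsets with an evaluation model and keeping the one with the lowest RMSE, can be sketched as follows. This is an illustration under stated assumptions, not the authors' HOFD implementation: a leave-one-out 1-nearest-neighbour regressor stands in for the Gaussian process, the candidate subsets are hard-coded rather than produced by RNCA, MRMR, or the F-test, and all names and data are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def loo_1nn_predictions(X, y, subset):
    """Leave-one-out 1-NN predictions using only the columns in `subset`.

    Stand-in evaluator (assumption): the paper uses a Gaussian process here.
    """
    preds = []
    for i, xi in enumerate(X):
        best_j, best_d = None, math.inf
        for j, xj in enumerate(X):
            if i == j:
                continue  # hold out the sample being predicted
            d = sum((xi[k] - xj[k]) ** 2 for k in subset)
            if d < best_d:
                best_j, best_d = j, d
        preds.append(y[best_j])
    return preds


def pick_best_subset(X, y, candidates):
    """Return (name, rmse) of the candidate subset with the lowest RMSE."""
    scores = {name: rmse(y, loo_1nn_predictions(X, y, cols))
              for name, cols in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores[best]


# Toy data: column 0 tracks the target, column 1 does not.
X = [[1, 9], [2, 1], [3, 8], [4, 2], [5, 7], [6, 3]]
y = [10, 20, 30, 40, 50, 60]
candidates = {"candidate_a": [0], "candidate_b": [1]}  # hypothetical subsets
best_name, best_rmse = pick_best_subset(X, y, candidates)
```

On this toy data the subset containing only the informative column is selected, since its held-out error is half that of the uninformative one.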
Affiliation(s)
- Soojeong Lee
- Department of Computer Engineering, Sejong University, 209 Neungdong-ro, Gwangjin-gu, Seoul 05006, Republic of Korea
- Gyanendra Prasad Joshi
- Department of Computer Engineering, Sejong University, 209 Neungdong-ro, Gwangjin-gu, Seoul 05006, Republic of Korea
- Chang-Hwan Son
- Department of Software Science & Engineering, Kunsan National University, 558 Daehak-ro, Gunsan-si 54150, Republic of Korea
- Correspondence: (C.-H.S.); (G.L.)
- Gangseong Lee
- Ingenium College, Kwangwoon University, 20 Kwangwoon-ro, Nowon-gu, Seoul 01897, Republic of Korea
- Correspondence: (C.-H.S.); (G.L.)
3
Gonzalez-Jimenez D, Del-Olmo J, Poza J, Garramiola F, Madina P. Data-Driven Low-Frequency Oscillation Event Detection Strategy for Railway Electrification Networks. SENSORS (BASEL, SWITZERLAND) 2022; 23:254. [PMID: 36616852 PMCID: PMC9824671 DOI: 10.3390/s23010254] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2022] [Revised: 12/20/2022] [Accepted: 12/23/2022] [Indexed: 06/17/2023]
Abstract
Low-frequency oscillations (LFO) occur in railway electrification systems due to the incorporation of new trains with switching converters. As a result, the increased harmonic content can cause catenary stability problems under certain conditions. Most of the research published on this topic to date focuses on modelling the event and analysing it using frequency spectra. However, in recent years, new technologies linked to Big Data (BD) and data mining (DM) have opened an opportunity to study and detect LFO events by means of machine-learning (ML) methods. Trains continuously collect data from the most important catenary variables, which offers new resources for analysing this type of event. Therefore, this article presents the design and implementation of a data-driven LFO event detection strategy for AC railway network scenarios. Compared to previous investigations, a new approach to analyse and detect LFO events, based on field data and ML, is presented. To obtain the most appropriate detection approach for this application, the investigation first compares machine-learning algorithms (support vector machine, logistic regression, random forest, k-nearest neighbours, naïve Bayes) trained with real field data. It then analyses the key parameters and features that optimize event detection. The most significant result of this work is the high metric values of the solution, which exceed 97% in accuracy and 93% in F1 score with the random forest algorithm. In addition, the applicability and training of data-driven methods with real field data are demonstrated. This automatic detection strategy can help speed up and improve LFO detection tasks that used to be performed manually.
Finally, it is worth mentioning that this research has been structured based on the CRISP-DM methodology, established as the de facto approach for industrial DM projects.
4
Feature Selection Based on Adaptive Particle Swarm Optimization with Leadership Learning. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022; 2022:1825341. [PMID: 36072739 PMCID: PMC9441366 DOI: 10.1155/2022/1825341] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2022] [Revised: 08/07/2022] [Accepted: 08/09/2022] [Indexed: 12/02/2022]
Abstract
With the rapid development of the Internet of Things (IoT), the curse of dimensionality becomes increasingly common. Feature selection (FS) aims to eliminate irrelevant and redundant features in datasets. Particle swarm optimization (PSO) is an efficient metaheuristic algorithm that has been successfully applied to obtain the optimal feature subset with essential information in an acceptable time. However, PSO easily falls into local optima when dealing with high-dimensional datasets because of its constant parameter values and insufficient population diversity. In this paper, an FS method is proposed that uses adaptive PSO with leadership learning (APSOLL). An adaptive updating strategy replaces the constant parameters, and the leadership learning strategy provides valid population diversity. Experimental results on 10 UCI datasets show that APSOLL has better exploration and exploitation capabilities than PSO, grey wolf optimizer (GWO), Harris hawks optimization (HHO), flower pollination algorithm (FPA), salp swarm algorithm (SSA), linear PSO (LPSO), and hybrid PSO and differential evolution (HPSO-DE). Moreover, less than 8% of the features in the original datasets are selected on average, and the feature subsets are more effective in most cases than those generated by 6 traditional FS methods (analysis of variance (ANOVA), Chi-Squared (CHI2), Pearson, Spearman, Kendall, and Mutual Information (MI)).
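To make the PSO-based selection idea concrete, here is a minimal sketch of a generic binary PSO over feature masks. It is not APSOLL: a linearly decreasing inertia weight is used as a simple stand-in for the paper's adaptive parameter strategy, leadership learning is omitted, and the coefficients, the velocity clamp, and the toy fitness function are all assumptions chosen for illustration.

```python
import math
import random


def binary_pso(fitness, n_features, n_particles=10, n_iters=30, seed=0):
    """Maximise `fitness` over 0/1 feature masks with a basic binary PSO."""
    rng = random.Random(seed)
    # Common textbook defaults (assumed, not taken from the paper).
    w_max, w_min, c1, c2 = 0.9, 0.4, 2.0, 2.0
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for t in range(n_iters):
        # Inertia weight shrinks linearly from w_max to w_min over the run.
        w = w_max - (w_max - w_min) * t / max(1, n_iters - 1)
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-6.0, min(6.0, v))  # clamp to avoid overflow
                # A sigmoid maps velocity to the probability of selecting the feature.
                prob = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob else 0
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit


def toy_fitness(mask):
    """Hypothetical objective: reward features 0 and 2, penalise subset size."""
    return sum(mask[i] for i in (0, 2)) - 0.1 * sum(mask)


best_mask, best_fit = binary_pso(toy_fitness, n_features=8)
```

In practice the fitness function would wrap a classifier's validation score on the masked features; the size penalty mirrors the goal of keeping only a small fraction of the original features.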