1
Ejiyi CJ, Qin Z, Ukwuoma CC, Nneji GU, Monday HN, Ejiyi MB, Ejiyi TU, Okechukwu U, Bamisile OO. Comparative performance analysis of Boruta, SHAP, and Borutashap for disease diagnosis: A study with multiple machine learning algorithms. NETWORK (BRISTOL, ENGLAND) 2024:1-38. [PMID: 38511557 DOI: 10.1080/0954898x.2024.2331506] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2023] [Accepted: 03/13/2024] [Indexed: 03/22/2024]
Abstract
Interpretable machine learning models are instrumental in disease diagnosis and clinical decision-making, shedding light on relevant features. Notably, Boruta, SHAP (SHapley Additive exPlanations), and BorutaShap were employed for feature selection, each contributing to the identification of crucial features. These selected features were then utilized to train six machine learning algorithms, including LR, SVM, ETC, AdaBoost, RF, and LGBM, using diverse medical datasets obtained from public sources after rigorous preprocessing. The performance of each feature selection technique was evaluated across multiple ML models, assessing accuracy, precision, recall, and F1-score metrics. Among these, SHAP showcased superior performance, achieving average accuracies of 80.17%, 85.13%, 90.00%, and 99.55% across diabetes, cardiovascular, statlog, and thyroid disease datasets, respectively. Notably, the LGBM emerged as the most effective algorithm, boasting an average accuracy of 91.00% for most disease states. Moreover, SHAP enhanced the interpretability of the models, providing valuable insights into the underlying mechanisms driving disease diagnosis. This comprehensive study contributes significant insights into feature selection techniques and machine learning algorithms for disease diagnosis, benefiting researchers and practitioners in the medical field. Further exploration of feature selection methods and algorithms holds promise for advancing disease diagnosis methodologies, paving the way for more accurate and interpretable diagnostic models.
Affiliation(s)
- Chukwuebuka Joseph Ejiyi
  - School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu, China
- Zhen Qin
  - School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu, China
- Chiagoziem Chima Ukwuoma
  - School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu, China
- Grace Ugochi Nneji
  - Software Engineering Department, Sino-British Collaborative Education, Chengdu University of Technology, Oxford Brookes University, Chengdu, China
- Happy Nkanta Monday
  - Software Engineering Department, Sino-British Collaborative Education, Chengdu University of Technology, Oxford Brookes University, Chengdu, China
- Thomas Ugochukwu Ejiyi
  - Department of Pure and Industrial Chemistry, University of Nigeria Nsukka, Enugu, Nigeria
- Olusola O Bamisile
  - Sichuan Industrial Internet Intelligent Monitoring and Application Engineering Technology Research Centre, Chengdu University of Technology, Chengdu, China
2
Chen W, Cai Y, Li A, Su Y, Jiang K. EEG feature selection method based on maximum information coefficient and quantum particle swarm. Sci Rep 2023; 13:14515. [PMID: 37666919 PMCID: PMC10477332 DOI: 10.1038/s41598-023-41682-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2023] [Accepted: 08/30/2023] [Indexed: 09/06/2023] [Open Access]
Abstract
To reduce the dimensionality of EEG features and improve classification accuracy, we propose an improved hybrid feature selection method for EEG feature selection. First, MIC is used to remove irrelevant features and redundant features to reduce the search space of the second stage. QPSO is then used to optimize the feature in the second stage to obtain the optimal feature subset. Considering that both dimensionality and classification accuracy affect the performance of feature subsets, we design a new fitness function. Moreover, we optimize the parameters of the classifier while optimizing the feature subset to improve the classification accuracy and reduce the running time of the algorithm. Finally, experiments were performed on EEG and UCI datasets and compared with five existing feature selection methods. The results show that the feature subsets obtained by the proposed method have low dimensionality, high classification accuracy, and low computational complexity, which validates the effectiveness of the proposed method.
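The abstract does not give the fitness function itself. As an illustration only, a weighted sum of classification error and relative subset size is one common way to trade off the two quantities it mentions. The function name `subset_fitness`, the weight `alpha`, and the example numbers below are assumptions and are not taken from the paper.

```python
def subset_fitness(error_rate, n_selected, n_total, alpha=0.9):
    """Score a candidate feature subset; lower is better.

    Hypothetical weighted form (not the paper's formula):
        alpha * error_rate + (1 - alpha) * n_selected / n_total
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if n_total <= 0 or not 0 <= n_selected <= n_total:
        raise ValueError("need n_total > 0 and 0 <= n_selected <= n_total")
    return alpha * error_rate + (1.0 - alpha) * (n_selected / n_total)


# Two made-up candidates with the same error rate but different sizes.
small = subset_fitness(error_rate=0.10, n_selected=5, n_total=100)
large = subset_fitness(error_rate=0.10, n_selected=50, n_total=100)
print(small < large)  # the smaller subset scores better at equal error
```

With this form, `alpha` close to 1 favours accuracy, while smaller values push the search toward smaller subsets.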
Affiliation(s)
- Wan Chen
  - Rocket Force University of Engineering, Xi'an, 710025, China
- Yanping Cai
  - Rocket Force University of Engineering, Xi'an, 710025, China
- Aihua Li
  - Rocket Force University of Engineering, Xi'an, 710025, China
- Yanzhao Su
  - Rocket Force University of Engineering, Xi'an, 710025, China
- Ke Jiang
  - Rocket Force University of Engineering, Xi'an, 710025, China
3
AbdelAty AM, Yousri D, Chelloug S, Alduailij M, Abd Elaziz M. Fractional order adaptive hunter-prey optimizer for feature selection. ALEXANDRIA ENGINEERING JOURNAL 2023; 75:531-547. [DOI: 10.1016/j.aej.2023.05.092] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 09/02/2023]
4
Yan H, Li H, Yi B. Multi-channel Convolutional Neural Network with Sentiment Information for Sentiment Classification. ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING 2023. [DOI: 10.1007/s13369-023-07695-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/23/2023]
5
Yin K, Zhai J, Xie A, Zhu J. Feature selection using max dynamic relevancy and min redundancy. Pattern Anal Appl 2023. [DOI: 10.1007/s10044-023-01138-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/19/2023]
6
Pramanik R, Sarkar S, Sarkar R. An adaptive and altruistic PSO-based deep feature selection method for Pneumonia detection from Chest X-rays. Appl Soft Comput 2022; 128:109464. [PMID: 35966452 PMCID: PMC9364947 DOI: 10.1016/j.asoc.2022.109464] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 03/29/2022] [Revised: 07/11/2022] [Accepted: 07/29/2022] [Indexed: 12/23/2022]
Abstract
Pneumonia is one of the major causes of child mortality, especially in income-deprived regions of the world. Although it can be detected and treated with relatively unsophisticated instruments and medication, Pneumonia detection remains a major concern in developing countries. Computer-aided diagnosis (CAD) systems can be used in such countries because their operating costs are lower than those of professional medical experts. In this paper, we propose a CAD system for Pneumonia detection from Chest X-rays, using the concepts of deep learning and a meta-heuristic algorithm. We first extract deep features from the pre-trained ResNet50, fine-tuned on a target Pneumonia dataset. Then, we propose a feature selection technique based on particle swarm optimization (PSO), which is modified using a memory-based adaptation parameter and enriched by incorporating an altruistic behavior into the agents. We name our feature selection method adaptive and altruistic PSO (AAPSO). The proposed method successfully eliminates non-informative features obtained from the ResNet50 model, thereby improving the Pneumonia detection ability of the overall framework. Extensive experimentation and thorough analysis on a publicly available Pneumonia dataset establish the superiority of the proposed method over several other frameworks used for Pneumonia detection. Apart from Pneumonia detection, AAPSO is further evaluated on some standard UCI datasets, gene expression datasets for cancer prediction and a COVID-19 prediction dataset. The overall results are satisfactory, thereby confirming the usefulness of AAPSO in dealing with varied real-life problems. The supporting source code of this work can be found at https://github.com/rishavpramanik/AAPSO.
Affiliation(s)
- Rishav Pramanik
  - Department of Computer Science and Engineering, Jadavpur University, Kolkata, 700032, India
- Sourodip Sarkar
  - Department of Electronics and Communication Engineering, Heritage Institute of Technology, Kolkata, 700107, India
- Ram Sarkar
  - Department of Computer Science and Engineering, Jadavpur University, Kolkata, 700032, India
7
Abdelwahed NM, El-Tawel GS, Makhlouf MA. Effective hybrid feature selection using different bootstrap enhances cancers classification performance. BioData Min 2022; 15:24. [PMID: 36175944 PMCID: PMC9523996 DOI: 10.1186/s13040-022-00304-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/11/2022] [Accepted: 08/31/2022] [Indexed: 11/10/2022] [Open Access]
Abstract
BACKGROUND Machine learning can be used to predict the onset of different human cancers. High-dimensional data pose substantial problems, among them an excessive number of genes, over-fitting, long fitting times, and reduced classification accuracy. Recursive Feature Elimination (RFE) is a wrapper method for selecting the subset of features that yields the best accuracy. Despite the high performance of RFE, computation time and over-fitting are two disadvantages of this algorithm. Random forest for selection (RFS) has proved effective at selecting useful features and reducing over-fitting.
METHOD This paper proposes a method, positions first bootstrap step (PFBS) random forest selection recursive feature elimination (RFS-RFE), abbreviated PFBS-RFS-RFE, to enhance cancer classification performance. It applies a bootstrap at one of three positions: the outer first bootstrap step (OFBS), the inner first bootstrap step (IFBS), and the outer/inner first bootstrap step (O/IFBS). In the first position, OFBS is applied as a resampling method (bootstrap) with replacement before the selection step. RFS is then applied with bootstrap = false, i.e., the whole dataset is used to build each tree. The important features are combined with RFE to select the most relevant subset of features. In the second position, IFBS is applied as a resampling method (bootstrap) with replacement during RFS. The important features are again combined with RFE. In the third position, O/IFBS is applied as a hybrid of the first and second positions. RFE uses logistic regression (LR) as its estimator. The proposed methods are combined with four classifiers to address the feature selection problem and improve the performance of RFE, and five datasets of different sizes are used to assess the performance of PFBS-RFS-RFE.
RESULTS The results showed that O/IFBS-RFS-RFE achieved the best performance compared with previous work and improved the accuracy, variance, and ROC area for the RNA gene and dermatology erythemato-squamous diseases datasets to 99.994%, 0.0000004, and 1.000 and to 100.000%, 0.0, and 1.000, respectively.
CONCLUSION High-dimensional datasets and the RFE algorithm pose many difficulties for cancer classification performance. PFBS-RFS-RFE is proposed to address these difficulties with different bootstrap positions. The important features extracted by RFS are used with RFE to obtain the effective features.
Affiliation(s)
- Noura Mohammed Abdelwahed
  - Department of Information Systems, Faculty of Computers and Informatics, Suez Canal University, Ismailia, Egypt
- Gh S El-Tawel
  - Department of Computer Science, Faculty of Computers and Informatics, Suez Canal University, Ismailia, Egypt
- M A Makhlouf
  - Department of Information Systems, Faculty of Computers and Informatics, Suez Canal University, Ismailia, Egypt
8
Song XF, Zhang Y, Gong DW, Gao XZ. A Fast Hybrid Feature Selection Based on Correlation-Guided Clustering and Particle Swarm Optimization for High-Dimensional Data. IEEE TRANSACTIONS ON CYBERNETICS 2022; 52:9573-9586. [PMID: 33729976 DOI: 10.1109/tcyb.2021.3061152] [Citation(s) in RCA: 28] [Impact Index Per Article: 14.0] [Indexed: 06/12/2023]
Abstract
The "curse of dimensionality" and high computational cost still limit the application of evolutionary algorithms to high-dimensional feature selection (FS) problems. This article proposes a new three-phase hybrid FS algorithm based on correlation-guided clustering and particle swarm optimization (PSO) (HFS-C-P) to tackle these two problems at the same time. To this end, three kinds of FS methods are integrated into the proposed algorithm based on their respective advantages. In the first and second phases, a filter FS method and a feature clustering-based method with low computational cost are designed to reduce the search space used by the third phase. The third phase then focuses on finding an optimal feature subset by using an evolutionary algorithm with global search ability. Moreover, a symmetric uncertainty-based feature deletion method, a fast correlation-guided feature clustering strategy, and an improved integer PSO are developed to improve the performance of the three phases, respectively. Finally, the proposed algorithm is validated on 18 publicly available real-world datasets in comparison with nine FS algorithms. Experimental results show that the proposed algorithm can obtain a good feature subset with the lowest computational cost.
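The abstract names symmetric uncertainty as the basis of its first-phase feature deletion. For discrete variables, symmetric uncertainty is defined as SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), which lies in [0, 1]. The sketch below computes it and drops features whose SU with the class label is low; the helper names, the threshold value, and the deletion rule are assumptions for illustration, since the abstract does not describe the paper's exact procedure.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)) for discrete sequences."""
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0.0:
        return 0.0  # both variables are constant
    mutual_info = h_x + h_y - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (h_x + h_y)


def drop_weak_features(columns, labels, threshold=0.1):
    """Keep features whose SU with the labels reaches the (assumed) threshold."""
    return {
        name: col
        for name, col in columns.items()
        if symmetric_uncertainty(col, labels) >= threshold
    }


# Toy data: "a" mirrors the label, "b" is independent of it.
labels = [0, 0, 1, 1]
columns = {"a": [0, 0, 1, 1], "b": [0, 1, 0, 1]}
kept = drop_weak_features(columns, labels)
print(sorted(kept))
```

Continuous features would need to be discretized before applying this measure.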
9
A HYBRID SENTIMENT ANALYSIS APPROACH USING BLACK WIDOW OPTIMIZATION BASED FEATURE SELECTION. INTERNATIONAL JOURNAL OF INFORMATION RETRIEVAL RESEARCH 2022. [DOI: 10.4018/ijirr.289955] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]
Abstract
This paper proposes a novel hybrid framework with a Black Widow Optimization (BWO) based feature reduction technique that combines the merits of both machine learning and lexicon-based approaches to attain better scalability and accuracy. The scalability problem arises due to noisy, irrelevant and unique features present in the features extracted by the proposed approach, which can be eliminated by adopting an effective feature reduction technique. In our proposed BWO approach, the feature-set size is reduced by up to 43% without changing the accuracy (90%). The proposed feature selection technique outperforms other commonly used PSO- and GA-based feature selection techniques, with a reduced computation time of 21 sec. Moreover, our sentiment analysis approach is analysed using performance metrics such as precision, recall, F-measure, and computation time. Many organizations can use these online reviews to make well-informed decisions about users’ interests and preferences, to enhance customer satisfaction and product quality, and to find aspects of their products to improve, thereby generating more profit.
10
11
Shaban WM, Rabie AH, Saleh AI, Abo-Elsoud MA. Detecting COVID-19 patients based on fuzzy inference engine and Deep Neural Network. Appl Soft Comput 2021; 99:106906. [PMID: 33204229 PMCID: PMC7659585 DOI: 10.1016/j.asoc.2020.106906] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 07/13/2020] [Revised: 11/02/2020] [Accepted: 11/10/2020] [Indexed: 12/23/2022]
Abstract
COVID-19, as an infectious disease, has shocked the world and still threatens the lives of billions of people. Detecting coronavirus (COVID-19) has recently become a critical task for medical practitioners. Unfortunately, COVID-19 spreads quickly between people and reached millions of people worldwide within a few months. It is essential to identify infected people quickly and accurately so that measures to prevent its spread can be taken. Although several medical tests have been used to detect certain injuries, the hoped-for detection efficiency has not yet been achieved. In this paper, a new Hybrid Diagnose Strategy (HDS) is introduced. HDS relies on a novel technique for ranking selected features by projecting them into a proposed Patient Space (PS). A Feature Connectivity Graph (FCG) is constructed, which indicates both the weight of each feature and its binding degree to other features. The rank of a feature is determined by two factors: the first is the feature weight, and the second is its binding degree to its neighbors in PS. The ranked features are then used to derive the classification model, which classifies new persons to decide whether they are infected or not. The classification model is a hybrid model that consists of two classifiers: a fuzzy inference engine and a Deep Neural Network (DNN). The proposed HDS has been compared against recent techniques. Experimental results show that the proposed HDS outperforms the other competitors in terms of the average accuracy, precision, recall, and F-measure, achieving about 97.658%, 96.756%, 96.55%, and 96.615%, respectively. Additionally, HDS provides the lowest error value of 2.342%. Further, the results were validated statistically using the Wilcoxon Signed Rank Test and the Friedman Test.
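The four reported metrics and the error value follow the standard confusion-matrix definitions, and the reported error is consistent with 100% minus the reported accuracy (100 - 97.658 = 2.342). The sketch below shows those definitions for a binary classifier; the function name and the counts are made up for illustration and are not the paper's data.

```python
def binary_metrics(tp, fp, fn, tn):
    """Standard binary classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "error": 1.0 - accuracy,  # error rate is the complement of accuracy
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
m = binary_metrics(tp=45, fp=5, fn=5, tn=45)
print({k: round(v, 3) for k, v in m.items()})
```

Because the abstract reports averages, the F-measure computed from the averaged precision and recall need not match the averaged F-measure exactly.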
Affiliation(s)
- Warda M Shaban
  - Nile Higher Institute for Engineering and Technology, Egypt
- Asmaa H Rabie
  - Computers and Control Dept., Faculty of Engineering, Mansoura University, Egypt
- Ahmed I Saleh
  - Computers and Control Dept., Faculty of Engineering, Mansoura University, Egypt
- M A Abo-Elsoud
  - Electronics and Communication Dept., Faculty of Engineering, Mansoura University, Egypt