Arukonda S, Cheruku R. Nested genetic algorithm-based classifier selection and placement in multi-level ensemble framework for effective disease diagnosis.
Comput Methods Biomech Biomed Engin 2023:1-24. [PMID: 38126276 DOI: 10.1080/10255842.2023.2294264]
[Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2023] [Accepted: 12/05/2023] [Indexed: 12/23/2023]
Abstract
Effective disease diagnosis is a critical unmet need on a global scale. The intricacies of the numerous disease mechanisms and underlying symptoms make developing a model for early diagnosis and effective treatment extremely difficult. Machine learning (ML) can help to solve some of these issues. Recently, various ensemble-based ML models have benefited clinicians in early diagnosis. However, one of the most difficult challenges in multi-level ensemble approaches is selecting the classifiers and placing them within the ensemble framework, because good choices improve the overall performance. Selecting $m$ classifiers from $n$ candidates can be done in $\binom{n}{m}$ ways, and each selection can be arranged in $m!$ ways. Finding the best $m$ classifiers and their positions among the total $\binom{n}{m}\,m!$ possibilities is a hard problem. To address this challenge, a dynamic three-level ensemble framework is proposed. A nested Genetic Algorithm (GA) with an ensemble-based fitness function is employed to optimize classifier selection and placement in the three-level ensemble framework. Our approach started from eleven classifiers and chose seven of them by maximizing the fitness function. The proposed model was evaluated on 12 disease datasets. On the Chronic Kidney Disease (CKD) dataset, it achieved an accuracy, F1, and G-measure of 0.987, 0.988, and 0.989, respectively. It achieved an AUC of 0.998 on the Heart disease dataset (HDD) and a recall of 0.988 on the Hypothyroid disease dataset (HyDD).
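To give a sense of scale, the search-space count described above can be evaluated for the eleven candidate classifiers and seven selected classifiers mentioned in the abstract. The following is a minimal sketch, not the authors' code; the function name `search_space_size` is a hypothetical helper, and the count assumes only what the abstract states (choose $m$ of $n$, then order the $m$ chosen classifiers).

```python
import math


def search_space_size(n: int, m: int) -> int:
    """Ways to choose m of n classifiers and then order the chosen m.

    Hypothetical helper: implements the abstract's count C(n, m) * m!.
    """
    return math.comb(n, m) * math.factorial(m)


n_candidates, n_selected = 11, 7  # values taken from the abstract

selections = math.comb(n_candidates, n_selected)     # 330 ways to choose
orderings = math.factorial(n_selected)               # 5040 ways to arrange
total = search_space_size(n_candidates, n_selected)  # 330 * 5040

print(selections, orderings, total)  # 330 5040 1663200
```

With roughly 1.66 million selection-and-placement combinations, training and scoring an ensemble for every candidate would be expensive, which is consistent with the abstract's motivation for a heuristic search such as a GA.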
In addition, the superiority of the proposed model was statistically evaluated with the Wilcoxon signed-rank (WSR) test against other ensemble models: random forest (RF), bagging classifier (BC), XGBoost (XGB), and gradient boost classifier (GBC). At a probability value of p < 0.05, the results show that each of these traditional ensemble models differs significantly from the proposed model. The effect size was also evaluated using the matched-pairs rank biserial correlation coefficient (wc); it is large against RF and BC and medium against XGB and GBC. The proposed model also outperformed State-Of-The-Art (SOTA) ensemble and non-ensemble models. Further, it outperformed the compared models in terms of the ROC curve on the majority of the disease datasets. The results suggest the usage of the proposed model for disease diagnosis applications.
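The effect-size measure named above can be illustrated with a short sketch. It assumes the common "simple difference" definition of the matched-pairs rank biserial correlation, (W+ − W−) / (W+ + W−), where W+ and W− are the signed-rank sums of the positive and negative paired differences; the abstract does not state which formula the authors used. The function name `rank_biserial` and the input numbers are hypothetical toy values, not results from the paper.

```python
def rank_biserial(x, y):
    """Matched-pairs rank biserial correlation (assumed definition).

    Ranks the absolute paired differences (average ranks for ties,
    zero differences dropped) and returns (W+ - W-) / (W+ + W-).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    if not diffs:
        raise ValueError("all paired differences are zero")

    # Indices of the differences sorted by absolute value.
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        # Find the run of tied absolute differences starting at i.
        j = i
        while (j + 1 < len(order)
               and abs(diffs[order[j + 1]]) == abs(diffs[order[i]])):
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    w_pos = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_neg = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return (w_pos - w_neg) / (w_pos + w_neg)


# Toy paired scores (made up for illustration only).
model_a = [3, 1, 4]
model_b = [1, 2, 1]
effect = rank_biserial(model_a, model_b)  # differences: 2, -1, 3
print(round(effect, 3))  # 0.667
```

A value near +1 means the first model wins almost every pair by rank, a value near −1 means the second does, and a value near 0 means no consistent direction.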