1
Zhao Z, Guo S, Han L, Wu L, Zhang Y, Yan B. Altruistic seagull optimization algorithm enables selection of radiomic features for predicting benign and malignant pulmonary nodules. Comput Biol Med 2024; 180:108996. [PMID: 39137669 DOI: 10.1016/j.compbiomed.2024.108996] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2024] [Revised: 05/22/2024] [Accepted: 08/02/2024] [Indexed: 08/15/2024]
Abstract
Accurately differentiating indeterminate pulmonary nodules remains a significant challenge in clinical practice. This challenge becomes increasingly formidable when dealing with the vast number of radiomic features obtained from low-dose computed tomography, a lung cancer screening technique being rolled out in many areas of the world. Consequently, this study proposed the Altruistic Seagull Optimization Algorithm (AltSOA) for the selection of radiomic features in predicting the malignancy risk of pulmonary nodules. This approach incorporated altruism into the traditional seagull optimization algorithm to seek a globally optimal solution. A multi-objective fitness function was designed for training the pulmonary nodule prediction model, aiming to use fewer radiomic features while maintaining prediction performance. Among global radiomic features, the AltSOA identified 11 features of interest, including those from the gray level co-occurrence matrix. This automatically selected panel of radiomic features enabled precise prediction (area under the curve = 0.8383; 95% confidence interval 0.7862-0.8863) of the malignancy risk of pulmonary nodules, surpassing the performance of radiologists. Furthermore, the interpretability, clinical utility, and generalizability of the pulmonary nodule prediction model were thoroughly discussed. All results consistently underscore the superiority of the AltSOA in predicting the malignancy risk of pulmonary nodules. The proposed malignancy risk prediction model for pulmonary nodules holds promise for enhancing existing lung cancer screening methods. The supporting source code of this work can be found at: https://github.com/zzl2022/PBMPN.
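The abstract does not give the form of the multi-objective fitness function. A common way to trade prediction error against subset size in wrapper feature selection is a weighted sum, sketched below. The function name `fitness`, the weight `alpha`, and its default value are illustrative assumptions, not the authors' formulation.

```python
def fitness(error_rate, n_selected, n_total, alpha=0.99):
    """Weighted-sum fitness for wrapper feature selection (lower is better).

    Assumed form, not taken from the paper:
        alpha * error_rate + (1 - alpha) * (n_selected / n_total)
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if n_selected == 0:
        # An empty feature subset cannot train a classifier.
        return float("inf")
    return alpha * error_rate + (1.0 - alpha) * (n_selected / n_total)


# Two candidate subsets with the same error: the smaller one scores better.
small = fitness(error_rate=0.2, n_selected=11, n_total=100)
large = fitness(error_rate=0.2, n_selected=40, n_total=100)
print(small < large)
```

With `alpha` close to 1 the error term dominates, so subset size mainly acts as a tie-breaker between equally accurate candidates.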
Affiliation(s)
- Zhilei Zhao
  - National Key Lab of Autonomous Intelligent Unmanned Systems, School of Automation, Beijing Institute of Technology, Beijing, 100081, China.
- Shuli Guo
  - National Key Lab of Autonomous Intelligent Unmanned Systems, School of Automation, Beijing Institute of Technology, Beijing, 100081, China.
- Lina Han
  - Department of Cardiology, The Second Medical Center, Chinese PLA General Hospital, Beijing, 100853, China.
- Lei Wu
  - National Key Lab of Autonomous Intelligent Unmanned Systems, School of Automation, Beijing Institute of Technology, Beijing, 100081, China.
- Yating Zhang
  - National Key Lab of Autonomous Intelligent Unmanned Systems, School of Automation, Beijing Institute of Technology, Beijing, 100081, China.
- Biyu Yan
  - National Key Lab of Autonomous Intelligent Unmanned Systems, School of Automation, Beijing Institute of Technology, Beijing, 100081, China.
2
Longkumer I, Mazumder DH. A novel parallel feature rank aggregation algorithm for gene selection applied to microarray data classification. Comput Biol Chem 2024; 112:108182. [PMID: 39197395 DOI: 10.1016/j.compbiolchem.2024.108182] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2024] [Revised: 07/07/2024] [Accepted: 08/22/2024] [Indexed: 09/01/2024]
Abstract
Microarray data often comprise numerous genes, yet not all genes are relevant for predicting cancer. Feature selection therefore becomes a crucial step in reducing the high dimensionality of such data. While no single feature selection method consistently outperforms others across diverse domains, combining multiple feature selectors or rankers tends to produce more effective results than relying on a single ranker. However, this approach can be computationally expensive, particularly when handling a large number of features. Hence, this paper presents a parallel feature rank aggregation method that uses the Borda count as the rank aggregator. The concept of vertically partitioning the data along the feature space was adapted to ease the parallel execution of the aggregation task. Features were selected based on the final aggregated rank list, and their classification performance was evaluated. The model's execution time was also observed across multiple worker nodes of the cluster. The experiment was conducted on six benchmark microarray datasets. The results show the capability of the proposed distributed framework compared to the sequential version in all cases. They also illustrate the improved accuracy of the proposed method and its ability to select a minimal number of genes.
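To illustrate the aggregation step, the sketch below applies a plain, sequential Borda count to several best-first feature rankings. It is not the paper's parallel implementation: the function name `borda_aggregate`, the alphabetical tie-break, and the toy gene names are assumptions made for this example.

```python
from collections import defaultdict


def borda_aggregate(rankings):
    """Aggregate best-first rankings of the same features with the Borda count.

    In a ranking of n features, the feature at position i (0-based) earns
    n - 1 - i points. Features are returned by descending total score,
    with ties broken alphabetically (an arbitrary choice for this sketch).
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, feature in enumerate(ranking):
            scores[feature] += n - 1 - position
    return sorted(scores, key=lambda f: (-scores[f], f))


# Three hypothetical rankers ordering three genes, best first.
rankings = [
    ["g1", "g2", "g3"],
    ["g2", "g1", "g3"],
    ["g2", "g3", "g1"],
]
print(borda_aggregate(rankings))  # ['g2', 'g1', 'g3']
```

Each feature's total is a sum of per-ranker points, so the scoring of different features is independent once the individual rankings are known. The abstract does not say how the paper distributes this work across worker nodes.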
Affiliation(s)
- Imtisenla Longkumer
  - National Institute of Technology Nagaland, Chumukedima, Dimapur, Nagaland 797103, India
3
Nguyen TM, Vo HHP, Yoo M. Enhancing Intrusion Detection in Wireless Sensor Networks Using a GSWO-CatBoost Approach. Sensors (Basel) 2024; 24:3339. [PMID: 38894128 PMCID: PMC11175018 DOI: 10.3390/s24113339] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2024] [Revised: 05/16/2024] [Accepted: 05/20/2024] [Indexed: 06/21/2024]
Abstract
Intrusion detection systems (IDSs) in wireless sensor networks (WSNs) rely heavily on effective feature selection (FS) for enhanced efficacy. This study proposes a novel approach called Genetic Sacrificial Whale Optimization (GSWO) to address the limitations of conventional methods. GSWO combines a genetic algorithm (GA) and whale optimization algorithms (WOA) modified by applying a new three-population division strategy with a proposed conditional inherited choice (CIC) to overcome premature convergence in WOA. The proposed approach achieves a balance between exploration and exploitation and enhances global search abilities. Additionally, the CatBoost model is employed for classification, effectively handling categorical data with complex patterns. A new technique for fine-tuning CatBoost's hyperparameters is introduced, using effective quantization and the GSWO strategy. Extensive experimentation on various datasets demonstrates the superiority of GSWO-CatBoost, achieving higher accuracy rates on the WSN-DS, WSNBFSF, NSL-KDD, and CICIDS2017 datasets than the existing approaches. The comprehensive evaluations highlight the real-time applicability and accuracy of the proposed method across diverse data sources, including specialized WSN datasets and established benchmarks. Specifically, our GSWO-CatBoost method has an inference time nearly 100 times faster than deep learning methods while achieving high accuracy rates of 99.65%, 99.99%, 99.76%, and 99.74% for WSN-DS, WSNBFSF, NSL-KDD, and CICIDS2017, respectively.
Affiliation(s)
- Thuan Minh Nguyen
  - Department of Electronic Engineering, Soongsil University, Seoul 06978, Republic of Korea
- Hanh Hong-Phuc Vo
  - Department of Electronic Engineering, Soongsil University, Seoul 06978, Republic of Korea
- Myungsik Yoo
  - School of Electronic Engineering, Soongsil University, Seoul 06978, Republic of Korea
4
Zhang M, Yan K, Chen Y, Yu R. Anticipating interpersonal sensitivity: A predictive model for early intervention in psychological disorders in college students. Comput Biol Med 2024; 172:108134. [PMID: 38492456 DOI: 10.1016/j.compbiomed.2024.108134] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2023] [Revised: 01/17/2024] [Accepted: 02/12/2024] [Indexed: 03/18/2024]
Abstract
Psychological disorders, notably social anxiety and depression, exert detrimental effects on university students, impeding academic achievement and overall development. Timely identification of interpersonal sensitivity is therefore imperative for implementing targeted support and interventions. This study selected 958 freshmen from higher education institutions in Zhejiang province as the research sample. Using moth-flame optimization (MFO) enhanced with Runge-Kutta search and elite Lévy spreading, in conjunction with the kernel extreme learning machine (KELM), we propose an efficient intelligent prediction model, namely bREMFO-KELM, for predicting the interpersonal sensitivity of college students. IEEE CEC 2017 benchmark functions and the interpersonal sensitivity dataset were employed as the basis for detailed comparisons with peer-reviewed studies and well-known machine learning models. The experimental results demonstrate the outstanding performance of the bREMFO-KELM model in predicting the sensitivity of interpersonal relationships in college students, achieving an accuracy of 97.186%. In-depth analysis reveals that the prediction of interpersonal sensitivity in college students is closely associated with multiple features, including easily hurt in relationships, shy and uneasy with the opposite sex, feeling inferior to others, discomfort when observed or discussed, and blame and criticize others. These features are not only crucial for the accuracy of the prediction model but also provide valuable information for a deeper understanding of the sensitivity of college students' interpersonal relationships. In conclusion, the bREMFO-KELM model not only performs well but is also highly interpretable, providing robust support for predicting the sensitivity of interpersonal relationships in college students.
Affiliation(s)
- Min Zhang
  - Department of Student Affairs, Wenzhou University, Wenzhou, 325035, China.
- Kailei Yan
  - Key Laboratory of Intelligent Informatics for Safety & Emergency of Zhejiang Province, Wenzhou University, Wenzhou, 325035, China.
- Yufeng Chen
  - Key Laboratory of Intelligent Informatics for Safety & Emergency of Zhejiang Province, Wenzhou University, Wenzhou, 325035, China.
- Ruying Yu
  - Mental Health Education Center, Wenzhou University, Wenzhou, 325035, China.
5
Yang G, Li W, Xie W, Wang L, Yu K. An improved binary particle swarm optimization algorithm for clinical cancer biomarker identification in microarray data. Comput Methods Programs Biomed 2024; 244:107987. [PMID: 38157825 DOI: 10.1016/j.cmpb.2023.107987] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Revised: 11/04/2023] [Accepted: 12/16/2023] [Indexed: 01/03/2024]
Abstract
BACKGROUND AND OBJECTIVE: The limited number of samples and high-dimensional features in microarray data make selecting a small number of features for disease diagnosis a challenging problem. Traditional feature selection methods based on evolutionary algorithms struggle to find the optimal feature set in limited time when dealing with high-dimensional feature selection problems. This paper proposes a new solution to these problems. METHODS: We propose a hybrid feature selection method (C-IFBPFE) for biomarker identification in microarray data, which combines clustering and improved binary particle swarm optimization while incorporating an embedded feature elimination strategy. First, an adaptive redundant feature judgment method based on correlation clustering is proposed for feature screening to reduce the search space in the subsequent stage. Second, we propose an improved flipping probability-based binary particle swarm optimization (IFBPSO), which is better suited to the binary particle swarm optimization problem. Finally, we design a new feature elimination (FE) strategy embedded in the binary particle swarm optimization algorithm. This strategy gradually removes poorer features during iterations to reduce the number of features and improve accuracy. RESULTS: We compared C-IFBPFE with other published hybrid feature selection methods on eight public datasets and analyzed the impact of each improvement. The proposed method outperforms other current state-of-the-art feature selection methods in terms of accuracy, number of features, sensitivity, and specificity. The ablation study validates the efficacy of each component; in particular, the proposed feature elimination strategy significantly improves the performance of the algorithm. CONCLUSIONS: The hybrid feature selection method proposed in this paper helps address the issue of high-dimensional microarray data with few samples. It can select a small subset of features and achieve high classification accuracy on microarray datasets. Additionally, independent validation of the selected features shows that those chosen by C-IFBPFE have strong correlations with disease phenotypes and can identify important biomarkers from data related to biomedical problems.
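For readers unfamiliar with binary particle swarm optimization, the sketch below shows one update step of the classic sigmoid-transfer variant, in which each bit marks a feature as selected (1) or dropped (0). This is the textbook baseline, not the paper's IFBPSO or its feature elimination strategy, whose details are not given in the abstract. The function name and the parameter values `w`, `c1`, `c2`, and `v_max` are assumptions.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def update_particle(position, velocity, personal_best, global_best, rng,
                    w=0.7, c1=1.5, c2=1.5, v_max=4.0):
    """One step of classic binary PSO with a sigmoid transfer function.

    position, personal_best, global_best: lists of 0/1 feature masks.
    velocity: list of floats, one per feature.
    Returns the new (position, velocity) pair.
    """
    new_position, new_velocity = [], []
    for x, v, p, g in zip(position, velocity, personal_best, global_best):
        # Standard velocity update pulled toward personal and global bests.
        v_new = w * v + c1 * rng.random() * (p - x) + c2 * rng.random() * (g - x)
        # Clamp so the sigmoid does not saturate at exactly 0 or 1.
        v_new = max(-v_max, min(v_max, v_new))
        new_velocity.append(v_new)
        # The sigmoid of the velocity is the probability that the bit is 1.
        new_position.append(1 if rng.random() < sigmoid(v_new) else 0)
    return new_position, new_velocity


rng = random.Random(0)
pos, vel = update_particle([0, 1, 0, 1], [0.0] * 4, [1, 1, 0, 0], [1, 0, 0, 1], rng)
print(pos, vel)
```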
Affiliation(s)
- Guicheng Yang
  - College of Computer Science and Engineering, Northeastern University, Shenyang, 110000, Liaoning, China.
- Wei Li
  - Key Laboratory of Intelligent Computing in Medical Image (MIIC), Northeastern University, Ministry of Education, Shenyang, 110000, Liaoning, China; National Frontiers Science Center for Industrial Intelligence and Systems Optimization, Shenyang, 110819, Liaoning, China.
- Weidong Xie
  - College of Computer Science and Engineering, Northeastern University, Shenyang, 110000, Liaoning, China.
- Linjie Wang
  - College of Computer Science and Engineering, Northeastern University, Shenyang, 110000, Liaoning, China.
- Kun Yu
  - College of Medicine and Bioinformation Engineering, Northeastern University, Shenyang, 110819, Liaoning, China.
6
Rakhshaninejad M, Fathian M, Shirkoohi R, Barzinpour F, Gandomi AH. Refining breast cancer biomarker discovery and drug targeting through an advanced data-driven approach. BMC Bioinformatics 2024; 25:33. [PMID: 38253993 PMCID: PMC10810249 DOI: 10.1186/s12859-024-05657-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2023] [Accepted: 01/15/2024] [Indexed: 01/24/2024] [Open access]
Abstract
Breast cancer remains a major public health challenge worldwide. The identification of accurate biomarkers is critical for the early detection and effective treatment of breast cancer. This study utilizes an integrative machine learning approach to analyze breast cancer gene expression data for superior biomarker and drug target discovery. Gene expression datasets, obtained from the GEO database, were merged post-preprocessing. From the merged dataset, differential expression analysis between breast cancer and normal samples revealed 164 differentially expressed genes. Meanwhile, a separate gene expression dataset revealed 350 differentially expressed genes. Additionally, the BGWO_SA_Ens algorithm, integrating binary grey wolf optimization and simulated annealing with an ensemble classifier, was employed on gene expression datasets to identify predictive genes including TOP2A, AKR1C3, EZH2, MMP1, EDNRB, S100B, and SPP1. From over 10,000 genes, BGWO_SA_Ens identified 1404 in the merged dataset (F1 score: 0.981, PR-AUC: 0.998, ROC-AUC: 0.995) and 1710 in the GSE45827 dataset (F1 score: 0.965, PR-AUC: 0.986, ROC-AUC: 0.972). The intersection of DEGs and BGWO_SA_Ens selected genes revealed 35 superior genes that were consistently significant across methods. Enrichment analyses uncovered the involvement of these superior genes in key pathways such as AMPK, Adipocytokine, and PPAR signaling. Protein-protein interaction network analysis highlighted subnetworks and central nodes. Finally, a drug-gene interaction investigation revealed connections between superior genes and anticancer drugs. Collectively, the machine learning workflow identified a robust gene signature for breast cancer, illuminated their biological roles, interactions and therapeutic associations, and underscored the potential of computational approaches in biomarker discovery and precision oncology.
Affiliation(s)
- Morteza Rakhshaninejad
  - Industrial Engineering Department, Iran University of Science and Technology, Hengam Street, Tehran, 1684613114, Tehran, Iran
- Mohammad Fathian
  - Industrial Engineering Department, Iran University of Science and Technology, Hengam Street, Tehran, 1684613114, Tehran, Iran.
- Reza Shirkoohi
  - Cancer Biology Research Center, Cancer Institute, Imam Khomeini Hospital Complex, Tehran University of Medical Sciences, Keshavarz Boulevard, Tehran, 1419733141, Tehran, Iran
- Farnaz Barzinpour
  - Industrial Engineering Department, Iran University of Science and Technology, Hengam Street, Tehran, 1684613114, Tehran, Iran
- Amir H Gandomi
  - Faculty of Engineering and Information Technology, University of Technology Sydney, Ultimo, 2007, NSW, Australia
  - University Research and Innovation Center (EKIK), Óbuda University, Budapest, 1034, Hungary
7
Barrera-García J, Cisternas-Caneo F, Crawford B, Gómez Sánchez M, Soto R. Feature Selection Problem and Metaheuristics: A Systematic Literature Review about Its Formulation, Evaluation and Applications. Biomimetics (Basel) 2023; 9:9. [PMID: 38248583 PMCID: PMC10813816 DOI: 10.3390/biomimetics9010009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/25/2023] [Revised: 12/16/2023] [Accepted: 12/18/2023] [Indexed: 01/23/2024] [Open access]
Abstract
Feature selection is becoming a relevant problem within the field of machine learning. The feature selection problem focuses on the selection of the small, necessary, and sufficient subset of features that represent the general set of features, eliminating redundant and irrelevant information. Given the importance of the topic, in recent years there has been a boom in the study of the problem, generating a large number of related investigations. Given this, this work analyzes 161 articles published between 2019 and 2023 (20 April 2023), emphasizing the formulation of the problem and performance measures, and proposing classifications for the objective functions and evaluation metrics. Furthermore, an in-depth description and analysis of metaheuristics, benchmark datasets, and practical real-world applications are presented. Finally, in light of recent advances, this review paper provides future research opportunities.
Affiliation(s)
- José Barrera-García
  - Escuela de Ingeniería Informática, Pontificia Universidad Católica de Valparaíso, Avenida Brasil 2241, Valparaíso 2362807, Chile
- Felipe Cisternas-Caneo
  - Escuela de Ingeniería Informática, Pontificia Universidad Católica de Valparaíso, Avenida Brasil 2241, Valparaíso 2362807, Chile
- Broderick Crawford
  - Escuela de Ingeniería Informática, Pontificia Universidad Católica de Valparaíso, Avenida Brasil 2241, Valparaíso 2362807, Chile
- Mariam Gómez Sánchez
  - Departamento de Electrotecnia e Informática, Universidad Técnica Federico Santa María, Federico Santa María 6090, Viña del Mar 2520000, Chile
- Ricardo Soto
  - Escuela de Ingeniería Informática, Pontificia Universidad Católica de Valparaíso, Avenida Brasil 2241, Valparaíso 2362807, Chile
8
Osama S, Ali M, Ali AA, Shaban H. Gene selection and tumor identification based on a hybrid of the multi-filter embedded recursive mountain gazelle algorithm. Comput Biol Med 2023; 167:107674. [PMID: 37976816 DOI: 10.1016/j.compbiomed.2023.107674] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2023] [Revised: 10/09/2023] [Accepted: 11/06/2023] [Indexed: 11/19/2023]
Abstract
Microarray gene expression data are useful for identifying gene expression patterns associated with cancer outcomes; however, their high dimensionality makes it difficult to extract meaningful information and accurately classify tumors. Hence, developing effective methods for reducing dimensionality while preserving relevant information is a crucial task. Hybrid gene selection methods are widely proposed in the gene expression analysis domain and can still be enhanced in terms of efficiency and reliability. This study proposes a new hybrid gene selection method, called multi-filter embedded mountain gazelle optimizer (MUL-MGO), which uses two filters and an embedded method to remove irrelevant genes and then selects the most relevant genes using the recently developed MGO algorithm. To the best of our knowledge, this is the first work to exploit MGO as a gene or feature selection method. A new version of MGO, called recursive mountain gazelle optimizer (RMGO), is also developed; it applies the MGO algorithm recursively to avoid local optima, shrink the search space, and obtain the minimum gene count without decreasing the classifier's performance. The proposed RMGO is used to develop a new hybrid gene selection method that employs the same kind of filters and embedded methods as MUL-MGO, but with the recursive version of the MGO algorithm. The resulting method is called multi-filter embedded recursive mountain gazelle optimizer (MUL-RMGO). Several classifiers are used for cancer classification, and experiments are performed on eight microarray gene expression datasets to demonstrate the proficiency of the MUL-MGO and MUL-RMGO methods. The experimental findings indicate the efficiency and productivity of the suggested MUL-MGO and MUL-RMGO methods for gene selection. The methods outperform cutting-edge methods in the literature, with MUL-RMGO exceeding MUL-MGO in terms of accuracy and selected gene count.
Affiliation(s)
- Sarah Osama
  - Computer Science Department, Faculty of Computers and Information, Minia University, Minia, Egypt.
- Moatez Ali
  - Department of Internal Medicine, St. Barnabas Hospital, NY, USA.
- Abdelmgeid A Ali
  - Computer Science Department, Faculty of Computers and Information, Minia University, Minia, Egypt.
- Hassan Shaban
  - Computer Science Department, Faculty of Computers and Information, Minia University, Minia, Egypt.
9
Zengin HY, Karabulut E. Biomarker detection using corrected degree of domesticity in hybrid social network feature selection for improving classifier performance. BMC Bioinformatics 2023; 24:407. [PMID: 37904081 PMCID: PMC10617059 DOI: 10.1186/s12859-023-05540-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2023] [Accepted: 10/20/2023] [Indexed: 11/01/2023] [Open access]
Abstract
BACKGROUND Dimension reduction, especially feature selection, is an important step in improving classification performance for high-dimensional data. Particularly in cancer research, when reducing the number of features, i.e., genes, it is important to select the most informative features/potential biomarkers that could affect the diagnostic accuracy. Therefore, researchers continuously try to explore more efficient ways to reduce the large number of features/genes to a small but informative subset before the classification task. Hybrid methods have been extensively investigated for this purpose, and research to find the optimal approach is ongoing. Social network analysis is used as a part of a hybrid method, although there are several issues that have arisen when using social network tools, such as using a single environment for computing, constructing an adjacency matrix or computing network measures. Therefore, in our study, we apply a hybrid feature selection method consisting of several machine learning algorithms in addition to social network analysis with our proposed network metric, called the corrected degree of domesticity, in a single environment, R, to improve the support vector machine classifier's performance. In addition, we evaluate and compare the performances of several combinations used in the different steps of the method with a simulation experiment. RESULTS The proposed method improves the classifier's performance compared to using the whole feature set in all the cases we investigate. Additionally, in terms of the area under the receiver operating characteristic (ROC) curve, our approach improves classification performance compared to several approaches in the literature. CONCLUSION When using the corrected degree of domesticity as a network degree centrality measure, it is important to use our correction to compare nodes/features with no connection outside of their community since it provides a more accurate ranking among the features. 
Due to the nature of the hybrid method, which includes social network analysis, it is necessary to investigate possible combinations to provide an optimal solution for the microarray data used in the research.
Affiliation(s)
- Hatice Yağmur Zengin
  - Department of Biostatistics, Hacettepe University Faculty of Medicine, Sıhhiye, 06230, Ankara, Türkiye.
- Erdem Karabulut
  - Department of Biostatistics, Hacettepe University Faculty of Medicine, Sıhhiye, 06230, Ankara, Türkiye
10
Yu X, Qin W, Lin X, Shan Z, Huang L, Shao Q, Wang L, Chen M. Synergizing the enhanced RIME with fuzzy K-nearest neighbor for diagnose of pulmonary hypertension. Comput Biol Med 2023; 165:107408. [PMID: 37672924 DOI: 10.1016/j.compbiomed.2023.107408] [Citation(s) in RCA: 18] [Impact Index Per Article: 18.0] [Received: 07/19/2023] [Revised: 08/19/2023] [Accepted: 08/27/2023] [Indexed: 09/08/2023]
Abstract
Pulmonary hypertension (PH) is an uncommon yet severe condition characterized by sustained elevation of blood pressure in the pulmonary arteries. Delayed treatment can result in disease progression, right ventricular failure, increased risk of complications, and even death. Early recognition and timely treatment are crucial in halting PH progression, improving cardiac function, and reducing complications. In this study, we present a promising hybrid model, bERIME_FKNN, a feature selection approach that integrates the enhanced rime algorithm (ERIME) with the fuzzy K-nearest neighbor (FKNN) technique. ERIME introduces a triangular game search strategy, which strengthens the algorithm's global exploration by selecting distinct search agents across the search space and fosters both competition and cooperation among these agents. In addition, a random follower search strategy is incorporated to give the principal search agent a new trajectory, thereby diversifying the search directions. First, ERIME is compared with 11 state-of-the-art algorithms on the IEEE CEC2017 benchmark functions at dimensionalities of 10, 30, 50, and 100, validating its optimization capability within the model. Subsequently, using color moment and grayscale co-occurrence matrix methods, a total of 118 features are extracted from the images of 63 PH patients and 60 healthy individuals, and 14,514 recordings obtained from these patients are analyzed with the developed bERIME_FKNN model. The results show that the bERIME_FKNN model performs strongly in PH classification, attaining accuracy and specificity exceeding 99%. This implies that the model can serve as a valuable computer-aided tool, providing an early warning system for the diagnosis and prognosis evaluation of PH.
Affiliation(s)
- Xiaoming Yu
  - Department of Pulmonary and Critical Care Medicine, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, 325000, Zhejiang, China.
- Wenxiang Qin
  - The First School of Medicine, School of Information and Engineering, Wenzhou Medical University, Wenzhou, 325000, Zhejiang, China.
- Xiao Lin
  - Department of Pulmonary and Critical Care Medicine, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, 325000, Zhejiang, China.
- Zhuohan Shan
  - The First School of Medicine, School of Information and Engineering, Wenzhou Medical University, Wenzhou, 325000, Zhejiang, China.
- Liyao Huang
  - Key Laboratory of Intelligent Informatics for Safety & Emergency of Zhejiang Province, Wenzhou University, Wenzhou, 325035, China.
- Qike Shao
  - Key Laboratory of Intelligent Informatics for Safety & Emergency of Zhejiang Province, Wenzhou University, Wenzhou, 325035, China.
- Liangxing Wang
  - Department of Pulmonary and Critical Care Medicine, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, 325000, Zhejiang, China.
- Mayun Chen
  - Department of Pulmonary and Critical Care Medicine, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, 325000, Zhejiang, China.
11
Cavallaro C, Cutello V, Pavone M, Zito F. Discovering anomalies in big data: a review focused on the application of metaheuristics and machine learning techniques. Front Big Data 2023; 6:1179625. [PMID: 37663272 PMCID: PMC10470118 DOI: 10.3389/fdata.2023.1179625] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2023] [Accepted: 07/24/2023] [Indexed: 09/05/2023] [Open access]
Abstract
With the increase in available data from computer systems and their security threats, interest in anomaly detection has increased as well in recent years. The need to diagnose faults and cyberattacks has also focused scientific research on the automated classification of outliers in big data, as manual labeling is difficult in practice due to their huge volumes. The results obtained from data analysis can be used to generate alarms that anticipate anomalies and thus prevent system failures and attacks. Therefore, anomaly detection has the purpose of reducing maintenance costs as well as making decisions based on reports. During the last decade, the approaches proposed in the literature to classify unknown anomalies in log analysis, process analysis, and time series have been mainly based on machine learning and deep learning techniques. In this study, we provide an overview of current state-of-the-art methodologies, highlighting their advantages and disadvantages and the new challenges. In particular, we will see that there is no absolute best method, i.e., for any given dataset a different method may achieve the best result. Finally, we describe how the use of metaheuristics within machine learning algorithms makes it possible to have more robust and efficient tools.
Affiliation(s)
- Claudia Cavallaro
  - Department of Mathematics and Computer Science, University of Catania, Catania, Italy
12
Fu Q, Li Q, Li X. An improved multi-objective marine predator algorithm for gene selection in classification of cancer microarray data. Comput Biol Med 2023; 160:107020. [PMID: 37196457 DOI: 10.1016/j.compbiomed.2023.107020] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 01/15/2023] [Revised: 04/09/2023] [Accepted: 05/05/2023] [Indexed: 05/19/2023]
Abstract
Gene selection (GS) is an important branch of feature selection that is widely used in cancer classification. It provides essential insights into the pathogenesis of cancer and enables a deeper understanding of cancer data. In cancer classification, GS is essentially a multi-objective optimization problem that aims to simultaneously optimize two objectives: classification accuracy and the size of the gene subset. The marine predator algorithm (MPA) has been successfully employed in practical applications; however, its random initialization can lead to blind search, which may adversely affect the convergence of the algorithm. Furthermore, the elite individuals that guide evolution are randomly chosen from the Pareto solutions, which may degrade the exploration performance of the population. To overcome these limitations, a multi-objective improved MPA with continuous mapping initialization and leader selection strategies is proposed. In this work, a new continuous mapping initialization with ReliefF overcomes the defects associated with less information in late evolution. Moreover, an improved elite selection mechanism with a Gaussian distribution guides the population to evolve towards a better Pareto front. Finally, an efficient mutation method is adopted to prevent evolutionary stagnation. To evaluate its effectiveness, the proposed algorithm was compared with nine well-known algorithms. The experimental results on 16 datasets demonstrate that the proposed algorithm can significantly reduce the data dimension and obtains the highest classification accuracy on most of the high-dimensional cancer microarray datasets.
Affiliation(s)
- Qiyong Fu
- School of Computer Science and Technology, Zhejiang Normal University, Jinhua 321004, China
- Qi Li
- School of Computer Science and Technology, Zhejiang Normal University, Jinhua 321004, China
- Xiaobo Li
- School of Computer Science and Technology, Zhejiang Normal University, Jinhua 321004, China.
|
13
|
Vahabzadeh V, Moattar MH. Robust microarray data feature selection using a correntropy based distance metric learning approach. Comput Biol Med 2023; 161:107056. [PMID: 37235945 DOI: 10.1016/j.compbiomed.2023.107056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2022] [Revised: 04/18/2023] [Accepted: 05/20/2023] [Indexed: 05/28/2023]
Abstract
Classification of high-dimensional microarray data is a challenge in bioinformatics and genetic data processing. One of the challenging issues of feature selection is the presence of outliers. The Euclidean distance metric is sensitive to outliers. In this study, a distance metric learning based feature selection approach that uses the correntropy function as the discrimination metric is proposed. For this purpose, the metric learning problem is formulated as an optimization problem and solved using the Lagrange method. The output of the approach signifies the most important and robust features. After feature selection, different classification methods such as SVM, decision trees, and NN classifiers are used to investigate the classification accuracy of the proposed method as well as precision, recall, and F-measure. Experiments are carried out on 13 high-dimensional datasets and show that the proposed method outperforms the previous models in terms of accuracy and robustness.
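To illustrate why correntropy is less sensitive to outliers than the Euclidean distance, here is a minimal sketch of the usual sample estimate of correntropy with a Gaussian kernel. It is not the authors' metric learning procedure: the function names, the unnormalized kernel, and the bandwidth `sigma=1.0` are assumptions chosen for the example.

```python
import math

def correntropy(x, y, sigma=1.0):
    """Sample correntropy: mean Gaussian kernel of the element-wise differences.

    The kernel is left unnormalized (an assumption), so identical vectors score 1.0.
    """
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    total = sum(math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2)) for a, b in zip(x, y))
    return total / len(x)

def squared_euclidean(x, y):
    """Squared Euclidean distance, shown only for comparison."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

x = [1.0, 2.0, 3.0, 4.0]
y_outlier = [1.0, 2.0, 3.0, 104.0]  # one grossly corrupted coordinate

# The outlier contributes a kernel value near zero, so it can lower the
# correntropy by at most 1/len(x); its squared-distance contribution is unbounded.
similarity = correntropy(x, y_outlier)
distance = squared_euclidean(x, y_outlier)
```

Because each term of the sum is bounded between 0 and 1, a single corrupted coordinate has a capped influence on the similarity, whereas it dominates the squared Euclidean distance.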
Affiliation(s)
- Venus Vahabzadeh
- Department of Software Engineering, Mashhad Branch, Islamic Azad University, Mashhad, Iran.
|
14
|
Pati SK, Gupta MK, Banerjee A, Mallik S, Zhao Z. PPIGCF: A Protein-Protein Interaction-Based Gene Correlation Filter for Optimal Gene Selection. Genes (Basel) 2023; 14:genes14051063. [PMID: 37239423 DOI: 10.3390/genes14051063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2023] [Revised: 04/26/2023] [Accepted: 05/04/2023] [Indexed: 05/28/2023] Open
Abstract
Biological data at the omics level are highly complex, requiring powerful computational approaches to identify significant intrinsic characteristics and to further search for informative markers involved in the studied phenotype. In this paper, we propose a novel dimension reduction technique, protein-protein interaction-based gene correlation filtration (PPIGCF), which builds on gene ontology (GO) and protein-protein interaction (PPI) structures to analyze microarray gene expression data. PPIGCF first extracts the gene symbols with their expression from the experimental dataset and then classifies them based on GO biological process (BP) and cellular component (CC) annotations. Every classification group inherits all the information on its CCs, corresponding to the BPs, to establish a PPI network. Then, the gene correlation filter (regarding gene rank and the proposed correlation coefficient) is computed on every network and removes a few weakly correlated genes connected with their corresponding networks. PPIGCF finds the information content (IC) of the other genes related to the PPI network and takes only the genes with the highest IC values. The satisfactory results of PPIGCF are used to prioritize significant genes. We performed a comparison with current methods to demonstrate our technique's efficiency. From the experiment, it can be concluded that PPIGCF needs fewer genes to reach reasonable accuracy (~99%) for cancer classification. The method reduces the computational complexity and improves the time complexity of biomarker discovery from datasets.
Affiliation(s)
- Soumen Kumar Pati
- Department of Bioinformatics, Maulana Abul Kalam Azad University of Technology, Haringhata 741249, West Bengal, India
- Manan Kumar Gupta
- Department of Bioinformatics, Maulana Abul Kalam Azad University of Technology, Haringhata 741249, West Bengal, India
- Ayan Banerjee
- Department of Computer Science and Engineering, Jalpaiguri Govt. Engineering College, Jalpaiguri 735102, West Bengal, India
- Saurav Mallik
- Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Department of Environmental Health, Harvard T H Chan School of Public Health, Boston, MA 02115, USA
- Department of Pharmacology & Toxicology, University of Arizona, Tucson, AZ 85721, USA
- Zhongming Zhao
- Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
|
15
|
Bhattacharya A, Saha B, Chattopadhyay S, Sarkar R. Deep feature selection using adaptive β-Hill Climbing aided whale optimization algorithm for lung and colon cancer detection. Biomed Signal Process Control 2023. [DOI: 10.1016/j.bspc.2023.104692] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/18/2023]
|
16
|
Blourchi P, Ghasemzadeh A. Majority voting based on different feature ranking techniques from gene expression. Journal of Intelligent & Fuzzy Systems 2023. [DOI: 10.3233/jifs-224029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/03/2023]
Abstract
In bioinformatics studies, many modeling tasks are characterized by high dimensionality, leading to the widespread use of feature selection techniques to reduce dimensionality. There are a multitude of feature selection techniques that have been proposed in the literature, each relying on a single measurement method to select candidate features. This has an impact on the classification performance. To address this issue, we propose a majority voting method that uses five different feature ranking techniques: entropy score, Pearson’s correlation coefficient, Spearman correlation coefficient, Kendall correlation coefficient, and t-test. By using a majority voting approach, only the features that appear in all five ranking methods are selected. This selection process has three key advantages over traditional techniques. Firstly, it is independent of any particular feature ranking method. Secondly, the feature space dimension is significantly reduced compared to other ranking methods. Finally, the performance is improved as the most discriminatory and informative features are selected via the majority voting process. The performance of the proposed method was evaluated using an SVM, and the results were assessed using accuracy, sensitivity, specificity, and AUC on various biomedical datasets. The results demonstrate the superior effectiveness of the proposed method compared to state-of-the-art methods in the literature.
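The voting step described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: each ranking is assumed to be a list of feature indices ordered from best to worst, `top_k` and `min_votes` are hypothetical parameters, and the toy rankings are invented for the example. With the default `min_votes`, a feature is kept only if it appears in the top `top_k` of every ranking, which matches the "appear in all five ranking methods" rule in the abstract.

```python
from collections import Counter

def vote_select(rankings, top_k, min_votes=None):
    """Keep features that appear in the top_k of at least min_votes rankings."""
    if min_votes is None:
        min_votes = len(rankings)  # unanimous agreement across all rankers
    votes = Counter()
    for ranked in rankings.values():
        votes.update(set(ranked[:top_k]))  # one vote per ranker per feature
    return sorted(f for f, v in votes.items() if v >= min_votes)

# Toy rankings of five features (indices 0..4), best first; values are made up.
rankings = {
    "entropy":  [3, 1, 4, 0, 2],
    "pearson":  [1, 3, 0, 4, 2],
    "spearman": [3, 4, 1, 2, 0],
    "kendall":  [1, 3, 4, 2, 0],
    "t_test":   [4, 3, 1, 0, 2],
}

unanimous = vote_select(rankings, top_k=3)              # features in every top 3
relaxed = vote_select(rankings, top_k=3, min_votes=4)   # at least four of five
```

Lowering `min_votes` turns the unanimous rule into an ordinary majority threshold, which trades a smaller feature set for a more permissive one.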
|
17
|
Marjit S, Bhattacharyya T, Chatterjee B, Sarkar R. Simulated annealing aided genetic algorithm for gene selection from microarray data. Comput Biol Med 2023; 158:106854. [PMID: 37023541 DOI: 10.1016/j.compbiomed.2023.106854] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 12/07/2022] [Revised: 02/26/2023] [Accepted: 03/30/2023] [Indexed: 04/03/2023]
Abstract
In recent times, microarray gene expression datasets have gained significant popularity due to their usefulness to identify different types of cancer directly through bio-markers. These datasets possess a high gene-to-sample ratio and high dimensionality, with only a few genes functioning as bio-markers. Consequently, a significant amount of data is redundant, and it is essential to filter out important genes carefully. In this paper, we propose the Simulated Annealing aided Genetic Algorithm (SAGA), a meta-heuristic approach to identify informative genes from high-dimensional datasets. SAGA utilizes a two-way mutation-based Simulated Annealing (SA) as well as Genetic Algorithm (GA) to ensure a good trade-off between exploitation and exploration of the search space, respectively. The naive version of GA often gets stuck in a local optimum and depends on the initial population, leading to premature convergence. To address this, we have blended a clustering-based population generation with SA to distribute the initial population of GA over the entire feature space. To further enhance the performance, we reduce the initial search space by a score-based filter approach called the Mutually Informed Correlation Coefficient (MICC). The proposed method is evaluated on 6 microarray and 6 omics datasets. Comparison of SAGA with contemporary algorithms has shown that SAGA performs much better than its peers. Our code is available at https://github.com/shyammarjit/SAGA.
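For readers unfamiliar with simulated annealing, the sketch below shows the standard Metropolis acceptance rule that lets the search escape local optima. It is a generic textbook rule, not the two-way mutation scheme of SAGA; the function name `accept_move` and its parameters are assumptions made for the example, and the fitness is assumed to be minimized.

```python
import math
import random

def accept_move(delta, temperature, rng):
    """Metropolis rule for a minimization problem.

    delta is (candidate fitness - current fitness). Improvements are always
    accepted; worse candidates are accepted with probability exp(-delta / T).
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if delta <= 0:
        return True
    return rng.random() < math.exp(-delta / temperature)

rng = random.Random(0)
improving = accept_move(-0.05, temperature=1.0, rng=rng)   # better candidate
hopeless = accept_move(1e9, temperature=1e-9, rng=rng)     # far worse, cold system
```

At high temperature most worsening moves pass, which favors exploration; as the temperature is lowered the rule approaches greedy acceptance.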
|
18
|
Devi SS, Prithiviraj K. Breast Cancer Classification With Microarray Gene Expression Data Based on Improved Whale Optimization Algorithm. International Journal of Swarm Intelligence Research 2023. [DOI: 10.4018/ijsir.317091] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/05/2023]
Abstract
Breast cancer is one of the most common and dangerous cancer types in women worldwide. Since it is generally a genetic disease, microarray technology-based cancer prediction is technically significant among the many diagnosis methods. Microarray gene expression data contain few samples with many redundant and noisy genes, which leads to inaccurate diagnosis and low prediction accuracy. To overcome these difficulties, this paper proposes an Improved Whale Optimization Algorithm (IWOA) for wrapper-based feature selection in gene expression data. The proposed IWOA incorporates modified crossover and mutation operations to enhance the exploration and exploitation of the classical WOA. It adopts a multi-objective fitness function that balances minimization of the error rate against the number of selected features. The experimental analysis demonstrated that the proposed IWOA with a Gradient Boost Classifier (GBC) achieves a high classification accuracy of 97.7% with a minimal subset of features and also converges quickly on the breast cancer dataset.
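The abstract does not state the fitness formula. A common scalarization in wrapper feature selection, assumed here rather than taken from the paper, is a weighted sum of the classifier error rate and the fraction of features kept. The function name `wrapper_fitness` and the weight `alpha=0.99` are hypothetical choices for this sketch.

```python
def wrapper_fitness(error_rate, n_selected, n_total, alpha=0.99):
    """Weighted-sum fitness to minimize: alpha * error + (1 - alpha) * feature ratio.

    alpha close to 1 prioritizes accuracy; the second term breaks ties in
    favor of smaller feature subsets.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if n_total <= 0 or not 0 <= n_selected <= n_total:
        raise ValueError("need n_total > 0 and 0 <= n_selected <= n_total")
    return alpha * error_rate + (1.0 - alpha) * (n_selected / n_total)

# Two candidate subsets with the same error: the smaller one scores better.
small_subset = wrapper_fitness(error_rate=0.10, n_selected=5, n_total=100)
large_subset = wrapper_fitness(error_rate=0.10, n_selected=50, n_total=100)
```

A weighted sum collapses the two objectives into one number, which is simpler than maintaining a Pareto front but makes the result depend on the choice of `alpha`.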
Affiliation(s)
- S. Sathiya Devi
- University College of Engineering, Birla Institute of Technology, Trichy, India
- Prithiviraj K.
- University College of Engineering, Birla Institute of Technology, Trichy, India
|
19
|
Zhong C, Li G, Meng Z, Li H, He W. A self-adaptive quantum equilibrium optimizer with artificial bee colony for feature selection. Comput Biol Med 2023; 153:106520. [PMID: 36608463 DOI: 10.1016/j.compbiomed.2022.106520] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 10/03/2022] [Revised: 11/28/2022] [Accepted: 12/31/2022] [Indexed: 01/03/2023]
Abstract
Feature selection (FS) is a popular data pre-processing technique in machine learning that extracts the optimal features to maintain or increase the classification accuracy on a dataset. It is a combinatorial optimization problem that requires a powerful optimizer to obtain the optimum subset. The equilibrium optimizer (EO) is a recent physics-based metaheuristic algorithm with good performance on various optimization problems, but it may encounter premature or local convergence in feature selection. This work presents a self-adaptive quantum EO with artificial bee colony for feature selection, named SQEOABC. In the proposed algorithm, quantum theory and a self-adaptive mechanism are incorporated into the updating rule of EO to enhance convergence, and the updating mechanism from the artificial bee colony is also incorporated into EO to achieve appropriate FS solutions. In the experiments, 25 benchmark datasets from the UCI repository are used to verify SQEOABC, which is compared with several state-of-the-art metaheuristic algorithms and variants of EO. The statistical results for fitness values and accuracy demonstrate that SQEOABC performs better than the compared algorithms and the variants of EO. Finally, a real-world FS problem from COVID-19 illustrates the effectiveness and superiority of SQEOABC.
Affiliation(s)
- Changting Zhong
- Department of Engineering Mechanics, State Key Laboratory of Structural Analyses for Industrial Equipment, Dalian University of Technology, Dalian, 116024, China; School of Civil Engineering and Architecture, Hainan University, Haikou 570228, China.
- Gang Li
- Department of Engineering Mechanics, State Key Laboratory of Structural Analyses for Industrial Equipment, Dalian University of Technology, Dalian, 116024, China; Ningbo Institute of Dalian University of Technology, Ningbo, 315000, China.
- Zeng Meng
- School of Civil Engineering, Hefei University of Technology, Hefei, 230009, China.
- Haijiang Li
- BIM for Smart Engineering Centre, Cardiff School of Engineering, Cardiff University, Queen's Buildings, Cardiff, CF24 3AA, Wales, UK.
- Wanxin He
- Department of Engineering Mechanics, State Key Laboratory of Structural Analyses for Industrial Equipment, Dalian University of Technology, Dalian, 116024, China.
|
20
|
Zhang M, Wang JS, Liu Y, Wang M, Li XD, Guo FJ. Feature selection method based on stochastic fractal search henry gas solubility optimization algorithm. Journal of Intelligent & Fuzzy Systems 2023. [DOI: 10.3233/jifs-221036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/05/2023]
Abstract
In most data mining tasks, feature selection is an essential preprocessing stage. The Henry’s Gas Solubility Optimization (HGSO) algorithm is a physics-based heuristic algorithm built on Henry’s law, which simulates how the solubility of a gas in a liquid varies with temperature. In this paper, an improved Henry’s Gas Solubility Optimization based on stochastic fractal search (SFS-HGSO) is proposed for feature selection and engineering optimization. Three stochastic fractal strategies based on Gaussian walk, Lévy flight, and Brownian motion are adopted, and the diffusion is based on the high-quality solutions obtained by the original algorithm. Individuals with different fitness are assigned different energies, and the number of diffusing individuals is determined according to individual energy. This strategy increases the diversity of search strategies and enhances the local search ability. It greatly mitigates the shortcomings of the original HGSO, namely its single position-updating method and slow convergence speed. The algorithm is used to solve the feature selection problem, and a KNN classifier is used to evaluate the effectiveness of the selected features. To verify the performance of the proposed feature selection method, 20 standard UCI benchmark datasets are used, and the performance is compared with other swarm intelligence optimization algorithms, such as WOA, HHO, and HBA. The algorithm is also applied to benchmark functions. Experimental results show that the three improved strategies effectively improve the performance of the HGSO algorithm and achieve excellent results in feature selection and engineering optimization problems.
Affiliation(s)
- Min Zhang
- School of Electronic and Information Engineering, University of Science & Technology Liaoning, Anshan, China
- Jie-Sheng Wang
- School of Electronic and Information Engineering, University of Science & Technology Liaoning, Anshan, China
- Yu Liu
- School of Electronic and Information Engineering, University of Science & Technology Liaoning, Anshan, China
- Min Wang
- School of Electronic and Information Engineering, University of Science & Technology Liaoning, Anshan, China
- Xu-Dong Li
- School of Electronic and Information Engineering, University of Science & Technology Liaoning, Anshan, China
- Fu-Jun Guo
- School of Electronic and Information Engineering, University of Science & Technology Liaoning, Anshan, China
|
21
|
Nadimi-Shahraki MH, Zamani H, Mirjalili S. Enhanced whale optimization algorithm for medical feature selection: A COVID-19 case study. Comput Biol Med 2022; 148:105858. [PMID: 35868045 DOI: 10.1016/j.compbiomed.2022.105858] [Citation(s) in RCA: 40] [Impact Index Per Article: 20.0] [Received: 01/03/2022] [Revised: 06/15/2022] [Accepted: 07/08/2022] [Indexed: 01/01/2023]
Abstract
The whale optimization algorithm (WOA) is a prominent problem solver that is broadly applied to NP-hard problems such as feature selection. However, WOA and most of its variants suffer from low population diversity and a poor search strategy. Efficient strategies are therefore needed to mitigate these core drawbacks of WOA, particularly for the feature selection problem. This paper proposes an enhanced whale optimization algorithm named E-WOA, which uses a pooling mechanism and three effective search strategies named migrating, preferential selecting, and enriched encircling prey. The performance of E-WOA is evaluated and compared with well-known WOA variants on global optimization problems. The results obtained show that E-WOA outperforms WOA's variants. After E-WOA showed sufficient performance, it was used to derive a binary version named BE-WOA to select effective features, particularly from medical datasets. BE-WOA is validated on medical disease datasets and compared with the latest high-performing optimization algorithms in terms of fitness, accuracy, sensitivity, precision, and number of features. Moreover, BE-WOA is applied to detect coronavirus disease 2019 (COVID-19). The experimental and statistical results demonstrate the efficiency of BE-WOA in searching the problem space and selecting the most effective features compared with the other optimization algorithms.
Affiliation(s)
- Mohammad H Nadimi-Shahraki
- Faculty of Computer Engineering, Najafabad Branch, Islamic Azad University, Najafabad, Iran; Big Data Research Center, Najafabad Branch, Islamic Azad University, Najafabad, Iran; Centre for Artificial Intelligence Research and Optimisation, Torrens University Australia, Brisbane, Australia.
- Hoda Zamani
- Faculty of Computer Engineering, Najafabad Branch, Islamic Azad University, Najafabad, Iran; Big Data Research Center, Najafabad Branch, Islamic Azad University, Najafabad, Iran
- Seyedali Mirjalili
- Centre for Artificial Intelligence Research and Optimisation, Torrens University Australia, Brisbane, Australia; Yonsei Frontier Lab, Yonsei University, Seoul, Republic of Korea
|
22
|
Binary Aquila Optimizer for Selecting Effective Features from Medical Data: A COVID-19 Case Study. Mathematics 2022. [DOI: 10.3390/math10111929] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Indexed: 12/19/2022]
Abstract
Medical technological advancements have led to the creation of various large datasets with numerous attributes. Redundant and irrelevant features in these datasets degrade the performance of learning algorithms. Using effective features in data mining and analysis tasks such as classification can increase the accuracy of the results and improve the decisions that rely on them. This effect becomes more pronounced when dealing with challenging, large-scale problems in medical applications. In the literature, nature-inspired metaheuristics show superior performance in finding optimal feature subsets. As a seminal attempt, this work presents a wrapper feature selection approach based on the newly proposed Aquila optimizer (AO). The wrapper approach uses AO as a search algorithm to discover the most effective feature subset. The S-shaped binary Aquila optimizer (SBAO) and the V-shaped binary Aquila optimizer (VBAO) are two binary algorithms suggested for feature selection in medical datasets. Binary position vectors are generated using S- and V-shaped transfer functions while the search space stays continuous. The suggested algorithms are compared with six recent binary optimization algorithms on seven benchmark medical datasets. The results demonstrate that both proposed BAO variants can improve the classification accuracy on these medical datasets. The proposed algorithms are also tested on a real COVID-19 dataset. The findings show that SBAO outperforms the comparative algorithms, selecting the fewest features with the highest accuracy.
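The abstract does not specify which transfer functions SBAO and VBAO use. The sketch below uses two widely used representatives, the logistic sigmoid (S-shaped) and the absolute hyperbolic tangent (V-shaped), purely as assumptions; the function names and update rules are illustrative and are not taken from the paper. An S-shaped function is typically read as the probability that a bit is 1, while a V-shaped function is read as the probability of flipping the current bit.

```python
import math
import random

def s_shaped(x):
    """Logistic sigmoid, written in a form that avoids overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def v_shaped(x):
    """|tanh(x)|: zero at x = 0 and approaching 1 as |x| grows."""
    return abs(math.tanh(x))

def binarize_s(position, rng):
    """Set each bit to 1 with probability s_shaped(x)."""
    return [1 if rng.random() < s_shaped(x) else 0 for x in position]

def binarize_v(position, previous_bits, rng):
    """Flip each previous bit with probability v_shaped(x); otherwise keep it."""
    return [1 - b if rng.random() < v_shaped(x) else b
            for x, b in zip(position, previous_bits)]

rng = random.Random(0)
continuous = [2.5, -0.3, 0.0, -4.0]   # a continuous position vector (made up)
mask_s = binarize_s(continuous, rng)  # 1 means the feature is selected
mask_v = binarize_v(continuous, mask_s, rng)
```

The search itself stays in continuous space; only the feature mask passed to the classifier is binary, which is the arrangement the abstract describes.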
|