1
|
Zhang L, Chen X. Social coevolution and Sine chaotic opposition learning Chimp Optimization Algorithm for feature selection. Sci Rep 2024; 14:15413. [PMID: 38965341 PMCID: PMC11224333 DOI: 10.1038/s41598-024-66285-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2024] [Accepted: 07/01/2024] [Indexed: 07/06/2024] Open
Abstract
Feature selection is a hot problem in machine learning. Swarm intelligence algorithms play an essential role in feature selection due to their excellent optimisation ability. The Chimp Optimisation Algorithm (CHoA) is a new type of swarm intelligence algorithm. It has quickly won widespread attention in the academic community due to its fast convergence speed and easy implementation. However, CHoA has specific challenges in balancing local and global search, limiting its optimisation accuracy and leading to premature convergence, thus affecting the algorithm's performance on feature selection tasks. This study proposes Social coevolution and Sine chaotic opposition learning Chimp Optimization Algorithm (SOSCHoA). SOSCHoA enhances inter-population interaction through social coevolution, improving local search. Additionally, it introduces sine chaotic opposition learning to increase population diversity and prevent local optima. Extensive experiments on 12 high-dimensional classification datasets demonstrate that SOSCHoA outperforms existing algorithms in classification accuracy, convergence, and stability. Although SOSCHoA shows advantages in handling high-dimensional datasets, there is room for future research and optimization, particularly concerning feature dimensionality reduction.
Affiliation(s)
- Li Zhang
- College of Computer Engineering, Jiangsu University of Technology, Changzhou, 213001, People's Republic of China.
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, 130012, People's Republic of China.
| | - XiaoBo Chen
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, 130012, People's Republic of China
- People's Bank of China Changzhou City Center Branch, Jiangsu, 213001, Changzhou, People's Republic of China
|
2
|
Zhou X, Chen Y, Gui W, Heidari AA, Cai Z, Wang M, Chen H, Li C. Enhanced differential evolution algorithm for feature selection in tuberculous pleural effusion clinical characteristics analysis. Artif Intell Med 2024; 153:102886. [PMID: 38749310 DOI: 10.1016/j.artmed.2024.102886] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2023] [Revised: 03/17/2024] [Accepted: 04/27/2024] [Indexed: 06/11/2024]
Abstract
Tuberculous pleural effusion poses a significant threat to human health due to its potential for severe disease and mortality. Without timely treatment, it may lead to fatal consequences. Therefore, early identification and prompt treatment are crucial for preventing problems such as chronic lung disease, respiratory failure, and death. This study proposes an enhanced differential evolution algorithm based on colony predation and dispersed foraging strategies. A series of experiments conducted on the IEEE CEC 2017 competition dataset validated the global optimization capability of the method. Additionally, a binary version of the algorithm is introduced to assess the algorithm's ability to address feature selection problems. Comprehensive comparisons of the effectiveness of the proposed algorithm with 8 similar algorithms were conducted using public datasets with feature sizes ranging from 10 to 10,000. Experimental results demonstrate that the proposed method is an effective feature selection approach. Furthermore, a predictive model for tuberculous pleural effusion is established by integrating the proposed algorithm with support vector machines. The performance of the proposed model is validated using clinical records collected from 140 tuberculous pleural effusion patients, totaling 10,780 instances. Experimental results indicate that the proposed model can identify key correlated indicators such as pleural effusion adenosine deaminase, temperature, white blood cell count, and pleural effusion color, aiding in the clinical feature analysis of tuberculous pleural effusion and providing early warning for its treatment and prediction.
Affiliation(s)
- Xinsen Zhou
- Institute of Big Data and Information Technology, Wenzhou University, Wenzhou 325000, China.
| | - Yi Chen
- Institute of Big Data and Information Technology, Wenzhou University, Wenzhou 325000, China.
| | - Wenyong Gui
- Institute of Big Data and Information Technology, Wenzhou University, Wenzhou 325000, China.
| | - Ali Asghar Heidari
- School of Surveying and Geospatial Engineering, College of Engineering, University of Tehran, Tehran, Iran.
| | - Zhennao Cai
- Institute of Big Data and Information Technology, Wenzhou University, Wenzhou 325000, China.
| | - Mingjing Wang
- School of Data Science and Artificial Intelligence, Wenzhou University of Technology, Wenzhou 325000, China.
| | - Huiling Chen
- Institute of Big Data and Information Technology, Wenzhou University, Wenzhou 325000, China.
| | - Chengye Li
- Department of Pulmonary and Critical Care Medicine, the First Affiliated Hospital of Wenzhou Medical University, Wenzhou 325000, China.
|
3
|
Zhang L, Chen X. Enhanced chimp hierarchy optimization algorithm with adaptive lens imaging for feature selection in data classification. Sci Rep 2024; 14:6910. [PMID: 38519568 PMCID: PMC10959962 DOI: 10.1038/s41598-024-57518-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2023] [Accepted: 03/19/2024] [Indexed: 03/25/2024] Open
Abstract
Feature selection is a critical component of machine learning and data mining that removes redundant and irrelevant features from a dataset. The Chimp Optimization Algorithm (CHoA) is widely applicable to various optimization problems due to its low number of parameters and fast convergence rate. However, CHoA has a weak exploration capability and tends to fall into local optima during feature selection, leading to ineffective removal of irrelevant and redundant features. To solve this problem, this paper proposes the Enhanced Chimp Hierarchy Optimization Algorithm with adaptive lens imaging (ALI-CHoASH) to search for the optimal feature subset in classification problems. Specifically, to enhance the exploration and exploitation capability of CHoA, we designed a chimp social hierarchy. We employed a novel social class factor to label the class of each chimp, enabling effective modelling and optimization of the relationships among chimp individuals. Then, to model the social and collaborative behaviours of chimps in different social classes, we introduce additional prey-attacking and autonomous search strategies to help chimp individuals approach the optimal solution faster. In addition, considering the poor diversity of chimp groups in late iterations, we propose an adaptive lens imaging back-learning strategy to keep the algorithm from falling into a local optimum. Finally, we validate the improvement of ALI-CHoASH in exploration and exploitation capabilities using several high-dimensional datasets. We also compare ALI-CHoASH with eight state-of-the-art methods in classification accuracy, feature subset size, and computation time to demonstrate its superiority.
Affiliation(s)
- Li Zhang
- College of Computer Engineering, Jiangsu University of Technology, Changzhou, 213001, People's Republic of China.
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education Jilin University, Changchun, 130012, People's Republic of China.
| | - XiaoBo Chen
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education Jilin University, Changchun, 130012, People's Republic of China
- People's Bank of China Changzhou City Center Branch, Changzhou, 213001, Jiangsu, People's Republic of China
|
4
|
Yang G, Li W, Xie W, Wang L, Yu K. An improved binary particle swarm optimization algorithm for clinical cancer biomarker identification in microarray data. Computer Methods and Programs in Biomedicine 2024; 244:107987. [PMID: 38157825 DOI: 10.1016/j.cmpb.2023.107987] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Revised: 11/04/2023] [Accepted: 12/16/2023] [Indexed: 01/03/2024]
Abstract
BACKGROUND AND OBJECTIVE The limited number of samples and high-dimensional features in microarray data make selecting a small number of features for disease diagnosis a challenging problem. Traditional feature selection methods based on evolutionary algorithms struggle to find the optimal set of features in a limited time when dealing with the high-dimensional feature selection problem. New solutions are proposed to solve these problems. METHODS In this paper, we propose a hybrid feature selection method (C-IFBPFE) for biomarker identification in microarray data, which combines clustering and improved binary particle swarm optimization while incorporating an embedded feature elimination strategy. Firstly, an adaptive redundant feature judgment method based on correlation clustering is proposed for feature screening to reduce the search space in the subsequent stage. Secondly, we propose an improved flipping probability-based binary particle swarm optimization (IFBPSO), which is better suited to the binary particle swarm optimization problem. Finally, we design a new feature elimination (FE) strategy embedded in the binary particle swarm optimization algorithm. This strategy gradually removes poorer features during iterations to reduce the number of features and improve accuracy. RESULTS We compared C-IFBPFE with other published hybrid feature selection methods on eight public datasets and analyzed the impact of each improvement. The proposed method outperforms other current state-of-the-art feature selection methods in terms of accuracy, number of features, sensitivity, and specificity. The ablation study validates the efficacy of each component; in particular, the proposed feature elimination strategy significantly improves the performance of the algorithm. CONCLUSIONS The hybrid feature selection method proposed in this paper helps address the issue of high-dimensional microarray data with few samples. It can select a small subset of features and achieve high classification accuracy on microarray datasets. Additionally, independent validation of the selected features shows that those chosen by C-IFBPFE have strong correlations with disease phenotypes and can identify important biomarkers from data related to biomedical problems.
Affiliation(s)
- Guicheng Yang
- College of Computer Science and Engineering, Northeastern University, Shenyang, 110000, Liaoning, China.
| | - Wei Li
- Key Laboratory of Intelligent Computing in Medical Image (MIIC), Northeastern University, Ministry of Education, Shenyang, 110000, Liaoning, China; National Frontiers Science Center for Industrial Intelligence and Systems Optimization, Shenyang, 110819, Liaoning, China.
| | - Weidong Xie
- College of Computer Science and Engineering, Northeastern University, Shenyang, 110000, Liaoning, China.
| | - Linjie Wang
- College of Computer Science and Engineering, Northeastern University, Shenyang, 110000, Liaoning, China.
| | - Kun Yu
- College of Medicine and Bioinformation Engineering, Northeastern University, Shenyang, 110819, Liaoning, China.
|
5
|
Feda AK, Adegboye M, Adegboye OR, Agyekum EB, Fendzi Mbasso W, Kamel S. S-shaped grey wolf optimizer-based FOX algorithm for feature selection. Heliyon 2024; 10:e24192. [PMID: 38293420 PMCID: PMC10825485 DOI: 10.1016/j.heliyon.2024.e24192] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2023] [Revised: 12/09/2023] [Accepted: 01/04/2024] [Indexed: 02/01/2024] Open
Abstract
The FOX algorithm is a recently developed metaheuristic approach inspired by the behavior of foxes in their natural habitat. While the FOX algorithm exhibits commendable performance, its basic version may, in complex problem scenarios, become trapped in local optima and fail to identify the optimal solution due to its weak exploitation capabilities. This research addresses a high-dimensional feature selection problem. In feature selection, the most informative features are retained while irrelevant ones are discarded. An enhanced version of the FOX algorithm is proposed, aiming to mitigate its drawbacks in feature selection. The improved approach, referred to as S-shaped Grey Wolf Optimizer-based FOX (FOX-GWO), focuses on augmenting the local search capabilities of the FOX algorithm via the integration of GWO. Additionally, the introduction of an S-shaped transfer function enables the population to explore both binary options throughout the search process. In a series of experiments on 18 datasets with varying dimensions, FOX-GWO outperforms its competitors on 83.33% of the datasets for average accuracy, 61.11% for reduced feature dimensionality, and 72.22% for average fitness value, indicating that it efficiently explores high-dimensional spaces. These findings highlight its practical value and potential to advance feature selection in complex data analysis, enhancing model prediction accuracy.
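The role of an S-shaped transfer function in binary feature selection can be illustrated with a minimal sketch. It assumes the common sigmoid form and a simple stochastic thresholding rule; the abstract does not state the exact function FOX-GWO uses, and the names `sigmoid_transfer` and `binarize` are hypothetical.

```python
import math
import random


def sigmoid_transfer(x: float) -> float:
    """Map a real-valued position component to a probability in [0, 1].

    Assumption: the standard logistic sigmoid, written in a numerically
    stable form so large negative inputs do not overflow.
    """
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def binarize(position, rng):
    """Turn a continuous position into a 0/1 feature mask.

    Each feature is selected (1) with probability sigmoid_transfer(x),
    so both binary options stay reachable during the search.
    """
    return [1 if rng.random() < sigmoid_transfer(x) else 0 for x in position]


rng = random.Random(0)  # fixed seed so the run is repeatable
position = [-6.0, -0.5, 0.0, 0.5, 6.0]  # hypothetical search-agent position
mask = binarize(position, rng)
print(mask)  # one bit per feature; 1 means the feature is kept
```

Components far below zero are almost never selected and components far above zero almost always are, while values near zero leave the choice open.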
Affiliation(s)
- Afi Kekeli Feda
- Management Information System Department, European University of Lefke, Mersin, 10, Turkey
| | - Ephraim Bonah Agyekum
- Department of Nuclear and Renewable Energy, Ural Federal University named after the first President of Russia Boris Yeltsin, 620002, 19 Mira Street, Ekaterinburg, Russia
| | - Wulfran Fendzi Mbasso
- Laboratory of Technology and Applied Sciences, University Institute of Technology, University of Douala, PO Box: 8698, Douala, Cameroon
| | - Salah Kamel
- Department of Electrical Engineering, Faculty of Engineering, Aswan University, Aswan, 81542, Egypt
|
6
|
Barrera-García J, Cisternas-Caneo F, Crawford B, Gómez Sánchez M, Soto R. Feature Selection Problem and Metaheuristics: A Systematic Literature Review about Its Formulation, Evaluation and Applications. Biomimetics (Basel) 2023; 9:9. [PMID: 38248583 PMCID: PMC10813816 DOI: 10.3390/biomimetics9010009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/25/2023] [Revised: 12/16/2023] [Accepted: 12/18/2023] [Indexed: 01/23/2024] Open
Abstract
Feature selection is becoming a relevant problem within the field of machine learning. The feature selection problem focuses on the selection of the small, necessary, and sufficient subset of features that represent the general set of features, eliminating redundant and irrelevant information. Given the importance of the topic, in recent years there has been a boom in the study of the problem, generating a large number of related investigations. Given this, this work analyzes 161 articles published between 2019 and 2023 (20 April 2023), emphasizing the formulation of the problem and performance measures, and proposing classifications for the objective functions and evaluation metrics. Furthermore, an in-depth description and analysis of metaheuristics, benchmark datasets, and practical real-world applications are presented. Finally, in light of recent advances, this review paper provides future research opportunities.
Affiliation(s)
- José Barrera-García
- Escuela de Ingeniería Informática, Pontificia Universidad Católica de Valparaíso, Avenida Brasil 2241, Valparaíso 2362807, Chile; (J.B.-G.); (F.C.-C.); (R.S.)
| | - Felipe Cisternas-Caneo
- Escuela de Ingeniería Informática, Pontificia Universidad Católica de Valparaíso, Avenida Brasil 2241, Valparaíso 2362807, Chile; (J.B.-G.); (F.C.-C.); (R.S.)
| | - Broderick Crawford
- Escuela de Ingeniería Informática, Pontificia Universidad Católica de Valparaíso, Avenida Brasil 2241, Valparaíso 2362807, Chile; (J.B.-G.); (F.C.-C.); (R.S.)
| | - Mariam Gómez Sánchez
- Departamento de Electrotecnia e Informática, Universidad Técnica Federico Santa María, Federico Santa María 6090, Viña del Mar 2520000, Chile;
| | - Ricardo Soto
- Escuela de Ingeniería Informática, Pontificia Universidad Católica de Valparaíso, Avenida Brasil 2241, Valparaíso 2362807, Chile; (J.B.-G.); (F.C.-C.); (R.S.)
|
7
|
Rabie AH, Saleh AI. A new diagnostic autism spectrum disorder (DASD) strategy using ensemble diagnosis methodology based on blood tests. Health Inf Sci Syst 2023; 11:36. [PMID: 37588694 PMCID: PMC10425316 DOI: 10.1007/s13755-023-00234-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2023] [Accepted: 07/16/2023] [Indexed: 08/18/2023] Open
Abstract
Autism Spectrum Disorder (ASD) is a complex neurodevelopmental disease that impacts a child's behavior and social communication. In early childhood, children with ASD typically exhibit symptoms such as difficulty in social interaction, limited interests, and repetitive behavior. Although ASD has recognizable symptoms, most people do not understand them and therefore lack the knowledge to determine whether or not a child has ASD. Thus, early detection of children with ASD using an accurate diagnosis model based on Artificial Intelligence (AI) techniques is a critical process to reduce the spread of the disease and control it early. In this paper, a new Diagnostic Autism Spectrum Disorder (DASD) strategy is presented to quickly and accurately detect children with ASD. DASD contains two layers, the Data Filter Layer (DFL) and the Diagnostic Layer (DL). Feature selection and outlier rejection are performed in the DFL to filter less important features and incorrect data out of the ASD dataset before the diagnostic method in the DL is used to diagnose the patients. In the DFL, the Binary Gray Wolf Optimization (BGWO) technique is used to select the most significant set of features, while the Binary Genetic Algorithm (BGA) technique is used to eliminate invalid training data. Then, the Ensemble Diagnosis Methodology (EDM), a new diagnostic technique, is used in the DL to quickly and precisely diagnose children with ASD. The main contribution of this paper is the EDM, which consists of several diagnostic models, including Enhanced K-Nearest Neighbors (EKNN). EKNN is a hybrid technique consisting of three methods: K-Nearest Neighbors (KNN), Naïve Bayes (NB), and the Chimp Optimization Algorithm (COA). NB is used as a weighting method to convert data from feature space to weight space. Then, COA is used as a data generation method to reduce the size of the training dataset. Finally, KNN is applied to the reduced data in weight space to quickly and accurately diagnose children with ASD based on the new, smaller training dataset. The ASD blood tests dataset is used to test the proposed DASD strategy against other recent strategies [1]. It is concluded that the DASD strategy is superior to other strategies based on many performance measures including accuracy, error, recall, precision, micro_average precision, macro_average precision, micro_average recall, macro_average recall, F1-measure, and implementation-time with values equal to 0.93, 0.07, 0.83, 0.82, 0.80, 0.83, 0.79, 0.81, 0.79, and 1.5 s respectively.
Affiliation(s)
- Asmaa H. Rabie
- Computer Engineering and Systems Dept., Faculty of Engineering, Mansoura University, Mansoura, Egypt
| | - Ahmed I. Saleh
- Computer Engineering and Systems Dept., Faculty of Engineering, Mansoura University, Mansoura, Egypt
|
8
|
Ahmed FR, Alsenany SA, Abdelaliem SMF, Deif MA. Development of a hybrid LSTM with chimp optimization algorithm for the pressure ventilator prediction. Sci Rep 2023; 13:20927. [PMID: 38017008 PMCID: PMC10684522 DOI: 10.1038/s41598-023-47837-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2023] [Accepted: 11/19/2023] [Indexed: 11/30/2023] Open
Abstract
The utilization of mechanical ventilation is of utmost importance in the management of individuals afflicted with severe pulmonary conditions. During a pandemic, it becomes imperative to build ventilators that can autonomously adapt parameters over the course of treatment. To fulfil this requirement, a research investigation was undertaken with the aim of forecasting the magnitude of pressure applied on the patient by the ventilator. The forecast was derived from a comprehensive analysis of many variables, including the ventilator's characteristics and the patient's medical state. This analysis was conducted using a computational model known as Long Short-Term Memory (LSTM). To enhance the predictive accuracy of the LSTM model, the researchers used the Chimp Optimization Algorithm (ChoA). The integration of LSTM and ChoA led to the development of the LSTM-ChoA model, which successfully tackled the issue of hyperparameter selection for the LSTM model. The experimental results revealed that the LSTM-ChoA model exhibited superior performance compared to alternative optimization algorithms, namely grey wolf optimizer (GWO), whale optimization algorithm (WOA), and particle swarm optimization (PSO). Additionally, the LSTM-ChoA model outperformed regression models, including K-nearest neighbor (KNN) Regressor, Random Forest (RF) Regressor, and Support Vector Machine (SVM) Regressor, in accurately predicting ventilator pressure. The findings indicate that the suggested predictive model, LSTM-ChoA, achieves a reduced mean square error (MSE) value. Specifically, when comparing ChoA with GWO, the MSE fell by around 14.8%. Furthermore, when comparing ChoA with PSO and WOA, the MSE decreased by approximately 60%. Additionally, the analysis of variance (ANOVA) findings revealed that the p-value for the LSTM-ChoA model was 0.000, which is less than the predetermined significance level of 0.05. This indicates that the results of the LSTM-ChoA model are statistically significant.
Affiliation(s)
- Fatma Refaat Ahmed
- Department of Nursing, College of Health Sciences, University of Sharjah, Sharjah, UAE
- Critical Care and Emergency Nursing Department, Faculty of Nursing, Alexandria University, Alexandria, Egypt
| | - Samira Ahmed Alsenany
- Department of Community Health Nursing, College of Nursing, Princess Nourah bint Abdulrahman University, P.O. Box 84428, 11671, Riyadh, Saudi Arabia
| | - Sally Mohammed Farghaly Abdelaliem
- Department of Nursing Management and Education, College of Nursing, Princess Nourah bint Abdulrahman University, P.O. Box 84428, 11671, Riyadh, Saudi Arabia.
| | - Mohanad A Deif
- Department of Artificial Intelligence, College of Information Technology, Misr University for Science and Technology (MUST), 6th of October City, 12566, Egypt
|
9
|
Saleh AI, Hussien SA. Disease Diagnosis Based on Improved Gray Wolf Optimization (IGWO) and Ensemble Classification. Ann Biomed Eng 2023; 51:2579-2605. [PMID: 37452216 DOI: 10.1007/s10439-023-03303-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2023] [Accepted: 06/30/2023] [Indexed: 07/18/2023]
Abstract
This paper introduces a simple strategy for diagnosing disease, called improved gray wolf optimization (IGWO) and ensemble classification. The proposed strategy consists of two sequential phases: (i) the Feature Selection Phase (FSP) and (ii) the Ensemble Classification Phase (ECP). During the former, the most effective features for diagnosing disease are selected, while during the latter, the actual diagnosis is made by a vote among five different classifiers. The main contribution of this paper is a modification of the traditional Gray Wolf Optimization (GWO), called Improved Gray Wolf Optimization (IGWO). As an optimization technique, the proposed IGWO is employed in the FSP to select the effective features. For evaluation, IGWO has been implemented using recent feature selection techniques as well as the proposed method. The classification phase uses ensemble classification, which combines several classification techniques: Naïve Bayes (NB), Support Vector Machines (SVM), Deep Neural Network (DNN), Decision Tree (DT), and K-Nearest Neighbors (KNN). Ensemble classification integrates several classifiers to improve prediction performance. Experimental results have shown that employing IGWO improves the performance of the diagnostic strategy across different diseases in terms of precision, recall, and accuracy.
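The voting step of the Ensemble Classification Phase can be sketched as a plain majority vote. This is a minimal illustration assuming unweighted hard voting over five per-classifier labels; the abstract does not specify the voting rule, and the function name `majority_vote` and the sample labels are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the most classifiers.

    Assumption: every classifier has one equal vote. With an odd number
    of voters and two classes, a tie cannot occur.
    """
    counts = Counter(predictions)
    return counts.most_common(1)[0][0]


# Hypothetical outputs of the five classifiers (NB, SVM, DNN, DT, KNN)
# for a single patient record.
votes = ["disease", "healthy", "disease", "disease", "healthy"]
diagnosis = majority_vote(votes)
print(diagnosis)
```

With more than two classes a tie becomes possible, so a real system would need an explicit tie-breaking rule.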
Affiliation(s)
- Ahmed I Saleh
- Computers and Control Department, Faculty of Engineering, Mansoura University, Mansoura, Egypt
| | - Shaimaa A Hussien
- Delta Higher Institute for Engineering and Technology, Mansoura, Egypt.
|
10
|
Adegboye OR, Feda AK, Ishaya MM, Agyekum EB, Kim KC, Mbasso WF, Kamel S. Antenna S-parameter optimization based on golden sine mechanism based honey badger algorithm with tent chaos. Heliyon 2023; 9:e21596. [PMID: 38034692 PMCID: PMC10682539 DOI: 10.1016/j.heliyon.2023.e21596] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2023] [Revised: 10/22/2023] [Accepted: 10/24/2023] [Indexed: 12/02/2023] Open
Abstract
This work proposes a new method to optimize the antenna S-parameter using a Golden Sine mechanism-based Honey Badger Algorithm that employs Tent chaos (GST-HBA). The Honey Badger Algorithm (HBA) is a promising optimization method that, like other metaheuristic algorithms, is prone to premature convergence and lacks diversity in the population. HBA is inspired by the behavior of honey badgers, which use their sense of smell and honeyguide birds to move toward the honeycomb. Our proposed approach aims to improve the performance of HBA and enhance the accuracy of the optimization process for antenna S-parameter optimization. The approach leverages the strengths of both tent chaos and the golden sine mechanism to achieve fast convergence, population diversity, and a good tradeoff between exploitation and exploration. We begin by testing our approach on 20 standard benchmark functions, and then we apply it to a test suite of 8 S-parameter functions. We compare the outcomes with those of other optimization algorithms, and the results show that the suggested algorithm is superior.
Affiliation(s)
| | - Afi Kekeli Feda
- Management Information System Department, European University of Lefke, Mersin, 10, Turkey
| | - Meshack Magaji Ishaya
- Electrical and Electronics Engineering Department, Cyprus International University, Mersin, 10, Turkey
| | - Ephraim Bonah Agyekum
- Department of Nuclear and Renewable Energy, Ural Federal University named after the first President of Russia Boris Yeltsin, 19 Mira Street, Ekaterinburg, 620002, Russia
| | - Ki-Chai Kim
- Department of Electrical Engineering, Yeungnam University, Gyeongsan, 38541, South Korea
| | - Wulfran Fendzi Mbasso
- Laboratory of Technology and Applied Sciences, University Institute of Technology, University of Douala, PO Box: 8698, Douala, Cameroon
| | - Salah Kamel
- Electrical Engineering Department, Faculty of Engineering, Aswan University, 81542, Aswan, Egypt
|
11
|
Fu Q, Li Q, Li X. An improved multi-objective marine predator algorithm for gene selection in classification of cancer microarray data. Comput Biol Med 2023; 160:107020. [PMID: 37196457 DOI: 10.1016/j.compbiomed.2023.107020] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 01/15/2023] [Revised: 04/09/2023] [Accepted: 05/05/2023] [Indexed: 05/19/2023]
Abstract
Gene selection (GS) is an important branch of feature selection that is widely used in cancer classification. It provides essential insights into the pathogenesis of cancer and enables a deeper understanding of cancer data. In cancer classification, GS is essentially a multi-objective optimization problem, which aims to simultaneously optimize two objectives: classification accuracy and the size of the gene subset. The marine predator algorithm (MPA) has been successfully employed in practical applications; however, its random initialization can lead to blindness, which may adversely affect the convergence of the algorithm. Furthermore, the elite individuals that guide evolution are randomly chosen from the Pareto solutions, which may degrade the good exploration performance of the population. To overcome these limitations, a multi-objective improved MPA with continuous mapping initialization and leader selection strategies is proposed. In this work, a new continuous mapping initialization with ReliefF overcomes the defects caused by limited information in late evolution. Moreover, an improved elite selection mechanism with Gaussian distribution guides the population to evolve towards a better Pareto front. Finally, an efficient mutation method is adopted to prevent evolutionary stagnation. To evaluate its effectiveness, the proposed algorithm was compared with 9 well-known algorithms. The experimental results on 16 datasets demonstrate that the proposed algorithm can significantly reduce the data dimension and obtain the highest classification accuracy on most of the high-dimensional cancer microarray datasets.
Affiliation(s)
- Qiyong Fu
- School of Computer Science and Technology, Zhejiang Normal University, Jinhua 321004, China
| | - Qi Li
- School of Computer Science and Technology, Zhejiang Normal University, Jinhua 321004, China
| | - Xiaobo Li
- School of Computer Science and Technology, Zhejiang Normal University, Jinhua 321004, China.
|
12
|
Wang Z, Zhou Y, Takagi T, Song J, Tian YS, Shibuya T. Genetic algorithm-based feature selection with manifold learning for cancer classification using microarray data. BMC Bioinformatics 2023; 24:139. [PMID: 37031189 PMCID: PMC10082986 DOI: 10.1186/s12859-023-05267-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2022] [Accepted: 04/02/2023] [Indexed: 04/10/2023] Open
Abstract
BACKGROUND Microarray data have been widely utilized for cancer classification. The main characteristic of microarray data is "large p and small n": the data contain a small number of subjects but a large number of genes. This may affect the validity of the classification. Thus, there is a pressing demand for techniques able to select genes relevant to cancer classification. RESULTS This study proposes a novel feature (gene) selection method, Iso-GA, for cancer classification. Iso-GA integrates the manifold learning algorithm Isomap into the genetic algorithm (GA) to account for the latent nonlinear structure of the gene expression in the microarray data. The Davies-Bouldin index is adopted to evaluate the candidate solutions in Isomap and to avoid the classifier dependency problem. Additionally, a probability-based framework is introduced to reduce the possibility of genes being randomly selected by GA. The performance of Iso-GA was evaluated on eight benchmark microarray datasets of cancers. Iso-GA outperformed other benchmarking gene selection methods, leading to good classification accuracy with fewer critical genes selected. CONCLUSIONS The proposed Iso-GA method can effectively select fewer but critical genes from microarray data to achieve competitive classification performance.
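To make the classifier-independent scoring concrete, the following minimal sketch computes the standard Davies-Bouldin index for labeled points, where lower values indicate tighter, better separated groups. It shows only the textbook definition with Euclidean distance, not the Iso-GA pipeline; the function name `davies_bouldin` and the toy data are hypothetical.

```python
import math


def davies_bouldin(points, labels):
    """Davies-Bouldin index: mean over clusters of the worst-case ratio
    (scatter_i + scatter_j) / distance(centroid_i, centroid_j).

    Requires at least two clusters with distinct centroids.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    # Centroid: per-dimension mean of the cluster's points.
    centroids = {
        label: [sum(axis) / len(members) for axis in zip(*members)]
        for label, members in clusters.items()
    }
    # Scatter: mean distance from each member to its centroid.
    scatter = {
        label: sum(math.dist(p, centroids[label]) for p in members) / len(members)
        for label, members in clusters.items()
    }

    keys = list(clusters)
    total = 0.0
    for i in keys:
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in keys
            if j != i
        )
    return total / len(keys)


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = davies_bouldin(points, [0, 0, 1, 1])  # groups match the geometry
bad = davies_bouldin(points, [0, 1, 0, 1])   # groups cut across it
print(good, bad)
```

Because the index depends only on the geometry of the labeled points, a candidate gene subset can be scored without training a classifier.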
Affiliation(s)
- Zixuan Wang
- Division of Medical Data Informatics, Human Genome Center, Institute of Medical Science, The University of Tokyo, Tokyo, 108-8639, Japan.
| | - Yi Zhou
- Beijing International Center for Mathematical Research, Peking University, Beijing, 100871, China
| | - Tatsuya Takagi
- Graduate School of Pharmaceutical Sciences, Osaka University, 1-6 Yamadaoka, Suita, Osaka, 565-0871, Japan
| | - Jiangning Song
- Biomedicine Discovery Institute and Monash Data Futures Institute, Monash University, Melbourne, VIC, 3800, Australia
| | - Yu-Shi Tian
- Graduate School of Pharmaceutical Sciences, Osaka University, 1-6 Yamadaoka, Suita, Osaka, 565-0871, Japan.
| | - Tetsuo Shibuya
- Division of Medical Data Informatics, Human Genome Center, Institute of Medical Science, The University of Tokyo, Tokyo, 108-8639, Japan
| |
|
13
|
Zhong C, Li G, Meng Z, Li H, He W. A self-adaptive quantum equilibrium optimizer with artificial bee colony for feature selection. Comput Biol Med 2023; 153:106520. [PMID: 36608463 DOI: 10.1016/j.compbiomed.2022.106520] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 10/03/2022] [Revised: 11/28/2022] [Accepted: 12/31/2022] [Indexed: 01/03/2023]
Abstract
Feature selection (FS) is a popular data pre-processing technique in machine learning that extracts the optimal features to maintain or increase the classification accuracy of the dataset. It is a combinatorial optimization problem that requires a powerful optimizer to obtain the optimum subset. The equilibrium optimizer (EO) is a recent physics-based metaheuristic algorithm with good performance for various optimization problems, but it may suffer from premature or local convergence in feature selection. This work presents a self-adaptive quantum EO with artificial bee colony for feature selection, named SQEOABC. In the proposed algorithm, the quantum theory and the self-adaptive mechanism are incorporated into the updating rule of EO to enhance convergence, and the updating mechanism from the artificial bee colony is also incorporated into EO to achieve appropriate FS solutions. In the experiments, 25 benchmark datasets from the UCI repository are investigated to verify SQEOABC, which is compared with several state-of-the-art metaheuristic algorithms and the variants of EO. The statistical results of fitness values and accuracy demonstrate that SQEOABC has better performance than the compared algorithms and the variants of EO. Finally, a real-world FS problem from COVID-19 illustrates the effectiveness and superiority of SQEOABC.
Affiliation(s)
- Changting Zhong
- Department of Engineering Mechanics, State Key Laboratory of Structural Analyses for Industrial Equipment, Dalian University of Technology, Dalian, 116024, China; School of Civil Engineering and Architecture, Hainan University, Haikou 570228, China.
| | - Gang Li
- Department of Engineering Mechanics, State Key Laboratory of Structural Analyses for Industrial Equipment, Dalian University of Technology, Dalian, 116024, China; Ningbo Institute of Dalian University of Technology, Ningbo, 315000, China.
| | - Zeng Meng
- School of Civil Engineering, Hefei University of Technology, Hefei, 230009, China.
| | - Haijiang Li
- BIM for Smart Engineering Centre, Cardiff School of Engineering, Cardiff University, Queen's Buildings, Cardiff, CF24 3AA, Wales, UK.
| | - Wanxin He
- Department of Engineering Mechanics, State Key Laboratory of Structural Analyses for Industrial Equipment, Dalian University of Technology, Dalian, 116024, China.
| |
|
14
|
Zhang M, Wang JS, Liu Y, Wang M, Li XD, Guo FJ. Feature selection method based on stochastic fractal search henry gas solubility optimization algorithm. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2023. [DOI: 10.3233/jifs-221036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/05/2023]
Abstract
In most data mining tasks, feature selection is an essential preprocessing stage. Henry’s Gas Solubility Optimization (HGSO) algorithm is a physical heuristic algorithm based on Henry’s law, which simulates how the solubility of gas in liquid changes with temperature. In this paper, an improved Henry’s Gas Solubility Optimization based on stochastic fractal search (SFS-HGSO) is proposed for feature selection and engineering optimization. Three stochastic fractal strategies based on Gaussian walk, Lévy flight and Brownian motion are adopted respectively, and the diffusion is based on the high-quality solutions obtained by the original algorithm. Individuals with different fitness are assigned different energies, and the number of diffusing individuals is determined according to individual energy. This strategy increases the diversity of search strategies and enhances the ability of local search. It substantially mitigates two shortcomings of the original HGSO: its single position-updating method and its slow convergence speed. The algorithm is used to solve the feature selection problem, and a KNN classifier is used to evaluate the effectiveness of the selected features. To verify the performance of the proposed feature selection method, 20 standard UCI benchmark datasets are used, and the performance is compared with other swarm intelligence optimization algorithms, such as WOA, HHO and HBA. The algorithm is also applied to benchmark functions. Experimental results show that these three improved strategies can effectively improve the performance of the HGSO algorithm and achieve excellent results in feature selection and engineering optimization problems.
Affiliation(s)
- Min Zhang
- School of Electronic and Information Engineering, University of Science & Technology Liaoning, Anshan, China
| | - Jie-Sheng Wang
- School of Electronic and Information Engineering, University of Science & Technology Liaoning, Anshan, China
| | - Yu Liu
- School of Electronic and Information Engineering, University of Science & Technology Liaoning, Anshan, China
| | - Min Wang
- School of Electronic and Information Engineering, University of Science & Technology Liaoning, Anshan, China
| | - Xu-Dong Li
- School of Electronic and Information Engineering, University of Science & Technology Liaoning, Anshan, China
| | - Fu-Jun Guo
- School of Electronic and Information Engineering, University of Science & Technology Liaoning, Anshan, China
| |
|
15
|
Saleh AI, Rabie AH. Human monkeypox diagnose (HMD) strategy based on data mining and artificial intelligence techniques. Comput Biol Med 2023; 152:106383. [PMID: 36481764 PMCID: PMC9715266 DOI: 10.1016/j.compbiomed.2022.106383] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 07/10/2022] [Revised: 11/02/2022] [Accepted: 11/28/2022] [Indexed: 12/04/2022]
Abstract
In May 2022, monkeypox re-emerged as a rare zoonotic disease that is an important viral disease for public health. Monkeypox can be transmitted from animals to humans, between humans through close contact with an infected human, or through a virus-stained substance. In this paper, a new detection strategy based on artificial intelligence techniques is provided to detect monkeypox patients early. This strategy is called the Human Monkeypox Detection (HMD) strategy and consists of two main phases: (i) Selection Phase (SP) and (ii) Detection Phase (DP). While SP tries to select the best features, DP tries to provide fast and accurate detection based on valid data from SP. In SP, an Improved Binary Chimp Optimization (IBCO) algorithm is introduced as a new feature selection algorithm to select valuable features before an Ensemble Diagnosis (ED) model, a new diagnostic algorithm, is trained in DP. The proposed IBCO algorithm is a hybrid selection algorithm that includes both filter and wrapper methods. IBCO consists of a filter layer called Filter Selection Layer (FSL) and a wrapper layer called Wrapper Selection Layer (WSL). At first, the monkeypox dataset is entered into FSL to quickly select meaningful features by using 'm' filter selection techniques. Then, the 'm' sets of selected features are fed into WSL to construct the initial population of the Binary Chimp Optimization (BCO) algorithm, which precisely chooses the best set of features for the next phase (DP). Finally, the ED model will be correctly trained on the filtered data from FSL. This model consists of three diagnostic algorithms called Weighted Naïve Bayes (WNB), Weighted K-Nearest Neighbors (WKNN), and deep learning, which are combined using a new weighted voting method to provide the best diagnostic results. The weighted values of the WNB algorithm are determined by measuring the impact of each feature on the class categories, while the Grey Wolf Optimization (GWO) algorithm is used to determine the weighted values of WKNN. Experimental results illustrated that the suggested feature selection algorithm, IBCO, outperforms other modern feature selection methods and that the proposed ED model outperforms other modern diagnostic models. Overall, the HMD strategy gives the best results compared to other modern strategies, with accuracy, precision, and recall values equal to 98.48%, 91.1% and 88.91% respectively. Also, the HMD gives 92.56%, 89.01%, 88.01%, 85.01%, 83.9%, and 5.4 s for micro-average precision, micro-average recall, macro-average precision, macro-average recall, F1-measure, and implementation time values respectively.
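The abstract does not spell out its weighted voting method, so the following Python sketch shows only the generic idea of combining several classifiers by weighted majority vote. The function name `weighted_vote`, the model weights, and the labels are all hypothetical and are not taken from the HMD paper.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the label with the largest total weight.

    predictions: dict mapping model name -> predicted label
    weights:     dict mapping model name -> non-negative weight (assumed given)
    Ties are resolved in favour of the label that was seen first.
    """
    scores = defaultdict(float)
    for model, label in predictions.items():
        scores[label] += weights[model]
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Illustrative numbers only: one strong model outvotes two weaker ones.
    preds = {"WNB": "positive", "WKNN": "negative", "DL": "negative"}
    model_weights = {"WNB": 0.5, "WKNN": 0.2, "DL": 0.2}
    print(weighted_vote(preds, model_weights))
```

In a scheme of this kind the weights would typically be derived from validation performance; how the cited work sets them is not stated in the abstract.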
|
16
|
Fu X, Zhu L, Wu B, Wang J, Zhao X, Ryspayev A. An efficient multilevel thresholding segmentation method based on improved chimp optimization algorithm. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2022. [DOI: 10.3233/jifs-223224] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/15/2022]
Abstract
To improve the traditional image segmentation, an efficient multilevel thresholding segmentation method based on improved Chimp Optimization Algorithm (IChOA) is developed in this paper. Kapur entropy is utilized as the objective function. The best threshold values for the three channels of RGB images are found using IChOA. Meanwhile, several strategies are introduced, including a population initialization strategy combining Gaussian chaos and opposition-based learning, the position update mechanism of the particle swarm algorithm (PSO), the Gaussian-Cauchy mutation and the adaptive nonlinear strategy. These methods enable the IChOA to raise the diversity of the population and enhance both exploration and exploitation. Additionally, the search ability, accuracy and stability of IChOA have been significantly enhanced. To prove the superiority of the IChOA based multilevel thresholding segmentation method, a comparison experiment is conducted between IChOA and 5 six meta-heuristic algorithms using 12 test functions, which fully demonstrates that IChOA can obtain high-quality solutions and almost never suffers from premature convergence. Furthermore, using 10 standard test images, the IChOA-based multilevel thresholding image segmentation method is compared with its peers, and the segmentation results are evaluated using 5 indicators: average fitness value, PSNR, SSIM, FSIM and computational time. The experimental results reveal that the presented IChOA-based method has tremendous potential as a segmentation method for color images because it is an effective swarm intelligence optimization method that maintains a delicate balance during the segmentation process.
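The objective function named in this abstract can be sketched as follows. This minimal Python example evaluates Kapur's entropy for a grey-level histogram and a set of thresholds, and uses a brute-force search over a single threshold purely for illustration; in the cited work the search is performed by IChOA. The function names and the toy histogram are hypothetical.

```python
import math


def kapur_entropy(hist, thresholds):
    """Sum of the Shannon entropies of the classes induced by the thresholds.

    hist:       list of pixel counts per grey level
    thresholds: grey levels t; class boundaries fall just before each t
    """
    total = sum(hist)
    probs = [count / total for count in hist]
    bounds = [0] + sorted(thresholds) + [len(hist)]
    objective = 0.0
    for lo, hi in zip(bounds, bounds[1:]):
        weight = sum(probs[lo:hi])
        if weight == 0:
            continue  # an empty class contributes no entropy
        objective -= sum(
            (p / weight) * math.log(p / weight) for p in probs[lo:hi] if p > 0
        )
    return objective


def best_single_threshold(hist):
    """Brute-force search for one threshold (a stand-in for the metaheuristic)."""
    return max(range(1, len(hist)), key=lambda t: kapur_entropy(hist, [t]))


if __name__ == "__main__":
    toy_hist = [1, 1, 1, 1]
    print(best_single_threshold(toy_hist), kapur_entropy(toy_hist, [2]))
```

With k thresholds and 256 grey levels the search space grows combinatorially, which is the usual motivation for replacing the brute-force step with a swarm optimizer.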
Affiliation(s)
- Xue Fu
- College of Mechanical and Electrical Engineering, Northeast Forestry University, Harbin, Heilongjiang, China
| | - Liangkuan Zhu
- College of Mechanical and Electrical Engineering, Northeast Forestry University, Harbin, Heilongjiang, China
| | - Bowen Wu
- School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, China
| | - Jingyu Wang
- College of Mechanical and Electrical Engineering, Northeast Forestry University, Harbin, Heilongjiang, China
| | - Xiaohan Zhao
- College of Mechanical and Electrical Engineering, Northeast Forestry University, Harbin, Heilongjiang, China
| | - Arystan Ryspayev
- College of Mechanical and Electrical Engineering, Northeast Forestry University, Harbin, Heilongjiang, China
| |
|
17
|
Hu J, Lv S, Zhou T, Chen H, Xiao L, Huang X, Wang L, Wu P. Identification of Pulmonary Hypertension Animal Models Using a New Evolutionary Machine Learning Framework Based on Blood Routine Indicators. JOURNAL OF BIONIC ENGINEERING 2022; 20:762-781. [PMID: 36466726 PMCID: PMC9703443 DOI: 10.1007/s42235-022-00292-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/05/2022] [Revised: 10/17/2022] [Accepted: 10/19/2022] [Indexed: 06/17/2023]
Abstract
Pulmonary Hypertension (PH) is a global health problem that affects about 1% of the global population. Animal models of PH play a vital role in unraveling the pathophysiological mechanisms of the disease. The present study proposes a Kernel Extreme Learning Machine (KELM) model based on an improved Whale Optimization Algorithm (WOA) for predicting PH mouse models. The experimental results showed that the blood indicators selected by the feature selection method proposed in this paper, including Haemoglobin (HGB), Hematocrit (HCT), Mean Platelet Volume (MPV), Platelet distribution width (PDW), and Platelet-Large Cell Ratio (P-LCR), were essential for identifying PH mouse models. Remarkably, the method achieved 100.0% accuracy and 100.0% specificity in classification, demonstrating that our method has great potential to be used for evaluating and identifying mouse PH models.
Affiliation(s)
- Jiao Hu
- Department of Computer Science and Artificial Intelligence, Wenzhou University, Wenzhou, 325035 People’s Republic of China
| | - Shushu Lv
- Department of Dermatology, Beijing Tongren Hospital, Capital Medical University, Beijing, 100730 People’s Republic of China
| | - Tao Zhou
- The First Clinical College, Wenzhou Medical University, Wenzhou, 325000 People’s Republic of China
| | - Huiling Chen
- Department of Computer Science and Artificial Intelligence, Wenzhou University, Wenzhou, 325035 People’s Republic of China
| | - Lei Xiao
- Department of Computer Science and Artificial Intelligence, Wenzhou University, Wenzhou, 325035 People’s Republic of China
| | - Xiaoying Huang
- Department of Pulmonary and Critical Care Medicine, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, 325000 People’s Republic of China
| | - Liangxing Wang
- Department of Pulmonary and Critical Care Medicine, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, 325000 People’s Republic of China
| | - Peiliang Wu
- Department of Pulmonary and Critical Care Medicine, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, 325000 People’s Republic of China
| |
|
18
|
Zhang M, Wang JS, Hou JN, Song HM, Li XD, Guo FJ. RG-NBEO: a ReliefF guided novel binary equilibrium optimizer with opposition-based S-shaped and V-shaped transfer functions for feature selection. Artif Intell Rev 2022. [DOI: 10.1007/s10462-022-10333-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
|
19
|
Phan TTH, Nguyen-Doan D, Nguyen-Huu D, Nguyen-Van H, Pham-Hong T. Investigation on new Mel frequency cepstral coefficients features and hyper-parameters tuning technique for bee sound recognition. Soft comput 2022. [DOI: 10.1007/s00500-022-07596-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
|
20
|
Abed-alguni BH, Alawad NA, Al-Betar MA, Paul D. Opposition-based sine cosine optimizer utilizing refraction learning and variable neighborhood search for feature selection. APPL INTELL 2022; 53:13224-13260. [PMID: 36247211 PMCID: PMC9547101 DOI: 10.1007/s10489-022-04201-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 09/21/2022] [Indexed: 12/03/2022]
Abstract
This paper proposes new improved binary versions of the Sine Cosine Algorithm (SCA) for the Feature Selection (FS) problem. FS is an essential machine learning and data mining task of choosing a subset of highly discriminating features from noisy, irrelevant, high-dimensional, and redundant features to best represent a dataset. SCA is a recent metaheuristic algorithm established to emulate a model based on sine and cosine trigonometric functions. It was initially proposed to tackle problems in the continuous domain. The SCA has been modified to Binary SCA (BSCA) to deal with the binary domain of the FS problem. To improve the performance of BSCA, three cumulative improved variations are proposed (i.e., IBSCA1, IBSCA2, and IBSCA3), where the last version has the best performance. IBSCA1 employs Opposition Based Learning (OBL) to help ensure a diverse population of candidate solutions. IBSCA2 improves IBSCA1 by adding Variable Neighborhood Search (VNS) and Laplace distribution to support several mutation methods. IBSCA3 improves IBSCA2 by optimizing the best candidate solution using Refraction Learning (RL), a novel OBL approach based on light refraction. For performance evaluation, 19 real-world datasets, including a COVID-19 dataset, were selected with different numbers of features, classes, and instances. Three performance measurements have been used to test the IBSCA versions: classification accuracy, number of features, and fitness values. Furthermore, the performance of the last variation, IBSCA3, is compared against 28 existing popular algorithms. Interestingly, IBSCA3 outperformed almost all comparative methods in terms of classification accuracy and fitness values. At the same time, it was ranked 15 out of 19 in terms of number of features. The overall simulation and statistical results indicate that IBSCA3 performs better than the other algorithms.
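Opposition-based learning, as used in IBSCA1 to diversify the population, can be sketched in a few lines of Python. The example below uses the standard definition of an opposite point (lower bound plus upper bound minus the current value, which reduces to a bit flip for binary features) and keeps the fittest half of the merged population. The function names, the minimisation convention, and the toy fitness are assumptions for illustration, not details from the paper.

```python
def opposite(solution, lower, upper):
    """Opposite point of a solution: lower + upper - x in every dimension."""
    return [lo + hi - x for x, lo, hi in zip(solution, lower, upper)]


def obl_select(population, lower, upper, fitness):
    """Merge the population with its opposites and keep the best individuals.

    fitness is minimised here; the population size is preserved.
    """
    candidates = population + [opposite(s, lower, upper) for s in population]
    return sorted(candidates, key=fitness)[: len(population)]


if __name__ == "__main__":
    # Binary feature masks: bounds 0 and 1 turn the opposite into a bit flip.
    masks = [[1, 1, 1], [1, 1, 0]]
    zeros, ones = [0, 0, 0], [1, 1, 1]
    # Toy fitness: number of selected features (a real wrapper would also
    # include the classification error).
    print(obl_select(masks, zeros, ones, fitness=sum))
```

The refraction learning variant mentioned for IBSCA3 generalises the opposite point with additional scaling parameters; it is not shown here because the abstract gives no formula.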
Affiliation(s)
| | | | - Mohammed Azmi Al-Betar
- Artificial Intelligence Research Center (AIRC), College of Engineering and Information Technology, Ajman University, Ajman, United Arab Emirates
| | - David Paul
- School of Science and Technology, University of New England, Armidale, Australia
| |
|
21
|
Pashaei E. Mutation-based Binary Aquila optimizer for gene selection in cancer classification. Comput Biol Chem 2022; 101:107767. [PMID: 36084602 DOI: 10.1016/j.compbiolchem.2022.107767] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2022] [Revised: 07/10/2022] [Accepted: 08/29/2022] [Indexed: 11/19/2022]
Abstract
Microarray data classification is one of the hottest issues in the field of bioinformatics due to its efficiency in diagnosing patients' ailments. The difficulty is that microarrays contain a huge number of genes, the majority of which are redundant or irrelevant, resulting in the deterioration of classification accuracy. To address this issue, a mutated binary Aquila Optimizer (MBAO) with a time-varying mirrored S-shaped (TVMS) transfer function is proposed as a new wrapper gene (or feature) selection method to find the optimal subset of informative genes. The suggested hybrid method utilizes Minimum Redundancy Maximum Relevance (mRMR) as a filtering approach to choose top-ranked genes in the first stage and then uses MBAO-TVMS as an efficient wrapper approach to identify the most discriminative genes in the second stage. TVMS is adopted to transform the continuous version of Aquila Optimizer (AO) into a binary one, and a mutation mechanism is incorporated into binary AO to help the algorithm escape local optima and improve its global search capabilities. The suggested method was tested on eleven well-known benchmark microarray datasets and compared to other current state-of-the-art methods. Based on the obtained results, mRMR-MBAO confirms its superiority over the mRMR-BAO algorithm and the other comparative GS approaches on the majority of the medical datasets in terms of classification accuracy and the number of selected genes. R codes of MBAO are available at https://github.com/el-pashaei/MBAO.
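The two mechanisms described here, a transfer function that maps a continuous position to a binary gene mask and a mutation step that helps escape local optima, can be illustrated with the Python sketch below. The abstract gives no formula for the time-varying mirrored S-shaped function, so a plain sigmoid is swapped in; the function names, the mutation rate, and the seeded generator are likewise assumptions and do not reproduce the author's R code.

```python
import math
import random


def s_shaped(x):
    """Plain sigmoid transfer function (stand-in for the paper's TVMS)."""
    return 1.0 / (1.0 + math.exp(-x))


def binarize(position, rng):
    """Select gene i with probability s_shaped(position[i])."""
    return [1 if rng.random() < s_shaped(x) else 0 for x in position]


def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only to make the demo repeatable
    continuous_position = [2.5, -1.0, 0.3, -4.0]
    mask = binarize(continuous_position, rng)
    print(mask, mutate(mask, rate=0.1, rng=rng))
```

The sigmoid as written overflows for inputs below roughly -709; a production version would clamp the position first.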
Affiliation(s)
- Elham Pashaei
- Department of Computer Engineering, Istanbul Gelisim University, Istanbul, Turkey.
| |
|
22
|
Akinola OO, Ezugwu AE, Agushaka JO, Zitar RA, Abualigah L. Multiclass feature selection with metaheuristic optimization algorithms: a review. Neural Comput Appl 2022; 34:19751-19790. [PMID: 36060097 PMCID: PMC9424068 DOI: 10.1007/s00521-022-07705-4] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 01/19/2022] [Accepted: 08/02/2022] [Indexed: 11/24/2022]
Abstract
Selecting relevant feature subsets is vital in machine learning, and multiclass feature selection is harder to perform since most classifications are binary. The feature selection problem aims at reducing the feature set dimension while maintaining model accuracy. Datasets can be classified using various methods. Nevertheless, metaheuristic algorithms attract substantial attention for solving different optimization problems. For this reason, this paper presents a systematic survey of the literature on solving multiclass feature selection problems using metaheuristic algorithms that can help classifiers select optimal or near-optimal features faster and more accurately. Metaheuristic algorithms are presented in four primary behavior-based categories, i.e., evolutionary-based, swarm-intelligence-based, physics-based, and human-based, even though some works in the literature present more categories. Further, lists of metaheuristic algorithms are given for each of these categories. Only articles from 2000 to 2022 on metaheuristic algorithms used for multiclass feature selection problems were reviewed, with their categories and detailed descriptions. We considered some application areas for some of the metaheuristic algorithms applied to multiclass feature selection, along with their variations. Popular multiclass classifiers for feature selection were also examined. Moreover, we presented the challenges of metaheuristic algorithms for feature selection and identified gaps for further research.
Affiliation(s)
- Olatunji O. Akinola
- School of Mathematics, Statistics, and Computer Science, University of KwaZulu-Natal, King Edward Avenue, Pietermaritzburg Campus, Pietermaritzburg, 3201 KwaZulu-Natal South Africa
| | - Absalom E. Ezugwu
- School of Mathematics, Statistics, and Computer Science, University of KwaZulu-Natal, King Edward Avenue, Pietermaritzburg Campus, Pietermaritzburg, 3201 KwaZulu-Natal South Africa
| | - Jeffrey O. Agushaka
- School of Mathematics, Statistics, and Computer Science, University of KwaZulu-Natal, King Edward Avenue, Pietermaritzburg Campus, Pietermaritzburg, 3201 KwaZulu-Natal South Africa
| | - Raed Abu Zitar
- Sorbonne Center of Artificial Intelligence, Sorbonne University-Abu Dhabi, 38044 Abu Dhabi, United Arab Emirates
| | - Laith Abualigah
- Hourani Center for Applied Scientific Research, Al-Ahliyya Amman University, Amman, 19328 Jordan
- Faculty of Information Technology, Middle East University, Amman, 11831 Jordan
| |
|
23
|
A Highly Discriminative Hybrid Feature Selection Algorithm for Cancer Diagnosis. ScientificWorldJournal 2022; 2022:1056490. [PMID: 35983572 PMCID: PMC9381276 DOI: 10.1155/2022/1056490] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2022] [Accepted: 07/20/2022] [Indexed: 11/17/2022] Open
Abstract
Cancer is a deadly disease that occurs due to rapid and uncontrolled cell growth. In this article, a machine learning (ML) algorithm is proposed to diagnose different cancer diseases from big data. The algorithm comprises a two-stage hybrid feature selection. In the first stage, an overall ranker is initiated to combine the results of three filter-based feature evaluation methods, namely, chi-squared, F-statistic, and mutual information (MI). The features are then ordered according to this combination. In the second stage, the modified wrapper-based sequential forward selection is utilized to discover the optimal feature subset, using ML models such as support vector machine (SVM), decision tree (DT), random forest (RF), and K-nearest neighbor (KNN) classifiers. To examine the proposed algorithm, many tests have been carried out on four cancerous microarray datasets, employing in the process 10-fold cross-validation and hyperparameter tuning. The performance of the algorithm is evaluated by calculating the diagnostic accuracy. The results indicate that for the leukemia dataset, both SVM and KNN models register the highest accuracy at 100% using only 5 features. For the ovarian cancer dataset, the SVM model achieves the highest accuracy at 100% using only 6 features. For the small round blue cell tumor (SRBCT) dataset, the SVM model also achieves the highest accuracy at 100% using only 8 features. For the lung cancer dataset, the SVM model also achieves the highest accuracy at 99.57% using 19 features. By comparing with other algorithms, the results obtained from the proposed algorithm are superior in terms of the number of selected features and diagnostic accuracy.
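The two-stage procedure in this abstract (combine several filter rankings, then run a wrapper-based forward selection over the ordered features) can be sketched in Python as follows. Average-rank aggregation and the plain greedy forward pass are assumptions: the abstract does not say how the three filter scores are combined or how the sequential forward selection is modified. All names, scores, and the toy evaluation function are hypothetical.

```python
def aggregate_ranks(score_tables):
    """Order features by the sum of their ranks across several filter methods.

    score_tables: list of dicts mapping feature -> score (higher is better),
                  e.g. one dict each for chi-squared, F-statistic, and MI.
    """
    features = list(score_tables[0])
    total_rank = {f: 0 for f in features}
    for scores in score_tables:
        ordered = sorted(features, key=lambda f: scores[f], reverse=True)
        for rank, feature in enumerate(ordered, start=1):
            total_rank[feature] += rank
    return sorted(features, key=lambda f: total_rank[f])


def forward_select(ordered_features, evaluate):
    """Greedy forward pass: keep a feature only if it improves the score.

    evaluate: callable taking a feature list and returning a score to maximise
              (in practice, cross-validated accuracy of a classifier).
    """
    selected, best = [], float("-inf")
    for feature in ordered_features:
        score = evaluate(selected + [feature])
        if score > best:
            selected, best = selected + [feature], score
    return selected, best


if __name__ == "__main__":
    chi2 = {"g1": 3.0, "g2": 2.0, "g3": 1.0}
    f_stat = {"g1": 2.0, "g2": 3.0, "g3": 1.0}
    mutual_info = {"g1": 3.0, "g2": 1.0, "g3": 2.0}
    order = aggregate_ranks([chi2, f_stat, mutual_info])

    # Toy stand-in for cross-validated accuracy: g1 and g3 are informative,
    # and every selected feature costs a small penalty.
    def toy_score(subset):
        return len({"g1", "g3"} & set(subset)) - 0.1 * len(subset)

    print(order, forward_select(order, toy_score))
```

Ordering the candidates by the aggregated filter rank means the wrapper evaluates the classifier once per feature rather than once per remaining feature at every step, which keeps the second stage affordable on microarray data.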
|
24
|
A novel biomarker selection method combining graph neural network and gene relationships applied to microarray data. BMC Bioinformatics 2022; 23:303. [PMID: 35883022 PMCID: PMC9327232 DOI: 10.1186/s12859-022-04848-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2022] [Accepted: 07/15/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The discovery of critical biomarkers is significant for clinical diagnosis, drug research and development. Researchers usually obtain biomarkers from microarray data, which suffer from the curse of dimensionality. Feature selection in machine learning is usually used to solve this problem. However, most methods do not fully consider feature dependence, especially the real pathway relationship of genes. RESULTS Experimental results show that the proposed method is superior to classical algorithms and advanced methods in feature number and accuracy, and the selected features have more significance. METHOD This paper proposes a feature selection method based on a graph neural network. The proposed method uses the actual dependencies between features and the Pearson correlation coefficient to construct graph-structured data. The information dissemination and aggregation operations based on the graph neural network are applied to fuse node information on the graph-structured data. The redundant features are clustered by the spectral clustering method. Then, a feature ranking aggregation model using eight feature evaluation methods is applied to each cluster to select features. CONCLUSION The proposed method can effectively remove redundant features. The algorithm's output has high stability and classification accuracy, so the method can be used to select potential biomarkers.
|