1
|
Al-Shalif SA, Senan N, Saeed F, Ghaban W, Ibrahim N, Aamir M, Sharif W. A systematic literature review on meta-heuristic based feature selection techniques for text classification. PeerJ Comput Sci 2024; 10:e2084. [PMID: 38983195 PMCID: PMC11232610 DOI: 10.7717/peerj-cs.2084] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2023] [Accepted: 05/03/2024] [Indexed: 07/11/2024]
Abstract
Feature selection (FS) is a critical step in many data science-based applications, especially in text classification, as it includes selecting relevant and important features from an original feature set. This process can improve learning accuracy, streamline learning duration, and simplify outcomes. In text classification, there are often many excessive and unrelated features that impact performance of the applied classifiers, and various techniques have been suggested to tackle this problem, categorized as traditional techniques and meta-heuristic (MH) techniques. In order to discover the optimal subset of features, FS processes require a search strategy, and MH techniques use various strategies to strike a balance between exploration and exploitation. The goal of this research article is to systematically analyze the MH techniques used for FS between 2015 and 2022, focusing on 108 primary studies from three different databases such as Scopus, Science Direct, and Google Scholar to identify the techniques used, as well as their strengths and weaknesses. The findings indicate that MH techniques are efficient and outperform traditional techniques, with the potential for further exploration of MH techniques such as Ringed Seal Search (RSS) to improve FS in several applications.
Affiliation(s)
- Sarah Abdulkarem Al-Shalif
- Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia, Parit Raja, Johor, Malaysia
| | - Norhalina Senan
- Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia, Parit Raja, Johor, Malaysia
| | - Faisal Saeed
- DAAI Research Group, Department of Computing and Data Science, School of Computing and Digital Technology, University of Birmingham, Birmingham, United Kingdom
| | - Wad Ghaban
- Applied College, University of Tabuk, Tabuk, Saudi Arabia
| | - Noraini Ibrahim
- Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia, Parit Raja, Johor, Malaysia
| | - Muhammad Aamir
- School of Electronics, Computing and Mathematics, University of Derby, Derby, United Kingdom
| | - Wareesa Sharif
- Faculty of Computing, The Islamia University of Bahawalpur, Bahawalpur, Pakistan
| |
|
2
|
Chen Z, Xinxian L, Guo R, Zhang L, Dhahbi S, Bourouis S, Liu L, Wang X. Dispersed differential hunger games search for high dimensional gene data feature selection. Comput Biol Med 2023; 163:107197. [PMID: 37390761 DOI: 10.1016/j.compbiomed.2023.107197] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/08/2023] [Revised: 06/08/2023] [Accepted: 06/19/2023] [Indexed: 07/02/2023]
Abstract
The realms of modern medicine and biology have provided substantial data sets of genetic roots that exhibit a high dimensionality. Clinical practice and associated processes are primarily dependent on data-driven decision-making. However, the high dimensionality of the data in these domains increases the complexity and size of processing. It can be challenging to determine representative genes while reducing the data's dimensionality. A successful gene selection will serve to mitigate the computing costs and refine the accuracy of the classification by eliminating superfluous or duplicative features. To address this concern, this research suggests a wrapper gene selection approach based on the HGS, combined with a dispersed foraging strategy and a differential evolution strategy, to form a new algorithm named DDHGS. Introducing the DDHGS algorithm to the global optimization field and its binary derivative bDDHGS to the feature selection problem is anticipated to refine the existing search balance between explorative and exploitative cores. We assess and confirm the efficacy of our proposed method, DDHGS, by comparing it with DE and HGS combined with a single strategy, seven classic algorithms, and ten advanced algorithms on the IEEE CEC 2017 test suite. Furthermore, to further evaluate DDHGS' performance, we compare it with several CEC winners and DE-based techniques of great efficiency on 23 popular optimization functions and the IEEE CEC 2014 benchmark test suite. The experimentation asserted that the bDDHGS approach was able to surpass bHGS and a variety of existing methods when applied to fourteen feature selection datasets from the UCI repository. The metrics measured--classification accuracy, the number of selected features, fitness scores, and execution time--all showed marked improvements with the use of bDDHGS. Considering all results, it can be concluded that bDDHGS is an optimal optimizer and an effective feature selection tool in the wrapper mode.
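The abstract above reports classification accuracy, number of selected features, and fitness scores as separate metrics. As a rough illustration of how a wrapper method can fold the first two into one fitness value, here is a minimal sketch. The weighted-sum form and the weight `alpha` are a common convention in the feature selection literature and are an assumption here, not the fitness function defined in the cited paper; `wrapper_fitness` is a hypothetical name.

```python
def wrapper_fitness(error_rate, mask, alpha=0.99):
    """Score a feature subset; lower is better.

    error_rate: classification error of a model trained on the selected
                features (assumed to be computed elsewhere).
    mask:       list of 0/1 flags, one per original feature.
    alpha:      assumed trade-off weight between error and subset size.
    """
    n_selected = sum(mask)
    if n_selected == 0:
        # An empty subset cannot train a classifier, so treat it as worst case.
        return float("inf")
    size_ratio = n_selected / len(mask)
    return alpha * error_rate + (1.0 - alpha) * size_ratio


# Two subsets with the same error: the smaller subset gets the lower score.
small = wrapper_fitness(0.10, [1, 0, 0, 0])
large = wrapper_fitness(0.10, [1, 1, 1, 0])
print(small < large)
```

With `alpha` close to 1 the error term dominates, so subset size mainly acts as a tie-breaker between subsets of similar accuracy.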
Affiliation(s)
- Zhiqing Chen
- School of Intelligent Manufacturing, Wenzhou Polytechnic, Wenzhou, 325035, China.
| | - Li Xinxian
- Wenzhou Vocational College of Science and Technology, Wenzhou, 325006, China.
| | - Ran Guo
- Cyberspace Institute Advanced Technology, Guangzhou University, Guangzhou, 510006, China.
| | - Lejun Zhang
- Cyberspace Institute Advanced Technology, Guangzhou University, Guangzhou, 510006, China; College of Information Engineering, Yangzhou University, Yangzhou, 225127, China; Research and Development Center for E-Learning, Ministry of Education, Beijing, 100039, China.
| | - Sami Dhahbi
- Department of Computer Science, College of Science and Art at Mahayil, King Khalid University, Muhayil, Aseer, 62529, Saudi Arabia.
| | - Sami Bourouis
- Department of Information Technology, College of Computers and Information Technology, Taif University, P.O.Box 11099, Taif, 21944, Saudi Arabia.
| | - Lei Liu
- College of Computer Science, Sichuan University, Chengdu, Sichuan, 610065, China.
| | - Xianchuan Wang
- Information Technology Center, Wenzhou Medical University, Wenzhou, 325035, China.
| |
|
3
|
Chen Z, Xuan P, Heidari AA, Liu L, Wu C, Chen H, Escorcia-Gutierrez J, Mansour RF. An artificial bee bare-bone hunger games search for global optimization and high-dimensional feature selection. iScience 2023; 26:106679. [PMID: 37216098 PMCID: PMC10193239 DOI: 10.1016/j.isci.2023.106679] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2023] [Revised: 03/01/2023] [Accepted: 04/12/2023] [Indexed: 05/24/2023]
Abstract
The domains of contemporary medicine and biology have generated substantial high-dimensional genetic data. Identifying representative genes and decreasing the dimensionality of the data can be challenging. The goal of gene selection is to minimize computing costs and enhance classification precision. Therefore, this article designs a new wrapper gene selection algorithm named artificial bee bare-bone hunger games search (ABHGS), which is the hunger games search (HGS) integrated with an artificial bee strategy and a Gaussian bare-bone structure to address this issue. To evaluate and validate the performance of our proposed method, ABHGS is compared to HGS and a single strategy embedded in HGS, six classic algorithms, and ten advanced algorithms on the CEC 2017 functions. The experimental results demonstrate that the bABHGS outperforms the original HGS. Compared to peers, it increases classification accuracy and decreases the number of selected features, indicating its actual engineering utility in spatial search and feature selection.
Affiliation(s)
- Zhiqing Chen
- School of Intelligent Manufacturing, Wenzhou Polytechnic, Wenzhou 325035, China
| | - Ping Xuan
- Department of Computer Science, School of Engineering, Shantou University, Shantou 515063, China
| | - Ali Asghar Heidari
- Key Laboratory of Intelligent Informatics for Safety & Emergency of Zhejiang Province, Wenzhou University, Wenzhou 325035, China
| | - Lei Liu
- College of Computer Science, Sichuan University, Chengdu, Sichuan 610065, China
| | - Chengwen Wu
- Key Laboratory of Intelligent Informatics for Safety & Emergency of Zhejiang Province, Wenzhou University, Wenzhou 325035, China
| | - Huiling Chen
- Key Laboratory of Intelligent Informatics for Safety & Emergency of Zhejiang Province, Wenzhou University, Wenzhou 325035, China
| | - José Escorcia-Gutierrez
- Department of Computational Science and Electronics, Universidad de la Costa, CUC, Barranquilla 080002, Colombia
| | - Romany F. Mansour
- Department of Mathematics, Faculty of Science, New Valley University, El-Kharga 72511, Egypt
| |
|
4
|
Khishe M. Greedy opposition-based learning for chimp optimization algorithm. Artif Intell Rev 2022. [DOI: 10.1007/s10462-022-10343-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2022]
|
5
|
Evolving chimp optimization algorithm by weighted opposition-based technique and greedy search for multimodal engineering problems. Appl Soft Comput 2022. [DOI: 10.1016/j.asoc.2022.109869] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2022]
|
6
|
Dokeroglu T, Deniz A, Kiziloz HE. A comprehensive survey on recent metaheuristics for feature selection. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.04.083] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 01/05/2023]
|
7
|
|
8
|
The Bombus-terrestris bee optimization algorithm for feature selection. Appl Intell 2022. [DOI: 10.1007/s10489-022-03478-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]
|
9
|
Abstract
Communication via email has expanded dramatically in recent decades due to its cost-effectiveness, convenience, speed, and utility for a variety of contexts, including social, scientific, cultural, political, authentication, and advertising applications. Spam is an email sent to a large number of individuals or organizations without the recipient's desire or request. It is increasingly becoming a harmful part of email traffic and can negatively affect the usability of email systems. Such emails consume network bandwidth as well as storage space, causing email systems to slow down, wasting time and effort scanning and eliminating enormous amounts of useless information. Spam is also used for distributing offensive and harmful content on the Internet. The objective of the current study was to develop a new method for email spam detection with high accuracy and a low error rate. There are several methods to recognize, detect, filter, categorize, and delete spam emails, and almost the majority of the proposed methods have some extent of error rate. None of the spam detection techniques, despite the optimizations performed, have been effective alone. A step in text mining and message classification is feature selection, and one of the best approaches for feature selection is the use of metaheuristic algorithms. This article introduces a new method for detecting spam using the Horse herd metaheuristic Optimization Algorithm (HOA). First, the continuous HOA was transformed into a discrete algorithm. The inputs of the resulting algorithm then became opposition-based and then converted to multiobjective. Finally, it was used for spam detection, which is a discrete and multiobjective problem. The evaluation results indicate that the proposed method performs better compared to other methods such as K-nearest neighbours-grey wolf optimisation, K-nearest neighbours, multilayer perceptron, support vector machine, and Naive Bayesian.
The results show that the new multiobjective opposition-based binary horse herd optimizer, running on the UCI data set, has been more successful in the average selection size and classification accuracy compared with other standard metaheuristic methods. According to the findings, the proposed algorithm is substantially more accurate in detecting spam emails in the data set in comparison with other similar algorithms, and it shows lower computational complexity.
|
10
|
B-MFO: A Binary Moth-Flame Optimization for Feature Selection from Medical Datasets. Computers 2021. [DOI: 10.3390/computers10110136] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Indexed: 11/16/2022]
Abstract
Advancements in medical technology have created numerous large datasets including many features. Usually, all captured features are not necessary, and there are redundant and irrelevant features, which reduce the performance of algorithms. To tackle this challenge, many metaheuristic algorithms are used to select effective features. However, most of them are not effective and scalable enough to select effective features from large medical datasets as well as small ones. Therefore, in this paper, a binary moth-flame optimization (B-MFO) is proposed to select effective features from small and large medical datasets. Three categories of B-MFO were developed using S-shaped, V-shaped, and U-shaped transfer functions to convert the canonical MFO from continuous to binary. These categories of B-MFO were evaluated on seven medical datasets and the results were compared with four well-known binary metaheuristic optimization algorithms: BPSO, bGWO, BDA, and BSSA. In addition, the convergence behavior of the B-MFO and comparative algorithms were assessed, and the results were statistically analyzed using the Friedman test. The experimental results demonstrate a superior performance of B-MFO in solving the feature selection problem for different medical datasets compared to other comparative algorithms.
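The abstract above mentions S-shaped and V-shaped transfer functions for converting a continuous optimizer to a binary one. The sketch below shows the two textbook forms (a sigmoid and `|tanh|`) and the update rules usually paired with them. These specific functions and the helper names are assumptions for illustration; the cited paper evaluates several variants, including U-shaped ones, that are not reproduced here.

```python
import math
import random


def s_shaped(x):
    """Sigmoid: maps a real value to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def v_shaped(x):
    """|tanh(x)|: values far from zero give a high flip probability."""
    return abs(math.tanh(x))


def binarize_s(position, rng):
    """S-shaped rule: set bit j to 1 with probability s_shaped(position[j])."""
    return [1 if rng.random() < s_shaped(x) else 0 for x in position]


def binarize_v(position, current_bits, rng):
    """V-shaped rule: flip bit j with probability v_shaped(position[j])."""
    return [1 - b if rng.random() < v_shaped(x) else b
            for x, b in zip(position, current_bits)]


rng = random.Random(0)            # fixed seed so runs are repeatable
continuous = [-2.0, 0.0, 3.5]     # a made-up continuous position vector
mask = binarize_s(continuous, rng)
mask = binarize_v(continuous, mask, rng)
print(mask)                       # a 0/1 feature mask of length 3
```

The practical difference is that the S-shaped rule draws each bit afresh, while the V-shaped rule keeps the current bit unless the continuous value is large in magnitude.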
|
11
|
Wei Q, Wang C, Wen Y. Minimum attribute reduction algorithm based on quick extraction and multi-strategy social spider optimization. Journal of Intelligent & Fuzzy Systems 2021. [DOI: 10.3233/jifs-210133] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/15/2022]
Abstract
Intelligent optimization algorithm combined with rough set theory to solve minimum attribute reduction (MAR) is time consuming due to repeated evaluations of the same position. The algorithm also finds in poor solution quality because individuals are not fully explored in space. This study proposed an algorithm based on quick extraction and multi-strategy social spider optimization (QSSOAR). First, a similarity constraint strategy was called to constrain the initial state of the population. In the iterative process, an adaptive opposition-based learning (AOBL) was used to enlarge the search space. To obtain a reduction with fewer attributes, the dynamic redundancy detection (DRD) strategy was applied to remove redundant attributes in the reduction result. Furthermore, the quick extraction strategy was introduced to avoid multiple repeated computations in this paper. By combining an array with key-value pairs, the corresponding value can be obtained by simple comparison. The proposed algorithm and four representative algorithms were compared on nine UCI datasets. The results show that the proposed algorithm performs well in reduction ability, running time, and convergence speed. Meanwhile, the results confirm the superiority of the algorithm in solving MAR.
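Opposition-based learning, which the abstract above uses in an adaptive form (AOBL), rests on a simple idea: for each candidate, also evaluate its mirror image within the search bounds. The sketch below shows only the basic, non-adaptive version on a continuous toy problem. The function names and the keep-the-best selection step are illustrative assumptions and do not reproduce the AOBL strategy of the cited paper.

```python
def opposite(x, lower, upper):
    """Mirror each coordinate inside its bounds: x_opp = lower + upper - x."""
    return [lo + hi - xi for xi, lo, hi in zip(x, lower, upper)]


def obl_step(population, lower, upper, cost):
    """Add the opposite of every candidate, then keep the best half by cost."""
    candidates = population + [opposite(x, lower, upper) for x in population]
    return sorted(candidates, key=cost)[:len(population)]


def sphere(x):
    """Toy cost function (assumed for the demo): sum of squares."""
    return sum(v * v for v in x)


population = [[9.0], [8.0]]
improved = obl_step(population, lower=[0.0], upper=[10.0], cost=sphere)
print(improved)
```

Here both original candidates sit far from the optimum at 0, so their opposites replace them after the selection step.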
Affiliation(s)
- Qianjin Wei
- Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin, China
| | - Chengxian Wang
- School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, China
| | - Yimin Wen
- Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin, China
| |
|
12
|
Abd Elaziz M, Yousri D, Mirjalili S. A hybrid Harris hawks-moth-flame optimization algorithm including fractional-order chaos maps and evolutionary population dynamics. Advances in Engineering Software 2021; 154:102973. [DOI: 10.1016/j.advengsoft.2021.102973] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 09/02/2023]
|
13
|
Ekinci S, Izci D, Hekimoğlu B. Optimal FOPID Speed Control of DC Motor via Opposition-Based Hybrid Manta Ray Foraging Optimization and Simulated Annealing Algorithm. Arabian Journal for Science and Engineering 2021. [DOI: 10.1007/s13369-020-05050-z] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 01/10/2023]
|
14
|
Abdel-Basset M, Chang V, Mohamed R. HSMA_WOA: A hybrid novel Slime mould algorithm with whale optimization algorithm for tackling the image segmentation problem of chest X-ray images. Appl Soft Comput 2020; 95:106642. [PMID: 32843887 PMCID: PMC7439973 DOI: 10.1016/j.asoc.2020.106642] [Citation(s) in RCA: 86] [Impact Index Per Article: 17.2] [Received: 07/20/2020] [Revised: 08/11/2020] [Accepted: 08/12/2020] [Indexed: 01/28/2023]
Abstract
Recently, a novel virus called COVID-19 has pervasive worldwide, starting from China and moving to all the world to eliminate a lot of persons. Many attempts have been experimented to identify the infection with COVID-19. The X-ray images were one of the attempts to detect the influence of COVID-19 on the infected persons from involving those experiments. According to the X-ray analysis, bilateral pulmonary parenchymal ground-glass and consolidative pulmonary opacities can be caused by COVID-19 - sometimes with a rounded morphology and a peripheral lung distribution. But unfortunately, the specification or if the person infected with COVID-19 or not is so hard under the X-ray images. X-ray images could be classified using the machine learning techniques to specify if the person infected severely, mild, or not infected. To improve the classification accuracy of the machine learning, the region of interest within the image that contains the features of COVID-19 must be extracted. This problem is called the image segmentation problem (ISP). Many techniques have been proposed to overcome ISP. The most commonly used technique due to its simplicity, speed, and accuracy are threshold-based segmentation. This paper proposes a new hybrid approach based on the thresholding technique to overcome ISP for COVID-19 chest X-ray images by integrating a novel meta-heuristic algorithm known as a slime mold algorithm (SMA) with the whale optimization algorithm to maximize the Kapur's entropy. The performance of integrated SMA has been evaluated on 12 chest X-ray images with threshold levels up to 30 and compared with five algorithms: Lshade algorithm, whale optimization algorithm (WOA), FireFly algorithm (FFA), Harris-hawks algorithm (HHA), salp swarm algorithms (SSA), and the standard SMA. 
The experimental results demonstrate that the proposed algorithm outperforms SMA under Kapur's entropy for all the metrics used and the standard SMA could perform better than the other algorithms in the comparison under all the metrics.
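The abstract above uses Kapur's entropy as the objective that the thresholds should maximize. A minimal sketch of that objective on a grey-level histogram follows. The bin convention (class k covers bins from one threshold up to, but not including, the next) and the function name are assumptions for illustration, and the exhaustive single-threshold search stands in for the meta-heuristic search of the cited paper.

```python
import math


def kapur_entropy(hist, thresholds):
    """Sum of the Shannon entropies of the classes cut out by the thresholds.

    hist:       list of non-negative bin counts.
    thresholds: bin indices; class k covers bins [t_(k-1), t_k).
    """
    total = sum(hist)
    probs = [h / total for h in hist]
    edges = [0] + sorted(thresholds) + [len(hist)]
    entropy = 0.0
    for lo, hi in zip(edges, edges[1:]):
        weight = sum(probs[lo:hi])      # probability mass of this class
        if weight == 0:
            continue                    # an empty class contributes nothing
        for p in probs[lo:hi]:
            if p > 0:
                entropy -= (p / weight) * math.log(p / weight)
    return entropy


# Toy 8-bin histogram (made up); try every single threshold and keep the best.
hist = [5, 9, 4, 1, 1, 6, 10, 3]
best_t = max(range(1, len(hist)), key=lambda t: kapur_entropy(hist, [t]))
print(best_t, kapur_entropy(hist, [best_t]))
```

Exhaustive search is only feasible for one or two thresholds; with many threshold levels the number of combinations grows quickly, which is the setting where a meta-heuristic search becomes attractive.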
Affiliation(s)
| | - Victor Chang
- School of Computing, Engineering and Digital Technologies, Teesside University, UK
| | - Reda Mohamed
- Faculty of Computers and Informatics, Zagazig University, Sharqiyah, Egypt
| |
|
15
|
Yin Q, Cao B, Li X, Wang B, Zhang Q, Wei X. An Intelligent Optimization Algorithm for Constructing a DNA Storage Code: NOL-HHO. Int J Mol Sci 2020; 21:E2191. [PMID: 32235762 PMCID: PMC7139338 DOI: 10.3390/ijms21062191] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 02/12/2020] [Revised: 03/07/2020] [Accepted: 03/18/2020] [Indexed: 11/16/2022]
Abstract
The high density, large capacity, and long-term stability of DNA molecules make them an emerging storage medium that is especially suitable for the long-term storage of large datasets. The DNA sequences used in storage need to consider relevant constraints to avoid nonspecific hybridization reactions, such as the No-runlength constraint, GC-content, and the Hamming distance. In this work, a new nonlinear control parameter strategy and a random opposition-based learning strategy were used to improve the Harris hawks optimization algorithm (for the improved algorithm NOL-HHO) in order to prevent it from falling into local optima. Experimental testing was performed on 23 widely used benchmark functions, and the proposed algorithm was used to obtain better coding lower bounds for DNA storage. The results show that our algorithm can better maintain a smooth transition between exploration and exploitation and has stronger global exploration capabilities as compared with other algorithms. At the same time, the improvement of the lower bound directly affects the storage capacity and code rate, which promotes the further development of DNA storage technology.
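The three constraints named in the abstract above (No-runlength, GC-content, Hamming distance) are each easy to check for a candidate set of codewords. The sketch below is one plausible reading of them. The 50% GC target, the function names, and the example codewords are assumptions for illustration and are not taken from the cited paper.

```python
from itertools import combinations


def gc_content(seq):
    """Fraction of bases in the sequence that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


def has_no_runs(seq):
    """No-runlength constraint: no two adjacent bases are identical."""
    return all(a != b for a, b in zip(seq, seq[1:]))


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def is_valid_code(words, min_distance, gc_target=0.5):
    """Check all three constraints for a set of codewords."""
    each_ok = all(has_no_runs(w) and gc_content(w) == gc_target for w in words)
    pairs_ok = all(hamming(a, b) >= min_distance
                   for a, b in combinations(words, 2))
    return each_ok and pairs_ok


print(is_valid_code(["ACGT", "TGCA"], min_distance=3))   # both words pass
print(is_valid_code(["AACG", "TGCA"], min_distance=3))   # "AACG" has a run
```

A search algorithm would then look for the largest set of words of a given length that passes `is_valid_code`, and the size of that set is the coding lower bound the abstract refers to.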
Affiliation(s)
- Qiang Yin
- The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian 116622, China
| | - Ben Cao
- The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian 116622, China
| | - Xue Li
- The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian 116622, China
| | - Bin Wang
- The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian 116622, China
| | - Qiang Zhang
- The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian 116622, China
| | - Xiaopeng Wei
- The Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian 116622, China
| |
|
16
|
Asilian Bidgoli A, Ebrahimpour-Komleh H, Rahnamayan S. An evolutionary decomposition-based multi-objective feature selection for multi-label classification. PeerJ Comput Sci 2020; 6:e261. [PMID: 33816913 PMCID: PMC7924502 DOI: 10.7717/peerj-cs.261] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 11/07/2019] [Accepted: 01/22/2020] [Indexed: 05/25/2023]
Abstract
Data classification is a fundamental task in data mining. Within this field, the classification of multi-labeled data has been seriously considered in recent years. In such problems, each data entity can simultaneously belong to several categories. Multi-label classification is important because of many recent real-world applications in which each entity has more than one label. To improve the performance of multi-label classification, feature selection plays an important role. It involves identifying and removing irrelevant and redundant features that unnecessarily increase the dimensions of the search space for the classification problems. However, classification may fail with an extreme decrease in the number of relevant features. Thus, minimizing the number of features and maximizing the classification accuracy are two desirable but conflicting objectives in multi-label feature selection. In this article, we introduce a multi-objective optimization algorithm customized for selecting the features of multi-label data. The proposed algorithm is an enhanced variant of a decomposition-based multi-objective optimization approach, in which the multi-label feature selection problem is divided into single-objective subproblems that can be simultaneously solved using an evolutionary algorithm. This approach leads to accelerating the optimization process and finding more diverse feature subsets. The proposed method benefits from a local search operator to find better solutions for each subproblem. We also define a pool of genetic operators to generate new feature subsets based on old generation. To evaluate the performance of the proposed algorithm, we compare it with two other multi-objective feature selection approaches on eight real-world benchmark datasets that are commonly used for multi-label classification. 
The reported results of multi-objective method evaluation measures, such as hypervolume indicator and set coverage, illustrate an improvement in the results obtained by the proposed method. Moreover, the proposed method achieved better results in terms of classification accuracy with fewer features compared with state-of-the-art methods.
Affiliation(s)
- Azam Asilian Bidgoli
- Department of Electrical and Computer Engineering, University of Kashan, Kashan, Iran
| | | | - Shahryar Rahnamayan
- Nature Inspired Computational Intelligence (NICI) Lab, Department of Electrical, Computer, and Software Engineering, Ontario Tech University, Oshawa, ON, Canada
| |
|