1
Geng Y, Li Y, Deng C. An Improved Binary Walrus Optimizer with Golden Sine Disturbance and Population Regeneration Mechanism to Solve Feature Selection Problems. Biomimetics (Basel) 2024; 9:501. [PMID: 39194480 DOI: 10.3390/biomimetics9080501] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2024] [Revised: 08/13/2024] [Accepted: 08/14/2024] [Indexed: 08/29/2024] Open
Abstract
Feature selection (FS) is a significant dimensionality reduction technique in machine learning and data mining that is adept at managing high-dimensional data efficiently and enhancing model performance. Metaheuristic algorithms have become one of the most promising solutions in FS owing to their powerful search capabilities as well as their performance. In this paper, a novel improved binary walrus optimizer (WO) algorithm utilizing the golden sine strategy, elite opposition-based learning (EOBL), and a population regeneration mechanism (BGEPWO) is proposed for FS. First, the population is initialized using the iterative chaotic map with infinite collapses (ICMIC) to improve diversity. Second, a safe signal is obtained by introducing an adaptive operator to enhance the stability of the WO and optimize the trade-off between exploration and exploitation of the algorithm. Third, BGEPWO introduces a population regeneration mechanism that continuously eliminates hopeless individuals and generates new promising ones, which keeps the population moving toward the optimal solution and accelerates the convergence process. Fourth, EOBL is used to guide the escape behavior of the walrus to expand the search range. Finally, the golden sine strategy is used to perturb the population in late iterations to improve the algorithm's capacity to evade local optima. The BGEPWO algorithm was evaluated on 21 datasets of different sizes and was compared with the BWO algorithm and 10 other representative optimization algorithms. The experimental results demonstrate that BGEPWO outperforms these competing algorithms in terms of fitness value, number of selected features, and F1-score on most datasets. The proposed algorithm achieves higher accuracy, better feature reduction ability, and stronger convergence by increasing population diversity, continuously balancing exploration and exploitation, and effectively escaping local optima.
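The ICMIC initialization mentioned in the abstract above can be illustrated with a small sketch. It assumes the commonly cited form of the map, x_{n+1} = sin(a / x_n), and it is not the authors' implementation: the parameter value `a=2.0`, the seed range, the 0.5 threshold used to turn chaotic values into a binary feature mask, and the function names `icmic_sequence` and `init_population` are all assumptions made for illustration.

```python
import math
import random


def icmic_sequence(length, a=2.0, x0=0.7):
    """Return `length` values in [-1, 1] from the map x_{n+1} = sin(a / x_n)."""
    values = []
    x = x0
    for _ in range(length):
        # Guard against division by zero; the nudge is an assumption of this sketch.
        if x == 0.0:
            x = 1e-12
        x = math.sin(a / x)
        values.append(x)
    return values


def init_population(pop_size, n_features, seed=0):
    """Build a binary population, one chaotic sequence per individual."""
    rng = random.Random(seed)
    population = []
    for _ in range(pop_size):
        # A different starting point per individual gives different sequences.
        x0 = rng.uniform(0.1, 0.9)
        chaos = icmic_sequence(n_features, x0=x0)
        # Rescale [-1, 1] to [0, 1], then threshold at 0.5 (assumed rule).
        mask = [1 if (c + 1.0) / 2.0 > 0.5 else 0 for c in chaos]
        population.append(mask)
    return population
```

Calling `init_population(5, 8)` returns five masks of eight bits each, where a 1 marks a selected feature.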
Affiliation(s)
- Yanyu Geng
- College of Computer Science and Technology, Jilin University, Changchun 130012, China
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
- Ying Li
- College of Computer Science and Technology, Jilin University, Changchun 130012, China
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
- Chunyan Deng
- College of Computer Science and Technology, Jilin University, Changchun 130012, China
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
2
Barrera-García J, Cisternas-Caneo F, Crawford B, Gómez Sánchez M, Soto R. Feature Selection Problem and Metaheuristics: A Systematic Literature Review about Its Formulation, Evaluation and Applications. Biomimetics (Basel) 2023; 9:9. [PMID: 38248583 PMCID: PMC10813816 DOI: 10.3390/biomimetics9010009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/25/2023] [Revised: 12/16/2023] [Accepted: 12/18/2023] [Indexed: 01/23/2024] Open
Abstract
Feature selection is becoming a relevant problem within the field of machine learning. The feature selection problem focuses on the selection of the small, necessary, and sufficient subset of features that represent the general set of features, eliminating redundant and irrelevant information. Given the importance of the topic, in recent years there has been a boom in the study of the problem, generating a large number of related investigations. Given this, this work analyzes 161 articles published between 2019 and 2023 (20 April 2023), emphasizing the formulation of the problem and performance measures, and proposing classifications for the objective functions and evaluation metrics. Furthermore, an in-depth description and analysis of metaheuristics, benchmark datasets, and practical real-world applications are presented. Finally, in light of recent advances, this review paper provides future research opportunities.
Affiliation(s)
- José Barrera-García
- Escuela de Ingeniería Informática, Pontificia Universidad Católica de Valparaíso, Avenida Brasil 2241, Valparaíso 2362807, Chile
- Felipe Cisternas-Caneo
- Escuela de Ingeniería Informática, Pontificia Universidad Católica de Valparaíso, Avenida Brasil 2241, Valparaíso 2362807, Chile
- Broderick Crawford
- Escuela de Ingeniería Informática, Pontificia Universidad Católica de Valparaíso, Avenida Brasil 2241, Valparaíso 2362807, Chile
- Mariam Gómez Sánchez
- Departamento de Electrotecnia e Informática, Universidad Técnica Federico Santa María, Federico Santa María 6090, Viña del Mar 2520000, Chile
- Ricardo Soto
- Escuela de Ingeniería Informática, Pontificia Universidad Católica de Valparaíso, Avenida Brasil 2241, Valparaíso 2362807, Chile
3
Dib FK, Rodgers P. Graph drawing using Jaya. PLoS One 2023; 18:e0287744. [PMID: 37368896 DOI: 10.1371/journal.pone.0287744] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2023] [Accepted: 06/13/2023] [Indexed: 06/29/2023] Open
Abstract
Graph drawing, involving the automatic layout of graphs, is vital for clear data visualization and interpretation but poses challenges due to the optimization of a multi-metric objective function, an area where current search-based methods seek improvement. In this paper, we investigate the performance of Jaya algorithm for automatic graph layout with straight lines. Jaya algorithm has not been previously used in the field of graph drawing. Unlike most population-based methods, Jaya algorithm is a parameter-less algorithm in that it requires no algorithm-specific control parameters and only population size and number of iterations need to be specified, which makes it easy for researchers to apply in the field. To improve Jaya algorithm's performance, we applied Latin Hypercube Sampling to initialize the population of individuals so that they widely cover the search space. We developed a visualization tool that simplifies the integration of search methods, allowing for easy performance testing of algorithms on graphs with weighted aesthetic metrics. We benchmarked the Jaya algorithm and its enhanced version against Hill Climbing and Simulated Annealing, commonly used graph-drawing search algorithms which have a limited number of parameters, to demonstrate Jaya algorithm's effectiveness in the field. We conducted experiments on synthetic datasets with varying numbers of nodes and edges using the Erdős-Rényi model and real-world graph datasets and evaluated the quality of the generated layouts, and the performance of the methods based on number of function evaluations. We also conducted a scalability experiment on Jaya algorithm to evaluate its ability to handle large-scale graphs. Our results showed that Jaya algorithm significantly outperforms Hill Climbing and Simulated Annealing in terms of the quality of the generated graph layouts and the speed at which the layouts were produced. 
Using improved population sampling generated better layouts compared to the original Jaya algorithm using the same number of function evaluations. Moreover, Jaya algorithm was able to draw layouts for graphs with 500 nodes in a reasonable time.
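The parameter-less character of Jaya described in the abstract can be seen in a minimal sketch of its standard update rule, in which each candidate moves toward the current best solution and away from the current worst. This is a generic continuous minimizer, not the graph-layout implementation from the paper: the function name `jaya_minimize`, the uniform random initialization (the paper uses Latin Hypercube Sampling), and the bound clamping are assumptions of the sketch.

```python
import random


def jaya_minimize(objective, bounds, pop_size=20, iterations=200, seed=0):
    """Minimize `objective` over box `bounds` with the basic Jaya update."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Uniform random start (assumption; the paper uses Latin Hypercube Sampling).
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(x) for x in pop]

    for _ in range(iterations):
        best = pop[scores.index(min(scores))]
        worst = pop[scores.index(max(scores))]
        for i, x in enumerate(pop):
            candidate = []
            for j in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Move toward the best and away from the worst solution.
                v = x[j] + r1 * (best[j] - abs(x[j])) - r2 * (worst[j] - abs(x[j]))
                lo, hi = bounds[j]
                candidate.append(min(max(v, lo), hi))  # clamp to the bounds
            score = objective(candidate)
            # Greedy acceptance: keep the candidate only if it improves.
            if score < scores[i]:
                pop[i], scores[i] = candidate, score

    k = scores.index(min(scores))
    return pop[k], scores[k]
```

Only the population size and the iteration count need to be chosen, for example `jaya_minimize(lambda v: sum(t * t for t in v), [(-5.0, 5.0)] * 3)`.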
Affiliation(s)
- Fadi K Dib
- Computer Science Department, Center for Applied Mathematics and Bioinformatics (CAMB), Gulf University for Science and Technology (GUST), Hawally, Kuwait
- Peter Rodgers
- School of Computing, University of Kent, Canterbury, Kent, United Kingdom
4
Braik M, Awadallah MA, Al-Betar M, Hammouri AI, Alzubi OA. Cognitively Enhanced Versions of Capuchin Search Algorithm for Feature Selection in Medical Diagnosis: a COVID-19 Case Study. Cognit Comput 2023:1-38. [PMID: 37362196 PMCID: PMC10241154 DOI: 10.1007/s12559-023-10149-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2022] [Accepted: 04/28/2023] [Indexed: 06/28/2023]
Abstract
Feature selection (FS) is a crucial area of cognitive computation that demands further studies. It has recently received a lot of attention from researchers working in machine learning and data mining. It is broadly employed in many different applications. Many enhanced strategies have been created for FS methods in cognitive computation to boost the performance of the methods. The goal of this paper is to present three adaptive versions of the capuchin search algorithm (CSA) that each features a better search ability than the parent CSA. These versions are used to select optimal feature subset based on a binary version of each adapted one and the k-Nearest Neighbor (k-NN) classifier. These versions were matured by applying several strategies, including automated control of inertia weight, acceleration coefficients, and other computational factors, to ameliorate search potency and convergence speed of CSA. In the velocity model of CSA, some growth computational functions, known as exponential, power, and S-shaped functions, were adopted to evolve three versions of CSA, referred to as exponential CSA (ECSA), power CSA (PCSA), and S-shaped CSA (SCSA), respectively. The results of the proposed FS methods on 24 benchmark datasets with different dimensions from various repositories were compared with other k-NN based FS methods from the literature. The results revealed that the proposed methods significantly outperformed the performance of CSA and other well-established FS methods in several relevant criteria. In particular, among the 24 datasets considered, the proposed binary ECSA, which yielded the best overall results among all other proposed versions, is able to excel the others in 18 datasets in terms of classification accuracy, 13 datasets in terms of specificity, 10 datasets in terms of sensitivity, and 14 datasets in terms of fitness values. 
Simply put, the results on 15, 9, and 5 datasets out of the 24 datasets studied showed that the performance levels of the binary ECSA, PCSA, and SCSA are over 90% in respect of specificity, sensitivity, and accuracy measures, respectively. The thorough results via different comparisons divulge the efficiency of the proposed methods in widening the classification accuracy compared to other methods, ensuring the ability of the proposed methods in exploring the feature space and selecting the most useful features for classification studies.
Affiliation(s)
- Malik Braik
- Department of Computer Science, Al-Balqa Applied University, Salt, Jordan
- Mohammed A. Awadallah
- Department of Computer Science, Al-Aqsa University, P.O. Box 4051, Gaza, Palestine
- Artificial Intelligence Research Center (AIRC), College of Engineering and Information Technology, Ajman University, Ajman, United Arab Emirates
- Mohammed Azmi Al-Betar
- Artificial Intelligence Research Center (AIRC), College of Engineering and Information Technology, Ajman University, Ajman, United Arab Emirates
- Department of Information Technology, Al-Huson University College, Al-Balqa Applied University, Al-Huson, Irbid, Jordan
- Omar A. Alzubi
- Department of Computer Science, Al-Balqa Applied University, Salt, Jordan
5
Mafarja M, Thaher T, Al-Betar MA, Too J, Awadallah MA, Abu Doush I, Turabieh H. Classification framework for faulty-software using enhanced exploratory whale optimizer-based feature selection scheme and random forest ensemble learning. Appl Intell 2023; 53:1-43. [PMID: 36785593 PMCID: PMC9909674 DOI: 10.1007/s10489-022-04427-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Accepted: 12/23/2022] [Indexed: 02/11/2023]
Abstract
Software Fault Prediction (SFP) is an important process for detecting faulty classes or modules early in the software development life cycle. In this paper, a machine learning framework is proposed for SFP. Initially, pre-processing and re-sampling techniques are applied to make the SFP datasets ready to be used by ML techniques. Thereafter, seven classifiers are compared, namely K-Nearest Neighbors (KNN), Naive Bayes (NB), Linear Discriminant Analysis (LDA), Linear Regression (LR), Decision Tree (DT), Support Vector Machine (SVM), and Random Forest (RF). The RF classifier outperforms all other classifiers in terms of eliminating irrelevant/redundant features. The performance of RF is improved further using a dimensionality reduction method called binary whale optimization algorithm (BWOA) to eliminate the irrelevant/redundant features. Finally, the performance of BWOA is enhanced by hybridizing the exploration strategies of the grey wolf optimizer (GWO) and Harris hawks optimization (HHO) algorithms. The proposed method is called SBEWOA. The SFP datasets are sixteen datasets from the PROMISE repository, covering software projects of different sizes and complexity. The comparative evaluation against nine well-established feature selection methods shows that the proposed SBEWOA produces superior results for several instances of the evaluated datasets. The algorithms' performance is compared in terms of accuracy, the number of features, and fitness function. This is also supported by the two-tailed p-values of the Wilcoxon signed-rank statistical test. In conclusion, the proposed method is an efficient alternative ML method for SFP that can be used for similar problems in the software engineering domain.
Affiliation(s)
- Majdi Mafarja
- Department of Computer Science, Birzeit University, Birzeit, Palestine
- Thaer Thaher
- Department of Computer Systems Engineering, Arab American University, Jenin, Palestine
- Information Technology Engineering, Al-Quds University, Abu Dies, Jerusalem, Palestine
- Mohammed Azmi Al-Betar
- Artificial Intelligence Research Center (AIRC), College of Engineering and Information Technology, Ajman University, Ajman, United Arab Emirates
- Department of Information Technology, Al-Huson University College, Al-Balqa Applied University, Irbid, Jordan
- Jingwei Too
- Faculty of Electrical Engineering, Universiti Teknikal Malaysia Melaka, Hang Tuah Jaya, 76100 Durian Tunggal Melaka, Malaysia
- Mohammed A. Awadallah
- Department of Computer Science, Al-Aqsa University, P.O. Box 4051, Gaza, Palestine
- Artificial Intelligence Research Center (AIRC), Ajman University, Ajman, United Arab Emirates
- Iyad Abu Doush
- Department of Computing, College of Engineering and Applied Sciences, American University of Kuwait, Salmiya, Kuwait
- Computer Science Department, Yarmouk University, Irbid, Jordan
- Hamza Turabieh
- Department of Health Management and Informatics, University of Missouri, Columbia, 5 Hospital Drive, Columbia, MO 65212 USA
6
Braik M. Enhanced Ali Baba and the forty thieves algorithm for feature selection. Neural Comput Appl 2023; 35:6153-6184. [PMID: 36408290 PMCID: PMC9666985 DOI: 10.1007/s00521-022-08015-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2022] [Accepted: 10/26/2022] [Indexed: 11/16/2022]
Abstract
Feature Selection (FS) aims to improve the classification rate of dataset models by selecting only a small set of appropriate features from the initial range of features. Consequently, a reliable optimization method is needed to deal with the matters involved in this problem. Often, traditional methods fail to optimally reduce the high dimensionality of the feature space of complex datasets, which leads to weak classification models. Meta-heuristics can offer a favorable classification rate for high-dimensional datasets. Here, a binary version of a new human-based algorithm named Ali Baba and the Forty Thieves (AFT) was applied to tackle a pool of FS problems. Although AFT is an efficient meta-heuristic for optimizing many problems, it sometimes exhibits premature convergence and low search performance. These issues were mitigated by proposing three enhanced versions of AFT, namely: (1) a Binary Multi-layered AFT called BMAFT, which uses hierarchical and distributed frameworks, (2) Binary Elitist AFT (BEAFT), which uses an elitist learning strategy, and (3) Binary Self-adaptive AFT (BSAFT), which uses an adapted tracking distance parameter. These versions, along with the basic Binary AFT (BAFT), were extensively assessed on twenty-four problems gathered from different repositories. The results showed that the proposed algorithms substantially enhance the performance of BAFT in terms of convergence speed and solution accuracy. In addition, the overall results showed that BMAFT is the most competitive, providing the best results with excellent performance scores compared to other competing algorithms.
Affiliation(s)
- Malik Braik
- Department of Computer Science, Al-Balqa Applied University, Salt, Jordan
7
Thirumoorthy K, J. JJB. A feature selection model for software defect prediction using binary Rao optimization algorithm. Appl Soft Comput 2022. [DOI: 10.1016/j.asoc.2022.109737] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
8
Aydın Z. JayaX Algorithm for Simultaneous Layout and Size Optimization of Grillages. Arabian Journal for Science and Engineering 2022. [DOI: 10.1007/s13369-022-07195-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
9
An enhanced binary Rat Swarm Optimizer based on local-best concepts of PSO and collaborative crossover operators for feature selection. Comput Biol Med 2022; 147:105675. [PMID: 35687926 DOI: 10.1016/j.compbiomed.2022.105675] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 02/16/2022] [Revised: 05/24/2022] [Accepted: 05/26/2022] [Indexed: 11/22/2022]
Abstract
In this paper, an enhanced binary version of the Rat Swarm Optimizer (RSO) is proposed to deal with Feature Selection (FS) problems. FS is an important data reduction step in data mining which finds the most representative features from the entire data. Many FS-based swarm intelligence algorithms have been used to tackle FS. However, the door is still open for further investigations since no FS method gives cutting-edge results for all cases. In this paper, a recent swarm intelligence metaheuristic method called RSO which is inspired by the social and hunting behavior of a group of rats is enhanced and explored for FS problems. The binary enhanced RSO is built based on three successive modifications: i) an S-shape transfer function is used to develop binary RSO algorithms; ii) the local search paradigm of particle swarm optimization is used with the iterative loop of RSO to boost its local exploitation; iii) three crossover mechanisms are used and controlled by a switch probability to improve the diversity. Based on these enhancements, three versions of RSO are produced, referred to as Binary RSO (BRSO), Binary Enhanced RSO (BERSO), and Binary Enhanced RSO with Crossover operators (BERSOC). To assess the performance of these versions, a benchmark of 24 datasets from various domains is used. The proposed methods are assessed concerning the fitness value, number of selected features, classification accuracy, specificity, sensitivity, and computational time. The best performance is achieved by BERSOC followed by BERSO and then BRSO. These proposed versions are comparatively assessed against 25 well-regarded metaheuristic methods and five filter-based approaches. The obtained results underline their superiority by producing new best results for some datasets.
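Several abstracts in this list report a "fitness value" alongside classification accuracy and the number of selected features without stating how it is computed. A weighted sum of classification error and the fraction of features retained is a common choice in wrapper-based feature selection, and the sketch below assumes that form. The weight `alpha=0.99` and the function name `wrapper_fitness` are illustrative assumptions and are not taken from any of the listed papers.

```python
def wrapper_fitness(error_rate, n_selected, n_total, alpha=0.99):
    """Weighted sum of classification error and feature ratio (lower is better).

    Assumed form: alpha * error + (1 - alpha) * n_selected / n_total.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0 <= n_selected <= n_total:
        raise ValueError("n_selected must lie between 0 and n_total")
    feature_ratio = n_selected / n_total
    return alpha * error_rate + (1.0 - alpha) * feature_ratio
```

With this form, two subsets that reach the same error rate are ranked by size, so the smaller subset receives the lower (better) fitness.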
10
Huang X, Hong KR, Kim JS, Choe IJ. Multi-objective uncertain project selection considering synergy. Int J Mach Learn Cyb 2022. [DOI: 10.1007/s13042-022-01532-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
11
Zitar RA, Al-Betar MA, Awadallah MA, Doush IA, Assaleh K. An Intensive and Comprehensive Overview of JAYA Algorithm, its Versions and Applications. Archives of Computational Methods in Engineering: State of the Art Reviews 2022; 29:763-792. [PMID: 34075292 PMCID: PMC8155802 DOI: 10.1007/s11831-021-09585-8] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 10/14/2020] [Accepted: 04/05/2021] [Indexed: 05/16/2023]
Abstract
In this review paper, the JAYA algorithm, a recent population-based algorithm, is reviewed in depth. The JAYA algorithm combines the survival-of-the-fittest principle from evolutionary algorithms with the attraction toward the global optimal solution found in Swarm Intelligence methods. Initially, the optimization model and convergence characteristics of the JAYA algorithm are carefully analyzed. Thereafter, the proposed versions of the JAYA algorithm are surveyed, including modified, binary, hybridized, parallel, chaotic, multi-objective, and other variants. The various applications tackled using relevant versions of the JAYA algorithm are also discussed and summarized by problem domain. Furthermore, open-source implementations of the JAYA algorithm are identified to provide resources for the JAYA research community. The critical analysis of the JAYA algorithm reveals its advantages and limitations in dealing with optimization problems. Finally, the paper ends with a conclusion and possible future enhancements suggested to improve the performance of the JAYA algorithm. Readers of this overview will be able to identify the domains and applications best suited to the JAYA algorithm and to justify their JAYA-related contributions.
Affiliation(s)
- Raed Abu Zitar
- Sorbonne University Center of Artificial Intelligence, Sorbonne University-Abu Dhabi, Abu Dhabi, UAE
- Mohammed Azmi Al-Betar
- Artificial Intelligence Research Center (AIRC), College of Engineering and Information Technology, Ajman University, Ajman, UAE
- Department of Information Technology, Al-Huson University College, Al-Balqa Applied University, Irbid, Jordan
- Mohammed A. Awadallah
- Artificial Intelligence Research Center (AIRC), College of Engineering and Information Technology, Ajman University, Ajman, UAE
- Department of Computer Science, Al-Aqsa University, P.O. Box 4051, Gaza, Palestine
- Iyad Abu Doush
- Computing Department, American University of Kuwait, Salmiya, Kuwait
- Computer Science Department, Yarmouk University, Irbid, Jordan
- Khaled Assaleh
- Artificial Intelligence Research Center (AIRC), College of Engineering and Information Technology, Ajman University, Ajman, UAE
12
Binary Horse herd optimization algorithm with crossover operators for feature selection. Comput Biol Med 2021; 141:105152. [PMID: 34952338 DOI: 10.1016/j.compbiomed.2021.105152] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 10/29/2021] [Revised: 12/11/2021] [Accepted: 12/14/2021] [Indexed: 01/30/2023]
Abstract
This paper proposes a binary version of the Horse herd Optimization Algorithm (HOA) to tackle Feature Selection (FS) problems. This algorithm mimics the conduct of a pack of horses when they are trying to survive. To build a binary version of HOA, referred to as BHOA, two adjustments were made: i) Three transfer functions, namely S-shape, V-shape and U-shape, are utilized to transform the continuous domain into a binary one. Four configurations of each transfer function are also studied to yield four alternatives. ii) Three crossover operators (one-point, two-point and uniform) are also suggested to ensure the efficiency of the proposed method for the FS domain. The performance of the proposed fifteen BHOA versions is examined using 24 real-world FS datasets. A set of six metrics was used to evaluate the outcome of the optimization methods: accuracy, number of features selected, fitness values, sensitivity, specificity and computational time. The best-performing of the proposed versions is BHOA with S-shape and one-point crossover. A comparative evaluation was also carried out against 21 state-of-the-art methods. The proposed method is able to find very competitive results, some of which are the best recorded. Due to the viability of the proposed method, it can be further considered in other areas of machine learning.
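The two ingredients named in the abstract, transfer functions and crossover operators, can be sketched as follows. The sketch uses the widely used sigmoid as the S-shape function and |tanh(x)| as the V-shape function; the paper's four configurations per family and its U-shape function are not reproduced. For brevity both families share the same threshold rule in `binarize`, although V-shape functions are often paired with a bit-flip rule instead. All function names are illustrative assumptions.

```python
import math
import random


def s_shape(x):
    """Sigmoid transfer function, mapping a real value to (0, 1)."""
    x = max(-500.0, min(500.0, x))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-x))


def v_shape(x):
    """V-shaped transfer function, mapping a real value to [0, 1)."""
    return abs(math.tanh(x))


def binarize(position, transfer, rng):
    """Set each bit to 1 with probability transfer(x) (assumed threshold rule)."""
    return [1 if rng.random() < transfer(x) else 0 for x in position]


def one_point_crossover(parent_a, parent_b, rng):
    """Swap the tails of two equal-length bit lists at a random cut point."""
    cut = rng.randint(1, len(parent_a) - 1)
    child_a = parent_a[:cut] + parent_b[cut:]
    child_b = parent_b[:cut] + parent_a[cut:]
    return child_a, child_b
```

A continuous position such as `[-2.0, 0.0, 2.0]` becomes a feature mask with `binarize(position, s_shape, random.Random(1))`, and two masks can then be recombined with `one_point_crossover`.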
13
Fan Y, Liu J, Wu S. Exploring instance correlations with local discriminant model for multi-label feature selection. Appl Intell 2021. [DOI: 10.1007/s10489-021-02799-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
14
Local Neighbourhood Edge Responsive Image Descriptor for Texture Classification Using Gaussian Mutated JAYA Optimization Algorithm. Arabian Journal for Science and Engineering 2021. [DOI: 10.1007/s13369-021-05417-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/29/2022]
15
Abiodun EO, Alabdulatif A, Abiodun OI, Alawida M, Alabdulatif A, Alkhawaldeh RS. A systematic review of emerging feature selection optimization methods for optimal text classification: the present state and prospective opportunities. Neural Comput Appl 2021; 33:15091-15118. [PMID: 34404964 PMCID: PMC8361413 DOI: 10.1007/s00521-021-06406-8] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 04/07/2021] [Accepted: 07/31/2021] [Indexed: 02/07/2023]
Abstract
Specialized data preparation techniques, including data cleaning, outlier detection, missing value imputation, and feature selection (FS), amongst others, are procedures required to get the most out of data and, consequently, the optimal performance of predictive models for classification tasks. FS is a vital and indispensable technique that enables the model to perform faster, eliminate noisy data, remove redundancy, reduce overfitting, improve precision and increase generalization on testing data. While conventional FS techniques have been leveraged for classification tasks in the past few decades, they fail to optimally reduce the high dimensionality of the feature space of texts, thus breeding inefficient predictive models. Emerging technologies such as metaheuristic and hyper-heuristic optimization methods provide a new paradigm for FS due to their efficiency in improving the accuracy of classification, computational demands, and storage, as well as functioning seamlessly in solving complex optimization problems in less time. However, few details are known on best practices for case-to-case usage of emerging FS methods. The literature continues to be engulfed with clear and unclear findings in leveraging effective methods, which, if not performed accurately, alter precision, real-world-use feasibility, and the predictive model's overall performance. This paper reviews the present state of FS with respect to metaheuristic and hyper-heuristic methods. Through a systematic literature review of over 200 articles, we set out the most recent findings and trends to enlighten analysts, practitioners and researchers in the field of data analytics seeking clarity in understanding and implementing effective FS optimization methods for improved text classification tasks.
Affiliation(s)
- Esther Omolara Abiodun
- School of Computer Sciences, Universiti Sains Malaysia, George Town, Malaysia
- Department of Computer Sciences, University of Abuja, Abuja, Nigeria
- Abdulatif Alabdulatif
- Department of Computer Science, College of Computer, Qassim University, Buraydah, Saudi Arabia
- Oludare Isaac Abiodun
- School of Computer Sciences, Universiti Sains Malaysia, George Town, Malaysia
- Department of Computer Sciences, University of Abuja, Abuja, Nigeria
- Moatsum Alawida
- School of Computer Sciences, Universiti Sains Malaysia, George Town, Malaysia
- Department of Computer Sciences, Abu Dhabi University, Abu Dhabi, UAE
- Abdullah Alabdulatif
- Computer Department, College of Sciences and Arts, Qassim University, P.O. Box 53, Al-Rass, Saudi Arabia
- Rami S. Alkhawaldeh
- Department of Computer Information Systems, The University of Jordan, Aqaba, 77110 Jordan