1
|
Wang X, Wang Y, Ma Z, Wong KC, Li X. Exhaustive Exploitation of Nature-Inspired Computation for Cancer Screening in an Ensemble Manner. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2024; 21:1366-1379. [PMID: 38578856 DOI: 10.1109/tcbb.2024.3385402] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/07/2024]
Abstract
Accurate screening of cancer types is crucial for effective cancer detection and precise treatment selection. However, the association between gene expression profiles and tumors is often limited to a small number of biomarker genes. While computational methods using nature-inspired algorithms have shown promise in selecting predictive genes, existing techniques are limited by inefficient search and poor generalization across diverse datasets. This study presents a framework termed Evolutionary Optimized Diverse Ensemble Learning (EODE) to improve ensemble learning for cancer classification from gene expression data. The EODE methodology combines an intelligent grey wolf optimization algorithm for selective feature space reduction, guided random injection modeling for ensemble diversity enhancement, and subset model optimization for synergistic classifier combinations. Extensive experiments were conducted across 35 gene expression benchmark datasets encompassing varied cancer types. Results demonstrated that EODE obtained significantly improved screening accuracy over individual and conventionally aggregated models. The integrated optimization of advanced feature selection, directed specialized modeling, and cooperative classifier ensembles helps address key challenges in current nature-inspired approaches. This provides an effective framework for robust and generalized ensemble learning with gene expression biomarkers.
|
2
|
Valiaveetil DR, T K. Elephant herding optimized features-based fast RCNN for classifying leukemia stages. Technol Health Care 2024:THC240750. [PMID: 39485713 DOI: 10.3233/thc-240750] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2024]
Abstract
BACKGROUND Leukemia is a cancer of the bone marrow and blood caused by excessive production of abnormal white blood cells. This disease damages deoxyribonucleic acid (DNA), which is associated with immature cells, particularly white blood cells. Diagnosing acute leukemia cells is time-consuming for radiologists and demands high accuracy. OBJECTIVE To address this issue, we study a novel network, LEU-EHO NET. METHODS LEU-EHO NET is proposed for classifying blood smear images as leukemia-free or leukemia-infected. Initially, the input blood smear images are pre-processed using two techniques: normalization and cropping of black edges. The pre-processed images are then passed to MobileNet for feature extraction. After that, Elephant Herding Optimization (EHO) is used to select the relevant features from the extracted ones. Finally, Faster RCNN is trained with the selected features to perform the classification task and discriminate between Normal and Abnormal. RESULTS The overall accuracy of the proposed LEU-EHO NET is 99.30%. The proposed model improves overall accuracy by 0.69%, 16.21%, 1.10%, 1.71%, and 1.38% over Inception v3 XGBoost, VGGNet, DNN, SVM, and MobileNetV2, respectively. CONCLUSION The approach needs to be improved so that overlapped cells can be segmented more accurately. Additionally, future work might improve classification accuracy by utilizing different deep learning models.
|
3
|
Al-Shalif SA, Senan N, Saeed F, Ghaban W, Ibrahim N, Aamir M, Sharif W. A systematic literature review on meta-heuristic based feature selection techniques for text classification. PeerJ Comput Sci 2024; 10:e2084. [PMID: 38983195 PMCID: PMC11232610 DOI: 10.7717/peerj-cs.2084] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2023] [Accepted: 05/03/2024] [Indexed: 07/11/2024]
Abstract
Feature selection (FS) is a critical step in many data science-based applications, especially in text classification, as it involves selecting relevant and important features from an original feature set. This process can improve learning accuracy, shorten learning time, and simplify outcomes. In text classification, there are often many excessive and unrelated features that degrade the performance of the applied classifiers. Various techniques have been suggested to tackle this problem, categorized as traditional techniques and meta-heuristic (MH) techniques. To discover the optimal subset of features, FS processes require a search strategy, and MH techniques use various strategies to strike a balance between exploration and exploitation. The goal of this research article is to systematically analyze the MH techniques used for FS between 2015 and 2022, focusing on 108 primary studies from three databases (Scopus, Science Direct, and Google Scholar), to identify the techniques used as well as their strengths and weaknesses. The findings indicate that MH techniques are efficient and outperform traditional techniques, with the potential for further exploration of MH techniques such as Ringed Seal Search (RSS) to improve FS in several applications.
Affiliation(s)
- Sarah Abdulkarem Al-Shalif
  - Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia, Parit Raja, Johor, Malaysia
- Norhalina Senan
  - Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia, Parit Raja, Johor, Malaysia
- Faisal Saeed
  - DAAI Research Group, Department of Computing and Data Science, School of Computing and Digital Technology, University of Birmingham, Birmingham, United Kingdom
- Wad Ghaban
  - Applied College, University of Tabuk, Tabuk, Saudi Arabia
- Noraini Ibrahim
  - Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia, Parit Raja, Johor, Malaysia
- Muhammad Aamir
  - School of Electronics, Computing and Mathematics, University of Derby, Derby, United Kingdom
- Wareesa Sharif
  - Faculty of Computing, The Islamia University of Bahawalpur, Bahawalpur, Pakistan
|
4
|
Mohamed TIA, Ezugwu AE, Fonou-Dombeu JV, Mohammed M, Greeff J, Elbashir MK. A novel feature selection algorithm for identifying hub genes in lung cancer. Sci Rep 2023; 13:21671. [PMID: 38066059 PMCID: PMC10709567 DOI: 10.1038/s41598-023-48953-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2023] [Accepted: 12/01/2023] [Indexed: 12/18/2023] Open
Abstract
Lung cancer, a life-threatening disease primarily affecting lung tissue, remains a significant contributor to mortality in both developed and developing nations. Accurate biomarker identification is imperative for effective cancer diagnosis and therapeutic strategies. This study introduces the Voting-Based Enhanced Binary Ebola Optimization Search Algorithm (VBEOSA), an innovative ensemble-based approach combining binary optimization and the Ebola optimization search algorithm. VBEOSA harnesses the collective power of state-of-the-art classification models through soft voting. Moreover, our research applies VBEOSA to an extensive lung cancer gene expression dataset obtained from TCGA, following essential preprocessing steps including outlier detection and removal, data normalization, and filtration. VBEOSA aids in feature selection, leading to the discovery of key hub genes closely associated with lung cancer, validated through comprehensive protein-protein interaction analysis. Notably, our investigation reveals ten significant hub genes (ADRB2, ACTB, ARRB2, GNGT2, ADRB1, ACTG1, ACACA, ATP5A1, ADCY9, and ADRA1B), each demonstrating substantial involvement in lung cancer. Furthermore, our pathway analysis sheds light on the prominence of strategic pathways such as salivary secretion and the calcium signaling pathway, providing invaluable insights into the intricate molecular mechanisms underpinning lung cancer. We also utilize the weighted gene co-expression network analysis (WGCNA) method to identify gene modules exhibiting strong correlations with clinical attributes associated with lung cancer. Our findings underscore the efficacy of VBEOSA in feature selection and offer profound insights into the multifaceted molecular landscape of lung cancer.
Finally, we are confident that this research has the potential to improve diagnostic capabilities and further enrich our understanding of the disease, thus setting the stage for future advancements in the clinical management of lung cancer. The VBEOSA source code is publicly available at https://github.com/TEHNAN/VBEOSA-A-Novel-Feature-Selection-Algorithm-for-Identifying-hub-Genes-in-Lung-Cancer .
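The soft voting mentioned in this abstract can be illustrated with a minimal sketch. It assumes each classifier reports one probability per class for a sample and that all classifiers are weighted equally; the function names `soft_vote` and `predict` and the example labels are hypothetical and are not taken from the VBEOSA code.

```python
from typing import List, Sequence


def soft_vote(prob_rows: Sequence[Sequence[float]]) -> List[float]:
    """Average the per-class probabilities reported by several classifiers."""
    if not prob_rows:
        raise ValueError("need probabilities from at least one classifier")
    n_classes = len(prob_rows[0])
    if any(len(row) != n_classes for row in prob_rows):
        raise ValueError("all classifiers must report the same number of classes")
    n_models = len(prob_rows)
    return [sum(row[c] for row in prob_rows) / n_models for c in range(n_classes)]


def predict(prob_rows: Sequence[Sequence[float]], labels: Sequence[str]) -> str:
    """Return the label whose averaged probability is highest."""
    averaged = soft_vote(prob_rows)
    best = max(range(len(averaged)), key=averaged.__getitem__)
    return labels[best]


# Three hypothetical classifiers scoring one sample for ("normal", "tumor").
example = [[0.6, 0.4], [0.4, 0.6], [0.3, 0.7]]
print(predict(example, ["normal", "tumor"]))
```

Unlike hard voting, which counts one vote per classifier, this averaging lets a confident classifier outweigh an uncertain one.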
Affiliation(s)
- Tehnan I A Mohamed
  - School of Mathematics, Statistics, and Computer Science, University of KwaZulu-Natal, KwaZulu-Natal, King Edward Avenue, Pietermaritzburg Campus, Pietermaritzburg, 3201, South Africa
  - Department of Computer Science, Faculty of Mathematical and Computer Sciences, University of Gezira, Wad Madani, 11123, Sudan
- Absalom E Ezugwu
  - Unit for Data Science and Computing, North-West University, Potchefstroom, South Africa.
- Jean Vincent Fonou-Dombeu
  - School of Mathematics, Statistics, and Computer Science, University of KwaZulu-Natal, KwaZulu-Natal, King Edward Avenue, Pietermaritzburg Campus, Pietermaritzburg, 3201, South Africa
- Mohanad Mohammed
  - School of Mathematics, Statistics, and Computer Science, University of KwaZulu-Natal, KwaZulu-Natal, King Edward Avenue, Pietermaritzburg Campus, Pietermaritzburg, 3201, South Africa
- Japie Greeff
  - School of Computer Science and Information Systems, Faculty of Natural and Agricultural Sciences, North-West University, Vanderbijlpark, South Africa
- Murtada K Elbashir
  - Department of Information Systems, College of Computer and Information Sciences, Jouf University, 72388, Sakaka, Saudi Arabia
|
5
|
Chen Z, Xinxian L, Guo R, Zhang L, Dhahbi S, Bourouis S, Liu L, Wang X. Dispersed differential hunger games search for high dimensional gene data feature selection. Comput Biol Med 2023; 163:107197. [PMID: 37390761 DOI: 10.1016/j.compbiomed.2023.107197] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 03/08/2023] [Revised: 06/08/2023] [Accepted: 06/19/2023] [Indexed: 07/02/2023]
Abstract
The realms of modern medicine and biology have provided substantial genetic data sets that exhibit high dimensionality. Clinical practice and associated processes are primarily dependent on data-driven decision-making. However, the high dimensionality of the data in these domains increases the complexity and size of processing. It can be challenging to determine representative genes while reducing the data's dimensionality. A successful gene selection will serve to mitigate the computing costs and refine the accuracy of the classification by eliminating superfluous or duplicative features. To address this concern, this research suggests a wrapper gene selection approach based on the HGS, combined with a dispersed foraging strategy and a differential evolution strategy, to form a new algorithm named DDHGS. Introducing the DDHGS algorithm to the global optimization field and its binary derivative bDDHGS to the feature selection problem is anticipated to refine the existing search balance between explorative and exploitative cores. We assess and confirm the efficacy of our proposed method, DDHGS, by comparing it with DE and HGS combined with a single strategy, seven classic algorithms, and ten advanced algorithms on the IEEE CEC 2017 test suite. To further evaluate DDHGS' performance, we compare it with several CEC winners and highly efficient DE-based techniques on 23 popular optimization functions and the IEEE CEC 2014 benchmark test suite. The experiments showed that the bDDHGS approach was able to surpass bHGS and a variety of existing methods when applied to fourteen feature selection datasets from the UCI repository. The metrics measured (classification accuracy, the number of selected features, fitness scores, and execution time) all showed marked improvements with the use of bDDHGS. Considering all results, it can be concluded that bDDHGS is an optimal optimizer and an effective feature selection tool in the wrapper mode.
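The abstract reports fitness scores alongside classification accuracy and the number of selected features but does not give the fitness formula. The sketch below uses a weighted sum of error rate and selected-feature ratio, a form often seen in wrapper feature selection studies; the formula, the weight `alpha`, and the name `wrapper_fitness` are assumptions for illustration, not the definition used for bDDHGS.

```python
from typing import Sequence


def wrapper_fitness(error_rate: float, mask: Sequence[int], alpha: float = 0.99) -> float:
    """Score a binary feature mask; lower is better.

    error_rate: classification error (1 - accuracy) obtained with the
                features switched on in `mask`.
    mask:       one 0/1 entry per candidate feature.
    alpha:      assumed trade-off weight between error and subset size.
    """
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must lie in [0, 1]")
    if not mask:
        raise ValueError("mask must contain at least one feature")
    selected_ratio = sum(1 for bit in mask if bit) / len(mask)
    return alpha * error_rate + (1.0 - alpha) * selected_ratio


# Same error, fewer selected features: the smaller subset scores better.
print(wrapper_fitness(0.10, [1, 0, 0, 0]) < wrapper_fitness(0.10, [1, 1, 1, 0]))
```

With `alpha` close to 1, accuracy dominates and subset size mostly acts as a tie-breaker between masks with similar error.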
Affiliation(s)
- Zhiqing Chen
  - School of Intelligent Manufacturing, Wenzhou Polytechnic, Wenzhou, 325035, China.
- Li Xinxian
  - Wenzhou Vocational College of Science and Technology, Wenzhou, 325006, China.
- Ran Guo
  - Cyberspace Institute Advanced Technology, Guangzhou University, Guangzhou, 510006, China.
- Lejun Zhang
  - Cyberspace Institute Advanced Technology, Guangzhou University, Guangzhou, 510006, China; College of Information Engineering, Yangzhou University, Yangzhou, 225127, China; Research and Development Center for E-Learning, Ministry of Education, Beijing, 100039, China.
- Sami Dhahbi
  - Department of Computer Science, College of Science and Art at Mahayil, King Khalid University, Muhayil, Aseer, 62529, Saudi Arabia.
- Sami Bourouis
  - Department of Information Technology, College of Computers and Information Technology, Taif University, P.O.Box 11099, Taif, 21944, Saudi Arabia.
- Lei Liu
  - College of Computer Science, Sichuan University, Chengdu, Sichuan, 610065, China.
- Xianchuan Wang
  - Information Technology Center, Wenzhou Medical University, Wenzhou, 325035, China.
|
6
|
Jia L, Wang T, Gad AG, Salem A. A weighted-sum chaotic sparrow search algorithm for interdisciplinary feature selection and data classification. Sci Rep 2023; 13:14061. [PMID: 37640716 PMCID: PMC10462760 DOI: 10.1038/s41598-023-38252-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/25/2023] [Accepted: 07/05/2023] [Indexed: 08/31/2023] Open
Abstract
In today's data-driven digital culture, there is a critical demand for optimized solutions that reduce operating expenses while attempting to increase productivity. The amount of memory and processing time that can be used to process enormous volumes of data are subject to a number of limitations. This becomes more of a problem if a dataset contains redundant and uninteresting information. For instance, many datasets contain a number of non-informative features that mislead a given classification algorithm. To tackle this, researchers have been developing a variety of feature selection (FS) techniques that aim to eliminate unnecessary information from the raw datasets before passing them to a machine learning (ML) algorithm. Meta-heuristic optimization algorithms are often a solid choice to solve NP-hard problems like FS. In this study, we present a wrapper FS technique based on the sparrow search algorithm (SSA), a type of meta-heuristic. SSA is a swarm intelligence (SI) method that stands out because of its quick convergence and improved stability. Like the majority of SI algorithms, SSA does have some drawbacks, such as lower swarm diversity and weak exploration ability in late iterations. So, using ten chaotic maps, we try to ameliorate SSA in three ways: (i) the initial swarm generation; (ii) the substitution of two random variables in SSA; and (iii) clamping the sparrows crossing the search range. As a result, we get CSSA, a chaotic form of SSA. Extensive comparisons show CSSA to be superior in terms of swarm diversity and convergence speed in solving various representative functions from the Institute of Electrical and Electronics Engineers (IEEE) Congress on Evolutionary Computation (CEC) benchmark set.
Furthermore, experimental analysis of CSSA on eighteen interdisciplinary, multi-scale ML datasets from the University of California Irvine (UCI) data repository, as well as three high-dimensional microarray datasets, demonstrates that CSSA outperforms twelve state-of-the-art algorithms in a classification task based on FS discipline. Finally, a 5%-significance-level statistical post-hoc analysis based on Wilcoxon's signed-rank test, Friedman's rank test, and Nemenyi's test confirms CSSA's significance in terms of overall fitness, classification accuracy, selected feature size, computational time, convergence trace, and stability.
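Item (i) above, chaotic generation of the initial swarm, can be sketched as follows. The abstract does not name the ten chaotic maps, so the logistic map used here is an assumption, as are the seed `x0`, the parameter `r`, and all function names. The `clamp` helper is plain clipping and only stands in for the chaotic handling of out-of-range sparrows described in (iii).

```python
from typing import List


def logistic_map_sequence(n: int, x0: float = 0.7, r: float = 4.0) -> List[float]:
    """Generate n values in [0, 1] by iterating x <- r * x * (1 - x)."""
    values = []
    x = x0
    for _ in range(n):
        x = r * x * (1.0 - x)
        values.append(x)
    return values


def chaotic_swarm(n_agents: int, n_dims: int, lower: float, upper: float,
                  x0: float = 0.7) -> List[List[float]]:
    """Place agents in [lower, upper] using chaotic values instead of uniform draws."""
    chaos = logistic_map_sequence(n_agents * n_dims, x0=x0)
    span = upper - lower
    return [
        [lower + span * chaos[i * n_dims + j] for j in range(n_dims)]
        for i in range(n_agents)
    ]


def clamp(position: List[float], lower: float, upper: float) -> List[float]:
    """Clip each coordinate back into the search range (simplified stand-in)."""
    return [min(max(v, lower), upper) for v in position]


swarm = chaotic_swarm(n_agents=5, n_dims=3, lower=-10.0, upper=10.0)
print(len(swarm), len(swarm[0]))
```

The sequence is deterministic for a given `x0`, so runs are reproducible without a random seed.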
Affiliation(s)
- LiYun Jia
  - Department of Mathematics and Physics, Hebei University of Architecture, Zhangjiakou, 075000, China
- Tao Wang
  - Department of Mathematics and Physics, Hebei University of Architecture, Zhangjiakou, 075000, China
- Ahmed G Gad
  - Faculty of Computers and Information, Kafrelsheikh University, Kafrelsheikh, 33516, Egypt.
- Ahmed Salem
  - College of Computing and Information Technology, Arab Academy for Science, Technology and Maritime Transport (AASTMT), Cairo, Egypt
|
7
|
Abd Elaziz M, Ouadfel S, Ibrahim RA. Boosting capuchin search with stochastic learning strategy for feature selection. Neural Comput Appl 2023; 35:14061-14080. [DOI: 10.1007/s00521-023-08400-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2021] [Accepted: 02/13/2023] [Indexed: 09/02/2023]
Abstract
The technological revolution has made available a large amount of data with many irrelevant and noisy features that alter the analysis process and increase processing time. Therefore, feature selection (FS) approaches are used to select the smallest subset of relevant features. Feature selection is viewed as an optimization process for which meta-heuristics have been successfully applied. Thus, in this paper, a new feature selection approach is proposed based on an enhanced version of the Capuchin search algorithm (CapSA). In the developed FS approach, named ECapSA, three modifications have been introduced to avoid the lack of diversity and premature convergence of the basic CapSA: (1) the inertia weight is adjusted using the logistic map, (2) sine cosine acceleration coefficients are added to improve convergence, and (3) a stochastic learning strategy is used to add more diversity to the movement of Capuchin and a Lévy random walk. To demonstrate the performance of ECapSA, different datasets are used, and it is compared with other well-known FS methods. The results provide evidence of the superiority of ECapSA among the tested datasets and competitive methods in terms of performance metrics.
|
8
|
Chen Z, Xuan P, Heidari AA, Liu L, Wu C, Chen H, Escorcia-Gutierrez J, Mansour RF. An artificial bee bare-bone hunger games search for global optimization and high-dimensional feature selection. iScience 2023; 26:106679. [PMID: 37216098 PMCID: PMC10193239 DOI: 10.1016/j.isci.2023.106679] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2023] [Revised: 03/01/2023] [Accepted: 04/12/2023] [Indexed: 05/24/2023] Open
Abstract
The domains of contemporary medicine and biology have generated substantial high-dimensional genetic data. Identifying representative genes and decreasing the dimensionality of the data can be challenging. The goal of gene selection is to minimize computing costs and enhance classification precision. Therefore, this article designs a new wrapper gene selection algorithm named artificial bee bare-bone hunger games search (ABHGS), which is the hunger games search (HGS) integrated with an artificial bee strategy and a Gaussian bare-bone structure to address this issue. To evaluate and validate the performance of our proposed method, ABHGS is compared to HGS and a single strategy embedded in HGS, six classic algorithms, and ten advanced algorithms on the CEC 2017 functions. The experimental results demonstrate that the bABHGS outperforms the original HGS. Compared to peers, it increases classification accuracy and decreases the number of selected features, indicating its actual engineering utility in spatial search and feature selection.
Affiliation(s)
- Zhiqing Chen
  - School of Intelligent Manufacturing, Wenzhou Polytechnic, Wenzhou 325035, China
- Ping Xuan
  - Department of Computer Science, School of Engineering, Shantou University, Shantou 515063, China
- Ali Asghar Heidari
  - Key Laboratory of Intelligent Informatics for Safety & Emergency of Zhejiang Province, Wenzhou University, Wenzhou 325035, China
- Lei Liu
  - College of Computer Science, Sichuan University, Chengdu, Sichuan 610065, China
- Chengwen Wu
  - Key Laboratory of Intelligent Informatics for Safety & Emergency of Zhejiang Province, Wenzhou University, Wenzhou 325035, China
- Huiling Chen
  - Key Laboratory of Intelligent Informatics for Safety & Emergency of Zhejiang Province, Wenzhou University, Wenzhou 325035, China
- José Escorcia-Gutierrez
  - Department of Computational Science and Electronics, Universidad de la Costa, CUC, Barranquilla 080002, Colombia
- Romany F. Mansour
  - Department of Mathematics, Faculty of Science, New Valley University, El-Kharga 72511, Egypt
|
9
|
Xu M, Song Q, Xi M, Zhou Z. Binary arithmetic optimization algorithm for feature selection. Soft comput 2023; 27:1-35. [PMID: 37362265 PMCID: PMC10191101 DOI: 10.1007/s00500-023-08274-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 04/19/2023] [Indexed: 06/28/2023]
Abstract
Feature selection, widely used in data preprocessing, is a challenging problem as it involves hard combinatorial optimization. So far, some meta-heuristic algorithms have shown effectiveness in solving hard combinatorial optimization problems. As the arithmetic optimization algorithm only performs well in dealing with continuous optimization problems, multiple binary arithmetic optimization algorithms (BAOAs) utilizing different strategies are proposed to perform feature selection. First, six algorithms are formed based on six different transfer functions by converting the continuous search space to the discrete search space. Second, in order to enhance the search speed and the ability to escape from local optima, six other algorithms are further developed by integrating the transfer functions and Lévy flight. Based on 20 common University of California Irvine (UCI) datasets, the performance of our proposed algorithms in feature selection is evaluated, and the results demonstrate that BAOA_S1LF is the best among all the proposed algorithms. Moreover, the performance of BAOA_S1LF is compared with other meta-heuristic algorithms on 26 UCI datasets, and the corresponding results show the superiority of BAOA_S1LF in feature selection. Source codes of BAOA_S1LF are publicly available at: https://www.mathworks.com/matlabcentral/fileexchange/124545-binary-arithmetic-optimization-algorithm.
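The conversion from a continuous search space to a discrete one via transfer functions can be sketched as below. The abstract does not define its six transfer functions, so the sigmoid (S-shaped) and |tanh| (V-shaped) forms shown are common textbook choices and are assumptions here, as are the function names. The stochastic thresholding rule is likewise one common convention rather than the paper's confirmed rule.

```python
import math
import random
from typing import Callable, List, Sequence


def s_shaped(x: float) -> float:
    """Numerically stable sigmoid, mapping any real value into (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def v_shaped(x: float) -> float:
    """|tanh(x)|, mapping any real value into [0, 1)."""
    return abs(math.tanh(x))


def binarize(position: Sequence[float], rng: random.Random,
             transfer: Callable[[float], float] = s_shaped) -> List[int]:
    """Turn a continuous position into a 0/1 feature mask.

    Each coordinate is selected (1) with probability transfer(x).
    """
    return [1 if rng.random() < transfer(x) else 0 for x in position]


rng = random.Random(42)
mask = binarize([2.5, -3.0, 0.1, 4.0], rng)
print(mask)
```

A mask produced this way can be passed to a classifier-based fitness evaluation, with only the features marked 1 used for training.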
Affiliation(s)
- Min Xu
  - School of Physics and Electronic Engineering, Sichuan Normal University, Chengdu, 610101 Sichuan China
- Qixian Song
  - School of Physics and Electronic Engineering, Sichuan Normal University, Chengdu, 610101 Sichuan China
- Mingyang Xi
  - School of Physics and Electronic Engineering, Sichuan Normal University, Chengdu, 610101 Sichuan China
- Zhaorong Zhou
  - School of Physics and Electronic Engineering, Sichuan Normal University, Chengdu, 610101 Sichuan China
  - Meteorological Information and Signal Processing Key Laboratory of Sichuan Higher Education Institutes, Chengdu University of Information Technology, Chengdu, 610225 Sichuan China
|
10
|
Parmaksiz H, Yuzgec U, Dokur E, Erdogan N. Mutation based improved dragonfly optimization algorithm for a neuro-fuzzy system in short term wind speed forecasting. Knowl Based Syst 2023. [DOI: 10.1016/j.knosys.2023.110472] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/16/2023]
|
11
|
Yue Y, Cao L, Lu D, Hu Z, Xu M, Wang S, Li B, Ding H. Review and empirical analysis of sparrow search algorithm. Artif Intell Rev 2023. [DOI: 10.1007/s10462-023-10435-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/05/2023]
|
12
|
Alzaqebah A, Al-Kadi O, Aljarah I. An enhanced Harris hawk optimizer based on extreme learning machine for feature selection. PROGRESS IN ARTIFICIAL INTELLIGENCE 2023. [DOI: 10.1007/s13748-023-00298-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/01/2023]
|
13
|
Mafarja M, Thaher T, Al-Betar MA, Too J, Awadallah MA, Abu Doush I, Turabieh H. Classification framework for faulty-software using enhanced exploratory whale optimizer-based feature selection scheme and random forest ensemble learning. APPL INTELL 2023; 53:1-43. [PMID: 36785593 PMCID: PMC9909674 DOI: 10.1007/s10489-022-04427-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Accepted: 12/23/2022] [Indexed: 02/11/2023]
Abstract
Software Fault Prediction (SFP) is an important process for detecting faulty software components, such as faulty classes or modules, early in the software development life cycle. In this paper, a machine learning framework is proposed for SFP. Initially, pre-processing and re-sampling techniques are applied to make the SFP datasets ready to be used by ML techniques. Thereafter seven classifiers are compared, namely K-Nearest Neighbors (KNN), Naive Bayes (NB), Linear Discriminant Analysis (LDA), Linear Regression (LR), Decision Tree (DT), Support Vector Machine (SVM), and Random Forest (RF). The RF classifier outperforms all other classifiers in terms of eliminating irrelevant/redundant features. The performance of RF is improved further using a dimensionality reduction method called binary whale optimization algorithm (BWOA) to eliminate the irrelevant/redundant features. Finally, the performance of BWOA is enhanced by hybridizing the exploration strategies of the grey wolf optimizer (GWO) and Harris hawks optimization (HHO) algorithms. The proposed method is called SBEWOA. The SFP datasets are sixteen datasets selected from the PROMISE repository, covering software projects of different sizes and complexity. The comparative evaluation against nine well-established feature selection methods shows that the proposed SBEWOA produces significantly superior results for several instances of the evaluated datasets. The algorithms' performance is compared in terms of accuracy, the number of features, and fitness function. This is also supported by the two-tailed P-values of the Wilcoxon signed-rank statistical test. In conclusion, the proposed method is an efficient alternative ML method for SFP that can be used for similar problems in the software engineering domain.
Affiliation(s)
- Majdi Mafarja
  - Department of Computer Science, Birzeit University, Birzeit, Palestine
- Thaer Thaher
  - Department of Computer Systems Engineering, Arab American University, Jenin, Palestine
  - Information Technology Engineering, Al-Quds University, Abu Dies, Jerusalem, Palestine
- Mohammed Azmi Al-Betar
  - Artificial Intelligence Research Center (AIRC), College of Engineering and Information Technology, Ajman University, Ajman, United Arab Emirates
  - Irbid, Jordan
- Jingwei Too
  - Faculty of Electrical Engineering, Universiti Teknikal Malaysia Melaka, Hang Tuah Jaya, 76100 Durian Tunggal Melaka, Malaysia
- Mohammed A. Awadallah
  - Department of Computer Science, Al-Aqsa University, P.O. Box 4051, Gaza, Palestine
  - Artificial Intelligence Research Center (AIRC), Ajman University, Ajman, United Arab Emirates
- Iyad Abu Doush
  - Department of Computing, College of Engineering and Applied Sciences, American University of Kuwait, Salmiya, Kuwait
  - Computer Science Department, Yarmouk University, Irbid, Jordan
- Hamza Turabieh
  - Department of Health Management and Informatics, University of Missouri, Columbia, 5 Hospital Drive, Columbia, MO 65212 USA
|
14
|
Wang Y, Zhou S. An improved poor and rich optimization algorithm. PLoS One 2023; 18:e0267633. [PMID: 36757967 PMCID: PMC9910665 DOI: 10.1371/journal.pone.0267633] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2021] [Accepted: 04/12/2022] [Indexed: 02/10/2023] Open
Abstract
The poor and rich optimization algorithm (PRO) is a new bio-inspired meta-heuristic algorithm based on the behavior of the poor and the rich. PRO suffers from low convergence speed and premature convergence, and is easily trapped in local optima when solving very complex function optimization problems. To overcome these limitations, this study proposes an improved poor and rich optimization (IPRO) algorithm. First, to meet the convergence speed and swarm diversity requirements of different evolutionary stages of the algorithm, the population is dynamically divided into the poor and rich sub-populations. Second, for the rich sub-population, this study designs a novel individual updating mechanism that learns from the evolution information of the global optimum individual and that of the poor sub-population simultaneously, to further accelerate convergence speed and minimize swarm diversity loss. Third, for the poor sub-population, this study designs a novel individual updating mechanism that improves some evolution information by learning alternately from the rich and Gauss distribution, gradually improves evolutionary genes, and maintains swarm diversity. The IPRO is then compared with four state-of-the-art swarm evolutionary algorithms with various characteristics on the CEC 2013 test suite. Experimental results demonstrate the competitive advantages of IPRO in convergence precision and speed when solving function optimization problems.
Affiliation(s)
- Yanjiao Wang
  - Department of Electrical Engineering, Northeast Electric Power University, Jilin, China
- Shengnan Zhou
  - Department of Electrical Engineering, Northeast Electric Power University, Jilin, China
|
15
|
A hierarchical intrusion detection system based on extreme learning machine and nature-inspired optimization. Comput Secur 2023. [DOI: 10.1016/j.cose.2022.102957] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2022]
|
16
|
Gholami H, Mohammadifar A. Novel deep learning hybrid models (CNN-GRU and DLDL-RF) for the susceptibility classification of dust sources in the Middle East: a global source. Sci Rep 2022; 12:19342. [PMID: 36369266 PMCID: PMC9652306 DOI: 10.1038/s41598-022-24036-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/06/2022] [Accepted: 11/09/2022] [Indexed: 11/13/2022] Open
Abstract
Dust storms have many negative consequences and affect all kinds of ecosystems, as well as climate and weather conditions. Classifying dust storm sources into susceptibility categories can therefore help mitigate their negative effects. This study aimed to classify the susceptibility of dust sources in the Middle East (ME) by developing two novel deep learning (DL) hybrid models: the convolutional neural network-gated recurrent unit (CNN-GRU) model and the dense layer deep learning-random forest (DLDL-RF) model. The Dragonfly algorithm (DA) was used to identify the critical features controlling dust sources. Game theory was used to interpret the DL models' output. Predictive DL models were constructed by randomly dividing the datasets into train (70%) and test (30%) groups, and six statistical indicators were then applied to assess the performance of the DL hybrid models on both datasets (train and test). Among 13 potential features (or variables) controlling dust sources, DA selected seven as important and six as non-important. Based on the DLDL-RF hybrid model, which was more accurate than CNN-GRU, 23.1, 22.8, and 22.2% of the study area were classified as being of very low, low, and moderate susceptibility, whereas 20.2 and 11.7% of the area were classified as high and very high susceptibility, respectively. Among the seven important features selected by DA, clay content, silt content, and precipitation were identified as the three most important by game theory through permutation values. Overall, DL hybrid models were found to be efficient methods for prediction on large spatial scales with no or incomplete datasets from ground-based measurements.
Affiliation(s)
- Hamid Gholami
- Department of Natural Resources Engineering, University of Hormozgan, Bandar-Abbas, Hormozgan, Iran
| | - Aliakbar Mohammadifar
- Department of Natural Resources Engineering, University of Hormozgan, Bandar-Abbas, Hormozgan, Iran
| |
|
17
|
Abed-alguni BH, Alawad NA, Al-Betar MA, Paul D. Opposition-based sine cosine optimizer utilizing refraction learning and variable neighborhood search for feature selection. APPL INTELL 2022; 53:13224-13260. [PMID: 36247211 PMCID: PMC9547101 DOI: 10.1007/s10489-022-04201-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 09/21/2022] [Indexed: 12/03/2022]
Abstract
This paper proposes new improved binary versions of the Sine Cosine Algorithm (SCA) for the Feature Selection (FS) problem. FS is an essential machine learning and data mining task of choosing a subset of highly discriminating features from noisy, irrelevant, high-dimensional, and redundant features to best represent a dataset. SCA is a recent metaheuristic algorithm established to emulate a model based on sine and cosine trigonometric functions. It was initially proposed to tackle problems in the continuous domain. The SCA has been modified to Binary SCA (BSCA) to deal with the binary domain of the FS problem. To improve the performance of BSCA, three cumulative improved variations are proposed (i.e., IBSCA1, IBSCA2, and IBSCA3), where the last version has the best performance. IBSCA1 employs Opposition Based Learning (OBL) to help ensure a diverse population of candidate solutions. IBSCA2 improves IBSCA1 by adding Variable Neighborhood Search (VNS) and Laplace distribution to support several mutation methods. IBSCA3 improves IBSCA2 by optimizing the best candidate solution using Refraction Learning (RL), a novel OBL approach based on light refraction. For performance evaluation, 19 real-world datasets, including a COVID-19 dataset, were selected with different numbers of features, classes, and instances. Three performance measurements have been used to test the IBSCA versions: classification accuracy, number of features, and fitness values. Furthermore, the performance of the last variation, IBSCA3, is compared against 28 existing popular algorithms. Interestingly, IBSCA3 outperformed almost all comparative methods in terms of classification accuracy and fitness values. At the same time, it was ranked 15 out of 19 in terms of number of features. The overall simulation and statistical results indicate that IBSCA3 performs better than the other algorithms.
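The opposition-based learning step mentioned for IBSCA1 can be illustrated with a minimal sketch. The abstract does not give the update rule, so this assumes the textbook definition of an opposite point, `lb + ub - x`, which for a binary feature mask reduces to flipping every bit. The function names and the toy fitness are hypothetical and are not taken from the paper.

```python
import random

def opposite(mask):
    """Opposite of a binary feature mask: lb + ub - x with lb=0, ub=1."""
    return [1 - bit for bit in mask]

def obl_initialise(pop_size, n_features, fitness, rng):
    """Draw a random population, add every opposite, keep the best pop_size."""
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    candidates = population + [opposite(m) for m in population]
    # Lower fitness is better in this sketch.
    return sorted(candidates, key=fitness)[:pop_size]

def toy_fitness(mask):
    """Hypothetical stand-in: Hamming distance from an assumed ideal mask."""
    ideal = [1, 0, 1, 0, 1, 0]
    return sum(a != b for a, b in zip(mask, ideal))

rng = random.Random(0)
best = obl_initialise(pop_size=4, n_features=6, fitness=toy_fitness, rng=rng)
```

Evaluating each candidate together with its opposite is what gives the initial population its spread: whenever a random mask is far from a good region, its opposite is close to it.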
Affiliation(s)
| | | | - Mohammed Azmi Al-Betar
- Artificial Intelligence Research Center (AIRC), College of Engineering and Information Technology, Ajman University, Ajman, United Arab Emirates
| | - David Paul
- School of Science and Technology, University of New England, Armidale, Australia
| |
|
18
|
Akinola OA, Agushaka JO, Ezugwu AE. Binary dwarf mongoose optimizer for solving high-dimensional feature selection problems. PLoS One 2022; 17:e0274850. [PMID: 36201524 PMCID: PMC9536540 DOI: 10.1371/journal.pone.0274850] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/14/2022] [Accepted: 09/06/2022] [Indexed: 11/13/2022] Open
Abstract
Selecting appropriate feature subsets is a vital task in machine learning. Its main goal is to remove noisy, irrelevant, and redundant feature subsets that could negatively impact the learning model's accuracy, and to improve classification performance without information loss. Therefore, more advanced optimization methods have been employed to locate the optimal subset of features. This paper presents a binary version of the dwarf mongoose optimization, called the BDMO algorithm, to solve the high-dimensional feature selection problem. The effectiveness of this approach was validated using 18 high-dimensional datasets from the Arizona State University feature selection repository, and the efficacy of the BDMO was compared with other well-known feature selection techniques in the literature. The results show that the BDMO outperforms other methods, producing the least average fitness value in 14 out of 18 datasets, which means that it achieved 77.77% on the overall best fitness values. The results also show that the BDMO is stable, returning the least standard deviation (SD) value in 13 of 18 datasets (72.22%). Furthermore, the study achieved higher validation accuracy in 15 of the 18 datasets (83.33%) over other methods. The proposed approach also yielded the highest validation accuracy attainable in the COIL20 and Leukemia datasets, which vividly portrays the superiority of the BDMO.
Affiliation(s)
- Olatunji A. Akinola
- School of Mathematics, Statistics, and Computer Science, University of KwaZulu-Natal, Pietermaritzburg, KwaZulu-Natal, South Africa
| | - Jeffrey O. Agushaka
- School of Mathematics, Statistics, and Computer Science, University of KwaZulu-Natal, Pietermaritzburg, KwaZulu-Natal, South Africa
- Department of Computer Science, Federal University of Lafia, Lafia, Nasarawa State, Nigeria
| | - Absalom E. Ezugwu
- School of Mathematics, Statistics, and Computer Science, University of KwaZulu-Natal, Pietermaritzburg, KwaZulu-Natal, South Africa
| |
|
19
|
An Efficient High-dimensional Feature Selection Approach Driven By Enhanced Multi-strategy Grey Wolf Optimizer for Biological Data Classification. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-07836-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/10/2022]
|
20
|
Hybrid binary COOT algorithm with simulated annealing for feature selection in high-dimensional microarray data. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-07780-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/14/2022]
|
21
|
Pashaei E. Mutation-based Binary Aquila optimizer for gene selection in cancer classification. Comput Biol Chem 2022; 101:107767. [PMID: 36084602 DOI: 10.1016/j.compbiolchem.2022.107767] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/24/2022] [Revised: 07/10/2022] [Accepted: 08/29/2022] [Indexed: 11/19/2022]
Abstract
Microarray data classification is one of the hottest issues in the field of bioinformatics due to its efficiency in diagnosing patients' ailments. The difficulty is that microarrays contain a huge number of genes, most of which are redundant or irrelevant, which degrades classification accuracy. To address this issue, a mutated binary Aquila Optimizer (MBAO) with a time-varying mirrored S-shaped (TVMS) transfer function is proposed as a new wrapper gene (or feature) selection method to find the optimal subset of informative genes. The suggested hybrid method utilizes Minimum Redundancy Maximum Relevance (mRMR) as a filtering approach to choose top-ranked genes in the first stage and then uses MBAO-TVMS as an efficient wrapper approach to identify the most discriminative genes in the second stage. TVMS is adopted to transform the continuous version of Aquila Optimizer (AO) into a binary one, and a mutation mechanism is incorporated into binary AO to help the algorithm escape local optima and improve its global search capabilities. The suggested method was tested on eleven well-known benchmark microarray datasets and compared to other current state-of-the-art methods. Based on the obtained results, mRMR-MBAO confirms its superiority over the mRMR-BAO algorithm and the other comparative gene selection (GS) approaches on the majority of the medical datasets in terms of classification accuracy and the number of selected genes. R codes of MBAO are available at https://github.com/el-pashaei/MBAO.
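To make the role of a mutation mechanism in a binary optimizer concrete, here is a minimal sketch in Python. It is not the authors' MBAO code (theirs is in R at the URL above), and the abstract does not specify the operator. This assumes a plain bit-flip mutation with greedy acceptance; all names are hypothetical.

```python
import random

def bit_flip_mutation(mask, rate, rng):
    """Flip each bit of a binary gene mask independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in mask]

def mutate_if_better(mask, fitness, rate, rng):
    """Keep the mutant only when it lowers the fitness (greedy acceptance)."""
    mutant = bit_flip_mutation(mask, rate, rng)
    return mutant if fitness(mutant) < fitness(mask) else mask

def count_selected(mask):
    """Hypothetical fitness for the demo: fewer selected genes is better."""
    return sum(mask)

rng = random.Random(42)
start = [1, 1, 1, 1, 1, 1, 1, 1]
result = mutate_if_better(start, count_selected, rate=0.25, rng=rng)
```

Because the mutant can differ from its parent in several positions at once, a step like this can move a solution out of a basin that single-position updates would stay in.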
Affiliation(s)
- Elham Pashaei
- Department of Computer Engineering, Istanbul Gelisim University, Istanbul, Turkey.
| |
|
22
|
Akinola OA, Ezugwu AE, Oyelade ON, Agushaka JO. A hybrid binary dwarf mongoose optimization algorithm with simulated annealing for feature selection on high dimensional multi-class datasets. Sci Rep 2022; 12:14945. [PMID: 36056062 PMCID: PMC9440036 DOI: 10.1038/s41598-022-18993-0] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/25/2022] [Accepted: 08/23/2022] [Indexed: 11/08/2022] Open
Abstract
The dwarf mongoose optimization (DMO) algorithm developed in 2022 was applied to solve continuous mechanical engineering design problems with a considerable balance of the exploration and exploitation phases as a metaheuristic approach. Still, the DMO is restricted in its exploitation phase, somewhat hindering the algorithm's optimal performance. In this paper, we propose a new hybrid method called the BDMSAO, which combines the binary variant of the DMO (or BDMO) and the simulated annealing (SA) algorithm. In the hybrid BDMSAO algorithm, the BDMO is used as the global search method and SA as the local search component to enhance the limited exploitative mechanism of the BDMO. The new hybrid algorithm was evaluated using eighteen (18) UCI machine learning datasets of low and medium dimensions. The BDMSAO was also tested using three high-dimensional medical datasets to assess its robustness. The results showed the efficacy of the BDMSAO in solving challenging feature selection problems across varying dataset dimensions and its outperformance over ten other methods in the study. Specifically, the BDMSAO achieved an overall result of 61.11% in producing the highest classification accuracy possible and reached 100% accuracy on 9 of 18 datasets. It also yielded the maximum accuracy obtainable on the three high-dimensional datasets utilized while achieving competitive performance regarding the number of features selected.
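The local-search role that simulated annealing plays in such a hybrid can be sketched as follows. This is an illustrative assumption and not the BDMSAO implementation: the neighbourhood (flip one random bit), the geometric cooling schedule, and every parameter value are hypothetical choices, and the global BDMO search is not shown.

```python
import math
import random

def simulated_annealing(mask, fitness, rng, t_start=1.0, cooling=0.9, steps=50):
    """Refine a binary feature mask; lower fitness is better."""
    current, current_fit = list(mask), fitness(mask)
    best, best_fit = list(current), current_fit
    temperature = t_start
    for _ in range(steps):
        # Neighbour: flip one randomly chosen bit.
        neighbour = list(current)
        i = rng.randrange(len(neighbour))
        neighbour[i] = 1 - neighbour[i]
        neighbour_fit = fitness(neighbour)
        delta = neighbour_fit - current_fit
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_fit = neighbour, neighbour_fit
            if current_fit < best_fit:
                best, best_fit = list(current), current_fit
        temperature *= cooling  # geometric cooling
    return best, best_fit

def toy_fitness(mask):
    """Hypothetical fitness: Hamming distance from an assumed ideal mask."""
    ideal = [1, 0, 0, 1, 0, 1, 0, 0]
    return sum(a != b for a, b in zip(mask, ideal))

rng = random.Random(7)
start = [0, 1, 1, 0, 1, 0, 1, 1]
refined, refined_fit = simulated_annealing(start, toy_fitness, rng)
```

In a hybrid of this kind, the routine would typically be called on the best mask found by the global search in each iteration, so that exploitation is handled by SA while the population method keeps exploring.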
Affiliation(s)
- Olatunji A Akinola
- School of Mathematics, Statistics, and Computer Science, University of KwaZulu-Natal, King Edward Avenue, Pietermaritzburg Campus, Pietermaritzburg, 3201, KwaZulu-Natal, South Africa
| | - Absalom E Ezugwu
- School of Mathematics, Statistics, and Computer Science, University of KwaZulu-Natal, King Edward Avenue, Pietermaritzburg Campus, Pietermaritzburg, 3201, KwaZulu-Natal, South Africa.
| | - Olaide N Oyelade
- School of Mathematics, Statistics, and Computer Science, University of KwaZulu-Natal, King Edward Avenue, Pietermaritzburg Campus, Pietermaritzburg, 3201, KwaZulu-Natal, South Africa
| | - Jeffrey O Agushaka
- School of Mathematics, Statistics, and Computer Science, University of KwaZulu-Natal, King Edward Avenue, Pietermaritzburg Campus, Pietermaritzburg, 3201, KwaZulu-Natal, South Africa
| |
|
23
|
Akinola OO, Ezugwu AE, Agushaka JO, Zitar RA, Abualigah L. Multiclass feature selection with metaheuristic optimization algorithms: a review. Neural Comput Appl 2022; 34:19751-19790. [PMID: 36060097 PMCID: PMC9424068 DOI: 10.1007/s00521-022-07705-4] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/19/2022] [Accepted: 08/02/2022] [Indexed: 11/24/2022]
Abstract
Selecting relevant feature subsets is vital in machine learning, and multiclass feature selection is harder to perform since most classifications are binary. The feature selection problem aims at reducing the feature set dimension while maintaining the model's accuracy. Datasets can be classified using various methods. Nevertheless, metaheuristic algorithms attract substantial attention for solving different problems in optimization. For this reason, this paper presents a systematic survey of the literature on solving multiclass feature selection problems with metaheuristic algorithms, which can help classifiers select optimal or near-optimal features faster and more accurately. Metaheuristic algorithms have also been presented in four primary behavior-based categories, i.e., evolutionary-based, swarm-intelligence-based, physics-based, and human-based, even though some literature works present more categories. Further, lists of metaheuristic algorithms were introduced in the categories mentioned. To address issues related to multiclass feature selection, only articles on metaheuristic algorithms used for multiclass feature selection problems from the year 2000 to 2022 were reviewed, with respect to their different categories and detailed descriptions. We considered some application areas for some of the metaheuristic algorithms applied to multiclass feature selection, with their variations. Popular multiclass classifiers for feature selection were also examined. Moreover, we presented the challenges of metaheuristic algorithms for feature selection and identified gaps for further research studies.
Affiliation(s)
- Olatunji O. Akinola
- School of Mathematics, Statistics, and Computer Science, University of KwaZulu-Natal, King Edward Avenue, Pietermaritzburg Campus, Pietermaritzburg, 3201 KwaZulu-Natal South Africa
| | - Absalom E. Ezugwu
- School of Mathematics, Statistics, and Computer Science, University of KwaZulu-Natal, King Edward Avenue, Pietermaritzburg Campus, Pietermaritzburg, 3201 KwaZulu-Natal South Africa
| | - Jeffrey O. Agushaka
- School of Mathematics, Statistics, and Computer Science, University of KwaZulu-Natal, King Edward Avenue, Pietermaritzburg Campus, Pietermaritzburg, 3201 KwaZulu-Natal South Africa
| | - Raed Abu Zitar
- Sorbonne Center of Artificial Intelligence, Sorbonne University-Abu Dhabi, 38044 Abu Dhabi, United Arab Emirates
| | - Laith Abualigah
- Hourani Center for Applied Scientific Research, Al-Ahliyya Amman University, Amman, 19328 Jordan
- Faculty of Information Technology, Middle East University, Amman, 11831 Jordan
| |
|
24
|
An Efficient Heap Based Optimizer Algorithm for Feature Selection. MATHEMATICS 2022. [DOI: 10.3390/math10142396] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
The heap-based optimizer (HBO) is an innovative meta-heuristic inspired by human social behavior. In this research, binary adaptations of the heap-based optimizer (B_HBO) are presented and used to determine the optimal features for classification in wrapper form. In addition, HBO balances exploration and exploitation by employing self-adaptive parameters that can adaptively search the solution domain for the optimal solution. In the feature selection domain, the presented algorithms for the binary heap-based optimizer B_HBO are used to find feature subsets that maximize classification performance while lowering the number of selected features. The k-nearest neighbor (k-NN) classifier ensures that the selected features are significant. The new binary methods are compared to eight common optimization methods recently employed in this field, including Ant Lion Optimization (ALO), Archimedes Optimization Algorithm (AOA), Backtracking Search Algorithm (BSA), Crow Search Algorithm (CSA), Levy flight distribution (LFD), Particle Swarm Optimization (PSO), Slime Mold Algorithm (SMA), and Tree Seed Algorithm (TSA) in terms of fitness, accuracy, precision, sensitivity, F-score, the number of selected features, and statistical tests. Twenty datasets from the UCI repository are evaluated and compared using a set of evaluation indicators. The non-parametric Wilcoxon rank-sum test was used to determine whether the proposed algorithms’ results varied statistically significantly from those of the other compared methods. The comparison analysis demonstrates that B_HBO is superior or equivalent to the other algorithms used in the literature.
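A wrapper objective of the kind described, one that rewards classification performance while penalizing the number of selected features, is often written as a weighted sum. The sketch below assumes that common form, `alpha * error + (1 - alpha) * selected / total`, with a leave-one-out k-NN error. The weighting, the value of `alpha`, and all function names are assumptions, since the abstract does not state the exact fitness used by B_HBO.

```python
def knn_loo_error(X, y, mask, k=1):
    """Leave-one-out error of a k-NN classifier restricted to the selected features."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 1.0  # no feature selected: treat as the worst case
    errors = 0
    for i, xi in enumerate(X):
        # Squared Euclidean distance to every other sample, on selected columns only.
        dists = sorted(
            (sum((xi[j] - xo[j]) ** 2 for j in cols), y[o])
            for o, xo in enumerate(X) if o != i
        )
        votes = [label for _, label in dists[:k]]
        predicted = max(set(votes), key=votes.count)
        errors += predicted != y[i]
    return errors / len(X)

def wrapper_fitness(X, y, mask, alpha=0.99):
    """Weighted sum of classification error and feature ratio (lower is better)."""
    error = knn_loo_error(X, y, mask)
    ratio = sum(mask) / len(mask)
    return alpha * error + (1 - alpha) * ratio

# Toy data: feature 0 separates the classes, feature 1 is misleading noise.
X = [[0.0, 5.0], [0.1, 1.0], [1.0, 4.0], [1.1, 0.0]]
y = [0, 0, 1, 1]
```

On this toy data the mask `[1, 0]` scores better than `[1, 1]`, which shows how the objective prefers a smaller subset when the extra feature only adds noise.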
|
25
|
Dokeroglu T, Deniz A, Kiziloz HE. A comprehensive survey on recent metaheuristics for feature selection. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.04.083] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/05/2023]
|
26
|
|
27
|
An enhanced binary Rat Swarm Optimizer based on local-best concepts of PSO and collaborative crossover operators for feature selection. Comput Biol Med 2022; 147:105675. [PMID: 35687926 DOI: 10.1016/j.compbiomed.2022.105675] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/16/2022] [Revised: 05/24/2022] [Accepted: 05/26/2022] [Indexed: 11/22/2022]
Abstract
In this paper, an enhanced binary version of the Rat Swarm Optimizer (RSO) is proposed to deal with Feature Selection (FS) problems. FS is an important data reduction step in data mining which finds the most representative features from the entire data. Many FS-based swarm intelligence algorithms have been used to tackle FS. However, the door is still open for further investigations since no FS method gives cutting-edge results for all cases. In this paper, a recent swarm intelligence metaheuristic method called RSO which is inspired by the social and hunting behavior of a group of rats is enhanced and explored for FS problems. The binary enhanced RSO is built based on three successive modifications: i) an S-shape transfer function is used to develop binary RSO algorithms; ii) the local search paradigm of particle swarm optimization is used with the iterative loop of RSO to boost its local exploitation; iii) three crossover mechanisms are used and controlled by a switch probability to improve the diversity. Based on these enhancements, three versions of RSO are produced, referred to as Binary RSO (BRSO), Binary Enhanced RSO (BERSO), and Binary Enhanced RSO with Crossover operators (BERSOC). To assess the performance of these versions, a benchmark of 24 datasets from various domains is used. The proposed methods are assessed concerning the fitness value, number of selected features, classification accuracy, specificity, sensitivity, and computational time. The best performance is achieved by BERSOC followed by BERSO and then BRSO. These proposed versions are comparatively assessed against 25 well-regarded metaheuristic methods and five filter-based approaches. The obtained results underline their superiority by producing new best results for some datasets.
|
28
|
Liang S, Fang Z, Sun G, Qu G. Biogeography-based optimization with adaptive migration and adaptive mutation with its application in sidelobe reduction of antenna arrays. Appl Soft Comput 2022. [DOI: 10.1016/j.asoc.2022.108772] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/07/2023]
|
29
|
Bu C, Wang J, Wang X. Towards delay-optimized and resource-efficient network function dynamic deployment for VNF service chaining. Appl Soft Comput 2022. [DOI: 10.1016/j.asoc.2022.108711] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]
|
30
|
Spatiotemporal Hybrid Random Forest Model for Tea Yield Prediction Using Satellite-Derived Variables. REMOTE SENSING 2022. [DOI: 10.3390/rs14030805] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/10/2022]
Abstract
Crop yield forecasting is critical for enhancing food security and ensuring an appropriate food supply. It is critical to complete this activity with high precision at the regional and national levels to facilitate speedy decision-making. Tea is a big cash crop that contributes significantly to economic development, with a market of USD 200 billion in 2020 that is expected to reach over USD 318 billion by 2025. As a developing country, Bangladesh can be a greater part of this industry and increase its exports through its tea yield and production with favorable climatic features and land quality. Regrettably, the tea yield in Bangladesh has not increased significantly since 2008 like many other countries, despite having suitable climatic and land conditions, which is why quantifying the yield is imperative. This study developed a novel spatiotemporal hybrid DRS–RF model with a dragonfly optimization (DR) algorithm and support vector regression (S) as a feature selection approach. This study used satellite-derived hydro-meteorological variables between 1981 and 2020 from twenty stations across Bangladesh to address the spatiotemporal dependency of the predictor variables for the tea yield (Y). The results illustrated that the proposed DRS–RF hybrid model improved tea yield forecasting over other standalone machine learning approaches, with the least relative error value (11%). This study indicates that integrating the random forest model with the dragonfly algorithm and SVR-based feature selection improves prediction performance. This hybrid approach can help combat food risk and management for other countries.
|
31
|
|
32
|
A Review of the Modification Strategies of the Nature Inspired Algorithms for Feature Selection Problem. MATHEMATICS 2022. [DOI: 10.3390/math10030464] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/06/2023]
Abstract
This survey is an effort to provide a research repository and a useful reference for researchers to guide them when planning to develop new Nature-inspired Algorithms tailored to solve Feature Selection problems (NIAs-FS). We identified and performed a thorough literature review in three main streams of research lines: the feature selection problem; optimization algorithms, particularly meta-heuristic algorithms; and modifications applied to NIAs to tackle the FS problem. We provide a detailed overview of 156 different articles about NIAs modifications for tackling FS. We support our discussions with analytical views, visualized statistics, applied examples, and open-source software systems, and discuss open issues related to FS and NIAs. Finally, the survey summarizes the main foundations of NIAs-FS with approximately 34 different operators investigated. The most popular operator is chaotic maps. Hybridization is the most widely used modification technique. There are three types of hybridization: Integrating NIA with another NIA, integrating NIA with a classifier, and integrating NIA with a classifier. The most widely used hybridization is the one that integrates a classifier with the NIA. Microarray and medical applications are the dominant applications where most of the NIA-FS are modified and used. Despite the popularity of the NIAs-FS, there are still many areas that need further investigation.
|
33
|
Pashaei E, Pashaei E. An efficient binary chimp optimization algorithm for feature selection in biomedical data classification. Neural Comput Appl 2022. [DOI: 10.1007/s00521-021-06775-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/15/2023]
|
34
|
|
35
|
Binary Horse herd optimization algorithm with crossover operators for feature selection. Comput Biol Med 2021; 141:105152. [PMID: 34952338 DOI: 10.1016/j.compbiomed.2021.105152] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/29/2021] [Revised: 12/11/2021] [Accepted: 12/14/2021] [Indexed: 01/30/2023]
Abstract
This paper proposes a binary version of the Horse herd Optimization Algorithm (HOA) to tackle Feature Selection (FS) problems. This algorithm mimics the conduct of a pack of horses when they are trying to survive. To build a binary version of HOA, referred to as BHOA, two adjustments were made: i) Three transfer functions, namely S-shape, V-shape and U-shape, are utilized to transform the continuous domain into a binary one. Four configurations of each transfer function are also studied to yield four alternatives. ii) Three crossover operators, one-point, two-point and uniform, are also suggested to ensure the efficiency of the proposed method for the FS domain. The performance of the proposed fifteen BHOA versions is examined using 24 real-world FS datasets. A set of six metrics was used to evaluate the outcome of the optimization methods: accuracy, number of features selected, fitness values, sensitivity, specificity and computational time. The best-performing of the proposed versions is BHOA with S-shape and one-point crossover. A comparative evaluation was also carried out against 21 state-of-the-art methods. The proposed method is able to find very competitive results, some of which are the best recorded. Due to the viability of the proposed method, it can be further considered in other areas of machine learning.
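The two ingredients named above, transfer functions and crossover operators, can be sketched in a few lines. The abstract does not give the four configurations per transfer function, so the sketch assumes the most common S-shape (logistic sigmoid) and V-shape (`|tanh(x)|`) forms and omits the U-shape family. The crossover operators follow their standard genetic-algorithm definitions, and all function names are hypothetical.

```python
import math
import random

def s_shape(x):
    """Logistic sigmoid: probability that the bit becomes 1."""
    return 1.0 / (1.0 + math.exp(-x))

def v_shape(x):
    """|tanh(x)|: probability that the bit is flipped."""
    return abs(math.tanh(x))

def binarise_s(position, rng):
    """S-shape rule: set each bit to 1 with probability s_shape(x)."""
    return [1 if rng.random() < s_shape(x) else 0 for x in position]

def binarise_v(position, previous_mask, rng):
    """V-shape rule: flip the previous bit with probability v_shape(x)."""
    return [1 - b if rng.random() < v_shape(x) else b
            for x, b in zip(position, previous_mask)]

def one_point(a, b, rng):
    """Head of parent a, tail of parent b, split at one random cut."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def two_point(a, b, rng):
    """Parent a with a random middle segment taken from parent b (needs len >= 3)."""
    lo, hi = sorted(rng.sample(range(1, len(a)), 2))
    return a[:lo] + b[lo:hi] + a[hi:]

def uniform(a, b, rng):
    """Each position is copied from either parent with equal probability."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
```

The practical difference between the two families is that an S-shape function decides the bit's value directly, whereas a V-shape function decides whether to change the current value, so it needs the previous mask as input.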
|
36
|
Liu W, Wang J. Recursive elimination–election algorithms for wrapper feature selection. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2021.107956] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/20/2022]
|
37
|
Elkabbash ET, Mostafa RR, Barakat SI. Android malware classification based on random vector functional link and artificial Jellyfish Search optimizer. PLoS One 2021; 16:e0260232. [PMID: 34797851 PMCID: PMC8604294 DOI: 10.1371/journal.pone.0260232] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/22/2021] [Accepted: 11/04/2021] [Indexed: 11/19/2022] Open
Abstract
Smartphone usage is nearly ubiquitous worldwide, and Android provides the leading open-source operating system, retaining the most significant market share and active user population of all open-source operating systems. Hence, malicious actors target the Android operating system to capitalize on this consumer reliance and vulnerabilities present in the system. Hackers often use confidential user data to exploit users for advertising, extortion, and theft. Notably, most Android malware detection tools depend on conventional machine-learning algorithms; hence, they lose the benefits of metaheuristic optimization. Here, we introduce a novel detection system based on optimizing the random vector functional link (RVFL) using the artificial Jellyfish Search (JS) optimizer following dimensional reduction of Android application features. JS is used to determine the optimal configurations of RVFL to improve classification performance. RVFL+JS minimizes the runtime of the execution of the optimized models with the best performance metrics, based on a dataset consisting of 11,598 multi-class applications and 471 static and dynamic features.
Affiliation(s)
- Emad T. Elkabbash
- Information Systems Department, Faculty of Computers and Information Sciences, Mansoura University, Mansoura, Egypt
| | - Reham R. Mostafa
- Information Systems Department, Faculty of Computers and Information Sciences, Mansoura University, Mansoura, Egypt
| | - Sherif I. Barakat
- Information Systems Department, Faculty of Computers and Information Sciences, Mansoura University, Mansoura, Egypt
| |
|
38
|
Boosting Atomic Orbit Search Using Dynamic-Based Learning for Feature Selection. MATHEMATICS 2021. [DOI: 10.3390/math9212786] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/02/2023]
Abstract
Feature selection (FS) is a well-known preprocessing step in soft computing and machine learning algorithms. It plays a critical role in different real-world applications since it aims to determine the relevant features and remove the others. This process reduces the time and space complexity of the learning technique used to handle the collected data. Feature selection methods based on metaheuristic (MH) techniques have established their performance over conventional FS methods. In this paper, we present a modified version of a new MH technique named Atomic Orbital Search (AOS) as an FS technique. The modification uses a dynamic opposite-based learning (DOL) strategy to enhance the ability of AOS to explore the search domain, by increasing the diversity of the solutions during the search process and updating the search domain. A set of eighteen datasets was used to evaluate the efficiency of the developed FS approach, named AOSD, and the results of AOSD were compared with other MH methods. The results show that AOSD reduces the number of features while preserving or increasing classification accuracy better than other MH techniques.
|
39
|
|
40
|
Rank-driven salp swarm algorithm with orthogonal opposition-based learning for global optimization. APPL INTELL 2021; 52:7922-7964. [PMID: 34764621 PMCID: PMC8516494 DOI: 10.1007/s10489-021-02776-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Accepted: 08/16/2021] [Indexed: 12/04/2022]
Abstract
Salp swarm algorithm (SSA) is a relatively new and straightforward swarm-based meta-heuristic optimization algorithm, inspired by the flocking behavior of salps when foraging and navigating in oceans. Although SSA is very competitive, it suffers from some limitations, including unbalanced exploration and exploitation and slow convergence. Therefore, this study presents an improved version of SSA, called OOSSA, to enhance the overall performance of the basic method. First, a new opposition-based learning strategy based on the optical lens imaging principle is proposed and combined with orthogonal experimental design, yielding an orthogonal lens opposition-based learning technique that helps the population jump out of local optima. Next, a scheme for adaptively adjusting the number of leaders is adopted to boost the global exploration capability and improve the convergence speed. Also, a dynamic learning strategy is applied to the canonical methodology to improve the exploitation capability. To confirm the efficacy of the proposed OOSSA, this paper tests the method on 26 standard mathematical optimization functions with various features. In addition, the performance of the proposed methodology is validated by Wilcoxon signed-rank and Friedman statistical tests. Furthermore, three well-known engineering optimization problems and the unknown-parameter extraction problem of a photovoltaic model are used to check the ability of the OOSSA algorithm to obtain solutions to intractable real-world problems. The experimental results reveal that the developed OOSSA is significantly superior to the standard SSA, currently popular SSA-based algorithms, and other state-of-the-art meta-heuristic algorithms for solving numerical optimization, real-world engineering optimization, and photovoltaic model parameter extraction problems.
Finally, an OOSSA-based path planning approach is developed for creating the shortest obstacle-free route for autonomous mobile robots. The introduced method is compared with several successful swarm-based metaheuristic techniques on five maps, and the comparative results indicate that the suggested approach generates the shortest collision-free trajectory among its peers.
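The opposition-based learning idea named in this abstract can be illustrated in its basic form, where a candidate x inside the bounds [lb, ub] is mirrored to lb + ub - x and the better of the pair is kept. The Python sketch below assumes only that basic form. It does not reproduce the lens-imaging or orthogonal-design variants of OOSSA, and the names `opposite`, `obl_step`, and `sphere` are hypothetical, chosen for illustration.

```python
from typing import Callable, List, Sequence


def opposite(x: Sequence[float], lb: Sequence[float], ub: Sequence[float]) -> List[float]:
    """Mirror each coordinate of x inside its bounds: x_opp = lb + ub - x."""
    return [low + high - xi for xi, low, high in zip(x, lb, ub)]


def obl_step(
    population: Sequence[Sequence[float]],
    lb: Sequence[float],
    ub: Sequence[float],
    fitness: Callable[[Sequence[float]], float],
) -> List[List[float]]:
    """For each candidate, keep whichever of (candidate, opposite) has lower fitness."""
    survivors = []
    for x in population:
        x_opp = opposite(x, lb, ub)
        survivors.append(list(x) if fitness(x) <= fitness(x_opp) else x_opp)
    return survivors


def sphere(x: Sequence[float]) -> float:
    """Hypothetical test objective (minimisation): sum of squares."""
    return sum(v * v for v in x)


if __name__ == "__main__":
    pop = [[9.0, 1.0], [1.0, 1.0]]
    print(obl_step(pop, [0.0, 0.0], [10.0, 4.0], sphere))
```

Evaluating the mirrored point costs one extra fitness call per candidate, which is the usual price paid for the added diversity.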
|
41
|
A hybridization approach with predicted solution candidates for improving population-based optimization algorithms. Inf Sci (N Y) 2021. [DOI: 10.1016/j.ins.2021.04.082] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
|
42
|
Improved sine cosine algorithm with simulated annealing and singer chaotic map for Hadith classification. Neural Comput Appl 2021. [DOI: 10.1007/s00521-021-06448-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 12/22/2022]
|
43
|
A New Set of Mutation Operators for Dragonfly Algorithm. ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING 2021. [DOI: 10.1007/s13369-021-05639-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
|
44
|
Spatial bound whale optimization algorithm: an efficient high-dimensional feature selection approach. Neural Comput Appl 2021. [DOI: 10.1007/s00521-021-06224-y] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 12/30/2022]
|
45
|
Gene selection for microarray data classification based on Gray Wolf Optimizer enhanced with TRIZ-inspired operators. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2021.107034] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Indexed: 12/13/2022]
|
46
|
Chantar H, Tubishat M, Essgaer M, Mirjalili S. Hybrid Binary Dragonfly Algorithm with Simulated Annealing for Feature Selection. SN Comput Sci 2021; 2:295. [PMID: 34056623 PMCID: PMC8147911 DOI: 10.1007/s42979-021-00687-5] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 05/19/2020] [Accepted: 05/10/2021] [Indexed: 11/24/2022]
Abstract
Various fields are affected by the growth of data dimensionality. The major problems resulting from high data dimensionality include high memory requirements, high computational cost, and low machine learning classifier performance. Properly selecting relevant features from the set of available features and removing irrelevant ones helps solve these problems. To address the feature selection problem, an improved version of the Dragonfly Algorithm (DA) is proposed by combining it with Simulated Annealing (SA); the improved algorithm is named BDA-SA. To solve the local optima problem of DA and enhance its ability to select the best subset of features for classification problems, SA is applied to the best solution found by the Binary Dragonfly Algorithm in an attempt to improve its accuracy. A set of frequently used datasets from the UCI repository was utilized to evaluate the performance of the proposed FS approach. Results show that the proposed hybrid approach, BDA-SA, has superior performance when compared to wrapper-based FS methods, including a feature selection method based on the basic version of the Binary Dragonfly Algorithm.
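The refinement step described in this abstract, applying simulated annealing to a binary feature mask, can be sketched roughly as follows. This is a generic SA loop under stated assumptions, not the BDA-SA implementation: the single-bit-flip neighbourhood, the geometric cooling schedule, and the stand-in `toy_cost` (distance to a hypothetical set of "relevant" features) are all illustrative choices. In a wrapper method the cost would typically combine classifier error with a penalty on the number of selected features.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical ground truth used only by the toy cost below.
RELEVANT = [1, 0, 1, 0, 0, 1]


def toy_cost(mask: Sequence[int]) -> float:
    """Stand-in cost: number of positions where the mask disagrees with RELEVANT."""
    return float(sum(m != r for m, r in zip(mask, RELEVANT)))


def simulated_annealing(
    start: Sequence[int],
    cost: Callable[[Sequence[int]], float],
    t0: float = 1.0,
    cooling: float = 0.95,
    steps: int = 200,
    seed: int = 0,
) -> Tuple[List[int], float]:
    """Refine a binary feature mask by flipping one bit per step (minimisation)."""
    rng = random.Random(seed)
    current, current_cost = list(start), cost(start)
    best, best_cost = list(current), current_cost
    temperature = t0
    for _ in range(steps):
        neighbour = list(current)
        i = rng.randrange(len(neighbour))
        neighbour[i] = 1 - neighbour[i]  # include or exclude one feature
        neighbour_cost = cost(neighbour)
        delta = neighbour_cost - current_cost
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = neighbour, neighbour_cost
            if current_cost < best_cost:
                best, best_cost = list(current), current_cost
        temperature *= cooling  # geometric cooling
    return best, best_cost


if __name__ == "__main__":
    print(simulated_annealing([0] * len(RELEVANT), toy_cost))
```

Starting the loop from the best mask found by another optimizer, rather than from an all-zero mask, gives the hybrid arrangement the abstract describes.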
Affiliation(s)
- Hamouda Chantar
  - Faculty of Information Technology, Sebha University, Sebha, Libya
- Mohammad Tubishat
  - School of Information Technology, Skyline University College, Sharjah, United Arab Emirates
- Mansour Essgaer
  - Faculty of Information Technology, Sebha University, Sebha, Libya
- Seyedali Mirjalili
  - Center for Artificial Intelligence Research and Optimization, Torrens University Australia, Fortitude Valley, Brisbane, QLD 4006 Australia
  - Yonsei Frontier Lab, Yonsei University, Seoul, Korea
|
47
|
Wang L, Shi R, Dong J. A Hybridization of Dragonfly Algorithm Optimization and Angle Modulation Mechanism for 0-1 Knapsack Problems. ENTROPY 2021; 23:e23050598. [PMID: 34066266 PMCID: PMC8152024 DOI: 10.3390/e23050598] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 02/27/2021] [Revised: 05/08/2021] [Accepted: 05/10/2021] [Indexed: 11/16/2022]
Abstract
The dragonfly algorithm (DA) is a new intelligent algorithm based on the theory of dragonfly foraging and evading predators. DA exhibits excellent performance in solving multimodal continuous functions and engineering problems. To make this algorithm work in binary space, this paper introduces an angle modulation mechanism into DA (called AMDA) to generate bit strings, that is, to give candidate solutions to binary problems, and uses DA to optimize the coefficients of the trigonometric generating function. Further, to improve the algorithm's stability and convergence speed, an improved AMDA, called IAMDA, is proposed by adding one more coefficient to adjust the vertical displacement of the cosine part of the original generating function. To test the performance of IAMDA and AMDA, 12 zero-one knapsack problems are considered along with 13 classic benchmark functions. Experimental results show that IAMDA has superior convergence speed and solution quality compared with other algorithms.
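Angle modulation, as referred to in this abstract, lets a continuous optimizer search over a handful of real coefficients while a trigonometric generating function turns them into a bit string. The sketch below uses the four-coefficient generating function commonly given in the angle modulation literature, g(x) = sin(2π(x - a) · b · cos(2π(x - a) · c)) + d, sampled at x = 0, 1, ..., n - 1, with a bit set to 1 where g(x) > 0. It is an assumption that AMDA uses exactly this form; the extra coefficient introduced by IAMDA is not reproduced here, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def angle_modulation(x: float, a: float, b: float, c: float, d: float) -> float:
    """Four-coefficient generating function; its value lies in [d - 1, d + 1]."""
    shifted = x - a
    return math.sin(2 * math.pi * shifted * b * math.cos(2 * math.pi * shifted * c)) + d


def bits_from_coefficients(coeffs: Sequence[float], n_bits: int) -> List[int]:
    """Sample g at x = 0..n_bits-1 and emit 1 where g(x) > 0, else 0."""
    a, b, c, d = coeffs
    return [1 if angle_modulation(float(i), a, b, c, d) > 0 else 0 for i in range(n_bits)]


if __name__ == "__main__":
    # A continuous optimizer would tune these four numbers instead of n_bits binary variables.
    print(bits_from_coefficients([0.3, 0.7, 0.2, 0.1], n_bits=12))
```

The design appeal is dimensionality: the optimizer works in a four-dimensional continuous space regardless of how many items the knapsack instance has.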
|
48
|
Pashaei E, Pashaei E. Gene selection using hybrid dragonfly black hole algorithm: A case study on RNA-seq COVID-19 data. Anal Biochem 2021; 627:114242. [PMID: 33974890 DOI: 10.1016/j.ab.2021.114242] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 12/04/2020] [Revised: 04/12/2021] [Accepted: 05/02/2021] [Indexed: 11/18/2022]
Abstract
This paper introduces a new hybrid approach (DBH) for solving the gene selection problem that incorporates the strengths of two existing metaheuristics: the binary dragonfly algorithm (BDF) and the binary black hole algorithm (BBHA). This hybridization aims to identify a limited and stable set of discriminative genes without sacrificing classification accuracy, whereas most current methods have encountered challenges in extracting disease-related information from a vast number of redundant genes. The proposed approach first applies the minimum redundancy maximum relevancy (MRMR) filter method to reduce the dimensionality of the feature space and then uses the suggested hybrid DBH algorithm to determine a smaller set of significant genes. The proposed approach was evaluated on eight benchmark gene expression datasets and compared against the latest state-of-the-art techniques to demonstrate algorithm efficiency. The comparative study shows that the proposed approach achieves a significant improvement over existing methods in terms of classification accuracy and the number of selected genes. Moreover, the performance of the suggested method was examined on real RNA-Seq coronavirus-related gene expression data of asthmatic patients for selecting the most significant genes in order to improve the discriminative accuracy of angiotensin-converting enzyme 2 (ACE2). ACE2, as a coronavirus receptor, is a biomarker that helps to distinguish infected from uninfected patients in order to identify subgroups at risk for COVID-19. The results indicate that the suggested MRMR-DBH approach is a very promising framework for finding a new combination of the most discriminative genes with high classification accuracy.
Affiliation(s)
- Elnaz Pashaei
  - Department of Software Engineering, Istanbul Aydin University, Istanbul, Turkey.
- Elham Pashaei
  - Department of Computer Engineering, Istanbul Gelisim University, Istanbul, Turkey.
|
49
|
Pachauri N, K GB. Automatic drug infusion control based on metaheuristic H2 optimal theory for regulating the mean arterial blood pressure. ASIA-PAC J CHEM ENG 2021. [DOI: 10.1002/apj.2654] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/07/2022]
Affiliation(s)
- Nikhil Pachauri
  - School of Electrical & Electronics Engineering SASTRA University Thanjavur India
- Ghousiya Begum K
  - School of Electrical & Electronics Engineering SASTRA University Thanjavur India
|
50
|
Jiang Y, Luo Q, Wei Y, Abualigah L, Zhou Y. An efficient binary Gradient-based optimizer for feature selection. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2021; 18:3813-3854. [PMID: 34198414 DOI: 10.3934/mbe.2021192] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Indexed: 06/13/2023]
Abstract
Feature selection (FS) is a classic and challenging optimization task in the field of machine learning and data mining. Gradient-based optimizer (GBO) is a recently developed population-based metaheuristic inspired by the gradient-based Newton's method; it uses two main operators, the gradient search rule (GSR) and the local escape operator (LEO), together with a set of vectors to explore the search space for solving continuous problems. This article presents a binary GBO (BGBO) algorithm for feature selection problems. Eight independent GBO variants are proposed, and eight transfer functions, divided into S-shaped and V-shaped families, are evaluated to map the continuous search space to a discrete one. To verify the performance of the proposed binary GBO algorithm, 18 well-known UCI datasets and 10 high-dimensional datasets are tested and compared with other advanced FS methods. The experimental results identify the proposed binary GBO variant with the best comprehensive performance and show that it performs better than other well-known metaheuristic algorithms in terms of the performance measures.
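The role of the transfer functions mentioned in this abstract can be shown with one representative from each family. The sketch below assumes the logistic sigmoid as the S-shaped function and |tanh(v)| as the V-shaped function, together with the update rules usually paired with them in the binary metaheuristic literature: an S-shaped value is the probability that a bit is set to 1, and a V-shaped value is the probability that the current bit is flipped. These are illustrative choices, not the eight functions evaluated in the article, and all names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def s_shaped(v: float) -> float:
    """Logistic sigmoid, written to avoid overflow for large |v|."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def v_shaped(v: float) -> float:
    """|tanh(v)|: zero at v = 0 and approaching 1 as |v| grows."""
    return abs(math.tanh(v))


def binarize_s(position: Sequence[float], rng: random.Random) -> List[int]:
    """S-shaped rule: each bit becomes 1 with probability s_shaped(v)."""
    return [1 if rng.random() < s_shaped(v) else 0 for v in position]


def binarize_v(position: Sequence[float], bits: Sequence[int], rng: random.Random) -> List[int]:
    """V-shaped rule: each current bit is flipped with probability v_shaped(v)."""
    return [1 - b if rng.random() < v_shaped(v) else b for v, b in zip(position, bits)]


if __name__ == "__main__":
    rng = random.Random(0)
    continuous = [-2.0, -0.1, 0.0, 0.4, 3.0]
    print(binarize_s(continuous, rng))
    print(binarize_v(continuous, [1, 0, 1, 0, 1], rng))
```

The practical difference is that the S-shaped rule ignores the previous bit, while the V-shaped rule leaves a bit unchanged when the continuous value is near zero.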
Affiliation(s)
- Yugui Jiang
  - College of Artificial Intelligence, Guangxi University for Nationalities, Nanning 530006, China
  - Guangxi Key Laboratories of Hybrid Computation and IC Design Analysis, Nanning 530006, China
- Qifang Luo
  - College of Artificial Intelligence, Guangxi University for Nationalities, Nanning 530006, China
  - Guangxi Key Laboratories of Hybrid Computation and IC Design Analysis, Nanning 530006, China
- Yuanfei Wei
  - Xiangsihu College of Guangxi University for Nationalities, Nanning, Guangxi 532100, China
- Laith Abualigah
  - Faculty of Computer Sciences and Informatics, Amman Arab University, Amman 11953, Jordan
- Yongquan Zhou
  - College of Artificial Intelligence, Guangxi University for Nationalities, Nanning 530006, China
  - Guangxi Key Laboratories of Hybrid Computation and IC Design Analysis, Nanning 530006, China
|