1
|
Barrera-García J, Cisternas-Caneo F, Crawford B, Gómez Sánchez M, Soto R. Feature Selection Problem and Metaheuristics: A Systematic Literature Review about Its Formulation, Evaluation and Applications. Biomimetics (Basel) 2023; 9:9. [PMID: 38248583 PMCID: PMC10813816 DOI: 10.3390/biomimetics9010009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/25/2023] [Revised: 12/16/2023] [Accepted: 12/18/2023] [Indexed: 01/23/2024] Open
Abstract
Feature selection is becoming a relevant problem within the field of machine learning. The feature selection problem focuses on the selection of the small, necessary, and sufficient subset of features that represent the general set of features, eliminating redundant and irrelevant information. Given the importance of the topic, in recent years there has been a boom in the study of the problem, generating a large number of related investigations. Given this, this work analyzes 161 articles published between 2019 and 2023 (20 April 2023), emphasizing the formulation of the problem and performance measures, and proposing classifications for the objective functions and evaluation metrics. Furthermore, an in-depth description and analysis of metaheuristics, benchmark datasets, and practical real-world applications are presented. Finally, in light of recent advances, this review paper provides future research opportunities.
Collapse
Affiliation(s)
- José Barrera-García
- Escuela de Ingeniería Informática, Pontificia Universidad Católica de Valparaíso, Avenida Brasil 2241, Valparaíso 2362807, Chile; (J.B.-G.); (F.C.-C.); (R.S.)
| | - Felipe Cisternas-Caneo
- Escuela de Ingeniería Informática, Pontificia Universidad Católica de Valparaíso, Avenida Brasil 2241, Valparaíso 2362807, Chile; (J.B.-G.); (F.C.-C.); (R.S.)
| | - Broderick Crawford
- Escuela de Ingeniería Informática, Pontificia Universidad Católica de Valparaíso, Avenida Brasil 2241, Valparaíso 2362807, Chile; (J.B.-G.); (F.C.-C.); (R.S.)
| | - Mariam Gómez Sánchez
- Departamento de Electrotecnia e Informática, Universidad Técnica Federico Santa María, Federico Santa María 6090, Viña del Mar 2520000, Chile;
| | - Ricardo Soto
- Escuela de Ingeniería Informática, Pontificia Universidad Católica de Valparaíso, Avenida Brasil 2241, Valparaíso 2362807, Chile; (J.B.-G.); (F.C.-C.); (R.S.)
| |
Collapse
|
2
|
Multi-Label Feature Selection with Conditional Mutual Information. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022; 2022:9243893. [PMID: 36254206 PMCID: PMC9569236 DOI: 10.1155/2022/9243893] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 08/04/2022] [Revised: 09/09/2022] [Accepted: 09/19/2022] [Indexed: 11/18/2022]
Abstract
Feature selection is an important way to optimize the efficiency and accuracy of classifiers. However, traditional feature selection methods cannot work with many kinds of data in the real world, such as multi-label data. To overcome this challenge, multi-label feature selection is developed. Multi-label feature selection plays an irreplaceable role in pattern recognition and data mining. This process can improve the efficiency and accuracy of multi-label classification. However, traditional multi-label feature selection based on mutual information does not fully consider the effect of redundancy among labels. The deficiency may lead to repeated computing of mutual information and leave room to enhance the accuracy of multi-label feature selection. To deal with this challenge, this paper proposed a multi-label feature selection based on conditional mutual information among labels (CRMIL). Firstly, we analyze how to reduce the redundancy among features based on existing papers. Secondly, we propose a new approach to diminish the redundancy among labels. This method takes label sets as conditions to calculate the relevance between features and labels. This approach can weaken the impact of the redundancy among labels on feature selection results. Finally, we analyze this algorithm and balance the effects of relevance and redundancy on the evaluation function. For testing CRMIL, we compare it with the other eight multi-label feature selection algorithms on ten datasets and use four evaluation criteria to examine the results. Experimental results illustrate that CRMIL performs better than other existing algorithms.
Collapse
|
3
|
Song J, Li Z, Yao G, Wei S, Li L, Wu H. Framework for feature selection of predicting the diagnosis and prognosis of necrotizing enterocolitis. PLoS One 2022; 17:e0273383. [PMID: 35984833 PMCID: PMC9390903 DOI: 10.1371/journal.pone.0273383] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/21/2022] [Accepted: 08/08/2022] [Indexed: 11/18/2022] Open
Abstract
Neonatal necrotizing enterocolitis (NEC) occurs worldwide and is a major source of neonatal morbidity and mortality. Researchers have developed many methods for predicting NEC diagnosis and prognosis. However, most people use statistical methods to select features, which may ignore the correlation between features. In addition, because they consider a small dimension of characteristics, they neglect some laboratory parameters such as white blood cell count, lymphocyte percentage, and mean platelet volume, which could be potentially influential factors affecting the diagnosis and prognosis of NEC. To address these issues, we include more perinatal, clinical, and laboratory information, including anemia—red blood cell transfusion and feeding strategies, and propose a ridge regression and Q-learning strategy based bee swarm optimization (RQBSO) metaheuristic algorithm for predicting NEC diagnosis and prognosis. Finally, a linear support vector machine (linear SVM), which specializes in classifying high-dimensional features, is used as a classifier. In the NEC diagnostic prediction experiment, the area under the receiver operating characteristic curve (AUROC) of dataset 1 (feeding intolerance + NEC) reaches 94.23%. In the NEC prognostic prediction experiment, the AUROC of dataset 2 (medical NEC + surgical NEC) reaches 91.88%. Additionally, the classification accuracy of the RQBSO algorithm on the NEC dataset is higher than the other feature selection algorithms. Thus, the proposed approach has the potential to identify predictors that contribute to the diagnosis of NEC and stratification of disease severity in a clinical setting.
Collapse
Affiliation(s)
- Jianfei Song
- College of Communication Engineering, Jilin University, Changchun, Jilin, PR China
| | - Zhenyu Li
- Department of Neonatology, Jilin University First Hospital, Changchun, Jilin, PR China
| | - Guijin Yao
- College of Communication Engineering, Jilin University, Changchun, Jilin, PR China
| | - Songping Wei
- College of Communication Engineering, Jilin University, Changchun, Jilin, PR China
| | - Ling Li
- College of Communication Engineering, Jilin University, Changchun, Jilin, PR China
- * E-mail: (LL); (HW)
| | - Hui Wu
- Department of Neonatology, Jilin University First Hospital, Changchun, Jilin, PR China
- * E-mail: (LL); (HW)
| |
Collapse
|
4
|
A self-adaptive weighted differential evolution approach for large-scale feature selection. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2021.107633] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/18/2022]
|
5
|
Multi-Population Parallel Wolf Pack Algorithm for Task Assignment of UAV Swarm. APPLIED SCIENCES-BASEL 2021. [DOI: 10.3390/app112411996] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
The effectiveness of the Wolf Pack Algorithm (WPA) in high-dimensional discrete optimization problems has been verified in previous studies; however, it usually takes too long to obtain the best solution. This paper proposes the Multi-Population Parallel Wolf Pack Algorithm (MPPWPA), in which the size of the wolf population is reduced by dividing the population into multiple sub-populations that optimize independently at the same time. Using the approximate average division method, the population is divided into multiple equal mass sub-populations whose better individuals constitute an elite sub-population. Through the elite-mass population distribution, those better individuals are optimized twice by the elite sub-population and mass sub-populations, which can accelerate the convergence. In order to maintain the population diversity, population pretreatment is proposed. The sub-populations migrate according to a constant migration probability and the migration of sub-populations are equivalent to the re-division of the confluent population. Finally, the proposed algorithm is carried out in a synchronous parallel system. Through the simulation experiments on the task assignment of the UAV swarm in three scenarios whose dimensions of solution space are 8, 30 and 150, the MPPWPA is verified as being effective in improving the optimization performance.
Collapse
|
6
|
|
7
|
Improved Equilibrium Optimization Algorithm Using Elite Opposition-Based Learning and New Local Search Strategy for Feature Selection in Medical Datasets. COMPUTATION 2021. [DOI: 10.3390/computation9060068] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
The rapid growth in biomedical datasets has generated high dimensionality features that negatively impact machine learning classifiers. In machine learning, feature selection (FS) is an essential process for selecting the most significant features and reducing redundant and irrelevant features. In this study, an equilibrium optimization algorithm (EOA) is used to minimize the selected features from high-dimensional medical datasets. EOA is a novel metaheuristic physics-based algorithm and newly proposed to deal with unimodal, multi-modal, and engineering problems. EOA is considered as one of the most powerful, fast, and best performing population-based optimization algorithms. However, EOA suffers from local optima and population diversity when dealing with high dimensionality features, such as in biomedical datasets. In order to overcome these limitations and adapt EOA to solve feature selection problems, a novel metaheuristic optimizer, the so-called improved equilibrium optimization algorithm (IEOA), is proposed. Two main improvements are included in the IEOA: The first improvement is applying elite opposite-based learning (EOBL) to improve population diversity. The second improvement is integrating three novel local search strategies to prevent it from becoming stuck in local optima. The local search strategies applied to enhance local search capabilities depend on three approaches: mutation search, mutation–neighborhood search, and a backup strategy. The IEOA has enhanced the population diversity, classification accuracy, and selected features, and increased the convergence speed rate. To evaluate the performance of IEOA, we conducted experiments on 21 biomedical benchmark datasets gathered from the UCI repository. Four standard metrics were used to test and evaluate IEOA’s performance: the number of selected features, classification accuracy, fitness value, and p-value statistical test. Moreover, the proposed IEOA was compared with the original EOA and other well-known optimization algorithms. Based on the experimental results, IEOA confirmed its better performance in comparison to the original EOA and the other optimization algorithms, for the majority of the used datasets.
Collapse
|
8
|
Hao K, Zhao J, Yu K, Li C, Wang C. Path Planning of Mobile Robots Based on a Multi-Population Migration Genetic Algorithm. SENSORS (BASEL, SWITZERLAND) 2020; 20:E5873. [PMID: 33080811 PMCID: PMC7589392 DOI: 10.3390/s20205873] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 10/03/2020] [Revised: 10/14/2020] [Accepted: 10/15/2020] [Indexed: 11/16/2022]
Abstract
In the field of robot path planning, aiming at the problems of the standard genetic algorithm, such as premature maturity, low convergence path quality, poor population diversity, and difficulty in breaking the local optimal solution, this paper proposes a multi-population migration genetic algorithm. The multi-population migration genetic algorithm randomly divides a large population into several small with an identical population number. The migration mechanism among the populations is used to replace the screening mechanism of the selection operator. Operations such as the crossover operator and the mutation operator also are improved. Simulation results show that the multi-population migration genetic algorithm (MPMGA) is not only suitable for simulation maps of various scales and various obstacle distributions, but also has superior performance and effectively solves the problems of the standard genetic algorithm.
Collapse
Affiliation(s)
- Kun Hao
- School of Computer and Information Engineering, Tianjin Chengjian University, Tianjin 300384, China; (J.Z.); (C.L.)
| | - Jiale Zhao
- School of Computer and Information Engineering, Tianjin Chengjian University, Tianjin 300384, China; (J.Z.); (C.L.)
| | - Kaicheng Yu
- School of International Education, Tianjin Chengjian University, Tianjin 300384, China;
| | - Cheng Li
- School of Computer and Information Engineering, Tianjin Chengjian University, Tianjin 300384, China; (J.Z.); (C.L.)
| | - Chuanqi Wang
- Tianjin Keyvia Electric Co., Ltd, Tianjin 300384, China;
| |
Collapse
|