1
Guo X, Hu J, Yu H, Wang M, Yang B. A new population initialization of metaheuristic algorithms based on hybrid fuzzy rough set for high-dimensional gene data feature selection. Comput Biol Med 2023; 166:107538. [PMID: 37857136 DOI: 10.1016/j.compbiomed.2023.107538] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/07/2023] [Revised: 09/06/2023] [Accepted: 09/28/2023] [Indexed: 10/21/2023]
Abstract
In modern medicine and biology, vast amounts of highly complex genetic data are available. However, such high-dimensional data are challenging to handle because of their size and processing complexity, so identifying critical genes to reduce data dimensionality is essential. The filter-wrapper hybrid method is a commonly used approach to feature selection. Most such methods employ filters such as MRMR and ReliefF, but the performance of these simple filters is limited. Rough set methods, by contrast, are filter methods that outperform traditional filters. At the same time, many studies have pointed out that a good initialization strategy is crucial to the performance of metaheuristic algorithms (a type of wrapper-based method). Combining these two points, this paper proposes a novel filter-wrapper hybrid method for high-dimensional feature selection. Specifically, we use a variant of bWOA (binary Whale Optimization Algorithm) based on a Hybrid Fuzzy Rough Set to perform attribute reduction, and the reduced attributes are used as prior knowledge to initialize the population. We then employ metaheuristics for further feature selection starting from this initialized population. We conducted experiments using five different algorithms on 14 UCI datasets. The experimental results show that, after applying the proposed initialization method, the performance of all five enhanced algorithms improved significantly. In particular, fuzzy_bMFO, the bMFO variant that uses our initialization method, outperformed six currently advanced algorithms, indicating that our initialization method for metaheuristic algorithms is suitable for high-dimensional feature selection tasks.
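The idea of seeding a binary metaheuristic with a filter's output can be sketched as follows. This is only an illustration under stated assumptions, not the authors' procedure: the prior subset is passed in directly rather than computed by a fuzzy-rough-set reduction, and the function name `init_population` and the inclusion probabilities `p_prior` and `p_other` are hypothetical.

```python
import random


def init_population(n_features, prior_features, pop_size,
                    p_prior=0.9, p_other=0.1, seed=0):
    """Build a binary population biased toward a prior feature subset.

    prior_features: indices kept by a filter stage (assumed given here).
    p_prior / p_other: hypothetical probabilities of switching a feature on
    when it is inside / outside the prior subset; not taken from the paper.
    """
    rng = random.Random(seed)
    prior = set(prior_features)
    population = []
    for _ in range(pop_size):
        individual = [
            1 if rng.random() < (p_prior if j in prior else p_other) else 0
            for j in range(n_features)
        ]
        population.append(individual)
    return population


# 30 candidate feature masks over 20 features, biased toward features 1, 4, 7.
pop = init_population(n_features=20, prior_features=[1, 4, 7], pop_size=30)
```

Keeping `p_other` above zero leaves some chance of exploring features the filter discarded, while a high `p_prior` starts most individuals near the filter's subset.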
Affiliation(s)
- Xuanming Guo
  - College of Computer Science and Technology, Jilin University, Changchun, 130012, China.
- Jiao Hu
  - College of Computer Science and Technology, Jilin University, Changchun, 130012, China.
- Helong Yu
  - College of Information Technology, Jilin Agricultural University, Changchun, 130118, China.
- Mingjing Wang
  - School of Data Science and Artificial Intelligence, Wenzhou University of Technology, Wenzhou, 325000, China.
- Bo Yang
  - College of Computer Science and Technology, Jilin University, Changchun, 130012, China.
2
Alhenawi E, Al-Sayyed R, Hudaib A, Mirjalili S. Improved intelligent water drop-based hybrid feature selection method for microarray data processing. Comput Biol Chem 2023; 103:107809. [PMID: 36696844 DOI: 10.1016/j.compbiolchem.2022.107809] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 10/10/2021] [Revised: 12/13/2022] [Accepted: 12/30/2022] [Indexed: 01/15/2023]
Abstract
Classifying microarray datasets, which usually contain many noise genes that degrade classifier performance and reduce classification accuracy, is a competitive research topic. Feature selection (FS) is one of the most practical ways of finding the optimal subset of genes that increases classification accuracy for diagnostic and prognostic prediction of cancer from microarray datasets. There is therefore a continuing need for more efficient FS methods that select only an optimal or close-to-optimal subset of features to improve classification performance. In this paper, we propose a hybrid FS method for microarray data processing that combines an ensemble filter with an Improved Intelligent Water Drop (IIWD) algorithm as a wrapper. The IIWD algorithm adds one of three local search (LS) algorithms, Tabu search (TS), Novel LS algorithm (NLSA), or Hill Climbing (HC), in each iteration of IWD, and uses a correlation coefficient filter as the heuristic undesirability (HUD) for next-node selection in the original IWD algorithm. The effect of adding each LS algorithm was evaluated by comparing the proposed ensemble filter-IIWD-based wrapper without any LS algorithm (named PHFS-IWD) against the variants that add TS, NLSA, or HC (named PHFS-IWDTS, PHFS-IWDNLSA, and PHFS-IWDHC, respectively). A Naïve Bayes (NB) classifier and five microarray datasets were used to evaluate and compare the proposed hybrid FS methods. Results show that using LS algorithms in each iteration of the IWD algorithm improves the F-score by 5% on average compared with PHFS-IWD. In addition, PHFS-IWDNLSA improves the F-score by an average of 4.15% over PHFS-IWDTS and 5.67% over PHFS-IWDHC, while PHFS-IWDTS outperforms PHFS-IWDHC by an average of 1.6%. Moreover, the proposed hybrid FS methods improve accuracy by 8.92% on average in three of the five datasets and reduce the number of genes by 58.5% in all five datasets compared with six of the most recent state-of-the-art FS methods.
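To make the role of a local search step inside a wrapper more concrete, here is a minimal sketch of bit-flip hill climbing on a binary feature mask. It is a generic illustration only: it does not implement IWD, TS, or NLSA, and the names `hill_climb`, `toy_fitness`, and `USEFUL` are hypothetical. In a real wrapper the fitness would come from a classifier such as Naïve Bayes; here a toy fitness stands in for it.

```python
import random


def hill_climb(mask, fitness, max_steps=100, seed=0):
    """First-improvement bit-flip hill climbing on a binary feature mask."""
    rng = random.Random(seed)
    current = list(mask)
    current_fit = fitness(current)
    for _ in range(max_steps):
        improved = False
        order = list(range(len(current)))
        rng.shuffle(order)               # visit features in random order
        for j in order:
            current[j] ^= 1              # flip feature j on/off
            new_fit = fitness(current)
            if new_fit > current_fit:    # keep the first improving flip
                current_fit = new_fit
                improved = True
                break
            current[j] ^= 1              # undo a non-improving flip
        if not improved:                 # local optimum reached
            break
    return current, current_fit


# Toy fitness (an assumption): reward three "useful" features and
# penalise subset size, mimicking an accuracy-versus-size trade-off.
USEFUL = {0, 3, 5}


def toy_fitness(mask):
    hits = sum(1 for j in USEFUL if mask[j])
    return hits - 0.1 * sum(mask)


best_mask, best_fit = hill_climb([1] * 8, toy_fitness)
```

Starting from the full feature set, each accepted flip either drops an unhelpful feature or restores a useful one, so the search stops at the mask that keeps only the three useful features.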
Affiliation(s)
- Esra'a Alhenawi
  - Software Engineering Department, Al-Ahliyya Amman University, Amman, Jordan; King Abdullah II School for Information Technology, The University of Jordan, Amman, Jordan.
- Rizik Al-Sayyed
  - King Abdullah II School for Information Technology, The University of Jordan, Amman, Jordan.
- Amjad Hudaib
  - King Abdullah II School for Information Technology, The University of Jordan, Amman, Jordan.
- Seyedali Mirjalili
  - Center for Artificial Intelligence Research and Optimization, Torrens University Australia, Fortitude Valley, Brisbane, 4006 QLD, Australia; University Research and Innovation Center, Obuda University, Budapest, Hungary.
3
Yue Y, Cao L, Lu D, Hu Z, Xu M, Wang S, Li B, Ding H. Review and empirical analysis of sparrow search algorithm. Artif Intell Rev 2023. [DOI: 10.1007/s10462-023-10435-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/05/2023]
4
Zitar RA, Al-Betar MA, Awadallah MA, Doush IA, Assaleh K. An Intensive and Comprehensive Overview of JAYA Algorithm, its Versions and Applications. Archives of Computational Methods in Engineering: State of the Art Reviews 2022; 29:763-792. [PMID: 34075292 PMCID: PMC8155802 DOI: 10.1007/s11831-021-09585-8] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 10/14/2020] [Accepted: 04/05/2021] [Indexed: 05/16/2023]
Abstract
This review paper gives an intensive overview of the JAYA algorithm, a recent population-based algorithm. The JAYA algorithm combines the survival-of-the-fittest principle of evolutionary algorithms with the attraction toward the global optimal solution found in Swarm Intelligence methods. First, the optimization model and convergence characteristics of the JAYA algorithm are carefully analyzed. Thereafter, the proposed versions of the JAYA algorithm are surveyed, including modified, binary, hybridized, parallel, chaotic, multi-objective, and other variants. The various applications tackled using relevant versions of the JAYA algorithm are also discussed and summarized by problem domain. Furthermore, open-source code for the JAYA algorithm is identified to provide rich resources for the JAYA research community. A critical analysis of the JAYA algorithm reveals its advantages and limitations in dealing with optimization problems. Finally, the paper ends with a conclusion and possible future enhancements suggested to improve the performance of the JAYA algorithm. Readers of this overview will be able to determine the domains and applications best suited to the JAYA algorithm and to justify their JAYA-related contributions.
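As a concrete illustration of the two ingredients named in the abstract, the sketch below implements the basic JAYA move (toward the best solution, away from the worst) with greedy survivor selection. It is a minimal sketch of the standard continuous JAYA update, not code from the review; the function name `jaya_minimize`, the sphere test objective, and all parameter values are assumptions chosen for the example.

```python
import random


def jaya_minimize(objective, bounds, pop_size=20, iterations=200, seed=0):
    """Minimise `objective` over box `bounds` with the basic JAYA update."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(x) for x in pop]
    for _ in range(iterations):
        # Best and worst are taken once, at the start of the iteration.
        best = pop[scores.index(min(scores))]
        worst = pop[scores.index(max(scores))]
        for i, x in enumerate(pop):
            candidate = []
            for j, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Move toward the best solution and away from the worst.
                v = (x[j]
                     + r1 * (best[j] - abs(x[j]))
                     - r2 * (worst[j] - abs(x[j])))
                candidate.append(min(max(v, lo), hi))  # clip to the bounds
            c_score = objective(candidate)
            if c_score < scores[i]:  # greedy: keep only improvements
                pop[i], scores[i] = candidate, c_score
    k = scores.index(min(scores))
    return pop[k], scores[k]


def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


bounds = [(-5.0, 5.0)] * 3
best_x, best_score = jaya_minimize(sphere, bounds)
```

The update has no algorithm-specific tuning parameters beyond population size and iteration count, and the greedy replacement means the best score in the population never gets worse from one iteration to the next.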
Affiliation(s)
- Raed Abu Zitar
  - Sorbonne University Center of Artificial Intelligence, Sorbonne University-Abu Dhabi, Abu Dhabi, UAE
- Mohammed Azmi Al-Betar
  - Artificial Intelligence Research Center (AIRC), College of Engineering and Information Technology, Ajman University, Ajman, UAE
  - Department of Information Technology, Al-Huson University College, Al-Balqa Applied University, Irbid, Jordan
- Mohammed A. Awadallah
  - Artificial Intelligence Research Center (AIRC), College of Engineering and Information Technology, Ajman University, Ajman, UAE
  - Department of Computer Science, Al-Aqsa University, P.O. Box 4051, Gaza, Palestine
- Iyad Abu Doush
  - Computing Department, American University of Kuwait, Salmiya, Kuwait
  - Computer Science Department, Yarmouk University, Irbid, Jordan
- Khaled Assaleh
  - Artificial Intelligence Research Center (AIRC), College of Engineering and Information Technology, Ajman University, Ajman, UAE
5
Alhenawi E, Al-Sayyed R, Hudaib A, Mirjalili S. Feature selection methods on gene expression microarray data for cancer classification: A systematic review. Comput Biol Med 2022; 140:105051. [PMID: 34839186 DOI: 10.1016/j.compbiomed.2021.105051] [Citation(s) in RCA: 36] [Impact Index Per Article: 18.0] [Received: 07/09/2021] [Revised: 11/01/2021] [Accepted: 11/15/2021] [Indexed: 11/29/2022]
Abstract
This systematic review provides researchers interested in feature selection (FS) for processing microarray data with comprehensive information about the main research directions in gene expression classification over the past seven years. A set of 132 studies published by three different publishers is reviewed. The studied papers are categorized into nine directions based on their objectives, and the level of attention each FS direction received is summarized. The review revealed that 'propose hybrid FS methods' attracted the most interest, with a share of 34.9%, while the other directions had lower shares ranging from 13.6% down to 3%. This guides researchers in selecting the most competitive research direction. Papers in each category are thoroughly reviewed from six perspectives: method(s), classifier(s), dataset(s), dataset dimension(s) range, performance metric(s), and result(s) achieved.
Affiliation(s)
- Esra'a Alhenawi
  - King Abdullah II School for Information Technology, The University of Jordan, Amman, Jordan.
- Rizik Al-Sayyed
  - King Abdullah II School for Information Technology, The University of Jordan, Amman, Jordan.
- Amjad Hudaib
  - King Abdullah II School for Information Technology, The University of Jordan, Amman, Jordan.
- Seyedali Mirjalili
  - Center for Artificial Intelligence Research and Optimization, Torrens University Australia, Fortitude Valley, Brisbane, 4006, QLD, Australia; Yonsei Frontier Lab, Yonsei University, Seoul, South Korea.
6
Abinash MJ, Vasudevan V. Boundaries tuned support vector machine (BT-SVM) classifier for cancer prediction from gene selection. Comput Methods Biomech Biomed Engin 2021; 25:794-807. [PMID: 34585639 DOI: 10.1080/10255842.2021.1981300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Abstract
The identification of genes that indicate cancer-causing diseases has recently come to play a crucial part in microarray data analysis. A huge volume of data is required because the disease changes often. Conventional data mining techniques fall short in terms of space and time complexity. The proposed work is therefore carried out on big data. In this study, feature extraction is developed using Improved Supervised Principal Component Analysis (ISPCA). A covariance matrix is generated for the gene expression data, and cancer classification through feature selection is performed by ISPCA. Further feature selection is performed by a boundaries tuned support vector machine (BT-SVM) classifier and modified particle swarm optimization with a novel wrapper model algorithm. The experiments use different datasets, such as leukaemia, breast cancer, brain cancer, colon, and lung carcinoma, from the UCI repository. The proposed work is run on six benchmark DNA microarray datasets and evaluated in terms of accuracy, recall, and precision. To assess its effectiveness, the proposed work is compared with various traditional techniques, and it achieved optimum accuracy, recall, precision, and training time both with and without feature selection.
Affiliation(s)
- M J Abinash
  - Department of Computer Science, Sri Kaliswari College (Autonomous), Sivakasi, Tamil Nadu, India
- V Vasudevan
  - Department of Information Technology, Kalasalingam Academy of Research and Education, Krishnankoil, Tamil Nadu, India
7
Mahapatra S, Sahu SS. ANOVA-particle swarm optimization-based feature selection and gradient boosting machine classifier for improved protein-protein interaction prediction. Proteins 2021; 90:443-454. [PMID: 34528291 DOI: 10.1002/prot.26236] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/25/2020] [Revised: 08/09/2021] [Accepted: 09/03/2021] [Indexed: 01/22/2023]
Abstract
Feature fusion and selection strategies have been applied to improve accuracy in the prediction of protein-protein interaction (PPI). In this paper, an embedded feature selection framework, termed AVPSO, is developed by integrating a cost function based on analysis of variance (ANOVA) with particle swarm optimization (PSO). First, the features of the protein sequences extracted using pseudo-amino acid composition (PseAAC), conjoint triad composition, and local descriptor are fused. Then, AVPSO is employed to select the optimal set of features. The light gradient boosting machine (LGBM) classifier is used to predict the PPIs using the optimal feature subset. In five-fold cross-validation, the proposed model (AVPSO-LGBM) achieved average accuracies of 97.12% and 95.09% on the intraspecies PPI datasets Saccharomyces cerevisiae and Helicobacter pylori, respectively. On the interspecies PPI datasets Human-Bacillus and Human-Yersinia, average accuracies of 95.20% and 93.44% were achieved. Results obtained on independent test datasets and network datasets show that the prediction accuracy of AVPSO-LGBM is better than that of existing methods, demonstrating its generalization ability. The improved prediction performance obtained by the proposed model makes it a reliable and effective PPI prediction model.
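The abstract does not spell out the ANOVA-based cost function, so the sketch below shows only the plain one-way ANOVA F-statistic for a single feature, which is the usual building block for ANOVA-style feature scoring. It should not be read as the AVPSO cost function itself; the name `anova_f` and the sample values are assumptions.

```python
def anova_f(values, labels):
    """One-way ANOVA F-statistic of one feature across class labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two classes and more samples than classes")
    grand_mean = sum(values) / n
    means = {y: sum(g) / len(g) for y, g in groups.items()}
    # Between-class and within-class sums of squares.
    ssb = sum(len(g) * (means[y] - grand_mean) ** 2 for y, g in groups.items())
    ssw = sum((v - means[y]) ** 2 for y, g in groups.items() for v in g)
    if ssw == 0:
        return float("inf") if ssb > 0 else 0.0
    return (ssb / (k - 1)) / (ssw / (n - k))


labels = [0, 0, 0, 1, 1, 1]
f_separating = anova_f([1, 2, 3, 4, 5, 6], labels)   # class means differ
f_overlapping = anova_f([1, 5, 3, 1, 5, 3], labels)  # class means are equal
```

A feature whose class means differ strongly relative to its within-class spread gets a large F value, which is why such a score is a natural term in a selection cost.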
Affiliation(s)
- Satyajit Mahapatra
  - Department of Electronics and Communication Engineering, Birla Institute of Technology, Ranchi, India
- Sitanshu Sekhar Sahu
  - Department of Electronics and Communication Engineering, Birla Institute of Technology, Ranchi, India
8
A novel bio-inspired hybrid multi-filter wrapper gene selection method with ensemble classifier for microarray data. Neural Comput Appl 2021; 35:11531-11561. [PMID: 34539088 PMCID: PMC8435304 DOI: 10.1007/s00521-021-06459-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/10/2020] [Accepted: 08/26/2021] [Indexed: 01/04/2023]
Abstract
Microarray technology is known as one of the most important tools for collecting DNA expression data. This technology allows researchers to investigate and examine types of diseases and their origins. However, microarray data are often associated with a small sample size, a large number of genes, imbalanced data, etc., making classification models inefficient. Thus, a new hybrid solution based on a multi-filter and an adaptive chaotic multi-objective forest optimization algorithm (AC-MOFOA) is presented to solve the gene selection problem and construct the Ensemble Classifier. In the proposed solution, a multi-filter model (i.e., ensemble filter) is used as a preprocessing step to reduce the dataset's dimensions, combining five filter methods to remove redundant and irrelevant genes. The results of the five filter methods are combined using a voting-based function. The results indicate that the proposed multi-filter is good at reducing the gene subset size and selecting relevant genes. Then, an AC-MOFOA based on the concepts of non-dominated sorting, crowding distance, chaos theory, and adaptive operators is presented. As a wrapper method, AC-MOFOA aims to reduce dataset dimensions, optimize KELM, and increase classification accuracy simultaneously. Next, an ensemble classifier model is built from the AC-MOFOA results to classify microarray data. The performance of the proposed algorithm was evaluated on nine public microarray datasets, and its results were compared with five hybrid multi-objective methods and three hybrid single-objective methods in terms of the number of selected genes, classification efficiency, execution time, time complexity, hypervolume indicator, and spacing metric. According to the results, the proposed hybrid method increased the accuracy of the KELM in most datasets by reducing the dataset's dimensions and achieved similar or superior performance compared to other multi-objective methods. Furthermore, the proposed Ensemble Classifier model provided better classification accuracy and generalizability in seven of the nine microarray datasets compared to conventional ensemble methods. Moreover, a comparison of the Ensemble Classifier model with three state-of-the-art ensemble generation methods indicates its competitive performance: the proposed ensemble model achieved better results in five of the nine datasets.
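The voting-based combination of several filter outputs mentioned in the abstract can be illustrated with a small sketch. The abstract does not define the voting function, so the rule used here (each filter votes for its `top_k` genes, and a gene is kept if it gets at least `min_votes` votes) is an assumption, as are the name `vote_filter` and the toy rankings.

```python
from collections import Counter


def vote_filter(rankings, top_k, min_votes):
    """Keep genes that appear in the top_k of at least min_votes rankings.

    rankings: one list per filter, genes ordered from most to least relevant,
    with each gene listed at most once per ranking.
    """
    votes = Counter()
    for ranking in rankings:
        votes.update(ranking[:top_k])   # each filter votes for its top_k genes
    return sorted(g for g, v in votes.items() if v >= min_votes)


# Five hypothetical filter rankings over four genes.
rankings = [
    ["g1", "g2", "g3", "g4"],
    ["g2", "g1", "g4", "g3"],
    ["g3", "g2", "g1", "g4"],
    ["g2", "g4", "g1", "g3"],
    ["g1", "g3", "g2", "g4"],
]
selected = vote_filter(rankings, top_k=2, min_votes=3)
```

Here `g1` collects three votes and `g2` four, so only those two genes pass a majority threshold of three out of five filters.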
9
Local Neighbourhood Edge Responsive Image Descriptor for Texture Classification Using Gaussian Mutated JAYA Optimization Algorithm. Arabian Journal for Science and Engineering 2021. [DOI: 10.1007/s13369-021-05417-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/29/2022]
10
Baliarsingh SK, Muhammad K, Bakshi S. SARA: A memetic algorithm for high-dimensional biomedical data. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2020.107009] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Indexed: 12/18/2022]