1
|
Osama S, Ali M, Ali AA, Shaban H. Gene selection and tumor identification based on a hybrid of the multi-filter embedded recursive mountain gazelle algorithm. Comput Biol Med 2023; 167:107674. [PMID: 37976816 DOI: 10.1016/j.compbiomed.2023.107674] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2023] [Revised: 10/09/2023] [Accepted: 11/06/2023] [Indexed: 11/19/2023]
Abstract
Microarray gene expression data are useful for identifying gene expression patterns associated with cancer outcomes; however, their high dimensionality makes it difficult to extract meaningful information and accurately classify tumors. Hence, developing effective methods for reducing dimensionality while preserving relevant information is a crucial task. Hybrid-based gene selection methods are widely proposed in the gene expression analysis domain and can still be enhanced in terms of efficiency and reliability. This study proposes a new hybrid-based gene selection method, called multi-filter embedded mountain gazelle optimizer (MUL-MGO), which utilizes two filters and an embedded method to remove irrelevant genes, followed by selecting the most relevant genes using the recently developed MGO algorithm. To the best of our knowledge, this is the first work to exploit MGO as a gene or feature selection method. A new version of MGO, called recursive mountain gazelle optimizer (RMGO), which implements the MGO algorithm recursively to avoid local optima, minimize search space, and obtain minimum gene count without decreasing the classifier's performance, is developed. The proposed RMGO is used to develop a new hybrid gene selection method employing similar filters and embedded methods as MUL-MGO, but with a recursive MGO algorithm version. The resulting method is called multi-filter embedded recursive mountain gazelle optimizer (MUL-RMGO). Several classifiers are used for cancer classification. Accordingly, several experimental studies are performed on eight microarray gene expression datasets to demonstrate the proficiencies of the MUL-MGO and MUL-RMGO methods. The experimental findings indicate the efficiency and productivity of the suggested MUL-MGO and MUL-RMGO methods for gene selection. The methods outperform cutting-edge methods in the literature, with MUL-RMGO exceeding MUL-MGO in terms of accuracy and selected gene count.
Affiliation(s)
- Sarah Osama
  - Computer Science Department, Faculty of Computers and Information, Minia University, Minia, Egypt.
- Moatez Ali
  - Department of Internal Medicine, St. Barnabas Hospital, NY, USA.
- Abdelmgeid A Ali
  - Computer Science Department, Faculty of Computers and Information, Minia University, Minia, Egypt.
- Hassan Shaban
  - Computer Science Department, Faculty of Computers and Information, Minia University, Minia, Egypt.
|
2
|
Pacheco J, Saiz O, Casado S, Ubillos S. A multistart tabu search-based method for feature selection in medical applications. Sci Rep 2023; 13:17140. [PMID: 37816874 PMCID: PMC10564765 DOI: 10.1038/s41598-023-44437-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2023] [Accepted: 10/08/2023] [Indexed: 10/12/2023] Open
Abstract
In the design of classification models, irrelevant or noisy features are often generated. In some cases, there may even be negative interactions among features. These weaknesses can degrade the performance of the models. Feature selection is a task that searches for a small subset of relevant features from the original set that generates the most efficient models possible. In addition to improving the efficiency of the models, feature selection confers other advantages, such as greater ease in the generation of the necessary data as well as clearer and more interpretable models. In the case of medical applications, feature selection may help to distinguish which characteristics, habits, and factors have the greatest impact on the onset of diseases. However, feature selection is a complex task due to the large number of possible solutions. In the last few years, methods based on different metaheuristic strategies, mainly evolutionary algorithms, have been proposed. The motivation of this work is to develop a method that outperforms previous methods, with the benefits that this implies, especially in the medical field. More precisely, the present study proposes a simple method based on tabu search and multistart techniques. The proposed method was analyzed and compared to other methods by testing their performance on several medical databases. Specifically, eight databases from the well-known repository of the University of California, Irvine, and one of our own design were used. In these computational tests, the proposed method outperformed other recent methods as gauged by various metrics and classifiers. The analyses were accompanied by statistical tests, the results of which showed that the superiority of our method is significant and therefore strengthened these conclusions. In short, the contribution of this work is the development of a method that, on the one hand, is based on different strategies than those used in recent methods, and on the other hand, improves the performance of these methods.
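The abstract for entry 2 names tabu search combined with multistart but gives no algorithmic detail. The following is only a minimal sketch of that general idea, not the authors' method: the flip-one-feature neighborhood, the tabu tenure, the random starts, and the toy objective `toy_score` (with its hypothetical `RELEVANT` set) are all assumptions made for illustration. In practice the score would be a classifier's validation performance.

```python
import random

RELEVANT = {1, 4}  # hypothetical "truly useful" features for the toy objective


def toy_score(subset):
    """Toy stand-in for classifier accuracy: reward relevant features, penalize extras."""
    return len(subset & RELEVANT) - 0.1 * len(subset - RELEVANT)


def tabu_search(n_features, score, start, iters=30, tenure=2):
    """One tabu-search run over feature subsets (sets of column indices)."""
    current = set(start)
    best, best_val = set(current), score(current)
    tabu_until = {}  # feature index -> last iteration at which flipping it is forbidden
    for it in range(iters):
        moves = []
        for f in range(n_features):
            if tabu_until.get(f, -1) >= it:
                continue  # recently flipped: skip to avoid cycling back
            neighbor = current ^ {f}  # add f if absent, drop it if present
            moves.append((score(neighbor), -f, neighbor))
        if not moves:
            continue
        # Always move to the best admissible neighbor, even if it is worse.
        val, neg_f, current = max(moves, key=lambda m: m[:2])
        tabu_until[-neg_f] = it + tenure
        if val > best_val:
            best, best_val = set(current), val
    return best, best_val


def multistart_tabu(n_features, score, n_starts=5, seed=0):
    """Run tabu search from several random subsets and keep the best result."""
    rng = random.Random(seed)
    runs = []
    for _ in range(n_starts):
        start = {f for f in range(n_features) if rng.random() < 0.5}
        runs.append(tabu_search(n_features, score, start))
    return max(runs, key=lambda r: r[1])
```

Calling `multistart_tabu(6, toy_score)` returns the best subset found together with its score; restarting from different random subsets is what lets the search cover distant regions of the solution space.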
|
3
|
Parhi P, Bisoi R, Kishore Dash P. An improvised nature-inspired algorithm enfolded broad learning system for disease classification. EGYPTIAN INFORMATICS JOURNAL 2023. [DOI: 10.1016/j.eij.2023.03.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/03/2023]
|
4
|
Vahabzadeh V, Moattar MH. Robust microarray data feature selection using a correntropy based distance metric learning approach. Comput Biol Med 2023; 161:107056. [PMID: 37235945 DOI: 10.1016/j.compbiomed.2023.107056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2022] [Revised: 04/18/2023] [Accepted: 05/20/2023] [Indexed: 05/28/2023]
Abstract
Classification of high-dimensional microarray data is a challenge in bioinformatics and genetic data processing. One of the challenging issues of feature selection is the presence of outliers. The Euclidean distance metric is sensitive to outliers. In this study, a distance metric learning based feature selection approach that uses the correntropy function as the discrimination metric is proposed. For this purpose, the metric learning problem is formulated as an optimization problem and solved using the Lagrange method. The output of the approach signifies the most important and robust features. After feature selection, different classification methods such as SVM, decision trees, and NN classifiers are used to investigate the classification accuracy of the proposed method as well as precision, recall, and F-measure. Experiments are carried out on 13 high-dimensional datasets and show that the proposed method outperforms the previous models in terms of accuracy and robustness.
Affiliation(s)
- Venus Vahabzadeh
  - Department of Software Engineering, Mashhad Branch, Islamic Azad University, Mashhad, Iran.
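Entry 4 contrasts the outlier-sensitive Euclidean distance with correntropy. As a rough illustration of why that matters, the sketch below uses the standard Gaussian-kernel sample estimator of correntropy; the kernel width `sigma` and the example vectors are assumptions, and this is not the paper's metric-learning formulation, which the abstract does not spell out.

```python
import math


def correntropy(a, b, sigma=1.0):
    """Sample correntropy with a Gaussian kernel: mean of exp(-(a_i - b_i)^2 / (2 sigma^2))."""
    return sum(math.exp(-((x - y) ** 2) / (2 * sigma ** 2)) for x, y in zip(a, b)) / len(a)


def euclidean(a, b):
    """Plain Euclidean distance, for comparison."""
    return math.dist(a, b)


clean = [1.0, 2.0, 3.0, 4.0]
noisy = [1.0, 2.0, 3.0, 100.0]  # identical except for one gross outlier

print(euclidean(clean, noisy))    # dominated entirely by the outlier
print(correntropy(clean, noisy))  # the outlier contributes almost nothing
```

Each coordinate contributes at most 1 to the kernel sum, so a single corrupted measurement can lower the similarity by at most 1/n, whereas its squared error grows without bound in the Euclidean case.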
|
5
|
Mowlaei ME, Shi X. FSF-GA: A Feature Selection Framework for Phenotype Prediction Using Genetic Algorithms. Genes (Basel) 2023; 14:genes14051059. [PMID: 37239419 DOI: 10.3390/genes14051059] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Revised: 05/04/2023] [Accepted: 05/06/2023] [Indexed: 05/28/2023] Open
Abstract
(1) Background: Phenotype prediction is a pivotal task in genetics in order to identify how genetic factors contribute to phenotypic differences. This field has seen extensive research, with numerous methods proposed for predicting phenotypes. Nevertheless, the intricate relationship between genotypes and complex phenotypes, including common diseases, has resulted in an ongoing challenge to accurately decipher the genetic contribution. (2) Results: In this study, we propose a novel feature selection framework for phenotype prediction utilizing a genetic algorithm (FSF-GA) that effectively reduces the feature space to identify genotypes contributing to phenotype prediction. We provide a comprehensive vignette of our method and conduct extensive experiments using a widely used yeast dataset. (3) Conclusions: Our experimental results show that our proposed FSF-GA method delivers comparable phenotype prediction performance as compared to baseline methods, while providing features selected for predicting phenotypes. These selected feature sets can be used to interpret the underlying genetic architecture that contributes to phenotypic variation.
Affiliation(s)
- Mohammad Erfan Mowlaei
  - Department of Computer and Information Sciences, Temple University, 925 N. 12th Street, Philadelphia, PA 19122, USA
- Xinghua Shi
  - Department of Computer and Information Sciences, Temple University, 925 N. 12th Street, Philadelphia, PA 19122, USA
|
6
|
Wang Z, Zhou Y, Takagi T, Song J, Tian YS, Shibuya T. Genetic algorithm-based feature selection with manifold learning for cancer classification using microarray data. BMC Bioinformatics 2023; 24:139. [PMID: 37031189 PMCID: PMC10082986 DOI: 10.1186/s12859-023-05267-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2022] [Accepted: 04/02/2023] [Indexed: 04/10/2023] Open
Abstract
BACKGROUND Microarray data have been widely utilized for cancer classification. The main characteristic of microarray data is "large p and small n": the data contain a small number of subjects but a large number of genes. This may affect the validity of the classification. Thus, there is a pressing demand for techniques able to select genes relevant to cancer classification. RESULTS This study proposed a novel feature (gene) selection method, Iso-GA, for cancer classification. Iso-GA hybridizes the manifold learning algorithm Isomap with the genetic algorithm (GA) to account for the latent nonlinear structure of the gene expression in the microarray data. The Davies-Bouldin index is adopted to evaluate the candidate solutions in Isomap and to avoid the classifier dependency problem. Additionally, a probability-based framework is introduced to reduce the possibility of genes being randomly selected by GA. The performance of Iso-GA was evaluated on eight benchmark microarray datasets of cancers. Iso-GA outperformed other benchmarking gene selection methods, leading to good classification accuracy with fewer critical genes selected. CONCLUSIONS The proposed Iso-GA method can effectively select fewer but critical genes from microarray data to achieve competitive classification performance.
Affiliation(s)
- Zixuan Wang
  - Division of Medical Data Informatics, Human Genome Center, Institute of Medical Science, The University of Tokyo, Tokyo, 108-8639, Japan.
- Yi Zhou
  - Beijing International Center for Mathematical Research, Peking University, Beijing, 100871, China
- Tatsuya Takagi
  - Graduate School of Pharmaceutical Sciences, Osaka University, 1-6 Yamadaoka, Suita, Osaka, 565-0871, Japan
- Jiangning Song
  - Biomedicine Discovery Institute and Monash Data Futures Institute, Monash University, Melbourne, VIC, 3800, Australia
- Yu-Shi Tian
  - Graduate School of Pharmaceutical Sciences, Osaka University, 1-6 Yamadaoka, Suita, Osaka, 565-0871, Japan.
- Tetsuo Shibuya
  - Division of Medical Data Informatics, Human Genome Center, Institute of Medical Science, The University of Tokyo, Tokyo, 108-8639, Japan
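Entry 6 uses the Davies-Bouldin index as a classifier-independent fitness measure. The sketch below computes the standard index for points that have already been embedded and grouped by class label; how Iso-GA builds the Isomap embedding and wires the index into the GA is not stated in the abstract, so the function name and the toy points are assumptions for illustration only.

```python
import math
from statistics import mean


def davies_bouldin(points, labels):
    """Davies-Bouldin index: lower means tighter, better separated clusters.

    Requires at least two clusters with distinct centroids.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    # Centroid of each cluster, coordinate by coordinate.
    centroids = {lab: tuple(mean(dim) for dim in zip(*pts)) for lab, pts in clusters.items()}
    # Scatter: average distance of a cluster's points to its centroid.
    scatter = {lab: mean(math.dist(p, centroids[lab]) for p in pts) for lab, pts in clusters.items()}
    labs = list(clusters)
    # For each cluster, take the worst (largest) similarity ratio to any other cluster.
    worst = [
        max((scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j]) for j in labs if j != i)
        for i in labs
    ]
    return mean(worst)


embedded = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(davies_bouldin(embedded, [0, 0, 1, 1]))  # classes form two compact groups
print(davies_bouldin(embedded, [0, 1, 0, 1]))  # classes are interleaved
```

Because the index needs only coordinates and class labels, a candidate gene subset can be scored without training any classifier, which is the property the abstract refers to as avoiding the classifier dependency problem.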
|
7
|
Blourchi P, Ghasemzadeh A. Majority voting based on different feature ranking techniques from gene expression. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2023. [DOI: 10.3233/jifs-224029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/03/2023]
Abstract
In bioinformatics studies, many modeling tasks are characterized by high dimensionality, leading to the widespread use of feature selection techniques to reduce dimensionality. There are a multitude of feature selection techniques that have been proposed in the literature, each relying on a single measurement method to select candidate features. This has an impact on the classification performance. To address this issue, we propose a majority voting method that uses five different feature ranking techniques: entropy score, Pearson’s correlation coefficient, Spearman correlation coefficient, Kendall correlation coefficient, and t-test. By using a majority voting approach, only the features that appear in all five ranking methods are selected. This selection process has three key advantages over traditional techniques. Firstly, it is independent of any particular feature ranking method. Secondly, the feature space dimension is significantly reduced compared to other ranking methods. Finally, the performance is improved as the most discriminatory and informative features are selected via the majority voting process. The performance of the proposed method was evaluated using an SVM, and the results were assessed using accuracy, sensitivity, specificity, and AUC on various biomedical datasets. The results demonstrate the superior effectiveness of the proposed method compared to state-of-the-art methods in the literature.
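The selection rule in entry 7 (keep only features that appear in every ranking) can be sketched as a set intersection over top-k lists. For brevity this sketch uses three of the five rankers named in the abstract (Pearson, Spearman, and a Welch t-statistic) and assumes binary 0/1 labels; the cutoff `k`, the function names, and the toy data are hypothetical and are not taken from the paper.

```python
import math
from statistics import mean, variance


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den if den else 0.0


def rank(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


def welch_t(col, y):
    """Welch t-statistic between the class-1 and class-0 values of one feature."""
    a = [v for v, lab in zip(col, y) if lab == 0]
    b = [v for v, lab in zip(col, y) if lab == 1]
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(b) - mean(a)) / se if se else 0.0


def vote_features(X, y, k):
    """Return indices of features that are in the top-k of every ranker."""
    columns = [list(col) for col in zip(*X)]
    selected = None
    for ranker in (pearson, spearman, welch_t):
        scores = [abs(ranker(col, y)) for col in columns]
        top = set(sorted(range(len(columns)), key=lambda j: -scores[j])[:k])
        selected = top if selected is None else selected & top
    return sorted(selected)


X = [
    [1.0, 5.0, 0.2, 3.0],
    [1.2, 3.0, 0.1, 1.0],
    [0.9, 4.0, 0.3, 2.0],
    [3.0, 4.5, 0.9, 2.5],
    [3.1, 3.5, 1.1, 1.5],
    [2.9, 4.2, 1.0, 2.2],
]
y = [0, 0, 0, 1, 1, 1]
print(vote_features(X, y, k=2))
```

Requiring agreement across all rankers makes the result independent of any single scoring criterion, at the cost of possibly returning fewer than `k` features when the rankers disagree.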
|
8
|
Semisupervised Bacterial Heuristic Feature Selection Algorithm for High-Dimensional Classification with Missing Labels. INT J INTELL SYST 2023. [DOI: 10.1155/2023/4196920] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/24/2023]
Abstract
Feature selection is a crucial method for discovering relevant features in high-dimensional data. However, most studies primarily focus on completely labeled data, ignoring the frequent occurrence of missing labels in real-world problems. To address high-dimensional and label-missing problems in data classification simultaneously, we propose a semisupervised bacterial heuristic feature selection algorithm. To tackle the label-missing problem, a k-nearest neighbor semisupervised learning strategy is designed to reconstruct missing labels. In addition, the bacterial heuristic algorithm is improved using hierarchical population initialization, dynamic learning, and elite population evolution strategies to enhance the search capacity for various feature combinations. To verify the effectiveness of the proposed algorithm, three groups of comparison experiments based on eight datasets are employed, including two traditional feature selection methods, four bacterial heuristic feature selection algorithms, and two swarm-based heuristic feature selection algorithms. Experimental results demonstrate that the proposed algorithm has obvious advantages in terms of classification accuracy and the number of selected features.
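The label-reconstruction step in entry 8 is described only as a k-nearest neighbor semisupervised strategy. One plausible reading, sketched here under that assumption, is a majority vote among the k closest labeled samples; the function name, the use of `None` for a missing label, and the Euclidean distance are all choices made for this illustration rather than details from the paper.

```python
import math
from collections import Counter


def fill_missing_labels(X, y, k=3):
    """Replace each missing label (None) with the majority label of its k nearest labeled samples."""
    labeled = [(x, lab) for x, lab in zip(X, y) if lab is not None]
    filled = list(y)
    for i, (x, lab) in enumerate(zip(X, y)):
        if lab is None:
            nearest = sorted(labeled, key=lambda pair: math.dist(x, pair[0]))[:k]
            filled[i] = Counter(l for _, l in nearest).most_common(1)[0][0]
    return filled


samples = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5), (0.5, 0.5), (5.5, 5.5)]
labels = [0, 0, 0, 1, 1, 1, None, None]
print(fill_missing_labels(samples, labels))
```

Only originally labeled samples are used as neighbors here, so a reconstructed label never influences another reconstruction.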
|
9
|
Vahmiyan M, Kheirabadi M, Akbari E. Feature selection methods in microarray gene expression data: a systematic mapping study. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-07661-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/07/2022]
|
10
|
Pashaei E. Mutation-based Binary Aquila optimizer for gene selection in cancer classification. Comput Biol Chem 2022; 101:107767. [PMID: 36084602 DOI: 10.1016/j.compbiolchem.2022.107767] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2022] [Revised: 07/10/2022] [Accepted: 08/29/2022] [Indexed: 11/19/2022]
Abstract
Microarray data classification is one of the hottest issues in the field of bioinformatics due to its efficiency in diagnosing patients' ailments. The difficulty is that microarrays contain a huge number of genes, the majority of which are redundant or irrelevant, resulting in the deterioration of classification accuracy. To address this issue, a mutated binary Aquila Optimizer (MBAO) with a time-varying mirrored S-shaped (TVMS) transfer function is proposed as a new wrapper gene (or feature) selection method to find the optimal subset of informative genes. The suggested hybrid method utilizes Minimum Redundancy Maximum Relevance (mRMR) as a filtering approach to choose top-ranked genes in the first stage and then uses MBAO-TVMS as an efficient wrapper approach to identify the most discriminative genes in the second stage. TVMS is adopted to transform the continuous version of Aquila Optimizer (AO) to a binary one, and a mutation mechanism is incorporated into binary AO to help the algorithm escape local optima and improve its global search capabilities. The suggested method was tested on eleven well-known benchmark microarray datasets and compared to other current state-of-the-art methods. Based on the obtained results, mRMR-MBAO confirms its superiority over the mRMR-BAO algorithm and the other comparative GS approaches on the majority of the medical datasets in terms of classification accuracy and the number of selected genes. R codes of MBAO are available at https://github.com/el-pashaei/MBAO.
Affiliation(s)
- Elham Pashaei
  - Department of Computer Engineering, Istanbul Gelisim University, Istanbul, Turkey.
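Entry 10 converts a continuous optimizer to a binary one through a transfer function and adds a mutation step. The sketch below shows that general pattern with a plain (not time-varying, not mirrored) sigmoid and simple bit-flip mutation; it is not the TVMS function or the MBAO mutation operator from the paper, whose exact forms the abstract does not give. The names `binarize` and `mutate` and the mutation rate are assumptions.

```python
import math
import random


def sigmoid(x):
    """Plain S-shaped transfer function; input is clamped to avoid overflow in exp."""
    x = max(-60.0, min(60.0, x))
    return 1.0 / (1.0 + math.exp(-x))


def binarize(position, rng):
    """Map a continuous position vector to a 0/1 gene mask: bit j is 1 with probability sigmoid(x_j)."""
    return [1 if rng.random() < sigmoid(x) else 0 for x in position]


def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate` to help escape local optima."""
    return [1 - b if rng.random() < rate else b for b in bits]


rng = random.Random(0)
mask = binarize([2.5, -1.0, 0.3, -4.0], rng)  # continuous agent position -> gene mask
print(mask, mutate(mask, 0.1, rng))
```

A bit set to 1 means the corresponding gene is kept, so the same continuous update rules can drive a search over gene subsets.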
|
11
|
Nature-inspired metaheuristics model for gene selection and classification of biomedical microarray data. Med Biol Eng Comput 2022; 60:1627-1646. [PMID: 35399141 DOI: 10.1007/s11517-022-02555-7] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 05/18/2021] [Accepted: 03/16/2022] [Indexed: 12/19/2022]
Abstract
Identifying a small subset of informative genes from a gene expression dataset is an important process for sample classification in the fields of bioinformatics and machine learning. In this process, there are two objectives: first, to minimize the number of selected genes, and second, to maximize the classification accuracy of the used classifier. In this paper, a hybrid machine learning framework based on a nature-inspired cuckoo search (CS) algorithm has been proposed to resolve this problem. The proposed framework is obtained by incorporating the cuckoo search (CS) algorithm with an artificial bee colony (ABC) in the exploitation and exploration of the genetic algorithm (GA). These strategies are used to maintain an appropriate balance between the exploitation and exploration phases of the ABC and GA algorithms in the search process. In preprocessing, the independent component analysis (ICA) method extracts the important genes from the dataset. Then, the proposed gene selection algorithms along with the Naive Bayes (NB) classifier and leave-one-out cross-validation (LOOCV) have been applied to find a small set of informative genes that maximize the classification accuracy. To conduct a comprehensive performance study, proposed algorithms have been applied on six benchmark datasets of gene expression. The experimental comparison shows that the proposed framework (ICA and CS-based hybrid algorithm with NB classifier) performs a deeper search in the iterative process, which can avoid premature convergence and produce better results compared to the previously published feature selection algorithm for the NB classifier.
|
12
|
Abstract
Feature Selection (FS) is an important preprocessing step that is involved in machine learning and data mining tasks for preparing data (especially high-dimensional data) by eliminating irrelevant and redundant features, thus reducing the potential curse of dimensionality of a given large dataset. Consequently, FS is arguably a combinatorial NP-hard problem in which the computational time increases exponentially with an increase in problem complexity. To tackle such a problem type, meta-heuristic techniques have been opted by an increasing number of scholars. Herein, a novel meta-heuristic algorithm, called Sparrow Search Algorithm (SSA), is presented. The SSA still performs poorly on exploratory behavior and exploration-exploitation trade-off because it does not duly stimulate the search within feasible regions, and the exploitation process suffers noticeable stagnation. Therefore, we improve SSA by adopting: i) a strategy for Random Re-positioning of Roaming Agents (3RA); and ii) a novel Local Search Algorithm (LSA), which are algorithmically incorporated into the original SSA structure. To the FS problem, SSA is improved and cloned as a binary variant, namely, the improved Binary SSA (iBSSA), which would strive to select the optimal or near-optimal features from a given dataset while keeping the classification accuracy maximized. For binary conversion, the iBSSA was primarily validated against nine common S-shaped and V-shaped Transfer Functions (TFs), thus producing nine iBSSA variants. To verify the robustness of these variants, three well-known classification techniques, including k-Nearest Neighbor (k-NN), Support Vector Machine (SVM), and Random Forest (RF) were adopted as fitness evaluators with the proposed iBSSA approach and many other competing algorithms, on 18 multifaceted, multi-scale benchmark datasets from the University of California Irvine (UCI) data repository. Then, the overall best-performing iBSSA variant for each of the three classifiers was compared with binary variants of 12 different well-known meta-heuristic algorithms, including the original SSA (BSSA), Artificial Bee Colony (BABC), Particle Swarm Optimization (BPSO), Bat Algorithm (BBA), Grey Wolf Optimization (BGWO), Whale Optimization Algorithm (BWOA), Grasshopper Optimization Algorithm (BGOA), SailFish Optimizer (BSFO), Harris Hawks Optimization (BHHO), Bird Swarm Algorithm (BBSA), Atom Search Optimization (BASO), and Henry Gas Solubility Optimization (BHGSO). Based on a Wilcoxon’s non-parametric statistical test ($\alpha = 0.05$), the superiority of iBSSA with the three classifiers was very evident against counterparts across the vast majority of the selected datasets, achieving a feature size reduction of up to 92% along with up to 100% classification accuracy on some of those datasets.
|
13
|
Abd Elaziz M, Ewees AA, Yousri D, Abualigah L, Al-qaness MAA. Modified marine predators algorithm for feature selection: case study metabolomics. Knowl Inf Syst 2022. [DOI: 10.1007/s10115-021-01641-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 12/01/2022]
|
14
|
Pashaei E, Pashaei E. An efficient binary chimp optimization algorithm for feature selection in biomedical data classification. Neural Comput Appl 2022. [DOI: 10.1007/s00521-021-06775-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 01/15/2023]
|
15
|
Cauteruccio F. Alignment of Microarray Data. Methods Mol Biol 2022; 2401:217-237. [PMID: 34902131 DOI: 10.1007/978-1-0716-1839-4_14] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
The aim in microarray data analysis is to discover patterns of gene expression and to identify similar genes. Simply comparing new gene sequences to known DNA sequences often does not reveal the function of a new gene; thus, more sophisticated techniques are in order. Nowadays, data mining techniques, and in particular the clustering process, play an important role in bioinformatics. Analyzing vast amounts of data can be difficult; thus, a way to cluster similar data is needed. This chapter is devoted to illustrating the general data mining approach used in microarray data analysis, combining clustering, alignment and similarity, and to highlighting a novel similarity measure capable of capturing hidden correlations between data.
Affiliation(s)
- Francesco Cauteruccio
  - Department of Mathematics and Computer Science, University of Calabria, Rende, Italy.
|
16
|
|
17
|
Bayesian Gene Selection Based on Pathway Information and Network-Constrained Regularization. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2021; 2021:7471516. [PMID: 34394707 PMCID: PMC8360753 DOI: 10.1155/2021/7471516] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2021] [Revised: 07/05/2021] [Accepted: 07/23/2021] [Indexed: 11/18/2022]
Abstract
High-throughput data make it possible to study expression levels of thousands of genes simultaneously under a particular condition. However, only a few of the genes are discriminatively expressed. Identifying these biomarkers precisely is significant for disease diagnosis, prognosis, and therapy. Many studies utilized pathway information to identify the biomarkers. However, most of these studies only incorporate the group information while the pathway structural information is ignored. In this paper, we propose a Bayesian gene selection method with network-constrained regularization, which can incorporate the pathway structural information as priors to perform gene selection. All the priors are conjugate; thus, the parameters can be estimated effectively through Gibbs sampling. We present the application of our method on 6 microarray datasets, comparing it with Bayesian Lasso, Bayesian Elastic Net, and Bayesian Fused Lasso. The results show that our method performs better than other Bayesian methods and that pathway structural information can improve the result.
|
18
|
Dhal P, Azad C. A comprehensive survey on feature selection in the various fields of machine learning. APPL INTELL 2021. [DOI: 10.1007/s10489-021-02550-9] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 11/28/2022]
|
19
|
Gene selection for microarray data classification based on Gray Wolf Optimizer enhanced with TRIZ-inspired operators. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2021.107034] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Indexed: 12/13/2022]
|
20
|
Ben N, Qiqi D, Hong W, Jing L. Simplified bacterial foraging optimization with quorum sensing for global optimization. INT J INTELL SYST 2021. [DOI: 10.1002/int.22396] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 01/15/2023]
Affiliation(s)
- Niu Ben
  - College of Management Shenzhen University Shenzhen China
- Duan Qiqi
  - College of Management Shenzhen University Shenzhen China
- Wang Hong
  - College of Management Shenzhen University Shenzhen China
  - Department of Mechanical Engineering The Hong Kong Polytechnic University Hung Hom Hong Kong
- Liu Jing
  - School of Engineering and Information Technology, University of New South Wales Canberra Australian Capital Territory Australia
|
21
|
|
22
|
Qu C, Zhang L, Li J, Deng F, Tang Y, Zeng X, Peng X. Improving feature selection performance for classification of gene expression data using Harris Hawks optimizer with variable neighborhood learning. Brief Bioinform 2021; 22:6238587. [PMID: 33876181 DOI: 10.1093/bib/bbab097] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 12/29/2020] [Revised: 02/28/2021] [Accepted: 03/03/2021] [Indexed: 11/14/2022] Open
Abstract
Gene expression profiling has played a significant role in the identification and classification of tumor molecules. In gene expression data, only a few feature genes are closely related to tumors. It is a challenging task to select highly discriminative feature genes, and existing methods fail to deal with this problem efficiently. This article proposes a novel metaheuristic approach for gene feature extraction, called variable neighborhood learning Harris Hawks optimizer (VNLHHO). First, the F-score is used for a primary selection of the genes in gene expression data to narrow down the selection range of the feature genes. Subsequently, a variable neighborhood learning strategy is constructed to balance the global exploration and local exploitation of the Harris Hawks optimization. Finally, mutation operations are employed to increase the diversity of the population, so as to prevent the algorithm from falling into a local optimum. In addition, a novel activation function is used to convert the continuous solution of the VNLHHO into binary values, and a naive Bayesian classifier is utilized as a fitness function to select feature genes that can help classify biological tissues of binary and multi-class cancers. An experiment is conducted on gene expression profile data of eight types of tumors. The results show that the classification accuracy of the VNLHHO is greater than 96.128% for tumors in the colon, nervous system and lungs and 100% for the rest. We compare seven other algorithms and demonstrate the superiority of the VNLHHO in terms of the classification accuracy, fitness value and AUC value in feature selection for gene expression data.
Affiliation(s)
- Chiwen Qu
  - College of Mathematics and Statistics, Hunan Normal University, China
- Lupeng Zhang
  - Department of Pathology and Pathophysiology, Jishou University School of Medicine, Jishou University, China
- Jinlong Li
  - Department of Pathology and Pathophysiology, Jishou University School of Medicine, Jishou University, China
- Fang Deng
  - Department of Epidemiology and Health Statistics, Xiangya Public Health School, Central South University, China
- Yifan Tang
  - Department of Pathology and Pathophysiology, Hunan Normal University School of Medicine, Hunan Normal University, China
- Xiaomin Zeng
  - Department of Epidemiology and Health Statistics, Xiangya Public Health School, Central South University, China
- Xiaoning Peng
  - Department of Pathology and Pathophysiology, Hunan Normal University School of Medicine, Hunan Normal University, China
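Entry 22 uses an F-score as a first-pass filter before the metaheuristic search. The abstract does not define the score, so the sketch below assumes the commonly used two-class Fisher-style F-score (squared deviations of the class means from the overall mean, divided by the sum of within-class sample variances); the function names, the `keep` cutoff, and the toy expression matrix are hypothetical, and the multi-class case mentioned in the abstract is not covered.

```python
from statistics import mean, variance


def f_score(pos, neg):
    """Two-class F-score of one gene: between-class separation over within-class variance."""
    overall = mean(pos + neg)
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    within = variance(pos) + variance(neg)  # sample variances (n - 1 denominator)
    return between / within if within > 0 else float("inf")


def prefilter(X, y, keep):
    """Return the indices of the `keep` genes with the highest F-score (labels are 0/1)."""
    columns = list(zip(*X))
    scores = []
    for col in columns:
        pos = [v for v, lab in zip(col, y) if lab == 1]
        neg = [v for v, lab in zip(col, y) if lab == 0]
        scores.append(f_score(pos, neg))
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:keep])


expression = [
    [1.0, 5.0, 0.2, 3.0],
    [1.2, 3.0, 0.1, 1.0],
    [0.9, 4.0, 0.3, 2.0],
    [3.0, 4.5, 0.9, 2.5],
    [3.1, 3.5, 1.1, 1.5],
    [2.9, 4.2, 1.0, 2.2],
]
tumor = [0, 0, 0, 1, 1, 1]
print(prefilter(expression, tumor, keep=2))
```

Filtering first shrinks the search space, so the subsequent wrapper stage only has to explore combinations of genes that already show some class separation on their own.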
|
23
|
Too J, Mirjalili S. A Hyper Learning Binary Dragonfly Algorithm for Feature Selection: A COVID-19 Case Study. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2020.106553] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Indexed: 12/09/2022]
|
24
|
Mahendran N, Durai Raj Vincent PM, Srinivasan K, Chang CY. Machine Learning Based Computational Gene Selection Models: A Survey, Performance Evaluation, Open Issues, and Future Research Directions. Front Genet 2020; 11:603808. [PMID: 33362861 PMCID: PMC7758324 DOI: 10.3389/fgene.2020.603808] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 09/08/2020] [Accepted: 10/29/2020] [Indexed: 12/20/2022] Open
Abstract
Gene expression is the process of determining the physical characteristics of living beings by generating the necessary proteins. Gene expression takes place in two steps, transcription and translation. It is the flow of information from DNA to RNA with the help of enzymes, and the end products are proteins and other biochemical molecules. Many technologies can capture gene expression from DNA or RNA. One such technique is the DNA microarray. Other than being expensive, the main issue with DNA microarrays is that they generate high-dimensional data with a minimal sample size. The issue in handling such a heavyweight dataset is that the learning model will be over-fitted. This problem should be addressed by reducing the dimension of the data source by a considerable amount. In recent years, machine learning has gained popularity in the field of genomic studies. In the literature, many machine learning-based gene selection approaches have been discussed, which were proposed to improve dimensionality reduction precision. This paper presents an extensive review of the various works done on machine learning-based gene selection in recent years, along with a performance analysis. The study categorizes various feature selection algorithms under supervised, unsupervised, and semi-supervised learning. The works done in recent years to reduce the features for diagnosing tumors are discussed in detail. Furthermore, the performance of several of the methods discussed in the literature is analyzed. This study also lists and briefly discusses the open issues in handling high-dimensional, small-sample-size data.
Affiliation(s)
- Nivedhitha Mahendran, School of Information Technology and Engineering, Vellore Institute of Technology, Vellore, India
- P. M. Durai Raj Vincent, School of Information Technology and Engineering, Vellore Institute of Technology, Vellore, India
- Kathiravan Srinivasan, School of Information Technology and Engineering, Vellore Institute of Technology, Vellore, India
- Chuan-Yu Chang, Department of Computer Science and Information Engineering, National Yunlin University of Science and Technology, Douliu, Taiwan
|
25
|
A survey on single and multi omics data mining methods in cancer data classification. J Biomed Inform 2020; 107:103466. [DOI: 10.1016/j.jbi.2020.103466] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 12/11/2019] [Revised: 05/01/2020] [Accepted: 05/31/2020] [Indexed: 01/09/2023]
|
26
|
Shukla AK. Feature selection inspired by human intelligence for improving classification accuracy of cancer types. Comput Intell 2020. [DOI: 10.1111/coin.12341] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 11/29/2022]
Affiliation(s)
- Alok Kumar Shukla, Department of Computer Science & Engineering, G.L. Bajaj Institute of Technology and Management, Greater Noida, India
|
27
|
|
28
|
An Adapting Chemotaxis Bacterial Foraging Optimization Algorithm for Feature Selection in Classification. LECTURE NOTES IN COMPUTER SCIENCE 2020. [PMCID: PMC7354779 DOI: 10.1007/978-3-030-53956-6_25] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Efficient classification methods can improve data quality or relevance to better optimize Internet applications such as fast search engines and accurate identification. However, in the big data era, the difficulty and volume of data processing increase drastically. To decrease the huge computational cost, heuristic algorithms have been used. In this paper, an Adapting Chemotaxis Bacterial Foraging Optimization (ACBFO) algorithm is proposed based on the basic Bacterial Foraging Optimization (BFO) algorithm. The aim of this work is to design a modified algorithm that is more suitable for data classification. The proposed algorithm has two updating strategies and one structural change. First, the adapting chemotaxis step updating strategy is responsible for increasing the flexibility of the search. Second, the feature subset updating strategy better combines the proposed heuristic algorithm with the KNN classifier. Third, the nesting structure of BFO has been simplified to reduce the computational complexity. The ACBFO has been compared with BFO, BFOLIW and BPSO by testing on 12 widely used benchmark datasets. The results show that ACBFO is well suited to solving classification problems and achieves higher accuracy than the other compared algorithms.
|
29
|
Cancer data classification using binary bat optimization and extreme learning machine with a novel fitness function. Med Biol Eng Comput 2019; 57:2673-2682. [PMID: 31713709 DOI: 10.1007/s11517-019-02043-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 05/31/2018] [Accepted: 08/24/2019] [Indexed: 10/25/2022]
Abstract
Cancer classification is one of the crucial tasks in the medical field. The gene expression of cells helps in identifying cancer. The high dimensionality of gene expression data hinders the classification performance of any machine learning model. Therefore, in this paper we propose a methodology to classify cancer using gene expression data. We employ a bio-inspired algorithm called the binary bat algorithm for feature selection and an extreme learning machine for classification. We also propose a novel fitness function for optimizing the feature selection process of the binary bat algorithm. Our proposed methodology has been compared with the original fitness function found in the literature. The experiments conducted show that the former outperforms the latter. Graphical abstract: classification using binary bat optimization and extreme learning machine.
|
30
|
|
31
|
Shukla AK. Identification of cancerous gene groups from microarray data by employing adaptive genetic and support vector machine technique. Comput Intell 2019. [DOI: 10.1111/coin.12245] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 12/14/2022]
Affiliation(s)
- Alok Kumar Shukla, Department of Computer Science & Engineering, G.L. Bajaj Institute of Technology & Management, Greater Noida, India
|
32
|
A study on metaheuristics approaches for gene selection in microarray data: algorithms, applications and open challenges. EVOLUTIONARY INTELLIGENCE 2019. [DOI: 10.1007/s12065-019-00306-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 10/25/2022]
|
33
|
Shukla AK, Tripathi D. Identification of potential biomarkers on microarray data using distributed gene selection approach. Math Biosci 2019; 315:108230. [DOI: 10.1016/j.mbs.2019.108230] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 03/22/2019] [Revised: 06/04/2019] [Accepted: 07/16/2019] [Indexed: 02/09/2023]
|
34
|
A review of feature selection methods in medical applications. Comput Biol Med 2019; 112:103375. [PMID: 31382212 DOI: 10.1016/j.compbiomed.2019.103375] [Citation(s) in RCA: 165] [Impact Index Per Article: 33.0] [Received: 02/20/2019] [Revised: 07/29/2019] [Accepted: 07/29/2019] [Indexed: 11/22/2022]
Abstract
Feature selection is a preprocessing technique that identifies the key features of a given problem. It has traditionally been applied in a wide range of problems that include biological data processing, finance, and intrusion detection systems. In particular, feature selection has been successfully used in medical applications, where it can not only reduce dimensionality but also help us understand the causes of a disease. We describe some basic concepts related to medical applications and provide some necessary background information on feature selection. We review the most recent feature selection methods developed for and applied in medical problems, covering prolific research fields such as medical imaging, biomedical signal processing, and DNA microarray data analysis. A case study of two medical applications that includes actual patient data is used to demonstrate the suitability of applying feature selection methods in medical problems and to illustrate how these methods work in real-world scenarios.
|
35
|
Su Y, Li S, Zheng C, Zhang X. A Heuristic Algorithm for Identifying Molecular Signatures in Cancer. IEEE Trans Nanobioscience 2019; 19:132-141. [PMID: 31352348 DOI: 10.1109/tnb.2019.2930647] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Indexed: 11/09/2022]
Abstract
Molecular signatures of cancer, e.g., genes or microRNAs (miRNAs), have been recognized as very important in predicting the occurrence of cancer. From gene-expression and miRNA-expression data, the challenge of identifying molecular signatures lies in the huge number of molecules compared to the small number of samples. To address this issue, in this paper, we propose a heuristic algorithm to identify molecular signatures, termed HAMS, for cancer diagnosis by modeling it as a multi-objective optimization problem. In the proposed HAMS, an elitist-guided individual update strategy is proposed to obtain a small number of molecular signatures, which are closely related to cancer and contain fewer redundant signatures. Experimental results demonstrate that the proposed HAMS achieves superior performance over seven state-of-the-art algorithms on both gene-expression and miRNA-expression datasets. We also validate the biological significance of the molecular signatures obtained by the proposed HAMS through biological analysis.
|
36
|
Chen SB, Zhang YM, Ding CH, Zhang J, Luo B. Extended adaptive Lasso for multi-class and multi-label feature selection. Knowl Based Syst 2019. [DOI: 10.1016/j.knosys.2019.02.021] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Indexed: 11/16/2022]
|
37
|
Ventura-Molina E, Alarcón-Paredes A, Aldape-Pérez M, Yáñez-Márquez C, Adolfo Alonso G. Gene selection for enhanced classification on microarray data using a weighted k-NN based algorithm. INTELL DATA ANAL 2019. [DOI: 10.3233/ida-173720] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 11/15/2022]
Affiliation(s)
- Elías Ventura-Molina, Centro de Investigación en Computación, Instituto Politécnico Nacional. Av. Juan de Dios Bátiz, Esq. Miguel Othón de Mendizábal. Col. Nueva Industrial Vallejo, Gustavo A. Madero, 07738, Ciudad de México, México
- Antonio Alarcón-Paredes, Facultad de Ingeniería, Universidad Autónoma de Guerrero. Av. Lázaro Cárdenas s/n, Ciudad Universitaria Zona Sur, 39087. Chilpancingo Guerrero, México
- Mario Aldape-Pérez, Centro de Innovación y Desarrollo Tecnológico en Cómputo, Instituto Politécnico Nacional, México. Av. Juan de Dios Bátiz, Col. Nueva Industrial Vallejo, 07700, Ciudad de México, México
- Cornelio Yáñez-Márquez, Centro de Investigación en Computación, Instituto Politécnico Nacional. Av. Juan de Dios Bátiz, Esq. Miguel Othón de Mendizábal. Col. Nueva Industrial Vallejo, Gustavo A. Madero, 07738, Ciudad de México, México
- Gustavo Adolfo Alonso, Facultad de Ingeniería, Universidad Autónoma de Guerrero. Av. Lázaro Cárdenas s/n, Ciudad Universitaria Zona Sur, 39087. Chilpancingo Guerrero, México
|
38
|
Abstract
The automatic classification of DNA microarray data is one of the hot topics in the field of bioinformatics, since it is an effective tool for the diagnosis of diseases in patients. The aim of this chapter is to present the most relevant aspects related to the classification of microarrays. We carried out an analysis of the strategies used for the classification of microarray data and a review of the main methods used in the literature. In addition, other related aspects are addressed, such as dimensionality reduction, which tries to eliminate redundant information in genes, and the treatment of imbalanced and missing data. To conclude, we present an exhaustive review of the main scientific works in journals to show the most successful techniques applied in this discipline as well as the datasets most often used to verify their effectiveness.
|
39
|
Real-Time Decision Making in First Mile and Last Mile Logistics: How Smart Scheduling Affects Energy Efficiency of Hyperconnected Supply Chain Solutions. ENERGIES 2018. [DOI: 10.3390/en11071833] [Citation(s) in RCA: 64] [Impact Index Per Article: 10.7] [Indexed: 12/30/2022]
Abstract
Energy efficiency and environmental issues have been largely neglected in logistics. In a traditional supply chain, the objective of improving energy efficiency is targeted at the level of single parts of the value chain. Industry 4.0 technologies make it possible to build hyperconnected logistic solutions, where the objective of decreasing energy consumption and economic footprint is targeted at the global level. The problems of energy efficiency are especially relevant in first mile and last mile delivery logistics, where deliveries are composed of individual orders and each order must be picked up and delivered at different locations. In this paper, the author describes a real-time scheduling optimization model focusing on the energy efficiency of the operation. After a systematic literature review, this paper introduces a mathematical model of last mile delivery problems including scheduling and assignment problems. The objective of the model is to determine the optimal assignment and scheduling for each order so as to minimize energy consumption, thereby improving energy efficiency. Next, a black hole optimization-based heuristic is described, whose performance is validated with different benchmark functions. The scenario analysis validates the model and evaluates its ability to increase energy efficiency in last mile logistics.
|
40
|
Evolutionary Population Dynamics and Grasshopper Optimization approaches for feature selection problems. Knowl Based Syst 2018. [DOI: 10.1016/j.knosys.2017.12.037] [Citation(s) in RCA: 269] [Impact Index Per Article: 44.8] [Indexed: 11/22/2022]
|
41
|
Feature Genes Selection Using Supervised Locally Linear Embedding and Correlation Coefficient for Microarray Classification. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2018; 2018:5490513. [PMID: 29666661 PMCID: PMC5831962 DOI: 10.1155/2018/5490513] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 09/27/2017] [Revised: 12/17/2017] [Accepted: 12/21/2017] [Indexed: 11/17/2022]
Abstract
The selection of feature genes with high recognition ability from gene expression profiles has gained great significance in biology. However, most of the existing methods have a high time complexity and poor classification performance. Motivated by this, an effective feature selection method, called supervised locally linear embedding and Spearman's rank correlation coefficient (SLLE-SC2), is proposed, based on the concepts of locally linear embedding and correlation coefficient algorithms. Supervised locally linear embedding takes into account class label information and improves the classification performance. Furthermore, Spearman's rank correlation coefficient is used to remove coexpressed genes. The experimental results obtained on four public tumor microarray datasets illustrate that our method is valid and feasible.
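The redundancy-removal idea in this abstract, dropping coexpressed genes by Spearman's rank correlation, can be illustrated with a short sketch. It is not the SLLE-SC2 procedure itself: the greedy keep-or-drop rule, the 0.9 threshold, and the names `average_ranks`, `spearman`, and `drop_coexpressed` are assumptions made for illustration.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def drop_coexpressed(genes, threshold=0.9):
    """Greedy filter (assumed rule): keep a gene only if |rho| with every
    gene kept so far is below the threshold.

    genes: dict mapping gene name to its expression values, in priority order.
    """
    kept = []
    for name, values in genes.items():
        if all(abs(spearman(values, genes[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy data: g2 is a monotone transform of g1, g3 is not.
genes = {
    "g1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "g2": [2.0, 4.0, 6.0, 8.0, 11.0],
    "g3": [3.0, 1.0, 5.0, 2.0, 4.0],
}
kept = drop_coexpressed(genes)
```

Because Spearman's coefficient works on ranks, it flags genes that rise and fall together even when the relationship is monotone but not linear, which is why g2 is treated as redundant with g1 here.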
|
42
|
Differential evolution for filter feature selection based on information theory and feature ranking. Knowl Based Syst 2018. [DOI: 10.1016/j.knosys.2017.10.028] [Citation(s) in RCA: 186] [Impact Index Per Article: 31.0] [Indexed: 11/23/2022]
|
43
|
Gene selection for microarray data classification via subspace learning and manifold regularization. Med Biol Eng Comput 2017; 56:1271-1284. [PMID: 29256006 DOI: 10.1007/s11517-017-1751-6] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 05/30/2017] [Accepted: 11/03/2017] [Indexed: 10/18/2022]
Abstract
With the rapid development of DNA microarray technology, a large amount of genomic data has been generated. Classification of these microarray data is a challenging task, since gene expression data often contain thousands of genes but only a small number of samples. In this paper, an effective gene selection method is proposed to select the best subset of genes for microarray data, with the irrelevant and redundant genes removed. Compared with the original data, the selected gene subset can benefit the classification task. We formulate the gene selection task as a manifold regularized subspace learning problem. In detail, a projection matrix is used to project the original high-dimensional microarray data into a lower-dimensional subspace, with the constraint that the original genes can be well represented by the selected genes. Meanwhile, the local manifold structure of the original data is preserved by a Laplacian graph regularization term on the low-dimensional data space. The projection matrix can serve as an importance indicator of different genes. An iterative update algorithm is developed for solving the problem. Experimental results on six publicly available microarray datasets and one clinical dataset demonstrate that the proposed method performs better than other state-of-the-art methods in terms of microarray data classification.
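The abstract states that the projection matrix can serve as an importance indicator for genes but does not say how. One common convention in subspace-learning feature selection, assumed here, is to score each gene by the Euclidean norm of its row in the projection matrix. The sketch below shows only that ranking step; the matrix values and the name `rank_genes_by_row_norm` are hypothetical, and the iterative solver from the paper is not reproduced.

```python
import math


def rank_genes_by_row_norm(projection):
    """Rank genes by the Euclidean norm of their projection-matrix row.

    projection: one row per gene, one column per subspace dimension.
    Returns (gene indices from most to least important, row norms).
    """
    norms = [math.sqrt(sum(w * w for w in row)) for row in projection]
    order = sorted(range(len(projection)), key=lambda g: norms[g], reverse=True)
    return order, norms


# Hypothetical learned projection matrix: 3 genes, 2 subspace dimensions.
projection = [
    [0.0, 0.1],
    [3.0, 4.0],
    [1.0, 0.0],
]
order, norms = rank_genes_by_row_norm(projection)
```

A gene whose row is close to zero contributes almost nothing to any subspace dimension, so under this assumed scoring it would be among the first candidates for removal.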
|
44
|
Unsupervised feature selection based on self-representation sparse regression and local similarity preserving. INT J MACH LEARN CYB 2017. [DOI: 10.1007/s13042-017-0760-y] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Indexed: 11/25/2022]
|
45
|
|