1
Panda P, Bisoy SK, Kautish S, Ahmad R, Irshad A, Sarwar N. Ensemble Classification Model With CFS-IGWO-Based Feature Selection for Cancer Detection Using Microarray Data. Int J Telemed Appl 2024; 2024:4105224. [PMID: 39449963 PMCID: PMC11502127 DOI: 10.1155/2024/4105224] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2023] [Revised: 05/16/2024] [Accepted: 07/05/2024] [Indexed: 10/26/2024] Open
Abstract
Cancer is a leading cause of death worldwide, and machine learning (ML) has made a lasting mark on early cancer detection, helping to lower the death toll. ML-based models for cancer diagnosis are built on two forms of data: gene expression data and microarray data. Gene expression data has many dimensions, and the efficiency of an ML-based model decreases on high-dimensional data. Microarray data is likewise distinguished by high dimensionality, with a large number of features and a small sample size. In this work, two ensemble techniques are proposed, one using majority voting and one using weighted averaging. Correlation feature selection (CFS) is used for feature selection, and an improved grey wolf optimizer (IGWO) is used for feature optimization. Support vector machines (SVMs), multilayer perceptron (MLP), logistic regression (LR), decision tree (DT), adaptive boosting (AdaBoost), extreme learning machines (ELMs), and K-nearest neighbor (KNN) are used as base classifiers. The results of the individual base learners are then combined using the weighted average and majority voting ensemble methods. Accuracy (ACC), specificity (SPE), sensitivity (SEN), precision (PRE), Matthews correlation coefficient (MCC), and F1-score (F1-S) are used to assess performance. The results show that majority voting achieves better performance than the weighted average ensemble technique.
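The two combination rules named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the binary 0/1 labels, the 0.5 threshold, and the choice of weights are all assumptions made for illustration.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine hard labels: predictions[c][i] is classifier c's label for sample i."""
    n_samples = len(predictions[0])
    voted = []
    for i in range(n_samples):
        votes = Counter(clf_preds[i] for clf_preds in predictions)
        # most_common(1) returns the label with the most votes;
        # ties go to the label seen first (an arbitrary choice here).
        voted.append(votes.most_common(1)[0][0])
    return voted


def weighted_average(probabilities, weights, threshold=0.5):
    """Combine soft outputs: probabilities[c][i] is classifier c's
    positive-class probability for sample i; weights[c] is its weight."""
    total = sum(weights)
    n_samples = len(probabilities[0])
    labels = []
    for i in range(n_samples):
        score = sum(w * p[i] for w, p in zip(weights, probabilities)) / total
        labels.append(1 if score >= threshold else 0)
    return labels


if __name__ == "__main__":
    hard = [[1, 0, 1], [1, 1, 0], [0, 1, 1]]   # three classifiers, three samples
    soft = [[0.9, 0.2], [0.4, 0.6]]            # two classifiers, two samples
    print(majority_vote(hard))
    print(weighted_average(soft, weights=[3, 1]))
```

In practice the weights might be derived from each base learner's validation accuracy, but the abstract does not say how they were set.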
Affiliation(s)
- Pinakshi Panda
  - Department of Computer Science & Engineering, C. V. Raman Global University, Bidyanagar, Mahura, Janla 752054, Bhubaneswar, Odisha, India
- Sukant Kishoro Bisoy
  - Department of Computer Science & Engineering, C. V. Raman Global University, Bidyanagar, Mahura, Janla 752054, Bhubaneswar, Odisha, India
- Sandeep Kautish
  - Apex Institute of Technology, Chandigarh University, Mohali, Punjab, India
- Reyaz Ahmad
  - School of General Education, Skyline University College, Sharjah, UAE
- Asma Irshad
  - School of Biochemistry and Biotechnology, University of the Punjab, Lahore, Pakistan
- Nadeem Sarwar
  - Department of Computer Science, Bahria University Lahore Campus, Lahore 54600, Pakistan

2
Esfandiari A, Nasiri N. Gene selection and cancer classification using interaction-based feature clustering and improved-binary Bat algorithm. Comput Biol Med 2024; 181:109071. [PMID: 39205342 DOI: 10.1016/j.compbiomed.2024.109071] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2024] [Revised: 08/13/2024] [Accepted: 08/22/2024] [Indexed: 09/04/2024]
Abstract
In high-dimensional gene expression data, selecting an optimal subset of genes is crucial for achieving high classification accuracy and reliable diagnosis of diseases. This paper proposes a two-stage hybrid model for gene selection based on clustering and a swarm intelligence algorithm to identify the most informative genes with high accuracy. First, a clustering-based multivariate filter approach is performed to explore the interactions between the features and eliminate any redundant or irrelevant ones. Then, by controlling for the problem of premature convergence in the binary Bat algorithm, the optimal gene subset is determined using different classifiers with the Monte Carlo cross-validation data partitioning model. The effectiveness of our proposed framework is evaluated using eight gene expression datasets, by comparison with other recently published algorithms in the literature. Experiments confirm that in seven out of eight datasets, the proposed method can achieve superior results in terms of classification accuracy and gene subset size. In particular, it achieves a classification accuracy of 100% in Lymphoma and Ovarian datasets and above 97.4% in the rest with a minimum number of genes. The results demonstrate that our proposed algorithm has the potential to solve the feature selection problem in different applications with high-dimensional datasets.
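Monte Carlo cross-validation, mentioned in this abstract, repeatedly draws independent random train/test partitions instead of using fixed folds. A minimal sketch of the partitioning step follows; the function name, the 30% test fraction, and the seed are illustrative assumptions and not values taken from the paper.

```python
import random


def monte_carlo_splits(n_samples, n_repeats, test_fraction, seed=0):
    """Yield (train_indices, test_indices) for repeated random partitions.

    Unlike k-fold cross-validation, each repeat is drawn independently,
    so a sample may land in the test set several times or never.
    """
    rng = random.Random(seed)
    n_test = max(1, round(n_samples * test_fraction))
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        yield sorted(indices[n_test:]), sorted(indices[:n_test])


if __name__ == "__main__":
    for train, test in monte_carlo_splits(10, n_repeats=3, test_fraction=0.3):
        print("train:", train, "test:", test)
```

A candidate gene subset would then be scored by averaging a classifier's accuracy over the repeats.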
Affiliation(s)
- Ahmad Esfandiari
  - Department of Computer Engineering, Sari Branch, Islamic Azad University, Sari, Iran
- Niki Nasiri
  - Pediatric Infectious Diseases Research Center, Communicable Diseases Institute, Mazandaran University of Medical Sciences, Sari, Iran

3
Saini R, Tiwari AK, Nath A, Singh P, Maurya SP, Shah MA. Covering assisted intuitionistic fuzzy bi-selection technique for data reduction and its applications. Sci Rep 2024; 14:13568. [PMID: 38866851 DOI: 10.1038/s41598-024-62099-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2023] [Accepted: 05/13/2024] [Indexed: 06/14/2024] Open
Abstract
The dimension and size of data are growing rapidly with the extensive applications of computer science and lab-based engineering in daily life. Vagueness, later uncertainty, redundancy, irrelevancy, and noise in such data raise concerns for building effective learning models. Fuzzy rough sets and their extensions have been applied to deal with these issues through various data reduction approaches. However, constructing a model that can cope with all these issues simultaneously remains a challenging task, and no study to date has addressed all of them at once. This paper investigates a method based on the notions of intuitionistic fuzzy (IF) and rough sets that avoids these obstacles simultaneously by putting forward a data reduction technique. Firstly, a novel IF similarity relation is introduced. Secondly, an IF rough set model is established on the basis of this similarity relation. Thirdly, an IF granular structure is presented using the established similarity relation and the lower approximation. Next, mathematical theorems are used to validate the proposed notions. Then, the importance degree of the IF granules is employed to eliminate redundant instances. Further, significance-degree-preserved dimensionality reduction is discussed. Hence, simultaneous instance and feature selection can be performed on large, high-dimensional datasets to eliminate redundancy and irrelevancy in both dimension and size, where vagueness and later uncertainty are handled with rough and IF sets respectively, whilst noise is tackled with the IF granular structure. Thereafter, a comprehensive experiment is carried out over benchmark datasets to demonstrate the effectiveness of the simultaneous feature and data point selection method. Finally, a framework aided by the proposed methodology is discussed to enhance regression performance for the IC50 of antiviral peptides.
Affiliation(s)
- Rajat Saini
  - Department of Mathematics, School of Basic Sciences, Central University of Haryana, Mahendergarh, 123031, India
- Anoop Kumar Tiwari
  - Department of Computer Science and Information Technology, Central University of Haryana, Mahendergarh, 123031, India
- Abhigyan Nath
  - Department of Biochemistry, Pt. Jawahar Lal Nehru Memorial Medical College, Raipur, 492001, India
- Phool Singh
  - Department of Mathematics (SoET), Central University of Haryana, Mahendergarh, 123031, India
- S P Maurya
  - Department of Geophysics, Institute of Science, Banaras Hindu University, Varanasi, 221005, India
- Mohd Asif Shah
  - Department of Economics, Kebri Dehar University, 250, Kebri Dehar, Somali, Ethiopia
  - Division of Research and Development, Lovely Professional University, Phagwara, Punjab, 144001, India
  - Department of Economics, Kardan University, Parwan e Du, Kabul, 1001, Afghanistan

4
Tiwari AK, Saini R, Nath A, Singh P, Shah MA. Hybrid similarity relation based mutual information for feature selection in intuitionistic fuzzy rough framework and its applications. Sci Rep 2024; 14:5958. [PMID: 38472266 PMCID: PMC10933482 DOI: 10.1038/s41598-024-55902-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/26/2023] [Accepted: 02/28/2024] [Indexed: 03/14/2024] Open
Abstract
Fuzzy rough entropy, established within fuzzy rough set theory, has been effectively and efficiently applied to feature selection for handling uncertainty in real-valued datasets. Further, fuzzy rough mutual information has been presented by integrating information entropy with fuzzy rough sets to measure the importance of features. However, none of the methods to date can handle noise, uncertainty, and vagueness simultaneously due to both judgement and identification, which degrades the overall performance of learning algorithms as the number of mixed-valued conditional features increases. In the current study, these issues are tackled by presenting a novel intuitionistic fuzzy (IF) assisted mutual information concept along with an IF granular structure. Initially, a hybrid IF similarity relation is introduced. Based on this relation, an IF granular structure is introduced. Then, IF rough conditional and joint entropies are established. Further, mutual information based on these concepts is discussed. Next, mathematical theorems are proved to demonstrate the validity of the given notions. Thereafter, the significance of a feature subset is computed using this mutual information, and a corresponding feature selection method is suggested to delete irrelevant and redundant features. The current approach effectively handles noise and subsequent uncertainty in both nominal and mixed data (including both nominal and category variables). Moreover, comprehensive experiments are carried out on real-valued benchmark datasets to demonstrate the practical validity and effectiveness of the technique. Finally, an application of the proposed method is exhibited to improve the prediction of phospholipidosis-positive molecules. RF(h2o) produces the most effective results to date based on the proposed methodology, with sensitivity, accuracy, specificity, MCC, and AUC of 86.7%, 90.1%, 93.0%, 0.808, and 0.922, respectively.
Affiliation(s)
- Anoop Kumar Tiwari
  - Department of Computer Science and Information Technology, Central University of Haryana, Mahendergarh, 123031, India
- Rajat Saini
  - Department of Mathematics, School of Basic Sciences, Central University of Haryana, Mahendergarh, 123031, India
- Abhigyan Nath
  - Department of Biochemistry, Pt. Jawahar Lal Nehru Memorial Medical College, Raipur, 492001, India
- Phool Singh
  - Department of Mathematics (SoET), Central University of Haryana, Mahendergarh, 123031, India
- Mohd Asif Shah
  - Department of Economics, Kebri Dehar University, 250, Kebri Dehar, Somali, Ethiopia
  - Centre of Research Impact and Outcome, Chitkara University Institute of Engineering and Technology, Chitkara University, Rajpura, 140401, Punjab, India
  - Division of Research and Development, Lovely Professional University, Phagwara, 144001, Punjab, India

5
Xie K, Hou Y, Zhou X. Deep centroid: a general deep cascade classifier for biomedical omics data classification. Bioinformatics 2024; 40:btae039. [PMID: 38305432 PMCID: PMC10868341 DOI: 10.1093/bioinformatics/btae039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2023] [Revised: 01/13/2024] [Accepted: 01/30/2024] [Indexed: 02/03/2024] Open
Abstract
MOTIVATION Classification of samples using biomedical omics data is a widely used method in biomedical research. However, these datasets often possess challenging characteristics, including high dimensionality, limited sample sizes, and inherent biases across diverse sources. These factors limit the performance of traditional machine learning models, particularly when applied to independent datasets. RESULTS To address these challenges, we propose a novel classifier, Deep Centroid, which combines the stability of the nearest centroid classifier and the strong fitting ability of the deep cascade strategy. Deep Centroid is an ensemble learning method with a multi-layer cascade structure, consisting of feature scanning and cascade learning stages that can dynamically adjust the training scale. We apply Deep Centroid to three precision medicine applications (cancer early diagnosis, cancer prognosis, and drug sensitivity prediction) using cell-free DNA fragmentations, gene expression profiles, and DNA methylation data. Experimental results demonstrate that Deep Centroid outperforms six traditional machine learning models in all three applications, showcasing its potential in biological omics data classification. Furthermore, functional annotations reveal that the features scanned by the model exhibit biological significance, indicating its interpretability from a biological perspective. Our findings underscore the promising application of Deep Centroid in the classification of biomedical omics data, particularly in the field of precision medicine. AVAILABILITY AND IMPLEMENTATION Deep Centroid is available at both GitHub (github.com/xiexiexiekuan/DeepCentroid) and Figshare (https://figshare.com/articles/software/Deep_Centroid_A_General_Deep_Cascade_Classifier_for_Biomedical_Omics_Data_Classification/24993516).
Affiliation(s)
- Kuan Xie
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, People’s Republic of China
- Yuying Hou
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, People’s Republic of China
- Xionghui Zhou
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, People’s Republic of China
  - Key Laboratory of Smart Farming for Agricultural Animals, Huazhong Agricultural University, Wuhan 430070, People’s Republic of China

6
Kim S. Inferring Drug Set and Identifying the Mechanism of Drugs for PC3. Int J Mol Sci 2024; 25:765. [PMID: 38255837 PMCID: PMC10815650 DOI: 10.3390/ijms25020765] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2023] [Revised: 12/24/2023] [Accepted: 01/05/2024] [Indexed: 01/24/2024] Open
Abstract
Drug repurposing is a strategy for discovering new applications of existing drugs for use in various diseases. Despite the use of structured networks in drug research, it is still unclear how drugs interact with one another or with genes. Prostate adenocarcinoma is the second leading cause of cancer mortality in the United States, with an estimated incidence of 288,300 new cases and 34,700 deaths in 2023. In our study, we used integrative information from genes, pathways, and drugs for machine learning methods such as clustering, feature selection, and enrichment pathway analysis. We investigated how drugs affect drugs and how drugs affect genes in a human prostate cancer cell line (PC3) derived from a bone metastasis of grade IV prostate cancer. Finally, we identified significant drug interactions within or between clusters, such as estradiol-rosiglitazone, estradiol-diclofenac, troglitazone-rosiglitazone, celecoxib-rofecoxib, celecoxib-diclofenac, and sodium phenylbutyrate-valproic acid.
Affiliation(s)
- Shinuk Kim
  - College of Engineering, Sangmyung University, Cheonan 31066, Republic of Korea

7
Zhou K, Yin Z, Gu J, Zeng Z. A Feature Selection Method Based on Graph Theory for Cancer Classification. Comb Chem High Throughput Screen 2024; 27:650-660. [PMID: 37056061 DOI: 10.2174/1386207326666230413085646] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2022] [Revised: 02/02/2023] [Accepted: 02/24/2023] [Indexed: 04/15/2023]
Abstract
OBJECTIVE Gene expression profile data is a good data source for people to study tumors, but gene expression data has the characteristics of high dimension and redundancy. Therefore, gene selection is a very important step in microarray data classification. METHODS In this paper, a feature selection method based on the maximum mutual information coefficient and graph theory is proposed. Each feature of gene expression data is treated as a vertex of the graph, and the maximum mutual information coefficient between genes is used to measure the relationship between the vertices to construct an undirected graph, and then the core and coritivity theory is used to determine the feature subset of gene data. RESULTS In this work, we used three different classification models and three different evaluation metrics such as accuracy, F1-Score, and AUC to evaluate the classification performance to avoid reliance on any one classifier or evaluation metric. The experimental results on six different types of genetic data show that our proposed algorithm has high accuracy and robustness compared to other advanced feature selection methods. CONCLUSION In this method, the importance and correlation of features are considered at the same time, and the problem of gene selection in microarray data classification is solved.
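The graph construction step described in METHODS (genes as vertices, a pairwise dependency score deciding the edges) can be sketched as follows. This is only an illustration under stated assumptions: absolute Pearson correlation stands in for the maximum mutual information coefficient used in the paper, the 0.8 threshold and all names are hypothetical, and the core and coritivity step that selects the final subset is not shown.

```python
from itertools import combinations
from math import sqrt


def abs_pearson(x, y):
    """Absolute Pearson correlation; a stand-in for the paper's dependency measure."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant feature carries no linear dependency
    return abs(cov / sqrt(var_x * var_y))


def build_gene_graph(expression, threshold):
    """Return an undirected graph as {gene: set of neighbouring genes}.

    expression maps each gene name to its expression values across samples;
    an edge joins two genes whose score reaches the (assumed) threshold.
    """
    graph = {gene: set() for gene in expression}
    for g1, g2 in combinations(expression, 2):
        if abs_pearson(expression[g1], expression[g2]) >= threshold:
            graph[g1].add(g2)
            graph[g2].add(g1)
    return graph


if __name__ == "__main__":
    toy = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [1, -1, 1, -1]}
    print(build_gene_graph(toy, threshold=0.8))
```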
Affiliation(s)
- Kai Zhou
  - School of Mathematics, Physics and Statistics, Shanghai University of Engineering Science, Shanghai, 201620, China
- Zhixiang Yin
  - School of Mathematics, Physics and Statistics, Shanghai University of Engineering Science, Shanghai, 201620, China
- Jiaying Gu
  - School of Mathematics, Physics and Statistics, Shanghai University of Engineering Science, Shanghai, 201620, China
- Zhiliang Zeng
  - School of Mathematics, Physics and Statistics, Shanghai University of Engineering Science, Shanghai, 201620, China

8
Lee H, Kim J. A Gene Selection Method Considering Measurement Errors. J Comput Biol 2024; 31:71-82. [PMID: 38010511 DOI: 10.1089/cmb.2023.0041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023] Open
Abstract
The analysis of gene expression data has made significant contributions to understanding disease mechanisms and developing new drugs and therapies. In such analysis, gene selection is often required for identifying informative and relevant genes and removing redundant and irrelevant ones. However, this is not an easy task, as gene expression data have inherent challenges such as ultra-high dimensionality, biological noise, and measurement errors. This study focuses on the measurement errors in gene selection problems. Typically, high-throughput experiments have their own intrinsic measurement errors, which can result in an increase in falsely discovered genes. To alleviate this problem, this study proposes a gene selection method that takes measurement errors into account using generalized linear measurement error models. The method iterates filtering and selection steps until convergence, leading to fewer false positives and providing stable results under measurement errors. The performance of the proposed method is demonstrated through simulation studies and an application to a lung cancer data set.
Affiliation(s)
- Hajoung Lee
  - Department of Statistics, Sungkyunkwan University, Seoul, South Korea
- Jaejik Kim
  - Department of Statistics, Sungkyunkwan University, Seoul, South Korea

9
Rahimi MR, Makarem D, Sarspy S, Mahdavi SA, Albaghdadi MF, Armaghan SM. Classification of cancer cells and gene selection based on microarray data using MOPSO algorithm. J Cancer Res Clin Oncol 2023; 149:15171-15184. [PMID: 37634207 DOI: 10.1007/s00432-023-05308-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2023] [Accepted: 08/16/2023] [Indexed: 08/29/2023]
Abstract
PURPOSE Microarray information is crucial for the identification and categorisation of malignant tissues. The very limited sample size of microarray data has always been a challenge for classifier design in cancer research. As a result, gene selection is applied as a pre-processing step, and genes that carry no information are removed from the microarray data prior to categorisation. In essence, an appropriate gene selection technique can significantly increase the accuracy of illness (cancer) classification. METHODS For the classification of high-dimensional microarray data, a novel approach based on a hybrid model of multi-objective particle swarm optimisation (MOPSO) is proposed in this research. First, a binary vector representing each particle's position is generated at random. Each bit represents a gene: bit 0 denotes that the corresponding characteristic (gene) is not selected, while bit 1 denotes that the gene is selected. Therefore, the position of each particle represents a set of genes, and the linear Bayesian discriminant analysis classification algorithm calculates each particle's degree of fitness to assess the quality of the gene set that particle has chosen. The suggested methodology is applied to four different cancer datasets, and the results are contrasted with those of other approaches currently in use. RESULTS The results of the implementation show that the improvement in classification accuracy of the proposed algorithm over other methods across the four datasets is 25.84% on average: 18.63% on the blood cancer dataset, 24.25% on the lung cancer dataset, 27.73% on the breast cancer dataset, and 32.80% on the prostate cancer dataset. Therefore, the proposed algorithm is able to identify a small set of informative genes chosen so as to increase classification accuracy. CONCLUSION Our proposed solution is used for data classification and also improves classification accuracy. This is possible because the MOPSO model reduces the number of redundant genes by considering how genes are correlated with each other.
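The binary position encoding described in METHODS can be made concrete with a short sketch. All function names are hypothetical, and the fitness evaluation (which the abstract says uses linear Bayesian discriminant analysis) and the swarm update rules are deliberately left out.

```python
import random


def random_position(n_genes, rng):
    """Draw a random binary position: one bit per gene."""
    return [rng.randint(0, 1) for _ in range(n_genes)]


def selected_genes(position):
    """Indices of genes whose bit is 1, i.e. the subset this particle encodes."""
    return [i for i, bit in enumerate(position) if bit == 1]


def project(sample, position):
    """Keep only the expression values of the selected genes for one sample."""
    return [value for value, bit in zip(sample, position) if bit == 1]


if __name__ == "__main__":
    rng = random.Random(0)
    position = random_position(8, rng)
    print(position, selected_genes(position))
    print(project([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8], position))
```

A classifier trained on the projected samples would supply the accuracy objective, with the number of selected genes as a natural second objective; the abstract does not spell out the exact objectives used.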
Affiliation(s)
- Dorna Makarem
  - Escuela Tecnica Superior de Ingenieros de Telecomunicacion Politecnica de Madrid, Madrid, Spain
- Sliva Sarspy
  - Department of Computer Science, College of Science, Cihan University-Erbil, Erbil, Iraq

10
Kamalov F, Sulieman H, Moussa S, Reyes JA, Safaraliev M. Nested ensemble selection: An effective hybrid feature selection method. Heliyon 2023; 9:e19686. [PMID: 37809839 PMCID: PMC10558945 DOI: 10.1016/j.heliyon.2023.e19686] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2023] [Revised: 08/29/2023] [Accepted: 08/30/2023] [Indexed: 10/10/2023] Open
Abstract
It has been shown that while feature selection algorithms are able to distinguish between relevant and irrelevant features, they fail to differentiate relevant features from redundant and correlated ones. To address this issue, we propose a highly effective approach, called Nested Ensemble Selection (NES), that is based on a combination of filter and wrapper methods. The proposed feature selection algorithm differs from the existing filter-wrapper hybrid methods in its simplicity and efficiency as well as precision. The new algorithm is able to separate the relevant variables from the irrelevant as well as the redundant and correlated features. Furthermore, we provide a robust heuristic for identifying the optimal number of selected features, which remains one of the greatest challenges in feature selection. Numerical experiments on synthetic and real-life data demonstrate the effectiveness of the proposed method. The NES algorithm achieves perfect precision on the synthetic data and near-optimal accuracy on the real-life data. The proposed method is compared against several popular algorithms including mRMR, Boruta, genetic, recursive feature elimination, Lasso, and Elastic Net. The results show that NES significantly outperforms the benchmark algorithms, especially on multi-class datasets.
Affiliation(s)
- Firuz Kamalov
  - Department of Electrical Engineering, Canadian University Dubai, Dubai, United Arab Emirates
- Hana Sulieman
  - Department of Mathematics and Statistics, American University of Sharjah, Sharjah, United Arab Emirates
- Sherif Moussa
  - Department of Electrical Engineering, Canadian University Dubai, Dubai, United Arab Emirates
- Jorge Avante Reyes
  - Department of Electrical Engineering, Canadian University Dubai, Dubai, United Arab Emirates
- Murodbek Safaraliev
  - Department of Automated Electrical Systems, Ural Federal University, Yekaterinburg, Russian Federation

11
Li W, Chi Y, Yu K, Xie W. A two-stage hybrid biomarker selection method based on ensemble filter and binary differential evolution incorporating binary African vultures optimization. BMC Bioinformatics 2023; 24:130. [PMID: 37016297 PMCID: PMC10072044 DOI: 10.1186/s12859-023-05247-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/26/2022] [Accepted: 03/21/2023] [Indexed: 04/06/2023] Open
Abstract
BACKGROUND In the field of genomics and personalized medicine, a key issue is finding biomarkers directly related to the diagnosis of specific diseases from high-throughput gene microarray data. Feature selection technology can discover biomarkers with disease classification information. RESULTS We use support vector machines as classifiers and use the five-fold cross-validation average classification accuracy, recall, precision, and F1 score as evaluation metrics to evaluate the identified biomarkers. Experimental results show classification accuracy above 0.93, recall above 0.92, precision above 0.91, and F1 score above 0.94 on eight microarray datasets. METHOD This paper proposes a two-stage hybrid biomarker selection method based on ensemble filter and binary differential evolution incorporating binary African vultures optimization (EF-BDBA), which can effectively reduce the dimension of microarray data and obtain optimal biomarkers. In the first stage, we propose an ensemble filter feature selection method. The method combines an improved fast correlation-based filter algorithm with the Fisher score, so that obviously redundant and irrelevant features can be filtered out to initially reduce the dimensionality of the microarray data. In the second stage, the optimal feature subset is selected using an improved binary differential evolution incorporating an improved binary African vultures optimization algorithm. The African vultures optimization algorithm has excellent global optimization ability. It has not been systematically applied to feature selection problems, especially for gene microarray data. We combine it with a differential evolution algorithm to improve population diversity. CONCLUSION Compared with traditional feature selection methods and advanced hybrid methods, the proposed method achieves higher classification accuracy and identifies excellent biomarkers while retaining fewer features. The experimental results demonstrate the effectiveness and advancement of our proposed algorithmic model.
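The Fisher score named in the first stage ranks a feature by the ratio of between-class to within-class variance. A minimal sketch of that one component follows, using the common textbook definition; it may differ in detail from the authors' variant, and the improved fast correlation-based filter it is combined with is not shown. Function names and the toy data are illustrative assumptions.

```python
def fisher_score(values, labels):
    """Fisher score of a single feature.

    score = sum_k n_k * (mean_k - mean)^2 / sum_k n_k * var_k,
    where k ranges over classes. Higher means better class separation.
    """
    overall_mean = sum(values) / len(values)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    between, within = 0.0, 0.0
    for members in groups.values():
        n = len(members)
        mean = sum(members) / n
        variance = sum((v - mean) ** 2 for v in members) / n
        between += n * (mean - overall_mean) ** 2
        within += n * variance
    return between / within if within > 0 else float("inf")


def rank_features(columns, labels):
    """Return feature names sorted from highest to lowest Fisher score."""
    return sorted(columns, key=lambda name: fisher_score(columns[name], labels), reverse=True)


if __name__ == "__main__":
    labels = [0, 0, 0, 1, 1, 1]
    columns = {"informative": [1, 2, 3, 7, 8, 9], "noisy": [1, 8, 2, 9, 3, 7]}
    print(rank_features(columns, labels))
```

A filter stage would typically keep only the top-ranked features before handing them to the wrapper search.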
Affiliation(s)
- Wei Li
  - Key Laboratory of Intelligent Computing in Medical Image (MIIC), Northeastern University, Ministry of Education, Shenyang, China
- Yuhuan Chi
  - School of Computer Science and Engineering, Northeastern University, Shenyang, China
- Kun Yu
  - School of Biomedical and Information Engineering, Northeastern University, Shenyang, China
- Weidong Xie
  - School of Computer Science and Engineering, Northeastern University, Shenyang, China

12
Sun L, Si S, Ding W, Xu J, Zhang Y. BSSFS: binary sparrow search algorithm for feature selection. Int J Mach Learn Cyb 2023. [DOI: 10.1007/s13042-023-01788-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/08/2023]

13
Feature selection using Information Gain and decision information in neighborhood decision system. Appl Soft Comput 2023. [DOI: 10.1016/j.asoc.2023.110100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/12/2023]

14
Qu K, Xu J, Han Z, Xu S. Maximum relevance minimum redundancy-based feature selection using rough mutual information in adaptive neighborhood rough sets. Appl Intell 2023. [DOI: 10.1007/s10489-022-04398-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/12/2023]

15
Ashraf MT, Hamid I, Nawaz Q, Ali H. Hybrid Approach using Extreme Gradient Boosting (XGBoost) and Evolutionary Algorithm for Cancer Classification. 2023 International Multi-Disciplinary Conference in Emerging Research Trends (IMCERT) 2023. [DOI: 10.1109/imcert57083.2023.10075236] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/01/2023]
Affiliation(s)
- Isma Hamid
  - National Textile University, Department of Computer Science, Faisalabad, Pakistan
- Qamar Nawaz
  - University of Agriculture, Department of Computer Science, Faisalabad, Pakistan
- Hamid Ali
  - National Textile University, Department of Computer Science, Faisalabad, Pakistan

16
Xie W, Wang L, Yu K, Shi T, Li W. Improved multi-layer binary firefly algorithm for optimizing feature selection and classification of microarray data. Biomed Signal Process Control 2023. [DOI: 10.1016/j.bspc.2022.104080] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]

17
Li M, Ke L, Wang L, Deng S, Yu X. A novel hybrid gene selection for tumor identification by combining multifilter integration and a recursive flower pollination search algorithm. Knowl Based Syst 2023. [DOI: 10.1016/j.knosys.2022.110250] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/04/2023]

18
|
A high-dimensional feature selection method based on modified Gray Wolf Optimization. Appl Soft Comput 2023. [DOI: 10.1016/j.asoc.2023.110031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/15/2023]
|
19
|
Zhang H, Sun Q, Dong K. Information-theoretic partially labeled heterogeneous feature selection based on neighborhood rough sets. Int J Approx Reason 2022. [DOI: 10.1016/j.ijar.2022.12.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/30/2022]
|
20
|
An S, Guo X, Wang C, Guo G, Dai J. A Soft Neighborhood Rough Set Model and Its Applications. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.12.074] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/01/2023]
|
21
|
Pan Y, Xu W, Ran Q. An incremental approach to feature selection using the weighted dominance-based neighborhood rough sets. Int J Mach Learn Cyb 2022. [DOI: 10.1007/s13042-022-01695-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
|
22
|
|
23
|
Sun L, Wang X, Ding W, Xu J. TSFNFR: Two-stage fuzzy neighborhood-based feature reduction with binary whale optimization algorithm for imbalanced data classification. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.109849] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/27/2022]
|
24
|
Zhang D, Zhu P. Variable radius neighborhood rough sets and attribute reduction. Int J Approx Reason 2022. [DOI: 10.1016/j.ijar.2022.08.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
|
25
|
Feature selection based on self-information and entropy measures for incomplete neighborhood decision systems. Complex Intell Syst 2022. [DOI: 10.1007/s40747-022-00882-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
Abstract
For incomplete datasets with mixed numerical and symbolic features, feature selection based on neighborhood multi-granulation rough sets (NMRS) is developing rapidly. However, its evaluation function only considers the information contained in the lower approximation of the neighborhood decision, which easily leads to the loss of some information. To solve this problem, we construct a novel NMRS-based uncertainty measure for feature selection, named neighborhood multi-granulation self-information-based pessimistic neighborhood multi-granulation tolerance joint entropy (PTSIJE), which can be applied to incomplete neighborhood decision systems. First, from the algebra view, four kinds of neighborhood multi-granulation self-information measures of decision variables are proposed by using the upper and lower approximations of NMRS. We discuss the related properties, and find that the fourth measure, the lenient neighborhood multi-granulation self-information measure (NMSI), has better classification performance. Then, inspired by the algebra and information views simultaneously, a feature selection method based on PTSIJE is proposed. Finally, the Fisher score method is used to delete uncorrelated features to reduce the computational complexity for high-dimensional gene datasets, and a heuristic feature selection algorithm is proposed to improve classification performance for mixed and incomplete datasets. Experimental results on 11 datasets show that our method selects fewer features and has higher classification accuracy than related methods.
|
26
|
Xing Y, Kochunov P, van Erp TG, Ma T, Calhoun VD, Du Y. A novel neighborhood rough set-based feature selection method and its application to biomarker identification of schizophrenia. IEEE J Biomed Health Inform 2022; 27:215-226. [PMID: 36201411 PMCID: PMC10076451 DOI: 10.1109/jbhi.2022.3212479] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/04/2022]
Abstract
Feature selection can disclose biomarkers of mental disorders that have unclear biological mechanisms. Although neighborhood rough set (NRS) has been applied to discover important sparse features, it has hardly ever been utilized in neuroimaging-based biomarker identification, probably due to the inadequate feature evaluation metric and incomplete information provided under a single-granularity. Here, we propose a new NRS-based feature selection method and successfully identify brain functional connectivity biomarkers of schizophrenia (SZ) using functional magnetic resonance imaging (fMRI) data. Specifically, we develop a new weighted metric based on NRS combined with information entropy to evaluate the capacity of features in distinguishing different groups. Inspired by multi-granularity information maximization theory, we further take advantage of the complementary information from different neighborhood sizes via a multi-granularity fusion to obtain the most discriminative and stable features. For validation, we compare our method with six popular feature selection methods using three public omics datasets as well as resting-state fMRI data of 393 SZ patients and 429 healthy controls. Results show that our method obtained higher classification accuracies on both omics data (100.0%, 88.6%, and 72.2% for three omics datasets, respectively) and fMRI data (93.9% for main dataset, and 76.3% and 83.8% for two independent datasets, respectively). Moreover, our findings reveal biologically meaningful substrates of SZ, notably involving the connectivity between the thalamus and superior temporal gyrus as well as between the postcentral gyrus and calcarine gyrus. Taken together, we propose a new NRS-based feature selection method that shows the potential of exploring effective and sparse neuroimaging-based biomarkers of mental disorders.
Affiliation(s)
- Ying Xing
  - School of Computer and Information Technology, Shanxi University, Taiyuan, China
- Peter Kochunov
  - Maryland Psychiatric Research Center and Department of Psychiatry, University of Maryland, School of Medicine, Baltimore, MD, USA
- Theo G.M. van Erp
  - Department of Psychiatry and Human Behavior, School of Medicine, University of California, Irvine, CA, USA
- Tianzhou Ma
  - Department of Epidemiology and Biostatistics, University of Maryland, College Park, MD, USA
- Vince D. Calhoun
  - Tri-Institutional Center for Translational Research in Neuroimaging and Data Science (TReNDS), Georgia State University, Georgia Institute of Technology, Emory University, Atlanta, GA, USA
- Yuhui Du
  - School of Computer and Information Technology, Shanxi University, Taiyuan, China
|
27
|
Sun L, Wang X, Ding W, Xu J, Meng H. TSFNFS: two-stage-fuzzy-neighborhood feature selection with binary whale optimization algorithm. Int J Mach Learn Cyb 2022. [DOI: 10.1007/s13042-022-01653-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/10/2022]
|
28
|
AFNFS: Adaptive fuzzy neighborhood-based feature selection with adaptive synthetic over-sampling for imbalanced data. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.08.118] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 01/17/2023]
|
29
|
Ju H, Ding W, Shi Z, Huang J, Yang J, Yang X. Attribute reduction with personalized information granularity of nearest mutual neighbors. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.09.006] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/24/2022]
|
30
|
Qin X, Zhang S, Yin D, Chen D, Dong X. Two-stage feature selection for classification of gene expression data based on an improved Salp Swarm Algorithm. Mathematical Biosciences and Engineering: MBE 2022; 19:13747-13781. [PMID: 36654066 DOI: 10.3934/mbe.2022641] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/17/2023]
Abstract
Microarray technology has developed rapidly in recent years, producing a large number of ultra-high dimensional gene expression data. However, because the dimensionality of gene expression data is very large relative to the sample size, it is very challenging to screen important genes from gene expression data. For small samples of high-dimensional biomedical data, this paper proposes a two-stage feature selection framework combining wrapper, embedded and filter methods to avoid the curse of dimensionality. The proposed framework uses weighted gene co-expression network (WGCNA), random forest and minimal redundancy maximal relevance (mRMR) for first stage feature selection. In the second stage, a new gene selection method based on the improved binary Salp Swarm Algorithm is proposed, which combines machine learning methods to adaptively select feature subsets suitable for classification algorithms. Finally, the classification accuracy is evaluated using six methods: lightGBM, RF, SVM, XGBoost, MLP and KNN. To verify the performance of the framework and the effectiveness of the proposed algorithm, the number of genes selected and the classification accuracy were compared with those of five other intelligent optimization algorithms. The results show that the proposed framework achieves an accuracy equal to or higher than other advanced intelligent algorithms on 10 datasets, and achieves an accuracy of over 97.6% on all 10 datasets. This shows that the method proposed in this paper can solve the feature selection problem related to high-dimensional data, and the proposed framework has no data set limitation, and it can be applied to other fields involving feature selection.
Affiliation(s)
- Xiwen Qin
  - School of Mathematics and Statistics, Changchun University of Technology, Changchun 130012, China
- Shuang Zhang
  - School of Mathematics and Statistics, Changchun University of Technology, Changchun 130012, China
- Dongmei Yin
  - School of Mathematics and Statistics, Changchun University of Technology, Changchun 130012, China
- Dongxue Chen
  - School of Mathematics and Statistics, Changchun University of Technology, Changchun 130012, China
- Xiaogang Dong
  - School of Mathematics and Statistics, Changchun University of Technology, Changchun 130012, China
|
31
|
Peng X, Wang P, Xia S, Wang C, Pu C, Qian J. FNC: A fast neighborhood calculation framework. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.109394] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/20/2023]
|
32
|
Zhang B, Li Y, Chai Z. A novel random multi-subspace based ReliefF for feature selection. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.109400] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2022]
|
33
|
Gaussian kernel based gene selection in a single cell gene decision space. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.08.050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
|
34
|
A novel biomarker selection method combining graph neural network and gene relationships applied to microarray data. BMC Bioinformatics 2022; 23:303. [PMID: 35883022 PMCID: PMC9327232 DOI: 10.1186/s12859-022-04848-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2022] [Accepted: 07/15/2022] [Indexed: 11/10/2022] [Open Access]
Abstract
BACKGROUND The discovery of critical biomarkers is significant for clinical diagnosis, drug research and development. Researchers usually obtain biomarkers from microarray data, which suffers from the curse of dimensionality. Feature selection in machine learning is usually used to solve this problem. However, most methods do not fully consider feature dependence, especially the real pathway relationship of genes. RESULTS Experimental results show that the proposed method is superior to classical algorithms and advanced methods in feature number and accuracy, and the selected features have more significance. METHOD This paper proposes a feature selection method based on a graph neural network. The proposed method uses the actual dependencies between features and the Pearson correlation coefficient to construct graph-structured data. The information dissemination and aggregation operations based on graph neural network are applied to fuse node information on graph-structured data. The redundant features are clustered by the spectral clustering method. Then, the feature ranking aggregation model using eight feature evaluation methods acts on each clustering sub-cluster for different feature selection. CONCLUSION The proposed method can effectively remove redundant features. The algorithm's output has high stability and classification accuracy, and can be used to select potential biomarkers.
|
35
|
Abstract
The complexity of the data type and distribution leads to the increase in uncertainty in the relationship between samples, which brings challenges to effectively mining the potential cluster structure of data. Ensemble clustering aims to obtain a unified cluster division by fusing multiple different base clustering results. This paper proposes a three-way ensemble clustering algorithm based on sample’s perturbation theory to solve the problem of inaccurate decision making caused by inaccurate information or insufficient data. The algorithm first combines the natural nearest neighbor algorithm to generate two sets of perturbed data sets, randomly extracts the feature subsets of the samples, and uses the traditional clustering algorithm to obtain different base clusters. The sample’s stability is obtained by using the co-association matrix and determinacy function, and then the samples can be divided into a stable region and unstable region according to a threshold for the sample’s stability. The stable region consists of high-stability samples and is divided into the core region of each cluster using the K-means algorithm. The unstable region consists of low-stability samples and is assigned to the fringe regions of each cluster. Therefore, a three-way clustering result is formed. The experimental results show that the proposed algorithm in this paper can obtain better clustering results compared with other clustering ensemble algorithms on the UCI Machine Learning Repository data set, and can effectively reveal the clustering structure.
|
36
|
Information gain-based semi-supervised feature selection for hybrid data. Appl Intell 2022. [DOI: 10.1007/s10489-022-03770-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
|
37
|
Rezaee K, Jeon G, Khosravi MR, Attar HH, Sabzevari A. Deep learning‐based microarray cancer classification and ensemble gene selection approach. IET Syst Biol 2022; 16:120-131. [PMID: 35790076 PMCID: PMC9290776 DOI: 10.1049/syb2.12044] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/31/2022] [Revised: 04/04/2022] [Accepted: 05/31/2022] [Indexed: 12/19/2022] [Open Access]
Abstract
Malignancies and diseases of various genetic origins can be diagnosed and classified with microarray data. There are many obstacles to overcome due to the large size of the gene and the small number of samples in the microarray. A combination strategy for gene expression in a variety of diseases is described in this paper, consisting of two steps: identifying the most effective genes via soft ensembling and classifying them with a novel deep neural network. The feature selection approach combines three strategies to select wrapper genes and rank them according to the k‐nearest neighbour algorithm, resulting in a very generalisable model with low error levels. Using soft ensembling, the most effective subsets of genes were identified from three microarray datasets of diffuse large cell lymphoma, leukaemia, and prostate cancer. A stacked deep neural network was used to classify all three datasets, achieving an average accuracy of 97.51%, 99.6%, and 96.34%, respectively. In addition, two previously unreported datasets from small, round blue cell tumors (SRBCTs) and multiple sclerosis‐related brain tissue lesions were examined to show the generalisability of the model method.
Affiliation(s)
- Khosro Rezaee
  - Department of Biomedical Engineering, Meybod University, Meybod, Iran
- Gwanggil Jeon
  - Department of Embedded Systems Engineering, College of Information Technology, Incheon National University, Incheon, Korea
- Hani H. Attar
  - Department of Energy Engineering, Zarqa University, Zarqa, Jordan
|
38
|
|
39
|
Feature selection using self-information uncertainty measures in neighborhood information systems. Appl Intell 2022. [DOI: 10.1007/s10489-022-03760-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]
|
40
|
Hu M, Tsang ECC, Guo Y, Xu W. Fast and Robust Attribute Reduction Based on the Separability in Fuzzy Decision Systems. IEEE Transactions on Cybernetics 2022; 52:5559-5572. [PMID: 33400663 DOI: 10.1109/tcyb.2020.3040803] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Indexed: 06/12/2023]
Abstract
Attribute reduction is one of the most important preprocessing steps in machine learning and data mining. As a key step of attribute reduction, attribute evaluation directly affects classification performance, search time, and stopping criterion. The existing evaluation functions are greatly dependent on the relationship between objects, which makes its computational time and space more costly. To solve this problem, we propose a novel separability-based evaluation function and reduction method by using the relationship between objects and decision categories directly. The degree of aggregation (DA) of intraclass objects and the degree of dispersion (DD) of between-class objects are first defined to measure the significance of an attribute subset. Then, the separability of attribute subsets is defined by DA and DD in fuzzy decision systems, and we design a sequentially forward selection based on the separability (SFSS) algorithm to select attributes. Furthermore, a postpruning strategy is introduced to prevent overfitting and determine a termination parameter. Finally, the SFSS algorithm is compared with some typical reduction algorithms using some public datasets from UCI and ELVIRA Biomedical repositories. The interpretability of SFSS is directly presented by the performance on MNIST handwritten digits. The experimental comparisons show that SFSS is fast and robust, which has higher classification accuracy and compression ratio, with extremely low computational time.
|
41
|
Wang P, Qu L, Zhang Q. Information entropy based attribute reduction for incomplete heterogeneous data. Journal of Intelligent & Fuzzy Systems 2022. [DOI: 10.3233/jifs-212037] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/15/2022]
Abstract
Attribute reduction in an information system (IS) is an important research topic in rough set theory (RST). This paper investigates attribute reduction for incomplete heterogeneous data based on information entropy. Information entropy in an incomplete IS with heterogeneous data (IISH) is first defined. Then, some derived notions of information entropy, such as joint information entropy, conditional information entropy, mutual information entropy, gain and gain ratio in an incomplete decision IS with heterogeneous data (IDISH), are presented. Next, information entropy is applied to perform attribute reduction in an IDISH. Two attribute reduction algorithms, based on gain and gain ratio, respectively, are proposed. Finally, in order to illustrate the feasibility and efficiency of the proposed algorithms, experimental analysis is carried out and comparisons are done. It is worth mentioning that the incomplete rate is used to deal with incomplete heterogeneous data.
Affiliation(s)
- Pei Wang
  - Key Laboratory of Complex System Optimization and Big Data Processing in Department of Guangxi Education, Yulin Normal University, Yulin, Guangxi, P.R. China
- Liangdong Qu
  - School of Artificial Intelligence, Guangxi University for Nationalities, Nanning, Guangxi, P.R. China
- Qinli Zhang
  - School of Big Data and Artificial Intelligence, Chizhou University, Chizhou, Anhui, P.R. China
|
42
|
Online group streaming feature selection using entropy-based uncertainty measures for fuzzy neighborhood rough sets. Complex Intell Syst 2022. [DOI: 10.1007/s40747-022-00763-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
Abstract
Online group streaming feature selection, as an essential online processing method, can deal with dynamic feature selection tasks by considering the original group structure information of the features. Due to the fuzziness and uncertainty of the feature stream, some existing methods are unstable and yield low predictive accuracy. To address these issues, this paper presents a novel online group streaming feature selection method (FNE-OGSFS) using fuzzy neighborhood entropy-based uncertainty measures. First, a separability measure integrating the dependency degree with the coincidence degree is proposed and introduced into the fuzzy neighborhood rough sets model to define a new fuzzy neighborhood entropy. Second, inspired by both algebra and information views, some fuzzy neighborhood entropy-based uncertainty measures are investigated and some properties are derived. Furthermore, the optimal features in the group are selected to flow into the feature space according to the significance of features, and the features with interactions are left. Then, all selected features are re-evaluated by the Lasso model to discard the redundant features. Finally, an online group streaming feature selection algorithm is designed. Experimental results compared with eight representative methods on thirteen datasets show that FNE-OGSFS can achieve better comprehensive performance.
|
43
|
Xin XW, Shi CL, Sun JB, Xue ZA, Song JH, Peng WM. A novel attribute reduction method based on intuitionistic fuzzy three-way cognitive clustering. Appl Intell 2022. [DOI: 10.1007/s10489-022-03496-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]
|
44
|
Sun L, Zhang J, Ding W, Xu J. Feature reduction for imbalanced data classification using similarity-based feature clustering with adaptive weighted K-nearest neighbors. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.02.004] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Indexed: 01/19/2023]
|
45
|
Sun L, Si S, Zhao J, Xu J, Lin Y, Lv Z. Feature selection using binary monarch butterfly optimization. Appl Intell 2022. [DOI: 10.1007/s10489-022-03554-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/22/2022]
|
46
|
|
47
|
Xu J, Qu K, Meng X, Sun Y, Hou Q. Feature selection based on multiview entropy measures in multiperspective rough set. Int J Intell Syst 2022. [DOI: 10.1002/int.22878] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 01/17/2023]
Affiliation(s)
- Jiucheng Xu
  - Engineering Lab of Intelligence Business & Internet of Things, Henan Province, Xinxiang, China
  - College of Computer and Information Engineering, Henan Normal University, Xinxiang, China
- Kanglin Qu
  - Engineering Lab of Intelligence Business & Internet of Things, Henan Province, Xinxiang, China
  - College of Computer and Information Engineering, Henan Normal University, Xinxiang, China
- Xiangru Meng
  - Engineering Lab of Intelligence Business & Internet of Things, Henan Province, Xinxiang, China
  - College of Computer and Information Engineering, Henan Normal University, Xinxiang, China
- Yuanhao Sun
  - Engineering Lab of Intelligence Business & Internet of Things, Henan Province, Xinxiang, China
  - College of Computer and Information Engineering, Henan Normal University, Xinxiang, China
- Qincheng Hou
  - Engineering Lab of Intelligence Business & Internet of Things, Henan Province, Xinxiang, China
  - College of Computer and Information Engineering, Henan Normal University, Xinxiang, China
|
48
|
Liu Z, Wang R, Zhang W. Improving the generalization of unsupervised feature learning by using data from different sources on gene expression data for cancer diagnosis. Med Biol Eng Comput 2022; 60:1055-1073. [DOI: 10.1007/s11517-022-02522-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2021] [Accepted: 01/30/2022] [Indexed: 10/19/2022]
|
49
|
Mei W, Liu Z, Tang L, Su Y. Test Strategy Optimization Based on Soft Sensing and Ensemble Belief Measurement. Sensors 2022; 22:s22062138. [PMID: 35336309 PMCID: PMC8948794 DOI: 10.3390/s22062138] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/12/2022] [Revised: 03/02/2022] [Accepted: 03/04/2022] [Indexed: 12/04/2022]
Abstract
Because of short production cycles and the rapid development of design technology, traditional prognostic and health management (PHM) approaches have become impractical and fail to meet the requirements of systems with structural and functional complexity. Among all PHM designs, testability design and maintainability design face critical difficulties. First, testability design requires much labor and knowledge preparation, and wastes the information recorded by sensors. Second, maintainability design is adversely affected by improper testability design. We propose a test strategy optimization based on soft-sensing and ensemble belief measurements to overcome these problems. Instead of serial PHM design, the proposed method constructs a closed loop between testability and maintenance to generate an adaptive fault diagnostic tree with soft-sensor nodes. The generated diagnostic tree ensures high efficiency and flexibility, taking advantage of extreme learning machine (ELM) and affinity propagation (AP). The experimental results show that our method achieves the highest performance among the state-of-the-art methods compared. Additionally, the proposed method increases diagnostic flexibility and saves much human labor on testability design.
Affiliation(s)
- Wenjuan Mei
  - School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China; (W.M.); (Y.S.)
- Zhen Liu
  - School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China; (W.M.); (Y.S.)
  - Correspondence: ; Tel.: +86-028-6183-0316
- Lei Tang
  - Southwest Institute of Technical Physics, Chengdu 611731, China;
- Yuanzhang Su
  - School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China; (W.M.); (Y.S.)
  - School of Foreign Language, University of Electronic Science and Technology of China, Chengdu 611731, China
|
50
|
Wang Y, Wang S. Some results on fuzzy relations. Journal of Intelligent & Fuzzy Systems 2022. [DOI: 10.3233/jifs-212215] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Fuzzy relation is one of the main research contents of fuzzy set theory. This paper obtains some results on fuzzy relations by studying relationships between fuzzy relations and their uncertainty measurement. The concepts of equality, dependence, partial dependence and independence between fuzzy relations are first introduced. Then, uncertainty measurement for a fuzzy relation is investigated by using dependence between fuzzy relations. Moreover, the basic properties of uncertainty measurement are obtained. Next, effectiveness analysis is carried out. Finally, an application of the proposed measures in attribute reduction for heterogeneous data is given. These results will be helpful for understanding the essence of a fuzzy relation.
Affiliation(s)
- Yini Wang
  - Guangxi Key Laboratory of Cross-border E-commerce Intelligent Information Processing, Guangxi University of Finance and Economics, Nanning, Guangxi, P.R. China
  - Panyapiwat Institute of Management, Bangkok, Bangkok, Thailand
- Sichun Wang
  - Guangxi Key Laboratory of Cross-border E-commerce Intelligent Information Processing, Guangxi University of Finance and Economics, Nanning, Guangxi, P.R. China
|