1
Esfandiari A, Nasiri N. Gene selection and cancer classification using interaction-based feature clustering and improved-binary Bat algorithm. Comput Biol Med 2024; 181:109071. [PMID: 39205342 DOI: 10.1016/j.compbiomed.2024.109071] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2024] [Revised: 08/13/2024] [Accepted: 08/22/2024] [Indexed: 09/04/2024]
Abstract
In high-dimensional gene expression data, selecting an optimal subset of genes is crucial for achieving high classification accuracy and reliable diagnosis of diseases. This paper proposes a two-stage hybrid model for gene selection based on clustering and a swarm intelligence algorithm to identify the most informative genes with high accuracy. First, a clustering-based multivariate filter approach is performed to explore the interactions between the features and eliminate any redundant or irrelevant ones. Then, by controlling for the problem of premature convergence in the binary Bat algorithm, the optimal gene subset is determined using different classifiers with the Monte Carlo cross-validation data partitioning model. The effectiveness of our proposed framework is evaluated using eight gene expression datasets, by comparison with other recently published algorithms in the literature. Experiments confirm that in seven out of eight datasets, the proposed method can achieve superior results in terms of classification accuracy and gene subset size. In particular, it achieves a classification accuracy of 100% in Lymphoma and Ovarian datasets and above 97.4% in the rest with a minimum number of genes. The results demonstrate that our proposed algorithm has the potential to solve the feature selection problem in different applications with high-dimensional datasets.
Affiliation(s)
- Ahmad Esfandiari
- Department of Computer Engineering, Sari Branch, Islamic Azad University, Sari, Iran.
- Niki Nasiri
- Pediatric Infectious Diseases Research Center, Communicable Diseases Institute, Mazandaran University of Medical Sciences, Sari, Iran
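The abstract of this entry evaluates candidate gene subsets with Monte Carlo cross-validation, i.e. repeated random train/test splits. The sketch below illustrates only that evaluation step under stated assumptions: the nearest-centroid classifier, the toy data, and the names `monte_carlo_cv`, `n_splits`, and `test_fraction` are hypothetical and are not taken from the paper, and the clustering filter and the binary Bat search are not implemented here.

```python
import random


def nearest_centroid_predict(train_X, train_y, x):
    """Predict the label whose class centroid is closest to x (squared Euclidean)."""
    sums, counts = {}, {}
    for row, label in zip(train_X, train_y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    best_label, best_dist = None, float("inf")
    for label, acc in sums.items():
        centroid = [s / counts[label] for s in acc]
        d = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label


def monte_carlo_cv(X, y, gene_subset, n_splits=30, test_fraction=0.3, seed=0):
    """Mean test accuracy of a gene subset over repeated random train/test splits."""
    rng = random.Random(seed)
    indices = list(range(len(X)))
    n_test = max(1, int(round(test_fraction * len(X))))

    def project(i):
        # Keep only the columns (genes) in the candidate subset.
        return [X[i][g] for g in gene_subset]

    accuracies = []
    for _ in range(n_splits):
        rng.shuffle(indices)  # a fresh random partition for every repetition
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        train_X = [project(i) for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct = sum(
            nearest_centroid_predict(train_X, train_y, project(i)) == y[i]
            for i in test_idx
        )
        accuracies.append(correct / n_test)
    return sum(accuracies) / n_splits


# Toy expression matrix (assumed data): gene 0 separates the classes, gene 1 is noise.
X = [[0.1, 5.0], [0.2, 1.0], [0.0, 3.0], [0.3, 4.0],
     [1.0, 4.5], [0.9, 1.5], [1.1, 3.5], [0.8, 2.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

acc_informative = monte_carlo_cv(X, y, gene_subset=[0])
acc_noise = monte_carlo_cv(X, y, gene_subset=[1])
```

In a wrapper method of this kind, a score such as `acc_informative` would typically serve as (part of) the fitness that the search algorithm tries to maximise, often combined with a penalty on subset size.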
2
3
A Feature Selection Algorithm Integrating Maximum Classification Information and Minimum Interaction Feature Dependency Information. Computational Intelligence and Neuroscience 2022; 2021:3569632. [PMID: 34992644 PMCID: PMC8727115 DOI: 10.1155/2021/3569632] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/28/2021] [Revised: 11/21/2021] [Accepted: 12/07/2021] [Indexed: 11/17/2022]
Abstract
Feature selection is the key step in the analysis of high-dimensional small sample data. The core of feature selection is to analyse and quantify the correlation between features and class labels and the redundancy between features. However, most of the existing feature selection algorithms only consider the classification contribution of individual features and ignore the influence of interfeature redundancy and correlation. Therefore, this paper proposes a nonlinear dynamic conditional relevance feature selection algorithm (NDCRFS), developed through the study and analysis of the ideas and methods of existing feature selection algorithms. Firstly, redundancy between features and relevance between features and class labels are discriminated by mutual information, conditional mutual information, and interactive mutual information. Secondly, the selected features and candidate features are dynamically weighted utilizing information gain factors. Finally, to evaluate the performance of this feature selection algorithm, NDCRFS was validated against 6 other feature selection algorithms on three classifiers, using 12 different data sets, comparing variability and classification metrics across the different algorithms. The experimental results show that the NDCRFS method can improve the quality of the feature subsets and obtain better classification results.
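The three information-theoretic quantities named in the abstract above can be estimated from discrete data with plain frequency counts. The sketch below is illustrative only: it does not reproduce the NDCRFS selection criterion or its dynamic weighting, the function names and toy columns are assumptions, and the sign convention used for interaction information (conditional mutual information minus mutual information) is one of two in common use.

```python
from collections import Counter
from math import log2


def entropy(*columns):
    """Joint Shannon entropy in bits of one or more discrete columns."""
    joint = Counter(zip(*columns))
    n = sum(joint.values())
    return -sum((c / n) * log2(c / n) for c in joint.values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def conditional_mutual_information(x, y, z):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)


def interaction_information(x, y, z):
    """I(X;Y|Z) - I(X;Y); positive values indicate synergy under this convention."""
    return conditional_mutual_information(x, y, z) - mutual_information(x, y)


# Toy example (assumed data): the label is the XOR of two binary features.
f1 = [0, 0, 1, 1]
f2 = [0, 1, 0, 1]
label = [0, 1, 1, 0]

relevance_alone = mutual_information(f1, label)                      # f1 alone tells nothing
relevance_given_f2 = conditional_mutual_information(f1, label, f2)   # f1 is decisive once f2 is known
synergy = interaction_information(f1, label, f2)
redundancy = mutual_information(f1, list(f1))                        # a duplicated feature is fully redundant
```

The XOR case shows why criteria that score features one at a time can miss useful features: `f1` has zero mutual information with the label on its own, yet it becomes fully informative when conditioned on `f2`.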
4
Bhattacharjee S, Ikromjanov K, Carole KS, Madusanka N, Cho NH, Hwang YB, Sumon RI, Kim HC, Choi HK. Cluster Analysis of Cell Nuclei in H&E-Stained Histological Sections of Prostate Cancer and Classification Based on Traditional and Modern Artificial Intelligence Techniques. Diagnostics (Basel) 2021; 12:diagnostics12010015. [PMID: 35054182 PMCID: PMC8774423 DOI: 10.3390/diagnostics12010015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2021] [Revised: 12/14/2021] [Accepted: 12/20/2021] [Indexed: 11/16/2022] [Open Access]
Abstract
Biomarker identification is very important to differentiate the grade groups in the histopathological sections of prostate cancer (PCa). Assessing the cluster of cell nuclei is essential for pathological investigation. In this study, we present a computer-based method for cluster analyses of cell nuclei and apply traditional (i.e., unsupervised method) and modern (i.e., supervised method) artificial intelligence (AI) techniques for distinguishing the grade groups of PCa. Two datasets on PCa were collected to carry out this research. Histopathology samples were obtained from whole slides stained with hematoxylin and eosin (H&E). In this research, state-of-the-art approaches were proposed for color normalization, cell nuclei segmentation, feature selection, and classification. A traditional minimum spanning tree (MST) algorithm was employed to identify the clusters and better capture the proliferation and community structure of cell nuclei. K-medoids clustering and stacked ensemble machine learning (ML) approaches were used to perform traditional and modern AI-based classification. Binary and multiclass classifications were derived to compare the model quality and results between the grades of PCa. Furthermore, a comparative analysis was carried out between traditional and modern AI techniques using different performance metrics (i.e., statistical parameters). Cluster features of the cell nuclei can be useful information for cancer grading. However, further validation of cluster analysis is required to accomplish astounding classification results.
Affiliation(s)
- Kobiljon Ikromjanov
- Department of Digital Anti-Aging Healthcare, u-AHRC, Inje University, Gimhae 50834, Korea
- Kouayep Sonia Carole
- Department of Digital Anti-Aging Healthcare, u-AHRC, Inje University, Gimhae 50834, Korea
- Nuwan Madusanka
- School of Computing & IT, Sri Lanka Technological Campus, Paduka 10500, Sri Lanka
- Nam-Hoon Cho
- Department of Pathology, Yonsei University Hospital, Seoul 03722, Korea
- Yeong-Byn Hwang
- Department of Digital Anti-Aging Healthcare, u-AHRC, Inje University, Gimhae 50834, Korea
- Rashadul Islam Sumon
- Department of Digital Anti-Aging Healthcare, u-AHRC, Inje University, Gimhae 50834, Korea
- Hee-Cheol Kim
- Department of Digital Anti-Aging Healthcare, u-AHRC, Inje University, Gimhae 50834, Korea
- Heung-Kook Choi
- Department of Computer Engineering, u-AHRC, Inje University, Gimhae 50834, Korea
- Correspondence: Tel.: +82-10-6733-3437
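The abstract of this entry uses a minimum spanning tree (MST) over cell nuclei to identify clusters. A minimal sketch of one common way to do this is given below: build the MST over nucleus centroids with Prim's algorithm, then cut edges longer than a threshold. The edge-cutting rule, the `max_edge_length` parameter, and the toy coordinates are assumptions for illustration; the abstract does not say how the authors derive clusters from the tree.

```python
from math import dist


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph; returns (length, i, j) edges."""
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [float("inf")] * n  # shortest known edge linking each node to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the node outside the tree that is cheapest to attach.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def clusters_from_mst(points, max_edge_length):
    """Drop MST edges longer than max_edge_length and return the connected components."""
    root = list(range(len(points)))

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]  # path halving keeps the trees shallow
            i = root[i]
        return i

    for length, i, j in minimum_spanning_tree(points):
        if length <= max_edge_length:
            root[find(i)] = find(j)
    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy nucleus centroids (assumed data): two well-separated groups.
centroids = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)]
mst_edges = minimum_spanning_tree(centroids)
clusters = clusters_from_mst(centroids, max_edge_length=2.0)
```

Per-cluster statistics such as the number of nuclei or the total edge length could then be used as features for a downstream classifier, which is the kind of "cluster feature" the abstract refers to.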
5
Wang Y, Wang L, Yang Y, Lian T. SemSeq4FD: Integrating global semantic relationship and local sequential order to enhance text representation for fake news detection. Expert Systems with Applications 2021; 166:114090. [PMID: 33041529 DOI: 10.1016/j.eswa.2021.114864] [Citation(s) in RCA: 274] [Impact Index Per Article: 91.3] [Received: 07/12/2020] [Revised: 09/17/2020] [Accepted: 10/02/2020] [Indexed: 05/27/2023]
Abstract
The wide spread of fake news has caused huge losses to both governments and the public. Many existing works on fake news detection utilized spreading information like propagators' profiles and the propagation structure. However, such methods face the difficulty of data collection and cannot detect fake news at the early stage. An alternative approach is to detect fake news solely based on its content. Early content-based methods rely on manually designed linguistic features. Such shallow features are domain-dependent, and cannot easily be generalized to cross-domain data. Recently, many natural language processing tasks resort to deep learning methods to learn word, sentence, and document representations. In this paper, we propose a novel graph-based neural network model named SemSeq4FD for early fake news detection based on enhanced text representations. In SemSeq4FD, we model the global pair-wise semantic relations between sentences as a complete graph, and learn the global sentence representations via a graph convolutional network with a self-attention mechanism. Considering the importance of local context in conveying the sentence meaning, we employ a 1D convolutional network to learn the local sentence representations. The two representations are combined to form the enhanced sentence representations. Then an LSTM-based network is used to model the sequence of enhanced sentence representations, yielding the final document representation for fake news detection. Experiments conducted on four real-world datasets in English and Chinese, including cross-source and cross-domain datasets, demonstrate that our model can outperform the state-of-the-art methods.
Affiliation(s)
- Yuhang Wang
- Data Science College, Taiyuan University of Technology, Jinzhong, Shanxi, 030600, China
- Li Wang
- Data Science College, Taiyuan University of Technology, Jinzhong, Shanxi, 030600, China
- Yanjie Yang
- Data Science College, Taiyuan University of Technology, Jinzhong, Shanxi, 030600, China
- Tao Lian
- Data Science College, Taiyuan University of Technology, Jinzhong, Shanxi, 030600, China
6
7
8
Gene selection and disease prediction from gene expression data using a two-stage hetero-associative memory. Progress in Artificial Intelligence 2018. [DOI: 10.1007/s13748-018-0148-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 01/28/2023]
9
Afghah F, Razi A, Soroushmehr R, Ghanbari H, Najarian K. Game Theoretic Approach for Systematic Feature Selection; Application in False Alarm Detection in Intensive Care Units. Entropy (Basel, Switzerland) 2018; 20:E190. [PMID: 33265281 PMCID: PMC7512707 DOI: 10.3390/e20030190] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 01/10/2018] [Revised: 02/27/2018] [Accepted: 03/05/2018] [Indexed: 01/19/2023]
Abstract
Intensive Care Units (ICUs) are equipped with many sophisticated sensors and monitoring devices to provide the highest quality of care for critically ill patients. However, these devices might generate false alarms that reduce the standard of care and result in desensitization of caregivers to alarms. Therefore, reducing the number of false alarms is of great importance. Many approaches, such as signal processing, machine learning, and the design of more accurate sensors, have been developed for this purpose. However, the significant intrinsic correlation among the extracted features from different sensors has been mostly overlooked. A majority of current data mining techniques fail to capture such correlation among the collected signals from different sensors, which limits their alarm recognition capabilities. Here, we propose a novel information-theoretic predictive modeling technique based on the idea of coalition game theory to enhance the accuracy of false alarm detection in ICUs by accounting for the synergistic power of signal attributes in the feature selection stage. This approach brings together techniques from information theory and game theory to account for inter-feature mutual information in determining the most correlated predictors with respect to false alarm by calculating the Banzhaf power of each feature. The numerical results show that the proposed method can enhance classification accuracy and improve the area under the ROC (receiver operating characteristic) curve compared to other feature selection techniques, when integrated in classifiers such as Bayes-Net that consider inter-feature dependencies.
Affiliation(s)
- Fatemeh Afghah
- School of Informatics, Computing and Cyber Systems, Northern Arizona University, Flagstaff, AZ 86011, USA
- Abolfazl Razi
- School of Informatics, Computing and Cyber Systems, Northern Arizona University, Flagstaff, AZ 86011, USA
- Reza Soroushmehr
- Department of Emergency Medicine, University of Michigan, Ann Arbor, MI 48109, USA
- Hamid Ghanbari
- Department of Emergency Medicine, University of Michigan, Ann Arbor, MI 48109, USA
- Kayvan Najarian
- Department of Emergency Medicine, University of Michigan, Ann Arbor, MI 48109, USA
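The abstract of this entry ranks features by their Banzhaf power in a coalition game. As a hedged illustration, the sketch below computes the standard Banzhaf value, the average marginal contribution of a player over all coalitions of the remaining players. The value function `toy_value` and the feature names are hypothetical stand-ins; the paper derives its coalition values from information-theoretic quantities that are not reproduced here.

```python
from itertools import combinations


def banzhaf_values(players, value):
    """Banzhaf value of each player: mean marginal contribution over all
    coalitions that can be formed from the other players."""
    result = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for size in range(len(others) + 1):
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                total += value(s | {p}) - value(s)
        result[p] = total / 2 ** len(others)
    return result


def toy_value(coalition):
    """Hypothetical coalition worth: 'a' is weakly useful alone, while
    'a' and 'b' together are strongly useful (synergy); 'noise' adds nothing."""
    worth = 0.0
    if "a" in coalition:
        worth += 0.2
    if {"a", "b"} <= coalition:
        worth += 1.0
    return worth


power = banzhaf_values(["a", "b", "noise"], toy_value)
ranking = sorted(power, key=power.get, reverse=True)
```

Feature `b` contributes nothing on its own, but it receives a substantial Banzhaf value because of its joint contribution with `a`; this is the kind of synergy that a purely univariate ranking would miss. Note that the enumeration is exponential in the number of features, so practical use on large feature sets usually needs sampling or a bound on coalition size.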
10
Feature Subset Selection for Cancer Classification Using Weight Local Modularity. Sci Rep 2016; 6:34759. [PMID: 27703256 PMCID: PMC5050509 DOI: 10.1038/srep34759] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 06/27/2016] [Accepted: 09/19/2016] [Indexed: 11/27/2022] [Open Access]
Abstract
Microarrays have recently become an important tool for profiling the global gene expression patterns of tissues. Gene selection is a popular technique for cancer classification that aims to identify, from thousands of genes, a small number of informative genes that may contribute to the occurrence of cancers, in order to obtain a high predictive accuracy. This technique has been extensively studied in recent years. This study develops a novel feature selection (FS) method for gene subset selection, called WLMGS, which utilizes the Weight Local Modularity (WLM) in a complex network. In the proposed method, the discriminative power of a gene subset is evaluated by using the weight local modularity of a weighted sample graph in the gene subset, where the intra-class distance is small and the inter-class distance is large. A higher local modularity of the gene subset corresponds to a greater discriminative power of the gene subset. With the use of a forward search strategy, a more informative gene subset can be selected as a group for the classification process. Computational experiments show that the proposed algorithm can select a small subset of predictive genes as a group while preserving classification accuracy.
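The forward search strategy mentioned in the abstract above can be sketched as a greedy loop that adds one gene at a time while a subset-level score keeps improving. The scoring function below is a stand-in, not weight local modularity: it is simply the ratio of mean inter-class to mean intra-class distance, chosen because the abstract describes good subsets as having small intra-class and large inter-class distances. All names and the toy data are assumptions.

```python
from math import dist


def separation_score(X, y, genes):
    """Stand-in criterion (NOT weight local modularity): mean inter-class
    distance divided by mean intra-class distance in the selected genes."""
    proj = [[row[g] for g in genes] for row in X]
    intra, inter = [], []
    for i in range(len(proj)):
        for j in range(i + 1, len(proj)):
            (intra if y[i] == y[j] else inter).append(dist(proj[i], proj[j]))
    mean_intra = sum(intra) / len(intra)
    mean_inter = sum(inter) / len(inter)
    return mean_inter / (mean_intra + 1e-12)  # small constant avoids division by zero


def forward_search(X, y, max_genes):
    """Greedily add the gene that most improves the subset score; stop when
    no candidate improves it or max_genes is reached."""
    selected, best_score = [], float("-inf")
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_genes:
        score, gene = max(
            (separation_score(X, y, selected + [g]), g) for g in remaining
        )
        if score <= best_score:
            break
        selected.append(gene)
        remaining.remove(gene)
        best_score = score
    return selected, best_score


# Toy expression matrix (assumed data): gene 0 separates the classes, gene 1 is noise.
X = [[0.1, 5.0], [0.2, 1.0], [0.0, 3.0], [0.3, 4.0],
     [1.0, 4.5], [0.9, 1.5], [1.1, 3.5], [0.8, 2.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

selected_genes, final_score = forward_search(X, y, max_genes=2)
```

Because the score is computed on the whole candidate subset rather than on each gene in isolation, the search evaluates genes as a group, which matches the group-wise selection the abstract emphasises.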
11
García V, Salvador Sánchez J. Mapping microarray gene expression data into dissimilarity spaces for tumor classification. Inf Sci (N Y) 2015. [DOI: 10.1016/j.ins.2014.09.064] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Indexed: 10/24/2022]