1
Chen J, Wen B. Bi-level gene selection of cancer by combining clustering and sparse learning. Comput Biol Med 2024; 172:108236. [PMID: 38471351 DOI: 10.1016/j.compbiomed.2024.108236] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2023] [Revised: 02/07/2024] [Accepted: 02/25/2024] [Indexed: 03/14/2024]
Abstract
The diagnosis of cancer based on gene expression profile data has attracted extensive attention in biomedical science. Such data are typically high-dimensional and noisy. In this paper, a hybrid gene selection method based on clustering and sparse learning is proposed to select key genes with high precision. We first propose a filter method that combines the k-means clustering algorithm with signal-to-noise ratio (SNR) ranking; a weighted gene co-expression network is then applied to the reduced data set to identify modules corresponding to biological pathways. Next, we choose the key genes by using group bridge and sparse group lasso as wrapper methods. Finally, we conduct numerical experiments on six cancer datasets. The numerical results show that the proposed method performs well in gene selection and cancer classification.
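The abstract does not spell out how k-means and SNR ranking are combined. A minimal sketch, assuming one plausible reading (cluster the genes by their expression profiles with plain k-means, then keep the top-ranked genes by absolute SNR = (μ1 − μ0)/(σ1 + σ0) within each cluster), is shown below. The function names, the deterministic initialization, and the `per_cluster` parameter are hypothetical, and the co-expression network and group bridge / sparse group lasso stages are not reproduced.

```python
import math
from statistics import mean, pstdev


def snr(values, labels):
    """Signal-to-noise ratio of one gene: (mu1 - mu0) / (sd1 + sd0); labels are 0/1."""
    g1 = [v for v, y in zip(values, labels) if y == 1]
    g0 = [v for v, y in zip(values, labels) if y == 0]
    denom = pstdev(g1) + pstdev(g0)
    # Guard against zero spread in both classes.
    return (mean(g1) - mean(g0)) / denom if denom > 0 else 0.0


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; the first k points seed the centroids (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        for i, p in enumerate(points):
            assign[i] = min(range(k), key=lambda c: math.dist(p, centroids[c]))
        # Update step: centroid = coordinate-wise mean of its members.
        for c in range(k):
            members = [points[i] for i in range(len(points)) if assign[i] == c]
            if members:
                centroids[c] = [mean(col) for col in zip(*members)]
    return assign


def cluster_snr_filter(X, labels, k, per_cluster):
    """X[g] is the expression profile of gene g across samples (hypothetical layout).
    Cluster the genes, then keep the `per_cluster` genes with the largest |SNR|
    in each cluster, so that every co-expression group is represented."""
    assign = kmeans(X, k)
    selected = []
    for c in range(k):
        genes = [g for g in range(len(X)) if assign[g] == c]
        genes.sort(key=lambda g: abs(snr(X[g], labels)), reverse=True)
        selected.extend(genes[:per_cluster])
    return sorted(selected)
```

Selecting per cluster rather than globally is meant to avoid keeping many near-duplicate genes from one co-expressed group; whether the paper does exactly this is an assumption.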
Affiliation(s)
- Junnan Chen
- School of Science, Hebei University of Technology, Tianjin, PR China.
- Bo Wen
- Institute of Mathematics, Hebei University of Technology, Tianjin, PR China.
2
Yin YT, Shi L, Wu C, Zhang MY, Li JX, Zhou YF, Wang SC, Wang HY, Mai SJ. TRIM29 modulates proteins involved in PTEN/AKT/mTOR and JAK2/STAT3 signaling pathway and suppresses the progression of hepatocellular carcinoma. Med Oncol 2024; 41:79. [PMID: 38393440 DOI: 10.1007/s12032-024-02307-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Accepted: 01/17/2024] [Indexed: 02/25/2024]
Abstract
Tripartite motif-containing 29 (TRIM29), also known as the ataxia telangiectasia group D-complementing (ATDC) gene, has been reported to play either an oncogenic or a tumor-suppressive role in different tumors. Its expression and biological functions in hepatocellular carcinoma (HCC) have so far remained unclear. We investigated the TRIM29 expression pattern in human HCC samples using quantitative RT-PCR and immunohistochemistry. Relationships between TRIM29 expression level, clinical prognostic indicators, overall survival (OS), and disease-free survival (DFS) were evaluated by Kaplan-Meier analysis and the Cox proportional hazards model. A series of in vitro experiments and a xenograft tumor model were used to examine the functions of TRIM29 in HCC cells. RNA sequencing, western blotting, and immunochemical staining were performed to assess the molecular regulation of TRIM29 in HCC. We found that TRIM29 mRNA and protein levels were significantly reduced in HCC samples compared with adjacent noncancerous tissues, and were negatively correlated with poor differentiation of HCC tissues. Survival analysis confirmed that lower TRIM29 expression was significantly correlated with shorter OS and DFS in HCC patients. TRIM29 overexpression markedly inhibited proliferation, migration, and epithelial-mesenchymal transition (EMT) in HCC cells, whereas TRIM29 knockdown reversed these effects. Moreover, deactivation of the PTEN/AKT/mTOR and JAK2/STAT3 pathways might be involved in the tumor-suppressive role of TRIM29 in HCC. Our findings indicate that TRIM29 exerts tumor-suppressive effects in HCC through inhibition of the PTEN/AKT/mTOR and JAK2/STAT3 signaling pathways and may serve as a potential biomarker for survival in patients with HCC.
Affiliation(s)
- Yu-Ting Yin
- State Key Laboratory of Oncology in South China, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-Sen University Cancer Center, Guangzhou, 510060, People's Republic of China
- Lu Shi
- State Key Laboratory of Oncology in South China, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-Sen University Cancer Center, Guangzhou, 510060, People's Republic of China
- Chun Wu
- State Key Laboratory of Oncology in South China, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-Sen University Cancer Center, Guangzhou, 510060, People's Republic of China
- Mei-Yin Zhang
- State Key Laboratory of Oncology in South China, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-Sen University Cancer Center, Guangzhou, 510060, People's Republic of China
- Jia-Xin Li
- State Key Laboratory of Oncology in South China, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-Sen University Cancer Center, Guangzhou, 510060, People's Republic of China
- Yu-Feng Zhou
- State Key Laboratory of Oncology in South China, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-Sen University Cancer Center, Guangzhou, 510060, People's Republic of China
- Shuo-Cheng Wang
- State Key Laboratory of Oncology in South China, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-Sen University Cancer Center, Guangzhou, 510060, People's Republic of China
- Hui-Yun Wang
- State Key Laboratory of Oncology in South China, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-Sen University Cancer Center, Guangzhou, 510060, People's Republic of China
- Shi-Juan Mai
- State Key Laboratory of Oncology in South China, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-Sen University Cancer Center, Guangzhou, 510060, People's Republic of China
3
Zhou K, Yin Z, Gu J, Zeng Z. A Feature Selection Method Based on Graph Theory for Cancer Classification. Comb Chem High Throughput Screen 2024; 27:650-660. [PMID: 37056061 DOI: 10.2174/1386207326666230413085646] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2022] [Revised: 02/02/2023] [Accepted: 02/24/2023] [Indexed: 04/15/2023]
Abstract
OBJECTIVE Gene expression profile data are a valuable source for studying tumors, but they are high-dimensional and redundant. Gene selection is therefore an important step in microarray data classification. METHODS This paper proposes a feature selection method based on the maximum mutual information coefficient and graph theory. Each feature (gene) of the expression data is treated as a vertex, and the maximum mutual information coefficient between genes measures the relationship between vertices, yielding an undirected graph; core and coritivity theory is then used to determine the feature subset. RESULTS We used three classification models and three evaluation metrics (accuracy, F1-score, and AUC) to evaluate classification performance, so as to avoid relying on any single classifier or metric. Experimental results on six different types of genetic data show that the proposed algorithm achieves high accuracy and robustness compared with other advanced feature selection methods. CONCLUSION The method considers the importance and the correlation of features simultaneously, and it addresses the gene selection problem in microarray data classification.
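The graph construction and coritivity step can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: Pearson correlation is swapped in for the maximum mutual information coefficient to keep the code dependency-free, the edge `threshold` is a hypothetical parameter, and coritivity is taken as h(G) = max{ω(G − S) − |S| : S ⊂ V, ω(G − S) > 1}, where ω counts connected components and a maximizing S is called a core. How the paper turns cores into the final gene subset is not stated in the abstract, so that step is omitted.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation; a stand-in for MIC in this sketch only."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def build_gene_graph(X, threshold):
    """Undirected graph as adjacency sets: one vertex per gene,
    an edge when |r| >= threshold (hypothetical edge rule)."""
    adj = {g: set() for g in range(len(X))}
    for i, j in combinations(range(len(X)), 2):
        if abs(pearson(X[i], X[j])) >= threshold:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def count_components(adj, removed):
    """Number of connected components after deleting the vertices in `removed`."""
    seen = set(removed)
    count = 0
    for start in adj:
        if start in seen:
            continue
        count += 1
        seen.add(start)
        stack = [start]
        while stack:  # iterative depth-first search
            v = stack.pop()
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
    return count


def coritivity(adj):
    """Brute-force coritivity and one core over all proper vertex subsets.
    Exponential in the number of vertices: suitable for toy graphs only."""
    best, core = None, None
    nodes = list(adj)
    for size in range(len(nodes)):
        for S in combinations(nodes, size):
            omega = count_components(adj, S)
            if omega > 1 and (best is None or omega - size > best):
                best, core = omega - size, set(S)
    return best, core
```

For a star graph (one hub joined to three leaves), deleting the hub leaves three components, so the coritivity is 3 − 1 = 2 and the hub is the core. Real gene graphs would need the heuristics from the coritivity literature rather than exhaustive search.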
Affiliation(s)
- Kai Zhou
- School of Mathematics, Physics and Statistics, Shanghai University of Engineering Science, Shanghai, 201620, China
- Zhixiang Yin
- School of Mathematics, Physics and Statistics, Shanghai University of Engineering Science, Shanghai, 201620, China
- Jiaying Gu
- School of Mathematics, Physics and Statistics, Shanghai University of Engineering Science, Shanghai, 201620, China
- Zhiliang Zeng
- School of Mathematics, Physics and Statistics, Shanghai University of Engineering Science, Shanghai, 201620, China
4
Verma RK, Lokhande KB, Srivastava PK, Singh A. Elucidating B4GALNT1 as potential biomarker in hepatocellular carcinoma using machine learning models and mutational dynamics explored through MD simulation. Informatics in Medicine Unlocked 2024; 48:101514. [DOI: 10.1016/j.imu.2024.101514] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/03/2025]
5
Wang Y, Gao X, Ru X, Sun P, Wang J. The Weight-Based Feature Selection (WBFS) Algorithm Classifies Lung Cancer Subtypes Using Proteomic Data. Entropy (Basel) 2023; 25:1003. [PMID: 37509950 PMCID: PMC10378569 DOI: 10.3390/e25071003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2023] [Revised: 06/27/2023] [Accepted: 06/28/2023] [Indexed: 07/30/2023]
Abstract
Feature selection plays an important role in improving classification performance and reducing the dimensionality of high-dimensional datasets, such as high-throughput genomics/proteomics data in bioinformatics. Because of its computational efficiency and scalability, information theory has been widely incorporated into feature selection. In this study, we propose a weight-based feature selection (WBFS) algorithm that assesses both selected and candidate features to identify key protein biomarkers for classifying lung cancer subtypes from The Cancer Proteome Atlas (TCPA) database, and we further explore, through survival analysis, the relationship between the selected biomarkers and lung cancer subtypes. The results show that combining the WBFS method with a Bayesian network performs well in mining potential biomarkers. These candidate signatures have biological significance for tumor classification and patient survival analysis. Taken together, the WBFS method helps to explore candidate biomarkers in biomedical datasets and provides useful information for tumor diagnosis and therapy strategies.
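The WBFS weighting itself is not specified in the abstract. As a hedged illustration of the information-theoretic quantity that such filters typically build on, the sketch below ranks features by their mutual information with the class label, I(X;Y) = H(X) + H(Y) − H(X,Y). It is a generic relevance ranking, not WBFS, and it assumes the features have already been discretized; `rank_features` is a hypothetical helper.

```python
import math
from collections import Counter


def entropy(xs):
    """Shannon entropy in bits of a sequence of discrete values."""
    n = len(xs)
    return -sum(c / n * math.log2(c / n) for c in Counter(xs).values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two aligned discrete sequences."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def rank_features(features, labels):
    """features: name -> discretized values per sample.
    Returns feature names sorted by decreasing relevance I(feature; label)."""
    scores = {name: mutual_information(vals, labels) for name, vals in features.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

A feature identical to a balanced binary label scores 1 bit, while a feature independent of it scores 0. Methods that "assess selected and candidate features", as the abstract puts it, usually also penalize redundancy between a candidate and the already selected features; that step is not shown here.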
Affiliation(s)
- Yangyang Wang
- School of Electronics and Information, Northwestern Polytechnical University, Xi'an 710129, China
- Xiaoguang Gao
- School of Electronics and Information, Northwestern Polytechnical University, Xi'an 710129, China
- Xinxin Ru
- School of Electronics and Information, Northwestern Polytechnical University, Xi'an 710129, China
- Pengzhan Sun
- School of Electronics and Information, Northwestern Polytechnical University, Xi'an 710129, China
- Jihan Wang
- Xi'an Key Laboratory of Stem Cell and Regenerative Medicine, Institute of Medical Research, Northwestern Polytechnical University, Xi'an 710072, China
6
Li W, Chi Y, Yu K, Xie W. A two-stage hybrid biomarker selection method based on ensemble filter and binary differential evolution incorporating binary African vultures optimization. BMC Bioinformatics 2023; 24:130. [PMID: 37016297 PMCID: PMC10072044 DOI: 10.1186/s12859-023-05247-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/26/2022] [Accepted: 03/21/2023] [Indexed: 04/06/2023]
Abstract
BACKGROUND In genomics and personalized medicine, a key issue is finding biomarkers directly related to the diagnosis of specific diseases from high-throughput gene microarray data. Feature selection can discover biomarkers that carry disease classification information. RESULTS We use support vector machines as classifiers and evaluate the identified biomarkers by five-fold cross-validation average classification accuracy, recall, precision, and F1 score. Experimental results on eight microarray datasets show classification accuracy above 0.93, recall above 0.92, precision above 0.91, and F1 score above 0.94. METHOD This paper proposes a two-stage hybrid biomarker selection method based on an ensemble filter and binary differential evolution incorporating binary African vultures optimization (EF-BDBA), which effectively reduces the dimensionality of microarray data and obtains optimal biomarkers. In the first stage, we propose an ensemble filter feature selection method that combines an improved fast correlation-based filter algorithm with the Fisher score; obviously redundant and irrelevant features are filtered out to reduce the dimensionality of the microarray data initially. In the second stage, the optimal feature subset is selected using an improved binary differential evolution algorithm incorporating an improved binary African vultures optimization algorithm. The African vultures optimization algorithm has excellent global optimization ability but has not been systematically applied to feature selection problems, especially for gene microarray data; we combine it with differential evolution to improve population diversity. CONCLUSION Compared with traditional feature selection methods and advanced hybrid methods, the proposed method achieves higher classification accuracy and identifies excellent biomarkers while retaining fewer features. The experimental results demonstrate the effectiveness of the proposed algorithmic model.
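One ingredient of the first-stage ensemble filter is the Fisher score. A minimal sketch, assuming the common definition F = Σ_k n_k(μ_k − μ)² / Σ_k n_k σ_k² (where n_k, μ_k, and σ_k² are the size, mean, and variance of class k, and μ is the overall mean), is given below. The improved fast correlation-based filter and the binary DE / African vultures wrapper stage are not reproduced, and `top_k_by_fisher` is a hypothetical helper.

```python
from statistics import mean, pvariance


def fisher_score(values, labels):
    """Between-class scatter of one gene divided by its within-class scatter."""
    mu = mean(values)
    num = den = 0.0
    for cls in set(labels):
        group = [v for v, y in zip(values, labels) if y == cls]
        num += len(group) * (mean(group) - mu) ** 2  # between-class term
        den += len(group) * pvariance(group)          # within-class term
    return num / den if den > 0 else 0.0


def top_k_by_fisher(X, labels, k):
    """X[g] is the expression profile of gene g; return indices of the k best genes."""
    scores = [fisher_score(g, labels) for g in X]
    return sorted(range(len(X)), key=lambda g: scores[g], reverse=True)[:k]
```

A gene whose class means are far apart relative to its within-class spread scores high; a gene with equal class means scores 0 regardless of its variance.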
Affiliation(s)
- Wei Li
- Key Laboratory of Intelligent Computing in Medical Image (MIIC), Northeastern University, Ministry of Education, Shenyang, China
- Yuhuan Chi
- School of Computer Science and Engineering, Northeastern University, Shenyang, China
- Kun Yu
- School of Biomedical and Information Engineering, Northeastern University, Shenyang, China
- Weidong Xie
- School of Computer Science and Engineering, Northeastern University, Shenyang, China.
7
Identification of gene signatures for COAD using feature selection and Bayesian network approaches. Sci Rep 2022; 12:8761. [PMID: 35610288 PMCID: PMC9130243 DOI: 10.1038/s41598-022-12780-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/17/2022] [Accepted: 05/03/2022] [Indexed: 12/13/2022]
Abstract
Combining the TCGA and GTEx databases can provide more comprehensive information for characterizing the human genome in health and disease, especially for understanding the genetic alterations underlying cancer. Here we analyzed the gene expression profile of colon adenocarcinoma (COAD) in tumor samples from TCGA and in normal colon tissues from GTEx. Using the SNR-PPFS feature selection algorithm, we discovered a 38-gene signature that performed well in distinguishing COAD tumors from normal samples. A Bayesian network of the 38 genes revealed that differentially expressed genes (DEGs) with similar expression patterns or functions interacted more closely. We identified 14 upregulated DEGs that were significantly correlated with tumor stage. Cox regression analysis demonstrated that tumor stage and STMN4 and FAM135B dysregulation were independent prognostic factors for COAD survival outcomes. Overall, this study indicates that using feature selection approaches to select key gene signatures from high-dimensional datasets can be an effective way to study cancer genomic characteristics.