1
Hu J, Szymczak S. Evaluation of network-guided random forest for disease gene discovery. BioData Min 2024; 17:10. [PMID: 38627770; PMCID: PMC11020917; DOI: 10.1186/s13040-024-00361-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2024] [Accepted: 04/09/2024] [Indexed: 04/20/2024]
Abstract
BACKGROUND: Gene network information is believed to be beneficial for disease module and pathway identification, but it has not been explicitly utilized in the standard random forest (RF) algorithm for gene expression data analysis. We investigate the performance of a network-guided RF in which the network information is summarized into a sampling probability for the predictor variables, which is then used in the construction of the RF.
RESULTS: Our simulation results suggest that network-guided RF does not provide better disease prediction than the standard RF. In terms of disease gene discovery, network-guided RF identifies disease genes more accurately if they form module(s). In addition, when disease status is independent of the genes in the given network, using network information can lead to spurious gene selection, especially of hub genes. Our empirical analysis of two balanced breast cancer datasets (microarray and RNA-Seq) from The Cancer Genome Atlas (TCGA) for classification of progesterone receptor (PR) status also demonstrates that network-guided RF can identify genes from PGR-related pathways, which leads to a better connected module of identified genes.
CONCLUSIONS: Gene networks can provide additional information to aid gene expression analysis for disease module and pathway identification. However, they need to be used with caution, and the results need to be validated to guard against spurious gene selection. More robust approaches to incorporating such information into RF construction also warrant further study.
Affiliation(s)
- Jianchang Hu: Institute of Medical Biometry and Statistics, University of Lübeck, Ratzeburger Allee 160, Lübeck, 23562, Germany
- Silke Szymczak: Institute of Medical Biometry and Statistics, University of Lübeck, Ratzeburger Allee 160, Lübeck, 23562, Germany
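The abstract of this entry describes summarizing a gene network into a sampling probability for predictor variables. A minimal sketch of that idea follows. It assumes, purely for illustration, that the probability is proportional to node degree plus a smoothing constant; the abstract does not state the actual weighting, and the function names, the `smoothing` parameter, and the toy network are hypothetical.

```python
import random
from collections import defaultdict


def degree_sampling_probabilities(genes, edges, smoothing=1.0):
    """Map each gene to a probability proportional to (degree + smoothing).

    Assumption: degree-based weighting is an illustrative choice, not the
    scheme confirmed by the cited paper.
    """
    degree = defaultdict(int)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    weights = {g: degree[g] + smoothing for g in genes}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}


def sample_split_candidates(probs, mtry, rng):
    """Draw up to mtry distinct genes, weighted by probs, without replacement."""
    remaining = dict(probs)
    chosen = []
    for _ in range(min(mtry, len(remaining))):
        names = list(remaining)
        pick = rng.choices(names, weights=[remaining[n] for n in names], k=1)[0]
        chosen.append(pick)
        del remaining[pick]
    return chosen


# Toy network: G1 is a hub, G5 is isolated.
genes = ["G1", "G2", "G3", "G4", "G5"]
edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4")]
probs = degree_sampling_probabilities(genes, edges)
candidates = sample_split_candidates(probs, mtry=2, rng=random.Random(0))
```

Under this assumed weighting the hub gene G1 is offered as a split candidate most often, regardless of whether it is related to the outcome. That is consistent with the abstract's caution about spurious selection of hub genes.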
2
Sheng M, Cai H, Yang Q, Li J, Zhang J, Liu L. A Random Walk-Based Method to Identify Candidate Genes Associated With Lymphoma. Front Genet 2021; 12:792754. [PMID: 34899868; PMCID: PMC8655984; DOI: 10.3389/fgene.2021.792754] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 10/11/2021] [Accepted: 11/02/2021] [Indexed: 11/16/2022]
Abstract
Lymphoma is a serious type of cancer, especially for adolescents and older adults, although it is quite rare compared with other types of cancer. Its cause remains unclear. Genetic factors are deemed to be highly associated with the initiation and progression of lymphoma, and several genes have been related to this disease. Identifying the related genes is therefore important for determining the pathogenesis of lymphoma. In this study, we present a random walk-based method to infer novel lymphoma-associated genes. From the 1,458 reported lymphoma-associated genes and a protein–protein interaction network, raw candidate genes were mined using the random walk with restart algorithm. These raw candidates were then filtered with three screening tests (permutation, linkage, and enrichment tests), which control false positives and retain essential candidates with strong linkages to validated lymphoma-associated genes. A total of 108 inferred genes were obtained. Analytical results indicated that some inferred genes, such as RAC3, TEC, IRAK2/3/4, PRKCE, SMAD3, BLK, TXK, and PRKCQ, were associated with the initiation and progression of lymphoma.
Affiliation(s)
- Minjie Sheng: Department of Ophthalmology, Yangpu Hospital, School of Medicine, Tongji University, Shanghai, China
- Haiying Cai: Department of Ophthalmology, Yangpu Hospital, School of Medicine, Tongji University, Shanghai, China
- Qin Yang: Department of Ophthalmology, Yangpu Hospital, School of Medicine, Tongji University, Shanghai, China
- Jing Li: Department of Ophthalmology, Yangpu Hospital, School of Medicine, Tongji University, Shanghai, China
- Jian Zhang: Department of Ophthalmology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China; Shanghai Key Laboratory of Ocular Fundus Diseases, Shanghai, China; Shanghai Engineering Center for Visual Science and Photomedicine, Shanghai, China; National Clinical Research Center for Eye Diseases, Shanghai, China; Shanghai Engineering Center for Precise Diagnosis and Treatment of Eye Diseases, Shanghai, China
- Lihua Liu: Department of Ophthalmology, Yangpu Hospital, School of Medicine, Tongji University, Shanghai, China
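The random walk with restart step named in this entry's abstract can be sketched as the iteration p ← (1 − r)·W·p + r·p₀, where W spreads each node's probability evenly over its neighbours and p₀ is uniform over the seed genes. The restart probability `restart=0.5`, the tolerance, and the toy path network below are assumptions for illustration; the abstract does not report the settings used.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for a walker that restarts at the seeds.

    adjacency: dict mapping each node to the set of its neighbours
               (undirected; every neighbour must also be a key).
    seeds:     set of seed nodes; all must be keys of adjacency.
    """
    nodes = sorted(adjacency)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Restart mass flows back to the seeds on every step.
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            nbrs = adjacency[u]
            if not nbrs:
                # Isolated node: the walker stays where it is.
                nxt[u] += (1 - restart) * p[u]
                continue
            share = (1 - restart) * p[u] / len(nbrs)
            for v in nbrs:
                nxt[v] += share
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Toy path network A - B - C - D with A as the only seed (hypothetical data).
network = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
scores = random_walk_with_restart(network, seeds={"A"})
ranking = sorted(scores, key=scores.get, reverse=True)
```

Non-seed nodes with high scores would be the raw candidates. The screening tests mentioned in the abstract are a separate step and are not covered by this sketch.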
3
Identification of Novel Choroidal Neovascularization-Related Genes Using Laplacian Heat Diffusion Algorithm. BioMed Research International 2021; 2021:2295412. [PMID: 34532497; PMCID: PMC8440095; DOI: 10.1155/2021/2295412] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/09/2021] [Accepted: 08/20/2021] [Indexed: 11/20/2022]
Abstract
Choroidal neovascularization (CNV) is a type of eye disease that can cause vision loss. In recent years, many studies have investigated the major pathological processes and molecular pathogenic mechanisms of CNV. Because many diseases are related to genes, the genes associated with CNV need to be identified. In this study, we propose a network-based approach for identifying novel CNV-associated genes. To implement this method, we first took a protein-protein interaction network reported in STRING. We then applied a network diffusion algorithm, Laplacian heat diffusion, to this network, using validated CNV-related genes as the seed nodes. As a result, some novel genes that had previously unknown but strong relationships with validated genes were identified. A further screening procedure extracted the most essential genes, yielding eleven latent CNV-related genes. Extensive analyses were performed to confirm that these genes are novel CNV-related genes.
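The diffusion step described in the abstract above can be sketched as applying the heat kernel exp(−tL) to a seed vector, where L = D − A is the graph Laplacian. The sketch below assumes the unnormalized Laplacian, a diffusion time `t=0.5`, and a truncated Taylor series in place of an exact matrix exponential; none of these choices are stated in the abstract, and the toy network is hypothetical.

```python
def heat_diffusion(adjacency, seeds, t=0.5, terms=30):
    """Approximate exp(-t * L) applied to a seed heat vector, with L = D - A.

    adjacency: dict mapping each node to the set of its neighbours (undirected).
    seeds:     set of seed nodes that share one unit of initial heat.
    terms:     number of Taylor terms; adequate only for small t and small graphs.
    """
    nodes = sorted(adjacency)
    heat0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}

    def laplacian_times(vec):
        # (L v)[u] = deg(u) * v[u] - sum of v over the neighbours of u
        return {
            u: len(adjacency[u]) * vec[u] - sum(vec[v] for v in adjacency[u])
            for u in nodes
        }

    result = dict(heat0)
    term = dict(heat0)
    for k in range(1, terms + 1):
        # k-th Taylor term: (-t L)^k h0 / k!, built from the previous term.
        lv = laplacian_times(term)
        term = {n: (-t / k) * lv[n] for n in nodes}
        for n in nodes:
            result[n] += term[n]
    return result


# Toy path network A - B - C - D with A as the only seed (hypothetical data).
network = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
heat = heat_diffusion(network, seeds={"A"})
hottest_first = sorted(heat, key=heat.get, reverse=True)
```

Non-seed nodes that retain the most heat would be the candidates passed to a screening procedure. For a network of realistic size, a dedicated matrix exponential routine such as `scipy.linalg.expm` would be a more suitable choice than this series.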