1
Zheng C, Wang M, Yamada R, Okada D. Delving into gene-set multiplex networks facilitated by a k-nearest neighbor-based measure of similarity. Comput Struct Biotechnol J 2023; 21:4988-5002. [PMID: 37867964 PMCID: PMC10589751 DOI: 10.1016/j.csbj.2023.09.042] [Received: 04/18/2023] [Revised: 09/22/2023] [Accepted: 09/28/2023] [Indexed: 10/24/2023]
Abstract
Gene sets are functional units of living cells. A limited number of studies have investigated the complex relations among gene sets, but how these relations change across biological conditions has not yet been documented. In this study, we adopted and modified a classical k-nearest neighbor-based association function to detect inter-gene-set similarities. Based on this method, we built multiplex networks of gene sets for the first time; these networks contain layers of gene sets corresponding to different populations of cells. As demonstrated in this study, the context-based multiplex networks capture meaningful biological variation and differ considerably from knowledge-based networks of gene sets built on Jaccard similarity. Furthermore, at the scale of individual gene sets, the structural coefficients of gene sets (multiplex PageRank centrality, clustering coefficient, and participation coefficient) reveal the diversity of gene sets in terms of their structural properties and make it easier to identify unique gene sets. In gene set enrichment analysis (GSEA), each gene set is treated independently, and its contextual and relational attributes are ignored. The structural coefficients of gene sets can supplement GSEA with information about the overall picture of gene sets, promoting the constructive reorganization of the enriched terms and helping researchers better prioritize and select gene sets.
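As a point of reference for the knowledge-based baseline mentioned in the abstract, the following is a minimal sketch of a gene-set network built on Jaccard similarity. It is not the authors' code: the function names `jaccard` and `jaccard_network`, the `threshold` parameter, and the toy gene sets are all assumptions made for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two gene collections."""
    a, b = set(a), set(b)
    union = a | b
    # Two empty sets share nothing; define their similarity as 0.
    return len(a & b) / len(union) if union else 0.0


def jaccard_network(gene_sets, threshold=0.0):
    """Return weighted edges {(name1, name2): similarity} between gene sets.

    `gene_sets` maps a gene-set name to its member genes. Only pairs whose
    similarity exceeds `threshold` (a hypothetical cut-off) become edges.
    """
    edges = {}
    for n1, n2 in combinations(sorted(gene_sets), 2):
        s = jaccard(gene_sets[n1], gene_sets[n2])
        if s > threshold:
            edges[(n1, n2)] = s
    return edges


# Toy gene sets, invented for illustration only.
toy_sets = {
    "SET_A": {"TP53", "MDM2", "CDKN1A"},
    "SET_B": {"TP53", "MDM2", "ATM"},
    "SET_C": {"GAPDH", "PKM"},
}
network = jaccard_network(toy_sets)
print(network)  # {('SET_A', 'SET_B'): 0.5}
```

Because Jaccard similarity depends only on shared membership, such a network is identical for every cell population, which is the contrast the abstract draws with context-based multiplex layers.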
Affiliation(s)
- Cheng Zheng
- Center for Genomic Medicine, Graduate School of Medicine, Kyoto University, South Research Bldg. No.1(5F), 53 Shogoinkawahara-cho, Sakyo-ku, Kyoto, 6068507, Kyoto, Japan
- Man Wang
- Department of Signal Transduction, Research Institute for Microbial Diseases, Osaka University, 3-1 Yamadaoka, Suita, 5650871, Osaka, Japan
- Ryo Yamada
- Center for Genomic Medicine, Graduate School of Medicine, Kyoto University, South Research Bldg. No.1(5F), 53 Shogoinkawahara-cho, Sakyo-ku, Kyoto, 6068507, Kyoto, Japan
- Daigo Okada
- Center for Genomic Medicine, Graduate School of Medicine, Kyoto University, South Research Bldg. No.1(5F), 53 Shogoinkawahara-cho, Sakyo-ku, Kyoto, 6068507, Kyoto, Japan
2
Xu Y, Cui X, Zhang L, Zhao T, Wang Y. Metastasis-related gene identification by compound constrained NMF and a semisupervised cluster approach using pancancer multiomics features. Comput Biol Med 2022; 151:106263. [PMID: 36371902 DOI: 10.1016/j.compbiomed.2022.106263] [Received: 09/29/2022] [Revised: 10/16/2022] [Accepted: 10/30/2022] [Indexed: 11/11/2022]
Abstract
In recent years, with the gradual increase in pancancer-related research, more attention has been given to the field of pancancer metastasis. However, the molecular mechanism of pancancer metastasis remains unclear, and identification methods for pancancer metastasis-related genes are still lacking. To address this, we developed a novel pipeline to identify pancancer metastasis-related genes based on compound constrained nonnegative matrix factorization (CCNMF). The pipeline consists of the following modules. A correntropy operator and feature similarity fusion (FSF) were first adopted to process the multiomics features of genes; thus, the influences caused by irrelevant biomolecular patterns, manifested as non-Gaussian noise, were minimized. CCNMF was then adopted to handle these features with compound constraints consisting of a gene relation network and a "metastasis-related" gene set, which maximizes the biological interpretability of the metafeatures generated by NMF. Since a negative set of pancancer "metastasis-related" genes could hardly be obtained, semisupervised analyses were performed on the gene features acquired at each step of our pipeline to examine our method's effect. Of the 236 candidates identified by this method, 83% were associated with the metastasis of one or more cancers, and 71.9% were identified as immune-related in pancancer, in addition to the hallmark genes. Our study provides an effective and interpretable method for identifying metastasis-related as well as immune-related genes, and the method is successfully applied to TCGA pancancer data.
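To illustrate why a correntropy-based measure is less sensitive to non-Gaussian noise than a squared-error measure, here is a minimal sketch of the standard empirical correntropy with a Gaussian kernel. It is not the authors' operator or pipeline: the function name `correntropy`, the kernel width `sigma`, the omission of the kernel's normalizing constant, and the toy vectors are all assumptions made for illustration.

```python
import math


def correntropy(x, y, sigma=1.0):
    """Empirical correntropy of two equal-length vectors.

    Averages an unnormalized Gaussian kernel exp(-(a - b)^2 / (2 * sigma^2))
    over paired entries, so the value lies in (0, 1] and equals 1 when the
    vectors are identical. `sigma` is a hypothetical kernel width.
    """
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    total = sum(
        math.exp(-((a - b) ** 2) / (2 * sigma ** 2)) for a, b in zip(x, y)
    )
    return total / len(x)


def mean_squared_error(x, y):
    """Plain mean squared error, shown for comparison."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


# Toy feature vectors, invented for illustration: one entry is an outlier.
clean = [1.0, 2.0, 3.0, 4.0]
noisy = [1.0, 2.0, 3.0, 104.0]

sim = correntropy(clean, noisy)
mse = mean_squared_error(clean, noisy)
print(sim)  # 0.75: the outlier contributes almost nothing to the average
print(mse)  # 2500.0: the outlier dominates the squared error
```

The kernel saturates for large differences, so a single extreme entry lowers the similarity by at most 1/n, whereas its effect on the squared error grows without bound.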
Affiliation(s)
- Yining Xu
- Faculty of Computing, Harbin Institute of Technology, 92 Xidazhi Street, TIB #20, Harbin, 150000, Hei Long Jiang, China.
- Xinran Cui
- Faculty of Computing, Harbin Institute of Technology, 92 Xidazhi Street, TIB #20, Harbin, 150000, Hei Long Jiang, China.
- Liyuan Zhang
- Faculty of Computing, Harbin Institute of Technology, 92 Xidazhi Street, TIB #20, Harbin, 150000, Hei Long Jiang, China.
- Tianyi Zhao
- School of Medicine and Health, Harbin Institute of Technology, 92 Xidazhi Street, TIB #20, Harbin, 150000, Hei Long Jiang, China.
- Yadong Wang
- Faculty of Computing, Harbin Institute of Technology, 92 Xidazhi Street, TIB #20, Harbin, 150000, Hei Long Jiang, China.