1
Li G, Hu Z, Luo X, Liu J, Wu J, Peng W, Zhu X. Identification of cancer driver genes based on hierarchical weak consensus model. Health Inf Sci Syst 2024; 12:21. [PMID: 38464463 PMCID: PMC10917728 DOI: 10.1007/s13755-024-00279-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2023] [Accepted: 01/31/2024] [Indexed: 03/12/2024] Open
Abstract
Cancer is a complex gene mutation disease that derives from the accumulation of mutations during somatic cell evolution. With the advent of high-throughput technology, a large amount of omics data has been generated, and finding cancer-related driver genes in these data is a challenge. Early on, researchers developed many frequency-based driver gene identification methods, but these could not reliably identify driver genes with low mutation rates. Researchers later developed network-based methods that fuse multi-omics data, but these rarely considered the connections among features. In this paper, after analyzing a large number of methods for integrating multi-omics data, a hierarchical weak consensus model for fusing multiple features is proposed according to the connections among features. By analyzing the connection between the PPI network and the co-mutation hypergraph network, this paper first proposes a new topological feature, called the co-mutation clustering coefficient (CMCC). Then, a hierarchical weak consensus model is used to integrate CMCC with mRNA and miRNA differential expression scores, yielding a new driver gene identification method, HWC. HWC is compared with seven current state-of-the-art methods on three types of cancer. The comparison results show that HWC has the best identification performance in terms of statistical evaluation indices, functional consistency, and the partial area under the ROC curve. Supplementary Information: The online version contains supplementary material available at 10.1007/s13755-024-00279-6.
Affiliation(s)
- Gaoshi Li
- Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, 541004 China
- Guangxi Key Lab of Multi-Source Information Mining & Security, Guangxi Normal University, Guilin, 541004 Guangxi China
- College of Computer Science and Engineering, Guangxi Normal University, Guilin, 541004 Guangxi China
- Zhipeng Hu
- Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, 541004 China
- Guangxi Key Lab of Multi-Source Information Mining & Security, Guangxi Normal University, Guilin, 541004 Guangxi China
- College of Computer Science and Engineering, Guangxi Normal University, Guilin, 541004 Guangxi China
- Xinlong Luo
- Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, 541004 China
- Guangxi Key Lab of Multi-Source Information Mining & Security, Guangxi Normal University, Guilin, 541004 Guangxi China
- College of Computer Science and Engineering, Guangxi Normal University, Guilin, 541004 Guangxi China
- Jiafei Liu
- Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, 541004 China
- Guangxi Key Lab of Multi-Source Information Mining & Security, Guangxi Normal University, Guilin, 541004 Guangxi China
- College of Computer Science and Engineering, Guangxi Normal University, Guilin, 541004 Guangxi China
- Jingli Wu
- Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, 541004 China
- Guangxi Key Lab of Multi-Source Information Mining & Security, Guangxi Normal University, Guilin, 541004 Guangxi China
- College of Computer Science and Engineering, Guangxi Normal University, Guilin, 541004 Guangxi China
- Wei Peng
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, 650500 Yunnan China
- Xiaoshu Zhu
- Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, 541004 China
- Guangxi Key Lab of Multi-Source Information Mining & Security, Guangxi Normal University, Guilin, 541004 Guangxi China
- College of Computer Science and Engineering, Guangxi Normal University, Guilin, 541004 Guangxi China
- School of Computer and Information Security & School of Software Engineering, Guilin University of Electronic Science and Technology, Guilin, China
2
Wang Y, Hong J, Lu Y, Sheng N, Fu Y, Yang L, Meng L, Huang L, Wang H. A Controllability Reinforcement Learning Method for Pancreatic Cancer Biomarker Identification. IEEE Trans Nanobioscience 2024; 23:556-563. [PMID: 39133596 DOI: 10.1109/tnb.2024.3441689] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/16/2024]
Abstract
Pancreatic cancer is one of the most malignant cancers, with rapid progression and poor prognosis. Transcriptional data can be effective in finding new biomarkers for pancreatic cancer. Many network-based methods have been proposed to identify cancer biomarkers, some of which incorporate network controllability. However, most existing methods do not study RNA, rely on prior knowledge and mutation information, or can only perform classification tasks. In this study, we propose RDDriver, a method that combines a Relational Graph Convolutional Network and a Deep Q-Network to identify pancreatic cancer biomarkers based on a multi-layer heterogeneous transcriptional regulation network. First, we construct a regulation network containing long non-coding RNA, microRNA, and messenger RNA. Second, a Relational Graph Convolutional Network is used to learn the node representations. Finally, we use the idea of the Deep Q-Network to build a model that scores and prioritizes each RNA with the Popov-Belevitch-Hautus criterion. We train RDDriver on three small simulated networks, and calculate the average score after applying the model parameters to the regulation networks separately. To demonstrate the effectiveness of the method, we compare RDDriver with eight other methods on an approximate benchmark of driver RNAs for three cancer types.
3
Song J, Song Z, Gong Y, Ge L, Lou W. Advancing cancer driver gene identification through an integrative network and pathway approach. J Biomed Inform 2024; 158:104729. [PMID: 39306314 DOI: 10.1016/j.jbi.2024.104729] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2024] [Revised: 08/29/2024] [Accepted: 09/16/2024] [Indexed: 09/26/2024]
Abstract
OBJECTIVE: Cancer is a complex genetic disease characterized by the accumulation of various mutations, with driver genes playing a crucial role in cancer initiation and progression. Distinguishing driver genes from passenger mutations is essential for understanding cancer biology and discovering therapeutic targets. However, the majority of existing methods ignore the mutational heterogeneity and commonalities among patients, which hinders effective identification of driver genes. METHODS: This study introduces MCSdriver, a novel computational model that integrates network and pathway information to prioritize cancer driver genes. MCSdriver employs a bidirectional random walk algorithm to quantify the mutual exclusivity and functional relationships between mutated genes within patient cohorts. It calculates similarity scores based on a mutual exclusivity-weighted network and pathway coverage patterns, accounting for patient-specific heterogeneity and molecular profile similarity. RESULTS: This approach enhances the accuracy and quality of driver gene identification. MCSdriver demonstrates superior performance in identifying cancer driver genes across four cancer types from The Cancer Genome Atlas, showing higher F-score, recall, and precision than existing ranking list-based and module-based models. CONCLUSION: The MCSdriver model not only outperforms other models in identifying known cancer driver genes but also effectively identifies novel driver genes involved in cancer-related biological processes. The model's consideration of patient-specific heterogeneity and similarity in molecular profiles significantly enhances the accuracy and quality of driver gene identification. Validation through Gene Ontology enrichment analysis and literature mining further underscores its potential application value in personalized cancer therapy, offering a promising tool for advancing our understanding and treatment of cancer.
Affiliation(s)
- Junrong Song
- The School of Information, Yunnan University of Finance and Economics, Kunming, Yunnan 650221, PR China; Yunnan Key Laboratory of Service Computing, Yunnan University of Finance and Economics, Kunming, Yunnan 650221, PR China.
- Zhiming Song
- The School of Information, Yunnan University of Finance and Economics, Kunming, Yunnan 650221, PR China; Yunnan Key Laboratory of Service Computing, Yunnan University of Finance and Economics, Kunming, Yunnan 650221, PR China
- Yuanli Gong
- The School of Information, Yunnan University of Finance and Economics, Kunming, Yunnan 650221, PR China
- Lichang Ge
- The School of Information, Yunnan University of Finance and Economics, Kunming, Yunnan 650221, PR China
- Wenlu Lou
- Yunnan Key Laboratory of Service Computing, Yunnan University of Finance and Economics, Kunming, Yunnan 650221, PR China; The School of Business, Yunnan University of Finance and Economics, Kunming, Yunnan 650221, PR China
4
Zhang T, Zhang SW, Xie MY, Li Y. Identifying cooperating cancer driver genes in individual patients through hypergraph random walk. J Biomed Inform 2024; 157:104710. [PMID: 39159864 DOI: 10.1016/j.jbi.2024.104710] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2024] [Revised: 07/30/2024] [Accepted: 08/14/2024] [Indexed: 08/21/2024]
Abstract
OBJECTIVE: Identifying cancer driver genes, especially rare or patient-specific cancer driver genes, is a primary goal in cancer therapy. Although researchers have proposed some methods to tackle this problem, these methods mostly identify cancer driver genes at the single-gene level, overlooking the cooperative relationships among cancer driver genes. Identifying cooperating cancer driver genes in individual patients is pivotal for understanding cancer etiology and advancing the development of personalized therapies. METHODS: Here, we propose a novel Personalized Cooperating cancer Driver Genes (PCoDG) method that uses hypergraph random walk to identify the cancer driver genes that cooperatively drive cancer progression in an individual patient. Leveraging the ability of hypergraphs to represent multi-way relationships, PCoDG first employs a personalized hypergraph to depict the complex interactions among mutated genes and differentially expressed genes of an individual patient. Then, a hypergraph random walk algorithm based on hyperedge similarity is used to calculate the importance scores of mutated genes, and these scores are integrated with signaling pathway data to identify the cooperating cancer driver genes in individual patients. RESULTS: The experimental results on three TCGA cancer datasets (i.e., BRCA, LUAD, and COADREAD) demonstrate the effectiveness of PCoDG in identifying personalized cooperating cancer driver genes. The genes identified by PCoDG not only offer valuable insights into patient stratification correlating with clinical outcomes, but also provide a useful reference resource for tailoring personalized treatments. CONCLUSION: We propose a novel method that can effectively identify cooperating cancer driver genes for individual patients, thereby deepening our understanding of the cooperative relationships among personalized cancer driver genes and advancing the development of precision oncology.
Affiliation(s)
- Tong Zhang
- Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an 710072, China; School of Electrical and Mechanical Engineering, Pingdingshan University, Pingdingshan 467000, China
- Shao-Wu Zhang
- Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an 710072, China.
- Ming-Yu Xie
- Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an 710072, China
- Yan Li
- Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an 710072, China
5
Zhang N, Ma F, Guo D, Pang Y, Wang C, Zhang Y, Zheng X, Wang M. A novel hypergraph model for identifying and prioritizing personalized drivers in cancer. PLoS Comput Biol 2024; 20:e1012068. [PMID: 38683860 PMCID: PMC11081510 DOI: 10.1371/journal.pcbi.1012068] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2023] [Revised: 05/09/2024] [Accepted: 04/09/2024] [Indexed: 05/02/2024] Open
Abstract
Cancer development is driven by the accumulation of a small number of driver mutations that confer a selective growth advantage on the cell, while most mutations are passengers that do not contribute to tumor progression. Identifying the driver genes responsible for tumorigenesis is a crucial step in designing effective cancer treatments. Although many computational methods have been developed for this purpose, the majority provide only a single driver gene list for the entire cohort of patients, ignoring the high heterogeneity of driver events across patients. Identifying personalized driver genes remains challenging. Here, we propose a novel method (PDRWH), which aims to prioritize the mutated genes of a single patient based on their impact on the abnormal expression of downstream genes across a group of patients who share co-mutated genes and similar gene expression profiles. Extensive experimental results on 16 cancer datasets from TCGA showed that PDRWH excels in identifying known general driver genes and tumor-specific drivers. In comparative testing across five cancer types, PDRWH outperformed existing individual-level methods as well as cohort-level methods. Our results also demonstrated that PDRWH could identify both common and rare drivers. The personalized driver profiles could improve tumor stratification, providing new insights into tumor heterogeneity and taking a further step toward personalized treatment. We also validated one of our predicted novel personalized driver genes by in vitro cell-based assays, which showed the promoting effect of high expression of low-density lipoprotein receptor-related protein 1 (LRP1) on tumor cell proliferation.
Affiliation(s)
- Naiqian Zhang
- School of Mathematics and Statistics, Shandong University, Weihai, China
- Fubin Ma
- School of Mathematics and Statistics, Shandong University, Weihai, China
- Dong Guo
- School of Mathematics and Statistics, Shandong University, Weihai, China
- Department of Central Lab, Weihai Municipal Hospital, Shandong University, Weihai, China
- Yuxuan Pang
- SDU-ANU Joint Science College, Shandong University, Weihai, China
- Chenye Wang
- School of Mathematics and Statistics, Shandong University, Weihai, China
- Yusen Zhang
- School of Mathematics and Statistics, Shandong University, Weihai, China
- Xiaoqi Zheng
- Center for Single-Cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Mingyi Wang
- School of Mathematics and Statistics, Shandong University, Weihai, China
- Department of Central Lab, Weihai Municipal Hospital, Shandong University, Weihai, China
6
Wei PJ, Zhu AD, Cao R, Zheng C. Personalized Driver Gene Prediction Using Graph Convolutional Networks with Conditional Random Fields. BIOLOGY 2024; 13:184. [PMID: 38534453 DOI: 10.3390/biology13030184] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2024] [Revised: 03/03/2024] [Accepted: 03/10/2024] [Indexed: 03/28/2024]
Abstract
Cancer is a complex, evolving disease driven mainly by the accumulation of genetic variations in genes. Identifying cancer driver genes is important. However, most related studies have focused on the population level. Because cancer is highly heterogeneous, discovering driver genes at the individual level is becoming more valuable, but it remains a great challenge. Although some computational methods have been proposed to tackle this challenge, few can cover all patient samples well, and there is still room for performance improvement. In this study, to identify individual-level driver genes more efficiently, we propose the PDGCN method. PDGCN integrates multiple types of data features, including mutation, expression, methylation, copy number data, and system-level gene features, along with network structural features extracted using Node2vec, in order to construct a sample-gene interaction network. Prediction is performed using a graph convolutional network model with a conditional random field layer, which is able to better combine the network structural features with biological attribute features. Experiments on the ACC (Adrenocortical Cancer) and KICH (Kidney Chromophobe) datasets from TCGA (The Cancer Genome Atlas) demonstrated that the method performs better than other similar methods. It can identify not only frequently mutated driver genes, but also rare candidate driver genes and novel biomarker genes. The results of the survival and enrichment analyses of these detected genes demonstrate that the method can identify important driver genes at the individual level.
Affiliation(s)
- Pi-Jing Wei
- Information Materials and Intelligent Sensing Laboratory of Anhui Province, Institutes of Physical Science and Information Technology, Anhui University, 111 Jiulong Road, Hefei 230601, China
- An-Dong Zhu
- Information Materials and Intelligent Sensing Laboratory of Anhui Province, Institutes of Physical Science and Information Technology, Anhui University, 111 Jiulong Road, Hefei 230601, China
- Ruifen Cao
- School of Computer Science and Technology, Anhui University, 111 Jiulong Road, Hefei 230601, China
- Chunhou Zheng
- School of Artificial Intelligence, Anhui University, 111 Jiulong Road, Hefei 230601, China
7
Xu X, Qi Z, Wang L, Zhang M, Geng Z, Han X. Gsw-fi: a GLM model incorporating shrinkage and double-weighted strategies for identifying cancer driver genes with functional impact. BMC Bioinformatics 2024; 25:99. [PMID: 38448819 PMCID: PMC10916024 DOI: 10.1186/s12859-024-05707-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2022] [Accepted: 02/16/2024] [Indexed: 03/08/2024] Open
Abstract
BACKGROUND: Cancer, a disease with high morbidity and mortality rates, poses a significant threat to human health. Driver genes, which harbor mutations accountable for the initiation and progression of tumors, play a crucial role in cancer development. Identifying driver genes is a paramount objective in cancer research and precision medicine. RESULTS: In the present work, we propose a method for identifying driver genes using a Generalized Linear Regression Model (GLM) with Shrinkage and double-Weighted strategies based on Functional Impact, which is named GSW-FI. First, an estimating model based on GLM is proposed for assessing the background functional impacts of genes, using gene features as predictors. Second, the shrinkage and double-weighted strategies are integrated as two revising approaches to ensure the rationality of the identified driver genes. Last, a hypothesis testing method is designed to identify driver genes by leveraging the estimated background functional impacts. Experiments conducted on 31 datasets from The Cancer Genome Atlas demonstrate that GSW-FI outperforms ten other prediction methods in terms of the overlap fraction with well-known databases and consensus predictions among different methods. CONCLUSIONS: GSW-FI presents a novel approach that efficiently identifies driver genes with functional impact mutations using computational methods, thereby advancing the development of precision medicine for cancer.
Affiliation(s)
- Xiaolu Xu
- School of Computer and Artificial Intelligence, Liaoning Normal University, Dalian, China
- Zitong Qi
- Department of Statistics, University of Washington, Seattle, USA
- Lei Wang
- Center for Reproductive and Genetic Medicine, Dalian Women and Children's Medical Group, Dalian, China.
- Meiwei Zhang
- Center for Reproductive and Genetic Medicine, Dalian Women and Children's Medical Group, Dalian, China.
- Zhaohong Geng
- Department of Cardiology, Second Affiliated Hospital of Dalian Medical University, Dalian, China
- Xiumei Han
- College of Artificial Intelligence, Dalian Maritime University, Dalian, China
8
Song J, Song Z, Zhang J, Gong Y. Privacy-Preserving Identification of Cancer Subtype-Specific Driver Genes Based on Multigenomics Data with Privatedriver. J Comput Biol 2024; 31:99-116. [PMID: 38271572 DOI: 10.1089/cmb.2023.0115] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/27/2024] Open
Abstract
Identifying cancer subtype-specific driver genes from a large number of irrelevant passengers is crucial for targeted therapy in cancer treatment. Recently, the rapid accumulation of large-scale cancer genomics data from multiple institutions has presented remarkable opportunities for identifying cancer subtype-specific driver genes. However, insufficient subtype samples, privacy issues, and the heterogeneity of aberration events pose great challenges to precisely identifying cancer subtype-specific driver genes. To address this, we introduce privatedriver, the first model for identifying subtype-specific driver genes that integrates genomics data from multiple institutions in a privacy-preserving, collaborative manner. Identifying subtype-specific cancer driver genes with privatedriver involves two steps: genomics data integration and collaborative training. In the integration step, each institution combines the aberration events from multiple genomics data sources using the forward and backward propagation method of NetICS. In the collaborative training step, each institution uses the federated learning framework to upload encrypted model parameters, rather than raw data, to train a global model with the non-negative matrix factorization algorithm. We applied privatedriver to head and neck squamous cell and colon cancer data from The Cancer Genome Atlas and evaluated it on two benchmarks using the macro-F-score. The comparison analysis demonstrates that privatedriver achieves results comparable to centralized learning models and outperforms most other non-privacy-preserving models, all while ensuring the confidentiality of patient information. We also demonstrate that, for varying predicted driver gene distributions across subtypes, our model fully considers subtype heterogeneity and identifies subtype-specific driver genes corresponding to the given prognosis and therapeutic effect. The success of privatedriver reveals the feasibility and effectiveness of identifying cancer subtype-specific driver genes in a data-protecting manner, providing new insights for future privacy-preserving driver gene identification studies.
Affiliation(s)
- Junrong Song
- School of Information; Kunming, P.R. China
- Yunnan Key Laboratory of Service Computing; Yunnan University of Finance and Economics, Kunming, P.R. China
- Zhiming Song
- School of Information; Kunming, P.R. China
- Yunnan Key Laboratory of Service Computing; Yunnan University of Finance and Economics, Kunming, P.R. China
- Jinpeng Zhang
- School of Information; Kunming, P.R. China
- Yunnan Key Laboratory of Service Computing; Yunnan University of Finance and Economics, Kunming, P.R. China
- The School of Computer Science and Engineering, Yunnan University, Kunming, P.R. China
9
Wang X, Kostrzewa C, Reiner A, Shen R, Begg C. Adaptation of a mutual exclusivity framework to identify driver mutations within oncogenic pathways. Am J Hum Genet 2024; 111:227-241. [PMID: 38232729 PMCID: PMC10870134 DOI: 10.1016/j.ajhg.2023.12.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Revised: 12/05/2023] [Accepted: 12/05/2023] [Indexed: 01/19/2024] Open
Abstract
Distinguishing genomic alterations in cancer-associated genes that have a functional impact on tumor growth and disease progression from those that are passengers and confer no fitness advantage has important clinical implications. Evidence-based methods for nominating drivers are limited by existing knowledge of the oncogenic effects and therapeutic benefits of specific variants from clinical trials or experimental settings. As clinical sequencing becomes a mainstay of patient care, applying computational methods to mine the rapidly growing clinical genomic data holds promise for uncovering functional candidates beyond the existing knowledge base and expanding the patient population that could potentially benefit from genetically targeted therapies. We propose a statistical and computational method (MAGPIE) that builds on a likelihood approach leveraging the mutual exclusivity pattern within an oncogenic pathway to identify, probabilistically, both the specific genes within a pathway and the individual mutations within those genes that are truly the drivers. Alterations in a cancer-associated gene are assumed to be a mixture of driver and passenger mutations, with the passenger rates modeled in relation to tumor mutational burden. We use simulations to study the operating characteristics of the method and assess false-positive and false-negative rates in driver nomination. When applied to a large study of primary melanomas, the method accurately identifies the known driver genes within the RTK-RAS pathway and nominates several rare variants as prime candidates for functional validation. A comprehensive evaluation of MAGPIE against existing tools has also been conducted using data from The Cancer Genome Atlas.
Affiliation(s)
- Xinjun Wang
- Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA.
- Caroline Kostrzewa
- Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Allison Reiner
- Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Ronglai Shen
- Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Colin Begg
- Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA.
10
Huang Y, Chen F, Sun H, Zhong C. Exploring gene-patient association to identify personalized cancer driver genes by linear neighborhood propagation. BMC Bioinformatics 2024; 25:34. [PMID: 38254011 PMCID: PMC10804660 DOI: 10.1186/s12859-024-05662-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2023] [Accepted: 01/18/2024] [Indexed: 01/24/2024] Open
Abstract
BACKGROUND: Driver genes play a vital role in the development of cancer. Identifying driver genes is critical for diagnosing and understanding cancer. However, challenges remain in identifying personalized driver genes due to tumor heterogeneity. Although many computational methods have been developed to solve this problem, few efforts have been made to explore gene-patient associations to identify personalized driver genes. RESULTS: Here we propose a method called LPDriver to identify personalized cancer driver genes by applying a linear neighborhood propagation model to individual genetic data. LPDriver builds a personalized gene network based on the genetic data of individual patients, extracts the gene-patient associations from the bipartite graph of the personalized gene network, and uses a linear neighborhood propagation model to mine gene-patient associations to detect personalized driver genes. The experimental results demonstrate that, compared to existing methods, our method shows competitive performance and can predict cancer driver genes more accurately. Furthermore, these results also show that, besides revealing novel driver genes that have been reported to be related to cancer, LPDriver is also able to identify personalized cancer driver genes for individual patients by their network characteristics even if the mutation data of genes are hidden. CONCLUSIONS: LPDriver provides an effective approach to predicting personalized cancer driver genes, which could promote the diagnosis and treatment of cancer. The source code and data are freely available at https://github.com/hyr0771/LPDriver.
Affiliation(s)
- Yiran Huang
- School of Computer, Electronics and Information, Guangxi University, Nanning, 530004, China
- Key Laboratory of Parallel, Distributed and Intelligent Computing in Guangxi Universities and Colleges, Guangxi University, Nanning, 530004, China
- Guangxi Key Laboratory of Multimedia Communications and Network Technology, Guangxi University, Nanning, 530004, China
- Fuhao Chen
- School of Computer, Electronics and Information, Guangxi University, Nanning, 530004, China
- Hongtao Sun
- School of Computer, Electronics and Information, Guangxi University, Nanning, 530004, China
- Cheng Zhong
- School of Computer, Electronics and Information, Guangxi University, Nanning, 530004, China.
- Key Laboratory of Parallel, Distributed and Intelligent Computing in Guangxi Universities and Colleges, Guangxi University, Nanning, 530004, China.
- Guangxi Key Laboratory of Multimedia Communications and Network Technology, Guangxi University, Nanning, 530004, China.
11
Liu CH, Lai YL, Shen PC, Liu HC, Tsai MH, Wang YD, Lin WJ, Chen FH, Li CY, Wang SC, Hung MC, Cheng WC. DriverDBv4: a multi-omics integration database for cancer driver gene research. Nucleic Acids Res 2024; 52:D1246-D1252. [PMID: 37956338 PMCID: PMC10767848 DOI: 10.1093/nar/gkad1060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Revised: 10/12/2023] [Accepted: 10/25/2023] [Indexed: 11/15/2023] Open
Abstract
Advancements in high-throughput technology offer researchers an extensive range of multi-omics data that provide deep insights into the complex landscape of cancer biology. However, traditional statistical models and databases are inadequate to interpret these high-dimensional data within a multi-omics framework. To address this limitation, we introduce DriverDBv4, an updated iteration of the DriverDB cancer driver gene database (http://driverdb.bioinfomics.org/). This updated version offers several significant enhancements: (i) an increase in the number of cohorts from 33 to 70, encompassing approximately 24 000 samples; (ii) inclusion of proteomics data, augmenting the existing types of omics data and thus expanding the analytical scope; (iii) implementation of multiple multi-omics algorithms for identification of cancer drivers; (iv) new visualization features designed to succinctly summarize high-context data and redesigned existing sections to accommodate the increased volume of datasets and (v) two new functions in Customized Analysis, specifically designed for multi-omics driver identification and subgroup expression analysis. DriverDBv4 facilitates comprehensive interpretation of multi-omics data across diverse cancer types, thereby enriching the understanding of cancer heterogeneity and aiding in the development of personalized clinical approaches. The database is designed to foster a more nuanced understanding of the multi-faceted nature of cancer.
Affiliation(s)
- Chia-Hsin Liu
  - Cancer Biology and Precision Therapeutics Center, China Medical University, Taichung 404328, Taiwan
- Yo-Liang Lai
  - Department of Radiation Oncology, China Medical University, Taichung 404328, Taiwan
- Pei-Chun Shen
  - Cancer Biology and Precision Therapeutics Center, China Medical University, Taichung 404328, Taiwan
- Hsiu-Cheng Liu
  - Cancer Biology and Precision Therapeutics Center, China Medical University, Taichung 404328, Taiwan
- Meng-Hsin Tsai
  - Cancer Biology and Precision Therapeutics Center, China Medical University, Taichung 404328, Taiwan
- Yu-De Wang
  - Graduate Institute of Biomedical Sciences, China Medical University, Taichung 404328, Taiwan
  - Department of Urology, China Medical University, Taichung 404328, Taiwan
- Wen-Jen Lin
  - Cancer Biology and Precision Therapeutics Center, China Medical University, Taichung 404328, Taiwan
  - School of Medicine, China Medical University, Taichung 404328, Taiwan
- Fang-Hsin Chen
  - Institute of Nuclear Engineering and Science, National Tsing Hua University, Hsinchu 300044, Taiwan
- Chia-Yang Li
  - Graduate Institute of Medicine, College of Medicine, Kaohsiung Medical University, Kaohsiung 80708, Taiwan
- Shu-Chi Wang
  - Department of Medical Laboratory Science and Biotechnology, Kaohsiung Medical University, Kaohsiung 80708, Taiwan
- Mien-Chie Hung
  - Cancer Biology and Precision Therapeutics Center, China Medical University, Taichung 404328, Taiwan
  - Graduate Institute of Biomedical Sciences, China Medical University, Taichung 404328, Taiwan
  - Institute of Biochemistry and Molecular Biology, China Medical University, Taichung 404328, Taiwan
  - Molecular Medicine Center, China Medical University Hospital, China Medical University, Taichung 404328, Taiwan
  - Department of Biotechnology, Asia University, Taichung 413305, Taiwan
- Wei-Chung Cheng
  - Cancer Biology and Precision Therapeutics Center, China Medical University, Taichung 404328, Taiwan
  - Graduate Institute of Biomedical Sciences, China Medical University, Taichung 404328, Taiwan
  - The Ph.D. program for Cancer Biology and Drug Discovery, China Medical University and Academia Sinica, Taichung 404328, Taiwan
12
Wang Y, Zhou B, Ru J, Meng X, Wang Y, Liu W. Advances in computational methods for identifying cancer driver genes. Mathematical Biosciences and Engineering: MBE 2023; 20:21643-21669. [PMID: 38124614 DOI: 10.3934/mbe.2023958] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2023]
Abstract
Cancer driver genes (CDGs) are crucial in cancer prevention, diagnosis and treatment. This study employed computational methods for identifying CDGs, categorizing them into four groups. The major frameworks for each of these four categories were summarized. Additionally, we systematically gathered data from public databases and biological networks, and we elaborated on computational methods for identifying CDGs using the aforementioned databases. Further, we summarized the algorithms, mainly involving statistics and machine learning, used for identifying CDGs. Notably, the performances of nine typical identification methods for eight types of cancer were compared to analyze the applicability areas of these methods. Finally, we discussed the challenges and prospects associated with methods for identifying CDGs. The present study revealed that the network-based algorithms and machine learning-based methods demonstrated superior performance.
Affiliation(s)
- Ying Wang
  - School of Computer Science and Engineering, Changshu Institute of Technology, Changshu 215500, China
- Bohao Zhou
  - School of Computer Science and Engineering, Changshu Institute of Technology, Changshu 215500, China
- Jidong Ru
  - School of Textile Garment and Design, Changshu Institute of Technology, Changshu 215500, China
- Xianglian Meng
  - School of Computer Information and Engineering, Changzhou Institute of Technology, Changzhou 213032, China
- Yundong Wang
  - School of Computer Science and Engineering, Changshu Institute of Technology, Changshu 215500, China
- Wenjie Liu
  - School of Computer Information and Engineering, Changzhou Institute of Technology, Changzhou 213032, China
13
Xu J, Pang B, Lan Y, Dou R, Wang S, Kang S, Zhang W, Liu Y, Zhang Y, Ping Y. Identifying the personalized driver gene sets maximally contributing to abnormality of transcriptome phenotype in glioblastoma multiforme individuals. Mol Oncol 2023; 17:2472-2490. [PMID: 37491836 PMCID: PMC10620122 DOI: 10.1002/1878-0261.13499] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/02/2022] [Revised: 06/21/2023] [Accepted: 07/24/2023] [Indexed: 07/27/2023] Open
Abstract
High heterogeneity in the genomes and phenotypes of cancer populations makes it difficult to apply population-based common driver genes to the diagnosis and treatment of individual cancer patients. Characterizing and identifying the personalized driver mechanisms of individuals with glioblastoma multiforme (GBM) is pivotal for the realization of precision medicine. We proposed an integrative method to identify personalized driver gene sets by integrating the gene expression and genetic alteration profiles of cancer individuals. This method couples a genetic algorithm with a random walk to identify the optimal gene sets that explain the abnormality of the transcriptome phenotype to the maximum extent. Personalized driver gene sets were identified for 99 GBM individuals using our method. We found that genomic alterations in between one and seven driver genes could maximally and cumulatively explain the dysfunction of cancer hallmarks across GBM individuals. The driver gene sets were distinct even in GBM individuals with significantly similar transcriptomic phenotypes. Our method identified MCM4, which carries rare genetic alterations, as a previously unknown oncogenic gene whose high expression was significantly associated with poor GBM prognosis. The functional experiments confirmed that knockdown of MCM4 could significantly inhibit proliferation, invasion, migration, and clone formation of the GBM cell lines U251 and U118MG, and that overexpression of MCM4 significantly promoted the proliferation, invasion, migration, and clone formation of the GBM cell line U87MG. Our method could dissect the personalized driver genetic alteration sets that are pivotal for developing targeted therapy strategies and precision medicine. Our method could be extended to identify key drivers from other levels and could be applied to more cancer types.
Affiliation(s)
- Jinyuan Xu
  - College of Bioinformatics Science and Technology, Harbin Medical University, China
- Bo Pang
  - College of Bioinformatics Science and Technology, Harbin Medical University, China
- Yujia Lan
  - College of Bioinformatics Science and Technology, Harbin Medical University, China
- Renjie Dou
  - College of Bioinformatics Science and Technology, Harbin Medical University, China
- Shuai Wang
  - College of Bioinformatics Science and Technology, Harbin Medical University, China
- Shaobo Kang
  - College of Bioinformatics Science and Technology, Harbin Medical University, China
- Wanmei Zhang
  - College of Bioinformatics Science and Technology, Harbin Medical University, China
- Yuanyuan Liu
  - College of Bioinformatics Science and Technology, Harbin Medical University, China
- Yijing Zhang
  - College of Bioinformatics Science and Technology, Harbin Medical University, China
- Yanyan Ping
  - College of Bioinformatics Science and Technology, Harbin Medical University, China
14
Gillman R, Field MA, Schmitz U, Karamatic R, Hebbard L. Identifying cancer driver genes in individual tumours. Comput Struct Biotechnol J 2023; 21:5028-5038. [PMID: 37867967 PMCID: PMC10589724 DOI: 10.1016/j.csbj.2023.10.019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2023] [Revised: 10/10/2023] [Accepted: 10/11/2023] [Indexed: 10/24/2023] Open
Abstract
Cancer is a heterogeneous disease with a strong genetic component making it suitable for precision medicine approaches aimed at identifying the underlying molecular drivers within a tumour. Large scale population-level cancer sequencing consortia have identified many actionable mutations common across both cancer types and sub-types, resulting in an increasing number of successful precision medicine programs. Nonetheless, such approaches fail to consider the effects of mutations unique to an individual patient and may miss rare driver mutations, necessitating personalised approaches to driver-gene prioritisation. One approach is to quantify the functional importance of individual mutations in a single tumour based on how they affect the expression of genes in a gene interaction network (GIN). These GIN-based approaches can be broadly divided into those that utilise an existing reference GIN and those that construct de novo patient-specific GINs. These single-tumour approaches have several limitations that likely influence their results, such as use of reference cohort data, network choice, and approaches to mathematical approximation, and more research is required to evaluate the in vitro and in vivo applicability of their predictions. This review examines the current state of the art methods that identify driver genes in single tumours with a focus on GIN-based driver prioritisation.
Affiliation(s)
- Rhys Gillman
  - Department of Biomedical Sciences and Molecular and Cell Biology, College of Public Health, Medical, and Veterinary Sciences, James Cook University, Townsville, Queensland, Australia
  - Centre for Tropical Bioinformatics and Molecular Biology, James Cook University, Cairns, Queensland, Australia
- Matt A. Field
  - Department of Biomedical Sciences and Molecular and Cell Biology, College of Public Health, Medical, and Veterinary Sciences, James Cook University, Townsville, Queensland, Australia
  - Centre for Tropical Bioinformatics and Molecular Biology, James Cook University, Cairns, Queensland, Australia
  - Immunogenomics Lab, Garvan Institute of Medical Research, Darlinghurst, New South Wales, Australia
  - Menzies School of Health Research, Charles Darwin University, Darwin, Northern Territory, Australia
- Ulf Schmitz
  - Department of Biomedical Sciences and Molecular and Cell Biology, College of Public Health, Medical, and Veterinary Sciences, James Cook University, Townsville, Queensland, Australia
  - Centre for Tropical Bioinformatics and Molecular Biology, James Cook University, Cairns, Queensland, Australia
- Rozemary Karamatic
  - Gastroenterology and Hepatology, Townsville University Hospital, PO Box 670, Townsville, Queensland 4810, Australia
  - College of Medicine and Dentistry, Division of Tropical Health and Medicine, James Cook University, Townsville, Queensland, Australia
- Lionel Hebbard
  - Department of Biomedical Sciences and Molecular and Cell Biology, College of Public Health, Medical, and Veterinary Sciences, James Cook University, Townsville, Queensland, Australia
  - Centre for Tropical Bioinformatics and Molecular Biology, James Cook University, Cairns, Queensland, Australia
  - Storr Liver Centre, Westmead Institute for Medical Research, Westmead Hospital and University of Sydney, Sydney, New South Wales, Australia
  - Australian Institute for Tropical Health and Medicine, Townsville, Queensland, Australia
15
Nirgude S, Desai S, Khanchandani V, Nagarajan V, Thumsi J, Choudhary B. Integration of exome-seq and mRNA-seq using DawnRank, identified genes involved in innate immunity as drivers of breast cancer in the Indian cohort. PeerJ 2023; 11:e16033. [PMID: 37810779 PMCID: PMC10552747 DOI: 10.7717/peerj.16033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2023] [Accepted: 08/14/2023] [Indexed: 10/10/2023] Open
Abstract
Genetic heterogeneity influences the prognosis and therapy of breast cancer. The cause of disease progression varies and can be addressed individually. To identify the mutations and their impact on disease progression at an individual level, we sequenced exome and transcriptome from matched normal-tumor samples. We utilised DawnRank to prioritise driver genes and identify specific mutations in Indian patients. Mutations in the C3 and HLA genes were identified as drivers of disease progression, indicating the involvement of the innate immune system. We performed immune profiling on 16 matched normal/tumor samples using CIBERSORTx. We identified CD8+ve T cells, M2 macrophages, and neutrophils to be enriched in luminal A and T cells CD4+naïve, natural killer (NK) cells activated, T follicular helper (Tfh) cells, dendritic cells activated, and neutrophils in triple-negative breast cancer (TNBC) subtypes. Weighted gene co-expression network analysis (WGCNA) revealed activation of T cell-mediated response in ER positive samples and Interleukin and Interferons in ER negative samples. WGCNA analysis also identified unique pathways for each individual, suggesting that rare mutations/expression signatures can be used to design personalised treatment.
Affiliation(s)
- Snehal Nirgude
  - Institute of Bioinformatics and Applied Biotechnology, Bengaluru, Karnataka, India
  - Human Genetics, Children’s Hospital of Philadelphia, Philadelphia, USA
- Sagar Desai
  - Institute of Bioinformatics and Applied Biotechnology, Bengaluru, Karnataka, India
- Vartika Khanchandani
  - Institute of Bioinformatics and Applied Biotechnology, Bengaluru, Karnataka, India
- Bibha Choudhary
  - Institute of Bioinformatics and Applied Biotechnology, Bengaluru, Karnataka, India
16
Peng W, Yu P, Dai W, Fu X, Liu L, Pan Y. A Graph Convolution Network-Based Model for Prioritizing Personalized Cancer Driver Genes of Individual Patients. IEEE Trans Nanobioscience 2023; 22:744-754. [PMID: 37195839 DOI: 10.1109/tnb.2023.3277316] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/19/2023]
Abstract
Cancer driver genes are mutated genes that play a key role in the growth of cancer cells. Accurately identifying cancer driver genes helps us understand cancer's pathogenesis and develop effective treatment strategies. However, cancers are highly heterogeneous diseases; patients with the same cancer type may have different genomic characteristics and clinical symptoms. Hence, effective methods are urgently needed to identify the personalized cancer driver genes of individual patients and so help determine whether a patient can be treated with a certain targeted drug. This work presents NIGCNDriver, a method for predicting the personalized cancer driver genes of individual patients based on Graph Convolution Networks and Neighbor Interactions. NIGCNDriver first constructs a gene-sample association matrix using the associations between a sample and its known driver genes. Then, it applies graph convolution models to the gene-sample network to aggregate each node's own features with those of its neighbors, and combines the result with element-wise interactions between neighbors to learn new feature representations for the sample and gene nodes. Finally, a linear correlation coefficient decoder is used to reconstruct the association between the sample and the mutant gene, enabling the prediction of personalized driver genes for the individual sample. We applied the NIGCNDriver method to predict cancer driver genes for individual samples in the TCGA and cancer cell line datasets. The results show that our method outperforms the baseline methods in cancer driver gene prediction for individual samples.
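The aggregation and decoding steps named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the commonly used symmetric normalization with self-loops for the graph convolution, uses untrained weights, omits the element-wise neighbor-interaction term, and uses a plain Pearson correlation as a stand-in for the "linear correlation coefficient decoder". All function names are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalized_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2, an assumed (standard) GCN normalization."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(adj, features, weights):
    """One graph convolution: aggregate self and neighbor features, project, apply ReLU."""
    aggregated = matmul(normalized_adjacency(adj), features)
    projected = matmul(aggregated, weights)
    return [[max(0.0, v) for v in row] for row in projected]


def pearson(u, v):
    """Pearson correlation between two embeddings (stand-in decoder, an assumption)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su and sv else 0.0


# Toy gene-sample network: nodes 0 and 1 are samples, nodes 2 and 3 are genes.
# Sample 0 is linked to gene 2, and sample 1 is linked to gene 3.
adjacency = [
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
]
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
embeddings = gcn_layer(adjacency, identity, identity)

# Score each sample-gene pair by the correlation of the two embeddings.
score_linked = pearson(embeddings[0], embeddings[2])
score_unlinked = pearson(embeddings[0], embeddings[3])
```

In this toy network the linked sample-gene pair receives a higher score than the unlinked pair, which is the behavior a trained decoder of this kind would be expected to reproduce on real associations.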
17
Cui Y, Wang Z, Wang X, Zhang Y, Zhang Y, Pan T, Zhang Z, Li S, Guo Y, Akutsu T, Song J. SMG: self-supervised masked graph learning for cancer gene identification. Brief Bioinform 2023; 24:bbad406. [PMID: 37950905 PMCID: PMC10639095 DOI: 10.1093/bib/bbad406] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Revised: 09/26/2023] [Accepted: 10/24/2023] [Indexed: 11/13/2023] Open
Abstract
Cancer genomics is dedicated to elucidating the genes and pathways that contribute to cancer progression and development. Identifying cancer genes (CGs) associated with the initiation and progression of cancer is critical for characterizing molecular-level mechanisms in cancer research. In recent years, the growing availability of high-throughput molecular data and advancements in deep learning technologies have enabled the modelling of complex interactions and topological information within genomic data. Nevertheless, because of the limited labelled data, pinpointing CGs from a multitude of potential mutations remains an exceptionally challenging task. To address this, we propose a novel deep learning framework, termed self-supervised masked graph learning (SMG), which comprises SMG reconstruction (pretext task) and task-specific fine-tuning (downstream task). In the pretext task, the nodes of multi-omic featured protein-protein interaction (PPI) networks are randomly substituted with a defined mask token. The PPI networks are then reconstructed using a graph neural network (GNN)-based autoencoder, which explores the node correlations in a self-prediction manner. In the downstream tasks, the pre-trained GNN encoder embeds the input networks into feature graphs, whereas a task-specific layer proceeds with the final prediction. To assess the performance of the proposed SMG method, benchmarking experiments are performed on three node-level tasks (identification of CGs, essential genes and healthy driver genes) and one graph-level task (identification of disease subnetwork) across eight PPI networks. Benchmarking experiments and performance comparison with existing state-of-the-art methods demonstrate the superiority of SMG on multi-omic feature engineering.
Affiliation(s)
- Yan Cui
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto 611-0011, Japan
- Zhikang Wang
  - Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Xiaoyu Wang
  - Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Yiwen Zhang
  - School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC 3004, Australia
- Ying Zhang
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing, 210094, China
- Tong Pan
  - Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Shanshan Li
  - School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC 3004, Australia
- Yuming Guo
  - School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC 3004, Australia
- Tatsuya Akutsu
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto 611-0011, Japan
- Jiangning Song
  - Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
  - Monash Data Futures Institute, Monash University, Melbourne, VIC 3800, Australia
18
Li Y, Zhang SW, Xie MY, Zhang T. PhenoDriver: interpretable framework for studying personalized phenotype-associated driver genes in breast cancer. Brief Bioinform 2023; 24:bbad291. [PMID: 37738403 DOI: 10.1093/bib/bbad291] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2023] [Revised: 07/12/2023] [Accepted: 07/27/2023] [Indexed: 09/24/2023] Open
Abstract
Identifying personalized cancer driver genes and further revealing their oncogenic mechanisms is critical for understanding the mechanisms of cell transformation and aiding clinical diagnosis. Almost all existing methods focus on identifying driver genes at the cohort or individual level but fail to uncover their underlying oncogenic mechanisms. To fill this gap, we present an interpretable framework, PhenoDriver, to identify personalized cancer driver genes, elucidate their roles in cancer development and uncover the association between driver genes and clinical phenotypic alterations. By analyzing 988 breast cancer patients, we demonstrate the outstanding performance of PhenoDriver in identifying breast cancer driver genes at the cohort level compared to other state-of-the-art methods. In addition, PhenoDriver can effectively identify driver genes with both recurrent and rare mutations in individual patients. We further explore and reveal the oncogenic mechanisms of some known and unknown breast cancer driver genes (e.g. TP53, MAP3K1, HTT, etc.) identified by PhenoDriver, and construct their subnetworks for regulating abnormal clinical phenotypes. Notably, most of our findings are consistent with existing biological knowledge. Based on the personalized driver profiles, we discover two existing and one unreported breast cancer subtypes and uncover their molecular mechanisms. These results deepen our understanding of breast cancer mechanisms, guide therapeutic decisions and assist in the development of targeted anticancer therapies.
Affiliation(s)
- Yan Li
  - School of Automation, Northwestern Polytechnical University, China
- Shao-Wu Zhang
  - School of Automation, Northwestern Polytechnical University, China
  - Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, China
- Ming-Yu Xie
  - School of Automation, Northwestern Polytechnical University, China
- Tong Zhang
  - School of Automation, Northwestern Polytechnical University, China
19
Berber I, Erten C, Kazan H. Predator: Predicting the Impact of Cancer Somatic Mutations on Protein-Protein Interactions. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:3163-3172. [PMID: 37030791 DOI: 10.1109/tcbb.2023.3262119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Since many biological processes are governed by protein-protein interactions, understanding which mutations disrupt these interactions is profoundly important for cancer research. Most existing methods focus on the stability of the protein without considering the specific effects of a mutation on its interactions with other proteins. Here, we focus on somatic mutations that appear in the interface regions of the protein and predict the interactions that would be affected by a mutation of interest. We build an ensemble model, Predator, that classifies interface mutations as disruptive or nondisruptive based on the predicted effects of mutations on specific protein-protein interactions. We show that Predator outperforms existing approaches in the literature in terms of prediction accuracy. We then apply Predator to various TCGA cancer cohorts and perform a comprehensive analysis at the cohort, patient, and gene levels to determine the genes whose interface mutations tend to disrupt their interactions. The predictions obtained by Predator shed light on interesting patterns in several genes for each cohort regarding their potential as cancer drivers. Our analyses further reveal that the identified genes and their frequently disrupted partners exhibit patterns of mutual exclusivity across the cancer cohorts under study.
20
Chu X, Guan B, Dai L, Liu JX, Li F, Shang J. Network embedding framework for driver gene discovery by combining functional and structural information. BMC Genomics 2023; 24:426. [PMID: 37516822 PMCID: PMC10386255 DOI: 10.1186/s12864-023-09515-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2022] [Accepted: 07/13/2023] [Indexed: 07/31/2023] Open
Abstract
Comprehensive analysis of multiple data sets can identify potential driver genes for various cancers. In recent years, driver gene discovery based on massive mutation data and gene interaction networks has attracted increasing attention, but there is still a need to explore how the functional and structural information of genes in protein interaction networks can be combined to identify driver genes. Therefore, we propose a network embedding framework that combines functional and structural information to identify driver genes. First, we combine the mutation data and gene interaction networks to construct a mutation integration network using a network propagation algorithm. Second, the struc2vec model is used to extract gene features from the mutation integration network, which contains both the functional and the structural information of genes. Finally, machine learning algorithms are used to identify the driver genes. Compared with four previous well-performing methods, our method can find gene pairs that are distant from each other through structural similarities and performs better in identifying driver genes for 12 cancers in The Cancer Genome Atlas. We also conduct a comparative analysis of three gene interaction networks, three gene standard sets, and five machine learning algorithms. Our framework provides a new perspective for feature selection to identify novel driver genes.
Affiliation(s)
- Xin Chu
  - School of Computer Science, Qufu Normal University, Rizhao, 27826, China
- Boxin Guan
  - School of Computer Science, Qufu Normal University, Rizhao, 27826, China
- Lingyun Dai
  - School of Computer Science, Qufu Normal University, Rizhao, 27826, China
- Jin-Xing Liu
  - School of Computer Science, Qufu Normal University, Rizhao, 27826, China
- Feng Li
  - School of Computer Science, Qufu Normal University, Rizhao, 27826, China
- Junliang Shang
  - School of Computer Science, Qufu Normal University, Rizhao, 27826, China
21
Zhu X, Zhao W, Zhou Z, Gu X. Unraveling the Drivers of Tumorigenesis in the Context of Evolution: Theoretical Models and Bioinformatics Tools. J Mol Evol 2023:10.1007/s00239-023-10117-0. [PMID: 37246992 DOI: 10.1007/s00239-023-10117-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2022] [Accepted: 05/09/2023] [Indexed: 05/30/2023]
Abstract
Cancer originates from somatic cells that have accumulated mutations. These mutations alter the phenotype of the cells, allowing them to escape homeostatic regulation that maintains normal cell numbers. The emergence of malignancies is an evolutionary process in which the random accumulation of somatic mutations and sequential selection of dominant clones cause cancer cells to proliferate. The development of technologies such as high-throughput sequencing has provided a powerful means to measure subclonal evolutionary dynamics across space and time. Here, we review the patterns that may be observed in cancer evolution and the methods available for quantifying the evolutionary dynamics of cancer. An improved understanding of the evolutionary trajectories of cancer will enable us to explore the molecular mechanism of tumorigenesis and to design tailored treatment strategies.
Affiliation(s)
- Xunuo Zhu
  - Innovation Institute for Artificial Intelligence in Medicine, Zhejiang Provincial Key Laboratory of Anti-Cancer Drug Research, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- Wenyi Zhao
  - Innovation Institute for Artificial Intelligence in Medicine, Zhejiang Provincial Key Laboratory of Anti-Cancer Drug Research, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- Zhan Zhou
  - Innovation Institute for Artificial Intelligence in Medicine, Zhejiang Provincial Key Laboratory of Anti-Cancer Drug Research, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
  - The Fourth Affiliated Hospital, Zhejiang University School of Medicine, Yiwu, 322000, China
  - Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou, 310058, China
- Xun Gu
  - Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA, 50011, USA
22
Meng P, Wang G, Guo H, Jiang T. Identifying cancer driver genes using a two-stage random walk with restart on a gene interaction network. Comput Biol Med 2023; 158:106810. [PMID: 37011433 DOI: 10.1016/j.compbiomed.2023.106810] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2022] [Revised: 03/08/2023] [Accepted: 03/20/2023] [Indexed: 04/03/2023]
Abstract
Cancer development and progression are significantly influenced by cancer driver genes. Understanding cancer driver genes and their mechanisms of action is essential for developing effective cancer treatments. As a result, identifying driver genes is important for drug development, cancer diagnosis, and treatment. Here, we present an algorithm to discover driver genes based on a two-stage random walk with restart (RWR) and a modified method for calculating the transition probability matrix in the random walk algorithm. First, we performed the first stage of RWR on the whole gene interaction network, employing the new method for calculating the transition probability matrix, and extracted a subnetwork from the nodes that had a high correlation with the seed nodes. The second stage of RWR was then run on this subnetwork, and its nodes were re-ranked. Our approach outperformed existing methods in identifying driver genes. We also compared the effects of three gene interaction networks, of the two rounds of random walk, and of the sensitivity to the seed nodes. In addition, we identified several potential driver genes, some of which are involved in driving cancer development. Overall, our method is efficient in various cancer types, significantly outperforms existing methods, and can identify possible driver genes.
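The two-stage procedure described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' algorithm: it uses the standard column-normalized transition matrix rather than the paper's modified one, and it builds the subnetwork from the top-scoring nodes of stage one, which is only one plausible reading of "high correlation with the seed nodes". The function names and the `keep` and `restart` parameters are hypothetical.

```python
def column_normalize(adj):
    """Turn an adjacency matrix into a column-stochastic transition matrix."""
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    return [[adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def rwr(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Random walk with restart: iterate p = (1 - r) * W * p + r * p0 to convergence."""
    n = len(adj)
    w = column_normalize(adj)
    p0 = [1.0 / len(seeds) if i in seeds else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(w[i][j] * p[j] for j in range(n)) + restart * p0[i]
               for i in range(n)]
        change = sum(abs(a - b) for a, b in zip(nxt, p))
        p = nxt
        if change < tol:
            break
    return p


def two_stage_rwr(adj, seeds, keep, restart=0.3):
    """Stage 1 on the full network, then stage 2 on the subnetwork of top-scoring nodes."""
    stage1 = rwr(adj, seeds, restart)
    ranked = sorted(range(len(adj)), key=lambda i: stage1[i], reverse=True)
    # Assumption: keep the `keep` best nodes and always retain the seeds.
    kept = sorted(set(ranked[:keep]) | set(seeds))
    index = {node: k for k, node in enumerate(kept)}
    sub_adj = [[adj[a][b] for b in kept] for a in kept]
    stage2 = rwr(sub_adj, {index[s] for s in seeds}, restart)
    return sorted(((node, stage2[index[node]]) for node in kept),
                  key=lambda item: item[1], reverse=True)


# Toy gene interaction network: a path 0 - 1 - 2 - 3 - 4 with gene 0 as the seed.
network = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
stage1_scores = rwr(network, {0})
reranked = two_stage_rwr(network, {0}, keep=3)
```

Restricting the second walk to the extracted subnetwork concentrates probability mass on the neighborhood of the seeds, so the re-ranking is driven by local rather than global topology.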
|
23
|
Lee S, Jung H, Park J, Ahn J. Accurate Prediction of Cancer Prognosis by Exploiting Patient-Specific Cancer Driver Genes. Int J Mol Sci 2023; 24:ijms24076445. [PMID: 37047418 PMCID: PMC10095073 DOI: 10.3390/ijms24076445] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2023] [Revised: 03/17/2023] [Accepted: 03/28/2023] [Indexed: 04/03/2023] Open
Abstract
Accurate prediction of the prognoses of cancer patients and identification of prognostic biomarkers are both important for the improved treatment of cancer patients, in addition to enhanced anticancer drugs. Many previous bioinformatic studies have been carried out to achieve this goal; however, there remains room for improvement in terms of accuracy. In this study, we demonstrated that patient-specific cancer driver genes could be used to predict cancer prognoses more accurately. To identify patient-specific cancer driver genes, we first generated patient-specific gene networks before using modified PageRank to generate feature vectors that represented the impacts genes had on the patient-specific gene network. Subsequently, the feature vectors of the good and poor prognosis groups were used to train the deep feedforward network. For the 11 cancer types in the TCGA data, the proposed method showed a significantly better prediction performance than the existing state-of-the-art methods for three cancer types (BRCA, CESC and PAAD), better performance for five cancer types (COAD, ESCA, HNSC, KIRC and STAD), and a similar or slightly worse performance for the remaining three cancer types (BLCA, LIHC and LUAD). Furthermore, the case study for the identified breast cancer and cervical squamous cell carcinoma prognostic genes and their subnetworks included several pathways associated with the progression of breast cancer and cervical squamous cell carcinoma. These results suggested that heterogeneous cancer driver information may be associated with cancer prognosis.
Affiliation(s)
- Suyeon Lee
  - Department of Computer Science and Engineering, Incheon National University, Incheon 22012, Republic of Korea
- Heewon Jung
  - Samsung Electronics Company Ltd., Suwon 16677, Republic of Korea
- Jiwoo Park
  - Department of Computer Science and Engineering, Incheon National University, Incheon 22012, Republic of Korea
- Jaegyoon Ahn (corresponding author)
  - Department of Computer Science and Engineering, Incheon National University, Incheon 22012, Republic of Korea
|
24
|
Chen HH, Hsueh CW, Lee CH, Hao TY, Tu TY, Chang LY, Lee JC, Lin CY. SWEET: a single-sample network inference method for deciphering individual features in disease. Brief Bioinform 2023; 24:7017366. [PMID: 36719112 PMCID: PMC10025435 DOI: 10.1093/bib/bbad032] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 09/13/2022] [Revised: 01/05/2023] [Accepted: 01/14/2023] [Indexed: 02/01/2023] Open
Abstract
Recently, extracting inherent biological system information (e.g. cellular networks) from genome-wide expression profiles for developing personalized diagnostic and therapeutic strategies has become increasingly important. However, accurately constructing single-sample networks (SINs) to capture individual characteristics and heterogeneity in disease remains challenging. Here, we propose a sample-specific-weighted correlation network (SWEET) method to model SINs by integrating the genome-wide sample-to-sample correlation (i.e. sample weights) with the differential network between perturbed and aggregate networks. For a group of samples, the genome-wide sample weights can be assessed without prior knowledge of intrinsic subpopulations to address the network edge number bias caused by sample size differences. Compared with the state-of-the-art SIN inference methods, the SWEET SINs in 16 cancers more likely fit the scale-free property, display higher overlap with the human interactomes and perform better in identifying three types of cancer-related genes. Moreover, integrating SWEET SINs with a network proximity measure facilitates characterizing individual features and therapy in diseases, such as somatic mutation, mut-driver and essential genes. Biological experiments further validated two candidate repurposable drugs, albendazole for head and neck squamous cell carcinoma (HNSCC) and lung adenocarcinoma (LUAD) and encorafenib for HNSCC. By applying SWEET, we also identified two possible LUAD subtypes that exhibit distinct clinical features and molecular mechanisms. Overall, the SWEET method complements current SIN inference and analysis methods and presents a view of biological systems at the network level to offer numerous clues for further investigation and clinical translation in network medicine and precision medicine.
Affiliation(s)
- Hsin-Hua Chen
  - Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
- Chun-Wei Hsueh
  - Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
- Chia-Hwa Lee
  - School of Medical Laboratory Science and Biotechnology, College of Medical Science and Technology, Taipei Medical University, Taipei 110, Taiwan
  - TMU Research Center of Cancer Translational Medicine, Taipei Medical University, Taipei 110, Taiwan
  - Ph.D. Program in Medical Biotechnology, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan
- Ting-Yi Hao
  - Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
- Tzu-Ying Tu
  - Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
- Lan-Yun Chang
  - Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
- Jih-Chin Lee
  - Department of Otolaryngology-Head and Neck Surgery, Tri-Service General Hospital, National Defense Medical Center, Taipei 110, Taiwan
- Chun-Yu Lin
  - Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
  - Department of Biological Science and Technology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
  - Institute of Data Science and Engineering, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
  - Center for Intelligent Drug Systems and Smart Bio-devices, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
  - School of Dentistry, Kaohsiung Medical University, Kaohsiung 807, Taiwan
|
25
|
Cheng X, Amanullah M, Liu W, Liu Y, Pan X, Zhang H, Xu H, Liu P, Lu Y. WMDS.net: a network control framework for identifying key players in transcriptome programs. Bioinformatics 2023; 39:7023921. [PMID: 36727489 PMCID: PMC9925106 DOI: 10.1093/bioinformatics/btad071] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/29/2022] [Revised: 01/16/2023] [Accepted: 02/01/2023] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION Mammalian cells can be transcriptionally reprogramed to other cellular phenotypes. Controllability of such complex transitions in transcriptional networks underlying cellular phenotypes is an inherent biological characteristic. This network controllability can be interpreted by operating a few key regulators to guide the transcriptional program from one state to another. Finding the key regulators in the transcriptional program can provide key insights into the network state transition underlying cellular phenotypes. RESULTS To address this challenge, here, we proposed to identify the key regulators in the transcriptional co-expression network as a minimum dominating set (MDS) of driver nodes that can fully control the network state transition. Based on the theory of structural controllability, we developed a weighted MDS network model (WMDS.net) to find the driver nodes of differential gene co-expression networks. The weight of WMDS.net integrates the degree of nodes in the network and the significance of gene co-expression difference between two physiological states into the measurement of node controllability of the transcriptional network. To confirm its validity, we applied WMDS.net to the discovery of cancer driver genes in RNA-seq datasets from The Cancer Genome Atlas. WMDS.net is powerful among various cancer datasets and outperformed the other top-tier tools with a better balance between precision and recall. AVAILABILITY AND IMPLEMENTATION https://github.com/chaofen123/WMDS.net. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
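To make the dominating-set idea in this abstract concrete, here is a small sketch of a greedy heuristic for a weighted dominating set. It is an assumption-laden illustration rather than the WMDS.net algorithm: the abstract does not state how the minimum set is computed, the greedy rule only approximates a minimum-weight solution, and the per-node `cost` (lower means more desirable as a driver node) is a hypothetical stand-in for the paper's weight that combines node degree and co-expression significance.

```python
def greedy_weighted_dominating_set(graph, cost):
    """Greedy approximation of a minimum-weight dominating set.

    graph: dict mapping each node to the set of its neighbours.
    cost:  dict mapping each node to a positive cost (hypothetical weighting).
    A set D dominates the graph when every node is in D or adjacent to D.
    """
    undominated = set(graph)
    chosen = []
    while undominated:
        def gain(v):
            # Newly dominated nodes per unit cost if v is added now.
            newly = ({v} | graph[v]) & undominated
            return len(newly) / cost[v]

        # sorted() makes tie-breaking deterministic.
        best = max(sorted(graph), key=gain)
        chosen.append(best)
        undominated -= {best} | graph[best]
    return chosen


def is_dominating(graph, chosen):
    """Check that every node is chosen or has a chosen neighbour."""
    covered = set(chosen)
    for v in chosen:
        covered |= graph[v]
    return covered == set(graph)


# Star network: one hub connected to three leaves.
star = {"hub": {"a", "b", "c"}, "a": {"hub"}, "b": {"hub"}, "c": {"hub"}}
drivers = greedy_weighted_dominating_set(star, {v: 1.0 for v in star})
```

With uniform costs the hub alone dominates the star. Making the hub expensive pushes the heuristic toward the leaves instead, which is the kind of trade-off a node weighting is meant to control.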
Affiliation(s)
- Xiang Cheng
  - Department of Gynecologic Oncology, Women's Hospital and Institute of Translational Medicine, Zhejiang University School of Medicine, Hangzhou 310006, China
  - Institute of Bioinformatics, Zhejiang University, Hangzhou 310058, China
- Md Amanullah
  - Institute of Bioinformatics, Zhejiang University, Hangzhou 310058, China
  - Department of Respiratory Medicine, Key Laboratory of Precision Medicine in Diagnosis and Monitoring Research of Zhejiang Province, Sir Run Run Shaw Hospital and Institute of Translational Medicine, Zhejiang University School of Medicine, Hangzhou 310016, China
- Weigang Liu
  - Department of Respiratory Medicine, Key Laboratory of Precision Medicine in Diagnosis and Monitoring Research of Zhejiang Province, Sir Run Run Shaw Hospital and Institute of Translational Medicine, Zhejiang University School of Medicine, Hangzhou 310016, China
- Yi Liu
  - Institute of Bioinformatics, Zhejiang University, Hangzhou 310058, China
  - Department of Respiratory Medicine, Key Laboratory of Precision Medicine in Diagnosis and Monitoring Research of Zhejiang Province, Sir Run Run Shaw Hospital and Institute of Translational Medicine, Zhejiang University School of Medicine, Hangzhou 310016, China
- Xiaoqing Pan
  - Department of Mathematics, Shanghai Normal University, Xuhui 200234, China
- Honghe Zhang
  - Department of Pathology, Research Unit of Intelligence Classification of Tumor Pathology and Precision Therapy, Chinese Academy of Medical Sciences, Zhejiang University School of Medicine, Hangzhou 310058, China
- Haiming Xu
  - Institute of Bioinformatics, Zhejiang University, Hangzhou 310058, China
- Pengyuan Liu
  - Department of Gynecologic Oncology, Women's Hospital and Institute of Translational Medicine, Zhejiang University School of Medicine, Hangzhou 310006, China
  - Department of Physiology, Center of Systems Molecular Medicine, Medical College of Wisconsin, Milwaukee, WI 53226, USA
  - Cancer Center, Zhejiang University, Hangzhou 310029, China
- Yan Lu
  - Department of Gynecologic Oncology, Women's Hospital and Institute of Translational Medicine, Zhejiang University School of Medicine, Hangzhou 310006, China
  - Institute of Bioinformatics, Zhejiang University, Hangzhou 310058, China
  - Cancer Center, Zhejiang University, Hangzhou 310029, China
|
26
|
Xi J, Deng Z, Liu Y, Wang Q, Shi W. Integrating multi-type aberrations from DNA and RNA through dynamic mapping gene space for subtype-specific breast cancer driver discovery. PeerJ 2023; 11:e14843. [PMID: 36755866 PMCID: PMC9901305 DOI: 10.7717/peerj.14843] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2022] [Accepted: 01/11/2023] [Indexed: 02/05/2023] Open
Abstract
Driver event discovery is crucial for breast cancer diagnosis and therapy. In particular, discovering the subtype specificity of drivers can promote personalized biomarker discovery and precision treatment of cancer patients. Still, most existing computational driver discovery studies mainly exploit information from DNA aberrations and gene interactions. Notably, cancer driver events can arise not only from DNA aberrations but also from RNA alterations, yet integrating multi-type aberrations from both DNA and RNA remains a challenging task for breast cancer driver discovery. On the one hand, the data formats of different aberration types differ from each other, known as data format incompatibility. On the other hand, different types of aberrations demonstrate distinct patterns across samples, known as aberration type heterogeneity. To promote the integrated analysis of subtype-specific breast cancer drivers, we design a "splicing-and-fusing" framework to address the issues of data format incompatibility and aberration type heterogeneity simultaneously. To overcome the data format incompatibility, the "splicing-step" employs a knowledge graph structure to connect multi-type aberrations from the DNA and RNA data into a unified form. To tackle the aberration type heterogeneity, the "fusing-step" adopts a dynamic mapping gene space integration approach to represent the multi-type information by vectorized profiles. The experiments demonstrate the advantages of our approach in both the integration of multi-type aberrations from DNA and RNA and the discovery of subtype-specific breast cancer drivers. In summary, our "splicing-and-fusing" framework, with knowledge graph connection and dynamic mapping gene space fusion of multi-type aberration data from DNA and RNA, can successfully discover potential breast cancer drivers with an indication of subtype specificity.
Affiliation(s)
- Jianing Xi
  - School of Biomedical Engineering, Guangzhou Medical University, Guangzhou, China
- Zhen Deng
  - School of Basic Medical Sciences, Guangzhou Medical University, Guangzhou, China
- Yang Liu
  - School of Biomedical Engineering, Guangzhou Medical University, Guangzhou, China
- Qian Wang
  - School of Biomedical Engineering, Guangzhou Medical University, Guangzhou, China
- Wen Shi
  - School of Biomedical Engineering, Guangzhou Medical University, Guangzhou, China
|
27
|
Dutta D, Sen A, Satagopan J. Sparse canonical correlation to identify breast cancer related genes regulated by copy number aberrations. PLoS One 2022; 17:e0276886. [PMID: 36584096 PMCID: PMC9803132 DOI: 10.1371/journal.pone.0276886] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2022] [Accepted: 10/16/2022] [Indexed: 12/31/2022] Open
Abstract
BACKGROUND Copy number aberrations (CNAs) in cancer affect disease outcomes by regulating molecular phenotypes, such as gene expressions, that drive important biological processes. To gain comprehensive insights into molecular biomarkers for cancer, it is critical to identify key groups of CNAs, the associated gene modules, regulatory modules, and their downstream effect on outcomes. METHODS In this paper, we demonstrate an innovative use of sparse canonical correlation analysis (sCCA) to effectively identify the ensemble of CNAs, and gene modules in the context of binary and censored disease endpoints. Our approach detects potentially orthogonal gene expression modules which are highly correlated with sets of CNA and then identifies the genes within these modules that are associated with the outcome. RESULTS Analyzing clinical and genomic data on 1,904 breast cancer patients from the METABRIC study, we found 14 gene modules to be regulated by groups of proximally located CNA sites. We validated this finding using an independent set of 1,077 breast invasive carcinoma samples from The Cancer Genome Atlas (TCGA). Our analysis of 7 clinical endpoints identified several novel and interpretable regulatory associations, highlighting the role of CNAs in key biological pathways and processes for breast cancer. Genes significantly associated with the outcomes were enriched for early estrogen response pathway, DNA repair pathways as well as targets of transcription factors such as E2F4, MYC, and ETS1 that have recognized roles in tumor characteristics and survival. Subsequent meta-analysis across the endpoints further identified several genes through the aggregation of weaker associations. CONCLUSIONS Our findings suggest that sCCA analysis can aggregate weaker associations to identify interpretable and important genes, modules, and clinically consequential pathways.
Affiliation(s)
- Diptavo Dutta (corresponding author)
  - Department of Biostatistics, Johns Hopkins University, Baltimore, Maryland, United States of America
  - Integrative Tumor Epidemiology Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland, United States of America
- Ananda Sen
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI, United States of America
  - Department of Family Medicine, University of Michigan, Ann Arbor, MI, United States of America
- Jaya Satagopan
  - Department of Biostatistics and Epidemiology, Rutgers University, New Brunswick, NJ, United States of America
|
28
|
Liu Y, Han J, Kong T, Xiao N, Mei Q, Liu J. DriverMP enables improved identification of cancer driver genes. Gigascience 2023; 12:giad106. [PMID: 38091511 PMCID: PMC10716827 DOI: 10.1093/gigascience/giad106] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2023] [Revised: 10/30/2023] [Accepted: 11/22/2023] [Indexed: 12/18/2023] Open
Abstract
BACKGROUND Cancer is widely regarded as a complex disease primarily driven by genetic mutations. A critical concern and significant obstacle lies in discerning driver genes amid an extensive array of passenger genes. FINDINGS We present a new method termed DriverMP for effectively prioritizing altered genes on a cancer-type level by considering mutated gene pairs. It is designed to first apply nonsilent somatic mutation data, protein‒protein interaction network data, and differential gene expression data to prioritize mutated gene pairs, and then individual mutated genes are prioritized based on prioritized mutated gene pairs. Application of this method in 10 cancer datasets from The Cancer Genome Atlas demonstrated its great improvements over all the compared state-of-the-art methods in identifying known driver genes. Then, a comprehensive analysis demonstrated the reliability of the novel driver genes that are strongly supported by clinical experiments, disease enrichment, or biological pathway analysis. CONCLUSIONS The new method, DriverMP, which is able to identify driver genes by effectively integrating the advantages of multiple kinds of cancer data, is available at https://github.com/LiuYangyangSDU/DriverMP. In addition, we have developed a novel driver gene database for 10 cancer types and an online service that can be freely accessed without registration for users. The DriverMP method, the database of novel drivers, and the user-friendly online server are expected to contribute to new diagnostic and therapeutic opportunities for cancers.
Affiliation(s)
- Yangyang Liu
  - School of Mathematics and Statistics, Shandong University (Weihai), Weihai 264209, China
- Jiyun Han
  - School of Mathematics and Statistics, Shandong University (Weihai), Weihai 264209, China
- Tongxin Kong
  - School of Mathematics and Statistics, Shandong University (Weihai), Weihai 264209, China
- Nannan Xiao
  - School of Mathematics and Statistics, Shandong University (Weihai), Weihai 264209, China
- Qinglin Mei
  - MOE Key Laboratory of Bioinformatics, BNRIST Bioinformatics Division, Department of Automation, Tsinghua University, Beijing 100084, China
- Juntao Liu
  - School of Mathematics and Statistics, Shandong University (Weihai), Weihai 264209, China
|
29
|
Li F, Li H, Shang J, Liu JX, Dai L, Liu X, Li Y. A network-based method for identifying cancer driver genes based on node control centrality. Exp Biol Med (Maywood) 2022; 248:232-241. [PMID: 36573462 PMCID: PMC10107394 DOI: 10.1177/15353702221139201] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/28/2022] Open
Abstract
Cancer is one of the major contributors to human mortality and has a serious influence on human survival and health. In biomedical research, the identification of cancer driver genes (cancer drivers for short) is an important task; cancer drivers can promote the progression and generation of cancer. To identify cancer drivers, many methods have been developed. These computational models only identify coding cancer drivers; however, non-coding drivers likewise play significant roles in the progression of cancer. Hence, we propose a Network-based Method for identifying cancer Driver Genes based on node Control Centrality (NMDGCC), which can identify coding and non-coding cancer driver genes. The process of NMDGCC for identifying driver genes mainly includes the following two steps. In the first step, we construct a gene interaction network by using mRNAs and miRNAs expression data in the cancer state. In the second step, the control centrality of the node is used to identify cancer drivers in the constructed network. We use the breast cancer dataset from The Cancer Genome Atlas (TCGA) to verify the effectiveness of NMDGCC. Compared with the existing methods of cancer driver genes identification, NMDGCC has a better performance. NMDGCC also identifies 295 miRNAs as non-coding cancer drivers, of which 158 are related to tumorigenesis of BRCA. We also apply NMDGCC to identify driver genes related to the different breast cancer subtypes. The result shows that NMDGCC detects many cancer drivers of specific cancer subtypes.
Affiliation(s)
- Feng Li
  - School of Computer Science, Qufu Normal University, Rizhao 276826, China
- Han Li
  - School of Computer Science, Qufu Normal University, Rizhao 276826, China
- Junliang Shang
  - School of Computer Science, Qufu Normal University, Rizhao 276826, China
- Jin-Xing Liu
  - School of Computer Science, Qufu Normal University, Rizhao 276826, China
- Lingyun Dai
  - School of Computer Science, Qufu Normal University, Rizhao 276826, China
- Xikui Liu
  - Department of Electrical Engineering and Information Technology, Shandong University of Science and Technology, Jinan 250031, China
- Yan Li
  - Department of Electrical Engineering and Information Technology, Shandong University of Science and Technology, Jinan 250031, China
|
30
|
Kossinna P, Cai W, Lu X, Shemanko CS, Zhang Q. Stabilized COre gene and Pathway Election uncovers pan-cancer shared pathways and a cancer-specific driver. Sci Adv 2022; 8:eabo2846. [PMID: 36542714 PMCID: PMC9770999 DOI: 10.1126/sciadv.abo2846] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2022] [Accepted: 11/22/2022] [Indexed: 06/17/2023]
Abstract
Approaches systematically characterizing interactions via transcriptomic data usually follow two systems: (i) coexpression network analyses focusing on correlations between genes and (ii) linear regressions (usually regularized) to select multiple genes jointly. Both suffer from the problem of stability: A slight change of parameterization or dataset could lead to marked alterations of outcomes. Here, we propose Stabilized COre gene and Pathway Election (SCOPE), a tool integrating bootstrapped least absolute shrinkage and selection operator and coexpression analysis, leading to robust outcomes insensitive to variations in data. By applying SCOPE to six cancer expression datasets (BRCA, COAD, KIRC, LUAD, PRAD, and THCA) in The Cancer Genome Atlas, we identified core genes capturing interaction effects in crucial pan-cancer pathways related to genome instability and DNA damage response. Moreover, we highlighted the pivotal role of CD63 as an oncogenic driver and a potential therapeutic target in kidney cancer. SCOPE enables stabilized investigations toward complex interactions using transcriptome data.
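The stability idea behind the bootstrapped LASSO mentioned in this abstract can be illustrated with a short sketch. It covers only the resampling part, not SCOPE's coexpression analysis, and it is not the tool's code: the coordinate-descent solver, the function names, the penalty `lam`, and the toy data are all assumptions chosen for the example. Genes (columns) that receive a nonzero coefficient in most bootstrap resamples are the ones a stabilized selection would keep.

```python
import random


def soft_threshold(z, lam):
    """Shrink z toward zero by lam; exact zero inside [-lam, lam]."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """LASSO by coordinate descent on centred data.

    Minimizes (1 / 2n) * ||y - X b||^2 + lam * ||b||_1 (intercept removed by centring).
    X is a list of rows, y a list of responses.
    """
    n, p = len(X), len(X[0])
    col_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_mean[j] for j in range(p)] for row in X]
    resid = [v - y_mean for v in y]  # residual for beta = 0
    col_sq = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant column in this sample
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                beta[j] = new
    return beta


def selection_frequency(X, y, lam, n_boot=50, seed=0):
    """Fraction of bootstrap resamples in which each column gets a nonzero coefficient."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        beta = lasso_cd([X[i] for i in idx], [y[i] for i in idx], lam)
        for j in range(p):
            if beta[j] != 0.0:
                counts[j] += 1
    return [c / n_boot for c in counts]


# Toy data: the response depends on column 0 only; column 1 is an alternating distractor.
X = [[i - 14.5, 1.0 if i % 2 == 0 else -1.0] for i in range(30)]
y = [2.0 * row[0] for row in X]
freq = selection_frequency(X, y, lam=0.5, n_boot=20, seed=1)
```

A single LASSO fit can flip which of two correlated predictors it keeps when the data change slightly. Counting selections across resamples turns that instability into an explicit score that can be thresholded.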
Affiliation(s)
- Pathum Kossinna
  - Department of Biochemistry & Molecular Biology, University of Calgary, Calgary, Alberta T2N 1N4, Canada
  - Alberta Children’s Hospital Research Institute, University of Calgary, Calgary, Alberta T2N 1N4, Canada
- Weijia Cai
  - Department of Cancer Biology, Sidney Kimmel Cancer Center, Thomas Jefferson University, Philadelphia, PA 19107, USA
- Xuewen Lu
  - Department of Mathematics and Statistics, University of Calgary, Calgary, Alberta T2N 1N4, Canada
- Carrie S. Shemanko
  - Department of Biological Sciences, University of Calgary, Calgary, Alberta T2N 1N4, Canada
  - Arnie Charbonneau Cancer Research Institute, University of Calgary, Calgary, Alberta T2N 1N4, Canada
- Qingrun Zhang
  - Department of Biochemistry & Molecular Biology, University of Calgary, Calgary, Alberta T2N 1N4, Canada
  - Alberta Children’s Hospital Research Institute, University of Calgary, Calgary, Alberta T2N 1N4, Canada
  - Department of Mathematics and Statistics, University of Calgary, Calgary, Alberta T2N 1N4, Canada
|
31
|
He Z, Lin Y, Wei R, Liu C, Jiang D. Repulsion and attraction in searching: A hybrid algorithm based on gravitational kernel and vital few for cancer driver gene prediction. Comput Biol Med 2022; 151:106236. [PMID: 36370584 DOI: 10.1016/j.compbiomed.2022.106236] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2022] [Revised: 10/15/2022] [Accepted: 10/22/2022] [Indexed: 12/27/2022]
Abstract
By taking a new perspective that combines a machine learning method with an evolutionary algorithm, a new hybrid algorithm is developed to predict cancer driver genes. Firstly, inspired by the global search strategy of evolutionary algorithms, a gravitational kernel is proposed to act on the full range of gene features. Constructed by fusing PPI and mutation features, the gravitational kernel is capable of producing repulsion effects. The candidate genes with greater mutation effects and PPI have higher similarity scores. Under repulsion, the similarity scores of these promising genes are larger than those of ordinary genes, which makes the promising genes easier to find. Secondly, inspired by the idea of elite populations in evolutionary algorithms, the concept of the vital few is proposed. Targeted at a local scale, it acts on the candidate genes associated with vital few genes. Under the attraction effect, these vital few driver genes attract genes with similar mutational effects, which leads to greater similarity scores. Lastly, the model and parameters are optimized by using an evolutionary algorithm, so as to obtain the optimal model and parameters for cancer driver gene prediction. Herein, a comparison is performed with six other advanced methods of cancer driver gene prediction. According to the experimental results, the method proposed in this study outperforms these six state-of-the-art algorithms on the pan-oncogene dataset.
Affiliation(s)
- Zhihui He
  - Department of Computer Science, Shantou University, 515063, China
- Yingqing Lin
  - Department of Computer Science, Shantou University, 515063, China
- Runguo Wei
  - Department of Computer Science, Shantou University, 515063, China
- Cheng Liu
  - Department of Computer Science, Shantou University, 515063, China
- Dazhi Jiang
  - Department of Computer Science, Shantou University, 515063, China
  - Guangdong Provincial Key Laboratory of Information Security Technology, Sun Yat-sen University, Guangzhou 510399, China
|
32
|
Azadifar S, Ahmadi A. A novel candidate disease gene prioritization method using deep graph convolutional networks and semi-supervised learning. BMC Bioinformatics 2022; 23:422. [PMID: 36241966 PMCID: PMC9563530 DOI: 10.1186/s12859-022-04954-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 07/12/2022] [Accepted: 09/20/2022] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND Selecting and prioritizing candidate disease genes is necessary before conducting laboratory studies as identifying disease genes from a large number of candidate genes using laboratory methods, is a very costly and time-consuming task. There are many machine learning-based gene prioritization methods. These methods differ in various aspects including the feature vectors of genes, the used datasets with different structures, and the learning model. Creating a suitable feature vector for genes and an appropriate learning model on a variety of data with different and non-Euclidean structures, including graphs, as well as the lack of negative data are very important challenges of these methods. The use of graph neural networks has recently emerged in machine learning and other related fields, and they have demonstrated superior performance for a broad range of problems. METHODS In this study, a new semi-supervised learning method based on graph convolutional networks is presented using the novel constructing feature vector for each gene. In the proposed method, first, we construct three feature vectors for each gene using terms from the Gene Ontology (GO) database. Then, we train a graph convolution network on these vectors using protein-protein interaction (PPI) network data to identify disease candidate genes. Our model discovers hidden layer representations encoding in both local graph structure as well as features of nodes. This method is characterized by the simultaneous consideration of topological information of the biological network (e.g., PPI) and other sources of evidence. Finally, a validation has been done to demonstrate the efficiency of our method. RESULTS Several experiments are performed on 16 diseases to evaluate the proposed method's performance. 
The experiments demonstrate that our proposed method achieves the best results, in terms of precision, the area under the ROC curve (AUCs), and F1-score values, when compared with eight state-of-the-art network and machine learning-based disease gene prioritization methods. CONCLUSION This study shows that the proposed semi-supervised learning method appropriately classifies and ranks candidate disease genes using a graph convolutional network and an innovative method to create three feature vectors for genes based on the molecular function, cellular component, and biological process terms from GO data.
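As a rough illustration of how a graph convolution combines node features with network topology, the sketch below implements one layer of the widely used symmetric-normalized propagation rule, ReLU(D^-1/2 (A + I) D^-1/2 X W). Whether the paper uses exactly this rule is an assumption, since the abstract does not say; the function names, the two-gene graph, and the weights are hypothetical, and no training loop is shown.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def gcn_layer(adj, features, weights):
    """One graph-convolution layer: ReLU(D^-1/2 (A + I) D^-1/2 X W).

    adj:      n x n adjacency matrix (for example, a PPI network).
    features: n x f matrix, one feature vector per gene.
    weights:  f x h matrix of layer weights (fixed here, learned in practice).
    """
    n = len(adj)
    # Self-loops let each node keep part of its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    hidden = matmul(matmul(norm, features), weights)
    return [[max(0.0, v) for v in row] for row in hidden]


# Two interacting genes with one feature each and an identity-like weight.
adj = [[0, 1], [1, 0]]
features = [[1.0], [3.0]]
out = gcn_layer(adj, features, [[1.0]])  # each node averages itself and its neighbour
```

Stacking such layers lets information travel several hops through the network, which is how a node's representation comes to reflect both its own features and its local graph structure.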
Affiliation(s)
- Saeid Azadifar
  - Faculty of Computer Engineering, K. N. Toosi University of Technology, Tehran, Iran
- Ali Ahmadi
  - Faculty of Computer Engineering, K. N. Toosi University of Technology, Tehran, Iran
|
33
|
Zhang SW, Xu JY, Zhang T. DGMP: Identifying Cancer Driver Genes by Jointing DGCN and MLP from Multi-omics Genomic Data. Genomics Proteomics Bioinformatics 2022; 20:928-938. [PMID: 36464123 PMCID: PMC10025764 DOI: 10.1016/j.gpb.2022.11.004] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 11/02/2021] [Revised: 10/21/2022] [Accepted: 11/04/2022] [Indexed: 12/03/2022]
Abstract
Identification of cancer driver genes plays an important role in precision oncology research and helps in understanding cancer initiation and progression. However, most existing computational methods either rely on protein-protein interaction (PPI) networks or treat directed gene regulatory networks (GRNs) as undirected gene-gene association networks when identifying cancer driver genes. This discards the regulatory information that is unique to the structure of directed GRNs and in turn affects the outcome of cancer driver gene identification. Here, based on multi-omics pan-cancer data (i.e., gene expression, mutation, copy number variation, and DNA methylation), we propose a novel method (called DGMP) to identify cancer driver genes by combining a directed graph convolutional network (DGCN) and a multilayer perceptron (MLP). DGMP learns the multi-omics features of genes as well as the topological structure features of the GRN with the DGCN model, and uses the MLP to give more weight to gene features, mitigating the bias toward graph topological features in the DGCN learning process. The results on three GRNs show that DGMP outperforms other existing state-of-the-art methods. The ablation results on the DawnNet network indicate that introducing the MLP into the DGCN can offset the performance degradation of the DGCN, and that combining the MLP and the DGCN can effectively improve the performance of identifying cancer driver genes. DGMP can identify not only highly mutated cancer driver genes but also driver genes harboring other kinds of alterations (e.g., differential expression and aberrant DNA methylation) or genes involved in GRNs with other cancer genes. The source code of DGMP can be freely downloaded from https://github.com/NWPU-903PR/DGMP.
Affiliation(s)
- Shao-Wu Zhang
  - MOE Key Laboratory of Information Fusion Technology, School of Automation, Northwestern Polytechnical University, Xi'an 710072, China
- Jing-Yu Xu
  - MOE Key Laboratory of Information Fusion Technology, School of Automation, Northwestern Polytechnical University, Xi'an 710072, China
- Tong Zhang
  - MOE Key Laboratory of Information Fusion Technology, School of Automation, Northwestern Polytechnical University, Xi'an 710072, China
|
34
|
Somatic variation in normal tissues: friend or foe of cancer early detection? Ann Oncol 2022; 33:1239-1249. [PMID: 36162751 DOI: 10.1016/j.annonc.2022.09.156] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 05/19/2022] [Revised: 09/03/2022] [Accepted: 09/10/2022] [Indexed: 11/23/2022] Open
Abstract
BACKGROUND Seemingly normal tissues progressively become populated by mutant clones over time. Most of these clones bear mutations in well-known cancer genes but only rarely do they transform into cancer. This poses questions on what triggers cancer initiation and what implications somatic variation has for cancer early detection. DESIGN We analysed recent mutational screens of healthy and cancer-free diseased tissues to compare somatic drivers and the causes of somatic variation across tissues. We then reviewed the mechanisms of clonal expansion and their relationships with age and diseases other than cancer. We finally discussed the relevance of somatic variation for cancer initiation and how it can help or hinder cancer detection and prevention. RESULTS The extent of somatic variation is highly variable across tissues and depends on intrinsic features, such as tissue architecture and turnover, as well as the exposure to endogenous and exogenous insults. Most somatic mutations driving clonal expansion are tissue-specific and inactivate tumor suppressor genes involved in chromatin modification and cell growth signaling. Some of these genes are more frequently mutated in normal tissues than cancer, indicating a context-dependent cancer promoting or protective role. Mutant clones can persist over a long time or disappear rapidly, suggesting that their fitness depends on the dynamic equilibrium with the environment. The disruption of this equilibrium is likely responsible for their transformation into malignant clones and knowing what triggers this process is key for cancer prevention and early detection. Somatic variation should be considered in liquid biopsy, where it may contribute cancer-independent mutations, and in the identification of cancer drivers, since not all mutated genes favoring clonal expansion also drive tumorigenesis. 
CONCLUSIONS Somatic variation and the factors governing homeostasis of normal tissues should be taken into account when devising strategies for cancer prevention and early detection.
|
35
|
Wu Q, Wang L, Tsui SKW. Mutational signatures representative transcriptomic perturbations in hepatocellular carcinoma. Front Genet 2022; 13:970907. [PMID: 36081995 PMCID: PMC9445436 DOI: 10.3389/fgene.2022.970907] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2022] [Accepted: 07/27/2022] [Indexed: 11/17/2022] Open
Abstract
Hepatocellular carcinoma (HCC) is a primary malignancy with increasing incidence and poor prognosis. Heterogeneity originating from genomic instability is one of the critical reasons for poor outcomes. However, the underlying mechanisms and the pathways affected by mutations are still poorly understood. Integrative molecular-level studies using multiomics approaches now enable comprehensive analysis of cancers, which is pivotal for personalized therapy and mortality reduction. In this study, genomic and transcriptomic data of HCC are obtained from The Cancer Genome Atlas (TCGA) to investigate the coding and non-coding RNAs affected by certain mutational signatures of HCC, as well as their regulatory network. Different types of RNAs have specific enriched biological functions in mutational signature-specific HCCs: upregulated coding RNAs are predominantly associated with lipid metabolism-related pathways, and downregulated coding RNAs are enriched in axonogenesis for tumor microenvironment generation. Additionally, differentially expressed miRNAs tend to concentrate in cancer-related signaling pathways. Some of these RNAs also serve as prognostic factors that help predict the survival outcome of HCCs with certain mutational signatures. Furthermore, deregulation of the competing endogenous RNA (ceRNA) regulatory network is identified, which suggests a potential therapy via interference with miRNA activity for mutational signature-specific HCC. This study proposes a projection approach to reduce therapeutic complexity from genomic mutations to transcriptomic alterations. Through this method, we identify genes and pathways critical for mutational signature-specific HCC and further discover a series of prognostic markers indicating patient survival outcome.
|
36
|
Zhang SW, Wang ZN, Li Y, Guo WF. Prioritization of cancer driver gene with prize-collecting steiner tree by introducing an edge weighted strategy in the personalized gene interaction network. BMC Bioinformatics 2022; 23:341. [PMID: 35974311 PMCID: PMC9380343 DOI: 10.1186/s12859-022-04802-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/15/2021] [Accepted: 06/13/2022] [Indexed: 11/10/2022] Open
Abstract
Background Cancer is a heterogeneous disease in which tumor genes cooperate as well as adapt and evolve to the changing conditions for individual patients. It is a meaningful task to discover the personalized cancer driver genes that can inform diagnosis and targeted drugs for individual patients. However, most existing methods rank potential personalized cancer driver genes mainly by considering patient-specific node information on the gene/protein interaction network. These methods ignore the personalized edge weight information in the gene interaction network, leading to false positive results. Results In this work, we present a novel algorithm (called PDGPCS) to predict the Personalized cancer Driver Genes based on the Prize-Collecting Steiner tree model by considering the personalized edge weight information. PDGPCS first constructs the personalized weighted gene interaction network by integrating the personalized gene expression data and prior knowledge of the gene/protein interaction network. Then the gene mutation data and pathway data are integrated to quantify the impact of each mutant gene on every dysregulated pathway with the prize-collecting Steiner tree model. Finally, the mutant genes are ranked according to their aggregated impact scores on all dysregulated pathways to prioritize the personalized cancer driver genes. Experimental results on four TCGA cancer datasets show that PDGPCS performs better than other personalized driver gene prediction methods. In addition, we verified that the personalized edge weights of the gene interaction network can improve the prediction performance. Conclusions PDGPCS can more accurately identify the personalized driver genes and takes a step further toward personalized medicine and treatment. The source code of PDGPCS can be freely downloaded from https://github.com/NWPU-903PR/PDGPCS.
Supplementary Information The online version contains supplementary material available at 10.1186/s12859-022-04802-y.
Affiliation(s)
- Shao-Wu Zhang
  - Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an, 710072, China
- Zhen-Nan Wang
  - Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an, 710072, China
- Yan Li
  - Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an, 710072, China
- Wei-Feng Guo
  - School of Electrical Engineering, Zhengzhou University, Zhengzhou, 450001, China
|
37
|
Wang C, Shi J, Cai J, Zhang Y, Zheng X, Zhang N. DriverRWH: discovering cancer driver genes by random walk on a gene mutation hypergraph. BMC Bioinformatics 2022; 23:277. [PMID: 35831792 PMCID: PMC9281118 DOI: 10.1186/s12859-022-04788-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 12/21/2021] [Accepted: 06/08/2022] [Indexed: 12/24/2022] Open
Abstract
Background Recent advances in next-generation sequencing technologies have helped investigators generate massive amounts of cancer genomic data. A critical challenge in cancer genomics is the identification of the few cancer driver genes whose mutations cause tumor growth. However, the majority of existing computational approaches underuse the mutation co-occurrence information of individuals, which is deemed important in tumorigenesis and tumor progression, resulting in a high rate of false positives. Results To make full use of co-mutation information, we present a random walk algorithm, referred to as DriverRWH, on a weighted gene mutation hypergraph model, using somatic mutation data and molecular interaction network data to prioritize candidate driver genes. Applied to tumor samples of different cancer types from The Cancer Genome Atlas, DriverRWH shows significantly better performance than state-of-the-art prioritization methods in terms of the area under the curve scores and the cumulative number of known driver genes recovered among top-ranked candidate genes. In addition, DriverRWH discovers several potential drivers that are enriched in cancer-related pathways. DriverRWH recovers approximately 50% known driver genes in the top 30 ranked candidate genes for more than half of the cancer types. DriverRWH is also highly robust to perturbations in the mutation data and gene functional network data. Conclusion DriverRWH is effective across various cancer types in prioritizing cancer driver genes and provides considerable improvement over other tools, with a better balance of precision and sensitivity. It can be a useful tool for detecting potential driver genes and facilitating targeted cancer therapies. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-022-04788-7.
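The abstract above describes a random walk on a weighted gene mutation hypergraph. The sketch below illustrates one generic form of that idea, a random walk with restart in which each hyperedge is the set of genes mutated in one sample. It is an illustration under stated assumptions, not the DriverRWH implementation: the transition rule (pick an incident hyperedge in proportion to its weight, then a gene uniformly within it), the restart probability, the function name `hypergraph_rwr`, and the toy data are all assumptions.

```python
def hypergraph_rwr(hyperedges, weights, seeds, restart=0.3,
                   tol=1e-10, max_iter=500):
    """Random walk with restart on a weighted hypergraph.

    hyperedges -- dict: hyperedge name -> set of genes (e.g. one per sample)
    weights    -- dict: hyperedge name -> positive weight
    seeds      -- dict: gene -> restart probability (values sum to 1)
    Returns a dict gene -> stationary visiting probability.
    """
    genes = sorted(set().union(*hyperedges.values()))
    # Weighted degree: total weight of the hyperedges containing each gene.
    degree = {g: 0.0 for g in genes}
    for name, members in hyperedges.items():
        for g in members:
            degree[g] += weights[name]
    p0 = {g: seeds.get(g, 0.0) for g in genes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {g: restart * p0[g] for g in genes}
        for name, members in hyperedges.items():
            # Probability mass that walks from member genes into this hyperedge.
            mass = sum(p[u] * weights[name] / degree[u] for u in members)
            # The mass leaves the hyperedge uniformly over its member genes.
            share = (1.0 - restart) * mass / len(members)
            for v in members:
                nxt[v] += share
        delta = sum(abs(nxt[g] - p[g]) for g in genes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical toy cohort: gene "B" is co-mutated with every other gene.
samples = {"s1": {"A", "B"}, "s2": {"B", "C"}, "s3": {"B", "D"}}
scores = hypergraph_rwr(samples, {"s1": 1.0, "s2": 1.0, "s3": 1.0},
                        {g: 0.25 for g in "ABCD"})
ranking = sorted(scores, key=scores.get, reverse=True)
```

In the toy cohort the gene that co-occurs with all the others receives the largest score, which is the kind of co-mutation signal such a walk is meant to surface.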
Affiliation(s)
- Chenye Wang
  - School of Mathematics and Statistics, Shandong University, Weihai, 264209, China
- Junhan Shi
  - School of Mathematics and Statistics, Shandong University, Weihai, 264209, China
- Jiansheng Cai
  - Department of Mathematics, Weifang University, Weifang, 261061, Shandong, China
- Yusen Zhang
  - School of Mathematics and Statistics, Shandong University, Weihai, 264209, China
- Xiaoqi Zheng
  - Department of Mathematics, Shanghai Normal University, Shanghai, 200234, China
- Naiqian Zhang
  - School of Mathematics and Statistics, Shandong University, Weihai, 264209, China
|
38
|
Liu C, Dai Y, Yu K, Zhang ZK. Enhancing Cancer Driver Gene Prediction by Protein-Protein Interaction Network. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:2231-2240. [PMID: 33656997 DOI: 10.1109/tcbb.2021.3063532] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 06/12/2023]
Abstract
With the advances in gene sequencing technologies, millions of somatic mutations have been reported in the past decades, but mining cancer driver genes with oncogenic mutations from these data remains a critical and challenging area of research. In this study, we propose a network-based classification method for identifying cancer driver genes by merging multiple kinds of biological information. In this method, we construct a cancer-specific genetic network from the human protein-protein interactome (PPI) to mine network structure attributes, and combine them with biological information such as mutation frequency and differential gene expression to achieve accurate prediction of cancer driver genes. Across seven different cancer types, the proposed algorithm consistently achieves high prediction accuracy, which is superior to that of existing advanced methods. In the analysis of the predicted results, about 40 percent of the top 10 candidate genes overlap with the Cancer Gene Census database. Interestingly, the feature comparison indicates that the network-based features are still more important than the biological features, including mutation frequency and differential gene expression. Further analyses also show that the integration of network structure attributes and biological information is valuable for predicting new cancer driver genes.
|
39
|
Yan J, Hu Z, Li ZW, Sun S, Guo WF. Network Control Models With Personalized Genomics Data for Understanding Tumor Heterogeneity in Cancer. Front Oncol 2022; 12:891676. [PMID: 35712516 PMCID: PMC9195174 DOI: 10.3389/fonc.2022.891676] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/08/2022] [Accepted: 04/12/2022] [Indexed: 11/25/2022] Open
Abstract
The rapid development of high-throughput sequencing and biotechnology has brought new opportunities and challenges in developing efficient computational methods for exploring the personalized genomics data of cancer patients. Because these personalized genomics data are high-dimensional with small sample sizes, it is difficult to extract useful information from them using traditional statistical methods. In the past few years, network control methods have been proposed to handle networked systems with high dimensionality and small sample sizes. Researchers have made progress in the design and optimization of network control principles. However, few studies have comprehensively surveyed network control methods for analyzing the biomolecular network data of individual patients. To address this gap, we comprehensively survey complex network control methods on personalized omics data for understanding tumor heterogeneity in precision medicine of individual patients with cancer.
Affiliation(s)
- Jipeng Yan
  - Department of Nephrology, Xijing Hospital, The Fourth Military Medical University, Xi’an, China
- Zhuo Hu
  - School of Electrical Engineering, Zhengzhou University, Zhengzhou, China
- Zong-Wei Li
  - School of Electrical Engineering, Zhengzhou University, Zhengzhou, China
- Shiren Sun
  - Department of Nephrology, Xijing Hospital, The Fourth Military Medical University, Xi’an, China
- Wei-Feng Guo
  - School of Electrical Engineering, Zhengzhou University, Zhengzhou, China
  - State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University Cancer Center, Guangzhou, China
- *Correspondence: Wei-Feng Guo; Shiren Sun
|
40
|
Sudhakar M, Rengaswamy R, Raman K. Multi-Omic Data Improve Prediction of Personalized Tumor Suppressors and Oncogenes. Front Genet 2022; 13:854190. [PMID: 35620468 PMCID: PMC9127508 DOI: 10.3389/fgene.2022.854190] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/13/2022] [Accepted: 04/04/2022] [Indexed: 12/12/2022] Open
Abstract
The progression of tumorigenesis starts with a few mutational and structural driver events in the cell. Various cohort-based computational tools exist to identify driver genes but require multiple samples to identify less frequently mutated driver genes. Many studies use different methods to identify driver mutations/genes from mutations that have no impact on tumor progression; however, a small fraction of patients show no mutational events in any known driver genes. Current unsupervised methods map somatic and expression data onto a network to identify personalized driver genes based on changes in expression. Our method is the first machine learning model to classify genes as tumor suppressor gene (TSG), oncogene (OG), or neutral, thus assigning the functional impact of the gene in the patient. In this study, we develop a multi-omic approach, PIVOT (Personalized Identification of driVer OGs and TSGs), to train on experimentally or computationally validated mutational and structural driver events. Given the lack of any gold standards for the identification of personalized driver genes, we label the data using four strategies and, based on classification metrics, show gene-based labeling strategies perform best. We build different models using SNV, RNA, and multi-omic features to be used based on the data available. Our models trained on multi-omic data improved predictions compared with mutation and expression data, achieving an accuracy ≥0.99 for BRCA, LUAD, and COAD datasets. We show network and expression-based features contribute the most to PIVOT. Our predictions on BRCA, COAD, and LUAD cancer types reveal commonly altered genes such as TP53 and PIK3CA, which are predicted drivers for multiple cancer types. Along with known driver genes, our models also identify new driver genes such as PRKCA, SOX9, and PSMD4. Our multi-omic model labels both CNV and mutations with a more considerable contribution by CNV alterations. 
While predicting labels for genes mutated in multiple samples, we also label rare driver events occurring in as few as one sample. We also identify genes with dual roles within the same cancer type. Overall, PIVOT labels personalized driver genes as TSGs and OGs and also identifies rare driver genes.
Affiliation(s)
- Malvika Sudhakar
  - Centre for Integrative Biology and Systems mEdicine (IBSE), Indian Institute of Technology (IIT) Madras, Chennai, India
  - Robert Bosch Center for Data Science and Artificial Intelligence (RBCDSAI), IIT Madras, Chennai, India
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, IIT Madras, Chennai, India
- Raghunathan Rengaswamy
  - Centre for Integrative Biology and Systems mEdicine (IBSE), Indian Institute of Technology (IIT) Madras, Chennai, India
  - Robert Bosch Center for Data Science and Artificial Intelligence (RBCDSAI), IIT Madras, Chennai, India
  - Department of Chemical Engineering, IIT Madras, Chennai, India
- Karthik Raman
  - Centre for Integrative Biology and Systems mEdicine (IBSE), Indian Institute of Technology (IIT) Madras, Chennai, India
  - Robert Bosch Center for Data Science and Artificial Intelligence (RBCDSAI), IIT Madras, Chennai, India
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, IIT Madras, Chennai, India
|
41
|
Petrov I, Alexeyenko A. Individualized discovery of rare cancer drivers in global network context. eLife 2022; 11:74010. [PMID: 35593700 PMCID: PMC9159755 DOI: 10.7554/elife.74010] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/17/2021] [Accepted: 05/20/2022] [Indexed: 11/13/2022] Open
Abstract
Recent advances in genome sequencing expanded the space of known cancer driver genes several-fold. However, most of this surge was based on computational analysis of somatic mutation frequencies and/or their impact on protein function. By contrast, experimental research necessarily accounted for the functional context of mutations interacting with other genes and conferring cancer phenotypes. Ultimately, it is such results that become the ‘hard currency’ of cancer biology. The new method, NEAdriver, employs knowledge accumulated thus far in the form of a global interaction network and functionally annotated pathways in order to recover known driver genes and predict novel ones. The driver discovery was individualized by accounting for the co-occurrence of mutations in each tumour genome, as an alternative to summarizing information over whole cancer patient cohorts. For each somatic genome change, probabilistic estimates from two lanes of network analysis were combined into joint likelihoods of being a driver. Thus, the ability to detect previously unnoticed candidate driver events emerged from combining individual genomic context with a network perspective. The procedure was applied to the 10 largest cancer cohorts, followed by an evaluation of error rates against previous cancer gene sets. The discovered driver combinations were shown to be informative on cancer outcome. This revealed driver genes with individually sparse mutation patterns that would not be detectable by other computational methods and that relate to cancer biology domains poorly covered by previous analyses. In particular, recurrent mutations of collagen, laminin, and integrin genes were observed in the adenocarcinoma and glioblastoma cancers. Considering constellation patterns of candidate drivers in individual cancer genomes opens a novel avenue for personalized cancer medicine.
Affiliation(s)
- Iurii Petrov
  - Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Stockholm, Sweden
  - Science for Life Laboratory, Solna, Sweden
- Andrey Alexeyenko
  - Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Stockholm, Sweden
  - Science for Life Laboratory, Solna, Sweden
  - Evi-networks, enskild konsultföretag, Huddinge, Sweden
|
42
|
Erten C, Houdjedj A, Kazan H, Taleb Bahmed AA. PersonaDrive: A Method for the Identification and Prioritization of Personalized Cancer Drivers. Bioinformatics 2022; 38:3407-3414. [PMID: 35579340 DOI: 10.1093/bioinformatics/btac329] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/30/2021] [Revised: 05/06/2022] [Accepted: 05/11/2022] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION A major challenge in cancer genomics is to distinguish the driver mutations that are causally linked to cancer from passenger mutations that do not contribute to cancer development. The majority of existing methods provide a single driver gene list for the entire cohort of patients. However, since mutation profiles of patients from the same cancer type show a high degree of heterogeneity, a more ideal approach is to identify patient-specific drivers. RESULTS We propose a novel method that integrates genomic data, biological pathways, and protein connectivity information for personalized identification of driver genes. The method is formulated on a personalized bipartite graph for each patient. Our approach provides a personalized ranking of the mutated genes of a patient based on the sum of weighted 'pairwise pathway coverage' scores across all the samples, where appropriate pairwise patient similarity scores are used as weights to normalize these coverage scores. We compare our method against three state-of-the-art patient-specific cancer gene prioritization methods. The comparisons are with respect to a novel evaluation method that takes into account the personalized nature of the problem. We show that our approach outperforms the existing alternatives for both the TCGA and the cell line data. Additionally, we show that the KEGG/Reactome pathways enriched in our ranked genes and those that are enriched in cell lines' reference sets overlap significantly when compared to the overlaps achieved by the rankings of the alternative methods. Our findings can provide valuable information towards the development of personalized treatments and therapies. AVAILABILITY All the code and data are available at https://github.com/abu-compbio/PersonaDrive (archived at https://doi.org/10.5281/zenodo.6520187). SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Cesim Erten
  - Department of Computer Engineering, Antalya Bilim University, Antalya, 07190, Turkey
- Aissa Houdjedj
  - Department of Computer Engineering, Antalya Bilim University, Antalya, 07190, Turkey
  - Department of Computer Engineering, Akdeniz University, Antalya, 07070, Turkey
- Hilal Kazan
  - Department of Computer Engineering, Antalya Bilim University, Antalya, 07190, Turkey
- Ahmed Amine Taleb Bahmed
  - Electrical and Computer Engineering Graduate Program, Antalya Bilim University, Antalya, 07190, Turkey
|
43
|
Chen Z, Lu Y, Cao B, Zhang W, Edwards A, Zhang K. Driver gene detection through Bayesian network integration of mutation and expression profiles. Bioinformatics 2022; 38:2781-2790. [PMID: 35561191 PMCID: PMC9113331 DOI: 10.1093/bioinformatics/btac203] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2021] [Revised: 03/12/2022] [Accepted: 04/06/2022] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION The identification of mutated driver genes and the corresponding pathways is one of the primary goals in understanding tumorigenesis at the patient level. Integration of multi-dimensional genomic data from existing repositories, e.g., The Cancer Genome Atlas (TCGA), offers an effective way to tackle this issue. In this study, we aimed to leverage the complementary genomic information of individuals and create an integrative framework to identify cancer-related driver genes. Specifically, based on pinpointed differentially expressed genes, variants in somatic mutations and a gene interaction network, we proposed an unsupervised Bayesian network integration (BNI) method to detect driver genes and estimate the disease propagation at the patient and/or cohort levels. This new method first captures inherent structural information to construct a functional gene mutation network and then extracts the driver genes and their controlled downstream modules using the minimum cover subset method. RESULTS Using other credible sources (e.g. Cancer Gene Census and Network of Cancer Genes), we validated the driver genes predicted by the BNI method in three TCGA pan-cancer cohorts. The proposed method provides an effective approach to address tumor heterogeneity faced by personalized medicine. The pinpointed drivers warrant further wet laboratory validation. AVAILABILITY AND IMPLEMENTATION The supplementary tables and source code can be obtained from https://xavieruniversityoflouisiana.sharefile.com/d-se6df2c8d0ebe4800a3030311efddafe5. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Zhong Chen
  - Department of Computer Science, Xavier University of Louisiana, New Orleans, LA 70125, USA
  - Bioinformatics Core of Xavier RCMI Center for Cancer Research, Xavier University of Louisiana, New Orleans, LA 70125, USA
- You Lu
  - Department of Computer Science, Xavier University of Louisiana, New Orleans, LA 70125, USA
  - Bioinformatics Core of Xavier RCMI Center for Cancer Research, Xavier University of Louisiana, New Orleans, LA 70125, USA
- Bo Cao
  - Division of Basic and Pharmaceutical Sciences, College of Pharmacy, Xavier University of Louisiana, New Orleans, LA 70125, USA
- Wensheng Zhang
  - Department of Computer Science, Xavier University of Louisiana, New Orleans, LA 70125, USA
  - Bioinformatics Core of Xavier RCMI Center for Cancer Research, Xavier University of Louisiana, New Orleans, LA 70125, USA
- Andrea Edwards
  - Department of Computer Science, Xavier University of Louisiana, New Orleans, LA 70125, USA
- Kun Zhang
  - To whom correspondence should be addressed
|
44
|
Rahimi M, Teimourpour B, Akhavan-Safar M. DGRanker: Cancer Driver Gene Detection in Human Transcriptional Regulatory Network. Iranian Journal of Biotechnology 2022; 20:e3066. [PMID: 36337068 PMCID: PMC9583818 DOI: 10.30498/ijb.2022.289013.3066] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Abstract
BACKGROUND Cancer is a group of diseases that have received much attention in biological research because of their high mortality rate and the lack of accurate identification of their root causes. In such studies, researchers usually try to identify the cancer driver genes (CDGs) that initiate cancer in a cell. The majority of the methods proposed so far for the identification of CDGs are based on gene expression data and the concept of mutation in genomic data. Recently, some models have been proposed that identify these genes using network techniques and the concept of influence maximization. OBJECTIVES We aimed to construct the cancer transcriptional regulatory network and identify cancer driver genes using a network science approach without the use of mutation and genomic data. MATERIALS AND METHODS In this study, we employ social influence network theory, which is based on the concept of the influence and power of webpages, to identify CDGs in the human gene regulatory network (GRN). First, we create GRNs using gene expression data and existing nodes and edges. Next, we apply the modified algorithm to the GRNs under study, weighting the regulatory interaction edges using the influence spread concept. Nodes with the highest ratings are selected as the CDGs. RESULTS The results show that the proposed method outperforms most of the other computational and network-based methods in identifying CDGs. In addition, the proposed method can identify many CDGs that are overlooked by all previously published methods. CONCLUSIONS Our study demonstrates that Google's PageRank algorithm can be utilized and modified as a network-based method for identifying cancer driver genes in a transcriptional regulatory network. Furthermore, the proposed method can be considered a complementary method to the computational cancer driver gene identification tools.
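The abstract above builds on PageRank applied to a regulatory network with weighted edges. A minimal sketch of weighted PageRank by power iteration follows. It is not the modified algorithm from the paper: the damping factor, the edge weights, the convention that edges point from a regulated gene to its regulator (so that score accumulates on regulators), and the names `weighted_pagerank`, `R`, `G1`, `G2`, `G3` are all assumptions made for illustration.

```python
from collections import defaultdict


def weighted_pagerank(edges, damping=0.85, tol=1e-10, max_iter=200):
    """Rank the nodes of a weighted directed graph with PageRank.

    edges -- iterable of (source, target, weight) triples with weight > 0
    Returns a dict node -> score; the scores sum to 1.
    """
    out_weight = defaultdict(float)   # total outgoing weight per node
    incoming = defaultdict(list)      # node -> [(source, weight), ...]
    nodes = set()
    for src, dst, w in edges:
        nodes.update((src, dst))
        out_weight[src] += w
        incoming[dst].append((src, w))
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Nodes without outgoing edges spread their score uniformly.
        dangling = sum(rank[v] for v in nodes if out_weight[v] == 0.0)
        new_rank = {}
        for v in nodes:
            # Each source passes on its score in proportion to edge weight.
            flow = sum(rank[u] * w / out_weight[u] for u, w in incoming[v])
            new_rank[v] = (1.0 - damping) / n + damping * (flow + dangling / n)
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical toy network: three target genes point at one regulator "R".
toy_edges = [("G1", "R", 2.0), ("G2", "R", 1.0), ("G3", "R", 1.0),
             ("R", "G1", 1.0)]
scores = weighted_pagerank(toy_edges)
top_gene = max(scores, key=scores.get)
```

Reversing the edge direction changes which genes collect score, so the orientation has to match the biological question being asked.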
Affiliation(s)
- Majid Rahimi
  - Department of Information Technology, School of Systems and Industrial Engineering, Tarbiat Modares University (TMU), Tehran, Iran
- Babak Teimourpour
  - School of Systems and Industrial Engineering, Tarbiat Modares University (TMU), Chamran/Al-e-Ahmad Highways Intersection, Tehran, Iran
- Mostafa Akhavan-Safar
  - Department of Computer and Information Technology Engineering, Payame Noor University, Tehran, Iran
|
45
|
Huo Y, Li X, Xu P, Bao Z, Liu W. Analysis of Breast Cancer Based on the Dysregulated Network. Front Genet 2022; 13:856075. [PMID: 35242172 PMCID: PMC8886234 DOI: 10.3389/fgene.2022.856075] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 01/16/2022] [Accepted: 01/28/2022] [Indexed: 11/13/2022] Open
Abstract
Breast cancer is a heterogeneous disease, and its development is closely associated with the underlying molecular regulatory network. In this paper, we propose a new way to measure the regulation strength between genes based on their expression values, and construct the dysregulated networks (DNs) for the four subtypes of breast cancer. Our results show that the key dysregulated networks (KDNs) are significantly enriched in critical breast cancer-related pathways and driver genes; closely related to drug targets; and have significant differences in survival analysis. Moreover, the key dysregulated genes could serve as potential driver genes, drug targets, and prognostic markers for each breast cancer subtype. Therefore, the KDN is expected to be an effective and novel way to understand the mechanisms of breast cancer.
Affiliation(s)
- Yanhao Huo
  - Institute of Computational Science and Technology, Guangzhou University, Guangzhou, China
- Xianbin Li
  - Institute of Computational Science and Technology, Guangzhou University, Guangzhou, China
- Peng Xu
  - Institute of Computational Science and Technology, Guangzhou University, Guangzhou, China
  - School of Computer Science of Information Technology, Qiannan Normal University for Nationalities, Duyun, China
- Zhenshen Bao
  - Institute of Computational Science and Technology, Guangzhou University, Guangzhou, China
  - School of Computer Science of Information Technology, Qiannan Normal University for Nationalities, Duyun, China
- Wenbin Liu
  - Institute of Computational Science and Technology, Guangzhou University, Guangzhou, China
|
46
|
Akhavan-Safar M, Teimourpour B, Nowzari-Dalini A. A network-based method for detecting cancer driver gene in transcriptional regulatory networks using the structure analysis of weighted regulatory interactions. Curr Bioinform 2022. [DOI: 10.2174/1574893617666220127094224] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Abstract
Background:
The identification of genes that instigate cell anomalies and cause cancer in humans is an important field in oncology research. Abnormalities in these genes are transferred to other genes in the cell, disrupting its normal functionality. Such genes are known as cancer driver genes (CDGs). Various methods have been proposed for predicting CDGs, most of which are based on genomic data and computational methods. Some novel bioinformatic approaches have been developed.
Objective:
In this article, we propose a network-based algorithm, SalsaDriver (Stochastic approach for link-structure analysis to driver detection), which can calculate the receiving and influencing power of each gene using the stochastic analysis of regulatory interaction structures in gene regulatory networks.
Method:
First, regulatory networks related to breast, colon, and lung cancers were constructed using gene expression data and a list of regulatory interactions, the weights of which were then calculated using biological and topological features of the network. After that, the weighted regulatory interactions were used in the structure analysis of interactions achieved using two separate Markov chains on the bipartite graph taken from the main graph of the gene network and implementing the stochastic approach for link-structure analysis. The proposed algorithm categorizes higher-ranked genes as driver genes.
Results:
The proposed algorithm was compared with 24 other computational and network tools based on the F-measure value and the number of detected CDGs. The results were validated using four valid databases. The findings of this study show that SalsaDriver outperforms other methods and can identify a significant number of driver genes not identified using other methods.
Conclusion:
The SalsaDriver network-based approach is suitable for predicting CDGs and can be used as a complementary method along with other computational tools.
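The two-chain idea described in the Method paragraph can be sketched with a generic weighted variant of the stochastic approach for link-structure analysis (SALSA). This is an illustrative sketch, not the SalsaDriver code: the function name `salsa`, the fixed iteration count, and the toy edges are hypothetical, and reading the authority score as "receiving power" and the hub score as "influencing power" is an assumption about how the abstract's terms map onto the algorithm.

```python
from collections import defaultdict


def salsa(edges, iterations=100):
    """Return (hub, authority) score dicts for weighted directed edges.

    edges: iterable of (source, target, weight) tuples with weight > 0.
    Each score dict sums to 1 over the nodes on its side of the bipartite graph.
    """
    edges = list(edges)
    out_w = defaultdict(float)  # total outgoing weight per hub-side node
    in_w = defaultdict(float)   # total incoming weight per authority-side node
    for src, dst, w in edges:
        out_w[src] += w
        in_w[dst] += w

    # Start each chain from a uniform distribution over its own side.
    hub = {h: 1.0 / len(out_w) for h in out_w}
    auth = {a: 1.0 / len(in_w) for a in in_w}

    for _ in range(iterations):
        # Authority chain: step back along an in-edge, then forward along an out-edge.
        via_hub = defaultdict(float)
        for src, dst, w in edges:
            via_hub[src] += auth[dst] * w / in_w[dst]
        new_auth = defaultdict(float)
        for src, dst, w in edges:
            new_auth[dst] += via_hub[src] * w / out_w[src]

        # Hub chain: step forward along an out-edge, then back along an in-edge.
        via_auth = defaultdict(float)
        for src, dst, w in edges:
            via_auth[dst] += hub[src] * w / out_w[src]
        new_hub = defaultdict(float)
        for src, dst, w in edges:
            new_hub[src] += via_auth[dst] * w / in_w[dst]

        auth, hub = dict(new_auth), dict(new_hub)
    return hub, auth


# Hypothetical toy network: (regulator, target, weight).
toy_edges = [("r1", "g1", 1.0), ("r1", "g2", 1.0), ("r2", "g2", 2.0)]
hub_scores, auth_scores = salsa(toy_edges)
```

On a connected graph such as this toy example, the authority scores converge to each node's share of the total incoming weight and the hub scores to each node's share of the total outgoing weight, so the edge weighting largely determines the final ranking.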
Affiliation(s)
- Mostafa Akhavan-Safar
  - Department of Computer and Information Technology Engineering, Payame Noor University (PNU), P.O. Box 19395-4697, Tehran, Iran
  - Department of Information Technology Engineering, School of Systems and Industrial Engineering, Tarbiat Modares University (TMU), Tehran, Iran
- Babak Teimourpour
  - Department of Information Technology Engineering, School of Systems and Industrial Engineering, Tarbiat Modares University (TMU), Tehran, Iran
- Abbas Nowzari-Dalini
  - Department of Computer Science, School of Mathematics, Statistics, and Computer Science, University of Tehran, Tehran, Iran
|
47
|
Dressler L, Bortolomeazzi M, Keddar MR, Misetic H, Sartini G, Acha-Sagredo A, Montorsi L, Wijewardhane N, Repana D, Nulsen J, Goldman J, Pollitt M, Davis P, Strange A, Ambrose K, Ciccarelli FD. Comparative assessment of genes driving cancer and somatic evolution in non-cancer tissues: an update of the Network of Cancer Genes (NCG) resource. Genome Biol 2022; 23:35. [PMID: 35078504 PMCID: PMC8790917 DOI: 10.1186/s13059-022-02607-z] [Citation(s) in RCA: 37] [Impact Index Per Article: 18.5] [Received: 08/30/2021] [Accepted: 01/10/2022] [Indexed: 12/30/2022] Open
Abstract
Background:
Genetic alterations of somatic cells can drive non-malignant clone formation and promote cancer initiation. However, the link between these processes remains unclear and hampers our understanding of tissue homeostasis and cancer development.
Results:
Here, we collect a literature-based repertoire of 3355 well-known or predicted drivers of cancer and non-cancer somatic evolution in 122 cancer types and 12 non-cancer tissues. Mapping the alterations of these genes in 7953 pan-cancer samples reveals that, despite the large size, the known compendium of drivers is still incomplete and biased towards frequently occurring coding mutations. High overlap exists between drivers of cancer and non-cancer somatic evolution, although significant differences emerge in their recurrence. We confirm and expand the unique properties of drivers and identify a core of evolutionarily conserved and essential genes whose germline variation is strongly counter-selected. Somatic alteration in even one of these genes is sufficient to drive clonal expansion but not malignant transformation.
Conclusions:
Our study offers a comprehensive overview of our current understanding of the genetic events initiating clone expansion and cancer revealing significant gaps and biases that still need to be addressed. The compendium of cancer and non-cancer somatic drivers, their literature support, and properties are accessible in the Network of Cancer Genes and Healthy Drivers resource at http://www.network-cancer-genes.org/.
Supplementary Information:
The online version contains supplementary material available at 10.1186/s13059-022-02607-z.
|
48
|
Sudhakar M, Rengaswamy R, Raman K. Novel ratio-metric features enable the identification of new driver genes across cancer types. Sci Rep 2022; 12:5. [PMID: 34997044 PMCID: PMC8741763 DOI: 10.1038/s41598-021-04015-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 03/26/2021] [Accepted: 12/13/2021] [Indexed: 12/27/2022] Open
Abstract
An emergent area of cancer genomics is the identification of driver genes. Driver genes confer a selective growth advantage to the cell. While several driver genes have been discovered, many remain undiscovered, especially those mutated at a low frequency across samples. This study defines new features and builds a pan-cancer model, cTaG, to identify new driver genes. The features capture the functional impact of the mutations as well as their recurrence across samples, which helps build a model unbiased to genes with low frequency. The model classifies genes into the functional categories of driver genes, tumour suppressor genes (TSGs) and oncogenes (OGs), having distinct mutation type profiles. We overcome overfitting and show that certain mutation types, such as nonsense mutations, are more important for classification. Further, cTaG was employed to identify tissue-specific driver genes. Some known cancer driver genes predicted by cTaG as TSGs with high probability are ARID1A, TP53, and RB1. In addition to these known genes, potential driver genes predicted are CD36, ZNF750 and ARHGAP35 as TSGs and TAB3 as an oncogene. Overall, our approach surmounts the issue of low recall and bias towards genes with high mutation rates and predicts potential new driver genes for further experimental screening. cTaG is available at https://github.com/RamanLab/cTaG.
Affiliation(s)
- Malvika Sudhakar
  - Department of Biotechnology, Bhupat Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, India
  - Centre for Integrative Biology and Systems mEdicine (IBSE), Indian Institute of Technology Madras, Chennai, India
  - Robert Bosch Centre for Data Science and Artificial Intelligence (RBCDSAI), Indian Institute of Technology Madras, Chennai, India
- Raghunathan Rengaswamy
  - Centre for Integrative Biology and Systems mEdicine (IBSE), Indian Institute of Technology Madras, Chennai, India
  - Robert Bosch Centre for Data Science and Artificial Intelligence (RBCDSAI), Indian Institute of Technology Madras, Chennai, India
  - Department of Chemical Engineering, Indian Institute of Technology Madras, Chennai, India
- Karthik Raman
  - Department of Biotechnology, Bhupat Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, India
  - Centre for Integrative Biology and Systems mEdicine (IBSE), Indian Institute of Technology Madras, Chennai, India
  - Robert Bosch Centre for Data Science and Artificial Intelligence (RBCDSAI), Indian Institute of Technology Madras, Chennai, India
|
49
|
Network Approaches for Precision Oncology. Advances in Experimental Medicine and Biology 2022; 1361:199-213. [DOI: 10.1007/978-3-030-91836-1_11] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
|
50
|
Ahmed R, Erten C, Houdjedj A, Kazan H, Yalcin C. A Network-Centric Framework for the Evaluation of Mutual Exclusivity Tests on Cancer Drivers. Front Genet 2021; 12:746495. [PMID: 34899838 PMCID: PMC8664367 DOI: 10.3389/fgene.2021.746495] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2021] [Accepted: 10/27/2021] [Indexed: 12/03/2022] Open
Abstract
One of the key concepts employed in cancer driver gene identification is that of mutual exclusivity (ME): a driver mutation is less likely to occur if an earlier mutation with common functionality has already occurred in the same molecular pathway. Several ME tests have been proposed recently; however, the current protocols for evaluating ME tests have two main limitations. First, the evaluations are mostly performed on simulated data; second, the evaluation metrics lack a network-centric view. The latter is especially crucial, as the notion of common functionality can be captured by searching for interaction patterns in relevant networks. We propose a network-centric framework to evaluate the pairwise significances found by statistical ME tests. It has three main components. The first component consists of metrics employed in the network-centric ME evaluations. These metrics are designed so that network knowledge and the reference set of known cancer genes are incorporated in ME evaluations under a careful definition of proper control groups. The other two components are designed as further mechanisms to avoid confounders inherent in ME detection on top of the network-centric view. To this end, our second objective is to dissect the side effects caused by mutation load artifacts, where mutations driving tumor subtypes with low mutation load might be incorrectly diagnosed as mutually exclusive. Finally, as part of the third main component, the confounding issue stemming from the use of nonspecific interaction networks, generated as combinations of interactions from different tissues, is resolved through the creation and use of tissue-specific networks in the proposed framework. The data, the source code and useful scripts are available at: https://github.com/abu-compbio/NetCentric.
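As a concrete reference point for what a pairwise ME test computes, the sketch below uses a generic one-sided hypergeometric (Fisher-type) test on sample overlap. It is an illustration only and not necessarily one of the tests evaluated in the paper: the function name `mutual_exclusivity_pvalue` and the toy cohort are hypothetical, and the test deliberately ignores the mutation load and tissue-specificity confounders that the framework is designed to address.

```python
from math import comb


def mutual_exclusivity_pvalue(mutated_a, mutated_b, n_samples):
    """One-sided hypergeometric p-value for fewer co-mutations than expected.

    mutated_a, mutated_b: sets of sample identifiers in which each gene is
    mutated; both are assumed to be subsets of the same cohort.
    n_samples: total number of samples in the cohort.
    """
    a, b = len(mutated_a), len(mutated_b)
    both = len(mutated_a & mutated_b)
    total = comb(n_samples, b)
    lowest = max(0, a + b - n_samples)  # smallest overlap that is possible
    # P(overlap <= observed) when b samples are drawn at random from the cohort.
    tail = sum(
        comb(a, k) * comb(n_samples - a, b - k)
        for k in range(lowest, both + 1)
    )
    return tail / total


# Hypothetical cohort of 10 samples with two genes mutated in disjoint halves.
gene_a = {0, 1, 2, 3, 4}
gene_b = {5, 6, 7, 8, 9}
p_exclusive = mutual_exclusivity_pvalue(gene_a, gene_b, 10)
p_identical = mutual_exclusivity_pvalue(gene_a, gene_a, 10)
```

A small p-value means the two genes are co-mutated less often than chance would predict. A network-centric evaluation would then ask whether such significant pairs are enriched among interacting genes or known cancer genes.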
Affiliation(s)
- Rafsan Ahmed
  - Electrical and Computer Engineering Graduate Program, Antalya Bilim University, Antalya, Turkey
- Cesim Erten
  - Department of Computer Engineering, Antalya Bilim University, Antalya, Turkey
- Aissa Houdjedj
  - Department of Computer Engineering, Antalya Bilim University, Antalya, Turkey
- Hilal Kazan
  - Department of Computer Engineering, Antalya Bilim University, Antalya, Turkey
- Cansu Yalcin
  - Department of Computer Engineering, Antalya Bilim University, Antalya, Turkey
|