1
Lu D, Zheng Y, Yi X, Hao J, Zeng X, Han L, Li Z, Jiao S, Jiang B, Ai J, Peng J. Identifying potential risk genes for clear cell renal cell carcinoma with deep reinforcement learning. Nat Commun 2025; 16:3591. [PMID: 40234405 DOI: 10.1038/s41467-025-58439-5] [Received: 05/13/2024] [Accepted: 03/18/2025] [Indexed: 04/17/2025]
Abstract
Clear cell renal cell carcinoma (ccRCC) is the most prevalent type of renal cell carcinoma. However, our understanding of ccRCC risk genes remains limited, which hampers the effective diagnosis and treatment of ccRCC. To address this problem, we propose a deep reinforcement learning-based computational approach named RL-GenRisk to identify ccRCC risk genes. Unlike traditional supervised models, RL-GenRisk frames the identification of ccRCC risk genes as a Markov Decision Process, combining a graph convolutional network and a Deep Q-Network. Moreover, a data-driven reward is designed to mitigate the scarcity of known risk genes. The evaluation demonstrates that RL-GenRisk outperforms existing methods in ccRCC risk gene identification. Additionally, RL-GenRisk identifies eight potential ccRCC risk genes, two of which, epidermal growth factor receptor (EGFR) and piccolo presynaptic cytomatrix protein (PCLO), were validated using independent datasets and biological experiments. This approach may also be applied to other diseases in the future.
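RL-GenRisk pairs a graph convolutional network with a Deep Q-Network. The abstract does not specify the states, actions, or data-driven reward, so, as background only, the following is a minimal sketch of the standard DQN temporal-difference target that a Q-network is trained toward. The Q-values are passed as plain lists (a hypothetical stand-in for a target network's output), and the discount factor `gamma` is an assumption; this is not the RL-GenRisk implementation.

```python
from typing import List, Sequence


def dqn_targets(
    rewards: Sequence[float],
    next_q_values: Sequence[Sequence[float]],
    terminal: Sequence[bool],
    gamma: float = 0.99,
) -> List[float]:
    """Standard DQN targets: y = r if the step is terminal, else r + gamma * max_a' Q(s', a')."""
    targets = []
    for r, q_next, done in zip(rewards, next_q_values, terminal):
        # No future value is bootstrapped after a terminal step.
        bootstrap = 0.0 if done else gamma * max(q_next)
        targets.append(r + bootstrap)
    return targets


# Hypothetical mini-batch: two transitions, two candidate actions each.
print(dqn_targets([1.0, 0.0], [[0.5, 2.0], [3.0, 1.0]], [False, True], gamma=0.5))
# [2.0, 0.0]
```

The Q-network's prediction for the chosen action is then regressed toward these targets; how RL-GenRisk defines rewards over genes is described in the paper, not here.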
Affiliation(s)
- Dazhi Lu
- AI for Science Interdisciplinary Research Center, School of Computer Science, Northwestern Polytechnical University, Xi'an, China
- Yan Zheng
- College of Intelligence and Computing, Tianjin University, Tianjin, China
- Xianyanling Yi
- Department of Urology, Institute of Urology, West China Hospital, Sichuan University, Chengdu, China
- Jianye Hao
- College of Intelligence and Computing, Tianjin University, Tianjin, China.
- Xi Zeng
- AI for Science Interdisciplinary Research Center, School of Computer Science, Northwestern Polytechnical University, Xi'an, China
- Lu Han
- AI for Science Interdisciplinary Research Center, School of Computer Science, Northwestern Polytechnical University, Xi'an, China
- Zhigang Li
- College of Intelligence and Computing, Tianjin University, Tianjin, China
- Shaoqing Jiao
- School of Software, Northwestern Polytechnical University, Xi'an, China
- Bei Jiang
- Tianjin Second People's Hospital, Tianjin, China
- Jianzhong Ai
- Department of Urology, Institute of Urology, West China Hospital, Sichuan University, Chengdu, China.
- Jiajie Peng
- AI for Science Interdisciplinary Research Center, School of Computer Science, Northwestern Polytechnical University, Xi'an, China.
- Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, China.
2
Das S, Patel V, Chakravarty S, Ghosh A, Mukhopadhyay A, Biswas NK. An ensemble machine learning-based performance evaluation identifies top In-Silico pathogenicity prediction methods that best classify driver mutations in cancer. BioData Min 2025; 18:7. [PMID: 39833905 PMCID: PMC11744934 DOI: 10.1186/s13040-024-00420-x] [Received: 10/04/2024] [Accepted: 12/26/2024] [Indexed: 01/22/2025]
Abstract
BACKGROUND AND OBJECTIVE Accurate identification and prioritization of driver mutations in cancer is critical for effective patient management. Despite the numerous bioinformatic algorithms available for estimating mutation pathogenicity, their assessments vary significantly, even for well-established cancer driver mutations. This study aims to develop an ensemble machine learning approach to evaluate the performance (rank) of pathogenic and conservation scoring algorithms (PCSAs) based on their ability to distinguish pathogenic driver mutations from benign passenger (non-driver) mutations in head and neck squamous cell carcinoma (HNSC). METHODS The study used a dataset from 502 HNSC patients, classifying mutations based on 299 known high-confidence cancer driver genes. Missense somatic mutations in driver genes were treated as driver mutations, while non-driver mutations were randomly selected from other genes. Each mutation was annotated with 41 PCSAs. Three machine learning (ML) algorithms (logistic regression, random forest, and support vector machine), along with recursive feature elimination, were used to rank these PCSAs. The final ranking of the PCSAs was determined using rank-average-sort and rank-sum-sort methods. RESULTS The random forest algorithm was the top performer among the three ML algorithms, with an AUC-ROC of 0.89 compared with 0.83 for the other two, in distinguishing pathogenic driver mutations from benign passenger mutations using all 41 PCSAs. The top 11 PCSAs were selected based on the first-quintile cut-off of the final rank-sum distribution. Classifiers built using these top 11 PCSAs (DEOGEN2, Integrated_fitCons, MVP, etc.) performed significantly better (p-value < 2.22e-16) than those using the remaining 30 PCSAs across all three ML algorithms in separating pathogenic driver from benign passenger mutations.
The top PCSAs also performed strongly on a validation cohort including independent HNSC and other cancer types (breast, lung, and colorectal), reflecting their consistency, robustness and generalizability. CONCLUSIONS The ensemble machine learning approach effectively evaluates the performance of PCSAs based on their ability to differentiate pathogenic drivers from benign passenger mutations in HNSC and other cancer types. Notably, some well-known PCSAs performed poorly, underscoring the importance of data-driven selection over relying solely on popularity.
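The final ranking step in this abstract combines per-classifier ranks of the 41 PCSAs. Below is a minimal sketch of rank-sum aggregation followed by a first-quintile cut-off, assuming each classifier has already assigned every score a rank (1 = best) and reading "first quintile" as the 20th percentile of the rank-sum values. The score names and ranks are hypothetical, and the paper's tie handling and its rank-average-sort variant are not reproduced.

```python
from statistics import quantiles
from typing import Dict, List


def rank_sum(rankings: List[Dict[str, int]]) -> Dict[str, int]:
    """Sum each score's rank (1 = best) across all classifiers; lower is better."""
    names = rankings[0].keys()
    return {name: sum(r[name] for r in rankings) for name in names}


def first_quintile(totals: Dict[str, int]) -> List[str]:
    """Keep scores whose rank-sum is at or below the 20th percentile of all rank-sums."""
    cutoff = quantiles(totals.values(), n=5)[0]  # first of the four quintile cut points
    kept = [name for name, total in totals.items() if total <= cutoff]
    return sorted(kept, key=lambda name: (totals[name], name))


# Hypothetical ranks of five scores from three classifiers (e.g. LR, RF, SVM).
rankings = [
    {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5},
    {"A": 2, "B": 1, "C": 3, "D": 5, "E": 4},
    {"A": 1, "B": 3, "C": 2, "D": 4, "E": 5},
]
totals = rank_sum(rankings)
print(totals)                  # {'A': 4, 'B': 6, 'C': 8, 'D': 13, 'E': 14}
print(first_quintile(totals))  # ['A']
```

Because this sketch applies the cut-off to rank-sum values rather than to a fixed count, all scores tied at or below the threshold are kept.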
Affiliation(s)
- Subrata Das
- Biotechnology Research and Innovation Council-National Institute of Biomedical Genomics (BRIC-NIBMG), National Institute of Biomedical Genomics, Kalyani, West Bengal, India
- Vatsal Patel
- Biotechnology Research and Innovation Council-National Institute of Biomedical Genomics (BRIC-NIBMG), National Institute of Biomedical Genomics, Kalyani, West Bengal, India
- Shouvik Chakravarty
- Biotechnology Research and Innovation Council-National Institute of Biomedical Genomics (BRIC-NIBMG), National Institute of Biomedical Genomics, Kalyani, West Bengal, India
- Biotechnology Research and Innovation Council-Regional Centre for Biotechnology (BRIC-RCB), Faridabad, India
- Arnab Ghosh
- Biotechnology Research and Innovation Council-National Institute of Biomedical Genomics (BRIC-NIBMG), National Institute of Biomedical Genomics, Kalyani, West Bengal, India
- Biotechnology Research and Innovation Council-Regional Centre for Biotechnology (BRIC-RCB), Faridabad, India
- Anirban Mukhopadhyay
- Department of Computer Science and Engineering, University of Kalyani, Kalyani, West Bengal, 741235, India.
- Nidhan K Biswas
- Biotechnology Research and Innovation Council-National Institute of Biomedical Genomics (BRIC-NIBMG), National Institute of Biomedical Genomics, Kalyani, West Bengal, India.
3
Zhang H, Lin C, Chen Y, Shen X, Wang R, Chen Y, Lyu J. Enhancing Molecular Network-Based Cancer Driver Gene Prediction Using Machine Learning Approaches: Current Challenges and Opportunities. J Cell Mol Med 2025; 29:e70351. [PMID: 39804102 PMCID: PMC11726689 DOI: 10.1111/jcmm.70351] [Received: 11/12/2024] [Revised: 12/24/2024] [Accepted: 01/02/2025] [Indexed: 01/16/2025]
Abstract
Cancer is a complex disease driven by mutations in genes that play critical roles in cellular processes. The identification of cancer driver genes is crucial for understanding tumorigenesis, developing targeted therapies and identifying rational drug targets. Experimental identification and validation of cancer driver genes are time-consuming and costly. Studies have demonstrated that interacting genes tend to be associated with similar phenotypes; therefore, molecular network-based approaches are well suited to identifying cancer driver genes. Random walk-based approaches on molecular networks, which integrate mutation data with protein-protein interaction networks, have been widely employed to predict cancer driver genes and have demonstrated robust predictive potential. However, recent advances in deep learning, particularly graph-based models, have provided new opportunities to improve the prediction of cancer driver genes. This review comprehensively explores how machine learning methodologies, particularly network propagation, graph neural networks, autoencoders, graph embeddings and attention mechanisms, improve the scalability and interpretability of molecular network-based cancer gene prediction.
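Random walk-based propagation of mutation data over a protein-protein interaction network is commonly formulated as a random walk with restart (RWR). Below is a minimal sketch, assuming an undirected network stored as an adjacency dictionary, seed weights derived from mutation data, and a restart probability of 0.3; the gene names and parameters are hypothetical, and the individual methods covered by the review differ in normalization and parameter choices.

```python
from typing import Dict, Set


def random_walk_with_restart(
    adjacency: Dict[str, Set[str]],
    seeds: Dict[str, float],
    restart: float = 0.3,
    tol: float = 1e-10,
    max_iter: int = 1000,
) -> Dict[str, float]:
    """Iterate p <- restart * p0 + (1 - restart) * W p until the L1 change falls below tol."""
    nodes = list(adjacency)
    total = sum(seeds.values())
    p0 = {n: seeds.get(n, 0.0) / total for n in nodes}  # normalized restart vector
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            neighbors = adjacency[n]
            if not neighbors:
                continue  # isolated nodes leak walk mass in this simple sketch
            share = (1 - restart) * p[n] / len(neighbors)  # column-normalized transition
            for m in neighbors:
                nxt[m] += share
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Hypothetical toy PPI network: a path G1-G2-G3-G4 plus G5 attached to G2.
network: Dict[str, Set[str]] = {}
for a, b in [("G1", "G2"), ("G2", "G3"), ("G3", "G4"), ("G2", "G5")]:
    network.setdefault(a, set()).add(b)
    network.setdefault(b, set()).add(a)

scores = random_walk_with_restart(network, seeds={"G1": 1.0})
print(sorted(scores, key=scores.get, reverse=True))  # ['G1', 'G2', 'G3', 'G5', 'G4']
```

Genes close to the mutated seed accumulate more probability mass, which is the intuition behind ranking candidate driver genes by their propagated scores.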
Affiliation(s)
- Hao Zhang
- Postgraduate Training Base Alliance of Wenzhou Medical University, Wenzhou, Zhejiang, China
- Wenzhou Key Laboratory of Biophysics, Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang, China
- Chaohuan Lin
- Postgraduate Training Base Alliance of Wenzhou Medical University, Wenzhou, Zhejiang, China
- Wenzhou Key Laboratory of Biophysics, Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang, China
- Ying'ao Chen
- Wenzhou Key Laboratory of Biophysics, Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang, China
- Ruizhe Wang
- Wenzhou Longwan High School, Wenzhou, Zhejiang, China
- Yiqi Chen
- Wenzhou Longwan High School, Wenzhou, Zhejiang, China
- Jie Lyu
- Postgraduate Training Base Alliance of Wenzhou Medical University, Wenzhou, Zhejiang, China
- Wenzhou Key Laboratory of Biophysics, Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang, China
4
Mao Z, Gao F, Sun T, Xiao Y, Wu J, Xiao Y, Chu H, Wu D, Du M, Zheng R, Zhang Z. RB1 Mutations Induce Smoking-Related Bladder Cancer by Modulating the Cytochrome P450 Pathway. Environ Toxicol 2024; 39:5357-5370. [PMID: 39239764 DOI: 10.1002/tox.24409] [Received: 01/25/2024] [Revised: 07/14/2024] [Accepted: 08/10/2024] [Indexed: 09/07/2024]
Abstract
Cigarette smoking causes multiple cancers by directly influencing the burden of driver mutations. However, the mechanism linking smoking-induced somatic mutations to bladder tumorigenesis remains elusive. The smoking-related mutation profile of bladder cancer was characterized using The Cancer Genome Atlas cohort. The Integrative OncoGenomics database was used to detect smoking-related driver genes, and their biological mechanisms were investigated using bulk and single-cell transcriptomes as well as cell experiments. Cigarette smoking was associated with an increased tumor mutational burden in patients younger than 65 years (p = 0.031) and generated specific mutational signatures in smokers. RB1 was identified as a driver gene differentially mutated between smokers and nonsmokers, with a twofold higher mutation rate in smokers (p = 0.008). RB1 mutations and 4-aminobiphenyl treatment significantly decreased RB1 expression and thereby promoted the proliferation, invasion, and migration of bladder cancer cells. Enrichment analysis and real-time quantitative PCR (RT-qPCR) data showed that RB1 mutations inhibited the cytochrome P450 pathway by reducing the expression of UGT1A6 and AKR1C2. In addition, we observed that RB1 mutations regulated the composition of immune cells through stronger cell-to-cell interactions between epithelial Scissor+ cells and immune cells in smokers. This study highlights that RB1 mutations could drive smoking-related bladder tumorigenesis by inhibiting the cytochrome P450 pathway and regulating the tumor immune microenvironment.
Affiliation(s)
- Zhenguang Mao
- Department of Environmental Genomics and Genetic Toxicology, The Key Laboratory of Modern Toxicology of Ministry of Education, Center for Global Health, Jiangsu Key Laboratory of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Personalized Medicine, School of Public Health, Nanjing Medical University, Nanjing, China
- Institute of Clinical Research, The Affiliated Taizhou People's Hospital of Nanjing Medical University, Taizhou, China
- Fang Gao
- Department of Environmental Genomics and Genetic Toxicology, The Key Laboratory of Modern Toxicology of Ministry of Education, Center for Global Health, Jiangsu Key Laboratory of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Personalized Medicine, School of Public Health, Nanjing Medical University, Nanjing, China
- Key Laboratory of Environmental Medicine Engineering, Ministry of Education of China, School of Public Health, Southeast University, Nanjing, China
- Tuo Sun
- Department of Environmental Genomics and Genetic Toxicology, The Key Laboratory of Modern Toxicology of Ministry of Education, Center for Global Health, Jiangsu Key Laboratory of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Personalized Medicine, School of Public Health, Nanjing Medical University, Nanjing, China
- Institute of Clinical Research, The Affiliated Taizhou People's Hospital of Nanjing Medical University, Taizhou, China
- Yi Xiao
- Department of Urology, Sir Run Run Hospital of Nanjing Medical University, Nanjing Medical University, Nanjing, China
- Jiajin Wu
- Department of Environmental Genomics and Genetic Toxicology, The Key Laboratory of Modern Toxicology of Ministry of Education, Center for Global Health, Jiangsu Key Laboratory of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Personalized Medicine, School of Public Health, Nanjing Medical University, Nanjing, China
- Key Laboratory of Environmental Medicine Engineering, Ministry of Education of China, School of Public Health, Southeast University, Nanjing, China
- Yanping Xiao
- Department of Environmental Genomics and Genetic Toxicology, The Key Laboratory of Modern Toxicology of Ministry of Education, Center for Global Health, Jiangsu Key Laboratory of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Personalized Medicine, School of Public Health, Nanjing Medical University, Nanjing, China
- Institute of Clinical Research, The Affiliated Taizhou People's Hospital of Nanjing Medical University, Taizhou, China
- Haiyan Chu
- Department of Environmental Genomics and Genetic Toxicology, The Key Laboratory of Modern Toxicology of Ministry of Education, Center for Global Health, Jiangsu Key Laboratory of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Personalized Medicine, School of Public Health, Nanjing Medical University, Nanjing, China
- Dongmei Wu
- Department of Environmental Genomics and Genetic Toxicology, The Key Laboratory of Modern Toxicology of Ministry of Education, Center for Global Health, Jiangsu Key Laboratory of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Personalized Medicine, School of Public Health, Nanjing Medical University, Nanjing, China
- Mulong Du
- Department of Environmental Genomics and Genetic Toxicology, The Key Laboratory of Modern Toxicology of Ministry of Education, Center for Global Health, Jiangsu Key Laboratory of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Personalized Medicine, School of Public Health, Nanjing Medical University, Nanjing, China
- Department of Biostatistics, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, China
- Department of Urology, the Second Affiliated Hospital of Nanjing Medical University, Nanjing Medical University, Nanjing, China
- Rui Zheng
- Department of Environmental Genomics and Genetic Toxicology, The Key Laboratory of Modern Toxicology of Ministry of Education, Center for Global Health, Jiangsu Key Laboratory of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Personalized Medicine, School of Public Health, Nanjing Medical University, Nanjing, China
- Zhengdong Zhang
- Department of Environmental Genomics and Genetic Toxicology, The Key Laboratory of Modern Toxicology of Ministry of Education, Center for Global Health, Jiangsu Key Laboratory of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Personalized Medicine, School of Public Health, Nanjing Medical University, Nanjing, China
- Institute of Clinical Research, The Affiliated Taizhou People's Hospital of Nanjing Medical University, Taizhou, China
5
Pham DT, Tran TD. Drivergene.net: A Cytoscape app for the identification of driver nodes of large-scale complex networks and case studies in discovery of drug target genes. Comput Biol Med 2024; 179:108888. [PMID: 39047507 DOI: 10.1016/j.compbiomed.2024.108888] [Received: 02/23/2024] [Revised: 06/15/2024] [Accepted: 07/11/2024] [Indexed: 07/27/2024]
Abstract
No existing tools identify the driver nodes of large-scale networks using the competition-based controllability approach. This study proposes a novel method for this computation on large-scale networks and implements it in a new Cytoscape app called Drivergene.net. Experiments with the software on large-scale biomolecular networks showed outstanding speed and computing power. Interestingly, 86.67% of the top 10 driver nodes found in these networks are anticancer drug target genes, which reside mostly in the innermost K-cores of the networks. Finally, we compared the method with five methods from other researchers and confirmed that it outperforms them in identifying anticancer drug target genes. Taken together, Drivergene.net is a reliable tool that efficiently detects not only drug target genes in biomolecular networks but also driver nodes of large-scale complex networks. Drivergene.net, with a user manual and example datasets, is available at https://github.com/tinhpd/Drivergene.git.
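The abstract observes that most top driver nodes lie in the innermost K-cores. As background for that observation only (this is not the Drivergene.net controllability algorithm, which the abstract does not detail), here is a minimal sketch of standard k-core decomposition by repeatedly peeling a minimum-degree node. The toy graph and function name are hypothetical.

```python
from typing import Dict, Set


def core_numbers(adjacency: Dict[str, Set[str]]) -> Dict[str, int]:
    """Core number of every node via minimum-degree peeling (O(n^2); fine for small graphs)."""
    degree = {n: len(nbrs) for n, nbrs in adjacency.items()}
    remaining = set(adjacency)
    core: Dict[str, int] = {}
    k = 0
    while remaining:
        node = min(remaining, key=lambda n: (degree[n], n))  # ties broken by name
        k = max(k, degree[node])  # the core level never decreases during peeling
        core[node] = k
        remaining.remove(node)
        for nbr in adjacency[node]:
            if nbr in remaining:
                degree[nbr] -= 1
    return core


# Hypothetical toy network: a 4-clique A-B-C-D with a tail D-E-F.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D"),
         ("D", "E"), ("E", "F")]
graph: Dict[str, Set[str]] = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

cores = core_numbers(graph)
innermost = {n for n, c in cores.items() if c == max(cores.values())}
print(sorted(innermost))  # ['A', 'B', 'C', 'D']
```

The innermost core is the densest, most mutually connected part of the network; the peripheral tail nodes E and F fall into the 1-core.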
Affiliation(s)
- Duc-Tinh Pham
- Complex Systems and Bioinformatics Lab, Hanoi University of Industry, 298 Cau Dien Street, Bac Tu Liem District, Hanoi, Viet Nam; Graduate University of Science and Technology, Academy of Science and Technology Viet Nam, 18 Hoang Quoc Viet Street, Cau Giay District, Hanoi, Viet Nam
- Tien-Dzung Tran
- Complex Systems and Bioinformatics Lab, Hanoi University of Industry, 298 Cau Dien Street, Bac Tu Liem District, Hanoi, Viet Nam; Faculty of Information and Communication Technology, Hanoi University of Industry, 298 Cau Dien Street, Bac Tu Liem District, Hanoi, Viet Nam.
6
Wang J, Yang M, Ali O, Dragland JS, Bjørås M, Farkas L. Predicting regulatory mutations and their target genes by new computational integrative analysis: A study of follicular lymphoma. Comput Biol Med 2024; 178:108787. [PMID: 38901187 DOI: 10.1016/j.compbiomed.2024.108787] [Received: 12/21/2023] [Revised: 06/12/2024] [Accepted: 06/16/2024] [Indexed: 06/22/2024]
Abstract
Mutations in DNA regulatory regions are increasingly recognized as important drivers of cancer and other complex diseases. These mutations can regulate gene expression by affecting DNA-protein binding and epigenetic profiles, such as DNA methylation in genome regulatory elements. However, identifying mutation hotspots associated with expression regulation and disease progression in non-coding DNA remains a challenge. Unlike most existing approaches, which assign a mutation score to individual single nucleotide polymorphisms (SNPs), this study introduces a mutation block (MB)-based approach to assess the collective impact of a cluster of SNPs on transcription factor-DNA binding affinity, differential gene expression (DEG), and nearby DNA methylation. Moreover, the long-distance target genes of functional MBs were identified using a new permutation-based algorithm that assesses the significance of correlations between DNA methylation at regulatory regions and target gene expression. Two new Python packages were developed. The first, a differential methylation region analysis tool (DMR-analysis), detects DMRs and maps them to regulatory elements. The second, an integrated DMR, DEG, and SNP analysis tool (DDS-analysis), combines the omics data to identify functional MBs and long-distance target genes. Both tools were validated in follicular lymphoma (FL) cohorts, where they not only recovered known functional MBs and their target genes (BCL2 and BCL6) but also identified novel genes, including CDCA4 and JAG2, which may be associated with FL development. These genes are linked to target gene expression and are significantly correlated with the methylation of nearby DNA sequences in FL. The proposed computational integrative analysis of multiomics data holds promise for identifying regulatory mutations in cancer and other complex diseases.
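The abstract describes a permutation-based assessment of correlations between DNA methylation at regulatory regions and target gene expression. Below is a minimal sketch of a generic two-sided permutation test on a Pearson correlation, assuming one methylation value and one expression value per sample. The sample values, function names, and the number of permutations are hypothetical, and the actual statistic and permutation scheme in the DDS-analysis package may differ.

```python
import math
import random
from typing import Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def permutation_pvalue(
    methylation: Sequence[float],
    expression: Sequence[float],
    n_perm: int = 2000,
    seed: int = 0,
) -> Tuple[float, float]:
    """Two-sided permutation p-value for the methylation-expression correlation."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    observed = pearson(methylation, expression)
    shuffled = list(expression)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the sample pairing under the null hypothesis
        if abs(pearson(methylation, shuffled)) >= abs(observed):
            extreme += 1
    # Add-one correction: the observed pairing counts as one permutation.
    return observed, (extreme + 1) / (n_perm + 1)


# Hypothetical per-sample values for one regulatory region and one candidate target gene.
methylation = [0.9, 0.8, 0.75, 0.6, 0.5, 0.4, 0.3, 0.2]
expression = [1.0, 1.5, 1.4, 2.2, 2.8, 3.1, 3.9, 4.2]
r, p = permutation_pvalue(methylation, expression)
print(f"r = {r:.3f}, permutation p = {p:.4f}")
```

A permutation null makes no normality assumption about the data, which is one reason such tests are attractive for small cohorts.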
Affiliation(s)
- Junbai Wang
- Department of Clinical Molecular Biology (EpiGen), Akershus University Hospital and University of Oslo, Lørenskog, Norway; Institute of Clinical Medicine, Faculty of Medicine, University of Oslo, Campus AHUS/Oslo, Norway.
- Mingyi Yang
- Department of Microbiology, Oslo University Hospital, Oslo, Norway; Department of Medical Biochemistry, Oslo University Hospital, Oslo, Norway; Centre for Embryology and Healthy Development (CRESCO), University of Oslo, Oslo, 0373, Norway
- Omer Ali
- Institute of Clinical Medicine, Faculty of Medicine, University of Oslo, Campus AHUS/Oslo, Norway; Department of Pathology, Oslo University Hospital - Norwegian Radium Hospital, Oslo, Norway
- Jenny Sofie Dragland
- Department of Pathology, Oslo University Hospital - Norwegian Radium Hospital, Oslo, Norway
- Magnar Bjørås
- Department of Microbiology, Oslo University Hospital, Oslo, Norway; Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology, Trondheim, Norway; Centre for Embryology and Healthy Development (CRESCO), University of Oslo, Oslo, 0373, Norway
- Lorant Farkas
- Institute of Clinical Medicine, Faculty of Medicine, University of Oslo, Campus AHUS/Oslo, Norway; Department of Pathology, Oslo University Hospital - Norwegian Radium Hospital, Oslo, Norway
7
Wang T, Zhuo L, Chen Y, Fu X, Zeng X, Zou Q. ECD-CDGI: An efficient energy-constrained diffusion model for cancer driver gene identification. PLoS Comput Biol 2024; 20:e1012400. [PMID: 39213450 PMCID: PMC11392234 DOI: 10.1371/journal.pcbi.1012400] [Received: 10/25/2023] [Revised: 09/12/2024] [Accepted: 08/10/2024] [Indexed: 09/04/2024]
Abstract
The identification of cancer driver genes (CDGs) is challenging because of the intricate interdependencies among genes and the influence of measurement errors and noise. We propose a novel energy-constrained diffusion (ECD)-based model for identifying CDGs, termed ECD-CDGI. This model is the first to design an ECD-Attention encoder by combining the ECD technique with an attention mechanism. The ECD-Attention encoder generates robust gene representations that reveal the complex interdependencies among genes while reducing the impact of data noise. We concatenate topological embeddings, extracted from gene-gene networks through graph transformers, to these gene representations. Extensive experiments across three testing scenarios show that ECD-CDGI is proficient not only in identifying known CDGs but also in efficiently uncovering unknown potential CDGs. Furthermore, compared with GNN-based approaches, ECD-CDGI is less constrained by existing gene-gene networks, which enhances its capability to identify CDGs. Additionally, ECD-CDGI is open-source and freely available, and we have launched it as a complimentary online tool specifically crafted to expedite research on CDG identification.
Affiliation(s)
- Tao Wang
- School of Data Science and Artificial Intelligence, Wenzhou University of Technology, Wenzhou, China
- Linlin Zhuo
- School of Data Science and Artificial Intelligence, Wenzhou University of Technology, Wenzhou, China
- Yifan Chen
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Xiangzheng Fu
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Xiangxiang Zeng
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
8
Ma X, Li Z, Du Z, Xu Y, Chen Y, Zhuo L, Fu X, Liu R. Advancing cancer driver gene detection via Schur complement graph augmentation and independent subspace feature extraction. Comput Biol Med 2024; 174:108484. [PMID: 38643595 DOI: 10.1016/j.compbiomed.2024.108484] [Received: 02/04/2024] [Revised: 03/18/2024] [Accepted: 04/15/2024] [Indexed: 04/23/2024]
Abstract
Accurately identifying cancer driver genes (CDGs) is crucial for guiding cancer treatment and has recently received great attention from researchers. However, the high complexity and heterogeneity of cancer gene regulatory networks limit the prediction accuracy of existing deep learning models. To address this, we introduce a model called SCIS-CDG that uses Schur complement graph augmentation and independent subspace feature extraction to predict potential CDGs. First, a random Schur complement strategy is adopted to generate two augmented views of the gene network within a graph contrastive learning framework. The fast randomization of this strategy enhances the model's generalization and its ability to handle complex networks. Preserving the Schur complement in expectation helps the augmented views retain the vital structure of the original gene network. Subsequently, we extract features using multiple independent subspaces, each trained with independent weights to reduce inter-subspace dependence and improve the model's expressiveness. Concurrently, we introduce a feature expansion component based on the structure of the gene network to address the limited dimensionality of node features; it also alleviates, to some extent, the challenges posed by the heterogeneity of cancer gene networks. Finally, we integrate a learnable attention weight mechanism into the graph neural network (GNN) encoder, using the expanded features to optimize the significance of various feature levels in the prediction task. Extensive experimental validation shows that SCIS-CDG efficiently identifies known CDGs and uncovers potential unknown CDGs in external datasets, with significantly improved performance compared with previous conventional GNN models.
The code and data are publicly available at: https://github.com/mxqmxqmxq/SCIS-CDG.
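SCIS-CDG builds its augmented views with a random Schur complement strategy. The randomized sampling is specific to the paper and is not reproduced here; as background only, the following is a minimal sketch of the deterministic Schur complement of a weighted graph Laplacian (Kron reduction), which eliminates nodes while producing a valid Laplacian on the remaining nodes. Eliminating one node at a time yields the same result as a single block elimination. The toy graphs and function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def laplacian(n: int, edges: Sequence[Tuple[int, int, float]]) -> Matrix:
    """Weighted graph Laplacian L = D - A for nodes 0..n-1."""
    L = [[0.0] * n for _ in range(n)]
    for u, v, w in edges:
        L[u][u] += w
        L[v][v] += w
        L[u][v] -= w
        L[v][u] -= w
    return L


def eliminate(L: Matrix, v: int) -> Matrix:
    """Schur complement of L onto all nodes except v (one Kron-reduction step).

    Assumes node v has at least one edge, so the pivot L[v][v] is positive.
    """
    pivot = L[v][v]
    keep = [i for i in range(len(L)) if i != v]
    return [[L[i][j] - L[i][v] * L[v][j] / pivot for j in keep] for i in keep]


def schur_complement(L: Matrix, remove: Sequence[int]) -> Matrix:
    """Eliminate several nodes, highest index first so lower indices stay valid."""
    for v in sorted(remove, reverse=True):
        L = eliminate(L, v)
    return L


# Hypothetical toy graph: unit-weight path 0-1-2; eliminate the middle node.
path = laplacian(3, [(0, 1, 1.0), (1, 2, 1.0)])
print(schur_complement(path, [1]))  # [[0.5, -0.5], [-0.5, 0.5]]
```

The result is the Laplacian of a single edge of weight 0.5 between nodes 0 and 2, the same as two unit resistors in series, which illustrates how the reduced graph preserves effective connectivity among the retained nodes.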
Affiliation(s)
- Xinqian Ma
- School of Data Science and Artificial Intelligence, Wenzhou University of Technology, 325027, Wenzhou, China
- Zhen Li
- School of Computer Science of Information Technology, Qiannan Normal University for Nationalities, Duyun, Guizhou 558000, China; Institute of Computational Science and Technology, Guangzhou University, 510000, Guangzhou, China
- Zhenya Du
- Guangzhou Xinhua University, 510520, Guangzhou, China
- Yan Xu
- School of Data Science and Artificial Intelligence, Wenzhou University of Technology, 325027, Wenzhou, China
- Yifan Chen
- College of Computer and Information Engineering, Central South University of Forestry and Technology, Changsha, Hunan, 410004, China
- Linlin Zhuo
- School of Data Science and Artificial Intelligence, Wenzhou University of Technology, 325027, Wenzhou, China.
- Xiangzheng Fu
- College of Computer Science and Electronic Engineering, Hunan University, 410012, Changsha, China
- Ruijun Liu
- School of Software, Beihang University, Beijing, China.
9
Nourbakhsh M, Degn K, Saksager A, Tiberti M, Papaleo E. Prediction of cancer driver genes and mutations: the potential of integrative computational frameworks. Brief Bioinform 2024; 25:bbad519. [PMID: 38261338 PMCID: PMC10805075 DOI: 10.1093/bib/bbad519] [Received: 06/09/2023] [Revised: 11/27/2023] [Accepted: 12/11/2023] [Indexed: 01/24/2024]
Abstract
The vast amount of available sequencing data allows the scientific community to explore different genetic alterations that may drive cancer or favor cancer progression. Software developers have proposed a myriad of predictive tools, allowing researchers and clinicians to compare and prioritize driver genes and mutations and their relative pathogenicity. However, there is little consensus on the computational approach or a gold standard for comparison. Hence, benchmarking results for the different tools depend heavily on the input data, indicating that overfitting is still a major problem. One solution is to limit the scope and usage of specific tools. However, such limitations force researchers to walk a tightrope between creating and using high-quality tools for a specific purpose and describing the complex alterations driving cancer. While knowledge of cancer development increases daily, many bioinformatic pipelines consider single nucleotide variants or alterations in a vacuum, without accounting for cellular compartments, mutational burden or disease progression. Even within bioinformatics and computational cancer biology, research fields work in silos, risking overlooking potential synergies or breakthroughs. Here, we provide an overview of databases and datasets for building or testing predictive cancer driver tools. Furthermore, we introduce predictive tools for driver genes, driver mutations, and the impact of these based on structural analysis. Additionally, we recommend directions for the field to avoid silo research and move towards integrative frameworks.
Affiliation(s)
- Mona Nourbakhsh
- Cancer Systems Biology, Section for Bioinformatics, Department of Health Technology, Technical University of Denmark, 2800 Lyngby, Denmark
- Kristine Degn
- Cancer Systems Biology, Section for Bioinformatics, Department of Health Technology, Technical University of Denmark, 2800 Lyngby, Denmark
- Astrid Saksager
- Cancer Systems Biology, Section for Bioinformatics, Department of Health Technology, Technical University of Denmark, 2800 Lyngby, Denmark
- Matteo Tiberti
- Cancer Structural Biology, Danish Cancer Institute, 2100 Copenhagen, Denmark
- Elena Papaleo
- Cancer Systems Biology, Section for Bioinformatics, Department of Health Technology, Technical University of Denmark, 2800 Lyngby, Denmark
- Cancer Structural Biology, Danish Cancer Institute, 2100 Copenhagen, Denmark

10
Wang Y, Zhou B, Ru J, Meng X, Wang Y, Liu W. Advances in computational methods for identifying cancer driver genes. Math Biosci Eng 2023; 20:21643-21669. [PMID: 38124614 DOI: 10.3934/mbe.2023958] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2023]
Abstract
Cancer driver genes (CDGs) are crucial in cancer prevention, diagnosis and treatment. This study reviewed computational methods for identifying CDGs and categorized them into four groups, summarizing the major frameworks for each category. Additionally, we systematically gathered data from public databases and biological networks, and we described computational methods that identify CDGs using these resources. Further, we summarized the algorithms, mainly statistical and machine learning approaches, used for identifying CDGs. Notably, the performance of nine typical identification methods was compared across eight types of cancer to analyze where each method is applicable. Finally, we discussed the challenges and prospects associated with methods for identifying CDGs. The present study found that network-based algorithms and machine learning-based methods demonstrated superior performance.
Affiliation(s)
- Ying Wang
- School of Computer Science and Engineering, Changshu Institute of Technology, Changshu 215500, China
- Bohao Zhou
- School of Computer Science and Engineering, Changshu Institute of Technology, Changshu 215500, China
- Jidong Ru
- School of Textile Garment and Design, Changshu Institute of Technology, Changshu 215500, China
- Xianglian Meng
- School of Computer Information and Engineering, Changzhou Institute of Technology, Changzhou 213032, China
- Yundong Wang
- School of Computer Science and Engineering, Changshu Institute of Technology, Changshu 215500, China
- Wenjie Liu
- School of Computer Information and Engineering, Changzhou Institute of Technology, Changzhou 213032, China

11
Lei M, Wu B, Zhang Z, Qin Y, Cao X, Cao Y, Liu B, Su X, Liu Y. A Web-Based Calculator to Predict Early Death Among Patients With Bone Metastasis Using Machine Learning Techniques: Development and Validation Study. J Med Internet Res 2023; 25:e47590. [PMID: 37870889 PMCID: PMC10628690 DOI: 10.2196/47590] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 03/25/2023] [Revised: 07/05/2023] [Accepted: 08/24/2023] [Indexed: 10/24/2023] Open
Abstract
BACKGROUND Patients with bone metastasis often have significantly limited survival, and a life expectancy of <3 months is generally regarded as a contraindication for extensive invasive surgery. In this context, accurate prediction of survival is very important because it serves as a crucial guide for clinical decision-making. OBJECTIVE This study aimed to develop a machine learning-based web calculator that can provide an accurate assessment of the likelihood of early death among patients with bone metastasis. METHODS This study analyzed a large cohort of 118,227 patients diagnosed with bone metastasis between 2010 and 2019 using data obtained from a national cancer database. The entire cohort was randomly split 9:1 into a training group (n=106,492) and a validation group (n=11,735). Six approaches (logistic regression, extreme gradient boosting machine, decision tree, random forest, neural network, and gradient boosting machine) were implemented in this study. The performance of these approaches was evaluated using 11 measures, and each approach was ranked based on its performance in each measure. Patients (n=332) from a teaching hospital were used as the external validation group, and external validation was performed using the optimal model. RESULTS In the entire cohort, a substantial proportion of patients (43,305/118,227, 36.63%) experienced early death. Among the different approaches evaluated, the gradient boosting machine achieved the highest prediction performance score (54 points), followed by the neural network (52 points) and extreme gradient boosting machine (50 points). The gradient boosting machine demonstrated favorable discrimination, with an area under the curve of 0.858 (95% CI 0.851-0.865). In addition, the calibration slope was 1.02 and the calibration intercept (calibration-in-the-large) was -0.02, indicating good calibration of the model.
Patients were divided into 2 risk groups using a threshold of 37% based on the gradient boosting machine. Patients in the high-risk group (3105/4315, 71.96%) were found to be 4.5 times more likely to experience early death compared with those in the low-risk group (1159/7420, 15.62%). External validation of the model demonstrated a high area under the curve of 0.847 (95% CI 0.798-0.895), indicating its robust performance. The model developed by the gradient boosting machine has been deployed on the internet as a calculator. CONCLUSIONS This study develops a machine learning-based calculator to assess the probability of early death among patients with bone metastasis. The calculator has the potential to guide clinical decision-making and improve the care of patients with bone metastasis by identifying those at a higher risk of early death.
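The risk-group figures in this abstract can be checked with simple arithmetic. The sketch below is a hypothetical helper, not code from the study: it recomputes each group's early-death proportion from the reported counts and divides them to obtain a relative risk. From these counts the ratio comes to about 4.6, close to the roughly 4.5-fold difference stated in the abstract, and the two groups together account for the 11,735-patient validation group.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskGroup:
    """Counts for one risk stratum (values quoted in the abstract above)."""
    name: str
    events: int  # patients who experienced early death
    total: int   # patients assigned to this stratum

    @property
    def proportion(self) -> float:
        return self.events / self.total


def relative_risk(high: RiskGroup, low: RiskGroup) -> float:
    """Ratio of event proportions between the high- and low-risk groups."""
    return high.proportion / low.proportion


high = RiskGroup("high risk", 3105, 4315)
low = RiskGroup("low risk", 1159, 7420)

for group in (high, low):
    print(f"{group.name}: {group.proportion:.2%}")
print(f"relative risk: {relative_risk(high, low):.2f}")
# The two strata together make up the validation group.
print(f"patients stratified: {high.total + low.total}")
```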
Affiliation(s)
- Mingxing Lei
- Senior Department of Orthopedics, The Fourth Medical Center of PLA General Hospital, Beijing, China
- Department of Orthopedics, Hainan Hospital of Chinese PLA General Hospital, Hainan, China
- Chinese PLA Medical School, Beijing, China
- Bing Wu
- Senior Department of Orthopedics, The Fourth Medical Center of PLA General Hospital, Beijing, China
- Department of Orthopedics, The First Medical Center of PLA General Hospital, Beijing, China
- Zhicheng Zhang
- Senior Department of Orthopedics, The Fourth Medical Center of PLA General Hospital, Beijing, China
- Yong Qin
- Department of Joint and Sports Medicine Surgery, The Second Affiliated Hospital of Harbin Medical University, Harbin, China
- Xuyong Cao
- Senior Department of Orthopedics, The Fourth Medical Center of PLA General Hospital, Beijing, China
- Yuncen Cao
- Senior Department of Orthopedics, The Fourth Medical Center of PLA General Hospital, Beijing, China
- Baoge Liu
- Department of Orthopedics, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
- Xiuyun Su
- Intelligent Medical Innovation Institute, Southern University of Science and Technology Hospital, Shenzhen, China
- Yaosheng Liu
- Senior Department of Orthopedics, The Fourth Medical Center of PLA General Hospital, Beijing, China
- Department of Orthopedics, The Fifth Medical Center of PLA General Hospital, Beijing, China
- National Clinical Research Center for Orthopedics, Sports Medicine & Rehabilitation, PLA General Hospital, Beijing, China

12
Cui Y, Wang Z, Wang X, Zhang Y, Zhang Y, Pan T, Zhang Z, Li S, Guo Y, Akutsu T, Song J. SMG: self-supervised masked graph learning for cancer gene identification. Brief Bioinform 2023; 24:bbad406. [PMID: 37950905 PMCID: PMC10639095 DOI: 10.1093/bib/bbad406] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Revised: 09/26/2023] [Accepted: 10/24/2023] [Indexed: 11/13/2023] Open
Abstract
Cancer genomics is dedicated to elucidating the genes and pathways that contribute to cancer progression and development. Identifying cancer genes (CGs) associated with the initiation and progression of cancer is critical for characterizing molecular-level mechanisms in cancer research. In recent years, the growing availability of high-throughput molecular data and advances in deep learning technologies have enabled the modelling of complex interactions and topological information within genomic data. Nevertheless, because labelled data are limited, pinpointing CGs among a multitude of potential mutations remains an exceptionally challenging task. To address this, we propose a novel deep learning framework, termed self-supervised masked graph learning (SMG), which comprises SMG reconstruction (pretext task) and task-specific fine-tuning (downstream task). In the pretext task, the nodes of multi-omic featured protein-protein interaction (PPI) networks are randomly substituted with a defined mask token. The PPI networks are then reconstructed using a graph neural network (GNN)-based autoencoder, which explores node correlations in a self-prediction manner. In the downstream tasks, the pre-trained GNN encoder embeds the input networks into feature graphs, and a task-specific layer makes the final prediction. To assess the performance of the proposed SMG method, benchmarking experiments are performed on three node-level tasks (identification of CGs, essential genes and healthy driver genes) and one graph-level task (identification of disease subnetworks) across eight PPI networks. Benchmarking experiments and performance comparison with existing state-of-the-art methods demonstrate the superiority of SMG for multi-omic feature engineering.
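The masking pretext task described in this abstract can be illustrated with a single untrained forward pass. The NumPy sketch below is not the SMG implementation: the toy graph, feature sizes, 50% mask rate, zero-vector mask token, and the one-layer GCN encoder with a linear decoder are all assumptions chosen for illustration. It replaces a random subset of node feature rows with a shared mask token, propagates features over a symmetrically normalized adjacency matrix, and scores reconstruction only on the masked nodes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy PPI graph: 6 proteins, undirected edges.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 5), (1, 4)]
n_nodes, n_feats, hidden = 6, 4, 3
adj = np.zeros((n_nodes, n_nodes))
for u, v in edges:
    adj[u, v] = adj[v, u] = 1.0

# Symmetric GCN normalisation with self-loops: D^-1/2 (A + I) D^-1/2.
a_hat = adj + np.eye(n_nodes)
d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

# Stand-in multi-omic node features (random values for illustration).
x = rng.normal(size=(n_nodes, n_feats))


def mask_nodes(features, mask_rate, token, rng):
    """Replace a random subset of node feature rows with a shared mask token."""
    n = features.shape[0]
    n_mask = max(1, int(round(mask_rate * n)))
    masked_idx = rng.choice(n, size=n_mask, replace=False)
    corrupted = features.copy()
    corrupted[masked_idx] = token
    return corrupted, masked_idx


# Placeholder token; the exact form of SMG's mask token is not given above.
mask_token = np.zeros(n_feats)
x_masked, masked_idx = mask_nodes(x, mask_rate=0.5, token=mask_token, rng=rng)

# One GCN encoder layer and a linear decoder (random, untrained weights).
w_enc = rng.normal(scale=0.5, size=(n_feats, hidden))
w_dec = rng.normal(scale=0.5, size=(hidden, n_feats))
h = np.maximum(a_norm @ x_masked @ w_enc, 0.0)  # ReLU(A_norm X W)
x_rec = h @ w_dec

# Reconstruction loss computed only on the masked nodes (self-prediction).
loss = np.mean((x_rec[masked_idx] - x[masked_idx]) ** 2)
print("masked nodes:", sorted(masked_idx.tolist()))
print(f"reconstruction MSE on masked nodes: {loss:.3f}")
```

In training, the encoder and decoder weights would be optimized to minimize this masked-node loss, after which the encoder is reused for the downstream task.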
Affiliation(s)
- Yan Cui
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto 611-0011, Japan
- Zhikang Wang
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Xiaoyu Wang
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Yiwen Zhang
- School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC 3004, Australia
- Ying Zhang
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing, 210094, China
- Tong Pan
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Shanshan Li
- School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC 3004, Australia
- Yuming Guo
- School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC 3004, Australia
- Tatsuya Akutsu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto 611-0011, Japan
- Jiangning Song
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Monash Data Futures Institute, Monash University, Melbourne, VIC 3800, Australia

13
Yang H, Liu Y, Yang Y, Li D, Wang Z. InDEP: an interpretable machine learning approach to predict cancer driver genes from multi-omics data. Brief Bioinform 2023; 24:bbad318. [PMID: 37649392 DOI: 10.1093/bib/bbad318] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2023] [Revised: 06/14/2023] [Accepted: 08/16/2023] [Indexed: 09/01/2023] Open
Abstract
Cancer driver genes are critical in driving tumor cell growth, and precisely identifying these genes is crucial to advancing our understanding of cancer pathogenesis and developing targeted cancer drugs. Although current methods for discovering cancer driver genes mainly rely on integrating multi-omics data, many existing models are overly complex, and their results are difficult to interpret accurately. This study addresses this issue by introducing InDEP, an interpretable machine learning framework based on cascade forests. InDEP is designed with easy-to-interpret features, cascade forests based on decision trees and a KernelSHAP module that enables fine-grained post-hoc interpretation. By integrating multi-omics data, InDEP can identify the essential features of classified driver genes at both the gene and cancer-type levels. The framework accurately identifies driver genes, discovers new patterns that mark genes as driver genes and refines the cancer driver gene catalog. In comparison with state-of-the-art methods, InDEP proved more accurate on the test set and identified reliable candidate driver genes. Mutational features were the primary basis for InDEP's identification of driver genes, with other omics features also contributing. At the gene level, the framework found that substitution-type mutations were the main reason most genes were identified as driver genes. InDEP's ability to identify reliable candidate driver genes opens up new avenues for precision oncology and the discovery of new biomedical knowledge. This framework can help advance cancer research by providing an interpretable method for identifying cancer driver genes and their contribution to cancer pathogenesis, facilitating the development of targeted cancer drugs.
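InDEP's post-hoc interpretation relies on KernelSHAP, which estimates Shapley values through a weighted linear regression over feature coalitions. For a model with only a handful of features, the same quantities can be computed exactly by enumerating every coalition, which makes the idea easy to inspect. The plain-Python sketch below does that; the toy scoring function, its feature names and the all-zero baseline used to "switch off" absent features are hypothetical and are not taken from InDEP.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, Sequence

Features = Dict[str, float]


def shapley_values(
    model: Callable[[Features], float],
    instance: Features,
    baseline: Features,
) -> Dict[str, float]:
    """Exact Shapley values; features outside a coalition take their baseline value."""
    names: Sequence[str] = list(instance)
    n = len(names)

    def value(coalition) -> float:
        x = {k: (instance[k] if k in coalition else baseline[k]) for k in names}
        return model(x)

    phi = {}
    for i in names:
        others = [k for k in names if k != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi


# Hypothetical driver-gene score mixing mutation, CNV and expression features.
def toy_model(x: Features) -> float:
    return 2.0 * x["mutation"] + 0.5 * x["cnv"] + x["mutation"] * x["expression"]


gene = {"mutation": 1.0, "cnv": 2.0, "expression": 0.5}
zero = {k: 0.0 for k in gene}
phi = shapley_values(toy_model, gene, zero)
print(phi)
# Efficiency property: contributions sum to f(instance) - f(baseline).
print(sum(phi.values()), toy_model(gene) - toy_model(zero))
```

The interaction term is split equally between `mutation` and `expression`, which is the kind of attribution KernelSHAP approximates when exhaustive enumeration becomes too expensive.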
Affiliation(s)
- Hai Yang
- Department of Computer Science and Engineering, East China University of Science and Technology, 200237, Shanghai, PR China
- Yawen Liu
- Department of Computer Science and Engineering, East China University of Science and Technology, 200237, Shanghai, PR China
- Yijing Yang
- Department of Computer Science, University of Illinois Urbana-Champaign, Champaign, Illinois, United States of America
- Dongdong Li
- Department of Computer Science and Engineering, East China University of Science and Technology, 200237, Shanghai, PR China
- Zhe Wang
- Department of Computer Science and Engineering, East China University of Science and Technology, 200237, Shanghai, PR China

14
Yap JYY, Goh LSH, Lim AJW, Chong SS, Lim LJ, Lee CG. Machine Learning Identifies a Signature of Nine Exosomal RNAs That Predicts Hepatocellular Carcinoma. Cancers (Basel) 2023; 15:3749. [PMID: 37509410 PMCID: PMC10377993 DOI: 10.3390/cancers15143749] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/16/2023] [Revised: 07/21/2023] [Accepted: 07/23/2023] [Indexed: 07/30/2023] Open
Abstract
Hepatocellular carcinoma (HCC) is the third leading cause of cancer-related death worldwide. Although alpha fetoprotein (AFP) remains a commonly used serological marker of HCC, its sensitivity and specificity in detecting HCC are often limited. Exosomal RNA has emerged as a promising diagnostic tool for various cancers, but its use in HCC detection has yet to be fully explored. Here, we applied machine learning to 114,602 exosomal RNAs to identify a signature that can predict HCC. The exosomal expression data of 118 HCC patients and 112 healthy individuals were split, with stratification, into Training, Validation and Unseen Test datasets. Feature selection was then performed on the initial training dataset using permutation importance, and the predictive performance of the selected features was tested on the validation dataset using a Support Vector Machine (SVM) classifier. A minimum of nine features was found to be predictive of HCC, and these nine features were then evaluated across six different models in the unseen test set. These features, mainly in the immune, platelet/neutrophil and cytoskeletal pathways, exhibited good predictive performance, with ROC-AUC from 0.79 to 0.88 in the unseen test set. Hence, these nine exosomal RNAs have the potential to be clinically useful, minimally invasive biomarkers for HCC.
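Permutation importance, used in this study for feature selection, scores a feature by how much a fitted model's accuracy drops when that feature's values are shuffled across samples, which breaks the feature's link to the label. The stdlib-only sketch below illustrates the procedure under stated assumptions: the threshold "model", the synthetic two-feature data and the number of repeats are hypothetical, whereas the study itself used an SVM classifier on exosomal RNA expression. scikit-learn offers a ready-made version as `sklearn.inspection.permutation_importance`.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence

Row = List[float]


def accuracy(model: Callable[[Row], int], X: Sequence[Row], y: Sequence[int]) -> float:
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0) -> List[float]:
    """Mean drop in accuracy when each feature column is shuffled independently."""
    rng = random.Random(seed)
    base = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(base - accuracy(model, X_perm, y))
        importances.append(mean(drops))
    return importances


# Synthetic data: feature 0 carries the label, feature 1 is pure noise.
data_rng = random.Random(42)
X = [[float(i % 2), data_rng.random()] for i in range(40)]
y = [int(row[0]) for row in X]


# Hypothetical fitted classifier that thresholds feature 0 only.
def model(row: Row) -> int:
    return int(row[0] > 0.5)


print(permutation_importance(model, X, y))
```

Features whose importance stays near zero, like the noise column here, are the ones a selection step of this kind would discard.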
Affiliation(s)
- Josephine Yu Yan Yap
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 117596, Singapore
- NUS Graduate School, National University of Singapore, Singapore 119077, Singapore
- Laura Shih Hui Goh
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 117596, Singapore
- Ashley Jun Wei Lim
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 117596, Singapore
- Samuel S Chong
- Department of Paediatrics and Obstetrics & Gynaecology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 119074, Singapore
- Lee Jin Lim
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 117596, Singapore
- Caroline G Lee
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 117596, Singapore
- NUS Graduate School, National University of Singapore, Singapore 119077, Singapore
- Division of Cellular & Molecular Research, Humphrey Oei Institute of Cancer Research, National Cancer Centre Singapore, Singapore 168583, Singapore
- Duke-NUS Medical School, Singapore 169857, Singapore

15
Li S, Chen X, Chen J, Wu B, Liu J, Guo Y, Li M, Pu X. Multi-omics integration analysis of GPCRs in pan-cancer to uncover inter-omics relationships and potential driver genes. Comput Biol Med 2023; 161:106988. [PMID: 37201441 DOI: 10.1016/j.compbiomed.2023.106988] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2023] [Revised: 03/30/2023] [Accepted: 04/27/2023] [Indexed: 05/20/2023]
Abstract
G protein-coupled receptors (GPCRs) are the largest drug target family. Unfortunately, applications of GPCRs in cancer therapy are scarce because knowledge of their correlations with cancers is very limited. Multi-omics data enable systematic investigations of GPCRs, yet their effective integration remains a challenge due to the complexity of the data. Here, we adopt two types of integration strategies, multi-staged and meta-dimensional approaches, to fully characterize somatic mutations, somatic copy number alterations (SCNAs), DNA methylation, and mRNA expression of GPCRs in 33 cancers. Results from the multi-staged integration reveal that GPCR mutations do not predict expression dysregulation well. Correlations between expression and SCNAs are primarily positive, while correlations of methylation with expression and SCNAs are bimodal, with negative correlations predominating. Based on these correlations, 32 and 144 potential cancer-related GPCRs driven by aberrant SCNA and methylation, respectively, are identified. In addition, the meta-dimensional integration analysis is carried out using deep learning models, which predict more than one hundred GPCRs as potential oncogenes. When comparing results between the two integration strategies, 165 cancer-related GPCRs are common to both, suggesting that they should be prioritized in future studies. However, 172 GPCRs emerge in only one, indicating that the two integration strategies should be considered concurrently, each complementing the information missed by the other, to obtain a more comprehensive understanding. Finally, correlation analysis further reveals that GPCRs, in particular class A and adhesion receptors, are generally immune-related. Overall, this work is the first to reveal the associations between different omics layers, and it highlights the necessity of combining the two strategies in identifying cancer-related GPCRs.
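The multi-staged integration described here rests on gene-wise correlations between omics layers across tumour samples. The sketch below shows how such correlations could be used to flag receptors whose expression tracks copy number (positive correlation) or is suppressed by methylation (negative correlation). It assumes Pearson correlation and a hypothetical 0.3 cutoff, since the abstract names neither the correlation measure nor a threshold; the gene names and values are invented toy data.

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def classify_drivers(
    expression: Dict[str, List[float]],
    scna: Dict[str, List[float]],
    methylation: Dict[str, List[float]],
    cutoff: float = 0.3,  # hypothetical threshold
) -> Dict[str, List[str]]:
    """Flag genes whose expression follows SCNA (r > cutoff)
    or falls with methylation (r < -cutoff)."""
    hits: Dict[str, List[str]] = {"scna_driven": [], "methylation_driven": []}
    for gene, expr in expression.items():
        if pearson(scna[gene], expr) > cutoff:
            hits["scna_driven"].append(gene)
        if pearson(methylation[gene], expr) < -cutoff:
            hits["methylation_driven"].append(gene)
    return hits


# Toy values for three hypothetical receptors across five tumour samples.
expression = {
    "GPCR_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "GPCR_B": [5.0, 4.0, 3.0, 2.0, 1.0],
    "GPCR_C": [2.0, 2.5, 1.5, 2.2, 1.8],
}
scna = {
    "GPCR_A": [-1.0, 0.0, 0.5, 1.0, 2.0],
    "GPCR_B": [0.0, 0.1, 0.0, 0.1, 0.0],
    "GPCR_C": [0.3, -0.2, 0.1, -0.3, 0.2],
}
methylation = {
    "GPCR_A": [0.5, 0.5, 0.4, 0.5, 0.5],
    "GPCR_B": [0.1, 0.3, 0.5, 0.7, 0.9],
    "GPCR_C": [0.4, 0.6, 0.5, 0.4, 0.6],
}
print(classify_drivers(expression, scna, methylation))
```

In a real pan-cancer analysis, each correlation would also need a significance test with multiple-testing correction before a gene is called SCNA- or methylation-driven.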
Affiliation(s)
- Shiqi Li
- College of Chemistry, Sichuan University, Chengdu, 610064, China.
- Xin Chen
- College of Chemistry, Sichuan University, Chengdu, 610064, China.
- Jianfang Chen
- College of Chemistry, Sichuan University, Chengdu, 610064, China.
- Binjian Wu
- College of Chemistry, Sichuan University, Chengdu, 610064, China.
- Jing Liu
- College of Chemistry, Sichuan University, Chengdu, 610064, China.
- Yanzhi Guo
- College of Chemistry, Sichuan University, Chengdu, 610064, China.
- Menglong Li
- College of Chemistry, Sichuan University, Chengdu, 610064, China.
- Xuemei Pu
- College of Chemistry, Sichuan University, Chengdu, 610064, China.

16
The Cancermuts software package for the prioritization of missense cancer variants: a case study of AMBRA1 in melanoma. Cell Death Dis 2022; 13:872. [PMID: 36243772 PMCID: PMC9569343 DOI: 10.1038/s41419-022-05318-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 05/25/2022] [Revised: 09/27/2022] [Accepted: 10/03/2022] [Indexed: 11/07/2022]
Abstract
Cancer genomics and cancer mutation databases have made available a wealth of information about missense mutations found in cancer patient samples. Contextualizing these mutations through annotation and predicting the effect of each amino acid change helps identify which ones are more likely to have a pathogenic impact. These can then be validated with experimental approaches that assess the impact of protein mutations on cellular functions or their tumorigenic potential. Here, we propose the integrative bioinformatic approach Cancermuts, implemented as a Python package. Cancermuts can gather known missense cancer mutations from databases such as cBioPortal and COSMIC and annotate them with the pathogenicity score REVEL as well as information on their source. It can also add annotations about the protein context in which these mutations are found, such as post-translational modification sites, structured/unstructured regions, presence of short linear motifs, and more. We applied Cancermuts to the intrinsically disordered protein AMBRA1, a key regulator of many cellular processes frequently deregulated in cancer. In this way, we classified mutations of AMBRA1 in melanoma, where AMBRA1 is highly mutated and displays a tumor-suppressive role. Next, based on REVEL score, position along the sequence, and local context, we applied cellular and molecular approaches to validate the predicted pathogenicity of a subset of mutations in an in vitro melanoma model. By doing so, we identified two AMBRA1 mutations that show enhanced tumorigenic potential and are worth further investigation, highlighting the usefulness of the tool. Cancermuts can be used on any protein target starting from minimal information, and it is available at https://www.github.com/ELELAB/cancermuts as free software.
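Cancermuts itself is a Python package, and its API is not reproduced here. As a hedged, library-free sketch of the prioritization idea this abstract describes, the snippet below parses one-letter missense changes, then ranks annotated variants by REVEL score and keeps those that also fall in a context of interest such as a short linear motif (SLiM). The `Variant` record, the `prioritize` helper, the example variants and their scores, and the 0.5 REVEL cutoff are all hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import FrozenSet, List

# One-letter amino-acid change such as "R123W" (reference, position, alternate).
MISSENSE = re.compile(r"^([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])$")


@dataclass(frozen=True)
class Variant:
    change: str   # protein change, e.g. "S10F" (hypothetical)
    revel: float  # REVEL pathogenicity score in [0, 1]
    contexts: FrozenSet[str] = field(default_factory=frozenset)

    @property
    def position(self) -> int:
        match = MISSENSE.match(self.change)
        if match is None:
            raise ValueError(f"not a missense change: {self.change}")
        return int(match.group(2))


def prioritize(variants: List[Variant], min_revel: float = 0.5,
               required_context: str = "SLiM") -> List[Variant]:
    """Keep variants above the REVEL cutoff in the given context, highest score first."""
    kept = [v for v in variants
            if v.revel >= min_revel and required_context in v.contexts]
    return sorted(kept, key=lambda v: (-v.revel, v.position))


# Invented example variants; these are not real AMBRA1 annotations.
variants = [
    Variant("S10F", 0.82, frozenset({"SLiM", "disordered"})),
    Variant("P45L", 0.91, frozenset({"structured"})),
    Variant("E200K", 0.55, frozenset({"SLiM", "PTM site"})),
    Variant("G300D", 0.30, frozenset({"SLiM"})),
]
for v in prioritize(variants):
    print(v.change, v.position, v.revel)
```

Filtering on both score and context mirrors the abstract's point that pathogenicity predictions are more useful when read alongside the local protein context.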