1
|
Valerio M, Inno A, Zambelli A, Cortesi L, Lorusso D, Viassolo V, Verzè M, Nicolis F, Gori S. Deep Neural Network Integrated into Network-Based Stratification (D3NS): A Method to Uncover Cancer Subtypes from Somatic Mutations. Cancers (Basel) 2024; 16:2845. [PMID: 39199616 PMCID: PMC11352240 DOI: 10.3390/cancers16162845] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2024] [Revised: 08/06/2024] [Accepted: 08/12/2024] [Indexed: 09/01/2024] Open
Abstract
(1) Background: The identification of tumor subtypes is fundamental in precision medicine for accurate diagnoses and personalized therapies. Cancer development is often driven by the accumulation of somatic mutations that can cause alterations in tissue functions and morphologies. In this work, a method based on a deep neural network integrated into a network-based stratification framework (D3NS) is proposed to stratify tumors according to somatic mutations. (2) Methods: This approach leverages the power of deep neural networks to detect hidden information in the data by combining the knowledge contained in a network of gene interactions, as typical of network-based stratification methods. D3NS was applied using real-world data from The Cancer Genome Atlas for bladder, ovarian, and kidney cancers. (3) Results: This technique allows for the identification of tumor subtypes characterized by different survival rates and significant associations with several clinical outcomes (tumor stage, grade or response to therapy). (4) Conclusion: D3NS can provide a base model in cancer research and could be considered as a useful tool for tumor stratification, offering potential support in clinical settings.
Affiliation(s)
- Matteo Valerio
- Medical Oncology, IRCCS Sacro Cuore Don Calabria Hospital, 37024 Negrar di Valpolicella, Verona, Italy
| | - Alessandro Inno
- Medical Oncology, IRCCS Sacro Cuore Don Calabria Hospital, 37024 Negrar di Valpolicella, Verona, Italy
| | - Alberto Zambelli
- Medical Oncology Unit, IRCCS Istituto Clinico Humanitas and Department of Biomedical Sciences, Humanitas University, 20089 Rozzano, Milan, Italy
| | - Laura Cortesi
- Oncology, Hematology, and Respiratory Diseases, Azienda Ospedaliera-Universitaria, Policlinico di Modena, 41124 Modena, Italy
| | - Domenica Lorusso
- Gynecologic Oncology Unit, Humanitas San Pio X, Milan and Humanitas University, Pieve Emanuele, 20090 Milan, Italy
| | - Valeria Viassolo
- Medical Genetics, Medical Direction, IRCCS Sacro Cuore Don Calabria Hospital, 37024 Negrar di Valpolicella, Verona, Italy
| | - Matteo Verzè
- Medical Direction, IRCCS Sacro Cuore Don Calabria Hospital, 37024 Negrar di Valpolicella, Verona, Italy
| | - Fabrizio Nicolis
- Medical Direction, IRCCS Sacro Cuore Don Calabria Hospital, 37024 Negrar di Valpolicella, Verona, Italy
| | - Stefania Gori
- Medical Oncology, IRCCS Sacro Cuore Don Calabria Hospital, 37024 Negrar di Valpolicella, Verona, Italy
2
|
Zou M, Li H, Su D, Xiong Y, Wei H, Wang S, Sun H, Wang T, Xi Q, Zuo Y, Yang L. Integrating somatic mutation profiles with structural deep clustering network for metabolic stratification in pancreatic cancer: a comprehensive analysis of prognostic and genomic landscapes. Brief Bioinform 2023; 25:bbad430. [PMID: 38040491 PMCID: PMC10783866 DOI: 10.1093/bib/bbad430] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2023] [Revised: 09/29/2023] [Accepted: 11/05/2023] [Indexed: 12/03/2023] Open
Abstract
Pancreatic cancer is a globally recognized highly aggressive malignancy, posing a significant threat to human health and characterized by pronounced heterogeneity. In recent years, researchers have uncovered that the development and progression of cancer are often attributed to the accumulation of somatic mutations within cells. However, cancer somatic mutation data exhibit characteristics such as high dimensionality and sparsity, which pose new challenges in utilizing these data effectively. In this study, we propagated the discrete somatic mutation data of pancreatic cancer through a network propagation model based on protein-protein interaction networks. This resulted in smoothed somatic mutation profile data that incorporate protein network information. Based on this smoothed mutation profile data, we obtained the activity levels of different metabolic pathways in pancreatic cancer patients. Subsequently, using the activity levels of various metabolic pathways in cancer patients, we employed a deep clustering algorithm to establish biologically and clinically relevant metabolic subtypes of pancreatic cancer. Our study holds scientific significance in classifying pancreatic cancer based on somatic mutation data and may provide a crucial theoretical basis for the diagnosis and immunotherapy of pancreatic cancer patients.
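The propagation step described in this abstract can be illustrated with a minimal sketch. It assumes the random-walk-with-restart update commonly used in network-based stratification, F ← αFW + (1 − α)F0, which the abstract itself does not spell out. The function name `propagate`, the restart weight `alpha=0.7`, and the toy three-gene network are hypothetical and are not taken from the paper.

```python
import numpy as np

def propagate(mutations, adjacency, alpha=0.7, tol=1e-6, max_iter=100):
    """Smooth a patients x genes binary mutation matrix over a gene network.

    Assumed update rule (not from the abstract): F <- alpha * F @ W + (1 - alpha) * F0,
    where W is the adjacency matrix with each column divided by its degree.
    """
    f0 = np.asarray(mutations, dtype=float)
    adj = np.asarray(adjacency, dtype=float)
    degree = adj.sum(axis=0)
    degree[degree == 0] = 1.0          # avoid division by zero for isolated genes
    norm_adj = adj / degree            # column j is divided by degree[j]
    f = f0.copy()
    for _ in range(max_iter):
        f_next = alpha * f @ norm_adj + (1 - alpha) * f0
        if np.linalg.norm(f_next - f) < tol:
            return f_next
        f = f_next
    return f

# Toy example: a chain of three genes (0 - 1 - 2), one patient mutated in gene 0.
toy_network = [[0, 1, 0],
               [1, 0, 1],
               [0, 1, 0]]
smoothed = propagate([[1, 0, 0]], toy_network)
print(smoothed)  # gene 0 keeps the highest score; genes 1 and 2 receive part of the signal
```

The smoothed profile is dense rather than sparse, which is what makes downstream steps such as pathway activity scoring or clustering feasible on mutation data.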
Affiliation(s)
- Min Zou
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
| | - Honghao Li
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
| | - Dongqing Su
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
| | - Yuqiang Xiong
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
| | - Haodong Wei
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
| | - Shiyuan Wang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
| | - Hongmei Sun
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
| | - Tao Wang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
| | - Qilemuge Xi
- The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot 010070, China
| | - Yongchun Zuo
- The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot 010070, China
- Digital College, Inner Mongolia Intelligent Union Big Data Academy, Inner Mongolia Wesure Date Technology Co., Ltd. Hohhot 010010, China
- Inner Mongolia International Mongolian Hospital, Hohhot 010065, China
| | - Lei Yang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
3
|
Petti M, Farina L. Network medicine for patients' stratification: From single-layer to multi-omics. WIREs Mech Dis 2023; 15:e1623. [PMID: 37323106 DOI: 10.1002/wsbm.1623] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2022] [Revised: 03/08/2023] [Accepted: 05/30/2023] [Indexed: 06/17/2023]
Abstract
Precision medicine research increasingly relies on the integrated analysis of multiple types of omics. In the era of big data, the large availability of different health-related information represents a great, but at the same time untapped, chance with a potentially fundamental role in the prevention, diagnosis and prognosis of diseases. Computational methods are needed to combine this data to create a comprehensive view of a given disease. Network science can model biomedical data in terms of relationships among molecular players of different nature and has been successfully proposed as a new paradigm for studying human diseases. Patient stratification is an open challenge aimed at identifying subtypes with different disease manifestations, severity, and expected survival time. Several stratification approaches based on high-throughput gene expression measurements have been successfully applied. However, few attempts have been proposed to exploit the integration of various genotypic and phenotypic data to discover novel sub-types or improve the detection of known groupings. This article is categorized under: Cancer > Biomedical Engineering Cancer > Computational Models Cancer > Genetics/Genomics/Epigenetics.
Affiliation(s)
- Manuela Petti
- Department of Computer, Control and Management Engineering, Sapienza University of Rome, Rome, Italy
| | - Lorenzo Farina
- Department of Computer, Control and Management Engineering, Sapienza University of Rome, Rome, Italy
4
|
Joshi S, Natteshan NVS, Rastogi R, Sampathkumar A, Pandimurugan V, Sountharrajan S. A novel artificial intelligence approach to detect the breast cancer using KNNet technique with EPM gene profiling. Funct Integr Genomics 2023; 23:302. [PMID: 37721631 DOI: 10.1007/s10142-023-01227-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2023] [Revised: 08/15/2023] [Accepted: 09/02/2023] [Indexed: 09/19/2023]
Abstract
Breast cancer is the most frequent type of cancer in women, second only to lung cancer. This paper summarizes changes in genomics and epigenetics and incremental biological activities. A tumour develops through a series of phases, each involving a separate abnormal gene. Even though many diseases cause DNA mutations, most treatments are designed to relieve symptoms rather than change the DNA. Clustered regularly interspaced short palindromic repeats (CRISPR)/Cas9 is the primary approach for discovering and confirming tumorigenic genomic targets. A Kohonen neural network with an expression programming model was developed for gene selection. The main problem in gene selection is reducing the number of features chosen while maintaining accuracy. This purpose is accomplished systematically. In the end, the proposed method performed better than the existing quantum squirrel-inspired algorithm and the recurrent neural network oppositional call search algorithm for gene selection. The KNNet-EPM model used an expression programming approach to identify gene biomarkers for breast cancer. The method achieved an RAE of 42%, sensitivity of 93%, f1 score of 88%, accuracy of 98%, kappa score of 83%, specificity of 92% and MAE of 30%.
Affiliation(s)
- Shubham Joshi
- Department of Computer Science Engineering, Symbiosis Institute of Technology, Symbiosis International (Deemed) University, Pune, India
| | - N V S Natteshan
- School of Computing, Kalasalingam Academy of Research and Education, Krishnan Koil, TN, India
| | - Ravi Rastogi
- Department of CSE, Koneru Lakshmaiah Education Foundation, Vaddeswaram, AP, India
| | - A Sampathkumar
- Department of Applied Cybernetics, Faculty of Science, University of Hradec Kralove, Hradec Kralove, Czech Republic.
| | - V Pandimurugan
- School of Computing, Department of Networking and Communications, SRMIST, Kattankulathur Campus, Chennai, 603203, India
| | - S Sountharrajan
- Department of Computer Science and Engineering, Amrita School of Computing, Amrita Vishwa Vidyapeetham, Chennai, India
5
|
Sato M, Sato S, Shintani D, Hanaoka M, Ogasawara A, Miwa M, Yabuno A, Kurosaki A, Yoshida H, Fujiwara K, Hasegawa K. Clinical significance of metabolism-related genes and FAK activity in ovarian high-grade serous carcinoma. BMC Cancer 2022; 22:59. [PMID: 35027024 PMCID: PMC8756654 DOI: 10.1186/s12885-021-09148-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/11/2021] [Accepted: 12/22/2021] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND Administration of poly (ADP-ribose) polymerase (PARP) inhibitors after achieving a response to platinum-containing drugs significantly prolonged relapse-free survival compared to placebo administration. PARP inhibitors have been used in clinical practice. However, patients with platinum-resistant relapsed ovarian cancer still have a poor prognosis and there is an unmet need. The purpose of this study was to examine the clinical significance of metabolic genes and focal adhesion kinase (FAK) activity in advanced ovarian high-grade serous carcinoma (HGSC). METHODS The RNA sequencing (RNA-seq) data and clinical data of HGSC patients were obtained from the Genomic Data Commons (GDC) Data Portal and analysed ( https://portal.gdc.cancer.gov/ ). In addition, tumour tissue was sampled by laparotomy or screening laparoscopy prior to treatment initiation from patients diagnosed with stage IIIC ovarian cancer (International Federation of Gynecology and Obstetrics (FIGO) classification, 2014) at the Saitama Medical University International Medical Center, and among the patients diagnosed with HGSC, 16 cases of available cryopreserved specimens were included in this study. The present study was reviewed and approved by the Institutional Review Board of Saitama Medical University International Medical Center (Saitama, Japan). Among the 6307 variable genes detected in both The Cancer Genome Atlas-Ovarian (TCGA-OV) data and clinical specimen data, 35 genes related to metabolism and FAK activity were applied. RNA-seq data were analysed using the Subio Platform (Subio Inc, Japan). JMP 15 (SAS, USA) was used for statistical analysis and various types of machine learning. The Kaplan-Meier method was used for survival analysis, and the Wilcoxon test was used to analyse significant differences. P < 0.05 was considered significant. 
RESULTS In the TCGA-OV data, patients with stage IIIC with a residual tumour diameter of 1-10 mm were selected for K means clustering and classified into groups with significant prognostic correlations (p = 0.0444). These groups were significantly associated with platinum sensitivity/resistance in clinical cases (χ2 test, p = 0.0408) and showed significant relationships with progression-free survival (p = 0.0307). CONCLUSION In the TCGA-OV data, 2 groups classified by clustering focusing on metabolism-related genes and FAK activity were shown to be associated with platinum resistance and a poor prognosis.
Affiliation(s)
- Masakazu Sato
- Department of Gynecologic Oncology, Saitama Medical University International Medical Center, 1397-1 Yamane, Hidaka, Saitama, 350-1298, Japan.
| | - Sho Sato
- Department of Gynecologic Oncology, Saitama Medical University International Medical Center, 1397-1 Yamane, Hidaka, Saitama, 350-1298, Japan
| | - Daisuke Shintani
- Department of Gynecologic Oncology, Saitama Medical University International Medical Center, 1397-1 Yamane, Hidaka, Saitama, 350-1298, Japan
| | - Mieko Hanaoka
- Department of Gynecologic Oncology, Saitama Medical University International Medical Center, 1397-1 Yamane, Hidaka, Saitama, 350-1298, Japan
| | - Aiko Ogasawara
- Department of Gynecologic Oncology, Saitama Medical University International Medical Center, 1397-1 Yamane, Hidaka, Saitama, 350-1298, Japan
| | - Maiko Miwa
- Department of Gynecologic Oncology, Saitama Medical University International Medical Center, 1397-1 Yamane, Hidaka, Saitama, 350-1298, Japan
| | - Akira Yabuno
- Department of Gynecologic Oncology, Saitama Medical University International Medical Center, 1397-1 Yamane, Hidaka, Saitama, 350-1298, Japan
| | - Akira Kurosaki
- Department of Gynecologic Oncology, Saitama Medical University International Medical Center, 1397-1 Yamane, Hidaka, Saitama, 350-1298, Japan
| | - Hiroyuki Yoshida
- Department of Gynecologic Oncology, Saitama Medical University International Medical Center, 1397-1 Yamane, Hidaka, Saitama, 350-1298, Japan
| | | | - Kosei Hasegawa
- Department of Gynecologic Oncology, Saitama Medical University International Medical Center, 1397-1 Yamane, Hidaka, Saitama, 350-1298, Japan
6
|
Network Approaches for Precision Oncology. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2022; 1361:199-213. [DOI: 10.1007/978-3-030-91836-1_11] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
7
|
Liu Y, Xie G, Li A, He Z, Hei X. Prediction of Cancer-Related piRNAs Based on Network-Based Stratification Analysis. INT J PATTERN RECOGN 2021. [DOI: 10.1142/s0218001422590029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
PIWI-interacting RNA (piRNA) was discovered in 2006 and is expected to become a new biomarker for the diagnosis and prognosis of various diseases. The purpose of this study is to explore the functions of piRNAs and identify cancer subtypes on the basis of patterns in transcriptome and somatic mutation data. A total of 285 510 SNPs in piRNAs and genes that might affect piRNA biogenesis or piRNA target binding were identified. Significant co-expression networks of piRNAs were then constructed separately for 12 major types of cancer. Finally, mutational matrices were mapped to the piRNA network, propagated, and clustered to identify cancer-related piRNAs and cancer subtypes. Subtypes significantly associated with survival were identified for three types of cancer (COAD, STAD and UCEC). Analysis of differentially expressed piRNAs in UCEC subtypes showed that piRNA function is closely related to the cancer hallmark “Enabling Replicative Immortality” and contributes to the initiation of cancer.
Affiliation(s)
- Yajun Liu
- Shaanxi Key Laboratory for Network Computing and Security Technology, School of Computer Science and Engineering, Xi’an University of Technology, Xi’an 710048, Shaanxi, P. R. China
| | - Guo Xie
- Shaanxi Key Laboratory of Complex System Control and Intelligent Information Processing, School of Information Technology and Equipment Engineering, Xi’an University of Technology, P. R. China
| | - Aimin Li
- Shaanxi Key Laboratory for Network Computing and Security Technology, School of Computer Science and Engineering, Xi’an University of Technology, Xi’an 710048, Shaanxi, P. R. China
| | - Zongzhen He
- Xi’an University of Finance and Economics, Xi’an 710100, Shaanxi, P. R. China
| | - Xinhong Hei
- Shaanxi Key Laboratory for Network Computing and Security Technology, School of Computer Science and Engineering, Xi’an University of Technology, Xi’an 710048, Shaanxi, P. R. China
8
|
Jiang Q, Jin M. Feature Selection for Breast Cancer Classification by Integrating Somatic Mutation and Gene Expression. Front Genet 2021; 12:629946. [PMID: 33719339 PMCID: PMC7952975 DOI: 10.3389/fgene.2021.629946] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/16/2020] [Accepted: 01/21/2021] [Indexed: 01/26/2023] Open
Abstract
Exploring the molecular mechanisms of breast cancer is essential for the early prediction, diagnosis, and treatment of cancer patients. The large scale of data obtained from high-throughput sequencing technology makes it difficult to identify the driver mutations and a minimal optimal set of genes that are critical to the classification of cancer. In this study, we propose a novel method that requires no prior information to identify mutated genes associated with breast cancer. The somatic mutation data are processed into a mutation matrix, from which the mutation frequency of each gene can be obtained. By setting a reasonable threshold for the mutation frequency, a mutated gene set is filtered from the mutation matrix. The gene expression data are used to generate the gene expression matrix, and the mutated gene set is mapped onto this matrix to construct a co-expression profile. In the feature selection stage, we propose a staged feature selection algorithm that uses fold change and false discovery rate to select differentially expressed genes, mutual information to remove irrelevant and redundant features, and an embedded method based on a gradient boosting decision tree with Bayesian optimization to obtain an optimal model. In the evaluation stage, we propose a weighted metric that modifies the traditional accuracy to address the sample imbalance problem. We apply the proposed method to The Cancer Genome Atlas breast cancer data and identify a mutated gene set, among which the implicated genes are oncogenes or tumor suppressors previously reported to be associated with carcinogenesis. As a comparison with the integrative network, we also run the optimal model on the gene expression data alone and on the gold standard PAM50. The results show that the integrative network outperforms the gene expression data and PAM50 on average in most metrics, which indicates the effectiveness of our proposed method of integrating multiple data sources and shows that it can discover the associated mutated genes in breast cancer.
Affiliation(s)
- Qin Jiang
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
| | - Min Jin
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
9
|
He Z, Zhang J, Yuan X, Zhang Y. Integrating Somatic Mutations for Breast Cancer Survival Prediction Using Machine Learning Methods. Front Genet 2021; 11:632901. [PMID: 33537063 PMCID: PMC7848170 DOI: 10.3389/fgene.2020.632901] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/24/2020] [Accepted: 12/30/2020] [Indexed: 12/13/2022] Open
Abstract
Breast cancer is the most common malignancy in women, and because it has a high mortality rate, computational methods that increase the accuracy of breast cancer survival prediction models are urgently needed. Although multi-omics data such as gene expression have been extensively used in recent studies, the accurate prognosis of breast cancer remains a challenge. Somatic mutations are another important and promising data source for studying cancer development, and their effect on the prognosis of breast cancer remains to be further explored. Meanwhile, these omics datasets are high-dimensional and redundant. Therefore, we adopted multiple kernel learning (MKL) to efficiently integrate somatic mutations with current molecular data, including gene expression, copy number variation (CNV), methylation, and protein expression data, for the prediction of breast cancer survival. Before integration, the maximum relevance minimum redundancy (mRMR) feature selection method was used to select, for each type of data, features that show high relevance to survival and low redundancy among themselves. The experimental results demonstrated that the proposed method achieved the best performance and that prediction performance improved markedly when somatic mutations were included, indicating that somatic mutations are critical for improving breast cancer survival predictions. Moreover, mRMR was superior to other feature selection methods used in previous studies. Furthermore, MKL outperformed the other traditional classifiers in multi-omics data integration. Our analysis indicated that by employing promising omics data such as somatic mutations and harnessing proper feature selection methods and effective integration frameworks, the accuracy of breast cancer survival prediction can be further increased, thereby supporting better clinical diagnosis and more effective treatment for breast cancer patients.
Affiliation(s)
- Zongzhen He
- School of Computer Science and Technology, Xidian University, Xi'an, China
| | - Junying Zhang
- School of Computer Science and Technology, Xidian University, Xi'an, China
| | - Xiguo Yuan
- School of Computer Science and Technology, Xidian University, Xi'an, China
| | - Yuanyuan Zhang
- School of Information and Control Engineering, Qingdao University of Technology, Qingdao, China
10
|
Özcan Şimşek NÖ, Özgür A, Gürgen F. Statistical representation models for mutation information within genomic data. BMC Bioinformatics 2019; 20:324. [PMID: 31195961 PMCID: PMC6567431 DOI: 10.1186/s12859-019-2868-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 11/15/2018] [Accepted: 04/30/2019] [Indexed: 12/27/2022] Open
Abstract
BACKGROUND As DNA sequencing technologies improve and become cheaper, genomic data can be used for the diagnosis of many diseases such as cancer. Raw human genome data are very large for computational systems. Therefore, there is a need for a compact and accurate representation of the valuable information in DNA. Complex genetic disorders often result from multiple gene mutations, and the mutations do not contribute equally to the development of a disease. Inspired by the field of information retrieval, we propose using the term frequency (tf) and BM25 term weighting measures with the inverse document frequency (idf) and relevance frequency (rf) measures to weight genes based on their mutations. The underlying assumption is that the more mutations a gene has in patients with a certain disease and the fewer mutations it has in other patients, the more discriminative that gene is. RESULTS We evaluated the proposed representations on the task of cancer type classification. We applied various machine learning techniques using the tf-idf and tf-rf schemes and their BM25 versions. Our results show that the BM25-tf-rf representation leads to improved classification accuracy and f-score values compared to the other representations. The highest accuracy (76.44%) and f-score (76.95%) are achieved with the BM25-tf-rf based data representation. CONCLUSIONS In our experiments, the BM25-tf-rf scheme combined with the proposed neural network model is shown to be the best performing classification system for our case study of cancer type classification. This system is further utilized for causal gene analysis. Several of the most influential genes used for decision making are reported in the literature as target or causal genes.
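The gene weighting idea in this abstract can be illustrated with a minimal sketch of plain tf-idf only. It assumes tf is the raw mutation count of a gene in a patient and idf is log(N / df), with df the number of patients carrying at least one mutation in that gene. The BM25 and rf variants that the abstract reports as best performing are not reproduced here, and the function name `tf_idf_gene_weights` and the toy counts are hypothetical.

```python
import numpy as np

def tf_idf_gene_weights(counts):
    """Weight a patients x genes mutation-count matrix with tf-idf.

    Assumptions (not from the abstract): tf is the raw count, and
    idf = log(n_patients / df), where df is the number of patients in
    which the gene is mutated at least once.
    """
    counts = np.asarray(counts, dtype=float)
    n_patients = counts.shape[0]
    df = (counts > 0).sum(axis=0)
    idf = np.log(n_patients / np.maximum(df, 1))  # guard against genes never mutated
    return counts * idf                           # idf broadcasts across patients

# Toy example: gene 0 is mutated in one of three patients, gene 1 in all three.
weights = tf_idf_gene_weights([[2, 1],
                               [0, 1],
                               [0, 3]])
print(weights)  # gene 1 gets zero weight everywhere because it does not discriminate
```

A gene mutated in every patient receives an idf of zero under this scheme, which matches the intuition that ubiquitous mutations carry little discriminative information.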
Affiliation(s)
| | - Arzucan Özgür
- Department of Computer Engineering, Boğaziçi University, İstanbul, Turkey.
| | - Fikret Gürgen
- Department of Computer Engineering, Boğaziçi University, İstanbul, Turkey.
11
|
Ruffalo M, Bar-Joseph Z. Protein interaction disruption in cancer. BMC Cancer 2019; 19:370. [PMID: 31014259 PMCID: PMC6823625 DOI: 10.1186/s12885-019-5532-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 08/14/2018] [Accepted: 03/27/2019] [Indexed: 12/18/2022] Open
Abstract
Background Most methods that integrate network and mutation data to study cancer focus on the effects of genes/proteins, quantifying the effect of mutations or differential expression of a gene and its neighbors, or identifying groups of genes that are significantly up- or down-regulated. However, several mutations are known to disrupt specific protein-protein interactions, and network dynamics are often ignored by such methods. Here we introduce a method that allows for predicting the disruption of specific interactions in cancer patients using somatic mutation data and protein interaction networks. Methods We extend standard network smoothing techniques to assign scores to the edges in a protein interaction network in addition to nodes. We use somatic mutations as input to our modified network smoothing method, producing scores that quantify the proximity of each edge to somatic mutations in individual samples. Results Using breast cancer mutation data, we show that predicted edges are significantly associated with patient survival and known ligand binding site mutations. In-silico analysis of protein binding further supports the ability of the method to infer novel disrupted interactions and provides a mechanistic explanation for the impact of mutations on key pathways. Conclusions Our results show the utility of our method both in identifying disruptions of protein interactions from known ligand binding site mutations, and in selecting novel clinically significant interactions. Supporting website with software and data: https://www.cs.cmu.edu/~mruffalo/mut-edge-disrupt/.
Affiliation(s)
- Matthew Ruffalo
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA, 15213, USA
| | - Ziv Bar-Joseph
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA, 15213, USA; Machine Learning Department, School of Computer Science, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA, 15213, USA.
12
|
Menor M, Zhu Y, Wang Y, Zhang J, Jiang B, Deng Y. Development of somatic mutation signatures for risk stratification and prognosis in lung and colorectal adenocarcinomas. BMC Med Genomics 2019; 12:24. [PMID: 30704450 PMCID: PMC6357362 DOI: 10.1186/s12920-018-0454-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 12/30/2022] Open
Abstract
Background Prognostic signatures are vital to precision medicine. However, development of somatic mutation prognostic signatures for cancers remains a challenge. In this study we developed a novel method for discovering somatic mutation based prognostic signatures. Results Somatic mutation and clinical data for lung adenocarcinoma (LUAD) and colorectal adenocarcinoma (COAD) from The Cancer Genome Atlas (TCGA) were randomly divided into training (n = 328 for LUAD and 286 for COAD) and validation (n = 167 for LUAD and 141 for COAD) datasets. In the proposed method, the log2 ratio of the tumor mutation frequency to the paired normal mutation frequency is computed for each patient and missense mutation. The missense mutation ratios were mean aggregated into gene-level somatic mutation profiles. The somatic mutations were assessed using univariate Cox analysis on the LUAD and COAD training sets separately. Stepwise multivariate Cox analysis resulted in a final gene prognostic signature for LUAD and COAD. Performance was compared to gene prognostic signatures generated using the same pipeline but with different somatic mutation profile representations based on tumor mutation frequency, binary calls, and gene-gene network normalization. Signature high-risk LUAD and COAD cases had worse overall survival compared to the signature low-risk cases in the validation set (log-rank test p-value = 0.0101 for LUAD and 0.0314 for COAD) using mutation tumor frequency ratio (MFR) profiles, while all other methods, including gene-gene network normalization, gave statistically insignificant stratification (log-rank test p-value ≥0.05). Most of the genes in the final gene signatures using MFR profiles are cancer-related based on network and literature analysis. Conclusions We demonstrated the robustness of MFR profiles and their potential to be a powerful prognostic tool in cancer. The results are robust according to validation testing and the selected genes are biologically relevant.
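The MFR profile construction described in this abstract can be illustrated with a minimal sketch for a single patient. It assumes each missense mutation comes with a tumor and a paired normal mutation frequency and that a small pseudocount is added to avoid division by zero; the pseudocount, the function name `gene_level_mfr`, and the example values are hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict

def gene_level_mfr(mutations, pseudocount=1e-3):
    """Aggregate per-mutation log2 frequency ratios into gene-level values.

    `mutations` is an iterable of (gene, tumor_freq, normal_freq) tuples for
    one patient. The pseudocount is an assumption of this sketch.
    """
    per_gene = defaultdict(list)
    for gene, tumor_freq, normal_freq in mutations:
        ratio = math.log2((tumor_freq + pseudocount) / (normal_freq + pseudocount))
        per_gene[gene].append(ratio)
    # Mean aggregation over all missense mutations that fall in the same gene.
    return {gene: sum(values) / len(values) for gene, values in per_gene.items()}

# Toy example with made-up frequencies for one patient.
profile = gene_level_mfr([
    ("TP53", 0.4, 0.1),
    ("TP53", 0.2, 0.1),
    ("KRAS", 0.5, 0.5),
])
print(profile)
```

Stacking such per-patient dictionaries into a patients x genes matrix would give the kind of continuous mutation profile that a Cox analysis can take as input.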
Affiliation(s)
- Mark Menor
  - Department of Complementary & Integrative Medicine, University of Hawaii John A. Burns School of Medicine, Honolulu, HI, USA
- Yong Zhu
  - National Medical Centre of Colorectal Disease, The Third Affiliated Hospital of Nanjing University of Chinese Medicine, Nanjing, People's Republic of China
- Yu Wang
  - Department of Complementary & Integrative Medicine, University of Hawaii John A. Burns School of Medicine, Honolulu, HI, USA; Department of Oncology, The Third Affiliated Hospital of Nanjing University of Chinese Medicine, Nanjing, 210001, Jiangsu Province, China
- Jicai Zhang
  - Department of Laboratory Medicine, Shiyan Taihe Hospital, College of Biomedical Engineering, Hubei University of Medicine, Shiyan, Hubei, 442000, People's Republic of China
- Bin Jiang
  - National Medical Centre of Colorectal Disease, The Third Affiliated Hospital of Nanjing University of Chinese Medicine, Nanjing, People's Republic of China
- Youping Deng
  - Department of Complementary & Integrative Medicine, University of Hawaii John A. Burns School of Medicine, Honolulu, HI, USA
|
13
|
Ozturk K, Dow M, Carlin DE, Bejar R, Carter H. The Emerging Potential for Network Analysis to Inform Precision Cancer Medicine. J Mol Biol 2018; 430:2875-2899. [PMID: 29908887 PMCID: PMC6097914 DOI: 10.1016/j.jmb.2018.06.016] [Citation(s) in RCA: 53] [Impact Index Per Article: 8.8] [Received: 03/12/2018] [Revised: 05/30/2018] [Accepted: 06/06/2018] [Indexed: 12/19/2022]
Abstract
Precision cancer medicine promises to tailor clinical decisions to patients using genomic information. Indeed, successes of drugs targeting genetic alterations in tumors, such as imatinib, which targets BCR-ABL in chronic myelogenous leukemia, have demonstrated the power of this approach. However, biological systems are complex, and patients may differ not only by the specific genetic alterations in their tumor, but also by more subtle interactions among such alterations. Systems biology, and more specifically network analysis, provides a framework for advancing precision medicine beyond clinical actionability of individual mutations. Here we discuss applications of network analysis to study tumor biology, early methods for N-of-1 tumor genome analysis, and the path for such tools to the clinic.
Affiliation(s)
- Kivilcim Ozturk
  - Department of Medicine, Division of Medical Genetics, University of California San Diego, La Jolla, CA 92093, USA; Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA 92093, USA
- Michelle Dow
  - Department of Medicine, Division of Medical Genetics, University of California San Diego, La Jolla, CA 92093, USA; Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA 92093, USA
- Daniel E Carlin
  - Department of Medicine, Division of Medical Genetics, University of California San Diego, La Jolla, CA 92093, USA
- Rafael Bejar
  - Moores Cancer Center, Division of Hematology and Oncology, University of California San Diego, La Jolla, CA 92093, USA
- Hannah Carter
  - Department of Medicine, Division of Medical Genetics, University of California San Diego, La Jolla, CA 92093, USA; Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA 92093, USA; Moores Cancer Center and Institute for Genomic Medicine, University of California San Diego, La Jolla, CA 92093, USA; CIFAR, MaRS Centre, West Tower, 661 University Ave., Suite 505, Toronto, ON M5G 1M1, Canada
|