1
Belova T, Biondi N, Hsieh PH, Lutsik P, Chudasama P, Kuijjer ML. Heterogeneity in the gene regulatory landscape of leiomyosarcoma. NAR Cancer 2023; 5:zcad037. [PMID: 37492373 PMCID: PMC10365024 DOI: 10.1093/narcan/zcad037] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/09/2023] [Revised: 07/06/2023] [Accepted: 07/18/2023] [Indexed: 07/27/2023]
Abstract
Characterizing inter-tumor heterogeneity is crucial for selecting suitable cancer therapy, as the presence of diverse molecular subgroups of patients can be associated with disease outcome or response to treatment. While cancer subtypes are often characterized by differences in gene expression, the mechanisms driving these differences are generally unknown. We set out to model the regulatory mechanisms driving sarcoma heterogeneity based on patient-specific, genome-wide gene regulatory networks. We developed a new computational framework, PORCUPINE, which combines knowledge on biological pathways with permutation-based network analysis to identify pathways that exhibit significant regulatory heterogeneity across a patient population. We applied PORCUPINE to patient-specific leiomyosarcoma networks modeled on data from The Cancer Genome Atlas and validated our results in an independent dataset from the German Cancer Research Center. PORCUPINE identified 37 heterogeneously regulated pathways, including pathways representing potential targets for treatment of subgroups of leiomyosarcoma patients, such as FGFR and CTLA4 inhibitory signaling. We validated the detected regulatory heterogeneity through analysis of networks and chromatin states in leiomyosarcoma cell lines. We showed that the heterogeneity identified with PORCUPINE is not associated with methylation profiles or clinical features, thereby suggesting an independent mechanism of patient heterogeneity driven by the complex landscape of gene regulatory interactions.
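The abstract describes PORCUPINE only at a high level: it combines pathway knowledge with permutation-based network analysis. As a rough illustration of that general idea, the Python sketch below scores a gene set by the share of variance that the first principal component captures across patient-specific edge weights. It then compares that score with random gene sets of the same size. The data layout (an edges-by-patients matrix plus one target gene per edge), the statistic, and the function names are assumptions for illustration. This is not the published PORCUPINE implementation.

```python
import numpy as np


def pc1_variance_fraction(edge_weights: np.ndarray) -> float:
    """Share of total variance captured by the first principal component.

    edge_weights has shape (n_edges, n_patients): one row per regulatory edge,
    one column per patient-specific network (a hypothetical layout).
    """
    # Center each edge (feature) across patients before PCA.
    centered = edge_weights - edge_weights.mean(axis=1, keepdims=True)
    # Squared singular values are proportional to the PC variances.
    var = np.linalg.svd(centered, compute_uv=False) ** 2
    return float(var[0] / var.sum())


def pathway_permutation_test(weights, targets, pathway, n_perm=1000, seed=0):
    """Compare a pathway's PC1 variance share with random gene sets of equal size.

    weights: (n_edges, n_patients) edge weights; targets: target gene of each
    edge; pathway: collection of gene identifiers (all names are hypothetical).
    """
    rng = np.random.default_rng(seed)
    genes = np.unique(targets)
    observed = pc1_variance_fraction(weights[np.isin(targets, list(pathway))])
    null = np.empty(n_perm)
    for i in range(n_perm):
        random_genes = rng.choice(genes, size=len(pathway), replace=False)
        null[i] = pc1_variance_fraction(weights[np.isin(targets, random_genes)])
    # Add-one correction keeps the empirical p-value strictly positive.
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, float(p_value)
```

With `n_perm` permutations, the smallest attainable p-value is 1/(n_perm + 1). Testing many pathways would also call for multiple-testing correction.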
Affiliation(s)
- Tatiana Belova
  - Computational Biology and Systems Medicine Group, Centre for Molecular Medicine Norway, University of Oslo, Oslo, Norway
- Nicola Biondi
  - Precision Sarcoma Research Group, German Cancer Research Center (DKFZ) and National Center for Tumor Diseases, Heidelberg, Germany
- Ping-Han Hsieh
  - Computational Biology and Systems Medicine Group, Centre for Molecular Medicine Norway, University of Oslo, Oslo, Norway
- Pavlo Lutsik
  - Division of Cancer Epigenomics, German Cancer Research Center (DKFZ), Heidelberg, Germany
  - Department of Oncology, Catholic University (KU) Leuven, Leuven, Belgium
- Priya Chudasama
  - Precision Sarcoma Research Group, German Cancer Research Center (DKFZ) and National Center for Tumor Diseases, Heidelberg, Germany
- Marieke L Kuijjer
  - Computational Biology and Systems Medicine Group, Centre for Molecular Medicine Norway, University of Oslo, Oslo, Norway
  - Department of Pathology, Leiden University Medical Center, Leiden, the Netherlands
  - Leiden Center for Computational Oncology, Leiden University Medical Center, Leiden, the Netherlands
2
Marquardt A, Kollmannsberger P, Krebs M, Argentiero A, Knott M, Solimando AG, Kerscher AG. Visual Clustering of Transcriptomic Data from Primary and Metastatic Tumors-Dependencies and Novel Pitfalls. Genes (Basel) 2022; 13:genes13081335. [PMID: 35893071 PMCID: PMC9394300 DOI: 10.3390/genes13081335] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2022] [Revised: 07/20/2022] [Accepted: 07/23/2022] [Indexed: 02/06/2023]
Abstract
Personalized oncology is a rapidly evolving area and offers cancer patients therapy options that are more specific than ever. However, there is still a lack of understanding regarding transcriptomic similarities or differences between metastases and their corresponding primary sites. Applying two unsupervised dimension reduction methods (t-Distributed Stochastic Neighbor Embedding (t-SNE) and Uniform Manifold Approximation and Projection (UMAP)) to three datasets of metastases (n = 682 samples) with three different data transformations (unprocessed, log10, and log10 + 1 transformed values), we visualized potential underlying clusters. Additionally, we analyzed two datasets (n = 616 samples) containing metastases and primary tumors of one entity, to point out potential similarities. Using these methods, no tight link between the site of resection and cluster formation could be demonstrated, either for datasets consisting solely of metastases or for mixed datasets. Instead, the choice of dimension reduction method and data transformation significantly impacted visual clustering results. Our findings strongly suggest that data transformation should be considered another key element in the interpretation of visual clustering approaches, along with initialization and other parameters. Furthermore, the results highlight the need for a more thorough examination of the parameters used in cluster analysis.
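The abstract's point that data transformation alone can change visual clustering can be illustrated on a toy matrix. The sketch below assumes that "log10 + 1" means log10(x + 1). It shows that a plain log10 turns zero counts into -inf. It also shows that the nearest neighbor of a sample can change between raw and log-transformed values. The three-sample matrix is invented for illustration and is not from the study.

```python
import numpy as np


def transform(counts: np.ndarray, kind: str) -> np.ndarray:
    """Apply one of three transformations; 'log10p1' assumes log10(x + 1)."""
    if kind == "unprocessed":
        return counts.astype(float)
    if kind == "log10":
        # log10(0) is -inf, so zero counts must be handled before embedding.
        with np.errstate(divide="ignore"):
            return np.log10(counts.astype(float))
    if kind == "log10p1":
        return np.log10(counts.astype(float) + 1.0)
    raise ValueError(f"unknown transformation: {kind}")


def nearest_neighbor(x: np.ndarray, i: int) -> int:
    """Index of the sample closest to sample i in Euclidean distance."""
    d = np.linalg.norm(x - x[i], axis=1)
    d[i] = np.inf  # ignore the sample itself
    return int(np.argmin(d))


# Toy expression matrix: rows are samples a, b, c; columns are two genes.
counts = np.array([[1000, 1], [1050, 100], [500, 1]])
raw_nn = nearest_neighbor(transform(counts, "unprocessed"), 0)
log_nn = nearest_neighbor(transform(counts, "log10p1"), 0)
print(raw_nn, log_nn)  # prints: 1 2
```

On raw values the highly expressed gene dominates the distance. After log10(x + 1), the relative difference in the weakly expressed gene dominates instead. t-SNE and UMAP both build on local neighborhoods, so a change in nearest neighbors of this kind can carry over into a different embedding.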
Affiliation(s)
- André Marquardt
  - Institute of Pathology, Klinikum Stuttgart, 70174 Stuttgart, Germany
  - Institute of Pathology, University of Würzburg, 97080 Würzburg, Germany
  - Bavarian Center for Cancer Research (BZKF), 97080 Würzburg, Germany
  - Corresponding author
- Philip Kollmannsberger
  - Center for Computational and Theoretical Biology, University of Würzburg, 97074 Würzburg, Germany
- Markus Krebs
  - Comprehensive Cancer Center Mainfranken, University Hospital Würzburg, 97080 Würzburg, Germany
  - Department of Urology and Pediatric Urology, University Hospital Würzburg, 97080 Würzburg, Germany
- Antonella Argentiero
  - IRCCS Istituto Tumori “Giovanni Paolo II” of Bari, 70124 Bari, Italy
- Markus Knott
  - Department of Hematology, Oncology, Stem Cell Transplantation and Palliative Care, Klinikum Stuttgart, 70174 Stuttgart, Germany
  - Stuttgart Cancer Center–Tumor Unit Eva Mayr-Stihl, Klinikum Stuttgart, 70174 Stuttgart, Germany
- Antonio Giovanni Solimando
  - IRCCS Istituto Tumori “Giovanni Paolo II” of Bari, 70124 Bari, Italy
  - Guido Baccelli Unit of Internal Medicine, Department of Biomedical Sciences and Human Oncology, School of Medicine, Aldo Moro University of Bari, 70124 Bari, Italy
- Alexander Georg Kerscher
  - Comprehensive Cancer Center Mainfranken, University Hospital Würzburg, 97080 Würzburg, Germany
  - Corresponding author
3
Zhang Z, Hernandez K, Savage J, Li S, Miller D, Agrawal S, Ortuno F, Staudt LM, Heath A, Grossman RL. Uniform genomic data analysis in the NCI Genomic Data Commons. Nat Commun 2021; 12:1226. [PMID: 33619257 PMCID: PMC7900240 DOI: 10.1038/s41467-021-21254-9] [Citation(s) in RCA: 37] [Impact Index Per Article: 12.3] [Received: 09/16/2019] [Accepted: 01/14/2021] [Indexed: 12/28/2022]
Abstract
The goal of the National Cancer Institute's (NCI's) Genomic Data Commons (GDC) is to provide the cancer research community with a data repository of uniformly processed genomic and associated clinical data that enables data sharing and collaborative analysis in support of precision medicine. The initial GDC dataset includes genomic, epigenomic, proteomic, clinical and other data from the NCI TCGA and TARGET programs. Data production for the GDC started in June 2015 using an OpenStack-based private cloud. By June 2016, the GDC had analyzed more than 50,000 raw sequencing data inputs, as well as multiple other data types. Using the latest human genome reference build, GRCh38, the GDC generated a variety of data types, from aligned reads to somatic mutations, gene expression, miRNA expression, DNA methylation status, and copy number variation. In this paper, we describe the pipelines and workflows used to process and harmonize the data in the GDC. The generated data, as well as the original input files from TCGA and TARGET, are available for download and exploratory analysis at the GDC Data Portal and Legacy Archive (https://gdc.cancer.gov/).
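Files processed by these pipelines can be located programmatically through the public GDC REST API. The sketch below only builds a query URL for the `files` endpoint using the API's JSON `filters` syntax. It sends no request. The project (`TCGA-SARC`), the data type, and the returned fields are example choices. Check the GDC API documentation for current field names before relying on them.

```python
import json
from urllib.parse import urlencode

# Public GDC API "files" endpoint; the request is only built here, not sent.
GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def build_files_query(project_id: str, data_type: str, size: int = 10) -> str:
    """Return a GDC API URL listing files of one data type for one project."""
    filters = {
        "op": "and",
        "content": [
            {"op": "in", "content": {"field": "cases.project.project_id", "value": [project_id]}},
            {"op": "in", "content": {"field": "data_type", "value": [data_type]}},
        ],
    }
    params = {
        "filters": json.dumps(filters),          # JSON filter, URL-encoded below
        "fields": "file_id,file_name,data_type",  # columns to return per file
        "format": "JSON",
        "size": str(size),                         # maximum number of hits
    }
    return f"{GDC_FILES_ENDPOINT}?{urlencode(params)}"


url = build_files_query("TCGA-SARC", "Gene Expression Quantification")
```

Keeping URL construction separate from the HTTP call makes the filter easy to inspect or log before querying the live service.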
Affiliation(s)
- Zhenyu Zhang
  - Center for Translational Data Science, University of Chicago, Chicago, IL, USA
- Kyle Hernandez
  - Center for Translational Data Science, University of Chicago, Chicago, IL, USA
- Jeremiah Savage
  - Center for Translational Data Science, University of Chicago, Chicago, IL, USA
  - AbbVie Inc., Redwood City, CA, USA
- Shenglai Li
  - Center for Translational Data Science, University of Chicago, Chicago, IL, USA
- Dan Miller
  - Center for Translational Data Science, University of Chicago, Chicago, IL, USA
  - Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Stuti Agrawal
  - Center for Translational Data Science, University of Chicago, Chicago, IL, USA
  - Merck Healthcare KGaA, Darmstadt, Germany
- Francisco Ortuno
  - Center for Translational Data Science, University of Chicago, Chicago, IL, USA
  - Clinical Bioinformatics Area, Fundacion Progreso y Salud (FPS), Seville, Spain
- Allison Heath
  - Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Robert L Grossman
  - Center for Translational Data Science, University of Chicago, Chicago, IL, USA
4
Zhao Y, Pan Z, Namburi S, Pattison A, Posner A, Balachander S, Paisie CA, Reddi HV, Rueter J, Gill AJ, Fox S, Raghav KPS, Flynn WF, Tothill RW, Li S, Karuturi RKM, George J. CUP-AI-Dx: A tool for inferring cancer tissue of origin and molecular subtype using RNA gene-expression data and artificial intelligence. EBioMedicine 2020; 61:103030. [PMID: 33039710 PMCID: PMC7553237 DOI: 10.1016/j.ebiom.2020.103030] [Citation(s) in RCA: 50] [Impact Index Per Article: 12.5] [Received: 02/04/2020] [Revised: 09/10/2020] [Accepted: 09/11/2020] [Indexed: 12/12/2022]
Abstract
BACKGROUND Cancer of unknown primary (CUP), representing approximately 3-5% of all malignancies, is defined as metastatic cancer for which a primary site of origin cannot be found despite a standard diagnostic workup. Because knowledge of a patient's primary cancer remains fundamental to their treatment, CUP patients are significantly disadvantaged and most have a poor survival outcome. Developing robust and accessible diagnostic methods for resolving cancer tissue of origin therefore has significant value for CUP patients. METHODS We developed an RNA-based classifier called CUP-AI-Dx that uses a 1D Inception convolutional neural network (1D-Inception) model to infer a tumor's primary tissue of origin. CUP-AI-Dx was trained using the transcriptional profiles of 18,217 primary tumours representing 32 cancer types from The Cancer Genome Atlas project (TCGA) and the International Cancer Genome Consortium (ICGC). Gene expression data were ordered by gene chromosomal coordinates as input to the 1D-CNN model, and the model uses multiple convolutional kernels with different configurations simultaneously to improve generality. The model was optimized through extensive hyperparameter tuning, including different max-pooling layers and dropout settings. For 11 tumour types, we also developed a random forest model that can classify the tumour's molecular subtype according to prior TCGA studies. The optimised CUP-AI-Dx tissue of origin classifier was tested on 394 metastatic samples from 11 tumour types from TCGA and on 92 formalin-fixed paraffin-embedded (FFPE) samples representing 18 cancer types from two clinical laboratories. The CUP-AI-Dx molecular subtype classifier was also tested on independent ovarian and breast cancer microarray datasets. FINDINGS CUP-AI-Dx identifies the primary site with an overall top-1 accuracy of 98.54% in cross-validation and 96.70% on a test dataset. When applied to two independent clinical-grade RNA-seq datasets generated by two different institutes in the US and Australia, our model predicted the primary site with a top-1 accuracy of 86.96% and 72.46%, respectively. INTERPRETATION CUP-AI-Dx predicts tumour primary site and molecular subtype with high accuracy and can therefore be used to assist the diagnostic work-up of cancers of unknown primary or uncertain origin using a common and accessible genomics platform. FUNDING NIH R35 GM133562, NCI P30 CA034196, Victorian Cancer Agency Australia.
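Two concrete steps from this abstract lend themselves to a small sketch. The first is laying out expression values in chromosomal order to form the 1D model input. The second is scoring predictions by top-1 accuracy. The Python below assumes a gene annotation given as `(symbol, chromosome, start)` tuples and places chromosomes X, Y, and M after the autosomes. The convolutional network itself is not reproduced, and all names are hypothetical.

```python
import numpy as np

# Hypothetical ranks for non-numbered chromosomes, placed after chr22.
_EXTRA_RANKS = {"X": 23, "Y": 24, "M": 25, "MT": 25}


def chrom_rank(chrom: str) -> int:
    """Numeric sort rank for names such as 'chr2', '10', or 'chrX'."""
    name = chrom.removeprefix("chr")
    return int(name) if name.isdigit() else _EXTRA_RANKS[name]


def order_genes(genes):
    """Sort (symbol, chromosome, start) tuples by genomic position."""
    return sorted(genes, key=lambda g: (chrom_rank(g[1]), g[2]))


def to_input_vector(expression, ordered_genes):
    """Lay out one sample's expression values in genomic order."""
    return np.array([expression[symbol] for symbol, _, _ in ordered_genes], dtype=float)


def top1_accuracy(probabilities, labels):
    """Fraction of samples whose highest-scoring class is the true class."""
    return float(np.mean(np.argmax(probabilities, axis=1) == np.asarray(labels)))
```

Sorting numerically rather than as strings matters here: a plain string sort would place `chr10` before `chr2` and scramble the neighborhoods that the convolutional kernels see.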
Affiliation(s)
- Yue Zhao
  - The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT, USA
- Ziwei Pan
  - The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT, USA
  - Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, CT, USA
- Sandeep Namburi
  - The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT, USA
- Andrew Pattison
  - Department of Clinical Pathology and Centre for Cancer Research, University of Melbourne, Parkville, Melbourne, Australia
- Atara Posner
  - Department of Clinical Pathology and Centre for Cancer Research, University of Melbourne, Parkville, Melbourne, Australia
- Shiva Balachander
  - Department of Clinical Pathology and Centre for Cancer Research, University of Melbourne, Parkville, Melbourne, Australia
- Carolyn A Paisie
  - The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT, USA
- Honey V Reddi
  - The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT, USA
  - The Jackson Laboratory Cancer Center, Bar Harbor, ME, USA
- Jens Rueter
  - The Jackson Laboratory Cancer Center, Bar Harbor, ME, USA
- Anthony J Gill
  - Cancer Diagnosis and Pathology Group, Kolling Institute of Medical Research, Royal North Shore Hospital, St Leonards, New South Wales 2065 Australia
  - NSW Health Pathology, Department of Anatomical Pathology, Royal North Shore Hospital, Sydney, New South Wales 2065 Australia
  - Department of Anatomical Pathology, Douglass Hanly Moir Pathology, Macquarie Park, New South Wales 2113 Australia
  - University of Sydney, Sydney, New South Wales 2006 Australia
- Stephen Fox
  - Peter MacCallum Cancer Centre, Department of Pathology, University of Melbourne, Victoria, Australia
- Kanwal P S Raghav
  - Department of Gastrointestinal Medical Oncology, Division of Cancer Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- William F Flynn
  - The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT, USA
- Richard W Tothill
  - Department of Clinical Pathology and Centre for Cancer Research, University of Melbourne, Parkville, Melbourne, Australia
  - Peter MacCallum Cancer Centre, Parkville, Melbourne, Australia
- Sheng Li
  - The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT, USA
  - The Jackson Laboratory Cancer Center, Bar Harbor, ME, USA
  - Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, CT, USA
  - Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA
- R Krishna Murthy Karuturi
  - The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT, USA
  - The Jackson Laboratory Cancer Center, Bar Harbor, ME, USA
  - Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA
- Joshy George
  - The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT, USA
  - The Jackson Laboratory Cancer Center, Bar Harbor, ME, USA
5
Godichon-Baggioni A, Maugis-Rabusseau C, Rau A. Multiview cluster aggregation and splitting, with an application to multiomic breast cancer data. Ann Appl Stat 2020. [DOI: 10.1214/19-aoas1317] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/24/2022]
6
González-Reymúndez A, Vázquez AI. Multi-omic signatures identify pan-cancer classes of tumors beyond tissue of origin. Sci Rep 2020; 10:8341. [PMID: 32433524 PMCID: PMC7239905 DOI: 10.1038/s41598-020-65119-5] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 11/12/2019] [Accepted: 04/07/2020] [Indexed: 02/08/2023]
Abstract
Despite recent advances in treatment, cancer continues to be one of the most lethal human maladies. One of the challenges of cancer treatment is the diversity among similar tumors that exhibit different clinical outcomes. Most of this variability comes from widespread molecular alterations that can be summarized by omic integration. Here, we identified eight novel tumor groups (C1-8) via omic integration, characterized by unique cancer signatures and clinical characteristics. C3 had the best clinical outcomes, while C2 and C5 had the poorest. C1, C7, and C8 were upregulated for cellular and mitochondrial translation and had relatively low proliferation. C6 and C4, by contrast, were downregulated for cellular and mitochondrial translation and had high proliferation rates. C4 was characterized by copy losses on chromosome 6 and had the highest number of metastatic samples. C8 was characterized by copy losses on chromosome 11 and also had the lowest lymphocytic infiltration rate. C6 had the lowest natural killer cell infiltration rate and was characterized by copy gains of genes on chromosome 11. C7 was characterized by copy gains on chromosome 6 and had the highest upregulation of mitochondrial translation. Because molecularly similar tumors could respond similarly to treatment, we believe our results could inform therapeutic action.
Affiliation(s)
- Agustín González-Reymúndez
  - Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI, USA
  - Institute for Quantitative Health Science and Engineering (IQ), Michigan State University, East Lansing, MI, USA
- Ana I Vázquez
  - Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI, USA
  - Institute for Quantitative Health Science and Engineering (IQ), Michigan State University, East Lansing, MI, USA
7
Coretto P, Serra A, Tagliaferri R. Robust clustering of noisy high-dimensional gene expression data for patients subtyping. Bioinformatics 2018; 34:4064-4072. [PMID: 29939219 DOI: 10.1093/bioinformatics/bty502] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 02/03/2018] [Accepted: 06/19/2018] [Indexed: 12/12/2022]
Abstract
Motivation One of the most important research areas in personalized medicine is the discovery of disease subtypes with relevance in clinical applications. This is usually accomplished by exploring gene expression data with unsupervised clustering methodologies. With the advent of multiple omics technologies, data integration methodologies have been further developed to improve patient separability. However, these methods do not guarantee the survival separability of the patients in different clusters. Results We propose a new methodology that first computes a robust and sparse correlation matrix of the genes, then decomposes it and projects the patient data onto the first m spectral components of the correlation matrix. After that, a robust, noise-adaptive clustering algorithm is applied. The clustering is set up to optimize the separation between survival curves estimated cluster-wise. The method is able to identify clusters that have different omics signatures and also statistically significant differences in survival time. The proposed methodology is tested on five cancer datasets downloaded from The Cancer Genome Atlas repository and compared with the Similarity Network Fusion (SNF) approach and with model-based clustering based on Student's t-distribution (TMIX). Our method achieves better survival separability, even though it uses a single gene expression view whereas SNF uses multiple views. Finally, a pathway-based analysis is performed to highlight the biological processes that differentiate the obtained patient groups. Availability and implementation Our R source code is available online at https://github.com/angy89/RobustClusteringPatientSubtyping. Supplementary information Supplementary data are available at Bioinformatics online.
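The projection step described in the Results can be sketched in a few lines: decompose a gene correlation matrix and project patients onto its first m spectral components. This version substitutes an ordinary Pearson correlation for the robust, sparse estimator the authors use. It also stops before the noise-adaptive clustering step. The function name is hypothetical.

```python
import numpy as np


def spectral_projection(x: np.ndarray, m: int):
    """Project patients onto the top-m eigenvectors of the gene correlation matrix.

    x has shape (n_patients, n_genes). Plain Pearson correlation stands in for
    the robust, sparse estimator described in the abstract.
    """
    corr = np.corrcoef(x, rowvar=False)          # genes x genes
    eigvals, eigvecs = np.linalg.eigh(corr)      # eigh returns ascending order
    top = np.argsort(eigvals)[::-1][:m]          # indices of the m largest
    components = eigvecs[:, top]                 # genes x m
    # Standardize genes so the projection matches the correlation scale.
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    return z @ components, eigvals[top]
```

Genes are standardized with the same degrees of freedom as the correlation matrix, so the sample variance of each score column equals the corresponding eigenvalue. The returned patient-by-m scores would then be passed to a clustering method.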
Affiliation(s)
- Pietro Coretto
  - Department of Economics and Statistics, STATLAB, University of Salerno, Fisciano, SA, Italy
- Angela Serra
  - Department of Management and Innovation Systems, NeuRoNeLab, University of Salerno, Fisciano, SA, Italy
- Roberto Tagliaferri
  - Department of Management and Innovation Systems, NeuRoNeLab, University of Salerno, Fisciano, SA, Italy
8
Abrams ZB, Zucker M, Wang M, Asiaee Taheri A, Abruzzo LV, Coombes KR. Thirty biologically interpretable clusters of transcription factors distinguish cancer type. BMC Genomics 2018; 19:738. [PMID: 30305013 PMCID: PMC6180590 DOI: 10.1186/s12864-018-5093-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 04/25/2018] [Accepted: 09/19/2018] [Indexed: 12/27/2022]
Abstract
Background Transcription factors are essential regulators of gene expression and play critical roles in development, differentiation, and in many cancers. To carry out their regulatory programs, they must cooperate in networks and bind simultaneously to sites in promoter or enhancer regions of genes. We hypothesize that the mRNA co-expression patterns of transcription factors can be used both to learn how they cooperate in networks and to distinguish between cancer types. Results We recently developed a new algorithm, Thresher, that combines principal component analysis, outlier filtering, and von Mises-Fisher mixture models to cluster genes (in this case, transcription factors) based on expression, determining the optimal number of clusters in the process. We applied Thresher to the RNA-Seq expression data of 486 transcription factors from more than 10,000 samples of 33 kinds of cancer studied in The Cancer Genome Atlas (TCGA). We found that 30 clusters of transcription factors from a 29-dimensional principal component space were able to distinguish between most cancer types, and could separate tumor samples from normal controls. Moreover, each cluster of transcription factors could be either (i) linked to a tissue-specific expression pattern or (ii) associated with a fundamental biological process such as cell cycle, angiogenesis, apoptosis, or cytoskeleton. Clusters of the second type were also more likely to be associated with embryonically lethal mouse phenotypes. Conclusions Using our approach, we have shown that the mRNA expression patterns of transcription factors contain most of the information needed to distinguish different cancer types. The Thresher method is capable of discovering biologically interpretable clusters of genes. It can potentially be applied to other gene sets, such as signaling pathways, to decompose them into simpler, yet biologically meaningful, components.
Electronic supplementary material The online version of this article (10.1186/s12864-018-5093-z) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Zachary B Abrams
  - Department of Biomedical Informatics, The Ohio State University, 1800 Cannon Drive, Columbus, 43210, OH, USA
- Mark Zucker
  - Department of Biomedical Informatics, The Ohio State University, 1800 Cannon Drive, Columbus, 43210, OH, USA
- Min Wang
  - Department of Biomedical Informatics, The Ohio State University, 1800 Cannon Drive, Columbus, 43210, OH, USA
  - Mathematical Biosciences Institute, The Ohio State University, 1735 Neil Avenue, Columbus, 43210, OH, USA
- Amir Asiaee Taheri
  - Department of Biomedical Informatics, The Ohio State University, 1800 Cannon Drive, Columbus, 43210, OH, USA
  - Mathematical Biosciences Institute, The Ohio State University, 1735 Neil Avenue, Columbus, 43210, OH, USA
- Lynne V Abruzzo
  - Department of Pathology, The Ohio State University, 129 Hamilton Hall, 1645 Neil Avenue, Columbus, 43210, OH, USA
- Kevin R Coombes
  - Department of Biomedical Informatics, The Ohio State University, 1800 Cannon Drive, Columbus, 43210, OH, USA
9
Baali I, Acar DAE, Aderinwale TW, HafezQorani S, Kazan H. Predicting clinical outcomes in neuroblastoma with genomic data integration. Biol Direct 2018; 13:20. [PMID: 30621745 PMCID: PMC6889397 DOI: 10.1186/s13062-018-0223-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 10/13/2017] [Accepted: 09/03/2018] [Indexed: 11/10/2022]
Abstract
BACKGROUND Neuroblastoma is a heterogeneous disease with diverse clinical outcomes. Current risk group models require improvement, as patients within the same risk group can still show variable prognosis. Recently collected genome-wide datasets provide opportunities to infer neuroblastoma subtypes in a more unified way. Within this context, data integration is critical, as different molecular characteristics can contain complementary signals. To this end, we used the genomic datasets available for the SEQC cohort patients to develop supervised and unsupervised models that can predict disease prognosis. RESULTS Our supervised model trained on the SEQC cohort can accurately predict overall survival and event-free survival profiles of patients in two independent cohorts. We also performed extensive experiments to assess the prediction accuracy for high-risk patients and for patients without MYCN amplification. These results suggest that clinical endpoints can be predicted accurately across multiple cohorts. To explore the data in an unsupervised manner, we used an integrative clustering strategy named multi-view kernel k-means (MVKKM) that can effectively integrate multiple high-dimensional datasets with varying weights. We observed that integrating different gene expression datasets results in better patient stratification than using these datasets individually. Our identified subgroups also provide a better Cox regression model fit than the existing risk group definitions. CONCLUSION Altogether, our results indicate that integration of multiple genomic characterizations enables the discovery of subtypes that improve on existing definitions of risk groups. Effective prediction of survival times will have a direct impact on choosing the right therapies for patients. REVIEWERS This article was reviewed by Susmita Datta, Wenzhong Xiao and Ziv Shkedy.
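Multi-view kernel k-means combines one kernel per data view and clusters in the joint feature space. The sketch below is a simplified variant under stated assumptions:
- The view weights are fixed by the caller rather than learned; the published MVKKM learns them.
- Each view uses a Gaussian RBF kernel with a shared, hypothetical `gamma`.
- Initialization uses farthest-point seeding in kernel space.

```python
import numpy as np


def rbf_kernel(x: np.ndarray, gamma: float) -> np.ndarray:
    """Gaussian kernel exp(-gamma * ||x_i - x_j||^2) for all pairs of rows."""
    sq = np.sum(x ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * x @ x.T
    return np.exp(-gamma * np.maximum(d2, 0.0))


def multiview_kernel_kmeans(views, weights, n_clusters, gamma=0.1, n_iter=100):
    """Kernel k-means on a fixed weighted sum of per-view RBF kernels.

    views: list of (n_samples, n_features_v) arrays describing the same samples.
    """
    k = sum(w * rbf_kernel(v, gamma) for v, w in zip(views, weights))
    diag = np.diag(k)
    # Farthest-point initialisation in kernel feature space.
    seeds = [0]
    for _ in range(1, n_clusters):
        d2 = diag[:, None] + diag[seeds][None, :] - 2.0 * k[:, seeds]
        seeds.append(int(np.argmax(d2.min(axis=1))))
    d2 = diag[:, None] + diag[seeds][None, :] - 2.0 * k[:, seeds]
    labels = np.argmin(d2, axis=1)
    for _ in range(n_iter):
        dist = np.full((k.shape[0], n_clusters), np.inf)
        for c in range(n_clusters):
            members = labels == c
            size = members.sum()
            if size == 0:
                continue
            # ||phi(x_i) - mu_c||^2 expanded using kernel evaluations only.
            dist[:, c] = (diag - 2.0 * k[:, members].sum(axis=1) / size
                          + k[np.ix_(members, members)].sum() / size ** 2)
        new_labels = np.argmin(dist, axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels
    return labels
```

Summing weighted kernels lets each data type keep its own notion of similarity. The weights control how much each view contributes to the joint clustering.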
Affiliation(s)
- Ilyes Baali
  - Department of Computer Engineering, Antalya Bilim University, Antalya, Turkey
- D Alp Emre Acar
  - Department of Computer Engineering, Antalya Bilim University, Antalya, Turkey
  - Present address: Department of Electrical and Computer Engineering, Boston University, Boston, US
- Tunde W Aderinwale
  - Electrical and Computer Engineering Graduate Program, Institute of Applied Sciences, Antalya Bilim University, Antalya, Turkey
  - Present address: Department of Computer Science, Purdue University, West Lafayette, US
- Saber HafezQorani
  - Graduate School of Informatics, Department of Health Informatics, Middle East Technical University, Ankara, Turkey
  - Present address: BC Cancer Agency Genome Sciences Centre, Vancouver, BC, Canada
- Hilal Kazan
  - Department of Computer Engineering, Antalya Bilim University, Antalya, Turkey
10
Parimbelli E, Marini S, Sacchi L, Bellazzi R. Patient similarity for precision medicine: A systematic review. J Biomed Inform 2018; 83:87-96. [PMID: 29864490 DOI: 10.1016/j.jbi.2018.06.001] [Citation(s) in RCA: 66] [Impact Index Per Article: 11.0] [Received: 01/31/2018] [Revised: 05/16/2018] [Accepted: 06/01/2018] [Indexed: 12/19/2022]
Abstract
Evidence-based medicine is the most prevalent paradigm adopted by physicians. Clinical practice guidelines typically define a set of recommendations together with eligibility criteria that restrict their applicability to a specific group of patients. The ever-growing size and availability of health-related data is currently challenging the broad definitions of guideline-defined patient groups. Precision medicine leverages genetic, phenotypic, or psychosocial characteristics to provide precise identification of patient subsets for treatment targeting. Defining a patient similarity measure is thus an essential step to allow stratification of patients into clinically meaningful subgroups. The present review investigates the use of patient similarity as a tool to enable precision medicine. A total of 279 articles were analyzed along four dimensions: data types considered, clinical domains of application, data analysis methods, and translational stage of findings. Cancer-related research employing molecular profiling and standard data analysis techniques such as clustering constitutes the majority of the retrieved studies. Chronic and psychiatric diseases are the next most represented clinical domains. Interestingly, almost one quarter of the studies analyzed presented a novel methodology, with the most advanced employing data integration strategies and being portable to different clinical domains. Integration of such techniques into decision support systems constitutes an interesting trend for future research.
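As a minimal example of the kind of patient similarity measure the review surveys, the sketch below computes cosine similarity between patient feature vectors. It then returns the most similar patients for a query patient. Cosine similarity is only one of many possible measures. The feature matrix is assumed to have no all-zero rows, and the function names are hypothetical.

```python
import numpy as np


def cosine_similarity_matrix(profiles: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between patient feature vectors (rows).

    Assumes every row has a non-zero norm.
    """
    unit = profiles / np.linalg.norm(profiles, axis=1, keepdims=True)
    return unit @ unit.T


def most_similar_patients(profiles: np.ndarray, patient: int, k: int) -> np.ndarray:
    """Indices of the k patients most similar to `patient`, best match first."""
    sim = cosine_similarity_matrix(profiles)[patient].copy()
    sim[patient] = -np.inf  # a patient is trivially most similar to itself
    return np.argsort(sim)[::-1][:k]
```

Cosine similarity ignores overall profile magnitude. This can help when feature scales differ between patients, but it discards information when magnitude itself is clinically relevant.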
Affiliation(s)
- E Parimbelli
  - Telfer School of Management, University of Ottawa, Ottawa, Canada
  - Interdepartmental Centre for Health Technologies, University of Pavia, Italy
- S Marini
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, USA
  - Interdepartmental Centre for Health Technologies, University of Pavia, Italy
- L Sacchi
  - Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Italy
  - Interdepartmental Centre for Health Technologies, University of Pavia, Italy
- R Bellazzi
  - Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Italy
  - Interdepartmental Centre for Health Technologies, University of Pavia, Italy
  - IRCCS ICS Maugeri, Pavia, Italy
11
Sorting Five Human Tumor Types Reveals Specific Biomarkers and Background Classification Genes. Sci Rep 2018; 8:8180. [PMID: 29802335 PMCID: PMC5970138 DOI: 10.1038/s41598-018-26310-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 11/01/2017] [Accepted: 05/10/2018] [Indexed: 12/16/2022]
Abstract
We applied two state-of-the-art, knowledge-independent data-mining methods, Dynamic Quantum Clustering (DQC) and t-Distributed Stochastic Neighbor Embedding (t-SNE), to data from The Cancer Genome Atlas (TCGA). We showed that the RNA expression patterns for a mixture of 2,016 samples from five tumor types can sort the tumors into groups enriched for relevant annotations, including tumor type, gender, tumor stage, and ethnicity. DQC feature selection analysis discovered 48 core biomarker transcripts that clustered tumors by tumor type. When these transcripts were removed, the geometry of tumor relationships changed, but it was still possible to classify the tumors using the RNA expression profiles of the remaining transcripts. We continued to remove the top biomarkers for several iterations and performed cluster analysis after each one. Even though the most informative transcripts were removed from the cluster analysis, the sorting ability of the remaining transcripts stayed strong after each iteration. Further, in some iterations we detected a repeating pattern of biological function that was not detectable with the core biomarker transcripts present. This suggests the existence of a "background classification" potential, in which the pattern of gene expression after continued removal of "biomarker" transcripts could still classify tumors in agreement with the tumor type.
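The iterative "remove the top biomarkers and re-classify" procedure can be mimicked with simpler tools. The sketch below is not Dynamic Quantum Clustering. It ranks features by a one-way ANOVA F statistic and removes the top `k` each round. After each round, it records the resubstitution accuracy of a nearest-centroid classifier on the remaining features. Resubstitution accuracy is optimistic, so held-out evaluation would be needed in practice. All names are hypothetical.

```python
import numpy as np


def f_scores(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """One-way ANOVA F statistic for every feature (column) of x."""
    classes = np.unique(y)
    grand_mean = x.mean(axis=0)
    between = np.zeros(x.shape[1])
    within = np.zeros(x.shape[1])
    for c in classes:
        xc = x[y == c]
        between += len(xc) * (xc.mean(axis=0) - grand_mean) ** 2
        within += ((xc - xc.mean(axis=0)) ** 2).sum(axis=0)
    return (between / (len(classes) - 1)) / (within / (len(y) - len(classes)))


def nearest_centroid_accuracy(x: np.ndarray, y: np.ndarray) -> float:
    """Resubstitution accuracy of a nearest-centroid classifier."""
    classes = np.unique(y)
    centroids = np.stack([x[y == c].mean(axis=0) for c in classes])
    d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return float(np.mean(classes[np.argmin(d2, axis=1)] == y))


def remove_top_features(x, y, k, n_rounds):
    """Drop the k highest-F features per round; track accuracy on the rest."""
    remaining = np.arange(x.shape[1])
    history = []
    for _ in range(n_rounds):
        scores = f_scores(x[:, remaining], y)
        top = np.argsort(scores)[::-1][:k]      # positions within `remaining`
        remaining = np.delete(remaining, top)
        history.append(nearest_centroid_accuracy(x[:, remaining], y))
    return remaining, history
```

If accuracy in `history` stays high after the strongest features are gone, the class signal is spread across many weaker features. This mirrors, in a much simpler setting, the "background classification" idea in the abstract.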