1
Xie J, Chen Y, Luo S, Yang W, Lin Y, Wang L, Ding X, Tong M, Yu R. Tracing unknown tumor origins with a biological-pathway-based transformer model. Cell Rep Methods 2024; 4:100797. [PMID: 38889685 PMCID: PMC11228371 DOI: 10.1016/j.crmeth.2024.100797] [Received: 10/23/2023] [Revised: 02/01/2024] [Accepted: 05/21/2024] [Indexed: 06/20/2024]
Abstract
Cancer of unknown primary (CUP) represents metastatic cancer where the primary site remains unidentified despite standard diagnostic procedures. To determine the tumor origin in such cases, we developed BPformer, a deep learning method integrating the transformer model with prior knowledge of biological pathways. Trained on transcriptomes from 10,410 primary tumors across 32 cancer types, BPformer achieved accuracy rates of 94%, 92%, and 89% in primary tumors and primary and metastatic sites of metastatic tumors, respectively, surpassing existing methods. Additionally, BPformer was validated in a retrospective study, demonstrating consistency with tumor sites diagnosed through immunohistochemistry and histopathology. Furthermore, BPformer was able to rank pathways based on their contribution to tumor origin identification, which helped to classify oncogenic signaling pathways into those that are highly conserved among different cancers versus those that are highly variable depending on their origins.
Affiliation(s)
- Jiajing Xie
  - National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, Fujian 361102, China
- Ying Chen
  - School of Informatics, Xiamen University, Xiamen, Fujian 361005, China
- Shijie Luo
  - National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, Fujian 361102, China
- Wenxian Yang
  - Aginome Scientific, Xiamen, Fujian 361005, China
- Yuxiang Lin
  - National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, Fujian 361102, China
- Liansheng Wang
  - National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, Fujian 361102, China
  - School of Informatics, Xiamen University, Xiamen, Fujian 361005, China
- Xin Ding
  - Department of Pathology, Zhongshan Hospital of Xiamen University, School of Medicine, Xiamen University, Xiamen, Fujian 361004, China
- Mengsha Tong
  - National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, Fujian 361102, China
  - State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Faculty of Medicine and Life Sciences, Xiamen University, Xiamen, Fujian 361102, China
- Rongshan Yu
  - National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, Fujian 361102, China
  - School of Informatics, Xiamen University, Xiamen, Fujian 361005, China
  - Aginome Scientific, Xiamen, Fujian 361005, China
2
Van R, Alvarez D, Mize T, Gannavarapu S, Chintham Reddy L, Nasoz F, Han MV. A comparison of RNA-Seq data preprocessing pipelines for transcriptomic predictions across independent studies. BMC Bioinformatics 2024; 25:181. [PMID: 38720247 PMCID: PMC11080237 DOI: 10.1186/s12859-024-05801-x] [Received: 01/31/2024] [Accepted: 05/02/2024] [Indexed: 05/12/2024]
Abstract
BACKGROUND RNA sequencing combined with machine learning techniques has provided a modern approach to the molecular classification of cancer. Class predictors, reflecting the disease class, can be constructed for known tissue types using gene expression measurements extracted from cancer patients. One challenge of current cancer predictors is that they often have suboptimal performance estimates when integrating molecular datasets generated by different labs. Often, the data are of variable quality, procured differently, and contain unwanted noise that hampers the ability of a predictive model to extract useful information. Data preprocessing methods can be applied in an attempt to reduce these systematic variations and harmonize the datasets before they are used to build a machine learning model for resolving tissue of origin. RESULTS We aimed to investigate, through trial and comparison, the impact of data preprocessing steps, focusing on normalization, batch effect correction, and data scaling. Our goal was to improve the cross-study predictions of tissue of origin for common cancers on large-scale RNA-Seq datasets derived from thousands of patients and over a dozen tumor types. The results showed that the choice of data preprocessing operations affected the performance of the classifier models constructed for tissue of origin prediction in cancer. CONCLUSION By using TCGA as a training set and applying data preprocessing methods, we demonstrated that batch effect correction improved performance, measured by weighted F1-score, in resolving tissue of origin against an independent GTEx test dataset. On the other hand, data preprocessing operations worsened classification performance when the independent test dataset was aggregated from separate studies in ICGC and GEO. Therefore, based on our findings with these publicly available large-scale RNA-Seq datasets, applying data preprocessing techniques in a machine learning pipeline is not always appropriate.
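The abstract reports cross-study performance as a weighted F1-score. As a minimal plain-Python sketch (not the authors' code), weighted F1 averages the per-class F1 scores, weighting each class by its support, i.e. its number of true samples; this is the convention scikit-learn uses for `f1_score(..., average="weighted")`. The function name `weighted_f1` and the tissue labels below are hypothetical illustrations.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores.

    Each class that occurs in y_true contributes its F1 score weighted by
    its support (number of true samples). Classes that are only predicted
    have zero support and therefore zero weight.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    support = Counter(y_true)
    weighted_sum = 0.0
    for cls, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        n_pred = sum(1 for p in y_pred if p == cls)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        weighted_sum += n_true * f1
    return weighted_sum / len(y_true)


# Hypothetical tissue-of-origin labels, for illustration only.
truth = ["breast", "breast", "lung", "lung"]
preds = ["breast", "lung", "lung", "lung"]
print(round(weighted_f1(truth, preds), 4))  # 0.7333
```

Here "breast" has F1 = 2/3 and "lung" has F1 = 4/5; with equal support the weighted mean is 11/15. Weighting by support keeps large tumor types from being diluted by rare ones, which is one reason the metric suits imbalanced cross-study test sets.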
Affiliation(s)
- Richard Van
  - School of Life Sciences, University of Nevada Las Vegas, Las Vegas, NV, USA
  - Nevada Institute of Personalized Medicine, Las Vegas, NV, USA
- Daniel Alvarez
  - Department of Computer Science, University of Nevada Las Vegas, Las Vegas, NV, USA
  - Nevada Institute of Personalized Medicine, Las Vegas, NV, USA
- Travis Mize
  - Icahn School of Medicine at Mount Sinai, Institute for Genomic Health, New York City, NY, USA
- Sravani Gannavarapu
  - Department of Computer Science, University of Nevada Las Vegas, Las Vegas, NV, USA
  - Nevada Institute of Personalized Medicine, Las Vegas, NV, USA
- Lohitha Chintham Reddy
  - Department of Computer Science, University of Nevada Las Vegas, Las Vegas, NV, USA
  - Nevada Institute of Personalized Medicine, Las Vegas, NV, USA
- Fatma Nasoz
  - Department of Computer Science, University of Nevada Las Vegas, Las Vegas, NV, USA
  - Nevada Institute of Personalized Medicine, Las Vegas, NV, USA
- Mira V Han
  - School of Life Sciences, University of Nevada Las Vegas, Las Vegas, NV, USA
  - Nevada Institute of Personalized Medicine, Las Vegas, NV, USA
3
Štancl P, Karlić R. Machine learning for pan-cancer classification based on RNA sequencing data. Front Mol Biosci 2023; 10:1285795. [PMID: 38028533 PMCID: PMC10667476 DOI: 10.3389/fmolb.2023.1285795] [Received: 08/30/2023] [Accepted: 10/30/2023] [Indexed: 12/01/2023]
Abstract
Despite recent improvements in cancer diagnostics, 2%-5% of all malignancies are still cancers of unknown primary (CUP), for which the tissue-of-origin (TOO) cannot be determined at the time of presentation. Since the primary site of cancer leads to the choice of optimal treatment, CUP patients pose a significant clinical challenge with limited treatment options. Data produced by large-scale cancer genomics initiatives, which aim to determine the genomic, epigenomic, and transcriptomic characteristics of a large number of individual patients of multiple cancer types, have led to the introduction of various methods that use machine learning to predict the TOO of cancer patients. In this review, we assess the reproducibility, interpretability, and robustness of results obtained by 20 recent studies that utilize different machine learning methods for TOO prediction based on RNA sequencing data, including their reported performance on independent data sets and identification of important features. Our review investigates the strengths and weaknesses of different methods, checks the correspondence of their results, and identifies potential issues with datasets used for model training and testing, assessing their potential usefulness in a clinical setting and suggesting future improvements.
Affiliation(s)
- Rosa Karlić
  - Bioinformatics Group, Division of Molecular Biology, Department of Biology, Faculty of Science, University of Zagreb, Zagreb, Croatia
4
Padwal MK, Basu S, Basu B. Application of Machine Learning in Predicting Hepatic Metastasis or Primary Site in Gastroenteropancreatic Neuroendocrine Tumors. Curr Oncol 2023; 30:9244-9261. [PMID: 37887568 PMCID: PMC10605255 DOI: 10.3390/curroncol30100668] [Received: 09/05/2023] [Revised: 10/16/2023] [Accepted: 10/16/2023] [Indexed: 10/28/2023]
Abstract
Gastroenteropancreatic neuroendocrine tumors (GEP-NETs) account for 80% of gastroenteropancreatic neuroendocrine neoplasms (GEP-NENs). GEP-NETs are well-differentiated tumors, highly heterogeneous in biology and origin, and are often diagnosed at the metastatic stage. Diagnosis commonly relies on clinical symptoms, histopathology, and PET-CT imaging, while molecular markers for metastasis and the primary site are unknown. Here, we report the identification of multi-gene signatures for hepatic metastasis and primary sites through analyses of RNA-seq datasets of pancreatic and small intestinal NET tissue samples. Relevant gene features, identified from the normalized RNA-seq data using the mRMRe algorithm, were used to develop seven machine learning models (LDA, RF, CART, k-NN, SVM, XGBoost, GBM). Two multi-gene random forest (RF) models classified primary and metastatic samples with 100% accuracy in training and test cohorts and >90% accuracy in an independent validation cohort. Similarly, three multi-gene RF models identified the pancreas or small intestine as the primary site with 100% accuracy in training and test cohorts, and >95% accuracy in an independent cohort. Multi-label models for concurrent prediction of hepatic metastasis and primary site returned >98.42% and >87.42% accuracies on training and test cohorts, respectively. A robust molecular signature to predict liver metastasis or the primary site for GEP-NETs is reported for the first time and could complement the clinical management of GEP-NETs.
Affiliation(s)
- Mahesh Kumar Padwal
  - Molecular Biology Division, Bhabha Atomic Research Centre, Mumbai 400085, India
  - Homi Bhabha National Institute, Mumbai 400094, India
- Sandip Basu
  - Homi Bhabha National Institute, Mumbai 400094, India
  - Radiation Medicine Centre, Bhabha Atomic Research Centre, Tata Memorial Hospital Annexe, Mumbai 400012, India
- Bhakti Basu
  - Molecular Biology Division, Bhabha Atomic Research Centre, Mumbai 400085, India
  - Homi Bhabha National Institute, Mumbai 400094, India
5
Bioinformatics approach to identify the core ontologies, pathways, signature genes and drug molecules of prostate cancer. Inform Med Unlocked 2023. [DOI: 10.1016/j.imu.2023.101179] [Indexed: 01/25/2023]
6
Cancer classification based on multiple dimensions: SNV patterns. Comput Biol Med 2022; 151:106270. [PMID: 36395594 DOI: 10.1016/j.compbiomed.2022.106270] [Received: 08/10/2022] [Revised: 10/09/2022] [Accepted: 10/30/2022] [Indexed: 11/13/2022]
Abstract
BACKGROUND The occurrence of cancer is closely related to single nucleotide variants (SNVs). However, SNVs are detected in different patterns in DNA samples collected from patients with distinct cancers. Selecting a method that classifies cancers by making the fullest use of SNV patterns is therefore an important task that can aid cancer diagnosis and treatment. In traditional studies, researchers combined each SNV with its neighboring nucleotides to form a trinucleotide. Mutation signatures for cancer classification were extracted from the patterns of the trinucleotides, but extracting SNV features in a single dimension may result in partial information loss and poor model performance. RESULTS In this study, we defined multidimensional SNV (M-SNV) features to classify cancer. M-SNV features considered first- and second-order neighboring nucleotides of one-dimensional SNVs and included six types of features. We validated the feasibility of M-SNV features using a dataset from The Cancer Genome Atlas (TCGA) consisting of 2761 samples from 12 cancers. We performed preliminary screening of 562,321 DNA mutation sites in these samples. The remaining mutation sites were characterized by cancer type in six signatures. We found that the extracted features showed a similar distribution in the cluster center of the cancer type of the samples. After preprocessing of the raw data, samples were more concentrated around the cancer subtype distributions at the SNV level. We used k-nearest neighbors (KNN) to classify the extracted features and evaluated the classification with leave-one-out cross-validation. Classification accuracy was stable at approximately 97% and reached 97.43% in the best case. Furthermore, we found that validated oncogenes at the loci of the features had the highest importance among the 8 cancers. CONCLUSIONS It is feasible to classify cancers by the distribution of the features we defined. Moreover, our methodology has potential implications for the discovery of oncogenes.
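The classification step described above (KNN evaluated by leave-one-out cross-validation) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes Euclidean distance and simple majority voting, the abstract does not state the distance metric or the value of k, and the function names, feature vectors, and labels are hypothetical stand-ins for the M-SNV features.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k training samples nearest to query (Euclidean)."""
    ranked = sorted(
        ((math.dist(x, query), y) for x, y in zip(train_x, train_y)),
        key=lambda pair: pair[0],
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(features, labels, k=3):
    """Leave-one-out cross-validation: hold out each sample once, predict it from the rest."""
    correct = 0
    for i, (held_x, held_y) in enumerate(zip(features, labels)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if knn_predict(train_x, train_y, held_x, k) == held_y:
            correct += 1
    return correct / len(features)


# Hypothetical 2-D feature vectors for two cancer types (illustration only).
X = [[0.90, 0.10], [0.80, 0.20], [0.85, 0.15],
     [0.10, 0.90], [0.20, 0.80], [0.15, 0.85]]
y = ["A", "A", "A", "B", "B", "B"]
print(loo_accuracy(X, y, k=3))  # 1.0
```

Leave-one-out uses every sample once as a test case while training on all the others, which makes the most of a fixed cohort, at the cost of one model fit per sample; for KNN that "fit" is just a distance ranking, so the approach stays cheap.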
7
A critical review of datasets and computational suites for improving cancer theranostics and biomarker discovery. Med Oncol 2022; 39:206. [PMID: 36175717 DOI: 10.1007/s12032-022-01815-8] [Received: 06/30/2022] [Accepted: 07/29/2022] [Indexed: 10/14/2022]
Abstract
Cancer is constantly evolving, and so is research on cancer diagnosis and therapeutic regimens. Early detection and specific therapeutics are key features of modern cancer therapy. These requirements can only be fulfilled by integrating diverse high-throughput technologies. Integrating advanced omics methodologies, including genomics, epigenomics, proteomics, and transcriptomics, provides a clearer understanding of multi-faceted cancer. In the past few years, large volumes of high-throughput data have been generated from cancer genomic and epigenomic analyses, which on further analysis can yield better biological insights. The major epigenetic alterations reported in cancer are changes in DNA methylation levels, histone post-translational modifications, and epi-miRNAs regulating oncogenes and tumor suppressor genes. Genomic analyses, such as gene expression profiling, cancer gene prediction, and genome annotation, reveal genetic alterations in oncogenes and tumor suppressor genes. In addition, systems biology approaches using biological networks are being extensively used to identify novel cancer biomarkers. Therefore, integrating these multi-dimensional approaches will help identify potential diagnostic and therapeutic biomarkers. Here, we review the critical databases and tools dedicated to various epigenomic and genomic alterations in cancer. The review further focuses on the multi-omics resources available for validating the identified cancer biomarkers. We also highlight tools for cancer biomarker discovery using a systems biology approach that utilizes genomic and epigenomic data. Biomarkers predicted using such integrative approaches have been shown to be more clinically relevant.
8
Surachat K, Taylor TD, Wattanamatiphot W, Sukpisit S, Jeenkeawpiam K. aTAP: automated transcriptome analysis platform for processing RNA-seq data by de novo assembly. Heliyon 2022; 8:e10255. [PMID: 36033257 PMCID: PMC9404342 DOI: 10.1016/j.heliyon.2022.e10255] [Received: 08/21/2021] [Revised: 04/27/2022] [Accepted: 08/05/2022] [Indexed: 11/05/2022]
Abstract
RNA-seq is a technique that uses next-generation sequencing (NGS) to explore and study the entire transcriptome of a biological sample. NGS-based analyses are mostly performed via command-line interfaces, which are an obstacle for many molecular biologists and researchers. Consequently, the high throughput of NGS can only be exploited with the help of bioinformatics and computer science expertise. As the cost of sequencing continues to fall, the use of RNA-seq seems certain to increase. To minimize the problems biologists and researchers encounter in RNA-seq data analysis, we propose an automated platform with a web application that integrates various bioinformatics pipelines. The platform is intended to enable academic users to analyze transcriptome datasets more easily. Our automated Transcriptome Analysis Platform (aTAP) offers comprehensive bioinformatics workflows, including quality control of raw reads, trimming of low-quality reads, de novo transcriptome assembly, transcript expression quantification, differential expression analysis, and transcript annotation. aTAP has a user-friendly graphical interface that allows researchers to interact with and visualize results in a web browser. By integrating efficient, well-known tools, this project offers a simpler and more accessible way for research communities to analyze transcriptome data. aTAP is freely available to academic users at https://atap.psu.ac.th/.
Affiliation(s)
- Komwit Surachat
  - Department of Biomedical Sciences and Biomedical Engineering, Faculty of Medicine, Prince of Songkla University, Hat Yai, Songkhla 90110, Thailand
  - Translational Medicine Research Center, Faculty of Medicine, Prince of Songkla University, Hat Yai, Songkhla 90110, Thailand
  - Molecular Evolution and Computational Biology Research Unit, Faculty of Science, Prince of Songkla University, Hat Yai, Songkhla 90110, Thailand
- Todd Duane Taylor
  - RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan
- Wanicbut Wattanamatiphot
  - Division of Computational Science, Faculty of Science, Prince of Songkla University, Hat Yai, Songkhla 90110, Thailand
- Sukgamon Sukpisit
  - Division of Computational Science, Faculty of Science, Prince of Songkla University, Hat Yai, Songkhla 90110, Thailand
- Kongpop Jeenkeawpiam
  - Molecular Evolution and Computational Biology Research Unit, Faculty of Science, Prince of Songkla University, Hat Yai, Songkhla 90110, Thailand
9
Nelligan NM, Bender MR, Feltus FA. Simulating the restoration of normal gene expression from different thyroid cancer stages using deep learning. BMC Cancer 2022; 22:612. [PMID: 35659616 PMCID: PMC9166476 DOI: 10.1186/s12885-022-09704-z] [Received: 02/17/2022] [Accepted: 05/24/2022] [Indexed: 11/18/2022]
Abstract
Background Thyroid cancer (THCA) is the most common endocrine malignancy, and its incidence is increasing. There is an urgent need to better understand the molecular differences between THCA tumors at different pathologic stages so that appropriate diagnostic, prognostic, and treatment strategies can be applied. Transcriptome State Perturbation Generator (TSPG) is a tool created to identify the changes in gene expression necessary to transform the transcriptional state of a source sample to mimic that of a target. Methods We used TSPG to perturb the bulk RNA expression data from various THCA tumor samples at progressive stages towards the transcriptional pattern of normal thyroid tissue. The resulting perturbations were analyzed to determine whether certain genes or functions are consistently up- or down-regulated in tumors of particular stages. Results Some genes of particular interest were examined further in light of previous research. SLC6A15 was down-regulated in all stage 1–3 samples; this gene has previously been identified as a tumor suppressor. The up-regulation of PLA2G12B in all samples was notable because the protein encoded by this gene belongs to the PLA2 superfamily, which is involved in metabolism, a major function of the thyroid gland. REN was up-regulated in all stage 3 and 4 samples. The enzyme renin, encoded by this gene, has a role in the renin-angiotensin system, which regulates angiogenesis and may have a role in cancer development and progression; this is supported by the consistent up-regulation of REN only in later-stage tumor samples. Functional enrichment analysis showed that olfactory receptor activities and similar terms were enriched among the up-regulated genes, which supports previous research concluding that the abundance and stimulation of olfactory receptors are linked to cancer. Conclusions TSPG can be a useful tool for exploring large gene expression datasets and extracting the meaningful differences between distinct classes of data. We identified genes that were characteristically perturbed in certain sample types, including genes perturbed only in late-stage THCA tumors. Additionally, we provided evidence for potential transcriptional signatures of each stage of thyroid cancer. These are potentially relevant targets for future investigation into THCA tumorigenesis. Supplementary Information The online version contains supplementary material available at 10.1186/s12885-022-09704-z.
10
Jha A, Quesnel-Vallières M, Wang D, Thomas-Tikhonenko A, Lynch KW, Barash Y. Identifying common transcriptome signatures of cancer by interpreting deep learning models. Genome Biol 2022; 23:117. [PMID: 35581644 PMCID: PMC9112525 DOI: 10.1186/s13059-022-02681-3] [Received: 11/19/2021] [Accepted: 04/27/2022] [Indexed: 01/01/2023]
Abstract
Background Cancer is a set of diseases characterized by unchecked cell proliferation and invasion of surrounding tissues. The many genes that have been genetically associated with cancer or shown to directly contribute to oncogenesis vary widely between tumor types, but common gene signatures that relate to core cancer pathways have also been identified. It is not clear, however, whether there exist additional sets of genes or transcriptomic features that are less well known in cancer biology but that are also commonly deregulated across several cancer types. Results Here, we agnostically identify transcriptomic features that are commonly shared between cancer types using 13,461 RNA-seq samples from 19 normal tissue types and 18 solid tumor types to train three feed-forward neural networks, based either on protein-coding gene expression, lncRNA expression, or splice junction use, to distinguish between normal and tumor samples. All three models recognize transcriptome signatures that are consistent across tumors. Analysis of attribution values extracted from our models reveals that genes that are commonly altered in cancer by expression or splicing variations are under strong evolutionary and selective constraints. Importantly, we find that genes composing our cancer transcriptome signatures are not frequently affected by mutations or genomic alterations and that their functions differ widely from the genes genetically associated with cancer. Conclusions Our results highlighted that deregulation of RNA-processing genes and aberrant splicing are pervasive features on which core cancer pathways might converge across a large array of solid tumor types. Supplementary Information The online version contains supplementary material available at (10.1186/s13059-022-02681-3).
Affiliation(s)
- Anupama Jha
  - Department of Computer and Information Science, School of Engineering and Applied Science, Philadelphia, USA
- Mathieu Quesnel-Vallières
  - Department of Genetics, Philadelphia, USA
  - Department of Biochemistry and Biophysics, Philadelphia, USA
- David Wang
  - Department of Genetics, Philadelphia, USA
- Andrei Thomas-Tikhonenko
  - Department of Pathology and Laboratory Medicine, Philadelphia, USA
  - Department of Pediatrics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, USA
  - Division of Cancer Pathobiology, Children's Hospital of Philadelphia, Philadelphia, USA
- Kristen W Lynch
  - Department of Biochemistry and Biophysics, Philadelphia, USA
- Yoseph Barash
  - Department of Computer and Information Science, School of Engineering and Applied Science, Philadelphia, USA
  - Department of Genetics, Philadelphia, USA
11
Ding Y, Jiang J, Xu J, Chen Y, Zheng Y, Jiang W, Mao C, Jiang H, Bao X, Shen Y, Li X, Teng L, Xu N. Site-specific therapy in cancers of unknown primary site: a systematic review and meta-analysis. ESMO Open 2022; 7:100407. [PMID: 35248824 PMCID: PMC8897579 DOI: 10.1016/j.esmoop.2022.100407] [Received: 01/17/2022] [Revised: 01/22/2022] [Accepted: 01/25/2022] [Indexed: 12/01/2022]
Abstract
Background Cancer of unknown primary site (CUP) is a term applied to pathologically confirmed metastatic cancer whose primary tumor origin is unknown. It remains uncertain whether patients with CUP benefit from site-specific therapy guided by molecular profiling. Patients and methods A systematic search of PubMed, Web of Science, Embase, Cochrane Library, ClinicalTrials.gov, and conference abstracts from January 1976 to January 2021 was performed to identify studies investigating the efficacy of site-specific therapy in patients with CUP. The quality of included studies was evaluated using the Cochrane risk of bias tool and the Newcastle–Ottawa scale. Eligible studies were weighted and pooled for meta-analysis. Hazard ratios (HRs) for overall survival (OS) and progression-free survival (PFS) were assessed to compare the efficacy of site-specific therapy with empiric therapy in patients with CUP. In addition, subgroup analyses were conducted. Results Five studies comprising 1114 patients were identified, of which 454 patients received site-specific therapy and 660 patients received empiric therapy. Our meta-analysis revealed that, compared with empiric therapy, site-specific therapy was not significantly associated with improved PFS [HR 0.93, 95% confidence interval (CI) 0.74-1.17, P = 0.534] or OS (HR 0.75, 95% CI 0.55-1.03, P = 0.069). However, in subgroup analysis, site-specific therapy was associated with significantly improved OS in the high-accuracy predictive assay subgroup (HR 0.46, 95% CI 0.26-0.81, P = 0.008) but not in the low-accuracy predictive assay subgroup (HR 0.93, 95% CI 0.75-1.15, P = 0.509). Furthermore, compared with patients with less responsive tumor types, patients with more responsive tumors derived more survival benefit from site-specific therapy (HR 0.67, 95% CI 0.46-0.97, P = 0.037).
Conclusions Our results suggest that site-specific therapy is not significantly associated with improved survival outcomes; however, it might benefit patients with CUP with responsive tumor types. Studies evaluating the role of site-specific therapy guided by molecular profiling in CUP provided contradictory results. Site-specific therapy is not significantly associated with improved survival outcomes in the overall CUP population. Molecularly defined site-specific therapy may improve OS only when high-accuracy assays assign CUP to responsive tumor types.
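The pooling of hazard ratios described in this abstract can be illustrated with a standard inverse-variance fixed-effect model on the log-HR scale. This is a hedged sketch, not the authors' analysis: the abstract does not say whether fixed- or random-effects pooling was used, each study's standard error is back-calculated from its 95% CI under a normal approximation, and the function name and the two study tuples below are hypothetical.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% standard-normal quantile


def pooled_hr_fixed_effect(studies):
    """Inverse-variance fixed-effect pooling of hazard ratios on the log scale.

    studies: sequence of (hr, ci_low, ci_high) tuples with 95% CIs.
    Returns (pooled_hr, pooled_ci_low, pooled_ci_high).
    """
    if not studies:
        raise ValueError("need at least one study")
    weighted_sum = 0.0
    total_weight = 0.0
    for hr, ci_low, ci_high in studies:
        # Back-calculate the standard error of log(HR) from the CI width.
        se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
        weight = 1.0 / se ** 2
        weighted_sum += weight * math.log(hr)
        total_weight += weight
    pooled_log_hr = weighted_sum / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return (
        math.exp(pooled_log_hr),
        math.exp(pooled_log_hr - Z_95 * pooled_se),
        math.exp(pooled_log_hr + Z_95 * pooled_se),
    )


# Two hypothetical studies with equal precision and opposite effects.
hr, low, high = pooled_hr_fixed_effect([(2.0, 1.0, 4.0), (0.5, 0.25, 1.0)])
print(f"HR {hr:.2f} (95% CI {low:.2f}-{high:.2f})")  # HR 1.00 (95% CI 0.61-1.63)
```

Working on the log scale makes the hazard ratio approximately normally distributed, so the weights 1/SE² are meaningful. A random-effects model (for example DerSimonian-Laird) would instead add an estimated between-study variance to each study's variance before weighting, which widens the pooled CI when studies disagree.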
Affiliation(s)
- Y Ding
  - Department of Medical Oncology, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- J Jiang
  - Department of Surgical Oncology, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- J Xu
  - Department of Thoracic Surgery, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- Y Chen
  - Department of Surgical Oncology, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- Y Zheng
  - Department of Medical Oncology, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- W Jiang
  - Department of Colorectal Surgery, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- C Mao
  - Department of Medical Oncology, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- H Jiang
  - Department of Medical Oncology, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- X Bao
  - Department of Medical Oncology, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- Y Shen
  - Centre of Clinical Laboratory, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
  - Key Laboratory of Clinical In Vitro Diagnostic Techniques of Zhejiang Province, Hangzhou, China
  - Institute of Laboratory Medicine, Zhejiang University, Hangzhou, China
- X Li
  - Department of Surgery, The Second Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- L Teng
  - Department of Surgical Oncology, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- N Xu
  - Department of Medical Oncology, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
12
Abstract
This overview of the molecular pathology of lung cancer includes a review of the most salient molecular alterations of the genome, transcriptome, and the epigenome. The insights provided by the growing use of next-generation sequencing (NGS) in lung cancer will be discussed, and interrelated concepts such as intertumor heterogeneity, intratumor heterogeneity, tumor mutational burden, and the advent of liquid biopsy will be explored. Moreover, this work describes how the evolving field of molecular pathology refines the understanding of different histologic phenotypes of non-small-cell lung cancer (NSCLC) and the underlying biology of small-cell lung cancer. This review will provide an appreciation for how ongoing scientific findings and technologic advances in molecular pathology are crucial for development of biomarkers, therapeutic agents, clinical trials, and ultimately improved patient care.
Affiliation(s)
- James J Saller
  - Departments of Pathology and Thoracic Oncology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, Florida 33612, USA
- Theresa A Boyle
  - Departments of Pathology and Thoracic Oncology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, Florida 33612, USA
| |
Collapse
|
13
|
Feng C, Xiang T, Yi Z, Zhao L, He S, Tian K. An Ensemble Model for Tumor Type Identification and Cancer Origins Classification. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2021; 2021:1660-1665. [PMID: 34891604 DOI: 10.1109/embc46164.2021.9629691] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Tissue biopsy is widely used in cancer diagnosis. However, manually classifying biopsies as cancerous or non-cancerous, and determining the tissue of origin of the cancerous ones, requires skilled specialists and sophisticated equipment. A data-driven model is therefore urgently needed. In this paper, we propose an ensemble model for tumor type identification and cancer origin classification that combines submodels built on mRNA groups serving distinct functions. Experiments on the TCGA dataset show promising results on both tasks: 98% on tumor type identification and 96.1% on cancer origin classification. We also tested our model on external validation datasets, which demonstrate its robustness.
14
Li Y, Wu D, Wei C, Yang X, Zhou S. [CDK1, CCNB1 and NDC80 are associated with prognosis and progression of hepatitis B virus-associated hepatocellular carcinoma: a bioinformatic analysis]. NAN FANG YI KE DA XUE XUE BAO = JOURNAL OF SOUTHERN MEDICAL UNIVERSITY 2021; 41:1509-1518. [PMID: 34755666 DOI: 10.12122/j.issn.1673-4254.2021.10.09] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/24/2022]
Abstract
OBJECTIVE To identify the key genes involved in the progression of hepatitis B virus (HBV)-infected liver tissue to hepatocellular carcinoma (HCC) and explore the underlying molecular mechanisms. METHODS We analyzed the mRNA microarray data of 119 HBV-related HCC tissues and 252 HBV-related non-tumor tissues from the GEO datasets GSE55092, GSE84044 and GSE121248, using the "sva" R package to remove batch effects. Integration analysis was performed to identify the differentially expressed genes (DEGs) between HBV-related liver cancer and HBV-infected liver tissues. The significant DEGs were functionally annotated using GO and KEGG analyses, and the most important modules and hub genes were explored with STRING analysis. The Kaplan-Meier and Oncomine databases were used to validate the findings with HCC gene expression data from the TCGA database and to explore the correlations of the hub genes with the occurrence, progression and prognosis of HCC. We also examined the expression of the hub genes in 17 pairs of surgical specimens of HCC and adjacent tissues using RT-qPCR. RESULTS We identified a total of 121 DEGs and 3 genetic markers in HCC (P < 0.01). These DEGs included cyclin-dependent kinase 1 (CDK1), cyclin B1 (CCNB1), and nuclear division cycle 80 (NDC80), which participate in the cell cycle, pyrimidine metabolism and DNA replication and were highly correlated (P < 0.05). Analysis of the UALCAN database confirmed high expression of these 3 genes in HCC tissues, and Kaplan-Meier analysis of the prognostic data from the UALCAN database showed that this high expression was correlated with low patient survival. CDK1, CCNB1 and NDC80 were all correlated with the clinical grade of HCC (P < 0.05). RT-qPCR on the surgical specimens verified significantly higher expression of CDK1, CCNB1 and NDC80 mRNA in HCC tissues than in the adjacent tissues.
CONCLUSION CDK1, CCNB1 and NDC80 genes can be used as prognostic markers of HBV-related HCC and may serve as potential targets in preclinical studies and clinical treatment of HCC.
Affiliation(s)
- Y Li
- Department of Biochemistry and Molecular Biology, School of Basic Medicine, Guangxi Medical University, Nanning 530021, China; The Key Laboratory of Longevity and Geriatric-related Diseases of the Ministry of Education, Nanning 530021, China
- D Wu
- Department of Biochemistry and Molecular Biology, School of Basic Medicine, Guangxi Medical University, Nanning 530021, China; The Key Laboratory of Biomolecular Medicine Research in Guangxi Universities, Nanning 530021, China
- C Wei
- Department of Biochemistry and Molecular Biology, School of Basic Medicine, Guangxi Medical University, Nanning 530021, China; The Key Laboratory of Biomolecular Medicine Research in Guangxi Universities, Nanning 530021, China
- X Yang
- Department of Biochemistry and Molecular Biology, School of Basic Medicine, Guangxi Medical University, Nanning 530021, China; The Key Laboratory of Biomolecular Medicine Research in Guangxi Universities, Nanning 530021, China
- S Zhou
- Department of Biochemistry and Molecular Biology, School of Basic Medicine, Guangxi Medical University, Nanning 530021, China; The Key Laboratory of the Ministry of Education for Early Prevention and Treatment of Regional High-incidence Tumors, Nanning 530021, China
15
Thind AS, Monga I, Thakur PK, Kumari P, Dindhoria K, Krzak M, Ranson M, Ashford B. Demystifying emerging bulk RNA-Seq applications: the application and utility of bioinformatic methodology. Brief Bioinform 2021; 22:6330938. [PMID: 34329375 DOI: 10.1093/bib/bbab259] [Citation(s) in RCA: 40] [Impact Index Per Article: 10.0] [Received: 03/28/2021] [Revised: 06/14/2021] [Accepted: 06/18/2021] [Indexed: 12/13/2022] Open
Abstract
Significant innovations in next-generation sequencing techniques and bioinformatics tools have deepened our appreciation and understanding of RNA. Practical RNA sequencing (RNA-Seq) applications have evolved alongside advances in sequencing technology and bioinformatic tools. In most projects, bulk RNA-Seq data are used to measure gene expression patterns, isoform expression, alternative splicing and single-nucleotide polymorphisms. However, RNA-Seq holds far more hidden biological information, including details of copy number alteration, microbial contamination, transposable elements, cell type composition (deconvolution) and the presence of neoantigens. Recently developed bioinformatic algorithms can retrieve this information from bulk RNA-Seq data, thus broadening its scope. This review surveys the emerging bulk RNA-Seq-based analyses, emphasizing less familiar and underused applications. In doing so, we highlight the power of bulk RNA-Seq in providing biological insights.
Affiliation(s)
- Amarinder Singh Thind
- University of Wollongong, Wollongong, Australia; Illawarra Health and Medical Research Institute, Wollongong, Australia
- Isha Monga
- Columbia University, New York City, NY, USA
- Pallawi Kumari
- Institute of Microbial Technology, Council of Scientific and Industrial Research, Chandigarh, India
- Kiran Dindhoria
- Institute of Microbial Technology, Council of Scientific and Industrial Research, Chandigarh, India
- Marie Ranson
- University of Wollongong, Wollongong, Australia; Illawarra Health and Medical Research Institute, Wollongong, Australia
- Bruce Ashford
- University of Wollongong, Wollongong, Australia; Illawarra Health and Medical Research Institute, Wollongong, Australia
16
Vibert J, Pierron G, Benoist C, Gruel N, Guillemot D, Vincent-Salomon A, Le Tourneau C, Livartowski A, Mariani O, Baulande S, Bidard FC, Delattre O, Waterfall JJ, Watson S. Identification of Tissue of Origin and Guided Therapeutic Applications in Cancers of Unknown Primary Using Deep Learning and RNA Sequencing (TransCUPtomics). J Mol Diagn 2021; 23:1380-1392. [PMID: 34325056 DOI: 10.1016/j.jmoldx.2021.07.009] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 03/14/2021] [Revised: 05/14/2021] [Accepted: 07/14/2021] [Indexed: 01/04/2023] Open
Abstract
Cancers of unknown primary (CUP) are metastatic cancers for which the primary tumor is not found despite thorough diagnostic investigations. Multiple molecular assays have been proposed to identify the tissue of origin (TOO) and inform clinical care; however, none has combined accuracy, interpretability, and easy access for routine use. We developed a classifier tool, based on a trained variational autoencoder, that predicts tissue of origin from RNA-sequencing data. As training data, we used 20,918 samples spanning 94 categories, including 39 cancer types and 55 normal tissues. The TransCUPtomics classifier was applied to a retrospective cohort of 37 CUP patients and to 11 prospective patients. TransCUPtomics exhibited an overall accuracy of 96% on reference data for TOO prediction. The TOO could be identified in 38 (79%) of 48 CUP patients. Eight of 11 prospective CUP patients (73%) could receive first-line therapy guided by TransCUPtomics prediction, with responses observed in most patients. The variational autoencoder added further utility by enabling prediction interpretability, and diagnostic predictions could be matched to the detection of gene fusions and expressed variants. TransCUPtomics confidently predicted TOO for CUP and enabled tailored treatments leading to significant clinical responses. The interpretability of our approach is a powerful addition to improve the management of CUP patients.
Affiliation(s)
- Julien Vibert
- INSERM U830, Équipe Labellisée Ligue Nationale Contre le Cancer, Diversity and Plasticity of Childhood Tumors Lab, PSL Research University, Institut Curie Research Center, Paris, France
- Gaëlle Pierron
- Somatic Genetics Unit, Department of Genetics, Institut Curie Hospital, Paris, France
- Camille Benoist
- Clinical Bioinformatic Unit, Department of Diagnostic and Theranostic Medecine, Institut Curie Hospital, Paris, France
- Nadège Gruel
- INSERM U830, Équipe Labellisée Ligue Nationale Contre le Cancer, Diversity and Plasticity of Childhood Tumors Lab, PSL Research University, Institut Curie Research Center, Paris, France; Department of Translational Research, PSL Research University, Institut Curie Research Center, Paris, France
- Delphine Guillemot
- Somatic Genetics Unit, Department of Genetics, Institut Curie Hospital, Paris, France
- Anne Vincent-Salomon
- Department of Diagnostic and Theranostic Medecine, Institut Curie Hospital, Paris, France
- Christophe Le Tourneau
- Department of Drug Development and Innovation, INSERM U900, Paris-Saclay University, Institut Curie Hospital and Research Center, Paris and Saint-Cloud
- Alain Livartowski
- Department of Medical Oncology, Institut Curie Hospital, Paris, France
- Odette Mariani
- Department of Diagnostic and Theranostic Medecine, Institut Curie Hospital, Paris, France
- Sylvain Baulande
- Institut Curie Genomics of Excellence (ICGex) Platform, PSL Research University, Institut Curie Research Center, Paris, France
- François-Clément Bidard
- Department of Medical Oncology, Institut Curie Hospital, Paris, France; INSERM CIC-BT 1428, UVSQ, Paris-Saclay University, Saint-Cloud, France
- Olivier Delattre
- INSERM U830, Équipe Labellisée Ligue Nationale Contre le Cancer, Diversity and Plasticity of Childhood Tumors Lab, PSL Research University, Institut Curie Research Center, Paris, France; Somatic Genetics Unit, Department of Genetics, Institut Curie Hospital, Paris, France
- Joshua J Waterfall
- Department of Translational Research, PSL Research University, Institut Curie Research Center, Paris, France; INSERM U830, PSL Research University, Institut Curie Research Center, Paris, France
- Sarah Watson
- INSERM U830, Équipe Labellisée Ligue Nationale Contre le Cancer, Diversity and Plasticity of Childhood Tumors Lab, PSL Research University, Institut Curie Research Center, Paris, France; Department of Medical Oncology, Institut Curie Hospital, Paris, France.
17
Dermawan JK, Rubin BP. The role of molecular profiling in the diagnosis and management of metastatic undifferentiated cancer of unknown primary: Molecular profiling of metastatic cancer of unknown primary. Semin Diagn Pathol 2020; 38:193-198. [PMID: 33309276 DOI: 10.1053/j.semdp.2020.12.001] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 08/31/2020] [Revised: 11/24/2020] [Accepted: 12/02/2020] [Indexed: 12/17/2022]
Abstract
Cancer of unknown primary (CUP) refers to metastatic tumors for which the primary tumor of origin cannot be determined at the time of diagnosis, despite extensive clinicopathologic investigations. Molecular profiling is increasingly able to predict a probable primary tumor type for CUP when clinicopathologic workup is inconclusive. Numerous studies have explored the use of various molecular profiling techniques for identifying the site or tissue of origin of CUP. These techniques include gene expression profiling by microarray, reverse transcriptase polymerase chain reaction, and RNA sequencing; somatic gene mutation profiling with next-generation DNA sequencing; and epigenomic approaches, including DNA methylation profiling. Despite the generally poor prognosis of CUP, a minority of patients can expect to benefit from targeted therapy that is agnostic to the tissue of origin. Studies have also explored the use of various molecular profiling techniques to predict prognostic and therapeutic biomarkers, with the goal of improving outcomes for patients with CUP. However, discordant results between non-randomized and randomized clinical trials evaluating tumor-type-specific therapies raise uncertainty about the benefit of treatment based on molecularly predicted tissue of origin in routine clinical use. Nevertheless, the current overall trend favors using molecular tools to refine the diagnosis and clinical management of patients with CUP. More large-cohort, randomized prospective studies are needed to assess and validate the utility and feasibility of molecular profiling for uncovering potentially targetable genetic alterations. These efforts will also yield further insights into the biology and pathogenesis of CUP (Graphical Abstract).
Affiliation(s)
- Josephine K Dermawan
- Robert J. Tomsich Pathology and Laboratory Medicine Institute, Cleveland Clinic, Cleveland, OH 44195, United States
- Brian P Rubin
- Robert J. Tomsich Pathology and Laboratory Medicine Institute, Cleveland Clinic, Cleveland, OH 44195, United States.
18
Tsimberidou AM, Fountzilas E, Bleris L, Kurzrock R. Transcriptomics and solid tumors: The next frontier in precision cancer medicine. Semin Cancer Biol 2020; 84:50-59. [PMID: 32950605 DOI: 10.1016/j.semcancer.2020.09.007] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 02/07/2020] [Revised: 08/16/2020] [Accepted: 09/09/2020] [Indexed: 01/08/2023]
Abstract
Transcriptomics, which encompasses assessments of alternative splicing and alternative polyadenylation, identification of fusion transcripts, explorations of noncoding RNAs, transcript annotation, and discovery of novel transcripts, is a valuable tool for understanding cancer mechanisms and identifying biomarkers. Recent advances in high-throughput technologies have enabled large-scale gene expression profiling. Importantly, RNA expression profiling of tumor tissue has been successfully used to determine clinically actionable molecular alterations. The WINTHER precision medicine clinical trial was the first prospective trial in diverse solid malignancies that assessed both genomics and transcriptomics to match treatments to specific molecular alterations. The use of transcriptome analysis in WINTHER and other trials increased the number of targetable -omic changes compared to genomic profiling alone. Other applications of transcriptomics involve the evaluation of tumor and circulating noncoding RNAs as predictive and prognostic biomarkers, the improvement of risk stratification by the use of prognostic and predictive multigene assays, the identification of fusion transcripts that drive tumors, and an improved understanding of the impact of DNA changes as some genomic alterations are silenced at the RNA level. Finally, RNA sequencing and gene expression analysis have been incorporated into clinical trials to identify markers predicting response to immunotherapy. Many issues regarding the complexity of the analysis, its reproducibility and variability, and the interpretation of the results still need to be addressed. The integration of transcriptomics with genomics, proteomics, epigenetics, and tumor immune profiling will improve biomarker discovery and our understanding of disease mechanisms and, thereby, accelerate the implementation of precision oncology.
Affiliation(s)
- Apostolia M Tsimberidou
- The University of Texas MD Anderson Cancer Center, Department of Investigational Cancer Therapeutics, Houston, TX, USA.
- Elena Fountzilas
- Department of Medical Oncology, Euromedica General Clinic, Thessaloniki, Greece
- Leonidas Bleris
- Bioengineering Department, The University of Texas at Dallas, Richardson, TX, USA
- Razelle Kurzrock
- Center for Personalized Cancer Therapy and Division of Hematology and Oncology, UC San Diego Moores Cancer Center, San Diego, CA, USA
19
Kim AA, Rachid Zaim S, Subbian V. Assessing reproducibility and veracity across machine learning techniques in biomedicine: A case study using TCGA data. Int J Med Inform 2020; 141:104148. [DOI: 10.1016/j.ijmedinf.2020.104148] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 10/30/2019] [Revised: 03/22/2020] [Accepted: 04/16/2020] [Indexed: 11/28/2022]
20
Ji Y, Yin Y, Zhang W. Integrated Bioinformatic Analysis Identifies Networks and Promising Biomarkers for Hepatitis B Virus-Related Hepatocellular Carcinoma. Int J Genomics 2020; 2020:2061024. [PMID: 32775402 PMCID: PMC7407030 DOI: 10.1155/2020/2061024] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 02/26/2020] [Revised: 06/09/2020] [Accepted: 06/27/2020] [Indexed: 02/06/2023] Open
Abstract
Chronic infection with hepatitis B virus (HBV) has long been recognized as a dominant risk factor for hepatocellular carcinoma (HCC) and accounts for at least half of HCC cases globally. However, the molecular mechanism underlying HBV-linked HCC has not been completely elucidated. Here, three microarray datasets from the Gene Expression Omnibus (GEO) database, together containing 170 tumor samples and 181 adjacent normal liver tissues from patients with HBV-related HCC, were subjected to integrated analysis of differentially expressed genes (DEGs). Function and pathway enrichment analyses were then performed, and a protein-protein interaction (PPI) network was constructed. The ten hub genes screened from the PPI network were further subjected to expression profile and survival analysis. Overall, 329 DEGs (67 upregulated and 262 downregulated) were identified. The ten DEGs with the highest degree of connectivity were cyclin-dependent kinase 1 (CDK1), cyclin B1 (CCNB1), cyclin B2 (CCNB2), PDZ-binding kinase (PBK), abnormal spindle microtubule assembly (ASPM), nuclear division cycle 80 (NDC80), aurora kinase A (AURKA), targeting protein for Xenopus kinesin-like protein 2 (TPX2), kinesin family member 2C (KIF2C), and centromere protein F (CENPF). Kaplan-Meier analysis revealed that overexpression of KIF2C and TPX2 was associated with both poor overall survival and poor relapse-free survival. In summary, the hub genes validated in the present study may provide promising targets for the diagnosis, prognosis, and therapy of HBV-associated HCC. Additionally, our work uncovers various crucial biological components (e.g., extracellular exosome) and signaling pathways that participate in the progression of HBV-induced HCC, providing comprehensive knowledge of the mechanisms underlying HBV-related HCC.
Affiliation(s)
- Yun Ji
- Department of Physiology and Pathophysiology, Peking University Health Science Center, Beijing 100191, China
- Yue Yin
- Department of Physiology and Pathophysiology, Peking University Health Science Center, Beijing 100191, China
- Weizhen Zhang
- Department of Physiology and Pathophysiology, Peking University Health Science Center, Beijing 100191, China
21
Fan F, Chen D, Zhao Y, Wang H, Sun H, Sun K. Rapid preliminary purity evaluation of tumor biopsies using deep learning approach. Comput Struct Biotechnol J 2020; 18:1746-1753. [PMID: 32695267 PMCID: PMC7352054 DOI: 10.1016/j.csbj.2020.06.007] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 02/28/2020] [Revised: 05/18/2020] [Accepted: 06/05/2020] [Indexed: 12/29/2022] Open
Abstract
Tumor biopsy is one of the most widely used materials in cancer diagnosis and molecular studies, and the purity of a biopsy (i.e., the proportion of cells that are cancerous) is crucial for both applications. However, conventional approaches to evaluating tumor biopsy purity require experienced pathologists and/or various materials and experiments, and are therefore time-consuming and error-prone. Rapid, easy-to-perform and cost-effective methods are thus still in demand. Recent studies have demonstrated that molecular signatures are informative for this task. Previously, we developed GeneCT, a deep learning-based cancerous status and tissue-of-origin classifier for pan-tumor/tissue biopsies. In the current work, we applied GeneCT to datasets collected from various groups, whose experimental protocols and cancer types differed from each other. We found that GeneCT showed high accuracy on most datasets; for samples with unexpected results, in-depth investigations suggested that they might suffer from imperfect purity. In silico mixture experiments further showed that GeneCT classification was highly indicative of tumor biopsy purity. Considering that transcriptome profiling is a common and inexpensive experiment in molecular cancer studies, our deep learning-based GeneCT could thus serve as a valuable tool for rapid, preliminary tumor biopsy purity assessment.
Affiliation(s)
- Fei Fan
- Department of Neurosurgery, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430022, China
- Dan Chen
- The Third Affiliated Hospital (Provisional) of The Chinese University of Hong Kong, Shenzhen, Shenzhen 518172, China
- Yu Zhao
- Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Hong Kong SAR 999077, China
- Huating Wang
- Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Hong Kong SAR 999077, China; Department of Orthopaedics and Traumatology, The Chinese University of Hong Kong, Hong Kong SAR 999077, China
- Hao Sun
- Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Hong Kong SAR 999077, China; Department of Chemical Pathology, The Chinese University of Hong Kong, Hong Kong SAR 999077, China
- Kun Sun
- Shenzhen Bay Laboratory, Shenzhen 518132, China
22
Identification of common and dissimilar biomarkers for different cancer types from gene expressions of RNA-sequencing data. GENE REPORTS 2020. [DOI: 10.1016/j.genrep.2020.100654] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
23
Jiang S, Cheng SJ, Ren LC, Wang Q, Kang YJ, Ding Y, Hou M, Yang XX, Lin Y, Liang N, Gao G. An expanded landscape of human long noncoding RNA. Nucleic Acids Res 2019; 47:7842-7856. [PMID: 31350901 PMCID: PMC6735957 DOI: 10.1093/nar/gkz621] [Citation(s) in RCA: 81] [Impact Index Per Article: 13.5] [Received: 04/28/2019] [Revised: 06/18/2019] [Accepted: 07/11/2019] [Indexed: 12/21/2022] Open
Abstract
Long noncoding RNAs (lncRNAs) are emerging as key regulators of multiple essential biological processes involved in physiology and pathology. By analyzing the largest compendium of 14,166 samples from normal and tumor tissues, we significantly expand the landscape of human long noncoding RNA with a high-quality atlas: RefLnc (Reference catalog of LncRNA). Powered by comprehensive annotation across multiple sources, RefLnc helps to pinpoint 275 novel intergenic lncRNAs correlated with sex, age or race as well as 369 novel ones associated with patient survival, clinical stage, tumor metastasis or recurrence. Integrated in a user-friendly online portal, the expanded catalog of human lncRNAs provides a valuable resource for investigating lncRNA function in both human biology and cancer development.
Affiliation(s)
- Shuai Jiang
- Biomedical Pioneering Innovation Center (BIOPIC), Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), and State Key Laboratory of Protein and Plant Gene Research at School of Life Sciences, Peking University, Beijing 100871, China
- Si-Jin Cheng
- Biomedical Pioneering Innovation Center (BIOPIC), Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), and State Key Laboratory of Protein and Plant Gene Research at School of Life Sciences, Peking University, Beijing 100871, China
- Li-Chen Ren
- Biomedical Pioneering Innovation Center (BIOPIC), Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), and State Key Laboratory of Protein and Plant Gene Research at School of Life Sciences, Peking University, Beijing 100871, China
- Qian Wang
- Biomedical Pioneering Innovation Center (BIOPIC), Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), and State Key Laboratory of Protein and Plant Gene Research at School of Life Sciences, Peking University, Beijing 100871, China
- Yu-Jian Kang
- Biomedical Pioneering Innovation Center (BIOPIC), Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), and State Key Laboratory of Protein and Plant Gene Research at School of Life Sciences, Peking University, Beijing 100871, China
- Yang Ding
- Biomedical Pioneering Innovation Center (BIOPIC), Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), and State Key Laboratory of Protein and Plant Gene Research at School of Life Sciences, Peking University, Beijing 100871, China
- Mei Hou
- Biomedical Pioneering Innovation Center (BIOPIC), Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), and State Key Laboratory of Protein and Plant Gene Research at School of Life Sciences, Peking University, Beijing 100871, China
- Xiao-Xu Yang
- Biomedical Pioneering Innovation Center (BIOPIC), Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), and State Key Laboratory of Protein and Plant Gene Research at School of Life Sciences, Peking University, Beijing 100871, China
- Yuan Lin
- Biomedical Pioneering Innovation Center (BIOPIC), Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), and State Key Laboratory of Protein and Plant Gene Research at School of Life Sciences, Peking University, Beijing 100871, China
- Nan Liang
- Biomedical Pioneering Innovation Center (BIOPIC), Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), and State Key Laboratory of Protein and Plant Gene Research at School of Life Sciences, Peking University, Beijing 100871, China
- Ge Gao
- Biomedical Pioneering Innovation Center (BIOPIC), Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), and State Key Laboratory of Protein and Plant Gene Research at School of Life Sciences, Peking University, Beijing 100871, China
24
Johnson NT, Dhroso A, Hughes KJ, Korkin D. Biological classification with RNA-seq data: Can alternatively spliced transcript expression enhance machine learning classifiers? RNA (NEW YORK, N.Y.) 2018; 24:1119-1132. [PMID: 29941426 PMCID: PMC6097660 DOI: 10.1261/rna.062802.117] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 06/30/2017] [Accepted: 06/03/2018] [Indexed: 05/09/2023]
Abstract
RNA sequencing (RNA-seq) is becoming a prevalent approach to quantify gene expression and is expected to provide better insights into a number of biological and biomedical questions than DNA microarrays. Most importantly, RNA-seq allows us to quantify expression at either the gene or the transcript level. However, leveraging RNA-seq data requires the development of new data mining and analytics methods. Supervised learning methods are commonly used for biological data analysis and have recently gained attention for their applications to RNA-seq data. Here, we assess the utility of supervised learning methods trained on RNA-seq data for a diverse range of biological classification tasks. We hypothesize that transcript-level expression data are more informative for biological classification tasks than gene-level expression data. Our large-scale assessment utilizes multiple data sets, organisms, lab groups, and RNA-seq analysis pipelines. Overall, we performed and assessed 61 biological classification problems that leverage three independent RNA-seq data sets comprising over 2000 samples. These 61 problems include predictions of the tissue type, sex, or age of the sample, healthy or cancerous phenotypes, and pathological tumor stages for samples from cancerous tissue. For each problem, the performance of three normalization techniques and six machine learning classifiers was explored. We find that for every single classification problem, the transcript-based classifiers outperform or are comparable with gene expression-based methods. The top-performing techniques reached near-perfect classification accuracy, demonstrating the utility of supervised learning for RNA-seq-based data analysis.
Affiliation(s)
- Nathan T Johnson
- Worcester Polytechnic Institute, Bioinformatics and Computational Biology Program, Worcester, Massachusetts 01609, USA
- Andi Dhroso
- Worcester Polytechnic Institute, Bioinformatics and Computational Biology Program, Worcester, Massachusetts 01609, USA
- Katelyn J Hughes
- Worcester Polytechnic Institute, Bioinformatics and Computational Biology Program, Worcester, Massachusetts 01609, USA
- Dmitry Korkin
- Worcester Polytechnic Institute, Bioinformatics and Computational Biology Program, Worcester, Massachusetts 01609, USA
- Worcester Polytechnic Institute, Department of Computer Science, Worcester, Massachusetts 01609, USA
25
Jiang W, Ding Y, Shen Y, Fan L, Zhou L, Li Z, Zheng Y, Zhao P, Liu L, Tong Z, Fang W, Wang W. Identifying the clonal origin of synchronous multifocal tumors in the hepatobiliary and pancreatic system using multi-omic platforms. Oncotarget 2017; 8:5016-5025. [PMID: 28008139 PMCID: PMC5354888 DOI: 10.18632/oncotarget.14018] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 07/27/2016] [Accepted: 12/07/2016] [Indexed: 01/06/2023]
Abstract
Synchronous multifocal tumors often pose a diagnostic challenge for oncologists. The purpose of this study was to determine the clonal origin and metastatic relationship of synchronous multifocal tumors in the hepatobiliary and pancreatic system using multi-omic platforms. DNA samples were extracted from three masses harvested from a 50-year-old Han Chinese male patient who suffered from synchronous multifocal tumors in the pancreatic tail, upper biliary duct, and omentum at the time of diagnosis. The clonal origin of these samples was tested using two platforms: next-generation sequencing (NGS) of 390 key genes harboring cancer-relevant actionable mutations and whole-genome copy number variation (CNV) chip analysis. The NGS approach revealed high mutational concordance, and the gene CNV profiles were similar between lesions. Whole-genome CNVs for the three samples were further investigated using an Affymetrix chip. Using matched CNV chip data from The Cancer Genome Atlas (TCGA), we developed a computational model that generated tissue-specific CNV signatures for hepatocellular carcinoma, pancreatic carcinoma, and cholangiocarcinoma to accurately identify the origin of the tumor samples. After adding the patient's CNV chip data to the model, all three samples were clustered into the pancreatic cancer branch. Both our NGS and CNV chip analyses suggested that clinically diagnosed synchronous pancreatic cancer and cholangiocarcinoma originated from the same cell population in the pancreas in our patient. This study highlights the use of genomic tools to infer the origin of synchronous multifocal tumors, which could help to improve the accuracy of cancer diagnosis.
Affiliation(s)
- Weiqin Jiang
- Cancer Biotherapy Center, First Affiliated Hospital, Zhejiang University, China
- Yongfeng Ding
- Department of Surgical Oncology, First Affiliated Hospital, Zhejiang University, China
- Yifei Shen
- Institute of Bioinformatics & Research Center for Air Pollution and Health, Zhejiang University, China
- Longjiang Fan
- Institute of Bioinformatics & Research Center for Air Pollution and Health, Zhejiang University, China
- Linfu Zhou
- Medical Biotechnology Laboratory, Zhejiang University, China
- Zhi Li
- Department of Radiology, First Affiliated Hospital, Zhejiang University, China
- Yi Zheng
- Cancer Biotherapy Center, First Affiliated Hospital, Zhejiang University, China
- Peng Zhao
- Cancer Biotherapy Center, First Affiliated Hospital, Zhejiang University, China
- Lulu Liu
- Cancer Biotherapy Center, First Affiliated Hospital, Zhejiang University, China
- Zhou Tong
- Cancer Biotherapy Center, First Affiliated Hospital, Zhejiang University, China
- Weijia Fang
- Cancer Biotherapy Center, First Affiliated Hospital, Zhejiang University, China
- Weilin Wang
- Key Laboratory of Precision Diagnosis & Treatment for Hepatobiliary & Pancreatic Tumor, First Affiliated Hospital, Zhejiang University, China
- Division of Hepatobiliary and Pancreatic Surgery, Department of Surgery, First Affiliated Hospital, Zhejiang University, China
- Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, China
26
Kumar-Sinha C, Chinnaiyan AM. Precision oncology in the age of integrative genomics. Nat Biotechnol 2018; 36:46-60. [PMID: 29319699 PMCID: PMC6364676 DOI: 10.1038/nbt.4017] [Citation(s) in RCA: 87] [Impact Index Per Article: 12.4] [Received: 09/28/2016] [Accepted: 10/20/2017] [Indexed: 02/08/2023]
Abstract
Precision oncology applies genomic and other molecular analyses of tumor biopsies to improve the diagnosis and treatment of cancers. In addition to identifying therapeutic options, precision oncology tracks the response of a tumor to an intervention at the molecular level and detects drug resistance and the mechanisms by which it occurs. Integrative genomics can include sequencing specific panels of genes, exomes, or the entire triad of the patient's germline, tumor exome, and tumor transcriptome. Although the capabilities of sequencing technologies continue to improve, widespread adoption of genomics-driven precision oncology in the clinic has been held back by logistical, regulatory, financial, and ethical considerations. Nevertheless, integrative clinical sequencing programs applied at the point of care have the potential to improve the clinical management of cancer patients.
Affiliation(s)
- Chandan Kumar-Sinha
- Michigan Center for Translational Pathology
- Department of Pathology, University of Michigan
- Arul M. Chinnaiyan
- Michigan Center for Translational Pathology
- Department of Pathology, University of Michigan
- Department of Computational Medicine and Bioinformatics, University of Michigan
- Howard Hughes Medical Institute, University of Michigan Medical School
- Department of Urology, University of Michigan
- Comprehensive Cancer Center, University of Michigan Medical School, Ann Arbor, MI 48109
27
Cieślik M, Chinnaiyan AM. Cancer transcriptome profiling at the juncture of clinical translation. Nat Rev Genet 2017; 19:93-109. [PMID: 29279605 DOI: 10.1038/nrg.2017.96] [Citation(s) in RCA: 159] [Impact Index Per Article: 19.9] [Indexed: 12/13/2022]
Abstract
Methodological breakthroughs over the past four decades have repeatedly revolutionized transcriptome profiling. Using RNA sequencing (RNA-seq), it has now become possible to sequence and quantify the transcriptional outputs of individual cells or thousands of samples. These transcriptomes provide a link between cellular phenotypes and their molecular underpinnings, such as mutations. In the context of cancer, this link represents an opportunity to dissect the complexity and heterogeneity of tumours and to discover new biomarkers or therapeutic strategies. Here, we review the rationale, methodology and translational impact of transcriptome profiling in cancer.
Affiliation(s)
- Marcin Cieślik
- Michigan Center for Translational Pathology, University of Michigan
- Department of Pathology, University of Michigan
- Arul M Chinnaiyan
- Michigan Center for Translational Pathology, University of Michigan
- Department of Pathology, University of Michigan
- Comprehensive Cancer Center, University of Michigan
- Department of Urology, University of Michigan
- Howard Hughes Medical Institute, University of Michigan, Ann Arbor, Michigan 48109, USA
28
Shang J, Song Q, Yang Z, Li D, Chen W, Luo L, Wang Y, Yang J, Li S. Identification of lung adenocarcinoma specific dysregulated genes with diagnostic and prognostic value across 27 TCGA cancer types. Oncotarget 2017; 8:87292-87306. [PMID: 29152081 PMCID: PMC5675633 DOI: 10.18632/oncotarget.19823] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 04/14/2017] [Accepted: 06/18/2017] [Indexed: 01/06/2023]
Abstract
As the most common histologic subtype of lung cancer, lung adenocarcinoma (LUAD) contributes to a majority of cancer-related deaths worldwide annually. To find LUAD-specific biomarkers that distinguish LUAD from other types of cancer, and thereby improve early diagnosis and prognosis in LUAD, we analyzed 10,098 tumor tissue samples across 27 TCGA cancer types and identified 112 genes specifically expressed in LUAD. Meanwhile, 8,240 genes dysregulated between LUAD tumor and normal samples were identified. Combining the specifically expressed genes with the dysregulated genes, we found 70 specific dysregulated genes in LUAD (LUAD-SDGs). ROC curve analysis then revealed six LUAD-SDGs that may have strong diagnostic value for predicting the existence of cancer (area under the curve [AUC] > 95%). Kaplan-Meier survival analysis identified 6 LUAD-SDGs associated with patient prognosis (P < 0.001). Multivariate Cox proportional hazards regression demonstrated that the six LUAD-SDGs were independent prognostic factors. We then used the six overall survival (OS)-related LUAD-SDGs to construct a six-gene signature. Multivariate Cox regression analysis suggested that the six-gene signature was a prognostic factor independent of other clinical variables (hazard ratio [HR] = 1.5098, 95% CI = 1.2996-1.7538, P < 0.0001). Based on these findings, we present for the first time the LUAD-SDGs for LUAD diagnosis and prognosis. Our results may provide efficient biomarkers for clinical diagnostic and prognostic evaluation in LUAD.
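The diagnostic AUC values quoted above (AUC > 95%, i.e. above 0.95) can be computed without drawing a curve: the area under the empirical ROC curve equals the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half (the Mann-Whitney formulation). A minimal sketch, assuming a single gene's expression is used directly as the score; the function name and expression values are hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the empirical ROC curve via pairwise comparisons.

    scores: numeric scores (e.g. expression of one candidate gene)
    labels: 0/1 labels of the same length (1 = LUAD, 0 = other)
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # positive ranked above negative
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical expression of one gene in 4 LUAD and 4 non-LUAD samples.
scores = [9.1, 8.7, 7.9, 6.0, 6.5, 5.2, 4.8, 3.9]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
print(roc_auc(scores, labels))  # 0.9375
```

Here 15 of the 16 positive/negative pairs are ordered correctly, giving 15/16 = 0.9375, just below the 0.95 threshold mentioned in the abstract.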
Affiliation(s)
- Jun Shang
- Department of Thoracic and Cardiovascular Surgery, The First Affiliated Hospital of Guangxi Medical University, Nanning 530021, P. R. China
- Qian Song
- Department of Thoracic and Cardiovascular Surgery, The First Affiliated Hospital of Guangxi Medical University, Nanning 530021, P. R. China
- Zuyi Yang
- Department of Hematology, The First Affiliated Hospital of Soochow University, Suzhou 215006, P. R. China
- Dongyao Li
- Department of Thoracic and Cardiovascular Surgery, The First Affiliated Hospital of Guangxi Medical University, Nanning 530021, P. R. China
- Wenjie Chen
- Department of Thoracic and Cardiovascular Surgery, The First Affiliated Hospital of Guangxi Medical University, Nanning 530021, P. R. China
- Lei Luo
- Department of Thoracic and Cardiovascular Surgery, The First Affiliated Hospital of Guangxi Medical University, Nanning 530021, P. R. China
- Yongkun Wang
- Department of Thoracic and Cardiovascular Surgery, The First Affiliated Hospital of Guangxi Medical University, Nanning 530021, P. R. China
- Jingcheng Yang
- Department of Thoracic and Cardiovascular Surgery, The First Affiliated Hospital of Guangxi Medical University, Nanning 530021, P. R. China
- Shikang Li
- Department of Thoracic and Cardiovascular Surgery, The First Affiliated Hospital of Guangxi Medical University, Nanning 530021, P. R. China
29
Jiang W, Shen Y, Ding Y, Ye C, Zheng Y, Zhao P, Liu L, Tong Z, Zhou L, Sun S, Zhang X, Teng L, Timko MP, Fan L, Fang W. A naive Bayes algorithm for tissue origin diagnosis (TOD-Bayes) of synchronous multifocal tumors in the hepatobiliary and pancreatic system. Int J Cancer 2017; 142:357-368. [PMID: 28921531 DOI: 10.1002/ijc.31054] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 01/27/2017] [Revised: 08/24/2017] [Accepted: 09/04/2017] [Indexed: 12/30/2022]
Abstract
Synchronous multifocal tumors are common in the hepatobiliary and pancreatic system, but because of similarities in their histological features, oncologists have difficulty identifying their precise tissue clonal origin through routine histopathological methods. To address this problem and assist in more precise diagnosis, we developed a computational approach for tissue origin diagnosis based on a naive Bayes algorithm (TOD-Bayes) using ubiquitous RNA-Seq data. Large tissue-specific RNA-Seq data sets were first obtained from The Cancer Genome Atlas (TCGA), and ∼1,000 feature genes were used to train and validate the TOD-Bayes algorithm. The accuracy of the model was >95% based on tenfold cross-validation on the TCGA data. A total of 18 clinical cancer samples (including six negative controls) with definitive tissue origin were subsequently used for external validation, and 17 of the 18 samples were classified correctly (94.4%). Furthermore, we included as case studies seven tumor samples taken from two individuals with synchronous multifocal tumors across tissues, for whom efforts to make a definitive primary cancer diagnosis by traditional diagnostic methods had failed. Using TOD-Bayes analysis, the two clinical test cases were successfully diagnosed as pancreatic cancer (PC) and cholangiocarcinoma (CC), respectively, in agreement with their clinical outcomes. Based on our findings, we believe that the TOD-Bayes algorithm is a powerful novel methodology to accurately identify the tissue origin of synchronous multifocal tumors of unknown primary cancers using RNA-Seq data and an important step toward more precision-based medicine in cancer diagnosis and treatment.
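The abstract does not specify the likelihood model behind TOD-Bayes. As an illustration only, the sketch below implements a generic Gaussian naive Bayes classifier, which assumes each feature gene is normally distributed and independent of the other genes within a tissue class. The function names, the variance floor, and the two-gene toy data are hypothetical and are not the published model.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, var_floor=1e-6):
    """Estimate class log-priors and per-feature means and variances.

    X: list of samples, each a list of feature values (e.g. log expression)
    y: list of class labels, one per sample
    var_floor: hypothetical lower bound that keeps variances positive
    """
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))  # one tuple of values per feature gene
        means = [sum(col) / n for col in columns]
        variances = [
            max(sum((v - m) ** 2 for v in col) / n, var_floor)
            for col, m in zip(columns, means)
        ]
        model[label] = (math.log(n / len(X)), means, variances)
    return model


def predict_gaussian_nb(model, x):
    """Return the label with the largest log posterior (up to a shared constant)."""

    def log_posterior(label):
        log_prior, means, variances = model[label]
        total = log_prior
        for v, m, var in zip(x, means, variances):
            # Log Gaussian density; features are treated as independent given the class.
            total += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return total

    return max(model, key=log_posterior)


# Hypothetical two-gene profiles for two tissue classes.
X = [[8.0, 2.0], [7.5, 2.5], [8.2, 1.8], [2.0, 7.9], [2.4, 8.3], [1.8, 7.6]]
y = ["PC", "PC", "PC", "CC", "CC", "CC"]
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, [7.8, 2.2]))  # PC
print(predict_gaussian_nb(model, [2.1, 8.0]))  # CC
```

Working in log space avoids numerical underflow when the product of per-gene likelihoods spans on the order of 1,000 feature genes, as in the setting described above.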
Affiliation(s)
- Weiqin Jiang
- Cancer Biotherapy Center, First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- Yifei Shen
- Institute of Bioinformatics & IBM Bio-computational Laboratory, Zhejiang University, Hangzhou, China
- Research Center for Air Pollution and Health, Zhejiang University, Hangzhou, China
- Yongfeng Ding
- Department of Surgical Oncology, First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- Chuyu Ye
- Institute of Bioinformatics & IBM Bio-computational Laboratory, Zhejiang University, Hangzhou, China
- Research Center for Air Pollution and Health, Zhejiang University, Hangzhou, China
- Yi Zheng
- Cancer Biotherapy Center, First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- Peng Zhao
- Cancer Biotherapy Center, First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- Key Laboratory of Precision Diagnosis & Treatment for Hepatobiliary & Pancreatic Tumor of Zhejiang Province, First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- Lulu Liu
- Cancer Biotherapy Center, First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- Zhou Tong
- Cancer Biotherapy Center, First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- Linfu Zhou
- Medical Biotechnology Laboratory, School of Medicine, Zhejiang University, Hangzhou, China
- Shuo Sun
- Institute of Bioinformatics & IBM Bio-computational Laboratory, Zhejiang University, Hangzhou, China
- Research Center for Air Pollution and Health, Zhejiang University, Hangzhou, China
- Xingchen Zhang
- Institute of Bioinformatics & IBM Bio-computational Laboratory, Zhejiang University, Hangzhou, China
- Research Center for Air Pollution and Health, Zhejiang University, Hangzhou, China
- Lisong Teng
- Department of Surgical Oncology, First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- Key Laboratory of Precision Diagnosis & Treatment for Hepatobiliary & Pancreatic Tumor of Zhejiang Province, First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- Michael P Timko
- Departments of Biology and Public Health Science, University of Virginia, Charlottesville, VA 22904
- Longjiang Fan
- Institute of Bioinformatics & IBM Bio-computational Laboratory, Zhejiang University, Hangzhou, China
- Research Center for Air Pollution and Health, Zhejiang University, Hangzhou, China
- Weijia Fang
- Cancer Biotherapy Center, First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- Key Laboratory of Precision Diagnosis & Treatment for Hepatobiliary & Pancreatic Tumor of Zhejiang Province, First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
30
Rodriguez SA, Impey SD, Pelz C, Enestvedt B, Bakis G, Owens M, Morgan TK. RNA sequencing distinguishes benign from malignant pancreatic lesions sampled by EUS-guided FNA. Gastrointest Endosc 2016; 84:252-8. [PMID: 26808815 DOI: 10.1016/j.gie.2016.01.042] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 11/11/2015] [Accepted: 01/14/2016] [Indexed: 12/11/2022]
Abstract
BACKGROUND AND AIMS: EUS-guided FNA (EUS-FNA) is the primary method used to obtain pancreatic tissue for preoperative diagnosis. Accumulating evidence suggests diagnostic and prognostic information may be obtained by gene-expression profiling of these biopsy specimens. RNA sequencing (RNAseq) is a newer method of gene-expression profiling, but published data are scant on the use of this method on pancreas tissue obtained via EUS-FNA. The aim of this study was to determine whether RNAseq of EUS-FNA biopsy samples of undiagnosed pancreatic masses can reliably discriminate between benign and malignant tissue.
METHODS: In this prospective study, consenting adults presented to 2 tertiary care hospitals for EUS of suspected pancreatic mass. Tissue was submitted for RNAseq. The results were compared with cytologic diagnosis, surgical pathology diagnosis, or benign clinical follow-up of at least 1 year.
RESULTS: Forty-eight patients with solid pancreatic mass lesions were enrolled. Nine samples were excluded because of inadequate RNA and 3 because of final pathologic diagnosis of neuroendocrine tumor. Data from the first 13 patients were used to construct a linear classifier, and this was tested on the final 23 patients (15 malignant and 8 benign lesions). RNAseq of EUS-FNA biopsy samples distinguishes ductal adenocarcinoma from benign pancreatic solid masses with a sensitivity of .87 (range, .58-.98) and specificity of .75 (range, .35-.96).
CONCLUSIONS: This proof-of-principle study suggests RNAseq of EUS-FNA samples can reliably detect adenocarcinoma and may provide a new method to evaluate more diagnostically challenging pancreatic lesions.
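The reported performance figures follow from a 2x2 confusion matrix on the test set (48 enrolled, minus 9 with inadequate RNA and 3 neuroendocrine tumors, leaves 36 = 13 training + 23 test patients). The counts used below (13 of 15 malignant and 6 of 8 benign lesions classified correctly) are inferred from the rounded values .87 and .75 and are not stated in the abstract; the function name is hypothetical.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # fraction of malignant lesions called malignant
    specificity = tn / (tn + fp)  # fraction of benign lesions called benign
    return sensitivity, specificity


# Inferred counts (not stated in the abstract) for 15 malignant and 8 benign test lesions.
sens, spec = sensitivity_specificity(tp=13, fn=2, tn=6, fp=2)
print(f"sensitivity {sens:.2f}, specificity {spec:.2f}")  # sensitivity 0.87, specificity 0.75
```

The bracketed ranges in the abstract are not recomputed here, because the abstract does not say how they were derived.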
Affiliation(s)
- Sarah A Rodriguez
- Division of Gastroenterology, Oregon Health & Science University, Portland, Oregon, USA
- The Oregon Clinic Gastroenterology, Portland, Oregon, USA
- Soren D Impey
- Oregon Stem Cell Center, Department of Pediatrics, Oregon Health & Science University, Portland, Oregon, USA
- Carl Pelz
- Oregon Stem Cell Center, Department of Pediatrics, Oregon Health & Science University, Portland, Oregon, USA
- Brintha Enestvedt
- Division of Gastroenterology, Oregon Health & Science University, Portland, Oregon, USA
- Gennadiy Bakis
- Division of Gastroenterology, Oregon Health & Science University, Portland, Oregon, USA
- Michael Owens
- The Oregon Clinic Gastroenterology, Portland, Oregon, USA
- Terry K Morgan
- Department of Pathology, Oregon Health & Science University, Portland, Oregon, USA
31
Pan-cancer subtyping in a 2D-map shows substructures that are driven by specific combinations of molecular characteristics. Sci Rep 2016; 6:24949. [PMID: 27109935 PMCID: PMC4842960 DOI: 10.1038/srep24949] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 12/29/2015] [Accepted: 04/07/2016] [Indexed: 02/02/2023]
Abstract
The use of genome-wide data in cancer research to identify groups of patients with similar molecular characteristics has become a standard approach for applications in therapy response, prognosis prediction, and drug development. To progress in these applications, the trend is to move from single genome-wide measurements in a single cancer type toward measuring several different molecular characteristics across multiple cancer types. Although current approaches shed light on the molecular characteristics of various cancer types, detailed relationships between patients within cancer clusters are unclear. We propose a novel multi-omic integration approach that exploits the joint behavior of the different molecular characteristics, supports visual exploration of the data through a two-dimensional landscape, and allows inspection of the contribution of the different genome-wide data types. We integrated 4,434 samples across 19 cancer types, derived from TCGA, containing gene expression, DNA methylation, copy-number variation, and microRNA expression data. Cluster analysis revealed 18 clusters, three of which contained a complex collection of cancer types: squamous cell carcinomas, colorectal cancers, and a novel grouping of kidney cancers. Sixty-four samples were identified outside their tissue-of-origin cluster. Known and novel patient subgroups were detected for acute myeloid leukemia and breast cancer. Quantification of the contributions of the different molecular types showed that substructures are driven by specific (combinations of) molecular characteristics.
32
Thompson JA, Tan J, Greene CS. Cross-platform normalization of microarray and RNA-seq data for machine learning applications. PeerJ 2016; 4:e1621. [PMID: 26844019 PMCID: PMC4736986 DOI: 10.7717/peerj.1621] [Citation(s) in RCA: 57] [Impact Index Per Article: 6.3] [Received: 10/30/2015] [Accepted: 01/02/2016] [Indexed: 01/08/2023]
Abstract
Large, publicly available gene expression datasets are often analyzed with the aid of machine learning algorithms. Although RNA-seq is increasingly the technology of choice, a wealth of expression data already exist in the form of microarray data. If machine learning models built from legacy data can be applied to RNA-seq data, larger, more diverse training datasets can be created and validation can be performed on newly generated data. We developed Training Distribution Matching (TDM), which transforms RNA-seq data for use with models constructed from legacy platforms. We evaluated TDM, as well as quantile normalization, nonparanormal transformation, and a simple log2 transformation, on both simulated and biological datasets of gene expression. Our evaluation included both supervised and unsupervised machine learning approaches. We found that TDM exhibited consistently strong performance across settings and that quantile normalization also performed well in many circumstances. We also provide a TDM package for the R programming language.
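Quantile normalization, one of the comparison methods the abstract reports as performing well, forces every sample onto the same value distribution. The sketch below is plain quantile normalization, not TDM and not the authors' R package; the function name and toy values are hypothetical, and ties are broken by original position, which is a simplification of how many implementations handle them.

```python
def quantile_normalize(samples):
    """Quantile-normalize samples given as equal-length lists of gene values.

    Every value is replaced by the mean, across samples, of the values that
    share its rank, so all samples end up with the same distribution.
    """
    n_genes = len(samples[0])
    if any(len(s) != n_genes for s in samples):
        raise ValueError("all samples must contain the same genes")
    sorted_samples = [sorted(s) for s in samples]
    # Reference distribution: mean of the k-th smallest value across samples.
    reference = [
        sum(s[k] for s in sorted_samples) / len(samples) for k in range(n_genes)
    ]
    normalized = []
    for s in samples:
        # Gene indices ordered from smallest to largest value (ties keep input order).
        order = sorted(range(n_genes), key=lambda i: s[i])
        out = [0.0] * n_genes
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        normalized.append(out)
    return normalized


samples = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0], [3.0, 5.0, 7.0]]
print(quantile_normalize(samples))
# [[6.0, 2.0, 4.0], [4.0, 2.0, 6.0], [2.0, 4.0, 6.0]]
```

After normalization each sample contains exactly the values of the reference distribution [2.0, 4.0, 6.0], while each gene keeps its within-sample rank.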
Affiliation(s)
- Jeffrey A. Thompson
- Department of Genetics, Geisel School of Medicine at Dartmouth, Hanover, New Hampshire, United States of America
- Quantitative Biomedical Sciences Program, Geisel School of Medicine at Dartmouth, Hanover, New Hampshire, United States of America
- Jie Tan
- Department of Genetics, Geisel School of Medicine at Dartmouth, Hanover, New Hampshire, United States of America
- Molecular and Cellular Biology, Geisel School of Medicine at Dartmouth, Hanover, New Hampshire, United States of America
- Casey S. Greene
- Department of Genetics, Geisel School of Medicine at Dartmouth, Hanover, New Hampshire, United States of America
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
33
Liu Y, Jing R, Xu J, Liu K, Xue J, Wen Z, Li M. Comparative analysis of oncogenes identified by microarray and RNA-sequencing as biomarkers for clinical prognosis. Biomark Med 2015; 9:1067-78. [DOI: 10.2217/bmm.15.97] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Indexed: 11/21/2022]
Abstract
Aims: Although RNA-sequencing has been widely used to identify differentially expressed genes (DEGs) as biomarkers to guide therapeutic treatment, the concordance of DEGs identified by microarray and RNA-sequencing for clinical prognosis still needs to be investigated. Material & methods: Using The Cancer Genome Atlas data sets, we thoroughly investigated the concordance of DEGs identified from microarray and RNA-sequencing data and their molecular functions. Results: The DEGs identified by the two technologies overlapped by ~98.6% on average. The cancer-related gene sets were significantly enriched with the DEGs and were consistent between the two technologies. Conclusions: The high consistency of DEGs in their regulation directionality and molecular functions indicated good reproducibility between microarray and RNA-sequencing in identifying potential oncogenes for clinical prognosis.
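The abstract does not define how the ~98.6% overlap was computed. The sketch below assumes one plausible definition (shared genes as a fraction of the smaller DEG list) and also checks regulation direction for the shared genes, the other property the conclusions mention. The function name, gene names, and calls are hypothetical.

```python
def deg_overlap(degs_a, degs_b):
    """Compare two DEG calls, each a dict mapping gene -> direction (+1 up, -1 down).

    Returns (overlap, direction_concordance):
    overlap = shared genes / size of the smaller list (an assumed definition);
    direction_concordance = fraction of shared genes changing the same way.
    """
    shared = set(degs_a) & set(degs_b)
    smaller = min(len(degs_a), len(degs_b))
    if smaller == 0:
        raise ValueError("both DEG lists must be non-empty")
    overlap = len(shared) / smaller
    concordant = sum(1 for g in shared if degs_a[g] == degs_b[g])
    concordance = concordant / len(shared) if shared else 0.0
    return overlap, concordance


microarray = {"GENE1": 1, "GENE2": -1, "GENE3": 1, "GENE4": -1}
rnaseq = {"GENE1": 1, "GENE2": -1, "GENE3": -1, "GENE5": 1}
print(deg_overlap(microarray, rnaseq))  # (0.75, 0.6666666666666666)
```

Other definitions, such as the Jaccard index (shared genes divided by the union), give different numbers for the same lists, so a reported overlap percentage is only comparable when its denominator is known.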
Affiliation(s)
- Yuan Liu
- College of Chemistry, Sichuan University, Chengdu 610064, PR China
- Runyu Jing
- College of Chemistry, Sichuan University, Chengdu 610064, PR China
- Junmei Xu
- College of Chemistry, Sichuan University, Chengdu 610064, PR China
- Keqin Liu
- College of Chemistry, Sichuan University, Chengdu 610064, PR China
- Jiwei Xue
- College of Chemistry, Sichuan University, Chengdu 610064, PR China
- Zhining Wen
- College of Chemistry, Sichuan University, Chengdu 610064, PR China
- Menglong Li
- College of Chemistry, Sichuan University, Chengdu 610064, PR China