51
Shen Y, Chu Q, Yin X, He Y, Bai P, Wang Y, Fang W, Timko MP, Fan L, Jiang W. TOD-CUP: a gene expression rank-based majority vote algorithm for tissue origin diagnosis of cancers of unknown primary. Brief Bioinform 2020; 22:2106-2118. [PMID: 32266390 DOI: 10.1093/bib/bbaa031] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 12/08/2019] [Revised: 01/19/2020] [Accepted: 02/19/2020] [Indexed: 12/14/2022]
Abstract
Gene expression profiling holds great potential as a new approach to histological diagnosis and precision medicine of cancers of unknown primary (CUP). Batch effects and different data types greatly decrease the predictive performance of biomarker-based algorithms, and few methods have so far been widely applied to identify the tissue origin of CUP. To address this problem and assist in more precise diagnosis, we have developed a gene expression rank-based majority vote algorithm for tissue origin diagnosis of CUP (TOD-CUP) for the most common cancer types. From a large collection of tissue-specific RNA-seq data sets (10,553) in The Cancer Genome Atlas (TCGA), 538 feature genes (biomarkers) were selected based on their gene expression ranks and used to predict tissue types. The top scoring pairs (TSPs) classifier of the tumor type was optimized using the TCGA training samples. To test the prediction accuracy of our TOD-CUP algorithm, we analyzed (1) two microarray data sets (1029 Agilent and 2277 Affymetrix/Illumina chips) and found 91% and 94% prediction accuracy, respectively; (2) RNA-seq data from five cancer types derived from 141 public metastatic cancer tumor samples, achieving 94% accuracy; and (3) a total of 25 clinical cancer samples (including 14 metastatic cancer samples), of which 24 were classified correctly (96.0% accuracy). Taken together, the TOD-CUP algorithm provides a powerful and robust means to accurately identify the tissue origin of 24 cancer types across different data platforms. To make the TOD-CUP algorithm easily accessible for clinical application, we established a Web-based server for tumor tissue origin diagnosis (http://ibi.zju.edu.cn/todcup/).
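As a rough illustration of the rank-based idea in this abstract, the Python sketch below scores gene pairs the way a generic two-class top scoring pairs classifier does and lets the selected pairs vote. It is not the TOD-CUP implementation: the function names `top_pairs` and `predict`, the toy matrix, and the restriction to two classes are all assumptions made for the example.

```python
import numpy as np

def top_pairs(X, y, k):
    """Return the k gene pairs whose within-sample ordering differs most between classes.

    X is a samples x genes matrix, y holds 0/1 labels. Each result is (i, j, delta),
    where delta = P(x_i < x_j | class 1) - P(x_i < x_j | class 0).
    Hypothetical helper for illustration only.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # less[s, i, j] is True when gene i is expressed below gene j in sample s.
    # This needs O(samples * genes^2) memory, so it only suits small toy inputs.
    less = X[:, :, None] < X[:, None, :]
    delta = less[y == 1].mean(axis=0) - less[y == 0].mean(axis=0)
    i_idx, j_idx = np.triu_indices(X.shape[1], k=1)
    order = np.argsort(-np.abs(delta[i_idx, j_idx]), kind="stable")[:k]
    return [(int(i_idx[o]), int(j_idx[o]), float(delta[i_idx[o], j_idx[o]])) for o in order]

def predict(sample, pairs):
    """Majority vote: a pair votes for class 1 when its ordering matches the class 1 pattern."""
    votes = sum((sample[i] < sample[j]) == (d > 0) for i, j, d in pairs)
    return int(2 * votes > len(pairs))

# Toy data (assumed): 4 samples, 3 genes; class 0 has gene 0 < gene 1, class 1 the reverse.
X = [[1, 5, 3], [2, 6, 1], [7, 2, 4], [9, 1, 5]]
y = [0, 0, 1, 1]
pairs = top_pairs(X, y, k=2)
print(pairs, predict([8, 3, 6], pairs))
```

Because only the ordering of two genes inside one sample is used, rescaling a sample (for example multiplying every value by 10) leaves the prediction unchanged, which is the property that makes such classifiers attractive across platforms.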
Affiliation(s)
- Yifei Shen
  - Department of Medical Oncology, First Affiliated Hospital, Zhejiang University and the Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, USA
- Qinjie Chu
  - Institute of Bioinformatics, Zhejiang University, China
- Xinxin Yin
  - Institute of Bioinformatics, Zhejiang University, China
- Yinjun He
  - College of Medicine, Zhejiang University, China
- Panpan Bai
  - Institute of Bioinformatics, Zhejiang University, China
- Yunfei Wang
  - Zhejiang Sheng Ting Biotechnology Co., China
- Weijia Fang
  - Department of Medical Oncology, First Affiliated Hospital, Zhejiang University, China
- Michael P Timko
  - Department of Biology & Public Health Sciences, University of Virginia, USA
- Longjiang Fan
  - Department of Medical Oncology, First Affiliated Hospital, Zhejiang University, China
- Weiqin Jiang
  - Department of Medical Oncology, First Affiliated Hospital, Zhejiang University, China
52
53
Li X, Huang H, Zhang J, Jiang F, Guo Y, Shi Y, Guo Z, Ao L. A qualitative transcriptional signature for predicting the biochemical recurrence risk of prostate cancer patients after radical prostatectomy. Prostate 2020; 80:376-387. [PMID: 31961962 PMCID: PMC7065139 DOI: 10.1002/pros.23952] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 06/20/2019] [Accepted: 01/02/2020] [Indexed: 12/27/2022]
Abstract
BACKGROUND The qualitative transcriptional characteristics, the within-sample relative expression orderings (REOs) of genes, are highly robust against batch effects and sample quality variations. Hence, we developed a qualitative transcriptional signature based on REOs to predict the biochemical recurrence risk of prostate cancer (PCa) patients after radical prostatectomy. METHODS Gene pairs with REOs significantly correlated with biochemical recurrence-free survival (BFS) were identified from 131 PCa samples in the training data set. From these gene pairs, we selected a qualitative transcriptional signature based on the within-sample REOs of gene pairs which could predict the recurrence risk of PCa patients after radical prostatectomy. RESULTS A signature consisting of 74 gene pairs, named 74-GPS, was developed for predicting the recurrence risk of PCa patients after radical prostatectomy based on the majority voting rule that a sample was assigned as high risk when at least 37 gene pairs of the 74-GPS voted for high risk; otherwise, low risk. The signature was validated in six independent datasets produced by different platforms. In each of the validation datasets, the Kaplan-Meier survival analysis showed that the average BFS of the low-risk group was significantly better than that of the high-risk group. Analyses of multiomics data of PCa samples from TCGA suggested that both epigenomic and genomic alterations could cause the reproducible transcriptional differences between the two prognostic groups. CONCLUSIONS The proposed qualitative transcriptional signature can robustly stratify PCa patients after radical prostatectomy into two groups with different recurrence risk and distinct multiomics characteristics. Hence, 74-GPS may serve as a helpful tool for guiding the management of PCa patients after radical prostatectomy at the individual level.
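The voting rule stated in the RESULTS (high risk when at least 37 of the 74 gene pairs vote for high risk) can be sketched in a few lines of Python. The gene names, the direction of each pair, and the function name `classify_risk` are hypothetical; the actual 74 pairs are not given in the abstract.

```python
from math import ceil

def classify_risk(expression, gene_pairs, min_votes=None):
    """Classify ONE sample by counting gene-pair votes.

    expression: dict mapping gene name -> expression value for a single sample.
    gene_pairs: list of (gene_a, gene_b); by assumption a pair votes "high risk"
                when gene_a is expressed above gene_b in this sample.
    min_votes:  votes needed for "high"; defaults to half the pairs, rounded up
                (37 for a 74-pair signature, matching the rule in the abstract).
    """
    if min_votes is None:
        min_votes = ceil(len(gene_pairs) / 2)
    votes = sum(1 for a, b in gene_pairs if expression[a] > expression[b])
    return ("high" if votes >= min_votes else "low"), votes

# Hypothetical 4-pair signature and one sample.
pairs = [("G1", "G2"), ("G3", "G4"), ("G5", "G6"), ("G1", "G6")]
sample = {"G1": 5.0, "G2": 1.0, "G3": 2.0, "G4": 3.0, "G5": 9.0, "G6": 4.0}
print(classify_risk(sample, pairs))
```

Each sample is classified on its own, without reference to a cohort or to normalization across samples, which is what "at the individual level" refers to in the CONCLUSIONS.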
Affiliation(s)
- Xiang Li
  - Department of Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, The School of Basic Medical Sciences, Fujian Medical University, Fuzhou, China
  - Key Laboratory of Medical Bioinformatics, Fujian Medical University, Fuzhou, China
  - Fujian Key Laboratory of Tumor Microbiology, Fujian Medical University, Fuzhou, China
- Haiyan Huang
  - Department of Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, The School of Basic Medical Sciences, Fujian Medical University, Fuzhou, China
- Jiahui Zhang
  - Department of Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, The School of Basic Medical Sciences, Fujian Medical University, Fuzhou, China
- Fengle Jiang
  - Department of Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, The School of Basic Medical Sciences, Fujian Medical University, Fuzhou, China
- Yating Guo
  - Department of Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, The School of Basic Medical Sciences, Fujian Medical University, Fuzhou, China
- Yidan Shi
  - Department of Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, The School of Basic Medical Sciences, Fujian Medical University, Fuzhou, China
- Zheng Guo
  - Department of Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, The School of Basic Medical Sciences, Fujian Medical University, Fuzhou, China
  - Key Laboratory of Medical Bioinformatics, Fujian Medical University, Fuzhou, China
  - Fujian Key Laboratory of Tumor Microbiology, Fujian Medical University, Fuzhou, China
- Lu Ao
  - Department of Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, The School of Basic Medical Sciences, Fujian Medical University, Fuzhou, China
  - Key Laboratory of Medical Bioinformatics, Fujian Medical University, Fuzhou, China
  - Fujian Key Laboratory of Tumor Microbiology, Fujian Medical University, Fuzhou, China
54
Wang C, Li J. SINC: a scale-invariant deep-neural-network classifier for bulk and single-cell RNA-seq data. Bioinformatics 2020; 36:1779-1784. [PMID: 31647523 DOI: 10.1093/bioinformatics/btz801] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/12/2019] [Revised: 10/01/2019] [Accepted: 10/23/2019] [Indexed: 11/12/2022]
Abstract
MOTIVATION Scaling by sequencing depth is usually the first step of analysis of bulk or single-cell RNA-seq data, but estimating sequencing depth accurately can be difficult, especially for single-cell data, risking the validity of downstream analysis. It is thus of interest to eliminate the use of sequencing depth and analyze the original count data directly. RESULTS We call an analysis method 'scale-invariant' (SI) if it gives the same result under different estimates of sequencing depth and hence can use the original count data without scaling. For the problem of classifying samples into pre-specified classes, such as normal versus cancerous, we develop a deep-neural-network based SI classifier named scale-invariant deep neural-network classifier (SINC). On nine bulk and single-cell datasets, the classification accuracy of SINC is better than or competitive with the best of eight other classifiers. SINC is easier to use and more reliable on data where proper sequencing depth is hard to determine. AVAILABILITY AND IMPLEMENTATION The source code of SINC is available at https://www.nd.edu/~jli9/SINC.zip. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
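The scale-invariance property defined in this abstract can be checked numerically. The sketch below is not the SINC network; it uses a deliberately simple linear rule on within-sample proportions (an assumption chosen because proportions are unchanged when a sample is multiplied by a constant) and contrasts it with a rule on raw counts. All names and numbers are illustrative.

```python
import numpy as np

def to_proportions(counts):
    """Divide each sample (row) by its own total; (c * x) / sum(c * x) == x / sum(x)."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=-1, keepdims=True)

def proportion_classifier(counts, weights, bias):
    """Linear rule on proportions: scale-invariant by construction."""
    return (to_proportions(counts) @ weights + bias > 0).astype(int)

def raw_classifier(counts, weights, bias):
    """Linear rule on raw counts with a bias term: NOT scale-invariant in general."""
    return (np.asarray(counts, dtype=float) @ weights + bias > 0).astype(int)

def is_scale_invariant(predict, counts, factors=(0.5, 2.0, 10.0)):
    """Empirical check: predictions must not change when counts are rescaled."""
    counts = np.asarray(counts, dtype=float)
    base = predict(counts)
    return all(np.array_equal(base, predict(counts * c)) for c in factors)

counts = np.array([[10, 1], [1, 10]])   # toy count matrix: 2 samples, 2 genes
w = np.array([1.0, -1.0])
print(is_scale_invariant(lambda c: proportion_classifier(c, w, -0.5), counts))
print(is_scale_invariant(lambda c: raw_classifier(c, w, -5.0), counts))
```

The check is only empirical over a handful of factors; it can expose a classifier that depends on scale but does not prove invariance.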
Affiliation(s)
- Chuanqi Wang
  - Department of Applied and Computational Mathematics and Statistics, University of Notre Dame, Notre Dame, IN 46556, USA
- Jun Li
  - Department of Applied and Computational Mathematics and Statistics, University of Notre Dame, Notre Dame, IN 46556, USA
55
Krzhizhanovskaya VV, Závodszky G, Lees MH, Dongarra JJ, Sloot PMA, Brissos S, Teixeira J. Tree Based Advanced Relative Expression Analysis. LECTURE NOTES IN COMPUTER SCIENCE 2020. [PMCID: PMC7304016 DOI: 10.1007/978-3-030-50420-5_37] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
This paper presents a new concept for biomarker discovery and gene expression data classification that arises from Relative Expression Analysis (RXA). The basic idea of RXA is to focus on simple ordering relationships between the expression of small sets of genes rather than their raw values. We propose a paradigm shift as we extend the RXA concept to tree-based Advanced Relative Expression Analysis (ARXA). The main contribution is a decision tree with splitting nodes that consider relative fraction comparisons between multiple gene pairs. In addition, to face the enormous computational complexity of RXA, the most time-consuming part, which is scoring all possible gene pairs in each splitting node, is parallelized on a GPU. This way the algorithm allows searching for more tailored interactions between sub-groups of genes in a reasonable time. Experiments carried out on 8 cancer-related datasets show not only significant improvement in accuracy and speed of our approach in comparison to various RXA solutions but also new interesting patterns between subgroups of genes.
56
Scala G, Federico A, Fortino V, Greco D, Majello B. Knowledge Generation with Rule Induction in Cancer Omics. Int J Mol Sci 2019; 21:E18. [PMID: 31861438 PMCID: PMC6981587 DOI: 10.3390/ijms21010018] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 10/17/2019] [Revised: 11/26/2019] [Accepted: 12/13/2019] [Indexed: 12/21/2022]
Abstract
The explosion of omics data availability in cancer research has boosted the knowledge of the molecular basis of cancer, although the strategies for its definitive resolution are still not well established. The complexity of cancer biology, given by the high heterogeneity of cancer cells, leads to the development of pharmacoresistance for many patients, hampering the efficacy of therapeutic approaches. Machine learning techniques have been implemented to extract knowledge from cancer omics data in order to address fundamental issues in cancer research, as well as for the classification of clinically relevant sub-groups of patients and the identification of biomarkers for disease risk and prognosis. Rule induction algorithms are a group of pattern discovery approaches that represent discovered relationships in the form of human-readable associative rules. The application of such techniques to the modern plethora of collected cancer omics data can effectively boost our understanding of cancer-related mechanisms. In fact, the capability of these methods to extract a huge amount of human-readable knowledge will eventually help to uncover unknown relationships between molecular attributes and the malignant phenotype. In this review, we describe applications and strategies for the usage of rule induction approaches in cancer omics data analysis. In particular, we explore the canonical applications and the future challenges and opportunities posed by multi-omics integration problems.
Affiliation(s)
- Giovanni Scala
  - Department of Biology, University of Naples Federico II, 80126 Naples, Italy
- Antonio Federico
  - Faculty of Medicine and Health Technology, Tampere University, 33014 Tampere, Finland
- Vittorio Fortino
  - Institute of Biomedicine, University of Eastern Finland, 70210 Kuopio, Finland
- Dario Greco
  - Faculty of Medicine and Health Technology, Tampere University, 33014 Tampere, Finland
  - Institute of Biotechnology, University of Helsinki, 00014 Helsinki, Finland
- Barbara Majello
  - Department of Biology, University of Naples Federico II, 80126 Naples, Italy
57
Smolander J, Stupnikov A, Glazko G, Dehmer M, Emmert-Streib F. Comparing biological information contained in mRNA and non-coding RNAs for classification of lung cancer patients. BMC Cancer 2019; 19:1176. [PMID: 31796020 PMCID: PMC6892207 DOI: 10.1186/s12885-019-6338-1] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 02/13/2019] [Accepted: 11/06/2019] [Indexed: 12/26/2022]
Abstract
BACKGROUND Deciphering the meaning of human DNA is an outstanding goal which would revolutionize medicine and the way we treat diseases. In recent years, non-coding RNAs have attracted much attention and have been shown to be functional in part. Yet the importance of these RNAs, especially for higher biological functions, remains under investigation. METHODS In this paper, we analyze RNA-seq data, including non-coding and protein coding RNAs, from patients with lung adenocarcinoma, a histologic subtype of non-small-cell lung cancer, with deep learning neural networks and other state-of-the-art classification methods. The purpose of our paper is three-fold. First, we compare the classification performance of different versions of deep belief networks with SVMs, decision trees and random forests. Second, we compare the classification capabilities of protein coding and non-coding RNAs. Third, we study the influence of feature selection on the classification performance. RESULTS First, we find that deep belief networks perform at least competitively with other state-of-the-art classifiers. Second, data from non-coding RNAs perform better than coding RNAs across a number of different classification methods. This demonstrates the equivalence of predictive information as captured by non-coding RNAs compared to protein coding RNAs, conventionally used in computational diagnostics tasks. Third, we find that feature selection has in general a negative effect on the classification performance, which means that unfiltered data with all features give the best classification results. CONCLUSIONS Our study is the first to use ncRNAs beyond miRNAs for the computational classification of cancer and for performing a direct comparison of the classification capabilities of protein coding RNAs and non-coding RNAs.
Affiliation(s)
- Johannes Smolander
  - Predictive Society and Data Analytics Lab, Faculty of Information Technology and Communication Sciences, Tampere University, Tampere, Finland
  - Turku Centre for Biotechnology, University of Turku, Turku, Finland
- Alexey Stupnikov
  - Department of Oncology, School of Medicine, Johns Hopkins University, Baltimore, USA
- Galina Glazko
  - Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, USA
- Matthias Dehmer
  - Institute for Intelligent Production, Faculty for Management, University of Applied Sciences Upper Austria, Steyr, Austria
  - Department of Mechatronics and Biomedical Computer Science, UMIT, Hall in Tyrol, Austria
  - College of Artificial Intelligence, Nankai University, Tianjin, China
- Frank Emmert-Streib
  - Predictive Society and Data Analytics Lab, Faculty of Information Technology and Communication Sciences, Tampere University, Tampere, Finland
  - Institute of Biosciences and Medical Technology, Tampere, Finland
58
Jazayeri N, Sajedi H. Breast cancer diagnosis based on genomic data and extreme learning machine. SN APPLIED SCIENCES 2019. [DOI: 10.1007/s42452-019-1789-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 01/08/2023]
59
Sandhu V, Labori KJ, Borgida A, Lungu I, Bartlett J, Hafezi-Bakhtiari S, Denroche RE, Jang GH, Pasternack D, Mbaabali F, Watson M, Wilson J, Kure EH, Gallinger S, Haibe-Kains B. Meta-Analysis of 1,200 Transcriptomic Profiles Identifies a Prognostic Model for Pancreatic Ductal Adenocarcinoma. JCO Clin Cancer Inform 2019; 3:1-16. [DOI: 10.1200/cci.18.00102] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 12/20/2022]
Abstract
PURPOSE With a dismal 8% median 5-year overall survival, pancreatic ductal adenocarcinoma (PDAC) is a highly lethal malignancy. Only 10% to 20% of patients are eligible for surgery, and more than 50% of these patients will die within 1 year of surgery. Building a molecular predictor of early death would enable the selection of patients with PDAC who are at high risk. MATERIALS AND METHODS We developed the Pancreatic Cancer Overall Survival Predictor (PCOSP), a prognostic model built from a unique set of 89 PDAC tumors in which gene expression was profiled using both microarray and sequencing platforms. We used a meta-analysis framework that was based on the binary gene pair method to create gene expression barcodes that were robust to biases arising from heterogeneous profiling platforms and batch effects. Leveraging the largest compendium of PDAC transcriptomic data sets to date, we show that PCOSP is a robust single-sample predictor of early death (1 year or less) after surgery in a subset of 823 samples with available transcriptomics and survival data. RESULTS The PCOSP model was strongly and significantly prognostic, with a meta-estimate of the area under the receiver operating characteristic curve of 0.70 (P = 2.6E−22) and d-index (robust hazard ratio) of 1.9 (range, 1.6 to 2.3; P = 1.4E−04) for binary and survival predictions, respectively. The prognostic value of PCOSP was independent of clinicopathologic parameters and molecular subtypes. Over-representation analysis of the 2,619 PCOSP gene pairs (1,070 unique genes) unveiled pathways associated with Hedgehog signaling, epithelial–mesenchymal transition, and extracellular matrix signaling. CONCLUSION PCOSP could improve treatment decisions by identifying patients who will not benefit from standard surgery/chemotherapy but who may benefit from a more aggressive treatment approach or enrollment in a clinical trial.
Affiliation(s)
- Vandana Sandhu
  - University Health Network, Toronto, Ontario, Canada
  - Oslo University Hospital, Institute for Cancer Research, Oslo, Norway
- Ilinca Lungu
  - Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- John Bartlett
  - Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Gun Ho Jang
  - Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Matthew Watson
  - Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Julie Wilson
  - Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Elin H. Kure
  - Oslo University Hospital, Institute for Cancer Research, Oslo, Norway
  - University of South-Eastern Norway, Bø in Telemark, Norway
- Steven Gallinger
  - University Health Network, Toronto, Ontario, Canada
  - Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Benjamin Haibe-Kains
  - University Health Network, Toronto, Ontario, Canada
  - Ontario Institute for Cancer Research, Toronto, Ontario, Canada
  - University of Toronto, Toronto, Ontario, Canada
60
A Wrapper Feature Subset Selection Method Based on Randomized Search and Multilayer Structure. BIOMED RESEARCH INTERNATIONAL 2019; 2019:9864213. [PMID: 31828154 PMCID: PMC6885241 DOI: 10.1155/2019/9864213] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 05/08/2019] [Revised: 08/10/2019] [Accepted: 08/27/2019] [Indexed: 12/11/2022]
Abstract
The identification of discriminative features from information-rich data with the goal of clinical diagnosis is crucial in the field of biomedical science. In this context, many machine-learning techniques have been widely applied and have achieved remarkable results. However, disease, especially cancer, is often caused by a group of features with complex interactions. Unlike traditional feature selection methods, which focus only on finding single discriminative features, a multilayer feature subset selection method (MLFSSM), which employs randomized search and a multilayer structure to select a discriminative subset, is proposed herein. In each layer of this method, many feature subsets are generated to ensure the diversity of the combinations, and the weights of features are evaluated based on the performance of the subsets. The weight of a feature increases if the feature is selected into more subsets with better performance compared with other features on the current layer. In this manner, the values of feature weights are revised layer by layer; the precision of feature weights is constantly improved; and better subsets are repeatedly constructed from the features with higher weights. Finally, the topmost feature subset of the last layer is returned. The experimental results based on five public gene datasets showed that the subsets selected by MLFSSM were more discriminative than the results of traditional feature selection methods including LVW (a feature subset selection method that uses the Las Vegas method as its randomized search strategy), GAANN (a feature subset selection method based on a genetic algorithm (GA)), and support vector machine recursive feature elimination (SVM-RFE). Furthermore, MLFSSM showed higher classification performance than some state-of-the-art methods which select feature pairs or groups, including top scoring pair (TSP), k-top scoring pairs (K-TSP), and relative simplicity-based direct classifier (RS-DC).
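The layer-by-layer procedure described in this abstract can be outlined as follows. This is a loose sketch built only from the abstract's wording, not the published MLFSSM: the additive weight update, the parameter names, and the toy `evaluate` function (which would be a cross-validated classifier score in a real wrapper method) are all assumptions.

```python
import random

def weighted_sample(rng, weights, k):
    """Draw k distinct feature indices, favouring features with larger weights."""
    pool = list(range(len(weights)))
    chosen = []
    for _ in range(k):
        pick = rng.choices(pool, weights=[weights[f] for f in pool], k=1)[0]
        pool.remove(pick)
        chosen.append(pick)
    return tuple(sorted(chosen))

def multilayer_subset_search(n_features, evaluate, subset_size=3,
                             n_subsets=30, n_top=9, n_layers=5, seed=0):
    """Randomized multilayer search: reward features that appear in well-scoring subsets."""
    rng = random.Random(seed)
    weights = [1.0] * n_features            # start with no preference
    best_subset = None
    for _ in range(n_layers):
        subsets = [weighted_sample(rng, weights, subset_size) for _ in range(n_subsets)]
        ranked = sorted(subsets, key=evaluate, reverse=True)
        for subset in ranked[:n_top]:       # assumed update rule: +1 per appearance in a top subset
            for f in subset:
                weights[f] += 1.0
        best_subset = ranked[0]             # topmost subset of the current (finally: last) layer
    return best_subset, weights

# Toy score (assumed): features 0 and 1 are the only informative ones.
def evaluate(subset):
    return len(set(subset) & {0, 1})

best, weights = multilayer_subset_search(8, evaluate)
print(best, weights)
```

Sampling in later layers is biased toward features that earlier layers rewarded, which is how the search narrows toward useful combinations without enumerating every subset.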
61
Ooi A. Advances in hereditary leiomyomatosis and renal cell carcinoma (HLRCC) research. Semin Cancer Biol 2019; 61:158-166. [PMID: 31689495 DOI: 10.1016/j.semcancer.2019.10.016] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Received: 09/30/2019] [Accepted: 10/26/2019] [Indexed: 12/30/2022]
Abstract
Hereditary Leiomyomatosis and Renal Cell Cancer (HLRCC) is an autosomal dominant hereditary cancer syndrome with incomplete penetrance. It is caused by a germline amorphic allele of the FH gene, which encodes the TCA cycle enzyme fumarate hydratase (FH). HLRCC patients are genetically predisposed to develop skin leiomyomas, uterine fibroids, and an aggressive kidney cancer of type 2 papillary morphology. Loss of heterozygosity at the FH locus that causes a complete loss of FH enzymatic function is always detected in these tumor tissues. Molecular pathway elucidation, genomic studies, and systematic genetic screens reported over the last two decades have identified several pathway alterations driven by FH inactivation, as well as rationally conceived treatment strategies that specifically target FH-/- tumor cells. These treatment strategies include ferroptosis induction, oxidative stress promotion, and metabolic alteration. As the fundamental biology of HLRCC continues to be uncovered, these treatment strategies continue to be refined and may one day lead to a strategy to prevent disease onset among HLRCC patients. With a more complete picture of HLRCC biology, the safe translation of experimental treatment strategies into clinical practice is achievable in the foreseeable future.
Affiliation(s)
- Aikseng Ooi
  - Department of Pharmacology and Toxicology, University of Arizona, College of Pharmacy, 1703 East Mabel Street, 85721, Tucson, AZ, United States
62
Fu Y, Qi L, Guo W, Jin L, Song K, You T, Zhang S, Gu Y, Zhao W, Guo Z. A qualitative transcriptional signature for predicting microsatellite instability status of right-sided Colon Cancer. BMC Genomics 2019; 20:769. [PMID: 31646964 PMCID: PMC6813057 DOI: 10.1186/s12864-019-6129-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 06/17/2019] [Accepted: 09/23/2019] [Indexed: 12/16/2022]
Abstract
Background Microsatellite instability (MSI) accounts for about 15% of colorectal cancer and is associated with prognosis. Today, MSI is usually detected by polymerase chain reaction amplification of specific microsatellite markers. However, instability is identified by comparing the length of microsatellite repeats in tumor and normal samples, so a normal sample is also required. In this work, we developed a qualitative transcriptional signature to individually predict MSI status for right-sided colon cancer (RCC) based on tumor samples. Results Using RCC samples, based on the relative expression orderings (REOs) of gene pairs, we extracted a signature consisting of 10 gene pairs (10-GPS) to predict MSI status for RCC through a feature selection process. A sample is predicted as MSI when the gene expression orderings of at least 7 gene pairs vote for MSI; otherwise, it is predicted as microsatellite stable (MSS). The classification performance reached the largest F-score in the training dataset. This signature was verified in four independent datasets of RCCs with F-scores of 1, 0.9630, 0.9412 and 0.8798, respectively. Additionally, the hierarchical clustering analyses and molecular features also supported the correctness of the reclassifications of the MSI status by 10-GPS. Conclusions The qualitative transcriptional signature can be used to classify MSI status of RCC samples at the individualized level.
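One plausible reading of "reached the largest F-score in the training dataset" is that the vote cutoff is the one that maximizes the F-score on training samples. The sketch below illustrates that reading only; the abstract does not state how the cutoff of 7 was chosen, the F-score is assumed to be F1, and the vote counts and labels are invented toy values.

```python
def f1(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when there are no positives at all."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def best_vote_threshold(vote_counts, labels, n_pairs):
    """Try every cutoff 1..n_pairs; keep the smallest one with the highest F1.

    vote_counts: number of gene pairs voting MSI for each training sample.
    labels:      1 for MSI, 0 for MSS (assumed encoding).
    """
    best_t, best_f = 1, -1.0
    for t in range(1, n_pairs + 1):
        score = f1(labels, [v >= t for v in vote_counts])
        if score > best_f:
            best_t, best_f = t, score
    return best_t, best_f

# Invented training data for a 10-pair signature.
votes = [9, 8, 7, 6, 3, 2, 5, 1]
labels = [1, 1, 1, 0, 0, 0, 0, 0]
print(best_vote_threshold(votes, labels, n_pairs=10))
```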
Affiliation(s)
- Yelin Fu
  - Department of Systems Biology, College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150086, China
- Lishuang Qi
  - Department of Systems Biology, College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150086, China
- Wenbing Guo
  - Department of Systems Biology, College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150086, China
- Liangliang Jin
  - Department of Systems Biology, College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150086, China
- Kai Song
  - Department of Systems Biology, College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150086, China
- Tianyi You
  - Department of Systems Biology, College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150086, China
- Shuobo Zhang
  - Department of Systems Biology, College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150086, China
- Yunyan Gu
  - Department of Systems Biology, College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150086, China
- Wenyuan Zhao
  - Department of Systems Biology, College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150086, China
- Zheng Guo
  - Department of Systems Biology, College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150086, China
  - Department of Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, School of Basic Medical Sciences, Fujian Medical University, Fuzhou, 350122, China
  - Key Laboratory of Medical Bioinformatics, Fujian Province, Fuzhou, 350122, China
63
Takahashi Y, Gleber-Netto FO, Bell D, Roberts D, Xie TX, Abdelmeguid AS, Pickering C, Myers JN, Hanna EY. Identification of markers predictive for response to induction chemotherapy in patients with sinonasal undifferentiated carcinoma. Oral Oncol 2019; 97:56-61. [PMID: 31421472 DOI: 10.1016/j.oraloncology.2019.07.028] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 04/02/2019] [Revised: 05/09/2019] [Accepted: 07/29/2019] [Indexed: 01/09/2023]
Abstract
OBJECTIVES Sinonasal undifferentiated carcinoma (SNUC) is a rare, highly aggressive cancer. Despite aggressive multimodal therapy, its prognosis remains poor. Because of its locally advanced nature and high propensity for distant metastasis, we frequently use induction chemotherapy before definitive therapy in patients with SNUC. However, about 30% of patients do not respond to induction chemotherapy, and lack of response is associated with a poor survival rate. Therefore, in this study, we performed gene expression analysis of SNUC samples to identify predictive markers for induction chemotherapy response. MATERIALS AND METHODS Formalin-fixed, paraffin-embedded SNUC tumor samples, harvested from previously untreated patients before induction chemotherapy, were used. Gene expression profiling was performed using an oncology gene expression panel. RESULTS We identified 34 differentially expressed genes that distinguish the responders from the non-responders. Pathway analysis using these genes revealed alteration of multiple pathways between the two groups. Of these 34 genes, 24 distinguished between these two groups. Additionally, 16 gene pairs were associated with response to induction therapy. CONCLUSION We identified genes predictive of SNUC response to induction chemotherapy and pathways potentially associated with treatment outcome. This is the first report of identification of predictive biomarkers for response of SNUC to induction chemotherapy, and it may help us develop therapeutic strategies to improve the treatment outcomes of non-responders.
Collapse
Affiliation(s)
- Yoko Takahashi
- Department of Head and Neck Surgery, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA.
| | - Frederico O Gleber-Netto
- Department of Head and Neck Surgery, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Diana Bell
- Department of Pathology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Dianna Roberts
- Department of Head and Neck Surgery, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Tong-Xin Xie
- Department of Head and Neck Surgery, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Ahmed S Abdelmeguid
- Department of Head and Neck Surgery, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Curtis Pickering
- Department of Head and Neck Surgery, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Jeffrey N Myers
- Department of Head and Neck Surgery, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Ehab Y Hanna
- Department of Head and Neck Surgery, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| |
|
64
|
A novel analysis method for biomarker identification based on horizontal relationship: identifying potential biomarkers from large-scale hepatocellular carcinoma metabolomics data. Anal Bioanal Chem 2019; 411:6377-6386. [DOI: 10.1007/s00216-019-02011-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 04/16/2019] [Revised: 06/03/2019] [Accepted: 07/01/2019] [Indexed: 02/07/2023]
|
65
|
Lee MY, Kim TK, Walters KA, Wang K. A biological function based biomarker panel optimization process. Sci Rep 2019; 9:7365. [PMID: 31089177 PMCID: PMC6517383 DOI: 10.1038/s41598-019-43779-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2018] [Accepted: 04/26/2019] [Indexed: 11/09/2022] Open
Abstract
Implementation of multi-gene biomarker panels identified from high-throughput data, including microarray or next-generation sequencing, requires adaptation to a platform suitable for a clinical setting, such as quantitative polymerase chain reaction. However, technical challenges when transitioning from one measurement platform to another, such as inconsistent measurement results, can affect panel development. We describe a process to overcome these challenges by replacing poorly performing genes during platform transition and reducing the number of features without impacting classification performance. This approach assumes that a diagnostic panel reflects the effect of dysregulated biological processes associated with a disease, and that genes involved in the same biological processes and coordinately affected by a disease share a similar discriminatory power. The utility of this optimization process was assessed using a published sepsis diagnostic panel. Substitution of more than half of the genes and/or reducing genes based on biological processes did not negatively affect the performance of the sepsis diagnostic panel. Our results suggest that a systematic gene substitution and reduction process based on biological function can be used to alleviate the challenges associated with clinical development of biomarker panels.
Affiliation(s)
- Min Young Lee
- Institute for Systems Biology, Seattle, Washington, United States of America
| | - Taek-Kyun Kim
- Institute for Systems Biology, Seattle, Washington, United States of America
| | - Kathie-Anne Walters
- Institute for Systems Biology, Seattle, Washington, United States of America
| | - Kai Wang
- Institute for Systems Biology, Seattle, Washington, United States of America.
| |
|
66
|
A new data analysis method based on feature linear combination. J Biomed Inform 2019; 94:103173. [PMID: 30965135 DOI: 10.1016/j.jbi.2019.103173] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 09/13/2018] [Revised: 04/02/2019] [Accepted: 04/06/2019] [Indexed: 01/15/2023]
Abstract
In biological data, feature relationships are complex and diverse, and they can reflect physiological and pathological changes. Defining simple and efficient classification rules based on feature relationships is helpful for discriminating different conditions and studying disease mechanisms. The popular data analysis method, k top scoring pairs (k-TSP), explores feature relationships by focusing on the difference in the relative level of two features between groups and classifies samples on that basis. To define more efficient classification rules, we propose a new data analysis method based on the linear combination of k > 0 top scoring pairs (LC-k-TSP). LC-k-TSP applies a support vector machine (SVM) to define the best linear relationship of each feature pair, scores feature pairs by the discriminative abilities of the corresponding linear combinations and selects k disjoint top scoring pairs to construct an ensemble classifier. Experiments on twelve public datasets showed the superiority of LC-k-TSP over k-TSP, which evaluates the relationship of every two features in the same way. The experiments also illustrated that LC-k-TSP performed similarly to SVM and random forest (RF) in accuracy rate. Because LC-k-TSP learns a unique linear combination for each feature pair and defines simple classification rules, its results are easy to interpret biomedically. Finally, we applied LC-k-TSP to analyze hepatocellular carcinoma (HCC) metabolomics data and define simple classification rules for discriminating different liver diseases. It obtained accuracy rates of 89.76% and 89.13% in distinguishing between small HCC and hepatic cirrhosis (CIR) groups as well as between HCC and CIR groups, superior to 87.99% and 80.35% by k-TSP. Hence, defining classification rules based on feature relationships is an effective way to analyze biological data.
LC-k-TSP, which checks different feature pairs by their corresponding unique best linear relationship, is superior to k-TSP, which checks each pair by the same linear relationship. Availability and implementation: http://www.402.dicp.ac.cn/download_ok_4.htm.
|
67
|
A combined gene expression tool for parallel histological prediction and gene fusion detection in non-small cell lung cancer. Sci Rep 2019; 9:5207. [PMID: 30914778 PMCID: PMC6435686 DOI: 10.1038/s41598-019-41585-4] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 10/24/2018] [Accepted: 03/12/2019] [Indexed: 01/10/2023] Open
Abstract
Accurate histological classification and identification of fusion genes represent two cornerstones of clinical diagnostics in non-small cell lung cancer (NSCLC). Here, we present a NanoString gene expression platform and a novel platform-independent, single sample predictor (SSP) of NSCLC histology for combined, simultaneous histological classification and fusion gene detection in minimal formalin-fixed paraffin-embedded (FFPE) tissue. The SSP was developed in 68 NSCLC tumors of adenocarcinoma (AC), squamous cell carcinoma (SqCC) and large-cell neuroendocrine carcinoma (LCNEC) histology, based on NanoString expression of 11 (CHGA, SYP, CD56, SFTPG, NAPSA, TTF-1, TP73L, KRT6A, KRT5, KRT40, KRT16) relevant genes for IHC-based NSCLC histology classification. The SSP was combined with a gene fusion detection module (analyzing ALK, RET, ROS1, MET, NRG1, and NTRK1) into a multicomponent NanoString assay. The histological SSP was validated in six cohorts varying in size (n = 11–199), tissue origin (early or advanced disease), histological composition (including undifferentiated cancer), and gene expression platform. Fusion gene detection revealed five EML4-ALK fusions, four KIF5B-RET fusions, two CD74-NRG1 fusions and three MET exon 14 skipping events among 131 tested cases. The histological SSP was successfully trained and tested in the development cohort (mean AUC = 0.96 in iterated test sets). The SSP proved successful in predicting histology of NSCLC tumors of well-defined subgroups and difficult undifferentiated morphology irrespective of gene expression data platform. Discrepancies between gene expression prediction and histologic diagnosis included cases with mixed histologies, true large cell carcinomas, or poorly differentiated adenocarcinomas with mucin expression.
In summary, we present a proof-of-concept multicomponent assay for parallel histological classification and multiplexed fusion gene detection in archival tissue, including a novel platform-independent histological SSP classifier. The assay and SSP could serve as a promising complement in the routine evaluation of diagnostic lung cancer biopsies.
|
68
|
Khamesipour A, Kagaris D. Speeding up the discovery of combinations of differentially expressed genes for disease prediction and classification. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2019; 170:69-80. [PMID: 30712605 DOI: 10.1016/j.cmpb.2019.01.004] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 09/20/2018] [Revised: 01/11/2019] [Accepted: 01/11/2019] [Indexed: 06/09/2023]
Abstract
BACKGROUND AND OBJECTIVE: Finding combinations (i.e., pairs, or more generally, q-tuples with q ≥ 2) of genes whose behavior as a group differs significantly between two classes has received a lot of attention in the quest for the discovery of simple, accurate, and easily interpretable decision rules for disease classification and prediction. For example, the Top Scoring Pair (TSP) method seeks to find pairs of genes so that the probability of the reversal of the relative ranking of the expression levels of the genes in the two classes is maximized. The computational cost of finding a q-tuple of genes that scores highest under a given metric is O(G^q), where G is the total number of genes. This cost is often problematic or prohibitive in practice (even for q = 2), as the number of genes G is often in the order of tens of thousands. METHODS: In this paper, we show that this computational cost can be significantly reduced by excluding from consideration genes whose behavior is almost identical in the two classes and therefore their inclusion in any q-tuple is rather non-informative. Our criterion for the exclusion of genes is supported by a statistically robust metric, the Area Under the Curve (AUC) of the corresponding Receiver Operating Characteristic (ROC) curve. By filtering out genes whose AUC value is below a user-chosen threshold, as determined by a procedure that we describe in the paper, dramatic reductions in the run times are obtained while maintaining the same classification accuracy. RESULTS: We have experimentally verified the gains of this approach on several case studies involving ovarian, colon, leukemia, breast and prostate cancers, and diffuse large B-cell lymphoma.
CONCLUSIONS: The proposed method is not only faster (for example, we observed an average 78.65% reduction over the run time of TSP) while maintaining the same classification accuracy, but it can even result in better classification accuracy due to its inherent ability to avoid the so-called "pivot" (non-informative) genes that may intrude in q-tuples chosen otherwise.
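The filter-then-search idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the tie handling, and the choice to treat a gene as informative when its AUC is far from 0.5 in either direction are all assumptions made for the example, and the exhaustive pair loop stands in for whatever search the paper actually uses.

```python
import numpy as np

def auc_per_gene(X, y):
    """AUC of each gene (column of X) for separating class 1 from class 0.

    Computed as the Mann-Whitney probability that a class-1 value exceeds
    a class-0 value, with ties counted as 0.5.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    pos, neg = X[y == 1], X[y == 0]
    greater = (pos[:, None, :] > neg[None, :, :]).mean(axis=(0, 1))
    ties = (pos[:, None, :] == neg[None, :, :]).mean(axis=(0, 1))
    return greater + 0.5 * ties

def top_scoring_pair(X, y, auc_threshold=0.5):
    """Return (i, j, score) maximising |P(X_i < X_j | y=0) - P(X_i < X_j | y=1)|.

    Only genes passing the (assumed) AUC filter enter the pair search,
    which shrinks the quadratic search space.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    auc = auc_per_gene(X, y)
    # Assumption: informativeness is symmetric around AUC = 0.5.
    keep = np.flatnonzero(np.maximum(auc, 1 - auc) >= auc_threshold)
    best = (-1, -1, -1.0)
    for a, i in enumerate(keep):
        for j in keep[a + 1:]:
            lt = X[:, i] < X[:, j]
            score = abs(lt[y == 0].mean() - lt[y == 1].mean())
            if score > best[2]:
                best = (int(i), int(j), float(score))
    return best
```

With G genes and a filter that keeps G' of them, the pair search drops from roughly G²/2 to G'²/2 comparisons, which is where the run-time reduction described above would come from.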
Affiliation(s)
| | - Dimitri Kagaris
- ECE Dept., Southern Illinois University, Carbondale, IL 62901, USA.
| |
|
69
|
Lin X, Huang X, Zhou L, Ren W, Zeng J, Yao W, Wang X. The Robust Classification Model Based on Combinatorial Features. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2019; 16:650-657. [PMID: 29990202 DOI: 10.1109/tcbb.2017.2779512] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 06/08/2023]
Abstract
Analyzing the disease data from the view of combinatorial features may better characterize the disease phenotype. In this study, a novel method is proposed to construct feature combinations and a classification model (CFC-CM) by mining key feature relationships. CFC-CM iteratively tests for differences in the feature relationship between different groups. To do this, it uses a modified k-top-scoring pair (M-k-TSP) algorithm and then selects the most discriminative feature pairs in the current feature set to infer the combinatorial features and build the classification model. Compared with support vector machines, random forests, least absolute shrinkage and selection operator, elastic net, and M-k-TSP, the superior performance of CFC-CM on nine public gene expression datasets validates its potential for more precise identification of complex diseases. Subsequently, CFC-CM was applied to two metabolomics datasets, where it obtained accuracy rates of 88.73 ± 2.06% and 79.11 ± 2.70% in distinguishing between hepatocellular carcinoma and hepatic cirrhosis groups and between acute kidney injury (AKI) and non-AKI samples, results superior to those of the other five methods. In summary, the better results of CFC-CM show that, in contrast to molecules and combinations constituted by just two features, combinations inferred from an appropriate number of features could better identify complex diseases.
|
70
|
Li M, Li H, Hong G, Tang Z, Liu G, Lin X, Lin M, Qi L, Guo Z. Identifying primary site of lung-limited Cancer of unknown primary based on relative gene expression orderings. BMC Cancer 2019; 19:67. [PMID: 30642283 PMCID: PMC6332677 DOI: 10.1186/s12885-019-5274-4] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 10/24/2017] [Accepted: 01/03/2019] [Indexed: 01/11/2023] Open
Abstract
Background: Precise diagnosis of the tissue origin for metastatic cancer of unknown primary (CUP) is essential for deciding the treatment scheme to improve patients' prognoses, since the treatment for the metastases is the same as for their primary counterparts. The purpose of this study is to identify a robust gene signature that can predict the origin of CUPs. Methods: The within-sample relative gene expression orderings (REOs) of gene pairs, which are insensitive to experimental batch effects and data normalizations, were exploited for identifying the prediction signature. Results: Using gene expression profiles of lung-limited metastatic colorectal cancer (LmCRC), we first showed that the within-sample REOs in lung metastases of colorectal cancer (CRC) samples were concordant with the REOs in primary CRC samples rather than with the REOs in primary lung cancer. Based on this phenomenon, we selected five gene pairs with consistent REOs in 498 primary CRC and reversely consistent REOs in 509 lung cancer samples, which were used as a signature for predicting primary sites of metastatic CRC based on the majority voting rule. Applying the signature to 654 primary CRC and 204 primary lung cancer samples collected from multiple datasets, the prediction accuracy reached 99.36%. This signature was also applied to 24 LmCRC samples collected from three datasets produced by different laboratories and the accuracy reached 100%, suggesting that the within-sample REOs in the primary site could reveal the original tissue of metastatic cancers. Conclusions: The result demonstrated that the signature based on within-sample REOs of five gene pairs could accurately and robustly identify the primary sites of CUPs.
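The majority-voting rule over within-sample REOs described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the gene names are placeholders (the abstract does not list the five pairs), and the convention that "gene_a above gene_b" votes for CRC is chosen for the example.

```python
def predict_origin(expr, pairs, label_if_greater="CRC", label_otherwise="lung"):
    """Classify one sample by majority vote over gene-pair orderings.

    expr  : dict mapping gene name -> expression value in this sample
    pairs : list of (gene_a, gene_b); a pair votes for label_if_greater
            when expr[gene_a] > expr[gene_b]
    """
    votes = sum(1 for a, b in pairs if expr[a] > expr[b])
    return label_if_greater if votes > len(pairs) / 2 else label_otherwise

# Hypothetical five-pair signature; an odd number of pairs avoids tied votes.
pairs = [("A1", "B1"), ("A2", "B2"), ("A3", "B3"), ("A4", "B4"), ("A5", "B5")]
sample = {"A1": 5, "B1": 1, "A2": 7, "B2": 2, "A3": 9, "B3": 3,
          "A4": 0.5, "B4": 4, "A5": 0.1, "B5": 6}
print(predict_origin(sample, pairs))
```

Because each vote depends only on which of two genes is higher within the same sample, any monotone increasing rescaling of that sample leaves the prediction unchanged, which is the property the abstract relies on for robustness to batch effects and normalization.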
Affiliation(s)
- Mengyao Li
- Key Laboratory of Ministry of Education for Gastrointestinal Cancer, Department of Bioinformatics, Fujian Medical University, Fuzhou, 350001, China
| | - Hongdong Li
- Department of Bioinformatics, Gannan Medical University, Ganzhou, 341000, China.
| | - Guini Hong
- Key Laboratory of Ministry of Education for Gastrointestinal Cancer, Department of Bioinformatics, Fujian Medical University, Fuzhou, 350001, China
| | - Zhongjie Tang
- Key Laboratory of Ministry of Education for Gastrointestinal Cancer, Department of Bioinformatics, Fujian Medical University, Fuzhou, 350001, China
| | - Guanghao Liu
- Key Laboratory of Ministry of Education for Gastrointestinal Cancer, Department of Bioinformatics, Fujian Medical University, Fuzhou, 350001, China
| | - Xiaofang Lin
- Key Laboratory of Ministry of Education for Gastrointestinal Cancer, Department of Bioinformatics, Fujian Medical University, Fuzhou, 350001, China
| | - Mingzhang Lin
- Key Laboratory of Ministry of Education for Gastrointestinal Cancer, Department of Bioinformatics, Fujian Medical University, Fuzhou, 350001, China
| | - Lishuang Qi
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150086, China
| | - Zheng Guo
- Key Laboratory of Ministry of Education for Gastrointestinal Cancer, Department of Bioinformatics, Fujian Medical University, Fuzhou, 350001, China; Fujian Key Laboratory of Tumor Microbiology, Fujian Medical University, Fuzhou, 350001, China.
| |
|
71
|
Grzadkowski MR, Sendorek DH, P'ng C, Huang V, Boutros PC. A comparative study of survival models for breast cancer prognostication revisited: the benefits of multi-gene models. BMC Bioinformatics 2018; 19:400. [PMID: 30390622 PMCID: PMC6215649 DOI: 10.1186/s12859-018-2430-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 02/05/2018] [Accepted: 10/10/2018] [Indexed: 01/01/2023] Open
Abstract
Background: The development of clinical -omic biomarkers for predicting patient prognosis has mostly focused on multi-gene models. However, several studies have described significant weaknesses of multi-gene biomarkers. Indeed, some high-profile reports have even indicated that multi-gene biomarkers fail to consistently outperform simple single-gene ones. Given the continual improvements in -omics technologies and the availability of larger, better-powered datasets, we revisited this “single-gene hypothesis” using new techniques and datasets. Results: By deeply sampling the population of available gene sets, we compare the intrinsic properties of single-gene biomarkers to multi-gene biomarkers in twelve different partitions of a large breast cancer meta-dataset. We show that simple multi-gene models consistently outperformed single-gene biomarkers in all twelve partitions. We found 270 multi-gene biomarkers (one per ~11,111 sampled) that always made better predictions than the best single-gene model. Conclusions: The single-gene hypothesis for breast cancer does not appear to retain its validity in the face of improved statistical models, lower-noise genomic technology and better-powered patient cohorts. These results highlight that it is critical to revisit older hypotheses in the light of newer techniques and datasets.
Affiliation(s)
| | | | | | - Vincent Huang
- Ontario Institute for Cancer Research, Toronto, Canada
| | - Paul C Boutros
- Ontario Institute for Cancer Research, Toronto, Canada; Department of Medical Biophysics, University of Toronto, Toronto, Canada; Department of Pharmacology & Toxicology, University of Toronto, Toronto, Canada.
| |
|
72
|
Wu P, Wang D. Classification of a DNA Microarray for Diagnosing Cancer Using a Complex Network Based Method. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018; 16:801-808. [PMID: 30183642 DOI: 10.1109/tcbb.2018.2868341] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 06/08/2023]
Abstract
Applications that classify DNA microarray expression data are helpful for diagnosing cancer. Many attempts have been made to analyze these data; however, new methods are needed to obtain better results. In this study, a Complex Network (CN) classifier was exploited to implement the classification task. An algorithm was used to initialize the structure, which allowed input variables to be selected over layered connections and different activation functions for different nodes. Then, a hybrid method integrating the Genetic Programming and Particle Swarm Optimization algorithms was used to identify an optimal structure with the parameters encoded in the classifier. The single CN classifier and an ensemble of CN classifiers were tested on four bench data sets. To ensure diversity of the ensemble classifiers, we constructed a base classifier using different feature sets, i.e., Pearson's correlation, Spearman's correlation, Euclidean distance, Cosine coefficient and the Fisher-ratio. The experimental results suggest that a single classifier can be used to obtain state-of-the-art results and that the ensemble yielded better results.
|
73
|
Kerins MJ, Milligan J, Wohlschlegel JA, Ooi A. Fumarate hydratase inactivation in hereditary leiomyomatosis and renal cell cancer is synthetic lethal with ferroptosis induction. Cancer Sci 2018; 109:2757-2766. [PMID: 29917289 PMCID: PMC6125459 DOI: 10.1111/cas.13701] [Citation(s) in RCA: 62] [Impact Index Per Article: 10.3] [Received: 01/22/2018] [Accepted: 06/17/2018] [Indexed: 12/31/2022] Open
Abstract
Hereditary leiomyomatosis and renal cell cancer (HLRCC) is a hereditary cancer syndrome characterized by inactivation of the Krebs cycle enzyme fumarate hydratase (FH). HLRCC patients are at high risk of developing kidney cancer of type 2 papillary morphology that is refractory to current radiotherapy, immunotherapy and chemotherapy. Hence, an effective therapy for this deadly form of cancer is urgently needed. Here, we show that FH inactivation (FH-/-) proves synthetic lethal with inducers of ferroptosis, an iron-dependent and nonapoptotic form of cell death. Specifically, we identified gene signatures for compound sensitivities based on drug responses for 9 different drug classes against the NCI-60 cell lines. These signatures predicted that ferroptosis inducers would be selectively toxic to FH-/- cell line UOK262. Preferential cell death against UOK262-FH-/- was confirmed with 4 different ferroptosis inducers. Mechanistically, the FH-/- sensitivity to ferroptosis is attributed to dysfunctional GPX4, the primary cellular defender against ferroptosis. We identified that C93 of GPX4 is readily post-translationally modified by fumarates that accumulate in conditions of FH-/-, and that C93 modification represses GPX4 activity. Induction of ferroptosis in FH-inactivated tumors represents an opportunity for synthetic lethality in cancer.
Affiliation(s)
- Michael J. Kerins
- Department of Pharmacology and Toxicology, College of Pharmacy, University of Arizona, Tucson, AZ, USA
| | - John Milligan
- Department of Pharmacology and Toxicology, College of Pharmacy, University of Arizona, Tucson, AZ, USA
| | - James A. Wohlschlegel
- Department of Biological Chemistry, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
| | - Aikseng Ooi
- Department of Pharmacology and Toxicology, College of Pharmacy, University of Arizona, Tucson, AZ, USA
| |
|
74
|
Zheng YF, Lu X, Zhang XY, Guan BG. The landscape of DNA methylation in hepatocellular carcinoma. J Cell Physiol 2018; 234:2631-2638. [PMID: 30145793 DOI: 10.1002/jcp.27077] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 02/28/2018] [Accepted: 06/28/2018] [Indexed: 12/24/2022]
Abstract
A better understanding of the relationship between changes in the overall methylation status of hepatocellular carcinoma (HCC) and disease progression will help identify good strategies for the early detection and treatment of HCC patients. This study therefore examined how changes in methylation status in HCC patients relate to progression of the disease. First, the DNA methylation data of 50 HCC samples and the surrounding normal samples were extracted and the change pattern of methylation status in the DNA promoter region of HCC samples against that of normal samples was studied. Then, some DNA methylation genes that could accurately distinguish cancer from cancer-adjacent tissues were identified using the k-top scoring pair method. Also, a prognostic signature that could predict the survival of HCC patients was constructed based on the overall survival time and death information of the early HCC patients. Finally, the obtained prognostic signature was verified. In conclusion, this study described the changes in the methylation spectrum during the development of HCC and identified genes associated with HCC progression and prognosis, which may offer new opportunities for the diagnosis and treatment of HCC.
Affiliation(s)
- Yong-Fa Zheng
- Cancer Center, Renmin Hospital of Wuhan University, Wuhan, China
| | - Xiaojie Lu
- Nanjing Medical University, Nanjing, China
| | - Xiao-Yu Zhang
- Division of Gastrointestinal Surgery, Department of General Surgery, Huai'an Second People's Hospital and The Affiliated Huai'an Hospital of Xuzhou Medical University, Huai'an, China
| | - Bu-Gao Guan
- Department of General Surgery, People's Hospital of Jinhu, Huai'an, China
| |
|
75
|
Analyzing omics data by pair-wise feature evaluation with horizontal and vertical comparisons. J Pharm Biomed Anal 2018; 157:20-26. [PMID: 29754039 DOI: 10.1016/j.jpba.2018.04.052] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 11/16/2017] [Revised: 04/29/2018] [Accepted: 04/30/2018] [Indexed: 11/24/2022]
Abstract
Feature relationships are complex and may contain important information. k top scoring pairs (k-TSP) studies feature relationships by the horizontal comparison. This study examines feature relationships and proposes vertical and horizontal k-TSP (VH-k-TSP) to identify the discriminative feature pairs by evaluating feature pairs based on the vertical and horizontal comparisons. Complexity is introduced to compute the discriminative abilities of feature pairs by means of these two comparisons. VH-k-TSP was compared with support vector machine-recursive feature elimination, relative simplicity-support vector machine, k-TSP and M-k-TSP on nine public genomics datasets. For multi-class problems, one-to-one method was used. The experiments showed that VH-k-TSP outperformed the four methods in most cases. Then, VH-k-TSP was applied to a metabolomics data of liver disease. An accuracy rate of 88.11 ± 3.30% in discrimination between cirrhosis and hepatocellular carcinoma was obtained by VH-k-TSP, better than 77.39 ± 4.10% and 79.28 ± 3.73% obtained by k-TSP and M-k-TSP, respectively. Hence combining the vertical and horizontal comparisons could define more discriminative feature pairs.
|
76
|
Hsu TY, Lin JM, Nguyen MHT, Chung FH, Tsai CC, Cheng HH, Lai YJ, Hung HN, Chen CS. Antigen Analysis of Pre-Eclamptic Plasma Antibodies Using Escherichia Coli Proteome Chips. Mol Cell Proteomics 2018; 17:1457-1469. [PMID: 29284593 PMCID: PMC6072543 DOI: 10.1074/mcp.ra117.000139] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 07/07/2017] [Revised: 12/13/2017] [Indexed: 12/19/2022] Open
Abstract
Pre-eclampsia is one of the main causes of perinatal mortality and morbidity. Many biomarkers for diagnosing pre-eclampsia have been found but most have low accuracy. Therefore, a potential marker that can detect pre-eclampsia with high accuracy is required. Infection has been reported as a cause of pre-eclampsia. In recent years, protein microarray chips have been recognized as a strong and robust tool for profiling antibodies for infection diagnoses. The purpose of the present study was to profile antibodies in the human plasma of healthy and pre-eclamptic pregnancies to identify suitable biomarkers. In this study, an Escherichia coli chip was probed with samples from 29 individuals (16 pre-eclamptic women and 13 healthy pregnant women) to profile plasma antibodies. Bioinformatics tools were used to analyze the results, discover conserved motifs, compare against the entire human proteome, and perform protein functional analysis. An antibody classifier was identified using k-top scoring pairs and additional samples for a blinded test were collected. The findings indicated that compared with the healthy women, the pre-eclamptic women exhibited 108 and 130 differentially immunogenic proteins against human immunoglobulins G and M, respectively. In addition, pre-eclamptic women developed more immunoglobulin G but less immunoglobulin M against bacterial surface proteins compared with healthy women. The k-top scoring pairs identified five pairs of immunogenic proteins as classifiers with a high accuracy of 90% in the blind test. [AG] [ISV] GV [AE] L [LF] and [IV] [IV] RI [AG] [AD] E were the consensus motifs observed in immunogenic proteins in the immunoglobulin G and immunoglobulin M of pre-eclamptic women, respectively, whereas GA [AG] [AL] L [LF] and [SRY] [IQML] [ILV] [ILV] [ACG] GI [GH] [AEF] [AK] [ATY] [RG] N [IV] were observed in the immunoglobulins G and immunoglobulin M of healthy women, respectively.
Affiliation(s)
- Te-Yao Hsu
- Department of Obstetrics and Gynecology, Kaohsiung Chang Gung Memorial Hospital and Chang Gung University College of Medicine, Kaohsiung, Taiwan
| | - Jyun-Mu Lin
- Graduate Institute of Systems Biology and Bioinformatics, National Central University, Jhongli 32001, Taiwan
- Department of Biomedical Science and Engineering, National Central University, Jhongli 32001, Taiwan
| | - Mai-Huong T Nguyen
- Graduate Institute of Systems Biology and Bioinformatics, National Central University, Jhongli 32001, Taiwan
- Department of Biomedical Science and Engineering, National Central University, Jhongli 32001, Taiwan
| | - Feng-Hsiang Chung
- Graduate Institute of Systems Biology and Bioinformatics, National Central University, Jhongli 32001, Taiwan
- Department of Biomedical Science and Engineering, National Central University, Jhongli 32001, Taiwan
| | - Ching-Chang Tsai
- Department of Obstetrics and Gynecology, Kaohsiung Chang Gung Memorial Hospital and Chang Gung University College of Medicine, Kaohsiung, Taiwan
| | - Hsin-Hsin Cheng
- Department of Obstetrics and Gynecology, Kaohsiung Chang Gung Memorial Hospital and Chang Gung University College of Medicine, Kaohsiung, Taiwan
| | - Yun-Ju Lai
- Department of Obstetrics and Gynecology, Kaohsiung Chang Gung Memorial Hospital and Chang Gung University College of Medicine, Kaohsiung, Taiwan
| | - Hsuan-Ning Hung
- Department of Obstetrics and Gynecology, Kaohsiung Chang Gung Memorial Hospital and Chang Gung University College of Medicine, Kaohsiung, Taiwan
| | - Chien-Sheng Chen
- Graduate Institute of Systems Biology and Bioinformatics, National Central University, Jhongli 32001, Taiwan
- Department of Biomedical Science and Engineering, National Central University, Jhongli 32001, Taiwan
- Department of Food Safety/Hygiene and Risk Management, College of Medicine, National Cheng Kung University, Tainan City 704, Taiwan
|
77
|
Jiao Y, Vert JP. The Kendall and Mallows Kernels for Permutations. IEEE Trans Pattern Anal Mach Intell 2018; 40:1755-1769. [PMID: 28981406 DOI: 10.1109/tpami.2017.2719680] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 06/07/2023]
Abstract
We show that the widely used Kendall tau correlation coefficient, and the related Mallows kernel, are positive definite kernels for permutations. They offer computationally attractive alternatives to more complex kernels on the symmetric group to learn from rankings, or learn to rank. We show how to extend these kernels to partial rankings, multivariate rankings and uncertain rankings. Examples are presented on how to formulate typical problems of learning from rankings such that they can be solved with state-of-the-art kernel algorithms. We demonstrate promising results on clustering heterogeneous rank data and high-dimensional classification problems in biomedical applications.
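The two kernels named in this abstract can be illustrated with a minimal sketch. It assumes complete rankings without ties, uses a naive quadratic pair count, and the function names and the default `lam` value are illustrative choices, not taken from the paper.

```python
from itertools import combinations
from math import exp


def discordant_pairs(x, y):
    """Count index pairs (i, j) that x and y order in opposite directions."""
    return sum(
        1
        for i, j in combinations(range(len(x)), 2)
        if (x[i] - x[j]) * (y[i] - y[j]) < 0
    )


def kendall_kernel(x, y):
    """Kendall tau: (concordant - discordant) / total pairs, assuming no ties."""
    n_pairs = len(x) * (len(x) - 1) // 2
    n_d = discordant_pairs(x, y)
    return (n_pairs - 2 * n_d) / n_pairs


def mallows_kernel(x, y, lam=0.1):
    """Mallows kernel: exp(-lam * number of discordant pairs); lam is a free parameter."""
    return exp(-lam * discordant_pairs(x, y))
```

Both functions depend only on the relative order of the entries, so they give the same value for raw expression vectors and for their rank transforms.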
|
78
|
AUCTSP: an improved biomarker gene pair class predictor. BMC Bioinformatics 2018; 19:244. [PMID: 29940833 PMCID: PMC6020231 DOI: 10.1186/s12859-018-2231-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 01/07/2018] [Accepted: 06/04/2018] [Indexed: 11/10/2022] Open
Abstract
Background The Top Scoring Pair (TSP) classifier, based on the concept of relative ranking reversals in the expressions of pairs of genes, has been proposed as a simple, accurate, and easily interpretable decision rule for classification and class prediction of gene expression profiles. The idea that differences in gene expression ranking are associated with presence or absence of disease is compelling and has strong biological plausibility. Nevertheless, the TSP formulation ignores significant available information which can improve classification accuracy and is vulnerable to selecting genes which do not have differential expression in the two conditions (“pivot” genes). Results We introduce the AUCTSP classifier as an alternative rank-based estimator of the magnitude of the ranking reversals involved in the original TSP. The proposed estimator is based on the Area Under the Receiver Operating Characteristic (ROC) Curve (AUC) and as such, takes into account the separation of the entire distribution of gene expression levels in gene pairs under the conditions considered, as opposed to comparing gene rankings within individual subjects as in the original TSP formulation. Through extensive simulations and case studies involving classification in ovarian, leukemia, colon, breast and prostate cancers and diffuse large B-cell lymphoma, we show the superiority of the proposed approach in terms of improving classification accuracy, avoiding overfitting and being less prone to selecting non-informative (pivot) genes. Conclusions The proposed AUCTSP is a simple yet reliable and robust rank-based classifier for gene expression classification. While the AUCTSP works by the same principle as TSP, its ability to determine the top scoring gene pair based on the relative rankings of two marker genes across all subjects, as opposed to each individual subject, results in significant performance gains in classification accuracy. In addition, the proposed method tends to avoid selection of non-informative (pivot) genes as members of the top-scoring pair. Electronic supplementary material The online version of this article (10.1186/s12859-018-2231-1) contains supplementary material, which is available to authorized users.
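For orientation, the original TSP rule that this abstract builds on can be sketched as follows. This is a toy illustration of the within-sample comparison only, not the AUC-based estimator proposed in the paper; the function names and the exhaustive pair search are assumptions made for the example.

```python
from itertools import combinations


def tsp_score(samples_a, samples_b, i, j):
    """|P(gene i < gene j | class A) - P(gene i < gene j | class B)|."""
    p_a = sum(s[i] < s[j] for s in samples_a) / len(samples_a)
    p_b = sum(s[i] < s[j] for s in samples_b) / len(samples_b)
    return abs(p_a - p_b)


def top_scoring_pair(samples_a, samples_b):
    """Return the gene index pair with the largest score (first one wins on ties)."""
    n_genes = len(samples_a[0])
    return max(
        combinations(range(n_genes), 2),
        key=lambda pair: tsp_score(samples_a, samples_b, *pair),
    )
```

Each sample is a list of expression values indexed by gene. A score of 1 means the ordering of the two genes is reversed in every sample of one class relative to the other.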
|
79
|
Abstract
Technological advances enable increasingly comprehensive profiling of the molecular landscapes of cells, and these data can inform the personalized treatment of complex diseases. Two major obstacles are the complexity of these data and the high degree of person-to-person heterogeneity. We develop a highly simplified, personalized data representation by comparing the profile of an individual to the range of landscapes in a baseline population, thereby mimicking basic clinical diagnostic testing for departures of selected variables from normal levels. Moreover, our method can be applied to any data modality and at any level of granularity, from single features to any subset of features treated as a single entity, for example the gene expression levels in a pathway. Experiments involve both healthy human tissues and various cancer subtypes. Data collected from omics technologies have revealed pervasive heterogeneity and stochasticity of molecular states within and between phenotypes. A prominent example of such heterogeneity occurs between genome-wide mRNA, microRNA, and methylation profiles from one individual tumor to another, even within a cancer subtype. However, current methods in bioinformatics, such as detecting differentially expressed genes or CpG sites, are population-based and therefore do not effectively model intersample diversity. Here we introduce a unified theory to quantify sample-level heterogeneity that is applicable to a single omics profile. Specifically, we simplify an omics profile to a digital representation based on the omics profiles from a set of samples from a reference or baseline population (e.g., normal tissues). The state of any subprofile (e.g., expression vector for a subset of genes) is said to be “divergent” if it lies outside the estimated support of the baseline distribution and is consequently interpreted as “dysregulated” relative to that baseline. 
We focus on two cases: single features (e.g., individual genes) and distinguished subsets (e.g., regulatory pathways). Notably, since the divergence analysis is at the individual sample level, dysregulation can be analyzed probabilistically; for example, one can estimate the probability that a gene or pathway is divergent in some population. Finally, the reduction in complexity facilitates a more “personalized” and biologically interpretable analysis of variation, as illustrated by experiments involving tissue characterization, disease detection and progression, and disease–pathway associations.
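The single-feature case described above can be sketched in a few lines. The sketch assumes the baseline support is approximated by the observed minimum and maximum of the baseline samples, which is a simplification chosen for illustration; the abstract only says the support is estimated. All function names are hypothetical.

```python
def baseline_support(baseline_values):
    """Crude support estimate: the observed range of the baseline samples (assumption)."""
    return min(baseline_values), max(baseline_values)


def digitize(value, support):
    """Ternary code: -1 below the support, +1 above it, 0 inside (not divergent)."""
    low, high = support
    if value < low:
        return -1
    if value > high:
        return 1
    return 0


def divergence_probability(population_values, support):
    """Fraction of samples in a population whose feature is divergent."""
    codes = [digitize(v, support) for v in population_values]
    return sum(c != 0 for c in codes) / len(codes)
```

Because each sample is coded on its own against the baseline, the last function gives a per-feature estimate of how often that feature is divergent in a given population.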
|
80
|
Circumvent the uncertainty in the applications of transcriptional signatures to tumor tissues sampled from different tumor sites. Oncotarget 2018; 8:30265-30275. [PMID: 28427173 PMCID: PMC5444741 DOI: 10.18632/oncotarget.15754] [Citation(s) in RCA: 37] [Impact Index Per Article: 6.2] [Received: 11/25/2016] [Accepted: 01/30/2017] [Indexed: 11/25/2022] Open
Abstract
The expression measurements of thousands of genes are correlated with the proportion of tumor epithelial cells (PTEC) in clinical samples. Thus, for a tumor diagnostic or prognostic signature based on a summarization of expression levels of the signature genes, the risk score for a patient may depend on which tumor site, with its particular PTEC, the tissue was sampled from in the same patient. Here, we proposed that gene pair signatures based on within-sample relative expression orderings (REOs) should be insensitive to PTEC variations. Firstly, by analysis of paired tumor epithelial cell and stromal cell microdissected samples from 27 cancer patients, we showed that above 80% of gene pairs had consistent REOs between the two cell types, indicating that these REOs would be independent of PTEC in cancer tissues. Then, by simulating tumor tissues with different PTEC using each of the 27 paired samples, we showed that about 90% of the REOs of gene pairs in tumor epithelial cells were maintained in tumor samples even when PTEC decreased to 30%. In particular, the REOs of gene pairs with larger expression differences in tumor epithelial cells tend to be more robust against PTEC variations. Finally, as a case study, we developed a gene pair signature which could robustly distinguish colorectal cancer tissues with various PTEC from normal tissues. We concluded that the REOs-based signatures were robust against PTEC variations.
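The simulation idea in this abstract can be sketched as below. The sketch assumes that a mixed tissue profile is a linear combination of the epithelial and stromal profiles weighted by PTEC; the abstract does not state the mixing model, so this is an illustrative assumption, as are the function names.

```python
from itertools import combinations


def reo_concordance(profile_a, profile_b):
    """Fraction of gene pairs whose within-sample ordering agrees in both profiles."""
    pairs = list(combinations(range(len(profile_a)), 2))
    same = sum(
        (profile_a[i] > profile_a[j]) == (profile_b[i] > profile_b[j])
        for i, j in pairs
    )
    return same / len(pairs)


def mix(epithelial, stromal, ptec):
    """Simulated bulk profile with tumor epithelial proportion ptec (linear mixing assumed)."""
    return [ptec * e + (1 - ptec) * s for e, s in zip(epithelial, stromal)]
```

Calling `reo_concordance(epithelial, mix(epithelial, stromal, ptec))` for decreasing `ptec` values shows how many epithelial orderings survive dilution by stroma. Pairs with larger expression differences in the epithelial profile are the last to flip under this model.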
|
81
|
Chen R, Guan Q, Cheng J, He J, Liu H, Cai H, Hong G, Zhang J, Li N, Ao L, Guo Z. Robust transcriptional tumor signatures applicable to both formalin-fixed paraffin-embedded and fresh-frozen samples. Oncotarget 2018; 8:6652-6662. [PMID: 28036264 PMCID: PMC5351660 DOI: 10.18632/oncotarget.14257] [Citation(s) in RCA: 49] [Impact Index Per Article: 8.2] [Received: 10/27/2016] [Accepted: 12/02/2016] [Indexed: 12/19/2022] Open
Abstract
Formalin-fixed paraffin-embedded (FFPE) samples represent a valuable resource for clinical research. However, FFPE samples are usually considered an unreliable source for gene expression analysis due to partial RNA degradation. In this study, by comparing gene expression profiles between FFPE samples and paired fresh-frozen (FF) samples for three cancer types, we first showed that expression measurements of thousands of genes had at least a two-fold change in FFPE samples compared with paired FF samples. Therefore, for a transcriptional signature based on risk scores summarized from the expression levels of the signature genes, the risk score thresholds trained from FFPE (or FF) samples could not be applied to FF (or FFPE) samples. On the other hand, we found that more than 90% of the relative expression orderings (REOs) of gene pairs in the FF samples were maintained in their paired FFPE samples and largely unaffected by the storage time. The result suggested that the REOs of gene pairs were highly robust against partial RNA degradation in FFPE samples. Finally, as a case study, we developed a REOs-based signature to distinguish liver cirrhosis from hepatocellular carcinoma (HCC) using FFPE samples. The signature was validated in four datasets of FFPE samples and eight datasets of FF samples. In conclusion, the valuable FFPE samples can be fully exploited to identify REOs-based diagnostic and prognostic signatures which could be robustly applicable to both FF samples and FFPE samples with degraded RNA.
Affiliation(s)
- Rou Chen
- Key Laboratory of Ministry of Education for Gastrointestinal Cancer, Department of Bioinformatics, Fujian Medical University, Fuzhou 350001, China
| | - Qingzhou Guan
- Key Laboratory of Ministry of Education for Gastrointestinal Cancer, Department of Bioinformatics, Fujian Medical University, Fuzhou 350001, China
| | - Jun Cheng
- Key Laboratory of Ministry of Education for Gastrointestinal Cancer, Department of Bioinformatics, Fujian Medical University, Fuzhou 350001, China
| | - Jun He
- Key Laboratory of Ministry of Education for Gastrointestinal Cancer, Department of Bioinformatics, Fujian Medical University, Fuzhou 350001, China
| | - Huaping Liu
- Key Laboratory of Ministry of Education for Gastrointestinal Cancer, Department of Bioinformatics, Fujian Medical University, Fuzhou 350001, China
| | - Hao Cai
- Key Laboratory of Ministry of Education for Gastrointestinal Cancer, Department of Bioinformatics, Fujian Medical University, Fuzhou 350001, China
| | - Guini Hong
- Key Laboratory of Ministry of Education for Gastrointestinal Cancer, Department of Bioinformatics, Fujian Medical University, Fuzhou 350001, China
| | - Jiahui Zhang
- Key Laboratory of Ministry of Education for Gastrointestinal Cancer, Department of Bioinformatics, Fujian Medical University, Fuzhou 350001, China
| | - Na Li
- Key Laboratory of Ministry of Education for Gastrointestinal Cancer, Department of Bioinformatics, Fujian Medical University, Fuzhou 350001, China
| | - Lu Ao
- Key Laboratory of Ministry of Education for Gastrointestinal Cancer, Department of Bioinformatics, Fujian Medical University, Fuzhou 350001, China
| | - Zheng Guo
- Key Laboratory of Ministry of Education for Gastrointestinal Cancer, Department of Bioinformatics, Fujian Medical University, Fuzhou 350001, China
|
82
|
Pitroda SP, Stack ME, Liu GF, Song SS, Chen L, Liang H, Parekh AD, Huang X, Roach P, Posner MC, Weichselbaum RR, Khodarev NN. JAK2 Inhibitor SAR302503 Abrogates PD-L1 Expression and Targets Therapy-Resistant Non–small Cell Lung Cancers. Mol Cancer Ther 2018; 17:732-739. [DOI: 10.1158/1535-7163.mct-17-0667] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 07/12/2017] [Revised: 11/27/2017] [Accepted: 01/17/2018] [Indexed: 11/16/2022]
|
83
|
Guan Q, Yan H, Chen Y, Zheng B, Cai H, He J, Song K, Guo Y, Ao L, Liu H, Zhao W, Wang X, Guo Z. Quantitative or qualitative transcriptional diagnostic signatures? A case study for colorectal cancer. BMC Genomics 2018; 19:99. [PMID: 29378509 PMCID: PMC5789529 DOI: 10.1186/s12864-018-4446-y] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Received: 09/26/2017] [Accepted: 01/11/2018] [Indexed: 12/20/2022] Open
Abstract
Background Due to experimental batch effects, the application of a quantitative transcriptional signature for disease diagnoses commonly requires inter-sample data normalization, which is hardly applicable under common clinical settings. Many cancers might have qualitative differences with the non-cancer states in the gene expression pattern. Therefore, it is reasonable to explore the power of qualitative diagnostic signatures which are robust against experimental batch effects and other random factors. Results Firstly, using data of technical replicate samples from the MicroArray Quality Control (MAQC) project, we demonstrated that the low-throughput PCR-based technologies also exhibit large measurement variations for gene expression even when the samples were measured in the same test site. Then, we demonstrated the critical limitation of low stability for classifiers based on quantitative transcriptional signatures in applications to individual samples through a case study using a support vector machine and a naïve Bayesian classifier to discriminate colorectal cancer tissues from normal tissues. To address this problem, we identified a signature consisting of three gene pairs for discriminating colorectal cancer tissues from non-cancer (normal and inflammatory bowel disease) tissues based on within-sample relative expression orderings (REOs) of these gene pairs. The signature was well verified using 22 independent datasets measured by different microarray and RNA-seq platforms, obviating the need for inter-sample data normalization. Conclusions Subtle quantitative information of gene expression measurements tends to be unstable under current technical conditions, which will introduce uncertainty to clinical applications of the quantitative transcriptional diagnostic signatures. For diagnosis of disease states with qualitative transcriptional characteristics, the qualitative REO-based signatures could be robustly applied to individual samples measured by different platforms. Electronic supplementary material The online version of this article (10.1186/s12864-018-4446-y) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Qingzhou Guan
- Fujian Key Laboratory of Medical Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, Fujian Medical University, Fuzhou, 350122, China
| | - Haidan Yan
- Fujian Key Laboratory of Medical Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, Fujian Medical University, Fuzhou, 350122, China
| | - Yanhua Chen
- Fujian Key Laboratory of Medical Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, Fujian Medical University, Fuzhou, 350122, China
| | - Baotong Zheng
- Fujian Key Laboratory of Medical Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, Fujian Medical University, Fuzhou, 350122, China
| | - Hao Cai
- Fujian Key Laboratory of Medical Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, Fujian Medical University, Fuzhou, 350122, China
| | - Jun He
- Fujian Key Laboratory of Medical Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, Fujian Medical University, Fuzhou, 350122, China
| | - Kai Song
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150086, China
| | - You Guo
- Fujian Key Laboratory of Medical Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, Fujian Medical University, Fuzhou, 350122, China.,Department of Preventive Medicine, School of Basic Medicine Sciences, Gannan Medical University, Ganzhou, 341000, China
| | - Lu Ao
- Fujian Key Laboratory of Medical Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, Fujian Medical University, Fuzhou, 350122, China
| | - Huaping Liu
- Fujian Key Laboratory of Medical Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, Fujian Medical University, Fuzhou, 350122, China
| | - Wenyuan Zhao
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150086, China
| | - Xianlong Wang
- Fujian Key Laboratory of Medical Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, Fujian Medical University, Fuzhou, 350122, China.
| | - Zheng Guo
- Fujian Key Laboratory of Medical Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, Fujian Medical University, Fuzhou, 350122, China.,Fujian Key Laboratory of Tumor Microbiology, Fujian Medical University, Fuzhou, 350122, China.,College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150086, China.
|
84
|
Sheikhpour R, Sarram MA, Chahooki MAZ, Sheikhpour R. A kernelized non-parametric classifier based on feature ranking in anisotropic Gaussian kernel. Neurocomputing 2017. [DOI: 10.1016/j.neucom.2017.06.035] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 12/25/2022]
|
85
|
Li B, Cui Y, Diehn M, Li R. Development and Validation of an Individualized Immune Prognostic Signature in Early-Stage Nonsquamous Non-Small Cell Lung Cancer. JAMA Oncol 2017; 3:1529-1537. [PMID: 28687838 DOI: 10.1001/jamaoncol.2017.1609] [Citation(s) in RCA: 287] [Impact Index Per Article: 41.0] [Indexed: 12/19/2022]
Abstract
Importance The prevalence of early-stage non-small cell lung cancer (NSCLC) is expected to increase with recent implementation of annual screening programs. Reliable prognostic biomarkers are needed to identify patients at a high risk for recurrence to guide adjuvant therapy. Objective To develop a robust, individualized immune signature that can estimate prognosis in patients with early-stage nonsquamous NSCLC. Design, Setting, and Participants This retrospective study analyzed the gene expression profiles of frozen tumor tissue samples from 19 public NSCLC cohorts, including 18 microarray data sets and 1 RNA-Seq data set for The Cancer Genome Atlas (TCGA) lung adenocarcinoma cohort. Only patients with nonsquamous NSCLC with clinical annotation were included. Samples were from 2414 patients with nonsquamous NSCLC, divided into a meta-training cohort (729 patients), meta-testing cohort (716 patients), and 3 independent validation cohorts (439, 323, and 207 patients). All patients underwent surgery with a negative surgical margin, received no adjuvant or neoadjuvant therapy, and had publicly available gene expression data and survival information. Data were collected from July 22 through September 8, 2016. Main Outcomes and Measures Overall survival. Results Of 2414 patients (1205 men [50%], 1111 women [46%], and 98 of unknown sex [4%]; median age [range], 64 [15-90] years), a prognostic immune signature of 25 gene pairs consisting of 40 unique genes was constructed using the meta-training data set. In the meta-testing and validation cohorts, the immune signature significantly stratified patients into high- vs low-risk groups in terms of overall survival across and within subpopulations with stage I, IA, IB, or II disease and remained as an independent prognostic factor in multivariate analyses (hazard ratio range, 1.72 [95% CI, 1.26-2.33; P < .001] to 2.36 [95% CI, 1.47-3.79; P < .001]) after adjusting for clinical and pathologic factors. 
Several biological processes, including chemotaxis, were enriched among genes in the immune signature. The percentage of neutrophil infiltration (5.6% vs 1.8%) and necrosis (4.6% vs 1.5%) was significantly higher in the high-risk immune group compared with the low-risk groups in TCGA data set (P < .003). The immune signature achieved a higher accuracy (mean concordance index [C-index], 0.64) than 2 commercialized multigene signatures (mean C-index, 0.53 and 0.61) for estimation of survival in comparable validation cohorts. When integrated with clinical characteristics such as age and stage, the composite clinical and immune signature showed improved prognostic accuracy in all validation data sets relative to molecular signatures alone (mean C-index, 0.70 vs 0.63) and another commercialized clinical-molecular signature (mean C-index, 0.68 vs 0.65). Conclusions and Relevance The proposed clinical-immune signature is a promising biomarker for estimating overall survival in nonsquamous NSCLC, including early-stage disease. Prospective studies are needed to test the clinical utility of the biomarker in individualized management of nonsquamous NSCLC.
Affiliation(s)
- Bailiang Li
- Department of Radiation Oncology, Stanford University School of Medicine, Palo Alto, California
| | - Yi Cui
- Department of Radiation Oncology, Stanford University School of Medicine, Palo Alto, California.,Global Institution for Collaborative Research and Education, Hokkaido University, Sapporo, Japan
| | - Maximilian Diehn
- Department of Radiation Oncology, Stanford University School of Medicine, Palo Alto, California.,Institute for Stem Cell Biology and Regenerative Medicine, Stanford University School of Medicine, Palo Alto, California.,Stanford Cancer Institute, Stanford University School of Medicine, Palo Alto, California
| | - Ruijiang Li
- Department of Radiation Oncology, Stanford University School of Medicine, Palo Alto, California.,Stanford Cancer Institute, Stanford University School of Medicine, Palo Alto, California
|
86
|
Liu H, Li Y, He J, Guan Q, Chen R, Yan H, Zheng W, Song K, Cai H, Guo Y, Wang X, Guo Z. Robust transcriptional signatures for low-input RNA samples based on relative expression orderings. BMC Genomics 2017; 18:913. [PMID: 29179677 PMCID: PMC5704640 DOI: 10.1186/s12864-017-4280-7] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.9] [Received: 06/02/2017] [Accepted: 11/03/2017] [Indexed: 11/18/2022] Open
Abstract
Background It is often difficult to obtain a sufficient quantity of RNA molecules for gene expression profiling in many practical situations. Amplification from low-input samples may induce artificial signals. Results We compared the expression measurements of low-input mRNA samples, from 25 pg to 1000 pg mRNA, which were amplified and profiled by Smart-seq, DP-seq and CEL-seq techniques using the Illumina HiSeq 2000 platform, with those of the paired high-input (50 ng) mRNA samples. Even with 1000 pg mRNA input, we found that thousands of genes had at least a two-fold change of expression levels in the low-input samples compared with the corresponding paired high-input samples. Consequently, a transcriptional signature based on quantitative expression values and determined from high-input RNA samples cannot be applied to low-input samples, and vice versa. In contrast, the within-sample relative expression orderings (REOs) of approximately 90% of all the gene pairs in the high-input samples were maintained in the paired low-input samples with 1000 pg input mRNA molecules. Similar results were observed in the low-input total RNA samples amplified and profiled by the Whole-Genome DASL technique using the Illumina HumanRef-8 v3.0 platform. As a proof of principle, we developed REOs-based signatures from high-input RNA samples for discriminating cancer tissues and showed that they can be robustly applied to low-input RNA samples. Conclusions REOs-based signatures determined from the high-input RNA samples can be robustly applied to low-input RNA samples with as little as 1000 pg or 250 pg of input, but are, to a certain degree, no longer stable in samples with less than 250 pg of RNA input. Electronic supplementary material The online version of this article (10.1186/s12864-017-4280-7) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Huaping Liu
- Department of Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, School of Basic Medical Sciences, Fujian Medical University, Fuzhou, 350122, China.,Department of Systems Biology, College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150086, China
| | - Yawei Li
- Department of Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, School of Basic Medical Sciences, Fujian Medical University, Fuzhou, 350122, China
| | - Jun He
- Department of Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, School of Basic Medical Sciences, Fujian Medical University, Fuzhou, 350122, China
| | - Qingzhou Guan
- Department of Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, School of Basic Medical Sciences, Fujian Medical University, Fuzhou, 350122, China
| | - Rou Chen
- Department of Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, School of Basic Medical Sciences, Fujian Medical University, Fuzhou, 350122, China
| | - Haidan Yan
- Department of Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, School of Basic Medical Sciences, Fujian Medical University, Fuzhou, 350122, China
| | - Weicheng Zheng
- Department of Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, School of Basic Medical Sciences, Fujian Medical University, Fuzhou, 350122, China
| | - Kai Song
- Department of Systems Biology, College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150086, China
| | - Hao Cai
- Department of Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, School of Basic Medical Sciences, Fujian Medical University, Fuzhou, 350122, China
| | - You Guo
- Department of Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, School of Basic Medical Sciences, Fujian Medical University, Fuzhou, 350122, China
| | - Xianlong Wang
- Department of Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, School of Basic Medical Sciences, Fujian Medical University, Fuzhou, 350122, China.
| | - Zheng Guo
- Department of Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, School of Basic Medical Sciences, Fujian Medical University, Fuzhou, 350122, China.,Fujian Key Laboratory of Tumor Microbiology, Fujian Medical University, Fuzhou, 350122, China.,Department of Systems Biology, College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150086, China.,Key Laboratory of Medical Bioinformatics, Fujian Province, China.
|
87
|
Identification of gene pairs through penalized regression subject to constraints. BMC Bioinformatics 2017; 18:466. [PMID: 29100492 PMCID: PMC5670721 DOI: 10.1186/s12859-017-1872-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 02/21/2017] [Accepted: 10/17/2017] [Indexed: 02/07/2023] Open
Abstract
Background This article concerns the identification of gene pairs or combinations of gene pairs associated with biological phenotype or clinical outcome, allowing for building predictive models that are not only robust to normalization but also easily validated and measured by qPCR techniques. However, given a small number of biological samples yet a large number of genes, this problem suffers from high computational complexity and poses statistical challenges to the accuracy of identification. Results In this paper, we propose a parsimonious model representation and develop efficient algorithms for identification. Particularly, we derive an equivalent model subject to a sum-to-zero constraint in penalized linear regression, where the correspondence between nonzero coefficients in these models is established. Most importantly, it reduces the model complexity of the traditional approach from the quadratic order to the linear order in the number of candidate genes, while overcoming the difficulty of model nonidentifiability. Computationally, we develop an algorithm using the alternating direction method of multipliers (ADMM) to deal with the constraint. Numerically, we demonstrate that the proposed method outperforms the traditional method in terms of the statistical accuracy. Moreover, we demonstrate that our ADMM algorithm is more computationally efficient than a coordinate descent algorithm with a local search. Finally, we illustrate the proposed method on a prostate cancer dataset to identify gene pairs that are associated with pre-operative prostate-specific antigen. Conclusion Our findings demonstrate the feasibility and utility of using gene pairs as biomarkers.
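One way to see why a pairwise model can be rewritten with a sum-to-zero constraint is the small sketch below. It assumes that each gene pair enters the linear predictor through the difference of its two expression values, which is an interpretation made for illustration and may not match the paper's exact formulation; the function names are hypothetical.

```python
def pair_to_gene_coefficients(gamma, n_genes):
    """Collapse pair coefficients gamma[(i, j)] on (x_i - x_j) into per-gene coefficients."""
    beta = [0.0] * n_genes
    for (i, j), g in gamma.items():
        beta[i] += g  # gene i enters the difference with a plus sign
        beta[j] -= g  # gene j enters with a minus sign
    return beta


def pair_predictor(gamma, x):
    """Linear predictor written over gene pairs (quadratic number of possible terms)."""
    return sum(g * (x[i] - x[j]) for (i, j), g in gamma.items())


def gene_predictor(beta, x):
    """The same predictor written over single genes (linear number of terms)."""
    return sum(b * v for b, v in zip(beta, x))
```

Every pair term adds and subtracts the same amount, so the per-gene coefficients always sum to zero, and the two predictors agree on any expression vector `x`.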
|
88
|
Ma T, Song C, Tseng GC. Discussant paper on ‘Statistical contributions to bioinformatics: Design, modelling, structure learning and integration’. Stat Model 2017. [DOI: 10.1177/1471082x17705992] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Affiliation(s)
- Tianzhou Ma
- Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
| | - Chi Song
- Division of Biostatistics, College of Public Health, Ohio State University, Columbus, OH, USA
| | - George C. Tseng
- Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
|
89
|
Salekin S, Bari MG, Raphael I, Forsthuber TG, Zhang JM. Early response index: a statistic to discover potential early stage disease biomarkers. BMC Bioinformatics 2017. [PMID: 28645323 PMCID: PMC5481992 DOI: 10.1186/s12859-017-1712-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 02/04/2023] Open
Abstract
Background: Identifying disease-correlated features early, before a large number of molecules are impacted by disease progression with significant abundance change, is very advantageous to biologists developing early disease diagnosis biomarkers. Disease-correlated features have a relatively low level of abundance change at early stages. Finding them in high-throughput data using existing bioinformatic tools is a challenging task since the technology suffers from limited dynamic range and significant noise. Most existing biomarker discovery algorithms can only detect molecules with high abundance changes, frequently missing early disease diagnostic markers. Results: We present a new statistic called early response index (ERI) to prioritize disease-correlated molecules as potential early biomarkers. Instead of classification accuracy, ERI measures the average classification accuracy improvement attainable by a feature when it is combined with other features for classification. ERI is more sensitive to abundance changes than other ranking statistics. We have shown that ERI significantly outperforms SAM and Localfdr in detecting early responding molecules in a proteomics study of a mouse model of multiple sclerosis. Importantly, ERI was able to detect many disease-relevant proteins before those algorithms detected them at a later time point. Conclusions: The ERI method is more sensitive for significant feature detection during the early stage of disease development. It potentially has a higher specificity for biomarker discovery, and can be used to identify the critical time frame for disease intervention. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-017-1712-y) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Sirajul Salekin
- Department of Electrical and Computer Engineering, University of Texas at San Antonio, One UTSA Circle, San Antonio, TX 78207, USA.
| | - Mehrab Ghanat Bari
- Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, USA
| | - Itay Raphael
- Department of Biology, University of Texas at San Antonio, One UTSA Circle, San Antonio, TX 78207, USA
| | - Thomas G Forsthuber
- Department of Biology, University of Texas at San Antonio, One UTSA Circle, San Antonio, TX 78207, USA
| | - Jianqiu Michelle Zhang
- Department of Electrical and Computer Engineering, University of Texas at San Antonio, One UTSA Circle, San Antonio, TX 78207, USA
| |
|
90
|
Reconstructing cell cycle pseudo time-series via single-cell transcriptome data. Nat Commun 2017; 8:22. [PMID: 28630425 PMCID: PMC5476636 DOI: 10.1038/s41467-017-00039-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/26/2023] Open
Abstract
Single-cell mRNA sequencing, which permits whole transcriptional profiling of individual cells, has been widely applied to study growth and development of tissues and tumors. Resolving cell cycle for such groups of cells is significant, but may not be adequately achieved by commonly used approaches. Here we develop a traveling salesman problem and hidden Markov model-based computational method named reCAT, to recover cell cycle along time for unsynchronized single-cell transcriptome data. We independently test reCAT for accuracy and reliability using several data sets. We find that cell cycle genes cluster into two major waves of expression, which correspond to the two well-known checkpoints, G1 and G2. Moreover, we leverage reCAT to exhibit methylation variation along the recovered cell cycle. Thus, reCAT shows the potential to elucidate diverse profiles of cell cycle, as well as other cyclic or circadian processes (e.g., in liver), on single-cell resolution. In single-cell RNA sequencing data of heterogeneous cell populations, cell cycle stage of individual cells would often be informative. Here, the authors introduce a computational model to reconstruct a pseudo-time series from single cell transcriptome data, identify the cell cycle stages, identify candidate cell cycle-regulated genes and recover the methylome changes during the cell cycle.
|
91
|
Liu Z, Lou H, Xie K, Wang H, Chen N, Aparicio OM, Zhang MQ, Jiang R, Chen T. Reconstructing cell cycle pseudo time-series via single-cell transcriptome data. Nat Commun 2017. [PMID: 28630425 PMCID: PMC5476636 DOI: 10.1038/s41467-017-00039-z] [Citation(s) in RCA: 66] [Impact Index Per Article: 9.4] [Indexed: 12/24/2022] Open
Abstract
Single-cell mRNA sequencing, which permits whole transcriptional profiling of individual cells, has been widely applied to study growth and development of tissues and tumors. Resolving cell cycle for such groups of cells is significant, but may not be adequately achieved by commonly used approaches. Here we develop a traveling salesman problem and hidden Markov model-based computational method named reCAT, to recover cell cycle along time for unsynchronized single-cell transcriptome data. We independently test reCAT for accuracy and reliability using several data sets. We find that cell cycle genes cluster into two major waves of expression, which correspond to the two well-known checkpoints, G1 and G2. Moreover, we leverage reCAT to exhibit methylation variation along the recovered cell cycle. Thus, reCAT shows the potential to elucidate diverse profiles of cell cycle, as well as other cyclic or circadian processes (e.g., in liver), on single-cell resolution. In single-cell RNA sequencing data of heterogeneous cell populations, cell cycle stage of individual cells would often be informative. Here, the authors introduce a computational model to reconstruct a pseudo-time series from single cell transcriptome data, identify the cell cycle stages, identify candidate cell cycle-regulated genes and recover the methylome changes during the cell cycle.
Affiliation(s)
- Zehua Liu
- MOE Key Laboratory of Bioinformatics, Bioinformatics Division and Center for Synthetic & Systems Biology, TNLIST, Department of Automation, Tsinghua University, Beijing, 100084, China
| | - Huazhe Lou
- MOE Key Laboratory of Bioinformatics, Bioinformatics Division and Center for Synthetic & Systems Biology, TNLIST, Department of Computer Sciences, State Key Lab of Intelligent Technology and Systems, Tsinghua University, Beijing, 100084, China
| | - Kaikun Xie
- MOE Key Laboratory of Bioinformatics, Bioinformatics Division and Center for Synthetic & Systems Biology, TNLIST, Department of Computer Sciences, State Key Lab of Intelligent Technology and Systems, Tsinghua University, Beijing, 100084, China
| | - Hao Wang
- MOE Key Laboratory of Bioinformatics, Bioinformatics Division and Center for Synthetic & Systems Biology, TNLIST, Department of Computer Sciences, State Key Lab of Intelligent Technology and Systems, Tsinghua University, Beijing, 100084, China
| | - Ning Chen
- MOE Key Laboratory of Bioinformatics, Bioinformatics Division and Center for Synthetic & Systems Biology, TNLIST, Department of Computer Sciences, State Key Lab of Intelligent Technology and Systems, Tsinghua University, Beijing, 100084, China
| | - Oscar M Aparicio
- Program in Computational Biology and Bioinformatics, University of Southern California, Los Angeles, CA, 90089, USA
| | - Michael Q Zhang
- MOE Key Laboratory of Bioinformatics, Bioinformatics Division and Center for Synthetic & Systems Biology, TNLIST, Department of Automation, Tsinghua University, Beijing, 100084, China.,Department of Molecular and Cell Biology, Center for Systems Biology, University of Texas at Dallas, 800 West Campbell Road, RL11, Richardson, TX, 75080-3021, USA
| | - Rui Jiang
- MOE Key Laboratory of Bioinformatics, Bioinformatics Division and Center for Synthetic & Systems Biology, TNLIST, Department of Automation, Tsinghua University, Beijing, 100084, China.
| | - Ting Chen
- MOE Key Laboratory of Bioinformatics, Bioinformatics Division and Center for Synthetic & Systems Biology, TNLIST, Department of Computer Sciences, State Key Lab of Intelligent Technology and Systems, Tsinghua University, Beijing, 100084, China.,Program in Computational Biology and Bioinformatics, University of Southern California, Los Angeles, CA, 90089, USA.
| |
|
92
|
Isella C, Brundu F, Bellomo SE, Galimi F, Zanella E, Porporato R, Petti C, Fiori A, Orzan F, Senetta R, Boccaccio C, Ficarra E, Marchionni L, Trusolino L, Medico E, Bertotti A. Selective analysis of cancer-cell intrinsic transcriptional traits defines novel clinically relevant subtypes of colorectal cancer. Nat Commun 2017; 8:15107. [PMID: 28561063 PMCID: PMC5499209 DOI: 10.1038/ncomms15107] [Citation(s) in RCA: 194] [Impact Index Per Article: 27.7] [Received: 01/28/2016] [Accepted: 03/01/2017] [Indexed: 12/15/2022] Open
Abstract
Stromal content heavily impacts the transcriptional classification of colorectal cancer (CRC), with clinical and biological implications. Lineage-dependent stromal transcriptional components could therefore dominate over more subtle expression traits inherent to cancer cells. Since in patient-derived xenografts (PDXs) stromal cells of the human tumour are substituted by murine counterparts, here we deploy human-specific expression profiling of CRC PDXs to assess cancer-cell intrinsic transcriptional features. Through this approach, we identify five CRC intrinsic subtypes (CRIS) endowed with distinctive molecular, functional and phenotypic peculiarities: (i) CRIS-A: mucinous, glycolytic, enriched for microsatellite instability or KRAS mutations; (ii) CRIS-B: TGF-β pathway activity, epithelial–mesenchymal transition, poor prognosis; (iii) CRIS-C: elevated EGFR signalling, sensitivity to EGFR inhibitors; (iv) CRIS-D: WNT activation, IGF2 gene overexpression and amplification; and (v) CRIS-E: Paneth cell-like phenotype, TP53 mutations. CRIS subtypes successfully categorize independent sets of primary and metastatic CRCs, with limited overlap on existing transcriptional classes and unprecedented predictive and prognostic performances. Stromal cells contribute to the gene expression profiles based on which colorectal cancer (CRC) molecular subtypes are classified. Here, patient-derived xenografts enable the authors to obtain cancer cell-specific transcriptomes by excluding transcripts from murine stromal cells, based on which they define CRC intrinsic subtypes (CRIS) and evaluate their prognostic and predictive potential.
Affiliation(s)
- Claudio Isella
- Department of Oncology, University of Torino School of Medicine, 10060 Candiolo Torino, Italy.,Candiolo Cancer Institute-FPO IRCCS, 10060 Candiolo Torino, Italy
| | - Francesco Brundu
- Department of Control and Computer Engineering, Torino School of Engineering, 10129 Torino, Italy
| | - Sara E Bellomo
- Department of Oncology, University of Torino School of Medicine, 10060 Candiolo Torino, Italy.,Candiolo Cancer Institute-FPO IRCCS, 10060 Candiolo Torino, Italy
| | - Francesco Galimi
- Department of Oncology, University of Torino School of Medicine, 10060 Candiolo Torino, Italy.,Candiolo Cancer Institute-FPO IRCCS, 10060 Candiolo Torino, Italy
| | - Eugenia Zanella
- Candiolo Cancer Institute-FPO IRCCS, 10060 Candiolo Torino, Italy
| | | | - Consalvo Petti
- Department of Oncology, University of Torino School of Medicine, 10060 Candiolo Torino, Italy.,Candiolo Cancer Institute-FPO IRCCS, 10060 Candiolo Torino, Italy
| | - Alessandro Fiori
- Candiolo Cancer Institute-FPO IRCCS, 10060 Candiolo Torino, Italy
| | - Francesca Orzan
- Department of Oncology, University of Torino School of Medicine, 10060 Candiolo Torino, Italy.,Candiolo Cancer Institute-FPO IRCCS, 10060 Candiolo Torino, Italy
| | - Rebecca Senetta
- Candiolo Cancer Institute-FPO IRCCS, 10060 Candiolo Torino, Italy.,Department of Medical Sciences, University of Torino School of Medicine, 10060 Candiolo Torino, Italy
| | - Carla Boccaccio
- Department of Oncology, University of Torino School of Medicine, 10060 Candiolo Torino, Italy.,Candiolo Cancer Institute-FPO IRCCS, 10060 Candiolo Torino, Italy
| | - Elisa Ficarra
- Department of Control and Computer Engineering, Torino School of Engineering, 10129 Torino, Italy
| | - Luigi Marchionni
- Department of Oncology, Johns Hopkins University, Baltimore, 21287 Maryland, USA
| | - Livio Trusolino
- Department of Oncology, University of Torino School of Medicine, 10060 Candiolo Torino, Italy.,Candiolo Cancer Institute-FPO IRCCS, 10060 Candiolo Torino, Italy
| | - Enzo Medico
- Department of Oncology, University of Torino School of Medicine, 10060 Candiolo Torino, Italy.,Candiolo Cancer Institute-FPO IRCCS, 10060 Candiolo Torino, Italy
| | - Andrea Bertotti
- Department of Oncology, University of Torino School of Medicine, 10060 Candiolo Torino, Italy.,Candiolo Cancer Institute-FPO IRCCS, 10060 Candiolo Torino, Italy.,National Institute of Biostructures and Biosystems, INBB, 00136 Rome, Italy
| |
|
93
|
Hornung R, Causeur D, Bernau C, Boulesteix AL. Improving cross-study prediction through addon batch effect adjustment or addon normalization. Bioinformatics 2017; 33:397-404. [PMID: 27797760 DOI: 10.1093/bioinformatics/btw650] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 06/03/2016] [Accepted: 10/11/2016] [Indexed: 12/22/2022] Open
Abstract
Motivation: To date, most medical tests derived by applying classification methods to high-dimensional molecular data are hardly used in clinical practice. This is partly because the prediction error resulting when applying them to external data is usually much higher than internal error as evaluated through within-study validation procedures. We suggest the use of addon normalization and addon batch effect removal techniques in this context to reduce systematic differences between external data and the original dataset with the aim to improve prediction performance. Results: We evaluate the impact of addon normalization and seven batch effect removal methods on cross-study prediction performance for several common classifiers using a large collection of microarray gene expression datasets, showing that some of these techniques reduce prediction error. Availability and Implementation: All investigated addon methods are implemented in our R package bapred. Contact: hornung@ibe.med.uni-muenchen.de. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Roman Hornung
- Department of Medical Informatics, Biometry and Epidemiology, University of Munich, Munich, Germany
| | - David Causeur
- Applied Mathematics Department, Agrocampus Ouest, Rennes, France
| | | | - Anne-Laure Boulesteix
- Department of Medical Informatics, Biometry and Epidemiology, University of Munich, Munich, Germany
| |
|
94
|
Hong G, Li H, Zhang J, Guan Q, Chen R, Guo Z. Identifying disease-associated pathways in one-phenotype data based on reversal gene expression orderings. Sci Rep 2017; 7:1348. [PMID: 28465555 PMCID: PMC5431047 DOI: 10.1038/s41598-017-01536-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 10/26/2016] [Accepted: 03/30/2017] [Indexed: 12/31/2022] Open
Abstract
Due to the invasive nature of tissue biopsy, it is common that investigators cannot collect sufficient normal controls for comparison with diseased samples. We developed a pathway enrichment tool, DRFunc, to detect significantly disease-disrupted pathways by incorporating normal controls from other experiments. The method was validated using both microarray and RNA-seq expression data for different cancers. Highly concordant differentially ranked (DR) gene pairs were identified between cases and controls from different independent datasets. The DR gene pairs were used in the DRFunc algorithm to detect significantly disrupted pathways in one-phenotype expression data by combining controls from other studies. The DRFunc algorithm was exemplified by the detection of significant pathways in glioblastoma samples. The algorithm can also be used to detect altered pathways in datasets with weak expression signals, as shown by the analysis of the expression data of chemotherapy-treated breast cancer samples.
Affiliation(s)
- Guini Hong
- Department of Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, Fujian Medical University, Fuzhou, 350108, China.
| | - Hongdong Li
- Department of Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, Fujian Medical University, Fuzhou, 350108, China
| | - Jiahui Zhang
- Department of Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, Fujian Medical University, Fuzhou, 350108, China
| | - Qingzhou Guan
- Department of Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, Fujian Medical University, Fuzhou, 350108, China
| | - Rou Chen
- Department of Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, Fujian Medical University, Fuzhou, 350108, China
| | - Zheng Guo
- Department of Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, Fujian Medical University, Fuzhou, 350108, China.
- Fujian Key Laboratory of Tumor Microbiology, Fujian Medical University, Fuzhou, 350108, China.
| |
|
95
|
Wang S, Wei J, Yang Z. Discrimination Structure Complementarity-Based Feature Selection. Comput Intell 2017. [DOI: 10.1111/coin.12118] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/29/2022]
Affiliation(s)
- Shuqin Wang
- College of Computer and Information Engineering; Tianjin Normal University; Tianjin China
| | - Jinmao Wei
- College of Computer and Control Engineering; Nankai University; Tianjin China
- College of Software; Nankai University; Tianjin China
| | - Zhenglu Yang
- College of Computer and Control Engineering; Nankai University; Tianjin China
- College of Software; Nankai University; Tianjin China
| |
|
96
|
Abstract
Background: The Receiver Operator Characteristic (ROC) curve is well known for evaluating classification performance in the biomedical field. Owing to its superiority in dealing with imbalanced and cost-sensitive data, the ROC curve has been exploited as a popular metric to evaluate and find disease-related genes (features). The existing ROC-based feature selection approaches are simple and effective in evaluating individual features. However, these approaches may fail to find the real target feature subset because they lack an effective means of reducing the redundancy between features, which is essential in machine learning. Results: In this paper, we propose to assess feature complementarity by measuring the distances between the misclassified instances and their nearest misses on the dimensions of pairwise features. If a misclassified instance and its nearest miss on one feature dimension are far apart on another feature dimension, the two features are regarded as complementary to each other. Subsequently, we propose a novel filter feature selection approach on the basis of the ROC analysis. The new approach employs an efficient heuristic search strategy to select optimal features with the highest complementarities. The experimental results on a broad range of microarray data sets validate that the classifiers built on the feature subset selected by our approach can achieve the minimal balanced error rate with a small number of significant features. Conclusions: Compared with other ROC-based feature selection approaches, our new approach can select fewer features and effectively improve the classification performance.
|
99
|
Bari MG, Salekin S, Zhang JM. A Robust and Efficient Feature Selection Algorithm for Microarray Data. Mol Inform 2016; 36. [PMID: 28000384 DOI: 10.1002/minf.201600099] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 07/14/2016] [Accepted: 11/21/2016] [Indexed: 12/20/2022]
Abstract
In the past decades, a few synergistic feature selection algorithms have been published, which include Cooperative Index (CI) and K-Top Scoring Pair (k-TSP). These algorithms consider the synergistic behavior of features when they are included in a feature panel. Although promising results have been shown for these algorithms, there is a lack of a comprehensive and fair comparison with other feature selection algorithms across a large number of microarray datasets in terms of classification accuracy and computational complexity. There is a need to evaluate their performance and reduce the complexity of such algorithms. We compared the performance of synergistic feature selection algorithms with 11 other commonly used algorithms based on 22 microarray gene expression binary class datasets. The evaluation confirms that synergistic algorithms such as CI and k-TSP will gradually increase the classification performance as more features are used in the classifiers. Also, in order to cut down computational cost, we proposed a new feature selection ranking score called Positive Synergy Index (PSI). Testing results show that features selected using PSI as well as synergistic feature selection algorithms provide better performance compared with all other methods, while PSI has a computational complexity significantly lower than that of other synergistic algorithms.
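For readers unfamiliar with the pair-based scoring behind k-TSP, the sketch below shows one common formulation of the top scoring pair score for a binary problem: for genes i and j, the absolute difference between the two classes in the frequency with which gene i is expressed below gene j. It is a minimal illustration rather than the implementation evaluated in the paper; the function name `tsp_scores` and the toy matrix are assumptions made here, and PSI and CI are not covered.

```python
import numpy as np

def tsp_scores(X, y):
    """Score every ordered gene pair for a two-class problem.

    X is a samples-by-genes expression matrix, y holds 0/1 class labels.
    Entry [i, j] of the result is |P(x_i < x_j | y=0) - P(x_i < x_j | y=1)|.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # less[s, i, j] is True when gene i is below gene j in sample s.
    less = X[:, :, None] < X[:, None, :]
    p0 = less[y == 0].mean(axis=0)
    p1 = less[y == 1].mean(axis=0)
    return np.abs(p0 - p1)

# Toy data: 4 samples, 3 genes. The ordering of genes 0 and 1 flips between classes.
X = np.array([[1.0, 5.0, 3.0],
              [2.0, 6.0, 2.5],
              [7.0, 1.0, 3.0],
              [8.0, 2.0, 2.0]])
y = np.array([0, 0, 1, 1])

scores = tsp_scores(X, y)
best_pair = np.unravel_index(np.argmax(scores), scores.shape)
```

Because the score depends only on within-sample orderings, it is unchanged by any monotone per-sample normalization. The broadcasting step builds a samples-by-genes-by-genes array, so this form only suits small gene panels.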
Affiliation(s)
- Mehrab Ghanat Bari
- Dept. of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, Rochester, MN, 55905
| | - Sirajul Salekin
- Dept. of Electrical and Computer Engineering, The University of Texas at San Antonio, San Antonio, TX, 78249
| | - Jianqiu Michelle Zhang
- Dept. of Electrical and Computer Engineering, The University of Texas at San Antonio, San Antonio, TX, 78249
| |
|
100
|
Ganesh Kumar P, Kavitha MS, Ahn BC. Automated Detection of Cancer Associated Genes Using a Combined Fuzzy-Rough-Set-Based F-Information and Water Swirl Algorithm of Human Gene Expression Data. PLoS One 2016; 11:e0167504. [PMID: 27936033 PMCID: PMC5148587 DOI: 10.1371/journal.pone.0167504] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 07/22/2016] [Accepted: 11/15/2016] [Indexed: 11/22/2022] Open
Abstract
This study describes a novel approach to reducing the challenges of highly nonlinear multiclass gene expression values for cancer diagnosis. To build a fruitful system for cancer diagnosis, we introduced two levels of gene selection, filtering and embedding, for selection of potential genes and of the most relevant genes associated with cancer, respectively. The filter procedure was implemented by developing a fuzzy rough set (FR)-based method for redefining the criterion function of f-information (FI) to identify the potential genes without discretizing the continuous gene expression values. The embedded procedure is implemented by means of a water swirl algorithm (WSA), which attempts to optimize the rule set and membership function required to classify samples using a fuzzy-rule-based multiclassification system (FRBMS). Two novel update equations are proposed in WSA, which have better exploration and exploitation abilities while designing a self-learning FRBMS. The efficiency of our new approach was evaluated on 13 multicategory and 9 binary datasets of cancer gene expression. Additionally, the performance of the proposed FRFI-WSA method in designing an FRBMS was compared with existing methods for gene selection and optimization such as genetic algorithm (GA), particle swarm optimization (PSO), and artificial bee colony algorithm (ABC) on all the datasets. In the global cancer map with repeated measurements (GCM_RM) dataset, the FRFI-WSA selected the smallest set, 16 of the most relevant genes associated with cancer, using a minimal number of 26 compact rules with the highest classification accuracy (96.45%). In addition, the statistical validation used in this study revealed that the biological relevance of the most relevant genes associated with cancer and their linguistics detected by the proposed FRFI-WSA approach are better than those in the other methods. The simple interpretable rules with most relevant genes and effectively classified samples suggest that the proposed FRFI-WSA approach is reliable for classification of an individual’s cancer gene expression data with high precision and therefore it could be helpful for clinicians as a clinical decision support system.
Affiliation(s)
| | - Muthu Subash Kavitha
- Department of Computer Vision and Image Processing, School of Electronics Engineering, Kyungpook National University, Daegu, South Korea
| | - Byeong-Cheol Ahn
- Department of Nuclear Medicine, Kyungpook National University School of Medicine and Hospital, Daegu, South Korea
| |
|