Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Tan AC, Naiman DQ, Xu L, Winslow RL, Geman D. Simple decision rules for classifying human cancers from gene expression profiles. Bioinformatics 2005;21:3896-904. [PMID: 16105897 PMCID: PMC1987374 DOI: 10.1093/bioinformatics/bti631] [Citation(s) in RCA: 248] [Impact Index Per Article: 13.1] [Indexed: 11/15/2022]

Cited by Other Article(s)

Shen Y, Chu Q, Yin X, He Y, Bai P, Wang Y, Fang W, Timko MP, Fan L, Jiang W. TOD-CUP: a gene expression rank-based majority vote algorithm for tissue origin diagnosis of cancers of unknown primary. Brief Bioinform 2020;22:2106-2118. [PMID: 32266390 DOI: 10.1093/bib/bbaa031] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 12/08/2019] [Revised: 01/19/2020] [Accepted: 02/19/2020] [Indexed: 12/14/2022]

Extract interpretability-accuracy balanced rules from artificial neural networks: A review. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.01.036] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Indexed: 11/22/2022]

Li X, Huang H, Zhang J, Jiang F, Guo Y, Shi Y, Guo Z, Ao L. A qualitative transcriptional signature for predicting the biochemical recurrence risk of prostate cancer patients after radical prostatectomy. Prostate 2020;80:376-387. [PMID: 31961962 PMCID: PMC7065139 DOI: 10.1002/pros.23952] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 06/20/2019] [Accepted: 01/02/2020] [Indexed: 12/27/2022]

Abstract

BACKGROUND

The qualitative transcriptional characteristics, the within-sample relative expression orderings (REOs) of genes, are highly robust against batch effects and sample quality variations. Hence, we develop a qualitative transcriptional signature based on REOs to predict the biochemical recurrence risk of prostate cancer (PCa) patients after radical prostatectomy.

METHODS

Gene pairs with REOs significantly correlated with the biochemical recurrence-free survival (BFS) were identified from 131 PCa samples in the training data set. From these gene pairs, we selected a qualitative transcriptional signature based on the within-sample REOs of gene pairs which could predict the recurrence risk of PCa patients after radical prostatectomy.

RESULTS

A signature consisting of 74 gene pairs, named 74-GPS, was developed for predicting the recurrence risk of PCa patients after radical prostatectomy based on the majority voting rule that a sample was assigned as high risk when at least 37 gene pairs of the 74-GPS voted for high risk; otherwise, it was assigned as low risk. The signature was validated in six independent datasets produced by different platforms. In each of the validation datasets, the Kaplan-Meier survival analysis showed that the average BFS of the low-risk group was significantly better than that of the high-risk group. Analyses of multiomics data of PCa samples from TCGA suggested that both epigenomic and genomic alterations could cause the reproducible transcriptional differences between the two prognostic groups.

CONCLUSIONS

The proposed qualitative transcriptional signature can robustly stratify PCa patients after radical prostatectomy into two groups with different recurrence risk and distinct multiomics characteristics. Hence, 74-GPS may serve as a helpful tool for guiding the management of PCa patients with radical prostatectomy at the individual level.
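
The majority-voting rule described in the RESULTS above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the gene names, the toy expression values, and the convention that a pair (a, b) votes for high risk when gene a is expressed above gene b within the same sample are all assumptions made for illustration.

```python
from typing import Mapping, Optional, Sequence, Tuple

GenePair = Tuple[str, str]


def count_high_risk_votes(expression: Mapping[str, float],
                          pairs: Sequence[GenePair]) -> int:
    """Count pairs (a, b) whose within-sample ordering a > b holds.

    Assumption: every pair is oriented so that a > b is a high-risk vote.
    """
    return sum(1 for a, b in pairs if expression[a] > expression[b])


def classify_by_majority(expression: Mapping[str, float],
                         pairs: Sequence[GenePair],
                         min_votes: Optional[int] = None) -> str:
    """Label a single sample from the relative orderings of its gene pairs."""
    if min_votes is None:
        # At least half of the pairs, rounded up (37 for a 74-pair signature).
        min_votes = (len(pairs) + 1) // 2
    votes = count_high_risk_votes(expression, pairs)
    return "high risk" if votes >= min_votes else "low risk"


# Hypothetical three-pair signature and one toy sample.
toy_pairs = [("GENE_A", "GENE_B"), ("GENE_C", "GENE_D"), ("GENE_E", "GENE_F")]
toy_sample = {"GENE_A": 5.0, "GENE_B": 2.0, "GENE_C": 1.0,
              "GENE_D": 3.0, "GENE_E": 4.0, "GENE_F": 0.5}
print(classify_by_majority(toy_sample, toy_pairs))
```

Because only the ordering of two values within one sample is used, the rule needs no cross-sample normalization, which is the property the abstract credits for robustness against batch effects.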

Affiliation(s)

Xiang Li: Department of Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, The School of Basic Medical Sciences, Fujian Medical University, Fuzhou, China; Key Laboratory of Medical Bioinformatics, Fujian Medical University, Fuzhou, China; Fujian Key Laboratory of Tumor Microbiology, Fujian Medical University, Fuzhou, China
Haiyan Huang: Department of Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, The School of Basic Medical Sciences, Fujian Medical University, Fuzhou, China
Jiahui Zhang: Department of Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, The School of Basic Medical Sciences, Fujian Medical University, Fuzhou, China
Fengle Jiang: Department of Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, The School of Basic Medical Sciences, Fujian Medical University, Fuzhou, China
Yating Guo: Department of Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, The School of Basic Medical Sciences, Fujian Medical University, Fuzhou, China
Yidan Shi: Department of Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, The School of Basic Medical Sciences, Fujian Medical University, Fuzhou, China
Zheng Guo: Department of Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, The School of Basic Medical Sciences, Fujian Medical University, Fuzhou, China; Key Laboratory of Medical Bioinformatics, Fujian Medical University, Fuzhou, China; Fujian Key Laboratory of Tumor Microbiology, Fujian Medical University, Fuzhou, China
Lu Ao: Department of Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, The School of Basic Medical Sciences, Fujian Medical University, Fuzhou, China; Key Laboratory of Medical Bioinformatics, Fujian Medical University, Fuzhou, China; Fujian Key Laboratory of Tumor Microbiology, Fujian Medical University, Fuzhou, China

Wang C, Li J. SINC: a scale-invariant deep-neural-network classifier for bulk and single-cell RNA-seq data. Bioinformatics 2020;36:1779-1784. [PMID: 31647523 DOI: 10.1093/bioinformatics/btz801] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/12/2019] [Revised: 10/01/2019] [Accepted: 10/23/2019] [Indexed: 11/12/2022]

Krzhizhanovskaya VV, Závodszky G, Lees MH, Dongarra JJ, Sloot PMA, Brissos S, Teixeira J. Tree Based Advanced Relative Expression Analysis. Lecture Notes in Computer Science 2020. [PMCID: PMC7304016 DOI: 10.1007/978-3-030-50420-5_37] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]

Scala G, Federico A, Fortino V, Greco D, Majello B. Knowledge Generation with Rule Induction in Cancer Omics. Int J Mol Sci 2019;21:E18. [PMID: 31861438 PMCID: PMC6981587 DOI: 10.3390/ijms21010018] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 10/17/2019] [Revised: 11/26/2019] [Accepted: 12/13/2019] [Indexed: 12/21/2022]

Smolander J, Stupnikov A, Glazko G, Dehmer M, Emmert-Streib F. Comparing biological information contained in mRNA and non-coding RNAs for classification of lung cancer patients. BMC Cancer 2019;19:1176. [PMID: 31796020 PMCID: PMC6892207 DOI: 10.1186/s12885-019-6338-1] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 02/13/2019] [Accepted: 11/06/2019] [Indexed: 12/26/2022]

Jazayeri N, Sajedi H. Breast cancer diagnosis based on genomic data and extreme learning machine. SN Applied Sciences 2019. [DOI: 10.1007/s42452-019-1789-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 01/08/2023]

Sandhu V, Labori KJ, Borgida A, Lungu I, Bartlett J, Hafezi-Bakhtiari S, Denroche RE, Jang GH, Pasternack D, Mbaabali F, Watson M, Wilson J, Kure EH, Gallinger S, Haibe-Kains B. Meta-Analysis of 1,200 Transcriptomic Profiles Identifies a Prognostic Model for Pancreatic Ductal Adenocarcinoma. JCO Clin Cancer Inform 2019;3:1-16. [DOI: 10.1200/cci.18.00102] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 12/20/2022]

Abstract

PURPOSE

With a dismal 8% median 5-year overall survival, pancreatic ductal adenocarcinoma (PDAC) is a highly lethal malignancy. Only 10% to 20% of patients are eligible for surgery, and more than 50% of these patients will die within 1 year of surgery. Building a molecular predictor of early death would enable the selection of patients with PDAC who are at high risk.

MATERIALS AND METHODS

We developed the Pancreatic Cancer Overall Survival Predictor (PCOSP), a prognostic model built from a unique set of 89 PDAC tumors in which gene expression was profiled using both microarray and sequencing platforms. We used a meta-analysis framework that was based on the binary gene pair method to create gene expression barcodes that were robust to biases arising from heterogeneous profiling platforms and batch effects. Leveraging the largest compendium of PDAC transcriptomic data sets to date, we show that PCOSP is a robust single-sample predictor of early death (1 year or less) after surgery in a subset of 823 samples with available transcriptomics and survival data.

RESULTS

The PCOSP model was strongly and significantly prognostic, with a meta-estimate of the area under the receiver operating characteristic curve of 0.70 (P = 2.6E−22) and d-index (robust hazard ratio) of 1.9 (range, 1.6 to 2.3; P = 1.4E−04) for binary and survival predictions, respectively. The prognostic value of PCOSP was independent of clinicopathologic parameters and molecular subtypes. Over-representation analysis of the PCOSP 2,619 gene pairs (1,070 unique genes) unveiled pathways associated with Hedgehog signaling, epithelial–mesenchymal transition, and extracellular matrix signaling.

CONCLUSION

PCOSP could improve treatment decisions by identifying patients who will not benefit from standard surgery/chemotherapy but who may benefit from a more aggressive treatment approach or enrollment in a clinical trial.

A Wrapper Feature Subset Selection Method Based on Randomized Search and Multilayer Structure. BioMed Research International 2019;2019:9864213. [PMID: 31828154 PMCID: PMC6885241 DOI: 10.1155/2019/9864213] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 05/08/2019] [Revised: 08/10/2019] [Accepted: 08/27/2019] [Indexed: 12/11/2022]

Abstract

The identification of discriminative features from information-rich data with the goal of clinical diagnosis is crucial in the field of biomedical science. In this context, many machine-learning techniques have been widely applied and have achieved remarkable results. However, disease, especially cancer, is often caused by a group of features with complex interactions. Unlike traditional feature selection methods, which focus only on finding single discriminative features, a multilayer feature subset selection method (MLFSSM), which employs randomized search and a multilayer structure to select a discriminative subset, is proposed herein. In each layer of this method, many feature subsets are generated to ensure the diversity of the combinations, and the weights of features are evaluated on the performances of the subsets. The weight of a feature increases if, compared with other features on the current layer, the feature is selected into more subsets with better performance. In this manner, the values of feature weights are revised layer by layer; the precision of feature weights is constantly improved; and better subsets are repeatedly constructed from the features with higher weights. Finally, the topmost feature subset of the last layer is returned. The experimental results based on five public gene datasets showed that the subsets selected by MLFSSM were more discriminative than those selected by traditional feature selection methods, including LVW (a feature subset selection method that uses the Las Vegas method as its randomized search strategy), GAANN (a feature subset selection method based on a genetic algorithm (GA)), and support vector machine recursive feature elimination (SVM-RFE). Furthermore, MLFSSM showed higher classification performance than some state-of-the-art methods that select feature pairs or groups, including top scoring pair (TSP), k-top scoring pairs (K-TSP), and relative simplicity-based direct classifier (RS-DC).

Ooi A. Advances in hereditary leiomyomatosis and renal cell carcinoma (HLRCC) research. Semin Cancer Biol 2019;61:158-166. [PMID: 31689495 DOI: 10.1016/j.semcancer.2019.10.016] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Received: 09/30/2019] [Accepted: 10/26/2019] [Indexed: 12/30/2022]

Fu Y, Qi L, Guo W, Jin L, Song K, You T, Zhang S, Gu Y, Zhao W, Guo Z. A qualitative transcriptional signature for predicting microsatellite instability status of right-sided Colon Cancer. BMC Genomics 2019;20:769. [PMID: 31646964 PMCID: PMC6813057 DOI: 10.1186/s12864-019-6129-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 06/17/2019] [Accepted: 09/23/2019] [Indexed: 12/16/2022]

Takahashi Y, Gleber-Netto FO, Bell D, Roberts D, Xie TX, Abdelmeguid AS, Pickering C, Myers JN, Hanna EY. Identification of markers predictive for response to induction chemotherapy in patients with sinonasal undifferentiated carcinoma. Oral Oncol 2019;97:56-61. [PMID: 31421472 DOI: 10.1016/j.oraloncology.2019.07.028] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 04/02/2019] [Revised: 05/09/2019] [Accepted: 07/29/2019] [Indexed: 01/09/2023]

A novel analysis method for biomarker identification based on horizontal relationship: identifying potential biomarkers from large-scale hepatocellular carcinoma metabolomics data. Anal Bioanal Chem 2019;411:6377-6386. [DOI: 10.1007/s00216-019-02011-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 04/16/2019] [Revised: 06/03/2019] [Accepted: 07/01/2019] [Indexed: 02/07/2023]

Lee MY, Kim TK, Walters KA, Wang K. A biological function based biomarker panel optimization process. Sci Rep 2019;9:7365. [PMID: 31089177 PMCID: PMC6517383 DOI: 10.1038/s41598-019-43779-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2018] [Accepted: 04/26/2019] [Indexed: 11/09/2022]

A new data analysis method based on feature linear combination. J Biomed Inform 2019;94:103173. [PMID: 30965135 DOI: 10.1016/j.jbi.2019.103173] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 09/13/2018] [Revised: 04/02/2019] [Accepted: 04/06/2019] [Indexed: 01/15/2023]

Abstract

In biological data, feature relationships are complex and diverse; they can reflect physiological and pathological changes. Defining simple and efficient classification rules based on feature relationships helps to discriminate between different conditions and to study disease mechanisms. The popular data analysis method k top scoring pairs (k-TSP) explores feature relationships by focusing on the difference in the relative levels of two features between groups and classifies samples accordingly. To define more efficient classification rules, we propose a new data analysis method based on the linear combination of k > 0 top scoring pairs (LC-k-TSP). LC-k-TSP applies a support vector machine (SVM) to define the best linear relationship for each feature pair, scores feature pairs by the discriminative abilities of the corresponding linear combinations, and selects k disjoint top scoring pairs to construct an ensemble classifier. Experiments on twelve public datasets showed the superiority of LC-k-TSP over k-TSP, which evaluates the relationship of every two features in the same way. The experiments also showed that LC-k-TSP performed similarly to SVM and random forest (RF) in accuracy. Because LC-k-TSP learns a unique linear combination for each feature pair and defines simple classification rules, its results are easy to interpret biomedically. Finally, we applied LC-k-TSP to hepatocellular carcinoma (HCC) metabolomics data to define simple classification rules for discriminating between different liver diseases. It obtained accuracy rates of 89.76% and 89.13% in distinguishing between small HCC and hepatic cirrhosis (CIR) groups and between HCC and CIR groups, respectively, superior to the 87.99% and 80.35% obtained by k-TSP. Hence, defining classification rules based on feature relationships is an effective way to analyze biological data.
LC-k-TSP, which evaluates each feature pair by its own best linear relationship, is superior to k-TSP, which evaluates every pair by the same linear relationship. Availability and implementation: http://www.402.dicp.ac.cn/download_ok_4.htm.
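
The step of selecting k disjoint top scoring pairs can be sketched as a greedy pass over pairs ranked by score. This is an illustration only, not the published procedure: the scores are assumed to be precomputed (in LC-k-TSP they would come from the per-pair linear combinations), the feature names are hypothetical, and the greedy strategy itself is an assumption.

```python
from typing import Dict, List, Tuple

FeaturePair = Tuple[str, str]


def select_disjoint_pairs(scores: Dict[FeaturePair, float],
                          k: int) -> List[FeaturePair]:
    """Greedily keep the k best-scoring pairs that share no feature."""
    chosen: List[FeaturePair] = []
    used = set()
    # Visit pairs from highest to lowest score.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    for (first, second), _score in ranked:
        if first in used or second in used:
            continue  # would reuse a feature already in the ensemble
        chosen.append((first, second))
        used.update((first, second))
        if len(chosen) == k:
            break
    return chosen


# Hypothetical precomputed pair scores.
toy_scores = {
    ("f1", "f2"): 0.9,
    ("f1", "f3"): 0.8,  # skipped: f1 is already used by the best pair
    ("f3", "f4"): 0.7,
    ("f5", "f6"): 0.6,
}
print(select_disjoint_pairs(toy_scores, k=2))
```

Requiring disjoint pairs keeps each feature in at most one rule, so every member of the ensemble contributes an independent, individually interpretable vote.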

A combined gene expression tool for parallel histological prediction and gene fusion detection in non-small cell lung cancer. Sci Rep 2019;9:5207. [PMID: 30914778 PMCID: PMC6435686 DOI: 10.1038/s41598-019-41585-4] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 10/24/2018] [Accepted: 03/12/2019] [Indexed: 01/10/2023]

Abstract

Accurate histological classification and identification of fusion genes represent two cornerstones of clinical diagnostics in non-small cell lung cancer (NSCLC). Here, we present a NanoString gene expression platform and a novel platform-independent, single sample predictor (SSP) of NSCLC histology for combined, simultaneous, histological classification and fusion gene detection in minimal formalin-fixed paraffin-embedded (FFPE) tissue. The SSP was developed in 68 NSCLC tumors of adenocarcinoma (AC), squamous cell carcinoma (SqCC) and large-cell neuroendocrine carcinoma (LCNEC) histology, based on NanoString expression of 11 genes relevant for IHC-based NSCLC histology classification (CHGA, SYP, CD56, SFTPG, NAPSA, TTF-1, TP73L, KRT6A, KRT5, KRT40, KRT16). The SSP was combined with a gene fusion detection module (analyzing ALK, RET, ROS1, MET, NRG1, and NTRK1) into a multicomponent NanoString assay. The histological SSP was validated in six cohorts varying in size (n = 11–199), tissue origin (early or advanced disease), histological composition (including undifferentiated cancer), and gene expression platform. Fusion gene detection revealed five EML4-ALK fusions, four KIF5B-RET fusions, two CD74-NRG1 fusions and three MET exon 14 skipping events among 131 tested cases. The histological SSP was successfully trained and tested in the development cohort (mean AUC = 0.96 in iterated test sets). The SSP proved successful in predicting histology of NSCLC tumors of well-defined subgroups and difficult undifferentiated morphology irrespective of gene expression data platform. Discrepancies between gene expression prediction and histologic diagnosis included cases with mixed histologies, true large cell carcinomas, or poorly differentiated adenocarcinomas with mucin expression.
In summary, we present a proof-of-concept multicomponent assay for parallel histological classification and multiplexed fusion gene detection in archival tissue, including a novel platform-independent histological SSP classifier. The assay and SSP could serve as a promising complement in the routine evaluation of diagnostic lung cancer biopsies.

Khamesipour A, Kagaris D. Speeding up the discovery of combinations of differentially expressed genes for disease prediction and classification. Computer Methods and Programs in Biomedicine 2019;170:69-80. [PMID: 30712605 DOI: 10.1016/j.cmpb.2019.01.004] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 09/20/2018] [Revised: 01/11/2019] [Accepted: 01/11/2019] [Indexed: 06/09/2023]

Abstract

BACKGROUND AND OBJECTIVE

Finding combinations (i.e., pairs, or more generally, q-tuples with q ≥ 2) of genes whose behavior as a group differs significantly between two classes has received a lot of attention in the quest for the discovery of simple, accurate, and easily interpretable decision rules for disease classification and prediction. For example, the Top Scoring Pair (TSP) method seeks to find pairs of genes so that the probability of the reversal of the relative ranking of the expression levels of the genes in the two classes is maximized. The computational cost of finding a q-tuple of genes that scores highest under a given metric is O(G^q), where G is the total number of genes. This cost is often problematic or prohibitive in practice (even for q=2), as the number of genes G is often on the order of tens of thousands.
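
To make the cost argument concrete, the following Python sketch scores every gene pair exhaustively for q = 2. It is a hedged illustration rather than the authors' code: the score used here is the absolute difference between the two classes in the observed frequency of gene i being expressed below gene j, which is the usual way the TSP ranking-reversal score is estimated, and the toy data and function names are assumptions.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def tsp_scores(samples: Sequence[Sequence[float]],
               labels: Sequence[int]) -> Dict[Tuple[int, int], float]:
    """Score every gene pair; each sample is a list of expression values.

    Labels are assumed to be 0 or 1, with both classes present.
    """
    class0 = [s for s, y in zip(samples, labels) if y == 0]
    class1 = [s for s, y in zip(samples, labels) if y == 1]
    n_genes = len(samples[0])
    scores = {}
    # The loop visits G * (G - 1) / 2 pairs, which is the O(G^2) cost.
    for i, j in combinations(range(n_genes), 2):
        freq0 = sum(s[i] < s[j] for s in class0) / len(class0)
        freq1 = sum(s[i] < s[j] for s in class1) / len(class1)
        scores[(i, j)] = abs(freq0 - freq1)
    return scores


def top_scoring_pair(samples: Sequence[Sequence[float]],
                     labels: Sequence[int]) -> Tuple[int, int]:
    """Return the index pair with the highest score."""
    scores = tsp_scores(samples, labels)
    return max(scores, key=scores.get)


# Toy data: genes 0 and 1 swap order between the classes, gene 2 is constant.
toy_samples: List[List[float]] = [[1, 2, 5], [1, 3, 5], [3, 1, 5], [4, 2, 5]]
toy_labels = [0, 0, 1, 1]
print(top_scoring_pair(toy_samples, toy_labels))
```

With G in the tens of thousands the pair loop alone runs hundreds of millions of times, which is why reducing the number of candidate genes before the search has a large effect on run time.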

METHODS

In this paper, we show that this computational cost can be significantly reduced by excluding from consideration genes whose behavior is almost identical in the two classes and therefore their inclusion in any q-tuple is rather non-informative. Our criterion for the exclusion of genes is supported by a statistically robust metric, the Area Under the Curve (AUC) of the corresponding Receiver Operating Characteristic (ROC) curve. By filtering out genes whose AUC value is below a user-chosen threshold, as determined by a procedure that we describe in the paper, dramatic reductions in the run times are obtained while maintaining the same classification accuracy.
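
The filtering idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure from the paper: the per-gene AUC is computed by direct pairwise comparison (ties count one half), a gene is kept when max(AUC, 1 - AUC) reaches the threshold so that separation in either direction counts, and the threshold value and toy data are hypothetical.

```python
from typing import List, Sequence


def gene_auc(values0: Sequence[float], values1: Sequence[float]) -> float:
    """AUC of one gene: fraction of (class 1, class 0) pairs ranked correctly."""
    wins = 0.0
    for v1 in values1:
        for v0 in values0:
            if v1 > v0:
                wins += 1.0
            elif v1 == v0:
                wins += 0.5  # ties contribute one half
    return wins / (len(values0) * len(values1))


def filter_genes_by_auc(samples: Sequence[Sequence[float]],
                        labels: Sequence[int],
                        threshold: float) -> List[int]:
    """Return indices of genes that separate the two classes well enough.

    Assumption: an AUC near 0 is as informative as one near 1, so the
    filter uses max(auc, 1 - auc).
    """
    kept = []
    for g in range(len(samples[0])):
        values0 = [s[g] for s, y in zip(samples, labels) if y == 0]
        values1 = [s[g] for s, y in zip(samples, labels) if y == 1]
        auc = gene_auc(values0, values1)
        if max(auc, 1.0 - auc) >= threshold:
            kept.append(g)
    return kept


# Toy data: gene 2 is identical in both classes and should be dropped.
toy_samples = [[1, 2, 5], [1, 3, 5], [3, 1, 5], [4, 2, 5]]
toy_labels = [0, 0, 1, 1]
print(filter_genes_by_auc(toy_samples, toy_labels, threshold=0.8))
```

Only the surviving genes would then enter the pair or q-tuple search, so the search cost shrinks with the q-th power of the fraction of genes retained.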

RESULTS

We have experimentally verified the gains of this approach on several case studies involving ovarian, colon, leukemia, breast and prostate cancers, and diffuse large B-cell lymphoma.

CONCLUSIONS

The proposed method is not only faster (for example, we observed an average 78.65% reduction over the run time of TSP) while maintaining the same classification accuracy, but it can even result in better classification accuracy due to its inherent ability to avoid the so-called "pivot" (non-informative) genes that may intrude in q-tuples chosen otherwise.

Lin X, Huang X, Zhou L, Ren W, Zeng J, Yao W, Wang X. The Robust Classification Model Based on Combinatorial Features. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2019;16:650-657. [PMID: 29990202 DOI: 10.1109/tcbb.2017.2779512] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 06/08/2023]

Li M, Li H, Hong G, Tang Z, Liu G, Lin X, Lin M, Qi L, Guo Z. Identifying primary site of lung-limited Cancer of unknown primary based on relative gene expression orderings. BMC Cancer 2019;19:67. [PMID: 30642283 PMCID: PMC6332677 DOI: 10.1186/s12885-019-5274-4] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 10/24/2017] [Accepted: 01/03/2019] [Indexed: 01/11/2023]

Grzadkowski MR, Sendorek DH, P'ng C, Huang V, Boutros PC. A comparative study of survival models for breast cancer prognostication revisited: the benefits of multi-gene models. BMC Bioinformatics 2018;19:400. [PMID: 30390622 PMCID: PMC6215649 DOI: 10.1186/s12859-018-2430-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 02/05/2018] [Accepted: 10/10/2018] [Indexed: 01/01/2023]

Wu P, Wang D. Classification of a DNA Microarray for Diagnosing Cancer Using a Complex Network Based Method. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2018;16:801-808. [PMID: 30183642 DOI: 10.1109/tcbb.2018.2868341] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 06/08/2023]

Kerins MJ, Milligan J, Wohlschlegel JA, Ooi A. Fumarate hydratase inactivation in hereditary leiomyomatosis and renal cell cancer is synthetic lethal with ferroptosis induction. Cancer Sci 2018;109:2757-2766. [PMID: 29917289 PMCID: PMC6125459 DOI: 10.1111/cas.13701] [Citation(s) in RCA: 62] [Impact Index Per Article: 10.3] [Received: 01/22/2018] [Accepted: 06/17/2018] [Indexed: 12/31/2022]

Zheng YF, Lu X, Zhang XY, Guan BG. The landscape of DNA methylation in hepatocellular carcinoma. J Cell Physiol 2018;234:2631-2638. [PMID: 30145793 DOI: 10.1002/jcp.27077] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 02/28/2018] [Accepted: 06/28/2018] [Indexed: 12/24/2022]

Analyzing omics data by pair-wise feature evaluation with horizontal and vertical comparisons. J Pharm Biomed Anal 2018;157:20-26. [PMID: 29754039 DOI: 10.1016/j.jpba.2018.04.052] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 11/16/2017] [Revised: 04/29/2018] [Accepted: 04/30/2018] [Indexed: 11/24/2022]

Hsu TY, Lin JM, Nguyen MHT, Chung FH, Tsai CC, Cheng HH, Lai YJ, Hung HN, Chen CS. Antigen Analysis of Pre-Eclamptic Plasma Antibodies Using Escherichia Coli Proteome Chips. Mol Cell Proteomics 2018;17:1457-1469. [PMID: 29284593 PMCID: PMC6072543 DOI: 10.1074/mcp.ra117.000139] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 07/07/2017] [Revised: 12/13/2017] [Indexed: 12/19/2022]

Abstract

Pre-eclampsia is one of the main causes of perinatal mortality and morbidity. Many biomarkers for diagnosing pre-eclampsia have been found but most have low accuracy. Therefore, a potential marker that can detect pre-eclampsia with high accuracy is required. Infection has been reported as a cause of pre-eclampsia. In recent years, protein microarray chips have been recognized as a strong and robust tool for profiling antibodies for infection diagnoses. The purpose of the present study was to profile antibodies in the human plasma of healthy and pre-eclamptic pregnancies to identify suitable biomarkers. In this study, an Escherichia coli chip was probed with samples from 29 individuals (16 pre-eclamptic women and 13 healthy pregnant women) to profile plasma antibodies. Bioinformatics tools were used to analyze the results, discover conserved motifs, compare against the entire human proteome, and perform protein functional analysis. An antibody classifier was identified using k-top scoring pairs and additional samples for a blinded test were collected. The findings indicated that compared with the healthy women, the pre-eclamptic women exhibited 108 and 130 differentially immunogenic proteins against human immunoglobulins G and M, respectively. In addition, pre-eclamptic women developed more immunoglobulin G but less immunoglobulin M against bacterial surface proteins compared with healthy women. The k-top scoring pairs identified five pairs of immunogenic proteins as classifiers with a high accuracy of 90% in the blind test. [AG] [ISV] GV [AE] L [LF] and [IV] [IV] RI [AG] [AD] E were the consensus motifs observed in immunogenic proteins in the immunoglobulin G and immunoglobulin M of pre-eclamptic women, respectively, whereas GA [AG] [AL] L [LF] and [SRY] [IQML] [ILV] [ILV] [ACG] GI [GH] [AEF] [AK] [ATY] [RG] N [IV] were observed in the immunoglobulins G and immunoglobulin M of healthy women, respectively.

Affiliation(s)

Te-Yao Hsu: Department of Obstetrics and Gynecology, Kaohsiung Chang Gung Memorial Hospital and Chang Gung University College of Medicine, Kaohsiung, Taiwan
Jyun-Mu Lin: Graduate Institute of Systems Biology and Bioinformatics, National Central University, Jhongli 32001, Taiwan; Department of Biomedical Science and Engineering, National Central University, Jhongli 32001, Taiwan
Mai-Huong T Nguyen: Graduate Institute of Systems Biology and Bioinformatics, National Central University, Jhongli 32001, Taiwan; Department of Biomedical Science and Engineering, National Central University, Jhongli 32001, Taiwan
Feng-Hsiang Chung: Graduate Institute of Systems Biology and Bioinformatics, National Central University, Jhongli 32001, Taiwan; Department of Biomedical Science and Engineering, National Central University, Jhongli 32001, Taiwan
Ching-Chang Tsai: Department of Obstetrics and Gynecology, Kaohsiung Chang Gung Memorial Hospital and Chang Gung University College of Medicine, Kaohsiung, Taiwan
Hsin-Hsin Cheng: Department of Obstetrics and Gynecology, Kaohsiung Chang Gung Memorial Hospital and Chang Gung University College of Medicine, Kaohsiung, Taiwan
Yun-Ju Lai: Department of Obstetrics and Gynecology, Kaohsiung Chang Gung Memorial Hospital and Chang Gung University College of Medicine, Kaohsiung, Taiwan
Hsuan-Ning Hung: Department of Obstetrics and Gynecology, Kaohsiung Chang Gung Memorial Hospital and Chang Gung University College of Medicine, Kaohsiung, Taiwan
Chien-Sheng Chen: Graduate Institute of Systems Biology and Bioinformatics, National Central University, Jhongli 32001, Taiwan; Department of Biomedical Science and Engineering, National Central University, Jhongli 32001, Taiwan; Department of Food Safety/Hygiene and Risk Management, College of Medicine, National Cheng Kung University, Tainan City 704, Taiwan

Jiao Y, Vert JP. The Kendall and Mallows Kernels for Permutations. IEEE Transactions on Pattern Analysis and Machine Intelligence 2018;40:1755-1769. [PMID: 28981406 DOI: 10.1109/tpami.2017.2719680] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 06/07/2023]

AUCTSP: an improved biomarker gene pair class predictor. BMC Bioinformatics 2018;19:244. [PMID: 29940833 PMCID: PMC6020231 DOI: 10.1186/s12859-018-2231-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/07/2018] [Accepted: 06/04/2018] [Indexed: 11/10/2022] Open

Abstract

Background

The Top Scoring Pair (TSP) classifier, based on the concept of relative ranking reversals in the expressions of pairs of genes, has been proposed as a simple, accurate, and easily interpretable decision rule for classification and class prediction of gene expression profiles. The idea that differences in gene expression ranking are associated with presence or absence of disease is compelling and has strong biological plausibility. Nevertheless, the TSP formulation ignores significant available information which can improve classification accuracy and is vulnerable to selecting genes which do not have differential expression in the two conditions (“pivot” genes).

Results

We introduce the AUCTSP classifier as an alternative rank-based estimator of the magnitude of the ranking reversals involved in the original TSP. The proposed estimator is based on the Area Under the Receiver Operating Characteristic (ROC) Curve (AUC) and as such, takes into account the separation of the entire distribution of gene expression levels in gene pairs under the conditions considered, as opposed to comparing gene rankings within individual subjects as in the original TSP formulation. Through extensive simulations and case studies involving classification in ovarian, leukemia, colon, breast and prostate cancers and diffuse large B-cell lymphoma, we show the superiority of the proposed approach in terms of improving classification accuracy, avoiding overfitting and being less prone to selecting non-informative (pivot) genes.

Conclusions

The proposed AUCTSP is a simple yet reliable and robust rank-based classifier for gene expression classification. While the AUCTSP works by the same principle as TSP, its ability to determine the top scoring gene pair based on the relative rankings of two marker genes across all subjects as opposed to each individual subject results in significant performance gains in classification accuracy. In addition, the proposed method tends to avoid selection of non-informative (pivot) genes as members of the top-scoring pair.

Electronic supplementary material

The online version of this article (10.1186/s12859-018-2231-1) contains supplementary material, which is available to authorized users.
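
The within-subject ranking reversal that the original TSP classifier scores can be sketched as follows. This is a minimal illustration of the TSP score only, |P(x_i < x_j | class 0) − P(x_i < x_j | class 1)|; it does not implement the AUC-based estimator of AUCTSP, which the abstract does not specify in detail. The function names and the toy data are hypothetical.

```python
from itertools import combinations


def tsp_scores(samples, labels):
    """Score each gene pair by how strongly its within-sample ordering
    differs between the two classes.

    samples: list of expression profiles (one list of floats per subject).
    labels:  list of 0/1 class labels, one per subject.
    Returns {(i, j): |P(x_i < x_j | class 0) - P(x_i < x_j | class 1)|}.
    """
    n_genes = len(samples[0])
    by_class = {0: [], 1: []}
    for profile, label in zip(samples, labels):
        by_class[label].append(profile)

    scores = {}
    for i, j in combinations(range(n_genes), 2):
        freq = []
        for c in (0, 1):
            group = by_class[c]
            # Fraction of subjects in class c where gene i is below gene j.
            freq.append(sum(p[i] < p[j] for p in group) / len(group))
        scores[(i, j)] = abs(freq[0] - freq[1])
    return scores


def top_scoring_pair(samples, labels):
    """Return the gene pair with the largest score (first one on ties)."""
    scores = tsp_scores(samples, labels)
    return max(scores, key=scores.get)


# Toy data (hypothetical): gene 0 < gene 1 in class 0 and reversed in class 1;
# gene 2 varies widely but carries no class information.
samples = [
    [1.0, 2.0, 5.0],
    [0.5, 3.0, 0.1],
    [4.0, 1.0, 5.0],
    [6.0, 2.0, 0.1],
]
labels = [0, 0, 1, 1]
best = top_scoring_pair(samples, labels)
```

Because the score depends only on orderings within each subject, it is unchanged by any monotone rescaling applied to a whole profile.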

Digitizing omics profiles by divergence from a baseline. Proc Natl Acad Sci U S A 2018;115:4545-4552. [PMID: 29666255 PMCID: PMC5939095 DOI: 10.1073/pnas.1721628115] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/23/2022] Open

Abstract

Technological advances enable increasingly comprehensive profiling of the molecular landscapes of cells, and these data can inform the personalized treatment of complex diseases. Two major obstacles are the complexity of these data and the high degree of person-to-person heterogeneity. We develop a highly simplified, personalized data representation by comparing the profile of an individual to the range of landscapes in a baseline population, thereby mimicking basic clinical diagnostic testing for departures of selected variables from normal levels. Moreover, our method can be applied to any data modality and at any level of granularity, from single features to any subset of features treated as a single entity, for example the gene expression levels in a pathway. Experiments involve both healthy human tissues and various cancer subtypes.

Data collected from omics technologies have revealed pervasive heterogeneity and stochasticity of molecular states within and between phenotypes. A prominent example of such heterogeneity occurs between genome-wide mRNA, microRNA, and methylation profiles from one individual tumor to another, even within a cancer subtype. However, current methods in bioinformatics, such as detecting differentially expressed genes or CpG sites, are population-based and therefore do not effectively model intersample diversity. Here we introduce a unified theory to quantify sample-level heterogeneity that is applicable to a single omics profile. Specifically, we simplify an omics profile to a digital representation based on the omics profiles from a set of samples from a reference or baseline population (e.g., normal tissues). The state of any subprofile (e.g., expression vector for a subset of genes) is said to be “divergent” if it lies outside the estimated support of the baseline distribution and is consequently interpreted as “dysregulated” relative to that baseline. We focus on two cases: single features (e.g., individual genes) and distinguished subsets (e.g., regulatory pathways). Notably, since the divergence analysis is at the individual sample level, dysregulation can be analyzed probabilistically; for example, one can estimate the probability that a gene or pathway is divergent in some population. Finally, the reduction in complexity facilitates a more “personalized” and biologically interpretable analysis of variation, as illustrated by experiments involving tissue characterization, disease detection and progression, and disease–pathway associations.
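
For a single feature, the idea of flagging a value as divergent when it falls outside the baseline range can be sketched as below. This is only an illustration under stated assumptions: the baseline support is taken to be the observed minimum and maximum, and the -1/0/+1 coding is one possible digital representation; the abstract does not say how the support is estimated or how the codes are defined. All names and values are hypothetical.

```python
def baseline_support(baseline_values):
    """Assumed support estimate: the observed range of the baseline samples."""
    return min(baseline_values), max(baseline_values)


def digitize(value, support):
    """Code a value as -1 (below baseline), 0 (within baseline), +1 (above)."""
    low, high = support
    if value < low:
        return -1
    if value > high:
        return 1
    return 0


def divergence_probability(values, support):
    """Fraction of samples in a population whose value is divergent."""
    codes = [digitize(v, support) for v in values]
    return sum(c != 0 for c in codes) / len(codes)


# Hypothetical expression levels of one gene.
baseline = [4.8, 5.1, 5.0, 5.3, 4.9]   # e.g. normal tissue
tumors = [5.2, 7.4, 3.9, 5.0]          # e.g. one cancer subtype

support = baseline_support(baseline)
codes = [digitize(v, support) for v in tumors]
prob = divergence_probability(tumors, support)
```

Each sample is coded on its own against the baseline, which is what allows the per-population probability of divergence to be estimated as a simple frequency.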

Circumvent the uncertainty in the applications of transcriptional signatures to tumor tissues sampled from different tumor sites. Oncotarget 2018;8:30265-30275. [PMID: 28427173 PMCID: PMC5444741 DOI: 10.18632/oncotarget.15754] [Citation(s) in RCA: 37] [Impact Index Per Article: 6.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/25/2016] [Accepted: 01/30/2017] [Indexed: 11/25/2022] Open

Chen R, Guan Q, Cheng J, He J, Liu H, Cai H, Hong G, Zhang J, Li N, Ao L, Guo Z. Robust transcriptional tumor signatures applicable to both formalin-fixed paraffin-embedded and fresh-frozen samples. Oncotarget 2018;8:6652-6662. [PMID: 28036264 PMCID: PMC5351660 DOI: 10.18632/oncotarget.14257] [Citation(s) in RCA: 49] [Impact Index Per Article: 8.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/27/2016] [Accepted: 12/02/2016] [Indexed: 12/19/2022] Open

Pitroda SP, Stack ME, Liu GF, Song SS, Chen L, Liang H, Parekh AD, Huang X, Roach P, Posner MC, Weichselbaum RR, Khodarev NN. JAK2 Inhibitor SAR302503 Abrogates PD-L1 Expression and Targets Therapy-Resistant Non–small Cell Lung Cancers. Mol Cancer Ther 2018;17:732-739. [DOI: 10.1158/1535-7163.mct-17-0667] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/12/2017] [Revised: 11/27/2017] [Accepted: 01/17/2018] [Indexed: 11/16/2022]

Guan Q, Yan H, Chen Y, Zheng B, Cai H, He J, Song K, Guo Y, Ao L, Liu H, Zhao W, Wang X, Guo Z. Quantitative or qualitative transcriptional diagnostic signatures? A case study for colorectal cancer. BMC Genomics 2018;19:99. [PMID: 29378509 PMCID: PMC5789529 DOI: 10.1186/s12864-018-4446-y] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/26/2017] [Accepted: 01/11/2018] [Indexed: 12/20/2022] Open

Abstract

Background

Due to experimental batch effects, the application of a quantitative transcriptional signature for disease diagnosis commonly requires inter-sample data normalization, which is hardly applicable in common clinical settings. Many cancers might differ qualitatively from non-cancer states in their gene expression patterns. Therefore, it is reasonable to explore the power of qualitative diagnostic signatures, which are robust against experimental batch effects and other random factors.

Results

Firstly, using data from technical replicate samples from the MicroArray Quality Control (MAQC) project, we demonstrated that low-throughput PCR-based technologies also exhibit large measurement variations in gene expression, even when the samples were measured at the same test site. Then, we demonstrated the critical limitation of low stability for classifiers based on quantitative transcriptional signatures in applications to individual samples through a case study using a support vector machine and a naïve Bayesian classifier to discriminate colorectal cancer tissues from normal tissues. To address this problem, we identified a signature consisting of three gene pairs for discriminating colorectal cancer tissues from non-cancer (normal and inflammatory bowel disease) tissues based on within-sample relative expression orderings (REOs) of these gene pairs. The signature was well verified using 22 independent datasets measured by different microarray and RNA-seq platforms, obviating the need for inter-sample data normalization.

Conclusions

Subtle quantitative information of gene expression measurements tends to be unstable under current technical conditions, which will introduce uncertainty to clinical applications of the quantitative transcriptional diagnostic signatures. For diagnosis of disease states with qualitative transcriptional characteristics, the qualitative REO-based signatures could be robustly applied to individual samples measured by different platforms.

Electronic supplementary material

The online version of this article (10.1186/s12864-018-4446-y) contains supplementary material, which is available to authorized users.
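
How a signature built from within-sample relative expression orderings can be applied to a single sample without normalization can be sketched as follows. The majority-vote rule, the gene names, and the expression values are assumptions made for illustration; the abstract states only that the signature consists of three gene pairs and relies on their REOs.

```python
import math


def reo_vote(profile, gene_pairs):
    """Call a sample positive when most pairs (a, b) satisfy
    expression(a) > expression(b) within that sample."""
    votes = sum(profile[a] > profile[b] for a, b in gene_pairs)
    return votes > len(gene_pairs) / 2


# Hypothetical three-pair signature and one hypothetical sample.
gene_pairs = [("G1", "G2"), ("G3", "G4"), ("G5", "G6")]
sample = {"G1": 9.0, "G2": 4.0, "G3": 7.5, "G4": 2.0, "G5": 1.0, "G6": 3.0}

# The same sample after a monotone rescaling, standing in for a
# platform or batch effect that preserves within-sample ranks.
rescaled = {gene: math.log2(value) + 2.0 for gene, value in sample.items()}

call_raw = reo_vote(sample, gene_pairs)
call_rescaled = reo_vote(rescaled, gene_pairs)
```

Both calls agree because the rule uses only the orderings inside one sample, so any effect that preserves those orderings leaves the decision unchanged.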

Affiliation(s)

Qingzhou Guan Fujian Key Laboratory of Medical Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, Fujian Medical University, Fuzhou, 350122, China
Haidan Yan Fujian Key Laboratory of Medical Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, Fujian Medical University, Fuzhou, 350122, China
Yanhua Chen Fujian Key Laboratory of Medical Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, Fujian Medical University, Fuzhou, 350122, China
Baotong Zheng Fujian Key Laboratory of Medical Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, Fujian Medical University, Fuzhou, 350122, China
Hao Cai Fujian Key Laboratory of Medical Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, Fujian Medical University, Fuzhou, 350122, China
Jun He Fujian Key Laboratory of Medical Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, Fujian Medical University, Fuzhou, 350122, China
Kai Song College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150086, China
You Guo Fujian Key Laboratory of Medical Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, Fujian Medical University, Fuzhou, 350122, China.,Department of Preventive Medicine, School of Basic Medicine Sciences, Gannan Medical University, Ganzhou, 341000, China
Lu Ao Fujian Key Laboratory of Medical Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, Fujian Medical University, Fuzhou, 350122, China
Huaping Liu Fujian Key Laboratory of Medical Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, Fujian Medical University, Fuzhou, 350122, China
Wenyuan Zhao College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150086, China
Xianlong Wang Fujian Key Laboratory of Medical Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, Fujian Medical University, Fuzhou, 350122, China
Zheng Guo Fujian Key Laboratory of Medical Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, Fujian Medical University, Fuzhou, 350122, China.,Fujian Key Laboratory of Tumor Microbiology, Fujian Medical University, Fuzhou, 350122, China.,College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150086, China

Sheikhpour R, Sarram MA, Chahooki MAZ, Sheikhpour R. A kernelized non-parametric classifier based on feature ranking in anisotropic Gaussian kernel. Neurocomputing 2017. [DOI: 10.1016/j.neucom.2017.06.035] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/25/2022]

Li B, Cui Y, Diehn M, Li R. Development and Validation of an Individualized Immune Prognostic Signature in Early-Stage Nonsquamous Non-Small Cell Lung Cancer. JAMA Oncol 2017;3:1529-1537. [PMID: 28687838 DOI: 10.1001/jamaoncol.2017.1609] [Citation(s) in RCA: 287] [Impact Index Per Article: 41.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/19/2022]

Abstract

Importance

The prevalence of early-stage non-small cell lung cancer (NSCLC) is expected to increase with recent implementation of annual screening programs. Reliable prognostic biomarkers are needed to identify patients at a high risk for recurrence to guide adjuvant therapy.

Objective

To develop a robust, individualized immune signature that can estimate prognosis in patients with early-stage nonsquamous NSCLC.

Design, Setting, and Participants

This retrospective study analyzed the gene expression profiles of frozen tumor tissue samples from 19 public NSCLC cohorts, including 18 microarray data sets and 1 RNA-Seq data set for The Cancer Genome Atlas (TCGA) lung adenocarcinoma cohort. Only patients with nonsquamous NSCLC with clinical annotation were included. Samples were from 2414 patients with nonsquamous NSCLC, divided into a meta-training cohort (729 patients), meta-testing cohort (716 patients), and 3 independent validation cohorts (439, 323, and 207 patients). All patients underwent surgery with a negative surgical margin, received no adjuvant or neoadjuvant therapy, and had publicly available gene expression data and survival information. Data were collected from July 22 through September 8, 2016.

Main Outcomes and Measures

Overall survival.

Results

Of 2414 patients (1205 men [50%], 1111 women [46%], and 98 of unknown sex [4%]; median age [range], 64 [15-90] years), a prognostic immune signature of 25 gene pairs consisting of 40 unique genes was constructed using the meta-training data set. In the meta-testing and validation cohorts, the immune signature significantly stratified patients into high- vs low-risk groups in terms of overall survival across and within subpopulations with stage I, IA, IB, or II disease and remained as an independent prognostic factor in multivariate analyses (hazard ratio range, 1.72 [95% CI, 1.26-2.33; P < .001] to 2.36 [95% CI, 1.47-3.79; P < .001]) after adjusting for clinical and pathologic factors. Several biological processes, including chemotaxis, were enriched among genes in the immune signature. The percentage of neutrophil infiltration (5.6% vs 1.8%) and necrosis (4.6% vs 1.5%) was significantly higher in the high-risk immune group compared with the low-risk groups in TCGA data set (P < .003). The immune signature achieved a higher accuracy (mean concordance index [C-index], 0.64) than 2 commercialized multigene signatures (mean C-index, 0.53 and 0.61) for estimation of survival in comparable validation cohorts. When integrated with clinical characteristics such as age and stage, the composite clinical and immune signature showed improved prognostic accuracy in all validation data sets relative to molecular signatures alone (mean C-index, 0.70 vs 0.63) and another commercialized clinical-molecular signature (mean C-index, 0.68 vs 0.65).

Conclusions and Relevance

The proposed clinical-immune signature is a promising biomarker for estimating overall survival in nonsquamous NSCLC, including early-stage disease. Prospective studies are needed to test the clinical utility of the biomarker in individualized management of nonsquamous NSCLC.
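
The concordance index used above to compare signatures can be illustrated with a small sketch. This assumes Harrell's estimator for right-censored survival data; the abstract does not state which estimator was used, and the data below are hypothetical.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index (assumed estimator).

    times:  follow-up time per patient.
    events: 1 if death was observed, 0 if the patient was censored.
    risks:  predicted risk score (higher means worse expected survival).
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    return concordant / comparable


# Hypothetical cohort of four patients; the third is censored.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.6, 0.1]
c_index = concordance_index(times, events, risks)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk, which is the scale on which the reported values of 0.53 to 0.70 should be read.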

Liu H, Li Y, He J, Guan Q, Chen R, Yan H, Zheng W, Song K, Cai H, Guo Y, Wang X, Guo Z. Robust transcriptional signatures for low-input RNA samples based on relative expression orderings. BMC Genomics 2017;18:913. [PMID: 29179677 PMCID: PMC5704640 DOI: 10.1186/s12864-017-4280-7] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/02/2017] [Accepted: 11/03/2017] [Indexed: 11/18/2022] Open

Abstract

Background

It is often difficult to obtain a sufficient quantity of RNA for gene expression profiling in many practical situations. Amplification from low-input samples may induce artificial signals.

Results

We compared the expression measurements of low-input mRNA samples, from 25 pg to 1000 pg mRNA, which were amplified and profiled by Smart-seq, DP-seq and CEL-seq techniques using the Illumina HiSeq 2000 platform, with those of the paired high-input (50 ng) mRNA samples. Even with 1000 pg mRNA input, we found that thousands of genes had at least a 2-fold change in expression level in the low-input samples compared with the corresponding paired high-input samples. Consequently, a transcriptional signature based on quantitative expression values and determined from high-input RNA samples cannot be applied to low-input samples, and vice versa. In contrast, the within-sample relative expression orderings (REOs) of approximately 90% of all the gene pairs in the high-input samples were maintained in the paired low-input samples with 1000 pg input mRNA molecules. Similar results were observed in the low-input total RNA samples amplified and profiled by the Whole-Genome DASL technique using the Illumina HumanRef-8 v3.0 platform. As a proof of principle, we developed REOs-based signatures from high-input RNA samples for discriminating cancer tissues and showed that they can be robustly applied to low-input RNA samples.

Conclusions

REOs-based signatures determined from high-input RNA samples can be robustly applied to low-input RNA samples with inputs as low as 1000 pg and 250 pg, but they become less stable, to a certain degree, in samples with less than 250 pg of RNA input.

Electronic supplementary material

The online version of this article (10.1186/s12864-017-4280-7) contains supplementary material, which is available to authorized users.
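
The proportion of relative expression orderings maintained between a high-input sample and its paired low-input sample can be computed along the following lines. This is a minimal sketch: skipping tied pairs is an assumption, and the two profiles are hypothetical values chosen so that one of ten pairs flips.

```python
from itertools import combinations


def reo_concordance(profile_a, profile_b):
    """Fraction of gene pairs with the same ordering in both profiles.

    Pairs tied in either profile are skipped (an assumed convention).
    """
    same = 0
    total = 0
    for i, j in combinations(range(len(profile_a)), 2):
        diff_a = profile_a[i] - profile_a[j]
        diff_b = profile_b[i] - profile_b[j]
        if diff_a == 0 or diff_b == 0:
            continue
        total += 1
        same += (diff_a > 0) == (diff_b > 0)
    return same / total if total else float("nan")


# Hypothetical paired profiles over five genes. Several values change
# by more than 2-fold, yet only the ordering of genes 1 and 2 flips.
high_input = [10.0, 8.0, 6.0, 4.0, 2.0]
low_input = [30.0, 12.0, 14.0, 5.0, 1.0]
kept = reo_concordance(high_input, low_input)
```

The toy example mirrors the contrast drawn in the abstract: quantitative levels can shift substantially while most pairwise orderings are preserved.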

Affiliation(s)

Huaping Liu Department of Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, School of Basic Medical Sciences, Fujian Medical University, Fuzhou, 350122, China.,Department of Systems Biology, College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150086, China
Yawei Li Department of Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, School of Basic Medical Sciences, Fujian Medical University, Fuzhou, 350122, China
Jun He Department of Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, School of Basic Medical Sciences, Fujian Medical University, Fuzhou, 350122, China
Qingzhou Guan Department of Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, School of Basic Medical Sciences, Fujian Medical University, Fuzhou, 350122, China
Rou Chen Department of Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, School of Basic Medical Sciences, Fujian Medical University, Fuzhou, 350122, China
Haidan Yan Department of Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, School of Basic Medical Sciences, Fujian Medical University, Fuzhou, 350122, China
Weicheng Zheng Department of Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, School of Basic Medical Sciences, Fujian Medical University, Fuzhou, 350122, China
Kai Song Department of Systems Biology, College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150086, China
Hao Cai Department of Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, School of Basic Medical Sciences, Fujian Medical University, Fuzhou, 350122, China
You Guo Department of Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, School of Basic Medical Sciences, Fujian Medical University, Fuzhou, 350122, China
Xianlong Wang Department of Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, School of Basic Medical Sciences, Fujian Medical University, Fuzhou, 350122, China
Zheng Guo Department of Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, School of Basic Medical Sciences, Fujian Medical University, Fuzhou, 350122, China.,Fujian Key Laboratory of Tumor Microbiology, Fujian Medical University, Fuzhou, 350122, China.,Department of Systems Biology, College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150086, China.,Key Laboratory of Medical Bioinformatics, Fujian Province, China

Identification of gene pairs through penalized regression subject to constraints. BMC Bioinformatics 2017;18:466. [PMID: 29100492 PMCID: PMC5670721 DOI: 10.1186/s12859-017-1872-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/21/2017] [Accepted: 10/17/2017] [Indexed: 02/07/2023] Open

Ma T, Song C, Tseng GC. Discussant paper on ‘Statistical contributions to bioinformatics: Design, modelling, structure learning and integration’. STAT MODEL 2017. [DOI: 10.1177/1471082x17705992] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]

Salekin S, Bari MG, Raphael I, Forsthuber TG, Zhang JM. Early response index: a statistic to discover potential early stage disease biomarkers. BMC Bioinformatics 2017. [PMID: 28645323 PMCID: PMC5481992 DOI: 10.1186/s12859-017-1712-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/04/2023] Open

Liu Z, Lou H, Xie K, Wang H, Chen N, Aparicio OM, Zhang MQ, Jiang R, Chen T. Reconstructing cell cycle pseudo time-series via single-cell transcriptome data. Nat Commun 2017. [PMID: 28630425 PMCID: PMC5476636 DOI: 10.1038/s41467-017-00039-z] [Citation(s) in RCA: 66] [Impact Index Per Article: 9.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/24/2022] Open

Affiliation(s)

Zehua Liu MOE Key Laboratory of Bioinformatics, Bioinformatics Division and Center for Synthetic & Systems Biology, TNLIST, Department of Automation, Tsinghua University, Beijing, 100084, China
Huazhe Lou MOE Key Laboratory of Bioinformatics, Bioinformatics Division and Center for Synthetic & Systems Biology, TNLIST, Department of Computer Sciences, State Key Lab of Intelligent Technology and Systems, Tsinghua University, Beijing, 100084, China
Kaikun Xie MOE Key Laboratory of Bioinformatics, Bioinformatics Division and Center for Synthetic & Systems Biology, TNLIST, Department of Computer Sciences, State Key Lab of Intelligent Technology and Systems, Tsinghua University, Beijing, 100084, China
Hao Wang MOE Key Laboratory of Bioinformatics, Bioinformatics Division and Center for Synthetic & Systems Biology, TNLIST, Department of Computer Sciences, State Key Lab of Intelligent Technology and Systems, Tsinghua University, Beijing, 100084, China
Ning Chen MOE Key Laboratory of Bioinformatics, Bioinformatics Division and Center for Synthetic & Systems Biology, TNLIST, Department of Computer Sciences, State Key Lab of Intelligent Technology and Systems, Tsinghua University, Beijing, 100084, China
Oscar M Aparicio Program in Computational Biology and Bioinformatics, University of Southern California, Los Angeles, CA, 90089, USA
Michael Q Zhang MOE Key Laboratory of Bioinformatics, Bioinformatics Division and Center for Synthetic & Systems Biology, TNLIST, Department of Automation, Tsinghua University, Beijing, 100084, China.,Department of Molecular and Cell Biology, Center for Systems Biology, University of Texas at Dallas, 800 West Campbell Road, RL11, Richardson, TX, 75080-3021, USA
Rui Jiang MOE Key Laboratory of Bioinformatics, Bioinformatics Division and Center for Synthetic & Systems Biology, TNLIST, Department of Automation, Tsinghua University, Beijing, 100084, China
Ting Chen MOE Key Laboratory of Bioinformatics, Bioinformatics Division and Center for Synthetic & Systems Biology, TNLIST, Department of Computer Sciences, State Key Lab of Intelligent Technology and Systems, Tsinghua University, Beijing, 100084, China.,Program in Computational Biology and Bioinformatics, University of Southern California, Los Angeles, CA, 90089, USA

Isella C, Brundu F, Bellomo SE, Galimi F, Zanella E, Porporato R, Petti C, Fiori A, Orzan F, Senetta R, Boccaccio C, Ficarra E, Marchionni L, Trusolino L, Medico E, Bertotti A. Selective analysis of cancer-cell intrinsic transcriptional traits defines novel clinically relevant subtypes of colorectal cancer. Nat Commun 2017;8:15107. [PMID: 28561063 PMCID: PMC5499209 DOI: 10.1038/ncomms15107] [Citation(s) in RCA: 194] [Impact Index Per Article: 27.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/28/2016] [Accepted: 03/01/2017] [Indexed: 12/15/2022] Open

Affiliation(s)

Claudio Isella Department of Oncology, University of Torino School of Medicine, 10060 Candiolo Torino, Italy.,Candiolo Cancer Institute-FPO IRCCS, 10060 Candiolo Torino, Italy
Francesco Brundu Department of Control and Computer Engineering, Torino School of Engineering, 10129 Torino, Italy
Sara E Bellomo Department of Oncology, University of Torino School of Medicine, 10060 Candiolo Torino, Italy.,Candiolo Cancer Institute-FPO IRCCS, 10060 Candiolo Torino, Italy
Francesco Galimi Department of Oncology, University of Torino School of Medicine, 10060 Candiolo Torino, Italy.,Candiolo Cancer Institute-FPO IRCCS, 10060 Candiolo Torino, Italy
Eugenia Zanella Candiolo Cancer Institute-FPO IRCCS, 10060 Candiolo Torino, Italy
Roberta Porporato Candiolo Cancer Institute-FPO IRCCS, 10060 Candiolo Torino, Italy
Consalvo Petti Department of Oncology, University of Torino School of Medicine, 10060 Candiolo Torino, Italy.,Candiolo Cancer Institute-FPO IRCCS, 10060 Candiolo Torino, Italy
Alessandro Fiori Candiolo Cancer Institute-FPO IRCCS, 10060 Candiolo Torino, Italy
Francesca Orzan Department of Oncology, University of Torino School of Medicine, 10060 Candiolo Torino, Italy.,Candiolo Cancer Institute-FPO IRCCS, 10060 Candiolo Torino, Italy
Rebecca Senetta Candiolo Cancer Institute-FPO IRCCS, 10060 Candiolo Torino, Italy.,Department of Medical Sciences, University of Torino School of Medicine, 10060 Candiolo Torino, Italy
Carla Boccaccio Department of Oncology, University of Torino School of Medicine, 10060 Candiolo Torino, Italy.,Candiolo Cancer Institute-FPO IRCCS, 10060 Candiolo Torino, Italy
Elisa Ficarra Department of Control and Computer Engineering, Torino School of Engineering, 10129 Torino, Italy
Luigi Marchionni Department of Oncology, Johns Hopkins University, Baltimore, 21287 Maryland, USA
Livio Trusolino Department of Oncology, University of Torino School of Medicine, 10060 Candiolo Torino, Italy.,Candiolo Cancer Institute-FPO IRCCS, 10060 Candiolo Torino, Italy
Enzo Medico Department of Oncology, University of Torino School of Medicine, 10060 Candiolo Torino, Italy.,Candiolo Cancer Institute-FPO IRCCS, 10060 Candiolo Torino, Italy
Andrea Bertotti Department of Oncology, University of Torino School of Medicine, 10060 Candiolo Torino, Italy.,Candiolo Cancer Institute-FPO IRCCS, 10060 Candiolo Torino, Italy.,National Institute of Biostructures and Biosystems, INBB, 00136 Rome, Italy

Hornung R, Causeur D, Bernau C, Boulesteix AL. Improving cross-study prediction through addon batch effect adjustment or addon normalization. Bioinformatics 2017;33:397-404. [PMID: 27797760 DOI: 10.1093/bioinformatics/btw650] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/03/2016] [Accepted: 10/11/2016] [Indexed: 12/22/2022] Open

Hong G, Li H, Zhang J, Guan Q, Chen R, Guo Z. Identifying disease-associated pathways in one-phenotype data based on reversal gene expression orderings. Sci Rep 2017;7:1348. [PMID: 28465555 PMCID: PMC5431047 DOI: 10.1038/s41598-017-01536-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/26/2016] [Accepted: 03/30/2017] [Indexed: 12/31/2022] Open

Wang S, Wei J, Yang Z. Discrimination Structure Complementarity-Based Feature Selection. Comput Intell 2017. [DOI: 10.1111/coin.12118] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]

AVC: Selecting discriminative features on basis of AUC by maximizing variable complementarity. BMC Bioinformatics 2017;18:50. [PMID: 28361689 PMCID: PMC5374660 DOI: 10.1186/s12859-017-1468-4] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2022] Open

Feature selection based on measurement of ability to classify subproblems. Neurocomputing 2017. [DOI: 10.1016/j.neucom.2016.10.062] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/17/2022]

Kernel-based learning and feature selection analysis for cancer diagnosis. Appl Soft Comput 2017. [DOI: 10.1016/j.asoc.2016.12.010] [Citation(s) in RCA: 63] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]

Bari MG, Salekin S, Zhang JM. A Robust and Efficient Feature Selection Algorithm for Microarray Data. Mol Inform 2016;36. [PMID: 28000384 DOI: 10.1002/minf.201600099] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/14/2016] [Accepted: 11/21/2016] [Indexed: 12/20/2022]

Ganesh Kumar P, Kavitha MS, Ahn BC. Automated Detection of Cancer Associated Genes Using a Combined Fuzzy-Rough-Set-Based F-Information and Water Swirl Algorithm of Human Gene Expression Data. PLoS One 2016;11:e0167504. [PMID: 27936033 PMCID: PMC5148587 DOI: 10.1371/journal.pone.0167504] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/22/2016] [Accepted: 11/15/2016] [Indexed: 11/22/2022] Open

Abstract

This study describes a novel approach to reducing the challenges of highly nonlinear multiclass gene expression values for cancer diagnosis. To build a fruitful system for cancer diagnosis, in this study, we introduced two levels of gene selection such as filtering and embedding for selection of potential genes and the most relevant genes associated with cancer, respectively. The filter procedure was implemented by developing a fuzzy rough set (FR)-based method for redefining the criterion function of f-information (FI) to identify the potential genes without discretizing the continuous gene expression values. The embedded procedure is implemented by means of a water swirl algorithm (WSA), which attempts to optimize the rule set and membership function required to classify samples using a fuzzy-rule-based multiclassification system (FRBMS). Two novel update equations are proposed in WSA, which have better exploration and exploitation abilities while designing a self-learning FRBMS. The efficiency of our new approach was evaluated on 13 multicategory and 9 binary datasets of cancer gene expression. Additionally, the performance of the proposed FRFI-WSA method in designing an FRBMS was compared with existing methods for gene selection and optimization such as genetic algorithm (GA), particle swarm optimization (PSO), and artificial bee colony algorithm (ABC) on all the datasets. In the global cancer map with repeated measurements (GCM_RM) dataset, the FRFI-WSA showed the smallest number of 16 most relevant genes associated with cancer using a minimal number of 26 compact rules with the highest classification accuracy (96.45%). In addition, the statistical validation used in this study revealed that the biological relevance of the most relevant genes associated with cancer and their linguistics detected by the proposed FRFI-WSA approach are better than those in the other methods. 
The simple interpretable rules with most relevant genes and effectively classified samples suggest that the proposed FRFI-WSA approach is reliable for classification of an individual’s cancer gene expression data with high precision and therefore it could be helpful for clinicians as a clinical decision support system.
