1. Sokołowski H, Czajkowski M, Czajkowska A, Jurczuk K, Kretowski M. ITree: a user-driven tool for interactive decision-making with classification trees. Bioinformatics (Oxford, England) 2024; 40:btae273. [PMID: 38640482 DOI: 10.1093/bioinformatics/btae273] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Revised: 03/16/2024] [Accepted: 04/17/2024] [Indexed: 04/21/2024]
Abstract
MOTIVATION: ITree is an intuitive web tool for the manual, semi-automatic, and automatic induction of decision trees. It enables interactive modification of tree structures and incorporates Relative Expression Analysis for detecting complex patterns in high-throughput molecular data. This makes ITree a versatile tool for both research and education in biomedical data analysis. RESULTS: The tool lets users see the effects of modifications on decision trees instantly, with predictions and statistics updated in real time, which supports a deeper understanding of data classification processes. AVAILABILITY AND IMPLEMENTATION: Available online at https://itree.wi.pb.edu.pl. Source code and documentation are hosted on GitHub at https://github.com/hsokolowski/iTree and provided in the supplement.
Affiliation(s)
- Hubert Sokołowski
  - Faculty of Computer Science, Bialystok University of Technology, Bialystok 15-351, Poland
- Marcin Czajkowski
  - Faculty of Computer Science, Bialystok University of Technology, Bialystok 15-351, Poland
- Anna Czajkowska
  - Department of Medical Biology, Medical University of Bialystok, Bialystok 15-089, Poland
- Krzysztof Jurczuk
  - Faculty of Computer Science, Bialystok University of Technology, Bialystok 15-351, Poland
- Marek Kretowski
  - Faculty of Computer Science, Bialystok University of Technology, Bialystok 15-351, Poland
2. Zhang ZY, Sun ZJ, Gao D, Hao YD, Lin H, Liu F. Excavation of gene markers associated with pancreatic ductal adenocarcinoma based on interrelationships of gene expression. IET Syst Biol 2024. [PMID: 38530028 DOI: 10.1049/syb2.12090] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2023] [Revised: 02/06/2024] [Accepted: 03/10/2024] [Indexed: 03/27/2024] Open
Abstract
Pancreatic ductal adenocarcinoma (PDAC) accounts for 95% of all pancreatic cancer cases and poses grave challenges for diagnosis and treatment. Timely diagnosis is pivotal for improving patient survival, which calls for the discovery of precise biomarkers. We introduce an approach to identify gene markers for precise PDAC detection. The core idea is to discover gene pairs that display consistently opposite relative expression and differential co-expression patterns between PDAC and normal samples. Reversal gene pair analysis and differential partial correlation analysis were performed to determine reversal differential partial correlation (RDC) gene pairs. Using incremental feature selection, we refined the selected gene set and constructed a machine-learning model for PDAC recognition. The approach identified 10 RDC gene pairs, and the model achieved an accuracy of 96.1% during cross-validation, surpassing gene expression-based models. An experiment on independent validation data confirmed the model's performance. Enrichment analysis revealed the involvement of these genes in essential biological processes and shed light on their potential roles in PDAC pathogenesis. Overall, the findings highlight the potential of these 10 RDC gene pairs as effective diagnostic markers for early PDAC detection, bringing hope for improving patient prognosis and survival.
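The incremental feature selection step mentioned in this abstract can be illustrated with a minimal sketch. It assumes the features (here, gene pairs) are already ranked and that an evaluation function returns a score such as cross-validated accuracy; the names `incremental_feature_selection`, `toy_accuracy`, and `pair_A` to `pair_E` are hypothetical and the scores are made-up toy values, not results from the paper.

```python
from typing import Callable, Sequence


def incremental_feature_selection(
    ranked_features: Sequence[str],
    evaluate: Callable[[Sequence[str]], float],
) -> tuple[list[str], float]:
    """Grow the feature set one ranked feature at a time and keep the best prefix."""
    best_subset: list[str] = []
    best_score = float("-inf")
    for size in range(1, len(ranked_features) + 1):
        subset = list(ranked_features[:size])
        score = evaluate(subset)
        # Strict comparison: on ties the smaller subset is kept.
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score


# Toy evaluator standing in for cross-validated accuracy (hypothetical values).
toy_accuracy = {1: 0.80, 2: 0.91, 3: 0.96, 4: 0.96, 5: 0.93}
ranked = ["pair_A", "pair_B", "pair_C", "pair_D", "pair_E"]

subset, score = incremental_feature_selection(
    ranked, lambda chosen: toy_accuracy[len(chosen)]
)
print(subset, score)
```

In practice the evaluator would train and cross-validate a classifier on each candidate subset; the toy lookup only keeps the example self-contained.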
Affiliation(s)
- Zhao-Yue Zhang
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
  - School of Healthcare Technology, Chengdu Neusoft University, Chengdu, China
- Zi-Jie Sun
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Dong Gao
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Yu-Duo Hao
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Hao Lin
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Fen Liu
  - Department of Radiation Oncology, Peking University Cancer Hospital (Inner Mongolia Campus), Affiliated Cancer Hospital of Inner Mongolia Medical University, Inner Mongolia Cancer Hospital, Hohhot, China
3. Tong M, Luo S, Gu L, Wang X, Zhang Z, Liang C, Huang H, Lin Y, Huang J. SIMarker: Cellular similarity detection and its application to diagnosis and prognosis of liver cancer. Comput Biol Med 2024; 171:108113. [PMID: 38368754 DOI: 10.1016/j.compbiomed.2024.108113] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2023] [Revised: 01/09/2024] [Accepted: 02/04/2024] [Indexed: 02/20/2024]
Abstract
BACKGROUND: The emergence of single-cell technology offers a unique opportunity to explore cellular similarity and heterogeneity between precancerous diseases and solid tumors. However, a systematic study that identifies and characterizes these similarities at single-cell resolution is lacking. METHODS: We developed SIMarker, a computational framework to detect cellular similarities between precancerous diseases and solid tumors based on gene expression at single-cell resolution. Taking hepatocellular carcinoma (HCC) as a case study, we quantified the cellular and molecular connections between HCC and cirrhosis. The core analysis modules of SIMarker are publicly available at https://github.com/xmuhuanglab/SIMarker ("SIM" means "similarity" and "Marker" means "biomarkers"). RESULTS: We found that PGA5+ hepatocytes in HCC showed cirrhosis-like characteristics, including similar transcriptional programs and gene regulatory networks. Consequently, the genes constituting the gene expression program of these cirrhosis-like subpopulations were designated as cirrhosis-like signatures (CLS). Strikingly, CLS enabled the development of diagnostic and prognostic biomarkers based on within-sample relative expression orderings of gene pairs. These biomarkers achieved high precision and concordance compared with previous studies. CONCLUSIONS: Our work provides a systematic method to investigate the clinical translational significance of cellular similarities between HCC and cirrhosis, which opens avenues for identifying similar paradigms in other categories of cancers and diseases.
Affiliation(s)
- Mengsha Tong
  - State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Faculty of Medicine and Life Sciences, Xiamen University, Xiamen, Fujian 361102, China; National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, Fujian, 316005, China
- Shijie Luo
  - State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Faculty of Medicine and Life Sciences, Xiamen University, Xiamen, Fujian 361102, China; National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, Fujian, 316005, China
- Lin Gu
  - State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Faculty of Medicine and Life Sciences, Xiamen University, Xiamen, Fujian 361102, China
- Xinkang Wang
  - State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Faculty of Medicine and Life Sciences, Xiamen University, Xiamen, Fujian 361102, China; National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, Fujian, 316005, China
- Zheyang Zhang
  - State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Faculty of Medicine and Life Sciences, Xiamen University, Xiamen, Fujian 361102, China; National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, Fujian, 316005, China
- Chenyu Liang
  - State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Faculty of Medicine and Life Sciences, Xiamen University, Xiamen, Fujian 361102, China; National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, Fujian, 316005, China
- Huaqiang Huang
  - State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Faculty of Medicine and Life Sciences, Xiamen University, Xiamen, Fujian 361102, China
- Yuxiang Lin
  - National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, Fujian, 316005, China
- Jialiang Huang
  - State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Faculty of Medicine and Life Sciences, Xiamen University, Xiamen, Fujian 361102, China; National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, Fujian, 316005, China
4. Lu M, Yin R, Chen XS. Ensemble methods of rank-based trees for single sample classification with gene expression profiles. J Transl Med 2024; 22:140. [PMID: 38321494 PMCID: PMC10848444 DOI: 10.1186/s12967-024-04940-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2023] [Accepted: 01/27/2024] [Indexed: 02/08/2024] Open
Abstract
Building Single Sample Predictors (SSPs) from gene expression profiles presents challenges, notably due to the lack of calibration across diverse gene expression measurement technologies. However, recent research indicates the viability of classifying phenotypes based on the order of expression of multiple genes. Existing SSP methods often rely on Top Scoring Pairs (TSP), which are platform-independent and easy to interpret through the concept of "relative expression reversals". Nevertheless, TSP methods face limitations in classifying complex patterns involving comparisons of more than two gene expressions. To overcome these constraints, we introduce a novel approach that extends TSP rules by constructing rank-based trees capable of encompassing extensive gene-gene comparisons. This method is bolstered by incorporating two ensemble strategies, boosting and random forest, to mitigate the risk of overfitting. Our implementation of ensemble rank-based trees employs boosting with LogitBoost cost and random forests, addressing both binary and multi-class classification problems. In a comparative analysis across 12 cancer gene expression datasets, our proposed methods demonstrate superior performance over both the k-TSP classifier and nearest template prediction methods. We have further refined our approach to facilitate variable selection and the generation of clear, precise decision rules from rank-based trees, enhancing interpretability. The cumulative evidence from our research underscores the significant potential of ensemble rank-based trees in advancing disease classification via gene expression data, offering a robust, interpretable, and scalable solution. Our software is available at https://CRAN.R-project.org/package=ranktreeEnsemble .
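The "relative expression reversal" idea behind Top Scoring Pairs can be sketched as follows. This is a minimal illustration of the classic TSP score, the absolute difference between two classes in how often gene i is expressed below gene j; it is not the rank-based tree ensemble the paper proposes, and the function names and toy expression matrix are hypothetical.

```python
from itertools import combinations


def pair_score(samples, labels, i, j):
    """Return |P(x_i < x_j | class 0) - P(x_i < x_j | class 1)| for genes i and j."""
    freq = {}
    for cls in (0, 1):
        rows = [x for x, y in zip(samples, labels) if y == cls]
        freq[cls] = sum(x[i] < x[j] for x in rows) / len(rows)
    return abs(freq[0] - freq[1])


def top_scoring_pair(samples, labels):
    """Exhaustively search all gene pairs and return the highest-scoring one."""
    n_genes = len(samples[0])
    return max(
        combinations(range(n_genes), 2),
        key=lambda pair: pair_score(samples, labels, *pair),
    )


# Toy data: rows are samples, columns are three genes (made-up values).
samples = [
    [1.0, 2.0, 5.0],
    [0.5, 3.0, 4.0],
    [2.0, 1.0, 6.0],
    [3.0, 0.5, 7.0],
]
labels = [0, 0, 1, 1]

best = top_scoring_pair(samples, labels)
print(best, pair_score(samples, labels, *best))
```

Because the score depends only on within-sample orderings, it is unchanged by any monotone per-sample rescaling, which is the property that makes such rules usable across measurement platforms.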
Affiliation(s)
- Min Lu
  - Division of Biostatistics, Department of Public Health Sciences, Miller School of Medicine, University of Miami, 1120 NW 14th Street, Miami, FL, 33136, USA
- Ruijie Yin
  - Division of Biostatistics, Department of Public Health Sciences, Miller School of Medicine, University of Miami, 1120 NW 14th Street, Miami, FL, 33136, USA
- X Steven Chen
  - Division of Biostatistics, Department of Public Health Sciences, Miller School of Medicine, University of Miami, 1120 NW 14th Street, Miami, FL, 33136, USA
  - Sylvester Comprehensive Cancer Center, Miller School of Medicine, University of Miami, 1475 NW 12th Ave, Miami, FL, 33136, USA
5. Pakula H, Omar M, Carelli R, Pederzoli F, Fanelli GN, Pannellini T, Socciarelli F, Van Emmenis L, Rodrigues S, Fidalgo-Ribeiro C, Nuzzo PV, Brady NJ, Dinalankara W, Jere M, Valencia I, Saladino C, Stone J, Unkenholz C, Garner R, Alexanderani MK, Khani F, de Almeida FN, Abate-Shen C, Greenblatt MB, Rickman DS, Barbieri CE, Robinson BD, Marchionni L, Loda M. Distinct mesenchymal cell states mediate prostate cancer progression. Nat Commun 2024; 15:363. [PMID: 38191471 PMCID: PMC10774315 DOI: 10.1038/s41467-023-44210-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2023] [Accepted: 12/04/2023] [Indexed: 01/10/2024] Open
Abstract
In the complex tumor microenvironment (TME), mesenchymal cells are key players, yet their specific roles in prostate cancer (PCa) progression remain to be fully deciphered. This study employs single-cell RNA sequencing to delineate molecular changes in tumor stroma that influence PCa progression and metastasis. Analyzing mesenchymal cells from four genetically engineered mouse models (GEMMs) and correlating these findings with human tumors, we identify eight stromal cell populations with distinct transcriptional identities consistent across both species. Notably, stromal signatures in advanced mouse disease reflect those in human bone metastases, highlighting periostin's role in invasion and differentiation. From these insights, we derive a gene signature that predicts metastatic progression in localized disease beyond traditional Gleason scores. Our results illuminate the critical influence of stromal dynamics on PCa progression, suggesting new prognostic tools and therapeutic targets.
Affiliation(s)
- Hubert Pakula
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, 10021, USA
- Mohamed Omar
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, 10021, USA
  - Sandra and Edward Meyer Cancer Center, Weill Cornell Medicine, Belfer Research Building, 413 East 69th Street, New York, NY, 10021, USA
- Ryan Carelli
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, 10021, USA
- Filippo Pederzoli
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, 10021, USA
- Giuseppe Nicolò Fanelli
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, 10021, USA
  - Department of Laboratory Medicine, Pisa University Hospital, Division of Pathology, Department of Translational Research and New Technologies in Medicine and Surgery, University of Pisa, Pisa, 56126, Italy
- Tania Pannellini
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, 10021, USA
- Fabio Socciarelli
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, 10021, USA
- Lucie Van Emmenis
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, 10021, USA
- Silvia Rodrigues
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, 10021, USA
- Caroline Fidalgo-Ribeiro
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, 10021, USA
- Pier Vitale Nuzzo
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, 10021, USA
- Nicholas J Brady
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, 10021, USA
- Wikum Dinalankara
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, 10021, USA
- Madhavi Jere
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, 10021, USA
- Itzel Valencia
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, 10021, USA
- Christopher Saladino
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, 10021, USA
- Jason Stone
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, 10021, USA
- Caitlin Unkenholz
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, 10021, USA
- Richard Garner
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, 10021, USA
- Mohammad K Alexanderani
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, 10021, USA
- Francesca Khani
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, 10021, USA
- Francisca Nunes de Almeida
  - Department of Medicine, Vagelos College of Physicians and Surgeons, Columbia University Irving Medical Center, New York, NY, 10032, USA
- Cory Abate-Shen
  - Department of Medicine, Vagelos College of Physicians and Surgeons, Columbia University Irving Medical Center, New York, NY, 10032, USA
  - Department of Molecular Pharmacology and Therapeutics, Vagelos College of Physicians and Surgeons, Columbia University Irving Medical Center, New York, NY, 10032, USA
  - Department of Pathology and Cell Biology, Vagelos College of Physicians and Surgeons, Columbia University Irving Medical Center, New York, NY, 10032, USA
  - Department of Urology, Vagelos College of Physicians and Surgeons, Columbia University Irving Medical Center, New York, NY, 10032, USA
  - Department of Systems Biology, Vagelos College of Physicians and Surgeons, Columbia University Irving Medical Center, New York, NY, 10032, USA
- Matthew B Greenblatt
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, 10021, USA
- David S Rickman
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, 10021, USA
- Christopher E Barbieri
  - Sandra and Edward Meyer Cancer Center, Weill Cornell Medicine, Belfer Research Building, 413 East 69th Street, New York, NY, 10021, USA
  - Department of Urology, Weill Cornell Medicine, New York, NY, 10021, USA
- Brian D Robinson
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, 10021, USA
  - Sandra and Edward Meyer Cancer Center, Weill Cornell Medicine, Belfer Research Building, 413 East 69th Street, New York, NY, 10021, USA
  - Department of Urology, Weill Cornell Medicine, New York, NY, 10021, USA
- Luigi Marchionni
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, 10021, USA
- Massimo Loda
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, 10021, USA
  - Sandra and Edward Meyer Cancer Center, Weill Cornell Medicine, Belfer Research Building, 413 East 69th Street, New York, NY, 10021, USA
  - Department of Oncologic Pathology, Dana-Farber Cancer Institute and Harvard Medical School, 450 Brookline Ave, Boston, MA, 02215, USA
  - University of Oxford, Nuffield Department of Surgical Sciences, Oxford, UK
6. Omar M, Nuzzo PV, Ravera F, Bleve S, Fanelli GN, Zanettini C, Valencia I, Marchionni L. Notch-based gene signature for predicting the response to neoadjuvant chemotherapy in triple-negative breast cancer. J Transl Med 2023; 21:811. [PMID: 37964363 PMCID: PMC10647131 DOI: 10.1186/s12967-023-04713-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2023] [Accepted: 11/08/2023] [Indexed: 11/16/2023] Open
Abstract
BACKGROUND While the efficacy of neoadjuvant chemotherapy (NACT) in treating triple-negative breast cancer (TNBC) is generally accepted, not all patients derive benefit from this preoperative treatment. Presently, there are no validated biomarkers to predict the NACT response, and previous attempts to develop predictive classifiers based on gene expression data have not demonstrated clinical utility. However, predictive models incorporating biological constraints have shown increased robustness and improved performance compared to agnostic classifiers. METHODS We used the preoperative transcriptomic profiles from 298 patients with TNBC to train and test a rank-based classifier, k-top scoring pairs, to predict whether the patient will have pathological complete response (pCR) or residual disease (RD) following NACT. To reduce overfitting and enhance the signature's interpretability, we constrained the training process to genes involved in the Notch signaling pathway. Subsequently, we evaluated the signature performance on two independent cohorts with 75 and 71 patients. Finally, we assessed the prognostic value of the signature by examining its association with relapse-free survival (RFS) using Kaplan‒Meier (KM) survival estimates and a multivariate Cox proportional hazards model. RESULTS The final signature consists of five gene pairs, whose relative ordering can be predictive of the NACT response. The signature has a robust performance at predicting pCR in TNBC patients with an area under the ROC curve (AUC) of 0.76 and 0.85 in the first and second testing cohorts, respectively, outperforming other gene signatures developed for the same purpose. Additionally, the signature was significantly associated with RFS in an independent TNBC patient cohort even after adjusting for T stage, patient age at the time of diagnosis, type of breast surgery, and menopausal status. 
CONCLUSION We introduce a robust gene signature to predict pathological complete response (pCR) in patients with TNBC. This signature applies easily interpretable, rank-based decision rules to genes regulated by the Notch signaling pathway, a known determinant in breast cancer chemoresistance. The robust predictive and prognostic performance of the signature makes it a strong candidate for clinical implementation, aiding in the stratification of TNBC patients undergoing NACT.
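How a signature of gene pairs turns relative orderings into a prediction can be sketched with a simple majority vote, the usual decision rule of k-top scoring pairs classifiers. The gene names `G1` to `G10`, the expression values, and the convention that a vote of 1 stands for one response class are all hypothetical; the actual five pairs and their orientation are not given in the abstract.

```python
def ktsp_predict(expression, gene_pairs):
    """Each pair (a, b) votes 1 if expression[a] > expression[b]; the majority wins."""
    votes = sum(expression[a] > expression[b] for a, b in gene_pairs)
    return int(votes > len(gene_pairs) / 2)


# Five hypothetical gene pairs standing in for a five-pair signature.
pairs = [("G1", "G2"), ("G3", "G4"), ("G5", "G6"), ("G7", "G8"), ("G9", "G10")]

# One hypothetical patient profile (made-up expression values).
sample = {
    "G1": 5.2, "G2": 3.1,
    "G3": 1.0, "G4": 4.0,
    "G5": 7.5, "G6": 2.2,
    "G7": 6.0, "G8": 6.5,
    "G9": 9.1, "G10": 0.3,
}

prediction = ktsp_predict(sample, pairs)
print(prediction)
```

Using an odd number of pairs avoids tied votes, and the rule needs only the ordering of each pair within a single sample, so no cross-sample normalization is required.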
Affiliation(s)
- Mohamed Omar
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
  - Dana Farber Cancer Institute, Boston, MA, USA
- Pier Vitale Nuzzo
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
- Francesco Ravera
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
  - Department of Internal Medicine, University of Genoa, Genoa, Italy
- Sara Bleve
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
  - Department of Medical Oncology, IRCCS Istituto Romagnolo per lo Studio dei Tumori (IRST) "Dino Amadori", Meldola, Italy
- Giuseppe Nicolò Fanelli
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
  - First Division of Pathology, Department of Translational Research and New Technologies in Medicine and Surgery, University of Pisa, 56126, Pisa, Italy
- Claudio Zanettini
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
- Itzel Valencia
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
- Luigi Marchionni
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
7. Li C, Wang T, Lin X. Analyzing omics data by feature combinations based on kernel functions. J Bioinform Comput Biol 2023; 21:2350021. [PMID: 37852788 DOI: 10.1142/s021972002350021x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2023]
Abstract
Defining meaningful feature (molecule) combinations can enhance the study of disease diagnosis and prognosis. However, feature combinations in biosystems are complex and varied, and existing methods examine feature cooperation in a single, fixed pattern for all feature pairs, such as linear combination. To identify the appropriate combination between two features and evaluate feature combinations more comprehensively, this paper adopts kernel functions to study feature relationships and proposes a new omics data analysis method, KF-k-TSP. Besides linear combination, KF-k-TSP also explores nonlinear combinations of features and allows hybridizing multiple kernel functions to evaluate feature interaction from multiple views. KF-k-TSP selects k > 0 top-scoring pairs to build an ensemble classifier. Experimental results show that KF-k-TSP with multiple kernel functions, which evaluates feature combinations from multiple views, is better than KF-k-TSP with only one kernel function. Meanwhile, KF-k-TSP performs better than TSP family algorithms and previous methods based on a conversion strategy in most cases. It performs similarly to popular machine learning methods in omics data analysis but involves fewer feature pairs. During physiological and pathological changes, molecular interactions can be both linear and nonlinear. Hence, KF-k-TSP, which can measure molecular combinations from multiple perspectives, can help to mine information closely related to physiological and pathological changes and to study disease mechanisms.
Affiliation(s)
- Chao Li
  - School of Computer Science and Technology, Dalian University of Technology, No. 2 Linggong Road, Dalian, Liaoning 116024, P. R. China
- Tianxiang Wang
  - School of Computer Science and Technology, Dalian University of Technology, No. 2 Linggong Road, Dalian, Liaoning 116024, P. R. China
- Xiaohui Lin
  - School of Computer Science and Technology, Dalian University of Technology, No. 2 Linggong Road, Dalian, Liaoning 116024, P. R. China
8. Zhang Y, Lin X, Gao Z, Wang T, Dong K, Zhang J. An omics data analysis method based on feature linear relationship and graph convolutional network. J Biomed Inform 2023; 145:104479. [PMID: 37634557 DOI: 10.1016/j.jbi.2023.104479] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2023] [Revised: 07/26/2023] [Accepted: 08/23/2023] [Indexed: 08/29/2023]
Abstract
Biological networks are known to be highly modular, and the dysfunction of network modules may cause diseases. Defining key modules from omics data and building a classification model on them helps advance research on disease diagnosis and prognosis. However, when applying modules in downstream analysis such as discriminating disease states, most methods use only the node information and ignore node interactions or topological information, which may lead to false positives and limit model performance. In this study, we propose an omics data analysis method based on feature linear relationships and a graph convolutional network (LCNet). In LCNet, we use the differences in feature linear relationships during disease development to characterize physiological and pathological changes and construct a differential linear relation network, which is simple and interpretable from the perspective of feature linear relationships. A greedy strategy is developed to search for highly interactive modules with strong discrimination ability. To fully utilize the information in the detected modules, personalized sub-graphs are defined for each sample based on the modules, and graph convolutional network (GCN) classifiers are trained to predict the sample labels. Experimental results on public datasets show the superiority of LCNet in classification performance. For breast cancer metabolic data, the metabolites identified by LCNet involve important pathways. Thus, LCNet can identify module biomarkers using feature linear relationships and a greedy strategy, and label samples using personalized sub-graphs and GCN. It provides a new way of using node (molecule) information and topological information in the defined modules for better disease classification.
Affiliation(s)
- Yanhui Zhang
  - School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, Liaoning, China
- Xiaohui Lin
  - School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, Liaoning, China
- Zhenbo Gao
  - School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, Liaoning, China
- Tianxiang Wang
  - School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, Liaoning, China
- Kunjie Dong
  - School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, Liaoning, China
- Jianjun Zhang
  - Cancer Hospital of Dalian University of Technology (Liaoning Cancer Hospital & Institute), Liaoning, China
9. Rydzewski NR, Helzer KT, Bootsma M, Shi Y, Bakhtiar H, Sjöström M, Zhao SG. Machine Learning & Molecular Radiation Tumor Biomarkers. Semin Radiat Oncol 2023; 33:243-251. [PMID: 37331779 DOI: 10.1016/j.semradonc.2023.03.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/20/2023]
Abstract
Developing radiation tumor biomarkers that can guide personalized radiotherapy clinical decision making is a critical goal in the effort towards precision cancer medicine. High-throughput molecular assays paired with modern computational techniques have the potential to identify individual tumor-specific signatures and create tools that can help understand heterogeneous patient outcomes in response to radiotherapy, allowing clinicians to fully benefit from the technological advances in molecular profiling and computational biology, including machine learning. However, the increasingly complex nature of the data generated from high-throughput and "omics" assays requires careful selection of analytical strategies. Furthermore, the power of modern machine learning techniques to detect subtle data patterns comes with special considerations to ensure that the results are generalizable. Herein, we review the computational framework of tumor biomarker development and describe commonly used machine learning approaches and how they are applied for radiation biomarker development using molecular data, as well as challenges and emerging research trends.
Affiliation(s)
- Nicholas R Rydzewski
- Radiation Oncology Branch, National Cancer Institute, National Institutes of Health, Bethesda, MD; Department of Human Oncology, University of Wisconsin, Madison, WI
| | - Kyle T Helzer
- Department of Human Oncology, University of Wisconsin, Madison, WI
| | - Matthew Bootsma
- Department of Human Oncology, University of Wisconsin, Madison, WI
| | - Yue Shi
- Department of Human Oncology, University of Wisconsin, Madison, WI
| | - Hamza Bakhtiar
- Department of Human Oncology, University of Wisconsin, Madison, WI
| | - Martin Sjöström
- Department of Radiation Oncology, University of California San Francisco, San Francisco, CA
| | - Shuang G Zhao
- Department of Human Oncology, University of Wisconsin, Madison, WI; Carbone Cancer Center, University of Wisconsin, Madison, WI; William S. Middleton Memorial Veterans Hospital, Madison, WI.
|
10
|
Pakula H, Omar M, Carelli R, Pederzoli F, Fanelli GN, Pannellini T, Van Emmenis L, Rodrigues S, Fidalgo-Ribeiro C, Nuzzo PV, Brady NJ, Jere M, Unkenholz C, Alexanderani MK, Khani F, de Almeida FN, Abate-Shen C, Greenblatt MB, Rickman DS, Barbieri CE, Robinson BD, Marchionni L, Loda M. Distinct mesenchymal cell states mediate prostate cancer progression. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.03.29.534769. [PMID: 37034687 PMCID: PMC10081210 DOI: 10.1101/2023.03.29.534769] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Alterations in tumor stroma influence prostate cancer progression and metastatic potential. However, the molecular underpinnings of this stromal-epithelial crosstalk are largely unknown. Here, we compare mesenchymal cells from four genetically engineered mouse models (GEMMs) of prostate cancer representing different stages of the disease to their wild-type (WT) counterparts by single-cell RNA sequencing (scRNA-seq) and, ultimately, to human tumors with comparable genotypes. We identified 8 transcriptionally and functionally distinct stromal populations responsible for common and GEMM-specific transcriptional programs. We show that stromal responses are conserved in mouse models and human prostate cancers with the same genomic alterations. We noted striking similarities between the transcriptional profiles of the stroma of murine models of advanced disease and those of human prostate cancer bone metastases. These profiles were then used to build a robust gene signature that can predict metastatic progression in prostate cancer patients with localized disease and is also associated with progression-free survival independent of Gleason score. Taken together, these findings offer new evidence that the stromal microenvironment mediates prostate cancer progression, further identifying tissue-based biomarkers and potential therapeutic targets of aggressive and metastatic disease.
Affiliation(s)
- Hubert Pakula
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10021, USA
| | - Mohamed Omar
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10021, USA
| | - Ryan Carelli
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10021, USA
| | - Filippo Pederzoli
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10021, USA
| | - Giuseppe Nicolò Fanelli
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10021, USA
- Department of Laboratory Medicine, Pisa University Hospital, Division of Pathology, Department of Translational Research and New Technologies in Medicine and Surgery, University of Pisa, Pisa 56126, Italy
| | - Tania Pannellini
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10021, USA
| | - Lucie Van Emmenis
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10021, USA
| | - Silvia Rodrigues
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10021, USA
| | - Caroline Fidalgo-Ribeiro
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10021, USA
| | - Pier V. Nuzzo
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10021, USA
| | - Nicholas J. Brady
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10021, USA
| | - Madhavi Jere
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10021, USA
| | - Caitlin Unkenholz
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10021, USA
| | - Mohammad K. Alexanderani
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10021, USA
| | - Francesca Khani
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10021, USA
- Sandra and Edward Meyer Cancer Center, Weill Cornell Medicine, Belfer Research Building, 413 East 69th Street, New York, NY 10021, USA
- Department of Urology, Weill Cornell Medicine, New York, NY 10021, USA
| | - Francisca Nunes de Almeida
- Departments of Molecular Pharmacology and Therapeutics, Urology, Medicine, Pathology & Cell Biology and Systems Biology, Herbert Irving Comprehensive Cancer Center, Vagelos College of Physicians and Surgeons, Columbia University Irving Medical Center, New York, NY 10032, USA
| | - Cory Abate-Shen
- Departments of Molecular Pharmacology and Therapeutics, Urology, Medicine, Pathology & Cell Biology and Systems Biology, Herbert Irving Comprehensive Cancer Center, Vagelos College of Physicians and Surgeons, Columbia University Irving Medical Center, New York, NY 10032, USA
| | - Matthew B Greenblatt
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10021, USA
| | - David S. Rickman
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10021, USA
| | - Christopher E. Barbieri
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10021, USA
- Sandra and Edward Meyer Cancer Center, Weill Cornell Medicine, Belfer Research Building, 413 East 69th Street, New York, NY 10021, USA
- Department of Urology, Weill Cornell Medicine, New York, NY 10021, USA
| | - Brian D. Robinson
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10021, USA
- Sandra and Edward Meyer Cancer Center, Weill Cornell Medicine, Belfer Research Building, 413 East 69th Street, New York, NY 10021, USA
- Department of Urology, Weill Cornell Medicine, New York, NY 10021, USA
| | - Luigi Marchionni
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10021, USA
| | - Massimo Loda
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10021, USA
- Sandra and Edward Meyer Cancer Center, Weill Cornell Medicine, Belfer Research Building, 413 East 69th Street, New York, NY 10021, USA
- Department of Oncologic Pathology, Dana-Farber Cancer Institute and Harvard Medical School, 450 Brookline Ave, Boston, MA, 02215, USA
|
11
|
Omar M, Dinalankara W, Mulder L, Coady T, Zanettini C, Imada EL, Younes L, Geman D, Marchionni L. Using biological constraints to improve prediction in precision oncology. iScience 2023; 26:106108. [PMID: 36852282 PMCID: PMC9958363 DOI: 10.1016/j.isci.2023.106108] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 04/04/2022] [Revised: 12/20/2022] [Accepted: 01/28/2023] [Indexed: 02/05/2023] Open
Abstract
Many gene signatures have been developed by applying machine learning (ML) to omics profiles; however, their clinical utility is often hindered by limited interpretability and unstable performance. Here, we show the importance of embedding prior biological knowledge in the decision rules yielded by ML approaches to build robust classifiers. We tested this by applying different ML algorithms to gene expression data to predict three difficult cancer phenotypes: bladder cancer progression to muscle-invasive disease, response to neoadjuvant chemotherapy in triple-negative breast cancer, and prostate cancer metastatic progression. We developed two sets of classifiers: mechanistic, by restricting the training to features capturing specific biological mechanisms; and agnostic, in which the training did not use any a priori biological information. Mechanistic models had a similar or better testing performance than their agnostic counterparts, with enhanced interpretability. Our findings support the use of biological constraints to develop robust gene signatures with high translational potential.
Affiliation(s)
- Mohamed Omar
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10065, USA
| | - Wikum Dinalankara
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10065, USA
| | - Lotte Mulder
- Technical University Delft, 2628 CD Delft, the Netherlands
| | - Tendai Coady
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10065, USA
| | - Claudio Zanettini
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10065, USA
| | - Eddie Luidy Imada
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10065, USA
| | - Laurent Younes
- Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Donald Geman
- Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Luigi Marchionni
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10065, USA
|
12
|
Kwan B, Fuhrer T, Montemayor D, Fink JC, He J, Hsu CY, Messer K, Nelson RG, Pu M, Ricardo AC, Rincon-Choles H, Shah VO, Ye H, Zhang J, Sharma K, Natarajan L. A generalized covariate-adjusted top-scoring pair algorithm with applications to diabetic kidney disease stage classification in the Chronic Renal Insufficiency Cohort (CRIC) Study. BMC Bioinformatics 2023; 24:57. [PMID: 36803209 PMCID: PMC9942303 DOI: 10.1186/s12859-023-05171-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2022] [Accepted: 02/02/2023] [Indexed: 02/22/2023] Open
Abstract
BACKGROUND The growing amount of high-dimensional biomolecular data has spawned new statistical and computational models for risk prediction and disease classification. Yet, many of these methods do not yield biologically interpretable models, despite offering high classification accuracy. An exception, the top-scoring pair (TSP) algorithm, derives parameter-free, biologically interpretable single-pair decision rules that are accurate and robust in disease classification. However, standard TSP methods do not accommodate covariates that could heavily influence feature selection for the top-scoring pair. Herein, we propose a covariate-adjusted TSP method, which uses residuals from a regression of features on the covariates for identifying top-scoring pairs. We conduct simulations and a data application to investigate our method, and compare it to existing classifiers, LASSO and random forests. RESULTS Our simulations found that features that were highly correlated with clinical variables had a high likelihood of being selected as top-scoring pairs in the standard TSP setting. However, through residualization, our covariate-adjusted TSP was able to identify new top-scoring pairs that were largely uncorrelated with clinical variables. In the data application, using patients with diabetes (n = 977) selected for metabolomic profiling in the Chronic Renal Insufficiency Cohort (CRIC) study, the standard TSP algorithm identified (valine-betaine, dimethyl-arg) as the top-scoring metabolite pair for classifying diabetic kidney disease (DKD) severity, whereas the covariate-adjusted TSP method identified the pair (pipazethate, octaethylene glycol) as top-scoring. Valine-betaine and dimethyl-arg had, respectively, ≥ 0.4 absolute correlation with urine albumin and serum creatinine, known prognosticators of DKD. Thus, without covariate adjustment, the top-scoring pair largely reflected known markers of disease severity, whereas covariate-adjusted TSP uncovered features liberated from confounding, and identified independent prognostic markers of DKD severity. Furthermore, TSP-based methods achieved classification accuracy in DKD that was competitive with LASSO and random forests, while providing more parsimonious models. CONCLUSIONS We extended TSP-based methods to account for covariates, via a simple, easy-to-implement residualizing process. Our covariate-adjusted TSP method identified metabolite features, uncorrelated with clinical covariates, that discriminate DKD severity stage based on the relative ordering between two features, and thus provide insights for future studies on the order reversals in early vs advanced disease states.
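The residualize-then-score idea described in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single numeric covariate, a plain least-squares line per feature, binary labels 0/1 with at least one sample per class, and the usual TSP score |P(a < b | class 0) - P(a < b | class 1)|. All function names and the toy data (`m1`, `m2`) are hypothetical.

```python
from itertools import combinations

def residualize(feature, covariate):
    """Residuals of a simple least-squares fit feature ~ a + b * covariate."""
    n = len(feature)
    mean_x = sum(covariate) / n
    mean_y = sum(feature) / n
    sxx = sum((x - mean_x) ** 2 for x in covariate)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(covariate, feature))
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(covariate, feature)]

def tsp_score(a, b, labels):
    """|P(a < b | class 0) - P(a < b | class 1)|; assumes both classes are present."""
    freq = {}
    for cls in (0, 1):
        idx = [k for k, lab in enumerate(labels) if lab == cls]
        freq[cls] = sum(a[k] < b[k] for k in idx) / len(idx)
    return abs(freq[0] - freq[1])

def top_scoring_pair(features, labels):
    """Return (score, name_i, name_j) for the pair with the largest TSP score."""
    return max(
        (tsp_score(features[i], features[j], labels), i, j)
        for i, j in combinations(sorted(features), 2)
    )

def covariate_adjusted_tsp(features, covariate, labels):
    """Residualize every feature on the covariate, then run standard TSP."""
    adjusted = {name: residualize(vals, covariate) for name, vals in features.items()}
    return top_scoring_pair(adjusted, labels)

# Toy data (hypothetical): m1 simply tracks the covariate, m2 is constant.
labels = [0, 0, 0, 1, 1, 1]
covariate = [1, 2, 3, 4, 5, 6]
features = {"m1": [1, 2, 3, 4, 5, 6], "m2": [3.5] * 6}

unadjusted = top_scoring_pair(features, labels)
adjusted = covariate_adjusted_tsp(features, covariate, labels)
```

In this toy case the pair separates the classes only because `m1` follows the covariate, so its score collapses once both features are residualized. A real analysis would regress on several covariates at once, which this sketch does not attempt.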
Grants
- R01 DK110541 NIDDK NIH HHS
- U24 DK060990 NIDDK NIH HHS
- R01DK118736, 1R01DK110541-01A1, U01DK060990, U01DK060984, U01DK061022, U01DK061021, U01DK061028, U01DK060980, U01DK060963, U01DK060902, U24DK060990 NIDDK NIH HHS
- National Science Foundation Graduate Research Fellowship Program
- Intramural Research Program of the National Institute of Diabetes and Digestive and Kidney Diseases
- National Institute of Diabetes and Digestive and Kidney Diseases
Affiliation(s)
- Brian Kwan
- Division of Biostatistics and Bioinformatics, Herbert Wertheim School of Public Health, University of California, San Diego, La Jolla, CA, USA
- Moores Cancer Center, University of California, San Diego, La Jolla, CA, USA
| | - Tobias Fuhrer
- Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
| | - Daniel Montemayor
- Division of Nephrology, Department of Medicine, University of Texas Health San Antonio, San Antonio, TX, USA
- Center for Renal Precision Medicine, University of Texas Health San Antonio, San Antonio, TX, USA
| | - Jeffery C Fink
- Department of Medicine, University of Maryland, Baltimore School of Medicine, Baltimore, MD, USA
| | - Jiang He
- Department of Epidemiology, Tulane University School of Public Health and Tropical Medicine and Tulane University Translational Science Institute, New Orleans, LA, USA
| | - Chi-Yuan Hsu
- Division of Nephrology, University of California, San Francisco School of Medicine, San Francisco, CA, USA
| | - Karen Messer
- Division of Biostatistics and Bioinformatics, Herbert Wertheim School of Public Health, University of California, San Diego, La Jolla, CA, USA
- Moores Cancer Center, University of California, San Diego, La Jolla, CA, USA
| | - Robert G Nelson
- Chronic Kidney Disease Section, National Institute of Diabetes and Digestive and Kidney Diseases, Phoenix, AZ, USA
| | - Minya Pu
- Moores Cancer Center, University of California, San Diego, La Jolla, CA, USA
| | - Ana C Ricardo
- Department of Medicine, University of Illinois, Chicago, IL, USA
| | - Hernan Rincon-Choles
- Department of Nephrology, Glickman Urological and Kidney Institute, Cleveland Clinic Foundation, Cleveland, OH, USA
| | - Vallabh O Shah
- University of New Mexico Health Sciences Center, Albuquerque, NM, USA
| | - Hongping Ye
- Division of Nephrology, Department of Medicine, University of Texas Health San Antonio, San Antonio, TX, USA
- Center for Renal Precision Medicine, University of Texas Health San Antonio, San Antonio, TX, USA
| | - Jing Zhang
- Moores Cancer Center, University of California, San Diego, La Jolla, CA, USA
| | - Kumar Sharma
- Division of Nephrology, Department of Medicine, University of Texas Health San Antonio, San Antonio, TX, USA
- Center for Renal Precision Medicine, University of Texas Health San Antonio, San Antonio, TX, USA
| | - Loki Natarajan
- Division of Biostatistics and Bioinformatics, Herbert Wertheim School of Public Health, University of California, San Diego, La Jolla, CA, USA.
- Moores Cancer Center, University of California, San Diego, La Jolla, CA, USA.
|
13
|
Curti N, Levi G, Giampieri E, Castellani G, Remondini D. A network approach for low dimensional signatures from high throughput data. Sci Rep 2022; 12:22253. [PMID: 36564421 PMCID: PMC9789141 DOI: 10.1038/s41598-022-25549-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2022] [Accepted: 11/30/2022] [Indexed: 12/24/2022] Open
Abstract
One of the main objectives of high-throughput genomics studies is to obtain a low-dimensional set of observables (a signature) for sample classification purposes (diagnosis, prognosis, stratification). Biological data, such as gene or protein expression, are commonly characterized by an up/down regulation behavior, for which discriminant-based methods could perform with high accuracy and easy interpretability. To get the most out of these methods, feature selection is even more critical, but it is known to be an NP-hard problem, and thus most feature selection approaches focus on one feature at a time (k-best, Sequential Feature Selection, recursive feature elimination). We propose DNetPRO, Discriminant Analysis with Network PROcessing, a supervised network-based signature identification method. This method implements a network-based heuristic to generate one or more signatures out of the best performing feature pairs. The algorithm is easily scalable, allowing efficient computing for a high number of observables ([Formula: see text]-[Formula: see text]). We show applications on real high-throughput genomic datasets in which our method outperforms existing results, or is compatible with them but with a smaller number of selected features. Moreover, the geometrical simplicity of the resulting class-separation surfaces allows a clearer interpretation of the obtained signatures in comparison to nonlinear classification models.
Affiliation(s)
- Nico Curti
- Department of Physics and Astronomy, University of Bologna, Bologna, Italy; INFN Bologna, Bologna, Italy
| | - Giuseppe Levi
- Department of Physics and Astronomy, University of Bologna, Bologna, Italy; INFN Bologna, Bologna, Italy
| | - Enrico Giampieri
- INFN Bologna, Bologna, Italy; Department of Experimental, Diagnostic and Specialty Medicine, University of Bologna, Bologna, Italy
| | - Gastone Castellani
- INFN Bologna, Bologna, Italy; Department of Experimental, Diagnostic and Specialty Medicine, University of Bologna, Bologna, Italy
| | - Daniel Remondini
- Department of Physics and Astronomy, University of Bologna, Bologna, Italy; INFN Bologna, Bologna, Italy
|
14
|
Ruan J, Xu S, Chen R, Qu W, Li Q, Ye C, Wu W, Jiang Q, Yan F, Shen E, Chu Q, Jia Y, Zhang X, Fu W, Chen J, Timko MP, Zhao P, Fan L, Shen Y. EMLI-ICC: an ensemble machine learning-based integration algorithm for metastasis prediction and risk stratification in intrahepatic cholangiocarcinoma. Brief Bioinform 2022; 23:6762744. [PMID: 36259363 DOI: 10.1093/bib/bbac450] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 07/01/2022] [Revised: 09/09/2022] [Accepted: 09/21/2022] [Indexed: 12/14/2022] Open
Abstract
Robust strategies to identify patients at high risk for tumor metastasis, such as those frequently observed in intrahepatic cholangiocarcinoma (ICC), remain limited. While gene/protein expression profiling holds great potential as an approach to cancer diagnosis and prognosis, previously developed protocols using multiple diagnostic signatures for expression-based metastasis prediction have not been widely applied successfully because batch effects and different data types greatly decreased the predictive performance of gene/protein expression profile-based signatures in interlaboratory and data type dependent validation. To address this problem and assist in more precise diagnosis, we performed a genome-wide integrative proteome and transcriptome analysis and developed an ensemble machine learning-based integration algorithm for metastasis prediction (EMLI-Metastasis) and risk stratification (EMLI-Prognosis) in ICC. Based on massive proteome (216) and transcriptome (244) data sets, 132 feature (biomarker) genes were selected and used to train the EMLI-Metastasis algorithm. To accurately detect the metastasis of ICC patients, we developed a weighted ensemble machine learning method based on the k-Top Scoring Pairs (k-TSP) method. This approach generates a metastasis classifier for each bootstrap aggregating training data set. Ten binary expression rank-based classifiers were generated for detection of metastasis separately. To further improve the accuracy of the method, the 10 binary metastasis classifiers were combined by weighted voting based on the score from the prediction results of each classifier. The prediction accuracy of the EMLI-Metastasis algorithm reached 97.1% and 85.0% in proteome and transcriptome datasets, respectively. Among the 132 feature genes, 21 gene-pair signatures were developed to establish a metastasis-related prognosis risk-stratification model in ICC (EMLI-Prognosis). Based on the EMLI-Prognosis algorithm, patients in the high-risk group had significantly worse overall survival relative to the low-risk group in the clinical cohort (P-value < 0.05). Taken together, the EMLI-ICC algorithm provides a powerful and robust means for accurate metastasis prediction and risk stratification across proteome and transcriptome data types that is superior to currently used clinicopathological features in patients with ICC. Our developed algorithm could have profound implications not just in improved clinical care in cancer metastasis risk prediction, but also more broadly in machine-learning-based multi-cohort diagnosis method development. To make the EMLI-ICC algorithm easily accessible for clinical application, we established a web-based server for metastasis risk prediction (http://ibi.zju.edu.cn/EMLI/).
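The combination step described in this abstract, several rank-based pair classifiers merged by weighted voting, can be sketched as follows. This is only an illustration under stated assumptions, not the EMLI-ICC code: each base classifier is assumed to be a list of gene pairs that votes by majority over the rule "first gene is expressed higher than the second", and each weight is an assumed per-classifier score such as its training accuracy. The gene names, pairs, and weights are hypothetical.

```python
def ktsp_predict(sample, pairs):
    """Majority vote over gene-pair rules: pair (a, b) votes 1 when sample[a] > sample[b]."""
    votes = sum(sample[a] > sample[b] for a, b in pairs)
    return 1 if votes * 2 > len(pairs) else 0

def weighted_vote(sample, classifiers):
    """classifiers is a list of (weight, pairs); predict 1 when the weight of
    classifiers voting 1 exceeds half of the total weight."""
    total = sum(weight for weight, _ in classifiers)
    positive = sum(
        weight for weight, pairs in classifiers if ktsp_predict(sample, pairs) == 1
    )
    return 1 if positive > total / 2 else 0

# Hypothetical expression profile and three one-pair classifiers.
sample = {"G1": 5.0, "G2": 1.0, "G3": 2.0, "G4": 4.0}
classifiers = [
    (0.95, [("G1", "G2")]),  # votes 1 (5.0 > 1.0)
    (0.40, [("G3", "G4")]),  # votes 0 (2.0 < 4.0)
    (0.40, [("G4", "G1")]),  # votes 0 (4.0 < 5.0)
]

prediction = weighted_vote(sample, classifiers)
unweighted = weighted_vote(sample, [(1.0, pairs) for _, pairs in classifiers])
```

With these weights the single high-scoring classifier outvotes the two weaker ones, whereas an unweighted majority would go the other way. That difference is the design point of weighting by classifier score.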
Affiliation(s)
- Jian Ruan
- Department of Medical Oncology, The First Affiliated Hospital, Zhejiang University School of Medicine, & Key Laboratory of Cancer Prevention and Intervention, Ministry of Education, People's Republic of China
| | - Shuaishuai Xu
- Department of Medical Oncology, The First Affiliated Hospital, Zhejiang University School of Medicine, & Key Laboratory of Cancer Prevention and Intervention, Ministry of Education, People's Republic of China
| | - Ruyin Chen
- Department of Medical Oncology, The First Affiliated Hospital, Zhejiang University School of Medicine, & Key Laboratory of Cancer Prevention and Intervention, Ministry of Education, People's Republic of China
| | - Wenxin Qu
- Department of Laboratory Medicine, The First Affiliated Hospital, Zhejiang University School of Medicine, People's Republic of China
| | - Qiong Li
- Department of Medical Oncology, The First Affiliated Hospital, Zhejiang University School of Medicine, & Key Laboratory of Cancer Prevention and Intervention, Ministry of Education, People's Republic of China
| | - Chanqi Ye
- Department of Medical Oncology, The First Affiliated Hospital, Zhejiang University School of Medicine, & Key Laboratory of Cancer Prevention and Intervention, Ministry of Education, People's Republic of China
| | - Wei Wu
- Department of Medical Oncology, The First Affiliated Hospital, Zhejiang University School of Medicine, & Key Laboratory of Cancer Prevention and Intervention, Ministry of Education, People's Republic of China
| | - Qi Jiang
- Department of Medical Oncology, The First Affiliated Hospital, Zhejiang University School of Medicine, & Key Laboratory of Cancer Prevention and Intervention, Ministry of Education, People's Republic of China
| | - Feifei Yan
- Department of Medical Oncology, The First Affiliated Hospital, Zhejiang University School of Medicine, & Key Laboratory of Cancer Prevention and Intervention, Ministry of Education, People's Republic of China
| | - Enhui Shen
- Institute of Bioinformatics, Zhejiang University, People's Republic of China
| | - Qinjie Chu
- Institute of Bioinformatics, Zhejiang University, People's Republic of China
| | - Yunlu Jia
- Department of Medical Oncology, The First Affiliated Hospital, Zhejiang University School of Medicine, & Key Laboratory of Cancer Prevention and Intervention, Ministry of Education, People's Republic of China
| | - Xiaochen Zhang
- Department of Medical Oncology, The First Affiliated Hospital, Zhejiang University School of Medicine, & Key Laboratory of Cancer Prevention and Intervention, Ministry of Education, People's Republic of China
| | - Wenguang Fu
- Department of Hepatobiliary Surgery, The Affiliated Hospital of Southwest Medical University, People's Republic of China
| | - Jinzhang Chen
- Department of Oncology, Nanfang Hospital, Southern Medical University, People's Republic of China
| | - Michael P Timko
- Lewis and Clark Professor of Biology, Department of Biology, and Professor of Public Health Sciences, University of Virginia, USA
| | - Peng Zhao
- Department of Medical Oncology, The First Affiliated Hospital, Zhejiang University School of Medicine, & Key Laboratory of Cancer Prevention and Intervention, Ministry of Education, People's Republic of China
| | - Longjiang Fan
- Institute of Bioinformatics, Zhejiang University, People's Republic of China
| | - Yifei Shen
- Department of Laboratory Medicine, The First Affiliated Hospital, Zhejiang University School of Medicine, & Key Laboratory of Clinical In Vitro Diagnostic Techniques of Zhejiang Province, & Institute of Laboratory Medicine, Zhejiang University, People's Republic of China
|
15
|
Wang S, Li L, Zuo S, Kong L, Wei J, Dong J. Metabolic-related gene pairs signature analysis identifies ABCA1 expression levels on tumor-associated macrophages as a prognostic biomarker in primary IDHWT glioblastoma. Front Immunol 2022; 13:869061. [PMID: 36248907 PMCID: PMC9561761 DOI: 10.3389/fimmu.2022.869061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2022] [Accepted: 09/14/2022] [Indexed: 11/26/2022] Open
Abstract
Background Although isocitrate dehydrogenase (IDH) mutation serves as a prognostic signature for routine clinical management of glioma, nearly 90% of glioblastoma (GBM) patients have a wild-type IDH genotype (IDHWT) and lack reliable signatures to identify distinct entities. Methods To develop a robust prognostic signature for IDHWT GBM patients, we retrospectively analyzed 4 public datasets of 377 primary frozen tumor tissue transcriptome profiling and clinical follow-up data. Samples were divided into a training dataset (204 samples) and a validation dataset (173 samples). A prognostic signature consisting of 21 metabolism-related gene pairs (MRGPs) was developed based on the relative ranking of single-sample gene expression levels. GSEA and immune subtype analyses were performed to reveal differences in biological processes between MRGP risk groups. The single-cell RNA-seq dataset was used to examine the expression distribution of each MRG constituting the signature in tumor tissue subsets. Finally, the association of MRGs with tumor progression was biologically validated in orthotopic GBM models. Results The metabolic signature remained an independent prognostic factor (hazard ratio, 5.71 [3.542-9.218], P < 0.001) for stratifying patients into high- and low-risk levels in terms of overall survival across subgroups with MGMTp methylation statuses, expression subtypes, and chemo/radiotherapies. Immune-related biological processes were significantly different between MRGP risk groups. Compared with the low-risk group, the high-risk group was significantly enriched in humoral immune responses and phagocytosis processes, and had more monocyte infiltration and less activated DC, NK, and γδ T cell infiltration. scRNA-seq dataset analysis identified that the expression levels of 5 MRGs (ABCA1, HMOX1, MTHFD2, PIM1, and PTPRE) in TAMs increased with metabolic risk. With tumor progression, the expression level of ABCA1 in TAMs was positively correlated with the population of TAMs in tumor tissue. Downregulation of ABCA1 levels can promote TAM polarization towards an inflammatory phenotype and control tumor growth. Conclusions The metabolic signature is expected to be used in the individualized management of primary IDHWT GBM patients.
Affiliation(s)
- Shiqun Wang
- Jiangsu Key Laboratory of Molecular Medicine, Medical School of Nanjing University, Nanjing, Jiangsu, China
- The Cancer Hospital of the University of Chinese Academy of Sciences (Zhejiang Cancer Hospital), Institute of Basic Medicine and Cancer (IBMC), Chinese Academy of Sciences, Hangzhou, Zhejiang, China
| | - Lu Li
- Department of Nephrology, Affiliated Children’s Hospital of Zhejiang University, Hangzhou, Zhejiang, China
| | - Shuguang Zuo
- Liuzhou Key Laboratory of Molecular Diagnosis, Guangxi Key Laboratory of Molecular Diagnosis and Application, Affiliated Liutie Central Hospital of Guangxi Medical University, Liuzhou, Guangxi, China
| | - Lingkai Kong
- Jiangsu Key Laboratory of Molecular Medicine, Medical School of Nanjing University, Nanjing, Jiangsu, China
| | - Jiwu Wei
- Jiangsu Key Laboratory of Molecular Medicine, Medical School of Nanjing University, Nanjing, Jiangsu, China
- *Correspondence: Jie Dong; Jiwu Wei
| | - Jie Dong
- Jiangsu Key Laboratory of Molecular Medicine, Medical School of Nanjing University, Nanjing, Jiangsu, China
- *Correspondence: Jie Dong; Jiwu Wei
|
16
|
Guryleva MV, Penzar DD, Chistyakov DV, Mironov AA, Favorov AV, Sergeeva MG. Investigation of the Role of PUFA Metabolism in Breast Cancer Using a Rank-Based Random Forest Algorithm. Cancers (Basel) 2022; 14:cancers14194663. [PMID: 36230586 PMCID: PMC9562210 DOI: 10.3390/cancers14194663] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/09/2022] [Revised: 09/15/2022] [Accepted: 09/21/2022] [Indexed: 11/16/2022] Open
Abstract
Simple Summary: Polyunsaturated fatty acids (PUFAs) and their derivatives, oxylipins, are a constant focus of cancer research due to the relationship between cancer and processes of energy metabolism and inflammation, where a PUFA system is an active player. Only recently have methods been developed that allow for studying such complex systems. Using the Rank-based Random Forest (RF) model, we show that PUFA metabolism genes are critical for the pathogenesis of breast cancer (BC); BC subtypes differ in PUFA metabolism gene expression. The enrichment of BC subtypes with various genes associated with oxylipin signaling pathways indicates a different contribution of these compounds to the biology of subtypes.
Abstract: Polyunsaturated fatty acid (PUFA) metabolism is currently a focus in cancer research due to PUFAs functioning as structural components of the membrane matrix, as fuel sources for energy production, and as sources of secondary messengers, so-called oxylipins, important players in inflammatory processes. Although breast cancer (BC) is the leading cause of cancer death among women worldwide, no systematic study of PUFA metabolism as a system of interrelated processes in this disease has been carried out. Here, we implemented a Boruta-based feature selection algorithm to determine the list of the most important PUFA metabolism genes altered in breast cancer tissues compared with normal tissues. A rank-based Random Forest (RF) model was built on the selected gene list (33 genes) and applied to predict the cancer phenotype to ascertain the PUFA genes involved in cancerogenesis. It showed high performance in dichotomous classification (balanced accuracy of 0.94, ROC AUC 0.99). We also retrieved a list of the important PUFA genes (46 genes) that differed between breast cancer molecular subtypes. The balanced accuracy of the classification model built on the specified genes was 0.82, while the ROC AUC for the sensitivity analysis was 0.85. Specific patterns of PUFA metabolic changes were obtained for each molecular subtype of breast cancer. These results show evidence that (1) PUFA metabolism genes are critical for the pathogenesis of breast cancer; (2) BC subtypes differ in PUFA metabolism gene expression; and (3) the lists of genes selected in the models are enriched with genes involved in the metabolism of signaling lipids.
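As an illustration of two ideas named in this abstract, the following is a minimal sketch, not the authors' code. It assumes that "rank-based" means each gene's expression value is replaced by its rank within the same sample before model fitting (the abstract does not spell this out), and it computes balanced accuracy as the mean of per-class recalls. The function names `rank_within_sample` and `balanced_accuracy` are hypothetical.

```python
def rank_within_sample(expr):
    """Replace each value by its rank within one sample (1 = lowest).

    Tied values share the average of the ranks they span. That within-sample
    ranks are what the paper feeds to the Random Forest is an assumption.
    """
    order = sorted(range(len(expr)), key=lambda i: expr[i])
    ranks = [0.0] * len(expr)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and expr[order[j + 1]] == expr[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so small classes count as much as large ones."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, label in enumerate(y_true) if label == cls]
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)
```

Balanced accuracy is the natural metric when tumor and normal samples are unequal in number, because plain accuracy would reward a model that always predicts the majority class.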
Affiliation(s)
- Mariia V. Guryleva
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, 119234 Moscow, Russia
| | - Dmitry D. Penzar
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, 119234 Moscow, Russia
- Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991 Moscow, Russia
| | - Dmitry V. Chistyakov
- Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, 119992 Moscow, Russia
- Correspondence: Tel.: +7-495-939-4332
| | - Andrey A. Mironov
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, 119234 Moscow, Russia
- Kharkevich Institute of Information Transmission Problems, Russian Academy of Sciences, 127051 Moscow, Russia
| | - Alexander V. Favorov
- Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991 Moscow, Russia
- School of Medicine, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Marina G. Sergeeva
- Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, 119992 Moscow, Russia
| |
|
17
|
Ferrarotto R, Mishra V, Herz E, Yaacov A, Solomon O, Rauch R, Mondshine A, Motin M, Leibovich-Rivkin T, Davis M, Kaye J, Weber CR, Shen L, Pearson AT, Rosenberg AJ, Chen X, Singh A, Aster JC, Agrawal N, Izumchenko E. AL101, a gamma-secretase inhibitor, has potent antitumor activity against adenoid cystic carcinoma with activated NOTCH signaling. Cell Death Dis 2022; 13:678. [PMID: 35931701 PMCID: PMC9355983 DOI: 10.1038/s41419-022-05133-9] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 02/05/2022] [Revised: 07/21/2022] [Accepted: 07/25/2022] [Indexed: 01/21/2023]
Abstract
Adenoid cystic carcinoma (ACC) is an aggressive salivary gland malignancy with limited treatment options for recurrent or metastatic disease. Due to chemotherapy resistance and lack of targeted therapeutic approaches, current treatment options for the localized disease are limited to surgery and radiation, which fails to prevent locoregional recurrences and distant metastases in over 50% of patients. Approximately 20% of patients with ACC carry NOTCH-activating mutations that are associated with a distinct phenotype, aggressive disease, and poor prognosis. Given the role of NOTCH signaling in regulating tumor cell behavior, NOTCH inhibitors represent an attractive potential therapeutic strategy for this subset of ACC. AL101 (osugacestat) is a potent γ-secretase inhibitor that prevents activation of all four NOTCH receptors. While this investigational new drug has demonstrated antineoplastic activity in several preclinical cancer models and in patients with advanced solid malignancies, we are the first to study the therapeutic benefit of AL101 in ACC. Here, we describe the antitumor activity of AL101 using ACC cell lines, organoids, and patient-derived xenograft models. Specifically, we find that AL101 has potent antitumor effects in in vitro and in vivo models of ACC with activating NOTCH1 mutations and constitutively upregulated NOTCH signaling pathway, providing a strong rationale for evaluation of AL101 in clinical trials for patients with NOTCH-driven relapsed/refractory ACC.
Affiliation(s)
- Renata Ferrarotto
- Department of Head and Neck Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Vasudha Mishra
- Department of Medicine, Section of Hematology and Oncology, University of Chicago, Chicago, IL, USA
| | - Elad Herz
- Ayala Pharmaceuticals, Rehovot, Israel
| | | | | | | | | | | | | | | | - Joel Kaye
- Ayala Pharmaceuticals, Rehovot, Israel
| | | | - Le Shen
- Department of Pathology, University of Chicago, Chicago, IL, USA
| | - Alexander T Pearson
- Department of Medicine, Section of Hematology and Oncology, University of Chicago, Chicago, IL, USA
| | - Ari J Rosenberg
- Department of Medicine, Section of Hematology and Oncology, University of Chicago, Chicago, IL, USA
| | - Xiangying Chen
- Department of Medicine, Section of Hematology and Oncology, University of Chicago, Chicago, IL, USA
| | - Alka Singh
- Department of Medicine, Section of Hematology and Oncology, University of Chicago, Chicago, IL, USA
| | - Jon C Aster
- Department of Pathology, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
| | - Nishant Agrawal
- Department of Surgery, Section of Otolaryngology-Head and Neck Surgery, University of Chicago, Chicago, IL, USA
| | - Evgeny Izumchenko
- Department of Medicine, Section of Hematology and Oncology, University of Chicago, Chicago, IL, USA.
| |
|
18
|
Liu Y, Lin Y, Yang W, Lin Y, Wu Y, Zhang Z, Lin N, Wang X, Tong M, Yu R. Application of individualized differential expression analysis in human cancer proteome. Brief Bioinform 2022; 23:6562685. [DOI: 10.1093/bib/bbac096] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2021] [Revised: 02/06/2022] [Accepted: 02/23/2022] [Indexed: 11/13/2022] Open
Abstract
Liquid chromatography–mass spectrometry-based quantitative proteomics can measure the expression of thousands of proteins from biological samples and has been increasingly applied in cancer research. Identifying differentially expressed proteins (DEPs) between tumors and normal controls is commonly used to investigate carcinogenesis mechanisms. While differential expression analysis (DEA) at an individual level is desired to identify patient-specific molecular defects for better patient stratification, most statistical DEP analysis methods only identify deregulated proteins at the population level. To date, robust individualized DEA algorithms have been proposed for ribonucleic acid data, but their performance on proteomics data is underexplored. Herein, we performed a systematic evaluation of five individualized DEA algorithms for proteins on cancer proteomic datasets from seven cancer types. Results show that the within-sample relative expression orderings (REOs) of protein pairs in normal tissues were highly stable, providing the basis for individualized DEA for proteins using REOs. Moreover, individualized DEA algorithms achieve higher precision in detecting sample-specific deregulated proteins than population-level methods. To facilitate the utilization of individualized DEA algorithms in proteomics for prognostic biomarker discovery and personalized medicine, we provide Individualized DEP Analysis IDEPAXMBD (XMBD: Xiamen Big Data, a biomedical open software initiative in the National Institute for Data Science in Health and Medicine, Xiamen University, China) (https://github.com/xmuyulab/IDEPA-XMBD), a user-friendly and open-source Python toolkit that integrates individualized DEA algorithms for DEP-associated deregulation pattern recognition.
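The REO idea in this abstract can be sketched in a few lines. The code below is an illustrative sketch only and is not taken from the IDEPA-XMBD toolkit: the function names, the 0.95 stability threshold, and the list-of-lists data layout are all assumptions. It first finds protein pairs whose ordering is consistent across normal samples, then flags the pairs whose ordering is reversed in a single tumor sample.

```python
from itertools import combinations


def stable_reos(normals, threshold=0.95):
    """Return (higher, lower) index pairs whose ordering holds in at least
    `threshold` of the normal samples. The threshold value is an assumption."""
    n_samples = len(normals)
    stable = []
    for i, j in combinations(range(len(normals[0])), 2):
        i_above = sum(s[i] > s[j] for s in normals) / n_samples
        j_above = sum(s[j] > s[i] for s in normals) / n_samples
        if i_above >= threshold:
            stable.append((i, j))
        elif j_above >= threshold:
            stable.append((j, i))
    return stable


def reversed_pairs(sample, stable):
    """Stable pairs whose ordering is flipped in one individual sample."""
    return [(hi, lo) for hi, lo in stable if sample[hi] < sample[lo]]
```

Because only the ordering inside one sample is compared, a single patient can be analysed without a matched cohort. Published individualized DEA methods add a statistical test on top of the reversal counts, which this sketch omits.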
Affiliation(s)
- Yachen Liu
- School of Informatics, Xiamen University, Xiamen, Fujian 316000, China
- National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, Fujian 316005, China
| | - Yalan Lin
- School of Informatics, Xiamen University, Xiamen, Fujian 316000, China
| | - Wenxian Yang
- Aginome Scientific, Xiamen, Fujian 316005, China
| | - Yuxiang Lin
- National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, Fujian 316005, China
- State Key Laboratory of Cellular Stress Biology, Innovation Center for Cell Signaling Network, School of Life Sciences, Xiamen University, Xiamen, Fujian 361102, China
| | - Yujuan Wu
- School of Informatics, Xiamen University, Xiamen, Fujian 316000, China
| | - Zheyang Zhang
- National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, Fujian 316005, China
- State Key Laboratory of Cellular Stress Biology, Innovation Center for Cell Signaling Network, School of Life Sciences, Xiamen University, Xiamen, Fujian 361102, China
| | - Nuoqi Lin
- State Key Laboratory of Cellular Stress Biology, Innovation Center for Cell Signaling Network, School of Life Sciences, Xiamen University, Xiamen, Fujian 361102, China
| | - Xianlong Wang
- Department of Bioinformatics, School of Medical Technology and Engineering, Key Laboratory of Medical Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, Fujian Medical University, Fuzhou, Fujian 350122, China
| | - Mengsha Tong
- National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, Fujian 316005, China
- State Key Laboratory of Cellular Stress Biology, Innovation Center for Cell Signaling Network, School of Life Sciences, Xiamen University, Xiamen, Fujian 361102, China
| | - Rongshan Yu
- School of Informatics, Xiamen University, Xiamen, Fujian 316000, China
- National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, Fujian 316005, China
- Aginome Scientific, Xiamen, Fujian 316005, China
| |
|
19
|
Sahtout MO, Wang H, Ghimire S. Different thresholding methods on Nearest Shrunken Centroid algorithm. COMMUN STAT-SIMUL C 2022. [DOI: 10.1080/03610918.2022.2047201] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022]
Affiliation(s)
| | - Haiyan Wang
- Department of Statistics, Kansas State University, Manhattan, KS, USA
| | - Santosh Ghimire
- Department of Applied Sciences and Chemical Engineering, Pulchowk Campus, Tribhuvan University, Kirtipur, Nepal
| |
|
20
|
Huang X, Liao Z, Liu B, Tao F, Su B, Lin X. A Novel Method for Constructing Classification Models by Combining Different Biomarker Patterns. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:786-794. [PMID: 32894721 DOI: 10.1109/tcbb.2020.3022076] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/11/2023]
Abstract
Different biomarker patterns, such as those of molecular biomarkers and ratio biomarkers, have their own merits in clinical applications. In this study, a novel machine learning method used in biomedical data analysis for constructing classification models by combining different biomarker patterns (CDBP) is proposed. CDBP uses relative expression reversals to measure the discriminative ability of different biomarker patterns, and selects the pattern with the higher score for classifier construction. The decision boundary of CDBP can be characterized in simple and biologically meaningful manners. The CDBP method was compared with eight state-of-the-art methods on eight gene expression datasets to test its performance. CDBP, with fewer features or ratio features, had the highest classification performance. Subsequently, CDBP was employed to extract crucial diagnostic information from a rat hepatocarcinogenesis metabolomics dataset. The potential biomarkers selected by CDBP provided better classification of hepatocellular carcinoma (HCC) and non-HCC stages than previous works in the animal model. The statistical analyses of these potential biomarkers in an independent human dataset confirmed their ability to discriminate between different liver diseases. These experimental results highlight the potential of CDBP for biomarker identification from high-dimensional biomedical datasets and demonstrate that it can be a useful tool for disease classification.
|
21
|
IndGOterm: a qualitative method for the identification of individually dysregulated GO terms in cancer. Brief Bioinform 2022; 23:6526723. [DOI: 10.1093/bib/bbac012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2021] [Revised: 12/24/2021] [Accepted: 01/08/2022] [Indexed: 11/12/2022] Open
Abstract
Individual pathway analysis can dissect heterogeneities among different cancer patients and provide efficient guidelines for individualized therapy. However, the existence of the batch effect brings extensive limitations for the application of many individual methods for pathway analysis. Previously, researchers proposed that methods based on within-sample relative expression ordering (REO) of the genes are notably insensitive to ‘batch effects’. In this article, we focus on the Gene Ontology (GO) database and propose an individual qualitative GO term analysis method (IndGOterm) based on the REO of genes. Compared with some current widely used single-sample enrichment analysis methods, such as ssGSEA and GSVA, IndGOterm has a predominance of ignoring the batch effects caused by diverse technologies. Through the survival and drug responses analysis, we found IndGOterm could capture more terms connected to cancer than other single-sample enrichment analysis methods. Furthermore, through the application of IndGOterm, we found some terms that present different dysregulation models that manifest heterogenetic in homologous patients. Collectively, these results attested that IndGOterm could capture useful information from patients and be a useful tool to reveal the intrinsic characteristic of cancer. An open-source R statistical analysis package ‘IndGOterm’ is available at https://github.com/robert19960424/IndGOterm.
|
22
|
Ghantous Y, Omar M, Broner EC, Agrawal N, Pearson AT, Rosenberg AJ, Mishra V, Singh A, Abu El-naaj I, Savage PA, Sidransky D, Marchionni L, Izumchenko E. A robust and interpretable gene signature for predicting the lymph node status of primary T1/T2 oral cavity squamous cell carcinoma. Int J Cancer 2022; 150:450-460. [PMID: 34569064 PMCID: PMC8760163 DOI: 10.1002/ijc.33828] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/29/2021] [Revised: 08/31/2021] [Accepted: 09/21/2021] [Indexed: 02/03/2023]
Abstract
Oral cavity squamous cell carcinoma (OSCC) affects more than 30 000 individuals in the United States annually, with smoking and alcohol consumption being the main risk factors. Management of early-stage tumors usually includes surgical resection followed by postoperative radiotherapy in certain cases. The cervical lymph nodes (LNs) are the most common site for local metastasis, and elective neck dissection is usually performed if the primary tumor thickness is greater than 3.5 mm. However, postoperative histological examination often reveals that many patients with early-stage disease are negative for neck nodal metastasis, posing a pressing need for improved risk stratification to either avoid overtreatment or prevent the disease progression. To this end, we aimed to identify a primary tumor gene signature that can accurately predict cervical LN metastasis in patients with early-stage OSCC. Using gene expression profiles from 189 samples, we trained K-top scoring pairs models and identified six gene pairs that can distinguish primary tumors with nodal metastasis from those without metastasis. The signature was further validated on an independent cohort of 35 patients using real-time polymerase chain reaction (PCR) in which it achieved an area under the receiver operating characteristic (ROC) curve and accuracy of 90% and 91%, respectively. These results indicate that such signature holds promise as a quick and cost effective method for detecting patients at high risk of developing cervical LN metastasis, and may be potentially used to guide the neck treatment regimen in early-stage OSCC.
Affiliation(s)
- Yasmin Ghantous
- Department of Otolaryngology and Head & Neck Surgery, Johns Hopkins University, School of Medicine, Baltimore, MD, USA; Department of Medicine, University of Chicago, Chicago, IL, USA; Department of Oral and Maxillofacial Surgery, Baruch Padeh Medical Center, Faculty of Medicine, Bar Ilan University, Israel
| | - Mohamed Omar
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
| | - Esther Channah Broner
- Department of Otolaryngology and Head & Neck Surgery, Johns Hopkins University, School of Medicine, Baltimore, MD, USA; Department of Medicine, University of Chicago, Chicago, IL, USA
| | - Nishant Agrawal
- Section of Otolaryngology-Head and Neck Surgery, University of Chicago, Chicago, IL, USA
| | - Alexander T. Pearson
- Department of Medicine, Section of Hematology and Oncology, University of Chicago, Chicago, IL, USA
| | - Ari J. Rosenberg
- Department of Medicine, Section of Hematology and Oncology, University of Chicago, Chicago, IL, USA
| | - Vasudha Mishra
- Department of Medicine, Section of Hematology and Oncology, University of Chicago, Chicago, IL, USA
| | - Alka Singh
- Department of Medicine, Section of Hematology and Oncology, University of Chicago, Chicago, IL, USA
| | - Imad Abu El-naaj
- Department of Oral and Maxillofacial Surgery, Baruch Padeh Medical Center, Faculty of Medicine, Bar Ilan University, Israel
| | - Peter A. Savage
- Department of Pathology, University of Chicago, Chicago, IL, USA
| | - David Sidransky
- Department of Otolaryngology and Head & Neck Surgery, Johns Hopkins University, School of Medicine, Baltimore, MD, USA; Department of Medicine, University of Chicago, Chicago, IL, USA. Corresponding Authors: Evgeny Izumchenko, Department of Medicine, Section of Hematology and Oncology, University of Chicago, Chicago, IL, USA; Luigi Marchionni, Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA; and David Sidransky, Departments of Otolaryngology and Oncology, Johns Hopkins University, Baltimore, MD, USA
| | - Luigi Marchionni
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
| | - Evgeny Izumchenko
- Department of Medicine, Section of Hematology and Oncology, University of Chicago, Chicago, IL, USA
| |
|
23
|
Kim DM, Feilotter HE, Davey SK. BRCA1 Variant Assessment Using a Simple Analytic Assay. J Appl Lab Med 2022; 7:674-688. [PMID: 35021209 DOI: 10.1093/jalm/jfab163] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2021] [Accepted: 10/04/2021] [Indexed: 11/14/2022]
Abstract
BACKGROUND We previously developed a biological assay to accurately predict BRCA1 (BRCA1 DNA repair associated) mutation status, based on gene expression profiles of Epstein-Barr virus-transformed lymphoblastoid cell lines. The original work was done using whole genome expression microarrays, and nearest shrunken centroids analysis. While these approaches are appropriate for model building, they are difficult to implement clinically, where more targeted testing and analysis are required for time and cost savings. METHODS Here, we describe adaptation of the original predictor to use the NanoString nCounter platform for testing, with analysis based on the k-top scoring pairs (k-TSP) method. RESULTS Assessing gene expression using the nCounter platform on a set of lymphoblastoid cell lines yielded 93.8% agreement with the microarray-derived data, and 87.5% overall correct classification of BRCA1 carriers and controls. Using the original gene expression microarray data used to develop our predictor with nearest shrunken centroids, we rebuilt a classifier based on the k-TSP method. This classifier relies on the relative expression of 10 pairs of genes, compared to the original 43 identified by nearest shrunken centroids (NSC), and was 96.2% concordant with the original training set prediction, with a 94.3% overall correct classification of BRCA1 carriers and controls. CONCLUSIONS The k-TSP classifier was shown to accurately predict BRCA1 status using data generated on the nCounter platform and is feasible for initiating a clinical validation.
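To make the k-TSP decision rule mentioned in this abstract concrete, here is a minimal sketch under stated assumptions. It is not the authors' classifier or the switchBox implementation: `train_ktsp` and `predict_ktsp` are hypothetical names, pairs are scored by the absolute difference between classes in the frequency of `x[i] < x[j]`, the top disjoint pairs are kept, and prediction is an unweighted majority vote.

```python
from itertools import combinations


def train_ktsp(X, y, k, pos=1):
    """Pick up to k disjoint feature pairs whose ordering differs most between classes.

    Each returned pair (a, b) is oriented so that sample[a] < sample[b]
    counts as a vote for the positive class.
    """
    pos_rows = [x for x, label in zip(X, y) if label == pos]
    neg_rows = [x for x, label in zip(X, y) if label != pos]
    scored = []
    for i, j in combinations(range(len(X[0])), 2):
        p_pos = sum(x[i] < x[j] for x in pos_rows) / len(pos_rows)
        p_neg = sum(x[i] < x[j] for x in neg_rows) / len(neg_rows)
        pair = (i, j) if p_pos >= p_neg else (j, i)
        scored.append((abs(p_pos - p_neg), pair))
    scored.sort(key=lambda item: -item[0])  # stable sort, best score first
    used, rules = set(), []
    for _score, (a, b) in scored:
        if a in used or b in used:
            continue  # keep pairs disjoint so each feature votes once
        rules.append((a, b))
        used.update((a, b))
        if len(rules) == k:
            break
    return rules


def predict_ktsp(rules, sample, pos=1, neg=0):
    """Majority vote over the pairwise comparisons; ties go to the negative class."""
    votes = sum(sample[a] < sample[b] for a, b in rules)
    return pos if votes > len(rules) / 2 else neg
```

An odd k avoids tied votes. With an even number of pairs, such as the 10 pairs reported here, some tie-breaking convention is needed, and the one used in this sketch (ties go to the negative class) is arbitrary.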
Affiliation(s)
- Daniel M Kim
- Department of Pathology and Molecular Medicine, Queen's University Cancer Research Institute, Queen's University, Kingston, ON, Canada; Division of Cancer Biology and Genetics, Queen's University Cancer Research Institute, Queen's University, Kingston, ON, Canada
| | - Harriet E Feilotter
- Department of Pathology and Molecular Medicine, Queen's University Cancer Research Institute, Queen's University, Kingston, ON, Canada; Division of Cancer Biology and Genetics, Queen's University Cancer Research Institute, Queen's University, Kingston, ON, Canada
| | - Scott K Davey
- Department of Pathology and Molecular Medicine, Queen's University Cancer Research Institute, Queen's University, Kingston, ON, Canada; Division of Cancer Biology and Genetics, Queen's University Cancer Research Institute, Queen's University, Kingston, ON, Canada; Departments of Oncology and Biomedical and Molecular Sciences, Queen's University Cancer Research Institute, Queen's University, Kingston, ON, Canada
| |
|
24
|
Laganà A. The Architecture of a Precision Oncology Platform. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2022; 1361:1-22. [DOI: 10.1007/978-3-030-91836-1_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
|
25
|
Li C, Gao Z, Su B, Xu G, Lin X. Data analysis methods for defining biomarkers from omics data. Anal Bioanal Chem 2021; 414:235-250. [PMID: 34951658 DOI: 10.1007/s00216-021-03813-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 09/15/2021] [Revised: 11/26/2021] [Accepted: 11/29/2021] [Indexed: 02/01/2023]
Abstract
Omics mainly includes genomics, epigenomics, transcriptomics, proteomics and metabolomics. The rapid development of omics technology has opened up new ways to study disease diagnosis and prognosis and to define prospective information of complex diseases. Since omics data are usually large and complex, the method used to analyze the data and to define important information is crucial in omics study. In this review, we focus on advances in biomarker discovery methods based on omics data in the last decade, and categorize them as individual feature analysis, combinatorial feature analysis and network analysis. We also discuss the challenges and perspectives in this field.
Affiliation(s)
- Chao Li
- School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, China
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, Liaoning, China
| | - Zhenbo Gao
- School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, China
| | - Benzhe Su
- School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, China
| | - Guowang Xu
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, Liaoning, China
| | - Xiaohui Lin
- School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, China.
| |
|
26
|
Eriksson P, Marzouka NAD, Sjödahl G, Bernardo C, Liedberg F, Höglund M. A comparison of rule-based and centroid single-sample multiclass predictors for transcriptomic classification. Bioinformatics 2021; 38:1022-1029. [PMID: 34788787 PMCID: PMC8796360 DOI: 10.1093/bioinformatics/btab763] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 07/21/2021] [Revised: 10/24/2021] [Accepted: 11/02/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION Gene expression-based multiclass prediction, such as tumor subtyping, is a non-trivial bioinformatic problem. Most classifier methods operate by comparing expression levels relative to other samples. Methods that base predictions on the expression pattern within a sample have been proposed as an alternative. As these methods are invariant to the cohort composition and can be applied to a sample in isolation, they can collectively be termed single sample predictors (SSP). Such predictors could potentially be used for preprocessing-free classification of new samples and be built to function across different expression platforms where proper batch and dataset normalization is challenging. Here, we evaluate the behavior of several multiclass SSPs based on binary gene-pair rules (k-Top Scoring Pairs, Absolute Intrinsic Molecular Subtyping and a new Random Forest approach) and compare them to centroids built with centered or raw expression values, with the criteria that an optimal predictor should have high accuracy, overcome differences in tumor purity, be robust across expression platforms and provide an informative prediction output score. RESULTS We found that gene-pair-based SSPs showed excellent performance on many expression-based classification tasks. The three methods differed in prediction score output, handling of tied scores and behavior in low purity samples. The k-Top Scoring Pairs and Random Forest approach both achieved high classification accuracy while providing an informative prediction score. Although gene-pair-based SSPs have been touted as being cross-platform compatible (through training on mixed platform data), out-of-the-box compatibility with a new dataset remains a potential issue that warrants cohort-to-cohort verification. 
AVAILABILITY AND IMPLEMENTATION Our R package 'multiclassPairs' (https://cran.r-project.org/package=multiclassPairs) (https://doi.org/10.1093/bioinformatics/btab088) is freely available and enables easy training, prediction, and visualization using the gene-pair rule-based Random Forest SSP method and provides additional multiclass functionalities to the switchBox k-Top-Scoring Pairs package. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
| | - Nour-al-dain Marzouka
- Division of Oncology, Department of Clinical Sciences, Lund University, Lund, Sweden
| | - Gottfrid Sjödahl
- Urology - urothelial cancer, Department of Translational Medicine, Lund University, Skåne University Hospital, Malmö, Sweden
| | - Carina Bernardo
- Division of Oncology, Department of Clinical Sciences, Lund University, Lund, Sweden
| | - Fredrik Liedberg
- Urology - urothelial cancer, Department of Translational Medicine, Lund University, Skåne University Hospital, Malmö, Sweden
| | - Mattias Höglund
- Division of Oncology, Department of Clinical Sciences, Lund University, Lund, Sweden
| |
|
27
|
Omar M, Marchionni L, Häcker G, Badr MT. Host Blood Gene Signatures Can Detect the Progression to Severe and Cerebral Malaria. Front Cell Infect Microbiol 2021; 11:743616. [PMID: 34746025 PMCID: PMC8569259 DOI: 10.3389/fcimb.2021.743616] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/29/2021] [Accepted: 09/23/2021] [Indexed: 11/16/2022] Open
Abstract
Malaria is a major international public health problem that affects millions of patients worldwide, especially in sub-Saharan Africa. Although many tests have been developed to diagnose malaria infections, we still lack reliable diagnostic biomarkers for the identification of disease severity, especially in endemic areas where the diagnosis of cerebral malaria is very difficult and requires the exclusion of all other possible causes. Previous host and pathogen transcriptomic studies have not yielded homogeneous results that can be harnessed into a reliable diagnostic tool. Here we utilized a multi-cohort analysis approach using machine-learning algorithms to identify blood gene signatures that can distinguish severe and cerebral malaria from moderate and non-cerebral cases. Using a Regularized Random Forest model, we identified 28-gene and 32-gene signatures that can reliably distinguish severe and cerebral malaria, respectively. We tested the specificity of both signatures against other common infectious diseases to ensure the signatures' reliability and suitability as diagnostic markers. The severe and cerebral malaria gene signatures were further integrated through k-top scoring pairs classifiers into ten and nine gene pairs that could distinguish severe and cerebral malaria, respectively. These signatures have various implications that can be utilized as blood diagnostic tools for malaria severity in endemic countries.
Affiliation(s)
- Mohamed Omar
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, United States
| | - Luigi Marchionni
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, United States
| | - Georg Häcker
- Institute of Medical Microbiology and Hygiene, Medical Center - University of Freiburg, Faculty of Medicine, Freiburg, Germany; BIOSS Centre for Biological Signaling Studies, University of Freiburg, Freiburg, Germany
| | - Mohamed Tarek Badr
- Institute of Medical Microbiology and Hygiene, Medical Center - University of Freiburg, Faculty of Medicine, Freiburg, Germany; IMM-PACT-Program, Faculty of Medicine, University of Freiburg, Freiburg, Germany
| |
|
28
|
Breast Cancer Consensus Subtypes: A system for subtyping breast cancer tumors based on gene expression. NPJ Breast Cancer 2021; 7:136. [PMID: 34642313 PMCID: PMC8511026 DOI: 10.1038/s41523-021-00345-2] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 10/30/2020] [Accepted: 09/21/2021] [Indexed: 12/11/2022] Open
Abstract
Breast cancer is heterogeneous in prognoses and drug responses. To organize breast cancers by gene expression independent of statistical methodology, we identified the Breast Cancer Consensus Subtypes (BCCS) as the consensus groupings of six different subtyping methods. Our classification software identified seven BCCS subtypes in a study cohort of publicly available data (n = 5950) including METABRIC, TCGA-BRCA, and data assayed by Affymetrix arrays. All samples were fresh-frozen from primary tumors. The estrogen receptor-positive (ER+) BCCS subtypes were: PCS1 (18%) good prognosis, stromal infiltration; PCS2 (15%) poor prognosis, highly proliferative; PCS3 (13%) poor prognosis, highly proliferative, activated IFN-gamma signaling, cytotoxic lymphocyte infiltration, high tumor mutation burden; PCS4 (18%) good prognosis, hormone response genes highly expressed. The ER− BCCS subtypes were: NCS1 (11%) basal; NCS2 (10%) elevated androgen response; NCS3 (5%) cytotoxic lymphocyte infiltration; unclassified tumors (9%). HER2+ tumors were heterogeneous with respect to BCCS.
|
29
|
Xue Z, Yang S, Luo Y, Cai H, He M, Ding Y, Lei L, Peng W, Hong G, Guo Y. A 41-Gene Pair Signature for Predicting the Pathological Response of Locally Advanced Rectal Cancer to Neoadjuvant Chemoradiation. Front Med (Lausanne) 2021; 8:744295. [PMID: 34595195 PMCID: PMC8476893 DOI: 10.3389/fmed.2021.744295] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/20/2021] [Accepted: 08/23/2021] [Indexed: 01/04/2023]
Abstract
Background and Purpose: Pathological response status is a standard reference for the early evaluation of the effect of neoadjuvant chemoradiation (nCRT) on locally advanced rectal cancer (LARC) patients. Various patients respond differently to nCRT, but identifying the pathological response of LARC to nCRT remains a challenge. Therefore, we aimed to identify a signature that can predict the response of LARC to nCRT. Material and Methods: The gene expression profiles of 111 LARC patients receiving fluorouracil-based nCRT were used to obtain gene pairs with within-sample relative expression orderings related to pathological response. These reversal gene pairs were ranked according to the mean decrease Gini index provided by the random forest algorithm to obtain the signature. This signature was verified in two public cohorts of 46 and 42 samples, and a cohort of 33 samples measured at our laboratory. In addition, the signature was used to predict disease-free survival benefits in a series of colorectal cancer datasets. Results: A 41-gene pair signature (41-GPS) was identified in the training cohort with an accuracy of 84.68% and an area under the receiver operating characteristic curve (AUC) of 0.94. In the two public test cohorts, the accuracy was 93.37 and 73.81%, with AUCs of 0.97 and 0.86, respectively. In our dataset, the AUC was 0.80. The results of the survival analysis show that 41-GPS plays an effective role in identifying patients who will respond to nCRT and have a better prognosis. Conclusion: The signature consisting of 41 gene pairs can robustly predict the clinical pathological response of LARC patients to nCRT.
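The within-sample relative expression orderings (REOs) mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the gene names, expression values, and the `reo_features` helper are hypothetical, and the random forest ranking step is not shown.

```python
from typing import Dict, List, Tuple


def reo_features(sample: Dict[str, float],
                 pairs: List[Tuple[str, str]]) -> List[int]:
    """Encode one sample as binary relative-expression-ordering features.

    Each feature is 1 when the first gene of the pair is expressed
    above the second gene in this sample, and 0 otherwise.
    """
    return [1 if sample[a] > sample[b] else 0 for a, b in pairs]


# Hypothetical gene names and expression values, for illustration only.
pairs = [("GENE_A", "GENE_B"), ("GENE_C", "GENE_A")]
sample = {"GENE_A": 8.2, "GENE_B": 5.1, "GENE_C": 6.7}
features = reo_features(sample, pairs)
print(features)
```

Because each feature depends only on an ordering inside one sample, a binary vector like this could be passed to any classifier, such as a random forest, without cross-sample normalization.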
Affiliation(s)
- Zhengfa Xue
- School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou, China; Medical Big Data and Bioinformatics Research Centre, First Affiliated Hospital of Gannan Medical University, Ganzhou, China
| | - Shuxin Yang
- School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou, China
| | - Yun Luo
- Medical Big Data and Bioinformatics Research Centre, First Affiliated Hospital of Gannan Medical University, Ganzhou, China
| | - Hao Cai
- Medical Big Data and Bioinformatics Research Centre, First Affiliated Hospital of Gannan Medical University, Ganzhou, China
| | - Ming He
- Medical Big Data and Bioinformatics Research Centre, First Affiliated Hospital of Gannan Medical University, Ganzhou, China
| | - Youping Ding
- Medical Big Data and Bioinformatics Research Centre, First Affiliated Hospital of Gannan Medical University, Ganzhou, China
| | - Lei Lei
- Medical Big Data and Bioinformatics Research Centre, First Affiliated Hospital of Gannan Medical University, Ganzhou, China
| | - Wei Peng
- Medical Big Data and Bioinformatics Research Centre, First Affiliated Hospital of Gannan Medical University, Ganzhou, China
| | - Guini Hong
- School of Medical Information Engineering, Gannan Medical University, Ganzhou, China
| | - You Guo
- School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou, China; Medical Big Data and Bioinformatics Research Centre, First Affiliated Hospital of Gannan Medical University, Ganzhou, China
| |
|
30
|
Yang Y, Zhang T, Xiao R, Hao X, Zhang H, Qu H, Xie B, Wang T, Fang X. Platform-independent approach for cancer detection from gene expression profiles of peripheral blood cells. Brief Bioinform 2021; 21:1006-1015. [PMID: 30895303 DOI: 10.1093/bib/bbz027] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 12/11/2018] [Revised: 02/04/2019] [Accepted: 02/18/2019] [Indexed: 01/08/2023]
Abstract
Peripheral blood gene expression intensity-based methods for distinguishing healthy individuals from cancer patients are limited by sensitivity to batch effects and data normalization and variability between expression profiling assays. To improve the robustness and precision of blood gene expression-based tumour detection, it is necessary to perform molecular diagnostic tests using a more stable approach. Taking breast cancer as an example, we propose a machine learning-based framework that distinguishes breast cancer patients from healthy subjects by pairwise rank transformation of gene expression intensity in each sample. We showed the diagnostic potential of the method by performing RNA-seq for 37 peripheral blood samples from breast cancer patients and by collecting RNA-seq data from healthy donors in Genotype-Tissue Expression project and microarray mRNA expression datasets in Gene Expression Omnibus. The framework was insensitive to experimental batch effects and data normalization, and it can be simultaneously applied to new sample prediction.
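One reason rank-based transformations are insensitive to normalization is that any strictly increasing rescaling of a sample leaves its within-sample ranks unchanged. The Python sketch below illustrates only that property; the values and the `within_sample_ranks` helper are hypothetical, and it does not reproduce the exact pairwise rank transformation used in the paper.

```python
import math
from typing import List


def within_sample_ranks(values: List[float]) -> List[int]:
    """Return the rank (0 = lowest) of each value within one sample.

    Ties are broken by position, which is an arbitrary choice made
    for this illustration.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order):
        ranks[idx] = rank
    return ranks


# Hypothetical raw intensities for four genes in one sample.
raw = [120.0, 15.0, 980.0, 47.0]
# A strictly increasing transformation, standing in for a normalization step.
log_scaled = [math.log2(v + 1.0) for v in raw]

print(within_sample_ranks(raw))
print(within_sample_ranks(log_scaled))
```

Both calls print the same ranks, so a classifier trained on ranks would see identical input before and after the rescaling.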
Affiliation(s)
- Yadong Yang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
| | - Tao Zhang
- BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
| | - Rudan Xiao
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
| | - Xiaopeng Hao
- Breast Oncology Department, Affiliated Hospital, Academy of Military Medical Sciences, Beijing, China
| | - Huiqiang Zhang
- Breast Oncology Department, Affiliated Hospital, Academy of Military Medical Sciences, Beijing, China
| | - Hongzhu Qu
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
| | - Bingbing Xie
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
| | - Tao Wang
- Breast Oncology Department, Affiliated Hospital, Academy of Military Medical Sciences, Beijing, China
| | - Xiangdong Fang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
| |
|
31
|
Cirenajwis H, Lauss M, Planck M, Vallon-Christersson J, Staaf J. Performance of gene expression-based single sample predictors for assessment of clinicopathological subgroups and molecular subtypes in cancers: a case comparison study in non-small cell lung cancer. Brief Bioinform 2021; 21:729-740. [PMID: 30721923 PMCID: PMC7299291 DOI: 10.1093/bib/bbz008] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 10/09/2018] [Revised: 12/04/2018] [Accepted: 01/07/2019] [Indexed: 12/14/2022]
Abstract
The development of multigene classifiers for cancer prognosis, treatment prediction, molecular subtypes or clinicopathological groups has been a cornerstone in transcriptomic analyses of human malignancies for nearly two decades. However, many reported classifiers are critically limited by different preprocessing needs like normalization and data centering. In response, a new breed of classifiers, single sample predictors (SSPs), has emerged. SSPs classify samples in an N-of-1 fashion, relying on, e.g. gene rules comparing expression values within a sample. To date, several methods have been reported, but there is a lack of head-to-head performance comparison for typical cancer classification problems, representing an unmet methodological need in cancer bioinformatics. To resolve this need, we performed an evaluation of two SSPs [k-top-scoring pair classifier (kTSP) and absolute intrinsic molecular subtyping (AIMS)] for two case examples of different magnitude of difficulty in non-small cell lung cancer: gene expression–based classification of (i) tumor histology and (ii) molecular subtype. Through the analysis of ~2000 lung cancer samples for each case example (n = 1918 and n = 2106, respectively), we compared the performance of the methods for different sample compositions, training data set sizes, gene expression platforms and gene rule selections. Three main conclusions are drawn from the comparisons: both methods are platform independent, they select largely overlapping gene rules associated with actual underlying tumor biology and, for large training data sets, they behave interchangeably performance-wise. While SSPs like AIMS and kTSP offer new possibilities to move gene expression signatures/predictors closer to a clinical context, they are still importantly limited by the difficulty of the classification problem at hand.
Affiliation(s)
- Helena Cirenajwis
- Division of Oncology and Pathology, Department of Clinical Sciences Lund, Lund University, Medicon Village, Lund, Sweden
| | - Martin Lauss
- Division of Oncology and Pathology, Department of Clinical Sciences Lund, Lund University, Medicon Village, Lund, Sweden
| | - Maria Planck
- Division of Oncology and Pathology, Department of Clinical Sciences Lund, Lund University, Medicon Village, Lund, Sweden
| | - Johan Vallon-Christersson
- Division of Oncology and Pathology, Department of Clinical Sciences Lund, Lund University, Medicon Village, Lund, Sweden
| | - Johan Staaf
- Division of Oncology and Pathology, Department of Clinical Sciences Lund, Lund University, Medicon Village, Lund, Sweden
| |
|
32
|
Lenhof K, Gerstner N, Kehl T, Eckhart L, Schneider L, Lenhof HP. Merida: a novel boolean logic based integer linear program for personalized cancer therapy. Bioinformatics 2021; 37:3881-3888. [PMID: 34352075 PMCID: PMC8570817 DOI: 10.1093/bioinformatics/btab546] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/22/2021] [Revised: 07/07/2021] [Accepted: 08/03/2021] [Indexed: 11/13/2022]
Abstract
Motivation A major goal of personalized medicine in oncology is the optimization of treatment strategies given measurements of the genetic and molecular profiles of cancer cells. To further our knowledge on drug sensitivity, machine learning techniques are commonly applied to cancer cell line panels. Results We present a novel integer linear programming formulation, called MEthod for Rule Identification with multi-omics DAta (MERIDA), for predicting the drug sensitivity of cancer cells. The method represents a modified version of the LOBICO method and yields easily interpretable models amenable to a Boolean logic-based interpretation. Since the proposed altered logical rules lead to an enormous acceleration of the running times of MERIDA compared to LOBICO, we can not only consider larger input feature sets integrated from genetic and molecular omics data but also build more comprehensive models that mirror the complexity of cancer initiation and progression. Moreover, we enable the inclusion of a priori knowledge that can either stem from biomarker databases or be newly acquired knowledge gathered iteratively by previous runs of MERIDA. Our results show that this approach not only leads to improved predictive performance but also identifies a variety of putative sensitivity and resistance biomarkers. We also compare our approach to state-of-the-art machine learning methods and demonstrate the superior performance of our method. Hence, MERIDA has great potential to deepen our understanding of the molecular mechanisms causing drug sensitivity or resistance. Availability and implementation The corresponding code is available on GitHub (https://github.com/unisb-bioinf/MERIDA.git). Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Kerstin Lenhof
- Center for Bioinformatics Saar, Saarland University, Saarland Informatics Campus (E2.1), Saarbrücken, 66123, Germany
| | - Nico Gerstner
- Center for Bioinformatics Saar, Saarland University, Saarland Informatics Campus (E2.1), Saarbrücken, 66123, Germany
| | - Tim Kehl
- Center for Bioinformatics Saar, Saarland University, Saarland Informatics Campus (E2.1), Saarbrücken, 66123, Germany
| | - Lea Eckhart
- Center for Bioinformatics Saar, Saarland University, Saarland Informatics Campus (E2.1), Saarbrücken, 66123, Germany
| | - Lara Schneider
- Center for Bioinformatics Saar, Saarland University, Saarland Informatics Campus (E2.1), Saarbrücken, 66123, Germany
| | - Hans-Peter Lenhof
- Center for Bioinformatics Saar, Saarland University, Saarland Informatics Campus (E2.1), Saarbrücken, 66123, Germany
| |
|
33
|
Shen Y, Chu Q, Timko MP, Fan L. scDetect: a rank-based ensemble learning algorithm for cell type identification of single-cell RNA sequencing in cancer. Bioinformatics 2021; 37:4115-4122. [PMID: 34048541 DOI: 10.1093/bioinformatics/btab410] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/04/2020] [Revised: 05/16/2021] [Accepted: 05/27/2021] [Indexed: 01/23/2023]
Abstract
MOTIVATION Single-cell RNA sequencing (scRNA-seq) has enabled the characterization of different cell types in many tissues and tumor samples. Cell type identification is essential for single-cell RNA profiling, which is currently transforming the life sciences. Often, this is achieved by searching for combinations of genes that have previously been implicated as being cell-type specific, an approach that is not quantitative and does not explicitly take advantage of other scRNA-seq studies. Batch effects and different data platforms greatly decrease the predictive performance in inter-laboratory and different data type validation. RESULTS Here, we present a new ensemble learning method named "scDetect" that combines gene expression rank-based analysis and a majority vote ensemble machine-learning probability-based prediction method capable of highly accurate classification of cells based on scRNA-seq data from different sequencing platforms. Because of tumor heterogeneity, in order to accurately predict tumor cells in single-cell RNA-seq data, we have also incorporated cell copy number variation consensus clustering and an epithelial score in the classification. We applied scDetect to scRNA-seq data from pancreatic tissue, mononuclear cells, and tumor biopsy cells and show that scDetect classified individual cells with high accuracy and better than other publicly available tools. AVAILABILITY scDetect is open-source software. Source code and test data are freely available from GitHub (https://github.com/IVDgenomicslab/scDetect/) and Zenodo (https://zenodo.org/record/4764132#.YKCOlrH5AYN). The examples and tutorial page is at https://ivdgenomicslab.github.io/scDetect-Introduction/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yifei Shen
- Centre of Clinical Laboratory, First Affiliated Hospital, College of Medicine, Zhejiang University, China; Key Laboratory of Clinical In Vitro Diagnostic Techniques of Zhejiang Province, China; Institute of Laboratory Medicine, Zhejiang University, China
| | - Qinjie Chu
- Institute of Bioinformatics, Zhejiang University, China
| | - Michael P Timko
- Departments of Biology and Public Health Sciences, University of Virginia, USA
| | - Longjiang Fan
- Institute of Bioinformatics, Zhejiang University, China; Department of Medical Oncology, First Affiliated Hospital, College of Medicine, Zhejiang University, China
| |
|
34
|
Identification and Verification of a 17 Immune-Related Gene Pair Prognostic Signature for Colon Cancer. BIOMED RESEARCH INTERNATIONAL 2021; 2021:6057948. [PMID: 34124251 PMCID: PMC8166469 DOI: 10.1155/2021/6057948] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/29/2020] [Revised: 04/15/2021] [Accepted: 05/13/2021] [Indexed: 12/12/2022]
Abstract
Background Colon cancer (CC) is a malignant tumor with a high incidence and poor prognosis. Accumulating evidence shows that the immune signature plays an important role in the tumorigenesis, progression, and prognosis of CC. Our study is aimed at establishing a novel robust immune-related gene pair signature for predicting the prognosis of CC. Methods Gene expression profiles and corresponding clinical information are obtained from two public data sets: The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO, GSE39582). We screened out immune-related gene pairs (IRGPs) associated with prognosis in the discovery cohort. Lasso-Cox proportional hazard regression was used to develop the best prognostic signature model. According to this, the patients in the validation cohort were divided into high immune-risk group and low immune-risk group, and the prediction ability of the signature model was verified by survival analysis and independent prognostic analysis. Results A total of 17 IRGPs composed of 26 IRGs were used to construct a prognostic-related risk scoring model. This model accurately predicted the prognosis of CC patients, and the patients in the high immune-risk group indicated poor prognosis in the discovery cohort and validation cohort. Besides, whether in univariate or multivariate analysis, the IRGP signature was an independent prognostic factor. T cell CD4 memory resting in the low-risk group was significantly higher than that in the high-risk group. Functional analysis showed that the biological processes of the low-risk group included "TCA cycle" and "RNA degradation," while the high-risk group was enriched in the "CAMs" and "focal adhesion" pathways. Conclusion We have successfully established a signature model composed of 17 IRGPs, which provides a novel idea to predict the prognosis of CC patients.
|
35
|
Chen K, Xu H, Lei Y, Lio P, Li Y, Guo H, Ali Moni M. Integration and interplay of machine learning and bioinformatics approach to identify genetic interaction related to ovarian cancer chemoresistance. Brief Bioinform 2021; 22:6272796. [PMID: 33971668 DOI: 10.1093/bib/bbab100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/13/2021] [Revised: 03/04/2021] [Accepted: 03/06/2021] [Indexed: 11/15/2022]
Abstract
Although chemotherapy is the first-line treatment for ovarian cancer (OCa) patients, chemoresistance (CR) decreases their progression-free survival. This paper investigates the genetic interaction (GI) related to OCa-CR. To decrease the complexity of establishing gene networks, individual signature genes related to OCa-CR are identified using a gradient boosting decision tree algorithm. Additionally, the genetic interaction coefficient (GIC) is proposed to measure the correlation of two signature genes quantitatively and explain their joint influence on OCa-CR. A gene pair that possesses a high GIC is identified as a signature pair. A total of 24 signature gene pairs that include 10 individual signature genes are selected, and the influence of signature gene pairs on OCa-CR is explored. Finally, a signature gene pair-based prediction of OCa-CR is established. The area under the curve (AUC) is a widely used performance measure for machine learning prediction. The AUC of the signature gene pair-based prediction reaches 0.9658, whereas the AUC of the individual signature gene-based prediction is only 0.6823. The identified signature gene pairs not only build an efficient GI network of OCa-CR but also provide an interesting way to predict OCa-CR. This improvement shows that our proposed method is a useful tool to investigate GI related to OCa-CR.
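For readers unfamiliar with the AUC figures quoted in this abstract, the measure can be computed as the fraction of (positive, negative) sample pairs in which the positive sample receives the higher score, with ties counted as one half. The Python sketch below uses made-up scores and labels; the `auc` function is an illustrative helper, not code from the paper.

```python
from typing import List


def auc(scores: List[float], labels: List[int]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the
    positive sample has the higher score; ties count as 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up prediction scores and true labels (1 = chemoresistant).
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
print(auc(scores, labels))  # 0.75: three of the four pairs are ordered correctly
```

An AUC of 0.5 corresponds to random ordering and 1.0 to perfect separation, which gives a scale for reading the 0.6823 and 0.9658 values reported above.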
Affiliation(s)
- Kexin Chen
- School of Electronics Engineering and Computer Science, Peking University, 100871, Beijing, China
| | - Haoming Xu
- Department of Biomedical Engineering, Duke University, 27708, Durham, United States
| | - Yiming Lei
- School of Electronics Engineering and Computer Science, Peking University, 100871, Beijing, China
| | - Pietro Lio
- Computer Laboratory, University of Cambridge, CB3-0FD, Cambridge, United Kingdom
| | - Yuan Li
- Department of Obstetrics and Gynecology, Peking University Third Hospital, 100083, Beijing, China
| | - Hongyan Guo
- Department of Obstetrics and Gynecology, Peking University Third Hospital, 100083, Beijing, China
| | - Mohammad Ali Moni
- School of Public health and Community Medicine, University of New South Wales, 2052, Sydney, Australia
| |
|
36
|
Cheng J, Guo Y, Guan G, Huang H, Jiang F, He J, Wu J, Guo Z, Liu X, Ao L. Two novel qualitative transcriptional signatures robustly applicable to non-research-oriented colorectal cancer samples with low-quality RNA. J Cell Mol Med 2021; 25:3622-3633. [PMID: 33719152 PMCID: PMC8034468 DOI: 10.1111/jcmm.16467] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/13/2020] [Revised: 02/19/2021] [Accepted: 03/01/2021] [Indexed: 12/12/2022]
Abstract
Currently, because RNA quality is lowered by degradation or low abundance, accurate measurement of gene expression by transcriptome sequencing (RNA-seq) is very challenging for non-research-oriented clinical samples, the majority of which are preserved in hospitals or tissue banks worldwide with complete pathological information and follow-up data. Molecular signatures consisting of several genes are rarely applied to such samples. To utilize these resources effectively, we analysed 45 stage II non-research-oriented formalin-fixed paraffin-embedded (FFPE) colorectal carcinoma (CRC) samples using RNA-seq. Our results showed that although gene expression measurements were significantly affected, most cancer features, based on the relative expression orderings (REOs) of gene pairs, were well preserved. We then developed two REO-based signatures, which consisted of 136 gene pairs for early diagnosis of CRC, and 4500 gene pairs for predicting post-surgery relapse risk of stage II and III CRC. The performance of our signatures, which included hundreds or thousands of gene pairs, was more robust for non-research-oriented clinical samples, compared to that of two published concise REO-based signatures. In conclusion, REO-based signatures with relatively more gene pairs could be robustly applied to non-research-oriented CRC samples.
Affiliation(s)
- Jun Cheng
- Affiliated Foshan Maternity and Child Healthcare Hospital, Southern Medical University (Foshan Maternity & Child Healthcare Hospital), Foshan, China; Department of Bioinformatics, Fujian Key Laboratory of Medical Bioinformatics, School of Basic Medical Sciences, Fujian Medical University, Fuzhou, China; Key Laboratory of Ministry of Education for Gastrointestinal Cancer, Fujian Medical University, Fuzhou, China
| | - Yating Guo
- Department of Bioinformatics, Fujian Key Laboratory of Medical Bioinformatics, School of Basic Medical Sciences, Fujian Medical University, Fuzhou, China; Key Laboratory of Ministry of Education for Gastrointestinal Cancer, Fujian Medical University, Fuzhou, China
| | - Guoxian Guan
- Department of Colorectal Surgery, The Affiliated Union Hospital of Fujian Medical University, Fuzhou, China
| | - Haiyan Huang
- Department of Bioinformatics, Fujian Key Laboratory of Medical Bioinformatics, School of Basic Medical Sciences, Fujian Medical University, Fuzhou, China; Key Laboratory of Ministry of Education for Gastrointestinal Cancer, Fujian Medical University, Fuzhou, China
| | - Fengle Jiang
- Department of Bioinformatics, Fujian Key Laboratory of Medical Bioinformatics, School of Basic Medical Sciences, Fujian Medical University, Fuzhou, China; Key Laboratory of Ministry of Education for Gastrointestinal Cancer, Fujian Medical University, Fuzhou, China
| | - Jun He
- Department of Bioinformatics, Fujian Key Laboratory of Medical Bioinformatics, School of Basic Medical Sciences, Fujian Medical University, Fuzhou, China; Key Laboratory of Ministry of Education for Gastrointestinal Cancer, Fujian Medical University, Fuzhou, China
| | - Junling Wu
- Department of Bioinformatics, Fujian Key Laboratory of Medical Bioinformatics, School of Basic Medical Sciences, Fujian Medical University, Fuzhou, China; Key Laboratory of Ministry of Education for Gastrointestinal Cancer, Fujian Medical University, Fuzhou, China
| | - Zheng Guo
- Department of Bioinformatics, Fujian Key Laboratory of Medical Bioinformatics, School of Basic Medical Sciences, Fujian Medical University, Fuzhou, China; Key Laboratory of Ministry of Education for Gastrointestinal Cancer, Fujian Medical University, Fuzhou, China
| | - Xing Liu
- Department of Colorectal Surgery, The Affiliated Union Hospital of Fujian Medical University, Fuzhou, China
| | - Lu Ao
- Department of Bioinformatics, Fujian Key Laboratory of Medical Bioinformatics, School of Basic Medical Sciences, Fujian Medical University, Fuzhou, China; Key Laboratory of Ministry of Education for Gastrointestinal Cancer, Fujian Medical University, Fuzhou, China
| |
|
37
|
Chen A, Laeyendecker O, Eshleman SH, Monaco DR, Kammers K, Larman HB, Ruczinski I. A top scoring pairs classifier for recent HIV infections. Stat Med 2021; 40:2604-2612. [PMID: 33660319 DOI: 10.1002/sim.8920] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/13/2020] [Revised: 01/07/2021] [Accepted: 02/03/2021] [Indexed: 11/11/2022]
Abstract
Accurate incidence estimation of HIV infection from cross-sectional biomarker data is crucial for monitoring the epidemic and determining the impact of HIV prevention interventions. A key feature of cross-sectional incidence testing methods is the mean window period, defined as the average duration that infected individuals are classified as recently infected. Two assays available for cross-sectional incidence estimation, the BED capture immunoassay, and the Limiting Antigen (LAg) Avidity assay, measure a general characteristic of antibody response; performance of these assays can be affected and biased by factors such as viral suppression, resulting in sample misclassification and overestimation of HIV incidence. As availability and use of antiretroviral treatment increase worldwide, algorithms that do not include HIV viral load and are not impacted by viral suppression are needed for cross-sectional HIV incidence estimation. Using a phage display system to quantify antibody binding to over 3300 HIV peptides, we present a classifier based on top scoring peptide pairs that identifies recent infections using HIV antibody responses alone. Based on plasma samples from individuals with known dates of seroconversion, we estimated the mean window period for our classifier to be 217 days (95% confidence interval 183 to 257 days), compared to the estimated mean window period for the LAg-Avidity protocol of 106 days (76 to 146 days). Moreover, each of the four peptide pairs correctly classified more of the recent samples than the LAg-Avidity assay alone at the same classification accuracy for non-recent samples.
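A top scoring pair, in the generic sense, is the pair of features whose ordering differs most between two classes. The Python sketch below illustrates that general idea on a toy matrix; the data, the column indices standing in for peptides, and both helper functions are hypothetical, and the paper's actual classifier and window-period estimation are not reproduced here.

```python
from itertools import combinations
from typing import List, Tuple


def tsp_score(X: List[List[float]], y: List[int], i: int, j: int) -> float:
    """Score |P(x_i < x_j | class 0) - P(x_i < x_j | class 1)| for one pair.

    Assumes both classes (0 and 1) are present in y.
    """
    def frac(cls: int) -> float:
        rows = [row for row, label in zip(X, y) if label == cls]
        return sum(1 for row in rows if row[i] < row[j]) / len(rows)

    return abs(frac(0) - frac(1))


def top_scoring_pair(X: List[List[float]], y: List[int]) -> Tuple[int, int]:
    """Return the feature pair with the highest score (first one on ties)."""
    n_features = len(X[0])
    return max(combinations(range(n_features), 2),
               key=lambda pair: tsp_score(X, y, pair[0], pair[1]))


# Toy data: rows are samples, columns are three hypothetical peptides.
X = [[1.0, 5.0, 3.0],
     [2.0, 6.0, 9.0],
     [7.0, 3.0, 4.0],
     [8.0, 2.0, 9.0]]
y = [0, 0, 1, 1]  # 0 = non-recent, 1 = recent (illustrative labels)

best = top_scoring_pair(X, y)
print(best, tsp_score(X, y, best[0], best[1]))
```

In this toy example the ordering of columns 0 and 1 flips completely between the two classes, so that pair reaches the maximum score of 1.0.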
Affiliation(s)
- Athena Chen
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA
| | - Oliver Laeyendecker
- Laboratory of Immunoregulation, Division of Intramural Research, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Baltimore, Maryland, USA; Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
| | - Susan H Eshleman
- Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
| | - Daniel R Monaco
- Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
| | - Kai Kammers
- Division of Biostatistics and Bioinformatics, Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
| | - Harry Benjamin Larman
- Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
| | - Ingo Ruczinski
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA
| |
|
38
|
Marzouka NAD, Eriksson P. multiclassPairs: an R package to train multiclass pair-based classifier. Bioinformatics 2021; 37:3043-3044. [PMID: 33543757 PMCID: PMC8479681 DOI: 10.1093/bioinformatics/btab088] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 11/17/2020] [Revised: 01/27/2021] [Accepted: 02/02/2021] [Indexed: 02/02/2023]
Abstract
MOTIVATION k-Top Scoring Pairs (kTSP) algorithms utilize in-sample gene expression feature pair rules for class prediction, and have demonstrated excellent performance and robustness. The available packages and tools primarily focus on binary prediction (i.e. two classes). However, many real-world classification problems, e.g. tumor subtype prediction, are multiclass tasks. RESULTS Here, we present multiclassPairs, an R package to train pair-based single sample classifiers for multiclass problems. multiclassPairs offers two main methods to build multiclass prediction models, either using a one-versus-rest kTSP scheme or through a novel pair-based Random Forest approach. The package also provides options for dealing with class imbalances, multiplatform training, missing features in test data and visualization of training and test results. AVAILABILITY AND IMPLEMENTATION The 'multiclassPairs' package is available on CRAN servers and GitHub: https://github.com/NourMarzouka/multiclassPairs. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
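The one-versus-rest idea can be sketched independently of the package. In the Python illustration below, each class has its own list of pair rules, every satisfied rule casts a vote for that class, and the class with the largest vote fraction is predicted. This is not the multiclassPairs API (which is an R package); the rule sets, gene names, and `predict_one_vs_rest` function are hypothetical.

```python
from typing import Dict, List, Tuple

Rule = Tuple[str, str]  # (gene_a, gene_b): votes for the class when a > b


def predict_one_vs_rest(sample: Dict[str, float],
                        rules: Dict[str, List[Rule]]) -> str:
    """Predict the class whose pair rules are satisfied most often.

    Each class has its own list of rules; the score of a class is the
    fraction of its rules that hold within this single sample.
    Every rule list must be non-empty.
    """
    def vote_fraction(pairs: List[Rule]) -> float:
        return sum(1 for a, b in pairs if sample[a] > sample[b]) / len(pairs)

    return max(rules, key=lambda cls: vote_fraction(rules[cls]))


# Hypothetical rule sets and expression values, for illustration only.
rules = {
    "subtype_1": [("g1", "g2"), ("g3", "g4")],
    "subtype_2": [("g2", "g1"), ("g4", "g3")],
}
sample = {"g1": 5.0, "g2": 1.0, "g3": 4.0, "g4": 2.0}
print(predict_one_vs_rest(sample, rules))
```

Since every rule compares two values inside the same sample, the prediction needs no reference cohort, which is what makes this a single sample classifier.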
Affiliation(s)
- Nour-Al-Dain Marzouka
- Department of Clinical Sciences, Division of Oncology, Lund University, 22381 Lund, Sweden (to whom correspondence should be addressed)
| | - Pontus Eriksson
- Department of Clinical Sciences, Division of Oncology, Lund University, 22381 Lund, Sweden
| |
|
39
|
Cai R, Li J, Zhang Z, Yang X, Hao Z. DACH: Domain Adaptation Without Domain Information. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2020; 31:5055-5067. [PMID: 31976912 DOI: 10.1109/tnnls.2019.2962817] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 06/10/2023]
Abstract
Domain adaptation has become increasingly important for learning systems in recent years, especially with the growing diversification of data domains in real-world applications, such as genetic data from various sequencing platforms and video feeds from multiple surveillance cameras. Traditional domain adaptation approaches aim to design transformations for each individual domain so that the twisted data from different domains follow an almost identical distribution. In many applications, however, the data from diversified domains are simply dumped to an archive even without clear domain labels. In this article, we discuss the possibility of learning domain adaptations even when the data does not contain domain labels. Our solution is based on our new model, named domain adaption using cross-domain homomorphism (DACH in short), to identify intrinsic homomorphism hidden in mixed data from all domains. DACH is generally compatible with existing deep learning frameworks, enabling the generation of nonlinear features from the original data domains. Our theoretical analysis not only shows the universality of the homomorphism, but also proves the convergence of DACH for significant homomorphism structures over the data domains is preserved. Empirical studies on real-world data sets validate the effectiveness of DACH on merging multiple data domains for joint machine learning tasks and the scalability of our algorithm to domain dimensionality.
|
40
|
Moody L, Chen H, Pan YX. Considerations for feature selection using gene pairs and applications in large-scale dataset integration, novel oncogene discovery, and interpretable cancer screening. BMC Med Genomics 2020; 13:148. [PMID: 33087122 PMCID: PMC7579924 DOI: 10.1186/s12920-020-00778-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 01/10/2023] Open
Abstract
Background Advancements in transcriptomic profiling have led to the emergence of new challenges regarding data integration and interpretability. Variability between measurement platforms makes it difficult to compare between cohorts, and large numbers of gene features have encouraged the use of black box methods that are not easily translated into biologically and clinically meaningful findings. We propose that gene rankings and algorithms that rely on relative expression within gene pairs can address such obstacles. Methods We implemented an innovative process to evaluate the performance of five feature selection methods on simulated gene-pair data. Along with TSP, we consider other methods that retain more information in their score calculations, including the magnitude of gene expression change as well as within-class variation. Tree-based rule extraction was also applied to serum microRNA (miRNA) pairs in order to devise a noninvasive screening tool for pancreatic and ovarian cancer. Results Gene pair data were simulated using different types of signal and noise. Pairs were filtered using feature selection approaches, including top-scoring pairs (TSP), absolute differences between gene ranks, and Fisher scores. Methods that retain more information, such as the magnitude of expression change and within-class variance, yielded higher classification accuracy using a random forest model. We then demonstrate two powerful applications of gene pairs by first performing large-scale integration of 52 breast cancer datasets consisting of 10,350 patients. Not only did we confirm known oncogenes, but we also propose novel tumorigenic genes, such as BSDC1 and U2AF1, that could distinguish between tumor subtypes. Finally, circulating miRNA pairs were filtered and salient rules were extracted to build simplified tree ensemble learners (STELs) for four types of cancer. These accessible clinical frameworks detected pancreatic and ovarian cancer with 84.8 and 93.6% accuracy, respectively. Conclusion Rank-based gene pair classification benefits from careful feature selection methods that preserve maximal information. Gene pairs enable dataset integration for greater statistical power and discovery of robust biomarkers as well as facilitate construction of user-friendly clinical screening tools.
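Illustrative note: the top-scoring pairs (TSP) filter mentioned in this abstract is commonly defined by scoring each gene pair with the absolute difference, between the two classes, in the probability that one gene is expressed below the other. The sketch below assumes that standard definition; the function name `tsp_scores`, the gene names, and the toy values are hypothetical and are not taken from the article.

```python
from itertools import combinations

def tsp_scores(expr, labels):
    """Score each gene pair by |P(g_i < g_j | class 0) - P(g_i < g_j | class 1)|.

    expr:   dict mapping gene name -> list of expression values (one per sample)
    labels: list of 0/1 class labels aligned with the samples
    """
    idx0 = [k for k, y in enumerate(labels) if y == 0]
    idx1 = [k for k, y in enumerate(labels) if y == 1]
    scores = {}
    for gi, gj in combinations(sorted(expr), 2):
        # Fraction of samples in each class where gi is expressed below gj.
        p0 = sum(expr[gi][k] < expr[gj][k] for k in idx0) / len(idx0)
        p1 = sum(expr[gi][k] < expr[gj][k] for k in idx1) / len(idx1)
        scores[(gi, gj)] = abs(p0 - p1)
    return scores

# Made-up toy data: A < B in every class-0 sample, A > B in every class-1 sample.
expr = {
    "A": [1.0, 2.0, 1.5, 5.0, 6.0, 5.5],
    "B": [3.0, 4.0, 3.5, 2.0, 1.0, 2.5],
    "C": [0.5, 5.0, 1.0, 3.0, 7.0, 0.9],
}
labels = [0, 0, 0, 1, 1, 1]

scores = tsp_scores(expr, labels)
top_pair = max(scores, key=scores.get)  # the pair whose ordering flips most between classes
```

Because the score uses only within-sample orderings, it is unchanged by any monotone per-sample normalization, which is the property that makes pair-based features attractive for cross-platform integration.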
Affiliation(s)
- Laura Moody
- Division of Nutritional Sciences, University of Illinois Urbana-Champaign, 461 Bevier Hall, 905 South Goodwin Avenue, Urbana, IL, 61801, USA
| | - Hong Chen
- Division of Nutritional Sciences, University of Illinois Urbana-Champaign, 461 Bevier Hall, 905 South Goodwin Avenue, Urbana, IL, 61801, USA.,Department of Food Science and Human Nutrition, University of Illinois Urbana-Champaign, Urbana, IL, USA
| | - Yuan-Xiang Pan
- Division of Nutritional Sciences, University of Illinois Urbana-Champaign, 461 Bevier Hall, 905 South Goodwin Avenue, Urbana, IL, 61801, USA.,Department of Food Science and Human Nutrition, University of Illinois Urbana-Champaign, Urbana, IL, USA.,Illinois Informatics Institute, University of Illinois Urbana-Champaign, Urbana, IL, USA
| |
|
41
|
The Effects of Age, Cigarette Smoking, Sex, and Race on the Qualitative Characteristics of Lung Transcriptome. BIOMED RESEARCH INTERNATIONAL 2020; 2020:6418460. [PMID: 32802863 PMCID: PMC7424369 DOI: 10.1155/2020/6418460] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/14/2020] [Accepted: 06/29/2020] [Indexed: 11/18/2022]
Abstract
The within-sample relative expression orderings (REOs) of genes, which are stable qualitative transcriptional characteristics, can provide abundant information for a disease. Methods based on REO comparisons have been proposed for identifying differentially expressed genes (DEGs) at the individual level and for detecting disease-associated genes based on one-phenotype disease data by reusing data of normal samples from other sources. Here, we evaluated the effects of common potential confounding factors, including age, cigarette smoking, sex, and race, on the REOs of gene pairs within the normal lung tissue transcriptome. Our results showed that age has little effect on REOs within lung tissues. We found that about 0.23% of the significantly stable REOs of gene pairs in nonsmokers' lung tissues are reversed in smokers' lung tissues, introduced by 344 DEGs between the two groups of samples (RankCompV2, FDR <0.05), which are enriched in metabolism of xenobiotics by cytochrome P450, glutathione metabolism, and other pathways (hypergeometric test, FDR <0.05). Comparison between the normal lung tissue samples of males and females revealed fewer reversed REOs, introduced by 24 DEGs between the sex groups, among which 19 DEGs are located on sex chromosomes and 5 DEGs involved in spermatogenesis and regulation of oocytes are located on autosomes. Between the normal lung tissue samples of white and black people, we identified 22 DEGs (RankCompV2, FDR <0.05), which introduced a few reversed REOs between the two races. In summary, REO-based studies should take into account the confounding factors of cigarette smoking, sex, and race.
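Illustrative note: the idea of a stable REO in one group being reversed in another can be sketched as follows. This is a simplified stand-in, not the article's procedure: stability is assumed here to mean that the ordering holds in at least a fixed fraction of samples (`min_fraction`, a hypothetical parameter), whereas the study reports significance-tested stable REOs. All names and values are made up.

```python
from itertools import combinations

def stable_reos(expr, min_fraction=0.95):
    """Return {(gi, gj): '<' or '>'} for pairs whose within-sample ordering
    holds in at least min_fraction of the samples (assumed stability rule)."""
    genes = sorted(expr)
    n = len(expr[genes[0]])
    stable = {}
    for gi, gj in combinations(genes, 2):
        lower = sum(a < b for a, b in zip(expr[gi], expr[gj]))
        higher = sum(a > b for a, b in zip(expr[gi], expr[gj]))
        if lower / n >= min_fraction:
            stable[(gi, gj)] = "<"
        elif higher / n >= min_fraction:
            stable[(gi, gj)] = ">"
    return stable

def reversal_fraction(group_a, group_b, min_fraction=0.95):
    """Fraction of group A's stable REOs that are stable with the opposite
    direction in group B, plus the list of reversed pairs."""
    a = stable_reos(group_a, min_fraction)
    b = stable_reos(group_b, min_fraction)
    reversed_pairs = [p for p, d in a.items() if p in b and b[p] != d]
    fraction = len(reversed_pairs) / len(a) if a else 0.0
    return fraction, reversed_pairs

# Made-up example: only the G2/G3 ordering flips between the two groups.
nonsmokers = {"G1": [1, 1, 1, 1], "G2": [2, 2, 2, 2], "G3": [3, 3, 3, 3]}
smokers = {"G1": [1, 1, 1, 1], "G2": [4, 4, 4, 4], "G3": [3, 3, 3, 3]}

fraction, pairs = reversal_fraction(nonsmokers, smokers)
```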
|
42
|
Liljedahl H, Karlsson A, Oskarsdottir GN, Salomonsson A, Brunnström H, Erlingsdottir G, Jönsson M, Isaksson S, Arbajian E, Ortiz-Villalón C, Hussein A, Bergman B, Vikström A, Monsef N, Branden E, Koyi H, de Petris L, Patthey A, Behndig AF, Johansson M, Planck M, Staaf J. A gene expression-based single sample predictor of lung adenocarcinoma molecular subtype and prognosis. Int J Cancer 2020; 148:238-251. [PMID: 32745259 PMCID: PMC7689824 DOI: 10.1002/ijc.33242] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 03/25/2020] [Revised: 07/03/2020] [Accepted: 07/07/2020] [Indexed: 12/14/2022]
Abstract
Disease recurrence in surgically treated lung adenocarcinoma (AC) remains high. New approaches for risk stratification beyond tumor stage are needed. Gene expression-based AC subtypes such as the Cancer Genome Atlas Network (TCGA) terminal-respiratory unit (TRU), proximal-inflammatory (PI) and proximal-proliferative (PP) subtypes have been associated with prognosis, but show methodological limitations for robust clinical use. We aimed to derive a platform independent single sample predictor (SSP) for molecular subtype assignment and risk stratification that could function in a clinical setting. Two-class (TRU/nonTRU=SSP2) and three-class (TRU/PP/PI=SSP3) SSPs using the AIMS algorithm were trained in 1655 ACs (n = 9659 genes) from public repositories vs TCGA centroid subtypes. Validation and survival analysis were performed in 977 patients using overall survival (OS) and distant metastasis-free survival (DMFS) as endpoints. In the validation cohort, SSP2 and SSP3 showed accuracies of 0.85 and 0.81, respectively. SSPs captured relevant biology previously associated with the TCGA subtypes and were associated with prognosis. In survival analysis, OS and DMFS for cases discordantly classified between TCGA and SSP2 favored the SSP2 classification. In resected Stage I patients, SSP2 identified TRU-cases with better OS (hazard ratio [HR] = 0.30; 95% confidence interval [CI] = 0.18-0.49) and DMFS (TRU HR = 0.52; 95% CI = 0.33-0.83) independent of age, Stage IA/IB and gender. SSP2 was transformed into a NanoString nCounter assay and tested in 44 Stage I patients using RNA from formalin-fixed tissue, providing prognostic stratification (relapse-free interval, HR = 3.2; 95% CI = 1.2-8.8). In conclusion, gene expression-based SSPs can provide molecular subtype and independent prognostic information in early-stage lung ACs. SSPs may overcome critical limitations in the applicability of gene signatures in lung cancer.
Affiliation(s)
- Helena Liljedahl
- Division of Oncology and Pathology, Department of Clinical Sciences Lund, Lund University, Medicon Village, Lund, Sweden
| | - Anna Karlsson
- Division of Oncology and Pathology, Department of Clinical Sciences Lund, Lund University, Medicon Village, Lund, Sweden
| | - Gudrun N Oskarsdottir
- Division of Oncology and Pathology, Department of Clinical Sciences Lund, Lund University, Medicon Village, Lund, Sweden.,Department of Respiratory Medicine and Allergology, Skåne University Hospital, Lund, Sweden
| | - Annette Salomonsson
- Division of Oncology and Pathology, Department of Clinical Sciences Lund, Lund University, Medicon Village, Lund, Sweden
| | - Hans Brunnström
- Division of Oncology and Pathology, Department of Clinical Sciences Lund, Lund University, Medicon Village, Lund, Sweden.,Department of Pathology, Laboratory Medicine Region Skåne, Lund, Sweden
| | - Gigja Erlingsdottir
- Department of Pathology, Landspitali University Hospital, Reykjavik, Iceland.,Department of Laboratory Medicine, Department of Pathology, Skåne University Hospital, Malmö, Sweden
| | - Mats Jönsson
- Division of Oncology and Pathology, Department of Clinical Sciences Lund, Lund University, Medicon Village, Lund, Sweden
| | - Sofi Isaksson
- Division of Oncology and Pathology, Department of Clinical Sciences Lund, Lund University, Medicon Village, Lund, Sweden
| | - Elsa Arbajian
- Division of Oncology and Pathology, Department of Clinical Sciences Lund, Lund University, Medicon Village, Lund, Sweden
| | | | - Aziz Hussein
- Department of Pathology and Cytology, Sahlgrenska University Hospital, Gothenburg, Sweden
| | - Bengt Bergman
- Department of Respiratory Medicine, Sahlgrenska University Hospital, Gothenburg, Sweden
| | - Anders Vikström
- Department of Pulmonary Medicine, University Hospital Linköping, Linköping, Sweden
| | - Nastaran Monsef
- Department of Pathology and Department of Clinical and Experimental Medicine, Linköping University, Linköping, Sweden
| | - Eva Branden
- Respiratory Medicine Unit, Department of Medicine Solna and CMM, Karolinska Institute and Karolinska University Hospital Solna, Stockholm, Sweden.,Centre for Research and Development, Uppsala University/Region Gävleborg, Gävle, Sweden
| | - Hirsh Koyi
- Respiratory Medicine Unit, Department of Medicine Solna and CMM, Karolinska Institute and Karolinska University Hospital Solna, Stockholm, Sweden.,Centre for Research and Development, Uppsala University/Region Gävleborg, Gävle, Sweden
| | - Luigi de Petris
- Thoracic Oncology Unit, Karolinska University Hospital and Department Oncology-Pathology, Karolinska Institute, Stockholm, Sweden
| | - Annika Patthey
- Department of Medical Biosciences, Pathology, Umeå University, Umeå, Sweden
| | - Annelie F Behndig
- Department of Public Health and Clinical Medicine, Division of Medicine, Umeå University, Umeå, Sweden
| | - Mikael Johansson
- Department of Radiation Sciences, Oncology, Umeå University, Umeå, Sweden
| | - Maria Planck
- Division of Oncology and Pathology, Department of Clinical Sciences Lund, Lund University, Medicon Village, Lund, Sweden.,Department of Respiratory Medicine and Allergology, Skåne University Hospital, Lund, Sweden
| | - Johan Staaf
- Division of Oncology and Pathology, Department of Clinical Sciences Lund, Lund University, Medicon Village, Lund, Sweden
| |
|
43
|
Classification of gene expression patterns using a novel type-2 fuzzy multigranulation-based SVM model for the recognition of cancer mediating biomarkers. Neural Comput Appl 2020. [DOI: 10.1007/s00521-020-05241-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 10/23/2022]
|
44
|
Xie J, Xu Y, Chen H, Chi M, He J, Li M, Liu H, Xia J, Guan Q, Guo Z, Yan H. Identification of population-level differentially expressed genes in one-phenotype data. Bioinformatics 2020; 36:4283-4290. [PMID: 32428201 PMCID: PMC7520039 DOI: 10.1093/bioinformatics/btaa523] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 01/02/2020] [Revised: 04/15/2020] [Accepted: 05/14/2020] [Indexed: 01/01/2023] Open
Abstract
Motivation For some specific tissues, such as the heart and brain, normal controls are difficult to obtain. Thus, studies with only a particular type of disease samples (one phenotype) cannot be analyzed using common methods, such as significance analysis of microarrays, edgeR and limma. The RankComp algorithm, which was mainly developed to identify individual-level differentially expressed genes (DEGs), can be applied to identify population-level DEGs for the one-phenotype data but cannot identify the dysregulation directions of DEGs. Results Here, we optimized the RankComp algorithm, termed PhenoComp. Compared with RankComp, PhenoComp provided the dysregulation directions of DEGs and had more robust detection power in both simulated and real one-phenotype data. Moreover, using the DEGs detected by common methods as the ‘gold standard’, the results showed that the DEGs detected by PhenoComp using only one-phenotype data were comparable to those identified by common methods using case-control samples, independent of the measurement platform. PhenoComp also exhibited good performance for weakly differential expression signal data. Availability and implementation The PhenoComp algorithm is available on the web at https://github.com/XJJ-student/PhenoComp. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jiajing Xie
- Department of Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, The School of Basic Medical Sciences, Fujian Medical University, Fuzhou 350122, China.,Key Laboratory of Medical Bioinformatics, Fujian Province, Fuzhou 350122, China
| | - Yang Xu
- Department of Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, The School of Basic Medical Sciences, Fujian Medical University, Fuzhou 350122, China.,Key Laboratory of Medical Bioinformatics, Fujian Province, Fuzhou 350122, China
| | - Haifeng Chen
- Department of General Surgery, Fuzhou Second Hospital Affiliated to Xiamen University, Fuzhou 350007, China
| | - Meirong Chi
- Department of Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, The School of Basic Medical Sciences, Fujian Medical University, Fuzhou 350122, China.,Key Laboratory of Medical Bioinformatics, Fujian Province, Fuzhou 350122, China
| | - Jun He
- Department of Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, The School of Basic Medical Sciences, Fujian Medical University, Fuzhou 350122, China.,Key Laboratory of Medical Bioinformatics, Fujian Province, Fuzhou 350122, China
| | - Meifeng Li
- Department of Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, The School of Basic Medical Sciences, Fujian Medical University, Fuzhou 350122, China.,Key Laboratory of Medical Bioinformatics, Fujian Province, Fuzhou 350122, China
| | - Hui Liu
- Department of Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, The School of Basic Medical Sciences, Fujian Medical University, Fuzhou 350122, China.,Key Laboratory of Medical Bioinformatics, Fujian Province, Fuzhou 350122, China
| | - Jie Xia
- Department of Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, The School of Basic Medical Sciences, Fujian Medical University, Fuzhou 350122, China.,Key Laboratory of Medical Bioinformatics, Fujian Province, Fuzhou 350122, China
| | - Qingzhou Guan
- Department of Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, The School of Basic Medical Sciences, Fujian Medical University, Fuzhou 350122, China.,Key Laboratory of Medical Bioinformatics, Fujian Province, Fuzhou 350122, China
| | - Zheng Guo
- Department of Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, The School of Basic Medical Sciences, Fujian Medical University, Fuzhou 350122, China.,Key Laboratory of Medical Bioinformatics, Fujian Province, Fuzhou 350122, China
| | - Haidan Yan
- Department of Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, The School of Basic Medical Sciences, Fujian Medical University, Fuzhou 350122, China.,Key Laboratory of Medical Bioinformatics, Fujian Province, Fuzhou 350122, China
| |
|
45
|
Povero D, Yamashita H, Ren W, Subramanian MG, Myers RP, Eguchi A, Simonetto DA, Goodman ZD, Harrison SA, Sanyal AJ, Bosch J, Feldstein AE. Characterization and Proteome of Circulating Extracellular Vesicles as Potential Biomarkers for NASH. Hepatol Commun 2020; 4:1263-1278. [PMID: 32923831 PMCID: PMC7471415 DOI: 10.1002/hep4.1556] [Citation(s) in RCA: 50] [Impact Index Per Article: 12.5] [Received: 02/13/2020] [Revised: 04/17/2020] [Accepted: 05/11/2020] [Indexed: 12/25/2022] Open
Abstract
Nonalcoholic fatty liver disease (NAFLD) is currently one of the most common forms of chronic liver disease globally. NAFLD represents a wide spectrum of liver involvement from nonprogressive isolated steatosis to nonalcoholic steatohepatitis (NASH), characterized by liver necroinflammation and fibrosis and currently one of the top causes of end‐stage liver disease and hepatocellular carcinoma. At present, there is a lack of effective treatments, and a central barrier to the development of therapies is the requirement for an invasive liver biopsy for diagnosis of NASH. Discovery of reliable, noninvasive biomarkers is urgently needed. In this study, we tested whether circulating extracellular vesicles (EVs), cell‐derived small membrane‐surrounded structures with a rich cargo of bioactive molecules, may serve as reliable noninvasive “liquid biopsies” for NASH diagnosis and assessment of disease severity. Total circulating EVs and hepatocyte‐derived EVs were isolated by differential centrifugation and size‐exclusion chromatography from serum samples of healthy individuals, patients with precirrhotic NASH, and patients with cirrhotic NASH. EVs were further characterized by flow cytometry, electron microscopy, western blotting, and dynamic light scattering assays before performing a proteomics analysis. Our findings suggest that levels of total and hepatocyte‐derived EVs correlate with NASH clinical characteristics and disease severity. Additionally, using proteomics data, we developed understandable, powerful, and unique EV‐based proteomic signatures for potential diagnosis of advanced NASH. Conclusion: Our study shows that the quantity and protein constituents of circulating EVs provide strong evidence for EV protein–based liquid biopsies for NAFLD/NASH diagnosis.
Affiliation(s)
- Davide Povero
- Department of Pediatrics, University of California San Diego, La Jolla, CA
| | | | - Wenhua Ren
- Genomics and Microarray Core, University of Colorado Denver, Aurora, CO
| | | | | | - Akiko Eguchi
- Department of Pediatrics, University of California San Diego, La Jolla, CA
| | | | | | | | | | - Jaime Bosch
- Inselspital, Bern University, Bern, Switzerland.,Ciberehd-Idibaps, University of Barcelona, Barcelona, Spain
| | - Ariel E Feldstein
- Department of Pediatrics, University of California San Diego, La Jolla, CA
| |
|
46
|
|
47
|
Development and validation of an individualized DNA repair-related gene signature in localized clear cell renal cell carcinoma. World J Urol 2020; 39:1203-1210. [PMID: 32458095 DOI: 10.1007/s00345-020-03270-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/07/2020] [Accepted: 05/17/2020] [Indexed: 10/24/2022] Open
Abstract
BACKGROUND To establish a robust, individualized DNA repair-related gene signature to estimate prognosis for patients with localized clear cell renal cell carcinoma (ccRCC). MATERIALS AND METHODS We retrospectively analyzed gene expression profiles of 541 localized ccRCC patients from two public ccRCC cohorts. The DNA repair-related gene pair index (DRPI) was constructed with the least absolute shrinkage and selection operator (LASSO) regression model. The associations between DRPI, overall survival (OS), and disease-specific survival (DSS) were evaluated by Kaplan-Meier analysis, univariate analysis, and multivariate Cox regression survival analysis. We compared the predictive accuracy of different risk models with Harrell's C-index. RESULTS In the primary univariate analysis, patients in the DRPI-high-risk group had significantly shorter OS [P < 0.001, HR (95% CI) 2.093 (1.431-3.061)] and DSS [P < 0.001, HR (95% CI) 3.567 (2.017-6.339)]. After adjustment for stage and grade, the DRPI-high-risk group remained an independent adverse risk factor for both OS [P = 0.026, HR (95% CI) 1.629 (1.094-2.452)] and DSS [P = 0.010, HR (95% CI) 2.209 (1.217-4.010)]. DRPI showed predictive accuracy comparable to the cell cycle proliferation (CCP) score and the ccA/ccB signature. Copy number alterations and tumor mutation burden were enriched in DRPI-high tumors. There was an elevated number of Treg cells and higher T cell exhaustion marker expression in DRPI-high-risk tumors. The combined DNA repair-clinical score outperformed other risk models in terms of C-index. CONCLUSION We validated the proposed DRPI as a predictor of clinical outcome in localized ccRCC patients. It provides an individualized and more accurate risk assessment beyond clinicopathological characteristics.
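Illustrative note: Harrell's C-index, used in this abstract to compare risk models, is the fraction of comparable patient pairs in which the patient with the higher predicted risk has the shorter observed survival. The sketch below is a minimal pure-Python version under the usual convention that a pair is comparable when the shorter time ends in an observed event and that tied risk scores count as half concordant; the function name and toy values are hypothetical and unrelated to the study's data.

```python
def harrell_c_index(times, events, risks):
    """Harrell's concordance index.

    times:  observed follow-up times
    events: 1 if the event (e.g. death) was observed, 0 if censored
    risks:  predicted risk scores (higher means worse expected outcome)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i has an observed event
            # strictly before subject j's follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Made-up follow-up data for four patients; the third is censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.5, 0.1]

c_index = harrell_c_index(times, events, risks)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so differences between models are read on that scale.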
|
48
|
Wang K, Song K, Ma Z, Yao Y, Liu C, Yang J, Xiao H, Zhang J, Zhang Y, Zhao W. Identification of EMT-related high-risk stage II colorectal cancer and characterisation of metastasis-related genes. Br J Cancer 2020; 123:410-417. [PMID: 32435058 PMCID: PMC7403418 DOI: 10.1038/s41416-020-0902-y] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 03/17/2020] [Revised: 04/25/2020] [Accepted: 05/01/2020] [Indexed: 11/09/2022] Open
Abstract
Background Our laboratory previously reported an individual-level prognostic signature for patients with stage II colorectal cancer (CRC). However, this signature was not applicable to RNA-sequencing datasets. In this study, we constructed a robust epithelial-to-mesenchymal transition (EMT)-related gene pair prognostic signature. Methods Based on EMT-related genes, metastasis-associated gene pairs were identified between metastatic and non-metastatic samples. Then, we selected prognosis-associated gene pairs, which were significantly correlated with disease-free survival of stage II CRC using a multivariate Cox regression model, as the EMT-related prognosis signature. Results An EMT-related signature composed of fifty-one gene pairs (51-GPS) for predicting the relapse risk of patients with stage II CRC was developed, and its prognostic efficiency was validated in independent datasets. Moreover, 51-GPS achieved better predictive performance than other reported signatures, including a commercial signature Oncotype Dx colon cancer and an immune-related gene pair signature. Besides, EMT-related functional gene sets achieved high enrichment scores in high-risk samples. In particular, a loss-of-function antisense approach showed that DEGs between the two predicted clusters were metastasis-related. Conclusions The EMT-related gene pair signature can identify patients with stage II CRC at high risk of relapse, which can facilitate individualised management of patients.
Affiliation(s)
- Kai Wang
- Department of Systems Biology, College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150086, China
| | - Kai Song
- Department of Systems Biology, College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150086, China
| | - Zhigang Ma
- Department of Gastrointestinal Medical Oncology, Harbin Medical University Cancer Hospital, No. 150, Haping Road, Nangang District, Harbin, 150001, China
| | - Yang Yao
- Department of Gastrointestinal Medical Oncology, Harbin Medical University Cancer Hospital, No. 150, Haping Road, Nangang District, Harbin, 150001, China
| | - Chao Liu
- Department of Gastrointestinal Medical Oncology, Harbin Medical University Cancer Hospital, No. 150, Haping Road, Nangang District, Harbin, 150001, China
| | - Jing Yang
- Department of Systems Biology, College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150086, China
| | - Huiting Xiao
- Department of Systems Biology, College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150086, China
| | - Jiashuai Zhang
- Department of Systems Biology, College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150086, China
| | - Yanqiao Zhang
- Department of Gastrointestinal Medical Oncology, Harbin Medical University Cancer Hospital, No. 150, Haping Road, Nangang District, Harbin, 150001, China.
| | - Wenyuan Zhao
- Department of Systems Biology, College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150086, China.
| |
|
49
|
Zhang Z, Zhang S, Li X, Zhao Z, Chen C, Zhang J, Li M, Wei Z, Jiang W, Pan B, Li Y, Liu Y, Cao Y, Zhao W, Gu Y, Yu Y, Meng Q, Qi L. Reference genome and annotation updates lead to contradictory prognostic predictions in gene expression signatures: a case study of resected stage I lung adenocarcinoma. Brief Bioinform 2020; 22:5834482. [PMID: 32383445 DOI: 10.1093/bib/bbaa081] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 02/24/2020] [Revised: 04/02/2020] [Accepted: 04/18/2020] [Indexed: 12/28/2022] Open
Abstract
RNA-sequencing enables accurate and low-cost transcriptome-wide detection. However, expression estimates vary as reference genomes and gene annotations are updated, confounding existing expression-based prognostic signatures. Herein, a prognostic 9-gene pair signature (GPS) was applied to 197 patients with stage I lung adenocarcinoma derived from previous and latest data from The Cancer Genome Atlas (TCGA) processed with different reference genomes and annotations. For 9-GPS, 6.6% of patients exhibited discordant risk classifications between the two TCGA versions. Similar results were observed for other prognostic signatures, including IRGPI, 15-gene and ORACLE. We found that conflicting annotations for gene length and overlap were the major cause of their discordant risk classification. Therefore, we constructed a prognostic 40-GPS based on stable genes across GENCODE v20-v30 and validated it using public data of 471 stage I samples (log-rank P < 0.0010). Risk classification was still stable in RNA-sequencing data processed with the newest GENCODE v32 versus GENCODE v20-v30. Specifically, 40-GPS could predict survival for 30 stage I samples with formalin-fixed paraffin-embedded tissues (log-rank P = 0.0177). In conclusion, this method overcomes the vulnerability of existing prognostic signatures to reference genome and annotation updates. 40-GPS may offer individualized clinical applications due to its prognostic accuracy and classification stability.
|
50
|
Distinguishing Kawasaki Disease from Febrile Infectious Disease Using Gene Pair Signatures. BIOMED RESEARCH INTERNATIONAL 2020; 2020:6539398. [PMID: 32420360 PMCID: PMC7201505 DOI: 10.1155/2020/6539398] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/06/2020] [Accepted: 03/24/2020] [Indexed: 12/24/2022]
Abstract
Kawasaki disease (KD) is an acute systemic vasculitis of childhood with prolonged fever. The diagnosis of KD is mainly based on clinical criteria, so KD is prone to being misdiagnosed as another febrile infectious (FI) disease. Currently, there remain no effective molecular markers for KD diagnosis. In this study, we aimed to use a relative-expression-based method, k-TSP, and a resampling framework to identify robust gene pair signatures to distinguish KD from bacterial and viral febrile infectious diseases. Our study pool consisted of 808 childhood patients from several studies, assigned to three groups, namely, the discovery set (n = 224), validation set-1 (n = 197), and validation set-2 (n = 387). We identified 60 biologically relevant gene pairs and developed a top-ranked gene pair classifier (TRGP) using the first seven signatures, with an area under the receiver-operating characteristic curve (AUROC) of 0.947 (95% CI, 0.918-0.976), a sensitivity of 0.936 (95% CI, 0.872-0.987), and a specificity of 0.774 (95% CI, 0.705-0.836) in the discovery set. In validation set-1, the TRGP classifier distinguished KD from FI with an AUROC of 0.955 (95% CI, 0.919-0.991), a sensitivity of 0.959 (95% CI, 0.925-0.986), and a specificity of 0.863 (95% CI, 0.764-0.961). In validation set-2, the classifier achieved an AUROC of 0.796 (95% CI, 0.747-0.845), a sensitivity of 0.797 (95% CI, 0.720-0.864), and a specificity of 0.661 (95% CI, 0.606-0.717). Our study reveals that gene pair signatures are robust across diverse studies and can be utilized as objective biomarkers to distinguish KD from FI, helping to develop a fast, simple, and effective molecular approach to improve the diagnosis of KD.
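Illustrative note: a k-TSP style classifier of the kind described here can be sketched as an unweighted majority vote over gene pairs, followed by sensitivity and specificity computed from the predictions. The voting rule is the commonly described k-TSP scheme and is an assumption about this study's exact implementation; the gene names, the three pairs, and the samples below are hypothetical (the abstract's classifier uses seven pairs).

```python
def ktsp_predict(sample, pairs):
    """Predict class 1 when most pairs satisfy gene_a > gene_b in this sample.

    sample: dict mapping gene name -> expression value
    pairs:  list of (gene_a, gene_b) tuples; an odd count avoids ties
    """
    votes = sum(sample[a] > sample[b] for a, b in pairs)
    return 1 if votes > len(pairs) / 2 else 0

def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical signature of three gene pairs and four made-up samples.
pairs = [("G1", "G2"), ("G3", "G4"), ("G5", "G6")]
samples = [
    {"G1": 5, "G2": 1, "G3": 4, "G4": 2, "G5": 1, "G6": 3},  # 2 of 3 votes
    {"G1": 1, "G2": 5, "G3": 1, "G4": 2, "G5": 3, "G6": 1},  # 1 of 3 votes
    {"G1": 2, "G2": 1, "G3": 1, "G4": 3, "G5": 1, "G6": 2},  # 1 of 3 votes
    {"G1": 9, "G2": 1, "G3": 8, "G4": 2, "G5": 7, "G6": 3},  # 3 of 3 votes
]
y_true = [1, 0, 1, 1]  # 1 = KD, 0 = FI (labels invented for the example)

y_pred = [ktsp_predict(s, pairs) for s in samples]
sens, spec = sensitivity_specificity(y_true, y_pred)
```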
|