1
Anuntakarun S, Khamjerm J, Tangkijvanich P, Chuaypen N. Classification of Long Non-Coding RNAs Between Early and Late Stage of Liver Cancers From Non-coding RNA Profiles Using Machine-Learning Approach. Bioinform Biol Insights 2024; 18:11779322241258586. [PMID: 38846329 PMCID: PMC11155358 DOI: 10.1177/11779322241258586] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2023] [Accepted: 05/10/2024] [Indexed: 06/09/2024] Open
Abstract
Long non-coding RNAs (lncRNAs), which are RNA sequences greater than 200 nucleotides in length, play a crucial role in regulating gene expression and biological processes associated with cancer development and progression. Liver cancer is a major cause of cancer-related mortality worldwide, notably in Thailand. Although machine learning has been extensively used in analyzing RNA-sequencing data for advanced knowledge, the identification of potential lncRNA biomarkers for cancer, particularly focusing on lncRNAs as molecular biomarkers in liver cancer, remains comparatively limited. In this study, our objective was to identify candidate lncRNAs in liver cancer. We employed an expression data set of lncRNAs from patients with liver cancer, which comprised 40 699 lncRNAs sourced from The CancerLivER database. Various feature selection methods and machine-learning approaches were used to identify these candidate lncRNAs. The results showed that the random forest algorithm could predict lncRNAs using features extracted from the database, which achieved an area under the curve (AUC) of 0.840 for classifying lncRNAs between early (stage 1) and late stages (stages 2, 3, and 4) of liver cancer. Five of 23 significant lncRNAs (WAC-AS1, MAPKAPK5-AS1, ARRDC1-AS1, AC133528.2, and RP11-1094M14.11) were differentially expressed between early and late stage of liver cancer. Based on the Gene Expression Profiling Interactive Analysis (GEPIA) database, higher expression of WAC-AS1, MAPKAPK5-AS1, and ARRDC1-AS1 was associated with shorter overall survival. In conclusion, the classification model could predict the early and late stages of liver cancer using the signature expression of lncRNA genes. The identified lncRNAs might be used as early diagnostic and prognostic biomarkers for patients with liver cancer.
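As a rough illustration of what the reported AUC of 0.840 measures, the sketch below computes the area under the ROC curve as the fraction of (late, early) sample pairs in which the late-stage sample receives the higher score. The labels and scores are made-up placeholders, not data from the study, and the function name `roc_auc` is introduced here for illustration only.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical example: 1 = late stage (stages 2-4), 0 = early stage (stage 1).
labels = [0, 0, 0, 1, 1, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.30, 0.90, 0.70]  # placeholder model outputs
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two stage groups.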
Affiliation(s)
- Songtham Anuntakarun
  - Center of Excellence in Hepatitis and Liver Cancer, Department of Biochemistry, Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand
- Jakkrit Khamjerm
  - Center of Excellence in Hepatitis and Liver Cancer, Department of Biochemistry, Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand
  - Biomedical Engineering Program, Department of Computer Engineering, Faculty of Engineering, Chulalongkorn University, Bangkok, Thailand
- Pisit Tangkijvanich
  - Center of Excellence in Hepatitis and Liver Cancer, Department of Biochemistry, Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand
- Natthaya Chuaypen
  - Center of Excellence in Hepatitis and Liver Cancer, Department of Biochemistry, Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand
2
Chereda H, Leha A, Beißbarth T. Stable feature selection utilizing Graph Convolutional Neural Network and Layer-wise Relevance Propagation for biomarker discovery in breast cancer. Artif Intell Med 2024; 151:102840. [PMID: 38658129 DOI: 10.1016/j.artmed.2024.102840] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2023] [Revised: 03/05/2024] [Accepted: 03/10/2024] [Indexed: 04/26/2024]
Abstract
High-throughput technologies are becoming increasingly important in discovering prognostic biomarkers and in identifying novel drug targets. With Mammaprint, Oncotype DX, and many other prognostic molecular signatures breast cancer is one of the paradigmatic examples of the utility of high-throughput data to deliver prognostic biomarkers, that can be represented in a form of a rather short gene list. Such gene lists can be obtained as a set of features (genes) that are important for the decisions of a Machine Learning (ML) method applied to high-dimensional gene expression data. Several studies have identified predictive gene lists for patient prognosis in breast cancer, but these lists are unstable and have only a few genes in common. Instability of feature selection impedes biological interpretability: genes that are relevant for cancer pathology should be members of any predictive gene list obtained for the same clinical type of patients. Stability and interpretability of selected features can be improved by including information on molecular networks in ML methods. Graph Convolutional Neural Network (GCNN) is a contemporary deep learning approach applicable to gene expression data structured by a prior knowledge molecular network. Layer-wise Relevance Propagation (LRP) and SHapley Additive exPlanations (SHAP) are methods to explain individual decisions of deep learning models. We used both GCNN+LRP and GCNN+SHAP techniques to construct feature sets by aggregating individual explanations. We suggest a methodology to systematically and quantitatively analyze the stability, the impact on the classification performance, and the interpretability of the selected feature sets. We used this methodology to compare GCNN+LRP to GCNN+SHAP and to more classical ML-based feature selection approaches. 
Utilizing a large breast cancer gene expression dataset we show that, while feature selection with SHAP is useful in applications where selected features have to be impactful for classification performance, among all studied methods GCNN+LRP delivers the most stable (reproducible) and interpretable gene lists.
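The notion of stable (reproducible) gene lists can be made concrete with a small sketch. The abstract does not say which stability measure the authors used, so the mean pairwise Jaccard index below is an assumption chosen for illustration, and the gene names are placeholders.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two gene lists: |intersection| / |union|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def mean_pairwise_stability(gene_lists):
    """Average Jaccard index over all pairs of selected gene lists."""
    pairs = list(combinations(gene_lists, 2))
    if not pairs:
        raise ValueError("need at least two gene lists")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical gene lists selected on three resampled training sets.
gene_lists = [
    ["GENE_A", "GENE_B", "GENE_C", "GENE_D"],
    ["GENE_A", "GENE_B", "GENE_C", "GENE_E"],
    ["GENE_A", "GENE_B", "GENE_F", "GENE_G"],
]
stability = mean_pairwise_stability(gene_lists)
print(f"mean pairwise Jaccard = {stability:.3f}")
```

A value near 1 means the selection method returns nearly the same genes on every resample; a value near 0 means the lists barely overlap.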
Affiliation(s)
- Hryhorii Chereda
  - Medical Bioinformatics, University Medical Center Göttingen, Goldschmidtstraße 1, Göttingen, 37077, Germany
- Andreas Leha
  - Medical Bioinformatics, University Medical Center Göttingen, Goldschmidtstraße 1, Göttingen, 37077, Germany; Medical Statistics, University Medical Center Göttingen, Humboldtallee 32, Göttingen, 37073, Germany; Scientific Core Facility Medical Biometry and Statistical Bioinformatics, University Medical Center Göttingen, Humboldtallee 32, Göttingen, 37073, Germany
- Tim Beißbarth
  - Medical Bioinformatics, University Medical Center Göttingen, Goldschmidtstraße 1, Göttingen, 37077, Germany; Campus-Institute Data Science (CIDAS), University of Göttingen, Goldschmidtstraße 1, Göttingen, 37077, Germany
3
Li Y, Li P, Liu Y, Geng W. A novel gene-based model for prognosis prediction of head and neck squamous cell carcinoma. Heliyon 2024; 10:e29449. [PMID: 38660262 PMCID: PMC11040035 DOI: 10.1016/j.heliyon.2024.e29449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2024] [Accepted: 04/08/2024] [Indexed: 04/26/2024] Open
Abstract
Background Head and neck squamous cell carcinoma (HNSCC) is a significant global health challenge. The identification of reliable prognostic biomarkers and construction of an accurate prognostic model are crucial. Methods In this study, mRNA expression data and clinical data of HNSCC patients from The Cancer Genome Atlas were used. Overlapping candidate genes (OCGs) were identified by intersecting differentially expressed genes and prognosis-related genes. Best prognostic genes were selected using the least absolute shrinkage and selection operator Cox regression based on OCGs, and a risk score was developed using the Cox coefficient of each gene. The prognostic power of the risk score was assessed using Kaplan-Meier survival analysis and time-dependent receiver operating characteristic analysis. Univariate and multivariate Cox regression were performed to identify independent prognostic parameters, which were used to construct a nomogram. The predictive accuracy of the nomogram was evaluated using calibration plots. Functional enrichment analysis of risk score related genes was performed to explore the potential biological functions and pathways. External validation was conducted using data from the Gene Expression Omnibus and ArrayExpress databases. Results FADS3, TNFRSF12A, TJP3, and FUT6 were screened to be significantly related to prognosis in HNSCC patients. The risk score effectively stratified patients into high-risk group with poor overall survival (OS) and low-risk group with better OS. Risk score, age, clinical M stage and clinical N stage were regarded as independent prognostic parameters by Cox regression analysis and used to construct a nomogram. The nomogram performed well in 1-, 2-, 3-, 5- and 10-year survival predictions. Functional enrichment analysis suggested that tight junction was closely related to the cancer. In addition, the prognostic power of the risk score was validated by external datasets. 
Conclusions This study constructed a gene-based model integrating clinical prognostic parameters to accurately predict prognosis in HNSCC patients.
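A risk score of the kind described here is a weighted sum of gene expression values, with the Cox coefficients as weights. The sketch below uses the four gene names from the abstract, but every coefficient and expression value is a hypothetical placeholder, and the median cutoff for the high-risk and low-risk groups is an assumption (the abstract does not state how the cutoff was chosen).

```python
from statistics import median

# Hypothetical Cox coefficients; the real values are not given in the abstract.
COEFS = {"FADS3": 0.2, "TNFRSF12A": 0.3, "TJP3": -0.1, "FUT6": -0.2}


def risk_score(expression):
    """Linear risk score: sum of coefficient * expression over the signature genes."""
    return sum(COEFS[gene] * expression[gene] for gene in COEFS)


# Placeholder expression profiles for three hypothetical patients.
patients = {
    "P1": {"FADS3": 2.0, "TNFRSF12A": 3.0, "TJP3": 1.0, "FUT6": 1.0},
    "P2": {"FADS3": 1.0, "TNFRSF12A": 1.0, "TJP3": 3.0, "FUT6": 2.0},
    "P3": {"FADS3": 3.0, "TNFRSF12A": 2.0, "TJP3": 2.0, "FUT6": 1.0},
}

scores = {pid: risk_score(expr) for pid, expr in patients.items()}
cutoff = median(scores.values())  # assumed cutoff: cohort median
groups = {pid: "high" if s > cutoff else "low" for pid, s in scores.items()}
print(scores, groups)
```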
Affiliation(s)
- Yanxi Li
  - Department of Dental Implant Center, Beijing Stomatological Hospital, School of Stomatology, Capital Medical University, Beijing, 100050, China
- Peiran Li
  - Department of Maxillofacial Surgery, Beijing Stomatological Hospital, School of Stomatology, Capital Medical University, Beijing, 100050, China
- Yuqi Liu
  - Department of Dental Implant Center, Beijing Stomatological Hospital, School of Stomatology, Capital Medical University, Beijing, 100050, China
- Wei Geng
  - Department of Dental Implant Center, Beijing Stomatological Hospital, School of Stomatology, Capital Medical University, Beijing, 100050, China
4
Zhang L, Liu Q, Guo Y, Tian L, Chen K, Bai D, Yu H, Han X, Luo W, Feng T, Deng S, Xie G. DNA-based molecular classifiers for the profiling of gene expression signatures. J Nanobiotechnology 2024; 22:189. [PMID: 38632615 PMCID: PMC11025223 DOI: 10.1186/s12951-024-02445-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2023] [Accepted: 03/28/2024] [Indexed: 04/19/2024] Open
Abstract
Although gene expression signatures offer tremendous potential for disease diagnosis and prognosis, large gene expression signatures pose challenges for experimental detection and computational analysis in clinical settings. Here, we introduce a universal DNA-based molecular classifier for profiling gene expression signatures and generating immediate diagnostic outcomes. The molecular classifier begins with feature transformation, in which a modular and programmable strategy captures the relative relationships of low-concentration RNAs and converts them to general coding inputs. Then, competitive inhibition of the DNA catalytic reaction enables strict weight assignment for different inputs according to their importance, followed by summation, annihilation and reporting to accurately implement the mathematical model of the classifier. We validated the entire workflow by utilizing miRNA expression levels for the diagnosis of hepatocellular carcinoma (HCC) in clinical samples, with an accuracy of 85.7%. The results demonstrate that the molecular classifier provides a universal solution to explore the correlation between gene expression patterns and disease diagnostics, monitoring, and prognosis, and supports personalized healthcare in primary care.
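The mathematical model that the DNA circuit implements (feature transformation, weighting, summation, annihilation, reporting) can be mimicked in software. The sketch below is only an in-silico analogy under stated assumptions: the pairwise-comparison encoding, the integer weights, and the function names `pairwise_feature` and `classify` are all hypothetical and are not taken from the paper.

```python
def pairwise_feature(level_a, level_b):
    """Feature transformation (assumed form): encode a relative relationship
    between two RNA levels as a binary input."""
    return 1 if level_a > level_b else 0


def classify(inputs, weights):
    """Weighted-sum classifier with an explicit annihilation step."""
    # Summation: accumulate positively and negatively weighted inputs separately.
    positive = sum(w * x for w, x in zip(weights, inputs) if w > 0)
    negative = sum(-w * x for w, x in zip(weights, inputs) if w < 0)
    # Annihilation: the two pools cancel one-to-one, leaving only the excess.
    cancelled = min(positive, negative)
    positive -= cancelled
    negative -= cancelled
    # Reporting: whichever pool survives determines the output (ties -> negative).
    return "positive" if positive > 0 else "negative"


weights = [2, -3, 1]  # hypothetical importance weights
sample_1 = [pairwise_feature(a, b) for a, b in [(5.0, 2.0), (1.0, 4.0), (3.0, 1.0)]]
sample_2 = [pairwise_feature(a, b) for a, b in [(1.0, 2.0), (6.0, 4.0), (3.0, 1.0)]]
result_1 = classify(sample_1, weights)
result_2 = classify(sample_2, weights)
print(result_1, result_2)
```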
Affiliation(s)
- Li Zhang
  - Key Laboratory of Laboratory Medical Diagnostics, Ministry of Education, Department of Laboratory Medicine, Chongqing Medical University, Chongqing, 400016, China
  - Department of Forensic Medicine, Chongqing Medical University, Chongqing, 400016, China
- Qian Liu
  - Nuclear Medicine Department, The Second Affiliated Hospital of Chongqing Medical University, Chongqing, 400010, China
- Yongcan Guo
  - Clinical Laboratory, Traditional Chinese Medicine Hospital Affiliated to Southwest Medical University, Luzhou, 646000, China
- Luyao Tian
  - Key Laboratory of Laboratory Medical Diagnostics, Ministry of Education, Department of Laboratory Medicine, Chongqing Medical University, Chongqing, 400016, China
- Kena Chen
  - Key Laboratory of Laboratory Medical Diagnostics, Ministry of Education, Department of Laboratory Medicine, Chongqing Medical University, Chongqing, 400016, China
- Dan Bai
  - Key Laboratory of Laboratory Medical Diagnostics, Ministry of Education, Department of Laboratory Medicine, Chongqing Medical University, Chongqing, 400016, China
- Hongyan Yu
  - Key Laboratory of Laboratory Medical Diagnostics, Ministry of Education, Department of Laboratory Medicine, Chongqing Medical University, Chongqing, 400016, China
- Xiaole Han
  - Key Laboratory of Laboratory Medical Diagnostics, Ministry of Education, Department of Laboratory Medicine, Chongqing Medical University, Chongqing, 400016, China
- Wang Luo
  - Key Laboratory of Laboratory Medical Diagnostics, Ministry of Education, Department of Laboratory Medicine, Chongqing Medical University, Chongqing, 400016, China
- Tong Feng
  - Key Laboratory of Laboratory Medical Diagnostics, Ministry of Education, Department of Laboratory Medicine, Chongqing Medical University, Chongqing, 400016, China
- Shixiong Deng
  - Department of Forensic Medicine, Chongqing Medical University, Chongqing, 400016, China
- Guoming Xie
  - Key Laboratory of Laboratory Medical Diagnostics, Ministry of Education, Department of Laboratory Medicine, Chongqing Medical University, Chongqing, 400016, China
5
Hu P, Wang T, Yan H, Huang Y, Zhao Y, Gao Y. Crucial role of hsa-mir-503, hsa-mir-1247, and their validation in prostate cancer. Aging (Albany NY) 2023; 15:12966-12981. [PMID: 37980162 DOI: 10.18632/aging.205213] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2023] [Accepted: 10/17/2023] [Indexed: 11/20/2023]
Abstract
BACKGROUND Prostate cancer (PC) is a common urinary system malignancy, and patients with advanced PC have a poor prognosis due to recurrence or distant metastasis. Therefore, it is imperative to reveal more details of tumorigenesis and prognosis in PC patients. METHODS The miRNA and mRNA expression profile data of 485 PC patients were obtained from The Cancer Genome Atlas database. Univariate Cox regression was applied to screen miRNAs related to the prognosis of PC. Then miRTarBase was used to predict target mRNAs of the miRNAs. hsa-mir-503/hsa-mir-1247 knockdown in 22RV1 cells was established to evaluate the effect of these two miRNAs on tumor cell migration and invasion ability. Flow cytometry was used to detect the effect of hsa-mir-503/hsa-mir-1247 knockdown on the 22RV1 apoptosis rate. RESULTS Univariate Cox regression analysis identified hsa-mir-503 as a poor and hsa-mir-1247 as a favorable prognostic marker. In total, 649 target mRNAs were screened, among which DUSP19, FGF2, and SLC2A5 had a negative correlation with hsa-mir-503, while FGF2 and VSTM4 had a positive correlation with hsa-mir-1247. In 22RV1 cells, hsa-mir-503 was up-regulated, and hsa-mir-1247 was down-regulated. hsa-mir-503 knockdown attenuated the migration and invasion of 22RV1 cells, while hsa-mir-1247 knockdown exhibited the opposite effect. In addition, hsa-mir-503 knockdown promoted 22RV1 cell apoptosis. hsa-mir-1247 overexpression significantly inhibited the tumor growth of PC in vivo. CONCLUSIONS Herein, we demonstrated that hsa-mir-503 and hsa-mir-1247 could serve as new prognostic markers of PC, and hsa-mir-1247 had great potential to inhibit PC progression by suppressing the migration and invasion ability in vitro and in vivo.
Affiliation(s)
- Ping Hu
  - The First Department of Medical Oncology, General Hospital of Ningxia Medical University, Yinchuan 750004, Ningxia, P.R. China
- Tao Wang
  - The Second Department of Surgical Oncology, General Hospital of Ningxia Medical University, Yinchuan 750004, Ningxia, P.R. China
- Hui Yan
  - The Second Department of Medicine Oncology, General Hospital of Ningxia Medical University, Yinchuan 750004, Ningxia, P.R. China
- Ying Huang
  - The Third Department of Medicine Oncology, General Hospital of Ningxia Medical University, Yinchuan 750004, Ningxia, P.R. China
- Yanjiao Zhao
  - The Third Department of Medicine Oncology, General Hospital of Ningxia Medical University, Yinchuan 750004, Ningxia, P.R. China
- Yuanyuan Gao
  - The Third Department of Medicine Oncology, General Hospital of Ningxia Medical University, Yinchuan 750004, Ningxia, P.R. China
6
Chen Y, Liu S, Papageorgiou LG, Theofilatos K, Tsoka S. Optimisation Models for Pathway Activity Inference in Cancer. Cancers (Basel) 2023; 15:1787. [PMID: 36980673 PMCID: PMC10046797 DOI: 10.3390/cancers15061787] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2023] [Revised: 02/24/2023] [Accepted: 03/08/2023] [Indexed: 03/18/2023] Open
Abstract
BACKGROUND With advances in high-throughput technologies, there has been an enormous increase in data related to profiling the activity of molecules in disease. While such data provide more comprehensive information on cellular actions, their large volume and complexity pose difficulty in accurate classification of disease phenotypes. Therefore, novel modelling methods that can improve accuracy while offering interpretable means of analysis are required. Biological pathways can be used to incorporate a priori knowledge of biological interactions to decrease data dimensionality and increase the biological interpretability of machine learning models. METHODOLOGY A mathematical optimisation model is proposed for pathway activity inference towards precise disease phenotype prediction and is applied to RNA-Seq datasets. The model is based on mixed-integer linear programming (MILP) mathematical optimisation principles and infers pathway activity as the linear combination of pathway member gene expression, multiplying expression values with model-determined gene weights that are optimised to maximise discrimination of phenotype classes and minimise incorrect sample allocation. RESULTS The model is evaluated on the transcriptome of breast and colorectal cancer, and exhibits solution results of good optimality as well as good prediction performance on related cancer subtypes. Two baseline pathway activity inference methods and three advanced methods are used for comparison. Sample prediction accuracy, robustness against noise expression data, and survival analysis suggest competitive prediction performance of our model while providing interpretability and insight on key pathways and genes. Overall, our work demonstrates that the flexible nature of mathematical programming lends itself well to developing efficient computational strategies for pathway activity inference and disease subtype prediction.
Affiliation(s)
- Yongnan Chen
  - Department of Informatics, Faculty of Natural, Mathematical and Engineering Sciences, King's College London, Bush House, London WC2B 4BG, UK
- Songsong Liu
  - School of Management, Harbin Institute of Technology, Harbin 150001, China
- Lazaros G Papageorgiou
  - The Sargent Centre for Process Systems Engineering, Department of Chemical Engineering, University College London, Torrington Place, London WC1E 7JE, UK
- Konstantinos Theofilatos
  - King's College London British Heart Foundation Centre, School of Cardiovascular and Metabolic Medicine and Sciences, London SE1 7EH, UK
- Sophia Tsoka
  - Department of Informatics, Faculty of Natural, Mathematical and Engineering Sciences, King's College London, Bush House, London WC2B 4BG, UK
7
Santhanam B, Oikonomou P, Tavazoie S. Systematic assessment of prognostic molecular features across cancers. CELL GENOMICS 2023; 3:100262. [PMID: 36950380 PMCID: PMC10025453 DOI: 10.1016/j.xgen.2023.100262] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2022] [Revised: 09/29/2022] [Accepted: 01/12/2023] [Indexed: 02/05/2023]
Abstract
Precision oncology promises accurate prediction of disease trajectories by utilizing molecular features of tumors. We present a systematic analysis of the prognostic potential of diverse molecular features across large cancer cohorts. We find that the mRNA expression of biologically coherent sets of genes (modules) is substantially more predictive of patient survival than single-locus genomic and transcriptomic aberrations. Extending our analysis beyond existing curated gene modules, we find a large novel class of highly prognostic DNA/RNA cis-regulatory modules associated with dynamic gene expression within cancers. Remarkably, in more than 82% of cancers, modules substantially improve survival stratification compared with conventional clinical factors and prominent genomic aberrations. The prognostic potential of cancer modules generalizes to external cohorts better than conventionally used single-gene features. Finally, a machine-learning framework demonstrates the combined predictive power of multiple modules, yielding prognostic models that perform substantially better than existing histopathological and clinical factors in common use.
Affiliation(s)
- Balaji Santhanam
  - Department of Biological Sciences, Columbia University, New York, NY 10027, USA
  - Department of Systems Biology, Columbia University, New York, NY 10032, USA
  - Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA
  - Irving Institute for Cancer Dynamics, Columbia University, New York, NY 10032, USA
- Panos Oikonomou
  - Department of Biological Sciences, Columbia University, New York, NY 10027, USA
  - Department of Systems Biology, Columbia University, New York, NY 10032, USA
  - Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA
  - Irving Institute for Cancer Dynamics, Columbia University, New York, NY 10032, USA
- Saeed Tavazoie
  - Department of Biological Sciences, Columbia University, New York, NY 10027, USA
  - Department of Systems Biology, Columbia University, New York, NY 10032, USA
  - Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA
  - Irving Institute for Cancer Dynamics, Columbia University, New York, NY 10032, USA
8
Kircher M, Säurich J, Selle M, Jung K. Assessing Outlier Probabilities in Transcriptomics Data When Evaluating a Classifier. Genes (Basel) 2023; 14:genes14020387. [PMID: 36833313 PMCID: PMC9956321 DOI: 10.3390/genes14020387] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2022] [Revised: 01/27/2023] [Accepted: 01/30/2023] [Indexed: 02/04/2023] Open
Abstract
Outliers in the training or test set used to fit and evaluate a classifier on transcriptomics data can considerably change the estimated performance of the model. As a result, the reported accuracy may be either too pessimistic or too optimistic, and the estimated model performance cannot be reproduced on independent data. It is then also doubtful whether a classifier qualifies for clinical usage. We estimate classifier performances in simulated gene expression data with artificial outliers and in two real-world datasets. As a new approach, we use two outlier detection methods within a bootstrap procedure to estimate the outlier probability for each sample and evaluate classifiers before and after outlier removal by means of cross-validation. We found that the removal of outliers changed the classification performance notably. For the most part, removing outliers improved the classification results. Taking into account the fact that there are various, sometimes unclear reasons for a sample to be an outlier, we strongly advocate always reporting the performance of a transcriptomics classifier with and without outliers in training and test data. This provides a more diverse picture of a classifier's performance and prevents reporting models that later turn out not to be applicable for clinical diagnoses.
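The bootstrap idea can be sketched as follows: resample the data many times, run an outlier detector on each resample, and report for every sample the fraction of resamples containing it in which it was flagged. The abstract does not name the two detection methods used, so the median/MAD rule below is a stand-in assumption, and the one-dimensional values are placeholders for a per-sample summary of expression data.

```python
import random
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Flag values whose modified z-score (median/MAD based) exceeds the threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return [False] * len(values)  # no spread: nothing can be flagged
    return [abs(0.6745 * (v - med) / mad) > threshold for v in values]


def outlier_probabilities(values, n_boot=500, seed=0):
    """Per-sample outlier probability estimated over bootstrap resamples."""
    rng = random.Random(seed)
    n = len(values)
    drawn = [0] * n    # resamples in which sample i appeared
    flagged = [0] * n  # resamples in which sample i was flagged
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        flags = mad_outliers([values[i] for i in idx])
        # Copies of the same sample share one value, hence one flag.
        for i, is_outlier in dict(zip(idx, flags)).items():
            drawn[i] += 1
            flagged[i] += int(is_outlier)
    # Samples never drawn get 0.0 (no evidence either way).
    return [flagged[i] / drawn[i] if drawn[i] else 0.0 for i in range(n)]


# Placeholder data: the ninth value (index 8) is an artificial outlier.
values = [10.0, 10.2, 9.9, 10.1, 9.8, 10.3, 10.0, 9.7, 25.0, 10.1]
probs = outlier_probabilities(values)
print([round(p, 2) for p in probs])
```

Samples with a high estimated probability would then be removed before re-running cross-validation, so that performance can be reported both with and without them.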
9
Liu X, Su L, Li J, Ou G. Identification of Pathway-Based Biomarkers with Crosstalk Analysis for Overall Survival Risk Prediction in Breast Cancer. Front Genet 2021; 12:689715. [PMID: 34745202 PMCID: PMC8566719 DOI: 10.3389/fgene.2021.689715] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2021] [Accepted: 09/28/2021] [Indexed: 11/16/2022] Open
Abstract
Recently, many studies have investigated the role of gene signatures in the prognostic assessment of breast cancer (BC); however, tumor heterogeneity and sequencing noise have limited the clinical usage of these models. Pathway-based approaches are more robust to perturbations in the expression of individual genes. In this study, we constructed a prognostic classifier based on survival-related pathway crosstalk analysis. We estimated pathway deregulation scores (PDSs) for samples collected from public databases to select survival-related pathways. After pathway crosstalk analysis, we conducted K-means clustering analysis to cluster the patients into G1 and G2 subgroups. The survival outcome of the G2 subgroup was significantly worse than that of the G1 subgroup. Internal and external datasets exhibited high consistency with the training dataset. Significant differences were found between the G2 and G1 subgroups in pathway activity, gene mutation, and immune cell infiltration levels; in particular, the activities of immune cells and immune pathways were significantly negatively associated with BC patients' outcomes. In conclusion, we established a novel classifier reflecting the overall survival risk of BC and successfully validated its clinical usage on multiple BC datasets, which could offer clinicians inspiration in formulating the clinical treatment plan.
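The clustering step can be illustrated with a minimal two-cluster K-means. This is a simplified sketch: the study clusters patients on pathway-level features after crosstalk analysis, whereas the example uses a single hypothetical deregulation score per patient, a deterministic initialization, and an arbitrary convention that G2 is the higher-scoring cluster.

```python
def kmeans_two_groups(scores, n_iter=100):
    """Lloyd's algorithm for k = 2 on one-dimensional scores.

    Centroids start at the minimum and maximum score, so the result is
    deterministic. Returns (labels, (centroid_g1, centroid_g2)).
    """
    c1, c2 = min(scores), max(scores)
    if c1 == c2:
        raise ValueError("all scores are identical; cannot form two clusters")
    labels = []
    for _ in range(n_iter):
        # Assignment step: each score joins its nearest centroid.
        new_labels = ["G1" if abs(s - c1) <= abs(s - c2) else "G2" for s in scores]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: recompute each centroid as the mean of its members.
        g1 = [s for s, lab in zip(scores, labels) if lab == "G1"]
        g2 = [s for s, lab in zip(scores, labels) if lab == "G2"]
        c1, c2 = sum(g1) / len(g1), sum(g2) / len(g2)
    return labels, (c1, c2)


# Hypothetical per-patient pathway deregulation scores.
pds = [0.10, 0.20, 0.15, 0.90, 1.00, 0.95]
labels, centroids = kmeans_two_groups(pds)
print(labels, centroids)
```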
Affiliation(s)
- Xiaohua Liu
  - State Key Laboratory of Oncology in South China, Sun Yat-sen University Cancer Center, Guangzhou, China
- Lili Su
  - School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- Jingcong Li
  - State Key Laboratory of Oncology in South China, Sun Yat-sen University Cancer Center, Guangzhou, China
- Guoping Ou
  - State Key Laboratory of Oncology in South China, Sun Yat-sen University Cancer Center, Guangzhou, China
10
Yang X, Jin X, Xu R, Yu Z, An N. ER expression associates with poor prognosis in male lung squamous carcinoma after radical resection. BMC Cancer 2021; 21:1043. [PMID: 34548052 PMCID: PMC8456567 DOI: 10.1186/s12885-021-08777-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2021] [Accepted: 09/08/2021] [Indexed: 11/30/2022] Open
Abstract
Background Clinical options for lung squamous carcinoma (LUSC) are still quite limited. Carcinogenesis is an exceedingly complicated process involving multi-level dysregulations. Therefore, only looking into one layer of genomic dysregulation is far from sufficient. Methods We identified differentially expressed genes with consistent upstream genetic or epigenetic dysregulations in LUSC. Random walk was adopted to identify genes significantly affected by upstream abnormalities. Expression differentiation and survival analysis were conducted for these significant genes, respectively. The prognostic power of the selected gene was also tested in 102 male LUSC samples through immunohistochemistry assay. Results Twelve genes were successfully retrieved from the biological network, including ERα (ESR1), EGFR, AR, ATXN1, MAPK3, PRKACA, PRKCA, SMAD4, TP53, TRAF2, UBQLN4 and YWHAG, which were closely related to the sex hormone signaling pathway. Survival analysis in public datasets indicated ERα was significantly associated with a poor overall survival (OS) in male LUSC. The result of our immunohistochemistry assay also demonstrated this correlation using R0 resected tumors (n = 102, HR: 2.152, 95% CI: 1.089–4.255, p = 0.024). Although the disease-free survival (DFS) difference was non-significant (n = 102, p = 0.12), the trend was in the same direction. Cox analysis indicated ERα was the only independent prognostic factor for male patients’ OS after R0 resection (HR = 2.152, p = 0.037). Conclusion ERα was significantly related to a poor prognosis in LUSC, especially for male patients after radical surgery, confirmed by our immunohistochemistry data. Supplementary Information The online version contains supplementary material available at 10.1186/s12885-021-08777-6.
Affiliation(s)
- Xue Yang
  - Department of Medical Oncology, The Affiliated Hospital of Qingdao University, Qingdao, 266003, Shandong, China
- Xiangfeng Jin
  - Department of Thoracic Surgery, The Affiliated Hospital of Qingdao University, Qingdao, 266003, Shandong, China
- Rongjian Xu
  - Department of Thoracic Surgery, The Affiliated Hospital of Qingdao University, Qingdao, 266003, Shandong, China
- Zhuang Yu
  - Department of Medical Oncology, The Affiliated Hospital of Qingdao University, Qingdao, 266003, Shandong, China
- Ning An
  - Department of Radiation Oncology, The Affiliated Hospital of Qingdao University, Qingdao, 266003, Shandong, China
11
Bushnell GG, Deshmukh AP, den Hollander P, Luo M, Soundararajan R, Jia D, Levine H, Mani SA, Wicha MS. Breast cancer dormancy: need for clinically relevant models to address current gaps in knowledge. NPJ Breast Cancer 2021; 7:66. [PMID: 34050189 PMCID: PMC8163741 DOI: 10.1038/s41523-021-00269-x] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 01/08/2021] [Accepted: 04/08/2021] [Indexed: 02/04/2023] Open
Abstract
Breast cancer is the most commonly diagnosed cancer in the USA. Although advances in treatment over the past several decades have significantly improved the outlook for this disease, most women who are diagnosed with estrogen receptor positive disease remain at risk of metastatic relapse for the remainder of their life. The cellular source of late relapse in these patients is thought to be disseminated tumor cells that reactivate after a long period of dormancy. The biology of these dormant cells and their natural history over a patient's lifetime is largely unclear. We posit that research on tumor dormancy has been significantly limited by the lack of clinically relevant models. This review will discuss existing dormancy models, gaps in biological understanding, and propose criteria for future models to enhance their clinical relevance.
Collapse
Affiliation(s)
- Grace G Bushnell
- Department of Internal Medicine, University of Michigan, Ann Arbor, MI, USA
| | - Abhijeet P Deshmukh
- Department of Translational Molecular Pathology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Petra den Hollander
- Department of Translational Molecular Pathology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Ming Luo
- Department of Internal Medicine, University of Michigan, Ann Arbor, MI, USA
| | - Rama Soundararajan
- Department of Translational Molecular Pathology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Dongya Jia
- Center for Theoretical Biological Physics, Rice University, Houston, TX, USA
| | - Herbert Levine
- Center for Theoretical Biological Physics and Departments of Physics and Bioengineering, Northeastern University, Boston, MA, USA.
| | - Sendurai A Mani
- Department of Translational Molecular Pathology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA.
| | - Max S Wicha
- Department of Internal Medicine, University of Michigan, Ann Arbor, MI, USA.
| |
|
12
|
Li Y, Nowak CM, Pham U, Nguyen K, Bleris L. Cell morphology-based machine learning models for human cell state classification. NPJ Syst Biol Appl 2021; 7:23. [PMID: 34039992 PMCID: PMC8155075 DOI: 10.1038/s41540-021-00180-y] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 03/09/2020] [Accepted: 02/16/2021] [Indexed: 12/30/2022] Open
Abstract
Herein, we implement and assess machine learning architectures to ascertain models that differentiate healthy from apoptotic cells using exclusively forward (FSC) and side (SSC) scatter flow cytometry information. To generate training data, colorectal cancer HCT116 cells were subjected to miR-34a treatment and then classified using a conventional Annexin V/propidium iodide (PI)-staining assay. The apoptotic cells were defined as Annexin V-positive cells, which include early and late apoptotic cells, necrotic cells, as well as other dying or dead cells. In addition to the fluorescent signal, we collected cell size and granularity information from the FSC and SSC parameters. Both parameters are subdivided into area, height, and width, thus providing a total of six numerical features that informed and trained our models. A collection of logistic regression, random forest, k-nearest neighbor, multilayer perceptron, and support vector machine models was trained and tested for classification performance in predicting cell states using only the six aforementioned numerical features. Out of 1046 candidate models, a multilayer perceptron was chosen with 0.91 live precision, 0.93 live recall, 0.92 live F value and 0.97 live area under the ROC curve when applied to standardized data. We discuss and highlight differences in classifier performance and compare the results to the standard practice of forward and side scatter gating, typically performed to select cells based on size and/or complexity. We demonstrate that our model, a ready-to-use module for any flow cytometry-based analysis, can provide automated, reliable, and stain-free classification of healthy and apoptotic cells using exclusively size and granularity information.
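The abstract reports results "on standardized data" for six scatter features. As a minimal sketch of what per-feature standardization (z-scoring) could look like, the snippet below fits the column means and standard deviations on training events and applies them to any other rows. The function names and the numeric values are hypothetical; this is not the authors' pipeline.

```python
from statistics import mean, pstdev

def fit_standardizer(rows):
    """Return per-column (mean, std) computed from the training rows."""
    columns = list(zip(*rows))
    # Fall back to 1.0 for a constant column to avoid division by zero.
    return [(mean(c), pstdev(c) or 1.0) for c in columns]

def standardize(rows, params):
    """Apply previously fitted (mean, std) pairs to each row."""
    return [[(x - m) / s for x, (m, s) in zip(row, params)] for row in rows]

# Invented events with six features: FSC-A, FSC-H, FSC-W, SSC-A, SSC-H, SSC-W
train = [
    [52000.0, 48000.0, 70.0, 21000.0, 19000.0, 66.0],
    [61000.0, 55000.0, 72.0, 30000.0, 27000.0, 69.0],
    [43000.0, 41000.0, 68.0, 15000.0, 14000.0, 63.0],
]
params = fit_standardizer(train)
scaled = standardize(train, params)
```

Fitting the parameters on the training events only, then reusing them for held-out events, keeps information from the test set out of the preprocessing step.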
Affiliation(s)
- Yi Li
- Bioengineering Department, University of Texas at Dallas, Richardson, TX, USA; Center for Systems Biology, University of Texas at Dallas, Richardson, TX, USA
| | - Chance M Nowak
- Bioengineering Department, University of Texas at Dallas, Richardson, TX, USA; Center for Systems Biology, University of Texas at Dallas, Richardson, TX, USA; Department of Biological Sciences, University of Texas at Dallas, Richardson, TX, USA
| | - Uyen Pham
- Bioengineering Department, University of Texas at Dallas, Richardson, TX, USA; Center for Systems Biology, University of Texas at Dallas, Richardson, TX, USA
| | - Khai Nguyen
- Bioengineering Department, University of Texas at Dallas, Richardson, TX, USA; Center for Systems Biology, University of Texas at Dallas, Richardson, TX, USA
| | - Leonidas Bleris
- Bioengineering Department, University of Texas at Dallas, Richardson, TX, USA; Center for Systems Biology, University of Texas at Dallas, Richardson, TX, USA; Department of Biological Sciences, University of Texas at Dallas, Richardson, TX, USA
| |
|
13
|
Classification of triple-negative breast cancers through a Boolean network model of the epithelial-mesenchymal transition. Cell Syst 2021; 12:457-462.e4. [PMID: 33961788 DOI: 10.1016/j.cels.2021.04.007] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 12/07/2020] [Revised: 03/25/2021] [Accepted: 04/13/2021] [Indexed: 12/29/2022]
Abstract
Predicting the metastasis risk in patients with a primary breast cancer tumor is of fundamental importance for deciding the best therapeutic strategy in the framework of personalized medicine. Here, we present ARIADNE, a general algorithmic strategy to assess the risk of metastasis from transcriptomic data of patients with triple-negative breast cancer, a subtype of breast cancer with a poorer prognosis than the other subtypes. ARIADNE identifies hybrid epithelial/mesenchymal phenotypes by mapping gene expression data into the states of a Boolean network model of the epithelial-mesenchymal pathway. Using this mapping, it is possible to stratify patients according to their prognosis, as we show by validating the strategy with three independent cohorts of triple-negative breast cancer patients. Our strategy provides a prognostic tool that could be applied to other biologically relevant pathways, in order to estimate the metastatic risk for other breast cancer subtypes or other tumor types. A record of this paper's transparent peer review process is included in the supplemental information.
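To make the idea of "mapping gene expression data into the states of a Boolean network" concrete, here is a toy sketch. It is not the ARIADNE network or algorithm: the three nodes, the update rules, the threshold, and the Hamming-distance matching are all illustrative assumptions.

```python
from itertools import product

NODES = ("CDH1", "ZEB1", "MIR200")  # illustrative nodes only

def step(state):
    """One synchronous update of a toy mutual-inhibition circuit."""
    cdh1, zeb1, mir200 = state
    return (
        int(not zeb1),    # CDH1 is on unless ZEB1 represses it
        int(not mir200),  # ZEB1 is on unless miR-200 represses it
        int(not zeb1),    # miR-200 is on unless ZEB1 represses it
    )

def fixed_points():
    """Enumerate all states that the update rule leaves unchanged."""
    return [s for s in product((0, 1), repeat=len(NODES)) if step(s) == s]

def binarize(expression, threshold):
    """Turn continuous expression values into on/off bits (assumed cutoff)."""
    return tuple(int(expression[n] > threshold) for n in NODES)

def nearest_state(sample_bits, states):
    """Pick the network state with the smallest Hamming distance."""
    return min(states, key=lambda s: sum(a != b for a, b in zip(s, sample_bits)))

states = fixed_points()
sample = binarize({"CDH1": 8.1, "ZEB1": 2.3, "MIR200": 3.0}, threshold=5.0)
assigned = nearest_state(sample, states)
```

In this toy circuit the two fixed points play the role of an epithelial-like and a mesenchymal-like state; a realistic model would have many more nodes and could also contain hybrid states.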
|
14
|
Kaur H, Bhalla S, Kaur D, Raghava GP. CancerLivER: a database of liver cancer gene expression resources and biomarkers. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2021; 2020:5798989. [PMID: 32147717 PMCID: PMC7061090 DOI: 10.1093/database/baaa012] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Indexed: 12/23/2022]
Abstract
Liver cancer is the fourth major lethal malignancy worldwide. To understand the development and progression of liver cancer, biomedical research has generated a tremendous amount of transcriptomics and disease-specific biomarker data. However, dispersed information poses pragmatic hurdles to delineating the significant markers for the disease. Hence, a dedicated resource for liver cancer is required that integrates scattered datasets in multiple formats and information regarding disease-specific biomarkers. Liver Cancer Expression Resource (CancerLivER) is a database that maintains gene expression datasets of liver cancer along with the putative biomarkers defined for it in the literature. It manages 115 datasets that include gene-expression profiles of 9611 samples. Each of the incorporated datasets was manually curated to remove any artefact; subsequently, a standard and uniform pipeline according to the specific technique was employed for their processing. Additionally, it contains comprehensive information on 594 liver cancer biomarkers, mainly 315 gene biomarkers or signatures, 178 protein-based biomarkers, and 46 miRNA-based biomarkers. To explore the full potential of data on liver cancer, a web-based interactive platform was developed to perform search, browsing and analyses. Analysis tools were also integrated to explore and visualize the expression patterns of desired genes among different types of samples based on individual genes, Gene Ontology (GO) terms and pathways. Furthermore, a dataset matrix download facility is provided so that users can run their own extensive analyses to elucidate more robust disease-specific signatures. Overall, CancerLivER is a comprehensive resource that is highly useful for the scientific community working in the field of liver cancer. Availability: CancerLivER can be accessed on the web at https://webs.iiitd.edu.in/raghava/cancerliver.
Affiliation(s)
- Harpreet Kaur
- Bioinformatics Centre, CSIR-Institute of Microbial Technology, Sector-39A, Chandigarh-160036, India; Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi-110020, India
| | - Sherry Bhalla
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi-110020, India; Centre for Systems Biology and Bioinformatics, Sector-25, Panjab University, Chandigarh-160036, India
| | - Dilraj Kaur
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi-110020, India
| | - Gajendra Ps Raghava
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi-110020, India
| |
|
15
|
Manjang K, Tripathi S, Yli-Harja O, Dehmer M, Glazko G, Emmert-Streib F. Prognostic gene expression signatures of breast cancer are lacking a sensible biological meaning. Sci Rep 2021; 11:156. [PMID: 33420139 PMCID: PMC7794581 DOI: 10.1038/s41598-020-79375-y] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 07/16/2020] [Accepted: 12/03/2020] [Indexed: 12/28/2022] Open
Abstract
The identification of prognostic biomarkers for predicting cancer progression is an important problem for two reasons. First, such biomarkers find practical application in a clinical context for the treatment of patients. Second, interrogation of the biomarkers themselves is assumed to lead to novel insights into disease mechanisms and the underlying molecular processes that cause the pathological behavior. For breast cancer, many signatures based on gene expression values have been reported to be associated with overall survival. Consequently, such signatures have been used for suggesting biological explanations of breast cancer and drug mechanisms. In this paper, we demonstrate for a large number of breast cancer signatures that such an implication is not justified. Our approach systematically eliminates all traces of biological meaning of signature genes and shows that among the remaining genes, surrogate gene sets can be formed with indistinguishable prognostic prediction capabilities and opposite biological meaning. Hence, our results demonstrate that none of the studied signatures has a sensible biological interpretation or meaning with respect to disease etiology. Overall, this shows that prognostic signatures are black-box models with sensible predictions of breast cancer outcome but no value for revealing causal connections. Furthermore, we show that the number of such surrogate gene sets is not small but very large.
Affiliation(s)
- Kalifa Manjang
- Predictive Society and Data Analytics Lab, Tampere University, Tampere, Korkeakoulunkatu 10, 33720, Tampere, Finland
| | - Shailesh Tripathi
- Predictive Society and Data Analytics Lab, Tampere University, Tampere, Korkeakoulunkatu 10, 33720, Tampere, Finland
| | - Olli Yli-Harja
- Computational Systems Biology, Tampere University, Tampere, Korkeakoulunkatu 10, 33720, Tampere, Finland
- Institute for Systems Biology, Seattle, WA, USA
- Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, USA
| | - Matthias Dehmer
- Steyr School of Management, University of Applied Sciences Upper Austria, 4400 Steyr Campus, Wels, Austria
- College of Artificial Intelligence, Nankai University, Tianjin, 300350, China
- Department of Biomedical Computer Science and Mechatronics, UMIT-The Health and Life Science University, 6060 Hall in Tyrol, Innsbruck, Austria
| | - Galina Glazko
- Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, USA
| | - Frank Emmert-Streib
- Predictive Society and Data Analytics Lab, Tampere University, Tampere, Korkeakoulunkatu 10, 33720, Tampere, Finland.
- Institute of Biosciences and Medical Technology, Tampere University, Tampere, Korkeakoulunkatu 10, 33720, Tampere, Finland.
| |
|
16
|
Béal J, Pantolini L, Noël V, Barillot E, Calzone L. Personalized logical models to investigate cancer response to BRAF treatments in melanomas and colorectal cancers. PLoS Comput Biol 2021; 17:e1007900. [PMID: 33507915 PMCID: PMC7872233 DOI: 10.1371/journal.pcbi.1007900] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 04/16/2020] [Revised: 02/09/2021] [Accepted: 12/21/2020] [Indexed: 11/19/2022] Open
Abstract
The study of response to cancer treatments has benefited greatly from the contribution of different omics data but their interpretation is sometimes difficult. Some mathematical models based on prior biological knowledge of signaling pathways facilitate this interpretation but often require fitting of their parameters using perturbation data. We propose a more qualitative mechanistic approach, based on logical formalism and on the sole mapping and interpretation of omics data, and able to recover differences in sensitivity to gene inhibition without model training. This approach is showcased by the study of BRAF inhibition in patients with melanomas and colorectal cancers who experience significant differences in sensitivity despite similar omics profiles. We first gather information from literature and build a logical model summarizing the regulatory network of the mitogen-activated protein kinase (MAPK) pathway surrounding BRAF, with factors involved in the BRAF inhibition resistance mechanisms. The relevance of this model is verified by automatically assessing that it qualitatively reproduces response or resistance behaviors identified in the literature. Data from over 100 melanoma and colorectal cancer cell lines are then used to validate the model's ability to explain differences in sensitivity. This generic model is transformed into personalized cell line-specific logical models by integrating the omics information of the cell lines as constraints of the model. The use of mutations alone allows personalized models to correlate significantly with experimental sensitivities to BRAF inhibition, both from drug and CRISPR targeting, and even better with the joint use of mutations and RNA, supporting multi-omics mechanistic models. A comparison of these untrained models with learning approaches highlights similarities in interpretation and complementarity depending on the size of the datasets. This parsimonious pipeline, which can easily be extended to other biological questions, makes it possible to explore the mechanistic causes of the response to treatment, on an individualized basis.
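The step of turning a generic logical model into a cell line-specific one "by integrating the omics information ... as constraints" can be sketched as clamping selected nodes before simulating. The six-node network, its rules, and the two profiles below are simplified assumptions for illustration; they are not the authors' MAPK model or data.

```python
def simulate(clamped, max_steps=20):
    """Iterate a toy synchronous logical model until it stops changing.

    `clamped` maps node names to fixed values (mutations or drug effects).
    """
    state = {"EGFR": 0, "RAS": 0, "BRAF": 0, "CRAF": 0, "MEK": 0, "ERK": 0}
    state.update(clamped)
    for _ in range(max_steps):
        new = {
            "EGFR": state["EGFR"],  # input node, keeps its value
            "RAS": state["EGFR"],
            "BRAF": state["RAS"],
            "CRAF": state["RAS"],
            "MEK": int(state["BRAF"] or state["CRAF"]),
            "ERK": state["MEK"],
        }
        new.update(clamped)  # constraints override the generic rules
        if new == state:
            break
        state = new
    return state

# Hypothetical profiles: an activating BRAF mutation, the same line with
# BRAF inhibited, and a line where active EGFR signalling bypasses BRAF.
mutant = simulate({"BRAF": 1})
inhibited = simulate({"BRAF": 0})
bypass = simulate({"EGFR": 1, "BRAF": 0})
```

In this toy setting ERK stays on under BRAF inhibition only when an upstream input can route through CRAF, which is the kind of qualitative difference in sensitivity such untrained models are meant to expose.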
Affiliation(s)
- Jonas Béal
- Institut Curie, PSL Research University, Paris, France
- INSERM, U900, Paris, France
- MINES ParisTech, PSL Research University, CBIO-Centre for Computational Biology, Paris, France
| | - Lorenzo Pantolini
- Institut Curie, PSL Research University, Paris, France
- INSERM, U900, Paris, France
- MINES ParisTech, PSL Research University, CBIO-Centre for Computational Biology, Paris, France
| | - Vincent Noël
- Institut Curie, PSL Research University, Paris, France
- INSERM, U900, Paris, France
- MINES ParisTech, PSL Research University, CBIO-Centre for Computational Biology, Paris, France
| | - Emmanuel Barillot
- Institut Curie, PSL Research University, Paris, France
- INSERM, U900, Paris, France
- MINES ParisTech, PSL Research University, CBIO-Centre for Computational Biology, Paris, France
| | - Laurence Calzone
- Institut Curie, PSL Research University, Paris, France
- INSERM, U900, Paris, France
- MINES ParisTech, PSL Research University, CBIO-Centre for Computational Biology, Paris, France
| |
|
17
|
Hutchison LAD, Berger B, Kohane IS. Meta-analysis of Caenorhabditis elegans single-cell developmental data reveals multi-frequency oscillation in gene activation. Bioinformatics 2020; 36:4047-4057. [PMID: 31860066 PMCID: PMC7332571 DOI: 10.1093/bioinformatics/btz864] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 04/23/2019] [Revised: 09/23/2019] [Accepted: 12/18/2019] [Indexed: 02/06/2023] Open
Abstract
MOTIVATION: The advent of in vivo automated techniques for single-cell lineaging, sequencing and analysis of gene expression has begun to dramatically increase our understanding of organismal development. We applied novel meta-analysis and visualization techniques to the EPIC single-cell-resolution developmental gene expression dataset for Caenorhabditis elegans from Bao, Murray, Waterston et al. to gain insights into regulatory mechanisms governing the timing of development. RESULTS: Our meta-analysis of the EPIC dataset revealed that a simple linear combination of the expression levels of the developmental genes is strongly correlated with the developmental age of the organism, irrespective of the cell division rate of different cell lineages. We uncovered a pattern of collective sinusoidal oscillation in gene activation, in multiple dominant frequencies and in multiple orthogonal axes of gene expression, pointing to the existence of a coordinated, multi-frequency global timing mechanism. We developed a novel method based on Fisher's Discriminant Analysis to identify gene expression weightings that maximally separate traits of interest, and found that remarkably, simple linear gene expression weightings are capable of producing sinusoidal oscillations of any frequency and phase, adding to the growing body of evidence that oscillatory mechanisms likely play an important role in the timing of development. We cross-linked EPIC with gene ontology and anatomy ontology terms, employing Fisher's Discriminant Analysis methods to identify previously unknown positive and negative genetic contributions to developmental processes and cell phenotypes. This meta-analysis demonstrates new evidence for direct linear and/or sinusoidal mechanisms regulating the timing of development. We uncovered a number of previously unknown positive and negative correlations between developmental genes and developmental processes or cell phenotypes. Our results highlight both the continued relevance of the EPIC technique, and the value of meta-analysis of previously published results. The presented analysis and visualization techniques are broadly applicable across developmental and systems biology. AVAILABILITY AND IMPLEMENTATION: Analysis software available upon request. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
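The abstract relies on Fisher's Discriminant Analysis to find linear weightings that separate a trait. The classical two-class form chooses weights proportional to S_W^{-1}(m1 - m0), where S_W is the pooled within-class scatter matrix and m0, m1 are the class means. The sketch below implements that textbook formula for two features; the data are invented and this is not the authors' extended method.

```python
def mean_vec(rows):
    """Mean of each of the two feature columns."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]

def scatter(rows, m):
    """2x2 scatter matrix of the rows around the mean m."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - m[0], r[1] - m[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fisher_direction(class0, class1):
    """Weights proportional to S_W^{-1} (m1 - m0) for two features."""
    m0, m1 = mean_vec(class0), mean_vec(class1)
    s0, s1 = scatter(class0, m0), scatter(class1, m1)
    a, b = s0[0][0] + s1[0][0], s0[0][1] + s1[0][1]
    c, d = s0[1][0] + s1[1][0], s0[1][1] + s1[1][1]
    det = a * d - b * c  # assumed non-zero for this sketch
    dm = [m1[0] - m0[0], m1[1] - m0[1]]
    return [(d * dm[0] - b * dm[1]) / det, (-c * dm[0] + a * dm[1]) / det]

# Invented expression values for two genes in two groups of cells.
class0 = [[1.0, 2.0], [2.0, 1.0], [1.5, 1.8], [2.2, 2.1]]
class1 = [[4.0, 5.0], [5.0, 4.2], [4.5, 5.5], [5.2, 4.8]]
w = fisher_direction(class0, class1)
proj0 = [w[0] * x + w[1] * y for x, y in class0]
proj1 = [w[0] * x + w[1] * y for x, y in class1]
```

Projecting each sample onto `w` yields a single score per sample; with many genes the same formula applies but needs a general linear solver and usually some regularization of S_W.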
Affiliation(s)
| | - Bonnie Berger
- MIT Computer Science and AI Lab, Cambridge, MA 02139, USA
| | | |
|
18
|
Bloomstein JD, von Eyben R, Chan A, Rankin EB, Fregoso DR, Wang-Chiang J, Lee L, Xie LX, David SM, Stehr H, Esfahani MS, Giaccia AJ, Kidd EA. Validated limited gene predictor for cervical cancer lymph node metastases. Oncotarget 2020; 11:2302-2309. [PMID: 32595829 PMCID: PMC7299532 DOI: 10.18632/oncotarget.27632] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2020] [Accepted: 05/14/2020] [Indexed: 11/30/2022] Open
Abstract
Purpose: Recognizing the prognostic significance of lymph node (LN) involvement for cervical cancer, we aimed to identify genes that are differentially expressed in LN+ versus LN- cervical cancer and to potentially create a validated predictive gene signature for LN involvement. Materials and Methods: Primary tumor biopsies were collected from 74 cervical cancer patients. RNA was extracted and RNA sequencing was performed. The samples were divided by institution into a training set (n = 57) and a testing set (n = 17). Differentially expressed genes were identified among the training cohort and used to train a Random Forest classifier. Results: 22 genes showed > 1.5 fold difference in expression between the LN+ and LN- groups. Using forward selection 5 genes were identified and, based on the clinical knowledge of these genes and testing of the different combinations, a 2-gene Random Forest model of BIRC3 and CD300LG was developed. The classification accuracy of lymph node (LN) status on the test set was 88.2%, with an Area under the Receiver Operating Characteristic curve (ROC-AUC) of 98.6%. Conclusions: We identified a 2 gene Random Forest model of BIRC3 and CD300LG that predicted lymph node involvement in a validation cohort. This validated model, following testing in additional cohorts, could be used to create a reverse transcription-quantitative polymerase chain reaction (RT-qPCR) tool that would be useful for helping to identify patients with LN involvement in resource-limited settings.
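The abstract's first filtering step keeps genes with more than a 1.5-fold difference in expression between LN+ and LN- tumors. A minimal sketch of such a filter follows; the gene labels and values are invented placeholders, and the ratio of group means is an assumed definition of fold change, since the abstract does not state how it was computed.

```python
from statistics import mean

def fold_change_filter(expr_pos, expr_neg, threshold=1.5):
    """Return genes whose mean expression differs by more than `threshold`-fold.

    Both up- and down-regulation in the positive group are retained.
    """
    selected = []
    for gene in expr_pos:
        ratio = mean(expr_pos[gene]) / mean(expr_neg[gene])
        if ratio > threshold or ratio < 1 / threshold:
            selected.append(gene)
    return selected

# Placeholder expression values per group (not data from the study).
ln_positive = {"GENE_A": [30.0, 36.0, 33.0], "GENE_B": [4.0, 5.0, 6.0], "GENE_C": [10.0, 11.0, 12.0]}
ln_negative = {"GENE_A": [12.0, 15.0, 12.0], "GENE_B": [12.0, 10.0, 11.0], "GENE_C": [10.0, 12.0, 11.0]}
candidates = fold_change_filter(ln_positive, ln_negative)
```

As a side check on the reported figures, 15 correct calls out of the 17 test samples would give 88.2% accuracy; the abstract does not state the count, so this is only an arithmetic reading of the percentage.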
Affiliation(s)
- Joshua D Bloomstein
- Department of Radiation Oncology, Stanford University School of Medicine, Stanford, CA, USA
| | - Rie von Eyben
- Department of Radiation Oncology, Stanford University School of Medicine, Stanford, CA, USA
| | - Andy Chan
- Department of Radiation Oncology, Stanford University School of Medicine, Stanford, CA, USA
| | - Erinn B Rankin
- Department of Radiation Oncology, Stanford University School of Medicine, Stanford, CA, USA
| | - Daniel R Fregoso
- Department of Radiation Oncology, Stanford University School of Medicine, Stanford, CA, USA
| | - Jing Wang-Chiang
- Department of Gynecologic Oncology, Santa Clara Valley Medical Center, Fruitdale, CA, USA
| | - Lisa Lee
- Department of Gynecologic Oncology, Santa Clara Valley Medical Center, Fruitdale, CA, USA
| | - Liang-Xi Xie
- Department of Radiation Oncology, Xiamen University Xiang'an Hospital, Xiamen, Fujian, China
| | | | - Henning Stehr
- Department of Pathology, Stanford University School of Medicine, Stanford, CA, USA
| | - Mohammad S Esfahani
- Department of Radiation Oncology, Stanford University School of Medicine, Stanford, CA, USA
| | - Amato J Giaccia
- Department of Radiation Oncology, Stanford University School of Medicine, Stanford, CA, USA
| | - Elizabeth A Kidd
- Department of Radiation Oncology, Stanford University School of Medicine, Stanford, CA, USA
| |
|
19
|
SSizer: Determining the Sample Sufficiency for Comparative Biological Study. J Mol Biol 2020; 432:3411-3421. [DOI: 10.1016/j.jmb.2020.01.027] [Citation(s) in RCA: 41] [Impact Index Per Article: 10.3] [Received: 09/06/2019] [Revised: 12/26/2019] [Accepted: 01/18/2020] [Indexed: 01/25/2023]
|
20
|
Prediction of Poor Response to Neoadjuvant Chemoradiation in Patients With Rectal Cancer Using a DNA Repair Deregulation Score: Picking the Losers Instead of the Winners. Dis Colon Rectum 2020; 63:300-309. [PMID: 31842156 DOI: 10.1097/dcr.0000000000001564] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 02/08/2023]
Abstract
BACKGROUND: Patients with rectal cancer may undergo neoadjuvant chemoradiation even in early stages in an attempt to achieve complete clinical response and undergo organ preservation. However, prediction of tumor response is unavailable. In this setting, accurate identification of poor responders could spare patients with early stage disease from potentially unnecessary chemoradiation. OBJECTIVE: This study focused on the development and testing of a score based on DNA repair gene expression to predict response to neoadjuvant chemoradiation in patients with rectal cancer. DESIGN: Pretreatment biopsy samples from patients with rectal cancer undergoing neoadjuvant chemoradiation were collected and underwent gene expression analysis using RNA-Seq (test cohort). A score was constructed using 8 DNA repair genes differentially expressed between good (complete clinical response) and poor (incomplete clinical response) responders to treatment. The score was validated in 2 independent cohorts of patients undergoing similar treatment strategies, using quantitative polymerase chain reaction and microarray gene expression data. SETTINGS: This was a retrospective analysis of gene expression data from 3 independent institutions. PATIENTS: Patients with rectal cancer undergoing neoadjuvant chemoradiation (50.4-54.0 Gy and 5-fluorouracil-based chemotherapy) were eligible. Patients with complete clinical response, complete pathological response, or ≤10% residual cancer cells were considered good responders. Patients with >10% residual cancer cells were considered poor responders. The test cohort included 25 patients (16 poor responders). Validation cohort 1 included 28 patients (18 poor responders), and validation cohort 2 included 46 patients (22 poor responders). MAIN OUTCOME MEASURES: Response was correlated with the DNA repair score calculated using the expression levels of 8 DNA repair genes. DNA repair score sensitivity, specificity, and positive and negative predictive values were determined in test and validation cohorts. RESULTS: Poor responders had significantly lower DNA repair scores when compared with good responders across all 3 cohorts, regardless of the gene expression platform used. A low score correctly predicted poor response in 93%, 90%, and 71% in the test, validation 1, and validation 2 cohorts, respectively. LIMITATIONS: This study was limited by its small sample size, different gene expression platforms, and treatment regimens across different cohorts used. CONCLUSIONS: A DNA repair gene score may predict patients likely to have poor response to chemoradiation. This score may be a relevant tool to be investigated in future studies focused on chemoradiation used in the context of organ preservation. See Video Abstract at http://links.lww.com/DCR/B104.
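The outcome measures named in this abstract (sensitivity, specificity, and positive and negative predictive values) all follow from a 2x2 confusion table. The sketch below shows the standard definitions, treating "poor responder" as the positive class. The cohort size of 25 with 16 poor responders comes from the abstract, but the split into true and false calls is invented for illustration.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic accuracy measures from confusion-table counts."""
    return {
        "sensitivity": tp / (tp + fn),  # poor responders correctly flagged
        "specificity": tn / (tn + fp),  # good responders correctly cleared
        "ppv": tp / (tp + fp),          # flagged patients who are poor responders
        "npv": tn / (tn + fn),          # cleared patients who are good responders
    }

# Hypothetical calls for a 25-patient cohort with 16 poor responders:
# 14 flagged correctly, 2 missed, 8 good responders cleared, 1 flagged wrongly.
metrics = diagnostic_metrics(tp=14, fp=1, tn=8, fn=2)
```

With these made-up counts the positive predictive value is 14/15, roughly 93%, which shows how a statement such as "a low score correctly predicted poor response in 93%" can be read as a PPV; the abstract does not give the underlying table.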
|
21
|
A core collection of pan-schizophrenia genes allows building cohort-specific signatures of affected brain. Sci Rep 2019; 9:12671. [PMID: 31481672 PMCID: PMC6722126 DOI: 10.1038/s41598-019-48605-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 02/20/2019] [Accepted: 07/30/2019] [Indexed: 02/05/2023] Open
Abstract
To investigate whether pan-schizophrenia genes could be leveraged for building cohort-specific signatures reflecting the functioning of the affected brain, we first collected 1,518 schizophrenia-related genes upon analysis of 12,316 independent peer-reviewed literature sources. More than half of these genes have been reported in at least 3 independent studies, and a majority (81.4%) were enriched within 156 functional pathways (p-values < 1e-15). Gene expression profiles of brain tissues were extracted from 14 publicly available independent datasets, and classified into "schizophrenia" and "normal" bins using dataset-specific subsets of core schizophrenia collection genes built with either a sparse representation-based variable selection (SRVS) approach or with analysis of variance (ANOVA)-based gene selection approach. Results showed that cohort-specific classifiers by both SRVS and ANOVA methods are capable of providing significantly higher accuracy in the diagnosis of schizophrenia than using the whole core genes (p < 3.38e-6), with relatively low sensitivity to the ethnic backgrounds or areas of brain biopsies. Our results suggest that the formation of consensus collection of pan-schizophrenia genes and its dissection into the functional components could be a feasible alternative to the expansion of sample size, which is needed for further in-depth studies of the pathophysiology of the human brain.
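One of the two selection approaches named in this abstract is ANOVA-based gene selection. A generic version ranks genes by the one-way ANOVA F statistic (between-group variance over within-group variance). The sketch below implements that textbook statistic; the gene labels and values are placeholders, and the study's exact selection procedure may differ.

```python
def anova_f(groups):
    """One-way ANOVA F statistic for a list of sample groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def rank_genes(expression):
    """Sort gene names by decreasing F statistic."""
    return sorted(expression, key=lambda g: anova_f(expression[g]), reverse=True)

# Placeholder data: each gene maps to [schizophrenia values, control values].
expression = {
    "GENE_X": [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    "GENE_Y": [[5.0, 5.1, 4.9], [5.0, 5.2, 4.8]],
}
ranking = rank_genes(expression)
```

Genes at the top of the ranking differ most between groups relative to their within-group spread, so a cohort-specific subset can be taken as the first few entries.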
|
22
|
Huo Z, Tang S, Park Y, Tseng G. P-value evaluation, variability index and biomarker categorization for adaptively weighted Fisher's meta-analysis method in omics applications. Bioinformatics 2019; 36:524-532. [PMID: 31359040 PMCID: PMC7867999 DOI: 10.1093/bioinformatics/btz589] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 09/17/2018] [Revised: 05/25/2019] [Accepted: 07/24/2019] [Indexed: 01/31/2023] Open
Abstract
MOTIVATION Meta-analysis methods have been widely used to combine results from multiple clinical or genomic studies to increase statistical power and ensure robust and accurate conclusions. The adaptively weighted Fisher's method (AW-Fisher), initially developed for omics applications but applicable for general meta-analysis, is an effective approach to combine P-values from K independent studies and to provide better biological interpretability by characterizing which studies contribute to the meta-analysis. Currently, AW-Fisher suffers from the lack of fast P-value computation and variability estimate of AW weights. When the number of studies K is large, the 3^K - 1 possible differential expression pattern categories generated by AW-Fisher can become intractable. In this paper, we develop an importance sampling scheme with spline interpolation to increase the accuracy and speed of the P-value calculation. We also apply bootstrapping to construct a variability index for the AW-Fisher weight estimator and a co-membership matrix to categorize (cluster) differentially expressed genes based on their meta-patterns for intuitive biological investigations. RESULTS The superior performance of the proposed methods is shown in simulations as well as two real omics meta-analysis applications to demonstrate its insightful biological findings. AVAILABILITY AND IMPLEMENTATION An R package AWFisher (calling C++) is available at Bioconductor and GitHub (https://github.com/Caleb-Huo/AWFisher), and all datasets and programming code for this paper are available in the Supplementary Material. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
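To make the adaptive weighting concrete, here is a minimal Python sketch of the AW-Fisher test statistic as it is commonly described: search over 0/1 weight vectors and keep the one whose weighted Fisher statistic has the smallest chi-square tail probability. This is an illustration under stated assumptions, not the AWFisher package's implementation. The function names are hypothetical, the search is brute force (feasible only for small K), and the returned minimum is the test statistic, not the final meta-analysis P-value, which still needs the null distribution that the abstract's importance sampling targets. The pattern count assumes each study is labelled up-regulated, down-regulated, or not differentially expressed, giving 3^K - 1 non-null patterns.

```python
import math
from itertools import product


def chi2_sf_even_df(x, m):
    """P(X > x) for X ~ chi-square with 2*m degrees of freedom (closed form)."""
    half = x / 2.0
    term = 1.0
    total = 1.0
    for j in range(1, m):
        term *= half / j
        total += term
    return math.exp(-half) * total


def aw_fisher_statistic(pvalues):
    """Brute-force search over the 2^K - 1 non-empty 0/1 weight vectors.

    Assumes every p-value is strictly between 0 and 1.
    Returns (smallest tail probability, weight vector achieving it).
    """
    best_p, best_w = None, None
    for w in product((0, 1), repeat=len(pvalues)):
        m = sum(w)
        if m == 0:
            continue  # skip the all-zero weight vector
        # Weighted Fisher statistic: -2 * sum_k w_k * log(p_k)
        u = -2.0 * sum(wk * math.log(p) for wk, p in zip(w, pvalues))
        p_u = chi2_sf_even_df(u, m)  # chi-square with 2*m df under the null
        if best_p is None or p_u < best_p:
            best_p, best_w = p_u, w
    return best_p, best_w


def n_signed_patterns(k):
    """Number of non-null up/down/none patterns across k studies."""
    return 3 ** k - 1


stat, weights = aw_fisher_statistic([1e-6, 0.8, 1e-4])
print(weights, stat, n_signed_patterns(3))
```

In this toy input the second study carries no signal, so the selected weight vector drops it. The weights are what make the result interpretable per study.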
Affiliation(s)
- Zhiguang Huo
- Department of Biostatistics, University of Florida, Gainesville, FL 32611, USA
- Shaowu Tang
- Roche Molecular Solutions, Inc., Pleasanton, CA 94588, USA
- Yongseok Park
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA 15261, USA
|
23
|
Fa B, Luo C, Tang Z, Yan Y, Zhang Y, Yu Z. Pathway-based biomarker identification with crosstalk analysis for robust prognosis prediction in hepatocellular carcinoma. EBioMedicine 2019; 44:250-260. [PMID: 31101593 PMCID: PMC6606892 DOI: 10.1016/j.ebiom.2019.05.010] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 03/07/2019] [Revised: 05/06/2019] [Accepted: 05/06/2019] [Indexed: 02/07/2023]
Abstract
BACKGROUND Although many prognostic single-gene (SG) lists have been identified in cancer research, application of these features is hampered due to poor robustness and performance on independent datasets. Pathway-based approaches have thus emerged which embed biological knowledge to yield reproducible features. METHODS Pathifier estimates pathways deregulation score (PDS) to represent the extent of pathway deregulation based on expression data, and most of its applications treat pathways as independent without addressing the effect of gene overlap between pathway pairs which we refer to as crosstalk. Here, we propose a novel procedure based on Pathifier methodology, which for the first time has been utilized with crosstalk accommodated to identify disease-specific features to predict prognosis in patients with hepatocellular carcinoma (HCC). FINDINGS With the cohort (N = 355) of HCC patients from The Cancer Genome Atlas (TCGA), cross validation (CV) revealed that PDSs identified were more robust and accurate than the SG features by deep learning (DL)-based approach. When validated on external HCC datasets, these features outperformed the SGs consistently. INTERPRETATION On average, we provide 10.2% improvement of prediction accuracy. Importantly, governing genes in these features provide valuable insight into the cancer hallmarks of HCC. We develop an R package PATHcrosstalk (available from GitHub https://github.com/fabotao/PATHcrosstalk) with which users can discover pathways of interest with crosstalk effect considered.
Affiliation(s)
- Botao Fa
- Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China; SJTU-Yale Joint Centre for Biostatistics, Shanghai Jiao Tong University, Shanghai, China
- Chengwen Luo
- Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China; SJTU-Yale Joint Centre for Biostatistics, Shanghai Jiao Tong University, Shanghai, China
- Zhou Tang
- SJTU-Yale Joint Centre for Biostatistics, Shanghai Jiao Tong University, Shanghai, China
- Yuting Yan
- SJTU-Yale Joint Centre for Biostatistics, Shanghai Jiao Tong University, Shanghai, China
- Yue Zhang
- Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China; SJTU-Yale Joint Centre for Biostatistics, Shanghai Jiao Tong University, Shanghai, China
- Zhangsheng Yu
- Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China; SJTU-Yale Joint Centre for Biostatistics, Shanghai Jiao Tong University, Shanghai, China
|
24
|
Two-Way Horizontal and Vertical Omics Integration for Disease Subtype Discovery. Statistics in Biosciences 2019. [DOI: 10.1007/s12561-019-09242-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 01/18/2023]
|
25
|
Kim S, Kang D, Huo Z, Park Y, Tseng GC. Meta-analytic principal component analysis in integrative omics application. Bioinformatics 2019; 34:1321-1328. [PMID: 29186328 DOI: 10.1093/bioinformatics/btx765] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 12/30/2016] [Accepted: 11/22/2017] [Indexed: 12/15/2022]
Abstract
Motivation With the prevalent usage of microarray and massively parallel sequencing, numerous high-throughput omics datasets have become available in the public domain. Integrating abundant information among omics datasets is critical to elucidate biological mechanisms. Due to the high-dimensional nature of the data, methods such as principal component analysis (PCA) have been widely applied, aiming at effective dimension reduction and exploratory visualization. Results In this article, we combine multiple omics datasets of identical or similar biological hypothesis and introduce two variations of meta-analytic framework of PCA, namely MetaPCA. Regularization is further incorporated to facilitate sparse feature selection in MetaPCA. We apply MetaPCA and sparse MetaPCA to simulations, three transcriptomic meta-analysis studies in yeast cell cycle, prostate cancer, mouse metabolism and a TCGA pan-cancer methylation study. The result shows improved accuracy, robustness and exploratory visualization of the proposed framework. Availability and implementation An R package MetaPCA is available online. (http://tsenglab.biostat.pitt.edu/software.htm). Contact ctseng@pitt.edu. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- SungHwan Kim
- Department of Statistics, Keimyung University, Daegu 42601, South Korea
- Dongwan Kang
- Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Zhiguang Huo
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA 15261, USA
- Yongseok Park
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA 15261, USA
- George C Tseng
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA 15261, USA; Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA 15213, USA
|
26
|
Gao YC, Zhou XH, Zhang W. An Ensemble Strategy to Predict Prognosis in Ovarian Cancer Based on Gene Modules. Front Genet 2019; 10:366. [PMID: 31068972 PMCID: PMC6491874 DOI: 10.3389/fgene.2019.00366] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 01/11/2019] [Accepted: 04/05/2019] [Indexed: 12/15/2022]
Abstract
Due to the high heterogeneity and complexity of cancer, it is still a challenge to predict the prognosis of cancer patients. In this work, we used a clustering algorithm to divide patients into different subtypes in order to reduce the heterogeneity of the cancer patients in each subtype. Based on the hypothesis that the gene co-expression network may reveal relationships among genes, some communities in the network could influence the prognosis of cancer patients and all the prognosis-related communities could fully reveal the prognosis of cancer patients. To predict the prognosis for cancer patients in each subtype, we adopted an ensemble classifier based on the gene co-expression network of the corresponding subtype. Using the gene expression data of ovarian cancer patients in TCGA (The Cancer Genome Atlas), three subtypes were identified. Survival analysis showed that patients in different subtypes had different survival risks. Three ensemble classifiers were constructed for each subtype. Leave-one-out and independent validation showed that our method outperformed control and literature methods. Furthermore, the function annotation of the communities in each subtype showed that some communities were cancer-related. Finally, we found that the current drug targets can partially support our method.
Affiliation(s)
- Xiong-Hui Zhou
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, China
- Wen Zhang
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, China
|
27
|
Sahu AD, Lee JS, Wang Z, Zhang G, Iglesias-Bartolome R, Tian T, Wei Z, Miao B, Nair NU, Ponomarova O, Friedman AA, Amzallag A, Moll T, Kasumova G, Greninger P, Egan RK, Damon LJ, Frederick DT, Jerby-Arnon L, Wagner A, Cheng K, Park SG, Robinson W, Gardner K, Boland G, Hannenhalli S, Herlyn M, Benes C, Flaherty K, Luo J, Gutkind JS, Ruppin E. Genome-wide prediction of synthetic rescue mediators of resistance to targeted and immunotherapy. Mol Syst Biol 2019; 15:e8323. [PMID: 30858180 PMCID: PMC6413886 DOI: 10.15252/msb.20188323] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 03/18/2018] [Revised: 12/31/2018] [Accepted: 01/21/2019] [Indexed: 01/09/2023]
Abstract
Most patients with advanced cancer eventually acquire resistance to targeted therapies, spurring extensive efforts to identify molecular events mediating therapy resistance. Many of these events involve synthetic rescue (SR) interactions, where the reduction in cancer cell viability caused by targeted gene inactivation is rescued by an adaptive alteration of another gene (the rescuer). Here, we perform a genome-wide in silico prediction of SR rescuer genes by analyzing tumor transcriptomics and survival data of 10,000 TCGA cancer patients. Predicted SR interactions are validated in new experimental screens. We show that SR interactions can successfully predict cancer patients' response and emerging resistance. Inhibiting predicted rescuer genes sensitizes resistant cancer cells to therapies synergistically, providing initial leads for developing combinatorial approaches to overcome resistance proactively. Finally, we show that the SR analysis of melanoma patients successfully identifies known mediators of resistance to immunotherapy and predicts novel rescuers.
Affiliation(s)
- Avinash Das Sahu
- Department of Biostatistics and Computational Biology, Harvard School of Public Health, Boston, MA, USA
- Department of Medicine and Harvard Medical School, Massachusetts General Hospital Cancer Center, Boston, MA, USA
- University of Maryland Institute of Advanced Computer Science (UMIACS), University of Maryland, College Park, MD, USA
- Joo S Lee
- University of Maryland Institute of Advanced Computer Science (UMIACS), University of Maryland, College Park, MD, USA
- Cancer Data Science Lab, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Zhiyong Wang
- Department of Pharmacology & Moores Cancer Center, University of California, San Diego, La Jolla, CA, USA
- Gao Zhang
- Molecular and Cellular Oncogenesis Program and Melanoma Research Center, The Wistar Institute, Philadelphia, PA, USA
- Department of Neurosurgery and The Preston Robert Tisch Brain Tumor Center, Duke University, Durham, NC, USA
- Tian Tian
- New Jersey Institute of Technology, Newark, NJ, USA
- Zhi Wei
- New Jersey Institute of Technology, Newark, NJ, USA
- Benchun Miao
- Department of Medicine and Harvard Medical School, Massachusetts General Hospital Cancer Center, Boston, MA, USA
- Nishanth Ulhas Nair
- University of Maryland Institute of Advanced Computer Science (UMIACS), University of Maryland, College Park, MD, USA
- Cancer Data Science Lab, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Olga Ponomarova
- University of Massachusetts Medical School, Worcester, MA, USA
- Adam A Friedman
- Department of Medicine and Harvard Medical School, Massachusetts General Hospital Cancer Center, Boston, MA, USA
- Arnaud Amzallag
- Department of Medicine and Harvard Medical School, Massachusetts General Hospital Cancer Center, Boston, MA, USA
- Tabea Moll
- Department of Medicine and Harvard Medical School, Massachusetts General Hospital Cancer Center, Boston, MA, USA
- Gyulnara Kasumova
- Department of Medicine and Harvard Medical School, Massachusetts General Hospital Cancer Center, Boston, MA, USA
- Patricia Greninger
- Department of Medicine and Harvard Medical School, Massachusetts General Hospital Cancer Center, Boston, MA, USA
- Regina K Egan
- Department of Medicine and Harvard Medical School, Massachusetts General Hospital Cancer Center, Boston, MA, USA
- Leah J Damon
- Department of Medicine and Harvard Medical School, Massachusetts General Hospital Cancer Center, Boston, MA, USA
- Dennie T Frederick
- Department of Medicine and Harvard Medical School, Massachusetts General Hospital Cancer Center, Boston, MA, USA
- Livnat Jerby-Arnon
- Schools of Computer Science & Medicine, Tel-Aviv University, Tel-Aviv, Israel
- Allon Wagner
- Department of Electrical Engineering and Computer Science, the Center for Computational Biology, University of California, Berkeley, CA, USA
- Kuoyuan Cheng
- University of Maryland Institute of Advanced Computer Science (UMIACS), University of Maryland, College Park, MD, USA
- Seung Gu Park
- Department of Biostatistics and Computational Biology, Harvard School of Public Health, Boston, MA, USA
- Welles Robinson
- University of Maryland Institute of Advanced Computer Science (UMIACS), University of Maryland, College Park, MD, USA
- Kevin Gardner
- Cancer Data Science Lab, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Genevieve Boland
- Department of Medicine and Harvard Medical School, Massachusetts General Hospital Cancer Center, Boston, MA, USA
- Sridhar Hannenhalli
- University of Maryland Institute of Advanced Computer Science (UMIACS), University of Maryland, College Park, MD, USA
- Meenhard Herlyn
- Molecular and Cellular Oncogenesis Program and Melanoma Research Center, The Wistar Institute, Philadelphia, PA, USA
- Cyril Benes
- Department of Medicine and Harvard Medical School, Massachusetts General Hospital Cancer Center, Boston, MA, USA
- Keith Flaherty
- Department of Medicine and Harvard Medical School, Massachusetts General Hospital Cancer Center, Boston, MA, USA
- Ji Luo
- National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- J Silvio Gutkind
- Department of Pharmacology & Moores Cancer Center, University of California, San Diego, La Jolla, CA, USA
- Eytan Ruppin
- University of Maryland Institute of Advanced Computer Science (UMIACS), University of Maryland, College Park, MD, USA
- Cancer Data Science Lab, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Schools of Computer Science & Medicine, Tel-Aviv University, Tel-Aviv, Israel
|
28
|
Huo Z, Song C, Tseng G. Bayesian latent hierarchical model for transcriptomic meta-analysis to detect biomarkers with clustered meta-patterns of differential expression signals. Ann Appl Stat 2019; 13:340-366. [PMID: 31007807 PMCID: PMC6472949 DOI: 10.1214/18-aoas1188] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 01/12/2023]
Abstract
Due to the rapid development of high-throughput experimental techniques and fast-dropping prices, many transcriptomic datasets have been generated and accumulated in the public domain. Meta-analysis combining multiple transcriptomic studies can increase the statistical power to detect disease-related biomarkers. In this paper, we introduce a Bayesian latent hierarchical model to perform transcriptomic meta-analysis. This method is capable of detecting genes that are differentially expressed (DE) in only a subset of the combined studies, and the latent variables help quantify homogeneous and heterogeneous differential expression signals across studies. A tight clustering algorithm is applied to detected biomarkers to capture differential meta-patterns that are informative to guide further biological investigation. Simulations and three examples, including a microarray dataset from metabolism-related knockout mice, an RNA-seq dataset from HIV transgenic rats, and cross-platform datasets from human breast cancer, are used to demonstrate the performance of the proposed method.
Affiliation(s)
- Zhiguang Huo
- Department of Biostatistics, University of Florida, Gainesville, FL 32611
- Chi Song
- Division of Biostatistics, College of Public Health, The Ohio State University, Columbus, OH 43210
- George Tseng
- Department of Biostatistics, Human Genetics and Computational Biology, University of Pittsburgh, Pittsburgh, PA 15261
|
29
|
Yu LH, Huang QW, Zhou XH. Identification of Cancer Hallmarks Based on the Gene Co-expression Networks of Seven Cancers. Front Genet 2019; 10:99. [PMID: 30838028 PMCID: PMC6389798 DOI: 10.3389/fgene.2019.00099] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 11/08/2018] [Accepted: 01/29/2019] [Indexed: 12/20/2022]
Abstract
Identifying the hallmarks of cancer is essential for cancer research, and the genes involved in cancer hallmarks are likely to be cancer drivers. However, there is no appropriate method in the current literature for identifying genetic cancer hallmarks, especially considering the interrelationships among the genes. Here, we hypothesized that "dense clusters" (or "communities") in the gene co-expression networks of cancer patients may represent functional units regarding cancer formation and progression, and the communities present in the co-expression networks of multiple types of cancer may be cancer hallmarks. Consequently, we mined the conserved communities in the gene co-expression networks of seven cancers in order to identify candidate hallmarks. Functional annotation of the communities showed that they were mainly related to immune response, the cell cycle and the biological processes that maintain basic cellular functions. Survival analysis using the genes involved in the conserved communities verified that two of these hallmarks could predict the survival risks of cancer patients in multiple types of cancer. Furthermore, the genes involved in these hallmarks, one of which was related to the cell cycle, could be useful in screening for cancer drugs.
Affiliation(s)
- Ling-Hao Yu
- College of Science, Huazhong Agricultural University, Wuhan, China
- Qin-Wei Huang
- College of Science, Huazhong Agricultural University, Wuhan, China
- Xiong-Hui Zhou
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, China
|
30
|
Jardillier R, Chatelain F, Guyon L. Bioinformatics Methods to Select Prognostic Biomarker Genes from Large Scale Datasets: A Review. Biotechnol J 2018; 13:e1800103. [DOI: 10.1002/biot.201800103] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 05/18/2018] [Revised: 10/15/2018] [Indexed: 12/28/2022]
Affiliation(s)
- Rémy Jardillier
- University Grenoble Alpes, CEA, INSERM, Biology of Cancer Infection UMR_S 1036, 38000 Grenoble, France
- University Grenoble Alpes, CNRS, Grenoble INP, GIPSA-lab, Institute of Engineering University Grenoble Alpes, 38000 Grenoble, France
- Florent Chatelain
- University Grenoble Alpes, CNRS, Grenoble INP, GIPSA-lab, Institute of Engineering University Grenoble Alpes, 38000 Grenoble, France
- Laurent Guyon
- University Grenoble Alpes, CEA, INSERM, Biology of Cancer Infection UMR_S 1036, 38000 Grenoble, France
|
31
|
Taveras LR, Cunningham HB, Imran JB. Can We Reliably Predict a Clinical Complete Response in Rectal Cancer? Current Trends and Future Strategies. Current Colorectal Cancer Reports 2018. [DOI: 10.1007/s11888-018-0401-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 02/06/2023]
|
32
|
Shimoni Y. Association between expression of random gene sets and survival is evident in multiple cancer types and may be explained by sub-classification. PLoS Comput Biol 2018; 14:e1006026. [PMID: 29470520 PMCID: PMC5839591 DOI: 10.1371/journal.pcbi.1006026] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 05/30/2017] [Revised: 03/06/2018] [Accepted: 02/06/2018] [Indexed: 01/15/2023]
Abstract
One of the goals of cancer research is to identify a set of genes that cause or control disease progression. However, although multiple such gene sets were published, these are usually in very poor agreement with each other, and very few of the genes proved to be functional therapeutic targets. Furthermore, recent findings from a breast cancer gene-expression cohort showed that sets of genes selected randomly can be used to predict survival with a much higher probability than expected. These results imply that many of the genes identified in breast cancer gene expression analysis may not be causal of cancer progression, even though they can still be highly predictive of prognosis. We performed a similar analysis on all the cancer types available in The Cancer Genome Atlas (TCGA), namely, estimating the predictive power of random gene sets for survival. Our work shows that most cancer types exhibit the property that random selections of genes are more predictive of survival than expected. In contrast to previous work, this property is not removed by using a proliferation signature, which implies that proliferation may not always be the confounder that drives this property. We suggest one possible solution in the form of data-driven sub-classification to reduce this property significantly. Our results suggest that the predictive power of random gene sets may be used to identify the existence of sub-classes in the data, and thus may allow better understanding of patient stratification. Furthermore, by reducing the observed bias this may allow more direct identification of biologically relevant, and potentially causal, genes.
Multiple gene sets have been published as predictive of cancer progression and metastasis in several cancer types. Although many of these sets proved to be highly predictive of survival, even gene sets for the same cancer (but from different data-sets or different analyses) exhibit very little overlap and to date did not provide functional therapeutic targets. Recent studies found that in breast cancer, even random gene sets can predict survival much better than would be expected, and on average are better than many published gene sets. Together, these results undermine the causal role of the published gene sets and their potential clinical implications. We show that random gene sets predict survival in many cancer types, and that this property no longer exists after splitting the data into subclasses based on data-driven clusters. This suggests that such sub-classification could increase the likelihood to identify causal genes that are potential therapeutic targets, and that this property can be used as an indication that there may be subclasses within the dataset.
|
33
|
Sim W, Lee J, Choi C. Robust method for identification of prognostic gene signatures from gene expression profiles. Sci Rep 2017; 7:16926. [PMID: 29208919 PMCID: PMC5717170 DOI: 10.1038/s41598-017-17213-4] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 07/11/2017] [Accepted: 11/22/2017] [Indexed: 12/11/2022]
Abstract
In the last decade, many attempts have been made to use gene expression profiles to identify prognostic genes for various types of cancer. Previous studies evaluating the prognostic value of genes were limited by the unsolved problem of classifying patients into different risk groups based on specific gene expression threshold levels. Here, we present a novel method, called iterative patient partitioning (IPP), which was inspired by the receiver operating characteristic (ROC) curve, is based on the log-rank test, and overcomes the threshold decision problem. We applied IPP to analyze datasets pertaining to various subtypes of breast cancer. Using IPP, we discovered both novel and well-studied prognostic genes related to cell cycle/proliferation or the immune response. The novel genes were further analyzed using copy-number alteration and mutation data, and these results supported their relationship with prognosis.
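The threshold decision problem mentioned above can be illustrated with a short Python sketch. It is not the IPP algorithm, whose details the abstract does not give. It only shows a standard two-group log-rank chi-square statistic and a naive scan over every candidate expression cutoff, which is the step that makes a single fixed threshold arbitrary. The function names and toy data are hypothetical, and a real analysis would need to correct for having tried many cutoffs.

```python
import math


def logrank_chi2(times, events, in_group1):
    """Two-group log-rank chi-square statistic (1 degree of freedom).

    times: follow-up times; events: 1 if the event occurred, 0 if censored;
    in_group1: True for patients in the first group.
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n = sum(1 for x in times if x >= t)  # at risk overall
        n1 = sum(1 for x, g in zip(times, in_group1) if x >= t and g)
        d = sum(1 for x, e in zip(times, events) if x == t and e)
        d1 = sum(1 for x, e, g in zip(times, events, in_group1)
                 if x == t and e and g)
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            # Hypergeometric variance of the group-1 event count at time t
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0
    return observed_minus_expected ** 2 / variance


def scan_thresholds(expression, times, events):
    """Try every cutoff between distinct expression values; keep the best split.

    Returns (cutoff, chi2, unadjusted p-value). The p-value ignores the
    multiple cutoffs tried, so it is optimistic.
    """
    best = None
    for cut in sorted(set(expression))[:-1]:
        high = [x > cut for x in expression]
        chi2 = logrank_chi2(times, events, high)
        if best is None or chi2 > best[1]:
            p = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square(1) tail
            best = (cut, chi2, p)
    return best


# Toy data: higher expression goes with shorter survival.
expr = [9.1, 8.7, 3.2, 2.9]
time = [1, 2, 3, 4]
event = [1, 1, 1, 1]
print(scan_thresholds(expr, time, event))
```

Because the best cutoff is chosen after looking at the outcome, the reported p-value overstates significance. That is one reason threshold-free or iterative schemes are of interest.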
Affiliation(s)
- Woogwang Sim
- Department of Bio and Brain Engineering, KAIST, Daejeon, 34141, Republic of Korea
- Jungsul Lee
- Cellex Life Sciences Incorporated, Daejeon, 34051, Republic of Korea
- Chulhee Choi
- Department of Bio and Brain Engineering, KAIST, Daejeon, 34141, Republic of Korea; Cellex Life Sciences Incorporated, Daejeon, 34051, Republic of Korea
|
34
|
Ma B, Xu Q, Song Y, Gao P, Wang Z. Current issues of preoperative radio(chemo)therapy and its future evolution in locally advanced rectal cancer. Future Oncol 2017; 13:2489-2501. [PMID: 29124955 DOI: 10.2217/fon-2017-0310] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 02/08/2023]
Abstract
Neoadjuvant therapies are effective for local control and tumor downstaging. To date, preoperative long-course chemoradiotherapy and short-course radiotherapy are the two primary guideline-recommended neoadjuvant therapies for locally advanced rectal cancer patients. However, clinicians throughout the world are working to further optimize the regimens and concepts of neoadjuvant therapy. Hence, there is an urgent need to summarize evidence regarding indications of neoadjuvant therapies and relative merits of current standard regimens. In addition, we also reviewed the optimized regimens mainly based on short-course radiotherapy with delayed surgery, consolidation chemotherapy, induction chemotherapy, chemotherapy alone without radiation and concepts in terms of organ preservation and personalized treatments to further explore the future evolution of neoadjuvant therapies in rectal cancer.
Affiliation(s)
- Bin Ma
- Department of Surgical Oncology & General Surgery, the First Hospital of China Medical University, Shenyang 110001, PR China
- Qingzhou Xu
- Department of Surgical Oncology & General Surgery, the First Hospital of China Medical University, Shenyang 110001, PR China
- Yongxi Song
- Department of Surgical Oncology & General Surgery, the First Hospital of China Medical University, Shenyang 110001, PR China
- Peng Gao
- Department of Surgical Oncology & General Surgery, the First Hospital of China Medical University, Shenyang 110001, PR China
- Zhenning Wang
- Department of Surgical Oncology & General Surgery, the First Hospital of China Medical University, Shenyang 110001, PR China
|
35
|
Huo Z, Tseng G. Integrative Sparse K-Means With Overlapping Group Lasso in Genomic Applications for Disease Subtype Discovery. Ann Appl Stat 2017; 11:1011-1039. [PMID: 28959370 PMCID: PMC5613668 DOI: 10.1214/17-aoas1033] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Indexed: 11/19/2022]
Abstract
Cancer subtypes discovery is the first step to deliver personalized medicine to cancer patients. With the accumulation of massive multi-level omics datasets and established biological knowledge databases, omics data integration with incorporation of rich existing biological knowledge is essential for deciphering a biological mechanism behind the complex diseases. In this manuscript, we propose an integrative sparse K-means (is-K means) approach to discover disease subtypes with the guidance of prior biological knowledge via sparse overlapping group lasso. An algorithm using an alternating direction method of multiplier (ADMM) will be applied for fast optimization. Simulation and three real applications in breast cancer and leukemia will be used to compare is-K means with existing methods and demonstrate its superior clustering accuracy, feature selection, functional annotation of detected molecular features and computing efficiency.
Affiliation(s)
- Zhiguang Huo
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, USA
- George Tseng
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, USA
|
36
|
Lemler DJ, Lynch ML, Tesfay L, Deng Z, Paul BT, Wang X, Hegde P, Manz DH, Torti SV, Torti FM. DCYTB is a predictor of outcome in breast cancer that functions via iron-independent mechanisms. Breast Cancer Res 2017; 19:25. [PMID: 28270217 PMCID: PMC5341190 DOI: 10.1186/s13058-017-0814-9] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 09/15/2016] [Accepted: 02/09/2017] [Indexed: 02/08/2023]
Abstract
BACKGROUND Duodenal cytochrome b (DCYTB) is a ferrireductase that functions together with divalent metal transporter 1 (DMT1) to mediate dietary iron reduction and uptake in the duodenum. DCYTB is also a member of a 16-gene iron regulatory gene signature (IRGS) that predicts metastasis-free survival in breast cancer patients. To better understand the relationship between DCYTB and breast cancer, we explored in detail the prognostic significance and molecular function of DCYTB in breast cancer. METHODS The prognostic significance of DCYTB expression was evaluated using publicly available microarray data. Signaling Pathway Impact Analysis (SPIA) of microarray data was used to identify potential novel functions of DCYTB. The role of DCYTB was assessed using immunohistochemistry and measurements of iron uptake, iron metabolism, and FAK signaling. RESULTS High DCYTB expression was associated with prolonged survival in two large independent cohorts, together totaling 1610 patients (cohort #1, p = 1.6e-11, n = 741; cohort #2, p = 1.2e-05, n = 869; log-rank test) as well as in the Gene expression-based Outcome for Breast cancer Online (GOBO) cohort (p < 1.0e-05, n = 1379). High DCYTB expression was also associated with increased survival in homogeneously treated groups of patients who received either tamoxifen or chemotherapy. Immunohistochemistry revealed that DCYTB is localized on the plasma membrane of breast epithelial cells, and that expression is dramatically reduced in high-grade tumors. Surprisingly, neither overexpression nor knockdown of DCYTB affected levels of ferritin H, transferrin receptor, labile iron or total cellular iron in breast cancer cells. Because SPIA pathway analysis of patient microarray data revealed an association between DCYTB and the focal adhesion pathway, we examined the influence of DCYTB on FAK activation in breast cancer cells. These experiments reveal that DCYTB reduces adhesion and activation of focal adhesion kinase (FAK) and its adapter protein paxillin. CONCLUSIONS DCYTB is an important predictor of outcome and is associated with response to therapy in breast cancer patients. DCYTB does not affect intracellular iron in breast cancer cells. Instead, DCYTB may retard cancer progression by reducing activation of FAK, a kinase that plays a central role in tumor cell adhesion and metastasis.
Affiliation(s)
- David J. Lemler
  - Department of Molecular Biology and Biophysics, University of Connecticut Health Center, Farmington, CT 06030 USA
  - Present address: Department of Molecular Biomedical Sciences, North Carolina State University, CVM Research Building 474, Raleigh, NC 27695 USA
- Miranda L. Lynch
  - Center for Quantitative Medicine, University of Connecticut Health Center, Farmington, CT 06030 USA
  - Present address: Statistical Sciences Group CCS-6, Los Alamos National Laboratory, Los Alamos, NM 87545 USA
- Lia Tesfay
  - Department of Molecular Biology and Biophysics, University of Connecticut Health Center, Farmington, CT 06030 USA
- Zhiyong Deng
  - Department of Molecular Biology and Biophysics, University of Connecticut Health Center, Farmington, CT 06030 USA
- Bibbin T. Paul
  - Department of Molecular Biology and Biophysics, University of Connecticut Health Center, Farmington, CT 06030 USA
- Xiaohong Wang
  - Department of Pathology, University of Connecticut Health Center, Farmington, CT 06030 USA
- Poornima Hegde
  - Department of Pathology, University of Connecticut Health Center, Farmington, CT 06030 USA
- David H. Manz
  - Department of Molecular Biology and Biophysics, University of Connecticut Health Center, Farmington, CT 06030 USA
  - School of Dental Medicine, University of Connecticut Health Center, Farmington, CT 06030 USA
- Suzy V. Torti
  - Department of Molecular Biology and Biophysics, University of Connecticut Health Center, Farmington, CT 06030 USA
- Frank M. Torti
  - Department of Medicine, University of Connecticut Health Center, Farmington, CT 06030 USA
|
37
|
Should We Give Up The Search for a Clinically Useful Gene Signature for the Prediction of Response of Rectal Cancer to Neoadjuvant Chemoradiation? Dis Colon Rectum 2016; 59:895-7. [PMID: 27505119 DOI: 10.1097/dcr.0000000000000620] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Indexed: 02/08/2023]
|
38
|
Strbenac D, Mann GJ, Yang JYH, Ormerod JT. Differential distribution improves gene selection stability and has competitive classification performance for patient survival. Nucleic Acids Res 2016; 44:e119. [PMID: 27190235 PMCID: PMC5291264 DOI: 10.1093/nar/gkw444] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 06/19/2015] [Accepted: 05/09/2016] [Indexed: 01/07/2023] Open
Abstract
A consistent difference in average expression level, often referred to as differential expression (DE), has long been used to identify genes useful for classification. However, recent cancer studies have shown that when transcription factors or epigenetic signals become deregulated, a change in expression variability (DV) of target genes is frequently observed. This suggests that assessing the importance of genes by either differential expression or variability alone potentially misses sets of important biomarkers that could lead to improved predictions and treatments. Here, we describe a new approach for assessing the importance of genes based on differential distribution (DD), which combines information from differential expression and differential variability into a unified metric. We show that feature ranking and selection stability based on DD can perform two to three times better than DE or DV alone, and that DD yields equivalent error rates to DE and DV. Finally, assessing genes via differential distribution produces a complementary set of selected genes to DE and DV, potentially opening up new categories of biomarkers.
Affiliation(s)
- Dario Strbenac
  - School of Mathematics and Statistics, University of Sydney, NSW 2006, Australia
- Graham J Mann
  - Melanoma Institute Australia, University of Sydney, NSW 2060, Australia
  - Centre for Cancer Research, Westmead Millennium Institute, University of Sydney, Westmead NSW 2145, Australia
- Jean Y H Yang
  - School of Mathematics and Statistics, University of Sydney, NSW 2006, Australia
- John T Ormerod
  - School of Mathematics and Statistics, University of Sydney, NSW 2006, Australia
  - ARC Centre of Excellence for Mathematical & Statistical Frontiers, University of Melbourne, Parkville VIC 3010, Australia
|
39
|
An N, Yang X, Cheng S, Wang G, Zhang K. Developmental genes significantly afflicted by aberrant promoter methylation and somatic mutation predict overall survival of late-stage colorectal cancer. Sci Rep 2015; 5:18616. [PMID: 26691761 PMCID: PMC4686889 DOI: 10.1038/srep18616] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 09/06/2015] [Accepted: 11/19/2015] [Indexed: 02/07/2023] Open
Abstract
Carcinogenesis is an exceedingly complicated process, which involves multi-level dysregulations, including genomics (mainly caused by somatic mutation and copy number variation), DNA methylomics, and transcriptomics. Therefore, looking into only one molecular level of cancer is not sufficient to uncover the intricate underlying mechanisms. With the abundant resources of publicly available data in The Cancer Genome Atlas (TCGA) database, an integrative strategy was conducted to systematically analyze the aberrant patterns of colorectal cancer on the basis of DNA copy number, promoter methylation, somatic mutation and gene expression. In this study, paired samples in each genomic level were retrieved to identify differentially expressed genes with corresponding genetic or epigenetic dysregulations. Notably, the result of gene ontology enrichment analysis indicated that the differentially expressed genes with corresponding aberrant promoter methylation or somatic mutation were both functionally concentrated upon developmental process, suggesting the intimate association between development and carcinogenesis. Thus, by means of random walk with restart, 37 significant development-related genes were retrieved from an a priori knowledge-based biological network. In five independent microarray datasets, Kaplan-Meier survival and Cox regression analyses both confirmed that the expression of these genes was significantly associated with overall survival of Stage III/IV colorectal cancer patients.
Affiliation(s)
- Ning An
  - State Key Laboratory of Molecular Oncology, Department of Etiology and Carcinogenesis, Peking Union Medical College & Cancer Institute (Hospital), Chinese Academy of Medical Sciences, Beijing, 100021, China
- Xue Yang
  - State Key Laboratory of Molecular Oncology, Department of Etiology and Carcinogenesis, Peking Union Medical College & Cancer Institute (Hospital), Chinese Academy of Medical Sciences, Beijing, 100021, China
- Shujun Cheng
  - State Key Laboratory of Molecular Oncology, Department of Etiology and Carcinogenesis, Peking Union Medical College & Cancer Institute (Hospital), Chinese Academy of Medical Sciences, Beijing, 100021, China
- Guiqi Wang
  - Department of Endoscopy, Cancer Hospital, Chinese Academy of Medical Sciences, Beijing, 100021, China
- Kaitai Zhang
  - State Key Laboratory of Molecular Oncology, Department of Etiology and Carcinogenesis, Peking Union Medical College & Cancer Institute (Hospital), Chinese Academy of Medical Sciences, Beijing, 100021, China
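The abstract of this entry names random walk with restart as the step used to retrieve development-related genes from a prior-knowledge network. The sketch below illustrates that general technique only and is not the authors' implementation: the toy network, the node names `g1` to `g4`, the restart probability of 0.3, and the treatment of isolated nodes are all assumptions made for illustration.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for a walker that, at each
    step, jumps back to the seed set with probability `restart` and otherwise
    moves to a uniformly chosen neighbour of its current node."""
    nodes = sorted(adjacency)
    # Restart distribution: uniform over the seed nodes.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Mass that restarts at the seeds.
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            neighbours = adjacency[u]
            if not neighbours:
                # Assumption: a node with no edges sends its mass back to the seeds.
                for n in nodes:
                    nxt[n] += (1.0 - restart) * p[u] * p0[n]
                continue
            share = (1.0 - restart) * p[u] / len(neighbours)
            for v in neighbours:
                nxt[v] += share
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Hypothetical four-gene chain g1 - g2 - g3 - g4, seeded at g1.
toy_network = {
    "g1": ["g2"],
    "g2": ["g1", "g3"],
    "g3": ["g2", "g4"],
    "g4": ["g3"],
}
scores = random_walk_with_restart(toy_network, seeds={"g1"})
for gene in sorted(scores, key=scores.get, reverse=True):
    print(gene, round(scores[gene], 4))
```

Nodes closer to the seed receive higher scores, so ranking by score gives a proximity-based gene prioritisation. How the study chose its seeds, network, and significance cut-off is not stated in the abstract.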
|
40
|
Yard B, Chie EK, Adams DJ, Peacock C, Abazeed ME. Radiotherapy in the Era of Precision Medicine. Semin Radiat Oncol 2015; 25:227-36. [DOI: 10.1016/j.semradonc.2015.05.003] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Indexed: 12/20/2022]
|
41
|
Quantitative risk stratification of oral leukoplakia with exfoliative cytology. PLoS One 2015; 10:e0126760. [PMID: 25978541 PMCID: PMC4433206 DOI: 10.1371/journal.pone.0126760] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 01/25/2015] [Accepted: 04/07/2015] [Indexed: 11/19/2022] Open
Abstract
Exfoliative cytology has been widely used for early diagnosis of oral squamous cell carcinoma (OSCC). Test outcome is reported as “negative”, “atypical” (defined as abnormal epithelial changes of uncertain diagnostic significance), and “positive” (defined as definitive cellular evidence of epithelial dysplasia or carcinoma). The major challenge is how to properly manage the “atypical” patients in order to diagnose OSCC early and prevent OSCC. In this study, we collected exfoliative cytology data, histopathology data, and clinical data of normal subjects (n=102), oral leukoplakia (OLK) patients (n=82), and OSCC patients (n=93), and developed a data analysis procedure for quantitative risk stratification of OLK patients. This procedure involves a step called expert-guided data transformation and reconstruction (EdTAR), which allows automatic data processing and reconstruction and reveals informative signals for subsequent risk stratification. Modern machine learning techniques were utilized to build statistical prediction models on the reconstructed data. Among the several models tested using resampling methods for parameter pruning and performance evaluation, Support Vector Machine (SVM) was found to be optimal with a high sensitivity (median>0.98) and specificity (median>0.99). With the SVM model, we constructed an oral cancer risk index (OCRI) which may potentially guide clinical follow-up of OLK patients. One OLK patient with an initial OCRI of 0.88 developed OSCC after 40 months of follow-up. In conclusion, we have developed a statistical method for quantitative risk stratification of OLK patients. This method may potentially improve cost-effectiveness of clinical follow-up of OLK patients, and help design clinical chemoprevention trials for high-risk populations.
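The abstract of this entry reports classifier quality as sensitivity and specificity. As a reminder of how those two figures are derived from predictions, here is a minimal sketch; the function name and the ten toy labels are made up for illustration and have no connection to the study's data or its SVM model.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Compute sensitivity (true-positive rate) and specificity
    (true-negative rate) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # Guard against a class being absent from y_true.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical labels: 1 = disease, 0 = no disease.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

In a resampling scheme such as the one the abstract mentions, this calculation would be repeated on each held-out split and the per-split values summarised, for example by their median.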
|
42
|
Cui H, Dhroso A, Johnson N, Korkin D. The variation game: Cracking complex genetic disorders with NGS and omics data. Methods 2015; 79-80:18-31. [PMID: 25944472 DOI: 10.1016/j.ymeth.2015.04.018] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 12/13/2014] [Revised: 03/27/2015] [Accepted: 04/17/2015] [Indexed: 12/14/2022] Open
Abstract
Tremendous advances in Next Generation Sequencing (NGS) and high-throughput omics methods have brought us one step closer to a mechanistic understanding of complex disease at the molecular level. In this review, we discuss four basic regulatory mechanisms implicated in complex genetic diseases, such as cancer, neurological disorders, heart disease, diabetes, and many others. The mechanisms, including genetic variations, copy-number variations, posttranscriptional variations, and epigenetic variations, can be detected using a variety of NGS methods. We propose that malfunctions detected in these mechanisms are not necessarily independent, since these malfunctions are often found associated with the same disease and targeting the same gene, group of genes, or functional pathway. As an example, we discuss possible rewiring effects of the cancer-associated genetic, structural, and posttranscriptional variations on the protein-protein interaction (PPI) network centered around the P53 protein. The review highlights the multi-layered complexity of common genetic disorders and suggests that integration of NGS and omics data is a critical step in developing new computational methods capable of deciphering this complexity.
Affiliation(s)
- Hongzhu Cui
  - Department of Computer Science, Worcester Polytechnic Institute, 100 Institute Road, Worcester, MA 01609, United States
- Andi Dhroso
  - Department of Computer Science, Worcester Polytechnic Institute, 100 Institute Road, Worcester, MA 01609, United States
- Nathan Johnson
  - Department of Computer Science, Worcester Polytechnic Institute, 100 Institute Road, Worcester, MA 01609, United States
- Dmitry Korkin
  - Department of Computer Science, Worcester Polytechnic Institute, 100 Institute Road, Worcester, MA 01609, United States
  - Bioinformatics and Computational Biology Program, Worcester Polytechnic Institute, 100 Institute Road, Worcester, MA 01609, United States
|
43
|
Comprehensive evaluation of the effectiveness of gene expression signatures to predict complete response to neoadjuvant chemoradiotherapy and guide surgical intervention in rectal cancer. Cancer Genet 2015; 208:319-26. [PMID: 25963525 DOI: 10.1016/j.cancergen.2015.03.010] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.7] [Received: 09/30/2014] [Revised: 01/23/2015] [Accepted: 03/11/2015] [Indexed: 12/24/2022]
Abstract
Neoadjuvant chemoradiotherapy (nCRT) may lead to complete tumor regression in rectal cancer patients. Prediction of complete response to nCRT may allow a personalized management of rectal cancer and spare patients from unnecessary radical total mesorectal excision with or without sphincter preservation. To identify a gene expression signature capable of predicting complete pathological response (pCR) to nCRT, we performed a gene expression analysis in 25 pretreatment biopsies from patients who underwent 5FU-based nCRT using RNA-Seq. A supervised learning algorithm was used to identify expression signatures capable of predicting pCR, and the predictive value of these signatures was validated using independent samples. We also evaluated the utility of previously published signatures in predicting complete response in our cohort. We identified 27 differentially expressed genes between patients with pCR and patients with incomplete responses to nCRT. Predictive gene signatures using subsets of these 27 differentially expressed genes peaked at 81.8% accuracy. However, signatures with the highest sensitivity showed poor specificity, and vice-versa, when applied in an independent set of patients. Testing previously published signatures on our cohort also showed poor predictive value. Our results indicate that currently available predictive signatures are highly dependent on the sample set from which they are derived, and their accuracy is not superior to current imaging and clinical parameters used to assess response to nCRT and guide surgical intervention.
|