1
Yao A, Wang L, Qi F, Li J, Meng J, Jiang T, He Y, Lai W. Risk factors and early detection of joint damage in patients with psoriasis: a case-control study. Int J Dermatol 2024. [PMID: 38682296 DOI: 10.1111/ijd.17212] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2023] [Revised: 04/09/2024] [Accepted: 04/10/2024] [Indexed: 05/01/2024]
Abstract
BACKGROUND Our aim was to address the unmet need for early detection of the at-risk population and to identify the subgroup of patients whose psoriasis (PsO) could progress to psoriatic arthritis (PsA). METHODS A retrospective, longitudinal case-control study was conducted at Beijing Chao-yang Hospital. The case group comprised 75 patients with clinically diagnosed PsA, and the control group comprised 345 patients with PsO but without PsA. A variety of baseline covariates were gathered from every patient with PsO. Univariate and multivariate analyses and receiver operating characteristic (ROC) curves were used to identify underlying risk factors and to determine whether imaging examination of PsO patients was necessary. RESULTS In multivariate logistic regression analysis, age ≥40 (odds ratio (OR): 1.04, 95% confidence interval (CI): 1.02-1.06, P < 0.01), nail involvement (OR: 1.17, 95% CI: 1.09-1.32, P < 0.01), erythrocyte sedimentation rate (ESR) (OR: 1.03, 95% CI: 1.01-1.06, P < 0.05) and elevated high-sensitivity C-reactive protein (hs-CRP) (OR: 1.31, 95% CI: 1.13-1.53, P < 0.01) were identified as risk factors for progression from PsO to clinical PsA. Combining magnetic resonance imaging (MRI)-detected enthesitis with tenosynovitis yielded predictors with better diagnostic efficacy, with higher specificity (94.3% vs. 69%) and similar sensitivity (89% vs. 84.6%). The areas under the ROC curve (AUCs) were 0.925 (95% CI: 0.882-0.967, P < 0.01) and 0.858 (95% CI: 0.814-0.903, P < 0.01). CONCLUSIONS Age ≥40, nail involvement, and elevated ESR and hs-CRP were independent risk factors for PsO progressing to PsA. In addition, MRI provides further value for the early recognition of PsA.
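The odds ratios reported in this abstract are exponentiated logistic-regression coefficients, so each OR and its 95% CI can be mapped back to a coefficient and an approximate standard error. The sketch below does this for the hs-CRP result. It assumes a symmetric Wald interval on the log-odds scale, which the abstract does not confirm, and the function name `or_to_coef` is purely illustrative.

```python
import math

def or_to_coef(odds_ratio, ci_low, ci_high, z_crit=1.96):
    """Recover the log-odds coefficient, its approximate standard error,
    and the Wald z statistic from an odds ratio and its 95% CI.
    Assumption: the CI is a symmetric Wald interval on the log-odds scale."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    return beta, se, beta / se

# hs-CRP values as reported in the abstract: OR 1.31, 95% CI 1.13-1.53
beta, se, z = or_to_coef(1.31, 1.13, 1.53)
print(f"beta={beta:.3f}, se={se:.3f}, z={z:.2f}")
```

A z statistic above 2.576 corresponds to a two-sided P < 0.01, which agrees with the significance level the abstract reports for hs-CRP.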
Affiliation(s)
- Amin Yao
- Department of Dermatology, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, China
- Department of Dermatology, Beijing Chao-yang Hospital, Capital Medical University, Beijing, China
- Li Wang
- Department of Radiology, Beijing Chao-yang Hospital, Capital Medical University, Beijing, China
- Fei Qi
- Department of Dermatology, Beijing Chao-yang Hospital, Capital Medical University, Beijing, China
- Jialu Li
- Department of Radiology, Beijing Chao-yang Hospital, Capital Medical University, Beijing, China
- Juan Meng
- Department of Rheumatology, Beijing Chao-yang Hospital, Capital Medical University, Beijing, China
- Tao Jiang
- Department of Radiology, Beijing Chao-yang Hospital, Capital Medical University, Beijing, China
- Yanling He
- Department of Dermatology, Beijing Chao-yang Hospital, Capital Medical University, Beijing, China
- Wei Lai
- Department of Dermatology, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, China

2
Chu J, Sun N, Hu W, Chen X, Yi N, Shen Y. Bayesian hierarchical lasso Cox model: A 9-gene prognostic signature for overall survival in gastric cancer in an Asian population. PLoS One 2022; 17:e0266805. [PMID: 35421138 PMCID: PMC9009599 DOI: 10.1371/journal.pone.0266805] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2021] [Accepted: 03/29/2022] [Indexed: 12/24/2022] [Open Access]
Abstract
Objective
Gastric cancer (GC) is one of the most common tumour diseases worldwide and has poor survival, especially in the Asian population. Exploration based on biomarkers would be efficient for better diagnosis, prediction, and targeted therapy.
Methods
Expression profiles were downloaded from the Gene Expression Omnibus (GEO) database. Survival-related genes were identified by gene set enrichment analysis (GSEA) and univariate Cox regression. Then, we applied a Bayesian hierarchical lasso Cox model for prognostic signature screening. Protein-protein interaction and Spearman correlation analyses were performed. Kaplan–Meier and receiver operating characteristic (ROC) curve analyses were applied to evaluate the prediction performance. Multivariate Cox regression was used to identify prognostic factors, and a prognostic nomogram was constructed for clinical application.
Results
With the Bayesian lasso Cox model, a 9-gene signature comprising TNFRSF11A, NMNAT1, EIF5A, NOTCH3, TOR2A, E2F8, PSMA5, TPMT, and KIF11 was established to predict overall survival in GC. Protein-protein interaction analysis indicated that E2F8 was likely related to KIF11. Kaplan-Meier analysis showed a significant difference between the high-risk and low-risk groups (P<0.001). Multivariate analysis demonstrated that the 9-gene signature was an independent predictor (HR = 2.609, 95% CI 2.017–3.370), and the C-index of the integrative model reached 0.75. Functional enrichment analysis of the different risk groups revealed the most significantly enriched pathways/terms, including pyrimidine metabolism and the respiratory electron transport chain.
Conclusion
Our findings suggest that the novel prognostic model based on a 9-gene signature can identify high-risk GC patients and improve prediction performance. We hope our model can provide a reference for risk classification and clinical decision-making.
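The C-index of 0.75 quoted in the Results measures how often the model ranks pairs of patients in the right order. As a rough illustration, the sketch below computes Harrell's concordance index on toy data. The data values and the function name are hypothetical, and this is not the authors' code.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index: the fraction of comparable patient pairs in which
    the patient with the shorter observed survival has the higher risk score.
    A pair is comparable when the shorter time ends in an observed event."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Patient i must have an observed event strictly before time j
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied risk scores count as one half
    return concordant / comparable

# Toy data (hypothetical): months of follow-up, event flag, model risk score
times = [5, 8, 12, 20, 24]
events = [1, 1, 0, 1, 0]
risks = [2.1, 1.7, 0.9, 1.8, 0.3]
c_index = concordance_index(times, events, risks)
print(c_index)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering. Censored patients contribute only as the longer-surviving member of a pair.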
Affiliation(s)
- Jiadong Chu
- Department of Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, P.R. China
- Na Sun
- Department of Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, P.R. China
- Wei Hu
- Department of Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, P.R. China
- Xuanli Chen
- Department of Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, P.R. China
- Nengjun Yi
- Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, Birmingham, Alabama, United States of America
- Yueping Shen
- Department of Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, P.R. China

3
Chen CK. Inference of genetic regulatory networks with regulatory hubs using vector autoregressions and automatic relevance determination with model selections. Stat Appl Genet Mol Biol 2021; 20:121-143. [PMID: 34963205 DOI: 10.1515/sagmb-2020-0054] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/04/2020] [Accepted: 11/15/2021] [Indexed: 12/11/2022]
Abstract
The inference of genetic regulatory networks (GRNs) reveals how genes interact with each other. A few genes can regulate many genes as targets to control cell functions. We present new methods based on the order-1 vector autoregression (VAR1) for inferring GRNs from gene expression time series. The methods use the automatic relevance determination (ARD) to incorporate the regulatory hub structure into the estimation of VAR1 in a Bayesian framework. Several sparse approximation schemes are applied to the estimated regression weights or VAR1 model to generate the sparse weighted adjacency matrices representing the inferred GRNs. We apply the proposed and several widespread reference methods to infer GRNs with up to 100 genes using simulated, DREAM4 in silico and experimental E. coli gene expression time series. We show that the proposed methods are efficient on simulated hub GRNs and scale-free GRNs using short time series simulated by VAR1s and outperform reference methods on small-scale DREAM4 in silico GRNs and E. coli GRNs. They can utilize the known major regulatory hubs to improve the performance on larger DREAM4 in silico GRNs and E. coli GRNs. The impact of nonlinear time series data on the performance of proposed methods is discussed.
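To make the step from estimated VAR1 weights to a sparse weighted adjacency matrix concrete, here is a small sketch. It uses plain hard thresholding, which is only a stand-in for the sparse approximation schemes the abstract mentions without detailing. The weight matrix, the threshold, and the function names are all hypothetical.

```python
def sparsify(weights, threshold):
    """Zero out VAR1 weights whose magnitude is below the threshold.
    Convention assumed here: weights[i][j] is the effect of gene j at
    time t-1 on gene i at time t."""
    return [[w if abs(w) >= threshold else 0.0 for w in row] for row in weights]

def out_degrees(adjacency):
    """Count the targets of each regulator (non-zero entries per column)."""
    n = len(adjacency)
    return [sum(1 for i in range(n) if adjacency[i][j] != 0.0) for j in range(n)]

# Hypothetical estimated VAR1 weight matrix for 4 genes; gene 0 acts as a hub
A_hat = [
    [0.60, 0.02, 0.00, -0.03],
    [0.45, 0.10, 0.01, 0.00],
    [-0.50, 0.00, 0.05, 0.04],
    [0.30, -0.01, 0.20, 0.00],
]
adjacency = sparsify(A_hat, threshold=0.15)
degrees = out_degrees(adjacency)
hub = max(range(len(degrees)), key=degrees.__getitem__)
print(degrees, hub)
```

Under the column convention assumed above, a regulatory hub shows up as a column with many non-zero entries, which is why the out-degree is counted per column.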
Affiliation(s)
- Chi-Kan Chen
- Department of Applied Mathematics, National Chung Hsing University, 145 Xingda Rd., South District, Taichung City, Taiwan, ROC

4
Jia M, Li Z, Pan M, Tao M, Lu X, Liu Y. Evaluation of immune infiltrating of thyroid cancer based on the intrinsic correlation between pair-wise immune genes. Life Sci 2020; 259:118248. [PMID: 32791153 DOI: 10.1016/j.lfs.2020.118248] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 01/07/2020] [Revised: 07/09/2020] [Accepted: 08/07/2020] [Indexed: 10/23/2022]
Abstract
INTRODUCTION Unlike most mutation-driven cancers, thyroid cancer is thought to be highly dependent on changes in human hormone levels. Using changes in gene expression level as detection and diagnostic markers has become a research hotspot. The internal relationship between two genes and disease development is used to avoid the instability caused by fluctuation in a single gene. AIM To achieve early diagnosis of thyroid cancer during tumorigenesis and recurrence using immune gene pairs (IGPS). METHODS We extracted thyroid cancer data from The Cancer Genome Atlas (TCGA) and used the CIBERSORT algorithm to estimate the infiltration of 22 immune cell types. We screened for IGPS that differed significantly between groups, then used a LinearSVC model to learn and select features, combined with a deep learning neural network model, to distinguish benign from malignant samples and to classify patients into different groups. KEY FINDINGS Immune cell ratios differed significantly across tumor stages and in relapse samples. We identified 42 and 64 IGPS for the normal-tumor and non-relapsed groups, respectively; pairs such as ASCC3-MAP3K7 and ATF2-SOCS5 showed significant correlation in expression. We then used the IGPS to train the tumor diagnostic classifiers, obtaining an average AUC of 0.99 in both cases after ten cross-validation runs. SIGNIFICANCE The IGPS give new insight into immune cell infiltration in thyroid cancer, and the deep learning model can be further used for early diagnosis of thyroid cancer and estimation of the risk of recurrence.
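The average AUC reported under KEY FINDINGS can be illustrated with a short sketch that computes the AUC per held-out fold and averages the results. The abstract does not specify the exact cross-validation scheme, so the fold layout, the data, and the function names below are assumptions.

```python
def auc(labels, scores):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive sample scores higher than a randomly chosen negative one
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def mean_cv_auc(folds):
    """Average the AUC over cross-validation folds; each fold is a
    (labels, scores) pair taken from the held-out samples."""
    return sum(auc(y, s) for y, s in folds) / len(folds)

# Hypothetical held-out predictions from two folds
folds = [
    ([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]),
    ([1, 0, 1, 0], [0.8, 0.3, 0.7, 0.2]),
]
result = mean_cv_auc(folds)
print(result)
```

The pairwise form is quadratic in the number of samples, which is fine for a sketch; a rank-based computation would be preferable for large cohorts.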
Affiliation(s)
- Meng Jia
- Thyroid Surgery, the First Affiliated Hospital of Zhengzhou University, Henan, 450052 Zhengzhou, China
- Zhuyao Li
- Thyroid Surgery, the First Affiliated Hospital of Zhengzhou University, Henan, 450052 Zhengzhou, China
- Mengjiao Pan
- Thyroid Surgery, the First Affiliated Hospital of Zhengzhou University, Henan, 450052 Zhengzhou, China
- Mei Tao
- Thyroid Surgery, the First Affiliated Hospital of Zhengzhou University, Henan, 450052 Zhengzhou, China
- Xiubo Lu
- Thyroid Surgery, the First Affiliated Hospital of Zhengzhou University, Henan, 450052 Zhengzhou, China
- Yang Liu
- Department of Radiotherapy, Henan Cancer Hospital and the Affiliated Cancer Hospital of Zhengzhou University, Zhengzhou 450008, China

5
Chen CK. Inference of gene networks from gene expression time series using recurrent neural networks and sparse MAP estimation. J Bioinform Comput Biol 2018; 16:1850009. [PMID: 30051742 DOI: 10.1142/s0219720018500099] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/18/2022]
Abstract
BACKGROUND The inference of genetic regulatory networks (GRNs) provides insight into the cellular responses to signals. A class of recurrent neural networks (RNNs) capturing the dynamics of GRN has been used as a basis for inferring small-scale GRNs from gene expression time series. The Bayesian framework facilitates incorporating the hypothesis of GRN into the model estimation to improve the accuracy of GRN inference. RESULTS We present new methods for inferring small-scale GRNs based on RNNs. The weights of wires of RNN represent the strengths of gene-to-gene regulatory interactions. We use a class of automatic relevance determination (ARD) priors to enforce the sparsity in the maximum a posteriori (MAP) estimates of wire weights of RNN. Particle swarm optimization (PSO) is integrated as an optimization engine into the MAP estimation process. Likely networks of genes generated based on estimated wire weights are combined using the majority rule to determine a final estimated GRN. As an alternative, a class of $L_q$-norm ($q = 1$) priors is used for attaining the sparse MAP estimates of wire weights of RNN. We also infer the GRN using the maximum likelihood (ML) estimates of wire weights of RNN. The RNN-based GRN inference algorithms, ARD-RNN, $L_q$-RNN, and ML-RNN, are tested on simulated and experimental E. coli and yeast time series containing 6-11 genes and 7-19 data points. Published GRN inference algorithms based on regressions and mutual information networks are run on the benchmark datasets for comparison.
CONCLUSION ARD and $L_q$-norm priors are used for the estimation of wire weights of RNN. Results of GRN inference experiments show that ARD-RNN and $L_q$-RNN have similar best accuracies on the simulated time series. The ARD-RNN is more accurate than $L_q$-RNN and ML-RNN, and mostly more accurate than the reference algorithms, on the experimental time series. The effectiveness of ARD-RNN for inferring small-scale GRNs using gene expression time series of limited length is empirically verified.
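The majority rule used to merge candidate networks into a final GRN can be sketched as follows. The candidate adjacency matrices and the function name are hypothetical, and the abstract does not say how ties or edge directions are handled, so a strict majority over directed 0/1 edges is assumed.

```python
def majority_vote(networks):
    """Combine candidate networks (0/1 adjacency matrices of equal size):
    an edge is kept when it appears in more than half of the candidates."""
    n = len(networks[0])
    k = len(networks)
    return [
        [1 if sum(net[i][j] for net in networks) * 2 > k else 0 for j in range(n)]
        for i in range(n)
    ]

# Three hypothetical candidate networks over 3 genes
candidates = [
    [[0, 1, 0], [0, 0, 1], [1, 0, 0]],
    [[0, 1, 0], [0, 0, 0], [1, 0, 0]],
    [[0, 1, 1], [0, 0, 1], [0, 0, 0]],
]
consensus = majority_vote(candidates)
print(consensus)
```

Edges found in only one of the three candidates are dropped, which is how the rule dampens the run-to-run variability of a stochastic optimizer such as PSO.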
Affiliation(s)
- Chi-Kan Chen
- Department of Applied Mathematics, National Chung Hsing University, Taiwan

6
Ow GS, Tang Z, Kuznetsov VA. Big data and computational biology strategy for personalized prognosis. Oncotarget 2018; 7:40200-40220. [PMID: 27229533 PMCID: PMC5130003 DOI: 10.18632/oncotarget.9571] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 07/28/2015] [Accepted: 05/01/2016] [Indexed: 01/05/2023] [Open Access]
Abstract
The era of big data and precision medicine has led to the accumulation of massive datasets of gene expression data and clinical information of patients. For a new patient, we propose that identification of a highly similar reference patient from an existing patient database via similarity matching of both clinical and expression data could be useful for predicting the prognostic risk or therapeutic efficacy. Here, we propose a novel methodology to predict disease/treatment outcome via analysis of the similarity between any pair of patients who are each characterized by a certain set of pre-defined biological variables (biomarkers or clinical features) represented initially as a prognostic binary variable vector (PBVV) and subsequently transformed to a prognostic signature vector (PSV). Our analyses revealed that the Euclidean distance, rather than the correlation distance, was effective in defining an unbiased similarity measure between two PSVs. We applied our method to high-grade serous ovarian cancer (HGSC) based on a 36-mRNA predictor that was previously shown to stratify patients into 3 distinct prognostic subgroups. We found that patient age, when converted into a binary variable, was positively correlated with the overall risk of succumbing to the disease. When applied to an independent testing dataset, the inclusion of age in the molecular predictor provided more robust personalized prognosis of overall survival correlated with the therapeutic response of HGSC and provided benefit for treatment targeting of the tumors in HGSC patients. Finally, our method can be generalized and implemented in many other diseases to accurately predict personalized patients’ outcomes.
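The similarity-matching idea can be sketched as a nearest-neighbor lookup under Euclidean distance. The vectors below are hypothetical binary vectors used directly; the paper's transformation from PBVV to PSV is not described in the abstract and is not reproduced here, and the patient identifiers and function names are invented for illustration.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def nearest_reference(new_patient, references):
    """Return (patient_id, distance) for the reference patient closest
    to the new patient."""
    return min(
        ((pid, euclidean(new_patient, vec)) for pid, vec in references.items()),
        key=lambda item: item[1],
    )

# Hypothetical reference database: one binary vector per patient
references = {
    "ref_A": [1, 0, 1, 1],
    "ref_B": [0, 1, 1, 0],
    "ref_C": [1, 1, 0, 0],
}
new_patient = [1, 0, 1, 0]
best_id, best_dist = nearest_reference(new_patient, references)
print(best_id, best_dist)
```

The outcome recorded for the matched reference patient would then serve as the prediction for the new patient.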
Affiliation(s)
- Vladimir A Kuznetsov
- Bioinformatics Institute, Singapore 138671; School of Computer Engineering, Nanyang Technological University, Singapore 639798

7
Cui Y, Li B, Li R. Decentralized Learning Framework of Meta-Survival Analysis for Developing Robust Prognostic Signatures. JCO Clin Cancer Inform 2017; 1:1-13. [PMID: 30657395 PMCID: PMC6873986 DOI: 10.1200/cci.17.00077] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022] [Open Access]
Abstract
PURPOSE A significant hurdle in developing reliable gene expression-based prognostic models has been the limited sample size, which can cause overfitting and false discovery. Combining data from multiple studies can enhance statistical power and reduce spurious findings, but how to address the biologic heterogeneity across different datasets remains a major challenge. Better meta-survival analysis approaches are needed. MATERIAL AND METHODS We presented a decentralized learning framework for meta-survival analysis without the need for data aggregation. Our method consisted of a series of proposals that together alleviated the influence of data heterogeneity and improved the performance of survival prediction. First, we transformed the gene expression profile of every sample into normalized percentile ranks to obtain platform-agnostic features. Second, we used Stouffer's meta-z approach in combination with Harrell's concordance index to prioritize and select genes to be included in the model. Third, we used survival discordance as a scale-independent model loss function. Instead of generating a merged dataset and training the model therein, we avoided comparing patients across datasets and individually evaluated the loss function on each dataset. Finally, we optimized the model by minimizing the joint loss function. RESULTS Through comprehensive evaluation on 31 public microarray datasets containing 6,724 samples of several cancer types, we demonstrated that the proposed method has outperformed (1) single prognostic genes identified using conventional meta-analysis, (2) multigene signatures trained on single datasets, (3) multigene signatures trained on merged datasets as well as by other existing meta-analysis methods, and (4) clinically applicable, established multigene signatures. CONCLUSION The decentralized learning approach can be used to effectively perform meta-analysis of gene expression data and to develop robust multigene prognostic signatures.
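Stouffer's meta-z approach, named in the second step of the methods, combines per-dataset z-scores into one statistic. The sketch below shows the standard weighted form. How the authors derive z-scores from Harrell's concordance index is not stated in the abstract, so the input z-scores here are hypothetical.

```python
import math
from statistics import NormalDist

def stouffer(z_scores, weights=None):
    """Combine per-dataset z-scores:
    Z = sum(w_i * z_i) / sqrt(sum(w_i ** 2)).
    With no weights given, every dataset counts equally."""
    if weights is None:
        weights = [1.0] * len(z_scores)
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    return numerator / denominator

# Hypothetical z-scores for one gene across four datasets
z_per_dataset = [1.2, 2.0, 0.8, 1.6]
meta_z = stouffer(z_per_dataset)
p_two_sided = 2 * (1 - NormalDist().cdf(abs(meta_z)))
print(f"meta_z={meta_z:.2f}, p={p_two_sided:.4f}")
```

A common choice of weights is the square root of each dataset's sample size, so that larger cohorts contribute more. No single dataset here is individually decisive, yet the combined statistic is clearly significant, which is the appeal of the approach for gene prioritization.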
Affiliation(s)
- Yi Cui
- Yi Cui, Bailiang Li, and Ruijiang Li, Stanford University School of Medicine, Stanford, CA; Yi Cui, Global Institution for Collaborative Research and Education, Hokkaido University, Sapporo, Japan
- Bailiang Li
- Yi Cui, Bailiang Li, and Ruijiang Li, Stanford University School of Medicine, Stanford, CA; Yi Cui, Global Institution for Collaborative Research and Education, Hokkaido University, Sapporo, Japan
- Ruijiang Li
- Yi Cui, Bailiang Li, and Ruijiang Li, Stanford University School of Medicine, Stanford, CA; Yi Cui, Global Institution for Collaborative Research and Education, Hokkaido University, Sapporo, Japan

8
Reconstructing Genetic Regulatory Networks Using Two-Step Algorithms with the Differential Equation Models of Neural Networks. Interdiscip Sci 2017; 10:823-835. [PMID: 28748400 DOI: 10.1007/s12539-017-0254-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/25/2016] [Revised: 07/01/2017] [Accepted: 07/14/2017] [Indexed: 10/19/2022]
Abstract
BACKGROUND The identification of genetic regulatory networks (GRNs) provides insights into complex cellular processes. A class of recurrent neural networks (RNNs) captures the dynamics of GRN. Algorithms combining the RNN and machine learning schemes were proposed to reconstruct small-scale GRNs using gene expression time series. RESULTS We present new GRN reconstruction methods with neural networks. The RNN is extended to a class of recurrent multilayer perceptrons (RMLPs) with latent nodes. Our methods contain two steps: the edge rank assignment step and the network construction step. The former assigns ranks to all possible edges by a recursive procedure based on the estimated weights of wires of RNN/RMLP (RERNN/RERMLP), and the latter constructs a network consisting of top-ranked edges under which the optimized RNN simulates the gene expression time series. The particle swarm optimization (PSO) is applied to optimize the parameters of RNNs and RMLPs in a two-step algorithm. The proposed RERNN-RNN and RERMLP-RNN algorithms are tested on synthetic and experimental gene expression time series of small GRNs of about 10 genes. The experimental time series are from the studies of yeast cell cycle regulated genes and E. coli DNA repair genes. CONCLUSION The unstable estimation of RNN using experimental time series having limited data points can lead to fairly arbitrary predicted GRNs. Our methods incorporate RNN and RMLP into a two-step structure learning procedure. Results show that the RERMLP, which uses the RMLP with a suitable number of latent nodes to reduce the parameter dimension, often results in more accurate edge ranks than the RERNN using the regularized RNN on short simulated time series. When the networks derived by the RERMLP-RNN using different numbers of latent nodes in step one are combined by a weighted majority voting rule to infer the GRN, the method performs consistently and outperforms published algorithms for GRN reconstruction on most benchmark time series. The framework of two-step algorithms can potentially incorporate different nonlinear differential equation models to reconstruct the GRN.
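The network construction step, which keeps only top-ranked edges, can be sketched with a one-shot ranking by absolute weight. This is a deliberate simplification: the abstract describes a recursive rank assignment whose details are not given. The weight matrix, the index convention, and the function name are hypothetical.

```python
def top_ranked_edges(weights, k):
    """Rank directed edges (regulator j -> target i) by |weight| and keep
    the k strongest. Convention assumed here: weights[i][j] is the weight
    of the wire from gene j to gene i.
    Note: a one-shot ranking, not the recursive procedure of the paper."""
    n = len(weights)
    edges = [(abs(weights[i][j]), j, i) for i in range(n) for j in range(n)]
    edges.sort(key=lambda e: e[0], reverse=True)
    return [(src, dst) for _, src, dst in edges[:k]]

# Hypothetical estimated wire weights for 3 genes
W = [
    [0.05, -0.90, 0.10],
    [0.70, 0.00, 0.02],
    [0.01, 0.40, -0.03],
]
network = top_ranked_edges(W, k=3)
print(network)
```

Ranking by magnitude treats activation and repression alike; the sign of the retained weight can be read back from `W` if the edge type matters.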

9
Meder L, König K, Ozretić L, Schultheis AM, Ueckeroth F, Ade CP, Albus K, Boehm D, Rommerscheidt-Fuss U, Florin A, Buhl T, Hartmann W, Wolf J, Merkelbach-Bruse S, Eilers M, Perner S, Heukamp LC, Buettner R. NOTCH, ASCL1, p53 and RB alterations define an alternative pathway driving neuroendocrine and small cell lung carcinomas. Int J Cancer 2015; 138:927-38. [PMID: 26340530 PMCID: PMC4832386 DOI: 10.1002/ijc.29835] [Citation(s) in RCA: 125] [Impact Index Per Article: 13.9] [Received: 07/17/2015] [Accepted: 08/19/2015] [Indexed: 12/17/2022]
Abstract
Small cell lung cancers (SCLCs) and extrapulmonary small cell cancers (SCCs) are very aggressive tumors arising de novo as primary small cell cancer with characteristic genetic lesions in RB1 and TP53. Based on murine models, neuroendocrine stem cells of the terminal bronchioli have been postulated as the cellular origin of primary SCLC. However, both in lung and many other organs, combined small cell/non‐small cell tumors and secondary transitions from non‐small cell carcinomas upon cancer therapy to neuroendocrine and small cell tumors occur. We define features of “small cell‐ness” based on neuroendocrine markers, characteristic RB1 and TP53 mutations and small cell morphology. Furthermore, here we identify a pathway driving the pathogenesis of secondary SCLC involving inactivating NOTCH mutations, activation of the NOTCH target ASCL1 and canonical WNT‐signaling in the context of mutual bi‐allelic RB1 and TP53 lesions. Additionally, we explored ASCL1 dependent RB inactivation by phosphorylation, which is reversible by CDK5 inhibition. We experimentally verify the NOTCH‐ASCL1‐RB‐p53 signaling axis in vitro and validate its activation by genetic alterations in vivo. We analyzed clinical tumor samples including SCLC, SCC and pulmonary large cell neuroendocrine carcinomas and adenocarcinomas using amplicon‐based Next Generation Sequencing, immunohistochemistry and fluorescence in situ hybridization. In conclusion, we identified a novel pathway underlying rare secondary SCLC which may drive small cell carcinomas in organs other than lung, as well.
What's new? Using next generation sequencing and establishing features of ‘small cell‐ness’, we identified a NOTCH‐ASCL1‐RB1‐TP53 signaling axis driving small cell cancers. In contrast to the previously described bi‐allelic RB1/TP53 loss in neuroendocrine stem cells as origin of primary small cell neuroendocrine cancers, the NOTCH‐ASCL1 mediated signaling defines an alternative pathway driving secondary small cell neuroendocrine cancers arising from non‐small cell cancers. Moreover, we show a preclinical rationale for therapeutically testing WNT‐inhibitors in small cell cancers.
Affiliation(s)
- Lydia Meder
- Institute of Pathology, University Hospital Cologne, Kerpener Straße 62, Cologne, 50937, Germany.,Center for Integrated Oncology Cologne, University Hospital Cologne, Kerpener Straße 62, Cologne, 50937, Germany.,Center for Integrated Oncology Bonn, University Hospital Bonn, Sigmund-Freud Straße 25, 53105, Bonn, Germany.,Lung Cancer Group Cologne, University Hospital Cologne, Kerpener Straße 62, Cologne, 50937, Germany
- Katharina König
- Institute of Pathology, University Hospital Cologne, Kerpener Straße 62, Cologne, 50937, Germany.,Center for Integrated Oncology Cologne, University Hospital Cologne, Kerpener Straße 62, Cologne, 50937, Germany.,Center for Integrated Oncology Bonn, University Hospital Bonn, Sigmund-Freud Straße 25, 53105, Bonn, Germany.,Lung Cancer Group Cologne, University Hospital Cologne, Kerpener Straße 62, Cologne, 50937, Germany
- Luka Ozretić
- Institute of Pathology, University Hospital Cologne, Kerpener Straße 62, Cologne, 50937, Germany.,Center for Integrated Oncology Cologne, University Hospital Cologne, Kerpener Straße 62, Cologne, 50937, Germany.,Center for Integrated Oncology Bonn, University Hospital Bonn, Sigmund-Freud Straße 25, 53105, Bonn, Germany.,Lung Cancer Group Cologne, University Hospital Cologne, Kerpener Straße 62, Cologne, 50937, Germany
- Anne M Schultheis
- Institute of Pathology, University Hospital Cologne, Kerpener Straße 62, Cologne, 50937, Germany.,Center for Integrated Oncology Cologne, University Hospital Cologne, Kerpener Straße 62, Cologne, 50937, Germany.,Center for Integrated Oncology Bonn, University Hospital Bonn, Sigmund-Freud Straße 25, 53105, Bonn, Germany.,Lung Cancer Group Cologne, University Hospital Cologne, Kerpener Straße 62, Cologne, 50937, Germany
- Frank Ueckeroth
- Institute of Pathology, University Hospital Cologne, Kerpener Straße 62, Cologne, 50937, Germany.,Center for Integrated Oncology Cologne, University Hospital Cologne, Kerpener Straße 62, Cologne, 50937, Germany.,Center for Integrated Oncology Bonn, University Hospital Bonn, Sigmund-Freud Straße 25, 53105, Bonn, Germany.,Lung Cancer Group Cologne, University Hospital Cologne, Kerpener Straße 62, Cologne, 50937, Germany
- Carsten P Ade
- Biocenter, University of Würzburg, Am Hubland, Würzburg, 97074, Germany
- Kerstin Albus
- Institute of Pathology, University Hospital Cologne, Kerpener Straße 62, Cologne, 50937, Germany.,Center for Integrated Oncology Cologne, University Hospital Cologne, Kerpener Straße 62, Cologne, 50937, Germany.,Center for Integrated Oncology Bonn, University Hospital Bonn, Sigmund-Freud Straße 25, 53105, Bonn, Germany.,Lung Cancer Group Cologne, University Hospital Cologne, Kerpener Straße 62, Cologne, 50937, Germany
- Diana Boehm
- Center for Integrated Oncology Cologne, University Hospital Cologne, Kerpener Straße 62, Cologne, 50937, Germany.,Center for Integrated Oncology Bonn, University Hospital Bonn, Sigmund-Freud Straße 25, 53105, Bonn, Germany.,Department of Prostate Cancer Research, Institute of Pathology, University Hospital Bonn, Sigmund-Freud Straße 25, Bonn, 53105, Germany
- Ursula Rommerscheidt-Fuss
- Institute of Pathology, University Hospital Cologne, Kerpener Straße 62, Cologne, 50937, Germany.,Lung Cancer Group Cologne, University Hospital Cologne, Kerpener Straße 62, Cologne, 50937, Germany
- Alexandra Florin
- Institute of Pathology, University Hospital Cologne, Kerpener Straße 62, Cologne, 50937, Germany.,Lung Cancer Group Cologne, University Hospital Cologne, Kerpener Straße 62, Cologne, 50937, Germany
- Theresa Buhl
- Institute of Pathology, University Hospital Cologne, Kerpener Straße 62, Cologne, 50937, Germany.,Center for Integrated Oncology Cologne, University Hospital Cologne, Kerpener Straße 62, Cologne, 50937, Germany.,Center for Integrated Oncology Bonn, University Hospital Bonn, Sigmund-Freud Straße 25, 53105, Bonn, Germany.,Lung Cancer Group Cologne, University Hospital Cologne, Kerpener Straße 62, Cologne, 50937, Germany
- Wolfgang Hartmann
- Institute of Pathology, University Hospital Cologne, Kerpener Straße 62, Cologne, 50937, Germany
- Jürgen Wolf
- Center for Integrated Oncology Cologne, University Hospital Cologne, Kerpener Straße 62, Cologne, 50937, Germany.,Center for Integrated Oncology Bonn, University Hospital Bonn, Sigmund-Freud Straße 25, 53105, Bonn, Germany.,Lung Cancer Group Cologne, University Hospital Cologne, Kerpener Straße 62, Cologne, 50937, Germany.,Clinic for Internal Medicine I, University Hospital Cologne, Kerpener Straße 62, Cologne, 50937, Germany
- Sabine Merkelbach-Bruse
- Institute of Pathology, University Hospital Cologne, Kerpener Straße 62, Cologne, 50937, Germany.,Center for Integrated Oncology Cologne, University Hospital Cologne, Kerpener Straße 62, Cologne, 50937, Germany.,Center for Integrated Oncology Bonn, University Hospital Bonn, Sigmund-Freud Straße 25, 53105, Bonn, Germany.,Lung Cancer Group Cologne, University Hospital Cologne, Kerpener Straße 62, Cologne, 50937, Germany
- Martin Eilers
- Biocenter, University of Würzburg, Am Hubland, Würzburg, 97074, Germany
- Sven Perner
- Center for Integrated Oncology Cologne, University Hospital Cologne, Kerpener Straße 62, Cologne, 50937, Germany.,Center for Integrated Oncology Bonn, University Hospital Bonn, Sigmund-Freud Straße 25, 53105, Bonn, Germany.,Biocenter, University of Würzburg, Am Hubland, Würzburg, 97074, Germany
- Lukas C Heukamp
- Institute of Pathology, University Hospital Cologne, Kerpener Straße 62, Cologne, 50937, Germany.,Center for Integrated Oncology Cologne, University Hospital Cologne, Kerpener Straße 62, Cologne, 50937, Germany.,Center for Integrated Oncology Bonn, University Hospital Bonn, Sigmund-Freud Straße 25, 53105, Bonn, Germany.,Lung Cancer Group Cologne, University Hospital Cologne, Kerpener Straße 62, Cologne, 50937, Germany
| | - Reinhard Buettner
- Institute of Pathology, University Hospital Cologne, Kerpener Straße 62, Cologne, 50937, Germany.,Center for Integrated Oncology Cologne, University Hospital Cologne, Kerpener Straße 62, Cologne, 50937, Germany.,Center for Integrated Oncology Bonn, University Hospital Bonn, Sigmund-Freud Straße 25, 53105, Bonn, Germany.,Lung Cancer Group Cologne, University Hospital Cologne, Kerpener Straße 62, Cologne, 50937, Germany
| |
Collapse
|
10
|
Attallah O, Ma X. Bayesian neural network approach for determining the risk of re-intervention after endovascular aortic aneurysm repair. Proc Inst Mech Eng H 2014; 228:857-66. [PMID: 25212212 DOI: 10.1177/0954411914549980] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Indexed: 11/17/2022]
Abstract
This article proposes a Bayesian neural network approach to determine the risk of re-intervention after endovascular aortic aneurysm repair surgery. The aim of the proposed technique is to determine which patients have a high chance of re-intervention (high-risk patients) and which do not (low-risk patients) 5 years after surgery. Two censored datasets relating to the clinical conditions of aortic aneurysms were collected from two different vascular centers in the United Kingdom. A Bayesian network was first employed to solve the censoring issue in the datasets. Then, a back-propagation neural network model was built using the uncensored data of the first center to predict re-intervention at the second center and classify the patients into high-risk and low-risk groups. Kaplan-Meier curves were plotted for each group of patients separately to show whether there is a significant difference between the two risk groups. Finally, the logrank test was applied to determine whether the neural network model was capable of predicting and distinguishing between the two risk groups. The results show that the Bayesian network used for uncensoring the data improved the performance of the neural networks that were built for the two centers separately. More importantly, the neural network that was trained with uncensored data of the first center was able to predict and discriminate between groups at low risk and high risk of re-intervention 5 years after endovascular aortic aneurysm surgery at center 2 (p = 0.0037 in the logrank test).
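The Kaplan-Meier curves mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function name `kaplan_meier` and the toy data are assumptions made for illustration, and the sketch only shows the standard product-limit estimator that would be applied to each risk group separately.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  : follow-up time for each patient
    events : 1 if the event (e.g. re-intervention) was observed, 0 if censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Multiply by the conditional probability of surviving past t.
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        # Both events and censored patients leave the risk set after t.
        n_at_risk -= removed
    return curve


# Hypothetical toy data: four patients, the second one censored at t = 2.
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```

Censored patients reduce the risk set without lowering the curve, which is why the estimator can use censored records directly.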
Affiliation(s)
- Omneya Attallah
- Department of Electronics and Communications Engineering, Arab Academy for Science, Technology & Maritime Transport, Alexandria, Egypt; School of Engineering and Applied Science, Aston University, Birmingham, UK
- Xianghong Ma
- School of Engineering and Applied Science, Aston University, Birmingham, UK
|
11
|
Kiani NA, Kaderali L. Dynamic probabilistic threshold networks to infer signaling pathways from time-course perturbation data. BMC Bioinformatics 2014; 15:250. [PMID: 25047753 PMCID: PMC4133630 DOI: 10.1186/1471-2105-15-250] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 07/09/2013] [Accepted: 07/15/2014] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Network inference deals with the reconstruction of molecular networks from experimental data. Given N molecular species, the challenge is to find the underlying network. Due to data limitations, this typically is an ill-posed problem, and requires the integration of prior biological knowledge or strong regularization. We here focus on the situation when time-resolved measurements of a system's response after systematic perturbations are available. RESULTS We present a novel method to infer signaling networks from time-course perturbation data. We utilize dynamic Bayesian networks with probabilistic Boolean threshold functions to describe protein activation. The model posterior distribution is analyzed using evolutionary MCMC sampling and subsequent clustering, resulting in probability distributions over alternative networks. We evaluate our method on simulated data, and study its performance with respect to data set size and levels of noise. We then use our method to study EGF-mediated signaling in the ERBB pathway. CONCLUSIONS Dynamic Probabilistic Threshold Networks is a new method to infer signaling networks from time-series perturbation data. It exploits the dynamic response of a system after external perturbation for network reconstruction. On simulated data, we show that the approach outperforms current state of the art methods. On the ERBB data, our approach recovers a significant fraction of the known interactions, and predicts novel mechanisms in the ERBB pathway.
Affiliation(s)
- Narsis A Kiani
- Technische Universität Dresden, Medical Faculty Carl Gustav Carus, Institute for Medical Informatics and Biometry, Fetscherstr. 74, 01307 Dresden, Germany
|
12
|
Boosting the concordance index for survival data--a unified framework to derive and evaluate biomarker combinations. PLoS One 2014; 9:e84483. [PMID: 24400093 PMCID: PMC3882229 DOI: 10.1371/journal.pone.0084483] [Citation(s) in RCA: 60] [Impact Index Per Article: 6.0] [Received: 07/19/2013] [Accepted: 11/14/2013] [Indexed: 11/30/2022] Open
Abstract
The development of molecular signatures for the prediction of time-to-event outcomes is a methodologically challenging task in bioinformatics and biostatistics. Although there are numerous approaches for the derivation of marker combinations and their evaluation, the underlying methodology often suffers from the problem that different optimization criteria are mixed during the feature selection, estimation and evaluation steps. This might result in marker combinations that are suboptimal regarding the evaluation criterion of interest. To address this issue, we propose a unified framework to derive and evaluate biomarker combinations. Our approach is based on the concordance index for time-to-event data, which is a non-parametric measure to quantify the discriminatory power of a prediction rule. Specifically, we propose a gradient boosting algorithm that results in linear biomarker combinations that are optimal with respect to a smoothed version of the concordance index. We investigate the performance of our algorithm in a large-scale simulation study and in two molecular data sets for the prediction of survival in breast cancer patients. Our numerical results show that the new approach is not only methodologically sound but can also lead to a higher discriminatory power than traditional approaches for the derivation of gene signatures.
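As a rough illustration of the concordance index that this abstract builds on, the sketch below computes a plain Harrell-style C for right-censored data. It is an assumption-laden toy: the function name `concordance_index` and the data are hypothetical, and it shows the unsmoothed index only, not the smoothed version that the authors optimize by gradient boosting.

```python
def concordance_index(times, events, scores):
    """Fraction of comparable patient pairs that the scores order correctly.

    times  : observed follow-up times
    events : 1 if the event was observed, 0 if censored
    scores : predicted risk; a higher score should mean a shorter survival
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5  # ties in the score count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: the third patient is censored at t = 6.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
c_good = concordance_index(times, events, [0.9, 0.7, 0.2, 0.4])
c_poor = concordance_index(times, events, [0.1, 0.7, 0.2, 0.4])
```

A value of 1 means every comparable pair is ranked correctly and 0.5 corresponds to random ranking, so the index measures discrimination rather than calibration.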
|
13
|
Abstract
Recent developments in molecular biology have led to the massive discovery of new marker candidates for the prediction of patient survival. To evaluate the predictive value of these markers, statistical tools for measuring the performance of survival models are needed. We consider estimators of discrimination measures, which are a popular approach to evaluate survival predictions in biomarker studies. Estimators of discrimination measures are usually based on regularity assumptions such as the proportional hazards assumption. Based on two sets of molecular data and a simulation study, we show that violations of the regularity assumptions may lead to over-optimistic estimates of prediction accuracy and may therefore result in biased conclusions regarding the clinical utility of new biomarkers. In particular, we demonstrate that biased medical decision making is possible even if statistical checks indicate that all regularity assumptions are satisfied.
|
14
|
Kim J, Sohn I, Son DS, Kim DH, Ahn T, Jung SH. Prediction of a time-to-event trait using genome wide SNP data. BMC Bioinformatics 2013; 14:58. [PMID: 23418752 PMCID: PMC3651372 DOI: 10.1186/1471-2105-14-58] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 10/30/2012] [Accepted: 02/12/2013] [Indexed: 02/07/2023] Open
Abstract
Background A popular objective of many high-throughput genome projects is to discover various genomic markers associated with traits and develop statistical models to predict traits of future patients based on marker values. Results In this paper, we present a prediction method for time-to-event traits using genome-wide single-nucleotide polymorphisms (SNPs). We also propose a MaxTest associating between a time-to-event trait and a SNP accounting for its possible genetic models. The proposed MaxTest can help screen out nonprognostic SNPs and identify genetic models of prognostic SNPs. The performance of the proposed method is evaluated through simulations. Conclusions In conjunction with the MaxTest, the proposed method provides more parsimonious prediction models but includes more prognostic SNPs than some naive prediction methods. The proposed method is demonstrated with real GWAS data.
Affiliation(s)
- Jinseog Kim
- Department of Statistics and Information Science, Dongguk University, Gyeongju 780-714, Korea
|
15
|
Schulte JH, Schowe B, Mestdagh P, Kaderali L, Kalaghatgi P, Schlierf S, Vermeulen J, Brockmeyer B, Pajtler K, Thor T, de Preter K, Speleman F, Morik K, Eggert A, Vandesompele J, Schramm A. Accurate prediction of neuroblastoma outcome based on miRNA expression profiles. Int J Cancer 2010; 127:2374-85. [PMID: 20473924 DOI: 10.1002/ijc.25436] [Citation(s) in RCA: 83] [Impact Index Per Article: 5.9] [Indexed: 01/07/2023]
Abstract
For neuroblastoma, the most common extracranial tumour of childhood, identification of new biomarkers and potential therapeutic targets is mandatory to improve risk stratification and survival rates. MicroRNAs are deregulated in most cancers, including neuroblastoma. In this study, we analysed 430 miRNAs in 69 neuroblastomas by stem-loop RT-qPCR. Prediction of event-free survival (EFS) with support vector machines (SVM) and actual survival times with Cox regression-based models (CASPAR) were highly accurate and were independently validated. SVM-accuracy for prediction of EFS was 88.7% (95% CI: 88.5-88.8%). For CASPAR-based predictions, 5y-EFS probability was 19% (95% CI: 0-38%) in the CASPAR-predicted short survival group compared with 78% (95% CI: 64-93%) in the CASPAR-predicted long survival group. Both classifiers were validated on an independent test set yielding accuracies of 94.74% (SVM) and 5y-EFS probabilities of 0.25 (95% CI: 0.0-0.55) for short versus 1 ± 0.0 for long survival (CASPAR), respectively. Amplification of the MYCN oncogene was highly correlated with deregulation of miRNA expression. In addition, 37 miRNAs correlated with TrkA expression, a marker of excellent outcome, and 6 miRNAs further analysed in vitro were regulated upon TrkA transfection, suggesting a functional relationship. Expression of the most significant TrkA-correlated miRNA, miR-542-5p, also discriminated between local and metastatic disease and was inversely correlated with MYCN amplification and event-free survival. We conclude that neuroblastoma patient outcome prediction using miRNA expression is feasible and effective. Studies testing miRNA-based predictors in comparison to and in combination with mRNA and aCGH information should be initiated. Specific miRNAs (e.g., miR-542-5p) might be important in neuroblastoma tumour biology, and qualify as potential therapeutic targets.
Affiliation(s)
- Johannes H Schulte
- University Children's Hospital Essen, Hufelandstr 55, 45122 Essen, Germany
|
16
|
Mazur J, Ritter D, Reinelt G, Kaderali L. Reconstructing nonlinear dynamic models of gene regulation using stochastic sampling. BMC Bioinformatics 2009; 10:448. [PMID: 20038296 PMCID: PMC2811124 DOI: 10.1186/1471-2105-10-448] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.3] [Received: 03/24/2009] [Accepted: 12/28/2009] [Indexed: 12/01/2022] Open
Abstract
Background The reconstruction of gene regulatory networks from time series gene expression data is one of the most difficult problems in systems biology. This is due to several reasons, among them the combinatorial explosion of possible network topologies, limited information content of the experimental data with high levels of noise, and the complexity of gene regulation at the transcriptional, translational and post-translational levels. At the same time, quantitative, dynamic models, ideally with probability distributions over model topologies and parameters, are highly desirable. Results We present a novel approach to infer such models from data, based on nonlinear differential equations, which we embed into a stochastic Bayesian framework. We thus address both the stochasticity of experimental data and the need for quantitative dynamic models. Furthermore, the Bayesian framework makes it easy to integrate prior knowledge into the inference process. Using stochastic sampling from the Bayes' posterior distribution, our approach can infer different likely network topologies and model parameters along with their respective probabilities from given data. We evaluate our approach on simulated data and the challenge #3 data from the DREAM 2 initiative. On the simulated data, we study effects of different levels of noise and dataset sizes. Results on real data show that the dynamics and main regulatory interactions are correctly reconstructed. Conclusions Our approach combines dynamic modeling using differential equations with a stochastic learning framework, thus bridging the gap between biophysical modeling and stochastic inference approaches. Results show that the method can reap the advantages of both worlds, and allows the reconstruction of biophysically accurate dynamic models from noisy data. In addition, the stochastic learning framework used permits the computation of probability distributions over models and model parameters, which holds interesting prospects for experimental design purposes.
Affiliation(s)
- Johanna Mazur
- Viroquant Research Group Modeling, University of Heidelberg, Bioquant BQ26, INF 267, D-69120 Heidelberg, Germany.
|
17
|
Pang H, Datta D, Zhao H. Pathway analysis using random forests with bivariate node-split for survival outcomes. Bioinformatics 2009; 26:250-8. [PMID: 19933158 DOI: 10.1093/bioinformatics/btp640] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.3] [Indexed: 01/31/2023]
Abstract
MOTIVATION There is great interest in pathway-based methods for genomics data analysis in the research community. Although machine learning methods, such as random forests, have been developed to correlate survival outcomes with a set of genes, no study has assessed the abilities of these methods in incorporating pathway information for analyzing microarray data. In general, genes that are identified without incorporating biological knowledge are more difficult to interpret. Correlating pathway-based gene expression with survival outcomes may lead to biologically more meaningful prognosis biomarkers. Thus, a comprehensive study on how these methods perform in a pathway-based setting is warranted. RESULTS In this article, we describe a pathway-based method using random forests to correlate gene expression data with survival outcomes and introduce a novel bivariate node-splitting random survival forests. The proposed method allows researchers to identify important pathways for predicting patient prognosis and time to disease progression, and discover important genes within those pathways. We compared different implementations of random forests with different split criteria and found that bivariate node-splitting random survival forests with log-rank test is among the best. We also performed simulation studies that showed random forests outperforms several other machine learning algorithms and has comparable results with a newly developed component-wise Cox boosting model. Thus, pathway-based survival analysis using machine learning tools represents a promising approach in dissecting pathways and for generating new biological hypothesis from microarray studies. AVAILABILITY R package Pwayrfsurvival is available from URL: http://www.duke.edu/~hp44/pwayrfsurvival.htm. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Herbert Pang
- Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC 27710, USA.
|
18
|
Oberthuer A, Theissen J, Westermann F, Hero B, Fischer M. Molecular characterization and classification of neuroblastoma. Future Oncol 2009; 5:625-39. [DOI: 10.2217/fon.09.41] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Indexed: 02/07/2023] Open
Abstract
For many decades, neuroblastoma has remained a challenging disease for both clinicians and researchers. Now, techniques that efficiently specify both comprehensive genetic and gene-expression alterations of neuroblastoma tumors have provided molecular markers that indicate tumor behavior and patient outcome with very high accuracy. Once the anticipated value of these markers has been confirmed in ongoing studies, patients may profit from more accurate risk assessment by integrating these markers into clinical routine. Moreover, disclosing further tumor-initiating events, such as the recently revealed oncogenic mutations of ALK, will further promote the elucidation of the genetic etiology of the disease. Together with recent information on altered signaling pathways in aggressively growing tumors, this knowledge will help to establish therapeutic strategies specifically targeting molecular key factors of neuroblastoma tumor progression.
Affiliation(s)
- André Oberthuer
- University Children’s Hospital, Department of Pediatric Oncology, Kerpener Strasse 62, 50924 Cologne, Germany
- Jessica Theissen
- University of Cologne, Children’s Hospital, Department of Pediatric Oncology, Kerpener Strasse 62, 50924 Cologne, Germany
- Frank Westermann
- Department of Tumor Genetics, German Cancer Research Center, Im Neuenheimer Feld 280, 69120 Heidelberg, Germany
- Barbara Hero
- University of Cologne, Children’s Hospital, Department of Pediatric Oncology, Kerpener Strasse 62, 50924 Cologne, Germany
- Matthias Fischer
- University of Cologne, Children’s Hospital, Department of Pediatric Oncology, Kerpener Strasse 62, 50924 Cologne, Germany
|
19
|
Abstract
MOTIVATION There has been an increasing interest in expressing a survival phenotype (e.g. time to cancer recurrence or death) or its distribution in terms of a subset of the expression data of a subset of genes. Due to high dimensionality of gene expression data, however, there is a serious problem of collinearity in fitting a prediction model, e.g. Cox's proportional hazards model. To avoid the collinearity problem, several methods based on penalized Cox proportional hazards models have been proposed. However, those methods suffer from severe computational problems, such as slow or even failed convergence, because of high-dimensional matrix inversions required for model fitting. We propose to implement the penalized Cox regression with a lasso penalty via the gradient lasso algorithm that yields faster convergence to the global optimum than do other algorithms. Moreover the gradient lasso algorithm is guaranteed to converge to the optimum under mild regularity conditions. Hence, our gradient lasso algorithm can be a useful tool in developing a prediction model based on high-dimensional covariates including gene expression data. RESULTS Results from simulation studies showed that the prediction model by gradient lasso recovers the prognostic genes. Also results from diffuse large B-cell lymphoma datasets and Norway/Stanford breast cancer dataset indicate that our method is very competitive compared with popular existing methods by Park and Hastie and Goeman in its computational time, prediction and selectivity. AVAILABILITY R package glcoxph is available at http://datamining.dongguk.ac.kr/R/glcoxph.
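For orientation, the lasso-penalized Cox regression described in this abstract can be written in its usual textbook form. The notation below is an assumption for illustration and is not taken from the paper: $x_i$ is the covariate vector of patient $i$, $\delta_i$ the event indicator, $t_i$ the observed time, $R(t_i)$ the set of patients still at risk at $t_i$, and $\lambda$ the bound on the coefficient norm; ties between event times are ignored.

```latex
\ell(\beta) \;=\; \sum_{i:\,\delta_i = 1}
  \Bigl[\, x_i^{\top}\beta \;-\; \log \sum_{j \in R(t_i)} \exp\bigl(x_j^{\top}\beta\bigr) \Bigr],
\qquad
\hat{\beta} \;=\; \arg\max_{\beta}\; \ell(\beta)
\quad \text{subject to} \quad \sum_{k} \lvert \beta_k \rvert \;\le\; \lambda .
```

Here $\ell(\beta)$ is the Cox partial log-likelihood. The $\ell_1$ constraint drives many coefficients exactly to zero, which is what makes the fitted model usable as a gene selector in the high-dimensional setting the abstract describes.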
Affiliation(s)
- Insuk Sohn
- Department of Biostatistics & Bioinformatics, Duke University, NC 27705, USA
|
20
|
Reanalysis of neuroblastoma expression profiling data using improved methodology and extended follow-up increases validity of outcome prediction. Cancer Lett 2009; 282:55-62. [PMID: 19349112 DOI: 10.1016/j.canlet.2009.02.052] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 12/20/2008] [Revised: 02/25/2009] [Accepted: 02/26/2009] [Indexed: 11/20/2022]
Abstract
Neuroblastoma is the most common extracranial childhood tumor, comprising 15% of all childhood cancer deaths. In an initial study, we used Affymetrix oligonucleotide microarrays to analyse gene expression in 68 primary neuroblastomas and compared different data mining approaches for prediction of early relapse. Here, we performed re-analyses of the data including prolonged follow-up and applied support vector machine (SVM) algorithms and outer cross-validation strategies to improve reliability of expression profiling based predictors. Accuracy of outcome prediction was significantly improved by the use of innovative SVM algorithms on the updated data. In addition, CASPAR, a hierarchical Bayesian approach, was used to predict survival times for the individual patient based on expression profiling data. CASPAR reliably predicted event-free survival, given a cut-off time of three years. Differential expression of genes used by CASPAR to predict patient outcome was validated in an independent cohort of 117 neuroblastomas. In conclusion, we show here for the first time that reanalysis of microarray data using improved methodology, state-of-the-art performance tests and updated follow-up data improves prognosis prediction, and may further improve risk stratification of individual patients.
|
21
|
van Wieringen WN, Kun D, Hampel R, Boulesteix AL. Survival prediction using gene expression data: A review and comparison. Comput Stat Data Anal 2009. [DOI: 10.1016/j.csda.2008.05.021] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.0] [Indexed: 10/22/2022]
|
22
|
Annest A, Bumgarner RE, Raftery AE, Yeung KY. Iterative Bayesian Model Averaging: a method for the application of survival analysis to high-dimensional microarray data. BMC Bioinformatics 2009; 10:72. [PMID: 19245714 PMCID: PMC2657791 DOI: 10.1186/1471-2105-10-72] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.4] [Received: 11/21/2008] [Accepted: 02/26/2009] [Indexed: 11/17/2022] Open
Abstract
BACKGROUND Microarray technology is increasingly used to identify potential biomarkers for cancer prognostics and diagnostics. Previously, we have developed the iterative Bayesian Model Averaging (BMA) algorithm for use in classification. Here, we extend the iterative BMA algorithm for application to survival analysis on high-dimensional microarray data. The main goal in applying survival analysis to microarray data is to determine a highly predictive model of patients' time to event (such as death, relapse, or metastasis) using a small number of selected genes. Our multivariate procedure combines the effectiveness of multiple contending models by calculating the weighted average of their posterior probability distributions. Our results demonstrate that our iterative BMA algorithm for survival analysis achieves high prediction accuracy while consistently selecting a small and cost-effective number of predictor genes. RESULTS We applied the iterative BMA algorithm to two cancer datasets: breast cancer and diffuse large B-cell lymphoma (DLBCL) data. On the breast cancer data, the algorithm selected a total of 15 predictor genes across 84 contending models from the training data. The maximum likelihood estimates of the selected genes and the posterior probabilities of the selected models from the training data were used to divide patients in the test (or validation) dataset into high- and low-risk categories. Using the genes and models determined from the training data, we assigned patients from the test data into highly distinct risk groups (as indicated by a p-value of 7.26e-05 from the log-rank test). Moreover, we achieved comparable results using only the 5 top selected genes with 100% posterior probabilities. On the DLBCL data, our iterative BMA procedure selected a total of 25 genes across 3 contending models from the training data. Once again, we assigned the patients in the validation set to significantly distinct risk groups (p-value = 0.00139). 
CONCLUSION The strength of the iterative BMA algorithm for survival analysis lies in its ability to account for model uncertainty. The results from this study demonstrate that our procedure selects a small number of genes while eclipsing other methods in predictive performance, making it a highly accurate and cost-effective prognostic tool in the clinical setting.
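The model-averaging step that this abstract describes can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the iterative BMA procedure itself: the function name `bma_risk_score`, the per-model risk scores, and the posterior probabilities are all hypothetical, and gene selection and model fitting are assumed to have happened already.

```python
def bma_risk_score(model_scores, posterior_probs):
    """Average per-model risk scores, weighted by posterior model probability.

    model_scores    : one patient's risk score under each contending model
    posterior_probs : posterior probability of each model (need not sum to 1)
    """
    if len(model_scores) != len(posterior_probs):
        raise ValueError("one posterior probability is needed per model")
    total = sum(posterior_probs)
    if total <= 0:
        raise ValueError("posterior probabilities must have a positive sum")
    # Renormalise so the weights sum to 1 over the models that were kept.
    weights = [p / total for p in posterior_probs]
    return sum(w * s for w, s in zip(weights, model_scores))


# Hypothetical example: two contending models for a single patient.
score = bma_risk_score([1.0, 3.0], [0.75, 0.25])
```

Patients could then be split into high- and low-risk groups by thresholding the averaged score, with the threshold chosen on the training data only.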
Affiliation(s)
- Amalia Annest
- Institute of Technology/Computing and Software Systems, Box 358426, University of Washington, Tacoma, WA 98402, USA
- Roger E Bumgarner
- Department of Microbiology, Box 358070, University of Washington, Seattle, WA 98195, USA
- Adrian E Raftery
- Department of Statistics, Box 354320, University of Washington, Seattle, WA 98195, USA
- Ka Yee Yeung
- Department of Microbiology, Box 358070, University of Washington, Seattle, WA 98195, USA
|
23
|
Lee ES, Son DS, Kim SH, Lee J, Jo J, Han J, Kim H, Lee HJ, Choi HY, Jung Y, Park M, Lim YS, Kim K, Shim Y, Kim BC, Lee K, Huh N, Ko C, Park K, Lee JW, Choi YS, Kim J. Prediction of recurrence-free survival in postoperative non-small cell lung cancer patients by using an integrated model of clinical information and gene expression. Clin Cancer Res 2009; 14:7397-404. [PMID: 19010856 DOI: 10.1158/1078-0432.ccr-07-4937] [Citation(s) in RCA: 205] [Impact Index Per Article: 13.7] [Indexed: 11/16/2022]
Abstract
PURPOSE One of the main challenges of lung cancer research is identifying patients at high risk for recurrence after surgical resection. Simple, accurate, and reproducible methods of evaluating individual risks of recurrence are needed. EXPERIMENTAL DESIGN Based on a combined analysis of time-to-recurrence data, censoring information, and microarray data from a set of 138 patients, we selected statistically significant genes thought to be predictive of disease recurrence. The number of genes was further reduced by eliminating those whose expression levels were not reproducible by real-time quantitative PCR. Within these variables, a recurrence prediction model was constructed using Cox proportional hazard regression and validated via two independent cohorts (n = 56 and n = 59). RESULTS After performing a log-rank test of the microarray data and successively selecting genes based on real-time quantitative PCR analysis, the most significant 18 genes had P values of <0.05. After subsequent stepwise variable selection based on gene expression information and clinical variables, the recurrence prediction model consisted of six genes (CALB1, MMP7, SLC1A7, GSTA1, CCL19, and IFI44). Two pathologic variables, pStage and cellular differentiation, were developed. Validation by two independent cohorts confirmed that the proposed model is significantly accurate (P = 0.0314 and 0.0305, respectively). The predicted median recurrence-free survival times for each patient correlated well with the actual data. CONCLUSIONS We have developed an accurate, technically simple, and reproducible method for predicting individual recurrence risks. This model would potentially be useful in developing customized strategies for managing lung cancer.
Affiliation(s)
- Eung-Sirk Lee
- Cancer Research Center, Center for Clinical Research, Samsung Biomedical Research Institute, Seoul, South Korea
|
24
|
Oberthuer A, Kaderali L, Kahlert Y, Hero B, Westermann F, Berthold F, Brors B, Eils R, Fischer M. Subclassification and individual survival time prediction from gene expression data of neuroblastoma patients by using CASPAR. Clin Cancer Res 2008; 14:6590-601. [PMID: 18927300 DOI: 10.1158/1078-0432.ccr-07-4377] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Indexed: 11/16/2022]
Abstract
PURPOSE To predict individual survival times for neuroblastoma patients from gene expression data using the cancer survival prediction using automatic relevance determination (CASPAR) algorithm. EXPERIMENTAL DESIGN A first set of oligonucleotide microarray gene expression profiles comprising 256 neuroblastoma patients was generated. Then, CASPAR was combined with a leave-one-out cross-validation to predict individual times for both the whole cohort and subgroups of patients with unfavorable markers, including stage 4 disease (n = 67), unfavorable genetic alterations, intermediate-risk or high-risk stratification by the German neuroblastoma trial, and patients predicted as unfavorable by a recently described gene expression classifier (n = 83). Prediction accuracy of individual survival times was assessed by Kaplan-Meier analyses and time-dependent receiver operator characteristics curve analyses. Subsequently, classification results were validated in an independent cohort (n = 120). RESULTS CASPAR separated patients with divergent outcome in both the initial and the validation cohort [initial set, 5y-OS 0.94 +/- 0.04 (predicted long survival) versus 0.38 +/- 0.17 (predicted short survival), P < 0.0001; validation cohort, 5y-OS 0.94 +/- 0.07 (long) versus 0.40 +/- 0.13 (short), P < 0.0001]. Time-dependent receiver operator characteristics analyses showed that CASPAR-predicted individual survival times were highly accurate (initial set, mean area under the curve for first 10 years of overall survival prediction 0.92 +/- 0.04; validation set, 0.81 +/- 0.05). Furthermore, CASPAR significantly discriminated short (<5 years) from long survivors (>5 years) in subgroups of patients with unfavorable markers with the exception of MYCN-amplified patients (initial set). 
Confirmatory results with high significance were observed in the validation cohort [stage 4 disease (P = 0.0049), NB2004 intermediate-risk or high-risk stratification (P = 0.0017), and unfavorable gene expression prediction (P = 0.0017)]. CONCLUSIONS CASPAR accurately forecasts individual survival times for neuroblastoma patients from gene expression data.
Affiliation(s)
- André Oberthuer
- Department of Pediatric Oncology and Hematology, University of Cologne, Cologne, Germany.
|
25
|
Abstract
Gastric cancer has traditionally been staged using purely histological methods, but these methods provide little information about the biology of gastric cancer and have limited predictive power. Recent studies have shown that clinically relevant gastric cancer subtypes have distinct gene expression profiles. This approach, termed molecular staging, can lead to the discovery of novel diagnostic and prognostic biomarkers of gastric cancers. This update reviews advances in molecular staging of gastric cancer and discusses their implications for the prognosis and diagnosis of this complex disease. Technologies used in molecular staging as well as future directions for the optimization of molecular staging of gastric cancer are also discussed.
Affiliation(s)
- Yan Jie Zhang
- Shanghai Institute of Digestive Disease, Shanghai Jiaotong University School of Medicine Renji Hospital, Shanghai, China
|
26
|
Diaz-Uriarte R. SignS: a parallelized, open-source, freely available, web-based tool for gene selection and molecular signatures for survival and censored data. BMC Bioinformatics 2008; 9:30. [PMID: 18208605 PMCID: PMC2265264 DOI: 10.1186/1471-2105-9-30] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 09/13/2007] [Accepted: 01/21/2008] [Indexed: 11/17/2022]
Abstract
Background Censored data are increasingly common in many microarray studies that attempt to relate gene expression to patient survival. Several new methods have been proposed in the last two years. Most of these methods, however, are not available to biomedical researchers, leading to many re-implementations from scratch of ad-hoc, and suboptimal, approaches with survival data. Results We have developed SignS (Signatures for Survival data), an open-source, freely-available, web-based tool and R package for gene selection, building molecular signatures, and prediction with survival data. SignS implements four methods which, according to existing reviews, perform well and, by being of a very different nature, offer complementary approaches. We use parallel computing via MPI, leading to large decreases in user waiting time. Cross-validation is used to assess predictive performance and stability of solutions, the latter an issue of increasing concern given that there are often several solutions with similar predictive performance. Biological interpretation of results is enhanced because genes and signatures in models can be sent to other freely-available on-line tools for examination of PubMed references, GO terms, and KEGG and Reactome pathways of selected genes. Conclusion SignS is the first web-based tool for survival analysis of expression data, and one of the very few with biomedical researchers as target users. SignS is also one of the few bioinformatics web-based applications to extensively use parallelization, including fault tolerance and crash recovery. Because of its combination of methods implemented, usage of parallel computing, code availability, and links to additional databases, SignS is a unique tool, and will be of immediate relevance to biomedical researchers, biostatisticians and bioinformaticians.
Affiliation(s)
- Ramon Diaz-Uriarte
- Statistical Computing Team, Structural Biology and Biocomputing Programme, Spanish National Cancer Center (CNIO), Melchor Fernández Almagro 3, Madrid, 28029, Spain.
|
27
|
Inferring Gene Regulatory Networks from Expression Data. COMPUTATIONAL INTELLIGENCE IN BIOINFORMATICS 2008. [DOI: 10.1007/978-3-540-76803-6_2] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Indexed: 12/03/2022]
|
28
|
Schramm A, Vandesompele J, Schulte JH, Dreesmann S, Kaderali L, Brors B, Eils R, Speleman F, Eggert A. Translating expression profiling into a clinically feasible test to predict neuroblastoma outcome. Clin Cancer Res 2007; 13:1459-65. [PMID: 17332289 DOI: 10.1158/1078-0432.ccr-06-2032] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.3] [Indexed: 11/16/2022]
Abstract
PURPOSE To assess the feasibility of predicting neuroblastoma outcome using highly parallel quantitative real-time PCR data. EXPERIMENTAL DESIGN We generated expression profiles of 63 neuroblastoma patients, 47 of which were analyzed by both Affymetrix U95A microarrays and highly parallel real-time PCR on microfluidic cards (MFC; Applied Biosystems). Top-ranked genes discriminating patients with event-free survival or relapse according to high-level analysis of Affymetrix chip data, as well as known neuroblastoma marker genes (MYCN and NTRK1/TrkA), were quantified simultaneously by real-time PCR. Analysis of PCR data was accomplished using high-level bioinformatics methods including prediction analysis of microarray, significance analysis of microarray, and Computerized Affected Sibling Pair Analyzer and Reporter. RESULTS Internal validation of the MFC method proved it highly reproducible. Correlation of MFC and chip expression data varied markedly for some genes. Outcome prediction using prediction analysis of microarray on real-time PCR data resulted in 80% accuracy, which is comparable to results obtained using the Affymetrix platform. Real-time PCR data were useful for risk assessment of relapsing neuroblastoma (P = 0.0006, log-rank test) when Computerized Affected Sibling Pair Analyzer and Reporter analysis was applied. CONCLUSIONS These data suggest that multiplex real-time PCR might be a promising approach to reduce the complexity of information obtained from whole-genome array experiments. It could provide a more convenient and less expensive tool for routine application in a clinical setting.
Affiliation(s)
- Alexander Schramm
- Division of Hematology and Oncology, University Children's Hospital Essen, Essen, Germany
|