1. Wang W, Liu W. Integration of gene interaction information into a reweighted Lasso-Cox model for accurate survival prediction. Bioinformatics 2020; 36:5405-5414. [PMID: 33325490 DOI: 10.1093/bioinformatics/btaa1046] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 08/20/2020] [Revised: 11/13/2020] [Accepted: 12/07/2020] [Indexed: 11/14/2022]
Abstract
Motivation
Accurately predicting the risk of cancer patients is a central challenge in clinical cancer research. For high-dimensional gene expression data, the Cox proportional hazards model with the least absolute shrinkage and selection operator for variable selection (Lasso-Cox) is one of the most popular feature selection and risk prediction algorithms. However, the Lasso-Cox model treats all genes equally, ignoring the biological characteristics of the genes themselves. As a result, it often shows poor prognostic performance on independent datasets.
Results
Here, we propose a Reweighted Lasso-Cox (RLasso-Cox) model to ameliorate this problem by integrating gene interaction information. It is based on the hypothesis that topologically important genes in the gene interaction network tend to have stable expression changes. We used random walk to evaluate the topological weight of genes, and then highlighted topologically important genes to improve the generalization ability of the RLasso-Cox model. Experiments on datasets of three cancer types showed that the RLasso-Cox model improves the prognostic accuracy and robustness compared with the Lasso-Cox model and several existing network-based methods. More importantly, the RLasso-Cox model has the advantage of identifying small gene sets with high prognostic performance on independent datasets, which may play an important role in identifying robust survival biomarkers for various cancer types.
Availability and implementation
http://bioconductor.org/packages/devel/bioc/html/RLassoCox.html
Supplementary information
Supplementary data are available at Bioinformatics online.
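The abstract above says that a random walk is used to score the topological importance of genes, and that important genes are then highlighted in the Lasso-Cox fit. The abstract does not give the exact formulas, so the following is only a minimal sketch under stated assumptions: a random walk with restart on an undirected toy network, a restart probability of 0.3, and a hypothetical inverse-weight mapping from topological weight to per-gene penalty factor. The function names `random_walk_weights` and `penalty_factors` are illustrative and are not taken from the RLassoCox package.

```python
import numpy as np

def random_walk_weights(adj, restart=0.3, tol=1e-10, max_iter=1000):
    """Visiting probabilities of a random walk with restart on a gene network.

    adj is a square adjacency matrix; restart=0.3 is an assumed value.
    """
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0      # avoid division by zero for isolated genes
    transition = adj / col_sums        # column-normalised transition matrix
    n = adj.shape[0]
    p0 = np.full(n, 1.0 / n)           # uniform restart distribution
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * transition @ p + restart * p0
        converged = np.abs(p_next - p).sum() < tol
        p = p_next
        if converged:
            break
    return p

def penalty_factors(weights):
    """Hypothetical mapping: a higher topological weight gives a smaller penalty.

    The factors are rescaled to have mean 1.
    """
    inv = 1.0 / np.asarray(weights, dtype=float)
    return inv / inv.mean()

# Toy star network: gene 0 is a hub connected to genes 1, 2 and 3.
adj = np.array([
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
])
weights = random_walk_weights(adj)
penalty = penalty_factors(weights)
```

Per-gene factors of this kind could then be handed to any weighted Lasso-Cox solver that accepts them, for example through the `penalty.factor` argument of R's glmnet; whether the published method uses this exact mapping is not stated in the abstract.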
Affiliation(s)
- Wei Wang
  - Department of Mathematics, College of Science, Heilongjiang Institute of Technology, Harbin 150050, China
- Wei Liu
  - Department of Mathematics, College of Science, Heilongjiang Institute of Technology, Harbin 150050, China
2. Li M, Zhao J, Li X, Chen Y, Feng C, Qian F, Liu Y, Zhang J, He J, Ai B, Ning Z, Liu W, Bai X, Han X, Wu Z, Xu X, Tang Z, Pan Q, Xu L, Li C, Wang Q, Li E. HiFreSP: A novel high-frequency sub-pathway mining approach to identify robust prognostic gene signatures. Brief Bioinform 2020; 21:1411-1424. [PMID: 31350847 DOI: 10.1093/bib/bbz078] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 03/25/2019] [Revised: 05/19/2019] [Accepted: 06/04/2019] [Indexed: 02/05/2023]
Abstract
With the increasing awareness of heterogeneity in cancers, better prediction of cancer prognosis is much needed for more personalized treatment. Recently, extensive efforts have been made to explore variations in gene expression for better prognosis. However, the prognostic gene signatures predicted by most existing methods show little robustness across different datasets of the same cancer. To improve the robustness of gene signatures, we propose a novel high-frequency sub-pathway mining approach (HiFreSP) that integrates a randomization strategy with gene interaction pathways. We identified a six-gene signature (CCND1, CSF3R, E2F2, JUP, RARA and TCF7) in esophageal squamous cell carcinoma (ESCC) by HiFreSP. This signature displayed a strong ability to predict the clinical outcome of ESCC patients in two independent datasets (log-rank test, P = 0.0045 and 0.0087). To further show the predictive performance of HiFreSP, we applied it to two other cancers: pancreatic adenocarcinoma and breast cancer. The identified signatures show high predictive power in all testing datasets of the two cancers. Furthermore, compared with two popular methods for predicting prognostic signatures, the least absolute shrinkage and selection operator penalized Cox proportional hazards model and the random survival forest, HiFreSP showed better predictive accuracy and generalization across all testing datasets of the above three cancers. Lastly, we applied HiFreSP to 8137 patients involving 20 cancer types in the TCGA database and found high-frequency prognosis-associated pathways in many cancers. Taken together, HiFreSP shows higher prognostic capability and greater robustness, and the identified signatures provide clinical guidance for cancer prognosis. HiFreSP is freely available via GitHub: https://github.com/chunquanlipathway/HiFreSP.
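The abstract above evaluates its signature with a log-rank test between patient groups. As a reminder of what that test computes, here is a minimal two-group sketch in plain NumPy. It is not code from HiFreSP; the function name `logrank_test` and the toy survival times are assumptions made for illustration. The p-value uses the chi-square distribution with one degree of freedom, whose upper tail equals `erfc(sqrt(x / 2))`.

```python
import math
import numpy as np

def logrank_test(time, event, group):
    """Two-group log-rank test.

    time  : follow-up time per patient
    event : 1 if the event (e.g. death) was observed, 0 if censored
    group : 1 for the high-risk group, 0 for the low-risk group
    Returns (chi-square statistic, p-value).
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    group = np.asarray(group, dtype=bool)

    obs_minus_exp = 0.0
    variance = 0.0
    for t in np.unique(time[event]):          # loop over distinct event times
        at_risk = time >= t
        n = at_risk.sum()                     # patients at risk in both groups
        n1 = (at_risk & group).sum()          # at risk in the high-risk group
        died = event & (time == t)
        d = died.sum()                        # events at time t, both groups
        d1 = (died & group).sum()             # events at time t, high-risk group
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:                             # hypergeometric variance term
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square upper tail, 1 d.f.
    return chi2, p_value

# Toy data (made up): the high-risk group fails early, the low-risk group late.
time = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
event = [1] * 10
group = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
chi2, p_value = logrank_test(time, event, group)
```

In practice the groups would come from splitting patients by a signature-based risk score, and an established survival library would normally be used instead of a hand-written test.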
Affiliation(s)
- Meng Li
  - School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing, China
- Jianmei Zhao
  - School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing, China
- Xuecang Li
  - School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing, China
- Yang Chen
  - Key Laboratory of Molecular Biology for High Cancer Incidence Coastal Chaoshan Area, Shantou University Medical College, Shantou, China
- Chenchen Feng
  - School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing, China
- Fengcui Qian
  - School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing, China
- Yuejuan Liu
  - School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing, China
- Jian Zhang
  - School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing, China
- Jianzhong He
  - Institute of Oncologic Pathology, Shantou University Medical College
- Bo Ai
  - School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing, China
- Ziyu Ning
  - School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing, China
- Wei Liu
  - Key Laboratory of Molecular Biology for High Cancer Incidence Coastal Chaoshan Area, Shantou University Medical College, Shantou, China
- Xuefeng Bai
  - School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing, China
- Xiaole Han
  - School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing, China
- Zhiyong Wu
  - Departments of Oncology Surgery, Shantou Central Hospital, Affiliated Shantou Hospital of Sun Yat-Sen University
- Xiue Xu
  - Institute of Oncologic Pathology, Shantou University Medical College
- Zhidong Tang
  - School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing, China
- Qi Pan
  - School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing, China
- Liyan Xu
  - Key Laboratory of Molecular Biology for High Cancer Incidence Coastal Chaoshan Area, Shantou University Medical College, Shantou, China
- Chunquan Li
  - School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing, China
- Qiuyu Wang
  - School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing, China
- Enmin Li
  - Key Laboratory of Molecular Biology for High Cancer Incidence Coastal Chaoshan Area, Shantou University Medical College, Shantou, China
3. Wang W, Liu W. Integration of gene interaction information into a reweighted random survival forest approach for accurate survival prediction and survival biomarker discovery. Sci Rep 2018; 8:13202. [PMID: 30181543 PMCID: PMC6123437 DOI: 10.1038/s41598-018-31497-0] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 11/10/2017] [Accepted: 08/20/2018] [Indexed: 02/05/2023]
Abstract
Accurately predicting patient risk and identifying survival biomarkers are two important tasks in survival analysis. For emerging high-throughput gene expression data, the random survival forest (RSF) is attracting increasing attention because it not only performs well on survival prediction problems with high-dimensional variables, but can also identify important variables from the variable importance calculated automatically within the algorithm. However, RSF still suffers from problems such as limited predictive accuracy on independent datasets and limited biological interpretability of survival biomarkers. In this study, we integrated gene interaction information into a Reweighted RSF model (RRSF) to improve predictive accuracy and identify biologically meaningful survival markers. We applied RRSF to survival prediction for patients with glioblastoma multiforme (GBM) and esophageal squamous cell carcinoma (ESCC). With a reconstructed global pathway network and an mRNA-lncRNA co-expression network as the prior gene interaction information, RRSF showed better overall predictive performance than RSF on three GBM and two ESCC datasets. In addition, RRSF identified a two-gene signature and a three-lncRNA signature, which showed robust prognostic value and high biological relevance to the development of GBM and ESCC, respectively.
Affiliation(s)
- Wei Wang
  - Department of Mathematics, Heilongjiang Institute of Technology, Harbin, 150050, China
- Wei Liu
  - Department of Mathematics, Heilongjiang Institute of Technology, Harbin, 150050, China
  - The Key Laboratory of Molecular Biology for High Cancer Incidence Coastal Chaoshan Area, Shantou University Medical College, Shantou, 515041, China
4. Liu W, Wang W, Tian G, Xie W, Lei L, Liu J, Huang W, Xu L, Li E. Topologically inferring pathway activity for precise survival outcome prediction: breast cancer as a case. Mol Biosyst 2017; 13:537-548. [PMID: 28098303 DOI: 10.1039/c6mb00757k] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 02/05/2023]
Abstract
Accurately predicting the survival outcome of patients is of great importance in clinical cancer research. In the past decade, building survival prediction models based on gene expression data has received increasing interest. However, the existing methods are mainly based on individual gene signatures, which are known to have limited prediction accuracy on independent datasets and unclear biological relevance. Here, we propose a novel pathway-based survival prediction method called DRWPSurv to accurately predict survival outcome. DRWPSurv integrates gene expression profiles and prior gene interaction information to topologically infer survival-associated pathway activities, and uses the pathway activities as features to construct a Lasso-Cox model. It uses the topological importance of genes, evaluated by a directed random walk, to enhance the robustness of pathway activities and thereby improve predictive performance. We applied DRWPSurv to three independent breast cancer datasets and compared its predictive performance with that of a traditional gene-based method and four pathway-based methods. Results showed that pathway-based methods obtained comparable or better predictive performance than the gene-based method, and that DRWPSurv predicted survival outcome with better accuracy and robustness than the other pathway-based methods. In addition, the risk pathways identified by DRWPSurv provide biologically informative models for breast cancer prognosis and treatment.
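The abstract above describes turning gene expression into pathway activities by using topological gene weights, with the activities then serving as Lasso-Cox features. The abstract does not state the activity formula, so the sketch below only illustrates the general idea under an assumed form: a weight-scaled sum of per-gene z-scores, normalised by the Euclidean norm of the weights. The function name `pathway_activity`, the formula and the toy numbers are all hypothetical and are not taken from DRWPSurv.

```python
import numpy as np

def pathway_activity(expr, gene_weights, pathway_genes):
    """Assumed form of a topologically weighted pathway activity.

    expr          : samples x genes expression matrix
    gene_weights  : one topological weight per gene (e.g. from a random walk)
    pathway_genes : column indices of the genes that belong to the pathway
    Returns one activity score per sample.
    """
    expr = np.asarray(expr, dtype=float)
    gene_weights = np.asarray(gene_weights, dtype=float)

    std = expr.std(axis=0)
    std[std == 0] = 1.0                        # constant genes get a z-score of 0
    z = (expr - expr.mean(axis=0)) / std       # per-gene z-scores across samples

    w = gene_weights[pathway_genes]
    return z[:, pathway_genes] @ w / np.sqrt((w ** 2).sum())

# Toy data (made up): 4 samples, 3 genes, one pathway holding genes 0 and 2.
expr = np.array([
    [1.0, 5.0, 2.0],
    [2.0, 5.0, 4.0],
    [3.0, 5.0, 6.0],
    [4.0, 5.0, 8.0],
])
gene_weights = np.array([0.5, 0.2, 0.3])
activity = pathway_activity(expr, gene_weights, [0, 2])
```

Computing one such score per pathway yields a samples x pathways matrix, which could replace the gene-level matrix as input to a penalised Cox model.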
Affiliation(s)
- Wei Liu
  - The Key Laboratory of Molecular Biology for High Cancer Incidence Coastal Chaoshan Area, Shantou University Medical College, Shantou, 515041, China
  - Department of Mathematics, Heilongjiang Institute of Technology, Harbin, 150050, China
- Wei Wang
  - Department of Mathematics, Heilongjiang Institute of Technology, Harbin, 150050, China
- Guohua Tian
  - Department of Mathematics, Heilongjiang Institute of Technology, Harbin, 150050, China
- Wenming Xie
  - Network Information Center, Shantou University Medical College, Shantou, 515041, China
- Li Lei
  - Network Information Center, Shantou University Medical College, Shantou, 515041, China
- Jiujin Liu
  - Network Information Center, Shantou University Medical College, Shantou, 515041, China
- Wanxun Huang
  - Network Information Center, Shantou University Medical College, Shantou, 515041, China
- Liyan Xu
  - The Key Laboratory of Molecular Biology for High Cancer Incidence Coastal Chaoshan Area, Shantou University Medical College, Shantou, 515041, China
  - Institute of Oncologic Pathology, Shantou University Medical College, Shantou, 515041, China
- Enmin Li
  - The Key Laboratory of Molecular Biology for High Cancer Incidence Coastal Chaoshan Area, Shantou University Medical College, Shantou, 515041, China
  - Department of Biochemistry and Molecular Biology, Shantou University Medical College, Shantou 515041, China
5. Lv W, Wang Q, Chen H, Jiang Y, Zheng J, Shi M, Xu Y, Han J, Li C, Zhang R. Prioritization of rheumatoid arthritis risk subpathways based on global immune subpathway interaction network and random walk strategy. Mol Biosyst 2016; 11:2986-97. [PMID: 26289534 DOI: 10.1039/c5mb00247h] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 12/13/2022]
Abstract
The initiation and development of rheumatoid arthritis (RA) is closely related to mutual dysfunction of multiple pathways. Furthermore, some similar molecular mechanisms are shared between RA and other immune diseases. Therefore it is vital to reveal the molecular mechanism of RA through searching for subpathways of immune diseases and investigating the crosstalk effect among subpathways. Here we exploited an integrated approach combining both construction of a subpathway-subpathway interaction network and a random walk strategy to prioritize RA risk subpathways. Our research can be divided into three parts: (1) acquisition of risk genes and identification of risk subpathways of 85 immune diseases by using subpathway-lenient distance similarity (subpathway-LDS) method; (2) construction of a global immune subpathway interaction (GISI) network with subpathways identified by subpathway-LDS; (3) optimization of RA risk subpathways by random walk strategy based on GISI network. The results showed that our method could effectively identify RA risk subpathways, such as MAPK signaling pathway, prostate cancer pathway and chemokine signaling pathway. The integrated strategy considering crosstalk between immune subpathways significantly improved the effect of risk subpathway identification. With the development of GWAS, our method will provide insight into exploring molecular mechanisms of immune diseases and might be a promising approach for studying other diseases.
Affiliation(s)
- Wenhua Lv
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150086, China