1
Shen J, Wang S, Sun H, Huang J, Bai L, Wang X, Dong Y, Tang Z. A novel non-negative Bayesian stacking modeling method for Cancer survival prediction using high-dimensional omics data. BMC Med Res Methodol 2024; 24:105. [PMID: 38702624 PMCID: PMC11067084 DOI: 10.1186/s12874-024-02232-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2023] [Accepted: 04/23/2024] [Indexed: 05/06/2024] Open
Abstract
BACKGROUND Survival prediction using high-dimensional molecular data is a hot topic in the field of genomics and precision medicine, especially for cancer studies. Considering that carcinogenesis has a pathway-based pathogenesis, developing models using such group structures more closely mimics disease progression and prognosis. Many approaches can be used to integrate group information; however, most of them are single-model methods, which may lead to unstable prediction. METHODS We introduced a novel survival stacking method that uses group structure information to improve the robustness of cancer survival prediction in the context of high-dimensional omics data. With a super learner, survival stacking combines the predictions from multiple sub-models that are independently trained using the features in pre-grouped biological pathways. In addition to a non-negative linear combination of sub-models, we extended the super learner to a non-negative Bayesian hierarchical generalized linear model and an artificial neural network (ANN). We compared the proposed modeling strategy with the widely used penalized survival method Lasso Cox and several group penalized methods, e.g., group Lasso Cox, via a simulation study and real-world data application. RESULTS The proposed survival stacking method showed superior and robust performance in terms of discrimination compared with single-model methods on high-noise simulated data and real-world data. The non-negative Bayesian stacking method can identify important biological signal pathways and genes that are associated with the prognosis of cancer. CONCLUSIONS This study proposed a novel survival stacking strategy incorporating biological group information into cancer prognosis models. Additionally, this study extended the super learner to a non-negative Bayesian model and an ANN, enriching the combination of sub-models. The proposed Bayesian stacking strategy exhibited favorable properties in the prediction and interpretation of complex survival data, which may aid in discovering cancer targets.
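The simplest super learner named in the abstract above, a non-negative linear combination of sub-model predictions, can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes a continuous outcome with squared-error loss in place of a Cox partial likelihood, it solves the constrained fit by projected gradient descent, and the function name `fit_nonnegative_weights` and the toy data are hypothetical.

```python
def fit_nonnegative_weights(sub_preds, y, n_iter=2000):
    """Fit weights w >= 0 minimizing 0.5 * ||Z w - y||^2.

    sub_preds: list of rows; each row holds one prediction per sub-model.
    y: list of observed outcomes (assumed continuous and uncensored here).
    """
    k = len(sub_preds[0])
    # trace(Z^T Z) bounds the largest eigenvalue of Z^T Z, so 1/trace is a
    # conservative step size that guarantees descent.
    trace = sum(v * v for row in sub_preds for v in row)
    w = [0.0] * k
    if trace == 0:
        return w
    lr = 1.0 / trace
    for _ in range(n_iter):
        # Residuals of the current stacked prediction.
        residuals = [
            sum(wj * zj for wj, zj in zip(w, row)) - yi
            for row, yi in zip(sub_preds, y)
        ]
        # Gradient Z^T (Z w - y), one component per sub-model.
        grad = [
            sum(r * row[j] for r, row in zip(residuals, sub_preds))
            for j in range(k)
        ]
        # Gradient step followed by projection onto the non-negative orthant.
        w = [max(0.0, wj - lr * gj) for wj, gj in zip(w, grad)]
    return w


# Toy data: three hypothetical sub-models, the third carries no signal.
Z = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [0, 1, 1], [1, 0, 1]]
y = [0.7, 0.3, 0.0, 1.0, 0.3, 0.7]
weights = fit_nonnegative_weights(Z, y)
```

In this toy setting the uninformative third sub-model receives a weight of (numerically) zero, which illustrates why a non-negativity constraint can help a stacker ignore redundant sub-models.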
Affiliation(s)
- Junjie Shen
- Department of Biostatistics, School of Public Health, Jiangsu Key Laboratory of Preventive and Translational Medicine for Major Chronic Non-communicable Diseases, MOE Key Laboratory of Geriatric Diseases and Immunology, Suzhou Medical College of Soochow University, Suzhou, Jiangsu, 215123, People's Republic of China
- Shuo Wang
- Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center-University of Freiburg, 79085, Freiburg, Germany
- Hao Sun
- Department of Biostatistics, School of Public Health, Jiangsu Key Laboratory of Preventive and Translational Medicine for Major Chronic Non-communicable Diseases, MOE Key Laboratory of Geriatric Diseases and Immunology, Suzhou Medical College of Soochow University, Suzhou, Jiangsu, 215123, People's Republic of China
- Jie Huang
- Department of Biostatistics, School of Public Health, Jiangsu Key Laboratory of Preventive and Translational Medicine for Major Chronic Non-communicable Diseases, MOE Key Laboratory of Geriatric Diseases and Immunology, Suzhou Medical College of Soochow University, Suzhou, Jiangsu, 215123, People's Republic of China
- Lu Bai
- Department of Biostatistics, School of Public Health, Jiangsu Key Laboratory of Preventive and Translational Medicine for Major Chronic Non-communicable Diseases, MOE Key Laboratory of Geriatric Diseases and Immunology, Suzhou Medical College of Soochow University, Suzhou, Jiangsu, 215123, People's Republic of China
- Xichao Wang
- Department of Biostatistics, School of Public Health, Jiangsu Key Laboratory of Preventive and Translational Medicine for Major Chronic Non-communicable Diseases, MOE Key Laboratory of Geriatric Diseases and Immunology, Suzhou Medical College of Soochow University, Suzhou, Jiangsu, 215123, People's Republic of China
- Yongfei Dong
- Department of Biostatistics, School of Public Health, Jiangsu Key Laboratory of Preventive and Translational Medicine for Major Chronic Non-communicable Diseases, MOE Key Laboratory of Geriatric Diseases and Immunology, Suzhou Medical College of Soochow University, Suzhou, Jiangsu, 215123, People's Republic of China
- Zaixiang Tang
- Department of Biostatistics, School of Public Health, Jiangsu Key Laboratory of Preventive and Translational Medicine for Major Chronic Non-communicable Diseases, MOE Key Laboratory of Geriatric Diseases and Immunology, Suzhou Medical College of Soochow University, Suzhou, Jiangsu, 215123, People's Republic of China
2
Shen J, Wang S, Dong Y, Sun H, Wang X, Tang Z. A non-negative spike-and-slab lasso generalized linear stacking prediction modeling method for high-dimensional omics data. BMC Bioinformatics 2024; 25:119. [PMID: 38509499 PMCID: PMC10953151 DOI: 10.1186/s12859-024-05741-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2023] [Accepted: 03/11/2024] [Indexed: 03/22/2024] Open
Abstract
BACKGROUND High-dimensional omics data are increasingly utilized in clinical and public health research for disease risk prediction. Many previous sparse methods have been proposed that use prior knowledge, e.g., biological group structure information, to guide the model-building process. However, these methods are still based on a single model, often leading to overconfident inferences and inferior generalization. RESULTS We proposed a novel stacking strategy based on a non-negative spike-and-slab Lasso (nsslasso) generalized linear model (GLM) for disease risk prediction in the context of high-dimensional omics data. Briefly, we used prior biological knowledge to segment omics data into a set of sub-data. Each sub-model was trained separately using the features from its group via a proper base learner. Then, the predictions of the sub-models were ensembled by a super learner using the nsslasso GLM. The proposed method was compared to several competitors, such as the Lasso, grlasso, and gsslasso, using simulated data and two open-access breast cancer datasets. As a result, the proposed method showed robustly superior prediction performance to the optimal single-model method in high-noise simulated data and real-world data. Furthermore, compared to the traditional stacking method, the proposed nsslasso stacking method can efficiently handle redundant sub-models and identify important sub-models. CONCLUSIONS The proposed nsslasso method demonstrated favorable predictive accuracy, stability, and biological interpretability. Additionally, the proposed method can also be used to detect new biomarkers and key group structures.
Affiliation(s)
- Junjie Shen
- Department of Biostatistics, School of Public Health, Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, MOE Key Laboratory of Geriatric Diseases and Immunology, Suzhou Medical College of Soochow University, No. 199 Renai Road, Suzhou, 215123, Jiangsu, People's Republic of China
- Shuo Wang
- Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, 79085, Freiburg, Germany
- Yongfei Dong
- Department of Biostatistics, School of Public Health, Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, MOE Key Laboratory of Geriatric Diseases and Immunology, Suzhou Medical College of Soochow University, No. 199 Renai Road, Suzhou, 215123, Jiangsu, People's Republic of China
- Hao Sun
- Department of Biostatistics, School of Public Health, Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, MOE Key Laboratory of Geriatric Diseases and Immunology, Suzhou Medical College of Soochow University, No. 199 Renai Road, Suzhou, 215123, Jiangsu, People's Republic of China
- Xichao Wang
- Department of Biostatistics, School of Public Health, Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, MOE Key Laboratory of Geriatric Diseases and Immunology, Suzhou Medical College of Soochow University, No. 199 Renai Road, Suzhou, 215123, Jiangsu, People's Republic of China
- Zaixiang Tang
- Department of Biostatistics, School of Public Health, Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, MOE Key Laboratory of Geriatric Diseases and Immunology, Suzhou Medical College of Soochow University, No. 199 Renai Road, Suzhou, 215123, Jiangsu, People's Republic of China
3
Pu J, Yu H, Guo Y. A Novel Strategy to Identify Prognosis-Relevant Gene Sets in Cancers. Genes (Basel) 2022; 13:862. [PMID: 35627247 PMCID: PMC9141699 DOI: 10.3390/genes13050862] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2022] [Revised: 05/06/2022] [Accepted: 05/09/2022] [Indexed: 11/16/2022] Open
Abstract
Molecular prognosis markers hold promise for improved prediction of patient survival, and a pathway or gene set may add mechanistic interpretation to their prognostic prediction power. In this study, we demonstrated a novel strategy to identify prognosis-relevant gene sets in cancers. Our study consists of a first round of gene-level analyses and a second round of gene-set-level analyses, in which the Composite Gene Expression Score summarizes a surrogate expression value at the gene set level and a permutation procedure is applied to assess the prognostic significance of gene sets. An optional differential coexpression module is appended to the two phases of survival analyses to corroborate and refine prognostic gene sets. Our strategy was demonstrated in 33 cancer types across 32,234 gene sets. We found that oncogenic gene sets accounted for an increased proportion among the final gene sets, and that genes involved in DNA replication and DNA repair have ubiquitous prognostic value for multiple cancer types. In summary, we carried out the largest gene-set-based prognosis study to date. Compared to previous similar studies, our approach offered multiple improvements in design and methodology implementation. Functionally relevant gene sets of ubiquitous prognostic significance in multiple cancer types were identified.
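The permutation step mentioned in the abstract above can be illustrated generically. The sketch below is a stand-in under stated assumptions, not the authors' procedure: it takes a per-sample gene-set score as given (the abstract does not specify how the Composite Gene Expression Score is computed), it uses the absolute Pearson correlation with an uncensored outcome as a placeholder statistic rather than a survival test, and the function names are hypothetical.

```python
import random
from statistics import mean


def abs_pearson(x, y):
    """Absolute Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def permutation_p_value(scores, outcome, n_perm=999, seed=0):
    """Empirical p-value: how often a shuffled outcome matches or beats
    the observed association between gene-set score and outcome."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    observed = abs_pearson(scores, outcome)
    shuffled = list(outcome)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs_pearson(scores, shuffled) >= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Toy data: the outcome is a linear function of the gene-set score.
toy_scores = list(range(1, 13))
toy_outcome = [2 * s + 1 for s in toy_scores]
p_value = permutation_p_value(toy_scores, toy_outcome)
```

Shuffling the outcome breaks any real link between score and prognosis, so the shuffled statistics approximate the null distribution against which the observed statistic is judged.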
Affiliation(s)
- Junyi Pu
- School of Life Sciences, Northwest University, Xi’an 710069, China
- Hui Yu
- Comprehensive Cancer Center, New Mexico University, Albuquerque, NM 87131, USA
- Yan Guo
- Comprehensive Cancer Center, New Mexico University, Albuquerque, NM 87131, USA
4
Shen Z, Jin Y, Sun Q, Zhang S, Chen X, Hu L, He C, Wang Y, Liu Q, Zhang H, Liu X, Wang L, Jiao J, Miao Y, Gu W, Wang F, Wang C, Shi Y, Ye J, Zhu T, Sun C, Song X, Xu L, Yan D, Sun H, Cao J, Li D, Li Z, Wang Z, Huang S, Xu K, Sang W. A Novel Prognostic Index Model for Adult Hemophagocytic Lymphohistiocytosis: A Multicenter Retrospective Analysis in China. Front Immunol 2022; 13:829878. [PMID: 35251016 PMCID: PMC8894441 DOI: 10.3389/fimmu.2022.829878] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2021] [Accepted: 01/28/2022] [Indexed: 11/13/2022] Open
Abstract
Hemophagocytic lymphohistiocytosis (HLH) is an immune disorder with rapid progression and poor survival. Individual treatment strategy is restricted, due to the absence of precise stratification criteria. In this multicenter retrospective study, we aimed to develop a feasible prognostic model for adult HLH in China. A total of 270 newly diagnosed patients of adult HLH were retrieved from the Huaihai Lymphoma Working Group (HHLWG), of whom 184 from 5 medical centers served as derivation cohort, and 86 cases from 3 other centers served as validation cohort. X-Tile program and Maxstat analysis were used to identify optimal cutoff points of continuous variables; univariate and multivariate Cox analyses were used for variable selection, and the Kaplan–Meier curve was used to analyze the value of variables on prognosis. The C-index, Brier Score, and calibration curve were used for model validation. Multivariate analysis showed that age, creatinine, albumin, platelet, lymphocyte ratio, and alanine aminotransferase were independent prognostic factors. By rounding up the hazard ratios from 6 significant variables, a maximum of 9 points was assigned. The final scoring model of HHLWG-HPI was identified with four risk groups: low risk (≤3 pts), low-intermediate risk (4 pts), high-intermediate risk (5-6 pts), and high risk (≥7 pts), with 5-year overall survival rates of 68.5%, 35.2%, 21.3%, and 10.8%, respectively. The C-indexes were 0.796 and 0.758 in the derivation and validation cohorts by using a bootstrap resampling program. In conclusion, the HHLWG-HPI model provides a feasible and accurate stratification system for individualized treatment strategy in adult HLH.
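The risk stratification reported in the abstract above maps a total HHLWG-HPI score to one of four groups. A minimal sketch of that lookup follows, using only the cutoffs and 5-year overall survival rates quoted in the abstract. The function and dictionary names are hypothetical, the lower bound of 0 points is an assumption, and the per-variable point assignments are not given in the abstract, so the total score is taken as input.

```python
# 5-year overall survival rates (%) per risk group, as quoted in the abstract.
FIVE_YEAR_OS = {
    "low": 68.5,
    "low-intermediate": 35.2,
    "high-intermediate": 21.3,
    "high": 10.8,
}


def hhlwg_hpi_risk_group(points: int) -> str:
    """Map a total HHLWG-HPI score to its risk group.

    The abstract states a maximum of 9 points; the minimum of 0 is assumed.
    """
    if not 0 <= points <= 9:
        raise ValueError("HHLWG-HPI total score must be between 0 and 9")
    if points <= 3:
        return "low"
    if points == 4:
        return "low-intermediate"
    if points <= 6:
        return "high-intermediate"
    return "high"
```

For example, a patient with a total of 5 points would fall into the high-intermediate group, for which the abstract reports a 5-year overall survival rate of 21.3%.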
Affiliation(s)
- Ziyuan Shen
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
- Yingliang Jin
- Center for Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical University, Xuzhou, China
- Qian Sun
- Department of Hematology, Affiliated Hospital of Xuzhou Medical University, Xuzhou, China
- Shuo Zhang
- Department of Hematology, Affiliated Hospital of Xuzhou Medical University, Xuzhou, China
- Xi Chen
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
- Lingling Hu
- Department of Hematology, Affiliated Hospital of Xuzhou Medical University, Xuzhou, China
- Chenlu He
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
- Ying Wang
- Department of Personnel, Suqian First Hospital, Suqian, China
- Qinhua Liu
- Department of Hematology, The First Affiliated Hospital of Anhui Medical University, Hefei, China
- Hao Zhang
- Department of Hematology, The Affiliated Hospital of Jining Medical University, Jining, China
- Xin Liu
- Department of Hematology, The Affiliated Hospital of Jining Medical University, Jining, China
- Ling Wang
- Department of Hematology, Taian Central Hospital, Taian, China
- Jun Jiao
- Department of Hematology, Taian Central Hospital, Taian, China
- Yuqing Miao
- Department of Hematology, Yancheng First People’s Hospital, Yancheng, China
- Weiying Gu
- Department of Hematology, The First People’s Hospital of Changzhou, Changzhou, China
- Fei Wang
- Department of Hematology, The First People’s Hospital of Changzhou, Changzhou, China
- Chunling Wang
- Department of Hematology, Huai’an First People’s Hospital, Huai’an, China
- Yuye Shi
- Department of Hematology, Huai’an First People’s Hospital, Huai’an, China
- Jingjing Ye
- Department of Hematology, Qilu Hospital of Shandong University, Jinan, China
- Taigang Zhu
- Department of Hematology, The General Hospital of Wanbei Coal-Electric Group, Suzhou, China
- Cai Sun
- Department of Hematology, Affiliated Hospital of Xuzhou Medical University, Xuzhou, China
- Xuguang Song
- Department of Hematology, Affiliated Hospital of Xuzhou Medical University, Xuzhou, China
- Linyan Xu
- Department of Hematology, Affiliated Hospital of Xuzhou Medical University, Xuzhou, China
- Dongmei Yan
- Department of Hematology, Affiliated Hospital of Xuzhou Medical University, Xuzhou, China
- Haiying Sun
- Department of Hematology, Affiliated Hospital of Xuzhou Medical University, Xuzhou, China
- Jiang Cao
- Department of Hematology, Affiliated Hospital of Xuzhou Medical University, Xuzhou, China
- Depeng Li
- Department of Hematology, Affiliated Hospital of Xuzhou Medical University, Xuzhou, China
- Zhenyu Li
- Department of Hematology, Affiliated Hospital of Xuzhou Medical University, Xuzhou, China
- Zhao Wang
- Department of Hematology, Beijing Friendship Hospital, Capital Medical University, Beijing, China
- Shuiping Huang
- Center for Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical University, Xuzhou, China
- *Correspondence: Wei Sang; Kailin Xu; Shuiping Huang
- Kailin Xu
- Department of Hematology, Affiliated Hospital of Xuzhou Medical University, Xuzhou, China
- *Correspondence: Wei Sang; Kailin Xu; Shuiping Huang
- Wei Sang
- Department of Hematology, Affiliated Hospital of Xuzhou Medical University, Xuzhou, China
- *Correspondence: Wei Sang; Kailin Xu; Shuiping Huang
5
Novel application of survival models for predicting microbial community transitions with variable selection for eDNA. Appl Environ Microbiol 2022; 88:e0214621. [PMID: 35138931 DOI: 10.1128/aem.02146-21] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022] Open
Abstract
Survival analysis is a prolific statistical tool in medicine for inferring risk and time to disease-related events. However, it is under-utilized in microbiome research to predict microbial community mediated events, partly due to the sparsity and high dimensional nature of the data. We advance the application of Cox proportional hazards (Cox PH) survival models to environmental DNA (eDNA) data with feature selection suitable for filtering irrelevant and redundant taxonomic variables. Selection methods are compared in terms of false positives, sensitivity, and survival estimation accuracy in simulation and in a real data setting to forecast harmful cyanobacterial blooms. A novel extension of a method for selecting microbial biomarkers with survival data (SuRFCox) reliably outperforms other methods. We determine Cox PH models with SuRFCox selected predictors are more robust to varied signal, noise, and data correlation structure. SuRFCox also yields the most accurate and consistent prediction of blooms according to cross-validated testing by year over eight different bloom seasons. Identification of common biomarkers among validated survival forecasts over changing conditions has clear biological significance. Survival models with such biomarkers inform risk assessment and provide insight into the causes of critical community transitions. Importance In this paper, we report on a novel approach of selecting microorganisms for model-based prediction of the time to critical microbially-modulated events (e.g., harmful algal blooms, clinical outcomes, community shifts, etc.). Our novel method for identifying biomarkers from large, dynamic communities of microbes has broad utility to environmental and ecological impact risk assessment and public health. Results will also promote theoretical and practical advancements relevant to the biology of specific organisms. 
To address the unique challenge posed by diverse environmental conditions and sparse microbes, we developed a novel method of selecting predictors for modelling time-to-event data. Competing methods for selecting predictors are rigorously compared to determine which is the most accurate and generalizable. Model forecasts are applied to show suitable predictors can precisely quantify the risk over time of biological events like harmful cyanobacterial blooms.
6
Saad M, He S, Thorstad W, Gay H, Barnett D, Zhao Y, Ruan S, Wang X, Li H. Learning-based Cancer Treatment Outcome Prognosis using Multimodal Biomarkers. IEEE Transactions on Radiation and Plasma Medical Sciences 2022; 6:231-244. [PMID: 35520102 PMCID: PMC9066560 DOI: 10.1109/trpms.2021.3104297] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/03/2023]
Abstract
Predicting early in treatment whether a tumor is likely to be responsive is a difficult yet important task to support clinical decision-making. Studies have shown that multimodal biomarkers could provide complementary information and lead to more accurate treatment outcome prognosis than unimodal biomarkers. However, the prognosis accuracy could be affected by multimodal data heterogeneity and incompleteness. Small and imbalanced datasets also bring additional challenges for training a prognosis model. In this study, a modular framework employing multimodal biomarkers for cancer treatment outcome prediction was proposed. It includes four modules (synthetic data generation, deep feature extraction, multimodal feature fusion, and classification) to address the challenges described above. The feasibility and advantages of the designed framework were demonstrated through an example study, in which the goal was to stratify oropharyngeal squamous cell carcinoma (OPSCC) patients into low- and high-risk groups for treatment failure by use of positron emission tomography (PET) image data and microRNA (miRNA) biomarkers. The superior prognosis performance and the comparison with other methods demonstrated the efficiency of the proposed framework and its ability to enable seamless integration, validation and comparison of various algorithms in each module of the framework. Limitations and future work were discussed as well.
Affiliation(s)
- Maliazurina Saad
- Department of Bioengineering, University of Illinois at Urbana-Champaign, Urbana, IL, USA. She is now with the MD Anderson Cancer Center, Houston, TX, USA
- Shenghua He
- Department of Computer Science and Engineering, Washington University, Saint Louis, MO, USA
- Wade Thorstad
- Department of Radiation Oncology, Washington University School of Medicine, Saint Louis, MO, USA
- Hiram Gay
- Department of Radiation Oncology, Washington University School of Medicine, Saint Louis, MO, USA
- Daniel Barnett
- Carle Cancer Center, Carle Foundation Hospital, Urbana, IL, USA
- Yujie Zhao
- Mayo Clinic at Florida, Jacksonville, FL, USA
- Su Ruan
- Laboratoire LITIS (EA 4108), Equipe Quantif, University of Rouen, France
- Xiaowei Wang
- Department of Pharmacology and Bioengineering, University of Illinois at Chicago, Chicago, IL, USA
- Hua Li
- Department of Bioengineering, University of Illinois at Urbana-Champaign, Cancer Center at Illinois, and Carle Foundation Hospital, Urbana, IL, USA
7
Chu J, Sun NA, Hu W, Chen X, Yi N, Shen Y. The Application of Bayesian Methods in Cancer Prognosis and Prediction. Cancer Genomics Proteomics 2022; 19:1-11. [PMID: 34949654 PMCID: PMC8717957 DOI: 10.21873/cgp.20298] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 10/18/2021] [Revised: 11/24/2021] [Accepted: 11/30/2021] [Indexed: 11/10/2022] Open
Abstract
With the development of high-throughput biological techniques, high-dimensional omics data have emerged. These molecular data provide a solid foundation for precision medicine and prognostic prediction of cancer. Bayesian methods contribute to constructing prognostic models with complex relationships in omics and improve performance by introducing different prior distributions, which makes them suitable for modelling the high-dimensional data involved. Using different omics, several Bayesian hierarchical approaches have been proposed for variable selection and model construction. In particular, Bayesian methods for multi-omics integration have also been consistently proposed in recent years. Compared with single-omics, multi-omics integration modelling will contribute to improving predictive performance, gaining insights into the underlying mechanisms of tumour occurrence and development, and discovering more reliable biomarkers. In this work, we present a review of currently proposed Bayesian approaches in prognostic prediction modelling in cancer.
Affiliation(s)
- Jiadong Chu
- Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, P.R. China
- N A Sun
- Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, P.R. China
- Wei Hu
- Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, P.R. China
- Xuanli Chen
- Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, P.R. China
- Nengjun Yi
- Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, Birmingham, AL, U.S.A.
- Yueping Shen
- Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, P.R. China
8
Shen J, Liu J, Li H, Bai L, Du Z, Geng R, Cao J, Sun P, Tang Z. Explore association of genes in PDL1/PD1 pathway to radiotherapy survival benefit based on interaction model strategy. Radiat Oncol 2021; 16:223. [PMID: 34794456 PMCID: PMC8600865 DOI: 10.1186/s13014-021-01951-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/06/2021] [Accepted: 11/08/2021] [Indexed: 02/25/2023] Open
Abstract
Purpose To explore the association of genes in “PD-L1 expression and PD-1 check point pathway in cancer” with radiotherapy survival benefit. Methods and materials Gene expression data and clinical information of cancers were downloaded from TCGA. Radiotherapy survival benefit was defined based on an interaction model. Fast backward multivariate Cox regression was performed using stacking multiple interpolation data to identify radio-sensitive (RS) genes. Results Among the 73 genes in the PD-L1/PD-1 pathway, we identified 24 RS genes in the BRCA data set, 25 RS genes in the STAD data set and 20 RS genes in the HNSC data set, with some genes shared across data sets. Theoretically, there are two types of RS genes. The expression level of Type I RS genes did not affect patients' overall survival (OS), but when receiving radiotherapy, patients with different expression levels of Type I RS genes had varied survival benefit. In contrast, Type II RS genes affected patients' OS, and when receiving radiotherapy, those with lower OS could benefit substantially. Type II RS genes in BRCA were strongly positively correlated and had close biological interactions. When performing cluster analysis using these related Type II RS genes, patients could be divided into an RS group and a non-RS group in the BRCA and METABRIC data sets. Conclusions Our study explored potential radio-sensitive biomarkers of several main cancer types in an important tumor immune checkpoint pathway and revealed a strong association between this pathway and radiotherapy survival benefit. New types of RS genes could be identified based on an expanded definition of RS genes. Supplementary Information The online version contains supplementary material available at 10.1186/s13014-021-01951-x.
Affiliation(s)
- Junjie Shen
- Department of Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, 215123, China; Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, Medical College of Soochow University, Suzhou, 215123, China
- Jingfang Liu
- Department of Gynaecology and Obstetrics, The First Affiliated Hospital of Soochow University, Suzhou, 215123, China
- Huijun Li
- Department of Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, 215123, China; Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, Medical College of Soochow University, Suzhou, 215123, China
- Lu Bai
- Department of Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, 215123, China; Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, Medical College of Soochow University, Suzhou, 215123, China
- Zixuan Du
- Department of Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, 215123, China; Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, Medical College of Soochow University, Suzhou, 215123, China
- Ruirui Geng
- Department of Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, 215123, China; Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, Medical College of Soochow University, Suzhou, 215123, China
- Jianping Cao
- School of Radiation Medicine and Protection and Collaborative Innovation Center of Radiation Medicine of Jiangsu Higher Education Institutions, Soochow University, Suzhou, 215006, China
- Peng Sun
- Department of Otolaryngology, The First Affiliated Hospital of Soochow University, Suzhou, 215123, China
- Zaixiang Tang
- Department of Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, 215123, China; Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, Medical College of Soochow University, Suzhou, 215123, China
9
Zheng X, Amos CI, Frost HR. Pan-cancer evaluation of gene expression and somatic alteration data for cancer prognosis prediction. BMC Cancer 2021; 21:1053. [PMID: 34563154 PMCID: PMC8467202 DOI: 10.1186/s12885-021-08796-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/09/2021] [Accepted: 08/16/2021] [Indexed: 02/04/2023] Open
Abstract
BACKGROUND Over the past decades, approaches for diagnosing and treating cancer have seen significant improvement. However, the variability of patient and tumor characteristics has limited progress on methods for prognosis prediction. The development of high-throughput omics technologies now provides multiple approaches for characterizing tumors. Although a large number of published studies have focused on integration of multi-omics data and use of pathway-level models for cancer prognosis prediction, there still exists a gap of knowledge regarding the prognostic landscape across multi-omics data for multiple cancer types using both gene-level and pathway-level predictors. METHODS In this study, we systematically evaluated three often available types of omics data (gene expression, copy number variation and somatic point mutation) covering both DNA-level and RNA-level features. We evaluated the landscape of predictive performance of these three omics modalities for 33 cancer types in the TCGA using a Lasso or Group Lasso-penalized Cox model and either gene or pathway level predictors. RESULTS We constructed the prognostic landscape using three types of omics data for 33 cancer types on both the gene and pathway levels. Based on this landscape, we found that predictive performance is cancer type dependent and we also highlighted the cancer types and omics modalities that support the most accurate prognostic models. In general, models estimated on gene expression data provide the best predictive performance on either gene or pathway level and adding copy number variation or somatic point mutation data to gene expression data does not improve predictive performance, with some exceptional cohorts including low grade glioma and thyroid cancer. In general, pathway-level models have better interpretative performance, higher stability and smaller model size across multiple cancer types and omics data types relative to gene-level models. 
CONCLUSIONS Based on this landscape and comprehensive comparison, models estimated on gene expression data provide the best predictive performance at either the gene or pathway level. Pathway-level models have better interpretative performance, higher stability and smaller model size relative to gene-level models.
Affiliation(s)
- Xingyu Zheng
  - Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College, Hanover, NH, USA
- Christopher I Amos
  - Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College, Hanover, NH, USA
  - Department of Medicine, Institute for Clinical and Translational Research, Baylor College of Medicine, Houston, TX, USA
- H Robert Frost
  - Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College, Hanover, NH, USA
|
10
|
A potential prognostic prediction model of colon adenocarcinoma with recurrence based on prognostic lncRNA signatures. Hum Genomics 2020; 14:24. [PMID: 32522293 PMCID: PMC7288433 DOI: 10.1186/s40246-020-00270-8] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 10/09/2019] [Accepted: 05/13/2020] [Indexed: 12/18/2022] Open
Abstract
BACKGROUND Colon adenocarcinoma (COAD) is one of the common gastrointestinal malignant diseases, with a high mortality rate and poor prognosis due to delayed diagnosis. This study aimed to construct a prognostic prediction model for patients with COAD recurrence. METHODS Differentially expressed RNAs (DERs) between recurrence and non-recurrence COAD samples were identified based on expression profile data from the NCBI Gene Expression Omnibus (GEO) repository and The Cancer Genome Atlas (TCGA) database. Then, a classifier discriminating recurrent COAD was established using the SVM-RFE algorithm, and the receiver operating characteristic curve was used to assess the predictive power of the classifier. Furthermore, the prognostic prediction model was constructed based on univariate and multivariate Cox regression analysis, and Kaplan-Meier survival curve analysis was used to evaluate this model. In addition, the co-expression network of DElncRNAs and DEmRNAs was constructed, followed by GO and KEGG pathway enrichment analysis. RESULTS A total of 54 optimized signature DElncRNAs were screened and an SVM classifier was constructed, which showed high accuracy in distinguishing recurrence and non-recurrence COAD samples. Furthermore, six independent prognostic lncRNA signatures (LINC00852, ZNF667-AS1, FOXP1-IT1, LINC01560, TAF1A-AS1, and LINC00174) in COAD patients with recurrence were screened, and the prognostic prediction model for recurrent COAD was constructed, which showed relatively satisfactory predictive ability in both the training dataset and the validation dataset. Furthermore, the DEmRNAs in the co-expression network were mainly enriched in glycan biosynthesis, cardiac muscle contraction, and colorectal cancer. CONCLUSIONS Our study revealed that six lncRNA signatures acted as independent prognostic biomarkers for patients with COAD recurrence.
|
11
|
Zhao Z, Li Y, Wu Y, Chen R. Deep learning-based model for predicting progression in patients with head and neck squamous cell carcinoma. Cancer Biomark 2020; 27:19-28. [PMID: 31658045 DOI: 10.3233/cbm-190380] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 12/14/2022]
Abstract
PURPOSE This study endeavors to build a deep learning (DL)-based model for predicting disease progression in head and neck squamous cell carcinoma (HNSCC) patients by integrating multi-omics data. METHODS RNA sequencing, miRNA sequencing, and methylation data from The Cancer Genome Atlas (TCGA) were used as input for an autoencoder, a DL approach. An autoencoder-based prognosis model for progression-free survival (PFS) was built using an SVM algorithm and tested in three confirmation sets. Predictive performance of the model was compared to two alternative approaches. Differential expression analysis for mRNAs, microRNAs (miRNAs) and methylation was conducted. Moreover, functional annotation of differentially expressed genes (DEGs) was achieved through functional enrichment analysis. RESULT The DL-based prognosis model identified two subgroups of patients with significantly different PFS, and showed good model fit (C-index = 0.73). The two identified PFS subtypes were successfully validated in three confirmation sets. The DL-based model was more accurate and efficient than principal component analysis (PCA) or individual Cox-PH-based models. There were 348 DEGs, 23 differentially expressed miRNAs and 55 differentially methylated genes between the two PFS subtypes. These genes were significantly involved in several immune-related biological processes and in the primary immunodeficiency, cell adhesion molecules (CAMs), B cell receptor signaling and leukocyte transendothelial migration pathways. CONCLUSION The DL-based model introduced in this study is reliable and robust in predicting disease progression in HNSCC patients. A number of pathways and gene targets were found to be implicated in cancer progression. Use of this model would facilitate development of more individualized therapy for HNSCC patients and improve prognosis.
Affiliation(s)
- Zhen Zhao
  - Department of Otolaryngology, Nanjing First Hospital, Nanjing Medical University, Nanjing, Jiangsu 210006, China
- Yingli Li
  - Department of Plastic Surgery, The 960th Hospital of the PLA Joint Logistic Support Force, Jinan, Shandong 250000, China
  - Department of Otolaryngology, Nanjing First Hospital, Nanjing Medical University, Nanjing, Jiangsu 210006, China
- Yuanqing Wu
  - Department of Otolaryngology, Nanjing First Hospital, Nanjing Medical University, Nanjing, Jiangsu 210006, China
- Rongrong Chen
  - Department of Otolaryngology, Nanjing First Hospital, Nanjing Medical University, Nanjing, Jiangsu 210006, China
|
12
|
Wu G, Zhang M. A novel risk score model based on eight genes and a nomogram for predicting overall survival of patients with osteosarcoma. BMC Cancer 2020; 20:456. [PMID: 32448271 PMCID: PMC7245838 DOI: 10.1186/s12885-020-06741-4] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 11/18/2019] [Accepted: 03/12/2020] [Indexed: 12/19/2022] Open
Abstract
BACKGROUND This study aims to develop a model to predict survival outcomes of osteosarcoma (OS) patients. METHODS An RNA sequencing dataset (the training set) and a microarray dataset (the validation set) were obtained from The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) database, respectively. Differentially expressed genes (DEGs) between metastatic and non-metastatic OS samples were identified in the training set. Prognosis-related DEGs were screened and optimized by support vector machine (SVM) recursive feature elimination. An SVM classifier was built to classify metastatic and non-metastatic OS samples. Independent prognostic genes were extracted by multivariate regression analysis to build a risk score (RS) model, followed by performance evaluation in the two datasets by Kaplan-Meier (KM) analysis. Independent clinical prognostic indicators were identified, followed by nomogram analysis. Finally, functional analyses of survival-related genes were conducted. RESULT In total, 345 DEGs and 45 prognosis-related genes were screened. An SVM classifier could distinguish metastatic and non-metastatic OS samples. An eight-gene signature was an independent prognostic marker and was used for constructing a risk score model. The risk score model could separate OS samples into high and low risk groups in the two datasets (training set: log-rank p < 0.01, C-index = 0.805; validation set: log-rank p < 0.01, C-index = 0.797). Tumor metastasis and RS model status were independent prognostic factors, and the nomogram model exhibited accurate survival prediction for OS. Additionally, functional analyses of survival-related genes indicated they were closely associated with immune responses and the cytokine-cytokine receptor interaction pathway. CONCLUSION An eight-gene predictive model and nomogram were developed to predict OS prognosis.
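A risk score model of the kind described in this abstract can be sketched as follows. This is only a minimal illustration, assuming the score is a linear combination of gene expression values weighted by regression coefficients and that samples are split into high and low risk groups at the median score. The gene names, coefficients, expression values and the median cutoff are hypothetical and are not taken from the study.

```python
from statistics import median
from typing import Dict, List

def risk_scores(samples: List[Dict[str, float]],
                coefficients: Dict[str, float]) -> List[float]:
    """Linear risk score per sample: sum over genes of coefficient * expression."""
    return [sum(coef * sample[gene] for gene, coef in coefficients.items())
            for sample in samples]

def split_by_median(scores: List[float]) -> List[str]:
    """Label each sample 'high' if its score exceeds the median, else 'low'.

    The median cutoff is an assumption made for this sketch.
    """
    cutoff = median(scores)
    return ["high" if score > cutoff else "low" for score in scores]

# Hypothetical two-gene signature and four samples (illustrative values only).
example_coefficients = {"GENE_A": 0.5, "GENE_B": -1.0}
example_samples = [
    {"GENE_A": 2.0, "GENE_B": 1.0},
    {"GENE_A": 4.0, "GENE_B": 0.0},
    {"GENE_A": 0.0, "GENE_B": 2.0},
    {"GENE_A": 2.0, "GENE_B": 0.0},
]
example_scores = risk_scores(example_samples, example_coefficients)
example_groups = split_by_median(example_scores)
```

The resulting group labels would then typically be compared with a Kaplan-Meier analysis and a log-rank test, which are omitted here.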
Affiliation(s)
- Guangzhi Wu
  - Departments of Hand Surgery, The Third Hospital of Jilin University, Changchun, Jilin Province China
- Minglei Zhang
  - Departments of Orthopedics, The Third Hospital of Jilin University, Changchun, Jilin Province China
|
13
|
Zheng X, Amos CI, Frost HR. Comparison of pathway and gene-level models for cancer prognosis prediction. BMC Bioinformatics 2020; 21:76. [PMID: 32111152 PMCID: PMC7048092 DOI: 10.1186/s12859-020-3423-z] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 09/24/2019] [Accepted: 02/17/2020] [Indexed: 02/08/2023] Open
Abstract
BACKGROUND Cancer prognosis prediction is valuable for patients and clinicians because it allows them to appropriately manage care. A promising direction for improving the performance and interpretation of expression-based predictive models involves the aggregation of gene-level data into biological pathways. While many studies have used pathway-level predictors for cancer survival analysis, a comprehensive comparison of pathway-level and gene-level prognostic models has not been performed. To address this gap, we characterized the performance of penalized Cox proportional hazard models built using either pathway- or gene-level predictors for the cancers profiled in The Cancer Genome Atlas (TCGA) and pathways from the Molecular Signatures Database (MSigDB). RESULTS When analyzing TCGA data, we found that pathway-level models are more parsimonious, more robust, more computationally efficient and easier to interpret than gene-level models with similar predictive performance. For example, both pathway-level and gene-level models have an average Cox concordance index of ~ 0.85 for the TCGA glioma cohort, however, the gene-level model has twice as many predictors on average, the predictor composition is less stable across cross-validation folds and estimation takes 40 times as long as compared to the pathway-level model. When the complex correlation structure of the data is broken by permutation, the pathway-level model has greater predictive performance while still retaining superior interpretative power, robustness, parsimony and computational efficiency relative to the gene-level models. For example, the average concordance index of the pathway-level model increases to 0.88 while the gene-level model falls to 0.56 for the TCGA glioma cohort using survival times simulated from uncorrelated gene expression data. 
CONCLUSION The results of this study show that when the correlations among gene expression values are low, pathway-level analyses can yield better predictive performance, greater interpretative power, more robust models and less computational cost relative to a gene-level model. When correlations among genes are high, a pathway-level analysis provides equivalent predictive power compared to a gene-level analysis while retaining the advantages of interpretability, robustness and computational efficiency.
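The concordance index used above to compare pathway-level and gene-level models can be illustrated with a short sketch. This is a plain implementation of Harrell's C-index for right-censored data, written for illustration only; it is not the code used in the study, and the function name and argument layout are assumptions.

```python
from typing import Sequence

def concordance_index(times: Sequence[float],
                      events: Sequence[int],
                      risks: Sequence[float]) -> float:
    """Harrell's C-index for right-censored survival data.

    A pair (i, j) is comparable when subject i had an observed event
    (events[i] == 1) strictly before time j. The pair is concordant when
    the subject who failed earlier also has the higher predicted risk.
    Ties in predicted risk count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs; C-index is undefined")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why the drop of the gene-level model to 0.56 on the permuted data reported above indicates near-random discrimination.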
Affiliation(s)
- Xingyu Zheng
  - Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College, Hanover, NH 03755 USA
- Christopher I. Amos
  - Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College, Hanover, NH 03755 USA
  - Department of Medicine, Baylor College of Medicine, Institute for Clinical and Translational Research, 1 Baylor Plaza, Houston, TX 77030 USA
- H. Robert Frost
  - Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College, Hanover, NH 03755 USA
|
14
|
Yi B, Tang C, Tao Y, Zhao Z. Definition of a novel vascular invasion-associated multi-gene signature for predicting survival in patients with hepatocellular carcinoma. Oncol Lett 2020; 19:147-158. [PMID: 31897125 PMCID: PMC6923904 DOI: 10.3892/ol.2019.11072] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 01/05/2019] [Accepted: 09/11/2019] [Indexed: 12/12/2022] Open
Abstract
The aim of the present study was to identify a vascular invasion-associated gene signature for predicting prognosis in patients with hepatocellular carcinoma (HCC). Using RNA-sequencing data of 292 HCC samples from The Cancer Genome Atlas (TCGA), the present study screened differentially expressed genes (DEGs) between patients with and without vascular invasion. Feature genes were selected from the DEGs by the support vector machine (SVM)-based recursive feature elimination (RFE-SVM) algorithm to build a classifier. A multi-gene signature was selected by an L1 penalized (LASSO) Cox proportional hazards (PH) regression model from the feature genes selected by RFE-SVM to develop a prognostic scoring model. The TCGA set was defined as the training set and was divided by the gene signature into a high-risk group and a low-risk group. Involvement of the DEGs between the two risk groups in pathways was also investigated. A total of 175 DEGs were identified between training-set patients with and without vascular invasion. A classification model of 42 genes performed well in differentiating patients with and without vascular invasion in the training set and the validation set. A 14-gene prognostic model was built that could divide the training set or the validation set into two risk groups with significantly different survival outcomes. A total of 762 DEGs between the two risk groups of the training set were found to be significantly associated with a number of signaling pathways. The present study provided a 42-gene classifier for predicting vascular invasion, and identified a vascular invasion-associated 14-gene signature for predicting prognosis in patients with HCC. Several genes and pathways in HCC development were characterized and may be potential therapeutic targets for this type of cancer.
Affiliation(s)
- Bo Yi
  - Department of Hepatobiliary Surgery, Zhu Zhou Central Hospital, Zhuzhou, Hunan 412007, P.R. China
- Caixi Tang
  - Department of Hepatobiliary Surgery, Zhu Zhou Central Hospital, Zhuzhou, Hunan 412007, P.R. China
- Yin Tao
  - Department of Hepatobiliary Surgery, Zhu Zhou Central Hospital, Zhuzhou, Hunan 412007, P.R. China
- Zhijian Zhao
  - Department of Hepatobiliary Surgery, Zhu Zhou Central Hospital, Zhuzhou, Hunan 412007, P.R. China
|
15
|
Fang J. Tightly integrated genomic and epigenomic data mining using tensor decomposition. Bioinformatics 2019; 35:112-118. [PMID: 29939222 DOI: 10.1093/bioinformatics/bty513] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 01/17/2018] [Accepted: 06/21/2018] [Indexed: 12/12/2022] Open
Abstract
Motivation Complex diseases such as cancers often involve multiple types of genomic and/or epigenomic abnormalities. Rapid accumulation of multiple types of omics data demands methods for integrating the multidimensional data in order to elucidate complex relationships among different types of genomic and epigenomic abnormalities. Results In the present study, we propose a tightly integrated approach based on tensor decomposition. Multiple types of data, including mRNA, methylation, copy number variations and somatic mutations, are merged into a high-order tensor which is used to develop predictive models for overall survival. The weight tensors of the models are constrained using CANDECOMP/PARAFAC (CP) tensor decomposition and learned using support tensor machine regression (STR) and ridge tensor regression (RTR). The results demonstrate that the tensor decomposition-based approaches can achieve better performance than models based on individual data types and the concatenation approach. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jianwen Fang
  - Computational & Systems Biology Branch, Biometric Research Program, Division of Cancer Treatment and Diagnosis, National Cancer Institute, 9609 Medical Center Dr., Rockville, MD, USA
|
16
|
Quinn TP, Lee SC, Venkatesh S, Nguyen T. Improving the classification of neuropsychiatric conditions using gene ontology terms as features. Am J Med Genet B Neuropsychiatr Genet 2019; 180:508-518. [PMID: 31025483 DOI: 10.1002/ajmg.b.32727] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 09/22/2018] [Revised: 02/14/2019] [Accepted: 03/08/2019] [Indexed: 11/11/2022]
Abstract
Although neuropsychiatric disorders have an established genetic background, their molecular foundations remain elusive. This has prompted many investigators to search for explanatory biomarkers that can predict clinical outcomes. One approach uses machine learning to classify patients based on blood mRNA expression. However, these endeavors typically fail to achieve the high level of performance, stability, and generalizability required for clinical translation. Moreover, these classifiers can lack interpretability because not all genes have relevance to researchers. For this study, we hypothesized that annotation-based classifiers can improve classification performance, stability, generalizability, and interpretability. To this end, we evaluated the models of four classification algorithms on six neuropsychiatric data sets using four annotation databases. Our results suggest that the Gene Ontology Biological Process database can transform gene expression into an annotation-based feature space that is accurate and stable. We also show how annotation features can improve the interpretability of classifiers: as annotations are used to assign biological importance to genes, the biological importance of annotation-based features is the features themselves. In evaluating the annotation features, we find that top ranked annotations tend to contain top ranked genes, suggesting that the most predictive annotations are a superset of the most predictive genes. Based on this, and the fact that annotations are used routinely to assign biological importance to genetic data, we recommend transforming gene-level expression into annotation-level expression prior to the classification of neuropsychiatric conditions.
Affiliation(s)
- Thomas P Quinn
  - Centre for Pattern Recognition and Data Analytics (PRaDA), Deakin University, Geelong, Victoria, Australia
  - Centre for Molecular and Medical Research, Deakin University, Geelong, Victoria, Australia
  - Bioinformatics Core Research Group, Deakin University, Geelong, Victoria, Australia
- Samuel C Lee
  - Centre for Pattern Recognition and Data Analytics (PRaDA), Deakin University, Geelong, Victoria, Australia
- Svetha Venkatesh
  - Centre for Pattern Recognition and Data Analytics (PRaDA), Deakin University, Geelong, Victoria, Australia
- Thin Nguyen
  - Centre for Pattern Recognition and Data Analytics (PRaDA), Deakin University, Geelong, Victoria, Australia
|
17
|
Abstract
MOTIVATION Estimating the future course of patients with cancer lesions is invaluable to physicians; however, current clinical methods fail to effectively use the vast amount of multimodal data that is available for cancer patients. To tackle this problem, we constructed a multimodal neural network-based model to predict the survival of patients for 20 different cancer types using clinical data, mRNA expression data, microRNA expression data and histopathology whole slide images (WSIs). We developed an unsupervised encoder to compress these four data modalities into a single feature vector for each patient, handling missing data through a resilient, multimodal dropout method. Encoding methods were tailored to each data type, using deep highway networks to extract features from clinical and genomic data, and convolutional neural networks to extract features from WSIs. RESULTS We used pancancer data to train these feature encodings and predict single cancer and pancancer overall survival, achieving a C-index of 0.78 overall. This work shows that it is possible to build a pancancer model for prognosis that also predicts prognosis in single cancer sites. Furthermore, our model handles multiple data modalities, efficiently analyzes WSIs and flexibly encodes patient multimodal data into an unsupervised, informative representation. We thus present a powerful automated tool to accurately determine prognosis, a key step towards personalized treatment for cancer patients. AVAILABILITY AND IMPLEMENTATION https://github.com/gevaertlab/MultimodalPrognosis.
Affiliation(s)
- Olivier Gevaert
  - Department of Medicine and Biomedical Data Science, Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA, USA
|
18
|
Dereli O, Oğuz C, Gönen M. Path2Surv: Pathway/gene set-based survival analysis using multiple kernel learning. Bioinformatics 2019; 35:5137-5145. [DOI: 10.1093/bioinformatics/btz446] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 11/02/2018] [Revised: 05/17/2019] [Accepted: 05/25/2019] [Indexed: 12/18/2022] Open
Abstract
Motivation
Survival analysis methods that integrate pathways/gene sets into their learning model could identify molecular mechanisms that determine survival characteristics of patients. Rather than first picking the predictive pathways/gene sets from a given collection and then training a predictive model on the subset of genomic features mapped to these selected pathways/gene sets, we developed a novel machine learning algorithm (Path2Surv) that conjointly performs these two steps using multiple kernel learning.
Results
We extensively tested our Path2Surv algorithm on 7655 patients from 20 cancer types using cancer-specific pathway/gene set collections and gene expression profiles of these patients. Path2Surv statistically significantly outperformed survival random forest (RF) on 12 out of 20 datasets and obtained comparable predictive performance against survival support vector machine (SVM) using significantly fewer gene expression features (i.e. less than 10% of what survival RF and survival SVM used).
Availability and implementation
Our implementations of survival SVM and Path2Surv algorithms in R are available at https://github.com/mehmetgonen/path2surv together with the scripts that replicate the reported experiments.
Supplementary information
Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Onur Dereli
  - Graduate School of Sciences and Engineering, İstanbul 34450, Turkey
- Ceyda Oğuz
  - Department of Industrial Engineering, College of Engineering, İstanbul 34450, Turkey
- Mehmet Gönen
  - Department of Industrial Engineering, College of Engineering, İstanbul 34450, Turkey
  - School of Medicine, Koç University, İstanbul 34450, Turkey
  - Department of Biomedical Engineering, School of Medicine, Oregon Health & Science University, Portland, OR 97239, USA
|
19
|
Zhang Y, Yang W, Li D, Yang JY, Guan R, Yang MQ. Toward the precision breast cancer survival prediction utilizing combined whole genome-wide expression and somatic mutation analysis. BMC Med Genomics 2018; 11:104. [PMID: 30454048 DOI: 10.1109/bibm.2017.8217762] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 05/23/2023] Open
Abstract
BACKGROUND Breast cancer is the most common type of invasive cancer in women. It accounts for approximately 18% of all cancer deaths worldwide. It is well known that somatic mutation plays an essential role in cancer development. Hence, we propose that a prognostic prediction model that integrates somatic mutations with gene expression can improve survival prediction for cancer patients and also reveal the genetic mutations associated with survival. METHOD Differential expression analysis was used to identify breast cancer related genes. A genetic algorithm (GA) and univariate Cox regression analysis were applied to filter out survival related genes. DAVID was used for enrichment analysis on the somatic mutated gene set. The performance of survival predictors was assessed by a Cox regression model and the concordance index (C-index). RESULTS We investigated the genome-wide gene expression profile and somatic mutations of 1091 breast invasive carcinoma cases from The Cancer Genome Atlas (TCGA). We identified 118 genes with high hazard ratios as breast cancer survival risk gene candidates (log rank p < 0.0001 and c-index = 0.636). Multiple breast cancer survival related genes were found in this gene set, including FOXR2, FOXD1, MTNR1B and SDC1. Further application of the genetic algorithm (GA) revealed an optimal gene set consisting of 88 genes with a higher c-index (log rank p < 0.0001 and c-index = 0.656). We validated this gene set on an independent breast cancer data set and achieved a similar performance (log rank p < 0.0001 and c-index = 0.614). Moreover, we revealed 25 functional annotations, 15 gene ontology terms and 14 pathways that were significantly enriched in the genes that showed distinct mutation patterns in the different survival risk groups. These functional gene sets were used as new features for the survival prediction model. In particular, our results suggested that the Fanconi anemia pathway had an important role in breast cancer prognosis.
CONCLUSIONS Our study indicated that the expression levels of the gene signatures remain the effective indicators for breast cancer survival prediction. Combining the gene expression information with other types of features derived from somatic mutations can further improve the performance of survival prediction. The pathways that were associated with survival risk suggested by our study can be further investigated for improving cancer patient survival.
Affiliation(s)
- Yifan Zhang
  - MidSouth Bioinformatics Center and Joint Bioinformatics Ph.D. Program of University of Arkansas at Little Rock and Univ. of Arkansas Medical Sciences, 2801 S. Univ. Ave, Little Rock, 72204, USA
- William Yang
  - Department of Computer Science, Carnegie Mellon University School of Computer Science, 5000 Forbes Ave, Pittsburgh, 24105, USA
- Dan Li
  - MidSouth Bioinformatics Center and Joint Bioinformatics Ph.D. Program of University of Arkansas at Little Rock and Univ. of Arkansas Medical Sciences, 2801 S. Univ. Ave, Little Rock, 72204, USA
- Jack Y Yang
  - MidSouth Bioinformatics Center and Joint Bioinformatics Ph.D. Program of University of Arkansas at Little Rock and Univ. of Arkansas Medical Sciences, 2801 S. Univ. Ave, Little Rock, 72204, USA
- Renchu Guan
  - MidSouth Bioinformatics Center and Joint Bioinformatics Ph.D. Program of University of Arkansas at Little Rock and Univ. of Arkansas Medical Sciences, 2801 S. Univ. Ave, Little Rock, 72204, USA
- Mary Qu Yang
  - MidSouth Bioinformatics Center and Joint Bioinformatics Ph.D. Program of University of Arkansas at Little Rock and Univ. of Arkansas Medical Sciences, 2801 S. Univ. Ave, Little Rock, 72204, USA
|
20
|
Zhang Y, Yang W, Li D, Yang JY, Guan R, Yang MQ. Toward the precision breast cancer survival prediction utilizing combined whole genome-wide expression and somatic mutation analysis. BMC Med Genomics 2018; 11:104. [PMID: 30454048 PMCID: PMC6245494 DOI: 10.1186/s12920-018-0419-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 12/17/2022] Open
Abstract
Background Breast cancer is the most common type of invasive cancer in women. It accounts for approximately 18% of all cancer deaths worldwide. It is well known that somatic mutation plays an essential role in cancer development. Hence, we propose that a prognostic prediction model that integrates somatic mutations with gene expression can improve survival prediction for cancer patients and also reveal the genetic mutations associated with survival. Method Differential expression analysis was used to identify breast cancer related genes. A genetic algorithm (GA) and univariate Cox regression analysis were applied to filter out survival related genes. DAVID was used for enrichment analysis on the somatic mutated gene set. The performance of survival predictors was assessed by a Cox regression model and the concordance index (C-index). Results We investigated the genome-wide gene expression profile and somatic mutations of 1091 breast invasive carcinoma cases from The Cancer Genome Atlas (TCGA). We identified 118 genes with high hazard ratios as breast cancer survival risk gene candidates (log rank p < 0.0001 and c-index = 0.636). Multiple breast cancer survival related genes were found in this gene set, including FOXR2, FOXD1, MTNR1B and SDC1. Further application of the genetic algorithm (GA) revealed an optimal gene set consisting of 88 genes with a higher c-index (log rank p < 0.0001 and c-index = 0.656). We validated this gene set on an independent breast cancer data set and achieved a similar performance (log rank p < 0.0001 and c-index = 0.614). Moreover, we revealed 25 functional annotations, 15 gene ontology terms and 14 pathways that were significantly enriched in the genes that showed distinct mutation patterns in the different survival risk groups. These functional gene sets were used as new features for the survival prediction model. In particular, our results suggested that the Fanconi anemia pathway had an important role in breast cancer prognosis.
Conclusions Our study indicated that the expression levels of the gene signatures remain the effective indicators for breast cancer survival prediction. Combining the gene expression information with other types of features derived from somatic mutations can further improve the performance of survival prediction. The pathways that were associated with survival risk suggested by our study can be further investigated for improving cancer patient survival. Electronic supplementary material The online version of this article (10.1186/s12920-018-0419-x) contains supplementary material, which is available to authorized users.
Collapse
Affiliation(s)
- Yifan Zhang
- MidSouth Bioinformatics Center and Joint Bioinformatics Ph.D. Program of University of Arkansas at Little Rock and Univ. of Arkansas Medical Sciences, 2801 S. Univ. Ave, Little Rock, 72204, USA
| | - William Yang
- Department of Computer Science, Carnegie Mellon University School of Computer Science, 5000 Forbes Ave, Pittsburgh, 24105, USA
| | - Dan Li
- MidSouth Bioinformatics Center and Joint Bioinformatics Ph.D. Program of University of Arkansas at Little Rock and Univ. of Arkansas Medical Sciences, 2801 S. Univ. Ave, Little Rock, 72204, USA
| | - Jack Y Yang
- MidSouth Bioinformatics Center and Joint Bioinformatics Ph.D. Program of University of Arkansas at Little Rock and Univ. of Arkansas Medical Sciences, 2801 S. Univ. Ave, Little Rock, 72204, USA
| | - Renchu Guan
- MidSouth Bioinformatics Center and Joint Bioinformatics Ph.D. Program of University of Arkansas at Little Rock and Univ. of Arkansas Medical Sciences, 2801 S. Univ. Ave, Little Rock, 72204, USA
| | - Mary Qu Yang
- MidSouth Bioinformatics Center and Joint Bioinformatics Ph.D. Program of University of Arkansas at Little Rock and Univ. of Arkansas Medical Sciences, 2801 S. Univ. Ave, Little Rock, 72204, USA.
|
21
|
Zhang X, Li B, Han H, Song S, Xu H, Yi Z, Hong Y, Zhuang W, Yi N. Pathway-structured predictive modeling for multi-level drug response in multiple myeloma. Bioinformatics 2018; 34:3609-3615. [PMID: 29850860 PMCID: PMC6198861 DOI: 10.1093/bioinformatics/bty436] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 03/28/2018] [Revised: 05/08/2018] [Accepted: 05/24/2018] [Indexed: 11/12/2022] Open
Abstract
Motivation Molecular analyses suggest that myeloma is composed of distinct sub-types that have different molecular pathologies and various response rates to certain treatments. Drug responses in multiple myeloma (MM) are usually recorded as a multi-level ordinal outcome. One of the goals of drug response studies is to predict, with high probability, which response category a patient belongs to based on clinical and molecular features. However, as most genes have small effects, gene-based models may provide limited predictive accuracy. In that case, methods for predicting multi-level ordinal drug responses by incorporating biological pathways are desired but have not been developed yet. Results We propose a pathway-structured method for predicting multi-level ordinal responses using a two-stage approach. We first develop hierarchical ordinal logistic models and an efficient quasi-Newton algorithm for jointly analyzing numerous correlated variables. Our two-stage approach first obtains the linear predictor (called the pathway score) for each pathway by fitting all predictors within each pathway using the hierarchical ordinal logistic approach, and then combines the pathway scores as new predictors to build a predictive model. We applied the proposed method to two publicly available datasets for predicting multi-level ordinal drug responses in MM using large-scale gene expression data and pathway information. Our results show that our approach not only significantly improved the predictive performance compared with the corresponding gene-based model but also allowed us to identify biologically relevant pathways. Availability and implementation The proposed approach has been implemented in our R package BhGLM, which is freely available from the public GitHub repository https://github.com/abbyyan3/BhGLM.
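The data flow of the two-stage approach described in this abstract can be sketched as follows. This is only an illustration, not the BhGLM implementation: the model fitting is omitted, so the per-gene coefficients and stage-two weights are hypothetical placeholders, and the helper names `pathway_scores` and `combine_scores` are assumptions.

```python
def pathway_scores(expression, pathways, coefs):
    """Stage 1: one linear predictor (pathway score) per sample and pathway.

    expression -- list of dicts mapping gene name to expression value
    pathways   -- dict mapping pathway name to its list of genes
    coefs      -- dict mapping pathway name to {gene: coefficient};
                  in the paper these would come from a fitted
                  hierarchical ordinal logistic model (not shown here)
    """
    scores = []
    for sample in expression:
        row = []
        for name, genes in pathways.items():
            # Linear predictor restricted to the genes of this pathway.
            row.append(sum(coefs[name][g] * sample[g] for g in genes))
        scores.append(row)
    return scores


def combine_scores(scores, weights):
    """Stage 2: treat pathway scores as new predictors and combine them."""
    return [sum(w * s for w, s in zip(weights, row)) for row in scores]


# Hypothetical toy input: one sample, two overlapping pathways.
expression = [{"g1": 1.0, "g2": 2.0, "g3": 0.5}]
pathways = {"A": ["g1", "g2"], "B": ["g2", "g3"]}
coefs = {"A": {"g1": 0.5, "g2": -1.0}, "B": {"g2": 0.25, "g3": 2.0}}

stage1 = pathway_scores(expression, pathways, coefs)
stage2 = combine_scores(stage1, [1.0, 2.0])
```

The point of the design is dimensionality: stage two works on one column per pathway rather than one column per gene, so a gene shared by several pathways (here `g2`) contributes through each pathway score it belongs to.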
Affiliation(s)
- Xinyan Zhang
- Department of Biostatistics, Jiann-Ping Hsu College of Public Health, Georgia Southern University, Statesboro, GA, USA
| | - Bingzong Li
- Department of Hematology, The Second Affiliated Hospital of Soochow University, Suzhou, China
| | - Huiying Han
- Department of Cell Biology, School of Biology and Basic Medical Sciences, Soochow University, Suzhou, China
| | - Sha Song
- Department of Cell Biology, School of Biology and Basic Medical Sciences, Soochow University, Suzhou, China
| | - Hongxia Xu
- Department of Cell Biology, School of Biology and Basic Medical Sciences, Soochow University, Suzhou, China
| | - Zixuan Yi
- School of Medicine, Eastern Virginia Medical School, Norfolk, VA, USA
| | - Yating Hong
- Department of Hematology, The Second Affiliated Hospital of Soochow University, Suzhou, China
| | - Wenzhuo Zhuang
- Department of Cell Biology, School of Biology and Basic Medical Sciences, Soochow University, Suzhou, China
| | - Nengjun Yi
- Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL, USA
|
22
|
Wang JH, Chen YH. Overlapping group screening for detection of gene-gene interactions: application to gene expression profiles with survival trait. BMC Bioinformatics 2018; 19:335. [PMID: 30241463 PMCID: PMC6150983 DOI: 10.1186/s12859-018-2372-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 03/29/2018] [Accepted: 09/12/2018] [Indexed: 01/29/2023] Open
Abstract
Background The development of a disease is a complex process that may result from joint effects of multiple genes. In this article, we propose the overlapping group screening (OGS) approach to determining active genes and gene-gene interactions incorporating prior pathway information. The OGS method is developed to overcome two challenges in genome-wide data analysis: the number of genes and gene-gene interactions is far greater than the sample size, and the pathways generally overlap with one another. The OGS method is further proposed for patients’ survival prediction based on gene expression data. Results Simulation studies demonstrate that the OGS approach performs well in identifying the true main and interaction effects and that the survival prediction accuracy of OGS with the Lasso penalty is better than that of the ordinary Lasso method. In real data analysis, we identify several significant genes and/or epistasis interactions that are associated with clinical survival outcomes of diffuse large B-cell lymphoma (DLBCL) and non-small-cell lung cancer (NSCLC) by utilizing prior pathway information from the KEGG pathway and the GO biological process databases, respectively. Conclusions The OGS approach is useful for selecting important genes and epistasis interactions in the ultra-high dimensional feature space. The prediction ability of OGS with the Lasso penalty is better than that of existing methods. The OGS approach is generally applicable to various types of outcome data (quantitative, qualitative, censored event time data) and regression models (e.g. linear, logistic, and Cox’s regression models). Electronic supplementary material The online version of this article (10.1186/s12859-018-2372-2) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Jie-Huei Wang
- Institute of Statistical Science, Academia Sinica, Nankang, Taipei, Taiwan
| | - Yi-Hau Chen
- Institute of Statistical Science, Academia Sinica, Nankang, Taipei, Taiwan.
|
23
|
Ozturk K, Dow M, Carlin DE, Bejar R, Carter H. The Emerging Potential for Network Analysis to Inform Precision Cancer Medicine. J Mol Biol 2018; 430:2875-2899. [PMID: 29908887 PMCID: PMC6097914 DOI: 10.1016/j.jmb.2018.06.016] [Citation(s) in RCA: 54] [Impact Index Per Article: 9.0] [Received: 03/12/2018] [Revised: 05/30/2018] [Accepted: 06/06/2018] [Indexed: 12/19/2022]
Abstract
Precision cancer medicine promises to tailor clinical decisions to patients using genomic information. Indeed, successes of drugs targeting genetic alterations in tumors, such as imatinib, which targets BCR-ABL in chronic myelogenous leukemia, have demonstrated the power of this approach. However, biological systems are complex, and patients may differ not only by the specific genetic alterations in their tumor, but also by more subtle interactions among such alterations. Systems biology, and more specifically network analysis, provides a framework for advancing precision medicine beyond clinical actionability of individual mutations. Here we discuss applications of network analysis to study tumor biology, early methods for N-of-1 tumor genome analysis, and the path for such tools to the clinic.
Affiliation(s)
- Kivilcim Ozturk
- Department of Medicine, Division of Medical Genetics, University of California San Diego, La Jolla, CA 92093, USA; Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA 92093, USA
| | - Michelle Dow
- Department of Medicine, Division of Medical Genetics, University of California San Diego, La Jolla, CA 92093, USA; Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA 92093, USA
| | - Daniel E Carlin
- Department of Medicine, Division of Medical Genetics, University of California San Diego, La Jolla, CA 92093, USA
| | - Rafael Bejar
- Moores Cancer Center, Division of Hematology and Oncology, University of California San Diego, La Jolla, CA 92093, USA
| | - Hannah Carter
- Department of Medicine, Division of Medical Genetics, University of California San Diego, La Jolla, CA 92093, USA; Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA 92093, USA; Moores Cancer Center and Institute for Genomic Medicine, University of California San Diego, La Jolla, CA 92093, USA; CIFAR, MaRS Centre, West Tower, 661 University Ave., Suite 505, Toronto, ON M5G 1M1, Canada.
|
24
|
Yi N, Tang Z, Zhang X, Guo B. BhGLM: Bayesian hierarchical GLMs and survival models, with applications to genomics and epidemiology. Bioinformatics 2018; 35:1419-1421. [PMID: 30219850 PMCID: PMC7963076 DOI: 10.1093/bioinformatics/bty803] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Received: 07/25/2018] [Revised: 09/05/2018] [Accepted: 09/12/2018] [Indexed: 01/31/2023] Open
Abstract
SUMMARY BhGLM is a freely available R package that implements Bayesian hierarchical modeling for high-dimensional clinical and genomic data. It consists of functions for setting up various Bayesian hierarchical models, including generalized linear models (GLMs) and Cox survival models, with four types of prior distributions for coefficients, i.e. double-exponential, Student-t, mixture double-exponential and mixture Student-t. These functions adapt fast and stable algorithms to estimate parameters. BhGLM also provides functions for summarizing results numerically and graphically and for evaluating predictive values. The package is particularly useful for analyzing large-scale molecular data, i.e. detecting disease-associated variables and predicting disease outcomes. We here describe the models, algorithms and associated features implemented in BhGLM. AVAILABILITY AND IMPLEMENTATION The package is freely available from the public GitHub repository, https://github.com/nyiuab/BhGLM.
Affiliation(s)
- Nengjun Yi
- To whom correspondence should be addressed.
| | - Zaixiang Tang
- Department of Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, China
| | - Xinyan Zhang
- Department of Biostatistics, Jiann-Ping Hsu College of Public Health, Georgia Southern University, Statesboro, GA, USA
| | - Boyi Guo
- Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, Birmingham, AL, USA
|
25
|
Deng K, Zhang F, Song W, Zhao W, Rong Z, Cai Y, Xu H, Lu M, Wang W, Li A, Hou Y, Li Z, Li K. Identification of pathway-based recurrence-associated signatures in optimally debulked patients with serous ovarian cancer. J Cell Biochem 2018; 119:8564-8573. [PMID: 30126000 DOI: 10.1002/jcb.27098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2017] [Accepted: 04/26/2018] [Indexed: 11/06/2022]
Abstract
Serous ovarian cancer (SOC) is the most common histological subtype of epithelial ovarian cancer and has the worst clinical outcome. Despite improvements in surgery and chemotherapy, most patients with SOC experience recurrence within 12-18 months of first-line treatment. Current studies are unable to robustly predict the recurrence of SOC, and more accurate predictive models are urgently required. We have, therefore, developed a novel pathway-structured model to predict the recurrence of SOC. We trained the model on a set of 333 patients and validated it in 3 diversified validation datasets of 403 patients. Genes significantly associated with recurrence within each pathway were identified using a Cox proportional hazards model based on LASSO estimation in the training dataset. Next, a pathway-structured scoring matrix was obtained after computation of the prognostic score for each pathway by fitting to the Cox proportional hazards model. With the pathway-structured scoring matrix as an input, the pathway-based recurrence signatures were identified using the Cox proportional hazards model based on LASSO estimation, and the significant pathway-based signatures were externally validated in 3 independent datasets. Meanwhile, our pathway-structured model was compared with a commonly used gene-based model. Our results revealed that our 12 pathway-based signatures successfully predicted the recurrence of SOC with high accuracy in the training dataset and in the 3 validation datasets. Moreover, our pathway-structured model was superior to the gene-based model in 4 datasets. The pathways selected in our study will provide new insights into the pathogenesis and clinical treatments of SOC.
Affiliation(s)
- Kui Deng
- Department of Epidemiology and Biostatistics, School of Public Health, Harbin Medical University, Harbin, China
| | - Fan Zhang
- Department of Epidemiology and Biostatistics, School of Public Health, Harbin Medical University, Harbin, China
| | - Wei Song
- Department of Epidemiology and Biostatistics, School of Public Health, Harbin Medical University, Harbin, China
| | - Weiwei Zhao
- Department of Epidemiology and Biostatistics, School of Public Health, Harbin Medical University, Harbin, China
| | - Zhiwei Rong
- Department of Epidemiology and Biostatistics, School of Public Health, Harbin Medical University, Harbin, China
| | - Yuqing Cai
- Department of Epidemiology and Biostatistics, School of Public Health, Harbin Medical University, Harbin, China
| | - Huan Xu
- Department of Epidemiology and Biostatistics, School of Public Health, Harbin Medical University, Harbin, China
| | - Mingliang Lu
- Department of Epidemiology and Biostatistics, School of Public Health, Harbin Medical University, Harbin, China
| | - Wenjie Wang
- Department of Epidemiology and Biostatistics, School of Public Health, Harbin Medical University, Harbin, China
| | - Ang Li
- Department of Epidemiology and Biostatistics, School of Public Health, Harbin Medical University, Harbin, China
| | - Yan Hou
- Department of Epidemiology and Biostatistics, School of Public Health, Harbin Medical University, Harbin, China
| | - Zhenzi Li
- Department of Epidemiology and Biostatistics, School of Public Health, Harbin Medical University, Harbin, China
| | - Kang Li
- Department of Epidemiology and Biostatistics, School of Public Health, Harbin Medical University, Harbin, China
|
26
|
Nedungadi P, Iyer A, Gutjahr G, Bhaskar J, Pillai AB. Data-Driven Methods for Advancing Precision Oncology. CURRENT PHARMACOLOGY REPORTS 2018; 4:145-156. [PMID: 33520605 PMCID: PMC7845924 DOI: 10.1007/s40495-018-0127-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2022]
Abstract
PURPOSE OF REVIEW This article discusses the advances, methods, challenges, and future directions of data-driven methods in advancing precision oncology for biomedical research, drug discovery, clinical research, and practice. RECENT FINDINGS Precision oncology provides individually tailored cancer treatment by considering an individual's genetic makeup, clinical, environmental, social, and lifestyle information. Challenges include voluminous, heterogeneous, and disparate data generated by different technologies with multiple modalities such as Omics, electronic health records, clinical registries and repositories, medical imaging, demographics, wearables, and sensors. Statistical and machine learning methods have been continuously adapting to the ever-increasing size and complexity of data. Precision Oncology supportive analytics have improved turnaround time in biomarker discovery and time-to-application of new and repurposed drugs. Precision oncology additionally seeks to identify target patient populations based on genomic alterations that are sensitive or resistant to conventional or experimental treatments. Predictive models have been developed for cancer progression and survivorship, drug sensitivity and resistance, and identification of the most suitable combination treatments for individual patient scenarios. In the future, clinical decision support systems need to be revamped to better incorporate knowledge from precision oncology, thus enabling clinical practitioners to provide precision cancer care. SUMMARY Open Omics datasets, machine learning algorithms, and predictive models have enabled the advancement of precision oncology. Clinical decision support systems with integrated electronic health record and Omics data are needed to provide data-driven recommendations to assist clinicians in disease prevention, early identification, and individualized treatment. 
Additionally, as cancer is a constantly evolving disorder, clinical decision systems will need to be continually updated based on more recent knowledge and datasets.
Affiliation(s)
- Prema Nedungadi
- Center for Research in Analytics & Technology in Education, Amrita Vishwa Vidyapeetham, Amritapuri, Kollam, India
- Department of Computer Science, School of Engineering, Amrita Vishwa Vidyapeetham, Amritapuri, Kollam, India
| | - Akshay Iyer
- Center for Research in Analytics & Technology in Education, Amrita Vishwa Vidyapeetham, Amritapuri, Kollam, India
| | - Georg Gutjahr
- Center for Research in Analytics & Technology in Education, Amrita Vishwa Vidyapeetham, Amritapuri, Kollam, India
| | - Jasmine Bhaskar
- Center for Research in Analytics & Technology in Education, Amrita Vishwa Vidyapeetham, Amritapuri, Kollam, India
- Department of Computer Science, School of Engineering, Amrita Vishwa Vidyapeetham, Amritapuri, Kollam, India
| | - Asha B. Pillai
- Division of Pediatric Hematology/Oncology, Departments of Pediatrics and Microbiology and Immunology, University of Miami Miller School of Medicine, Miami, FL, USA
|
27
|
Chaudhary K, Poirion OB, Lu L, Garmire LX. Deep Learning-Based Multi-Omics Integration Robustly Predicts Survival in Liver Cancer. Clin Cancer Res 2018; 24:1248-1259. [PMID: 28982688 PMCID: PMC6050171 DOI: 10.1158/1078-0432.ccr-17-0853] [Citation(s) in RCA: 490] [Impact Index Per Article: 81.7] [Received: 03/23/2017] [Revised: 06/18/2017] [Accepted: 10/02/2017] [Indexed: 02/07/2023]
Abstract
Identifying robust survival subgroups of hepatocellular carcinoma (HCC) will significantly improve patient care. Currently, efforts to integrate multi-omics data to explicitly predict HCC survival across multiple patient cohorts are lacking. To fill this gap, we present a deep learning (DL)-based model on HCC that robustly differentiates survival subpopulations of patients in six cohorts. We built the DL-based, survival-sensitive model on 360 HCC patients' data using RNA sequencing (RNA-Seq), miRNA sequencing (miRNA-Seq), and methylation data from The Cancer Genome Atlas (TCGA). The model predicts prognosis as well as an alternative model where genomics and clinical data are both considered. This DL-based model provides two optimal subgroups of patients with significant survival differences (P = 7.13e-6) and good model fitness [concordance index (C-index) = 0.68]. The more aggressive subtype is associated with frequent TP53 inactivation mutations, higher expression of stemness markers (KRT19 and EPCAM) and tumor marker BIRC5, and activated Wnt and Akt signaling pathways. We validated this multi-omics model on five external datasets of various omics types: LIRI-JP cohort (n = 230, C-index = 0.75), NCI cohort (n = 221, C-index = 0.67), Chinese cohort (n = 166, C-index = 0.69), E-TABM-36 cohort (n = 40, C-index = 0.77), and Hawaiian cohort (n = 27, C-index = 0.82). This is the first study to employ DL to identify multi-omics features linked to the differential survival of patients with HCC. Given its robustness over multiple cohorts, we expect this workflow to be useful for predicting HCC prognosis. Clin Cancer Res; 24(6); 1248-59. ©2017 AACR.
Affiliation(s)
| | - Olivier B Poirion
- Epidemiology Program, University of Hawaii Cancer Center, Honolulu, Hawaii
| | - Liangqun Lu
- Epidemiology Program, University of Hawaii Cancer Center, Honolulu, Hawaii
- Molecular Biosciences and Bioengineering Graduate Program, University of Hawaii at Manoa, Honolulu, Hawaii
| | - Lana X Garmire
- Epidemiology Program, University of Hawaii Cancer Center, Honolulu, Hawaii.
- Molecular Biosciences and Bioengineering Graduate Program, University of Hawaii at Manoa, Honolulu, Hawaii
|
28
|
Kuznetsov VA, Tang Z, Ivshina AV. Identification of common oncogenic and early developmental pathways in the ovarian carcinomas controlling by distinct prognostically significant microRNA subsets. BMC Genomics 2017; 18:692. [PMID: 28984201 PMCID: PMC5629558 DOI: 10.1186/s12864-017-4027-5] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Indexed: 12/17/2022] Open
Abstract
Background High-grade serous ovarian carcinoma (HG-SOC) is the dominant tumor histologic type in epithelial ovarian cancers, exhibiting highly aberrant microRNA expression profiles and diverse pathways that collectively determine the disease aggressiveness and clinical outcomes. However, the functional relationships between microRNAs, the common pathways controlled by the microRNAs and their prognostic and therapeutic significance remain poorly understood. Methods We investigated the gene expression patterns of microRNAs in the tumors of 582 HG-SOC patients to identify prognosis signatures and pathways controlled by tumor miRNAs. We developed a variable selection and prognostic method, which performs a robust selection of small-sized subsets of the predictive features (e.g., expressed microRNAs) that collectively serve as the biomarkers of a cancer risk and progression stratification system, interconnecting these features with common cancer-related pathways. Results Across different cohorts, our meta-analysis revealed two robust and unbiased miRNA-based prognostic classifiers. Each classifier reproducibly discriminates HG-SOC patients into high-confidence low-, intermediate- or high-prognostic risk subgroups with essentially different 5-year overall survival rates of 51.6-85%, 20-38.1%, and 0-10%, respectively. Significant correlations of the risk subgroup stratification with chemotherapy treatment response were observed. We predicted specific target genes involved in nine cancer-related and two oocyte maturation pathways (neurotrophin and progesterone-mediated oocyte maturation), where each gene can be controlled by more than one miRNA species of the distinct miRNA HG-SOC prognostic classifiers. Conclusions We identified robust and reproducible miRNA-based prognostic subsets of the HG-SOC classifiers. The miRNAs of these classifiers could control nine oncogenic and two developmental pathways, highlighting common underlying pathologic mechanisms and prospective targets for the further development of a personalized prognosis assay(s) and the development of miRNA-interconnected pathway-centric and multi-agent therapeutic intervention. Electronic supplementary material The online version of this article (doi:10.1186/s12864-017-4027-5) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Vladimir A Kuznetsov
- Genome and Gene Expression Data Analysis Division, Bioinformatics Institute, A-STAR, 30 Biopolis Street, #07-01 Matrix, Singapore, 138671, Singapore; School of Computer Science and Engineering, Nanyang Technological University, Singapore, 639798, Singapore.
| | - Zhiqun Tang
- Genome and Gene Expression Data Analysis Division, Bioinformatics Institute, A-STAR, 30 Biopolis Street, #07-01 Matrix, Singapore, 138671, Singapore
| | - Anna V Ivshina
- Genome and Gene Expression Data Analysis Division, Bioinformatics Institute, A-STAR, 30 Biopolis Street, #07-01 Matrix, Singapore, 138671, Singapore
|