1
Shen J, Wang S, Sun H, Huang J, Bai L, Wang X, Dong Y, Tang Z. A novel non-negative Bayesian stacking modeling method for Cancer survival prediction using high-dimensional omics data. BMC Med Res Methodol 2024; 24:105. [PMID: 38702624 PMCID: PMC11067084 DOI: 10.1186/s12874-024-02232-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2023] [Accepted: 04/23/2024] [Indexed: 05/06/2024] Open
Abstract
BACKGROUND Survival prediction using high-dimensional molecular data is a hot topic in the field of genomics and precision medicine, especially for cancer studies. Considering that carcinogenesis has a pathway-based pathogenesis, developing models using such group structures is a closer mimic of disease progression and prognosis. Many approaches can be used to integrate group information; however, most of them are single-model methods, which may account for unstable prediction. METHODS We introduced a novel survival stacking method that uses group structure information to improve the robustness of cancer survival prediction in the context of high-dimensional omics data. With a super learner, survival stacking combines the predictions from multiple sub-models that are independently trained using the features in pre-grouped biological pathways. In addition to a non-negative linear combination of sub-models, we extended the super learner to a non-negative Bayesian hierarchical generalized linear model and an artificial neural network (ANN). We compared the proposed modeling strategy with the widely used survival penalized method Lasso Cox and several group penalized methods, e.g., group Lasso Cox, via a simulation study and a real-world data application. RESULTS The proposed survival stacking method showed superior and robust performance in terms of discrimination compared with single-model methods in the case of high-noise simulated data and real-world data. The non-negative Bayesian stacking method can identify important biological signal pathways and genes that are associated with the prognosis of cancer. CONCLUSIONS This study proposed a novel survival stacking strategy incorporating biological group information into the cancer prognosis models. Additionally, this study extended the super learner to a non-negative Bayesian model and an ANN, enriching the combination of sub-models. The proposed Bayesian stacking strategy exhibited favorable properties in the prediction and interpretation of complex survival data, which may aid in discovering cancer targets.
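As a rough illustration of the "non-negative linear combination of sub-models" mentioned in the abstract, the sketch below fits non-negative stacking weights by projected gradient descent. It is not the authors' implementation: the squared-error loss stands in for a survival loss, the toy data are invented, and the function name `fit_nonneg_stacking` is hypothetical.

```python
def fit_nonneg_stacking(sub_preds, target, lr=0.1, n_iter=2000):
    """Fit weights w >= 0 minimising mean squared error of sum_j w[j] * sub_preds[i][j].

    Assumption: squared-error loss is used only for illustration; a survival
    super learner would optimise a survival-specific loss instead.
    """
    n, k = len(sub_preds), len(sub_preds[0])
    w = [1.0 / k] * k  # start from equal weights
    for _ in range(n_iter):
        # Residual of the current stacked prediction for each sample
        resid = [sum(wj * z for wj, z in zip(w, row)) - y
                 for row, y in zip(sub_preds, target)]
        # Gradient of the mean squared error with respect to each weight
        grad = [2.0 / n * sum(r * row[j] for r, row in zip(resid, sub_preds))
                for j in range(k)]
        # Gradient step followed by projection onto the non-negative orthant
        w = [max(0.0, wj - lr * gj) for wj, gj in zip(w, grad)]
    return w


# Toy data (invented): each row holds predictions from three sub-models.
sub_preds = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [1, 0, 1], [0, 1, 1]]
# The target depends negatively on the third sub-model, so an unconstrained
# fit would give it a negative weight; the constraint clips it to zero.
target = [0.7 * a + 0.3 * b - 0.2 * c for a, b, c in sub_preds]

weights = fit_nonneg_stacking(sub_preds, target)
print([round(w, 3) for w in weights])
```

On this toy input the third sub-model ends up with a weight of zero, which is the practical effect of the non-negativity constraint: sub-models that would otherwise enter with a negative sign are dropped from the ensemble.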
Affiliation(s)
- Junjie Shen
- Department of Biostatistics, School of Public Health, Jiangsu Key Laboratory of Preventive and Translational Medicine for Major Chronic Non-communicable Diseases, MOE Key Laboratory of Geriatric Diseases and Immunology, Suzhou Medical College of Soochow University, Suzhou, Jiangsu, 215123, People's Republic of China
- Shuo Wang
- Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center-University of Freiburg, 79085, Freiburg, Germany
- Hao Sun
- Department of Biostatistics, School of Public Health, Jiangsu Key Laboratory of Preventive and Translational Medicine for Major Chronic Non-communicable Diseases, MOE Key Laboratory of Geriatric Diseases and Immunology, Suzhou Medical College of Soochow University, Suzhou, Jiangsu, 215123, People's Republic of China
- Jie Huang
- Department of Biostatistics, School of Public Health, Jiangsu Key Laboratory of Preventive and Translational Medicine for Major Chronic Non-communicable Diseases, MOE Key Laboratory of Geriatric Diseases and Immunology, Suzhou Medical College of Soochow University, Suzhou, Jiangsu, 215123, People's Republic of China
- Lu Bai
- Department of Biostatistics, School of Public Health, Jiangsu Key Laboratory of Preventive and Translational Medicine for Major Chronic Non-communicable Diseases, MOE Key Laboratory of Geriatric Diseases and Immunology, Suzhou Medical College of Soochow University, Suzhou, Jiangsu, 215123, People's Republic of China
- Xichao Wang
- Department of Biostatistics, School of Public Health, Jiangsu Key Laboratory of Preventive and Translational Medicine for Major Chronic Non-communicable Diseases, MOE Key Laboratory of Geriatric Diseases and Immunology, Suzhou Medical College of Soochow University, Suzhou, Jiangsu, 215123, People's Republic of China
- Yongfei Dong
- Department of Biostatistics, School of Public Health, Jiangsu Key Laboratory of Preventive and Translational Medicine for Major Chronic Non-communicable Diseases, MOE Key Laboratory of Geriatric Diseases and Immunology, Suzhou Medical College of Soochow University, Suzhou, Jiangsu, 215123, People's Republic of China
- Zaixiang Tang
- Department of Biostatistics, School of Public Health, Jiangsu Key Laboratory of Preventive and Translational Medicine for Major Chronic Non-communicable Diseases, MOE Key Laboratory of Geriatric Diseases and Immunology, Suzhou Medical College of Soochow University, Suzhou, Jiangsu, 215123, People's Republic of China.
2
Vashistha R, Noor Z, Dasgupta S, Pu J, Deng S. Application of statistical machine learning in biomarker selection. Sci Rep 2023; 13:18331. [PMID: 37884606 PMCID: PMC10603146 DOI: 10.1038/s41598-023-45323-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2023] [Accepted: 10/18/2023] [Indexed: 10/28/2023] Open
Abstract
In the recent JAVELIN Bladder 100 phase 3 trial, avelumab plus best supportive care significantly prolonged overall survival relative to best supportive care alone as first-line maintenance therapy following first-line platinum-based chemotherapy in patients with advanced urothelial cancer (aUC). Discovering biomarkers using genomic profiling to understand potential patient heterogeneity is essential to help improve patient care with precision medicine. For the JAVELIN Bladder 100 trial, it is unclear which variable selection methods can most reliably identify biomarkers to inform patient care because the dataset is characterized by high collinearity and low signal. The aim of this paper was to evaluate available selection methods and their ability to discover prognostic and predictive biomarkers in patients with aUC receiving first-line maintenance therapy. A simulation study evaluated the performance of popular variable selection approaches for high-dimensional data, including penalized regression models, random survival forests, and Bayesian variable selection methods. For Bayesian variable selection methods, a modified Bayesian Information Criterion (BIC) thresholding rule was proposed in addition to the traditional BIC thresholding rule. These methods were applied to the JAVELIN Bladder 100 dataset to investigate potential biomarkers associated with survival benefit. Results from the simulations demonstrated the strengths and limitations of the different methods. The variable selection methods demonstrated low false discovery rates under different conditions. However, their performance declined in the presence of high collinearity. Using the JAVELIN Bladder 100 data, we identified some potentially significant biomarkers across multiple models. Several lasso-related methods were able to identify potentially biologically meaningful variables in the trial. Some variable selection methods (such as stochastic search variable selection and random survival forest) may not be well suited to this type of data due to the presence of extreme collinearity and low signal. Future research should explore novel variable selection methods that may be more suitable for identifying prognostic and predictive biomarkers in this population. Trial registration: ClinicalTrials.gov Identifier: NCT02603432.
Affiliation(s)
- Ritwik Vashistha
- Department of Statistics and Data Sciences, The University of Texas at Austin, Austin, TX, USA
- Zubdahe Noor
- Pfizer Research and Development, Pfizer Healthcare India Private Limited, Chennai, India
- Shibasish Dasgupta
- Pfizer Research and Development, Pfizer Healthcare India Private Limited, Chennai, India.
- Chennai Mathematical Institute, Chennai, India.
- Jie Pu
- Pfizer Research and Development, Pfizer, New York, NY, USA
- Shibing Deng
- Pfizer Research and Development, Pfizer, New York, NY, USA
3
Zhao Z, Feng Q, Zhang Y, Ning Z. Adaptive risk-aware sharable and individual subspace learning for cancer survival analysis with multi-modality data. Brief Bioinform 2023; 24:6847200. [PMID: 36433784 DOI: 10.1093/bib/bbac489] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/08/2022] [Revised: 09/16/2022] [Accepted: 10/15/2022] [Indexed: 11/27/2022] Open
Abstract
Biomedical multi-modality data (also named multi-omics data) refer to data that span different types and derive from multiple sources in clinical practice (e.g. gene sequences, proteomics and histopathological images), which can provide comprehensive perspectives on cancers and generally improve the performance of survival models. However, the performance improvement of multi-modality survival models may be hindered by two key issues: (1) how to learn and fuse modality-sharable and modality-individual representations from multi-modality data; (2) how to explore the potential risk-aware characteristics in each risk subgroup, which is beneficial to risk stratification and prognosis evaluation. Additionally, learning-based survival models generally involve numerous hyper-parameters, which require time-consuming parameter setting and might result in a suboptimal solution. In this paper, we propose an adaptive risk-aware sharable and individual subspace learning method for cancer survival analysis. The proposed method jointly learns sharable and individual subspaces from multi-modality data, while two auxiliary terms (i.e. intra-modality complementarity and inter-modality incoherence) are developed to preserve the complementary and distinctive properties of each modality. Moreover, it is equipped with a grouping co-expression constraint for obtaining risk-aware representation and preserving local consistency. Furthermore, an adaptive-weighted strategy is employed to efficiently estimate crucial parameters during the training stage. Experimental results on three public datasets demonstrate the superiority of our proposed model.
Affiliation(s)
- Zhangxin Zhao
- School of Biomedical Engineering at Southern Medical University, Guangdong, China
- Qianjin Feng
- School of Biomedical Engineering at Southern Medical University, Guangdong, China
- Yu Zhang
- School of Biomedical Engineering, Southern Medical University, Guangdong, China
- Zhenyuan Ning
- School of Biomedical Engineering at Southern Medical University, Guangdong, China
4
Motloch LJ, Jirak P, Gareeva D, Davtyan P, Gumerov R, Lakman I, Tataurov A, Zulkarneev R, Kabirov I, Cai B, Valeev B, Pavlov V, Kopp K, Hoppe UC, Lichtenauer M, Fiedler L, Pistulli R, Zagidullin N. Cardiovascular Biomarkers for Prediction of in-hospital and 1-Year Post-discharge Mortality in Patients With COVID-19 Pneumonia. Front Med (Lausanne) 2022; 9:906665. [PMID: 35836945 PMCID: PMC9273888 DOI: 10.3389/fmed.2022.906665] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/28/2022] [Accepted: 05/30/2022] [Indexed: 01/08/2023] Open
Abstract
Aims While COVID-19 affects the cardiovascular system, the potential clinical impact of cardiovascular biomarkers on predicting outcomes in COVID-19 patients is still unknown. Therefore, to investigate this issue we analyzed the prognostic potential of cardiac biomarkers on in-hospital and long-term post-discharge mortality of patients with COVID-19 pneumonia. Methods Serum soluble ST2, VCAM-1, and hs-TnI were evaluated upon admission in 280 consecutive patients hospitalized with COVID-19-associated pneumonia in a single, tertiary care center. Patient clinical and laboratory characteristics and the concentration of biomarkers were correlated with in-hospital [Hospital stay: 11 days (10; 14)] and post-discharge all-cause mortality at 1-year follow-up [FU: 354 days (342; 361)]. Results Eleven patients died while hospitalized for COVID-19 (3.9%), and 11 patients died during the 1-year post-discharge follow-up period (4.1%). Using multivariate analysis, VCAM-1 was shown to predict mortality during the hospital period (HR 1.081, 95% CI 1.035;1.129, p = 0.017), but not ST2 or hs-TnI. In contrast, during 1-year FU post hospital discharge, ST2 (HR 1.006, 95% CI 1.002;1.009, p < 0.001) and hs-TnI (HR 1.362, 95% CI 1.050;1.766, p = 0.024) predicted mortality, although not VCAM-1. Conclusion In patients hospitalized with COVID-19 pneumonia, elevated levels of VCAM-1 at admission were associated with in-hospital mortality, while ST2 and hs-TnI might predict post-discharge mortality in long-term follow-up.
Affiliation(s)
- Lukas J. Motloch
- University Clinic for Internal Medicine II, Paracelsus Medical University, Salzburg, Austria
- *Correspondence: Lukas J. Motloch
- Peter Jirak
- University Clinic for Internal Medicine II, Paracelsus Medical University, Salzburg, Austria
- Diana Gareeva
- Cardiovascular Disease in COVID-19, International Research Network, Ufa, Russia
- Department of Internal Diseases, Bashkir State Medical University, Ufa, Russia
- Paruir Davtyan
- Cardiovascular Disease in COVID-19, International Research Network, Ufa, Russia
- Department of Internal Diseases, Bashkir State Medical University, Ufa, Russia
- Ruslan Gumerov
- Cardiovascular Disease in COVID-19, International Research Network, Ufa, Russia
- Department of Internal Diseases, Bashkir State Medical University, Ufa, Russia
- Irina Lakman
- Cardiovascular Disease in COVID-19, International Research Network, Ufa, Russia
- Department of Internal Diseases, Bashkir State Medical University, Ufa, Russia
- Department of Biomedical Engineering, Ufa State Aviation Technical University, Ufa, Russia
- Scientific Laboratory for the Socio-Economic Region Problems Investigation, Bashkir State University, Ufa, Russia
- Aleksandr Tataurov
- Scientific Laboratory for the Socio-Economic Region Problems Investigation, Bashkir State University, Ufa, Russia
- Rustem Zulkarneev
- Department of Internal Diseases, Bashkir State Medical University, Ufa, Russia
- Ildar Kabirov
- Department of Urology, Bashkir State Medical University, Ufa, Russia
- Benzhi Cai
- Cardiovascular Disease in COVID-19, International Research Network, Ufa, Russia
- The Key Laboratory of Cardiovascular Medicine Research, Ministry of Education, Department of Pharmacy at the Second Affiliated Hospital, and Department of Pharmacology at College of Pharmacy, Harbin Medical University, Harbin, China
- Bairas Valeev
- Department of Internal Diseases, Bashkir State Medical University, Ufa, Russia
- Valentin Pavlov
- Cardiovascular Disease in COVID-19, International Research Network, Ufa, Russia
- Department of Urology, Bashkir State Medical University, Ufa, Russia
- Kristen Kopp
- University Clinic for Internal Medicine II, Paracelsus Medical University, Salzburg, Austria
- Uta C. Hoppe
- University Clinic for Internal Medicine II, Paracelsus Medical University, Salzburg, Austria
- Michael Lichtenauer
- University Clinic for Internal Medicine II, Paracelsus Medical University, Salzburg, Austria
- Lukas Fiedler
- University Clinic for Internal Medicine II, Paracelsus Medical University, Salzburg, Austria
- Department of Internal Medicine, Cardiology, Nephrology and Intensive Care Medicine, Hospital Wiener Neustadt, Wiener Neustadt, Austria
- Rudin Pistulli
- Department of Cardiology I, Coronary and Peripheral Vascular Disease, Heart Failure, University Hospital Münster, Münster, Germany
- Naufal Zagidullin
- Cardiovascular Disease in COVID-19, International Research Network, Ufa, Russia
- Department of Internal Diseases, Bashkir State Medical University, Ufa, Russia
- Department of Biomedical Engineering, Ufa State Aviation Technical University, Ufa, Russia
5
Chu J, Sun NA, Hu W, Chen X, Yi N, Shen Y. The Application of Bayesian Methods in Cancer Prognosis and Prediction. Cancer Genomics Proteomics 2022; 19:1-11. [PMID: 34949654 PMCID: PMC8717957 DOI: 10.21873/cgp.20298] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 10/18/2021] [Revised: 11/24/2021] [Accepted: 11/30/2021] [Indexed: 11/10/2022] Open
Abstract
With the development of high-throughput biological techniques, high-dimensional omics data have emerged. These molecular data provide a solid foundation for precision medicine and prognostic prediction of cancer. Bayesian methods contribute to constructing prognostic models with complex relationships in omics and improve performance by introducing different prior distributions, which makes them suitable for modelling the high-dimensional data involved. Using different omics, several Bayesian hierarchical approaches have been proposed for variable selection and model construction. In particular, Bayesian methods for multi-omics integration have also been consistently proposed in recent years. Compared with single-omics, multi-omics integration modelling will contribute to improving predictive performance, gaining insights into the underlying mechanisms of tumour occurrence and development, and discovering more reliable biomarkers. In this work, we present a review of currently proposed Bayesian approaches to prognostic prediction modelling in cancer.
Affiliation(s)
- Jiadong Chu
- Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, P.R. China
- N A Sun
- Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, P.R. China
- Wei Hu
- Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, P.R. China
- Xuanli Chen
- Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, P.R. China
- Nengjun Yi
- Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, Birmingham, AL, U.S.A
- Yueping Shen
- Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, P.R. China
6
Ning Z, Du D, Tu C, Feng Q, Zhang Y. Relation-Aware Shared Representation Learning for Cancer Prognosis Analysis With Auxiliary Clinical Variables and Incomplete Multi-Modality Data. IEEE Trans Med Imaging 2022; 41:186-198. [PMID: 34460368 DOI: 10.1109/tmi.2021.3108802] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Indexed: 06/13/2023]
Abstract
The integrative analysis of complementary phenotype information contained in multi-modality data (e.g., histopathological images and genomic data) has advanced the prognostic evaluation of cancers. However, multi-modality based prognosis analysis confronts two challenges: (1) how to explore the underlying relations inherent in data from different modalities for learning compact and discriminative multi-modality representations; (2) how to take full account of incomplete multi-modality data for constructing an accurate and robust prognostic model, since complete multi-modality data are not always available. Additionally, many existing multi-modality based prognostic methods ignore relevant clinical variables (e.g., grade and stage), which may provide supplemental information to improve model performance. In this paper, we propose a relation-aware shared representation learning method for prognosis analysis of cancers, which makes full use of clinical information and incomplete multi-modality data. The proposed method learns a multi-modal shared space tailored for the prognostic model via a dual mapping. Within the shared space, it is equipped with relational regularizers to explore the potential relations (i.e., feature-label and feature-feature relations) among multi-modality data for inducing discriminative representations and simultaneously obtaining extra sparsity for alleviating overfitting. Moreover, it regresses and incorporates multiple auxiliary clinical attributes with dynamic coefficients to improve performance. Furthermore, in the training stage, a partial mapping strategy is employed to extend and train a more reliable model with incomplete multi-modality data. We have evaluated our method on three public datasets derived from The Cancer Genome Atlas (TCGA) project, and the experimental results demonstrate the superior performance of the proposed method.
7
Zagidullin NS, Motloch LJ, Musin TI, Bagmanova ZA, Lakman IA, Tyurin AV, Gumerov RM, Enikeev D, Cai B, Gareeva DF, Davtyan PA, Gareev DA, Talipova HM, Badykov MR, Jirak P, Kopp K, Hoppe UC, Pistulli R, Pavlov VN. J-waves in acute COVID-19: A novel disease characteristic and predictor of mortality? PLoS One 2021; 16:e0257982. [PMID: 34648510 PMCID: PMC8516278 DOI: 10.1371/journal.pone.0257982] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/09/2021] [Accepted: 09/14/2021] [Indexed: 12/14/2022] Open
Abstract
Background J-waves represent a common finding in routine ECGs (5–6%) and are closely linked to ventricular tachycardias. While arrhythmias and non-specific ECG alterations are a frequent finding in COVID-19, an analysis of J-wave incidence in acute COVID-19 is lacking. Methods A total of 386 consecutive patients hospitalized due to acute COVID-19 pneumonia were included in this retrospective analysis. Admission ECGs were analyzed, screened for J-waves and correlated to clinical characteristics and 28-day mortality. Results J-waves were present in 12.2% of patients. Factors associated with the presence of J-waves were old age, female sex, a history of stroke and/or heart failure, high CRP levels as well as a high BMI. Mortality rates were significantly higher in patients with J-waves in the admission ECG compared to the non-J-wave cohort (J-wave: 14.9% vs. non-J-wave 3.8%, p = 0.001). After adjusting for confounders using a multivariable Cox regression model, the incidence of J-waves was an independent predictor of mortality at 28 days (OR 2.76, 95% CI: 1.15–6.63; p = 0.023). J-waves disappeared or declined in 36.4% of COVID-19 survivors with available ECGs at 6–8 months of follow-up. Conclusion J-waves are frequently and often transiently found in the admission ECG of patients hospitalized with acute COVID-19. Furthermore, they seem to be an independent predictor of 28-day mortality.
Affiliation(s)
- Naufal Shamilevich Zagidullin
- Department of Internal Medicine I, Bashkir State Medical University, Ufa, Russian Federation
- Department of Biomedical Engineering of Ufa State Aviation Technical University, Ufa, Russian Federation
- Lukas J. Motloch
- Clinic II for Internal Medicine, University Hospital Salzburg, Paracelsus Medical University, Salzburg, Austria
- Timur Ilgamovich Musin
- Department of Internal Medicine I, Bashkir State Medical University, Ufa, Russian Federation
- Irina Alexandrovna Lakman
- Department of Biomedical Engineering of Ufa State Aviation Technical University, Ufa, Russian Federation
- Department of Economics, Finance and Business, Bashkir State University, Ufa, Russian Federation
- Dinar Enikeev
- Department of Biomedical Engineering of Ufa State Aviation Technical University, Ufa, Russian Federation
- Benzhi Cai
- Department of Pharmacy at The Second Affiliated Hospital, and Department of Pharmacology (The Key Laboratory of Cardiovascular Medicine Research, Ministry of Education) at College of Pharmacy, Harbin Medical University, Harbin, China
- Damir Aidarovich Gareev
- Department of Internal Medicine I, Bashkir State Medical University, Ufa, Russian Federation
- Peter Jirak
- Clinic II for Internal Medicine, University Hospital Salzburg, Paracelsus Medical University, Salzburg, Austria
- Kristen Kopp
- Clinic II for Internal Medicine, University Hospital Salzburg, Paracelsus Medical University, Salzburg, Austria
- Uta C. Hoppe
- Clinic II for Internal Medicine, University Hospital Salzburg, Paracelsus Medical University, Salzburg, Austria
- Rudin Pistulli
- Department of Cardiology I, Coronary and Peripheral Vascular Disease, Heart Failure, University Hospital Münster, Münster, Germany
8
Tang B, Wang Y, Chen Y, Li M, Tao Y. A Novel Early-Stage Lung Adenocarcinoma Prognostic Model Based on Feature Selection With Orthogonal Regression. Front Cell Dev Biol 2021; 8:620746. [PMID: 33585460 PMCID: PMC7874010 DOI: 10.3389/fcell.2020.620746] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2020] [Accepted: 11/17/2020] [Indexed: 11/13/2022] Open
Abstract
Carcinoma diagnosis and prognosis are still hindered by the lack of effective prediction models and integration methodology. We proposed a novel feature selection with orthogonal regression (FSOR) method to resolve predictor selection and performance optimization. Functional enrichment and clinical outcome analyses with multi-omics information validated the method's robustness in the early-stage prognosis of lung adenocarcinoma. Furthermore, compared with the classic least absolute shrinkage and selection operator (LASSO) regression method [averaged 1- to 4-year predictive area under the receiver operating characteristic curve (AUC), 0.6998], the proposed method achieves a higher averaged AUC of 0.7208 with fewer predictors; in particular, its averaged 1- to 3-year AUC reaches 0.723, vs. 0.6917 for the classic method, on The Cancer Genome Atlas (TCGA). In sum, the proposed method can deliver better prediction performance for early-stage prognosis and improve therapy strategy with fewer predictors and a lower computational burden. The self-composed running scripts, together with the processed results, are available at https://github.com/gladex/PM-FSOR.
Affiliation(s)
- Binhua Tang
- Epigenetics & Function Group, Hohai University, Nanjing, China
- Yuqi Wang
- Epigenetics & Function Group, Hohai University, Nanjing, China
- Yu Chen
- Epigenetics & Function Group, Hohai University, Nanjing, China
- Ming Li
- Epigenetics & Function Group, Hohai University, Nanjing, China
- Yongfeng Tao
- Epigenetics & Function Group, Hohai University, Nanjing, China
9
Yang X, Amgad M, Cooper LAD, Du Y, Fu H, Ivanov AA. High expression of MKK3 is associated with worse clinical outcomes in African American breast cancer patients. J Transl Med 2020; 18:334. [PMID: 32873298 PMCID: PMC7465409 DOI: 10.1186/s12967-020-02502-w] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 05/22/2020] [Accepted: 08/25/2020] [Indexed: 02/08/2023] Open
Abstract
BACKGROUND African American women experience a twofold higher incidence of triple-negative breast cancer (TNBC) and are 40% more likely to die from breast cancer than women of other ethnicities. However, the molecular bases for the survival disparity in breast cancer remain unclear, and no race-specific therapeutic targets have been proposed. To address this knowledge gap, we performed a systematic analysis of the relationship between gene mRNA expression and clinical outcomes determined for The Cancer Genome Atlas (TCGA) breast cancer patient cohort. METHODS The systematic differential analysis of mRNA expression integrated with the analysis of clinical outcomes was performed for 1055 samples from the breast invasive carcinoma TCGA PanCancer cohorts. A deep learning fully-convolutional model was used to determine the association between gene expression and tumor features based on breast cancer patient histopathological images. RESULTS We found that more than 30% of all protein-coding genes are differentially expressed in White and African American breast cancer patients. We have determined a set of 32 genes whose overexpression in African American patients strongly correlates with decreased survival of African American but not White breast cancer patients. Among those genes, the overexpression of mitogen-activated protein kinase kinase 3 (MKK3) has one of the most dramatic and race-specific negative impacts on the survival of African American patients, specifically those with triple-negative breast cancer. We found that MKK3 can promote TNBC tumorigenesis in African American patients in part by activating the epithelial-to-mesenchymal transition induced by the master regulator MYC. CONCLUSIONS The poor clinical outcomes in African American women with breast cancer can be associated with the abnormal elevation of individual gene expression. Such genes, including those identified and prioritized in this study, could represent new targets for therapeutic intervention. A strong correlation between MKK3 overexpression, activation of its binding partner and major oncogene MYC, and worsened clinical outcomes suggests the MKK3-MYC protein-protein interaction as a new promising target to reduce racial disparity in breast cancer survival.
Affiliation(s)
- Xuan Yang
- Department of Pharmacology and Chemical Biology, Emory University School of Medicine, Emory University, 1510 Clifton Road, Atlanta, GA, 30322, USA
- Emory Chemical Biology Discovery Center, Emory University School of Medicine, Emory University, Atlanta, GA, USA
- Mohamed Amgad
- Department of Biomedical Informatics, Emory University School of Medicine, Emory University, Atlanta, GA, USA
- Lee A D Cooper
- Department of Pathology, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Yuhong Du
- Department of Pharmacology and Chemical Biology, Emory University School of Medicine, Emory University, 1510 Clifton Road, Atlanta, GA, 30322, USA
- Emory Chemical Biology Discovery Center, Emory University School of Medicine, Emory University, Atlanta, GA, USA
- Winship Cancer Institute, Emory University, Atlanta, GA, USA
- Haian Fu
- Department of Pharmacology and Chemical Biology, Emory University School of Medicine, Emory University, 1510 Clifton Road, Atlanta, GA, 30322, USA
- Emory Chemical Biology Discovery Center, Emory University School of Medicine, Emory University, Atlanta, GA, USA
- Winship Cancer Institute, Emory University, Atlanta, GA, USA
- Department of Hematology & Medical Oncology, Emory University, Atlanta, GA, USA
- Andrey A Ivanov
- Department of Pharmacology and Chemical Biology, Emory University School of Medicine, Emory University, 1510 Clifton Road, Atlanta, GA, 30322, USA
- Emory Chemical Biology Discovery Center, Emory University School of Medicine, Emory University, Atlanta, GA, USA
- Winship Cancer Institute, Emory University, Atlanta, GA, USA
10
Belhechmi S, De Bin R, Rotolo F, Michiels S. Accounting for grouped predictor variables or pathways in high-dimensional penalized Cox regression models. BMC Bioinformatics 2020; 21:277. [PMID: 32615919 PMCID: PMC7331150 DOI: 10.1186/s12859-020-03618-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 01/03/2020] [Accepted: 06/19/2020] [Indexed: 12/28/2022] Open
Abstract
BACKGROUND The standard lasso penalty and its extensions are commonly used to develop a regularized regression model while selecting candidate predictor variables on a time-to-event outcome in high-dimensional data. However, these selection methods focus on a homogeneous set of variables and do not take into account the case of predictors belonging to functional groups; typically, genomic data can be grouped according to biological pathways or to different types of collected data. Another challenge is that the standard lasso penalization is known to have a high false discovery rate. RESULTS We evaluated different penalizations in a Cox model to select grouped variables in order to further penalize variables that, in addition to having a low effect, belong to a group with a low overall effect; and to favor the selection of variables that, in addition to having a large effect, belong to a group with a large overall effect. We considered the case of prespecified and disjoint groups and proposed diverse weights for the adaptive lasso method. In particular, we proposed the product Max Single Wald by Single Wald weighting (MSW*SW), which takes into account information on both the biomarker itself and the group to which it belongs. Through simulations, we compared the selection and prediction ability of our approach with the standard lasso, the composite Minimax Concave Penalty (cMCP), the group exponential lasso (gel), the Integrative L1-Penalized Regression with Penalty Factors (IPF-Lasso), and the Sparse Group Lasso (SGL) methods. In addition, we illustrated the methods using gene expression data of 614 breast cancer patients. CONCLUSIONS The adaptive lasso with the MSW*SW weighting method incorporates both the information in the grouping structure and the individual variable. It outperformed the competitors by reducing the false discovery rate without severely increasing the false negative rate.
Affiliation(s)
- Shaima Belhechmi
- Université Paris-Saclay, Univ. Paris-Sud, UVSQ, CESP, INSERM U1018 Oncostat, Villejuif, F-94805, France
- Service de biostatistique et d'épidémiologie, Gustave Roussy, Villejuif, F-94805, France
- Federico Rotolo
- Biostatistics and Data Management Unit, Innate Pharma, Marseille, France
- Stefan Michiels
- Université Paris-Saclay, Univ. Paris-Sud, UVSQ, CESP, INSERM U1018 Oncostat, Villejuif, F-94805, France
- Service de biostatistique et d'épidémiologie, Gustave Roussy, Villejuif, F-94805, France