1
Lavalley-Morelle A, Peiffer-Smadja N, Gressens SB, Souhail B, Lahens A, Bounhiol A, Lescure FX, Mentré F, Mullaert J. Multivariate joint model under competing risks to predict death of hospitalized patients for SARS-CoV-2 infection. Biom J 2024; 66:e2300049. [PMID: 37915123 DOI: 10.1002/bimj.202300049] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/13/2023] [Revised: 06/18/2023] [Accepted: 07/26/2023] [Indexed: 11/03/2023]
Abstract
During the coronavirus disease 2019 (COVID-19) pandemic, several clinical prognostic scores have been proposed and evaluated in hospitalized patients, relying on variables available at admission. However, capturing data collected from the longitudinal follow-up of patients during hospitalization may improve prediction accuracy of a clinical outcome. To answer this question, 327 patients diagnosed with COVID-19 and hospitalized in an academic French hospital between January and July 2020 are included in the analysis. Up to 59 biomarkers were measured from the patient admission to the time to death or discharge from hospital. We consider a joint model with multiple linear or nonlinear mixed-effects models for biomarkers evolution, and a competing risks model involving subdistribution hazard functions for the risks of death and discharge. The links are modeled by shared random effects, and the selection of the biomarkers is mainly based on the significance of the link between the longitudinal and survival parts. Three biomarkers are retained: the blood neutrophil counts, the arterial pH, and the C-reactive protein. The predictive performances of the model are evaluated with the time-dependent area under the curve (AUC) for different landmark and horizon times, and compared with those obtained from a baseline model that considers only information available at admission. The joint modeling approach helps to improve predictions when sufficient information is available. For landmark 6 days and horizon of 30 days, we obtain AUC [95% CI] 0.73 [0.65, 0.81] and 0.81 [0.73, 0.89] for the baseline and joint model, respectively (p = 0.04). Statistical inference is validated through a simulation study.
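For readers unfamiliar with the term, the subdistribution hazard mentioned in the abstract can be written in its standard (Fine and Gray) form. The notation below is an assumption made for illustration ($T$ the event time, $\varepsilon$ the event type, $k$ indexing death or discharge) and is not taken from the paper itself:

```latex
% Subdistribution hazard for cause k (standard definition; notation assumed)
\lambda_k(t) = \lim_{\Delta t \to 0} \frac{1}{\Delta t}\,
  P\bigl\{ t \le T < t + \Delta t,\ \varepsilon = k \,\bigm|\,
           T \ge t \ \text{or}\ (T < t \ \text{and}\ \varepsilon \ne k) \bigr\}

% Cumulative incidence function implied by that hazard
F_k(t) = P(T \le t,\ \varepsilon = k)
       = 1 - \exp\Bigl\{ -\int_0^t \lambda_k(s)\, ds \Bigr\}
```

Under this definition each cause has its own cumulative incidence function, which is what allows a predicted probability of death to be computed directly in the presence of the competing discharge event.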
Affiliation(s)
- Nathan Peiffer-Smadja
  - Université Paris Cité, INSERM, IAME, Paris, France
  - Department of Infectious and Tropical Diseases, AP-HP, Bichat-Claude Bernard University Hospital, Paris, France
- Simon B Gressens
  - Department of Infectious and Tropical Diseases, AP-HP, Bichat-Claude Bernard University Hospital, Paris, France
- Bérénice Souhail
  - Department of Infectious and Tropical Diseases, AP-HP, Bichat-Claude Bernard University Hospital, Paris, France
- Alexandre Lahens
  - Department of Infectious and Tropical Diseases, AP-HP, Bichat-Claude Bernard University Hospital, Paris, France
- Agathe Bounhiol
  - Department of Infectious and Tropical Diseases, AP-HP, Bichat-Claude Bernard University Hospital, Paris, France
- François-Xavier Lescure
  - Université Paris Cité, INSERM, IAME, Paris, France
  - Department of Infectious and Tropical Diseases, AP-HP, Bichat-Claude Bernard University Hospital, Paris, France
- France Mentré
  - Université Paris Cité, INSERM, IAME, Paris, France
  - Department of Epidemiology, Biostatistics and Clinical Research, AP-HP, Bichat-Claude Bernard University Hospital, Paris, France
- Jimmy Mullaert
  - Université Paris Cité, INSERM, IAME, Paris, France
  - Department of Epidemiology, Biostatistics and Clinical Research, AP-HP, Bichat-Claude Bernard University Hospital, Paris, France
2
Tao S, Ravindranath R, Wang SY. Predicting Glaucoma Progression to Surgery with Artificial Intelligence Survival Models. Ophthalmology Science 2023; 3:100336. [PMID: 37415920 PMCID: PMC10320266 DOI: 10.1016/j.xops.2023.100336] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2023] [Revised: 05/16/2023] [Accepted: 05/17/2023] [Indexed: 07/08/2023]
Abstract
Purpose: Prior artificial intelligence (AI) models for predicting glaucoma progression have used traditional classifiers that do not consider the longitudinal nature of patients' follow-up. In this study, we developed survival-based AI models for predicting glaucoma patients' progression to surgery, comparing performance of regression-, tree-, and deep learning-based approaches.
Design: Retrospective observational study.
Subjects: Patients with glaucoma seen at a single academic center from 2008 to 2020 identified from electronic health records (EHRs).
Methods: From the EHRs, we identified 361 baseline features, including demographics, eye examinations, diagnoses, and medications. We trained AI survival models to predict patients' progression to glaucoma surgery using the following: (1) a penalized Cox proportional hazards (CPH) model with principal component analysis (PCA); (2) random survival forests (RSFs); (3) gradient-boosting survival (GBS); and (4) a deep learning model (DeepSurv). The concordance index (C-index) and mean cumulative/dynamic area under the curve (mean AUC) were used to evaluate model performance on a held-out test set. Explainability was investigated using Shapley values for feature importance and visualization of model-predicted cumulative hazard curves for patients with different treatment trajectories.
Main Outcome Measures: Progression to glaucoma surgery.
Results: Of the 4512 patients with glaucoma, 748 underwent glaucoma surgery, with a median follow-up of 1038 days. The DeepSurv model performed best overall (C-index, 0.775; mean AUC, 0.802) among the models studied in this article (CPH with PCA: C-index, 0.745; mean AUC, 0.780; RSF: C-index, 0.766; mean AUC, 0.804; GBS: C-index, 0.764; mean AUC, 0.791). Predicted cumulative hazard curves demonstrate how models could distinguish between patients who underwent early surgery and patients who underwent surgery after > 3000 days of follow-up or no surgery.
Conclusions: Artificial intelligence survival models can predict progression to glaucoma surgery using structured data from EHRs. Tree-based and deep learning-based models performed better at predicting glaucoma progression to surgery than the CPH regression model, potentially because of their better suitability for high-dimensional data sets. Future work predicting ophthalmic outcomes should consider using tree-based and deep learning-based survival AI models. Additional research is needed to develop and evaluate more sophisticated deep learning survival models that can incorporate clinical notes or imaging.
Financial Disclosures: Proprietary or commercial disclosure may be found after the references.
Affiliation(s)
- Shiqi Tao
  - Byers Eye Institute, Department of Ophthalmology, Stanford University, Palo Alto, California
- Rohith Ravindranath
  - Byers Eye Institute, Department of Ophthalmology, Stanford University, Palo Alto, California
- Sophia Y. Wang
  - Byers Eye Institute, Department of Ophthalmology, Stanford University, Palo Alto, California
3
Zhang CC, Hou RP, Feng W, Fu XL. Lymph Node Parameters Predict Adjuvant Chemoradiotherapy Efficacy and Disease-Free Survival in Pathologic N2 Non-Small Cell Lung Cancer. Front Oncol 2021; 11:736892. [PMID: 34604073 PMCID: PMC8484950 DOI: 10.3389/fonc.2021.736892] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/06/2021] [Accepted: 08/30/2021] [Indexed: 12/25/2022]
Abstract
Pathologic N2 non-small cell lung cancer (NSCLC) is intrinsically heterogeneous. We aimed to identify homogeneous prognostic subgroups and evaluate the role of different adjuvant treatments. We retrospectively collected patients with resected pathologic T1-3N2M0 NSCLC from the Shanghai Chest Hospital as the primary cohort and randomly allocated them (3:1) to the training set and validation set 1. Patients from the Fudan University Shanghai Cancer Center, selected with the same inclusion and exclusion criteria, served as an external validation cohort (validation set 2). Variables significantly related to disease-free survival (DFS) were used to build an adaptive Elastic-Net Cox regression model. A nomogram was used to visualize the model. The discriminative and calibration abilities of the model were assessed by time-dependent area under the receiver operating characteristic curves (AUCs) and calibration curves. The primary cohort consisted of 1,312 patients. Tumor size, histology, grade, skip N2, involved N2 stations, lymph node ratio (LNR), and adjuvant treatment pattern were identified as significant variables associated with DFS and integrated into the adaptive Elastic-Net Cox regression model. A nomogram was developed to predict DFS. The model showed good discrimination (the median AUC in validation set 1: 0.66, range 0.62 to 0.71; validation set 2: 0.66, range 0.61 to 0.73). We developed and validated a nomogram that contains multiple variables describing lymph node status (skip N2, involved N2 stations, and LNR) to predict the DFS of patients with resected pathologic N2 NSCLC. Through this model, we could identify a subtype of NSCLC with a more malignant clinical biological behavior and found that this subtype remained at high risk of disease recurrence after adjuvant chemoradiotherapy.
Affiliation(s)
- Chen-Chen Zhang
  - Department of Radiation Oncology, Shanghai Chest Hospital, Shanghai Jiao Tong University, Shanghai, China
- Run-Ping Hou
  - School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
- Wen Feng
  - Department of Radiation Oncology, Shanghai Chest Hospital, Shanghai Jiao Tong University, Shanghai, China
- Xiao-Long Fu
  - Department of Radiation Oncology, Shanghai Chest Hospital, Shanghai Jiao Tong University, Shanghai, China; Department of Radiation Oncology, Fudan University Shanghai Cancer Center, Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China
4
Haider SP, Zeevi T, Baumeister P, Reichel C, Sharaf K, Forghani R, Kann BH, Judson BL, Prasad ML, Burtness B, Mahajan A, Payabvash S. Potential Added Value of PET/CT Radiomics for Survival Prognostication beyond AJCC 8th Edition Staging in Oropharyngeal Squamous Cell Carcinoma. Cancers (Basel) 2020; 12:1778. [PMID: 32635216 PMCID: PMC7407414 DOI: 10.3390/cancers12071778] [Citation(s) in RCA: 39] [Impact Index Per Article: 7.8] [Received: 06/12/2020] [Revised: 06/29/2020] [Accepted: 06/30/2020] [Indexed: 12/18/2022]
Abstract
Accurate risk-stratification can facilitate precision therapy in oropharyngeal squamous cell carcinoma (OPSCC). We explored the potential added value of baseline positron emission tomography (PET)/computed tomography (CT) radiomic features for prognostication and risk stratification of OPSCC beyond the American Joint Committee on Cancer (AJCC) 8th edition staging scheme. Using institutional and publicly available datasets, we included OPSCC patients with known human papillomavirus (HPV) status, without baseline distant metastasis and treated with curative intent. We extracted 1037 PET and 1037 CT radiomic features quantifying lesion shape, imaging intensity, and texture patterns from primary tumors and metastatic cervical lymph nodes. Utilizing random forest algorithms, we devised novel machine-learning models for OPSCC progression-free survival (PFS) and overall survival (OS) using “radiomics” features, “AJCC” variables, and the “combined” set as input. We designed both single- (PET or CT) and combined-modality (PET/CT) models. Harrell’s C-index quantified survival model performance; risk stratification was evaluated in Kaplan–Meier analysis. A total of 311 patients were included. In HPV-associated OPSCC, the best “radiomics” model achieved an average C-index ± standard deviation of 0.62 ± 0.05 (p = 0.02) for PFS prediction, compared to 0.54 ± 0.06 (p = 0.32) utilizing “AJCC” variables. Radiomics-based risk-stratification of HPV-associated OPSCC was significant for PFS and OS. Similar trends were observed in HPV-negative OPSCC. In conclusion, radiomics imaging features extracted from pre-treatment PET/CT may provide complementary information to the current AJCC staging scheme for survival prognostication and risk-stratification of HPV-associated OPSCC.
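As an illustration of the metric named in this abstract, the following is a minimal sketch of Harrell's C-index for right-censored data. It is not the authors' implementation: the function name `harrell_c_index` and the toy data are hypothetical, pairs with tied times are simply skipped, and tied risk scores count as half-concordant.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Fraction of comparable pairs in which the subject with the shorter
    observed time also has the higher predicted risk (illustrative sketch)."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        # A pair is comparable only when the shorter time is an observed
        # event; pairs with tied times are skipped in this sketch.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: follow-up times, event indicators (1 = event,
# 0 = censored), and model risk scores (higher = worse prognosis).
c = harrell_c_index([5, 8, 12, 20], [1, 1, 0, 1], [0.9, 0.4, 0.5, 0.1])
print(c)  # 0.8 (4 of 5 comparable pairs are concordant)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which gives a scale for reading the 0.62 and 0.54 reported above.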
Affiliation(s)
- Stefan P. Haider
  - Section of Neuroradiology, Department of Radiology and Biomedical Imaging, Yale School of Medicine, 789 Howard Ave, New Haven, CT 06519, USA
  - Department of Otorhinolaryngology, University Hospital of Ludwig Maximilians Universität München, Marchioninistrasse 15, 81377 Munich, Germany
- Tal Zeevi
  - Center for Translational Imaging Analysis and Machine Learning, Department of Radiology and Biomedical Imaging, Yale School of Medicine, 333 Cedar Street, New Haven, CT 06510, USA
- Philipp Baumeister
  - Department of Otorhinolaryngology, University Hospital of Ludwig Maximilians Universität München, Marchioninistrasse 15, 81377 Munich, Germany
- Christoph Reichel
  - Department of Otorhinolaryngology, University Hospital of Ludwig Maximilians Universität München, Marchioninistrasse 15, 81377 Munich, Germany
- Kariem Sharaf
  - Department of Otorhinolaryngology, University Hospital of Ludwig Maximilians Universität München, Marchioninistrasse 15, 81377 Munich, Germany
- Reza Forghani
  - Department of Diagnostic Radiology and Augmented Intelligence & Precision Health Laboratory, McGill University Health Centre & Research Institute, 1650 Cedar Avenue, Montreal, QC H3G 1A4, Canada
- Benjamin H. Kann
  - Department of Radiation Oncology, Dana-Farber Cancer Institute, Harvard Medical School, 450 Brookline Avenue, Boston, MA 02215, USA
- Benjamin L. Judson
  - Division of Otolaryngology, Department of Surgery, Yale School of Medicine, 330 Cedar Street, New Haven, CT 06520, USA
- Manju L. Prasad
  - Department of Pathology, Yale School of Medicine, 310 Cedar Street, New Haven, CT 06520, USA
- Barbara Burtness
  - Section of Medical Oncology, Department of Internal Medicine, Yale School of Medicine, 25 York Street, New Haven, CT 06520, USA
- Amit Mahajan
  - Section of Neuroradiology, Department of Radiology and Biomedical Imaging, Yale School of Medicine, 789 Howard Ave, New Haven, CT 06519, USA
- Seyedmehdi Payabvash
  - Section of Neuroradiology, Department of Radiology and Biomedical Imaging, Yale School of Medicine, 789 Howard Ave, New Haven, CT 06519, USA
  - Correspondence: Tel.: +1-(203)-214-4650
5
A Predictor of Early Disease Recurrence in Patients With Breast Cancer Using a Cell-free RNA and Protein Liquid Biopsy. Clin Breast Cancer 2019; 20:108-116. [PMID: 31607655 DOI: 10.1016/j.clbc.2019.07.003] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 04/15/2019] [Revised: 07/05/2019] [Accepted: 07/13/2019] [Indexed: 12/12/2022]
Abstract
INTRODUCTION: Circulating biomarkers have been increasingly used in the clinical management of breast cancer. The present study evaluated whether RNAs and a protein present in the plasma of patients with breast cancer might have utility as prognostic biomarkers complementary to existing clinical tests.
PATIENTS AND METHODS: We performed microarray profiling of small noncoding RNAs in plasma samples from 30 patients with breast cancer and 10 control individuals. Two small noncoding RNAs, including microRNA (miR)-923, were selected and quantified in plasma samples from an evaluation cohort of 253 patients with breast cancer, using droplet digital polymerase chain reaction. We also measured cancer antigen (CA) 15-3 protein levels in these samples. Cox regression survival analysis was used to determine which markers were associated with patient prognosis.
RESULTS: As independent markers of prognosis, the plasma levels of miR-923 and CA 15-3 at the time of surgery for breast cancer were significantly associated with prognosis, irrespective of treatment (Cox proportional hazards, P = 3.9 × 10⁻³ and 1.9 × 10⁻⁹, respectively). After building a multivariable model with standard clinical and pathological features, the addition of miR-923 and CA 15-3 information into the model resulted in a significantly better predictor of disease recurrence in patients, irrespective of treatment, compared with the use of clinicopathological data alone (area under the curve at 3 years, 0.858 vs. 0.770 with clinicopathological markers only; P = .017).
CONCLUSION: We propose that the plasma levels of miR-923 and CA 15-3, combined with standard clinicopathological predictors, could be used as a preoperative, noninvasive estimate of patient prognosis to identify which women might need more aggressive treatment or closer surveillance after surgery for breast cancer.
6
Beyene KM, El Ghouch A, Oulhaj A. On the validity of time-dependent AUC estimation in the presence of cure fraction. Biom J 2019; 61:1430-1447. [PMID: 31310019 DOI: 10.1002/bimj.201800376] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 12/09/2018] [Revised: 04/16/2019] [Accepted: 06/04/2019] [Indexed: 11/09/2022]
Abstract
During the last decades, several approaches have been proposed to estimate the time-dependent area under the receiver operating characteristic curve (AUC) of risk tools derived from survival data. The validity of these estimators relies on some regularity assumptions, among them that the survival function is proper. In practice, this assumption is not always satisfied because a fraction of the population may not be susceptible to experiencing the event of interest, even after long follow-up. Studying the sensitivity of the proposed estimators to the violation of this assumption is of substantial interest. In this paper, we investigate the performance of a simple nonparametric estimator, developed for classical survival data, in the case when the population exhibits a cure fraction. Motivated by the current practice of deriving risk tools in oncology and cardiovascular disease prevention, we also assess the loss, in terms of predictive performance, when deriving risk tools from survival models that do not acknowledge the presence of cure. The simulation results show that the investigated method is valid even in the presence of cure. They also show that risk tools derived from survival models that ignore the presence of cure have smaller AUC compared to those derived from survival models that acknowledge the presence of cure. This was also confirmed by a real data analysis from a breast cancer study.
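To make the "proper survival function" assumption concrete, one common way to represent a cure fraction is the mixture cure model sketched below. The symbols ($p$ for the cured proportion, $S_u$ for the survival function of susceptible subjects) are assumptions chosen for illustration; the abstract does not state which cure model the authors use.

```latex
% Mixture cure model (illustrative; notation assumed)
S(t) = p + (1 - p)\, S_u(t), \qquad 0 < p < 1, \qquad
\lim_{t \to \infty} S_u(t) = 0

% The population survival function is then improper:
\lim_{t \to \infty} S(t) = p > 0
```

A proper survival function would instead tend to 0, which is the regularity condition whose violation the paper investigates.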
Affiliation(s)
- Kassu M Beyene
  - Institute of Statistics, Biostatistics and Actuarial Sciences, Catholic University of Louvain, Louvain la Neuve, Belgium
- Anouar El Ghouch
  - Institute of Statistics, Biostatistics and Actuarial Sciences, Catholic University of Louvain, Louvain la Neuve, Belgium
- Abderrahim Oulhaj
  - Institute of Public Health, College of Medicine and Health Sciences, UAE University, Al-Ain, United Arab Emirates
7
Samawi HH, Sim HW, Chan KK, Alghamdi MA, Lee-Ying RM, Knox JJ, Gill P, Romagnino A, Batuyong E, Ko YJ, Davies JM, Lim HJ, Cheung WY, Tam VC. Prognosis of patients with hepatocellular carcinoma treated with sorafenib: a comparison of five models in a large Canadian database. Cancer Med 2018; 7:2816-2825. [PMID: 29766659 PMCID: PMC6051235 DOI: 10.1002/cam4.1493] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 01/31/2018] [Revised: 03/09/2018] [Accepted: 03/20/2018] [Indexed: 02/06/2023]
Abstract
Several systems (tumor-node-metastasis [TNM], Barcelona Clinic Liver Cancer [BCLC], Okuda, Cancer of the Liver Italian Program [CLIP], and albumin-bilirubin grade [ALBI]) were developed to estimate the prognosis of patients with hepatocellular carcinoma (HCC) mostly prior to the prevalent use of sorafenib. We aimed to compare the prognostic and discriminatory power of these models in predicting survival for HCC patients treated with sorafenib and to identify independent prognostic factors for survival in this population. Patients who received sorafenib for the treatment of HCC between 1 January 2008 and 30 June 2015 in the provinces of British Columbia and Alberta, and two large cancer centers in Toronto, Ontario, were included. Survival was assessed using the Kaplan-Meier method. Multivariate Cox regression was used to identify predictors of survival. The models were compared with respect to homogeneity, discriminatory ability, monotonicity of gradients, time-dependent area under the curve, and Akaike information criterion. A total of 681 patients were included. 80% were males, 86% had Child-Pugh class A, and 37% of patients were East Asians. The most common etiology for liver disease was hepatitis B (34%) and C (31%). In all model comparisons, CLIP performed better while BCLC and TNM7 performed less favorably but the differences were small. The utility of each system in allocating patients into different prognostic groups varied, for example, TNM poorly differentiated patients in advanced stages (8.7 months (m) [95% CI 6.5-11.5] versus 8.4 m [95% CI 7.0-9.6] for stages III and IV, respectively) while ALBI had excellent discrimination of early grades (15.6 m [95% CI 13.0-18.4] versus 8.3 m [95% CI 7.0-9.2] for grades 1 and 2, respectively). On multivariate analysis, hepatitis C, alcoholism, and prior hepatic resection were independently prognostic of better survival (P < 0.01). In conclusion, none of the prognostic systems was optimal in predicting survival in sorafenib-treated patients with HCC. Etiology of liver disease should be considered in future models and clinical trial designs.
Affiliation(s)
- Haider H Samawi
  - British Columbia Cancer Agency, Vancouver, British Columbia, Canada
- Hao-Wen Sim
  - Princess Margaret Cancer Centre, Toronto, Ontario, Canada
- Kelvin K Chan
  - Sunnybrook Odette Cancer Centre, Toronto, Ontario, Canada
- Parneet Gill
  - British Columbia Cancer Agency, Vancouver, British Columbia, Canada
- Yoo-Joung Ko
  - Sunnybrook Odette Cancer Centre, Toronto, Ontario, Canada
- Janine M Davies
  - British Columbia Cancer Agency, Vancouver, British Columbia, Canada
- Howard J Lim
  - British Columbia Cancer Agency, Vancouver, British Columbia, Canada
8
Yengo L, Arredouani A, Marre M, Roussel R, Vaxillaire M, Falchi M, Haoudi A, Tichet J, Balkau B, Bonnefond A, Froguel P. Impact of statistical models on the prediction of type 2 diabetes using non-targeted metabolomics profiling. Mol Metab 2016; 5:918-925. [PMID: 27689004 PMCID: PMC5034686 DOI: 10.1016/j.molmet.2016.08.011] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 07/19/2016] [Revised: 08/12/2016] [Accepted: 08/16/2016] [Indexed: 12/21/2022]
Abstract
OBJECTIVE: Characterizing specific metabolites in sub-clinical phases preceding the onset of type 2 diabetes to enable efficient preventive and personalized interventions.
RESEARCH DESIGN AND METHODS: We developed predictive models of type 2 diabetes using two strategies. One strategy focused on the probability of incidence only and was based on logistic regression (MRS1); the other strategy accounted for the age at diagnosis of diabetes and was based on Cox regression (MRS2). We assessed 293 metabolites using non-targeted metabolomics in fasting plasma samples of 1,044 participants (including 231 incident cases over 9 years) used as training population; and fasting serum samples of 128 participants (64 incident cases versus 64 controls) used as validation population. We applied a LASSO-based variable selection aiming at maximizing the out-of-sample area under the receiver operating characteristic curve (AROC) and integrated AROC.
RESULTS: Sixteen and 17 metabolites were selected for MRS1 and MRS2, respectively, with AROC = 90% and 73% in the training and validation populations, respectively, for MRS1. MRS2 had a similar performance and was significantly associated with a younger age of onset of type 2 diabetes (β = -3.44 years per MRS2 SD in the training population, p = 1.56 × 10⁻⁷; β = -4.73 years per MRS2 SD in the validation population, p = 4.04 × 10⁻³).
CONCLUSIONS: Overall, this study illustrates that metabolomics improves prediction of type 2 diabetes incidence by 4.5% on top of known clinical and biological markers, reaching 90% in total AROC, which is considered the threshold for clinical validity, suggesting it may be used in targeting interventions to prevent type 2 diabetes.
Affiliation(s)
- Loic Yengo
  - CNRS UMR8199, Pasteur Institute of Lille, Lille, France; European Genomic Institute for Diabetes (EGID), FR-3508, Lille, France; Lille University, France
- Michel Marre
  - INSERM, U1138 (équipe 2: Pathophysiology and Therapeutics of Vascular and Renal Diseases Related to Diabetes, Centre de Recherches des Cordeliers), Paris, France; University Paris 7 Denis Diderot, Sorbonne Paris Cité, France; AP-HP, DHU FIRE, Department of Endocrinology, Diabetology, Nutrition, and Metabolic Diseases, Bichat Claude Bernard Hospital, Paris, France
- Ronan Roussel
  - INSERM, U1138 (équipe 2: Pathophysiology and Therapeutics of Vascular and Renal Diseases Related to Diabetes, Centre de Recherches des Cordeliers), Paris, France; University Paris 7 Denis Diderot, Sorbonne Paris Cité, France; AP-HP, DHU FIRE, Department of Endocrinology, Diabetology, Nutrition, and Metabolic Diseases, Bichat Claude Bernard Hospital, Paris, France
- Martine Vaxillaire
  - CNRS UMR8199, Pasteur Institute of Lille, Lille, France; European Genomic Institute for Diabetes (EGID), FR-3508, Lille, France; Lille University, France
- Mario Falchi
  - Department of Genomics of Common Disease, School of Public Health, Imperial College London, Hammersmith Hospital, London, UK
- Abdelali Haoudi
  - Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
- Beverley Balkau
  - INSERM U-1018, CESP, Renal and Cardiovascular Epidemiology, UVSQ-UPS, Villejuif, France
- Amélie Bonnefond
  - CNRS UMR8199, Pasteur Institute of Lille, Lille, France; European Genomic Institute for Diabetes (EGID), FR-3508, Lille, France; Lille University, France
- Philippe Froguel
  - CNRS UMR8199, Pasteur Institute of Lille, Lille, France; European Genomic Institute for Diabetes (EGID), FR-3508, Lille, France; Lille University, France; Department of Genomics of Common Disease, School of Public Health, Imperial College London, Hammersmith Hospital, London, UK
9
Mayr A, Hofner B, Schmid M. Boosting the discriminatory power of sparse survival models via optimization of the concordance index and stability selection. BMC Bioinformatics 2016; 17:288. [PMID: 27444890 PMCID: PMC4957316 DOI: 10.1186/s12859-016-1149-8] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 11/19/2015] [Accepted: 07/13/2016] [Indexed: 12/15/2022]
Abstract
Background: When constructing new biomarker or gene signature scores for time-to-event outcomes, the underlying aims are to develop a discrimination model that helps to predict whether patients have a poor or good prognosis and to identify the most influential variables for this task. In practice, this is often done by fitting Cox models. Those are, however, not necessarily optimal with respect to the resulting discriminatory power and are based on restrictive assumptions. We present a combined approach to automatically select and fit sparse discrimination models for potentially high-dimensional survival data based on boosting a smooth version of the concordance index (C-index). Due to this objective function, the resulting prediction models are optimal with respect to their ability to discriminate between patients with longer and shorter survival times. The gradient boosting algorithm is combined with the stability selection approach to enhance and control its variable selection properties.
Results: The resulting algorithm fits prediction models based on the rankings of the survival times and automatically selects only the most stable predictors. The performance of the approach, which works best for small numbers of informative predictors, is demonstrated in a large scale simulation study: C-index boosting in combination with stability selection is able to identify a small subset of informative predictors from a much larger set of non-informative ones while controlling the per-family error rate. In an application to discover biomarkers for breast cancer patients based on gene expression data, stability selection yielded sparser models and the resulting discriminatory power was higher than with lasso penalized Cox regression models.
Conclusion: The combination of stability selection and C-index boosting can be used to select small numbers of informative biomarkers and to derive new prediction rules that are optimal with respect to their discriminatory power. Stability selection controls the per-family error rate, which makes the new approach also appealing from an inferential point of view, as it provides an alternative to classical hypothesis tests for single predictor effects. Due to the shrinkage and variable selection properties of statistical boosting algorithms, the latter tests are typically unfeasible for prediction models fitted by boosting.
Affiliation(s)
- Andreas Mayr
  - Institut für Medizininformatik, Biometrie und Epidemiologie, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Waldstr. 6, Erlangen, 91054, Germany; Institut für Medizinische Biometrie, Informatik und Epidemiologie, Rheinische Friedrich-Wilhelms-Universität Bonn, Sigmund-Freud-Str. 25, Bonn, 53105, Germany
- Benjamin Hofner
  - Institut für Medizininformatik, Biometrie und Epidemiologie, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Waldstr. 6, Erlangen, 91054, Germany
- Matthias Schmid
  - Institut für Medizinische Biometrie, Informatik und Epidemiologie, Rheinische Friedrich-Wilhelms-Universität Bonn, Sigmund-Freud-Str. 25, Bonn, 53105, Germany
10
Rodríguez-Álvarez MX, Meira-Machado L, Abu-Assi E, Raposeiras-Roubín S. Nonparametric estimation of time-dependent ROC curves conditional on a continuous covariate. Stat Med 2015; 35:1090-102. [PMID: 26487068 DOI: 10.1002/sim.6769] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Received: 11/15/2014] [Accepted: 09/29/2015] [Indexed: 12/31/2022]
Abstract
The receiver-operating characteristic (ROC) curve is the most widely used measure for evaluating the performance of a diagnostic biomarker when predicting a binary disease outcome. The ROC curve displays the true positive rate (or sensitivity) and the false positive rate (or 1-specificity) for different cut-off values used to classify an individual as healthy or diseased. In time-to-event studies, however, the disease status (e.g. death or alive) of an individual is not a fixed characteristic, and it varies along the study. In such cases, when evaluating the performance of the biomarker, several issues should be taken into account: first, the time-dependent nature of the disease status; and second, the presence of incomplete data (e.g. censored data typically present in survival studies). Accordingly, to assess the discrimination power of continuous biomarkers for time-dependent disease outcomes, time-dependent extensions of true positive rate, false positive rate, and ROC curve have been recently proposed. In this work, we present new nonparametric estimators of the cumulative/dynamic time-dependent ROC curve that allow accounting for the possible modifying effect of current or past covariate measures on the discriminatory power of the biomarker. The proposed estimators can accommodate right-censored data, as well as covariate-dependent censoring. The behavior of the estimators proposed in this study will be explored through simulations and illustrated using data from a cohort of patients who suffered from acute coronary syndrome.
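The cumulative/dynamic definitions mentioned in this abstract can be made concrete with a small sketch. At a fixed time t, subjects with an observed event at or before t count as cases and subjects still at risk after t count as controls. The code below is a deliberately naive version under stated assumptions: subjects censored before t are simply dropped, no covariate conditioning is done, and the function names and toy data are hypothetical. It does not implement the nonparametric estimators proposed in the paper, which account for censoring and for a continuous covariate.

```python
def cd_status(times, events, t):
    """Cumulative/dynamic status at time t:
    1 = case (event observed by t), 0 = control (still at risk after t),
    None = unknown (censored before t; dropped by this naive sketch)."""
    status = []
    for time, event in zip(times, events):
        if time <= t and event:
            status.append(1)
        elif time > t:
            status.append(0)
        else:
            status.append(None)
    return status


def tpr_fpr(marker, status, cutoff):
    """Time-dependent true and false positive rates for the rule marker > cutoff."""
    cases = [m for m, s in zip(marker, status) if s == 1]
    controls = [m for m, s in zip(marker, status) if s == 0]
    tpr = sum(m > cutoff for m in cases) / len(cases)
    fpr = sum(m > cutoff for m in controls) / len(controls)
    return tpr, fpr


def auc_t(marker, status):
    """Time-dependent AUC: share of case/control pairs in which the case
    has the higher marker value (ties count one half)."""
    cases = [m for m, s in zip(marker, status) if s == 1]
    controls = [m for m, s in zip(marker, status) if s == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in cases for b in controls)
    return wins / (len(cases) * len(controls))


# Hypothetical toy data: follow-up time, event indicator (1 = event, 0 = censored), marker.
times = [1, 2, 3, 5, 6, 7]
events = [1, 0, 1, 1, 0, 1]
marker = [3.1, 1.0, 1.2, 0.8, 1.5, 0.9]

status = cd_status(times, events, t=4)
tpr, fpr = tpr_fpr(marker, status, cutoff=1.0)
auc = auc_t(marker, status)
print(status, tpr, fpr, auc)
```

Dropping the subject censored at time 2 is what makes this estimator naive: when censoring is frequent or depends on covariates, the discarded subjects bias the rates, which is the problem the weighted and conditional estimators address.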
Affiliation(s)
- María Xosé Rodríguez-Álvarez
- Department of Statistics and Operations Research, and Biomedical Research Centre (CINBIO), University of Vigo, Campus Lagoas-Marcosende s/n, Vigo, 36310, Spain
- Luís Meira-Machado
- Centre of Mathematics and Department of Mathematics and Applications, University of Minho, Campus de Azurém, Guimarães, 4800-058, Portugal
- Emad Abu-Assi
- Department of Cardiology, University Clinical Hospital of Santiago de Compostela, Spain
|
11
|
Boulesteix AL, Schmid M. Machine learning versus statistical modeling. Biom J 2014; 56:588-93. [DOI: 10.1002/bimj.201300226] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.8] [Received: 10/09/2013] [Revised: 11/04/2013] [Accepted: 11/06/2013] [Indexed: 12/19/2022]
Affiliation(s)
- Anne-Laure Boulesteix
- Department of Medical Informatics, Biometry and Epidemiology; University of Munich; Germany
|
12
|
Boosting the concordance index for survival data--a unified framework to derive and evaluate biomarker combinations. PLoS One 2014; 9:e84483. [PMID: 24400093 PMCID: PMC3882229 DOI: 10.1371/journal.pone.0084483] [Citation(s) in RCA: 62] [Impact Index Per Article: 5.6] [Received: 07/19/2013] [Accepted: 11/14/2013] [Indexed: 11/30/2022]
Abstract
The development of molecular signatures for the prediction of time-to-event outcomes is a methodologically challenging task in bioinformatics and biostatistics. Although there are numerous approaches for the derivation of marker combinations and their evaluation, the underlying methodology often suffers from the problem that different optimization criteria are mixed during the feature selection, estimation and evaluation steps. This might result in marker combinations that are suboptimal regarding the evaluation criterion of interest. To address this issue, we propose a unified framework to derive and evaluate biomarker combinations. Our approach is based on the concordance index for time-to-event data, which is a non-parametric measure to quantify the discriminatory power of a prediction rule. Specifically, we propose a gradient boosting algorithm that results in linear biomarker combinations that are optimal with respect to a smoothed version of the concordance index. We investigate the performance of our algorithm in a large-scale simulation study and in two molecular data sets for the prediction of survival in breast cancer patients. Our numerical results show that the new approach is not only methodologically sound but can also lead to a higher discriminatory power than traditional approaches for the derivation of gene signatures.
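To illustrate what a smoothed version of the concordance index looks like, the sketch below compares the ordinary pairwise C-index with a variant in which the indicator "score i exceeds score j" is replaced by a sigmoid of the score difference, which makes the criterion differentiable and therefore usable as a boosting objective. Several choices here are assumptions and not taken from the paper: higher scores are read as higher risk, comparable pairs are left unweighted (no inverse-probability-of-censoring weights), and the names `c_index`, `smooth_c_index`, and `sigma` are hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def comparable_pairs(times, events):
    """Yield (i, j) where subject i has an observed event strictly before time j."""
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                yield i, j


def c_index(times, events, scores):
    """Share of comparable pairs in which the subject with the earlier event
    has the higher score (ties in the score count one half)."""
    num = den = 0.0
    for i, j in comparable_pairs(times, events):
        den += 1.0
        if scores[i] > scores[j]:
            num += 1.0
        elif scores[i] == scores[j]:
            num += 0.5
    return num / den


def smooth_c_index(times, events, scores, sigma=0.1):
    """Same as c_index, with the indicator replaced by a sigmoid of the
    score difference; sigma controls how closely the step is approximated."""
    num = den = 0.0
    for i, j in comparable_pairs(times, events):
        den += 1.0
        num += sigmoid((scores[i] - scores[j]) / sigma)
    return num / den


# Hypothetical toy data: the third subject is censored at time 6.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
scores = [0.9, 0.6, 0.7, 0.1]

hard = c_index(times, events, scores)
soft = smooth_c_index(times, events, scores, sigma=0.1)
print(hard, soft)
```

As `sigma` shrinks, the smoothed value approaches the ordinary C-index, at the cost of steeper gradients; larger values give a smoother surface that tracks the original criterion less closely.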
|