1
Rizopoulos D, Taylor JMG. Optimizing dynamic predictions from joint models using super learning. Stat Med 2024; 43:1315-1328. [PMID: 38270062 DOI: 10.1002/sim.10010] [Received: 09/20/2023] [Revised: 11/30/2023] [Accepted: 12/29/2023] [Indexed: 01/26/2024]
Abstract
Joint models for longitudinal and time-to-event data are often employed to calculate dynamic individualized predictions used in numerous applications of precision medicine. Two components of joint models that influence the accuracy of these predictions are the shape of the longitudinal trajectories and the functional form linking the longitudinal outcome history to the hazard of the event. Finding a single well-specified model that produces accurate predictions for all subjects and follow-up times can be challenging, especially when considering multiple longitudinal outcomes. In this work, we use the concept of super learning to avoid selecting a single model. In particular, we specify a weighted combination of the dynamic predictions calculated from a library of joint models with different specifications. The weights are selected to optimize a predictive accuracy metric using V-fold cross-validation. As measures of predictive accuracy, we use the expected quadratic prediction error and the expected predictive cross-entropy. In a simulation study, we found that the super learning approach produces results very similar to the Oracle model, which was the model with the best performance in the test datasets. All proposed methodology is implemented in the freely available R package JMbayes2.
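The weighting step described above can be illustrated with a minimal sketch. It assumes binary outcomes at a single prediction horizon with no censoring, out-of-fold predictions from each library model already computed by V-fold cross-validation, and the quadratic (Brier) loss; a coarse grid search over the weight simplex stands in for a numerical optimizer. The function names are hypothetical and are not part of the JMbayes2 API, and the paper's expected quadratic prediction error for dynamic predictions must also handle censoring, which this sketch ignores.

```python
from itertools import product


def brier(pred, y):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - yi) ** 2 for p, yi in zip(pred, y)) / len(y)


def simplex_grid(k, steps):
    """Yield weight vectors of length k with entries in {0, 1/steps, ..., 1} summing to 1."""
    for combo in product(range(steps + 1), repeat=k - 1):
        if sum(combo) <= steps:
            yield [c / steps for c in combo] + [(steps - sum(combo)) / steps]


def superlearner_weights(cv_preds, y, steps=20):
    """Hypothetical helper: choose ensemble weights minimizing the Brier score.

    cv_preds[k][i] is library model k's out-of-fold prediction for subject i.
    Returns (best weights, Brier score of the weighted prediction).
    """
    best_w, best_loss = None, float("inf")
    for w in simplex_grid(len(cv_preds), steps):
        # weighted combination of the library predictions for every subject
        combo = [sum(wk * preds[i] for wk, preds in zip(w, cv_preds)) for i in range(len(y))]
        loss = brier(combo, y)
        if loss < best_loss:
            best_w, best_loss = w, loss
    return best_w, best_loss
```

For example, `superlearner_weights([[0.9, 0.1, 0.8, 0.2], [0.5, 0.5, 0.5, 0.5]], [1, 0, 1, 0])` puts all weight on the first model, whose predictions are closer to every outcome. The grid grows as (steps + 1)^(K - 1) for K models, so a constrained optimizer would be more practical for larger libraries.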
Affiliation(s)
- Dimitris Rizopoulos
- Department of Biostatistics, Erasmus MC University Medical Center, Rotterdam, The Netherlands
- Department of Epidemiology, Erasmus MC University Medical Center, Rotterdam, The Netherlands
- Jeremy M G Taylor
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA

2
Bouaziz O. Assessing model prediction performance for the expected cumulative number of recurrent events. Lifetime Data Anal 2024; 30:262-289. [PMID: 37975951 DOI: 10.1007/s10985-023-09610-x] [Received: 01/27/2023] [Accepted: 09/19/2023] [Indexed: 11/19/2023]
Abstract
In a recurrent event setting, we introduce a new score designed to evaluate the ability of a given model to predict the expected cumulative number of recurrent events. This score can be seen as an extension of the Brier score for single time-to-event data, but it works for recurrent events with or without a terminal event. Theoretical results show that, under standard assumptions in a recurrent event context, our score can be asymptotically decomposed as the sum of the theoretical mean squared error between the model and the true expected cumulative number of recurrent events and an inseparability term that does not depend on the model. This decomposition is further illustrated in simulation studies. It is also shown that this score should be used in comparison with a reference model, such as a nonparametric estimator that does not include the covariates. Finally, the score is applied to the prediction of hospitalisations in a dataset of patients suffering from atrial fibrillation, and the prediction performances of different models, such as the Cox model, the Aalen model and the Ghosh and Lin model, are compared.

3
Yang W, Jiang J, Schnellinger EM, Kimmel SE, Guo W. Modified Brier score for evaluating prediction accuracy for binary outcomes. Stat Methods Med Res 2022; 31:2287-2296. [PMID: 36031854 PMCID: PMC9691523 DOI: 10.1177/09622802221122391] [Indexed: 12/15/2022]
Abstract
The Brier score has been a popular measure of prediction accuracy for binary outcomes. However, it is not straightforward to interpret the Brier score for a prediction model, since its value depends on the outcome prevalence. We decompose the Brier score into two components: the mean squared difference between the estimated and true underlying binary probabilities, and the variance of the binary outcome, which does not reflect model performance. We then propose to modify the Brier score by removing the variance of the binary outcome, estimated via a general sliding window approach. We show through simulation that the newly proposed measure is more sensitive for comparing different models. A standardized performance improvement measure based on the new criterion is also proposed to quantify the improvement in prediction performance. We apply the new measures to data from the Breast Cancer Surveillance Consortium and compare the performance of models predicting breast cancer risk with and without its most important predictor.
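The decomposition above follows because, writing π(X) = P(Y = 1 | X) for the true risk and π̂(X) for the model's estimate, E[(Y − π̂)²] = E[(π − π̂)²] + E[π(1 − π)], the cross term having mean zero given X. A minimal sketch of the modified score then subtracts an estimate of E[π(1 − π)] from the ordinary Brier score. ASSUMPTION: the sliding window here runs over subjects ranked by predicted risk with a fixed, arbitrarily chosen half-width; the paper's general sliding window approach may define windows differently, and the function names are hypothetical.

```python
def brier_score(pred, y):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - yi) ** 2 for p, yi in zip(pred, y)) / len(y)


def sliding_window_variance(pred, y, half_width=25):
    """Estimate E[pi(1 - pi)] with a sliding window over subjects ranked by predicted risk.

    ASSUMPTION: each subject's local event rate is the mean outcome of the
    (up to) 2 * half_width + 1 neighbours centred on it, itself included.
    """
    order = sorted(range(len(y)), key=lambda i: pred[i])
    n = len(order)
    total = 0.0
    for pos in range(n):
        lo, hi = max(0, pos - half_width), min(n, pos + half_width + 1)
        window = [y[order[j]] for j in range(lo, hi)]
        p_local = sum(window) / len(window)
        total += p_local * (1 - p_local)  # Bernoulli variance at the local rate
    return total / n


def modified_brier(pred, y, half_width=25):
    """Brier score minus the estimated outcome variance (can be negative in small samples)."""
    return brier_score(pred, y) - sliding_window_variance(pred, y, half_width)
```

The window width trades bias for variance: a very narrow window drives the variance estimate toward zero, so the modified score collapses back to the ordinary Brier score.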
Affiliation(s)
- Wei Yang
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, USA
- Jiakun Jiang
- Center for Statistics and Data Science, Beijing Normal University, Zhuhai, China
- Erin M Schnellinger
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, USA
- Stephen E Kimmel
- Department of Epidemiology, University of Florida, Gainesville, USA
- Wensheng Guo
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, USA

4
Cho Y, Molinaro AM, Hu C, Strawderman RL. Regression trees and ensembles for cumulative incidence functions. Int J Biostat 2022; 18:397-419. [PMID: 35334192 PMCID: PMC9509494 DOI: 10.1515/ijb-2021-0014] [Received: 02/15/2021] [Accepted: 03/02/2022] [Indexed: 01/10/2023]
Abstract
The use of cumulative incidence functions for characterizing the risk of one type of event in the presence of others has become increasingly popular over the past two decades. The problems of modeling, estimation and inference have been treated using parametric, nonparametric and semi-parametric methods. Efforts to develop suitable extensions of machine learning methods, such as regression trees and ensemble methods, have begun only comparatively recently. In this paper, we propose a novel approach to estimating cumulative incidence curves in a competing risks setting using regression trees and associated ensemble estimators. The proposed methods use augmented estimators of the Brier score risk as the primary basis for building and pruning trees, and lead to methods that are easily implemented using existing R packages. Data from the Radiation Therapy Oncology Group (trial 9410) are used to illustrate these new methods.
Affiliation(s)
- Youngjoo Cho
- Department of Applied Statistics, Konkuk University, Seoul, Republic of Korea
- Annette M. Molinaro
- Department of Neurological Surgery, University of California San Francisco, San Francisco, CA, USA
- Chen Hu
- Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Robert L. Strawderman
- Department of Biostatistics & Computational Biology, University of Rochester, Rochester, NY, USA

5
Talebi A, Mortensen RN, Gerds TA, Jeppesen JL, Torp-Pedersen C. Prediction of cardiovascular events from systolic or diastolic blood pressure. J Clin Hypertens (Greenwich) 2022; 24:760-769. [PMID: 35470947 PMCID: PMC9180316 DOI: 10.1111/jch.14468] [Received: 01/15/2022] [Revised: 02/26/2022] [Accepted: 03/01/2022] [Indexed: 11/27/2022]
Abstract
Over time, the focus of blood pressure assessment has shifted from diastolic to systolic pressure, yet formal analyses of differences in predictive value are scarce. The goal of the study was to determine whether office systolic blood pressure (SBP) adds prognostic information to office diastolic blood pressure (DBP), and whether 24-h ambulatory SBP and 24-h ambulatory DBP are each specifically important. The authors examined 2097 participants from a population cohort recruited in Copenhagen, Denmark. Cause-specific Cox regression was performed to predict 10-year person-specific absolute risks of fatal and non-fatal cardiovascular (CV) events. The time-dependent area under the receiver operating characteristic curve (AUC) was used to evaluate discriminative ability. Calibration plots of the models (Hosmer-May test) were calculated, as well as the Brier score, which combines discrimination and calibration. Adding both 24-h ambulatory SBP and 24-h ambulatory DBP did not significantly increase AUC for CV mortality or CV events. Likewise, adding both office SBP and office DBP did not significantly improve AUC for either CV mortality or CV events. The difference in AUC (95% confidence interval; p-value) was .26% (-.2% to .73%; .27) for 10-year CV mortality and .69% (-.09% to 1.46%; .082) for 10-year risk of CV events. The difference in AUC was .12% (-.2% to .44%; .46) for 10-year CV mortality and .04% (-.35% to .42%; .85) for 10-year risk of CV events. Moreover, for both CV mortality and CV events, office SBP did not add prognostic information to office DBP. The Brier scores of office BP for CV mortality and CV events were .078 and .077, respectively, and the corresponding Brier scores for 24-h ambulatory BP were .077 and .078. For an average population, such as those participating in a population survey, the 10-year discriminative ability for long-term predictions of CV death and CV events is not improved by adding systolic to diastolic blood pressure.
This finding holds for ambulatory as well as office blood pressure.
Affiliation(s)
- Atefeh Talebi
- Colorectal Research Center, Iran University of Medical Sciences, Tehran, Iran
- Jørgen Lykke Jeppesen
- Cardiology, Department of Clinical Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark; Department of Medicine, Amager Hvidovre Hospital Glostrup, University of Copenhagen, Glostrup, Denmark

6
Xu Q, Wang L, Ming J, Cao H, Liu T, Yu X, Bai Y, Liang S, Hu R, Wang L, Chen C, Zhou J, Ji Q. Using noninvasive anthropometric indices to develop and validate a predictive model for metabolic syndrome in Chinese adults: a nationwide study. BMC Endocr Disord 2022; 22:53. [PMID: 35241044 PMCID: PMC8895645 DOI: 10.1186/s12902-022-00948-1] [Received: 10/15/2021] [Accepted: 01/27/2022] [Indexed: 12/23/2022]
Abstract
PURPOSE Metabolic syndrome (MetS) is a pathological condition that includes many abnormal metabolic components and requires a simple detection method for rapid use in a large population. The aim of the study was to develop a diagnostic model for MetS in a Chinese population with noninvasive anthropometric and demographic predictors. PATIENTS AND METHODS Least absolute shrinkage and selection operator (LASSO) regression was used to screen predictors. A large sample from the China National Diabetes and Metabolic Disorders Survey (CNDMDS) was used to develop the model with logistic regression, and internal, internal-external and external validation were conducted to evaluate the model performance. A score calculator was developed to display the final model. RESULTS We evaluated the discrimination and calibration of the model by receiver operating characteristic (ROC) curves and calibration curve analysis. The area under the ROC curve (AUC) and the Brier score of the original model were 0.88 and 0.122, respectively. The mean AUC and the mean Brier score of 10-fold cross-validation were 0.879 and 0.122, respectively. The mean AUC and the mean Brier score of internal-external validation were 0.878 and 0.121, respectively. The AUC and Brier score of external validation were 0.862 and 0.133, respectively. CONCLUSIONS The model developed in this study has good discrimination and calibration performance. Its stability was confirmed by internal, external and internal-external validation. Finally, the model was implemented as a calculator that displays the specific predicted probability, for easy use in the Chinese population.
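The Brier score and calibration curve analysis reported above can be illustrated with a minimal sketch: the Brier score as the mean squared error of predicted probabilities, and a binned calibration table that pairs the mean predicted risk with the observed event rate in each bin, which is what a calibration curve plots. ASSUMPTION: equal-width probability bins with an arbitrary default bin count; the study's own calibration method is not specified here, and the function names are hypothetical.

```python
def brier_score(pred, y):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - yi) ** 2 for p, yi in zip(pred, y)) / len(y)


def calibration_table(pred, y, n_bins=10):
    """Per non-empty equal-width bin: (mean predicted risk, observed event rate, count)."""
    bins = [[] for _ in range(n_bins)]
    for p, yi in zip(pred, y):
        k = min(int(p * n_bins), n_bins - 1)  # a prediction of exactly 1.0 goes into the last bin
        bins[k].append((p, yi))
    table = []
    for members in bins:
        if members:
            mean_pred = sum(p for p, _ in members) / len(members)
            event_rate = sum(yi for _, yi in members) / len(members)
            table.append((mean_pred, event_rate, len(members)))
    return table
```

A well-calibrated model yields rows where the mean predicted risk and the observed event rate are close; a cross-validated summary would average these quantities over folds.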
Affiliation(s)
- Qian Xu
- Department of Endocrinology, Xijing Hospital, Air Force Medical University, Changle West Road No. 169, Xi'an, 710032, Shaanxi, China
- Li Wang
- Department of Endocrinology, Xijing Hospital, Air Force Medical University, Changle West Road No. 169, Xi'an, 710032, Shaanxi, China
- Jie Ming
- Department of Endocrinology, Xijing Hospital, Air Force Medical University, Changle West Road No. 169, Xi'an, 710032, Shaanxi, China
- Hongwei Cao
- Department of Endocrinology, Xijing Hospital, Air Force Medical University, Changle West Road No. 169, Xi'an, 710032, Shaanxi, China
- Tao Liu
- Department of Endocrinology, Xijing Hospital, Air Force Medical University, Changle West Road No. 169, Xi'an, 710032, Shaanxi, China
- Xinwen Yu
- Department of Endocrinology, Xijing Hospital, Air Force Medical University, Changle West Road No. 169, Xi'an, 710032, Shaanxi, China
- Yuanyuan Bai
- Department of Endocrinology, Xijing Hospital, Air Force Medical University, Changle West Road No. 169, Xi'an, 710032, Shaanxi, China
- Shengru Liang
- Department of Endocrinology, Xijing Hospital, Air Force Medical University, Changle West Road No. 169, Xi'an, 710032, Shaanxi, China
- Ruofan Hu
- Department of Endocrinology, Xijing Hospital, Air Force Medical University, Changle West Road No. 169, Xi'an, 710032, Shaanxi, China
- Li Wang
- Department of Endocrinology, Xijing Hospital, Air Force Medical University, Changle West Road No. 169, Xi'an, 710032, Shaanxi, China
- Changsheng Chen
- Department of Health Statistics, School of Preventive Medicine, Air Force Medical University, Changle West Road No. 169, Xi'an, 710032, Shaanxi, China
- Jie Zhou
- Department of Endocrinology, Xijing Hospital, Air Force Medical University, Changle West Road No. 169, Xi'an, 710032, Shaanxi, China
- Qiuhe Ji
- Department of Endocrinology, Xijing Hospital, Air Force Medical University, Changle West Road No. 169, Xi'an, 710032, Shaanxi, China

7
Liu Y, Zhou S, Wei H, An S. A comparative study of forest methods for time-to-event data: variable selection and predictive performance. BMC Med Res Methodol 2021; 21:193. [PMID: 34563138 DOI: 10.1186/s12874-021-01386-8] [Received: 04/19/2021] [Accepted: 09/02/2021] [Indexed: 11/17/2022]
Abstract
Background As a popular machine learning method, the forest approach is an attractive alternative to the Cox model. Random survival forests (RSF) is the most popular survival forest method, but it has drawbacks, such as a selection bias towards covariates with many possible split points. Conditional inference forests (CIF) are known to reduce this selection bias through a two-step split procedure that implements hypothesis tests, separating variable selection from splitting, but the method is computationally expensive. Random forests with maximally selected rank statistics (MSR-RF), proposed recently, seem to be a great improvement on RSF and CIF. Methods In this paper we used a simulation study and a real data application to compare the prediction performance and variable selection performance of three survival forest methods: RSF, CIF and MSR-RF. To evaluate variable selection performance, we pooled all simulations and calculated how often the correct variables ranked at the top of the variable importance measures, where a higher frequency indicates better selection ability. We used the Integrated Brier Score (IBS) and the c-index to measure the prediction accuracy of all three methods; a smaller IBS indicates better prediction. Results Simulations show that the three forest methods differ only slightly in prediction performance. MSR-RF and RSF might perform better than CIF when the datasets contain only continuous or binary variables. For variable selection performance, when the datasets contain multiple categorical variables, the selection frequency of RSF seems to be lowest in most cases. MSR-RF and CIF have higher selection rates, and CIF performs especially well with the interaction term. The fact that the degree of correlation between variables has little effect on the selection frequency indicates that all three forest methods can handle correlated data.
When the datasets contain only continuous variables, MSR-RF performs better. When they contain only binary variables, RSF and MSR-RF have more advantages than CIF. As the variable dimension increases, MSR-RF and RSF seem to be more robust than CIF. Conclusions All three methods show advantages in prediction performance and variable selection performance under different situations. The recently proposed MSR-RF methodology has practical value and is well worth popularizing. It is important to choose the appropriate method in practice according to the research aim and the nature of the covariates. Supplementary Information The online version contains supplementary material available at 10.1186/s12874-021-01386-8.
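The Integrated Brier Score used above is typically computed from an inverse-probability-of-censoring-weighted (IPCW) Brier score. The following is a minimal sketch of that estimator, assuming right-censored data without competing risks, a Kaplan-Meier estimate of the censoring distribution, and a trapezoidal rule over a user-chosen time grid, normalized by the grid length. It is not the code used in the comparison study; `predict` is a hypothetical callable standing in for a fitted forest's survival predictions at time t.

```python
def km_censoring(times, events):
    """Kaplan-Meier estimate G of the censoring survival function P(C > t).

    Censorings (event == 0) play the role of 'events' here. The returned
    function gives G(t), or the left limit G(t-) when left_limit=True.
    """
    cens_times = sorted({t for t, e in zip(times, events) if e == 0})
    steps = []  # (censoring time s, value of G just after s)
    g = 1.0
    for s in cens_times:
        at_risk = sum(1 for t in times if t >= s)
        d = sum(1 for t, e in zip(times, events) if t == s and e == 0)
        g *= 1 - d / at_risk
        steps.append((s, g))

    def G(t, left_limit=False):
        value = 1.0
        for s, g_s in steps:
            if s < t or (s == t and not left_limit):
                value = g_s
            else:
                break
        return value

    return G


def brier_ipcw(t, surv_pred, times, events, G):
    """IPCW Brier score at time t.

    surv_pred[i] is the predicted probability that subject i is event-free at t.
    Subjects censored at or before t without an event get zero weight.
    """
    total = 0.0
    for s_i, t_i, e_i in zip(surv_pred, times, events):
        if t_i <= t and e_i == 1:
            # event observed by t: true status 0, weight 1 / G(T_i-)
            total += s_i ** 2 / G(t_i, left_limit=True)
        elif t_i > t:
            # still under observation at t: true status 1, weight 1 / G(t)
            total += (1 - s_i) ** 2 / G(t)
    return total / len(times)


def integrated_brier(grid, predict, times, events):
    """Trapezoidal integral of the IPCW Brier score over grid, divided by its length."""
    G = km_censoring(times, events)
    scores = [brier_ipcw(t, predict(t), times, events, G) for t in grid]
    area = sum(
        (grid[k + 1] - grid[k]) * (scores[k] + scores[k + 1]) / 2
        for k in range(len(grid) - 1)
    )
    return area / (grid[-1] - grid[0])
```

Dividing by the grid length keeps the IBS on the same scale as a single Brier score, so a constant prediction of 0.5 scores 0.25 regardless of the time window.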

8
Zhou QM, Zhe L, Brooke RJ, Hudson MM, Yuan Y. A relationship between the incremental values of area under the ROC curve and of area under the precision-recall curve. Diagn Progn Res 2021; 5:13. [PMID: 34261544 PMCID: PMC8278775 DOI: 10.1186/s41512-021-00102-w] [Received: 12/23/2020] [Accepted: 06/08/2021] [Indexed: 11/17/2022]
Abstract
BACKGROUND Incremental value (IncV) evaluates the performance change between an existing risk model and a new model. Different IncV metrics do not always agree with each other. For example, compared with a prescribed-dose model, an ovarian-dose model for predicting acute ovarian failure has a slightly lower area under the receiver operating characteristic curve (AUC) but increases the area under the precision-recall curve (AP) by 48%. This phenomenon of disagreement is not uncommon, and can create confusion when assessing whether the added information improves the model prediction accuracy. METHODS In this article, we examine the analytical connections and differences between the AUC IncV (ΔAUC) and AP IncV (ΔAP). We also compare the true values of these two IncV metrics in a numerical study. Additionally, as both are semi-proper scoring rules, we compare them with a strictly proper scoring rule, the IncV of the scaled Brier score (ΔsBrS), in the numerical study. RESULTS We demonstrate that ΔAUC and ΔAP are both weighted averages of the changes (from the existing model to the new one) in separating the risk score distributions between events and non-events. However, ΔAP assigns heavier weights to the changes in higher-risk regions, whereas ΔAUC weights the changes equally. Due to this difference, the two IncV metrics can disagree, and the numerical study shows that their disagreement becomes more pronounced as the event rate decreases. In the numerical study, we also find that ΔAP has a wide range, from negative to positive, but the range of ΔAUC is much smaller. In addition, ΔAP and ΔsBrS are highly consistent, but ΔAUC is negatively correlated with ΔsBrS and ΔAP when the event rate is low. CONCLUSIONS ΔAUC treats the wins and losses of a new risk model equally across different risk regions. When neither the existing nor the new model is the true model, this equality could attenuate a superior performance of the new model for a sub-region.
In contrast, ΔAP accentuates the change in the prediction accuracy for higher-risk regions.
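The three incremental values compared above can be computed directly from their definitions. This minimal sketch assumes a binary outcome, risk scores on the probability scale (needed for the Brier score), AP computed as the mean precision at the rank of each event with tied scores kept in input order, and the scaled Brier score defined as 1 − BS/BS_null, where BS_null = p̄(1 − p̄) is the Brier score of predicting the event rate p̄ for everyone. Each increment is the new model's value minus the existing model's; the function names are illustrative.

```python
def auc(risk, y):
    """Mann-Whitney AUC: P(risk of a random event > risk of a random non-event), ties = 1/2."""
    events = [r for r, yi in zip(risk, y) if yi == 1]
    non_events = [r for r, yi in zip(risk, y) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in events for b in non_events)
    return wins / (len(events) * len(non_events))


def average_precision(risk, y):
    """Mean of the precision values at the rank of each event (ranked by decreasing risk)."""
    ranked = sorted(zip(risk, y), key=lambda pair: -pair[0])  # stable sort: ties keep input order
    hits, precisions = 0, []
    for k, (_, yi) in enumerate(ranked, start=1):
        if yi == 1:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions)


def scaled_brier(risk, y):
    """1 - BS / BS_null, where BS_null = p(1 - p) is the Brier score of predicting the event rate p."""
    n = len(y)
    bs = sum((r - yi) ** 2 for r, yi in zip(risk, y)) / n
    p = sum(y) / n
    return 1 - bs / (p * (1 - p))


def incremental_values(old_risk, new_risk, y):
    """Change (new minus existing model) in AUC, AP and scaled Brier score."""
    return {
        "dAUC": auc(new_risk, y) - auc(old_risk, y),
        "dAP": average_precision(new_risk, y) - average_precision(old_risk, y),
        "dsBrS": scaled_brier(new_risk, y) - scaled_brier(old_risk, y),
    }
```

For instance, with `y = [0, 0, 1, 1]`, an existing model scoring `[0.1, 0.4, 0.35, 0.8]` and a new model scoring `[0.1, 0.2, 0.3, 0.9]`, the increments are ΔAUC = 0.25, ΔAP = 1/6 and ΔsBrS = 0.0825, up to floating-point rounding.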
Affiliation(s)
- Qian M. Zhou
- Department of Mathematics and Statistics, Mississippi State University, Mississippi State, MS, USA
- Lu Zhe
- School of Public Health, University of Alberta, Edmonton, AB, Canada
- Russell J. Brooke
- St. Jude Children’s Research Hospital, Memphis, TN, USA
- Melissa M. Hudson
- St. Jude Children’s Research Hospital, Memphis, TN, USA
- Yan Yuan
- School of Public Health, University of Alberta, Edmonton, AB, Canada

9
Li YH, Sheu WH, Yeh WC, Chang YC, Lee IT. Predicting Long-Term Mortality in Patients with Angina across the Spectrum of Dysglycemia: A Machine Learning Approach. Diagnostics (Basel) 2021; 11:1060. [PMID: 34207578 DOI: 10.3390/diagnostics11061060] [Received: 04/19/2021] [Revised: 06/02/2021] [Accepted: 06/05/2021] [Indexed: 11/21/2022]
Abstract
We aimed to develop and validate a model for predicting mortality in patients with angina across the spectrum of dysglycemia. A total of 1479 patients admitted for coronary angiography due to angina were enrolled. All-cause mortality served as the primary endpoint. The models were validated with five-fold cross-validation to predict long-term mortality. The features selected by the least absolute shrinkage and selection operator (LASSO) were age, heart rate, plasma glucose levels at 30 min and 120 min during an oral glucose tolerance test (OGTT), the use of angiotensin II receptor blockers, the use of diuretics, and smoking history. The best-performing model was built using a random survival forest with the selected features. It had good discriminative ability (Harrell’s C-index: 0.829) and acceptable calibration (Brier score: 0.08) for predicting long-term mortality. Among patients with obstructive coronary artery disease confirmed by angiography, our model outperformed the Global Registry of Acute Coronary Events discharge score for mortality prediction (Harrell’s C-index: 0.829 vs. 0.739, p < 0.001). In conclusion, we developed a machine learning model to predict long-term mortality among patients with angina. With the integration of OGTT, the model could help to identify patients at high risk of mortality across the spectrum of dysglycemia.
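Harrell's C-index, used above to measure discrimination, can be sketched as follows. This assumes right-censored data, higher scores meaning higher risk (for a random survival forest this could be a per-subject mortality score; that choice is an assumption here), pairs with tied follow-up times excluded, and tied risk scores counted as one half.

```python
def harrell_c(risk, times, events):
    """Harrell's concordance index for right-censored data.

    A pair (i, j) is comparable when subject i has an observed event and a
    strictly shorter time than subject j. It is concordant when i also has
    the higher risk score; tied risk scores count as one half.
    """
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if events[i] != 1:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ordering of event times by risk.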

10
Barnes DR, Danelson KA, Moholkar NM, Loftis KL. Methodology for Evaluation of WIAMan Injury Assessment Reference Curves Using Whole Body Match-Paired Data. Ann Biomed Eng 2021. [PMID: 33880631 DOI: 10.1007/s10439-021-02770-7] [Received: 09/09/2020] [Accepted: 03/27/2021] [Indexed: 10/21/2022]
Abstract
Development of the Warrior Injury Assessment Manikin (WIAMan) capability has included the creation of injury assessment reference curves (IARCs) specific to under-body blast (UBB) loading mechanisms and injuries. The WIAMan IARCs were created from high-rate vertical loading tests of component post-mortem human surrogates (PMHS) and analogous components of the WIAMan anthropomorphic test device (ATD). Validation of the WIAMan IARCs is required prior to the WIAMan ATD being utilized for injury assessment in live-fire vehicle test events. A portion of the validation process involves evaluating the ability of the IARCs to predict injury at the system level (whole body). This study evaluates a methodology to assess the performance of the WIAMan IARCs using match-paired tests of whole body PMHS and the WIAMan ATD. The methodology includes a qualitative analysis designed to identify false-positive and false-negative ATD predictions, as well as a quantitative analysis that utilizes area under the receiver-operating characteristic curve (AROC) and Brier score indices to grade IARC performance. Three WIAMan IARCs were used to exemplify the proposed methodology and results are provided. Attributes of the false-prediction, AROC, and Brier score portions of the methodology are presented, with results indicating the new methodology is thorough and robust in evaluation of IARCs.

11
Li P, Taylor JMG, Spratt DE, Karnes RJ, Schipper MJ. Evaluation of predictive model performance of an existing model in the presence of missing data. Stat Med 2021; 40:3477-3498. [PMID: 33843085 DOI: 10.1002/sim.8978] [Received: 01/11/2020] [Revised: 02/13/2021] [Accepted: 03/24/2021] [Indexed: 11/11/2022]
Abstract
In medical research, the Brier score (BS) and the area under the receiver operating characteristic (ROC) curve (AUC) are two common metrics used to evaluate prediction models of a binary outcome, such as using biomarkers to predict the risk of developing a disease in the future. Assessing an existing prediction model using data with missing covariate values is challenging. In this article, we propose inverse probability weighted (IPW) and augmented inverse probability weighted (AIPW) estimates of AUC and BS to handle the missing data. An alternative approach uses multiple imputation (MI), which requires a model for the distribution of the missing variable. We evaluated the performance of IPW and AIPW in comparison with MI in simulation studies under missing completely at random, missing at random, and missing not at random scenarios. When there are missing observations in the data, MI and IPW can be used to obtain unbiased estimates of BS and AUC if the imputation model for the missing variable or the model for the missingness, respectively, is correctly specified. MI is more efficient than IPW. Our simulation results suggest that AIPW can be more efficient than IPW and is also doubly robust against misspecification of either the missingness model or the imputation model. The outcome variable should be included in the model for the missing variable under all scenarios, whereas it needs to be included in the missingness model only if the missingness depends on the outcome. We illustrate these methods using an example from prostate cancer.
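The IPW idea above can be sketched for the Brier score: complete cases are reweighted by the inverse of their probability of being complete, so that they stand in for the incomplete cases. This minimal sketch assumes those probabilities have already been estimated (for example, from a logistic regression for the missingness indicator) and normalizes the weights by their sum (a Hájek-type estimator); the paper's exact normalization may differ, the function name is hypothetical, and the AIPW and MI estimators are not shown.

```python
def ipw_brier(pred, y, observed, p_observed):
    """Inverse-probability-weighted Brier score over complete cases.

    pred[i]        model prediction (may be None when a covariate is missing)
    observed[i]    True if subject i's covariates are complete
    p_observed[i]  estimated probability that subject i is complete (assumed given)
    Weights are normalized by their sum (Hajek-type estimator).
    """
    num = den = 0.0
    for p, yi, r, pi in zip(pred, y, observed, p_observed):
        if r:
            w = 1.0 / pi  # rarely observed subjects count for more
            num += w * (p - yi) ** 2
            den += w
    return num / den
```

With all completeness probabilities equal, the weights cancel and the estimate reduces to the complete-case Brier score, which is unbiased only when data are missing completely at random.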
Affiliation(s)
- Pin Li
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA
- Jeremy M G Taylor
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA; Department of Radiation Oncology, University of Michigan, Ann Arbor, Michigan, USA
- Daniel E Spratt
- Department of Radiation Oncology, University of Michigan, Ann Arbor, Michigan, USA
- Matthew J Schipper
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA; Department of Radiation Oncology, University of Michigan, Ann Arbor, Michigan, USA

12
Abstract
Calibration is an important measure of the predictive accuracy of a prognostic risk model. A widely used measure of calibration when the outcome is survival time is the expected Brier score. In this paper, methodology is developed to accurately estimate the difference in expected Brier scores derived from nested survival models and to compute an accompanying variance estimate of this difference. The methodology is applicable to Cox survival models with time-invariant and time-varying coefficients. The nested survival model approach is often applied to the scenario where the full model consists of conventional and new covariates and the subset model contains the conventional covariates alone. A complicating factor in the methodologic development is that the Cox model specification cannot, in general, be simultaneously satisfied for nested models. This problem is resolved by projecting the properly specified full survival model onto the lower dimensional space of conventional markers alone. Simulations are performed to examine the method's finite sample properties, and a prostate cancer data set is used to illustrate its application.
Affiliation(s)
- Glenn Heller
- Department of Epidemiology and Biostatistics, Memorial Sloan Kettering, 485 Lexington Avenue, New York, New York, 10017, USA.

13
Abstract
The random survival forest (RSF) is a non-parametric alternative to the Cox proportional hazards model in modeling time-to-event data. In this article, we developed a modeling framework that incorporates multivariate longitudinal data in the model building process to enhance the predictive performance of RSF. To extract the essential features of the multivariate longitudinal outcomes, two methods were adopted and compared: multivariate functional principal component analysis and multivariate fast covariance estimation for sparse functional data. The resulting features, which capture the trajectories of the multiple longitudinal outcomes, are then included as time-independent predictors in the subsequent RSF model. This non-parametric modeling framework, denoted functional survival forests, is better at capturing the various trends in both the longitudinal outcomes and the survival model, which may be difficult to model using only parametric approaches. These advantages are demonstrated through simulations and applications to the Alzheimer's Disease Neuroimaging Initiative.
Affiliation(s)
- Jeffrey Lin
- Department of Biostatistics, University of Texas Health Science Center at Houston, Houston, TX, USA
- Kan Li
- Merck Research Laboratory, Merck & Co., North Wales, PA, USA
- Sheng Luo
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA

14
Dwomoh D, Adu B, Dodoo D, Theisen M, Iddi S, Gerds TA. Evaluating the predictive performance of malaria antibodies and FCGR3B gene polymorphisms on Plasmodium falciparum infection outcome: a prospective cohort study. Malar J 2020; 19:307. [PMID: 32854708 PMCID: PMC7450914 DOI: 10.1186/s12936-020-03381-8] [Received: 12/06/2019] [Accepted: 08/19/2020] [Indexed: 12/03/2022]
Abstract
Background Malaria antigen-specific antibodies and polymorphisms in host receptors involved in antibody functionality have been associated with different outcomes of Plasmodium falciparum infections. Thus, to identify key prospective malaria antigens for vaccine development, there is a need to evaluate the associations between malaria antibodies and antibody-dependent host factors with more rigorous statistical methods. In this study, different statistical models were used to evaluate the predictive performance of malaria-specific antibodies and host gene polymorphisms on P. falciparum infection in a longitudinal cohort study involving Ghanaian children. Methods Models with different functional forms were built using known predictors (age, sickle cell status, blood group status, parasite density, and mosquito bed net use), malaria antigen-specific immunoglobulin (Ig) G and IgG subclasses, and FCGR3B polymorphisms shown to mediate antibody-dependent cellular functions. The malaria antigens studied were merozoite surface proteins (MSP-1 and MSP-3), glutamate-rich protein (GLURP)-R0 and R2, and apical membrane antigen 1 (AMA-1). The models were evaluated through visualization and assessment of differences in the area under the receiver operating characteristic curve and the Brier score, estimated by suitable internal cross-validation designs. Results This study found that the FCGR3B-c.233C>A genotype and IgG against AMA1 performed relatively better than the other antibodies and FCGR3B genotypes studied in classifying or predicting malaria risk among children. Conclusions The data support P. falciparum AMA1 as an important malaria vaccine antigen, while FCGR3B-c.233C>A under the additive and dominant models of inheritance could be an important modifier of the effect of malaria-protective antibodies.
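The abstract compares candidate models by the area under the ROC curve (AUC) and the Brier score, each estimated by internal cross-validation. Below is a minimal sketch of how the two metrics might be computed on a single held-out fold. The function names and toy data are hypothetical, and the cross-validation loop and model fitting are not shown.

```python
def auc(risks, outcomes):
    """Rank-based (Mann-Whitney) AUC: the fraction of (case, control) pairs in
    which the case has the higher predicted risk, with ties counted as 1/2."""
    cases = [r for r, y in zip(risks, outcomes) if y == 1]
    controls = [r for r, y in zip(risks, outcomes) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    concordant = 0.0
    for rc in cases:
        for r0 in controls:
            if rc > r0:
                concordant += 1.0
            elif rc == r0:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


def brier(risks, outcomes):
    """Mean squared difference between predicted risk and the observed 0/1 outcome."""
    return sum((r - y) ** 2 for r, y in zip(risks, outcomes)) / len(outcomes)


# Toy held-out fold: predicted infection risks and observed infection status.
risks = [0.9, 0.8, 0.3, 0.2]
infected = [1, 0, 1, 0]
print(auc(risks, infected))    # 3 of the 4 case-control pairs are concordant
print(brier(risks, infected))
```

A higher AUC and a lower Brier score are better. Averaging each metric over the held-out folds gives its cross-validated estimate.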
Affiliation(s)
- Duah Dwomoh
- Department of Biostatistics, School of Public Health, University of Ghana, Accra, Ghana
- Bright Adu
- Department of Immunology, Noguchi Memorial Institute of Medical Research, College of Health Sciences, University of Ghana, Accra, Ghana
- Daniel Dodoo
- Department of Immunology, Noguchi Memorial Institute of Medical Research, College of Health Sciences, University of Ghana, Accra, Ghana
- Michael Theisen
- Department for Congenital Disorders, Statens Serum Institut, Copenhagen, Denmark; Centre for Medical Parasitology at Department of International Health, Immunology and Microbiology, University of Copenhagen, Copenhagen, Denmark; Department of Infectious Diseases, Copenhagen University Hospital, Rigshospitalet, Copenhagen, Denmark
- Samuel Iddi
- Department of Statistics and Actuarial Sciences, University of Ghana, Accra, Ghana
- Thomas A Gerds
- Section of Biostatistics, Department of Public Health, University of Copenhagen, Copenhagen, Denmark
15
Richardson ML, Amini B, Beckmann NM, Subhawong TK. Measuring and Teaching Confidence Calibration Among Radiologists: A Multi-Institution Study. J Am Coll Radiol 2020; 17:1314-1321. [PMID: 32739415 DOI: 10.1016/j.jacr.2020.06.035] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 01/13/2020] [Revised: 03/25/2020] [Accepted: 06/01/2020] [Indexed: 11/16/2022]
Abstract
OBJECTIVE Our purpose was to assess the calibration of resident, fellow, and attending radiologists on a simple image classification task (presence or absence of an anterior cruciate ligament [ACL] tear based on interpretation of sagittal proton density, fat-saturated MR images) and to assess whether teaching could improve residents' calibration. METHODS We created a test containing 30 randomized sagittal proton density, fat-saturated MR images of the ACL (15 normal, 15 torn). This test was administered in person to 20 trainees and 3 attendings at one medical center in one state. An online version of the test was given to 23 trainees and 14 attendings from 11 other medical centers in nine other states. Subjects were asked to give their confidence level (0%-100%) that each ACL was torn. RESULTS Cross-sectional data were collected from 60 radiologists (mean time after medical school = 9.3 years, minimum = 1 year, maximum = 36 years). These data demonstrated a statistically significant improvement in calibration with increasing experience (P = .020). Longitudinal data were collected from 12 trainees at the start and end of their musculoskeletal radiology rotation, with an intervening review of the primary and secondary signs of ACL tear on MR. A statistically significant improvement in calibration was noted during the rotation (P = .028). CONCLUSIONS Confidence calibration is a promising tool for quality improvement and radiologist self-assessment. Our study showed that calibration improves (calibration loss decreases) with experience in radiologists tested on a common and clinically important image classification task. We also demonstrated that calibration can be successfully taught to residents over a relatively short period (2-4 weeks).
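A rater is well calibrated if, among images rated at about 70% confidence, about 70% of the ACLs are actually torn. The abstract does not state its exact calibration-loss formula. The sketch below shows one common way to check calibration, which may differ from the authors' metric. It bins the 0-100% ratings, compares the mean stated confidence with the observed torn fraction in each bin, and summarizes the result as a bin-size-weighted absolute gap. The bin count and function names are assumptions.

```python
def reliability_table(confidences_pct, torn, n_bins=5):
    """Group 0-100% confidence ratings into equal-width bins and return
    (bin index, count, mean stated confidence, observed torn fraction)."""
    bins = [[] for _ in range(n_bins)]
    for c, y in zip(confidences_pct, torn):
        p = c / 100.0
        k = min(int(p * n_bins), n_bins - 1)  # a rating of 100% goes into the last bin
        bins[k].append((p, y))
    table = []
    for k, members in enumerate(bins):
        if members:  # skip empty bins
            mean_conf = sum(p for p, _ in members) / len(members)
            obs_freq = sum(y for _, y in members) / len(members)
            table.append((k, len(members), mean_conf, obs_freq))
    return table


def weighted_calibration_gap(table):
    """Bin-size-weighted mean of |mean confidence - observed frequency|; 0 is perfect."""
    n = sum(count for _, count, _, _ in table)
    return sum(count * abs(conf - freq) for _, count, conf, freq in table) / n


# Hypothetical ratings from one radiologist (1 = torn, 0 = intact).
ratings = [90, 85, 10, 15, 70, 30]
torn = [1, 1, 0, 0, 1, 0]
table = reliability_table(ratings, torn)
print(weighted_calibration_gap(table))
```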
Affiliation(s)
- Behrang Amini
- Department of Radiology, M.D. Anderson Cancer Center, Houston, Texas
- Ty K Subhawong
- Department of Radiology, University of Miami Health System, Miami, Florida
16
Biccler JL, Bøgsted M, Van Aelst S, Verdonck T. Outlier robust modeling of survival curves in the presence of potentially time-varying coefficients. Stat Methods Med Res 2020; 29:2683-2696. [PMID: 32180501 DOI: 10.1177/0962280220910193] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
In time-to-event studies, censoring often occurs, and models that take it into account are widespread. In the presence of outliers, standard estimators of model parameters may be affected such that results and conclusions are no longer reliable. This in turn hampers the detection of these outliers due to masking effects. To cope with outliers when using proportional hazards models, we propose to use the Brier score as a loss function. Since the coefficients often vary over time, we focus on the piecewise constant hazard model, which can flexibly model time-varying coefficients if a large number of cut-points is used. To prevent overfitting, we add a penalty term that potentially shrinks time-varying effects to constant effects. By fitting the coefficients of the piecewise constant hazard model using a penalized Brier score loss, we obtain a robust model that can handle time-varying coefficients. Its good performance is illustrated in a simulation study and on two datasets from practice.
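Below is a minimal sketch of the two ingredients named above: survival under a piecewise constant hazard with one coefficient per interval, and a Brier-type loss with a penalty that pulls adjacent coefficients together. It makes several assumptions that are not taken from the paper. It uses a single covariate and fully observed event times (the abstract does not say how censoring enters the loss). The penalty is a squared difference between neighboring coefficients, chosen for illustration because the paper's exact penalty is not given. The loss is only evaluated here, not minimized.

```python
import math


def survival_prob(t, starts, base_hazards, betas, x):
    """S(t | x) under a piecewise constant hazard model.

    starts[k] is the left end of interval k (starts[0] = 0); the last interval
    runs to infinity. On interval k the hazard is base_hazards[k] * exp(betas[k] * x),
    so a separate coefficient per interval gives a time-varying effect.
    """
    cum_hazard = 0.0
    for k, lam0 in enumerate(base_hazards):
        end = starts[k + 1] if k + 1 < len(starts) else math.inf
        exposure = max(0.0, min(t, end) - starts[k])  # time spent in interval k before t
        cum_hazard += lam0 * math.exp(betas[k] * x) * exposure
    return math.exp(-cum_hazard)


def penalized_brier_loss(t, times, xs, starts, base_hazards, betas, lam):
    """Brier loss at horizon t for fully observed event times, plus a fused
    penalty on differences between adjacent interval coefficients. A large lam
    pushes the betas towards a single constant (time-fixed) effect."""
    loss = 0.0
    for ti, xi in zip(times, xs):
        still_alive = 1.0 if ti > t else 0.0
        loss += (still_alive - survival_prob(t, starts, base_hazards, betas, xi)) ** 2
    loss /= len(times)
    penalty = sum((betas[k] - betas[k - 1]) ** 2 for k in range(1, len(betas)))
    return loss + lam * penalty


starts = [0.0, 1.0, 2.0]
print(survival_prob(2.5, starts, [0.1, 0.1, 0.1], [0.0, 0.0, 0.0], x=1.0))
print(penalized_brier_loss(2.0, [0.5, 1.5, 3.0], [1.0, 0.0, -1.0],
                           starts, [0.2, 0.3, 0.1], [0.5, 0.9, 0.2], lam=1.0))
```

A robust fit would minimize this kind of loss over the base hazards and coefficients, using a censoring-aware version of the Brier term.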
Affiliation(s)
- Jorne Lionel Biccler
- Department of Haematology, Aalborg University Hospital, Aalborg, Denmark; Department of Clinical Medicine, Aalborg University, Aalborg, Denmark
- Martin Bøgsted
- Department of Haematology, Aalborg University Hospital, Aalborg, Denmark; Department of Clinical Medicine, Aalborg University, Aalborg, Denmark
- Tim Verdonck
- Department of Mathematics, KU Leuven, Leuven, Belgium; Department of Mathematics, University of Antwerp, Antwerp, Belgium
17
Enserro DM, Demler OV, Pencina MJ, D'Agostino RB. Measures for evaluation of prognostic improvement under multivariate normality for nested and nonnested models. Stat Med 2019; 38:3817-3831. [PMID: 31211443 DOI: 10.1002/sim.8204] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 09/21/2018] [Revised: 04/15/2019] [Accepted: 04/23/2019] [Indexed: 12/22/2022]
Abstract
When comparing performances of two risk prediction models, several metrics exist to quantify prognostic improvement, including the change in the area under the Receiver Operating Characteristic curve, the Integrated Discrimination Improvement, the Net Reclassification Index at event rate, the change in Standardized Net Benefit, the change in Brier score, and the change in scaled Brier score. We explore the behavior and interrelationships between these metrics under multivariate normality in nested and nonnested model comparisons. We demonstrate that, within the framework of linear discriminant analysis, all six statistics are functions of squared Mahalanobis distance, a robust metric that properly measures discrimination by quantifying the separation between the risk scores of events and nonevents. These relationships are important for overall interpretability and clinical usefulness. Through simulation, we demonstrate that the performance of the theoretical estimators under normality is comparable or superior to empirical estimation methods typically used by investigators. In particular, the theoretical estimators for the Net Reclassification Index and the change in Standardized Net Benefit exhibit less variability in their estimates as compared to their empirically estimated counterparts. Finally, we explore how these metrics behave with potentially nonnormal data by applying these methods in a practical example based on the sex-specific cardiovascular disease risk models from the Framingham Heart Study. Our findings aim to give greater insight into the behavior of these measures and the connections existing among them and to provide additional estimation methods with less variability for the Net Reclassification Index and the change in Standardized Net Benefit.
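Consider the setting of linear discriminant analysis, where events and nonevents are multivariate normal with a common covariance matrix Sigma. In that setting, the AUC of the discriminant score depends only on the squared Mahalanobis distance Delta^2 between the class means: AUC = Phi(Delta / sqrt(2)). The reason is that the score d' Sigma^-1 x has a mean difference of Delta^2 between the classes and a variance of Delta^2 within each class. The sketch below, in plain Python, covers only the AUC and not the other five measures. The helper names are hypothetical.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small systems)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def squared_mahalanobis(mu_events, mu_nonevents, cov):
    """Delta^2 = d' Sigma^-1 d, with d the difference in class means."""
    d = [a - b for a, b in zip(mu_events, mu_nonevents)]
    return sum(di * si for di, si in zip(d, solve(cov, d)))


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def lda_auc(delta_sq):
    """AUC of the linear discriminant score under multivariate normality
    with a common covariance matrix: Phi(Delta / sqrt(2))."""
    return normal_cdf(math.sqrt(delta_sq / 2.0))


cov = [[1.0, 0.5], [0.5, 1.0]]  # hypothetical common covariance of two risk factors
d2 = squared_mahalanobis([1.0, 0.5], [0.0, 0.0], cov)
print(d2, lda_auc(d2))
```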
Affiliation(s)
- Danielle M Enserro
- NRG Oncology; Clinical Trials Development Division, Department of Biostatistics and Bioinformatics, Roswell Park Comprehensive Cancer Center, Buffalo, New York; Department of Biostatistics, Boston University School of Public Health, Boston, Massachusetts
- Olga V Demler
- Division of Preventive Medicine, Brigham and Women's Hospital; Harvard Medical School, Boston, Massachusetts
- Michael J Pencina
- Department of Biostatistics and Bioinformatics, Duke University, Durham, North Carolina
- Ralph B D'Agostino
- Department of Mathematics & Statistics, Boston University, Boston, Massachusetts
18
Luijken K, Groenwold RHH, Van Calster B, Steyerberg EW, van Smeden M. Impact of predictor measurement heterogeneity across settings on the performance of prediction models: A measurement error perspective. Stat Med 2019; 38:3444-3459. [PMID: 31148207 PMCID: PMC6619392 DOI: 10.1002/sim.8183] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Received: 07/12/2018] [Revised: 02/02/2019] [Accepted: 04/08/2019] [Indexed: 12/23/2022]
Abstract
It is widely acknowledged that the predictive performance of clinical prediction models should be studied in patients who were not part of the data from which the model was derived. Out‐of‐sample performance can be hampered when predictors are measured differently at derivation and external validation. This may occur, for instance, when predictors are measured using different measurement protocols or when tests are produced by different manufacturers. Although such heterogeneity in predictor measurement between derivation and validation data is common, its impact on out‐of‐sample performance is not well studied. Using analytical and simulation approaches, we examined the out‐of‐sample performance of prediction models under various scenarios of heterogeneous predictor measurement. These scenarios were defined and clarified using an established taxonomy of measurement error models. The results of our simulations indicate that predictor measurement heterogeneity can induce miscalibration of predictions and affect discrimination and overall predictive accuracy, to the extent that the prediction model may no longer be considered clinically useful. The measurement error taxonomy was found to be helpful in identifying and predicting the effects of heterogeneous predictor measurements between the settings of prediction model derivation and validation. Our work indicates that homogeneity of measurement strategies across settings is of paramount importance in prediction research.
Affiliation(s)
- K Luijken
- Department of Clinical Epidemiology, Leiden University Medical Center, Leiden, The Netherlands
- R H H Groenwold
- Department of Clinical Epidemiology, Leiden University Medical Center, Leiden, The Netherlands; Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, The Netherlands
- B Van Calster
- Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, The Netherlands; Department of Development and Regeneration, University of Leuven, Leuven, Belgium
- E W Steyerberg
- Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, The Netherlands; Department of Public Health, Erasmus University Medical Center, Rotterdam, The Netherlands
- M van Smeden
- Department of Clinical Epidemiology, Leiden University Medical Center, Leiden, The Netherlands
19
Wu C, Li L. Quantifying and estimating the predictive accuracy for censored time-to-event data with competing risks. Stat Med 2018; 37:3106-3124. [PMID: 29766537 DOI: 10.1002/sim.7806] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 06/23/2017] [Revised: 03/29/2018] [Accepted: 04/11/2018] [Indexed: 01/13/2023]
Abstract
This paper focuses on quantifying and estimating the predictive accuracy of prognostic models for time-to-event outcomes with competing events. We consider time-dependent discrimination and calibration metrics, including the receiver operating characteristic curve and the Brier score, in the context of competing risks. To address censoring, we propose a unified nonparametric estimation framework for both discrimination and calibration measures, by weighting the censored subjects with the conditional probability of the event of interest given the observed data. The proposed method can be extended to time-dependent predictive accuracy metrics constructed from a general class of loss functions. We apply the methodology to a data set from the African American Study of Kidney Disease and Hypertension to evaluate the predictive accuracy of a prognostic risk score in predicting end-stage renal disease, accounting for the competing risk of pre-end-stage renal disease death, and evaluate its numerical performance in extensive simulation studies.
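The estimator in the abstract weights censored subjects by the conditional probability of the event of interest given the observed data. For comparison only, the sketch below shows the more familiar inverse-probability-of-censoring-weighted (IPCW) Brier score for a predicted cause-1 cumulative incidence. This is a different, standard technique, not the authors' method. It assumes that censoring is independent of covariates (so a Kaplan-Meier estimate of the censoring distribution G is used) and that events precede censoring at tied times. The function names are hypothetical.

```python
def km_censoring_survival(times, events):
    """Kaplan-Meier estimate of the censoring survival function G(s) = P(C > s).

    events[i] is 0 for a censored observation and 1 or 2 for an event of cause 1 or 2.
    Returns G(s, left=False); left=True gives the left limit G(s-).
    """
    steps = []
    g = 1.0
    for s in sorted(set(t for t, e in zip(times, events) if e == 0)):
        at_risk = sum(1 for t in times if t >= s)
        n_censored = sum(1 for t, e in zip(times, events) if t == s and e == 0)
        g *= 1.0 - n_censored / at_risk
        steps.append((s, g))

    def G(s, left=False):
        value = 1.0
        for tj, gj in steps:
            if tj < s or (tj == s and not left):
                value = gj
            else:
                break
        return value

    return G


def ipcw_brier_cause1(t, risks, times, events):
    """IPCW Brier score at horizon t for predicted cause-1 cumulative incidence.

    Weights: event of any cause by t -> 1 / G(T_i-); event-free at t -> 1 / G(t);
    censored by t -> 0. A competing (cause-2) event by t counts as outcome 0.
    """
    G = km_censoring_survival(times, events)
    total = 0.0
    for risk, ti, ei in zip(risks, times, events):
        if ti <= t and ei != 0:
            outcome, weight = (1.0 if ei == 1 else 0.0), 1.0 / G(ti, left=True)
        elif ti > t:
            outcome, weight = 0.0, 1.0 / G(t)
        else:
            continue  # censored at or before t: contributes weight 0
        total += weight * (outcome - risk) ** 2
    return total / len(times)


times = [1.0, 2.0, 2.5, 3.0, 4.0, 5.0]
events = [1, 0, 2, 1, 0, 1]
risks = [0.7, 0.2, 0.3, 0.4, 0.1, 0.2]  # hypothetical predicted P(T <= 3.5, cause 1)
print(ipcw_brier_cause1(3.5, risks, times, events))
```

Without censoring every weight is 1, and the score reduces to the ordinary Brier score for the indicator of a cause-1 event by time t.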
Affiliation(s)
- Cai Wu
- Department of Biostatistics, The University of Texas Health Science Center at Houston, Houston, TX, USA; Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Liang Li
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
20
Vosler PS, Orsini M, Enepekides DJ, Higgins KM. Predicting complications of major head and neck oncological surgery: an evaluation of the ACS NSQIP surgical risk calculator. J Otolaryngol Head Neck Surg 2018; 47:21. [PMID: 29566750 PMCID: PMC5863849 DOI: 10.1186/s40463-018-0269-8] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.8] [Received: 08/17/2017] [Accepted: 03/12/2018] [Indexed: 12/03/2022]
Abstract
Background The American College of Surgeons National Surgical Quality Improvement Program (ACS NSQIP) universal surgical risk calculator is an online tool intended to improve the informed consent process and surgical decision-making. The risk calculator uses a database of information from 585 hospitals to predict a patient’s risk of developing specific postoperative outcomes. Methods Patient records at a major Canadian tertiary care referral center between July 2015 and March 2017 were reviewed for surgical cases including one of six major head and neck oncologic surgeries: total thyroidectomy, total laryngectomy, hemiglossectomy, partial glossectomy, laryngopharyngectomy, and composite resection. Preoperative information for 107 patients was entered into the risk calculator, and the predictions were compared with observed postoperative outcomes. Statistical analysis of the risk calculator was completed for the entire study population, stratified by procedure, and by use of microvascular reconstruction. Accuracy was assessed using the ratio of predicted to observed outcomes, receiver operating characteristic (ROC) analysis, the Brier score, and the Wilcoxon signed-rank test. Results The risk calculator accurately predicted the incidence of 11 of 12 outcomes for patients who did not undergo free flap reconstruction (NFF group), but was less accurate for patients who underwent free flap reconstruction (FF group). Length of stay (LOS) analysis showed similar results, with predicted and observed LOS statistically different in the overall population and FF group analyses (p = 0.001 for both), but not in the NFF group analysis (p = 0.764). When analyzed for calibration, all outcomes in the NFF group met the threshold value (Brier scores < 0.09). Risk predictions for 8 of 12 and 10 of 12 outcomes were adequately calibrated in the FF group and the overall study population, respectively. Analyses by procedure were excellent, with the risk calculator showing adequate calibration for 7 of 8 procedural categories and adequate discrimination for all calculable categories (6 of 6). Conclusion The NSQIP-RC demonstrated efficacy for predicting postoperative complications in head and neck oncology surgeries that do not require microvascular reconstruction. The predictive value of the calculator can be improved by including several factors important for risk stratification in head and neck oncology.
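Two of the accuracy checks above are simple to state. The first is the ratio of predicted to observed events, where values near 1 indicate good calibration in the large. The second is the Brier score, compared against the 0.09 threshold quoted in the abstract. The sketch below applies both to one outcome using made-up calculator outputs. The function names are hypothetical.

```python
def predicted_to_observed_ratio(risks, outcomes):
    """Expected number of events (sum of predicted risks) divided by observed events."""
    observed = sum(outcomes)
    if observed == 0:
        raise ValueError("no observed events: ratio undefined")
    return sum(risks) / observed


def brier(risks, outcomes):
    """Mean squared difference between predicted risk and the observed 0/1 outcome."""
    return sum((r - y) ** 2 for r, y in zip(risks, outcomes)) / len(outcomes)


BRIER_THRESHOLD = 0.09  # calibration threshold quoted in the abstract

risks = [0.05, 0.10, 0.02, 0.08, 0.15]  # hypothetical calculator outputs for one outcome
observed = [0, 0, 0, 0, 1]
ratio = predicted_to_observed_ratio(risks, observed)
score = brier(risks, observed)
print(ratio, score, score < BRIER_THRESHOLD)
```

With only five hypothetical patients and one event, the Brier score is dominated by the single missed event. Real evaluations pool many patients per outcome.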
Affiliation(s)
- Peter S Vosler
- Department of Otolaryngology-Head & Neck Surgery, Sunnybrook Health Sciences Centre, University of Toronto, 2075 Bayview Avenue, Suite M1 102, Toronto, ON, M4N 3M5, Canada
- Mario Orsini
- Department of Otolaryngology-Head & Neck Surgery, Sunnybrook Health Sciences Centre, University of Toronto, 2075 Bayview Avenue, Suite M1 102, Toronto, ON, M4N 3M5, Canada
- Danny J Enepekides
- Department of Otolaryngology-Head & Neck Surgery, Sunnybrook Health Sciences Centre, University of Toronto, 2075 Bayview Avenue, Suite M1 102, Toronto, ON, M4N 3M5, Canada
- Kevin M Higgins
- Department of Otolaryngology-Head & Neck Surgery, Sunnybrook Health Sciences Centre, University of Toronto, 2075 Bayview Avenue, Suite M1 102, Toronto, ON, M4N 3M5, Canada
21
Abstract
BACKGROUND Many measures of prediction accuracy have been developed. However, the most popular ones in typical medical outcome prediction settings require additional investigation of calibration. METHODS We show how rescaling the Brier score produces a measure that combines discrimination and calibration in one value and improves interpretability by adjusting for a benchmark model. We have called this measure the index of prediction accuracy (IPA). The IPA permits a common interpretation across binary, time to event, and competing risk outcomes. We illustrate this measure using example datasets. RESULTS The IPA is simple to compute, and example code is provided. The values of the IPA appear very interpretable. CONCLUSIONS IPA should be a prominent measure reported in studies of medical prediction model performance. However, IPA is only a measure of average performance and, by default, does not measure the utility of a medical decision.
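For a binary outcome, the IPA rescales the Brier score against a benchmark (null) model that ignores covariates and predicts the observed prevalence for everyone: IPA = 1 - Brier(model) / Brier(null). An IPA of 0 means no improvement over the benchmark, negative values mean the model is worse than the benchmark, and 1 means perfect predictions. Below is a minimal sketch for the binary case only. For time-to-event outcomes the benchmark would instead be a covariate-free model such as Kaplan-Meier, with a censoring-adjusted Brier score, which is not shown. The function names are hypothetical.

```python
def brier(risks, outcomes):
    """Mean squared difference between predicted risk and the observed 0/1 outcome."""
    return sum((r - y) ** 2 for r, y in zip(risks, outcomes)) / len(outcomes)


def ipa(risks, outcomes):
    """Index of prediction accuracy: 1 - Brier(model) / Brier(null model),
    where the null model predicts the observed prevalence for every subject."""
    prevalence = sum(outcomes) / len(outcomes)
    null_brier = brier([prevalence] * len(outcomes), outcomes)
    return 1.0 - brier(risks, outcomes) / null_brier


outcomes = [1, 0, 1, 0]
print(ipa([0.9, 0.2, 0.7, 0.1], outcomes))  # informative model: IPA well above 0
print(ipa([0.5, 0.5, 0.5, 0.5], outcomes))  # predicts the prevalence: IPA of 0
```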
Affiliation(s)
- Michael W. Kattan
- Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic, 9500 Euclid Avenue/JJN3-01, Cleveland, OH 44195 USA
- Thomas A. Gerds
- Department of Public Health, Section of Biostatistics, University of Copenhagen, N.J. Fjords Alle 12, 4th 1957 Frederiksberg, Øster Farimagsgade 5, 1014 Copenhagen, Denmark
22
Abstract
The statistical evaluation of probabilistic disease forecasts often involves the calculation of metrics defined conditionally on disease status, such as sensitivity and specificity. However, for the purpose of disease management decision making, metrics defined conditionally on the result of the forecast (predictive values) are also important, although less frequently reported. In this context, the application of scoring rules in the evaluation of probabilistic disease forecasts is discussed. An index of separation described in the clinical literature, with application in the evaluation of probabilistic disease forecasts, is also considered, and its relation to scoring rules is illustrated. Scoring rules provide a principled basis for the evaluation of probabilistic forecasts used in plant disease management. In particular, the decomposition of scoring rules into interpretable components is an advantageous feature of their application in the evaluation of disease forecasts.
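For the quadratic (Brier) scoring rule, the decomposition mentioned above can be made concrete as the Murphy decomposition: Brier = reliability - resolution + uncertainty. Reliability measures calibration (lower is better). Resolution measures how far the outcome rate within each forecast group moves away from the overall rate (higher is better). Uncertainty depends only on the base rate. This is an illustrative sketch, not taken from the paper. The identity is exact when forecasts take a finite set of values and each distinct value forms its own group.

```python
from collections import defaultdict


def murphy_decomposition(forecasts, outcomes):
    """Return (brier, reliability, resolution, uncertainty) for 0/1 outcomes,
    grouping subjects by their distinct forecast value."""
    n = len(outcomes)
    base_rate = sum(outcomes) / n
    groups = defaultdict(list)
    for f, y in zip(forecasts, outcomes):
        groups[f].append(y)
    # Calibration term: squared gap between each forecast value and its observed rate.
    reliability = sum(len(ys) * (f - sum(ys) / len(ys)) ** 2 for f, ys in groups.items()) / n
    # Resolution term: spread of the group outcome rates around the overall rate.
    resolution = sum(len(ys) * (sum(ys) / len(ys) - base_rate) ** 2 for ys in groups.values()) / n
    uncertainty = base_rate * (1.0 - base_rate)
    brier = sum((f - y) ** 2 for f, y in zip(forecasts, outcomes)) / n
    return brier, reliability, resolution, uncertainty


# Hypothetical disease forecasts (probability of an outbreak) and observed outcomes.
b, rel, res, unc = murphy_decomposition([0.2, 0.2, 0.8, 0.8, 0.8], [0, 1, 1, 1, 0])
print(b, rel - res + unc)  # the two numbers agree up to rounding
```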
Affiliation(s)
- Gareth Hughes
- Crop and Soil Systems Research Group, SRUC, Edinburgh EH9 3JG, U.K
- Fiona J Burnett
- Crop and Soil Systems Research Group, SRUC, Edinburgh EH9 3JG, U.K
23
Schulz A, Zöller D, Nickels S, Beutel ME, Blettner M, Wild PS, Binder H. Simulation of complex data structures for planning of studies with focus on biomarker comparison. BMC Med Res Methodol 2017; 17:90. [PMID: 28610631 PMCID: PMC5470184 DOI: 10.1186/s12874-017-0364-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 07/20/2016] [Accepted: 05/24/2017] [Indexed: 12/02/2022]
Abstract
BACKGROUND There is a growing number of observational studies that focus not only on single biomarkers for predicting an outcome event but also address questions in a multivariable setting. For example, when quantifying the added value of new biomarkers in addition to established risk factors, the aim might be to rank several new markers with respect to their prediction performance. This makes it important to consider the marker correlation structure when planning such a study. Because of the complexity, a simulation approach may be required to adequately assess sample size or other aspects, such as the choice of a performance measure. METHODS In a simulation study based on real data, we investigated how to generate covariates with realistic distributions and what generating model should be used for the outcome, aiming to determine the least amount of information and complexity needed to obtain realistic results. The Gutenberg Health Study, a large epidemiological cohort study, was used as the basis for the simulation. The added value of markers was quantified and ranked in subsampled data sets of this population data, and simulation approaches were judged by the quality of the ranking. One of the evaluated approaches, the random forest, requires original data at the individual level. Therefore, the effect of the size of a pilot study on random forest-based simulation was also investigated. RESULTS We found that simple logistic regression models failed to adequately generate realistic data, even with extensions such as interaction terms or non-linear effects. The random forest approach was found to be more appropriate for simulating complex data structures. Pilot studies starting at about 250 observations were found to provide a reasonable level of information for this approach. CONCLUSIONS We advise avoiding oversimplified regression models for simulation, in particular when focusing on multivariable research questions. More generally, a simulation should be based on real data to adequately reflect complex observational data structures, such as those found in epidemiological cohort studies.
Affiliation(s)
- Andreas Schulz
- Preventive Cardiology and Preventive Medicine, Center for Cardiology, University Medical Center of the Johannes Gutenberg-University Mainz, Langenbeckstraße 1, Mainz, 55131, Germany
- Center for Translational Vascular Biology (CTVB), University Medical Center of the Johannes Gutenberg-University Mainz, Langenbeckstraße 1, Mainz, 55131, Germany
- Daniela Zöller
- Institute of Medical Biostatistics, Epidemiology and Informatics, University Medical Center of the Johannes Gutenberg-University Mainz, Obere Zahlbacher Str. 69, Mainz, 55131, Germany
- Stefan Nickels
- Department of Ophthalmology, University Medical Center of the Johannes Gutenberg-University Mainz, Langenbeckstraße 1, Mainz, 55131, Germany
- Manfred E Beutel
- Clinic for Psychosomatic Medicine and Psychotherapy, University Medical Center of the Johannes Gutenberg-University Mainz, Langenbeckstraße 1, Mainz, 55131, Germany
- Maria Blettner
- Institute of Medical Biostatistics, Epidemiology and Informatics, University Medical Center of the Johannes Gutenberg-University Mainz, Obere Zahlbacher Str. 69, Mainz, 55131, Germany
- Philipp S Wild
- Preventive Cardiology and Preventive Medicine, Center for Cardiology, University Medical Center of the Johannes Gutenberg-University Mainz, Langenbeckstraße 1, Mainz, 55131, Germany
- Center for Translational Vascular Biology (CTVB), University Medical Center of the Johannes Gutenberg-University Mainz, Langenbeckstraße 1, Mainz, 55131, Germany
- Center for Thrombosis and Hemostasis, University Medical Center of the Johannes Gutenberg-University Mainz, Langenbeckstraße 1, Mainz, 55131, Germany
- DZHK (German Center for Cardiovascular Research), partner site RhineMain, Mainz, Langenbeckstraße 1, Mainz, 55131, Germany
- Harald Binder
- Institute for Medical Biometry and Statistics, Faculty of Medicine and Medical Center - University of Freiburg, Stefan-Meier-Str. 26, Freiburg, 79104, Germany
24
Abstract
The prediction accuracy of a cure model used to predict the cure probability of a patient is an important but not well-addressed issue in survival analysis. We propose a method to assess the prediction accuracy of a mixture cure model in predicting cure probability, based on inverse probability of censoring weights to incorporate the censoring and latent cure status in the data. The inverse probability of censoring weight-adjusted estimator is shown to be consistent for the true expected prediction error for cure probability. A simulation study shows that the estimator performs well with finite samples when subjects with censored survival times greater than the largest uncensored time are identified as cured, an approach often used in the mixture cure model literature to increase model identifiability. The simulation study also investigates the performance of the estimator with different thresholds for identifying cured subjects, as well as the estimator based on observed (training) data only. The method is applied to bone marrow transplant data from leukemia patients to assess the prediction accuracy of the cure probabilities.
Affiliation(s)
- Wenyu Jiang
- Department of Mathematics and Statistics, Queen's University, Kingston, ON, Canada
- Haoyu Sun
- Department of Mathematics and Statistics, Queen's University, Kingston, ON, Canada
- Yingwei Peng
- Department of Mathematics and Statistics, Queen's University, Kingston, ON, Canada; Department of Public Health Sciences, Queen's University, Kingston, ON, Canada; Division of Cancer Care and Epidemiology, Queen's Cancer Research Institute, Kingston, ON, Canada
25
Abstract
BACKGROUND A variety of statistics have been proposed as tools to help investigators assess the value of diagnostic tests or prediction models. The Brier score has been recommended on the grounds that it is a proper scoring rule that is affected by both discrimination and calibration. However, the Brier score is prevalence dependent in such a way that the rank ordering of tests or models may inappropriately vary by prevalence. METHODS We explored four common clinical scenarios: comparison of a highly accurate binary test with a continuous prediction model of moderate predictiveness; comparison of two binary tests where the importance of sensitivity versus specificity is inversely associated with prevalence; comparison of models and tests to the default strategies of assuming that all or no patients are positive; and comparison of two models with miscalibration in opposite directions. RESULTS In each case, we found that the Brier score gave an inappropriate rank ordering of the tests and models. Conversely, net benefit, a decision-analytic measure, gave results that always favored the preferable test or model. CONCLUSIONS The Brier score does not evaluate the clinical value of diagnostic tests or prediction models. We advocate, as an alternative, the use of decision-analytic measures such as net benefit. TRIAL REGISTRATION Not applicable.
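Net benefit, the decision-analytic alternative advocated above, is evaluated at a risk threshold p_t. The threshold encodes how many false positives a decision maker would accept per true positive, via the odds p_t / (1 - p_t). With n patients, the standard definition is NB = TP/n - FP/n * p_t / (1 - p_t). A model or test is compared against the default strategies of treating everyone and treating no one, whose net benefit is always 0. Below is a minimal sketch with hypothetical data and function names.

```python
def net_benefit(risks, outcomes, threshold):
    """Net benefit of treating everyone whose predicted risk is >= threshold:
    TP/n - FP/n * threshold / (1 - threshold)."""
    n = len(outcomes)
    tp = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y == 1)
    fp = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


def net_benefit_treat_all(outcomes, threshold):
    """Default strategy of treating everyone (treat-none always has net benefit 0)."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


risks = [0.9, 0.6, 0.4, 0.1]
outcomes = [1, 0, 1, 0]
for pt in (0.1, 0.2, 0.3):
    # columns: threshold, model, treat all, treat none
    print(pt, net_benefit(risks, outcomes, pt), net_benefit_treat_all(outcomes, pt), 0.0)
```

The preferable strategy at a given threshold is the one with the highest net benefit. Evaluating a range of thresholds gives a decision curve.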
Affiliation(s)
- Melissa Assel
- Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, USA
- Daniel D. Sjoberg
- Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, USA
- Andrew J. Vickers
- Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, USA
26
Khudyakov P, Gorfine M, Zucker D, Spiegelman D. The impact of covariate measurement error on risk prediction. Stat Med 2015; 34:2353-67. [PMID: 25865315 PMCID: PMC4480422 DOI: 10.1002/sim.6498] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 04/24/2014] [Accepted: 03/11/2015] [Indexed: 01/05/2023]
Abstract
In the development of risk prediction models, predictors are often measured with error. In this paper, we investigate the impact of covariate measurement error on risk prediction. We compare the prediction performance using a costly variable measured without error, along with error-free covariates, to that of a model based on an inexpensive surrogate along with the error-free covariates. We consider continuous error-prone covariates with homoscedastic and heteroscedastic errors, and also a discrete misclassified covariate. Prediction performance is evaluated by the area under the receiver operating characteristic curve (AUC), the Brier score (BS), and the ratio of the observed to the expected number of events (calibration). In an extensive numerical study, we show that (i) the prediction model with the error-prone covariate is very well calibrated, even when it is mis-specified; (ii) using the error-prone covariate instead of the true covariate can reduce the AUC and increase the BS dramatically; (iii) adding an auxiliary variable, which is correlated with the error-prone covariate but conditionally independent of the outcome given all covariates in the true model, can improve the AUC and BS substantially. We conclude that reducing measurement error in covariates will improve the ensuing risk prediction, unless the association between the error-free and error-prone covariates is very high. Finally, we demonstrate how a validation study can be used to assess the effect of mismeasured covariates on risk prediction. These concepts are illustrated in a breast cancer risk prediction model developed in the Nurses' Health Study.
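Finding (ii), the loss of AUC when an error-prone surrogate replaces the true covariate, can be reproduced with a small simulation. In the sketch below, the outcome depends only on the true covariate X, and the surrogate is W = X + U with classical additive error. With a single covariate and a positive coefficient, ranking subjects by the covariate is the same as ranking them by the fitted risk, so the AUC can be computed from the covariate directly without fitting a model. All parameter values, the seed, and the function names are hypothetical, and this is not the Nurses' Health Study model.

```python
import math
import random


def auc(scores, outcomes):
    """Rank-sum (Mann-Whitney) AUC; assumes continuous scores with no ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    rank_sum_cases = sum(rank + 1 for rank, i in enumerate(order) if outcomes[i] == 1)
    n_cases = sum(outcomes)
    n_controls = len(outcomes) - n_cases
    return (rank_sum_cases - n_cases * (n_cases + 1) / 2) / (n_cases * n_controls)


def simulate(n, error_sd, seed=2015):
    """True covariate X, error-prone surrogate W = X + U, and a binary outcome
    generated from a logistic model in X only (hypothetical parameters)."""
    rng = random.Random(seed)
    x_true, w_surrogate, y = [], [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        w = x + rng.gauss(0.0, error_sd)  # classical additive measurement error
        p = 1.0 / (1.0 + math.exp(-(-1.0 + 1.5 * x)))
        x_true.append(x)
        w_surrogate.append(w)
        y.append(1 if rng.random() < p else 0)
    return x_true, w_surrogate, y


x_true, w_surrogate, y = simulate(n=5000, error_sd=1.0)
auc_true = auc(x_true, y)
auc_surrogate = auc(w_surrogate, y)
print(round(auc_true, 3), round(auc_surrogate, 3))
```

Shrinking `error_sd` towards 0 brings the surrogate AUC back towards the true-covariate AUC, consistent with the abstract's conclusion that reducing measurement error improves risk prediction.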
Affiliation(s)
- Polyna Khudyakov
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, U.S.A.
- Malka Gorfine
- Faculty of Industrial Engineering and Management, Technion - Israel Institute of Technology, Technion City, Haifa 32000, Israel
- David Zucker
- Department of Statistics, Hebrew University of Jerusalem, Mt. Scopus, Jerusalem, Israel
- Donna Spiegelman
- Departments of Epidemiology, Biostatistics, Nutrition and Global Health, Harvard T.H. Chan School of Public Health, Boston, MA, U.S.A.
27
Wey A, Connett J, Rudser K. Combining parametric, semi-parametric, and non-parametric survival models with stacked survival models. Biostatistics 2015; 16:537-49. [PMID: 25662068 DOI: 10.1093/biostatistics/kxv001] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 05/18/2014] [Accepted: 01/05/2015] [Indexed: 11/13/2022]
Abstract
For estimating conditional survival functions, non-parametric estimators can be preferred to parametric and semi-parametric estimators due to relaxed assumptions that enable robust estimation. Yet, even when misspecified, parametric and semi-parametric estimators can possess better operating characteristics in small sample sizes due to smaller variance than non-parametric estimators. Fundamentally, this is a bias-variance trade-off situation in that the sample size is not large enough to take advantage of the low bias of non-parametric estimation. Stacked survival models estimate an optimally weighted combination of models that can span parametric, semi-parametric, and non-parametric models by minimizing prediction error. An extensive simulation study demonstrates that stacked survival models consistently perform well across a wide range of scenarios by adaptively balancing the strengths and weaknesses of individual candidate survival models. In addition, stacked survival models perform as well as or better than the model selected through cross-validation. Finally, stacked survival models are applied to a well-known German breast cancer study.
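The idea of an optimally weighted combination chosen by minimizing prediction error can be sketched for a single prediction time t. This is a simplified illustration, not Wey et al.'s estimator: it assumes each candidate model's cross-validated predicted survival probabilities at t are already available, restricts the weights to be non-negative and sum to 1, and minimizes a Brier-type loss by projected gradient descent. The optional per-subject weights stand in for inverse-probability-of-censoring weights, which the caller would have to compute; all names are hypothetical.

```python
from typing import List, Optional, Sequence


def project_to_simplex(v: Sequence[float]) -> List[float]:
    """Euclidean projection of v onto {w : w_k >= 0, sum_k w_k = 1} (sort-based algorithm)."""
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        t = (cumsum - 1.0) / j
        if uj - t > 0:  # holds for a prefix of j; the last such j fixes theta
            theta = t
    return [max(x - theta, 0.0) for x in v]


def stacking_weights(
    preds: Sequence[Sequence[float]],
    status: Sequence[int],
    case_weights: Optional[Sequence[float]] = None,
    steps: int = 2000,
    lr: float = 0.5,
) -> List[float]:
    """preds[i][k]: cross-validated predicted P(T_i > t) from candidate model k.
    status[i]: 1 if subject i is event-free at t, 0 if the event occurred by t.
    case_weights[i]: optional non-negative weight (e.g. IPCW), default 1.
    Returns convex weights minimising the weighted Brier score of the combination."""
    n, k_models = len(preds), len(preds[0])
    cw = list(case_weights) if case_weights is not None else [1.0] * n
    total = sum(cw)
    w = [1.0 / k_models] * k_models
    for _ in range(steps):
        grad = [0.0] * k_models
        for p, y, c in zip(preds, status, cw):
            resid = sum(wk * pk for wk, pk in zip(w, p)) - y
            for k in range(k_models):
                grad[k] += 2.0 * c * resid * p[k] / total
        # Gradient step, then pull the weights back onto the simplex.
        w = project_to_simplex([wk - lr * g for wk, g in zip(w, grad)])
    return w


if __name__ == "__main__":
    # Model 0 predicts perfectly; model 1 always predicts 0.5.
    preds = [[1.0, 0.5], [0.0, 0.5], [1.0, 0.5], [0.0, 0.5]]
    status = [1, 0, 1, 0]
    print(stacking_weights(preds, status))  # close to [1.0, 0.0]
```

Projected gradient keeps the sketch dependency-free; in practice a quadratic-programming or constrained least-squares solver would usually be preferred for finding the weights.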
Affiliation(s)
- Andrew Wey
- University of Hawaii, Honolulu, HI 96815, USA; University of Minnesota, Minneapolis, MN 55455, USA
- John Connett
- University of Hawaii, Honolulu, HI 96815, USA; University of Minnesota, Minneapolis, MN 55455, USA
- Kyle Rudser
- University of Hawaii, Honolulu, HI 96815, USA; University of Minnesota, Minneapolis, MN 55455, USA
28
Ishwaran H, Gerds TA, Kogalur UB, Moore RD, Gange SJ, Lau BM. Random survival forests for competing risks. Biostatistics 2014; 15:757-73. [PMID: 24728979 PMCID: PMC4173102 DOI: 10.1093/biostatistics/kxu010] [Citation(s) in RCA: 133] [Impact Index Per Article: 13.3] [Received: 04/18/2013] [Revised: 02/23/2014] [Accepted: 02/27/2014] [Indexed: 11/14/2022]
Abstract
We introduce a new approach to competing risks using random forests. Our method is fully non-parametric and can be used for selecting event-specific variables and for estimating the cumulative incidence function. We show that the method is highly effective for both prediction and variable selection in high-dimensional problems and in settings such as HIV/AIDS that involve many competing risks.
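The abstract does not spell out how the cumulative incidence function (CIF) is estimated. As background only, the standard nonparametric estimator of a cause-specific CIF under right censoring, the Aalen-Johansen estimator, can be sketched as follows; it is not the random-forest method itself. Coding censoring as cause 0 is an assumed convention, and the function name is hypothetical.

```python
from typing import Sequence


def aalen_johansen_cif(
    times: Sequence[float], causes: Sequence[int], cause: int, horizon: float
) -> float:
    """Cumulative incidence of `cause` by `horizon`.
    times[i]: observed time; causes[i]: 0 if censored, otherwise the event type (1, 2, ...).
    CIF_j(t) = sum over event times s <= t of S(s-) * d_j(s) / n(s),
    where S is the all-cause Kaplan-Meier survival and n(s) the number at risk at s."""
    data = sorted(zip(times, causes))
    n_at_risk = len(data)
    surv = 1.0  # all-cause survival just before the current time
    cif = 0.0
    i = 0
    while i < len(data) and data[i][0] <= horizon:
        t = data[i][0]
        d_all = d_cause = n_here = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            c = data[i][1]
            d_all += int(c != 0)
            d_cause += int(c == cause)
            n_here += 1
            i += 1
        if d_all:
            cif += surv * d_cause / n_at_risk
            surv *= 1.0 - d_all / n_at_risk
        n_at_risk -= n_here  # events and censorings at t leave the risk set
    return cif


if __name__ == "__main__":
    times = [1.0, 2.0, 3.0, 4.0]
    causes = [1, 2, 1, 0]
    print(aalen_johansen_cif(times, causes, cause=1, horizon=4.0))
    print(aalen_johansen_cif(times, causes, cause=2, horizon=4.0))
```

Summed over causes, the CIFs equal one minus the all-cause Kaplan-Meier survival, which is why this estimator, rather than one minus a cause-specific Kaplan-Meier curve, is used when risks compete.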
Affiliation(s)
- Hemant Ishwaran
- Division of Biostatistics, University of Miami, Miami, FL 33136, USA
- Thomas A Gerds
- Department of Biostatistics, University of Copenhagen, 1014 Copenhagen, Denmark
- Udaya B Kogalur
- Department of Quantitative Health Sciences, Cleveland Clinic, Cleveland, OH 44195, USA
- Richard D Moore
- Department of Medicine, School of Medicine, Johns Hopkins University, Baltimore, MD 21205, USA
- Stephen J Gange
- Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD 21205, USA
- Bryan M Lau
- Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD 21205, USA
29
Sène M, Taylor JM, Dignam JJ, Jacqmin-Gadda H, Proust-Lima C. Individualized dynamic prediction of prostate cancer recurrence with and without the initiation of a second treatment: Development and validation. Stat Methods Med Res 2014; 25:2972-2991. [PMID: 24847900 DOI: 10.1177/0962280214535763] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Indexed: 11/16/2022]
Abstract
With the emergence of rich information on biomarkers after treatment, new types of prognostic tools are being developed: dynamic prognostic tools that can be updated at each new biomarker measurement. Such predictions are of interest in oncology, where, after an initial treatment, patients are monitored with repeated biomarker measurements. However, in such a setting, patients may receive a second treatment to slow down the progression of the disease. This paper aims to develop and validate dynamic individual predictions that allow for the possibility of a new treatment, in order to help understand the benefit of initiating new treatments during the monitoring period. The prediction of the event in the next x years is made under two scenarios: (1) the patient immediately initiates a second treatment, and (2) the patient does not initiate any treatment in the next x years. Predictions are derived from shared random-effect models. Applied to prostate cancer data, different specifications for the dependence between the prostate-specific antigen repeated measures, the initiation of a second treatment (hormonal therapy), and the risk of clinical recurrence are investigated and compared. The predictive accuracy of the dynamic predictions is evaluated with two measures (Brier score and prognostic cross-entropy), for which approximate cross-validated estimators are proposed.
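For a single landmark time s and prediction window w, the two accuracy measures in this abstract reduce, in the simplest case, to averages over subjects who are still event-free at s. The sketch below assumes no censoring within the window, so each subject's outcome in (s, s + w] is a known 0/1 value; it uses a binary cross-entropy as a stand-in for the prognostic cross-entropy and does not implement the authors' approximate cross-validated estimators. The function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def window_accuracy(
    pred_event_prob: Sequence[float],
    event_in_window: Sequence[int],
    eps: float = 1e-12,
) -> Tuple[float, float]:
    """pred_event_prob[i]: predicted P(s < T_i <= s + w | T_i > s, marker history up to s).
    event_in_window[i]: 1 if subject i had the event in (s, s + w], 0 if event-free at s + w.
    Returns (Brier score, mean negative log predicted probability of the observed outcome)."""
    n = len(pred_event_prob)
    brier = sum((p - y) ** 2 for p, y in zip(pred_event_prob, event_in_window)) / n
    # eps guards against log(0) when a prediction is exactly 0 or 1.
    cross_entropy = -sum(
        y * math.log(max(p, eps)) + (1 - y) * math.log(max(1.0 - p, eps))
        for p, y in zip(pred_event_prob, event_in_window)
    ) / n
    return brier, cross_entropy


if __name__ == "__main__":
    # An uninformative prediction of 0.5 gives Brier 0.25 and cross-entropy log(2).
    print(window_accuracy([0.5, 0.5], [1, 0]))
```

Lower is better for both measures; the cross-entropy penalizes confident wrong predictions far more heavily than the Brier score does.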
Affiliation(s)
- Mbéry Sène
- INSERM, Centre INSERM U897-Epidemiologie-Biostatistique, Bordeaux, France; Université de Bordeaux, ISPED, Bordeaux, France
- Jeremy MG Taylor
- Department of Biostatistics, Department of Radiation Oncology, University of Michigan, Ann Arbor, MI, USA
- James J Dignam
- Department of Health Studies, University of Chicago, Chicago, IL, USA; Radiation Therapy Oncology Group, American College of Radiology, Philadelphia, PA, USA
- Hélène Jacqmin-Gadda
- INSERM, Centre INSERM U897-Epidemiologie-Biostatistique, Bordeaux, France; Université de Bordeaux, ISPED, Bordeaux, France
- Cécile Proust-Lima
- INSERM, Centre INSERM U897-Epidemiologie-Biostatistique, Bordeaux, France; Université de Bordeaux, ISPED, Bordeaux, France
30
Cortese G, Gerds TA, Andersen PK. Comparing predictions among competing risks models with time-dependent covariates. Stat Med 2013; 32:3089-101. [PMID: 23494745 DOI: 10.1002/sim.5773] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Received: 07/16/2012] [Accepted: 02/01/2013] [Indexed: 11/08/2022]
Abstract
Prediction of cumulative incidences is often a primary goal in clinical studies with several endpoints. We compare predictions among competing risks models with time-dependent covariates. For a series of landmark time points, we study the predictive accuracy of a multi-state regression model, in which the time-dependent covariate represents an intermediate state, and of two alternative landmark approaches. At each landmark time point, prediction performance is measured by the t-year expected Brier score, where pseudo-values are constructed to deal with right-censored event times. We apply the methods to data from a bone marrow transplant study, where graft-versus-host disease is considered a time-dependent covariate for predicting relapse and death in remission.
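The pseudo-value device can be sketched for the simpler case of a single event type. The code below computes jackknife pseudo-values for F(t) = 1 - S(t), with S the Kaplan-Meier estimator, and plugs them into the Brier score through the identity (I - p)^2 = I(1 - 2p) + p^2, which holds for any 0/1 indicator I. This is an illustration under those assumptions, not the authors' exact estimator; in the competing-risks setting the Kaplan-Meier quantity would be replaced by a cause-specific cumulative incidence estimate. All names are hypothetical.

```python
from typing import List, Sequence


def km_survival(times: Sequence[float], events: Sequence[int], t: float) -> float:
    """Kaplan-Meier estimate of P(T > t); events[i] is 1 for an observed event, 0 if censored."""
    data = sorted(zip(times, events))
    n = len(data)
    surv, i = 1.0, 0
    while i < n and data[i][0] <= t:
        at_risk = n - i  # subjects whose time is >= the current time
        current, deaths = data[i][0], 0
        while i < n and data[i][0] == current:
            deaths += data[i][1]
            i += 1
        surv *= 1.0 - deaths / at_risk
    return surv


def pseudo_values(times: Sequence[float], events: Sequence[int], t: float) -> List[float]:
    """Jackknife pseudo-values n * F_hat - (n - 1) * F_hat^(-i) for F(t) = 1 - S(t)."""
    times, events = list(times), list(events)
    n = len(times)
    full = 1.0 - km_survival(times, events, t)
    out = []
    for i in range(n):
        # Re-estimate F(t) with subject i left out.
        loo = 1.0 - km_survival(times[:i] + times[i + 1:], events[:i] + events[i + 1:], t)
        out.append(n * full - (n - 1) * loo)
    return out


def pseudo_brier(pseudo: Sequence[float], pred_risk: Sequence[float]) -> float:
    """Brier score at t with the indicator I(T_i <= t) replaced by its pseudo-value,
    using (I - p)^2 = I * (1 - 2p) + p^2 for a 0/1 indicator I."""
    return sum(y * (1.0 - 2.0 * p) + p * p for y, p in zip(pseudo, pred_risk)) / len(pseudo)


if __name__ == "__main__":
    times = [1.0, 2.0, 3.0, 4.0]
    events = [1, 1, 1, 1]  # no censoring: pseudo-values equal I(T_i <= t)
    pv = pseudo_values(times, events, t=2.5)
    print(pv)
    print(pseudo_brier(pv, [0.9, 0.9, 0.1, 0.1]))
```

Without censoring the pseudo-values reproduce the observed indicators, so the score reduces to the ordinary Brier score; with censoring they stand in for indicators that cannot be observed.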
Affiliation(s)
- Giuliana Cortese
- Department of Statistical Sciences, University of Padua, Via Cesare Battisti 241, 35121 Padua, Italy.
31
Abstract
Most statistical developments in the joint modelling area have focused on shared random-effect models that include characteristics of the longitudinal marker as predictors in the model for the time-to-event. A less well-known approach is the joint latent class model, which assumes that a latent class structure entirely captures the correlation between the longitudinal marker trajectory and the risk of the event. Owing to its flexibility in modelling the dependency between the longitudinal marker and the event time, as well as its ability to include covariates, the joint latent class model may be particularly suited to prediction problems. This article gives an overview of joint latent class modelling, especially in the prediction context. The authors introduce the model, discuss estimation and goodness-of-fit, and compare it with the shared random-effect model. Dynamic predictive tools derived from joint latent class models, as well as measures to evaluate their dynamic predictive accuracy, are then presented. A detailed illustration of the methods is given in the context of predicting prostate cancer recurrence after radiation therapy, based on repeated measures of prostate-specific antigen.
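Under the joint latent class assumption that the marker and the event time are independent given the latent class, a dynamic prediction combines posterior class-membership probabilities with class-specific survival: P(s < T <= s + w | T > s, history) = sum over g of P(G = g | T > s, history) * (1 - S_g(s + w) / S_g(s)), where the posterior is proportional to P(G = g) * f(history | G = g) * S_g(s). The sketch below assumes these ingredients have already been obtained from a fitted model and uses exponential survival curves purely for illustration; every name is hypothetical.

```python
import math
from typing import Callable, Sequence


def jlcm_dynamic_prediction(
    class_prob: Sequence[float],
    marker_lik: Sequence[float],
    surv_fns: Sequence[Callable[[float], float]],
    s: float,
    w: float,
) -> float:
    """class_prob[g]: P(G = g) (in a fitted model this may depend on covariates).
    marker_lik[g]: density of the subject's marker history up to s given class g.
    surv_fns[g]: class-specific survival function S_g(t).
    Returns P(s < T <= s + w | T > s, marker history) under conditional independence
    of marker and event time given the class."""
    # Posterior class membership given the marker history AND survival to s.
    joint = [p * lik * S(s) for p, lik, S in zip(class_prob, marker_lik, surv_fns)]
    total = sum(joint)
    posterior = [j / total for j in joint]
    # Mix the class-specific conditional event probabilities over (s, s + w].
    return sum(post * (1.0 - S(s + w) / S(s)) for post, S in zip(posterior, surv_fns))


if __name__ == "__main__":
    # Two hypothetical classes with exponential survival (rates 0.05 and 0.5).
    surv = [lambda t: math.exp(-0.05 * t), lambda t: math.exp(-0.5 * t)]
    # A marker history four times more likely under the high-risk class.
    print(jlcm_dynamic_prediction([0.7, 0.3], [1.0, 4.0], surv, s=2.0, w=3.0))
```

Because the posterior conditions on survival to s, subjects who remain event-free longer are shifted toward lower-risk classes, which is part of what makes the prediction dynamic.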
Affiliation(s)
- Cécile Proust-Lima
- INSERM, U897, Epidemiology and Biostatistics Research Center, F-33076 Bordeaux, France