1
Scheerders ERY, van Klaveren D, Malskat WSJ, van Rijn MJE, van der Velden SK, Nijsten T, van den Bos RR. Development and External Validation of a Prediction Model for Patients with Varicose Veins Suitable for Isolated Ambulatory Phlebectomy. Eur J Vasc Endovasc Surg 2024; 68:387-394. [PMID: 38710320] [DOI: 10.1016/j.ejvs.2024.05.001]
Abstract
OBJECTIVE Isolated ambulatory phlebectomy is a potential treatment option for patients with an incompetent great saphenous vein (GSV) or anterior accessory saphenous vein and one or more incompetent tributaries. Being able to determine which patients will most likely benefit from isolated phlebectomy is important. This study aimed to identify predictors for avoidance of secondary axial ablation after isolated phlebectomy and to develop and externally validate a multivariable model for predicting this outcome. METHODS For model development, data from patients included in the SAPTAP trial were used. The investigated outcome was avoidance of ablation of the saphenous trunk one year after isolated ambulatory phlebectomy. Pre-defined candidate predictors were analysed with multivariable logistic regression. Predictors were selected using Akaike information criterion backward selection. Discriminative ability was assessed by the concordance index. Bootstrapping was used to correct the regression coefficients and the C index for overfitting. The model was externally validated using a population of 94 patients, with an incompetent GSV and one or more incompetent tributaries, who underwent isolated phlebectomy. RESULTS 225 patients were included in model development, of whom 167 (74.2%) did not undergo additional ablation of the saphenous trunk one year after isolated phlebectomy. The final model consisted of three predictors for avoidance of axial ablation: tributary length (< 15 cm vs. > 30 cm: odds ratio [OR] 0.09, 95% confidence interval [CI] 0.02 - 0.40; 15 - 30 cm vs. > 30 cm: OR 0.18, 95% CI 0.09 - 0.38); saphenofemoral junction (SFJ) reflux (absent vs. present: OR 2.53, 95% CI 0.81 - 7.87); and diameter of the saphenous trunk (per millimetre change: OR 0.63, 95% CI 0.41 - 0.96). The discriminative ability of the model was moderate (C index 0.72 at internal validation; 0.73 at external validation).
CONCLUSION A model was developed for predicting avoidance of secondary ablation of the saphenous trunk one year after isolated ambulatory phlebectomy, which can be helpful in daily practice to determine the suitable treatment strategy in patients with an incompetent saphenous trunk and one or more incompetent tributaries. Patients having a longer tributary, smaller diameter saphenous trunk, and absence of terminal valve reflux in the SFJ are more likely to benefit from isolated phlebectomy.
Affiliation(s)
- Eveline R Y Scheerders
- Department of Dermatology, Erasmus MC University Medical Centre, Rotterdam, the Netherlands
- David van Klaveren
- Department of Public Health, Erasmus MC University Medical Centre, Rotterdam, the Netherlands
- Wendy S J Malskat
- Department of Dermatology, Erasmus MC University Medical Centre, Rotterdam, the Netherlands
- Marie Josee E van Rijn
- Department of Vascular Surgery, Erasmus MC University Medical Centre, Rotterdam, the Netherlands
- Simone K van der Velden
- Department of Dermatology, Erasmus MC University Medical Centre, Rotterdam, the Netherlands; MohsA Clinic, Eindhoven, the Netherlands
- Tamar Nijsten
- Department of Dermatology, Erasmus MC University Medical Centre, Rotterdam, the Netherlands
- Renate R van den Bos
- Department of Dermatology, Erasmus MC University Medical Centre, Rotterdam, the Netherlands
2
de Winkel J, Maas CCHM, Roozenbeek B, van Klaveren D, Lingsma HF. Pitfalls of single-study external validation illustrated with a model predicting functional outcome after aneurysmal subarachnoid hemorrhage. BMC Med Res Methodol 2024; 24:176. [PMID: 39118007] [PMCID: PMC11308226] [DOI: 10.1186/s12874-024-02280-9]
Abstract
BACKGROUND Prediction models are often externally validated with data from a single study or cohort. However, the interpretation of performance estimates obtained with single-study external validation is not as straightforward as assumed. We aimed to illustrate this by conducting a large number of external validations of a prediction model for functional outcome in subarachnoid hemorrhage (SAH) patients. METHODS We used data from the Subarachnoid Hemorrhage International Trialists (SAHIT) data repository (n = 11,931, 14 studies) to refit the SAHIT model for predicting a dichotomous functional outcome (favorable versus unfavorable), with the (extended) Glasgow Outcome Scale or modified Rankin Scale score, at a minimum of three months after discharge. We performed leave-one-cluster-out cross-validation to mimic the process of multiple single-study external validations. Each study represented one cluster. In each of these validations, we assessed discrimination with Harrell's c-statistic and calibration with calibration plots, the intercepts, and the slopes. We used random effects meta-analysis to obtain the (reference) mean performance estimates and between-study heterogeneity (I2-statistic). The influence of case-mix variation on discriminative performance was assessed with the model-based c-statistic and we fitted a "membership model" to obtain a gross estimate of transportability. RESULTS Across 14 single-study external validations, model performance was highly variable. The mean c-statistic was 0.74 (95% CI 0.70-0.78, range 0.52-0.84, I2 = 0.92), the mean intercept was -0.06 (95% CI -0.37 to 0.24, range -1.40 to 0.75, I2 = 0.97), and the mean slope was 0.96 (95% CI 0.78-1.13, range 0.53-1.31, I2 = 0.90). The decrease in discriminative performance was attributable to case-mix variation, between-study heterogeneity, or a combination of both. Incidentally, we observed poor generalizability or transportability of the model.
CONCLUSIONS We demonstrate two potential pitfalls in the interpretation of model performance with single-study external validation: (1) model performance is highly variable and depends on the choice of validation data, and (2) no insight is provided into the generalizability or transportability of the model that is needed to guide local implementation. As such, a single-study external validation can easily be misinterpreted and lead to a false appreciation of the clinical prediction model. Cross-validation is better equipped to address these pitfalls.
Affiliation(s)
- Jordi de Winkel
- Department of Neurology, Erasmus MC University Medical Center Rotterdam, 40 Doctor Molewaterplein, P.O. Box 2040, Rotterdam, Zuid-Holland, 3015 GD, The Netherlands.
- Department of Public Health, Erasmus MC University Medical Center Rotterdam, Rotterdam, Zuid-Holland, The Netherlands.
- Carolien C H M Maas
- Department of Public Health, Erasmus MC University Medical Center Rotterdam, Rotterdam, Zuid-Holland, The Netherlands
- Bob Roozenbeek
- Department of Neurology, Erasmus MC University Medical Center Rotterdam, 40 Doctor Molewaterplein, P.O. Box 2040, Rotterdam, Zuid-Holland, 3015 GD, The Netherlands
- David van Klaveren
- Department of Public Health, Erasmus MC University Medical Center Rotterdam, Rotterdam, Zuid-Holland, The Netherlands
- Hester F Lingsma
- Department of Public Health, Erasmus MC University Medical Center Rotterdam, Rotterdam, Zuid-Holland, The Netherlands
3
Hoogland J, Efthimiou O, Nguyen TL, Debray TPA. Evaluating individualized treatment effect predictions: A model-based perspective on discrimination and calibration assessment. Stat Med 2024. [PMID: 39090523] [DOI: 10.1002/sim.10186]
Abstract
In recent years, there has been a growing interest in the prediction of individualized treatment effects. While there is a rapidly growing literature on the development of such models, there is little literature on the evaluation of their performance. In this paper, we aim to facilitate the validation of prediction models for individualized treatment effects. The estimands of interest are defined based on the potential outcomes framework, which facilitates a comparison of existing and novel measures. In particular, we examine existing measures of discrimination for benefit (variations of the c-for-benefit), and propose model-based extensions to the treatment effect setting for discrimination and calibration metrics that have a strong basis in outcome risk prediction. The main focus is on randomized trial data with binary endpoints and on models that provide individualized treatment effect predictions and potential outcome predictions. We use simulated data to provide insight into the characteristics of the discrimination and calibration statistics under consideration, and further illustrate all methods in a trial of acute ischemic stroke treatment. The results show that the proposed model-based statistics had the best characteristics in terms of bias and accuracy. While resampling methods adjusted for the optimism of performance estimates in the development data, they had a high variance across replications that limited their accuracy. Therefore, individualized treatment effect models are best validated in independent data. To aid adoption, a software implementation of the proposed methods was made available in R.
Affiliation(s)
- J Hoogland
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Epidemiology and Data Science, Amsterdam University Medical Center, Amsterdam, The Netherlands
- O Efthimiou
- Institute of Primary Health Care (BIHAM), University of Bern, Bern, Switzerland
- Institute of Social and Preventive Medicine (ISPM), University of Bern, Bern, Switzerland
- T L Nguyen
- Section of Epidemiology, Department of Public Health, University of Copenhagen, Copenhagen, Denmark
- T P A Debray
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Smart Data Analysis and Statistics B.V., Utrecht, The Netherlands
4
Mikolić A, Brasher PMA, Brubacher JR, Panenka W, Scheuermeyer FX, Archambault P, Khazei A, Silverberg ND. External Validation of the Post-Concussion Symptoms Rule for Predicting Mild Traumatic Brain Injury Outcome. J Neurotrauma 2024; 41:1929-1936. [PMID: 38226635] [DOI: 10.1089/neu.2023.0484]
Abstract
Persistent symptoms are common after a mild traumatic brain injury (mTBI). The Post-Concussion Symptoms (PoCS) Rule is a newly developed clinical decision rule for the prediction of persistent post-concussion symptoms (PPCS) 3 months after an mTBI. The PoCS Rule includes assessment of demographic and clinical characteristics and headache presence in the emergency department (ED), and follow-up assessment of symptoms at 7 days post-injury using two thresholds (lower/higher) for symptom scoring. We examined the PoCS Rule in an independent sample. We analyzed a clinical trial that recruited participants with mTBI from EDs in Greater Vancouver, Canada. The primary analysis used data from 236 participants, who were randomized to a usual care control group and completed the Rivermead Postconcussion Symptoms Questionnaire at 3 months. The primary outcome was PPCS, as defined by the PoCS authors. We assessed the overall performance of the PoCS Rule (area under the receiver operating characteristic curve [AUC]), sensitivity, and specificity. More than 40% of participants (median age 38 years, 59% female) reported PPCS at 3 months. Most participants (88%) were categorized as being at medium risk based on the ED assessment, and a majority were considered as being at high risk according to the final PoCS Rule (81% using the lower threshold and 72% using the higher threshold). The PoCS Rule showed a sensitivity of 93% (95% confidence interval [CI], 88-98; lower threshold) and 85% (95% CI, 78-92; higher threshold), and a specificity of 28% (95% CI, 21-36) and 37% (95% CI, 29-46), respectively. The overall performance was modest (AUC 0.61, 95% CI 0.59-0.65). In conclusion, the PoCS Rule was sensitive for PPCS but had low specificity in our sample. Follow-up assessment of symptoms can improve risk stratification after mTBI.
Affiliation(s)
- Ana Mikolić
- Department of Psychology, University of British Columbia, Vancouver, British Columbia, Canada
- Rehabilitation Research Program, Centre for Aging SMART at Vancouver Coastal Health, Vancouver, British Columbia, Canada
- Penelope M A Brasher
- Centre for Clinical Epidemiology & Evaluation, Vancouver Coastal Health Research Institute, Vancouver, British Columbia, Canada
- Jeffrey R Brubacher
- Department of Emergency Medicine, University of British Columbia, Vancouver, British Columbia, Canada
- William Panenka
- Department of Psychiatry, University of British Columbia, Vancouver, British Columbia, Canada
- British Columbia Provincial Neuropsychiatry Program, Vancouver, British Columbia, Canada
- Djavad Mowafaghian Centre for Brain Health, University of British Columbia, Vancouver, British Columbia, Canada
- Frank X Scheuermeyer
- Department of Emergency Medicine, University of British Columbia, Vancouver, British Columbia, Canada
- Patrick Archambault
- Djavad Mowafaghian Centre for Brain Health, University of British Columbia, Vancouver, British Columbia, Canada
- Afshin Khazei
- Department of Emergency Medicine, University of British Columbia, Vancouver, British Columbia, Canada
- Noah D Silverberg
- Department of Psychology, University of British Columbia, Vancouver, British Columbia, Canada
- Rehabilitation Research Program, Centre for Aging SMART at Vancouver Coastal Health, Vancouver, British Columbia, Canada
- Department of Family and Emergency Medicine, Université Laval, Québec, Québec, Canada
5
Parsons SK, Rodday AM, Upshaw JN, Scharman CD, Cui Z, Cao Y, Tiger YKR, Maurer MJ, Evens AM. Harnessing multi-source data for individualized care in Hodgkin Lymphoma. Blood Rev 2024; 65:101170. [PMID: 38290895] [PMCID: PMC11382606] [DOI: 10.1016/j.blre.2024.101170]
Abstract
Hodgkin lymphoma is a rare but highly curable form of cancer, primarily afflicting adolescents and young adults. Despite multiple seminal trials over the past twenty years, there is no single consensus-based treatment approach beyond the use of multi-agent chemotherapy with curative intent. The use of radiation continues to be debated both in early-stage disease, as part of combined modality treatment, and in salvage, as an important form of consolidation. While short-term disease outcomes have varied little across these different approaches in both early and advanced stage disease, the risk of severe, longer-term effects has varied considerably. Over the past decade, novel therapeutics have been employed in the salvage setting, in preparation for and as consolidation after autologous stem cell transplant. More recently, these novel therapeutics have moved to the frontline setting, initially compared to standard-of-care treatment and later in a direct head-to-head comparison combined with multi-agent chemotherapy. In 2018, we established the HoLISTIC Consortium, bringing together disease and methods experts to develop clinical decision models based on individual patient data to guide providers, patients, and caregivers in decision-making. In this review, we detail the steps we followed to create the master database of individual patient data from patients treated over the past 20 years, using principles of data science. We then describe the different methodological approaches we are taking to clinical decision making, beginning with clinical prediction tools at the time of diagnosis and moving to multi-state models incorporating treatments and their response. Finally, we describe how simulation modeling can be used to estimate risks of late effects, based on cumulative exposure from frontline and salvage treatment.
The resultant database and tools employed are dynamic with the expectation that they will be updated as better and more complete information becomes available.
Affiliation(s)
- Susan K Parsons
- Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, Boston, MA, United States of America; Division of Hematology/Oncology, Tufts Medical Center, Boston, MA, United States of America.
- Angie Mae Rodday
- Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, Boston, MA, United States of America
- Jenica N Upshaw
- Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, Boston, MA, United States of America; The CardioVascular Center and Advanced Heart Failure Program, Tufts Medical Center, Boston, MA, United States of America
- Zhu Cui
- Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, Boston, MA, United States of America; Division of Hematology/Oncology, Tufts Medical Center, Boston, MA, United States of America
- Yenong Cao
- Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, Boston, MA, United States of America; Division of Hematology/Oncology, Tufts Medical Center, Boston, MA, United States of America
- Yun Kyoung Ryu Tiger
- Division of Blood Disorders, Rutgers Cancer Institute New Jersey, New Brunswick, NJ, United States of America
- Matthew J Maurer
- Division of Clinical Trials and Biostatistics and Division of Hematology, Mayo Clinic, Rochester, MN, United States of America
- Andrew M Evens
- Division of Blood Disorders, Rutgers Cancer Institute New Jersey, New Brunswick, NJ, United States of America
6
de Winkel J, Roozenbeek B, Dijkland SA, Dammers R, van Doormaal PJ, van der Jagt M, van Klaveren D, Dippel DWJ, Lingsma HF. Personalized decision-making for aneurysm treatment of aneurysmal subarachnoid hemorrhage: development and validation of a clinical prediction tool. BMC Neurol 2024; 24:65. [PMID: 38360580] [PMCID: PMC10868110] [DOI: 10.1186/s12883-024-03546-x]
Abstract
BACKGROUND In patients with aneurysmal subarachnoid hemorrhage suitable for endovascular coiling and neurosurgical clip-reconstruction, the aneurysm treatment decision-making process could be improved by considering heterogeneity of treatment effect and durability of treatment. We aimed to develop and validate a tool to predict the individualized treatment benefit of endovascular coiling compared to neurosurgical clip-reconstruction. METHODS We used randomized data (International Subarachnoid Aneurysm Trial, n = 2143) to develop models to predict 2-month functional outcome and time-to-rebleed-or-retreatment. We modeled heterogeneity of treatment effect by adding interaction terms of treatment with prespecified predictors and with baseline risk of the outcome. We predicted outcome with both treatments and calculated absolute treatment benefit. We described the characteristics of patients with a ≥ 5 percentage point difference in the predicted probability of favorable functional outcome (modified Rankin Scale score 0-2) and of no rebleed or retreatment within 10 years. Model performance was expressed with the c-statistic and calibration plots. We performed bootstrapping and leave-one-cluster-out cross-validation and pooled cluster-specific c-statistics with random effects meta-analysis. RESULTS The pooled c-statistics were 0.72 (95% CI: 0.69-0.75) for the prediction of 2-month favorable functional outcome and 0.67 (95% CI: 0.63-0.71) for the prediction of no rebleed or retreatment within 10 years. We found no significant interaction between predictors and treatment. The average predicted benefit in favorable functional outcome was 6% (95% CI: 3-10%) in favor of coiling, but 11% (95% CI: 9-13%) for no rebleed or retreatment in favor of clip-reconstruction. 134 patients (6%), young and in favorable clinical condition, had negligible functional outcome benefit from coiling but a ≥ 5 percentage point benefit of clip-reconstruction in terms of durability of treatment.
CONCLUSIONS We show that young patients in favorable clinical condition and without extensive vasospasm have a negligible functional outcome benefit from endovascular coiling compared with neurosurgical clip-reconstruction, while at the same time having a substantially lower probability of retreatment or rebleeding with clip-reconstruction than with coiling. The SHARP prediction tool (https://sharpmodels.shinyapps.io/sharpmodels/) could support and incentivize a multidisciplinary discussion about aneurysm treatment decision-making by providing individualized treatment benefit estimates.
Affiliation(s)
- Jordi de Winkel
- Department of Neurology, Erasmus MC University Medical Center Rotterdam, 40 Doctor Molewaterplein, P.O. Box 2405, 3015 GD, Rotterdam, Zuid-Holland, The Netherlands.
- Department of Public Health, Erasmus MC University Medical Center Rotterdam, Rotterdam, Zuid-Holland, The Netherlands.
- Bob Roozenbeek
- Department of Neurology, Erasmus MC University Medical Center Rotterdam, 40 Doctor Molewaterplein, P.O. Box 2405, 3015 GD, Rotterdam, Zuid-Holland, The Netherlands
- Simone A Dijkland
- Department of Neurology, Erasmus MC University Medical Center Rotterdam, 40 Doctor Molewaterplein, P.O. Box 2405, 3015 GD, Rotterdam, Zuid-Holland, The Netherlands
- Ruben Dammers
- Department of Neurosurgery, Erasmus MC University Medical Center Rotterdam, Rotterdam, Zuid-Holland, The Netherlands
- Pieter-Jan van Doormaal
- Department of Radiology and Nuclear Medicine, Erasmus MC University Medical Center Rotterdam, Rotterdam, Zuid-Holland, The Netherlands
- Mathieu van der Jagt
- Department of Intensive Care Adults, Erasmus MC University Medical Center Rotterdam, Rotterdam, Zuid-Holland, The Netherlands
- David van Klaveren
- Department of Public Health, Erasmus MC University Medical Center Rotterdam, Rotterdam, Zuid-Holland, The Netherlands
- Diederik W J Dippel
- Department of Neurology, Erasmus MC University Medical Center Rotterdam, 40 Doctor Molewaterplein, P.O. Box 2405, 3015 GD, Rotterdam, Zuid-Holland, The Netherlands
- Hester F Lingsma
- Department of Public Health, Erasmus MC University Medical Center Rotterdam, Rotterdam, Zuid-Holland, The Netherlands
7
de Jong VMT, Hoogland J, Moons KGM, Riley RD, Nguyen TL, Debray TPA. Propensity-based standardization to enhance the validation and interpretation of prediction model discrimination for a target population. Stat Med 2023; 42:3508-3528. [PMID: 37311563] [DOI: 10.1002/sim.9817]
Abstract
External validation of the discriminative ability of prediction models is of key importance. However, the interpretation of such evaluations is challenging, as the ability to discriminate depends on both the sample characteristics (ie, case-mix) and the generalizability of predictor coefficients, but most discrimination indices do not provide any insight into their respective contributions. To disentangle differences in discriminative ability across external validation samples due to a lack of model generalizability from differences in sample characteristics, we propose propensity-weighted measures of discrimination. These weighted metrics, which are derived from propensity scores for sample membership, are standardized for case-mix differences between the model development and validation samples, allowing for a fair comparison of discriminative ability in terms of model characteristics in a target population of interest. We illustrate our methods with the validation of eight prediction models for deep vein thrombosis in 12 external validation data sets and assess our methods in a simulation study. In the illustrative example, propensity score standardization reduced between-study heterogeneity of discrimination, indicating that between-study variability was partially attributable to case-mix. The simulation study showed that only flexible propensity-score methods (allowing for non-linear effects) produced unbiased estimates of model discrimination in the target population, and only when the positivity assumption was met. Propensity score-based standardization may facilitate the interpretation of (heterogeneity in) discriminative ability of a prediction model as observed across multiple studies, and may guide model updating strategies for a particular target population. Careful propensity score modeling with attention for non-linear relations is recommended.
Affiliation(s)
- Valentijn M T de Jong
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Data Analytics and Methods Task Force, European Medicines Agency, Amsterdam, The Netherlands
- Jeroen Hoogland
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Department of Epidemiology and Data Science, Amsterdam University Medical Centers, Amsterdam, The Netherlands
- Karel G M Moons
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Richard D Riley
- Institute of Applied Health Research, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
- Tri-Long Nguyen
- Section of Epidemiology, Department of Public Health, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Thomas P A Debray
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Smart Data Analysis and Statistics, Utrecht, The Netherlands
8
Steingrimsson JA. Extending prediction models for use in a new target population with failure time outcomes. Biostatistics 2023; 24:728-742. [PMID: 35389429] [DOI: 10.1093/biostatistics/kxac011]
Abstract
Prediction models are often built and evaluated using data from a population that differs from the target population in which model-derived predictions are intended to be used. In this article, we present methods for evaluating model performance in the target population when some observations are right censored. The methods assume that outcome and covariate data are available from a source population used for model development, and that covariates, but no outcome data, are available from the target population. We evaluate the finite sample performance of the proposed estimators using simulations and apply the methods to transport a prediction model built using data from a lung cancer screening trial to a nationally representative population of participants eligible for lung cancer screening.
Affiliation(s)
- Jon A Steingrimsson
- Department of Biostatistics, Brown University, 121 South Main Street, Providence, RI 02903, USA
9
Steingrimsson JA, Gatsonis C, Li B, Dahabreh IJ. Transporting a Prediction Model for Use in a New Target Population. Am J Epidemiol 2023; 192:296-304. [PMID: 35872598] [PMCID: PMC11004796] [DOI: 10.1093/aje/kwac128]
Abstract
We considered methods for transporting a prediction model for use in a new target population, both when outcome and covariate data for model development are available from a source population that has a different covariate distribution compared with the target population and when covariate data (but not outcome data) are available from the target population. We discuss how to tailor the prediction model to account for differences in the data distribution between the source population and the target population. We also discuss how to assess the model's performance (e.g., by estimating the mean squared prediction error) in the target population. We provide identifiability results for measures of model performance in the target population for a potentially misspecified prediction model under a sampling design where the source and the target population samples are obtained separately. We introduce the concept of prediction error modifiers that can be used to reason about tailoring measures of model performance to the target population. We illustrate the methods in simulated data and apply them to transport a prediction model for lung cancer diagnosis from the National Lung Screening Trial to the nationally representative target population of trial-eligible individuals in the National Health and Nutrition Examination Survey.
Affiliation(s)
- Jon A Steingrimsson
- Correspondence to Dr. Jon A. Steingrimsson, Department of Biostatistics, School of Public Health, Brown University, 121 S. Main Street, Providence, RI 02903
10
McLernon DJ, Giardiello D, Van Calster B, Wynants L, van Geloven N, van Smeden M, Therneau T, Steyerberg EW. Assessing Performance and Clinical Usefulness in Prediction Models With Survival Outcomes: Practical Guidance for Cox Proportional Hazards Models. Ann Intern Med 2023; 176:105-114. [PMID: 36571841] [DOI: 10.7326/m22-0844]
Abstract
Risk prediction models need thorough validation to assess their performance. Validation of models for survival outcomes poses challenges due to the censoring of observations and the varying time horizon at which predictions can be made. This article describes measures to evaluate predictions and the potential improvement in decision making from survival models based on Cox proportional hazards regression. As a motivating case study, the authors consider the prediction of the composite outcome of recurrence or death (the "event") in patients with breast cancer after surgery. They developed a simple Cox regression model with 3 predictors, as in the Nottingham Prognostic Index, in 2982 women (1275 events over 5 years of follow-up) and externally validated this model in 686 women (285 events over 5 years). Improvement in performance was assessed after the addition of progesterone receptor as a prognostic biomarker. The model predictions can be evaluated across the full range of observed follow-up times or for the event occurring by the end of a fixed time horizon of interest. The authors first discuss recommended statistical measures that evaluate model performance in terms of discrimination, calibration, or overall performance. Further, they evaluate the potential clinical utility of the model to support clinical decision making according to a net benefit measure. They provide SAS and R code to illustrate internal and external validation. The authors recommend the proposed set of performance measures for transparent reporting of the validity of predictions from survival models.
Affiliation(s)
- David J McLernon: Institute of Applied Health Sciences, University of Aberdeen, Aberdeen, United Kingdom (D.J.M.)
- Daniele Giardiello: Netherlands Cancer Institute, Amsterdam, the Netherlands; Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, the Netherlands; and Institute of Biomedicine, Eurac Research, Affiliated Institute of the University of Lübeck, Bolzano, Italy (D.G.)
- Ben Van Calster: Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, the Netherlands, and Department of Development and Regeneration, Katholieke Universiteit Leuven, Leuven, Belgium (B.V.)
- Laure Wynants: School for Public Health and Primary Care, Maastricht University, Maastricht, the Netherlands (L.W.)
- Nan van Geloven: Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, the Netherlands (N.V., E.W.S.)
- Maarten van Smeden: Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands (M.V.)
- Terry Therneau: Department of Quantitative Health Sciences, Mayo Clinic, Rochester, Minnesota (T.T.)
- Ewout W Steyerberg: Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, the Netherlands (N.V., E.W.S.)
11
Quandt F, Meißner N, Wölfer TA, Flottmann F, Deb-Chatterji M, Kellert L, Fiehler J, Goyal M, Saver JL, Gerloff C, Thomalla G, Tiedt S. RCT versus real-world cohorts: Differences in patient characteristics drive associations with outcome after EVT. Eur Stroke J 2022; 8:231-240. [PMID: 37021166] [PMCID: PMC10069173] [DOI: 10.1177/23969873221142642]
Abstract
Background: The selection of patients with large-vessel occlusion (LVO) stroke for endovascular treatment (EVT) depends on patient characteristics and procedural metrics. The relation of these variables to functional outcome after EVT has been assessed in numerous datasets from both randomized controlled trials (RCT) and real-world registries, but whether differences in their case mix modulate outcome prediction is unknown. Methods: We leveraged data from individual patients with anterior LVO stroke treated with EVT from completed RCTs in the Virtual International Stroke Trials Archive (N = 479) and from the German Stroke Registry (N = 4079). Cohorts were compared regarding (i) patient characteristics and procedural pre-EVT metrics, (ii) these variables' relation to functional outcome, and (iii) the performance of derived outcome prediction models. Relation to outcome (functional dependence, defined by a modified Rankin Scale score of 3–6 at 90 days) was analyzed by logistic regression models and a machine learning algorithm. Results: Ten out of 11 analyzed baseline variables differed between the RCT and real-world cohorts: RCT patients were younger, had higher admission NIHSS scores, and received thrombolysis more often (all p < 0.0001). The largest differences at the level of individual outcome predictors were observed for age (RCT: adjusted odds ratio (aOR) 1.29 (95% CI, 1.10–1.53) vs real-world aOR 1.65 (95% CI, 1.54–1.78) per 10-year increment, p < 0.001). Treatment with intravenous thrombolysis was not significantly associated with functional outcome in the RCT cohort (aOR 1.64 (95% CI, 0.91–3.00)) but was in the real-world cohort (aOR 0.81 (95% CI, 0.69–0.96); p for cohort heterogeneity = 0.056). Outcome prediction was more accurate when the model was constructed and tested on real-world data than when it was constructed on RCT data and tested on real-world data (area under the curve, 0.82 (95% CI, 0.79–0.85) vs 0.79 (95% CI, 0.77–0.80), p = 0.004).
Conclusions: RCT and real-world cohorts considerably differ in patient characteristics, individual outcome predictor strength, and overall outcome prediction model performance.
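The abstract's central point, that predictor strength and case mix shift discrimination across cohorts, can be illustrated with simulated data. All numbers below (effect sizes, cohort sizes) are hypothetical stand-ins, not estimates from the study.

```python
import numpy as np

rng = np.random.default_rng(1)

def auc(y, score):
    """AUC as the probability that a random case outranks a random control."""
    y = np.asarray(y, dtype=bool)
    score = np.asarray(score, dtype=float)
    pos, neg = score[y], score[~y]
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return greater + 0.5 * ties

def cohort(n, beta_age):
    """Simulate a cohort in which age predicts poor outcome with
    cohort-specific strength beta_age (per 10 years). Hypothetical."""
    age = rng.normal(70, 10, n)
    p = 1 / (1 + np.exp(-(-0.5 + beta_age * (age - 70) / 10)))
    y = rng.random(n) < p
    return age, y

# assume a weaker age effect in the trial-like cohort, as the abstract reports
age_rct, y_rct = cohort(500, 0.3)
age_rw, y_rw = cohort(4000, 0.6)
print(f"AUC of age alone, trial-like cohort:    {auc(y_rct, age_rct):.2f}")
print(f"AUC of age alone, registry-like cohort: {auc(y_rw, age_rw):.2f}")
```

Because the same predictor carries a different coefficient in each cohort, a model derived in one population discriminates differently in the other, which is the mechanism the study quantifies.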
Affiliation(s)
- Fanny Quandt: Department of Neurology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Nina Meißner: Institute for Stroke and Dementia Research, University Hospital, LMU Munich, Munich, Germany
- Teresa A Wölfer: Institute for Stroke and Dementia Research, University Hospital, LMU Munich, Munich, Germany
- Fabian Flottmann: Department of Diagnostic and Interventional Neuroradiology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Milani Deb-Chatterji: Department of Neurology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Lars Kellert: Department of Neurology, University Hospital, LMU Munich, Munich, Germany
- Jens Fiehler: Department of Diagnostic and Interventional Neuroradiology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Mayank Goyal: Department of Radiology, University of Calgary, Foothills Medical Centre, Calgary, AB, Canada
- Jeffrey L Saver: Department of Neurology and Comprehensive Stroke Center, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA
- Christian Gerloff: Department of Neurology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Götz Thomalla: Department of Neurology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Steffen Tiedt: Institute for Stroke and Dementia Research, University Hospital, LMU Munich, Munich, Germany
12
Rylance RT, Wagner P, Olesen KKW, Carlson J, Alfredsson J, Jernberg T, Leosdottir M, Johansson P, Vasko P, Maeng M, Mohammed MA, Erlinge D. Patient-oriented risk score for predicting death 1 year after myocardial infarction: the SweDen risk score. Open Heart 2022; 9:openhrt-2022-002143. [PMID: 36460308] [PMCID: PMC9723953] [DOI: 10.1136/openhrt-2022-002143]
Abstract
OBJECTIVES Our aim was to derive, based on the SWEDEHEART registry, and validate, using the Western Denmark Heart registry, a patient-oriented risk score, the SweDen score, which could calculate the risk of 1-year mortality following a myocardial infarction (MI). METHODS The factors included in the SweDen score were age, sex, smoking, diabetes, heart failure and statin use. These were chosen a priori by the SWEDEHEART steering group based on the premise that the factors were information known by the patients themselves. The score was evaluated using various statistical methods such as time-dependent receiver operating characteristic curves of the linear predictor, area under the curve metrics, Kaplan-Meier survivor curves and the calibration slope. RESULTS The area under the curve values were 0.81 in the derivation data and 0.76 in the validation data. The Kaplan-Meier curves showed similar patient profiles across datasets. The calibration slope was 1.03 (95% CI 0.99 to 1.08) in the validation data using the linear predictor from the derivation data. CONCLUSIONS The SweDen risk score is a novel tool created for patient use. The risk score calculator will be available online and presents mortality risk on a colour scale to simplify interpretation and to avoid exact life span expectancies. It provides a validated patient-oriented risk score predicting the risk of death within 1 year after suffering an MI, which visualises the benefit of statin use and smoking cessation in a simple way.
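The calibration slope reported above is obtained by refitting a logistic (or survival) model on the validation data with the derivation model's linear predictor as the only covariate; a slope near 1 means the original effects transport well. Below is a minimal numpy sketch on simulated data, not the SWEDEHEART or Western Denmark data.

```python
import numpy as np

def logistic_fit(x, y, iters=25):
    """Maximum-likelihood logistic regression of y on (intercept, x)
    via Newton-Raphson (IRLS)."""
    X = np.column_stack([np.ones(len(y)), x])
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-X @ beta))
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1 - p))[:, None])
        beta += np.linalg.solve(hess, grad)
    return beta  # (intercept, slope)

rng = np.random.default_rng(7)

# hypothetical validation cohort whose true risks match the model exactly,
# so the expected calibration slope is 1
n = 5000
lp = rng.normal(-2.0, 1.0, n)  # linear predictor from the derivation model
y = (rng.random(n) < 1 / (1 + np.exp(-lp))).astype(float)

intercept, slope = logistic_fit(lp, y)
print(f"calibration slope: {slope:.2f}")
```

A slope well below 1 would indicate overfitting of the derivation model (predictions too extreme in new data); well above 1, predictions that are too timid.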
Affiliation(s)
- Rebecca Tremain Rylance: Department of Cardiology, Clinical Sciences, Lund University and Skåne University Hospital, Lund, Sweden
- Philippe Wagner: Center for Clinical Research, Uppsala University, Uppsala, Sweden
- Kevin K W Olesen: Department of Cardiology, Aarhus University Hospital, Aarhus, Denmark
- Jonas Carlson: Department of Cardiology, Clinical Sciences, Lund University and Skåne University Hospital, Lund, Sweden
- Joakim Alfredsson: Department of Cardiology, Karolinska University Hospital, Linkoping, Sweden
- Tomas Jernberg: The Swedish Heart and Lung Association, Stockholm, Sweden
- Margret Leosdottir: Department of Clinical Sciences, Skåne University Hospital Lund, Malmö, Sweden; Department of Clinical Sciences, Lund University, Malmö, Sweden
- Peter Vasko: Department of Cardiology, Karolinska University Hospital, Linkoping, Sweden
- Michael Maeng: Department of Cardiology, Clinical Sciences, Lund University and Skåne University Hospital, Lund, Sweden
- Moman Aladdin Mohammed: Department of Cardiology, Clinical Sciences, Lund University and Skåne University Hospital, Lund, Sweden
- David Erlinge: Department of Cardiology, Clinical Sciences, Lund University and Skåne University Hospital, Lund, Sweden
13
van Klaveren D, Zanos TP, Nelson J, Levy TJ, Park JG, Retel Helmrich IRA, Rietjens JAC, Basile MJ, Hajizadeh N, Lingsma HF, Kent DM. Prognostic models for COVID-19 needed updating to warrant transportability over time and space. BMC Med 2022; 20:456. [PMID: 36424619] [PMCID: PMC9686462] [DOI: 10.1186/s12916-022-02651-3]
Abstract
BACKGROUND Supporting decisions for patients who present to the emergency department (ED) with COVID-19 requires accurate prognostication. We aimed to evaluate prognostic models for predicting outcomes in hospitalized patients with COVID-19, in different locations and across time. METHODS We included patients who presented to the ED with suspected COVID-19 and were admitted to 12 hospitals in the New York City (NYC) area and 4 large Dutch hospitals. We used second-wave patients who presented between September and December 2020 (2137 and 3252 in NYC and the Netherlands, respectively) to evaluate models that were developed on first-wave patients who presented between March and August 2020 (12,163 and 5831). We evaluated two prognostic models for in-hospital death: The Northwell COVID-19 Survival (NOCOS) model was developed on NYC data and the COVID Outcome Prediction in the Emergency Department (COPE) model was developed on Dutch data. These models were validated on subsequent second-wave data at the same site (temporal validation) and at the other site (geographic validation). We assessed model performance by the Area Under the receiver operating characteristic Curve (AUC), by the E-statistic, and by net benefit. RESULTS Twenty-eight-day mortality was considerably higher in the NYC first-wave data (21.0%), compared to the second-wave (10.1%) and the Dutch data (first wave 10.8%; second wave 10.0%). COPE discriminated well at temporal validation (AUC 0.82), with excellent calibration (E-statistic 0.8%). At geographic validation, discrimination was satisfactory (AUC 0.78), but with moderate over-prediction of mortality risk, particularly in higher-risk patients (E-statistic 2.9%). While discrimination was adequate when NOCOS was tested on second-wave NYC data (AUC 0.77), NOCOS systematically overestimated the mortality risk (E-statistic 5.1%). 
Discrimination in the Dutch data was good (AUC 0.81), but with over-prediction of risk, particularly in lower-risk patients (E-statistic 4.0%). Recalibration of COPE and NOCOS led to limited net benefit improvement in Dutch data, but to substantial net benefit improvement in NYC data. CONCLUSIONS NOCOS performed moderately worse than COPE, probably reflecting unique aspects of the early pandemic in NYC. Frequent updating of prognostic models is likely to be required for transportability over time and space during a dynamic pandemic.
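Model updating of the kind the conclusion recommends can be as simple as re-estimating the intercept on new data ("recalibration-in-the-large"), which fixes systematic over- or under-prediction while leaving discrimination unchanged. A sketch with simulated data follows; the downward risk shift of 0.8 on the log-odds scale is arbitrary, not an estimate from the paper.

```python
import numpy as np

rng = np.random.default_rng(3)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# hypothetical second-wave cohort: mortality fell, but the original
# model's linear predictor still ranks patients correctly
n = 4000
lp = rng.normal(-1.5, 1.0, n)          # original model's linear predictor
y = rng.random(n) < sigmoid(lp - 0.8)  # true risks shifted downward

def intercept_update(lp, y, iters=50):
    """Refit only the intercept offset a, so that mean predicted risk
    matches the observed event rate (1-D Newton-Raphson)."""
    a = 0.0
    for _ in range(iters):
        p = sigmoid(lp + a)
        a += (y.sum() - p.sum()) / (p * (1 - p)).sum()
    return a

a = intercept_update(lp, y.astype(float))
print(f"original mean predicted risk: {sigmoid(lp).mean():.3f}")
print(f"observed event rate:          {y.mean():.3f}")
print(f"recalibrated mean risk:       {sigmoid(lp + a).mean():.3f}")
```

At the maximum-likelihood offset, mean predicted risk equals the observed event rate exactly, which is the defining property of calibration-in-the-large.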
Affiliation(s)
- David van Klaveren: Department of Public Health, Erasmus MC University Medical Center Rotterdam, Dr. Molewaterplein 50, 3015 GE, Rotterdam, The Netherlands; Predictive Analytics and Comparative Effectiveness Center, Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, Boston, USA
- Theodoros P Zanos: Institute of Bioelectronic Medicine, Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY, USA
- Jason Nelson: Predictive Analytics and Comparative Effectiveness Center, Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, Boston, USA
- Todd J Levy: Institute of Bioelectronic Medicine, Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY, USA
- Jinny G Park: Predictive Analytics and Comparative Effectiveness Center, Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, Boston, USA
- Isabel R A Retel Helmrich: Department of Public Health, Erasmus MC University Medical Center Rotterdam, Dr. Molewaterplein 50, 3015 GE, Rotterdam, The Netherlands
- Judith A C Rietjens: Department of Public Health, Erasmus MC University Medical Center Rotterdam, Dr. Molewaterplein 50, 3015 GE, Rotterdam, The Netherlands
- Melissa J Basile: Division of Pulmonary Critical Care and Sleep Medicine, Department of Medicine, Donald and Barbara Zucker School of Medicine at Hofstra/Northwell Health, Hempstead, NY, USA
- Negin Hajizadeh: Division of Pulmonary Critical Care and Sleep Medicine, Department of Medicine, Donald and Barbara Zucker School of Medicine at Hofstra/Northwell Health, Hempstead, NY, USA
- Hester F Lingsma: Department of Public Health, Erasmus MC University Medical Center Rotterdam, Dr. Molewaterplein 50, 3015 GE, Rotterdam, The Netherlands
- David M Kent: Predictive Analytics and Comparative Effectiveness Center, Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, Boston, USA
14
Helmrich IRAR, Mikolić A, Kent DM, Lingsma HF, Wynants L, Steyerberg EW, van Klaveren D. Does poor methodological quality of prediction modeling studies translate to poor model performance? An illustration in traumatic brain injury. Diagn Progn Res 2022; 6:8. [PMID: 35509061] [PMCID: PMC9068255] [DOI: 10.1186/s41512-022-00122-0]
Abstract
BACKGROUND Prediction modeling studies often have methodological limitations, which may compromise model performance in new patients and settings. We aimed to examine the relation between the methodological quality of model development studies and their performance at external validation. METHODS We systematically searched for externally validated multivariable prediction models that predict functional outcome following moderate or severe traumatic brain injury. Risk of bias and applicability of development studies were assessed with the Prediction model Risk Of Bias Assessment Tool (PROBAST). Each model was rated for whether it was presented in sufficient detail to be used in practice. Model performance was described in terms of discrimination (AUC) and calibration. Delta AUC (dAUC) was calculated to quantify the percentage change in discrimination between development and validation for all models. Generalized estimating equations (GEE) were used to examine the relation between methodological quality and dAUC while controlling for clustering. RESULTS We included 54 publications, presenting ten development studies of 18 prediction models and 52 external validation studies with 245 unique validations. Two development studies (four models) were found to have low risk of bias (RoB). The other eight publications (14 models) showed high or unclear RoB. The median dAUC was positive in low RoB models (dAUC 8% [IQR -4% to 21%]) and negative in high RoB models (dAUC -18% [IQR -43% to 2%]). The GEE showed a larger average negative change in discrimination for high RoB models (-32%; 95% CI -48 to -15) and unclear RoB models (-13%; 95% CI -16 to -10) than for low RoB models. CONCLUSION Lower methodological quality at model development is associated with poorer model performance at external validation. Our findings emphasize the importance of adherence to methodological principles and reporting guidelines in prediction modeling studies.
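The abstract does not reproduce the exact dAUC formula. One plausible reading, the percentage change in discrimination measured relative to chance (AUC = 0.5), can be sketched as follows; this is an assumption for illustration, not necessarily the authors' definition.

```python
def dauc(auc_dev, auc_val):
    """Percentage change in discrimination between development and
    validation, relative to discrimination above chance (AUC = 0.5).
    One plausible reading of the dAUC in the abstract, stated here
    as an assumption rather than the authors' exact formula."""
    return 100 * (auc_val - auc_dev) / (auc_dev - 0.5)

# a model that drops from AUC 0.80 in development to 0.65 at validation
# has lost half of its discriminative ability above chance
print(dauc(0.80, 0.65))  # -50.0
```

Under this reading, a negative median dAUC for high risk-of-bias models means those models typically shed a sizeable fraction of their apparent discrimination when confronted with new data.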
Affiliation(s)
- Isabel R A Retel Helmrich: Department of Public Health, Center for Medical Decision Making, Erasmus MC-University Medical Center, Rotterdam, the Netherlands
- Ana Mikolić: Department of Public Health, Center for Medical Decision Making, Erasmus MC-University Medical Center, Rotterdam, the Netherlands
- David M Kent: Predictive Analytics and Comparative Effectiveness Center, Institute for Clinical Research and Health Policy Studies/Tufts Medical Center, Boston, USA
- Hester F Lingsma: Department of Public Health, Center for Medical Decision Making, Erasmus MC-University Medical Center, Rotterdam, the Netherlands
- Laure Wynants: Department of Epidemiology, School for Public Health and Primary Care, Faculty of Health, Medicine and Life Sciences, Maastricht University, Maastricht, The Netherlands
- Ewout W Steyerberg: Department of Public Health, Center for Medical Decision Making, Erasmus MC-University Medical Center, Rotterdam, the Netherlands; Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, The Netherlands
- David van Klaveren: Department of Public Health, Center for Medical Decision Making, Erasmus MC-University Medical Center, Rotterdam, the Netherlands; Predictive Analytics and Comparative Effectiveness Center, Institute for Clinical Research and Health Policy Studies/Tufts Medical Center, Boston, USA
15
Gulati G, Upshaw J, Wessler BS, Brazil RJ, Nelson J, van Klaveren D, Lundquist CM, Park JG, McGinnes H, Steyerberg EW, Van Calster B, Kent DM. Generalizability of Cardiovascular Disease Clinical Prediction Models: 158 Independent External Validations of 104 Unique Models. Circ Cardiovasc Qual Outcomes 2022; 15:e008487. [PMID: 35354282] [PMCID: PMC9015037] [DOI: 10.1161/circoutcomes.121.008487]
Abstract
Background: While clinical prediction models (CPMs) are increasingly used to guide patient care, their performance and clinical utility in new patient cohorts are poorly understood. Methods: We performed 158 external validations of 104 unique CPMs across 3 domains of cardiovascular disease (primary prevention, acute coronary syndrome, and heart failure). Validations were performed in publicly available clinical trial cohorts, and model performance was assessed using measures of discrimination, calibration, and net benefit. To explore potential reasons for poor model performance, CPM-clinical trial cohort pairs were stratified based on relatedness, a domain-specific set of characteristics used to qualitatively grade the similarity of derivation and validation patient populations. We also examined the model-based C-statistic to assess whether changes in discrimination were due to differences in case mix between the derivation and validation samples. The impact of model updating on model performance was also assessed. Results: Discrimination decreased significantly between model derivation (0.76 [interquartile range 0.73–0.78]) and validation (0.64 [interquartile range 0.60–0.67], P<0.001), but approximately half of this decrease was due to narrower case mix in the validation samples. CPMs had better discrimination when tested in related compared with distantly related trial cohorts. Calibration slope was also significantly higher in related trial cohorts (0.77 [interquartile range 0.59–0.90]) than in distantly related cohorts (0.59 [interquartile range 0.43–0.73], P=0.001). When considering the full range of possible decision thresholds between half and twice the outcome incidence, 91% of models had a risk of harm (net benefit below the default strategy) at some threshold; this risk could be reduced substantially by updating the model intercept or calibration slope, or by complete re-estimation.
Conclusions: There are significant decreases in model performance when applying cardiovascular disease CPMs to new patient populations, resulting in substantial risk of harm. Model updating can mitigate these risks. Care should be taken when using CPMs to guide clinical decision-making.
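The "risk of harm" criterion above compares a model's net benefit against the default strategies of treating everyone or no one. Net benefit at a threshold can be computed directly from predicted risks and observed outcomes; the data below are a toy illustration.

```python
import numpy as np

def net_benefit(y, p, threshold):
    """Net benefit of treating patients whose predicted risk meets the
    threshold: true-positive rate minus false-positive rate weighted
    by the odds of the threshold."""
    y = np.asarray(y, dtype=bool)
    treat = np.asarray(p, dtype=float) >= threshold
    n = len(y)
    tp = np.sum(treat & y) / n
    fp = np.sum(treat & ~y) / n
    return tp - fp * threshold / (1 - threshold)

def nb_treat_all(y, threshold):
    """Net benefit of the default 'treat everyone' strategy."""
    prev = np.mean(y)
    return prev - (1 - prev) * threshold / (1 - threshold)

# toy cohort: 3 events among 10 patients, with predicted risks
y = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0, 0])
p = np.array([0.8, 0.6, 0.2, 0.4, 0.3, 0.1, 0.1, 0.05, 0.2, 0.1])
t = 0.3
print(f"model net benefit at {t}:     {net_benefit(y, p, t):.3f}")
print(f"treat-all net benefit at {t}: {nb_treat_all(y, t):.3f}")
# a model is 'harmful' at t if its net benefit falls below
# max(0, treat-all), i.e. below the better default strategy
```

At a threshold equal to the outcome prevalence, treat-all has net benefit zero, so any model with positive net benefit there beats both defaults.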
Affiliation(s)
- Gaurav Gulati: Predictive Analytics and Comparative Effectiveness (PACE) Center, Institute for Clinical Research and Health Policy Studies (ICRHPS), Tufts Medical Center, Boston, MA (G.G., J.U., B.S.W., R.J.B., J.N., D.v.K., C.M.L., J.G.P., H.M., D.M.K.); Division of Cardiology, Tufts Medical Center, Boston, MA (G.G., J.U., B.S.W.)
- Jenica Upshaw: Predictive Analytics and Comparative Effectiveness (PACE) Center, Institute for Clinical Research and Health Policy Studies (ICRHPS), Tufts Medical Center, Boston, MA; Division of Cardiology, Tufts Medical Center, Boston, MA
- Benjamin S Wessler: Predictive Analytics and Comparative Effectiveness (PACE) Center, Institute for Clinical Research and Health Policy Studies (ICRHPS), Tufts Medical Center, Boston, MA; Division of Cardiology, Tufts Medical Center, Boston, MA
- Riley J Brazil: Predictive Analytics and Comparative Effectiveness (PACE) Center, Institute for Clinical Research and Health Policy Studies (ICRHPS), Tufts Medical Center, Boston, MA
- Jason Nelson: Predictive Analytics and Comparative Effectiveness (PACE) Center, Institute for Clinical Research and Health Policy Studies (ICRHPS), Tufts Medical Center, Boston, MA
- David van Klaveren: Predictive Analytics and Comparative Effectiveness (PACE) Center, Institute for Clinical Research and Health Policy Studies (ICRHPS), Tufts Medical Center, Boston, MA; Department of Biomedical Data Sciences, Leiden University Medical Centre, Netherlands (D.v.K., E.W.S., B.V.C.)
- Christine M Lundquist: Predictive Analytics and Comparative Effectiveness (PACE) Center, Institute for Clinical Research and Health Policy Studies (ICRHPS), Tufts Medical Center, Boston, MA
- Jinny G Park: Predictive Analytics and Comparative Effectiveness (PACE) Center, Institute for Clinical Research and Health Policy Studies (ICRHPS), Tufts Medical Center, Boston, MA
- Hannah McGinnes: Predictive Analytics and Comparative Effectiveness (PACE) Center, Institute for Clinical Research and Health Policy Studies (ICRHPS), Tufts Medical Center, Boston, MA
- Ewout W Steyerberg: Department of Biomedical Data Sciences, Leiden University Medical Centre, Netherlands
- Ben Van Calster: Department of Biomedical Data Sciences, Leiden University Medical Centre, Netherlands; KU Leuven, Department of Development and Regeneration, Belgium (B.V.C.); EPI-Center, KU Leuven, Belgium (B.V.C.)
- David M Kent: Predictive Analytics and Comparative Effectiveness (PACE) Center, Institute for Clinical Research and Health Policy Studies (ICRHPS), Tufts Medical Center, Boston, MA
16
Holl DC, Mikolic A, Blaauw J, Lodewijkx R, Foppen M, Jellema K, van der Gaag NA, den Hertog HM, Jacobs B, van der Naalt J, Verbaan D, Kho KH, Dirven CMF, Dammers R, Lingsma HF, van Klaveren D. External validation of prognostic models predicting outcome after chronic subdural hematoma. Acta Neurochir (Wien) 2022; 164:2719-2730. [PMID: 35501576] [PMCID: PMC9519711] [DOI: 10.1007/s00701-022-05216-8]
Abstract
BACKGROUND Several prognostic models for outcomes after chronic subdural hematoma (CSDH) treatment have been published in recent years. However, these models are not sufficiently validated for use in daily clinical practice. We aimed to assess the performance of existing prediction models for outcomes in patients diagnosed with CSDH. METHODS We systematically searched relevant literature databases up to February 2021 to identify prognostic models for outcome prediction in patients diagnosed with CSDH. For the external validation of prognostic models, we used a retrospective database containing data of 2384 patients from three Dutch regions. Prognostic models were included if they predicted mortality, hematoma recurrence, functional outcome, or quality of life. Models were excluded when predictors were absent in our database or available for fewer than 150 patients in our database. We assessed calibration and discrimination (quantified by the concordance index C) of the included prognostic models in our retrospective database. RESULTS We identified 1680 original publications, of which 1656 were excluded based on title or abstract, mostly because they did not concern CSDH or did not define a prognostic model. Of the 18 identified models, three could be externally validated in our retrospective database: a model for 30-day mortality in 1656 patients, and models for 2-month and 3-month hematoma recurrence, both in 1733 patients. The models overestimated the proportion of patients with these outcomes by 11% (15% predicted vs. 4% observed), 1% (10% vs. 9%), and 2% (11% vs. 9%), respectively. Their discriminative ability was poor to modest (C of 0.70 [0.63-0.77], 0.46 [0.35-0.56], and 0.59 [0.51-0.66], respectively). CONCLUSIONS None of the examined models showed good predictive performance for outcomes after CSDH treatment in our dataset. This study confirms the difficulty of predicting outcomes after CSDH and emphasizes the heterogeneity of CSDH patients. It underscores the importance of developing high-quality models with unified predictors, relevant outcome measures, and appropriate modeling strategies.
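Over-estimation of the kind reported for the validated CSDH models shows up directly in a grouped calibration table (the raw material of a calibration plot). The sketch below simulates a model that doubles every patient's odds; it is illustrative only, not one of the validated models.

```python
import numpy as np

rng = np.random.default_rng(5)

def calibration_table(y, p, groups=10):
    """Observed event proportion vs mean predicted risk by quantile
    group of predicted risk."""
    order = np.argsort(p)
    rows = []
    for g in np.array_split(order, groups):
        rows.append((np.mean(p[g]), np.mean(y[g])))
    return rows

# hypothetical cohort: the model inflates every patient's odds twofold,
# so it systematically overestimates risk
n = 3000
true_lp = rng.normal(-3.0, 1.0, n)
p_model = 1 / (1 + np.exp(-(true_lp + np.log(2))))  # inflated predictions
y = (rng.random(n) < 1 / (1 + np.exp(-true_lp))).astype(float)

for mean_pred, obs in calibration_table(y, p_model):
    print(f"predicted {mean_pred:6.1%}   observed {obs:6.1%}")
```

Every row of the table then shows predicted risk above observed risk, the same pattern as the 15% predicted vs. 4% observed mortality reported above.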
Affiliation(s)
- Dana C. Holl: Department of Neurosurgery, Erasmus Medical Centre, Erasmus MC Stroke Centre, Dr Molewaterplein 40, 3015 GD Rotterdam, The Netherlands; Department of Public Health, Erasmus Medical Centre, Rotterdam, The Netherlands; Department of Neurology, Haaglanden Medical Centre, Hague, The Netherlands
- Ana Mikolic: Department of Public Health, Erasmus Medical Centre, Rotterdam, The Netherlands
- Jurre Blaauw: Department of Neurology, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
- Roger Lodewijkx: Department of Neurosurgery, Amsterdam Medical Centre, Amsterdam, The Netherlands
- Merijn Foppen: Department of Neurosurgery, Amsterdam Medical Centre, Amsterdam, The Netherlands
- Korné Jellema: Department of Neurology, Haaglanden Medical Centre, Hague, The Netherlands
- Niels A. van der Gaag: University Neurosurgical Centre Holland (UNCH), Leiden University Medical Centre, Haaglanden Medical Centre, Haga Teaching Hospital, Leiden, The Netherlands
- Heleen M. den Hertog: Department of Neurology, Isala Hospital Zwolle, Zwolle, The Netherlands
- Bram Jacobs: Department of Neurology, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
- Joukje van der Naalt: Department of Neurology, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
- Dagmar Verbaan: Department of Neurosurgery, Amsterdam Medical Centre, Amsterdam, The Netherlands
- K. H. Kho: Department of Neurosurgery, Neurocenter, Medisch Spectrum Twente, Enschede, The Netherlands; Clinical Neurophysiology Group, University of Twente, Enschede, The Netherlands
- C. M. F. Dirven: Department of Neurosurgery, Erasmus Medical Centre, Erasmus MC Stroke Centre, Dr Molewaterplein 40, 3015 GD Rotterdam, The Netherlands
- Ruben Dammers: Department of Neurosurgery, Erasmus Medical Centre, Erasmus MC Stroke Centre, Dr Molewaterplein 40, 3015 GD Rotterdam, The Netherlands
- Hester F. Lingsma: Department of Public Health, Erasmus Medical Centre, Rotterdam, The Netherlands
- David van Klaveren: Department of Public Health, Erasmus Medical Centre, Rotterdam, The Netherlands
17
Sadatsafavi M, Saha-Chaudhuri P, Petkau J. Model-Based ROC Curve: Examining the Effect of Case Mix and Model Calibration on the ROC Plot. Med Decis Making 2021; 42:487-499. [PMID: 34657518] [PMCID: PMC9005838] [DOI: 10.1177/0272989x211050909]
Abstract
Background The performance of risk prediction models is often characterized in terms of discrimination and calibration. The receiver-operating characteristic (ROC) curve is widely used for evaluating model discrimination. However, when comparing ROC curves across different samples, the effect of case mix makes the interpretation of discrepancies difficult. Further, compared with model discrimination, evaluating model calibration has not received the same level of attention. Current methods for examining model calibration require specification of smoothing or grouping factors. Methods We introduce the “model-based” ROC curve (mROC) to assess model calibration and the effect of case mix during external validation. The mROC curve is the ROC curve that should be observed if the prediction model is calibrated in the external population. We show that calibration-in-the-large and the equivalence of mROC and ROC curves are together sufficient conditions for the model to be calibrated. Based on this, we propose a novel statistical test for calibration that, unlike current methods, does not require any subjective specification of smoothing or grouping factors. Results Through a stylized example, we demonstrate how mROC separates the effect of case mix and model miscalibration when externally validating a risk prediction model. We present the results of simulation studies that confirm the properties of the new calibration test. A case study on predicting the risk of acute exacerbations of chronic obstructive pulmonary disease puts the developments in a practical context. R code for the implementation of this method is provided. Conclusion mROC can easily be constructed and used to interpret the effect of case mix and calibration on the ROC plot. Given the popularity of ROC curves among applied investigators, this framework can further promote assessment of model calibration.
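The mROC curve can be sketched from predicted risks alone. The weighting below, where each subject counts as a case with weight p and as a control with weight 1 - p, is one plausible reading of the paper's construction and has not been checked against the authors' R code.

```python
import numpy as np

def mroc(p):
    """Model-based ROC: the ROC curve expected if each predicted risk
    p_i were the true event probability. Each subject contributes
    weight p_i as a case and 1 - p_i as a control (an assumption,
    see the lead-in)."""
    p = np.sort(np.asarray(p, dtype=float))[::-1]  # most confident first
    tpr = np.cumsum(p) / p.sum()
    fpr = np.cumsum(1 - p) / (1 - p).sum()
    return np.concatenate([[0.0], fpr]), np.concatenate([[0.0], tpr])

def auc_trapezoid(fpr, tpr):
    """Area under a piecewise-linear curve by the trapezoid rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))

p = np.array([0.9, 0.7, 0.5, 0.3, 0.2, 0.1])  # toy predicted risks
fpr, tpr = mroc(p)
print(f"area under mROC: {auc_trapezoid(fpr, tpr):.3f}")
```

At external validation, plotting the observed ROC against this mROC separates miscalibration (the curves diverge) from case-mix effects (both curves shift together).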
Affiliation(s)
- Mohsen Sadatsafavi
- Faculty of Pharmaceutical Sciences, The University of British Columbia, Vancouver, BC, Canada; Faculty of Medicine, The University of British Columbia, Vancouver, BC, Canada
- John Petkau
- Department of Statistics, The University of British Columbia, Vancouver, BC, Canada
18
van Klaveren D, Rekkas A, Alsma J, Verdonschot RJCG, Koning DTJJ, Kamps MJA, Dormans T, Stassen R, Weijer S, Arnold KS, Tomlow B, de Geus HRH, van Bruchem-Visser RL, Miedema JR, Verbon A, van Nood E, Kent DM, Schuit SCE, Lingsma H. COVID outcome prediction in the emergency department (COPE): using retrospective Dutch hospital data to develop simple and valid models for predicting mortality and need for intensive care unit admission in patients who present at the emergency department with suspected COVID-19. BMJ Open 2021; 11:e051468. [PMID: 34531219 PMCID: PMC8449847 DOI: 10.1136/bmjopen-2021-051468]
Abstract
OBJECTIVES Develop simple and valid models for predicting mortality and need for intensive care unit (ICU) admission in patients who present at the emergency department (ED) with suspected COVID-19. DESIGN Retrospective. SETTING Secondary care in four large Dutch hospitals. PARTICIPANTS Patients who presented at the ED and were admitted to hospital with suspected COVID-19. We used 5831 first-wave patients who presented between March and August 2020 for model development and 3252 second-wave patients who presented between September and December 2020 for model validation. OUTCOME MEASURES We developed separate logistic regression models for in-hospital death and for need for ICU admission, both within 28 days after hospital admission. Based on prior literature, we considered quickly and objectively obtainable patient characteristics, vital parameters and blood test values as predictors. We assessed model performance by the area under the receiver operating characteristic curve (AUC) and by calibration plots. RESULTS Of 5831 first-wave patients, 629 (10.8%) died within 28 days after admission. ICU admission was fully recorded for 2633 first-wave patients in 2 hospitals, with 214 (8.1%) ICU admissions within 28 days. A simple model, COVID outcome prediction in the emergency department (COPE), with age, respiratory rate, C reactive protein, lactate dehydrogenase, albumin and urea captured most of the ability to predict death. COPE was well calibrated and showed good discrimination for mortality in second-wave patients (AUC in four hospitals: 0.82 (95% CI 0.78 to 0.86); 0.82 (95% CI 0.74 to 0.90); 0.79 (95% CI 0.70 to 0.88); 0.83 (95% CI 0.79 to 0.86)). COPE was also able to identify patients at high risk of needing ICU admission in second-wave patients (AUC in two hospitals: 0.84 (95% CI 0.78 to 0.90); 0.81 (95% CI 0.66 to 0.95)).
CONCLUSIONS COPE is a simple tool that is well able to predict mortality and need for ICU admission in patients who present to the ED with suspected COVID-19 and may help patients and doctors in decision making.
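The development-and-temporal-validation workflow this abstract describes (fit a logistic regression on first-wave admissions, then check discrimination on second-wave admissions) can be sketched as follows. Everything here, the data, the predictor stand-ins, and the coefficients, is synthetic and invented for illustration; this is not the COPE model or its data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n = 4000
X = rng.normal(size=(n, 3))                      # stand-ins for predictors such as age, CRP, LDH
logit = -2.2 + X @ np.array([1.0, 0.6, 0.4])     # hypothetical true risk model
y = rng.random(n) < 1 / (1 + np.exp(-logit))     # ~14% event rate, like an in-hospital outcome

# "First wave" for development, "second wave" for temporal validation
X_dev, y_dev = X[:2500], y[:2500]
X_val, y_val = X[2500:], y[2500:]

model = LogisticRegression().fit(X_dev, y_dev)
# Discrimination on the held-out (later) period, the AUC the abstract reports
auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
```

Splitting by calendar period rather than at random is what makes this a temporal validation: the later cohort can differ in case mix and care, so its AUC is a fairer estimate of future performance.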
Affiliation(s)
- David van Klaveren
- Department of Public Health, Erasmus MC, Rotterdam, The Netherlands
- Predictive Analytics and Comparative Effectiveness Center, Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, Boston, Massachusetts, USA
- Alexandros Rekkas
- Department of Medical Informatics, Erasmus MC, Rotterdam, The Netherlands
- Jelmer Alsma
- Department of Internal Medicine, Erasmus MC, Rotterdam, The Netherlands
- Dick T J J Koning
- Department of Intensive Care, Catharina Hospital, Eindhoven, The Netherlands
- Marlijn J A Kamps
- Department of Intensive Care, Catharina Hospital, Eindhoven, The Netherlands
- Tom Dormans
- Department of Intensive Care, Zuyderland Medical Centre Heerlen, Heerlen, The Netherlands
- Robert Stassen
- Department of Traumatology, Maastricht University Medical Centre+, Maastricht, The Netherlands
- Sebastiaan Weijer
- Department of Internal Medicine, Antonius Hospital Sneek, Sneek, The Netherlands
- Klaas-Sierk Arnold
- Department of Intensive Care, Antonius Hospital Sneek, Sneek, The Netherlands
- Benjamin Tomlow
- Department of Pulmonary Medicine, Isala Hospitals, Zwolle, The Netherlands
- Hilde R H de Geus
- Department of Intensive Care, Erasmus MC, Rotterdam, The Netherlands
- Jelle R Miedema
- Department of Pulmonary Medicine, Erasmus MC, Rotterdam, The Netherlands
- Annelies Verbon
- Department of Medical Microbiology and Infectious Diseases, Erasmus MC, Rotterdam, The Netherlands
- Els van Nood
- Department of Internal Medicine, Department of Medical Microbiology and Infectious Diseases, Erasmus MC, Rotterdam, The Netherlands
- David M Kent
- Predictive Analytics and Comparative Effectiveness Center, Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, Boston, Massachusetts, USA
- Hester Lingsma
- Department of Public Health, Erasmus MC, Rotterdam, The Netherlands
19
Wessler BS, Nelson J, Park JG, McGinnes H, Gulati G, Brazil R, Van Calster B, van Klaveren D, Venema E, Steyerberg E, Paulus JK, Kent DM. External Validations of Cardiovascular Clinical Prediction Models: A Large-Scale Review of the Literature. Circ Cardiovasc Qual Outcomes 2021; 14:e007858. [PMID: 34340529 PMCID: PMC8366535 DOI: 10.1161/circoutcomes.121.007858]
Abstract
BACKGROUND There are many clinical prediction models (CPMs) available to inform treatment decisions for patients with cardiovascular disease. However, the extent to which they have been externally tested, and how well they generally perform, has not been broadly evaluated. METHODS A SCOPUS citation search was run on March 22, 2017 to identify external validations of cardiovascular CPMs in the Tufts Predictive Analytics and Comparative Effectiveness CPM Registry. We assessed the extent of external validation, performance heterogeneity across databases, and explored factors associated with model performance, including a global assessment of the clinical relatedness between the derivation and validation data. RESULTS We identified 2030 external validations of 1382 CPMs. Eight hundred seven (58%) of the CPMs in the Registry have never been externally validated. On average, there were 1.5 validations per CPM (range, 0-94). The median external validation area under the receiver operating characteristic curve was 0.73 (25th-75th percentile [interquartile range (IQR)], 0.66-0.79), representing a median percent change in discrimination of -11.1% (IQR, -32.4% to +2.7%) compared with performance on derivation data. 81% (n=1333) of validations reporting area under the receiver operating characteristic curve showed discrimination below that reported in the derivation dataset. 53% (n=983) of the validations report some measure of CPM calibration. For CPMs evaluated more than once, there was typically a large range of performance. Of 1702 validations classified by relatedness, the percent change in discrimination was -3.7% (IQR, -13.2% to 3.1%) for closely related validations (n=123), -9.0% (IQR, -27.6% to 3.9%) for related validations (n=862), and -17.2% (IQR, -42.3% to 0%) for distantly related validations (n=717; P<0.001).
CONCLUSIONS Many published cardiovascular CPMs have never been externally validated, and for those that have, apparent performance during development is often overly optimistic. A single external validation appears insufficient to broadly understand the performance heterogeneity across different settings.
Affiliation(s)
- Benjamin S Wessler
- Predictive Analytics and Comparative Effectiveness (PACE) (B.S.W., J.N., J.G.P., H.G., G.G., R.B., D.v.K., J.K.P., D.M.K.), Tufts Medical Center, Boston, MA; Division of Cardiology (B.S.W., G.G.), Tufts Medical Center, Boston, MA
- Jason Nelson
- Predictive Analytics and Comparative Effectiveness (PACE) (B.S.W., J.N., J.G.P., H.G., G.G., R.B., D.v.K., J.K.P., D.M.K.), Tufts Medical Center, Boston, MA
- Jinny G Park
- Predictive Analytics and Comparative Effectiveness (PACE) (B.S.W., J.N., J.G.P., H.G., G.G., R.B., D.v.K., J.K.P., D.M.K.), Tufts Medical Center, Boston, MA
- Hannah McGinnes
- Predictive Analytics and Comparative Effectiveness (PACE) (B.S.W., J.N., J.G.P., H.G., G.G., R.B., D.v.K., J.K.P., D.M.K.), Tufts Medical Center, Boston, MA
- Gaurav Gulati
- Predictive Analytics and Comparative Effectiveness (PACE) (B.S.W., J.N., J.G.P., H.G., G.G., R.B., D.v.K., J.K.P., D.M.K.), Tufts Medical Center, Boston, MA; Division of Cardiology (B.S.W., G.G.), Tufts Medical Center, Boston, MA
- Riley Brazil
- Predictive Analytics and Comparative Effectiveness (PACE) (B.S.W., J.N., J.G.P., H.G., G.G., R.B., D.v.K., J.K.P., D.M.K.), Tufts Medical Center, Boston, MA
- Ben Van Calster
- KU Leuven, Department of Development and Regeneration, Belgium (B.V.C.)
- David van Klaveren
- Predictive Analytics and Comparative Effectiveness (PACE) (B.S.W., J.N., J.G.P., H.G., G.G., R.B., D.v.K., J.K.P., D.M.K.), Tufts Medical Center, Boston, MA; Department of Biomedical Data Sciences (D.v.K.), Leiden University Medical Centre, Netherlands
- Esmee Venema
- Department of Public Health (E.V., E.S.), Erasmus MC University Medical Center, Rotterdam, the Netherlands; Department of Neurology (E.V.), Erasmus MC University Medical Center, Rotterdam, the Netherlands
- Ewout Steyerberg
- Department of Biomedical Data Sciences (E.S.), Leiden University Medical Centre, Netherlands; Department of Public Health (E.V., E.S.), Erasmus MC University Medical Center, Rotterdam, the Netherlands
- Jessica K Paulus
- Predictive Analytics and Comparative Effectiveness (PACE) (B.S.W., J.N., J.G.P., H.G., G.G., R.B., D.v.K., J.K.P., D.M.K.), Tufts Medical Center, Boston, MA
- David M Kent
- Predictive Analytics and Comparative Effectiveness (PACE) (B.S.W., J.N., J.G.P., H.G., G.G., R.B., D.v.K., J.K.P., D.M.K.), Tufts Medical Center, Boston, MA
20
Venema E, Wessler BS, Paulus JK, Salah R, Raman G, Leung LY, Koethe BC, Nelson J, Park JG, van Klaveren D, Steyerberg EW, Kent DM. Large-scale validation of the prediction model risk of bias assessment Tool (PROBAST) using a short form: high risk of bias models show poorer discrimination. J Clin Epidemiol 2021; 138:32-39. [PMID: 34175377 DOI: 10.1016/j.jclinepi.2021.06.017]
Abstract
OBJECTIVE To assess whether the Prediction model Risk Of Bias ASsessment Tool (PROBAST) and a shorter version of this tool can identify clinical prediction models (CPMs) that perform poorly at external validation. STUDY DESIGN AND SETTING We evaluated risk of bias (ROB) on 102 CPMs from the Tufts CPM Registry, comparing PROBAST to a short form consisting of six PROBAST items anticipated to best identify high ROB. We then applied the short form to all CPMs in the Registry with at least one validation (n=556) and assessed the change in discrimination (dAUC) in external validation cohorts (n=1,147). RESULTS PROBAST classified 98/102 CPMs as high ROB. The short form identified 96 of these 98 as high ROB (98% sensitivity), with perfect specificity. In the full CPM Registry, 527 of 556 CPMs (95%) were classified as high ROB, 20 (3.6%) low ROB, and 9 (1.6%) unclear ROB. Only one model with unclear ROB was reclassified to high ROB after full PROBAST assessment of all low and unclear ROB models. Median change in discrimination was significantly smaller in low ROB models (dAUC -0.9%, IQR -6.2% to 4.2%) than in high ROB models (dAUC -11.7%, IQR -33.3% to 2.6%; P<0.001). CONCLUSION High ROB is pervasive among published CPMs. It is associated with poor discriminative performance at validation, supporting the application of PROBAST or a shorter version in CPM reviews.
Affiliation(s)
- Esmee Venema
- Department of Public Health, Erasmus MC University Medical Center, Rotterdam, the Netherlands; Department of Neurology, Erasmus MC University Medical Center, Rotterdam, the Netherlands
- Benjamin S Wessler
- Predictive Analytics and Comparative Effectiveness Center, Tufts Medical Center, Boston, MA, USA; Valve Center, Division of Cardiology, Tufts Medical Center, Boston, MA, USA
- Jessica K Paulus
- Predictive Analytics and Comparative Effectiveness Center, Tufts Medical Center, Boston, MA, USA
- Rehab Salah
- Ministry of Health and Population Hospitals, Benha Faculty of Medicine, Benha, Egypt
- Gowri Raman
- Center for Clinical Evidence Synthesis, Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, Boston, MA, USA
- Lester Y Leung
- Comprehensive Stroke Center, Division of Stroke and Cerebrovascular Diseases, Department of Neurology, Tufts Medical Center, Boston, MA, USA
- Benjamin C Koethe
- Predictive Analytics and Comparative Effectiveness Center, Tufts Medical Center, Boston, MA, USA
- Jason Nelson
- Predictive Analytics and Comparative Effectiveness Center, Tufts Medical Center, Boston, MA, USA
- Jinny G Park
- Predictive Analytics and Comparative Effectiveness Center, Tufts Medical Center, Boston, MA, USA
- David van Klaveren
- Department of Public Health, Erasmus MC University Medical Center, Rotterdam, the Netherlands; Predictive Analytics and Comparative Effectiveness Center, Tufts Medical Center, Boston, MA, USA
- Ewout W Steyerberg
- Department of Public Health, Erasmus MC University Medical Center, Rotterdam, the Netherlands; Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, the Netherlands
- David M Kent
- Predictive Analytics and Comparative Effectiveness Center, Tufts Medical Center, Boston, MA, USA
21
Development and Reporting of Prediction Models: Guidance for Authors From Editors of Respiratory, Sleep, and Critical Care Journals. Crit Care Med 2021; 48:623-633. [PMID: 32141923 PMCID: PMC7161722 DOI: 10.1097/ccm.0000000000004246]
Abstract
Prediction models aim to use available data to predict a health state or outcome that has not yet been observed. Prediction is primarily relevant to clinical practice, but is also used in research and administration. While prediction modeling involves estimating the relationship between patient factors and outcomes, it is distinct from causal inference. Prediction modeling thus requires unique considerations for development, validation, and updating. This document represents an effort from editors at 31 respiratory, sleep, and critical care medicine journals to consolidate contemporary best practices and recommendations related to prediction study design, conduct, and reporting. Herein, we address issues commonly encountered in submissions to our various journals. Key topics include considerations for selecting predictor variables, operationalizing variables, dealing with missing data, the importance of appropriate validation, model performance measures and their interpretation, and good reporting practices. Supplemental discussion covers emerging topics such as model fairness, competing risks, pitfalls of “modifiable risk factors”, measurement error, and risk of bias. This guidance is not meant to be overly prescriptive; we acknowledge that every study is different, and no set of rules will fit all cases. Additional best practices can be found in the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) guidelines, to which we refer readers for further details.
22
Ramspek CL, Evans M, Wanner C, Drechsler C, Chesnaye NC, Szymczak M, Krajewska M, Torino C, Porto G, Hayward S, Caskey F, Dekker FW, Jager KJ, van Diepen M. Kidney Failure Prediction Models: A Comprehensive External Validation Study in Patients with Advanced CKD. J Am Soc Nephrol 2021; 32:1174-1186. [PMID: 33685974 PMCID: PMC8259669 DOI: 10.1681/asn.2020071077]
Abstract
BACKGROUND Various prediction models have been developed to predict the risk of kidney failure in patients with CKD. However, guideline-recommended models have yet to be compared head to head, their validation in patients with advanced CKD is lacking, and most do not account for competing risks. METHODS To externally validate 11 existing models of kidney failure, taking the competing risk of death into account, we included patients with advanced CKD from two large cohorts: the European Quality Study (EQUAL), an ongoing European prospective, multicenter cohort study of older patients with advanced CKD, and the Swedish Renal Registry (SRR), an ongoing registry of nephrology-referred patients with CKD in Sweden. The outcome of the models was kidney failure (defined as RRT-treated ESKD). We assessed model performance with discrimination and calibration. RESULTS The study included 1580 patients from EQUAL and 13,489 patients from SRR. The average c statistic over the 11 validated models was 0.74 in EQUAL and 0.80 in SRR, compared with 0.89 in previous validations. Most models with longer prediction horizons overestimated the risk of kidney failure considerably. The 5-year Kidney Failure Risk Equation (KFRE) overpredicted risk by 10%-18%. The four- and eight-variable 2-year KFRE and the 4-year Grams model showed excellent calibration and good discrimination in both cohorts. CONCLUSIONS Some existing models can accurately predict kidney failure in patients with advanced CKD. KFRE performed well for a shorter time frame (2 years), despite not accounting for competing events. Models predicting over a longer time frame (5 years) overestimated risk because of the competing risk of death. The Grams model, which accounts for the latter, is suitable for longer-term predictions (4 years).
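The over-prediction this abstract reports (e.g., the 5-year KFRE overestimating risk by 10%-18%) is conventionally summarized by the observed:expected (O/E) event ratio, one common measure of calibration-in-the-large. A minimal sketch on synthetic predictions, not the study's data or code:

```python
import numpy as np

def oe_ratio(y, p):
    """Observed:expected event ratio. Values below 1 mean the model
    overpredicts risk, the pattern this validation found for models
    with long prediction horizons."""
    return y.mean() / p.mean()

rng = np.random.default_rng(2)
p = rng.uniform(0.05, 0.60, 2000)        # hypothetical predicted risks
y = rng.random(2000) < p * 0.8           # simulate events ~20% rarer than predicted
ratio = oe_ratio(y.astype(float), p)     # should land near 0.8
```

With a competing risk such as death, the naive observed event rate is pushed down further, which is one mechanism behind the overestimation the abstract describes for 5-year predictions.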
Affiliation(s)
- Chava L. Ramspek
- Department of Clinical Epidemiology, Leiden University Medical Center, Leiden, The Netherlands
- Marie Evans
- Division of Renal Medicine, Department of Clinical Science, Intervention and Technology, Karolinska Institute and Karolinska University Hospital, Stockholm, Sweden
- Christoph Wanner
- Division of Nephrology, University Hospital of Wurzburg, Wurzburg, Germany
- Christiane Drechsler
- Division of Nephrology, Department of Internal Medicine 1, University Hospital Wurzburg, Wurzburg, Germany
- Nicholas C. Chesnaye
- Department of Medical Informatics, European Renal Association–European Dialysis and Transplant Association Registry, Amsterdam University Medical Center, University of Amsterdam, Amsterdam Public Health Institute, Amsterdam, The Netherlands
- Maciej Szymczak
- Department of Nephrology and Transplantation Medicine, Wroclaw Medical University, Wroclaw, Poland
- Magdalena Krajewska
- Department of Nephrology and Transplantation Medicine, Wroclaw Medical University, Wroclaw, Poland
- Claudia Torino
- Department of Clinical Epidemiology of Renal Diseases and Hypertension, Consiglio Nazionale delle Ricerche, Istituto di Fisiologia Clinica, Reggio Calabria, Italy
- Gaetana Porto
- Department of Clinical Epidemiology of Renal Diseases and Hypertension, Consiglio Nazionale delle Ricerche, Istituto di Fisiologia Clinica, Reggio Calabria, Italy
- Samantha Hayward
- Department of Translational Health Sciences, Bristol Medical School, University of Bristol, Bristol, United Kingdom; United Kingdom Renal Registry, Bristol, United Kingdom
- Fergus Caskey
- Department of Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, United Kingdom
- Friedo W. Dekker
- Department of Clinical Epidemiology, Leiden University Medical Center, Leiden, The Netherlands
- Kitty J. Jager
- Department of Medical Informatics, European Renal Association–European Dialysis and Transplant Association Registry, Amsterdam University Medical Center, University of Amsterdam, Amsterdam Public Health Institute, Amsterdam, The Netherlands
- Merel van Diepen
- Department of Clinical Epidemiology, Leiden University Medical Center, Leiden, The Netherlands
23
Mikolić A, Polinder S, Steyerberg EW, Retel Helmrich IRA, Giacino JT, Maas AIR, van der Naalt J, Voormolen DC, von Steinbüchel N, Wilson L, Lingsma HF, van Klaveren D. Prediction of Global Functional Outcome and Post-Concussive Symptoms after Mild Traumatic Brain Injury: External Validation of Prognostic Models in the Collaborative European NeuroTrauma Effectiveness Research in Traumatic Brain Injury (CENTER-TBI) Study. J Neurotrauma 2020; 38:196-209. [PMID: 32977737 DOI: 10.1089/neu.2020.7074]
Abstract
The majority of traumatic brain injuries (TBIs) are categorized as mild, according to a baseline Glasgow Coma Scale (GCS) score of 13-15. Prognostic models that were developed to predict functional outcome and persistent post-concussive symptoms (PPCS) after mild TBI have rarely been externally validated. We aimed to externally validate models predicting 3-12-month Glasgow Outcome Scale Extended (GOSE) or PPCS in adults with mild TBI. We analyzed data from the Collaborative European NeuroTrauma Effectiveness Research in Traumatic Brain Injury (CENTER-TBI) project, which included 2862 adults with mild TBI, with 6-month GOSE available for 2374 and Rivermead Post-Concussion Symptoms Questionnaire (RPQ) results available for 1605 participants. Model performance was evaluated based on calibration (graphically and characterized by slope and intercept) and discrimination (C-index). We validated five published models for 6-month GOSE and three for 6-month PPCS scores. The models used different cutoffs for outcome and some included symptoms measured 2 weeks post-injury. Discriminative ability varied substantially (C-index between 0.58 and 0.79). The models developed in the Corticosteroid Randomisation After Significant Head Injury (CRASH) trial for prediction of GOSE <5 discriminated best (C-index 0.78 and 0.79), but were poorly calibrated. The best performing models for PPCS included 2-week symptoms (C-index 0.75 and 0.76). In conclusion, none of the prognostic models for early prediction of GOSE and PPCS has both good calibration and discrimination in persons with mild TBI. In future studies, prognostic models should be tailored to the population with mild TBI, predicting relevant end-points based on readily available predictors.
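The calibration "slope and intercept" this abstract uses are conventionally estimated by a logistic recalibration regression of the observed outcome on the model's linear predictor. The sketch below uses synthetic data with an invented true slope of 0.5, mimicking the too-extreme predictions typical of an overfitted model at external validation; it is not the CENTER-TBI analysis code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n = 5000
lp = rng.normal(0.0, 1.5, n)                         # model's linear predictor (log-odds scale)
y = rng.random(n) < 1 / (1 + np.exp(-0.5 * lp))      # outcomes follow a shrunken (slope 0.5) model

# Logistic recalibration: regress outcome on the linear predictor.
# C=1e9 makes the default L2 penalty negligible (effectively unpenalized).
recal = LogisticRegression(C=1e9).fit(lp.reshape(-1, 1), y)
slope = float(recal.coef_[0, 0])        # well below 1: predictions too extreme
intercept = float(recal.intercept_[0])  # near 0: no systematic over/underprediction
```

A slope near 1 and intercept near 0 together indicate good calibration; models like CRASH in this validation can discriminate well (high C-index) while still failing this check.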
Affiliation(s)
- Ana Mikolić
- Department of Public Health, Center for Medical Decision Making, Erasmus MC-University Medical Center Rotterdam, Rotterdam, The Netherlands
- Suzanne Polinder
- Department of Public Health, Center for Medical Decision Making, Erasmus MC-University Medical Center Rotterdam, Rotterdam, The Netherlands
- Ewout W Steyerberg
- Department of Public Health, Center for Medical Decision Making, Erasmus MC-University Medical Center Rotterdam, Rotterdam, The Netherlands; Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, The Netherlands
- Isabel R A Retel Helmrich
- Department of Public Health, Center for Medical Decision Making, Erasmus MC-University Medical Center Rotterdam, Rotterdam, The Netherlands
- Joseph T Giacino
- Department of Physical Medicine and Rehabilitation, Spaulding Rehabilitation Hospital, Charlestown, Massachusetts, USA; Department of Physical Medicine and Rehabilitation, Harvard Medical School, Cambridge, Massachusetts, USA
- Andrew I R Maas
- Department of Neurosurgery, Antwerp University Hospital and University of Antwerp, Antwerp, Belgium
- Joukje van der Naalt
- Department of Neurology, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
- Daphne C Voormolen
- Department of Public Health, Center for Medical Decision Making, Erasmus MC-University Medical Center Rotterdam, Rotterdam, The Netherlands
- Nicole von Steinbüchel
- Institute of Medical Psychology and Medical Sociology, Georg-August-University, Göttingen, Germany
- Lindsay Wilson
- Division of Psychology, University of Stirling, Stirling, United Kingdom
- Hester F Lingsma
- Department of Public Health, Center for Medical Decision Making, Erasmus MC-University Medical Center Rotterdam, Rotterdam, The Netherlands
- David van Klaveren
- Department of Public Health, Center for Medical Decision Making, Erasmus MC-University Medical Center Rotterdam, Rotterdam, The Netherlands; Predictive Analytics and Comparative Effectiveness Center, Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, Boston, Massachusetts, USA
24
Myers PD, Ng K, Severson K, Kartoun U, Dai W, Huang W, Anderson FA, Stultz CM. Identifying unreliable predictions in clinical risk models. NPJ Digit Med 2020; 3:8. [PMID: 31993506 PMCID: PMC6978376 DOI: 10.1038/s41746-019-0209-7]
Abstract
The ability to identify patients who are likely to have an adverse outcome is an essential component of good clinical care. Therefore, predictive risk stratification models play an important role in clinical decision making. Determining whether a given predictive model is suitable for clinical use usually involves evaluating the model's performance on large patient datasets using standard statistical measures of success (e.g., accuracy, discriminatory ability). However, as these metrics correspond to averages over patients who have a range of different characteristics, it is difficult to discern whether an individual prediction on a given patient should be trusted using these measures alone. In this paper, we introduce a new method for identifying patient subgroups where a predictive model is expected to be poor, thereby highlighting when a given prediction is misleading and should not be trusted. The resulting "unreliability score" can be computed for any clinical risk model and is suitable in the setting of large class imbalance, a situation often encountered in healthcare settings. Using data from more than 40,000 patients in the Global Registry of Acute Coronary Events (GRACE), we demonstrate that patients with high unreliability scores form a subgroup in which the predictive model has both decreased accuracy and decreased discriminatory ability.
Affiliation(s)
- Paul D. Myers
- Department of Electrical Engineering and Computer Science and Research Laboratory for Electronics, Massachusetts Institute of Technology, Cambridge, MA USA
- Kenney Ng
- Center for Computational Health, IBM Research, Cambridge, MA USA
- Kristen Severson
- Center for Computational Health, IBM Research, Cambridge, MA USA
- Uri Kartoun
- Center for Computational Health, IBM Research, Cambridge, MA USA
- Wangzhi Dai
- Department of Electrical Engineering and Computer Science and Research Laboratory for Electronics, Massachusetts Institute of Technology, Cambridge, MA USA
- Wei Huang
- Center for Outcomes Research, University of Massachusetts Medical School, Worcester, MA USA
- Frederick A. Anderson
- Center for Outcomes Research, University of Massachusetts Medical School, Worcester, MA USA
- Collin M. Stultz
- Department of Electrical Engineering and Computer Science and Research Laboratory for Electronics, Massachusetts Institute of Technology, Cambridge, MA USA
- Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA USA
- Division of Cardiology, Massachusetts General Hospital, Boston, MA USA
25
Kent DM, van Klaveren D, Paulus JK, D'Agostino R, Goodman S, Hayward R, Ioannidis JPA, Patrick-Lake B, Morton S, Pencina M, Raman G, Ross JS, Selker HP, Varadhan R, Vickers A, Wong JB, Steyerberg EW. The Predictive Approaches to Treatment effect Heterogeneity (PATH) Statement: Explanation and Elaboration. Ann Intern Med 2020; 172:W1-W25. [PMID: 31711094 PMCID: PMC7750907 DOI: 10.7326/m18-3668]
Abstract
The PATH (Predictive Approaches to Treatment effect Heterogeneity) Statement was developed to promote the conduct of, and provide guidance for, predictive analyses of heterogeneity of treatment effects (HTE) in clinical trials. The goal of predictive HTE analysis is to provide patient-centered estimates of outcome risk with versus without the intervention, taking into account all relevant patient attributes simultaneously, to support more personalized clinical decision making than can be made on the basis of only an overall average treatment effect. The authors distinguished 2 categories of predictive HTE approaches (a "risk-modeling" and an "effect-modeling" approach) and developed 4 sets of guidance statements: criteria to determine when risk-modeling approaches are likely to identify clinically meaningful HTE, methodological aspects of risk-modeling methods, considerations for translation to clinical practice, and considerations and caveats in the use of effect-modeling approaches. They discuss limitations of these methods and enumerate research priorities for advancing methods designed to generate more personalized evidence. This explanation and elaboration document describes the intent and rationale of each recommendation and discusses related analytic considerations, caveats, and reservations.
26
Pellegrini F, Copetti M, Sormani MP, Bovis F, de Moor C, Debray TPA, Kieseier BC. Predicting disability progression in multiple sclerosis: Insights from advanced statistical modeling. Mult Scler 2019; 26:1828-1836. [DOI: 10.1177/1352458519887343]
Abstract
Background: There is an unmet need for precise methods estimating disease prognosis in multiple sclerosis (MS). Objective: Using advanced statistical modeling, we assessed the prognostic value of various clinical measures for disability progression. Methods: Advanced models to assess baseline prognostic factors for disability progression over 2 years were applied to a pooled sample of patients from placebo arms in four different phase III clinical trials. Least absolute shrinkage and selection operator (LASSO) and ridge regression, elastic nets, support vector machines, and unconditional and conditional random forests were applied to model time to clinical disability progression confirmed at 24 weeks. Sensitivity analyses for different definitions of a combined endpoint were carried out, and bootstrap was used to assess prediction model performance. Results: A total of 1582 patients were included, of which 434 (27.4%) had disability progression in a combined endpoint over 2 years. Overall model discrimination performance was relatively poor (all C-indices ⩽ 0.65) across all models and across different definitions of progression. Conclusion: Inconsistency of prognostic factor importance ranking confirmed the relatively poor prediction ability of baseline factors in modeling disease progression in MS. Our findings underline the importance to explore alternative predictors as well as alternative definitions of commonly used endpoints.
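The bootstrap assessment of prediction performance mentioned in the Methods can be sketched as resampling the validation sample and recomputing the concordance statistic each time. The sketch below uses a binary-outcome C-index (equivalent to the AUC) on an invented prognostic score; the paper's own analysis modeled time-to-event data, so treat this only as an illustration of the resampling idea.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(4)
n = 800
score = rng.normal(size=n)                           # hypothetical prognostic score
y = rng.random(n) < 1 / (1 + np.exp(-0.5 * score))   # weak signal: C-index around 0.6

boot = []
for _ in range(500):
    idx = rng.integers(0, n, n)                      # resample patients with replacement
    if y[idx].min() != y[idx].max():                 # AUC needs both classes present
        boot.append(roc_auc_score(y[idx], score[idx]))
ci = np.percentile(boot, [2.5, 97.5])                # bootstrap 95% CI for the C-index
```

A point estimate near 0.6 with a bootstrap interval entirely below, say, 0.7 is exactly the kind of "relatively poor discrimination across all models" result the abstract reports.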
Affiliation(s)
- Massimiliano Copetti
- Unit of Biostatistics, IRCCS Casa Sollievo della Sofferenza Hospital, San Giovanni Rotondo, Italy
- Maria Pia Sormani
- Department of Health Sciences (DISSAL), University of Genova, Genova, Italy/IRCCS Ospedale Policlinico San Martino, Genova, Italy
- Francesca Bovis
- Department of Health Sciences (DISSAL), University of Genova, Genova, Italy
- Thomas PA Debray
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Bernd C Kieseier
- Biogen, Cambridge, MA, USA/Department of Neurology, Medical Faculty, Heinrich-Heine University, Düsseldorf, Germany
|
27
|
Debray TPA, Damen JAAG, Riley RD, Snell K, Reitsma JB, Hooft L, Collins GS, Moons KGM. A framework for meta-analysis of prediction model studies with binary and time-to-event outcomes. Stat Methods Med Res 2019; 28:2768-2786. [PMID: 30032705 PMCID: PMC6728752 DOI: 10.1177/0962280218785504] [Citation(s) in RCA: 117] [Impact Index Per Article: 23.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/17/2022]
Abstract
It is widely recommended that any developed prediction model, whether diagnostic or prognostic, is externally validated in terms of its predictive performance measured by calibration and discrimination. When multiple validations have been performed, a systematic review followed by a formal meta-analysis helps to summarize overall performance across multiple settings, and reveals under which circumstances the model performs suboptimally and may need adjustment. We discuss how to undertake meta-analysis of the performance of prediction models with either a binary or a time-to-event outcome. We address how to deal with incomplete availability of study-specific results (performance estimates and their precision), and how to produce summary estimates of the c-statistic, the observed:expected ratio, and the calibration slope. Furthermore, we discuss the implementation of frequentist and Bayesian meta-analysis methods, and propose novel empirically based prior distributions to improve estimation of between-study heterogeneity in small samples. Finally, we illustrate all methods using two examples: meta-analysis of the predictive performance of EuroSCORE II and of the Framingham Risk Score. All examples and meta-analysis models have been implemented in our newly developed R package "metamisc".
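The study-level inputs to such a meta-analysis, the c-statistic and the observed:expected ratio, can be computed per validation study roughly as follows (a naive sketch on hypothetical data; the metamisc package referenced above implements the full framework, including handling of incomplete reporting):

```python
def validation_summary(y, p):
    """Naive external-validation summary for a binary-outcome model:
    the c-statistic (probability that a randomly chosen case receives
    a higher prediction than a randomly chosen non-case, ties counted
    as half) and the observed:expected ratio (calibration-in-the-large
    on the event-count scale)."""
    cases = [pi for yi, pi in zip(y, p) if yi == 1]
    ctrls = [pi for yi, pi in zip(y, p) if yi == 0]
    conc = sum((a > b) + 0.5 * (a == b) for a in cases for b in ctrls)
    c = conc / (len(cases) * len(ctrls))
    oe = sum(y) / sum(p)           # observed events / expected events
    return c, oe
```

Each validation study contributes one (c, O:E) pair, and the meta-analysis then pools these estimates (on suitable transformed scales) across studies.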
Affiliation(s)
- Thomas PA Debray
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, The Netherlands
- Cochrane Netherlands, University Medical Center Utrecht, Utrecht, The Netherlands
- Johanna AAG Damen
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, The Netherlands
- Cochrane Netherlands, University Medical Center Utrecht, Utrecht, The Netherlands
- Richard D Riley
- Research Institute for Primary Care and Health Sciences, Keele University, Staffordshire, UK
- Kym Snell
- Research Institute for Primary Care and Health Sciences, Keele University, Staffordshire, UK
- Johannes B Reitsma
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, The Netherlands
- Cochrane Netherlands, University Medical Center Utrecht, Utrecht, The Netherlands
- Lotty Hooft
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, The Netherlands
- Cochrane Netherlands, University Medical Center Utrecht, Utrecht, The Netherlands
- Gary S Collins
- Centre for Statistics in Medicine, University of Oxford, Oxford, UK
- Karel GM Moons
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, The Netherlands
- Cochrane Netherlands, University Medical Center Utrecht, Utrecht, The Netherlands
|
28
|
Cowley LE, Farewell DM, Maguire S, Kemp AM. Methodological standards for the development and evaluation of clinical prediction rules: a review of the literature. Diagn Progn Res 2019; 3:16. [PMID: 31463368 PMCID: PMC6704664 DOI: 10.1186/s41512-019-0060-y] [Citation(s) in RCA: 125] [Impact Index Per Article: 25.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 08/13/2018] [Accepted: 05/12/2019] [Indexed: 12/20/2022] Open
Abstract
Clinical prediction rules (CPRs) that predict the absolute risk of a clinical condition or future outcome for individual patients are abundant in the medical literature; however, systematic reviews have demonstrated shortcomings in the methodological quality and reporting of prediction studies. To maximise the potential and clinical usefulness of CPRs, they must be rigorously developed and validated, and their impact on clinical practice and patient outcomes must be evaluated. This review aims to present a comprehensive overview of the stages involved in the development, validation and evaluation of CPRs, and to describe in detail the methodological standards required at each stage, illustrated with examples where appropriate. Important features of the study design, statistical analysis, modelling strategy, data collection, performance assessment, CPR presentation and reporting are discussed, in addition to other, often overlooked aspects such as the acceptability, cost-effectiveness and longer-term implementation of CPRs, and their comparison with clinical judgement. Although the development and evaluation of a robust, clinically useful CPR is anything but straightforward, adherence to the plethora of methodological standards, recommendations and frameworks at each stage will assist in the development of a rigorous CPR that has the potential to contribute usefully to clinical practice and decision-making and have a positive impact on patient care.
Affiliation(s)
- Laura E. Cowley
- Division of Population Medicine, School of Medicine, Neuadd Meirionnydd, Heath Park, Cardiff University, Wales, CF14 4YS, UK
- Daniel M. Farewell
- Division of Population Medicine, School of Medicine, Neuadd Meirionnydd, Heath Park, Cardiff University, Wales, CF14 4YS, UK
- Sabine Maguire
- Division of Population Medicine, School of Medicine, Neuadd Meirionnydd, Heath Park, Cardiff University, Wales, CF14 4YS, UK
- Alison M. Kemp
- Division of Population Medicine, School of Medicine, Neuadd Meirionnydd, Heath Park, Cardiff University, Wales, CF14 4YS, UK
|
29
|
Wynants L, Kent DM, Timmerman D, Lundquist CM, Van Calster B. Untapped potential of multicenter studies: a review of cardiovascular risk prediction models revealed inappropriate analyses and wide variation in reporting. Diagn Progn Res 2019; 3:6. [PMID: 31093576 PMCID: PMC6460661 DOI: 10.1186/s41512-019-0046-9] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 09/13/2018] [Accepted: 01/03/2019] [Indexed: 12/17/2022] Open
Abstract
BACKGROUND Clinical prediction models are often constructed using multicenter databases. Such a data structure poses additional challenges for statistical analysis (clustered data) but offers opportunities for model generalizability to a broad range of centers. The purpose of this study was to describe properties, analysis, and reporting of multicenter studies in the Tufts PACE Clinical Prediction Model Registry and to illustrate consequences of common design and analyses choices. METHODS Fifty randomly selected studies that are included in the Tufts registry as multicenter and published after 2000 underwent full-text screening. Simulated examples illustrate some key concepts relevant to multicenter prediction research. RESULTS Multicenter studies differed widely in the number of participating centers (range 2 to 5473). Thirty-nine of 50 studies ignored the multicenter nature of data in the statistical analysis. In the others, clustering was resolved by developing the model on only one center, using mixed effects or stratified regression, or by using center-level characteristics as predictors. Twenty-three of 50 studies did not describe the clinical settings or type of centers from which data were obtained. Four of 50 studies discussed neither generalizability nor external validity of the developed model. CONCLUSIONS Regression methods and validation strategies tailored to multicenter studies are underutilized. Reporting on generalizability and potential external validity of the model lacks transparency. Hence, multicenter prediction research has untapped potential. REGISTRATION This review was not registered.
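The consequence of ignoring clustering that the review describes can be illustrated with a toy intercept-only example (hypothetical center prevalences; the stratified intercepts here stand in for the mixed-effects or stratified regression the review recommends):

```python
import math

def logit(p):
    return math.log(p / (1 - p))

# two hypothetical centers with identical predictor effects but
# different baseline risk (outcome prevalence)
center_prev = {"A": 0.10, "B": 0.40}

# stratified-by-center modelling: a center-specific intercept
# (for an intercept-only model this is simply the logit of prevalence)
alpha = {c: logit(p) for c, p in center_prev.items()}

# ignoring the multicenter structure pools everything into a single
# intercept, which is miscalibrated for both centers
# (equal center sizes assumed)
alpha_pooled = logit(sum(center_prev.values()) / len(center_prev))
```

The pooled intercept overestimates risk in center A and underestimates it in center B, which is why center-tailored analyses matter for calibration.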
Affiliation(s)
- L. Wynants
- Department of Development and Regeneration, KU Leuven, Herestraat 49, box 7003, 3000 Leuven, Belgium
- Department of Epidemiology, CAPHRI Care and Public Health Research Institute, Maastricht University, PO Box 9600, 6200 MD Maastricht, The Netherlands
- D. M. Kent
- Predictive Analytics and Comparative Effectiveness (PACE) Center, Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, 800 Washington St, Box 63, Boston, MA 02111, USA
- D. Timmerman
- Department of Development and Regeneration, KU Leuven, Herestraat 49, box 7003, 3000 Leuven, Belgium
- Department of Obstetrics and Gynecology, University Hospitals Leuven, Herestraat 49, 3000 Leuven, Belgium
- C. M. Lundquist
- Predictive Analytics and Comparative Effectiveness (PACE) Center, Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, 800 Washington St, Box 63, Boston, MA 02111, USA
- B. Van Calster
- Department of Development and Regeneration, KU Leuven, Herestraat 49, box 7003, 3000 Leuven, Belgium
- Department of Biomedical Data Sciences, Leiden University Medical Center, PO Box 9600, 2300 RC Leiden, The Netherlands
|
30
|
van Klaveren D, Steyerberg EW, Gönen M, Vergouwe Y. The calibrated model-based concordance improved assessment of discriminative ability in patient clusters of limited sample size. Diagn Progn Res 2019; 3:11. [PMID: 31183411 PMCID: PMC6551913 DOI: 10.1186/s41512-019-0055-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 10/17/2018] [Accepted: 03/28/2019] [Indexed: 12/27/2022] Open
Abstract
BACKGROUND Discriminative ability is an important aspect of prediction model performance, but challenging to assess in clustered (e.g., multicenter) data. Concordance (c)-indexes may be too extreme within small clusters. We aimed to define a new approach for the assessment of discriminative ability in clustered data. METHODS We assessed discriminative ability of a prediction model for the binary outcome mortality after traumatic brain injury within centers of the CRASH trial. With multilevel logistic regression analysis, we estimated cluster-specific calibration slopes which we used to obtain the recently proposed calibrated model-based concordance (c-mbc) within each cluster. We compared the c-mbc with the naïve c-index in centers of the CRASH trial and in simulations of clusters with varying calibration slopes. RESULTS The c-mbc was less extreme in distribution than the c-index in 19 European centers (internal validation; n = 1716) and 36 non-European centers (external validation; n = 3135) of the CRASH trial. In simulations, the c-mbc was biased but less variable than the naïve c-index, resulting in lower root mean squared errors. CONCLUSIONS The c-mbc, based on multilevel regression analysis of the calibration slope, is an attractive alternative to the c-index as a measure of discriminative ability in multicenter studies with patient clusters of limited sample size.
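The c-mbc idea above can be sketched in pure Python. This simplified version applies only a calibration slope to the linear predictor and omits the calibration intercept and the multilevel estimation of cluster-specific slopes described in the paper, so it illustrates the mechanics rather than reproducing the authors' method:

```python
import math

def model_based_c(p):
    """Model-based concordance: the c-index expected if outcomes truly
    follow the predicted risks p (O(n^2) loop over pairs)."""
    num = den = 0.0
    for i in range(len(p)):
        for j in range(i + 1, len(p)):
            hi, lo = max(p[i], p[j]), min(p[i], p[j])
            num += hi * (1.0 - lo)                    # pair ordered as predicted
            den += hi * (1.0 - lo) + lo * (1.0 - hi)  # all discordant-outcome pairs
    return num / den

def calibrated_mbc(lp, slope):
    """Sketch of the c-mbc: shrink the linear predictor lp by a
    cluster-specific calibration slope, then compute the model-based
    concordance on the recalibrated risks."""
    p = [1.0 / (1.0 + math.exp(-slope * x)) for x in lp]
    return model_based_c(p)
```

Because the c-mbc is driven by the predicted risk distribution and an estimated slope rather than by the few observed outcomes in a small cluster, it avoids the extreme values a naive c-index can take there.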
Affiliation(s)
- David van Klaveren
- Department of Public Health, Erasmus University Medical Center, Rotterdam, The Netherlands
- Predictive Analytics and Comparative Effectiveness Center, Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, Boston, USA
- Ewout W. Steyerberg
- Department of Public Health, Erasmus University Medical Center, Rotterdam, The Netherlands
- Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, The Netherlands
- Mithat Gönen
- Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, USA
- Yvonne Vergouwe
- Department of Public Health, Erasmus University Medical Center, Rotterdam, The Netherlands
|
31
|
Snell KIE, Ensor J, Debray TPA, Moons KGM, Riley RD. Meta-analysis of prediction model performance across multiple studies: Which scale helps ensure between-study normality for the C-statistic and calibration measures? Stat Methods Med Res 2018; 27:3505-3522. [PMID: 28480827 PMCID: PMC6193210 DOI: 10.1177/0962280217705678] [Citation(s) in RCA: 64] [Impact Index Per Article: 10.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/22/2023]
Abstract
If individual participant data are available from multiple studies or clusters, then a prediction model can be externally validated multiple times. This allows the model's discrimination and calibration performance to be examined across different settings. Random-effects meta-analysis can then be used to quantify overall (average) performance and heterogeneity in performance. This typically assumes a normal distribution of 'true' performance across studies. We conducted a simulation study to examine this normality assumption for various performance measures relating to a logistic regression prediction model. We simulated data across multiple studies with varying degrees of variability in baseline risk or predictor effects and then evaluated the shape of the between-study distribution in the C-statistic, calibration slope, calibration-in-the-large, and E/O statistic, and possible transformations thereof. We found that a normal between-study distribution was usually reasonable for the calibration slope and calibration-in-the-large; however, the distributions of the C-statistic and E/O were often skewed across studies, particularly in settings with large variability in the predictor effects. Normality was vastly improved when using the logit transformation for the C-statistic and the log transformation for E/O, and therefore we recommend these scales to be used for meta-analysis. An illustrated example is given using a random-effects meta-analysis of the performance of QRISK2 across 25 general practices.
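The recommended workflow, transform each study's c-statistic to the logit scale, pool with a random-effects model, and back-transform, can be sketched as follows (a minimal DerSimonian-Laird implementation with hypothetical inputs; dedicated packages handle confidence and prediction intervals as well):

```python
import math

def pool_logit_c(c_stats, se_logit):
    """Random-effects (DerSimonian-Laird) meta-analysis of c-statistics
    on the logit scale, back-transformed to the c scale. se_logit are
    the standard errors of the logit-transformed c-statistics."""
    theta = [math.log(c / (1 - c)) for c in c_stats]     # logit transform
    w = [1.0 / s**2 for s in se_logit]                   # fixed-effect weights
    fixed = sum(wi * t for wi, t in zip(w, theta)) / sum(w)
    q = sum(wi * (t - fixed) ** 2 for wi, t in zip(w, theta))
    cc = sum(w) - sum(wi * wi for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(theta) - 1)) / cc)         # between-study variance
    ws = [1.0 / (s**2 + tau2) for s in se_logit]         # random-effects weights
    pooled = sum(wi * t for wi, t in zip(ws, theta)) / sum(ws)
    return 1.0 / (1.0 + math.exp(-pooled))               # back to the c scale
```

Pooling on the logit scale keeps the summary inside (0, 1) and, per the simulation results above, makes the between-study normality assumption far more plausible than pooling raw c-statistics.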
Affiliation(s)
- Kym IE Snell
- Research Institute for Primary Care and Health Sciences, Keele University, Staffordshire, UK
- Joie Ensor
- Research Institute for Primary Care and Health Sciences, Keele University, Staffordshire, UK
- Thomas PA Debray
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, The Netherlands
- Cochrane Netherlands, University Medical Center Utrecht, Utrecht, The Netherlands
- Karel GM Moons
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, The Netherlands
- Cochrane Netherlands, University Medical Center Utrecht, Utrecht, The Netherlands
- Richard D Riley
- Research Institute for Primary Care and Health Sciences, Keele University, Staffordshire, UK
|
32
|
Powers S, McGuire V, Bernstein L, Canchola AJ, Whittemore AS. Evaluating disease prediction models using a cohort whose covariate distribution differs from that of the target population. Stat Methods Med Res 2017; 28:309-320. [PMID: 28812439 DOI: 10.1177/0962280217723945] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
Abstract
Personal predictive models for disease development play important roles in chronic disease prevention. The performance of these models is evaluated by applying them to the baseline covariates of participants in external cohort studies, with model predictions compared to subjects' subsequent disease incidence. However, the covariate distribution among participants in a validation cohort may differ from that of the population for which the model will be used. Since estimates of predictive model performance depend on the distribution of covariates among the subjects to which it is applied, such differences can cause misleading estimates of model performance in the target population. We propose a method for addressing this problem by weighting the cohort subjects to make their covariate distribution better match that of the target population. Simulations show that the method provides accurate estimates of model performance in the target population, while unweighted estimates may not. We illustrate the method by applying it to evaluate an ovarian cancer prediction model targeted to US women, using cohort data from participants in the California Teachers Study. The methods can be implemented using open-source code for public use as the R package RMAP (Risk Model Assessment Package), available at http://stanford.edu/~ggong/rmap/.
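The weighting idea can be illustrated on a single covariate: weight each cohort subject by the ratio of the target-population density to the cohort density of their covariate value, so that weighted summaries reflect the target population. This hypothetical example uses normal age distributions with made-up parameters (the RMAP package implements the authors' full method for performance measures, not just means):

```python
import numpy as np

rng = np.random.default_rng(1)

# validation cohort over-represents older subjects relative to the target
age = rng.normal(62.0, 8.0, size=20000)      # cohort ages
mu_t, sd_t = 55.0, 8.0                       # target population (assumed known)
mu_c, sd_c = 62.0, 8.0                       # cohort distribution

def normal_pdf(x, mu, sd):
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

# importance weights: target density over cohort density
w = normal_pdf(age, mu_t, sd_t) / normal_pdf(age, mu_c, sd_c)

# the weighted summary shifts from the cohort toward the target population
mean_unweighted = age.mean()
mean_weighted = np.average(age, weights=w)
```

The same weights would be applied when computing discrimination and calibration measures, so that the estimated performance pertains to the target population rather than to the (non-representative) validation cohort.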
Affiliation(s)
- Scott Powers
- Department of Statistics, Stanford University, Stanford, CA, USA
- Valerie McGuire
- Department of Health Research and Policy, Stanford University School of Medicine, Stanford, CA, USA
- Leslie Bernstein
- Department of Population Sciences, Beckman Research Institute of City of Hope, Duarte, CA, USA
- Alice S Whittemore
- Department of Health Research and Policy, Stanford University School of Medicine, Stanford, CA, USA
|