76
Collins GS, Dhiman P, Andaur Navarro CL, Ma J, Hooft L, Reitsma JB, Logullo P, Beam AL, Peng L, Van Calster B, van Smeden M, Riley RD, Moons KG. Protocol for development of a reporting guideline (TRIPOD-AI) and risk of bias tool (PROBAST-AI) for diagnostic and prognostic prediction model studies based on artificial intelligence. BMJ Open 2021; 11:e048008. [PMID: 34244270 PMCID: PMC8273461 DOI: 10.1136/bmjopen-2020-048008] [Received: 12/15/2020] [Accepted: 06/23/2021]
Abstract
INTRODUCTION The Transparent Reporting of a multivariable prediction model of Individual Prognosis Or Diagnosis (TRIPOD) statement and the Prediction model Risk Of Bias ASsessment Tool (PROBAST) were both published to improve the reporting and critical appraisal of prediction model studies for diagnosis and prognosis. This paper describes the processes and methods that will be used to develop an extension to the TRIPOD statement (TRIPOD-artificial intelligence, AI) and the PROBAST (PROBAST-AI) tool for prediction model studies that applied machine learning techniques. METHODS AND ANALYSIS TRIPOD-AI and PROBAST-AI will be developed following published guidance from the EQUATOR Network, and will comprise five stages. Stage 1 will comprise two systematic reviews (across all medical fields and specifically in oncology) to examine the quality of reporting in published machine-learning-based prediction model studies. In stage 2, we will consult a diverse group of key stakeholders using a Delphi process to identify items to be considered for inclusion in TRIPOD-AI and PROBAST-AI. Stage 3 will be virtual consensus meetings to consolidate and prioritise key items to be included in TRIPOD-AI and PROBAST-AI. Stage 4 will involve developing the TRIPOD-AI checklist and the PROBAST-AI tool, and writing the accompanying explanation and elaboration papers. In the final stage, stage 5, we will disseminate TRIPOD-AI and PROBAST-AI via journals, conferences, blogs, websites (including TRIPOD, PROBAST and EQUATOR Network) and social media. TRIPOD-AI will provide researchers working on prediction model studies based on machine learning with a reporting guideline that can help them report key details that readers need to evaluate the study quality and interpret its findings, potentially reducing research waste. 
We anticipate PROBAST-AI will help researchers, clinicians, systematic reviewers and policymakers critically appraise the design, conduct and analysis of machine-learning-based prediction model studies, with a robust standardised tool for bias evaluation. ETHICS AND DISSEMINATION Ethical approval has been granted by the Central University Research Ethics Committee, University of Oxford on 10-December-2020 (R73034/RE001). Findings from this study will be disseminated through peer-reviewed publications. PROSPERO REGISTRATION NUMBER CRD42019140361 and CRD42019161764.
77
Moriarty AS, Paton LW, Snell KIE, Riley RD, Buckman JEJ, Gilbody S, Chew-Graham CA, Ali S, Pilling S, Meader N, Phillips B, Coventry PA, Delgadillo J, Richards DA, Salisbury C, McMillan D. The development and validation of a prognostic model to PREDICT Relapse of depression in adult patients in primary care: protocol for the PREDICTR study. Diagn Progn Res 2021; 5:12. [PMID: 34215317 PMCID: PMC8254312 DOI: 10.1186/s41512-021-00101-x] [Received: 01/14/2021] [Accepted: 05/19/2021]
Abstract
BACKGROUND Most patients who present with depression are treated in primary care by general practitioners (GPs). Relapse of depression is common (at least 50% of patients treated for depression will relapse after a single episode) and leads to considerable morbidity and decreased quality of life for patients. The majority of patients will relapse within 6 months, and those with a history of relapse are more likely to relapse in the future than those with no such history. GPs see a largely undifferentiated case-mix of patients, and once patients with depression reach remission, there is limited guidance to help GPs stratify patients according to risk of relapse. We aim to develop a prognostic model to predict an individual's risk of relapse within 6-8 months of entering remission. The long-term objective is to inform the clinical management of depression after the acute phase. METHODS We will develop a prognostic model using secondary analysis of individual participant data drawn from seven RCTs and one longitudinal cohort study in primary or community care settings. We will use logistic regression to predict the outcome of relapse of depression within 6-8 months. We plan to include the following established relapse predictors in the model: residual depressive symptoms, number of previous depressive episodes, co-morbid anxiety and severity of index episode. We will use a "full model" development approach, including all available predictors. Performance statistics (optimism-adjusted C-statistic, calibration-in-the-large, calibration slope) and calibration plots (with smoothed calibration curves) will be calculated. Generalisability of predictive performance will be assessed through internal-external cross-validation. Clinical utility will be explored through net benefit analysis. DISCUSSION We will derive a statistical model to predict relapse of depression in remitted depressed patients in primary care. 
Assuming the model has sufficient predictive performance, we outline the next steps including independent external validation and further assessment of clinical utility and impact. STUDY REGISTRATION ClinicalTrials.gov ID: NCT04666662.
78
Stock SJ, Horne M, Bruijn M, White H, Boyd KA, Heggie R, Wotherspoon L, Aucott L, Morris RK, Dorling J, Jackson L, Chandiramani M, David AL, Khalil A, Shennan A, van Baaren GJ, Hodgetts-Morton V, Lavender T, Schuit E, Harper-Clarke S, Mol BW, Riley RD, Norman JE, Norrie J. Development and validation of a risk prediction model of preterm birth for women with preterm labour symptoms (the QUIDS study): A prospective cohort study and individual participant data meta-analysis. PLoS Med 2021; 18:e1003686. [PMID: 34228732 PMCID: PMC8259998 DOI: 10.1371/journal.pmed.1003686] [Received: 03/08/2021] [Accepted: 06/07/2021]
Abstract
BACKGROUND Timely interventions in women presenting with preterm labour can substantially improve health outcomes for preterm babies. However, establishing such a diagnosis is very challenging, as signs and symptoms of preterm labour are common and can be nonspecific. We aimed to develop and externally validate a risk prediction model using concentration of vaginal fluid fetal fibronectin (quantitative fFN), in combination with clinical risk factors, for the prediction of spontaneous preterm birth, and to assess its cost-effectiveness. METHODS AND FINDINGS Pregnant women included in the analyses were 22+0 to 34+6 weeks gestation with signs and symptoms of preterm labour. The primary outcome was spontaneous preterm birth within 7 days of quantitative fFN test. The risk prediction model was developed and internally validated in an individual participant data (IPD) meta-analysis of 5 European prospective cohort studies (2009 to 2016; 1,783 women; mean age 29.7 years; median BMI 24.8 kg/m2; 67.6% White; 11.7% smokers; 51.8% nulliparous; 10.4% with multiple pregnancy; 139 [7.8%] with spontaneous preterm birth within 7 days). The model was then externally validated in a prospective cohort study in 26 United Kingdom centres (2016 to 2018; 2,924 women; mean age 28.2 years; median BMI 25.4 kg/m2; 88.2% White; 21% smokers; 35.2% nulliparous; 3.5% with multiple pregnancy; 85 [2.9%] with spontaneous preterm birth within 7 days). The developed risk prediction model for spontaneous preterm birth within 7 days included quantitative fFN, current smoking, not White ethnicity, nulliparity, and multiple pregnancy. After internal validation, the optimism-adjusted area under the curve was 0.89 (95% CI 0.86 to 0.92), and the optimism-adjusted Nagelkerke R2 was 35% (95% CI 33% to 37%). On external validation in the prospective UK cohort population, the area under the curve was 0.89 (95% CI 0.84 to 0.94), and the Nagelkerke R2 was 36% (95% CI 34% to 38%).
Recalibration of the model's intercept was required to ensure overall calibration-in-the-large. A calibration curve suggested close agreement between predicted and observed risks in the range of predictions 0% to 10%, but some miscalibration (underprediction) at higher risks (slope 1.24 (95% CI 1.23 to 1.26)). Despite this miscalibration, the net benefit of the model was higher than "treat all" or "treat none" strategies for thresholds up to about 15% risk. The economic analysis found the prognostic model was cost-effective, compared to using qualitative fFN, at a threshold for hospital admission and treatment of ≥2% risk of preterm birth within 7 days. Study limitations include the limited number of participants who are not White and levels of missing data for certain variables in the development dataset. CONCLUSIONS In this study, we found that a risk prediction model including vaginal fFN concentration and clinical risk factors showed promising performance in the prediction of spontaneous preterm birth within 7 days of test and has potential to inform management decisions for women with threatened preterm labour. Further evaluation of the risk prediction model in clinical practice is required to determine whether it improves clinical outcomes. TRIAL REGISTRATION The study was approved by the West of Scotland Research Ethics Committee (16/WS/0068). The study was registered with ISRCTN Registry (ISRCTN 41598423) and NIHR Portfolio (CPMS: 31277).
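The net benefit comparison above (model versus "treat all" and "treat none") can be made concrete. A hypothetical sketch with toy data, not the QUIDS analysis code:

```python
def net_benefit(risks, outcomes, threshold):
    """Net benefit of treating everyone with predicted risk >= threshold:
    (TP - FP * w) / n, where w = t/(1-t) weights false positives by the
    odds at the chosen risk threshold t."""
    n = len(outcomes)
    w = threshold / (1.0 - threshold)
    tp = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y == 1)
    fp = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y == 0)
    return (tp - fp * w) / n

def net_benefit_treat_all(outcomes, threshold):
    """Net benefit of the 'treat all' strategy; 'treat none' is always 0."""
    prevalence = sum(outcomes) / len(outcomes)
    w = threshold / (1.0 - threshold)
    return prevalence - (1.0 - prevalence) * w

# Toy example: a model whose risks separate events from non-events beats
# 'treat all' at a 20% threshold.
risks = [0.9, 0.7, 0.3, 0.05]
outcomes = [1, 1, 0, 0]
nb_model = net_benefit(risks, outcomes, 0.2)
nb_all = net_benefit_treat_all(outcomes, 0.2)
print(nb_model, nb_all)  # 0.4375 0.375
```

Repeating this over a grid of thresholds gives the decision curve the abstract summarises.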
79
Snell KIE, Archer L, Ensor J, Bonnett LJ, Debray TPA, Phillips B, Collins GS, Riley RD. External validation of clinical prediction models: simulation-based sample size calculations were more reliable than rules-of-thumb. J Clin Epidemiol 2021; 135:79-89. [PMID: 33596458 PMCID: PMC8352630 DOI: 10.1016/j.jclinepi.2021.02.011] [Received: 07/18/2020] [Revised: 12/14/2020] [Accepted: 02/09/2021]
Abstract
INTRODUCTION Sample size "rules-of-thumb" for external validation of clinical prediction models suggest at least 100 events and 100 non-events. Such blanket guidance is imprecise, and not specific to the model or validation setting. We investigate factors affecting precision of model performance estimates upon external validation, and propose a more tailored sample size approach. METHODS Simulation of logistic regression prediction models to investigate factors associated with precision of performance estimates. Then, explanation and illustration of a simulation-based approach to calculate the minimum sample size required to precisely estimate a model's calibration, discrimination and clinical utility. RESULTS Precision is affected by the model's linear predictor (LP) distribution, in addition to number of events and total sample size. Sample sizes of 100 (or even 200) events and non-events can give imprecise estimates, especially for calibration. The simulation-based calculation accounts for the LP distribution and (mis)calibration in the validation sample. Application identifies 2430 required participants (531 events) for external validation of a deep vein thrombosis diagnostic model. CONCLUSION Where researchers can anticipate the distribution of the model's LP (eg, based on development sample, or a pilot study), a simulation-based approach for calculating sample size for external validation offers more flexibility and reliability than rules-of-thumb.
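A much-simplified sketch of the simulation idea, assuming the model's linear predictor (LP) is Normal in the validation population: simulate validation samples of a candidate size and see how precisely one performance measure (here the observed/expected ratio) is estimated. The paper's approach is more general (it also targets calibration slope, C-statistic and net benefit); this is only an illustration of the principle that precision depends on the LP distribution, not just event counts:

```python
import math, random

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def oe_precision(n, mu, sigma, sims=200, seed=1):
    """Empirical SE of the observed/expected (O/E) ratio across simulated
    validation samples of size n, when the model's linear predictor is
    Normal(mu, sigma) in the validation population."""
    rng = random.Random(seed)
    oes = []
    for _ in range(sims):
        ps = [expit(rng.gauss(mu, sigma)) for _ in range(n)]
        expected = sum(ps)
        observed = sum(1 for p in ps if rng.random() < p)
        oes.append(observed / expected)
    mean = sum(oes) / sims
    return (sum((x - mean) ** 2 for x in oes) / (sims - 1)) ** 0.5

# Precision of O/E improves with validation sample size; a rule-of-thumb
# based only on event counts ignores this dependence on the LP distribution.
ses = {n: oe_precision(n, mu=-2.0, sigma=1.0) for n in (500, 2000, 8000)}
print({n: round(se, 4) for n, se in ses.items()})
```

In practice one would increase `n` until the simulated confidence interval width for each target measure meets a pre-specified precision.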
80
Martin GP, Sperrin M, Snell KIE, Buchan I, Riley RD. Authors' reply to Sabour and Ghajari "Clinical prediction models to predict the risk of multiple binary outcomes: Methodological issues". Stat Med 2021; 40:1861-1862. [PMID: 33687094 DOI: 10.1002/sim.8872] [Received: 12/18/2020] [Accepted: 12/19/2020]
81
Ban L, Abdul Sultan A, West J, Tata LJ, Riley RD, Nelson-Piercy C, Grainge MJ. External validation of a model to predict women most at risk of postpartum venous thromboembolism: Maternity clot risk. Thromb Res 2021; 208:202-210. [PMID: 34120750 DOI: 10.1016/j.thromres.2021.05.020] [Received: 11/23/2020] [Revised: 04/29/2021] [Accepted: 05/28/2021]
Abstract
INTRODUCTION Venous thromboembolism (VTE) is the leading cause of direct maternal mortality in high-income countries. We previously developed a risk prediction score for postpartum VTE in women without a previous VTE. In this paper, we provide further external validation and assess its performance across various groups of postpartum women from England. MATERIALS AND METHODS Cohort study using primary and secondary care data covering England. We used data from QResearch comprising women with pregnancies ending in live birth or stillbirth recorded in Hospital Episode Statistics between 2004 and 2015. The outcome was VTE in the 6 weeks postpartum. Our predictor variables included sociodemographic and lifestyle characteristics, pre-existing comorbidities, and pregnancy and delivery characteristics. RESULTS Among 535,583 women with 700,185 deliveries, 549 VTE events were recorded (absolute risk of 7.8 VTE events per 10,000 deliveries). When we compared predicted probabilities of VTE for each woman from the original model with actual VTE events, we obtained a C-statistic of 0.67 (95% CI 0.65 to 0.70). However, our model slightly over-predicted VTE risk for the higher risk women (calibration slope = 0.84; 95% CI 0.74 to 0.94). Performance was similar across groups defined by calendar time, socioeconomic status, age group and geographical area. The score performed comparably with the existing algorithm used by the UK Royal College of Obstetricians and Gynaecologists. CONCLUSIONS Our model enables flexibility in setting new treatment thresholds. Adopting it in clinical practice may help optimise use of low-molecular-weight heparin postpartum to maximise health gain by better targeting of high-risk groups.
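The two calibration quantities reported here are obtained by regressing the observed outcome on the model's linear predictor (LP). A minimal sketch for a logistic model, with simulated well-calibrated data in place of the study's (assumed example, not the study's code):

```python
import math, random

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def calibration_slope(lp, y, iters=25):
    """Fit logit P(y=1) = a + b*LP by Newton-Raphson; b is the
    calibration slope (1 = ideal; <1 suggests overfitted predictions)."""
    a, b = 0.0, 1.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, yi in zip(lp, y):
            p = expit(a + b * x)
            w = p * (1 - p)
            g0 += yi - p; g1 += (yi - p) * x
            h00 += w; h01 += w * x; h11 += w * x * x
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (-h01 * g0 + h00 * g1) / det
    return a, b

def calibration_in_the_large(lp, y, iters=25):
    """Fit logit P(y=1) = a + LP, with LP as a fixed offset; a is
    calibration-in-the-large (0 = ideal overall calibration)."""
    a = 0.0
    for _ in range(iters):
        g = h = 0.0
        for x, yi in zip(lp, y):
            p = expit(a + x)
            g += yi - p
            h += p * (1 - p)
        a += g / h
    return a

# Outcomes generated from the stated LPs are well calibrated by construction,
# so the slope should be near 1 and calibration-in-the-large near 0.
rng = random.Random(7)
lp = [rng.gauss(-1.0, 1.0) for _ in range(20000)]
y = [1 if rng.random() < expit(x) else 0 for x in lp]
a, b = calibration_slope(lp, y)
citl = calibration_in_the_large(lp, y)
print(round(b, 2), round(citl, 2))
```

A slope of 0.84, as in the abstract, would indicate predictions that are too extreme at the tails (over-prediction for the highest-risk women).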
82
Levis B, Hattle M, Riley RD. PRIME-IPD SERIES Part 2. Retrieving, checking, and harmonizing data are underappreciated challenges in individual participant data meta-analyses. J Clin Epidemiol 2021; 136:221-223. [PMID: 34010711 DOI: 10.1016/j.jclinepi.2021.05.006] [Received: 04/29/2021] [Accepted: 05/03/2021]
83
Moriarty AS, Meader N, Snell KI, Riley RD, Paton LW, Chew-Graham CA, Gilbody S, Churchill R, Phillips RS, Ali S, McMillan D. Prognostic models for predicting relapse or recurrence of major depressive disorder in adults. Cochrane Database Syst Rev 2021; 5:CD013491. [PMID: 33956992 PMCID: PMC8102018 DOI: 10.1002/14651858.cd013491.pub2]
Abstract
BACKGROUND Relapse (the re-emergence of depressive symptoms after some level of improvement but preceding recovery) and recurrence (onset of a new depressive episode after recovery) are common in depression, lead to worse outcomes and quality of life for patients and exert a high economic cost on society. Outcomes can be predicted by using multivariable prognostic models, which use information about several predictors to produce an individualised risk estimate. The ability to accurately predict relapse or recurrence while patients are well (in remission) would allow the identification of high-risk individuals and may improve overall treatment outcomes for patients by enabling more efficient allocation of interventions to prevent relapse and recurrence. OBJECTIVES To summarise the predictive performance of prognostic models developed to predict the risk of relapse, recurrence, sustained remission or recovery in adults with major depressive disorder who meet criteria for remission or recovery. SEARCH METHODS We searched the Cochrane Library (current issue); Ovid MEDLINE (1946 onwards); Ovid Embase (1980 onwards); Ovid PsycINFO (1806 onwards); and Web of Science (1900 onwards) up to May 2020. We also searched sources of grey literature, screened the reference lists of included studies and performed a forward citation search. There were no restrictions applied to the searches by date, language or publication status. SELECTION CRITERIA We included development and external validation (testing model performance in data separate from the development data) studies of any multivariable prognostic models (including two or more predictors) to predict relapse, recurrence, sustained remission, or recovery in adults (aged 18 years and over) with remitted depression, in any clinical setting. We included all study designs and accepted all definitions of relapse, recurrence and other related outcomes. We did not specify a comparator prognostic model.
DATA COLLECTION AND ANALYSIS Two review authors independently screened references; extracted data (using a template based on the CHecklist for critical Appraisal and data extraction for systematic Reviews of prediction Modelling Studies (CHARMS)); and assessed risks of bias of included studies (using the Prediction model Risk Of Bias ASsessment Tool (PROBAST)). We referred any disagreements to a third independent review author. Where we found sufficient (10 or more) external validation studies of an individual model, we planned to perform a meta-analysis of its predictive performance, specifically with respect to its calibration (how well the predicted probabilities match the observed proportions of individuals that experience the outcome) and discrimination (the ability of the model to differentiate between those with and without the outcome). Recommendations could not be qualified using the GRADE system, as guidance is not yet available for prognostic model reviews. MAIN RESULTS We identified 11 eligible prognostic model studies (10 unique prognostic models). Seven were model development studies; three were model development and external validation studies; and one was an external validation-only study. Multiple estimates of performance measures were not available for any of the models, and meta-analysis was therefore not possible. Ten out of the 11 included studies were assessed as being at high overall risk of bias. Common weaknesses included insufficient sample size, inappropriate handling of missing data and lack of information about discrimination and calibration. One paper (Klein 2018) was at low overall risk of bias and presented a prognostic model including the following predictors: number of previous depressive episodes, residual depressive symptoms and severity of the last depressive episode. The external predictive performance of this model was poor (C-statistic 0.59; calibration slope 0.56; confidence intervals not reported).
None of the identified studies examined the clinical utility (net benefit) of the developed model. AUTHORS' CONCLUSIONS Of the 10 prognostic models identified (across 11 studies), only four underwent external validation. Most of the studies (n = 10) were assessed as being at high overall risk of bias, and the one study that was at low risk of bias presented a model with poor predictive performance. There is a need for improved prognostic research in this clinical area, with future studies conforming to current best practice recommendations for prognostic model development/validation and reporting findings in line with the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement.
84
de Jong VMT, Moons KGM, Eijkemans MJC, Riley RD, Debray TPA. Developing more generalizable prediction models from pooled studies and large clustered data sets. Stat Med 2021; 40:3533-3559. [PMID: 33948970 PMCID: PMC8252590 DOI: 10.1002/sim.8981] [Received: 12/03/2019] [Revised: 02/16/2021] [Accepted: 03/22/2021]
Abstract
Prediction models often yield inaccurate predictions for new individuals. Large data sets from pooled studies or electronic healthcare records may alleviate this with an increased sample size and variability in sample characteristics. However, existing strategies for prediction model development generally do not account for heterogeneity in predictor‐outcome associations between different settings and populations. This limits the generalizability of developed models (even from large, combined, clustered data sets) and necessitates local revisions. We aim to develop methodology for producing prediction models that require less tailoring to different settings and populations. We adopt internal‐external cross‐validation to assess and reduce heterogeneity in models' predictive performance during the development. We propose a predictor selection algorithm that optimizes the (weighted) average performance while minimizing its variability across the hold‐out clusters (or studies). Predictors are added iteratively until the estimated generalizability is optimized. We illustrate this by developing a model for predicting the risk of atrial fibrillation and updating an existing one for diagnosing deep vein thrombosis, using individual participant data from 20 cohorts (N = 10 873) and 11 diagnostic studies (N = 10 014), respectively. Meta‐analysis of calibration and discrimination performance in each hold‐out cluster shows that trade‐offs between average and heterogeneity of performance occurred. Our methodology enables the assessment of heterogeneity of prediction model performance during model development in multiple or clustered data sets, thereby informing researchers on predictor selection to improve the generalizability to different settings and populations, and reduce the need for model tailoring. Our methodology has been implemented in the R package metamisc.
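The internal-external cross-validation (IECV) backbone of this approach holds out each cluster (study) in turn, fits on the remainder, and evaluates in the held-out cluster. A toy sketch with a univariable logistic model and simulated clusters; the paper's algorithm additionally performs iterative predictor selection and meta-analyses the performance (implemented in the R package metamisc), which this sketch does not attempt:

```python
import math, random

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def fit_logistic(xs, ys, iters=25):
    """Univariable logistic regression (intercept + one predictor) by Newton-Raphson."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = expit(a + b * x)
            w = p * (1 - p)
            g0 += y - p; g1 += (y - p) * x
            h00 += w; h01 += w * x; h11 += w * x * x
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (-h01 * g0 + h00 * g1) / det
    return a, b

def c_statistic(risks, ys):
    ev = [r for r, y in zip(risks, ys) if y == 1]
    ne = [r for r, y in zip(risks, ys) if y == 0]
    s = sum(1.0 if e > n else 0.5 if e == n else 0.0 for e in ev for n in ne)
    return s / (len(ev) * len(ne))

def iecv(clusters):
    """Internal-external cross-validation: hold out each cluster (study),
    fit on the remaining clusters, record held-out discrimination."""
    results = {}
    for held_out in clusters:
        xs, ys = [], []
        for name, (cx, cy) in clusters.items():
            if name != held_out:
                xs += cx
                ys += cy
        a, b = fit_logistic(xs, ys)
        hx, hy = clusters[held_out]
        risks = [expit(a + b * x) for x in hx]
        results[held_out] = c_statistic(risks, hy)
    return results

# Toy clustered data: four 'studies' sharing one predictor-outcome association.
rng = random.Random(3)
clusters = {}
for i in range(4):
    xs = [rng.gauss(0, 1) for _ in range(800)]
    ys = [1 if rng.random() < expit(-1 + 1.2 * x) else 0 for x in xs]
    clusters[f"study{i}"] = (xs, ys)
cs = iecv(clusters)
print({k: round(v, 2) for k, v in cs.items()})
```

The spread of the held-out performance estimates across clusters is the heterogeneity signal the proposed predictor-selection algorithm tries to minimise.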
85
Chappell FM, Crawford F, Horne M, Leese GP, Martin A, Weller D, Boulton AJM, Abbott C, Monteiro-Soares M, Veves A, Riley RD. Development and validation of a clinical prediction rule for development of diabetic foot ulceration: an analysis of data from five cohort studies. BMJ Open Diabetes Res Care 2021; 9:e002150. [PMID: 34035053 PMCID: PMC8154962 DOI: 10.1136/bmjdrc-2021-002150] [Received: 01/19/2021] [Revised: 03/05/2021] [Accepted: 04/03/2021]
Abstract
INTRODUCTION The aim of the study was to develop and validate a clinical prediction rule (CPR) for foot ulceration in people with diabetes. RESEARCH DESIGN AND METHODS Development of a CPR using individual participant data from four international cohort studies identified by systematic review, with validation in a fifth study. Development cohorts were from primary and secondary care foot clinics in Europe and the USA (n=8255, adults over 18 years old, with diabetes, ulcer free at recruitment). Using data from monofilament testing, presence/absence of pulses, and participant history of previous ulcer and/or amputation, we developed a simple CPR to predict who will develop a foot ulcer within 2 years of initial assessment and validated it in a fifth study (n=3324). The CPR's performance was assessed with C-statistics, calibration slopes, calibration-in-the-large, and a net benefit analysis. RESULTS CPR scores of 0, 1, 2, 3, and 4 had a risk of ulcer within 2 years of 2.4% (95% CI 1.5% to 3.9%), 6.0% (95% CI 3.5% to 9.5%), 14.0% (95% CI 8.5% to 21.3%), 29.2% (95% CI 19.2% to 41.0%), and 51.1% (95% CI 37.9% to 64.1%), respectively. In the validation dataset, calibration-in-the-large was -0.374 (95% CI -0.561 to -0.187) and calibration slope 1.139 (95% CI 0.994 to 1.283). The C-statistic was 0.829 (95% CI 0.790 to 0.868). The net benefit analysis suggested that people with a CPR score of 1 or more (risk of ulceration 6.0% or more) should be referred for treatment. CONCLUSION The clinical prediction rule is simple, using routinely obtained data, and could help prevent foot ulcers by redirecting care to patients with scores of 1 or above. It has been validated in a community setting, and requires further validation in secondary care settings.
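The score-to-risk mapping reported above lends itself to a simple lookup. A hypothetical sketch: the abstract implies a 0-4 score built from monofilament testing, pulses, and history of ulcer/amputation, but the exact items and equal one-point weighting used below are an assumption for illustration, not the published rule; the risks are those reported in the abstract:

```python
# Two-year ulceration risk by CPR score, as reported in the abstract.
RISK_BY_SCORE = {0: 0.024, 1: 0.060, 2: 0.140, 3: 0.292, 4: 0.511}

def cpr_score(insensate_monofilament, absent_pulses,
              previous_ulcer, previous_amputation):
    """One point per risk factor (equal weighting assumed for illustration)."""
    return sum([insensate_monofilament, absent_pulses,
                previous_ulcer, previous_amputation])

def assessment(**factors):
    score = cpr_score(**factors)
    risk = RISK_BY_SCORE[score]
    refer = score >= 1  # abstract's suggested referral threshold (risk >= 6.0%)
    return score, risk, refer

print(assessment(insensate_monofilament=True, absent_pulses=False,
                 previous_ulcer=True, previous_amputation=False))  # (2, 0.14, True)
```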
86
van Smeden M, Reitsma JB, Riley RD, Collins GS, Moons KG. Clinical prediction models: diagnosis versus prognosis. J Clin Epidemiol 2021; 132:142-145. [PMID: 33775387 DOI: 10.1016/j.jclinepi.2021.01.009] [Received: 11/05/2020] [Revised: 01/13/2021] [Accepted: 01/14/2021]
Abstract
Clinical prediction models play an increasingly important role in contemporary clinical care, by informing healthcare professionals, patients and their relatives about outcome risks, with the aim to facilitate (shared) medical decision making and improve health outcomes. Diagnostic prediction models aim to calculate an individual's risk that a disease is already present, whilst prognostic prediction models aim to calculate the risk of particular health states occurring in the future. This article serves as a primer for diagnostic and prognostic clinical prediction models, by discussing the basic terminology, some of the inherent challenges, and the need for validation of predictive performance and the evaluation of impact of these models in clinical care.
87
Riley RD, Snell KIE, Martin GP, Whittle R, Archer L, Sperrin M, Collins GS. Penalization and shrinkage methods produced unreliable clinical prediction models especially when sample size was small. J Clin Epidemiol 2021; 132:88-96. [PMID: 33307188 PMCID: PMC8026952 DOI: 10.1016/j.jclinepi.2020.12.005] [Received: 06/19/2020] [Revised: 11/15/2020] [Accepted: 12/02/2020]
Abstract
OBJECTIVES When developing a clinical prediction model, penalization techniques are recommended to address overfitting, as they shrink predictor effect estimates toward the null and reduce mean-square prediction error in new individuals. However, shrinkage and penalty terms ('tuning parameters') are estimated with uncertainty from the development data set. We examined the magnitude of this uncertainty and the subsequent impact on prediction model performance. STUDY DESIGN AND SETTING This study comprises applied examples and a simulation study of the following methods: uniform shrinkage (estimated via a closed-form solution or bootstrapping), ridge regression, the lasso, and elastic net. RESULTS In a particular model development data set, penalization methods can be unreliable because tuning parameters are estimated with large uncertainty. This is of most concern when development data sets have a small effective sample size and the model's Cox-Snell R2 is low. The problem can lead to considerable miscalibration of model predictions in new individuals. CONCLUSION Penalization methods are not a 'carte blanche'; they do not guarantee a reliable prediction model is developed. They are more unreliable when needed most (i.e., when overfitting may be large). We recommend they are best applied with large effective sample sizes, as identified from recent sample size calculations that aim to minimize the potential for model overfitting and precisely estimate key parameters.
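The closed-form uniform shrinkage mentioned above is the heuristic S = (chi2 - p) / chi2 (Van Houwelingen-Le Cessie style), where chi2 is the model's likelihood-ratio statistic and p the number of predictor parameters. A sketch showing the instability the abstract warns about:

```python
def uniform_shrinkage(model_chi2, n_predictors):
    """Heuristic closed-form uniform shrinkage factor S = (chi2 - p)/chi2;
    predictor effect estimates are multiplied by S to counter overfitting."""
    return (model_chi2 - n_predictors) / model_chi2

# With a large likelihood-ratio chi-squared (large effective sample size,
# good fit) the factor is stable and close to 1; with a small chi-squared
# it is estimated near (or even below) zero -- exactly when shrinkage is
# needed most, the estimate of how much to shrink is least reliable.
print(uniform_shrinkage(200.0, 10))  # 0.95
print(uniform_shrinkage(12.0, 10))   # ~0.17
```

Ridge, lasso and elastic net face the analogous problem through uncertainty in their cross-validated tuning parameters.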
88
Ensor J, Snell KIE, Debray TPA, Lambert PC, Look MP, Mamas MA, Moons KGM, Riley RD. Individual participant data meta-analysis for external validation, recalibration, and updating of a flexible parametric prognostic model. Stat Med 2021; 40:3066-3084. [PMID: 33768582 DOI: 10.1002/sim.8959] [Received: 10/18/2019] [Revised: 03/04/2021] [Accepted: 03/05/2021]
Abstract
Individual participant data (IPD) from multiple sources allows external validation of a prognostic model across multiple populations. Often this reveals poor calibration, potentially causing poor predictive performance in some populations. However, rather than discarding the model outright, it may be possible to modify the model to improve performance using recalibration techniques. We use IPD meta-analysis to identify the simplest method to achieve good model performance. We examine four options for recalibrating an existing time-to-event model across multiple populations: (i) shifting the baseline hazard by a constant, (ii) re-estimating the shape of the baseline hazard, (iii) adjusting the prognostic index as a whole, and (iv) adjusting individual predictor effects. For each strategy, IPD meta-analysis examines (heterogeneity in) model performance across populations. Additionally, the probability of achieving good performance in a new population can be calculated allowing ranking of recalibration methods. In an applied example, IPD meta-analysis reveals that the existing model had poor calibration in some populations, and large heterogeneity across populations. However, re-estimation of the intercept substantially improved the expected calibration in new populations, and reduced between-population heterogeneity. Comparing recalibration strategies showed that re-estimating both the magnitude and shape of the baseline hazard gave the highest predicted probability of good performance in a new population. In conclusion, IPD meta-analysis allows a prognostic model to be externally validated in multiple settings, and enables recalibration strategies to be compared and ranked to decide on the least aggressive recalibration strategy to achieve acceptable external model performance without discarding existing model information.
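Strategy (i), re-estimating only a constant shift while keeping the existing predictor effects fixed, can be shown in miniature. The paper's setting is time-to-event (shifting the baseline hazard); for simplicity this sketch transplants the idea to a logistic model, where the existing linear predictor enters as a fixed offset and only an intercept shift is re-estimated in the new population:

```python
import math, random

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def recalibrate_intercept(lp, y, iters=25):
    """Keep the existing linear predictor fixed (as an offset) and
    re-estimate only the intercept shift 'delta' by Newton-Raphson:
    logit P(y=1) = delta + LP."""
    delta = 0.0
    for _ in range(iters):
        g = h = 0.0
        for x, yi in zip(lp, y):
            p = expit(delta + x)
            g += yi - p
            h += p * (1 - p)
        delta += g / h
    return delta

# Simulated new population in which the old model over-predicts: the true
# intercept is 1.0 lower than the existing model assumes.
rng = random.Random(11)
lp = [rng.gauss(0.0, 1.0) for _ in range(20000)]        # existing model's LP
y = [1 if rng.random() < expit(x - 1.0) else 0 for x in lp]
delta = recalibrate_intercept(lp, y)
print(round(delta, 1))  # close to -1.0
```

The more aggressive strategies (re-estimating the baseline shape, the prognostic index coefficient, or individual predictor effects) add parameters in the same fashion; the IPD meta-analysis then compares how each option performs across populations.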
89
|
Bullock GS, Hughes T, Sergeant JC, Callaghan MJ, Collins GS, Riley RD. Improving prediction model systematic review methodology: Letter to the Editor. Transl Sports Med 2021. DOI: 10.1002/tsm2.240.
90
|
Riley RD, Van Calster B, Collins GS. A note on estimating the Cox-Snell R2 from a reported C statistic (AUROC) to inform sample size calculations for developing a prediction model with a binary outcome. Stat Med 2021; 40:859-864. PMID: 33283904. DOI: 10.1002/sim.8806.
Abstract
In 2019 we published a pair of articles in Statistics in Medicine that describe how to calculate the minimum sample size for developing a multivariable prediction model with a continuous outcome, or with a binary or time-to-event outcome. As for any sample size calculation, the approach requires the user to specify anticipated values for key parameters. In particular, for a prediction model with a binary outcome, the outcome proportion and a conservative estimate for the overall fit of the developed model as measured by the Cox-Snell R2 (proportion of variance explained) must be specified. This proposal raises the question of how to identify a plausible value for R2 in advance of model development. Our articles suggest researchers should identify R2 from closely related models already published in their field. In this letter, we present details on how to derive R2 using the reported C statistic (AUROC) for such existing prediction models with a binary outcome. The C statistic is commonly reported, and so our approach allows researchers to obtain R2 for subsequent sample size calculations for new models. Stata and R code are provided, together with a small simulation study.
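The letter's approach can be mimicked numerically: assume the linear predictor is normal with unit variance in each outcome group, with the mean difference implied by the C statistic (C = Φ(δ/√2)), simulate a large sample, and derive the Cox-Snell R2 from the likelihood-ratio statistic. A rough numpy/scipy sketch, an approximation in the spirit of the letter rather than the authors' published Stata/R code:

```python
import numpy as np
from scipy.stats import norm

def approx_cox_snell_r2(c_stat, prevalence, n=200_000, seed=1):
    """Approximate the Cox-Snell R^2 implied by a C statistic (AUROC).

    Assumes the linear predictor is N(0,1) in non-events and
    N(delta,1) in events, where C = Phi(delta / sqrt(2)).
    """
    rng = np.random.default_rng(seed)
    delta = np.sqrt(2) * norm.ppf(c_stat)
    y = rng.binomial(1, prevalence, n)
    lp = rng.normal(delta * y, 1.0)
    # Bayes' rule gives each subject's true event probability
    f1, f0 = norm.pdf(lp, delta, 1), norm.pdf(lp, 0, 1)
    p = prevalence * f1 / (prevalence * f1 + (1 - prevalence) * f0)
    ll_model = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    p0 = y.mean()
    ll_null = np.sum(y * np.log(p0) + (1 - y) * np.log(1 - p0))
    # R^2_CS = 1 - exp(-LR/n), with LR the likelihood-ratio statistic
    return 1 - np.exp(-2 * (ll_model - ll_null) / n)
```

As expected, the implied R2 grows with the C statistic and shrinks as the outcome becomes rarer.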
91
|
Crocker TF, Clegg A, Riley RD, Lam N, Bajpai R, Jordão M, Patetsini E, Ramiz R, Ensor J, Forster A, Gladman JRF. Community-based complex interventions to sustain independence in older people, stratified by frailty: a protocol for a systematic review and network meta-analysis. BMJ Open 2021; 11:e045637. PMID: 33589465. PMCID: PMC7887376. DOI: 10.1136/bmjopen-2020-045637.
Abstract
INTRODUCTION Maintaining independence is a primary goal of community health and care services for older people, but there is currently insufficient guidance about which services to implement. Therefore, we aim to synthesise evidence on the effectiveness of community-based complex interventions to sustain independence for older people, including the effect of frailty, and group interventions to identify the best configurations. METHODS AND ANALYSIS Systematic review and network meta-analysis (NMA). We will include randomised controlled trials (RCTs) and cluster RCTs of community-based complex interventions to sustain independence for older people living at home (mean age ≥65 years), compared with usual care or another complex intervention. We will search MEDLINE (1946 to September 2020), Embase (1947 to September 2020), CINAHL (1981 to September 2020), PsycINFO (1806 to September 2020), CENTRAL and clinical trial registries from inception to September 2020, without date/language restrictions, and scan included papers' reference lists. Main outcomes were: living at home, activities of daily living (basic/instrumental), home-care services usage, hospitalisation, care home admission, costs and cost effectiveness. Additional outcomes were: health status, depression, loneliness, falls and mortality. Interventions will be coded, summarised and grouped. An NMA using a multivariate random-effects model for each outcome separately will determine the relative effects of different complex interventions. For each outcome, we will produce summary effect estimates for each pair of treatments in the network, with 95% CI, ranking plots and measures, and the borrowing of strength statistic. Inconsistency will be examined using a 'design-by-treatment interaction' model. We will assess risk of bias (Cochrane tool V.2) and certainty of evidence using the Grading of Recommendations Assessment, Development and Evaluation for NMA approach. 
ETHICS AND DISSEMINATION This research will use aggregated, anonymised, published data. Findings will be reported according to Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidance. They will be disseminated to policy-makers, commissioners and providers, and via conferences and scientific journals. PROSPERO REGISTRATION NUMBER CRD42019162195.
92
|
Albasri A, Hattle M, Koshiaris C, Dunnigan A, Paxton B, Fox SE, Smith M, Archer L, Levis B, Payne RA, Riley RD, Roberts N, Snell KIE, Lay-Flurrie S, Usher-Smith J, Stevens R, Hobbs FDR, McManus RJ, Sheppard JP. Association between antihypertensive treatment and adverse events: systematic review and meta-analysis. BMJ 2021; 372:n189. PMID: 33568342. PMCID: PMC7873715. DOI: 10.1136/bmj.n189.
Abstract
OBJECTIVE To examine the association between antihypertensive treatment and specific adverse events. DESIGN Systematic review and meta-analysis. ELIGIBILITY CRITERIA Randomised controlled trials of adults receiving antihypertensives compared with placebo or no treatment, more antihypertensive drugs compared with fewer antihypertensive drugs, or higher blood pressure targets compared with lower targets. To avoid small early phase trials, studies were required to have at least 650 patient years of follow-up. INFORMATION SOURCES Searches were conducted in Embase, Medline, CENTRAL, and the Science Citation Index databases from inception until 14 April 2020. MAIN OUTCOME MEASURES The primary outcome was falls during trial follow-up. Secondary outcomes were acute kidney injury, fractures, gout, hyperkalaemia, hypokalaemia, hypotension, and syncope. Additional outcomes related to death and major cardiovascular events were extracted. Risk of bias was assessed using the Cochrane risk of bias tool, and random effects meta-analysis was used to pool rate ratios, odds ratios, and hazard ratios across studies, allowing for between study heterogeneity (τ2). RESULTS Of 15 023 articles screened for inclusion, 58 randomised controlled trials were identified, including 280 638 participants followed up for a median of 3 (interquartile range 2-4) years. Most of the trials (n=40, 69%) had a low risk of bias. Among seven trials reporting data for falls, no evidence was found of an association with antihypertensive treatment (summary risk ratio 1.05, 95% confidence interval 0.89 to 1.24, τ2=0.009). Antihypertensives were associated with an increased risk of acute kidney injury (1.18, 95% confidence interval 1.01 to 1.39, τ2=0.037, n=15), hyperkalaemia (1.89, 1.56 to 2.30, τ2=0.122, n=26), hypotension (1.97, 1.67 to 2.32, τ2=0.132, n=35), and syncope (1.28, 1.03 to 1.59, τ2=0.050, n=16). 
The heterogeneity between studies assessing acute kidney injury and hyperkalaemia events was reduced when focusing on drugs that affect the renin angiotensin-aldosterone system. Results were robust to sensitivity analyses focusing on adverse events leading to withdrawal from each trial. Antihypertensive treatment was associated with a reduced risk of all cause mortality, cardiovascular death, and stroke, but not of myocardial infarction. CONCLUSIONS This meta-analysis found no evidence to suggest that antihypertensive treatment is associated with falls but found evidence of an association with mild (hyperkalaemia, hypotension) and severe adverse events (acute kidney injury, syncope). These data could be used to inform shared decision making between doctors and patients about initiation and continuation of antihypertensive treatment, especially in patients at high risk of harm because of previous adverse events or poor renal function. REGISTRATION PROSPERO CRD42018116860.
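The random-effects pooling used here, with between-study heterogeneity quantified by τ2, can be illustrated with a DerSimonian-Laird estimator on log effect estimates. A generic numpy sketch with made-up numbers, not the review's data:

```python
import numpy as np

def dersimonian_laird(yi, vi):
    """Pool effect estimates yi (e.g. log rate ratios) with within-study variances vi."""
    yi, vi = np.asarray(yi, float), np.asarray(vi, float)
    w = 1 / vi                                   # fixed-effect weights
    ybar = np.sum(w * yi) / np.sum(w)
    q = np.sum(w * (yi - ybar) ** 2)             # Cochran's Q
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (len(yi) - 1)) / c)     # between-study variance
    w_star = 1 / (vi + tau2)                     # random-effects weights
    mu = np.sum(w_star * yi) / np.sum(w_star)
    se = np.sqrt(1 / np.sum(w_star))
    return mu, se, tau2

# Hypothetical log risk ratios and variances from five heterogeneous trials
mu, se, tau2 = dersimonian_laird([0.05, 0.45, -0.20, 0.60, 0.10],
                                 [0.02, 0.03, 0.05, 0.04, 0.02])
```

Exponentiating `mu` and its confidence limits returns the summary estimate to the ratio scale.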
93
|
Sauerbrei W, Bland M, Evans SJW, Riley RD, Royston P, Schumacher M, Collins GS. Doug Altman: Driving critical appraisal and improvements in the quality of methodological and medical research. Biom J 2021; 63:226-246. PMID: 32639065. DOI: 10.1002/bimj.202000053.
Abstract
Doug Altman was a visionary leader and one of the most influential medical statisticians of the last 40 years. Based on a presentation in the "Invited session in memory of Doug Altman" at the 40th Annual Conference of the International Society for Clinical Biostatistics (ISCB) in Leuven, Belgium and our long-standing collaborations with Doug, we discuss his contributions to regression modeling, reporting, prognosis research, as well as some more general issues while acknowledging that we cannot cover the whole spectrum of Doug's considerable methodological output. His statement "To maximize the benefit to society, you need to not just do research but do it well" should be a driver for all researchers. To improve current and future research, we aim to summarize Doug's messages for these three topics.
94
|
Archer L, Snell KIE, Ensor J, Hudda MT, Collins GS, Riley RD. Minimum sample size for external validation of a clinical prediction model with a continuous outcome. Stat Med 2021; 40:133-146. PMID: 33150684. DOI: 10.1002/sim.8766.
Abstract
Clinical prediction models provide individualized outcome predictions to inform patient counseling and clinical decision making. External validation is the process of examining a prediction model's performance in data independent to that used for model development. Current external validation studies often suffer from small sample sizes, and subsequently imprecise estimates of a model's predictive performance. To address this, we propose how to determine the minimum sample size needed for external validation of a clinical prediction model with a continuous outcome. Four criteria are proposed, that target precise estimates of (i) R2 (the proportion of variance explained), (ii) calibration-in-the-large (agreement between predicted and observed outcome values on average), (iii) calibration slope (agreement between predicted and observed values across the range of predicted values), and (iv) the variance of observed outcome values. Closed-form sample size solutions are derived for each criterion, which require the user to specify anticipated values of the model's performance (in particular R2 ) and the outcome variance in the external validation dataset. A sensible starting point is to base values on those for the model development study, as obtained from the publication or study authors. The largest sample size required to meet all four criteria is the recommended minimum sample size needed in the external validation dataset. The calculations can also be applied to estimate expected precision when an existing dataset with a fixed sample size is available, to help gauge if it is adequate. We illustrate the proposed methods on a case-study predicting fat-free mass in children.
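Criterion (i) can be sketched using the classical large-sample approximation var(R2) ≈ 4R2(1−R2)2/n: solve for the smallest n at which the 95% confidence interval for R2 has a chosen width. This is an illustrative approximation in the spirit of the paper's closed-form solutions, not the exact published formulas:

```python
import math

def n_for_r2_precision(r2, ci_width, z=1.96):
    """Smallest n giving a 95% CI for R^2 narrower than ci_width,
    using the approximation var(R^2) ~ 4*R^2*(1-R^2)^2 / n."""
    se_target = ci_width / (2 * z)
    return math.ceil(4 * r2 * (1 - r2) ** 2 / se_target ** 2)

# Anticipate R^2 = 0.7 from the development study; target a CI width of 0.1
n = n_for_r2_precision(r2=0.7, ci_width=0.1)
```

Halving the target interval width roughly quadruples the required validation sample size, since the standard error scales with 1/sqrt(n).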
95
|
Jenkins DA, Martin GP, Sperrin M, Riley RD, Debray TPA, Collins GS, Peek N. Continual updating and monitoring of clinical prediction models: time for dynamic prediction systems? Diagn Progn Res 2021; 5:1. PMID: 33431065. PMCID: PMC7797885. DOI: 10.1186/s41512-020-00090-3.
Abstract
Clinical prediction models (CPMs) have become fundamental for risk stratification across healthcare. The CPM pipeline (development, validation, deployment, and impact assessment) is commonly viewed as a one-time activity, with model updating rarely considered and done in a somewhat ad hoc manner. This fails to address the fact that the performance of a CPM worsens over time as natural changes in populations and care pathways occur. CPMs need constant surveillance to maintain adequate predictive performance. Rather than reactively updating a developed CPM once evidence of deteriorated performance accumulates, it is possible to proactively adapt CPMs whenever new data becomes available. Approaches for validation then need to be changed accordingly, making validation a continuous rather than a discrete effort. As such, "living" (dynamic) CPMs represent a paradigm shift, where the analytical methods dynamically generate updated versions of a model through time; one then needs to validate the system rather than each subsequent model revision.
96
|
Collins SD, Peek N, Riley RD, Martin GP. Sample sizes of prediction model studies in prostate cancer were rarely justified and often insufficient. J Clin Epidemiol 2020; 133:53-60. PMID: 33383128. DOI: 10.1016/j.jclinepi.2020.12.011.
Abstract
OBJECTIVE Developing clinical prediction models (CPMs) on data of sufficient sample size is critical to help minimize overfitting. Using prostate cancer as a clinical exemplar, we aimed to investigate to what extent existing CPMs adhere to recent formal sample size criteria, or historic rules of thumb of events per predictor parameter (EPP)≥10. STUDY DESIGN AND SETTING A systematic review to identify CPMs related to prostate cancer, which provided enough information to calculate minimum sample size. We compared the reported sample size of each CPM against the traditional 10 EPP rule of thumb and formal sample size criteria. RESULTS In total, 211 CPMs were included. Three of the studies justified the sample size used, mostly using EPP rules of thumb. Overall, 69% of the CPMs were derived on sample sizes that surpassed the traditional EPP≥10 rule of thumb, but only 48% surpassed recent formal sample size criteria. For most CPMs, the required sample size based on formal criteria was higher than the sample sizes to surpass 10 EPP. CONCLUSION Few of the CPMs included in this study justified their sample size, with most justifications being based on EPP. This study shows that, in real-world data sets, adhering to the classic EPP rules of thumb is insufficient to adhere to recent formal sample size criteria.
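The gap between the 10-EPP rule of thumb and the formal criteria can be illustrated with the Riley-style shrinkage criterion n ≥ P / ((S − 1) ln(1 − R2_CS/S)), targeting expected shrinkage S = 0.9. A sketch with assumed inputs, not the review's own calculations:

```python
import math

def n_epp_rule(n_params, prevalence, epp=10):
    """Classic rule of thumb: at least `epp` events per candidate parameter."""
    return math.ceil(epp * n_params / prevalence)

def n_shrinkage_criterion(n_params, r2_cs, shrinkage=0.9):
    """Riley-style criterion: sample size targeting expected shrinkage S."""
    return math.ceil(n_params /
                     ((shrinkage - 1) * math.log(1 - r2_cs / shrinkage)))

# Assumed scenario: 20 candidate parameters, 25% outcome prevalence,
# anticipated Cox-Snell R^2 of 0.10
n_rule = n_epp_rule(20, 0.25)            # 800 participants (200 events)
n_formal = n_shrinkage_criterion(20, 0.10)
```

Here the formal criterion demands roughly twice the sample size implied by 10 EPP, consistent with the paper's finding that EPP-based justifications often fall short.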
97
|
Andaur Navarro CL, Damen JAAG, Takada T, Nijman SWJ, Dhiman P, Ma J, Collins GS, Bajpai R, Riley RD, Moons KG, Hooft L. Protocol for a systematic review on the methodological and reporting quality of prediction model studies using machine learning techniques. BMJ Open 2020; 10:e038832. PMID: 33177137. PMCID: PMC7661369. DOI: 10.1136/bmjopen-2020-038832.
Abstract
INTRODUCTION Studies addressing the development and/or validation of diagnostic and prognostic prediction models are abundant in most clinical domains. Systematic reviews have shown that the methodological and reporting quality of prediction model studies is suboptimal. Due to the increasing availability of larger, routinely collected and complex medical data, and the rising application of Artificial Intelligence (AI) or machine learning (ML) techniques, the number of prediction model studies is expected to increase even further. Prediction models developed using AI or ML techniques are often labelled as a 'black box' and little is known about their methodological and reporting quality. Therefore, this comprehensive systematic review aims to evaluate the reporting quality, the methodological conduct, and the risk of bias of prediction model studies that applied ML techniques for model development and/or validation. METHODS AND ANALYSIS A search will be performed in PubMed to identify studies developing and/or validating prediction models using any ML methodology and across all medical fields. Studies will be included if they were published between January 2018 and December 2019, predict patient-related outcomes, use any study design or data source, and available in English. Screening of search results and data extraction from included articles will be performed by two independent reviewers. The primary outcomes of this systematic review are: (1) the adherence of ML-based prediction model studies to the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD), and (2) the risk of bias in such studies as assessed using the Prediction model Risk Of Bias ASsessment Tool (PROBAST). A narrative synthesis will be conducted for all included studies. 
Findings will be stratified by study type, medical field and prevalent ML methods, and will inform necessary extensions or updates of TRIPOD and PROBAST to better address prediction model studies that used AI or ML techniques. ETHICS AND DISSEMINATION Ethical approval is not required for this study because only available published data will be analysed. Findings will be disseminated through peer-reviewed publications and scientific conferences. SYSTEMATIC REVIEW REGISTRATION PROSPERO, CRD42019161764.
98
|
Snell KIE, Allotey J, Smuk M, Hooper R, Chan C, Ahmed A, Chappell LC, Von Dadelszen P, Green M, Kenny L, Khalil A, Khan KS, Mol BW, Myers J, Poston L, Thilaganathan B, Staff AC, Smith GCS, Ganzevoort W, Laivuori H, Odibo AO, Arenas Ramírez J, Kingdom J, Daskalakis G, Farrar D, Baschat AA, Seed PT, Prefumo F, da Silva Costa F, Groen H, Audibert F, Masse J, Skråstad RB, Salvesen KÅ, Haavaldsen C, Nagata C, Rumbold AR, Heinonen S, Askie LM, Smits LJM, Vinter CA, Magnus P, Eero K, Villa PM, Jenum AK, Andersen LB, Norman JE, Ohkuchi A, Eskild A, Bhattacharya S, McAuliffe FM, Galindo A, Herraiz I, Carbillon L, Klipstein-Grobusch K, Yeo SA, Browne JL, Moons KGM, Riley RD, Thangaratinam S. External validation of prognostic models predicting pre-eclampsia: individual participant data meta-analysis. BMC Med 2020; 18:302. PMID: 33131506. PMCID: PMC7604970. DOI: 10.1186/s12916-020-01766-9.
Abstract
BACKGROUND Pre-eclampsia is a leading cause of maternal and perinatal mortality and morbidity. Early identification of women at risk during pregnancy is required to plan management. Although there are many published prediction models for pre-eclampsia, few have been validated in external data. Our objective was to externally validate published prediction models for pre-eclampsia using individual participant data (IPD) from UK studies, to evaluate whether any of the models can accurately predict the condition when used within the UK healthcare setting. METHODS IPD from 11 UK cohort studies (217,415 pregnant women) within the International Prediction of Pregnancy Complications (IPPIC) pre-eclampsia network contributed to external validation of published prediction models, identified by systematic review. Cohorts that measured all predictor variables in at least one of the identified models and reported pre-eclampsia as an outcome were included for validation. We reported the model predictive performance as discrimination (C-statistic), calibration (calibration plots, calibration slope, calibration-in-the-large), and net benefit. Performance measures were estimated separately in each available study and then, where possible, combined across studies in a random-effects meta-analysis. RESULTS Of 131 published models, 67 provided the full model equation and 24 could be validated in 11 UK cohorts. Most of the models showed modest discrimination with summary C-statistics between 0.6 and 0.7. The calibration of the predicted compared to observed risk was generally poor for most models with observed calibration slopes less than 1, indicating that predictions were generally too extreme, although confidence intervals were wide. There was large between-study heterogeneity in each model's calibration-in-the-large, suggesting poor calibration of the predicted overall risk across populations. 
In a subset of models, the net benefit of using the models to inform clinical decisions appeared small and limited to probability thresholds between 5 and 7%. CONCLUSIONS The evaluated models had modest predictive performance, with key limitations such as poor calibration (likely due to overfitting in the original development datasets), substantial heterogeneity, and small net benefit across settings. The evidence to support the use of these prediction models for pre-eclampsia in clinical decision-making is limited. Any models that we could not validate should be examined in terms of their predictive performance, net benefit, and heterogeneity across multiple UK settings before consideration for use in practice. TRIAL REGISTRATION PROSPERO ID: CRD42015029349 .
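The two headline performance measures reported here can be computed directly from predicted risks and observed outcomes: the C-statistic is the probability that a random event outranks a random non-event, and the calibration slope is the coefficient from regressing the outcome on the logit of the predicted risk (a slope below 1 indicating predictions that are too extreme). A numpy sketch on simulated data, illustrative only:

```python
import numpy as np

def c_statistic(p, y):
    """Concordance: chance a random event has a higher predicted risk than a non-event."""
    events, nonevents = p[y == 1], p[y == 0]
    diff = events[:, None] - nonevents[None, :]
    return (diff > 0).mean() + 0.5 * (diff == 0).mean()

def calibration_slope(p, y, iters=25):
    """Slope from logistic regression of y on logit(p), fit by Newton-Raphson."""
    x = np.column_stack([np.ones_like(p), np.log(p / (1 - p))])
    beta = np.zeros(2)
    for _ in range(iters):
        mu = 1 / (1 + np.exp(-x @ beta))
        w = mu * (1 - mu) + 1e-12
        beta += np.linalg.solve(x.T @ (w[:, None] * x), x.T @ (y - mu))
    return beta[1]

# Simulated validation set where predictions are too extreme, as the
# review found: the true calibration slope here is 0.5
rng = np.random.default_rng(2)
lp_true = rng.normal(0.0, 1.0, 4000)
y = rng.binomial(1, 1 / (1 + np.exp(-lp_true)))
p_overfit = 1 / (1 + np.exp(-2.0 * lp_true))   # overconfident risks
slope = calibration_slope(p_overfit, y)
```

Note the C-statistic is unchanged by the overconfident rescaling, since it depends only on the ranking of predictions; miscalibration shows up solely in the slope.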
99
|
Martin GP, Sperrin M, Snell KIE, Buchan I, Riley RD. Clinical prediction models to predict the risk of multiple binary outcomes: a comparison of approaches. Stat Med 2020; 40:498-517. PMID: 33107066. DOI: 10.1002/sim.8787.
Abstract
Clinical prediction models (CPMs) can predict clinically relevant outcomes or events. Typically, prognostic CPMs are derived to predict the risk of a single future outcome. However, there are many medical applications where two or more outcomes are of interest, meaning this should be more widely reflected in CPMs so they can accurately estimate the joint risk of multiple outcomes simultaneously. A potentially naïve approach to multi-outcome risk prediction is to derive a CPM for each outcome separately, then multiply the predicted risks. This approach is only valid if the outcomes are conditionally independent given the covariates, and it fails to exploit the potential relationships between the outcomes. This paper outlines several approaches that could be used to develop CPMs for multiple binary outcomes. We consider four methods, ranging in complexity and conditional independence assumptions: namely, probabilistic classifier chain, multinomial logistic regression, multivariate logistic regression, and a Bayesian probit model. These are compared with methods that rely on conditional independence: separate univariate CPMs and stacked regression. Employing a simulation study and real-world example, we illustrate that CPMs for joint risk prediction of multiple outcomes should only be derived using methods that model the residual correlation between outcomes. In such a situation, our results suggest that probabilistic classification chains, multinomial logistic regression or the Bayesian probit model are all appropriate choices. We call into question the development of CPMs for each outcome in isolation when multiple correlated or structurally related outcomes are of interest and recommend more multivariate approaches to risk prediction.
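The probabilistic classifier chain is the simplest of the recommended approaches: factorise the joint risk as P(Y1, Y2 | x) = P(Y1 | x) · P(Y2 | x, Y1), so the second model captures the residual correlation that separate univariate CPMs ignore. A minimal numpy sketch with two logistic models on hypothetical data (one of several methods the paper compares, not the authors' code):

```python
import numpy as np

def fit_logistic(X, y, iters=25):
    """Newton-Raphson logistic fit; X should include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        mu = 1 / (1 + np.exp(-X @ beta))
        w = mu * (1 - mu) + 1e-12
        beta += np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (y - mu))
    return beta

# Hypothetical data: y2 depends on both x and y1, so the two outcomes
# are residually correlated given the covariate
rng = np.random.default_rng(3)
n = 5000
x = rng.normal(size=n)
y1 = rng.binomial(1, 1 / (1 + np.exp(-(x - 0.5))))
y2 = rng.binomial(1, 1 / (1 + np.exp(-(0.5 * x + 1.5 * y1 - 1.0))))

b1 = fit_logistic(np.column_stack([np.ones(n), x]), y1)       # P(y1 | x)
b2 = fit_logistic(np.column_stack([np.ones(n), x, y1]), y2)   # P(y2 | x, y1)

def joint_risk(x_new):
    """P(y1 = 1, y2 = 1 | x) = P(y1=1 | x) * P(y2=1 | x, y1=1)."""
    p1 = 1 / (1 + np.exp(-(b1[0] + b1[1] * x_new)))
    p2 = 1 / (1 + np.exp(-(b2[0] + b2[1] * x_new + b2[2])))
    return p1 * p2

risk = joint_risk(0.0)
```

Multiplying two separately fitted marginal models would instead assume conditional independence and misstate this joint risk whenever the y1 coefficient in the second model is non-zero.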
100
|
Townsend R, Manji A, Allotey J, Heazell A, Jorgensen L, Magee LA, Mol BW, Snell K, Riley RD, Sandall J, Smith G, Patel M, Thilaganathan B, von Dadelszen P, Thangaratinam S, Khalil A. Can risk prediction models help us individualise stillbirth prevention? A systematic review and critical appraisal of published risk models. BJOG 2020; 128:214-224. PMID: 32894620. DOI: 10.1111/1471-0528.16487.
Abstract
BACKGROUND Stillbirth prevention is an international priority - risk prediction models could individualise care and reduce unnecessary intervention, but their use requires evaluation. OBJECTIVES To identify risk prediction models for stillbirth, and assess their potential accuracy and clinical benefit in practice. SEARCH STRATEGY MEDLINE, Embase, DH-DATA and AMED databases were searched from inception to June 2019 using terms relevant to stillbirth, perinatal mortality and prediction models. The search was compliant with Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) guidelines. SELECTION CRITERIA Studies developing and/or validating prediction models for risk of stillbirth developed for application during pregnancy. DATA COLLECTION AND ANALYSIS Study screening and data extraction were conducted in duplicate, using the CHARMS checklist. Risk of bias was appraised using the PROBAST tool. RESULTS The search identified 2751 citations. Fourteen studies reporting development of 69 models were included. Variables consistently included were: ethnicity, body mass index, uterine artery Doppler, pregnancy-associated plasma protein and placental growth factor. For almost all models there were significant concerns about risk of bias. Apparent model performance (i.e. in the development dataset) was highest in models developed for use later in pregnancy and including maternal characteristics, and ultrasound and biochemical variables, but few were internally validated and none were externally validated. CONCLUSIONS Almost all models identified were at high risk of bias. There are first-trimester models of possible clinical benefit in early risk stratification; these require validation and clinical evaluation. There were few later pregnancy models but, if validated, these could be most relevant to individualised discussions around timing of birth. 
TWEETABLE ABSTRACT Prediction models using maternal factors, blood tests and ultrasound could individualise stillbirth prevention, but existing models are at high risk of bias.