51. Hattle M, Burke DL, Trikalinos T, Schmid CH, Chen Y, Jackson D, Riley RD. Multivariate meta-analysis of multiple outcomes: characteristics and predictors of borrowing of strength from Cochrane reviews. Syst Rev 2022; 11:149. PMID: 35883187; PMCID: PMC9316363; DOI: 10.1186/s13643-022-01999-0.
Abstract
OBJECTIVES Multivariate meta-analysis allows the joint synthesis of multiple outcomes accounting for their correlation. This enables borrowing of strength (BoS) across outcomes, which may lead to greater efficiency and even different conclusions compared to separate univariate meta-analyses. However, multivariate meta-analysis is complex to apply, so guidance is needed to flag (in advance of analysis) when the approach is most useful. STUDY DESIGN AND SETTING We use 43 Cochrane intervention reviews to empirically investigate the characteristics of meta-analysis datasets that are associated with a larger BoS statistic (from 0 to 100%) when applying a bivariate meta-analysis of binary outcomes. RESULTS Four characteristics were identified as strongly associated with BoS: the total number of studies, the number of studies with the outcome of interest, the percentage of studies missing the outcome of interest, and the largest absolute within-study correlation. Using these characteristics, we then develop a model for predicting BoS in a new dataset, which is shown to have good performance (an adjusted R² of 50%). Applied examples are used to illustrate the use of the BoS prediction model. CONCLUSIONS Cochrane reviewers mainly use univariate meta-analysis methods, but the identified characteristics associated with BoS and our subsequent prediction model for BoS help to flag when a multivariate meta-analysis may also be beneficial in Cochrane reviews with multiple binary outcomes. Extension to non-Cochrane reviews and other outcome types is still required.
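The BoS statistic referred to above is often summarised as the percentage reduction in the variance of an outcome's pooled estimate when moving from a univariate to a multivariate model. A minimal sketch of that calculation, using hypothetical variances rather than values from the paper:

```python
# A minimal sketch (not the authors' code) of the borrowing-of-strength (BoS)
# idea: BoS for one outcome is the percentage reduction in the variance of its
# pooled estimate when moving from a univariate to a bivariate meta-analysis.

def borrowing_of_strength(var_univariate: float, var_multivariate: float) -> float:
    """Percentage reduction in variance for one outcome's pooled effect."""
    return 100.0 * (1.0 - var_multivariate / var_univariate)

# Hypothetical pooled-estimate variances for an outcome that is missing in
# many studies: the bivariate model borrows information through the within-
# and between-study correlations with the fully reported outcome.
var_uni = 0.040    # variance of pooled log-odds ratio, univariate model
var_multi = 0.028  # variance of the same estimate, bivariate model

print(f"BoS = {borrowing_of_strength(var_uni, var_multi):.1f}%")  # BoS = 30.0%
```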
52. Papadimitropoulou K, Riley RD, Dekkers OM, Stijnen T, le Cessie S. MA-cont:pre/post effect size: An interactive tool for the meta-analysis of continuous outcomes using R Shiny. Res Synth Methods 2022; 13:649-660. PMID: 35841123; PMCID: PMC9546083; DOI: 10.1002/jrsm.1592.
Abstract
Meta-analysis is a widely used methodology to combine evidence from different sources examining a common research phenomenon, in order to obtain a quantitative summary of the studied phenomenon. In the medical field, multiple studies investigate the effectiveness of new treatments, and meta-analysis is frequently performed to generate the summary (average) treatment effect. In the meta-analysis of aggregate continuous outcomes measured in a pretest-posttest design using differences in means as the effect measure, a plethora of methods exist: analysis of final (follow-up) scores, analysis of change scores and analysis of covariance. Specialised and general-purpose statistical software can be used to apply the various methods, yet in practice the choice among them often depends on data availability and the analyst's statistical familiarity. We present a new web-based tool, MA-cont:pre/post effect size, to conduct meta-analysis of continuous data assessed pre- and post-treatment using the aforementioned approaches on aggregate data, as well as a more flexible approach that generates and analyses pseudo individual participant data. The interactive web environment, built with R Shiny, makes this a free-to-use statistical tool requiring no programming skills from its users. A basic statistical understanding of the methods running in the background is a prerequisite, and we encourage users to seek advice from technical experts when necessary.
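To illustrate two of the aggregate-data approaches named above (analysis of final scores and analysis of change scores), here is a minimal sketch with hypothetical two-arm summary data. The pre-post correlation r is an explicit assumption, since trials often do not report it; ANCOVA would likewise require it:

```python
import math

# A minimal sketch (not the MA-cont tool itself) of two aggregate-data effect
# measures for a pretest-posttest design. The change-score SD is reconstructed
# from the pre and post SDs and an assumed pre-post correlation r.

def change_sd(sd_pre, sd_post, r):
    # SD of (post - pre): sqrt(sd_pre^2 + sd_post^2 - 2*r*sd_pre*sd_post)
    return math.sqrt(sd_pre**2 + sd_post**2 - 2 * r * sd_pre * sd_post)

# Hypothetical summary data for treatment (t) and control (c) arms.
n_t, pre_t, post_t, sd_pre_t, sd_post_t = 50, 20.0, 14.0, 6.0, 6.5
n_c, pre_c, post_c, sd_pre_c, sd_post_c = 50, 19.5, 17.5, 6.2, 6.4
r = 0.6  # assumed pre-post correlation

# (1) Analysis of final scores: difference in post-treatment means.
md_final = post_t - post_c
se_final = math.sqrt(sd_post_t**2 / n_t + sd_post_c**2 / n_c)

# (2) Analysis of change scores: difference in mean changes.
md_change = (post_t - pre_t) - (post_c - pre_c)
sd_ch_t = change_sd(sd_pre_t, sd_post_t, r)
sd_ch_c = change_sd(sd_pre_c, sd_post_c, r)
se_change = math.sqrt(sd_ch_t**2 / n_t + sd_ch_c**2 / n_c)

print(f"final scores:  MD = {md_final:.2f} (SE {se_final:.2f})")
print(f"change scores: MD = {md_change:.2f} (SE {se_change:.2f})")
```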
53. Singh J, Gsteiger S, Wheaton L, Riley RD, Abrams KR, Gillies CL, Bujkiewicz S. Bayesian network meta-analysis methods for combining individual participant data and aggregate data from single arm trials and randomised controlled trials. BMC Med Res Methodol 2022; 22:186. PMID: 35818035; PMCID: PMC9275254; DOI: 10.1186/s12874-022-01657-y.
Abstract
BACKGROUND Increasingly in network meta-analysis (NMA), there is a need to incorporate non-randomised evidence to estimate relative treatment effects, and in particular in cases with limited randomised evidence, sometimes resulting in disconnected networks of treatments. When combining different sources of data, complex NMA methods are required to address issues associated with participant selection bias, incorporating single-arm trials (SATs), and synthesising a mixture of individual participant data (IPD) and aggregate data (AD). We develop NMA methods which synthesise data from SATs and randomised controlled trials (RCTs), using a mixture of IPD and AD, for a dichotomous outcome. METHODS We propose methods under both contrast-based (CB) and arm-based (AB) parametrisations, and extend the methods to allow for both within- and across-trial adjustments for covariate effects. To illustrate the methods, we use an applied example investigating the effectiveness of biologic disease-modifying anti-rheumatic drugs for rheumatoid arthritis (RA). We applied the methods to a dataset obtained from a literature review consisting of 14 RCTs and an artificial dataset consisting of IPD from two SATs and AD from 12 RCTs, where the artificial dataset was created by removing the control arms from the only two trials assessing tocilizumab in the original dataset. RESULTS Without adjustment for covariates, the CB method with independent baseline response parameters (CBunadjInd) underestimated the effectiveness of tocilizumab when applied to the artificial dataset compared to the original dataset, albeit with significant overlap in posterior distributions for treatment effect parameters. The CB method with exchangeable baseline response parameters produced effectiveness estimates in agreement with CBunadjInd, when the predicted baseline response estimates were similar to the observed baseline response. After adjustment for RA duration, there was a reduction in across-trial heterogeneity in baseline response but little change in treatment effect estimates. CONCLUSIONS Our findings suggest incorporating SATs in NMA may be useful in some situations where a treatment is disconnected from a network of comparator treatments, due to a lack of comparative evidence, to estimate relative treatment effects. The reliability of effect estimates based on data from SATs may depend on adjustment for covariate effects, although further research is required to understand this in more detail.
54. Dhiman P, Ma J, Andaur Navarro CL, Speich B, Bullock G, Damen JAA, Hooft L, Kirtley S, Riley RD, Van Calster B, Moons KGM, Collins GS. Risk of bias of prognostic models developed using machine learning: a systematic review in oncology. Diagn Progn Res 2022; 6:13. PMID: 35794668; PMCID: PMC9261114; DOI: 10.1186/s41512-022-00126-w.
Abstract
BACKGROUND Prognostic models are used widely in the oncology domain to guide medical decision-making. Little is known about the risk of bias of prognostic models developed using machine learning and the barriers to their clinical uptake in the oncology domain. METHODS We conducted a systematic review and searched MEDLINE and EMBASE databases for oncology-related studies developing a prognostic model using machine learning methods published between 01/01/2019 and 05/09/2019. The primary outcome was risk of bias, judged using the Prediction model Risk Of Bias ASsessment Tool (PROBAST). We described risk of bias overall and for each domain, by development and validation analyses separately. RESULTS We included 62 publications (48 development-only; 14 development with validation). 152 models were developed across all publications and 37 models were validated. 84% (95% CI: 77 to 89) of developed models and 51% (95% CI: 35 to 67) of validated models were at overall high risk of bias. Bias introduced in the analysis was the largest contributor to the overall risk of bias judgement for model development and validation. 123 (81%, 95% CI: 73.8 to 86.4) developed models and 19 (51%, 95% CI: 35.1 to 67.3) validated models were at high risk of bias due to their analysis, mostly due to shortcomings in the analysis including insufficient sample size and split-sample internal validation. CONCLUSIONS The quality of machine learning based prognostic models in the oncology domain is poor and most models have a high risk of bias, contraindicating their use in clinical practice. Adherence to better standards is urgently needed, with a focus on sample size estimation and analysis methods, to improve the quality of these models.
55. Bullock GS, Mylott J, Hughes T, Nicholson KF, Riley RD, Collins GS. Just How Confident Can We Be in Predicting Sports Injuries? A Systematic Review of the Methodological Conduct and Performance of Existing Musculoskeletal Injury Prediction Models in Sport. Sports Med 2022; 52:2469-2482. PMID: 35689749; DOI: 10.1007/s40279-022-01698-9.
Abstract
BACKGROUND An increasing number of musculoskeletal injury prediction models are being developed and implemented in sports medicine. Prediction model quality needs to be evaluated so clinicians can be informed of their potential usefulness. OBJECTIVE To evaluate the methodological conduct and completeness of reporting of musculoskeletal injury prediction models in sport. METHODS A systematic review was performed from inception to June 2021. Studies were included if they: (1) predicted sport injury; (2) used regression, machine learning, or deep learning models; (3) were written in English; (4) were peer reviewed. RESULTS Thirty studies (204 models) were included; 60% of studies utilized only regression methods, 13% only machine learning, and 27% both regression and machine learning approaches. All studies developed a prediction model and no studies externally validated a prediction model. Two percent of models (7% of studies) were at low risk of bias and 98% of models (93% of studies) were at high or unclear risk of bias. Three studies (10%) performed an a priori sample size calculation; 14 (47%) performed internal validation. Nineteen studies (63%) reported discrimination and two (7%) reported calibration. Four studies (13%) reported model equations for statistical predictions and no machine learning studies reported code or hyperparameters. CONCLUSION Existing sport musculoskeletal injury prediction models were poorly developed and have a high risk of bias. No models could be recommended for use in practice. The majority of models were developed with small sample sizes, had inadequate assessment of model performance, and were poorly reported. To create clinically useful sports musculoskeletal injury prediction models, considerable improvements in methodology and reporting are urgently required.
56. van Geloven N, Giardiello D, Bonneville EF, Teece L, Ramspek CL, van Smeden M, Snell KIE, van Calster B, Pohar-Perme M, Riley RD, Putter H, Steyerberg E. Validation of prediction models in the presence of competing risks: a guide through modern methods. BMJ 2022; 377:e069249. PMID: 35609902; DOI: 10.1136/bmj-2021-069249.
57. Dhiman P, Ma J, Andaur Navarro CL, Speich B, Bullock G, Damen JAA, Hooft L, Kirtley S, Riley RD, Van Calster B, Moons KGM, Collins GS. Methodological conduct of prognostic prediction models developed using machine learning in oncology: a systematic review. BMC Med Res Methodol 2022; 22:101. PMID: 35395724; PMCID: PMC8991704; DOI: 10.1186/s12874-022-01577-x.
Abstract
BACKGROUND Describe and evaluate the methodological conduct of prognostic prediction models developed using machine learning methods in oncology. METHODS We conducted a systematic review in MEDLINE and Embase between 01/01/2019 and 05/09/2019, for studies developing a prognostic prediction model using machine learning methods in oncology. We used the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement, Prediction model Risk Of Bias ASsessment Tool (PROBAST) and CHecklist for critical Appraisal and data extraction for systematic Reviews of prediction Modelling Studies (CHARMS) to assess the methodological conduct of included publications. Results were summarised by modelling type: regression-, non-regression-based and ensemble machine learning models. RESULTS Sixty-two publications met the inclusion criteria, developing 152 models in total. Forty-two models were regression-based, 71 were non-regression-based and 39 were ensemble models. A median of 647 individuals (IQR: 203 to 4059) and 195 events (IQR: 38 to 1269) were used for model development, and 553 individuals (IQR: 69 to 3069) and 50 events (IQR: 17.5 to 326.5) for model validation. A higher number of events per predictor was used for developing regression-based models (median: 8, IQR: 7.1 to 23.5), compared to alternative machine learning (median: 3.4, IQR: 1.1 to 19.1) and ensemble models (median: 1.7, IQR: 1.1 to 6). Sample size was rarely justified (n = 5/62; 8%). Some or all continuous predictors were categorised before modelling in 24 studies (39%). Predictor selection before modelling was reported by 46% of studies (n = 24/62), with univariable analysis a common method across all modelling types. Ten out of 24 models for time-to-event outcomes accounted for censoring (42%). A split sample approach was the most popular method for internal validation (n = 25/62, 40%). Calibration was reported in 11 studies. Less than half of models were reported or made available. CONCLUSIONS The methodological conduct of machine learning based clinical prediction models is poor. Guidance is urgently needed, with increased awareness and education of minimum prediction modelling standards. Particular focus is needed on sample size estimation, development and validation analysis methods, and ensuring the model is available for independent validation, to improve quality of machine learning based clinical prediction models.
58. Riley RD, Collins GS, Ensor J, Archer L, Booth S, Mozumder SI, Rutherford MJ, van Smeden M, Lambert PC, Snell KIE. Minimum sample size calculations for external validation of a clinical prediction model with a time-to-event outcome. Stat Med 2022; 41:1280-1295. PMID: 34915593; DOI: 10.1002/sim.9275.
Abstract
Previous articles in Statistics in Medicine describe how to calculate the sample size required for external validation of prediction models with continuous and binary outcomes. The minimum sample size criteria aim to ensure precise estimation of key measures of a model's predictive performance, including measures of calibration, discrimination, and net benefit. Here, we extend the sample size guidance to prediction models with a time-to-event (survival) outcome, to cover external validation in datasets containing censoring. A simulation-based framework is proposed, which calculates the sample size required to target a particular confidence interval width for the calibration slope measuring the agreement between predicted risks (from the model) and observed risks (derived using pseudo-observations to account for censoring) on the log cumulative hazard scale. Precise estimation of calibration curves, discrimination, and net-benefit can also be checked in this framework. The process requires assumptions about the validation population in terms of the (i) distribution of the model's linear predictor and (ii) event and censoring distributions. Existing information can inform this; in particular, the linear predictor distribution can be approximated using the C-index or Royston's D statistic from the model development article, together with the overall event risk. We demonstrate how the approach can be used to calculate the sample size required to validate a prediction model for recurrent venous thromboembolism. Ideally the sample size should ensure precise calibration across the entire range of predicted risks, but must at least ensure adequate precision in regions important for clinical decision-making. Stata and R code are provided.
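A heavily simplified sketch of the simulation-based idea described above. The paper works with pseudo-observations on the log cumulative hazard scale; this sketch instead approximates the calibration slope by the Cox coefficient on the linear predictor, and all input values (linear predictor spread, event and censoring distributions) are assumptions for illustration:

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

# Simplified sketch of the simulation idea, not the paper's framework: find
# the smallest n whose expected 95% CI width for the calibration slope (here,
# the Cox coefficient on the linear predictor, LP) meets a target width.
rng = np.random.default_rng(1)

def expected_ci_width(n, n_sim=100, lp_sd=0.8, base_rate=0.1, cens_time=5.0):
    """Average 95% CI width for the calibration slope at sample size n."""
    widths = []
    for _ in range(n_sim):
        lp = rng.normal(0.0, lp_sd, n)                        # model's LP
        t = rng.exponential(1.0 / (base_rate * np.exp(lp)))   # event times
        df = pd.DataFrame({"lp": lp,
                           "time": np.minimum(t, cens_time),  # admin. censoring
                           "event": (t <= cens_time).astype(int)})
        cph = CoxPHFitter().fit(df, duration_col="time", event_col="event")
        widths.append(2 * 1.96 * cph.standard_errors_["lp"])
    return float(np.mean(widths))

for n in (500, 1000, 2000):  # pick the smallest n meeting the target width
    print(n, round(expected_ci_width(n), 3))
```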
59. Siegel L, Murad MH, Riley RD, Bazerbachi F, Wang Z, Chu H. A Guide to Estimating the Reference Range From a Meta-Analysis Using Aggregate or Individual Participant Data. Am J Epidemiol 2022; 191:948-956. PMID: 35102410; DOI: 10.1093/aje/kwac013.
Abstract
Clinicians frequently must decide whether a patient's measurement reflects that of a healthy "normal" individual. Thus, the reference range is defined as the interval in which some proportion (frequently 95%) of measurements from a healthy population is expected to fall. One can estimate it from a single study or preferably from a meta-analysis of multiple studies to increase generalizability. This range differs from the confidence interval for the pooled mean and the prediction interval for a new study mean in a meta-analysis, which do not capture natural variation across healthy individuals. Methods for estimating the reference range from a meta-analysis of aggregate data that incorporates both within- and between-study variations were recently proposed. In this guide, we present 3 approaches for estimating the reference range: one frequentist, one Bayesian, and one empirical. Each method can be applied to either aggregate or individual-participant data meta-analysis, with the latter being the gold standard when available. We illustrate the application of these approaches to data from a previously published individual-participant data meta-analysis of studies measuring liver stiffness by transient elastography in healthy individuals between 2006 and 2016.
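A minimal sketch of the frequentist approach, assuming the reference range is centred on the random-effects pooled mean and widened by both the between-study variance τ² and the pooled within-study variance σ² of individual measurements; the study summaries below are hypothetical:

```python
import numpy as np
from scipy import stats

# A minimal sketch of the frequentist reference-range approach: the range
# reflects variation in individuals (sigma^2) plus variation across study
# populations (tau^2), not just uncertainty in the pooled mean.
means = np.array([4.6, 5.1, 4.8, 5.4, 4.9])  # hypothetical study means
sds = np.array([0.9, 1.1, 1.0, 1.2, 0.8])    # within-study SDs of individuals
ns = np.array([120, 85, 200, 60, 150])       # study sizes

# Random-effects (DerSimonian-Laird) pooled mean and tau^2, weighting by the
# variance of each study mean (sd^2 / n).
v = sds**2 / ns
w = 1 / v
mu_fixed = np.sum(w * means) / np.sum(w)
q = np.sum(w * (means - mu_fixed)**2)
tau2 = max(0.0, (q - (len(means) - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))
w_re = 1 / (v + tau2)
mu = np.sum(w_re * means) / np.sum(w_re)

# Pooled within-study variance of individual measurements.
sigma2 = np.sum((ns - 1) * sds**2) / np.sum(ns - 1)

z = stats.norm.ppf(0.975)
half = z * np.sqrt(tau2 + sigma2)
print(f"95% reference range: ({mu - half:.2f}, {mu + half:.2f})")
```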
60. Allotey J, Whittle R, Snell KIE, Smuk M, Townsend R, von Dadelszen P, Heazell AEP, Magee L, Smith GCS, Sandall J, Thilaganathan B, Zamora J, Riley RD, Khalil A, Thangaratinam S. External validation of prognostic models to predict stillbirth using International Prediction of Pregnancy Complications (IPPIC) Network database: individual participant data meta-analysis. Ultrasound Obstet Gynecol 2022; 59:209-219. PMID: 34405928; DOI: 10.1002/uog.23757.
Abstract
OBJECTIVE Stillbirth is a potentially preventable complication of pregnancy. Identifying women at high risk of stillbirth can guide decisions on the need for closer surveillance and timing of delivery in order to prevent fetal death. Prognostic models have been developed to predict the risk of stillbirth, but none has yet been validated externally. In this study, we externally validated published prediction models for stillbirth using individual participant data (IPD) meta-analysis to assess their predictive performance. METHODS MEDLINE, EMBASE, DH-DATA and AMED databases were searched from inception to December 2020 to identify studies reporting stillbirth prediction models. Studies that developed or updated prediction models for stillbirth for use at any time during pregnancy were included. IPD from cohorts within the International Prediction of Pregnancy Complications (IPPIC) Network were used to validate externally the identified prediction models whose individual variables were available in the IPD. The risk of bias of the models and cohorts was assessed using the Prediction study Risk Of Bias ASsessment Tool (PROBAST). The discriminative performance of the models was evaluated using the C-statistic, and calibration was assessed using calibration plots, calibration slope and calibration-in-the-large. Performance measures were estimated separately in each cohort, as well as summarized across cohorts using random-effects meta-analysis. Clinical utility was assessed using net benefit. RESULTS Seventeen studies reporting the development of 40 prognostic models for stillbirth were identified. None of the models had been previously validated externally, and the full model equation was reported for only one-fifth (20%, 8/40) of the models. External validation was possible for three of these models, using IPD from 19 cohorts (491 201 pregnant women) within the IPPIC Network database. Based on evaluation of the model development studies, all three models had an overall high risk of bias, according to PROBAST. In the IPD meta-analysis, the models had summary C-statistics ranging from 0.53 to 0.65 and summary calibration slopes ranging from 0.40 to 0.88, with risk predictions that were generally too extreme compared with the observed risks. The models had little to no clinical utility, as assessed by net benefit. However, there remained uncertainty in the performance of some models due to small available sample sizes. CONCLUSIONS The three validated stillbirth prediction models showed generally poor and uncertain predictive performance in new data, with limited evidence to support their clinical application. The findings suggest methodological shortcomings in their development, including overfitting. Further research is needed to further validate these and other models, identify stronger prognostic factors and develop more robust prediction models. © 2021 The Authors. Ultrasound in Obstetrics & Gynecology published by John Wiley & Sons Ltd on behalf of International Society of Ultrasound in Obstetrics and Gynecology.
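Net benefit, used above to assess clinical utility, has a simple closed form at a chosen risk threshold. A minimal sketch with simulated data (not the IPPIC data):

```python
import numpy as np

# A minimal sketch of the net-benefit calculation used in decision curve
# analysis: at risk threshold p_t, net benefit = TP/n - (FP/n) * p_t/(1 - p_t),
# compared against "treat all" and "treat none" strategies.
rng = np.random.default_rng(7)
risk = rng.beta(1, 60, 10_000)   # predicted risks for a rare outcome
y = rng.binomial(1, risk)        # simulated outcomes consistent with the risks

def net_benefit(y, risk, p_t):
    n = len(y)
    treat = risk >= p_t
    tp = np.sum(treat & (y == 1))
    fp = np.sum(treat & (y == 0))
    return tp / n - (fp / n) * p_t / (1 - p_t)

p_t = 0.02
nb_model = net_benefit(y, risk, p_t)
nb_all = y.mean() - (1 - y.mean()) * p_t / (1 - p_t)  # treat everyone
print(f"model: {nb_model:.4f}  treat-all: {nb_all:.4f}  treat-none: 0")
```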
61. Andaur Navarro CL, Damen JAA, Takada T, Nijman SWJ, Dhiman P, Ma J, Collins GS, Bajpai R, Riley RD, Moons KGM, Hooft L. Completeness of reporting of clinical prediction models developed using supervised machine learning: a systematic review. BMC Med Res Methodol 2022; 22:12. PMID: 35026997; PMCID: PMC8759172; DOI: 10.1186/s12874-021-01469-6.
Abstract
BACKGROUND While many studies have consistently found incomplete reporting of regression-based prediction model studies, evidence is lacking for machine learning-based prediction model studies. We aim to systematically review the adherence of Machine Learning (ML)-based prediction model studies to the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) Statement. METHODS We included articles reporting on development or external validation of a multivariable prediction model (either diagnostic or prognostic) developed using supervised ML for individualized predictions across all medical fields. We searched PubMed from 1 January 2018 to 31 December 2019. Data extraction was performed using the 22-item checklist for reporting of prediction model studies (www.TRIPOD-statement.org). We measured the overall adherence per article and per TRIPOD item. RESULTS Our search identified 24,814 articles, of which 152 articles were included: 94 (61.8%) prognostic and 58 (38.2%) diagnostic prediction model studies. Overall, articles adhered to a median of 38.7% (IQR 31.0-46.4%) of TRIPOD items. No article fully adhered to complete reporting of the abstract and very few reported the flow of participants (3.9%, 95% CI 1.8 to 8.3), appropriate title (4.6%, 95% CI 2.2 to 9.2), blinding of predictors (4.6%, 95% CI 2.2 to 9.2), model specification (5.2%, 95% CI 2.4 to 10.8), and model's predictive performance (5.9%, 95% CI 3.1 to 10.9). There was often complete reporting of source of data (98.0%, 95% CI 94.4 to 99.3) and interpretation of the results (94.7%, 95% CI 90.0 to 97.3). CONCLUSION Similar to prediction model studies developed using conventional regression-based techniques, the completeness of reporting is poor. Essential information to decide to use the model (i.e. model specification and its performance) is rarely reported. However, some items and sub-items of TRIPOD might be less suitable for ML-based prediction model studies and thus, TRIPOD requires extensions. Overall, there is an urgent need to improve the reporting quality and usability of research to avoid research waste. SYSTEMATIC REVIEW REGISTRATION PROSPERO, CRD42019161764.
62. Ramspek CL, Teece L, Snell KIE, Evans M, Riley RD, van Smeden M, van Geloven N, van Diepen M. Lessons learnt when accounting for competing events in the external validation of time-to-event prognostic models. Int J Epidemiol 2021; 51:615-625. PMID: 34919691; PMCID: PMC9082803; DOI: 10.1093/ije/dyab256.
Abstract
Background External validation of prognostic models is necessary to assess the accuracy and generalizability of the model to new patients. If models are validated in a setting in which competing events occur, these competing risks should be accounted for when comparing predicted risks to observed outcomes. Methods We discuss existing measures of calibration and discrimination that incorporate competing events for time-to-event models. These methods are illustrated using a clinical-data example concerning the prediction of kidney failure in a population with advanced chronic kidney disease (CKD), using the guideline-recommended Kidney Failure Risk Equation (KFRE). The KFRE was developed using Cox regression in a diverse population of CKD patients and has been proposed for use in patients with advanced CKD in whom death is a frequent competing event. Results When validating the 5-year KFRE with methods that account for competing events, it becomes apparent that the 5-year KFRE considerably overestimates the real-world risk of kidney failure. The absolute overestimation was 10 percentage points on average and 29 percentage points in older high-risk patients. Conclusions It is crucial that competing events are accounted for during external validation to provide a more reliable assessment of the performance of a model in clinical settings in which competing risks occur.
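A minimal sketch of the kind of check described above: the observed risk at the prediction horizon is estimated with the Aalen-Johansen cumulative incidence (implemented by hand here), so that deaths are treated as competing events rather than censored, and compared with the mean predicted risk as an O/E ratio. All data are simulated, not the KFRE validation data:

```python
import numpy as np

# A minimal sketch of calibration-in-the-large under competing risks: the
# observed risk is the Aalen-Johansen cumulative incidence of the event of
# interest, which handles the competing event (death) properly.
def cumulative_incidence(time, status, horizon, cause=1):
    """status: 0 = censored, 1 = event of interest, 2 = competing event."""
    order = np.argsort(time)
    time, status = time[order], status[order]
    at_risk = len(time)
    surv = 1.0  # overall event-free survival just before t
    cif = 0.0
    for t, s in zip(time, status):
        if t > horizon:
            break
        if s == cause:
            cif += surv / at_risk       # increment: S(t-) * d/n
        if s in (1, 2):
            surv *= 1 - 1 / at_risk     # any event reduces event-free survival
        at_risk -= 1
    return cif

# Hypothetical validation data: predicted 5-year kidney-failure risks plus
# observed follow-up with death as a competing event.
rng = np.random.default_rng(3)
n = 2000
pred = rng.beta(2, 8, n)                  # predicted 5-year risks
time = rng.exponential(8, n)
status = np.where(time > 5, 0, rng.choice([1, 2], n, p=[0.6, 0.4]))
time = np.minimum(time, 5.0)              # administrative censoring at 5 years

observed = cumulative_incidence(time, status, horizon=5.0)
print(f"O/E ratio: {observed / pred.mean():.2f}")  # <1 indicates overestimation
```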
63. Riley RD. Applied meta-analysis with R and Stata. Ding-Geng Chen, Karl E. Peace (2021). Boca Raton, FL: Chapman and Hall/CRC Press, 2nd ed., 456 pages. ISBN 9780367183837. Biom J 2021. DOI: 10.1002/bimj.202100226.
64. Martin GP, Riley RD, Collins GS, Sperrin M. Developing clinical prediction models when adhering to minimum sample size recommendations: The importance of quantifying bootstrap variability in tuning parameters and predictive performance. Stat Methods Med Res 2021; 30:2545-2561. PMID: 34623193; PMCID: PMC8649413; DOI: 10.1177/09622802211046388.
Abstract
Recent minimum sample size formulae (Riley et al.) for developing clinical prediction models help ensure that development datasets are of sufficient size to minimise overfitting. While these criteria are known to avoid excessive overfitting on average, the extent of variability in overfitting at recommended sample sizes is unknown. We investigated this through a simulation study and empirical example to develop logistic regression clinical prediction models using unpenalised maximum likelihood estimation, and various post-estimation shrinkage or penalisation methods. While the mean calibration slope was close to the ideal value of one for all methods, penalisation further reduced the level of overfitting, on average, compared to unpenalised methods. This came at the cost of higher variability in predictive performance for penalisation methods in external data. We recommend that penalisation methods are used in data that meet, or surpass, minimum sample size requirements to further mitigate overfitting, and that the variability in predictive performance and any tuning parameters should always be examined as part of the model development process, since this provides additional information over average (optimism-adjusted) performance alone. Lower variability would give reassurance that the developed clinical prediction model will perform well in new individuals from the same population as was used for model development.
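A minimal sketch of inspecting bootstrap variability rather than only the average, assuming the shrinkage factor is estimated in each bootstrap replicate as the calibration slope of the bootstrap model's linear predictor evaluated on the original data (a standard bootstrap shrinkage estimate, not the paper's exact code):

```python
import numpy as np
import statsmodels.api as sm

# A minimal sketch: examine the spread, not just the mean, of the bootstrap
# calibration slope (uniform shrinkage factor) for an unpenalised logistic
# model developed on simulated data.
rng = np.random.default_rng(42)
n, p = 800, 8
X = rng.normal(size=(n, p))
beta = rng.normal(0, 0.3, p)
y = rng.binomial(1, 1 / (1 + np.exp(-(X @ beta - 1.0))))

def fit_lp(Xtr, ytr, Xev):
    m = sm.Logit(ytr, sm.add_constant(Xtr)).fit(disp=0)  # unpenalised MLE
    return sm.add_constant(Xev) @ m.params

slopes = []
for _ in range(200):
    idx = rng.integers(0, n, n)                 # bootstrap resample
    lp = fit_lp(X[idx], y[idx], X)              # bootstrap model's LP on original data
    cal = sm.Logit(y, sm.add_constant(lp)).fit(disp=0)
    slopes.append(cal.params[1])                # calibration slope = shrinkage estimate

slopes = np.array(slopes)
print(f"mean slope {slopes.mean():.2f}, 2.5-97.5 percentiles "
      f"({np.percentile(slopes, 2.5):.2f}, {np.percentile(slopes, 97.5):.2f})")
```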
65. Hoogland J, IntHout J, Belias M, Rovers MM, Riley RD, Harrell FE Jr, Moons KGM, Debray TPA, Reitsma JB. A tutorial on individualized treatment effect prediction from randomized trials with a binary endpoint. Stat Med 2021; 40:5961-5981. PMID: 34402094; PMCID: PMC9291969; DOI: 10.1002/sim.9154.
Abstract
Randomized trials typically estimate average relative treatment effects, but decisions on the benefit of a treatment are possibly better informed by more individualized predictions of the absolute treatment effect. In case of a binary outcome, these predictions of absolute individualized treatment effect require knowledge of the individual's risk without treatment and incorporation of a possibly differential treatment effect (ie, varying with patient characteristics). In this article, we lay out the causal structure of individualized treatment effect in terms of potential outcomes and describe the required assumptions that underlie a causal interpretation of its prediction. Subsequently, we describe regression models and model estimation techniques that can be used to move from average to more individualized treatment effect predictions. We focus mainly on logistic regression-based methods that are both well-known and naturally provide the required probabilistic estimates. We incorporate key components from both causal inference and prediction research to arrive at individualized treatment effect predictions. While the separate components are well known, their successful amalgamation is very much an ongoing field of research. We cut the problem down to its essentials in the setting of a randomized trial, discuss the importance of a clear definition of the estimand of interest, provide insight into the required assumptions, and give guidance with respect to modeling and estimation options. Simulated data illustrate the potential of different modeling options across scenarios that vary both average treatment effect and treatment effect heterogeneity. Two applied examples illustrate individualized treatment effect prediction in randomized trial data.
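A minimal sketch of the logistic-regression approach described above, on simulated trial data: fit a model with a treatment-by-covariate interaction, then predict each patient's risk under treatment and under control; the difference is the predicted absolute individualized treatment effect:

```python
import numpy as np
import statsmodels.api as sm

# A minimal sketch (simulated data, illustrative coefficients) of predicting
# individualized absolute treatment effects from a randomized trial.
rng = np.random.default_rng(0)
n = 3000
x = rng.normal(size=n)                        # prognostic covariate
tx = rng.binomial(1, 0.5, n)                  # randomized treatment
logit = -1.0 + 0.8 * x - 0.6 * tx + 0.3 * tx * x
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X = np.column_stack([x, tx, tx * x])          # main effects + interaction
model = sm.Logit(y, sm.add_constant(X)).fit(disp=0)

def risk(x, tx):
    X_new = np.column_stack([x, tx, tx * x])
    # has_constant='add' forces the intercept column even when tx is constant
    return model.predict(sm.add_constant(X_new, has_constant='add'))

ite = risk(x, np.ones(n)) - risk(x, np.zeros(n))  # absolute risk difference
print(f"predicted ITE: mean {ite.mean():.3f}, "
      f"range ({ite.min():.3f}, {ite.max():.3f})")
```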
66. Foroutan F, Guyatt G, Trivella M, Kreuzberger N, Skoetz N, Riley RD, Roshanov PS, Alba AC, Sekercioglu N, Canelo C, Munn Z, Brignardello-Petersen R, Schünemann HJ, Iorio A. GRADE concept paper 2: Concepts for judging certainty on the calibration of prognostic models in a body of validation studies. J Clin Epidemiol 2021; 143:202-211. PMID: 34800677; DOI: 10.1016/j.jclinepi.2021.11.024.
Abstract
Prognostic models combine several prognostic factors to provide an estimate of the likelihood (or risk) of future events in individual patients, conditional on their prognostic factor values. A fundamental part of evaluating prognostic models is undertaking studies to determine whether their predictive performance, such as calibration and discrimination, is reproduced across settings. Systematic reviews and meta-analyses of studies evaluating prognostic models' performance are a necessary step for selection of models for clinical practice and for testing the underlying assumption that their use will improve outcomes, including patients' reassurance and optimal future planning. In this paper, we highlight key concepts in evaluating the certainty of evidence regarding the calibration of prognostic models. Four concepts are key to evaluating the certainty of evidence on prognostic models' performance regarding calibration. The first concept is that the inference regarding calibration may take one of two forms: deciding whether one is rating certainty that a model's performance is satisfactory or, instead, unsatisfactory, in either case defining the threshold for satisfactory (or unsatisfactory) model performance. Second, inconsistency is the critical GRADE domain for deciding whether we are rating certainty in the model performance being satisfactory or unsatisfactory. Third, depending on whether one is rating certainty in satisfactory or unsatisfactory performance, different patterns of inconsistency of results across studies will inform ratings of certainty of evidence. Fourth, exploring the distribution of point estimates of the observed to expected ratio across individual studies, and its determinants, will bear on the need for and direction of future research.
67. Adab P, Jordan RE, Fitzmaurice D, Ayres JG, Cheng KK, Cooper BG, Daley A, Dickens A, Enocson A, Greenfield S, Haroon S, Jolly K, Jowett S, Lambe T, Martin J, Miller MR, Rai K, Riley RD, Sadhra S, Sitch A, Siebert S, Stockley RA, Turner A. Case-finding and improving patient outcomes for chronic obstructive pulmonary disease in primary care: the BLISS research programme including cluster RCT. Programme Grants for Applied Research 2021; 9(13). DOI: 10.3310/pgfar09130.
Abstract
Background
Chronic obstructive pulmonary disease is a major contributor to morbidity, mortality and health service costs but is vastly underdiagnosed. Evidence on screening and how best to approach this is not clear. There are also uncertainties around the natural history (prognosis) of chronic obstructive pulmonary disease and how it impacts on work performance.
Objectives
Work package 1: to evaluate alternative methods of screening for undiagnosed chronic obstructive pulmonary disease in primary care, with clinical effectiveness and cost-effectiveness analyses and an economic model of a routine screening programme. Work package 2: to recruit a primary care chronic obstructive pulmonary disease cohort, develop a prognostic model [Birmingham Lung Improvement StudieS (BLISS)] to predict risk of respiratory hospital admissions, validate an existing model to predict mortality risk, address some uncertainties about natural history and explore the potential for a home exercise intervention. Work package 3: to identify which factors are associated with employment, absenteeism, presenteeism (working while unwell) and evaluate the feasibility of offering formal occupational health assessment to improve work performance.
Design
Work package 1: a cluster randomised controlled trial with household-level randomised comparison of two alternative case-finding approaches in the intervention arm. Work package 2: cohort study – focus groups. Work package 3: subcohort – feasibility study.
Setting
Primary care settings in West Midlands, UK.
Participants
Work package 1: 74,818 people who have smoked aged 40–79 years without a previous chronic obstructive pulmonary disease diagnosis from 54 general practices. Work package 2: 741 patients with previously diagnosed chronic obstructive pulmonary disease from 71 practices and participants from the work package 1 randomised controlled trial. Twenty-six patients took part in focus groups. Work package 3: occupational subcohort with 248 patients in paid employment at baseline. Thirty-five patients took part in an occupational health intervention feasibility study.
Interventions
Work package 1: targeted case-finding – symptom screening questionnaire, administered opportunistically or additionally by post, followed by diagnostic post-bronchodilator spirometry. The comparator was routine care. Work package 2: twenty-three candidate variables selected from literature and expert reviews. Work package 3: sociodemographic, clinical and occupational characteristics; occupational health assessment and recommendations.
Main outcome measures
Work package 1: yield (screen-detected chronic obstructive pulmonary disease) and cost-effectiveness of case-finding; effectiveness of screening on respiratory hospitalisation and mortality after approximately 4 years. Work package 2: respiratory hospitalisation within 2 years, and barriers to and facilitators of physical activity. Work package 3: work performance – feasibility and acceptability of the occupational health intervention and study processes.
Results
Work package 1: targeted case-finding resulted in greater yield of previously undiagnosed chronic obstructive pulmonary disease than routine care at 1 year [n = 1278 (4%) vs. n = 337 (1%), respectively; adjusted odds ratio 7.45, 95% confidence interval 4.80 to 11.55], and a model-based estimate of a regular screening programme suggested an incremental cost-effectiveness ratio of £16,596 per additional quality-adjusted life-year gained. However, long-term follow-up of the trial showed that at ≈4 years there was no clear evidence that case-finding, compared with routine practice, was effective in reducing respiratory admissions (adjusted hazard ratio 1.04, 95% confidence interval 0.73 to 1.47) or mortality (hazard ratio 1.15, 95% confidence interval 0.82 to 1.61). Work package 2: 2305 patients, comprising 1564 with previously diagnosed chronic obstructive pulmonary disease and 741 work package 1 participants (330 with and 411 without obstruction), were recruited. The BLISS prognostic model among cohort participants with confirmed airflow obstruction (n = 1894) included 6 of 23 candidate variables (i.e. age, Chronic Obstructive Pulmonary Disease Assessment Test score, 12-month respiratory admissions, body mass index, diabetes and forced expiratory volume in 1 second percentage predicted). After internal validation and adjustment (uniform shrinkage factor 0.87, 95% confidence interval 0.72 to 1.02), the model discriminated well in predicting 2-year respiratory hospital admissions (c-statistic 0.75, 95% confidence interval 0.72 to 0.79). In focus groups, physical activity engagement was related to self-efficacy and symptom severity. Work package 3: in the occupational subcohort, increasing dyspnoea and exposure to inhaled irritants were associated with lower work productivity at baseline. Longitudinally, increasing exacerbations and worsening symptoms, but not a decline in airflow obstruction, were associated with absenteeism and presenteeism. The acceptability of the occupational health intervention was low, leading to low uptake and low implementation of recommendations and making a full trial unfeasible.
Limitations
Work package 1: even with the most intensive approach, only 38% of patients responded to the case-finding invitation. Management of case-found patients with chronic obstructive pulmonary disease in primary care was generally poor, limiting interpretation of the long-term effectiveness of case-finding on clinical outcomes. Work package 2: the components of the BLISS model may not always be routinely available and calculation of the score requires a computerised system. Work package 3: relatively few cohort participants were in paid employment at baseline, limiting the interpretation of predictors of lower work productivity.
Conclusions
This programme has addressed some of the major uncertainties around screening for undiagnosed chronic obstructive pulmonary disease and has resulted in the development of a novel, accurate model for predicting respiratory hospitalisation in people with chronic obstructive pulmonary disease and the inception of a primary care chronic obstructive pulmonary disease cohort for longer-term follow-up. We have also identified factors that may affect work productivity in people with chronic obstructive pulmonary disease as potential targets for future intervention.
Future work
We plan to obtain data for longer-term follow-up of trial participants at 10 years. The BLISS model needs to be externally validated. Our primary care chronic obstructive pulmonary disease cohort is a unique resource for addressing further questions to better understand the prognosis of chronic obstructive pulmonary disease.
Trial registration
Current Controlled Trials ISRCTN14930255.
Funding
This project was funded by the National Institute for Health Research (NIHR) Programme Grants for Applied Research programme and will be published in full in Programme Grants for Applied Research; Vol. 9, No. 13. See the NIHR Journals Library website for further project information.
68. Andaur Navarro CL, Damen JAA, Takada T, Nijman SWJ, Dhiman P, Ma J, Collins GS, Bajpai R, Riley RD, Moons KGM, Hooft L. Risk of bias in studies on prediction models developed using supervised machine learning techniques: systematic review. BMJ 2021; 375:n2281. PMID: 34670780; PMCID: PMC8527348; DOI: 10.1136/bmj.n2281.
Abstract
OBJECTIVE To assess the methodological quality of studies on prediction models developed using machine learning techniques across all medical specialties. DESIGN Systematic review. DATA SOURCES PubMed from 1 January 2018 to 31 December 2019. ELIGIBILITY CRITERIA Articles reporting on the development, with or without external validation, of a multivariable prediction model (diagnostic or prognostic) developed using supervised machine learning for individualised predictions. No restrictions applied for study design, data source, or predicted patient related health outcomes. REVIEW METHODS Methodological quality of the studies was determined and risk of bias evaluated using the prediction risk of bias assessment tool (PROBAST). This tool contains 21 signalling questions tailored to identify potential biases in four domains. Risk of bias was measured for each domain (participants, predictors, outcome, and analysis) and each study (overall). RESULTS 152 studies were included: 58 (38%) included a diagnostic prediction model and 94 (62%) a prognostic prediction model. PROBAST was applied to 152 developed models and 19 external validations. Of these 171 analyses, 148 (87%, 95% confidence interval 81% to 91%) were rated at high risk of bias. The analysis domain was most frequently rated at high risk of bias. Of the 152 models, 85 (56%, 48% to 64%) were developed with an inadequate number of events per candidate predictor, 62 handled missing data inadequately (41%, 33% to 49%), and 59 assessed overfitting improperly (39%, 31% to 47%). Most models used appropriate data sources to develop (73%, 66% to 79%) and externally validate the machine learning based prediction models (74%, 51% to 88%). Information about blinding of outcome and blinding of predictors was, however, absent in 60 (40%, 32% to 47%) and 79 (52%, 44% to 60%) of the developed models, respectively. CONCLUSION Most studies on machine learning based prediction models show poor methodological quality and are at high risk of bias. Factors contributing to risk of bias include small study size, poor handling of missing data, and failure to deal with overfitting. Efforts to improve the design, conduct, reporting, and validation of such studies are necessary to boost the application of machine learning based prediction models in clinical practice. SYSTEMATIC REVIEW REGISTRATION PROSPERO CRD42019161764.
69. Allotey J, Snell KI, Smuk M, Hooper R, Chan CL, Ahmed A, Chappell LC, von Dadelszen P, Dodds J, Green M, Kenny L, Khalil A, Khan KS, Mol BW, Myers J, Poston L, Thilaganathan B, Staff AC, Smith GC, Ganzevoort W, Laivuori H, Odibo AO, Ramírez JA, Kingdom J, Daskalakis G, Farrar D, Baschat AA, Seed PT, Prefumo F, da Silva Costa F, Groen H, Audibert F, Masse J, Skråstad RB, Salvesen KÅ, Haavaldsen C, Nagata C, Rumbold AR, Heinonen S, Askie LM, Smits LJ, Vinter CA, Magnus PM, Eero K, Villa PM, Jenum AK, Andersen LB, Norman JE, Ohkuchi A, Eskild A, Bhattacharya S, McAuliffe FM, Galindo A, Herraiz I, Carbillon L, Klipstein-Grobusch K, Yeo S, Teede HJ, Browne JL, Moons KG, Riley RD, Thangaratinam S. Validation and development of models using clinical, biochemical and ultrasound markers for predicting pre-eclampsia: an individual participant data meta-analysis. Health Technol Assess 2021; 24:1-252. PMID: 33336645; DOI: 10.3310/hta24720.
Abstract
BACKGROUND Pre-eclampsia is a leading cause of maternal and perinatal mortality and morbidity. Early identification of women at risk is needed to plan management. OBJECTIVES To assess the performance of existing pre-eclampsia prediction models and to develop and validate models for pre-eclampsia using individual participant data meta-analysis. We also estimated the prognostic value of individual markers. DESIGN This was an individual participant data meta-analysis of cohort studies. SETTING Source data from secondary and tertiary care. PREDICTORS We identified predictors from systematic reviews, and prioritised for importance in an international survey. PRIMARY OUTCOMES Early-onset (delivery at < 34 weeks' gestation), late-onset (delivery at ≥ 34 weeks' gestation) and any-onset pre-eclampsia. ANALYSIS We externally validated existing prediction models in UK cohorts and reported their performance in terms of discrimination and calibration. We developed and validated 12 new models based on clinical characteristics, clinical characteristics and biochemical markers, and clinical characteristics and ultrasound markers in the first and second trimesters. We summarised the data set-specific performance of each model using a random-effects meta-analysis. Discrimination was considered promising for C-statistics of ≥ 0.7, and calibration was considered good if the slope was near 1 and calibration-in-the-large was near 0. Heterogeneity was quantified using I² and τ². A decision curve analysis was undertaken to determine the clinical utility (net benefit) of the models. We reported the unadjusted prognostic value of individual predictors for pre-eclampsia as odds ratios with 95% confidence and prediction intervals. RESULTS The International Prediction of Pregnancy Complications network comprised 78 studies (3,570,993 singleton pregnancies) identified from systematic reviews of tests to predict pre-eclampsia. Twenty-four of the 131 published prediction models could be validated in 11 UK cohorts. Summary C-statistics were between 0.6 and 0.7 for most models, and calibration was generally poor owing to large between-study heterogeneity, suggesting model overfitting. The clinical utility of the models varied from showing net harm to showing minimal or no net benefit. The average discrimination for IPPIC models ranged between 0.68 and 0.83. This was highest for the second-trimester clinical characteristics and biochemical markers model to predict early-onset pre-eclampsia, and lowest for the first-trimester clinical characteristics models to predict any pre-eclampsia. Calibration performance was heterogeneous across studies. Net benefit was observed for International Prediction of Pregnancy Complications first and second-trimester clinical characteristics and clinical characteristics and biochemical markers models predicting any pre-eclampsia, when validated in singleton nulliparous women managed in the UK NHS. History of hypertension, parity, smoking, mode of conception, placental growth factor and uterine artery pulsatility index had the strongest unadjusted associations with pre-eclampsia. LIMITATIONS Variations in study population characteristics, type of predictors reported, too few events in some validation cohorts and the type of measurements contributed to heterogeneity in performance of the International Prediction of Pregnancy Complications models. Some published models were not validated because model predictors were unavailable in the individual participant data.
CONCLUSION For models that could be validated, predictive performance was generally poor across data sets. Although the International Prediction of Pregnancy Complications models show good predictive performance on average, and in the singleton nulliparous population, heterogeneity in calibration performance is likely across settings. FUTURE WORK Recalibration of model parameters within populations may improve calibration performance. Additional strong predictors need to be identified to improve model performance and consistency. Validation, including examination of calibration heterogeneity, is required for the models we could not validate. STUDY REGISTRATION This study is registered as PROSPERO CRD42015029349. FUNDING This project was funded by the National Institute for Health Research (NIHR) Health Technology Assessment programme and will be published in full in Health Technology Assessment; Vol. 24, No. 72. See the NIHR Journals Library website for further project information.
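The random-effects summary of cohort-specific performance described above is commonly done on the logit scale for C-statistics. A minimal sketch with hypothetical cohort estimates (not the IPPIC results):

```python
import numpy as np

# A minimal sketch of pooling cohort-specific C-statistics with a
# random-effects (DerSimonian-Laird) meta-analysis on the logit scale.
c = np.array([0.66, 0.71, 0.63, 0.69, 0.74])    # hypothetical C-statistics
se_c = np.array([0.03, 0.04, 0.02, 0.05, 0.04]) # hypothetical standard errors

logit_c = np.log(c / (1 - c))
se_logit = se_c / (c * (1 - c))   # delta method for the logit transform
w = 1 / se_logit**2
mu_f = np.sum(w * logit_c) / np.sum(w)
q = np.sum(w * (logit_c - mu_f)**2)
tau2 = max(0.0, (q - (len(c) - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))
w_re = 1 / (se_logit**2 + tau2)
mu = np.sum(w_re * logit_c) / np.sum(w_re)
se_mu = np.sqrt(1 / np.sum(w_re))

inv = lambda z: 1 / (1 + np.exp(-z))  # back-transform to the C scale
print(f"summary C = {inv(mu):.2f} "
      f"(95% CI {inv(mu - 1.96 * se_mu):.2f} to {inv(mu + 1.96 * se_mu):.2f})")
```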
70. Dhiman P, Ma J, Navarro CA, Speich B, Bullock G, Damen JA, Kirtley S, Hooft L, Riley RD, Van Calster B, Moons KGM, Collins GS. Reporting of prognostic clinical prediction models based on machine learning methods in oncology needs to be improved. J Clin Epidemiol 2021; 138:60-72. PMID: 34214626; PMCID: PMC8592577; DOI: 10.1016/j.jclinepi.2021.06.024.
Abstract
OBJECTIVE Evaluate the completeness of reporting of prognostic prediction models developed using machine learning methods in the field of oncology. STUDY DESIGN AND SETTING We conducted a systematic review, searching the MEDLINE and Embase databases between 01/01/2019 and 05/09/2019, for non-imaging studies developing a prognostic clinical prediction model using machine learning methods (as defined by primary study authors) in oncology. We used the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement to assess the reporting quality of included publications. We described overall reporting adherence of included publications and by each section of TRIPOD. RESULTS Sixty-two publications met the inclusion criteria. 48 were development studies and 14 were development with validation studies. 152 models were developed across all publications. Median adherence to TRIPOD reporting items was 41% [range: 10%-67%] and at least 50% adherence was found in 19% (n=12/62) of publications. Adherence was lower in development only studies (median: 38% [range: 10%-67%]); and higher in development with validation studies (median: 49% [range: 33%-59%]). CONCLUSION Reporting of clinical prediction models using machine learning in oncology is poor and needs urgent improvement, so readers and stakeholders can appraise the study methods, understand study findings, and reduce research waste.
71. Van Calster B, Wynants L, Riley RD, van Smeden M, Collins GS. Methodology over metrics: current scientific standards are a disservice to patients and society. J Clin Epidemiol 2021; 138:219-226. PMID: 34077797; PMCID: PMC8795888; DOI: 10.1016/j.jclinepi.2021.05.018.
Abstract
Covid-19 research made it painfully clear that the scandal of poor medical research, as denounced by Altman in 1994, persists today. The overall quality of medical research remains poor, despite longstanding criticisms. The problems are well known, but the research community fails to properly address them. We suggest that most problems stem from an underlying paradox: although methodology is undeniably the backbone of high-quality and responsible research, science consistently undervalues methodology. The focus remains more on the destination (research claims and metrics) than on the journey. Notwithstanding, research should serve society more than the reputation of those involved. While we notice that many initiatives are being established to improve components of the research cycle, these initiatives are too disjointed. The overall system is monolithic and slow to adapt. We assert that top-down action is needed from journals, universities, funders and governments to break the cycle and put methodology first. These actions should involve the widespread adoption of registered reports, balanced research funding between innovative, incremental and methodological research projects, full recognition and demystification of peer review, improved methodological review of reports, adherence to reporting guidelines, and investment in methodological education and research. Currently, the scientific enterprise is doing a major disservice to patients and society.
|
72
|
Bullock GS, Hughes T, Sergeant JC, Callaghan MJ, Riley RD, Collins GS. Clinical Prediction Models in Sports Medicine: A Guide for Clinicians and Researchers. J Orthop Sports Phys Ther 2021; 51:517-525. [PMID: 34592832] [DOI: 10.2519/jospt.2021.10697]
Abstract
SYNOPSIS Participating in sport carries an inherent risk of injury. Clinicians exercise high-level clinical reasoning and decision making to help athletes achieve the best outcomes, yet accurately diagnosing a problem, estimating prognosis, or selecting the most suitable intervention for each athlete is challenging. Clinical prediction models are tools that assist clinicians in estimating the risk or probability of a health outcome for an individual by using data from multiple predictors. Although common in the general medical literature, clinical prediction models are rare in sports medicine. The purpose of this article was to (1) describe the steps required to develop and validate (ie, evaluate) a clinical prediction model for clinical researchers, and (2) help sports medicine clinicians understand and interpret clinical prediction model studies. Using a case study to illustrate how to implement clinical prediction models in practice, we address the following issues in developing and validating a clinical prediction model: study design and data, sample size, missing data, selecting predictors, handling continuous predictors, model fitting, internal and external validation, performance measures, reporting, and model presentation. Our work builds on initiatives to improve diagnostic and prognostic clinical research, including the PROGnosis RESearch Strategy (PROGRESS) series of papers and textbook and the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement.
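To make the development and validation steps listed above concrete, here is a minimal Python sketch that fits a logistic regression prediction model on synthetic data and computes two of the performance measures the article discusses: the C-statistic (discrimination) and the calibration slope. All data, predictors, and parameter values are illustrative assumptions, not the article's case study.

```python
# Minimal sketch of a prediction model development-and-validation workflow,
# using synthetic data. Names and numbers are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Synthetic cohort: three hypothetical predictors and a binary outcome (e.g. injury)
n = 2000
X = rng.normal(size=(n, 3))
true_lp = -2.0 + 0.8 * X[:, 0] + 0.5 * X[:, 1]      # true linear predictor
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-true_lp)))

# Split into development and validation halves (a true external validation
# would instead use a separate cohort)
X_dev, X_val, y_dev, y_val = train_test_split(X, y, test_size=0.5, random_state=1)

model = LogisticRegression(C=1e6).fit(X_dev, y_dev)  # effectively unpenalised fit

# Discrimination: C-statistic (area under the ROC curve) in the validation half
risk = model.predict_proba(X_val)[:, 1]
print(f"C-statistic: {roc_auc_score(y_val, risk):.2f}")

# Calibration slope: regress the outcome on the linear predictor;
# a slope near 1 suggests neither over- nor under-fitting
lp = np.log(risk / (1 - risk))
slope = LogisticRegression(C=1e6).fit(lp.reshape(-1, 1), y_val).coef_[0, 0]
print(f"Calibration slope: {slope:.2f}")
```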
|
73
|
Stock SJ, Horne M, Bruijn M, White H, Heggie R, Wotherspoon L, Boyd K, Aucott L, Morris RK, Dorling J, Jackson L, Chandiramani M, David A, Khalil A, Shennan A, Baaren GJV, Hodgetts-Morton V, Lavender T, Schuit E, Harper-Clarke S, Mol B, Riley RD, Norman J, Norrie J. A prognostic model, including quantitative fetal fibronectin, to predict preterm labour: the QUIDS meta-analysis and prospective cohort study. Health Technol Assess 2021; 25:1-168. [PMID: 34498576] [DOI: 10.3310/hta25520]
Abstract
BACKGROUND The diagnosis of preterm labour is challenging. False-positive diagnoses are common and result in unnecessary, potentially harmful treatments (e.g. tocolytics, antenatal corticosteroids and magnesium sulphate) and costly hospital admissions. Measurement of fetal fibronectin in vaginal fluid is a biochemical test that can indicate impending preterm birth. OBJECTIVES To develop an externally validated prognostic model using quantitative fetal fibronectin concentration, in combination with clinical risk factors, for the prediction of spontaneous preterm birth, and to assess its cost-effectiveness. DESIGN The study comprised (1) a qualitative study to establish the decisional needs of pregnant women and their caregivers; (2) an individual participant data meta-analysis of existing studies to develop a prognostic model for spontaneous preterm birth within 7 days in women with symptoms of preterm labour, based on quantitative fetal fibronectin and clinical risk factors; (3) external validation of the prognostic model in a prospective cohort study across 26 UK centres; (4) a model-based economic evaluation comparing the prognostic model with qualitative fetal fibronectin, and quantitative fetal fibronectin with cervical length measurement, in terms of cost per quality-adjusted life-year (QALY) gained; and (5) a qualitative assessment of the acceptability of quantitative fetal fibronectin. DATA SOURCES/SETTING The model was developed using data from five European prospective cohort studies of quantitative fetal fibronectin. The UK prospective cohort study was carried out across 26 UK centres. PARTICIPANTS Pregnant women at 22+0 to 34+6 weeks' gestation with signs and symptoms of preterm labour. HEALTH TECHNOLOGY BEING ASSESSED Quantitative fetal fibronectin. MAIN OUTCOME MEASURES Spontaneous preterm birth within 7 days. RESULTS The individual participant data meta-analysis included 1783 women and 139 events of spontaneous preterm birth within 7 days (event rate 7.8%). The prognostic model that was developed included quantitative fetal fibronectin, smoking, ethnicity, nulliparity and multiple pregnancy. The model was externally validated in a cohort of 2837 women, with 83 events of spontaneous preterm birth within 7 days (event rate 2.93%), giving an area under the curve of 0.89 (95% confidence interval 0.84 to 0.93), a calibration slope of 1.22 and a Nagelkerke R² of 0.34. The economic analysis found that the prognostic model was cost-effective compared with using qualitative fetal fibronectin at a threshold for hospital admission and treatment of ≥ 2% risk of preterm birth within 7 days. LIMITATIONS The outcome proportion (spontaneous preterm birth within 7 days of test) was 2.9% in the validation study. This is in line with other studies, but having slightly fewer than 100 events is a limitation for model validation. CONCLUSIONS A prognostic model that included quantitative fetal fibronectin and clinical risk factors showed excellent performance in the prediction of spontaneous preterm birth within 7 days of test, was cost-effective, and can be used to inform a decision support tool to help guide management decisions for women with threatened preterm labour. FUTURE WORK The prognostic model will be embedded in electronic maternity records and a mobile telephone application, enabling ongoing data collection for further refinement and validation of the model. STUDY REGISTRATION This study is registered as PROSPERO CRD42015027590 and Current Controlled Trials ISRCTN41598423.
FUNDING This project was funded by the National Institute for Health Research (NIHR) Health Technology Assessment programme and will be published in full in Health Technology Assessment; Vol. 25, No. 52. See the NIHR Journals Library website for further project information.
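As a hedged illustration of two of the validation measures quoted in the abstract, the sketch below computes an observed/expected (O/E) ratio and Nagelkerke R² from simulated predicted risks and outcomes. It assumes the standard definitions of these measures and does not use the QUIDS data; all numbers are illustrative.

```python
# Minimal sketch of two calibration-related summaries for a validated model:
# the observed/expected (O/E) ratio and Nagelkerke R². Data are simulated.
import numpy as np

rng = np.random.default_rng(0)
n = 2837                                 # same size as the validation cohort
lp = rng.normal(-3.8, 1.2, size=n)       # hypothetical linear predictor values
risk = 1.0 / (1.0 + np.exp(-lp))         # model-predicted risks
y = rng.binomial(1, risk)                # simulated observed outcomes

# Calibration-in-the-large: observed events / expected events (ideal value: 1)
print(f"O/E ratio: {y.sum() / risk.sum():.2f}")

# Nagelkerke R²: Cox-Snell R² scaled by its maximum attainable value
ll_model = np.sum(y * np.log(risk) + (1 - y) * np.log(1 - risk))
p0 = y.mean()                            # null model: overall event proportion
ll_null = np.sum(y * np.log(p0) + (1 - y) * np.log(1 - p0))
r2_cs = 1 - np.exp(2 * (ll_null - ll_model) / n)   # Cox-Snell R²
print(f"Nagelkerke R²: {r2_cs / (1 - np.exp(2 * ll_null / n)):.2f}")
```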
|
74
|
Riley RD, Debray TPA, Collins GS, Archer L, Ensor J, van Smeden M, Snell KIE. Minimum sample size for external validation of a clinical prediction model with a binary outcome. Stat Med 2021; 40:4230-4251. [PMID: 34031906] [DOI: 10.1002/sim.9025]
Abstract
In prediction model research, external validation is needed to examine an existing model's performance using data independent of those used for model development. Current external validation studies often suffer from small sample sizes and, consequently, imprecise estimates of predictive performance. To address this, we propose how to determine the minimum sample size needed for a new external validation study of a prediction model for a binary outcome. Our calculations aim to precisely estimate calibration (observed/expected ratio and calibration slope), discrimination (C-statistic), and clinical utility (net benefit). For each measure, we propose closed-form and iterative solutions for calculating the minimum sample size required. These require specifying: (i) target standard errors (confidence interval widths) for each estimate of interest, (ii) the anticipated outcome event proportion in the validation population, (iii) the prediction model's anticipated (mis)calibration and the variance of its linear predictor values in the validation population, and (iv) potential risk thresholds for clinical decision making. The calculations can also be used to check whether the sample size of an existing (already collected) dataset is adequate for external validation. We illustrate our proposal for the external validation of a prediction model for mechanical heart valve failure with an expected outcome event proportion of 0.018. The calculations suggest that at least 9835 participants (177 events) are required to precisely estimate the calibration and discrimination measures, with this number driven by the calibration slope criterion, which we anticipate will often be the case. Also, 6443 participants (116 events) are required to precisely estimate net benefit at a risk threshold of 8%. Software code is provided.
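The closed-form solutions themselves are given in the paper and its accompanying software; as a hedged illustration of the general approach, the sketch below rearranges the widely cited approximation var(ln(O/E)) ≈ (1 − ϕ)/(nϕ) to obtain a minimum sample size for the observed/expected criterion alone. The target confidence interval width is an assumption for illustration, so the result is not intended to reproduce the paper's worked example (where, as the abstract notes, the calibration slope criterion drove the overall requirement).

```python
# Minimal sketch of a closed-form minimum sample size calculation for the
# observed/expected (O/E) criterion, using the approximation
# var(ln(O/E)) ≈ (1 - phi) / (n * phi), where phi is the anticipated outcome
# event proportion. The target precision below is an illustrative assumption.
import math

phi = 0.018            # anticipated outcome event proportion (heart valve example)
target_ci_width = 0.2  # assumed target width for the 95% CI of ln(O/E)

target_se = target_ci_width / (2 * 1.96)   # CI width = 2 * 1.96 * SE
n = (1 - phi) / (phi * target_se ** 2)     # rearranged from the variance formula

n = math.ceil(n)
events = math.ceil(n * phi)
print(f"Minimum n for the O/E criterion: {n} participants (~{events} events)")
```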
|
75
|
Collins GS, Riley RD, van Smeden M. Flaws in the Development and Validation of a Coronavirus Disease 2019 Prediction Model. Clin Infect Dis 2021; 73:557-558. [PMID: 32936916] [PMCID: PMC7543337] [DOI: 10.1093/cid/ciaa1406]
|