101
|
Collins GS, van Smeden M, Riley RD. COVID-19 prediction models should adhere to methodological and reporting standards. Eur Respir J 2020; 56:2002643. [PMID: 32703773 PMCID: PMC7377211 DOI: 10.1183/13993003.02643-2020] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 07/05/2020] [Accepted: 07/06/2020] [Indexed: 12/23/2022]
Abstract
The coronavirus disease 2019 (COVID-19) pandemic has led to a proliferation of clinical prediction models to aid diagnosis, disease severity assessment and prognosis. A systematic review has identified 66 COVID-19 prediction models, concluding that all, without exception, are at high risk of bias due to concerns surrounding the data quality, statistical analysis and reporting, and that none are recommended for use [1]. Therefore, we read with interest the recent paper by Wu et al. [2] describing the development of a model to identify COVID-19 patients with severe disease on admission to facilitate triage. However, our enthusiasm was dampened by a number of concerns surrounding the design, analysis and reporting of the study which deserve highlighting to readers. COVID-19 prediction models should adhere to methodological and reporting standards https://bit.ly/3ebnook
|
102
|
Hong C, Salanti G, Morton SC, Riley RD, Chu H, Kimmel SE, Chen Y. Testing small study effects in multivariate meta-analysis. Biometrics 2020; 76:1240-1250. [PMID: 32720712 DOI: 10.1111/biom.13342] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 05/23/2018] [Revised: 06/06/2019] [Accepted: 09/10/2019] [Indexed: 01/10/2023]
Abstract
Small study effects occur when smaller studies show different, often larger, treatment effects than large ones, which may threaten the validity of systematic reviews and meta-analyses. The most well-known reasons for small study effects include publication bias, outcome reporting bias, and clinical heterogeneity. Methods to account for small study effects in univariate meta-analysis have been extensively studied. However, detecting small study effects in a multivariate meta-analysis setting remains an untouched research area. One of the complications is that different types of selection processes can be involved in the reporting of multivariate outcomes. For example, some studies may be completely unpublished while others may selectively report multiple outcomes. In this paper, we propose a score test as an overall test of small study effects in multivariate meta-analysis. Two detailed case studies are given to demonstrate the advantage of the proposed test over various naive applications of univariate tests in practice. Through simulation studies, the proposed test is found to retain nominal Type I error rates with considerable power in moderate sample size settings. Finally, we also evaluate the concordance between the proposed test and the naive application of univariate tests by evaluating 44 systematic reviews with multiple outcomes from the Cochrane Database.
|
103
|
Hong C, Duan R, Zeng L, Hubbard RA, Lumley T, Riley RD, Chu H, Kimmel SE, Chen Y. The Galaxy Plot: A New Visualization Tool for Bivariate Meta-Analysis Studies. Am J Epidemiol 2020; 189:861-869. [PMID: 31942603 PMCID: PMC7438574 DOI: 10.1093/aje/kwz286] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 12/24/2018] [Revised: 12/13/2019] [Accepted: 12/23/2019] [Indexed: 12/31/2022]
Abstract
Funnel plots have been widely used to detect small-study effects in the results of univariate meta-analyses. However, there is no existing visualization tool that is the counterpart of the funnel plot in the multivariate setting. We propose a new visualization method, the galaxy plot, which can simultaneously present the effect sizes of bivariate outcomes and their standard errors in a 2-dimensional space. We illustrate the use of the galaxy plot with 2 case studies, including a meta-analysis of hypertension trials with studies from 1979-1991 (Hypertension. 2005;45(5):907-913) and a meta-analysis of structured telephone support or noninvasive telemonitoring with studies from 1966-2015 (Heart. 2017;103(4):255-257). The galaxy plot is an intuitive visualization tool that can aid in interpreting results of multivariate meta-analysis. It preserves all of the information presented by separate funnel plots for each outcome while elucidating more complex features that may only be revealed by examining the joint distribution of the bivariate outcomes.
|
104
|
Booth S, Riley RD, Ensor J, Lambert PC, Rutherford MJ. Temporal recalibration for improving prognostic model development and risk predictions in settings where survival is improving over time. Int J Epidemiol 2020; 49:1316-1325. [PMID: 32243524 PMCID: PMC7750972 DOI: 10.1093/ije/dyaa030] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Accepted: 02/10/2020] [Indexed: 12/27/2022]
Abstract
BACKGROUND Prognostic models are typically developed in studies covering long time periods. However, if more recent years have seen improvements in survival, then using the full dataset may lead to out-of-date survival predictions. Period analysis addresses this by developing the model in a subset of the data from a recent time window, but results in a reduction of sample size. METHODS We propose a new approach, called temporal recalibration, to combine the advantages of period analysis and full cohort analysis. This approach develops a model in the entire dataset and then recalibrates the baseline survival using a period analysis sample. The approaches are demonstrated utilizing a prognostic model in colon cancer built using both Cox proportional hazards and flexible parametric survival models with data from 1996-2005 from the Surveillance, Epidemiology, and End Results (SEER) Program database. Comparisons of model predictions with observed survival estimates were made for new patients subsequently diagnosed in 2006 and followed up until 2015. RESULTS Period analysis and temporal recalibration provided more up-to-date survival predictions that more closely matched observed survival in subsequent data than the standard full cohort models. In addition, temporal recalibration provided more precise estimates of predictor effects. CONCLUSION Prognostic models are typically developed using a full cohort analysis that can result in out-of-date long-term survival estimates when survival has improved in recent years. Temporal recalibration is a simple method to address this, which can be used when developing and updating prognostic models to ensure survival predictions are more closely calibrated with the observed survival of individuals diagnosed subsequently.
|
105
|
Papadimitropoulou K, Stijnen T, Riley RD, Dekkers OM, le Cessie S. Meta-analysis of continuous outcomes: Using pseudo IPD created from aggregate data to adjust for baseline imbalance and assess treatment-by-baseline modification. Res Synth Methods 2020; 11:780-794. [PMID: 32643264 PMCID: PMC7754323 DOI: 10.1002/jrsm.1434] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 01/17/2020] [Revised: 05/08/2020] [Accepted: 06/23/2020] [Indexed: 12/24/2022]
Abstract
Meta‐analysis of individual participant data (IPD) is considered the “gold‐standard” for synthesizing clinical study evidence. However, gaining access to IPD can be a laborious task (if possible at all) and in practice only summary (aggregate) data are commonly available. In this work we focus on meta‐analytic approaches of comparative studies where aggregate data are available for continuous outcomes measured at baseline (pre‐treatment) and follow‐up (post‐treatment). We propose a method for constructing pseudo individual baselines and outcomes based on the aggregate data. These pseudo IPD can be subsequently analysed using standard analysis of covariance (ANCOVA) methods. Pseudo IPD for continuous outcomes reported at two timepoints can be generated using the sufficient statistics of an ANCOVA model, i.e., the mean and standard deviation at baseline and follow‐up per group, together with the correlation of the baseline and follow‐up measurements. Applying the ANCOVA approach, which crucially adjusts for baseline imbalances and accounts for the correlation between baseline and change scores, to the pseudo IPD, results in identical estimates to the ones obtained by an ANCOVA on the true IPD. In addition, an interaction term between baseline and treatment effect can be added. There are several modeling options available under this approach, which makes it very flexible. Methods are exemplified using reported data of a previously published IPD meta‐analysis of 10 trials investigating the effect of antihypertensive treatments on systolic blood pressure, leading to identical results compared with the true IPD analysis and of a meta‐analysis of fewer trials, where baseline imbalance occurred.
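The idea of generating pseudo IPD that reproduce reported summary statistics exactly can be sketched as follows. This is a minimal illustration for a single trial arm, not the authors' own procedure: the function name `pseudo_ipd`, its parameters, and the input values in the usage note are all hypothetical, and the construction (standardise random draws, orthogonalise them, then rescale) is just one way to hit a target mean, standard deviation and correlation exactly.

```python
import math
import random
import statistics


def _standardize(values):
    """Rescale to sample mean 0 and sample SD 1 (n - 1 denominator)."""
    m = statistics.fmean(values)
    s = statistics.stdev(values)
    return [(v - m) / s for v in values]


def pseudo_ipd(n, mean_base, sd_base, mean_follow, sd_follow, corr, seed=0):
    """Return (baseline, follow_up) lists of length n whose sample means,
    SDs and correlation equal the supplied aggregate values.
    Hypothetical helper for illustration; requires n >= 3."""
    rng = random.Random(seed)
    u = _standardize([rng.gauss(0.0, 1.0) for _ in range(n)])

    # Second set of draws, centred, then made exactly uncorrelated with u.
    e = [rng.gauss(0.0, 1.0) for _ in range(n)]
    e_mean = statistics.fmean(e)
    e = [x - e_mean for x in e]
    slope = sum(a * b for a, b in zip(u, e)) / sum(a * a for a in u)
    e = _standardize([b - slope * a for a, b in zip(u, e)])

    # Combine so that the sample correlation with u is exactly `corr`.
    k = math.sqrt(1.0 - corr ** 2)
    baseline = [mean_base + sd_base * a for a in u]
    follow_up = [mean_follow + sd_follow * (corr * a + k * b)
                 for a, b in zip(u, e)]
    return baseline, follow_up
```

For example, `pseudo_ipd(50, 150.0, 12.0, 140.0, 15.0, 0.6)` (made-up values) returns 50 baseline and follow-up values per arm that could then be passed to any standard ANCOVA routine alongside the pseudo IPD from the other arm.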
|
106
|
Rücker G, Nikolakopoulou A, Papakonstantinou T, Salanti G, Riley RD, Schwarzer G. The statistical importance of a study for a network meta-analysis estimate. BMC Med Res Methodol 2020; 20:190. [PMID: 32664867 PMCID: PMC7386174 DOI: 10.1186/s12874-020-01075-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/13/2019] [Accepted: 07/06/2020] [Indexed: 02/06/2023]
Abstract
Background In pairwise meta-analysis, the contribution of each study to the pooled estimate is given by its weight, which is based on the inverse variance of the estimate from that study. For network meta-analysis (NMA), the contribution of direct (and indirect) evidence is easily obtained from the diagonal elements of a hat matrix. It is, however, not fully clear how to generalize this to the percentage contribution of each study to a NMA estimate. Methods We define the importance of each study for a NMA estimate by the reduction of the estimate’s variance when adding the given study to the others. An equivalent interpretation is the relative loss in precision when the study is left out. Importances are values between 0 and 1. An importance of 1 means that the study is an essential link of the pathway in the network connecting one of the treatments with another. Results Importances can be defined for two-stage and one-stage NMA. These numbers in general do not add to one and thus cannot be interpreted as ‘percentage contributions’. After briefly discussing other available approaches, we question whether it is possible to obtain unique percentage contributions for NMA. Conclusions Importances generalize the concept of weights in pairwise meta-analysis in a natural way. Moreover, they are uniquely defined, easily calculated, and have an intuitive interpretation. We give some real examples for illustration.
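The claim that importances generalize pairwise weights can be checked in the simplest case. The sketch below assumes the definition above is read as importance = 1 − (variance with all studies) / (variance without the study), and applies it to a common-effect pairwise meta-analysis with inverse-variance weights rather than to a network. The function names and the example variances are hypothetical.

```python
def pooled_variance(variances):
    """Variance of the inverse-variance pooled estimate (common-effect model)."""
    return 1.0 / sum(1.0 / v for v in variances)


def importance(variances, i):
    """Relative reduction in the pooled variance from adding study i to the
    others; equivalently, the relative loss in precision when i is left out."""
    others = variances[:i] + variances[i + 1:]
    if not others:
        # Without the only study there is no estimate at all.
        return 1.0
    return 1.0 - pooled_variance(variances) / pooled_variance(others)


# Hypothetical study variances; inverse-variance weights are 25, 10 and 4.
example = [0.04, 0.10, 0.25]
weights = [1.0 / v for v in example]
normalized = [w / sum(weights) for w in weights]
importances = [importance(example, i) for i in range(len(example))]
```

In this pairwise setting each importance equals the study's normalized weight, so the values add to one. The abstract notes that this additivity generally fails in a network, which is why importances there cannot be read as percentage contributions.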
|
107
|
Hughes T, Riley RD, Callaghan MJ, Sergeant JC. The Value of Preseason Screening for Injury Prediction: The Development and Internal Validation of a Multivariable Prognostic Model to Predict Indirect Muscle Injury Risk in Elite Football (Soccer) Players. Sports Med Open 2020; 6:22. [PMID: 32462372 PMCID: PMC7253524 DOI: 10.1186/s40798-020-00249-8] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 11/13/2019] [Accepted: 04/30/2020] [Indexed: 12/23/2022]
Abstract
Background In elite football (soccer), periodic health examination (PHE) could provide prognostic factors to predict injury risk. Objective To develop and internally validate a prognostic model to predict individualised indirect (non-contact) muscle injury (IMI) risk during a season in elite footballers, only using PHE-derived candidate prognostic factors. Methods Routinely collected preseason PHE and injury data were used from 152 players over 5 seasons (1st July 2013 to 19th May 2018). Ten candidate prognostic factors (12 parameters) were included in model development. Multiple imputation was used to handle missing values. The outcome was any time-loss, index indirect muscle injury (I-IMI) affecting the lower extremity. A full logistic regression model was fitted, and a parsimonious model developed using backward-selection to remove factors that exceeded a threshold that was equivalent to Akaike’s Information Criterion (alpha 0.157). Predictive performance was assessed through calibration, discrimination and decision-curve analysis, averaged across all imputed datasets. The model was internally validated using bootstrapping and adjusted for overfitting. Results During 317 participant-seasons, 138 I-IMIs were recorded. The parsimonious model included only age and frequency of previous IMIs; apparent calibration was perfect, but discrimination was modest (C-index = 0.641, 95% confidence interval (CI) = 0.580 to 0.703), with clinical utility evident between risk thresholds of 37–71%. After validation and overfitting adjustment, performance deteriorated (C-index = 0.589 (95% CI = 0.528 to 0.651); calibration-in-the-large = −0.009 (95% CI = −0.239 to 0.239); calibration slope = 0.718 (95% CI = 0.275 to 1.161)). Conclusion The selected PHE data were insufficient prognostic factors from which to develop a useful model for predicting IMI risk in elite footballers. Further research should prioritise identifying novel prognostic factors to improve future risk prediction models in this field. Trial registration NCT03782389
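The alpha of 0.157 quoted for backward selection follows from a short calculation. Removing a single parameter lowers the AIC (−2 log-likelihood + 2 × number of parameters) whenever the likelihood ratio statistic for that parameter is below 2, and for a chi-square distribution with 1 degree of freedom the upper tail probability at 2 is about 0.157. The sketch below covers only this single-parameter case; the function name is hypothetical, and factors that contribute more than one parameter would need the corresponding degrees of freedom.

```python
import math


def chi2_1df_upper_tail(x):
    """P(X > x) for a chi-square variable with 1 degree of freedom.

    X is the square of a standard normal Z, so
    P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))


# AIC penalty of 2 per parameter, expressed as a p-value threshold.
aic_equivalent_alpha = chi2_1df_upper_tail(2.0)
print(round(aic_equivalent_alpha, 3))  # 0.157
```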
|
108
|
Riley RD, Legha A, Jackson D, Morris TP, Ensor J, Snell KIE, White IR, Burke DL. One-stage individual participant data meta-analysis models for continuous and binary outcomes: Comparison of treatment coding options and estimation methods. Stat Med 2020; 39:2536-2555. [PMID: 32394498 DOI: 10.1002/sim.8555] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 02/20/2019] [Revised: 12/09/2019] [Accepted: 04/03/2020] [Indexed: 01/22/2023]
Abstract
A one-stage individual participant data (IPD) meta-analysis synthesizes IPD from multiple studies using a general or generalized linear mixed model. This produces summary results (eg, about treatment effect) in a single step, whilst accounting for clustering of participants within studies (via a stratified study intercept, or random study intercepts) and between-study heterogeneity (via random treatment effects). We use simulation to evaluate the performance of restricted maximum likelihood (REML) and maximum likelihood (ML) estimation of one-stage IPD meta-analysis models for synthesizing randomized trials with continuous or binary outcomes. Three key findings are identified. First, for ML or REML estimation of stratified intercept or random intercepts models, a t-distribution based approach generally improves coverage of confidence intervals for the summary treatment effect, compared with a z-based approach. Second, when using ML estimation of a one-stage model with a stratified intercept, the treatment variable should be coded using "study-specific centering" (ie, 1/0 minus the study-specific proportion of participants in the treatment group), as this reduces the bias in the between-study variance estimate (compared with 1/0 and other coding options). Third, REML estimation reduces downward bias in between-study variance estimates compared with ML estimation, and does not depend on the treatment variable coding; for binary outcomes, this requires REML estimation of the pseudo-likelihood, although this may not be stable in some situations (eg, when data are sparse). Two applied examples are used to illustrate the findings.
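The "study-specific centering" coding described above (1/0 minus the study-specific proportion of participants in the treatment group) can be computed directly from the stacked IPD. The following is a minimal sketch of that coding step only, with a hypothetical function name and made-up study labels; fitting the mixed model itself is not shown.

```python
from collections import defaultdict


def study_specific_centering(study_ids, treat):
    """Return the centred treatment variable: the 1/0 indicator minus the
    proportion of treated participants in that participant's own study."""
    treated = defaultdict(int)   # number treated per study
    total = defaultdict(int)     # number of participants per study
    for study, t in zip(study_ids, treat):
        treated[study] += t
        total[study] += 1
    return [t - treated[s] / total[s] for s, t in zip(study_ids, treat)]


# Hypothetical data: study A has 2 of 4 treated, study B has 3 of 4 treated.
studies = ["A", "A", "A", "A", "B", "B", "B", "B"]
treatment = [1, 1, 0, 0, 1, 1, 1, 0]
centred = study_specific_centering(studies, treatment)
```

Here `centred` is 0.5 or −0.5 in study A and 0.25 or −0.75 in study B, so the centred variable has mean zero within each study.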
|
109
|
Phillips B, Morgan JE, Haeusler GM, Riley RD. Individual participant data validation of the PICNICC prediction model for febrile neutropenia. Arch Dis Child 2020; 105:439-445. [PMID: 31690548 PMCID: PMC7212933 DOI: 10.1136/archdischild-2019-317308] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 03/26/2019] [Revised: 09/20/2019] [Accepted: 10/18/2019] [Indexed: 12/23/2022]
Abstract
BACKGROUND Risk-stratified approaches to managing cancer therapies and their consequent complications rely on accurate predictions to work effectively. The risk-stratified management of fever with neutropenia is one such very common area of management in paediatric practice. Such rules are frequently produced and promoted without adequate confirmation of their accuracy. METHODS An individual participant data meta-analytic validation of the 'Predicting Infectious ComplicatioNs In Children with Cancer' (PICNICC) prediction model for microbiologically documented infection in paediatric fever with neutropenia was undertaken. Pooled estimates were produced using random-effects meta-analysis of the area under the curve-receiver operating characteristic curve (AUC-ROC), calibration slope and ratios of expected versus observed cases (E/O). RESULTS The PICNICC model was poorly predictive of microbiologically documented infection (MDI) in these validation cohorts. The pooled AUC-ROC was 0.59, 95% CI 0.41 to 0.78, tau²=0, compared with derivation value of 0.72, 95% CI 0.71 to 0.76. There was poor discrimination (pooled slope estimate 0.03, 95% CI -0.19 to 0.26) and calibration in the large (pooled E/O ratio 1.48, 95% CI 0.87 to 2.1). Three different simple recalibration approaches failed to improve performance meaningfully. CONCLUSION This meta-analysis shows the PICNICC model should not be used at admission to predict MDI. Further work should focus on validating alternative prediction models. Validation across multiple cohorts from diverse locations is essential before widespread clinical adoption of such rules to avoid overtreating or undertreating children with fever with neutropenia.
|
110
|
Takwoingi Y, Partlett C, Riley RD, Hyde C, Deeks JJ. Methods and reporting of systematic reviews of comparative accuracy were deficient: a methodological survey and proposed guidance. J Clin Epidemiol 2020; 121:1-14. [PMID: 31843693 PMCID: PMC7203546 DOI: 10.1016/j.jclinepi.2019.12.007] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 04/01/2019] [Revised: 10/02/2019] [Accepted: 12/11/2019] [Indexed: 01/27/2023]
Abstract
OBJECTIVE The objective of this study was to examine methodological and reporting characteristics of systematic reviews and meta-analyses which compare diagnostic test accuracy (DTA) of multiple index tests, identify good practice, and develop guidance for better reporting. STUDY DESIGN AND SETTING Methodological survey of 127 comparative or multiple tests reviews published in 74 different general medical and specialist journals. We summarized methods and reporting characteristics that are likely to differ between reviews of a single test and comparative reviews. We then developed guidance to enhance reporting of test comparisons in DTA reviews. RESULTS Of 127 reviews, 16 (13%) reviews restricted study selection and test comparisons to comparative accuracy studies while the remaining 111 (87%) reviews included any study type. Fifty-three reviews (42%) statistically compared test accuracy with only 18 (34%) of these using recommended methods. Reporting of several items, in particular the role of the index tests, test comparison strategy, and limitations of indirect comparisons (i.e., comparisons involving any study type), was deficient in many reviews. Five reviews with exemplary methods and reporting were identified. CONCLUSION Reporting quality of reviews which evaluate and compare multiple tests is poor. The guidance developed, complemented with the exemplars, can assist review authors in producing better quality comparative reviews.
|
111
|
Riley RD, Debray TPA, Fisher D, Hattle M, Marlin N, Hoogland J, Gueyffier F, Staessen JA, Wang J, Moons KGM, Reitsma JB, Ensor J. Individual participant data meta-analysis to examine interactions between treatment effect and participant-level covariates: Statistical recommendations for conduct and planning. Stat Med 2020; 39:2115-2137. [PMID: 32350891 PMCID: PMC7401032 DOI: 10.1002/sim.8516] [Citation(s) in RCA: 79] [Impact Index Per Article: 19.8] [Received: 09/26/2019] [Revised: 02/07/2020] [Accepted: 02/08/2020] [Indexed: 01/06/2023]
Abstract
Precision medicine research often searches for treatment‐covariate interactions, which occur when a treatment effect (eg, measured as a mean difference, odds ratio, hazard ratio) changes across values of a participant‐level covariate (eg, age, gender, biomarker). Single trials do not usually have sufficient power to detect genuine treatment‐covariate interactions, which motivates the sharing of individual participant data (IPD) from multiple trials for meta‐analysis. Here, we provide statistical recommendations for conducting and planning an IPD meta‐analysis of randomized trials to examine treatment‐covariate interactions. For conduct, two‐stage and one‐stage statistical models are described, and we recommend: (i) interactions should be estimated directly, and not by calculating differences in meta‐analysis results for subgroups; (ii) interaction estimates should be based solely on within‐study information; (iii) continuous covariates and outcomes should be analyzed on their continuous scale; (iv) nonlinear relationships should be examined for continuous covariates, using a multivariate meta‐analysis of the trend (eg, using restricted cubic spline functions); and (v) translation of interactions into clinical practice is nontrivial, requiring individualized treatment effect prediction. For planning, we describe first why the decision to initiate an IPD meta‐analysis project should not be based on between‐study heterogeneity in the overall treatment effect; and second, how to calculate the power of a potential IPD meta‐analysis project in advance of IPD collection, conditional on characteristics (eg, number of participants, standard deviation of covariates) of the trials (potentially) promising their IPD. Real IPD meta‐analysis projects are used for illustration throughout.
|
112
|
Wynants L, Van Calster B, Collins GS, Riley RD, Heinze G, Schuit E, Bonten MMJ, Dahly DL, Damen JAA, Debray TPA, de Jong VMT, De Vos M, Dhiman P, Haller MC, Harhay MO, Henckaerts L, Heus P, Kammer M, Kreuzberger N, Lohmann A, Luijken K, Ma J, Martin GP, McLernon DJ, Andaur Navarro CL, Reitsma JB, Sergeant JC, Shi C, Skoetz N, Smits LJM, Snell KIE, Sperrin M, Spijker R, Steyerberg EW, Takada T, Tzoulaki I, van Kuijk SMJ, van Bussel B, van der Horst ICC, van Royen FS, Verbakel JY, Wallisch C, Wilkinson J, Wolff R, Hooft L, Moons KGM, van Smeden M. Prediction models for diagnosis and prognosis of covid-19: systematic review and critical appraisal. BMJ 2020; 369:m1328. [PMID: 32265220 PMCID: PMC7222643 DOI: 10.1136/bmj.m1328] [Citation(s) in RCA: 1622] [Impact Index Per Article: 405.5] [Accepted: 03/31/2020] [Indexed: 12/12/2022]
Abstract
OBJECTIVE To review and appraise the validity and usefulness of published and preprint reports of prediction models for diagnosing coronavirus disease 2019 (covid-19) in patients with suspected infection, for prognosis of patients with covid-19, and for detecting people in the general population at increased risk of covid-19 infection or being admitted to hospital with the disease. DESIGN Living systematic review and critical appraisal by the COVID-PRECISE (Precise Risk Estimation to optimise covid-19 Care for Infected or Suspected patients in diverse sEttings) group. DATA SOURCES PubMed and Embase through Ovid, up to 1 July 2020, supplemented with arXiv, medRxiv, and bioRxiv up to 5 May 2020. STUDY SELECTION Studies that developed or validated a multivariable covid-19 related prediction model. DATA EXTRACTION At least two authors independently extracted data using the CHARMS (critical appraisal and data extraction for systematic reviews of prediction modelling studies) checklist; risk of bias was assessed using PROBAST (prediction model risk of bias assessment tool). RESULTS 37 421 titles were screened, and 169 studies describing 232 prediction models were included. The review identified seven models for identifying people at risk in the general population; 118 diagnostic models for detecting covid-19 (75 were based on medical imaging, 10 to diagnose disease severity); and 107 prognostic models for predicting mortality risk, progression to severe disease, intensive care unit admission, ventilation, intubation, or length of hospital stay. The most frequent types of predictors included in the covid-19 prediction models are vital signs, age, comorbidities, and image features. Flu-like symptoms are frequently predictive in diagnostic models, while sex, C reactive protein, and lymphocyte counts are frequent prognostic factors. 
Reported C index estimates from the strongest form of validation available per model ranged from 0.71 to 0.99 in prediction models for the general population, from 0.65 to more than 0.99 in diagnostic models, and from 0.54 to 0.99 in prognostic models. All models were rated at high or unclear risk of bias, mostly because of non-representative selection of control patients, exclusion of patients who had not experienced the event of interest by the end of the study, high risk of model overfitting, and unclear reporting. Many models did not include a description of the target population (n=27, 12%) or care setting (n=75, 32%), and only 11 (5%) were externally validated by a calibration plot. The Jehi diagnostic model and the 4C mortality score were identified as promising models. CONCLUSION Prediction models for covid-19 are quickly entering the academic literature to support medical decision making at a time when they are urgently needed. This review indicates that almost all published prediction models are poorly reported, and at high risk of bias such that their reported predictive performance is probably optimistic. However, we have identified two (one diagnostic and one prognostic) promising models that should soon be validated in multiple cohorts, preferably through collaborative efforts and data sharing to also allow an investigation of the stability and heterogeneity in their performance across populations and settings. Details on all reviewed models are publicly available at https://www.covprecise.org/. Methodological guidance as provided in this paper should be followed because unreliable predictions could cause more harm than benefit in guiding clinical decisions. Finally, prediction model authors should adhere to the TRIPOD (transparent reporting of a multivariable prediction model for individual prognosis or diagnosis) reporting guideline. SYSTEMATIC REVIEW REGISTRATION Protocol https://osf.io/ehc47/, registration https://osf.io/wy245.
READERS' NOTE This article is a living systematic review that will be updated to reflect emerging evidence. Updates may occur for up to two years from the date of original publication. This version is update 3 of the original article published on 7 April 2020 (BMJ 2020;369:m1328). Previous updates can be found as data supplements (https://www.bmj.com/content/369/bmj.m1328/related#datasupp). When citing this paper please consider adding the update number and date of access for clarity.
|
113
|
Riley RD, Ensor J, Snell KIE, Harrell FE, Martin GP, Reitsma JB, Moons KGM, Collins G, van Smeden M. Calculating the sample size required for developing a clinical prediction model. BMJ 2020; 368:m441. [PMID: 32188600 DOI: 10.1136/bmj.m441] [Citation(s) in RCA: 706] [Impact Index Per Article: 176.5] [Indexed: 12/23/2022]
|
114
|
de Jong VM, Moons KG, Riley RD, Tudur Smith C, Marson AG, Eijkemans MJ, Debray TP. Individual participant data meta-analysis of intervention studies with time-to-event outcomes: A review of the methodology and an applied example. Res Synth Methods 2020; 11:148-168. [PMID: 31759339 PMCID: PMC7079159 DOI: 10.1002/jrsm.1384] [Citation(s) in RCA: 43] [Impact Index Per Article: 10.8] [Received: 03/04/2019] [Revised: 10/23/2019] [Accepted: 10/24/2019] [Indexed: 12/14/2022]
Abstract
Many randomized trials evaluate an intervention effect on time-to-event outcomes. Individual participant data (IPD) from such trials can be obtained and combined in a so-called IPD meta-analysis (IPD-MA), to summarize the overall intervention effect. We performed a narrative literature review to provide an overview of methods for conducting an IPD-MA of randomized intervention studies with a time-to-event outcome. We focused on identifying good methodological practice for modeling frailty of trial participants across trials, modeling heterogeneity of intervention effects, choosing appropriate association measures, dealing with (trial differences in) censoring and follow-up times, and addressing time-varying intervention effects and effect modification (interactions). We discuss how to achieve this using parametric and semi-parametric methods, and describe how to implement these in a one-stage or two-stage IPD-MA framework. We recommend exploring heterogeneity of the effect(s) through interaction and non-linear effects. Random effects should be applied to account for residual heterogeneity of the intervention effect. We provide further recommendations, many of which are specific to IPD-MA of time-to-event data from randomized trials examining an intervention effect. We illustrate several key methods in a real IPD-MA, where IPD of 1225 participants from 5 randomized clinical trials were combined to compare the effects of Carbamazepine and Valproate on the incidence of epileptic seizures.
|
115
|
Dickens AP, Fitzmaurice DA, Adab P, Sitch A, Riley RD, Enocson A, Jordan RE. Accuracy of Vitalograph lung monitor as a screening test for COPD in primary care. NPJ Prim Care Respir Med 2020; 30:2. [PMID: 31900421 PMCID: PMC6941963 DOI: 10.1038/s41533-019-0158-2] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 05/14/2019] [Accepted: 11/13/2019] [Indexed: 11/25/2022]
Abstract
Microspirometry may be useful as the second stage of a screening pathway among patients reporting respiratory symptoms. We assessed sensitivity and specificity of the Vitalograph® lung monitor compared with post-bronchodilator confirmatory spirometry (ndd Easy on-PC) among primary care chronic obstructive pulmonary disease (COPD) patients within the Birmingham COPD cohort. We report a case–control analysis within 71 general practices in the UK. Eligible patients were aged ≥40 years who were either on a clinical COPD register or reported chronic respiratory symptoms on a questionnaire. Participants performed pre- and post-bronchodilator microspirometry, prior to confirmatory spirometry. Out of the 544 participants, COPD was confirmed in 337 according to post-bronchodilator confirmatory spirometry. Pre-bronchodilator, using the LLN as a cut-point, the lung monitor had a sensitivity of 50.5% (95% CI 45.0%, 55.9%) and a specificity of 99.0% (95% CI 96.6%, 99.9%) in our sample. Using a fixed ratio of FEV1/FEV6 < 0.7 to define obstruction in the lung monitor, sensitivity increased (58.8%; 95% CI 53.0, 63.8) while specificity was virtually identical (98.6%; 95% CI 95.8, 99.7). Within our sample, the optimal cut-point for the lung monitor was FEV1/FEV6 < 0.78, with sensitivity of 82.8% (95% CI 78.3%, 86.7%) and specificity of 85.0% (95% CI 79.4%, 89.6%). Test performance of the lung monitor was unaffected by bronchodilation. The lung monitor could be used in primary care without a bronchodilator using a simple ratio of FEV1/FEV6 as part of a screening pathway for COPD among patients reporting respiratory symptoms.
|
116
|
Riley RD. Correction to: Minimum sample size for developing a multivariable prediction model: Part II-binary and time-to-event outcomes by Riley RD, Snell KI, Ensor J, et al. Stat Med 2019; 38:5672. [PMID: 31793031 DOI: 10.1002/sim.8409] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 09/26/2019] [Accepted: 10/02/2019] [Indexed: 11/10/2022]
|
117
|
Moriarty AS, Meader N, Gilbody S, Chew-Graham CA, Churchill R, Ali S, Phillips RS, Riley RD, McMillan D. Prognostic models for predicting relapse or recurrence of depression. Hippokratia 2019. [DOI: 10.1002/14651858.cd013491] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022]
|
118
|
Hayden JA, Wilson MN, Stewart S, Cartwright JL, Smith AO, Riley RD, van Tulder M, Bendix T, Cecchi F, Costa LOP, Dufour N, Ferreira ML, Foster NE, Gudavalli MR, Hartvigsen J, Helmhout P, Kool J, Koumantakis GA, Kovacs FM, Kuukkanen T, Long A, Macedo LG, Machado LAC, Maher CG, Mehling W, Morone G, Peterson T, Rasmussen-Barr E, Ryan CG, Sjögren T, Smeets R, Staal JB, Unsgaard-Tøndel M, Wajswelner H, Yeung EW. Exercise treatment effect modifiers in persistent low back pain: an individual participant data meta-analysis of 3514 participants from 27 randomised controlled trials. Br J Sports Med 2019; 54:1277-1278. [DOI: 10.1136/bjsports-2019-101205] [Citation(s) in RCA: 40] [Impact Index Per Article: 8.0] [Accepted: 11/13/2019] [Indexed: 01/26/2023]
Abstract
BACKGROUND Low back pain is one of the leading causes of disability worldwide. Exercise therapy is widely recommended to treat persistent non-specific low back pain. While evidence suggests exercise is, on average, moderately effective, there remains uncertainty about which individuals might benefit the most from exercise. METHODS In parallel with a Cochrane review update, we requested individual participant data (IPD) from high-quality randomised clinical trials of adults with our two primary outcomes of interest, pain and functional limitations, and calculated global recovery. We compiled a master data set including baseline participant characteristics, exercise and comparison characteristics, and outcomes at short-term, moderate-term and long-term follow-up. We conducted descriptive analyses and one-stage IPD meta-analysis using multilevel mixed-effects regression of the overall treatment effect and prespecified potential treatment effect modifiers. RESULTS We received IPD for 27 trials (3514 participants). For studies included in this analysis, compared with no treatment/usual care, exercise therapy on average reduced pain (mean effect/100 (95% CI) −10.7 (−14.1 to −7.4)), a result compatible with a clinically important 20% smallest worthwhile effect. Exercise therapy reduced functional limitations with a clinically important 23% improvement (mean effect/100 (95% CI) −10.2 (−13.2 to −7.3)) at short-term follow-up. Not having heavy physical demands at work and medication use for low back pain were potential treatment effect modifiers; these were associated with superior exercise outcomes relative to non-exercise comparisons. Lower body mass index was also associated with better outcomes in exercise compared with no treatment/usual care. This study was limited by inconsistent availability and measurement of participant characteristics. CONCLUSIONS This study provides potentially useful information to help treat patients and design future studies of exercise interventions that are better matched to specific subgroups. PROTOCOL PUBLICATION https://doi.org/10.1186/2046-4053-1-64
|
119
|
Hayden JA, Wilson MN, Riley RD, Iles R, Pincus T, Ogilvie R. Individual recovery expectations and prognosis of outcomes in non-specific low back pain: prognostic factor review. Cochrane Database Syst Rev 2019; 2019:CD011284. [PMID: 31765487 PMCID: PMC6877336 DOI: 10.1002/14651858.cd011284.pub2] [Citation(s) in RCA: 61] [Impact Index Per Article: 12.2] [Indexed: 12/11/2022]
Abstract
BACKGROUND Low back pain is costly and disabling. Prognostic factor evidence can help healthcare providers and patients understand likely prognosis, inform the development of prediction models to identify subgroups, and may inform new treatment strategies. Recent studies have suggested that people who have poor expectations for recovery experience more back pain disability, but study results have differed. OBJECTIVES To synthesise evidence on the association between recovery expectations and disability outcomes in adults with low back pain, and explore sources of heterogeneity. SEARCH METHODS The search strategy included broad and focused electronic searches of MEDLINE, Embase, CINAHL, and PsycINFO to 12 March 2019, reference list searches of relevant reviews and included studies, and citation searches of relevant expectation measurement tools. SELECTION CRITERIA We included low back pain prognosis studies from any setting assessing general, self-efficacy, and treatment expectations (measured dichotomously and continuously on a 0 - 10 scale), and their association with work participation, clinically important recovery, functional limitations, or pain intensity outcomes at short (3 months), medium (6 months), long (12 months), and very long (> 16 months) follow-up. DATA COLLECTION AND ANALYSIS We extracted study characteristics and all reported estimates of unadjusted and adjusted associations between expectations and related outcomes. Two review authors independently assessed risks of bias using the Quality in Prognosis Studies (QUIPS) tool. We conducted narrative syntheses and meta-analyses when appropriate unadjusted or adjusted estimates were available. Two review authors independently graded and reported the overall quality of evidence. MAIN RESULTS We screened 4635 unique citations to include 60 studies (30,530 participants). Thirty-five studies were conducted in Europe, 21 in North America, and four in Australia. 
Study populations were mostly chronic (37%), from healthcare (62%) or occupational settings (26%). General expectation was the most common type of recovery expectation measured (70%); 16 studies measured more than one type of expectation. Usable data for syntheses were available for 52 studies (87% of studies; 28,885 participants). We found moderate-quality evidence that positive recovery expectations are strongly associated with better work participation (narrative synthesis: 21 studies; meta-analysis: 12 studies, 4777 participants: odds ratio (OR) 2.43, 95% confidence interval (CI) 1.64 to 3.62), and low-quality evidence for clinically important recovery outcomes (narrative synthesis: 12 studies; meta-analysis: 5 studies, 1820 participants: OR 1.89, 95% CI 1.49 to 2.41), both at follow-up times closest to 12 months, using adjusted data. The association of recovery expectations with other outcomes of interest, including functional limitations (narrative synthesis: 10 studies; meta-analysis: 3 studies, 1435 participants: OR 1.40, 95% CI 0.85 to 2.31) and pain intensity (narrative synthesis: 9 studies; meta-analysis: 3 studies, 1555 participants: OR 1.15, 95% CI 1.08 to 1.23) outcomes at follow-up times closest to 12 months using adjusted data, is less certain, achieving very low- and low-quality evidence, respectively. No studies reported statistically significant or clinically important negative associations between recovery expectations and any low back pain outcome. AUTHORS' CONCLUSIONS We found that individual recovery expectations are probably strongly associated with future work participation (moderate-quality evidence) and may be associated with clinically important recovery outcomes (low-quality evidence). The association of recovery expectations with other outcomes of interest is less certain. Our findings suggest that recovery expectations should be considered in future studies, to improve prognosis and management of low back pain.
|
120
|
Debray TPA, Damen JAAG, Riley RD, Snell K, Reitsma JB, Hooft L, Collins GS, Moons KGM. A framework for meta-analysis of prediction model studies with binary and time-to-event outcomes. Stat Methods Med Res 2019; 28:2768-2786. [PMID: 30032705 PMCID: PMC6728752 DOI: 10.1177/0962280218785504] [Citation(s) in RCA: 101] [Impact Index Per Article: 20.2] [Indexed: 12/17/2022]
Abstract
It is widely recommended that any developed prediction model, whether diagnostic or prognostic, is externally validated in terms of its predictive performance measured by calibration and discrimination. When multiple validations have been performed, a systematic review followed by a formal meta-analysis helps to summarize overall performance across multiple settings, and reveals under which circumstances the model performs suboptimally and may need adjustment. We discuss how to undertake meta-analysis of the performance of prediction models with either a binary or a time-to-event outcome. We address how to deal with incomplete availability of study-specific results (performance estimates and their precision), and how to produce summary estimates of the c-statistic, the observed:expected ratio and the calibration slope. Furthermore, we discuss the implementation of frequentist and Bayesian meta-analysis methods, and propose novel empirically-based prior distributions to improve estimation of between-study heterogeneity in small samples. Finally, we illustrate all methods using two examples: meta-analysis of the predictive performance of EuroSCORE II and of the Framingham Risk Score. All examples and meta-analysis models have been implemented in our newly developed R package "metamisc".
|
121
|
Price MJ, Blake HA, Kenyon S, White IR, Jackson D, Kirkham JJ, Neilson JP, Deeks JJ, Riley RD. Empirical comparison of univariate and multivariate meta-analyses in Cochrane Pregnancy and Childbirth reviews with multiple binary outcomes. Res Synth Methods 2019; 10:440-451. [PMID: 31058440 PMCID: PMC6771837 DOI: 10.1002/jrsm.1353] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 07/30/2018] [Revised: 04/04/2019] [Accepted: 04/13/2019] [Indexed: 12/20/2022]
Abstract
BACKGROUND Multivariate meta-analysis (MVMA) jointly synthesizes effects for multiple correlated outcomes. The MVMA model is potentially more difficult and time-consuming to apply than univariate models, so if its use makes little difference to parameter estimates, it could be argued that it is redundant. METHODS We assessed the applicability and impact of MVMA in Cochrane Pregnancy and Childbirth (CPCB) systematic reviews. We applied MVMA to CPCB reviews published between 2011 and 2013 with two or more binary outcomes with at least three studies and compared findings with results of univariate meta-analyses. Univariate random effects meta-analysis models were fitted using restricted maximum likelihood estimation (REML). RESULTS Eighty CPCB reviews were published. MVMA could not be applied in 70 of these reviews. MVMA was not feasible in three of the remaining 10 reviews because the appropriate models failed to converge. Estimates from MVMA agreed with those of univariate analyses in most of the other seven reviews. Statistical significance changed in two reviews: In one, this was due to a very small change in P value; in the other, the MVMA result for one outcome suggested that previous univariate results may be vulnerable to small-study effects and that the certainty of clinical conclusions needs consideration. CONCLUSIONS MVMA methods can be applied only in a minority of reviews of interventions in pregnancy and childbirth and can be difficult to apply because of missing correlations or lack of convergence. Nevertheless, clinical and/or statistical conclusions from MVMA may occasionally differ from those from univariate analyses.
|
122
|
Hudda MT, Fewtrell MS, Haroun D, Lum S, Williams JE, Wells JCK, Riley RD, Owen CG, Cook DG, Rudnicka AR, Whincup PH, Nightingale CM. Development and validation of a prediction model for fat mass in children and adolescents: meta-analysis using individual participant data. BMJ 2019; 366:l4293. [PMID: 31340931 PMCID: PMC6650932 DOI: 10.1136/bmj.l4293] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Indexed: 12/23/2022]
Abstract
OBJECTIVES To develop and validate a prediction model for fat mass in children aged 4-15 years using routinely available risk factors of height, weight, and demographic information without the need for more complex forms of assessment. DESIGN Individual participant data meta-analysis. SETTING Four population based cross sectional studies and a fifth study for external validation, United Kingdom. PARTICIPANTS A pooled derivation dataset (four studies) of 2375 children and an external validation dataset of 176 children with complete data on anthropometric measurements and deuterium dilution assessments of fat mass. MAIN OUTCOME MEASURE Multivariable linear regression analysis, using backwards selection for inclusion of predictor variables and allowing non-linear relations, was used to develop a prediction model for fat-free mass (and subsequently fat mass by subtracting resulting estimates from weight) based on the four studies. Internal validation and then internal-external cross validation were used to examine overfitting and generalisability of the model's predictive performance within the four development studies; external validation followed using the fifth dataset. RESULTS Model derivation was based on a multi-ethnic population of 2375 children (47.8% boys, n=1136) aged 4-15 years. The final model containing predictor variables of height, weight, age, sex, and ethnicity had extremely high predictive ability (optimism adjusted R²: 94.8%, 95% confidence interval 94.4% to 95.2%) with excellent calibration of observed and predicted values. The internal validation showed minimal overfitting and good model generalisability, with excellent calibration and predictive performance. External validation in 176 children aged 11-12 years showed promising generalisability of the model (R²: 90.0%, 95% confidence interval 87.2% to 92.8%) with good calibration of observed and predicted fat mass (slope: 1.02, 95% confidence interval 0.97 to 1.07).
The mean difference between observed and predicted fat mass was -1.29 kg (95% confidence interval -1.62 to -0.96 kg). CONCLUSION The developed model accurately predicted levels of fat mass in children aged 4-15 years. The prediction model is based on simple anthropometric measures without the need for more complex forms of assessment and could improve the accuracy of assessments for body fatness in children (compared with those provided by body mass index) for effective surveillance, prevention, and management of clinical and public health obesity.
|
123
|
Debray TP, de Jong VM, Moons KG, Riley RD. Evidence synthesis in prognosis research. Diagn Progn Res 2019; 3:13. [PMID: 31338426 PMCID: PMC6621956 DOI: 10.1186/s41512-019-0059-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 07/23/2018] [Accepted: 04/16/2019] [Indexed: 12/11/2022] Open
Abstract
Over the past few years, evidence synthesis has become essential to investigate and improve the generalizability of medical research findings. This strategy often involves a meta-analysis to formally summarize quantities of interest, such as relative treatment effect estimates. The use of meta-analysis methods is, however, less straightforward in prognosis research because substantial variation exists in research objectives, analysis methods and the level of reported evidence. We present a gentle overview of statistical methods that can be used to summarize data of prognostic factor and prognostic model studies. We discuss how aggregate data, individual participant data, or a combination thereof can be combined through meta-analysis methods. Recent examples are provided throughout to illustrate the various methods.
|
124
|
Mackie FL, Whittle R, Morris RK, Hyett J, Riley RD, Kilby MD. First-trimester ultrasound measurements and maternal serum biomarkers as prognostic factors in monochorionic twins: a cohort study. Diagn Progn Res 2019; 3:9. [PMID: 31093579 PMCID: PMC6507122 DOI: 10.1186/s41512-019-0054-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 12/19/2018] [Accepted: 03/20/2019] [Indexed: 12/30/2022] Open
Abstract
BACKGROUND Monochorionic twin pregnancies are at high risk of adverse outcomes, but it is not possible to predict which pregnancies will develop complications. The aim of the study was to evaluate, in monochorionic twin pregnancies, whether first-trimester ultrasound (nuchal translucency [NT], crown-rump length [CRL]), and maternal serum biomarkers (alpha-fetoprotein [AFP], soluble fms-like tyrosine kinase-1 [sFlt-1] and placental growth factor [PlGF]), are prognostic factors for fetal adverse outcome composite, twin-twin transfusion syndrome (TTTS), growth restriction, and intrauterine fetal death (IUFD). METHODS A cohort study of 177 monochorionic diamniotic twin pregnancies. Independent prognostic ability of each factor was assessed by multivariable logistic regression, adjusting for standard prognostic factors. Factors were analysed as continuous data; thus, the reported ORs relate to either 1% change in NT or CRL inter-twin percentage discordance or one unit of measure in each serum biomarker. RESULTS The odds of the fetal adverse outcome composite were significantly associated with increased NT inter-twin percentage discordance (adjusted OR 1.03 [95% CI 1.01, 1.06]) and CRL inter-twin percentage discordance (adjusted OR 1.17 [95% CI 1.07, 1.29]). TTTS was significantly associated with increased NT discordance (adjusted OR 1.06 [95% CI 1.03, 1.10]) and decreased PlGF (adjusted OR 0.42 [95% CI 0.19, 0.93]). Antenatal growth restriction was significantly associated with increased CRL discordance (adjusted OR 1.20 [95% CI 1.08, 1.34]). Single and double IUFD were associated with decreased PlGF (adjusted OR 0.34 [95% CI 0.12, 0.98] and adjusted OR 0.18 [95% CI 0.05, 0.58], respectively).
CONCLUSIONS This study has identified potential individual prognostic factors in the first trimester (fetal biometric and maternal serum biomarkers) that show promise but require further robust evaluation in a larger, prospective series of MC twin pregnancies, so that their usefulness both individually and in combination can be defined. TRIAL REGISTRATION ISRCTN 13114861 (retrospectively registered).
|
125
|
Bonnett LJ, Snell KIE, Collins GS, Riley RD. Guide to presenting clinical prediction models for use in clinical settings. BMJ 2019; 365:l737. [PMID: 30995987 DOI: 10.1136/bmj.l737] [Citation(s) in RCA: 84] [Impact Index Per Article: 16.8] [Indexed: 12/23/2022]
|