26. Riley RD, Collins GS, Hattle M, Whittle R, Ensor J. Calculating the power of a planned individual participant data meta-analysis of randomised trials to examine a treatment-covariate interaction with a time-to-event outcome. Res Synth Methods 2023; 14:718-730. [PMID: 37386750 PMCID: PMC10947306 DOI: 10.1002/jrsm.1650]
Abstract
Before embarking on an individual participant data meta-analysis (IPDMA) project, researchers should consider the power of their planned IPDMA, conditional on the studies promising their IPD and their characteristics. Such power estimates help inform whether the IPDMA project is worth the time and funding investment before IPD are collected. Here, we suggest how to estimate the power of a planned IPDMA of randomised trials aiming to examine treatment-covariate interactions at the participant level (i.e., treatment effect modifiers). We focus on a time-to-event (survival) outcome with a binary or continuous covariate, and propose an approximate analytic power calculation that conditions on the actual characteristics of trials, for example, their sample sizes and covariate distributions. The proposed method has five steps: (i) extracting the following aggregate data for each group in each trial: the number of participants and events, the mean and SD of each continuous covariate, and the proportion of participants in each category of each binary covariate; (ii) specifying a minimally important interaction size; (iii) deriving an approximate estimate of Fisher's information matrix for each trial, and the corresponding variance of the interaction estimate per trial, assuming an exponential survival distribution; (iv) deriving the estimated variance of the summary interaction estimate from the planned IPDMA under a common-effect assumption; and (v) calculating the power of the IPDMA based on a two-sided Wald test. Stata and R code are provided, and a real example is given for illustration. Further evaluation in real examples and simulations is needed.
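Steps (iv) and (v) of the approach described above can be sketched in a few lines. This is a hedged illustration, not the authors' released Stata/R code: the per-trial interaction variances from step (iii) are taken as given, and the function name is ours.

```python
from statistics import NormalDist

def ipdma_interaction_power(trial_variances, interaction, alpha=0.05):
    """Approximate power of a two-sided Wald test for a summary
    treatment-covariate interaction in a planned IPD meta-analysis."""
    nd = NormalDist()
    # Step (iv): common-effect (inverse-variance) pooling of per-trial variances
    var_summary = 1.0 / sum(1.0 / v for v in trial_variances)
    # Step (v): power of a two-sided Wald test at the minimally
    # important interaction size
    z_crit = nd.inv_cdf(1 - alpha / 2)
    z = abs(interaction) / var_summary ** 0.5
    return nd.cdf(z - z_crit) + nd.cdf(-z - z_crit)

# Promising more trials' IPD shrinks the summary variance and raises power:
print(round(ipdma_interaction_power([0.04], 0.5), 3))
print(round(ipdma_interaction_power([0.04, 0.04], 0.5), 3))
```

Such a calculation makes explicit why IPDMA power hinges on the pooled (inverse-variance weighted) precision of the interaction estimate rather than on total sample size alone.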
27. Wu F, Fuleihan GEH, Cai G, Lamberg-Allardt C, Viljakainen HT, Rahme M, Grønborg IM, Andersen R, Khadilkar A, Zulf MM, Mølgaard C, Larnkjær A, Zhu K, Riley RD, Winzenberg T. Vitamin D supplementation for improving bone density in vitamin D-deficient children and adolescents: systematic review and individual participant data meta-analysis of randomized controlled trials. Am J Clin Nutr 2023; 118:498-506. [PMID: 37661104 DOI: 10.1016/j.ajcnut.2023.05.028]
Abstract
BACKGROUND Vitamin D supplements are widely used for improving bone health in children and adolescents, but their effects in vitamin D-deficient children are unclear. OBJECTIVES This study aimed to examine whether the effect of vitamin D supplementation on bone mineral density (BMD) in children and adolescents differs by baseline vitamin D status and to estimate the effect in vitamin D-deficient individuals. METHODS This is a systematic review and individual participant data (IPD) meta-analysis. We searched the Cochrane Central Register of Controlled Trials, MEDLINE, EMBASE, CINAHL, AMED, and ISI Web of Science (until May 27, 2020) for randomized controlled trials (RCTs) of vitamin D supplementation reporting bone density outcomes after ≥6 mo in healthy individuals aged 1-19 y. We used two-stage IPD meta-analysis to determine treatment effects on total body bone mineral content and BMD at the hip, femoral neck, lumbar spine, and proximal and distal forearm after 1 y; examine whether effects varied by baseline serum 25-hydroxyvitamin D [25(OH)D] concentration; and estimate treatment effects for each 25(OH)D subgroup. RESULTS Eleven RCTs were included. Nine, comprising 1439 participants, provided IPD (86% females, mean baseline 25(OH)D = 36.3 nmol/L). Vitamin D supplementation had a small overall effect on total hip areal BMD (weighted mean difference = 6.8; 95% confidence interval: 0.7, 12.9 mg/cm2; I2 = 7.2%), but no effects on other outcomes. There was no clear evidence of linear or nonlinear interactions between baseline 25(OH)D and treatment; effects were similar in baseline 25(OH)D subgroups (cutoff of 35 or 50 nmol/L). The evidence was of high certainty. CONCLUSIONS Clinically important benefits for bone density from 1-y vitamin D supplementation in healthy children and adolescents, regardless of baseline vitamin D status, are unlikely.
However, our findings are mostly generalizable to White postpubertal girls and do not apply to those with baseline 25(OH)D outside the studied range or with symptomatic vitamin D deficiency (e.g., rickets). This study was preregistered at PROSPERO as CRD42017068772. https://www.crd.york.ac.uk/prospero/display_record.php?ID=CRD42017068772.
28. de Jong VMT, Hoogland J, Moons KGM, Riley RD, Nguyen TL, Debray TPA. Propensity-based standardization to enhance the validation and interpretation of prediction model discrimination for a target population. Stat Med 2023; 42:3508-3528. [PMID: 37311563 DOI: 10.1002/sim.9817]
Abstract
External validation of the discriminative ability of prediction models is of key importance. However, the interpretation of such evaluations is challenging, as the ability to discriminate depends on both the sample characteristics (ie, case-mix) and the generalizability of predictor coefficients, but most discrimination indices do not provide any insight into their respective contributions. To disentangle differences in discriminative ability across external validation samples due to a lack of model generalizability from differences in sample characteristics, we propose propensity-weighted measures of discrimination. These weighted metrics, which are derived from propensity scores for sample membership, are standardized for case-mix differences between the model development and validation samples, allowing for a fair comparison of discriminative ability in terms of model characteristics in a target population of interest. We illustrate our methods with the validation of eight prediction models for deep vein thrombosis in 12 external validation data sets and assess our methods in a simulation study. In the illustrative example, propensity score standardization reduced between-study heterogeneity of discrimination, indicating that between-study variability was partially attributable to case-mix. The simulation study showed that only flexible propensity-score methods (allowing for non-linear effects) produced unbiased estimates of model discrimination in the target population, and only when the positivity assumption was met. Propensity score-based standardization may facilitate the interpretation of (heterogeneity in) discriminative ability of a prediction model as observed across multiple studies, and may guide model updating strategies for a particular target population. Careful propensity score modeling, with attention to non-linear relations, is recommended.
29. Dhiman P, Ma J, Qi C, Bullock G, Sergeant JC, Riley RD, Collins GS. Sample size requirements are not being considered in studies developing prediction models for binary outcomes: a systematic review. BMC Med Res Methodol 2023; 23:188. [PMID: 37598153 PMCID: PMC10439652 DOI: 10.1186/s12874-023-02008-1]
Abstract
BACKGROUND Having an appropriate sample size is important when developing a clinical prediction model. We aimed to review how sample size is considered in studies developing a prediction model for a binary outcome. METHODS We searched PubMed for studies published between 01/07/2020 and 30/07/2020 and reviewed the sample size calculations used to develop the prediction models. Using the available information, we calculated the minimum sample size that would be needed to estimate overall risk and minimise overfitting in each study and summarised the difference between the calculated and used sample size. RESULTS A total of 119 studies were included, of which nine (8%) provided a sample size justification. The recommended minimum sample size could be calculated for 94 studies: 73% (95% CI: 63-82%) used sample sizes lower than required to estimate overall risk and minimise overfitting, including 26% of studies that used sample sizes lower than required to estimate overall risk alone. A similar proportion of studies did not meet the ≥10 EPV criterion (75%, 95% CI: 66-84%). The median deficit in the number of events used to develop a model was 75 [IQR: 234 lower to 7 higher], which reduced to 63 if the total available data (before any data splitting) were used [IQR: 225 lower to 7 higher]. Studies that met the minimum required sample size had a median c-statistic of 0.84 (IQR: 0.80 to 0.9), and studies where the minimum sample size was not met had a median c-statistic of 0.83 (IQR: 0.75 to 0.9). Studies that met the ≥10 EPV criterion had a median c-statistic of 0.80 (IQR: 0.73 to 0.84). CONCLUSIONS Prediction models are often developed with no sample size calculation; as a consequence, many are too small to precisely estimate the overall risk. We encourage researchers to justify, perform, and report sample size calculations when developing a prediction model.
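Two of the criteria behind the "minimum required sample size" referenced here (from Riley et al.'s guidance for binary outcomes, as implemented in the pmsampsize package) can be sketched as follows; the parameter count, anticipated Cox-Snell R-squared, and prevalence below are hypothetical.

```python
from math import ceil, log

def n_shrinkage(p, r2_cs, shrinkage=0.9):
    """Minimum n so that the expected uniform shrinkage factor is at least
    the target (here 0.9), limiting overfitting for p predictor parameters
    and an anticipated Cox-Snell R-squared of r2_cs."""
    return ceil(p / ((shrinkage - 1) * log(1 - r2_cs / shrinkage)))

def n_overall_risk(prevalence, margin=0.05, z=1.96):
    """Minimum n to estimate the overall outcome risk to within +/- margin."""
    return ceil((z / margin) ** 2 * prevalence * (1 - prevalence))

# Hypothetical model: 20 predictor parameters, anticipated Cox-Snell R2
# of 0.15, outcome prevalence 0.3; the binding criterion sets the minimum n
n_required = max(n_shrinkage(20, 0.15), n_overall_risk(0.3))
print(n_required)
```

Note that the shrinkage criterion usually binds for models with many parameters, which is why studies sized only to estimate overall risk can still be badly overfitted.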
30. Pate A, Sperrin M, Riley RD, Sergeant JC, Van Staa T, Peek N, Mamas MA, Lip GYH, O'Flaherty M, Buchan I, Martin GP. Developing prediction models to estimate the risk of two survival outcomes both occurring: A comparison of techniques. Stat Med 2023; 42:3184-3207. [PMID: 37218664 PMCID: PMC11155421 DOI: 10.1002/sim.9771]
Abstract
INTRODUCTION This study considers the prediction of the time until two survival outcomes have both occurred. We compared a variety of analytical methods motivated by a typical clinical problem of multimorbidity prognosis. METHODS We considered five methods: product (multiply marginal risks), dual-outcome (directly model the time until both events occur), multistate models (msm), and a range of copula and frailty models. We assessed calibration and discrimination under a variety of simulated data scenarios, varying outcome prevalence and the amount of residual correlation. The simulation focused on model misspecification and statistical power. Using data from the Clinical Practice Research Datalink, we compared model performance when predicting the risk of cardiovascular disease and type 2 diabetes both occurring. RESULTS Discrimination was similar for all methods. The product method was poorly calibrated in the presence of residual correlation. The msm and dual-outcome models were the most robust to model misspecification but suffered a drop in performance at small sample sizes due to overfitting, which the copula and frailty models were less susceptible to. The performance of the copula and frailty models was highly dependent on the underlying data structure. In the clinical example, the product method was poorly calibrated when adjusting for 8 major cardiovascular risk factors. DISCUSSION We recommend the dual-outcome method for predicting the risk of two survival outcomes both occurring. It was the most robust to model misspecification, although it was also the most prone to overfitting. The clinical example motivates the use of the methods considered in this study.
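Why the product method breaks down under residual correlation can be shown with a toy simulation (an illustrative shared-frailty setup, not the paper's data-generating model): a latent term shared by both outcomes makes the joint risk exceed the product of the marginal risks.

```python
import random

random.seed(2023)
n = 100_000
a_events = b_events = both_events = 0
for _ in range(n):
    frailty = random.gauss(0, 1)             # residual term shared by both outcomes
    a = random.gauss(0, 1) + frailty > 0.8   # outcome A occurred by time t
    b = random.gauss(0, 1) + frailty > 0.8   # outcome B occurred by time t
    a_events += a
    b_events += b
    both_events += a and b

p_a, p_b, p_both = a_events / n, b_events / n, both_events / n
# The product method multiplies the marginal risks, ignoring the shared
# frailty, so here it underestimates the true joint risk:
print(round(p_both, 3), round(p_a * p_b, 3))
```

With positive residual dependence the empirical joint risk is well above the product of the marginals, which is exactly the miscalibration the simulation study reports for the product method.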
31. Holden MA, Hattle M, Runhaar J, Riley RD, Healey EL, Quicke J, van der Windt DA, Dziedzic K, van Middelkoop M, Burke D, Corp N, Legha A, Bierma-Zeinstra S, Foster NE. Moderators of the effect of therapeutic exercise for knee and hip osteoarthritis: a systematic review and individual participant data meta-analysis. Lancet Rheumatol 2023; 5:e386-e400. [PMID: 38251550 DOI: 10.1016/s2665-9913(23)00122-4]
Abstract
BACKGROUND Many international clinical guidelines recommend therapeutic exercise as a core treatment for knee and hip osteoarthritis. We aimed to identify individual patient-level moderators of the effect of therapeutic exercise for reducing pain and improving physical function in people with knee osteoarthritis, hip osteoarthritis, or both. METHODS We did a systematic review and individual participant data (IPD) meta-analysis of randomised controlled trials comparing therapeutic exercise with non-exercise controls in people with knee osteoarthritis, hip osteoarthritis, or both. We searched ten databases from March 1, 2012, to February 25, 2019, for randomised controlled trials comparing the effects of exercise with non-exercise or other exercise controls on pain and physical function outcomes among people with knee osteoarthritis, hip osteoarthritis, or both. IPD were requested from leads of all eligible randomised controlled trials. 12 potential moderators of interest were explored to ascertain whether they were associated with short-term (12 weeks), medium-term (6 months), and long-term (12 months) effects of exercise on self-reported pain and physical function, in comparison with non-exercise controls. Overall intervention effects were also summarised. This study was prospectively registered with PROSPERO (CRD42017054049). FINDINGS Of 91 eligible randomised controlled trials that compared exercise with non-exercise controls, IPD from 31 randomised controlled trials (n=4241 participants) were included in the meta-analysis. Randomised controlled trials included participants with knee osteoarthritis (18 [58%] of 31 trials), hip osteoarthritis (six [19%]), or both (seven [23%]) and tested heterogeneous exercise interventions versus heterogeneous non-exercise controls, with variable risk of bias.
Summary meta-analysis results showed that, on average, compared with non-exercise controls, therapeutic exercise reduced pain on a standardised 0-100 scale (with 100 corresponding to worst pain), with a difference of -6·36 points (95% CI -8·45 to -4·27, borrowing of strength [BoS] 10·3%, between-study variance [τ2] 21·6) in the short term, -3·77 points (-5·97 to -1·57, BoS 30·0%, τ2 14·4) in the medium term, and -3·43 points (-5·18 to -1·69, BoS 31·7%, τ2 4·5) in the long term. Therapeutic exercise also improved physical function on a standardised 0-100 scale (with 100 corresponding to worst physical function), with a difference of -4·46 points in the short term (95% CI -5·95 to -2·98, BoS 10·5%, τ2 10·1), -2·71 points in the medium term (-4·63 to -0·78, BoS 33·6%, τ2 11·9), and -3·39 points in the long term (-4·97 to -1·81, BoS 34·1%, τ2 6·4). Baseline pain and physical function moderated the effect of exercise on pain and physical function outcomes. Those with higher self-reported pain and physical function scores at baseline (ie, poorer physical function) generally benefited more than those with lower self-reported pain and physical function scores at baseline, with the evidence most certain in the short term (12 weeks). INTERPRETATION There was evidence of a small, positive overall effect of therapeutic exercise on pain and physical function compared with non-exercise controls. However, this effect is of questionable clinical importance, particularly in the medium and long term. As individuals with higher pain severity and poorer physical function at baseline benefited more than those with lower pain severity and better physical function at baseline, targeting individuals with higher levels of osteoarthritis-related pain and disability for therapeutic exercise might be of merit. FUNDING Chartered Society of Physiotherapy Charitable Trust and the National Institute for Health and Care Research.
32. Marlin N, Godolphin PJ, Hooper RL, Riley RD, Rogozińska E. Nonlinear effects and effect modification at the participant-level in IPD meta-analysis part 2: methodological guidance is available. J Clin Epidemiol 2023; 159:319-329. [PMID: 37146657 DOI: 10.1016/j.jclinepi.2023.04.014]
Abstract
OBJECTIVES To review methodological guidance for nonlinear covariate-outcome associations (NL), and linear effect modification and nonlinear effect modification (LEM and NLEM), at the participant level in individual participant data meta-analyses (IPDMAs), and their power requirements. STUDY DESIGN AND SETTING We searched Medline, Embase, Web of Science, Scopus, PsycINFO and the Cochrane Library to identify methodology publications on IPDMA of LEM, NL or NLEM (PROSPERO CRD42019126768). RESULTS Through screening 6,466 records, we identified 54 potentially relevant articles, of which 23 full texts were relevant. Nine further relevant publications, published before or after the literature search, were added. Of these 32 references, 21 articles considered LEM, 6 considered NL or NLEM, and 6 described sample size calculations; one book covered all four topics. Sample size may be calculated through simulation or in closed form. Assessments of LEM or NLEM at the participant level need to be based on within-trial information alone. Nonlinearity (NL or NLEM) can be modeled using polynomials or splines to avoid categorization. CONCLUSION Detailed methodological guidance on IPDMA of effect modification at the participant level is available. However, methodology papers on sample size and nonlinearity are rarer and may not cover all scenarios. On these aspects, further guidance is needed.
33. Marlin N, Godolphin PJ, Hooper RL, Riley RD, Rogozińska E. Nonlinear effects and effect modification at the participant-level in IPD meta-analysis part 1: analysis methods are often substandard. J Clin Epidemiol 2023; 159:309-318. [PMID: 37146661 DOI: 10.1016/j.jclinepi.2023.04.013]
Abstract
OBJECTIVES To review analysis methods used for linear effect modification (LEM), nonlinear covariate-outcome associations (NL) and nonlinear effect modification (NLEM) at the participant-level in individual participant data meta-analysis (IPDMA). STUDY DESIGN AND SETTING We searched Medline, Embase, Web of Science, Scopus, PsycINFO and the Cochrane Library to identify IPDMA of randomized controlled trials (PROSPERO CRD42019126768). We investigated if and how IPDMA examined LEM, NL and NLEM, including whether aggregation bias was addressed and if power was considered. RESULTS We screened 6,466 records, randomly sampled 207 and identified 100 IPDMA of LEM, NL or NLEM. Power for LEM was calculated a priori in 3 IPDMA. Of 100 IPDMA, 94 analyzed LEM, 4 NLEM and 8 NL. One-stage models were favoured for all three (56%, 100%, 50%, respectively). Two-stage models were used in 15%, 0% and 25% of IPDMA with unclear descriptions in 30%, 0% and 25%, respectively. Only 12% of one-stage LEM and NLEM IPDMA provided sufficient detail to confirm they had addressed aggregation bias. CONCLUSION Investigation of effect modification at the participant-level is common in IPDMA projects, but methods are often open to bias or lack detailed descriptions. Nonlinearity of continuous covariates and power of IPDMA are rarely assessed.
34. Andaur Navarro CL, Damen JAA, Takada T, Nijman SWJ, Dhiman P, Ma J, Collins GS, Bajpai R, Riley RD, Moons KGM, Hooft L. Systematic review finds "spin" practices and poor reporting standards in studies on machine learning-based prediction models. J Clin Epidemiol 2023; 158:99-110. [PMID: 37024020 DOI: 10.1016/j.jclinepi.2023.03.024]
Abstract
OBJECTIVES We evaluated the presence and frequency of spin practices and poor reporting standards in studies that developed and/or validated clinical prediction models using supervised machine learning techniques. STUDY DESIGN AND SETTING We systematically searched PubMed from 01/2018 to 12/2019 to identify diagnostic and prognostic prediction model studies using supervised machine learning. No restrictions were placed on data source, outcome, or clinical specialty. RESULTS We included 152 studies: 38% reported diagnostic models and 62% prognostic models. When reported, discrimination was described without precision estimates in 53/71 abstracts (74.6% [95% CI 63.4-83.3]) and 53/81 main texts (65.4% [95% CI 54.6-74.9]). Of the 21 abstracts that recommended the model to be used in daily practice, 20 (95.2% [95% CI 77.3-99.8]) lacked any external validation of the developed models. Likewise, 74/133 (55.6% [95% CI 47.2-63.8]) studies made recommendations for clinical use in their main text without any external validation. Reporting guidelines were cited in 13/152 (8.6% [95% CI 5.1-14.1]) studies. CONCLUSION Spin practices and poor reporting standards are also present in studies on prediction models using machine learning techniques. A tailored framework for the identification of spin will enhance the sound reporting of prediction model studies.
35. Nakafero G, Grainge MJ, Williams HC, Card T, Taal MW, Aithal GP, Fox CP, Mallen CD, van der Windt DA, Stevenson MD, Riley RD, Abhishek A. Risk stratified monitoring for methotrexate toxicity in immune mediated inflammatory diseases: prognostic model development and validation using primary care data from the UK. BMJ 2023; 381:e074678. [PMID: 37253479 DOI: 10.1136/bmj-2022-074678]
Abstract
OBJECTIVE To develop and validate a prognostic model to inform risk stratified decisions on frequency of monitoring blood tests during long term methotrexate treatment. DESIGN Retrospective cohort study. SETTING Electronic health records within the UK's Clinical Practice Research Datalink (CPRD) Gold and CPRD Aurum. PARTICIPANTS Adults (≥18 years) with a diagnosis of an immune mediated inflammatory disease who were prescribed methotrexate by their general practitioner for six months or more during 2007-19. MAIN OUTCOME MEASURE Discontinuation of methotrexate owing to an abnormal monitoring blood test result. Patients were followed up from six months after their first prescription for methotrexate in primary care to the earliest of outcome, drug discontinuation for any other reason, leaving the practice, last data collection from the practice, death, five years, or 31 December 2019. Cox regression was performed to develop the risk equation, with bootstrapping used to shrink predictor effects for optimism. Multiple imputation handled missing predictor data. Model performance was assessed in terms of calibration and discrimination. RESULTS Data from 13 110 (854 events) and 23 999 (1486 events) participants were included in the development and validation cohorts, respectively. 11 candidate predictors (17 parameters) were included. In the development dataset, the optimism adjusted R2 was 0.13 and the optimism adjusted Royston D statistic was 0.79. The calibration slope and Royston D statistic in the validation dataset for the entire follow-up period were 0.94 (95% confidence interval 0.85 to 1.02) and 0.75 (95% confidence interval 0.67 to 0.83), respectively. The prognostic model performed well in predicting outcomes in clinically relevant subgroups defined by age group, type of immune mediated inflammatory disease, and methotrexate dose.
CONCLUSION A prognostic model was developed and validated that uses information collected during routine clinical care and may be used to risk stratify the frequency of monitoring blood tests during long term methotrexate treatment.
36. Snell KIE, Levis B, Damen JAA, Dhiman P, Debray TPA, Hooft L, Reitsma JB, Moons KGM, Collins GS, Riley RD. Transparent reporting of multivariable prediction models for individual prognosis or diagnosis: checklist for systematic reviews and meta-analyses (TRIPOD-SRMA). BMJ 2023; 381:e073538. [PMID: 37137496 PMCID: PMC10155050 DOI: 10.1136/bmj-2022-073538]
37. Dhiman P, Ma J, Andaur Navarro CL, Speich B, Bullock G, Damen JAA, Hooft L, Kirtley S, Riley RD, Van Calster B, Moons KGM, Collins GS. Overinterpretation of findings in machine learning prediction model studies in oncology: a systematic review. J Clin Epidemiol 2023; 157:120-133. [PMID: 36935090 DOI: 10.1016/j.jclinepi.2023.03.012]
Abstract
OBJECTIVES In biomedical research, spin is the overinterpretation of findings, and it is a growing concern. To date, the presence of spin has not been evaluated in prognostic model research in oncology, including studies developing and validating models for individualized risk prediction. STUDY DESIGN AND SETTING We conducted a systematic review, searching MEDLINE and EMBASE for oncology-related studies that developed and validated a prognostic model using machine learning, published between January 1, 2019, and September 5, 2019. We used existing spin frameworks and described areas of highly suggestive spin practices. RESULTS We included 62 publications (including 152 developed models and 37 validated models). Reporting was inconsistent between the methods and results in 27% of studies, owing to additional analyses and selective reporting. Thirty-two studies (out of 36 applicable studies) reported comparisons between developed models in their discussion and predominantly used discrimination measures to support their claims (78%). Thirty-five studies (56%) used an overly strong or leading word in their title, abstract, results, discussion, or conclusion. CONCLUSION The potential for spin needs to be considered when reading, interpreting, and using studies that developed and validated prognostic models in oncology. Researchers should carefully report their prognostic model research using words that reflect their actual results and strength of evidence.
38. Pate A, Riley RD, Collins GS, van Smeden M, Van Calster B, Ensor J, Martin GP. Minimum sample size for developing a multivariable prediction model using multinomial logistic regression. Stat Methods Med Res 2023; 32:555-571. [PMID: 36660777 PMCID: PMC10012398 DOI: 10.1177/09622802231151220]
Abstract
AIMS Multinomial logistic regression models allow one to predict the risk of a categorical outcome with more than two categories. When developing such a model, researchers should ensure the number of participants (n) is appropriate relative to the number of events (E_k) and the number of predictor parameters (p_k) for each category k. We propose three criteria to determine the minimum n required, in light of existing criteria developed for binary outcomes. PROPOSED CRITERIA The first criterion aims to minimise model overfitting. The second aims to minimise the difference between the observed and adjusted Nagelkerke R2. The third aims to ensure the overall risk is estimated precisely. For criterion (i), we show the sample size must be based on the anticipated Cox-Snell R2 of distinct 'one-to-one' logistic regression models corresponding to the sub-models of the multinomial logistic regression, rather than on the overall Cox-Snell R2 of the multinomial logistic regression. EVALUATION OF CRITERIA We tested the performance of criterion (i) through a simulation study and found that it resulted in the desired level of overfitting. Criteria (ii) and (iii) were natural extensions of previously proposed criteria for binary outcomes and did not require evaluation through simulation. SUMMARY We illustrate how to implement the sample size criteria through a worked example considering the development of a multinomial risk prediction model for tumour type when presented with an ovarian mass. Code is provided for the simulation and the worked example. We will embed our proposed criteria within the pmsampsize R library and Stata modules.
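Criterion (i) as described, i.e., applying the binary-outcome shrinkage criterion to each 'one-to-one' logistic sub-model, might be sketched as below. The rescaling from each sub-model's two-category subsample to the full sample, and all numbers used, are assumptions for illustration.

```python
from math import ceil, log

def n_submodel(p_k, r2_cs_k, pair_prop, shrinkage=0.9):
    """Binary-outcome shrinkage criterion applied to one 'one-to-one'
    logistic sub-model (category k vs the reference category), then scaled
    from the two-category subsample to the full sample. The scaling by the
    pair's combined outcome proportion is an assumption of this sketch."""
    n_sub = p_k / ((shrinkage - 1) * log(1 - r2_cs_k / shrinkage))
    return ceil(n_sub / pair_prop)

# Hypothetical 3-category outcome with anticipated category proportions
# 0.5 (reference), 0.3, and 0.2, and 10 predictor parameters per sub-model:
n_required = max(
    n_submodel(10, 0.10, 0.5 + 0.3),  # category 2 vs reference
    n_submodel(10, 0.05, 0.5 + 0.2),  # category 3 vs reference
)
print(n_required)
```

Taking the maximum across sub-models reflects the paper's point that the anticipated Cox-Snell R2 of each pairwise sub-model, not the overall multinomial R2, drives the required n.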
39. Lee SI, Hope H, O'Reilly D, Kent L, Santorelli G, Subramanian A, Moss N, Azcoaga-Lorenzo A, Fagbamigbe AF, Nelson-Piercy C, Yau C, McCowan C, Kennedy JI, Phillips K, Singh M, Mhereeg M, Cockburn N, Brocklehurst P, Plachcinski R, Riley RD, Thangaratinam S, Brophy S, Hemali Sudasinghe SPB, Agrawal U, Vowles Z, Abel KM, Nirantharakumar K, Black M, Eastwood KA. Maternal and child outcomes for pregnant women with pre-existing multiple long-term conditions: protocol for an observational study in the UK. BMJ Open 2023; 13:e068718. [PMID: 36828655 PMCID: PMC9972454 DOI: 10.1136/bmjopen-2022-068718]
Abstract
INTRODUCTION One in five pregnant women in the UK has multiple pre-existing long-term conditions. Studies have shown that maternal multiple long-term conditions are associated with adverse outcomes. This observational study aims to compare maternal and child outcomes for pregnant women with multiple long-term conditions with those for women without multiple long-term conditions (zero or one long-term condition). METHODS AND ANALYSIS Pregnant women aged 15-49 years with a conception date between 2000 and 2019 in the UK will be included, with follow-up until 2019. The data source will be routine health records from all four UK nations (Clinical Practice Research Datalink (England), Secure Anonymised Information Linkage (Wales), Scotland routine health records and Northern Ireland Maternity System) and the Born in Bradford birth cohort. The exposure, two or more pre-existing long-term physical or mental health conditions, will be defined from a list of health conditions predetermined by women and clinicians. The association of maternal multiple long-term conditions with (a) antenatal, (b) peripartum, (c) postnatal and long-term, and (d) mental health outcomes, for both women and their children, will be examined. Outcomes of interest will be guided by a core outcome set. Comparisons will be made between pregnant women with and without multiple long-term conditions using modified Poisson and Cox regression. Generalised estimating equations will account for the clustering effect of women who had more than one pregnancy episode. Where appropriate, multiple imputation with chained equations will be used for missing data. Federated analysis will be conducted for each dataset and results will be pooled using random-effects meta-analyses. ETHICS AND DISSEMINATION Approval has been obtained from the respective data sources in each UK nation. Study findings will be submitted for publication in peer-reviewed journals and presented at key conferences.
40. Debray TPA, Collins GS, Riley RD, Snell KIE, Van Calster B, Reitsma JB, Moons KGM. Transparent reporting of multivariable prediction models developed or validated using clustered data: TRIPOD-Cluster checklist. BMJ 2023; 380:e071018. PMID: 36750242; PMCID: PMC9903175; DOI: 10.1136/bmj-2022-071018.
41. Debray TPA, Collins GS, Riley RD, Snell KIE, Van Calster B, Reitsma JB, Moons KGM. Transparent reporting of multivariable prediction models developed or validated using clustered data (TRIPOD-Cluster): explanation and elaboration. BMJ 2023; 380:e071058. PMID: 36750236; PMCID: PMC9903176; DOI: 10.1136/bmj-2022-071058.
42. Hudda MT, Archer L, van Smeden M, Moons KGM, Collins GS, Steyerberg EW, Wahlich C, Reitsma JB, Riley RD, Van Calster B, Wynants L. Minimal reporting improvement after peer review in reports of COVID-19 prediction models: systematic review. J Clin Epidemiol 2023; 154:75-84. PMID: 36528232; PMCID: PMC9749392; DOI: 10.1016/j.jclinepi.2022.12.005.
Abstract
OBJECTIVES To assess improvement in the completeness of reporting coronavirus (COVID-19) prediction models after the peer review process. STUDY DESIGN AND SETTING Studies included in a living systematic review of COVID-19 prediction models, with both preprint and peer-reviewed published versions available, were assessed. The primary outcome was the change in percentage adherence to the transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD) reporting guidelines between pre-print and published manuscripts. RESULTS Nineteen studies were identified including seven (37%) model development studies, two external validations of existing models (11%), and 10 (53%) papers reporting on both development and external validation of the same model. Median percentage adherence among preprint versions was 33% (min-max: 10 to 68%). The percentage adherence of TRIPOD components increased from preprint to publication in 11/19 studies (58%), with adherence unchanged in the remaining eight studies. The median change in adherence was just 3 percentage points (pp, min-max: 0-14 pp) across all studies. No association was observed between the change in percentage adherence and preprint score, journal impact factor, or time between journal submission and acceptance. CONCLUSIONS The preprint reporting quality of COVID-19 prediction modeling studies is poor and did not improve much after peer review, suggesting peer review had a trivial effect on the completeness of reporting during the pandemic.
43. Andaur Navarro CL, Damen JAA, van Smeden M, Takada T, Nijman SWJ, Dhiman P, Ma J, Collins GS, Bajpai R, Riley RD, Moons KGM, Hooft L. Systematic review identifies the design and methodological conduct of studies on machine learning-based prediction models. J Clin Epidemiol 2023; 154:8-22. PMID: 36436815; DOI: 10.1016/j.jclinepi.2022.11.015.
Abstract
BACKGROUND AND OBJECTIVES We sought to summarize the study design, modelling strategies, and performance measures reported in studies on clinical prediction models developed using machine learning techniques. METHODS We searched PubMed for articles published between 01/01/2018 and 31/12/2019, describing the development, or the development with external validation, of a multivariable prediction model using any supervised machine learning technique. No restrictions were made based on study design, data source, or predicted patient-related health outcomes. RESULTS We included 152 studies: 58 (38.2% [95% CI 30.8-46.1]) were diagnostic and 94 (61.8% [95% CI 53.9-69.2]) were prognostic studies. Most studies reported only the development of prediction models (n = 133, 87.5% [95% CI 81.3-91.8]), focused on binary outcomes (n = 131, 86.2% [95% CI 79.8-90.8]), and did not report a sample size calculation (n = 125, 82.2% [95% CI 75.4-87.5]). The most common algorithms used were support vector machines (n = 86/522, 16.5% [95% CI 13.5-19.9]) and random forests (n = 73/522, 14% [95% CI 11.3-17.2]). Values for the area under the receiver operating characteristic curve ranged from 0.45 to 1.00. Calibration metrics were often missing (n = 494/522, 94.6% [95% CI 92.4-96.3]). CONCLUSION Our review revealed that focus is required on the handling of missing values, methods for internal validation, and reporting of calibration to improve the methodological conduct of studies on machine learning-based prediction models. SYSTEMATIC REVIEW REGISTRATION PROSPERO, CRD42019161764.
44. Sperrin M, Riley RD, Collins GS, Martin GP. Targeted validation: validating clinical prediction models in their intended population and setting. Diagn Progn Res 2022; 6:24. PMID: 36550534; PMCID: PMC9773429; DOI: 10.1186/s41512-022-00136-8.
Abstract
Clinical prediction models must be appropriately validated before they can be used. While validation studies are sometimes carefully designed to match an intended population/setting of the model, it is common for validation studies to take place with arbitrary datasets, chosen for convenience rather than relevance. We call estimating how well a model performs within the intended population/setting "targeted validation". Use of this term sharpens the focus on the intended use of a model, which may increase the applicability of developed models, avoid misleading conclusions, and reduce research waste. It also exposes that external validation may not be required when the intended population for the model matches the population used to develop the model; here, a robust internal validation may be sufficient, especially if the development dataset was large.
45. Riley RD, Cole TJ, Deeks J, Kirkham JJ, Morris J, Perera R, Wade A, Collins GS. On the 12th Day of Christmas, a Statistician Sent to Me . . . BMJ 2022; 379:e072883. PMID: 36593578; PMCID: PMC9844255; DOI: 10.1136/bmj-2022-072883.
46. Archer L, Koshiaris C, Lay-Flurrie S, Snell KIE, Riley RD, Stevens R, Banerjee A, Usher-Smith JA, Clegg A, Payne RA, Hobbs FDR, McManus RJ, Sheppard JP. Development and external validation of a risk prediction model for falls in patients with an indication for antihypertensive treatment: retrospective cohort study. BMJ 2022; 379:e070918. PMID: 36347531; PMCID: PMC9641577; DOI: 10.1136/bmj-2022-070918.
Abstract
OBJECTIVE To develop and externally validate the STRAtifying Treatments In the multi-morbid Frail elderlY (STRATIFY)-Falls clinical prediction model to identify the risk of hospital admission or death from a fall in patients with an indication for antihypertensive treatment. DESIGN Retrospective cohort study. SETTING Primary care data from electronic health records contained within the UK Clinical Practice Research Datalink (CPRD). PARTICIPANTS Patients aged 40 years or older with at least one blood pressure measurement between 130 mm Hg and 179 mm Hg. MAIN OUTCOME MEASURE First serious fall, defined as hospital admission or death with a primary diagnosis of a fall within 10 years of the index date (12 months after cohort entry). Model development was conducted using a Fine-Gray approach in data from CPRD GOLD, accounting for the competing risk of death from other causes, with subsequent recalibration at one, five, and 10 years using pseudo values. External validation was conducted using data from CPRD Aurum, with performance assessed through calibration curves and the observed to expected ratio, C statistic, and D statistic, pooled across general practices, and clinical utility using decision curve analysis at thresholds around 10%. RESULTS Analysis included 1 772 600 patients (experiencing 62 691 serious falls) from CPRD GOLD used in model development, and 3 805 366 (experiencing 206 956 serious falls) from CPRD Aurum in the external validation. The final model consisted of 24 predictors, including age, sex, ethnicity, alcohol consumption, living in an area of high social deprivation, a history of falls, multiple sclerosis, and prescriptions of antihypertensives, antidepressants, hypnotics, and anxiolytics. Upon external validation, the recalibrated model showed good discrimination, with pooled C statistics of 0.833 (95% confidence interval 0.831 to 0.835) and 0.843 (0.841 to 0.844) at five and 10 years, respectively. 
Original model calibration was poor on visual inspection and although this was improved with recalibration, under-prediction of risk remained (observed to expected ratio at 10 years 1.839, 95% confidence interval 1.811 to 1.865). Nevertheless, decision curve analysis suggests potential clinical utility, with net benefit larger than other strategies. CONCLUSIONS This prediction model uses commonly recorded clinical characteristics and distinguishes well between patients at high and low risk of falls in the next 1-10 years. Although miscalibration was evident on external validation, the model still had potential clinical utility around risk thresholds of 10% and so could be useful in routine clinical practice to help identify those at high risk of falls who might benefit from closer monitoring or early intervention to prevent future falls. Further studies are needed to explore the appropriate thresholds that maximise the model's clinical utility and cost effectiveness.
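The decision curve analysis referred to above compares the model's net benefit with default strategies at a chosen risk threshold. A minimal sketch of the net-benefit calculation follows; the outcomes and predicted risks are made up for illustration, not drawn from CPRD.

```python
def net_benefit(y_true, risks, threshold):
    """Net benefit of treating everyone whose predicted risk >= threshold:
    (true positives - false positives * odds(threshold)) / N."""
    n = len(y_true)
    treated = [(y, r) for y, r in zip(y_true, risks) if r >= threshold]
    tp = sum(1 for y, _ in treated if y == 1)
    fp = sum(1 for y, _ in treated if y == 0)
    return (tp - fp * threshold / (1 - threshold)) / n

# Hypothetical outcomes (1 = serious fall) and predicted 10-year risks
y = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
r = [0.25, 0.05, 0.12, 0.30, 0.02, 0.08, 0.15, 0.40, 0.03, 0.06]
nb_model = net_benefit(y, r, 0.10)
# "Treat all" benchmark: every patient exceeds the threshold by construction
nb_all = net_benefit(y, [1.0] * len(y), 0.10)
print(nb_model > nb_all)  # does the model beat treat-all at this threshold?
```

A full decision curve repeats this over a range of thresholds (the paper focuses on thresholds around 10%) and also plots the "treat none" strategy, whose net benefit is zero by definition.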
47. Riley RD, Hattle M, Collins GS, Whittle R, Ensor J. Calculating the power to examine treatment-covariate interactions when planning an individual participant data meta-analysis of randomized trials with a binary outcome. Stat Med 2022; 41:4822-4837. PMID: 35932153; PMCID: PMC9805219; DOI: 10.1002/sim.9538.
Abstract
Before embarking on an individual participant data meta-analysis (IPDMA) project, researchers and funders need assurance it is worth their time and cost. This should include consideration of how many studies are promising their IPD and, given the characteristics of these studies, the power of an IPDMA including them. Here, we show how to estimate the power of a planned IPDMA of randomized trials to examine treatment-covariate interactions at the participant level (ie, treatment effect modifiers). We focus on a binary outcome with binary or continuous covariates, and propose a three-step approach, which assumes the true interaction size is common to all trials. In step one, the user must specify a minimally important interaction size and, for each trial separately (eg, as obtained from trial publications), the following aggregate data: the number of participants and events in control and treatment groups, the mean and SD for each continuous covariate, and the proportion of participants in each category for each binary covariate. This allows the variance of the interaction estimate to be calculated for each trial, using an analytic solution for Fisher's information matrix from a logistic regression model. Step 2 calculates the variance of the summary interaction estimate from the planned IPDMA (equal to the inverse of the sum of the inverse trial variances from step 1), and step 3 calculates the corresponding power based on a two-sided Wald test. Stata and R code are provided, and two examples given for illustration. Extension to allow for between-study heterogeneity is also considered.
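Steps 2 and 3 of the approach, inverse-variance pooling under a common-effect assumption followed by power from a two-sided Wald test, reduce to a few lines. The sketch below is not the authors' published Stata/R code; it takes the per-trial interaction variances from step 1 as given, and the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def ipdma_interaction_power(trial_variances, interaction, alpha=0.05):
    """Power of a two-sided Wald test for a common treatment-covariate
    interaction, given each trial's interaction-estimate variance (step 1,
    assumed already derived from aggregate data)."""
    # Step 2: the summary (common-effect) variance is the inverse of the
    # sum of the inverse trial variances.
    var_summary = 1.0 / sum(1.0 / v for v in trial_variances)
    se = sqrt(var_summary)
    # Step 3: two-sided Wald test power for the minimally important
    # interaction size.
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha / 2)
    return nd.cdf(interaction / se - z) + nd.cdf(-interaction / se - z)

# Hypothetical per-trial variances; minimally important interaction of 0.5
print(round(ipdma_interaction_power([0.08, 0.12, 0.05, 0.20], 0.5), 3))
```

Adding trials (more terms in the inverse-variance sum) shrinks the summary variance and increases power, which is why the calculation should condition on the trials actually promising their IPD.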
48. Hudda MT, Wells JCK, Adair LS, Alvero-Cruz JRA, Ashby-Thompson MN, Ballesteros-Vásquez MN, Barrera-Exposito J, Caballero B, Carnero EA, Cleghorn GJ, Davies PSW, Desmond M, Devakumar D, Gallagher D, Guerrero-Alcocer EV, Haschke F, Horlick M, Ben Jemaa H, Khan AI, Mankai A, Monyeki MA, Nashandi HL, Ortiz-Hernandez L, Plasqui G, Reichert FF, Robles-Sardin AE, Rush E, Shypailo RJ, Sobiecki JG, Ten Hoor GA, Valdés J, Wickramasinghe VP, Wong WW, Riley RD, Owen CG, Whincup PH, Nightingale CM. External validation of a prediction model for estimating fat mass in children and adolescents in 19 countries: individual participant data meta-analysis. BMJ 2022; 378:e071185. PMID: 36130780; PMCID: PMC9490487; DOI: 10.1136/bmj-2022-071185.
Abstract
OBJECTIVE To evaluate the performance of a UK based prediction model for estimating fat-free mass (and indirectly fat mass) in children and adolescents in non-UK settings. DESIGN Individual participant data meta-analysis. SETTING 19 countries. PARTICIPANTS 5693 children and adolescents (49.7% boys) aged 4 to 15 years with complete data on the predictors included in the UK based model (weight, height, age, sex, and ethnicity) and on the independently assessed outcome measure (fat-free mass determined by deuterium dilution assessment). MAIN OUTCOME MEASURES The outcome of the UK based prediction model was natural log transformed fat-free mass (lnFFM). Predictive performance statistics of R2, calibration slope, calibration-in-the-large, and root mean square error were assessed in each of the 19 countries and then pooled through random effects meta-analysis. Calibration plots were also derived for each country, including flexible calibration curves. RESULTS The model showed good predictive ability in non-UK populations of children and adolescents, providing R2 values of >75% in all countries and >90% in 11 of the 19 countries, and with good calibration (ie, agreement) of observed and predicted values. Root mean square error values (on fat-free mass scale) were <4 kg in 17 of the 19 settings. Pooled values (95% confidence intervals) of R2, calibration slope, and calibration-in-the-large were 88.7% (85.9% to 91.4%), 0.98 (0.97 to 1.00), and 0.01 (-0.02 to 0.04), respectively. Heterogeneity was evident in the R2 and calibration-in-the-large values across settings, but not in the calibration slope. Model performance did not vary markedly between boys and girls, age, ethnicity, and national income groups. To further improve the accuracy of the predictions, the model equation was recalibrated for the intercept in each setting so that country specific equations are available for future use. 
CONCLUSION The UK based prediction model, which is based on readily available measures, provides predictions of childhood fat-free mass, and hence fat mass, in a range of non-UK settings that explain a large proportion of the variability in observed fat-free mass, and exhibit good calibration performance, especially after recalibration of the intercept for each population. The model demonstrates good generalisability in both low-middle income and high income populations of healthy children and adolescents aged 4-15 years.
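For a continuous outcome such as lnFFM, the calibration measures reported above have simple closed forms: calibration-in-the-large is the mean difference between observed and predicted values, and the calibration slope is the OLS slope from regressing observed on predicted. A minimal sketch with made-up lnFFM values (not data from the study):

```python
from statistics import mean

def calibration_metrics(observed, predicted):
    """Calibration-in-the-large (ideal 0) and calibration slope (ideal 1)
    for a continuous prediction, e.g. natural-log fat-free mass."""
    citl = mean(observed) - mean(predicted)
    mo, mp = mean(observed), mean(predicted)
    cov = sum((p - mp) * (o - mo) for o, p in zip(observed, predicted))
    var = sum((p - mp) ** 2 for p in predicted)
    return citl, cov / var

# Hypothetical predicted and observed lnFFM values
pred = [2.9, 3.1, 3.3, 3.5, 3.7]
obs = [2.95, 3.08, 3.35, 3.52, 3.66]
citl, slope = calibration_metrics(obs, pred)
# Recalibrating the intercept per setting, as the study does per country,
# amounts to adding `citl` to every prediction in that setting.
recal = [p + citl for p in pred]
```

In the IPDMA itself these statistics were computed within each country and then pooled by random-effects meta-analysis, rather than on one combined sample as here.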
49. Riley RD, Dias S, Donegan S, Tierney JF, Stewart LA, Efthimiou O, Phillippo DM. Using individual participant data to improve network meta-analysis projects. BMJ Evid Based Med 2022; 28:197-203. PMID: 35948411; DOI: 10.1136/bmjebm-2022-111931.
Abstract
A network meta-analysis combines the evidence from existing randomised trials about the comparative efficacy of multiple treatments. It allows direct and indirect evidence about each comparison to be included in the same analysis, and provides a coherent framework to compare and rank treatments. A traditional network meta-analysis uses aggregate data (eg, treatment effect estimates and standard errors) obtained from publications or trial investigators. An alternative approach is to obtain, check, harmonise and meta-analyse the individual participant data (IPD) from each trial. In this article, we describe potential advantages of IPD for network meta-analysis projects, emphasising five key benefits: (1) improving the quality and scope of information available for inclusion in the meta-analysis, (2) examining and plotting distributions of covariates across trials (eg, for potential effect modifiers), (3) standardising and improving the analysis of each trial, (4) adjusting for prognostic factors to allow a network meta-analysis of conditional treatment effects and (5) including treatment-covariate interactions (effect modifiers) to allow relative treatment effects to vary by participant-level covariate values (eg, age, baseline depression score). A running theme of all these benefits is that they help examine and reduce heterogeneity (differences in the true treatment effect between trials) and inconsistency (differences in the true treatment effect between direct and indirect evidence) in the network. As a consequence, an IPD network meta-analysis has the potential for more precise, reliable and informative results for clinical practice and even allows treatment comparisons to be made for individual patients and targeted populations conditional on their particular characteristics.
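The "direct and indirect evidence" combined in a network meta-analysis can be illustrated with the simplest case, the Bucher indirect comparison: if trials compare A vs B and A vs C, the B vs C effect follows by subtraction under the consistency assumption. A sketch with hypothetical effect estimates (e.g., log odds ratios), not taken from this article:

```python
from math import sqrt

def bucher_indirect(d_ab, se_ab, d_ac, se_ac):
    """Indirect B-vs-C effect from A-vs-B and A-vs-C evidence:
    d_BC = d_AC - d_AB, with the variances adding."""
    return d_ac - d_ab, sqrt(se_ab ** 2 + se_ac ** 2)

def inverse_variance_pool(d_direct, se_direct, d_indirect, se_indirect):
    """Combine direct and indirect estimates of the same contrast
    with fixed-effect inverse-variance weights."""
    w1, w2 = 1 / se_direct ** 2, 1 / se_indirect ** 2
    return (w1 * d_direct + w2 * d_indirect) / (w1 + w2), sqrt(1 / (w1 + w2))

# Hypothetical log odds ratios: A vs B and A vs C from separate trials
d_bc, se_bc = bucher_indirect(-0.20, 0.10, -0.50, 0.12)
# Pool with a (hypothetical) direct B-vs-C estimate
pooled, se = inverse_variance_pool(-0.25, 0.15, d_bc, se_bc)
```

Comparing the direct and indirect estimates before pooling is one simple check for the inconsistency that the article discusses; IPD helps reduce such inconsistency by adjusting for effect modifiers.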
50. Moriarty AS, Meader N, Snell KIE, Riley RD, Paton LW, Dawson S, Hendon J, Chew-Graham CA, Gilbody S, Churchill R, Phillips RS, Ali S, McMillan D. Predicting relapse or recurrence of depression: systematic review of prognostic models. Br J Psychiatry 2022; 221:448-458. PMID: 35048843; DOI: 10.1192/bjp.2021.218.
Abstract
BACKGROUND Relapse and recurrence of depression are common, contributing to the overall burden of depression globally. Accurate prediction of relapse or recurrence while patients are well would allow the identification of high-risk individuals and may effectively guide the allocation of interventions to prevent relapse and recurrence. AIMS To review prognostic models developed to predict the risk of relapse, recurrence, sustained remission, or recovery in adults with remitted major depressive disorder. METHOD We searched the Cochrane Library (current issue); Ovid MEDLINE (1946 onwards); Ovid Embase (1980 onwards); Ovid PsycINFO (1806 onwards); and Web of Science (1900 onwards) up to May 2021. We included development and external validation studies of multivariable prognostic models. We assessed risk of bias of included studies using the Prediction model risk of bias assessment tool (PROBAST). RESULTS We identified 12 eligible prognostic model studies (11 unique prognostic models): 8 model development-only studies, 3 model development and external validation studies and 1 external validation-only study. Multiple estimates of performance measures were not available and meta-analysis was therefore not necessary. Eleven out of the 12 included studies were assessed as being at high overall risk of bias and none examined clinical utility. CONCLUSIONS Due to high risk of bias of the included studies, poor predictive performance and limited external validation of the models identified, presently available clinical prediction models for relapse and recurrence of depression are not yet sufficiently developed for deploying in clinical settings. There is a need for improved prognosis research in this clinical area and future studies should conform to best practice methodological and reporting guidelines.