1
Ali F, Clark H, Machulda M, Senjem ML, Lowe VJ, Jack CR, Josephs KA, Whitwell J, Botha H. Patterns of brain volume and metabolism predict clinical features in the progressive supranuclear palsy spectrum. Brain Commun 2024; 6:fcae233. PMID: 39056025; PMCID: PMC11272075; DOI: 10.1093/braincomms/fcae233. Received 04/20/2023; revised 03/26/2024; accepted 07/14/2024. Open access.
Abstract
Progressive supranuclear palsy (PSP) is a neurodegenerative tauopathy that presents with highly heterogeneous clinical syndromes. We perform cross-sectional data-driven discovery of independent patterns of brain atrophy and hypometabolism across the entire PSP spectrum. We then use these patterns to predict specific clinical features and to assess their relationship to phenotypic heterogeneity. We included 111 patients with PSP (60 with Richardson syndrome and 51 with cortical and subcortical variant subtypes). Ninety-one were used as the training set and 20 as a test set. The presence and severity of granular clinical variables such as postural instability, parkinsonism, apraxia and supranuclear gaze palsy were noted. Domains of akinesia, ocular motor impairment, postural instability and cognitive dysfunction as defined by the Movement Disorders Society criteria for PSP were also recorded. Non-negative matrix factorization was applied to cross-sectional MRI and fluorodeoxyglucose-positron emission tomography (FDG-PET) scans. Independent models for each, as well as a combined model for MRI and FDG-PET, were developed and used to predict the granular clinical variables. Both MRI and FDG-PET were better at predicting the presence of a symptom than its severity, suggesting identification of disease state may be more robust than disease stage. FDG-PET predicted predominantly cortical abnormalities, such as ideomotor apraxia, apraxia of speech and frontal dysexecutive syndrome, better than MRI did. MRI predicted cortical and, more so, subcortical abnormalities, such as parkinsonism. Distinct neuroanatomical foci were predictive in MRI- and FDG-PET-based models. For example, vertical gaze palsy was predicted by midbrain atrophy on MRI, but by frontal eye field hypometabolism on FDG-PET. Findings also differed by scale or instrument used.
For example, prediction of ocular motor abnormalities using the PSP Saccadic Impairment Scale was stronger than with the Movement Disorders Society diagnostic criteria for PSP oculomotor impairment designation. Combining MRI and FDG-PET enhanced detection of the presence of parkinsonism and frontal syndrome, and of the severity of apraxia, cognitive impairment and bradykinesia. Both MRI and FDG-PET patterns were able to predict some measures in the test set; prediction of global cognition measured by the Montreal Cognitive Assessment was the strongest. MRI predictions generalized more robustly to the test set. PSP leads to neurodegeneration in motor, cognitive and ocular motor networks at cortical and subcortical foci, leading to diverse yet overlapping clinical syndromes. To advance understanding of phenotypic heterogeneity in PSP, it is essential to consider data-driven approaches to clinical neuroimaging analyses.
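The pipeline this abstract describes (non-negative matrix factorization of imaging data, then prediction of clinical features from the resulting patterns) can be sketched in a few lines. This is an illustrative sketch on simulated stand-in data: the matrix dimensions, the component count, and the simulated binary symptom are assumptions, not the study's actual data or code.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_subjects, n_voxels, k = 91, 500, 8          # 91 mirrors the training-set size
V = rng.random((n_subjects, n_voxels))        # stand-in for subject-by-voxel atrophy maps

# Factorize V ~ W @ H into k non-negative spatial patterns
nmf = NMF(n_components=k, init="nndsvda", random_state=0, max_iter=500)
W = nmf.fit_transform(V)                      # per-subject pattern loadings
H = nmf.components_                           # spatial patterns over voxels

# Use the loadings as predictors of a (here simulated) binary clinical feature
symptom = rng.binomial(1, 0.5, size=n_subjects)
clf = LogisticRegression().fit(W, symptom)
print("W:", W.shape, "H:", H.shape)
```

The per-subject loadings in `W` play the role of the data-driven atrophy/hypometabolism patterns that feed the clinical prediction models.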
Affiliation(s)
- Farwa Ali
- Department of Neurology, Mayo Clinic, Rochester, MN 55905, USA
- Heather Clark
- Department of Neurology, Mayo Clinic, Rochester, MN 55905, USA
- Mary Machulda
- Department of Psychiatry and Psychology, Mayo Clinic, Rochester, MN 55905, USA
- Val J Lowe
- Department of Radiology, Mayo Clinic, Rochester, MN 55905, USA
- Clifford R Jack
- Department of Radiology, Mayo Clinic, Rochester, MN 55905, USA
- Keith A Josephs
- Department of Neurology, Mayo Clinic, Rochester, MN 55905, USA
- Hugo Botha
- Department of Neurology, Mayo Clinic, Rochester, MN 55905, USA
2
Pavlou M, Ambler G, Qu C, Seaman SR, White IR, Omar RZ. An evaluation of sample size requirements for developing risk prediction models with binary outcomes. BMC Med Res Methodol 2024; 24:146. PMID: 38987715; PMCID: PMC11234534; DOI: 10.1186/s12874-024-02268-5. Received 08/24/2023; accepted 06/24/2024. Open access.
Abstract
BACKGROUND: Risk prediction models are routinely used to assist in clinical decision making. A small sample size for model development can compromise model performance when the model is applied to new patients. For binary outcomes, the calibration slope (CS) and the mean absolute prediction error (MAPE) are two key measures on which sample size calculations for the development of risk models have been based. CS quantifies the degree of model overfitting, while MAPE assesses the accuracy of individual predictions.
METHODS: Recently, two formulae were proposed to calculate the required sample size, given anticipated features of the development data such as the outcome prevalence and c-statistic, to ensure that the expectation of the CS and MAPE (over repeated samples) in models fitted using maximum likelihood estimation (MLE) will meet prespecified target values. In this article, we use a simulation study to evaluate the performance of these formulae.
RESULTS: We found that both formulae work reasonably well when the anticipated model strength is not too high (c-statistic < 0.8), regardless of the outcome prevalence. However, for higher model strengths the CS formula underestimates the required sample size substantially. For example, for c-statistics of 0.85 and 0.9, the sample size needed to be increased by at least 50% and 100%, respectively, to meet the target expected CS. The MAPE formula, on the other hand, tends to overestimate the sample size for high model strengths. These conclusions were more pronounced for higher than for lower prevalence. Similar results were obtained when the outcome was time to event with censoring. Given these findings, we propose a simulation-based approach, implemented in the new R package 'samplesizedev', to correctly estimate the sample size even for high model strengths. The software can also calculate the variability in CS and MAPE, allowing assessment of model stability.
CONCLUSIONS: The CS and MAPE formulae suggest sample sizes that are generally appropriate when the model strength is not too high. However, they tend to be biased for higher model strengths, which are not uncommon in clinical risk prediction studies. On those occasions, our proposed adjustments to the sample size calculations will be relevant.
Affiliation(s)
- Chen Qu
- Department of Statistical Science, UCL, London, UK
- Shaun R Seaman
- MRC Biostatistics Unit, University of Cambridge, Cambridge, UK
3
Chalkou K, Hamza T, Benkert P, Kuhle J, Zecca C, Simoneau G, Pellegrini F, Manca A, Egger M, Salanti G. Combining randomized and non-randomized data to predict heterogeneous effects of competing treatments. Res Synth Methods 2024; 15:641-656. PMID: 38501273; DOI: 10.1002/jrsm.1717. Received 11/07/2022; revised 01/26/2024; accepted 02/16/2024.
Abstract
Some patients benefit from a treatment while others benefit less or not at all. We previously developed a two-stage network meta-regression prediction model that synthesizes randomized trials and evaluates how treatment effects vary across patient characteristics. In this article, we extend this model to combine evidence from different sources and in different formats: aggregate data (AD) and individual participant data (IPD) from randomized and non-randomized studies. In the first stage, a prognostic model is developed to predict the baseline risk of the outcome using a large cohort study. In the second stage, we recalibrate this prognostic model to improve predictions for patients enrolled in randomized trials. In the third stage, we use the baseline risk as an effect modifier in a network meta-regression model combining AD and IPD from randomized clinical trials to estimate heterogeneous treatment effects. We illustrate the approach in a re-analysis of a network of studies comparing three drugs for relapsing-remitting multiple sclerosis. Several patient characteristics influence the baseline risk of relapse, which in turn modifies the effect of the drugs. The proposed model makes personalized predictions for health outcomes under several treatment options and encompasses all relevant randomized and non-randomized evidence.
Affiliation(s)
- Konstantina Chalkou
- Institute of Social and Preventive Medicine, University of Bern, Bern, Switzerland
- Graduate School for Health Sciences, University of Bern, Bern, Switzerland
- Department of Clinical Research, University of Bern, Bern, Switzerland
- Tasnim Hamza
- Institute of Social and Preventive Medicine, University of Bern, Bern, Switzerland
- Graduate School for Health Sciences, University of Bern, Bern, Switzerland
- Pascal Benkert
- Department of Clinical Research, University Hospital Basel, University of Basel, Basel, Switzerland
- Jens Kuhle
- Multiple Sclerosis Centre, Neurologic Clinic and Policlinic, Department of Head, Spine and Neuromedicine, University Hospital Basel, University of Basel, Basel, Switzerland
- Multiple Sclerosis Centre, Neurologic Clinic and Policlinic, Department of Biomedicine, University Hospital Basel, University of Basel, Basel, Switzerland
- Multiple Sclerosis Centre, Neurologic Clinic and Policlinic, Department of Clinical Research, University Hospital Basel, University of Basel, Basel, Switzerland
- Research Center for Clinical Neuroimmunology and Neuroscience (RC2NB), University Hospital, University of Basel, Basel, Switzerland
- Chiara Zecca
- Multiple Sclerosis Center, Neurocenter of Southern Switzerland, EOC, Lugano, Switzerland
- Faculty of Biomedical Sciences, Università della Svizzera Italiana, Lugano, Switzerland
- Andrea Manca
- Centre for Health Economics, University of York, York, UK
- Matthias Egger
- Institute of Social and Preventive Medicine, University of Bern, Bern, Switzerland
- Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, UK
- Georgia Salanti
- Institute of Social and Preventive Medicine, University of Bern, Bern, Switzerland
4
Pavlou M, Omar RZ, Ambler G. Penalized Regression Methods With Modified Cross-Validation and Bootstrap Tuning Produce Better Prediction Models. Biom J 2024; 66:e202300245. PMID: 38922968; DOI: 10.1002/bimj.202300245. Received 09/01/2023; revised 04/22/2024; accepted 05/06/2024.
Abstract
Risk prediction models fitted using maximum likelihood estimation (MLE) are often overfitted, resulting in predictions that are too extreme and a calibration slope (CS) less than 1. Penalized methods, such as Ridge and Lasso, have been suggested as a solution to this problem as they tend to shrink regression coefficients toward zero, resulting in predictions closer to the average. The amount of shrinkage is regulated by a tuning parameter, λ, commonly selected via cross-validation ("standard tuning"). Though penalized methods have been found to improve calibration on average, they often over-shrink and exhibit large variability in the selected λ and hence the CS. This is a problem particularly for small sample sizes, but also when using sample sizes recommended to control overfitting. We consider whether these problems are partly due to selecting λ using cross-validation with "training" datasets of reduced size compared to the original development sample, resulting in an over-estimation of λ and, hence, excessive shrinkage. We propose a modified cross-validation tuning method ("modified tuning"), which estimates λ from a pseudo-development dataset obtained via bootstrapping from the original dataset, albeit of larger size, such that the resulting cross-validation training datasets are of the same size as the original dataset. Modified tuning can be easily implemented in standard software and is closely related to bootstrap selection of the tuning parameter ("bootstrap tuning"). We evaluated modified and bootstrap tuning for Ridge and Lasso in simulated and real data using recommended sample sizes, and sizes slightly lower and higher. They substantially improved the selection of λ, resulting in improved CS compared to the standard tuning method. They also improved predictions compared to MLE.
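A minimal sketch of the modified-tuning idea, assuming a ridge model and K-fold cross-validation. Ridge regression is used here only to keep the illustration short, and the exact resampling details may differ from the authors' method.

```python
import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(0)
n, p, K = 100, 10, 5
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(size=n)
alphas = np.logspace(-3, 3, 50)

# Standard tuning: each CV training set has size n*(K-1)/K, i.e. smaller than n
standard = RidgeCV(alphas=alphas, cv=K).fit(X, y).alpha_

# Modified tuning: bootstrap a pseudo-development set of size m = n*K/(K-1),
# so that each CV training set has size ~n, matching the original sample
m = round(n * K / (K - 1))
idx = rng.integers(0, n, size=m)              # bootstrap indices
modified = RidgeCV(alphas=alphas, cv=K).fit(X[idx], y[idx]).alpha_

print(f"standard tuning: alpha={standard:.3g}; modified tuning: alpha={modified:.3g}")
```

The only change from standard tuning is the inflated bootstrap pseudo-development set, which is what keeps the effective training size during cross-validation equal to the original n.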
5
Gutman R, Karavani E, Shimoni Y. Improving Inverse Probability Weighting by Post-calibrating Its Propensity Scores. Epidemiology 2024; 35:473-480. PMID: 38619218; PMCID: PMC11191550; DOI: 10.1097/ede.0000000000001733. Received 03/15/2023; accepted 03/18/2024.
Abstract
Theoretical guarantees for causal inference using propensity scores are partially based on the scores behaving like conditional probabilities. However, prediction scores between zero and one do not necessarily behave like probabilities, especially when output by flexible statistical estimators. We perform a simulation study to assess the error in estimating the average treatment effect before and after applying a simple and well-established postprocessing method to calibrate the propensity scores. We observe that postcalibration reduces the error in effect estimation and that larger improvements in calibration result in larger improvements in effect estimation. Specifically, we find that expressive tree-based estimators, which are often less calibrated than logistic regression-based models initially, tend to show larger improvements relative to logistic regression-based models. Given the improvement in effect estimation and that postcalibration is computationally cheap, we recommend its adoption when modeling propensity scores with expressive models.
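A sketch of the workflow the abstract describes: fit a flexible propensity model, post-calibrate its scores with a simple, well-established method (Platt scaling on a held-out split is used here; the paper's exact choices are not assumed), and compare inverse-probability-weighted estimates. The data-generating process and all numbers are illustrative.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 3))
e_true = 1 / (1 + np.exp(-X[:, 0]))              # true propensity
a = rng.binomial(1, e_true)                      # treatment assignment
y = 1.0 * a + X[:, 1] + rng.normal(size=n)       # outcome; true ATE = 1

def ipw_ate(y, a, ps):
    ps = np.clip(ps, 1e-3, 1 - 1e-3)             # trim extreme weights
    return np.mean(a * y / ps) - np.mean((1 - a) * y / (1 - ps))

def logit(p):
    p = np.clip(p, 1e-6, 1 - 1e-6)
    return np.log(p / (1 - p))

# Flexible (often poorly calibrated) propensity model, with a held-out
# calibration split reserved for the post-processing step
X_tr, X_cal, a_tr, a_cal = train_test_split(X, a, test_size=0.3, random_state=0)
clf = GradientBoostingClassifier(random_state=0).fit(X_tr, a_tr)

raw = clf.predict_proba(X)[:, 1]
# Platt scaling: logistic recalibration of the score logits on the held-out split
platt = LogisticRegression().fit(logit(clf.predict_proba(X_cal)[:, 1]).reshape(-1, 1), a_cal)
calibrated = platt.predict_proba(logit(raw).reshape(-1, 1))[:, 1]

print("IPW ATE, raw scores:      ", ipw_ate(y, a, raw))
print("IPW ATE, calibrated scores:", ipw_ate(y, a, calibrated))
```

Post-calibration is cheap (one extra univariable logistic fit), which is part of the abstract's argument for adopting it when expressive propensity models are used.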
Affiliation(s)
- Rom Gutman
- IBM Research, University of Haifa Campus
- Technion - Israel Institute of Technology, Haifa, Israel
6
Fan Q, Wang Y, Cheng J, Pan B, Zang X, Liu R, Deng Y. Single-cell RNA-seq reveals T cell exhaustion and immune response landscape in osteosarcoma. Front Immunol 2024; 15:1362970. PMID: 38629071; PMCID: PMC11018946; DOI: 10.3389/fimmu.2024.1362970. Received 12/29/2023; accepted 03/18/2024. Open access.
Abstract
Background: T cell exhaustion in the tumor microenvironment has been demonstrated to be a substantial contributor to tumor immunosuppression and progression. However, the correlation between T cell exhaustion and osteosarcoma (OS) remains unclear.
Methods: In the present study, single-cell RNA-seq data for OS from the GEO database were analysed to identify CD8+ T cells and objectively discern CD8+ T cell subsets. The subgroup differentiation trajectory was then used to pinpoint genes altered in response to T cell exhaustion. Subsequently, six machine learning algorithms were applied to develop a prognostic model linked with T cell exhaustion. This model was validated in the TARGET and Meta cohorts. Finally, we examined disparities in immune cell infiltration, immune checkpoints, immune-related pathways, and the efficacy of immunotherapy between the high and low TEX score groups.
Results: The findings unveiled differential exhaustion of CD8+ T cells within the OS microenvironment. Three genes related to T cell exhaustion (RAD23A, SAC3D1, PSIP1) were identified and employed to formulate a T cell exhaustion model. The model exhibited robust predictive capability for OS prognosis: patients in the low TEX score group demonstrated a more favorable prognosis, increased immune cell infiltration, and heightened responsiveness to treatment compared with those in the high TEX score group.
Conclusion: Our research elucidates the role of T cell exhaustion in the immunotherapy and progression of OS. The prognostic model constructed from T cell exhaustion-related genes holds promise as a method for prognostication in the management and treatment of OS patients.
Affiliation(s)
- Qizhi Fan
- Department of Spine Surgery, Third Xiangya Hospital, Central South University, Changsha, China
- Yiyan Wang
- Department of Spine Surgery, Third Xiangya Hospital, Central South University, Changsha, China
- Jun Cheng
- Department of Spine Surgery, Third Xiangya Hospital, Central South University, Changsha, China
- Boyu Pan
- Department of Orthopedics, Third Hospital of Changsha, Changsha, China
- Xiaofang Zang
- Department of Spine Surgery, Third Xiangya Hospital, Central South University, Changsha, China
- Renfeng Liu
- Department of Spine Surgery, Third Xiangya Hospital, Central South University, Changsha, China
- Youwen Deng
- Department of Spine Surgery, Third Xiangya Hospital, Central South University, Changsha, China
7
Dunias ZS, Van Calster B, Timmerman D, Boulesteix AL, van Smeden M. A comparison of hyperparameter tuning procedures for clinical prediction models: A simulation study. Stat Med 2024; 43:1119-1134. PMID: 38189632; DOI: 10.1002/sim.9932. Received 11/11/2022; revised 09/10/2023; accepted 09/21/2023.
Abstract
Tuning hyperparameters, such as the regularization parameter in Ridge or Lasso regression, is often aimed at improving the predictive performance of risk prediction models. In this study, various hyperparameter tuning procedures for clinical prediction models were systematically compared and evaluated in low-dimensional data. The focus was on out-of-sample predictive performance (discrimination, calibration, and overall prediction error) of risk prediction models developed using Ridge, Lasso, Elastic Net, or Random Forest. The influence of sample size, number of predictors and events fraction on performance of the hyperparameter tuning procedures was studied using extensive simulations. The results indicate important differences between tuning procedures in calibration performance, while generally showing similar discriminative performance. The one-standard-error rule for tuning applied to cross-validation (1SE CV) often resulted in severe miscalibration. Standard non-repeated and repeated cross-validation (both 5-fold and 10-fold) performed similarly well and outperformed the other tuning procedures. Bootstrap showed a slight tendency to more severe miscalibration than standard cross-validation-based tuning procedures. Differences between tuning procedures were larger for smaller sample sizes, lower events fractions and fewer predictors. These results imply that the choice of tuning procedure can have a profound influence on the predictive performance of prediction models. The results support the application of standard 5-fold or 10-fold cross-validation that minimizes out-of-sample prediction error. Despite an increased computational burden, we found no clear benefit of repeated over non-repeated cross-validation for hyperparameter tuning. We warn against the potentially detrimental effects on model calibration of the popular 1SE CV rule for tuning prediction models in low-dimensional settings.
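The 1SE CV rule the study warns about is easy to state concretely: instead of the λ minimising mean cross-validation error, pick the largest λ whose mean CV error is within one standard error of that minimum. A sketch implementing both rules from scikit-learn's Lasso CV path on simulated data (the data and grid are illustrative):

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
n, p = 200, 15
X = rng.normal(size=(n, p))
y = X[:, :5] @ np.ones(5) + rng.normal(size=n)

cv = LassoCV(cv=10).fit(X, y)
mse = cv.mse_path_.mean(axis=1)                  # mean CV error per alpha
se = cv.mse_path_.std(axis=1) / np.sqrt(cv.mse_path_.shape[1])
i_min = int(mse.argmin())
alpha_min = cv.alphas_[i_min]                    # rule 1: minimise CV error
# alphas_ are sorted in decreasing order, so the smallest qualifying index
# corresponds to the largest alpha within one SE of the minimum
i_1se = int(np.where(mse <= mse[i_min] + se[i_min])[0].min())
alpha_1se = cv.alphas_[i_1se]                    # rule 2: the 1SE rule

print(f"alpha_min={alpha_min:.4f}, alpha_1se={alpha_1se:.4f}")
```

The 1SE rule always selects at least as much shrinkage as the minimum-error rule, which is precisely why it can push predictions toward the mean and miscalibrate them in low-dimensional settings.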
Affiliation(s)
- Zoë S Dunias
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, The Netherlands
- Ben Van Calster
- Department of Development and Regeneration, KU Leuven, Leuven, Belgium
- Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, The Netherlands
- Dirk Timmerman
- Department of Development and Regeneration, KU Leuven, Leuven, Belgium
- Department of Obstetrics and Gynecology, University Hospitals Leuven, Leuven, Belgium
- Anne-Laure Boulesteix
- Institute for Medical Information Processing, Biometry and Epidemiology, University of Munich, Munich, Germany
- Munich Center for Machine Learning (MCML), LMU Munich, Munich, Germany
- Maarten van Smeden
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, The Netherlands
8
Hoogland J, Debray TPA, Crowther MJ, Riley RD, IntHout J, Reitsma JB, Zwinderman AH. Regularized parametric survival modeling to improve risk prediction models. Biom J 2024; 66:e2200319. PMID: 37775946; DOI: 10.1002/bimj.202200319. Received 11/29/2022; revised 04/30/2023; accepted 09/17/2023.
Abstract
We propose to combine the benefits of flexible parametric survival modeling and regularization to improve risk prediction modeling in the context of time-to-event data. Thereto, we introduce ridge, lasso, elastic net, and group lasso penalties for both log hazard and log cumulative hazard models. The log (cumulative) hazard in these models is represented by a flexible function of time that may depend on the covariates (i.e., covariate effects may be time-varying). We show that the optimization problem for the proposed models can be formulated as a convex optimization problem and provide a user-friendly R implementation for model fitting and penalty parameter selection based on cross-validation. Simulation study results show the advantage of regularization in terms of increased out-of-sample prediction accuracy and improved calibration and discrimination of predicted survival probabilities, especially when sample size was relatively small with respect to model complexity. An applied example illustrates the proposed methods. In summary, our work provides both a foundation for and an easily accessible implementation of regularized parametric survival modeling and suggests that it improves out-of-sample prediction performance.
Affiliation(s)
- J Hoogland
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Department of Epidemiology and Data Science, Amsterdam University Medical Centers, Amsterdam, The Netherlands
- T P A Debray
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Cochrane Netherlands, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- M J Crowther
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
- R D Riley
- School for Medicine, Keele University, Keele, Staffordshire, UK
- J IntHout
- Radboud Institute for Health Sciences (RIHS), Radboud University Medical Center, Nijmegen, The Netherlands
- J B Reitsma
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Cochrane Netherlands, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- A H Zwinderman
- Department of Epidemiology and Data Science, Amsterdam University Medical Centers, Amsterdam, The Netherlands
9
Lohmann A, Groenwold RHH, van Smeden M. Comparison of likelihood penalization and variance decomposition approaches for clinical prediction models: A simulation study. Biom J 2024; 66:e2200108. PMID: 37199142; DOI: 10.1002/bimj.202200108. Received 03/31/2022; revised 09/30/2022; accepted 11/10/2022.
Abstract
Logistic regression is one of the most commonly used approaches to develop clinical risk prediction models. Developers of such models often rely on approaches that aim to minimize the risk of overfitting and improve the predictive performance of the logistic model, such as likelihood penalization and variance decomposition techniques. We present an extensive simulation study that compares the out-of-sample predictive performance of risk prediction models derived using the elastic net, with Lasso and ridge as special cases, and variance decomposition techniques, namely, incomplete principal component regression and incomplete partial least squares regression. We varied the expected events per variable, event fraction, number of candidate predictors, presence of noise predictors, and presence of sparse predictors in a full-factorial design. Predictive performance was compared on measures of discrimination, calibration, and prediction error. Simulation metamodels were derived to explain the performance differences within model derivation approaches. Our results indicate that, on average, prediction models developed using penalization and variance decomposition approaches outperform models developed using ordinary maximum likelihood estimation, with penalization approaches being consistently superior to the variance decomposition approaches. Differences in performance were most pronounced for the calibration of the model. Performance differences in prediction error and concordance statistic outcomes were often small between approaches. The use of likelihood penalization and variance decomposition techniques is illustrated in the context of peripheral arterial disease.
Affiliation(s)
- Anna Lohmann
- Department of Welfare, EAH Jena University of Applied Sciences, Jena, Germany
- Department of Clinical Epidemiology, Leiden University Medical Center, Leiden, The Netherlands
- Rolf H H Groenwold
- Department of Clinical Epidemiology, Leiden University Medical Center, Leiden, The Netherlands
- Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, The Netherlands
- Maarten van Smeden
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, The Netherlands
10
Riley RD, Pate A, Dhiman P, Archer L, Martin GP, Collins GS. Clinical prediction models and the multiverse of madness. BMC Med 2023; 21:502. PMID: 38110939; PMCID: PMC10729337; DOI: 10.1186/s12916-023-03212-y. Received 08/25/2023; accepted 12/05/2023. Open access.
Abstract
BACKGROUND: Each year, thousands of clinical prediction models are developed to make predictions (e.g. estimated risks) that inform individual diagnosis and prognosis in healthcare. However, most are not reliable for use in clinical practice.
MAIN BODY: We discuss how the creation of a prediction model (e.g. using regression or machine learning methods) depends on the sample and size of data used to develop it: were a different sample of the same size drawn from the same overarching population, the developed model could be very different, even when the same model development methods are used. In other words, for each model created there exists a multiverse of other potential models for that sample size and, crucially, an individual's predicted value (e.g. estimated risk) may vary greatly across this multiverse. The more an individual's prediction varies across the multiverse, the greater the instability. We show how small development datasets lead to more different models in the multiverse, often with vastly unstable individual predictions, and explain how this can be exposed by using bootstrapping and presenting instability plots. We recommend healthcare researchers seek large model development datasets to reduce instability concerns. This is especially important to ensure reliability across subgroups and improve model fairness in practice.
CONCLUSIONS: Instability is concerning, as an individual's predicted value is used to guide their counselling, resource prioritisation, and clinical decision making. If different samples lead to different models with very different predictions for the same individual, this should cast doubt on using a particular model for that individual. Therefore, visualising, quantifying and reporting the instability in individual-level predictions is essential when proposing a new model.
Affiliation(s)
- Richard D Riley
- College of Medical and Dental Sciences, Institute of Applied Health Research, University of Birmingham, Birmingham, B15 2TT, UK
- National Institute for Health and Care Research (NIHR) Birmingham Biomedical Research Centre, Birmingham, UK
- Alexander Pate
- Division of Informatics, Imaging and Data Science, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
- Paula Dhiman
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, OX3 7LD, UK
- Lucinda Archer
- College of Medical and Dental Sciences, Institute of Applied Health Research, University of Birmingham, Birmingham, B15 2TT, UK
- National Institute for Health and Care Research (NIHR) Birmingham Biomedical Research Centre, Birmingham, UK
- Glen P Martin
- Division of Informatics, Imaging and Data Science, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
- Gary S Collins
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, OX3 7LD, UK
11
Riley RD, Collins GS. Stability of clinical prediction models developed using statistical or machine learning methods. Biom J 2023; 65:e2200302. PMID: 37466257; PMCID: PMC10952221; DOI: 10.1002/bimj.202200302. Received 11/02/2022; revised 04/26/2023; accepted 05/02/2023.
Abstract
Clinical prediction models estimate an individual's risk of a particular health outcome. A developed model is a consequence of the development dataset and the model-building strategy, including the sample size, number of predictors, and analysis method (e.g., regression or machine learning). We raise the concern that many models are developed using small datasets that lead to instability in the model and its predictions (estimated risks). We define four levels of model stability in estimated risks, moving from the overall mean to the individual level. Through simulation and case studies of statistical and machine learning approaches, we show that instability in a model's estimated risks is often considerable and ultimately manifests as miscalibration of predictions in new data. We therefore recommend researchers always examine instability at the model development stage and propose instability plots and measures for doing so. This entails repeating the model-building steps (those used to develop the original prediction model) in each of multiple (e.g., 1000) bootstrap samples to produce multiple bootstrap models, and then deriving (i) a prediction instability plot of bootstrap-model versus original-model predictions; (ii) the mean absolute prediction error (the mean absolute difference between individuals' original and bootstrap model predictions); and (iii) calibration, classification, and decision curve instability plots of the bootstrap models applied in the original sample. A case study illustrates how these instability assessments help reassure (or not) whether model predictions are likely to be reliable, while informing a model's critical appraisal (risk of bias rating), fairness, and further validation requirements.
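The bootstrap procedure described in this abstract can be sketched as follows. A logistic model on simulated data with B=200 rather than 1000 keeps the illustration small; the instability measure is the mean absolute difference between bootstrap-model and original-model predictions for the same individuals.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, p = 150, 5
X = rng.normal(size=(n, p))
y = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))

original = LogisticRegression().fit(X, y)        # the original model-building step
p_orig = original.predict_proba(X)[:, 1]

B = 200
abs_diff = np.empty((B, n))
for b in range(B):
    idx = rng.integers(0, n, size=n)             # bootstrap sample
    boot = LogisticRegression().fit(X[idx], y[idx])  # repeat the same model-building steps
    # apply the bootstrap model to the ORIGINAL individuals
    abs_diff[b] = np.abs(boot.predict_proba(X)[:, 1] - p_orig)

instability = abs_diff.mean()                    # mean absolute prediction difference
print(f"mean absolute instability in estimated risks: {instability:.3f}")
# Scatter-plotting bootstrap vs original predictions gives the instability plot.
```

With a larger development sample the same loop would show the instability measure shrinking, which is the abstract's central point.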
Affiliation(s)
- Richard D. Riley
- Institute of Applied Health Research, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
- Gary S. Collins
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, UK
12
Buick JE, Austin PC, Cheskes S, Ko DT, Atzema CL. Prediction models in prehospital and emergency medicine research: How to derive and internally validate a clinical prediction model. Acad Emerg Med 2023; 30:1150-1160. [PMID: 37266925 DOI: 10.1111/acem.14756] [Received: 03/04/2023] [Revised: 05/24/2023] [Accepted: 05/29/2023] [Indexed: 06/03/2023]
Abstract
Clinical prediction models are created to help clinicians with medical decision making, aid in risk stratification, and improve diagnosis and/or prognosis. With growing availability of both prehospital and in-hospital observational registries and electronic health records, there is an opportunity to develop, validate, and incorporate prediction models into clinical practice. However, many prediction models have high risk of bias due to poor methodology. Given that there are no methodological standards aimed at developing prediction models specifically in the prehospital setting, the objective of this paper is to describe the appropriate methodology for the derivation and validation of clinical prediction models in this setting. What follows can also be applied to the emergency medicine (EM) setting. There are eight steps that should be followed when developing and internally validating a prediction model: (1) problem definition, (2) coding of predictors, (3) addressing missing data, (4) ensuring adequate sample size, (5) variable selection, (6) evaluating model performance, (7) internal validation, and (8) model presentation. Subsequent steps include external validation, assessment of impact, and cost-effectiveness. By following these steps, researchers can develop a prediction model with the methodological rigor and quality required for prehospital and EM research.
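Several of the steps above line up naturally in code. The sketch below is not from the paper and uses synthetic data; note it uses a single median imputation for brevity, whereas multiple imputation is usually preferred for step (3).

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5))
X[rng.random(X.shape) < 0.1] = np.nan          # ~10% of values missing at random
y = rng.binomial(1, 1 / (1 + np.exp(-np.nan_to_num(X[:, 0]))))

# steps (3), (6), (7): handle missing data, fit, and internally validate
# discrimination (c-statistic / AUC) with cross-validation
model = make_pipeline(SimpleImputer(strategy="median"), LogisticRegression())
auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
print(f"cross-validated c-statistic: {auc.mean():.2f}")
```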
Affiliation(s)
- Jason E Buick
- Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, Ontario, Canada
- Peter C Austin
- Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, Ontario, Canada
- ICES, Toronto, Ontario, Canada
- Sunnybrook Health Sciences Centre, Toronto, Ontario, Canada
- Sheldon Cheskes
- Sunnybrook Health Sciences Centre, Toronto, Ontario, Canada
- Division of Emergency Medicine, Department of Family and Community Medicine, University of Toronto, Toronto, Ontario, Canada
- Dennis T Ko
- Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, Ontario, Canada
- ICES, Toronto, Ontario, Canada
- Sunnybrook Health Sciences Centre, Toronto, Ontario, Canada
- Department of Medicine, University of Toronto, Toronto, Ontario, Canada
- Clare L Atzema
- Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, Ontario, Canada
- ICES, Toronto, Ontario, Canada
- Sunnybrook Health Sciences Centre, Toronto, Ontario, Canada
- Division of Emergency Medicine, Department of Medicine, University of Toronto, Toronto, Ontario, Canada
13
Schmidt AF, Leinveber P, Panovsky R, Soukup L, Machac P, van de Leur RR, Sammani A, Lekadir K, Ter Riele A, Asselbergs FW, Boonstra MJ. DCM-PROGRESS: predicting end-stage heart failure in non-ischemic dilated cardiomyopathy patients. medRxiv [preprint] 2023:2023.09.10.23295251. [PMID: 37745419 PMCID: PMC10516079 DOI: 10.1101/2023.09.10.23295251] [Indexed: 09/26/2023]
Abstract
Aims Patients with non-ischemic dilated cardiomyopathy (DCM) are at considerable risk of end-stage heart failure (HF), requiring close monitoring to identify early signs of disease. We aimed to develop a model to predict the 5-year risk of end-stage HF, allowing for tailored patient monitoring and management. Methods and results Derivation data were available from a Dutch cohort of 293 DCM patients, with external validation data from a Czech cohort of 235 DCM patients. Candidate predictors spanned patient and family history, ECG and echocardiogram measurements, and biochemistry. End-stage HF was defined as a composite of death, heart transplantation, or implantation of a ventricular assist device. Lasso and sigmoid-kernel support vector machine (SVM) algorithms were trained using cross-validation. During follow-up, 65 (22%) of the Dutch DCM patients developed end-stage HF, compared with 27 (11%) cases in the Czech cohort. Of the two models considered, the lasso model (retaining NYHA class, heart rate, systolic blood pressure, height, R-axis, and TAPSE as predictors) reached the higher discriminative performance (testing c-statistic of 0.85, 95% CI 0.58; 0.94), which was confirmed in the external validation cohort (c-statistic of 0.75, 95% CI 0.61; 0.82), compared to a c-statistic of 0.69 for the MAGGIC score. Both the MAGGIC score and the DCM-PROGRESS model slightly over-estimated the true risk but were otherwise appropriately calibrated. Conclusion We developed a highly discriminative risk-prediction model for end-stage HF in DCM patients. The model was validated in two countries, suggesting it can meaningfully improve clinical decision-making.
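The c-statistics quoted above are rank-based: the c-statistic is the probability that, for a randomly chosen event/non-event pair, the model assigns the higher risk to the event. A plain-Python sketch with invented numbers:

```python
def c_statistic(risks, outcomes):
    """Concordance of predicted risks with binary outcomes (ties count 1/2)."""
    pairs = concordant = ties = 0
    for ri, yi in zip(risks, outcomes):
        for rj, yj in zip(risks, outcomes):
            if yi == 1 and yj == 0:          # one event, one non-event
                pairs += 1
                if ri > rj:
                    concordant += 1
                elif ri == rj:
                    ties += 1
    return (concordant + 0.5 * ties) / pairs

risks = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
outcomes = [1, 0, 1, 0, 1, 0]
print(c_statistic(risks, outcomes))  # 6 of the 9 event/non-event pairs concordant
```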
Affiliation(s)
- A F Schmidt
- Department of Cardiology, Amsterdam Cardiovascular Sciences, Amsterdam University Medical Centres, University of Amsterdam, Amsterdam, the Netherlands
- Institute of Cardiovascular Science, Faculty of Population Health, University College London, London, United Kingdom
- UCL British Heart Foundation Research Accelerator, London, United Kingdom
- Department of Cardiology, Division Heart and Lungs, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands
- P Leinveber
- International Clinical Research Center, St. Anne's University Hospital Brno, Brno, Czech Republic
- R Panovsky
- Department of Internal Medicine-Cardioangiology, International Clinical Research Center, St. Anne's University Hospital Brno, Czech Republic
- International Clinical Research Center, Faculty of Medicine, Masaryk University, Brno, Czech Republic
- L Soukup
- International Clinical Research Center, St. Anne's University Hospital Brno, Brno, Czech Republic
- P Machac
- International Clinical Research Center, St. Anne's University Hospital Brno, Brno, Czech Republic
- R R van de Leur
- Department of Cardiology, Division Heart and Lungs, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands
- A Sammani
- Department of Cardiology, Division Heart and Lungs, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands
- K Lekadir
- Departament de Matemàtiques i Informàtica, Universitat de Barcelona, Barcelona, Spain
- A Ter Riele
- Department of Cardiology, Division Heart and Lungs, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands
- F W Asselbergs
- Department of Cardiology, Amsterdam Cardiovascular Sciences, Amsterdam University Medical Centres, University of Amsterdam, Amsterdam, the Netherlands
- Department of Cardiology, Division Heart and Lungs, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands
- Institute of Health Informatics, Faculty of Population Health, University College London, London, UK
- M J Boonstra
- Department of Cardiology, Amsterdam Cardiovascular Sciences, Amsterdam University Medical Centres, University of Amsterdam, Amsterdam, the Netherlands
- Department of Cardiology, Division Heart and Lungs, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands
14
Rentroia-Pacheco B, Tokez S, Bramer EM, Venables ZC, van de Werken HJ, Bellomo D, van Klaveren D, Mooyaart AL, Hollestein LM, Wakkee M. Personalised decision making to predict absolute metastatic risk in cutaneous squamous cell carcinoma: development and validation of a clinico-pathological model. EClinicalMedicine 2023; 63:102150. [PMID: 37662519 PMCID: PMC10468358 DOI: 10.1016/j.eclinm.2023.102150] [Received: 06/09/2023] [Revised: 07/14/2023] [Accepted: 07/25/2023] [Indexed: 09/05/2023]
Abstract
Background Cutaneous squamous cell carcinoma (cSCC) is a common skin cancer, affecting more than 2 million people worldwide each year and metastasising in 2-5% of patients. However, current clinical staging systems do not provide estimates of absolute metastatic risk, missing the opportunity for more personalised treatment advice. We aimed to develop a clinico-pathological model that predicts the probability of metastasis in patients with cSCC. Methods Nationwide cohorts of (1) all patients with a first primary cSCC in The Netherlands in 2007-2008 and (2) all patients with a cSCC in England in 2013-2015 were used to derive nested case-control cohorts. Pathology records of primary cSCCs that gave rise to a loco-regional or distant metastasis were identified, and these cSCCs were matched to primary cSCCs of controls without metastasis (1:1 ratio). The model was developed on the Dutch cohort (n = 390) using a weighted Cox regression model with backward selection and validated on the English cohort (n = 696). Model performance was assessed using weighted versions of the C-index, calibration metrics, and decision curve analysis, and compared to the Brigham and Women's Hospital (BWH) and the American Joint Committee on Cancer (AJCC) staging systems. Members of the multidisciplinary Skin Cancer Outcomes (SCOUT) consortium were surveyed to interpret metastatic risk cutoffs in a clinical context. Findings Eight of eleven clinico-pathological variables were selected. The model showed good discriminative ability, with an optimism-corrected C-index of 0.80 (95% confidence interval (CI) 0.75-0.85) in the development cohort and a C-index of 0.84 (95% CI 0.81-0.87) in the validation cohort. Model predictions were well calibrated: the calibration slope was 0.96 (95% CI 0.76-1.16) in the validation cohort. Decision curve analysis showed improved net benefit compared to current staging systems, particularly at thresholds relevant for decisions on follow-up and adjuvant treatment. The model is available as an online web-based calculator (https://emc-dermatology.shinyapps.io/cscc-abs-met-risk/). Interpretation This validated model assigns personalised metastatic risk predictions to patients with cSCC, using routinely reported histological and patient-specific risk factors. The model can empower clinicians and healthcare systems to identify patients with high-risk cSCC and offer personalised care, treatment, and follow-up. Use of the model for clinical decision-making in different patient populations must be further investigated. Funding PPP Allowance made available by Health-Holland, Top Sector Life Sciences & Health, to stimulate public-private partnerships.
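The decision curve analysis mentioned above compares models by net benefit at a given threshold probability: true positives are credited, false positives are penalised by the odds of the threshold. A minimal sketch with invented predictions:

```python
def net_benefit(risks, outcomes, pt):
    """Net benefit at threshold probability pt:
    TP/n - FP/n * pt / (1 - pt), for treating everyone with risk >= pt."""
    n = len(risks)
    tp = sum(1 for r, y in zip(risks, outcomes) if r >= pt and y == 1)
    fp = sum(1 for r, y in zip(risks, outcomes) if r >= pt and y == 0)
    return tp / n - fp / n * pt / (1 - pt)

risks = [0.05, 0.10, 0.20, 0.40, 0.70]   # invented model predictions
outcomes = [0, 0, 1, 0, 1]
print(net_benefit(risks, outcomes, pt=0.15))
```

Sweeping `pt` over clinically relevant thresholds and plotting net benefit for each model (plus treat-all and treat-none) yields the decision curve.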
Affiliation(s)
- Barbara Rentroia-Pacheco
- Department of Dermatology, Erasmus MC Cancer Institute, Erasmus University Medical Center, Rotterdam, the Netherlands
- Selin Tokez
- Department of Dermatology, Erasmus MC Cancer Institute, Erasmus University Medical Center, Rotterdam, the Netherlands
- Edo M. Bramer
- Department of Dermatology, Erasmus MC Cancer Institute, Erasmus University Medical Center, Rotterdam, the Netherlands
- Zoe C. Venables
- Department of Dermatology, Norfolk and Norwich University Hospital, Norwich, United Kingdom
- National Disease Registration Service, NHS England, United Kingdom
- Norwich Medical School, University of East Anglia, Norwich, United Kingdom
- Harmen J.G. van de Werken
- Department of Immunology, Erasmus MC Cancer Institute, Erasmus University Medical Center, Rotterdam, the Netherlands
- David van Klaveren
- Department of Public Health, Center for Medical Decision Making, Erasmus University Medical Center, Rotterdam, the Netherlands
- Antien L. Mooyaart
- Department of Pathology, Erasmus University Medical Center, Rotterdam, the Netherlands
- Loes M. Hollestein
- Department of Dermatology, Erasmus MC Cancer Institute, Erasmus University Medical Center, Rotterdam, the Netherlands
- Department of Research, Netherlands Comprehensive Cancer Organization (IKNL), Utrecht, the Netherlands
- Marlies Wakkee
- Department of Dermatology, Erasmus MC Cancer Institute, Erasmus University Medical Center, Rotterdam, the Netherlands
15
Dhiman P, Ma J, Qi C, Bullock G, Sergeant JC, Riley RD, Collins GS. Sample size requirements are not being considered in studies developing prediction models for binary outcomes: a systematic review. BMC Med Res Methodol 2023; 23:188. [PMID: 37598153 PMCID: PMC10439652 DOI: 10.1186/s12874-023-02008-1] [Received: 05/23/2023] [Accepted: 08/04/2023] [Indexed: 08/21/2023]
Abstract
BACKGROUND Having an appropriate sample size is important when developing a clinical prediction model. We aimed to review how sample size is considered in studies developing a prediction model for a binary outcome. METHODS We searched PubMed for studies published between 01/07/2020 and 30/07/2020 and reviewed the sample size calculations used to develop the prediction models. Using the available information, we calculated the minimum sample size that would be needed to estimate overall risk and minimise overfitting in each study, and summarised the difference between the calculated and used sample size. RESULTS A total of 119 studies were included, of which nine (8%) provided a sample size justification. The recommended minimum sample size could be calculated for 94 studies: 73% (95% CI: 63-82%) used sample sizes lower than required to estimate overall risk and minimise overfitting, including 26% of studies that used sample sizes lower than required to estimate overall risk only. A similar number of studies did not meet the ≥10 events-per-variable (EPV) criterion (75%, 95% CI: 66-84%). The median deficit in the number of events used to develop a model was 75 [IQR: 234 lower to 7 higher], which reduced to 63 if the total available data (before any data splitting) were used [IQR: 225 lower to 7 higher]. Studies that met the minimum required sample size had a median c-statistic of 0.84 (IQR: 0.80 to 0.90) and studies where the minimum sample size was not met had a median c-statistic of 0.83 (IQR: 0.75 to 0.90). Studies that met the ≥10 EPV criterion had a median c-statistic of 0.80 (IQR: 0.73 to 0.84). CONCLUSIONS Prediction models are often developed with no sample size calculation; as a consequence, many are too small to precisely estimate the overall risk. We encourage researchers to justify, perform, and report sample size calculations when developing a prediction model.
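For intuition, the minimum-sample-size logic the review applied (following the Riley et al. criteria it builds on) can be sketched as below. The parameter values are invented, and the pmsampsize package is the maintained implementation.

```python
import math

def min_n_overfitting(p, r2_cs, shrinkage=0.9):
    """Smallest n giving an expected uniform shrinkage factor >= `shrinkage`
    for p candidate predictor parameters and anticipated Cox-Snell R^2."""
    return math.ceil(p / ((shrinkage - 1) * math.log(1 - r2_cs / shrinkage)))

def events_per_variable(n, outcome_prevalence, p):
    """The weaker, traditional EPV heuristic the review also checked."""
    return n * outcome_prevalence / p

n_min = min_n_overfitting(p=20, r2_cs=0.2)   # hypothetical model: 20 parameters
print(f"minimum n: {n_min}, "
      f"EPV at that n: {events_per_variable(n_min, 0.3, 20):.1f}")
```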
Affiliation(s)
- Paula Dhiman
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, OX3 7LD, UK
- Jie Ma
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, OX3 7LD, UK
- Cathy Qi
- Population Data Science, Faculty of Medicine, Health and Life Science, Swansea University Medical School, Swansea University, Singleton Park, Swansea, SA2 8PP, UK
- Garrett Bullock
- Department of Orthopaedic Surgery, Wake Forest School of Medicine, Winston-Salem, NC, USA
- Centre for Sport, Exercise and Osteoarthritis Research Versus Arthritis, University of Oxford, Oxford, UK
- Jamie C Sergeant
- Centre for Biostatistics, University of Manchester, Manchester Academic Health Science Centre, Manchester, M13 9PL, UK
- Centre for Epidemiology Versus Arthritis, Centre for Musculoskeletal Research, University of Manchester, Manchester Academic Health Science Centre, Manchester, M13 9PT, UK
- Richard D Riley
- Institute of Applied Health Research, College of Medical and Dental Sciences, University of Birmingham, Birmingham, B15 2TT, UK
- Gary S Collins
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, OX3 7LD, UK
16
Lewis MW, Webb CA, Kuhn M, Akman E, Jobson SA, Rosso IM. Predicting Fear Extinction in Posttraumatic Stress Disorder. Brain Sci 2023; 13:1131. [PMID: 37626488 PMCID: PMC10452660 DOI: 10.3390/brainsci13081131] [Received: 06/14/2023] [Revised: 07/21/2023] [Accepted: 07/26/2023] [Indexed: 08/27/2023]
Abstract
Fear extinction is the basis of exposure therapies for posttraumatic stress disorder (PTSD), but half of patients do not improve. Predicting fear extinction in individuals with PTSD may inform personalized exposure therapy development. The participants were 125 trauma-exposed adults (96 female) with a range of PTSD symptoms. Electromyography, electrocardiogram, and skin conductance were recorded at baseline, during dark-enhanced startle, and during fear conditioning and extinction. Using a cross-validated, hold-out sample prediction approach, three penalized regressions and conventional ordinary least squares (OLS) regression were trained to predict fear-potentiated startle during extinction using 50 predictor variables (5 clinical, 24 self-reported, and 21 physiological). The predictors selected by the penalized regression algorithms were included in multivariable regression analyses, while univariate regressions assessed individual predictors. All the penalized regressions outperformed OLS in prediction accuracy and generalizability, as indexed by lower mean squared error in the training and hold-out subsamples. During early extinction, the consistent predictors across all modeling approaches included dark-enhanced startle, the depersonalization and derealization subscale of the Dissociative Experiences Scale, and the PTSD hyperarousal symptom score. These findings offer novel insights into the modeling approaches and patient characteristics that may reliably predict fear extinction in PTSD. Penalized regression shows promise for identifying symptom-related variables that enhance predictive modeling accuracy in clinical research.
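The penalized-versus-OLS comparison reported above can be illustrated on synthetic data of roughly the same shape (125 subjects, 50 predictors, a few true effects): with this many predictors and a modest sample, penalized fits tend to beat OLS on hold-out mean squared error. Illustration only, not the study's data or code.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n, p = 125, 50
X = rng.normal(size=(n, p))
y = X[:, :3] @ np.array([1.0, -0.5, 0.5]) + rng.normal(size=n)  # 3 real effects

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
results = {}
for name, est in [("OLS", LinearRegression()), ("ridge", Ridge()),
                  ("lasso", Lasso(alpha=0.1)), ("elastic net", ElasticNet(alpha=0.1))]:
    results[name] = mean_squared_error(y_te, est.fit(X_tr, y_tr).predict(X_te))
for name, mse in results.items():
    print(f"{name:11s} holdout MSE: {mse:.2f}")
```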
Affiliation(s)
- Michael W. Lewis
- Center for Depression, Anxiety, and Stress Research, McLean Hospital, Belmont, MA 02478, USA
- Department of Psychiatry, Harvard Medical School, Boston, MA 02115, USA
- Christian A. Webb
- Center for Depression, Anxiety, and Stress Research, McLean Hospital, Belmont, MA 02478, USA
- Department of Psychiatry, Harvard Medical School, Boston, MA 02115, USA
- Manuel Kuhn
- Center for Depression, Anxiety, and Stress Research, McLean Hospital, Belmont, MA 02478, USA
- Department of Psychiatry, Harvard Medical School, Boston, MA 02115, USA
- Eylül Akman
- Center for Depression, Anxiety, and Stress Research, McLean Hospital, Belmont, MA 02478, USA
- Sydney A. Jobson
- Center for Depression, Anxiety, and Stress Research, McLean Hospital, Belmont, MA 02478, USA
- Isabelle M. Rosso
- Center for Depression, Anxiety, and Stress Research, McLean Hospital, Belmont, MA 02478, USA
- Department of Psychiatry, Harvard Medical School, Boston, MA 02115, USA
17
Chiorino G, Petracci E, Sehovic E, Gregnanin I, Camussi E, Mello-Grand M, Ostano P, Riggi E, Vergini V, Russo A, Berrino E, Ortale A, Garena F, Venesio T, Gallo F, Favettini E, Frigerio A, Matullo G, Segnan N, Giordano L. Plasma microRNA ratios associated with breast cancer detection in a nested case-control study from a mammography screening cohort. Sci Rep 2023; 13:12040. [PMID: 37491482 PMCID: PMC10368693 DOI: 10.1038/s41598-023-38886-0] [Received: 11/10/2022] [Accepted: 07/17/2023] [Indexed: 07/27/2023]
Abstract
Mammographic breast cancer (BC) screening is effective in reducing breast cancer mortality. Nevertheless, several limitations are known. Therefore, developing an alternative or complementary non-invasive tool capable of increasing the accuracy of the screening process is highly desirable. The objective of this study was to identify circulating microRNA (miR) ratios associated with BC in women attending mammography screening. A nested case-control study was conducted within the ANDROMEDA cohort (women aged 46-67 attending BC screening). Pre-diagnostic plasma samples and information on lifestyles and common BC risk factors were collected. Small-RNA sequencing was carried out on plasma samples from 65 cases and 66 controls. miR ratios associated with BC were selected by two-sample Wilcoxon test and lasso logistic regression. Subsequent assessment by RT-qPCR of the miRs contained in the selected miR ratios was carried out as a platform validation. To identify the most promising biomarkers, penalised logistic regression was further applied to the candidate miR ratios alone or in combination with non-molecular factors. Small-RNA sequencing yielded 20 candidate miR ratios associated with BC, which were further assessed by RT-qPCR. In the resulting model, penalised logistic regression selected seven miR ratios (miR-199a-3p_let-7a-5p, miR-26b-5p_miR-142-5p, let-7b-5p_miR-19b-3p, miR-101-3p_miR-19b-3p, miR-93-5p_miR-19b-3p, let-7a-5p_miR-22-3p and miR-21-5p_miR-23a-3p), together with body mass index (BMI), menopausal status (MS), the interaction term BMI * MS, life-style score and breast density. The ROC AUC of the model was 0.79, with a sensitivity and specificity of 71.9% and 76.6%, respectively. We identified biomarkers potentially useful for BC screening, measured through a widespread and low-cost technique. This is the first study reporting circulating miRs for BC detection in a screening setting. Validation in a wider sample is warranted. Trial registration: the ANDROMEDA prospective cohort study protocol was retrospectively registered on 27-11-2015 (NCT02618538).
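The ratio-feature idea is simple to sketch: on the log scale, a ratio of two miR expression levels is a difference, and penalized logistic regression can then select informative ratios. Everything below (the data, penalty strength, and the pairing of miR names reused from the abstract) is invented for illustration.

```python
import numpy as np
from itertools import combinations
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
mirs = ["miR-21-5p", "miR-23a-3p", "let-7a-5p", "miR-22-3p"]
log_expr = rng.normal(size=(131, len(mirs)))     # synthetic: 65 cases + 66 controls
y = np.repeat([1, 0], [65, 66])

# build all pairwise ratio features: log(a/b) = log(a) - log(b)
ratio_names, ratios = [], []
for i, j in combinations(range(len(mirs)), 2):
    ratio_names.append(f"{mirs[i]}_{mirs[j]}")
    ratios.append(log_expr[:, i] - log_expr[:, j])
X = np.column_stack(ratios)

# L1-penalized (lasso) logistic regression zeroes out uninformative ratios
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(X, y)
selected = [n for n, c in zip(ratio_names, clf.coef_[0]) if c != 0]
print("selected ratios:", selected)
```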
Affiliation(s)
- Giovanna Chiorino
- Cancer Genomics Lab, Fondazione Edo ed Elvo Tempia, Via Malta 3, 13900, Biella, Italy
- Elisabetta Petracci
- Unit of Biostatistics and Clinical Trials, IRCCS Istituto Romagnolo per lo Studio dei Tumori (IRST) "Dino Amadori", Meldola, Italy
- Emir Sehovic
- Cancer Genomics Lab, Fondazione Edo ed Elvo Tempia, Via Malta 3, 13900, Biella, Italy
- Department of Life Sciences and Systems Biology, University of Turin, Turin, Italy
- Ilaria Gregnanin
- Cancer Genomics Lab, Fondazione Edo ed Elvo Tempia, Via Malta 3, 13900, Biella, Italy
- Elisa Camussi
- SSD Epidemiologia Screening, CPO-AOU Città della Salute e della Scienza di Torino, Via Camillo Benso Di Cavour 31, 10123, Turin, Italy
- Maurizia Mello-Grand
- Cancer Genomics Lab, Fondazione Edo ed Elvo Tempia, Via Malta 3, 13900, Biella, Italy
- Paola Ostano
- Cancer Genomics Lab, Fondazione Edo ed Elvo Tempia, Via Malta 3, 13900, Biella, Italy
- Emilia Riggi
- SSD Epidemiologia Screening, CPO-AOU Città della Salute e della Scienza di Torino, Via Camillo Benso Di Cavour 31, 10123, Turin, Italy
- Viviana Vergini
- SSD Epidemiologia Screening, CPO-AOU Città della Salute e della Scienza di Torino, Via Camillo Benso Di Cavour 31, 10123, Turin, Italy
- Alessia Russo
- Department of Medical Sciences, University of Turin, Turin, Italy
- Enrico Berrino
- Department of Medical Sciences, University of Turin, Turin, Italy
- Pathology Unit, Candiolo Cancer Institute, FPO IRCCS, Candiolo, Italy
- Andrea Ortale
- SSD Epidemiologia Screening, CPO-AOU Città della Salute e della Scienza di Torino, Via Camillo Benso Di Cavour 31, 10123, Turin, Italy
- Francesca Garena
- SSD Epidemiologia Screening, CPO-AOU Città della Salute e della Scienza di Torino, Via Camillo Benso Di Cavour 31, 10123, Turin, Italy
- Tiziana Venesio
- Pathology Unit, Candiolo Cancer Institute, FPO IRCCS, Candiolo, Italy
- Federica Gallo
- Epidemiology Unit, Staff Health Direction, Local Health Authority 1 of Cuneo, Cuneo, Italy
- Alfonso Frigerio
- SSD Epidemiologia Screening, CPO-AOU Città della Salute e della Scienza di Torino, Via Camillo Benso Di Cavour 31, 10123, Turin, Italy
- Giuseppe Matullo
- Department of Medical Sciences, University of Turin, Turin, Italy
- Nereo Segnan
- SSD Epidemiologia Screening, CPO-AOU Città della Salute e della Scienza di Torino, Via Camillo Benso Di Cavour 31, 10123, Turin, Italy
- Livia Giordano
- SSD Epidemiologia Screening, CPO-AOU Città della Salute e della Scienza di Torino, Via Camillo Benso Di Cavour 31, 10123, Turin, Italy
18
Blythe R, Parsons R, Barnett AG, McPhail SM, White NM. Vital signs-based deterioration prediction model assumptions can lead to losses in prediction performance. J Clin Epidemiol 2023; 159:106-115. [PMID: 37245699 DOI: 10.1016/j.jclinepi.2023.05.020] [Received: 02/17/2023] [Revised: 04/11/2023] [Accepted: 05/22/2023] [Indexed: 05/30/2023]
Abstract
OBJECTIVE Vital signs-based models are complicated by repeated measures per patient and frequently missing data. This paper investigated the impacts of common vital signs modeling assumptions during clinical deterioration prediction model development. STUDY DESIGN AND SETTING Electronic medical record (EMR) data from five Australian hospitals (1 January 2019-31 December 2020) were used. Summary statistics for each observation's prior vital signs were created. Missing data patterns were investigated using boosted decision trees, then imputed with common methods. Two example models predicting in-hospital mortality were developed, as follows: logistic regression and eXtreme Gradient Boosting. Model discrimination and calibration were assessed using the C-statistic and nonparametric calibration plots. RESULTS The data contained 5,620,641 observations from 342,149 admissions. Missing vitals were associated with observation frequency, vital sign variability, and patient consciousness. Summary statistics improved discrimination slightly for logistic regression and markedly for eXtreme Gradient Boosting. Imputation method led to notable differences in model discrimination and calibration. Model calibration was generally poor. CONCLUSION Summary statistics and imputation methods can improve model discrimination and reduce bias during model development, but it is questionable whether these differences are clinically significant. Researchers should consider why data are missing during model development and how this may impact clinical utility.
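A toy sketch of the two design choices studied above, with hypothetical column names and fabricated data: per-admission summary statistics of each observation's prior vital signs, and a simple imputation step.

```python
import pandas as pd

obs = pd.DataFrame({
    "admission_id": [1, 1, 1, 2, 2],
    "heart_rate":   [80, 95, None, 110, 120],
    "resp_rate":    [16, None, 22, 24, None],
})

# summary statistics of each admission's *prior* observations: an expanding
# window shifted by one, so each row only sees earlier measurements
grp = obs.groupby("admission_id")["heart_rate"]
obs["hr_prior_mean"] = grp.transform(lambda s: s.shift().expanding().mean())
obs["hr_prior_max"] = grp.transform(lambda s: s.shift().expanding().max())

# a common (if crude) imputation choice: last observation carried forward,
# falling back to the per-admission median for leading gaps
obs["heart_rate"] = grp.transform(lambda s: s.ffill().fillna(s.median()))
print(obs)
```

As the abstract notes, the right choices here depend on why the data are missing; an informatively missing vital sign may carry signal that naive imputation erases.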
Affiliation(s)
- Robin Blythe
- Australian Centre for Health Services Innovation, Centre for Healthcare Transformation, School of Public Health and Social Work, Faculty of Health, Queensland University of Technology, 60 Musk Ave, Kelvin Grove, Queensland, 4059, Australia
- Rex Parsons
- Australian Centre for Health Services Innovation, Centre for Healthcare Transformation, School of Public Health and Social Work, Faculty of Health, Queensland University of Technology, 60 Musk Ave, Kelvin Grove, Queensland, 4059, Australia
- Adrian G Barnett
- Australian Centre for Health Services Innovation, Centre for Healthcare Transformation, School of Public Health and Social Work, Faculty of Health, Queensland University of Technology, 60 Musk Ave, Kelvin Grove, Queensland, 4059, Australia
- Steven M McPhail
- Australian Centre for Health Services Innovation, Centre for Healthcare Transformation, School of Public Health and Social Work, Faculty of Health, Queensland University of Technology, 60 Musk Ave, Kelvin Grove, Queensland, 4059, Australia; Digital Health and Informatics, Metro South Health, 199 Ipswich Road, Brisbane, Queensland, 4102, Australia
- Nicole M White
- Australian Centre for Health Services Innovation, Centre for Healthcare Transformation, School of Public Health and Social Work, Faculty of Health, Queensland University of Technology, 60 Musk Ave, Kelvin Grove, Queensland, 4059, Australia
19
Venkatasubramaniam A, Mateen BA, Shields BM, Hattersley AT, Jones AG, Vollmer SJ, Dennis JM. Comparison of causal forest and regression-based approaches to evaluate treatment effect heterogeneity: an application for type 2 diabetes precision medicine. BMC Med Inform Decis Mak 2023; 23:110. [PMID: 37328784 PMCID: PMC10276367 DOI: 10.1186/s12911-023-02207-2] [Received: 11/07/2022] [Accepted: 06/01/2023] [Indexed: 06/18/2023]
Abstract
OBJECTIVE Precision medicine requires reliable identification of variation in patient-level outcomes with different available treatments, often termed treatment effect heterogeneity. We aimed to evaluate the comparative utility of individualized treatment selection strategies based on predicted individual-level treatment effects from a causal forest machine learning algorithm and a penalized regression model. METHODS Cohort study characterizing individual-level glucose-lowering response (6 month reduction in HbA1c) in people with type 2 diabetes initiating SGLT2-inhibitor or DPP4-inhibitor therapy. Model development set comprised 1,428 participants in the CANTATA-D and CANTATA-D2 randomised clinical trials of SGLT2-inhibitors versus DPP4-inhibitors. For external validation, calibration of observed versus predicted differences in HbA1c in patient strata defined by size of predicted HbA1c benefit was evaluated in 18,741 patients in UK primary care (Clinical Practice Research Datalink). RESULTS Heterogeneity in treatment effects was detected in clinical trial participants with both approaches (proportion predicted to have a benefit on SGLT2-inhibitor therapy over DPP4-inhibitor therapy: causal forest: 98.6%; penalized regression: 81.7%). In validation, calibration was good with penalized regression but sub-optimal with causal forest. A strata with an HbA1c benefit > 10 mmol/mol with SGLT2-inhibitors (3.7% of patients, observed benefit 11.0 mmol/mol [95%CI 8.0-14.0]) was identified using penalized regression but not causal forest, and a much larger strata with an HbA1c benefit 5-10 mmol with SGLT2-inhibitors was identified with penalized regression (regression: 20.9% of patients, observed benefit 7.8 mmol/mol (95%CI 6.7-8.9); causal forest 11.6%, observed benefit 8.7 mmol/mol (95%CI 7.4-10.1). 
CONCLUSIONS Consistent with recent results for outcome prediction with clinical data, when evaluating treatment effect heterogeneity researchers should not rely on causal forest or other similar machine learning algorithms alone, and must compare outputs with standard regression, which in this evaluation was superior.
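The validation step used above can be sketched generically: stratify patients by predicted treatment benefit and compare mean predicted with mean observed benefit within each stratum. The numbers below are synthetic, not the study's, and the stratum cut-points merely echo the 5 and 10 mmol/mol thresholds in the abstract.

```python
import numpy as np

rng = np.random.default_rng(7)
pred_benefit = rng.normal(3, 2, size=1000)                 # predicted HbA1c benefit
obs_benefit = pred_benefit + rng.normal(0, 4, size=1000)   # noisy 'observed' benefit

bins = np.array([-np.inf, 0, 5, 10, np.inf])
strata = np.digitize(pred_benefit, bins[1:-1])             # 0..3

summary = {}
for s in range(len(bins) - 1):
    mask = strata == s
    if mask.any():
        summary[s] = (pred_benefit[mask].mean(), obs_benefit[mask].mean(),
                      int(mask.sum()))
for s, (p, o, n) in summary.items():
    print(f"stratum {s}: predicted {p:.1f}, observed {o:.1f}, n={n}")
```

A well-calibrated model shows observed benefits tracking predicted benefits across strata; systematic gaps in the extreme strata are the miscalibration pattern the abstract describes for causal forest.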
Affiliation(s)
- Bilal A Mateen
- The Alan Turing Institute, British Library, 96 Euston Road, London, NW1 2DB, UK
- University College London, Institute of Health Informatics, 222 Euston Rd, London, NW1 2DA, UK
- Beverley M Shields
- University of Exeter Medical School, Institute of Biomedical & Clinical Science, RILD Building, Royal Devon & Exeter Hospital, Barrack Road, Exeter, EX2 5DW, UK
- Andrew T Hattersley
- University of Exeter Medical School, Institute of Biomedical & Clinical Science, RILD Building, Royal Devon & Exeter Hospital, Barrack Road, Exeter, EX2 5DW, UK
- Angus G Jones
- University of Exeter Medical School, Institute of Biomedical & Clinical Science, RILD Building, Royal Devon & Exeter Hospital, Barrack Road, Exeter, EX2 5DW, UK
- John M Dennis
- University of Exeter Medical School, Institute of Biomedical & Clinical Science, RILD Building, Royal Devon & Exeter Hospital, Barrack Road, Exeter, EX2 5DW, UK.
20
Pate A, Riley RD, Collins GS, van Smeden M, Van Calster B, Ensor J, Martin GP. Minimum sample size for developing a multivariable prediction model using multinomial logistic regression. Stat Methods Med Res 2023; 32:555-571. [PMID: 36660777 PMCID: PMC10012398 DOI: 10.1177/09622802231151220] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/21/2023]
Abstract
AIMS Multinomial logistic regression models allow one to predict the risk of a categorical outcome with more than two categories. When developing such a model, researchers should ensure the number of participants (n) is appropriate relative to the number of events (E_k) and the number of predictor parameters (p_k) for each category k. We propose three criteria to determine the minimum n required in light of existing criteria developed for binary outcomes. PROPOSED CRITERIA The first criterion aims to minimise model overfitting. The second aims to minimise the difference between the observed and adjusted Nagelkerke R². The third criterion aims to ensure the overall risk is estimated precisely. For criterion (i), we show the sample size must be based on the anticipated Cox-Snell R² of the distinct 'one-to-one' logistic regression models corresponding to the sub-models of the multinomial logistic regression, rather than on the overall Cox-Snell R² of the multinomial logistic regression. EVALUATION OF CRITERIA We tested the performance of proposed criterion (i) through a simulation study and found that it resulted in the desired level of overfitting. Criteria (ii) and (iii) were natural extensions of previously proposed criteria for binary outcomes and did not require evaluation through simulation. SUMMARY We illustrate how to implement the sample size criteria through a worked example considering the development of a multinomial risk prediction model for tumour type when presented with an ovarian mass. Code is provided for the simulation and worked example. We will embed our proposed criteria within the pmsampsize R library and Stata modules.
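The shrinkage-based criterion for binary outcomes that criterion (i) builds on (minimum n = p / ((S − 1) · ln(1 − R²_CS/S)) for target shrinkage factor S, from the Riley et al. sample-size framework) can be applied per 'one-to-one' sub-model as the abstract describes, taking the maximum across sub-models. The sketch below is illustrative: the number of parameters and anticipated Cox-Snell R² values are invented, not the paper's worked example.

```python
import math

def min_n_overfitting(p_params, r2_cs, shrinkage=0.9):
    """Minimum sample size for one (sub-)model with p_params predictor
    parameters and an anticipated Cox-Snell R^2, targeting an expected
    shrinkage factor S (default 0.9, i.e. at most ~10% overfitting)."""
    return math.ceil(p_params / ((shrinkage - 1) * math.log(1 - r2_cs / shrinkage)))

# Illustrative multinomial model with 3 outcome categories -> two
# 'one-to-one' logistic sub-models; p and R^2 values are assumptions.
sub_models = [
    {"p": 10, "r2_cs": 0.15},   # category 2 vs category 1
    {"p": 10, "r2_cs": 0.10},   # category 3 vs category 1
]

# The overall minimum n is driven by the most demanding sub-model.
n_required = max(min_n_overfitting(m["p"], m["r2_cs"]) for m in sub_models)
```

Note how the sub-model with the weaker anticipated R² (0.10) dominates the requirement, which is why the criterion must use the sub-model R² values rather than the overall multinomial R².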
Affiliation(s)
- Alexander Pate
- Division of Informatics, Imaging and Data Science, Faculty of Biology, Medicine and Health, University of Manchester, Manchester Academic Health Science Centre, Manchester, UK
- Richard D Riley
- Centre for Prognosis Research, School of Medicine, Keele University, Staffordshire, UK
- Gary S Collins
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, UK
- NIHR Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford, UK
- Maarten van Smeden
- Julius Center for Health Sciences, University Medical Center Utrecht, Utrecht University, Utrecht, Netherlands
- Department of Clinical Epidemiology, Leiden University Medical Center, Leiden, Netherlands
- Ben Van Calster
- Department of Development and Regeneration, KU Leuven, Leuven, Belgium
- Department of Biomedical Data Sciences, Leiden University Medical Centre, Leiden, Netherlands
- EPI-center, KU Leuven, Leuven, Belgium
- Joie Ensor
- Centre for Prognosis Research, School of Medicine, Keele University, Staffordshire, UK
- Glen P Martin
- Division of Informatics, Imaging and Data Science, Faculty of Biology, Medicine and Health, University of Manchester, Manchester Academic Health Science Centre, Manchester, UK
21
Debray TPA, Collins GS, Riley RD, Snell KIE, Van Calster B, Reitsma JB, Moons KGM. Transparent reporting of multivariable prediction models developed or validated using clustered data (TRIPOD-Cluster): explanation and elaboration. BMJ 2023; 380:e071058. [PMID: 36750236 PMCID: PMC9903176 DOI: 10.1136/bmj-2022-071058] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Accepted: 10/07/2022] [Indexed: 02/09/2023]
Affiliation(s)
- Thomas P A Debray
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, Netherlands
- Cochrane Netherlands, University Medical Center Utrecht, Utrecht University, Utrecht, Netherlands
- Gary S Collins
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Botnar Research Centre, University of Oxford, Oxford, UK
- National Institute for Health and Care Research Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford, UK
- Richard D Riley
- Centre for Prognosis Research, School of Medicine, Keele University, Keele, UK
- Kym I E Snell
- Centre for Prognosis Research, School of Medicine, Keele University, Keele, UK
- Ben Van Calster
- Department of Development and Regeneration, KU Leuven, Leuven, Belgium
- EPI-centre, KU Leuven, Leuven, Belgium
- Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, Netherlands
- Johannes B Reitsma
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, Netherlands
- Cochrane Netherlands, University Medical Center Utrecht, Utrecht University, Utrecht, Netherlands
- Karel G M Moons
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, Netherlands
- Cochrane Netherlands, University Medical Center Utrecht, Utrecht University, Utrecht, Netherlands
22
Wells J, Wang C, Dolgin K, Kayyali R. SPUR: A Patient-Reported Medication Adherence Model as a Predictor of Admission and Early Readmission in Patients Living with Type 2 Diabetes. Patient Prefer Adherence 2023; 17:441-455. [PMID: 36844798 PMCID: PMC9948632 DOI: 10.2147/ppa.s397424] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 11/21/2022] [Accepted: 01/14/2023] [Indexed: 02/20/2023] Open
Abstract
PURPOSE Poor medication adherence (MA) is linked to an increased likelihood of hospital admission. Early interventions to address MA may reduce this risk and associated health-care costs. This study aimed to evaluate a holistic Patient Reported Outcome Measure (PROM) of MA, known as SPUR, as a predictor of general admission and early readmission in patients living with Type 2 Diabetes. PATIENTS AND METHODS An observational study design was used to assess data collected over a 12-month period, including 6-month retrospective and 6-month prospective monitoring of the number of admissions and early readmissions (admissions occurring within 30 days of discharge) across the cohort. Patients (n = 200) were recruited from a large South London NHS Trust. Covariates of interest included age, ethnicity, gender, level of education, income, the number of medicines and medical conditions, and a Covid-19 diagnosis. A Poisson or negative binomial model was employed for count outcomes, with the exponentiated coefficient indicating incidence ratios (IR) [95% CI]. For binary outcomes (coefficient, [95% CI]), a logistic regression model was developed. RESULTS Higher SPUR scores (increased adherence) were significantly associated with a lower number of admissions (IR = 0.98, [0.96, 1.00]). The number of medical conditions (IR = 1.07, [1.01, 1.13]), age ≥80 years (IR = 5.18, [1.01, 26.55]), a positive Covid-19 diagnosis during follow-up (IR = 1.83, [1.11, 3.02]) and GCSE-level education (IR = 2.11, [1.15, 3.87]) were factors associated with a greater risk of admission. When modelled as a binary variable, only the SPUR score (-0.051, [-0.094, -0.007]) significantly predicted an early readmission, with patients reporting higher SPUR scores being less likely to experience an early readmission. CONCLUSION Higher levels of MA, as determined by SPUR, were significantly associated with a lower risk of general admissions and early readmissions among patients living with Type 2 Diabetes.
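The incidence ratios reported above come from exponentiating Poisson (or negative binomial) regression coefficients, with Wald confidence limits exponentiated the same way. A minimal sketch, using a hypothetical coefficient and standard error chosen to land near the Covid-19 estimate reported in the abstract (not taken from the study's output):

```python
import math

def incidence_ratio(coef, se, z=1.96):
    """Exponentiate a count-model coefficient into an incidence ratio (IR)
    with a Wald 95% confidence interval: IR = exp(b), CI = exp(b +/- z*SE)."""
    return (math.exp(coef),
            math.exp(coef - z * se),
            math.exp(coef + z * se))

# Hypothetical coefficient 0.604 (SE 0.255), e.g. for a Covid-19 diagnosis
ir, lo, hi = incidence_ratio(0.604, 0.255)
```

An IR above 1 with a CI excluding 1 (here roughly 1.83 [1.11, 3.02]) indicates a significantly greater admission rate for that covariate.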
Affiliation(s)
- Joshua Wells
- Department of Pharmacy, Kingston University, Kingston upon Thames, KT1 2EE, UK
- Chao Wang
- Faculty of Health, Science, Social Care and Education, Kingston University, Kingston upon Thames, KT2 7LB, UK
- Kevin Dolgin
- Behavioural Science Department, Observia, Paris, 75015, France
- Reem Kayyali
- Department of Pharmacy, Kingston University, Kingston upon Thames, KT1 2EE, UK
- Correspondence: Reem Kayyali, Department of Pharmacy, Kingston University, Penrhyn Road, Kingston upon Thames, KT1 2EE, UK, Tel/Fax +44 208 417 2561
23
Zhang H, Zhang N, Wu W, Zhou R, Li S, Wang Z, Dai Z, Zhang L, Liu Z, Zhang J, Luo P, Liu Z, Cheng Q. Machine learning-based tumor-infiltrating immune cell-associated lncRNAs for predicting prognosis and immunotherapy response in patients with glioblastoma. Brief Bioinform 2022; 23:6711411. [PMID: 36136350 DOI: 10.1093/bib/bbac386] [Citation(s) in RCA: 46] [Impact Index Per Article: 23.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/25/2022] [Revised: 07/29/2022] [Accepted: 08/10/2022] [Indexed: 12/14/2022] Open
Abstract
Long noncoding RNAs (lncRNAs) have been associated with the regulation of cancer immunity. However, the roles of immune cell-specific lncRNAs in glioblastoma (GBM) remain largely unknown. In this study, a novel computational framework was constructed to screen tumor-infiltrating immune cell-associated lncRNAs (TIIClnc) and develop a TIIClnc signature by integratively analyzing the transcriptome data of purified immune cells, GBM cell lines and bulk GBM tissues using six machine learning algorithms. The resulting TIIClnc signature distinguished survival outcomes of GBM patients across four independent datasets, including the Xiangya in-house dataset, and, more importantly, showed performance superior to that of 95 previously established signatures in gliomas. The TIIClnc signature was revealed to be an indicator of the infiltration level of immune cells and predicted response outcomes to immunotherapy. The positive correlation between the TIIClnc signature and CD8, PD-1 and PD-L1 was verified in the Xiangya in-house dataset. As a newly demonstrated predictive biomarker, the TIIClnc signature enables a more precise selection of the GBM population who would benefit from immunotherapy and should be validated and applied in the near future.
Affiliation(s)
- Hao Zhang
- Department of Neurosurgery, Xiangya Hospital, Central South University, China; National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, China; Department of Neurosurgery, The Second Affiliated Hospital, Chongqing Medical University, China
- Nan Zhang
- Department of Neurosurgery, Xiangya Hospital, Central South University, China; One-third Lab, College of Bioinformatics Science and Technology, Harbin Medical University, China; National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, China
- Wantao Wu
- Department of Neurosurgery, Xiangya Hospital, Central South University, China; Department of Oncology, Xiangya Hospital, Central South University, China; National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, China
- Ran Zhou
- Division of Neuroscience and Experimental Psychology, Faculty of Biology, Medicine and Health, University of Manchester, UK
- Shuyu Li
- Department of Thyroid and Breast Surgery, Tongji Hospital, Tongji Medical College of Huazhong University of Science and Technology, China
- Zeyu Wang
- Department of Neurosurgery, Xiangya Hospital, Central South University, China; National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, China
- Ziyu Dai
- Department of Neurosurgery, Xiangya Hospital, Central South University, China; National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, China
- Liyang Zhang
- Department of Neurosurgery, Xiangya Hospital, Central South University, China; National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, China
- Zaoqu Liu
- Department of Interventional Radiology, The First Affiliated Hospital of Zhengzhou, China
- Jian Zhang
- Department of Oncology, Zhujiang Hospital, Southern Medical University, China
- Peng Luo
- Department of Oncology, Zhujiang Hospital, Southern Medical University, China
- Zhixiong Liu
- Department of Neurosurgery, Xiangya Hospital, Central South University, China; National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, China
- Quan Cheng
- Department of Neurosurgery, Xiangya Hospital, Central South University, China; National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, China
24
Virdee PS, Patnick J, Watkinson P, Holt T, Birks J. Full Blood Count Trends for Colorectal Cancer Detection in Primary Care: Development and Validation of a Dynamic Prediction Model. Cancers (Basel) 2022; 14:cancers14194779. [PMID: 36230702 PMCID: PMC9563332 DOI: 10.3390/cancers14194779] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/24/2022] [Revised: 09/22/2022] [Accepted: 09/27/2022] [Indexed: 11/24/2022] Open
Abstract
Simple Summary Colorectal cancer is the fourth most common cancer and the second most common cause of cancer death in the UK. If diagnosed and treated at an early stage, when the cancer has not spread, 9 in 10 patients are alive five years later. If diagnosed at a late stage, when the cancer has spread, this drops to 1 in 10. Early detection can save lives, but more than half of colorectal cancers in the UK are diagnosed at a late stage. Growing tumours often cause subtle changes in blood test results that could help with earlier detection. For example, patients diagnosed with colorectal cancer often have a gradually declining haemoglobin for a few years before their diagnosis, which is not seen in patients without colorectal cancer. These differences are subtle, so they may be difficult for doctors in primary care to spot from a series of blood tests. We developed a computer-based tool to do this: it checks the changes in a patient's blood test results over the last five years to estimate how likely they are to have colorectal cancer. We report this tool here and describe how well it identifies colorectal cancer cases using blood tests performed in primary care. Abstract Colorectal cancer has low survival rates when diagnosed at a late stage, so earlier detection is important. The full blood count (FBC) is a common blood test performed in primary care. Relevant trends in repeated FBCs are related to colorectal cancer presence. We developed and internally validated dynamic prediction models utilising these trends for early detection. We performed a cohort study. Sex-stratified multivariate joint models included age at baseline (most recent FBC) and simultaneous trends over historical haemoglobin, mean corpuscular volume (MCV), and platelet measurements up to the baseline FBC for two-year risk of diagnosis. Performance measures included the c-statistic and calibration slope.
We analysed 250,716 males and 246,695 females in the development cohort and 312,444 males and 462,900 females in the validation cohort, with 0.4% of males and 0.3% of females diagnosed two years after baseline FBC. Compared to average population trends, patient-level declines in haemoglobin and MCV and rise in platelets up to baseline FBC increased risk of diagnosis in two years. C-statistic: 0.751 (males) and 0.763 (females). Calibration slope: 1.06 (males) and 1.05 (females). Our models perform well, with low miscalibration. Utilising trends could bring forward diagnoses to earlier stages and improve survival rates. External validation is now required.
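The two performance measures reported above can be computed from predicted risks and observed outcomes. A self-contained sketch on simulated data (illustrative only; the simulated cohort and the Newton-Raphson refit below are assumptions of this sketch, not the study's code): the c-statistic is the probability that a randomly chosen case received a higher predicted risk than a randomly chosen non-case, and the calibration slope is the coefficient from refitting the outcome on the logit of the predicted risk, with 1.0 indicating neither over- nor under-fitting.

```python
import math
import random

def c_statistic(risks, outcomes):
    """P(randomly chosen case has higher predicted risk than a non-case); ties count 1/2."""
    cases = [r for r, y in zip(risks, outcomes) if y == 1]
    controls = [r for r, y in zip(risks, outcomes) if y == 0]
    wins = sum(1.0 if c > n else 0.5 if c == n else 0.0
               for c in cases for n in controls)
    return wins / (len(cases) * len(controls))

def calibration_slope(risks, outcomes, iters=30):
    """Slope from logistic regression of outcome on logit(predicted risk),
    fitted by Newton-Raphson; a slope of 1.0 indicates good calibration."""
    lp = [math.log(r / (1 - r)) for r in risks]
    a, b = 0.0, 1.0
    for _ in range(iters):
        p = [1 / (1 + math.exp(-(a + b * x))) for x in lp]
        g0 = sum(y - pi for y, pi in zip(outcomes, p))            # gradient wrt a
        g1 = sum((y - pi) * x for y, pi, x in zip(outcomes, p, lp))  # gradient wrt b
        w = [pi * (1 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * x for wi, x in zip(w, lp))
        h11 = sum(wi * x * x for wi, x in zip(w, lp))
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return b

# Simulated, perfectly calibrated predictions: outcomes drawn from the risks.
random.seed(42)
risks = [random.uniform(0.05, 0.95) for _ in range(2000)]
outcomes = [1 if random.random() < r else 0 for r in risks]
c = c_statistic(risks, outcomes)
slope = calibration_slope(risks, outcomes)
```

Because the simulated outcomes are drawn from the predicted risks themselves, the recovered slope should sit close to 1, mirroring the near-1 slopes (1.06 and 1.05) reported in the validation.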
Affiliation(s)
- Pradeep S. Virdee
- Nuffield Department of Primary Care Health Sciences, University of Oxford, Oxford OX2 6GG, UK
- Julietta Patnick
- Nuffield Department of Population Health, University of Oxford, Oxford OX3 7LF, UK
- Peter Watkinson
- Kadoorie Centre for Critical Care Research and Education, Oxford University Hospitals NHS Trust, Oxford OX3 9DU, UK
- Tim Holt
- Nuffield Department of Primary Care Health Sciences, University of Oxford, Oxford OX2 6GG, UK
- Jacqueline Birks
- Centre for Statistics in Medicine, NDORMS, University of Oxford, Oxford OX3 7LD, UK
25
Stolarski AE, Kim J, Rop K, Wee K, Zhang Q, Remick DG. Machine learning and murine models explain failures of clinical sepsis trials. J Trauma Acute Care Surg 2022; 93:187-194. [PMID: 35881034 PMCID: PMC9335891 DOI: 10.1097/ta.0000000000003691] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
Abstract
BACKGROUND Multiple clinical trials failed to demonstrate the efficacy of hydrocortisone, ascorbic acid, and thiamine (HAT) in sepsis. These trials were dominated by patients with pulmonary sepsis and did not account for differences in the inflammatory responses across varying etiologies of injury/illness. HAT has previously shown substantial benefits in animal peritonitis sepsis models (cecal ligation and puncture [CLP]), in contradiction to the various clinical trials. The impact of HAT in pulmonary sepsis remains unclear. Our objective was to investigate the impact of HAT in pneumonia, consistent with the predominant etiology in the discordant clinical trials. We hypothesized that, in a pulmonary sepsis model, HAT would act synergistically to reduce end-organ dysfunction by altering the inflammatory response, in a manner distinct from CLP. METHODS Using Pseudomonas aeruginosa pneumonia, a pulmonary sepsis model (PNA) was compared directly to previously investigated intra-abdominal sepsis models. Machine learning applied to early vital signs stratified animals into those predicted to die (pDie) versus predicted to live (pLive). Animals were then randomized to receive antibiotics and fluids (vehicle [VEH]) versus HAT. Vitals, cytokines, vitamin C, and markers of liver and kidney function were assessed in the blood, bronchoalveolar lavage, and organ homogenates. RESULTS PNA was induced in 119 outbred wild-type Institute of Cancer Research mice (predicted mortality approximately 50%, similar to CLP). In PNA, interleukin 1 receptor antagonist in 72-hour bronchoalveolar lavage was lower with HAT (2.36 ng/mL) than with VEH (4.88 ng/mL; p = 0.04). The remaining inflammatory cytokines and markers of liver/renal function showed no significant difference with HAT in PNA. PNA vitamin C levels were 0.62 mg/dL (pDie HAT), lower than vitamin C levels after CLP (1.195 mg/dL).
Unlike CLP mice, PNA mice did not develop acute kidney injury (blood urea nitrogen: pDie, 33.5 mg/dL vs. pLive, 27.6 mg/dL; p = 0.17). Furthermore, following PNA, HAT did not significantly reduce microscopic renal oxidative stress (mean gray area: pDie, 16.64 vs. pLive, 6.88; p = 0.93). Unlike in CLP, where HAT demonstrated a survival benefit, HAT had no impact on survival in PNA. CONCLUSION HAT therapy has minimal benefits in pneumonia. The inflammatory response induced by pulmonary sepsis is unique compared with the response during intra-abdominal sepsis. Consequently, different etiologies of sepsis respond differently to HAT therapy.
Affiliation(s)
| | - Jiyoun Kim
- Boston Medical Center | Boston University – Department of Pathology and Laboratory Medicine
| | - Kevin Rop
- Boston Medical Center | Boston University – Department of Pathology and Laboratory Medicine
| | - Katherine Wee
- Boston Medical Center | Boston University – Department of Pathology and Laboratory Medicine
| | - Qiuyang Zhang
- Boston Medical Center | Boston University – Department of Pathology and Laboratory Medicine
| | - Daniel G. Remick
- Boston Medical Center | Boston University – Department of Pathology and Laboratory Medicine
| |
26
Loohuis AMM, Burger H, Wessels N, Dekker J, Malmberg AG, Berger MY, Blanker MH, van der Worp H. Prediction model study focusing on eHealth in the management of urinary incontinence: the Personalised Advantage Index as a decision-making aid. BMJ Open 2022; 12:e051827. [PMID: 35879013 PMCID: PMC9328108 DOI: 10.1136/bmjopen-2021-051827] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
Abstract
OBJECTIVE To develop a prediction model and illustrate the practical potential of personalising treatment decisions between app-based treatment and care as usual for urinary incontinence (UI). DESIGN A prediction model study using data from a pragmatic, randomised controlled, non-inferiority trial. SETTING Dutch primary care from 2015, with recruitment via social media included from 2017. Enrolment ended in July 2018. PARTICIPANTS Adult women were eligible if they had ≥2 episodes of UI per week, access to mobile apps and wanted treatment. Of the 350 screened women, 262 were eligible and randomised to app-based treatment or care as usual; 195 (74%) attended follow-up. PREDICTORS Literature review and expert opinion identified 13 candidate predictors, categorised into two groups: prognostic factors (independent of treatment type), such as UI severity, postmenopausal state, vaginal births, general physical health status, pelvic floor muscle function and body mass index; and modifiers (dependent on treatment type), such as age, UI type and duration, impact on quality of life, previous physical therapy, recruitment method and educational level. MAIN OUTCOME MEASURE The primary outcome was symptom severity after a 4-month follow-up period, measured by the International Consultation on Incontinence Questionnaire Urinary Incontinence Short Form. Prognostic factors and modifiers were combined into a final prediction model. For each participant, we then predicted treatment outcomes and calculated a Personalised Advantage Index (PAI). RESULTS Baseline UI severity (prognostic) and age, educational level and impact on quality of life (modifiers) independently affected the treatment effect of eHealth. The mean PAI was 0.99±0.79 points, and it was of clinical relevance in 21% of individuals. Applying the PAI also significantly improved treatment outcomes at the group level.
CONCLUSIONS Personalising treatment choice can support treatment decision making between eHealth and care as usual through the practical application of prediction modelling. Concerning eHealth for UI, this could facilitate the choice between app-based treatment and care as usual. TRIAL REGISTRATION NUMBER NL4948t.
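The PAI idea can be sketched in a few lines: predict each patient's outcome under both treatment options from a model in which the modifiers interact with treatment arm, then take the difference. The coefficients and cut-offs below are invented for illustration, not the fitted values from this trial.

```python
def predicted_severity(age, qol_impact, app_treatment):
    """Predicted post-treatment symptom score (lower = better).
    The modifier terms (age, quality-of-life impact) only apply under
    app-based treatment; all coefficients are hypothetical."""
    base = 6.0 + 0.02 * age + 0.30 * qol_impact          # prognostic part
    if app_treatment:                                     # modifier part
        base += -1.5 + 0.03 * (age - 50) - 0.10 * qol_impact
    return base

def pai(age, qol_impact):
    """Personalised Advantage Index: predicted severity under care as usual
    minus under app-based treatment; PAI > 0 favours the app."""
    return (predicted_severity(age, qol_impact, False)
            - predicted_severity(age, qol_impact, True))

# A younger patient with high QoL impact vs an older patient with low impact
pai_young = pai(35, 12.0)
pai_old = pai(78, 2.0)
```

Under these toy coefficients the younger, more affected patient has the larger predicted advantage from app-based treatment, which is exactly the kind of between-patient contrast the PAI is meant to surface.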
Affiliation(s)
- Anne Martina Maria Loohuis
- Department of General Practice and Elderly Care Medicine, University Medical Center Groningen, Groningen, The Netherlands
- Huibert Burger
- Department of General Practice and Elderly Care Medicine, University Medical Center Groningen, Groningen, The Netherlands
- Nienke Wessels
- Department of General Practice and Elderly Care Medicine, University Medical Center Groningen, Groningen, The Netherlands
- Janny Dekker
- Department of General Practice and Elderly Care Medicine, University Medical Center Groningen, Groningen, The Netherlands
- Alec GGA Malmberg
- Department of Obstetrics and Gynaecology, University Medical Centre Groningen, Groningen, The Netherlands
- Marjolein Y Berger
- Department of General Practice and Elderly Care Medicine, University Medical Center Groningen, Groningen, The Netherlands
- Marco H Blanker
- Department of General Practice and Elderly Care Medicine, University Medical Center Groningen, Groningen, The Netherlands
- Henk van der Worp
- Department of General Practice and Elderly Care Medicine, University Medical Center Groningen, Groningen, The Netherlands
27
van Royen FS, Moons KGM, Geersing GJ, van Smeden M. Developing, validating, updating and judging the impact of prognostic models for respiratory diseases. Eur Respir J 2022; 60:13993003.00250-2022. [PMID: 35728976 DOI: 10.1183/13993003.00250-2022] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/07/2022] [Accepted: 05/27/2022] [Indexed: 11/05/2022]
Affiliation(s)
- Florien S van Royen
- Dept. General Practice, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, Netherlands
- Karel G M Moons
- Dept. Epidemiology, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, Netherlands
- Geert-Jan Geersing
- Dept. General Practice, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, Netherlands
- Maarten van Smeden
- Dept. Epidemiology, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, Netherlands
28
Hafermann L, Klein N, Rauch G, Kammer M, Heinze G. Using Background Knowledge from Preceding Studies for Building a Random Forest Prediction Model: A Plasmode Simulation Study. Entropy 2022; 24:e24060847. [PMID: 35741566 PMCID: PMC9222226 DOI: 10.3390/e24060847] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 04/29/2022] [Revised: 06/14/2022] [Accepted: 06/15/2022] [Indexed: 12/05/2022]
Abstract
There is increasing interest in machine learning (ML) algorithms for predicting patient outcomes, as these methods are designed to automatically discover complex data patterns. For example, the random forest (RF) algorithm is designed to identify relevant predictor variables out of a large set of candidates. In addition, researchers may also use external information for variable selection to improve model interpretability and variable selection accuracy, and thereby prediction quality. However, it is unclear to what extent, if at all, RF and other ML methods may benefit from external information. In this paper, we examine the usefulness of external information from prior variable selection studies that used traditional statistical modeling approaches such as the Lasso, or suboptimal methods such as univariate selection. We conducted a plasmode simulation study based on subsampling a data set from a pharmacoepidemiologic study with nearly 200,000 individuals, two binary outcomes and 1152 candidate predictor (mainly sparse binary) variables. When the scope of candidate predictors was reduced based on external knowledge, RF models achieved better calibration, that is, better agreement between predictions and observed outcome rates. However, prediction quality measured by cross-entropy, AUROC or the Brier score did not improve. We recommend appraising the methodological quality of studies that serve as an external information source for future prediction model development.
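The scope-reduction step and the two kinds of evaluation the abstract contrasts (overall calibration versus Brier score) can be sketched as follows. Everything here is hypothetical: the predictor names stand in for the study's 1152 sparse binary candidates, and the fixed toy predictions are constructed only to mimic the qualitative finding that reducing the candidate scope improved calibration-in-the-large while leaving the Brier score essentially unchanged.

```python
import statistics

# External knowledge: predictors reported by preceding variable-selection
# studies (e.g. Lasso-based). Names are hypothetical placeholders.
externally_selected = {"prior_mi", "statin_use", "age_band_70plus"}
all_candidates = externally_selected | {f"noise_var_{i}" for i in range(1149)}

# Reduce the scope of candidates offered to the learning algorithm.
reduced_scope = sorted(v for v in all_candidates if v in externally_selected)

def brier_score(preds, outcomes):
    """Mean squared difference between predicted risk and observed outcome."""
    return statistics.mean((p - y) ** 2 for p, y in zip(preds, outcomes))

def calibration_in_the_large(preds, outcomes):
    """Observed event rate minus mean predicted risk (0 = well calibrated overall)."""
    return statistics.mean(outcomes) - statistics.mean(preds)

# Fixed toy predictions from two hypothetical models on the same 5 patients:
outcomes = [0, 0, 1, 0, 1]
preds_reduced = [0.1, 0.2, 0.7, 0.1, 0.8]   # model built on the reduced scope
preds_full = [0.2, 0.3, 0.8, 0.2, 0.9]      # model built on all candidates
```

With these numbers the reduced-scope model is closer to the observed event rate overall, while the two Brier scores differ only marginally, echoing the pattern reported above.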
Affiliation(s)
- Lorena Hafermann
- Institute of Biometry and Clinical Epidemiology, Charité–Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Charitéplatz 1, 10117 Berlin, Germany
- Nadja Klein
- Chair of Statistics and Data Science, School of Business and Economics, Humboldt-Universität zu Berlin, Unter den Linden 6, 10099 Berlin, Germany
- Geraldine Rauch
- Institute of Biometry and Clinical Epidemiology, Charité–Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Charitéplatz 1, 10117 Berlin, Germany
- Michael Kammer
- Section for Clinical Biometrics, Center for Medical Statistics, Informatics and Intelligent Systems, Medical University of Vienna, Spitalgasse 23, 1090 Vienna, Austria
- Georg Heinze
- Section for Clinical Biometrics, Center for Medical Statistics, Informatics and Intelligent Systems, Medical University of Vienna, Spitalgasse 23, 1090 Vienna, Austria
29
van den Goorbergh R, van Smeden M, Timmerman D, Van Calster B. The harm of class imbalance corrections for risk prediction models: illustration and simulation using logistic regression. J Am Med Inform Assoc 2022; 29:1525-1534. [PMID: 35686364 PMCID: PMC9382395 DOI: 10.1093/jamia/ocac093] [Citation(s) in RCA: 58] [Impact Index Per Article: 29.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/11/2022] [Revised: 05/12/2022] [Accepted: 05/27/2022] [Indexed: 12/23/2022] Open
Abstract
OBJECTIVE Methods to correct class imbalance (imbalance between the frequency of outcome events and nonevents) are receiving increasing interest for developing prediction models. We examined the effect of imbalance correction on the performance of logistic regression models. MATERIALS AND METHODS Prediction models were developed using standard and penalized (ridge) logistic regression under 4 methods to address class imbalance: no correction, random undersampling, random oversampling, and SMOTE. Model performance was evaluated in terms of discrimination, calibration, and classification. Using Monte Carlo simulations, we studied the impact of training set size, number of predictors, and the outcome event fraction. A case study on prediction modeling for ovarian cancer diagnosis is presented. RESULTS The use of random undersampling, random oversampling, or SMOTE yielded poorly calibrated models: the probability of belonging to the minority class was strongly overestimated. These methods did not result in higher areas under the ROC curve when compared with models developed without correction for class imbalance. Although imbalance correction improved the balance between sensitivity and specificity, similar results were obtained by shifting the probability threshold instead. DISCUSSION Imbalance correction led to models with strong miscalibration without better ability to distinguish between patients with and without the outcome event. The inaccurate probability estimates reduce the clinical utility of the model, because decisions about treatment are ill-informed. CONCLUSION Outcome imbalance is not a problem in itself; imbalance correction may even worsen model performance.
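Why rebalancing inflates predicted probabilities can be seen analytically for logistic regression: random over/undersampling to a 50/50 class mix changes only the intercept, shifting every patient's log-odds by ln((1 − prevalence)/prevalence). A stylized sketch of that shift (the function and scenario are this sketch's own, not the paper's code):

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def expit(x):
    return 1 / (1 + math.exp(-x))

def prob_after_oversampling(true_risk, prevalence):
    """Risk a logistic model learns when the minority class is resampled to
    a 50/50 mix: every patient's log-odds shift by log((1 - prev) / prev)."""
    shift = math.log((1 - prevalence) / prevalence)
    return expit(logit(true_risk) + shift)

# A patient with a true 5% risk in a population with 5% prevalence:
p_rebalanced = prob_after_oversampling(0.05, 0.05)
```

An average-risk patient (true risk 5%) ends up with a predicted risk of 50%: exactly the strong overestimation of minority-class probabilities that the simulations above report, without any gain in discrimination.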
30
Abstract
INTRODUCTION The immunobiology defining the clinically apparent differences in response to sepsis remains unclear. We hypothesize that in murine models of sepsis we can identify phenotypes of sepsis using non-invasive physiologic parameters (NIPP) early after infection to distinguish between different inflammatory states. METHODS Two murine models of sepsis were used: gram-negative pneumonia (PNA) and cecal ligation and puncture (CLP). All mice were treated with broad-spectrum antibiotics and fluid resuscitation. High-risk sepsis responders (pDie) were defined as those predicted to die within 72 h following infection. Low-risk responders (pLive) were expected to survive the initial 72 h of sepsis. R was used for statistical modeling and machine learning. RESULTS NIPP obtained at 6 and 24 h after infection of 291 mice (85 PNA and 206 CLP) were used to define the sepsis phenotypes. Lasso regression for variable selection with 10-fold cross-validation was used to define the optimal shrinkage parameters. The variables selected to discriminate between phenotypes included 6-h temperature and 24-h pulse distention, heart rate (HR), and temperature. Applying the model to held-out test data (n = 55), the area under the curve (AUC) for the receiver operating characteristic (ROC) curve was 0.93. Subgroup analysis of 120 CLP mice revealed an HR of <620 bpm at 24 h as a univariate predictor of pDie (AUC of the ROC curve = 0.90). Subgroup analysis of PNA-exposed mice (n = 121) did not reveal a single predictive variable, highlighting the complex physiological alterations in response to sepsis. CONCLUSION In murine models with various etiologies of sepsis, non-invasive vitals assessed just 6 and 24 h after infection can identify different sepsis phenotypes. Stratification by sepsis phenotypes can transform future studies investigating novel therapies for sepsis.
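The variable-selection step described above, lasso-penalized logistic regression with 10-fold cross-validation to choose the shrinkage strength, can be sketched as below. This is a generic illustration on invented data (the mouse vitals are not available here); scikit-learn is my tooling choice, not the study's:

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n, p = 400, 8
X = rng.normal(size=(n, p))
# Suppose only three "vitals" (columns 0-2) carry signal for the phenotype.
y = rng.binomial(1, 1 / (1 + np.exp(-(X[:, 0] - X[:, 1] + 0.5 * X[:, 2]))))

# L1-penalized logistic regression; 10-fold CV picks the shrinkage parameter.
model = LogisticRegressionCV(
    Cs=10, cv=10, penalty="l1", solver="liblinear", scoring="roc_auc"
).fit(X[:300], y[:300])

# Lasso sets uninformative coefficients exactly to zero.
selected = np.flatnonzero(model.coef_[0] != 0.0)
auc = roc_auc_score(y[300:], model.predict_proba(X[300:])[:, 1])
print("selected columns:", selected, " held-out AUC: %.2f" % auc)
```

The nonzero coefficients play the role of the discriminating vitals in the abstract, and the held-out AUC mirrors the test-set evaluation reported there.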
31
Modern Learning from Big Data in Critical Care: Primum Non Nocere. Neurocrit Care 2022; 37:174-184. [PMID: 35513752 PMCID: PMC9071245 DOI: 10.1007/s12028-022-01510-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/07/2021] [Accepted: 04/06/2022] [Indexed: 12/13/2022]
Abstract
Large and complex data sets are increasingly available for research in critical care. To analyze these data, researchers use techniques commonly referred to as statistical learning or machine learning (ML). The latter is known for large successes in the field of diagnostics, for example, by identification of radiological anomalies. In other research areas, such as clustering and prediction studies, there is more discussion regarding the benefit and efficiency of ML techniques compared with statistical learning. In this viewpoint, we aim to explain commonly used statistical learning and ML techniques and provide guidance for responsible use in the case of clustering and prediction questions in critical care. Clustering studies have been increasingly popular in critical care research, aiming to inform how patients can be characterized, classified, or treated differently. An important challenge for clustering studies is to ensure and assess generalizability. This limits the application of findings in these studies to individual patients. In the case of predictive questions, there is much discussion as to which algorithm should be used to most accurately predict outcome. Aspects that determine the usefulness of ML, compared with statistical techniques, include the volume of the data, the dimensionality of the preferred model, and the extent of missing data. There are areas in which modern ML methods may be preferred. However, efforts should be made to implement statistical frameworks (e.g., for dealing with missing data or measurement error, both omnipresent in clinical data) in ML methods. To conclude, there are important opportunities but also pitfalls to consider when performing clustering or predictive studies with ML techniques. We advocate careful evaluation of new data-driven findings. More interaction is needed between the engineering mindset of experts in ML methods, the insight into bias of epidemiologists, and the probabilistic thinking of statisticians to extract as much information and knowledge from data as possible, while avoiding harm.
32
McNamara ME, Zisser M, Beevers CG, Shumake J. Not just “big” data: Importance of sample size, measurement error, and uninformative predictors for developing prognostic models for digital interventions. Behav Res Ther 2022; 153:104086. [DOI: 10.1016/j.brat.2022.104086] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/18/2021] [Revised: 03/11/2022] [Accepted: 04/05/2022] [Indexed: 11/24/2022]
33
Fonseca de Freitas D, Kadra-Scalzo G, Agbedjro D, Francis E, Ridler I, Pritchard M, Shetty H, Segev A, Casetta C, Smart SE, Downs J, Christensen SR, Bak N, Kinon BJ, Stahl D, MacCabe JH, Hayes RD. Using a statistical learning approach to identify sociodemographic and clinical predictors of response to clozapine. J Psychopharmacol 2022; 36:498-506. [PMID: 35212240 PMCID: PMC9066692 DOI: 10.1177/02698811221078746] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 01/02/2023]
Abstract
BACKGROUND A proportion of people with treatment-resistant schizophrenia fail to show improvement on clozapine treatment. Knowledge of the sociodemographic and clinical factors predicting clozapine response may be useful in developing personalised approaches to treatment. METHODS This retrospective cohort study used data from the electronic health records of the South London and Maudsley (SLaM) hospital between 2007 and 2011. Using the Least Absolute Shrinkage and Selection Operator (LASSO) regression statistical learning approach, we examined the ability of 35 sociodemographic and clinical factors to predict response to clozapine at 3 months of treatment. Response was assessed by the level of change in the severity of the symptoms using the Clinical Global Impression (CGI) scale. RESULTS We identified 242 service-users with a treatment-resistant psychotic disorder who had their first trial of clozapine and continued the treatment for at least 3 months. The LASSO regression identified three predictors of response to clozapine: higher severity of illness at baseline, female gender and having a comorbid mood disorder. These factors are estimated to explain 18% of the variance in clozapine response. The model's optimism-corrected calibration slope was 1.37, suggesting that the model will underfit when applied to new data. CONCLUSIONS These findings suggest that women, people with a comorbid mood disorder and those who are most ill at baseline respond better to clozapine. However, the accuracy of the internally validated and recalibrated model was low. Therefore, future research should determine whether a prediction model developed by including routinely collected data, in combination with biological information, presents adequate predictive ability to be applied in clinical settings.
34
Oosterhoff JHF, Gravesteijn BY, Karhade AV, Jaarsma RL, Kerkhoffs GMMJ, Ring D, Schwab JH, Steyerberg EW, Doornberg JN. Feasibility of Machine Learning and Logistic Regression Algorithms to Predict Outcome in Orthopaedic Trauma Surgery. J Bone Joint Surg Am 2022; 104:544-551. [PMID: 34921550 DOI: 10.2106/jbjs.21.00341] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 02/01/2023]
Abstract
BACKGROUND Statistical models using machine learning (ML) have the potential for more accurate estimates of the probability of binary events than logistic regression. The present study used existing data sets from large musculoskeletal trauma trials to address the following study questions: (1) Do ML models produce better probability estimates than logistic regression models? (2) Are ML models influenced by different variables than logistic regression models? METHODS We created ML and logistic regression models that estimated the probability of a specific fracture (posterior malleolar involvement in distal spiral tibial shaft and ankle fractures, scaphoid fracture, and distal radial fracture) or adverse event (subsequent surgery [after distal biceps repair or tibial shaft fracture], surgical site infection, and postoperative delirium) using 9 data sets from published musculoskeletal trauma studies. Each data set was split into training (80%) and test (20%) subsets. Fivefold cross-validation of the training set was used to develop the ML models. The best-performing model was then assessed in the independent testing data. Performance was assessed by (1) discrimination (c-statistic), (2) calibration (slope and intercept), and (3) overall performance (Brier score). RESULTS The mean c-statistic was 0.01 higher for the logistic regression models compared with the best ML models for each data set (range, -0.01 to 0.06). There were fewer variables strongly associated with variation in the ML models, and many were dissimilar from those in the logistic regression models. CONCLUSIONS The observation that ML models produce probability estimates comparable with logistic regression models for binary events in musculoskeletal trauma suggests that their benefit may be limited in this context.
35
Gregorich M, Melograna F, Sundqvist M, Michiels S, Van Steen K, Heinze G. Individual-specific networks for prediction modelling – A scoping review of methods. BMC Med Res Methodol 2022; 22:62. [PMID: 35249534 PMCID: PMC8898441 DOI: 10.1186/s12874-022-01544-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/27/2021] [Accepted: 02/11/2022] [Indexed: 11/10/2022] Open
Abstract
Background Recent advances in biotechnology enable the acquisition of high-dimensional data on individuals, posing challenges for prediction models which traditionally use covariates such as clinical patient characteristics. Alternative covariate representations that exploit the intrinsic interconnection of the features derived from these modern data modalities should be considered. The connectivity information between these features can be represented as an individual-specific network defined by a set of nodes and edges, the strength of which can vary from individual to individual. Global or local graph-theoretical features describing the network may constitute potential prognostic biomarkers instead of or in addition to traditional covariates and may replace the often unsuccessful search for individual biomarkers in a high-dimensional predictor space. Methods We conducted a scoping review to identify, collate and critically appraise the state of the art in the use of individual-specific networks for prediction modelling in medicine and applied health research, published during 2000–2020 in the electronic databases PubMed, Scopus and Embase. Results Our scoping review revealed that the main application areas were neurology and pathopsychology, followed by cancer research, cardiology and pathology (N = 148). Network construction was mainly based on Pearson correlation coefficients of repeated measurements, but alternative approaches (e.g. partial correlation, visibility graphs) were also found. For covariates measured only once per individual, network construction was mostly based on quantifying an individual's contribution to the overall group-level structure. Despite the multitude of identified methodological approaches for individual-specific network inference, the number of studies intended to enable the prediction of clinical outcomes for future individuals was quite limited, and most of the models served as proof of concept that network characteristics can in principle be useful for prediction. Conclusion The current body of research clearly demonstrates the value of individual-specific network analysis for prediction modelling, but it has not yet been considered as a general tool outside the current areas of application. More methodological research is still needed on well-founded strategies for network inference, especially on adequate network sparsification and outcome-guided graph-theoretical feature extraction and selection, and on how networks can be exploited efficiently for prediction modelling. Supplementary Information The online version contains supplementary material available at 10.1186/s12874-022-01544-6.
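As a concrete illustration of the most common construction identified in this review, Pearson correlation of repeated measurements, the sketch below builds one hypothetical individual's network in pure NumPy and extracts two simple graph-theoretical features (mean degree and global clustering). The data, the 0.3 threshold, and the feature choices are all my assumptions for illustration:

```python
import numpy as np

def individual_network_features(ts, thresh=0.3):
    """Build one individual's network from repeated measurements and return
    simple graph-theoretical features usable as prediction-model covariates.
    `ts` has shape (n_timepoints, n_features)."""
    corr = np.corrcoef(ts.T)                  # Pearson correlation of features
    A = (np.abs(corr) > thresh).astype(float) # threshold -> adjacency matrix
    np.fill_diagonal(A, 0.0)                  # no self-loops
    degree = A.sum(axis=1)
    # Global clustering (transitivity): closed ordered triplets / all ordered
    # triplets, both obtained from powers of the adjacency matrix.
    A2, A3 = A @ A, A @ A @ A
    triplets = A2.sum() - np.trace(A2)
    clustering = np.trace(A3) / triplets if triplets else 0.0
    return {"mean_degree": degree.mean(), "clustering": clustering}

rng = np.random.default_rng(7)
# One individual's repeated measurements: two correlated blocks of features.
base = rng.normal(size=(200, 2))
ts = np.hstack([base[:, [0]] + 0.3 * rng.normal(size=(200, 3)),
                base[:, [1]] + 0.3 * rng.normal(size=(200, 3))])
print(individual_network_features(ts))
```

Computing such features per individual, and then using them as covariates in a standard prediction model, is the overall workflow the review describes.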
36
Yan Y, Yang Z, Semenkovich TR, Kozower BD, Meyers BF, Nava RG, Kreisel D, Puri V. Comparison of standard and penalized logistic regression in risk model development. JTCVS OPEN 2022; 9:303-316. [PMID: 36003440 PMCID: PMC9390725 DOI: 10.1016/j.xjon.2022.01.016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 05/05/2021] [Accepted: 01/13/2022] [Indexed: 11/26/2022]
Abstract
Objective Regression models are ubiquitous in thoracic surgical research. We aimed to compare the value of standard logistic regression with the more complex but increasingly used penalized regression models using a recently published risk model as an example. Methods Using a standardized data set of clinical T1-3N0 esophageal cancer patients, we created models to predict the likelihood of unexpected pathologic nodal disease after surgical resection. Models were fitted using standard logistic regression or penalized regression (ridge, lasso, elastic net, and adaptive lasso). We compared the model performance (Brier score, calibration slope, C statistic, and overfitting) of standard regression with penalized regression models. Results Among 3206 patients with clinical T1-3N0 esophageal cancer, 668 (22%) had unexpected pathologic nodal disease. Of the 15 candidate variables considered in the models, the key predictors of nodal disease included clinical tumor stage, tumor size, grade, and presence of lymphovascular invasion. The standard regression model and all 4 penalized logistic regression models had virtually identical performance with Brier score ranging from 0.138 to 0.141, concordance index ranging from 0.775 to 0.788, and calibration slope from 0.965 to 1.05. Conclusions For predictive modeling in surgical outcomes research, when the data set is large and the outcome of interest is relatively frequent, standard regression models and the more complicated penalized models are very likely to have similar predictive performance. The choice of statistical methods for risk model development should be on the basis of the nature of the data at hand and good statistical practice, rather than the novelty or complexity of statistical models.
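The paper's comparison of standard and ridge logistic regression can be mimicked as below. This is a sketch on simulated data of roughly the case study's scale and event fraction (the esophageal-cancer data are not available; scikit-learn is my tooling choice): with a large data set and a frequent outcome, the two models perform almost identically.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss, roc_auc_score

rng = np.random.default_rng(3)
n, p = 3200, 15  # roughly the sample size and candidate-variable count above
X = rng.normal(size=(n, p))
# Outcome frequency around 20-25%, similar to the 22% nodal-disease rate.
logit = -1.3 + X[:, :4] @ np.array([0.8, 0.5, 0.4, 0.3])
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))
X_tr, X_te, y_tr, y_te = X[:2400], X[2400:], y[:2400], y[2400:]

standard = LogisticRegression(C=1e10, max_iter=1000).fit(X_tr, y_tr)  # ~unpenalized
ridge = LogisticRegression(C=1.0, max_iter=1000).fit(X_tr, y_tr)      # L2 penalty

for name, m in [("standard", standard), ("ridge", ridge)]:
    prob = m.predict_proba(X_te)[:, 1]
    print(f"{name:8s} Brier={brier_score_loss(y_te, prob):.3f} "
          f"c={roc_auc_score(y_te, prob):.3f}")
```

With 2400 training observations the penalty term is negligible relative to the likelihood, so shrinkage barely moves the coefficients, consistent with the paper's conclusion.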
37
Chen YJ, Wang WF, Jhang KM, Chang MC, Chang CC, Liao YC. Prediction of Institutionalization for Patients With Dementia in Taiwan According to Condition at Entry to Dementia Collaborative Care. J Appl Gerontol 2022; 41:1357-1364. [PMID: 35220779 DOI: 10.1177/07334648211073129] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022] Open
Abstract
This study aimed to examine the institutionalization rate in patients with dementia in Taiwan, identify the predictors of institutionalization, and assess whether caregiver burden mediates the relationship between neuropsychiatric symptoms and institutionalization. We analyzed data from a retrospective cohort registered in dementia collaborative care (N = 518). The analyses applied univariate and multivariate Cox proportional hazard regression with Firth's penalized likelihood to assess the relationship between each predictor at entry and institutionalization for survival analysis. Thirty (5.8%) patients were institutionalized after a median follow-up of one and a half years. Neuropsychiatric symptoms, loss of walking ability, and living alone predicted institutionalization. Caregiver burden may partially mediate the effect of neuropsychiatric symptoms on institutionalization. High caregiver burden due to the presence of neuropsychiatric symptoms may partially contribute to institutionalization among people living with dementia in Taiwan. However, proper management of neuropsychiatric symptoms and caregiver empowerment may ameliorate institutionalization risk.
38
de Hond AAH, Leeuwenberg AM, Hooft L, Kant IMJ, Nijman SWJ, van Os HJA, Aardoom JJ, Debray TPA, Schuit E, van Smeden M, Reitsma JB, Steyerberg EW, Chavannes NH, Moons KGM. Guidelines and quality criteria for artificial intelligence-based prediction models in healthcare: a scoping review. NPJ Digit Med 2022; 5:2. [PMID: 35013569 PMCID: PMC8748878 DOI: 10.1038/s41746-021-00549-7] [Citation(s) in RCA: 105] [Impact Index Per Article: 52.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/26/2021] [Accepted: 12/13/2021] [Indexed: 12/23/2022] Open
Abstract
While the opportunities of ML and AI in healthcare are promising, the growth of complex data-driven prediction models requires careful quality and applicability assessment before they are applied and disseminated in daily practice. This scoping review aimed to identify actionable guidance for those closely involved in AI-based prediction model (AIPM) development, evaluation and implementation including software engineers, data scientists, and healthcare professionals and to identify potential gaps in this guidance. We performed a scoping review of the relevant literature providing guidance or quality criteria regarding the development, evaluation, and implementation of AIPMs using a comprehensive multi-stage screening strategy. PubMed, Web of Science, and the ACM Digital Library were searched, and AI experts were consulted. Topics were extracted from the identified literature and summarized across the six phases at the core of this review: (1) data preparation, (2) AIPM development, (3) AIPM validation, (4) software development, (5) AIPM impact assessment, and (6) AIPM implementation into daily healthcare practice. From 2683 unique hits, 72 relevant guidance documents were identified. Substantial guidance was found for data preparation, AIPM development and AIPM validation (phases 1-3), while later phases clearly have received less attention (software development, impact assessment and implementation) in the scientific literature. The six phases of the AIPM development, evaluation and implementation cycle provide a framework for responsible introduction of AI-based prediction models in healthcare. Additional domain and technology specific research may be necessary and more practical experience with implementing AIPMs is needed to support further guidance.
39
Joshi A, Geroldinger A, Jiricka L, Senchaudhuri P, Corcoran C, Heinze G. Solutions to problems of nonexistence of parameter estimates and sparse data bias in Poisson regression. Stat Methods Med Res 2021; 31:253-266. [PMID: 34931909 PMCID: PMC8829730 DOI: 10.1177/09622802211065405] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
Poisson regression can be challenging with sparse data, in particular with certain data constellations where maximum likelihood estimates of regression coefficients do not exist. This paper provides a comprehensive evaluation of methods that give finite regression coefficients when maximum likelihood estimates do not exist, including Firth’s general approach to bias reduction, exact conditional Poisson regression, and a Bayesian estimator using weakly informative priors that can be obtained via data augmentation. Furthermore, we include in our evaluation a new proposal for a modification of Firth’s approach, improving its performance for predictions without compromising its attractive bias-correcting properties for regression coefficients. We illustrate the issue of the nonexistence of maximum likelihood estimates with a dataset arising from the recent outbreak of COVID-19 and an example from implant dentistry. All methods are evaluated in a comprehensive simulation study under a variety of realistic scenarios, evaluating their performance for prediction and estimation. To conclude, while exact conditional Poisson regression may be confined to small data sets only, both the modification of Firth’s approach and the Bayesian estimator are universally applicable solutions with attractive properties for prediction and estimation. While the Bayesian method needs specification of prior variances for the regression coefficients, the modified Firth approach does not require any user input.
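The "data constellations" where the Poisson MLE does not exist can be demonstrated directly. The sketch below (my illustration; the Firth-type and Bayesian fixes evaluated in the paper are not implemented in scikit-learn, so a ridge-style L2 penalty stands in as one simple way to keep estimates finite) uses a binary covariate whose exposed group has only zero counts, so the unpenalized slope estimate diverges to minus infinity:

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor

rng = np.random.default_rng(4)
x = np.repeat([0, 1], 50).reshape(-1, 1)              # binary covariate
y = np.where(x[:, 0] == 0, rng.poisson(2.0, 100), 0)  # exposed half: all zeros

# The unpenalized MLE for the slope is -infinity here. Any penalty keeps the
# estimate finite; weakening the penalty lets it drift back toward -infinity.
for alpha in (1.0, 1e-4):
    m = PoissonRegressor(alpha=alpha, max_iter=10_000).fit(x, y)
    print(f"alpha={alpha:<8} slope={m.coef_[0]:8.2f}")
```

The penalized estimates are always finite but depend on the penalty strength, which is why principled choices such as Firth's bias reduction or weakly informative priors, as evaluated in the paper, matter.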
40
Strömer A, Staerk C, Klein N, Weinhold L, Titze S, Mayr A. Deselection of base-learners for statistical boosting-with an application to distributional regression. Stat Methods Med Res 2021; 31:207-224. [PMID: 34882438 DOI: 10.1177/09622802211051088] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/23/2022]
Abstract
We present a new procedure for enhanced variable selection for component-wise gradient boosting. Statistical boosting is a computational approach that emerged from machine learning and makes it possible to fit regression models in the presence of high-dimensional data. Furthermore, the algorithm can lead to data-driven variable selection. In practice, however, the final models typically tend to include too many variables in some situations. This occurs particularly for low-dimensional data (p<n), where we observe a slow overfitting behavior of boosting. As a result, more variables get included in the final model without altering the prediction accuracy. Many of these false positives are incorporated with a small coefficient and therefore have a small impact, but they lead to a larger model. We try to overcome this issue by giving the algorithm the chance to deselect base-learners of minor importance. We analyze the impact of the new approach on variable selection and prediction performance in comparison to alternative methods, including boosting with earlier stopping as well as twin boosting. We illustrate our approach with data from an ongoing cohort study of patients with chronic kidney disease, where the most influential predictors of a health-related quality of life measure are selected in a distributional regression approach based on beta regression.
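For readers unfamiliar with the underlying algorithm, the sketch below implements plain component-wise L2 boosting (with simple least-squares base-learners) on invented data; it shows the data-driven selection the abstract refers to. The paper's deselection step itself is not reproduced, and all parameter choices here are my own:

```python
import numpy as np

def componentwise_l2_boost(X, y, steps=150, nu=0.1):
    """Minimal component-wise L2 boosting: at each step fit a univariate
    least-squares base-learner to the residuals for every column and update
    only the best-fitting one by a small step nu."""
    n, p = X.shape
    coef = np.zeros(p)
    offset = y.mean()
    resid = y - offset
    for _ in range(steps):
        # Univariate OLS fit (no intercept) of the residuals to each column.
        b = X.T @ resid / (X ** 2).sum(axis=0)
        ssr = (b ** 2) * (X ** 2).sum(axis=0)   # variance explained per column
        j = int(np.argmax(ssr))                 # pick the best base-learner
        coef[j] += nu * b[j]
        resid -= nu * b[j] * X[:, j]
    return offset, coef

rng = np.random.default_rng(5)
X = rng.normal(size=(500, 20))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(size=500)  # only 2 true signals

offset, coef = componentwise_l2_boost(X, y)
print("selected columns:", np.flatnonzero(np.abs(coef) > 1e-8))
```

With many steps, columns beyond the two informative ones can pick up small nonzero coefficients, which is precisely the overly large model that the paper's deselection procedure aims to prune.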
Affiliation(s)
- Annika Strömer
- Department of Medical Biometrics, Informatics and Epidemiology, Faculty of Medicine, University of Bonn, Germany
- Christian Staerk
- Department of Medical Biometrics, Informatics and Epidemiology, Faculty of Medicine, University of Bonn, Germany
- Nadja Klein
- Emmy Noether Research Group in Statistics and Data Science, Humboldt-Universität zu Berlin, Germany
- Leonie Weinhold
- Department of Medical Biometrics, Informatics and Epidemiology, Faculty of Medicine, University of Bonn, Germany
- Stephanie Titze
- Department of Nephrology and Hypertension, FAU Erlangen-Nuremberg, Germany
- Andreas Mayr
- Department of Medical Biometrics, Informatics and Epidemiology, Faculty of Medicine, University of Bonn, Germany
41
Martin GP, Riley RD, Collins GS, Sperrin M. Developing clinical prediction models when adhering to minimum sample size recommendations: The importance of quantifying bootstrap variability in tuning parameters and predictive performance. Stat Methods Med Res 2021; 30:2545-2561. [PMID: 34623193] [PMCID: PMC8649413] [DOI: 10.1177/09622802211046388]
Abstract
Recent minimum sample size formulae (Riley et al.) for developing clinical prediction models help ensure that development datasets are of sufficient size to minimise overfitting. While these criteria are known to avoid excessive overfitting on average, the extent of variability in overfitting at the recommended sample sizes is unknown. We investigated this through a simulation study and an empirical example, developing logistic regression clinical prediction models using unpenalised maximum likelihood estimation and various post-estimation shrinkage or penalisation methods. While the mean calibration slope was close to the ideal value of one for all methods, penalisation further reduced the average level of overfitting compared to unpenalised methods. This came at the cost of higher variability in predictive performance in external data for the penalisation methods. We recommend that penalisation methods be used in data that meet, or surpass, minimum sample size requirements to further mitigate overfitting, and that the variability in predictive performance and in any tuning parameters always be examined as part of the model development process, since this provides additional information over average (optimism-adjusted) performance alone. Lower variability gives reassurance that the developed clinical prediction model will perform well in new individuals from the same population as was used for model development.
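The bootstrap variability of a tuning parameter can be sketched as follows: re-tune a ridge penalty by cross-validation on each bootstrap resample and collect the chosen values. The data, penalty grid, fold assignment and number of resamples are all assumptions for illustration; the paper's study is far more extensive:

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 100, 5
X = rng.standard_normal((n, p))
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(0.8 * X[:, 0] - 0.5 * X[:, 1])))).astype(float)

def ridge_fit(Z, y, lam, iters=40):
    """Ridge logistic regression via penalized Newton-Raphson (intercept unpenalized)."""
    P = lam * np.eye(Z.shape[1])
    P[0, 0] = 0.0
    b = np.zeros(Z.shape[1])
    for _ in range(iters):
        pr = 1.0 / (1.0 + np.exp(-Z @ b))
        W = pr * (1.0 - pr)
        ZWZ = Z.T @ (W[:, None] * Z)
        b = np.linalg.solve(ZWZ + P, ZWZ @ b + Z.T @ (y - pr))
    return b

def cv_choose_lam(Z, y, grid, folds=5):
    """Pick the penalty minimizing cross-validated deviance."""
    idx = np.arange(len(y)) % folds
    dev = []
    for lam in grid:
        d = 0.0
        for k in range(folds):
            b = ridge_fit(Z[idx != k], y[idx != k], lam)
            pr = np.clip(1.0 / (1.0 + np.exp(-Z[idx == k] @ b)), 1e-10, 1 - 1e-10)
            d -= np.sum(y[idx == k] * np.log(pr) + (1 - y[idx == k]) * np.log(1 - pr))
        dev.append(d)
    return float(grid[int(np.argmin(dev))])

grid = np.array([0.01, 0.1, 1.0, 10.0, 100.0])
Z = np.column_stack([np.ones(n), X])
# the tuned penalty is itself an estimate: re-tune on bootstrap resamples
lams = []
for _ in range(20):
    i = rng.integers(0, n, n)
    lams.append(cv_choose_lam(Z[i], y[i], grid))
```

The spread of `lams` across resamples is exactly the tuning-parameter variability the paper recommends quantifying.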
Affiliation(s)
- Glen P Martin
- Division of Informatics, Imaging and Data Science, Faculty of Biology, Medicine and Health, University of Manchester, Manchester Academic Health Science Centre, UK
- Richard D Riley
- Centre for Prognosis Research, School of Medicine, Keele University, UK
- Gary S Collins
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, UK
- Matthew Sperrin
- Division of Informatics, Imaging and Data Science, Faculty of Biology, Medicine and Health, University of Manchester, Manchester Academic Health Science Centre, UK
42
Hoogland J, IntHout J, Belias M, Rovers MM, Riley RD, Harrell FE Jr, Moons KGM, Debray TPA, Reitsma JB. A tutorial on individualized treatment effect prediction from randomized trials with a binary endpoint. Stat Med 2021; 40:5961-5981. [PMID: 34402094] [PMCID: PMC9291969] [DOI: 10.1002/sim.9154]
Abstract
Randomized trials typically estimate average relative treatment effects, but decisions on the benefit of a treatment are possibly better informed by more individualized predictions of the absolute treatment effect. In case of a binary outcome, these predictions of absolute individualized treatment effect require knowledge of the individual's risk without treatment and incorporation of a possibly differential treatment effect (ie, varying with patient characteristics). In this article, we lay out the causal structure of individualized treatment effect in terms of potential outcomes and describe the required assumptions that underlie a causal interpretation of its prediction. Subsequently, we describe regression models and model estimation techniques that can be used to move from average to more individualized treatment effect predictions. We focus mainly on logistic regression-based methods that are both well-known and naturally provide the required probabilistic estimates. We incorporate key components from both causal inference and prediction research to arrive at individualized treatment effect predictions. While the separate components are well known, their successful amalgamation is very much an ongoing field of research. We cut the problem down to its essentials in the setting of a randomized trial, discuss the importance of a clear definition of the estimand of interest, provide insight into the required assumptions, and give guidance with respect to modeling and estimation options. Simulated data illustrate the potential of different modeling options across scenarios that vary both average treatment effect and treatment effect heterogeneity. Two applied examples illustrate individualized treatment effect prediction in randomized trial data.
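The core move described above, from an average relative effect to an individualized absolute effect, can be sketched with a logistic model containing a treatment-covariate interaction, fitted to simulated trial data. The data-generating model, the single covariate and the hand-rolled Newton solver are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
x = rng.standard_normal(n)                  # a prognostic covariate
t = rng.integers(0, 2, n).astype(float)     # randomized treatment assignment
# assumed truth: treatment lowers the log-odds, more so for high-risk patients
lin = -0.5 + 1.0 * x + t * (-0.8 - 0.4 * x)
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-lin))).astype(float)

def fit_logistic(Z, y, iters=25):
    """Maximum likelihood by Newton-Raphson."""
    b = np.zeros(Z.shape[1])
    for _ in range(iters):
        pr = 1.0 / (1.0 + np.exp(-Z @ b))
        W = pr * (1.0 - pr)
        b += np.linalg.solve(Z.T @ (W[:, None] * Z), Z.T @ (y - pr))
    return b

# risk model with a treatment-covariate interaction (differential treatment effect)
Z = np.column_stack([np.ones(n), x, t, t * x])
b = fit_logistic(Z, y)

def risk(x, t):
    return 1.0 / (1.0 + np.exp(-(b[0] + b[1] * x + b[2] * t + b[3] * t * x)))

# predicted individualized absolute treatment effect: risk untreated minus treated
ite = risk(x, 0.0) - risk(x, 1.0)
```

Predicting each patient's risk under both counterfactual treatment assignments and differencing is what turns the fitted risk model into an individualized absolute treatment effect.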
Affiliation(s)
- Jeroen Hoogland
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands
- Joanna IntHout
- Radboud Institute for Health Sciences (RIHS), Radboud University Medical Center, Nijmegen, the Netherlands
- Michail Belias
- Radboud Institute for Health Sciences (RIHS), Radboud University Medical Center, Nijmegen, the Netherlands
- Maroeska M. Rovers
- Radboud Institute for Health Sciences (RIHS), Radboud University Medical Center, Nijmegen, the Netherlands
- Frank E. Harrell Jr
- Department of Biostatistics, Vanderbilt University School of Medicine, Nashville, Tennessee, USA
- Karel G. M. Moons
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands
- Cochrane Netherlands, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands
- Thomas P. A. Debray
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands
- Cochrane Netherlands, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands
- Johannes B. Reitsma
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands
- Cochrane Netherlands, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands
43
Šinkovec H, Heinze G, Blagus R, Geroldinger A. To tune or not to tune, a case study of ridge logistic regression in small or sparse datasets. BMC Med Res Methodol 2021; 21:199. [PMID: 34592945] [PMCID: PMC8482588] [DOI: 10.1186/s12874-021-01374-y]
Abstract
BACKGROUND For finite samples with binary outcomes, penalized logistic regression such as ridge logistic regression has the potential to achieve smaller mean squared errors (MSE) of coefficients and predictions than maximum likelihood estimation. There is evidence, however, that ridge logistic regression can result in highly variable calibration slopes in small or sparse data situations. METHODS In this paper, we examine this issue further in a comprehensive simulation study, investigating the performance of ridge logistic regression in terms of coefficients and predictions and comparing it to Firth's correction, which has been shown to perform well in low-dimensional settings. In addition to tuned ridge regression, where the penalty strength is estimated from the data by minimizing some measure of out-of-sample prediction error or an information criterion, we also considered ridge regression with a pre-specified degree of shrinkage. We included 'oracle' models in the simulation study, in which the complexity parameter was chosen based on the true event probabilities (prediction oracle) or regression coefficients (explanation oracle), to demonstrate the capability of ridge regression if the truth were known. RESULTS The performance of ridge regression strongly depends on the choice of complexity parameter. As shown in our simulation and illustrated by a data example, values optimized in small or sparse datasets are negatively correlated with the optimal values and suffer from substantial variability, which translates into large MSE of coefficients and large variability of calibration slopes. In contrast, in our simulations, pre-specifying the degree of shrinkage prior to fitting led to accurate coefficients and predictions even in non-ideal settings such as those encountered with rare outcomes or sparse predictors. CONCLUSIONS Applying tuned ridge regression in small or sparse datasets is problematic, as it results in unstable coefficients and predictions. In contrast, determining the degree of shrinkage according to meaningful prior assumptions about the true effects has the potential to reduce bias and stabilize the estimates.
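The contrast between maximum likelihood and ridge logistic regression with a pre-specified penalty can be sketched minimally as follows. The data, the penalty value `lam=5.0` and the solver are illustrative assumptions, not the paper's simulation design:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 80, 4                                # a small development sample
X = rng.standard_normal((n, p))
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-0.8 * X[:, 0]))).astype(float)

def ridge_logistic(X, y, lam, iters=50):
    """Penalized Newton-Raphson; the intercept is left unpenalized."""
    Z = np.column_stack([np.ones(len(y)), X])
    P = lam * np.eye(Z.shape[1])
    P[0, 0] = 0.0
    b = np.zeros(Z.shape[1])
    for _ in range(iters):
        pr = 1.0 / (1.0 + np.exp(-Z @ b))
        W = pr * (1.0 - pr)
        ZWZ = Z.T @ (W[:, None] * Z)
        b = np.linalg.solve(ZWZ + P, ZWZ @ b + Z.T @ (y - pr))
    return b

b_mle = ridge_logistic(X, y, lam=0.0)       # ordinary maximum likelihood
b_fix = ridge_logistic(X, y, lam=5.0)       # pre-specified degree of shrinkage
```

With a pre-specified `lam` the coefficients are pulled toward zero relative to maximum likelihood; estimating `lam` itself from data this small is exactly where the paper finds instability.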
Affiliation(s)
- Hana Šinkovec
- Section for Clinical Biometrics, Center for Medical Statistics, Informatics and Intelligent Systems, Medical University of Vienna, Spitalgasse 23, 1090 Vienna, Austria
- Georg Heinze
- Section for Clinical Biometrics, Center for Medical Statistics, Informatics and Intelligent Systems, Medical University of Vienna, Spitalgasse 23, 1090 Vienna, Austria
- Rok Blagus
- Institute for Biostatistics and Medical Informatics, University of Ljubljana, Ljubljana, Slovenia
- Angelika Geroldinger
- Section for Clinical Biometrics, Center for Medical Statistics, Informatics and Intelligent Systems, Medical University of Vienna, Spitalgasse 23, 1090 Vienna, Austria
44
Li Y, Liang M, Mao L, Wang S. Robust estimation and variable selection for the accelerated failure time model. Stat Med 2021; 40:4473-4491. [PMID: 34031919] [PMCID: PMC8364878] [DOI: 10.1002/sim.9042]
Abstract
This article concerns robust modeling of the survival time of cancer patients. Accurate prediction of patient survival time is crucial to the development of effective therapeutic strategies. To this end, we propose a unified Expectation-Maximization approach combined with the L1-norm penalty to perform variable selection and parameter estimation simultaneously in the accelerated failure time model with right-censored survival data of moderate size. Our approach accommodates general loss functions and reduces to the well-known Buckley-James method when the squared-error loss is used without regularization. To mitigate the effects of outliers and heavy-tailed noise in real applications, we recommend the use of robust loss functions under the general framework. Furthermore, our approach can be extended to incorporate group structure among covariates. We conduct extensive simulation studies to assess the performance of the proposed methods with different loss functions and apply them to an ovarian carcinoma study as an illustration.
Affiliation(s)
- Yi Li
- Department of Statistics, University of Wisconsin-Madison, Wisconsin, USA
- Muxuan Liang
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Washington, USA
- Lu Mao
- Department of Biostatistics and Medical Informatics, School of Medicine and Public Health, University of Wisconsin-Madison, Wisconsin, USA
- Sijian Wang
- Department of Statistics, Rutgers University, New Jersey, USA
45
Cornelissen LL, Caram-Deelder C, Fustolo-Gunnink SF, Groenwold RHH, Stanworth SJ, Zwaginga JJ, van der Bom JG. Expected individual benefit of prophylactic platelet transfusions in hemato-oncology patients based on bleeding risks. Transfusion 2021; 61:2578-2587. [PMID: 34263930] [PMCID: PMC8518514] [DOI: 10.1111/trf.16587]
Abstract
BACKGROUND Prophylactic platelet transfusions prevent bleeding in hemato-oncology patients, but it is unclear how any benefit varies between patients. Our aim was to assess whether patients with different baseline risks for bleeding benefit differently from a prophylactic platelet transfusion strategy. STUDY DESIGN AND METHODS Using data from the randomized controlled TOPPS trial (Trial of Platelet Prophylaxis), we developed a prediction model for World Health Organization grade 2, 3, and 4 bleeding risk (defined as at least one bleeding episode in a 30-day period) and grouped patients into four quartiles based on this predicted baseline risk. Predictors in the model were baseline platelet count, age, diagnosis, disease-modifying treatment, disease status, previous stem cell transplantation, and the randomization arm. RESULTS The model had a c-statistic of 0.58 (95% confidence interval [CI] 0.54-0.64). There was little variation in predicted risks (quartile cut-offs 46%, 47%, and 51%), but prophylactic platelet transfusions gave a risk reduction in all risk quartiles. The absolute risk difference (ARD) was 3.4% (95% CI -12.2 to 18.9) in the lowest risk quartile (quartile 1), 7.4% (95% CI -8.4 to 23.3) in quartile 2, 6.8% (95% CI -9.1 to 22.9) in quartile 3, and 12.8% (95% CI -3.1 to 28.7) in the highest risk quartile (quartile 4). CONCLUSION In our study, generally accepted bleeding risk predictors had limited predictive power (expressed by the low c-statistic) and, given the wide confidence intervals of the predicted ARDs, could not aid in identifying subgroups of patients who might benefit more (or less) from prophylactic platelet transfusion.
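The quartile-wise absolute risk difference (ARD) computation can be sketched on simulated trial data. The baseline-risk distribution and the assumed odds ratio of 0.7 for prophylaxis are illustrative assumptions, not TOPPS estimates:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4000
base_risk = rng.beta(2, 2, n)               # hypothetical predicted baseline bleeding risk
arm = rng.integers(0, 2, n)                 # 1 = prophylactic platelet transfusion
# assumed truth: prophylaxis multiplies the odds of bleeding by 0.7
odds = base_risk / (1 - base_risk) * np.where(arm == 1, 0.7, 1.0)
bled = rng.random(n) < odds / (1 + odds)

# group patients into quartiles of predicted baseline risk
cuts = np.quantile(base_risk, [0.25, 0.5, 0.75])
quartile = np.searchsorted(cuts, base_risk)

# absolute risk difference (no prophylaxis minus prophylaxis) per quartile
ard = np.array([bled[(quartile == q) & (arm == 0)].mean()
                - bled[(quartile == q) & (arm == 1)].mean()
                for q in range(4)])
overall = bled[arm == 0].mean() - bled[arm == 1].mean()
```

Because randomization is independent of the predicted risk, each quartile's observed event rates per arm give an unbiased estimate of that subgroup's ARD; the paper's point is that with a weak model the quartiles barely differ.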
Affiliation(s)
- Loes L. Cornelissen
- Jon J van Rood Center for Clinical Transfusion Research, Sanquin/LUMC, Leiden, The Netherlands
- Department of Hematology, Leiden University Medical Center, Leiden, The Netherlands
- Department of Clinical Epidemiology, Leiden University Medical Center, Leiden, The Netherlands
- Camila Caram-Deelder
- Jon J van Rood Center for Clinical Transfusion Research, Sanquin/LUMC, Leiden, The Netherlands
- Department of Clinical Epidemiology, Leiden University Medical Center, Leiden, The Netherlands
- Susanna F. Fustolo-Gunnink
- Jon J van Rood Center for Clinical Transfusion Research, Sanquin/LUMC, Leiden, The Netherlands
- Department of Clinical Epidemiology, Leiden University Medical Center, Leiden, The Netherlands
- Department of Pediatric Hematology, Emma Children's Hospital, Amsterdam University Medical Center (UMC), University of Amsterdam, Amsterdam, The Netherlands
- Rolf H. H. Groenwold
- Department of Clinical Epidemiology, Leiden University Medical Center, Leiden, The Netherlands
- Simon J. Stanworth
- Transfusion Medicine, NHS Blood and Transplant (NHSBT), Oxford, UK
- Department of Haematology, Oxford University Hospitals NHS Foundation Trust, Oxford, UK
- Radcliffe Department of Medicine, University of Oxford, Oxford, UK
- NIHR Oxford Biomedical Research Centre, Oxford, UK
- Jaap Jan Zwaginga
- Jon J van Rood Center for Clinical Transfusion Research, Sanquin/LUMC, Leiden, The Netherlands
- Department of Hematology, Leiden University Medical Center, Leiden, The Netherlands
- Johanna G. van der Bom
- Jon J van Rood Center for Clinical Transfusion Research, Sanquin/LUMC, Leiden, The Netherlands
- Department of Clinical Epidemiology, Leiden University Medical Center, Leiden, The Netherlands
46
Heinze G, van Smeden M, Wynants L, Steyerberg E, van Calster B. Prediction models: stepwise development and simultaneous validation is a step back. J Clin Epidemiol 2021; 142:330-331. [PMID: 34348179] [DOI: 10.1016/j.jclinepi.2021.07.019]
Affiliation(s)
- Georg Heinze
- Section for Clinical Biometrics, Center for Medical Statistics, Informatics and Intelligent Systems, Medical University of Vienna, Spitalgasse 23, 1090 Vienna, Austria
- Maarten van Smeden
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands
- Laure Wynants
- Department of Epidemiology, CAPHRI Care and Public Health Research Institute, Maastricht University, Netherlands; Department of Development and Regeneration, KU Leuven, Leuven, Belgium; EPI-center, KU Leuven, Belgium
- Ewout Steyerberg
- Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, the Netherlands
- Ben van Calster
- Department of Development and Regeneration, KU Leuven, Leuven, Belgium; EPI-center, KU Leuven, Belgium; Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, the Netherlands
47
Abstract
In bioprocess engineering, the Quality by Design (QbD) initiative encourages the use of models to define design spaces. However, clear guidelines on how models for QbD are validated are still missing. In this review, we provide a comprehensive overview of the validation methods, mathematical approaches, and metrics currently applied in bioprocess modeling. The methods cover analytics for the data used for modeling, model training and selection, measures of predictiveness, and model uncertainties. We point out the general issues in model validation and calibration for different types of models and put this into the context of existing health authority recommendations. This review provides a starting point for developing a guide to model validation approaches. There is no one-size-fits-all approach, but this review should help to identify the best-fitting validation method, or combination of methods, for the specific task and the type of bioprocess model being developed.
48
Srisuwarn P, Srisuma S, Sriapha C, Tongpoo A, Rittilert P, Pradoo A, Tanpudsa Y, Wananukul W. Clinical effects and factors associated with adverse clinical outcomes of hymenopteran stings treated in a Thai Poison Centre: a retrospective cross-sectional study. Clin Toxicol (Phila) 2021; 60:168-174. [PMID: 33960850] [DOI: 10.1080/15563650.2021.1918705]
Abstract
OBJECTIVE To describe the clinical effects and outcomes of hymenopteran stings and to explore the non-laboratory factors associated with adverse clinical outcomes, a composite outcome comprising death, respiratory failure requiring intubation, acute kidney injury (AKI) requiring dialysis and hypotension requiring vasopressor use. METHODS A retrospective cross-sectional study was performed at the Ramathibodi Poison Center, a poison centre of a tertiary care hospital in Thailand. All hymenopteran sting consultations from January 2015 to June 2019 were consecutively enrolled, and charts were reviewed. Demographics, initial clinical characteristics and outcomes were collected, and factors associated with adverse clinical outcomes were explored. RESULTS One hundred and fourteen hymenopteran sting cases (wasp 48%, bee 33%, hornet 14% and carpenter bee 8.8%) were included (median age 36.5 years (interquartile range 9-55); male 63%). The prevalence of adverse clinical outcomes was 12.3% (95% CI 6.88-12.8). At initial presentation, 100% of cases had local skin reactions, 11.4% had clinical anaphylaxis, and 8% had red urine. Adverse clinical outcomes included death (n = 10), respiratory failure requiring intubation (n = 9), AKI requiring dialysis (n = 6) and hypotension requiring vasopressor use (n = 2). None of the patients with carpenter bee or hornet stings developed adverse clinical outcomes. In univariable analysis, urticaria, wheezing, red urine, wasp sting and a number of stings > 10 were significantly associated with adverse clinical outcomes. In multivariable analysis, red urine (adjusted OR 11.1 (95% CI 1.57-216)), wheezing (adjusted OR 16.7 (95% CI 1.43-402)) and a number of stings > 10 (adjusted OR 21.5 (95% CI 2.13-2557)) remained significant. CONCLUSIONS Adverse clinical outcomes of hymenopteran stings were not uncommon among cases reported to a national Thai poison centre. At initial presentation, red urine, wheezing and a number of stings > 10 were significantly associated with adverse clinical outcomes. Larger epidemiologic studies are required to confirm these associations.
Affiliation(s)
- Praopilad Srisuwarn
- Department of Medicine, Faculty of Medicine, Ramathibodi Hospital, Mahidol University, Bangkok, Thailand; Ramathibodi Poison Center, Faculty of Medicine, Ramathibodi Hospital, Mahidol University, Bangkok, Thailand
- Sahaphume Srisuma
- Department of Medicine, Faculty of Medicine, Ramathibodi Hospital, Mahidol University, Bangkok, Thailand; Ramathibodi Poison Center, Faculty of Medicine, Ramathibodi Hospital, Mahidol University, Bangkok, Thailand
- Charuwan Sriapha
- Ramathibodi Poison Center, Faculty of Medicine, Ramathibodi Hospital, Mahidol University, Bangkok, Thailand
- Achara Tongpoo
- Ramathibodi Poison Center, Faculty of Medicine, Ramathibodi Hospital, Mahidol University, Bangkok, Thailand
- Panee Rittilert
- Ramathibodi Poison Center, Faculty of Medicine, Ramathibodi Hospital, Mahidol University, Bangkok, Thailand
- Aimon Pradoo
- Ramathibodi Poison Center, Faculty of Medicine, Ramathibodi Hospital, Mahidol University, Bangkok, Thailand
- Yuvadee Tanpudsa
- Ramathibodi Poison Center, Faculty of Medicine, Ramathibodi Hospital, Mahidol University, Bangkok, Thailand
- Winai Wananukul
- Department of Medicine, Faculty of Medicine, Ramathibodi Hospital, Mahidol University, Bangkok, Thailand; Ramathibodi Poison Center, Faculty of Medicine, Ramathibodi Hospital, Mahidol University, Bangkok, Thailand
49
Riley RD, Snell KIE, Martin GP, Whittle R, Archer L, Sperrin M, Collins GS. Penalization and shrinkage methods produced unreliable clinical prediction models especially when sample size was small. J Clin Epidemiol 2021; 132:88-96. [PMID: 33307188] [PMCID: PMC8026952] [DOI: 10.1016/j.jclinepi.2020.12.005]
Abstract
OBJECTIVES When developing a clinical prediction model, penalization techniques are recommended to address overfitting, as they shrink predictor effect estimates toward the null and reduce mean-square prediction error in new individuals. However, shrinkage and penalty terms ('tuning parameters') are estimated with uncertainty from the development data set. We examined the magnitude of this uncertainty and the subsequent impact on prediction model performance. STUDY DESIGN AND SETTING This study comprises applied examples and a simulation study of the following methods: uniform shrinkage (estimated via a closed-form solution or bootstrapping), ridge regression, the lasso, and elastic net. RESULTS In a particular model development data set, penalization methods can be unreliable because tuning parameters are estimated with large uncertainty. This is of most concern when development data sets have a small effective sample size and the model's Cox-Snell R2 is low. The problem can lead to considerable miscalibration of model predictions in new individuals. CONCLUSION Penalization methods are not a 'carte blanche'; they do not guarantee a reliable prediction model is developed. They are more unreliable when needed most (i.e., when overfitting may be large). We recommend they are best applied with large effective sample sizes, as identified from recent sample size calculations that aim to minimize the potential for model overfitting and precisely estimate key parameters.
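One of the methods studied above, uniform shrinkage estimated via bootstrapping, can be sketched as follows: refit the model on each resample and take the calibration slope of its linear predictor in the original data. The simulated data and number of resamples are assumptions; in practice the intercept would also be re-estimated after shrinking, which is omitted here:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 150, 5
X = rng.standard_normal((n, p))
beta = np.array([1.0, -0.8, 0.5, 0.0, 0.0])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ beta))).astype(float)

def fit(Z, y, iters=40, ridge=1e-6):
    """Newton-Raphson for logistic regression; a tiny ridge guards
    against separation in bootstrap resamples."""
    b = np.zeros(Z.shape[1])
    R = ridge * np.eye(Z.shape[1])
    for _ in range(iters):
        pr = 1.0 / (1.0 + np.exp(-Z @ b))
        W = pr * (1.0 - pr)
        ZWZ = Z.T @ (W[:, None] * Z)
        b = np.linalg.solve(ZWZ + R, ZWZ @ b + Z.T @ (y - pr))
    return b

Z = np.column_stack([np.ones(n), X])
b_full = fit(Z, y)

# uniform shrinkage factor: refit on a bootstrap sample, then take the
# calibration slope of that model's linear predictor in the original data
slopes = []
for _ in range(100):
    idx = rng.integers(0, n, n)
    lp = Z @ fit(Z[idx], y[idx])
    slopes.append(fit(np.column_stack([np.ones(n), lp]), y)[1])

shrink = float(np.mean(slopes))
b_shrunk = b_full.copy()
b_shrunk[1:] *= shrink                      # shrink slopes uniformly toward the null
```

The spread of `slopes` across resamples is precisely the tuning-parameter uncertainty the paper warns about: averaging it away into a single factor hides how variable the estimate is in small samples.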
Affiliation(s)
- Richard D Riley
- Centre for Prognosis Research, School of Medicine, Keele University, Staffordshire ST5 5BG, UK
- Kym I E Snell
- Centre for Prognosis Research, School of Medicine, Keele University, Staffordshire ST5 5BG, UK
- Glen P Martin
- Division of Informatics, Imaging and Data Science, Faculty of Biology, Medicine and Health, University of Manchester, Manchester Academic Health Science Centre, Manchester, UK
- Rebecca Whittle
- Centre for Prognosis Research, School of Medicine, Keele University, Staffordshire ST5 5BG, UK
- Lucinda Archer
- Centre for Prognosis Research, School of Medicine, Keele University, Staffordshire ST5 5BG, UK
- Matthew Sperrin
- Division of Informatics, Imaging and Data Science, Faculty of Biology, Medicine and Health, University of Manchester, Manchester Academic Health Science Centre, Manchester, UK
- Gary S Collins
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford OX3 7LD, UK; NIHR Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford OX3 9DU, UK
50
Christodoulou E, van Smeden M, Edlinger M, Timmerman D, Wanitschek M, Steyerberg EW, Van Calster B. Adaptive sample size determination for the development of clinical prediction models. Diagn Progn Res 2021; 5:6. [PMID: 33745449] [PMCID: PMC7983402] [DOI: 10.1186/s41512-021-00096-5]
Abstract
BACKGROUND We suggest an adaptive sample size calculation method for developing clinical prediction models, in which model performance is monitored sequentially as new data come in. METHODS We illustrate the approach using data for the diagnosis of ovarian cancer (n = 5914, 33% event fraction) and obstructive coronary artery disease (CAD; n = 4888, 44% event fraction). We used logistic regression to develop a prediction model consisting only of a priori selected predictors, assuming linear relations for continuous predictors. We mimicked prospective patient recruitment by developing the model on 100 randomly selected patients and used bootstrapping to internally validate the model. We sequentially added 50 random new patients until we reached a sample size of 3000, re-estimating model performance at each step. We examined the required sample size for satisfying the following stopping rule: obtaining a calibration slope ≥ 0.9 and optimism in the c-statistic (or AUC) ≤ 0.02 at two consecutive sample sizes. This procedure was repeated 500 times. We also investigated the impact of alternative modeling strategies: modeling nonlinear relations for continuous predictors, and correcting for bias in the model estimates (Firth's correction). RESULTS Better discrimination was achieved in the ovarian cancer data (c-statistic 0.9 with 7 predictors) than in the CAD data (c-statistic 0.7 with 11 predictors). Adequate calibration and limited optimism in discrimination were achieved after a median of 450 patients (interquartile range 450-500) for the ovarian cancer data (22 events per parameter (EPP), 20-24) and 850 patients (750-900) for the CAD data (33 EPP, 30-35). A stricter criterion, requiring AUC optimism ≤ 0.01, was met with a median of 500 (23 EPP) and 1500 (59 EPP) patients, respectively. These sample sizes were much higher than the well-known rule of thumb of 10 EPP and slightly higher than a recently published fixed sample size calculation method by Riley et al. Higher sample sizes were required when nonlinear relationships were modeled, and lower sample sizes when Firth's correction was used. CONCLUSIONS Adaptive sample size determination can be a useful supplement to fixed a priori sample size calculations, because it allows the sample size to be tailored to the specific prediction modeling context in a dynamic fashion.
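The sequential idea can be sketched in simplified form, monitoring only the bootstrap calibration slope. The initial 100 patients, the steps of 50 and the slope-at-two-consecutive-sizes rule follow the abstract; the simulated data, the number of bootstrap resamples and the omission of the AUC-optimism criterion are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(5)
N, p = 3000, 5
X_all = rng.standard_normal((N, p))
beta = np.array([0.9, -0.7, 0.5, 0.3, 0.0])
y_all = (rng.random(N) < 1.0 / (1.0 + np.exp(-X_all @ beta))).astype(float)

def fit(Z, y, iters=30, ridge=1e-6):
    """Newton-Raphson logistic regression with a tiny stabilizing ridge."""
    b = np.zeros(Z.shape[1])
    R = ridge * np.eye(Z.shape[1])
    for _ in range(iters):
        pr = 1.0 / (1.0 + np.exp(-Z @ b))
        W = pr * (1.0 - pr)
        ZWZ = Z.T @ (W[:, None] * Z)
        b = np.linalg.solve(ZWZ + R, ZWZ @ b + Z.T @ (y - pr))
    return b

def boot_calibration_slope(X, y, B=30):
    """Internal validation: refit on resamples, calibrate on the original data."""
    Z = np.column_stack([np.ones(len(y)), X])
    s = []
    for _ in range(B):
        idx = rng.integers(0, len(y), len(y))
        lp = Z @ fit(Z[idx], y[idx])
        s.append(fit(np.column_stack([np.ones(len(y)), lp]), y)[1])
    return float(np.mean(s))

n, met = 100, 0
while n <= N:
    slope = boot_calibration_slope(X_all[:n], y_all[:n])
    met = met + 1 if slope >= 0.9 else 0    # criterion met at consecutive sizes?
    if met == 2:                            # stop: slope >= 0.9 twice in a row
        break
    n += 50                                 # "recruit" 50 more patients
```

On this toy problem recruitment stops well before the maximum of 3000, illustrating how the stopping rule adapts the sample size to the difficulty of the modeling task.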
Affiliation(s)
- Maarten van Smeden
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, Netherlands
- Michael Edlinger
- Department of Development & Regeneration, KU Leuven, Leuven, Belgium
- Department of Medical Statistics, Informatics, and Health Economics, Medical University Innsbruck, Innsbruck, Austria
- Dirk Timmerman
- Department of Development & Regeneration, KU Leuven, Leuven, Belgium
- Department of Obstetrics and Gynecology, University Hospitals Leuven, Leuven, Belgium
- Maria Wanitschek
- University Clinic of Internal Medicine III - Cardiology and Angiology, Tirol Kliniken, Innsbruck, Austria
- Ewout W Steyerberg
- Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, Netherlands
- Ben Van Calster
- Department of Development & Regeneration, KU Leuven, Leuven, Belgium
- Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, Netherlands
- EPI-centre, KU Leuven, Leuven, Belgium