1
Krakowski K, Oliver D, Arribas M, Stahl D, Fusar-Poli P. Dynamic and Transdiagnostic Risk Calculator Based on Natural Language Processing for the Prediction of Psychosis in Secondary Mental Health Care: Development and Internal-External Validation Cohort Study. Biol Psychiatry 2024; 96:604-614. PMID: 38852896. DOI: 10.1016/j.biopsych.2024.05.022.
Abstract
BACKGROUND Automatic transdiagnostic risk calculators can improve the detection of individuals at risk of psychosis. However, they rely on assessment at a single point in time and can be refined with dynamic modeling techniques that account for changes in risk over time. METHODS We included 158,139 patients (5007 events) who received a first index diagnosis of a nonorganic and nonpsychotic mental disorder within electronic health records from the South London and Maudsley National Health Service Foundation Trust between January 1, 2008, and October 8, 2021. A dynamic Cox landmark model was developed to estimate the 2-year risk of developing psychosis according to the TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis) statement. The dynamic model included 24 predictors extracted at 9 landmark points (baseline [0], 6, 12, 18, 24, 30, 36, 42, and 48 months): 3 demographic, 1 clinical, and 20 natural language processing-based symptom and substance use predictors. Performance was compared with a static Cox regression model with all predictors assessed at baseline only, and was indexed via discrimination (C-index), calibration (calibration plots), and potential clinical utility (decision curves) in internal-external validation. RESULTS The dynamic model improved discrimination performance compared with the static model both at baseline (dynamic: C-index = 0.90; static: C-index = 0.87) and at the final landmark point (dynamic: C-index = 0.79; static: C-index = 0.76). The dynamic model was also significantly better calibrated (calibration slope = 0.97-1.10) than the static model at later landmark points (≥24 months), and its net benefit was higher at those landmark points. CONCLUSIONS These findings suggest that dynamic prediction models can improve the detection of individuals at risk for psychosis in secondary mental health care settings.
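The landmarking approach described in this abstract can be sketched in a few lines: at each landmark time, keep only patients still at risk, carry forward their most recent predictor value, and administratively censor follow-up at the landmark plus the 2-year (24-month) horizon. A minimal illustration in Python; the record layout and toy values are hypothetical, not the study's data:

```python
# Build stacked landmark rows for a dynamic (landmark) survival model:
# at each landmark s, keep patients still at risk at s, take their most
# recent predictor value at or before s, and truncate follow-up at
# s + horizon (administrative censoring).

def build_landmark_rows(patients, landmarks, horizon=24):
    rows = []
    for s in landmarks:
        for p in patients:
            if p["event_time"] <= s:          # event/censoring before s: not at risk
                continue
            history = [(t, v) for t, v in p["measurements"] if t <= s]
            if not history:
                continue
            _, latest = max(history)          # most recent measurement
            end = min(p["event_time"], s + horizon)
            rows.append({
                "id": p["id"], "landmark": s,
                "predictor": latest,
                "time": end - s,              # time since landmark
                "event": int(p["event"] and p["event_time"] <= s + horizon),
            })
    return rows

patients = [
    {"id": 1, "event_time": 30, "event": True,
     "measurements": [(0, 0.2), (6, 0.8)]},
    {"id": 2, "event_time": 60, "event": False,
     "measurements": [(0, 0.1), (12, 0.3)]},
]
rows = build_landmark_rows(patients, landmarks=[0, 6, 12])
```

A Cox model is then fitted to the stacked rows, typically allowing effects to vary smoothly with the landmark time.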
Affiliation(s)
- Kamil Krakowski
- Department of Brain and Behavioural Sciences, University of Pavia, Pavia, Italy; Early Psychosis: Interventions and Clinical-Detection Laboratory, Department of Psychosis Studies, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom
- Dominic Oliver
- Early Psychosis: Interventions and Clinical-Detection Laboratory, Department of Psychosis Studies, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom; Department of Psychiatry, University of Oxford, Oxford, United Kingdom; National Institute for Health and Care Research Oxford Health Biomedical Research Centre, Oxford, United Kingdom; OPEN Early Detection Service, Oxford Health National Health Service Foundation Trust, Oxford, United Kingdom
- Maite Arribas
- Early Psychosis: Interventions and Clinical-Detection Laboratory, Department of Psychosis Studies, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom
- Daniel Stahl
- Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, London, United Kingdom
- Paolo Fusar-Poli
- Department of Brain and Behavioural Sciences, University of Pavia, Pavia, Italy; Early Psychosis: Interventions and Clinical-Detection Laboratory, Department of Psychosis Studies, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom; OASIS Service, South London and the Maudsley National Health Service Foundation Trust, London, United Kingdom; Department of Psychiatry and Psychotherapy, Ludwig Maximilian University Munich, Munich, Germany.
2
Sun T, McCoy AB, Storrow AB, Liu D. Addressing the implementation challenge of risk prediction model due to missing risk factors: The submodel approximation approach. Stat Med 2024. PMID: 39264051. DOI: 10.1002/sim.10184.
Abstract
Clinical prediction models have been widely acknowledged as informative tools providing evidence-based support for clinical decision making. However, prediction models are often underused in clinical practice for many reasons, including missing information at the time of real-time risk calculation in electronic health record (EHR) systems. Existing literature addressing this challenge focuses on statistical comparison of various approaches while overlooking the feasibility of their implementation in the EHR. In this article, we propose a novel and feasible submodel approach to address this challenge for prediction models developed using the model approximation (also termed "preconditioning") method. The proposed submodel coefficients are equivalent to the corresponding original prediction model coefficients plus a correction factor. Comprehensive simulations were conducted to assess the performance of the proposed method and to compare it with the existing "one-step-sweep" approach as well as the imputation approach. In general, the simulation results show that the preconditioning-based submodel approach is robust to various heterogeneity scenarios and is comparable to the imputation-based approach, while the "one-step-sweep" approach is less robust under certain heterogeneity scenarios. The proposed method was applied to facilitate real-time implementation of a prediction model that identifies emergency department patients with acute heart failure who can be safely discharged home.
Affiliation(s)
- Tianyi Sun
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee
- Allison B McCoy
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee
- Alan B Storrow
- Department of Emergency Medicine, Vanderbilt University Medical Center, Nashville, Tennessee
- Dandan Liu
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee
3
Efthimiou O, Seo M, Chalkou K, Debray T, Egger M, Salanti G. Developing clinical prediction models: a step-by-step guide. BMJ 2024; 386:e078276. PMID: 39227063. PMCID: PMC11369751. DOI: 10.1136/bmj-2023-078276.
Affiliation(s)
- Orestis Efthimiou
- Institute of Primary Health Care (BIHAM), University of Bern, Bern, Switzerland
- Institute of Social and Preventive Medicine (ISPM), University of Bern, Bern, Switzerland
- Michael Seo
- Institute of Social and Preventive Medicine (ISPM), University of Bern, Bern, Switzerland
- Thomas Debray
- Smart Data Analysis and Statistics B V, Utrecht, The Netherlands
- Matthias Egger
- Institute of Social and Preventive Medicine (ISPM), University of Bern, Bern, Switzerland
- Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, UK
- Georgia Salanti
- Institute of Social and Preventive Medicine (ISPM), University of Bern, Bern, Switzerland
4
Stahl D. New horizons in prediction modelling using machine learning in older people's healthcare research. Age Ageing 2024; 53:afae201. PMID: 39311424. PMCID: PMC11417961. DOI: 10.1093/ageing/afae201.
Abstract
Machine learning (ML) and prediction modelling have become increasingly influential in healthcare, providing critical insights and supporting clinical decisions, particularly in the age of big data. This paper serves as an introductory guide for health researchers and readers interested in prediction modelling, and covers all aspects of developing, assessing and reporting a model using ML. The paper starts with the importance of prediction modelling for precision medicine. It outlines different types of prediction and machine learning approaches, including supervised, unsupervised and semi-supervised learning, and provides an overview of popular algorithms for various outcomes and settings. It also introduces key theoretical ML concepts. The importance of data quality, preprocessing and unbiased model performance evaluation is highlighted. Concepts of apparent, internal and external validation are introduced, along with metrics for discrimination and calibration for different types of outcomes. Additionally, the paper addresses model interpretation, fairness and implementation in clinical practice. Finally, it provides recommendations for reporting and identifies common pitfalls in prediction modelling and machine learning. The aim of the paper is to help readers understand and critically evaluate research papers that present ML models, and to serve as a first guide for developing, assessing and implementing their own.
Affiliation(s)
- Daniel Stahl
- Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology & Neuroscience, King’s College London, London, UK
5
Blythe R, Parsons R, Barnett AG, Cook D, McPhail SM, White NM. Prioritising deteriorating patients using time-to-event analysis: prediction model development and internal-external validation. Crit Care 2024; 28:247. PMID: 39020419. PMCID: PMC11256441. DOI: 10.1186/s13054-024-05021-y.
Abstract
BACKGROUND Binary classification models are frequently used to predict clinical deterioration, however they ignore information on the timing of events. An alternative is to apply time-to-event models, augmenting clinical workflows by ranking patients by predicted risks. This study examines how and why time-to-event modelling of vital signs data can help prioritise deterioration assessments using lift curves, and develops a prediction model to stratify acute care inpatients by risk of clinical deterioration. METHODS We developed and validated a Cox regression for time to in-hospital mortality. The model used time-varying covariates to estimate the risk of clinical deterioration. Adult inpatient medical records from 5 Australian hospitals between 1 January 2019 and 31 December 2020 were used for model development and validation. Model discrimination and calibration were assessed using internal-external cross validation. A discrete-time logistic regression model predicting death within 24 h with the same covariates was used as a comparator to the Cox regression model to estimate differences in predictive performance between the binary and time-to-event outcome modelling approaches. RESULTS Our data contained 150,342 admissions and 1016 deaths. Model discrimination was higher for Cox regression than for discrete-time logistic regression, with cross-validated AUCs of 0.96 and 0.93, respectively, for mortality predictions within 24 h, declining to 0.93 and 0.88, respectively, for mortality predictions within 1 week. Calibration plots showed that calibration varied by hospital, but this can be mitigated by ranking patients by predicted risks. CONCLUSION Time-varying covariate Cox models can be powerful tools for triaging patients, which may lead to more efficient and effective care in time-poor environments when the times between observations are highly variable.
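The time-varying covariate structure described in this abstract is usually encoded in counting-process (start, stop] format: each observation period becomes one interval, and the event flag is set only on the interval in which follow-up ends. A minimal sketch with made-up values (not the study's code):

```python
# Convert a patient's vital-sign observation times into the (start, stop]
# counting-process rows used by Cox models with time-varying covariates.

def to_counting_process(obs_times, values, end_time, event):
    """obs_times: sorted measurement times; values: covariate at each time;
    end_time: death/discharge time; event: 1 if the event (death) occurred."""
    intervals = []
    for i, (t, v) in enumerate(zip(obs_times, values)):
        stop = obs_times[i + 1] if i + 1 < len(obs_times) else end_time
        intervals.append({
            "start": t, "stop": stop, "value": v,
            # the event can only occur in the final interval
            "event": event if stop == end_time else 0,
        })
    return intervals

# Toy admission: heart rate-like values observed at hours 0, 4 and 9,
# with death at hour 12.
ivals = to_counting_process([0, 4, 9], [98, 92, 85], end_time=12, event=1)
```

Each row then enters the partial likelihood only while its interval overlaps the risk set, which is what lets predictions update as new observations arrive.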
Affiliation(s)
- Robin Blythe
- Australian Centre for Health Services Innovation and Centre for Healthcare Transformation, School of Public Health and Social Work, Faculty of Health, Queensland University of Technology, 60 Musk Ave, Kelvin Grove, Qld, 4059, Australia.
- Rex Parsons
- Australian Centre for Health Services Innovation and Centre for Healthcare Transformation, School of Public Health and Social Work, Faculty of Health, Queensland University of Technology, 60 Musk Ave, Kelvin Grove, Qld, 4059, Australia
- Adrian G Barnett
- Australian Centre for Health Services Innovation and Centre for Healthcare Transformation, School of Public Health and Social Work, Faculty of Health, Queensland University of Technology, 60 Musk Ave, Kelvin Grove, Qld, 4059, Australia
- David Cook
- Intensive Care Unit, Princess Alexandra Hospital, Metro South Health, Woolloongabba, 4102, Qld, Australia
- Steven M McPhail
- Australian Centre for Health Services Innovation and Centre for Healthcare Transformation, School of Public Health and Social Work, Faculty of Health, Queensland University of Technology, 60 Musk Ave, Kelvin Grove, Qld, 4059, Australia
- Digital Health and Informatics, Metro South Health, Woolloongabba, 4102, Qld, Australia
- Nicole M White
- Australian Centre for Health Services Innovation and Centre for Healthcare Transformation, School of Public Health and Social Work, Faculty of Health, Queensland University of Technology, 60 Musk Ave, Kelvin Grove, Qld, 4059, Australia
6
D'Agostino McGowan L, Lotspeich SC, Hepler SA. The "Why" behind including "Y" in your imputation model. Stat Methods Med Res 2024. PMID: 38625810. DOI: 10.1177/09622802241244608.
Abstract
Missing data is a common challenge when analyzing epidemiological data, and imputation is often used to address this issue. Here, we investigate the scenario where a covariate used in an analysis has missingness and will be imputed. There are recommendations to include the outcome from the analysis model in the imputation model for missing covariates, but it is not necessarily clear if this recommendation always holds and why this is sometimes true. We examine deterministic imputation (i.e. single imputation with fixed values) and stochastic imputation (i.e. single or multiple imputation with random values) methods and their implications for estimating the relationship between the imputed covariate and the outcome. We mathematically demonstrate that including the outcome variable in imputation models is not just a recommendation but a requirement to achieve unbiased results when using stochastic imputation methods. Moreover, we dispel common misconceptions about deterministic imputation models and demonstrate why the outcome should not be included in these models. This article aims to bridge the gap between imputation in theory and in practice, providing mathematical derivations to explain common statistical recommendations. We offer a better understanding of the considerations involved in imputing missing covariates and emphasize when it is necessary to include the outcome variable in the imputation model.
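The paper's central point — that stochastic imputation must condition on the outcome to avoid bias — can be illustrated with a small simulation (entirely illustrative; it mirrors the intuition, not the paper's derivations):

```python
# Toy demonstration: drawing imputed X while ignoring Y attenuates the
# estimated X-Y slope, whereas drawing from the conditional X | Y
# roughly preserves it.
import random
import statistics

def ols_slope(xs, ys):
    """Slope of the least-squares regression of ys on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    return (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
            / sum((a - mx) ** 2 for a in xs))

random.seed(1)
n = 4000
x = [random.gauss(0, 1) for _ in range(n)]
y = [xi + random.gauss(0, 1) for xi in x]          # true slope = 1
obs = [random.random() > 0.5 for _ in range(n)]    # ~50% MCAR missingness in x

x_obs = [xi for xi, o in zip(x, obs) if o]
y_obs = [yi for yi, o in zip(y, obs) if o]

# (a) stochastic imputation WITHOUT Y: draw from the marginal of observed X
mu, sd = statistics.fmean(x_obs), statistics.stdev(x_obs)
x_no_y = [xi if o else random.gauss(mu, sd) for xi, o in zip(x, obs)]

# (b) stochastic imputation WITH Y: regress X on Y in complete cases,
# then draw from the fitted conditional distribution X | Y
b = ols_slope(y_obs, x_obs)
a = statistics.fmean(x_obs) - b * statistics.fmean(y_obs)
resid_sd = statistics.stdev([xi - (a + b * yi)
                             for xi, yi in zip(x_obs, y_obs)])
x_with_y = [xi if o else random.gauss(a + b * yi, resid_sd)
            for xi, yi, o in zip(x, y, obs)]

slope_no_y = ols_slope(x_no_y, y)      # attenuated, roughly halfway to 0
slope_with_y = ols_slope(x_with_y, y)  # close to the true slope of 1
```

The imputed values in (a) are independent of Y by construction, so roughly half the rows carry no X-Y signal, pulling the slope toward zero; conditioning on Y in (b) preserves the joint distribution.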
Affiliation(s)
- Sarah C Lotspeich
- Department of Statistical Sciences, Wake Forest University, Winston-Salem, NC, USA
- Staci A Hepler
- Department of Statistical Sciences, Wake Forest University, Winston-Salem, NC, USA
7
Archer L, Relton SD, Akbari A, Best K, Bucknall M, Conroy S, Hattle M, Hollinghurst J, Humphrey S, Lyons RA, Richards S, Walters K, West R, van der Windt D, Riley RD, Clegg A. Development and external validation of the eFalls tool: a multivariable prediction model for the risk of ED attendance or hospitalisation with a fall or fracture in older adults. Age Ageing 2024; 53:afae057. PMID: 38520142. PMCID: PMC10960070. DOI: 10.1093/ageing/afae057.
Abstract
BACKGROUND Falls are common in older adults and can devastate personal independence through injury such as fracture and fear of future falls. Methods to identify people for falls prevention interventions are currently limited, with high risks of bias in published prediction models. We have developed and externally validated the eFalls prediction model using routinely collected primary care electronic health records (EHR) to predict risk of emergency department attendance/hospitalisation with fall or fracture within 1 year. METHODS Data comprised two independent, retrospective cohorts of adults aged ≥65 years: the population of Wales, from the Secure Anonymised Information Linkage Databank (model development); the population of Bradford and Airedale, England, from Connected Bradford (external validation). Predictors included electronic frailty index components, supplemented with variables informed by literature reviews and clinical expertise. Fall/fracture risk was modelled using multivariable logistic regression with a Least Absolute Shrinkage and Selection Operator penalty. Predictive performance was assessed through calibration, discrimination and clinical utility. Apparent, internal-external cross-validation and external validation performance were assessed across general practices and in clinically relevant subgroups. RESULTS The model's discrimination performance (c-statistic) was 0.72 (95% confidence interval, CI: 0.68 to 0.76) on internal-external cross-validation and 0.82 (95% CI: 0.80 to 0.83) on external validation. Calibration was variable across practices, with some over-prediction in the validation population (calibration-in-the-large, -0.87; 95% CI: -0.96 to -0.78). Clinical utility on external validation was improved after recalibration. CONCLUSION The eFalls prediction model shows good performance and could support proactive stratification for falls prevention services if appropriately embedded into primary care EHR systems.
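Calibration-in-the-large, reported above as -0.87, is the intercept of a logistic recalibration model that takes the model's linear predictor as a fixed offset; negative values indicate over-prediction on average. A pure-Python sketch with simulated data (all values are invented for illustration):

```python
# Calibration-in-the-large: fit only an intercept 'a' in
#   logit(P(y = 1)) = a + offset(linear predictor).
# a < 0 means the model over-predicts on average. With a single free
# parameter, Newton's method converges in a few iterations.
import math
import random

def calibration_in_the_large(y, p_pred, iters=25):
    lp = [math.log(p / (1 - p)) for p in p_pred]   # logit of predicted risks
    a = 0.0
    for _ in range(iters):
        pi = [1 / (1 + math.exp(-(a + l))) for l in lp]
        grad = sum(yi - pii for yi, pii in zip(y, pi))   # score
        hess = sum(pii * (1 - pii) for pii in pi)        # information
        a += grad / hess
    return a

random.seed(7)
n = 2000
y = [int(random.random() < 0.2) for _ in range(n)]  # observed event rate ~20%
p_pred = [0.4] * n                                  # model predicts 40%
citl = calibration_in_the_large(y, p_pred)
# citl is negative (over-prediction), near logit(0.2) - logit(0.4) ≈ -0.98
```

Recalibration as mentioned in the abstract amounts to adding this fitted intercept (and, for the calibration slope, a multiplier on the linear predictor) before redeploying the model.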
Affiliation(s)
- Lucinda Archer
- Institute for Applied Health Research, University of Birmingham, Birmingham, UK
- National Institute for Health and Care Research (NIHR) Birmingham Biomedical Research Centre, University of Birmingham, Birmingham, UK
- Samuel D Relton
- Leeds Institute of Health Sciences, University of Leeds, Leeds, UK
- Ashley Akbari
- Population Data Science, Swansea University Medical School, Swansea University, Swansea, UK
- Kate Best
- Academic Unit for Ageing and Stroke Research, University of Leeds, Bradford Teaching Hospitals NHS Foundation Trust, Bradford, UK
- Simon Conroy
- Institute of Cardiovascular Science, University College London, London, UK
- Miriam Hattle
- Institute for Applied Health Research, University of Birmingham, Birmingham, UK
- National Institute for Health and Care Research (NIHR) Birmingham Biomedical Research Centre, University of Birmingham, Birmingham, UK
- Joe Hollinghurst
- Population Data Science, Swansea University Medical School, Swansea University, Swansea, UK
- Sara Humphrey
- Bradford District and Craven Health and Care Partnership, Bradford, UK
- Ronan A Lyons
- Population Data Science, Swansea University Medical School, Swansea University, Swansea, UK
- Suzanne Richards
- Leeds Institute of Health Sciences, University of Leeds, Leeds, UK
- Kate Walters
- Primary Care and Population Health, University College London, London, UK
- Robert West
- Leeds Institute of Health Sciences, University of Leeds, Leeds, UK
- Richard D Riley
- Institute for Applied Health Research, University of Birmingham, Birmingham, UK
- National Institute for Health and Care Research (NIHR) Birmingham Biomedical Research Centre, University of Birmingham, Birmingham, UK
- Andrew Clegg
- Academic Unit for Ageing and Stroke Research, University of Leeds, Bradford Teaching Hospitals NHS Foundation Trust, Bradford, UK
8
Barreñada L, Ledger A, Dhiman P, Collins G, Wynants L, Verbakel JY, Timmerman D, Valentin L, Van Calster B. ADNEX risk prediction model for diagnosis of ovarian cancer: systematic review and meta-analysis of external validation studies. BMJ Medicine 2024; 3:e000817. PMID: 38375077. PMCID: PMC10875560. DOI: 10.1136/bmjmed-2023-000817.
Abstract
Objectives To conduct a systematic review of studies externally validating the ADNEX (Assessment of Different Neoplasias in the adnexa) model for diagnosis of ovarian cancer and to present a meta-analysis of its performance. Design Systematic review and meta-analysis of external validation studies. Data sources Medline, Embase, Web of Science, Scopus, and Europe PMC, from 15 October 2014 to 15 May 2023. Eligibility criteria for selecting studies All external validation studies of the performance of ADNEX, with any study design and any study population of patients with an adnexal mass. Two independent reviewers extracted the data. Disagreements were resolved by discussion. Reporting quality of the studies was scored with the TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis) reporting guideline, and methodological conduct and risk of bias with PROBAST (Prediction model Risk Of Bias Assessment Tool). Random effects meta-analysis of the area under the receiver operating characteristic curve (AUC), sensitivity and specificity at the 10% risk of malignancy threshold, and net benefit and relative utility at the 10% risk of malignancy threshold were performed. Results 47 studies (17 007 tumours) were included, with a median study sample size of 261 (range 24-4905). On average, 61% of TRIPOD items were reported. Handling of missing data, justification of sample size, and model calibration were rarely described. 91% of validations were at high risk of bias, mainly because of the unexplained exclusion of incomplete cases, small sample size, or no assessment of calibration. 
The summary AUC to distinguish benign from malignant tumours in patients who underwent surgery was 0.93 (95% confidence interval 0.92 to 0.94, 95% prediction interval 0.85 to 0.98) for ADNEX with the serum biomarker, cancer antigen 125 (CA125), as a predictor (9202 tumours, 43 centres, 18 countries, and 21 studies) and 0.93 (95% confidence interval 0.91 to 0.94, 95% prediction interval 0.85 to 0.98) for ADNEX without CA125 (6309 tumours, 31 centres, 13 countries, and 12 studies). The estimated probability that the model is clinically useful in a new centre was 95% (with CA125) and 91% (without CA125). When restricting analysis to studies with a low risk of bias, summary AUC values were 0.93 (with CA125) and 0.91 (without CA125), and the estimated probabilities that the model is clinically useful were 89% (with CA125) and 87% (without CA125). Conclusions The results of the meta-analysis indicated that ADNEX performed well in distinguishing between benign and malignant tumours in populations from different countries and settings, regardless of whether the serum biomarker, CA125, was used as a predictor. A key limitation was that calibration was rarely assessed. Systematic review registration PROSPERO CRD42022373182.
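Summary AUCs like those above are typically pooled on the logit scale with a random-effects model. A compact DerSimonian-Laird sketch; the AUCs and standard errors below are invented for illustration, not taken from the review:

```python
# Random-effects (DerSimonian-Laird) pooling of AUCs on the logit scale.
import math

def dl_pooled_auc(aucs, ses):
    y = [math.log(a / (1 - a)) for a in aucs]                    # logit-AUC
    v = [(se / (a * (1 - a))) ** 2 for a, se in zip(aucs, ses)]  # delta method
    w = [1 / vi for vi in v]                                     # fixed-effect weights
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))      # heterogeneity Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c)                      # between-study variance
    w_re = [1 / (vi + tau2) for vi in v]                         # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    return 1 / (1 + math.exp(-pooled))                           # back to AUC scale

# Hypothetical per-study AUCs and standard errors:
pooled = dl_pooled_auc([0.95, 0.91, 0.93, 0.89], [0.01, 0.02, 0.015, 0.03])
```

The 95% prediction intervals quoted in the abstract additionally fold the between-study variance tau² into the uncertainty, which is why they are wider than the confidence intervals.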
Affiliation(s)
- Lasai Barreñada
- Department of Development and Regeneration, KU Leuven, Leuven, Belgium
- Ashleigh Ledger
- Department of Development and Regeneration, KU Leuven, Leuven, Belgium
- Paula Dhiman
- Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford Centre for Statistics in Medicine, Oxford, UK
- Gary Collins
- Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford Centre for Statistics in Medicine, Oxford, UK
- Laure Wynants
- Department of Development and Regeneration, KU Leuven, Leuven, Belgium
- Department of Epidemiology, Universiteit Maastricht Care and Public Health Research Institute, Maastricht, Netherlands
- Jan Y Verbakel
- Department of Public Health and Primary care, KU Leuven, Leuven, Belgium
- Nuffield Department of Primary Care Health Sciences, University of Oxford, Oxford, UK
- Leuven Unit for Health Technology Assessment Research (LUHTAR), KU Leuven, Leuven, Belgium
- Dirk Timmerman
- Department of Development and Regeneration, KU Leuven, Leuven, Belgium
- Department of Obstetrics and Gynaecology, UZ Leuven campus Gasthuisberg Dienst gynaecologie en verloskunde, Leuven, Belgium
- Lil Valentin
- Department of Obstetrics and Gynaecology, Skåne University Hospital, Malmo, Sweden
- Department of Clinical Sciences Malmö, Lund University, Lund, Sweden
- Ben Van Calster
- Department of Development and Regeneration, KU Leuven, Leuven, Belgium
- Leuven Unit for Health Technology Assessment Research (LUHTAR), KU Leuven, Leuven, Belgium
- Department of Biomedical Data Sciences, Leiden University Medical Centre, Leiden, Netherlands
9
Cardoso P, Dennis JM, Bowden J, Shields BM, McKinley TJ. Dirichlet process mixture models to impute missing predictor data in counterfactual prediction models: an application to predict optimal type 2 diabetes therapy. BMC Med Inform Decis Mak 2024; 24:12. PMID: 38191403. PMCID: PMC10773072. DOI: 10.1186/s12911-023-02400-3.
Abstract
BACKGROUND The handling of missing data is a challenge for inference and regression modelling. A particular challenge is dealing with missing predictor information, particularly when trying to build and make predictions from models for use in clinical practice. METHODS We utilise a flexible Bayesian approach for handling missing predictor information in regression models. This provides practitioners with full posterior predictive distributions for both the missing predictor information (conditional on the observed predictors) and the outcome-of-interest. We apply this approach to a previously proposed counterfactual treatment selection model for type 2 diabetes second-line therapies. Our approach combines a regression model and a Dirichlet process mixture model (DPMM), where the former defines the treatment selection model, and the latter provides a flexible way to model the joint distribution of the predictors. RESULTS We show that DPMMs can model complex relationships between predictor variables and can provide powerful means of fitting models to incomplete data (under missing-completely-at-random and missing-at-random assumptions). This framework ensures that the posterior distribution for the parameters and the conditional average treatment effect estimates automatically reflect the additional uncertainties associated with missing data due to the hierarchical model structure. We also demonstrate that in the presence of multiple missing predictors, the DPMM model can be used to explore which variable(s), if collected, could provide the most additional information about the likely outcome. CONCLUSIONS When developing clinical prediction models, DPMMs offer a flexible way to model complex covariate structures and handle missing predictor information. 
DPMM-based counterfactual prediction models can also provide additional information to support clinical decision-making, including allowing predictions with appropriate uncertainty to be made for individuals with incomplete predictor data.
Affiliation(s)
- Pedro Cardoso
- University of Exeter, Medical School, Exeter, England
- John M Dennis
- University of Exeter, Medical School, Exeter, England
- Jack Bowden
- University of Exeter, Medical School, Exeter, England
10
Rijk MH, Platteel TN, Geersing GJ, Hollander M, Dalmolen BLGP, Little P, Rutten FH, van Smeden M, Venekamp RP. Predicting adverse outcomes in adults with a community-acquired lower respiratory tract infection: a protocol for the development and validation of two prediction models for (i) all-cause hospitalisation and mortality and (ii) cardiovascular outcomes. Diagn Progn Res 2023; 7:23. PMID: 38057921. DOI: 10.1186/s41512-023-00161-1.
Abstract
BACKGROUND Community-acquired lower respiratory tract infections (LRTI) are common in primary care and patients at particular risk of adverse outcomes, e.g., hospitalisation and mortality, are challenging to identify. LRTIs are also linked to an increased incidence of cardiovascular diseases (CVD) following the initial infection, whereas concurrent CVD might negatively impact overall prognosis in LRTI patients. Accurate risk prediction of adverse outcomes in LRTI patients, while considering the interplay with CVD, can aid general practitioners (GP) in the clinical decision-making process, and may allow for early detection of deterioration. This paper therefore presents the design of the development and external validation of two models for predicting individual risk of all-cause hospitalisation or mortality (model 1) and short-term incidence of CVD (model 2) in adults presenting to primary care with LRTI. METHODS Both models will be developed using linked routine electronic health records (EHR) data from Dutch primary and secondary care, and the mortality registry. Adults aged ≥ 40 years with a GP-diagnosis of LRTI between 2016 and 2019 are eligible for inclusion. Relevant patient demographics, medical history, medication use, presenting signs and symptoms, and vital and laboratory measurements will be considered as candidate predictors. Outcomes of interest include 30-day all-cause hospitalisation or mortality (model 1) and 90-day CVD (model 2). Multivariable elastic net regression techniques will be used for model development. During the modelling process, the incremental predictive value of CVD for hospitalisation or all-cause mortality (model 1) will also be assessed. The models will be validated through internal-external cross-validation and external validation in an equivalent cohort of primary care LRTI patients. 
DISCUSSION Implementation of currently available prediction models for primary care LRTI patients is hampered by limited assessment of model performance. While considering the role of CVD in LRTI prognosis, we aim to develop and externally validate two models that predict clinically relevant outcomes to aid GPs in clinical decision-making. Challenges that we anticipate include the possibility of low event rates and common problems related to the use of EHR data, such as candidate predictor measurement and missingness, how best to retrieve information from free text fields, and potential misclassification of outcome events.
Affiliation(s)
- Merijn H Rijk
- Department of General Practice & Nursing Science, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands
- Tamara N Platteel
- Department of General Practice & Nursing Science, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands
- Geert-Jan Geersing
- Department of General Practice & Nursing Science, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands
- Monika Hollander
- Department of General Practice & Nursing Science, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands
- Paul Little
- Primary Care Research Center, Primary Care Population Sciences and Medical Education Unit, University of Southampton, Southampton, United Kingdom
- Frans H Rutten
- Department of General Practice & Nursing Science, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands
- Maarten van Smeden
- Department of Epidemiology & Health Economics, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands
- Roderick P Venekamp
- Department of General Practice & Nursing Science, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands
|
11
|
Lapp L, Roper M, Kavanagh K, Schraag S. Development and validation of a digital biomarker predicting acute kidney injury following cardiac surgery on an hourly basis. JTCVS OPEN 2023; 16:540-581. [PMID: 38204694 PMCID: PMC10775068 DOI: 10.1016/j.xjon.2023.09.023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 06/06/2023] [Revised: 09/01/2023] [Accepted: 09/06/2023] [Indexed: 01/12/2024]
Abstract
Objectives To develop and validate a digital biomarker predicting the onset of acute kidney injury (AKI) on an hourly basis, up to 24 hours in advance, in the intensive care unit after cardiac surgery. Methods The study analyzed data from 6056 adult patients undergoing coronary artery bypass graft and/or valve surgery between April 1, 2012, and December 31, 2018 (development phase: training and testing) and 3572 patients between January 1, 2019, and June 30, 2022 (validation phase). Two dynamic predictive modeling approaches were used to predict AKI: logistic regression and bootstrap aggregated regression trees machine (BARTm). The mean area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and positive and negative predictive values across all lead times before the occurrence of AKI were reported. Clinical practicality was assessed using calibration. Results Of all included patients, 8.45% and 16.66% had AKI in the development and validation phases, respectively. When applied to testing data, BARTm and logistic regression predicted AKI with mean AUCs of 0.850 and 0.802, respectively. When applied to validation data, BARTm and logistic regression yielded mean AUCs of 0.844 and 0.786, respectively. Conclusions This study demonstrated successful prediction of AKI on an hourly basis up to 24 hours in advance. The digital biomarkers developed and validated in this study have the potential to assist clinicians in optimizing treatment and implementing preventive strategies for patients at risk of developing AKI after cardiac surgery in the intensive care unit.
Affiliation(s)
- Linda Lapp
- Department of Computer and Information Sciences, Faculty of Science, University of Strathclyde, Glasgow, Scotland
- Marc Roper
- Department of Computer and Information Sciences, Faculty of Science, University of Strathclyde, Glasgow, Scotland
- Kimberley Kavanagh
- Department of Mathematics and Statistics, Faculty of Science, University of Strathclyde, Glasgow, Scotland
- Stefan Schraag
- Department of Anaesthesia and Perioperative Medicine, Golden Jubilee National Hospital, Clydebank, United Kingdom
|