1
Zhou Q, He R, Li H, Gu M. Development and validation of a nomogram to predict the risk of in-hospital MACE for emergence NSTE-ACS: A retrospective multicenter study based on the Chinese population. Int J Med Inform 2025; 199:105884. [PMID: 40147416 DOI: 10.1016/j.ijmedinf.2025.105884] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2024] [Revised: 03/04/2025] [Accepted: 03/19/2025] [Indexed: 03/29/2025]
Abstract
PURPOSE Our study aims to develop and validate an effective in-hospital major adverse cardiovascular events (MACE) prediction model for patients with emergency non-ST elevation acute coronary syndrome (NSTE-ACS). METHODS We retrospectively collected data on NSTE-ACS patients in three tertiary hospitals in Chongqing. In-hospital MACE was the predicted outcome. Patients from one hospital were divided into a training set and an internal validation set at a ratio of 7:3. In addition, 662 patients from two other tertiary hospitals were used for external validation. Patient information, including demographics, laboratory test results and disease course records, was used for comprehensive analysis. Finally, LASSO was used to identify the predictors and develop the model. This model was subsequently visualized as a nomogram, followed by both internal and external validation. The receiver operating characteristic curve, calibration curve and clinical decision curve analysis (DCA) were used to assess the model's discrimination, calibration and clinical applicability, respectively. RESULTS A total of 3,308 patients were included, 375 of whom developed in-hospital MACE. The LR model demonstrated that length of stay, neutrophils, myoglobin, NYHA, CCI, NT-proBNP, LVEF and respiratory failure were risk factors for in-hospital MACE in emergency NSTE-ACS patients. In the training set, the AUC was 0.860 (95% CI: 0.831-0.889). In external validation, the AUC was 0.855 (95% CI: 0.808-0.902), and both the calibration curve and DCA in the validation set also revealed stable predictive accuracy and clinical validity. Additionally, the MACE risk can be calculated online via the web page (https://cocozhou99.shinyapps.io/DynNomapp/). CONCLUSION The prediction model we constructed has good predictive performance and can help healthcare professionals accurately assess the risk of in-hospital MACE in emergency NSTE-ACS patients.
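For readers less familiar with the discrimination metric quoted in this abstract, the AUC can be read as the probability that a randomly chosen patient with the event receives a higher predicted risk than a randomly chosen patient without it. The Python sketch below computes that quantity by direct pairwise comparison. The function name `auc` and the six risk values are hypothetical illustrations, not the authors' code or data.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    Counts the share of event/non-event pairs in which the event case
    has the higher score; tied pairs count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted risks and observed outcomes (1 = in-hospital MACE).
risks = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
events = [1, 1, 1, 0, 0, 0]
print(round(auc(risks, events), 3))
```

The double loop is quadratic in sample size, which is acceptable for illustration; a rank-based formulation gives the same value more efficiently on large cohorts.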
Affiliation(s)
- Qianhui Zhou
- Department of Nursing, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Rui He
- Department of Cardiothoracic Surgery, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Hong Li
- Department of Nursing, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Manping Gu
- Department of Nursing, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
2
Negoi I. Personalized surveillance in colorectal cancer: Integrating circulating tumor DNA and artificial intelligence into post-treatment follow-up. World J Gastroenterol 2025; 31:106670. [DOI: 10.3748/wjg.v31.i18.106670] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2025] [Revised: 04/07/2025] [Accepted: 04/18/2025] [Indexed: 05/13/2025] Open
Abstract
Given the growing burden of colorectal cancer (CRC) as a global health challenge, it becomes imperative to focus on strategies that can mitigate its impact. Post-treatment surveillance has emerged as essential for early detection of recurrence, significantly improving patient outcomes. However, intensive surveillance strategies have shown mixed results compared to less intensive methods, emphasizing the necessity for personalized, risk-adapted approaches. The observed suboptimal adherence to existing surveillance protocols underscores the urgent need for more tailored and efficient strategies. In this context, circulating tumor DNA (ctDNA) emerges as a promising biomarker with significant potential to revolutionize post-treatment surveillance, demonstrating high specificity [0.95, 95% confidence interval (CI): 0.91-0.97] and a robust diagnostic odds ratio (37.6, 95%CI: 20.8-68.0) for recurrence detection. Furthermore, artificial intelligence and machine learning models integrating patient-specific and tumor features can enhance risk stratification and optimize surveillance strategies. The reported area under the receiver operating characteristic curve, measuring artificial intelligence model performance in predicting CRC recurrence, ranged from 0.581 and 0.593 at the lowest to 0.979 and 0.978 at the highest in training and validation cohorts, respectively. Despite this promise, addressing cost, accessibility, and extensive validation remains crucial for equitable integration into clinical practice.
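To make the two accuracy measures quoted above concrete, the Python sketch below derives specificity and the diagnostic odds ratio from a 2x2 table of test results against recurrence status. The counts are hypothetical, chosen only so that specificity equals 0.95; they are not data from the review, and the function names are illustrative.

```python
def specificity(fp: int, tn: int) -> float:
    """Share of recurrence-free patients who test negative."""
    return tn / (fp + tn)


def diagnostic_odds_ratio(tp: int, fn: int, fp: int, tn: int) -> float:
    """Odds of a positive test with recurrence divided by the odds without it."""
    if min(tp, fn, fp, tn) <= 0:
        raise ValueError("all four cells must be positive for a finite ratio")
    return (tp * tn) / (fp * fn)


# Hypothetical 2x2 counts: ctDNA result versus confirmed recurrence.
tp, fn, fp, tn = 40, 60, 5, 95
print(specificity(fp, tn))
print(round(diagnostic_odds_ratio(tp, fn, fp, tn), 2))
```

A ratio of 1 would mean the test does not separate the two groups at all; larger values indicate stronger separation, although the ratio alone does not show whether the strength lies in sensitivity or in specificity.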
Affiliation(s)
- Ionut Negoi
- Department of General Surgery, Carol Davila University of Medicine and Pharmacy Bucharest, Clinical Emergency Hospital of Bucharest, Bucharest 014461, Romania
3
Alfaraj SA, Kist JM, Groenwold RHH, Spruit M, Mook-Kanamori D, Vos RC. External validation of SCORE2-Diabetes in The Netherlands across various socioeconomic levels in native-Dutch and non-Dutch populations. Eur J Prev Cardiol 2025; 32:555-563. [PMID: 39485827 DOI: 10.1093/eurjpc/zwae354] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2024] [Revised: 07/15/2024] [Accepted: 10/17/2024] [Indexed: 11/03/2024]
Abstract
AIMS Adults with type 2 diabetes have an increased risk of cardiovascular events (CVEs), the world's leading cause of mortality. The SCORE2-Diabetes model is a tool designed to estimate the 10-year risk of CVE specifically in individuals with type 2 diabetes. However, the performance of such models may vary across different demographic and socioeconomic groups, necessitating validation and assessment in diverse populations. This study aims to externally validate SCORE2-Diabetes and assess its performance across various socioeconomic and migration origins in The Netherlands. METHODS AND RESULTS We selected adults with type 2 diabetes, aged 40-79 years and without previous CVE from the Extramural LUMC Academic Network (ELAN) primary care data cohort from 2007 to 2023. ELAN data were linked with Statistics Netherlands registry data to obtain information about the country of origin and socioeconomic status (SES). Cardiovascular event was defined as myocardial infarction, stroke, or CV mortality. Non-CV mortality was considered a competing event. Analyses were stratified by sex, Dutch vs. other non-Dutch countries of origin, and quintiles of SES. Of the 26 544 included adults with type 2 diabetes, 2518 developed CVE. SCORE2-Diabetes showed strong predictive accuracy for CVE in the Dutch population [observed-to-expected ratio (OE) = 1.000, 95% CI = 0.990-1.008 for men, and OE = 1.050, 95% CI = 1.042-1.057 for women]. For non-Dutch individuals, the model underestimated CVE risk (OE = 1.121, 95% CI = 1.108-1.131 for men, and OE = 1.100, 95% CI = 1.092-1.111 for women). The model also underestimated the CVE risk (OE > 1) in low SES groups and overestimated the risk (OE < 1) in high SES groups. Discrimination was moderate across subgroups with c-indices between 0.6 and 0.7. CONCLUSION SCORE2-Diabetes accurately predicted the risk of CVE in the Dutch population. 
However, it underpredicted the risk of CVE in the low SES groups and non-Dutch origins, while overpredicting the risk in high SES men and women. Additional clinical judgment must be considered when using SCORE2-Diabetes for different SES and countries of origin. LAY SUMMARY A new study validates the SCORE2-Diabetes model for predicting a 10-year risk of cardiovascular events in type 2 diabetes. Strong accuracy for the Dutch population, but underestimation of the risk for low SES and non-Dutch groups. SCORE2-Diabetes should be used with extra caution across diverse subgroups.
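The calibration measure used in this abstract, the observed-to-expected (OE) ratio, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: it treats the outcome as a plain binary indicator and ignores censoring and the competing event (non-CV mortality) that the study accounted for. The function name and the ten-person cohort are hypothetical.

```python
from typing import Sequence


def observed_expected_ratio(outcomes: Sequence[int],
                            predicted_risks: Sequence[float]) -> float:
    """Observed event count divided by the sum of predicted risks.

    A value above 1 means more events occurred than the model expected
    (risk underestimated); a value below 1 means risk was overestimated.
    """
    expected = sum(predicted_risks)
    if expected <= 0:
        raise ValueError("sum of predicted risks must be positive")
    return sum(outcomes) / expected


# Hypothetical cohort: 3 observed events, each person predicted at 20% risk.
outcomes = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
predicted = [0.2] * 10
print(round(observed_expected_ratio(outcomes, predicted), 2))
```

In this toy cohort the model expects about two events but three occur, so the ratio is above 1, the same direction of miscalibration the abstract reports for non-Dutch and low SES groups.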
Affiliation(s)
- Sukainah A Alfaraj
- Department of Public Health and Primary Care/Health Campus the Hague, Leiden University Medical Center (LUMC), Leiden, The Netherlands
- Janet M Kist
- Department of Public Health and Primary Care/Health Campus the Hague, Leiden University Medical Center (LUMC), Leiden, The Netherlands
- Marco Spruit
- Department of Public Health and Primary Care/Health Campus the Hague, Leiden University Medical Center (LUMC), Leiden, The Netherlands
- Leiden Institute of Advanced Computer Science (LIACS), Leiden University, Leiden, The Netherlands
- Dennis Mook-Kanamori
- Department of Public Health and Primary Care/Health Campus the Hague, Leiden University Medical Center (LUMC), Leiden, The Netherlands
- Department of Clinical Epidemiology, LUMC, Leiden, The Netherlands
- Rimke C Vos
- Department of Public Health and Primary Care/Health Campus the Hague, Leiden University Medical Center (LUMC), Leiden, The Netherlands
4
van de Klundert J, Perez-Galarce F, Olivares M, Pengel L, de Weerd A. The comparative performance of models predicting patient and graft survival after kidney transplantation: A systematic review. Transplant Rev (Orlando) 2025; 39:100934. [PMID: 40339177 DOI: 10.1016/j.trre.2025.100934] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2025] [Revised: 04/25/2025] [Accepted: 04/26/2025] [Indexed: 05/10/2025]
Abstract
BACKGROUND Cox proportional hazard models have long been the model of choice for survival prediction after kidney transplantation. In recent years, a variety of novel model types have been proposed. We investigate the prediction performance across different model types, including machine learning models and traditional model types. METHODS A systematic review was conducted following PROBAST and CHARMS, also considering extensions to TRIPOD+AI and PROBAST+AI, for data collection and risk of bias assessment. The review only included publications that reported on prediction performance for models of different types. A comparative analysis tested performance differences between the model types. RESULTS The review included 37 publications which presented 134 comparative studies. The designs of many studies left room for improvement and most studies had high risk of bias. The collected data allowed testing of performance differences for 22 pairs of model types, ten of which yielded significant differences. Support Vector Machines and Logistic Regression were never found to outperform other model types. Other comparisons, however, provided inconclusive comparative performance results, and none of the model types performed consistently and significantly better than the alternatives. CONCLUSIONS A rigorous review of the current comparative performance evidence finds no significant indication that Cox proportional hazard models are being outperformed in kidney transplant survival prediction. The design of many of the studies implies high risk of bias, and more and better designed studies which reutilize best performing models are needed. Such studies would help to resolve model biases and reporting issues, and to increase the power of comparative performance analysis.
Affiliation(s)
- Francisco Perez-Galarce
- Department of Computer Science, School of Engineering, Pontificia Universidad Catolica, Santiago, Chile; Facultad de Ingeniería y Negocios, Universidad de Las Américas, Sede Providencia, Manuel Montt 948, Santiago, Chile
- Marcelo Olivares
- Faculty of Economics and Business, Universidad de Chile, Santiago, Chile
- Liset Pengel
- Erasmus MC Transplant Institute, University Medical Center Rotterdam, the Netherlands
- Annelies de Weerd
- Erasmus MC Transplant Institute, University Medical Center Rotterdam, Department of Internal Medicine, the Netherlands
5
Siemens K, Hunt BJ, Tibby SM. In Response. Anesth Analg 2025; 140:e60-e61. [PMID: 39977338 DOI: 10.1213/ane.0000000000007464] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/22/2025]
Affiliation(s)
- Kristina Siemens
- Paediatric Intensive Care Unit, Evelina London Children's Hospital, Guy's & St Thomas NHS Foundation Trust, London, United Kingdom
- Beverley J Hunt
- Thrombosis and Haemophilia Centre, Thrombosis and Vascular Biology Group, Guy's & St Thomas NHS Foundation Trust, London, United Kingdom
- Shane M Tibby
- Paediatric Intensive Care Unit, Evelina London Children's Hospital, Guy's & St Thomas NHS Foundation Trust, London, United Kingdom
6
Butt AL, Allan PG, Dang DD, Tanaka KA. Test Driving an Old Car on a New Road-The Need for Context-Specific Adaptations in Predictive Modeling. Anesth Analg 2025; 140:e59-e60. [PMID: 39977344 DOI: 10.1213/ane.0000000000007463] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/22/2025]
Affiliation(s)
- Amir L Butt
- Department of Anesthesiology, University of Oklahoma Health Sciences Center, Oklahoma City, Oklahoma
- Parker G Allan
- Department of Anesthesiology, University of Oklahoma Health Sciences Center, Oklahoma City, Oklahoma
- Dustin D Dang
- University of Oklahoma Health Sciences Center, Oklahoma City, Oklahoma
- Kenichi A Tanaka
- Department of Anesthesiology, University of Oklahoma Health Sciences Center, Oklahoma City, Oklahoma
7
Neal SR, Sturrock SS, Musorowegomo D, Gannon H, Zaman M, Cortina-Borja M, Le Doare K, Heys M, Chimhini G, Fitzgerald F. Clinical prediction models to diagnose neonatal sepsis in low-income and middle-income countries: a scoping review. BMJ Glob Health 2025; 10:e017582. [PMID: 40204466 DOI: 10.1136/bmjgh-2024-017582] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2024] [Accepted: 02/26/2025] [Indexed: 04/11/2025] Open
Abstract
INTRODUCTION Neonatal sepsis causes significant morbidity and mortality worldwide but is difficult to diagnose clinically. Clinical prediction models (CPMs) could improve diagnostic accuracy, facilitating earlier treatment for cases and avoiding antibiotic overuse. Neonates in low-income and middle-income countries (LMICs) are disproportionately affected by sepsis, yet no review has comprehensively synthesised evidence for CPMs validated in this setting. METHODS We performed a scoping review of CPMs to diagnose neonatal sepsis using Ovid MEDLINE, Ovid Embase, Scopus, Web of Science, Global Index Medicus and the Cochrane Library. The most recent searches were performed on 16 June 2024. We included studies published in English or Spanish that validated a new or existing CPM for neonatal sepsis in any healthcare setting in an LMIC. Studies were excluded if they validated a prognostic model or where data for neonates could not be separated from a larger paediatric population. Studies were selected by two independent reviewers and summarised by narrative synthesis. RESULTS From 4598 unique records, we included 82 studies validating 44 distinct models in 24 252 neonates. Most studies were set in neonatal intensive or special care units (n=64, 78%) in middle-income countries (n=81, 99%) and included neonates already suspected of sepsis (n=58, 71%). Only four studies (5%) were set in the WHO African region, and only one study included data from a low-income country. Two-thirds of CPMs (n=30) required laboratory parameters, and three-quarters (n=34) were only validated in one study. CONCLUSION Our review highlights several literature gaps, particularly a paucity of studies validating models in the lowest-income countries where neonatal sepsis is most prevalent, and models for the undifferentiated neonatal population that do not rely on laboratory tests. 
Furthermore, heterogeneity in study populations, definitions of sepsis and reporting of models inhibits meaningful comparison between studies and may hinder progress towards useful diagnostic tools.
Affiliation(s)
- Samuel R Neal
- UCL GOS Institute of Child Health, London, UK
- The University of Edinburgh College of Medicine and Veterinary Medicine, Edinburgh, UK
- David Musorowegomo
- University of Zimbabwe Faculty of Medicine and Health Sciences, Harare, Zimbabwe
- Michele Zaman
- Queen's University School of Medicine, Kingston, Ontario, Canada
- Gwendoline Chimhini
- University of Zimbabwe Faculty of Medicine and Health Sciences, Harare, Zimbabwe
8
Rysstad T, Grotle M, Traeger AC, Aasdahl L, Vigdal ØN, Aanesen F, Øiestad BE, Pripp AH, Wynne-Jones G, Dunn KM, Fors EA, Linton SJ, Tveter AT. Predicting prolonged work absence due to musculoskeletal disorders: development, validation, and clinical usefulness of prognostic prediction models. Int Arch Occup Environ Health 2025:10.1007/s00420-025-02129-8. [PMID: 40198330 DOI: 10.1007/s00420-025-02129-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2024] [Accepted: 02/11/2025] [Indexed: 04/10/2025]
Abstract
PURPOSE Given the lack of robust prognostic models for early identification of individuals at risk of work disability, this study aimed to develop and externally validate three models for prolonged work absence among individuals on sick leave due to musculoskeletal disorders. METHODS We developed three multivariable logistic regression models using data from 934 individuals on sick leave for 4-12 weeks due to musculoskeletal disorders, recruited through the Norwegian Labour and Welfare Administration. The models predicted three outcomes: (1) > 90 consecutive sick days, (2) > 180 consecutive sick days, and (3) any new or increased work assessment allowance or disability pension within 12 months. Each model was externally validated in a separate cohort of participants (8-12 weeks of sick leave) from a different geographical region in Norway. We evaluated model performance using discrimination (c-statistic) and calibration, and assessed clinical usefulness using decision curve analysis (net benefit). Bootstrapping was used to adjust for overoptimism. RESULTS All three models showed good predictive performance in the external validation sample, with c-statistics exceeding 0.76. The model predicting > 180 days performed best, demonstrating good calibration and discrimination (c-statistic 0.79, 95% CI 0.73-0.85) and providing net benefit across a range of decision thresholds from 0.10 to 0.80. CONCLUSIONS These models, particularly the one predicting > 180 days, may facilitate secondary prevention strategies and guide future clinical trials. Further validation and refinement are necessary to optimise the models and to test their performance in larger samples.
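Decision curve analysis, mentioned in this abstract, rests on the net benefit at a chosen risk threshold: the proportion of true positives minus the proportion of false positives weighted by the odds of the threshold. The Python sketch below shows that standard formula and compares a model against the "treat everyone" default. The six-person cohort and the function name are hypothetical and unrelated to the study data.

```python
from typing import Sequence


def net_benefit(risks: Sequence[float], outcomes: Sequence[int],
                threshold: float) -> float:
    """Net benefit of intervening on everyone with risk >= threshold.

    NB = TP/n - FP/n * threshold / (1 - threshold)
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(outcomes)
    tp = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y == 1)
    fp = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


# Hypothetical predicted risks of prolonged absence and observed outcomes.
risks = [0.9, 0.7, 0.4, 0.3, 0.2, 0.05]
outcomes = [1, 1, 0, 1, 0, 0]
threshold = 0.25

model_nb = net_benefit(risks, outcomes, threshold)
# "Treat all" is the same calculation with every predicted risk set to 1.
treat_all_nb = net_benefit([1.0] * len(outcomes), outcomes, threshold)
print(round(model_nb, 3), round(treat_all_nb, 3))
```

A model is considered useful at a given threshold when its net benefit exceeds both the treat-all value and zero (the treat-none value); a decision curve repeats this comparison across a range of thresholds.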
Affiliation(s)
- Tarjei Rysstad
- Department of Rehabilitation Science and Health Technology, Faculty of Health Sciences, Oslo Metropolitan University, St. Olavs Plass, P.O. Box 4, 0130, Oslo, Norway.
- Margreth Grotle
- Department of Rehabilitation Science and Health Technology, Faculty of Health Sciences, Oslo Metropolitan University, St. Olavs Plass, P.O. Box 4, 0130, Oslo, Norway
- Department of Research and Innovation, Division of Clinical Neuroscience, Oslo University Hospital, Oslo, Norway
- Adrian C Traeger
- Institute for Musculoskeletal Health, The University of Sydney and Sydney Local Health District, Sydney, Australia
- School of Public Health, Faculty of Medicine and Health, The University of Sydney, Sydney, Australia
- Lene Aasdahl
- Department of Public Health and Nursing, Faculty of Medicine and Health Sciences, Norwegian University of Science and Technology (NTNU), Trondheim, Norway
- Unicare Helsefort Rehabilitation Centre, Rissa, Norway
- Ørjan Nesse Vigdal
- Department of Rehabilitation Science and Health Technology, Faculty of Health Sciences, Oslo Metropolitan University, St. Olavs Plass, P.O. Box 4, 0130, Oslo, Norway
- Fiona Aanesen
- National Institute of Occupational Health, Majorstuen, Oslo, Norway
- Britt Elin Øiestad
- Department of Rehabilitation Science and Health Technology, Faculty of Health Sciences, Oslo Metropolitan University, St. Olavs Plass, P.O. Box 4, 0130, Oslo, Norway
- Are Hugo Pripp
- Oslo Centre of Biostatistics and Epidemiology, Research Support Services, Oslo University Hospital, Oslo, Norway
- Faculty of Health Sciences, Oslo Metropolitan University, Oslo, Norway
- Kate M Dunn
- School of Medicine, Keele University, Staffordshire, UK
- Egil A Fors
- Department of Public Health and Nursing, Faculty of Medicine and Health Sciences, Norwegian University of Science and Technology (NTNU), Trondheim, Norway
- Steven J Linton
- Department of Law, Psychology, and Social Work, Örebro University, Örebro, Sweden
- Anne Therese Tveter
- Department of Rehabilitation Science and Health Technology, Faculty of Health Sciences, Oslo Metropolitan University, St. Olavs Plass, P.O. Box 4, 0130, Oslo, Norway
- Center for Treatment of Rheumatic and Musculoskeletal Diseases (REMEDY), Diakonhjemmet Hospital, Oslo, Norway
9
Wang Z, Wang W, Sun C, Li J, Xie S, Xu J, Zou K, Jin Y, Yan S, Liao X, Kang Y, Coopersmith CM, Sun X. A methodological systematic review of validation and performance of sepsis real-time prediction models. NPJ Digit Med 2025; 8:190. [PMID: 40189694 PMCID: PMC11973177 DOI: 10.1038/s41746-025-01587-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2024] [Accepted: 03/26/2025] [Indexed: 04/09/2025] Open
Abstract
Sepsis real-time prediction models (SRPMs) provide timely alerts and may improve patient outcomes but face limited clinical adoption due to inconsistent validation methods and potential biases. Comprehensive evaluation, including external full-window validation with model- and outcome-level metrics, is crucial for real-world effectiveness, yet performance evidence remains scarce. This study systematically reviewed SRPM performance across validation methods, analyzing 91 studies from multiple databases. Only 54.9% applied full-window validation with both metric types. Performance decreased under external and full-window validation, with median AUROCs of 0.886 and 0.861 at 6- and 12-hours pre-onset, dropping to 0.783 in full-window external validation. Median Utility Scores declined from 0.381 in internal to -0.164 in external validation. Combining AUROC and Utility Score identified top-performing SRPMs in 18.7% of studies. Hand-crafted features significantly improved performance. Future research should focus on multi-center datasets, hand-crafted features, multi-metric full-window validation, and prospective trials to support clinical implementation.
Affiliation(s)
- Zichen Wang
- Department of Critical Care Medicine, Chinese Evidence-based Medicine Center, West China Hospital, Sichuan University, Chengdu, 610041, China
- NMPA Key Laboratory for Real World Data Research and Evaluation in Hainan, Chengdu, 610041, China
- Sichuan Center of Technology Innovation for Real World Data, Chengdu, China
- West China School of Public Health and West China Fourth Hospital, Sichuan University, Chengdu, China
- Wen Wang
- Department of Critical Care Medicine, Chinese Evidence-based Medicine Center, West China Hospital, Sichuan University, Chengdu, 610041, China.
- NMPA Key Laboratory for Real World Data Research and Evaluation in Hainan, Chengdu, 610041, China.
- Sichuan Center of Technology Innovation for Real World Data, Chengdu, China.
- Che Sun
- Department of Critical Care Medicine, Chinese Evidence-based Medicine Center, West China Hospital, Sichuan University, Chengdu, 610041, China
- NMPA Key Laboratory for Real World Data Research and Evaluation in Hainan, Chengdu, 610041, China
- Sichuan Center of Technology Innovation for Real World Data, Chengdu, China
- Jili Li
- Department of Critical Care Medicine, Chinese Evidence-based Medicine Center, West China Hospital, Sichuan University, Chengdu, 610041, China
- West China School of Medicine, West China Hospital, Sichuan University, Chengdu, 610041, China
- Shuangyi Xie
- Department of Critical Care Medicine, Chinese Evidence-based Medicine Center, West China Hospital, Sichuan University, Chengdu, 610041, China
- NMPA Key Laboratory for Real World Data Research and Evaluation in Hainan, Chengdu, 610041, China
- Sichuan Center of Technology Innovation for Real World Data, Chengdu, China
- Jiayue Xu
- Department of Critical Care Medicine, Chinese Evidence-based Medicine Center, West China Hospital, Sichuan University, Chengdu, 610041, China
- NMPA Key Laboratory for Real World Data Research and Evaluation in Hainan, Chengdu, 610041, China
- Sichuan Center of Technology Innovation for Real World Data, Chengdu, China
- Kang Zou
- Department of Critical Care Medicine, Chinese Evidence-based Medicine Center, West China Hospital, Sichuan University, Chengdu, 610041, China
- NMPA Key Laboratory for Real World Data Research and Evaluation in Hainan, Chengdu, 610041, China
- Sichuan Center of Technology Innovation for Real World Data, Chengdu, China
- Yinghui Jin
- Center for Evidence-Based and Translational Medicine, Zhongnan Hospital of Wuhan University, Wuhan, 430071, China
- Siyu Yan
- Center for Evidence-Based and Translational Medicine, Zhongnan Hospital of Wuhan University, Wuhan, 430071, China
- Xuelian Liao
- Department of Critical Care Medicine, West China Hospital, Sichuan University, Chengdu, 610041, China
- Yan Kang
- Department of Critical Care Medicine, West China Hospital, Sichuan University, Chengdu, 610041, China
- Craig M Coopersmith
- Emory Critical Care Center and Department of Surgery, Emory University School of Medicine, Atlanta, GA, USA
- Xin Sun
- Department of Critical Care Medicine, Chinese Evidence-based Medicine Center, West China Hospital, Sichuan University, Chengdu, 610041, China.
- NMPA Key Laboratory for Real World Data Research and Evaluation in Hainan, Chengdu, 610041, China.
- Sichuan Center of Technology Innovation for Real World Data, Chengdu, China.
- West China School of Public Health and West China Fourth Hospital, Sichuan University, Chengdu, China.
10
Heesen P, Christ SM, Ciobanu-Caraus O, Kahraman A, Schelling G, Studer G, Bode-Lesniewska B, Fuchs B. Clinical prognostic models for sarcomas: a systematic review and critical appraisal of development and validation studies. Diagn Progn Res 2025; 9:7. [PMID: 40189567 PMCID: PMC11974052 DOI: 10.1186/s41512-025-00186-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2024] [Accepted: 02/28/2025] [Indexed: 04/09/2025] Open
Abstract
BACKGROUND Current clinical guidelines recommend the use of clinical prognostic models (CPMs) for therapeutic decision-making in sarcoma patients. However, the number and quality of developed and externally validated CPMs is unknown. Therefore, we aimed to describe and critically assess CPMs for sarcomas. METHODS We performed a systematic review including all studies describing the development and/or external validation of a CPM for sarcomas. We searched the databases MEDLINE, EMBASE, Cochrane Central, and Scopus from inception until June 7th, 2022. The risk of bias was assessed using the prediction model risk of bias assessment tool (PROBAST). RESULTS Seven thousand six hundred fifty-six records were screened, of which 145 studies were eventually included, developing 182 and externally validating 59 CPMs. The most frequently modeled type of sarcoma was osteosarcoma (43/182; 23.6%), and the most frequently predicted outcome was overall survival (81/182; 44.5%). The most used predictors were the patient's age (133/182; 73.1%) and tumor size (116/182; 63.7%). Univariable screening was used in 137 (75.3%) CPMs, and only 7 (3.9%) CPMs were developed using pre-specified predictors based on clinical knowledge or literature. The median c-statistic on the development dataset was 0.74 (interquartile range [IQR] 0.71, 0.78). Calibration was reported for 142 CPMs (142/182; 78.0%). The median c-statistic of external validations was 0.72 (IQR 0.68-0.75). Calibration was reported for 46 out of 59 (78.0%) externally validated CPMs. We found 169 out of 241 (70.1%) CPMs to be at high risk of bias, mostly due to the high risk of bias in the analysis domain. DISCUSSION While various CPMs for sarcomas have been developed, the clinical utility of most of them is hindered by a high risk of bias and limited external validation. Future research should prioritise validating and updating existing well-developed CPMs over developing new ones to ensure reliable prognostic tools. 
TRIAL REGISTRATION PROSPERO CRD42022335222.
Affiliation(s)
- Philip Heesen
- Faculty of Medicine, University of Zurich, Raemistrasse 71, Zurich, 8006, Switzerland.
- Sebastian M Christ
- Department of Radiation Oncology, University Hospital Zurich and University of Zurich, Raemistrasse 100, Zurich, 8091, Switzerland
- Abdullah Kahraman
- School of Life Sciences, University of Applied Sciences and Arts Northwestern Switzerland, Hofackerstrasse 30, Muttenz, 4132, Switzerland
- Georg Schelling
- Department of Orthopaedics and Trauma, University Teaching Hospital LUKS, Sarcoma Service, Spitalstrasse, 6000, Lucerne, Switzerland
- Gabriela Studer
- Department of Radiation Oncology, University Teaching Hospital LUKS, Spitalstrasse, 6000, Lucerne, Switzerland
- Faculty of Health Sciences and Medicine, University of Lucerne, Frohburgstrasse 3, Lucerne, 6002, Switzerland
- Beata Bode-Lesniewska
- Pathology Institute Enge and University of Zurich, Museumstrasse 135, Zurich, 8005, Switzerland
- Bruno Fuchs
- Department of Orthopaedics and Trauma, University Teaching Hospital LUKS, Sarcoma Service, Spitalstrasse, 6000, Lucerne, Switzerland
- Faculty of Health Sciences and Medicine, University of Lucerne, Frohburgstrasse 3, Lucerne, 6002, Switzerland
- Department of Orthopaedics and Trauma, Kantonsspital Winterthur, Sarcoma Service, Brauerstrasse 15, Winterthur, 8400, Switzerland
11
Rysavy MA. Challenges in making an evidence-based prognosis. Semin Perinatol 2025; 49:152054. [PMID: 40404235 DOI: 10.1016/j.semperi.2025.152054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/06/2025] [Revised: 02/11/2025] [Accepted: 02/11/2025] [Indexed: 05/24/2025]
Abstract
Prognosis is one of three traditional roles of clinicians, along with diagnosis and therapy. Prognostication (predicting and communicating about what to expect) plays a major, if overlooked, role in the day-to-day practice of both obstetricians and neonatologists. This article describes several challenges in formulating an evidence-based prognosis that practicing clinicians may find helpful to consider in their practice.
Affiliation(s)
- Matthew A Rysavy
- McGovern Medical School at UTHealth Houston, Houston, TX, USA; Children's Memorial Hermann Hospital, Houston, TX, USA.
|
12
|
Schots BBS, Pizarro CS, Arends BKO, Oerlemans MIFJ, Ahmetagić D, van der Harst P, van Es R. Deep learning for electrocardiogram interpretation: Bench to bedside. Eur J Clin Invest 2025; 55 Suppl 1:e70002. [PMID: 40191935 PMCID: PMC11973865 DOI: 10.1111/eci.70002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2024] [Accepted: 01/23/2025] [Indexed: 04/09/2025]
Abstract
BACKGROUND Recent advancements in deep learning (DL), a subset of artificial intelligence, have shown the potential to automate and improve disease recognition, phenotyping and prediction of disease onset and outcomes by analysing various sources of medical data. The electrocardiogram (ECG) is a valuable tool for diagnosing and monitoring cardiovascular conditions. METHODS The implementation of DL in ECG analysis has been used to detect and predict rhythm abnormalities and conduction abnormalities, ischemic and structural heart diseases, with performance comparable to physicians. However, despite promising development of DL algorithms for automatic ECG analysis, the integration of DL-based ECG analysis and deployment of medical devices incorporating these algorithms into routine clinical practice remains limited. RESULTS This narrative review highlights the applications of DL in 12-lead ECG analysis. Furthermore, we review randomized controlled trials that assess the clinical effectiveness of these DL tools. Finally, it addresses different key barriers to widespread implementation in clinical practice, including regulatory hurdles, algorithm transparency and data privacy concerns. CONCLUSIONS By outlining both the progress and the obstacles in this field, this review aims to provide insights into how DL could shape the future of ECG analysis and enhance cardiovascular care in daily clinical practice.
Affiliation(s)
- Bas B. S. Schots
- Department of Cardiology, University Medical Center Utrecht, Utrecht, The Netherlands
| | - Camila S. Pizarro
- Department of Cardiology, University Medical Center Utrecht, Utrecht, The Netherlands
| | - Bauke K. O. Arends
- Department of Cardiology, University Medical Center Utrecht, Utrecht, The Netherlands
| | | | - Dino Ahmetagić
- Department of Cardiology, University Medical Center Utrecht, Utrecht, The Netherlands
| | - Pim van der Harst
- Department of Cardiology, University Medical Center Utrecht, Utrecht, The Netherlands
| | - René van Es
- Department of Cardiology, University Medical Center Utrecht, Utrecht, The Netherlands
- Cordys Analytics B.V., Utrecht, The Netherlands
|
13
|
Tian CH, Liu LY, Huang YF, Yang HJ, Lai YY, Li CL, Gan D, Yang J. Clinical prediction models for in vitro fertilization outcomes: a systematic review, meta-analysis, and external validation. Hum Reprod 2025; 40:633-646. [PMID: 39983753 DOI: 10.1093/humrep/deaf013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2024] [Revised: 12/16/2024] [Indexed: 02/23/2025] Open
Abstract
STUDY QUESTION What is the best-performing model currently predicting live birth outcomes for IVF or ICSI? SUMMARY ANSWER Among the identified prognostic models, McLernon's post-treatment model outperforms other models in both the meta-analysis and external validation of a Chinese cohort. WHAT IS KNOWN ALREADY Numerous similar IVF prognostic models, developed in different time periods and using various predictors, are available, and they need to be summarized and evaluated because validated evidence distinguishing high-quality from low-quality prediction tools is lacking. However, there is a notable dearth of research in the form of meta-analysis or external validation assessing the performance of models in predicting live births in this field. STUDY DESIGN, SIZE, DURATION The researchers conducted a comprehensive literature review in PubMed, EMBASE, and Web of Science, using keywords related to prognostic models and IVF/ICSI live birth outcomes. The search included studies published up to 3 April 2024, and was limited to English language studies. PARTICIPANTS/MATERIALS, SETTING, METHODS The review included studies that developed or validated prognostic models for IVF live birth outcomes while providing clear reports on model characteristics. Researchers extracted and analysed the data in accordance with the guidelines outlined in the Preferred Reporting Items for Systematic Reviews and Meta-Analyses and other model-related guidelines. For the meta-analysis, the choice between fixed-effect and random-effects models was based on the heterogeneity assessed using the I² statistic and Cochran's Q test. Model performance was evaluated by assessing their area under the receiver operating characteristic curves (AUCs) and calibration plots in the studies. MAIN RESULTS AND THE ROLE OF CHANCE This review provides a comprehensive summary of data derived from 72 studies with an overall risk of bias (ROB) rated as high or unclear.
These studies contained a total of 132 predictors and 86 prognostic models, and then meta-analyses were performed for each of the five selected models. The total random effects of Templeton's, Nelson's, McLernon's pre-treatment and post-treatment model demonstrated AUCs of 0.65 (95% CI: 0.61-0.69), 0.63 (95% CI: 0.63-0.64), 0.67 (95% CI: 0.62-0.71), and 0.73 (95% CI: 0.71-0.75), respectively. The total fixed effects of the intelligent data analysis score (iDAScore) model estimated an AUC of 0.66 (95% CI: 0.63-0.68). The external validation of the initial four models in our cohort produced AUCs ranging from 0.53 to 0.58, and the calibration was confirmed through calibration plots. LIMITATIONS, REASONS FOR CAUTION While the focus on English-language studies and live birth outcomes may constrain the generalizability of the findings to diverse populations, this approach equips clinicians, who view live births as the ultimate objective, with more precise and actionable reference guidelines. WIDER IMPLICATIONS OF THE FINDINGS This study represents the first meta-analysis in the field of IVF prediction models, definitively confirming the superior performance of McLernon's post-treatment model. The conclusion is reinforced by independent validation from another perspective. Nevertheless, further investigation is warranted to develop new models and to externally validate existing high-performing models for prognostic accuracy in IVF outcomes. STUDY FUNDING/COMPETING INTEREST(S) This study was supported by the National Natural Science Foundation of China (Grant No. 82174517). The authors report no conflict of interest. REGISTRATION NUMBER 2022 CRD42022312018.
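The heterogeneity statistics named in the methods above can be illustrated with a minimal sketch. It assumes inverse-variance weights and a standard error for each study estimate; the function name `cochran_q_i2` and the input values are hypothetical and do not reproduce the review's data or software.

```python
def cochran_q_i2(effects, std_errors):
    """Return (fixed-effect pooled estimate, Cochran's Q, I^2 in percent).

    effects:    per-study estimates (for example, AUCs); hypothetical inputs
    std_errors: per-study standard errors, all strictly positive
    """
    # Inverse-variance weights: more precise studies count more.
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)

    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))

    # I^2: share of total variation attributed to between-study heterogeneity.
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, q, i2


if __name__ == "__main__":
    # Illustrative numbers only, not taken from any study.
    pooled, q, i2 = cochran_q_i2([0.62, 0.66, 0.70], [0.02, 0.03, 0.02])
    print(f"pooled={pooled:.3f}  Q={q:.2f}  I2={i2:.1f}%")
```

A large I² or a significant Q is the usual reason to prefer a random-effects summary over a fixed-effect one.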
Affiliation(s)
- C H Tian
- Acupuncture and Tuina School, Chengdu University of Traditional Chinese Medicine, Chengdu, China
| | - L Y Liu
- Acupuncture and Tuina School, Chengdu University of Traditional Chinese Medicine, Chengdu, China
| | - Y F Huang
- Acupuncture and Tuina School, Chengdu University of Traditional Chinese Medicine, Chengdu, China
| | - H J Yang
- Clinical School, Chengdu University of Traditional Chinese Medicine, Chengdu, China
| | - Y Y Lai
- Acupuncture and Tuina School, Chengdu University of Traditional Chinese Medicine, Chengdu, China
| | - C L Li
- Acupuncture and Tuina School, Chengdu University of Traditional Chinese Medicine, Chengdu, China
| | - D Gan
- Department of Traditional Chinese Medicine, Sichuan Jinxin Xinan Women's and Children's Hospital, Chengdu, China
| | - J Yang
- Acupuncture and Tuina School, Chengdu University of Traditional Chinese Medicine, Chengdu, China
|
14
|
Ai C, Song J, Yuan C, Xu G, Yang J, Lv T, Jin S, Wu H, Xiang B, Yang J. Prediction model of the T cell-mediated rejection after liver transplantation in children and adults: A case-controlled study. Int J Surg 2025; 111:2827-2837. [PMID: 39878165 DOI: 10.1097/js9.0000000000002279] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2024] [Accepted: 01/05/2025] [Indexed: 01/31/2025]
Abstract
OBJECTIVE T cell-mediated rejection (TCMR) is a major concern following liver transplantation (LT), and identifying its predictors could help improve post-transplant prognosis. This study aimed to develop a model to predict the risk of TCMR in children and adults after LT. METHOD Pre-transplant demographic characteristics, intraoperative parameters, and especially early post-transplant laboratory data for 1221 LT recipients (1096 adults and 125 children) were obtained from the Hospital, University, between 1 January 2015, and 1 January 2022. These data were analyzed to develop the prediction model. RESULT The incidence of TCMR was higher in pediatric LT recipients than in adults (17.6% vs. 6.4%, P < 0.001). In adult recipients, seven predictors were identified: donor sex, recipient age, recipient height, and post-transplant levels of serum direct bilirubin, urea, platelets, and neutrophil-to-lymphocyte ratio. In pediatric recipients, four predictors were identified: post-transplant levels of serum monocyte percentage, direct bilirubin, albumin, and gamma-glutamyl transferase. The area under the model's curve incorporating these variables for predicting TCMR after LT was 0.713 (95% confidence interval, CI: 0.655-0.770) in adults and 0.786 (95% CI: 0.675-0.896) in children. Decision curve analyses demonstrated the clinical significance of the model. CONCLUSION This study developed a prediction model that may be useful in identifying high-TCMR-risk populations in both adult and pediatric LT recipients.
Affiliation(s)
- Chengbo Ai
- Department of Pediatric Surgery, West China Hospital/West China School of Medicine, Sichuan University, Chengdu, PR China
| | - Jiulin Song
- Department of Pediatric Surgery, West China Hospital/West China School of Medicine, Sichuan University, Chengdu, PR China
| | - Chi Yuan
- Department of Pediatric Surgery, West China Hospital/West China School of Medicine, Sichuan University, Chengdu, PR China
| | - Gang Xu
- Department of Liver Transplant Center, Organ Transplant Center, West China Hospital/West China School of Medicine, Sichuan University, Chengdu, PR China
| | - Jian Yang
- Department of Liver Transplant Center, Organ Transplant Center, West China Hospital/West China School of Medicine, Sichuan University, Chengdu, PR China
| | - Tao Lv
- Department of Liver Transplant Center, Organ Transplant Center, West China Hospital/West China School of Medicine, Sichuan University, Chengdu, PR China
| | - Shuguang Jin
- Department of Pediatric Surgery, West China Hospital/West China School of Medicine, Sichuan University, Chengdu, PR China
| | - Hong Wu
- Department of Liver Transplant Center, Organ Transplant Center, West China Hospital/West China School of Medicine, Sichuan University, Chengdu, PR China
| | - Bo Xiang
- Department of Pediatric Surgery, West China Hospital/West China School of Medicine, Sichuan University, Chengdu, PR China
| | - Jiayin Yang
- Department of Liver Transplant Center, Organ Transplant Center, West China Hospital/West China School of Medicine, Sichuan University, Chengdu, PR China
|
15
|
Stassen RC, Maas CCHM, Leong SP, Kashani-Sabet M, White RL, Pockaj BA, Zager JS, Schneebaum S, Vetto JT, Avisar E, Harrison Howard J, O’Donoghue C, Kosiorek H, van Akkooi ACJ, Verhoef C, van Klaveren D, Grünhagen DJ, Olofsson Bagge R. External validation of a model to predict recurrence-free and melanoma-specific survival for patients with melanoma after sentinel node biopsy. Br J Surg 2025; 112:znaf037. [PMID: 40243383 PMCID: PMC12004364 DOI: 10.1093/bjs/znaf037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2024] [Revised: 12/11/2024] [Accepted: 01/26/2025] [Indexed: 04/18/2025]
Abstract
BACKGROUND Recently, a model to predict 5-year recurrence-free survival (RFS) and melanoma-specific survival (MSS) after sentinel lymph node biopsy (SLNB) was published. The aim of this study was to validate that model in a large independent international cohort. METHODS The database of the Sentinel Lymph Node Working Group (SLNWG) was analysed for patients with malignant melanoma who underwent SLNB. Patients with clinical stage III melanoma, a history of other malignancies, or receiving concomitant systemic therapies during follow-up were excluded. The model's predictive performance was evaluated using discrimination and calibration metrics in the eligible cohort. Decision curve analysis was performed to assess the clinical value of the model. RESULTS The external validation cohort consisted of 6174 patients of the SLNWG from the USA, Europe, and Israel. A positive sentinel node was found in 788 patients (12.8%). The area under the time-dependent receiver operating characteristic (ROC) curve of the external validation was 0.76 (95% c.i. 0.74 to 0.77) for RFS and 0.79 (95% c.i. 0.76 to 0.81) for MSS. The model was well calibrated, as the observed 5-year survival rates aligned closely with the predicted survival rates (calibration slope of 0.98 for RFS and calibration slope of 0.99 for MSS). The model provided a net benefit versus the 'treat all' and 'treat none' strategies at the predetermined probability threshold for recurrence of 45%. CONCLUSION The model demonstrated good performance in a large heterogeneous independent cohort, emphasizing its robustness. Decision curve analysis revealed a clear net benefit of the model over a treat all strategy, highlighting its potential for clinical use.
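The net benefit comparison reported above follows the standard decision curve definition, sketched here for a binary outcome at a single probability threshold. This is a simplified illustration: it ignores censoring, which a survival model such as the one validated here would need to handle, and the function names and toy data are hypothetical.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating everyone whose predicted risk >= threshold.

    NB = TP/n - FP/n * threshold / (1 - threshold)
    """
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the 'treat all' reference strategy."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


if __name__ == "__main__":
    # Toy data for illustration only; 'treat none' always has net benefit 0.
    outcomes = [1, 1, 0, 0]
    risks = [0.9, 0.6, 0.5, 0.1]
    print(net_benefit(outcomes, risks, 0.45))
    print(net_benefit_treat_all(outcomes, 0.45))
```

A model is considered useful at a given threshold when its net benefit exceeds both reference strategies.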
Affiliation(s)
- Robert C Stassen
- Department of Surgical Oncology, Erasmus Medical Centre Cancer Institute, Rotterdam, The Netherlands
| | - Carolien C H M Maas
- Department of Public Health, Erasmus University Medical Centre, Rotterdam, The Netherlands
| | - Stanley P Leong
- Department of Surgery, California Pacific Medical Center and Research Institute, San Francisco, California, USA
| | - Mohammed Kashani-Sabet
- Department of Surgery, California Pacific Medical Center and Research Institute, San Francisco, California, USA
| | - Richard L White
- Department of Surgery, Levine Cancer Institute, Carolinas Medical Center, Atrium Health, Charlotte, North Carolina, USA
| | | | - Jonathan S Zager
- Department of Cutaneous Oncology, Moffitt Cancer Center, Tampa, Florida, USA
| | - Schlomo Schneebaum
- Department of Surgery, Tel-Aviv Sourasky Medical Center, Tel Aviv, Israel
| | - John T Vetto
- Division of Surgical Oncology, Oregon Health & Science University, Portland, Oregon, USA
| | - Eli Avisar
- Department of Surgery, Division of Surgical Oncology at University of Miami Miller School of Medicine, Miami, Florida, USA
| | - J Harrison Howard
- Department of Surgery, University of South Alabama, Mobile, Alabama, USA
| | - Cristina O’Donoghue
- Department of Surgery, Rush University Medical Center, Chicago, Illinois, USA
| | - Heidi Kosiorek
- Department of Quantitative Health Sciences, Mayo Clinic Arizona, Scottsdale, Arizona, USA
| | - Alexander C J van Akkooi
- Melanoma Institute Australia, University of Sydney, Sydney, New South Wales, Australia
- Faculty of Medicine and Health, University of Sydney, Sydney, New South Wales, Australia
- Department of Melanoma and Surgical Oncology, Royal Prince Alfred Hospital, Sydney, New South Wales, Australia
| | - Cornelis Verhoef
- Department of Surgical Oncology, Erasmus Medical Centre Cancer Institute, Rotterdam, The Netherlands
| | - David van Klaveren
- Department of Public Health, Erasmus University Medical Centre, Rotterdam, The Netherlands
| | - Dirk J Grünhagen
- Department of Surgical Oncology, Erasmus Medical Centre Cancer Institute, Rotterdam, The Netherlands
| | - Roger Olofsson Bagge
- Department of Surgery, Institute of Clinical Sciences, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden
- Department of Surgery, Sahlgrenska University Hospital, Gothenburg, Sweden
|
16
|
Hartmann S, Dwyer D, Scott I, Wannan CMJ, Nguyen J, Lin A, Middeldorp CM, Wood SJ, Yung AR, McGorry PD, Nelson B, Clark SR. Dynamic Updating of Psychosis Prediction Models in Individuals at Ultra-High Risk of Psychosis. Biol Psychiatry Cogn Neurosci Neuroimaging 2025:S2451-9022(25)00119-3. [PMID: 40158694 DOI: 10.1016/j.bpsc.2025.03.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2025] [Revised: 03/11/2025] [Accepted: 03/15/2025] [Indexed: 04/02/2025]
Abstract
BACKGROUND The performance of psychiatric risk calculators can deteriorate over time due to changes in patient population, referral pathways, and medical advances. Such temporal biases in existing models may lead to suboptimal decisions when translated into clinical practice. Methods are available to correct this bias, but no research has been conducted to investigate their utility in psychiatry. METHODS We aimed to analyze the performance of model updating methods for predicting psychosis onset by 1 year in 780 individuals at ultra-high risk (UHR) of psychosis from the UHR 1000+ cohort, a longitudinal cohort of UHR individuals recruited to research studies at Orygen, Melbourne, Australia, between 1995 and 2020. Model updating was performed using a yearly adjusted model (recalibration), a continuously updated model (refitting), and a continuous Bayesian updating model (dynamic updating) and compared with a static logistic regression prediction model (original) regarding calibration, discrimination, and clinical net benefit. RESULTS The original model was poorly calibrated over the entire validation period. All 3 updating methods improved the predictive performance compared with the original model (recalibration: p = .009; refitting: p = .020; dynamic updating: p = .001). The dynamic updating method demonstrated the best predictive performance (Harrell's C-index = 0.71; 95% CI, 0.60 to 0.82), calibration slope (slope = 1.12; 95% CI, 0.46 to 1.87), and clinical net benefit over the entire validation period. CONCLUSIONS Dynamic updating of psychosis prediction models may help to mitigate decreases in performance over time. Therefore, existing psychosis prediction models need to be monitored for temporal biases to mitigate potentially harmful decisions.
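To make the idea of recalibration concrete, the sketch below shows one common and simple form: an intercept-only update that shifts all predictions on the logit scale until the mean predicted risk matches the observed event rate. This is an assumption chosen for illustration, not the updating procedure used in the study; the function names and data are hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def recalibrate_intercept(y_true, y_prob, iterations=25):
    """Estimate a logit-scale shift for the original predictions.

    Fits a logistic model with the original logit as a fixed offset and a
    single free intercept, using Newton's method. Assumes every probability
    lies strictly between 0 and 1 and that both outcome classes occur.
    """
    offsets = [logit(p) for p in y_prob]
    shift = 0.0
    for _ in range(iterations):
        preds = [expit(o + shift) for o in offsets]
        gradient = sum(y - q for y, q in zip(y_true, preds))
        curvature = sum(q * (1.0 - q) for q in preds)
        shift += gradient / curvature
    return shift


def apply_shift(y_prob, shift):
    """Return updated probabilities after applying the estimated shift."""
    return [expit(logit(p) + shift) for p in y_prob]


if __name__ == "__main__":
    # Toy example: the original model predicts 50% for everyone,
    # but three of four individuals have the event.
    shift = recalibrate_intercept([1, 1, 1, 0], [0.5, 0.5, 0.5, 0.5])
    print(apply_shift([0.5], shift))
```

Re-estimating the shift on each new year of data would be one way to approximate a yearly adjustment; refitting and Bayesian updating change the coefficients as well and are not shown.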
Affiliation(s)
- Simon Hartmann
- Discipline of Psychiatry, Adelaide Medical School, The University of Adelaide, Adelaide, South Australia, Australia; Orygen, Melbourne, Victoria, Australia; Centre for Youth Mental Health, The University of Melbourne, Melbourne, Victoria, Australia.
| | - Dominic Dwyer
- Orygen, Melbourne, Victoria, Australia; Centre for Youth Mental Health, The University of Melbourne, Melbourne, Victoria, Australia
| | - Isabelle Scott
- Orygen, Melbourne, Victoria, Australia; Centre for Youth Mental Health, The University of Melbourne, Melbourne, Victoria, Australia
| | - Cassandra M J Wannan
- Orygen, Melbourne, Victoria, Australia; Centre for Youth Mental Health, The University of Melbourne, Melbourne, Victoria, Australia
| | - Josh Nguyen
- Orygen, Melbourne, Victoria, Australia; Centre for Youth Mental Health, The University of Melbourne, Melbourne, Victoria, Australia
| | - Ashleigh Lin
- School of Population and Global Health, The University of Western Australia, Perth, Western Australia, Australia
| | - Christel M Middeldorp
- Child Health Research Center, University of Queensland, St Lucia, Brisbane, Queensland, Australia; Child and Youth Mental Health Service, Children's Health Queensland Hospital and Health Service, Brisbane, Queensland, Australia; Department of Child and Adolescent Psychiatry and Psychology, Amsterdam University Medical Center, Amsterdam Public Health Research Institute, Amsterdam, the Netherlands; Arkin Mental Health Care, Amsterdam, the Netherlands; Levvel, Academic Center for Child and Adolescent Psychiatry, Amsterdam, the Netherlands
| | - Stephen J Wood
- Orygen, Melbourne, Victoria, Australia; Centre for Youth Mental Health, The University of Melbourne, Melbourne, Victoria, Australia; School of Psychology, University of Birmingham, Edgbaston, United Kingdom
| | - Alison R Yung
- Deakin University, Institute of Mental and Physical Health and Clinical Translation, Geelong, Victoria, Australia; School of Health Science, University of Manchester, Manchester, United Kingdom
| | - Patrick D McGorry
- Orygen, Melbourne, Victoria, Australia; Centre for Youth Mental Health, The University of Melbourne, Melbourne, Victoria, Australia
| | - Barnaby Nelson
- Orygen, Melbourne, Victoria, Australia; Centre for Youth Mental Health, The University of Melbourne, Melbourne, Victoria, Australia
| | - Scott R Clark
- Discipline of Psychiatry, Adelaide Medical School, The University of Adelaide, Adelaide, South Australia, Australia
|
17
|
Mansmann U, Ön BI. The validation of prediction models deserves more recognition. BMC Med 2025; 23:166. [PMID: 40102914 PMCID: PMC11921473 DOI: 10.1186/s12916-025-03994-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2024] [Accepted: 03/11/2025] [Indexed: 03/20/2025] Open
Affiliation(s)
- Ulrich Mansmann
- Institute for Medical Information Processing, Biometry and Epidemiology (IBE), Faculty of Medicine, Pettenkofer School of Public Health, LMU Munich, Munich, Germany.
| | - Begüm Irmak Ön
- Institute for Medical Information Processing, Biometry and Epidemiology (IBE), Faculty of Medicine, Pettenkofer School of Public Health, LMU Munich, Munich, Germany
|
18
|
Smart MH, Lin JY, Layden BT, Eisenberg Y, Pickard AS, Sharp LK, Danielson KK, Kong A. Diabetes Screening in the Emergency Department: Development of a Predictive Model for Elevated Hemoglobin A1c. J Diabetes Res 2025; 2025:8830658. [PMID: 40109952 PMCID: PMC11922610 DOI: 10.1155/jdr/8830658] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2024] [Accepted: 02/04/2025] [Indexed: 03/22/2025] Open
Abstract
Aims: We developed a prediction model for elevated hemoglobin A1c (HbA1c) among patients presenting to the emergency department (ED) at risk for diabetes to identify important factors that may influence follow-up patient care. Methods: Retrospective electronic health records data among patients screened for diabetes at the ED in May 2021 was used. The primary outcome was elevated HbA1c (≥ 5.7%). The data was divided into a derivation set (80%) and a test set (20%) stratified by elevated HbA1c. In the derivation set, we estimated the optimal significance level for backward elimination using a 10-fold cross-validation method. A final model was derived using the entire derivation set and validated on the test set. Performance statistics included C-statistic, sensitivity, specificity, predictive values, Hosmer-Lemeshow test, and Brier score. Results: There were 590 ED patients screened for diabetes in May 2021. The final model included nine variables: age, race/ethnicity, insurance, chief complaints of back pain and fever/chills, and a past medical history of obesity, hyperlipidemia, chronic obstructive pulmonary disease, and substance misuse. Adequate model discrimination (C-statistic = 0.75; sensitivity, specificity, and predictive values > 0.70), no evidence of model ill fit (Hosmer-Lemeshow test = 0.29), and moderate Brier score (0.21) suggest acceptable model performance. Conclusion: In addition to age, obesity, and hyperlipidemia, a history of substance misuse was identified as an important predictor of elevated HbA1c levels among patients screened for diabetes in the ED. Our findings suggest that substance misuse may be an important factor to consider when facilitating follow-up care for patients identified with prediabetes or diabetes in the ED and warrants further investigation. Future research efforts should also include external validation in larger samples of ED patients.
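The Brier score listed among the performance statistics is the mean squared difference between predicted probabilities and observed binary outcomes. A minimal sketch follows; the function names and example values are hypothetical and are not taken from the study.

```python
def brier_score(y_true, y_prob):
    """Mean squared error between predicted risk and the 0/1 outcome.

    0 is perfect; lower is better.
    """
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def brier_reference(prevalence):
    """Brier score of a model that predicts the prevalence for everyone."""
    return prevalence * (1.0 - prevalence)


if __name__ == "__main__":
    # Illustrative values only.
    print(brier_score([1, 0], [0.8, 0.3]))
    print(brier_reference(0.5))
```

Comparing a model's score with `brier_reference` at the cohort's outcome prevalence helps in reading the raw number, since the achievable range depends on how common the outcome is.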
Affiliation(s)
- Mary H. Smart
- Department of Pharmacy Systems, Outcomes and Policy, College of Pharmacy, The University of Illinois Chicago, Chicago, Illinois, USA
| | - Janet Y. Lin
- Department of Emergency Medicine, College of Medicine, The University of Illinois Chicago, Chicago, Illinois, USA
| | - Brian T. Layden
- Division of Endocrinology, Diabetes and Metabolism, Department of Medicine, The University of Illinois Chicago, Chicago, Illinois, USA
- Jesse Brown Veterans Affairs Medical Center, Chicago, Illinois, USA
| | - Yuval Eisenberg
- Division of Endocrinology, Diabetes and Metabolism, Department of Medicine, The University of Illinois Chicago, Chicago, Illinois, USA
| | - A. Simon Pickard
- Department of Pharmacy Systems, Outcomes and Policy, College of Pharmacy, The University of Illinois Chicago, Chicago, Illinois, USA
| | - Lisa K. Sharp
- Department of Biobehavioral Nursing Science, College of Nursing, The University of Illinois Chicago, Chicago, Illinois, USA
| | - Kirstie K. Danielson
- Division of Endocrinology, Diabetes and Metabolism, Department of Medicine, The University of Illinois Chicago, Chicago, Illinois, USA
| | - Angela Kong
- Department of Pharmacy Systems, Outcomes and Policy, College of Pharmacy, The University of Illinois Chicago, Chicago, Illinois, USA
|
19
|
Drebin HM, Kurtansky NR, Hosein S, Nadelmann E, Moy AP, Ariyan CE, Bello DM, Brady MS, Coit DG, Marchetti MA, Bartlett EK. Declining Clinical Utility of Tools for Predicting Sentinel Lymph Node Biopsy Status: A Single Institution Experience from 2000 to 2021. Ann Surg Oncol 2025; 32:1463-1472. [PMID: 39681721 DOI: 10.1245/s10434-024-16698-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2024] [Accepted: 11/28/2024] [Indexed: 12/18/2024]
Abstract
INTRODUCTION Clinicopathologic data-based sentinel lymph node (SLN) prediction models are used to select patients with melanoma for sentinel lymph node biopsy (SLNB). However, the temporal performance of these models is unknown. Therefore, we investigated whether the performance and clinical utility of the Melanoma Institute of Australia, Memorial Sloan Kettering Cancer Center, and Friedman et al. models changed over time. PATIENTS AND METHODS Primary cutaneous melanoma cases that underwent SLNB at a single tertiary-care cancer center from 2000 to 2021 were identified from a prospectively maintained database. Calibration plots were generated. Values for estimated risks of SLN positivity and area under the receiver operator curve (AUC) were calculated. Clinical utility was assessed at thresholds between 5 and 10% using decision curve analysis. RESULTS In total, 2977 SLNB cases were included. The estimated risk of SLN positivity and AUCs were similar across periods for all models. However, calibration decreased over time for all models, with progressive underprediction of SLN positivity. Clinical utility also declined over time; in the most recent period investigated (2018-2021), no model offered clinical utility at risk thresholds ≤ 8%, and only the Friedman model provided clinical utility at risk thresholds of 9-10%. CONCLUSIONS The calibration and clinical utility of three predominant models for SLN prediction declined over time. There is a need to periodically reassess the performance of SLN prognostic tools as they are applied to contemporary cohorts. Future studies are needed to determine whether findings are generalizable outside of this study cohort.
Affiliation(s)
- Harrison M Drebin
- Gastric and Mixed Tumor Service, Department of Surgery, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Columbia University Vagelos College of Physicians and Surgeons, New York, NY, USA
| | - Nicholas R Kurtansky
- Dermatology Service, Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Sharif Hosein
- Dermatology Service, Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Emily Nadelmann
- Dermatology Service, Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Andrea P Moy
- Department of Pathology and Laboratory Medicine, Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Charlotte E Ariyan
- Gastric and Mixed Tumor Service, Department of Surgery, Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Danielle M Bello
- Gastric and Mixed Tumor Service, Department of Surgery, Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Mary S Brady
- Gastric and Mixed Tumor Service, Department of Surgery, Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Daniel G Coit
- Gastric and Mixed Tumor Service, Department of Surgery, Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | | | - Edmund K Bartlett
- Gastric and Mixed Tumor Service, Department of Surgery, Memorial Sloan Kettering Cancer Center, New York, NY, USA.
|
20
|
Nanki T, Yamaguchi T, Umetsu K, Tanabe R, Maeda N, Kanazawa M, Furuno Y, Matsuda S, Takemoto S, Asao K, Kamiuchi T. Development and validation of a prediction model for serious infections in rheumatoid arthritis patients treated with tocilizumab in Japan. Clin Rheumatol 2025; 44:1081-1093. [PMID: 39918730 PMCID: PMC11865113 DOI: 10.1007/s10067-025-07328-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2024] [Revised: 12/13/2024] [Accepted: 01/09/2025] [Indexed: 02/27/2025]
Abstract
OBJECTIVES To develop a prediction model for serious infections (SIs) in rheumatoid arthritis (RA) patients treated with tocilizumab in Japan and to evaluate the model's performance compared to previously developed models, i.e., 'DANBIO' and 'postmarketing surveillance' (PMS). METHOD This non-interventional retrospective cohort study utilized the Medical Data Vision database in Japan. The study population was derived from patients ≥ 18 years with RA who initiated tocilizumab between April 2008 and July 2021. SIs were assessed during the 1-year follow-up from tocilizumab initiation. The candidate predictors were identified based on previous studies, known risk factors, potentially relevant factors, and data availability. The prediction model was developed using logistic regression. The model's performance was compared with previously developed models using cross-entropy and area under the receiver operating characteristic curve (AUC). RESULTS Of the 6501 RA patients, 4.57% experienced SIs during the 1-year follow-up. The model included 17 predictors for SI (e.g., age (odds ratio 1.013 (95% confidence interval 1.002-1.024)), history of SIs (2.569 (1.636-3.745)), diverticulitis (2.183 (1.000-3.989))). The model showed a lower cross-entropy and a higher AUC (0.1488; 0.712) compared to DANBIO (0.1932; 0.591) and PMS (0.1561; 0.565) models, and the sensitivity, specificity, positive predictive value, and negative predictive value using 5% threshold were 72%, 64%, 7%, and 98%, respectively. CONCLUSIONS The model developed in this study seems to have the potential to inform the risk of SIs in RA patients treated with tocilizumab and may help the early identification of patients at risk of SIs to reduce morbidity and mortality.
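The two kinds of metric reported above, cross-entropy and classification measures at a fixed risk threshold, can be computed as in the sketch below. It is a generic illustration with hypothetical function names and toy inputs, not the study's code or data.

```python
import math


def cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy (log loss); lower is better."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def threshold_metrics(y_true, y_prob, threshold):
    """Sensitivity, specificity, PPV and NPV when risk >= threshold is positive."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted_positive = p >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


if __name__ == "__main__":
    # Toy data for illustration only, evaluated at a 5% risk threshold.
    outcomes = [1, 1, 0, 0, 0]
    risks = [0.10, 0.02, 0.06, 0.01, 0.03]
    print(cross_entropy(outcomes, risks))
    print(threshold_metrics(outcomes, risks, 0.05))
```

When the outcome is rare, as with an event rate of a few percent, a low PPV alongside a high NPV is expected even for a model with reasonable sensitivity and specificity.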
Affiliation(s)
- Toshihiro Nanki
- Division of Rheumatology, Department of Internal Medicine, Toho University School of Medicine, Tokyo, Japan.
- Kosei Umetsu
- Drug Safety Division, Chugai Pharmaceutical Co. Ltd, Tokyo, Japan
- Ryunosuke Tanabe
- Drug Safety Division, Chugai Pharmaceutical Co. Ltd, Tokyo, Japan
- Naoki Maeda
- Drug Safety Division, Chugai Pharmaceutical Co. Ltd, Tokyo, Japan
- Minori Kanazawa
- Drug Safety Division, Chugai Pharmaceutical Co. Ltd, Tokyo, Japan
- Yuko Furuno
- Drug Safety Division, Chugai Pharmaceutical Co. Ltd, Tokyo, Japan
- Shinichi Matsuda
- Drug Safety Division, Chugai Pharmaceutical Co. Ltd, Tokyo, Japan
- Shinya Takemoto
- Drug Safety Division, Chugai Pharmaceutical Co. Ltd, Tokyo, Japan
- Tatsuya Kamiuchi
- Drug Safety Division, Chugai Pharmaceutical Co. Ltd, Tokyo, Japan
|
21
|
Madathil S, Dhouib M, Lelong Q, Bourassine A, Monsonego J. A multimodal deep learning model for cervical pre-cancers and cancers prediction: Development and internal validation study. Comput Biol Med 2025; 186:109710. [PMID: 39847948 DOI: 10.1016/j.compbiomed.2025.109710] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2024] [Revised: 10/10/2024] [Accepted: 01/15/2025] [Indexed: 01/25/2025]
Abstract
BACKGROUND Current cervical cancer screening and diagnosis are limited by subjectivity and a lack of reproducibility. We describe the development of a deep learning (DL)-based diagnostic risk prediction model and evaluate its potential for clinical impact. METHOD We developed and internally validated a DL model that accommodates both clinical data and colposcopy images to predict patients' CIN2+ status, using a retrospective cohort of 6356 cases of LEEP-conization/cone-biopsy (gold-standard diagnosis) following an abnormal screening result. The overall performance, discrimination, and calibration of the model were compared with expert clinicians' colposcopic impressions. The potential for clinical impact was assessed by the rate of unnecessary conizations that could be avoided by using our model. RESULTS The model combining clinical history and colposcopy images predicted CIN2+ with superior performance (AUC-ROC = 95.3%, accuracy = 90.8%, PPV = 94.1%, NPV = 87.9%) and better calibration than models that used image or clinical history data alone, and it outperformed clinicians' colposcopic impressions. Moreover, if a decision threshold of 10% is applied to the predicted probability from this model to recommend conization, up to 35% of conizations could be avoided without missing any true CIN2+ cases. CONCLUSION We present a novel DL model to predict cervical neoplasia with potential for reducing unnecessary conization. External validation studies are warranted to assess generalizability.
Affiliation(s)
- Sreenath Madathil
- Faculty of Dental Medicine and Oral Health Sciences, McGill University, Montreal, Canada; Gerald Bronfman Department of Oncology, Faculty of Medicine, McGill University, Montreal, Canada
- Mohamed Dhouib
- École Polytechnique, Institut Polytechnique de Paris, Palaiseau, France
- Quitterie Lelong
- École Polytechnique, Institut Polytechnique de Paris, Palaiseau, France
- Ahmed Bourassine
- École Polytechnique, Institut Polytechnique de Paris, Palaiseau, France
|
22
|
van Leeuwen FD, Steyerberg EW, van Klaveren D, Wessler B, Kent DM, van Zwet EW. Instability of the AUROC of Clinical Prediction Models. Stat Med 2025; 44:e70011. [PMID: 39921554 PMCID: PMC11806515 DOI: 10.1002/sim.70011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2024] [Revised: 12/04/2024] [Accepted: 01/18/2025] [Indexed: 02/10/2025]
Abstract
BACKGROUND External validations are essential to assess the performance of a clinical prediction model (CPM) before deployment. Apart from model misspecification, differences in patient population, the standard of care, predictor definitions, and other factors also influence a model's discriminative ability, as commonly quantified by the AUC (or c-statistic). We aimed to quantify the variation in AUCs across sets of external validation studies and propose ways to adjust expectations of a model's performance in a new setting. METHODS The Tufts-PACE CPM Registry holds a collection of CPMs for prognosis in cardiovascular disease. We analyzed the AUC estimates of 469 CPMs with at least one external validation. Combined, these CPMs had a total of 1603 external validations reported in the literature. For each CPM and its associated set of validation studies, we performed a random-effects meta-analysis to estimate the between-study standard deviation τ among the AUCs. Since the majority of these meta-analyses have only a handful of validations, this leads to very poor estimates of τ. So, instead of focusing on a single CPM, we estimated a log-normal distribution of τ across all 469 CPMs. We then used this distribution as an empirical prior. We used cross-validation to compare this empirical Bayesian approach with frequentist fixed and random-effects meta-analyses. RESULTS The 469 CPMs included in our study had a median of 2 external validations with an IQR of [1-3]. The estimated distribution of τ had a mean of 0.055 and a standard deviation of 0.015. If τ = 0.05, then the 95% prediction interval for the AUC in a new setting has a width of at least ±0.1, no matter how many validations have been done. When there are fewer than 5 validations, which is typically the case, the usual frequentist methods grossly underestimate the uncertainty about the AUC in a new setting. Accounting for τ in a Bayesian approach achieved near nominal coverage. CONCLUSION Due to large heterogeneity among the validated AUC values of a CPM, there is great irreducible uncertainty in predicting the AUC in a new setting. This uncertainty is underestimated by existing methods. The proposed empirical Bayes approach addresses this problem and merits wide application in judging the validity of prediction models.
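The ±0.1 floor quoted in the abstract can be checked with a rough calculation. The sketch below assumes a plain normal approximation, in which the half-width of a 95% prediction interval is about 1.96 × sqrt(τ² + SE²), with SE the standard error of the pooled AUC. The function name and the example SE of 0.03 are illustrative assumptions, not taken from the paper, whose empirical Bayes method is more involved.

```python
import math

def auc_prediction_halfwidth(tau, se_pooled=0.0, z=1.96):
    """Approximate half-width of a 95% prediction interval for the AUC
    in a new setting (normal approximation; an assumption of this sketch)."""
    return z * math.sqrt(tau ** 2 + se_pooled ** 2)

# With unlimited validations the pooled SE shrinks to 0, but tau remains.
floor = auc_prediction_halfwidth(0.05)
# With only a few validations the pooled SE adds to the uncertainty
# (0.03 is a made-up value for illustration).
with_few = auc_prediction_halfwidth(0.05, se_pooled=0.03)

print(round(floor, 3))     # 0.098, i.e. roughly +/- 0.1
print(round(with_few, 3))  # 0.114
```

Because τ does not shrink as more validations accumulate, the interval cannot become narrower than the first value.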
Affiliation(s)
- Florian D. van Leeuwen
- Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, The Netherlands
- Ewout W. Steyerberg
- Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, The Netherlands
- David van Klaveren
- Department of Public Health, Erasmus University Medical Center, Rotterdam, Netherlands
- Predictive Analytics and Comparative Effectiveness Center, Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, Boston, Massachusetts, USA
- Ben Wessler
- Predictive Analytics and Comparative Effectiveness Center, Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, Boston, Massachusetts, USA
- David M. Kent
- Predictive Analytics and Comparative Effectiveness Center, Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, Boston, Massachusetts, USA
- Erik W. van Zwet
- Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, The Netherlands
|
23
|
Nong P, Maurer E, Dwivedi R. The urgency of centering safety-net organizations in AI governance. NPJ Digit Med 2025; 8:117. [PMID: 39984650 PMCID: PMC11845669 DOI: 10.1038/s41746-025-01479-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2024] [Accepted: 01/24/2025] [Indexed: 02/23/2025] Open
Abstract
Although robust AI governance requires the engagement of diverse stakeholders across the artificial intelligence (AI) ecosystem, the US safety net has largely been excluded from this kind of collaboration. Without a reorientation of the AI governance agenda, marginalized patients will disproportionately bear the risks of AI in the US healthcare system. To prevent this replication of digital inequity and an organizational digital divide, we suggest specific next steps for diverse stakeholders to progress toward more equitable policy and practice.
Affiliation(s)
- Paige Nong
- Division of Health Policy and Management, University of Minnesota School of Public Health, Minneapolis, MN, USA.
- Eric Maurer
- Community-University Health Care Center, Minneapolis, MN, USA
- Roli Dwivedi
- Community-University Health Care Center, Minneapolis, MN, USA
- Department of Family Medicine & Community Health, University of Minnesota Medical School, Minneapolis, MN, USA
|
24
|
Ling XC, Chen HSL, Yeh PH, Cheng YC, Huang CY, Shen SC, Lee YS. Deep Learning in Glaucoma Detection and Progression Prediction: A Systematic Review and Meta-Analysis. Biomedicines 2025; 13:420. [PMID: 40002833 PMCID: PMC11852503 DOI: 10.3390/biomedicines13020420] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/25/2024] [Revised: 12/21/2024] [Accepted: 02/06/2025] [Indexed: 02/27/2025] Open
Abstract
Purpose: To evaluate the performance of deep learning (DL) in diagnosing glaucoma and predicting its progression using fundus photography and retinal optical coherence tomography (OCT) images. Materials and Methods: Relevant studies published up to 30 October 2024 were retrieved from PubMed, Medline, EMBASE, Cochrane Library, Web of Science, and ClinicalKey. A bivariate random-effects model was employed to calculate pooled sensitivity, specificity, positive and negative likelihood ratios, and area under the receiver operating characteristic curve (AUROC). Results: A total of 48 studies were included in the meta-analysis. DL algorithms demonstrated high diagnostic performance in glaucoma detection using fundus photography and OCT images. For fundus photography, the pooled sensitivity and specificity were 0.92 (95% CI: 0.89-0.94) and 0.93 (95% CI: 0.90-0.95), respectively, with an AUROC of 0.90 (95% CI: 0.88-0.92). For the OCT imaging, the pooled sensitivity and specificity were 0.90 (95% CI: 0.84-0.94) and 0.87 (95% CI: 0.81-0.91), respectively, with an AUROC of 0.86 (95% CI: 0.83-0.90). In predicting glaucoma progression, DL models generally showed less robust performance, with pooled sensitivities and specificities ranging lower than in diagnostic tasks. Internal validation datasets showed higher accuracy than external validation datasets. Conclusions: DL algorithms achieve excellent performance in diagnosing glaucoma using fundus photography and OCT imaging. To enhance the prediction of glaucoma progression, future DL models should integrate multimodal data, including functional assessments, such as visual field measurements, and undergo extensive validation in real-world clinical settings.
Affiliation(s)
- Xiao Chun Ling
- Department of Ophthalmology, Chang Gung Memorial Hospital, Linkou, Taoyuan 333, Taiwan
- Graduate Institute of Clinical Medical Sciences, Chang Gung University, Taoyuan 333, Taiwan
- College of Medicine, Chang Gung University, Taoyuan 333, Taiwan
- Henry Shen-Lih Chen
- Department of Ophthalmology, Chang Gung Memorial Hospital, Linkou, Taoyuan 333, Taiwan
- Po-Han Yeh
- Department of Ophthalmology, New Taipei Municipal Tucheng Hospital, New Taipei 236, Taiwan
- Yu-Chun Cheng
- Department of Ophthalmology, Chang Gung Memorial Hospital, Linkou, Taoyuan 333, Taiwan
- Chu-Yen Huang
- Department of Ophthalmology, Chang Gung Memorial Hospital, Linkou, Taoyuan 333, Taiwan
- Su-Chin Shen
- Department of Ophthalmology, Chang Gung Memorial Hospital, Linkou, Taoyuan 333, Taiwan
- Yung-Sung Lee
- Department of Ophthalmology, Chang Gung Memorial Hospital, Linkou, Taoyuan 333, Taiwan
- College of Medicine, Chang Gung University, Taoyuan 333, Taiwan
- Department of Ophthalmology, New Taipei Municipal Tucheng Hospital, New Taipei 236, Taiwan
|
25
|
Gill SS, Ponniah HS, Giersztein S, Anantharaj RM, Namireddy SR, Killilea J, Ramsay D, Salih A, Thavarajasingam A, Scurtu D, Jankovic D, Russo S, Kramer A, Thavarajasingam SG. The diagnostic and prognostic capability of artificial intelligence in spinal cord injury: A systematic review. BRAIN & SPINE 2025; 5:104208. [PMID: 40027293 PMCID: PMC11871462 DOI: 10.1016/j.bas.2025.104208] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2024] [Revised: 01/20/2025] [Accepted: 02/04/2025] [Indexed: 03/05/2025]
Abstract
Background Artificial intelligence (AI) models have shown potential for diagnosing and prognosticating traumatic spinal cord injury (tSCI), but their clinical utility remains uncertain. Methodology The primary aim was to evaluate the performance of AI algorithms in diagnosing and prognosticating tSCI. A systematic search of seven databases identified studies evaluating AI models. PROBAST and TRIPOD tools were used to assess the quality and reporting of included studies (PROSPERO: CRD42023464722). Fourteen studies, comprising 20 models and 280,817 pooled imaging datasets, were included. Analysis was conducted in line with the SWiM guidelines. Results For prognostication, 11 studies predicted outcomes including AIS improvement (30%), mortality and ambulatory ability (20% each), and discharge or length of stay (10%). The mean AUC was 0.770 (range: 0.682-0.902), indicating moderate predictive performance. Diagnostic models utilising DTI, CT, and T2-weighted MRI with CNN-based segmentation achieved a weighted mean accuracy of 0.898 (range: 0.813-0.938), outperforming prognostic models. Conclusion AI demonstrates strong diagnostic accuracy (mean accuracy: 0.898) and moderate prognostic capability (mean AUC: 0.770) for tSCI. However, the lack of standardised frameworks and external validation limits clinical applicability. Future models should integrate multimodal data, including imaging, patient characteristics, and clinician judgment, to improve utility and alignment with clinical practice.
Affiliation(s)
- Saran Singh Gill
- Imperial Brain & Spine Initiative, Imperial College London, London, United Kingdom
- Faculty of Medicine, Imperial College London, London, United Kingdom
- Hariharan Subbiah Ponniah
- Imperial Brain & Spine Initiative, Imperial College London, London, United Kingdom
- Faculty of Medicine, Imperial College London, London, United Kingdom
- Sho Giersztein
- Imperial Brain & Spine Initiative, Imperial College London, London, United Kingdom
- Srikar Reddy Namireddy
- Imperial Brain & Spine Initiative, Imperial College London, London, United Kingdom
- Faculty of Medicine, Imperial College London, London, United Kingdom
- Joshua Killilea
- Imperial Brain & Spine Initiative, Imperial College London, London, United Kingdom
- Daniele S.C. Ramsay
- Imperial Brain & Spine Initiative, Imperial College London, London, United Kingdom
- Ahmed Salih
- Imperial Brain & Spine Initiative, Imperial College London, London, United Kingdom
- Daniel Scurtu
- Department of Neurosurgery, Universitätsmedizin Mainz, Mainz, Germany
- Dragan Jankovic
- Department of Neurosurgery, LMU University Hospital, LMU, Munich, Germany
- Salvatore Russo
- Imperial College Healthcare NHS Trust, London, United Kingdom
- Andreas Kramer
- Department of Neurosurgery, LMU University Hospital, LMU, Munich, Germany
- Santhosh G. Thavarajasingam
- Imperial Brain & Spine Initiative, Imperial College London, London, United Kingdom
- Department of Neurosurgery, LMU University Hospital, LMU, Munich, Germany
|
26
|
Van den Eynde R, Vrancken A, Foubert R, Tuand K, Vandendriessche T, Schrijvers A, Verbrugghe P, Devos T, Van Calster B, Rex S. Prognostic models for prediction of perioperative allogeneic red blood cell transfusion in adult cardiac surgery: A systematic review and meta-analysis. Transfusion 2025; 65:397-409. [PMID: 39726297 PMCID: PMC11826302 DOI: 10.1111/trf.18108] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2024] [Revised: 12/04/2024] [Accepted: 12/04/2024] [Indexed: 12/28/2024]
Abstract
OBJECTIVES Identifying cardiac surgical patients at risk of requiring red blood cell (RBC) transfusion is crucial for optimizing their outcome. We critically appraised prognostic models preoperatively predicting perioperative exposure to RBC transfusion in adult cardiac surgery and summarized model performance. METHODS Design: Systematic review and meta-analysis. Study eligibility criteria: Studies developing and/or externally validating models preoperatively predicting perioperative RBC transfusion in adult cardiac surgery. Information sources: MEDLINE, CENTRAL & CDSR, Embase, Transfusion Evidence Library, Web of Science, Scopus, ClinicalTrials.gov, and WHO ICTRP. Risk of bias and applicability: Quality of reporting was assessed with the Transparent Reporting of studies on prediction models for Individual Prognosis or Diagnosis (TRIPOD) adherence form, and risk of bias and applicability with the Prediction model Risk of Bias ASsessment Tool. Synthesis methods: Random-effects meta-analyses of concordance-statistics and total observed:expected ratios for models externally validated ≥5 times. RESULTS Nine model development and 27 external validation studies were included. The average TRIPOD adherence score was 66.4% (range 44.1%-85.2%). All studies but 1 were rated high risk of bias. For TRUST and TRACK, the only models externally validated ≥5 times, summary c-statistics were 0.74 (95% CI: 0.65-0.84; 6 contributing studies) and 0.72 (95% CI: 0.68-0.75; 5 contributing studies), respectively, and summary total observed:expected ratios were 0.86 (95% CI: 0.71-1.05; 5 contributing studies) and 0.94 (95% CI: 0.74-1.19; 5 contributing studies), respectively. Considerable heterogeneity was observed in all meta-analyses. DISCUSSION Future high-quality external validation and model updating studies that strictly adhere to reporting guidelines are warranted.
Affiliation(s)
- Raf Van den Eynde
- Department of Cardiovascular Sciences, Unit Anesthesiology and Algology, Biomedical Sciences Group, University of Leuven (KU Leuven), Leuven, Belgium
- Annemarie Vrancken
- Department of Cardiovascular Sciences, Unit Anesthesiology and Algology, Biomedical Sciences Group, University of Leuven (KU Leuven), Leuven, Belgium
- Ruben Foubert
- Department of Cardiovascular Sciences, Unit Anesthesiology and Algology, Biomedical Sciences Group, University of Leuven (KU Leuven), Leuven, Belgium
- Krizia Tuand
- KU Leuven Libraries ‐ 2Bergen ‐ Learning Centre Désiré Collen, Leuven, Belgium
- An Schrijvers
- Department of Cardiovascular Sciences, Unit Anesthesiology and Algology, Biomedical Sciences Group, University of Leuven (KU Leuven), Leuven, Belgium
- Peter Verbrugghe
- Department of Cardiovascular Sciences, Unit Cardiac surgery, Biomedical Sciences Group, University of Leuven (KU Leuven), Leuven, Belgium
- Timothy Devos
- Department of Hematology, University Hospitals Leuven, and Department of Microbiology and Immunology, Laboratory of Molecular Immunology (Rega Institute), University of Leuven (KU Leuven), Leuven, Belgium
- Ben Van Calster
- Department of Development and Regeneration, Unit Woman and Child, University of Leuven (KU Leuven), Leuven, Belgium
- Steffen Rex
- Department of Cardiovascular Sciences, Unit Anesthesiology and Algology, Biomedical Sciences Group, University of Leuven (KU Leuven), Leuven, Belgium
|
27
|
van der Meijden SL, van Boekel AM, Schinkelshoek LJ, van Goor H, Steyerberg EW, Nelissen RG, Mesotten D, Geerts BF, de Boer MG, Arbous MS. Development and validation of artificial intelligence models for early detection of postoperative infections (PERISCOPE): a multicentre study using electronic health record data. THE LANCET REGIONAL HEALTH. EUROPE 2025; 49:101163. [PMID: 39720095 PMCID: PMC11667051 DOI: 10.1016/j.lanepe.2024.101163] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2024] [Revised: 11/20/2024] [Accepted: 11/21/2024] [Indexed: 12/26/2024]
Abstract
Background Postoperative infections significantly impact patient outcomes and costs, exacerbated by late diagnoses, yet early reliable predictors are scarce. Existing artificial intelligence (AI) models for postoperative infection prediction often lack external validation or perform poorly in local settings when validated. We aimed to develop locally valid models as part of the PERISCOPE AI system to enable early detection, safer discharge, and more timely treatment of patients. Methods We developed and validated XGBoost models to predict postoperative infections within 7 and 30 days of surgery. Using retrospective pre-operative and intra-operative electronic health record data from 2014 to 2023 across various surgical specialities, the models were developed at Hospital A and validated and updated at Hospitals B and C in the Netherlands and Belgium. Model performance was evaluated before and after updating using the two most recent years of data as temporal validation datasets. Main outcome measures were model discrimination (area under the receiver operating characteristic curve (AUROC)), calibration (slope, intercept, and plots), and clinical utility (decision curve analysis with net benefit). Findings The study included 253,010 surgical procedures with 23,903 infections within 30 days. Discriminative performance, calibration properties, and clinical utility significantly improved after updating. Final AUROCs after updating for Hospitals A, B, and C were 0.82 (95% confidence interval (CI) 0.81-0.83), 0.82 (95% CI 0.81-0.83), and 0.91 (95% CI 0.90-0.91), respectively, for 30-day predictions on the temporal validation datasets (2022-2023). Calibration plots demonstrated adequate correspondence between observed outcomes and predicted risk. All local models were deemed clinically useful as the net benefit was higher than default strategies (treat all and treat none) over a wide range of clinically relevant decision thresholds. Interpretation PERISCOPE can accurately predict overall postoperative infections within 7 and 30 days post-surgery. The robust performance implies potential for improving clinical care in diverse clinical target populations. This study supports the need for approaches to local updating of AI models to account for domain shifts in patient populations and data distributions across different clinical settings. Funding This study was funded by a REACT EU grant from European Regional Development Fund (ERDF) and Kansen voor West.
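The net-benefit comparison against the "treat all" and "treat none" defaults can be illustrated with the standard decision-curve formula, net benefit = TP/n − (FP/n) × pt/(1 − pt), where pt is the decision threshold. The sketch below is a minimal illustration under that formula; the function names and the toy data are assumptions made for this example and are not from the PERISCOPE study.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on patients whose predicted risk is >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    weight = threshold / (1.0 - threshold)  # harm of a false positive relative to a true positive
    return tp / n - (fp / n) * weight

def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the default strategy that treats every patient."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)

# Toy outcomes and predicted risks (illustrative only).
y_true = [1, 0, 0, 1, 0, 0, 0, 0, 1, 0]
y_prob = [0.80, 0.10, 0.05, 0.60, 0.30, 0.02, 0.15, 0.08, 0.40, 0.25]

for pt in (0.1, 0.2, 0.3):
    model_nb = net_benefit(y_true, y_prob, pt)
    all_nb = net_benefit_treat_all(y_true, pt)
    # "Treat none" always has a net benefit of 0.
    print(pt, round(model_nb, 3), round(all_nb, 3), 0.0)
```

A model is considered useful at a given threshold when its net benefit exceeds both defaults, which is the criterion the abstract applies across a range of thresholds.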
Affiliation(s)
- Siri L. van der Meijden
- Intensive Care Unit, Leiden University Medical Centre, Leiden, the Netherlands
- Healthplus.ai B.V., Amsterdam, the Netherlands
- Anna M. van Boekel
- Intensive Care Unit, Leiden University Medical Centre, Leiden, the Netherlands
- Harry van Goor
- General Surgery Department, Radboud University Medical Centre, Nijmegen, the Netherlands
- Ewout W. Steyerberg
- Department of Biomedical Data Sciences, Leiden University Medical Centre, Leiden, the Netherlands
- Rob G.H.H. Nelissen
- Department of Orthopaedics, Leiden University Medical Centre, Leiden, the Netherlands
- Dieter Mesotten
- Department of Anaesthesiology, Intensive Care Medicine, Ziekenhuis Oost-Limburg, Genk, Belgium
- Faculty of Medicine and Life Sciences, Limburg Clinical Research Centre, UHasselt, Diepenbeek, Belgium
- Mark G.J. de Boer
- Department of Infectious Diseases, Leiden University Medical Centre, Leiden, the Netherlands
- M. Sesmu Arbous
- Intensive Care Unit, Leiden University Medical Centre, Leiden, the Netherlands
|
28
|
Memedovich A, Steele B, Orr T, Chaudhry S, Tadrous M, Kesselheim AS, Hollis A, Beall RF. Predicting patent challenges for small-molecule drugs: A cross-sectional study. PLoS Med 2025; 22:e1004540. [PMID: 39937776 PMCID: PMC11867330 DOI: 10.1371/journal.pmed.1004540] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2024] [Revised: 02/27/2025] [Accepted: 01/22/2025] [Indexed: 02/14/2025] Open
Abstract
BACKGROUND The high cost of prescription drugs in the United States is maintained by brand-name manufacturers' competition-free period, made possible in part through patent protection, which generic competitors must challenge to enter the market early. Understanding the predictors of these challenges can inform policy development to encourage timely generic competition. Identifying categories of drugs systematically overlooked by challengers, such as those with low market size, highlights gaps where unchecked patent quality and high prices persist, and can help design policy interventions that promote timely patient access to generic drugs, including enhanced patent scrutiny or incentives for challenges. Our objective was to characterize and assess the extent to which market size and other drug characteristics can predict patent challenges for brand-name drugs. METHODS AND FINDINGS This cross-sectional study included new patented small-molecule drugs approved by the FDA from 2007 to 2018. Market size, patent, and patent challenge data came from IQVIA MIDAS pharmaceutical quarterly sales data, the FDA's Orange Book database, and the FDA's Paragraph IV list. Predictive models were constructed using random forest and elastic net classification. The primary outcome was the occurrence of a patent challenge within the first year of eligibility. Of the 210 new small-molecule drugs included in the sample, 55% experienced initiation of a patent challenge within the first year of eligibility. Market value was the most important predictor variable, with larger markets being more likely to be associated with patent challenges. Drugs in the anti-infective therapeutic class or those with fast-track approval were less likely to be challenged. The limitations of this work arise from the exclusion of variables that were not readily available publicly, will be the target of future research, or were deemed beyond the scope of this project. 
CONCLUSIONS Generic competition does not occur with the same timeliness across all drug markets, which can leave granted patents of questionable merit in place and sustain high brand-name drug prices. Predictive models may help direct limited resources for post-grant patent validity review and adjust policy when generic competition is lacking.
Affiliation(s)
- Ally Memedovich
- Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada
- Brian Steele
- Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada
- Taylor Orr
- Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada
- Shanzeh Chaudhry
- Leslie Dan Faculty of Pharmacy, University of Toronto, Toronto, Ontario, Canada
- Mina Tadrous
- Leslie Dan Faculty of Pharmacy, University of Toronto, Toronto, Ontario, Canada
- Aaron S. Kesselheim
- Division of Pharmacoepidemiology and Pharmacoeconomics, Brigham and Women’s Hospital and Harvard Medical School, Boston, Massachusetts, United States of America
- Aidan Hollis
- Department of Economics, University of Calgary, Calgary, Alberta, Canada
- Reed F. Beall
- Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada
|
29
|
Shamsutdinova D, Stamate D, Stahl D. Balancing accuracy and Interpretability: An R package assessing complex relationships beyond the Cox model and applications to clinical prediction. Int J Med Inform 2025; 194:105700. [PMID: 39546831 DOI: 10.1016/j.ijmedinf.2024.105700] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2024] [Accepted: 11/08/2024] [Indexed: 11/17/2024]
Abstract
BACKGROUND Accurate and interpretable models are essential for clinical decision-making, where predictions can directly impact patient care. Machine learning (ML) survival methods can handle complex multidimensional data and achieve high accuracy but require post-hoc explanations. Traditional models such as the Cox Proportional Hazards Model (Cox-PH) are less flexible, but fast, stable, and intrinsically transparent. Moreover, ML does not always outperform Cox-PH in clinical settings, warranting diligent model validation. We aimed to develop a set of R functions to help explore the limits of Cox-PH compared to tree-based and deep learning survival models for clinical prediction modelling, employing ensemble learning and nested cross-validation. METHODS We developed a set of R functions, publicly available as the package "survcompare". It supports Cox-PH and Cox-Lasso as the traditional models and Survival Random Forest (SRF) and DeepHit as the ML alternatives, along with ensemble methods that integrate Cox-PH with SRF or DeepHit and are designed to isolate the marginal value of ML. The package performs repeated nested cross-validation and tests the statistical significance of ML's superiority using survival-specific performance metrics: the concordance index, time-dependent AUC-ROC, and calibration slope. To get practical insights, we applied this methodology to clinical and simulated datasets with varying complexities and sizes. RESULTS In simulated data with non-linearities or interactions, ML models outperformed Cox-PH at sample sizes ≥ 500. ML superiority was also observed in imaging and high-dimensional clinical data. However, for tabular clinical data, the performance gains of ML were minimal; in some cases, regularised Cox-Lasso recovered much of ML's performance advantage with significantly faster computations. Ensemble methods combining Cox-PH and ML predictions were instrumental in quantifying Cox-PH's limits and improving ML calibration. Traditional models like Cox-PH or Cox-Lasso should not be overlooked when developing clinical predictive models from tabular data or data of limited size. CONCLUSION Our package offers researchers a framework and practical tool for evaluating the accuracy-interpretability trade-off, helping make informed decisions about model selection.
Affiliation(s)
- Diana Shamsutdinova
- Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom.
- Daniel Stamate
- Data Science and Soft Computing Lab, Computing Department, Goldsmiths University of London, United Kingdom; School of Health Sciences, University of Manchester, Manchester, United Kingdom
- Daniel Stahl
- Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom
|
30
|
Meijerink LM, Dunias ZS, Leeuwenberg AM, de Hond AAH, Jenkins DA, Martin GP, Sperrin M, Peek N, Spijker R, Hooft L, Moons KGM, van Smeden M, Schuit E. Updating methods for artificial intelligence-based clinical prediction models: a scoping review. J Clin Epidemiol 2025; 178:111636. [PMID: 39662644 DOI: 10.1016/j.jclinepi.2024.111636] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2024] [Revised: 12/02/2024] [Accepted: 12/03/2024] [Indexed: 12/13/2024]
Abstract
OBJECTIVES To give an overview of methods for updating artificial intelligence (AI)-based clinical prediction models based on new data. STUDY DESIGN AND SETTING We comprehensively searched Scopus and Embase up to August 2022 for articles that addressed developments, descriptions, or evaluations of prediction model updating methods. We specifically focused on articles in the medical domain involving AI-based prediction models that were updated based on new data, excluding regression-based updating methods as these have been extensively discussed elsewhere. We categorized and described the identified methods used to update the AI-based prediction model as well as the use cases in which they were used. RESULTS We included 78 articles. The majority of the included articles discussed updating for neural network methods (93.6%) with medical images as input data (65.4%). In many articles (51.3%) existing, pretrained models for broad tasks were updated to perform specialized clinical tasks. Other common reasons for model updating were to address changes in the data over time and cross-center differences; however, more unique use cases were also identified, such as updating a model from a broad population to a specific individual. We categorized the identified model updating methods into four categories: neural network-specific methods (described in 92.3% of the articles), ensemble-specific methods (2.5%), model-agnostic methods (9.0%), and other (1.3%). Variations of neural network-specific methods are further categorized based on the following: (1) the part of the original neural network that is kept, (2) whether and how the original neural network is extended with new parameters, and (3) to what extent the original neural network parameters are adjusted to the new data. 
The most frequently occurring method (n = 30) involved selecting the first layer(s) of an existing neural network, appending new, randomly initialized layers, and then optimizing the entire neural network. CONCLUSION We identified many ways to adjust or update AI-based prediction models based on new data, within a large variety of use cases. Updating methods for AI-based prediction models other than neural networks (eg, random forest) appear to be underexplored in clinical prediction research. PLAIN LANGUAGE SUMMARY AI-based prediction models are increasingly used in health care, helping clinicians with diagnosing diseases, guiding treatment decisions, and informing patients. However, these prediction models do not always work well when applied to hospitals, patient populations, or times different from those used to develop the models. Developing new models for every situation is neither practical nor desired, as it wastes resources, time, and existing knowledge. A more efficient approach is to adjust existing models to new contexts ('updating'), but there is limited guidance on how to do this for AI-based clinical prediction models. To address this, we reviewed 78 studies in detail to understand how researchers are currently updating AI-based clinical prediction models, and the types of situations in which these updating methods are used. Our findings provide a comprehensive overview of the available methods to update existing models. This is intended to serve as guidance and inspiration for researchers. Ultimately, this can lead to better reuse of existing models and improve the quality and efficiency of AI-based prediction models in health care.
Affiliation(s)
- Lotta M Meijerink
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands.
| | - Zoë S Dunias
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
| | - Artuur M Leeuwenberg
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
| | - Anne A H de Hond
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
| | - David A Jenkins
- Division of Informatics, Imaging and Data Sciences, University of Manchester, Manchester, United Kingdom
| | - Glen P Martin
- Division of Informatics, Imaging and Data Sciences, University of Manchester, Manchester, United Kingdom
| | - Matthew Sperrin
- Division of Informatics, Imaging and Data Sciences, University of Manchester, Manchester, United Kingdom
| | - Niels Peek
- Department of Public Health and Primary Care, The Healthcare Improvement Studies Institute, University of Cambridge, Cambridge, United Kingdom
| | - René Spijker
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
| | - Lotty Hooft
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
| | - Karel G M Moons
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
| | - Maarten van Smeden
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
| | - Ewoud Schuit
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
| |
|
31
|
Bindels BJJ, Kuijten RH, Groot OQ, Huele EH, Gal R, de Groot MCH, van der Velden JM, Delawi D, Schwab JH, Verkooijen HM, Verlaan JJ, Tobert D, Rutges JPHJ. External validation of twelve existing survival prediction models for patients with spinal metastases. Spine J 2025:S1529-9430(25)00063-4. [PMID: 39894281 DOI: 10.1016/j.spinee.2025.01.014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2024] [Revised: 12/19/2024] [Accepted: 01/20/2025] [Indexed: 02/04/2025]
Abstract
BACKGROUND CONTEXT Survival prediction models for patients with spinal metastases may inform patients and clinicians in shared decision-making. PURPOSE To externally validate all existing survival prediction models for patients with spinal metastases. DESIGN Prospective cohort study using retrospective data. PATIENT SAMPLE 953 patients. OUTCOME MEASURES Survival in months, area under the curve (AUC), and calibration intercept and slope. METHOD This study included patients with spinal metastases referred to a single tertiary referral center between 2016 and 2021. Twelve models for predicting 3-, 6-, and 12-month survival were externally validated: Bollen, Mizumoto, Modified Bauer, New England Spinal Metastasis Score, Original Bauer, Oswestry Spinal Risk Index (OSRI), PathFx, Revised Katagiri, Revised Tokuhashi, Skeletal Oncology Research Group Machine Learning Algorithm (SORG-MLA), Tomita, and Van der Linden. Discrimination was assessed using the AUC and calibration using the intercept and slope. Calibration was considered appropriate if calibration measures were close to their ideal values with narrow confidence intervals. RESULTS In total, 953 patients were included. Survival was 76.4% at 3 months (728/953), 62.2% at 6 months (593/953), and 50.3% at 12 months (479/953). Revised Katagiri yielded AUCs of 0.79 (95% CI, 0.76-0.82) to 0.81 (95% CI, 0.79-0.84), Bollen yielded AUCs of 0.76 (95% CI, 0.73-0.80) to 0.77 (95% CI, 0.75-0.80), and OSRI yielded AUCs of 0.75 (95% CI, 0.72-0.78) to 0.77 (95% CI, 0.74-0.79). The other 9 prediction models yielded AUCs ranging from 0.59 (95% CI, 0.55-0.63) to 0.76 (95% CI, 0.74-0.79). None of the twelve models yielded appropriate calibration. CONCLUSIONS Twelve survival prediction models for patients with spinal metastases yielded poor to fair discrimination and poor calibration. Survival prediction models may inform decision-making in patients with spinal metastases, provided that recalibration using recent patient data is performed.
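The calibration intercept and slope mentioned above are commonly obtained by regressing the observed binary outcome on the logit of the predicted probability, with ideal values of 0 and 1. The sketch below fits that two-parameter logistic model by Newton-Raphson using only the standard library. It is an illustrative assumption about one common convention (joint estimation of intercept and slope), not the procedure of the study, and the toy data are hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def calibration_intercept_slope(outcomes, predicted, n_iter=50, tol=1e-10):
    """Fit outcome ~ intercept + slope * logit(predicted) by Newton-Raphson."""
    x = [logit(p) for p in predicted]
    a, b = 0.0, 1.0  # start at the ideal values
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, outcomes):
            mu = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = mu * (1.0 - mu)
            r = yi - mu
            g0 += r              # score for the intercept
            g1 += r * xi         # score for the slope
            h00 += w             # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a += da
        b += db
        if abs(da) < tol and abs(db) < tol:
            break
    return a, b


# Hypothetical cohort: three groups of 10 patients with predicted risks of
# 0.1, 0.5 and 0.9, but observed event counts of 2, 5 and 8.
predicted = [0.1] * 10 + [0.5] * 10 + [0.9] * 10
outcomes = [1] * 2 + [0] * 8 + [1] * 5 + [0] * 5 + [1] * 8 + [0] * 2
intercept, slope = calibration_intercept_slope(outcomes, predicted)
print(round(intercept, 3), round(slope, 3))
```

In this toy cohort the predictions are too extreme (0.1 and 0.9 predicted where 0.2 and 0.8 are observed), so the fitted slope is below 1 (about 0.63) while the intercept stays at 0. A slope below 1 of this kind is what recalibration is intended to correct.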
Affiliation(s)
- B J J Bindels
- Department of Orthopedic Surgery, University Medical Center Utrecht, Heidelberglaan 100, 3584 CX, Utrecht, Utrecht, The Netherlands
| | - R H Kuijten
- Department of Orthopedic Surgery, University Medical Center Utrecht, Heidelberglaan 100, 3584 CX, Utrecht, Utrecht, The Netherlands
| | - O Q Groot
- Department of Orthopedic Surgery, University Medical Center Utrecht, Heidelberglaan 100, 3584 CX, Utrecht, Utrecht, The Netherlands
| | - E H Huele
- Division of Imaging and Oncology, Utrecht University, University Medical Center Utrecht, Heidelberglaan 100, 3584 CX, Utrecht, Utrecht, The Netherlands
| | - R Gal
- Division of Imaging and Oncology, Utrecht University, University Medical Center Utrecht, Heidelberglaan 100, 3584 CX, Utrecht, Utrecht, The Netherlands
| | - M C H de Groot
- Central Diagnostic Library, University Medical Center Utrecht, Heidelberglaan 100, 3584 CX, Utrecht, Utrecht, The Netherlands
| | - J M van der Velden
- Division of Imaging and Oncology, Utrecht University, University Medical Center Utrecht, Heidelberglaan 100, 3584 CX, Utrecht, Utrecht, The Netherlands
| | - D Delawi
- Department of Orthopedic Surgery, Antonius Medical Center, Koekoekslaan 1, 3435 CM, Nieuwegein, Utrecht, The Netherlands
| | - J H Schwab
- Department of Orthopedic Surgery, Cedars-Sinai Medical Center, 8700 Beverly Boulevard, Los Angeles, CA, USA
| | - H M Verkooijen
- Division of Imaging and Oncology, Utrecht University, University Medical Center Utrecht, Heidelberglaan 100, 3584 CX, Utrecht, Utrecht, The Netherlands; Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Heidelberglaan 100, 3584 CX, Utrecht, Utrecht, The Netherlands
| | - J J Verlaan
- Department of Orthopedic Surgery, University Medical Center Utrecht, Heidelberglaan 100, 3584 CX, Utrecht, Utrecht, The Netherlands; Division of Imaging and Oncology, Utrecht University, University Medical Center Utrecht, Heidelberglaan 100, 3584 CX, Utrecht, Utrecht, The Netherlands
| | - D Tobert
- Department of Orthopedic Surgery, Massachusetts General Hospital, 55 Fruit Street, Boston, MA 02114, USA
| | - J P H J Rutges
- Department of Orthopedics and Sports Medicine, Erasmus Medical Center, Doctor Molewaterplein 40, 3015 GD, Rotterdam, Zuid-Holland, The Netherlands.
| |
|
32
|
Tack B, Vita D, Mbuyamba J, Ntangu E, Vuvu H, Kahindo I, Ngina J, Luyindula A, Nama N, Mputu T, Im J, Jeon H, Marks F, Toelen J, Lunguya O, Jacobs J, Van Calster B. Developing a clinical prediction model to modify empirical antibiotics for non-typhoidal Salmonella bloodstream infection in children under-five in the Democratic Republic of Congo. BMC Infect Dis 2025; 25:122. [PMID: 39871187 PMCID: PMC11771121 DOI: 10.1186/s12879-024-10319-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2023] [Accepted: 12/05/2024] [Indexed: 01/29/2025] Open
Abstract
BACKGROUND Non-typhoidal Salmonella (NTS) frequently cause bloodstream infection in children under-five in sub-Saharan Africa, particularly in malaria-endemic areas. Due to increasing drug resistance, NTS are often not covered by standard-of-care empirical antibiotics for severe febrile illness. We developed a clinical prediction model to orient the choice of empirical antibiotics (standard-of-care versus alternative antibiotics) for children admitted to hospital in settings with high proportions of drug-resistant NTS. METHODS Data were collected during a prospective cohort study in children (> 28 days-< 5 years) admitted with severe febrile illness to Kisantu district hospital, DR Congo. The outcome variable was blood culture confirmed NTS bloodstream infection; the comparison group were children without NTS bloodstream infection. Predictors were selected a priori based on systematic literature review. The prediction model was developed with multivariable logistic regression; a simplified scoring system was derived. Internal validation to estimate optimism-corrected performance was performed using bootstrapping and net benefits were calculated to evaluate clinical usefulness. RESULTS NTS bloodstream infection was diagnosed in 12.7% (295/2327) of enrolled children. The area under the curve was 0.79 (95%CI: 0.76-0.82) for the prediction model, and 0.78 (0.85-0.80) for the scoring system. The estimated calibration slopes were 0.95 (model) and 0.91 (scoring system). At a decision threshold of 20% NTS risk, the prediction model and scoring system had 57% and 53% sensitivity, and 85% specificity. The net benefit for decision thresholds < 30% ranged from 2.4 to 3.9 per 100 children. CONCLUSION The model predicts NTS bloodstream infection and can support the choice of empiric antibiotics to include coverage of drug-resistant NTS, in particular for decision thresholds < 30%. External validation studies are needed to investigate generalizability. 
TRIAL REGISTRATION DeNTS study, clinicaltrials.gov: NCT04473768 (registration 16/07/2020) and TreNTS study, clinicaltrials.gov: NCT04850677 (registration 20/04/2021).
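The net benefit reported above can be approximated from the rounded figures in the abstract using the standard decision-curve formula, net benefit = TP/n - (FP/n) × pt/(1 - pt), where pt is the decision threshold. The sketch below is a rough illustration only: the sensitivity and specificity are the rounded published values, and the abstract does not state exactly how or against which reference strategy its range was computed, so exact agreement should not be expected.

```python
def net_benefit(sensitivity, specificity, prevalence, threshold):
    """Net benefit per patient at a given risk threshold (decision-curve formula)."""
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1.0 - specificity) * (1.0 - prevalence)
    # False positives are weighted by the odds of the threshold.
    return true_positive_rate - false_positive_rate * threshold / (1.0 - threshold)


prevalence = 295 / 2327          # NTS bloodstream infection, from the abstract
threshold = 0.20                 # 20% NTS risk

nb_model = net_benefit(0.57, 0.85, prevalence, threshold)
nb_score = net_benefit(0.53, 0.85, prevalence, threshold)
nb_treat_all = net_benefit(1.0, 0.0, prevalence, threshold)

print(f"model:         {100 * nb_model:.1f} per 100 children")
print(f"scoring system:{100 * nb_score:.1f} per 100 children")
print(f"treat all:     {100 * nb_treat_all:.1f} per 100 children")
```

With these inputs the model's net benefit comes out at roughly 4 per 100 children, close to the upper end of the reported range, while treating every child is net harmful at this threshold because the prevalence (12.7%) is below the 20% threshold.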
Affiliation(s)
- Bieke Tack
- Department of Clinical Sciences, Institute of Tropical Medicine, Nationalestraat 155, 2000, Antwerp, Belgium.
- Department of Microbiology, Immunology and Transplantation, KU Leuven, Louvain, Belgium.
- Department of Pediatrics, University Hospitals Leuven, Louvain, Belgium.
| | - Daniel Vita
- Saint Luc Hôpital Général de Référence Kisantu, Kisantu, Democratic Republic of Congo
| | - Jules Mbuyamba
- Department of Microbiology, Institut National de Recherche Biomédicale, Kinshasa, Democratic Republic of Congo
- Department of Medical Biology, University Teaching Hospital of Kinshasa, Kinshasa, Democratic Republic of Congo
| | - Emmanuel Ntangu
- Saint Luc Hôpital Général de Référence Kisantu, Kisantu, Democratic Republic of Congo
| | - Hornela Vuvu
- Saint Luc Hôpital Général de Référence Kisantu, Kisantu, Democratic Republic of Congo
| | - Immaculée Kahindo
- Department of Microbiology, Institut National de Recherche Biomédicale, Kinshasa, Democratic Republic of Congo
| | - Japhet Ngina
- Saint Luc Hôpital Général de Référence Kisantu, Kisantu, Democratic Republic of Congo
| | - Aimée Luyindula
- Saint Luc Hôpital Général de Référence Kisantu, Kisantu, Democratic Republic of Congo
| | - Naomie Nama
- Saint Luc Hôpital Général de Référence Kisantu, Kisantu, Democratic Republic of Congo
| | - Tito Mputu
- Saint Luc Hôpital Général de Référence Kisantu, Kisantu, Democratic Republic of Congo
| | - Justin Im
- International Vaccine Institute, Seoul, Republic of Korea
| | - Hyonjin Jeon
- International Vaccine Institute, Seoul, Republic of Korea
| | - Florian Marks
- International Vaccine Institute, Seoul, Republic of Korea
- Cambridge Institute of Therapeutic Immunology and Infectious Disease, School of Clinical Medicine, University of Cambridge, Cambridge, UK
- Heidelberg Institute of Global Health, University of Heidelberg, Heidelberg, Germany
- Madagascar Institute for Vaccine Research, University of Antananarivo, Antananarivo, Madagascar
| | - Jaan Toelen
- Department of Pediatrics, University Hospitals Leuven, Louvain, Belgium
- Department of Development and Regeneration, KU Leuven, Louvain, Belgium
| | - Octavie Lunguya
- Department of Microbiology, Institut National de Recherche Biomédicale, Kinshasa, Democratic Republic of Congo
- Department of Medical Biology, University Teaching Hospital of Kinshasa, Kinshasa, Democratic Republic of Congo
| | - Jan Jacobs
- Department of Clinical Sciences, Institute of Tropical Medicine, Nationalestraat 155, 2000, Antwerp, Belgium
- Department of Microbiology, Immunology and Transplantation, KU Leuven, Louvain, Belgium
| | - Ben Van Calster
- Department of Development and Regeneration, KU Leuven, Louvain, Belgium
- Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, the Netherlands
- EPI-Center, KU Leuven, Louvain, Belgium
| |
|
33
|
Clift AK. How Outcome Prediction Could Aid Clinical Practice. Br J Hosp Med (Lond) 2025; 86:1-6. [PMID: 39862035 DOI: 10.12968/hmed.2024.0781] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/27/2025]
Abstract
Predictive algorithms have myriad potential clinical decision-making implications from prognostic counselling to improving clinical trial efficiency. Large observational (or "real world") cohorts are a common data source for the development and evaluation of such tools. There is significant optimism regarding the benefits and use cases for risk-based care, but there is a notable disparity between the volume of clinical prediction models published and implementation into healthcare systems that drive and realise patient benefit. Considering the perspective of a clinician or clinical researcher that may encounter clinical predictive algorithms in the near future as a user or developer, this editorial: (1) discusses the ways in which prediction models built using observational data could inform better clinical decisions; (2) summarises the main steps in producing a model with special focus on key appraisal factors; and (3) highlights recent work driving evolution in the ways that we should conceptualise, build and evaluate these tools.
|
34
|
Hillier B, Scandrett K, Coombe A, Hernandez-Boussard T, Steyerberg E, Takwoingi Y, Velickovic V, Dinnes J. Risk prediction tools for pressure injury occurrence: an umbrella review of systematic reviews reporting model development and validation methods. Diagn Progn Res 2025; 9:2. [PMID: 39806510 PMCID: PMC11730812 DOI: 10.1186/s41512-024-00182-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2024] [Accepted: 12/02/2024] [Indexed: 01/16/2025] Open
Abstract
BACKGROUND Pressure injuries (PIs) place a substantial burden on healthcare systems worldwide. Risk stratification of those who are at risk of developing PIs allows preventive interventions to be focused on patients who are at the highest risk. The considerable number of risk assessment scales and prediction models available underscores the need for a thorough evaluation of their development, validation, and clinical utility. Our objectives were to identify and describe available risk prediction tools for PI occurrence, their content and the development and validation methods used. METHODS The umbrella review was conducted according to Cochrane guidance. MEDLINE, Embase, CINAHL, EPISTEMONIKOS, Google Scholar, and reference lists were searched to identify relevant systematic reviews. The risk of bias was assessed using adapted AMSTAR-2 criteria. Results were described narratively. All included reviews contributed to building a comprehensive list of risk prediction tools. RESULTS We identified 32 eligible systematic reviews, only seven of which described the development and validation of risk prediction tools for PI. Nineteen reviews assessed the prognostic accuracy of the tools and 11 assessed clinical effectiveness. Of the seven reviews reporting model development and validation, six included only machine learning models. Two reviews included external validations of models, although only one review reported any details on external validation methods or results. This was also the only review to report measures of both discrimination and calibration. Five reviews presented measures of discrimination, such as the area under the curve (AUC), sensitivities, specificities, F1 scores, and G-means. For the four reviews that assessed the risk of bias using the PROBAST tool, all models but one were found to be at high or unclear risk of bias. CONCLUSIONS Available tools do not meet current standards for the development or reporting of risk prediction models. 
The majority of tools have not been externally validated. Standardised and rigorous approaches to risk prediction model development and validation are needed. TRIAL REGISTRATION The protocol was registered on the Open Science Framework ( https://osf.io/tepyk ).
Affiliation(s)
- Bethany Hillier
- Department of Applied Health Sciences, College of Medicine and Health, University of Birmingham, Edgbaston, Birmingham, UK
- NIHR Birmingham Biomedical Research Centre, University Hospitals Birmingham NHS Foundation Trust and University of Birmingham, Birmingham, UK
| | - Katie Scandrett
- Department of Applied Health Sciences, College of Medicine and Health, University of Birmingham, Edgbaston, Birmingham, UK
| | - April Coombe
- Department of Applied Health Sciences, College of Medicine and Health, University of Birmingham, Edgbaston, Birmingham, UK
- NIHR Birmingham Biomedical Research Centre, University Hospitals Birmingham NHS Foundation Trust and University of Birmingham, Birmingham, UK
| | | | - Ewout Steyerberg
- Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, The Netherlands
| | - Yemisi Takwoingi
- Department of Applied Health Sciences, College of Medicine and Health, University of Birmingham, Edgbaston, Birmingham, UK
- NIHR Birmingham Biomedical Research Centre, University Hospitals Birmingham NHS Foundation Trust and University of Birmingham, Birmingham, UK
| | - Vladica Velickovic
- Evidence Generation Department, HARTMANN GROUP, Heidenheim, Germany
- Institute of Public Health, Medical, Decision Making and Health Technology Assessment, UMIT, Hall, Tirol, Austria
| | - Jacqueline Dinnes
- Department of Applied Health Sciences, College of Medicine and Health, University of Birmingham, Edgbaston, Birmingham, UK.
- NIHR Birmingham Biomedical Research Centre, University Hospitals Birmingham NHS Foundation Trust and University of Birmingham, Birmingham, UK.
| |
|
35
|
Robledo KP, Marschner IC, Grossmann M, Handelsman DJ, Yeap BB, Allan CA, Foote C, Inder WJ, Stuckey BGA, Jesudason D, Bracken K, Keech AC, Jenkins AJ, Gebski V, Jardine M, Wittert G. Predicting type 2 diabetes and testosterone effects in high-risk Australian men: development and external validation of a 2-year risk model. Eur J Endocrinol 2025; 192:15-24. [PMID: 39720906 DOI: 10.1093/ejendo/lvae166] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/24/2024] [Revised: 11/13/2024] [Accepted: 12/21/2024] [Indexed: 12/26/2024]
Abstract
OBJECTIVE We have shown that men aged 50 years or older at high risk of type 2 diabetes who were treated with testosterone together with a lifestyle program had a 40% lower risk of type 2 diabetes at 2 years than those on a lifestyle program alone. To develop a personalized approach to treatment, we aimed to explore a prognostic model for incident type 2 diabetes at 2 years and investigate biomarkers predictive of the testosterone effect. DESIGN Model development in 783 men with impaired glucose tolerance but not type 2 diabetes from Testosterone for Prevention of Type 2 Diabetes, a multicenter, 2-year trial of testosterone vs placebo. External validation was performed in 236 men from the Examining Outcomes in Chronic Disease in the 45 and Up Study (EXTEND-45, n = 267 357). METHODS Type 2 diabetes at 2 years was defined as 2-h fasting glucose by oral glucose tolerance test (OGTT) ≥11.1 mmol/L. Risk factors, including predictive biomarkers of testosterone treatment, were assessed using penalized logistic regression. RESULTS Baseline HbA1c and 2-h OGTT glucose were dominant predictors, together with testosterone, age, and an interaction between testosterone and HbA1c (P = .035, greater benefit with HbA1c ≥ 5.6%, 38 mmol/mol). The final model identified men who developed type 2 diabetes, with C-statistics of 0.827 in development and 0.798 in validation. After recalibration, the model accurately predicted a participant's absolute risk of type 2 diabetes. CONCLUSIONS Baseline HbA1c and 2-h OGTT glucose predict incident type 2 diabetes at 2 years in high-risk men, with risk modified independently by testosterone treatment. Men with HbA1c ≥ 5.6% (38 mmol/mol) benefit most from testosterone treatment, beyond a lifestyle program.
Affiliation(s)
- Kristy P Robledo
- NHMRC Clinical Trials Centre, University of Sydney, Locked bag 77, Camperdown, NSW 1450, Australia
| | - Ian C Marschner
- NHMRC Clinical Trials Centre, University of Sydney, Locked bag 77, Camperdown, NSW 1450, Australia
| | - Mathis Grossmann
- Department of Endocrinology, Austin Hospital, Heidelberg, VIC 3084, Australia
- Department of Medicine, University of Melbourne, Parkville, VIC 3010, Australia
| | - David J Handelsman
- Andrology Laboratory, ANZAC Research Institute, University of Sydney, Concord, NSW 2139, Australia
- Andrology Department, Concord Hospital, Concord, NSW 2139, Australia
| | - Bu B Yeap
- Medical School, University of Western Australia, Perth, WA 6009, Australia
- Department of Endocrinology and Diabetes, Fiona Stanley Hospital, Murdoch, WA 6150, Australia
| | - Carolyn A Allan
- Centre for Endocrinology and Metabolism, Hudson Institute of Medical Research, Clayton, VIC 3168, Australia
- School of Clinical Sciences, Monash University, Clayton, VIC 3800, Australia
| | - Celine Foote
- The George Institute for Global Health, University of New South Wales, Sydney, NSW 2052, Australia
| | - Warrick J Inder
- Department of Diabetes and Endocrinology, Princess Alexandra Hospital, Woolloongabba, QLD 4102, Australia
- Medical School, University of Queensland, Herston, QLD 4029, Australia
| | - Bronwyn G A Stuckey
- Keogh Institute for Medical Research, Nedlands, WA 6009, Australia
- Department of Endocrinology and Diabetes, Sir Charles Gairdner Hospital, Nedlands, WA 6009, Australia
- Medical School, University of Western Australia, Nedlands, WA 6009, Australia
| | - David Jesudason
- School of Medicine, The University of Adelaide, Adelaide, SA 5005, Australia
- Endocrinology Unit, The Queen Elizabeth Hospital, Woodville South, SA 5011, Australia
| | - Karen Bracken
- Faculty of Medicine and Health, University of Sydney, Camperdown, NSW 2006, Australia
| | - Anthony C Keech
- NHMRC Clinical Trials Centre, University of Sydney, Locked bag 77, Camperdown, NSW 1450, Australia
| | - Alicia J Jenkins
- Diabetes and Vascular Medicine, Baker Heart and Diabetes Institute, Melbourne, VIC 3004, Australia
| | - Val Gebski
- NHMRC Clinical Trials Centre, University of Sydney, Locked bag 77, Camperdown, NSW 1450, Australia
| | - Meg Jardine
- NHMRC Clinical Trials Centre, University of Sydney, Locked bag 77, Camperdown, NSW 1450, Australia
| | - Gary Wittert
- Freemasons Centre for Male Health and Wellbeing, South Australian Health and Medical Research Institute, North Terrace, SA 5000, Australia
- Medical School, University of Adelaide, North Terrace, Adelaide 5000, Australia
| |
|
36
|
Rockenschaub P, Akay EM, Carlisle BG, Hilbert A, Wendland J, Meyer-Eschenbach F, Näher AF, Frey D, Madai VI. External validation of AI-based scoring systems in the ICU: a systematic review and meta-analysis. BMC Med Inform Decis Mak 2025; 25:5. [PMID: 39762808 PMCID: PMC11702098 DOI: 10.1186/s12911-024-02830-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 08/21/2024] [Accepted: 12/17/2024] [Indexed: 01/11/2025] Open
Abstract
BACKGROUND Machine learning (ML) is increasingly used to predict clinical deterioration in intensive care unit (ICU) patients through scoring systems. Although promising, such algorithms often overfit their training cohort and perform worse at new hospitals. Thus, external validation is a critical - but frequently overlooked - step to establish the reliability of predicted risk scores to translate them into clinical practice. We systematically reviewed how regularly external validation of ML-based risk scores is performed and how their performance changed in external data. METHODS We searched MEDLINE, Web of Science, and arXiv for studies using ML to predict deterioration of ICU patients from routine data. We included primary research published in English before December 2023. We summarised how many studies were externally validated, assessing differences over time, by outcome, and by data source. For validated studies, we evaluated the change in area under the receiver operating characteristic curve (AUROC) attributable to external validation using linear mixed-effects models. RESULTS We included 572 studies, of which 84 (14.7%) were externally validated, increasing to 23.9% by 2023. Validated studies made disproportionate use of open-source data, with two well-known US datasets (MIMIC and eICU) accounting for 83.3% of studies. On average, AUROC changed by -0.037 (95% CI -0.052 to -0.027) in external data, with more than 0.05 reduction in 49.5% of studies. DISCUSSION External validation, although increasing, remains uncommon. Performance was generally lower in external data, questioning the reliability of some recently proposed ML-based scores. Interpretation of the results was challenged by an overreliance on the same few datasets, implicit differences in case mix, and exclusive use of AUROC.
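The quantity tracked in this review, the change in AUROC between internal and external data, can be sketched for a single hypothetical score. The example computes AUROC as the probability that a randomly chosen deteriorating patient receives a higher score than a randomly chosen stable patient. All labels and scores are invented for illustration, and the review itself pooled such differences with linear mixed-effects models, which this sketch does not attempt.

```python
def auroc(labels, scores):
    """AUROC via pairwise comparison of positive and negative cases."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0      # positive case ranked above negative case
            elif p == q:
                wins += 0.5      # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical risk scores for the same model in two cohorts.
labels = [1, 1, 1, 0, 0, 0, 0]
internal_scores = [0.9, 0.8, 0.6, 0.7, 0.4, 0.3, 0.2]
external_scores = [0.9, 0.5, 0.6, 0.7, 0.4, 0.55, 0.2]

auroc_internal = auroc(labels, internal_scores)
auroc_external = auroc(labels, external_scores)
auroc_change = auroc_external - auroc_internal
print(round(auroc_internal, 3), round(auroc_external, 3), round(auroc_change, 3))
```

A negative `auroc_change` corresponds to the direction reported in the abstract, where discrimination was on average lower in external data.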
Affiliation(s)
- Patrick Rockenschaub
- CLAIM - Charité Lab for AI in Medicine, Charité - Universitätsmedizin Berlin, Berlin, Germany
- QUEST Center for Responsible Research, Berlin Institute of Health at Charité Universitätsmedizin Berlin, Berlin, Germany
- Institute of Clinical Epidemiology, Public Health, Health Economics, Medical Statistics and Informatics, Medical University of Innsbruck, Innsbruck, Austria
| | - Ela Marie Akay
- CLAIM - Charité Lab for AI in Medicine, Charité - Universitätsmedizin Berlin, Berlin, Germany
| | - Benjamin Gregory Carlisle
- STREAM - Studies of Translation, Ethics and Medicine, School of Population and Global Health, McGill University, Montréal, Canada
| | - Adam Hilbert
- CLAIM - Charité Lab for AI in Medicine, Charité - Universitätsmedizin Berlin, Berlin, Germany
| | - Joshua Wendland
- Chair for Artificial Intelligence and Formal Methods, Faculty of Computer Science, Ruhr University, Bochum, Germany
| | - Falk Meyer-Eschenbach
- Institute of Medical Informatics, Charité - Universitätsmedizin Berlin, Berlin, Germany
| | - Anatol-Fiete Näher
- Institute of Medical Informatics, Charité - Universitätsmedizin Berlin, Berlin, Germany
- Digital Global Public Health, Hasso Plattner Institute for Digital Engineering, University of Potsdam, Potsdam, Germany
| | - Dietmar Frey
- CLAIM - Charité Lab for AI in Medicine, Charité - Universitätsmedizin Berlin, Berlin, Germany
| | - Vince Istvan Madai
- QUEST Center for Responsible Research, Berlin Institute of Health at Charité Universitätsmedizin Berlin, Berlin, Germany.
- Faculty of Computing, Engineering and the Built Environment, School of Computing and Digital Technology, Birmingham City University, Birmingham, UK.
| |
|
37
|
Avelino-Silva TJ, Lee SJ, Covinsky KE, Walter LC, Deardorff WJ, Boscardin J, Campora F, Szlejf C, Suemoto CK, Smith AK. External Validation of the Walter Index for Posthospitalization Mortality Prediction in Older Adults. JAMA Netw Open 2025; 8:e2455475. [PMID: 39841475 PMCID: PMC11755200 DOI: 10.1001/jamanetworkopen.2024.55475] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2024] [Accepted: 11/14/2024] [Indexed: 01/23/2025] Open
Abstract
Importance The Walter Index is a widely used prognostic tool for assessing 12-month mortality risk among hospitalized older adults. Developed in the US in 2001, its accuracy in contemporary non-US contexts is unclear. Objective To evaluate the external validity of the Walter Index in predicting posthospitalization mortality risk in Brazilian older adult inpatients. Design, Setting, and Participants This prognostic study used data from a cohort of adults aged 70 years or older admitted to the geriatric unit of a university hospital in Brazil from January 1, 2009, to February 28, 2020. Participants underwent comprehensive geriatric assessments at admission, were reevaluated at discharge, and were subsequently followed up for 48 months. Data were analyzed from March to July 2024. Main Outcomes and Measures The Walter Index, a score based on 6 risk factors (male sex, dependent activities of daily living at discharge, heart failure, cancer, high creatinine level, and low albumin level), was calculated to assess its predictive accuracy for 12-month mortality as well as 6-, 24-, and 48-month mortality. The study investigated whether incorporating delirium, frailty, or C-reactive protein level enhanced accuracy. Performance was assessed using discrimination, calibration, and clinical utility measures. Results In total, 2780 participants (mean [SD] age, 81 [7] years; 1795 [65%] female) were included, with 89 (3%) lost to follow-up. The 12-month posthospitalization mortality rate was 23% (646 participants). Mortality was 7% (47 of 634) in the lowest-risk group (0-1 point), 17% (111 of 668) for 2 to 3 points, 25% (198 of 803) for 4 to 6 points, and 43% (290 of 675) in the highest-risk group (≥7 points). The index demonstrated an area under the receiver operating characteristic curve (AUC) of 0.714 (95% CI, 0.691-0.736) for predicting 12-month posthospitalization mortality (AUCs were 0.75 and 0.80 in the original derivation and validation cohorts, respectively). 
Comparable results were observed for mortality at 6 months (AUC, 0.726; 95% CI, 0.700-0.752), 24 months (AUC, 0.711; 95% CI, 0.691-0.730), and 48 months (AUC, 0.719; 95% CI, 0.700-0.738). Adding delirium modestly increased the index's discrimination (AUC, 0.723; 95% CI, 0.702-0.749); additionally including frailty and C-reactive protein level did not improve discrimination further (AUC, 0.723; 95% CI, 0.701-0.744). Conclusions and Relevance In this prognostic study of hospitalized older adults in Brazil, the Walter Index showed similar discrimination in predicting postdischarge mortality as it did 2 decades ago in the US. These findings highlight the need for continuous validation and potential modification of established prognostic tools to improve their applicability across settings.
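The risk-group percentages reported in this abstract can be re-derived from the counts it gives. The following is a minimal sketch of that arithmetic only; the variable and function names are illustrative and not taken from the study.

```python
# Counts copied from the abstract: (deaths, participants) per Walter Index risk group.
groups = {
    "0-1 points": (47, 634),
    "2-3 points": (111, 668),
    "4-6 points": (198, 803),
    ">=7 points": (290, 675),
}


def mortality_pct(deaths, total):
    """Return 12-month mortality as a whole-number percentage."""
    return round(100 * deaths / total)


# Per-group mortality, rounded the same way the abstract reports it.
rates = {label: mortality_pct(d, n) for label, (d, n) in groups.items()}

# The group counts should add up to the cohort totals stated in the abstract.
total_deaths = sum(d for d, _ in groups.values())
total_n = sum(n for _, n in groups.values())
overall = mortality_pct(total_deaths, total_n)

for label, pct in rates.items():
    print(f"{label}: {pct}%")
print(f"overall: {total_deaths}/{total_n} = {overall}%")
```

The group totals sum to the 2780 participants and 646 deaths stated in the abstract, which is a quick consistency check on the reported gradient across risk groups.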
Affiliation(s)
- Thiago J. Avelino-Silva
- Division of Geriatrics, School of Medicine, University of California San Francisco
- Laboratorio de Investigacao Medica em Envelhecimento (LIM-66), Servico de Geriatria, Hospital das Clinicas (HCFMUSP), Faculdade de Medicina, Universidade de São Paulo, São Paulo, Brazil
| | - Sei J. Lee
- Division of Geriatrics, School of Medicine, University of California San Francisco
- Geriatrics, Palliative and Extended Care Service Line, San Francisco Veterans Administration Health Care System, San Francisco, California
| | - Kenneth E. Covinsky
- Division of Geriatrics, School of Medicine, University of California San Francisco
- Geriatrics, Palliative and Extended Care Service Line, San Francisco Veterans Administration Health Care System, San Francisco, California
| | - Louise C. Walter
- Division of Geriatrics, School of Medicine, University of California San Francisco
- Geriatrics, Palliative and Extended Care Service Line, San Francisco Veterans Administration Health Care System, San Francisco, California
| | - W. James Deardorff
- Division of Geriatrics, School of Medicine, University of California San Francisco
- Geriatrics, Palliative and Extended Care Service Line, San Francisco Veterans Administration Health Care System, San Francisco, California
| | - John Boscardin
- Division of Geriatrics, School of Medicine, University of California San Francisco
- Geriatrics, Palliative and Extended Care Service Line, San Francisco Veterans Administration Health Care System, San Francisco, California
| | - Flavia Campora
- Laboratorio de Investigacao Medica em Envelhecimento (LIM-66), Servico de Geriatria, Hospital das Clinicas (HCFMUSP), Faculdade de Medicina, Universidade de São Paulo, São Paulo, Brazil
| | - Claudia Szlejf
- Division of Geriatrics, School of Medicine, University of California San Francisco
- Hospital Israelita Albert Einstein, São Paulo, Brazil
| | - Claudia K. Suemoto
- Division of Geriatrics, School of Medicine, University of California San Francisco
- Laboratorio de Investigacao Medica em Envelhecimento (LIM-66), Servico de Geriatria, Hospital das Clinicas (HCFMUSP), Faculdade de Medicina, Universidade de São Paulo, São Paulo, Brazil
| | - Alexander K. Smith
- Division of Geriatrics, School of Medicine, University of California San Francisco
- Geriatrics, Palliative and Extended Care Service Line, San Francisco Veterans Administration Health Care System, San Francisco, California
|
38
|
Luu HS. Laboratory Data as a Potential Source of Bias in Healthcare Artificial Intelligence and Machine Learning Models. Ann Lab Med 2025; 45:12-21. [PMID: 39444135 PMCID: PMC11609702 DOI: 10.3343/alm.2024.0323] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2024] [Revised: 09/10/2024] [Accepted: 10/18/2024] [Indexed: 10/25/2024] Open
Abstract
Artificial intelligence (AI) and machine learning (ML) are anticipated to transform the practice of medicine. As one of the largest sources of digital data in healthcare, laboratory results can strongly influence AI and ML algorithms that require large sets of healthcare data for training. Embedded bias introduced into AI and ML models not only has disastrous consequences for quality of care but also may perpetuate and exacerbate health disparities. The lack of test harmonization, which is defined as the ability to produce comparable results and the same interpretation irrespective of the method or instrument platform used to produce the result, may introduce aggregation bias into algorithms with potential adverse outcomes for patients. Limited interoperability of laboratory results at the technical, syntactic, semantic, and organizational levels is a source of embedded bias that limits the accuracy and generalizability of algorithmic models. Population-specific issues, such as inadequate representation in clinical trials and inaccurate race attribution, not only affect the interpretation of laboratory results but also may perpetuate erroneous conclusions based on AI and ML models in the healthcare literature.
Affiliation(s)
- Hung S. Luu
- Department of Pathology, UT Southwestern Medical Center, Dallas, TX, USA
|
39
|
Wernly B, Guidet B, Beil M. The role of artificial intelligence in life-sustaining treatment decisions: current state and future considerations. Intensive Care Med 2025; 51:157-159. [PMID: 39661140 DOI: 10.1007/s00134-024-07738-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2024] [Accepted: 11/19/2024] [Indexed: 12/12/2024]
Affiliation(s)
- Bernhard Wernly
- Department of Internal Medicine, General Hospital Oberndorf, Salzburg, Austria.
- Institute of General Practice, Family Medicine and Preventive Medicine, Paracelsus Medical University, Salzburg, Austria.
- Department of Internal Medicine, Saint John of God Hospital, Teaching Hospital of the Paracelsus Medical Private University, Salzburg, Austria.
- Clinic I for Internal Medicine, University Hospital Salzburg, Paracelsus Medical University, Salzburg, Austria.
| | - Bertrand Guidet
- INSERM, Institut Pierre Louis d'Epidémiologie Et de Santé Publique, AP-HP, Hôpital Saint Antoine, Sorbonne Université, Service MIR, Paris, France
| | - Michael Beil
- School of Computer Science and Engineering, Hebrew University of Jerusalem, Jerusalem, Israel
|
40
|
Cabanillas Silva P, Sun H, Rezk M, Roccaro-Waldmeyer DM, Fliegenschmidt J, Hulde N, von Dossow V, Meesseman L, Depraetere K, Stieg J, Szymanowsky R, Dahlweid FM. Longitudinal Model Shifts of Machine Learning-Based Clinical Risk Prediction Models: Evaluation Study of Multiple Use Cases Across Different Hospitals. J Med Internet Res 2024; 26:e51409. [PMID: 39671571 DOI: 10.2196/51409] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2023] [Revised: 01/30/2024] [Accepted: 10/16/2024] [Indexed: 12/15/2024] Open
Abstract
BACKGROUND In recent years, machine learning (ML)-based models have been widely used in clinical domains to predict clinical risk events. However, in production, the performance of such models depends heavily on changes in the system and data. The dynamic nature of the system environment, characterized by continuous changes, has significant implications for prediction models, leading to performance degradation and reduced clinical efficacy. Thus, monitoring model shifts and evaluating their impact on prediction models are of utmost importance. OBJECTIVE This study aimed to assess the impact of a model shift on ML-based prediction models by evaluating 3 different use cases (delirium, sepsis, and acute kidney injury [AKI]) from 2 hospitals (M and H) with different patient populations and to investigate potential model deterioration during the COVID-19 pandemic period. METHODS We trained prediction models using retrospective data from earlier years and examined the presence of a model shift using data from more recent years. We used the area under the receiver operating characteristic curve (AUROC) to evaluate model performance and analyzed the calibration curves over time. We also assessed the influence on clinical decisions by evaluating the alert rate, the rates of over- and underdiagnosis, and the decision curve. RESULTS The 2 data sets used in this study contained 189,775 and 180,976 medical cases for hospitals M and H, respectively. Statistical analyses (Z test) revealed no significant difference (P>.05) between the AUROCs from the different years for all use cases and hospitals. For example, in hospital M, AKI did not show a significant difference between 2020 (AUROC=0.898) and 2021 (AUROC=0.907, Z=-1.171, P=.242). Similar results were observed in both hospitals and for all use cases (sepsis and delirium) when comparing all the different years. 
However, when evaluating the calibration curves at the 2 hospitals, model shifts were observed for the delirium and sepsis use cases but not for AKI. Additionally, to investigate the clinical utility of our models, we performed decision curve analysis (DCA) and compared the results across the different years. A pairwise nonparametric statistical comparison showed no differences in the net benefit at the probability thresholds of interest (P>.05). The comprehensive evaluations performed in this study ensured robust model performance of all the investigated models across the years. Moreover, neither performance deteriorations nor alert surges were observed during the COVID-19 pandemic period. CONCLUSIONS Clinical risk prediction models were affected by the dynamic and continuous evolution of clinical practices and workflows. The performance of the models evaluated in this study appeared stable when assessed using AUROCs, showing no significant variations over the years. Additional model shift investigations suggested that a calibration shift was present for certain use cases (delirium and sepsis). However, these changes did not have any impact on the clinical utility of the models based on DCA. Consequently, it is crucial to closely monitor data changes and detect possible model shifts, along with their potential influence on clinical decision-making.
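The Z statistic and P value quoted for the AKI comparison in hospital M can be related with a standard two-sided normal test. The sketch below only converts the reported Z into a P value; it assumes a two-sided test against the standard normal distribution, and the function name is illustrative rather than taken from the study.

```python
import math


def two_sided_p(z):
    """Two-sided P value for a standard normal test statistic z."""
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the standard normal CDF Phi.
    return math.erfc(abs(z) / math.sqrt(2))


# Z value reported in the abstract for AKI, hospital M, 2020 vs 2021.
p = two_sided_p(-1.171)
print(f"P = {p:.3f}")
```

Under that assumption the result agrees with the reported P=.242 after rounding, which is well above the .05 threshold used in the study.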
Affiliation(s)
| | - Hong Sun
- Provincial Key Laboratory of Multimodal Perceiving and Intelligent Systems, Jiaxing University, Jiaxing, China
- Engineering Research Center of Intelligent Human Health Situation Awareness of Zhejiang Province, Jiaxing University, Jiaxing, China
| | | | | | - Janis Fliegenschmidt
- Institute of Anaesthesiology and Pain Therapy, Heart and Diabetes Centre North Rhine Westphalia, University Hospital of Ruhr-University Bochum, Bad Oeynhausen, Germany
| | - Nikolai Hulde
- Institute of Anaesthesiology and Pain Therapy, Heart and Diabetes Centre North Rhine Westphalia, University Hospital of Ruhr-University Bochum, Bad Oeynhausen, Germany
| | - Vera von Dossow
- Institute of Anaesthesiology and Pain Therapy, Heart and Diabetes Centre North Rhine Westphalia, University Hospital of Ruhr-University Bochum, Bad Oeynhausen, Germany
|
41
|
Ke JXC, Jen TTH, Gao S, Ngo L, Wu L, Flexman AM, Schwarz SKW, Brown CJ, Görges M. Development and internal validation of time-to-event risk prediction models for major medical complications within 30 days after elective colectomy. PLoS One 2024; 19:e0314526. [PMID: 39621640 PMCID: PMC11611139 DOI: 10.1371/journal.pone.0314526] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2024] [Accepted: 11/12/2024] [Indexed: 12/12/2024] Open
Abstract
BACKGROUND Patients undergoing colectomy are at risk of numerous major complications. However, existing binary risk stratification models do not predict when a patient may be at highest risk of each complication. Accurate prediction of the timing of complications facilitates targeted, resource-efficient monitoring. We sought to develop and internally validate Cox proportional hazards models to predict time-to-complication of major complications within 30 days after elective colectomy. METHODS We studied a retrospective cohort from the multicentered American College of Surgeons National Surgical Quality Improvement Program procedure-targeted colectomy dataset. Patients aged 18 years or above who underwent elective colectomy between January 1, 2014 and December 31, 2019 were included. A priori candidate predictors were selected based on variable availability, literature review, and multidisciplinary team consensus. Outcomes were mortality, hospital readmission, myocardial infarction, cerebral vascular events, pneumonia, venous thromboembolism, acute renal failure, and sepsis or septic shock within 30 days after surgery. RESULTS The cohort consisted of 132145 patients (mean ± SD age, 61 ± 15 years; 52% female). Complication rates ranged from 0.3% (n = 383) for cardiac arrest and acute renal failure to 5.3% (n = 6986) for bleeding requiring transfusion, with a readmission rate of 8.6% (n = 11415). We observed distinct temporal patterns for each complication: the median [quartiles] postoperative day of complication diagnosis ranged from 1 [0, 2] days for bleeding requiring transfusion to 12 [6, 18] days for venous thromboembolism. Models for mortality, myocardial infarction, pneumonia, and renal failure showed good discrimination with a concordance > 0.8, while models for readmission, venous thromboembolism, and sepsis performed poorly with a concordance of 0.6 to 0.7. Models exhibited good calibration but ranges were limited to low probability areas. 
CONCLUSIONS We developed and internally validated time-to-event prediction models for complications after elective colectomy. Once further validated, the models can facilitate tailored monitoring of high risk patients during high risk periods. TRIAL REGISTRATION Clinicaltrials.gov (NCT05150548; Principal Investigator: Janny Xue Chen Ke, M.D., M.Sc., F.R.C.P.C.; initial posting: November 25, 2021).
Affiliation(s)
- Janny X. C. Ke
- Department of Anesthesiology, Pharmacology & Therapeutics, Faculty of Medicine, The University of British Columbia, Vancouver, British Columbia, Canada
- Department of Anesthesia, St. Paul’s Hospital/Providence Health Care, Vancouver, British Columbia, Canada
| | - Tim T. H. Jen
- Department of Anesthesiology, Pharmacology & Therapeutics, Faculty of Medicine, The University of British Columbia, Vancouver, British Columbia, Canada
- Department of Anesthesia, St. Paul’s Hospital/Providence Health Care, Vancouver, British Columbia, Canada
| | - Sihaoyu Gao
- Department of Statistics, Faculty of Science, The University of British Columbia, Vancouver, British Columbia, Canada
| | - Long Ngo
- Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, United States of America
- Division of General Medicine, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts, United States of America
| | - Lang Wu
- Department of Statistics, Faculty of Science, The University of British Columbia, Vancouver, British Columbia, Canada
| | - Alana M. Flexman
- Department of Anesthesiology, Pharmacology & Therapeutics, Faculty of Medicine, The University of British Columbia, Vancouver, British Columbia, Canada
- Department of Anesthesia, St. Paul’s Hospital/Providence Health Care, Vancouver, British Columbia, Canada
| | - Stephan K. W. Schwarz
- Department of Anesthesiology, Pharmacology & Therapeutics, Faculty of Medicine, The University of British Columbia, Vancouver, British Columbia, Canada
- Department of Anesthesia, St. Paul’s Hospital/Providence Health Care, Vancouver, British Columbia, Canada
| | - Carl J. Brown
- Department of Surgery, Faculty of Medicine, The University of British Columbia, Vancouver, British Columbia, Canada
- Department of Surgery, St. Paul’s Hospital/Providence Health Care, Vancouver, British Columbia, Canada
| | - Matthias Görges
- Department of Anesthesiology, Pharmacology & Therapeutics, Faculty of Medicine, The University of British Columbia, Vancouver, British Columbia, Canada
- BC Children’s Hospital Research Institute, Vancouver, British Columbia, Canada
|
42
|
Tangel VE, Hoeks SE, Stolker RJ, Brown S, Pryor KO, de Graaff JC. International multi-institutional external validation of preoperative risk scores for 30-day in-hospital mortality in paediatric patients. Br J Anaesth 2024; 133:1222-1233. [PMID: 39477712 DOI: 10.1016/j.bja.2024.09.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2024] [Revised: 08/14/2024] [Accepted: 09/14/2024] [Indexed: 11/19/2024] Open
Abstract
BACKGROUND Risk prediction scores are used to guide clinical decision-making. Our primary objective was to externally validate two patient-specific risk scores for 30-day in-hospital mortality using the Multicenter Perioperative Outcomes Group (MPOG) registry: the Pediatric Risk Assessment (PRAm) score and the intrinsic surgical risk score. The secondary objective was to recalibrate these scores. METHODS Data from 56 US and Dutch hospitals with paediatric caseloads were included. The primary outcome was 30-day mortality. To assess model discrimination, the area under the receiver operating characteristic curve (AUROC) and area under the precision-recall curve (AUC-PR) were calculated. Model calibration was assessed by plotting the observed and predicted probabilities. Decision analytic curves were fit. RESULTS The 30-day mortality was 0.14% (822/606 488). The AUROC for the PRAm upon external validation was 0.856 (95% confidence interval 0.844-0.869), and the AUC-PR was 0.008. Upon recalibration, the AUROC was 0.873 (0.861-0.886), and the AUC-PR was 0.031. The AUROC for the external validation of the intrinsic surgical risk score was 0.925 (0.914-0.936) and AUC-PR was 0.085. Upon recalibration, the AUROC was 0.925 (0.915-0.936), and the AUC-PR was 0.094. Calibration metrics for both scores were favourable because of the large cluster of cases with low probabilities of mortality. Decision curve analyses showed limited benefit to using either score. CONCLUSIONS The intrinsic surgical risk score performed better than the PRAm, but both resulted in large numbers of false positives. Both scores exhibited decreased performance compared with the original studies. ASA physical status scores in sicker patients drove the superior performance of the intrinsic surgical risk score, suggesting the use of a risk score does not improve prediction.
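The contrast between high AUROC and very low AUC-PR values in this abstract is easier to read against the outcome prevalence, since a classifier with no discriminative ability has an expected AUC-PR roughly equal to the prevalence. The sketch below computes that baseline from the counts in the abstract and expresses each reported AUC-PR as a multiple of it; the dictionary labels and the "lift" framing are illustrative assumptions, not part of the study.

```python
# Counts copied from the abstract: 30-day deaths and total cases.
deaths, cases = 822, 606_488
prevalence = deaths / cases  # approximate AUC-PR of an uninformative classifier

# AUC-PR values reported in the abstract (labels are illustrative).
reported_auc_pr = {
    "PRAm, external validation": 0.008,
    "PRAm, recalibrated": 0.031,
    "intrinsic surgical risk, external validation": 0.085,
    "intrinsic surgical risk, recalibrated": 0.094,
}

# Ratio of each reported AUC-PR to the prevalence baseline.
lift = {name: value / prevalence for name, value in reported_auc_pr.items()}

print(f"prevalence = {100 * prevalence:.2f}%")
for name, ratio in lift.items():
    print(f"{name}: {ratio:.1f}x the prevalence baseline")
```

Every reported AUC-PR exceeds the baseline, yet all remain below 0.1 in absolute terms, which is consistent with the abstract's conclusion that both scores produce large numbers of false positives.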
Affiliation(s)
- Virginia E Tangel
- Department of Anesthesiology, Erasmus University Medical Centre, Rotterdam, The Netherlands; Department of Anesthesiology, Weill Cornell Medicine, New York, NY, USA.
| | - Sanne E Hoeks
- Department of Anesthesiology, Erasmus University Medical Centre, Rotterdam, The Netherlands
| | - Robert Jan Stolker
- Department of Anesthesiology, Erasmus University Medical Centre, Rotterdam, The Netherlands
| | - Sydney Brown
- Department of Anesthesiology, University of Michigan, Ann Arbor, MI, USA
| | - Kane O Pryor
- Department of Anesthesiology, Weill Cornell Medicine, New York, NY, USA
| | - Jurgen C de Graaff
- Department of Anesthesiology, Erasmus University Medical Centre, Rotterdam, The Netherlands; Department of Anesthesiology, Weill Cornell Medicine, New York, NY, USA; Department of Anesthesiology, Adrz-Erasmus MC, Goes, The Netherlands
|
43
|
Chavosh Nejad M, Vestergaard Matthiesen R, Dukovska-Popovska I, Jakobsen T, Johansen J. Machine learning for predicting duration of surgery and length of stay: A literature review on joint arthroplasty. Int J Med Inform 2024; 192:105631. [PMID: 39293161 DOI: 10.1016/j.ijmedinf.2024.105631] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2024] [Revised: 08/15/2024] [Accepted: 09/13/2024] [Indexed: 09/20/2024]
Abstract
INTRODUCTION In recent years, factors such as population aging have caused escalating demand for hip and knee arthroplasty, straining hospitals' already limited resources. To address this challenge, the focus has been on improving medical and operational efficiency. This includes an increased use of machine learning (ML) to predict duration of surgery (DOS) and length of stay (LOS) for total knee and total hip arthroplasty, which can be used to optimize resource allocation within medical and operational constraints. This paper explores the development and performance of ML models in predicting DOS and LOS. METHODS A systematic search of publications from 2010 to 2023 was conducted following PRISMA guidelines. Based on the inclusion and exclusion criteria, 28 of the 722 papers gathered from PubMed, Web of Science, and manual search were included in the study. Descriptive statistics were used to analyze the extracted data regarding data preprocessing, model development, and model performance assessment. RESULTS Most of the papers treat LOS as a binary variable. Patient age was identified as the most frequently used variable and the one most frequently reported as important for predicting DOS and LOS. Within the 28 included papers, more than 71% of models reached good to perfect performance based on the area under the receiver operating characteristic curve (AUC), and artificial neural networks and ensemble learning models had the largest share among the best-performing models. CONCLUSION The use of ML models is increasing in the literature. The current performance level indicates that ML models can potentially become powerful tools for predicting DOS and LOS for different purposes. However, the literature is not yet mature in reporting real-life application. Future studies can focus on model specification and validation by considering empirical application.
Affiliation(s)
- Mohammad Chavosh Nejad
- Department of Materials and Production, Aalborg University, Fibigerstræde 16, 2-109, Aalborg Ø 9220, Denmark.
| | | | - Iskra Dukovska-Popovska
- Department of Materials and Production, Aalborg University, Fibigerstræde 16, 2-107, Aalborg Ø 9220, Denmark.
| | - Thomas Jakobsen
- Department of Orthopaedics, Aalborg University Hospital, Hobrovej 18-22, Aalborg Universitetshospital, Aalborg Syd 9000, Denmark.
| | - John Johansen
- Department of Materials and Production, Aalborg University, Fibigerstræde 16, 2-114, Aalborg Ø 9220, Denmark.
|
44
|
Rockenschaub P, Madai VI, Frey D. The authors reply. Crit Care Med 2024; 52:e638-e639. [PMID: 39637279 DOI: 10.1097/ccm.0000000000006441] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/07/2024]
Affiliation(s)
- Patrick Rockenschaub
- Institute of Clinical Epidemiology, Public Health, Health Economics, Medical Statistics and Informatics, Medical University of Innsbruck, Innsbruck, Austria
| | - Vince Istvan Madai
- QUEST Center for Responsible Research, Berlin Institute of Health at Charité Universitätsmedizin Berlin, Berlin, Germany
- Faculty of Computing, Engineering and the Built Environment, School of Computing and Digital Technology, Birmingham City University, Birmingham, United Kingdom
| | - Dietmar Frey
- Charité Lab for Artificial Intelligence in Medicine (CLAIM), Charité-Universitätsmedizin Berlin, Berlin, Germany
|
45
|
Gonzalez R, Saha A, Campbell CJ, Nejat P, Lokker C, Norgan AP. Seeing the random forest through the decision trees. Supporting learning health systems from histopathology with machine learning models: Challenges and opportunities. J Pathol Inform 2024; 15:100347. [PMID: 38162950 PMCID: PMC10755052 DOI: 10.1016/j.jpi.2023.100347] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 08/21/2023] [Revised: 10/06/2023] [Accepted: 11/01/2023] [Indexed: 01/03/2024] Open
Abstract
This paper discusses some overlooked challenges faced when working with machine learning models for histopathology and presents a novel opportunity to support "Learning Health Systems" with them. Initially, the authors elaborate on these challenges after separating them according to their mitigation strategies: those that need innovative approaches, time, or future technological capabilities and those that require a conceptual reappraisal from a critical perspective. Then, a novel opportunity to support "Learning Health Systems" by integrating hidden information extracted by ML models from digitalized histopathology slides with other healthcare big data is presented.
Affiliation(s)
- Ricardo Gonzalez
- DeGroote School of Business, McMaster University, Hamilton, Ontario, Canada
- Division of Computational Pathology and Artificial Intelligence, Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN, United States
| | - Ashirbani Saha
- Department of Oncology, Faculty of Health Sciences, McMaster University, Hamilton, Ontario, Canada
- Escarpment Cancer Research Institute, McMaster University and Hamilton Health Sciences, Hamilton, Ontario, Canada
| | - Clinton J.V. Campbell
- William Osler Health System, Brampton, Ontario, Canada
- Department of Pathology and Molecular Medicine, Faculty of Health Sciences, McMaster University, Hamilton, Ontario, Canada
| | - Peyman Nejat
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, United States
| | - Cynthia Lokker
- Health Information Research Unit, Department of Health Research Methods, Evidence and Impact, McMaster University, Hamilton, Ontario, Canada
| | - Andrew P. Norgan
- Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN, United States
|
46
|
de Ruijter UW, Kaplan ZLR, Eijkenaar F, Maas CCHM, van der Heide A, Bax WA, Lingsma HF. Identifying persistent high-cost patients in the hospital for care management: development and validation of prediction models. BMC Health Serv Res 2024; 24:1469. [PMID: 39593019 PMCID: PMC11590622 DOI: 10.1186/s12913-024-11936-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2024] [Accepted: 11/13/2024] [Indexed: 11/28/2024] Open
Abstract
BACKGROUND Healthcare use by High-Need High-Cost (HNHC) patients is believed to be modifiable through better coordination of care. To identify patients for care management, a hybrid approach is recommended that combines clinical assessment of need with model-based prediction of cost. Models that predict high healthcare costs persisting over time are relevant but scarce. We aimed to develop and validate two models predicting Persistent High-Cost (PHC) status upon hospital outpatient visit and hospital admission, respectively. METHODS We performed a retrospective cohort study using claims data from a national health insurer in the Netherlands, a regulated competitive health care system with universal coverage. We created two populations of adults based on their index event in 2016: a first hospital outpatient visit (i.e., outpatient population) or hospital admission (i.e., hospital admission population). Both were divided into a development (January-June) and a validation (July-December) cohort. Our outcome of interest, PHC status, was defined as belonging to the top 10% of total annual healthcare costs for three consecutive years after the index event. Predictors were predefined based on an earlier systematic review and collected in the year prior to the index event. Predictor effects were quantified through multivariable logistic regression analysis. To increase usability, we also developed smaller models containing the lowest number of predictors while maintaining comparable performance. This was based on relative predictor importance (Wald χ²). Model performance was evaluated by means of discrimination (C-statistic) and calibration (plots). RESULTS In the outpatient development cohort (n = 135,558), 2.2% of patients (n = 3,016) had PHC status. In the hospital admission development cohort (n = 24,805), this was 5.8% (n = 1,451). Both full models included 27 predictors, while their smaller counterparts had 10 (outpatient model) and 11 predictors (hospital admission model). 
In the outpatient validation cohort (n = 84,009) and hospital admission validation cohort (n = 20,768), discrimination was good for full models (C-statistics 0.75; 0.74) and smaller models (C-statistics 0.70; 0.73), while calibration plots indicated that models were well-calibrated. CONCLUSIONS We developed and validated two models predicting PHC status that demonstrate good discrimination and calibration. Both models are suitable for integration into electronic health records to aid a hybrid case-finding strategy for HNHC care management.
Affiliation(s)
- Ursula W de Ruijter
- Department of Public Health, Erasmus MC University Medical Center, Rotterdam, The Netherlands.
- Department of Internal Medicine, Northwest Clinics, Alkmaar, The Netherlands.
| | - Z L Rana Kaplan
- Department of Public Health, Erasmus MC University Medical Center, Rotterdam, The Netherlands
| | - Frank Eijkenaar
- Erasmus School of Health Policy & Management, Erasmus University, Rotterdam, The Netherlands
| | - Carolien C H M Maas
- Department of Public Health, Erasmus MC University Medical Center, Rotterdam, The Netherlands
| | - Agnes van der Heide
- Department of Public Health, Erasmus MC University Medical Center, Rotterdam, The Netherlands
| | - Willem A Bax
- Department of Internal Medicine, Northwest Clinics, Alkmaar, The Netherlands
| | - Hester F Lingsma
- Department of Public Health, Erasmus MC University Medical Center, Rotterdam, The Netherlands
| |
|
47
|
Sajanti A, Hellström S, Bennett C, Srinath A, Jhaveri A, Cao Y, Takala R, Frantzén J, Koskimäki F, Falter J, Lyne SB, Rantamäki T, Posti JP, Roine S, Jänkälä M, Puolitaival J, Kolehmainen S, Girard R, Rahi M, Rinne J, Castrén E, Koskimäki J. Soluble Urokinase-Type Plasminogen Activator Receptor and Inflammatory Biomarker Response with Prognostic Significance after Acute Neuronal Injury - a Prospective Cohort Study. Inflammation 2024:10.1007/s10753-024-02185-1. [PMID: 39540961 DOI: 10.1007/s10753-024-02185-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2024] [Revised: 10/30/2024] [Accepted: 11/05/2024] [Indexed: 11/16/2024]
Abstract
Aneurysmal subarachnoid hemorrhage (aSAH), ischemic stroke (IS), and traumatic brain injury (TBI) are severe conditions impacting individuals and society. Identifying reliable prognostic biomarkers for predicting survival or recovery remains a challenge. Soluble urokinase type plasminogen activator receptor (suPAR) has gained attention as a potential prognostic biomarker in acute sepsis. This study evaluates suPAR and related neuroinflammatory biomarkers in serum for brain injury prognosis. This prospective study included 31 patients with aSAH, 30 with IS, and 13 with TBI, plus three healthy controls (n = 77). Serum samples were collected on average 5.9 days post-injury and analyzed for suPAR, IL-1β, cyclophilin A, and TNFα levels using ELISA. Outcomes were assessed 90 days post-injury with the modified Rankin Scale (mRS), categorized as favorable (mRS 0-2) or unfavorable (mRS 3-6). Statistical analyses included 2-tailed t-tests, Pearson's correlations, and machine learning linear discriminant analysis (LDA) for biomarker combinations. Elevated suPAR levels were found in brain injury patients compared to controls (p = 0.017). Increased suPAR correlated with unfavorable outcomes (p = 0.0018) and showed prognostic value (AUC = 0.66, p = 0.03). IL-1β levels were higher in the unfavorable group (p = 0.0015). LDA of the biomarker combination yielded fair prognostic accuracy, with the canonical equation 0.775[suPAR] + 0.667[IL-1β] (AUC = 0.77, OR 0.296, sensitivity 93.1%, specificity 53.1%, p = 0.0007). No correlation was found between suPAR and CRP or infection status. Elevated suPAR levels in acute brain injury patients were associated with poorer outcomes, highlighting suPAR's potential as a prognostic biomarker across different brain injury types. Combining IL-1β with suPAR improved prognostic accuracy, supporting a multimodal biomarker approach for predicting outcomes.
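As a rough illustration of how a two-biomarker canonical equation of this form could be applied, the sketch below computes the linear score and then sensitivity and specificity at a cutoff. Only the two coefficients come from the abstract. The input scaling (raw or standardized concentrations), the assumption that higher scores indicate an unfavorable outcome, the cutoff, and all function names are hypothetical, and the toy values are not study data.

```python
from typing import Sequence, Tuple


def canonical_score(supar: float, il1b: float) -> float:
    """Linear score using the coefficients reported in the abstract.

    Assumption: inputs are on whatever scale the authors used when fitting
    the LDA; the abstract does not state units or standardization.
    """
    return 0.775 * supar + 0.667 * il1b


def sensitivity_specificity(
    scores: Sequence[float],
    unfavorable: Sequence[bool],
    threshold: float,
) -> Tuple[float, float]:
    """Classify score >= threshold as unfavorable (assumed direction)."""
    tp = fn = tn = fp = 0
    for score, is_unfavorable in zip(scores, unfavorable):
        predicted = score >= threshold
        if is_unfavorable and predicted:
            tp += 1
        elif is_unfavorable:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical toy values, not data from the study.
    scores = [canonical_score(s, i) for s, i in [(2.0, 1.0), (1.0, 0.5), (0.4, 0.2), (0.1, 0.1)]]
    outcomes = [True, True, False, False]
    print(sensitivity_specificity(scores, outcomes, threshold=1.0))
```

Sweeping the threshold over the observed scores and recording each sensitivity and specificity pair would trace out the ROC curve summarized by the reported AUC.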
Affiliation(s)
- Antti Sajanti
- Neurocenter, Department of Neurosurgery, Turku University Hospital and University of Turku, P.O. Box 52, Hämeentie 11, FI-20521, Turku, Finland
| | - Santtu Hellström
- Neurocenter, Department of Neurosurgery, Turku University Hospital and University of Turku, P.O. Box 52, Hämeentie 11, FI-20521, Turku, Finland
| | - Carolyn Bennett
- Neurovascular Surgery Program, Section of Neurosurgery, The University of Chicago Medicine and Biological Sciences, 5841 S. Maryland, Chicago, IL, 60637, USA
| | - Abhinav Srinath
- Neurovascular Surgery Program, Section of Neurosurgery, The University of Chicago Medicine and Biological Sciences, 5841 S. Maryland, Chicago, IL, 60637, USA
| | - Aditya Jhaveri
- Neurovascular Surgery Program, Section of Neurosurgery, The University of Chicago Medicine and Biological Sciences, 5841 S. Maryland, Chicago, IL, 60637, USA
| | - Ying Cao
- Department of Radiation Oncology, Kansas University Medical Center, Kansas City, KS, 66160, USA
| | - Riikka Takala
- Perioperative Services, Intensive Care and Pain Medicine, Turku University Hospital and University of Turku, POB 52, 20521, Turku, Finland
| | - Janek Frantzén
- Neurocenter, Department of Neurosurgery, Turku University Hospital and University of Turku, P.O. Box 52, Hämeentie 11, FI-20521, Turku, Finland
| | - Fredrika Koskimäki
- Neurocenter, Acute Stroke Unit, Turku University Hospital, P.O. Box 52, FI-20521, Turku, Finland
| | - Johannes Falter
- Department of Neurosurgery, University Medical Center of Regensburg, Regensburg, Germany
| | - Seán B Lyne
- Department of Neurosurgery, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
| | - Tomi Rantamäki
- Laboratory of Neurotherapeutics, Molecular and Integrative Biosciences Research Programme, Faculty of Biological and Environmental Sciences and Drug Research Program, Division of Pharmacology and Pharmacotherapy, Faculty of Pharmacy, University of Helsinki, Helsinki, Finland
| | - Jussi P Posti
- Neurocenter, Department of Neurosurgery, Turku University Hospital and University of Turku, P.O. Box 52, Hämeentie 11, FI-20521, Turku, Finland
| | - Susanna Roine
- Neurocenter, Acute Stroke Unit, Turku University Hospital, P.O. Box 52, FI-20521, Turku, Finland
| | - Miro Jänkälä
- Department of Neurosurgery, Oulu University Hospital, Box 25, 90029 OYS, Oulu, Finland
| | - Jukka Puolitaival
- Department of Neurosurgery, Oulu University Hospital, Box 25, 90029 OYS, Oulu, Finland
| | - Sulo Kolehmainen
- Neuroscience Center, HiLIFE, University of Helsinki, Box 63, 00014, Helsinki, Finland
| | - Romuald Girard
- Neurovascular Surgery Program, Section of Neurosurgery, The University of Chicago Medicine and Biological Sciences, 5841 S. Maryland, Chicago, IL, 60637, USA
| | - Melissa Rahi
- Neurocenter, Department of Neurosurgery, Turku University Hospital and University of Turku, P.O. Box 52, Hämeentie 11, FI-20521, Turku, Finland
| | - Jaakko Rinne
- Neurocenter, Department of Neurosurgery, Turku University Hospital and University of Turku, P.O. Box 52, Hämeentie 11, FI-20521, Turku, Finland
| | - Eero Castrén
- Neuroscience Center, HiLIFE, University of Helsinki, Box 63, 00014, Helsinki, Finland
| | - Janne Koskimäki
- Neurocenter, Department of Neurosurgery, Turku University Hospital and University of Turku, P.O. Box 52, Hämeentie 11, FI-20521, Turku, Finland.
- Department of Neurosurgery, Oulu University Hospital, Box 25, 90029 OYS, Oulu, Finland.
- Neuroscience Center, HiLIFE, University of Helsinki, Box 63, 00014, Helsinki, Finland.
| |
|
48
|
Hong M, Kang RR, Yang JH, Rhee SJ, Lee H, Kim YG, Lee K, Kim H, Lee YS, Youn T, Kim SH, Ahn YM. Comprehensive Symptom Prediction in Inpatients With Acute Psychiatric Disorders Using Wearable-Based Deep Learning Models: Development and Validation Study. J Med Internet Res 2024; 26:e65994. [PMID: 39536315 PMCID: PMC11602769 DOI: 10.2196/65994] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2024] [Revised: 10/20/2024] [Accepted: 10/20/2024] [Indexed: 11/16/2024] Open
Abstract
BACKGROUND Assessing the complex and multifaceted symptoms of patients with acute psychiatric disorders is a considerable challenge for clinicians. Moreover, the staff in acute psychiatric wards face high work intensity and risk of burnout, yet research on the introduction of digital technologies in this field remains limited. The combination of continuous and objective wearable sensor data acquired from patients with deep learning techniques holds the potential to overcome the limitations of traditional psychiatric assessments and support clinical decision-making. OBJECTIVE This study aimed to develop and validate wearable-based deep learning models to comprehensively predict patient symptoms across various acute psychiatric wards in South Korea. METHODS Participants diagnosed with schizophrenia and mood disorders were recruited from 4 wards across 3 hospitals and prospectively observed using wrist-worn wearable devices during their admission period. Trained raters conducted periodic clinical assessments using the Brief Psychiatric Rating Scale, Hamilton Anxiety Rating Scale, Montgomery-Asberg Depression Rating Scale, and Young Mania Rating Scale. Wearable devices collected patients' heart rate, accelerometer, and location data. Deep learning models were developed to predict psychiatric symptoms using 2 distinct approaches: single symptoms individually (Single) and multiple symptoms simultaneously via multitask learning (Multi). These models further addressed 2 problems: within-subject relative changes (Deterioration) and between-subject absolute severity (Score). Four configurations were consequently developed for each scale: Single-Deterioration, Single-Score, Multi-Deterioration, and Multi-Score. Data from participants recruited before May 1, 2024, were used for cross-validation, and the resulting fine-tuned models were then externally validated using data from the remaining participants. RESULTS Of the 244 enrolled participants, 191 (78.3%; 3954 person-days) were included in the final analysis after applying the exclusion criteria. The demographic and clinical characteristics of participants, as well as the distribution of sensor data, showed considerable variations across wards and hospitals. Data from 139 participants were used for cross-validation, while data from 52 participants were used for external validation. The Single-Deterioration and Multi-Deterioration models achieved similar overall accuracy values of 0.75 in cross-validation and 0.73 in external validation. The Single-Score and Multi-Score models attained overall R² values of 0.78 and 0.83 in cross-validation and 0.66 and 0.74 in external validation, respectively, with the Multi-Score model demonstrating superior performance. CONCLUSIONS Deep learning models based on wearable sensor data effectively classified symptom deterioration and predicted symptom severity in participants in acute psychiatric wards. Despite lower computational costs, Multi models demonstrated performance equivalent or superior to that of Single models, suggesting that multitask learning is a promising approach for comprehensive symptom prediction. However, significant variations were observed across wards, which presents a key challenge for developing clinical decision support systems in acute psychiatric wards. Future studies may benefit from recurring local validation or federated learning to address generalizability issues.
Affiliation(s)
- Minseok Hong
- Department of Neuropsychiatry, Seoul National University Hospital, Seoul, Republic of Korea
- Department of Psychiatry, Seoul National University College of Medicine, Seoul, Republic of Korea
| | - Ri-Ra Kang
- Department of IT Convergence Engineering, Gachon University, Seongnam-si, Republic of Korea
| | - Jeong Hun Yang
- Department of Psychiatry, Seoul National University College of Medicine, Seoul, Republic of Korea
- Department of Psychiatry, Chungnam National University Sejong Hospital, Sejong, Republic of Korea
| | - Sang Jin Rhee
- Department of Neuropsychiatry, Seoul National University Hospital, Seoul, Republic of Korea
| | - Hyunju Lee
- Department of Neuropsychiatry, Seoul National University Hospital, Seoul, Republic of Korea
| | - Yong-Gyom Kim
- Department of IT Convergence Engineering, Gachon University, Seongnam-si, Republic of Korea
| | - KangYoon Lee
- Department of IT Convergence Engineering, Gachon University, Seongnam-si, Republic of Korea
- Department of Computer Engineering, Gachon University, Seongnam-si, Republic of Korea
| | - HongGi Kim
- Healthconnect Co. Ltd., Seoul, Republic of Korea
| | - Yu Sang Lee
- Department of Psychiatry, Yong-In Mental Hospital, Yongin-si, Republic of Korea
| | - Tak Youn
- Department of Psychiatry and Electroconvulsive Therapy Center, Dongguk University International Hospital, Goyang-si, Republic of Korea
- Institute of Buddhism and Medicine, Dongguk University, Seoul, Republic of Korea
| | - Se Hyun Kim
- Department of Neuropsychiatry, Seoul National University Hospital, Seoul, Republic of Korea
- Department of Psychiatry, Seoul National University College of Medicine, Seoul, Republic of Korea
| | - Yong Min Ahn
- Department of Neuropsychiatry, Seoul National University Hospital, Seoul, Republic of Korea
- Department of Psychiatry, Seoul National University College of Medicine, Seoul, Republic of Korea
- Institute of Human Behavioral Medicine, Seoul National University Medical Research Center, Seoul, Republic of Korea
| |
|
49
|
Yoon SJ, Jutte PC, Soriano A, Sousa R, Zijlstra WP, Wouthuyzen-Bakker M. Predicting periprosthetic joint infection: external validation of preoperative prediction models. J Bone Jt Infect 2024; 9:231-239. [PMID: 39539737 PMCID: PMC11554715 DOI: 10.5194/jbji-9-231-2024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2024] [Accepted: 08/29/2024] [Indexed: 11/16/2024] Open
Abstract
Introduction: Prediction models for periprosthetic joint infections (PJIs) are gaining interest due to their potential to improve clinical decision-making. However, their external validity across various settings remains uncertain. This study aimed to externally validate promising preoperative PJI prediction models in a recent multinational European cohort. Methods: Three preoperative PJI prediction models - by Tan et al. (2018), Del Toro et al. (2019), and Bülow et al. (2022) - that have previously demonstrated high levels of accuracy were selected for validation. A retrospective observational analysis of patients undergoing total hip arthroplasty (THA) and total knee arthroplasty (TKA) at centers in the Netherlands, Portugal, and Spain between January 2020 and December 2021 was conducted. Patient characteristics were compared between our cohort and those used to develop the models. Performance was assessed through discrimination and calibration. Results: The study included 2684 patients, 60 of whom developed a PJI (2.2 %). Our cohort differed from the models' original cohorts with respect to demographic variables, procedural variables, and comorbidity prevalence. The overall accuracies of the models, measured with the c statistic, were 0.72, 0.69, and 0.72 for the Tan, Del Toro, and Bülow models, respectively. Calibration was reasonable, but the PJI risk estimates were most accurate for predicted infection risks below 3 %-4 %. The Tan model overestimated PJI risk above 4 %, whereas the Del Toro model underestimated PJI risk above 3 %. Conclusions: The Tan, Del Toro, and Bülow PJI prediction models were externally validated in this multinational cohort, demonstrating potential for clinical application in identifying high-risk patients and enhancing preoperative counseling and prevention strategies.
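For a binary outcome such as PJI, the c statistic used above to summarize discrimination equals the probability that a randomly chosen patient with the event received a higher predicted risk than a randomly chosen patient without it. The sketch below is a minimal pairwise implementation of that general definition, with ties counted as one half; it is not the authors' code, and the function name and toy inputs are illustrative assumptions.

```python
from typing import Sequence


def c_statistic(risks: Sequence[float], events: Sequence[int]) -> float:
    """Concordance (c statistic) for a binary outcome.

    Compares every event/non-event pair: a pair is concordant when the
    patient with the event has the higher predicted risk; ties count 0.5.
    """
    cases = [r for r, e in zip(risks, events) if e]
    controls = [r for r, e in zip(risks, events) if not e]
    if not cases or not controls:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for case_risk in cases:
        for control_risk in controls:
            if case_risk > control_risk:
                concordant += 1.0
            elif case_risk == control_risk:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


if __name__ == "__main__":
    # Hypothetical predicted risks and outcomes, not study data.
    print(c_statistic([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation. The statistic says nothing about calibration, which is why the abstract reports over- and underestimation of risk separately.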
Affiliation(s)
- Seung-Jae Yoon
- Department of Orthopaedic Surgery, University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
| | - Paul C Jutte
- Department of Orthopaedic Surgery, University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
| | - Alex Soriano
- Infectious Diseases Service, Clínic Barcelona, University of Barcelona, Barcelona, Spain
| | - Ricardo Sousa
- Porto Bone Infection Group (GRIP), Orthopaedic Department, Centro Hospitalar Universitário do Porto, Porto, Portugal
| | - Wierd P Zijlstra
- Department of Orthopaedic Surgery, Medical Center Leeuwarden, Leeuwarden, the Netherlands
| | - Marjan Wouthuyzen-Bakker
- Department of Medical Microbiology and Infection Prevention, University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
| |
|
50
|
Hosar R, Berntsen GR, Steinsbekk A. Validity of the Johns Hopkins Adjusted Clinical Groups system on the utilisation of healthcare services in Norway: a retrospective cross-sectional study. BMC Health Serv Res 2024; 24:1279. [PMID: 39448990 PMCID: PMC11515438 DOI: 10.1186/s12913-024-11715-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2024] [Accepted: 10/07/2024] [Indexed: 10/26/2024] Open
Abstract
BACKGROUND The Adjusted Clinical Groups (ACG) System is a validated electronic risk stratification system. However, there is a lack of studies on the association between different ACG risk scores and the utilisation of different healthcare services using different sources of input data. The aim of this study was therefore to assess the validity of the association between five different ACG risk scores and the utilisation of a range of different healthcare services using input data from either general practitioners (GPs) or hospitals. METHODS Registry-based study of all adult inhabitants in four Norwegian municipalities who received somatic healthcare in one year (N = 168 285). The ACG risk scores resource utilisation band, unscaled ACG concurrent risk, unscaled concurrent risk, frailty flag and chronic condition count were calculated using age, sex and diagnosis codes from GPs and a hospital, respectively. Healthcare utilisation covered GP, municipal and hospital services. Areas under the receiver operating characteristic curve (AUC) were calculated and compared to the AUC of a model using only age and sex. RESULTS Utilisation of all healthcare services increased with increasing scores in the "resource utilisation band" (RUB) and all other investigated ACG risk scores. The risk scores overall distinguished well between levels of utilisation of GP visits (AUC up to 0.84), hospitalisation (AUC up to 0.8) and specialist outpatient visits (AUC up to 0.72), but not out-of-hours GP visits (AUC up to 0.62). The score "unscaled ACG concurrent risk" overall performed best. Risk scores based on data from either GPs or hospitals performed better for the classification of healthcare services in their respective domains. The model based on age and sex performed better for distinguishing between levels of utilisation of municipal services (AUC 0.83-0.90 compared to 0.46-0.79). CONCLUSIONS Risk scores from the ACG system are valid for classifying GP visits, hospitalisation and specialist outpatient visits. They do not outperform simpler models in the classification of utilisation of municipal services, such as nursing homes and home services, or of outpatient emergency care in primary healthcare. The ACG system can be applied in Norway using administrative data from either GPs or hospitals.
Affiliation(s)
- Rannei Hosar
- Department of Public Health and Nursing, Norwegian University of Science and Technology (NTNU), Trondheim, Norway.
| | - Gro Rosvold Berntsen
- Norwegian Center for E-Health Research, University Hospital of North Norway, Tromsø, Norway
- Institute of Community Medicine, UiT The Arctic University of Norway, Tromsø, Norway
| | - Aslak Steinsbekk
- Department of Public Health and Nursing, Norwegian University of Science and Technology (NTNU), Trondheim, Norway
- Norwegian Center for E-Health Research, University Hospital of North Norway, Tromsø, Norway
| |
|