1
Jin Y, Gönen M, Kattan MW. Are Statistical Tests Really Needed to Compare Training and Validation Sets for Prediction Model Development and Evaluation? Chest 2025; 167:40-41. [PMID: 39794078 DOI: 10.1016/j.chest.2024.07.164] [Received: 05/24/2024] [Revised: 07/23/2024] [Accepted: 07/25/2024] [Indexed: 01/13/2025] Open
Affiliation(s)
- Yuxuan Jin
- Department of Quantitative Health Sciences, Cleveland Clinic, Cleveland, OH
- Mithat Gönen
- Department of Epidemiology & Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY.
- Michael W Kattan
- Department of Quantitative Health Sciences, Cleveland Clinic, Cleveland, OH
2
Azizmalayeri M, Abu-Hanna A, Cinà G. Unmasking the chameleons: A benchmark for out-of-distribution detection in medical tabular data. Int J Med Inform 2024; 195:105762. [PMID: 39708667 DOI: 10.1016/j.ijmedinf.2024.105762] [Received: 06/07/2024] [Revised: 12/04/2024] [Accepted: 12/13/2024] [Indexed: 12/23/2024]
Abstract
BACKGROUND Machine Learning (ML) models often struggle to generalize effectively to data that deviates from the training distribution. This raises significant concerns about the reliability of real-world healthcare systems encountering such inputs known as out-of-distribution (OOD) data. These concerns can be addressed by real-time detection of OOD inputs. While numerous OOD detection approaches have been suggested in other fields - especially in computer vision - it remains unclear whether similar methods effectively address challenges posed by medical tabular data. OBJECTIVE To answer this important question, we propose an extensive reproducible benchmark to compare different OOD detection methods in medical tabular data across a comprehensive suite of tests. METHOD To achieve this, we leverage 4 different and large public medical datasets, including eICU and MIMIC-IV, and consider various kinds of OOD cases within these datasets. For example, we examine OODs originating from a statistically different dataset than the training set according to the membership model introduced by Debray et al. [1], as well as OODs obtained by splitting a given dataset based on a value of a distinguishing variable. To identify OOD instances, we explore a range of 10 density-based methods that learn the marginal distribution of the data, alongside 17 post-hoc detectors that are applied on top of prediction models already trained on the data. The prediction models involve three distinct architectures, namely MLP, ResNet, and Transformer. MAIN RESULTS In our experiments, when the membership model achieved an AUC of 0.98, which indicated a clear distinction between OOD data and the training set, we observed that the OOD detection methods had achieved AUC values exceeding 0.95 in distinguishing OOD data. 
In contrast, in the experiments with subtler changes in data distribution such as selecting OOD data based on ethnicity and age characteristics, many OOD detection methods performed similarly to a random classifier with AUC values close to 0.5. This may suggest a correlation between separability, as indicated by the membership model, and OOD detection performance, as indicated by the AUC of the detection model. This warrants future research.
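The AUC values quoted in this abstract can be read as the probability that a randomly chosen OOD sample receives a higher detector score than a randomly chosen in-distribution sample. The following is a minimal Python sketch of that computation, not the benchmark's own code; the function name `auc` and all score values are hypothetical and chosen only to contrast a clearly separable case with a subtle shift.

```python
def auc(scores_pos, scores_neg):
    """AUC as the probability that a positive outscores a negative.

    Ties between a positive and a negative score count as 0.5.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical detector scores (higher = "looks more out-of-distribution").
in_distribution = [0.1, 0.2, 0.3, 0.4]
clear_ood = [0.7, 0.8, 0.9, 0.95]   # well separated from the training data
subtle_ood = [0.1, 0.2, 0.3, 0.4]   # indistinguishable from the training data

auc_clear = auc(clear_ood, in_distribution)    # perfect separation
auc_subtle = auc(subtle_ood, in_distribution)  # chance-level detection
print(auc_clear, auc_subtle)
```

An AUC near 0.5, as in the second case, means the detector ranks OOD and in-distribution samples no better than chance, which is the behaviour the abstract reports for the ethnicity- and age-based splits.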
Affiliation(s)
- Mohammad Azizmalayeri
- Department of Medical Informatics, Amsterdam Public Health Research Institute, Amsterdam UMC, University of Amsterdam, the Netherlands.
- Ameen Abu-Hanna
- Department of Medical Informatics, Amsterdam Public Health Research Institute, Amsterdam UMC, University of Amsterdam, the Netherlands.
- Giovanni Cinà
- Department of Medical Informatics, Amsterdam Public Health Research Institute, Amsterdam UMC, University of Amsterdam, the Netherlands; Institute of Logic, Language and Computation, University of Amsterdam, the Netherlands; Pacmed, Amsterdam, the Netherlands.
3
van Boekel AM, van der Meijden SL, Arbous SM, Nelissen RGHH, Veldkamp KE, Nieswaag EB, Jochems KFT, Holtz J, Veenstra AVIJ, Reijman J, de Jong Y, van Goor H, Wiewel MA, Schoones JW, Geerts BF, de Boer MGJ. Systematic evaluation of machine learning models for postoperative surgical site infection prediction. PLoS One 2024; 19:e0312968. [PMID: 39666725 PMCID: PMC11637340 DOI: 10.1371/journal.pone.0312968] [Received: 01/21/2024] [Accepted: 10/15/2024] [Indexed: 12/14/2024] Open
Abstract
BACKGROUND Surgical site infections (SSIs) lead to increased mortality and morbidity, as well as increased healthcare costs. Multiple models for the prediction of this serious surgical complication have been developed, with an increasing use of machine learning (ML) tools. OBJECTIVE The aim of this systematic review was to assess the performance as well as the methodological quality of validated ML models for the prediction of SSIs. METHODS A systematic search in PubMed, Embase and the Cochrane library was performed from inception until July 2023. Exclusion criteria were the absence of reported model validation, SSIs as part of a composite adverse outcome, and pediatric populations. ML performance measures were evaluated, and ML performances were compared to regression-based methods for studies that reported both methods. Risk of bias (ROB) of the studies was assessed using the Prediction model Risk of Bias Assessment Tool. RESULTS Of the 4,377 studies screened, 24 were included in this review, describing 85 ML models. Most models were only internally validated (81%). The C-statistic was the most used performance measure (reported in 96% of the studies) and only two studies reported calibration metrics. A total of 116 different predictors were described, of which age, steroid use, sex, diabetes, and smoking were most frequently (100% to 75%) incorporated. Thirteen studies compared ML models to regression-based models and showed a similar performance of both modelling methods. For all included studies, the overall ROB was high or unclear. CONCLUSIONS A multitude of ML models for the prediction of SSIs are available, with large variability in performance. However, most models lacked external validation, performance was reported limitedly, and the risk of bias was high. In studies describing both ML models and regression-based models, one modelling method did not outperform the other.
Affiliation(s)
- Anna M. van Boekel
- Department of Internal Medicine, Leiden University Medical Center, Leiden, The Netherlands
- Siri L. van der Meijden
- Department of Intensive Care, Leiden University Medical Center, Leiden, The Netherlands
- Healthplus.ai R&D B.V., Amsterdam, The Netherlands
- Sesmu M. Arbous
- Department of Intensive Care, Leiden University Medical Center, Leiden, The Netherlands
- Rob G. H. H. Nelissen
- Department of Orthopedic surgery, Leiden University Medical Center, Leiden, The Netherlands
- Karin E. Veldkamp
- Department of Medical Microbiology and Infection Control, Leiden University Medical Center, Leiden, The Netherlands
- Emma B. Nieswaag
- Department of Intensive Care, Leiden University Medical Center, Leiden, The Netherlands
- Healthplus.ai R&D B.V., Amsterdam, The Netherlands
- Kim F. T. Jochems
- Department of Intensive Care, Leiden University Medical Center, Leiden, The Netherlands
- Healthplus.ai R&D B.V., Amsterdam, The Netherlands
- Jeroen Holtz
- Department of Intensive Care, Leiden University Medical Center, Leiden, The Netherlands
- Healthplus.ai R&D B.V., Amsterdam, The Netherlands
- Annekee van IJlzinga Veenstra
- Department of Intensive Care, Leiden University Medical Center, Leiden, The Netherlands
- Healthplus.ai R&D B.V., Amsterdam, The Netherlands
- Jeroen Reijman
- Department of Intensive Care, Leiden University Medical Center, Leiden, The Netherlands
- Healthplus.ai R&D B.V., Amsterdam, The Netherlands
- Ype de Jong
- Department of Internal Medicine, Leiden University Medical Center, Leiden, The Netherlands
- Department of Clinical Epidemiology, Leiden University Medical Center, Leiden, The Netherlands
- Harry van Goor
- Department of Surgery, Radboud UMC, Nijmegen, The Netherlands
- Jan W. Schoones
- Waleus Medical Library, Leiden University Medical Center, Leiden, The Netherlands
- Mark G. J. de Boer
- Department of Clinical Epidemiology, Leiden University Medical Center, Leiden, The Netherlands
- Department of Infectious disease, Leiden University Medical Center, Leiden, The Netherlands
4
Ni H, Peng Y, Pan Q, Gao Z, Li S, Chen L, Lin Y. Prediction model of ICU readmission in Chinese patients with acute type A aortic dissection: a retrospective study. BMC Med Inform Decis Mak 2024; 24:358. [PMID: 39593004 PMCID: PMC11600566 DOI: 10.1186/s12911-024-02770-2] [Received: 03/06/2024] [Accepted: 11/15/2024] [Indexed: 11/28/2024] Open
Abstract
BACKGROUND Readmission to the intensive care unit (ICU) remains a severe challenge, leading to higher rates of death and a greater financial burden. This study aimed to develop a nomogram-based prediction model of ICU readmission for individuals with acute type A aortic dissection (ATAAD). METHODS A total of 846 ATAAD patients were retrospectively enrolled between May 2014 and October 2021. Logistic regression was employed to identify the independent risk factors. The prediction model was evaluated using the Hosmer-Lemeshow (H-L) test, the calibration curve, and the area under the receiver operating characteristic curve (AUC). Decision curve analysis (DCA) was used to assess the clinical utility. RESULTS Fifty-seven (6.7%) ATAAD patients were readmitted to the ICU after their initial ICU discharge. Age ≥ 65 years, body mass index (BMI) ≥ 28 kg/m2, tracheotomy, continuous renal replacement therapy (CRRT), and the length of the initial ICU stay were predictors of ICU readmission. The AUC was 0.837 (95%CI: 0.789-0.884) and the model fit the data well (H-L test, P = 0.519). DCA also demonstrated good clinical practicability. CONCLUSIONS This prediction model may help clinicians assess the risk of ICU readmission and facilitate the early identification of ATAAD patients at high risk.
Affiliation(s)
- Hong Ni
- Department of Nursing, Fujian Medical University Union Hospital, Fuzhou, Fujian Province, China
- Department of Neurology, Fujian Medical University Union Hospital, Fuzhou, Fujian Province, China
- Yanchun Peng
- Department of Nursing, Fujian Medical University Union Hospital, Fuzhou, Fujian Province, China
- Qiong Pan
- Department of Nursing, Fujian Medical University Union Hospital, Fuzhou, Fujian Province, China
- Zhuling Gao
- Department of Neurology, Fujian Medical University Union Hospital, Fuzhou, Fujian Province, China
- Sailan Li
- Department of Cardiac Surgery, Fujian Medical University Union Hospital, Fuzhou, Fujian Province, China
- Liangwan Chen
- Department of Cardiac Surgery, Fujian Medical University Union Hospital, Fuzhou, Fujian Province, China.
- Yanjuan Lin
- Department of Nursing, Fujian Medical University Union Hospital, Fuzhou, Fujian Province, China.
- Department of Cardiac Surgery, Fujian Medical University Union Hospital, Fuzhou, Fujian Province, China.
5
Koochakpour K, Pant D, Westbye OS, Røst TB, Leventhal B, Koposov R, Clausen C, Skokauskas N, Nytrø Ø. Ability of clinical data to predict readmission in Child and Adolescent Mental Health Services. PeerJ Comput Sci 2024; 10:e2367. [PMID: 39650424 PMCID: PMC11622991 DOI: 10.7717/peerj-cs.2367] [Received: 02/28/2024] [Accepted: 09/07/2024] [Indexed: 12/11/2024]
Abstract
This study addresses the challenge of predicting readmissions in Child and Adolescent Mental Health Services (CAMHS) by analyzing the predictability of readmissions over short, medium, and long term periods. Using health records spanning 35 years, which included 22,643 patients and 30,938 episodes of care, we focused on the episode of care as a central unit, defined as a referral-discharge cycle that incorporates assessments and interventions. Data pre-processing involved handling missing values, normalizing, and transforming data, while resolving issues related to overlapping episodes and correcting registration errors where possible. Readmission prediction was inferred from electronic health records (EHR), as this variable was not directly recorded. A binary classifier distinguished between readmitted and non-readmitted patients, followed by a multi-class classifier to categorize readmissions based on timeframes: short (within 6 months), medium (6 months - 2 years), and long (more than 2 years). Several predictive models were evaluated based on metrics like AUC, F1-score, precision, and recall, and the K-prototype algorithm was employed to explore similarities between episodes through clustering. The optimal binary classifier (Oversampled Gradient Boosting) achieved an AUC of 0.7005, while the multi-class classifier (Oversampled Random Forest) reached an AUC of 0.6368. The K-prototype resulted in three clusters as optimal (SI: 0.256, CI: 4473.64). Despite identifying relationships between care intensity, case complexity, and readmission risk, generalizing these findings proved difficult, partly because clinicians often avoid discharging patients likely to be readmitted. Overall, while this dataset offers insights into patient care and service patterns, predicting readmissions remains challenging, suggesting a need for improved analytical models that consider patient development, disease progression, and intervention effects.
Affiliation(s)
- Kaban Koochakpour
- Department of Computer Science, Norwegian University of Science and Technology, Trondheim, Norway
- Dipendra Pant
- Department of Computer Science, Norwegian University of Science and Technology, Trondheim, Norway
- Department of Child and Adolescent Psychiatry, Clinic of Mental Health Care, St. Olav University Hospital, Trondheim, Norway
- Odd Sverre Westbye
- Department of Child and Adolescent Psychiatry, Clinic of Mental Health Care, St. Olav University Hospital, Trondheim, Norway
- Regional Centre for Child and Youth Mental Health and Child Welfare (RKBU Central Norway), Department of Mental Health, Faculty of Medicine and Health Sciences, Norwegian University of Science and Technology, Trondheim, Norway
- Thomas Brox Røst
- Department of Computer Science, Norwegian University of Science and Technology, Trondheim, Norway
- Vivit AS, Trondheim, Norway
- Roman Koposov
- Regional Centre for Child and Youth Mental Health and Child Welfare (RKBU North), UiT The Arctic University of Norway, Tromsø, Norway
- Carolyn Clausen
- Regional Centre for Child and Youth Mental Health and Child Welfare (RKBU Central Norway), Department of Mental Health, Faculty of Medicine and Health Sciences, Norwegian University of Science and Technology, Trondheim, Norway
- Norbert Skokauskas
- Regional Centre for Child and Youth Mental Health and Child Welfare (RKBU Central Norway), Department of Mental Health, Faculty of Medicine and Health Sciences, Norwegian University of Science and Technology, Trondheim, Norway
- Øystein Nytrø
- Department of Computer Science, Norwegian University of Science and Technology, Trondheim, Norway
- Department of Child and Adolescent Psychiatry, Clinic of Mental Health Care, St. Olav University Hospital, Trondheim, Norway
- Department of Computer Science, UiT The Arctic University of Norway, Tromsø, Norway
6
Dantan E, Foucher Y, Simon-Pimmel J, Léger M, Campfort M, Lasocki S, Lakhal K, Bouras M, Roquilly A, Cinotti R. Long-term survival of traumatic brain injury and intra-cerebral haemorrhage patients: A multicentric observational cohort. J Crit Care 2024; 83:154843. [PMID: 38875914 DOI: 10.1016/j.jcrc.2024.154843] [Received: 12/13/2023] [Revised: 05/13/2024] [Accepted: 06/06/2024] [Indexed: 06/16/2024]
Abstract
PURPOSE Mortality is often assessed during the ICU stay and shortly afterwards, but rarely at a later stage. We aimed to compare long-term mortality between traumatic brain injury (TBI) and intra-cerebral haemorrhage (ICH) patients. MATERIALS AND METHODS From an observational cohort, we studied 580 TBI patients and 435 ICH patients, admitted from January 2013 to February 2021 in 3 ICUs and alive at 7 days post-ICU discharge. We performed a Lasso-penalized Cox survival analysis. RESULTS We estimated 7-year survival rates at 72.8% (95%CI from 67.3% to 78.7%) for ICH patients and at 84.9% (95%CI from 80.9% to 89.1%) for TBI patients; ICH patients thus presented a higher mortality risk than TBI patients. Additionally, we identified variables associated with higher mortality risk (age, ICU length of stay, tracheostomy, low GCS, absence of intracranial pressure monitoring). We also observed that anisocoria was related to the mortality risk in the early stage after ICU stay. CONCLUSIONS In this ICU survivor population with a prolonged follow-up, we highlight an acute risk of death after ICU stay, which seems to last longer in ICH patients. Several variables characteristic of disease severity appeared associated with long-term mortality, raising the hypothesis that the most severe patients deserve closer follow-up after ICU stay.
Affiliation(s)
- E Dantan
- Nantes Université, Univ Tours, CHU Nantes, INSERM, MethodS in Patients-centered outcomes and HEalth Research, SPHERE, F-44000 Nantes, France.
- Y Foucher
- Poitiers Université, CHU de Poitiers, CIC INSERM 1402, Poitiers, France
- J Simon-Pimmel
- Nantes Université, Univ Tours, CHU Nantes, INSERM, MethodS in Patients-centered outcomes and HEalth Research, SPHERE, F-44000 Nantes, France
- M Léger
- Department of Anaesthesiology and Critical Care, Angers University, CHU Angers, Angers, France
- M Campfort
- Department of Anaesthesiology and Critical Care, Angers University, CHU Angers, Angers, France
- S Lasocki
- Department of Anaesthesiology and Critical Care, Angers University, CHU Angers, Angers, France
- K Lakhal
- Nantes Université, CHU Nantes, Pôle Anesthésie Réanimations, Service d'Anesthésie Réanimation Chirurgicale, Hôpital Laennec, Nantes F-44093, France
- M Bouras
- Nantes Université, CHU Nantes, INSERM, Center for Research in Transplantation and Translational Immunology, UMR, 1064 Nantes, France; CHU Nantes, INSERM, Nantes Université, Anesthesie Reanimation, CIC0004, 1413 Nantes, France
- A Roquilly
- Nantes Université, CHU Nantes, INSERM, Center for Research in Transplantation and Translational Immunology, UMR, 1064 Nantes, France; CHU Nantes, INSERM, Nantes Université, Anesthesie Reanimation, CIC0004, 1413 Nantes, France
- R Cinotti
- Nantes Université, Univ Tours, CHU Nantes, INSERM, MethodS in Patients-centered outcomes and HEalth Research, SPHERE, F-44000 Nantes, France; Nantes Université, CHU Nantes, Pôle Anesthésie Réanimations, Service d'Anesthésie Réanimation chirurgicale, Hôtel Dieu, Nantes F-44093, France
7
Lin J, Yang J, Yin M, Tang Y, Chen L, Xu C, Zhu S, Gao J, Liu L, Liu X, Gu C, Huang Z, Wei Y, Zhu J. Development and Validation of Multimodal Models to Predict the 30-Day Mortality of ICU Patients Based on Clinical Parameters and Chest X-Rays. Journal of Imaging Informatics in Medicine 2024; 37:1312-1322. [PMID: 38448758 PMCID: PMC11300735 DOI: 10.1007/s10278-024-01066-1] [Received: 10/06/2023] [Revised: 02/21/2024] [Accepted: 02/22/2024] [Indexed: 03/08/2024]
Abstract
We aimed to develop and validate multimodal ICU patient prognosis models that combine clinical parameters data and chest X-ray (CXR) images. A total of 3798 subjects with clinical parameters and CXR images were extracted from the Medical Information Mart for Intensive Care IV (MIMIC-IV) database and an external hospital (the test set). The primary outcome was 30-day mortality after ICU admission. Automated machine learning (AutoML) and convolutional neural networks (CNNs) were used to construct single-modal models based on clinical parameters and CXR separately. An early fusion approach was used to integrate both modalities (clinical parameters and CXR) into a multimodal model named PrismICU. Compared to the single-modal models, i.e., the clinical parameter model (AUC = 0.80, F1-score = 0.43) and the CXR model (AUC = 0.76, F1-score = 0.45) and the scoring system APACHE II (AUC = 0.83, F1-score = 0.77), PrismICU (AUC = 0.95, F1 score = 0.95) showed improved performance in predicting the 30-day mortality in the validation set. In the test set, PrismICU (AUC = 0.82, F1-score = 0.61) was also better than the clinical parameters model (AUC = 0.72, F1-score = 0.50), CXR model (AUC = 0.71, F1-score = 0.36), and APACHE II (AUC = 0.62, F1-score = 0.50). PrismICU, which integrated clinical parameters data and CXR images, performed better than single-modal models and the existing scoring system. It supports the potential of multimodal models based on structured data and imaging in clinical management.
Affiliation(s)
- Jiaxi Lin
- Department of Gastroenterology, The First Affiliated Hospital of Soochow University, 188 Shizi Street, Jiangsu, Suzhou 215006, China
- Suzhou Clinical Center of Digestive Diseases, Suzhou, China
- Jin Yang
- Department of Critical Care Medicine, The First Affiliated Hospital of Soochow University, 188 Shizi Street, Jiangsu, Suzhou 215006, China
- Minyue Yin
- Department of Gastroenterology, The First Affiliated Hospital of Soochow University, 188 Shizi Street, Jiangsu, Suzhou 215006, China
- Suzhou Clinical Center of Digestive Diseases, Suzhou, China
- Yuxiu Tang
- Department of Critical Care Medicine, The First Affiliated Hospital of Soochow University, 188 Shizi Street, Jiangsu, Suzhou 215006, China
- Liquan Chen
- Department of Critical Care Medicine, The First Affiliated Hospital of Soochow University, 188 Shizi Street, Jiangsu, Suzhou 215006, China
- Chang Xu
- Department of Gastroenterology, The First Affiliated Hospital of Soochow University, 188 Shizi Street, Jiangsu, Suzhou 215006, China
- Suzhou Clinical Center of Digestive Diseases, Suzhou, China
- Shiqi Zhu
- Department of Gastroenterology, The First Affiliated Hospital of Soochow University, 188 Shizi Street, Jiangsu, Suzhou 215006, China
- Suzhou Clinical Center of Digestive Diseases, Suzhou, China
- Jingwen Gao
- Department of Gastroenterology, The First Affiliated Hospital of Soochow University, 188 Shizi Street, Jiangsu, Suzhou 215006, China
- Suzhou Clinical Center of Digestive Diseases, Suzhou, China
- Lu Liu
- Department of Gastroenterology, The First Affiliated Hospital of Soochow University, 188 Shizi Street, Jiangsu, Suzhou 215006, China
- Suzhou Clinical Center of Digestive Diseases, Suzhou, China
- Xiaolin Liu
- Department of Gastroenterology, The First Affiliated Hospital of Soochow University, 188 Shizi Street, Jiangsu, Suzhou 215006, China
- Suzhou Clinical Center of Digestive Diseases, Suzhou, China
- Chenqi Gu
- Department of Radiology, The First Affiliated Hospital of Soochow University, Suzhou, China
- Zhou Huang
- Department of Radiology, The First Affiliated Hospital of Soochow University, Suzhou, China
- Yao Wei
- Department of Critical Care Medicine, The First Affiliated Hospital of Soochow University, 188 Shizi Street, Jiangsu, Suzhou 215006, China.
- Jinzhou Zhu
- Department of Gastroenterology, The First Affiliated Hospital of Soochow University, 188 Shizi Street, Jiangsu, Suzhou 215006, China.
- Suzhou Clinical Center of Digestive Diseases, Suzhou, China.
8
Liou L, Scott E, Parchure P, Ouyang Y, Egorova N, Freeman R, Hofer IS, Nadkarni GN, Timsina P, Kia A, Levin MA. Assessing calibration and bias of a deployed machine learning malnutrition prediction model within a large healthcare system. NPJ Digit Med 2024; 7:149. [PMID: 38844546 PMCID: PMC11156633 DOI: 10.1038/s41746-024-01141-5] [Received: 10/04/2023] [Accepted: 05/22/2024] [Indexed: 06/09/2024] Open
Abstract
Malnutrition is a frequently underdiagnosed condition leading to increased morbidity, mortality, and healthcare costs. The Mount Sinai Health System (MSHS) deployed a machine learning model (MUST-Plus) to detect malnutrition upon hospital admission. However, in diverse patient groups, a poorly calibrated model may lead to misdiagnosis, exacerbating health care disparities. We explored the model's calibration across different variables and methods to improve calibration. Data from adult patients admitted to five MSHS hospitals from January 1, 2021 - December 31, 2022, were analyzed. We compared MUST-Plus prediction to the registered dietitian's formal assessment. Hierarchical calibration was assessed and compared between the recalibration sample (N = 49,562) of patients admitted between January 1, 2021 - December 31, 2022, and the hold-out sample (N = 17,278) of patients admitted between January 1, 2023 - September 30, 2023. Statistical differences in calibration metrics were tested using bootstrapping with replacement. Before recalibration, the overall model calibration intercept was -1.17 (95% CI: -1.20, -1.14), slope was 1.37 (95% CI: 1.34, 1.40), and Brier score was 0.26 (95% CI: 0.25, 0.26). Both weak and moderate measures of calibration were significantly different between White and Black patients and between male and female patients. Logistic recalibration significantly improved calibration of the model across race and gender in the hold-out sample. The original MUST-Plus model showed significant differences in calibration between White vs. Black patients. It also overestimated malnutrition in females compared to males. Logistic recalibration effectively reduced miscalibration across all patient subgroups. Continual monitoring and timely recalibration can improve model accuracy.
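The calibration quantities named in this abstract (intercept, slope, Brier score) and the idea of logistic recalibration can be illustrated with a short Python sketch. It is not the authors' code: the abstract does not state how the intercept and slope were estimated, so this sketch assumes both are fitted jointly by regressing the outcome on the logit of the predicted probability. The helper names and the toy data are hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def brier_score(probs, outcomes):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def fit_recalibration(probs, outcomes, iters=25):
    """Fit outcome ~ sigmoid(a + b * logit(p)) by Newton-Raphson.

    Returns (a, b): a is the calibration intercept, b the calibration slope.
    Assumption: intercept and slope are estimated jointly.
    """
    xs = [logit(p) for p in probs]
    a, b = 0.0, 1.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, outcomes):
            mu = sigmoid(a + b * x)
            w = mu * (1.0 - mu)
            g0 += y - mu            # gradient w.r.t. intercept
            g1 += (y - mu) * x      # gradient w.r.t. slope
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if abs(det) < 1e-12:
            break
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


# Hypothetical toy data: the model predicts 0.2 and 0.6, but the observed
# event rates in those two groups are 0.1 and 0.4 (systematic overestimation).
probs = [0.2] * 10 + [0.6] * 10
outcomes = [1] + [0] * 9 + [1] * 4 + [0] * 6

a, b = fit_recalibration(probs, outcomes)
recalibrated = [sigmoid(a + b * logit(p)) for p in probs]

brier_before = brier_score(probs, outcomes)
brier_after = brier_score(recalibrated, outcomes)
print(a, b, brier_before, brier_after)
```

In this toy case the fitted slope is close to 1 and the intercept is negative, which corresponds to a model that overestimates risk uniformly on the logit scale; applying the fitted transform lowers the Brier score.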
Affiliation(s)
- Lathan Liou
- Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Prathamesh Parchure
- Institute for Healthcare Delivery Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Yuxia Ouyang
- Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Anesthesiology, Perioperative and Pain Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Natalia Egorova
- Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Robert Freeman
- Institute for Healthcare Delivery Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ira S Hofer
- Department of Anesthesiology, Perioperative and Pain Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- The Division of Data Driven and Digital Medicine (D3M), The Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Girish N Nadkarni
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- The Division of Data Driven and Digital Medicine (D3M), The Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Prem Timsina
- Institute for Healthcare Delivery Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Arash Kia
- Institute for Healthcare Delivery Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Anesthesiology, Perioperative and Pain Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Matthew A Levin
- Institute for Healthcare Delivery Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Anesthesiology, Perioperative and Pain Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
9
Biesheuvel LA, Dongelmans DA, Elbers PW. Artificial intelligence to advance acute and intensive care medicine. Curr Opin Crit Care 2024; 30:246-250. [PMID: 38525882 PMCID: PMC11064910 DOI: 10.1097/mcc.0000000000001150] [Indexed: 03/26/2024]
Abstract
PURPOSE OF REVIEW This review explores recent key advancements in artificial intelligence for acute and intensive care medicine. As artificial intelligence rapidly evolves, this review aims to elucidate its current applications, future possibilities, and the vital challenges that are associated with its integration into emergency medical dispatch, triage, medical consultation and ICUs. RECENT FINDINGS The integration of artificial intelligence in emergency medical dispatch (EMD) facilitates swift and accurate assessment. In the emergency department (ED), artificial intelligence driven triage models leverage diverse patient data for improved outcome predictions, surpassing human performance in retrospective studies. Artificial intelligence can streamline medical documentation in the ED and enhances medical imaging interpretation. The introduction of large multimodal generative models showcases the future potential to process varied biomedical data for comprehensive decision support. In the ICU, artificial intelligence applications range from early warning systems to treatment suggestions. SUMMARY Despite promising academic strides, widespread artificial intelligence adoption in acute and critical care is hindered by ethical, legal, technical, organizational, and validation challenges. Despite these obstacles, artificial intelligence's potential to streamline clinical workflows is evident. When these barriers are overcome, future advancements in artificial intelligence have the potential to transform the landscape of patient care for acute and intensive care medicine.
Affiliation(s)
- Laurens A. Biesheuvel
- Department of Intensive Care Medicine, Center for Critical Care Computational Intelligence (C4I), Amsterdam Medical Data Science (AMDS), Amsterdam Cardiovascular Science (ACS), Amsterdam Institute for Infection and Immunity (AII), Amsterdam Public Health (APH), Amsterdam UMC
- Quantitative Data Analytics Group, Department of Computer Science, Faculty of Science, Vrije Universiteit
- Dave A. Dongelmans
- Department of Intensive Care Medicine, Amsterdam Public Health (APH), Amsterdam UMC, University of Amsterdam
- National Intensive Care Evaluation Foundation, Amsterdam, The Netherlands
| | - Paul W.G. Elbers
- Department of Intensive Care Medicine, Center for Critical Care Computational Intelligence (C4I), Amsterdam Medical Data Science (AMDS), Amsterdam Cardiovascular Science (ACS), Amsterdam Institute for Infection and Immunity (AII), Amsterdam Public Health (APH), Amsterdam UMC
| |
Collapse
|
10
|
Lost J, Ashraf N, Jekel L, von Reppert M, Tillmanns N, Willms K, Merkaj S, Petersen GC, Avesta A, Ramakrishnan D, Omuro A, Nabavizadeh A, Bakas S, Bousabarah K, Lin M, Aneja S, Sabel M, Aboian M. Enhancing clinical decision-making: An externally validated machine learning model for predicting isocitrate dehydrogenase mutation in gliomas using radiomics from presurgical magnetic resonance imaging. Neurooncol Adv 2024; 6:vdae157. [PMID: 39659829 PMCID: PMC11630777 DOI: 10.1093/noajnl/vdae157] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/12/2024] Open
Abstract
Background Glioma, the most prevalent primary brain tumor, poses challenges in prognosis, particularly in the high-grade subclass, despite advanced treatments. The recent shift in tumor classification underscores the crucial role of isocitrate dehydrogenase (IDH) mutation status in the clinical care of glioma patients. However, conventional methods for determining IDH status, including biopsy, have limitations. Using machine learning (ML) on magnetic resonance imaging to predict IDH mutation status shows promise but faces challenges in generalizability and translation into clinical practice, because most studies use either single-institution or homogeneous datasets for model training and validation. Our study aims to bridge this gap by using multi-institution data for model validation. Methods This retrospective study uses data from large, annotated datasets for internal validation (377 cases from Yale New Haven Hospitals) and external validation (207 cases from facilities outside Yale New Haven Health). The 6-step research process includes image acquisition, semi-automated tumor segmentation, feature extraction, model building with feature selection, internal validation, and external validation. An extreme gradient boosting ML model predicted the IDH mutation status, which was confirmed by immunohistochemistry. Results The ML model demonstrated high performance, with an area under the curve (AUC), accuracy, sensitivity, and specificity of 0.862, 0.865, 0.885, and 0.713, respectively, in internal validation and 0.835, 0.851, 0.850, and 0.847, respectively, in external validation. Conclusions The ML model, built on a heterogeneous dataset, provided robust results in external validation for the prediction task, emphasizing its potential clinical utility. Future research should explore expanding its applicability and validation in diverse global healthcare settings.
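For readers less familiar with the metrics reported in this abstract, the following is a minimal sketch of how accuracy, sensitivity, and specificity relate to a confusion matrix. It is illustrative only: the function name `binary_metrics`, the 0.5 decision threshold, and the toy inputs are assumptions and are not taken from the study.

```python
def binary_metrics(labels, probs, threshold=0.5):
    """Accuracy, sensitivity, and specificity for 0/1 labels.

    The threshold of 0.5 is an assumed default, not a value from the study.
    """
    tp = fp = tn = fn = 0
    for y, p in zip(labels, probs):
        pred = 1 if p >= threshold else 0
        if y == 1 and pred == 1:
            tp += 1
        elif y == 1 and pred == 0:
            fn += 1
        elif y == 0 and pred == 1:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # share of true mutants detected
        "specificity": tn / (tn + fp),  # share of wild-type cases correctly ruled out
    }


# Toy data (hypothetical): 1 = IDH-mutant, 0 = IDH-wild-type
toy_labels = [1, 1, 1, 0, 0]
toy_probs = [0.9, 0.6, 0.2, 0.7, 0.1]
toy_metrics = binary_metrics(toy_labels, toy_probs)
```

Unlike the AUC, these three values depend on the chosen threshold, which is one reason sensitivity and specificity can shift between internal and external cohorts even when the AUC is similar.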
Affiliation(s)
- Jan Lost
  - Department of Neurosurgery, Heinrich-Heine University, Dusseldorf, Germany
- Nader Ashraf
  - College of Medicine, Alfaisal University, Riyadh, Saudi Arabia
- Leon Jekel
  - DKFZ Division of Translational Neurooncology at the WTZ, German Cancer Consortium, DKTK Partner Site, University Hospital Essen, Essen, Germany
- Niklas Tillmanns
  - Department of Diagnostic and Interventional Radiology, Medical Faculty, University Dusseldorf, Dusseldorf, Germany
- Arman Avesta
  - Department of Radiology, Massachusetts General Hospital, Boston, Massachusetts, USA
- Divya Ramakrishnan
  - Department of Radiology and Biomedical Imaging, Yale School of Medicine, New Haven, Connecticut, USA
- Antonio Omuro
  - Department of Neurology and Yale Cancer Center, Yale School of Medicine, New Haven, Connecticut, USA
- Ali Nabavizadeh
  - Department of Radiology, Perelman School of Medicine, Hospital of University of Pennsylvania, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Spyridon Bakas
  - Division of Computational Pathology, Department of Pathology and Laboratory Medicine, Indiana University School of Medicine, Indianapolis, Indiana, USA
- MingDe Lin
  - Visage Imaging, Inc., San Diego, California, USA
- Sanjay Aneja
  - Department of Therapeutic Radiology, Yale School of Medicine, New Haven, Connecticut, USA
- Mariam Aboian
  - Department of Radiology, Children’s Hospital of Philadelphia (CHOP), Philadelphia, Pennsylvania, USA
|
11
|
Gorham TJ, Tumin D, Groner J, Allen E, Retzke J, Hersey S, Liu SB, Macias C, Alachraf K, Smith AW, Blount T, Wall B, Crickmore K, Wooten WI, Jamison SD, Rust S. Predicting emergency department visits among children with asthma in two academic medical systems. J Asthma 2023; 60:2137-2144. [PMID: 37318283 DOI: 10.1080/02770903.2023.2225603] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2023] [Accepted: 06/11/2023] [Indexed: 06/16/2023]
Abstract
Objective: To develop and validate a predictive algorithm that identifies pediatric patients at risk of asthma-related emergencies, and to test whether algorithm performance can be improved at an external site via local retraining. Methods: In a retrospective cohort at the first site, data from 26 008 patients with asthma aged 2-18 years (2012-2017) were used to develop a lasso-regularized logistic regression model predicting emergency department visits for asthma within one year of a primary care encounter, known as the Asthma Emergency Risk (AER) score. Internal validation was conducted on 8634 patient encounters from 2018. External validation of the AER score was conducted using 1313 pediatric patient encounters from a second site during 2018. The AER score components were then reweighted by logistic regression on data from the second site to improve local model performance. Prediction intervals (PI) were constructed via 10 000 bootstrapped samples. Results: At the first site, the AER score had a cross-validated area under the receiver operating characteristic curve (AUROC) of 0.768 (95% PI: 0.745-0.790) during model training and an AUROC of 0.769 in the 2018 internal validation dataset (p = 0.959). When applied without modification to the second site, the AER score had an AUROC of 0.684 (95% PI: 0.624-0.742). After local refitting, the cross-validated AUROC improved to 0.737 (95% PI: 0.676-0.794; p = 0.037 as compared to initial AUROC). Conclusions: The AER score demonstrated strong internal validity, but external validity was dependent on reweighting model components to reflect local data characteristics at the external site.
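The abstract reports AUROC values with intervals built from bootstrapped samples but does not say how the interval bounds were derived. As a rough illustration, the sketch below computes the AUROC as a pairwise ranking probability and a percentile bootstrap interval around it. The percentile method, the function names, the default of 1000 resamples, and the toy data are all assumptions and not the authors' procedure.

```python
import random


def auroc(labels, scores):
    """AUROC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auroc_interval(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUROC (assumed method, see lead-in)."""
    auroc(labels, scores)  # fail early if only one class is present
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain a single class
            stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo_idx = int((alpha / 2) * n_boot)
    hi_idx = min(n_boot - 1, int((1 - alpha / 2) * n_boot))
    return stats[lo_idx], stats[hi_idx]


# Toy data (hypothetical): 1 = ED visit within one year, 0 = no visit
toy_labels = [0, 0, 1, 1, 0, 1, 0, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.5, 0.7]
point = auroc(toy_labels, toy_scores)
lower, upper = bootstrap_auroc_interval(toy_labels, toy_scores, n_boot=200, seed=1)
```

The pairwise loop is quadratic in the number of cases, which is fine for a sketch; a rank-based formula or a library routine would be the usual choice for cohorts of the size described here.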
Affiliation(s)
- Tyler J Gorham
  - Information Technology Research & Innovation, Nationwide Children's Hospital, Columbus, OH, USA
- Dmitry Tumin
  - Department of Pediatrics, Brody School of Medicine at East Carolina University, Greenville, NC, USA
- Judith Groner
  - Division of Primary Care Pediatrics, Nationwide Children's Hospital, Columbus, OH, USA
  - Department of Pediatrics, The Ohio State University, Columbus, OH, USA
- Elizabeth Allen
  - Division of Primary Care Pediatrics, Nationwide Children's Hospital, Columbus, OH, USA
  - Department of Pediatrics, The Ohio State University, Columbus, OH, USA
- Jessica Retzke
  - Division of Primary Care Pediatrics, Nationwide Children's Hospital, Columbus, OH, USA
  - Department of Pediatrics, The Ohio State University, Columbus, OH, USA
- Stephen Hersey
  - Division of Primary Care Pediatrics, Nationwide Children's Hospital, Columbus, OH, USA
  - Department of Pediatrics, The Ohio State University, Columbus, OH, USA
- Swan Bee Liu
  - Information Technology Research & Innovation, Nationwide Children's Hospital, Columbus, OH, USA
- Charlie Macias
  - Quality Improvement Services, Nationwide Children's Hospital, Columbus, OH, USA
- Kamel Alachraf
  - Brody School of Medicine at East Carolina University, Greenville, NC, USA
- Aimee W Smith
  - Department of Psychology, East Carolina University, Greenville, NC, USA
- William I Wooten
  - Department of Pediatrics, Brody School of Medicine at East Carolina University, Greenville, NC, USA
- Shaundreal D Jamison
  - Department of Pediatrics, Brody School of Medicine at East Carolina University, Greenville, NC, USA
- Steve Rust
  - Information Technology Research & Innovation, Nationwide Children's Hospital, Columbus, OH, USA
|
12
|
Kagendi N, Mwau M. A Machine Learning Approach to Predict HIV Viral Load Hotspots in Kenya Using Real-World Data. Health Data Science 2023; 3:0019. [PMID: 38487196 PMCID: PMC10880164 DOI: 10.34133/hds.0019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2022] [Accepted: 04/25/2023] [Indexed: 03/17/2024]
Abstract
Background Machine learning models are not in routine use for predicting HIV status. Our objective is to describe the development of a machine learning model to predict HIV viral load (VL) hotspots as an early warning system in Kenya, based on data routinely collected by affiliate entities of the Ministry of Health. Based on World Health Organization recommendations, hotspots are health facilities where ≥20% of people living with HIV have an unsuppressed VL. Prediction of VL hotspots provides an early warning system that helps health administrators optimize treatment and resource distribution. Methods A random forest model was built to predict the hotspot status of a health facility in the upcoming month, starting from 2016. Prior to model building, the datasets were cleaned and checked for outliers and multicollinearity at the patient level. The patient-level data were then aggregated to the facility level. We analyzed data from 4 million tests and 4,265 facilities. The facility-level dataset was divided into training (75%) and test (25%) datasets. Results The model discriminates hotspots from non-hotspots with an accuracy of 78%. The F1 score of the model is 69% and the Brier score is 0.139. In December 2019, our model correctly predicted 434 VL hotspots in addition to the observed 446 VL hotspots. Conclusion The hotspot mapping model can be essential to antiretroviral therapy programs. This model can support decision-makers in identifying VL hotspots ahead of time using cost-efficient, routinely collected data.
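The hotspot definition and the Brier score mentioned in this abstract can be illustrated with a short sketch. It assumes one row per VL test with a facility identifier and a suppression flag, whereas the abstract defines hotspots per person living with HIV, so the per-test aggregation is a simplification. The names `facility_hotspots` and `brier_score` and the toy records are hypothetical.

```python
from collections import defaultdict

HOTSPOT_THRESHOLD = 0.20  # ">=20% not suppressed", as stated in the abstract


def facility_hotspots(tests):
    """Flag each facility whose unsuppressed share meets the threshold.

    tests: iterable of (facility_id, suppressed) pairs, one per VL test
    (assumed layout; the study's actual data schema is not described).
    """
    totals = defaultdict(int)
    unsuppressed = defaultdict(int)
    for facility_id, suppressed in tests:
        totals[facility_id] += 1
        if not suppressed:
            unsuppressed[facility_id] += 1
    return {
        fac: unsuppressed[fac] / totals[fac] >= HOTSPOT_THRESHOLD
        for fac in totals
    }


def brier_score(labels, probs):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    if len(labels) != len(probs) or not labels:
        raise ValueError("labels and probs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)


# Toy records (hypothetical): facility A has 1 of 5 tests unsuppressed, B has 1 of 10
toy_tests = [("A", False)] + [("A", True)] * 4 + [("B", False)] + [("B", True)] * 9
toy_flags = facility_hotspots(toy_tests)
toy_brier = brier_score([1, 0], [0.8, 0.3])
```

A lower Brier score indicates better-calibrated probabilities; a model that always predicts 0.5 scores 0.25, which gives some context for the reported value of 0.139.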
Affiliation(s)
- Matilu Mwau
  - Kenya Medical Research Institute, Nairobi, Kenya
|
13
|
van der Meijden S, Arbous M, Geerts B. Possibilities and challenges for artificial intelligence and machine learning in perioperative care. BJA Educ 2023; 23:288-294. [PMID: 37465235 PMCID: PMC10350557 DOI: 10.1016/j.bjae.2023.04.003] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Accepted: 04/21/2023] [Indexed: 07/20/2023] Open
Affiliation(s)
- S.L. van der Meijden
  - Healthplus.ai-R&D B.V., Amsterdam, The Netherlands
  - Intensive Care Unit, Leiden University Medical Centre, Leiden, The Netherlands
- M.S. Arbous
  - Intensive Care Unit, Leiden University Medical Centre, Leiden, The Netherlands
- B.F. Geerts
  - Healthplus.ai-R&D B.V., Amsterdam, The Netherlands
|
14
|
de Hond AAH, Shah VB, Kant IMJ, Van Calster B, Steyerberg EW, Hernandez-Boussard T. Perspectives on validation of clinical predictive algorithms. NPJ Digit Med 2023; 6:86. [PMID: 37149704 PMCID: PMC10163568 DOI: 10.1038/s41746-023-00832-9] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 01/10/2023] [Accepted: 04/28/2023] [Indexed: 05/08/2023] Open
Affiliation(s)
- Anne A H de Hond
  - Clinical AI Implementation and Research Lab, Leiden University Medical Centre, Leiden, the Netherlands
  - Department of Medicine (Biomedical Informatics), Stanford University, Stanford, CA, USA
  - Department of Biomedical Data Sciences, Leiden University Medical Centre, Leiden, the Netherlands
- Vaibhavi B Shah
  - Department of Medicine (Biomedical Informatics), Stanford University, Stanford, CA, USA
- Ilse M J Kant
  - Department of Digital Health, University Medical Center Utrecht, Utrecht, the Netherlands
- Ben Van Calster
  - Department of Biomedical Data Sciences, Leiden University Medical Centre, Leiden, the Netherlands
  - Department of Development & Regeneration, KU Leuven, Leuven, Belgium
- Ewout W Steyerberg
  - Clinical AI Implementation and Research Lab, Leiden University Medical Centre, Leiden, the Netherlands
  - Department of Biomedical Data Sciences, Leiden University Medical Centre, Leiden, the Netherlands
- Tina Hernandez-Boussard
  - Department of Medicine (Biomedical Informatics), Stanford University, Stanford, CA, USA
  - Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
  - Department of Epidemiology & Population Health (by courtesy), Stanford University, Stanford, CA, USA
|
15
|
Predicting Readmission or Death After Discharge From the ICU: External Validation and Retraining of a Machine Learning Model: Erratum. Crit Care Med 2023; 51:e105. [PMID: 36928025 PMCID: PMC10510829 DOI: 10.1097/ccm.0000000000005818] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/18/2023]
|