1
Wang S, Zhu X. Nationwide hospital admission data statistics and disease-specific 30-day readmission prediction. Health Inf Sci Syst 2022; 10:25. [PMID: 36065327 PMCID: PMC9439279 DOI: 10.1007/s13755-022-00195-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2022] [Accepted: 08/19/2022] [Indexed: 11/26/2022]
Abstract
Purpose: Hospital readmission prediction uses historical patient visit data to train machine learning models that predict a patient's risk of being readmitted after discharge. The data used to train the models, such as patient demographics, disease types, and localized distributions, play a significant role in model performance. Many methods exist for hospital readmission prediction, but some important questions remain open. For example, how do demographics such as gender, age, and geography affect readmission prediction? Do patients suffering from different diseases vary significantly in their readmission rates? What are the characteristics of nationwide hospital admission data? And how do hospital specialty, ownership, and location affect readmission rates? In this study, we carry out systematic investigations to answer these questions and propose a predictive modeling framework for disease-specific 30-day hospital readmission.
Methods: We first perform statistical analysis using the National Readmission Databases (NRD), with over 15 million hospital visits. We then create features and disease-specific readmission datasets. An ensemble learning framework is proposed for hospital readmission prediction, and the Friedman test and the Nemenyi post-hoc test are used to validate the proposed method.
Results: Using the NRD, with over 15 million hospital visits, as our testbed, we summarize nationwide patient admission data statistics in relation to demographics, disease types, and hospital factors. We use feature engineering to design 526 representative features to model each patient visit. We found that readmission rates vary significantly from disease to disease. For the six diseases studied, readmission rates range from 1.832% (pneumonia) to 8.761% (diabetes). Using random sampling and voting approaches, our study shows that soft voting outperforms hard voting on the majority of results, especially for AUC and balanced accuracy, which are the main measures for imbalanced data. Random under-sampling with a negative:positive ratio of 1.1:1 achieves the best performance for AUC, balanced accuracy, and F1-score.
Conclusion: This paper carries out systematic studies to understand US nationwide hospital readmission data statistics, and further designs a machine learning framework for disease-specific 30-day hospital readmission prediction. Our study shows that hospital readmission rates vary significantly with respect to disease type, gender, age group, and other factors. Gradient boosting achieves the best performance for disease-specific hospital readmission prediction.
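The random under-sampling and voting steps named in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `undersample`, `soft_vote`, and `hard_vote`, the 0.5 threshold, and the toy probabilities are all assumptions made for illustration. The sketch only shows the generic techniques (keep all positives, sample negatives down to a 1.1:1 ratio; average probabilities versus count thresholded votes).

```python
import random


def undersample(labels, ratio=1.1, seed=0):
    """Return sorted indices that keep every positive (label 1) and a random
    subset of negatives (label 0) sized about ratio * number of positives."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_neg = min(len(neg), round(ratio * len(pos)))
    return sorted(pos + rng.sample(neg, n_neg))


def soft_vote(probas, threshold=0.5):
    """Soft voting: average the positive-class probabilities, then threshold."""
    return int(sum(probas) / len(probas) >= threshold)


def hard_vote(probas, threshold=0.5):
    """Hard voting: threshold each model's probability, then take the majority."""
    votes = sum(p >= threshold for p in probas)
    return int(2 * votes > len(probas))


# Toy imbalanced label vector (hypothetical): 10 readmissions, 90 non-readmissions.
labels = [1] * 10 + [0] * 90
kept = undersample(labels, ratio=1.1)
print(len(kept), sum(labels[i] for i in kept))

# Hypothetical positive-class probabilities from three base models for one visit.
probas = [0.9, 0.45, 0.4]
print(soft_vote(probas), hard_vote(probas))
```

In the toy case one confident model pulls the averaged probability above the threshold, so the two voting rules disagree. This shows how the rules can differ in general; it is not evidence for the paper's reported results.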
Affiliation(s)
- Shuwen Wang
  - Department of Electrical Engineering and Computer Science, Florida Atlantic University, 777 Glades, Boca Raton, FL 33431, USA
- Xingquan Zhu
  - Department of Electrical Engineering and Computer Science, Florida Atlantic University, 777 Glades, Boca Raton, FL 33431, USA
2
Ning Y, Li S, Ong MEH, Xie F, Chakraborty B, Ting DSW, Liu N. A novel interpretable machine learning system to generate clinical risk scores: An application for predicting early mortality or unplanned readmission in a retrospective cohort study. PLOS Digital Health 2022; 1:e0000062. [PMID: 36812536 PMCID: PMC9931273 DOI: 10.1371/journal.pdig.0000062] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/08/2022] [Accepted: 05/10/2022] [Indexed: 01/19/2023]
Abstract
Risk scores are widely used for clinical decision making and commonly generated from logistic regression models. Machine-learning-based methods may work well for identifying important predictors to create parsimonious scores, but such 'black box' variable selection limits interpretability, and variable importance evaluated from a single model can be biased. We propose a robust and interpretable variable selection approach using the recently developed Shapley variable importance cloud (ShapleyVIC) that accounts for variability in variable importance across models. Our approach evaluates and visualizes overall variable contributions for in-depth inference and transparent variable selection, and filters out non-significant contributors to simplify model building steps. We derive an ensemble variable ranking from variable contributions across models, which is easily integrated with an automated and modularized risk score generator, AutoScore, for convenient implementation. In a study of early death or unplanned readmission after hospital discharge, ShapleyVIC selected 6 variables from 41 candidates to create a well-performing risk score, which had similar performance to a 16-variable model from machine-learning-based ranking. Our work contributes to the recent emphasis on interpretability of prediction models for high-stakes decision making, providing a disciplined solution to detailed assessment of variable importance and transparent development of parsimonious clinical risk scores.
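The idea of deriving one ensemble variable ranking from importance values across several models can be sketched as follows. This is a generic mean-rank aggregation written for illustration only. It is not the ShapleyVIC method or the AutoScore API, and the function `ensemble_ranking`, the variable names, and the importance values are hypothetical.

```python
def ensemble_ranking(importances):
    """Rank variables by their mean rank across models.

    importances: list of dicts, one per model, mapping variable name to an
    importance value (higher means more important). All dicts are assumed
    to share the same keys.
    """
    variables = sorted(importances[0])
    rank_sums = {v: 0 for v in variables}
    for model in importances:
        # Rank 1 goes to the most important variable within this model.
        ordered = sorted(variables, key=lambda v: -model[v])
        for rank, v in enumerate(ordered, start=1):
            rank_sums[v] += rank
    n_models = len(importances)
    mean_rank = {v: rank_sums[v] / n_models for v in variables}
    return sorted(variables, key=lambda v: mean_rank[v])


# Hypothetical importance values from three models over three candidate variables.
models = [
    {"age": 0.5, "creatinine": 0.3, "sodium": 0.1},
    {"age": 0.2, "creatinine": 0.4, "sodium": 0.1},
    {"age": 0.6, "creatinine": 0.2, "sodium": 0.3},
]
ranking = ensemble_ranking(models)
print(ranking)      # ['age', 'creatinine', 'sodium']
print(ranking[:2])  # keep the top 2 variables for a parsimonious score
```

Averaging ranks across models, instead of trusting one model's ordering, is one simple way to reduce the single-model bias the abstract mentions. The published approach additionally assesses the variability of each variable's contribution, which this sketch omits.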
Affiliation(s)
- Yilin Ning
  - Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore, Singapore
- Siqi Li
  - Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore, Singapore
- Marcus Eng Hock Ong
  - Programme in Health Services and Systems Research, Duke-NUS Medical School, Singapore, Singapore
  - Health Services Research Centre, Singapore Health Services, Singapore, Singapore
  - Department of Emergency Medicine, Singapore General Hospital, Singapore, Singapore
- Feng Xie
  - Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore, Singapore
  - Programme in Health Services and Systems Research, Duke-NUS Medical School, Singapore, Singapore
- Bibhas Chakraborty
  - Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore, Singapore
  - Programme in Health Services and Systems Research, Duke-NUS Medical School, Singapore, Singapore
  - Department of Statistics and Data Science, National University of Singapore, Singapore, Singapore
  - Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, United States of America
- Daniel Shu Wei Ting
  - Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore, Singapore
  - Singapore Eye Research Institute, Singapore National Eye Centre, Singapore, Singapore
  - SingHealth AI Health Program, Singapore Health Services, Singapore, Singapore
- Nan Liu
  - Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore, Singapore
  - Programme in Health Services and Systems Research, Duke-NUS Medical School, Singapore, Singapore
  - Health Services Research Centre, Singapore Health Services, Singapore, Singapore
  - SingHealth AI Health Program, Singapore Health Services, Singapore, Singapore
  - Institute of Data Science, National University of Singapore, Singapore, Singapore