1
Tang M, Zhao Y, Xiao J, Jiang S, Tan J, Xu Q, Pan C, Wang J. Development and validation of a predictive model for prolonged length of stay in elderly type 2 diabetes mellitus patients combined with cerebral infarction. Front Neurol 2024; 15:1405096. [PMID: 39148703 PMCID: PMC11325865 DOI: 10.3389/fneur.2024.1405096] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2024] [Accepted: 07/12/2024] [Indexed: 08/17/2024] Open
Abstract
Background: This study aimed to identify the predictive factors for prolonged length of stay (LOS) in elderly type 2 diabetes mellitus (T2DM) patients suffering from cerebral infarction (CI) and construct a predictive model to effectively utilize hospital resources. Methods: Clinical data were retrospectively collected from T2DM patients suffering from CI aged ≥65 years who were admitted to five tertiary hospitals in Southwest China. The least absolute shrinkage and selection operator (LASSO) regression model and multivariable logistic regression analysis were conducted to identify the independent predictors of prolonged LOS. A nomogram was constructed to visualize the model. The discrimination, calibration, and clinical practicality of the model were evaluated according to the area under the receiver operating characteristic curve (AUROC), calibration curve, decision curve analysis (DCA), and clinical impact curve (CIC). Results: A total of 13,361 patients were included, comprising 6,023, 2,582, and 4,756 patients in the training, internal validation, and external validation sets, respectively. The results revealed that the ACCI score, OP, PI, analgesics use, antibiotics use, psychotropic drug use, insurance type, and ALB were independent predictors for prolonged LOS. The eight-predictor LASSO logistic regression displayed high prediction ability, with an AUROC of 0.725 (95% confidence interval [CI]: 0.710-0.739), a sensitivity of 0.662 (95% CI: 0.639-0.686), and a specificity of 0.675 (95% CI: 0.661-0.689). The calibration curve (bootstraps = 1,000) showed good calibration. In addition, the DCA and CIC also indicated good clinical practicality. An operation interface on a web page (https://xxmyyz.shinyapps.io/prolonged_los1/) was also established to facilitate clinical use. Conclusion: The developed model can predict the risk of prolonged LOS in elderly T2DM patients diagnosed with CI, enabling clinicians to optimize bed management.
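The discrimination metrics quoted in this abstract (AUROC, sensitivity, specificity) can be illustrated with a minimal sketch. It is not the authors' code: the function names, the toy labels and scores, and the 0.5 cut-off are all assumptions made for illustration, and AUROC is computed through its pairwise-ranking definition rather than by any particular library.

```python
from typing import Sequence, Tuple


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUROC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Sensitivity and specificity when scores >= threshold are called positive."""
    tp = fn = tn = fp = 0
    for y, s in zip(y_true, y_score):
        predicted_positive = s >= threshold
        if y == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data: 1 = prolonged LOS, 0 = not prolonged.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.6]

print(auroc(labels, scores))
print(sensitivity_specificity(labels, scores, threshold=0.5))
```

The pairwise loop is quadratic in the number of cases, which is fine for a toy example; on a cohort of the size reported here a rank-based implementation would be the more practical choice.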
Affiliation(s)
- Mingshan Tang
  - Department of Neurology, Affiliated Banan Hospital of Chongqing Medical University, Chongqing, China
- Yan Zhao
  - Department of Neurology, Affiliated Banan Hospital of Chongqing Medical University, Chongqing, China
- Jing Xiao
  - Department of Neurology, Affiliated Banan Hospital of Chongqing Medical University, Chongqing, China
- Side Jiang
  - Department of Neurology, Affiliated Banan Hospital of Chongqing Medical University, Chongqing, China
- Juntao Tan
  - Operation Management Office, Affiliated Banan Hospital of Chongqing Medical University, Chongqing, China
- Qian Xu
  - Library, Chongqing Medical University, Chongqing, China
- Chengde Pan
  - Department of Neurology, Affiliated Banan Hospital of Chongqing Medical University, Chongqing, China
- Jie Wang
  - Department of Neurology, Affiliated Banan Hospital of Chongqing Medical University, Chongqing, China
2
Abdurrab I, Mahmood T, Sheikh S, Aijaz S, Kashif M, Memon A, Ali I, Peerwani G, Pathan A, Alkhodre AB, Siddiqui MS. Predicting the Length of Stay of Cardiac Patients Based on Pre-Operative Variables-Bayesian Models vs. Machine Learning Models. Healthcare (Basel) 2024; 12:249. [PMID: 38255136 PMCID: PMC10815919 DOI: 10.3390/healthcare12020249] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2023] [Revised: 01/04/2024] [Accepted: 01/16/2024] [Indexed: 01/24/2024] Open
Abstract
Length of stay (LoS) prediction is deemed important for a medical institution's operational and logistical efficiency. Sound estimates of a patient's stay increase clinical preparedness and reduce aberrations. Various statistical methods and techniques are used to quantify and predict the LoS of a patient based on pre-operative clinical features. This study evaluates and compares the results of Bayesian (simple Bayesian regression and hierarchical Bayesian regression) models and machine learning (ML) regression models against multiple evaluation metrics for the problem of LoS prediction of cardiac patients admitted to Tabba Heart Institute, Karachi, Pakistan (THI) between 2015 and 2020. In addition, the study also presents the use of hierarchical Bayesian regression to account for data variability and skewness without homogenizing the data (by removing outliers). LoS estimates from the hierarchical Bayesian regression model resulted in a root mean squared error (RMSE) and mean absolute error (MAE) of 1.49 and 1.16, respectively. Simple Bayesian regression (without hierarchy) achieved an RMSE and MAE of 3.36 and 2.05, respectively. The average RMSE and MAE of ML models remained at 3.36 and 1.98, respectively.
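The two error metrics compared in this abstract, RMSE and MAE, can be computed as in the short sketch below. The length-of-stay values are made up for illustration and are not taken from the study; the function names are likewise assumptions.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error: penalizes large misses more heavily."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error: the average size of a miss, in days."""
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)


# Hypothetical lengths of stay in days (observed vs. model estimate).
observed_days = [3.0, 5.0, 2.0, 8.0]
estimated_days = [4.0, 5.0, 1.0, 5.0]

print(rmse(observed_days, estimated_days))
print(mae(observed_days, estimated_days))
```

RMSE is never smaller than MAE on the same data, and the gap widens when a few stays are badly mispredicted, which is why skewed LoS data with outliers tends to show a larger RMSE than MAE.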
Affiliation(s)
- Ibrahim Abdurrab
  - Department of Computer Science, Institute of Business Administration, Karachi 75270, Pakistan
- Tariq Mahmood
  - Department of Computer Science, Institute of Business Administration, Karachi 75270, Pakistan
- Sana Sheikh
  - Department of Clinical Research Cardiology, Tabba Heart Institute, Karachi 75950, Pakistan
- Saba Aijaz
  - Department of Clinical Research Cardiology, Tabba Heart Institute, Karachi 75950, Pakistan
- Muhammad Kashif
  - Department of Clinical Research Cardiology, Tabba Heart Institute, Karachi 75950, Pakistan
- Ahson Memon
  - Department of Clinical Research Cardiology, Tabba Heart Institute, Karachi 75950, Pakistan
- Imran Ali
  - Department of Clinical Research Cardiology, Tabba Heart Institute, Karachi 75950, Pakistan
- Ghazal Peerwani
  - Department of Clinical Research Cardiology, Tabba Heart Institute, Karachi 75950, Pakistan
- Asad Pathan
  - Department of Clinical Research Cardiology, Tabba Heart Institute, Karachi 75950, Pakistan
- Ahmad B. Alkhodre
  - Faculty of Computer and Information Systems, Islamic University of Madinah, Madinah 42351, Saudi Arabia
- Muhammad Shoaib Siddiqui
  - Faculty of Computer and Information Systems, Islamic University of Madinah, Madinah 42351, Saudi Arabia
3
Zang T, Zhu Y, Huang X, Yang X, Chen Q, Yu J, Tang F. Enhancing length of stay prediction by learning similarity-aware representations for hospitalized patients. Artif Intell Med 2023; 144:102660. [PMID: 37783550 DOI: 10.1016/j.artmed.2023.102660] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2022] [Revised: 08/01/2023] [Accepted: 09/04/2023] [Indexed: 10/04/2023]
Abstract
This paper focuses on predicting the length of stay for patients on the first day of admission and proposes a predictive model named DGLoS. In order to capture the influence of various complex factors on the length of stay as well as the dependencies among these factors, DGLoS uses a deep neural network to model both the patient information and diagnostic information. For different attribute types, we utilize different coding methods to convert raw data into input features. Besides, we find that similar patients have closer lengths of stay. Therefore, we further design a module based on graph representation learning to generate patients' similarity-aware representations, capturing the similarity between patients and thereby enhancing predictions. These similarity-aware representations are incorporated into the output of the deep neural network to jointly perform the prediction. We have conducted comprehensive experiments on a real-world hospitalization dataset. The performance comparison shows that our proposed DGLoS model improves predictive performance, and the significance test demonstrates that the improvement is significant. The ablation study verifies the effectiveness of each of the proposed components, and the hyper-parameter investigation shows the robustness of the proposed model.
Affiliation(s)
- Tianzi Zang
  - Shanghai Jiao Tong University, Shanghai, China; Nanjing University of Aeronautics and Astronautics, Nanjing, China
- Yanmin Zhu
  - Shanghai Jiao Tong University, Shanghai, China
- Xinchen Yang
  - East China University of Science and Technology, Shanghai, China
- Jiadi Yu
  - Shanghai Jiao Tong University, Shanghai, China
4
Liu M, Guo C, Guo S. An explainable knowledge distillation method with XGBoost for ICU mortality prediction. Comput Biol Med 2023; 152:106466. [PMID: 36566626 DOI: 10.1016/j.compbiomed.2022.106466] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 09/07/2022] [Revised: 11/15/2022] [Accepted: 12/19/2022] [Indexed: 12/24/2022]
Abstract
BACKGROUND AND OBJECTIVE: Mortality prediction is an important task in the intensive care unit (ICU) for quantifying the severity of patients' physiological condition. Currently, scoring systems are widely applied for mortality prediction, but their performance is unsatisfactory in many clinical conditions because of the non-specificity and linearity of the underlying models. With the large volume of data now recorded in electronic health records (EHRs), deep learning models have achieved state-of-the-art predictive performance. However, deep learning models struggle to meet the requirement of explainability in clinical settings. Hence, an explainable Knowledge Distillation method with XGBoost (XGB-KD) is proposed to improve the predictive performance of XGBoost while supporting better explainability. METHODS: In this method, we first use high-performing deep learning teacher models to learn the complex patterns hidden in high-dimensional multivariate time series data. Then, we distill knowledge from soft labels generated by the ensemble of teacher models to guide the training of the XGBoost student model, whose inputs are meaningful features obtained from feature engineering. Finally, we conduct model calibration to obtain predicted probabilities reflecting the true posterior probabilities and use SHapley Additive exPlanations (SHAP) to obtain insights about the trained model. RESULTS: We conduct comprehensive experiments on the MIMIC-III dataset to evaluate our method. The results demonstrate that our method achieves better predictive performance than vanilla XGBoost, deep learning models, and several state-of-the-art baselines from related works. Our method can also provide intuitive explanations. CONCLUSIONS: Our method is useful for improving the predictive performance of XGBoost by distilling knowledge from deep learning models and can provide meaningful explanations for predictions.
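The idea of distilling from soft labels produced by a teacher ensemble can be sketched in a few lines. The abstract does not say how the soft labels enter the XGBoost objective, so everything below is an assumption for illustration: soft labels are taken as the plain average of teacher probabilities, and the student's loss is a weighted mix (weight `alpha`, a hypothetical parameter) of cross-entropy against the hard labels and against the soft labels. No XGBoost call is made; the student's predictions are passed in as plain probabilities.

```python
import math
from typing import List, Sequence


def ensemble_soft_labels(teacher_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average each patient's predicted mortality probability across teachers.

    teacher_probs[t][i] is teacher t's probability for patient i.
    """
    n_teachers = len(teacher_probs)
    return [sum(per_patient) / n_teachers for per_patient in zip(*teacher_probs)]


def binary_cross_entropy(target: float, prob: float, eps: float = 1e-12) -> float:
    """Cross-entropy for one case; the target may be a hard 0/1 or a soft label."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))


def distillation_loss(
    hard_labels: Sequence[int],
    soft_labels: Sequence[float],
    student_probs: Sequence[float],
    alpha: float = 0.5,
) -> float:
    """Assumed objective: alpha * CE(hard, student) + (1 - alpha) * CE(soft, student)."""
    total = 0.0
    for y, q, p in zip(hard_labels, soft_labels, student_probs):
        total += alpha * binary_cross_entropy(y, p)
        total += (1.0 - alpha) * binary_cross_entropy(q, p)
    return total / len(hard_labels)


# Hypothetical outputs of two teacher models for two patients.
teachers = [[0.9, 0.2], [0.7, 0.4]]
soft = ensemble_soft_labels(teachers)   # approximately [0.8, 0.3]
hard = [1, 0]

print(distillation_loss(hard, soft, student_probs=soft))
print(distillation_loss(hard, soft, student_probs=[0.3, 0.8]))
```

A student whose predictions track the soft labels gets a lower loss than one that contradicts them, which is the behaviour the mixed objective is meant to reward.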
Affiliation(s)
- Mucan Liu
  - Institute of Systems Engineering, Dalian University of Technology, Dalian 116024, China
- Chonghui Guo
  - Institute of Systems Engineering, Dalian University of Technology, Dalian 116024, China
- Sijia Guo
  - Institute of Systems Engineering, Dalian University of Technology, Dalian 116024, China
5
Barsasella D, Bah K, Mishra P, Uddin M, Dhar E, Suryani DL, Setiadi D, Masturoh I, Sugiarti I, Jonnagaddala J, Syed-Abdul S. A Machine Learning Model to Predict Length of Stay and Mortality among Diabetes and Hypertension Inpatients. Medicina (Kaunas, Lithuania) 2022; 58:1568. [PMID: 36363525 PMCID: PMC9694021 DOI: 10.3390/medicina58111568] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/14/2022] [Revised: 10/27/2022] [Accepted: 10/28/2022] [Indexed: 08/18/2023]
Abstract
Background and Objectives: Taiwan is among the nations with the highest rates of Type 2 Diabetes Mellitus (T2DM) and Hypertension (HTN). As more cases are reported each year, there is a rise in hospital admissions for people seeking medical attention. This creates a burden on hospitals and affects the overall management and administration of the hospitals. Hence, this study aimed to develop a machine learning (ML) model to predict the Length of Stay (LoS) and mortality among T2DM and HTN inpatients. Materials and Methods: Using Taiwan's National Health Insurance Research Database (NHIRD), this cohort study consisted of 58,618 patients, where 25,868 had T2DM, 32,750 had HTN, and 6,419 had both T2DM and HTN. We analyzed the data with different machine learning models for the prediction of LoS and mortality. The evaluation was done by plotting descriptive statistical graphs, feature importance, precision-recall curve, accuracy plots, and AUC. The training and testing data were set at a ratio of 8:2 before applying ML algorithms. Results: XGBoost showed the best performance in predicting LoS (R² 0.633; RMSE 0.386; MAE 0.123), and RF resulted in a slightly lower performance (R² 0.591; RMSE 0.401; MAE 0.027). Logistic Regression (LoR) performed the best in predicting mortality (CV Score 0.9779; Test Score 0.9728; Precision 0.9432; Recall 0.9786; AUC 0.97 and AUPR 0.93), closely followed by Ridge Classifier (CV Score 0.9736; Test Score 0.9692; Precision 0.9312; Recall 0.9463; AUC 0.94 and AUPR 0.89). Conclusions: We developed a robust prediction model for LoS and mortality of T2DM and HTN inpatients. Linear Regression showed the best performance for LoS, and Logistic Regression performed the best in predicting mortality. The results showed that ML algorithms can not only help healthcare professionals in data-driven decision-making but can also facilitate early intervention and resource planning.
Affiliation(s)
- Diana Barsasella
  - Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, Taipei 106, Taiwan
  - International Center for Health Information Technology (ICHIT), College of Medical Science and Technology, Taipei Medical University, Taipei 106, Taiwan
  - Department of Medical Record and Health Information, Health Polytechnic of the Ministry of Health Tasikmalaya, Tasikmalaya 46115, West Java, Indonesia
- Karamo Bah
  - Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, Taipei 106, Taiwan
  - International Center for Health Information Technology (ICHIT), College of Medical Science and Technology, Taipei Medical University, Taipei 106, Taiwan
- Mohy Uddin
  - Research Quality Management Section, King Abdullah International Medical Research Center, Ministry of National Guard-Health Affairs, Riyadh 11481, Saudi Arabia
- Eshita Dhar
  - Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, Taipei 106, Taiwan
  - International Center for Health Information Technology (ICHIT), College of Medical Science and Technology, Taipei Medical University, Taipei 106, Taiwan
- Dewi Lena Suryani
  - Department of Medical Record and Health Information, Health Polytechnic of the Ministry of Health Tasikmalaya, Tasikmalaya 46115, West Java, Indonesia
- Dedi Setiadi
  - Department of Medical Record and Health Information, Health Polytechnic of the Ministry of Health Tasikmalaya, Tasikmalaya 46115, West Java, Indonesia
- Imas Masturoh
  - Department of Medical Record and Health Information, Health Polytechnic of the Ministry of Health Tasikmalaya, Tasikmalaya 46115, West Java, Indonesia
- Ida Sugiarti
  - Department of Medical Record and Health Information, Health Polytechnic of the Ministry of Health Tasikmalaya, Tasikmalaya 46115, West Java, Indonesia
- Jitendra Jonnagaddala
  - School of Population Health, University of New South Wales, Kensington, NSW 2033, Australia
- Shabbir Syed-Abdul
  - Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, Taipei 106, Taiwan
  - International Center for Health Information Technology (ICHIT), College of Medical Science and Technology, Taipei Medical University, Taipei 106, Taiwan
  - School of Gerontology and Long-Term Care, College of Nursing, Taipei Medical University, Taipei 106, Taiwan
6
Das R, Saleh S, Nielsen I, Kaviraj A, Sharma P, Dey K, Saha S. Performance analysis of machine learning algorithms and screening formulae for β-thalassemia trait screening of Indian antenatal women. Int J Med Inform 2022; 167:104866. [PMID: 36174416 DOI: 10.1016/j.ijmedinf.2022.104866] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/03/2022] [Revised: 07/19/2022] [Accepted: 09/07/2022] [Indexed: 10/31/2022]
Abstract
BACKGROUND: Currently, more than forty discrimination formulae based on red blood cell (RBC) parameters and some supervised machine learning algorithms (MLAs) have been recommended for β-thalassemia trait (BTT) screening. The present study aimed to evaluate and compare the performance of 26 such formulae and 13 MLAs on data from antenatal women with a recently developed formula, SCSBTT, which is available for evaluation in over seventy countries as an Android app called SUSOKA[16]. METHODS: A diagnostic database of 2942 antenatal females was collected from PGIMER, Chandigarh, India and used for this analysis. The data set consists of hypochromic microcytic anemia, BTT, Hemoglobin E trait, double heterozygote for Hemoglobin S and BTT, heterozygote for Hemoglobin D Punjab, and normal subjects. Performance of the formulae and the MLAs was assessed by Sensitivity, Specificity, Youden's Index, and AUC-ROC measures. A final recommendation was made from the ranking obtained through two Multiple Criteria Decision-Making (MCDM) techniques, namely, Simultaneous Evaluation of Criteria and Alternatives (SECA) and TOPSIS. RESULTS: It was observed that Extreme Learning Machine (ELM) and Gradient Boosting Classifier (GBC) showed maximum Youden's index and AUC-ROC measures compared to all discriminating formulae. Sensitivity remains maximum for SCSBTT. K-means clustering and the ranking from MCDM methods show that the SCSBTT, Shine & Lal, and Ravanbakhsh-F4 formulae ensure higher performance among all formulae. The discriminant power of some MLAs and formulae was found to be considerably lower than that reported in original studies. CONCLUSION: Comparative information on MLAs can aid researchers in developing new discriminating formulae that simultaneously ensure higher sensitivity and specificity. More multi-centric verification of the formulae on heterogeneous data is indispensable. The SCSBTT and Shine & Lal formulae, and ELM and GBC, are recommended for screening BTT based on MCDM. SCSBTT can be used with certainty as a tangible cost-saving screening tool for mass screening of antenatal women in India and other countries.
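Youden's index, one of the measures used here to compare formulae and classifiers, is sensitivity plus specificity minus one. The sketch below computes it from a confusion matrix; the class name, field names, and counts are hypothetical and are not results from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Screening outcome counts for one formula or classifier (hypothetical)."""
    true_pos: int
    false_neg: int
    true_neg: int
    false_pos: int

    @property
    def sensitivity(self) -> float:
        # Share of true carriers that the screen flags.
        return self.true_pos / (self.true_pos + self.false_neg)

    @property
    def specificity(self) -> float:
        # Share of non-carriers that the screen correctly clears.
        return self.true_neg / (self.true_neg + self.false_pos)

    @property
    def youden_index(self) -> float:
        # J = sensitivity + specificity - 1; 0 is chance level, 1 is perfect.
        return self.sensitivity + self.specificity - 1.0


# Made-up counts for illustration only.
example = ConfusionCounts(true_pos=90, false_neg=10, true_neg=160, false_pos=40)

print(example.sensitivity, example.specificity, example.youden_index)
```

Because the index weights sensitivity and specificity equally, a screening tool with the highest sensitivity does not necessarily have the highest Youden's index, which is consistent with the abstract reporting different leaders for the two measures.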
Affiliation(s)
- Reena Das
  - Department of Hematology, Postgraduate Institute of Medical Education and Research, Chandigarh 160012, India
- Sarkaft Saleh
  - Department of Materials and Production, Aalborg University, DK 9220 Aalborg, Denmark
- Izabela Nielsen
  - Department of Materials and Production, Aalborg University, DK 9220 Aalborg, Denmark
- Anilava Kaviraj
  - Department of Zoology, University of Kalyani, Kalyani 741235, India
- Prashant Sharma
  - Department of Hematology, Postgraduate Institute of Medical Education and Research, Chandigarh 160012, India
- Kartick Dey
  - Department of Mathematics, University of Engineering & Management, Kolkata 700160, India
- Subrata Saha
  - Department of Materials and Production, Aalborg University, DK 9220 Aalborg, Denmark