1
Gong K, Xue Y, Kong L, Xie X. Cost prediction for ischemic heart disease hospitalization: Interpretable feature extraction using network analysis. J Biomed Inform 2024; 154:104652. [PMID: 38718897 DOI: 10.1016/j.jbi.2024.104652] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2023] [Revised: 04/26/2024] [Accepted: 05/01/2024] [Indexed: 05/23/2024]
Abstract
OBJECTIVES Ischemic heart disease (IHD) is a major contributor to global mortality and disability, imposing a substantial social and economic burden on individuals and healthcare systems. Accurate prediction of healthcare costs is crucial for allocating medical resources efficiently and, ultimately, for benefiting a larger population. METHODS We developed an interpretable IHD hospitalization cost prediction model that integrates network analysis with machine learning. Specifically, our network-enhanced model extracts explainable features from a diagnosis-procedure concurrence network using advanced graph kernel techniques, capturing intricate relationships between medical codes. RESULTS The proposed model achieved an R² of 0.804 ± 0.008 and a root mean square error (RMSE) of 17,076 ± 420 CNY on the temporal validation dataset. This performance is comparable to that of a model using less interpretable code embedding features (R²: 0.800 ± 0.008; RMSE: 17,279 ± 437 CNY) and of a hybrid graph isomorphism network (R²: 0.802 ± 0.007; RMSE: 17,249 ± 387 CNY). Interpreting the network-enhanced model helped pinpoint specific diagnoses and procedures associated with higher hospitalization costs, including acute kidney injury, permanent atrial fibrillation, intra-aortic balloon pump, and temporary pacemaker placement. CONCLUSION Our results demonstrate that the proposed model balances predictive accuracy and interpretability. Because it identifies specific diagnoses and procedures associated with higher hospitalization costs, it has the potential to support intelligent management of IHD.
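The abstract does not say how the diagnosis-procedure concurrence network is built. The sketch below assumes one common construction: codes are nodes, and each edge is weighted by the number of admissions in which both codes appear. The graph-kernel features used in the paper are not reproduced here. `edge_weight_feature` is a hypothetical stand-in for a network-derived feature, and the codes in the example are placeholders.

```python
from collections import Counter
from itertools import combinations


def build_concurrence_network(admissions):
    """Count how often each pair of codes appears in the same admission.

    admissions: iterable of collections of code strings (diagnoses and procedures).
    Returns a Counter mapping sorted (code_a, code_b) tuples to co-occurrence counts,
    i.e. the edge weights of an undirected co-occurrence network (assumed construction).
    """
    edges = Counter()
    for codes in admissions:
        # sorted() gives each unordered pair a single canonical key
        for a, b in combinations(sorted(set(codes)), 2):
            edges[(a, b)] += 1
    return edges


def edge_weight_feature(codes, edges):
    """Hypothetical per-admission feature: total network weight among its own codes."""
    return sum(edges.get(pair, 0) for pair in combinations(sorted(set(codes)), 2))


# Placeholder codes, not real coding-system identifiers
admissions = [
    {"DX:AKI", "PX:IABP"},
    {"DX:AKI", "PX:IABP", "DX:AF"},
    {"DX:AF"},
]
edges = build_concurrence_network(admissions)
print(edges[("DX:AKI", "PX:IABP")])  # 2
```

Counting pairs per admission keeps the network undirected and unweighted by code frequency. A real pipeline would likely normalize these counts (for example, by code prevalence) before extracting features.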
Affiliation(s)
- Kaidi Gong
- Department of Industrial Engineering, Tsinghua University, Beijing, 100084, China.
- Yajun Xue
- Department of Cardiovascular Medicine, Beijing Tsinghua Changgung Hospital, School of Clinical Medicine, Tsinghua University, Beijing, 102218, China.
- Lingyun Kong
- Department of Cardiovascular Medicine, Beijing Tsinghua Changgung Hospital, School of Clinical Medicine, Tsinghua University, Beijing, 102218, China.
- Xiaolei Xie
- Department of Industrial Engineering, Tsinghua University, Beijing, 100084, China.
2
Wang M, He X, Liu L, Fang Q, Zhang M, Chen H, Liu Y. HCT: Chinese Medical Machine Reading Comprehension Question-Answering via Hierarchically Collaborative Transformer. IEEE J Biomed Health Inform 2024; 28:3055-3066. [PMID: 38381639 DOI: 10.1109/jbhi.2024.3368288] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/23/2024]
Abstract
Chinese medical machine reading comprehension question-answering (cMed-MRCQA) is a critical component of intelligent question-answering, focused on question answering in the Chinese medical domain. Its purpose is to enable machines to analyze and understand a given text and question and then extract the accurate answer. Improving cMed-MRCQA performance requires deep comprehension and analysis of the context, inference of information implicit in the text, and, finally, precise determination of the answer span. Answer spans have predominantly been defined in terms of linguistic units, most often sentences. However, sentence splitting can be unreliable to varying degrees across languages, which makes it difficult for a model to predict the answer region. To alleviate this issue, this paper presents HCT, a novel architecture based on a Hierarchically Collaborative Transformer. Specifically, we present a hierarchical collaborative method that locates sentence boundaries and answer-span boundaries separately. First, we design a hierarchical encoding module to obtain local semantic features of the corpus. Second, we propose a sentence-level self-attention module and a fused interaction-attention module to capture global information about the text. Finally, the model is trained with a combination of loss functions. Extensive experiments on the public dataset CMedMRC and the reconstructed dataset eMedicine validate the effectiveness of the proposed method, which outperformed state-of-the-art methods. Measured by F1, our model scored 90.4% on CMedMRC and 73.2% on eMedicine.
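The abstract reports F1 but does not define how it was computed. A common choice for extractive reading comprehension is SQuAD-style overlap F1. The sketch below assumes that metric, computed at the character level because Chinese text has no whitespace word boundaries. It is an assumed metric, not the paper's evaluation script.

```python
from collections import Counter


def char_f1(prediction: str, reference: str) -> float:
    """Character-level overlap F1 between a predicted and a reference answer span.

    Characters are treated as tokens (an assumption for Chinese text).
    """
    pred_chars = list(prediction)
    ref_chars = list(reference)
    if not pred_chars or not ref_chars:
        # Convention: score 1.0 only when both spans are empty
        return float(pred_chars == ref_chars)
    # Multiset intersection counts each shared character at most min(count) times
    common = Counter(pred_chars) & Counter(ref_chars)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_chars)
    recall = overlap / len(ref_chars)
    return 2 * precision * recall / (precision + recall)


print(char_f1("高血压", "高血压"))  # 1.0
```

A corpus-level score such as the reported percentages is typically the mean of per-question F1 values, often taking the maximum over multiple reference answers when several exist. Whether HCT's evaluation does this is not stated in the abstract.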
3
Rafferty J, Lee A, Lyons RA, Akbari A, Peek N, Jalali-najafabadi F, Ba Dhafari T, Lyons J, Watkins A, Bailey R. Using hypergraphs to quantify importance of sets of diseases by healthcare resource utilisation: A retrospective cohort study. PLoS One 2023; 18:e0295300. [PMID: 38100428 PMCID: PMC10723667 DOI: 10.1371/journal.pone.0295300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2022] [Accepted: 11/20/2023] [Indexed: 12/17/2023]
Abstract
Rates of multimorbidity (also called Multiple Long Term Conditions, MLTC) are increasing in many developed nations. People with multimorbidity experience poorer outcomes and require more healthcare intervention. Grouping conditions by health service utilisation has been poorly researched. The study population consisted of a cohort of people living in Wales, UK, aged 20 years or older in 2000, who were followed up until the end of 2017. Multimorbidity clusters by prevalence and by healthcare resource use (HRU) were modelled using hypergraphs: mathematical objects whose links (hyperedges) can connect any number of diseases, thus capturing information about sets of diseases of any size. The cohort included 2,178,938 people. The most prevalent diseases were hypertension (13.3%), diabetes (6.9%), depression (6.7%) and chronic obstructive pulmonary disease (5.9%). The most important sets of diseases by prevalence generally contained few diseases, whereas the most important sets by HRU contained many diseases. Taking both prevalence and HRU into account, the most important set of diseases was diabetes & hypertension, and under this combined measure hypertension appeared most often among the most important sets. We used a single approach to find the most important sets of diseases based on both co-occurrence and HRU measures, demonstrating the flexibility of the hypergraph approach. Hypertension, the most important single disease, is silent, underdiagnosed and increases the risk of life-threatening co-morbidities. Co-occurrence of endocrine and cardiovascular diseases was common in the most important sets. Combining measures of prevalence with HRU provides insights that would be helpful for those planning and delivering services.
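To make the hypergraph idea concrete, the sketch below treats each person's set of diagnosed conditions as one hyperedge. It then tallies, per disease set, a prevalence count and a mean HRU value. This assumes exact-set counting: a person with three diseases contributes only to that three-disease set, not to its subsets. The sketch does not implement the authors' importance measure. The input format and the HRU numbers in the example are hypothetical.

```python
from collections import defaultdict


def disease_set_summary(people):
    """Summarise disease-set hyperedges by prevalence and mean resource use.

    people: iterable of (conditions, hru) pairs, where conditions is a collection of
    disease names and hru is a numeric resource-use value (hypothetical format).
    Returns {frozenset_of_diseases: (count, mean_hru)}.
    """
    counts = defaultdict(int)
    hru_totals = defaultdict(float)
    for conditions, hru in people:
        # A frozenset makes the hyperedge order-independent and hashable
        edge = frozenset(conditions)
        if not edge:
            continue  # people with no recorded conditions form no hyperedge
        counts[edge] += 1
        hru_totals[edge] += hru
    return {edge: (counts[edge], hru_totals[edge] / counts[edge]) for edge in counts}


people = [
    ({"diabetes", "hypertension"}, 3),
    ({"hypertension", "diabetes"}, 5),
    ({"hypertension"}, 1),
]
print(disease_set_summary(people)[frozenset({"diabetes", "hypertension"})])  # (2, 4.0)
```

Unlike a pairwise co-occurrence graph, a hyperedge keeps the whole disease set intact. This is what lets a three- or four-disease cluster be ranked as a unit.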
Affiliation(s)
- James Rafferty
- Population Data Science, Swansea University Medical School, Swansea University, Swansea, United Kingdom
- Alexandra Lee
- Population Data Science, Swansea University Medical School, Swansea University, Swansea, United Kingdom
- Ronan A. Lyons
- Population Data Science, Swansea University Medical School, Swansea University, Swansea, United Kingdom
- Ashley Akbari
- Population Data Science, Swansea University Medical School, Swansea University, Swansea, United Kingdom
- Niels Peek
- Division of Informatics, Imaging and Data Science, School of Health Sciences, The University of Manchester, Manchester, United Kingdom
- Alan Turing Institute, London, United Kingdom
- Farideh Jalali-najafabadi
- Centre for Genetics and Genomics Versus Arthritis, Centre for Musculoskeletal Research, Faculty of Biology, Medicine and Health, Manchester Academic Health Science Centre, The University of Manchester, Manchester, United Kingdom
- Thamer Ba Dhafari
- Division of Informatics, Imaging and Data Science, School of Health Sciences, The University of Manchester, Manchester, United Kingdom
- Jane Lyons
- Population Data Science, Swansea University Medical School, Swansea University, Swansea, United Kingdom
- Alan Watkins
- Population Data Science, Swansea University Medical School, Swansea University, Swansea, United Kingdom
- Rowena Bailey
- Population Data Science, Swansea University Medical School, Swansea University, Swansea, United Kingdom
4
Zou M, An Y, Kuang H, Wang J. LGTRL-DE: Local and Global Temporal Representation Learning with Demographic Embedding for in-hospital mortality prediction. J Biomed Inform 2023:104408. [PMID: 37295630 DOI: 10.1016/j.jbi.2023.104408] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2022] [Revised: 03/28/2023] [Accepted: 05/28/2023] [Indexed: 06/12/2023]
Abstract
Predicting a patient's in-hospital mortality from historical electronic medical records (EMRs) can assist physicians in making clinical decisions and allocating medical resources. In recent years, researchers have proposed many deep learning methods that predict in-hospital mortality by learning patient representations. However, most of these methods fail to learn temporal representations comprehensively and do not sufficiently mine the contextual knowledge in demographic information. To address these issues, we propose LGTRL-DE, a novel end-to-end approach to in-hospital mortality prediction based on Local and Global Temporal Representation Learning with Demographic Embedding. LGTRL-DE comprises three modules. (1) A local temporal representation learning module captures temporal information and analyzes health status from a local perspective through a recurrent neural network with demographic initialization and a local attention mechanism. (2) A Transformer-based global temporal representation learning module extracts interaction dependencies among clinical events. (3) A multi-view representation fusion module fuses temporal and static information to generate the final patient health representations. We evaluated LGTRL-DE on two public real-world clinical datasets (MIMIC-III and e-ICU). Experimental results show that LGTRL-DE achieves an area under the receiver operating characteristic curve of 0.8685 on MIMIC-III and 0.8733 on e-ICU, outperforming state-of-the-art approaches.
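The reported metric, area under the receiver operating characteristic curve (AUROC), has a simple probabilistic reading. It is the probability that a randomly chosen patient who died receives a higher predicted risk than a randomly chosen survivor, with ties counted as one half. The sketch below computes it directly from that rank-statistic (Mann-Whitney) definition. It is a generic illustration, not the authors' evaluation code. In practice a library routine such as scikit-learn's `roc_auc_score` would be used.

```python
def auroc(labels, scores):
    """Area under the ROC curve via its rank-statistic (Mann-Whitney U) form.

    labels: 0/1 outcomes (1 = in-hospital death); scores: predicted risks.
    Pairwise O(P * N) version, written for clarity rather than speed.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count half
    return wins / (len(pos) * len(neg))


print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```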
Affiliation(s)
- Mengjie Zou
- School of Computer Science and Engineering, Central South University, Changsha, 410083, PR China.
- Ying An
- The Institute of Big Data, Central South University, Changsha, 410083, PR China.
- Hulin Kuang
- School of Computer Science and Engineering, Central South University, Changsha, 410083, PR China.
- Jianxin Wang
- School of Computer Science and Engineering, Central South University, Changsha, 410083, PR China.
5
AI Models for Predicting Readmission of Pneumonia Patients within 30 Days after Discharge. Electronics 2022. [DOI: 10.3390/electronics11050673] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 01/27/2023]
Abstract
Models that can precisely predict readmission are being pursued worldwide. The objective of this study is to design predictive models, using artificial intelligence methods and data retrieved from the National Health Insurance Research Database of Taiwan, that identify pneumonia patients at high risk of 30-day all-cause readmission. An integrated genetic algorithm (GA) and support vector machine (SVM) approach, named IGS, was used to design predictive models optimized with three objective functions. In IGS, the GA selects salient features and optimal SVM parameters, while the SVM constructs the models. For comparison, logistic regression (LR) and a deep neural network (DNN) were also used for model construction. The IGS model using AUC as the objective function achieved an accuracy, sensitivity, specificity, and area under the ROC curve (AUC) of 70.11%, 73.46%, 69.26%, and 0.7758, respectively. It outperformed the models designed with LR (65.77%, 78.44%, 62.54%, and 0.7689, respectively) and DNN (61.50%, 79.34%, 56.95%, and 0.7547, respectively), as well as previously reported models built on electronic health record data, which had AUCs of 0.71–0.74. The model can be used to automatically detect pneumonia patients at risk of all-cause readmission within 30 days of discharge, so that suitable interventions can be administered to reduce readmissions and healthcare costs.
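Accuracy, sensitivity, and specificity all come from the same confusion matrix of thresholded predictions. The short sketch below shows how they relate, assuming binary labels where 1 means readmitted within 30 days. It is a generic illustration, not the study's code.

```python
def binary_metrics(labels, predictions):
    """Accuracy, sensitivity, and specificity for binary readmission predictions.

    labels, predictions: sequences of 0/1 (1 = readmitted within 30 days).
    """
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)  # correctly flagged readmissions
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)  # correctly cleared patients
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)  # false alarms
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)  # missed readmissions
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)  # true positive rate (recall)
    specificity = tn / (tn + fp)  # true negative rate
    return accuracy, sensitivity, specificity


print(binary_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0, 1, 1]))
```

Accuracy is the prevalence-weighted average of sensitivity and specificity. When readmissions are the minority class, accuracy therefore tracks specificity closely. This helps explain why a model with higher sensitivity but lower specificity, such as LR or DNN above, can still have lower accuracy.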
6
Wu J, Lin Y, Li P, Hu Y, Zhang L, Kong G. Predicting Prolonged Length of ICU Stay through Machine Learning. Diagnostics (Basel) 2021; 11:diagnostics11122242. [PMID: 34943479 PMCID: PMC8700580 DOI: 10.3390/diagnostics11122242] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 11/15/2021] [Revised: 11/22/2021] [Accepted: 11/24/2021] [Indexed: 12/12/2022]
Abstract
This study aimed to construct machine learning (ML) models for predicting prolonged length of stay (pLOS) in intensive care units (ICU) among general ICU patients. A multicenter database, the eICU Collaborative Research Database, was used for model derivation and internal validation, and the Medical Information Mart for Intensive Care (MIMIC) III database was used for external validation. We used four ML methods (random forest, support vector machine, deep learning, and gradient boosting decision tree (GBDT)) to develop prediction models. The prediction performance of the four models was compared with that of the customized simplified acute physiology score (SAPS) II. Performance was measured by the area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC), estimated calibration index (ECI), and Brier score. In internal validation, the GBDT model achieved the best overall performance (Brier score, 0.164), discrimination (AUROC, 0.742; AUPRC, 0.537), and calibration (ECI, 8.224). In external validation, the GBDT model also achieved the best overall performance (Brier score, 0.166), discrimination (AUROC, 0.747; AUPRC, 0.536), and calibration (ECI, 8.294). External validation showed that the calibration curve of the GBDT model was an optimal fit and that all four ML models outperformed the customized SAPS II model. The GBDT-based pLOS-ICU prediction model had the best prediction performance among the five models on both the internal and external datasets. Furthermore, it has the potential to assist ICU physicians in identifying patients at risk of pLOS-ICU and in providing appropriate clinical interventions to improve patient outcomes.
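The Brier score used above as the measure of overall performance is the mean squared difference between predicted probabilities and observed 0/1 outcomes. Lower is better. The sketch below implements that standard definition as an illustration. It is not the study's code, and the ECI calibration metric is not covered here.

```python
def brier_score(labels, probabilities):
    """Mean squared difference between predicted probabilities and 0/1 outcomes.

    labels: 0/1 outcomes (1 = prolonged ICU stay); probabilities: predicted risks in [0, 1].
    Lower is better; 0.0 is a perfect forecast.
    """
    if len(labels) != len(probabilities) or not labels:
        raise ValueError("need equal-length, non-empty inputs")
    return sum((p - y) ** 2 for y, p in zip(labels, probabilities)) / len(labels)


print(brier_score([1, 0], [0.8, 0.3]))  # approximately 0.065
```

A useful reference point: a model that always predicts the outcome prevalence p scores p(1 - p). A score only makes sense relative to the prevalence in the cohort, which the abstract does not report.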
Affiliation(s)
- Jingyi Wu
- National Institute of Health Data Science, Peking University, Beijing 100191, China
- Advanced Institute of Information Technology, Peking University, Hangzhou 311215, China
- Yu Lin
- Department of Medicine and Therapeutics, LKS Institute of Health Science, The Chinese University of Hong Kong, Hong Kong, China
- Pengfei Li
- Advanced Institute of Information Technology, Peking University, Hangzhou 311215, China
- Yonghua Hu
- Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing 100191, China
- Medical Informatics Center, Peking University, Beijing 100191, China
- Luxia Zhang
- National Institute of Health Data Science, Peking University, Beijing 100191, China
- Advanced Institute of Information Technology, Peking University, Hangzhou 311215, China
- Renal Division, Department of Medicine, Peking University First Hospital, Peking University Institute of Nephrology, Beijing 100034, China
- Guilan Kong
- National Institute of Health Data Science, Peking University, Beijing 100191, China
- Advanced Institute of Information Technology, Peking University, Hangzhou 311215, China
- Correspondence: ; Tel.: +86-18710098511