1
Liou L, Scott E, Parchure P, Ouyang Y, Egorova N, Freeman R, Hofer IS, Nadkarni GN, Timsina P, Kia A, Levin MA. Assessing calibration and bias of a deployed machine learning malnutrition prediction model within a large healthcare system. NPJ Digit Med 2024; 7:149. [PMID: 38844546 PMCID: PMC11156633 DOI: 10.1038/s41746-024-01141-5] [Received: 10/04/2023] [Accepted: 05/22/2024] [Indexed: 06/09/2024]
Abstract
Malnutrition is a frequently underdiagnosed condition leading to increased morbidity, mortality, and healthcare costs. The Mount Sinai Health System (MSHS) deployed a machine learning model (MUST-Plus) to detect malnutrition upon hospital admission. However, in diverse patient groups, a poorly calibrated model may lead to misdiagnosis, exacerbating healthcare disparities. We explored the model's calibration across different patient variables and evaluated methods to improve it. Data from adult patients admitted to five MSHS hospitals from January 1, 2021, to December 31, 2022, were analyzed. We compared MUST-Plus predictions to the registered dietitian's formal assessment. Hierarchical calibration was assessed and compared between the recalibration sample (N = 49,562) of patients admitted between January 1, 2021, and December 31, 2022, and the hold-out sample (N = 17,278) of patients admitted between January 1, 2023, and September 30, 2023. Statistical differences in calibration metrics were tested using bootstrapping with replacement. Before recalibration, the overall model calibration intercept was -1.17 (95% CI: -1.20, -1.14), the slope was 1.37 (95% CI: 1.34, 1.40), and the Brier score was 0.26 (95% CI: 0.25, 0.26). Both weak and moderate measures of calibration differed significantly between White and Black patients and between male and female patients. Logistic recalibration significantly improved calibration of the model across race and gender in the hold-out sample. The original MUST-Plus model showed significant differences in calibration between White and Black patients. It also overestimated malnutrition in females compared to males. Logistic recalibration effectively reduced miscalibration across all patient subgroups. Continual monitoring and timely recalibration can improve model accuracy.
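The "weak" calibration measures (intercept and slope), the Brier score, and logistic recalibration summarized above can be illustrated with a short, dependency-free Python sketch. It assumes the common definitions: the calibration slope is the coefficient on logit(p) when the observed outcome is regressed on the logit of the prediction, and calibration-in-the-large is the intercept of the same model with the slope fixed at 1. The data are simulated and all function names are hypothetical; this is not the MUST-Plus pipeline or the authors' code.

```python
import math
import random


def logit(p):
    return math.log(p / (1.0 - p))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def brier_score(probs, outcomes):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def fit_recalibration(probs, outcomes, iters=50, tol=1e-10):
    """Newton-Raphson fit of logit(P(y=1)) = a + b * logit(p); returns (a, b).

    b is the calibration slope; (a, b) also define a logistic recalibration.
    """
    xs = [logit(p) for p in probs]
    a, b = 0.0, 1.0
    for _ in range(iters):
        g0 = g1 = s0 = s1 = s2 = 0.0
        for x, y in zip(xs, outcomes):
            mu = sigmoid(a + b * x)
            w = mu * (1.0 - mu)
            g0 += y - mu          # score (gradient) for a
            g1 += (y - mu) * x    # score (gradient) for b
            s0 += w
            s1 += w * x
            s2 += w * x * x
        det = s0 * s2 - s1 * s1   # determinant of the 2x2 information matrix
        da = (s2 * g0 - s1 * g1) / det
        db = (s0 * g1 - s1 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b


def calibration_in_the_large(probs, outcomes, iters=50, tol=1e-10):
    """Intercept of logit(P(y=1)) = a + logit(p), i.e. slope fixed at 1 (offset model)."""
    xs = [logit(p) for p in probs]
    a = 0.0
    for _ in range(iters):
        g = s = 0.0
        for x, y in zip(xs, outcomes):
            mu = sigmoid(a + x)
            g += y - mu
            s += mu * (1.0 - mu)
        step = g / s
        a += step
        if abs(step) < tol:
            break
    return a


def recalibrate(probs, a, b):
    """Apply logistic recalibration: p' = sigmoid(a + b * logit(p))."""
    return [sigmoid(a + b * logit(p)) for p in probs]


# Simulated data only: true risk logits, outcomes, and a miscalibrated model
rng = random.Random(0)
true_logits = [rng.gauss(-1.0, 1.5) for _ in range(5000)]
y = [1 if rng.random() < sigmoid(z) else 0 for z in true_logits]
p = [sigmoid(0.8 + 0.7 * z) for z in true_logits]  # systematically overestimates risk

a, b = fit_recalibration(p, y)
print("calibration-in-the-large:", round(calibration_in_the_large(p, y), 3))
print("calibration slope:", round(b, 3))
p_new = recalibrate(p, a, b)
print("Brier before/after:", round(brier_score(p, y), 4), round(brier_score(p_new, y), 4))
```

Fitting (a, b) on an earlier period and applying `recalibrate` to later admissions would mirror the recalibration-sample/hold-out-sample design described above; resampling patients with replacement and refitting would give bootstrap confidence intervals.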
Affiliation(s)
- Lathan Liou
- Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Prathamesh Parchure
- Institute for Healthcare Delivery Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Yuxia Ouyang
- Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Anesthesiology, Perioperative and Pain Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Natalia Egorova
- Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Robert Freeman
- Institute for Healthcare Delivery Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ira S Hofer
- Department of Anesthesiology, Perioperative and Pain Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- The Division of Data Driven and Digital Medicine (D3M), The Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Girish N Nadkarni
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- The Division of Data Driven and Digital Medicine (D3M), The Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Prem Timsina
- Institute for Healthcare Delivery Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Arash Kia
- Institute for Healthcare Delivery Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Anesthesiology, Perioperative and Pain Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Matthew A Levin
- Institute for Healthcare Delivery Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Anesthesiology, Perioperative and Pain Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
2
Chaudhary R, Nourelahi M, Thoma FW, Gellad WF, Lo-Ciganic WH, Bliden KP, Gurbel PA, Neal MD, Jain SK, Bhonsale A, Mulukutla SR, Wang Y, Harinstein ME, Saba S, Visweswaran S. Machine Learning-Based Bleeding Risk Predictions in Atrial Fibrillation Patients on Direct Oral Anticoagulants. medRxiv: The Preprint Server for Health Sciences 2024:2024.05.27.24307985. [PMID: 38854094 PMCID: PMC11160827 DOI: 10.1101/2024.05.27.24307985] [Indexed: 06/11/2024]
Abstract
Importance Accurately predicting major bleeding events in non-valvular atrial fibrillation (AF) patients on direct oral anticoagulants (DOACs) is crucial for personalized treatment and improving patient outcomes, especially with emerging alternatives like left atrial appendage closure devices, which reduce stroke risk comparably but with significantly fewer non-procedural bleeding events. Objective To evaluate the performance of machine learning (ML) risk models in predicting clinically significant bleeding events requiring hospitalization and hemorrhagic stroke in non-valvular AF patients on DOACs, compared with conventional bleeding risk scores (HAS-BLED, ORBIT, and ATRIA), at the index visit to a cardiologist for AF management. Design Prognostic modeling with a retrospective cohort study design using electronic health record (EHR) data, with clinical follow-up at one, two, and five years. Setting University of Pittsburgh Medical Center (UPMC) system. Participants 24,468 non-valvular AF patients aged ≥18 years treated with DOACs, excluding those with a prior history of significant bleeding or other indications for DOACs, those taking warfarin, and those with contraindications to DOACs. Exposures DOAC therapy for non-valvular AF. Main Outcomes and Measures The primary endpoint was clinically significant bleeding requiring hospitalization within one year of the index visit. The models incorporated demographic, clinical, and laboratory variables available in the EHR at the index visit. Results Among 24,468 patients, 553 (2.3%) had bleeding events within one year, 829 (3.5%) within two years, and 1,292 (5.8%) within five years of the index visit. We evaluated multivariate logistic regression and ML models, including random forest, classification trees, k-nearest neighbor, naive Bayes, and extreme gradient boosting (XGBoost), which modestly outperformed the HAS-BLED, ATRIA, and ORBIT scores in predicting clinically significant bleeding at 1-year follow-up.
The best-performing model (random forest) achieved an area under the receiver operating characteristic curve (AUC-ROC) of 0.76 (0.70-0.81), a G-Mean score of 0.67, and a net reclassification index of 0.14, whereas the HAS-BLED score achieved an AUC-ROC of 0.57 (0.50-0.63) and a G-Mean score of 0.57 (p < 0.001 for the difference). The ML models also outperformed conventional risk scores at the 2-year and 5-year time points and within the hemorrhagic stroke subgroup. SHAP analysis identified novel risk factors beyond those used in conventional risk scores, including measures of body mass index, cholesterol profile, and insurance type. Conclusions and Relevance Our findings demonstrate the superior performance of ML models compared with conventional bleeding risk scores and identify novel risk factors, highlighting the potential for personalized bleeding risk assessment in AF patients on DOACs.
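G-Mean, reported above alongside AUC-ROC, is the geometric mean of sensitivity and specificity at a chosen decision threshold, so a model that predicts "no bleeding" for everyone scores 0 even though its accuracy would be high for a rare outcome. The abstract does not state how the threshold was chosen; the threshold sweep in the hypothetical helper `best_g_mean_threshold` is an assumption, and choosing the threshold on the same data used for evaluation gives an optimistic estimate. A minimal Python sketch:

```python
import math


def sensitivity_specificity(y_true, y_prob, threshold):
    """Sensitivity and specificity when predicting 1 for y_prob >= threshold."""
    tp = fn = tn = fp = 0
    for y, p in zip(y_true, y_prob):
        predicted = 1 if p >= threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    return tp / (tp + fn), tn / (tn + fp)


def g_mean(y_true, y_prob, threshold=0.5):
    """Geometric mean of sensitivity and specificity at one threshold."""
    sens, spec = sensitivity_specificity(y_true, y_prob, threshold)
    return math.sqrt(sens * spec)


def best_g_mean_threshold(y_true, y_prob):
    """Hypothetical helper: try each distinct predicted value as the threshold."""
    return max(((t, g_mean(y_true, y_prob, t)) for t in sorted(set(y_prob))),
               key=lambda pair: pair[1])


y_true = [1, 1, 0, 0]
y_prob = [0.9, 0.4, 0.2, 0.6]
print(g_mean(y_true, y_prob))  # 0.5 (sensitivity 0.5, specificity 0.5)
print(best_g_mean_threshold(y_true, y_prob))
```

With outcome rates as low as the 2.3% one-year bleeding rate above, a fixed 0.5 threshold is usually far too high, which is why the threshold choice matters when comparing G-Mean across models.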
3
Huberts LCE, Li S, Blake V, Jorm L, Yu J, Ooi SY, Gallego B. Predictive analytics for cardiovascular patient readmission and mortality: An explainable approach. Comput Biol Med 2024; 174:108321. [PMID: 38626511 DOI: 10.1016/j.compbiomed.2024.108321] [Received: 11/11/2023] [Revised: 02/06/2024] [Accepted: 03/13/2024] [Indexed: 04/18/2024]
Abstract
BACKGROUND Cardiovascular patients experience high rates of adverse outcomes following discharge from hospital, which may be preventable through early identification and targeted action. This study aimed to investigate the effectiveness and explainability of machine learning algorithms in predicting unplanned readmission and death in cardiovascular patients at 30 days and 180 days from discharge. METHODS Gradient boosting machines were trained and evaluated using data from hospital electronic medical records linked to hospital administrative and mortality data for 39,255 patients admitted to four hospitals in New South Wales, Australia between 2017 and 2021. Sociodemographic variables, admission history, and clinical information were used as potential predictors. The performance was compared to LASSO regression, as well as the HOSPITAL and LACE risk score indices. Important risk factors identified by the gradient-boosting machine model were explored using Shapley values. RESULTS The models performed well, especially for the mortality outcomes. Area under the receiver operating characteristic curve values were 0.70 for readmission and 0.87-0.90 for mortality using the full gradient boosting machine algorithms. Among the top predictors for 30-day and 180-day readmission were increased red cell distribution width, old age (especially above 80 years), high measured troponin and urea levels, not being married or in a relationship, and low albumin levels. For mortality, these included increased red cell distribution width, old age (especially older than 70 years), high measured troponin and urea levels, high neutrophil and monocyte counts, and low eosinophil and lymphocyte counts. The Shapley values gave clear insight into the dynamics of decision-tree-based models. CONCLUSIONS We demonstrated an explainable predictive algorithm to identify cardiovascular patients who are at high risk of readmission or death at discharge from the hospital and identified key risk factors.
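Shapley values attribute a single prediction to its input features by averaging each feature's marginal contribution over all subsets (coalitions) of the other features. The brute-force Python sketch below shows that definition on a toy function; it assumes an "absent" feature is replaced by a fixed baseline value, which is a simplification. Tree-ensemble explainers such as those in the `shap` package use far more efficient, tree-specific algorithms, so this is not the computation used in the study, and `toy_risk` is a hypothetical model.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x; features outside a coalition take baseline values.

    Cost grows as 2**len(x), so this is only practical for a handful of features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                members = set(coalition)
                without_i = [x[j] if j in members else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi


def toy_risk(v):
    """Hypothetical risk function with an interaction between features 0 and 2."""
    return 0.5 * v[0] + 0.3 * v[1] + 0.2 * v[0] * v[2]


x = [2.0, 1.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_risk, x, baseline)
print(phi)
# Efficiency property: attributions sum to f(x) - f(baseline), up to rounding
print(sum(phi), toy_risk(x) - toy_risk(baseline))
```

The interaction term is split equally between features 0 and 2, which is the kind of behavior that makes Shapley summaries of tree models readable.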
Affiliation(s)
- Leo C E Huberts
- Centre for Big Data Research in Health, University of New South Wales, Sydney, NSW, Australia.
- Sihan Li
- Centre for Big Data Research in Health, University of New South Wales, Sydney, NSW, Australia
- Victoria Blake
- Centre for Big Data Research in Health, University of New South Wales, Sydney, NSW, Australia; Eastern Heart Clinic, Prince of Wales Hospital, Sydney, NSW, Australia
- Louisa Jorm
- Centre for Big Data Research in Health, University of New South Wales, Sydney, NSW, Australia
- Jennifer Yu
- School of Clinical Medicine, Faculty of Medicine and Health, University of New South Wales, Sydney, NSW, Australia; Prince of Wales Hospital, South Eastern Sydney Local Health District, NSW, Australia
- Sze-Yuan Ooi
- School of Clinical Medicine, Faculty of Medicine and Health, University of New South Wales, Sydney, NSW, Australia; Prince of Wales Hospital, South Eastern Sydney Local Health District, NSW, Australia
- Blanca Gallego
- Centre for Big Data Research in Health, University of New South Wales, Sydney, NSW, Australia
4
Zhang Y, Zhu X, Gao F, Yang S. Systematic Review and Critical Appraisal of Prediction Models for Readmission in Coronary Artery Disease Patients: Assessing Current Efficacy and Future Directions. Risk Manag Healthc Policy 2024; 17:549-557. [PMID: 38496372 PMCID: PMC10944133 DOI: 10.2147/rmhp.s451436] [Received: 11/23/2023] [Accepted: 03/04/2024] [Indexed: 03/19/2024]
Abstract
Purpose Coronary artery disease (CAD) patients frequently face readmissions due to suboptimal disease management. Prediction models are pivotal for detecting early unplanned readmissions. This review offers a unified assessment, aiming to lay the groundwork for enhancing prediction models and informing prevention strategies. Methods A search of five databases (PubMed, Web of Science, EBSCOhost, Embase, China National Knowledge Infrastructure) up to September 2023 identified studies on prediction models for readmission of patients with coronary artery disease. Two independent reviewers used the CHARMS checklist for data extraction and the PROBAST tool for bias assessment. Results From 12,457 records, 15 studies were selected, contributing 30 models targeting various CAD patient groups (AMI, CABG, ACS), primarily from China, the USA, and Canada. Models used varied methods such as logistic regression and machine learning, with performance predominantly measured by the c-index. Key predictors included age, gender, and length of hospital stay. Readmission rates in the studies varied from 4.8% to 45.1%. Despite a high risk of bias across models, several showed notable accuracy and calibration. Conclusion The study highlights the need for thorough external validation and the use of the PROBAST tool to reduce bias in models predicting readmission for CAD patients.
Affiliation(s)
- Yunhao Zhang
- College of Nursing, Hangzhou Normal University, Hangzhou, People’s Republic of China
- Xuejiao Zhu
- College of Nursing, Hangzhou Normal University, Hangzhou, People’s Republic of China
- Fuer Gao
- College of Nursing, Hangzhou Normal University, Hangzhou, People’s Republic of China
- Shulan Yang
- Department of Nursing, Zhejiang Hospital, Hangzhou, People’s Republic of China
5
Meade SM, Salas-Vega S, Nagy MR, Sundar SJ, Steinmetz MP, Benzel EC, Habboub G. A Pilot Remote Curriculum to Enhance Resident and Medical Student Understanding of Machine Learning in Healthcare. World Neurosurg 2023; 180:e142-e148. [PMID: 37696433 DOI: 10.1016/j.wneu.2023.09.012] [Received: 05/19/2023] [Revised: 08/22/2023] [Accepted: 09/04/2023] [Indexed: 09/13/2023]
Abstract
BACKGROUND Despite the expanding role of machine learning (ML) in health care and patient expectations that clinicians understand ML-based tools, few for-credit curricula exist specifically for neurosurgical trainees to learn the basic principles and implications of ML for medical research and clinical practice. We implemented a novel, remotely delivered curriculum designed to develop literacy in ML for neurosurgical trainees. METHODS A 4-week pilot medical elective was designed specifically for trainees to build literacy in basic ML concepts. Qualitative feedback from interested and enrolled students was collected to assess students' and trainees' reactions, learning, and future application of course content. RESULTS Although 15 learners expressed interest, only 3 medical students and 1 neurosurgical resident completed the course. Enrollment included students and trainees from 3 different institutions. All learners who completed the course found the lectures relevant to their future practice as clinicians and researchers and reported improved confidence in understanding and applying published literature that uses ML techniques in health care. Barriers to adequate enrollment and retention (e.g., balancing clinical responsibilities) were identified. CONCLUSIONS This pilot elective demonstrated the interest, value, and feasibility of a remote elective to establish ML literacy; however, feedback calling for greater accessibility and flexibility prompted our team to implement changes. Future iterations of the elective will use a semiannual, 2-week format, split lectures more clearly between theory (the method and its value) and application (coding instructions), and make lectures open-source prerequisites so that students can tailor their learning to their planned application of these methods in practice and research.
Affiliation(s)
- Seth M Meade
- Department of Neurosurgery, Cleveland Clinic Lerner College of Medicine, Cleveland, Ohio, USA; Case Western School of Medicine, Case Western Reserve University, Cleveland, Ohio, USA; Department of Neurosurgery, Neurologic Institute, Center for Spine Health, Cleveland Clinic Foundation, Cleveland, Ohio, USA.
- Sebastian Salas-Vega
- Case Western School of Medicine, Case Western Reserve University, Cleveland, Ohio, USA; Department of Neurosurgery, Neurologic Institute, Center for Spine Health, Cleveland Clinic Foundation, Cleveland, Ohio, USA; Department of Neurosurgery, Inova Health System, Falls Church, Virginia, USA
- Matthew R Nagy
- Department of Neurosurgery, Cleveland Clinic Lerner College of Medicine, Cleveland, Ohio, USA; Case Western School of Medicine, Case Western Reserve University, Cleveland, Ohio, USA
- Swetha J Sundar
- Department of Neurosurgery, Cleveland Clinic Lerner College of Medicine, Cleveland, Ohio, USA; Department of Neurosurgery, Neurologic Institute, Center for Spine Health, Cleveland Clinic Foundation, Cleveland, Ohio, USA
- Michael P Steinmetz
- Department of Neurosurgery, Cleveland Clinic Lerner College of Medicine, Cleveland, Ohio, USA; Department of Neurosurgery, Neurologic Institute, Center for Spine Health, Cleveland Clinic Foundation, Cleveland, Ohio, USA
- Edward C Benzel
- Department of Neurosurgery, Cleveland Clinic Lerner College of Medicine, Cleveland, Ohio, USA; Department of Neurosurgery, Neurologic Institute, Center for Spine Health, Cleveland Clinic Foundation, Cleveland, Ohio, USA
- Ghaith Habboub
- Department of Neurosurgery, Cleveland Clinic Lerner College of Medicine, Cleveland, Ohio, USA; Department of Neurosurgery, Neurologic Institute, Center for Spine Health, Cleveland Clinic Foundation, Cleveland, Ohio, USA
6
Matsumoto K, Nohara Y, Sakaguchi M, Takayama Y, Fukushige S, Soejima H, Nakashima N, Kamouchi M. Temporal Generalizability of Machine Learning Models for Predicting Postoperative Delirium Using Electronic Health Record Data: Model Development and Validation Study. JMIR Perioper Med 2023; 6:e50895. [PMID: 37883164 PMCID: PMC10636625 DOI: 10.2196/50895] [Received: 07/16/2023] [Revised: 09/24/2023] [Accepted: 09/29/2023] [Indexed: 10/27/2023]
Abstract
BACKGROUND Although machine learning models demonstrate significant potential in predicting postoperative delirium, the advantages of their implementation in real-world settings remain unclear and require comparison with conventional models in practical applications. OBJECTIVE The objective of this study was to validate the temporal generalizability of decision tree ensemble and sparse linear regression models for predicting delirium after surgery compared with that of the traditional logistic regression model. METHODS The health record data of patients hospitalized at an advanced emergency and critical care medical center in Kumamoto, Japan, were collected electronically. We developed a decision tree ensemble model using extreme gradient boosting (XGBoost) and a sparse linear regression model using least absolute shrinkage and selection operator (LASSO) regression. To evaluate the predictive performance of the models, we used the area under the receiver operating characteristic curve (AUROC) and the Matthews correlation coefficient (MCC) to measure discrimination, and the slope and intercept of the regression between predicted and observed probabilities to measure calibration. The Brier score was evaluated as an overall performance metric. We included 11,863 consecutive patients who underwent surgery with general anesthesia between December 2017 and February 2022. The patients were divided into a derivation cohort before the COVID-19 pandemic and a validation cohort during the COVID-19 pandemic. Postoperative delirium was diagnosed according to the confusion assessment method. RESULTS A total of 6497 patients (mean age 68.5, SD 14.4 years; women n=2627, 40.4%) were included in the derivation cohort, and 5366 patients (mean age 67.8, SD 14.6 years; women n=2105, 39.2%) were included in the validation cohort. Regarding discrimination, the XGBoost model (AUROC 0.87-0.90 and MCC 0.34-0.44) did not significantly outperform the LASSO model (AUROC 0.86-0.89 and MCC 0.34-0.41).
The logistic regression model (AUROC 0.84-0.88, MCC 0.33-0.40, slope 1.01-1.19, intercept -0.16 to 0.06, and Brier score 0.06-0.07), with 8 predictors (age, intensive care unit, neurosurgery, emergency admission, anesthesia time, BMI, blood loss during surgery, and use of an ambulance) achieved good predictive performance. CONCLUSIONS The XGBoost model did not significantly outperform the LASSO model in predicting postoperative delirium. Furthermore, a parsimonious logistic model with a few important predictors achieved comparable performance to machine learning models in predicting postoperative delirium.
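The Matthews correlation coefficient (MCC) used above for discrimination is computed from the 2x2 confusion matrix at a fixed classification threshold, and unlike accuracy it stays informative when one class dominates. A minimal Python sketch, assuming a 0.5 threshold purely for illustration (the abstract does not state the threshold) and returning 0 when a row or column of the matrix is empty, a common convention:

```python
import math


def confusion_counts(y_true, y_prob, threshold=0.5):
    """Return (tp, tn, fp, fn) when predicting 1 for y_prob >= threshold."""
    tp = tn = fp = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        elif predicted == 1:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient in [-1, 1]; 0 means no correlation with the outcome."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # convention when any row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


counts = confusion_counts([1, 0, 1, 0, 1, 0], [0.9, 0.1, 0.7, 0.6, 0.2, 0.3])
print(counts, mcc(*counts))
```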
Affiliation(s)
- Yasunobu Nohara
- Big Data Science and Technology, Faculty of Advanced Science and Technology, Kumamoto University, Kumamoto, Japan
- Mikako Sakaguchi
- Department of Nursing, Saiseikai Kumamoto Hospital, Kumamoto, Japan
- Yohei Takayama
- Department of Nursing, Saiseikai Kumamoto Hospital, Kumamoto, Japan
- Syota Fukushige
- Department of Inspection, Saiseikai Kumamoto Hospital, Kumamoto, Japan
- Hidehisa Soejima
- Institute for Medical Information Research and Analysis, Saiseikai Kumamoto Hospital, Kumamoto, Japan
- Naoki Nakashima
- Medical Information Center, Kyushu University Hospital, Fukuoka, Japan
- Masahiro Kamouchi
- Department of Health Care Administration and Management, Graduate School of Medical Sciences, Kyushu University, Fukuoka, Japan
- Center for Cohort Studies, Graduate School of Medical Sciences, Kyushu University, Fukuoka, Japan
7
Ose D, Adediran E, Owens R, Gardner E, Mervis M, Turner C, Carlson E, Forbes D, Jasumback CL, Stuligross J, Pohl S, Kiraly B. Electronic Health Record-Driven Approaches in Primary Care to Strengthen Hypertension Management Among Racial and Ethnic Minoritized Groups in the United States: Systematic Review. J Med Internet Res 2023; 25:e42409. [PMID: 37713256 PMCID: PMC10541643 DOI: 10.2196/42409] [Received: 09/02/2022] [Revised: 06/01/2023] [Accepted: 07/04/2023] [Indexed: 09/16/2023]
Abstract
BACKGROUND Managing hypertension in racial and ethnic minoritized groups (eg, African American/Black patients) in primary care is highly relevant. However, evidence on whether or how electronic health record (EHR)-driven approaches in primary care can help improve hypertension management for patients of racial and ethnic minoritized groups in the United States remains scarce. OBJECTIVE This review aims to examine the role of the EHR in supporting interventions in primary care to strengthen the hypertension management of racial and ethnic minoritized groups in the United States. METHODS A search strategy based on the PICO (Population, Intervention, Comparison, and Outcome) framework was used to identify peer-reviewed articles in the Web of Science and PubMed databases. The search strategy was based on terms related to racial and ethnic minoritized groups, hypertension, primary care, and EHR-driven interventions. Articles were excluded if they did not focus on hypertension management in racial and ethnic minoritized groups or did not mention the use of health record data. RESULTS A total of 29 articles were included in this review. Regarding populations, Black/African American patients represented the largest population (26/29, 90%), followed by Hispanic/Latino (18/29, 62%), Asian American (7/29, 24%), and American Indian/Alaskan Native (2/29, 7%) patients. No study included patients who identified as Native Hawaiian/Pacific Islander. The EHR was used to identify patients (25/29, 86%), drive the intervention (21/29, 72%), and monitor results and outcomes (17/29, 59%). Most often, EHR-driven approaches were used for health coaching interventions, disease management programs, clinical decision support (CDS) systems, and best practice alerts (BPAs). Regarding outcomes, out of 8 EHR-driven health coaching interventions, only 3 (38%) reported significant results.
In contrast, all the included studies related to CDS and BPA applications reported some significant results with respect to improving hypertension management. CONCLUSIONS This review identified several use cases for the integration of the EHR in supporting primary care interventions to strengthen hypertension management in racial and ethnic minoritized patients in the United States. Some clinical-based interventions implementing CDS and BPA applications showed promising results. However, more research is needed on community-based interventions, particularly those focusing on patients who are Asian American, American Indian/Alaskan Native, and Native Hawaiian/Pacific Islander. The developed taxonomy comprising "identifying patients," "driving intervention," and "monitoring results" to classify EHR-driven approaches can be a helpful tool to facilitate this.
Affiliation(s)
- Dominik Ose
- Department of Family and Preventive Medicine, University of Utah, Salt Lake City, UT, United States
- Emmanuel Adediran
- Department of Family and Preventive Medicine, University of Utah, Salt Lake City, UT, United States
- Robert Owens
- Department of Family and Preventive Medicine, University of Utah, Salt Lake City, UT, United States
- Elena Gardner
- Department of Family and Preventive Medicine, University of Utah, Salt Lake City, UT, United States
- Matthew Mervis
- Department of Family and Preventive Medicine, University of Utah, Salt Lake City, UT, United States
- Cindy Turner
- Department of Family and Preventive Medicine, University of Utah, Salt Lake City, UT, United States
- Emily Carlson
- Community Physicians Group, University of Utah, Salt Lake City, UT, United States
- Danielle Forbes
- Utah Department of Health and Human Services, Salt Lake City, UT, United States
- John Stuligross
- Utah Department of Health and Human Services, Salt Lake City, UT, United States
- Susan Pohl
- Department of Family and Preventive Medicine, University of Utah, Salt Lake City, UT, United States
- Bernadette Kiraly
- Department of Family and Preventive Medicine, University of Utah, Salt Lake City, UT, United States
8
Song X, Tong Y, Luo Y, Chang H, Gao G, Dong Z, Wu X, Tong R. Predicting 7-day unplanned readmission in elderly patients with coronary heart disease using machine learning. Front Cardiovasc Med 2023; 10:1190038. [PMID: 37614939 PMCID: PMC10442485 DOI: 10.3389/fcvm.2023.1190038] [Received: 03/20/2023] [Accepted: 07/24/2023] [Indexed: 08/25/2023]
Abstract
Background Short-term unplanned readmission is often overlooked, especially in elderly patients with coronary heart disease (CHD), and tools to predict it are lacking. This study aimed to establish the most effective predictive model for 7-day unplanned readmission in elderly CHD patients using machine learning (ML) algorithms. Methods Detailed clinical data of elderly CHD patients were collected retrospectively. Five ML algorithms, namely extreme gradient boosting (XGB), random forest, multilayer perceptron, categorical boosting, and logistic regression, were used to build predictive models. We used the area under the receiver operating characteristic curve (AUC), accuracy, precision, recall, the F1 value, the Brier score, the area under the precision-recall curve (AUPRC), and the calibration curve to evaluate the performance of the ML models. SHapley Additive exPlanations (SHAP) values were used to interpret the best model. Results The final study included 834 elderly CHD patients (average age 73.5 ± 8.4 years; 426 [51.08%] men), of whom 139 had 7-day unplanned readmissions. The XGB model performed best, with the highest AUC (0.9729), accuracy (0.9173), F1 value (0.9134), and AUPRC (0.9766), and a Brier score of 0.08. Its calibration curve also indicated good calibration. The SHAP analysis showed that fracture, hypertension, length of stay, aspirin, and D-dimer were the most important indicators of 7-day unplanned readmission risk. A compact XGB model built on the top 10 variables also showed good predictive performance. Conclusions In this study, five ML algorithms were used to predict 7-day unplanned readmissions in elderly patients with CHD. The XGB model had the best predictive performance and shows potential for clinical application.
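Two of the metrics listed above can be computed directly from predictions: F1 from confusion-matrix counts, and the area under the precision-recall curve (AUPRC). The Python sketch below estimates AUPRC as average precision, a step-wise sum of precision weighted by the gain in recall at each threshold; this is one common estimator, other interpolation schemes give slightly different values, and the abstract does not say which one the authors used. Function names are hypothetical and this is not the authors' code.

```python
def f1_from_counts(tp, fp, fn):
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def average_precision(y_true, scores):
    """Step-wise AUPRC: sum over thresholds of (recall gain) * precision.

    Predictions sharing the same score are treated as a single threshold.
    """
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    total_pos = sum(y_true)
    tp = fp = 0
    prev_recall = 0.0
    ap = 0.0
    i = 0
    while i < len(ranked):
        score = ranked[i][0]
        # Consume every prediction tied at this score before scoring the threshold
        while i < len(ranked) and ranked[i][0] == score:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / total_pos
        ap += (recall - prev_recall) * (tp / (tp + fp))
        prev_recall = recall
    return ap


print(average_precision([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
print(f1_from_counts(tp=6, fp=2, fn=3))
```

Unlike AUC, the precision-recall summary depends on the outcome rate, so it is most informative when readmissions are a minority of cases.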
Affiliation(s)
- Xuewu Song
- Department of Pharmacy, Sichuan Provincial People’s Hospital, University of Electronic Science and Technology of China, Chengdu, China
- Chinese Academy of Sciences Sichuan Translational Medicine Research Hospital, Chengdu, China
- Yitong Tong
- Chengdu Second People’s Hospital, Chengdu, China
- Yi Luo
- Department of Pharmacy, Sichuan Provincial People’s Hospital, University of Electronic Science and Technology of China, Chengdu, China
- Chinese Academy of Sciences Sichuan Translational Medicine Research Hospital, Chengdu, China
- Huan Chang
- Department of Pharmacy, Sichuan Provincial People’s Hospital, University of Electronic Science and Technology of China, Chengdu, China
- Chinese Academy of Sciences Sichuan Translational Medicine Research Hospital, Chengdu, China
- Guangjie Gao
- Department of Pharmacy, Sichuan Provincial People’s Hospital, University of Electronic Science and Technology of China, Chengdu, China
- Chinese Academy of Sciences Sichuan Translational Medicine Research Hospital, Chengdu, China
- Ziyi Dong
- Department of Pharmacy, Sichuan Provincial People’s Hospital, University of Electronic Science and Technology of China, Chengdu, China
- Chinese Academy of Sciences Sichuan Translational Medicine Research Hospital, Chengdu, China
- Xingwei Wu
- Department of Pharmacy, Sichuan Provincial People’s Hospital, University of Electronic Science and Technology of China, Chengdu, China
- Chinese Academy of Sciences Sichuan Translational Medicine Research Hospital, Chengdu, China
- Rongsheng Tong
- Department of Pharmacy, Sichuan Provincial People’s Hospital, University of Electronic Science and Technology of China, Chengdu, China
- Chinese Academy of Sciences Sichuan Translational Medicine Research Hospital, Chengdu, China
9
Wei R, Guan X, Liu E, Zhang W, Lv J, Huang H, Zhao Z, Chen H, Liu Z, Jiang Z, Wang X. Development of a machine learning algorithm to predict complications of total laparoscopic anterior resection and natural orifice specimen extraction surgery in rectal cancer. Eur J Surg Oncol 2023; 49:1258-1268. [PMID: 36653246 DOI: 10.1016/j.ejso.2023.01.007] [Received: 08/31/2022] [Revised: 11/01/2022] [Accepted: 01/08/2023] [Indexed: 01/11/2023]
Abstract
BACKGROUND Total laparoscopic anterior resection (tLAR) and natural orifice specimen extraction surgery (NOSES) have been widely adopted in the treatment of rectal cancer (RC). However, no study has used machine learning algorithms to analyze a national cohort and predict the short-term outcomes of tLAR. METHODS Data from consecutive RC patients who underwent tLAR were collected from the China NOSES Database (CNDB). The random forest (RF), extreme gradient boosting (XGBoost), support vector machine (SVM), deep neural network (DNN), logistic regression (LR) and K-nearest neighbor (KNN) algorithms were used to develop risk models to predict short-term complications of tLAR. The area under the receiver operating characteristic curve (AUROC), Gini coefficient, specificity and sensitivity were calculated to assess the performance of each risk model. The factors selected by the models were ranked by relative importance. RESULTS A total of 4313 RC patients were identified, and 667 patients (15.5%) developed postoperative complications. The XGBoost model showed more promising results in predicting complications than the other models (AUROC 0.90, P < 0.001). Performance was similar in internal and external validation. In the XGBoost model, the four most influential factors were the distance from the lower edge of the tumor to the anus, age at diagnosis, surgical time and comorbidities. In risk stratification analysis, the rate of postoperative complications in the high-risk group was significantly higher than in the medium- and low-risk groups (P < 0.001). CONCLUSION The machine learning model shows potential benefits in predicting the risk of complications in RC patients after tLAR. This novel approach can provide reliable individual information to support surgical treatment recommendations.
Affiliation(s)
- Ran Wei
- Department of Colorectal Surgery, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Xu Guan
- Department of Colorectal Surgery, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Enrui Liu
- Department of Colorectal Surgery, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Weiyuan Zhang
- Department of Colorectal Surgery, The Second Affiliated Hospital of Harbin Medical University, Harbin, China
- Jingfang Lv
- Department of Colorectal Surgery, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Haiyang Huang
- Department of Colorectal Surgery, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Zhixun Zhao
- Department of Colorectal Surgery, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Haipeng Chen
- Department of Colorectal Surgery, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Zheng Liu
- Department of Colorectal Surgery, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Zheng Jiang
- Department of Colorectal Surgery, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Xishan Wang
- Department of Colorectal Surgery, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China

10
Jin Y, Weberpals JG, Wang SV, Desai RJ, Merola D, Lin KJ. The Impact of Longitudinal Data-Completeness of Electronic Health Record Data on the Prediction Performance of Clinical Risk Scores. Clin Pharmacol Ther 2023; 113:1359-1367. [PMID: 37026443 PMCID: PMC10924806 DOI: 10.1002/cpt.2901] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/06/2022] [Accepted: 03/22/2023] [Indexed: 04/08/2023]
Abstract
The impact of electronic health record (EHR) discontinuity (i.e., receiving care outside of a given EHR system) on EHR-based risk prediction is unknown. We aimed to assess the impact of EHR continuity on the performance of clinical risk scores. The study cohort consisted of patients aged ≥ 65 years with ≥ 1 EHR encounter in two networks in Massachusetts (MA; 2007/1/1-2017/12/31, internal training and validation dataset) and one network in North Carolina (NC; 2007/1/1-2016/12/31, external validation dataset) that were linked with Medicare claims data. Risk scores were calculated using EHR data alone vs. linked EHR-claims data (not subject to misclassification due to EHR discontinuity): (i) combined comorbidity score (CCS), (ii) claims-based frailty index (CFI), (iii) CHA2DS2-VASc, and (iv) Hypertension, Abnormal renal/liver function, Stroke, Bleeding, Labile, Elderly, and Drugs (HAS-BLED). We assessed the performance of CCS and CFI in predicting death, CHA2DS2-VASc in predicting ischemic stroke, and HAS-BLED in predicting bleeding by the area under the receiver operating characteristic curve (AUROC), stratified by quartiles of predicted EHR continuity (Q1-4). There were 319,740 patients in the MA systems and 125,380 in the NC system. In the external validation dataset, the AUROC for EHR-based CCS predicting 1-year risk of death was 0.583 in the lowest (Q1) EHR-continuity group and increased to 0.739 in the highest (Q4) group. The corresponding AUROC increased from 0.539 to 0.647 for CFI, from 0.556 to 0.637 for CHA2DS2-VASc, and from 0.517 to 0.556 for HAS-BLED. In the Q4 EHR-continuity group, the AUROC based on EHR data alone approximated that based on linked EHR-claims data. The prediction performance of all four clinical risk scores was substantially worse in patients with lower vs. higher EHR continuity.
Affiliation(s)
- Yinzhu Jin
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, Massachusetts, USA
- Janick G. Weberpals
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, Massachusetts, USA
- Shirley V. Wang
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, Massachusetts, USA
- Rishi J. Desai
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, Massachusetts, USA
- Kueiyu Joshua Lin
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, Massachusetts, USA
- Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts, USA

11
Park K, Cho M, Song M, Yoo S, Baek H, Kim S, Kim K. Exploring the potential of OMOP common data model for process mining in healthcare. PLoS One 2023; 18:e0279641. [PMID: 36595527 PMCID: PMC9810199 DOI: 10.1371/journal.pone.0279641] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 05/19/2022] [Accepted: 12/09/2022] [Indexed: 01/04/2023]
Abstract
BACKGROUND AND OBJECTIVE Electronic health records (EHRs) are increasingly being converted to common data models (CDMs), database schemas designed to provide standardized vocabularies that facilitate collaborative observational research. To date, however, few attempts have been made to leverage CDM data for healthcare process mining, a technique for deriving process-related knowledge (e.g., a process model) from event logs. This paper presents a method to extract, construct, and analyze event logs from the Observational Medical Outcomes Partnership (OMOP) CDM for process mining, and demonstrates CDM-based healthcare process mining with several real-life case studies while answering frequently posed process-mining questions in the CDM environment. METHODS We propose a method to extract, construct, and analyze event logs from the OMOP CDM for process types including inpatient, outpatient, and emergency room processes, and the patient journey. Using the proposed method, we extract retrospective data on several surgical procedures (i.e., Total Laparoscopic Hysterectomy (TLH), Total Hip Replacement (THR), Coronary Bypass (CB), Transcatheter Aortic Valve Implantation (TAVI), and Pancreaticoduodenectomy (PD)) from the CDM of a Korean tertiary hospital. Patient data are extracted for each operation and analyzed using several process mining techniques. RESULTS Applying process mining to the extracted logs, we demonstrate clinical pathways, outpatient process models, emergency room process models, and patient journeys. The results show the CDM's usability as a novel and valuable data source for healthcare process analysis, albeit with a few caveats. We found that the CDM should be complemented by other internal and external data sources to address the administrative and operational aspects of healthcare processes, particularly for outpatient and ER process analyses.
CONCLUSION To the best of our knowledge, we are the first to exploit the CDM for healthcare process mining. Specifically, we provide step-by-step guidance by demonstrating process analysis from locating relevant CDM tables to visualizing results with process mining tools. The proposed method could be applied widely across institutions. This work can help bring a process mining perspective to existing CDM users in the changing Hospital Information Systems (HIS) environment and facilitate CDM-based studies in the process mining research community.
Affiliation(s)
- Kangah Park
- Department of Industrial and Management Engineering, Pohang University of Science and Technology (POSTECH), Pohang, South Korea
- Minsu Cho
- School of Information Convergence, Kwangwoon University, Seoul, South Korea
- Minseok Song
- Department of Industrial and Management Engineering, Pohang University of Science and Technology (POSTECH), Pohang, South Korea
- * E-mail: (MS); (SY)
- Sooyoung Yoo
- Healthcare ICT Research Center, Office of eHealth Research and Businesses, Seoul National University Bundang Hospital, Seongnam, South Korea
- * E-mail: (MS); (SY)
- Hyunyoung Baek
- Healthcare ICT Research Center, Office of eHealth Research and Businesses, Seoul National University Bundang Hospital, Seongnam, South Korea
- Seok Kim
- Healthcare ICT Research Center, Office of eHealth Research and Businesses, Seoul National University Bundang Hospital, Seongnam, South Korea
- Kidong Kim
- Department of Obstetrics and Gynecology, Seoul National University Bundang Hospital, Seongnam, South Korea

12
Dhillon SK, Ganggayah MD, Sinnadurai S, Lio P, Taib NA. Theory and Practice of Integrating Machine Learning and Conventional Statistics in Medical Data Analysis. Diagnostics (Basel) 2022; 12:2526. [PMID: 36292218 PMCID: PMC9601117 DOI: 10.3390/diagnostics12102526] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/19/2022] [Revised: 09/26/2022] [Accepted: 10/04/2022] [Indexed: 11/16/2022]
Abstract
The practice of medical decision making is changing rapidly with the development of innovative computing technologies. Growing interest in data analysis, together with improvements in big data processing methods, raises the question of whether machine learning can be integrated with conventional statistics in health research. To help address this knowledge gap, this paper reviews the conceptual integration of conventional statistics and machine learning, focusing on health research. The similarities and differences between the two are compared using mathematical concepts and algorithms. The comparison indicates that conventional statistics are the fundamental basis of machine learning: the black-box algorithms are derived from basic mathematics but are more advanced in terms of automated analysis, handling big data, and providing interactive visualizations. Although the two approaches differ in nature, they are conceptually similar. Based on our review, we conclude that conventional statistics and machine learning are best integrated to develop automated data analysis tools. We also strongly believe that health researchers could explore machine learning to enhance conventional statistics in decision making by adding reliable validation measures.
Affiliation(s)
- Sarinder Kaur Dhillon
- Data Science & Bioinformatics Laboratory, Institute of Biological Sciences, Faculty of Science, Universiti Malaya, Kuala Lumpur 50603, Malaysia
- Mogana Darshini Ganggayah
- Department of Econometrics and Business Statistics, School of Business, Monash University Malaysia, Kuala Lumpur 47500, Malaysia
- Siamala Sinnadurai
- Department of Population Medicine and Lifestyle Disease Prevention, Medical University of Bialystok, 15-269 Bialystok, Poland
- Pietro Lio
- Department of Computer Science and Technology, University of Cambridge, 15 JJ Thomson Avenue, Cambridge CB3 0FD, UK
- Nur Aishah Taib
- Department of Surgery, Faculty of Medicine, Universiti Malaya, Kuala Lumpur 50603, Malaysia

13
Wang Y, Li C, Yuan M, Ren B, Liu C, Zheng J, Lin Z, Ren F, Gao D. Development of a complete blood count with differential-based prediction model for in-hospital mortality among patients with acute myocardial infarction in the coronary care unit. Front Cardiovasc Med 2022; 9:1001356. [PMID: 36277791 PMCID: PMC9581274 DOI: 10.3389/fcvm.2022.1001356] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/23/2022] [Accepted: 09/21/2022] [Indexed: 11/13/2022]
Abstract
Purpose In recent years, the complete blood count with differential (CBC w/diff) test has drawn strong interest because of its prognostic value in cardiovascular diseases. We aimed to develop a CBC w/diff-based prediction model for in-hospital mortality among patients with severe acute myocardial infarction (AMI) in the coronary care unit (CCU). Materials and methods This single-center retrospective study used data from a public database. A neural network method was applied. The performance of the model was assessed by discrimination and calibration. The discrimination performance of our model was compared with that of seven other classical machine learning models and five well-studied CBC w/diff clinical indicators. Finally, a permutation test was applied to rank the importance of the predictor variables. Results A total of 2,231 patient medical records were included. With a mean area under the curve (AUC) of 0.788 [95% confidence interval (CI), 0.736-0.838], our model outperformed all other models and indices. It was also well calibrated. The top three predictors were white blood cell count (WBC), red blood cell distribution width-coefficient of variation (RDW-CV), and neutrophil percentage. Surprisingly, after dropping seven variables with poor predictive value, the AUC of our model increased to 0.812 (95% CI, 0.762-0.859) (P < 0.05). Conclusion We used a neural network method to develop a risk prediction model, based on the CBC w/diff test, for in-hospital mortality among patients with AMI in the CCU. The model performed well and could aid early clinical decision-making. The three most important predictors were WBC, RDW-CV and neutrophil percentage.
Affiliation(s)
- Yu Wang
- Department of Cardiology, Xi’an Jiaotong University Second Affiliated Hospital, Xi’an, China
- Changfu Li
- Department of Digestive Medicine, Daqing Longnan Hospital, Daqing, China
- Miao Yuan
- Department of Cardiology, Xi’an Jiaotong University Second Affiliated Hospital, Xi’an, China
- Bincheng Ren
- Department of Cardiology, Xi’an Jiaotong University Second Affiliated Hospital, Xi’an, China
- Chang Liu
- Department of Cardiology, Xi’an Jiaotong University Second Affiliated Hospital, Xi’an, China
- Jiawei Zheng
- Department of Cardiology, Xi’an Jiaotong University Second Affiliated Hospital, Xi’an, China
- Zehao Lin
- Department of Cardiology, Xi’an Jiaotong University Second Affiliated Hospital, Xi’an, China
- Fuxian Ren
- Department of Cardiology, Meishan Branch of the Third Affiliated Hospital, Yanan University School of Medical, Meishan, China
- Dengfeng Gao
- Department of Cardiology, Xi’an Jiaotong University Second Affiliated Hospital, Xi’an, China

14
Abstract
PURPOSE OF REVIEW The past decade has brought increased efforts to better understand the causes of acute coronary syndrome (ACS) readmissions and strategies to minimize them. This review provides a critical appraisal of this rapidly growing body of literature. RECENT FINDINGS Prior to 2010, readmission rates for patients with ACS remained relatively constant. More recently, several strategies have been implemented to reduce them, including improved risk assessment models, transition care bundles, and targeted programs developed by federal organizations and professional societies. These strategies have been associated with a significant reduction in ACS readmission rates in recent years, accompanied by improvements in 30-day post-discharge mortality rates. As knowledge of independent risk factors for ACS readmission expands, strategies targeting at-risk populations may decrease readmission rates further. Efforts to understand and reduce 30-day ACS readmission rates have resulted in overall improved quality of care for patients.

15
Brown JR, Ricket IM, Reeves RM, Shah RU, Goodrich CA, Gobbel G, Stabler ME, Perkins AM, Minter F, Cox KC, Dorn C, Denton J, Bray BE, Gouripeddi R, Higgins J, Chapman WW, MacKenzie T, Matheny ME. Information Extraction From Electronic Health Records to Predict Readmission Following Acute Myocardial Infarction: Does Natural Language Processing Using Clinical Notes Improve Prediction of Readmission? J Am Heart Assoc 2022; 11:e024198. [PMID: 35322668 PMCID: PMC9075435 DOI: 10.1161/jaha.121.024198] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/15/2021] [Accepted: 02/04/2022] [Indexed: 11/23/2022]
Abstract
Background Social risk factors influence rehospitalization rates yet are challenging to incorporate into prediction models. Integrating social risk factors using natural language processing (NLP) and machine learning could improve prediction of 30-day readmission following acute myocardial infarction. Methods and Results Patients were enrolled into derivation and validation cohorts. The derivation cohort included inpatient discharges from Vanderbilt University Medical Center between January 1, 2007, and December 31, 2016, of patients who had a primary diagnosis of acute myocardial infarction, were discharged alive, and were not transferred from another facility. The validation cohort included patients from Dartmouth-Hitchcock Health Center between April 2, 2011, and December 31, 2016, meeting the same eligibility criteria. Data from both sites were linked to Centers for Medicare & Medicaid Services administrative data to supplement the capture of 30-day hospital readmissions. Clinical notes from each cohort were extracted, and an NLP model was deployed to count mentions of 7 social risk factors. Five machine learning models were run using clinical and NLP-derived variables. Model discrimination and calibration were assessed, and receiver operating characteristic comparison analyses were performed. The 30-day rehospitalization rates among the derivation (n=6165) and validation (n=4024) cohorts were 15.1% (n=934) and 10.2% (n=412), respectively. The derivation models showed no statistically significant improvement in performance with the addition of the selected NLP-derived social risk factors. Conclusions Social risk factors extracted using NLP did not significantly improve 30-day readmission prediction among hospitalized patients with acute myocardial infarction. Alternative methods are needed to capture social risk factors.
Affiliation(s)
- Jeremiah R. Brown
- Departments of Epidemiology and Biomedical Data Science, Dartmouth Geisel School of Medicine, Hanover, NH
- Iben M. Ricket
- Departments of Epidemiology and Biomedical Data Science, Dartmouth Geisel School of Medicine, Hanover, NH
- Ruth M. Reeves
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
- Geriatric Research Education and Clinical Care Center, Tennessee Valley Healthcare System VA, Nashville, TN
- Rashmee U. Shah
- Division of Cardiovascular Medicine, University of Utah School of Medicine, Salt Lake City, UT
- Christine A. Goodrich
- Departments of Epidemiology and Biomedical Data Science, Dartmouth Geisel School of Medicine, Hanover, NH
- Glen Gobbel
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
- Geriatric Research Education and Clinical Care Center, Tennessee Valley Healthcare System VA, Nashville, TN
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN
- Division of General Internal Medicine, Vanderbilt University Medical Center, Nashville, TN
- Meagan E. Stabler
- Departments of Epidemiology and Biomedical Data Science, Dartmouth Geisel School of Medicine, Hanover, NH
- Amy M. Perkins
- Geriatric Research Education and Clinical Care Center, Tennessee Valley Healthcare System VA, Nashville, TN
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN
- Freneka Minter
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
- Kevin C. Cox
- Departments of Epidemiology and Biomedical Data Science, Dartmouth Geisel School of Medicine, Hanover, NH
- Chad Dorn
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
- Jason Denton
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
- Bruce E. Bray
- Division of General Internal Medicine, Vanderbilt University Medical Center, Nashville, TN
- Department of Biomedical Informatics, University of Utah School of Medicine, Salt Lake City, UT
- Ramkiran Gouripeddi
- Department of Biomedical Informatics, University of Utah School of Medicine, Salt Lake City, UT
- Utah Clinical & Translational Science Institute, University of Utah, Salt Lake City, UT
- John Higgins
- Departments of Epidemiology and Biomedical Data Science, Dartmouth Geisel School of Medicine, Hanover, NH
- Wendy W. Chapman
- Centre for Digital Transformation of Health, University of Melbourne, Melbourne, Victoria, Australia
- Todd MacKenzie
- Departments of Epidemiology and Biomedical Data Science, Dartmouth Geisel School of Medicine, Hanover, NH
- Michael E. Matheny
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
- Geriatric Research Education and Clinical Care Center, Tennessee Valley Healthcare System VA, Nashville, TN
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN
- Division of General Internal Medicine, Vanderbilt University Medical Center, Nashville, TN

16
Omission in Funding. JAMA Netw Open 2022; 5:e224247. [PMID: 35254437 PMCID: PMC8902646 DOI: 10.1001/jamanetworkopen.2022.4247] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/14/2022]

17
Reeves RM, Christensen L, Brown JR, Conway M, Levis M, Gobbel GT, Shah RU, Goodrich C, Ricket I, Minter F, Bohm A, Bray BE, Matheny ME, Chapman W. Adaptation of an NLP system to a new healthcare environment to identify social determinants of health. J Biomed Inform 2021; 120:103851. [PMID: 34174396 DOI: 10.1016/j.jbi.2021.103851] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 02/22/2021] [Revised: 06/16/2021] [Accepted: 06/21/2021] [Indexed: 11/18/2022]
Abstract
Social determinants of health (SDoH) are increasingly important factors for population health, healthcare outcomes, and care delivery. However, many of these factors are not reliably captured within structured electronic health record (EHR) data. In this work, we evaluated a previously published NLP tool and adapted it to include additional social risk factors for deployment in an acute myocardial infarction cohort at Vanderbilt University Medical Center. We also developed a transformation of the tool's SDoH outputs into the OMOP common data model (CDM) for reuse across many potential use cases. Across 8 SDoH classes, performance measures were a precision of 0.83, a recall of 0.74, and an F-measure of 0.78.
Affiliation(s)
- Ruth M Reeves
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States; Geriatric Research Education and Clinical Care Center, Tennessee Valley Healthcare System VA, Nashville, TN, United States
- Lee Christensen
- Department of Biomedical Informatics, University of Utah School of Medicine, Salt Lake City, UT, United States
- Jeremiah R Brown
- Department of Epidemiology and Biomedical Data Science, Dartmouth Geisel School of Medicine, Hanover, NH, United States
- Michael Conway
- Department of Biomedical Informatics, University of Utah School of Medicine, Salt Lake City, UT, United States
- Maxwell Levis
- Department of Epidemiology and Biomedical Data Science, Dartmouth Geisel School of Medicine, Hanover, NH, United States
- Glenn T Gobbel
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States; Division of General Internal Medicine, Vanderbilt University Medical Center, Nashville, TN, United States; Geriatric Research Education and Clinical Care Center, Tennessee Valley Healthcare System VA, Nashville, TN, United States
- Rashmee U Shah
- Division of Cardiovascular Medicine, University of Utah School of Medicine, Salt Lake City, UT, United States
- Christine Goodrich
- Department of Epidemiology and Biomedical Data Science, Dartmouth Geisel School of Medicine, Hanover, NH, United States
- Iben Ricket
- Department of Epidemiology and Biomedical Data Science, Dartmouth Geisel School of Medicine, Hanover, NH, United States
- Freneka Minter
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States
- Andrew Bohm
- Department of Epidemiology and Biomedical Data Science, Dartmouth Geisel School of Medicine, Hanover, NH, United States
- Bruce E Bray
- Division of Cardiovascular Medicine, University of Utah School of Medicine, Salt Lake City, UT, United States; Department of Biomedical Informatics, University of Utah School of Medicine, Salt Lake City, UT, United States
- Michael E Matheny
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States; Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, United States; Division of General Internal Medicine, Vanderbilt University Medical Center, Nashville, TN, United States; Geriatric Research Education and Clinical Care Center, Tennessee Valley Healthcare System VA, Nashville, TN, United States
- Wendy Chapman
- Department of Biomedical Informatics, University of Utah School of Medicine, Salt Lake City, UT, United States; Centre for Clinical and Public Health Informatics, University of Melbourne, Melbourne, Australia