1
Baek S, Jeong YJ, Kim YH, Kim JY, Kim JH, Kim EY, Lim JK, Kim J, Kim Z, Kim K, Chung MJ. Development and Validation of a Robust and Interpretable Early Triaging Support System for Patients Hospitalized With COVID-19: Predictive Algorithm Modeling and Interpretation Study. J Med Internet Res 2024; 26:e52134. [PMID: 38206673 PMCID: PMC10811577 DOI: 10.2196/52134] [Received: 08/24/2023] [Revised: 11/03/2023] [Accepted: 12/25/2023]
Abstract
BACKGROUND Robust and accurate prediction of severity for patients with COVID-19 is crucial for triaging decisions. Many previously proposed models were prone to either a high risk of bias or low-to-moderate discrimination; some also lacked clinical interpretability or were developed on data from the early pandemic period only. There has therefore been a compelling need for prediction models with better clinical applicability.

OBJECTIVE The primary objective of this study was to develop and validate a machine learning-based Robust and Interpretable Early Triaging Support (RIETS) system that predicts severity progression (intensive care unit admission, in-hospital death, or the need for mechanical ventilation or extracorporeal membrane oxygenation) within 15 days of hospitalization, based on routinely available clinical and laboratory biomarkers.

METHODS We included data from 5945 hospitalized patients with COVID-19 from 19 hospitals in South Korea, collected between January 2020 and August 2022. For model development and external validation, the whole data set was partitioned into 2 independent cohorts by stratified random cluster sampling according to hospital type (general and tertiary care) and geographical location (metropolitan and nonmetropolitan). Machine learning models were trained and internally validated through cross-validation on the development cohort and externally validated using bootstrapped sampling on the external validation cohort. The best-performing model was selected primarily by the area under the receiver operating characteristic curve (AUROC), and its robustness was evaluated with a bias risk assessment. For model interpretability, we used Shapley values and patient clustering methods.

RESULTS Our final model, RIETS, is a deep neural network built on 11 clinical and laboratory biomarkers that are readily available within the first day of hospitalization. The features predictive of severity included lactate dehydrogenase, age, absolute lymphocyte count, dyspnea, respiratory rate, diabetes mellitus, C-reactive protein, absolute neutrophil count, platelet count, white blood cell count, and peripheral oxygen saturation. RIETS demonstrated excellent discrimination (AUROC=0.937; 95% CI 0.935-0.938) with high calibration (integrated calibration index=0.041), satisfied all the low-bias-risk criteria of a risk assessment tool, and provided detailed interpretations of model parameters and patient clusters. In addition, RIETS showed potential transportability across variant periods, sustaining its predictive performance on Omicron cases (AUROC=0.903; 95% CI 0.897-0.910).

CONCLUSIONS RIETS was developed and validated to assist early triaging by promptly predicting the severity of hospitalized patients with COVID-19. Its high performance with low bias risk ensures reliable prediction. The use of a nationwide multicenter cohort for model development and validation supports generalizability, and the use of routinely collected features should enable wide adaptability. Interpretation of model parameters and patient clusters can promote clinical applicability. Together, we anticipate that RIETS will facilitate the patient triaging workflow and efficient resource allocation when incorporated into routine clinical practice.
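The external-validation scheme summarized in the abstract (bootstrapped resampling with AUROC point estimates and percentile confidence intervals) can be sketched in plain Python. This is an illustrative reconstruction, not the authors' code: the function names, toy data, and bootstrap settings are all hypothetical.

```python
import random

def auroc(y_true, y_score):
    """AUROC via the Mann-Whitney U statistic: the fraction of
    (positive, negative) pairs ranked correctly, counting ties as half."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auroc_ci(y_true, y_score, n_boot=2000, alpha=0.05, seed=42):
    """Percentile bootstrap CI for AUROC: resample patients with
    replacement, recompute AUROC, and take the alpha/2 quantiles."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if len(set(ys)) < 2:  # a resample must contain both classes
            continue
        stats.append(auroc(ys, [y_score[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

In practice a library implementation (e.g. scikit-learn's `roc_auc_score`) would replace the hand-rolled statistic; the point is only that the reported 95% CIs come from resampling patients, not from a closed-form variance.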
Affiliation(s)
- Sangwon Baek
- Medical AI Research Center, Samsung Medical Center, Seoul, Republic of Korea
- Center for Data Science, New York University, New York, NY, United States
- Yeon Joo Jeong
- Department of Radiology, Research Institute for Convergence of Biomedical Science and Technology, Pusan National University Yangsan Hospital, Yangsan, Republic of Korea
- Yun-Hyeon Kim
- Department of Radiology, Chonnam National University Hospital, Gwangju, Republic of Korea
- Jin Young Kim
- Department of Radiology, Keimyung University Dongsan Hospital, Daegu, Republic of Korea
- Jin Hwan Kim
- Department of Radiology, Chungnam National University Hospital, Daejeon, Republic of Korea
- Eun Young Kim
- Department of Radiology, Gachon University Gil Medical Center, Incheon, Republic of Korea
- Jae-Kwang Lim
- Department of Radiology, School of Medicine, Kyungpook National University, Daegu, Republic of Korea
- Jungok Kim
- Department of Infectious Diseases, Chungnam National University Sejong Hospital, Sejong, Republic of Korea
- Zero Kim
- Medical AI Research Center, Samsung Medical Center, Seoul, Republic of Korea
- Department of Data Convergence & Future Medicine, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
- Kyunga Kim
- Department of Data Convergence & Future Medicine, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
- Biomedical Statistics Center, Research Institute for Future Medicine, Samsung Medical Center, Seoul, Republic of Korea
- Department of Digital Health, SAIHST, Sungkyunkwan University, Seoul, Republic of Korea
- Myung Jin Chung
- Medical AI Research Center, Samsung Medical Center, Seoul, Republic of Korea
- Department of Data Convergence & Future Medicine, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
- Department of Digital Health, SAIHST, Sungkyunkwan University, Seoul, Republic of Korea
- Department of Radiology, Samsung Medical Center, Seoul, Republic of Korea
2
Chang F, Krishnan J, Hurst JH, Yarrington ME, Anderson DJ, O'Brien EC, Goldstein BA. Comparing Natural Language Processing and Structured Medical Data to Develop a Computable Phenotype for Patients Hospitalized Due to COVID-19: Retrospective Analysis. JMIR Med Inform 2023; 11:e46267. [PMID: 37621195 PMCID: PMC10466442 DOI: 10.2196/46267] [Received: 02/07/2023] [Revised: 05/19/2023] [Accepted: 06/17/2023]
Abstract
Background Throughout the COVID-19 pandemic, many hospitals routinely tested hospitalized patients for SARS-CoV-2 infection upon admission. Some of these patients were admitted for reasons unrelated to COVID-19 and incidentally tested positive for the virus. Because COVID-19-related hospitalizations have become a critical public health indicator, it is important to distinguish patients who are hospitalized because of COVID-19 from those admitted for other indications.

Objective We compared the performance of computable phenotype definitions for COVID-19 hospitalizations that use different types of data from electronic health records (EHRs): structured EHR data elements, clinical notes, or a combination of both.

Methods We conducted a retrospective analysis, validated by clinician chart review, at a large academic medical center. We reviewed and analyzed the charts of 586 hospitalized individuals who tested positive for SARS-CoV-2 in January 2022. We used LASSO (least absolute shrinkage and selection operator) regression and random forests to fit classification algorithms that incorporated structured EHR data elements, clinical notes (processed with natural language processing), or a combination of both. The performance of each model was evaluated by the area under the receiver operating characteristic curve (AUROC) and an associated decision rule based on sensitivity and positive predictive value. We also identified top words and clinical indicators of COVID-19-specific hospitalization and assessed the impact of different phenotyping strategies on estimated hospital outcome metrics.

Results Based on chart review, 38.2% (224/586) of patients were determined to have been hospitalized for reasons other than COVID-19, despite having tested positive for SARS-CoV-2. A computable phenotype that used clinical notes had significantly better discrimination than one that used structured EHR data elements (AUROC: 0.894 vs 0.841; P<.001) and performed similarly to a model that combined clinical notes with structured data elements (AUROC: 0.894 vs 0.893; P=.91). Estimated hospital outcome metrics differed significantly depending on whether the population included all hospitalized patients who tested positive for SARS-CoV-2 or only those determined to have been hospitalized due to COVID-19.

Conclusions These findings highlight the importance of cause-specific phenotyping for COVID-19 hospitalizations. More generally, this work demonstrates the utility of natural language processing for identifying the indication for hospitalization when multiple conditions could plausibly be the primary cause.
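As a rough illustration of the note-based "top words" step (the authors' actual pipeline fit LASSO regression and random forests to NLP-derived features), words can be ranked by a smoothed log-odds of appearing in notes from COVID-19-specific versus incidental admissions. The notes, labels, and function names below are hypothetical toy examples, not data from the study.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase a note and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def top_discriminative_words(notes, labels, k=3, prior=1.0):
    """Rank words by smoothed log-odds of occurring in positive
    (COVID-19-specific) vs negative (incidental) admission notes."""
    pos, neg = Counter(), Counter()
    for note, y in zip(notes, labels):
        # count each word once per note (document frequency)
        (pos if y == 1 else neg).update(set(tokenize(note)))
    n_pos = sum(y == 1 for y in labels)
    n_neg = len(labels) - n_pos
    vocab = set(pos) | set(neg)
    scores = {
        w: math.log((pos[w] + prior) / (n_pos - pos[w] + prior))
         - math.log((neg[w] + prior) / (n_neg - neg[w] + prior))
        for w in vocab
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

On a toy corpus where positive notes mention "hypoxia" and negative notes mention "fracture", the ranking surfaces "hypoxia" first; in the study the analogous step surfaced clinical indicators of COVID-19-specific hospitalization.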
Affiliation(s)
- Feier Chang
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, United States
- Jay Krishnan
- Department of Medicine, Duke University, Durham, NC, United States
- Jillian H Hurst
- Department of Pediatrics, Duke University, Durham, NC, United States
- Emily C O'Brien
- Department of Population Health Sciences, Duke University, Durham, NC, United States
- Duke Clinical Research Institute, Duke University, Durham, NC, United States
- Benjamin A Goldstein
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, United States
- Department of Pediatrics, Duke University, Durham, NC, United States
- Department of Population Health Sciences, Duke University, Durham, NC, United States
- Duke Clinical Research Institute, Duke University, Durham, NC, United States