1
|
Ravindranath R, Stein JD, Hernandez-Boussard T, Fisher AC, Wang SY. The Impact of Race, Ethnicity, and Sex on Fairness in Artificial Intelligence for Glaucoma Prediction Models. OPHTHALMOLOGY SCIENCE 2025; 5:100596. [PMID: 39386055 PMCID: PMC11462200 DOI: 10.1016/j.xops.2024.100596] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2024] [Revised: 07/31/2024] [Accepted: 08/07/2024] [Indexed: 10/12/2024]
Abstract
Objective Despite advances in artificial intelligence (AI) in glaucoma prediction, most works lack multicenter focus and do not consider fairness concerning sex, race, or ethnicity. This study aims to examine the impact of these sensitive attributes on developing fair AI models that predict glaucoma progression to necessitating incisional glaucoma surgery. Design Database study. Participants Thirty-nine thousand ninety patients with glaucoma, as identified by International Classification of Disease codes from 7 academic eye centers participating in the Sight OUtcomes Research Collaborative. Methods We developed XGBoost models using 3 approaches: (1) excluding sensitive attributes as input features, (2) including them explicitly as input features, and (3) training separate models for each group. Model input features included demographic details, diagnosis codes, medications, and clinical information (intraocular pressure, visual acuity, etc.), from electronic health records. The models were trained on patients from 5 sites (N = 27 999) and evaluated on a held-out internal test set (N = 3499) and 2 external test sets consisting of N = 1550 and N = 2542 patients. Main Outcomes and Measures Area under the receiver operating characteristic curve (AUROC) and equalized odds on the test set and external sites. Results Six thousand six hundred eighty-two (17.1%) of 39 090 patients underwent glaucoma surgery with a mean age of 70.1 (standard deviation 14.6) years, 54.5% female, 62.3% White, 22.1% Black, and 4.7% Latinx/Hispanic. We found that not including the sensitive attributes led to better classification performance (AUROC: 0.77-0.82) but worsened fairness when evaluated on the internal test set. However, on external test sites, the opposite was true: including sensitive attributes resulted in better classification performance (AUROC: external #1 - [0.73-0.81], external #2 - [0.67-0.70]), but varying degrees of fairness for sex and race as measured by equalized odds. 
Conclusions Artificial intelligence models predicting whether patients with glaucoma progress to surgery demonstrated bias with respect to sex, race, and ethnicity. The effect of sensitive attribute inclusion and exclusion on fairness and performance varied based on internal versus external test sets. Prior to deployment, AI models should be evaluated for fairness on the target population. Financial Disclosures Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.
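The abstract above reports fairness as equalized odds, which compares true-positive and false-positive rates across groups. A minimal sketch of one common way to summarize it follows, assuming binary labels and binary predictions; the function names, the toy data, and the choice of the largest between-group gap as the summary are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict


def group_rates(y_true, y_pred, groups):
    """Return {group: (TPR, FPR)} for binary labels and binary predictions."""
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        if t:
            counts[g]["tp" if p else "fn"] += 1
        else:
            counts[g]["fp" if p else "tn"] += 1
    rates = {}
    for g, c in counts.items():
        pos = c["tp"] + c["fn"]
        neg = c["fp"] + c["tn"]
        tpr = c["tp"] / pos if pos else float("nan")
        fpr = c["fp"] / neg if neg else float("nan")
        rates[g] = (tpr, fpr)
    return rates


def equalized_odds_gap(y_true, y_pred, groups):
    """Largest between-group difference in TPR or FPR (0 means parity)."""
    rates = group_rates(y_true, y_pred, groups)
    tprs = [r[0] for r in rates.values()]
    fprs = [r[1] for r in rates.values()]
    return max(max(tprs) - min(tprs), max(fprs) - min(fprs))


# Hypothetical toy data: two groups, "A" and "B", with four patients each.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
groups = ["A"] * 4 + ["B"] * 4

rates = group_rates(y_true, y_pred, groups)
gap = equalized_odds_gap(y_true, y_pred, groups)
```

A gap near zero indicates that the model's error rates are similar across groups; computing it separately on internal and external test sets mirrors the comparison described in the abstract.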
Affiliation(s)
- Rohith Ravindranath
- Department of Ophthalmology, Byers Eye Institute, Stanford University, Palo Alto, California
| | - Joshua D. Stein
- Department of Ophthalmology & Visual Sciences, University of Michigan Kellogg Eye Center, Ann Arbor, Michigan
| | | | - A. Caroline Fisher
- Department of Ophthalmology, Byers Eye Institute, Stanford University, Palo Alto, California
| | - Sophia Y. Wang
- Department of Ophthalmology, Byers Eye Institute, Stanford University, Palo Alto, California
| |
|
2
|
Xu C, Xu Q, Liu L, Zhou M, Xing Z, Zhou Z, Ren D, Zhou C, Zhang L, Li X, Zhan X, Gevaert O, Lu G. A tri-light warning system for hospitalized COVID-19 patients: Credibility-based risk stratification for future pandemic preparedness. Eur J Radiol Open 2024; 13:100603. [PMID: 39469109 PMCID: PMC11513506 DOI: 10.1016/j.ejro.2024.100603] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2024] [Revised: 09/12/2024] [Accepted: 09/30/2024] [Indexed: 10/30/2024] Open
Abstract
Purpose The novel coronavirus pneumonia (COVID-19) has continually spread and mutated, requiring a patient risk stratification system to optimize medical resources and improve pandemic response. We aimed to develop a conformal prediction-based tri-light warning system for stratifying COVID-19 patients, applicable to both original and emerging variants. Methods We retrospectively collected data from 3646 patients across multiple centers in China. The dataset was divided into a training set (n = 1451), a validation set (n = 662), an external test set from Huoshenshan Field Hospital (n = 1263), and a specific test set for Delta and Omicron variants (n = 544). The tri-light warning system extracts radiomic features from CT (computed tomography) and integrates clinical records to classify patients into high-risk (red), uncertain-risk (yellow), and low-risk (green) categories. Models were built to predict ICU (intensive care unit) admissions (adverse cases in training/validation/Huoshenshan/variant test sets: n = 39/21/262/11) and were evaluated using AUROC (area under the receiver operating characteristic curve) and AUPRC (area under the precision-recall curve) metrics. Results The dataset included 1830 men (50.2 %) and 1816 women (49.8 %), with a median age of 53.7 years (IQR [interquartile range]: 42-65 years). The system demonstrated strong performance under data distribution shifts, with AUROC of 0.89 and AUPRC of 0.42 for original strains, and AUROC of 0.77-0.85 and AUPRC of 0.51-0.60 for variants. Conclusion The tri-light warning system can enhance pandemic responses by effectively stratifying COVID-19 patients under varying conditions and data shifts.
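The abstract does not spell out how conformal prediction yields the three colors, so the following is only a sketch of one standard construction (split conformal prediction for a binary classifier). It assumes the model outputs a probability of the high-risk class; the names `nonconformity`, `conformal_threshold`, and `tri_light`, the miscoverage level `alpha`, and the calibration data are all hypothetical.

```python
import math


def nonconformity(p_high, label):
    """Score = 1 - probability assigned to the given class (1 = high risk)."""
    p_class = p_high if label == 1 else 1 - p_high
    return 1 - p_class


def conformal_threshold(cal_probs, cal_labels, alpha=0.25):
    """Split-conformal quantile of calibration scores for miscoverage alpha."""
    scores = sorted(nonconformity(p, y) for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points: every class stays in the prediction set.
        return float("inf")
    return scores[k - 1]


def tri_light(p_high, qhat):
    """Map the conformal prediction set to red / green / yellow."""
    in_high = nonconformity(p_high, 1) <= qhat
    in_low = nonconformity(p_high, 0) <= qhat
    if in_high and not in_low:
        return "red"      # prediction set is {high risk}
    if in_low and not in_high:
        return "green"    # prediction set is {low risk}
    return "yellow"       # both classes or neither: uncertain


# Hypothetical calibration set: predicted P(high risk) and observed labels.
cal_probs = [0.9, 0.8, 0.2, 0.1, 0.7, 0.3, 0.95, 0.05, 0.6]
cal_labels = [1, 1, 0, 0, 1, 0, 1, 0, 1]
qhat = conformal_threshold(cal_probs, cal_labels, alpha=0.25)

lights = [tri_light(p, qhat) for p in (0.9, 0.5, 0.05)]
```

In this construction the yellow category absorbs patients whose prediction set is ambiguous or empty, which is one way an "uncertain-risk" tier can arise; the published system may define it differently.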
Affiliation(s)
- Chuanjun Xu
- Department of Radiology, the Second Hospital of Nanjing, Nanjing University of Chinese Medicine, Nanjing 210003, China
| | - Qinmei Xu
- Department of Biomedical Data Science (BMIR), Department of Medicine, Stanford University, Stanford, CA 94304, USA
| | - Li Liu
- Department of Computer Science, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - Mu Zhou
- Department of Computer Science, Rutgers University, 110 Frelinghuysen Road, Piscataway, NJ 08854, USA
| | - Zijian Xing
- Department of Deepwise AI Lab, Deepwise Inc., Beijing, China
| | - Zhen Zhou
- Department of Deepwise AI Lab, Deepwise Inc., Beijing, China
| | - Danyang Ren
- Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - Changsheng Zhou
- Department of Medical Imaging, Jinling Hospital, Nanjing, Jiangsu, China
| | - Longjiang Zhang
- Department of Medical Imaging, Jinling Hospital, Nanjing, Jiangsu, China
| | - Xiao Li
- Department of Medical Imaging, Jinling Hospital, Nanjing, Jiangsu, China
| | - Xianghao Zhan
- Department of Bioengineering, Stanford University, Stanford 94305, USA
| | - Olivier Gevaert
- Department of Biomedical Data Science (BMIR), Department of Medicine, Stanford University, Stanford, CA 94304, USA
| | - Guangming Lu
- Department of Medical Imaging, Jinling Hospital, Nanjing, Jiangsu, China
| |
|
3
|
Heumos L, Ehmele P, Treis T, Upmeier Zu Belzen J, Roellin E, May L, Namsaraeva A, Horlava N, Shitov VA, Zhang X, Zappia L, Knoll R, Lang NJ, Hetzel L, Virshup I, Sikkema L, Curion F, Eils R, Schiller HB, Hilgendorff A, Theis FJ. An open-source framework for end-to-end analysis of electronic health record data. Nat Med 2024:10.1038/s41591-024-03214-0. [PMID: 39266748 DOI: 10.1038/s41591-024-03214-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2023] [Accepted: 07/25/2024] [Indexed: 09/14/2024]
Abstract
With progressive digitalization of healthcare systems worldwide, large-scale collection of electronic health records (EHRs) has become commonplace. However, an extensible framework for comprehensive exploratory analysis that accounts for data heterogeneity is missing. Here we introduce ehrapy, a modular open-source Python framework designed for exploratory analysis of heterogeneous epidemiology and EHR data. ehrapy incorporates a series of analytical steps, from data extraction and quality control to the generation of low-dimensional representations. Complemented by rich statistical modules, ehrapy facilitates associating patients with disease states, differential comparison between patient clusters, survival analysis, trajectory inference, causal inference and more. Leveraging ontologies, ehrapy further enables data sharing and training EHR deep learning models, paving the way for foundational models in biomedical research. We demonstrate ehrapy's features in six distinct examples. We applied ehrapy to stratify patients affected by unspecified pneumonia into finer-grained phenotypes. Furthermore, we reveal biomarkers for significant differences in survival among these groups. Additionally, we quantify medication-class effects of pneumonia medications on length of stay. We further leveraged ehrapy to analyze cardiovascular risks across different data modalities. We reconstructed disease state trajectories in patients with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) based on imaging data. Finally, we conducted a case study to demonstrate how ehrapy can detect and mitigate biases in EHR data. ehrapy, thus, provides a framework that we envision will standardize analysis pipelines on EHR data and serve as a cornerstone for the community.
Affiliation(s)
- Lukas Heumos
- Institute of Computational Biology, Helmholtz Munich, Munich, Germany
- Institute of Lung Health and Immunity and Comprehensive Pneumology Center with the CPC-M bioArchive; Helmholtz Zentrum Munich; member of the German Center for Lung Research (DZL), Munich, Germany
- TUM School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany
| | - Philipp Ehmele
- Institute of Computational Biology, Helmholtz Munich, Munich, Germany
| | - Tim Treis
- Institute of Computational Biology, Helmholtz Munich, Munich, Germany
- TUM School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany
| | | | - Eljas Roellin
- Institute of Computational Biology, Helmholtz Munich, Munich, Germany
- Department of Mathematics, School of Computation, Information and Technology, Technical University of Munich, Munich, Germany
| | - Lilly May
- Institute of Computational Biology, Helmholtz Munich, Munich, Germany
- Department of Mathematics, School of Computation, Information and Technology, Technical University of Munich, Munich, Germany
| | - Altana Namsaraeva
- Institute of Computational Biology, Helmholtz Munich, Munich, Germany
- Konrad Zuse School of Excellence in Learning and Intelligent Systems (ELIZA), Darmstadt, Germany
| | - Nastassya Horlava
- Institute of Computational Biology, Helmholtz Munich, Munich, Germany
- TUM School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany
| | - Vladimir A Shitov
- Institute of Computational Biology, Helmholtz Munich, Munich, Germany
- TUM School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany
| | - Xinyue Zhang
- Institute of Computational Biology, Helmholtz Munich, Munich, Germany
| | - Luke Zappia
- Institute of Computational Biology, Helmholtz Munich, Munich, Germany
- Department of Mathematics, School of Computation, Information and Technology, Technical University of Munich, Munich, Germany
| | - Rainer Knoll
- Systems Medicine, Deutsches Zentrum für Neurodegenerative Erkrankungen (DZNE), Bonn, Germany
| | - Niklas J Lang
- Institute of Lung Health and Immunity and Comprehensive Pneumology Center with the CPC-M bioArchive; Helmholtz Zentrum Munich; member of the German Center for Lung Research (DZL), Munich, Germany
| | - Leon Hetzel
- Institute of Computational Biology, Helmholtz Munich, Munich, Germany
- Department of Mathematics, School of Computation, Information and Technology, Technical University of Munich, Munich, Germany
| | - Isaac Virshup
- Institute of Computational Biology, Helmholtz Munich, Munich, Germany
| | - Lisa Sikkema
- Institute of Computational Biology, Helmholtz Munich, Munich, Germany
- TUM School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany
| | - Fabiola Curion
- Institute of Computational Biology, Helmholtz Munich, Munich, Germany
- Department of Mathematics, School of Computation, Information and Technology, Technical University of Munich, Munich, Germany
| | - Roland Eils
- Health Data Science Unit, Heidelberg University and BioQuant, Heidelberg, Germany
- Center for Digital Health, Berlin Institute of Health (BIH) at Charité - Universitätsmedizin Berlin, Berlin, Germany
| | - Herbert B Schiller
- Institute of Lung Health and Immunity and Comprehensive Pneumology Center with the CPC-M bioArchive; Helmholtz Zentrum Munich; member of the German Center for Lung Research (DZL), Munich, Germany
- Research Unit, Precision Regenerative Medicine (PRM), Helmholtz Munich, Munich, Germany
| | - Anne Hilgendorff
- Institute of Lung Health and Immunity and Comprehensive Pneumology Center with the CPC-M bioArchive; Helmholtz Zentrum Munich; member of the German Center for Lung Research (DZL), Munich, Germany
- Center for Comprehensive Developmental Care (CDeCLMU) at the Social Pediatric Center, Dr. von Hauner Children's Hospital, LMU Hospital, Ludwig Maximilian University, Munich, Germany
| | - Fabian J Theis
- Institute of Computational Biology, Helmholtz Munich, Munich, Germany.
- TUM School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany.
- Department of Mathematics, School of Computation, Information and Technology, Technical University of Munich, Munich, Germany.
| |
|
4
|
Vabalas A, Hartonen T, Vartiainen P, Jukarainen S, Viippola E, Rodosthenous RS, Liu A, Hägg S, Perola M, Ganna A. Deep learning-based prediction of one-year mortality in Finland is an accurate but unfair aging marker. NATURE AGING 2024; 4:1014-1027. [PMID: 38914859 PMCID: PMC11257968 DOI: 10.1038/s43587-024-00657-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2023] [Accepted: 05/27/2024] [Indexed: 06/26/2024]
Abstract
Short-term mortality risk, which is indicative of individual frailty, serves as a marker for aging. Previous age clocks focused on predicting either chronological age or longer-term mortality. Aging clocks predicting short-term mortality are lacking and their algorithmic fairness remains unexamined. We developed a deep learning model to predict 1-year mortality using nationwide longitudinal data from the Finnish population (FinRegistry; n = 5.4 million), incorporating more than 8,000 features spanning up to 50 years. We achieved an area under the curve (AUC) of 0.944, outperforming a baseline model that included only age and sex (AUC = 0.897). The model generalized well to different causes of death (AUC > 0.800 for 45 of 50 causes), including coronavirus disease 2019, which was absent in the training data. Performance varied among demographics, with young females exhibiting the best and older males the worst results. Extensive prediction fairness analyses highlighted disparities among disadvantaged groups, posing challenges to equitable integration into public health interventions. Our model accurately identified short-term mortality risk, potentially serving as a population-wide aging marker.
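The subgroup comparison mentioned in the abstract (performance varying among demographics) can be illustrated by computing the AUC separately per group. The sketch below uses the pairwise (Mann-Whitney) definition of AUC in plain Python; the group labels `g1` and `g2` and the scores are hypothetical and unrelated to the FinRegistry data.

```python
def auroc(y_true, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties count as half a correct pair. Returns NaN if a class is missing.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        return float("nan")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auroc_by_group(y_true, scores, groups):
    """Compute the AUC within each group separately."""
    result = {}
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        result[g] = auroc([y_true[i] for i in idx], [scores[i] for i in idx])
    return result


# Hypothetical toy data for two demographic groups.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
scores = [0.9, 0.1, 0.8, 0.2, 0.6, 0.7, 0.4, 0.3]
groups = ["g1"] * 4 + ["g2"] * 4

per_group = auroc_by_group(y_true, scores, groups)
```

The pairwise loop is quadratic in group size, so it suits small illustrations only; a rank-based computation would be the usual choice at population scale.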
Affiliation(s)
- Andrius Vabalas
- Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland
| | - Tuomo Hartonen
- Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland
| | - Pekka Vartiainen
- Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland
- Pediatric Research Center, Helsinki University Hospital and University of Helsinki, Helsinki, Finland
| | - Sakari Jukarainen
- Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland
| | - Essi Viippola
- Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland
| | | | - Aoxing Liu
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Sara Hägg
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
| | - Markus Perola
- The Finnish Institute for Health and Welfare, Helsinki, Finland
| | - Andrea Ganna
- Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland.
- Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA.
| |
|
5
|
Nigo M, Rasmy L, Mao B, Kannadath BS, Xie Z, Zhi D. Deep learning model for personalized prediction of positive MRSA culture using time-series electronic health records. Nat Commun 2024; 15:2036. [PMID: 38448409 PMCID: PMC10917736 DOI: 10.1038/s41467-024-46211-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2023] [Accepted: 02/19/2024] [Indexed: 03/08/2024] Open
Abstract
Methicillin-resistant Staphylococcus aureus (MRSA) causes significant morbidity and mortality in hospitals. Rapid, accurate risk stratification of MRSA is crucial for optimizing antibiotic therapy. Our study introduced a deep learning model, PyTorch_EHR, which leverages electronic health record (EHR) time-series data, including a wide variety of patient-specific data, to predict MRSA culture positivity within two weeks. A total of 8,164 MRSA and 22,393 non-MRSA patient events from Memorial Hermann Hospital System, Houston, Texas were used for model development. PyTorch_EHR outperforms logistic regression (LR) and light gradient boosting machine (LGBM) models in accuracy (AUROC: PyTorch_EHR = 0.911, LR = 0.857, LGBM = 0.892). External validation with 393,713 patient events from the Medical Information Mart for Intensive Care (MIMIC)-IV dataset in Boston confirms its superior accuracy (AUROC: PyTorch_EHR = 0.859, LR = 0.816, LGBM = 0.838). Our model effectively stratifies patients into high-, medium-, and low-risk categories, potentially optimizing antimicrobial therapy and reducing unnecessary MRSA-specific antimicrobials. This highlights the advantage of deep learning models in predicting MRSA-positive cultures, surpassing traditional machine learning models and supporting clinicians' judgments.
Affiliation(s)
- Masayuki Nigo
- McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX, USA.
- McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA.
- Division of Infectious Diseases, Department of Medicine, Houston Methodist Hospital, Texas Medical Center, Houston, TX, USA.
| | - Laila Rasmy
- McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Bingyu Mao
- McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Bijun Sai Kannadath
- Department of Internal Medicine, University of Arizona College of Medicine, Phoenix, AZ, USA
| | - Ziqian Xie
- McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Degui Zhi
- McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
| |
|
6
|
Li F, Rasmy L, Xiang Y, Feng J, Abdelhameed A, Hu X, Sun Z, Aguilar D, Dhoble A, Du J, Wang Q, Niu S, Dang Y, Zhang X, Xie Z, Nian Y, He J, Zhou Y, Li J, Prosperi M, Bian J, Zhi D, Tao C. Dynamic Prognosis Prediction for Patients on DAPT After Drug-Eluting Stent Implantation: Model Development and Validation. J Am Heart Assoc 2024; 13:e029900. [PMID: 38293921 PMCID: PMC11056175 DOI: 10.1161/jaha.123.029900] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2023] [Accepted: 12/01/2023] [Indexed: 02/01/2024]
Abstract
BACKGROUND The rapid evolution of artificial intelligence (AI) in conjunction with recent updates in dual antiplatelet therapy (DAPT) management guidelines emphasizes the necessity for innovative models to predict ischemic or bleeding events after drug-eluting stent implantation. Leveraging AI for dynamic prediction has the potential to revolutionize risk stratification and provide personalized decision support for DAPT management. METHODS AND RESULTS We developed and validated a new AI-based pipeline using retrospective data of drug-eluting stent-treated patients, sourced from the Cerner Health Facts data set (n=98 236) and Optum's de-identified Clinformatics Data Mart Database (n=9978). The 36 months following drug-eluting stent implantation were designated as our primary forecasting interval, further segmented into 6 sequential prediction windows. We evaluated 5 distinct AI algorithms for their precision in predicting ischemic and bleeding risks. Model discriminative accuracy was assessed using the area under the receiver operating characteristic curve, among other metrics. The weighted light gradient boosting machine stood out as the preeminent model, thus earning its place as our AI-DAPT model. The AI-DAPT demonstrated peak accuracy in the 30 to 36 months window, charting an area under the receiver operating characteristic curve of 90% [95% CI, 88%-92%] for ischemia and 84% [95% CI, 82%-87%] for bleeding predictions. CONCLUSIONS Our AI-DAPT excels in formulating iterative, refined dynamic predictions by assimilating ongoing updates from patients' clinical profiles, holding value as a novel smart clinical tool to facilitate optimal DAPT duration management with high accuracy and adaptability.
Affiliation(s)
- Fang Li
- McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Jacksonville, FL, USA
| | - Laila Rasmy
- McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Yang Xiang
- Peng Cheng Laboratory, Shenzhen, Guangdong, China
| | - Jingna Feng
- McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Jacksonville, FL, USA
| | - Ahmed Abdelhameed
- McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Jacksonville, FL, USA
| | - Xinyue Hu
- McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Jacksonville, FL, USA
| | - Zenan Sun
- McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
| | - David Aguilar
- Department of Internal Medicine, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX, USA
- LSU School of Medicine, LSU Health New Orleans, New Orleans, LA, USA
| | - Abhijeet Dhoble
- Department of Internal Medicine, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Jingcheng Du
- McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Qing Wang
- McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Shuteng Niu
- McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Yifang Dang
- McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Xinyuan Zhang
- McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Ziqian Xie
- McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Yi Nian
- McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
| | - JianPing He
- McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Yujia Zhou
- McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Jianfu Li
- McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Jacksonville, FL, USA
| | - Mattia Prosperi
- Data Intelligence Systems Lab, Department of Epidemiology, College of Public Health and Health Professions & College of Medicine, University of Florida, Gainesville, FL, USA
| | - Jiang Bian
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
| | - Degui Zhi
- McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Cui Tao
- McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Jacksonville, FL, USA
| |
|
7
|
Viderman D, Kotov A, Popov M, Abdildin Y. Machine and deep learning methods for clinical outcome prediction based on physiological data of COVID-19 patients: a scoping review. Int J Med Inform 2024; 182:105308. [PMID: 38091862 DOI: 10.1016/j.ijmedinf.2023.105308] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Revised: 11/20/2023] [Accepted: 12/03/2023] [Indexed: 01/07/2024]
Abstract
INTRODUCTION Since the beginning of the COVID-19 pandemic, numerous machine and deep learning (MDL) methods have been proposed in the literature to analyze patient physiological data. The objective of this review is to summarize various aspects of these methods and assess their practical utility for predicting various clinical outcomes. METHODS We searched PubMed, Scopus, and Cochrane Library, screened and selected the studies matching the inclusion criteria. The clinical analysis focused on the characteristics of the patient cohorts in the studies included in this review, the specific tasks in the context of the COVID-19 pandemic that machine and deep learning methods were used for, and their practical limitations. The technical analysis focused on the details of specific MDL methods and their performance. RESULTS Analysis of the 48 selected studies revealed that the majority (∼54 %) of them examined the application of MDL methods for the prediction of survival/mortality-related patient outcomes, while a smaller fraction (∼13 %) of studies also examined applications to the prediction of patients' physiological outcomes and hospital resource utilization. 21 % of the studies examined the application of MDL methods to multiple clinical tasks. Machine and deep learning methods have been shown to be effective at predicting several outcomes of COVID-19 patients, such as disease severity, complications, intensive care unit (ICU) transfer, and mortality. MDL methods also achieved high accuracy in predicting the required number of ICU beds and ventilators. CONCLUSION Machine and deep learning methods have been shown to be valuable tools for predicting disease severity, organ dysfunction and failure, patient outcomes, and hospital resource utilization during the COVID-19 pandemic. The discovered knowledge and our conclusions and recommendations can also be useful to healthcare professionals and artificial intelligence researchers in managing future pandemics.
Affiliation(s)
- Dmitriy Viderman
- Department of Surgery, School of Medicine, Nazarbayev University, Astana, Kazakhstan; Department of Anesthesiology, Intensive Care, and Pain Medicine, National Research Oncology Center, Astana, Kazakhstan.
| | - Alexander Kotov
- Department of Computer Science, College of Engineering, Wayne State University, Detroit, USA.
| | - Maxim Popov
- Department of Computer Science, School of Engineering and Digital Sciences, Nazarbayev University, Astana, Kazakhstan.
| | - Yerkin Abdildin
- Department of Mechanical and Aerospace Engineering, School of Engineering and Digital Sciences, Nazarbayev University, Astana, Kazakhstan.
| |
|
8
|
Park H, Choi CM, Kim SH, Kim SH, Kim DK, Jeong JB. In-hospital real-time prediction of COVID-19 severity regardless of disease phase using electronic health records. PLoS One 2024; 19:e0294362. [PMID: 38271404 PMCID: PMC10810421 DOI: 10.1371/journal.pone.0294362] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/29/2023] [Accepted: 10/31/2023] [Indexed: 01/27/2024] Open
Abstract
Coronavirus disease 2019 (COVID-19) has strained healthcare systems worldwide. Predicting COVID-19 severity could optimize the allocation of resources such as oxygen devices and intensive care. If a machine learning model could forecast the severity of COVID-19 in patients, hospital resource allocation would be easier. This study evaluated machine learning models using electronic records from 3,996 COVID-19 patients to forecast mild, moderate, or severe disease up to 2 days in advance. A deep neural network (DNN) model achieved 91.8% accuracy, 0.96 AUROC, and 0.90 AUPRC for 2-day predictions, regardless of disease phase. Tree-based models achieved slightly better metrics (random forest: 94.1% accuracy, 0.98 AUROC, 0.95 AUPRC; gradient boosting: 94.1% accuracy, 0.98 AUROC, 0.94 AUPRC), prioritizing treatment factors like steroid use. However, the DNN relied more on fixed patient factors like demographics and symptoms, as measured by SHAP value importance. Since treatment patterns vary between hospitals, the DNN may be more generalizable than the tree-based models (random forest, gradient boosting). The results demonstrate accurate short-term forecasting of COVID-19 severity using routine clinical data. DNN models may balance predictive performance and generalizability better than other methods. Severity predictions by a machine learning model could facilitate resource planning, such as ICU arrangement and oxygen device allocation.
Affiliation(s)
- Hyungjun Park
- Division of Pulmonology and Critical Care Medicine, Department of Internal Medicine, Gumdan top hospital, Incheon, South Korea
| | - Chang-Min Choi
- Division of Pulmonology and Critical Care Medicine, Department of Internal Medicine, Asan Medical Center, University of Ulsan College of Medicine, Seoul, South Korea
- Division of Oncology, Department of Internal Medicine, University of Ulsan College of Medicine, Asan Medical Center, Seoul, South Korea
| | - Sung-Hoon Kim
- Department of Anesthesiology and Pain Medicine, Asan Medical Center, Ulsan College of Medicine, Seoul, South Korea
- Su Hwan Kim
- Department of Internal Medicine, Seoul National University College of Medicine, Seoul, South Korea
- Division of Gastroenterology, Department of Internal Medicine, Seoul Metropolitan Government Seoul National University Boramae Medical Center, Seoul, South Korea
- Deog Kyoem Kim
- Department of Internal Medicine, Seoul National University College of Medicine, Seoul, South Korea
- Division of Pulmonary and Critical Care Medicine, Department of Internal Medicine, Seoul Metropolitan Government Seoul National University Boramae Medical Center, Seoul, South Korea
- Ji Bong Jeong
- Department of Internal Medicine, Seoul National University College of Medicine, Seoul, South Korea
- Division of Gastroenterology, Department of Internal Medicine, Seoul Metropolitan Government Seoul National University Boramae Medical Center, Seoul, South Korea
|
9
|
Guo F, Adekanmbi V, Hsu CD, Polychronopoulou E, Berenson AB. One dose versus two doses of COVID-19 vaccine for the prevention of breakthrough infections among people previously infected with SARS-Cov-2. J Med Virol 2024; 96:e29391. [PMID: 38235834 PMCID: PMC10837048 DOI: 10.1002/jmv.29391] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2023] [Revised: 12/04/2023] [Accepted: 01/01/2024] [Indexed: 01/19/2024]
Abstract
Studies have suggested the effectiveness of COVID-19 vaccines in preventing SARS-CoV-2 reinfection among those previously infected. However, it is not yet clear whether one dose of the vaccine is enough to prevent breakthrough infections compared to two doses. Using data from the Optum deidentified COVID-19 Electronic Health Record (EHR) data set, we assessed breakthrough infection risks in previously infected individuals, comparing those with one vaccine dose to those with two doses. Propensity scores were applied to mitigate confounding factors. Follow-up spanned 6 months, beginning 2 weeks postvaccination. Among 213 845 individuals, those receiving one vaccine dose had a significantly higher breakthrough infection risk than the two-dose group (HR 1.69, 95% CI 1.54-1.85). This pattern was observed across genders, racial/ethnic groups, age categories, and vaccine types. This study reveals a substantial disparity in the risk of breakthrough infections between individuals receiving one versus two doses of the COVID-19 vaccine, suggesting that a single dose may not provide adequate protection against reinfection.
Affiliation(s)
- Fangjian Guo
- Division of Population and Preventive Health, Department of Obstetrics & Gynecology, The University of Texas Medical Branch at Galveston, Galveston, Texas, United States
- Center for Interdisciplinary Research in Women’s Health, School of Medicine, The University of Texas Medical Branch at Galveston, Galveston, Texas, United States
- Victor Adekanmbi
- Division of Population and Preventive Health, Department of Obstetrics & Gynecology, The University of Texas Medical Branch at Galveston, Galveston, Texas, United States
- Center for Interdisciplinary Research in Women’s Health, School of Medicine, The University of Texas Medical Branch at Galveston, Galveston, Texas, United States
- Christine D. Hsu
- Division of Population and Preventive Health, Department of Obstetrics & Gynecology, The University of Texas Medical Branch at Galveston, Galveston, Texas, United States
- Center for Interdisciplinary Research in Women’s Health, School of Medicine, The University of Texas Medical Branch at Galveston, Galveston, Texas, United States
- Efstathia Polychronopoulou
- Office of Biostatistics, School of Public and Population Health, The University of Texas Medical Branch at Galveston, Galveston, Texas, United States
- Abbey B. Berenson
- Division of Population and Preventive Health, Department of Obstetrics & Gynecology, The University of Texas Medical Branch at Galveston, Galveston, Texas, United States
- Center for Interdisciplinary Research in Women’s Health, School of Medicine, The University of Texas Medical Branch at Galveston, Galveston, Texas, United States
|
10
|
Rakhshan SA, Zaj M, Ghane FH, Nejad MS. Exploring the potential of learning methods and recurrent dynamic model with vaccination: A comparative case study of COVID-19 in Austria, Brazil, and China. Phys Rev E 2024; 109:014212. [PMID: 38366403 DOI: 10.1103/physreve.109.014212] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2023] [Accepted: 12/11/2023] [Indexed: 02/18/2024]
Abstract
In order to effectively manage infectious diseases, it is crucial to understand the interplay between disease dynamics and human conduct. Various factors can impact the control of an epidemic, including social interventions, adherence to health protocols, mask-wearing, and vaccination. This article presents the development of an innovative hybrid model, known as the Combined Dynamic-Learning Model, that integrates classical recurrent dynamic models with four different learning methods. The model is composed of two approaches: The first approach introduces a traditional dynamic model that focuses on analyzing the impact of vaccination on the occurrence of an epidemic, and the second approach employs various learning methods to forecast the potential outcomes of an epidemic. Furthermore, our numerical results offer an interesting comparison between the traditional approach and modern learning techniques. Our classic dynamic model is a compartmental model that aims to analyze and forecast the diffusion of epidemics. The model we propose has a recurrent structure with piecewise constant parameters and includes compartments for susceptible, exposed, vaccinated, infected, and recovered individuals. This model can accurately mirror the dynamics of infectious diseases, which enables us to evaluate the impact of restrictive measures on the spread of diseases. We conduct a comprehensive dynamic analysis of our model. Additionally, we suggest an optimal numerical design to determine the parameters of the system. Also, we use regression tree learning, bidirectional long short-term memory, gated recurrent unit, and a combined deep learning method for training and evaluation of an epidemic. In the final section of our paper, we apply these methods to recently published data on COVID-19 in Austria, Brazil, and China from 26 February 2021 to 4 August 2021, which is when vaccination efforts began. 
To evaluate the numerical results, we utilized various metrics such as RMSE and R-squared. Our findings suggest that the dynamic model is ideal for long-term analysis, data fitting, and identifying parameters that impact epidemics. However, it is not as effective as the supervised learning method for making long-term forecasts. On the other hand, supervised learning techniques, compared to dynamic models, are more effective for predicting the spread of diseases, but not for analyzing the behavior of epidemics.
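As a rough illustration of the kind of model this abstract describes (a recurrent, discrete-time compartmental model with susceptible, exposed, vaccinated, infected, and recovered compartments and piecewise constant parameters), the sketch below may help. It is not the authors' model: the update equations, the parameter names (`beta`, `sigma`, `gamma`, `nu`), and all numeric values are assumptions chosen only for illustration.

```python
# Illustrative sketch only: a generic discrete-time S-E-V-I-R update.
# The equations and all parameter values are assumptions, not taken from the paper.

def step(state, beta, sigma, gamma, nu):
    """Advance the compartments by one time step (e.g. one day)."""
    s, e, v, i, r = state
    n = s + e + v + i + r
    new_exposed = beta * s * i / n   # S -> E (transmission)
    new_vaccinated = nu * s          # S -> V (vaccination)
    new_infectious = sigma * e       # E -> I (end of incubation)
    new_recovered = gamma * i        # I -> R (recovery)
    return (
        s - new_exposed - new_vaccinated,
        e + new_exposed - new_infectious,
        v + new_vaccinated,
        i + new_infectious - new_recovered,
        r + new_recovered,
    )


def simulate(state, schedule):
    """Run the model over a list of (days, params) phases.

    Each phase holds its parameters fixed, which gives piecewise
    constant parameters over the whole simulation.
    """
    trajectory = [state]
    for days, params in schedule:
        for _ in range(days):
            state = step(state, **params)
            trajectory.append(state)
    return trajectory


# Hypothetical population of 10,000 with 10 initial infections.
initial = (9990.0, 0.0, 0.0, 10.0, 0.0)
schedule = [
    (30, dict(beta=0.30, sigma=0.20, gamma=0.10, nu=0.000)),  # no vaccination
    (30, dict(beta=0.15, sigma=0.20, gamma=0.10, nu=0.005)),  # restrictions + vaccination
]
trajectory = simulate(initial, schedule)
```

Changing the parameters between phases is one simple way to represent the effect of restrictive measures or the start of a vaccination campaign; fitting such parameters to reported case data is a separate estimation problem that this sketch does not attempt.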
Affiliation(s)
- Seyed Ali Rakhshan
- Department of Mathematics, Ferdowsi University of Mashhad, Mashhad, Iran
- Marzie Zaj
- Department of Mathematics, Ferdowsi University of Mashhad, Mashhad, Iran
- Mahdi Soltani Nejad
- Department of Railway Engineering, Iran University of Science and Technology, Tehran, Iran
|
11
|
Papanastasiou G, Yang G, Fotiadis DI, Dikaios N, Wang C, Huda A, Sobolevsky L, Raasch J, Perez E, Sidhu G, Palumbo D. Large-scale deep learning analysis to identify adult patients at risk for combined and common variable immunodeficiencies. COMMUNICATIONS MEDICINE 2023; 3:189. [PMID: 38123736 PMCID: PMC10733406 DOI: 10.1038/s43856-023-00412-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2023] [Accepted: 11/21/2023] [Indexed: 12/23/2023] Open
Abstract
BACKGROUND Primary immunodeficiency (PI) is a group of heterogeneous disorders resulting from immune system defects. Over 70% of PI is undiagnosed, leading to increased mortality, co-morbidity and healthcare costs. Among PI disorders, combined immunodeficiencies (CID) are characterized by complex immune defects. Common variable immunodeficiency (CVID) is among the most common types of PI. In light of available treatments, it is critical to identify adult patients at risk for CID and CVID, before the development of serious morbidity and mortality. METHODS We developed a deep learning-based method (named "TabMLPNet") to analyze clinical history from nationally representative medical claims from electronic health records (Optum® data, covering the entire US), evaluated in the setting of identifying CID/CVID in adults. Further, we revealed the most important CID/CVID-associated antecedent phenotype combinations. Four large cohorts were generated: a total of 47,660 PI cases and (1:1 matched) controls. RESULTS The sensitivity/specificity of TabMLPNet modeling ranges from 0.82-0.88/0.82-0.85 across cohorts. Distinctive combinations of antecedent phenotypes associated with CID/CVID are identified, consisting of respiratory infections/conditions, genetic anomalies, cardiac defects, autoimmune diseases, blood disorders and malignancies, which may help systematize the identification of CID and CVID. CONCLUSIONS We demonstrated an accurate method for CID and CVID detection, evaluated on large-scale medical claims data. Our predictive scheme can potentially lead to the development of new clinical insights and expanded guidelines for identifying adult patients at risk for CID and CVID, and could be used to improve patient outcomes at the population level.
Affiliation(s)
- Guang Yang
- National Heart and Lung Institute, Imperial College London, London, UK
- Cardiovascular Research Centre, Royal Brompton Hospital, London, UK
- School of Biomedical Engineering & Imaging Sciences, King's College London, London, UK
- Dimitris I Fotiadis
- Department of Biomedical Research, Institute of Molecular Biology and Biotechnology, FORTH, Ioannina, Greece
- Unit of Medical Technology and Intelligent Information Systems, University of Ioannina, Ioannina, Greece
- Chengjia Wang
- School of Mathematical and Computer Sciences, Heriot Watt, Edinburgh, UK
- Edinburgh Centre for Robotics, Edinburgh, UK
- Elena Perez
- Allergy Associates of the Palm Beaches, North Palm Beach, FL, USA
|
12
|
Honchar O, Ashcheulova T, Chumachenko T, Chumachenko D, Bobeiko A, Blazhko V, Khodosh E, Matiash N, Ambrosova T, Herasymchuk N, Kochubiei O, Smyrnova V. A prognostic model and pre-discharge predictors of post-COVID-19 syndrome after hospitalization for SARS-CoV-2 infection. Front Public Health 2023; 11:1276211. [PMID: 38094237 PMCID: PMC10716462 DOI: 10.3389/fpubh.2023.1276211] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/11/2023] [Accepted: 10/25/2023] [Indexed: 12/18/2023] Open
Abstract
Background Post-COVID-19 syndrome (PCS) has been increasingly recognized as an emerging problem: 50% of patients report ongoing symptoms 1 year after acute infection, with the most typical manifestations (fatigue, dyspnea, psychiatric and neurological symptoms) having a potentially debilitating effect. Early identification of high-risk candidates for PCS development would facilitate the optimal use of resources directed to rehabilitation of COVID-19 convalescents. Objective To study the in-hospital clinical characteristics of COVID-19 survivors presenting with self-reported PCS at 3 months and to identify the early predictors of its development. Methods 221 hospitalized COVID-19 patients underwent symptom assessment, a 6-min walk test, and echocardiography pre-discharge and at 1 month; presence of PCS was assessed 3 months after discharge. Unsupervised machine learning was used to build a SANN-based binary classification model of PCS development. Results PCS at 3 months was detected in 75% of patients. A higher symptom level in the PCS group was not associated with worse physical functional recovery or significant echocardiographic changes. Despite identification of a set of pre-discharge predictors, inclusion of parameters obtained at 1 month proved necessary to obtain a high-accuracy model of PCS development, with the input list including age, sex, in-hospital levels of CRP, eGFR and need for oxygen supplementation, and level of post-exertional symptoms at 1 month after discharge (fatigue and dyspnea in 6MWT and MRC Dyspnea score). Conclusion Hospitalized COVID-19 survivors at 3 months were characterized by a 75% prevalence of PCS, the development of which could be predicted with 89% accuracy using the derived neural network-based classification model.
Affiliation(s)
- Oleksii Honchar
- Department of Propedeutics of Internal Medicine No.1, Fundamentals of Bioethics and Biosafety, Kharkiv National Medical University, Kharkiv, Ukraine
- Tetiana Ashcheulova
- Department of Propedeutics of Internal Medicine No.1, Fundamentals of Bioethics and Biosafety, Kharkiv National Medical University, Kharkiv, Ukraine
- Tetyana Chumachenko
- Department of Epidemiology, Kharkiv National Medical University, Kharkiv, Ukraine
- Dmytro Chumachenko
- Department of Mathematical Modelling and Artificial Intelligence, National Aerospace University "Kharkiv Aviation Institute", Kharkiv, Ukraine
- Alla Bobeiko
- Department of Pulmonology, MNE “Clinical City Hospital No.13” of Kharkiv City Council, Kharkiv, Ukraine
- Viktor Blazhko
- Department of Pulmonology, MNE “Clinical City Hospital No.13” of Kharkiv City Council, Kharkiv, Ukraine
- Eduard Khodosh
- Department of Pulmonology, MNE “Clinical City Hospital No.13” of Kharkiv City Council, Kharkiv, Ukraine
- Nataliia Matiash
- Department of Pulmonology, MNE “Clinical City Hospital No.13” of Kharkiv City Council, Kharkiv, Ukraine
- Tetiana Ambrosova
- Department of Propedeutics of Internal Medicine No.1, Fundamentals of Bioethics and Biosafety, Kharkiv National Medical University, Kharkiv, Ukraine
- Nina Herasymchuk
- Department of Propedeutics of Internal Medicine No.1, Fundamentals of Bioethics and Biosafety, Kharkiv National Medical University, Kharkiv, Ukraine
- Oksana Kochubiei
- Department of Propedeutics of Internal Medicine No.1, Fundamentals of Bioethics and Biosafety, Kharkiv National Medical University, Kharkiv, Ukraine
- Viktoriia Smyrnova
- Department of Propedeutics of Internal Medicine No.1, Fundamentals of Bioethics and Biosafety, Kharkiv National Medical University, Kharkiv, Ukraine
|
13
|
Huang Y, Li J, Li M, Aparasu RR. Application of machine learning in predicting survival outcomes involving real-world data: a scoping review. BMC Med Res Methodol 2023; 23:268. [PMID: 37957593 PMCID: PMC10641971 DOI: 10.1186/s12874-023-02078-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/18/2023] [Accepted: 10/20/2023] [Indexed: 11/15/2023] Open
Abstract
BACKGROUND Despite the interest in machine learning (ML) algorithms for analyzing real-world data (RWD) in healthcare, the use of ML in predicting time-to-event data, a common scenario in clinical practice, is less explored. ML models are capable of algorithmically learning from large, complex datasets and can offer advantages in predicting time-to-event data. We reviewed the recent applications of ML for survival analysis using RWD in healthcare. METHODS PUBMED and EMBASE were searched from database inception through March 2023 to identify peer-reviewed English-language studies of ML models for predicting time-to-event outcomes using RWD. Two reviewers extracted information on the data source, patient population, survival outcome, ML algorithms, and the Area Under the Curve (AUC). RESULTS Of 257 citations, 28 publications were included. Random survival forests (N = 16, 57%) and neural networks (N = 11, 39%) were the most popular ML algorithms. AUC varied across these ML models (median 0.789, range 0.6-0.950). ML algorithms were predominantly considered for predicting overall survival in oncology (N = 12, 43%). ML survival models were often used to predict disease prognosis or clinical events (N = 27, 96%) in oncology, while fewer were used for treatment outcomes (N = 1, 4%). CONCLUSIONS The ML algorithms random survival forests and neural networks are mainly used with RWD to predict survival outcomes such as disease prognosis or clinical events in oncology. This review shows that more opportunities remain to apply these ML algorithms to inform treatment decision-making in clinical practice. More methodological work is also needed to ensure the utility and applicability of ML models in survival outcomes.
Affiliation(s)
- Yinan Huang
- Department of Pharmacy Administration, School of Pharmacy, University of Mississippi, University, MS, 38677, USA
- Jieni Li
- Department of Pharmaceutical Health Outcomes and Policy, College of Pharmacy, University of Houston, Houston, TX, 77204, USA
- Mai Li
- Department of Industrial Engineering, Cullen College of Engineering, University of Houston, Houston, TX, USA
- Rajender R Aparasu
- Department of Pharmaceutical Health Outcomes and Policy, College of Pharmacy, University of Houston, Houston, TX, 77204, USA.
|
14
|
Wang L, Zipursky AR, Geva A, McMurry AJ, Mandl KD, Miller TA. A computable case definition for patients with SARS-CoV2 testing that occurred outside the hospital. JAMIA Open 2023; 6:ooad047. [PMID: 37425487 PMCID: PMC10322650 DOI: 10.1093/jamiaopen/ooad047] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/01/2023] [Revised: 06/13/2023] [Accepted: 06/30/2023] [Indexed: 07/11/2023] Open
Abstract
Objective To identify a cohort of COVID-19 cases, including when evidence of virus positivity was only mentioned in the clinical text, not in structured laboratory data in the electronic health record (EHR). Materials and Methods Statistical classifiers were trained on feature representations derived from unstructured text in patient EHRs. We used a proxy dataset of patients with COVID-19 polymerase chain reaction (PCR) tests for training. We selected a model based on performance on our proxy dataset and applied it to instances without COVID-19 PCR tests. A physician reviewed a sample of these instances to validate the classifier. Results On the test split of the proxy dataset, our best classifier obtained 0.56 F1, 0.6 precision, and 0.52 recall scores for SARS-CoV2 positive cases. In an expert validation, the classifier correctly identified 97.6% (81/84) as COVID-19 positive and 97.8% (91/93) as not SARS-CoV2 positive. The classifier labeled an additional 960 cases as not having SARS-CoV2 lab tests in hospital, and only 177 of those cases had the ICD-10 code for COVID-19. Discussion Proxy dataset performance may be worse because these instances sometimes include discussion of pending lab tests. The most predictive features are meaningful and interpretable. The type of external test that was performed is rarely mentioned. Conclusion COVID-19 cases that had testing done outside of the hospital can be reliably detected from the text in EHRs. Training on a proxy dataset was a suitable method for developing a highly performant classifier without labor-intensive labeling efforts.
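For readers less familiar with the metric, the F1 score reported in this abstract is the harmonic mean of precision and recall. The short sketch below recomputes it from the reported precision (0.6) and recall (0.52); the function name `f1_score` is a hypothetical helper written for this illustration, not code from the study.

```python
# Illustrative sketch: F1 as the harmonic mean of precision and recall.
# `f1_score` is a hypothetical helper, not part of the cited study.

def f1_score(precision, recall):
    """Return the harmonic mean of precision and recall (0.0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported for SARS-CoV2 positive cases on the proxy test split.
f1 = f1_score(0.60, 0.52)
print(round(f1, 2))  # prints 0.56
```

The result agrees with the 0.56 F1 stated for the proxy dataset.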
Affiliation(s)
- Lijing Wang
- Department of Data Science, New Jersey Institute of Technology, Newark, New Jersey, USA
- Amy R Zipursky
- Computational Health Informatics Program and Department of Emergency Medicine, Boston Children's Hospital, Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, USA
- Alon Geva
- Computational Health Informatics Program and Division of Critical Care Medicine, Department of Anesthesiology, Critical Care, and Pain Medicine, Boston Children’s Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Andrew J McMurry
- Computational Health Informatics Program, Boston Children’s Hospital, Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, USA
- Kenneth D Mandl
- Computational Health Informatics Program, Boston Children’s Hospital, Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, USA
- Timothy A Miller
- Computational Health Informatics Program, Boston Children’s Hospital, Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, USA
|
15
|
Lentzen M, Linden T, Veeranki S, Madan S, Kramer D, Leodolter W, Frohlich H. A Transformer-Based Model Trained on Large Scale Claims Data for Prediction of Severe COVID-19 Disease Progression. IEEE J Biomed Health Inform 2023; 27:4548-4558. [PMID: 37347632 DOI: 10.1109/jbhi.2023.3288768] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/24/2023]
Abstract
In situations like the COVID-19 pandemic, healthcare systems are under enormous pressure as they can rapidly collapse under the burden of the crisis. Machine learning (ML) based risk models could lift the burden by identifying patients with a high risk of severe disease progression. Electronic Health Records (EHRs) provide crucial sources of information to develop these models because they rely on routinely collected healthcare data. However, EHR data is challenging for training ML models because it contains irregularly timestamped diagnosis, prescription, and procedure codes. For such data, transformer-based models are promising. We extended the previously published Med-BERT model by including age, sex, medications, quantitative clinical measures, and state information. After pre-training on approximately 988 million EHRs from 3.5 million patients, we developed models to predict Acute Respiratory Manifestations (ARM) risk using the medical history of 80,211 COVID-19 patients. Compared to Random Forests, XGBoost, and RETAIN, our transformer-based models more accurately forecast the risk of developing ARM after COVID-19 infection. We used Integrated Gradients and Bayesian networks to understand the link between the essential features of our model. Finally, we evaluated adapting our model to Austrian in-patient data. Our study highlights the promise of predictive transformer-based models for precision medicine.
|
16
|
Zhang T, Tan T, Wang X, Gao Y, Han L, Balkenende L, D'Angelo A, Bao L, Horlings HM, Teuwen J, Beets-Tan RGH, Mann RM. RadioLOGIC, a healthcare model for processing electronic health records and decision-making in breast disease. Cell Rep Med 2023; 4:101131. [PMID: 37490915 PMCID: PMC10439251 DOI: 10.1016/j.xcrm.2023.101131] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2023] [Revised: 05/26/2023] [Accepted: 06/30/2023] [Indexed: 07/27/2023]
Abstract
Digital health data used in diagnostics, patient care, and oncology research continue to accumulate exponentially. Most medical information, and particularly radiology results, are stored in free-text format, and the potential of these data remains untapped. In this study, a radiological repomics-driven model incorporating medical token cognition (RadioLOGIC) is proposed to extract repomics (report omics) features from unstructured electronic health records and to assess human health and predict pathological outcome via transfer learning. The average accuracy and F1-weighted score for the extraction of repomics features using RadioLOGIC are 0.934 and 0.934, respectively, and 0.906 and 0.903 for the prediction of breast imaging-reporting and data system scores. The areas under the receiver operating characteristic curve for the prediction of pathological outcome without and with transfer learning are 0.912 and 0.945, respectively. RadioLOGIC outperforms cohort models in the capability to extract features and also reveals promise for checking clinical diagnoses directly from electronic health records.
Affiliation(s)
- Tianyu Zhang
- Department of Radiology, Netherlands Cancer Institute (NKI), Plesmanlaan 121, 1066 CX Amsterdam, the Netherlands; GROW School for Oncology and Development Biology, Maastricht University, P.O. Box 616, 6200 MD Maastricht, the Netherlands; Department of Diagnostic Imaging, Radboud University Medical Center, Geert Grooteplein 10, 6525 GA Nijmegen, the Netherlands
- Tao Tan
- Department of Radiology, Netherlands Cancer Institute (NKI), Plesmanlaan 121, 1066 CX Amsterdam, the Netherlands; Department of Diagnostic Imaging, Radboud University Medical Center, Geert Grooteplein 10, 6525 GA Nijmegen, the Netherlands; Faculty of Applied Sciences, Macao Polytechnic University, Macao 999078, China.
- Xin Wang
- Department of Radiology, Netherlands Cancer Institute (NKI), Plesmanlaan 121, 1066 CX Amsterdam, the Netherlands; GROW School for Oncology and Development Biology, Maastricht University, P.O. Box 616, 6200 MD Maastricht, the Netherlands; Department of Diagnostic Imaging, Radboud University Medical Center, Geert Grooteplein 10, 6525 GA Nijmegen, the Netherlands
- Yuan Gao
- Department of Radiology, Netherlands Cancer Institute (NKI), Plesmanlaan 121, 1066 CX Amsterdam, the Netherlands; GROW School for Oncology and Development Biology, Maastricht University, P.O. Box 616, 6200 MD Maastricht, the Netherlands; Department of Diagnostic Imaging, Radboud University Medical Center, Geert Grooteplein 10, 6525 GA Nijmegen, the Netherlands
- Luyi Han
- Department of Radiology, Netherlands Cancer Institute (NKI), Plesmanlaan 121, 1066 CX Amsterdam, the Netherlands; Department of Diagnostic Imaging, Radboud University Medical Center, Geert Grooteplein 10, 6525 GA Nijmegen, the Netherlands
- Luuk Balkenende
- Department of Radiology, Netherlands Cancer Institute (NKI), Plesmanlaan 121, 1066 CX Amsterdam, the Netherlands; Department of Diagnostic Imaging, Radboud University Medical Center, Geert Grooteplein 10, 6525 GA Nijmegen, the Netherlands
- Anna D'Angelo
- Dipartimento di diagnostica per immagini, Radioterapia, Oncologia ed ematologia, Fondazione Universitaria A. Gemelli, IRCCS Roma, Roma, Italy
- Lingyun Bao
- Affiliated Hangzhou First People's Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Hugo M Horlings
- Division of Pathology, Netherlands Cancer Institute (NKI), Plesmanlaan 121, 1066 CX Amsterdam, the Netherlands
- Jonas Teuwen
- Department of Radiation Oncology, Netherlands Cancer Institute (NKI), Plesmanlaan 121, 1066 CX Amsterdam, the Netherlands
- Regina G H Beets-Tan
- Department of Radiology, Netherlands Cancer Institute (NKI), Plesmanlaan 121, 1066 CX Amsterdam, the Netherlands; GROW School for Oncology and Development Biology, Maastricht University, P.O. Box 616, 6200 MD Maastricht, the Netherlands
- Ritse M Mann
- Department of Radiology, Netherlands Cancer Institute (NKI), Plesmanlaan 121, 1066 CX Amsterdam, the Netherlands; Department of Diagnostic Imaging, Radboud University Medical Center, Geert Grooteplein 10, 6525 GA Nijmegen, the Netherlands
|
17
|
Liang C, Lyu T, Weissman S, Daering N, Olatosi B, Hikmet N, Li X. Early Prediction of COVID-19 Associated Hospitalization at the Time of CDC Contact Tracing using Machine Learning: Towards Pandemic Preparedness. RESEARCH SQUARE 2023:rs.3.rs-3213502. [PMID: 37609292 PMCID: PMC10441515 DOI: 10.21203/rs.3.rs-3213502/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/24/2023]
Abstract
Objective To develop and validate machine learning models for predicting COVID-19 related hospitalization as early as CDC contact tracing, using integrated CDC contact tracing and South Carolina medical claims data. Methods Using the dataset (n=82,073, 1/1/2018 - 3/1/2020), we identified 3,305 patients with COVID-19 who were captured by contact tracing. We developed and validated machine learning models (i.e., support vector machine, random forest, XGBoost), followed by multi-level validations and pilot statewide implementation. Results Using 10-fold cross-validation, random forest outperformed other models (F1=0.872 for general hospitalization and 0.763 for COVID-19 related hospitalization), followed by XGBoost (F1=0.845 and 0.682) and support vector machine (F1=0.845 and 0.644). We identified new self-reported symptoms from contact tracing (e.g., fatigue, congestion, headache, loss of taste) that are highly predictive of hospitalization. Conclusions Our study demonstrated the feasibility of identifying individuals at risk of hospitalization at the time of contact tracing for early intervention and prevention. Policy implications Our findings demonstrate the promise of leveraging CDC contact tracing to establish cost-effective statewide surveillance, with generalizability for nationwide adoption to enhance pandemic preparedness in the US.
|
18
|
Liu M, Li S, Yuan H, Ong MEH, Ning Y, Xie F, Saffari SE, Shang Y, Volovici V, Chakraborty B, Liu N. Handling missing values in healthcare data: A systematic review of deep learning-based imputation techniques. Artif Intell Med 2023; 142:102587. [PMID: 37316097 DOI: 10.1016/j.artmed.2023.102587] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2022] [Revised: 04/08/2023] [Accepted: 05/16/2023] [Indexed: 06/16/2023]
Abstract
OBJECTIVE The proper handling of missing values is critical to delivering reliable estimates and decisions, especially in high-stakes fields such as clinical research. In response to the increasing diversity and complexity of data, many researchers have developed deep learning (DL)-based imputation techniques. We conducted a systematic review to evaluate the use of these techniques, with a particular focus on the types of data, intending to assist healthcare researchers from various disciplines in dealing with missing data. MATERIALS AND METHODS We searched five databases (MEDLINE, Web of Science, Embase, CINAHL, and Scopus) for articles published prior to February 8, 2023 that described the use of DL-based models for imputation. We examined selected articles from four perspectives: data types, model backbones (i.e., main architectures), imputation strategies, and comparisons with non-DL-based methods. Based on data types, we created an evidence map to illustrate the adoption of DL models. RESULTS Out of 1822 articles, a total of 111 were included, of which tabular static data (29%, 32/111) and temporal data (40%, 44/111) were the most frequently investigated. Our findings revealed a discernible pattern in the choice of model backbones and data types, for example, the dominance of autoencoder and recurrent neural networks for tabular temporal data. The discrepancy in imputation strategy usage among data types was also observed. The "integrated" imputation strategy, which solves the imputation task simultaneously with downstream tasks, was most popular for tabular temporal data (52%, 23/44) and multi-modal data (56%, 5/9). Moreover, DL-based imputation methods yielded a higher level of imputation accuracy than non-DL methods in most studies. CONCLUSION The DL-based imputation models are a family of techniques, with diverse network structures. Their designation in healthcare is usually tailored to data types with different characteristics. 
Although DL-based imputation models may not be superior to conventional approaches across all datasets, it is highly possible for them to achieve satisfactory results for a particular data type or dataset. There are, however, still issues with regard to portability, interpretability, and fairness associated with current DL-based imputation models.
Affiliation(s)
- Mingxuan Liu
  - Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore
- Siqi Li
  - Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore
- Han Yuan
  - Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore
- Marcus Eng Hock Ong
  - Programme in Health Services and Systems Research, Duke-NUS Medical School, Singapore; Department of Emergency Medicine, Singapore General Hospital, Singapore
- Yilin Ning
  - Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore
- Feng Xie
  - Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore; Programme in Health Services and Systems Research, Duke-NUS Medical School, Singapore
- Seyed Ehsan Saffari
  - Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore; Programme in Health Services and Systems Research, Duke-NUS Medical School, Singapore
- Yuqing Shang
  - Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore
- Victor Volovici
  - Department of Neurosurgery, Erasmus MC University Medical Center, Rotterdam, the Netherlands
- Bibhas Chakraborty
  - Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore; Programme in Health Services and Systems Research, Duke-NUS Medical School, Singapore; Department of Statistics and Data Science, National University of Singapore, Singapore; Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA
- Nan Liu
  - Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore; Programme in Health Services and Systems Research, Duke-NUS Medical School, Singapore; SingHealth AI Office, Singapore Health Services, Singapore; Institute of Data Science, National University of Singapore, Singapore

19
Wang M, Sushil M, Miao BY, Butte AJ. Bottom-up and top-down paradigms of artificial intelligence research approaches to healthcare data science using growing real-world big data. J Am Med Inform Assoc 2023; 30:1323-1332. [PMID: 37187158 PMCID: PMC10280344 DOI: 10.1093/jamia/ocad085] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 03/22/2023] [Revised: 04/03/2023] [Accepted: 05/04/2023] [Indexed: 05/17/2023]
Abstract
OBJECTIVES As the real-world electronic health record (EHR) data continue to grow exponentially, novel methodologies involving artificial intelligence (AI) are becoming increasingly applied to enable efficient data-driven learning and, ultimately, to advance healthcare. Our objective is to provide readers with an understanding of evolving computational methods and help in deciding on methods to pursue. TARGET AUDIENCE The sheer diversity of existing methods presents a challenge for health scientists who are beginning to apply computational methods to their research. Therefore, this tutorial is aimed at scientists working with EHR data who are early entrants into the field of applying AI methodologies. SCOPE This manuscript describes the diverse and growing AI research approaches in healthcare data science and categorizes them into 2 distinct paradigms, bottom-up and top-down, to provide health scientists venturing into artificial intelligence research with an understanding of the evolving computational methods and to help them decide which methods to pursue through the lens of real-world healthcare data.
Affiliation(s)
- Michelle Wang
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, California, USA
- Madhumita Sushil
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, California, USA
- Brenda Y Miao
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, California, USA
- Atul J Butte
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, California, USA
  - Department of Pediatrics, University of California, San Francisco, San Francisco, California, USA

20
Rakhshan SA, Nejad MS, Zaj M, Ghane FH. Global analysis and prediction scenario of infectious outbreaks by recurrent dynamic model and machine learning models: A case study on COVID-19. Comput Biol Med 2023; 158:106817. [PMID: 36989749 PMCID: PMC10035804 DOI: 10.1016/j.compbiomed.2023.106817] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 11/23/2022] [Revised: 03/10/2023] [Accepted: 03/20/2023] [Indexed: 03/25/2023]
Abstract
It is essential to evaluate patient outcomes at an early stage when dealing with a pandemic to provide optimal clinical care and resource management. Many methods have been proposed to provide a roadmap against different pandemics, including the recent pandemic disease COVID-19. Due to recurrent epidemic waves of COVID-19, which have been observed in many countries, mathematical modeling and forecasting of COVID-19 are still necessary as long as the world continues to battle against the pandemic. Modeling may aid in determining which interventions to try or predict future growth patterns. In this article, we design a combined approach for analyzing any pandemic in two separate parts. In the first part of the paper, we develop a recurrent SEIRS compartmental model to predict recurrent outbreak patterns of diseases. Due to its time-varying parameters, our model is able to reflect the dynamics of infectious diseases, and to measure the effectiveness of the restrictive measures. We discuss the stable solutions of the corresponding autonomous system with frozen parameters. We focus on the regime shifts and tipping points; then we investigate tipping phenomena due to parameter drifts in our time-varying parameters model that exhibits a bifurcation in the frozen-in case. Furthermore, we propose an optimal numerical design for estimating the system’s parameters. In the second part, we introduce machine learning models to strengthen the methodology of our paper in data analysis, particularly for prediction scenarios. We use MLP, RBF, LSTM, ANFIS, and GRNN for training and evaluation of COVID-19. Then, we compare the results with the recurrent dynamical system in the fitting process and prediction scenario. We also confirm results by implementing our methods on the released data on COVID-19 by WHO for Italy, Germany, Iran, and South Africa between 1/22/2020 and 7/24/2021, when people were engaged with different variants including Alpha, Beta, Gamma, and Delta. 
The results of this article show that the dynamic model is adequate for long-term analysis and data fitting, as well as for obtaining the parameters affecting the epidemic. However, it is ineffective in providing a long-term forecast. In contrast, machine learning methods effectively provide disease prediction, although they do not provide the kind of analysis that dynamic models do. Finally, some metrics, including RMSE, R-squared, and accuracy, are used to evaluate the machine learning models. These metrics confirm that ANFIS and RBF perform better than the other methods in the training and testing zones.
Affiliation(s)
- Mahdi Soltani Nejad
  - Department of Railway Engineering, Iran University of Science and Technology, Iran
- Marzie Zaj
  - Department of Mathematics, Ferdowsi University of Mashhad, Iran

21
Clancy J, Hoffmann CS, Pickett BE. Transcriptomics secondary analysis of severe human infection with SARS-CoV-2 identifies gene expression changes and predicts three transcriptional biomarkers in leukocytes. Comput Struct Biotechnol J 2023; 21:1403-1413. [PMID: 36785619 PMCID: PMC9908618 DOI: 10.1016/j.csbj.2023.02.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2022] [Revised: 02/02/2023] [Accepted: 02/02/2023] [Indexed: 02/11/2023]
Abstract
SARS-CoV-2 is the causative agent of COVID-19, which has greatly affected human health since it first emerged. Defining the human factors and biomarkers that differentiate severe SARS-CoV-2 infection from mild infection has become of increasing interest to clinicians. To help address this need, we retrieved 269 public RNA-seq human transcriptome samples from GEO that had qualitative disease severity metadata. We then subjected these samples to a robust RNA-seq data processing workflow to calculate gene expression in PBMCs, whole blood, and leukocytes, as well as to predict transcriptional biomarkers in PBMCs and leukocytes. This process involved using Salmon for read mapping, edgeR to calculate significant differential expression levels, and gene ontology enrichment using Camera. We then performed a random forest machine learning analysis on the read counts data to identify genes that best classified samples based on the COVID-19 severity phenotype. This approach produced a ranked list of leukocyte genes based on their Gini values that includes TGFBI, TTYH2, and CD4, which are associated with both the immune response and inflammation. Our results show that these three genes can potentially classify samples with severe COVID-19 with an accuracy of ∼88% and an area under the receiver operating characteristic curve of 92.6, indicating acceptable specificity and sensitivity. We expect that our findings can help contribute to the development of improved diagnostics that may aid in identifying severe COVID-19 cases, guide clinical treatment, and improve mortality rates.

22
Cardiovascular and Renal Comorbidities Included into Neural Networks Predict the Outcome in COVID-19 Patients Admitted to an Intensive Care Unit: Three-Center, Cross-Validation, Age- and Sex-Matched Study. J Cardiovasc Dev Dis 2023; 10:jcdd10020039. [PMID: 36826535 PMCID: PMC9967447 DOI: 10.3390/jcdd10020039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2022] [Revised: 01/16/2023] [Accepted: 01/19/2023] [Indexed: 01/25/2023]
Abstract
Here, we performed a multicenter, age- and sex-matched study to compare the efficiency of various machine learning algorithms in the prediction of COVID-19 fatal outcomes and to develop sensitive, specific, and robust artificial intelligence tools for the prompt triage of patients with severe COVID-19 in the intensive care unit setting. In a challenge against other established machine learning algorithms (decision trees, random forests, extra trees, neural networks, k-nearest neighbors, and gradient boosting: XGBoost, LightGBM, and CatBoost) and multivariate logistic regression as a reference, neural networks demonstrated the highest sensitivity, sufficient specificity, and excellent robustness. Further, neural networks based on coronary artery disease/chronic heart failure, stage 3-5 chronic kidney disease, blood urea nitrogen, and C-reactive protein as the predictors exceeded 90% sensitivity and 80% specificity, reaching an AUROC of 0.866 at primary cross-validation and 0.849 at secondary cross-validation on virtual samples generated by the bootstrapping procedure. These results underscore the impact of cardiovascular and renal comorbidities in the context of thrombotic complications characteristic of severe COVID-19. As the aforementioned predictors can be obtained from case histories or are inexpensive to measure at admission to the intensive care unit, we suggest this predictor composition is useful for the triage of critically ill COVID-19 patients.

23
Liang Z, Zhang Z, Chen H, Zhang Z. Disease prediction based on multi-type data fusion from Chinese electronic health record. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2022; 19:13732-13746. [PMID: 36654065 DOI: 10.3934/mbe.2022640] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/17/2023]
Abstract
Disease prediction that uses a variety of healthcare data to assist doctors in diagnosis has recently become an increasingly important research topic. This paper proposes a disease prediction model that fuses multiple types of encoded representations of Chinese electronic health records (EHRs). The model framework utilizes a multi-head self-attention mechanism, which combines textual and numerical features to enhance text representations. The BiLSTM-CRF and TextCNN models are used, respectively, to extract entities and then obtain their embedding representations. The representations of the text and of the entities it contains are then combined to form representations of the EHRs. Experimental results on EHR data collected from a Grade Three Class B general hospital in Gansu Province, China, show that our model achieved an F1 score of 91.92%, which outperforms the previous baseline methods.
Affiliation(s)
- Zhaoyu Liang
  - College of Computer Science and Engineering, Northwest Normal University, 967 Anning East Road, Lanzhou 730070, China
- Zhichang Zhang
  - College of Computer Science and Engineering, Northwest Normal University, 967 Anning East Road, Lanzhou 730070, China
- Haoyuan Chen
  - College of Computer Science and Engineering, Northwest Normal University, 967 Anning East Road, Lanzhou 730070, China
- Ziqin Zhang
  - College of Computer Science and Engineering, Northwest Normal University, 967 Anning East Road, Lanzhou 730070, China

24
Wiegand M, Cowan SL, Waddington CS, Halsall DJ, Keevil VL, Tom BDM, Taylor V, Gkrania-Klotsas E, Preller J, Goudie RJB. Development and validation of a dynamic 48-hour in-hospital mortality risk stratification for COVID-19 in a UK teaching hospital: a retrospective cohort study. BMJ Open 2022; 12:e060026. [PMID: 36691139 PMCID: PMC9445230 DOI: 10.1136/bmjopen-2021-060026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2021] [Accepted: 07/13/2022] [Indexed: 02/02/2023]
Abstract
OBJECTIVES To develop a disease stratification model for COVID-19 that updates according to changes in a patient's condition while in hospital to facilitate patient management and resource allocation. DESIGN In this retrospective cohort study, we adopted a landmarking approach to dynamic prediction of all-cause in-hospital mortality over the next 48 hours. We accounted for informative predictor missingness and selected predictors using penalised regression. SETTING All data used in this study were obtained from a single UK teaching hospital. PARTICIPANTS We developed the model using 473 consecutive patients with COVID-19 presenting to a UK hospital between 1 March 2020 and 12 September 2020, and temporally validated it using data on 1119 patients presenting between 13 September 2020 and 17 March 2021. PRIMARY AND SECONDARY OUTCOME MEASURES The primary outcome is all-cause in-hospital mortality within 48 hours of the prediction time. We accounted for the competing risks of discharge from hospital alive and transfer to a tertiary intensive care unit for extracorporeal membrane oxygenation. RESULTS Our final model includes age, Clinical Frailty Scale score, heart rate, respiratory rate, oxygen saturation/fractional inspired oxygen ratio, white cell count, presence of acidosis (pH <7.35) and interleukin-6. Internal validation achieved an area under the receiver operating characteristic curve (AUROC) of 0.90 (95% CI 0.87 to 0.93) and temporal validation gave an AUROC of 0.86 (95% CI 0.83 to 0.88). CONCLUSIONS Our model incorporates both static risk factors (eg, age) and evolving clinical and laboratory data, to provide a dynamic risk prediction model that adapts to both sudden and gradual changes in an individual patient's clinical condition. On successful external validation, the model has the potential to be a powerful clinical risk assessment tool. TRIAL REGISTRATION The study is registered as 'researchregistry5464' on the Research Registry (www.researchregistry.com).
Affiliation(s)
- Martin Wiegand
  - Faculty of Infectious Diseases, London School of Hygiene & Tropical Medicine, London, UK
  - MRC Biostatistics Unit, University of Cambridge, Cambridge, UK
- Sarah L Cowan
  - Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK
- David J Halsall
  - Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK
- Victoria L Keevil
  - Department of Medicine, University of Cambridge, Cambridge, UK
  - Department of Medicine for the Elderly, Addenbrooke's Hospital, Cambridge, UK
- Brian D M Tom
  - MRC Biostatistics Unit, University of Cambridge, Cambridge, UK
- Vince Taylor
  - Cancer Research UK, Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK
- Jacobus Preller
  - Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK

25
Nigo M, Tran HTN, Xie Z, Feng H, Mao B, Rasmy L, Miao H, Zhi D. PK-RNN-V E: A deep learning model approach to vancomycin therapeutic drug monitoring using electronic health record data. J Biomed Inform 2022; 133:104166. [PMID: 35985620 DOI: 10.1016/j.jbi.2022.104166] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 03/30/2022] [Revised: 05/18/2022] [Accepted: 08/12/2022] [Indexed: 11/18/2022]
Abstract
Vancomycin is a commonly used antimicrobial in hospitals, and therapeutic drug monitoring (TDM) is required to optimize its efficacy and avoid toxicities. Bayesian models are currently recommended to predict the antibiotic levels. These models, however, although based on carefully designed lab observations, were often developed in limited patient populations. The increasing availability of electronic health record (EHR) data offers an opportunity to develop TDM models for real-world patient populations. Here, we present a deep learning-based pharmacokinetic prediction model for vancomycin (PK-RNN-V E) using a large EHR dataset of 5,483 patients with 55,336 vancomycin administrations. PK-RNN-V E takes the patient's real-time sparse and irregular observations and offers dynamic predictions. Our results show that PK-RNN-V E offers a root mean squared error (RMSE) of 5.39 and outperforms the traditional Bayesian model (VTDM model), which has an RMSE of 6.29. We believe that PK-RNN-V E can provide a pharmacokinetic model for vancomycin and other antimicrobials that require TDM.
Affiliation(s)
- Masayuki Nigo
  - Division of Infectious Diseases, Department of Internal Medicine, The University of Texas Health Science Center at Houston, McGovern Medical School, Houston, TX, United States; School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, United States
- Ziqian Xie
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, United States
- Han Feng
  - School of Public Health, University of Texas Health Science Center at Houston, Houston, TX, United States
- Bingyu Mao
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, United States
- Laila Rasmy
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, United States
- Hongyu Miao
  - School of Public Health, University of Texas Health Science Center at Houston, Houston, TX, United States
- Degui Zhi
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, United States