1
Varošanec AM, Marković L, Sonicki Z. A Novel Time-Aware Deep Learning Model Predicting Myopia in Children and Adolescents. Ophthalmology Science 2024; 4:100563. [PMID: 39165695 PMCID: PMC11334700 DOI: 10.1016/j.xops.2024.100563] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/25/2024] [Revised: 05/26/2024] [Accepted: 06/05/2024] [Indexed: 08/22/2024]
Abstract
Objective: To quantitatively predict children's and adolescents' spherical equivalent (SE) by leveraging their variable-length historical vision records. Design: Retrospective analysis. Participants: Eight hundred ninety-five myopic children and adolescents aged 4 to 18 years, with a complete ophthalmic examination and retinoscopy in cycloplegia prior to spectacle correction, were enrolled in the period from January 1, 2008 to July 1, 2023 at the University Hospital "Sveti Duh," Zagreb, Croatia. Methods: A novel modification of time-aware long short-term memory (LSTM) was used to quantitatively predict children's and adolescents' SE within 7 years after diagnosis. Main Outcome Measures: The utilization of extended gate time-aware LSTM involved capturing temporal features within irregularly sampled time series data. This approach aligned more closely with the characteristics of fact-based data, increasing its applicability and contributing to the early identification of myopia progression. Results: The testing set exhibited a mean absolute prediction error (MAE) of 0.10 ± 0.15 diopter (D) for SE. Lower MAE values were associated with longer sequence lengths, shorter prediction durations, older age groups, and low myopia, while higher MAE values were observed with shorter sequence lengths, longer prediction durations, younger age groups, and in premyopic or high myopic individuals, ranging from as low as 0.03 ± 0.04 D to as high as 0.45 ± 0.24 D. Conclusions: Extended gate time-aware LSTM capturing temporal features in irregularly sampled time series data can be used to quantitatively predict children's and adolescents' SE within 7 years with an overall error of 0.10 ± 0.15 D. This value is substantially lower than the threshold for prediction to be considered clinically acceptable, such as a criterion of 0.75 D. Financial Disclosures: The author(s) have no proprietary or commercial interest in any materials discussed in this article.
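The headline metric in this abstract, a mean absolute error reported as mean ± standard deviation and compared against a 0.75 D acceptability criterion, can be illustrated with a minimal sketch. The SE values below are hypothetical and are not data from the study, and the use of the population standard deviation is an assumption, since the abstract does not say which form was reported.

```python
from statistics import mean, pstdev


def mae_with_sd(actual, predicted):
    """Return (mean, standard deviation) of the absolute errors, in diopters."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    abs_errors = [abs(a - p) for a, p in zip(actual, predicted)]
    # Assumption: population SD; the abstract does not state which SD was used.
    return mean(abs_errors), pstdev(abs_errors)


# Hypothetical spherical-equivalent values (D), for illustration only.
actual_se = [-1.25, -2.00, -3.50, -0.75]
predicted_se = [-1.00, -2.25, -3.25, -1.00]

mae, sd = mae_with_sd(actual_se, predicted_se)
CLINICAL_THRESHOLD_D = 0.75  # acceptability criterion quoted in the abstract
within_threshold = mae < CLINICAL_THRESHOLD_D
print(f"MAE = {mae:.2f} ± {sd:.2f} D; within threshold: {within_threshold}")
```

Reporting the spread alongside the mean matters here because the abstract's subgroup errors range widely (0.03 to 0.45 D), so a single average can hide poorly predicted groups.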
Affiliation(s)
- Ana Maria Varošanec
  - University Eye Department, University Hospital “Sveti Duh”, Reference Center of The Ministry of Health of The Republic of Croatia for Pediatric Ophthalmology and Strabismus, Reference Center of The Ministry of Health of The Republic of Croatia for Inherited Retinal Dystrophies, Zagreb, Croatia
  - Faculty of Dental Medicine and Health Osijek, University Josip Juraj Strossmayer in Osijek, Croatia
- Leon Marković
  - University Eye Department, University Hospital “Sveti Duh”, Reference Center of The Ministry of Health of The Republic of Croatia for Pediatric Ophthalmology and Strabismus, Reference Center of The Ministry of Health of The Republic of Croatia for Inherited Retinal Dystrophies, Zagreb, Croatia
  - Faculty of Dental Medicine and Health Osijek, University Josip Juraj Strossmayer in Osijek, Croatia
- Zdenko Sonicki
  - Department of Medical Statistics, Epidemiology and Medical Informatics, Andrija Štampar School of Public Health, School of Medicine, University of Zagreb, Zagreb, Croatia
2
Nigo M, Rasmy L, Mao B, Kannadath BS, Xie Z, Zhi D. Deep learning model for personalized prediction of positive MRSA culture using time-series electronic health records. Nat Commun 2024; 15:2036. [PMID: 38448409 PMCID: PMC10917736 DOI: 10.1038/s41467-024-46211-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2023] [Accepted: 02/19/2024] [Indexed: 03/08/2024]
Abstract
Methicillin-resistant Staphylococcus aureus (MRSA) poses significant morbidity and mortality in hospitals. Rapid, accurate risk stratification of MRSA is crucial for optimizing antibiotic therapy. Our study introduced a deep learning model, PyTorch_EHR, which leverages electronic health record (EHR) time-series data, including wide-variety patient specific data, to predict MRSA culture positivity within two weeks. 8,164 MRSA and 22,393 non-MRSA patient events from Memorial Hermann Hospital System, Houston, Texas are used for model development. PyTorch_EHR outperforms logistic regression (LR) and light gradient boost machine (LGBM) models in accuracy (AUROC: PyTorch_EHR = 0.911, LR = 0.857, LGBM = 0.892). External validation with 393,713 patient events from the Medical Information Mart for Intensive Care (MIMIC)-IV dataset in Boston confirms its superior accuracy (AUROC: PyTorch_EHR = 0.859, LR = 0.816, LGBM = 0.838). Our model effectively stratifies patients into high-, medium-, and low-risk categories, potentially optimizing antimicrobial therapy and reducing unnecessary MRSA-specific antimicrobials. This highlights the advantage of deep learning models in predicting MRSA positive cultures, surpassing traditional machine learning models and supporting clinicians' judgments.
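The models in this abstract are compared by AUROC. As a minimal sketch of what that number measures, the function below computes it directly from its pairwise definition: the probability that a randomly chosen positive event receives a higher score than a randomly chosen negative one, with ties counted as one half. The labels and scores are hypothetical, and this is not the evaluation code used in the study.

```python
def auroc(labels, scores):
    """Pairwise AUROC: P(score of a positive > score of a negative), ties = 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Count every positive/negative pair; quadratic, fine for small examples.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = MRSA-positive culture) and model risk scores.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.90, 0.20, 0.60, 0.70, 0.10, 0.80]
result = auroc(labels, scores)
print(f"AUROC = {result:.3f}")
```

In this toy example 8 of the 9 positive/negative pairs are ranked correctly. For datasets the size of those in the abstract, a rank-based implementation from an established library would be the practical choice.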
Affiliation(s)
- Masayuki Nigo
  - McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX, USA.
  - McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA.
  - Division of Infectious Diseases, Department of Medicine, Houston Methodist Hospital, Texas Medical Center, Houston, TX, USA.
- Laila Rasmy
  - McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
- Bingyu Mao
  - McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
- Bijun Sai Kannadath
  - Department of Internal Medicine, University of Arizona College of Medicine, Phoenix, AZ, USA
- Ziqian Xie
  - McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
- Degui Zhi
  - McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
3
Budiarto A, Tsang KCH, Wilson AM, Sheikh A, Shah SA. Machine Learning-Based Asthma Attack Prediction Models From Routinely Collected Electronic Health Records: Systematic Scoping Review. JMIR AI 2023; 2:e46717. [PMID: 38875586 PMCID: PMC11041490 DOI: 10.2196/46717] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2023] [Revised: 09/28/2023] [Accepted: 10/09/2023] [Indexed: 06/16/2024]
Abstract
BACKGROUND An early warning tool to predict attacks could enhance asthma management and reduce the likelihood of serious consequences. Electronic health records (EHRs) providing access to historical data about patients with asthma coupled with machine learning (ML) provide an opportunity to develop such a tool. Several studies have developed ML-based tools to predict asthma attacks. OBJECTIVE This study aims to critically evaluate ML-based models derived using EHRs for the prediction of asthma attacks. METHODS We systematically searched PubMed and Scopus (the search period was between January 1, 2012, and January 31, 2023) for papers meeting the following inclusion criteria: (1) used EHR data as the main data source, (2) used asthma attack as the outcome, and (3) compared ML-based prediction models' performance. We excluded non-English papers and nonresearch papers, such as commentary and systematic review papers. In addition, we also excluded papers that did not provide any details about the respective ML approach and its result, including protocol papers. The selected studies were then summarized across multiple dimensions including data preprocessing methods, ML algorithms, model validation, model explainability, and model implementation. RESULTS Overall, 17 papers were included at the end of the selection process. There was considerable heterogeneity in how asthma attacks were defined. Of the 17 studies, 8 (47%) studies used routinely collected data both from primary care and secondary care practices together. Extreme imbalanced data was a notable issue in most studies (13/17, 76%), but only 38% (5/13) of them explicitly dealt with it in their data preprocessing pipeline. The gradient boosting-based method was the best ML method in 59% (10/17) of the studies. Of the 17 studies, 14 (82%) studies used a model explanation method to identify the most important predictors. 
None of the studies followed the standard reporting guidelines, and none were prospectively validated. CONCLUSIONS Our review indicates that this research field is still underdeveloped, given the limited body of evidence, heterogeneity of methods, lack of external validation, and suboptimally reported models. We highlighted several technical challenges (class imbalance, external validation, model explanation, and adherence to reporting guidelines to aid reproducibility) that need to be addressed to make progress toward clinical adoption.
Affiliation(s)
- Arif Budiarto
  - Asthma UK Center for Applied Research, Usher Institute, University of Edinburgh, Edinburgh, United Kingdom
  - Bioinformatics and Data Science Research Center, Bina Nusantara University, Jakarta, Indonesia
- Kevin C H Tsang
  - Asthma UK Center for Applied Research, Usher Institute, University of Edinburgh, Edinburgh, United Kingdom
- Andrew M Wilson
  - Norwich Medical School, University of East Anglia, Norwich, United Kingdom
  - Norfolk and Norwich University Hospital NHS Foundation Trust, Norwich, United Kingdom
- Aziz Sheikh
  - Asthma UK Center for Applied Research, Usher Institute, University of Edinburgh, Edinburgh, United Kingdom
- Syed Ahmar Shah
  - Asthma UK Center for Applied Research, Usher Institute, University of Edinburgh, Edinburgh, United Kingdom
4
Pungitore S, Subbian V. Assessment of Prediction Tasks and Time Window Selection in Temporal Modeling of Electronic Health Record Data: a Systematic Review. Journal of Healthcare Informatics Research 2023; 7:313-331. [PMID: 37637723 PMCID: PMC10449760 DOI: 10.1007/s41666-023-00143-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2022] [Revised: 04/12/2023] [Accepted: 07/28/2023] [Indexed: 08/29/2023]
Abstract
Temporal electronic health record (EHR) data are often preferred for clinical prediction tasks because they offer more complete representations of a patient's pathophysiology than static data. A challenge when working with temporal EHR data is problem formulation, which includes defining the time windows of interest and the prediction task. Our objective was to conduct a systematic review that assessed the definition and reporting of concepts relevant to temporal clinical prediction tasks. We searched PubMed® and IEEE Xplore® databases for studies from January 1, 2010 applying machine learning models to EHR data for patient outcome prediction. Publications applying time-series methods were selected for further review. We identified 92 studies and summarized them by clinical context and definition and reporting of the prediction problem. For the time windows of interest, 12 studies did not discuss window lengths, 57 used a single set of window lengths, and 23 evaluated the relationship between window length and model performance. We also found that 72 studies had appropriate reporting of the prediction task. However, evaluation of prediction problem formulation for temporal EHR data was complicated by heterogeneity in assessing and reporting of these concepts. Even among studies modeling similar clinical outcomes, there were variations in terminology used to describe the prediction problem, rationale for window lengths, and determination of the outcome of interest. As temporal modeling using EHR data expands, minimal reporting standards should include time-series specific concerns to promote rigor and reproducibility in future studies and facilitate model implementation in clinical settings. Supplementary Information The online version contains supplementary material available at 10.1007/s41666-023-00143-4.
Affiliation(s)
- Sarah Pungitore
  - Program in Applied Mathematics, Department of Mathematics, 617 N Santa Rita Ave, Tucson, AZ 85721 USA
- Vignesh Subbian
  - Department of Biomedical Engineering, The University of Arizona, Tucson, AZ 85721-0020 USA
  - Department of Systems and Industrial Engineering, The University of Arizona, Tucson, AZ 85721-0020 USA
5
Budiarto A, Sheikh A, Wilson A, Price DB, Shah SA. Handling Class Imbalance in Machine Learning-based Prediction Models: A Case Study in Asthma Management. Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual International Conference 2023; 2023:1-5. [PMID: 38083129 DOI: 10.1109/embc40787.2023.10340751] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/18/2023]
Abstract
A data-driven prediction tool has the potential to provide early warning of an asthma attack and improve asthma management and outcomes. Most previous machine learning (ML)-based studies for asthma attack prediction have reported a severe class imbalance, with major implications for model performance. We aimed to undertake a systematic comparison of several class imbalance handling techniques in the context of risk prediction models for asthma prognosis. We used data from 9,835 asthma patients extracted from the Medical Information Mart for Intensive Care (MIMIC) IV database and deployed five class imbalance handling methods based on synthetic minority oversampling technique (SMOTE) and cost function customisation. We then compared their performances in improving two-class classifier models developed using logistic regression (LR) and extreme gradient boosting (XGBoost) for three different prediction tasks with varying severity of class imbalance (proportion of majority class ranging from 90.86% to 98.98%). The cost function customisation technique substantially outperformed the SMOTE-based methods in all tasks. XGBoost combined with cost function customisation achieved the highest prediction performance for the outcome with the most extreme class imbalance ratio (AUC = 0.72). Our findings suggest that the cost function customisation-based approach to tackle class imbalance provides substantially better performance compared to oversampling in the context of asthma management. Clinical Relevance: This study underscores the challenge of class imbalance in the context of prediction tools to improve asthma management and outcomes and provides a methodological solution that addresses the challenge. Accurate asthma prediction tools can provide early warning and potentially prevent deterioration thereby improving the quality of life of patients with asthma.
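The abstract does not spell out how the cost function was customised. One common form, shown here as a minimal sketch under that assumption, is a class-weighted binary cross-entropy in which each minority-class (positive) example counts `pos_weight` times. The function name, the toy data, and the choice of weight (ratio of negatives to positives) are illustrative, not taken from the paper.

```python
import math


def weighted_log_loss(labels, probs, pos_weight):
    """Binary cross-entropy where each positive example counts pos_weight times."""
    eps = 1e-12  # clip probabilities so log() never sees 0 or 1
    total, weight_sum = 0.0, 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)
        w = pos_weight if y == 1 else 1.0
        total += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
        weight_sum += w
    return total / weight_sum


# Hypothetical data: 90% majority class, and a model that always predicts low risk.
labels = [0] * 9 + [1]
probs = [0.1] * 10

# Assumed weighting scheme: ratio of negatives to positives.
pos_weight = labels.count(0) / labels.count(1)

plain = weighted_log_loss(labels, probs, 1.0)
weighted = weighted_log_loss(labels, probs, pos_weight)
print(f"unweighted loss = {plain:.3f}, weighted loss = {weighted:.3f}")
```

The always-low-risk model looks acceptable under the unweighted loss but is penalised heavily once the missed positive carries more weight, which is the behaviour a customised cost function is meant to produce without synthesising new minority examples as SMOTE does.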
6
Luo J, Lan L, Huang S, Zeng X, Xiang Q, Li M, Yang S, Zhao W, Zhou X. Real-time prediction of organ failures in patients with acute pancreatitis using longitudinal irregular data. J Biomed Inform 2023; 139:104310. [PMID: 36773821 DOI: 10.1016/j.jbi.2023.104310] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2022] [Revised: 01/10/2023] [Accepted: 02/06/2023] [Indexed: 02/12/2023]
Abstract
It is extremely important to identify patients with acute pancreatitis who are at high risk for developing persistent organ failures early in the course of the disease. Due to the irregularity of longitudinal data and the poor interpretability of complex models, many models used to identify acute pancreatitis patients with a high risk of organ failure tended to rely on simple statistical models and limited their application to the early stages of patient admission. With the success of recurrent neural networks in modeling longitudinal medical data and the development of interpretable algorithms, these problems can be well addressed. In this study, we developed a novel model named Multi-task and Time-aware Gated Recurrent Unit RNN (MT-GRU) to directly predict organ failure in patients with acute pancreatitis based on irregular medical EMR data. Our proposed end-to-end multi-task model achieved significantly better performance compared to two-stage models. In addition, our model not only provided an accurate early warning of organ failure for patients throughout their hospital stay, but also demonstrated individual and population-level important variables, allowing physicians to understand the scientific basis of the model for decision-making. By providing early warning of the risk of organ failure, our proposed model is expected to assist physicians in improving outcomes for patients with acute pancreatitis.
Affiliation(s)
- Jiawei Luo
  - West China Biomedical Big Data Center, West China Hospital/West China School of Medicine, Sichuan University, Chengdu, China.
- Lan Lan
  - IT Center, Beijing Tiantan Hospital, Capital Medical University, Beijing, China.
- Shixin Huang
  - School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing, China.
- Xiaoxi Zeng
  - West China Biomedical Big Data Center, West China Hospital/West China School of Medicine, Sichuan University, Chengdu, China.
- Qu Xiang
  - West China Biomedical Big Data Center, West China Hospital/West China School of Medicine, Sichuan University, Chengdu, China.
- Mengjiao Li
  - West China Biomedical Big Data Center, West China Hospital/West China School of Medicine, Sichuan University, Chengdu, China.
- Shu Yang
  - College of Medical Information Engineering, Chengdu University of Traditional Chinese Medicine, Chengdu, China.
- Weiling Zhao
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, USA.
- Xiaobo Zhou
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, USA.
7
Liu LJ, Ortiz-Soriano V, Neyra JA, Chen J. KIT-LSTM: Knowledge-guided Time-aware LSTM for Continuous Clinical Risk Prediction. Proceedings. IEEE International Conference on Bioinformatics and Biomedicine 2022; 2022:1086-1091. [PMID: 37131483 PMCID: PMC10151119 DOI: 10.1109/bibm55620.2022.9994931] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
Rapid accumulation of temporal Electronic Health Record (EHR) data and recent advances in deep learning have shown high potential in precisely and timely predicting patients' risks using AI. However, most existing risk prediction approaches ignore the complex asynchronous and irregular problems in real-world EHR data. This paper proposes a novel approach called Knowledge-guIded Time-aware LSTM (KIT-LSTM) for continuous mortality predictions using EHR. KIT-LSTM extends LSTM with two time-aware gates and a knowledge-aware gate to better model EHR and interprets results. Experiments on real-world data for patients with acute kidney injury with dialysis (AKI-D) demonstrate that KIT-LSTM performs better than the state-of-the-art methods for predicting patients' risk trajectories and model interpretation. KIT-LSTM can better support timely decision-making for clinicians.
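The abstract does not define KIT-LSTM's time-aware gates, so the following is only a generic illustration of the underlying idea: when observations arrive at irregular intervals, the part of a recurrent cell's memory that reflects recent state is discounted by a decreasing function of the elapsed time. The decay 1 / log(e + Δt) is one choice that appears in the time-aware LSTM literature; the fixed `short_term_fraction` is a simplifying assumption standing in for a learned decomposition, and both function names are hypothetical.

```python
import math


def time_decay(delta_t):
    """Decreasing weight for elapsed time delta_t >= 0; equals 1 when delta_t is 0."""
    return 1.0 / math.log(math.e + delta_t)


def discount_memory(memory, short_term_fraction, delta_t):
    """Split memory into long- and short-term parts and decay only the short-term part.

    Assumption: a fixed fraction replaces the learned decomposition a real
    time-aware cell would use.
    """
    short_term = memory * short_term_fraction
    long_term = memory - short_term
    return long_term + short_term * time_decay(delta_t)


# Hypothetical gaps (in days) between consecutive clinical observations.
for gap in (0, 1, 30, 365):
    adjusted = discount_memory(1.0, short_term_fraction=0.5, delta_t=gap)
    print(f"gap = {gap:>3} days -> adjusted memory = {adjusted:.3f}")
```

With no gap the memory passes through unchanged, and longer gaps shrink only the short-term share, so long-term information survives sparse sampling.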
Affiliation(s)
- Lucas Jing Liu
  - Department of Computer Science University of Kentucky, Lexington, KY, USA
- Javier A Neyra
  - Department of Internal Medicine, Division of Nephrology University of Alabama at Birmingham, Birmingham, AL, USA
  - Department of Internal Medicine University of Kentucky, Lexington, KY, USA
- Jin Chen
  - Department of Computer Science University of Kentucky, Lexington, KY, USA
  - Department of Internal Medicine University of Kentucky, Lexington, KY, USA
8
Silva JF, Matos S. Modelling patient trajectories using multimodal information. J Biomed Inform 2022; 134:104195. [PMID: 36150641 DOI: 10.1016/j.jbi.2022.104195] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2022] [Revised: 07/16/2022] [Accepted: 08/30/2022] [Indexed: 11/26/2022]
Abstract
BACKGROUND Electronic Health Records (EHRs) aggregate diverse information at the patient level, holding a trajectory representative of the evolution of the patient health status throughout time. Although this information provides context and can be leveraged by physicians to monitor patient health and make more accurate prognoses/diagnoses, patient records can contain information from very long time spans, which combined with the rapid generation rate of medical data makes clinical decision making more complex. Patient trajectory modelling can assist by exploring existing information in a scalable manner, and can contribute in augmenting health care quality by fostering preventive medicine practices (e.g. earlier disease diagnosis). METHODS We propose a solution to model patient trajectories that combines different types of information (e.g. clinical text, standard codes) and considers the temporal aspect of clinical data. This solution leverages two different architectures: one supporting flexible sets of input features, to convert patient admissions into dense representations; and a second exploring extracted admission representations in a recurrent-based architecture, where patient trajectories are processed in sub-sequences using a sliding window mechanism. RESULTS The developed solution was evaluated on two different clinical outcomes, unexpected patient readmission and disease progression, using the publicly available Medical Information Mart for Intensive Care (MIMIC)-III clinical database. The results obtained demonstrate the potential of the first architecture to model readmission and diagnoses prediction using single patient admissions. While information from clinical text did not show the discriminative power observed in other existing works, this may be explained by the need to fine-tune the clinicalBERT model. 
Finally, we demonstrate the potential of the sequence-based architecture using a sliding window mechanism to represent the input data, attaining comparable performances to other existing solutions. CONCLUSION Herein, we explored DL-based techniques to model patient trajectories and propose two flexible architectures that explore patient admissions on an individual and sequence basis. The combination of clinical text with other types of information led to positive results, which can be further improved by including a fine-tuned version of clinicalBERT in the architectures. The proposed solution can be publicly accessed at https://github.com/bioinformatics-ua/PatientTM.
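The sliding window mechanism mentioned in this abstract, where a patient trajectory is processed in sub-sequences, can be sketched as follows. This is a minimal illustration, not the code from the PatientTM repository: the function name, the `stride` parameter, and the rule of returning short trajectories as a single window are assumptions.

```python
def sliding_windows(admissions, window_size, stride=1):
    """Split one patient's ordered admissions into overlapping sub-sequences."""
    if window_size < 1 or stride < 1:
        raise ValueError("window_size and stride must be positive")
    if not admissions:
        return []
    # Assumption: a trajectory shorter than the window is kept as one window.
    if len(admissions) <= window_size:
        return [list(admissions)]
    # Windows that would run past the end of the trajectory are not emitted.
    return [
        list(admissions[i:i + window_size])
        for i in range(0, len(admissions) - window_size + 1, stride)
    ]


# Hypothetical trajectory; each item stands for one admission representation.
trajectory = ["adm1", "adm2", "adm3", "adm4", "adm5"]
windows = sliding_windows(trajectory, window_size=3)
for w in windows:
    print(w)
```

Each window can then be fed to a recurrent model as one training sequence, which bounds sequence length for patients with very long records.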
Affiliation(s)
- Sérgio Matos
  - DETI/IEETA, University of Aveiro, Aveiro, Portugal.
9
Ghazi MM, Sorensen L, Ourselin S, Nielsen M. CARRNN: A Continuous Autoregressive Recurrent Neural Network for Deep Representation Learning From Sporadic Temporal Data. IEEE Transactions on Neural Networks and Learning Systems 2022; PP:792-802. [PMID: 35666790 DOI: 10.1109/tnnls.2022.3177366] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/15/2023]
Abstract
Learning temporal patterns from multivariate longitudinal data is challenging especially in cases when data is sporadic, as often seen in, e.g., healthcare applications where the data can suffer from irregularity and asynchronicity as the time between consecutive data points can vary across features and samples, hindering the application of existing deep learning models that are constructed for complete, evenly spaced data with fixed sequence lengths. In this article, a novel deep learning-based model is developed for modeling multiple temporal features in sporadic data using an integrated deep learning architecture based on a recurrent neural network (RNN) unit and a continuous-time autoregressive (CAR) model. The proposed model, called CARRNN, uses a generalized discrete-time autoregressive (AR) model that is trainable end-to-end using neural networks modulated by time lags to describe the changes caused by the irregularity and asynchronicity. It is applied to time-series regression and classification tasks for Alzheimer's disease progression modeling, intensive care unit (ICU) mortality rate prediction, human activity recognition, and event-based digit recognition, where the proposed model based on a gated recurrent unit (GRU) in all cases achieves significantly better predictive performance than the state-of-the-art methods using RNNs, GRUs, and long short-term memory (LSTM) networks.
10
Rasmy L, Nigo M, Kannadath BS, Xie Z, Mao B, Patel K, Zhou Y, Zhang W, Ross A, Xu H, Zhi D. Recurrent neural network models (CovRNN) for predicting outcomes of patients with COVID-19 on admission to hospital: model development and validation using electronic health record data. Lancet Digit Health 2022; 4:e415-e425. [PMID: 35466079 PMCID: PMC9023005 DOI: 10.1016/s2589-7500(22)00049-8] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Received: 10/01/2021] [Revised: 01/11/2022] [Accepted: 03/07/2022] [Indexed: 02/08/2023]
Abstract
BACKGROUND Predicting outcomes of patients with COVID-19 at an early stage is crucial for optimised clinical care and resource management, especially during a pandemic. Although multiple machine learning models have been proposed to address this issue, because of their requirements for extensive data preprocessing and feature engineering, they have not been validated or implemented outside of their original study site. Therefore, we aimed to develop accurate and transferrable predictive models of outcomes on hospital admission for patients with COVID-19. METHODS In this study, we developed recurrent neural network-based models (CovRNN) to predict the outcomes of patients with COVID-19 by use of available electronic health record data on admission to hospital, without the need for specific feature selection or missing data imputation. CovRNN was designed to predict three outcomes: in-hospital mortality, need for mechanical ventilation, and prolonged hospital stay (>7 days). For in-hospital mortality and mechanical ventilation, CovRNN produced time-to-event risk scores (survival prediction; evaluated by the concordance index) and all-time risk scores (binary prediction; area under the receiver operating characteristic curve [AUROC] was the main metric); we only trained a binary classification model for prolonged hospital stay. For binary classification tasks, we compared CovRNN against traditional machine learning algorithms: logistic regression and light gradient boost machine. Our models were trained and validated on the heterogeneous, deidentified data of 247 960 patients with COVID-19 from 87 US health-care systems derived from the Cerner Real-World COVID-19 Q3 Dataset up to September 2020. We held out the data of 4175 patients from two hospitals for external validation. The remaining 243 785 patients from the 85 health systems were grouped into training (n=170 626), validation (n=24 378), and multi-hospital test (n=48 781) sets. 
Model performance was evaluated in the multi-hospital test set. The transferability of CovRNN was externally validated by use of deidentified data from 36 140 patients derived from the US-based Optum deidentified COVID-19 electronic health record dataset (version 1015; from January, 2007, to Oct 15, 2020). Exact dates of data extraction were masked by the databases to ensure patient data safety. FINDINGS CovRNN binary models achieved AUROCs of 93·0% (95% CI 92·6-93·4) for the prediction of in-hospital mortality, 92·9% (92·6-93·2) for the prediction of mechanical ventilation, and 86·5% (86·2-86·9) for the prediction of a prolonged hospital stay, outperforming light gradient boost machine and logistic regression algorithms. External validation confirmed AUROCs in similar ranges (91·3-97·0% for in-hospital mortality prediction, 91·5-96·0% for the prediction of mechanical ventilation, and 81·0-88·3% for the prediction of prolonged hospital stay). For survival prediction, CovRNN achieved a concordance index of 86·0% (95% CI 85·1-86·9) for in-hospital mortality and 92·6% (92·2-93·0) for mechanical ventilation. INTERPRETATION Trained on a large, heterogeneous, real-world dataset, our CovRNN models showed high prediction accuracy and transferability through consistently good performances on multiple external datasets. Our results show the feasibility of a COVID-19 predictive model that delivers high accuracy without the need for complex feature engineering. FUNDING Cancer Prevention and Research Institute of Texas.
Affiliation(s)
- Laila Rasmy
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
- Masayuki Nigo
  - McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX, USA
- Ziqian Xie
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
- Bingyu Mao
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
- Khush Patel
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
- Yujia Zhou
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
- Wanheng Zhang
  - School of Public Health, University of Texas Health Science Center at Houston, Houston, TX, USA
- Angela Ross
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
- Hua Xu
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
- Degui Zhi
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
  - Correspondence to: Dr Degui Zhi, School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, USA
11
Xie F, Yuan H, Ning Y, Ong MEH, Feng M, Hsu W, Chakraborty B, Liu N. Deep learning for temporal data representation in electronic health records: A systematic review of challenges and methodologies. J Biomed Inform 2021; 126:103980. [PMID: 34974189 DOI: 10.1016/j.jbi.2021.103980] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 07/23/2021] [Revised: 11/07/2021] [Accepted: 12/20/2021] [Indexed: 12/21/2022]
Abstract
OBJECTIVE Temporal electronic health records (EHRs) contain a wealth of information for secondary uses, such as clinical events prediction and chronic disease management. However, challenges exist for temporal data representation. We therefore sought to identify these challenges and evaluate novel methodologies for addressing them through a systematic examination of deep learning solutions. METHODS We searched five databases (PubMed, Embase, the Institute of Electrical and Electronics Engineers [IEEE] Xplore Digital Library, the Association for Computing Machinery [ACM] Digital Library, and Web of Science) complemented with hand-searching in several prestigious computer science conference proceedings. We sought articles that reported deep learning methodologies on temporal data representation in structured EHR data from January 1, 2010, to August 30, 2020. We summarized and analyzed the selected articles from three perspectives: nature of time series, methodology, and model implementation. RESULTS We included 98 articles related to temporal data representation using deep learning. Four major challenges were identified, including data irregularity, heterogeneity, sparsity, and model opacity. We then studied how deep learning techniques were applied to address these challenges. Finally, we discuss some open challenges arising from deep learning. CONCLUSION Temporal EHR data present several major challenges for clinical prediction modeling and data utilization. To some extent, current deep learning solutions can address these challenges. Future studies may consider designing comprehensive and integrated solutions. Moreover, researchers should incorporate clinical domain knowledge into study designs and enhance model interpretability to facilitate clinical implementation.
Affiliation(s)
- Feng Xie
- Programme in Health Services and Systems Research, Duke-NUS Medical School, Singapore; Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore
- Han Yuan
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore
- Yilin Ning
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore
- Marcus Eng Hock Ong
- Programme in Health Services and Systems Research, Duke-NUS Medical School, Singapore; Department of Emergency Medicine, Singapore General Hospital, Singapore
- Mengling Feng
- Saw Swee Hock School of Public Health, National University of Singapore, Singapore
- Wynne Hsu
- School of Computing, National University of Singapore, Singapore; Institute of Data Science, National University of Singapore, Singapore
- Bibhas Chakraborty
- Programme in Health Services and Systems Research, Duke-NUS Medical School, Singapore; Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore; Department of Statistics and Data Science, National University of Singapore, Singapore; Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, United States
- Nan Liu
- Programme in Health Services and Systems Research, Duke-NUS Medical School, Singapore; Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore; Institute of Data Science, National University of Singapore, Singapore; SingHealth AI Health Program, Singapore Health Services, Singapore
12
|
Yang YC, Islam SU, Noor A, Khan S, Afsar W, Nazir S. Influential Usage of Big Data and Artificial Intelligence in Healthcare. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2021; 2021:5812499. [PMID: 34527076 PMCID: PMC8437645 DOI: 10.1155/2021/5812499] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 07/08/2021] [Accepted: 08/09/2021] [Indexed: 01/07/2023]
Abstract
Artificial intelligence (AI) is making computer systems capable of performing tasks of the human brain in many fields and in all aspects of daily life. Advances in information and communications technology (ICT) have indisputably improved the quality of people's lives around the globe. In particular, ICT has led to a much-needed and tremendous improvement in the health sector, commonly known as electronic health (eHealth) and medical health (mHealth). Deep machine learning and AI approaches are commonly presented in many applications using big data, which consists of all relevant data about medical health and diseases that a model can access at the time of execution or diagnosis. For example, cardiovascular imaging now combines accurate imaging with big data from the eHealth record and pathology to better characterize the disease and personalize therapy. In clinical work and imaging, cancer care is being improved by knowledge of tumor biology, which helps in the implementation of precision medicine. The Markov model is used to extract new approaches for leveraging cancer. In this paper, we have reviewed existing research relevant to eHealth and mHealth, in which various models that use big data for diagnosis and the healthcare system are discussed. This paper summarizes the recent promising applications of AI and big data in medical health and electronic health, which have potentially added value to diagnosis and patient care.
Affiliation(s)
- Yan Cheng Yang
- Foreign Language Department, Luoyang Institute of Science and Technology, Luoyang, Henan, China
- Foreign Language Department/Language and Cognition Center, Hunan University, Changsha, Hunan, China
- Saad Ul Islam
- Department of Computer Science, University of Swabi, Swabi, Pakistan
- Asra Noor
- Department of Computer Science, University of Swabi, Swabi, Pakistan
- Sadia Khan
- Department of Computer Science, University of Swabi, Swabi, Pakistan
- Waseem Afsar
- Department of Computer Science, University of Swabi, Swabi, Pakistan
- Shah Nazir
- Department of Computer Science, University of Swabi, Swabi, Pakistan
13
|
Mitra R, MacLean AL. RVAgene: Generative modeling of gene expression time series data. Bioinformatics 2021; 37:3252-3262. [PMID: 33974008 PMCID: PMC8504625 DOI: 10.1093/bioinformatics/btab260] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 11/17/2020] [Revised: 04/19/2021] [Accepted: 04/22/2021] [Indexed: 12/04/2022]
Abstract
Motivation Methods to model dynamic changes in gene expression at a genome-wide level are not currently sufficient for large (temporally rich or single-cell) datasets. Variational autoencoders offer means to characterize large datasets and have been used effectively to characterize features of single-cell datasets. Here, we extend these methods for use with gene expression time series data. Results We present RVAgene: a recurrent variational autoencoder to model gene expression dynamics. RVAgene learns to accurately and efficiently reconstruct temporal gene profiles. It also learns a low dimensional representation of the data via a recurrent encoder network that can be used for biological feature discovery, and from which we can generate new gene expression data by sampling the latent space. We test RVAgene on simulated and real biological datasets, including embryonic stem cell differentiation and kidney injury response dynamics. In all cases, RVAgene accurately reconstructed complex gene expression temporal profiles. Via cross validation, we show that a low-error latent space representation can be learnt using only a fraction of the data. Through clustering and gene ontology term enrichment analysis on the latent space, we demonstrate the potential of RVAgene for unsupervised discovery. In particular, RVAgene identifies new programs of shared gene regulation of Lox family genes in response to kidney injury. Availability and implementation All datasets analyzed in this manuscript are publicly available and have been published previously. RVAgene is available in Python, at GitHub: https://github.com/maclean-lab/RVAgene; Zenodo archive: http://doi.org/10.5281/zenodo.4271097. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Raktim Mitra
- Quantitative and Computational Biology, University of Southern California, Los Angeles, CA-90007, USA
- Adam L MacLean
- Quantitative and Computational Biology, University of Southern California, Los Angeles, CA-90007, USA
14
|
Ferté T, Cossin S, Schaeverbeke T, Barnetche T, Jouhet V, Hejblum BP. Automatic phenotyping of electronical health record: PheVis algorithm. J Biomed Inform 2021; 117:103746. [PMID: 33746080 DOI: 10.1016/j.jbi.2021.103746] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 09/28/2020] [Revised: 03/02/2021] [Accepted: 03/05/2021] [Indexed: 11/18/2022]
Abstract
Electronic Health Records (EHRs) often lack reliable annotation of patient medical conditions. PheNorm, an automated unsupervised algorithm to identify patient medical conditions from EHR data, has been developed. PheVis extends PheNorm to the visit resolution. PheVis combines diagnosis codes with medical concepts extracted from medical notes, incorporating past history in a machine learning approach to provide an interpretable parametric predictor of the occurrence probability for a given medical condition at each visit. PheVis is applied to two real-world use cases using the data warehouse of the University Hospital of Bordeaux: i) rheumatoid arthritis, a chronic condition; ii) tuberculosis, an acute condition. Cross-validated AUROCs were respectively 0.943 [0.940; 0.945] and 0.987 [0.983; 0.990]. Cross-validated AUPRCs were respectively 0.754 [0.744; 0.763] and 0.299 [0.198; 0.403]. PheVis performs well for chronic conditions, though the inability of natural language processing tools to exclude past medical history limits its performance in French for acute conditions. It achieves significantly better performance than state-of-the-art unsupervised methods, especially for chronic diseases.
Affiliation(s)
- Thomas Ferté
- Bordeaux Hospital University Center, Pôle de santé publique, Service d'information médicale, Unité Informatique et Archivistique Médicales, F-33000 Bordeaux, France; Univ. Bordeaux ISPED, Inserm Bordeaux Population Health Research Center UMR 1219, Inria BSO, team SISTM, F-33000 Bordeaux, France
- Sébastien Cossin
- Bordeaux Hospital University Center, Pôle de santé publique, Service d'information médicale, Unité Informatique et Archivistique Médicales, F-33000 Bordeaux, France; Univ. Bordeaux, Inserm, Bordeaux Population Health Research Center, team ERIAS, UMR 1219, F-33000 Bordeaux, France
- Thierry Schaeverbeke
- Rheumatology department, FHU ACRONIM, Bordeaux University Hospital, F-33076 Bordeaux, France
- Thomas Barnetche
- Rheumatology department, FHU ACRONIM, Bordeaux University Hospital, F-33076 Bordeaux, France
- Vianney Jouhet
- Bordeaux Hospital University Center, Pôle de santé publique, Service d'information médicale, Unité Informatique et Archivistique Médicales, F-33000 Bordeaux, France; Univ. Bordeaux, Inserm, Bordeaux Population Health Research Center, team ERIAS, UMR 1219, F-33000 Bordeaux, France
- Boris P Hejblum
- Univ. Bordeaux ISPED, Inserm Bordeaux Population Health Research Center UMR 1219, Inria BSO, team SISTM, F-33000 Bordeaux, France
15
|
Sisk R, Lin L, Sperrin M, Barrett JK, Tom B, Diaz-Ordaz K, Peek N, Martin GP. Informative presence and observation in routine health data: A review of methodology for clinical risk prediction. J Am Med Inform Assoc 2021; 28:155-166. [PMID: 33164082 PMCID: PMC7810439 DOI: 10.1093/jamia/ocaa242] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 06/26/2020] [Accepted: 09/17/2020] [Indexed: 12/20/2022]
Abstract
Objective Informative presence (IP) is the phenomenon whereby the presence or absence of patient data is potentially informative with respect to their health condition, with informative observation (IO) being the longitudinal equivalent. These phenomena predominantly exist within routinely collected healthcare data, in which data collection is driven by the clinical requirements of patients and clinicians. The extent to which IP and IO are considered when using such data to develop clinical prediction models (CPMs) is unknown, as is the existing methodology aimed at handling these issues. This review aims to synthesize such existing methodology, thereby helping identify an agenda for future methodological work. Materials and Methods A systematic literature search was conducted by 2 independent reviewers using prespecified keywords. Results Thirty-six articles were included. We categorized the methods presented within as derived predictors (including some representation of the measurement process as a predictor in the model), modeling under IP, and latent structures. Including missing indicators or summary measures as predictors is the most commonly presented approach among the included studies (24 of 36 articles). Discussion This is the first review to collate the literature in this area under a prediction framework. A considerable body of relevant literature exists, and we present ways in which the described methods could be developed further. Guidance is required for specifying the conditions under which each method should be used to enable applied prediction modelers to use these methods. Conclusions A growing recognition of IP and IO exists within the literature, and methodology is increasingly becoming available to leverage these phenomena for prediction purposes. IP and IO should be approached differently in a prediction context than when the primary goal is explanation. The work included in this review has demonstrated theoretical and empirical benefits of incorporating IP and IO, and therefore we recommend that applied health researchers consider incorporating these methods in their work.
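The "derived predictors" category described in this abstract (missing indicators or summary measures added as model inputs) can be illustrated with a minimal Python sketch. The function name `derive_predictors`, the variable names, and the fill value are hypothetical choices made for this illustration; the review does not prescribe a specific implementation.

```python
from typing import Dict, Optional

# One patient's measurements; None marks a value that was never recorded.
Record = Dict[str, Optional[float]]


def derive_predictors(record: Record, fill_value: float = 0.0) -> Dict[str, float]:
    """Add a missing indicator per variable plus one summary measure.

    Assumption for this sketch: missing values are replaced by a constant
    fill value, and the count of observed variables serves as the summary.
    """
    features: Dict[str, float] = {}
    for name, value in record.items():
        is_missing = value is None
        features[name] = fill_value if is_missing else float(value)
        # The indicator lets a model learn from the absence of a measurement.
        features[f"{name}_missing"] = 1.0 if is_missing else 0.0
    # Summary measure: how many variables were actually observed.
    features["n_observed"] = float(sum(v is not None for v in record.values()))
    return features


# Hypothetical patient: only one of three laboratory values was measured.
patient: Record = {"hba1c": 6.8, "ldl": None, "creatinine": None}
features = derive_predictors(patient)
print(features)
```

The indicator columns keep the fact of non-measurement visible to the downstream model instead of hiding it behind an imputed value, which is the core idea of treating presence as informative.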
Affiliation(s)
- Rose Sisk
- Division of Informatics, Imaging and Data Sciences, School of Health Sciences, University of Manchester, Manchester, United Kingdom
- Lijing Lin
- Division of Informatics, Imaging and Data Sciences, School of Health Sciences, University of Manchester, Manchester, United Kingdom
- Matthew Sperrin
- Division of Informatics, Imaging and Data Sciences, School of Health Sciences, University of Manchester, Manchester, United Kingdom
- Jessica K Barrett
- MRC Biostatistics Unit, University of Cambridge, Cambridge, United Kingdom; Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, United Kingdom
- Brian Tom
- MRC Biostatistics Unit, University of Cambridge, Cambridge, United Kingdom
- Karla Diaz-Ordaz
- Department of Medical Statistics, London School of Hygiene and Tropical Medicine, London, United Kingdom
- Niels Peek
- Division of Informatics, Imaging and Data Sciences, School of Health Sciences, University of Manchester, Manchester, United Kingdom; NIHR Biomedical Research Centre, Manchester Academic Health Science Centre, University of Manchester, Manchester, United Kingdom; Alan Turing Institute, University of Manchester, London, United Kingdom
- Glen P Martin
- Division of Informatics, Imaging and Data Sciences, School of Health Sciences, University of Manchester, Manchester, United Kingdom
16
|
Xiang Y, Ji H, Zhou Y, Li F, Du J, Rasmy L, Wu S, Zheng WJ, Xu H, Zhi D, Zhang Y, Tao C. Asthma Exacerbation Prediction and Risk Factor Analysis Based on a Time-Sensitive, Attentive Neural Network: Retrospective Cohort Study. J Med Internet Res 2020; 22:e16981. [PMID: 32735224 PMCID: PMC7428917 DOI: 10.2196/16981] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 11/09/2019] [Revised: 03/02/2020] [Accepted: 05/13/2020] [Indexed: 12/13/2022]
Abstract
BACKGROUND Asthma exacerbation is an acute or subacute episode of progressive worsening of asthma symptoms and can have a significant impact on patients' quality of life. However, efficient methods that can help identify personalized risk factors and make early predictions are lacking. OBJECTIVE This study aims to use advanced deep learning models to better predict the risk of asthma exacerbations and to explore potential risk factors involved in progressive asthma. METHODS We proposed a novel time-sensitive, attentive neural network to predict asthma exacerbation using clinical variables from large electronic health records. The clinical variables were collected from the Cerner Health Facts database between 1992 and 2015, including 31,433 adult patients with asthma. Interpretations on both patient and cohort levels were investigated based on the model parameters. RESULTS The proposed model obtained an area under the curve value of 0.7003 through five-fold cross-validation, which outperformed the baseline methods. The results also demonstrated that the addition of elapsed time embeddings considerably improved the prediction performance. Further analysis revealed diverse distributions of contributing factors across patients as well as some possible cohort-level risk factors, such as respiratory diseases and esophageal reflux, for which supporting evidence could be found in peer-reviewed literature. CONCLUSIONS The proposed neural network model performed better than previous methods for the prediction of asthma exacerbation. We believe that personalized risk scores and analyses of contributing factors can help clinicians better assess the individual's level of disease progression and afford the opportunity to adjust treatment, prevent exacerbation, and improve outcomes.
Affiliation(s)
- Yang Xiang
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, United States
- Hangyu Ji
- Division of Gastroenterology, Guang'anmen Hospital, China Academy of Chinese Medical Sciences, Beijing, China
- Yujia Zhou
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, United States
- Fang Li
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, United States
- Jingcheng Du
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, United States
- Laila Rasmy
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, United States
- Stephen Wu
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, United States
- W Jim Zheng
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, United States
- Hua Xu
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, United States
- Degui Zhi
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, United States
- Yaoyun Zhang
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, United States
- Cui Tao
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, United States
17
|
Estiri H, Strasser ZH, Klann JG, McCoy TH, Wagholikar KB, Vasey S, Castro VM, Murphy ME, Murphy SN. Transitive Sequencing Medical Records for Mining Predictive and Interpretable Temporal Representations. PATTERNS (NEW YORK, N.Y.) 2020; 1:100051. [PMID: 32835307 PMCID: PMC7301790 DOI: 10.1016/j.patter.2020.100051] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 02/13/2020] [Revised: 04/27/2020] [Accepted: 05/26/2020] [Indexed: 12/13/2022]
Abstract
Electronic health records (EHRs) contain important temporal information about the progression of disease and treatment outcomes. This paper proposes a transitive sequencing approach for constructing temporal representations from EHR observations for downstream machine learning. Using clinical data from a cohort of patients with congestive heart failure, we mined temporal representations by transitive sequencing of EHR medication and diagnosis records for classification and prediction tasks. We compared the classification and prediction performances of the transitive sequential representations (bag-of-sequences approach) with the conventional approach of using aggregated vectors of EHR data (aggregated vector representation) across different classifiers. We found that the transitive sequential representations are better phenotype "differentiators" and predictors than the "atemporal" EHR records. Our results also demonstrated that data representations obtained from transitive sequencing of EHR observations can present novel insights about the progression of the disease that are difficult to discern when clinical data are treated independently of the patient's history.
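The abstract does not spell out how the sequences are built, so the following Python sketch shows only one plausible reading of a "bag-of-sequences" representation: every ordered pair of records in which the first strictly precedes the second is counted, including pairs that are not adjacent in time. The function name `transitive_sequences`, the event codes, and the day offsets are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations
from typing import List, Tuple

# An event is (day offset, record code); the codes below are made up.
Event = Tuple[int, str]


def transitive_sequences(events: List[Event]) -> Counter:
    """Count every ordered pair 'A->B' where A occurs strictly before B."""
    ordered = sorted(events)  # sort by day, then by code
    bag: Counter = Counter()
    for (day_a, code_a), (day_b, code_b) in combinations(ordered, 2):
        if day_a < day_b:  # same-day records have no defined order here
            bag[f"{code_a}->{code_b}"] += 1
    return bag


# Hypothetical patient timeline with three records.
timeline: List[Event] = [
    (90, "dx:heart_failure"),
    (0, "dx:hypertension"),
    (30, "rx:furosemide"),
]
bag = transitive_sequences(timeline)
for sequence, count in sorted(bag.items()):
    print(sequence, count)
```

In contrast to an aggregated vector that only records which codes occurred, this representation keeps the order of events, so "hypertension before heart failure" and the reverse become distinct features for a downstream classifier.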
Affiliation(s)
- Hossein Estiri
- Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA 02144, USA
- Research Information Science and Computing, Mass General Brigham, Somerville, MA 02145, USA
- Harvard Medical School, Boston, MA 02115, USA
- Zachary H. Strasser
- Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA 02144, USA
- Research Information Science and Computing, Mass General Brigham, Somerville, MA 02145, USA
- Harvard Medical School, Boston, MA 02115, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
- Jeffery G. Klann
- Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA 02144, USA
- Research Information Science and Computing, Mass General Brigham, Somerville, MA 02145, USA
- Harvard Medical School, Boston, MA 02115, USA
- Thomas H. McCoy
- Harvard Medical School, Boston, MA 02115, USA
- Center for Quantitative Health, Massachusetts General Hospital, Boston, MA 02114, USA
- Kavishwar B. Wagholikar
- Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA 02144, USA
- Research Information Science and Computing, Mass General Brigham, Somerville, MA 02145, USA
- Harvard Medical School, Boston, MA 02115, USA
- Sebastien Vasey
- Department of Mathematics, Harvard University, Cambridge, MA 02138, USA
- Victor M. Castro
- Research Information Science and Computing, Mass General Brigham, Somerville, MA 02145, USA
- MaryKate E. Murphy
- Research Information Science and Computing, Mass General Brigham, Somerville, MA 02145, USA
- Shawn N. Murphy
- Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA 02144, USA
- Research Information Science and Computing, Mass General Brigham, Somerville, MA 02145, USA
- Harvard Medical School, Boston, MA 02115, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
- Department of Neurology, Massachusetts General Hospital, Boston, MA 02114, USA
18
|
Liu Y, Zhang Q, Zhao G, Liu G, Liu Z. Deep Learning-Based Method of Diagnosing Hyperlipidemia and Providing Diagnostic Markers Automatically. Diabetes Metab Syndr Obes 2020; 13:679-691. [PMID: 32210601 PMCID: PMC7073442 DOI: 10.2147/dmso.s242585] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 12/16/2019] [Accepted: 02/26/2020] [Indexed: 12/18/2022]
Abstract
INTRODUCTION Research on auxiliary diagnosis has long been a hotspot worldwide. Implementing auxiliary diagnosis support algorithms for medical text data faces challenges of interpretability and credibility. Improving clinical diagnostic techniques means not only improving diagnostic accuracy but also further studying the diagnostic basis. Traditional research methods for diagnostic markers often require a large amount of time and money. Studies often involve only dozens of samples, and it is therefore difficult to synthesize large amounts of data. The comprehensiveness and reliability of traditional methods have therefore yet to be improved. Establishing a model that can diagnose diseases automatically and provide a diagnostic basis at the same time would thus benefit medical diagnostic techniques. METHODS Here, we established an auxiliary diagnostic tool based on an attention deep learning algorithm to diagnose hyperlipidemia and automatically predict the corresponding diagnostic markers using hematological parameters. In this paper, we demonstrated not only the ability of the proposed model to automatically diagnose diseases using text-based medical data, such as physiological parameters, but also its ability to forecast disease diagnostic markers. Human physiological parameters are used as input to the model, and the doctor's diagnosis results as the output. Through the attention layer, the degree of attention the model pays to different physiological parameters can be obtained; that is, the model provides a diagnostic basis. RESULTS The model achieved 94% ACC, 97.48% AUC, 96% sensitivity and 92% specificity on the test dataset. All the above samples are drawn from clinical practice. Moreover, the model predicted the diagnostic markers of hyperlipidemia through the attention mechanism, and the results were fully consistent with the gold-standard criteria. DISCUSSION The auxiliary diagnosis system proposed in this paper not only achieves accurate and robust performance and can be used for the preliminary diagnosis of patients, but also shows great potential to discover new diagnostic markers. It can therefore not only improve the efficiency of clinical diagnosis but also, to an extent, shorten the time needed to research a diagnostic basis. It is of positive significance for the development of medical diagnosis.
Affiliation(s)
- Yuliang Liu
- College of Electronic Information and Automation, Tianjin University of Science and Technology, Tianjin 300222, People’s Republic of China
- Quan Zhang
- College of Electronic Information and Automation, Tianjin University of Science and Technology, Tianjin 300222, People’s Republic of China
- Geng Zhao
- Tianjin Medical University Hospital for Metabolic Disease, Tianjin 300134, People’s Republic of China
- Guohua Liu
- College of Electronic Information and Optical Engineering, Nankai University, Tianjin 300350, People’s Republic of China
- Tianjin Key Laboratory of Optoelectronic Sensor and Sensing Network Technology, Nankai University, Tianjin 300350, People’s Republic of China
- Zhiang Liu
- School of Electrical Engineering and Automation, Harbin Institute of Technology, Harbin 150001, People’s Republic of China
19
|
Juhn Y, Liu H. Artificial intelligence approaches using natural language processing to advance EHR-based clinical research. J Allergy Clin Immunol 2020; 145:463-469. [PMID: 31883846 PMCID: PMC7771189 DOI: 10.1016/j.jaci.2019.12.897] [Citation(s) in RCA: 96] [Impact Index Per Article: 24.0] [Received: 10/31/2019] [Revised: 12/18/2019] [Accepted: 12/19/2019] [Indexed: 01/17/2023]
Abstract
The wide adoption of electronic health record systems in health care generates big real-world data that open new avenues for conducting clinical research. As a large amount of valuable clinical information is locked in clinical narratives, natural language processing techniques have been leveraged as an artificial intelligence approach to extract information from clinical narratives in electronic health records. This capability of natural language processing potentially enables automated chart review for identifying patients with distinctive clinical characteristics in clinical care, and it reduces the methodological heterogeneity in phenotype definition that obscures biological heterogeneity in research concerning allergy, asthma, and immunology. This brief review discusses the current literature on the secondary use of electronic health record data for clinical research concerning allergy, asthma, and immunology and highlights the potential, challenges, and implications of natural language processing techniques.
Affiliation(s)
- Young Juhn
- Precision Population Science Lab, Division of Community Pediatric and Adolescent Medicine, Department of Pediatric and Adolescent Medicine, Rochester, Minn; Division of Allergy, Department of Medicine, Mayo Clinic, Rochester, Minn
- Hongfang Liu
- Division of Digital Health, Department of Health Sciences Research, Mayo Clinic, Rochester, Minn
20
|
Moon S, Liu S, Scott CG, Samudrala S, Abidian MM, Geske JB, Noseworthy PA, Shellum JL, Chaudhry R, Ommen SR, Nishimura RA, Liu H, Arruda-Olson AM. Automated extraction of sudden cardiac death risk factors in hypertrophic cardiomyopathy patients by natural language processing. Int J Med Inform 2019; 128:32-38. [PMID: 31160009 DOI: 10.1016/j.ijmedinf.2019.05.008] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 09/17/2018] [Revised: 01/19/2019] [Accepted: 05/11/2019] [Indexed: 01/12/2023]
Abstract
BACKGROUND The management of hypertrophic cardiomyopathy (HCM) patients requires knowledge of the risk factors associated with sudden cardiac death (SCD). SCD risk factors such as syncope and family history of SCD (FH-SCD) as well as family history of HCM (FH-HCM) are documented in electronic health records (EHRs) as clinical narratives. Automated extraction of risk factors from clinical narratives by natural language processing (NLP) may expedite the management workflow for HCM patients. The aim of this study was to develop and deploy NLP algorithms for automated extraction of syncope, FH-SCD, and FH-HCM from clinical narratives. METHODS AND RESULTS We randomly selected 200 patients from the Mayo HCM registry for development (n = 100) and testing (n = 100) of NLP algorithms for extraction of syncope, FH-SCD, and FH-HCM from clinical narratives of EHRs. The clinical reference standard was manually abstracted by 2 independent annotators. Performance of the NLP algorithms was compared to aggregation and summarization of data entries in the HCM registry for syncope, FH-SCD, and FH-HCM. We also compared the NLP algorithms with billing codes for syncope as well as responses to patient survey questions for FH-SCD and FH-HCM. These analyses demonstrated that NLP had superior sensitivity (0.96 vs 0.39, p < 0.001) and comparable specificity (0.90 vs 0.92, p = 0.74) and positive predictive value (PPV) (0.90 vs 0.83, p = 0.37) compared to billing codes for syncope. For FH-SCD, NLP outperformed survey responses for all parameters (sensitivity: 0.91 vs 0.59, p = 0.002; specificity: 0.98 vs 0.50, p < 0.001; PPV: 0.97 vs 0.38, p < 0.001). NLP also achieved superior sensitivity (0.95 vs 0.24, p < 0.001) with comparable specificity (0.95 vs 1.0, p-value not calculable) and PPV (0.92 vs 1.0, p = 0.09) compared to survey responses for FH-HCM. CONCLUSIONS Automated extraction of syncope, FH-SCD, and FH-HCM using NLP is feasible and has promise to increase workflow efficiency for providers managing HCM patients.
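For readers less familiar with the three measures reported in this abstract, the short Python sketch below computes sensitivity, specificity, and PPV from confusion-matrix counts using their standard definitions. The function name `diagnostic_metrics` and the counts are hypothetical and chosen for illustration only; the abstract reports the resulting proportions, not the underlying counts.

```python
from typing import Dict


def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Sensitivity, specificity, and PPV from confusion-matrix counts."""
    if tp + fn == 0 or tn + fp == 0 or tp + fp == 0:
        raise ValueError("each metric needs a non-empty denominator")
    return {
        # Share of true cases that the extractor found.
        "sensitivity": tp / (tp + fn),
        # Share of non-cases that the extractor correctly left unflagged.
        "specificity": tn / (tn + fp),
        # Share of flagged patients who are true cases.
        "ppv": tp / (tp + fp),
    }


# Hypothetical counts for illustration only; they are not taken from the study.
metrics = diagnostic_metrics(tp=48, fp=5, tn=45, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Sensitivity and specificity are each computed within the reference-standard positives and negatives, whereas PPV also depends on how common the condition is in the tested sample, which is why the three can move independently across comparisons.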
Affiliation(s)
- Sungrim Moon
- Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
| | - Sijia Liu
- Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
| | - Christopher G Scott
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
| | - Sujith Samudrala
- Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, USA
| | - Mohamed M Abidian
- Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, USA
| | - Jeffrey B Geske
- Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, USA
| | | | - Jane L Shellum
- Robert and Patricia Kern Center for Science of Health Care Delivery, Mayo Clinic, Rochester, MN, USA
| | - Rajeev Chaudhry
- Robert and Patricia Kern Center for Science of Health Care Delivery, Mayo Clinic, Rochester, MN, USA; Division of Community Internal Medicine, Mayo Clinic, Rochester, MN, USA
| | - Steve R Ommen
- Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, USA
| | - Rick A Nishimura
- Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, USA
| | - Hongfang Liu
- Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
| | - Adelaide M Arruda-Olson
- Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA; Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, USA.
21
Chen D, Liu S, Kingsbury P, Sohn S, Storlie CB, Habermann EB, Naessens JM, Larson DW, Liu H. Deep learning and alternative learning strategies for retrospective real-world clinical data. NPJ Digit Med 2019; 2:43. [PMID: 31304389 PMCID: PMC6550223 DOI: 10.1038/s41746-019-0122-0] [Citation(s) in RCA: 108] [Impact Index Per Article: 21.6] [Received: 10/23/2018] [Accepted: 05/09/2019] [Indexed: 02/06/2023] Open
Abstract
In recent years, there has been increasing enthusiasm in the healthcare research community for artificial intelligence to provide big data analytics and augment decision making. One of the prime reasons for this is the enormous impact of deep learning on the use of complex healthcare big data. Although deep learning is a powerful analytic tool for the complex data contained in electronic health records (EHRs), it also has limitations that can make it an inferior choice in some healthcare applications. In this paper, we give a brief overview of these limitations, illustrated through case studies done over the years, with the aim of promoting the consideration of alternative analytic strategies for healthcare.
Affiliation(s)
- David Chen
  - Division of Digital Health Sciences, Mayo Clinic, Rochester, MN USA
- Sijia Liu
  - Division of Digital Health Sciences, Mayo Clinic, Rochester, MN USA
- Paul Kingsbury
  - Division of Digital Health Sciences, Mayo Clinic, Rochester, MN USA
- Sunghwan Sohn
  - Division of Digital Health Sciences, Mayo Clinic, Rochester, MN USA
- Curtis B. Storlie
  - Department of Health Science Research, Mayo Clinic, Rochester, MN USA
- James M. Naessens
  - Department of Health Science Research, Mayo Clinic, Rochester, MN USA
- David W. Larson
  - Department of Colorectal Surgery, Mayo Clinic, Rochester, MN USA
- Hongfang Liu
  - Division of Digital Health Sciences, Mayo Clinic, Rochester, MN USA
22
Moskovitch R, Shahar Y, Wang F, Hripcsak G. Temporal biomedical data analytics. J Biomed Inform 2019; 90:103092. [PMID: 30654029 PMCID: PMC9745669 DOI: 10.1016/j.jbi.2018.12.006] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 12/22/2018] [Accepted: 12/24/2018] [Indexed: 02/07/2023]
Affiliation(s)
- Robert Moskovitch
  - Department of Information Systems Engineering, Ben Gurion University of the Negev, Beersheba, Israel
- Yuval Shahar
  - Department of Information Systems Engineering, Ben Gurion University of the Negev, Beersheba, Israel
- Fei Wang
  - Department of Healthcare Policy and Research, Weill Cornell Medical College, Cornell University, New York, NY, USA
- George Hripcsak
  - Department of Biomedical Informatics, Columbia University, New York, NY, USA