Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Zhao J, Papapetrou P, Asker L, Boström H. Learning from heterogeneous temporal data in electronic health records. J Biomed Inform 2016;65:105-119. [PMID: 27919732 DOI: 10.1016/j.jbi.2016.11.006] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.9] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/14/2016] [Revised: 10/21/2016] [Accepted: 11/21/2016] [Indexed: 11/30/2022]

For:	Zhao J, Papapetrou P, Asker L, Boström H. Learning from heterogeneous temporal data in electronic health records. J Biomed Inform 2016;65:105-119. [PMID: 27919732 DOI: 10.1016/j.jbi.2016.11.006] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.9] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/14/2016] [Revised: 10/21/2016] [Accepted: 11/21/2016] [Indexed: 11/30/2022]

Number

Cited by Other Article(s)

Xie J, Wang Z, Yu Z, Ding Y, Guo B. Prototype Learning for Medical Time Series Classification via Human-Machine Collaboration. SENSORS (BASEL, SWITZERLAND) 2024;24:2655. [PMID: 38676273 PMCID: PMC11054195 DOI: 10.3390/s24082655] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/06/2024] [Revised: 04/15/2024] [Accepted: 04/15/2024] [Indexed: 04/28/2024]

Abstract

Deep neural networks must address the dual challenge of delivering high-accuracy predictions and providing user-friendly explanations. While deep models are widely used in the field of time series modeling, deciphering the core principles that govern the models' outputs remains a significant challenge. This is crucial for fostering the development of trusted models and facilitating domain expert validation, thereby empowering users and domain experts to utilize them confidently in high-risk decision-making contexts (e.g., decision-support systems in healthcare). In this work, we put forward a deep prototype learning model that supports interpretable and manipulable modeling and classification of medical time series (i.e., ECG signal). Specifically, we first optimize the representation of single heartbeat data by employing a bidirectional long short-term memory and attention mechanism, and then construct prototypes during the training phase. The final classification outcomes (i.e., normal sinus rhythm, atrial fibrillation, and other rhythm) are determined by comparing the input with the obtained prototypes. Moreover, the proposed model presents a human-machine collaboration mechanism, allowing domain experts to refine the prototypes by integrating their expertise to further enhance the model's performance (contrary to the human-in-the-loop paradigm, where humans primarily act as supervisors or correctors, intervening when required, our approach focuses on a human-machine collaboration, wherein both parties engage as partners, enabling more fluid and integrated interactions). The experimental outcomes presented herein delineate that, within the realm of binary classification tasks-specifically distinguishing between normal sinus rhythm and atrial fibrillation-our proposed model, albeit registering marginally lower performance in comparison to certain established baseline models such as Convolutional Neural Networks (CNNs) and bidirectional long short-term memory with attention mechanisms (Bi-LSTMAttns), evidently surpasses other contemporary state-of-the-art prototype baseline models. Moreover, it demonstrates significantly enhanced performance relative to these prototype baseline models in the context of triple classification tasks, which encompass normal sinus rhythm, atrial fibrillation, and other rhythm classifications. The proposed model manifests a commendable prediction accuracy of 0.8414, coupled with macro precision, recall, and F1-score metrics of 0.8449, 0.8224, and 0.8235, respectively, achieving both high classification accuracy as well as good interpretability.

Collapse

Grzenda A, Widge AS. Electronic health records and stratified psychiatry: bridge to precision treatment? Neuropsychopharmacology 2024;49:285-290. [PMID: 37667021 PMCID: PMC10700348 DOI: 10.1038/s41386-023-01724-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 08/11/2023] [Revised: 08/24/2023] [Accepted: 08/27/2023] [Indexed: 09/06/2023]

Berge GT, Granmo OC, Tveit TO, Ruthjersen AL, Sharma J. Combining unsupervised, supervised and rule-based learning: the case of detecting patient allergies in electronic health records. BMC Med Inform Decis Mak 2023;23:188. [PMID: 37723446 PMCID: PMC10507898 DOI: 10.1186/s12911-023-02271-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/01/2022] [Accepted: 08/17/2023] [Indexed: 09/20/2023] Open

Abstract

BACKGROUND

Data mining of electronic health records (EHRs) has a huge potential for improving clinical decision support and to help healthcare deliver precision medicine. Unfortunately, the rule-based and machine learning-based approaches used for natural language processing (NLP) in healthcare today all struggle with various shortcomings related to performance, efficiency, or transparency.

METHODS

In this paper, we address these issues by presenting a novel method for NLP that implements unsupervised learning of word embeddings, semi-supervised learning for simplified and accelerated clinical vocabulary and concept building, and deterministic rules for fine-grained control of information extraction. The clinical language is automatically learnt, and vocabulary, concepts, and rules supporting a variety of NLP downstream tasks can further be built with only minimal manual feature engineering and tagging required from clinical experts. Together, these steps create an open processing pipeline that gradually refines the data in a transparent way, which greatly improves the interpretable nature of our method. Data transformations are thus made transparent and predictions interpretable, which is imperative for healthcare. The combined method also has other advantages, like potentially being language independent, demanding few domain resources for maintenance, and able to cover misspellings, abbreviations, and acronyms. To test and evaluate the combined method, we have developed a clinical decision support system (CDSS) named Information System for Clinical Concept Searching (ICCS) that implements the method for clinical concept tagging, extraction, and classification.

RESULTS

In empirical studies the method shows high performance (recall 92.6%, precision 88.8%, F-measure 90.7%), and has demonstrated its value to clinical practice. Here we employ a real-life EHR-derived dataset to evaluate the method's performance on the task of classification (i.e., detecting patient allergies) against a range of common supervised learning algorithms. The combined method achieves state-of-the-art performance compared to the alternative methods we evaluate. We also perform a qualitative analysis of common word embedding methods on the task of word similarity to examine their potential for supporting automatic feature engineering for clinical NLP tasks.

CONCLUSIONS

Based on the promising results, we suggest more research should be aimed at exploiting the inherent synergies between unsupervised, supervised, and rule-based paradigms for clinical NLP.

Collapse

Pungitore S, Subbian V. Assessment of Prediction Tasks and Time Window Selection in Temporal Modeling of Electronic Health Record Data: a Systematic Review. JOURNAL OF HEALTHCARE INFORMATICS RESEARCH 2023;7:313-331. [PMID: 37637723 PMCID: PMC10449760 DOI: 10.1007/s41666-023-00143-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/08/2022] [Revised: 04/12/2023] [Accepted: 07/28/2023] [Indexed: 08/29/2023]

Abstract

Temporal electronic health record (EHR) data are often preferred for clinical prediction tasks because they offer more complete representations of a patient's pathophysiology than static data. A challenge when working with temporal EHR data is problem formulation, which includes defining the time windows of interest and the prediction task. Our objective was to conduct a systematic review that assessed the definition and reporting of concepts relevant to temporal clinical prediction tasks. We searched PubMed® and IEEE Xplore® databases for studies from January 1, 2010 applying machine learning models to EHR data for patient outcome prediction. Publications applying time-series methods were selected for further review. We identified 92 studies and summarized them by clinical context and definition and reporting of the prediction problem. For the time windows of interest, 12 studies did not discuss window lengths, 57 used a single set of window lengths, and 23 evaluated the relationship between window length and model performance. We also found that 72 studies had appropriate reporting of the prediction task. However, evaluation of prediction problem formulation for temporal EHR data was complicated by heterogeneity in assessing and reporting of these concepts. Even among studies modeling similar clinical outcomes, there were variations in terminology used to describe the prediction problem, rationale for window lengths, and determination of the outcome of interest. As temporal modeling using EHR data expands, minimal reporting standards should include time-series specific concerns to promote rigor and reproducibility in future studies and facilitate model implementation in clinical settings.

Supplementary Information

The online version contains supplementary material available at 10.1007/s41666-023-00143-4.

Collapse

Herrero-Zazo M, Fitzgerald T, Taylor V, Street H, Chaudhry AN, Bradley JR, Birney E, Keevil VL. Using machine learning to model older adult inpatient trajectories from electronic health records data. iScience 2022;26:105876. [PMID: 36691609 PMCID: PMC9860485 DOI: 10.1016/j.isci.2022.105876] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/10/2022] [Revised: 10/25/2022] [Accepted: 12/20/2022] [Indexed: 12/26/2022] Open

Vauvelle A, Creed P, Denaxas S. Neural-signature methods for structured EHR prediction. BMC Med Inform Decis Mak 2022;22:320. [PMID: 36476601 PMCID: PMC9730578 DOI: 10.1186/s12911-022-02055-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/24/2022] [Accepted: 11/17/2022] [Indexed: 12/12/2022] Open

Rabhi S, Blanchard F, Diallo AM, Zeghlache D, Lukas C, Berot A, Delemer B, Barraud S. Temporal deep learning framework for retinopathy prediction in patients with type 1 diabetes. Artif Intell Med 2022;133:102408. [PMID: 36328668 DOI: 10.1016/j.artmed.2022.102408] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/24/2021] [Revised: 09/17/2022] [Accepted: 09/21/2022] [Indexed: 12/13/2022]

Affiliation(s)

Sara Rabhi Department RS2M, Télécom SudParis, 9 rue Charles Fourier, Evry, 91000, France.
Frédéric Blanchard CRESTIC EA 3804, Université Reims Champagne-Ardenne, UFR Sciences Exactes et Naturelles, Moulin de la Housse, 51687, Reims, France
Alpha Mamadou Diallo CHU de Reims - Hôpital Robert Debré, Service d'Endocrinologie - Diabète - Nutrition, Avenue du Général Koenig, 51092, Reims, France; Laboratoire de recherche en Santé Publique, Vieillissement, Qualité de vie et Réadaptation des Sujets Fragiles, EA 3797, Université Reims Champagne-Ardenne, 51092, Reims, France
Djamal Zeghlache Department RS2M, Télécom SudParis, 9 rue Charles Fourier, Evry, 91000, France
Céline Lukas CHU de Reims - Hôpital Robert Debré, Service d'Endocrinologie - Diabète - Nutrition, Avenue du Général Koenig, 51092, Reims, France; Laboratoire de recherche en Santé Publique, Vieillissement, Qualité de vie et Réadaptation des Sujets Fragiles, EA 3797, Université Reims Champagne-Ardenne, 51092, Reims, France
Aurélie Berot CHU de Reims - American Memorial Hospital - Service de Pédiatrie, 47 rue Cognac Jay, 51092, Reims, France; Laboratoire d'Education et Pratiques de Santé, EA 3412, Université Sorbonne Paris Nord, 74 rue Marcel Cachin, 93017, Bobigny, France
Brigitte Delemer CRESTIC EA 3804, Université Reims Champagne-Ardenne, UFR Sciences Exactes et Naturelles, Moulin de la Housse, 51687, Reims, France; CHU de Reims - Hôpital Robert Debré, Service d'Endocrinologie - Diabète - Nutrition, Avenue du Général Koenig, 51092, Reims, France
Sara Barraud CRESTIC EA 3804, Université Reims Champagne-Ardenne, UFR Sciences Exactes et Naturelles, Moulin de la Housse, 51687, Reims, France; CHU de Reims - Hôpital Robert Debré, Service d'Endocrinologie - Diabète - Nutrition, Avenue du Général Koenig, 51092, Reims, France

Collapse

A novel kernel based approach to arbitrary length symbolic data with application to type 2 diabetes risk. Sci Rep 2022;12:4985. [PMID: 35322076 PMCID: PMC8943170 DOI: 10.1038/s41598-022-08757-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/09/2021] [Accepted: 03/07/2022] [Indexed: 11/08/2022] Open

Huang Y, Zheng Z, Ma M, Xin X, Liu H, Fei X, Wei L, Chen H. Improving Performance of Outcome Prediction for In-patients with Acute Myocardial Infarction Based on Embedding Representation Learned from Electronic Medical Records: Development and Validation Study (Preprint). J Med Internet Res 2022;24:e37486. [PMID: 35921141 PMCID: PMC9386580 DOI: 10.2196/37486] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/22/2022] [Revised: 06/02/2022] [Accepted: 07/18/2022] [Indexed: 11/18/2022] Open

Abstract

Background

The widespread secondary use of electronic medical records (EMRs) promotes health care quality improvement. Representation learning that can automatically extract hidden information from EMR data has gained increasing attention.

Objective

We aimed to propose a patient representation with more feature associations and task-specific feature importance to improve the outcome prediction performance for inpatients with acute myocardial infarction (AMI).

Methods

Medical concepts, including patients’ age, gender, disease diagnoses, laboratory tests, structured radiological features, procedures, and medications, were first embedded into real-value vectors using the improved skip-gram algorithm, where concepts in the context windows were selected by feature association strengths measured by association rule confidence. Then, each patient was represented as the sum of the feature embeddings weighted by the task-specific feature importance, which was applied to facilitate predictive model prediction from global and local perspectives. We finally applied the proposed patient representation into mortality risk prediction for 3010 and 1671 AMI inpatients from a public data set and a private data set, respectively, and compared it with several reference representation methods in terms of the area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC), and F1-score.

Results

Compared with the reference methods, the proposed embedding-based representation showed consistently superior predictive performance on the 2 data sets, achieving mean AUROCs of 0.878 and 0.973, AUPRCs of 0.220 and 0.505, and F1-scores of 0.376 and 0.674 for the public and private data sets, respectively, while the greatest AUROCs, AUPRCs, and F1-scores among the reference methods were 0.847 and 0.939, 0.196 and 0.283, and 0.344 and 0.361 for the public and private data sets, respectively. Feature importance integrated in patient representation reflected features that were also critical in prediction tasks and clinical practice.

Conclusions

The introduction of feature associations and feature importance facilitated an effective patient representation and contributed to prediction performance improvement and model interpretation.

Collapse

Xie F, Yuan H, Ning Y, Ong MEH, Feng M, Hsu W, Chakraborty B, Liu N. Deep learning for temporal data representation in electronic health records: A systematic review of challenges and methodologies. J Biomed Inform 2021;126:103980. [PMID: 34974189 DOI: 10.1016/j.jbi.2021.103980] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/23/2021] [Revised: 11/07/2021] [Accepted: 12/20/2021] [Indexed: 12/21/2022]

Yang YC, Islam SU, Noor A, Khan S, Afsar W, Nazir S. Influential Usage of Big Data and Artificial Intelligence in Healthcare. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2021;2021:5812499. [PMID: 34527076 PMCID: PMC8437645 DOI: 10.1155/2021/5812499] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/08/2021] [Accepted: 08/09/2021] [Indexed: 01/07/2023]

Estiri H, Strasser ZH, Murphy SN. High-throughput phenotyping with temporal sequences. J Am Med Inform Assoc 2021;28:772-781. [PMID: 33313899 DOI: 10.1093/jamia/ocaa288] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/01/2020] [Accepted: 11/04/2020] [Indexed: 12/15/2022] Open

Abstract

OBJECTIVE

High-throughput electronic phenotyping algorithms can accelerate translational research using data from electronic health record (EHR) systems. The temporal information buried in EHRs is often underutilized in developing computational phenotypic definitions. This study aims to develop a high-throughput phenotyping method, leveraging temporal sequential patterns from EHRs.

MATERIALS AND METHODS

We develop a representation mining algorithm to extract 5 classes of representations from EHR diagnosis and medication records: the aggregated vector of the records (aggregated vector representation), the standard sequential patterns (sequential pattern mining), the transitive sequential patterns (transitive sequential pattern mining), and 2 hybrid classes. Using EHR data on 10 phenotypes from the Mass General Brigham Biobank, we train and validate phenotyping algorithms.

RESULTS

Phenotyping with temporal sequences resulted in a superior classification performance across all 10 phenotypes compared with the standard representations in electronic phenotyping. The high-throughput algorithm's classification performance was superior or similar to the performance of previously published electronic phenotyping algorithms. We characterize and evaluate the top transitive sequences of diagnosis records paired with the records of risk factors, symptoms, complications, medications, or vaccinations.

DISCUSSION

The proposed high-throughput phenotyping approach enables seamless discovery of sequential record combinations that may be difficult to assume from raw EHR data. Transitive sequences offer more accurate characterization of the phenotype, compared with its individual components, and reflect the actual lived experiences of the patients with that particular disease.

CONCLUSION

Sequential data representations provide a precise mechanism for incorporating raw EHR records into downstream machine learning. Our approach starts with user interpretability and works backward to the technology.

Collapse

Giangreco NP, Tatonetti NP. Evaluating risk detection methods to uncover ontogenic-mediated adverse drug effect mechanisms in children. BioData Min 2021;14:34. [PMID: 34294093 PMCID: PMC8296590 DOI: 10.1186/s13040-021-00264-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/26/2021] [Accepted: 06/16/2021] [Indexed: 02/08/2023] Open

Alhassan Z, Watson M, Budgen D, Alshammari R, Alessa A, Al Moubayed N. Improving Current Glycated Hemoglobin Prediction in Adults: Use of Machine Learning Algorithms With Electronic Health Records. JMIR Med Inform 2021;9:e25237. [PMID: 34028357 PMCID: PMC8185616 DOI: 10.2196/25237] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/23/2020] [Revised: 01/05/2021] [Accepted: 04/22/2021] [Indexed: 01/30/2023] Open

Abstract

Background

Predicting the risk of glycated hemoglobin (HbA_1c) elevation can help identify patients with the potential for developing serious chronic health problems, such as diabetes. Early preventive interventions based upon advanced predictive models using electronic health records data for identifying such patients can ultimately help provide better health outcomes.

Objective

Our study investigated the performance of predictive models to forecast HbA_1c elevation levels by employing several machine learning models. We also examined the use of patient electronic health record longitudinal data in the performance of the predictive models. Explainable methods were employed to interpret the decisions made by the black box models.

Methods

This study employed multiple logistic regression, random forest, support vector machine, and logistic regression models, as well as a deep learning model (multilayer perceptron) to classify patients with normal (<5.7%) and elevated (≥5.7%) levels of HbA_1c. We also integrated current visit data with historical (longitudinal) data from previous visits. Explainable machine learning methods were used to interrogate the models and provide an understanding of the reasons behind the decisions made by the models. All models were trained and tested using a large data set from Saudi Arabia with 18,844 unique patient records.

Results

The machine learning models achieved promising results for predicting current HbA_1c elevation risk. When coupled with longitudinal data, the machine learning models outperformed the multiple logistic regression model used in the comparative study. The multilayer perceptron model achieved an accuracy of 83.22% for the area under receiver operating characteristic curve when used with historical data. All models showed a close level of agreement on the contribution of random blood sugar and age variables with and without longitudinal data.

Conclusions

This study shows that machine learning models can provide promising results for the task of predicting current HbA_1c levels (≥5.7% or less). Using patients’ longitudinal data improved the performance and affected the relative importance for the predictors used. The models showed results that are consistent with comparable studies.

Collapse

Beauchemin M, Weng C, Sung L, Pichon A, Abbott M, Hershman DL, Schnall R. Data Quality of Chemotherapy-Induced Nausea and Vomiting Documentation. Appl Clin Inform 2021;12:320-328. [PMID: 33882585 PMCID: PMC8060070 DOI: 10.1055/s-0041-1728698] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/29/2020] [Accepted: 03/02/2021] [Indexed: 02/08/2023] Open

Ljubic B, Roychoudhury S, Cao XH, Pavlovski M, Obradovic S, Nair R, Glass L, Obradovic Z. Influence of medical domain knowledge on deep learning for Alzheimer's disease prediction. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2020;197:105765. [PMID: 33011665 PMCID: PMC7502243 DOI: 10.1016/j.cmpb.2020.105765] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/23/2020] [Accepted: 09/16/2020] [Indexed: 05/03/2023]

Abstract

BACKGROUND AND OBJECTIVE

Alzheimer's disease (AD) is the most common type of dementia that can seriously affect a person's ability to perform daily activities. Estimates indicate that AD may rank third as a cause of death for older people, after heart disease and cancer. Identification of individuals at risk for developing AD is imperative for testing therapeutic interventions. The objective of the study was to determine could diagnostics of AD from EMR data alone (without relying on diagnostic imaging) be significantly improved by applying clinical domain knowledge in data preprocessing and positive dataset selection rather than setting naïve filters.

METHODS

Data were extracted from the repository of heterogeneous ambulatory EMR data, collected from primary care medical offices all over the U.S. Medical domain knowledge was applied to build a positive dataset from data relevant to AD. Selected Clinically Relevant Positive (SCRP) datasets were used as inputs to a Long-Short-Term Memory (LSTM) Recurrent Neural Network (RNN) deep learning model to predict will the patient develop AD.

RESULTS

Risk scores prediction of AD using the drugs domain information in an SCRP AD dataset of 2,324 patients achieved high out-of-sample score - 0.98-0.99 Area Under the Precision-Recall Curve (AUPRC) when using 90% of SCRP dataset for training. AUPRC dropped to 0.89 when training the model using less than 1,500 cases from the SCRP dataset. The model was still significantly better than when using naïve dataset selection.

CONCLUSION

The LSTM RNN method that used data relevant to AD performed significantly better when learning from the SCRP dataset than when datasets were selected naïvely. The integration of qualitative medical knowledge for dataset selection and deep learning technology provided a mechanism for significant improvement of AD prediction. Accurate and early prediction of AD is significant in the identification of patients for clinical trials, which can possibly result in the discovery of new drugs for treatments of AD. Also, the contribution of the proposed predictions of AD is a better selection of patients who need imaging diagnostics for differential diagnosis of AD from other degenerative brain disorders.

Collapse

Quantitative and temporal approach to utilising electronic medical records from general practices in mental health prediction. Comput Biol Med 2020;125:103973. [DOI: 10.1016/j.compbiomed.2020.103973] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/17/2020] [Revised: 08/11/2020] [Accepted: 08/11/2020] [Indexed: 01/06/2023]

Estiri H, Strasser ZH, Klann JG, McCoy TH, Wagholikar KB, Vasey S, Castro VM, Murphy ME, Murphy SN. Transitive Sequencing Medical Records for Mining Predictive and Interpretable Temporal Representations. PATTERNS (NEW YORK, N.Y.) 2020;1:100051. [PMID: 32835307 PMCID: PMC7301790 DOI: 10.1016/j.patter.2020.100051] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/13/2020] [Revised: 04/27/2020] [Accepted: 05/26/2020] [Indexed: 12/13/2022]

Affiliation(s)

Hossein Estiri Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA 02144, USA Research Information Science and Computing, Mass General Brigham, Somerville, MA 02145, USA Harvard Medical School, Boston, MA 02115, USA
Zachary H. Strasser Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA 02144, USA Research Information Science and Computing, Mass General Brigham, Somerville, MA 02145, USA Harvard Medical School, Boston, MA 02115, USA Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
Jeffery G. Klann Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA 02144, USA Research Information Science and Computing, Mass General Brigham, Somerville, MA 02145, USA Harvard Medical School, Boston, MA 02115, USA
Thomas H. McCoy Harvard Medical School, Boston, MA 02115, USA Center for Quantitative Health, Massachusetts General Hospital, Boston, MA 02114, USA
Kavishwar B. Wagholikar Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA 02144, USA Research Information Science and Computing, Mass General Brigham, Somerville, MA 02145, USA Harvard Medical School, Boston, MA 02115, USA
Sebastien Vasey Department of Mathematics, Harvard University, Cambridge, MA 02138, USA
Victor M. Castro Research Information Science and Computing, Mass General Brigham, Somerville, MA 02145, USA
MaryKate E. Murphy Research Information Science and Computing, Mass General Brigham, Somerville, MA 02145, USA
Shawn N. Murphy Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA 02144, USA Research Information Science and Computing, Mass General Brigham, Somerville, MA 02145, USA Harvard Medical School, Boston, MA 02115, USA Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA Department of Neurology, Massachusetts General Hospital, Boston, MA 02114, USA

Collapse

Gao S, Alawad M, Schaefferkoetter N, Penberthy L, Wu XC, Durbin EB, Coyle L, Ramanathan A, Tourassi G. Using case-level context to classify cancer pathology reports. PLoS One 2020;15:e0232840. [PMID: 32396579 PMCID: PMC7217446 DOI: 10.1371/journal.pone.0232840] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/13/2019] [Accepted: 04/22/2020] [Indexed: 11/18/2022] Open

Combi C, Pozzi G. Clinical Information Systems and Artificial Intelligence: Recent Research Trends. Yearb Med Inform 2019;28:83-94. [PMID: 31419820 PMCID: PMC6697548 DOI: 10.1055/s-0039-1677915] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022] Open

Smooth Bayesian network model for the prediction of future high-cost patients with COPD. Int J Med Inform 2019;126:147-155. [PMID: 31029256 DOI: 10.1016/j.ijmedinf.2019.03.017] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/05/2018] [Revised: 02/28/2019] [Accepted: 03/26/2019] [Indexed: 02/05/2023]

Abstract

INTRODUCTION

The clinical course of chronic obstructive pulmonary disease (COPD) is marked by acute exacerbation events that increase hospitalization rates and healthcare spending. The early identification of future high-cost patients with COPD may decrease healthcare spending by informing individualized interventions that prevent exacerbation events and decelerate disease progression. Existing studies of cost prediction of other chronic diseases have applied regression and machine-learning methods that cannot capture the complex causal relationships between COPD factors. Thus, the exploration of these factors through nonlinear, high-dimensional but explainable modeling is greatly needed.

OBJECTIVES

We aimed to develop a machine-learning model to identify future high-cost patients with COPD. Such a model should incorporate expert knowledge about causal relationships, and the method for estimating the model could provide more accurate predictions than other machine learning methods.

METHODS

We used the 2011-2013 medical insurance data of patients with COPD in a large city. The data set included demographic information and admission records. Leveraging on developments in graphical modeling methods, we proposed a smooth Bayesian network (SBN) model for the prediction of high-cost individuals using medical insurance data. The modeling method incorporated some expert knowledge about causal relationships (i.e., about the Bayesian network structure). We employed a smoothing kernel based on the weighted nearest neighborhood method in the SBN model to address overfitting, case-mix effect, and data sparsity (i.e., using data about "similar patients").

RESULTS

The proposed SBN achieved the area under curve (AUC) of 0.80 and showed considerable improvement over the baseline machine-learning methods. Besides confirming the known factors from the literature, we found "region" (i.e., a suburban or urban area) to be a significant factor, and that in a 3-tier system with primary, secondary and tertiary hospitals, COPD patients who had been admitted to primary hospitals were more likely to develop into future high-cost patients than patients who had been admitted to tertiary hospitals.

CONCLUSION

The proposed SBN model not only obtained higher prediction accuracy and stronger generalizability than a number of benchmark machine-learning methods, but also used the Bayesian network to capture the complex causal relationships between different predictors by incorporating expert knowledge. Furthermore, a framework was developed to establish the relationships between exposure to historical trajectory and future outcome, which can also be applied to other temporal data to model different trajectory information and predict other outcomes.

Collapse

J48SS: A Novel Decision Tree Approach for the Handling of Sequential and Time Series Data. COMPUTERS 2019. [DOI: 10.3390/computers8010021] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]

Bagattini F, Karlsson I, Rebane J, Papapetrou P. A classification framework for exploiting sparse multi-variate temporal features with application to adverse drug event detection in medical records. BMC Med Inform Decis Mak 2019;19:7. [PMID: 30630486 PMCID: PMC6327495 DOI: 10.1186/s12911-018-0717-4] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/04/2018] [Accepted: 12/04/2018] [Indexed: 12/13/2022] Open

Abstract

BACKGROUND

Adverse drug events (ADEs) as well as other preventable adverse events in the hospital setting incur a yearly monetary cost of approximately $3.5 billion, in the United States alone. Therefore, it is of paramount importance to reduce the impact and prevalence of ADEs within the healthcare sector, not only since it will result in reducing human suffering, but also as a means to substantially reduce economical strains on the healthcare system. One approach to mitigate this problem is to employ predictive models. While existing methods have been focusing on the exploitation of static features, limited attention has been given to temporal features.

METHODS

In this paper, we present a novel classification framework for detecting ADEs in complex Electronic health records (EHRs) by exploiting the temporality and sparsity of the underlying features. The proposed framework consists of three phases for transforming sparse and multi-variate time series features into a single-valued feature representation, which can then be used by any classifier. Moreover, we propose and evaluate three different strategies for leveraging feature sparsity by incorporating it into the new representation.

RESULTS

A large-scale evaluation on 15 ADE datasets extracted from a real-world EHR system shows that the proposed framework achieves significantly improved predictive performance compared to state-of-the-art. Moreover, our framework can reveal features that are clinically consistent with medical findings on ADE detection.

CONCLUSIONS

Our study and experimental findings demonstrate that temporal multi-variate features of variable length and with high sparsity can be effectively utilized to predict ADEs from EHRs. Two key advantages of our framework are that it is method agnostic, i.e., versatile, and of low computational cost, i.e., fast; hence providing an important building block for future exploitation within the domain of machine learning from EHRs.

Collapse

Wu S, Liu S, Sohn S, Moon S, Wi CI, Juhn Y, Liu H. Modeling asynchronous event sequences with RNNs. J Biomed Inform 2018;83:167-177. [PMID: 29883623 PMCID: PMC6103779 DOI: 10.1016/j.jbi.2018.05.016] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/26/2017] [Revised: 05/10/2018] [Accepted: 05/26/2018] [Indexed: 12/14/2022]

Valmarska A, Miljkovic D, Lavrač N, Robnik-Šikonja M. Analysis of medications change in Parkinson’s disease progression data. J Intell Inf Syst 2018. [DOI: 10.1007/s10844-018-0502-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]

Schuler A, Callahan A, Jung K, Shah NH. Performing an Informatics Consult: Methods and Challenges. J Am Coll Radiol 2018;15:563-568. [PMID: 29396125 PMCID: PMC5901653 DOI: 10.1016/j.jacr.2017.12.023] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/14/2017] [Accepted: 12/15/2017] [Indexed: 12/24/2022]

Verma SS, Ritchie MD. Another Round of "Clue" to Uncover the Mystery of Complex Traits. Genes (Basel) 2018;9:E61. [PMID: 29370075 PMCID: PMC5852557 DOI: 10.3390/genes9020061] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/01/2017] [Revised: 12/19/2017] [Accepted: 01/15/2018] [Indexed: 12/13/2022] Open

Pham T, Tran T, Phung D, Venkatesh S. Predicting healthcare trajectories from medical records: A deep learning approach. J Biomed Inform 2017;69:218-229. [PMID: 28410981 DOI: 10.1016/j.jbi.2017.04.001] [Citation(s) in RCA: 141] [Impact Index Per Article: 20.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/10/2016] [Revised: 03/23/2017] [Accepted: 04/01/2017] [Indexed: 11/18/2022]