Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Rasmy L, Tiryaki F, Zhou Y, Xiang Y, Tao C, Xu H, Zhi D. Representation of EHR data for predictive modeling: a comparison between UMLS and other terminologies. J Am Med Inform Assoc 2020;27:1593-1599. [PMID: 32930711 PMCID: PMC7647355 DOI: 10.1093/jamia/ocaa180] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 02/11/2020] [Accepted: 07/24/2020] [Indexed: 01/23/2023]

Cited by Other Article(s)

Yoon W, Chen S, Gao Y, Zhao Z, Dligach D, Bitterman DS, Afshar M, Miller T. LCD Benchmark: Long Clinical Document Benchmark on Mortality Prediction for Language Models. medRxiv 2024:2024.03.26.24304920. [PMID: 38585973 PMCID: PMC10996733 DOI: 10.1101/2024.03.26.24304920] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/09/2024]

Malecki SL, Loffler A, Tamming D, Dyrby Johansen N, Biering-Sørensen T, Fralick M, Sohail S, Shi J, Roberts SB, Colacci M, Ismail M, Razak F, Verma AA. Development and external validation of tools for categorizing diagnosis codes in international hospital data. Int J Med Inform 2024;189:105508. [PMID: 38851134 DOI: 10.1016/j.ijmedinf.2024.105508] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2023] [Revised: 03/17/2024] [Accepted: 05/27/2024] [Indexed: 06/10/2024]

Affiliation(s)

Sarah L Malecki: Department of Medicine, University of Toronto, Toronto, ON, Canada
Anne Loffler: St. Michael's Hospital, University of Toronto, Toronto, ON, Canada
Daniel Tamming: St. Michael's Hospital, University of Toronto, Toronto, ON, Canada
Niklas Dyrby Johansen: Department of Cardiology, Copenhagen University Hospital - Herlev and Gentofte, Copenhagen, Denmark; Center for Translational Cardiology and Pragmatic Randomized Trials, Department of Biomedical Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Denmark
Tor Biering-Sørensen: Department of Cardiology, Copenhagen University Hospital - Herlev and Gentofte, Copenhagen, Denmark; Center for Translational Cardiology and Pragmatic Randomized Trials, Department of Biomedical Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Denmark
Michael Fralick: Division of General Internal Medicine, Sinai Health System, Toronto, ON, Canada; Institute of Health Policy, Management, and Evaluation, University of Toronto, Toronto, ON, Canada
Shahmir Sohail: Department of Medicine, University of Toronto, Toronto, ON, Canada
Jessica Shi: St. Michael's Hospital, University of Toronto, Toronto, ON, Canada
Surain B Roberts: St. Michael's Hospital, University of Toronto, Toronto, ON, Canada
Michael Colacci: Department of Medicine, University of Toronto, Toronto, ON, Canada; St. Michael's Hospital, University of Toronto, Toronto, ON, Canada; Institute of Health Policy, Management, and Evaluation, University of Toronto, Toronto, ON, Canada
Marwa Ismail: St. Michael's Hospital, University of Toronto, Toronto, ON, Canada
Fahad Razak: Department of Medicine, University of Toronto, Toronto, ON, Canada; St. Michael's Hospital, University of Toronto, Toronto, ON, Canada; Institute of Health Policy, Management, and Evaluation, University of Toronto, Toronto, ON, Canada
Amol A Verma: Department of Medicine, University of Toronto, Toronto, ON, Canada; St. Michael's Hospital, University of Toronto, Toronto, ON, Canada; Institute of Health Policy, Management, and Evaluation, University of Toronto, Toronto, ON, Canada; Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, ON, Canada

Mishra AK, Chong B, Arunachalam SP, Oberg AL, Majumder S. Machine Learning Models for Pancreatic Cancer Risk Prediction Using Electronic Health Record Data-A Systematic Review and Assessment. Am J Gastroenterol 2024:00000434-990000000-01167. [PMID: 38752654 DOI: 10.14309/ajg.0000000000002870] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2023] [Accepted: 05/06/2024] [Indexed: 06/20/2024]

Abstract

INTRODUCTION

Accurate risk prediction can facilitate screening and early detection of pancreatic cancer (PC). We conducted a systematic review to critically evaluate the effectiveness of machine learning (ML) and artificial intelligence (AI) techniques applied to electronic health records (EHR) for PC risk prediction.

METHODS

Ovid MEDLINE(R), Ovid EMBASE, Ovid Cochrane Central Register of Controlled Trials, Ovid Cochrane Database of Systematic Reviews, Scopus, and Web of Science were searched for articles that utilized ML/AI techniques to predict PC, published between January 1, 2012, and February 1, 2024. Study selection and data extraction were conducted by 2 independent reviewers. Critical appraisal and data extraction were performed using the CHecklist for critical Appraisal and data extraction for systematic Reviews of prediction Modelling Studies (CHARMS). Risk of bias and applicability were examined using the Prediction model Risk Of Bias ASsessment Tool (PROBAST).

RESULTS

Thirty studies including 169,149 PC cases were identified. Logistic regression was the most frequent modeling method. Twenty studies utilized a curated set of known PC risk predictors or those identified by clinical experts. ML model discrimination performance (C-index) ranged from 0.57 to 1.0. Missing data were underreported, and most studies did not implement explainable-AI techniques or report exclusion time intervals.

DISCUSSION

AI/ML models for PC risk prediction using known risk factors perform reasonably well and may have near-term applications in identifying cohorts for targeted PC screening if validated in real-world data sets. The combined use of structured and unstructured EHR data using emerging AI models while incorporating explainable-AI techniques has the potential to identify novel PC risk factors, and this approach merits further study.

Swaminathan A, Lopez I, Wang W, Srivastava U, Tran E, Bhargava-Shah A, Wu JY, Ren AL, Caoili K, Bui B, Alkhani L, Lee S, Mohit N, Seo N, Macedo N, Cheng W, Liu C, Thomas R, Chen JH, Gevaert O. Selective prediction for extracting unstructured clinical data. J Am Med Inform Assoc 2023;31:188-197. [PMID: 37769323 PMCID: PMC10746316 DOI: 10.1093/jamia/ocad182] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2023] [Revised: 08/21/2023] [Accepted: 08/24/2023] [Indexed: 09/30/2023]

Affiliation(s)

Akshay Swaminathan: Stanford University School of Medicine, Stanford, CA, United States; Cerebral Inc., Claymont, DE, United States
Ivan Lopez: Stanford University School of Medicine, Stanford, CA, United States; Cerebral Inc., Claymont, DE, United States
William Wang: Department of Biology, Stanford University, Stanford, CA, United States; Department of Bioengineering, Stanford University, Stanford, CA, United States
Ujwal Srivastava: Department of Computer Science, Stanford University, Stanford, CA, United States
Edward Tran: Department of Computer Science, Stanford University, Stanford, CA, United States; Department of Management Science and Engineering, Stanford University, Stanford, CA, United States
Aarohi Bhargava-Shah: Stanford University School of Medicine, Stanford, CA, United States
Janet Y Wu: Stanford University School of Medicine, Stanford, CA, United States
Alexander L Ren: Stanford University School of Medicine, Stanford, CA, United States
Kaitlin Caoili: Stanford University School of Medicine, Stanford, CA, United States
Brandon Bui: Department of Human Biology, Stanford University, Stanford, CA, United States
Layth Alkhani: Department of Bioengineering, Stanford University, Stanford, CA, United States; Department of Chemistry, Stanford University, Stanford, CA, United States
Susan Lee: Department of Computer Science, Stanford University, Stanford, CA, United States
Nathan Mohit: Department of Computer Science, Stanford University, Stanford, CA, United States; Department of Human Biology, Stanford University, Stanford, CA, United States
Noel Seo: Department of Sociology, Stanford University, Stanford, CA, United States
Nicholas Macedo: Department of Biology, Stanford University, Stanford, CA, United States; Department of Radiology, Stanford University School of Medicine, Stanford, CA, United States
Winson Cheng: Department of Computer Science, Stanford University, Stanford, CA, United States; Department of Chemistry, Stanford University, Stanford, CA, United States
Charles Liu: Department of Surgery, Stanford University School of Medicine, Stanford, CA, United States
Reena Thomas: Department of Neurology and Neurological Sciences, Stanford Health Care, Stanford, CA, United States
Jonathan H Chen: Stanford Center for Biomedical Informatics Research, Stanford, CA, United States; Division of Hospital Medicine, Stanford, CA, United States; Clinical Excellence Research Center, Stanford, CA, United States; Department of Medicine, Stanford, CA, United States
Olivier Gevaert: Stanford Center for Biomedical Informatics Research, Stanford, CA, United States; Department of Medicine, Stanford, CA, United States

Liu Y, Deng Y, Wang H, Liu W, He X, Zeng H. A nomogram for predicting echocardiogram prescription in outpatients: an analysis of the NAMCS database. Front Cardiovasc Med 2023;10:1183504. [PMID: 37908500 PMCID: PMC10613676 DOI: 10.3389/fcvm.2023.1183504] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2023] [Accepted: 09/19/2023] [Indexed: 11/02/2023]

Abstract

Background and objective

Cardiovascular disease is the leading cause of morbidity and mortality globally. Echocardiography is a commonly used method for assessing the condition of patients with cardiovascular disease. However, little is known about the population characteristics of patients who are recommended for echocardiographic examinations.

Methods

The National Ambulatory Medical Care Survey was a cross-sectional survey previously undertaken in the USA. In this study, publicly accessible data from the National Ambulatory Medical Care Survey database (for 2007-2016 and 2018-2019; data for 2017 was not published) were utilized to create a nomogram based on significant risk predictors. The study was performed in accordance with the relevant guidelines and regulations stipulated in the National Ambulatory Medical Care Survey database. Patients were randomly assigned to one of two groups: training cohort or validation cohort. The latter was used to assess the reliability of the prediction nomogram. Decision curve analysis was performed to evaluate the net benefit. Propensity score matching analysis was used to evaluate the relevance of echocardiography to clinical decision-making.

Results

A total of 217,178 outpatients were enrolled. Multivariable logistic regression analysis demonstrated that hypertension, hyperlipidemia, coronary artery disease/ischemic heart disease/history of myocardial infarction, congestive heart failure, major reason for visit, metropolitan statistical area, cerebrovascular disease/history of stroke or transient ischemic attack, previously assessed, insurance, referred, diagnosis, and reason for visit were all predictors of echocardiogram prescription in outpatients. The reliability of the predictive nomogram was confirmed in the validation cohort. After propensity score matching, there was a significant difference in new cardiovascular agent prescriptions between the echocardiogram and no echocardiogram groups (P < 0.01).

Conclusion

In this cohort study, a nomogram based on the characteristics of outpatients was developed to predict the likelihood of an echocardiogram being prescribed. The echocardiogram group was more likely to be prescribed new cardiovascular agents. These findings may help characterize the gap between guideline recommendations and actual outpatient practice, and help meet the needs of outpatients.

Amirahmadi A, Ohlsson M, Etminani K. Deep learning prediction models based on EHR trajectories: A systematic review. J Biomed Inform 2023;144:104430. [PMID: 37380061 DOI: 10.1016/j.jbi.2023.104430] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2022] [Revised: 06/08/2023] [Accepted: 06/17/2023] [Indexed: 06/30/2023]

Abstract

BACKGROUND

Electronic health records (EHRs) are generated at an ever-increasing rate. EHR trajectories, the temporal aspect of health records, facilitate predicting patients' future health-related risks. This enables healthcare systems to increase the quality of care through early identification and primary prevention. Deep learning techniques have shown great capacity for analyzing complex data and have been successful for prediction tasks using complex EHR trajectories. This systematic review aims to analyze recent studies to identify challenges, knowledge gaps, and ongoing research directions.

METHODS

For this systematic review, we searched Scopus, PubMed, IEEE Xplore, and ACM databases from Jan 2016 to April 2022 using search terms centered around EHR, deep learning, and trajectories. Then the selected papers were analyzed according to publication characteristics, objectives, and their solutions regarding existing challenges, such as the model's capacity to deal with intricate data dependencies, data insufficiency, and explainability.

RESULTS

After removing duplicates and out-of-scope papers, 63 papers were selected, which showed rapid growth in the number of studies in recent years. Predicting all diseases in the next visit and the onset of cardiovascular diseases were the most common targets. Different contextual and non-contextual representation learning methods are employed to retrieve important information from the sequence of EHR trajectories. Recurrent neural networks and the time-aware attention mechanism for modeling long-term dependencies, self-attentions, convolutional neural networks, graphs for representing inner visit relations, and attention scores for explainability were frequently used among the reviewed publications.

CONCLUSIONS

This systematic review demonstrated how recent breakthroughs in deep learning methods have facilitated the modeling of EHR trajectories. Research on improving the ability of graph neural networks, attention mechanisms, and cross-modal learning to analyze intricate dependencies among EHRs has shown good progress. There is a need to increase the number of publicly available EHR trajectory datasets to allow for easier comparison among different models. Also, very few developed models can handle all aspects of EHR trajectory data.

Ben Miled Z, Dexter PR, Grout RW, Boustani M. Feature engineering from medical notes: A case study of dementia detection. Heliyon 2023;9:e14636. [PMID: 37020943 PMCID: PMC10068125 DOI: 10.1016/j.heliyon.2023.e14636] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2022] [Revised: 03/12/2023] [Accepted: 03/13/2023] [Indexed: 03/19/2023]

Abstract

Background and objectives

Medical notes are narratives that describe the health of the patient in free text format. These notes can be more informative than structured data such as the history of medications or disease conditions. They are routinely collected and can be used to evaluate the patient's risk for developing chronic diseases such as dementia. This study investigates different methodologies for transforming routine care notes into dementia risk classifiers and evaluates the generalizability of these classifiers to new patients and new health care institutions.

Methods

The notes collected over the relevant history of the patient are lengthy. In this study, TF-ICF is used to select keywords with the highest discriminative ability between patients at risk of dementia and healthy controls. The medical notes are then summarized in the form of occurrences of the selected keywords. Two different encodings of the summary are compared. The first encoding consists of the average of the vector embedding of each keyword occurrence as produced by the BERT or Clinical BERT pre-trained language models. The second encoding aggregates the keywords according to UMLS concepts and uses each concept as an exposure variable. For both encodings, misspellings of the selected keywords are also considered in an effort to improve the predictive performance of the classifiers. A neural network is developed over the first encoding and a gradient boosted trees model is applied to the second encoding. Patients from a single health care institution are used to develop all the classifiers, which are then evaluated on held-out patients from the same health care institution as well as test patients from two other health care institutions.

Results

The results indicate that it is possible to identify patients at risk for dementia one year ahead of the onset of the disease using medical notes with an AUC of 75% when a gradient boosted trees model is used in conjunction with exposure variables derived from UMLS concepts. However, this performance is not maintained with an embedded feature space and when the classifier is applied to patients from other health care institutions. Moreover, an analysis of the top predictors of the gradient boosted trees model indicates that different features inform the classification depending on whether or not spelling variants of the keywords are included.

Conclusion

The present study demonstrates that medical notes can enable risk prediction models for complex chronic diseases such as dementia. However, additional research efforts are needed to improve the generalizability of these models. These efforts should take into consideration the length and localization of the medical notes; the availability of sufficient training data for each disease condition; and the variabilities resulting from different feature engineering techniques.

Park J, Artin MG, Lee KE, May BL, Park M, Hur C, Tatonetti NP. Structured deep embedding model to generate composite clinical indices from electronic health records for early detection of pancreatic cancer. Patterns (N Y) 2023;4:100636. [PMID: 36699740 PMCID: PMC9868652 DOI: 10.1016/j.patter.2022.100636] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2022] [Revised: 08/18/2022] [Accepted: 10/24/2022] [Indexed: 12/12/2022]

Gupta M, Gallamoza B, Cutrona N, Dhakal P, Poulain R, Beheshti R. An Extensive Data Processing Pipeline for MIMIC-IV. Proc Mach Learn Res 2022;193:311-325. [PMID: 36686986 PMCID: PMC9854277] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2023]

Kiser AC, Eilbeck K, Ferraro JP, Skarda DE, Samore MH, Bucher B. Standard Vocabularies to Improve Machine Learning Model Transferability With Electronic Health Record Data: Retrospective Cohort Study Using Health Care-Associated Infection. JMIR Med Inform 2022;10:e39057. [PMID: 36040784 PMCID: PMC9472055 DOI: 10.2196/39057] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2022] [Revised: 08/09/2022] [Accepted: 08/15/2022] [Indexed: 11/13/2022]

Abstract

BACKGROUND

With the widespread adoption of electronic healthcare records (EHRs) by US hospitals, there is an opportunity to leverage this data for the development of predictive algorithms to improve clinical care. A key barrier in model development and implementation is external validation of model discrimination, which is rarely performed and often shows worse performance. One reason why machine learning models are not externally generalizable is data heterogeneity. A potential solution to address the substantial data heterogeneity between health care systems is to use standard vocabularies to map EHR data elements. The advantage of these vocabularies is a hierarchical relationship between elements, which allows the aggregation of specific clinical features to more general grouped concepts.

OBJECTIVE

This study aimed to evaluate grouping EHR data using standard vocabularies to improve the transferability of machine learning models for the detection of postoperative health care-associated infections across institutions with different EHR systems.

METHODS

Patients who underwent surgery from the University of Utah Health and Intermountain Healthcare from July 2014 to August 2017 with complete follow-up data were included. The primary outcome was a health care-associated infection within 30 days of the procedure. EHR data from 0-30 days after the operation were mapped to standard vocabularies and grouped using the hierarchical relationships of the vocabularies. Model performance was measured using the area under the receiver operating characteristic curve (AUC) and F₁-score in internal and external validations. To evaluate model transferability, a difference-in-difference metric was defined as the difference in performance drop between internal and external validations for the baseline and grouped models.
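The difference-in-difference metric described above can be illustrated with a minimal Python sketch. The abstract does not give the exact formula, so the sign convention here (baseline drop minus grouped drop, so that a positive value favors the grouped model) is an assumption, and the AUC values and function names are hypothetical placeholders, not results or code from the study.

```python
def performance_drop(internal: float, external: float) -> float:
    """Drop in a metric (e.g. AUC) when moving from internal to external validation."""
    return internal - external


def difference_in_difference(
    baseline_internal: float,
    baseline_external: float,
    grouped_internal: float,
    grouped_external: float,
) -> float:
    """Assumed convention: baseline drop minus grouped drop.

    A positive result means the grouped (standard-vocabulary) model lost less
    performance in external validation than the baseline model did.
    """
    baseline_drop = performance_drop(baseline_internal, baseline_external)
    grouped_drop = performance_drop(grouped_internal, grouped_external)
    return baseline_drop - grouped_drop


# Hypothetical AUC values, chosen only to illustrate the arithmetic.
did_auc = difference_in_difference(
    baseline_internal=0.90,
    baseline_external=0.70,
    grouped_internal=0.88,
    grouped_external=0.80,
)
print(f"difference-in-difference (AUC): {did_auc:.3f}")
```

With these placeholder numbers the baseline model drops by 0.20 and the grouped model by 0.08, so the metric is about 0.12 under the assumed convention.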

RESULTS

A total of 5775 patients from the University of Utah and 15,434 patients from Intermountain Healthcare were included. The prevalence of selected outcomes was from 4.9% (761/15,434) to 5% (291/5775) for surgical site infections, from 0.8% (44/5775) to 1.1% (171/15,434) for pneumonia, from 2.6% (400/15,434) to 3% (175/5775) for sepsis, and from 0.8% (125/15,434) to 0.9% (50/5775) for urinary tract infections. In all outcomes, the grouping of data using standard vocabularies resulted in a reduced drop in AUC and F₁-score in external validation compared to baseline features (all P<.001, except urinary tract infection AUC: P=.002). The difference-in-difference metrics ranged from 0.005 to 0.248 for AUC and from 0.075 to 0.216 for F₁-score.

CONCLUSIONS

We demonstrated that grouping machine learning model features based on standard vocabularies improved model transferability between data sets across 2 institutions. Improving model transferability using standard vocabularies has the potential to improve the generalization of clinical prediction models across the health care system.

Rasmy L, Nigo M, Kannadath BS, Xie Z, Mao B, Patel K, Zhou Y, Zhang W, Ross A, Xu H, Zhi D. Recurrent neural network models (CovRNN) for predicting outcomes of patients with COVID-19 on admission to hospital: model development and validation using electronic health record data. Lancet Digit Health 2022;4:e415-e425. [PMID: 35466079 PMCID: PMC9023005 DOI: 10.1016/s2589-7500(22)00049-8] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 10/01/2021] [Revised: 01/11/2022] [Accepted: 03/07/2022] [Indexed: 02/08/2023]

Abstract

BACKGROUND

Predicting outcomes of patients with COVID-19 at an early stage is crucial for optimised clinical care and resource management, especially during a pandemic. Although multiple machine learning models have been proposed to address this issue, because of their requirements for extensive data preprocessing and feature engineering, they have not been validated or implemented outside of their original study site. Therefore, we aimed to develop accurate and transferrable predictive models of outcomes on hospital admission for patients with COVID-19.

METHODS

In this study, we developed recurrent neural network-based models (CovRNN) to predict the outcomes of patients with COVID-19 by use of available electronic health record data on admission to hospital, without the need for specific feature selection or missing data imputation. CovRNN was designed to predict three outcomes: in-hospital mortality, need for mechanical ventilation, and prolonged hospital stay (>7 days). For in-hospital mortality and mechanical ventilation, CovRNN produced time-to-event risk scores (survival prediction; evaluated by the concordance index) and all-time risk scores (binary prediction; area under the receiver operating characteristic curve [AUROC] was the main metric); we only trained a binary classification model for prolonged hospital stay. For binary classification tasks, we compared CovRNN against traditional machine learning algorithms: logistic regression and light gradient boost machine. Our models were trained and validated on the heterogeneous, deidentified data of 247 960 patients with COVID-19 from 87 US health-care systems derived from the Cerner Real-World COVID-19 Q3 Dataset up to September 2020. We held out the data of 4175 patients from two hospitals for external validation. The remaining 243 785 patients from the 85 health systems were grouped into training (n=170 626), validation (n=24 378), and multi-hospital test (n=48 781) sets. Model performance was evaluated in the multi-hospital test set. The transferability of CovRNN was externally validated by use of deidentified data from 36 140 patients derived from the US-based Optum deidentified COVID-19 electronic health record dataset (version 1015; from January, 2007, to Oct 15, 2020). Exact dates of data extraction were masked by the databases to ensure patient data safety.

FINDINGS

CovRNN binary models achieved AUROCs of 93·0% (95% CI 92·6-93·4) for the prediction of in-hospital mortality, 92·9% (92·6-93·2) for the prediction of mechanical ventilation, and 86·5% (86·2-86·9) for the prediction of a prolonged hospital stay, outperforming light gradient boost machine and logistic regression algorithms. External validation confirmed AUROCs in similar ranges (91·3-97·0% for in-hospital mortality prediction, 91·5-96·0% for the prediction of mechanical ventilation, and 81·0-88·3% for the prediction of prolonged hospital stay). For survival prediction, CovRNN achieved a concordance index of 86·0% (95% CI 85·1-86·9) for in-hospital mortality and 92·6% (92·2-93·0) for mechanical ventilation.

INTERPRETATION

Trained on a large, heterogeneous, real-world dataset, our CovRNN models showed high prediction accuracy and transferability through consistently good performances on multiple external datasets. Our results show the feasibility of a COVID-19 predictive model that delivers high accuracy without the need for complex feature engineering.

FUNDING

Cancer Prevention and Research Institute of Texas.

Integration of Artificial Intelligence and Blockchain Technology in Healthcare and Agriculture. J Food Qual 2022. [DOI: 10.1155/2022/4228448] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/17/2022]

Rafee A, Riepenhausen S, Neuhaus P, Meidt A, Dugas M, Varghese J. ELaPro, a LOINC-mapped core dataset for top laboratory procedures of eligibility screening for clinical trials. BMC Med Res Methodol 2022;22:141. [PMID: 35568796 PMCID: PMC9107639 DOI: 10.1186/s12874-022-01611-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 12/01/2021] [Accepted: 04/20/2022] [Indexed: 12/21/2022]

Abstract

Background

Screening for eligible patients continues to pose a great challenge for many clinical trials. This has led to a rapidly growing interest in standardizing computable representations of eligibility criteria (EC) in order to develop tools that leverage data from electronic health record (EHR) systems. Although laboratory procedures (LP) represent a common entity of EC that is readily available and retrievable from EHR systems, there is a lack of interoperable data models for this entity of EC. A public, specialized data model that utilizes international, widely-adopted terminology for LP, e.g. Logical Observation Identifiers Names and Codes (LOINC®), is much needed to support automated screening tools.

Objective

The aim of this study is to establish a core dataset for LP most frequently requested to recruit patients for clinical trials using LOINC terminology. Employing such a core dataset could enhance the interface between study feasibility platforms and EHR systems and significantly improve automatic patient recruitment.

Methods

We used a semi-automated approach to analyze 10,516 screening forms from the Medical Data Models (MDM) portal’s data repository that are pre-annotated with Unified Medical Language System (UMLS) concepts. An automated semantic analysis based on concept frequency was followed by an extensive manual expert review performed by physicians to analyze complex recruitment-relevant concepts not amenable to an automatic approach.

Results

Based on analysis of 138,225 EC from 10,516 screening forms, 55 laboratory procedures represented 77.87% of all UMLS laboratory concept occurrences identified in the selected EC forms. We identified 26,413 unique UMLS concepts from 118 UMLS semantic types and covered the vast majority of Medical Subject Headings (MeSH) disease domains.

Conclusions

Only a small set of common LP covers the majority of laboratory concepts in screening EC forms, which supports the feasibility of establishing a focused core dataset for LP. We present ELaPro, a novel, LOINC-mapped, core dataset for the 55 LP most frequently requested in screening for clinical trials. ELaPro is available in multiple machine-readable data formats such as CSV, ODM and HL7 FHIR. The extensive manual curation of this large number of free-text EC, as well as the combination of UMLS and LOINC terminologies, distinguishes this specialized dataset from previous relevant datasets in the literature.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12874-022-01611-y.

Castro VM, Gainer V, Wattanasin N, Benoit B, Cagan A, Ghosh B, Goryachev S, Metta R, Park H, Wang D, Mendis M, Rees M, Herrick C, Murphy SN. The Mass General Brigham Biobank Portal: an i2b2-based data repository linking disparate and high-dimensional patient data to support multimodal analytics. J Am Med Inform Assoc 2021;29:643-651. [PMID: 34849976 PMCID: PMC8922162 DOI: 10.1093/jamia/ocab264] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 07/15/2021] [Revised: 10/20/2021] [Accepted: 11/16/2021] [Indexed: 01/07/2023]

NE–LP: Normalized entropy- and loss prediction-based sampling for active learning in Chinese word segmentation on EHRs. Neural Comput Appl 2021. [DOI: 10.1007/s00521-021-05896-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 10/21/2022]

Bastarache L. Using Phecodes for Research with the Electronic Health Record: From PheWAS to PheRS. Annu Rev Biomed Data Sci 2021;4:1-19. [PMID: 34465180 DOI: 10.1146/annurev-biodatasci-122320-112352] [Citation(s) in RCA: 64] [Impact Index Per Article: 21.3] [Indexed: 11/09/2022]

Ding X, Mower J, Subramanian D, Cohen T. Augmenting aer2vec: Enriching distributed representations of adverse event report data with orthographic and lexical information. J Biomed Inform 2021;119:103833. [PMID: 34111555 DOI: 10.1016/j.jbi.2021.103833] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/10/2021] [Revised: 05/10/2021] [Accepted: 06/02/2021] [Indexed: 11/29/2022]

Humphreys BL, Del Fiol G, Xu H. The UMLS knowledge sources at 30: indispensable to current research and applications in biomedical informatics. J Am Med Inform Assoc 2020;27:1499-1501. [PMID: 33059366 DOI: 10.1093/jamia/ocaa208] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 08/07/2020] [Indexed: 01/22/2023]