1
Pressat-Laffouilhère T, Balayé P, Dahamna B, Lelong R, Billey K, Darmoni SJ, Grosjean J. Evaluation of Doc'EDS: a French semantic search tool to query health documents from a clinical data warehouse. BMC Med Inform Decis Mak 2022; 22:34. [PMID: 35135538 PMCID: PMC8822768 DOI: 10.1186/s12911-022-01762-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/02/2020] [Accepted: 01/20/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Unstructured data from electronic health records represent a wealth of information. Doc'EDS is a pre-screening tool based on textual and semantic analysis. The Doc'EDS system provides a graphical user interface to search documents in French. The aim of this study was to present the Doc'EDS tool and to provide a formal evaluation of its semantic features. METHODS Doc'EDS is a search tool built on top of the clinical data warehouse developed at Rouen University Hospital. This tool is a multilevel search engine combining structured and unstructured data. It also provides basic analytical features and semantic utilities. A formal evaluation was conducted to measure the impact of Natural Language Processing algorithms. RESULTS Approximately 18.1 million narrative documents are stored in Doc'EDS. The formal evaluation was conducted on 5000 manually collected clinical concepts. The F-measures for negative concepts and hypothetical concepts were 0.89 and 0.57, respectively. CONCLUSION In this formal evaluation, we have shown that Doc'EDS is able to deal with language subtleties to enhance an advanced full-text search in French health documents. The Doc'EDS tool is currently used on a daily basis to help researchers identify patient cohorts using unstructured data.
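The abstract reports F-measures without giving the underlying counts. As a reminder of how such a score is usually derived, here is a minimal sketch in Python. The function name `f_measure` and the example counts are hypothetical and are not taken from the Doc'EDS evaluation.

```python
def f_measure(true_pos: int, false_pos: int, false_neg: int, beta: float = 1.0) -> float:
    """F-beta score from raw counts; beta=1 gives the usual F1 (harmonic mean)."""
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid dividing by zero when nothing was found correctly
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Hypothetical counts for one concept type (illustration only).
score = f_measure(true_pos=8, false_pos=2, false_neg=2)
```

With these illustrative counts, precision and recall are both 0.8, so the F1 score is also 0.8.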
Affiliation(s)
- Thibaut Pressat-Laffouilhère
  - Department of Biomedical Informatics, Rouen University Hospital, Normandy, France
  - LITIS EA4108, Rouen University, Normandy, France
- Pierre Balayé
  - Department of Biomedical Informatics, Rouen University Hospital, Normandy, France
- Badisse Dahamna
  - Department of Biomedical Informatics, Rouen University Hospital, Normandy, France
  - LIMICS U1142 INSERM, Sorbonne Université & Sorbonne Paris Nord, Paris, France
- Romain Lelong
  - Department of Biomedical Informatics, Rouen University Hospital, Normandy, France
  - LIMICS U1142 INSERM, Sorbonne Université & Sorbonne Paris Nord, Paris, France
- Kévin Billey
  - Department of Biomedical Informatics, Rouen University Hospital, Normandy, France
  - LITIS EA4108, Rouen University, Normandy, France
- Stéfan J Darmoni
  - Department of Biomedical Informatics, Rouen University Hospital, Normandy, France
  - LIMICS U1142 INSERM, Sorbonne Université & Sorbonne Paris Nord, Paris, France
- Julien Grosjean
  - Department of Biomedical Informatics, Rouen University Hospital, Normandy, France
  - LIMICS U1142 INSERM, Sorbonne Université & Sorbonne Paris Nord, Paris, France
2
Ye C, Malin BA, Fabbri D. Leveraging medical context to recommend semantically similar terms for chart reviews. BMC Med Inform Decis Mak 2021; 21:353. [PMID: 34922536 PMCID: PMC8684266 DOI: 10.1186/s12911-021-01724-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2021] [Accepted: 12/09/2021] [Indexed: 11/23/2022] Open
Abstract
BACKGROUND Information retrieval (IR) helps clinicians answer questions posed to large collections of electronic medical records (EMRs), such as how best to identify a patient's cancer stage. One of the more promising approaches to IR for EMRs is to expand a keyword query with similar terms (e.g., augmenting cancer with mets). However, there is a large range of clinical chart review tasks, such that fixed sets of similar terms are insufficient. Current language models, such as Bidirectional Encoder Representations from Transformers (BERT) embeddings, do not capture the full non-textual context of a task. In this study, we present new methods that provide similar terms dynamically by adjusting to the context of the chart review task. METHODS We introduce a vector space for medical context in which each word is represented by a vector that captures the word's usage in different medical contexts (e.g., how frequently cancer is used when ordering a prescription versus describing family history) beyond the context learned from the surrounding text. These vectors are transformed into a vector space for customizing the set of similar terms selected for different chart review tasks. We evaluate the vector space model with multiple chart review tasks, in which supervised machine learning models learn to predict the preferred terms of clinically knowledgeable reviewers. To quantify the usefulness of the predicted similar terms relative to a baseline of standard word2vec embeddings, we measure (1) the prediction performance of the medical-context vector space model using the area under the receiver operating characteristic curve (AUROC) and (2) the labeling effort required to train the models. RESULTS The vector space outperformed the baseline word2vec embeddings in all three chart review tasks, with an average AUROC of 0.80 versus 0.66, respectively. Additionally, the medical-context vector space significantly reduced the number of labels required to learn and predict the preferred similar terms of reviewers. Specifically, the labeling effort was reduced to 10% of the entire dataset in all three tasks. CONCLUSIONS The set of preferred similar terms that are relevant to a chart review task can be learned by leveraging the medical context of the task.
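The keyword expansion that the abstract takes as its starting point (e.g., augmenting cancer with mets) can be illustrated with a static nearest-neighbour lookup over word vectors. The sketch below shows only that fixed-embedding idea, not the authors' medical-context vector space. The vectors in `TOY_EMBEDDINGS` and the function names are invented for illustration.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def expand_query(term: str, embeddings: dict[str, list[float]], top_k: int = 2) -> list[str]:
    """Return the top_k vocabulary terms closest to `term` by cosine similarity."""
    query_vec = embeddings[term]
    scored = [(other, cosine(query_vec, vec))
              for other, vec in embeddings.items() if other != term]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in scored[:top_k]]


# Hypothetical 3-dimensional vectors; real embeddings would be learned from notes.
TOY_EMBEDDINGS = {
    "cancer": [0.90, 0.10, 0.00],
    "mets": [0.80, 0.20, 0.10],
    "tumor": [0.85, 0.15, 0.05],
    "aspirin": [0.00, 0.10, 0.90],
}

similar_terms = expand_query("cancer", TOY_EMBEDDINGS, top_k=2)
```

A fixed lookup like this returns the same neighbours for every task, which is the limitation the abstract says its context-aware approach is meant to address.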
Affiliation(s)
- Cheng Ye
  - Department of Computer Science, Vanderbilt University, 2301 Vanderbilt Place, PMB 351679, Nashville, TN, 37235-1679, USA
- Bradley A Malin
  - Department of Computer Science, Vanderbilt University, 2301 Vanderbilt Place, PMB 351679, Nashville, TN, 37235-1679, USA
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA
- Daniel Fabbri
  - Department of Computer Science, Vanderbilt University, 2301 Vanderbilt Place, PMB 351679, Nashville, TN, 37235-1679, USA
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
3
Hill JR, Visweswaran S, Ning X, Schleyer TK. Use, Impact, Weaknesses, and Advanced Features of Search Functions for Clinical Use in Electronic Health Records: A Scoping Review. Appl Clin Inform 2021; 12:417-428. [PMID: 34261171 PMCID: PMC8279817 DOI: 10.1055/s-0041-1730033] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/16/2022] Open
Abstract
Objective
Although vast amounts of patient information are captured in electronic health records (EHRs), effective clinical use of this information is challenging due to inadequate and inefficient access to it at the point of care. The purpose of this study was to conduct a scoping review of the literature on the use of EHR search functions within a single patient's record in clinical settings to characterize the current state of research on the topic and identify areas for future study.
Methods
We conducted a literature search of four databases to identify articles on within-EHR search functions or the use of EHR search function in the context of clinical tasks. After reviewing titles and abstracts and performing a full-text review of selected articles, we included 17 articles in the analysis. We qualitatively identified themes in those articles and synthesized the literature for each theme.
Results
Based on the 17 articles analyzed, we delineated four themes: (1) how clinicians use search functions, (2) impact of search functions on clinical workflow, (3) weaknesses of current search functions, and (4) advanced search features. Our review found that search functions generally facilitate patient information retrieval by clinicians and are positively received by users. However, existing search functions have weaknesses, such as yielding false negatives and false positives, which can decrease trust in the results, and requiring a high cognitive load to perform an inclusive search of a patient's record.
Conclusion
Despite the widespread adoption of EHRs and evidence that search functions benefit clinician workflow and productivity, only a limited number of articles describe the use of EHR search functions in a clinical setting. Some of the weaknesses of current search functions may be addressed by enhancing EHR search functions with collaborative filtering.
Affiliation(s)
- Jordan R Hill
  - Department of Medicine, Indiana University School of Medicine, Indianapolis, Indiana, United States
- Shyam Visweswaran
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States
- Xia Ning
  - Department of Biomedical Informatics, The Ohio State University, Columbus, Ohio, United States
  - Department of Computer Science and Engineering, The Ohio State University, Columbus, Ohio, United States
  - Translational Data Analytics Institute, The Ohio State University, Ohio, United States
- Titus K Schleyer
  - Department of Medicine, Indiana University School of Medicine, Indianapolis, Indiana, United States
  - Center for Biomedical Informatics, Regenstrief Institute, Indianapolis, Indiana, United States
4
Chen YP, Lo YH, Lai F, Huang CH. Disease Concept-Embedding Based on the Self-Supervised Method for Medical Information Extraction from Electronic Health Records and Disease Retrieval: Algorithm Development and Validation Study. J Med Internet Res 2021; 23:e25113. [PMID: 33502324 PMCID: PMC7875703 DOI: 10.2196/25113] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 10/18/2020] [Revised: 11/19/2020] [Accepted: 01/15/2021] [Indexed: 12/18/2022] Open
Abstract
BACKGROUND The electronic health record (EHR) contains a wealth of medical information. An organized EHR can greatly help doctors treat patients. In some cases, only limited patient information is collected to help doctors make treatment decisions. Because EHRs can serve as a reference for this limited information, doctors' treatment capabilities can be enhanced. Natural language processing and deep learning methods can help organize and translate EHR information into medical knowledge and experience. OBJECTIVE In this study, we aimed to create a model to extract concept embeddings from EHRs for disease pattern retrieval and further classification tasks. METHODS We collected 1,040,989 emergency department visits from the National Taiwan University Hospital Integrated Medical Database and 305,897 samples from the National Hospital and Ambulatory Medical Care Survey Emergency Department data. After data cleansing and preprocessing, the data sets were divided into training, validation, and test sets. We proposed a Transformer-based model to embed EHRs and used Bidirectional Encoder Representations from Transformers (BERT) to extract features from free text and concatenate features with structural data as input to our proposed model. Then, Deep InfoMax (DIM) and Simple Contrastive Learning of Visual Representations (SimCLR) were used for the unsupervised embedding of the disease concept. The pretrained disease concept-embedding model, named EDisease, was further finetuned to adapt to the critical care outcome prediction task. We evaluated the performance of embedding using t-distributed stochastic neighbor embedding (t-SNE) to perform dimension reduction for visualization. The performance of the finetuned predictive model was evaluated against published models using the area under the receiver operating characteristic curve (AUROC). RESULTS Our model achieved the highest AUROC on the outcome prediction task, 0.876. In the ablation study, the use of a smaller data set or fewer unsupervised methods for pretraining deteriorated the prediction performance. The AUROCs were 0.857, 0.870, and 0.868 for the model without pretraining, the model pretrained by only SimCLR, and the model pretrained by only DIM, respectively. On the smaller finetuning set, the AUROC was 0.815 for the proposed model. CONCLUSIONS Through contrastive learning methods, disease concepts can be embedded meaningfully. Moreover, these methods can be used for disease retrieval tasks to enhance clinical practice capabilities. The disease concept model is also suitable as a pretrained model for subsequent prediction tasks.
Affiliation(s)
- Yen-Pin Chen
  - Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei City, Taiwan
  - Department of Emergency Medicine, National Taiwan University BioMedical Park Hospital, Hsinchu County, Taiwan
  - Department of Emergency Medicine, National Taiwan University Hospital, Taipei City, Taiwan
- Yuan-Hsun Lo
  - Department of Applied Mathematics, National Pingtung University, Pingtung City, Taiwan
- Feipei Lai
  - Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei City, Taiwan
- Chien-Hua Huang
  - Department of Emergency Medicine, National Taiwan University Hospital, Taipei City, Taiwan
5
Liu S, Wang Y, Wen A, Wang L, Hong N, Shen F, Bedrick S, Hersh W, Liu H. Implementation of a Cohort Retrieval System for Clinical Data Repositories Using the Observational Medical Outcomes Partnership Common Data Model: Proof-of-Concept System Validation. JMIR Med Inform 2020; 8:e17376. [PMID: 33021486 PMCID: PMC7576539 DOI: 10.2196/17376] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 12/10/2019] [Revised: 06/04/2020] [Accepted: 07/28/2020] [Indexed: 12/17/2022] Open
Abstract
BACKGROUND Widespread adoption of electronic health records has enabled the secondary use of electronic health record data for clinical research and health care delivery. Natural language processing techniques have shown promise in their capability to extract the information embedded in unstructured clinical data, and information retrieval techniques provide flexible and scalable solutions that can augment natural language processing systems for retrieving and ranking relevant records. OBJECTIVE In this paper, we present the implementation of a cohort retrieval system that can execute textual cohort selection queries on both structured data and unstructured text: Cohort Retrieval Enhanced by Analysis of Text from Electronic Health Records (CREATE). METHODS CREATE is a proof-of-concept system that leverages a combination of structured queries and information retrieval techniques on natural language processing results to improve cohort retrieval performance, using the Observational Medical Outcomes Partnership Common Data Model to enhance model portability. The natural language processing component was used to extract common data model concepts from textual queries. We designed a hierarchical index to support the common data model concept search utilizing information retrieval techniques and frameworks. RESULTS Our case study on 5 cohort identification queries, evaluated using the precision at 5 information retrieval metric at both the patient level and document level, demonstrates that CREATE achieves a mean precision at 5 of 0.90, which outperforms systems using only structured data or only unstructured text, with mean precision at 5 values of 0.54 and 0.74, respectively. CONCLUSIONS The implementation and evaluation on Mayo Clinic Biobank data demonstrated that CREATE outperforms cohort retrieval systems that use only structured data or only unstructured text in complex textual cohort queries.
Affiliation(s)
- Sijia Liu
  - Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Yanshan Wang
  - Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Andrew Wen
  - Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Liwei Wang
  - Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Na Hong
  - Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Feichen Shen
  - Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Steven Bedrick
  - Department of Computer Science and Electrical Engineering, Oregon Health & Science University, Portland, OR, United States
- William Hersh
  - Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, OR, United States
- Hongfang Liu
  - Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
6
Gagalova KK, Leon Elizalde MA, Portales-Casamar E, Görges M. What You Need to Know Before Implementing a Clinical Research Data Warehouse: Comparative Review of Integrated Data Repositories in Health Care Institutions. JMIR Form Res 2020; 4:e17687. [PMID: 32852280 PMCID: PMC7484778 DOI: 10.2196/17687] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 01/03/2020] [Revised: 06/09/2020] [Accepted: 07/17/2020] [Indexed: 12/23/2022] Open
Abstract
Background Integrated data repositories (IDRs), also referred to as clinical data warehouses, are platforms used for the integration of several data sources through specialized analytical tools that facilitate data processing and analysis. IDRs offer several opportunities for clinical data reuse, and the number of institutions implementing an IDR has grown steadily in the past decade. Objective The architectural choices of major IDRs are highly diverse and determining their differences can be overwhelming. This review aims to explore the underlying models and common features of IDRs, provide a high-level overview for those entering the field, and propose a set of guiding principles for small- to medium-sized health institutions embarking on IDR implementation. Methods We reviewed manuscripts published in peer-reviewed scientific literature between 2008 and 2020, and selected those that specifically describe IDR architectures. Of 255 shortlisted articles, we found 34 articles describing 29 different architectures. The different IDRs were analyzed for common features and classified according to their data processing and integration solution choices. Results Despite common trends in the selection of standard terminologies and data models, the IDRs examined showed heterogeneity in the underlying architecture design. We identified 4 common architecture models that use different approaches for data processing and integration. These different approaches were driven by a variety of features such as data sources, whether the IDR was for a single institution or a collaborative project, the intended primary data user, and purpose (research-only or including clinical or operational decision making). Conclusions IDR implementations are diverse and complex undertakings, which benefit from being preceded by an evaluation of requirements and definition of scope in the early planning stage. Factors such as data source diversity and intended users of the IDR influence data flow and synchronization, both of which are crucial factors in IDR architecture planning.
Affiliation(s)
- Kristina K Gagalova
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, Canada
  - Bioinformatics Graduate Program, University of British Columbia, Vancouver, BC, Canada
  - Research Institute, BC Children's Hospital, Vancouver, BC, Canada
- M Angelica Leon Elizalde
  - Research Institute, BC Children's Hospital, Vancouver, BC, Canada
  - School of Population and Public Health, University of British Columbia, Vancouver, BC, Canada
- Elodie Portales-Casamar
  - Research Institute, BC Children's Hospital, Vancouver, BC, Canada
  - Department of Pediatrics, University of British Columbia, Vancouver, BC, Canada
- Matthias Görges
  - Research Institute, BC Children's Hospital, Vancouver, BC, Canada
  - Department of Anesthesiology, Pharmacology and Therapeutics, University of British Columbia, Vancouver, BC, Canada
7
Ruppel H, Bhardwaj A, Manickam RN, Adler-Milstein J, Flagg M, Ballesca M, Liu VX. Assessment of Electronic Health Record Search Patterns and Practices by Practitioners in a Large Integrated Health Care System. JAMA Netw Open 2020; 3:e200512. [PMID: 32142128 PMCID: PMC7060491 DOI: 10.1001/jamanetworkopen.2020.0512] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 02/06/2023] Open
Abstract
IMPORTANCE The electronic health record (EHR) is a source of practitioner dissatisfaction in part because of challenges with information retrieval. To improve data accessibility, a better understanding of practitioners' information needs within individual patient records is needed. OBJECTIVE To assess EHR users' searches using data from a large integrated health care system. DESIGN, SETTING, AND PARTICIPANTS This retrospective cross-sectional analysis used EHR search data from Kaiser Permanente Northern California, an integrated health care delivery system with more than 4.4 million members. Users' EHR search activity data were obtained from April 1, 2018, to May 15, 2019. MAIN OUTCOMES AND MEASURES Search term frequency was grouped by user and practitioner types. Network analyses were performed of co-occurring search terms within a single search episode, and centrality measures for search terms (degree and betweenness centrality) were calculated. RESULTS A total of 12 313 047 search activities (including 4 328 330 searches and 7 984 717 result views) conducted by 34 735 unique users within 977 160 unique patient EHRs were identified. In aggregate, users searched for 208 374 unique search terms and conducted a median of 4 searches (interquartile range, 1-28 searches). Of all 97 367 active EHR users, 34 735 (35.7%) conducted at least 1 search. However, of all 12 968 active EHR physician users, 9801 (75.6%) conducted at least 1 search, and of all 1908 active pharmacist users, 1402 (73.5%) conducted at least 1 search. The top 3 most commonly searched terms were statin (75 017 searches [1.7%]), colonoscopy (73 545 [1.7%]), and pft (54 990 [1.3%]). However, wide variation in top searches was noted across practitioner groups. Terms searched most often with another term in a single linked search episode included statin, lisinopril, colonoscopy, gabapentin, and aspirin. CONCLUSIONS AND RELEVANCE Although physicians and pharmacists were the most active users of EHR searches, search volume and frequently searched terms varied considerably by and within user role. Further customization of the EHR interface may help leverage users' search content and patterns to improve targeted information retrieval.
Affiliation(s)
- Halley Ruppel
  - Division of Research, Kaiser Permanente Northern California, Oakland
- Aashish Bhardwaj
  - Division of Research, Kaiser Permanente Northern California, Oakland
- Raj N. Manickam
  - Division of Research, Kaiser Permanente Northern California, Oakland
- Marc Flagg
  - The Permanente Medical Group, Oakland, California
- Vincent X. Liu
  - Division of Research, Kaiser Permanente Northern California, Oakland
  - The Permanente Medical Group, Oakland, California
8
Wang Y, Wen A, Liu S, Hersh W, Bedrick S, Liu H. Test collections for electronic health record-based clinical information retrieval. JAMIA Open 2019; 2:360-368. [PMID: 31709390 PMCID: PMC6824517 DOI: 10.1093/jamiaopen/ooz016] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 03/13/2019] [Revised: 04/26/2019] [Accepted: 04/03/2019] [Indexed: 01/03/2023] Open
Abstract
Objectives To create test collections for evaluating clinical information retrieval (IR) systems and advancing clinical IR research. Materials and Methods Electronic health record (EHR) data, including structured and free-text data, from 45 000 patients who are a part of the Mayo Clinic Biobank cohort was retrieved from the clinical data warehouse. The clinical IR system indexed a total of 42 million free-text EHR documents. The search queries consisted of 56 topics developed through a collaboration between Mayo Clinic and Oregon Health & Science University. We described the creation of test collections, including a to-be-evaluated document pool using five retrieval models, and human assessment guidelines. We analyzed the relevance judgment results in terms of human agreement and time spent, and results of three levels of relevance, and reported performance of five retrieval models. Results The two judges had a moderate overall agreement with a Kappa value of 0.49, spent a consistent amount of time judging the relevance, and were able to identify easy and difficult topics. The conventional retrieval model performed best on most topics while a concept-based retrieval model had better performance on the topics requiring conceptual level retrieval. Discussion IR can provide an alternate approach to leveraging clinical narratives for patient information discovery as it is less dependent on semantics. Our study showed the feasibility of test collections along with a few challenges. Conclusion The conventional test collections for evaluating the IR system show potential for successfully evaluating clinical IR systems with a few challenges to be investigated.
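The abstract reports a Kappa value of 0.49 between two judges but does not name the variant. Assuming it is Cohen's kappa, which is the common choice for two raters, the statistic can be sketched as follows. The function name and the toy judgments are hypothetical and are not data from the study.

```python
from collections import Counter


def cohens_kappa(judge_a: list, judge_b: list) -> float:
    """Cohen's kappa for two raters labelling the same items."""
    if len(judge_a) != len(judge_b) or not judge_a:
        raise ValueError("both judges must label the same non-empty set of items")
    n = len(judge_a)
    # Observed agreement: share of items on which the two labels match.
    observed = sum(a == b for a, b in zip(judge_a, judge_b)) / n
    # Chance agreement: product of each judge's marginal label frequencies.
    counts_a, counts_b = Counter(judge_a), Counter(judge_b)
    labels = set(counts_a) | set(counts_b)
    expected = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both judges used one identical label throughout
    return (observed - expected) / (1 - expected)


# Toy relevance judgments (1 = relevant, 0 = not relevant), illustration only.
kappa = cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0])
```

In this toy case the judges agree on 3 of 4 items (0.75) while chance agreement is 0.5, giving a kappa of 0.5.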
Affiliation(s)
- Yanshan Wang
  - Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Andrew Wen
  - Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Sijia Liu
  - Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- William Hersh
  - Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, Oregon, USA
- Steven Bedrick
  - Department of Computer Science and Electrical Engineering, Oregon Health & Science University, Portland, Oregon, USA
- Hongfang Liu
  - Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
9
Moon S, Liu S, Chen D, Wang Y, Wood DL, Chaudhry R, Liu H, Kingsbury P. Salience of Medical Concepts of Inside Clinical Texts and Outside Medical Records for Referred Cardiovascular Patients. Journal of Healthcare Informatics Research 2019; 3:200-219. [PMID: 35415427 PMCID: PMC8982748 DOI: 10.1007/s41666-019-00044-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 02/12/2018] [Revised: 11/29/2018] [Accepted: 01/05/2019] [Indexed: 12/03/2022]
Abstract
Outside medical records (OMRs) accompanying referred patients are frequently sent as faxes from external healthcare providers. Accessing useful and relevant information from these OMRs in a timely manner is a challenging task due to a combination of the presence of machine-illegible information and the limited system interoperability inherent in healthcare. Little research has been done on investigating information in OMRs. This paper evaluated overlapping and non-overlapping medical concepts captured from digitally faxed OMRs for patients transferring to the Department of Cardiovascular Medicine and from clinical consultant notes generated at the Mayo Clinic. We used optical character recognition (OCR) techniques to make faxed OMRs machine-readable and used natural language processing (NLP) techniques to capture clinical concepts from both machine-readable OMRs and Mayo clinical notes. We measured the level of overlap in medical concepts between OMRs and Mayo clinical narratives using quantitative approaches and assessed the salience of concepts specific to Cardiovascular Medicine by calculating the ratio of those mentioned concepts relative to an independent clinical corpus. Among the concepts collected from the OMRs, 11.19% were also present in the Mayo clinical narratives that were generated within the 3 months after the patients' initial encounter at the Mayo Clinic. Of those common concepts, 73.97% were identified in initial consultant notes (ICNs) and 26.03% were captured over subsequent follow-up consultant notes (FCNs). These findings implied that information collected from the OMRs is potentially informative for patient care, but some valuable information (additionally identified in FCNs) collected from the OMRs is not fully used in an earlier stage of the care process. The concepts collected from the ICNs have the highest salience to Cardiovascular Medicine (0.112) compared to concepts in OMRs and concepts in FCNs. Additionally, unique concepts captured in ICNs (unseen in OMRs or FCNs) carried the most salient information (0.094), which demonstrated that ICNs provided the most informative concepts for the care of transferred patients.
Affiliation(s)
- Sungrim Moon
  - Department of Health Sciences Research, Mayo Clinic, Rochester, MN USA
- Sijia Liu
  - Department of Health Sciences Research, Mayo Clinic, Rochester, MN USA
  - Department of Computer Science and Engineering, University at Buffalo, Buffalo, NY USA
- David Chen
  - Department of Health Sciences Research, Mayo Clinic, Rochester, MN USA
- Yanshan Wang
  - Department of Health Sciences Research, Mayo Clinic, Rochester, MN USA
- Rajeev Chaudhry
  - Department of Medicine and Center for Translational Informatics, Mayo Clinic, Rochester, MN USA
- Hongfang Liu
  - Department of Health Sciences Research, Mayo Clinic, Rochester, MN USA
- Paul Kingsbury
  - Department of Health Sciences Research, Mayo Clinic, Rochester, MN USA
10
Complexities, variations, and errors of numbering within clinical notes: the potential impact on information extraction and cohort-identification. BMC Med Inform Decis Mak 2019; 19:75. [PMID: 30944012 PMCID: PMC6448181 DOI: 10.1186/s12911-019-0784-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Numbers and numerical concepts appear frequently in free text clinical notes from electronic health records. Knowledge of the frequent lexical variations of these numerical concepts, and their accurate identification, is important for many information extraction tasks. This paper describes an analysis of the variation in how numbers and numerical concepts are represented in clinical notes. METHODS We used an inverted index of approximately 100 million notes to obtain the frequency of various permutations of numbers and numerical concepts, including the use of Roman numerals, numbers spelled as English words, and invalid dates, among others. Overall, twelve types of lexical variants were analyzed. RESULTS We found substantial variation in how these concepts were represented in the notes, including multiple data quality issues. We also demonstrate that not considering these variations could have substantial real-world implications for cohort identification tasks, with one case missing > 80% of potential patients. CONCLUSIONS Numbering within clinical notes can be variable, and not taking these variations into account could result in missing or inaccurate information for natural language processing and information retrieval tasks.
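To make the retrieval risk concrete, the sketch below maps three surface forms of the same number (Arabic digit, Roman numeral, English word) to one integer, so that a query for one form can match the others. The "stage" phrase, the lookup tables, and the function name are illustrative assumptions. The abstract does not describe this implementation, and the twelve variant types it analyzed are not reproduced here.

```python
import re

# Deliberately tiny, hypothetical lookup tables covering 1 to 5 only.
ROMAN_NUMERALS = {"i": 1, "ii": 2, "iii": 3, "iv": 4, "v": 5}
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}

STAGE_PATTERN = re.compile(r"\bstage\s+([a-z0-9]+)\b")


def normalize_stage(text: str):
    """Return the stage number mentioned in `text` as an int, or None if absent."""
    match = STAGE_PATTERN.search(text.lower())
    if match is None:
        return None
    token = match.group(1)
    if token.isdigit():
        return int(token)
    # Try the Roman numeral table first, then the spelled-out words.
    return ROMAN_NUMERALS.get(token) or NUMBER_WORDS.get(token)


variants = ["Stage III disease", "stage three", "stage 3"]
normalized = [normalize_stage(v) for v in variants]
```

A search that matches only the literal string "stage 3" would miss the first two variants, which is the kind of recall loss the abstract warns about for cohort identification.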
|
11
|
Yatsko VA. Informatics, Information Science, and Computer Science. Scientific and Technical Information Processing 2018. [DOI: 10.3103/s0147688218040081] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/30/2022]
|
12
|
Ye C, Fabbri D. Extracting similar terms from multiple EMR-based semantic embeddings to support chart reviews. J Biomed Inform 2018; 83:63-72. [PMID: 29793071 DOI: 10.1016/j.jbi.2018.05.014] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 09/27/2017] [Revised: 04/24/2018] [Accepted: 05/20/2018] [Indexed: 01/20/2023]
Abstract
OBJECTIVE Word embeddings project semantically similar terms into nearby points in a vector space. When trained on clinical text, these embeddings can be leveraged to improve keyword search and text highlighting. In this paper, we present methods to refine the selection process of similar terms from multiple EMR-based word embeddings, and evaluate their performance quantitatively and qualitatively across multiple chart review tasks. MATERIALS AND METHODS Word embeddings were trained on each clinical note type in an EMR. These embeddings were then combined, weighted, and truncated to select a refined set of similar terms to be used in keyword search and text highlighting. To evaluate their quality, we measured the similar terms' information retrieval (IR) performance using precision-at-K (P@5, P@10). Additionally, a user study evaluated users' search term preferences, while a timing study measured the time to answer a question from a clinical chart. RESULTS The refined terms outperformed the baseline method's information retrieval performance (e.g., increasing the average P@5 from 0.48 to 0.60). Additionally, the refined terms were preferred by most users, and reduced the average time to answer a question. CONCLUSIONS Clinical information can be more quickly retrieved and synthesized when using semantically similar terms from multiple embeddings.
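For readers unfamiliar with the precision-at-K metric cited in this abstract, a minimal sketch follows. The function name and the toy inputs are assumptions for illustration; the paper's own evaluation code is not shown in this listing.

```python
def precision_at_k(ranked_terms, relevant_terms, k):
    """Fraction of the top-k ranked terms that are judged relevant.

    ranked_terms: list of suggested terms, best first.
    relevant_terms: set of terms judged relevant (hypothetical gold labels).
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    top_k = ranked_terms[:k]
    hits = sum(1 for term in top_k if term in relevant_terms)
    # Dividing by k (not len(top_k)) penalises lists shorter than k.
    return hits / k
```

Under this definition, three relevant terms among the first five suggestions give P@5 = 3/5 = 0.6.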
Affiliation(s)
- Cheng Ye
- Department of Computer Science, Vanderbilt University, Nashville, TN, USA.
| | - Daniel Fabbri
- Department of Computer Science, Vanderbilt University, Nashville, TN, USA; Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
| |
|
13
|
Su CH, Li TC, Cho DY, Ma WF, Chang YS, Lee TH, Huang LC. Effectiveness of a computerised system of patient education in clinical practice: a longitudinal nested cohort study. BMJ Open 2018; 8:e020621. [PMID: 29724740 PMCID: PMC5942434 DOI: 10.1136/bmjopen-2017-020621] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 11/13/2022] Open
Abstract
INTRODUCTION Developing electronic health record information systems is an international trend for promoting the integration of health information and enhancing the quality of medical services. Patient education is a frequent intervention in nursing care, and recording the amount and quality of patient education has become essential in the nursing record. The aims of this study are (1) to develop a high-quality Patient Education Assessment and Description Record System (PEADRS) in the electronic medical record; (2) to examine the effectiveness of the PEADRS on documentation and nurses' satisfaction; and (3) to facilitate communication and cooperation between professionals. METHODS AND ANALYSIS A quasi-experimental design and random sampling will be used. The participants are nurses who are involved in patient education using either traditional records or the PEADRS at a medical centre. A prospective longitudinal nested cohort study will be conducted to compare the effectiveness of the PEADRS, including (1) the length of nursing documentation; (2) satisfaction with using the PEADRS; and (3) the benefit to professional cooperation. ETHICS AND DISSEMINATION Patient privacy will be protected according to the Electronic Medical Record Management Practices of the hospital. This study develops a patient education digital record system, which would benefit the quality of clinical practice in health education. The results will be published in peer-reviewed journals and will be presented at scientific conferences.
Affiliation(s)
- Chia-Hsien Su
- Department of Public Health, China Medical University, Taichung, Taiwan
- Nursing, New Taipei City Hospital, Taipei, Taiwan
| | - Tsai-Chung Li
- Graduate Institute of Biostatistics, China Medical University, Taichung, Taiwan
| | - Der-Yang Cho
- Graduate Institute of Biomedical Sciences, China Medical University, Taichung, Taiwan
- Department of Neurosurgery, China Medical University Hospital, Taichung, Taiwan
| | - Wei-Fen Ma
- Department of Nursing, China Medical University Hospital, Taichung, Taiwan
- School of Nursing, China Medical University, Taichung, Taiwan
| | - Yu-Shan Chang
- Department of Public Health, China Medical University, Taichung, Taiwan
- Department of Nursing, China Medical University Hospital, Taichung, Taiwan
| | - Tsung-Han Lee
- Information Technology Office, China Medical University Hospital, Taichung, Taiwan
| | - Li-Chi Huang
- Department of Nursing, China Medical University Hospital, Taichung, Taiwan
- School of Nursing, China Medical University, Taichung, Taiwan
| |
|
14
|
A novel tool for the identification of correlations in medical data by faceted search. Comput Biol Med 2017; 85:98-105. [DOI: 10.1016/j.compbiomed.2017.04.011] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 12/09/2016] [Revised: 04/05/2017] [Accepted: 04/13/2017] [Indexed: 11/17/2022]
|
15
|
Hanauer DA, Wu DTY, Yang L, Mei Q, Murkowski-Steffy KB, Vydiswaran VGV, Zheng K. Development and empirical user-centered evaluation of semantically-based query recommendation for an electronic health record search engine. J Biomed Inform 2017; 67:1-10. [PMID: 28131722 DOI: 10.1016/j.jbi.2017.01.013] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 08/26/2016] [Revised: 12/21/2016] [Accepted: 01/23/2017] [Indexed: 02/01/2023]
Abstract
OBJECTIVE The utility of biomedical information retrieval environments can be severely limited when users lack expertise in constructing effective search queries. To address this issue, we developed a computer-based query recommendation algorithm that suggests semantically interchangeable terms based on an initial user-entered query. In this study, we assessed the value of this approach, which has broad applicability in biomedical information retrieval, by demonstrating its application as part of a search engine that facilitates retrieval of information from electronic health records (EHRs). MATERIALS AND METHODS The query recommendation algorithm utilizes MetaMap to identify medical concepts from search queries and indexed EHR documents. Synonym variants from UMLS are used to expand the concepts along with a synonym set curated from historical EHR search logs. The empirical study involved 33 clinicians and staff who evaluated the system through a set of simulated EHR search tasks. User acceptance was assessed using the widely used technology acceptance model. RESULTS The search engine's performance was rated consistently higher with the query recommendation feature turned on vs. off. The relevance of computer-recommended search terms was also rated high, and in most cases the participants had not thought of these terms on their own. The questions on perceived usefulness and perceived ease of use received overwhelmingly positive responses. A vast majority of the participants wanted the query recommendation feature to be available to assist in their day-to-day EHR search tasks. DISCUSSION AND CONCLUSION Challenges persist for users to construct effective search queries when retrieving information from biomedical documents including those from EHRs. This study demonstrates that semantically-based query recommendation is a viable solution to addressing this challenge.
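The general idea of synonym-based query recommendation can be sketched as below. This is only a toy illustration under stated assumptions: the synonym table and the function name are hypothetical, and the sketch does not call MetaMap or UMLS, which the study used to identify and expand concepts.

```python
# Hypothetical hand-written synonym table standing in for a curated resource.
SYNONYMS = {
    "heart attack": ["myocardial infarction", "mi"],
    "high blood pressure": ["hypertension", "htn"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Return the normalised query followed by any suggested interchangeable terms."""
    key = query.strip().lower()
    terms = [key]
    for candidate in synonyms.get(key, []):
        if candidate not in terms:  # avoid suggesting duplicates
            terms.append(candidate)
    return terms
```

A search interface could present the extra terms as optional suggestions, leaving the user to accept or reject each one.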
Affiliation(s)
- David A Hanauer
- Department of Pediatrics, University of Michigan Medical School, 5312 CC, SPC 5940, 1500 East Medical Center Drive, Ann Arbor, MI 48109, USA; School of Information, University of Michigan, 105 South State Street, Ann Arbor, MI 48109, USA.
| | - Danny T Y Wu
- School of Information, University of Michigan, 105 South State Street, Ann Arbor, MI 48109, USA; Department of Pediatrics, University of Michigan Medical School, 5312 CC, SPC 5940, 1500 East Medical Center Drive, Ann Arbor, MI 48109, USA.
| | - Lei Yang
- School of Information, University of Michigan, 105 South State Street, Ann Arbor, MI 48109, USA.
| | - Qiaozhu Mei
- School of Information, University of Michigan, 105 South State Street, Ann Arbor, MI 48109, USA; Department of Electrical Engineering and Computer Science, University of Michigan, 2260 Hayward Street, Ann Arbor, MI 48109, USA.
| | - Katherine B Murkowski-Steffy
- Department of Health Management and Policy, School of Public Health, 1415 Washington Heights, Ann Arbor, MI 48109, USA.
| | - V G Vinod Vydiswaran
- Department of Learning Health Sciences, University of Michigan Medical School, 1111 East Catherine Street, Ann Arbor, MI 48109, USA; School of Information, University of Michigan, 105 South State Street, Ann Arbor, MI 48109, USA.
| | - Kai Zheng
- Department of Health Management and Policy, School of Public Health, 1415 Washington Heights, Ann Arbor, MI 48109, USA; School of Information, University of Michigan, 105 South State Street, Ann Arbor, MI 48109, USA.
| |
|
16
|
de Charry F, Sadoune K, Sebban C, Rey P, de Parisot A, Nicolas-Virelizier E, Belhabri A, Ghesquières H, Ninet J, Faurie P. [Association of lymphoma and granulomatosis: A case series]. Rev Med Interne 2015; 37:453-9. [PMID: 26611429 DOI: 10.1016/j.revmed.2015.10.344] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 05/21/2015] [Revised: 08/24/2015] [Accepted: 10/23/2015] [Indexed: 12/14/2022]
Abstract
PURPOSE The sarcoidosis-lymphoma syndrome is a recognised entity. However, the presence of granulomas in patients with a haematological disease should not lead too easily to a diagnosis of sarcoidosis. The presence of granulomatous lesions during the follow-up of these patients raises diagnostic and therapeutic issues. METHODS We included 25 patients followed by the department of haematology in a French hospital (Centre Léon-Bérard). These patients presented with granulomatous lesions. Patients with a history of sarcoidosis were excluded. We report the type of haematological disease, the time of onset of the granulomatous disease compared to that of lymphoma, associated symptoms, aetiology and outcome. Patients were divided into three groups according to the time of onset of the granulomatous lesions. RESULTS Granulomatous lesions appeared before the haematological disease in 4 cases, were concomitant in 8 cases and appeared later in the remaining 13 cases. The two main subtypes of lymphoma encountered were diffuse large cell lymphoma (36%) and Hodgkin's lymphoma (28%). Granulomatous lesions were related to the progression of the haematological disease in 11 cases, to sarcoidosis in 4 cases, to infection in 3 cases, to drug allergy in one case, to inflammatory bowel disease in one case, to granuloma annulare in one case and were isolated in 4 cases (no identified aetiology). In the group where granulomas appeared after the haematological disease, mean SUV was 11 for the haematological disease versus 6.4 for granulomas. CONCLUSION Granulomatous diseases in lymphomas can be due to various aetiologies: infection, reaction to the haematological disease, or systemic sarcoidosis. They are an important challenge for clinicians, who can miss the diagnosis of lymphoma and/or wrongly conclude that treatment has failed or that the disease has relapsed. Computed tomography scan (CT-scan) or (18)F-deoxyglucose-positron emission tomography scan can help establish a diagnosis but do not replace biopsy.
Affiliation(s)
- F de Charry
- Service de médecine interne, hôpital Édouard-Herriot, 5, place d'Arsonval, 69003 Lyon, France; Service d'hématologie, centre Léon-Bérard, 28, rue Laënnec, 69003 Lyon, France; Service de médecine interne, hôpital d'Instruction des Armées Desgenettes, 108, boulevard Pinel, 69003 Lyon, France.
| | - K Sadoune
- Service de médecine nucléaire, centre Léon-Bérard, 28, rue Laënnec, 69003 Lyon, France
| | - C Sebban
- Service d'hématologie, centre Léon-Bérard, 28, rue Laënnec, 69003 Lyon, France
| | - P Rey
- Service d'hématologie, centre Léon-Bérard, 28, rue Laënnec, 69003 Lyon, France
| | - A de Parisot
- Service d'hématologie, centre Léon-Bérard, 28, rue Laënnec, 69003 Lyon, France
| | | | - A Belhabri
- Service d'hématologie, centre Léon-Bérard, 28, rue Laënnec, 69003 Lyon, France
| | - H Ghesquières
- Service d'hématologie, centre Léon-Bérard, 28, rue Laënnec, 69003 Lyon, France
| | - J Ninet
- Service de médecine interne, hôpital Édouard-Herriot, 5, place d'Arsonval, 69003 Lyon, France
| | - P Faurie
- Service de médecine interne, hôpital Édouard-Herriot, 5, place d'Arsonval, 69003 Lyon, France
| |
|
17
|
Lehmann CU, Gundlapalli AV. Improving Bridging from Informatics Practice to Theory. Methods Inf Med 2015; 54:540-5. [PMID: 26577504 DOI: 10.3414/me15-01-0138] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 10/20/2015] [Accepted: 10/22/2015] [Indexed: 11/09/2022]
Abstract
BACKGROUND In 1962, Methods of Information in Medicine (MIM) began to publish papers on the methodology and scientific fundamentals of organizing, representing, and analyzing data, information, and knowledge in biomedicine and health care. Considered a companion journal, Applied Clinical Informatics (ACI) was launched in 2009 with a mission to establish a platform that allows sharing of knowledge between clinical medicine and health IT specialists as well as to bridge gaps between visionary design and successful and pragmatic deployment of clinical information systems. Both journals are official journals of the International Medical Informatics Association. OBJECTIVES As a follow-up to prior work, we set out to explore congruencies and interdependencies in publications of ACI and MIM. The objectives were to describe the major topics discussed in articles published in ACI in 2014 and to determine if there was evidence that theory in 2014 MIM publications was informed by practice described in ACI publications in any year. We also set out to describe lessons learned in the context of bridging informatics practice and theory and offer opinions on how ACI editorial policies could evolve to foster and improve such bridging. METHODS We conducted a retrospective observational study and reviewed all articles published in ACI during the calendar year 2014 (Volume 5) for their main theme, conclusions, and key words. We then reviewed the citations of all MIM papers from 2014 to determine if there were references to ACI articles from any year. Lessons learned in the context of bridging informatics practice and theory and opinions on ACI editorial policies were developed by consensus among the two authors. RESULTS A total of 70 articles were published in ACI in 2014. Clinical decision support, clinical documentation, usability, Meaningful Use, health information exchange, patient portals, and clinical research informatics emerged as major themes. Only one MIM article from 2014 cited an ACI article. There are several lessons learned including the possibility that there may not be direct links between MIM theory and ACI practice articles. ACI editorial policies will continue to evolve to reflect the breadth and depth of the practice of clinical informatics and articles received for publication. Efforts to encourage bridging of informatics practice and theory may be considered by the ACI editors. CONCLUSIONS The lack of direct links from informatics theory-based papers published in MIM in 2014 to papers published in ACI continues as was described for papers published during 2012 to 2013 in the two companion journals. Thus, there is little evidence that theory in MIM has been informed by practice in ACI.
Affiliation(s)
| | - A V Gundlapalli
- Adi V. Gundlapalli, MD, PhD, MS, Chief Health Informatics Officer, VA Salt Lake City Health Care System, Salt Lake City, UT 84148, USA
| |
|
18
|
|
19
|
Supporting information retrieval from electronic health records: A report of University of Michigan's nine-year experience in developing and using the Electronic Medical Record Search Engine (EMERSE). J Biomed Inform 2015; 55:290-300. [PMID: 25979153 DOI: 10.1016/j.jbi.2015.05.003] [Citation(s) in RCA: 291] [Impact Index Per Article: 32.3] [Received: 01/22/2015] [Revised: 03/31/2015] [Accepted: 05/05/2015] [Indexed: 12/18/2022]
Abstract
OBJECTIVE This paper describes the University of Michigan's nine-year experience in developing and using a full-text search engine designed to facilitate information retrieval (IR) from narrative documents stored in electronic health records (EHRs). The system, called the Electronic Medical Record Search Engine (EMERSE), functions similarly to Google but is equipped with special functionalities for handling challenges unique to retrieving information from medical text. MATERIALS AND METHODS Key features that distinguish EMERSE from general-purpose search engines are discussed, with an emphasis on functions crucial to (1) improving medical IR performance and (2) assuring search quality and results consistency regardless of users' medical background, stage of training, or level of technical expertise. RESULTS Since its initial deployment, EMERSE has been enthusiastically embraced by clinicians, administrators, and clinical and translational researchers. To date, the system has been used in supporting more than 750 research projects yielding 80 peer-reviewed publications. In several evaluation studies, EMERSE demonstrated very high levels of sensitivity and specificity in addition to greatly improved chart review efficiency. DISCUSSION Increased availability of electronic data in healthcare does not automatically warrant increased availability of information. The success of EMERSE at our institution illustrates that free-text EHR search engines can be a valuable tool to help practitioners and researchers retrieve information from EHRs more effectively and efficiently, enabling critical tasks such as patient case synthesis and research data abstraction. CONCLUSION EMERSE, available free of charge for academic use, represents a state-of-the-art medical IR tool with proven effectiveness and user acceptance.
|