Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Almeida JR, Silva JF, Matos S, Oliveira JL. A two-stage workflow to extract and harmonize drug mentions from clinical notes into observational databases. J Biomed Inform 2021;120:103849. [PMID: 34214696 DOI: 10.1016/j.jbi.2021.103849] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/02/2020] [Revised: 06/04/2021] [Accepted: 06/19/2021] [Indexed: 01/02/2023]

For:	Almeida JR, Silva JF, Matos S, Oliveira JL. A two-stage workflow to extract and harmonize drug mentions from clinical notes into observational databases. J Biomed Inform 2021;120:103849. [PMID: 34214696 DOI: 10.1016/j.jbi.2021.103849] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/02/2020] [Revised: 06/04/2021] [Accepted: 06/19/2021] [Indexed: 01/02/2023]

Number

Cited by Other Article(s)

Henke E, Zoch M, Peng Y, Reinecke I, Sedlmayr M, Bathelt F. Conceptual design of a generic data harmonization process for OMOP common data model. BMC Med Inform Decis Mak 2024;24:58. [PMID: 38408983 PMCID: PMC10895818 DOI: 10.1186/s12911-024-02458-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/23/2023] [Accepted: 02/09/2024] [Indexed: 02/28/2024] Open

Henke E, Zoch M, Kallfelz M, Ruhnke T, Leutner LA, Spoden M, Günster C, Sedlmayr M, Bathelt F. Assessing the Use of German Claims Data Vocabularies for Research in the Observational Medical Outcomes Partnership Common Data Model: Development and Evaluation Study. JMIR Med Inform 2023;11:e47959. [PMID: 37942786 PMCID: PMC10653283 DOI: 10.2196/47959] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/06/2023] [Revised: 09/07/2023] [Accepted: 09/09/2023] [Indexed: 11/10/2023] Open

Abstract

Background

National classifications and terminologies already routinely used for documentation within patient care settings enable the unambiguous representation of clinical information. However, the diversity of different vocabularies across health care institutions and countries is a barrier to achieving semantic interoperability and exchanging data across sites. The Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) enables the standardization of structure and medical terminology. It allows the mapping of national vocabularies into so-called standard concepts, representing normative expressions for international analyses and research. Within our project "Hybrid Quality Indicators Using Machine Learning Methods" (Hybrid-QI), we aim to harmonize source codes used in German claims data vocabularies that are currently unavailable in the OMOP CDM.

Objective

This study aims to increase the coverage of German vocabularies in the OMOP CDM. We aim to completely transform the source codes used in German claims data into the OMOP CDM without data loss and make German claims data usable for OMOP CDM-based research.

Methods

To prepare the missing German vocabularies for the OMOP CDM, we defined a vocabulary preparation approach consisting of the identification of all codes of the corresponding vocabularies, their assembly into machine-readable tables, and the translation of German designations into English. Furthermore, we used 2 proposed approaches for OMOP-compliant vocabulary preparation: the mapping to standard concepts using the Observational Health Data Sciences and Informatics (OHDSI) tool Usagi and the preparation of new 2-billion concepts (ie, concept_id >2 billion). Finally, we evaluated the prepared vocabularies regarding completeness and correctness using synthetic German claims data and calculated the coverage of German claims data vocabularies in the OMOP CDM.

Results

Our vocabulary preparation approach was able to map 3 missing German vocabularies to standard concepts and prepare 8 vocabularies as new 2-billion concepts. The completeness evaluation showed that the prepared vocabularies cover 44.3% (3288/7417) of the source codes contained in German claims data. The correctness evaluation revealed that the specified validity periods in the OMOP CDM are compliant for the majority (705,531/706,032, 99.9%) of source codes and associated dates in German claims data. The calculation of the vocabulary coverage showed a noticeable decrease of missing vocabularies from 55% (11/20) to 10% (2/20) due to our preparation approach.

Conclusions

By preparing 10 vocabularies, we showed that our approach is applicable to any type of vocabulary used in a source data set. The prepared vocabularies are currently limited to German vocabularies, which can only be used in national OMOP CDM research projects, because the mapping of new 2-billion concepts to standard concepts is missing. To participate in international OHDSI network studies with German claims data, future work is required to map the prepared 2-billion concepts to standard concepts.

Collapse

de Groot R, Püttmann DP, Fleuren LM, Thoral PJ, Elbers PWG, de Keizer NF, Cornet R. Determining and assessing characteristics of data element names impacting the performance of annotation using Usagi. Int J Med Inform 2023;178:105200. [PMID: 37703800 DOI: 10.1016/j.ijmedinf.2023.105200] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/24/2023] [Revised: 08/11/2023] [Accepted: 08/23/2023] [Indexed: 09/15/2023]

Abstract

INTRODUCTION

Hospitals generate large amounts of data and this data is generally modeled and labeled in a proprietary way, hampering its exchange and integration. Manually annotating data element names to internationally standardized data element identifiers is a time-consuming effort. Tools can support performing this task automatically. This study aimed to determine what factors influence the quality of automatic annotations.

METHODS

Data element names were used from the Dutch COVID-19 ICU Data Warehouse containing data on intensive care patients with COVID-19 from 25 hospitals in the Netherlands. In this data warehouse, the data had been merged using a proprietary terminology system while also storing the original hospital labels (synonymous names). Usagi, an OHDSI annotation tool, was used to perform the annotation for the data. A gold standard was used to determine if Usagi made correct annotations. Logistic regression was used to determine if the number of characters, number of words, match score (Usagi's certainty) and hospital label origin influenced Usagi's performance to annotate correctly.

RESULTS

Usagi automatically annotated 30.5% of the data element names correctly and 5.5% of the synonymous names. The match score is the best predictor for Usagi finding the correct annotation. It was determined that the AUC of data element names was 0.651 and 0.752 for the synonymous names respectively. The AUC for the individual hospital label origins varied between 0.460 to 0.905.

DISCUSSION

The results show that Usagi performed better to annotate the data element names than the synonymous names. The hospital origin in the synonymous names dataset was associated with the amount of correctly annotated concepts. Hospitals that performed better had shorter synonymous names and fewer words. Using shorter data element names or synonymous names should be considered to optimize the automatic annotating process. Overall, the performance of Usagi is too poor to completely rely on for automatic annotation.

Collapse

Kumar S, Nanelia A, Mariappan R, Rajagopal A, Rajan V. Patient Representation Learning From Heterogeneous Data Sources and Knowledge Graphs Using Deep Collective Matrix Factorization: Evaluation Study. JMIR Med Inform 2022;10:e28842. [PMID: 35049514 PMCID: PMC8814927 DOI: 10.2196/28842] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/15/2021] [Revised: 11/07/2021] [Accepted: 11/14/2021] [Indexed: 11/13/2022] Open

Abstract

BACKGROUND

Patient representation learning aims to learn features, also called representations, from input sources automatically, often in an unsupervised manner, for use in predictive models. This obviates the need for cumbersome, time- and resource-intensive manual feature engineering, especially from unstructured data such as text, images, or graphs. Most previous techniques have used neural network-based autoencoders to learn patient representations, primarily from clinical notes in electronic medical records (EMRs). Knowledge graphs (KGs), with clinical entities as nodes and their relations as edges, can be extracted automatically from biomedical literature and provide complementary information to EMR data that have been found to provide valuable predictive signals.

OBJECTIVE

This study aims to evaluate the efficacy of collective matrix factorization (CMF), both the classical variant and a recent neural architecture called deep CMF (DCMF), in integrating heterogeneous data sources from EMR and KG to obtain patient representations for clinical decision support tasks.

METHODS

Using a recent formulation for obtaining graph representations through matrix factorization within the context of CMF, we infused auxiliary information during patient representation learning. We also extended the DCMF architecture to create a task-specific end-to-end model that learns to simultaneously find effective patient representations and predictions. We compared the efficacy of such a model to that of first learning unsupervised representations and then independently learning a predictive model. We evaluated patient representation learning using CMF-based methods and autoencoders for 2 clinical decision support tasks on a large EMR data set.

RESULTS

Our experiments show that DCMF provides a seamless way for integrating multiple sources of data to obtain patient representations, both in unsupervised and supervised settings. Its performance in single-source settings is comparable with that of previous autoencoder-based representation learning methods. When DCMF is used to obtain representations from a combination of EMR and KG, where most previous autoencoder-based methods cannot be used directly, its performance is superior to that of previous nonneural methods for CMF. Infusing information from KGs into patient representations using DCMF was found to improve downstream predictive performance.

CONCLUSIONS

Our experiments indicate that DCMF is a versatile model that can be used to obtain representations from single and multiple data sources and combine information from EMR data and KGs. Furthermore, DCMF can be used to learn representations in both supervised and unsupervised settings. Thus, DCMF offers an effective way of integrating heterogeneous data sources and infusing auxiliary knowledge into patient representations.

Collapse

A contextual multi-task neural approach to medication and adverse events identification from clinical text. J Biomed Inform 2021;125:103960. [PMID: 34875387 DOI: 10.1016/j.jbi.2021.103960] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/27/2021] [Revised: 11/04/2021] [Accepted: 11/22/2021] [Indexed: 12/27/2022]