1
Henke E, Zoch M, Peng Y, Reinecke I, Sedlmayr M, Bathelt F. Conceptual design of a generic data harmonization process for OMOP common data model. BMC Med Inform Decis Mak 2024; 24:58. [PMID: 38408983 PMCID: PMC10895818 DOI: 10.1186/s12911-024-02458-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2023] [Accepted: 02/09/2024] [Indexed: 02/28/2024]
Abstract
BACKGROUND To gain insight into the real-life care of patients in the healthcare system, data from hospital information systems and insurance systems are required. Consequently, linking clinical data with claims data is necessary. To ensure their syntactic and semantic interoperability, the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) from the Observational Health Data Sciences and Informatics (OHDSI) community was chosen. However, there is no detailed guide that would allow researchers to follow a generic process for data harmonization, i.e. the transformation of local source data into the standardized OMOP CDM format. Thus, the aim of this paper is to conceptualize a generic data harmonization process for OMOP CDM. METHODS For this purpose, we conducted a literature review focusing on publications that address the harmonization of clinical or claims data in OMOP CDM. Subsequently, the process steps used and their chronological order as well as applied OHDSI tools were extracted for each included publication. The results were then compared to derive a generic sequence of the process steps. RESULTS From 23 publications included, a generic data harmonization process for OMOP CDM was conceptualized, consisting of nine process steps: dataset specification, data profiling, vocabulary identification, coverage analysis of vocabularies, semantic mapping, structural mapping, extract-transform-load-process, qualitative and quantitative data quality analysis. Furthermore, we identified seven OHDSI tools which supported five of the process steps. CONCLUSIONS The generic data harmonization process can be used as a step-by-step guide to assist other researchers in harmonizing source data in OMOP CDM.
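As an illustration only, the nine process steps named in the abstract can be written down as an ordered enumeration. This is a minimal sketch, not code from the paper: it assumes the order in which the abstract lists the steps is the intended sequence, and the names `HarmonizationStep` and `next_step` are hypothetical.

```python
from enum import IntEnum
from typing import Optional


class HarmonizationStep(IntEnum):
    """The nine steps listed in the abstract, numbered in listing order (assumed sequence)."""
    DATASET_SPECIFICATION = 1
    DATA_PROFILING = 2
    VOCABULARY_IDENTIFICATION = 3
    COVERAGE_ANALYSIS_OF_VOCABULARIES = 4
    SEMANTIC_MAPPING = 5
    STRUCTURAL_MAPPING = 6
    EXTRACT_TRANSFORM_LOAD = 7
    QUALITATIVE_DATA_QUALITY_ANALYSIS = 8
    QUANTITATIVE_DATA_QUALITY_ANALYSIS = 9


def next_step(current: HarmonizationStep) -> Optional[HarmonizationStep]:
    """Return the step that follows `current`, or None after the last step."""
    if current is HarmonizationStep.QUANTITATIVE_DATA_QUALITY_ANALYSIS:
        return None
    return HarmonizationStep(current + 1)


if __name__ == "__main__":
    # Print the steps as a simple checklist.
    for step in HarmonizationStep:
        print(f"{step.value}. {step.name.replace('_', ' ').lower()}")
```

An integer-valued enumeration keeps the order explicit, so a project checklist can be sorted or compared by step number.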
Affiliation(s)
- Elisa Henke
- Institute for Medical Informatics and Biometry, Carl Gustav Carus Faculty of Medicine, Technische Universität Dresden, 01307, Dresden, Germany
- Michele Zoch
- Institute for Medical Informatics and Biometry, Carl Gustav Carus Faculty of Medicine, Technische Universität Dresden, 01307, Dresden, Germany
- Yuan Peng
- Institute for Medical Informatics and Biometry, Carl Gustav Carus Faculty of Medicine, Technische Universität Dresden, 01307, Dresden, Germany
- Ines Reinecke
- Data Integration Center, Center for Medical Informatics, University Hospital Carl Gustav Carus Dresden, 01307, Dresden, Germany
- Martin Sedlmayr
- Institute for Medical Informatics and Biometry, Carl Gustav Carus Faculty of Medicine, Technische Universität Dresden, 01307, Dresden, Germany
2
Oja M, Tamm S, Mooses K, Pajusalu M, Talvik HA, Ott A, Laht M, Malk M, Lõo M, Holm J, Haug M, Šuvalov H, Särg D, Vilo J, Laur S, Kolde R, Reisberg S. Transforming Estonian health data to the Observational Medical Outcomes Partnership (OMOP) Common Data Model: lessons learned. JAMIA Open 2023; 6:ooad100. [PMID: 38058679 PMCID: PMC10697784 DOI: 10.1093/jamiaopen/ooad100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2023] [Accepted: 11/15/2023] [Indexed: 12/08/2023]
Abstract
Objective To describe the reusable transformation process of electronic health records (EHR), claims, and prescriptions data into the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM), together with challenges faced and solutions implemented. Materials and Methods We used Estonian national health databases that store almost all residents' claims, prescriptions, and EHR records. To develop and demonstrate the transformation process of Estonian health data to OMOP CDM, we used a 10% random sample of the Estonian population (n = 150 824 patients) from 2012 to 2019 (MAITT dataset). For the sample, complete information from all 3 databases was converted to OMOP CDM version 5.3. The validation was performed using open-source tools. Results In total, we transformed over 100 million entries to standard concepts using standard OMOP vocabularies, with an average mapping rate of 95%. For conditions, observations, drugs, and measurements, the mapping rate was over 90%. In most cases, SNOMED Clinical Terms were used as the target vocabulary. Discussion During the transformation process, we encountered several challenges, which are described in detail with concrete examples and solutions. Conclusion For a representative 10% random sample, we successfully transferred complete records from 3 national health databases to OMOP CDM and created a reusable transformation process. Our work helps future researchers to transform linked databases into OMOP CDM more efficiently, ultimately leading to better real-world evidence.
Affiliation(s)
- Marek Oja
- Institute of Computer Science, University of Tartu, 51009 Tartu, Estonia
- Sirli Tamm
- Institute of Computer Science, University of Tartu, 51009 Tartu, Estonia
- Kerli Mooses
- Institute of Computer Science, University of Tartu, 51009 Tartu, Estonia
- Maarja Pajusalu
- Institute of Computer Science, University of Tartu, 51009 Tartu, Estonia
- Harry-Anton Talvik
- Institute of Computer Science, University of Tartu, 51009 Tartu, Estonia
- STACC, 51009 Tartu, Estonia
- Anne Ott
- Institute of Computer Science, University of Tartu, 51009 Tartu, Estonia
- Marianna Laht
- Institute of Computer Science, University of Tartu, 51009 Tartu, Estonia
- Maria Malk
- Institute of Computer Science, University of Tartu, 51009 Tartu, Estonia
- Marcus Lõo
- Institute of Computer Science, University of Tartu, 51009 Tartu, Estonia
- Johannes Holm
- Institute of Computer Science, University of Tartu, 51009 Tartu, Estonia
- Markus Haug
- Institute of Computer Science, University of Tartu, 51009 Tartu, Estonia
- Hendrik Šuvalov
- Institute of Computer Science, University of Tartu, 51009 Tartu, Estonia
- Dage Särg
- Institute of Computer Science, University of Tartu, 51009 Tartu, Estonia
- STACC, 51009 Tartu, Estonia
- Jaak Vilo
- Institute of Computer Science, University of Tartu, 51009 Tartu, Estonia
- STACC, 51009 Tartu, Estonia
- Sven Laur
- Institute of Computer Science, University of Tartu, 51009 Tartu, Estonia
- Raivo Kolde
- Institute of Computer Science, University of Tartu, 51009 Tartu, Estonia
- Sulev Reisberg
- Institute of Computer Science, University of Tartu, 51009 Tartu, Estonia
- STACC, 51009 Tartu, Estonia
3
Henke E, Zoch M, Kallfelz M, Ruhnke T, Leutner LA, Spoden M, Günster C, Sedlmayr M, Bathelt F. Assessing the Use of German Claims Data Vocabularies for Research in the Observational Medical Outcomes Partnership Common Data Model: Development and Evaluation Study. JMIR Med Inform 2023; 11:e47959. [PMID: 37942786 PMCID: PMC10653283 DOI: 10.2196/47959] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2023] [Revised: 09/07/2023] [Accepted: 09/09/2023] [Indexed: 11/10/2023]
Abstract
Background National classifications and terminologies already routinely used for documentation within patient care settings enable the unambiguous representation of clinical information. However, the diversity of different vocabularies across health care institutions and countries is a barrier to achieving semantic interoperability and exchanging data across sites. The Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) enables the standardization of structure and medical terminology. It allows the mapping of national vocabularies into so-called standard concepts, representing normative expressions for international analyses and research. Within our project "Hybrid Quality Indicators Using Machine Learning Methods" (Hybrid-QI), we aim to harmonize source codes used in German claims data vocabularies that are currently unavailable in the OMOP CDM. Objective This study aims to increase the coverage of German vocabularies in the OMOP CDM. We aim to completely transform the source codes used in German claims data into the OMOP CDM without data loss and make German claims data usable for OMOP CDM-based research. Methods To prepare the missing German vocabularies for the OMOP CDM, we defined a vocabulary preparation approach consisting of the identification of all codes of the corresponding vocabularies, their assembly into machine-readable tables, and the translation of German designations into English. Furthermore, we used 2 proposed approaches for OMOP-compliant vocabulary preparation: the mapping to standard concepts using the Observational Health Data Sciences and Informatics (OHDSI) tool Usagi and the preparation of new 2-billion concepts (ie, concept_id >2 billion). Finally, we evaluated the prepared vocabularies regarding completeness and correctness using synthetic German claims data and calculated the coverage of German claims data vocabularies in the OMOP CDM. 
Results Our vocabulary preparation approach was able to map 3 missing German vocabularies to standard concepts and prepare 8 vocabularies as new 2-billion concepts. The completeness evaluation showed that the prepared vocabularies cover 44.3% (3288/7417) of the source codes contained in German claims data. The correctness evaluation revealed that the specified validity periods in the OMOP CDM are compliant for the majority (705,531/706,032, 99.9%) of source codes and associated dates in German claims data. The calculation of the vocabulary coverage showed a noticeable decrease of missing vocabularies from 55% (11/20) to 10% (2/20) due to our preparation approach. Conclusions By preparing 10 vocabularies, we showed that our approach is applicable to any type of vocabulary used in a source data set. The prepared vocabularies are currently limited to German vocabularies, which can only be used in national OMOP CDM research projects, because the mapping of new 2-billion concepts to standard concepts is missing. To participate in international OHDSI network studies with German claims data, future work is required to map the prepared 2-billion concepts to standard concepts.
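The figures reported in this abstract can be checked with a few lines of arithmetic. The sketch below is illustrative and not taken from the paper: the helper names `is_two_billion_concept` and `percent` are hypothetical, and the only rule assumed is the one stated in the abstract, namely that new "2-billion concepts" have a concept_id above 2 billion.

```python
# Threshold quoted in the abstract: "new 2-billion concepts (ie, concept_id >2 billion)".
TWO_BILLION = 2_000_000_000


def is_two_billion_concept(concept_id: int) -> bool:
    """True if the concept_id lies in the range the abstract uses for newly prepared concepts."""
    return concept_id > TWO_BILLION


def percent(numerator: int, denominator: int) -> float:
    """Share of numerator in denominator as a percentage, rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


# Counts copied from the abstract, paired with the percentages it reports.
reported = {
    "source codes covered by prepared vocabularies": (3288, 7417, 44.3),
    "source codes with compliant validity periods": (705_531, 706_032, 99.9),
    "missing vocabularies before preparation": (11, 20, 55.0),
    "missing vocabularies after preparation": (2, 20, 10.0),
}

if __name__ == "__main__":
    for label, (num, den, stated) in reported.items():
        print(f"{label}: {num}/{den} = {percent(num, den)}% (abstract states {stated}%)")
```

Each recomputed percentage matches the value stated in the abstract at one decimal place.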
Affiliation(s)
- Elisa Henke
- Institute for Medical Informatics and Biometry, Carl Gustav Carus Faculty of Medicine, Technische Universität Dresden, Dresden, Germany
- Michéle Zoch
- Institute for Medical Informatics and Biometry, Carl Gustav Carus Faculty of Medicine, Technische Universität Dresden, Dresden, Germany
- Thomas Ruhnke
- Wissenschaftliches Institut der AOK (AOK Research Institute), Berlin, Germany
- Liz Annika Leutner
- Institute for Medical Informatics and Biometry, Carl Gustav Carus Faculty of Medicine, Technische Universität Dresden, Dresden, Germany
- Melissa Spoden
- Wissenschaftliches Institut der AOK (AOK Research Institute), Berlin, Germany
- Christian Günster
- Wissenschaftliches Institut der AOK (AOK Research Institute), Berlin, Germany
- Martin Sedlmayr
- Institute for Medical Informatics and Biometry, Carl Gustav Carus Faculty of Medicine, Technische Universität Dresden, Dresden, Germany
4
Mazzotti DR, Haendel MA, McMurry JA, Smith CJ, Buysse DJ, Roenneberg T, Penzel T, Purcell S, Redline S, Zhang Y, Merikangas KR, Menetski JP, Mullington J, Boudreau E. Sleep and circadian informatics data harmonization: a workshop report from the Sleep Research Society and Sleep Research Network. Sleep 2022; 45:zsac002. [PMID: 35030631 PMCID: PMC9189941 DOI: 10.1093/sleep/zsac002] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 11/12/2021] [Revised: 12/21/2021] [Indexed: 01/16/2023]
Abstract
The increasing availability and complexity of sleep and circadian data are equally exciting and challenging. The field is in constant technological development, generating better high-resolution physiological and molecular data than ever before. Yet, the promise of large-scale studies leveraging millions of patients is limited by suboptimal approaches for data sharing and interoperability. As a result, integration of valuable clinical and basic resources is problematic, preventing knowledge discovery and rapid translation of findings into clinical care. To understand the current data landscape in the sleep and circadian domains, the Sleep Research Society (SRS) and the Sleep Research Network (now a task force of the SRS) organized a workshop on informatics and data harmonization, presented at the World Sleep Congress 2019, in Vancouver, Canada. Experts in translational informatics gathered with sleep research experts to discuss opportunities and challenges in defining strategies for data harmonization. The goal of this workshop was to fuel discussion and foster innovative approaches for data integration and development of informatics infrastructure supporting multi-site collaboration. Key recommendations included collecting and storing findable, accessible, interoperable, and reusable data; identifying existing international cohorts and resources supporting research in sleep and circadian biology; and defining the most relevant sleep data elements and associated metadata that could be supported by early integration initiatives. This report introduces foundational concepts with the goal of facilitating engagement between the sleep/circadian and informatics communities and is a call to action for the implementation and adoption of data harmonization strategies in this domain.
Affiliation(s)
- Diego R Mazzotti
- Division of Medical Informatics, Department of Internal Medicine, University of Kansas Medical Center, Kansas City, KS, USA
- Division of Pulmonary Critical Care and Sleep Medicine, Department of Internal Medicine, University of Kansas Medical Center, Kansas City, KS, USA
- Melissa A Haendel
- Center for Health AI, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Julie A McMurry
- Center for Health AI, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Connor J Smith
- Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, OR, USA
- Daniel J Buysse
- Department of Psychiatry, University of Pittsburgh, Pittsburgh, PA, USA
- Till Roenneberg
- Institute and Polyclinic for Occupational-, Social- and Environmental Medicine, LMU Munich, Germany
- Thomas Penzel
- Interdisciplinary Center of Sleep Medicine, Charité University Hospital, Berlin, Germany
- Shaun Purcell
- Department of Psychiatry, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
- Susan Redline
- Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
- Ying Zhang
- Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
- Kathleen R Merikangas
- Genetic Epidemiology Research Branch, Intramural Research Program, National Institute of Mental Health, Bethesda, MD, USA
- Janet Mullington
- Department of Neurology, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA
- Eilis Boudreau
- Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, OR, USA
5
Jung H, Yoo S, Kim S, Heo E, Kim B, Lee HY, Hwang H. Patient-Level Fall Risk Prediction Using the Observational Medical Outcomes Partnership's Common Data Model: Pilot Feasibility Study. JMIR Med Inform 2022; 10:e35104. [PMID: 35275076 PMCID: PMC8957002 DOI: 10.2196/35104] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/22/2021] [Revised: 01/02/2022] [Accepted: 01/31/2022] [Indexed: 11/13/2022]
Abstract
BACKGROUND Falls in acute care settings threaten patients' safety. Researchers have been developing fall risk prediction models and exploring risk factors to provide evidence-based fall prevention practices; however, such efforts are hindered by insufficient samples, limited covariates, and a lack of standardized methodologies that aid study replication. OBJECTIVE The objectives of this study were to (1) convert fall-related electronic health record data into the standardized Observational Medical Outcomes Partnership's (OMOP) common data model format and (2) develop models that predict fall risk during 2 time periods. METHODS As a pilot feasibility test, we converted fall-related electronic health record data (nursing notes, fall risk assessment sheet, patient acuity assessment sheet, and clinical observation sheet) into standardized OMOP common data model format using an extraction, transformation, and load process. We developed fall risk prediction models for 2 time periods (within 7 days of admission and during the entire hospital stay) using 2 algorithms (least absolute shrinkage and selection operator logistic regression and random forest). RESULTS In total, 6277 nursing statements, 747,049,486 clinical observation sheet records, 1,554,775 fall risk scores, and 5,685,011 patient acuity scores were converted into OMOP common data model format. All our models (area under the receiver operating characteristic curve 0.692-0.726) performed better than the Hendrich II Fall Risk Model. Patient acuity score, fall history, age ≥60 years, movement disorder, and central nervous system agents were the most important predictors in the logistic regression models. CONCLUSIONS To enhance model performance further, we are currently converting all nursing records into the OMOP common data model data format, which will then be included in the models. Thus, in the near future, the performance of fall risk prediction models could be improved through the application of abundant nursing records and external validation.
Affiliation(s)
- Hyesil Jung
- Office of eHealth Research and Business, Seoul National University Bundang Hospital, Seongnam-si, Republic of Korea
- Sooyoung Yoo
- Office of eHealth Research and Business, Seoul National University Bundang Hospital, Seongnam-si, Republic of Korea
- Seok Kim
- Office of eHealth Research and Business, Seoul National University Bundang Hospital, Seongnam-si, Republic of Korea
- Eunjeong Heo
- Office of eHealth Research and Business, Seoul National University Bundang Hospital, Seongnam-si, Republic of Korea
- Borham Kim
- Office of eHealth Research and Business, Seoul National University Bundang Hospital, Seongnam-si, Republic of Korea
- Ho-Young Lee
- Office of eHealth Research and Business, Seoul National University Bundang Hospital, Seongnam-si, Republic of Korea
- Hee Hwang
- Kakao Healthcare Company-In-Company, Seongnam-si, Republic of Korea
6
Mazzotti DR. Landscape of biomedical informatics standards and terminologies for clinical sleep medicine research: A systematic review. Sleep Med Rev 2021; 60:101529. [PMID: 34455108 DOI: 10.1016/j.smrv.2021.101529] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 12/09/2020] [Revised: 05/14/2021] [Accepted: 07/03/2021] [Indexed: 12/31/2022]
Abstract
A systematic literature review was conducted to understand the current landscape of standards and terminologies used in clinical sleep medicine. Literature search on PubMed, EMBASE, Medline and Web of Science was performed in March 2021 using terms related to sleep, terminologies, standards, harmonization, semantics, ontology, and electronic health records (EHR). Systematic review was carried out according to PRISMA. Among 128 included studies, 35 were eligible for review. Articles were broadly classified into six topics: standard terminology efforts, reporting standards, databases and resources, data integration efforts, EHR abstraction and standards for automated sleep scoring. This review highlights the progress and challenges related to establishing computable terminologies in sleep medicine, and identifies gaps, limitations and research opportunities related to data integration that could improve adoption of clinical research informatics in this field. There is a need for the systematic adoption of standardized terminologies in all areas of sleep medicine. Existing data aggregation resources could be leveraged to support the development of an integrated infrastructure and subsequent deployment in EHR systems within sleep centers. Ultimately, the adoption of standardized practices for documenting sleep disorders and related traits facilitates data sharing, thus accelerating discovery and clinical translation of informatics approaches applied to sleep medicine.
Affiliation(s)
- Diego R Mazzotti
- Division of Medical Informatics, Department of Internal Medicine, University of Kansas Medical Center, Kansas City, KS, USA