1
Henke E, Zoch M, Peng Y, Reinecke I, Sedlmayr M, Bathelt F. Conceptual design of a generic data harmonization process for OMOP common data model. BMC Med Inform Decis Mak 2024; 24:58. [PMID: 38408983 PMCID: PMC10895818 DOI: 10.1186/s12911-024-02458-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2023] [Accepted: 02/09/2024] [Indexed: 02/28/2024] Open
Abstract
BACKGROUND To gain insight into the real-life care of patients in the healthcare system, data from hospital information systems and insurance systems are required. Consequently, linking clinical data with claims data is necessary. To ensure their syntactic and semantic interoperability, the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) from the Observational Health Data Sciences and Informatics (OHDSI) community was chosen. However, there is no detailed guide that would allow researchers to follow a generic process for data harmonization, i.e. the transformation of local source data into the standardized OMOP CDM format. Thus, the aim of this paper is to conceptualize a generic data harmonization process for OMOP CDM. METHODS For this purpose, we conducted a literature review focusing on publications that address the harmonization of clinical or claims data in OMOP CDM. Subsequently, the process steps used and their chronological order as well as applied OHDSI tools were extracted for each included publication. The results were then compared to derive a generic sequence of the process steps. RESULTS From 23 publications included, a generic data harmonization process for OMOP CDM was conceptualized, consisting of nine process steps: dataset specification, data profiling, vocabulary identification, coverage analysis of vocabularies, semantic mapping, structural mapping, extract-transform-load-process, qualitative and quantitative data quality analysis. Furthermore, we identified seven OHDSI tools which supported five of the process steps. CONCLUSIONS The generic data harmonization process can be used as a step-by-step guide to assist other researchers in harmonizing source data in OMOP CDM.
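The nine process steps named in the abstract can be read as an ordered checklist. The following is only a sketch of that ordering in Python: the class and function names are hypothetical, and treating qualitative and quantitative data quality analysis as steps eight and nine is an assumption made to match the stated count of nine.

```python
from enum import IntEnum


class HarmonizationStep(IntEnum):
    """Process steps in the order listed in the abstract (names are illustrative)."""
    DATASET_SPECIFICATION = 1
    DATA_PROFILING = 2
    VOCABULARY_IDENTIFICATION = 3
    VOCABULARY_COVERAGE_ANALYSIS = 4
    SEMANTIC_MAPPING = 5
    STRUCTURAL_MAPPING = 6
    ETL_PROCESS = 7
    QUALITATIVE_DATA_QUALITY_ANALYSIS = 8
    QUANTITATIVE_DATA_QUALITY_ANALYSIS = 9


def remaining_steps(completed):
    """Return the steps not yet completed, in process order."""
    done = set(completed)
    return [step for step in HarmonizationStep if step not in done]


# Example: after the first two steps, vocabulary identification comes next.
todo = remaining_steps([HarmonizationStep.DATASET_SPECIFICATION,
                        HarmonizationStep.DATA_PROFILING])
print(todo[0].name)
```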
Affiliation(s)
- Elisa Henke
  - Institute for Medical Informatics and Biometry, Carl Gustav Carus Faculty of Medicine, Technische Universität Dresden, 01307, Dresden, Germany
- Michele Zoch
  - Institute for Medical Informatics and Biometry, Carl Gustav Carus Faculty of Medicine, Technische Universität Dresden, 01307, Dresden, Germany
- Yuan Peng
  - Institute for Medical Informatics and Biometry, Carl Gustav Carus Faculty of Medicine, Technische Universität Dresden, 01307, Dresden, Germany
- Ines Reinecke
  - Data Integration Center, Center for Medical Informatics, University Hospital Carl Gustav Carus Dresden, 01307, Dresden, Germany
- Martin Sedlmayr
  - Institute for Medical Informatics and Biometry, Carl Gustav Carus Faculty of Medicine, Technische Universität Dresden, 01307, Dresden, Germany
2
Carlson B, Watkins M, Li M, Furner B, Cohen E, Volchenboum SL. Using A Standardized Nomenclature to Semantically Map Oncology-Related Concepts from Common Data Models to a Pediatric Cancer Data Model. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2024; 2023:874-883. [PMID: 38222364 PMCID: PMC10785885] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/16/2024]
Abstract
The Pediatric Cancer Data Commons (PCDC) comprises an international community whose ironclad commitment to data sharing is combatting pediatric cancer in an unprecedented way. The byproduct of their data sharing efforts is a gold-standard consensus data model covering many types of pediatric cancer. This article describes an effort to utilize SSSOM, an emerging specification for semantically-rich data mappings, to provide a "hub and spoke" model of mappings from several common data models (CDMs) to the PCDC data model. This provides important contributions to the research community, including: 1) a clear view of the current coverage of these CDMs in the domain of pediatric oncology, and 2) a demonstration of creating standardized mappings. These mappings can allow downstream crosswalk for data transformation and enhance data sharing. This can guide those who currently create and maintain brittle ad hoc data mappings in order to utilize the growing volume of viable research data.
Affiliation(s)
- Bradley Carlson
  - Department of Pediatrics, University of Chicago, Chicago, IL
- Michael Watkins
  - Department of Pediatrics, University of Chicago, Chicago, IL
- Mei Li
  - Department of Pediatrics, University of Chicago, Chicago, IL
- Brian Furner
  - Department of Pediatrics, University of Chicago, Chicago, IL
- Ellen Cohen
  - Department of Pediatrics, University of Chicago, Chicago, IL
3
Voss EA, Blacketer C, van Sandijk S, Moinat M, Kallfelz M, van Speybroeck M, Prieto-Alhambra D, Schuemie MJ, Rijnbeek PR. European Health Data & Evidence Network-learnings from building out a standardized international health data network. J Am Med Inform Assoc 2023; 31:209-219. [PMID: 37952118 PMCID: PMC10746315 DOI: 10.1093/jamia/ocad214] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2023] [Revised: 10/19/2023] [Accepted: 10/26/2023] [Indexed: 11/14/2023] Open
Abstract
OBJECTIVE Health data standardized to a common data model (CDM) simplifies and facilitates research. This study examines the factors that make standardizing observational health data to the Observational Medical Outcomes Partnership (OMOP) CDM successful. MATERIALS AND METHODS Twenty-five data partners (DPs) from 11 countries received funding from the European Health Data & Evidence Network (EHDEN) to standardize their data. Three surveys, DataQualityDashboard results, and statistics from the conversion process were analyzed qualitatively and quantitatively. Our measures of success were the total number of days to transform source data into the OMOP CDM and participation in network research. RESULTS The health data converted to CDM represented more than 133 million patients. 100%, 88%, and 84% of DPs took Surveys 1, 2, and 3, respectively. The median duration of the 6 key extract, transform, and load (ETL) processes ranged from 4 to 115 days. Of the 25 DPs, 21 DPs were considered applicable for analysis, of which 52% standardized their data on time and 48% participated in an international collaborative study. DISCUSSION This study shows that the consistent workflow used by EHDEN proves appropriate to support the successful standardization of observational data across Europe. Over the 25 successful transformations, we confirmed that getting the right people for the ETL is critical and that vocabulary mapping requires specific expertise and the support of tools. Additionally, we learned that teams that proactively prepared for data governance issues were able to avoid considerable delays, improving their ability to finish on time. CONCLUSION This study provides guidance for future DPs to standardize to the OMOP CDM and participate in distributed networks. We demonstrate that the Observational Health Data Sciences and Informatics community must continue to evaluate and provide guidance and support for what ultimately develops the backbone of how community members generate evidence.
Affiliation(s)
- Erica A Voss
  - OHDSI Collaborators, Observational Health Data Sciences and Informatics (OHDSI), New York, NY, United States
  - Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, the Netherlands
  - Janssen Pharmaceutical Research and Development LLC, Raritan, NJ 08869, United States
- Clair Blacketer
  - OHDSI Collaborators, Observational Health Data Sciences and Informatics (OHDSI), New York, NY, United States
  - Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, the Netherlands
  - Janssen Pharmaceutical Research and Development LLC, Raritan, NJ 08869, United States
- Sebastiaan van Sandijk
  - OHDSI Collaborators, Observational Health Data Sciences and Informatics (OHDSI), New York, NY, United States
  - Odysseus Data Services, Prague, Czech Republic
- Maxim Moinat
  - OHDSI Collaborators, Observational Health Data Sciences and Informatics (OHDSI), New York, NY, United States
  - Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, the Netherlands
- Michael Kallfelz
  - OHDSI Collaborators, Observational Health Data Sciences and Informatics (OHDSI), New York, NY, United States
  - Odysseus Data Services, Prague, Czech Republic
- Michel van Speybroeck
  - Janssen Pharmaceutical Research and Development LLC, Raritan, NJ 08869, United States
- Daniel Prieto-Alhambra
  - OHDSI Collaborators, Observational Health Data Sciences and Informatics (OHDSI), New York, NY, United States
  - Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, the Netherlands
  - Centre for Statistics in Medicine, NDORMS, University of Oxford, Oxford, United Kingdom
- Martijn J Schuemie
  - OHDSI Collaborators, Observational Health Data Sciences and Informatics (OHDSI), New York, NY, United States
  - Janssen Pharmaceutical Research and Development LLC, Raritan, NJ 08869, United States
  - Department of Biostatistics, University of California, Los Angeles, CA 90095, United States
- Peter R Rijnbeek
  - OHDSI Collaborators, Observational Health Data Sciences and Informatics (OHDSI), New York, NY, United States
  - Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, the Netherlands
4
Oja M, Tamm S, Mooses K, Pajusalu M, Talvik HA, Ott A, Laht M, Malk M, Lõo M, Holm J, Haug M, Šuvalov H, Särg D, Vilo J, Laur S, Kolde R, Reisberg S. Transforming Estonian health data to the Observational Medical Outcomes Partnership (OMOP) Common Data Model: lessons learned. JAMIA Open 2023; 6:ooad100. [PMID: 38058679 PMCID: PMC10697784 DOI: 10.1093/jamiaopen/ooad100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2023] [Accepted: 11/15/2023] [Indexed: 12/08/2023] Open
Abstract
Objective To describe the reusable transformation process of electronic health records (EHR), claims, and prescriptions data into the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM), together with challenges faced and solutions implemented. Materials and Methods We used Estonian national health databases that store almost all residents' claims, prescriptions, and EHR records. To develop and demonstrate the transformation process of Estonian health data to OMOP CDM, we used a 10% random sample of the Estonian population (n = 150 824 patients) from 2012 to 2019 (MAITT dataset). For the sample, complete information from all 3 databases was converted to OMOP CDM version 5.3. The validation was performed using open-source tools. Results In total, we transformed over 100 million entries to standard concepts using standard OMOP vocabularies, with an average mapping rate of 95%. For conditions, observations, drugs, and measurements, the mapping rate was over 90%. In most cases, SNOMED Clinical Terms were used as the target vocabulary. Discussion During the transformation process, we encountered several challenges, which are described in detail with concrete examples and solutions. Conclusion For a representative 10% random sample, we successfully transferred complete records from 3 national health databases to OMOP CDM and created a reusable transformation process. Our work helps future researchers to transform linked databases into OMOP CDM more efficiently, ultimately leading to better real-world evidence.
Affiliation(s)
- Marek Oja
  - Institute of Computer Science, University of Tartu, 51009 Tartu, Estonia
- Sirli Tamm
  - Institute of Computer Science, University of Tartu, 51009 Tartu, Estonia
- Kerli Mooses
  - Institute of Computer Science, University of Tartu, 51009 Tartu, Estonia
- Maarja Pajusalu
  - Institute of Computer Science, University of Tartu, 51009 Tartu, Estonia
- Harry-Anton Talvik
  - Institute of Computer Science, University of Tartu, 51009 Tartu, Estonia
  - STACC, 51009 Tartu, Estonia
- Anne Ott
  - Institute of Computer Science, University of Tartu, 51009 Tartu, Estonia
- Marianna Laht
  - Institute of Computer Science, University of Tartu, 51009 Tartu, Estonia
- Maria Malk
  - Institute of Computer Science, University of Tartu, 51009 Tartu, Estonia
- Marcus Lõo
  - Institute of Computer Science, University of Tartu, 51009 Tartu, Estonia
- Johannes Holm
  - Institute of Computer Science, University of Tartu, 51009 Tartu, Estonia
- Markus Haug
  - Institute of Computer Science, University of Tartu, 51009 Tartu, Estonia
- Hendrik Šuvalov
  - Institute of Computer Science, University of Tartu, 51009 Tartu, Estonia
- Dage Särg
  - Institute of Computer Science, University of Tartu, 51009 Tartu, Estonia
  - STACC, 51009 Tartu, Estonia
- Jaak Vilo
  - Institute of Computer Science, University of Tartu, 51009 Tartu, Estonia
  - STACC, 51009 Tartu, Estonia
- Sven Laur
  - Institute of Computer Science, University of Tartu, 51009 Tartu, Estonia
- Raivo Kolde
  - Institute of Computer Science, University of Tartu, 51009 Tartu, Estonia
- Sulev Reisberg
  - Institute of Computer Science, University of Tartu, 51009 Tartu, Estonia
  - STACC, 51009 Tartu, Estonia
5
Henke E, Zoch M, Kallfelz M, Ruhnke T, Leutner LA, Spoden M, Günster C, Sedlmayr M, Bathelt F. Assessing the Use of German Claims Data Vocabularies for Research in the Observational Medical Outcomes Partnership Common Data Model: Development and Evaluation Study. JMIR Med Inform 2023; 11:e47959. [PMID: 37942786 PMCID: PMC10653283 DOI: 10.2196/47959] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2023] [Revised: 09/07/2023] [Accepted: 09/09/2023] [Indexed: 11/10/2023] Open
Abstract
Background National classifications and terminologies already routinely used for documentation within patient care settings enable the unambiguous representation of clinical information. However, the diversity of different vocabularies across health care institutions and countries is a barrier to achieving semantic interoperability and exchanging data across sites. The Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) enables the standardization of structure and medical terminology. It allows the mapping of national vocabularies into so-called standard concepts, representing normative expressions for international analyses and research. Within our project "Hybrid Quality Indicators Using Machine Learning Methods" (Hybrid-QI), we aim to harmonize source codes used in German claims data vocabularies that are currently unavailable in the OMOP CDM. Objective This study aims to increase the coverage of German vocabularies in the OMOP CDM. We aim to completely transform the source codes used in German claims data into the OMOP CDM without data loss and make German claims data usable for OMOP CDM-based research. Methods To prepare the missing German vocabularies for the OMOP CDM, we defined a vocabulary preparation approach consisting of the identification of all codes of the corresponding vocabularies, their assembly into machine-readable tables, and the translation of German designations into English. Furthermore, we used 2 proposed approaches for OMOP-compliant vocabulary preparation: the mapping to standard concepts using the Observational Health Data Sciences and Informatics (OHDSI) tool Usagi and the preparation of new 2-billion concepts (ie, concept_id >2 billion). Finally, we evaluated the prepared vocabularies regarding completeness and correctness using synthetic German claims data and calculated the coverage of German claims data vocabularies in the OMOP CDM. 
Results Our vocabulary preparation approach was able to map 3 missing German vocabularies to standard concepts and prepare 8 vocabularies as new 2-billion concepts. The completeness evaluation showed that the prepared vocabularies cover 44.3% (3288/7417) of the source codes contained in German claims data. The correctness evaluation revealed that the specified validity periods in the OMOP CDM are compliant for the majority (705,531/706,032, 99.9%) of source codes and associated dates in German claims data. The calculation of the vocabulary coverage showed a noticeable decrease of missing vocabularies from 55% (11/20) to 10% (2/20) due to our preparation approach. Conclusions By preparing 10 vocabularies, we showed that our approach is applicable to any type of vocabulary used in a source data set. The prepared vocabularies are currently limited to German vocabularies, which can only be used in national OMOP CDM research projects, because the mapping of new 2-billion concepts to standard concepts is missing. To participate in international OHDSI network studies with German claims data, future work is required to map the prepared 2-billion concepts to standard concepts.
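The percentages reported in this abstract are plain ratios, and the "2-billion concept" convention amounts to giving local concepts a concept_id above 2,000,000,000. A minimal sketch in Python of both ideas follows; the function names and the example source codes are hypothetical and are not taken from the paper.

```python
TWO_BILLION = 2_000_000_000  # local concepts use concept_id values above this


def coverage_percent(covered: int, total: int) -> float:
    """Share of covered items as a percentage, rounded to one decimal place."""
    return round(100 * covered / total, 1)


def assign_local_concept_ids(source_codes, start=TWO_BILLION + 1):
    """Assign each distinct source code a concept_id above 2 billion (illustrative)."""
    unique_codes = sorted(set(source_codes))
    return {code: start + offset for offset, code in enumerate(unique_codes)}


# Counts quoted in the abstract
completeness = coverage_percent(3288, 7417)       # source codes covered
correctness = coverage_percent(705531, 706032)    # compliant validity periods
missing_before = coverage_percent(11, 20)         # missing vocabularies before
missing_after = coverage_percent(2, 20)           # missing vocabularies after

# Hypothetical source codes, used only to show the id assignment
local_ids = assign_local_concept_ids(["A01", "B02", "A01"])
print(completeness, correctness, missing_before, missing_after, local_ids)
```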
Affiliation(s)
- Elisa Henke
  - Institute for Medical Informatics and Biometry, Carl Gustav Carus Faculty of Medicine, Technische Universität Dresden, Dresden, Germany
- Michéle Zoch
  - Institute for Medical Informatics and Biometry, Carl Gustav Carus Faculty of Medicine, Technische Universität Dresden, Dresden, Germany
- Thomas Ruhnke
  - Wissenschaftliches Institut der AOK (AOK Research Institute), Berlin, Germany
- Liz Annika Leutner
  - Institute for Medical Informatics and Biometry, Carl Gustav Carus Faculty of Medicine, Technische Universität Dresden, Dresden, Germany
- Melissa Spoden
  - Wissenschaftliches Institut der AOK (AOK Research Institute), Berlin, Germany
- Christian Günster
  - Wissenschaftliches Institut der AOK (AOK Research Institute), Berlin, Germany
- Martin Sedlmayr
  - Institute for Medical Informatics and Biometry, Carl Gustav Carus Faculty of Medicine, Technische Universität Dresden, Dresden, Germany
6
Park J, Lee JY, Moon MH, Park YH, Rho MJ. Cancer Research Line (CAREL): Development of Expanded Distributed Research Networks for Prostate Cancer and Lung Cancer. Technol Cancer Res Treat 2023; 22:15330338221149262. [PMID: 36977531 PMCID: PMC10061631 DOI: 10.1177/15330338221149262] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/30/2023] Open
Abstract
Objectives: Big data-based multicenter medical research is expected to bring significant advances to cancer treatment worldwide. However, there are concerns related to data sharing among multicenter networks. Clinical data can be shielded by firewalls using distributed research networks (DRNs). We attempted to develop DRNs for multicenter research that can be easily installed and used by any institution. Patients and Methods: We propose a DRN for multicenter cancer research called the cancer research line (CAREL) and present a data catalog based on a common data model (CDM). CAREL was validated using 1723 patients with prostate cancer and 14 990 patients with lung cancer in a retrospective study. We used the attribute-value pairs and array data type JavaScript object notation (JSON) format to interface with third-party security solutions such as blockchain. Results: We developed visualized data catalogs of prostate and lung cancer based on the observational medical outcomes partnership (OMOP) CDM, from which researchers can easily browse and select relevant data. We made the CAREL source code readily available for download and application for relevant purposes. In addition, it is possible to realize a multicenter research network using CAREL development sources. Conclusion: The CAREL source code can enable medical institutions to participate in multicenter cancer research. Our technology is open source, so small institutions that cannot afford high costs can use it to develop a platform for multicenter research.
Affiliation(s)
- Jihwan Park
  - Department of Computer Education, Dankook Liberal Art College, Dankook University, Cheonan-si, Chungcheongnam-do, Republic of Korea
- Ji Youl Lee
  - Department of Urology, Seoul St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea
- Mi Hyoung Moon
  - Department of Thoracic and Cardiovascular Surgery, Seoul St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea
- Yong Hyun Park
  - Department of Urology, Seoul St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea
- Mi Jung Rho
  - College of Health Science, Dankook University, Cheonan-si, Chungcheongnam-do, Republic of Korea
7
Choi S, Joo HJ, Kim Y, Kim JH, Seok J. Conversion of Automated 12-Lead Electrocardiogram Interpretations to OMOP CDM Vocabulary. Appl Clin Inform 2022; 13:880-890. [PMID: 36130711 PMCID: PMC9492322 DOI: 10.1055/s-0042-1756427] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022] Open
Abstract
Background
A computerized 12-lead electrocardiogram (ECG) can automatically generate diagnostic statements, which are helpful for clinical purposes. Standardization is required for big data analysis when using ECG data generated by different interpretation algorithms. The common data model (CDM) is a standard schema designed to overcome heterogeneity between medical data. Diagnostic statements usually contain multiple CDM concepts and also include non-essential noise information, which should be removed during CDM conversion. Existing CDM conversion tools have several limitations, such as the requirement for manual validation, inability to extract multiple CDM concepts, and inadequate noise removal.
Objectives
We aim to develop a fully automated text data conversion algorithm that overcomes limitations of existing tools and manual conversion.
Methods
We used interpretations printed by 12-lead resting ECG tests from three different vendors: GE Medical Systems, Philips Medical Systems, and Nihon Kohden. For automatic mapping, we first constructed an ontology-lexicon of ECG interpretations. After clinical coding, an optimized tool for converting ECG interpretations to CDM terminology was developed using term-based text processing.
Results
Using the ontology-lexicon, the cosine similarity-based algorithm and rule-based hierarchical algorithm showed comparable conversion accuracy (97.8 and 99.6%, respectively), while an integrated algorithm based on a heuristic approach, ECG2CDM, demonstrated superior performance (99.9%) for datasets from three major vendors.
Conclusion
We developed user-friendly software that runs the ECG2CDM algorithm and is easy to use even if the user is not familiar with CDM or medical terminology. We propose that automated algorithms can be helpful for further big data analysis with an integrated and standardized ECG dataset.
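As a rough illustration of the cosine similarity-based matching mentioned in the Results, the sketch below scores a free-text ECG statement against a small lexicon using bag-of-words vectors. It is not the ECG2CDM algorithm: the lexicon entries, the threshold, and the function names are assumptions, and the noise removal and multi-concept extraction described in the abstract are not reproduced.

```python
import math
from collections import Counter

# Hypothetical lexicon of target terms (not the paper's ontology-lexicon)
LEXICON = ["sinus rhythm", "atrial fibrillation", "left bundle branch block"]


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between two strings using word-count vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[token] * vb[token] for token in va)
    norm_a = math.sqrt(sum(count * count for count in va.values()))
    norm_b = math.sqrt(sum(count * count for count in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(statement: str, lexicon=LEXICON, threshold: float = 0.5):
    """Return the most similar lexicon term, or None if below the threshold."""
    score, term = max((cosine_similarity(statement, t), t) for t in lexicon)
    return term if score >= threshold else None


print(best_match("Atrial fibrillation with rapid ventricular response"))
print(best_match("normal ecg"))
```

The threshold keeps unrelated statements from being forced onto the nearest term; a production tool would need validation against clinician-coded data.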
Affiliation(s)
- Sunho Choi
  - School of Electrical Engineering, Korea University, Seoul, South Korea
- Hyung Joon Joo
  - Korea University Research Institute for Medical Bigdata Science, Korea University, Seoul, South Korea
  - Department of Cardiology, Cardiovascular Center, Korea University College of Medicine, Seoul, South Korea
- Yoojoong Kim
  - School of Computer Science and Information Engineering, The Catholic University of Korea, Seoul, South Korea
- Jong-Ho Kim
  - Korea University Research Institute for Medical Bigdata Science, Korea University, Seoul, South Korea
  - Department of Cardiology, Cardiovascular Center, Korea University College of Medicine, Seoul, South Korea
- Junhee Seok
  - School of Electrical Engineering, Korea University, Seoul, South Korea
8
Chung YG, Jeon Y, Yoo S, Kim H, Hwang H. Big data analysis and artificial intelligence in epilepsy - common data model analysis and machine learning-based seizure detection and forecasting. Clin Exp Pediatr 2022; 65:272-282. [PMID: 34844397 PMCID: PMC9171464 DOI: 10.3345/cep.2021.00766] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/31/2021] [Accepted: 10/27/2021] [Indexed: 11/27/2022] Open
Abstract
There has been significant interest in big data analysis and artificial intelligence (AI) in medicine. Ever-increasing medical data and advanced computing power have enabled the number of big data analyses and AI studies to increase rapidly. Here we briefly introduce epilepsy, big data, and AI and review big data analysis using a common data model. Studies in which AI has been actively applied, such as those of electroencephalography epileptiform discharge detection, seizure detection, and forecasting, will be reviewed. We will also provide practical suggestions for pediatricians to understand and interpret big data analysis and AI research and work together with technical expertise.
Affiliation(s)
- Yoon Gi Chung
  - Division of Pediatric Neurology, Department of Pediatrics, Seoul National University Bundang Hospital, Seongnam, Korea
- Sooyoung Yoo
  - Office of eHealth Research and Business, Seoul National University Bundang Hospital, Seongnam, Korea
- Hunmin Kim
  - Division of Pediatric Neurology, Department of Pediatrics, Seoul National University Bundang Hospital, Seongnam, Korea
  - Department of Pediatrics, Seoul National University College of Medicine, Seoul, Korea
- Hee Hwang
  - Division of Pediatric Neurology, Department of Pediatrics, Seoul National University Bundang Hospital, Seongnam, Korea
  - Department of Pediatrics, Seoul National University College of Medicine, Seoul, Korea
9
Quiroz JC, Chard T, Sa Z, Ritchie A, Jorm L, Gallego B. Extract, transform, load framework for the conversion of health databases to OMOP. PLoS One 2022; 17:e0266911. [PMID: 35404974 PMCID: PMC9000122 DOI: 10.1371/journal.pone.0266911] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 10/25/2021] [Accepted: 03/29/2022] [Indexed: 11/22/2022] Open
Abstract
Common data models standardize the structures and semantics of health datasets, enabling reproducibility and large-scale studies that leverage the data from multiple locations and settings. The Observational Medical Outcomes Partnership Common Data Model (OMOP CDM) is one of the leading common data models. While there is a strong incentive to convert datasets to OMOP, the conversion is time and resource-intensive, leaving the research community in need of tools for mapping data to OMOP. We propose an extract, transform, load (ETL) framework that is metadata-driven and generic across source datasets. The ETL framework uses a new data manipulation language (DML) that organizes SQL snippets in YAML. Our framework includes a compiler that converts YAML files with mapping logic into an ETL script. Access to the ETL framework is available via a web application, allowing users to upload and edit YAML files via web editor and obtain an ETL SQL script for use in development environments. The structure of the DML maximizes readability, refactoring, and maintainability, while minimizing technical debt and standardizing the writing of ETL operations for mapping to OMOP. Our framework also supports transparency of the mapping process and reuse by different institutions.
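To make the idea of compiling declarative mapping files into an ETL script concrete, here is a small Python sketch. It does not reproduce the paper's DML or compiler: a plain dictionary stands in for a parsed YAML file, the source table and column names (`patients`, `patient_id`, `birth_date`) are hypothetical, and the key layout is an assumption chosen for illustration. Only `person`, `person_id`, and `year_of_birth` are OMOP CDM names.

```python
# Stand-in for a parsed YAML mapping file (layout is assumed, not the paper's DML)
mapping = {
    "target_table": "person",
    "source_table": "patients",  # hypothetical source table
    "columns": {
        # OMOP target column: SQL snippet evaluated against the source table
        "person_id": "patient_id",
        "year_of_birth": "EXTRACT(YEAR FROM birth_date)",
    },
}


def compile_mapping(m: dict) -> str:
    """Turn one mapping definition into an INSERT ... SELECT statement."""
    target_columns = ", ".join(m["columns"])
    select_list = ",\n    ".join(
        f"{snippet} AS {column}" for column, snippet in m["columns"].items()
    )
    return (
        f"INSERT INTO {m['target_table']} ({target_columns})\n"
        f"SELECT\n    {select_list}\n"
        f"FROM {m['source_table']};"
    )


sql = compile_mapping(mapping)
print(sql)
```

Keeping the SQL snippets in data rather than in hand-written scripts is what lets one compiler serve many source datasets, which is the property the abstract emphasizes.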
Affiliation(s)
- Juan C. Quiroz
  - Centre for Big Data Research in Health, UNSW, Sydney, Australia
- Tim Chard
  - Centre for Big Data Research in Health, UNSW, Sydney, Australia
- Zhisheng Sa
  - Centre for Big Data Research in Health, UNSW, Sydney, Australia
- Angus Ritchie
  - Concord Clinical School, University of Sydney, Sydney, Australia
  - Health Informatics Unit, Sydney Local Health District, Camperdown, Australia
- Louisa Jorm
  - Centre for Big Data Research in Health, UNSW, Sydney, Australia
- Blanca Gallego
  - Centre for Big Data Research in Health, UNSW, Sydney, Australia
10
Paris N, Lamer A, Parrot A. Transformation and Evaluation of the MIMIC Database in the OMOP Common Data Model: Development and Usability Study. JMIR Med Inform 2021; 9:e30970. [PMID: 34904958 PMCID: PMC8715361 DOI: 10.2196/30970] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 06/04/2021] [Revised: 10/03/2021] [Accepted: 10/05/2021] [Indexed: 12/22/2022] Open
Abstract
Background In the era of big data, the intensive care unit (ICU) is likely to benefit from real-time computer analysis and modeling based on close patient monitoring and electronic health record data. The Medical Information Mart for Intensive Care (MIMIC) is the first open access database in the ICU domain. Many studies have shown that common data models (CDMs) improve database searching by allowing code, tools, and experience to be shared. The Observational Medical Outcomes Partnership (OMOP) CDM is spreading all over the world. Objective The objective was to transform MIMIC into an OMOP database and to evaluate the benefits of this transformation for analysts. Methods We transformed MIMIC (version 1.4.21) into OMOP format (version 5.3.3.1) through semantic and structural mapping. The structural mapping aimed at moving the MIMIC data into the right place in OMOP, with some data transformations. The mapping was divided into 3 phases: conception, implementation, and evaluation. The conceptual mapping aimed at aligning the MIMIC local terminologies to OMOP's standard ones. It consisted of 3 phases: integration, alignment, and evaluation. A documented, tested, versioned, exemplified, and open repository was set up to support the transformation and improvement of the MIMIC community's source code. The resulting data set was evaluated over a 48-hour datathon. Results With an investment of 2 people for 500 hours, 64% of the data items of the 26 MIMIC tables were standardized into the OMOP CDM and 78% of the source concepts mapped to reference terminologies. The model proved its ability to support community contributions and was well received during the datathon, with 160 participants and 15,000 requests executed with a maximum duration of 1 minute. Conclusions The resulting MIMIC-OMOP data set is the first MIMIC-OMOP data set available free of charge with real deidentified data ready for replicable intensive care research. This approach can be generalized to any medical field.
Affiliation(s)
- Antoine Lamer
  - InterHop, Paris, France
  - Univ. Lille, CHU Lille, ULR 2694 - METRICS: Évaluation des Technologies de santé et des Pratiques médicales, Lille, France
11
Lamer A, Abou-Arab O, Bourgeois A, Parrot A, Popoff B, Beuscart JB, Tavernier B, Moussa MD. Transforming Anesthesia Data Into the Observational Medical Outcomes Partnership Common Data Model: Development and Usability Study. J Med Internet Res 2021; 23:e29259. [PMID: 34714250 PMCID: PMC8590192 DOI: 10.2196/29259] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/26/2021] [Revised: 06/14/2021] [Accepted: 07/05/2021] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND Electronic health records (EHRs, such as those created by an anesthesia management system) generate a large amount of data that can notably be reused for clinical audits and scientific research. The sharing of these data and tools is generally affected by the lack of system interoperability. To overcome these issues, Observational Health Data Sciences and Informatics (OHDSI) developed the Observational Medical Outcomes Partnership (OMOP) common data model (CDM) to standardize EHR data and promote large-scale observational and longitudinal research. Anesthesia data have not previously been mapped into the OMOP CDM. OBJECTIVE The primary objective was to transform anesthesia data into the OMOP CDM. The secondary objective was to provide vocabularies, queries, and dashboards that might promote the exploitation and sharing of anesthesia data through the CDM. METHODS Using our local anesthesia data warehouse, a group of 5 experts from 5 different medical centers identified local concepts related to anesthesia. The concepts were then matched with standard concepts in the OHDSI vocabularies. We performed structural mapping between the design of our local anesthesia data warehouse and the OMOP CDM tables and fields. To validate the implementation of anesthesia data into the OMOP CDM, we developed a set of queries and dashboards. RESULTS We identified 522 concepts related to anesthesia care. They were classified as demographics, units, measurements, operating room steps, drugs, periods of interest, and features. After semantic mapping, 353 (67.7%) of these anesthesia concepts were mapped to OHDSI concepts. Further, 169 (32.3%) concepts related to periods and features were added to the OHDSI vocabularies. Then, 8 OMOP CDM tables were implemented with anesthesia data and 2 new tables (EPISODE and FEATURE) were added to store secondarily computed data. 
We integrated data from 572,609 operations and provided the code for a set of 8 queries and 4 dashboards related to anesthesia care. CONCLUSIONS Generic data concerning demographics, drugs, units, measurements, and operating room steps were already available in OHDSI vocabularies. However, most of the intraoperative concepts (the duration of specific steps, an episode of hypotension, etc) were not present in OHDSI vocabularies. The OMOP mapping provided here enables anesthesia data reuse.
Affiliation(s)
- Antoine Lamer
- Univ. Lille, CHU Lille, ULR 2694 - METRICS: Évaluation des technologies de santé et des pratiques médicales, Lille, France
- InterHop, Paris, France
- Univ. Lille, Faculté Ingénierie et Management de la Santé, Lille, France
- Osama Abou-Arab
- Department of Anaesthesiology and Critical Care Medicine, Amiens Picardie University Hospital, Amiens, France
- Alexandre Bourgeois
- Department of Anesthesiology and Critical Care Medicine, Regional University Hospital of Nancy, Nancy, France
- Benjamin Popoff
- Department of Anaesthesiology and Critical Care, Rouen University Hospital, Rouen, France
- Jean-Baptiste Beuscart
- Univ. Lille, CHU Lille, ULR 2694 - METRICS: Évaluation des technologies de santé et des pratiques médicales, Lille, France
- Benoît Tavernier
- Univ. Lille, CHU Lille, ULR 2694 - METRICS: Évaluation des technologies de santé et des pratiques médicales, Lille, France
- Department of Anesthesiology and Critical Care, CHU Lille, Lille, France
12
Dhombres F, Charlet J. Knowledge Representation and Management: Interest in New Solutions for Ontology Curation. Yearb Med Inform 2021; 30:185-190. [PMID: 34479390 PMCID: PMC8416227 DOI: 10.1055/s-0041-1726508] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/04/2022]
Abstract
Objective:
To select, present and summarize some of the best papers in the field of Knowledge Representation and Management (KRM) published in 2020.
Methods:
A comprehensive and standardized review of the medical informatics literature was performed to select the most interesting papers of KRM published in 2020, based on PubMed queries. This review was conducted according to the IMIA Yearbook guidelines.
Results:
Four best papers were selected among 1,175 publications. In contrast with the papers selected last year, the four best papers of 2020 demonstrated a significant focus on methods and tools for ontology curation and design. The usual KRM application domains (bioinformatics, machine learning, and electronic health records) were also represented.
Conclusion:
In 2020, ontology curation emerged as a significant topic of research interest. Bioinformatics, machine learning, and electronic health records remain significant research areas in the KRM community with various applications. Knowledge representations are key to advancing machine learning by providing context and to developing novel bioinformatics metrics. As in 2019, representations serve a great variety of applications across many medical domains, with actionable results and now with growing adherence to the open science initiative.
Affiliation(s)
- Ferdinand Dhombres
- Sorbonne Université, INSERM, Univ Sorbonne Paris Nord, LIMICS, Paris, France
- Sorbonne Université, Service de Médecine Fœtale, DMU Origyne, AP-HP, Hôpital Armand Trousseau, Paris, France
- Jean Charlet
- Sorbonne Université, INSERM, Univ Sorbonne Paris Nord, LIMICS, Paris, France
- AP-HP, DRCI, Paris, France
13
Sathappan SMK, Jeon YS, Dang TK, Lim SC, Shao YM, Tai ES, Feng M. Transformation of Electronic Health Records and Questionnaire Data to OMOP CDM: A Feasibility Study Using SG_T2DM Dataset. Appl Clin Inform 2021; 12:757-767. [PMID: 34380168 PMCID: PMC8357458 DOI: 10.1055/s-0041-1732301] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 12/13/2022]
Abstract
Background
Diabetes mellitus (DM) is an important public health concern in Singapore and places a massive burden on health care spending. Tackling chronic diseases such as DM requires innovative strategies to integrate patients' data from diverse sources and use scientific discovery to inform clinical practice that can help better manage the disease. The Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) was chosen as the framework for integrating data with disparate formats.
Objective
The study aimed to evaluate the feasibility of converting a Singapore-based data source, comprising electronic health records (EHR) and cognitive and depression assessment questionnaire data, to the OMOP CDM standard. Additionally, we validated whether our OMOP CDM instance is fit for research purposes by executing, as a proof of concept, a simple treatment pathways study using Atlas, a graphical user interface tool for conducting analyses on OMOP CDM data.
Methods
We used de-identified EHR data and cognitive and depression assessment questionnaire data from a tertiary care hospital in Singapore and converted them to version 5.3.1 of the OMOP CDM standard. We evaluated the OMOP CDM conversion by (1) assessing the mapping coverage (that is, the percentage of source terms mapped to the OMOP CDM standard); (2) comparing analyses of the local raw dataset with those of the CDM dataset; and (3) implementing the Harmonized Intrinsic Data Quality Framework using an open-source R package called Data Quality Dashboard.
Results
The content coverage of OMOP CDM vocabularies is more than 90% for clinical data, but only around 11% for questionnaire data. The comparison of characteristics between source and target data returned consistent results and our transformed data did not pass 38 (1.4%) out of 2,622 quality checks.
Conclusion
Adoption of OMOP CDM at our site demonstrated that EHR data can be standardized with minimal information loss, whereas standardizing cognitive and depression assessment questionnaire data remains challenging and requires further work.
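The percentages reported in this abstract are simple ratios, and a short Python sketch can make the arithmetic concrete. The function names, the local terms, and the concept identifiers below are hypothetical illustrations, not the study's code or data; only the counts 38 and 2,622 come from the abstract.

```python
def percentage(part, whole, digits=1):
    """Return part/whole as a percentage rounded to `digits` decimals."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return round(100 * part / whole, digits)


def mapping_coverage(source_terms, term_to_concept):
    """Percentage of source terms that have a mapped standard concept."""
    mapped = sum(1 for term in source_terms if term_to_concept.get(term) is not None)
    return percentage(mapped, len(source_terms))


# Counts taken from the abstract: 38 failed checks out of 2,622.
failed_rate = percentage(38, 2622)
passed_rate = percentage(2622 - 38, 2622)

# Hypothetical toy mapping: made-up local terms and made-up concept identifiers.
toy_terms = ["glucose_lab", "hba1c_lab", "metformin_rx", "mood_item_7"]
toy_mapping = {
    "glucose_lab": 1001,
    "hba1c_lab": 1002,
    "metformin_rx": 1003,
    "mood_item_7": None,  # no standard concept found for this questionnaire item
}
toy_coverage = mapping_coverage(toy_terms, toy_mapping)

print(failed_rate, passed_rate, toy_coverage)
```

Applied to the abstract's counts, the failed-check rate rounds to the reported 1.4%. The toy coverage figure only demonstrates the calculation and says nothing about the study's actual vocabularies.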
Affiliation(s)
- Selva Muthu Kumaran Sathappan
- Saw Swee Hock School of Public Health, National University Health System and National University of Singapore, Singapore, Singapore
- Young Seok Jeon
- Saw Swee Hock School of Public Health, National University Health System and National University of Singapore, Singapore, Singapore
- Trung Kien Dang
- Saw Swee Hock School of Public Health, National University Health System and National University of Singapore, Singapore, Singapore
- Su Chi Lim
- Clinical Research Unit, Khoo Teck Puat Hospital, Singapore, Singapore
- Yi-Ming Shao
- Clinical Research Unit, Khoo Teck Puat Hospital, Singapore, Singapore
- E Shyong Tai
- Division of Endocrinology, National University Hospital, Singapore, Singapore
- Mengling Feng
- Saw Swee Hock School of Public Health, National University Health System and National University of Singapore, Singapore, Singapore
- Institute of Data Science, National University of Singapore, Singapore, Singapore
14
Kang B, Yoon J, Kim HY, Jo SJ, Lee Y, Kam HJ. Deep-learning-based automated terminology mapping in OMOP-CDM. J Am Med Inform Assoc 2021; 28:1489-1496. [PMID: 33987667 DOI: 10.1093/jamia/ocab030] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/26/2020] [Revised: 01/07/2021] [Accepted: 02/05/2021] [Indexed: 11/14/2022]
Abstract
OBJECTIVE Accessing medical data from multiple institutions is difficult owing to the interinstitutional diversity of vocabularies. Standardization schemes, such as the common data model, have been proposed as solutions to this problem, but such schemes require expensive human supervision. This study aims to construct a trainable system that can automate the process of semantic interinstitutional code mapping. MATERIALS AND METHODS To automate mapping between source and target codes, we compute the embedding-based semantic similarity between corresponding descriptive sentences. We also implement a systematic approach for preparing training data for similarity computation. Experimental results are compared to traditional word-based mappings. RESULTS The proposed model is compared against Usagi, the state-of-the-art automated matching system of the Observational Medical Outcomes Partnership common data model. By incorporating multiple negative training samples per positive sample, our semantic matching method significantly outperforms Usagi. Its matching accuracy is at least 10% greater than that of Usagi, and this trend is consistent across various top-k measurements. DISCUSSION The proposed deep learning-based mapping approach outperforms previous simple word-level matching algorithms because it can account for contextual and semantic information. Additionally, we demonstrate that the manner in which negative training samples are selected significantly affects the overall performance of the system. CONCLUSION Incorporating the semantics of code descriptions significantly increases matching accuracy compared to traditional text co-occurrence-based approaches. The negative training sample collection methodology is also an important component of the proposed trainable system that can be adopted in both present and future related systems.
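The matching idea described in this abstract (rank target concepts by embedding similarity to a source description, then score top-k accuracy) can be sketched in a few lines of Python. This is a minimal illustration only: the three-dimensional vectors, the local codes, and the function names are hypothetical stand-ins, and plain cosine similarity is used in place of the authors' trained model, which the abstract does not specify.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k_matches(source_vec, target_vecs, k=3):
    """Return the k target names most similar to the source vector, best first."""
    scored = [(name, cosine(source_vec, vec)) for name, vec in target_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


def top_k_accuracy(source_vecs, gold, target_vecs, k):
    """Fraction of source codes whose gold target appears among the top-k candidates."""
    hits = 0
    for code, vec in source_vecs.items():
        candidates = [name for name, _ in top_k_matches(vec, target_vecs, k)]
        if gold[code] in candidates:
            hits += 1
    return hits / len(source_vecs)


# Hypothetical toy embeddings; a real system would obtain these from a trained encoder.
TARGETS = {
    "Hypertension": [0.9, 0.1, 0.0],
    "Hypotension": [0.7, 0.3, 0.1],
    "Type 2 diabetes": [0.0, 0.2, 0.9],
}
SOURCES = {
    "LOCAL_HTN": [0.8, 0.2, 0.0],
    "LOCAL_T2DM": [0.1, 0.1, 0.95],
}
GOLD = {"LOCAL_HTN": "Hypertension", "LOCAL_T2DM": "Type 2 diabetes"}

best_for_htn = top_k_matches(SOURCES["LOCAL_HTN"], TARGETS, k=1)[0][0]
accuracy_at_1 = top_k_accuracy(SOURCES, GOLD, TARGETS, k=1)
print(best_for_htn, accuracy_at_1)
```

Reporting accuracy at several values of k, as the abstract does, shows how often the correct concept would appear in a short candidate list offered to a human reviewer rather than only as the single best guess.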
Affiliation(s)
- Byungkon Kang
- Department of Computer Science, State University of New York, Incheon, South Korea
- Jisang Yoon
- Graduate School of Information, Yonsei University, Seoul, South Korea
- Ha Young Kim
- Graduate School of Information, Yonsei University, Seoul, South Korea
- Sung Jin Jo
- Department of Industrial and Management Engineering, Pohang University of Science and Technology, Pohang, North Gyeongsang, South Korea
- Yourim Lee
- RWE Analytics, EvidNet, Seongnam-si, Gyeonggi-do, South Korea
- Hye Jin Kam
- Healthcare, Life Solution Cluster, New Business Unit, Hanwha Life, Seoul, South Korea
15
Maier C, Kapsner LA, Mate S, Prokosch HU, Kraus S. Patient Cohort Identification on Time Series Data Using the OMOP Common Data Model. Appl Clin Inform 2021; 12:57-64. [PMID: 33506478 DOI: 10.1055/s-0040-1721481] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 12/20/2022]
Abstract
BACKGROUND The identification of patient cohorts for recruiting patients into clinical trials requires an evaluation of study-specific inclusion and exclusion criteria. These criteria are specified depending on corresponding clinical facts. Some of these facts may not be present in the clinical source systems and need to be calculated either in advance or at cohort query runtime (so-called feasibility query). OBJECTIVES We use the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) as the repository for our clinical data. However, Atlas, the graphical user interface of OMOP, does not offer functionality to perform calculations on fact data. Therefore, we searched for a different approach. The objective of this study is to investigate whether the Arden Syntax can be used for feasibility queries on the OMOP CDM to enable on-the-fly calculations at query runtime, thereby eliminating the need to precalculate data elements that are involved in researchers' criteria specification. METHODS We implemented a service that reads the facts from the OMOP repository and provides them in a form that an Arden Syntax Medical Logic Module (MLM) can process. Then, we implemented an MLM that applies the eligibility criteria to every patient data set and outputs the list of eligible cases (i.e., performs the feasibility query). RESULTS The study resulted in an MLM-based feasibility query that identifies cases of overventilation as an example of how an on-the-fly calculation can be realized. The algorithm is split into two MLMs to provide the reusability of the approach. CONCLUSION We found that MLMs are a suitable technology for feasibility queries on the OMOP CDM. Our method of performing on-the-fly calculations can be employed with any OMOP instance and without touching existing infrastructure like the Extract, Transform and Load pipeline. Therefore, we think that it is a well-suited method to perform on-the-fly calculations on OMOP.
Affiliation(s)
- Christian Maier
- Chair of Medical Informatics, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Erlangen, Bayern, Germany
- Lorenz A Kapsner
- Medical Center for Information and Communication Technology, Universitätsklinikum Erlangen, Erlangen, Bayern, Germany
- Sebastian Mate
- Medical Center for Information and Communication Technology, Universitätsklinikum Erlangen, Erlangen, Bayern, Germany
- Hans-Ulrich Prokosch
- Chair of Medical Informatics, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Erlangen, Bayern, Germany
- Medical Center for Information and Communication Technology, Universitätsklinikum Erlangen, Erlangen, Bayern, Germany
- Stefan Kraus
- Department of Computer Science, Mannheim University of Applied Sciences, Mannheim, Baden-Württemberg, Germany
16
Unberath P, Prokosch HU, Gründner J, Erpenbeck M, Maier C, Christoph J. EHR-Independent Predictive Decision Support Architecture Based on OMOP. Appl Clin Inform 2020; 11:399-404. [PMID: 32492716 PMCID: PMC7269719 DOI: 10.1055/s-0040-1710393] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 12/17/2022]
Abstract
BACKGROUND The increasing availability of molecular and clinical data of cancer patients combined with novel machine learning techniques has the potential to enhance clinical decision support, for example, for assessing a patient's relapse risk. While these prediction models often produce promising results, a deployment in clinical settings is rarely pursued. OBJECTIVES In this study, we demonstrate how prediction tools can be integrated generically into a clinical setting and provide an exemplary use case for predicting relapse risk in melanoma patients. METHODS To make the decision support architecture independent of the electronic health record (EHR) and transferable to different hospital environments, it was based on the widely used Observational Medical Outcomes Partnership (OMOP) common data model (CDM) rather than on a proprietary EHR data structure. The usability of our exemplary implementation was evaluated by means of user interviews including the thinking-aloud protocol and the system usability scale (SUS) questionnaire. RESULTS An extract-transform-load process was developed to extract relevant clinical and molecular data from their original sources and map them to OMOP. Further, the OMOP WebAPI was adapted to retrieve all data for a single patient and transfer them into the decision support Web application to enable physicians to easily consult the prediction service, including monitoring of transferred data. The evaluation of the application resulted in a SUS score of 86.7. CONCLUSION This work proposes an EHR-independent means of integrating prediction models for deployment in clinical settings, utilizing the OMOP CDM. The usability evaluation revealed that the application is generally suitable for routine use while also illustrating small aspects for improvement.
Affiliation(s)
- Philipp Unberath
- Department of Medical Informatics, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Hans Ulrich Prokosch
- Department of Medical Informatics, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Julian Gründner
- Department of Medical Informatics, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Marcel Erpenbeck
- Department of Medical Informatics, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Christian Maier
- Department of Medical Informatics, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Jan Christoph
- Department of Medical Informatics, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany