1
Coutinho-Almeida J, Saez C, Correia R, Rodrigues PP. Development and initial validation of a data quality evaluation tool in obstetrics real-world data through HL7-FHIR interoperable Bayesian networks and expert rules. JAMIA Open 2024; 7:ooae062. [PMID: 39070966 PMCID: PMC11283181 DOI: 10.1093/jamiaopen/ooae062] [Received: 01/12/2024] [Revised: 06/05/2024] [Accepted: 06/19/2024] [Indexed: 07/30/2024]
Abstract
Background The increasing prevalence of electronic health records (EHRs) in healthcare systems worldwide has underscored the importance of data quality for clinical decision-making and research, particularly in obstetrics. High-quality data is vital for representing patient populations accurately and for avoiding erroneous healthcare decisions. However, existing studies have highlighted significant challenges in EHR data quality, calling for new tools and methodologies to assess and improve it. Objective This article addresses the critical need for data quality evaluation in obstetrics by developing a new tool. The tool combines the Health Level 7 (HL7) Fast Healthcare Interoperability Resources (FHIR) standard with Bayesian networks and expert rules, offering a novel approach to assessing the quality of real-world obstetrics data. Methods Our methodology is underpinned by a harmonized framework focusing on completeness, plausibility, and conformance. We employed Bayesian networks for probabilistic modeling, integrated outlier detection methods, and a rule-based system grounded in domain-specific knowledge. The tool was developed and validated on obstetrics data from 9 Portuguese hospitals, covering the years 2019-2020. Results The developed tool demonstrated strong potential for identifying data quality issues in obstetrics EHRs. The Bayesian networks used in the tool performed well for various features, with an area under the receiver operating characteristic curve (AUROC) between 75% and 97%. The tool's infrastructure and its interoperable format as a FHIR Application Programming Interface (API) enable possible deployment for real-time data quality assessment in obstetrics settings. Our initial assessments are promising: even when compared with physicians' assessments of real records, the tool can reach an AUROC of 88%, depending on the threshold defined.
Discussion Our results also show that obstetrics clinical records are difficult to assess in terms of quality, and that assessments like ours could benefit from more categorical approaches to ranking records between bad and good quality. Conclusion This study contributes significantly to the field of EHR data quality assessment, with a specific focus on obstetrics. The combination of HL7-FHIR interoperability, machine learning techniques, and expert knowledge presents a robust, adaptable solution to the challenges of healthcare data quality. Future research should explore data quality evaluations tailored to different healthcare contexts and further validate the tool's capabilities, enhancing its utility across diverse medical domains.
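For readers less familiar with the metric: AUROC equals the probability that a randomly chosen positive record (here, poor quality) receives a higher score than a randomly chosen negative one, which is the Mann-Whitney formulation. The sketch below computes it from hypothetical per-record quality scores and shows one way a "threshold" can matter when the reference is graded: binarizing physician ratings at different cut-offs changes the labels and therefore the AUROC. The 1-to-5 rating scale, the `binarize` helper, and all data are illustrative assumptions, not the authors' evaluation code.

```python
from itertools import product

def binarize(ratings, threshold):
    """Hypothetical helper: label a record 1 (poor quality) if its
    reference rating is at or below the threshold, else 0."""
    return [1 if r <= threshold else 0 for r in ratings]

def auroc(scores, labels):
    """AUROC via pairwise comparison, i.e. Mann-Whitney U / (n_pos * n_neg).

    scores: higher means "more likely poor quality".
    labels: 1 = poor quality (positive), 0 = acceptable (negative).
    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical tool scores and physician ratings (1 = worst, 5 = best).
scores = [0.9, 0.8, 0.4, 0.3, 0.2]
ratings = [1, 3, 2, 4, 5]
for cut in (1, 2):
    print(cut, auroc(scores, binarize(ratings, cut)))
```

With these made-up numbers the AUROC is 1.0 at cut-off 1 and 5/6 at cut-off 2, even though the tool's scores are unchanged; only the reference labels moved.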
Affiliation(s)
- João Coutinho-Almeida
- CINTESIS@RISE—Centre for Health Technologies and Services Research, University of Porto, 4200-319 Porto, Portugal
- MEDCIDS—Faculty of Medicine of University of Porto, 4200-319 Porto, Portugal
- Health Data Science PhD Program, Faculty of Medicine of the University of Porto, 4200-319 Porto, Portugal
- Carlos Saez
- Instituto Universitario de Aplicaciones de las Tecnologías de la Información y de las Comunicaciones Avanzadas, Universitat Politècnica de València, 46022 Valencia, Spain
- Ricardo Correia
- CINTESIS@RISE—Centre for Health Technologies and Services Research, University of Porto, 4200-319 Porto, Portugal
- MEDCIDS—Faculty of Medicine of University of Porto, 4200-319 Porto, Portugal
- Health Data Science PhD Program, Faculty of Medicine of the University of Porto, 4200-319 Porto, Portugal
- Pedro Pereira Rodrigues
- CINTESIS@RISE—Centre for Health Technologies and Services Research, University of Porto, 4200-319 Porto, Portugal
- MEDCIDS—Faculty of Medicine of University of Porto, 4200-319 Porto, Portugal
- Health Data Science PhD Program, Faculty of Medicine of the University of Porto, 4200-319 Porto, Portugal
2
Rosenau L, Behrend P, Wiedekopf J, Gruendner J, Ingenerf J. Uncovering Harmonization Potential in Health Care Data Through Iterative Refinement of Fast Healthcare Interoperability Resources Profiles Based on Retrospective Discrepancy Analysis: Case Study. JMIR Med Inform 2024; 12:e57005. [PMID: 39042420 PMCID: PMC11303887 DOI: 10.2196/57005] [Received: 02/01/2024] [Revised: 04/15/2024] [Accepted: 04/17/2024] [Indexed: 07/24/2024]
Abstract
BACKGROUND Cross-institutional interoperability between health care providers remains a recurring challenge worldwide. The German Medical Informatics Initiative, a collaboration of 37 university hospitals in Germany, aims to enable interoperability between partner sites by defining Fast Healthcare Interoperability Resources (FHIR) profiles for the cross-institutional exchange of health care data, the Core Data Set (CDS). The current CDS and its extension modules define elements representing patients' health care records. All university hospitals in Germany have made significant progress in providing routine data in a standardized format based on the CDS. In addition, the central research platform for health, the German Portal for Medical Research Data feasibility tool, allows medical researchers to query the available CDS data items across many participating hospitals. OBJECTIVE In this study, we aimed to evaluate a novel approach that combines the current top-down generated FHIR profiles with bottom-up knowledge gained by analyzing the respective instance data. This allowed us to derive options for iteratively refining FHIR profiles using the information obtained from a discrepancy analysis. METHODS We developed a FHIR validation pipeline and opted to derive more restrictive profiles from the original CDS profiles. This decision was driven by the need to align more closely with the specific assumptions and requirements of the central feasibility platform's search ontology. While the original CDS profiles offer a generic framework adaptable to a broad spectrum of medical informatics use cases, they lack the specificity needed to model the nuanced criteria essential for medical researchers. A key example is the need to accurately represent the interdependencies between specific laboratory codings and values.
The validation results allow us to identify discrepancies between the instance data at the clinical sites and the profiles specified by the feasibility platform, so that these discrepancies can be addressed in the future. RESULTS A total of 20 university hospitals participated in this study. Historical factors, lack of harmonization, a wide range of source systems, and case sensitivity of coding are among the causes of the identified discrepancies. In our case study, Conditions, Procedures, and Medications have a high degree of uniformity in the coding of instance data due to legislative billing requirements in Germany, whereas laboratory values pose a significant data harmonization challenge due to the interdependency between their coding and value. CONCLUSIONS While the CDS achieves interoperability, federated data access raises different challenges, requiring more specificity in the profiles to make assumptions about the instance data. We also argue that further harmonization of the instance data can significantly lower the required retrospective harmonization efforts. We recognize that discrepancies cannot be resolved solely at the clinical site; therefore, our findings have a wide range of implications and will require action on multiple levels and by various stakeholders.
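The case-sensitivity finding can be illustrated with a small string-level discrepancy check. The sketch below compares instance codes against the set of codes a stricter profile would allow, and separates exact matches from codes that differ only in letter case and from codes that are not allowed at all. The study itself used a FHIR validation pipeline against profiles; the code strings and the `classify_codes` function here are hypothetical and only show the classification idea.

```python
from collections import Counter

def classify_codes(instance_codes, allowed_codes):
    """Compare instance codes with a profile's allowed codes.

    Returns a Counter with keys:
      'match'         - code is allowed exactly as written
      'case_mismatch' - code is allowed only if letter case is ignored
      'unknown'       - code is not allowed under any casing
    """
    allowed = set(allowed_codes)
    allowed_folded = {c.casefold() for c in allowed}
    counts = Counter()
    for code in instance_codes:
        if code in allowed:
            counts["match"] += 1
        elif code.casefold() in allowed_folded:
            counts["case_mismatch"] += 1
        else:
            counts["unknown"] += 1
    return counts

# Hypothetical codes delivered by sites, checked against one allowed set.
allowed = {"HB-A", "GLU-P"}
site_codes = ["HB-A", "hb-a", "Glu-P", "NA-S", "GLU-P"]
print(classify_codes(site_codes, allowed))
```

Keeping case mismatches as their own category matters because they are usually fixable by mapping at the site, whereas unknown codes point to a genuine harmonization gap.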
Affiliation(s)
- Lorenz Rosenau
- IT Center for Clinical Research, University of Lübeck, Lübeck, Germany
- Paul Behrend
- IT Center for Clinical Research, University of Lübeck, Lübeck, Germany
- Joshua Wiedekopf
- IT Center for Clinical Research, University of Lübeck, Lübeck, Germany
- Julian Gruendner
- Chair for Medical Informatics, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Josef Ingenerf
- IT Center for Clinical Research, University of Lübeck, Lübeck, Germany
- Institute of Medical Informatics, University of Lübeck, Lübeck, Germany
3
Rödle W, Prokosch HU, Neumann E, Toni I, Haering-Zahn J, Neubert A, Eberl S. Creating a Medication Therapy Observational Research Database from an Electronic Medical Record: Challenges and Data Curation. Appl Clin Inform 2024; 15:111-118. [PMID: 38325408 PMCID: PMC10849827 DOI: 10.1055/s-0043-1777741] [Received: 04/26/2023] [Accepted: 08/28/2023] [Indexed: 02/09/2024]
Abstract
BACKGROUND Observational research has shown its potential to complement experimental research and clinical trials through secondary use of treatment data from hospital care processes. It can also be applied to better understand pediatric drug utilization and thereby establish safer drug therapy. Clinical documentation processes often limit data quality in pediatric medical records, making data curation steps necessary whose effort is mostly underestimated. OBJECTIVES The objectives of this study were to transform and curate data from a departmental electronic medical record into an observational research database. In particular, we aimed to identify data quality problems, illustrate the reasons for such problems, and describe the systematic data curation process established to create high-quality data for observational research. METHODS Data were extracted from an electronic medical record used by four wards of a German university children's hospital from April 2012 to June 2020. A four-step data preparation, mapping, and curation process was established. Data quality of the generated dataset was assessed first by following an established 3 × 3 Data Quality Assessment guideline and second by comparing a sample subset of the database with an existing gold standard. RESULTS The generated dataset consists of 770,158 medication dispensations associated with 89,955 different drug exposures from 21,285 clinical encounters. A total of 6,840 different narrative drug therapy descriptions were mapped to 1,139 standard terms for drug exposures. Regarding the quality criterion of correctness, the database was consistent and showed high overall agreement with our gold standard. CONCLUSION Despite large amounts of free-text descriptions and contextual knowledge implicitly included in the electronic medical record, we were able to identify relevant data quality issues and to establish a semi-automated data curation process leading to a high-quality observational research database.
Because of inconsistent dosage information in the original documentation, this database is limited to drug utilization and does not contain detailed dosage information.
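The mapping of many narrative drug descriptions onto far fewer standard terms can be pictured as an automatic lookup step followed by manual review. The sketch below normalizes free-text entries (case, whitespace, trailing punctuation), maps those found in a curated dictionary, and returns the rest for a curator; this is one simple way to make such a process semi-automated. The normalization rules, the dictionary, and the example strings are illustrative assumptions, not the authors' actual four-step curation procedure.

```python
import re

def normalize(text):
    """Case-fold, trim, collapse internal whitespace, drop trailing . ; ,"""
    text = re.sub(r"\s+", " ", text.casefold().strip())
    return text.rstrip(".;,")

def map_descriptions(descriptions, lookup):
    """Map free-text drug descriptions to standard terms.

    Returns (mapped, unmapped): mapped is {original text: standard term};
    unmapped is a sorted list of originals left for manual curation.
    """
    mapped, unmapped = {}, set()
    for desc in descriptions:
        term = lookup.get(normalize(desc))
        if term is None:
            unmapped.add(desc)
        else:
            mapped[desc] = term
    return mapped, sorted(unmapped)

# Hypothetical curated dictionary (normalized text -> standard term).
lookup = {
    "paracetamol": "paracetamol",
    "paracetamol supp": "paracetamol",
    "ibuprofen syrup": "ibuprofen",
}
descriptions = ["Paracetamol ", "paracetamol  supp.", "Ibuprofen syrup", "Vit D drops"]
mapped, unmapped = map_descriptions(descriptions, lookup)
print(mapped)
print(unmapped)
```

Each entry a curator resolves from the unmapped list can be added to the dictionary, so the automated share grows as curation proceeds.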
Affiliation(s)
- Wolfgang Rödle
- Chair of Medical Informatics, Friedrich-Alexander University Erlangen-Nürnberg, Erlangen, Germany
- Hans-Ulrich Prokosch
- Chair of Medical Informatics, Friedrich-Alexander University Erlangen-Nürnberg, Erlangen, Germany
- Eva Neumann
- Dr Margarete Fischer Bosch Institute of Clinical Pharmacology, Stuttgart, Germany
- Irmgard Toni
- Department of Paediatrics and Adolescent Medicine, Universitätsklinikum Erlangen, Erlangen, Germany
- Julia Haering-Zahn
- Department of Paediatrics and Adolescent Medicine, Universitätsklinikum Erlangen, Erlangen, Germany
- Antje Neubert
- Department of Paediatrics and Adolescent Medicine, Universitätsklinikum Erlangen, Erlangen, Germany
- Sonja Eberl
- Department of Paediatrics and Adolescent Medicine, Universitätsklinikum Erlangen, Erlangen, Germany
4
Gierend K, Freiesleben S, Kadioglu D, Siegel F, Ganslandt T, Waltemath D. The Status of Data Management Practices Across German Medical Data Integration Centers: Mixed Methods Study. J Med Internet Res 2023; 25:e48809. [PMID: 37938878 PMCID: PMC10666010 DOI: 10.2196/48809] [Received: 05/08/2023] [Revised: 09/09/2023] [Accepted: 09/29/2023] [Indexed: 11/10/2023]
Abstract
BACKGROUND In the context of the Medical Informatics Initiative, medical data integration centers (DICs) have implemented complex data flows to transfer routine health care data into research data repositories for secondary use. Data management practices are important throughout these processes, and special attention should be given to provenance aspects. Insufficient knowledge can lead to validity risks and reduce confidence in, and the quality of, the processed data. The need to implement maintainable data management practices is undisputed, but their current status is largely unclear. OBJECTIVE Our study examines current data management practices throughout the data life cycle within the Medical Informatics in Research and Care in University Medicine (MIRACUM) consortium. We present a framework for the maturity status of data management practices and offer recommendations to enable trustworthy dissemination and reuse of routine health care data. METHODS In this mixed methods study, we conducted semistructured interviews with stakeholders from 10 DICs between July and September 2021. We used a self-designed questionnaire, tailored to the MIRACUM DICs, to collect qualitative and quantitative data. Our study method is compliant with the Good Reporting of a Mixed Methods Study (GRAMMS) checklist. RESULTS Our study provides insights into the data management practices at the MIRACUM DICs. We identify several traceability issues that can be partially explained by a lack of contextual information within nonharmonized workflow steps, unclear responsibilities, missing or incomplete data elements, and incomplete information about the computational environment. Based on the identified shortcomings, we suggest a data management maturity framework to bring more clarity and to help define enhanced data management strategies.
CONCLUSIONS The data management maturity framework supports the production and dissemination of accurate and provenance-enriched data for secondary use. Our work serves as a catalyst for deriving an overarching data management strategy, with data integrity and provenance characteristics as key factors. We envision that this work will lead to the generation of fairer, well-maintained health research data of high quality.
Affiliation(s)
- Kerstin Gierend
- Department of Biomedical Informatics at the Center for Preventive Medicine and Digital Health, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Sherry Freiesleben
- Core Unit Data Integration Center and Medical Informatics Laboratory, University Medicine Greifswald, Greifswald, Germany
- Dennis Kadioglu
- Institute for Medical Informatics (IMI), Goethe University Frankfurt, University Hospital, Frankfurt am Main, Germany
- Department for Information and Communication Technology (DICT), Data Integration Center (DIC), Goethe University Frankfurt, University Hospital, Frankfurt am Main, Germany
- Fabian Siegel
- Department of Biomedical Informatics at the Center for Preventive Medicine and Digital Health, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Thomas Ganslandt
- Chair of Medical Informatics, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Dagmar Waltemath
- Core Unit Data Integration Center and Medical Informatics Laboratory, University Medicine Greifswald, Greifswald, Germany
5
Palm J, Meineke FA, Przybilla J, Peschel T. "fhircrackr": An R Package Unlocking Fast Healthcare Interoperability Resources for Statistical Analysis. Appl Clin Inform 2023; 14:54-64. [PMID: 36696915 PMCID: PMC9876659 DOI: 10.1055/s-0042-1760436] [Indexed: 01/27/2023]
Abstract
BACKGROUND The growing interest in the secondary use of electronic health record (EHR) data has increased the number of new data integration and data sharing infrastructures. The present work was developed in the context of the German Medical Informatics Initiative, where 29 university hospitals agreed to use the Health Level Seven Fast Healthcare Interoperability Resources (FHIR) standard for their newly established data integration centers. This standard is optimized for describing and exchanging medical data but is less suitable for standard statistical analysis, which mostly requires tabular data formats. OBJECTIVES The objective of this work is to establish a tool that makes FHIR data accessible for standard statistical analysis by providing means to retrieve and transform data from a FHIR server. The tool should be implemented in a programming environment known to most data analysts and offer functions with varying degrees of flexibility and automation, catering to users with different levels of FHIR expertise. METHODS We propose the fhircrackr framework, which allows downloading and flattening FHIR resources for data analysis. The framework supports different download and authentication protocols and gives the user full control over the data that are extracted from the FHIR resources and transformed into tables. We implemented it in the programming language R [1] and published it under the GPL-3 open source license. RESULTS The framework was successfully applied to both publicly available test data and real-world data from several ongoing studies. While processing larger real-world data sets places a considerable burden on computation time and memory consumption, these challenges can be attenuated with suitable measures such as parallelization and temporary storage mechanisms.
CONCLUSION The fhircrackr R package provides an open source solution within an environment that is familiar to most data scientists and helps overcome the practical challenges that still hamper the usage of EHR data for research.
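To illustrate what "flattening" means here: FHIR resources are nested JSON documents, whereas statistical tools expect one row per resource and one column per element. The sketch below does this for resources in a FHIR search Bundle, using a hand-written mapping from column names to element paths. It is written in Python for illustration only and is not the fhircrackr API (fhircrackr is an R package with its own table-description mechanism); the `get_path` and `flatten_bundle` functions and the example Bundle are hypothetical. Real tools also need a policy for repeating elements; this sketch simply takes the first `name`.

```python
def get_path(obj, path):
    """Follow a path of dict keys / list indices; return None if absent."""
    for step in path:
        try:
            obj = obj[step]
        except (KeyError, IndexError, TypeError):
            return None
    return obj

def flatten_bundle(bundle, resource_type, columns):
    """Return one dict (table row) per resource of the given type.

    columns maps an output column name to a path into the resource JSON;
    elements missing from a resource become None in that row.
    """
    rows = []
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        if resource.get("resourceType") != resource_type:
            continue
        rows.append({name: get_path(resource, path) for name, path in columns.items()})
    return rows

# Hypothetical search result mixing Patient and Observation resources.
bundle = {
    "resourceType": "Bundle",
    "type": "searchset",
    "entry": [
        {"resource": {"resourceType": "Patient", "id": "p1", "gender": "female",
                      "birthDate": "1990-05-01", "name": [{"family": "Muster"}]}},
        {"resource": {"resourceType": "Patient", "id": "p2", "gender": "male"}},
        {"resource": {"resourceType": "Observation", "id": "o1", "status": "final"}},
    ],
}
columns = {
    "id": ("id",),
    "gender": ("gender",),
    "birth_date": ("birthDate",),
    "family": ("name", 0, "family"),  # first name entry only
}
for row in flatten_bundle(bundle, "Patient", columns):
    print(row)
```

The resulting list of dicts has the tabular shape statistical tools expect and can, for example, be passed directly to `pandas.DataFrame`.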
Affiliation(s)
- Julia Palm
- Institute of Medical Statistics, Computer and Data Sciences, Jena University Hospital, Jena, Thüringen, Germany
- Frank A Meineke
- Institute for Medical Informatics, Statistics and Epidemiology, University of Leipzig, Leipzig, Germany
- Jens Przybilla
- Institute for Medical Informatics, Statistics and Epidemiology, University of Leipzig, Leipzig, Germany
- Clinical Trial Centre Leipzig, University of Leipzig, Leipzig, Germany
- Thomas Peschel
- Institute for Medical Informatics, Statistics and Epidemiology, University of Leipzig, Leipzig, Germany
6
Mang JM, Seuchter SA, Gulden C, Schild S, Kraska D, Prokosch HU, Kapsner LA. DQAgui: a graphical user interface for the MIRACUM data quality assessment tool. BMC Med Inform Decis Mak 2022; 22:213. [PMID: 35953813 PMCID: PMC9367129 DOI: 10.1186/s12911-022-01961-z] [Received: 03/14/2022] [Accepted: 08/03/2022] [Indexed: 11/11/2022]
Abstract
Background With the growing impact of observational research studies, there is also a growing focus on data quality (DQ). As opposed to experimental study designs, observational research studies are performed using data mostly collected in a non-research context (secondary use). Depending on the number of data elements to be analyzed, DQ reports of data stored within research networks can grow very large. They may be cumbersome to read, and important information can easily be overlooked. To address this issue, a DQ assessment (DQA) tool with a graphical user interface (GUI) was developed and provided as a web application. Methods The aim was to provide an easy-to-use interface for users without prior programming knowledge to carry out DQ checks and to present the results in a clearly structured way. This interface serves as a starting point for a more detailed investigation of possible DQ irregularities. A user-centered development process ensured the practical feasibility of the interactive GUI. The interface was implemented in the R programming language and aligned to Kahn et al.’s DQ categories of conformance, completeness, and plausibility. Results DQAgui, an R package with a web-app frontend for DQ assessment, was developed. The GUI allows users to perform DQ analyses of tabular data sets and to systematically evaluate the results. During the development of the GUI, additional features were implemented, such as analyzing a subset of the data by defining time periods and restricting the analyses to certain data elements. Conclusions As part of the MIRACUM project, DQAgui is now being used at ten German university hospitals for DQ assessment and to provide a central overview of the availability of important data elements in a datamap over 2 years. Future development efforts should focus on design optimization and include a usability evaluation. Supplementary Information The online version contains supplementary material available at 10.1186/s12911-022-01961-z.
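Two of the Kahn et al. categories named above, completeness and plausibility, can be sketched for a single data element together with the time-period restriction mentioned among DQAgui's features. The record layout, the element name `age_years`, the 0 to 120 plausibility range, and the `dq_summary` function are hypothetical; this is not DQAgui's implementation (which is an R package) but a minimal illustration of the kind of per-element figures such a report shows.

```python
from datetime import date

def dq_summary(records, element, start, end, plausible_range):
    """Completeness and plausibility of one data element in [start, end].

    records: list of dicts, each with a 'date' (datetime.date) key.
    A record counts as missing if the element key is absent or None.
    A present value is implausible if it lies outside plausible_range.
    """
    lo, hi = plausible_range
    # Time-period restriction: keep only records inside the window.
    in_window = [r for r in records if start <= r["date"] <= end]
    present = [r[element] for r in in_window if r.get(element) is not None]
    n = len(in_window)
    return {
        "n": n,
        "n_missing": n - len(present),
        "completeness": len(present) / n if n else None,
        "n_implausible": sum(1 for v in present if not lo <= v <= hi),
    }

# Hypothetical records; 'age_years' and the 0-120 range are assumptions.
records = [
    {"date": date(2021, 1, 10), "age_years": 34},
    {"date": date(2021, 2, 3), "age_years": None},
    {"date": date(2021, 3, 15), "age_years": 150},
    {"date": date(2022, 1, 1), "age_years": 40},
]
print(dq_summary(records, "age_years", date(2021, 1, 1), date(2021, 12, 31), (0, 120)))
```

Here the 2022 record falls outside the window, one of the remaining three values is missing, and one present value (150) fails the plausibility range.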
Affiliation(s)
- Jonathan M Mang
- Medical Center for Information and Communication Technology, Universitätsklinikum Erlangen, Erlangen, Germany
- Susanne A Seuchter
- Medical Center for Information and Communication Technology, Universitätsklinikum Erlangen, Erlangen, Germany
- Christian Gulden
- Chair of Medical Informatics, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Erlangen, Germany
- Stefanie Schild
- Medical Center for Information and Communication Technology, Universitätsklinikum Erlangen, Erlangen, Germany
- Chair of Medical Informatics, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Erlangen, Germany
- Detlef Kraska
- Medical Center for Information and Communication Technology, Universitätsklinikum Erlangen, Erlangen, Germany
- Hans-Ulrich Prokosch
- Medical Center for Information and Communication Technology, Universitätsklinikum Erlangen, Erlangen, Germany
- Chair of Medical Informatics, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Erlangen, Germany
- Lorenz A Kapsner
- Medical Center for Information and Communication Technology, Universitätsklinikum Erlangen, Erlangen, Germany
- Institute of Radiology, Universitätsklinikum Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
7
R Packages for Data Quality Assessments and Data Monitoring: A Software Scoping Review with Recommendations for Future Developments. APPLIED SCIENCES-BASEL 2022. [DOI: 10.3390/app12094238] [Indexed: 02/04/2023]
Abstract
Data quality assessments (DQA) are necessary to ensure valid research results. Despite the growing availability of DQA-relevant tools in the R language, a systematic comparison of their functionalities is missing. Therefore, we review R packages related to data quality (DQ) and assess their scope against a DQ framework for observational health studies. Based on a systematic search, we screened more than 140 R packages related to DQA in the Comprehensive R Archive Network. From these, we selected packages that target at least three of the four DQ dimensions (integrity, completeness, consistency, accuracy) in a reference framework. We evaluated the resulting 27 packages for general features (e.g., usability, metadata handling, output types, descriptive statistics) and for the breadth of possible assessments. To facilitate comparisons, we applied all packages to a publicly available dataset from a cohort study. We found that the packages' scope varies considerably regarding functionalities and usability. Only three packages follow a DQ concept, and some offer an extensive rule-based issue analysis. However, the reference framework does not cover a few of the implemented functionalities, and it should be broadened accordingly. Better use of metadata to empower DQA and enhanced user-friendliness, such as GUIs and reports that grade the severity of DQ issues, stand out as the main directions for future developments.