1
Tabari P, Costagliola G, De Rosa M, Boeker M. State-of-the-Art Fast Healthcare Interoperability Resources (FHIR)-Based Data Model and Structure Implementations: Systematic Scoping Review. JMIR Med Inform 2024; 12:e58445. [PMID: 39316433 DOI: 10.2196/58445] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2024] [Revised: 07/28/2024] [Accepted: 08/17/2024] [Indexed: 09/25/2024] Open
Abstract
BACKGROUND Data models are crucial for clinical research as they enable researchers to fully use the vast amount of clinical data stored in medical systems. Standardized data and well-defined relationships between data points are necessary to guarantee semantic interoperability. Using the Fast Healthcare Interoperability Resources (FHIR) standard for clinical data representation would be a practical methodology to enhance and accelerate interoperability and data availability for research.
OBJECTIVE This research aims to provide a comprehensive overview of the state-of-the-art and current landscape in FHIR-based data models and structures. In addition, we intend to identify and discuss the tools, resources, limitations, and other critical aspects mentioned in the selected research papers.
METHODS To ensure the extraction of reliable results, we followed the instructions of the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) checklist. We analyzed the indexed articles in PubMed, Scopus, Web of Science, IEEE Xplore, the ACM Digital Library, and Google Scholar. After identifying, extracting, and assessing the quality and relevance of the articles, we synthesized the extracted data to identify common patterns, themes, and variations in the use of FHIR-based data models and structures across different studies.
RESULTS On the basis of the reviewed articles, we could identify 2 main themes: dynamic (pipeline-based) and static data models. The articles were also categorized into health care use cases, including chronic diseases, COVID-19 and infectious diseases, cancer research, acute or intensive care, random and general medical notes, and other conditions. Furthermore, we summarized the important or common tools and approaches of the selected papers. These items included FHIR-based tools and frameworks, machine learning approaches, and data storage and security. The most common resource was "Observation" followed by "Condition" and "Patient." The limitations and challenges of developing data models were categorized based on the issues of data integration, interoperability, standardization, performance, and scalability or generalizability.
CONCLUSIONS FHIR serves as a highly promising interoperability standard for developing real-world health care apps. The implementation of FHIR modeling for electronic health record data facilitates the integration, transmission, and analysis of data while also advancing translational research and phenotyping. Generally, FHIR-based exports of local data repositories improve data interoperability for systems and data warehouses across different settings. However, ongoing efforts to address existing limitations and challenges are essential for the successful implementation and integration of FHIR data models.
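Since the review reports "Observation" as the most common resource, a minimal sketch of what such a resource can look like may help. The helper name `make_observation`, the patient id, and the measured value are illustrative assumptions and are not taken from the reviewed papers; the field names follow the general shape of a FHIR Observation, but this is not a validated or complete resource.

```python
import json

def make_observation(patient_id, loinc_code, display, value, unit, ucum_code):
    """Build a minimal FHIR-style Observation as a plain dict (illustrative only)."""
    return {
        "resourceType": "Observation",
        "status": "final",
        # What was measured, coded with LOINC
        "code": {
            "coding": [{
                "system": "http://loinc.org",
                "code": loinc_code,
                "display": display,
            }]
        },
        # Link to the Patient resource the measurement belongs to
        "subject": {"reference": f"Patient/{patient_id}"},
        # The measured value with a UCUM-coded unit
        "valueQuantity": {
            "value": value,
            "unit": unit,
            "system": "http://unitsofmeasure.org",
            "code": ucum_code,
        },
    }

# Hypothetical heart-rate measurement for a made-up patient id
obs = make_observation("example-123", "8867-4", "Heart rate", 72, "beats/minute", "/min")
print(json.dumps(obs, indent=2))
```

The reference in `subject` is how an Observation is tied to a "Patient" resource, which is one way the well-defined relationships between data points mentioned in the abstract can be expressed.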
Affiliation(s)
- Parinaz Tabari
- Department of Informatics, University of Salerno, Fisciano, Italy
- Mattia De Rosa
- Department of Informatics, University of Salerno, Fisciano, Italy
- Martin Boeker
- Institute for Artificial Intelligence and Informatics in Medicine, Medical Center rechts der Isar, School of Medicine and Health, Technical University of Munich, Munich, Germany
2
Daniel Boie S, Meyer-Eschenbach F, Schreiber F, Giesa N, Barrenetxea J, Guinemer C, Haufe S, Krämer M, Brunecker P, Prasser F, Balzer F. A scalable approach for critical care data extraction and analysis in an academic medical center. Int J Med Inform 2024; 192:105611. [PMID: 39255725 DOI: 10.1016/j.ijmedinf.2024.105611] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2024] [Revised: 08/16/2024] [Accepted: 08/28/2024] [Indexed: 09/12/2024]
Abstract
BACKGROUND Electronic health records are a valuable asset for research, but their use is challenging due to inconsistencies of records, heterogeneous formats and the distribution over multiple, non-integrated information systems. Hence, specialized health data engineering and data science expertise are required to enable research. To facilitate secondary use of clinical routine data collected in our intensive care wards, we developed a scalable approach, consisting of cohort generation, variable filtering and data extraction steps.
OBJECTIVE With this report we share our workflow of data request, cohort identification and data extraction. We present an algorithm for automatic data extraction from our critical care information system (CCIS) that can be adapted to other object-oriented databases.
METHODS We introduced a data request process with functionalities for automated identification of patient cohorts and a specialized hierarchical data structure that supports filtering relevant variables from the CCIS and further systems for the specified cohorts. The data extraction algorithm takes patient pseudonyms and variable lists as inputs. Algorithms are implemented in Python, leveraging the PySpark framework running on our data lake infrastructure.
RESULTS Our data request process has been in operational use since June 2022. Since then we have served 121 projects with 148 service requests in total. We discuss the hierarchical structure and the frequently used data items of our CCIS in detail and present an application example, including cohort selection, data extraction and data transformation into an analysis-ready format.
CONCLUSIONS Using clinical routine data for secondary research is challenging and requires an interdisciplinary team. We developed a scalable approach that automates steps for cohort identification, data extraction and common data pre-processing steps. Additionally, we facilitate data harmonization and integration, and we consult on typical data analysis scenarios, machine learning algorithms and visualizations in dashboards.
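The abstract states only that the extraction algorithm takes patient pseudonyms and variable lists as inputs. The following is a rough sketch of that idea in plain Python rather than PySpark, so that it runs without a cluster. The record layout, the names `extract` and `to_wide`, and all values are invented for illustration and are not the authors' implementation.

```python
from typing import Dict, Iterable, List, Set

def extract(records: Iterable[Dict], pseudonyms: Set[str], variables: Set[str]) -> List[Dict]:
    """Keep only rows for the requested cohort and the requested variables."""
    return [
        row for row in records
        if row["pseudonym"] in pseudonyms and row["variable"] in variables
    ]

def to_wide(rows: Iterable[Dict]) -> Dict[str, Dict]:
    """Pivot long rows into one dict per patient (the last value per variable wins)."""
    wide: Dict[str, Dict] = {}
    for row in rows:
        wide.setdefault(row["pseudonym"], {})[row["variable"]] = row["value"]
    return wide

# Invented example rows in long format (one measurement per row)
records = [
    {"pseudonym": "P001", "variable": "heart_rate", "value": 82},
    {"pseudonym": "P001", "variable": "lactate", "value": 1.4},
    {"pseudonym": "P002", "variable": "heart_rate", "value": 95},
    {"pseudonym": "P003", "variable": "heart_rate", "value": 70},
]

cohort = {"P001", "P002"}   # hypothetical cohort pseudonyms
wanted = {"heart_rate"}     # hypothetical variable list

selected = extract(records, cohort, wanted)
table = to_wide(selected)
print(table)
```

In a Spark setting the same two filters would typically be expressed as joins or `isin` conditions on a DataFrame; the plain-Python version is used here only to keep the sketch self-contained.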
Affiliation(s)
- Sebastian Daniel Boie
- Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Institute of Medical Informatics, Charitéplatz 1, 10117 Berlin, Germany.
- Falk Meyer-Eschenbach
- Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Institute of Medical Informatics, Charitéplatz 1, 10117 Berlin, Germany; Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Clinical Study Center, Charitéplatz 1, 10117 Berlin, Germany
- Fabian Schreiber
- Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Institute of Medical Informatics, Charitéplatz 1, 10117 Berlin, Germany
- Niklas Giesa
- Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Institute of Medical Informatics, Charitéplatz 1, 10117 Berlin, Germany; Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Clinical Study Center, Charitéplatz 1, 10117 Berlin, Germany
- Jon Barrenetxea
- Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Institute of Medical Informatics, Charitéplatz 1, 10117 Berlin, Germany
- Camille Guinemer
- Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Core Unit Research IT, Charitéplatz 1, 10117 Berlin, Germany
- Stefan Haufe
- Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Institute of Medical Informatics, Charitéplatz 1, 10117 Berlin, Germany; Physikalisch-Technische Bundesanstalt, Abbestrasse 2-12, 10587 Berlin, Germany; Technische Universität Berlin, Str. des 17. Juni 135, 10623 Berlin, Germany; Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin Center for Advanced Neuroimaging, Charitéplatz 1, 10117 Berlin, Germany
- Michael Krämer
- Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Core Unit Research IT, Charitéplatz 1, 10117 Berlin, Germany
- Peter Brunecker
- Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Core Unit Research IT, Charitéplatz 1, 10117 Berlin, Germany
- Fabian Prasser
- Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Medical Informatics Group, Charitéplatz 1, 10117 Berlin, Germany
- Felix Balzer
- Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Institute of Medical Informatics, Charitéplatz 1, 10117 Berlin, Germany
3
Molaei S, Bousejin NG, Ghosheh GO, Thakur A, Chauhan VK, Zhu T, Clifton DA. CliqueFluxNet: Unveiling EHR Insights with Stochastic Edge Fluxing and Maximal Clique Utilisation Using Graph Neural Networks. Journal of Healthcare Informatics Research 2024; 8:555-575. [PMID: 39131103 PMCID: PMC11310186 DOI: 10.1007/s41666-024-00169-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2024] [Revised: 05/16/2024] [Accepted: 06/27/2024] [Indexed: 08/13/2024]
Abstract
Electronic Health Records (EHRs) play a crucial role in shaping predictive care models, yet they encounter challenges such as significant data gaps and class imbalances. Traditional Graph Neural Network (GNN) approaches either fail to fully leverage neighbourhood data or demand intensive computation for regularisation. To address this challenge, we introduce CliqueFluxNet, a novel framework that constructs a patient similarity graph to maximise cliques, thereby highlighting strong inter-patient connections. At the heart of CliqueFluxNet lies its stochastic edge fluxing strategy - a dynamic process involving random edge addition and removal during training. This strategy aims to enhance the model's generalisability and mitigate overfitting. Our empirical analysis, conducted on MIMIC-III and eICU datasets, focuses on the tasks of mortality and readmission prediction. It demonstrates significant progress in representation learning, particularly in scenarios with limited data availability. Qualitative assessments further underscore CliqueFluxNet's effectiveness in extracting meaningful EHR representations, solidifying its potential for advancing GNN applications in healthcare analytics.
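The abstract describes stochastic edge fluxing only as random edge addition and removal during training. One generic way to perturb a graph in that spirit is sketched below. The function name `flux_edges`, the probabilities `p_remove` and `p_add`, and the undirected edge encoding are assumptions made for illustration; this is not the CliqueFluxNet algorithm itself.

```python
import random
from itertools import combinations

def flux_edges(edges, num_nodes, p_remove=0.1, p_add=0.01, rng=None):
    """Return a perturbed copy of an undirected edge set.

    edges: set of (u, v) tuples with u < v
    p_remove: probability of dropping each existing edge
    p_add: probability of adding each currently absent edge
    """
    rng = rng or random.Random()
    # Keep each existing edge with probability 1 - p_remove
    perturbed = {e for e in edges if rng.random() >= p_remove}
    # Consider every absent node pair and add it with probability p_add
    for pair in combinations(range(num_nodes), 2):
        if pair not in edges and rng.random() < p_add:
            perturbed.add(pair)
    return perturbed

# Toy patient-similarity graph with 5 nodes (invented)
base_edges = {(0, 1), (1, 2), (2, 3), (3, 4)}
rng = random.Random(0)  # seeded so repeated runs give the same perturbations

# A fresh perturbed graph would be drawn at every training step or epoch
for epoch in range(3):
    print(epoch, sorted(flux_edges(base_edges, 5, 0.2, 0.1, rng)))
```

Looping over all node pairs costs O(n^2) per call, which is fine for a toy graph but would need sampling of candidate pairs for graphs of realistic size.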
Affiliation(s)
- Soheila Molaei
- Department of Engineering Science, University of Oxford, Oxford, OX1 3AZ UK
- Ghadeer O. Ghosheh
- Department of Engineering Science, University of Oxford, Oxford, OX1 3AZ UK
- Anshul Thakur
- Department of Engineering Science, University of Oxford, Oxford, OX1 3AZ UK
- Tingting Zhu
- Department of Engineering Science, University of Oxford, Oxford, OX1 3AZ UK
- David A. Clifton
- Department of Engineering Science, University of Oxford, Oxford, OX1 3AZ UK
- Oxford-Suzhou Centre for Advanced Research (OSCAR), Suzhou, 215123 China
4
Gierend K, Krüger F, Genehr S, Hartmann F, Siegel F, Waltemath D, Ganslandt T, Zeleke AA. Provenance Information for Biomedical Data and Workflows: Scoping Review. J Med Internet Res 2024; 26:e51297. [PMID: 39178413 PMCID: PMC11380065 DOI: 10.2196/51297] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2023] [Revised: 05/30/2024] [Accepted: 06/17/2024] [Indexed: 08/25/2024] Open
Abstract
BACKGROUND Provenance, the record of the origin and history of data, leads to higher interpretability of scientific results and enables reliable collaboration and data sharing. However, the lack of comprehensive evidence on provenance approaches hinders the uptake of good scientific practice in clinical research.
OBJECTIVE This scoping review aims to identify approaches and criteria for provenance tracking in the biomedical domain. We reviewed the state-of-the-art frameworks, associated artifacts, and methodologies for provenance tracking.
METHODS This scoping review followed the methodological framework developed by Arksey and O'Malley. We searched the PubMed and Web of Science databases for English-language articles published from 2006 to 2022. Title and abstract screening were carried out by 4 independent reviewers using the Rayyan screening tool. A majority vote was required for consensus on the eligibility of papers based on the defined inclusion and exclusion criteria. Full-text reading and screening were performed independently by 2 reviewers, and information was extracted into a pretested template for the 5 research questions. Disagreements were resolved by a domain expert. The study protocol has previously been published.
RESULTS The search resulted in a total of 764 papers. Of 624 identified, deduplicated papers, 66 (10.6%) studies fulfilled the inclusion criteria. We identified diverse provenance-tracking approaches ranging from practical provenance processing and managing to theoretical frameworks distinguishing diverse concepts and details of data and metadata models, provenance components, and notations. A substantial majority investigated underlying requirements to varying extents and validation intensities but lacked completeness in provenance coverage. The cited requirements mostly concerned knowledge about data integrity and reproducibility. Moreover, these revolved around robust data quality assessments, consistent policies for sensitive data protection, improved user interfaces, and automated ontology development. We found that different stakeholder groups benefit from the availability of provenance information. Thereby, we recognized that the term provenance is subjected to an evolutionary and technical process with multifaceted meanings and roles. Challenges included organizational and technical issues linked to data annotation, provenance modeling, and performance, amplified by subsequent matters such as enhanced provenance information and quality principles.
CONCLUSIONS As data volumes grow and computing power increases, the challenge of scaling provenance systems to handle data efficiently and assist complex queries intensifies, necessitating automated and scalable solutions. With rising legal and scientific demands, there is an urgent need for greater transparency in implementing provenance systems in research projects, despite the challenges of unresolved granularity and knowledge bottlenecks. We believe that our recommendations enable quality and guide the implementation of auditable and measurable provenance approaches as well as solutions in the daily tasks of biomedical scientists.
INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID) RR2-10.2196/31750.
Affiliation(s)
- Kerstin Gierend
- Department of Biomedical Informatics, Mannheim Institute for intelligent Systems in Medicine, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Frank Krüger
- Faculty of Engineering, Wismar University of Applied Sciences, Wismar, Germany
- Institute of Communications Engineering, University of Rostock, Rostock, Germany
- Sascha Genehr
- Institute of Communications Engineering, University of Rostock, Rostock, Germany
- Francisca Hartmann
- Department of Biomedical Informatics, Mannheim Institute for intelligent Systems in Medicine, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Fabian Siegel
- Department of Biomedical Informatics, Mannheim Institute for intelligent Systems in Medicine, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Dagmar Waltemath
- Department of Medical Informatics, University Medicine Greifswald, Greifswald, Germany
- Thomas Ganslandt
- Chair of Medical Informatics, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
5
Danese MD, Balasubramanian A, Bebb DG, Pundole X. Development of an algorithm to identify small cell lung cancer patients in claims databases. Front Oncol 2024; 14:1358562. [PMID: 39211549 PMCID: PMC11357974 DOI: 10.3389/fonc.2024.1358562] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2023] [Accepted: 07/29/2024] [Indexed: 09/04/2024] Open
Abstract
Introduction The treatment landscape of small cell lung cancer (SCLC) is evolving. Evidence generated from administrative claims is needed to characterize real-world SCLC patients. However, the current ICD-10 coding system cannot distinguish SCLC from non-small cell lung cancer (NSCLC). We developed and estimated the accuracy of an algorithm to identify SCLC in claims-only databases.
Methods We performed a cross-sectional study of lung cancer patients diagnosed from 2016-2017 using the Surveillance, Epidemiology and End Results (SEER) database linked with Medicare. The analysis included two phases - data exploration (utilizing a 25% random sample) and data validation (remaining 75% sample). The SEER definitions of SCLC and NSCLC were used as the gold standard. Claims-based algorithms were identified and evaluated for their sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV).
Results The eligible cohort included 31,912 lung cancer patients. The mean age was 76.3 years, 44.6% were male, with 9.4% having SCLC and 90.6% identified as NSCLC using SEER. The exploration analysis identified potential algorithms based on treatment data. In the validation analysis of 7,438 lung cancer patients who received systemic treatment in the outpatient setting, an etoposide-based algorithm (etoposide use in 180 days following lung cancer diagnosis) to identify SCLC showed: sensitivity 95%, specificity 95%, PPV 82% and NPV 99%.
Discussion An etoposide treatment-based algorithm showed good accuracy in identifying SCLC patients. Such algorithms can facilitate analyses of treatment patterns, outcomes, healthcare resource use and costs among treated SCLC patients, thereby bolstering the evidence base for best patient care.
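The four accuracy measures named in the abstract follow directly from a 2x2 table of algorithm result against gold standard. The sketch below shows the standard formulas. The counts are invented to make the arithmetic easy to follow and are not the study's data.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Compute standard accuracy measures from 2x2 confusion-matrix counts.

    tp: algorithm positive, gold standard positive
    fp: algorithm positive, gold standard negative
    fn: algorithm negative, gold standard positive
    tn: algorithm negative, gold standard negative
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases the algorithm finds
        "specificity": tn / (tn + fp),  # share of non-cases correctly excluded
        "ppv": tp / (tp + fp),          # share of flagged patients who are true cases
        "npv": tn / (tn + fn),          # share of unflagged patients who are non-cases
    }

# Invented counts: 100 true cases and 900 non-cases
metrics = diagnostic_metrics(tp=95, fp=45, fn=5, tn=855)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity are properties of the algorithm, whereas PPV and NPV also depend on how common the condition is in the sample being classified, so the same algorithm can yield different predictive values in different cohorts.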
Affiliation(s)
- Mark D. Danese
- Outcomes Insights, Inc., Calabasas, CA, United States
6
Bebbington E, Miles J, Young A, van Baar ME, Bernal N, Brekke RL, van Dammen L, Elmasry M, Inoue Y, McMullen KA, Paton L, Thamm OC, Tracy LM, Zia N, Singer Y, Dunn K. Exploring the similarities and differences of burn registers globally: Results from a data dictionary comparison study. Burns 2024; 50:850-865. [PMID: 38267291 DOI: 10.1016/j.burns.2024.01.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2023] [Revised: 12/08/2023] [Accepted: 01/10/2024] [Indexed: 01/26/2024]
Abstract
INTRODUCTION Pooling and comparing data from the existing global network of burn registers represents a powerful, yet untapped, opportunity to improve burn prevention and care. There have been no studies investigating whether registers are sufficiently similar to allow data comparisons. It is also not known what differences exist that could bias analyses. Understanding this information is essential prior to any future data sharing. The aim of this project was to compare the variables collected in countrywide and intercountry burn registers to understand their similarities and differences.
METHODS Register custodians were invited to participate and share their data dictionaries. Inclusion and exclusion criteria were compared to understand each register population. Descriptive statistics were calculated for the number of unique variables. Variables were classified into themes. Definition, method, timing of measurement, and response options were compared for a sample of register concepts.
RESULTS Thirteen burn registers participated in the study. Inclusion criteria varied between registers. The median number of variables per register was 94 (range 28 - 890), of which 24% (range 4.8 - 100%) were required to be collected. Six themes (patient information, admission details, injury, inpatient, outpatient, other) and 41 subthemes were identified. Register concepts of age and timing of injury show similarities in data collection. Intent, mechanism, inhalational injury, infection, and patient death show greater variation in measurement.
CONCLUSIONS We found some commonalities between registers and some differences. Commonalities would assist in any future efforts to pool and compare data between registers. Differences between registers could introduce selection and measurement bias, which needs to be addressed in any strategy aiming to facilitate burn register data sharing. We recommend the development of common data elements used in an international minimum data set for burn injuries, including standard definitions and methods of measurement, as the next step in achieving burn register data sharing.
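A data dictionary comparison of this kind can be pictured as set operations over the variable names of each register. The sketch below is a simplified illustration with three invented registers and invented variable names. It assumes variables have already been mapped to shared names, which the abstract indicates is itself a substantial part of the work.

```python
from statistics import median

# Invented data dictionaries: register name -> set of harmonized variable names
registers = {
    "register_a": {"age", "sex", "date_of_injury", "tbsa_percent"},
    "register_b": {"age", "date_of_injury", "mechanism"},
    "register_c": {"age", "sex", "date_of_injury", "mechanism", "inhalation_injury"},
}

def common_variables(dictionaries):
    """Variables collected by every register."""
    return set.intersection(*dictionaries.values())

def variable_coverage(dictionaries):
    """Number of registers that collect each variable."""
    counts = {}
    for variables in dictionaries.values():
        for name in variables:
            counts[name] = counts.get(name, 0) + 1
    return counts

shared = common_variables(registers)
coverage = variable_coverage(registers)
median_size = median(len(v) for v in registers.values())

print("shared by all:", sorted(shared))
print("median variables per register:", median_size)
print("collected by only one register:", sorted(n for n, c in coverage.items() if c == 1))
```

Variables shared by all registers are natural candidates for a minimum data set, while those collected by a single register show where pooling would leave gaps.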
Affiliation(s)
- Emily Bebbington
- Centre for Mental Health and Society, Bangor University, Wrexham Academic Unit, Technology Park, Wrexham LL13 7YP, UK.
- Joanna Miles
- Plastic and Reconstructive Surgery Department, Norfolk and Norwich University Hospital, Colney Lane, Norwich NR4 7UY, UK
- Amber Young
- Bristol Centre for Surgical Research, Bristol Medical School, Department of Population Health Sciences, Canynge Hall, 39 Whatley Road, Bristol BS8 2PS, UK
- Margriet E van Baar
- Dutch Burn Repository R3, Association of Dutch Burn Centres, Maasstad Hospital, Maasstadweg 21, 3079 DZ Rotterdam, the Netherlands
- Nicole Bernal
- The Ohio State University Wexner Medical Center, 410 W 10th Ave, Columbus, OH 43235, USA; Burn Care Quality Platform, American Burn Association, 311 S. Wacker Drive, Suite 950, Chicago, IL, USA
- Ragnvald Ljones Brekke
- Norwegian Burn Registry, Norwegian National Burn Center, Haukeland University Hospital, Haukelandsveien 22, 5009 Bergen, Norway
- Lotte van Dammen
- Burn Centres Outcomes Registry The Netherlands, Dutch Burns Foundation, Zeestraat 29, 1941 AJ Beverwijk, the Netherlands
- Moustafa Elmasry
- Burn Unit Database, Swedish Burn Register, Department of Hand Surgery, Plastic Surgery and Burns, Linköping University, Linköping, Sweden
- Yoshiaki Inoue
- Japanese Burn Register, Japanese Society for Burn Injuries, Shunkosha Inc. Lambdax Building, 2-4-12 Ohkubo, Shinjuku-ku, Tokyo 169-0072, Japan
- Kara A McMullen
- Burn Model System, Burn Model System National Data and Statistical Center, Department of Rehabilitation Medicine, University of Washington, Box 354237, Seattle, WA 98195-4237, USA
- Lia Paton
- Care of Burns in Scotland, National Managed Clinical Network, NHS National Services Scotland, Gyle Square, 1 South Gyle Crescent, Edinburgh EH12 9EB, UK
- Oliver C Thamm
- German Burn Registry, German Society for Burn Treatment (DGV), Luisenstrasse 58-59, 10117 Berlin, Germany; University of Witten/Herdecke, Alfred-Herrenhausen-Strasse 50, 58455 Witten, Germany
- Lincoln M Tracy
- School of Public Health & Preventive Medicine, Monash University, 553 St Kilda Road, Melbourne, VIC 3004, Australia
- Nukhba Zia
- South Asia Burn Registry, Johns Hopkins International Injury Research Unit, Department of International Health, Health Systems Program, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA
- Yvonne Singer
- School of Nursing and Midwifery, Griffith University, Nathan Campus, 170 Kessels Road, Brisbane, QLD, Australia
- Ken Dunn
- Burn Care Informatics Group, NHS, UK
7
Frid S, Bracons Cucó G, Gil Rojas J, López-Rueda A, Pastor Duran X, Martínez-Sáez O, Lozano-Rubí R. Evaluation of OMOP CDM, i2b2 and ICGC ARGO for supporting data harmonization in a breast cancer use case of a multicentric European AI project. J Biomed Inform 2023; 147:104505. [PMID: 37774908 DOI: 10.1016/j.jbi.2023.104505] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2023] [Revised: 09/20/2023] [Accepted: 09/21/2023] [Indexed: 10/01/2023]
Abstract
OBJECTIVE Observational research in cancer poses great challenges regarding adequate data sharing and consolidation based on a homogeneous data semantic base. Common Data Models (CDMs) can help consolidate health data repositories from different institutions minimizing loss of meaning by organizing data into a standard structure. This study aims to evaluate the performance of the Observational Medical Outcomes Partnership (OMOP) CDM, Informatics for Integrating Biology & the Bedside (i2b2) and International Cancer Genome Consortium, Accelerating Research in Genomic Oncology (ICGC ARGO) for representing non-imaging data in a breast cancer use case of EuCanImage.
METHODS We used ontologies to represent metamodels of OMOP, i2b2, and ICGC ARGO and variables used in a cancer use case of a European AI project. We selected four evaluation criteria for the CDMs adapted from previous research: content coverage, simplicity, integration, implementability.
RESULTS i2b2 and OMOP exhibited higher element completeness (100% each) than ICGC ARGO (58.1%), while the three achieved 100% domain completeness. ICGC ARGO normalizes only one of our variables with a standard terminology, while i2b2 and OMOP use standardized vocabularies for all of them. In terms of simplicity, ICGC ARGO and i2b2 proved to be simpler both in terms of ontological model (276 and 175 elements, respectively) and in the queries (7 and 20 lines of code, respectively), while OMOP required a much more complex ontological model (615 elements) and queries similar to those of i2b2 (20 lines). Regarding implementability, OMOP had the highest number of mentions in articles in PubMed (130) and Google Scholar (1,810), ICGC ARGO had the highest number of updates to the CDM since 2020 (4), and i2b2 is the model with more tools specifically developed for the CDM (26).
CONCLUSION ICGC ARGO proved to be rigid and very limited in the representation of oncologic concepts, while i2b2 and OMOP showed a very good performance. i2b2's lack of a common dictionary hinders its scalability, requiring sites that will share data to explicitly define a conceptual framework, and suggesting that OMOP and its Oncology extension could be the more suitable choice. Future research employing these CDMs with actual datasets is needed.
Affiliation(s)
- Santiago Frid
- Clinical Informatics Service, Hospital Clínic de Barcelona, Villarroel 170, 08036 Barcelona, Spain. https://twitter.com/santifrik
- Guillem Bracons Cucó
- Fundació de Recerca Clínic Barcelona - Institut d'Investigacions Biomèdiques August Pi i Sunyer, Rosselló 149-153, 08036 Barcelona, Spain
- Jessyca Gil Rojas
- Clinical Informatics Service, Hospital Clínic de Barcelona, Villarroel 170, 08036 Barcelona, Spain
- Antonio López-Rueda
- Clinical Informatics Service, Hospital Clínic de Barcelona, Villarroel 170, 08036 Barcelona, Spain; Radiology Service, Hospital Clínic de Barcelona, Villarroel 170, 08036 Barcelona, Spain
- Xavier Pastor Duran
- Clinical Informatics Service, Hospital Clínic de Barcelona, Villarroel 170, 08036 Barcelona, Spain
- Olga Martínez-Sáez
- Oncology Service, Hospital Clínic de Barcelona, Villarroel 170, 08036 Barcelona, Spain
- Raimundo Lozano-Rubí
- Oncology Service, Hospital Clínic de Barcelona, Villarroel 170, 08036 Barcelona, Spain
8
Johns M, Meurers T, Wirth FN, Haber AC, Müller A, Halilovic M, Balzer F, Prasser F. Data Provenance in Biomedical Research: Scoping Review. J Med Internet Res 2023; 25:e42289. [PMID: 36972116 PMCID: PMC10132013 DOI: 10.2196/42289] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2022] [Revised: 12/14/2022] [Accepted: 12/23/2022] [Indexed: 03/29/2023] Open
Abstract
BACKGROUND Data provenance refers to the origin, processing, and movement of data. Reliable and precise knowledge about data provenance has great potential to improve reproducibility as well as quality in biomedical research and, therefore, to foster good scientific practice. However, despite the increasing interest in data provenance technologies in the literature and their implementation in other disciplines, these technologies have not yet been widely adopted in biomedical research.
OBJECTIVE The aim of this scoping review was to provide a structured overview of the body of knowledge on provenance methods in biomedical research by systematizing articles covering data provenance technologies developed for or used in this application area; describing and comparing the functionalities as well as the design of the provenance technologies used; and identifying gaps in the literature, which could provide opportunities for future research on technologies that could receive more widespread adoption.
METHODS Following a methodological framework for scoping studies and the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) guidelines, articles were identified by searching the PubMed, IEEE Xplore, and Web of Science databases and subsequently screened for eligibility. We included original articles covering software-based provenance management for scientific research published between 2010 and 2021. A set of data items was defined along the following five axes: publication metadata, application scope, provenance aspects covered, data representation, and functionalities. The data items were extracted from the articles, stored in a charting spreadsheet, and summarized in tables and figures.
RESULTS We identified 44 original articles published between 2010 and 2021. We found that the solutions described were heterogeneous along all axes. We also identified relationships among motivations for the use of provenance information, feature sets (capture, storage, retrieval, visualization, and analysis), and implementation details such as the data models and technologies used. The important gap that we identified is that only a few publications address the analysis of provenance data or use established provenance standards, such as PROV.
CONCLUSIONS The heterogeneity of provenance methods, models, and implementations found in the literature points to the lack of a unified understanding of provenance concepts for biomedical data. Providing a common framework, a biomedical reference, and benchmarking data sets could foster the development of more comprehensive provenance solutions.
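To make the capture, storage, and retrieval features mentioned above more concrete, here is a small sketch of a provenance store. It borrows two relation names from the W3C PROV data model (an activity "used" an entity, an entity "wasGeneratedBy" an activity), but the class `ProvGraph`, its methods, and the example pipeline are hypothetical and do not represent any tool covered by the review.

```python
from dataclasses import dataclass, field

@dataclass
class ProvGraph:
    """Tiny in-memory provenance store (illustrative only)."""
    was_generated_by: dict = field(default_factory=dict)  # entity -> activity
    used: dict = field(default_factory=dict)              # activity -> list of input entities

    def record(self, activity, inputs, output):
        """Capture one processing step: the activity used inputs and generated output."""
        self.used.setdefault(activity, []).extend(inputs)
        self.was_generated_by[output] = activity

    def lineage(self, entity):
        """Retrieve all upstream entities that the given entity was derived from."""
        upstream = []
        stack = [entity]
        while stack:
            current = stack.pop()
            activity = self.was_generated_by.get(current)
            if activity is None:
                continue  # a source entity with no recorded origin
            for source in self.used.get(activity, []):
                if source not in upstream:
                    upstream.append(source)
                    stack.append(source)
        return upstream

# Hypothetical two-step pipeline
prov = ProvGraph()
prov.record("pseudonymize", ["raw_ehr_export"], "pseudonymized_table")
prov.record("aggregate", ["pseudonymized_table"], "analysis_dataset")

print(prov.lineage("analysis_dataset"))
```

A lineage query like this is one simple form of the provenance analysis that the review found to be rarely addressed. Real systems would also record agents, timestamps, and parameters.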
Affiliation(s)
- Marco Johns
- Medical Informatics Group, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Thierry Meurers
- Medical Informatics Group, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Felix N Wirth
- Medical Informatics Group, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Anna C Haber
- Medical Informatics Group, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Armin Müller
- Medical Informatics Group, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Mehmed Halilovic
- Medical Informatics Group, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Felix Balzer
- Institute of Medical Informatics, Charité - Universitätsmedizin Berlin, Berlin, Germany
- Fabian Prasser
- Medical Informatics Group, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
|
9
|
Bebbington E, Miles J, Peck M, Singer Y, Dunn K, Young A. Exploring the similarities and differences of variables collected by burn registers globally: protocol for a data dictionary review study. BMJ Open 2023; 13:e066512. [PMID: 36854585 PMCID: PMC9980371 DOI: 10.1136/bmjopen-2022-066512] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 03/02/2023]
Abstract
INTRODUCTION Burn registers can provide high-quality clinical data that can be used for surveillance, research, planning service provision and clinical quality assessment. Many countrywide and intercountry burn registers now exist. The variables collected by burn registers are not standardised internationally. Few international burn register data comparisons are completed beyond basic morbidity and mortality statistics. Data comparisons across registers require analysis of homogeneous variables. Little work has been done to understand whether burn registers have sufficiently similar variables to enable useful comparisons. The aim of this project is to compare the variables collected in countrywide and intercountry burn registers internationally to understand their similarities and differences. METHODS AND ANALYSIS Burn register custodians will be invited to participate in the study and to share their register data dictionaries. Study objectives are to compare patient inclusion and exclusion criteria of each participating burn register; determine which variables are collected by each register, and whether variables are required or optional; identify common variable themes; and compare a sample of variables to understand how they are defined and measured. All variable names will be extracted from each register and common themes will be identified. Detailed information will be extracted for a sample of variables to give a deeper insight into similarities and differences between registers. ETHICS AND DISSEMINATION No patient data will be used in this project. Permission to use each register's data dictionary will be sought from respective register custodians. Results will be presented at international meetings and published in open access journals. These results will be of interest to register custodians and researchers wishing to explore international data comparisons, and countries wishing to establish their own burn register.
Affiliation(s)
- Emily Bebbington
- Centre for Mental Health and Society, Bangor University, Bangor, UK
- Emergency Department, Ysbyty Gwynedd, Bangor, UK
- Joanna Miles
- Plastic and Reconstructive Surgery Department, Norfolk and Norwich University Hospitals NHS Foundation Trust, Norwich, UK
- Michael Peck
- Arizona Burn Center, Valleywise Health Medical Center, Phoenix, Arizona, USA
- Department of Surgery, Creighton University Health Sciences Campus, Phoenix, Arizona, USA
- Yvonne Singer
- Victoria Adult Burn Service, The Alfred Hospital, Melbourne, Victoria, Australia
- Ken Dunn
- Burn Care Informatics Group, NHS England, Manchester, UK
- Amber Young
- Children's Burn Research Centre, University Hospitals Bristol and Weston NHS Foundation Trust, Bristol, UK
- Bristol Centre for Surgical Research, Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, UK
|
10
|
Lam SSW, Fang AHS, Koh MS, Shantakumar S, Yeo SH, Matchar DB, Ong MEH, Poon KMT, Huang L, Harikrishan S, Milea D, Burke D, Webb D, Ragavendran N, Tan NC, Loo CM. Development of a real-world database for asthma and COPD: The SingHealth-Duke-NUS-GSK COPD and Asthma Real-World Evidence (SDG-CARE) collaboration. BMC Med Inform Decis Mak 2023; 23:4. [PMID: 36624490 PMCID: PMC9830781 DOI: 10.1186/s12911-022-02071-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2022] [Accepted: 11/25/2022] [Indexed: 01/11/2023]
Abstract
PURPOSE The SingHealth-Duke-GlaxoSmithKline COPD and Asthma Real-world Evidence (SDG-CARE) collaboration was formed to accelerate the use of Singaporean real-world evidence in research and clinical care. A centerpiece of the collaboration was to develop a near real-time database from clinical and operational data sources to inform healthcare decision making and research studies on asthma and chronic obstructive pulmonary disease (COPD). METHODS Our multidisciplinary team, including clinicians, epidemiologists, data scientists, medical informaticians and IT engineers, adopted the hybrid waterfall-agile project management methodology to develop the SingHealth COPD and Asthma Data Mart (SCDM). The SCDM was developed within the organizational data warehouse. It pulls and maps data from various information systems using extract, transform and load (ETL) pipelines. Robust user testing and data verification were also performed to ensure that the business requirements were met and that the ETL pipelines were valid. RESULTS The SCDM includes 199 data elements relevant to asthma and COPD. Data verification found the SCDM to be reliable. As of December 31, 2019, the SCDM contained 36,407 unique patients with asthma and COPD across the spectrum from primary to tertiary care in our healthcare system. The database updates weekly to add new data for existing patients and to include new patients who fulfil the inclusion criteria. CONCLUSIONS The SCDM was systematically developed and tested to support the use of real-world data (RWD) for clinical and health services research in asthma and COPD. This can serve as a platform to provide research and operational insights to improve the care delivered to our patients.
Affiliation(s)
- Sean Shao Wei Lam
- Health Services and Systems Research, Duke-NUS Medical School, Singapore, Singapore
- Health Services Research Centre, Singapore Health Services, 20 College Road, The Academia – Discovery Tower Level 6, Singapore, 169856 Singapore
- Health Services Research Institute, SingHealth Duke NUS Academic Medical Centre, Singapore, Singapore
- Lee Kong Chian School of Business, Singapore Management University, Singapore, Singapore
- Andrew Hao Sen Fang
- SingHealth Polyclinics, SingHealth, Singapore, Singapore
- Mariko Siyue Koh
- Department of Respiratory and Critical Care Medicine, Singapore General Hospital, Singapore, Singapore
- Duke-NUS Medical School, Singapore, Singapore
- Sumitra Shantakumar
- Duke-NUS Medical School, Singapore, Singapore
- GlaxoSmithKline, Singapore, Singapore
- David Bruce Matchar
- Duke-NUS Medical School, Singapore, Singapore
- Department of Internal Medicine (General Internal Medicine), Duke University Medical School, Durham, NC, USA
- Department of Internal Medicine, Singapore General Hospital, Singapore, Singapore
- Marcus Eng Hock Ong
- Health Services and Systems Research, Duke-NUS Medical School, Singapore, Singapore
- Health Services Research Centre, Singapore Health Services, 20 College Road, The Academia – Discovery Tower Level 6, Singapore, 169856 Singapore
- Health Services Research Institute, SingHealth Duke NUS Academic Medical Centre, Singapore, Singapore
- Department of Emergency Medicine, Singapore General Hospital, Singapore, Singapore
- Liming Huang
- Integrated Health Information Systems, Singapore, Singapore
- Sudha Harikrishan
- Health Services Research Centre, Singapore Health Services, 20 College Road, The Academia – Discovery Tower Level 6, Singapore, 169856 Singapore
- Des Burke
- GlaxoSmithKline, Singapore, Singapore
- Dave Webb
- GlaxoSmithKline, Singapore, Singapore
- Narayanan Ragavendran
- Health Services Research Centre, Singapore Health Services, 20 College Road, The Academia – Discovery Tower Level 6, Singapore, 169856 Singapore
- Duke-NUS Medical School, Singapore, Singapore
- Ngiap Chuan Tan
- SingHealth Polyclinics, SingHealth, Singapore, Singapore
- Duke-NUS Medical School, Singapore, Singapore
- Chian Min Loo
- Department of Respiratory and Critical Care Medicine, Singapore General Hospital, Singapore, Singapore
- Duke-NUS Medical School, Singapore, Singapore
|
11
|
Danese MD, Fox KM, Duryea JL, Desai P, Rubin RJ. The rate, cost and outcomes of parathyroidectomy in the united states dialysis population from 2016-2018. BMC Nephrol 2022; 23:220. [PMID: 35729513 PMCID: PMC9215010 DOI: 10.1186/s12882-022-02848-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 03/14/2022] [Accepted: 06/08/2022] [Indexed: 11/16/2022]
Abstract
Background In end-stage kidney disease, patients may undergo parathyroidectomy if secondary hyperparathyroidism cannot be managed medically. This study was designed to estimate the parathyroidectomy rate in the United States (US) and to quantify changes in costs and other outcomes after parathyroidectomy. Methods This was a retrospective observational cohort study using US Renal Data System data for 2015–2018. Parathyroidectomy rates were estimated for adult hemodialysis and peritoneal dialysis patients alive at the beginning of 2016, 2017, and 2018 who were followed for a year or until parathyroidectomy, death, or transplant. Incremental differences in economic and clinical outcomes were compared before and after parathyroidectomy in adult hemodialysis and peritoneal dialysis patients who received a parathyroidectomy in 2016 and 2017. Results The rate of parathyroidectomy per 1,000 person-years decreased from 6.5 (95% CI 6.2-6.8) in 2016 to 5.3 (95% CI 5.0-5.6) in 2018. The incremental increase in 12-month cost after versus before parathyroidectomy was $25,314 (95% CI $23,777-$27,078). By the second month after parathyroidectomy, 58% of patients had a corrected calcium level < 8.5 mg/dL. In the year after parathyroidectomy (versus before), hospitalizations increased by 1.4 per person-year (95% CI 1.3-1.5), hospital days increased by 12.1 per person-year (95% CI 11.2-13.0), dialysis visits decreased by 5.2 per person-year (95% CI 4.4-5.9), and office visits declined by 1.3 per person-year (95% CI 1.0-1.5). The incremental rate per 1,000 person years for hematoma/bleed was 224.4 (95% CI 152.5-303.1), for vocal cord paralysis was 124.6 (95% CI 59.1-232.1), and for seroma was 27.4 (95% CI 0.4-59.0). Conclusions Parathyroidectomy was a relatively uncommon event in the hemodialysis and peritoneal dialysis populations. The incremental cost of parathyroidectomy was mostly attributable to the cost of the parathyroidectomy hospitalization. Hypocalcemia occurred in over half of patients, and calcium and phosphate levels were reduced. Clinicians, payers, and patients should understand the potential clinical and economic outcomes when considering parathyroidectomy. Supplementary Information The online version contains supplementary material available at 10.1186/s12882-022-02848-x.
Affiliation(s)
- Mark D Danese
- Outcomes Insights, Inc., 30200 Agoura Road, Suite 230, Agoura Hills, CA, 91301, USA
- Kathleen M Fox
- Global Health Economics, Amgen, Inc., Thousand Oaks, CA, USA
- Jennifer L Duryea
- Outcomes Insights, Inc., 30200 Agoura Road, Suite 230, Agoura Hills, CA, 91301, USA
- Robert J Rubin
- Division of Nephrology and Hypertension, Georgetown University, Washington, DC, USA
|
12
|
Abstract
A huge array of data in nephrology is collected through patient registries, large epidemiological studies, electronic health records, administrative claims, clinical trial repositories, mobile health devices and molecular databases. Application of these big data, particularly using machine-learning algorithms, provides a unique opportunity to obtain novel insights into kidney diseases, facilitate personalized medicine and improve patient care. Efforts to make large volumes of data freely accessible to the scientific community, increased awareness of the importance of data sharing and the availability of advanced computing algorithms will facilitate the use of big data in nephrology. However, challenges exist in accessing, harmonizing and integrating datasets in different formats from disparate sources, improving data quality and ensuring that data are secure and the rights and privacy of patients and research participants are protected. In addition, the optimism for data-driven breakthroughs in medicine is tempered by scepticism about the accuracy of calibration and prediction from in silico techniques. Machine-learning algorithms designed to study kidney health and diseases must be able to handle the nuances of this specialty, must adapt as medical practice continually evolves, and must have global and prospective applicability for external and future datasets.
|