1
Riepenhausen S, Blumenstock M, Niklas C, Hegselmann S, Neuhaus P, Meidt A, Püttmann C, Storck M, Ganzinger M, Varghese J, Dugas M. Europe's Largest Research Infrastructure for Curated Medical Data Models with Semantic Annotations. Methods Inf Med 2024. [PMID: 38740374 DOI: 10.1055/s-0044-1786839] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/16/2024]
Abstract
BACKGROUND Structural metadata from the majority of clinical studies and routine health care systems is currently not yet available to the scientific community. OBJECTIVE To provide an overview of available contents in the Portal of Medical Data Models (MDM Portal). METHODS The MDM Portal is a registered European information infrastructure for research and health care, and its contents are curated and semantically annotated by medical experts. It enables users to search, view, discuss, and download existing medical data models. RESULTS The most frequent keyword is "clinical trial" (n = 18,777), and the most frequent disease-specific keyword is "breast neoplasms" (n = 1,943). Most data items are available in English (n = 545,749) and German (n = 109,267). Manually curated semantic annotations are available for 805,308 elements (554,352 items, 58,101 item groups, and 192,855 code list items), which were derived from 25,257 data models. In total, 1,609,225 Unified Medical Language System (UMLS) codes have been assigned, with 66,373 unique UMLS codes. CONCLUSION To our knowledge, the MDM Portal constitutes Europe's largest collection of medical data models with semantically annotated elements. As such, it can be used to increase compatibility of medical datasets and can be utilized as a large expert-annotated medical text corpus for natural language processing.
Affiliation(s)
- Sarah Riepenhausen
  - Institute of Medical Informatics, University of Münster, Münster, Nordrhein-Westfalen, Germany
- Max Blumenstock
  - Institute of Medical Informatics, Heidelberg University Hospital, Heidelberg, Germany
- Christian Niklas
  - Institute of Medical Informatics, Heidelberg University Hospital, Heidelberg, Germany
- Stefan Hegselmann
  - Institute of Medical Informatics, University of Münster, Münster, Nordrhein-Westfalen, Germany
- Philipp Neuhaus
  - Institute of Medical Informatics, University of Münster, Münster, Nordrhein-Westfalen, Germany
- Alexandra Meidt
  - Institute of Medical Informatics, University of Münster, Münster, Nordrhein-Westfalen, Germany
- Cornelia Püttmann
  - Institute of Medical Informatics, University of Münster, Münster, Nordrhein-Westfalen, Germany
- Michael Storck
  - Institute of Medical Informatics, University of Münster, Münster, Nordrhein-Westfalen, Germany
- Matthias Ganzinger
  - Institute of Medical Informatics, Heidelberg University Hospital, Heidelberg, Germany
- Julian Varghese
  - Institute of Medical Informatics, University of Münster, Münster, Nordrhein-Westfalen, Germany
- Martin Dugas
  - Institute of Medical Informatics, Heidelberg University Hospital, Heidelberg, Germany
  - European Research Center for Information Systems (ERCIS), Münster, Nordrhein-Westfalen, Germany
2
Rodoreda-Pallàs B, Lumillo-Gutiérrez I, Miró Catalina Q, Torra Escarrer E, Sanahuja Juncadella J, Morin Fraile V. Recording of Social Determinants in Computerized Medical Records in Primary Care Consultations: Quasi-Experimental Study. JMIR Form Res 2023; 7:e41706. [PMID: 36696168 PMCID: PMC10013680 DOI: 10.2196/41706] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2022] [Revised: 11/30/2022] [Accepted: 12/15/2022] [Indexed: 01/26/2023] Open
Abstract
BACKGROUND Social determinants of health may be more important than medical or lifestyle choices in influencing people's health. Even so, there is a deficit in recording these in patients' computerized medical histories. The Spanish administration and the World Health Organization are promoting the recording of diagnoses in computerized clinical histories with the aim of benefiting the individual, the professional, and the community. In most cases, professionals tend to record only clinical diagnoses despite evidence in the literature documenting that addressing the social determinants of health can lead to improvements in health and reductions in social disparities in disease. OBJECTIVE This study aims to develop and evaluate the effectiveness of a mixed intervention (face-to-face-digital) aimed at improving the quantity and quality of the records of the social determinants of health in computerized medical records at primary care clinics. METHODS A quasi-experimental, nonrandomized, controlled, multicenter study with 2 parallel study arms was conducted in the area of Central Catalonia (Spain) with primary care professionals of the Institut Català de la Salut (ICS), working from September 23, 2019, to March 31, 2020. All interested professionals were accepted. In total, 22 basic health areas were involved in the study. In Spain and Catalonia, the International Classification of Diseases is used, in which there is a coding of the social determinants of health. Five social determinants were selected by a physician, a nurse, and a social worker; these professionals had experience in primary care and were experts in community health. The choice was made taking into account the ease of use, benefit, and existing terminology. 
The intervention, based on a checklist, was integrated into the usual multidisciplinary clinical workflow in primary care consultations to influence the recording of these determinants in the patient's computerized medical record. RESULTS After 6 months of implementing the intervention, the volume and quantity of records of 5 social determinants of health were compared, and a significant increase in the median number of diagnoses from pre- to postintervention was observed (P≤.001). There was also an increase in the diversity of selected social determinants. In a linear regression model, the mean increase in the experimental group relative to the control group was significant, with an estimated coefficient of 8.18 (95% CI 5.11-11.26). CONCLUSIONS The intervention described in this study, designed by a multidisciplinary team to be incorporated into the workflow of primary care practices, is an effective tool for coding the social determinants of health. Its usability and the description of the intervention given here should be generalizable to any environment. TRIAL REGISTRATION ClinicalTrials.gov NCT04151056; https://clinicaltrials.gov/ct2/show/NCT04151056.
Affiliation(s)
- Berta Rodoreda-Pallàs
  - Santpedor Primary Health Care, EAP Navarcles/Sant Frutiós/Santpedor, Primary Care Service Bages-Berguedà, Central Catalonia Territorial Management, Institut Català de la Salut, Santpedor, Spain
  - Health Promotion in Rural Areas Research Group, Institut Català de la Salut, Sant Fruitós de Bages, Spain
  - Research Support Unit of Central Catalonia, Fundació Institut Universitari per a la Recerca a l'Atenció Primària de Salut Jordi Gol i Gurina, Barcelona, Spain
- Iris Lumillo-Gutiérrez
  - Department of Public Health, Mental Health and Maternal and Child Health Nursing, Universitat de Barcelona, Barcelona, Spain
  - Chronicity and Complexity Care Unit, Baix Llobregat Centre Primary Care Service, Southern Metropolitan Territorial Management, Institut Català de la Salut, Cornellà de Llobregat (Barcelona), Spain
  - Research Group on Environments and Materials for Learning, Universitat de Barcelona, Barcelona, Spain
- Queralt Miró Catalina
  - Health Promotion in Rural Areas Research Group, Institut Català de la Salut, Sant Fruitós de Bages, Spain
  - Research Support Unit of Central Catalonia, Fundació Institut Universitari per a la Recerca a l'Atenció Primària de Salut Jordi Gol i Gurina, Barcelona, Spain
- Eva Torra Escarrer
  - Sant Vicenç de Castellet Primary Health Care, Primary Care Service Bages-Berguedà, Central Catalonia Territorial Management, Institut Català de la Salut, Sant Vicenç de Castellet, Spain
- Jaume Sanahuja Juncadella
  - Plaça Catalunya Primary Health Care, Primary Care Service Bages-Berguedà, Central Catalonia Territorial Management, Institut Català de la Salut, Manresa, Spain
- Victoria Morin Fraile
  - Department of Public Health, Mental Health and Maternal and Child Health Nursing, Universitat de Barcelona, Barcelona, Spain
  - Research Group on Environments and Materials for Learning, Universitat de Barcelona, Barcelona, Spain
  - Health Education and Promotion, Universitat de Barcelona, Barcelona, Spain
  - School of Nursing, Universitat de Barcelona, Barcelona, Spain
3
Rafee A, Riepenhausen S, Neuhaus P, Meidt A, Dugas M, Varghese J. ELaPro, a LOINC-mapped core dataset for top laboratory procedures of eligibility screening for clinical trials. BMC Med Res Methodol 2022; 22:141. [PMID: 35568796 PMCID: PMC9107639 DOI: 10.1186/s12874-022-01611-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 12/01/2021] [Accepted: 04/20/2022] [Indexed: 12/21/2022] Open
Abstract
Background Screening for eligible patients continues to pose a great challenge for many clinical trials. This has led to a rapidly growing interest in standardizing computable representations of eligibility criteria (EC) in order to develop tools that leverage data from electronic health record (EHR) systems. Although laboratory procedures (LP) represent a common entity of EC that is readily available and retrievable from EHR systems, there is a lack of interoperable data models for this entity of EC. A public, specialized data model that utilizes international, widely adopted terminology for LP, e.g. Logical Observation Identifiers Names and Codes (LOINC®), is much needed to support automated screening tools. Objective The aim of this study is to establish a core dataset for the LP most frequently requested to recruit patients for clinical trials using LOINC terminology. Employing such a core dataset could enhance the interface between study feasibility platforms and EHR systems and significantly improve automatic patient recruitment. Methods We used a semi-automated approach to analyze 10,516 screening forms from the Medical Data Models (MDM) portal’s data repository that are pre-annotated with Unified Medical Language System (UMLS) codes. An automated semantic analysis based on concept frequency was followed by an extensive manual expert review performed by physicians to analyze complex recruitment-relevant concepts not amenable to an automated approach. Results Based on analysis of 138,225 EC from 10,516 screening forms, 55 laboratory procedures represented 77.87% of all UMLS laboratory concept occurrences identified in the selected EC forms. We identified 26,413 unique UMLS concepts from 118 UMLS semantic types and covered the vast majority of Medical Subject Headings (MeSH) disease domains. Conclusions Only a small set of common LP covers the majority of laboratory concepts in screening EC forms, which supports the feasibility of establishing a focused core dataset for LP.
We present ELaPro, a novel, LOINC-mapped, core dataset for the most frequent 55 LP requested in screening for clinical trials. ELaPro is available in multiple machine-readable data formats like CSV, ODM and HL7 FHIR. The extensive manual curation of this large number of free-text EC as well as the combining of UMLS and LOINC terminologies distinguishes this specialized dataset from previous relevant datasets in the literature. Supplementary Information The online version contains supplementary material available at 10.1186/s12874-022-01611-y.
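The frequency-based step in this abstract (a small set of concepts covering most concept occurrences) can be illustrated with a minimal sketch. The function name `top_k_coverage` and the toy concept identifiers are hypothetical placeholders, not real UMLS or LOINC codes, and this is not the authors' implementation.

```python
from collections import Counter


def top_k_coverage(occurrences, k):
    """Return the share of all occurrences covered by the k most frequent concepts."""
    counts = Counter(occurrences)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no occurrences, so nothing can be covered
    covered = sum(n for _, n in counts.most_common(k))
    return covered / total


# Toy data: placeholder identifiers standing in for annotated concept occurrences.
toy_occurrences = ["A", "A", "A", "B", "B", "C", "D"]
print(top_k_coverage(toy_occurrences, 2))
```

Applied to real annotations, the same ratio would correspond to the kind of figure reported above (55 laboratory procedures covering 77.87% of laboratory concept occurrences), assuming one list entry per annotated occurrence.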
Affiliation(s)
- Ahmed Rafee
  - Institute of Medical Informatics, University of Münster, Münster, Germany
  - Department of Internal Medicine (D), University Hospital of Münster, Münster, Germany
- Sarah Riepenhausen
  - Institute of Medical Informatics, University of Münster, Münster, Germany
- Philipp Neuhaus
  - Institute of Medical Informatics, University of Münster, Münster, Germany
- Alexandra Meidt
  - Institute of Medical Informatics, University of Münster, Münster, Germany
- Martin Dugas
  - Institute of Medical Informatics, Heidelberg University Hospital, Heidelberg, Germany
- Julian Varghese
  - Institute of Medical Informatics, University of Münster, Münster, Germany
4
Berenspöhler S, Minnerup J, Dugas M, Varghese J. Common Data Elements for Meaningful Stroke Documentation in Routine Care and Clinical Research: Retrospective Data Analysis. JMIR Med Inform 2021; 9:e27396. [PMID: 34636733 PMCID: PMC8548969 DOI: 10.2196/27396] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2021] [Revised: 07/12/2021] [Accepted: 07/19/2021] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND Medical information management for stroke patients is currently a very time-consuming endeavor. There are clear guidelines and procedures to treat patients having acute stroke, but it is not known how well these established practices are reflected in patient documentation. OBJECTIVE This study compares a variety of documentation processes regarding stroke. The main objective of this work is to provide an overview of the most commonly occurring medical concepts in stroke documentation and identify overlaps between different documentation contexts to allow for the definition of a core data set that could be used in potential data interfaces. METHODS Medical source documentation forms from different documentation contexts, including hospitals, clinical trials, registries, and international standards, regarding stroke treatment followed by rehabilitation were digitized in the operational data model. Each source data element was semantically annotated using the Unified Medical Language System. The concept codes were analyzed for semantic overlaps. A concept was considered common if it appeared in at least two documentation contexts. The resulting common concepts were extended with implementation details, including data types and permissible values based on frequent patterns of source data elements, using an established expert-based and semiautomatic approach. RESULTS In total, 3287 data elements were identified, and 1051 of these emerged as unique medical concepts. The 100 most frequent medical concepts cover 9.51% (100/1051) of all concept occurrences in stroke documentation, and the 50 most frequent concepts cover 4.75% (50/1051). A list of common data elements was implemented in different standardized machine-readable formats on a public metadata repository for interoperable reuse. CONCLUSIONS Standardization of medical documentation is a prerequisite for data exchange as well as the transferability and reuse of data. 
In the long run, standardization would save time and money and extend the capabilities for which such data could be used. In the context of this work, a lack of standardization was observed regarding current information management. Free-form text fields and intricate questions complicate automated data access and transfer between institutions. This work also revealed the potential of a unified documentation process as a core data set of the 50 most frequent common data elements, accounting for 34% of the documentation in medical information management. Such a data set offers a starting point for standardized and interoperable data collection in routine care, quality management, and clinical research.
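The rule stated in the methods of this abstract (a concept is common if it appears in at least two documentation contexts) can be sketched as follows. The function name `common_concepts`, the context labels, and the concept identifiers are hypothetical placeholders for illustration only, not the study's data or code.

```python
from collections import defaultdict


def common_concepts(elements, min_contexts=2):
    """Return concept codes seen in at least `min_contexts` distinct contexts.

    `elements` is an iterable of (context, concept_code) pairs, one per
    annotated source data element.
    """
    contexts_by_concept = defaultdict(set)
    for context, code in elements:
        contexts_by_concept[code].add(context)
    return {
        code
        for code, contexts in contexts_by_concept.items()
        if len(contexts) >= min_contexts
    }


# Toy data: placeholder concept identifiers, not real UMLS codes.
toy_elements = [
    ("hospital", "concept-1"),
    ("registry", "concept-1"),
    ("hospital", "concept-2"),
    ("hospital", "concept-2"),  # repeated within one context, so still not common
    ("clinical trial", "concept-3"),
    ("registry", "concept-3"),
    ("international standard", "concept-3"),
]
print(sorted(common_concepts(toy_elements)))
```

Using a set per concept means repeated occurrences within the same context do not count twice, which matches the "at least two documentation contexts" wording.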
Affiliation(s)
- Sarah Berenspöhler
  - Institute of Medical Informatics, Westfälische Wilhelms-University Münster, Münster, Germany
- Jens Minnerup
  - Department of Neurology with Institute of Translational Neurology, University Hospital Münster, Münster, Germany
- Martin Dugas
  - Institute of Medical Informatics, Heidelberg University Hospital, Heidelberg, Germany
- Julian Varghese
  - Institute of Medical Informatics, Westfälische Wilhelms-University Münster, Münster, Germany
5
Hegselmann S, Storck M, Gessner S, Neuhaus P, Varghese J, Bruland P, Meidt A, Mertens C, Riepenhausen S, Baier S, Stöcker B, Henke J, Schmidt CO, Dugas M. Pragmatic MDR: a metadata repository with bottom-up standardization of medical metadata through reuse. BMC Med Inform Decis Mak 2021; 21:160. [PMID: 34001121 PMCID: PMC8130274 DOI: 10.1186/s12911-021-01524-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/03/2020] [Accepted: 05/09/2021] [Indexed: 11/27/2022] Open
Abstract
Background The variety of medical documentation often leads to incompatible data elements that impede data integration between institutions. A common approach to standardizing and distributing metadata definitions is the use of ISO/IEC 11179 norm-compliant metadata repositories with top-down standardization. To the best of our knowledge, however, it is not yet common practice to reuse the content of publicly accessible metadata repositories for creation of case report forms or routine documentation. We suggest an alternative concept called pragmatic metadata repository, which enables a community-driven bottom-up approach for agreeing on data collection models. A pragmatic metadata repository collects real-world documentation and considers frequent metadata definitions as high quality with potential for reuse. Methods We implemented a pragmatic metadata repository proof-of-concept application and filled it with medical forms from the Portal of Medical Data Models. We applied this prototype in two use cases to demonstrate its capabilities for reusing metadata: first, integration into a study editor for the suggestion of data elements and, second, metadata synchronization between two institutions. Moreover, we evaluated the emergence of bottom-up standards in the prototype, and two medical data managers assessed their quality for 24 medical concepts. Results The resulting prototype contained 466,569 unique metadata definitions. Integration into the study editor led to a reuse of 1836 items and item groups. During the metadata synchronization, semantic codes of 4608 data elements were transferred. Our evaluation revealed that for less complex medical concepts weak bottom-up standards could be established. However, more diverse disease-related concepts showed no convergence of data elements due to an enormous heterogeneity of metadata. The survey showed fair agreement (Kalpha = 0.50, 95% CI 0.43–0.56) for good item quality of bottom-up standards.
Conclusions We demonstrated the feasibility of the pragmatic metadata repository concept for medical documentation. Applications of the prototype in two use cases suggest that it facilitates the reuse of data elements. Our evaluation showed that bottom-up standardization based on a large collection of real-world metadata can yield useful results. The proposed concept is not intended to replace existing top-down approaches; rather, it complements them by showing what is commonly used in the community to guide other researchers. Supplementary Information The online version contains supplementary material available at 10.1186/s12911-021-01524-8.
Affiliation(s)
- Stefan Hegselmann
  - Institute of Medical Informatics, University of Münster, Münster, Germany
- Michael Storck
  - Institute of Medical Informatics, University of Münster, Münster, Germany
- Sophia Gessner
  - Institute of Medical Informatics, University of Münster, Münster, Germany
- Philipp Neuhaus
  - Institute of Medical Informatics, University of Münster, Münster, Germany
- Julian Varghese
  - Institute of Medical Informatics, University of Münster, Münster, Germany
- Philipp Bruland
  - University of Applied Sciences Ostwestfalen-Lippe, Lemgo, Germany
- Alexandra Meidt
  - Institute of Medical Informatics, University of Münster, Münster, Germany
- Cornelia Mertens
  - Institute of Medical Informatics, University of Münster, Münster, Germany
- Sarah Riepenhausen
  - Institute of Medical Informatics, University of Münster, Münster, Germany
- Sonja Baier
  - Centre for Clinical Trials, University of Münster, Münster, Germany
- Benedikt Stöcker
  - Centre for Clinical Trials, University of Münster, Münster, Germany
- Jörg Henke
  - Institute of Community Medicine, University Medicine of Greifswald, Greifswald, Germany
- Martin Dugas
  - Institute of Medical Informatics, University of Münster, Münster, Germany
6
Elghafari A, Finkelstein J. Automated Identification of Common Disease-Specific Outcomes for Comparative Effectiveness Research Using ClinicalTrials.gov: Algorithm Development and Validation Study. JMIR Med Inform 2021; 9:e18298. [PMID: 33460388 PMCID: PMC7899806 DOI: 10.2196/18298] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/17/2020] [Revised: 08/30/2020] [Accepted: 01/17/2021] [Indexed: 01/02/2023] Open
Abstract
Background Common disease-specific outcomes are vital for ensuring comparability of clinical trial data and enabling meta-analyses and interstudy comparisons. Traditionally, the process of deciding which outcomes should be recommended as common for a particular disease relied on assembling and surveying panels of subject-matter experts. This is usually a time-consuming and laborious process. Objective The objectives of this work were to develop and evaluate a generalized pipeline that can automatically identify common outcomes specific to any given disease by finding, downloading, and analyzing data of previous clinical trials relevant to that disease. Methods An automated pipeline to interface with ClinicalTrials.gov’s application programming interface and download the relevant trials for the input condition was designed. The primary and secondary outcomes of those trials were parsed and grouped based on text similarity and ranked based on frequency. The quality and usefulness of the pipeline’s output were assessed by comparing the top outcomes identified by it for chronic obstructive pulmonary disease (COPD) to a list of 80 outcomes manually abstracted from the most frequently cited and comprehensive reviews delineating clinical outcomes for COPD. Results The common disease-specific outcome pipeline successfully downloaded and processed 3876 studies related to COPD. Manual verification indicated that the pipeline was downloading and processing the same number of trials as were obtained from the self-service ClinicalTrials.gov portal. Evaluating the automatically identified outcomes against the manually abstracted ones showed that the pipeline achieved a recall of 92% and precision of 79%. The precision number indicated that the pipeline was identifying many outcomes that were not covered in the literature reviews. Assessment of those outcomes indicated that they are relevant to COPD and could be considered in future research.
Conclusions An automated evidence-based pipeline can identify common clinical trial outcomes of comparable breadth and quality as the outcomes identified in comprehensive literature reviews. Moreover, such an approach can highlight relevant outcomes for further consideration.
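The "grouped based on text similarity and ranked based on frequency" step in this abstract can be sketched roughly as below. The abstract does not specify the similarity measure, so this sketch swaps in Python's standard-library `difflib.SequenceMatcher` with a greedy grouping; the function name `group_and_rank`, the 0.85 threshold, and the toy outcome strings are all assumptions for illustration, not the authors' pipeline.

```python
from difflib import SequenceMatcher


def group_and_rank(outcomes, threshold=0.85):
    """Greedily group similar outcome strings, then rank groups by size.

    Each outcome is lowercased and whitespace-normalized, then added to the
    first existing group whose representative has a similarity ratio of at
    least `threshold`; otherwise it starts a new group.
    """
    groups = []  # each entry is [representative_text, count]
    for raw in outcomes:
        text = " ".join(raw.lower().split())
        for group in groups:
            if SequenceMatcher(None, group[0], text).ratio() >= threshold:
                group[1] += 1
                break
        else:
            groups.append([text, 1])
    # Most frequent groups first; ties keep first-seen order (stable sort).
    return sorted(((rep, n) for rep, n in groups), key=lambda g: g[1], reverse=True)


# Toy data: made-up outcome titles, not taken from ClinicalTrials.gov.
toy_outcomes = ["FEV1", "fev1 ", "Exacerbation rate", "exacerbation rates", "Mortality"]
for representative, count in group_and_rank(toy_outcomes):
    print(count, representative)
```

A greedy pass like this is order-dependent and compares each string only against group representatives; a production pipeline would likely need a more robust clustering and a tuned threshold.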
Affiliation(s)
- Anas Elghafari
  - Center for Biomedical and Population Health Informatics, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Joseph Finkelstein
  - Center for Biomedical and Population Health Informatics, Icahn School of Medicine at Mount Sinai, New York, NY, United States