1
Rojas JC, Lyons PG, Chhikara K, Chaudhari V, Bhavani SV, Nour M, Buell KG, Smith KD, Gao CA, Amagai S, Mao C, Luo Y, Barker AK, Nuppnau M, Beck H, Baccile R, Hermsen M, Liao Z, Park-Egan B, Carey KA, Han X, Hochberg CH, Ingraham NE, Parker WF. A Common Longitudinal Intensive Care Unit data Format (CLIF) to enable multi-institutional federated critical illness research. medRxiv 2024:2024.09.04.24313058. [PMID: 39281737] [PMCID: PMC11398431] [DOI: 10.1101/2024.09.04.24313058] [Indexed: 09/18/2024]
Abstract
Background Critical illness, or acute organ failure requiring life support, threatens over five million American lives annually. Electronic health record (EHR) data are a source of granular information that could generate crucial insights into the nature and optimal treatment of critical illness. However, data management, security, and standardization are barriers to large-scale critical illness EHR studies. Methods A consortium of critical care physicians and data scientists from eight US healthcare systems developed the Common Longitudinal Intensive Care Unit (ICU) data Format (CLIF), an open-source database format that harmonizes a minimum set of ICU Data Elements for use in critical illness research. We created a pipeline to process adult ICU EHR data at each site. After development and iteration, we conducted two proof-of-concept studies with a federated research architecture: 1) an external validation of an in-hospital mortality prediction model for critically ill patients and 2) an assessment of 72-hour temperature trajectories and their association with mechanical ventilation and in-hospital mortality using group-based trajectory models. Results We converted longitudinal data from 94,356 critically ill patients treated in 2020-2021 (mean age 60.6 years [standard deviation 17.2], 30% Black, 7% Hispanic, 45% female) across 8 health systems and 33 hospitals into the CLIF format. The in-hospital mortality prediction model performed well in the health system where it was derived (0.81 AUC, 0.06 Brier score). Performance across CLIF consortium sites varied (AUCs: 0.74-0.83, Brier scores: 0.01-0.06), demonstrating some degradation in predictive capability. Temperature trajectories were similar across health systems. Hypothermic and hyperthermic-slow-resolver patients consistently had the highest mortality. Conclusions CLIF facilitates efficient, rigorous, and reproducible critical care research. 
Our federated case studies showcase CLIF's potential for disease sub-phenotyping and clinical decision-support evaluation. Future applications include pragmatic EHR-based trials, target trial emulations, foundational multi-modal AI models of critical illness, and real-time critical care quality dashboards.
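The external validation above reports discrimination (AUC) and calibration (Brier score). As a hedged illustration of how these two metrics relate predicted mortality probabilities to observed outcomes (toy data only, not the consortium's actual pipeline):

```python
def auc(y_true, y_prob):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) formulation."""
    pos = [p for p, y in zip(y_prob, y_true) if y == 1]
    neg = [p for p, y in zip(y_prob, y_true) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def brier(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for p, y in zip(y_prob, y_true)) / len(y_true)

# Toy external-validation set: label 1 = in-hospital death (illustrative values)
y = [0, 0, 1, 0, 1, 1, 0, 0]
p = [0.10, 0.20, 0.80, 0.30, 0.70, 0.40, 0.15, 0.60]

print(round(auc(y, p), 3), round(brier(y, p), 3))  # → 0.933 0.127
```

Higher AUC means better ranking of deaths above survivors; a lower Brier score means the predicted probabilities themselves are closer to the outcomes.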
Affiliation(s)
- Juan C Rojas
- Division of Pulmonology, Critical Care, and Sleep Medicine, Rush University, Chicago, IL
- Patrick G Lyons
- Department of Medicine, Oregon Health & Science University, Portland, OR
- Kaveri Chhikara
- Section of Pulmonary and Critical Care, Department of Medicine, University of Chicago, Chicago, IL
- Vaishvik Chaudhari
- Division of Pulmonology, Critical Care, and Sleep Medicine, Rush University, Chicago, IL
- Muna Nour
- Department of Medicine, Emory University, Atlanta, GA
- Kevin G Buell
- Section of Pulmonary and Critical Care, Department of Medicine, University of Chicago, Chicago, IL
- Kevin D Smith
- Section of Pulmonary and Critical Care, Department of Medicine, University of Chicago, Chicago, IL
- Catherine A Gao
- Division of Pulmonary and Critical Care, Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL
- Saki Amagai
- Division of Health and Biomedical Informatics, Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL
- Chengsheng Mao
- Division of Health and Biomedical Informatics, Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL
- Yuan Luo
- Division of Health and Biomedical Informatics, Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL
- Anna K Barker
- Division of Pulmonary and Critical Care, Department of Internal Medicine, University of Michigan, Ann Arbor, MI
- Mark Nuppnau
- Division of Pulmonary and Critical Care, Department of Internal Medicine, University of Michigan, Ann Arbor, MI
- Haley Beck
- MacLean Center for Clinical Medical Ethics, University of Chicago Medicine, Chicago, IL
- Rachel Baccile
- Section of Pulmonary and Critical Care, Department of Medicine, University of Chicago, Chicago, IL
- Michael Hermsen
- Department of Medicine, University of Wisconsin School of Medicine and Public Health, Madison, WI
- Zewei Liao
- Department of Medicine, University of Chicago, Chicago, IL
- Brenna Park-Egan
- Department of Medicine, Oregon Health & Science University, Portland, OR
- Kyle A Carey
- Department of Medicine, University of Chicago, Chicago, IL
- Xuan Han
- Division of Pulmonary, Critical Care, and Sleep Medicine, Department of Medicine, Tufts University School of Medicine, Boston, MA
- Chad H Hochberg
- Division of Pulmonary and Critical Care Medicine, Department of Medicine, Johns Hopkins University, Baltimore, MD
- Nicholas E Ingraham
- Division of Pulmonary, Allergy, and Critical Care Medicine, Department of Medicine, University of Minnesota Medical School, Minneapolis, MN
- William F Parker
- Section of Pulmonary and Critical Care, Department of Medicine, University of Chicago, Chicago, IL
- MacLean Center for Clinical Medical Ethics, University of Chicago Medicine, Chicago, IL
- Department of Public Health Sciences, University of Chicago, Chicago, IL
2
De Brouwer M, Bonte P, Arndt D, Vander Sande M, Dimou A, Verborgh R, De Turck F, Ongenae F. Optimized continuous homecare provisioning through distributed data-driven semantic services and cross-organizational workflows. J Biomed Semantics 2024; 15:9. [PMID: 38845042] [PMCID: PMC11154993] [DOI: 10.1186/s13326-024-00303-4] [Received: 08/01/2023] [Accepted: 03/19/2024] [Indexed: 06/09/2024]
Abstract
BACKGROUND In healthcare, collaboration between different caregivers is increasing, especially with the shift to homecare. To provide optimal patient care, efficient coordination of data and workflows between these stakeholders is required. To achieve this, data should be exposed in a machine-interpretable, reusable manner. In addition, there is a need for smart, dynamic, personalized and performant services provided on top of this data. Flexible workflows should be defined that realize their desired functionality, adhere to use case specific quality constraints and improve coordination across stakeholders. User interfaces should allow configuring all of this in an easy, user-friendly way. METHODS A distributed, generic, cascading reasoning reference architecture can solve the presented challenges. It can be instantiated with existing tools built upon Semantic Web technologies that provide data-driven semantic services and construct cross-organizational workflows. These tools include RMLStreamer to generate Linked Data, DIVIDE to adaptively manage contextually relevant local queries, Streaming MASSIF to deploy reusable services, AMADEUS to compose semantic workflows, and RMLEditor and Matey to configure rules to generate Linked Data. RESULTS A use case demonstrator is built on a scenario that focuses on personalized smart monitoring and cross-organizational treatment planning. The performance and usability of the demonstrator's implementation are evaluated. The former shows that the monitoring pipeline efficiently processes a stream of 14 observations per second: RMLStreamer maps JSON observations to RDF in 13.5 ms, a C-SPARQL query to generate fever alarms is executed on a window of 5 s in 26.4 ms, and Streaming MASSIF generates a smart notification for fever alarms based on severity and urgency in 1539.5 ms. 
DIVIDE derives the C-SPARQL queries in 7249.5 ms, while AMADEUS constructs a colon cancer treatment plan and performs conflict detection with it in 190.8 ms and 1335.7 ms, respectively. CONCLUSIONS Existing tools built upon Semantic Web technologies can be leveraged to optimize continuous care provisioning. The evaluation of the building blocks on a realistic homecare monitoring use case demonstrates their applicability, usability and good performance. Further extending the available user interfaces for some tools is required to increase their adoption.
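The fever-alarm query described above runs continuously over a 5-second window of a C-SPARQL observation stream. The windowed-stream pattern it relies on can be sketched in plain Python; the threshold and tuple layout here are invented for illustration, and the real pipeline operates on RDF with C-SPARQL rather than Python lists:

```python
from collections import deque

FEVER_C = 38.0  # hypothetical febrile threshold in degrees Celsius

def fever_alarms(observations, window_s=5.0):
    """Scan a time-ordered stream of (timestamp_s, temperature_C) tuples and
    emit an alarm timestamp whenever every reading in the trailing window is febrile."""
    window = deque()
    alarms = []
    for ts, temp in observations:
        window.append((ts, temp))
        # Drop readings that have fallen out of the trailing window.
        while window and ts - window[0][0] > window_s:
            window.popleft()
        if all(t >= FEVER_C for _, t in window):
            alarms.append(ts)
    return alarms

stream = [(0, 37.1), (2, 38.2), (4, 38.4), (6, 38.6), (8, 38.9)]
print(fever_alarms(stream))  # → [6, 8]
```

Each incoming observation both extends and expires the window, which is why such continuous queries can keep per-event latency in the tens of milliseconds.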
Affiliation(s)
- Mathias De Brouwer
- Department of Information Technology, IDLab - Ghent University - imec, 9052, Ghent, Belgium
- Pieter Bonte
- Stream Intelligence Lab, KU Leuven Kulak, Kortrijk, 8500, Belgium
- Dörthe Arndt
- International Center for Computational Logic, Technische Universität Dresden, 01187, Dresden, Germany
- Anastasia Dimou
- Department of Computer Science, KU Leuven, 2860, Sint-Katelijne-Waver, Belgium
- Ruben Verborgh
- Department of Electronics and Information Systems, IDLab - Ghent University - imec, 9052, Ghent, Belgium
- Filip De Turck
- Department of Information Technology, IDLab - Ghent University - imec, 9052, Ghent, Belgium
- Femke Ongenae
- Department of Information Technology, IDLab - Ghent University - imec, 9052, Ghent, Belgium
3
Palojoki S, Lehtonen L, Vuokko R. Semantic Interoperability of Electronic Health Records: Systematic Review of Alternative Approaches for Enhancing Patient Information Availability. JMIR Med Inform 2024; 12:e53535. [PMID: 38686541] [PMCID: PMC11066539] [DOI: 10.2196/53535] [Received: 10/10/2023] [Revised: 02/21/2024] [Accepted: 02/24/2024] [Indexed: 05/02/2024]
Abstract
Background Semantic interoperability facilitates the exchange of and access to health data that are being documented in electronic health records (EHRs) with various semantic features. The main goals of semantic interoperability development entail patient data availability and use in diverse EHRs without a loss of meaning. Internationally, current initiatives aim to enhance semantic development of EHR data and, consequently, the availability of patient data. Interoperability between health information systems is among the core goals of the European Health Data Space regulation proposal and the World Health Organization's Global Strategy on Digital Health 2020-2025. Objective To achieve integrated health data ecosystems, stakeholders need to overcome challenges of implementing semantic interoperability elements. To research the available scientific evidence on semantic interoperability development, we defined the following research questions: What are the key elements of and approaches for building semantic interoperability integrated in EHRs? What kinds of goals are driving the development? and What kinds of clinical benefits are perceived following this development? Methods Our research questions focused on key aspects and approaches for semantic interoperability and on possible clinical and semantic benefits of these choices in the context of EHRs. Therefore, we performed a systematic literature review in PubMed by defining our study framework based on previous research. Results Our analysis consisted of 14 studies where data models, ontologies, terminologies, classifications, and standards were applied for building interoperability. All articles reported clinical benefits of the selected approach to enhancing semantic interoperability. We identified 3 main categories: increasing the availability of data for clinicians (n=6, 43%), increasing the quality of care (n=4, 29%), and enhancing clinical data use and reuse for varied purposes (n=4, 29%). 
Regarding semantic development goals, data harmonization and the development of semantic interoperability between different EHRs formed the largest category (n=8, 57%). Enhancing health data quality through standardization (n=5, 36%) and developing EHR-integrated tools based on interoperable data (n=1, 7%) were the other identified categories. The results were closely coupled with the need to build usable and computable data out of heterogeneous medical information that is accessible through various EHRs and databases (eg, registers). Conclusions When heading toward semantic harmonization of clinical data, more experiences and analyses are needed to assess how applicable the chosen solutions are for semantic interoperability of health care data. Instead of promoting a single approach, semantic interoperability should be assessed through several levels of semantic requirements. A dual-model or multimodel approach may be usable to address different semantic interoperability issues during development. The objectives of semantic interoperability must be achieved in diffuse and disconnected clinical care environments. Therefore, approaches for enhancing clinical data availability should be well prepared, thought out, and justified to meet economically sustainable and long-term outcomes.
Affiliation(s)
- Sari Palojoki
- Department of Steering of Healthcare and Social Welfare, Ministry of Social Affairs and Health, Helsinki, Finland
- Lasse Lehtonen
- Diagnostic Center, Helsinki University Hospital District, Helsinki, Finland
- Riikka Vuokko
- Department of Steering of Healthcare and Social Welfare, Ministry of Social Affairs and Health, Helsinki, Finland
4
Fogleman BM, Goldman M, Holland AB, Dyess G, Patel A. Charting Tomorrow's Healthcare: A Traditional Literature Review for an Artificial Intelligence-Driven Future. Cureus 2024; 16:e58032. [PMID: 38738104] [PMCID: PMC11088287] [DOI: 10.7759/cureus.58032] [Accepted: 04/11/2024] [Indexed: 05/14/2024]
Abstract
Electronic health record (EHR) systems have developed over time in parallel with general advancements in mainstream technology. As artificially intelligent (AI) systems rapidly impact multiple societal sectors, it has become apparent that medicine is not immune from the influences of this powerful technology. Particularly appealing is how AI may aid in improving healthcare efficiency with note-writing automation. This literature review explores the current state of EHR technologies in healthcare, specifically focusing on possibilities for addressing EHR challenges through the automation of dictation and note-writing processes with AI integration. This review offers a broad understanding of existing capabilities and potential advancements, emphasizing innovations such as voice-to-text dictation, wearable devices, and AI-assisted procedure note dictation. The primary objective is to provide researchers with valuable insights, enabling them to generate new technologies and advancements within the healthcare landscape. By exploring the benefits, challenges, and future of AI integration, this review encourages the development of innovative solutions, with the goal of enhancing patient care and healthcare delivery efficiency.
Affiliation(s)
- Brody M Fogleman
- Internal Medicine, Edward Via College of Osteopathic Medicine - Carolinas, Spartanburg, USA
- Matthew Goldman
- Neurological Surgery, Houston Methodist Hospital, Houston, USA
- Alexander B Holland
- General Surgery, Edward Via College of Osteopathic Medicine - Carolinas, Spartanburg, USA
- Garrett Dyess
- Medicine, University of South Alabama College of Medicine, Mobile, USA
- Aashay Patel
- Neurological Surgery, University of Florida College of Medicine, Gainesville, USA
5
Frid S, Pastor Duran X, Bracons Cucó G, Pedrera-Jiménez M, Serrano-Balazote P, Muñoz Carrero A, Lozano-Rubí R. An Ontology-Based Approach for Consolidating Patient Data Standardized With European Norm/International Organization for Standardization 13606 (EN/ISO 13606) Into Joint Observational Medical Outcomes Partnership (OMOP) Repositories: Description of a Methodology. JMIR Med Inform 2023; 11:e44547. [PMID: 36884279] [PMCID: PMC10034609] [DOI: 10.2196/44547] [Received: 11/23/2022] [Revised: 12/28/2022] [Accepted: 01/05/2023] [Indexed: 01/06/2023]
Abstract
BACKGROUND To discover new knowledge from data, they must be correct and in a consistent format. OntoCR, a clinical repository developed at Hospital Clínic de Barcelona, uses ontologies to represent clinical knowledge and map locally defined variables to health information standards and common data models. OBJECTIVE The aim of the study is to design and implement a scalable methodology based on the dual-model paradigm and the use of ontologies to consolidate clinical data from different organizations in a standardized repository for research purposes without loss of meaning. METHODS First, the relevant clinical variables are defined, and the corresponding European Norm/International Organization for Standardization (EN/ISO) 13606 archetypes are created. Data sources are then identified, and an extract, transform, and load process is carried out. Once the final data set is obtained, the data are transformed to create EN/ISO 13606-normalized electronic health record (EHR) extracts. Afterward, ontologies that represent archetyped concepts and map them to EN/ISO 13606 and Observational Medical Outcomes Partnership Common Data Model (OMOP CDM) standards are created and uploaded to OntoCR. Data stored in the extracts are inserted into its corresponding place in the ontology, thus obtaining instantiated patient data in the ontology-based repository. Finally, data can be extracted via SPARQL queries as OMOP CDM-compliant tables. RESULTS Using this methodology, EN/ISO 13606-standardized archetypes that allow for the reuse of clinical information were created, and the knowledge representation of our clinical repository by modeling and mapping ontologies was extended. 
Furthermore, EN/ISO 13606-compliant EHR extracts of patients (6803), episodes (13,938), diagnoses (190,878), administered medication (222,225), cumulative drug dose (222,225), prescribed medication (351,247), movements between units (47,817), clinical observations (6,736,745), laboratory observations (3,392,873), limitation of life-sustaining treatment (1,298), and procedures (19,861) were created. Since the application that inserts data from extracts into the ontologies is not yet finished, the queries were tested and the methodology was validated by importing data from a random subset of patients into the ontologies using a locally developed Protégé plugin ("OntoLoad"). In total, 10 OMOP CDM-compliant tables ("Condition_occurrence," 864 records; "Death," 110; "Device_exposure," 56; "Drug_exposure," 5609; "Measurement," 2091; "Observation," 195; "Observation_period," 897; "Person," 922; "Visit_detail," 772; and "Visit_occurrence," 971) were successfully created and populated. CONCLUSIONS This study proposes a methodology for standardizing clinical data, thus allowing its reuse without any changes in the meaning of the modeled concepts. Although this paper focuses on health research, our methodology suggests that the data be initially standardized per EN/ISO 13606 to obtain EHR extracts with a high level of granularity that can be used for any purpose. Ontologies constitute a valuable approach for knowledge representation and standardization of health information in a standard-agnostic manner. With the proposed methodology, institutions can go from local raw data to standardized, semantically interoperable EN/ISO 13606 and OMOP repositories.
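The final step of the methodology above extracts ontology-instantiated patient data as OMOP CDM-compliant tables. A minimal sketch of that last mapping step — turning one standardized (EN/ISO 13606-style) patient extract into "Person" and "Measurement" rows — where all field names, the concept ID, and the extract layout are illustrative assumptions rather than the paper's actual mappings:

```python
def to_omop(extract):
    """Map a simplified, already-standardized patient extract (a dict here,
    standing in for an EN/ISO 13606 EHR extract) to OMOP CDM-style rows."""
    person = {
        "person_id": extract["patient_id"],
        "year_of_birth": extract["birth_year"],
    }
    measurements = [
        {
            "person_id": extract["patient_id"],
            "measurement_concept_id": obs["concept_id"],
            "value_as_number": obs["value"],
            "measurement_date": obs["date"],
        }
        for obs in extract["observations"]
    ]
    return person, measurements

extract = {
    "patient_id": 922,
    "birth_year": 1957,
    "observations": [
        # Illustrative standard-vocabulary concept ID for a temperature reading
        {"concept_id": 3020891, "value": 38.4, "date": "2021-03-02"},
    ],
}
person, rows = to_omop(extract)
print(person["person_id"], len(rows))  # → 922 1
```

In the paper this mapping is expressed through ontologies and SPARQL queries rather than hand-written code; the sketch only shows the shape of the source-to-CDM transformation.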
Affiliation(s)
- Santiago Frid
- Medical Informatics Unit, Hospital Clínic de Barcelona, Barcelona, Spain
- Clinical Foundations Department, Universitat de Barcelona, Barcelona, Spain
- Xavier Pastor Duran
- Medical Informatics Unit, Hospital Clínic de Barcelona, Barcelona, Spain
- Clinical Foundations Department, Universitat de Barcelona, Barcelona, Spain
- Adolfo Muñoz Carrero
- Unit of Investigation in Telemedicine and Digital Health, Instituto de Salud Carlos III, Madrid, Spain
- Raimundo Lozano-Rubí
- Medical Informatics Unit, Hospital Clínic de Barcelona, Barcelona, Spain
- Clinical Foundations Department, Universitat de Barcelona, Barcelona, Spain
6
Pedrera-Jiménez M, García-Barrio N, Rubio-Mayo P, Tato-Gómez A, Cruz-Bermúdez JL, Bernal-Sobrino JL, Muñoz-Carrero A, Serrano-Balazote P. TransformEHRs: a flexible methodology for building transparent ETL processes for EHR reuse. Methods Inf Med 2022; 61:e89-e102. [PMID: 36220109] [PMCID: PMC9788916] [DOI: 10.1055/s-0042-1757763] [Indexed: 12/27/2022]
Abstract
BACKGROUND During the COVID-19 pandemic, several methodologies were designed for obtaining electronic health record (EHR)-derived datasets for research. These processes are often based on black boxes, in which clinical researchers are unaware of how the data were recorded, extracted, and transformed. To solve this, it is essential that extract, transform, and load (ETL) processes are based on transparent, homogeneous, and formal methodologies, making them understandable, reproducible, and auditable. OBJECTIVES This study aims to design and implement a methodology, in accordance with the FAIR principles, for building ETL processes (focused on data extraction, selection, and transformation) for EHR reuse in a transparent and flexible manner, applicable to any clinical condition and health care organization. METHODS The proposed methodology comprises four stages: (1) analysis of secondary use models and identification of data operations, based on internationally used clinical repositories, case report forms, and aggregated datasets; (2) modeling and formalization of data operations, through the paradigm of the Detailed Clinical Models; (3) agnostic development of data operations, selecting SQL and R as programming languages; and (4) automation of the ETL instantiation, building a formal configuration file with XML. RESULTS First, four international projects were analyzed to identify 17 operations necessary to obtain datasets, according to the specifications of these projects, from the EHR. With this, each of the data operations was formalized using the ISO 13606 reference model, specifying the valid data types as arguments, inputs and outputs, and their cardinality. Then, an agnostic catalog of data operations was developed through the previously selected data-oriented programming languages. Finally, an automated ETL instantiation process was built from a formally defined ETL configuration file. 
CONCLUSIONS This study provides a transparent and flexible solution to the difficulty of making the processes for obtaining EHR-derived data for secondary use understandable, auditable, and reproducible. Moreover, the abstraction carried out in this study means that any previous EHR reuse methodology can incorporate these results.
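The fourth stage above drives ETL instantiation from a formal XML configuration file. A minimal sketch of that idea — a configuration declaring a sequence of data operations that a generic runner then applies — where the element names, the two operations, and the tabular layout are invented for illustration (the paper formalizes 17 operations against the ISO 13606 reference model, implemented in SQL and R):

```python
import xml.etree.ElementTree as ET

# Hypothetical ETL configuration: each <operation> names a formalized data
# operation and its arguments, mirroring the paper's XML-driven instantiation.
CONFIG = """
<etl>
  <operation name="select" column="age"/>
  <operation name="filter" column="age" min="18"/>
</etl>
"""

# Catalog of data operations; the runner stays agnostic of what each one does.
OPS = {
    "select": lambda rows, col, **_: [{col: r[col]} for r in rows],
    "filter": lambda rows, col, min, **_: [r for r in rows if r[col] >= float(min)],
}

def run_etl(config_xml, rows):
    """Apply the configured operations, in order, to a list of record dicts."""
    for op in ET.fromstring(config_xml).findall("operation"):
        attrs = dict(op.attrib)
        name = attrs.pop("name")
        rows = OPS[name](rows, attrs.pop("column"), **attrs)
    return rows

rows = [{"age": 34}, {"age": 12}, {"age": 67}]
print(run_etl(CONFIG, rows))  # → [{'age': 34}, {'age': 67}]
```

Because the pipeline is fully described by the configuration file, the same file documents the ETL process, making it auditable and reproducible — the transparency property the study is after.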
Affiliation(s)
- Miguel Pedrera-Jiménez
- Data Science Unit, Instituto de Investigación Sanitaria Hospital Universitario 12 de Octubre, Madrid, Spain
- ETSI Telecomunicación, Universidad Politécnica de Madrid, Madrid, Spain
- Address for correspondence: Miguel Pedrera-Jiménez, Eng, MSc, Health Informatics Department, Hospital Universitario 12 de Octubre, Av. de Córdoba, s/n, 28041 Madrid, Spain
- Noelia García-Barrio
- Data Science Unit, Instituto de Investigación Sanitaria Hospital Universitario 12 de Octubre, Madrid, Spain
- Paula Rubio-Mayo
- Data Science Unit, Instituto de Investigación Sanitaria Hospital Universitario 12 de Octubre, Madrid, Spain
- Alberto Tato-Gómez
- Data Science Unit, Instituto de Investigación Sanitaria Hospital Universitario 12 de Octubre, Madrid, Spain
- Juan Luis Cruz-Bermúdez
- Data Science Unit, Instituto de Investigación Sanitaria Hospital Universitario 12 de Octubre, Madrid, Spain
- José Luis Bernal-Sobrino
- Data Science Unit, Instituto de Investigación Sanitaria Hospital Universitario 12 de Octubre, Madrid, Spain
- Pablo Serrano-Balazote
- Data Science Unit, Instituto de Investigación Sanitaria Hospital Universitario 12 de Octubre, Madrid, Spain
7
Zhang H, Lyu T, Yin P, Bost S, He X, Guo Y, Prosperi M, Hogan WR, Bian J. A scoping review of semantic integration of health data and information. Int J Med Inform 2022; 165:104834. [PMID: 35863206] [DOI: 10.1016/j.ijmedinf.2022.104834] [Received: 03/21/2022] [Revised: 07/06/2022] [Accepted: 07/13/2022] [Indexed: 11/25/2022]
Abstract
OBJECTIVE We summarized a decade of new research focusing on semantic data integration (SDI) since 2009, aiming to: (1) summarize the state-of-the-art approaches for integrating health data and information; and (2) identify the main gaps and challenges of integrating health data and information from multiple levels and domains. MATERIALS AND METHODS We used PubMed, as our focus is applications of SDI in biomedical domains, and followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines to search for and report relevant studies published between January 1, 2009 and December 31, 2021. We used Covidence, a systematic review management system, to carry out this scoping review. RESULTS The initial search from PubMed resulted in 5,326 articles using the two sets of keywords. We then removed 44 duplicates, and 5,282 articles were retained for abstract screening. After abstract screening, we included 246 articles for full-text screening, among which 87 articles were deemed eligible for full-text extraction. We summarized the 87 articles from four aspects: (1) methods for the global schema; (2) data integration strategies (i.e., federated system vs. data warehousing); (3) the sources of the data; and (4) downstream applications. CONCLUSION The SDI approach can effectively resolve the semantic heterogeneities across different data sources. We identified two key gaps and challenges in existing SDI studies: (1) many of the existing SDI studies used data from only single-level data sources (e.g., integrating individual-level patient records from different hospital systems), and (2) documentation of the data integration processes is sparse, threatening the reproducibility of SDI studies.
Affiliation(s)
- Hansi Zhang
- Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, United States
- Tianchen Lyu
- Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, United States
- Pengfei Yin
- Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, United States
- Sarah Bost
- Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, United States
- Xing He
- Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, United States
- Yi Guo
- Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, United States
- Mattia Prosperi
- Department of Epidemiology, College of Medicine, University of Florida, Gainesville, FL, United States
- William R Hogan
- Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, United States
- Jiang Bian
- Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, United States
8
Sun H, Depraetere K, Meesseman L, Cabanillas Silva P, Szymanowsky R, Fliegenschmidt J, Hulde N, von Dossow V, Vanbiervliet M, De Baerdemaeker J, Roccaro-Waldmeyer DM, Stieg J, Domínguez Hidalgo M, Dahlweid FM. Evaluating live performance of machine learning based prediction models for different clinical risks: a study of live systems in different hospitals. J Med Internet Res 2022; 24:e34295. [PMID: 35502887] [PMCID: PMC9214618] [DOI: 10.2196/34295] [Received: 10/19/2021] [Revised: 02/25/2022] [Accepted: 04/12/2022] [Indexed: 11/30/2022]
Abstract
Background Machine learning algorithms are currently used in a wide array of clinical domains to produce models that can predict clinical risk events. Most models are developed and evaluated with retrospective data, very few are evaluated in a clinical workflow, and even fewer report performance in different hospitals. In this study, we provide detailed evaluations of clinical risk prediction models in live clinical workflows for three different use cases in three different hospitals. Objective The main objective of this study was to evaluate clinical risk prediction models in live clinical workflows and compare their performance in these settings with their performance when using retrospective data. We also aimed to generalize the results by applying our investigation to three different use cases in three different hospitals. Methods We trained clinical risk prediction models for three use cases (ie, delirium, sepsis, and acute kidney injury) in three different hospitals with retrospective data. We used machine learning and, specifically, deep learning to train models that were based on the Transformer model. The models were trained using a calibration tool that is common for all hospitals and use cases. The models had a common design but were calibrated using each hospital's specific data. The models were deployed in these three hospitals and used in daily clinical practice. The predictions made by these models were logged and correlated with the diagnosis at discharge. We compared their performance with evaluations on retrospective data and conducted cross-hospital evaluations. Results The performance of the prediction models with data from live clinical workflows was similar to the performance with retrospective data. The average value of the area under the receiver operating characteristic curve (AUROC) decreased slightly by 0.6 percentage points (from 94.8% to 94.2% at discharge). 
The cross-hospital evaluations exhibited severely reduced performance: the average AUROC decreased by 8 percentage points (from 94.2% to 86.3% at discharge), which indicates the importance of model calibration with data from the deployment hospital. Conclusions Calibrating the prediction model with data from different deployment hospitals led to good performance in live settings. The performance degradation in the cross-hospital evaluation identified limitations in developing a generic model for different hospitals. Designing a generic process for model development to generate specialized prediction models for each hospital guarantees model performance in different hospitals.
Affiliation(s)
- Hong Sun
- Dedalus Healthcare, Antwerp, Belgium
- Janis Fliegenschmidt
- Institute of Anesthesiology and Pain Therapy, Heart and Diabetes Centre North Rhine-Westphalia, University Hospital of Ruhr-University Bochum, Bad Oeynhausen, Germany
- Nikolai Hulde
- Institute of Anesthesiology and Pain Therapy, Heart and Diabetes Centre North Rhine-Westphalia, University Hospital of Ruhr-University Bochum, Bad Oeynhausen, Germany
- Vera von Dossow
- Institute of Anesthesiology and Pain Therapy, Heart and Diabetes Centre North Rhine-Westphalia, University Hospital of Ruhr-University Bochum, Bad Oeynhausen, Germany
9
Sun H, Depraetere K, Meesseman L, De Roo J, Vanbiervliet M, De Baerdemaeker J, Muys H, von Dossow V, Hulde N, Szymanowsky R. A scalable approach for developing clinical risk prediction applications in different hospitals. J Biomed Inform 2021; 118:103783. [DOI: 10.1016/j.jbi.2021.103783] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/26/2020] [Revised: 04/07/2021] [Accepted: 04/08/2021] [Indexed: 12/19/2022]
10
Sun H, Arndt D, De Roo J, Mannens E. Predicting future state for adaptive clinical pathway management. J Biomed Inform 2021; 117:103750. [PMID: 33774204 DOI: 10.1016/j.jbi.2021.103750] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/19/2020] [Revised: 02/17/2021] [Accepted: 03/11/2021] [Indexed: 11/27/2022]
Abstract
Clinical decision support systems are assisting physicians in providing care to patients. However, in the context of clinical pathway management such systems are rather limited, as they only take the current state of the patient into account and ignore how that state may evolve in the future. In the past decade, the availability of big data in the healthcare domain opened a new era for clinical decision support. Machine learning technologies are now widely used in the clinical domain, though mostly as tools for disease prediction. A tool that not only predicts future states but also enables adaptive clinical pathway management based on these predictions is still needed. This paper introduces weighted state transition logic, a logic to model state changes based on actions planned in clinical pathways. Weighted state transition logic extends linear logic by taking weights - numerical values indicating the quality of an action or an entire clinical pathway - into account. It allows us to predict the future states of a patient and enables adaptive clinical pathway management based on these predictions. We provide an implementation of weighted state transition logic using Semantic Web technologies, which makes it easy to integrate semantic data and rules as background knowledge. Executed by a semantic reasoner, it is possible to generate a clinical pathway towards a target state, as well as to detect potential conflicts in the future when multiple pathways coexist. The transitions from the current state to the predicted future state are traceable, which builds human users' trust in the generated pathway.
Affiliation(s)
- Hong Sun
- Dedalus Healthcare, Roderveldlaan 2, 2600 Antwerp, Belgium.
- Dörthe Arndt
- IDLab, Department of Electronics and Information Systems, Ghent University - imec, AA-Tower, Technologiepark 122, B-9052 Ghent, Belgium; Computational Logic Group, TU Dresden, Germany
- Jos De Roo
- Dedalus Healthcare, Roderveldlaan 2, 2600 Antwerp, Belgium
- Erik Mannens
- IDLab, Department of Electronics and Information Systems, Ghent University - imec, AA-Tower, Technologiepark 122, B-9052 Ghent, Belgium
11
Pedrera-Jiménez M, García-Barrio N, Cruz-Rojo J, Terriza-Torres AI, López-Jiménez EA, Calvo-Boyero F, Jiménez-Cerezo MJ, Blanco-Martínez AJ, Roig-Domínguez G, Cruz-Bermúdez JL, Bernal-Sobrino JL, Serrano-Balazote P, Muñoz-Carrero A. Obtaining EHR-derived datasets for COVID-19 research within a short time: a flexible methodology based on Detailed Clinical Models. J Biomed Inform 2021; 115:103697. [PMID: 33548541 PMCID: PMC7857038 DOI: 10.1016/j.jbi.2021.103697] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/11/2020] [Revised: 12/18/2020] [Accepted: 02/01/2021] [Indexed: 10/27/2022]
Abstract
BACKGROUND COVID-19 ranks as the single largest health incident worldwide in decades. In such a scenario, electronic health records (EHRs) should provide a timely response to healthcare needs and to data uses that go beyond direct medical care and are known as secondary uses, which include biomedical research. However, it is usual for each data analysis initiative to define its own information model in line with its requirements. These specifications share clinical concepts but differ in format and recording criteria, which creates redundant data entry across multiple electronic data capture systems (EDCs), with a consequent investment of effort and time by the organization. OBJECTIVE This study sought to design and implement a flexible methodology based on detailed clinical models (DCM), which would enable EHRs generated in a tertiary hospital to be effectively reused without loss of meaning and within a short time. MATERIAL AND METHODS The proposed methodology comprises four stages: (1) specification of an initial set of relevant variables for COVID-19; (2) modeling and formalization of clinical concepts using the ISO 13606 standard and the SNOMED CT and LOINC terminologies; (3) definition of transformation rules to generate secondary use models from standardized EHRs and their development using the R language; and (4) implementation and validation of the methodology through the generation of the International Severe Acute Respiratory and emerging Infection Consortium (ISARIC-WHO) COVID-19 case report form. This process was implemented in a 1300-bed tertiary hospital for a cohort of 4489 patients hospitalized from 25 February 2020 to 10 September 2020. RESULTS An initial and expandable set of relevant concepts for COVID-19 was identified, modeled and formalized using the ISO 13606 standard and the SNOMED CT and LOINC terminologies. 
Similarly, an algorithm was designed and implemented with R and then applied to process EHRs in accordance with the standardized concepts, transforming them into secondary use models. Lastly, these resources were applied to obtain a data extract conforming to the ISARIC-WHO COVID-19 case report form, without requiring manual data collection. The methodology yielded the observation domain of this model with coverage of over 85% of patients for the majority of concepts. CONCLUSION This study has furnished a solution to the difficulty of rapidly and efficiently obtaining EHR-derived data for secondary use in COVID-19, capable of adapting to changes in data specifications and applicable to other organizations and other health conditions. The conclusion to be drawn from this initial validation is that this DCM-based methodology allows the effective reuse of EHRs generated in a tertiary hospital during the COVID-19 pandemic, with no additional effort or time for the organization and with a greater data scope than that yielded by conventional manual data collection processes in ad hoc EDCs.
Affiliation(s)
- Miguel Pedrera-Jiménez
- Hospital Universitario 12 de Octubre, Av. de Córdoba, s/n, 28041 Madrid, Spain; ETSI Telecomunicación, Universidad Politécnica de Madrid, 28040 Madrid, Spain.
- Jaime Cruz-Rojo
- Hospital Universitario 12 de Octubre, Av. de Córdoba, s/n, 28041 Madrid, Spain.
- Adolfo Muñoz-Carrero
- Digital Health Research Dept., Instituto de Salud Carlos III, Av. de Monforte de Lemos, 5, 28029 Madrid, Spain.
12
Kersloot MG, van Putten FJP, Abu-Hanna A, Cornet R, Arts DL. Natural language processing algorithms for mapping clinical text fragments onto ontology concepts: a systematic review and recommendations for future studies. J Biomed Semantics 2020; 11:14. [PMID: 33198814 PMCID: PMC7670625 DOI: 10.1186/s13326-020-00231-z] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/17/2020] [Accepted: 11/03/2020] [Indexed: 12/23/2022] Open
Abstract
BACKGROUND Free-text descriptions in electronic health records (EHRs) can be of interest for clinical research and care optimization. However, free text cannot be readily interpreted by a computer and, therefore, has limited value. Natural Language Processing (NLP) algorithms can make free text machine-interpretable by attaching ontology concepts to it. However, implementations of NLP algorithms are not evaluated consistently. Therefore, the objective of this study was to review the current methods used for developing and evaluating NLP algorithms that map clinical text fragments onto ontology concepts. To standardize the evaluation of algorithms and reduce heterogeneity between studies, we propose a list of recommendations. METHODS Two reviewers examined publications indexed by Scopus, IEEE, MEDLINE, EMBASE, the ACM Digital Library, and the ACL Anthology. Publications reporting on NLP for mapping clinical text from EHRs to ontology concepts were included. Year, country, setting, objective, evaluation and validation methods, NLP algorithms, terminology systems, dataset size and language, performance measures, reference standard, generalizability, operational use, and source code availability were extracted. The studies' objectives were categorized by way of induction. These results were used to define recommendations. RESULTS Two thousand three hundred fifty five unique studies were identified. Two hundred fifty six studies reported on the development of NLP algorithms for mapping free text to ontology concepts. Seventy-seven described development and evaluation. Twenty-two studies did not perform a validation on unseen data and 68 studies did not perform external validation. Of 23 studies that claimed that their algorithm was generalizable, 5 tested this by external validation. 
A list of sixteen recommendations regarding the usage of NLP systems and algorithms, usage of data, evaluation and validation, presentation of results, and generalizability of results was developed. CONCLUSION We found many heterogeneous approaches to the reporting on the development and evaluation of NLP algorithms that map clinical text to ontology concepts. Over one-fourth of the identified publications did not perform an evaluation. In addition, over one-fourth of the included studies did not perform a validation, and 88% did not perform external validation. We believe that our recommendations, alongside an existing reporting standard, will increase the reproducibility and reusability of future studies and NLP algorithms in medicine.
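Evaluation against a reference standard, as recommended above, typically reduces to micro-averaged precision, recall, and F1 over (text span, concept) pairs. The sketch below is a minimal illustration under that assumption; the spans and SNOMED CT-style codes are toy examples, not from any reviewed study.

```python
def prf1(predicted, reference):
    """Micro precision/recall/F1 for a set of (text_span, concept_id)
    annotations against a reference-standard annotation set."""
    tp = len(predicted & reference)  # pairs the algorithm got exactly right
    prec = tp / len(predicted) if predicted else 0.0
    rec = tp / len(reference) if reference else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1

# Toy annotations: illustrative SNOMED CT codes attached to text fragments.
pred = {("chest pain", "29857009"), ("fever", "386661006"), ("cough", "49727002")}
gold = {("chest pain", "29857009"), ("fever", "386661006"), ("dyspnea", "267036007")}
precision, recall, f1 = prf1(pred, gold)  # 2 of 3 predicted, 2 of 3 gold found
```

External validation then means recomputing these numbers on annotations from a dataset the algorithm never saw during development.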
Affiliation(s)
- Martijn G. Kersloot
- Amsterdam UMC, University of Amsterdam, Department of Medical Informatics, Amsterdam Public Health Research Institute Castor EDC, Room J1B-109, PO Box 22700, 1100 DE Amsterdam, The Netherlands
- Castor EDC, Amsterdam, The Netherlands
- Florentien J. P. van Putten
- Amsterdam UMC, University of Amsterdam, Department of Medical Informatics, Amsterdam Public Health Research Institute Castor EDC, Room J1B-109, PO Box 22700, 1100 DE Amsterdam, The Netherlands
- Ameen Abu-Hanna
- Amsterdam UMC, University of Amsterdam, Department of Medical Informatics, Amsterdam Public Health Research Institute Castor EDC, Room J1B-109, PO Box 22700, 1100 DE Amsterdam, The Netherlands
- Ronald Cornet
- Amsterdam UMC, University of Amsterdam, Department of Medical Informatics, Amsterdam Public Health Research Institute Castor EDC, Room J1B-109, PO Box 22700, 1100 DE Amsterdam, The Netherlands
- Derk L. Arts
- Amsterdam UMC, University of Amsterdam, Department of Medical Informatics, Amsterdam Public Health Research Institute Castor EDC, Room J1B-109, PO Box 22700, 1100 DE Amsterdam, The Netherlands
- Castor EDC, Amsterdam, The Netherlands
13
Huang L, Luo H, Li S, Wu FX, Wang J. Drug-drug similarity measure and its applications. Brief Bioinform 2020; 22:5956929. [PMID: 33152756 DOI: 10.1093/bib/bbaa265] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/01/2020] [Revised: 09/13/2020] [Accepted: 09/14/2020] [Indexed: 02/01/2023] Open
Abstract
Drug similarities play an important role in modern biology and medicine, as they help scientists gain deep insights into drugs' therapeutic mechanisms and design wet-lab experiments, which may significantly improve the efficiency of drug research and development. Nowadays, a number of drug-related databases have been constructed, with which many methods have been developed for computing similarities between drugs and for studying associations between drugs, human diseases, proteins (drug targets) and more. In this review, we first briefly introduce the publicly available drug-related databases. Second, based on different drug features, interaction relationships and multimodal data, we summarize similarity calculation methods in detail. We then discuss the applications of drug similarities in various biological and medical areas. Finally, we evaluate drug similarity calculation methods with common evaluation metrics to illustrate the important roles of drug similarity measures in different applications.
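Among the feature-based similarity calculations such reviews cover, one of the simplest is a set-based measure over drug feature profiles. The sketch below is illustrative only: the drug names are hypothetical and the UniProt-style target accessions are placeholders, not drawn from any specific database.

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| between two feature sets,
    e.g. the sets of protein targets annotated for two drugs."""
    return len(a & b) / len(a | b) if a | b else 0.0

# Toy target profiles (hypothetical drugs, placeholder accessions).
targets = {
    "drugA": {"P35348", "P35368", "P25100"},
    "drugB": {"P35348", "P35368", "P07550"},
    "drugC": {"P07550"},
}
sim_ab = jaccard(targets["drugA"], targets["drugB"])  # 2 shared of 4 total
sim_ac = jaccard(targets["drugA"], targets["drugC"])  # no shared targets
```

The same function applies unchanged to other binary feature profiles (side effects, indications, substructure fingerprints), which is why set-based measures recur across the methods surveyed.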
Affiliation(s)
- Lan Huang
- Hunan Provincial Key Lab of Bioinformatics, School of Computer Science and Engineering at Central South University, Hunan, China
- Huimin Luo
- School of Computer and Information Engineering at Henan University, Kaifeng, China
- Suning Li
- Hunan Provincial Key Lab of Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, Hunan, China
- Fang-Xiang Wu
- College of Engineering and Department of Computer Sciences, University of Saskatchewan, Saskatoon, Canada
- Jianxin Wang
- Hunan Provincial Key Lab of Bioinformatics, School of Computer Science and Engineering at Central South University, Hunan, China
14
Yu G, Zeng X, Ni S, Jia Z, Chen W, Lu X, An J, Duan H, Shu Q, Li H. A computational method to quantitatively measure pediatric drug safety using electronic medical records. BMC Med Res Methodol 2020; 20:9. [PMID: 31937265 PMCID: PMC6961323 DOI: 10.1186/s12874-020-0902-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/12/2019] [Accepted: 01/09/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Drug safety in children is a major concern; however, there is still a lack of methods for quantitatively measuring, let alone improving, drug safety in children under different clinical conditions. To assess pediatric drug safety under different clinical conditions, a computational method based on Electronic Medical Record (EMR) datasets was proposed. METHODS In this study, a computational method was designed to extract the significant drug-diagnosis associations (based on a Bonferroni-adjusted hypergeometric P-value < 0.05) from drug and diagnosis co-occurrence in EMR datasets. This allows differences between pediatric and adult drug use to be compared across EMR datasets. The drug-diagnosis associations were further used to generate drug clusters under specific clinical conditions using unsupervised clustering. A 5-layer quantitative pediatric drug safety level was proposed based on the drug safety statement in the pediatric labeling of each drug. The drug safety levels under different pediatric clinical conditions were then calculated. Two EMR datasets, from a 1900-bed children's hospital and a 2000-bed general hospital, were used to test this method. RESULTS The comparison between the children's hospital and the general hospital showed unique features of pediatric drug use and identified the drug treatment gap between children and adults. In total, 591 drugs were used in the children's hospital; 18 drug clusters associated with certain clinical conditions were generated based on our method; and the quantitative drug safety levels of each drug cluster (under different clinical conditions) were calculated, analyzed, and visualized. CONCLUSION With this method, quantitative drug safety levels under certain clinical conditions in pediatric patients can be evaluated and compared. If longitudinal data are available, improvements can also be measured. 
This method has the potential to be used in many population-level, health data-based drug safety studies.
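The association step described above amounts to a right-tailed hypergeometric test on drug-diagnosis co-occurrence counts, with a Bonferroni adjustment for the number of pairs tested. A minimal sketch with invented counts (not the study's data; the correction factor is likewise hypothetical):

```python
from math import comb

def hypergeom_sf(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(N, K, n): of N admissions,
    K carry the diagnosis and n received the drug; k carry both.
    A small value means the co-occurrence exceeds chance."""
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / comb(N, n)

# Invented EMR counts: 1000 admissions, 50 with the diagnosis,
# 40 exposed to the drug, 12 with both (about 2 expected by chance).
p = hypergeom_sf(1000, 50, 40, 12)
n_pairs_tested = 500                      # hypothetical number of pairs tested
significant = p * n_pairs_tested < 0.05   # Bonferroni-adjusted P < 0.05
```

In practice the counts come from a drug-by-diagnosis co-occurrence matrix over the EMR dataset, and the test is repeated for every cell.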
Affiliation(s)
- Gang Yu
- The Children's Hospital of Zhejiang University School of Medicine and National Clinical Research Center for Child Health, Hangzhou, China
- Xian Zeng
- College of Biomedical Engineering and Instrument Science, Zhejiang University, Hangzhou, China
- Shaoqing Ni
- The Children's Hospital of Zhejiang University School of Medicine and National Clinical Research Center for Child Health, Hangzhou, China
- Zheng Jia
- College of Biomedical Engineering and Instrument Science, Zhejiang University, Hangzhou, China
- Weihong Chen
- Department of Pharmacy, Shanxi Dayi Hospital, Taiyuan, China
- Xudong Lu
- College of Biomedical Engineering and Instrument Science, Zhejiang University, Hangzhou, China
- Jiye An
- College of Biomedical Engineering and Instrument Science, Zhejiang University, Hangzhou, China
- Huilong Duan
- College of Biomedical Engineering and Instrument Science, Zhejiang University, Hangzhou, China
- Qiang Shu
- The Children's Hospital of Zhejiang University School of Medicine and National Clinical Research Center for Child Health, Hangzhou, China
- Haomin Li
- The Children's Hospital of Zhejiang University School of Medicine and National Clinical Research Center for Child Health, Hangzhou, China.
15
Abstract
PIC (Paediatric Intensive Care) is a large paediatric-specific, single-centre, bilingual database comprising information relating to children admitted to critical care units at a large children's hospital in China. The database is deidentified and includes vital sign measurements, medications, laboratory measurements, fluid balance, diagnostic codes, length of hospital stays, survival data, and more. The data are publicly available after registration, which includes completion of a training course on research with human subjects and signing of a data use agreement mandating responsible handling of the data and adherence to the principle of collaborative research. Although the PIC can be considered an extension of the widely used MIMIC (Medical Information Mart for Intensive Care) database in the field of paediatric critical care, it has many unique characteristics and can support database-based academic and industrial applications such as machine learning algorithms, clinical decision support tools, quality improvement initiatives, and international data sharing.
16
Freedman HG, Williams H, Miller MA, Birtwell D, Mowery DL, Stoeckert CJ. A novel tool for standardizing clinical data in a semantically rich model. J Biomed Inform 2020; 112S:100086. [PMID: 34417005 DOI: 10.1016/j.yjbinx.2020.100086] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/01/2020] [Revised: 09/08/2020] [Accepted: 09/09/2020] [Indexed: 11/18/2022]
Abstract
Standardizing clinical information in a semantically rich data model is useful for promoting interoperability and facilitating high quality research. Semantic Web technologies such as Resource Description Framework can be utilized to their full potential when a model accurately reflects the semantics of the clinical situation it describes. To this end, ontologies that abide by sound organizational principles can be used as the building blocks of a semantically rich model for the storage of clinical data. However, it is a challenge to programmatically define such a model and load data from disparate sources. The PennTURBO Semantic Engine is a tool developed at the University of Pennsylvania that transforms concise RDF data into a source-independent, semantically rich model. This system sources classes from an application ontology and specifically defines how instances of those classes may relate to each other. Additionally, the system defines and executes RDF data transformations by launching dynamically generated SPARQL update statements. The Semantic Engine was designed as a generalizable data standardization tool, and is able to work with various data models and incoming data sources. Its human-readable configuration files can easily be shared between institutions, providing the basis for collaboration on a standard data model.
Affiliation(s)
- Hayden G Freedman
- Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, 3700 Hamilton Walk, Philadelphia, PA 19104, United States
- Heather Williams
- Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, 3700 Hamilton Walk, Philadelphia, PA 19104, United States
- Mark A Miller
- Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, 3700 Hamilton Walk, Philadelphia, PA 19104, United States
- David Birtwell
- Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, 3700 Hamilton Walk, Philadelphia, PA 19104, United States
- Danielle L Mowery
- Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, 3700 Hamilton Walk, Philadelphia, PA 19104, United States; Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, 423 Guardian Drive, Philadelphia, PA 19104, United States
- Christian J Stoeckert
- Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, 3700 Hamilton Walk, Philadelphia, PA 19104, United States; Department of Genetics, Perelman School of Medicine, University of Pennsylvania, 415 Curie Boulevard, Philadelphia, PA 19104, United States
17
Jain NM, Culley A, Knoop T, Micheel C, Osterman T, Levy M. Conceptual Framework to Support Clinical Trial Optimization and End-to-End Enrollment Workflow. JCO Clin Cancer Inform 2019; 3:1-10. [PMID: 31225983 PMCID: PMC6873934 DOI: 10.1200/cci.19.00033] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 05/02/2019] [Indexed: 12/19/2022] Open
Abstract
In this work, we present a conceptual framework to support clinical trial optimization and enrollment workflows and review the current state, limitations, and future trends in this space. This framework includes knowledge representation of clinical trials, clinical trial optimization, clinical trial design, enrollment workflows for prospective clinical trial matching, waitlist management, and, finally, evaluation strategies for assessing improvement.
Affiliation(s)
- Neha M. Jain
- Vanderbilt University Medical Center, Nashville, TN
- Teresa Knoop
- Vanderbilt University Medical Center, Nashville, TN
- Mia Levy
- Vanderbilt University Medical Center, Nashville, TN
- Rush University Medical Center, Chicago, IL
18
Measure clinical drug–drug similarity using Electronic Medical Records. Int J Med Inform 2019; 124:97-103. [DOI: 10.1016/j.ijmedinf.2019.02.003] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/23/2018] [Revised: 12/21/2018] [Accepted: 02/10/2019] [Indexed: 12/22/2022]
19
Idri A, Benhar H, Fernández-Alemán JL, Kadi I. A systematic map of medical data preprocessing in knowledge discovery. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2018; 162:69-85. [PMID: 29903496 DOI: 10.1016/j.cmpb.2018.05.007] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/08/2017] [Revised: 04/25/2018] [Accepted: 05/03/2018] [Indexed: 06/08/2023]
Abstract
BACKGROUND AND OBJECTIVE Data mining (DM) has, over the last decade, received increased attention in the medical domain and has been widely used to analyze medical datasets in order to extract useful knowledge and previously unknown patterns. However, historical medical data can often comprise inconsistent, noisy, imbalanced, missing and high-dimensional data. These challenges lead to serious bias in predictive modeling and reduce the performance of DM techniques. Data preprocessing is, therefore, an essential step in knowledge discovery as regards improving the quality of data and making it appropriate and suitable for DM techniques. The objective of this paper is to review the use of preprocessing techniques in clinical datasets. METHODS We performed a systematic map of studies regarding the application of data preprocessing to healthcare and published between January 2000 and December 2017. A search string was determined on the basis of the mapping questions and the PICO categories. The search string was then applied in digital databases covering the fields of computer science and medical informatics in order to identify relevant studies. The studies were initially selected by reading their titles, abstracts and keywords. Those that were selected at that stage were then reviewed using a set of inclusion and exclusion criteria in order to eliminate any that were not relevant. This process resulted in 126 primary studies. RESULTS Selected studies were analyzed and classified according to their publication years and channels, research type, empirical type and contribution type. The findings of this mapping study revealed that researchers have paid a considerable amount of attention to preprocessing in medical DM in the last decade. A significant number of the selected studies used data reduction and cleaning preprocessing tasks. Moreover, the disciplines in which preprocessing has received the most attention are: cardiology, endocrinology and oncology. 
CONCLUSIONS Researchers should develop and implement standards for an effective integration of multiple medical data types. Moreover, we identified the need to perform literature reviews.
Affiliation(s)
- A Idri
- Software Project Management Research Team, ENSIAS, University Mohammed V of Rabat, Morocco.
- H Benhar
- Software Project Management Research Team, ENSIAS, University Mohammed V of Rabat, Morocco.
- J L Fernández-Alemán
- Department of Informatics and Systems, Faculty of Computer Science, University of Murcia, Spain.
- I Kadi
- Software Project Management Research Team, ENSIAS, University Mohammed V of Rabat, Morocco.
20
Rajkomar A, Oren E, Chen K, Dai AM, Hajaj N, Hardt M, Liu PJ, Liu X, Marcus J, Sun M, Sundberg P, Yee H, Zhang K, Zhang Y, Flores G, Duggan GE, Irvine J, Le Q, Litsch K, Mossin A, Tansuwan J, Wang D, Wexler J, Wilson J, Ludwig D, Volchenboum SL, Chou K, Pearson M, Madabushi S, Shah NH, Butte AJ, Howell MD, Cui C, Corrado GS, Dean J. Scalable and accurate deep learning with electronic health records. NPJ Digit Med 2018; 1:18. [PMID: 31304302 PMCID: PMC6550175 DOI: 10.1038/s41746-018-0029-1] [Citation(s) in RCA: 932] [Impact Index Per Article: 155.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/26/2018] [Revised: 03/14/2018] [Accepted: 03/26/2018] [Indexed: 12/17/2022] Open
Abstract
Predictive modeling with electronic health record (EHR) data is anticipated to drive personalized medicine and improve healthcare quality. Constructing predictive statistical models typically requires extraction of curated predictor variables from normalized EHR data, a labor-intensive process that discards the vast majority of information in each patient's record. We propose a representation of patients' entire raw EHR records based on the Fast Healthcare Interoperability Resources (FHIR) format. We demonstrate that deep learning methods using this representation are capable of accurately predicting multiple medical events from multiple centers without site-specific data harmonization. We validated our approach using de-identified EHR data from two US academic medical centers with 216,221 adult patients hospitalized for at least 24 h. In the sequential format we propose, this volume of EHR data unrolled into a total of 46,864,534,945 data points, including clinical notes. Deep learning models achieved high accuracy for tasks such as predicting: in-hospital mortality (area under the receiver operator curve [AUROC] across sites 0.93-0.94), 30-day unplanned readmission (AUROC 0.75-0.76), prolonged length of stay (AUROC 0.85-0.86), and all of a patient's final discharge diagnoses (frequency-weighted AUROC 0.90). These models outperformed traditional, clinically-used predictive models in all cases. We believe that this approach can be used to create accurate and scalable predictions for a variety of clinical scenarios. In a case study of a particular prediction, we demonstrate that neural networks can be used to identify relevant information from the patient's chart.
Affiliation(s)
- Alvin Rajkomar
- Google Inc, Mountain View, CA USA
- University of California, San Francisco, San Francisco, CA USA
- Kai Chen
- Google Inc, Mountain View, CA USA
- Mimi Sun
- Google Inc, Mountain View, CA USA
- Yi Zhang
- Google Inc, Mountain View, CA USA
- Quoc Le
- Google Inc, Mountain View, CA USA
- De Wang
- Google Inc, Mountain View, CA USA
- Dana Ludwig
- University of California, San Francisco, San Francisco, CA USA
- Atul J. Butte
- University of California, San Francisco, San Francisco, CA USA
21
Maier C, Lang L, Storf H, Vormstein P, Bieber R, Bernarding J, Herrmann T, Haverkamp C, Horki P, Laufer J, Berger F, Höning G, Fritsch HW, Schüttler J, Ganslandt T, Prokosch HU, Sedlmayr M. Towards Implementation of OMOP in a German University Hospital Consortium. Appl Clin Inform 2018; 9:54-61. [PMID: 29365340 PMCID: PMC5801887 DOI: 10.1055/s-0037-1617452] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022] Open
Abstract
Background
In 2015, the German Federal Ministry of Education and Research initiated a large data integration and data sharing research initiative to improve the reuse of data from patient care and translational research. The Observational Medical Outcomes Partnership (OMOP) common data model and the Observational Health Data Sciences and Informatics (OHDSI) tools could be used as a core element in this initiative for harmonizing the terminologies used as well as facilitating the federation of research analyses across institutions.
Objective
To realize an OMOP/OHDSI-based pilot implementation within a consortium of eight German university hospitals, evaluate the applicability to support data harmonization and sharing among them, and identify potential enhancement requirements.
Methods
The vocabularies and terminological mapping required for importing the fact data were prepared, and the process for importing the data from the source files was designed. For eight German university hospitals, a virtual machine preconfigured with the OMOP database and the OHDSI tools as well as the jobs to import the data and conduct the analysis was provided. Last, a federated/distributed query to test the approach was executed.
Results
While mapping of the ICD-10 German Modification succeeded for 98.8% of all diagnosis terms, the procedure codes could not be mapped, so an extension to the OMOP standard terminologies had to be made.
Overall, the data of 3 million inpatients with approximately 26 million conditions, 21 million procedures, and 23 million observations have been imported. A federated query to identify a cohort of colorectal cancer patients was successfully executed and yielded 16,701 patient cases visualized in a Sunburst plot.
Conclusion
OMOP/OHDSI is a viable open source solution for data integration in a German research consortium. Once the terminology problems can be solved, researchers can build on an active community for further development.
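The mapping result reported above (98.8% of ICD-10-GM diagnosis terms mapped to standard concepts, with procedures needing a local vocabulary extension) amounts to a lookup-with-fallback step. A minimal sketch, with made-up codes and illustrative concept IDs; the only real convention used is that OMOP reserves concept IDs of 2 billion and above for site-specific extensions:

```python
# Hypothetical ICD-10-GM -> OMOP standard concept lookup (illustrative IDs).
standard_map = {
    "E11.9": 201826,   # type 2 diabetes mellitus
    "I10":   320128,   # essential hypertension
}

def map_codes(source_codes, next_local_id=2_000_000_000):
    """Map source codes to standard concepts; unmapped codes (e.g. OPS
    procedure codes) get locally assigned IDs in OMOP's >= 2B range
    reserved for site-specific extension concepts."""
    mapped, local = {}, {}
    for code in source_codes:
        if code in standard_map:
            mapped[code] = standard_map[code]
        else:
            local[code] = next_local_id
            next_local_id += 1
    rate = len(mapped) / len(source_codes)
    return mapped, local, rate

mapped, local, rate = map_codes(["E11.9", "I10", "5-470"])  # "5-470": an OPS procedure code
```

In the study's terms, the diagnosis dictionary covered nearly all terms, while every procedure code fell through to the local-extension branch.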
22
Abdulnabi M, Al-Haiqi A, Kiah MLM, Zaidan AA, Zaidan BB, Hussain M. A distributed framework for health information exchange using smartphone technologies. J Biomed Inform 2017; 69:230-250. [PMID: 28433825 DOI: 10.1016/j.jbi.2017.04.013] [Citation(s) in RCA: 60] [Impact Index Per Article: 8.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/06/2016] [Revised: 03/30/2017] [Accepted: 04/18/2017] [Indexed: 11/15/2022]
Abstract
Nationwide health information exchange (NHIE) continues to be a persistent concern for government agencies, despite the many efforts and the conceived benefits of sharing patient data among healthcare providers. Difficulties in ensuring global connectivity and interoperability, along with security concerns, have always hampered governments from successfully deploying NHIE. By looking at NHIE from a fresh perspective and bearing in mind the pervasiveness and power of modern mobile platforms, this paper proposes a new approach to NHIE that builds on the notion of consumer-mediated HIE, albeit without the focus on central health record banks. With the growing acceptance of smartphones as reliable, indispensable, and highly personal devices, we suggest taking the concept of mobile personal health records (mPHRs installed on smartphones) to the next level. We envision mPHRs that take the form of distributed storage units for health information, under the full control and direct possession of patients, who can have ready access to their personal data whenever needed. For the actual exchange of data, however, the health information systems managed by healthcare providers have to be interoperable with patient-carried mPHRs. The computer industry long ago solved a similar problem of interoperability between peripheral devices and operating systems. We borrow from that solution the idea of providing special interfaces between mPHRs and provider systems; such an interface enables the two entities to communicate with no change to either end. The design and operation of the proposed approach are explained, additional pointers on potential implementations are provided, and issues that pertain to any solution for implementing NHIE are discussed.
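The driver-style interface idea sketched in this abstract (provider systems and patient-carried mPHRs communicating through a translation layer, with no change to either end) is essentially the adapter pattern. A minimal sketch with entirely hypothetical record formats and field names:

```python
class ProviderSystem:
    """Stand-in for a hospital system that emits records in its own
    native format; its internals never change."""
    def export_record(self):
        return {"PatientID": "123", "BP_SYS": 120, "BP_DIA": 80}

class MPHRInterface:
    """Adapter: translates provider-native records into the mPHR's
    storage format and back, so neither side adapts to the other."""
    FIELD_MAP = {"PatientID": "patient_id", "BP_SYS": "systolic", "BP_DIA": "diastolic"}

    def to_mphr(self, provider_record):
        return {self.FIELD_MAP[k]: v for k, v in provider_record.items()}

    def to_provider(self, mphr_record):
        reverse = {v: k for k, v in self.FIELD_MAP.items()}
        return {reverse[k]: v for k, v in mphr_record.items()}

iface = MPHRInterface()
mphr = iface.to_mphr(ProviderSystem().export_record())
```

As with peripheral drivers, each new provider system only needs its own adapter; the mPHR format stays fixed, and round-tripping through the adapter is lossless.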
Affiliation(s)
- Mohamed Abdulnabi
- Security Lab, Wisma R&D, Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia
- Ahmed Al-Haiqi
- Department of Electronics and Communication Engineering, College of Engineering, Universiti Tenaga Nasional, Kajang, Malaysia
- M L M Kiah
- Security Lab, Wisma R&D, Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia
- A A Zaidan
- Department of Computing, Faculty of Arts, Computing and Creative Industry, Universiti Pendidikan Sultan Idris, Tanjong Malim, Perak, Malaysia
- B B Zaidan
- Department of Computing, Faculty of Arts, Computing and Creative Industry, Universiti Pendidikan Sultan Idris, Tanjong Malim, Perak, Malaysia
- Muzammil Hussain
- Security Lab, Wisma R&D, Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia
23
Singer DRJ, Zaïr ZM. Clinical Perspectives on Targeting Therapies for Personalized Medicine. ADVANCES IN PROTEIN CHEMISTRY AND STRUCTURAL BIOLOGY 2015; 102:79-114. [PMID: 26827603 PMCID: PMC7102676 DOI: 10.1016/bs.apcsb.2015.11.003] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 12/29/2022]
Abstract
Expected benefits from new technology include more efficient patient selection for clinical trials, more cost-effective treatment pathways for patients and health services, and a more profitable, accelerated approach for drug developers. Regulatory authorities expect the pharmaceutical and biotechnology industries to accelerate their development of companion diagnostics and companion therapeutics toward the goal of safer and more effective personalized medicine, and expect health services to fund, and prescribers to adopt, these new therapeutic technologies. This review discusses the importance of a range of new approaches to developing new and reprofiled medicines for common and serious diseases as well as rare diseases: network pharmacology approaches; adaptive trial designs with enriched populations more likely to respond safely to treatment, as assessed by companion diagnostics for response and toxicity risk; and use of “real world” data. Case studies are described of single and multiple protein drug targets in several important therapeutic areas. These case studies also illustrate the value and complexity of using selective biomarkers of clinical response and risk of adverse drug effects, either singly or in combination.
Affiliation(s)
- Zoulikha M Zaïr
- Warwick Medical School, University of Warwick, Coventry, United Kingdom