1
Duda SN, Kennedy N, Conway D, Cheng AC, Nguyen V, Zayas-Cabán T, Harris PA. HL7 FHIR-based tools and initiatives to support clinical research: a scoping review. J Am Med Inform Assoc 2022; 29:1642-1653. [PMID: 35818340] [DOI: 10.1093/jamia/ocac105]
Abstract
OBJECTIVES The HL7® fast healthcare interoperability resources (FHIR®) specification has emerged as the leading interoperability standard for the exchange of healthcare data. We conducted a scoping review to identify trends and gaps in the use of FHIR for clinical research. MATERIALS AND METHODS We reviewed published literature, federally funded project databases, application websites, and other sources to discover FHIR-based papers, projects, and tools (collectively, "FHIR projects") available to support clinical research activities. RESULTS Our search identified 203 different FHIR projects applicable to clinical research. Most were associated with preparations to conduct research, such as data mapping to and from FHIR formats (n = 66, 32.5%) and managing ontologies with FHIR (n = 30, 14.8%), or post-study data activities, such as sharing data using repositories or registries (n = 24, 11.8%), general research data sharing (n = 23, 11.3%), and management of genomic data (n = 21, 10.3%). With the exception of phenotyping (n = 19, 9.4%), fewer FHIR-based projects focused on needs within the clinical research process itself. DISCUSSION Funding and usage of FHIR-enabled solutions for research are expanding, but most projects appear focused on establishing data pipelines and linking clinical systems such as electronic health records, patient-facing data systems, and registries, possibly due to the relative newness of FHIR and the incentives for FHIR integration in health information systems. Fewer FHIR projects were associated with research-only activities. CONCLUSION The FHIR standard is becoming an essential component of the clinical research enterprise. To develop FHIR's full potential for clinical research, funding and operational stakeholders should address gaps in FHIR-based research tools and methods.
Affiliation(s)
- Stephany N Duda
- Vanderbilt Institute for Clinical and Translational Research, Vanderbilt University Medical Center, Nashville, Tennessee, USA; Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee, USA
- Nan Kennedy
- Vanderbilt Institute for Clinical and Translational Research, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Douglas Conway
- Vanderbilt Institute for Clinical and Translational Research, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Alex C Cheng
- Vanderbilt Institute for Clinical and Translational Research, Vanderbilt University Medical Center, Nashville, Tennessee, USA; Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee, USA
- Viet Nguyen
- Stratametrics LLC, Salt Lake City, Utah, USA; HL7 Da Vinci Project, Ann Arbor, Michigan, USA
- Teresa Zayas-Cabán
- National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
- Paul A Harris
- Vanderbilt Institute for Clinical and Translational Research, Vanderbilt University Medical Center, Nashville, Tennessee, USA; Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee, USA
2
Tuck D. A cancer graph: a lung cancer property graph database in Neo4j. BMC Res Notes 2022; 15:45. [PMID: 35164854] [PMCID: PMC8842806] [DOI: 10.1186/s13104-022-05912-9]
Abstract
Objectives A novel graph data model of non-small cell lung cancer clinical and genomic data has been constructed with two aims: (1) provide a suitable model for facilitating graph analytics within the Neo4j framework or through tools which can interact through existing Neo4j APIs; and (2) provide a base model extensible to other cancer types and additional datasets such as those derived from electronic health records and other real-world sources. Data description Clinical and genomic data from publicly available datasets and analyses based on The Cancer Genome Atlas lung cancer datasets are integrated with a novel property graph database schema, augmented with patient-patient social-network subgraphs derived from similarity and correlation measures, as well as individual-based biological networks.
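The property-graph idea in this abstract can be sketched with a minimal in-memory graph in plain Python. The node labels, property names, and relationship types below are illustrative assumptions, not the paper's actual Neo4j schema (which would be queried with Cypher):

```python
# Labeled nodes with properties, plus typed relationships, mimic the
# Neo4j property-graph model described in the abstract.
class Node:
    def __init__(self, label, **props):
        self.label = label
        self.props = props

class Graph:
    def __init__(self):
        self.nodes = []
        self.rels = []  # (source_node, relationship_type, target_node)

    def add_node(self, label, **props):
        node = Node(label, **props)
        self.nodes.append(node)
        return node

    def relate(self, src, rel_type, dst):
        self.rels.append((src, rel_type, dst))

    def neighbors(self, node, rel_type):
        # Analogous to a Cypher pattern: MATCH (n)-[:REL_TYPE]->(m) RETURN m
        return [d for s, t, d in self.rels if s is node and t == rel_type]

g = Graph()
p1 = g.add_node("Patient", patient_id="TCGA-05-4244", stage="II")
p2 = g.add_node("Patient", patient_id="TCGA-05-4249", stage="II")
tp53 = g.add_node("Gene", symbol="TP53")
g.relate(p1, "HAS_MUTATION", tp53)
g.relate(p1, "SIMILAR_TO", p2)  # patient-patient similarity subgraph

print([n.props["symbol"] for n in g.neighbors(p1, "HAS_MUTATION")])  # ['TP53']
```

In an actual Neo4j deployment the same traversal would be a one-line Cypher `MATCH` query; the sketch only shows how clinical and genomic facts coexist in one graph.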
3
Yuan Z, Finan S, Warner J, Savova G, Hochheiser H. Interactive Exploration of Longitudinal Cancer Patient Histories Extracted From Clinical Text. JCO Clin Cancer Inform 2021; 4:412-420. [PMID: 32383981] [PMCID: PMC7265796] [DOI: 10.1200/cci.19.00115]
Abstract
PURPOSE Retrospective cancer research requires identification of patients matching both categorical and temporal inclusion criteria, often on the basis of factors exclusively available in clinical notes. Although natural language processing approaches for inferring higher-level concepts have shown promise for bringing structure to clinical texts, interpreting results is often challenging, involving the need to move between abstracted representations and constituent text elements. Our goal was to build interactive visual tools to support the process of interpreting rich representations of histories of patients with cancer. METHODS Qualitative inquiry into user tasks and goals, a structured data model, and an innovative natural language processing pipeline were used to guide design. RESULTS The resulting information visualization tool provides cohort- and patient-level views with linked interactions between components. CONCLUSION Interactive tools hold promise for facilitating the interpretation of patient summaries and identification of cohorts for retrospective research.
Affiliation(s)
- Zhou Yuan
- University of Pittsburgh, Pittsburgh, PA
4
Wen A, Rasmussen LV, Stone D, Liu S, Kiefer R, Adekkanattu P, Brandt PS, Pacheco JA, Luo Y, Wang F, Pathak J, Liu H, Jiang G. CQL4NLP: Development and Integration of FHIR NLP Extensions in Clinical Quality Language for EHR-driven Phenotyping. AMIA Jt Summits Transl Sci Proc 2021; 2021:624-633. [PMID: 34457178] [PMCID: PMC8378647]
Abstract
Lack of standardized representation of natural language processing (NLP) components in phenotyping algorithms hinders portability of the phenotyping algorithms and their execution in a high-throughput and reproducible manner. The objective of the study is to develop and evaluate a standard-driven approach - CQL4NLP - that integrates a collection of NLP extensions represented in the HL7 Fast Healthcare Interoperability Resources (FHIR) standard into the clinical quality language (CQL). A minimal NLP data model with 11 NLP-specific data elements was created, including six FHIR NLP extensions. All 11 data elements were identified from their usage in real-world phenotyping algorithms. An NLP ruleset generation mechanism was integrated into the NLP2FHIR pipeline and the NLP rulesets enabled comparable performance for a case study with the identification of obesity comorbidities. The NLP ruleset generation mechanism created a reproducible process for defining the NLP components of a phenotyping algorithm and its execution.
Affiliation(s)
- Yuan Luo
- Northwestern University, Chicago, IL
- Fei Wang
- Weill Cornell Medicine, New York, NY
5
Colicchio TK, Dissanayake PI, Cimino JJ. Formal representation of patients' care context data: the path to improving the electronic health record. J Am Med Inform Assoc 2021; 27:1648-1657. [PMID: 32935127] [PMCID: PMC7671623] [DOI: 10.1093/jamia/ocaa134]
Abstract
Objective To develop a collection of concept-relationship-concept tuples to formally represent patients’ care context data to inform electronic health record (EHR) development. Materials and Methods We reviewed semantic relationships reported in the literature and developed a manual annotation schema. We used the initial schema to annotate sentences extracted from narrative note sections of cardiology, urology, and ear, nose, and throat (ENT) notes. We audio recorded ENT visits and annotated their parsed transcripts. We combined the results of each annotation into a consolidated set of concept-relationship-concept tuples. We then compared the tuples used within and across the multiple data sources. Results We annotated a total of 626 sentences. Starting with 8 relationships from the literature, we annotated 182 sentences from 8 inpatient consult notes (initial set of tuples = 43). Next, we annotated 232 sentences from 10 outpatient visit notes (enhanced set of tuples = 75). Then, we annotated 212 sentences from transcripts of 5 outpatient visits (final set of tuples = 82). The tuples from the visit transcripts covered 103 (74%) concepts documented in the notes of their respective visits. There were 20 (24%) tuples used across all data sources, 10 (12%) used only in inpatient notes, 15 (18%) used only in visit notes, and 7 (9%) used only in the visit transcripts. Conclusions We produced a robust set of 82 tuples useful to represent patients’ care context data. We propose several applications of our tuples to improve EHR navigation, data entry, learning health systems, and decision support.
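The concept-relationship-concept tuples above act as an annotation schema: an assertion about a patient is only valid if its (subject type, relationship, object type) triple appears in the schema. A small Python sketch of that idea, with hypothetical tuples (the paper's actual 82 tuples are not reproduced here):

```python
# Hypothetical schema: each entry is a (concept, relationship, concept) tuple.
SCHEMA = {
    ("Medication", "treats", "Problem"),
    ("Finding", "has_location", "BodySite"),
    ("Problem", "causes", "Finding"),
}

def annotate(subject, relation, obj, schema=SCHEMA):
    """Accept an annotation only if its tuple exists in the schema."""
    if (subject, relation, obj) not in schema:
        raise ValueError(f"tuple not in schema: {(subject, relation, obj)}")
    return {"subject": subject, "relation": relation, "object": obj}

a = annotate("Medication", "treats", "Problem")
print(a["relation"])  # treats
```

The point of the design is that downstream EHR features (navigation, data entry, decision support) can rely on annotations always conforming to a closed, formally defined set of relationship patterns.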
Affiliation(s)
- James J Cimino
- Informatics Institute, University of Alabama at Birmingham, USA
6
Ontological representation, classification and data-driven computing of phenotypes. J Biomed Semantics 2020; 11:15. [PMID: 33349245] [PMCID: PMC7751121] [DOI: 10.1186/s13326-020-00230-0]
Abstract
Background The successful determination and analysis of phenotypes plays a key role in the diagnostic process, the evaluation of risk factors and the recruitment of participants for clinical and epidemiological studies. The development of computable phenotype algorithms to solve these tasks is a challenging problem for several reasons. First, the term ‘phenotype’ has no generally agreed definition and its meaning depends on context. Second, phenotypes are most commonly specified as non-computable descriptive documents. Recent attempts have shown that ontologies are a suitable way to handle phenotypes and that they can support clinical research and decision making. The SMITH Consortium is dedicated to rapidly establishing an integrative medical informatics framework to provide physicians with the best available data and knowledge and to enable innovative use of healthcare data for research and treatment optimisation. In the context of the methodological use case ‘phenotype pipeline’ (PheP), a technology to automatically generate phenotype classifications and annotations based on electronic health records (EHR) is being developed. A large series of phenotype algorithms will be implemented, which implies that for each algorithm a classification scheme and its input variables have to be defined. Furthermore, a phenotype engine is required to evaluate and execute the developed algorithms. Results In this article, we present a Core Ontology of Phenotypes (COP) and the software Phenotype Manager (PhenoMan), which implements a novel ontology-based method to model, classify and compute phenotypes from already available data. Our solution includes an enhanced iterative reasoning process combining classification tasks with mathematical calculations at runtime. The ontology as well as the reasoning method were successfully evaluated with selected phenotypes, including the SOFA score, socio-economic status, body surface area and the WHO BMI classification, based on available medical data.
Conclusions We developed a novel ontology-based method to model phenotypes of living beings with the aim of automated phenotype reasoning based on available data. This new approach can be used in clinical context, e.g., for supporting the diagnostic process, evaluating risk factors, and recruiting appropriate participants for clinical and epidemiological studies.
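The WHO BMI classification evaluated above illustrates the combination of a mathematical calculation (a derived phenotype) with a classification step. A minimal plain-Python sketch of that step, using the standard WHO adult cut-offs; the actual PhenoMan operates on OWL ontologies with an iterative reasoner rather than hard-coded rules:

```python
def bmi(weight_kg, height_m):
    # Derived phenotype: computed at runtime from two recorded values.
    return weight_kg / height_m ** 2

def who_bmi_class(value):
    # WHO adult BMI categories, simplified to four classes.
    if value < 18.5:
        return "underweight"
    if value < 25.0:
        return "normal weight"
    if value < 30.0:
        return "overweight"
    return "obese"

print(who_bmi_class(bmi(80.0, 1.75)))  # overweight
```

In the ontology-based approach, the cut-offs live in class restrictions rather than code, so the same reasoning engine can evaluate any phenotype definition, from BMI to the SOFA score.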
7
Bouaud J, Pelayo S, Lamy JB, Prebet C, Ngo C, Teixeira L, Guézennec G, Séroussi B. Implementation of an ontological reasoning to support the guideline-based management of primary breast cancer patients in the DESIREE project. Artif Intell Med 2020; 108:101922. [DOI: 10.1016/j.artmed.2020.101922]
8
Najafabadipour M, Zanin M, Rodríguez-González A, Torrente M, Nuñez García B, Cruz Bermudez JL, Provencio M, Menasalvas E. Reconstructing the patient's natural history from electronic health records. Artif Intell Med 2020; 105:101860. [PMID: 32505419] [DOI: 10.1016/j.artmed.2020.101860]
Abstract
The automatic extraction of a patient's natural history from Electronic Health Records (EHRs) is a critical step towards building intelligent systems that can reason about clinical variables and support decision making. Although EHRs contain a large amount of valuable information about the patient's medical care, this information can only be fully understood when analyzed in a temporal context. Any intelligent system should then be able to extract medical concepts, date expressions, temporal relations and the temporal ordering of medical events from the free texts of EHRs; yet, this task is hard to tackle, due to the domain specific nature of EHRs, writing quality and lack of structure of these texts, and more generally the presence of redundant information. In this paper, we introduce a new Natural Language Processing (NLP) framework, capable of extracting the aforementioned elements from EHRs written in Spanish using rule-based methods. We focus on building medical timelines, which include disease diagnosis and its progression over time. By using a large dataset of EHRs comprising information about patients suffering from lung cancer, we show that our framework has an adequate level of performance by correctly building the timeline for 843 patients from a pool of 989 patients, achieving a precision of 0.852.
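The core of timeline building is resolving extracted date expressions (absolute and relative) against an anchor event, then ordering the results. A hedged sketch of that step; the event shapes and the diagnosis-date anchor are illustrative assumptions, not the framework's actual rule set (which also handles Spanish text and much richer temporal relations):

```python
from datetime import date, timedelta

def resolve(mentions, anchor):
    """Resolve date mentions into a chronological patient timeline.

    Each mention is (when, concept), where `when` is either an absolute
    date or an integer day-offset relative to the anchor event, as a
    rule-based temporal extractor might emit for "8 weeks after diagnosis".
    """
    resolved = []
    for when, concept in mentions:
        if isinstance(when, int):           # relative mention
            when = anchor + timedelta(days=when)
        resolved.append((when, concept))
    return sorted(resolved)                 # chronological order

anchor = date(2018, 11, 20)                 # diagnosis date
mentions = [
    (date(2019, 6, 2), "progression"),
    (anchor, "lung cancer diagnosis"),
    (56, "chemotherapy start"),             # 56 days after diagnosis
]
timeline = resolve(mentions, anchor)
print([concept for _, concept in timeline])
```

Correctly ordered timelines like this one are what allow downstream systems to reason about disease progression rather than isolated facts.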
Affiliation(s)
- Marjan Najafabadipour
- Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, Madrid, Spain; Hospital Universitario Puerta de Hierro Majadahonda, Madrid, Spain
- Massimiliano Zanin
- Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, Madrid, Spain
- Maria Torrente
- Hospital Universitario Puerta de Hierro Majadahonda, Madrid, Spain
- Ernestina Menasalvas
- Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, Madrid, Spain
9
Hong N, Wen A, Shen F, Sohn S, Wang C, Liu H, Jiang G. Developing a scalable FHIR-based clinical data normalization pipeline for standardizing and integrating unstructured and structured electronic health record data. JAMIA Open 2019; 2:570-579. [PMID: 32025655] [PMCID: PMC6993992] [DOI: 10.1093/jamiaopen/ooz056]
Abstract
Objective To design, develop, and evaluate a scalable clinical data normalization pipeline for standardizing unstructured electronic health record (EHR) data leveraging the HL7 Fast Healthcare Interoperability Resources (FHIR) specification. Methods We established an FHIR-based clinical data normalization pipeline known as NLP2FHIR that mainly comprises: (1) a module for a core natural language processing (NLP) engine with an FHIR-based type system; (2) a module for integrating structured data; and (3) a module for content normalization. We evaluated the FHIR modeling capability focusing on core clinical resources such as Condition, Procedure, MedicationStatement (including Medication), and FamilyMemberHistory using Mayo Clinic’s unstructured EHR data. We constructed a gold standard reusing annotation corpora from previous NLP projects. Results A total of 30 mapping rules, 62 normalization rules, and 11 NLP-specific FHIR extensions were created and implemented in the NLP2FHIR pipeline. The elements that need to integrate structured data from each clinical resource were identified. The performance of unstructured data modeling achieved F scores ranging from 0.69 to 0.99 for various FHIR element representations (0.69–0.99 for Condition; 0.75–0.84 for Procedure; 0.71–0.99 for MedicationStatement; and 0.75–0.95 for FamilyMemberHistory). Conclusion We demonstrated that the NLP2FHIR pipeline is feasible for modeling unstructured EHR data and integrating structured elements into the model. The outcomes of this work provide standards-based clinical data normalization tools that are indispensable for enabling portable EHR-driven phenotyping and large-scale data analytics, as well as useful insights for future development of the FHIR specification with regard to handling unstructured clinical data.
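The normalization step can be pictured as mapping an NLP mention into a minimal FHIR resource. A hedged sketch for the Condition resource; the mention fields and the single mapping rule are illustrative assumptions, not NLP2FHIR's actual 30 mapping and 62 normalization rules (the element names, however, follow the published FHIR Condition resource):

```python
def mention_to_condition(mention, patient_ref):
    """Map one NLP condition mention to a minimal FHIR Condition resource."""
    return {
        "resourceType": "Condition",
        "subject": {"reference": patient_ref},
        "code": {
            "coding": [{
                "system": "http://snomed.info/sct",   # content normalization target
                "code": mention["snomed_code"],
                "display": mention["text"],
            }]
        },
        "onsetDateTime": mention.get("onset"),        # absent if not extracted
    }

# SNOMED CT 44054006 = type 2 diabetes mellitus.
mention = {"text": "type 2 diabetes mellitus", "snomed_code": "44054006",
           "onset": "2015-03-01"}
resource = mention_to_condition(mention, "Patient/123")
print(resource["resourceType"])  # Condition
```

Once free-text mentions and structured rows land in the same resource shape, downstream phenotyping logic no longer needs to care which source a fact came from.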
Affiliation(s)
- Na Hong
- Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Andrew Wen
- Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Feichen Shen
- Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Sunghwan Sohn
- Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Chen Wang
- Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Hongfang Liu
- Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Guoqian Jiang
- Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
10
Phillips CA, Razzaghi H, Aglio T, McNeil MJ, Salvesen-Quinn M, Sopfe J, Wilkes JJ, Forrest CB, Bailey LC. Development and evaluation of a computable phenotype to identify pediatric patients with leukemia and lymphoma treated with chemotherapy using electronic health record data. Pediatr Blood Cancer 2019; 66:e27876. [PMID: 31207054] [PMCID: PMC7135896] [DOI: 10.1002/pbc.27876]
Abstract
BACKGROUND Widespread implementation of electronic health records (EHR) has created new opportunities for pediatric oncology observational research. Little attention has been given to using EHR data to identify patients with pediatric hematologic malignancies. METHODS This study used EHR-derived data in a pediatric clinical data research network, PEDSnet, to develop and evaluate a computable phenotype algorithm to identify pediatric patients with leukemia and lymphoma who received treatment with chemotherapy. To guide early development, multiple computable phenotype-defined cohorts were compared to one institution's tumor registry. The most promising algorithm was chosen for formal evaluation and consisted of at least two leukemia/lymphoma diagnoses (Systematized Nomenclature of Medicine codes) within a 90-day period, two chemotherapy exposures, and three hematology-oncology provider encounters. During evaluation, the computable phenotype was executed against EHR data from 2011 to 2016 at three large institutions. Classification accuracy was assessed by masked medical record review with phenotype-identified patients compared to a control group with at least three hematology-oncology encounters. RESULTS The computable phenotype had sensitivity of 100% (confidence interval [CI] 99%, 100%), specificity of 99% (CI 99%, 100%), positive predictive value (PPV) and negative predictive value (NPV) of 100%, and C-statistic of 1 at the development institution. The computable phenotype performance was similar at the two test institutions with sensitivity of 100% (CI 99%, 100%), specificity of 99% (CI 99%, 100%), PPV of 96%, NPV of 100%, and C-statistic of 0.99. CONCLUSION The EHR-based computable phenotype is an accurate cohort identification tool for pediatric patients with leukemia and lymphoma who have been treated with chemotherapy and is ready for use in clinical studies.
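The phenotype rule described above (two leukemia/lymphoma diagnoses within 90 days, two chemotherapy exposures, three hematology-oncology encounters) can be sketched directly. The thresholds follow the abstract; the input shapes are an assumption for illustration, not PEDSnet's actual data model:

```python
from datetime import date, timedelta

def meets_phenotype(dx_dates, chemo_exposures, heme_onc_encounters):
    """Computable phenotype sketch: True if the patient has at least two
    leukemia/lymphoma diagnosis dates within a 90-day window, at least two
    chemotherapy exposures, and at least three hematology-oncology
    provider encounters."""
    dx = sorted(dx_dates)
    # The closest pair of dates is always consecutive after sorting,
    # so checking adjacent pairs suffices for the 90-day window.
    within_window = any(b - a <= timedelta(days=90) for a, b in zip(dx, dx[1:]))
    return within_window and chemo_exposures >= 2 and heme_onc_encounters >= 3

# A qualifying patient and one whose diagnoses are too far apart.
print(meets_phenotype([date(2014, 1, 5), date(2014, 2, 10)], 4, 6))   # True
print(meets_phenotype([date(2014, 1, 5), date(2015, 6, 1)], 4, 6))    # False
```

Requiring repeated codes plus treatment and encounter evidence is what drove the near-perfect PPV: a single miscoded diagnosis cannot satisfy the rule on its own.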
Affiliation(s)
- Charles A Phillips
- Division of Oncology and Center for Childhood Cancer Research, The Children’s Hospital of Philadelphia, Philadelphia, PA
- Hanieh Razzaghi
- Division of Oncology and Center for Childhood Cancer Research, The Children’s Hospital of Philadelphia, Philadelphia, PA
- Taylor Aglio
- Department of Pediatrics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA
- Michael J McNeil
- Department of Oncology, St. Jude Children’s Research Hospital, Memphis, TN
- Jenna Sopfe
- Center for Cancer and Blood Disorders, Department of Pediatrics, University of Colorado, Denver, CO
- Jennifer J Wilkes
- Division of Hematology and Oncology and Center for Clinical and Translational Research, Department of Pediatrics, Seattle Children’s Hospital and the University of Washington, Seattle, WA
- Christopher B Forrest
- Department of Pediatrics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA; Department of Biomedical and Health Informatics, The Children’s Hospital of Philadelphia, Philadelphia, PA
- L Charles Bailey
- Division of Oncology and Center for Childhood Cancer Research, The Children’s Hospital of Philadelphia, Philadelphia, PA; Department of Pediatrics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA; Department of Biomedical and Health Informatics, The Children’s Hospital of Philadelphia, Philadelphia, PA
11
Saripalle RK. Fast Health Interoperability Resources (FHIR). International Journal of E-Health and Medical Communications 2019. [DOI: 10.4018/ijehmc.2019010105]
Abstract
The inception of EHRs has shown great potential and virtually eliminated the drawbacks of paper-based medical notes. However, the transition has not been seamless due to various technical and political drawbacks. One of the major technical challenges is interoperability. The biomedical community has established various structural and semantic standards to capture and share medical data across heterogeneous systems, such as the ASTM Continuity of Care Record and the Health Level 7 (HL7) Continuity of Care Document. The HL7 organization has recently published Fast Health Interoperability Resources (FHIR), a standard to improve interoperability, overcome shortcomings of the previous standards and integrate lightweight web services. This article provides an overview of HL7 FHIR, its concepts and a literature review on its current status, usage, and adoption. Based on this research and literature review, the authors strongly believe that FHIR can bridge the interoperability gap between the growing number of disparate and varied healthcare entities.
12
Reddy BP, Houlding B, Hederman L, Canney M, Debruyne C, O'Brien C, Meehan A, O'Sullivan D, Little MA. Data linkage in medical science using the resource description framework: the AVERT model. HRB Open Res 2018; 1:20. [PMID: 32002509] [PMCID: PMC6973528] [DOI: 10.12688/hrbopenres.12851.2]
Abstract
There is an ongoing challenge as to how best manage and understand ‘big data’ in precision medicine settings. This paper describes the potential for a Linked Data approach, using a Resource Description Framework (RDF) model, to combine multiple datasets with temporal and spatial elements of varying dimensionality. This “AVERT model” provides a framework for converting multiple standalone files of various formats, from both clinical and environmental settings, into a single data source. This data source can thereafter be queried effectively, shared with outside parties, more easily understood by multiple stakeholders using standardized vocabularies, incorporating provenance metadata and supporting temporo-spatial reasoning. The approach has further advantages in terms of data sharing, security and subsequent analysis. We use a case study relating to anti-Glomerular Basement Membrane (GBM) disease, a rare autoimmune condition, to illustrate a technical proof of concept for the AVERT model.
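The Linked Data idea reduces to storing every fact, clinical or environmental, as a subject-predicate-object triple and querying by pattern. A minimal plain-Python sketch; the URIs and predicate names are illustrative, not the AVERT model's actual standardized vocabulary, and a real implementation would use an RDF store with SPARQL:

```python
# Facts from a clinical dataset and an environmental dataset, linked by a
# shared region identifier, all in one triple set.
triples = {
    ("patient:42", "rdf:type", "ex:Patient"),
    ("patient:42", "ex:hasDiagnosis", "ex:AntiGBMDisease"),
    ("patient:42", "ex:livesIn", "region:Dublin"),
    ("region:Dublin", "ex:airQualityIndex", "35"),
}

def query(s=None, p=None, o=None):
    """Match triples against a pattern; None acts like a SPARQL variable."""
    return sorted((ts, tp, to) for ts, tp, to in triples
                  if s in (None, ts) and p in (None, tp) and o in (None, to))

# Join across the two original datasets: where does the patient live,
# and what is that region's air quality?
(_, _, region), = query("patient:42", "ex:livesIn", None)
(_, _, aqi), = query(region, "ex:airQualityIndex", None)
print(region, aqi)  # region:Dublin 35
```

The two-step lookup is exactly the kind of temporo-spatial join that is awkward across standalone files but trivial once everything shares one graph.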
Affiliation(s)
- Brian P Reddy
- Trinity Health Kidney Centre, Tallaght Hospital, Dublin, Ireland; ADAPT Centre for Digital Content, University of Dublin, Dublin, Ireland; Health Economics Policy and Evaluation Centre, National University of Ireland, Galway, Galway, Ireland
- Brett Houlding
- School of Computer Science and Statistics, University of Dublin, Dublin, Ireland
- Lucy Hederman
- ADAPT Centre for Digital Content, University of Dublin, Dublin, Ireland; School of Computer Science and Statistics, University of Dublin, Dublin, Ireland
- Mark Canney
- Trinity Health Kidney Centre, Tallaght Hospital, Dublin, Ireland
- Christophe Debruyne
- ADAPT Centre for Digital Content, University of Dublin, Dublin, Ireland; Vrije Universiteit Brussel, Brussels, Belgium
- Ciaran O'Brien
- ADAPT Centre for Digital Content, University of Dublin, Dublin, Ireland
- Alan Meehan
- ADAPT Centre for Digital Content, University of Dublin, Dublin, Ireland
- Declan O'Sullivan
- ADAPT Centre for Digital Content, University of Dublin, Dublin, Ireland; School of Computer Science and Statistics, University of Dublin, Dublin, Ireland
- Mark A Little
- Trinity Health Kidney Centre, Tallaght Hospital, Dublin, Ireland; Irish Centre for Vascular Biology, University of Dublin, Dublin, Ireland
13
Reddy BP, Houlding B, Hederman L, Canney M, Debruyne C, O'Brien C, Meehan A, O'Sullivan D, Little MA. Data linkage in medical science using the resource description framework: the AVERT model. HRB Open Res 2018; 1:20. [PMID: 32002509] [PMCID: PMC6973528] [DOI: 10.12688/hrbopenres.12851.1]
Abstract
There is an ongoing challenge as to how best manage and understand 'big data' in precision medicine settings. This paper describes the potential for a Linked Data approach, using a Resource Description Framework (RDF) model, to combine multiple datasets with temporal and spatial elements of varying dimensionality. This "AVERT model" provides a framework for converting multiple standalone files of various formats, from both clinical and environmental settings, into a single data source. This data source can thereafter be queried effectively, shared with outside parties, more easily understood by multiple stakeholders using standardized vocabularies, incorporating provenance metadata and supporting temporo-spatial reasoning. The approach has further advantages in terms of data sharing, security and subsequent analysis. We use a case study relating to anti-Glomerular Basement Membrane (GBM) disease, a rare autoimmune condition, to illustrate a technical proof of concept for the AVERT model.
Affiliation(s)
- Brian P Reddy
- Trinity Health Kidney Centre, Tallaght Hospital, Dublin, Ireland
- ADAPT Centre for Digital Content, University of Dublin, Dublin, Ireland
- Health Economics Policy and Evaluation Centre, National University of Ireland, Galway, Galway, Ireland
- Brett Houlding
- School of Computer Science and Statistics, University of Dublin, Dublin, Ireland
- Lucy Hederman
- ADAPT Centre for Digital Content, University of Dublin, Dublin, Ireland
- School of Computer Science and Statistics, University of Dublin, Dublin, Ireland
- Mark Canney
- Trinity Health Kidney Centre, Tallaght Hospital, Dublin, Ireland
- Christophe Debruyne
- ADAPT Centre for Digital Content, University of Dublin, Dublin, Ireland
- Vrije Universiteit Brussel, Brussels, Belgium
- Ciaran O'Brien
- ADAPT Centre for Digital Content, University of Dublin, Dublin, Ireland
- Alan Meehan
- ADAPT Centre for Digital Content, University of Dublin, Dublin, Ireland
- Declan O'Sullivan
- ADAPT Centre for Digital Content, University of Dublin, Dublin, Ireland
- School of Computer Science and Statistics, University of Dublin, Dublin, Ireland
- Mark A Little
- Trinity Health Kidney Centre, Tallaght Hospital, Dublin, Ireland
- Irish Centre for Vascular Biology, University of Dublin, Dublin, Ireland
14
Hong N, Wen A, Shen F, Sohn S, Liu S, Liu H, Jiang G. Integrating Structured and Unstructured EHR Data Using an FHIR-based Type System: A Case Study with Medication Data. AMIA Jt Summits Transl Sci Proc 2018; 2017:74-83. [PMID: 29888045] [PMCID: PMC5961797]
Abstract
Standards-based modeling of electronic health record (EHR) data holds great significance for data interoperability and large-scale usage. Integration of unstructured data into a standard data model, however, poses unique challenges, partially due to the heterogeneous type systems used in existing clinical NLP systems. We introduce a scalable and standards-based framework for integrating structured and unstructured EHR data leveraging the HL7 Fast Healthcare Interoperability Resources (FHIR) specification. We implemented a clinical NLP pipeline enhanced with an FHIR-based type system and performed a case study using medication data from Mayo Clinic's EHR. Two UIMA-based NLP tools, MedXN and MedTime, were integrated in the pipeline to extract FHIR MedicationStatement resources and related attributes from unstructured medication lists. We developed a rule-based approach for assigning NLP output types to the FHIR elements represented in the type system, and investigated which FHIR elements are populated from structured EHR data sources. We used the FHIR resource "MedicationStatement" as an example to illustrate our integration framework and methods. For evaluation, we manually annotated FHIR elements in 166 medication statements from 14 clinical notes generated by Mayo Clinic in the course of patient care, and used standard performance measures (precision, recall and F-measure). The F-scores achieved ranged from 0.73 to 0.99 for the various FHIR element representations. The results demonstrated that our framework based on the FHIR type system is feasible for normalizing and integrating both structured and unstructured EHR data.
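A rule-based extraction from a free-text medication line into a FHIR MedicationStatement can be sketched with a single regular expression. The pattern and field choices below are illustrative assumptions, not MedXN's actual rules or the paper's full type system; the element names follow the published FHIR MedicationStatement resource:

```python
import re

# Toy rule: drug name, numeric dose, "mg" unit, and a simple frequency.
PATTERN = re.compile(
    r"(?P<drug>[A-Za-z]+)\s+(?P<dose>[\d.]+)\s*(?P<unit>mg)\s+(?P<freq>daily|twice daily)"
)

def to_medication_statement(text):
    """Map one medication-list line to a minimal FHIR MedicationStatement."""
    m = PATTERN.search(text)
    if not m:
        return None  # no rule matched this line
    return {
        "resourceType": "MedicationStatement",
        "medicationCodeableConcept": {"text": m.group("drug")},
        "dosage": [{"text": f"{m.group('dose')} {m.group('unit')} {m.group('freq')}"}],
    }

stmt = to_medication_statement("lisinopril 10 mg daily")
print(stmt["medicationCodeableConcept"]["text"])  # lisinopril
```

A production pipeline would normalize the drug name to RxNorm codes and split the dosage into structured quantity and timing elements; the sketch only shows the shape of the NLP-to-FHIR assignment step.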
15.
Jean-Quartier C, Jeanquartier F, Jurisica I, Holzinger A. In silico cancer research towards 3R. BMC Cancer 2018; 18:408. [PMID: 29649981] [PMCID: PMC5897933] [DOI: 10.1186/s12885-018-4302-0]
Abstract
BACKGROUND Improving our understanding of cancer and other complex diseases requires integrating diverse data sets and algorithms. Intertwining in vivo and in vitro data with in silico models is paramount to overcoming the intrinsic difficulties posed by data complexity; importantly, this approach also helps to uncover underlying molecular mechanisms. Over the years, research has introduced multiple biochemical and computational methods to study the disease, many of which require animal experiments. However, modeling systems and the comparison of cellular processes in both eukaryotes and prokaryotes help to explain specific aspects of uncontrolled cell growth, eventually leading to improved planning of future experiments. According to the principles of humane experimental technique, milestones in alternative animal testing include in vitro methods such as cell-based models and microfluidic chips, as well as clinical tests of microdosing and imaging. To date, the range of alternative methods has expanded towards computational approaches based on information from past in vitro and in vivo experiments. In fact, in silico techniques are often underrated but can be vital to understanding fundamental processes in cancer: they can rival the accuracy of biological assays, and they can provide essential focus and direction to reduce experimental cost. MAIN BODY We give an overview of in vivo, in vitro, and in silico methods used in cancer research. Common models such as cell lines, xenografts, or genetically modified rodents reflect relevant pathological processes to differing degrees, but cannot replicate the full spectrum of human disease. Computational biology is of increasing importance, advancing from assisting biological analysis with network biology approaches, as a basis for understanding a cell's functional organization, to model building for predictive systems.
CONCLUSION Underlining and extending the in silico approach with respect to the 3Rs (replacement, reduction, and refinement) will lead cancer research towards efficient and effective precision medicine. We therefore suggest refined translational models and testing methods based on integrative analyses and the incorporation of computational biology within cancer research.
Affiliation(s)
- Claire Jean-Quartier
  - Holzinger Group, Institute for Medical Informatics, Statistics and Documentation, Medical University Graz, Graz, Austria
- Fleur Jeanquartier
  - Holzinger Group, Institute for Medical Informatics, Statistics and Documentation, Medical University Graz, Graz, Austria
  - Institute of Interactive Systems and Data Science, Graz University of Technology, Graz, Austria
- Igor Jurisica
  - Krembil Research Institute, University Health Network; Departments of Medical Biophysics and Computer Science, University of Toronto; Institute of Neuroimmunology, Slovak Academy of Sciences, Toronto, Canada
- Andreas Holzinger
  - Holzinger Group, Institute for Medical Informatics, Statistics and Documentation, Medical University Graz, Graz, Austria
  - Institute of Interactive Systems and Data Science, Graz University of Technology, Graz, Austria
16.
El-Sappagh S, Kwak D, Ali F, Kwak KS. DMTO: a realistic ontology for standard diabetes mellitus treatment. J Biomed Semantics 2018; 9:8. [PMID: 29409535] [PMCID: PMC5800094] [DOI: 10.1186/s13326-018-0176-y]
Abstract
BACKGROUND Treatment of type 2 diabetes mellitus (T2DM) is a complex problem. A clinical decision support system (CDSS) based on massive and distributed electronic health record data can facilitate the automation of this process and enhance its accuracy. The most important component of any CDSS is its knowledge base, which can be formulated using ontologies. The formal description logic of an ontology supports the inference of hidden knowledge, but building a complete, coherent, consistent, interoperable, and sharable ontology is a challenge. RESULTS This paper introduces the first version of the newly constructed Diabetes Mellitus Treatment Ontology (DMTO) as a basis for shared-semantics, domain-specific, standard, machine-readable, and interoperable knowledge relevant to T2DM treatment. It is a comprehensive ontology that provides the highest coverage and the most complete picture of coded knowledge about T2DM patients' current conditions, previous profiles, and T2DM-related aspects, including complications, symptoms, lab tests, interactions, treatment plan (TP) frameworks, and glucose-related diseases and medications. It adheres to the design principles recommended by the Open Biomedical Ontologies Foundry and is based on ontological realism following the principles of the Basic Formal Ontology and the Ontology for General Medical Science. DMTO is implemented under Protégé 5.0 in Web Ontology Language (OWL) 2 format and is publicly available through the National Center for Biomedical Ontology's BioPortal at http://bioportal.bioontology.org/ontologies/DMTO. The current version of DMTO includes more than 10,700 classes, 277 relations, 39,425 annotations, 214 semantic rules, and 62,974 axioms. We provide a proof of concept for this approach to modeling TPs. CONCLUSION The ontology is able to collect and analyze most features of T2DM and to customize chronic TPs with the most appropriate drugs, foods, and physical exercises. DMTO is ready to be used as a knowledge base for semantically intelligent and distributed CDSSs.
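To make concrete the kind of inference a CDSS performs over such a knowledge base, here is a deliberately tiny, hypothetical sketch. DMTO itself encodes this knowledge as OWL classes and semantic rules; the rule, the eGFR threshold, and the patient fields below are invented for illustration only and are not clinical guidance or actual DMTO content.

```python
def recommend_first_line(patient):
    """Toy treatment-plan rule in plain Python, mimicking the sort of
    axiom a DMTO-backed CDSS would evaluate (illustrative only)."""
    if patient["diagnosis"] != "T2DM":
        return None                     # rule does not apply
    if patient["egfr"] < 30:            # hypothetical renal-function cutoff
        return "refer to specialist"
    return "metformin"

print(recommend_first_line({"diagnosis": "T2DM", "egfr": 75}))  # metformin
```

The advantage of expressing such rules in OWL 2 with description logic, as DMTO does, is that consistency checking and inference of hidden knowledge come from a reasoner rather than hand-written control flow.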
Affiliation(s)
- Shaker El-Sappagh
  - Information Systems Department, Faculty of Computers and Informatics, Benha University, Banha Mansura Road, Meit Ghamr - Benha, Banha, Al Qalyubia Governorate 3000-104, Egypt
- Daehan Kwak
  - Department of Computer Science, Kean University, Union, NJ 07083, USA
- Farman Ali
  - Department of Information and Communication Engineering, Inha University, 100 Inharo, Nam-gu, Incheon 22212, South Korea
- Kyung-Sup Kwak
  - Department of Information and Communication Engineering, Inha University, 100 Inharo, Nam-gu, Incheon 22212, South Korea
17.
Abstract
Computational manipulation of knowledge is an important, and often under-appreciated, aspect of biomedical Data Science. The first Data Science initiative from the US National Institutes of Health was entitled "Big Data to Knowledge (BD2K)." The main emphasis of the more than $200M allocated to that program has been on "Big Data;" the "Knowledge" component has largely been the implicit assumption that the work will lead to new biomedical knowledge. However, there is long-standing and highly productive work in computational knowledge representation and reasoning, and computational processing of knowledge has a role in the world of Data Science. Knowledge-based biomedical Data Science involves the design and implementation of computer systems that act as if they knew about biomedicine. There are many ways in which a computational approach might act as if it knew something: for example, it might be able to answer a natural language question about a biomedical topic, or pass an exam; it might be able to use existing biomedical knowledge to rank or evaluate hypotheses; it might explain or interpret data in light of prior knowledge, either in a Bayesian or other sort of framework. These are all examples of automated reasoning that act on computational representations of knowledge. After a brief survey of existing approaches to knowledge-based data science, this position paper argues that such research is ripe for expansion, and expanded application.
Affiliation(s)
- Lawrence E Hunter
  - Computational Bioscience, University of Colorado School of Medicine, Aurora, CO 80045, USA; ORCID: https://orcid.org/0000-0003-1455-3370
18.
Savova GK, Tseytlin E, Finan S, Castine M, Miller T, Medvedeva O, Harris D, Hochheiser H, Lin C, Chavan G, Jacobson RS. DeepPhe: A Natural Language Processing System for Extracting Cancer Phenotypes from Clinical Records. Cancer Res 2017; 77:e115-e118. [PMID: 29092954] [DOI: 10.1158/0008-5472.can-17-0615]
Abstract
Precise phenotype information is needed to understand the effects of genetic and epigenetic changes on tumor behavior and responsiveness. Extraction and representation of cancer phenotypes are currently performed mostly manually, making it difficult to correlate phenotypic data with genomic data; moreover, genomic data are being produced at an increasingly fast pace, exacerbating the problem. The DeepPhe software enables automated extraction of detailed phenotype information from the electronic medical records of cancer patients. The system implements advanced natural language processing and knowledge engineering methods within a flexible modular architecture, and was evaluated using a manually annotated dataset of University of Pittsburgh Medical Center breast cancer patients. The resulting platform provides critical, previously missing methods for computational phenotyping. Working in tandem with advanced analysis of high-throughput sequencing, these approaches will further accelerate the transition to precision cancer treatment.
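The gap between manual abstraction and automated extraction that DeepPhe addresses can be illustrated with a toy example. This is emphatically not the DeepPhe system (which combines multiple NLP components and knowledge engineering within a modular architecture); it is a minimal regex sketch over a fabricated note fragment, showing the kind of structured phenotype attribute such systems produce at scale.

```python
import re

# Fabricated note fragment for illustration.
NOTE = "Invasive ductal carcinoma, left breast. ER positive, HER2 negative."

def extract_receptor_status(text):
    """Return {marker: status} pairs such as {'ER': 'positive'} from
    free text, using a naive pattern (toy example, not DeepPhe)."""
    return {m.group(1): m.group(2).lower()
            for m in re.finditer(r"\b(ER|PR|HER2)\s+(positive|negative)",
                                 text, re.IGNORECASE)}

print(extract_receptor_status(NOTE))  # {'ER': 'positive', 'HER2': 'negative'}
```

Once attributes like receptor status are machine-readable, correlating them with genomic data becomes a join rather than a chart review, which is the practical point of automated phenotyping.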
Affiliation(s)
- Guergana K Savova
  - Boston Children's Hospital, Boston, Massachusetts
  - Harvard Medical School, Boston, Massachusetts
- Eugene Tseytlin
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania
- Sean Finan
  - Boston Children's Hospital, Boston, Massachusetts
- Melissa Castine
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania
- Timothy Miller
  - Boston Children's Hospital, Boston, Massachusetts
  - Harvard Medical School, Boston, Massachusetts
- Olga Medvedeva
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania
- David Harris
  - Boston Children's Hospital, Boston, Massachusetts
- Harry Hochheiser
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania
- Chen Lin
  - Boston Children's Hospital, Boston, Massachusetts
- Girish Chavan
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania
- Rebecca S Jacobson
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania
  - University of Pittsburgh Cancer Institute, Pittsburgh, Pennsylvania
19.
Dhombres F, Charlet J. Knowledge Representation and Management, It's Time to Integrate! Yearb Med Inform 2017; 26:148-151. [PMID: 29063556] [DOI: 10.15265/iy-2017-030]
Abstract
Objectives: To select, present, and summarize the best papers published in 2016 in the field of Knowledge Representation and Management (KRM). Methods: A comprehensive and standardized review of the medical informatics literature was performed based on a PubMed query. Results: Among the 1,421 retrieved papers, the review process resulted in the selection of four best papers focused on the integration of heterogeneous data via the development and alignment of terminological resources. In the first article, the authors provide a curated and standardized version of the publicly available US FDA Adverse Event Reporting System; such a resource improves the quality of the underlying data and enables standardized analyses using common vocabularies. The second article describes a project developed to facilitate heterogeneous data integration in the i2b2 framework; its originality lies in allowing users to integrate data described in different terminologies and to build a new repository with a single model able to represent the various data. The third paper models the association between multiple phenotypic traits described in the Human Phenotype Ontology (HPO) and the corresponding genotype in the specific context of rare diseases (rare variants). Finally, the fourth paper presents solutions for annotation-ontology mapping in genome-scale data; of particular interest in this work are the Experimental Factor Ontology (EFO) and its generic association model, the Ontology of Biomedical AssociatioN (OBAN). Conclusion: Ontologies have started to show their efficiency in integrating medical data for various tasks in medical informatics: electronic health record data management, clinical research, and knowledge-based system development.
20.
Rosenbloom ST, Carroll RJ, Warner JL, Matheny ME, Denny JC. Representing Knowledge Consistently Across Health Systems. Yearb Med Inform 2017; 26:139-147. [PMID: 29063555] [DOI: 10.15265/iy-2017-018]
Abstract
Objectives: Electronic health records (EHRs) have increasingly emerged as a powerful source of clinical data that can be leveraged for reuse in research and in modular health apps that integrate with diverse health information technologies. A key challenge for these use cases is representing the knowledge contained within data from different EHR systems in a uniform fashion. Methods: We reviewed several recent studies covering knowledge representation in the common data models of the Observational Medical Outcomes Partnership (OMOP) and its Observational Health Data Sciences and Informatics program, and of the United States Patient Centered Outcomes Research Network (PCORnet). We also reviewed the Health Level 7 Fast Healthcare Interoperability Resources standard, which supports app-like programs that can be used across multiple EHR and research systems. Results: There has been recent growth in high-impact efforts to support quality-assured and standardized clinical data sharing across different institutions and EHR systems. We focused on three major efforts within a larger landscape moving towards shareable, transportable, and computable clinical data. Conclusion: The growth in common data models supporting interoperable knowledge representation portends an increasing availability of high-quality clinical data in support of research. Building on these efforts will allow a future in which significant portions of the world's population can share their data for research.
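The mapping work these common data models demand can be sketched with a minimal example. The field names below follow the OMOP CDM `condition_occurrence` table, but the local-code-to-concept lookup is invented for illustration (assuming, as is standard in OMOP, that unmappable codes receive concept_id 0 while the source code is preserved in a `_source_value` column).

```python
# Illustrative one-row ETL from a source EHR code to an OMOP-style record.
LOCAL_TO_OMOP_CONCEPT = {
    "ICD10:E11.9": 201826,  # OMOP standard concept for type 2 diabetes mellitus
}

def to_condition_occurrence(person_id, source_code, start_date):
    """Translate a local diagnosis code into an OMOP condition_occurrence
    row (dict stand-in); 0 marks an unmapped concept."""
    return {
        "person_id": person_id,
        "condition_concept_id": LOCAL_TO_OMOP_CONCEPT.get(source_code, 0),
        "condition_start_date": start_date,
        "condition_source_value": source_code,  # original code kept for audit
    }

row = to_condition_occurrence(42, "ICD10:E11.9", "2016-05-01")
print(row["condition_concept_id"])  # 201826
```

Whether the target is OMOP, PCORnet, or FHIR, the uniform-representation challenge the review describes reduces to maintaining and governing exactly this kind of concept mapping at scale.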
21.
Gonzalez-Hernandez G, Sarker A, O’Connor K, Savova G. Capturing the Patient's Perspective: a Review of Advances in Natural Language Processing of Health-Related Text. Yearb Med Inform 2017; 26:214-227. [PMID: 29063568] [PMCID: PMC6250990] [DOI: 10.15265/iy-2017-029]
Abstract
Background: Natural language processing (NLP) methods are increasingly being used to mine knowledge from unstructured health-related texts. Recent advances in noisy-text processing are enabling researchers and medical domain experts to go beyond the information encapsulated in published texts (e.g., clinical trials and systematic reviews) and structured questionnaires, and to obtain perspectives from other unstructured sources such as electronic health records (EHRs) and social media posts. Objectives: To review the recently published literature on the application of NLP techniques for mining health-related information from EHRs and social media posts. Methods: The literature review covered research published over the last five years, based on searches of PubMed, conference proceedings, and the ACM Digital Library, as well as relevant publications referenced in papers. We focused particularly on the techniques employed on EHR and social media data. Results: A set of 62 studies involving EHRs and 87 studies involving social media matched our criteria and were included in this paper. We present the purposes of these studies, outline the key NLP contributions, and discuss the general trends observed in the field, the current state of research, and important outstanding problems. Conclusions: Over recent years, there has been a continuing transition from lexical and rule-based systems to learning-based approaches, driven by the growth of annotated data sets and advances in data science. For EHRs, publicly available annotated data are still scarce, which acts as an obstacle to research progress. In contrast, research on social media mining has grown rapidly, particularly because the large amount of unlabeled data available via this resource compensates for the uncertainty inherent in the data.
Effective mechanisms to filter out noise and to map social media expressions to standard medical concepts are crucial open research problems. Shared tasks and other competitive challenges have been driving factors behind the implementation of open systems, and they are likely to play an important role in the development of future systems.
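The "map social media expressions to standard medical concepts" problem named here can be illustrated with a deliberately small sketch. The lexicon below is hand-invented for the example; production systems learn or curate far larger normalization resources and combine them with fuzzy matching and machine learning.

```python
# Toy normalization lexicon: colloquial trigger phrase -> standard concept.
COLLOQUIAL_LEXICON = {
    "can't sleep": "insomnia",
    "threw up": "vomiting",
    "head is pounding": "headache",
}

def normalize(post):
    """Return, sorted, the standard concepts whose colloquial triggers
    appear as substrings of a lowercased post (naive lookup sketch)."""
    text = post.lower()
    return sorted({concept for phrase, concept in COLLOQUIAL_LEXICON.items()
                   if phrase in text})

print(normalize("Day 3 on the new med and I threw up twice, head is pounding"))
# ['headache', 'vomiting']
```

Even this naive lookup makes the noise-filtering point concrete: every phrase not in the lexicon is silently dropped, so the coverage and quality of the mapping resource bound what the downstream analysis can see.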
Affiliation(s)
- G. Gonzalez-Hernandez
  - Department of Epidemiology, Biostatistics, and Informatics, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- A. Sarker
  - Department of Epidemiology, Biostatistics, and Informatics, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- K. O’Connor
  - Department of Epidemiology, Biostatistics, and Informatics, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- G. Savova
  - Boston Children’s Hospital and Harvard Medical School, Boston, MA, USA
22.
Jeanquartier F, Jean-Quartier C, Kotlyar M, Tokar T, Hauschild AC, Jurisica I, Holzinger A. Machine Learning for In Silico Modeling of Tumor Growth. Lecture Notes in Computer Science 2016. [DOI: 10.1007/978-3-319-50478-0_21]