1. Zhang H, Lyu T, Yin P, Bost S, He X, Guo Y, Prosperi M, Hogan WR, Bian J. A scoping review of semantic integration of health data and information. Int J Med Inform 2022; 165:104834. [PMID: 35863206 DOI: 10.1016/j.ijmedinf.2022.104834] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2022] [Revised: 07/06/2022] [Accepted: 07/13/2022] [Indexed: 11/25/2022]
Abstract
OBJECTIVE: We summarized a decade of new research on semantic data integration (SDI) since 2009, aiming to (1) summarize the state-of-the-art approaches to integrating health data and information; and (2) identify the main gaps and challenges in integrating health data and information from multiple levels and domains. MATERIALS AND METHODS: Because our focus is applications of SDI in biomedical domains, we searched PubMed and followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines to search for and report relevant studies published between January 1, 2009 and December 31, 2021. We used Covidence, a systematic review management system, to carry out this scoping review. RESULTS: The initial PubMed search using the two sets of keywords returned 5,326 articles. We then removed 44 duplicates, retaining 5,282 articles for abstract screening. After abstract screening, we included 246 articles for full-text screening, of which 87 were deemed eligible for full-text extraction. We summarized the 87 articles from four aspects: (1) methods for the global schema; (2) data integration strategies (i.e., federated system vs. data warehousing); (3) the sources of the data; and (4) downstream applications. CONCLUSION: SDI approaches can effectively resolve semantic heterogeneities across different data sources. We identified two key gaps and challenges in existing SDI studies: (1) many used data from only single-level data sources (e.g., integrating individual-level patient records from different hospital systems), and (2) documentation of the data integration processes is sparse, threatening the reproducibility of SDI studies.
Affiliation(s)
- Hansi Zhang
  - Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, United States
- Tianchen Lyu
  - Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, United States
- Pengfei Yin
  - Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, United States
- Sarah Bost
  - Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, United States
- Xing He
  - Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, United States
- Yi Guo
  - Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, United States
- Mattia Prosperi
  - Department of Epidemiology, College of Medicine, University of Florida, Gainesville, FL, United States
- William R Hogan
  - Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, United States
- Jiang Bian
  - Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, United States
2. Gagalova KK, Leon Elizalde MA, Portales-Casamar E, Görges M. What You Need to Know Before Implementing a Clinical Research Data Warehouse: Comparative Review of Integrated Data Repositories in Health Care Institutions. JMIR Form Res 2020; 4:e17687. [PMID: 32852280 PMCID: PMC7484778 DOI: 10.2196/17687] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 01/03/2020] [Revised: 06/09/2020] [Accepted: 07/17/2020] [Indexed: 12/23/2022]
Abstract
Background: Integrated data repositories (IDRs), also referred to as clinical data warehouses, are platforms used for the integration of several data sources through specialized analytical tools that facilitate data processing and analysis. IDRs offer several opportunities for clinical data reuse, and the number of institutions implementing an IDR has grown steadily in the past decade. Objective: The architectural choices of major IDRs are highly diverse and determining their differences can be overwhelming. This review aims to explore the underlying models and common features of IDRs, provide a high-level overview for those entering the field, and propose a set of guiding principles for small- to medium-sized health institutions embarking on IDR implementation. Methods: We reviewed manuscripts published in peer-reviewed scientific literature between 2008 and 2020, and selected those that specifically describe IDR architectures. Of 255 shortlisted articles, we found 34 articles describing 29 different architectures. The different IDRs were analyzed for common features and classified according to their data processing and integration solution choices. Results: Despite common trends in the selection of standard terminologies and data models, the IDRs examined showed heterogeneity in the underlying architecture design. We identified 4 common architecture models that use different approaches for data processing and integration. These different approaches were driven by a variety of features such as data sources, whether the IDR was for a single institution or a collaborative project, the intended primary data user, and purpose (research-only or including clinical or operational decision making). Conclusions: IDR implementations are diverse and complex undertakings, which benefit from being preceded by an evaluation of requirements and definition of scope in the early planning stage. Factors such as data source diversity and intended users of the IDR influence data flow and synchronization, both of which are crucial factors in IDR architecture planning.
Affiliation(s)
- Kristina K Gagalova
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, Canada; Bioinformatics Graduate Program, University of British Columbia, Vancouver, BC, Canada; Research Institute, BC Children's Hospital, Vancouver, BC, Canada
- M Angelica Leon Elizalde
  - Research Institute, BC Children's Hospital, Vancouver, BC, Canada; School of Population and Public Health, University of British Columbia, Vancouver, BC, Canada
- Elodie Portales-Casamar
  - Research Institute, BC Children's Hospital, Vancouver, BC, Canada; Department of Pediatrics, University of British Columbia, Vancouver, BC, Canada
- Matthias Görges
  - Research Institute, BC Children's Hospital, Vancouver, BC, Canada; Department of Anesthesiology, Pharmacology and Therapeutics, University of British Columbia, Vancouver, BC, Canada
3. Nydegger U, Lung T, Risch L, Risch M, Medina Escobar P, Bodmer T. Inflammation Thread Runs across Medical Laboratory Specialities. Mediators Inflamm 2016; 2016:4121837. [PMID: 27493451 PMCID: PMC4963559 DOI: 10.1155/2016/4121837] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 03/31/2016] [Accepted: 05/31/2016] [Indexed: 12/16/2022]
Abstract
We work on the assumption that four major specialities or sectors of medical laboratory assays, comprising clinical chemistry, haematology, immunology, and microbiology, embraced by genome sequencing techniques, are routinely in use. Medical laboratory markers for inflammation serve as a model: they are allotted to most fields of medical lab assays, including genomics. Incessant coding of assays aligns each of them in the long lists of big data. As exemplified by the complement gene family, containing C2, C3, C8A, C8B, CFH, CFI, and ITGB2, heritability patterns and risk factors associated with diseases involving genetic glitches in complement components are unfolding. The C4 component serum levels depend on sufficient vitamin D, whilst low vitamin D is inversely related to IgG1, IgA, and C3, linking vitamin sufficiency to innate immunity. Whole genome sequencing of microbial organisms may distinguish virulent from nonvirulent and antibiotic resistant from nonresistant varieties of the same species and thus can be listed in personal big data banks including microbiological pathology; the big data warehouse continues to grow.
Affiliation(s)
- Urs Nydegger
  - Labormedizinisches Zentrum Dr. Risch and Kantonsspital Graubünden, 7000 Chur, Switzerland
- Thomas Lung
  - Labormedizinisches Zentrum Dr. Risch and Kantonsspital Graubünden, 7000 Chur, Switzerland
- Lorenz Risch
  - Labormedizinisches Zentrum Dr. Risch and Kantonsspital Graubünden, 7000 Chur, Switzerland
- Martin Risch
  - Labormedizinisches Zentrum Dr. Risch and Kantonsspital Graubünden, 7000 Chur, Switzerland
- Pedro Medina Escobar
  - Labormedizinisches Zentrum Dr. Risch and Kantonsspital Graubünden, 7000 Chur, Switzerland
- Thomas Bodmer
  - Labormedizinisches Zentrum Dr. Risch and Kantonsspital Graubünden, 7000 Chur, Switzerland
4. Szakonyi D. LEAFDATA: a literature-curated database for Arabidopsis leaf development. Plant Methods 2016; 12:15. [PMID: 26884807 PMCID: PMC4754890 DOI: 10.1186/s13007-016-0115-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 12/01/2015] [Accepted: 02/08/2016] [Indexed: 06/02/2023]
Abstract
BACKGROUND: In the post-genomic era, biological databases provide easy access to a wide variety of scientific data. The vast quantity of literature calls for curated databases where existing knowledge is carefully organized in order to aid novel discoveries. Leaves, the main photosynthetic organs, are not only vital for plant growth but also essential for maintaining the global ecosystem by producing oxygen and food. Therefore, studying and understanding leaf formation and growth are key objectives in biology. Arabidopsis thaliana remains to date the prime experimental model organism in plant science. DESCRIPTION: LEAFDATA was created as an easily accessible and searchable web tool to assemble a relevant collection of Arabidopsis leaf literature. LEAFDATA currently contains 13,553 categorized statements from 380 processed publications. LEAFDATA can be searched for genes of interest using Arabidopsis Genome Initiative identifiers, and for selected papers by means of PubMed IDs, authors and specific keywords. The results page contains details of the original publications, text fragments from the curated literature grouped according to information types, and direct links to PubMed pages of the original papers. CONCLUSIONS: The LEAFDATA database offers access to searchable entries curated from a large number of scientific publications. Due to the unprecedented detail of its annotations and the fact that LEAFDATA already provides records about approximately 1600 individual loci, this database is useful for the entire plant research community.
Affiliation(s)
- Dóra Szakonyi
  - Instituto Gulbenkian de Ciência, 2780-156 Oeiras, Portugal
5.
Abstract
The field of pathology is rapidly transforming from a semiquantitative and empirical science toward a big data discipline. Large data sets from across multiple omics fields may now be extracted from a patient's tissue sample. Tissue is, however, complex, heterogeneous, and prone to artifact. A reductionist view of tissue and disease progression, which does not take this complexity into account, may lead to single biomarkers failing in clinical trials. The integration of standardized multi-omics big data and the retention of valuable information on spatial heterogeneity are imperative to model complex disease mechanisms. Mathematical modeling through systems pathology approaches is the ideal medium to distill the significant information from these large, multi-parametric, and hierarchical data sets. Systems pathology may also predict the dynamical response of disease progression or response to therapy regimens from a static tissue sample. Next-generation pathology will incorporate big data with systems medicine in order to personalize clinical practice for both prognostic and predictive patient care.
Affiliation(s)
- Peter D Caie
  - Quantitative and systems pathology, University of St Andrews, North Haugh, Fife, St Andrews, KY16 9TF, UK
- David J Harrison
  - Quantitative and systems pathology, University of St Andrews, North Haugh, Fife, St Andrews, KY16 9TF, UK
6. Xu J, Rasmussen LV, Shaw PL, Jiang G, Kiefer RC, Mo H, Pacheco JA, Speltz P, Zhu Q, Denny JC, Pathak J, Thompson WK, Montague E. Review and evaluation of electronic health records-driven phenotype algorithm authoring tools for clinical and translational research. J Am Med Inform Assoc 2015. [PMID: 26224336 DOI: 10.1093/jamia/ocv070] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.2] [Indexed: 12/30/2022]
Abstract
OBJECTIVE: To review and evaluate available software tools for electronic health record-driven phenotype authoring in order to identify gaps and needs for future development. MATERIALS AND METHODS: Candidate phenotype authoring tools were identified through (1) a literature search in four publication databases (PubMed, Embase, Web of Science, and Scopus) and (2) a web search. A collection of tools was compiled and reviewed after the searches. A survey was designed and distributed to the developers of the reviewed tools to discover their functionalities and features. RESULTS: Twenty-four different phenotype authoring tools were identified and reviewed. Developers of 16 of these identified tools completed the evaluation survey (67% response rate). The surveyed tools showed commonalities but also varied in their capabilities in algorithm representation, logic functions, data support and software extensibility, search functions, user interface, and data outputs. DISCUSSION: Positive trends identified in the evaluation included: algorithms can be represented in both computable and human-readable formats; and most tools offer a web interface for easy access. However, issues were also identified: many tools lacked advanced logic functions for authoring complex algorithms; the ability to construct queries that leverage unstructured data was not widely implemented; and many tools had limited support for plug-ins or external analytic software. CONCLUSIONS: Existing phenotype authoring tools could enable clinical researchers to work with electronic health record data more efficiently, but gaps still exist in terms of the functionalities of such tools. The present work can serve as a reference point for the future development of similar tools.
Affiliation(s)
- Jie Xu
  - Department of Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Luke V Rasmussen
  - Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Pamela L Shaw
  - Galter Health Science Library, Clinical and Translational Sciences Institute (NUCATS), Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Guoqian Jiang
  - Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Richard C Kiefer
  - Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Huan Mo
  - Department of Biomedical Informatics, School of Medicine, Vanderbilt University, Nashville, TN, USA
- Jennifer A Pacheco
  - Center for Genetic Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Peter Speltz
  - Department of Biomedical Informatics, School of Medicine, Vanderbilt University, Nashville, TN, USA
- Qian Zhu
  - Department of Information Systems, University of Maryland, Baltimore County (UMBC), Baltimore, MD, USA
- Joshua C Denny
  - Department of Biomedical Informatics, School of Medicine, Vanderbilt University, Nashville, TN, USA
- Jyotishman Pathak
  - Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- William K Thompson
  - Center for Biomedical Research Informatics, NorthShore University Health System, Evanston, IL, USA
- Enid Montague
  - Department of Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
7. Ferretti Y, Miyoshi NSB, Silva WA, Felipe JC. BioBankWarden: A web-based system to support translational cancer research by managing clinical and biomaterial data. Comput Biol Med 2015; 84:254-261. [PMID: 25959800 DOI: 10.1016/j.compbiomed.2015.04.008] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 04/30/2014] [Revised: 02/10/2015] [Accepted: 04/04/2015] [Indexed: 02/04/2023]
Abstract
BACKGROUND: Researchers in translational medicine face numerous challenges in attempting to bring research results to the bedside. This field of research covers a wide range of resources, including blood and tissue samples, which are processed for isolation of RNA and DNA to study cancer omics data (genomics, proteomics and metabolomics). Clinical information about patients' habits, family history, physical examinations, remissions, etc., is also important to underpin studies aimed at identifying patterns that lead to the development of cancer and to its successful treatment. PURPOSE: Development of a web-based computer system, BioBankWarden, to manage, consolidate and integrate these diversified data, enabling cancer research groups to retrieve and analyze clinical and biomolecular data within an integrative environment. The system has a three-tier architecture comprising database, logic and user-interface layers. RESULTS: The system's integrated database and user-friendly interface allow for the control of patient records, biomaterial storage, research groups, research projects, users and biomaterial exchange. CONCLUSIONS: BioBankWarden can be used to store and retrieve specific information from different clinical fields linked to biomaterials collected from patients, providing the functionalities required to support translational research in the field of cancer.
Affiliation(s)
- Yuri Ferretti
  - Inter-institutional Post-graduation Program on Bioinformatics, University of São Paulo, Brazil; Department of Computing and Mathematics, Faculty of Philosophy Sciences and Languages of Ribeirão Preto, University of São Paulo, Brazil
- Wilson Araújo Silva
  - Center for Integrative Systems Biology - CISBi, NAP/USP, University of São Paulo, Brazil; Department of Genetics, School of Medicine of Ribeirão Preto, University of São Paulo, Brazil
- Joaquim Cezar Felipe
  - Inter-institutional Post-graduation Program on Bioinformatics, University of São Paulo, Brazil; Department of Computing and Mathematics, Faculty of Philosophy Sciences and Languages of Ribeirão Preto, University of São Paulo, Brazil; Center for Integrative Systems Biology - CISBi, NAP/USP, University of São Paulo, Brazil