Zhang H, Lyu T, Yin P, Bost S, He X, Guo Y, Prosperi M, Hogan WR, Bian J. A scoping review of semantic integration of health data and information.
Int J Med Inform 2022;
165:104834. [PMID:
35863206 DOI:
10.1016/j.ijmedinf.2022.104834]
[Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/21/2022] [Revised: 07/06/2022] [Accepted: 07/13/2022] [Indexed: 11/25/2022]
Abstract
OBJECTIVE
We summarized a decade of new research focusing on semantic data integration (SDI) since 2009, and we aim to: (1) summarize the state-of-art approaches on integrating health data and information; and (2) identify the main gaps and challenges of integrating health data and information from multiple levels and domains.
MATERIALS AND METHODS
We used PubMed as our focus is applications of SDI in biomedical domains and followed the Preferred Reporting Items for Systematic Review and Meta-Analyses (PRISMA) to search and report for relevant studies published between January 1, 2009 and December 31, 2021. We used Covidence-a systematic review management system-to carry out this scoping review.
RESULTS
The initial search from PubMed resulted in 5,326 articles using the two sets of keywords. We then removed 44 duplicates and 5,282 articles were retained for abstract screening. After abstract screening, we included 246 articles for full-text screening, among which 87 articles were deemed eligible for full-text extraction. We summarized the 87 articles from four aspects: (1) methods for the global schema; (2) data integration strategies (i.e., federated system vs. data warehousing); (3) the sources of the data; and (4) downstream applications.
CONCLUSION
SDI approach can effectively resolve the semantic heterogeneities across different data sources. We identified two key gaps and challenges in existing SDI studies that (1) many of the existing SDI studies used data from only single-level data sources (e.g., integrating individual-level patient records from different hospital systems), and (2) documentation of the data integration processes is sparse, threatening the reproducibility of SDI studies.
Collapse