1
Zhang H, Lyu T, Yin P, Bost S, He X, Guo Y, Prosperi M, Hogan WR, Bian J. A scoping review of semantic integration of health data and information. Int J Med Inform 2022; 165:104834. [PMID: 35863206 DOI: 10.1016/j.ijmedinf.2022.104834] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2022] [Revised: 07/06/2022] [Accepted: 07/13/2022] [Indexed: 11/25/2022]
Abstract
OBJECTIVE We summarized a decade of new research on semantic data integration (SDI) since 2009, aiming to: (1) summarize the state-of-the-art approaches to integrating health data and information; and (2) identify the main gaps and challenges in integrating health data and information from multiple levels and domains. MATERIALS AND METHODS Because our focus is applications of SDI in biomedical domains, we searched PubMed and followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines to search for and report relevant studies published between January 1, 2009 and December 31, 2021. We used Covidence, a systematic review management system, to carry out this scoping review. RESULTS The initial PubMed search using the two sets of keywords returned 5,326 articles. We removed 44 duplicates and retained 5,282 articles for abstract screening. After abstract screening, we included 246 articles for full-text screening, of which 87 were deemed eligible for full-text extraction. We summarized the 87 articles from four aspects: (1) methods for the global schema; (2) data integration strategies (i.e., federated system vs. data warehousing); (3) the sources of the data; and (4) downstream applications. CONCLUSION The SDI approach can effectively resolve semantic heterogeneities across different data sources. We identified two key gaps and challenges in existing SDI studies: (1) many studies used data from only single-level data sources (e.g., integrating individual-level patient records from different hospital systems), and (2) documentation of the data integration processes is sparse, threatening the reproducibility of SDI studies.
Affiliation(s)
- Hansi Zhang
  - Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, United States
- Tianchen Lyu
  - Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, United States
- Pengfei Yin
  - Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, United States
- Sarah Bost
  - Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, United States
- Xing He
  - Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, United States
- Yi Guo
  - Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, United States
- Mattia Prosperi
  - Department of Epidemiology, College of Medicine, University of Florida, Gainesville, FL, United States
- William R Hogan
  - Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, United States
- Jiang Bian
  - Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, United States
2
Vlaanderen J, de Hoogh K, Hoek G, Peters A, Probst-Hensch N, Scalbert A, Melén E, Tonne C, de Wit GA, Chadeau-Hyam M, Katsouyanni K, Esko T, Jongsma KR, Vermeulen R. Developing the building blocks to elucidate the impact of the urban exposome on cardiometabolic-pulmonary disease: The EU EXPANSE project. Environ Epidemiol 2021; 5:e162. [PMID: 34414346 PMCID: PMC8367039 DOI: 10.1097/ee9.0000000000000162] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 12/23/2020] [Accepted: 06/01/2021] [Indexed: 12/30/2022] [Open Access]
Abstract
By 2030, more than 80% of Europe's population will live in an urban environment. The urban exposome, consisting of factors such as where we live and work, where and what we eat, our social network, and what chemical and physical hazards we are exposed to, provides important targets to improve population health. The EXPANSE (EXposome Powered tools for healthy living in urbAN SEttings) project will study the impact of the urban exposome on the major contributors to Europe's burden of disease: Cardio-Metabolic and Pulmonary Disease. EXPANSE will address one of the most pertinent questions for urban planners, policy makers, and European citizens: "How to maximize one's health in a modern urban environment?" EXPANSE will take the next step in exposome research by (1) bringing together exposome and health data of more than 55 million adult Europeans and OMICS information for more than 2 million Europeans; (2) performing personalized exposome assessment for 5,000 individuals in five urban regions; (3) applying ultra-high-resolution mass-spectrometry to screen for chemicals in 10,000 blood samples; (4) evaluating the evolution of the exposome and health through the life course; and (5) evaluating the impact of changes in the urban exposome on the burden of cardiometabolic and pulmonary disease. EXPANSE will translate its insights and innovations into research and dissemination tools that will be openly accessible via the EXPANSE toolbox. By applying innovative ethics-by-design throughout the project, the social and ethical acceptability of these tools will be safeguarded. EXPANSE is part of the European Human Exposome Network.
Affiliation(s)
- Jelle Vlaanderen
  - Institute for Risk Assessment Sciences, Utrecht University, Utrecht, The Netherlands
- Kees de Hoogh
  - Institute for Risk Assessment Sciences, Utrecht University, Utrecht, The Netherlands
  - Swiss Tropical Health, Basel, Switzerland
  - University of Basel, Switzerland
- Gerard Hoek
  - Institute for Risk Assessment Sciences, Utrecht University, Utrecht, The Netherlands
- Annette Peters
  - Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
- Augustin Scalbert
  - International Agency for Research on Cancer (IARC), Biomarkers Group, Lyon, France
- Erik Melén
  - Department of Clinical Science and Education Södersjukhuset, Karolinska Institutet, Stockholm, Sweden
- Cathryn Tonne
  - Barcelona Institute for Global Health (ISGlobal), Universitat Pompeu Fabra, CIBER Epidemiología y Salud Pública, Barcelona, Spain
- G Ardine de Wit
  - Department of health care innovation and evaluation, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
  - Centre for Nutrition, Prevention and Healthcare, National Institute of Public Health and the Environment, Bilthoven, The Netherlands
- Marc Chadeau-Hyam
  - Institute for Risk Assessment Sciences, Utrecht University, Utrecht, The Netherlands
  - Imperial College London, London, United Kingdom
- Klea Katsouyanni
  - Imperial College London, London, United Kingdom
  - National and Kapodistrian University of Athens, Athens, Greece
- Karin R Jongsma
  - Department of Medical Humanities, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Roel Vermeulen
  - Institute for Risk Assessment Sciences, Utrecht University, Utrecht, The Netherlands
  - Department of health care innovation and evaluation, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
  - Imperial College London, London, United Kingdom
3
Warmink-Perdijk WDB, Peters LL, Tigchelaar EF, Dekens JAM, Jankipersadsing SA, Zhernakova A, Bossers WJR, Sikkema J, de Jonge A, Reijneveld SA, Verkade HJ, Koppelman GH, Wijmenga C, Kuipers F, Scherjon SA. Lifelines NEXT: a prospective birth cohort adding the next generation to the three-generation Lifelines cohort study. Eur J Epidemiol 2020; 35:157-168. [PMID: 32100173 PMCID: PMC7125065 DOI: 10.1007/s10654-020-00614-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 06/24/2019] [Accepted: 02/07/2020] [Indexed: 01/10/2023]
Abstract
Epidemiological research has shown a strong relationship between preconceptional, prenatal, birth and early-life factors and lifelong health. Lifelines NEXT is a birth cohort designed to study the effects of intrinsic and extrinsic determinants on health and disease in a four-generation design. It is embedded within the Lifelines cohort study, a prospective three-generation population-based cohort study recording the health and health-related aspects of 167,729 individuals living in Northern Netherlands. In Lifelines NEXT we aim to include 1500 pregnant Lifelines participants and intensively follow them, their partners and their children until at least 1 year after birth. Longer-term follow-up of physical and psychological health will then be embedded following Lifelines procedures. During the Lifelines NEXT study period, biomaterials (including maternal and neonatal (cord) blood, placental tissue, feces, breast milk, nasal swabs and urine) will be collected from the mother and child at 10 time points. We will also collect data on medical, social, lifestyle and environmental factors via questionnaires at 14 different time points and continuous data via connected devices. The extensive collection of different (bio)materials from mother and child during pregnancy and afterwards will provide the means to relate environmental factors (including maternal and neonatal microbiome composition) to (epi)genetics, health and developmental outcomes. The nesting of the study within Lifelines enables us to include preconceptional transgenerational data and can be used to identify other extended families within the cohort.
Affiliation(s)
- Willemijn D B Warmink-Perdijk
  - Department of Midwifery Science, Amsterdam Public Health Research Institute, Amsterdam UMC, Vrije Universiteit Amsterdam, Van de Boechorstraat 7, 1081 BT, Amsterdam, The Netherlands
  - Department of General Practice and Elderly Medicine, University Medical Center Groningen, University of Groningen, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands
  - AVAG (Academy Midwifery Amsterdam and Groningen), Dirk Huizingastraat 3-5, 9713 GL, Groningen, The Netherlands
- Lilian L Peters
  - Department of Midwifery Science, Amsterdam Public Health Research Institute, Amsterdam UMC, Vrije Universiteit Amsterdam, Van de Boechorstraat 7, 1081 BT, Amsterdam, The Netherlands
  - Department of General Practice and Elderly Medicine, University Medical Center Groningen, University of Groningen, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands
  - AVAG (Academy Midwifery Amsterdam and Groningen), Dirk Huizingastraat 3-5, 9713 GL, Groningen, The Netherlands
- Ettje F Tigchelaar
  - Department of Genetics, University Medical Center Groningen, University of Groningen, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands
- Jackie A M Dekens
  - Department of Genetics, University Medical Center Groningen, University of Groningen, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands
  - Center for Development and Innovation, University Medical Center Groningen, University of Groningen, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands
- Soesma A Jankipersadsing
  - Department of Genetics, University Medical Center Groningen, University of Groningen, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands
- Alexandra Zhernakova
  - Department of Genetics, University Medical Center Groningen, University of Groningen, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands
- Willem J R Bossers
  - Lifelines Cohort Study, Bloemsingel 1, 9713 BZ, Groningen, The Netherlands
- Jan Sikkema
  - Center for Development and Innovation, University Medical Center Groningen, University of Groningen, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands
- Ank de Jonge
  - Department of Midwifery Science, Amsterdam Public Health Research Institute, Amsterdam UMC, Vrije Universiteit Amsterdam, Van de Boechorstraat 7, 1081 BT, Amsterdam, The Netherlands
  - AVAG (Academy Midwifery Amsterdam and Groningen), Dirk Huizingastraat 3-5, 9713 GL, Groningen, The Netherlands
- Sijmen A Reijneveld
  - Department of Health Sciences, University Medical Center Groningen, University of Groningen, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands
- Henkjan J Verkade
  - Department of Pediatrics, Pediatric Gastroenterology - Hepatology, University Medical Center Groningen, University of Groningen, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands
- Gerard H Koppelman
  - Department of Pediatric Pulmonology and Pediatric Allergy, Beatrix Children's Hospital, University Medical Center Groningen, University of Groningen, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands
  - Groningen Research Institute for Asthma and COPD (GRIAC), University Medical Center Groningen, University of Groningen, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands
- Cisca Wijmenga
  - Department of Genetics, University Medical Center Groningen, University of Groningen, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands
- Folkert Kuipers
  - Department of Pediatrics/Laboratory Medicine, University Medical Center Groningen, University of Groningen, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands
- Sicco A Scherjon
  - Department of Obstetrics and Gynecology, University Medical Center Groningen, University of Groningen, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands
4
Kourou KD, Pezoulas VC, Georga EI, Exarchos TP, Tsanakas P, Tsiknakis M, Varvarigou T, De Vita S, Tzioufas A, Fotiadis DI. Cohort Harmonization and Integrative Analysis From a Biomedical Engineering Perspective. IEEE Rev Biomed Eng 2019; 12:303-318. [DOI: 10.1109/rbme.2018.2855055] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 11/05/2022]
5
Fostering population-based cohort data discovery: The Maelstrom Research cataloguing toolkit. PLoS One 2018; 13:e0200926. [PMID: 30040866 PMCID: PMC6057635 DOI: 10.1371/journal.pone.0200926] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Received: 03/15/2018] [Accepted: 07/05/2018] [Indexed: 01/08/2023] [Open Access]
Abstract
BACKGROUND The lack of accessible and structured documentation creates major barriers for investigators interested in understanding, properly interpreting and analyzing cohort data and biological samples. Providing the scientific community with open information is essential to optimize usage of these resources. A cataloguing toolkit is proposed by Maelstrom Research to answer these needs and support the creation of comprehensive and user-friendly study- and network-specific web-based metadata catalogues. METHODS Development of the Maelstrom Research cataloguing toolkit was initiated in 2004. It was supported by the exploration of existing catalogues and standards, and guided by input from partner initiatives having used or pilot tested incremental versions of the toolkit. RESULTS The cataloguing toolkit is built upon two main components: a metadata model and a suite of open-source software applications. The model sets out specific fields to describe study profiles; characteristics of the subpopulations of participants; timing and design of data collection events; and datasets/variables collected at each data collection event. It also includes the possibility to annotate variables with different classification schemes. When combined, the model and software support implementation of study and variable catalogues and provide a powerful search engine to facilitate data discovery. CONCLUSIONS The Maelstrom Research cataloguing toolkit already serves several national and international initiatives and the suite of software is available to new initiatives through the Maelstrom Research website. With the support of new and existing partners, we hope to ensure regular improvements of the toolkit.
6
Doiron D, Marcon Y, Fortier I, Burton P, Ferretti V. Software Application Profile: Opal and Mica: open-source software solutions for epidemiological data management, harmonization and dissemination. Int J Epidemiol 2018; 46:1372-1378. [PMID: 29025122 PMCID: PMC5837212 DOI: 10.1093/ije/dyx180] [Citation(s) in RCA: 47] [Impact Index Per Article: 7.8] [Accepted: 08/08/2017] [Indexed: 01/11/2023] [Open Access]
Abstract
Motivation: Improving the dissemination of information on existing epidemiological studies and facilitating the interoperability of study databases are essential to maximizing the use of resources and accelerating improvements in health. To address this, Maelstrom Research proposes Opal and Mica, two inter-operable open-source software packages providing out-of-the-box solutions for epidemiological data management, harmonization and dissemination. Implementation: Opal and Mica are two standalone but inter-operable web applications written in Java, JavaScript and PHP. They provide web services and modern user interfaces to access them. General features: Opal allows users to import, manage, annotate and harmonize study data. Mica is used to build searchable web portals disseminating study and variable metadata. When used conjointly, Mica users can securely query and retrieve summary statistics on geographically dispersed Opal servers in real-time. Integration with the DataSHIELD approach allows conducting more complex federated analyses involving statistical models. Availability: Opal and Mica are open-source and freely available at [www.obiba.org] under a General Public License (GPL) version 3, and the metadata models and taxonomies that accompany them are available under a Creative Commons licence.
Affiliation(s)
- Dany Doiron
  - Research Institute of the McGill University Health Centre, Montreal, QC, Canada
  - Swiss Tropical and Public Health Institute, Basel, Switzerland
  - University of Basel, Basel, Switzerland
- Yannick Marcon
  - Research Institute of the McGill University Health Centre, Montreal, QC, Canada
- Isabel Fortier
  - Research Institute of the McGill University Health Centre, Montreal, QC, Canada
- Paul Burton
  - University of Bristol, School of Social and Community Medicine, Bristol, UK
7
Bastião Silva L, Trifan A, Luís Oliveira J. MONTRA: An agile architecture for data publishing and discovery. Comput Methods Programs Biomed 2018; 160:33-42. [PMID: 29728244 DOI: 10.1016/j.cmpb.2018.03.024] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 10/21/2017] [Revised: 02/26/2018] [Accepted: 03/27/2018] [Indexed: 06/08/2023]
Abstract
BACKGROUND AND OBJECTIVE Data catalogues are a common form of capturing and presenting information about a specific kind of entity (e.g. products, services, professionals, datasets, etc.). However, the construction of a web-based catalogue for a particular scenario normally implies the development of a specific and dedicated solution. In this paper, we present MONTRA, a rapid-application development framework designed to facilitate the integration and discovery of heterogeneous objects, which may be characterized by distinct data structures. METHODS MONTRA was developed following a plugin-based architecture to allow dynamic composition of services over represented datasets. The core of MONTRA's functionality resides in a flexible data skeleton used to characterize data entities, from which a fully-fledged web data catalogue is automatically generated, ensuring access control and data privacy. RESULTS MONTRA is being successfully used by several European projects to collect and manage biomedical databases. In this paper, we describe three of these application scenarios. CONCLUSIONS This work was motivated by the plethora of geographically scattered biomedical repositories, and by the role they can play together in the understanding of diseases and of the real-world effectiveness of treatments. Using metadata to expose datasets' characteristics, MONTRA greatly simplifies the task of building data catalogues. The source code is publicly available at https://github.com/bioinformatics-ua/montra.
8
Pang C, Kelpin F, van Enckevort D, Eklund N, Silander K, Hendriksen D, de Haan M, Jetten J, de Boer T, Charbon B, Holub P, Hillege H, Swertz MA. BiobankUniverse: automatic matchmaking between datasets for biobank data discovery and integration. Bioinformatics 2017; 33:3627-3634. [PMID: 29036577 PMCID: PMC5870622 DOI: 10.1093/bioinformatics/btx478] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 04/21/2017] [Accepted: 07/22/2017] [Indexed: 11/24/2022] [Open Access]
Abstract
Motivation: Biobanks are indispensable for large-scale genetic/epidemiological studies, yet it remains difficult for researchers to determine which biobanks contain data matching their research questions. Results: To overcome this, we developed a new matching algorithm that identifies pairs of related data elements between biobanks and research variables with high precision and recall. It integrates lexical comparison, Unified Medical Language System ontology tagging and semantic query expansion. The result is BiobankUniverse, a fast matchmaking service for biobanks and researchers. Biobankers upload their data elements and researchers upload their desired study variables; BiobankUniverse then automatically shortlists matching attributes between them. Users can quickly explore matching potential and search for biobanks/data elements matching their research. They can also curate matches and define personalized data-universes. Availability and implementation: BiobankUniverse is available at http://biobankuniverse.com or can be downloaded as part of the open source MOLGENIS suite at http://github.com/molgenis/molgenis. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Chao Pang
  - Department of Genetics, Genomics Coordination Center, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
  - Department of Epidemiology, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Fleur Kelpin
  - Department of Genetics, Genomics Coordination Center, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- David van Enckevort
  - Department of Genetics, Genomics Coordination Center, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Niina Eklund
  - Department of Public Health Solutions, National Institute for Health and Welfare, Helsinki, Finland
- Kaisa Silander
  - Department of Public Health Solutions, National Institute for Health and Welfare, Helsinki, Finland
- Dennis Hendriksen
  - Department of Genetics, Genomics Coordination Center, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Mark de Haan
  - Department of Genetics, Genomics Coordination Center, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Jonathan Jetten
  - Department of Genetics, Genomics Coordination Center, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Tommy de Boer
  - Department of Genetics, Genomics Coordination Center, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Bart Charbon
  - Department of Genetics, Genomics Coordination Center, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Petr Holub
  - Biobanking and BioMolecular Resources Research Infrastructure (BBMRI-ERIC), Graz, Austria
- Hans Hillege
  - Department of Epidemiology, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Morris A Swertz
  - Department of Genetics, Genomics Coordination Center, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
  - Department of Epidemiology, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
9
Hofer-Picout P, Pichler H, Eder J, Neururer SB, Müller H, Reihs R, Holub P, Insam T, Goebel G. Conception and Implementation of an Austrian Biobank Directory Integration Framework. Biopreserv Biobank 2017; 15:332-340. [PMID: 28380303 DOI: 10.1089/bio.2016.0113] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 11/12/2022] [Open Access]
Abstract
INTRODUCTION Sample collections and data are hosted within different biobanks at diverse institutions across Europe. Our data integration framework aims at incorporating data about sample collections from different biobanks into a common research infrastructure, facilitating researchers' abilities to obtain high-quality samples to conduct their research. The resulting information must be locally gathered and distributed to searchable higher level information biobank directories to maximize the visibility on the national and European levels. Therefore, biobanks and sample collections must be clearly described and unambiguously identified. We describe how to tackle the challenges of integrating biobank-related data between biobank directories using heterogeneous data schemas and different technical environments. METHODS To establish a data exchange infrastructure between all biobank directories involved, we propose the following steps: (A) identification of core entities, terminology, and semantic relationships, (B) harmonization of heterogeneous data schemas of different Biobanking and Biomolecular Resources Research Infrastructure (BBMRI) directories, and (C) formulation of technical core principles for biobank data exchange between directories. RESULTS (A) We identified the major core elements to describe biobanks in biobank directories. Since all directory data models were partially based on Minimum Information About BIobank Data Sharing (MIABIS) 2.0, the MIABIS 2.0 core model was used for compatibility. (B) Different projection scenarios were elaborated in collaboration with all BBMRI.at partners. A minimum set of mandatory and optional core entities and data items was defined for mapping across all directory levels. (C) Major core data exchange principles were formulated and data interfaces implemented by all biobank directories involved. 
DISCUSSION We agreed on a MIABIS 2.0-based core set of harmonized biobank attributes and established a list of data exchange core principles for integrating biobank directories on different levels. This generic approach and the data exchange core principles proposed herein can also be applied in related tasks like integration and harmonization of biobank data on the individual sample and patient levels.
Affiliation(s)
- Philipp Hofer-Picout
  - Department of Medical Statistics, Informatics and Health Economics, Medical University of Innsbruck, Innsbruck, Austria
- Horst Pichler
  - Department of Information and Communication Systems, University of Klagenfurt, Klagenfurt, Austria
- Johann Eder
  - Department of Information and Communication Systems, University of Klagenfurt, Klagenfurt, Austria
- Sabrina B Neururer
  - Department of Medical Statistics, Informatics and Health Economics, Medical University of Innsbruck, Innsbruck, Austria
- Heimo Müller
  - Institute of Pathology, Medical University Graz, Graz, Austria
- Robert Reihs
  - Institute of Pathology, Medical University Graz, Graz, Austria
- Petr Holub
  - Institute of Computer Science, Masaryk University, Brno, Czech Republic
  - Biobanking and Biomolecular Resources Research Infrastructure (BBMRI-ERIC), Graz, Austria
- Thomas Insam
  - Department of Obstetrics and Gynecology, Medical University of Innsbruck, Innsbruck, Austria
- Georg Goebel
  - Department of Medical Statistics, Informatics and Health Economics, Medical University of Innsbruck, Innsbruck, Austria