1. Pereira A, Almeida JR, Lopes RP, Oliveira JL. Querying semantic catalogues of biomedical databases. J Biomed Inform 2023; 137:104272. [PMID: 36563828 DOI: 10.1016/j.jbi.2022.104272] [Received: 03/10/2022] [Revised: 11/03/2022] [Accepted: 12/12/2022] [Indexed: 12/24/2022]
Abstract
BACKGROUND Secondary use of health data is a valuable source of knowledge that boosts observational studies, leading to important discoveries in the medical and biomedical sciences. The fundamental guiding principle for a successful observational study is to define the research question and the approach before the study is executed. However, in multi-centre studies, finding suitable datasets to support the study is challenging, time-consuming, and sometimes impossible without a deep understanding of each dataset. METHODS We propose a strategy for retrieving semantically annotated biomedical datasets of interest through an interface built by applying a methodology that transforms natural language questions into formal language queries. The advantages of creating semantic biomedical data are enhanced by natural language interfaces, which allow complex queries to be issued without writing in a formal query language. RESULTS Our methodology was validated using Alzheimer's disease datasets published in a European platform for sharing and reusing biomedical data. We converted the data to a semantic format using biomedical ontologies in common use in the biomedical community and published it as a FAIR endpoint. We considered natural language questions of three types: single-concept questions, questions with exclusion criteria, and multi-concept questions. Finally, we analysed the performance and limitations of the question-answering module we used. The source code is publicly available at https://bioinformatics-ua.github.io/BioKBQA/. CONCLUSION We propose a strategy for using information extracted from biomedical data and transformed into a semantic format using open biomedical ontologies. Our method lets users formulate questions in natural language that are answered over this semantic data, without directly using formal query languages.
Affiliation(s)
- João Rafael Almeida
- DETI/IEETA, LASI, University of Aveiro, Aveiro, Portugal; Department of Computation, University of A Coruña, A Coruña, Spain.
- Rui Pedro Lopes
- CeDRI, Polytechnic Institute of Bragança, Bragança, Portugal.
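The abstract above distinguishes three question types: single-concept questions, questions with exclusion criteria, and multi-concept questions. As a rough illustration of how such questions could map onto a formal query language, the following Python sketch assumes the concepts have already been extracted from the question and linked to ontology IRIs. It then fills a SPARQL template. The `ex:` prefix, the `ex:hasConcept` predicate, the example IRIs and the `ParsedQuestion`/`to_sparql` names are hypothetical placeholders. This is not the published BioKBQA pipeline.

```python
from dataclasses import dataclass, field

# Hypothetical vocabulary: the prefix and predicate are placeholders,
# not the terms used by the published FAIR endpoint.
PREFIXES = "PREFIX ex: <http://example.org/catalogue#>\n"


@dataclass
class ParsedQuestion:
    """Concepts assumed to be already extracted from the question and linked to IRIs."""
    required: list[str]                                 # dataset must contain all of these
    excluded: list[str] = field(default_factory=list)   # dataset must contain none of these


def to_sparql(question: ParsedQuestion) -> str:
    """Build a SELECT query for single-concept, multi-concept and exclusion questions."""
    if not question.required:
        raise ValueError("at least one required concept is needed")
    lines = [PREFIXES + "SELECT DISTINCT ?dataset WHERE {"]
    # One triple pattern per required concept; several patterns act as an implicit AND.
    for iri in question.required:
        lines.append(f"  ?dataset ex:hasConcept <{iri}> .")
    # Each exclusion criterion becomes a FILTER NOT EXISTS block.
    for iri in question.excluded:
        lines.append(f"  FILTER NOT EXISTS {{ ?dataset ex:hasConcept <{iri}> . }}")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # "Which datasets contain MMSE scores but no MRI data?" (hypothetical IRIs)
    q = ParsedQuestion(
        required=["http://example.org/concept/MMSE"],
        excluded=["http://example.org/concept/MRI"],
    )
    print(to_sparql(q))
```

A multi-concept question simply lists more than one IRI in `required`. Each extra IRI adds another triple pattern on the same `?dataset` variable.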

2. Bose N, Brookes AJ, Scordis P, Visser PJ. Data and sample sharing as an enabler for large-scale biomarker research and development: The EPND perspective. Front Neurol 2022; 13:1031091. [PMID: 36530625 PMCID: PMC9748546 DOI: 10.3389/fneur.2022.1031091] [Received: 08/29/2022] [Accepted: 10/24/2022] [Indexed: 08/08/2023]
Abstract
Biomarker discovery, development, and validation are reliant on large-scale analyses of high-quality samples and data. Currently, significant quantities of data and samples have been generated by European studies on Alzheimer's disease (AD) and other neurodegenerative diseases (NDD), representing a valuable resource for developing biomarkers to support early detection of disease, treatment monitoring, and patient stratification. However, discovery of, access to, and sharing of data and samples from AD and NDD research are hindered both by silos that limit collaboration, and by the array of complex requirements for secure, legal, and ethical sharing. In this Perspective article, we examine key challenges currently hampering large-scale biomarker research, and outline how the European Platform for Neurodegenerative Diseases (EPND) plans to address them. The first such challenge is a fragmented landscape filled with technical barriers that make it difficult to discover and access high-quality samples and data in one location. A second challenge is related to the complex array of legal and ethical requirements that must be navigated by researchers when sharing data and samples, to ensure compliance with data protection regulations and research ethics. Another challenge is the lack of broad-scale collaboration and opportunities to facilitate partnerships between data and sample contributors and researchers, in addition to a lack of regulatory engagement early in the research process to enable validation of potential biomarkers. A further challenge facing projects is the need to remain sustainable beyond initial funding periods, ensuring data and samples are shared and reused, thereby driving further research and innovation. In addressing these challenges, EPND will enable an environment of faster and more disruptive research on diagnostics and disease-modifying therapies for Alzheimer's disease and other neurodegenerative diseases.
Affiliation(s)
- Niranjan Bose
- Health and Life Sciences, Gates Ventures, Kirkland, WA, United States
- Department of Health Metrics Sciences, University of Washington, Seattle, WA, United States
- Anthony J. Brookes
- Department of Genetics and Genome Biology, University of Leicester, Leicester, United Kingdom
- Pieter Jelle Visser
- Alzheimer Center Amsterdam, Department of Neurology, Amsterdam Neuroscience, Vrije Universiteit Amsterdam, Amsterdam UMC, Amsterdam, Netherlands
- Alzheimer Center Limburg, School for Mental Health and Neuroscience, Maastricht University, Maastricht, Netherlands
- Department of Neurobiology, Care Sciences and Society, Division of Neurogeriatrics, Karolinska Institutet, Stockholm, Sweden

3. Semantic Data Visualisation for Biomedical Database Catalogues. Healthcare (Basel) 2022; 10:healthcare10112287. [DOI: 10.3390/healthcare10112287] [Received: 09/21/2022] [Revised: 11/08/2022] [Accepted: 11/10/2022] [Indexed: 11/16/2022]
Abstract
Biomedical databases often have restricted access policies and governance rules. Thus, an adequate description of their content is essential for researchers who wish to use them for medical research. One strategy for publishing information without disclosing patient-level data is database fingerprinting and aggregate characterisation. However, this information is still presented in a format that makes it difficult to search, analyse, and decide on the best databases for a domain of study. Several strategies allow one to visualise and compare the characteristics of multiple biomedical databases. Our study focused on a European platform for sharing and disseminating biomedical data. We use semantic data visualisation techniques to help compare descriptive metadata from several databases. The main advantage is a streamlined database selection process that does not require sharing sensitive details. To this end, we considered two levels of data visualisation: one characterising a single database and the other covering multiple databases in network-level visualisations. This study revealed the impact of the proposed visualisations and some open challenges in representing semantically annotated biomedical datasets. Another outcome of this work was the identification of future directions in this area.
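Network-level visualisations need some measure of how alike two databases are. The Python sketch below illustrates one way to get such a measure, under stated assumptions. It treats each database fingerprint as a hypothetical mapping from concept to aggregate record count, with no patient-level data. It weights the edge between two databases by the Jaccard index of their concept sets. The Jaccard index is a choice made here for illustration only. The abstract does not say which similarity measure, if any, the authors used. The cohort names and concepts are invented.

```python
from itertools import combinations

# Hypothetical aggregate fingerprint: concept -> number of records holding it.
# Only these aggregates are published; no patient-level rows are needed.
Fingerprint = dict[str, int]


def jaccard(a: Fingerprint, b: Fingerprint) -> float:
    """Jaccard index of the two concept sets (shared concepts / all concepts)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def network_edges(catalogue: dict[str, Fingerprint], threshold: float = 0.0):
    """One weighted edge per pair of databases whose similarity exceeds the threshold."""
    edges = []
    for (name_a, fp_a), (name_b, fp_b) in combinations(sorted(catalogue.items()), 2):
        weight = jaccard(fp_a, fp_b)
        if weight > threshold:
            edges.append((name_a, name_b, round(weight, 3)))
    return edges


catalogue = {
    "cohort_A": {"MMSE": 1200, "APOE": 800, "MRI": 300},
    "cohort_B": {"MMSE": 450, "CSF_amyloid": 200},
    "cohort_C": {"PET": 90},
}
print(network_edges(catalogue))  # [('cohort_A', 'cohort_B', 0.25)]
```

The record counts are not used by this particular measure. They remain available for single-database views, such as a chart of the most frequent concepts in one cohort.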

4. Almeida JR, Pratas D, Oliveira JL. A semi-automatic methodology for analysing distributed and private biobanks. Comput Biol Med 2020; 130:104180. [PMID: 33360272 DOI: 10.1016/j.compbiomed.2020.104180] [Received: 09/24/2020] [Revised: 12/14/2020] [Accepted: 12/14/2020] [Indexed: 10/22/2022]
Abstract
Privacy issues, often arising from the multidimensionality and sensitivity of the data and from the associated access restrictions and policies, limit the analysis and cross-exploration of most distributed and private biobanks. These characteristics prevent collaboration between entities, constituting a barrier to emerging personalized and public health challenges, such as the discovery of new druggable targets, the identification of disease-causing genetic variants, or the study of rare diseases. In this paper, we propose a semi-automatic methodology for the analysis of distributed and private biobanks. The strategies involved in the proposed methodology efficiently enable the creation and execution of unified genomic studies using distributed repositories, without compromising the information present in the datasets. We apply the methodology to a case study on Covid-19, combining the diagnostics from multiple entities while maintaining privacy through a completely identical procedure. Moreover, we show that the methodology follows a simple, intuitive, and practical scheme.
Affiliation(s)
- João Rafael Almeida
- DETI/IEETA, University of Aveiro, Aveiro, Portugal; Department of Computation, University of A Coruña, A Coruña, Spain.
- Diogo Pratas
- DETI/IEETA, University of Aveiro, Aveiro, Portugal; Department of Virology, University of Helsinki, Helsinki, Finland.
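The Almeida et al. abstract above states that unified studies run over distributed repositories without compromising the information in each dataset. It also states that the Covid-19 case study combined diagnostics from several entities. The following Python sketch illustrates that general idea under a simple assumption: every entity runs the same local function and shares only aggregate counts. The diagnostic labels, the function names and the positivity-rate summary are illustrative assumptions. They are not the procedure described in the paper.

```python
from collections import Counter
from typing import Iterable


def local_summary(diagnoses: Iterable[str]) -> Counter:
    """Runs unchanged at every entity; only outcome counts leave the site."""
    return Counter(diagnoses)


def combine(summaries: Iterable[Counter]) -> Counter:
    """Coordinator side: adds the per-site counts without seeing any individual record."""
    total = Counter()
    for summary in summaries:
        total.update(summary)  # Counter.update adds counts instead of replacing them
    return total


def positivity_rate(total: Counter) -> float:
    """Share of positive diagnostics across all entities (0.0 if there are none)."""
    n = sum(total.values())
    return total["positive"] / n if n else 0.0


# Hypothetical private records held by two entities.
site_a = ["positive", "negative", "negative"]
site_b = ["positive", "positive"]

total = combine([local_summary(site_a), local_summary(site_b)])
print(total["positive"], total["negative"], positivity_rate(total))  # 3 2 0.6
```

Because every site applies the same `local_summary`, the coordinator can merge the results directly. Real deployments usually add further safeguards, such as suppressing very small counts. This sketch does not include them.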

5. Lovestone S. The European medical information framework: A novel ecosystem for sharing healthcare data across Europe. Learn Health Syst 2020; 4:e10214. [PMID: 32313838 PMCID: PMC7156868 DOI: 10.1002/lrh2.10214] [Received: 06/19/2019] [Revised: 11/27/2019] [Accepted: 11/29/2019] [Indexed: 12/15/2022]
Abstract
INTRODUCTION The European medical information framework (EMIF) was an Innovative Medicines Initiative project, jointly supported by the European Union and the European Federation of Pharmaceutical Industries and Associations, which generated a common technology and governance framework to identify, assess and (re)use healthcare data in order to facilitate real-world data research. The objectives of EMIF included providing a unified platform to support a wide range of studies within two verification programmes: Alzheimer's disease (EMIF-AD) and the metabolic consequences of obesity (EMIF-MET). METHODS The EMIF platform was built around two main data types, electronic health record data and research cohort data, and the platform architecture was composed of a set of tools designed to enable data discovery and characterisation. This included the EMIF catalogue, which allowed users to find relevant data sources, including the data types collected. Data harmonisation via a common data model was central to the project, especially for population data sources. EMIF also developed an ethical code of practice to ensure data protection, patient confidentiality and compliance with the European Data Protection Directive and the GDPR. RESULTS Currently, 18 population-based, disease-agnostic data partners and 60 cohort-based Alzheimer's data partners from 14 countries are contained within the catalogue, and this number will continue to expand. The work conducted in EMIF-AD and EMIF-MET includes standardizing cohorts, summarising baseline characteristics of patients, developing diagnostic algorithms, conducting epidemiological studies, identifying and validating novel biomarkers and selecting potential patient samples for pharmacological intervention. CONCLUSIONS EMIF was designed to provide a sustainable model, as demonstrated by the sustainability plans for EMIF-AD.
Although network-wide studies using EMIF were not conducted during this project to evaluate its sustainability, lessons learned from EMIF will be used in the follow-on IMI-2 project, the European Health Data and Evidence Network (EHDEN). Furthermore, EMIF has facilitated collaborations between partners and continues to promote a wider adoption of its principles, technology and architecture through some of its continued work.
Affiliation(s)
- Simon Lovestone
- Neurodegeneration, Janssen R&D, Janssen Pharmaceutica, Beerse, Belgium
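The EMIF abstract above notes that data harmonisation via a common data model was central to the project, especially for population data sources. The Python sketch below shows the general shape of such a mapping step under simple assumptions. Each source provides a hypothetical table that maps local variable codes to shared concepts. Codes without a mapping are reported back, so the mapping can be curated. The site names, codes and `CdmRecord` fields are invented for illustration and do not describe the EMIF common data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CdmRecord:
    person_id: str
    concept: str              # harmonised concept name in the common model
    value: Optional[float]


# Hypothetical per-source mapping tables: local variable code -> common concept.
MAPPINGS = {
    "site_nl": {"MMSE_TOT": "mmse_total", "AGE_Y": "age_years"},
    "site_se": {"mmse": "mmse_total", "age": "age_years"},
}


def harmonise(source: str, rows: list[dict]) -> tuple[list[CdmRecord], set[str]]:
    """Map one source's rows into the common model; report codes with no mapping."""
    mapping = MAPPINGS[source]
    records: list[CdmRecord] = []
    unmapped: set[str] = set()
    for row in rows:
        concept = mapping.get(row["code"])
        if concept is None:
            unmapped.add(row["code"])  # reported so the mapping table can be extended
            continue
        records.append(CdmRecord(row["person_id"], concept, row.get("value")))
    return records, unmapped


records, unmapped = harmonise("site_se", [
    {"person_id": "p1", "code": "mmse", "value": 27},
    {"person_id": "p1", "code": "bmi", "value": 22.5},
])
print(records, unmapped)
```

After harmonisation, `mmse` from one source and `MMSE_TOT` from another both become `mmse_total`. A single query can then run over both sources.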

6. TASKA: A modular task management system to support health research studies. BMC Med Inform Decis Mak 2019; 19:121. [PMID: 31266480 PMCID: PMC6604289 DOI: 10.1186/s12911-019-0844-6] [Received: 01/17/2019] [Accepted: 06/20/2019] [Indexed: 11/25/2022]
Abstract
Background Over the past decades, many healthcare databases have been populated routinely to support clinical practice and administrative services. However, their secondary use for research is often hindered by restrictive governance rules. Furthermore, health research studies typically involve many participants with complementary roles and responsibilities, which require proper process management. Results From a wide set of requirements collected from European clinical studies, we developed TASKA, a task/workflow management system that helps to cope with the socio-technical issues arising in multidisciplinary and multi-setting clinical studies. The system is based on a two-layered architecture: 1) the backend engine, which follows a micro-kernel pattern for extensibility and exposes RESTful web services for decoupling from the web clients; and 2) the client, developed entirely in ReactJS, which allows studies to be constructed and managed through a graphical interface. TASKA is a GNU GPL open source project, accessible at https://github.com/bioinformatics-ua/taska. A demo version is also available at https://bioinformatics.ua.pt/taska. Conclusions The system is currently used to support feasibility studies across several institutions and countries, in the context of the European Medical Information Framework (EMIF) project. The tool was shown to simplify the set-up of health studies, the management of participants and their roles, and the overall governance process.
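The abstract names a micro-kernel pattern for backend extensibility. The sketch below illustrates that general pattern in Python. A small core only registers and dispatches task types, and all task behaviour lives in plugins. The class names and the `form`/`approval` task types are hypothetical and do not reflect TASKA's actual codebase. The REST layer and the ReactJS client are omitted.

```python
from abc import ABC, abstractmethod


class TaskPlugin(ABC):
    """Extension point: each task type is a plugin the kernel does not know in advance."""
    name: str

    @abstractmethod
    def run(self, payload: dict) -> dict:
        """Execute one task instance and return its resulting state."""


class Kernel:
    """Minimal micro-kernel: it only registers plugins and dispatches tasks to them."""

    def __init__(self) -> None:
        self._plugins: dict[str, TaskPlugin] = {}

    def register(self, plugin: TaskPlugin) -> None:
        if plugin.name in self._plugins:
            raise ValueError(f"task type already registered: {plugin.name}")
        self._plugins[plugin.name] = plugin

    def execute(self, task_type: str, payload: dict) -> dict:
        plugin = self._plugins.get(task_type)
        if plugin is None:
            raise LookupError(f"no plugin for task type: {task_type}")
        return plugin.run(payload)


# Hypothetical task types for a feasibility-study workflow.
class FormTask(TaskPlugin):
    name = "form"

    def run(self, payload: dict) -> dict:
        return {"status": "awaiting_answers", "questions": len(payload.get("questions", []))}


class ApprovalTask(TaskPlugin):
    name = "approval"

    def run(self, payload: dict) -> dict:
        return {"status": "approved" if payload.get("approved") else "rejected"}


kernel = Kernel()
kernel.register(FormTask())
kernel.register(ApprovalTask())
print(kernel.execute("form", {"questions": ["cohort size?", "MRI available?"]}))
print(kernel.execute("approval", {"approved": True}))
```

In this design, adding a new kind of study task means writing and registering one more plugin. The `Kernel` class itself stays unchanged.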

7. Trifan A, Oliveira JL. Patient data discovery platforms as enablers of biomedical and translational research: A systematic review. J Biomed Inform 2019; 93:103154. [PMID: 30922867 DOI: 10.1016/j.jbi.2019.103154] [Received: 12/07/2018] [Revised: 03/15/2019] [Accepted: 03/18/2019] [Indexed: 11/28/2022]
Abstract
BACKGROUND The global shift from paper health records to electronic ones has led to an impressive growth of biomedical digital data over the past two decades. Exploring and extracting knowledge from these data has the potential to enhance translational research and lead to positive outcomes for the population's health and healthcare. OBJECTIVE The aim of this study was to conduct a systematic review to identify software platforms that enable discovery, secondary use and interoperability of biomedical data. Additionally, we aimed to evaluate the identified solutions in terms of clinical interest and main healthcare-related outcomes. METHODS A systematic search of the scientific literature published and indexed in PubMed between January 2014 and September 2018 was performed. Inclusion criteria were as follows: relevance to the topic of biomedical data discovery, English language, and free full text. To increase recall, we developed a semi-automatic and incremental methodology to retrieve articles that cite one or more articles in the previously retrieved set. RESULTS A total of 500 candidate papers were retrieved through this methodology. Of these, 85 were eligible for abstract assessment. Finally, 37 studies qualified for a full-text review, and 20 provided enough information for the study objectives. CONCLUSIONS This study revealed that biomedical discovery platforms are both a current necessity and a significantly innovative agent in the area of healthcare. The outcomes identified, in terms of scientific publications, clinical studies and research collaborations, stand as evidence of this.
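The abstract reports only the number of papers remaining at each screening stage. The number excluded at each step follows by subtraction, as in this short Python check. The exclusion figures are derived here, not quoted from the paper.

```python
# Stage counts as reported in the abstract.
stages = [
    ("candidate papers retrieved", 500),
    ("eligible for abstract assessment", 85),
    ("qualified for full-text review", 37),
    ("provided enough information", 20),
]


def excluded_per_stage(stages):
    """Papers dropped between each pair of consecutive screening stages."""
    return [(later, prev_n - n) for (_, prev_n), (later, n) in zip(stages, stages[1:])]


for stage, dropped in excluded_per_stage(stages):
    print(f"{stage}: {dropped} excluded")
print(f"overall retention: {stages[-1][1] / stages[0][1]:.1%}")
```

It prints 415, 48 and 17 exclusions for the three later stages and an overall retention of 4.0%.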

8. Pedrosa M, Silva JM, Silva JF, Matos S, Costa C. SCREEN-DR: Collaborative platform for diabetic retinopathy. Int J Med Inform 2018; 120:137-146. [PMID: 30409338 DOI: 10.1016/j.ijmedinf.2018.10.005] [Received: 04/30/2018] [Revised: 09/25/2018] [Accepted: 10/14/2018] [Indexed: 10/28/2022]
Abstract
BACKGROUND AND OBJECTIVE Diabetic retinopathy (DR) is the most prevalent microvascular complication of diabetes mellitus and can lead to irreversible visual loss. Screening programs based on retinal imaging techniques are fundamental for detecting the disease, since its initial stages are asymptomatic. Most of these examinations reflect negative cases and many have poor image quality, representing an important inefficiency factor. The SCREEN-DR project aims to tackle this limitation by researching and developing computer-aided methods for diabetic retinopathy detection. This article presents a multidisciplinary collaborative platform that was created to meet the needs of physicians and researchers, with the aim of creating machine learning algorithms to facilitate the screening process. METHODS Our proposal is a collaborative platform for textual and visual annotation of image datasets. The architecture and layout were optimized for annotating DR images by gathering feedback from several physicians during the design and conceptualization of the platform. It allows the aggregation and indexing of medical imaging studies from diverse sources, and supports the creation and annotation of phenotype-specific datasets to feed artificial intelligence algorithms. The platform makes use of an anonymization pipeline and role-based access control for securing personal data. RESULTS The SCREEN-DR platform has been deployed in the production environment of the SCREEN-DR project at http://demo.dicoogle.com/screen-dr, and the source code of the project is publicly available. We provide a description of the platform's interface and the use cases it supports. At the time of publication, four physicians had created a total of 1826 annotations for 701 distinct images, and the annotated data have been used for training classification models.
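The abstract mentions an anonymization pipeline and role-based access control for securing personal data. The Python sketch below illustrates both ideas in their simplest form, under explicit assumptions:

- The identifying field names follow DICOM-style attribute names.
- Patient identifiers are pseudonymised with a keyed HMAC-SHA256 digest.
- The roles and permissions are hypothetical.

This is not the SCREEN-DR or Dicoogle implementation. Stripping a fixed list of fields is also not complete de-identification of real DICOM data, which can carry identifiers in other tags or burned into the pixels.

```python
import hashlib
import hmac

# Hypothetical set of identifying fields (DICOM-style attribute names) to strip.
IDENTIFYING_FIELDS = {"PatientName", "PatientBirthDate", "PatientAddress", "InstitutionName"}


def anonymise(metadata: dict, secret_key: bytes) -> dict:
    """Drop direct identifiers and replace PatientID with a keyed pseudonym (HMAC-SHA256)."""
    clean = {k: v for k, v in metadata.items() if k not in IDENTIFYING_FIELDS}
    if "PatientID" in clean:
        digest = hmac.new(secret_key, str(clean["PatientID"]).encode(), hashlib.sha256)
        clean["PatientID"] = digest.hexdigest()[:16]
    return clean


# Hypothetical role -> permission table for role-based access control.
ROLE_PERMISSIONS = {
    "physician": {"view_image", "annotate"},
    "researcher": {"view_image", "create_dataset", "export_dataset"},
    "admin": {"view_image", "annotate", "create_dataset", "export_dataset", "manage_users"},
}


def is_allowed(roles: set[str], permission: str) -> bool:
    """A user may perform an action if any of their roles grants it."""
    return any(permission in ROLE_PERMISSIONS.get(role, set()) for role in roles)


record = {"PatientID": "12345", "PatientName": "Doe^Jane", "Modality": "OP", "Laterality": "L"}
print(anonymise(record, secret_key=b"site-secret"))
print(is_allowed({"physician"}, "annotate"), is_allowed({"physician"}, "export_dataset"))
```

A keyed hash gives the same pseudonym for every image of the same patient, so annotations can still be grouped per patient. Only the holder of the key can recompute the mapping.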