51
|
Lapins M, Arvidsson S, Lampa S, Berg A, Schaal W, Alvarsson J, Spjuth O. A confidence predictor for logD using conformal regression and a support-vector machine. J Cheminform 2018; 10:17. [PMID: 29616425 PMCID: PMC5882484 DOI: 10.1186/s13321-018-0271-1] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 12/21/2017] [Accepted: 03/25/2018] [Indexed: 02/03/2023] Open
Abstract
Lipophilicity is a major determinant of ADMET properties and overall suitability of drug candidates. We have developed large-scale models to predict water–octanol distribution coefficient (logD) for chemical compounds, aiding drug discovery projects. Using ACD/logD data for 1.6 million compounds from the ChEMBL database, models are created and evaluated by a support-vector machine with a linear kernel using conformal prediction methodology, outputting prediction intervals at a specified confidence level. The resulting model shows a predictive ability of Q² = 0.973 and with the best performing nonconformity measure having median prediction interval of ±0.39 log units at 80% confidence and ±0.60 log units at 90% confidence. The model is available as an online service via an OpenAPI interface, a web page with a molecular editor, and we also publish predictive values at 90% confidence level for 91 M PubChem structures in RDF format for download and as a URI resolver service.
Affiliation(s)
- Maris Lapins
- Department of Pharmaceutical Biosciences, Uppsala University, Box 591, 751 24, Uppsala, Sweden
| | - Staffan Arvidsson
- Department of Pharmaceutical Biosciences, Uppsala University, Box 591, 751 24, Uppsala, Sweden
| | - Samuel Lampa
- Department of Pharmaceutical Biosciences, Uppsala University, Box 591, 751 24, Uppsala, Sweden
| | - Arvid Berg
- Department of Pharmaceutical Biosciences, Uppsala University, Box 591, 751 24, Uppsala, Sweden
| | - Wesley Schaal
- Department of Pharmaceutical Biosciences, Uppsala University, Box 591, 751 24, Uppsala, Sweden
| | - Jonathan Alvarsson
- Department of Pharmaceutical Biosciences, Uppsala University, Box 591, 751 24, Uppsala, Sweden
| | - Ola Spjuth
- Department of Pharmaceutical Biosciences, Uppsala University, Box 591, 751 24, Uppsala, Sweden.
| |
|
52
|
Xin J, Afrasiabi C, Lelong S, Adesara J, Tsueng G, Su AI, Wu C. Cross-linking BioThings APIs through JSON-LD to facilitate knowledge exploration. BMC Bioinformatics 2018; 19:30. [PMID: 29390967 PMCID: PMC5796402 DOI: 10.1186/s12859-018-2041-5] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 07/21/2017] [Accepted: 01/24/2018] [Indexed: 01/25/2023] Open
Abstract
BACKGROUND Application Programming Interfaces (APIs) are now widely used to distribute biological data. Many popular biological APIs developed by different research teams have adopted JavaScript Object Notation (JSON) as their primary data format. While usage of a common data format offers significant advantages, that alone is not sufficient for rich integrative queries across APIs. RESULTS Here, we have implemented JSON for Linking Data (JSON-LD) technology on the BioThings APIs that we have developed: MyGene.info, MyVariant.info and MyChem.info. JSON-LD provides a standard way to add semantic context to the existing JSON data structure, for the purpose of enhancing the interoperability between APIs. We demonstrated several use cases that were facilitated by semantic annotations using JSON-LD, including simpler and more precise query capabilities as well as API cross-linking. CONCLUSIONS We believe that this pattern offers a generalizable solution for interoperability of APIs in the life sciences.
Affiliation(s)
- Jiwen Xin
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA
| | - Cyrus Afrasiabi
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA
| | - Sebastien Lelong
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA
| | - Julee Adesara
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA
| | - Ginger Tsueng
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA
| | - Andrew I Su
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA
| | - Chunlei Wu
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA.
| |
|
53
|
Garcia A, Lopez F, Garcia L, Giraldo O, Bucheli V, Dumontier M. Biotea: semantics for Pubmed Central. PeerJ 2018; 6:e4201. [PMID: 29312824 PMCID: PMC5755483 DOI: 10.7717/peerj.4201] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 08/09/2017] [Accepted: 12/07/2017] [Indexed: 01/26/2023] Open
Abstract
A significant portion of biomedical literature is represented in a manner that makes it difficult for consumers to find or aggregate content through a computational query. One approach to facilitate reuse of the scientific literature is to structure this information as linked data using standardized web technologies. In this paper we present the second version of Biotea, a semantic, linked data version of the open-access subset of PubMed Central that has been enhanced with specialized annotation pipelines that use existing infrastructure from the National Center for Biomedical Ontology. We expose our models, services, software and datasets. Our infrastructure enables manual and semi-automatic annotation; the resulting data are represented as RDF-based linked data and can be readily queried using the SPARQL query language. We illustrate the utility of our system with several use cases. Our datasets, methods and techniques are available at http://biotea.github.io.
Affiliation(s)
- Alexander Garcia
- Ontology Engineering Group, Universidad Politécnica de Madrid, Madrid, Spain
| | - Federico Lopez
- Escuela de Ingeniería de Sistemas y Computación, Universidad del Valle, Cali, Colombia
| | - Leyla Garcia
- Temporal Knowledge Bases Group, Department of Computer Languages and Systems, Universitat Jaume I, Castelló de la Plana, Spain
| | - Olga Giraldo
- Ontology Engineering Group, Universidad Politécnica de Madrid, Madrid, Spain
| | - Victor Bucheli
- Escuela de Ingeniería de Sistemas y Computación, Universidad del Valle, Cali, Colombia
| | - Michel Dumontier
- Maastricht University, Institute of Data Science, Maastricht, The Netherlands
| |
|
54
|
Kawashima S, Katayama T, Hatanaka H, Kushida T, Takagi T. NBDC RDF portal: a comprehensive repository for semantic data in life sciences. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2018; 2018:5255118. [PMID: 30576482 PMCID: PMC6301334 DOI: 10.1093/database/bay123] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 07/05/2018] [Accepted: 10/15/2018] [Indexed: 11/28/2022]
Abstract
In the life sciences, researchers increasingly want to access multiple databases in an integrated way. However, different databases currently use different formats and vocabularies, hindering the proper integration of heterogeneous life science data. Adopting the Resource Description Framework (RDF) has the potential to address such issues by improving database interoperability, leading to advances in automatic data processing. Based on this idea, we have advised many Japanese database development groups to expose their databases in RDF. To further promote such activities, we have developed an RDF-based life science dataset repository called the National Bioscience Database Center (NBDC) RDF portal. All the datasets in this repository have been reviewed by the NBDC to ensure interoperability and queryability. As of July 2018, the service includes 21 RDF datasets, comprising over 45.5 billion triples. It provides SPARQL endpoints for all datasets, useful metadata and the ability to download RDF files. The NBDC RDF portal can be accessed at https://integbio.jp/rdf/.
Affiliation(s)
- Shuichi Kawashima
- Database Center for Life Science, Research Organization of Information and Systems, 178-4-4 Wakashiba, Kashiwa, Chiba, Japan
| | - Toshiaki Katayama
- Database Center for Life Science, Research Organization of Information and Systems, 178-4-4 Wakashiba, Kashiwa, Chiba, Japan
| | - Hideki Hatanaka
- National Bioscience Database Center, Japan Science and Technology Agency, 5-3 Yonbancho, Chiyoda-ku, Tokyo, Japan
| | - Tatsuya Kushida
- National Bioscience Database Center, Japan Science and Technology Agency, 5-3 Yonbancho, Chiyoda-ku, Tokyo, Japan
| | - Toshihisa Takagi
- National Bioscience Database Center, Japan Science and Technology Agency, 5-3 Yonbancho, Chiyoda-ku, Tokyo, Japan; DNA Data Bank of Japan Center, National Institute of Genetics, Shizuoka, Japan; Department of Biological Sciences, Graduate School of Science, The University of Tokyo, 2-11-16 Yayoi, Bunkyo-ku, Tokyo, Japan
| |
|
55
|
|
56
|
Esteban-Gil A, Fernández-Breis JT, Boeker M. Analysis and visualization of disease courses in a semantically-enabled cancer registry. J Biomed Semantics 2017; 8:46. [PMID: 28962670 PMCID: PMC5622544 DOI: 10.1186/s13326-017-0154-9] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 06/07/2016] [Accepted: 09/19/2017] [Indexed: 12/20/2022] Open
Abstract
Background Regional and epidemiological cancer registries are important for cancer research and the quality management of cancer treatment. Many technological solutions are available to collect and analyse data for cancer registries nowadays. However, the lack of a well-defined common semantic model is a problem when user-defined analyses and data linking to external resources are required. The objectives of this study are: (1) design of a semantic model for local cancer registries; (2) development of a semantically-enabled cancer registry based on this model; and (3) semantic exploitation of the cancer registry for analysing and visualising disease courses. Results Our proposal is based on our previous results and experience working with semantic technologies. Data stored in a cancer registry database were transformed into RDF employing a process driven by OWL ontologies. The semantic representation of the data was then processed to extract semantic patient profiles, which were exploited by means of SPARQL queries to identify groups of similar patients and to analyse the disease timelines of patients. Based on the requirements analysis, we have produced a draft of an ontology that models the semantics of a local cancer registry in a pragmatic extensible way. We have implemented a Semantic Web platform that allows transforming and storing data from cancer registries in RDF. This platform also permits users to formulate incremental user-defined queries through a graphical user interface. The query results can be displayed in several customisable ways. The complex disease timelines of individual patients can be clearly represented. Different events, e.g. different therapies and disease courses, are presented according to their temporal and causal relations. Conclusion The presented platform is an example of the parallel development of ontologies and applications that take advantage of semantic web technologies in the medical field. 
The semantic structure of the representation renders it easy to analyse key figures of the patients and their evolution at different granularity levels.
Affiliation(s)
- Angel Esteban-Gil
- Fundación para la Formación e Investigación Sanitarias de la Región de Murcia, Biomedical Informatics & Bioinformatics Platform, IMIB-Arrixaca, C/ Luis Fontes Pagán, n° 9, Murcia, 30003, Spain
| | - Jesualdo Tomás Fernández-Breis
- Dpto. Informática y Sistemas, Facultad de Informática, Universidad de Murcia, IMIB-Arrixaca, Facultad de Informática, Campus de Espinardo, Murcia, 30100, Spain.
| | - Martin Boeker
- Institute for Medical Biometry and Statistics, Medical Center - University of Freiburg, Faculty of Medicine, University of Freiburg, Stefan-Meier-Str. 26, Freiburg, 79104, Germany
| |
|
57
|
Timón S, Rincón M, Martínez-Tomás R. Extending XNAT Platform with an Incremental Semantic Framework. Front Neuroinform 2017; 11:57. [PMID: 28912709 PMCID: PMC5583223 DOI: 10.3389/fninf.2017.00057] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 05/02/2017] [Accepted: 08/14/2017] [Indexed: 11/13/2022] Open
Abstract
Informatics increases the yield from neuroscience due to improved data. Data sharing and accessibility enable joint efforts between different research groups, as well as replication studies, pivotal for progress in the field. Research data archiving solutions are evolving rapidly to address these necessities; however, distributed data integration is still difficult because of the need for explicit agreements for disparate data models. To address these problems, ontologies are widely used in biomedical research to obtain common vocabularies and logical descriptions, but their application may suffer from scalability issues, domain bias, and loss of low-level data access. With the aim of improving the application of semantic models in biobanking systems, an incremental semantic framework that takes advantage of the latest advances in biomedical ontologies and the XNAT platform is designed and implemented. We follow a layered architecture that allows the alignment of multi-domain biomedical ontologies to manage data at different levels of abstraction. To illustrate this approach, the development is integrated in the JPND (EU Joint Program for Neurodegenerative Disease) APGeM project, focused on finding early biomarkers for Alzheimer's and other dementia-related diseases.
Affiliation(s)
- Santiago Timón
- Departamento de Inteligencia Artificial, Universidad Nacional de Educación a Distancia, Madrid, Spain; Department of Neurology, Akershus University Hospital, Lørenskog, Norway; Intervention Centre, Oslo University Hospital, Oslo, Norway
| | - Mariano Rincón
- Departamento de Inteligencia Artificial, Universidad Nacional de Educación a Distancia, Madrid, Spain
| | - Rafael Martínez-Tomás
- Departamento de Inteligencia Artificial, Universidad Nacional de Educación a Distancia, Madrid, Spain
| |
|
58
|
Abstract
Purpose
This paper reports on a quantitative study of data gathered from the Linked Open Vocabularies (LOV) catalogue, including the use of network analysis and metrics. The purpose of this paper is to gain insights into the structure of LOV and the use of vocabularies in the Web of Data. It is important to note that not all the vocabularies used in the Web of Data are registered in LOV. Given the de-centralised and collaborative nature of the use and adoption of these vocabularies, the results of the study can be used to identify emergent important vocabularies that are shaping the Web of Data.
Design/methodology/approach
The methodology is based on an analytical approach to a data set that captures a complete snapshot of the LOV catalogue dated April 2014. An initial analysis of the data is presented in order to obtain insights into the characteristics of the vocabularies found in LOV. This is followed by an analysis of the use of Vocabulary of a Friend properties that describe relations among vocabularies. Finally, the study is complemented with an analysis of the usage of the different vocabularies, and concludes by proposing a number of metrics.
Findings
The most relevant insight is that, unsurprisingly, the vocabularies with the greatest presence are those used to model Semantic Web data, such as Resource Description Framework, RDF Schema and OWL, as well as broadly used standards such as Simple Knowledge Organization System, DCTERMS and DCE. It was also discovered that the most used language is English and that the vocabularies are not considered to be highly specialised in a field. Also, there is no dominant scope among the vocabularies. Regarding the structural analysis, it is concluded that LOV is a heterogeneous network.
Originality/value
The paper provides an empirical analysis of the structure of LOV and the relations between its vocabularies, together with some metrics that may be of help to determine the important vocabularies from a practical perspective. The results are of interest for a better understanding of the evolution and dynamics of the Web of Data, and for applications that attempt to retrieve data in the Linked Data Cloud. These applications can benefit from the insights into the important vocabularies to be supported and the value added when mapping between and using the vocabularies.
|
59
|
McCusker JP, Dumontier M, Yan R, He S, Dordick JS, McGuinness DL. Finding melanoma drugs through a probabilistic knowledge graph. PeerJ Comput Sci 2017; 3:e106. [PMID: 37133296 PMCID: PMC10151034 DOI: 10.7717/peerj-cs.106] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 04/27/2016] [Accepted: 12/27/2016] [Indexed: 05/04/2023]
Abstract
Metastatic cutaneous melanoma is an aggressive skin cancer with some progression-slowing treatments but no known cure. The omics data explosion has created many possible drug candidates; however, filtering criteria remain challenging, and systems biology approaches have become fragmented with many disconnected databases. Using drug, protein and disease interactions, we built an evidence-weighted knowledge graph of integrated interactions. Our knowledge graph-based system, ReDrugS, can be used via an application programming interface or web interface, and has generated 25 high-quality melanoma drug candidates. We show that probabilistic analysis of systems biology graphs increases drug candidate quality compared to non-probabilistic methods. Four of the 25 candidates are novel therapies, three of which have been tested with other cancers. All other candidates have current or completed clinical trials, or have been studied in vivo or in vitro. This approach can be used to identify candidate therapies for use in research or personalized medicine.
Affiliation(s)
| | - Michel Dumontier
- Stanford Center for Biomedical Informatics Research, Stanford University School of Medicine, Stanford, CA, USA
| | - Rui Yan
- Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY, USA
| | - Sylvia He
- Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY, USA
| | - Jonathan S. Dordick
- Department of Chemical & Biological Engineering, Rensselaer Polytechnic Institute, Troy, NY, USA
- Center for Biotechnology & Interdisciplinary Studies, Rensselaer Polytechnic Institute, Troy, NY, USA
| | - Deborah L. McGuinness
- Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY, USA
- Center for Biotechnology & Interdisciplinary Studies, Rensselaer Polytechnic Institute, Troy, NY, USA
| |
|
60
|
Mashima J, Kodama Y, Fujisawa T, Katayama T, Okuda Y, Kaminuma E, Ogasawara O, Okubo K, Nakamura Y, Takagi T. DNA Data Bank of Japan. Nucleic Acids Res 2016; 45:D25-D31. [PMID: 27924010 PMCID: PMC5210514 DOI: 10.1093/nar/gkw1001] [Citation(s) in RCA: 44] [Impact Index Per Article: 5.5] [Received: 09/16/2016] [Revised: 10/13/2016] [Accepted: 10/15/2016] [Indexed: 12/27/2022] Open
Abstract
The DNA Data Bank of Japan (DDBJ) (http://www.ddbj.nig.ac.jp) has been providing public data services for thirty years (since 1987). We are collecting nucleotide sequence data from researchers as a member of the International Nucleotide Sequence Database Collaboration (INSDC, http://www.insdc.org), in collaboration with the US National Center for Biotechnology Information (NCBI) and European Bioinformatics Institute (EBI). The DDBJ Center also services Japanese Genotype-phenotype Archive (JGA), with the National Bioscience Database Center to collect human-subjected data from Japanese researchers. Here, we report our database activities for INSDC and JGA over the past year, and introduce retrieval and analytical services running on our supercomputer system and their recent modifications. Furthermore, with the Database Center for Life Science, the DDBJ Center improves semantic web technologies to integrate and to share biological data, for providing the RDF version of the sequence data.
Affiliation(s)
- Jun Mashima
- DDBJ Center, National Institute of Genetics, Shizuoka 411-8540, Japan
| | - Yuichi Kodama
- DDBJ Center, National Institute of Genetics, Shizuoka 411-8540, Japan
| | - Takatomo Fujisawa
- DDBJ Center, National Institute of Genetics, Shizuoka 411-8540, Japan
| | | | - Yoshihiro Okuda
- DDBJ Center, National Institute of Genetics, Shizuoka 411-8540, Japan
| | - Eli Kaminuma
- DDBJ Center, National Institute of Genetics, Shizuoka 411-8540, Japan
| | - Osamu Ogasawara
- DDBJ Center, National Institute of Genetics, Shizuoka 411-8540, Japan
| | - Kousaku Okubo
- DDBJ Center, National Institute of Genetics, Shizuoka 411-8540, Japan
| | - Yasukazu Nakamura
- DDBJ Center, National Institute of Genetics, Shizuoka 411-8540, Japan
| | - Toshihisa Takagi
- DDBJ Center, National Institute of Genetics, Shizuoka 411-8540, Japan; National Bioscience Database Center, Japan Science and Technology Agency, Tokyo 102-8666, Japan
| |
|
61
|
Piñero J, Bravo À, Queralt-Rosinach N, Gutiérrez-Sacristán A, Deu-Pons J, Centeno E, García-García J, Sanz F, Furlong LI. DisGeNET: a comprehensive platform integrating information on human disease-associated genes and variants. Nucleic Acids Res 2016; 45:D833-D839. [PMID: 27924018 PMCID: PMC5210640 DOI: 10.1093/nar/gkw943] [Citation(s) in RCA: 1522] [Impact Index Per Article: 190.3] [Received: 08/11/2016] [Revised: 09/29/2016] [Accepted: 10/18/2016] [Indexed: 12/12/2022] Open
Abstract
Information about the genetic basis of human diseases lies at the heart of precision medicine and drug discovery. However, to realize its full potential to support these goals, several problems, such as fragmentation, heterogeneity, availability and different conceptualization of the data, must be overcome. To provide the community with a resource free of these hurdles, we have developed DisGeNET (http://www.disgenet.org), one of the largest available collections of genes and variants involved in human diseases. DisGeNET integrates data from expert curated repositories, GWAS catalogues, animal models and the scientific literature. DisGeNET data are homogeneously annotated with controlled vocabularies and community-driven ontologies. Additionally, several original metrics are provided to assist the prioritization of genotype-phenotype relationships. The information is accessible through a web interface, a Cytoscape App, an RDF SPARQL endpoint, scripts in several programming languages and an R package. DisGeNET is a versatile platform that can be used for different research purposes including the investigation of the molecular underpinnings of specific human diseases and their comorbidities, the analysis of the properties of disease genes, the generation of hypotheses on drug therapeutic action and drug adverse effects, the validation of computationally predicted disease genes and the evaluation of the performance of text-mining methods.
Affiliation(s)
- Janet Piñero
- Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM), Department of Experimental and Health Sciences (DCEXS), Universitat Pompeu Fabra (UPF), C/Dr Aiguader 88, E-08003 Barcelona, Spain
| | - Àlex Bravo
- Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM), Department of Experimental and Health Sciences (DCEXS), Universitat Pompeu Fabra (UPF), C/Dr Aiguader 88, E-08003 Barcelona, Spain
| | - Núria Queralt-Rosinach
- Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM), Department of Experimental and Health Sciences (DCEXS), Universitat Pompeu Fabra (UPF), C/Dr Aiguader 88, E-08003 Barcelona, Spain
| | - Alba Gutiérrez-Sacristán
- Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM), Department of Experimental and Health Sciences (DCEXS), Universitat Pompeu Fabra (UPF), C/Dr Aiguader 88, E-08003 Barcelona, Spain
| | - Jordi Deu-Pons
- Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM), Department of Experimental and Health Sciences (DCEXS), Universitat Pompeu Fabra (UPF), C/Dr Aiguader 88, E-08003 Barcelona, Spain
| | - Emilio Centeno
- Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM), Department of Experimental and Health Sciences (DCEXS), Universitat Pompeu Fabra (UPF), C/Dr Aiguader 88, E-08003 Barcelona, Spain
| | - Javier García-García
- Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM), Department of Experimental and Health Sciences (DCEXS), Universitat Pompeu Fabra (UPF), C/Dr Aiguader 88, E-08003 Barcelona, Spain
| | - Ferran Sanz
- Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM), Department of Experimental and Health Sciences (DCEXS), Universitat Pompeu Fabra (UPF), C/Dr Aiguader 88, E-08003 Barcelona, Spain
| | - Laura I Furlong
- Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM), Department of Experimental and Health Sciences (DCEXS), Universitat Pompeu Fabra (UPF), C/Dr Aiguader 88, E-08003 Barcelona, Spain
| |
|
62
|
Shaban-Nejad A, Lavigne M, Okhmatovskaia A, Buckeridge DL. PopHR: a knowledge-based platform to support integration, analysis, and visualization of population health data. Ann N Y Acad Sci 2016; 1387:44-53. [PMID: 27750378 DOI: 10.1111/nyas.13271] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.8] [Received: 05/02/2016] [Revised: 08/30/2016] [Accepted: 09/09/2016] [Indexed: 11/29/2022]
Abstract
Population health decision makers must consider complex relationships between multiple concepts measured with differential accuracy from heterogeneous data sources. Population health information systems are currently limited in their ability to integrate data and present a coherent portrait of population health. Consequently, these systems can provide only basic support for decision makers. The Population Health Record (PopHR) is a semantic web application that automates the integration and extraction of massive amounts of heterogeneous data from multiple distributed sources (e.g., administrative data, clinical records, and survey responses) to support the measurement and monitoring of population health and health system performance for a defined population. The design of the PopHR draws on the theories of the determinants of health and evidence-based public health to harmonize and explicitly link information about a population with evidence about the epidemiology and control of chronic diseases. Organizing information in this manner and linking it explicitly to evidence is expected to improve decision making related to the planning, implementation, and evaluation of population health and health system interventions. In this paper, we describe the PopHR platform and discuss its architecture, design, key modules, implementation and use.
Affiliation(s)
- Arash Shaban-Nejad
- Department of Epidemiology, Biostatistics, and Occupational Health, McGill University, Montréal, Québec, Canada; University of Tennessee Health Science Center-Oak Ridge National Laboratory (UTHSC-ORNL), Center for Biomedical Informatics, Department of Pediatrics, Memphis, Tennessee
| | - Maxime Lavigne
- Department of Epidemiology, Biostatistics, and Occupational Health, McGill University, Montréal, Québec, Canada
| | - Anya Okhmatovskaia
- Department of Epidemiology, Biostatistics, and Occupational Health, McGill University, Montréal, Québec, Canada
| | - David L Buckeridge
- Department of Epidemiology, Biostatistics, and Occupational Health, McGill University, Montréal, Québec, Canada
| |
|
63
|
Shen F, Lee Y. Knowledge Discovery from Biomedical Ontologies in Cross Domains. PLoS One 2016; 11:e0160005. [PMID: 27548262 PMCID: PMC4993478 DOI: 10.1371/journal.pone.0160005] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 03/17/2016] [Accepted: 07/12/2016] [Indexed: 01/19/2023] Open
Abstract
In recent years, there has been an increasing demand for sharing and integration of medical data in biomedical research. Improving a health care system requires supporting the integration of data by facilitating semantic interoperability systems and practices. Semantic interoperability is difficult to achieve in these systems as the conceptual models underlying datasets are not fully exploited. In this paper, we propose a semantic framework, called Medical Knowledge Discovery and Data Mining (MedKDD), that aims to build a topic hierarchy and support semantic interoperability between different ontologies. For this purpose, we focus on discovering semantic patterns about the association of relations in the heterogeneous information network representing different types of objects and relationships in multiple biological ontologies, and on creating a topic hierarchy through the analysis of the discovered patterns. These patterns are used to cluster heterogeneous information networks into a set of smaller topic graphs in a hierarchical manner and then to conduct cross-domain knowledge discovery from the multiple biological ontologies. Thus, patterns make a greater contribution to knowledge discovery across multiple ontologies. We have demonstrated cross-domain knowledge discovery in the MedKDD framework using a case study with 9 primary biological ontologies from Bio2RDF and compared it with a cross-domain query processing approach, namely SLAP. We have confirmed the effectiveness of the MedKDD framework in knowledge discovery from multiple medical ontologies.
Affiliation(s)
- Feichen Shen
  - Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, United States of America
- Yugyung Lee
  - School of Computing and Engineering, University of Missouri - Kansas City, Kansas City, Missouri, United States of America
64
Dumontier M, Gray AJG, Marshall MS, Alexiev V, Ansell P, Bader G, Baran J, Bolleman JT, Callahan A, Cruz-Toledo J, Gaudet P, Gombocz EA, Gonzalez-Beltran AN, Groth P, Haendel M, Ito M, Jupp S, Juty N, Katayama T, Kobayashi N, Krishnaswami K, Laibe C, Le Novère N, Lin S, Malone J, Miller M, Mungall CJ, Rietveld L, Wimalaratne SM, Yamaguchi A. The health care and life sciences community profile for dataset descriptions. PeerJ 2016; 4:e2331. [PMID: 27602295 PMCID: PMC4991880 DOI: 10.7717/peerj.2331] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 04/15/2016] [Accepted: 07/14/2016] [Indexed: 11/20/2022]
Abstract
Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. However, while there are many relevant vocabularies for the annotation of a dataset, none sufficiently captures all the necessary metadata. This prevents uniform indexing and querying of dataset repositories. Towards providing a practical guide for producing a high quality description of biomedical datasets, the W3C Semantic Web for Health Care and the Life Sciences Interest Group (HCLSIG) identified Resource Description Framework (RDF) vocabularies that could be used to specify common metadata elements and their value sets. The resulting guideline covers elements of description, identification, attribution, versioning, provenance, and content summarization. This guideline reuses existing vocabularies, and is intended to meet key functional requirements including indexing, discovery, exchange, query, and retrieval of datasets, thereby enabling the publication of FAIR data. The resulting metadata profile is generic and could be used by other domains with an interest in providing machine readable descriptions of versioned datasets.
Affiliation(s)
- Michel Dumontier
  - Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA, United States of America
- Alasdair J G Gray
  - Department of Computer Science, Heriot-Watt University, Edinburgh, United Kingdom
- M Scott Marshall
  - Department of Radiation Oncology (MAASTRO), GROW - School for Oncology and Developmental Biology, MAASTRO Clinic, Maastricht, Netherlands
- Gary Bader
  - The Donnelly Centre, University of Toronto, Toronto, Canada
- Joachim Baran
  - Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA, United States of America
- Jerven T Bolleman
  - Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Geneve, Switzerland
- Alison Callahan
  - Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA, United States of America
- Pascale Gaudet
  - CALIPHO group, SIB Swiss Institute of Bioinformatics, Geneve, Switzerland
- Melissa Haendel
  - Department of Medical Informatics and Epidemiology, Oregon Health Sciences University, Portland, OR, United States of America
- Maori Ito
  - Office of Medical Informatics and Epidemiology, Pharmaceuticals and Medical Devices Agency, Chiyoda-ku, Japan
- Simon Jupp
  - EMBL, European Bioinformatics Institute, Saffron Walden, United Kingdom
- Nick Juty
  - EMBL, European Bioinformatics Institute, Saffron Walden, United Kingdom
- Norio Kobayashi
  - Advanced Center for Computing and Communication, RIKEN, Wako-shi, Saitama, Japan
- Camille Laibe
  - EMBL, European Bioinformatics Institute, Saffron Walden, United Kingdom
- Simon Lin
  - Nationwide Children's Hospital, Columbus, OH, United States of America
- James Malone
  - EMBL, European Bioinformatics Institute, Saffron Walden, United Kingdom
- Michael Miller
  - Institute for Systems Biology, Seattle, WA, United States of America
- Christopher J Mungall
  - Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, United States of America
- Laurens Rietveld
  - Department of Exact Sciences, VU University Amsterdam, Amsterdam, Netherlands
65
Fernández-Breis JT, Chiba H, Legaz-García MDC, Uchiyama I. The Orthology Ontology: development and applications. J Biomed Semantics 2016; 7:34. [PMID: 27259657 PMCID: PMC4893294 DOI: 10.1186/s13326-016-0077-x] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 12/02/2015] [Accepted: 05/17/2016] [Indexed: 11/16/2022]
Abstract
Background: Computational comparative analysis of multiple genomes provides valuable opportunities for biomedical research. In particular, orthology analysis can play a central role in comparative genomics; it guides establishing evolutionary relations among genes of organisms and allows functional inference of gene products. However, the wide variations in current orthology databases necessitate research toward the shareability of the content that is generated by different tools and stored in different structures. Exchanging the content with other research communities requires making the meaning of the content explicit. Description: The need for a common ontology has led to the creation of the Orthology Ontology (ORTH) following the best practices in ontology construction. Here, we describe our model and the major entities of the ontology, which is implemented in the Web Ontology Language (OWL), followed by the assessment of the quality of the ontology and the application of the ORTH to existing orthology datasets. This shareable ontology makes it possible to develop Linked Orthology Datasets and a meta-predictor of orthology through standardization of the representation of orthology databases. The ORTH is freely available in OWL format to all users at http://purl.org/net/orth. Conclusions: The Orthology Ontology can serve as a framework for the semantic standardization of orthology content and it will contribute to a better exploitation of orthology resources in biomedical research. The results demonstrate the feasibility of developing shareable datasets using this ontology. Further applications will maximize the usefulness of this ontology.
Affiliation(s)
- Hirokazu Chiba
  - National Institute for Basic Biology, National Institutes of Natural Sciences, Okazaki, 444-8585, Aichi, Japan
- Ikuo Uchiyama
  - National Institute for Basic Biology, National Institutes of Natural Sciences, Okazaki, 444-8585, Aichi, Japan
66
Rodríguez-Iglesias A, Rodríguez-González A, Irvine AG, Sesma A, Urban M, Hammond-Kosack KE, Wilkinson MD. Publishing FAIR Data: An Exemplar Methodology Utilizing PHI-Base. Frontiers in Plant Science 2016; 7:641. [PMID: 27433158 PMCID: PMC4922217 DOI: 10.3389/fpls.2016.00641] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 01/20/2016] [Accepted: 04/26/2016] [Indexed: 06/06/2023]
Abstract
Pathogen-Host interaction data is core to our understanding of disease processes and their molecular/genetic bases. Facile access to such core data is particularly important for the plant sciences, where individual genetic and phenotypic observations have the added complexity of being dispersed over a wide diversity of plant species, in contrast to the relatively few host species of interest to biomedical researchers. Recently, an international initiative interested in scholarly data publishing proposed that all scientific data should be "FAIR": Findable, Accessible, Interoperable, and Reusable. In this work, we describe the process of migrating a database of notable relevance to the plant sciences, the Pathogen-Host Interaction Database (PHI-base), to a form that conforms to each of the FAIR Principles. We discuss the technical and architectural decisions, and the migration pathway, including observations of the difficulty and/or fidelity of each step. We examine how multiple FAIR principles can be addressed simultaneously through careful design decisions, including making data FAIR for both humans and machines with minimal duplication of effort. We note how FAIR data publishing involves more than data reformatting, requiring features beyond those exhibited by most life science Semantic Web or Linked Data resources. We explore the value added by completing this FAIR data transformation, and then test the result through integrative questions that could not easily be asked over traditional Web-based data resources. Finally, we demonstrate the utility of providing explicit and reliable access to provenance information, which we argue enhances citation rates by encouraging and facilitating transparent scholarly reuse of these valuable data holdings.
Affiliation(s)
- Alistair G. Irvine
  - Department of Computational and Systems Biology, Rothamsted Research, Harpenden, UK
- Ane Sesma
  - Center for Plant Biotechnology and Genomics, Universidad Politécnica de Madrid, Madrid, Spain
- Martin Urban
  - Department of Plant Biology and Crop Science, Rothamsted Research, Harpenden, UK
- Mark D. Wilkinson
  - Center for Plant Biotechnology and Genomics, Universidad Politécnica de Madrid, Madrid, Spain
67
Brochhausen M, Zheng J, Birtwell D, Williams H, Masci AM, Ellis HJ, Stoeckert CJ. OBIB-a novel ontology for biobanking. J Biomed Semantics 2016; 7:23. [PMID: 27148435 PMCID: PMC4855778 DOI: 10.1186/s13326-016-0068-y] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 12/01/2015] [Accepted: 04/21/2016] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Biobanking necessitates extensive integration of data to allow data analysis and specimen sharing. Ontologies have been demonstrated to be a promising approach in fostering better semantic integration of biobank-related data. Hitherto no ontology provided the coverage needed to capture a broad spectrum of biobank user scenarios. METHODS: Based on the principles laid out by the Open Biological and Biomedical Ontologies Foundry, two biobanking ontologies have been developed. These two ontologies were merged using a modular approach consistent with the initial development principles. The merging was facilitated by the fact that both ontologies use the same Upper Ontology and re-use classes from a similar set of pre-existing ontologies. RESULTS: Based on the two previous ontologies, the Ontology for Biobanking (http://purl.obolibrary.org/obo/obib.owl) was created. Because there was no overlap between the two source ontologies, the coverage of the resulting ontology is significantly larger than that of either source ontology. The ontology is successfully used in managing biobank information of the Penn Medicine BioBank. CONCLUSIONS: Sharing development principles and Upper Ontologies facilitates subsequent merging of ontologies to achieve a broader coverage.
Affiliation(s)
- Mathias Brochhausen
  - Department of Biomedical Informatics, University of Arkansas for Medical Sciences, 4301 W. Markham St., #782, Little Rock, AR 72205-7199 USA
- Jie Zheng
  - Department of Genetics, Institute for Translational Medicine and Therapeutics, Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, USA
- David Birtwell
  - Penn Medicine BioBank, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, USA
- Heather Williams
  - Penn Medicine BioBank, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, USA
- Anna Maria Masci
  - Department of Biostatistics and Bioinformatics, Duke Medical Center, Duke University, Durham, USA
- Helena Judge Ellis
  - Duke Biobank, Duke Translational Research Institute, Duke University, Durham, USA
- Christian J Stoeckert
  - Department of Genetics, Institute for Translational Medicine and Therapeutics, Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, USA
68
Shen F, Liu H, Sohn S, Larson DW, Lee Y. Predicate Oriented Pattern Analysis for Biomedical Knowledge Discovery. Intelligent Information Management 2016; 8:66-85. [PMID: 28983419 PMCID: PMC5626454 DOI: 10.4236/iim.2016.83006] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Indexed: 12/24/2022]
Abstract
In the current biomedical data movement, numerous efforts have been made to convert and normalize a large amount of traditional structured and unstructured data (e.g., EHRs, reports) to semi-structured data (e.g., RDF, OWL). With the increasing amount of semi-structured data coming into the biomedical community, data integration and knowledge discovery from heterogeneous domains have become important research problems. At the application level, detection of related concepts among medical ontologies is an important goal of life science research. It is even more crucial to figure out how different concepts are related within a single ontology or across multiple ontologies by analysing predicates in different knowledge bases. However, the world today is one of information explosion, and it is extremely difficult for biomedical researchers to find existing or potential predicates to link cross domain concepts without any support from schema pattern analysis. Therefore, there is a need for a mechanism that performs predicate oriented pattern analysis to partition heterogeneous ontologies into smaller, closely related topics and generates queries to discover cross domain knowledge from each topic. In this paper, we present such a model, which performs predicate oriented pattern analysis based on the close relationships among predicates and generates a similarity matrix. Based on this similarity matrix, we apply an innovative unsupervised learning algorithm to partition large data sets into smaller, closely related topics and generate meaningful queries to fully discover knowledge over a set of interlinked data sources. We have implemented a prototype system named BmQGen and evaluated the proposed model with a colorectal surgical cohort from the Mayo Clinic.
Affiliation(s)
- Feichen Shen
  - CSEE Department, University of Missouri, Kansas City, MO, USA
- Hongfang Liu
  - Department of Health Sciences Research, Mayo Clinic College of Medicine, Rochester, MN, USA
- Sunghwan Sohn
  - Department of Health Sciences Research, Mayo Clinic College of Medicine, Rochester, MN, USA
- David W Larson
  - Department of Surgery, Mayo Clinic College of Medicine, Rochester, MN, USA
- Yugyung Lee
  - CSEE Department, University of Missouri, Kansas City, MO, USA
69
Queralt-Rosinach N, Piñero J, Bravo À, Sanz F, Furlong LI. DisGeNET-RDF: harnessing the innovative power of the Semantic Web to explore the genetic basis of diseases. Bioinformatics 2016; 32:2236-8. [PMID: 27153650 PMCID: PMC4937199 DOI: 10.1093/bioinformatics/btw214] [Citation(s) in RCA: 35] [Impact Index Per Article: 4.4] [Received: 11/13/2015] [Accepted: 04/14/2016] [Indexed: 11/13/2022]
Abstract
Motivation: DisGeNET-RDF makes available knowledge on the genetic basis of human diseases in the Semantic Web. Gene-disease associations (GDAs) and their provenance metadata are published as human-readable and machine-processable web resources. The information on GDAs included in DisGeNET-RDF is interlinked to other biomedical databases to support the development of bioinformatics approaches for translational research through evidence-based exploitation of rich and fully interconnected linked open data. Availability and implementation: http://rdf.disgenet.org/ Contact: support@disgenet.org
Affiliation(s)
- Núria Queralt-Rosinach
  - Integrative Biomedical Informatics (IBI) Group, Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM), Department of Experimental and Health Sciences (DCEXS), Universitat Pompeu Fabra (UPF), C/Doctor Aiguader 88, E-08003 Barcelona, Spain
- Janet Piñero
  - Integrative Biomedical Informatics (IBI) Group, Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM), Department of Experimental and Health Sciences (DCEXS), Universitat Pompeu Fabra (UPF), C/Doctor Aiguader 88, E-08003 Barcelona, Spain
- Àlex Bravo
  - Integrative Biomedical Informatics (IBI) Group, Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM), Department of Experimental and Health Sciences (DCEXS), Universitat Pompeu Fabra (UPF), C/Doctor Aiguader 88, E-08003 Barcelona, Spain
- Ferran Sanz
  - Integrative Biomedical Informatics (IBI) Group, Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM), Department of Experimental and Health Sciences (DCEXS), Universitat Pompeu Fabra (UPF), C/Doctor Aiguader 88, E-08003 Barcelona, Spain
- Laura I Furlong
  - Integrative Biomedical Informatics (IBI) Group, Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM), Department of Experimental and Health Sciences (DCEXS), Universitat Pompeu Fabra (UPF), C/Doctor Aiguader 88, E-08003 Barcelona, Spain
70
Callahan A, Abeyruwan SW, Al-Ali H, Sakurai K, Ferguson AR, Popovich PG, Shah NH, Visser U, Bixby JL, Lemmon VP. RegenBase: a knowledge base of spinal cord injury biology for translational research. Database: The Journal of Biological Databases and Curation 2016; 2016:baw040. [PMID: 27055827 PMCID: PMC4823819 DOI: 10.1093/database/baw040] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 09/03/2015] [Accepted: 03/03/2016] [Indexed: 12/20/2022]
Abstract
Spinal cord injury (SCI) research is a data-rich field that aims to identify the biological mechanisms resulting in loss of function and mobility after SCI, as well as develop therapies that promote recovery after injury. SCI experimental methods, data and domain knowledge are locked in the largely unstructured text of scientific publications, making large scale integration with existing bioinformatics resources and subsequent analysis infeasible. The lack of standard reporting for experiment variables and results also makes experiment replicability a significant challenge. To address these challenges, we have developed RegenBase, a knowledge base of SCI biology. RegenBase integrates curated literature-sourced facts and experimental details, raw assay data profiling the effect of compounds on enzyme activity and cell growth, and structured SCI domain knowledge in the form of the first ontology for SCI, using Semantic Web representation languages and frameworks. RegenBase uses consistent identifier schemes and data representations that enable automated linking among RegenBase statements and also to other biological databases and electronic resources. By querying RegenBase, we have identified novel biological hypotheses linking the effects of perturbagens to observed behavioral outcomes after SCI. RegenBase is publicly available for browsing, querying and download. Database URL: http://regenbase.org
Affiliation(s)
- Alison Callahan
  - Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA 94305
- Hassan Al-Ali
  - Miami Project to Cure Paralysis, University of Miami School of Medicine, Miami, FL 33136
- Kunie Sakurai
  - Miami Project to Cure Paralysis, University of Miami School of Medicine, Miami, FL 33136
- Adam R Ferguson
  - Brain and Spinal Injury Center (BASIC), Department of Neurological Surgery, University of California, San Francisco; San Francisco Veterans Affairs Medical Center, San Francisco, CA 94143
- Phillip G Popovich
  - Center for Brain and Spinal Cord Repair and the Department of Neuroscience, The Ohio State University, Columbus, OH 43210
- Nigam H Shah
  - Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA 94305
- Ubbo Visser
  - Department of Computer Science, University of Miami, Coral Gables, FL 33146
- John L Bixby
  - Miami Project to Cure Paralysis, University of Miami School of Medicine, Miami, FL 33136; Center for Computational Science, University of Miami, Coral Gables, FL 33146; Department of Cellular and Molecular Pharmacology, University of Miami School of Medicine, Miami, FL 33136, USA
- Vance P Lemmon
  - Miami Project to Cure Paralysis, University of Miami School of Medicine, Miami, FL 33136; Center for Computational Science, University of Miami, Coral Gables, FL 33146
71
Burgstaller-Muehlbacher S, Waagmeester A, Mitraka E, Turner J, Putman T, Leong J, Naik C, Pavlidis P, Schriml L, Good BM, Su AI. Wikidata as a semantic framework for the Gene Wiki initiative. Database: The Journal of Biological Databases and Curation 2016; 2016:baw015. [PMID: 26989148 PMCID: PMC4795929 DOI: 10.1093/database/baw015] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.0] [Received: 11/07/2015] [Accepted: 02/01/2016] [Indexed: 11/14/2022]
Abstract
Open biological data are distributed over many resources making them challenging to integrate, to update and to disseminate quickly. Wikidata is a growing, open community database which can serve this purpose and also provides tight integration with Wikipedia. In order to improve the state of biological data, facilitate data management and dissemination, we imported all human and mouse genes, and all human and mouse proteins into Wikidata. In total, 59,721 human genes and 73,355 mouse genes have been imported from NCBI and 27,306 human proteins and 16,728 mouse proteins have been imported from the Swissprot subset of UniProt. As Wikidata is open and can be edited by anybody, our corpus of imported data serves as the starting point for integration of further data by scientists, the Wikidata community and citizen scientists alike. The first use case for these data is to populate Wikipedia Gene Wiki infoboxes directly from Wikidata with the data integrated above. This enables immediate updates of the Gene Wiki infoboxes as soon as the data in Wikidata are modified. Although Gene Wiki pages are currently only on the English language version of Wikipedia, the multilingual nature of Wikidata allows for usage of the data we imported in all 280 different language Wikipedias. Apart from the Gene Wiki infobox use case, a SPARQL endpoint and exporting functionality to several standard formats (e.g. JSON, XML) enable use of the data by scientists. In summary, we created a fully open and extensible data resource for human and mouse molecular biology and biochemistry data. This resource enriches all the Wikipedias with structured information and serves as a new linking hub for the biological semantic web. Database URL: https://www.wikidata.org/.
Affiliation(s)
- Julia Turner
  - The Scripps Research Institute, La Jolla, CA, USA
- Tim Putman
  - The Scripps Research Institute, La Jolla, CA, USA
- Justin Leong
  - The University of British Columbia, Vancouver, British Columbia, Canada
- Chinmay Naik
  - Bangalore Inst. Of Technology, Visvesvaraya Technological University, Bangalore, Karnataka
- Paul Pavlidis
  - The University of British Columbia, Vancouver, British Columbia, Canada
- Lynn Schriml
  - University of Maryland Baltimore, Baltimore, MD, USA
- Andrew I Su
  - The Scripps Research Institute, La Jolla, CA, USA
72
Shaban-Nejad A, Mamiya H, Riazanov A, Forster AJ, Baker CJO, Tamblyn R, Buckeridge DL. From Cues to Nudge: A Knowledge-Based Framework for Surveillance of Healthcare-Associated Infections. J Med Syst 2015; 40:23. [PMID: 26537131 DOI: 10.1007/s10916-015-0364-6] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 06/03/2015] [Accepted: 09/30/2015] [Indexed: 10/22/2022]
Abstract
We propose an integrated semantic web framework consisting of formal ontologies, web services, a reasoner and a rule engine that together recommend an appropriate level of patient care based on the defined semantic rules and guidelines. The classification of healthcare-associated infections within the HAIKU (Hospital Acquired Infections - Knowledge in Use) framework enables hospitals to consistently follow the standards along with their routine clinical practice and diagnosis coding to improve quality of care and patient safety. The HAI ontology (HAIO) groups thousands of codes into a consistent hierarchy of concepts, along with relationships and axioms, to capture knowledge on hospital-associated infections and complications, with a focus on the big four types: surgical site infections (SSIs), catheter-associated urinary tract infection (CAUTI), hospital-acquired pneumonia, and blood stream infection. By employing statistical inferencing, we use a set of heuristics to define the rule axioms that improve SSI case detection. We also demonstrate how the occurrence of an SSI is identified using semantic e-triggers. The e-triggers will be used to improve our risk assessment of post-operative SSIs for patients undergoing certain types of surgery (e.g., coronary artery bypass graft surgery (CABG)).
Affiliation(s)
- Arash Shaban-Nejad
  - School of Public Health, University of California at Berkeley, 50 University Hall, 94720-7360, Berkeley, CA, USA; Department of Epidemiology and Biostatistics, McGill University, Montreal, QC, Canada
- Hiroshi Mamiya
  - Department of Epidemiology and Biostatistics, McGill University, Montreal, QC, Canada
- Alexandre Riazanov
  - IPSNP Computing Inc, Suite 1000, 44 Chipman Hill, Station A, PO Box 7289, Saint John, NB, E2L 4S6, Canada
- Alan J Forster
  - Faculty of Medicine, University of Ottawa, Ottawa, ON, Canada
- Christopher J O Baker
  - Department of Epidemiology and Biostatistics, McGill University, Montreal, QC, Canada; Department of Computer Science, University of New Brunswick, Saint John, NB, Canada
- Robyn Tamblyn
  - Department of Epidemiology and Biostatistics, McGill University, Montreal, QC, Canada
- David L Buckeridge
  - Department of Epidemiology and Biostatistics, McGill University, Montreal, QC, Canada
73
González-Beltrán A, Li P, Zhao J, Avila-Garcia MS, Roos M, Thompson M, van der Horst E, Kaliyaperumal R, Luo R, Lee TL, Lam TW, Edmunds SC, Sansone SA, Rocca-Serra P. From Peer-Reviewed to Peer-Reproduced in Scholarly Publishing: The Complementary Roles of Data Models and Workflows in Bioinformatics. PLoS One 2015; 10:e0127612. [PMID: 26154165 PMCID: PMC4495984 DOI: 10.1371/journal.pone.0127612] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 12/04/2014] [Accepted: 04/16/2015] [Indexed: 12/20/2022]
Abstract
MOTIVATION: Reproducing the results from a scientific paper can be challenging due to the absence of data and the computational tools required for their analysis. In addition, details relating to the procedures used to obtain the published results can be difficult to discern due to the use of natural language when reporting how experiments have been performed. The Investigation/Study/Assay (ISA), Nanopublications (NP), and Research Objects (RO) models are conceptual data modelling frameworks that can structure such information from scientific papers. Computational workflow platforms can also be used to reproduce analyses of data in a principled manner. We assessed the extent to which ISA, NP, and RO models, together with the Galaxy workflow system, can capture the experimental processes and reproduce the findings of a previously published paper reporting on the development of SOAPdenovo2, a de novo genome assembler. RESULTS: Executable workflows were developed using Galaxy, which reproduced results that were consistent with the published findings. A structured representation of the information in the SOAPdenovo2 paper was produced by combining the use of ISA, NP, and RO models. By structuring the information in the published paper using these data and scientific workflow modelling frameworks, it was possible to explicitly declare elements of experimental design, variables, and findings. The models served as guides in the curation of scientific information and this led to the identification of inconsistencies in the original published paper, thereby allowing its authors to publish corrections in the form of errata.
AVAILABILITY: SOAPdenovo2 scripts, data, and results are available through the GigaScience Database: http://dx.doi.org/10.5524/100044; the workflows are available from GigaGalaxy: http://galaxy.cbiit.cuhk.edu.hk; and the representations using the ISA, NP, and RO models are available through the SOAPdenovo2 case study website http://isa-tools.github.io/soapdenovo2/. CONTACT: philippe.rocca-serra@oerc.ox.ac.uk and susanna-assunta.sansone@oerc.ox.ac.uk.
Collapse
Affiliation(s)
- Peter Li
- GigaScience, BGI HK Research Institute, 16 Dai Fu Street, Tai Po Industrial Estate, Hong Kong, People’s Republic of China
- Jun Zhao
- InfoLab21, Lancaster University, Bailrigg, Lancaster, LA1 4WA, United Kingdom
- Maria Susana Avila-Garcia
- Nuffield Department of Medicine, Experimental Medicine Division, John Radcliffe Hospital, Headley Way, Headington, Oxford, OX3 9DU, United Kingdom
- Marco Roos
- Department of Human Genetics, Leiden University Medical Center, P.O. Box 9600, 2300 RC Leiden, The Netherlands
- Mark Thompson
- Department of Human Genetics, Leiden University Medical Center, P.O. Box 9600, 2300 RC Leiden, The Netherlands
- Eelke van der Horst
- Department of Human Genetics, Leiden University Medical Center, P.O. Box 9600, 2300 RC Leiden, The Netherlands
- Rajaram Kaliyaperumal
- Department of Human Genetics, Leiden University Medical Center, P.O. Box 9600, 2300 RC Leiden, The Netherlands
- Ruibang Luo
- HKU-BGI Bioinformatics Algorithms and Core Technology Research Laboratory & Department of Computer Science, University of Hong Kong, Pokfulam, Hong Kong, People’s Republic of China
- Tin-Lap Lee
- School of Biomedical Sciences and CUHK-BGI Innovation Institute of Trans-omics, The Chinese University of Hong Kong, Shatin, Hong Kong, People’s Republic of China
- Tak-wah Lam
- HKU-BGI Bioinformatics Algorithms and Core Technology Research Laboratory & Department of Computer Science, University of Hong Kong, Pokfulam, Hong Kong, People’s Republic of China
- Scott C. Edmunds
- GigaScience, BGI HK Research Institute, 16 Dai Fu Street, Tai Po Industrial Estate, Hong Kong, People’s Republic of China
- Philippe Rocca-Serra
- Oxford e-Research Centre, University of Oxford, 7 Keble Road, OX1 3QG, United Kingdom
74
Baran J, Durgahee BSB, Eilbeck K, Antezana E, Hoehndorf R, Dumontier M. GFVO: the Genomic Feature and Variation Ontology. PeerJ 2015; 3:e933. [PMID: 26019997 PMCID: PMC4435477 DOI: 10.7717/peerj.933] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 11/04/2014] [Accepted: 04/14/2015] [Indexed: 01/06/2023] Open
Abstract
Falling costs in genomic laboratory experiments have led to a steady increase of genomic feature and variation data. Multiple genomic data formats exist for sharing these data, and whilst they are similar, they address slightly different data viewpoints and are consequently not fully compatible with each other. The fragmentation of data format specifications makes it hard to integrate and interpret data for further analysis with information from multiple data providers. As a solution, a new ontology is presented here for annotating and representing genomic feature and variation dataset contents. The Genomic Feature and Variation Ontology (GFVO) specifically addresses genomic data as it is regularly shared using the GFF3 (incl. FASTA), GTF, GVF and VCF file formats. GFVO simplifies data integration and enables linking of genomic annotations across datasets through common semantics of genomic types and relations. Availability and implementation. The latest stable release of the ontology is available via its base URI; previous and development versions are available at the ontology's GitHub repository: https://github.com/BioInterchange/Ontologies; versions of the ontology are indexed through BioPortal (without external class-/property-equivalences due to BioPortal release 4.10 limitations); examples and reference documentation are provided on a separate web page: http://www.biointerchange.org/ontologies.html. GFVO version 1.0.2 is licensed under the CC0 1.0 Universal license (https://creativecommons.org/publicdomain/zero/1.0) and therefore de facto within the public domain; the ontology can be appropriated without attribution for commercial and non-commercial use.
Affiliation(s)
- Joachim Baran
- Stanford Center for Biomedical Informatics Research, School of Medicine, Stanford University, Stanford, CA, USA
- Karen Eilbeck
- Department of Biomedical Informatics, School of Medicine, University of Utah, Salt Lake City, UT, USA
- Erick Antezana
- Department of Biology, Norwegian University of Science and Technology, Trondheim, Norway
- Robert Hoehndorf
- Computer, Electrical and Mathematical Sciences & Engineering Division and Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, Kingdom of Saudi Arabia
- Michel Dumontier
- Stanford Center for Biomedical Informatics Research, School of Medicine, Stanford University, Stanford, CA, USA
75
Piñero J, Queralt-Rosinach N, Bravo À, Deu-Pons J, Bauer-Mehren A, Baron M, Sanz F, Furlong LI. DisGeNET: a discovery platform for the dynamical exploration of human diseases and their genes. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2015; 2015:bav028. [PMID: 25877637 PMCID: PMC4397996 DOI: 10.1093/database/bav028] [Citation(s) in RCA: 630] [Impact Index Per Article: 70.0] [Received: 10/17/2014] [Accepted: 03/09/2015] [Indexed: 11/25/2022]
Abstract
DisGeNET is a comprehensive discovery platform designed to address a variety of questions concerning the genetic underpinning of human diseases. DisGeNET contains over 380 000 associations between >16 000 genes and 13 000 diseases, which makes it one of the largest repositories currently available of its kind. DisGeNET integrates expert-curated databases with text-mined data, covers information on Mendelian and complex diseases, and includes data from animal disease models. It features a score based on the supporting evidence to prioritize gene-disease associations. It is an open access resource available through a web interface, a Cytoscape plugin and as a Semantic Web resource. The web interface supports user-friendly data exploration and navigation. DisGeNET data can also be analysed via the DisGeNET Cytoscape plugin, and enriched with the annotations of other plugins of this popular network analysis software suite. Finally, the information contained in DisGeNET can be expanded and complemented using Semantic Web technologies and linked to a variety of resources already present in the Linked Data cloud. Hence, DisGeNET offers one of the most comprehensive collections of human gene-disease associations and a valuable set of tools for investigating the molecular mechanisms underlying diseases of genetic origin, designed to fulfill the needs of different user profiles, including bioinformaticians, biologists and health-care practitioners. Database URL: http://www.disgenet.org/
Affiliation(s)
- Janet Piñero
- Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, C/Dr Aiguader 88, E-08003 Barcelona, Spain, Roche Pharma Research and Early Development, pRED Informatics, Roche Innovation Center Penzberg, Roche Diagnostics GmbH, Nonnenwald 2, 82377 Penzberg, Germany and Scientific & Business Information Services, Roche Diagnostics GmbH, Nonnenwald 2, 82377 Penzberg, Germany
- Núria Queralt-Rosinach
- Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, C/Dr Aiguader 88, E-08003 Barcelona, Spain, Roche Pharma Research and Early Development, pRED Informatics, Roche Innovation Center Penzberg, Roche Diagnostics GmbH, Nonnenwald 2, 82377 Penzberg, Germany and Scientific & Business Information Services, Roche Diagnostics GmbH, Nonnenwald 2, 82377 Penzberg, Germany
- Àlex Bravo
- Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, C/Dr Aiguader 88, E-08003 Barcelona, Spain, Roche Pharma Research and Early Development, pRED Informatics, Roche Innovation Center Penzberg, Roche Diagnostics GmbH, Nonnenwald 2, 82377 Penzberg, Germany and Scientific & Business Information Services, Roche Diagnostics GmbH, Nonnenwald 2, 82377 Penzberg, Germany
- Jordi Deu-Pons
- Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, C/Dr Aiguader 88, E-08003 Barcelona, Spain, Roche Pharma Research and Early Development, pRED Informatics, Roche Innovation Center Penzberg, Roche Diagnostics GmbH, Nonnenwald 2, 82377 Penzberg, Germany and Scientific & Business Information Services, Roche Diagnostics GmbH, Nonnenwald 2, 82377 Penzberg, Germany
- Anna Bauer-Mehren
- Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, C/Dr Aiguader 88, E-08003 Barcelona, Spain, Roche Pharma Research and Early Development, pRED Informatics, Roche Innovation Center Penzberg, Roche Diagnostics GmbH, Nonnenwald 2, 82377 Penzberg, Germany and Scientific & Business Information Services, Roche Diagnostics GmbH, Nonnenwald 2, 82377 Penzberg, Germany
- Martin Baron
- Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, C/Dr Aiguader 88, E-08003 Barcelona, Spain, Roche Pharma Research and Early Development, pRED Informatics, Roche Innovation Center Penzberg, Roche Diagnostics GmbH, Nonnenwald 2, 82377 Penzberg, Germany and Scientific & Business Information Services, Roche Diagnostics GmbH, Nonnenwald 2, 82377 Penzberg, Germany
- Ferran Sanz
- Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, C/Dr Aiguader 88, E-08003 Barcelona, Spain, Roche Pharma Research and Early Development, pRED Informatics, Roche Innovation Center Penzberg, Roche Diagnostics GmbH, Nonnenwald 2, 82377 Penzberg, Germany and Scientific & Business Information Services, Roche Diagnostics GmbH, Nonnenwald 2, 82377 Penzberg, Germany
- Laura I Furlong
- Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, C/Dr Aiguader 88, E-08003 Barcelona, Spain, Roche Pharma Research and Early Development, pRED Informatics, Roche Innovation Center Penzberg, Roche Diagnostics GmbH, Nonnenwald 2, 82377 Penzberg, Germany and Scientific & Business Information Services, Roche Diagnostics GmbH, Nonnenwald 2, 82377 Penzberg, Germany
76
Chiba H, Nishide H, Uchiyama I. Construction of an ortholog database using the semantic web technology for integrative analysis of genomic data. PLoS One 2015; 10:e0122802. [PMID: 25875762 PMCID: PMC4395280 DOI: 10.1371/journal.pone.0122802] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 10/22/2014] [Accepted: 02/13/2015] [Indexed: 12/30/2022] Open
Abstract
Recently, various types of biological data, including genomic sequences, have been rapidly accumulating. To discover biological knowledge from such growing heterogeneous data, a flexible framework for data integration is necessary. Ortholog information is a central resource for interlinking corresponding genes among different organisms, and the Semantic Web provides a key technology for the flexible integration of heterogeneous data. We have constructed an ortholog database using the Semantic Web technology, aiming at the integration of numerous genomic data and various types of biological information. To formalize the structure of the ortholog information in the Semantic Web, we have constructed the Ortholog Ontology (OrthO). While the OrthO is a compact ontology for general use, it is designed to be extended to the description of database-specific concepts. On the basis of OrthO, we described the ortholog information from our Microbial Genome Database for Comparative Analysis (MBGD) in the form of Resource Description Framework (RDF) and made it available through the SPARQL endpoint, which accepts arbitrary queries specified by users. In this framework based on the OrthO, the biological data of different organisms can be integrated using the ortholog information as a hub. Besides, the ortholog information from different data sources can be compared with each other using the OrthO as a shared ontology. Here we show some examples demonstrating that the ortholog information described in RDF can be used to link various biological data such as taxonomy information and Gene Ontology. Thus, the ortholog database using the Semantic Web technology can contribute to biological knowledge discovery through integrative data analysis.
Affiliation(s)
- Hirokazu Chiba
- Laboratory of Genome Informatics, National Institute for Basic Biology, National Institutes of Natural Sciences, Okazaki, Aichi, Japan
- Hiroyo Nishide
- Data Integration and Analysis Facility, National Institute for Basic Biology, National Institutes of Natural Sciences, Okazaki, Aichi, Japan
- Ikuo Uchiyama
- Laboratory of Genome Informatics, National Institute for Basic Biology, National Institutes of Natural Sciences, Okazaki, Aichi, Japan
- Data Integration and Analysis Facility, National Institute for Basic Biology, National Institutes of Natural Sciences, Okazaki, Aichi, Japan
77
González-Beltrán A, Maguire E, Sansone SA, Rocca-Serra P. linkedISA: semantic representation of ISA-Tab experimental metadata. BMC Bioinformatics 2014; 15 Suppl 14:S4. [PMID: 25472428 PMCID: PMC4255742 DOI: 10.1186/1471-2105-15-s14-s4] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Indexed: 12/16/2022] Open
Abstract
BACKGROUND Reporting and sharing experimental metadata, such as the experimental design, characteristics of the samples, and procedures applied, along with the analysis results, in a standardised manner ensures that datasets are comprehensible and, in principle, reproducible, comparable and reusable. Furthermore, sharing datasets in formats designed for consumption by humans and machines will also maximize their use. The Investigation/Study/Assay (ISA) open source metadata tracking framework facilitates standards-compliant collection, curation, visualization, storage and sharing of datasets, leveraging other platforms to enable analysis and publication. The ISA software suite includes several components used in an increasingly diverse set of life science and biomedical domains; it is underpinned by a general-purpose format, ISA-Tab, and conversions exist into formats required by public repositories. While ISA-Tab works well mainly as a human readable format, we have also implemented a linked data approach to semantically define the ISA-Tab syntax. RESULTS We present a semantic web representation of the ISA-Tab syntax that complements ISA-Tab's syntactic interoperability with semantic interoperability. We introduce the linkedISA conversion tool from ISA-Tab to the Resource Description Framework (RDF), supporting mappings from the ISA syntax to multiple community-defined, open ontologies and capitalising on user-provided ontology annotations in the experimental metadata. We describe insights into the implementation and how annotations can be expanded driven by the metadata. We applied the conversion tool as part of Bio-GraphIIn, a web-based application supporting integration of the semantically-rich experimental descriptions. Designed in a user-friendly manner, the Bio-GraphIIn interface hides most of the complexities from the users, exposing a familiar tabular view of the experimental description to allow seamless interaction with the RDF representation, and visualising descriptors to drive the query over the semantic representation of the experimental design. In addition, we defined queries over the linkedISA RDF representation and demonstrated its use over the linkedISA conversion of datasets from Nature's Scientific Data online publication. CONCLUSIONS Our linked data approach has allowed us to: 1) make the ISA-Tab semantics explicit and machine-processable, 2) exploit the existing ontology-based annotations in the ISA-Tab experimental descriptions, 3) augment the ISA-Tab syntax with new descriptive elements, 4) visualise and query elements related to the experimental design. Reasoning over ISA-Tab metadata and associated data will facilitate data integration and knowledge discovery.
Affiliation(s)
- Eamonn Maguire
- Oxford e-Research Centre, University of Oxford, Oxford, OX1 3QG, UK
78
Chibucos MC, Mungall CJ, Balakrishnan R, Christie KR, Huntley RP, White O, Blake JA, Lewis SE, Giglio M. Standardized description of scientific evidence using the Evidence Ontology (ECO). DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2014; 2014:bau075. [PMID: 25052702 PMCID: PMC4105709 DOI: 10.1093/database/bau075] [Citation(s) in RCA: 82] [Impact Index Per Article: 8.2] [Indexed: 11/13/2022]
Abstract
The Evidence Ontology (ECO) is a structured, controlled vocabulary for capturing evidence in biological research. ECO includes diverse terms for categorizing evidence that supports annotation assertions including experimental types, computational methods, author statements and curator inferences. Using ECO, annotation assertions can be distinguished according to the evidence they are based on such as those made by curators versus those automatically computed or those made via high-throughput data review versus single test experiments. Originally created for capturing evidence associated with Gene Ontology annotations, ECO is now used in other capacities by many additional annotation resources including UniProt, Mouse Genome Informatics, Saccharomyces Genome Database, PomBase, the Protein Information Resource and others. Information on the development and use of ECO can be found at http://evidenceontology.org. The ontology is freely available under Creative Commons license (CC BY-SA 3.0), and can be downloaded in both Open Biological Ontologies and Web Ontology Language formats at http://code.google.com/p/evidenceontology. Also at this site is a tracker for user submission of term requests and questions. ECO remains under active development in response to user-requested terms and in collaborations with other ontologies and database resources. Database URL: Evidence Ontology Web site: http://evidenceontology.org
Affiliation(s)
- Marcus C Chibucos
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA, Department of Microbiology and Immunology, University of Maryland School of Medicine, Baltimore, MD 21201, USA, Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA, Saccharomyces Genome Database, Department of Genetics, Stanford University, Stanford, CA 94305, USA, Computational Biology and Bioinformatics, The Jackson Laboratory, Bar Harbor, ME 04609, USA, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD UK, Department of Epidemiology, University of Maryland School of Medicine, Baltimore, MD 21201, USA and Department of Medicine, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Christopher J Mungall
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA, Department of Microbiology and Immunology, University of Maryland School of Medicine, Baltimore, MD 21201, USA, Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA, Saccharomyces Genome Database, Department of Genetics, Stanford University, Stanford, CA 94305, USA, Computational Biology and Bioinformatics, The Jackson Laboratory, Bar Harbor, ME 04609, USA, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD UK, Department of Epidemiology, University of Maryland School of Medicine, Baltimore, MD 21201, USA and Department of Medicine, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Rama Balakrishnan
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA, Department of Microbiology and Immunology, University of Maryland School of Medicine, Baltimore, MD 21201, USA, Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA, Saccharomyces Genome Database, Department of Genetics, Stanford University, Stanford, CA 94305, USA, Computational Biology and Bioinformatics, The Jackson Laboratory, Bar Harbor, ME 04609, USA, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD UK, Department of Epidemiology, University of Maryland School of Medicine, Baltimore, MD 21201, USA and Department of Medicine, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Karen R Christie
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA, Department of Microbiology and Immunology, University of Maryland School of Medicine, Baltimore, MD 21201, USA, Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA, Saccharomyces Genome Database, Department of Genetics, Stanford University, Stanford, CA 94305, USA, Computational Biology and Bioinformatics, The Jackson Laboratory, Bar Harbor, ME 04609, USA, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD UK, Department of Epidemiology, University of Maryland School of Medicine, Baltimore, MD 21201, USA and Department of Medicine, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Rachael P Huntley
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA, Department of Microbiology and Immunology, University of Maryland School of Medicine, Baltimore, MD 21201, USA, Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA, Saccharomyces Genome Database, Department of Genetics, Stanford University, Stanford, CA 94305, USA, Computational Biology and Bioinformatics, The Jackson Laboratory, Bar Harbor, ME 04609, USA, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD UK, Department of Epidemiology, University of Maryland School of Medicine, Baltimore, MD 21201, USA and Department of Medicine, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Owen White
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA, Department of Microbiology and Immunology, University of Maryland School of Medicine, Baltimore, MD 21201, USA, Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA, Saccharomyces Genome Database, Department of Genetics, Stanford University, Stanford, CA 94305, USA, Computational Biology and Bioinformatics, The Jackson Laboratory, Bar Harbor, ME 04609, USA, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD UK, Department of Epidemiology, University of Maryland School of Medicine, Baltimore, MD 21201, USA and Department of Medicine, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Judith A Blake
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA, Department of Microbiology and Immunology, University of Maryland School of Medicine, Baltimore, MD 21201, USA, Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA, Saccharomyces Genome Database, Department of Genetics, Stanford University, Stanford, CA 94305, USA, Computational Biology and Bioinformatics, The Jackson Laboratory, Bar Harbor, ME 04609, USA, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD UK, Department of Epidemiology, University of Maryland School of Medicine, Baltimore, MD 21201, USA and Department of Medicine, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Suzanna E Lewis
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA, Department of Microbiology and Immunology, University of Maryland School of Medicine, Baltimore, MD 21201, USA, Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA, Saccharomyces Genome Database, Department of Genetics, Stanford University, Stanford, CA 94305, USA, Computational Biology and Bioinformatics, The Jackson Laboratory, Bar Harbor, ME 04609, USA, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD UK, Department of Epidemiology, University of Maryland School of Medicine, Baltimore, MD 21201, USA and Department of Medicine, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Michelle Giglio
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA, Department of Microbiology and Immunology, University of Maryland School of Medicine, Baltimore, MD 21201, USA, Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA, Saccharomyces Genome Database, Department of Genetics, Stanford University, Stanford, CA 94305, USA, Computational Biology and Bioinformatics, The Jackson Laboratory, Bar Harbor, ME 04609, USA, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD UK, Department of Epidemiology, University of Maryland School of Medicine, Baltimore, MD 21201, USA and Department of Medicine, University of Maryland School of Medicine, Baltimore, MD 21201, USA
79
Bölling C, Weidlich M, Holzhütter HG. SEE: structured representation of scientific evidence in the biomedical domain using Semantic Web techniques. J Biomed Semantics 2014; 5:S1. [PMID: 25093070 PMCID: PMC4108886 DOI: 10.1186/2041-1480-5-s1-s1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Accounts of evidence are vital to evaluate and reproduce scientific findings and integrate data on an informed basis. Currently, such accounts are often inadequate, unstandardized and inaccessible for computational knowledge engineering even though computational technologies, among them those of the semantic web, are ever more employed to represent, disseminate and integrate biomedical data and knowledge. RESULTS We present SEE (Semantic EvidencE), an RDF/OWL based approach for detailed representation of evidence in terms of the argumentative structure of the supporting background for claims even in complex settings. We derive design principles and identify minimal components for the representation of evidence. We specify the Reasoning and Discourse Ontology (RDO), an OWL representation of the model of scientific claims, their subjects, their provenance and their argumentative relations underlying the SEE approach. We demonstrate the application of SEE and illustrate its design patterns in a case study by providing an expressive account of the evidence for certain claims regarding the isolation of the enzyme glutamine synthetase. CONCLUSIONS SEE is suited to provide coherent and computationally accessible representations of evidence-related information such as the materials, methods, assumptions, reasoning and information sources used to establish a scientific finding by adopting a consistently claim-based perspective on scientific results and their evidence. SEE allows for extensible evidence representations, in which the level of detail can be adjusted and which can be extended as needed. It supports representation of arbitrarily many consecutive layers of interpretation and attribution and different evaluations of the same data. SEE and its underlying model could be a valuable component in a variety of use cases that require careful representation or examination of evidence for data presented on the semantic web or in other formats.
Affiliation(s)
- Christian Bölling
- Institute of Biochemistry, Charité Universitätsmedizin Berlin, Berlin, Germany
- Michael Weidlich
- Department of Computer Science, Humboldt-Universität zu Berlin, Berlin, Germany
80
González-Beltrán A, Maguire E, Sansone SA, Rocca-Serra P. linkedISA: semantic representation of ISA-Tab experimental metadata. BMC Bioinformatics 2014; 15. [PMID: 25472428 PMCID: PMC4255742 DOI: 10.1186/1471-2105-15-s14-s4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/10/2023] Open
Abstract
BACKGROUND Reporting and sharing experimental metadata- such as the experimental design, characteristics of the samples, and procedures applied, along with the analysis results, in a standardised manner ensures that datasets are comprehensible and, in principle, reproducible, comparable and reusable. Furthermore, sharing datasets in formats designed for consumption by humans and machines will also maximize their use. The Investigation/Study/Assay (ISA) open source metadata tracking framework facilitates standards-compliant collection, curation, visualization, storage and sharing of datasets, leveraging on other platforms to enable analysis and publication. The ISA software suite includes several components used in increasingly diverse set of life science and biomedical domains; it is underpinned by a general-purpose format, ISA-Tab, and conversions exist into formats required by public repositories. While ISA-Tab works well mainly as a human readable format, we have also implemented a linked data approach to semantically define the ISA-Tab syntax. RESULTS We present a semantic web representation of the ISA-Tab syntax that complements ISA-Tab's syntactic interoperability with semantic interoperability. We introduce the linkedISA conversion tool from ISA-Tab to the Resource Description Framework (RDF), supporting mappings from the ISA syntax to multiple community-defined, open ontologies and capitalising on user-provided ontology annotations in the experimental metadata. We describe insights of the implementation and how annotations can be expanded driven by the metadata. We applied the conversion tool as part of Bio-GraphIIn, a web-based application supporting integration of the semantically-rich experimental descriptions. 
Designed in a user-friendly manner, the Bio-GraphIIn interface hides most of the complexities from the users, exposing a familiar tabular view of the experimental description to allow seamless interaction with the RDF representation, and visualising descriptors to drive the query over the semantic representation of the experimental design. In addition, we defined queries over the linkedISA RDF representation and demonstrated its use over the linkedISA conversion of datasets from Nature's Scientific Data online publication. CONCLUSIONS Our linked data approach has allowed us to: 1) make the ISA-Tab semantics explicit and machine-processable, 2) exploit the existing ontology-based annotations in the ISA-Tab experimental descriptions, 3) augment the ISA-Tab syntax with new descriptive elements, and 4) visualise and query elements related to the experimental design. Reasoning over ISA-Tab metadata and associated data will facilitate data integration and knowledge discovery.
Affiliation(s)
- Eamonn Maguire
- Oxford e-Research Centre, University of Oxford, Oxford, OX1 3QG, UK
|
81
|
Chaudhri VK, Elenius D, Goldenkranz A, Gong A, Martone ME, Webb W, Yorke-Smith N. Comparative analysis of knowledge representation and reasoning requirements across a range of life sciences textbooks. J Biomed Semantics 2014; 5:51. [PMID: 25785183 PMCID: PMC4362633 DOI: 10.1186/2041-1480-5-51] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 05/23/2014] [Accepted: 11/26/2014] [Indexed: 11/29/2022] Open
Abstract
Background Using knowledge representation for biomedical projects is now commonplace. In previous work, we represented the knowledge found in a college-level biology textbook in a fashion useful for answering questions. We showed that embedding the knowledge representation and question-answering abilities in an electronic textbook helped to engage student interest and improve learning. A natural question that arises from this success, and this paper’s primary focus, is whether a similar approach is applicable across a range of life science textbooks. To answer that question, we considered four different textbooks, ranging from a below-introductory college biology text to an advanced, graduate-level neuroscience textbook. For these textbooks, we investigated the following questions: (1) To what extent is knowledge shared between the different textbooks? (2) To what extent can the same upper ontology be used to represent the knowledge found in different textbooks? (3) To what extent can the questions of interest for a range of textbooks be answered by using the same reasoning mechanisms? Results Our existing modeling and reasoning methods apply especially well both to a textbook that is comparable in level to the text studied in our previous work (i.e., an introductory-level text) and to a textbook at a lower level, suggesting potential for a high degree of portability. Even for the overlapping knowledge found across the textbooks, the level of detail covered in each textbook was different, which requires that the representations be customized for each textbook. We also found that for advanced textbooks, representing models and scientific reasoning processes was particularly important. Conclusions With some additional work, our representation methodology would be applicable to a range of textbooks. The requirements for knowledge representation are common across textbooks, suggesting that a shared semantic infrastructure for the life sciences is feasible. 
Because our representation overlaps heavily with those already being used for biomedical ontologies, this work suggests a natural pathway to include such representations as part of the life sciences curriculum at different grade levels.
Affiliation(s)
- William Webb
- Foothill Community College, Los Altos Hills, CA USA
- Neil Yorke-Smith
- American University of Beirut, Beirut, Lebanon; University of Cambridge, Cambridge, UK
|
82
|
Samadian S, McManus B, Wilkinson MD. Extending and encoding existing biological terminologies and datasets for use in the reasoned semantic web. J Biomed Semantics 2012; 3:6. [PMID: 22818710 PMCID: PMC3639885 DOI: 10.1186/2041-1480-3-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 10/21/2011] [Accepted: 05/13/2012] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Clinical phenotypes and disease-risk stratification are most often determined through the direct observations of clinicians in conjunction with published standards and guidelines, where the clinical expert is the final arbiter of the patient's classification. While this "human" approach is highly desirable in the context of personalized and optimal patient care, it is problematic in a healthcare research setting because the basis for the patient's classification is not transparent, and likely not reproducible from one clinical expert to another. This sits in opposition to the rigor required to execute, for example, genome-wide association analyses and other high-throughput studies where a large number of variables are being compared to a complex disease phenotype. Most clinical classification systems are not structured for automated classification, and similarly, clinical data is generally not represented in a form that lends itself to automated integration and interpretation. Here we apply Semantic Web technologies to the problem of automated, transparent interpretation of clinical data for use in high-throughput research environments, and explore migration paths for existing data and legacy semantic standards. RESULTS Using a dataset from a cardiovascular cohort collected two decades ago, we present a migration path, both for the terminologies/classification systems and the data, that enables rich automated clinical classification using well-established standards. This is achieved by establishing a simple and flexible core data model, which is combined with a layered ontological framework utilizing both logical reasoning and analytical algorithms to iteratively "lift" clinical data through increasingly complex layers of interpretation and classification. 
We compare our automated analysis to that of the clinical expert, and discrepancies are used to refine the ontological models, finally arriving at ontologies that mirror the expert opinion of the individual clinical researcher. Other discrepancies, however, could not be as easily modeled, and we evaluate what information we are lacking that would allow these discrepancies to be resolved in an automated manner. CONCLUSIONS We demonstrate that the combination of semantically-explicit data, logically rigorous models of clinical guidelines, and publicly-accessible Semantic Web Services can be used to execute automated, rigorous and reproducible clinical classifications with an accuracy approaching that of an expert. Discrepancies between the manual and automatic approaches reveal, as expected, that clinicians do not always rigorously follow established guidelines for classification; however, we demonstrate that "personalized" ontologies may represent a re-usable and transparent approach to modeling individual clinical expertise, leading to more reproducible science.
Affiliation(s)
- Soroush Samadian
- UBC James Hogg Research Center, Institute for Heart Lung Health, St. Paul's Hospital, Room 166, Burrard Building 1081 Burrard Street, Vancouver, BC, Canada, V6Z 1Y6
- Bruce McManus
- UBC James Hogg Research Center, Institute for Heart Lung Health, St. Paul's Hospital, Room 166, Burrard Building 1081 Burrard Street, Vancouver, BC, Canada, V6Z 1Y6
- Mark D Wilkinson
- NCE CECR Center of Excellence for the Prevention of Organ Failure (PROOF Centre), St. Paul's Hospital, Room 166, Burrard Building 1081 Burrard Street, Vancouver, BC, Canada, V6Z 1Y6
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid, Madrid, Spain
|