Reference Citation Analysis

For: Kazakov Y, Krötzsch M, Simančík F. The Incredible ELK. J Autom Reason 2013. [DOI: 10.1007/s10817-013-9296-3] [Citation(s) in RCA: 132] [Impact Index Per Article: 12.0] [Indexed: 11/30/2022]

Cited by Other Article(s)

Slater K, Schofield PN, Wright J, Clift P, Irani A, Bradlow W, Aziz F, Gkoutos GV. Talking about diseases; developing a model of patient and public-prioritised disease phenotypes. NPJ Digit Med 2024;7:263. [PMID: 39349692 PMCID: PMC11443070 DOI: 10.1038/s41746-024-01257-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2023] [Accepted: 09/11/2024] [Indexed: 10/04/2024] Open

Abstract

Deep phenotyping describes the use of standardised terminologies to create comprehensive phenotypic descriptions of biomedical phenomena. These characterisations facilitate secondary analysis, evidence synthesis, and practitioner awareness, thereby guiding patient care. The vast majority of this knowledge is derived from sources that describe an academic understanding of disease, including academic literature and experimental databases. Previous work indicates a gulf between the priorities, perspectives, and perceptions held by different healthcare stakeholders. Using social media data, we develop a phenotype model that represents a public perspective on disease and compare this with a model derived from a combination of existing academic phenotype databases. We identified 52,198 positive disease-phenotype associations from social media across 311 diseases. We further identified 24,618 novel phenotype associations not shared by the biomedical and literature-derived phenotype model across 304 diseases, of which we considered 14,531 significant. Manifestations of disease affecting quality of life, and concerning endocrine, digestive, and reproductive diseases were over-represented in the social media phenotype model. An expert clinical review found that social media-derived associations were considered similarly well-established to those derived from literature, and were seen significantly more in patient clinical encounters. The phenotype model recovered from social media presents a significantly different perspective than existing resources derived from biomedical databases and literature, providing a large number of associations novel to the latter dataset. We propose that the integration and interrogation of these public perspectives on the disease can inform clinical awareness, improve secondary analysis, and bridge understanding and priorities across healthcare stakeholders.

Yamagata Y, Kushida T, Onami S, Masuya H. Homeostasis imbalance process ontology: a study on COVID-19 infectious processes. BMC Med Inform Decis Mak 2024;23:301. [PMID: 38778394 PMCID: PMC11110177 DOI: 10.1186/s12911-024-02516-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2022] [Accepted: 04/15/2024] [Indexed: 05/25/2024] Open

Dally D, Amith M, Mauldin RL, Thomas L, Dang Y, Tao C. A Semantic Approach to Describe Social and Economic Characteristics That Impact Health Outcomes (Social Determinants of Health): Ontology Development Study. Online J Public Health Inform 2024;16:e52845. [PMID: 38477963 PMCID: PMC10973958 DOI: 10.2196/52845] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/17/2023] [Revised: 11/28/2023] [Accepted: 02/19/2024] [Indexed: 03/14/2024] Open

Abstract

BACKGROUND

Social determinants of health (SDoH) have been described by the World Health Organization as the conditions in which individuals are born, live, work, and age. These conditions can be grouped into 3 interrelated levels known as macrolevel (societal), mesolevel (community), and microlevel (individual) determinants. The scope of SDoH expands beyond the biomedical level, and there remains a need to connect other areas such as economics, public policy, and social factors.

OBJECTIVE

Providing a computable artifact that can link health data to concepts involving the different levels of determinants may improve our understanding of the impact SDoH have on human populations. Modeling SDoH may help to reduce existing gaps in the literature through explicit links between the determinants and biological factors. This in turn can allow researchers and clinicians to make better sense of data and discover new knowledge through the use of semantic links.

METHODS

An experimental ontology was developed to represent knowledge of the social and economic characteristics of SDoH. Information from 27 literature sources was analyzed to gather concepts and encoded using Web Ontology Language, version 2 (OWL2) and Protégé. Four evaluators independently reviewed the ontology axioms using natural language translation. The analyses from the evaluations and selected terminologies from the Basic Formal Ontology were used to create a revised ontology with a broad spectrum of knowledge concepts ranging from the macrolevel to the microlevel determinants.

RESULTS

The literature search identified several topics of discussion for each determinant level. Publications for the macrolevel determinants centered around health policy, income inequality, welfare, and the environment. Articles relating to the mesolevel determinants discussed work, work conditions, psychosocial factors, socioeconomic position, outcomes, food, poverty, housing, and crime. Finally, sources found for the microlevel determinants examined gender, ethnicity, race, and behavior. Concepts were gathered from the literature and used to produce an ontology consisting of 383 classes, 109 object properties, and 748 logical axioms. A reasoning test revealed no inconsistent axioms.

CONCLUSIONS

This ontology models heterogeneous social and economic concepts to represent aspects of SDoH. The scope of SDoH is expansive, and although the ontology is broad, it is still in its early stages. To our current understanding, this ontology represents the first attempt to concentrate on knowledge concepts that are currently not covered by existing ontologies. Future direction will include further expanding the ontology to link with other biomedical ontologies, including alignment for granular semantics.

Sanjak J, Binder J, Yadaw AS, Zhu Q, Mathé EA. Clustering rare diseases within an ontology-enriched knowledge graph. J Am Med Inform Assoc 2023;31:154-164. [PMID: 37759342 PMCID: PMC10746319 DOI: 10.1093/jamia/ocad186] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2023] [Revised: 08/01/2023] [Accepted: 09/05/2023] [Indexed: 09/29/2023] Open

Stefancsik R, Balhoff JP, Balk MA, Ball RL, Bello SM, Caron AR, Chesler EJ, de Souza V, Gehrke S, Haendel M, Harris LW, Harris NL, Ibrahim A, Koehler S, Matentzoglu N, McMurry JA, Mungall CJ, Munoz-Torres MC, Putman T, Robinson P, Smedley D, Sollis E, Thessen AE, Vasilevsky N, Walton DO, Osumi-Sutherland D. The Ontology of Biological Attributes (OBA)-computational traits for the life sciences. Mamm Genome 2023;34:364-378. [PMID: 37076585 PMCID: PMC10382347 DOI: 10.1007/s00335-023-09992-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 01/27/2023] [Accepted: 04/06/2023] [Indexed: 04/21/2023]

Affiliation(s)

Ray Stefancsik European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
James P Balhoff Renaissance Computing Institute, University of North Carolina, Chapel Hill, NC, 27517, USA
Meghan A Balk Natural History Museum, University of Oslo, Oslo, Norway
Robyn L Ball The Jackson Laboratory, Bar Harbor, ME, 04609, USA
Susan M Bello The Jackson Laboratory, Bar Harbor, ME, 04609, USA
Anita R Caron European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
Elissa J Chesler The Jackson Laboratory, Bar Harbor, ME, 04609, USA
Vinicius de Souza European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
Sarah Gehrke Anschutz Medical Campus, University of Colorado, Aurora, CO, 80045, USA
Melissa Haendel Anschutz Medical Campus, University of Colorado, Aurora, CO, 80045, USA
Laura W Harris European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
Nomi L Harris Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
Arwa Ibrahim European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
Sebastian Koehler Ada Health GmbH, Berlin, Germany
Nicolas Matentzoglu Semanticly, Athens, Greece
Julie A McMurry Anschutz Medical Campus, University of Colorado, Aurora, CO, 80045, USA
Christopher J Mungall Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
Monica C Munoz-Torres Anschutz Medical Campus, University of Colorado, Aurora, CO, 80045, USA
Tim Putman Anschutz Medical Campus, University of Colorado, Aurora, CO, 80045, USA
Peter Robinson The Jackson Laboratory, Bar Harbor, ME, 04609, USA
Damian Smedley William Harvey Research Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, EC1M 6BQ, UK
Elliot Sollis European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
Anne E Thessen Anschutz Medical Campus, University of Colorado, Aurora, CO, 80045, USA
Nicole Vasilevsky Data Collaboration Center, Critical Path Institute, Tucson, AZ, 85718, USA
David O Walton The Jackson Laboratory, Bar Harbor, ME, 04609, USA
David Osumi-Sutherland European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK

Sanjak J, Zhu Q, Mathé EA. Clustering rare diseases within an ontology-enriched knowledge graph. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.02.15.528673. [PMID: 36824742 PMCID: PMC9949046 DOI: 10.1101/2023.02.15.528673] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 02/18/2023]

Abstract

Objective

Identifying sets of rare diseases with shared aspects of etiology and pathophysiology may enable drug repurposing and/or platform based therapeutic development. Toward that aim, we utilized an integrative knowledge graph-based approach to constructing clusters of rare diseases.

Materials and Methods

Data on 3,242 rare diseases were extracted from the National Center for Advancing Translational Sciences (NCATS) Genetic and Rare Diseases Information Center (GARD) internal data resources. The rare disease data were enriched with additional biomedical data, including gene and phenotype ontologies, biological pathway data and small molecule-target activity data, to create a knowledge graph (KG). Node embeddings were used to convert nodes into vectors upon which k-means clustering was applied. We validated the disease clusters through semantic similarity and feature enrichment analysis.

Results

A node embedding model was trained on the ontology enriched rare disease KG and k-means clustering was applied to the embedding vectors resulting in 37 disease clusters with a mean size of 87 diseases. We validate the disease clusters quantitatively by looking at semantic similarity of clustered diseases, using the Orphanet Rare Disease Ontology. In addition, the clusters were analyzed for enrichment of associated genes, revealing that the enriched genes within clusters were shown to be highly related.

Discussion

We demonstrate that node embeddings are an effective method for clustering diseases within a heterogeneous KG. Semantically similar diseases and relevant enriched genes have been uncovered within the clusters. Connections between disease clusters and approved or investigational drugs are enumerated for follow-up efforts.

Conclusion

Our study lays out a method for clustering rare diseases using the graph node embeddings. We develop an easy to maintain pipeline that can be updated when new data on rare diseases emerges. The embeddings themselves can be paired with other representation learning methods for other data types, such as drugs, to address other predictive modeling problems. Detailed subnetwork analysis and in-depth review of individual clusters may lead to translatable findings. Future work will focus on incorporation of additional data sources, with a particular focus on common disease data.

Slater K, Williams JA, Schofield PN, Russell S, Pendleton SC, Karwath A, Fanning H, Ball S, Hoehndorf R, Gkoutos GV. Klarigi: Characteristic explanations for semantic biomedical data. Comput Biol Med 2023;153:106425. [PMID: 36638616 DOI: 10.1016/j.compbiomed.2022.106425] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2022] [Revised: 12/04/2022] [Accepted: 12/13/2022] [Indexed: 12/24/2022]

Abstract

Annotation of biomedical entities with ontology classes provides for formal semantic analysis and mobilisation of background knowledge in determining their relationships. To date, enrichment analysis has been routinely employed to identify classes that are over-represented in annotations across sets of groups, such as biosample gene expression profiles or patient phenotypes, and is useful for a range of tasks including differential diagnosis and causative variant prioritisation. These approaches, however, usually consider only univariate relationships, make limited use of the semantic features of ontologies, and provide limited information and evaluation of the explanatory power of both singular and grouped candidate classes. Moreover, they are not designed to solve the problem of deriving cohesive, characteristic, and discriminatory sets of classes for entity groups. We have developed a new tool, called Klarigi, which introduces multiple scoring heuristics for identification of classes that are both compositional and discriminatory for groups of entities annotated with ontology classes. The tool includes a novel algorithm for derivation of multivariable semantic explanations for entity groups, makes use of semantic inference through live use of an ontology reasoner, and includes a classification method for identifying the discriminatory power of candidate sets, in addition to significance testing apposite to traditional enrichment approaches. We describe the design and implementation of Klarigi, including its scoring and explanation determination methods, and evaluate its use in application to two test cases with clinical significance, comparing and contrasting methods and results with literature-based and enrichment analysis methods. We demonstrate that Klarigi produces characteristic and discriminatory explanations for groups of biomedical entities in two settings. We also show that these explanations recapitulate and extend the knowledge held in existing biomedical databases and literature for several diseases. We conclude that Klarigi provides a distinct and valuable perspective on biomedical datasets when compared with traditional enrichment methods, and therefore constitutes a new method by which biomedical datasets can be explored, contributing to improved insight into semantic data.

Affiliation(s)

Karin Slater College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, UK; Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, UK; MRC Health Data Research UK (HDR UK), Midlands, UK; University Hospitals Birmingham NHS Foundation Trust, Edgbaston, Birmingham, UK
John A Williams College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, UK; Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, UK; University Hospitals Birmingham NHS Foundation Trust, Edgbaston, Birmingham, UK
Paul N Schofield Department of Physiology, Development, and Neuroscience, University of Cambridge, UK
Sophie Russell College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, UK
Samantha C Pendleton College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, UK
Andreas Karwath College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, UK; Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, UK; MRC Health Data Research UK (HDR UK), Midlands, UK; University Hospitals Birmingham NHS Foundation Trust, Edgbaston, Birmingham, UK
Hilary Fanning Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, UK; University Hospitals Birmingham NHS Foundation Trust, Edgbaston, Birmingham, UK
Simon Ball Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, UK; University Hospitals Birmingham NHS Foundation Trust, Edgbaston, Birmingham, UK
Robert Hoehndorf Computer, Electrical and Mathematical Sciences & Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Saudi Arabia
Georgios V Gkoutos College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, UK; Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, UK; NIHR Experimental Cancer Medicine Centre, UK; NIHR Surgical Reconstruction and Microbiology Research Centre, UK; NIHR Biomedical Research Centre, UK; MRC Health Data Research UK (HDR UK), Midlands, UK; University Hospitals Birmingham NHS Foundation Trust, Edgbaston, Birmingham, UK

Stefancsik R, Balhoff JP, Balk MA, Ball R, Bello SM, Caron AR, Chessler E, de Souza V, Gehrke S, Haendel M, Harris LW, Harris NL, Ibrahim A, Koehler S, Matentzoglu N, McMurry JA, Mungall CJ, Munoz-Torres MC, Putman T, Robinson P, Smedley D, Sollis E, Thessen AE, Vasilevsky N, Walton DO, Osumi-Sutherland D. The Ontology of Biological Attributes (OBA) - Computational Traits for the Life Sciences. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.01.26.525742. [PMID: 36747660 PMCID: PMC9900877 DOI: 10.1101/2023.01.26.525742] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/30/2023]

Affiliation(s)

Ray Stefancsik European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
James P. Balhoff Renaissance Computing Institute, University of North Carolina, Chapel Hill, NC 27517, USA
Meghan A. Balk National Ecological Observatory Network, Battelle, Boulder, CO 80301, USA
Robyn Ball The Jackson Laboratory, Bar Harbor, ME 04609, USA
Susan M. Bello The Jackson Laboratory, Bar Harbor, ME 04609, USA
Anita R. Caron European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
Elissa Chessler The Jackson Laboratory, Bar Harbor, ME 04609, USA
Vinicius de Souza European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
Sarah Gehrke Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, USA
Melissa Haendel Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, USA
Laura W. Harris European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
Nomi L. Harris Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
Arwa Ibrahim European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
Sebastian Koehler Ada Health GmbH, Berlin, Germany
Nicolas Matentzoglu Semanticly Ltd., Athens, Greece
Julie A. McMurry Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, USA
Christopher J. Mungall Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
Monica C. Munoz-Torres Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, USA
Tim Putman Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, USA
Peter Robinson The Jackson Laboratory, Bar Harbor, ME 04609, USA
Damian Smedley William Harvey Research Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London EC1M 6BQ, UK
Elliot Sollis European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
Anne E Thessen Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, USA
Nicole Vasilevsky Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, USA
David O. Walton The Jackson Laboratory, Bar Harbor, ME 04609, USA
David Osumi-Sutherland European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK

Bonatti P, Petrova I, Sauro L. Optimizing the computation of overriding in DLN. ARTIF INTELL 2022. [DOI: 10.1016/j.artint.2022.103764] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]

Blagec K, Barbosa-Silva A, Ott S, Samwald M. A curated, ontology-based, large-scale knowledge graph of artificial intelligence tasks and benchmarks. Sci Data 2022;9:322. [PMID: 35715466 PMCID: PMC9205953 DOI: 10.1038/s41597-022-01435-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 10/19/2021] [Accepted: 05/30/2022] [Indexed: 11/22/2022] Open

Vaidya G, Cellinese N, Lapp H. A new phylogenetic data standard for computable clade definitions: the Phyloreference Exchange Format (Phyx). PeerJ 2022;10:e12618. [PMID: 35186448 PMCID: PMC8855714 DOI: 10.7717/peerj.12618] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2021] [Accepted: 11/18/2021] [Indexed: 01/06/2023] Open

Memory-Limited Model-Based Diagnosis. ARTIF INTELL 2022. [DOI: 10.1016/j.artint.2022.103681] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/02/2022]

Blumberg KL, Ponsero AJ, Bomhoff M, Wood-Charlson EM, DeLong EF, Hurwitz BL. Ontology-Enriched Specifications Enabling Findable, Accessible, Interoperable, and Reusable Marine Metagenomic Datasets in Cyberinfrastructure Systems. Front Microbiol 2021;12:765268. [PMID: 34956127 PMCID: PMC8692764 DOI: 10.3389/fmicb.2021.765268] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/26/2021] [Accepted: 11/16/2021] [Indexed: 11/13/2022] Open

Abstract

Marine microbial ecology requires the systematic comparison of biogeochemical and sequence data to analyze environmental influences on the distribution and variability of microbial communities. With ever-increasing quantities of metagenomic data, there is a growing need to make datasets Findable, Accessible, Interoperable, and Reusable (FAIR) across diverse ecosystems. FAIR data is essential to developing analytical frameworks that integrate microbiological, genomic, ecological, oceanographic, and computational methods. Although community standards defining the minimal metadata required to accompany sequence data exist, they haven’t been consistently used across projects, precluding interoperability. Moreover, these data are not machine-actionable or discoverable by cyberinfrastructure systems. By making ‘omic and physicochemical datasets FAIR to machine systems, we can enable sequence data discovery and reuse based on machine-readable descriptions of environments or physicochemical gradients. In this work, we developed a novel technical specification for dataset encapsulation for the FAIR reuse of marine metagenomic and physicochemical datasets within cyberinfrastructure systems. This includes using Frictionless Data Packages enriched with terminology from environmental and life-science ontologies to annotate measured variables, their units, and the measurement devices used. This approach was implemented in Planet Microbe, a cyberinfrastructure platform and marine metagenomic web-portal. Here, we discuss the data properties built into the specification to make global ocean datasets FAIR within the Planet Microbe portal. We additionally discuss the selection of, and contributions to marine-science ontologies used within the specification. Finally, we use the system to discover data by which to answer various biological questions about environments, physicochemical gradients, and microbial communities in meta-analyses. This work represents a future direction in marine metagenomic research by proposing a specification for FAIR dataset encapsulation that, if adopted within cyberinfrastructure systems, would automate the discovery, exchange, and re-use of data needed to answer broader reaching questions than originally intended.

Lawson J, Cabili MN, Kerry G, Boughtwood T, Thorogood A, Alper P, Bowers SR, Boyles RR, Brookes AJ, Brush M, Burdett T, Clissold H, Donnelly S, Dyke SO, Freeberg MA, Haendel MA, Hata C, Holub P, Jeanson F, Jene A, Kawashima M, Kawashima S, Konopko M, Kyomugisha I, Li H, Linden M, Rodriguez LL, Morita M, Mulder N, Muller J, Nagaie S, Nasir J, Ogishima S, Ota Wang V, Paglione LD, Pandya RN, Parkinson H, Philippakis AA, Prasser F, Rambla J, Reinold K, Rushton GA, Saltzman A, Saunders G, Sofia HJ, Spalding JD, Swertz MA, Tulchinsky I, van Enckevort EJ, Varma S, Voisin C, Yamamoto N, Yamasaki C, Zass L, Guidry Auvil JM, Nyrönen TH, Courtot M. The Data Use Ontology to streamline responsible access to human biomedical datasets. CELL GENOMICS 2021;1:100028. [PMID: 34820659 PMCID: PMC8591903 DOI: 10.1016/j.xgen.2021.100028] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 02/28/2021] [Revised: 07/02/2021] [Accepted: 08/09/2021] [Indexed: 11/25/2022]

Affiliation(s)

Jonathan Lawson Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, MA, USA
Moran N. Cabili Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, MA, USA
Giselle Kerry European Molecular Biology Laboratory—European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
Tiffany Boughtwood Australian Genomics, Murdoch Children’s Research Institute, Parkville, VIC, Australia
Adrian Thorogood Centre of Genomics and Policy, Department of Human Genetics, McGill University, Montreal, QC, Canada; ELIXIR-Luxembourg, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
Pinar Alper ELIXIR-Luxembourg, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
Sarion R. Bowers Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
Rebecca R. Boyles RTI International, Research Triangle Park, NC, USA
Anthony J. Brookes University of Leicester, Leicester, UK
Matthew Brush University of Colorado Anschutz Medical Campus, Aurora, CO, USA
Tony Burdett European Molecular Biology Laboratory—European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
Hayley Clissold Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
Stacey Donnelly Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, MA, USA
Stephanie O.M. Dyke McGill Centre for Integrative Neuroscience, Montreal Neurological Institute, Department of Neurology & Neurosurgery, Faculty of Medicine, McGill University, Montreal, QC, Canada
Mallory A. Freeberg European Molecular Biology Laboratory—European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
Melissa A. Haendel University of Colorado Anschutz Medical Campus, Aurora, CO, USA
Chihiro Hata Bioinformation and DDBJ Center, National Institute of Genetics, Mishima, Japan
Petr Holub BBMRI-ERIC, AT and Masaryk University, Brno, Czech Republic
Francis Jeanson University Health Network, Toronto, ON, Canada
Aina Jene Centre de Regulació Genòmica (CRG), Barcelona, Spain
Minae Kawashima National Bioscience Database Center, Japan Science and Technology Agency, Tokyo, Japan
Shuichi Kawashima Database Center for Life Science, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Kashiwa, Japan
Melissa Konopko ELIXIR Hub, Wellcome Genome Campus, Hinxton, UK
Irene Kyomugisha Division of Human Genetics, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
Haoyuan Li Canada’s Michael Smith Genome Sciences Centre, Vancouver, BC, Canada
Mikael Linden ELIXIR-Finland, CSC - IT Center for Science Ltd, Espoo, Finland
Laura Lyman Rodriguez Patient-Centered Outcomes Research Institute, Washington, DC, USA
Mizuki Morita Okayama University, Okayama, Japan
Nicola Mulder Computational Biology Division, IDM, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
Jean Muller Laboratoire de Génétique Médicale, Institut de Génétique Médicale d’Alsace, INSERM U1112, Université de Strasbourg, Strasbourg, France; Laboratoire de Diagnostic Génétique, Institut de Génétique Médicale d’Alsace, Hôpitaux Universitaires de Strasbourg, Strasbourg, France
Satoshi Nagaie Tohoku Medical Megabank Organization (ToMMo), Tohoku University, Sendai, Japan
Jamal Nasir Department of Life Sciences, University of Northampton, Northampton, UK
Soichi Ogishima Tohoku Medical Megabank Organization (ToMMo), Tohoku University, Sendai, Japan
Vivian Ota Wang Office of Data Sharing, National Cancer Institute, NIH, Rockville, MD, USA
Laura D. Paglione Spherical Cow Group, Rego Park, NY 11374, USA
Ravi N. Pandya Microsoft Research, Redmond, WA 98052, USA
Helen Parkinson European Molecular Biology Laboratory—European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
Anthony A. Philippakis Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, MA, USA
Fabian Prasser Berlin Institute of Health at Charité—Universitätsmedizin Berlin, Berlin, Germany
Jordi Rambla Centre de Regulació Genòmica (CRG), Barcelona, Spain
Kathy Reinold Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, MA, USA
Gregory A. Rushton Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, MA, USA
Andrea Saltzman Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, MA, USA
Gary Saunders ELIXIR Hub, Wellcome Genome Campus, Hinxton, UK
Heidi J. Sofia National Human Genome Research Institute, NIH, Bethesda, MD, USA
John D. Spalding European Molecular Biology Laboratory—European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
Morris A. Swertz Genomics Coordination Center, Department of Genetics, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
Ilia Tulchinsky Google Cloud, Kitchener, ON N2H 5G5, Canada
Esther J. van Enckevort Genomics Coordination Center, Department of Genetics, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
Susheel Varma Health Data Research UK, Gibbs Building, 215 Euston Road, London NW1 2BE, UK
Craig Voisin Google Cloud, Kitchener, ON N2H 5G5, Canada
Natsuko Yamamoto Osaka University, Osaka, Japan
Chisato Yamasaki Osaka University, Osaka, Japan
Lyndon Zass Computational Biology Division, IDM, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
Jaime M. Guidry Auvil Office of Data Sharing, National Cancer Institute, NIH, Rockville, MD, USA
Tommi H. Nyrönen ELIXIR-Finland, CSC - IT Center for Science Ltd, Espoo, Finland
Mélanie Courtot European Molecular Biology Laboratory—European Bioinformatics Institute (EMBL-EBI), Hinxton, UK

Tena Cucala D, Cuenca Grau B, Horrocks I. Pay-as-you-go consequence-based reasoning for the description logic SROIQ. ARTIF INTELL 2021. [DOI: 10.1016/j.artint.2021.103518] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/24/2022]

Kulmanov M, Smaili FZ, Gao X, Hoehndorf R. Semantic similarity and machine learning with ontologies. Brief Bioinform 2021;22:bbaa199. [PMID: 33049044 PMCID: PMC8293838 DOI: 10.1093/bib/bbaa199] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 05/07/2020] [Revised: 08/03/2020] [Accepted: 08/04/2020] [Indexed: 12/13/2022] Open

Biomedical Ontologies: Coverage, Access and Use. SYSTEMS MEDICINE 2021. [DOI: 10.1016/b978-0-12-801238-3.11664-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/20/2022] Open

Slater LT, Gkoutos GV, Hoehndorf R. Towards semantic interoperability: finding and repairing hidden contradictions in biomedical ontologies. BMC Med Inform Decis Mak 2020;20:311. [PMID: 33319712 PMCID: PMC7736131 DOI: 10.1186/s12911-020-01336-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 11/11/2020] [Accepted: 11/16/2020] [Indexed: 12/25/2022] Open

Abstract

Background

Ontologies are widely used throughout the biomedical domain. These ontologies formally represent the classes and relations assumed to exist within a domain. As scientific domains are deeply interlinked, so too are their representations. While individual ontologies can be tested for consistency and coherency using automated reasoning methods, systematically combining ontologies of multiple domains together may reveal previously hidden contradictions.

Methods

We developed a method that tests for hidden unsatisfiabilities in an ontology that arise when it is combined with other ontologies. For this purpose, we combined sets of ontologies and used automated reasoning to determine whether unsatisfiable classes are present. In addition, we designed and implemented a novel algorithm that can determine justifications for contradictions across extremely large and complicated ontologies, and use these justifications to semi-automatically repair ontologies by identifying a small set of axioms that, when removed, result in a consistent and coherent set of ontologies.
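
The repair step described above can be pictured as a hitting-set problem: each justification is a set of axioms that together make some class unsatisfiable, and removing at least one axiom from every known justification removes the contradictions those justifications explain. The abstract does not say how the axioms are chosen, so the Python sketch below uses a plain greedy heuristic as a stand-in. The function name `greedy_repair` and the toy axiom strings are hypothetical, and this is not the authors' algorithm.

```python
from collections import Counter


def greedy_repair(justifications):
    """Return a set of axioms that intersects every given justification.

    Greedy hitting-set heuristic (an illustrative assumption, not the
    published method): repeatedly pick the axiom that occurs in the most
    justifications that are still unresolved.
    """
    remaining = [frozenset(j) for j in justifications if j]
    removed = set()
    while remaining:
        counts = Counter(axiom for j in remaining for axiom in j)
        # Most frequent axiom first; ties broken alphabetically so the
        # result is deterministic.
        best = min(counts, key=lambda axiom: (-counts[axiom], axiom))
        removed.add(best)
        # A justification is resolved once one of its axioms is removed.
        remaining = [j for j in remaining if best not in j]
    return removed


# Toy justifications (hypothetical axioms written as plain strings).
justifications = [
    {"A SubClassOf B", "A SubClassOf C", "B DisjointWith C"},
    {"D SubClassOf B", "D SubClassOf C", "B DisjointWith C"},
    {"E SubClassOf F", "F SubClassOf Nothing"},
]
repair = greedy_repair(justifications)
print(sorted(repair))
```

On this toy input the shared disjointness axiom is picked first because it appears in two justifications. A greedy choice of this kind is not guaranteed to yield a minimum-size repair in general.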

Results

We tested the mutual consistency of the OBO Foundry and the OBO ontologies and found that the combined OBO Foundry gives rise to at least 636 unsatisfiable classes, while the OBO ontologies give rise to more than 300,000 unsatisfiable classes. We also applied our semi-automatic repair algorithm to each combination of OBO ontologies that resulted in unsatisfiable classes, finding that removing only 117 axioms was enough to account for all cases of unsatisfiability across all OBO ontologies.

Conclusions

We identified a large set of hidden unsatisfiability across a broad range of biomedical ontologies, and we find that this large set of unsatisfiable classes is the result of a relatively small number of axiomatic disagreements. Our results show that hidden unsatisfiability is a serious problem in ontology interoperability; however, our results also provide a way towards more consistent ontologies by addressing the issues we identified.

Bonatti PA, Ioffredo L, Petrova IM, Sauro L, Siahaan IR. Real-time reasoning in OWL2 for GDPR compliance. ARTIF INTELL 2020. [DOI: 10.1016/j.artint.2020.103389] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/27/2022]

Protein ontology on the semantic web for knowledge discovery. Sci Data 2020;7:337. [PMID: 33046717 PMCID: PMC7550340 DOI: 10.1038/s41597-020-00679-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 05/20/2020] [Accepted: 09/17/2020] [Indexed: 11/26/2022] Open

Schneider T, Šimkus M. Ontologies and Data Management: A Brief Survey. KUNSTLICHE INTELLIGENZ 2020;34:329-353. [PMID: 32999532 PMCID: PMC7497697 DOI: 10.1007/s13218-020-00686-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 06/25/2020] [Accepted: 07/22/2020] [Indexed: 11/30/2022]

Machine Understandable Policies and GDPR Compliance Checking. KUNSTLICHE INTELLIGENZ 2020. [DOI: 10.1007/s13218-020-00677-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 10/23/2022]

Tripodi IJ, Callahan TJ, Westfall JT, Meitzer NS, Dowell RD, Hunter LE. Applying knowledge-driven mechanistic inference to toxicogenomics. Toxicol In Vitro 2020;66:104877. [PMID: 32387679 DOI: 10.1016/j.tiv.2020.104877] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 01/20/2020] [Revised: 04/13/2020] [Accepted: 04/23/2020] [Indexed: 02/07/2023]

Abeysinghe R, Hinderer EW, Moseley HNB, Cui L. SSIF: Subsumption-based Sub-term Inference Framework to audit Gene Ontology. Bioinformatics 2020;36:3207-3214. [PMID: 32065617 PMCID: PMC7214018 DOI: 10.1093/bioinformatics/btaa106] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 08/23/2019] [Revised: 02/08/2020] [Accepted: 02/11/2020] [Indexed: 01/02/2023] Open

Kasalica V, Knorr M, Leite J, Lopes C. NoHR: An Overview. KUNSTLICHE INTELLIGENZ 2020. [DOI: 10.1007/s13218-020-00650-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/29/2022]

Mabee PM, Balhoff JP, Dahdul WM, Lapp H, Mungall CJ, Vision TJ. A Logical Model of Homology for Comparative Biology. Syst Biol 2020;69:345-362. [PMID: 31596473 PMCID: PMC7672696 DOI: 10.1093/sysbio/syz067] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/09/2019] [Revised: 09/20/2019] [Accepted: 09/26/2019] [Indexed: 01/09/2023] Open

Harth A, Kirrane S, Ngonga Ngomo AC, Paulheim H, Rula A, Gentile AL, Haase P, Cochez M. Hybrid Reasoning Over Large Knowledge Bases Using On-The-Fly Knowledge Extraction. THE SEMANTIC WEB 2020. [PMCID: PMC7250607 DOI: 10.1007/978-3-030-49461-2_5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/30/2022]

Understanding and improving ontology reasoning efficiency through learning and ranking. INFORM SYST 2020. [DOI: 10.1016/j.is.2019.07.002] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/24/2022]

A polynomial Time Subsumption Algorithm for Nominal SafeELO⊥ under Rational Closure. Inf Sci (N Y) 2019. [DOI: 10.1016/j.ins.2018.09.037] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 11/17/2022]

Jackson RC, Balhoff JP, Douglass E, Harris NL, Mungall CJ, Overton JA. ROBOT: A Tool for Automating Ontology Workflows. BMC Bioinformatics 2019;20:407. [PMID: 31357927 PMCID: PMC6664714 DOI: 10.1186/s12859-019-3002-3] [Citation(s) in RCA: 68] [Impact Index Per Article: 13.6] [Received: 10/09/2018] [Accepted: 07/19/2019] [Indexed: 11/21/2022] Open

Abstract

BACKGROUND

Ontologies are invaluable in the life sciences, but building and maintaining ontologies often requires a challenging number of distinct tasks such as running automated reasoners and quality control checks, extracting dependencies and application-specific subsets, generating standard reports, and generating release files in multiple formats. Similar to more general software development, automation is the key to executing and managing these tasks effectively and to releasing more robust products in standard forms. For ontologies using the Web Ontology Language (OWL), the OWL API Java library is the foundation for a range of software tools, including the Protégé ontology editor. In the Open Biological and Biomedical Ontologies (OBO) community, we recognized the need to package a wide range of low-level OWL API functionality into a library of common higher-level operations and to make those operations available as a command-line tool.

RESULTS

ROBOT (a recursive acronym for "ROBOT is an OBO Tool") is an open source library and command-line tool for automating ontology development tasks. The library can be called from any programming language that runs on the Java Virtual Machine (JVM). Most usage is through the command-line tool, which runs on macOS, Linux, and Windows. ROBOT provides ontology processing commands for a variety of tasks, including commands for converting formats, running a reasoner, creating import modules, running reports, and various other tasks. These commands can be combined into larger workflows using a separate task execution system such as GNU Make, and workflows can be automatically executed within continuous integration systems.

CONCLUSIONS

ROBOT supports automation of a wide range of ontology development tasks, focusing on OBO conventions. It packages common high-level ontology development functionality into a convenient library, and makes it easy to configure, combine, and execute individual tasks in comprehensive, automated workflows. This helps ontology developers to efficiently create, maintain, and release high-quality ontologies, so that they can spend more time focusing on development tasks. It also helps guarantee that released ontologies are free of certain types of logical errors and conform to standard quality control checks, increasing the overall robustness and efficiency of the ontology development lifecycle.

Smaili FZ, Gao X, Hoehndorf R. OPA2Vec: combining formal and informal content of biomedical ontologies to improve similarity-based prediction. Bioinformatics 2018;35:2133-2140. [DOI: 10.1093/bioinformatics/bty933] [Citation(s) in RCA: 65] [Impact Index Per Article: 10.8] [Received: 08/05/2018] [Revised: 11/02/2018] [Accepted: 11/07/2018] [Indexed: 12/11/2022] Open

Alshahrani M, Khan MA, Maddouri O, Kinjo AR, Queralt-Rosinach N, Hoehndorf R. Neuro-symbolic representation learning on biological knowledge graphs. Bioinformatics 2018;33:2723-2730. [PMID: 28449114 PMCID: PMC5860058 DOI: 10.1093/bioinformatics/btx275] [Citation(s) in RCA: 54] [Impact Index Per Article: 9.0] [Received: 12/13/2016] [Accepted: 04/18/2017] [Indexed: 11/12/2022] Open

Stucky BJ, Guralnick R, Deck J, Denny EG, Bolmgren K, Walls R. The Plant Phenology Ontology: A New Informatics Resource for Large-Scale Integration of Plant Phenology Data. FRONTIERS IN PLANT SCIENCE 2018;9:517. [PMID: 29765382 PMCID: PMC5938398 DOI: 10.3389/fpls.2018.00517] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Received: 01/01/2018] [Accepted: 04/04/2018] [Indexed: 05/25/2023]

Rodríguez-García MÁ, Hoehndorf R. Inferring ontology graph structures using OWL reasoning. BMC Bioinformatics 2018;19:7. [PMID: 29304741 PMCID: PMC5756413 DOI: 10.1186/s12859-017-1999-8] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 05/24/2017] [Accepted: 12/13/2017] [Indexed: 12/14/2022] Open

Osumi-Sutherland D. Cell ontology in an age of data-driven cell classification. BMC Bioinformatics 2017;18:558. [PMID: 29322914 PMCID: PMC5763290 DOI: 10.1186/s12859-017-1980-6] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Indexed: 12/31/2022] Open

Rodríguez-García MÁ, Gkoutos GV, Schofield PN, Hoehndorf R. Integrating phenotype ontologies with PhenomeNET. J Biomed Semantics 2017;8:58. [PMID: 29258588 PMCID: PMC5735523 DOI: 10.1186/s13326-017-0167-4] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 03/17/2017] [Accepted: 11/22/2017] [Indexed: 01/05/2023] Open

Boudellioua I, Mahamad Razali RB, Kulmanov M, Hashish Y, Bajic VB, Goncalves-Serra E, Schoenmakers N, Gkoutos GV, Schofield PN, Hoehndorf R. Semantic prioritization of novel causative genomic variants. PLoS Comput Biol 2017;13:e1005500. [PMID: 28414800 PMCID: PMC5411092 DOI: 10.1371/journal.pcbi.1005500] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 11/08/2016] [Revised: 05/01/2017] [Accepted: 04/04/2017] [Indexed: 12/14/2022] Open

Abstract

Discriminating the causative disease variant(s) for individuals with inherited or de novo mutations presents one of the main challenges faced by the clinical genetics community today. Computational approaches for variant prioritization include machine learning methods utilizing a large number of features, including molecular information, interaction networks, or phenotypes. Here, we demonstrate the PhenomeNET Variant Predictor (PVP) system that exploits semantic technologies and automated reasoning over genotype-phenotype relations to filter and prioritize variants in whole exome and whole genome sequencing datasets. We demonstrate the performance of PVP in identifying causative variants on a large number of synthetic whole exome and whole genome sequences, covering a wide range of diseases and syndromes. In a retrospective study, we further illustrate the application of PVP for the interpretation of whole exome sequencing data in patients suffering from congenital hypothyroidism. We find that PVP accurately identifies causative variants in whole exome and whole genome sequencing datasets and provides a powerful resource for the discovery of causal variants.

We address the problem of how to distinguish which of the many thousands of DNA sequence variants carried by an individual with a rare disease is responsible for the disease phenotypes. This can help clinicians arrive at a diagnosis, but can also be instrumental in improving our understanding of the pathobiology of the disease. Many methods are currently available to help with the problem of determining the causative variant, using information about evolutionary conservation and prediction of the functional consequences of the sequence variant. We have developed a novel algorithm (PVP) which augments existing strategies by using the similarity of the patient's phenotype to known phenotype-genotype data in human and model organism databases to further rank potential candidate genes. In a retrospective study, we apply PVP to the interpretation of whole exome sequencing data in patients suffering from congenital hypothyroidism, and find that PVP accurately identifies causative variants in whole exome and whole genome sequencing datasets and provides a powerful resource for the discovery of causal variants.
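
To illustrate the ranking idea in the simplest possible terms, the following Python sketch scores candidate genes by the overlap between a patient's phenotypes and the phenotypes already associated with each gene. Jaccard overlap on plain sets is an assumption made for illustration only; the abstract does not specify the similarity measure PVP uses, and the gene names and phenotype labels here are hypothetical.

```python
def jaccard(a, b):
    """Jaccard overlap of two collections, treated as sets (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_genes(patient_phenotypes, gene_phenotypes):
    """Rank genes by phenotype overlap with the patient, best match first.

    gene_phenotypes maps a gene name to the phenotypes known for that gene.
    Ties are broken alphabetically by gene name.
    """
    scores = {
        gene: jaccard(patient_phenotypes, phenotypes)
        for gene, phenotypes in gene_phenotypes.items()
    }
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical patient and gene annotations.
patient = {"goitre", "growth delay", "low thyroxine"}
known = {
    "GENE_A": {"goitre", "low thyroxine", "hearing loss"},
    "GENE_B": {"seizures", "growth delay"},
    "GENE_C": {"polydactyly"},
}
ranking = rank_genes(patient, known)
for gene, score in ranking:
    print(gene, round(score, 2))
```

A set-overlap score ignores the ontology hierarchy, so closely related but non-identical phenotype terms count as mismatches. Ontology-aware similarity measures exist to address that limitation.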

Affiliation(s)

Imane Boudellioua King Abdullah University of Science and Technology, Computer, Electrical & Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, Thuwal, Saudi Arabia
Rozaimi B. Mahamad Razali King Abdullah University of Science and Technology, Computer, Electrical & Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, Thuwal, Saudi Arabia
Maxat Kulmanov King Abdullah University of Science and Technology, Computer, Electrical & Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, Thuwal, Saudi Arabia
Yasmeen Hashish King Abdullah University of Science and Technology, Computer, Electrical & Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, Thuwal, Saudi Arabia
Vladimir B. Bajic King Abdullah University of Science and Technology, Computer, Electrical & Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, Thuwal, Saudi Arabia
Eva Goncalves-Serra Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom
Nadia Schoenmakers University of Cambridge Metabolic Research Laboratories, Wellcome Trust—Medical Research Council, Institute of Metabolic Science, Addenbrooke’s Hospital, Cambridge, United Kingdom
Georgios V. Gkoutos College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, Centre for Computational Biology, University of Birmingham, Birmingham, United Kingdom Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, Birmingham, United Kingdom Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth, United Kingdom
Paul N. Schofield Department of Physiology, Development & Neuroscience, University of Cambridge, Cambridge, United Kingdom
Robert Hoehndorf King Abdullah University of Science and Technology, Computer, Electrical & Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, Thuwal, Saudi Arabia

Parsia B, Matentzoglu N, Gonçalves RS, Glimm B, Steigmiller A. The OWL Reasoner Evaluation (ORE) 2015 Competition Report. J Autom Reason 2017;59:455-482. [PMID: 30069067 PMCID: PMC6044265 DOI: 10.1007/s10817-017-9406-8] [Citation(s) in RCA: 37] [Impact Index Per Article: 5.3] [Received: 03/31/2016] [Accepted: 02/01/2017] [Indexed: 11/25/2022]

Zhou Z, Qi G. GEL: A Platform-Independent Reasoner for Parallel Classification with OWL EL Ontologies Using Graph Representation. INT J ARTIF INTELL T 2017. [DOI: 10.1142/s0218213017600016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]

Esposito A, Esposito AM, Troncone A, Cordasco G, Orlandini A, Tsoukalas L. Editorial. INT J ARTIF INTELL T 2017. [DOI: 10.1142/s0218213017020018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]

Hoehndorf R, Alshahrani M, Gkoutos GV, Gosline G, Groom Q, Hamann T, Kattge J, de Oliveira SM, Schmidt M, Sierra S, Smets E, Vos RA, Weiland C. The flora phenotype ontology (FLOPO): tool for integrating morphological traits and phenotypes of vascular plants. J Biomed Semantics 2016;7:65. [PMID: 27842607 PMCID: PMC5109718 DOI: 10.1186/s13326-016-0107-8] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 12/02/2015] [Accepted: 11/01/2016] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

The systematic analysis of a large number of comparable plant trait data can support investigations into phylogenetics and ecological adaptation, with broad applications in evolutionary biology, agriculture, conservation, and the functioning of ecosystems. Floras, i.e., books collecting the information on all known plant species found within a region, are a potentially rich source of such plant trait data. Floras describe plant traits with a focus on morphology and other traits relevant for species identification in addition to other characteristics of plant species, such as ecological affinities, distribution, economic value, health applications, traditional uses, and so on. However, a key limitation in systematically analyzing information in Floras is the lack of a standardized vocabulary for the described traits as well as the difficulties in extracting structured information from free text.

RESULTS

We have developed the Flora Phenotype Ontology (FLOPO), an ontology for describing traits of plant species found in Floras. We used the Plant Ontology (PO) and the Phenotype And Trait Ontology (PATO) to extract entity-quality relationships from digitized taxon descriptions in Floras, and used a formal ontological approach based on phenotype description patterns and automated reasoning to generate the FLOPO. The resulting ontology consists of 25,407 classes and is based on the PO and PATO. The classified ontology closely follows the structure of the Plant Ontology in that the primary axis of classification is the observed plant anatomical structure, and more specific traits are then classified based on parthood and subclass relations between anatomical structures as well as subclass relations between phenotypic qualities.

CONCLUSIONS

The FLOPO is primarily intended as a framework based on which plant traits can be integrated computationally across all species and higher taxa of flowering plants. Importantly, it is not intended to replace established vocabularies or ontologies, but rather to serve as an overarching framework based on which different application- and domain-specific ontologies, thesauri and vocabularies of phenotypes observed in flowering plants can be integrated.

Affiliation(s)

Robert Hoehndorf Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955–6900 Kingdom of Saudi Arabia Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955–6900 Kingdom of Saudi Arabia
Mona Alshahrani Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955–6900 Kingdom of Saudi Arabia Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955–6900 Kingdom of Saudi Arabia
Georgios V. Gkoutos College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, Centre for Computational Biology, University of Birmingham, Birmingham, B15 2TT United Kingdom Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, Birmingham, B15 2TT United Kingdom Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth, SY23 2AX United Kingdom
George Gosline Royal Botanical Gardens, Kew, Richmond, Surrey, TW9 3AB United Kingdom
Quentin Groom Botanic Garden Meise, Nieuwelaan 38, Meise, 1860 Belgium
Thomas Hamann Naturalis Biodiversity Center, P.O. Box 9517, Leiden, 2300 RA The Netherlands
Jens Kattge Max Planck Institute for Biogeochemistry, Hans Knoell Str. 10, Jena, 07745 Germany German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Deutscher Platz 5e, Leipzig, 04103 Germany
Sylvia Mota de Oliveira Naturalis Biodiversity Center, P.O. Box 9517, Leiden, 2300 RA The Netherlands
Marco Schmidt Senckenberg Biodiversity and Climate Research Centre (BiK-F), Senckenberganlage 25, Frankfurt am Main, 60325 Germany
Soraya Sierra Naturalis Biodiversity Center, P.O. Box 9517, Leiden, 2300 RA The Netherlands
Erik Smets Naturalis Biodiversity Center, P.O. Box 9517, Leiden, 2300 RA The Netherlands
Rutger A. Vos Naturalis Biodiversity Center, P.O. Box 9517, Leiden, 2300 RA The Netherlands
Claus Weiland Senckenberg Biodiversity and Climate Research Centre (BiK-F), Senckenberganlage 25, Frankfurt am Main, 60325 Germany

Eiter T, Fink M, Stepanova D. Data repair of inconsistent nonmonotonic description logic programs. ARTIF INTELL 2016. [DOI: 10.1016/j.artint.2016.06.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]

The Bayesian Ontology Language $\mathcal{BEL}$. J Autom Reason 2016. [DOI: 10.1007/s10817-016-9386-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 10/20/2022]

Hill DP, D'Eustachio P, Berardini TZ, Mungall CJ, Renedo N, Blake JA. Modeling biochemical pathways in the gene ontology. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2016;2016:baw126. [PMID: 27589964 PMCID: PMC5009323 DOI: 10.1093/database/baw126] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 05/24/2016] [Accepted: 08/10/2016] [Indexed: 12/05/2022]

Minimizing conservativity violations in ontology alignments: algorithms and evaluation. Knowl Inf Syst 2016. [DOI: 10.1007/s10115-016-0983-3] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Indexed: 10/21/2022]

Slater L, Gkoutos GV, Schofield PN, Hoehndorf R. Using AberOWL for fast and scalable reasoning over BioPortal ontologies. J Biomed Semantics 2016;7:49. [PMID: 27502585 PMCID: PMC4976511 DOI: 10.1186/s13326-016-0090-0] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 02/03/2016] [Accepted: 07/08/2016] [Indexed: 11/30/2022] Open

Dececchi TA, Mabee PM, Blackburn DC. Data Sources for Trait Databases: Comparing the Phenomic Content of Monographs and Evolutionary Matrices. PLoS One 2016;11:e0155680. [PMID: 27191170 PMCID: PMC4871461 DOI: 10.1371/journal.pone.0155680] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 01/27/2016] [Accepted: 05/03/2016] [Indexed: 01/17/2023] Open

Detwiler LT, Mejino JLV, Brinkley JF. From frames to OWL2: Converting the Foundational Model of Anatomy. Artif Intell Med 2016;69:12-21. [PMID: 27235801 DOI: 10.1016/j.artmed.2016.04.003] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 06/26/2015] [Revised: 04/22/2016] [Accepted: 04/23/2016] [Indexed: 12/17/2022]

Abstract

OBJECTIVE

The Foundational Model of Anatomy (FMA) [Rosse C, Mejino JLV. A reference ontology for bioinformatics: the Foundational Model of Anatomy. J. Biomed. Inform. 2003;36:478-500] is an ontology that represents canonical anatomy at levels ranging from the entire body to biological macromolecules, and has rapidly become the primary reference ontology for human anatomy, and a template for model organisms. Prior to this work, the FMA was developed in a knowledge modeling language known as Protégé Frames. Frames is an intuitive representational language, but is no longer the industry standard. Recognizing the need for an official version of the FMA in the more modern semantic web language OWL2 (hereafter referred to as OWL), the objective of this work was to create a generalizable Frames-to-OWL conversion tool, to use the tool to convert the FMA to OWL, to "clean up" the converted FMA so that it classifies under an EL reasoner, and then to do all further development in OWL.

METHODS

The conversion tool is a Java application that uses the Protégé knowledge representation API for interacting with the initial Frames ontology, and uses the OWL-API for producing new statements (axioms, etc.) in OWL. The converter is relation centric. The conversion is configurable, on a property-by-property basis, via user-specifiable XML configuration files. The best conversion, for each property, was determined in conjunction with the FMA knowledge author. The converter is potentially generalizable, which we partially demonstrate by using it to convert our Ontology of Craniofacial Development and Malformation as well as the FMA. Post-conversion cleanup involved using the Explain feature of Protégé to trace classification errors under the ELK reasoner in Protégé, fixing the errors, then re-running the reasoner.

RESULTS

We are currently doing all our development in the converted and cleaned-up version of the FMA. The FMA (updated every 3 months) is available via our FMA web page http://si.washington.edu/projects/fma, which also provides access to mailing lists, an issue tracker, a SPARQL endpoint (updated every week), and an online browser. The converted OCDM is available at http://www.si.washington.edu/projects/ocdm. The conversion code is open source, and available at http://purl.org/sig/software/frames2owl. Prior to the post-conversion cleanup 73% of the more than 100,000 classes were unsatisfiable. After correction of six types of errors no classes remained unsatisfiable.

CONCLUSION

Because our FMA conversion captures all or most of the information in the Frames version, is the only complete OWL version that classifies under an EL reasoner, and is maintained by the FMA authors themselves, we propose that this version should be the only official release version of the FMA in OWL, supplanting all other versions. Although several issues remain to be resolved post-conversion, release of a single, standardized version of the FMA in OWL will greatly facilitate its use in informatics research and in the development of a global knowledge base within the semantic web. Because of the fundamental nature of anatomy in both understanding and organizing biomedical information, and because of the importance of the FMA in particular in representing human anatomy, the FMA in OWL should greatly accelerate the development of an anatomically based structural information framework for organizing and linking a large amount of biomedical information.

Druzinsky RE, Balhoff JP, Crompton AW, Done J, German RZ, Haendel MA, Herrel A, Herring SW, Lapp H, Mabee PM, Muller HM, Mungall CJ, Sternberg PW, Van Auken K, Vinyard CJ, Williams SH, Wall CE. Muscle Logic: New Knowledge Resource for Anatomy Enables Comprehensive Searches of the Literature on the Feeding Muscles of Mammals. PLoS One 2016;11:e0149102. [PMID: 26870952 PMCID: PMC4752357 DOI: 10.1371/journal.pone.0149102] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 08/14/2015] [Accepted: 01/27/2016] [Indexed: 01/27/2023] Open

Abstract

Background

In recent years large bibliographic databases have made much of the published literature of biology available for searches. However, the capabilities of the search engines integrated into these databases for text-based bibliographic searches are limited. To enable searches that deliver the results expected by comparative anatomists, an underlying logical structure known as an ontology is required.

Development and Testing of the Ontology

Here we present the Mammalian Feeding Muscle Ontology (MFMO), a multi-species ontology focused on anatomical structures that participate in feeding and other oral/pharyngeal behaviors. A unique feature of the MFMO is that a simple, computable definition of each muscle, which includes its attachments and innervation, is true across mammals. This construction mirrors the logical foundation of comparative anatomy and permits searches using language familiar to biologists. Further, it provides a template for muscles that will be useful in extending any anatomy ontology. The MFMO is developed to support the Feeding Experiments End-User Database Project (FEED, https://feedexp.org/), a publicly available online repository for physiological data collected from in vivo studies of feeding (e.g., mastication, biting, swallowing) in mammals. Currently the MFMO is integrated into FEED and also into two literature-specific implementations of Textpresso, a text-mining system that facilitates powerful searches of a corpus of scientific publications. We evaluate the MFMO by asking competency questions, which test the ability of the ontology to return appropriate answers. We compare the results of queries of the MFMO to results from similar searches in PubMed and Google Scholar.

Results and Significance

Our tests demonstrate that the MFMO is competent to answer queries formed in the common language of comparative anatomy, but PubMed and Google Scholar are not. Overall, our results show that by incorporating anatomical ontologies into searches, an expanded and anatomically comprehensive set of results can be obtained. The broader scientific and publishing communities should consider taking up the challenge of semantically enabled search capabilities.

Affiliation(s)

Robert E. Druzinsky Department of Oral Biology, University of Illinois at Chicago, Chicago, Illinois, United States of America * E-mail:
James P. Balhoff RTI International, Research Triangle Park, North Carolina, United States of America
Alfred W. Crompton Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts, United States of America
James Done Division of Biology and Biological Engineering, M/C 156–29, California Institute of Technology, Pasadena, California, United States of America
Rebecca Z. German Department of Anatomy and Neurobiology, Northeast Ohio Medical University, Rootstown, Ohio, United States of America
Melissa A. Haendel Oregon Health and Science University, Portland, Oregon, United States of America
Anthony Herrel Département d’Ecologie et de Gestion de la Biodiversité, Museum National d’Histoire Naturelle, Paris, France
Susan W. Herring University of Washington, Department of Orthodontics, Seattle, Washington, United States of America
Hilmar Lapp National Evolutionary Synthesis Center, Durham, North Carolina, United States of America; Center for Genomic and Computational Biology, Duke University, Durham, North Carolina, United States of America
Paula M. Mabee Department of Biology, University of South Dakota, Vermillion, South Dakota, United States of America
Hans-Michael Muller Division of Biology and Biological Engineering, M/C 156–29, California Institute of Technology, Pasadena, California, United States of America
Christopher J. Mungall Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
Paul W. Sternberg Division of Biology and Biological Engineering, M/C 156–29, California Institute of Technology, Pasadena, California, United States of America; Howard Hughes Medical Institute, M/C 156–29, California Institute of Technology, Pasadena, California, United States of America
Kimberly Van Auken Division of Biology and Biological Engineering, M/C 156–29, California Institute of Technology, Pasadena, California, United States of America
Christopher J. Vinyard Department of Anatomy and Neurobiology, Northeast Ohio Medical University, Rootstown, Ohio, United States of America
Susan H. Williams Department of Biomedical Sciences, Ohio University Heritage College of Osteopathic Medicine, Athens, Ohio, United States of America
Christine E. Wall Department of Evolutionary Anthropology, Duke University, Durham, North Carolina, United States of America

Hoehndorf R, Schofield PN, Gkoutos GV. The role of ontologies in biological and biomedical research: a functional perspective. Brief Bioinform 2015;16:1069-80. [PMID: 25863278 PMCID: PMC4652617 DOI: 10.1093/bib/bbv011] [Citation(s) in RCA: 119] [Impact Index Per Article: 13.2] [Received: 11/26/2014] [Revised: 01/20/2015] [Indexed: 12/19/2022] Open