1
|
Slater K, Schofield PN, Wright J, Clift P, Irani A, Bradlow W, Aziz F, Gkoutos GV. Talking about diseases; developing a model of patient and public-prioritised disease phenotypes. NPJ Digit Med 2024; 7:263. [PMID: 39349692 PMCID: PMC11443070 DOI: 10.1038/s41746-024-01257-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/19/2023] [Accepted: 09/11/2024] [Indexed: 10/04/2024] Open
Abstract
Deep phenotyping describes the use of standardised terminologies to create comprehensive phenotypic descriptions of biomedical phenomena. These characterisations facilitate secondary analysis, evidence synthesis, and practitioner awareness, thereby guiding patient care. The vast majority of this knowledge is derived from sources that describe an academic understanding of disease, including academic literature and experimental databases. Previous work indicates a gulf between the priorities, perspectives, and perceptions held by different healthcare stakeholders. Using social media data, we develop a phenotype model that represents a public perspective on disease and compare this with a model derived from a combination of existing academic phenotype databases. We identified 52,198 positive disease-phenotype associations from social media across 311 diseases. We further identified 24,618 novel phenotype associations not shared by the biomedical and literature-derived phenotype model across 304 diseases, of which we considered 14,531 significant. Manifestations of disease affecting quality of life, and concerning endocrine, digestive, and reproductive diseases were over-represented in the social media phenotype model. An expert clinical review found that social media-derived associations were considered similarly well-established to those derived from literature, and were seen significantly more in patient clinical encounters. The phenotype model recovered from social media presents a significantly different perspective than existing resources derived from biomedical databases and literature, providing a large number of associations novel to the latter dataset. We propose that the integration and interrogation of these public perspectives on the disease can inform clinical awareness, improve secondary analysis, and bridge understanding and priorities across healthcare stakeholders.
Collapse
Affiliation(s)
- Karin Slater
- Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, UK.
- Centre for Environmental Research and Justice, University of Birmingham, Birmingham, UK.
- Centre for Health Data Science, University of Birmingham, Birmingham, UK.
- University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK.
| | - Paul N Schofield
- Department of Physiology, Development, and Neuroscience, University of Cambridge, Cambridge, UK
| | | | - Paul Clift
- University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
| | - Anushka Irani
- Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, UK
- Division of Rheumatology, Mayo Clinic Florida, Jacksonville, FL, USA
| | - William Bradlow
- University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
| | - Furqan Aziz
- Centre for Health Data Science, University of Birmingham, Birmingham, UK
- School of Computing and Mathematical Sciences, University of Leicester, Leicester, UK
| | - Georgios V Gkoutos
- Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, UK
- Centre for Environmental Research and Justice, University of Birmingham, Birmingham, UK
- Centre for Health Data Science, University of Birmingham, Birmingham, UK
- University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
| |
Collapse
|
2
|
Yamagata Y, Kushida T, Onami S, Masuya H. Homeostasis imbalance process ontology: a study on COVID-19 infectious processes. BMC Med Inform Decis Mak 2024; 23:301. [PMID: 38778394 PMCID: PMC11110177 DOI: 10.1186/s12911-024-02516-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/13/2022] [Accepted: 04/15/2024] [Indexed: 05/25/2024] Open
Abstract
BACKGROUND One significant challenge in addressing the coronavirus disease 2019 (COVID-19) pandemic is to grasp a comprehensive picture of its infectious mechanisms. We urgently need a consistent framework to capture the intricacies of its complicated viral infectious processes and diverse symptoms. RESULTS We systematized COVID-19 infectious processes through an ontological approach and provided a unified description framework of causal relationships from the early infectious stage to severe clinical manifestations based on the homeostasis imbalance process ontology (HoIP). HoIP covers a broad range of processes in the body, ranging from normal to abnormal. Moreover, our imbalance model enabled us to distinguish viral functional demands from immune defense processes, thereby supporting the development of new drugs, and our research demonstrates how ontological reasoning contributes to the identification of patients at severe risk. CONCLUSIONS The HoIP organises knowledge of COVID-19 infectious processes and related entities, such as molecules, drugs, and symptoms, with a consistent descriptive framework. HoIP is expected to harmonise the description of various heterogeneous processes and improve the interoperability of COVID-19 knowledge through the COVID-19 ontology harmonisation working group.
Collapse
Affiliation(s)
- Yuki Yamagata
- Life Science Data Sharing Unit, Infrastructure Research and Development Division, RIKEN Information R&D and Strategy Headquarters, 2-2-3 Minatojima-minamimachi, Chuo-ku, Kobe, Hyogo, 650-0047, Japan.
- Laboratory for Developmental Dynamics, RIKEN Center for Biosystems Dynamics Research, 2-2-3 Minatojima-minamimachi, Chuo-ku, Kobe, Hyogo, 650-0047, Japan.
| | - Tatsuya Kushida
- Integrated Bioresource Information Division, RIKEN BioResource Research Center, 3-1-1 Koyadai, Tsukuba-shi, Ibaraki, 305-0074, Japan
| | - Shuichi Onami
- Life Science Data Sharing Unit, Infrastructure Research and Development Division, RIKEN Information R&D and Strategy Headquarters, 2-2-3 Minatojima-minamimachi, Chuo-ku, Kobe, Hyogo, 650-0047, Japan
- Laboratory for Developmental Dynamics, RIKEN Center for Biosystems Dynamics Research, 2-2-3 Minatojima-minamimachi, Chuo-ku, Kobe, Hyogo, 650-0047, Japan
| | - Hiroshi Masuya
- Life Science Data Sharing Unit, Infrastructure Research and Development Division, RIKEN Information R&D and Strategy Headquarters, 2-2-3 Minatojima-minamimachi, Chuo-ku, Kobe, Hyogo, 650-0047, Japan
- Integrated Bioresource Information Division, RIKEN BioResource Research Center, 3-1-1 Koyadai, Tsukuba-shi, Ibaraki, 305-0074, Japan
| |
Collapse
|
3
|
Dally D, Amith M, Mauldin RL, Thomas L, Dang Y, Tao C. A Semantic Approach to Describe Social and Economic Characteristics That Impact Health Outcomes (Social Determinants of Health): Ontology Development Study. Online J Public Health Inform 2024; 16:e52845. [PMID: 38477963 PMCID: PMC10973958 DOI: 10.2196/52845] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/17/2023] [Revised: 11/28/2023] [Accepted: 02/19/2024] [Indexed: 03/14/2024] Open
Abstract
BACKGROUND Social determinants of health (SDoH) have been described by the World Health Organization as the conditions in which individuals are born, live, work, and age. These conditions can be grouped into 3 interrelated levels known as macrolevel (societal), mesolevel (community), and microlevel (individual) determinants. The scope of SDoH expands beyond the biomedical level, and there remains a need to connect other areas such as economics, public policy, and social factors. OBJECTIVE Providing a computable artifact that can link health data to concepts involving the different levels of determinants may improve our understanding of the impact SDoH have on human populations. Modeling SDoH may help to reduce existing gaps in the literature through explicit links between the determinants and biological factors. This in turn can allow researchers and clinicians to make better sense of data and discover new knowledge through the use of semantic links. METHODS An experimental ontology was developed to represent knowledge of the social and economic characteristics of SDoH. Information from 27 literature sources was analyzed to gather concepts and encoded using Web Ontology Language, version 2 (OWL2) and Protégé. Four evaluators independently reviewed the ontology axioms using natural language translation. The analyses from the evaluations and selected terminologies from the Basic Formal Ontology were used to create a revised ontology with a broad spectrum of knowledge concepts ranging from the macrolevel to the microlevel determinants. RESULTS The literature search identified several topics of discussion for each determinant level. Publications for the macrolevel determinants centered around health policy, income inequality, welfare, and the environment. Articles relating to the mesolevel determinants discussed work, work conditions, psychosocial factors, socioeconomic position, outcomes, food, poverty, housing, and crime. Finally, sources found for the microlevel determinants examined gender, ethnicity, race, and behavior. Concepts were gathered from the literature and used to produce an ontology consisting of 383 classes, 109 object properties, and 748 logical axioms. A reasoning test revealed no inconsistent axioms. CONCLUSIONS This ontology models heterogeneous social and economic concepts to represent aspects of SDoH. The scope of SDoH is expansive, and although the ontology is broad, it is still in its early stages. To our current understanding, this ontology represents the first attempt to concentrate on knowledge concepts that are currently not covered by existing ontologies. Future direction will include further expanding the ontology to link with other biomedical ontologies, including alignment for granular semantics.
Collapse
Affiliation(s)
- Daniela Dally
- The University of Texas Health Science Center at Houston School of Public Health, The Brownsville Region, Brownsville, TX, United States
| | - Muhammad Amith
- Department of Biostatistics and Data Science, University of Texas Medical Branch, Galveston, TX, United States
- Department of Internal Medicine, University of Texas Medical Branch, Galveton, TX, United States
| | - Rebecca L Mauldin
- School of Social Work, The University of Texas at Arlington, Arlington, TX, United States
| | - Latisha Thomas
- School of Social Work, The University of Texas at Arlington, Arlington, TX, United States
| | - Yifang Dang
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, United States
| | - Cui Tao
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Jacksonville, FL, United States
| |
Collapse
|
4
|
Sanjak J, Binder J, Yadaw AS, Zhu Q, Mathé EA. Clustering rare diseases within an ontology-enriched knowledge graph. J Am Med Inform Assoc 2023; 31:154-164. [PMID: 37759342 PMCID: PMC10746319 DOI: 10.1093/jamia/ocad186] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/03/2023] [Revised: 08/01/2023] [Accepted: 09/05/2023] [Indexed: 09/29/2023] Open
Abstract
OBJECTIVE Identifying sets of rare diseases with shared aspects of etiology and pathophysiology may enable drug repurposing. Toward that aim, we utilized an integrative knowledge graph to construct clusters of rare diseases. MATERIALS AND METHODS Data on 3242 rare diseases were extracted from the National Center for Advancing Translational Science Genetic and Rare Diseases Information center internal data resources. The rare disease data enriched with additional biomedical data, including gene and phenotype ontologies, biological pathway data, and small molecule-target activity data, to create a knowledge graph (KG). Node embeddings were trained and clustered. We validated the disease clusters through semantic similarity and feature enrichment analysis. RESULTS Thirty-seven disease clusters were created with a mean size of 87 diseases. We validate the clusters quantitatively via semantic similarity based on the Orphanet Rare Disease Ontology. In addition, the clusters were analyzed for enrichment of associated genes, revealing that the enriched genes within clusters are highly related. DISCUSSION We demonstrate that node embeddings are an effective method for clustering diseases within a heterogenous KG. Semantically similar diseases and relevant enriched genes have been uncovered within the clusters. Connections between disease clusters and drugs are enumerated for follow-up efforts. CONCLUSION We lay out a method for clustering rare diseases using graph node embeddings. We develop an easy-to-maintain pipeline that can be updated when new data on rare diseases emerges. The embeddings themselves can be paired with other representation learning methods for other data types, such as drugs, to address other predictive modeling problems.
Collapse
Affiliation(s)
- Jaleal Sanjak
- Division of Pre-Clinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Rockville, MD, United States
- Chief Technology Office, Booz Allen Hamilton, Bethesda, MD, United States
| | - Jessica Binder
- Division of Pre-Clinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Rockville, MD, United States
| | - Arjun Singh Yadaw
- Division of Pre-Clinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Rockville, MD, United States
| | - Qian Zhu
- Division of Pre-Clinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Rockville, MD, United States
| | - Ewy A Mathé
- Division of Pre-Clinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Rockville, MD, United States
| |
Collapse
|
5
|
Stefancsik R, Balhoff JP, Balk MA, Ball RL, Bello SM, Caron AR, Chesler EJ, de Souza V, Gehrke S, Haendel M, Harris LW, Harris NL, Ibrahim A, Koehler S, Matentzoglu N, McMurry JA, Mungall CJ, Munoz-Torres MC, Putman T, Robinson P, Smedley D, Sollis E, Thessen AE, Vasilevsky N, Walton DO, Osumi-Sutherland D. The Ontology of Biological Attributes (OBA)-computational traits for the life sciences. Mamm Genome 2023; 34:364-378. [PMID: 37076585 PMCID: PMC10382347 DOI: 10.1007/s00335-023-09992-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/27/2023] [Accepted: 04/06/2023] [Indexed: 04/21/2023]
Abstract
Existing phenotype ontologies were originally developed to represent phenotypes that manifest as a character state in relation to a wild-type or other reference. However, these do not include the phenotypic trait or attribute categories required for the annotation of genome-wide association studies (GWAS), Quantitative Trait Loci (QTL) mappings or any population-focussed measurable trait data. The integration of trait and biological attribute information with an ever increasing body of chemical, environmental and biological data greatly facilitates computational analyses and it is also highly relevant to biomedical and clinical applications. The Ontology of Biological Attributes (OBA) is a formalised, species-independent collection of interoperable phenotypic trait categories that is intended to fulfil a data integration role. OBA is a standardised representational framework for observable attributes that are characteristics of biological entities, organisms, or parts of organisms. OBA has a modular design which provides several benefits for users and data integrators, including an automated and meaningful classification of trait terms computed on the basis of logical inferences drawn from domain-specific ontologies for cells, anatomical and other relevant entities. The logical axioms in OBA also provide a previously missing bridge that can computationally link Mendelian phenotypes with GWAS and quantitative traits. The term components in OBA provide semantic links and enable knowledge and data integration across specialised research community boundaries, thereby breaking silos.
Collapse
Affiliation(s)
- Ray Stefancsik
- European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK.
| | - James P Balhoff
- Renaissance Computing Institute, University of North Carolina, Chapel Hill, NC, 27517, USA
| | - Meghan A Balk
- Natural History Museum, University of Oslo, Oslo, Norway
| | - Robyn L Ball
- The Jackson Laboratory, Bar Harbor, ME, 04609, USA
| | | | - Anita R Caron
- European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
| | | | - Vinicius de Souza
- European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
| | - Sarah Gehrke
- Anschutz Medical Campus, University of Colorado, Aurora, CO, 80045, USA
| | - Melissa Haendel
- Anschutz Medical Campus, University of Colorado, Aurora, CO, 80045, USA
| | - Laura W Harris
- European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
| | - Nomi L Harris
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
| | - Arwa Ibrahim
- European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
| | | | | | - Julie A McMurry
- Anschutz Medical Campus, University of Colorado, Aurora, CO, 80045, USA
| | - Christopher J Mungall
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
| | | | - Tim Putman
- Anschutz Medical Campus, University of Colorado, Aurora, CO, 80045, USA
| | | | - Damian Smedley
- William Harvey Research Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, EC1M 6BQ, UK
| | - Elliot Sollis
- European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
| | - Anne E Thessen
- Anschutz Medical Campus, University of Colorado, Aurora, CO, 80045, USA
| | - Nicole Vasilevsky
- Data Collaboration Center, Critical Path Institute, Tucson, AZ, 85718, USA
| | | | | |
Collapse
|
6
|
Sanjak J, Zhu Q, Mathé EA. Clustering rare diseases within an ontology-enriched knowledge graph. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.02.15.528673. [PMID: 36824742 PMCID: PMC9949046 DOI: 10.1101/2023.02.15.528673] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/18/2023]
Abstract
Objective Identifying sets of rare diseases with shared aspects of etiology and pathophysiology may enable drug repurposing and/or platform based therapeutic development. Toward that aim, we utilized an integrative knowledge graph-based approach to constructing clusters of rare diseases. Materials and Methods Data on 3,242 rare diseases were extracted from the National Center for Advancing Translational Science (NCATS) Genetic and Rare Diseases Information center (GARD) internal data resources. The rare disease data was enriched with additional biomedical data, including gene and phenotype ontologies, biological pathway data and small molecule-target activity data, to create a knowledge graph (KG). Node embeddings were used to convert nodes into vectors upon which k-means clustering was applied. We validated the disease clusters through semantic similarity and feature enrichment analysis. Results A node embedding model was trained on the ontology enriched rare disease KG and k-means clustering was applied to the embedding vectors resulting in 37 disease clusters with a mean size of 87 diseases. We validate the disease clusters quantitatively by looking at semantic similarity of clustered diseases, using the Orphanet Rare Disease Ontology. In addition, the clusters were analyzed for enrichment of associated genes, revealing that the enriched genes within clusters were shown to be highly related. Discussion We demonstrate that node embeddings are an effective method for clustering diseases within a heterogenous KG. Semantically similar diseases and relevant enriched genes have been uncovered within the clusters. Connections between disease clusters and approved or investigational drugs are enumerated for follow-up efforts. Conclusion Our study lays out a method for clustering rare diseases using the graph node embeddings. We develop an easy to maintain pipeline that can be updated when new data on rare diseases emerges. The embeddings themselves can be paired with other representation learning methods for other data types, such as drugs, to address other predictive modeling problems. Detailed subnetwork analysis and in-depth review of individual clusters may lead to translatable findings. Future work will focus on incorporation of additional data sources, with a particular focus on common disease data.
Collapse
Affiliation(s)
- Jaleal Sanjak
- Division of Pre-Clinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Rockville, MD
| | - Qian Zhu
- Division of Pre-Clinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Rockville, MD
| | - Ewy A Mathé
- Division of Pre-Clinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Rockville, MD
| |
Collapse
|
7
|
Slater K, Williams JA, Schofield PN, Russell S, Pendleton SC, Karwath A, Fanning H, Ball S, Hoehndorf R, Gkoutos GV. Klarigi: Characteristic explanations for semantic biomedical data. Comput Biol Med 2023; 153:106425. [PMID: 36638616 DOI: 10.1016/j.compbiomed.2022.106425] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/30/2022] [Revised: 12/04/2022] [Accepted: 12/13/2022] [Indexed: 12/24/2022]
Abstract
Annotation of biomedical entities with ontology classes provides for formal semantic analysis and mobilisation of background knowledge in determining their relationships. To date, enrichment analysis has been routinely employed to identify classes that are over-represented in annotations across sets of groups, such as biosample gene expression profiles or patient phenotypes, and is useful for a range of tasks including differential diagnosis and causative variant prioritisation. These approaches, however, usually consider only univariate relationships, make limited use of the semantic features of ontologies, and provide limited information and evaluation of the explanatory power of both singular and grouped candidate classes. Moreover, they are not designed to solve the problem of deriving cohesive, characteristic, and discriminatory sets of classes for entity groups. We have developed a new tool, called Klarigi, which introduces multiple scoring heuristics for identification of classes that are both compositional and discriminatory for groups of entities annotated with ontology classes. The tool includes a novel algorithm for derivation of multivariable semantic explanations for entity groups, makes use of semantic inference through live use of an ontology reasoner, and includes a classification method for identifying the discriminatory power of candidate sets, in addition to significance testing apposite to traditional enrichment approaches. We describe the design and implementation of Klarigi, including its scoring and explanation determination methods, and evaluate its use in application to two test cases with clinical significance, comparing and contrasting methods and results with literature-based and enrichment analysis methods. We demonstrate that Klarigi produces characteristic and discriminatory explanations for groups of biomedical entities in two settings. We also show that these explanations recapitulate and extend the knowledge held in existing biomedical databases and literature for several diseases. We conclude that Klarigi provides a distinct and valuable perspective on biomedical datasets when compared with traditional enrichment methods, and therefore constitutes a new method by which biomedical datasets can be explored, contributing to improved insight into semantic data.
Collapse
Affiliation(s)
- Karin Slater
- College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, UK; Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, UK; MRC Health Data Research UK (HDR UK), Midlands, UK; University Hospitals Birmingham NHS Foundation Trust, Edgbaston, Birmingham, UK.
| | - John A Williams
- College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, UK; Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, UK; University Hospitals Birmingham NHS Foundation Trust, Edgbaston, Birmingham, UK
| | - Paul N Schofield
- Department of Physiology, Development, and Neuroscience, University of Cambridge, UK
| | - Sophie Russell
- College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, UK
| | - Samantha C Pendleton
- College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, UK
| | - Andreas Karwath
- College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, UK; Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, UK; MRC Health Data Research UK (HDR UK), Midlands, UK; University Hospitals Birmingham NHS Foundation Trust, Edgbaston, Birmingham, UK
| | - Hilary Fanning
- Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, UK; University Hospitals Birmingham NHS Foundation Trust, Edgbaston, Birmingham, UK
| | - Simon Ball
- Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, UK; University Hospitals Birmingham NHS Foundation Trust, Edgbaston, Birmingham, UK
| | - Robert Hoehndorf
- Computer, Electrical and Mathematical Sciences & Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, UK
| | - Georgios V Gkoutos
- College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, UK; Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, UK; NIHR Experimental Cancer Medicine Centre, UK; NIHR Surgical Reconstruction and Microbiology Research Centre, UK; NIHR Biomedical Research Centre, UK; MRC Health Data Research UK (HDR UK), Midlands, UK; University Hospitals Birmingham NHS Foundation Trust, Edgbaston, Birmingham, UK
| |
Collapse
|
8
|
Stefancsik R, Balhoff JP, Balk MA, Ball R, Bello SM, Caron AR, Chessler E, de Souza V, Gehrke S, Haendel M, Harris LW, Harris NL, Ibrahim A, Koehler S, Matentzoglu N, McMurry JA, Mungall CJ, Munoz-Torres MC, Putman T, Robinson P, Smedley D, Sollis E, Thessen AE, Vasilevsky N, Walton DO, Osumi-Sutherland D. The Ontology of Biological Attributes (OBA) - Computational Traits for the Life Sciences. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.01.26.525742. [PMID: 36747660 PMCID: PMC9900877 DOI: 10.1101/2023.01.26.525742] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Indexed: 01/30/2023]
Abstract
Existing phenotype ontologies were originally developed to represent phenotypes that manifest as a character state in relation to a wild-type or other reference. However, these do not include the phenotypic trait or attribute categories required for the annotation of genome-wide association studies (GWAS), Quantitative Trait Loci (QTL) mappings or any population-focused measurable trait data. Moreover, variations in gene expression in response to environmental disturbances even without any genetic alterations can also be associated with particular biological attributes. The integration of trait and biological attribute information with an ever increasing body of chemical, environmental and biological data greatly facilitates computational analyses and it is also highly relevant to biomedical and clinical applications. The Ontology of Biological Attributes (OBA) is a formalised, species-independent collection of interoperable phenotypic trait categories that is intended to fulfil a data integration role. OBA is a standardised representational framework for observable attributes that are characteristics of biological entities, organisms, or parts of organisms. OBA has a modular design which provides several benefits for users and data integrators, including an automated and meaningful classification of trait terms computed on the basis of logical inferences drawn from domain-specific ontologies for cells, anatomical and other relevant entities. The logical axioms in OBA also provide a previously missing bridge that can computationally link Mendelian phenotypes with GWAS and quantitative traits. The term components in OBA provide semantic links and enable knowledge and data integration across specialised research community boundaries, thereby breaking silos.
Collapse
Affiliation(s)
- Ray Stefancsik
- European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
| | - James P. Balhoff
- Renaissance Computing Institute, University of North Carolina, Chapel Hill, NC 27517, USA
| | - Meghan A. Balk
- National Ecological Observatory Network, Battelle, Boulder, CO 80301, USA
| | - Robyn Ball
- The Jackson Laboratory, Bar Harbor, ME 04609, USA
| | | | - Anita R. Caron
- European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
| | | | - Vinicius de Souza
- European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
| | - Sarah Gehrke
- Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, USA
| | - Melissa Haendel
- Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, USA
| | - Laura W. Harris
- European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
| | - Nomi L. Harris
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
| | - Arwa Ibrahim
- European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
| | | | | | - Julie A. McMurry
- Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, USA
| | - Christopher J. Mungall
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
| | | | - Tim Putman
- Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, USA
| | | | - Damian Smedley
- William Harvey Research Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London EC1M 6BQ, UK
| | - Elliot Sollis
- European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
| | - Anne E Thessen
- Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, USA
| | - Nicole Vasilevsky
- Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, USA
| | | | | |
Collapse
|
9
|
Bonatti P, Petrova I, Sauro L. Optimizing the computation of overriding in DLN. ARTIF INTELL 2022. [DOI: 10.1016/j.artint.2022.103764] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
|
10
|
Blagec K, Barbosa-Silva A, Ott S, Samwald M. A curated, ontology-based, large-scale knowledge graph of artificial intelligence tasks and benchmarks. Sci Data 2022; 9:322. [PMID: 35715466 PMCID: PMC9205953 DOI: 10.1038/s41597-022-01435-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/19/2021] [Accepted: 05/30/2022] [Indexed: 11/22/2022] Open
Abstract
Research in artificial intelligence (AI) is addressing a growing number of tasks through a rapidly growing number of models and methodologies. This makes it difficult to keep track of where novel AI methods are successfully - or still unsuccessfully - applied, how progress is measured, how different advances might synergize with each other, and how future research should be prioritized. To help address these issues, we created the Intelligence Task Ontology and Knowledge Graph (ITO), a comprehensive, richly structured and manually curated resource on artificial intelligence tasks, benchmark results and performance metrics. The current version of ITO contains 685,560 edges, 1,100 classes representing AI processes and 1,995 properties representing performance metrics. The primary goal of ITO is to enable analyses of the global landscape of AI tasks and capabilities. ITO is based on technologies that allow for easy integration and enrichment with external data, automated inference and continuous, collaborative expert curation of underlying ontological models. We make the ITO dataset and a collection of Jupyter notebooks utilizing ITO openly available.
Collapse
Affiliation(s)
- Kathrin Blagec
- Medical University of Vienna, Center for Medical Statistics, Informatics and Intelligent Systems, Institute of Artificial Intelligence, Vienna, Austria
| | - Adriano Barbosa-Silva
- Medical University of Vienna, Center for Medical Statistics, Informatics and Intelligent Systems, Institute of Artificial Intelligence, Vienna, Austria
| | - Simon Ott
- Medical University of Vienna, Center for Medical Statistics, Informatics and Intelligent Systems, Institute of Artificial Intelligence, Vienna, Austria
| | - Matthias Samwald
- Medical University of Vienna, Center for Medical Statistics, Informatics and Intelligent Systems, Institute of Artificial Intelligence, Vienna, Austria.
| |
Collapse
|
11
|
Vaidya G, Cellinese N, Lapp H. A new phylogenetic data standard for computable clade definitions: the Phyloreference Exchange Format (Phyx). PeerJ 2022; 10:e12618. [PMID: 35186448 PMCID: PMC8855714 DOI: 10.7717/peerj.12618] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/23/2021] [Accepted: 11/18/2021] [Indexed: 01/06/2023] Open
Abstract
To be computationally reproducible and efficient, integration of disparate data depends on shared entities whose matching meaning (semantics) can be computationally assessed. For biodiversity data one of the most prevalent shared entities for linking data records is the associated taxon concept. Unlike Linnaean taxon names, the traditional way in which taxon concepts are provided, phylogenetic definitions are native to phylogenetic trees and offer well-defined semantics that can be transformed into formal, computationally evaluable logic expressions. These attributes make them highly suitable for phylogeny-driven comparative biology by allowing computationally verifiable and reproducible integration of taxon-linked data against Tree of Life-scale phylogenies. To achieve this, the first step is transforming phylogenetic definitions from the natural language text in which they are published to a structured interoperable data format that maintains strong ties to semantics and lends itself well to sharing, reuse, and long-term archival. To this end, we developed the Phyloreference Exchange Format (Phyx), a JSON-LD-based text format encompassing rich metadata for all elements of a phylogenetic definition, and we created a supporting software library, phyx.js, to streamline computational management of such files. Together they form a foundation layer for digitizing and computing with phylogenetic definitions of clades.
Collapse
Affiliation(s)
- Gaurav Vaidya
- Renaissance Computing Institute (RENCI), University of North Carolina at Chapel Hill, Chapel Hill, NC, United States of America,Florida Museum of Natural History, University of Florida, Gainesville, FL, United States of America
| | - Nico Cellinese
- Florida Museum of Natural History, University of Florida, Gainesville, FL, United States of America,Informatics Institute, University of Florida, Gainesville, FL, United States of America
| | - Hilmar Lapp
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, United States of America
| |
Collapse
|
12
|
|
13
|
Blumberg KL, Ponsero AJ, Bomhoff M, Wood-Charlson EM, DeLong EF, Hurwitz BL. Ontology-Enriched Specifications Enabling Findable, Accessible, Interoperable, and Reusable Marine Metagenomic Datasets in Cyberinfrastructure Systems. Front Microbiol 2021; 12:765268. [PMID: 34956127 PMCID: PMC8692764 DOI: 10.3389/fmicb.2021.765268] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/26/2021] [Accepted: 11/16/2021] [Indexed: 11/13/2022] Open
Abstract
Marine microbial ecology requires the systematic comparison of biogeochemical and sequence data to analyze environmental influences on the distribution and variability of microbial communities. With ever-increasing quantities of metagenomic data, there is a growing need to make datasets Findable, Accessible, Interoperable, and Reusable (FAIR) across diverse ecosystems. FAIR data is essential to developing analytical frameworks that integrate microbiological, genomic, ecological, oceanographic, and computational methods. Although community standards defining the minimal metadata required to accompany sequence data exist, they haven’t been consistently used across projects, precluding interoperability. Moreover, these data are not machine-actionable or discoverable by cyberinfrastructure systems. By making ‘omic and physicochemical datasets FAIR to machine systems, we can enable sequence data discovery and reuse based on machine-readable descriptions of environments or physicochemical gradients. In this work, we developed a novel technical specification for dataset encapsulation for the FAIR reuse of marine metagenomic and physicochemical datasets within cyberinfrastructure systems. This includes using Frictionless Data Packages enriched with terminology from environmental and life-science ontologies to annotate measured variables, their units, and the measurement devices used. This approach was implemented in Planet Microbe, a cyberinfrastructure platform and marine metagenomic web-portal. Here, we discuss the data properties built into the specification to make global ocean datasets FAIR within the Planet Microbe portal. We additionally discuss the selection of, and contributions to marine-science ontologies used within the specification. Finally, we use the system to discover data by which to answer various biological questions about environments, physicochemical gradients, and microbial communities in meta-analyses. This work represents a future direction in marine metagenomic research by proposing a specification for FAIR dataset encapsulation that, if adopted within cyberinfrastructure systems, would automate the discovery, exchange, and re-use of data needed to answer broader reaching questions than originally intended.
Collapse
Affiliation(s)
- Kai L Blumberg
- Department of Biosystems Engineering, University of Arizona, Tucson, AZ, United States
| | - Alise J Ponsero
- Department of Biosystems Engineering, University of Arizona, Tucson, AZ, United States
| | - Matthew Bomhoff
- Department of Biosystems Engineering, University of Arizona, Tucson, AZ, United States
| | - Elisha M Wood-Charlson
- E.O. Lawrence Berkeley National Laboratory, Environmental Genomics and Systems Biology Division, Berkeley, CA, United States
| | - Edward F DeLong
- Daniel K. Inouye Center for Microbial Oceanography, University of Hawai'i, Honolulu, HI, United States
| | - Bonnie L Hurwitz
- Department of Biosystems Engineering, University of Arizona, Tucson, AZ, United States.,BIO5 Institute, University of Arizona, Tucson, AZ, United States
| |
Collapse
|
14
|
Lawson J, Cabili MN, Kerry G, Boughtwood T, Thorogood A, Alper P, Bowers SR, Boyles RR, Brookes AJ, Brush M, Burdett T, Clissold H, Donnelly S, Dyke SO, Freeberg MA, Haendel MA, Hata C, Holub P, Jeanson F, Jene A, Kawashima M, Kawashima S, Konopko M, Kyomugisha I, Li H, Linden M, Rodriguez LL, Morita M, Mulder N, Muller J, Nagaie S, Nasir J, Ogishima S, Ota Wang V, Paglione LD, Pandya RN, Parkinson H, Philippakis AA, Prasser F, Rambla J, Reinold K, Rushton GA, Saltzman A, Saunders G, Sofia HJ, Spalding JD, Swertz MA, Tulchinsky I, van Enckevort EJ, Varma S, Voisin C, Yamamoto N, Yamasaki C, Zass L, Guidry Auvil JM, Nyrönen TH, Courtot M. The Data Use Ontology to streamline responsible access to human biomedical datasets. CELL GENOMICS 2021; 1:None. [PMID: 34820659 PMCID: PMC8591903 DOI: 10.1016/j.xgen.2021.100028] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/28/2021] [Revised: 07/02/2021] [Accepted: 08/09/2021] [Indexed: 11/25/2022]
Abstract
Human biomedical datasets that are critical for research and clinical studies to benefit human health also often contain sensitive or potentially identifying information of individual participants. Thus, care must be taken when they are processed and made available to comply with ethical and regulatory frameworks and informed consent data conditions. To enable and streamline data access for these biomedical datasets, the Global Alliance for Genomics and Health (GA4GH) Data Use and Researcher Identities (DURI) work stream developed and approved the Data Use Ontology (DUO) standard. DUO is a hierarchical vocabulary of human and machine-readable data use terms that consistently and unambiguously represents a dataset's allowable data uses. DUO has been implemented by major international stakeholders such as the Broad and Sanger Institutes and is currently used in annotation of over 200,000 datasets worldwide. Using DUO in data management and access facilitates researchers' discovery and access of relevant datasets. DUO annotations increase the FAIRness of datasets and support data linkages using common data use profiles when integrating the data for secondary analyses. DUO is implemented in the Web Ontology Language (OWL) and, to increase community awareness and engagement, hosted in an open, centralized GitHub repository. DUO, together with the GA4GH Passport standard, offers a new, efficient, and streamlined data authorization and access framework that has enabled increased sharing of biomedical datasets worldwide.
Collapse
Affiliation(s)
- Jonathan Lawson
- Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Moran N. Cabili
- Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Giselle Kerry
- European Molecular Biology Laboratory—European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
| | - Tiffany Boughtwood
- Australian Genomics, Murdoch Children’s Research Institute, Parkville, VIC, Australia
| | - Adrian Thorogood
- Centre of Genomics and Policy, Department of Human Genetics, McGill University, Montreal, QC, Canada
- ELIXIR-Luxembourg, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
| | - Pinar Alper
- ELIXIR-Luxembourg, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
| | | | | | | | - Matthew Brush
- University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - Tony Burdett
- European Molecular Biology Laboratory—European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
| | - Hayley Clissold
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
| | - Stacey Donnelly
- Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Stephanie O.M. Dyke
- McGill Centre for Integrative Neuroscience, Montreal Neurological Institute, Department of Neurology & Neurosurgery, Faculty of Medicine, McGill University, Montreal, QC, Canada
| | - Mallory A. Freeberg
- European Molecular Biology Laboratory—European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
| | | | - Chihiro Hata
- Bioinformation and DDBJ Center, National Institute of Genetics, Mishima, Japan
| | - Petr Holub
- BBMRI-ERIC, AT and Masaryk University, Brno, Czech Republic
| | | | - Aina Jene
- Centre de Regulació Genòmica (CRG), Barcelona, Spain
| | - Minae Kawashima
- National Bioscience Database Center, Japan Science and Technology Agency, Tokyo, Japan
| | - Shuichi Kawashima
- Database Center for Life Science, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Kashiwa, Japan
| | | | - Irene Kyomugisha
- Division of Human Genetics, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
| | - Haoyuan Li
- Canada’s Michael Smith Genome Sciences Centre, Vancouver, BC, Canada
| | - Mikael Linden
- ELIXIR-Finland, CSC - IT Center for Science Ltd, Espoo, Finland
| | | | | | - Nicola Mulder
- Computational Biology Division, IDM, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
| | - Jean Muller
- Laboratoire de Génétique Médicale, Institut de Génétique Médicale d’Alsace, INSERM U1112, Université; de Strasbourg, Strasbourg, France
- Laboratoire de Diagnostic Génétique, Institut de Génétique Médicale d’Alsace, Hôpitaux Universitaires de Strasbourg, Strasbourg, France
| | - Satoshi Nagaie
- Tohoku Medical Megabank Organization (ToMMo), Tohoku University, Sendai, Japan
| | - Jamal Nasir
- Department of Life Sciences, University of Northampton, Northampton, UK
| | - Soichi Ogishima
- Tohoku Medical Megabank Organization (ToMMo), Tohoku University, Sendai, Japan
| | - Vivian Ota Wang
- Office of Data Sharing, National Cancer Institute, NIH, Rockville, MD, USA
| | | | | | - Helen Parkinson
- European Molecular Biology Laboratory—European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
| | - Anthony A. Philippakis
- Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Fabian Prasser
- Berlin Institute of Health at Charité—Universitätsmedizin Berlin, Berlin, Germany
| | - Jordi Rambla
- Centre de Regulació Genòmica (CRG), Barcelona, Spain
| | - Kathy Reinold
- Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Gregory A. Rushton
- Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Andrea Saltzman
- Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, MA, USA
| | | | - Heidi J. Sofia
- National Human Genome Research Institute, NIH, Bethesda, MD, USA
| | - John D. Spalding
- European Molecular Biology Laboratory—European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
| | - Morris A. Swertz
- Genomics Coordination Center, Department of Genetics, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
| | | | - Esther J. van Enckevort
- Genomics Coordination Center, Department of Genetics, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
| | - Susheel Varma
- Health Data Research UK, Gibbs Building, 215 Euston Road, London NW1 2BE, UK
| | | | | | | | - Lyndon Zass
- Computational Biology Division, IDM, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
| | | | | | - Mélanie Courtot
- European Molecular Biology Laboratory—European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
| |
Collapse
|
15
|
Tena Cucala D, Cuenca Grau B, Horrocks I. Pay-as-you-go consequence-based reasoning for the description logic SROIQ. ARTIF INTELL 2021. [DOI: 10.1016/j.artint.2021.103518] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/24/2022]
|
16
|
Kulmanov M, Smaili FZ, Gao X, Hoehndorf R. Semantic similarity and machine learning with ontologies. Brief Bioinform 2021; 22:bbaa199. [PMID: 33049044 PMCID: PMC8293838 DOI: 10.1093/bib/bbaa199] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/07/2020] [Revised: 08/03/2020] [Accepted: 08/04/2020] [Indexed: 12/13/2022] Open
Abstract
Ontologies have long been employed in the life sciences to formally represent and reason over domain knowledge and they are employed in almost every major biological database. Recently, ontologies are increasingly being used to provide background knowledge in similarity-based analysis and machine learning models. The methods employed to combine ontologies and machine learning are still novel and actively being developed. We provide an overview over the methods that use ontologies to compute similarity and incorporate them in machine learning methods; in particular, we outline how semantic similarity measures and ontology embeddings can exploit the background knowledge in ontologies and how ontologies can provide constraints that improve machine learning models. The methods and experiments we describe are available as a set of executable notebooks, and we also provide a set of slides and additional resources at https://github.com/bio-ontology-research-group/machine-learning-with-ontologies.
Collapse
Affiliation(s)
| | | | - Xin Gao
- Computational Bioscience Research Center and lead of the Structural and Functional Bioinformatics Group at King Abdullah University of Science and Technology
| | | |
Collapse
|
17
|
|
18
|
Slater LT, Gkoutos GV, Hoehndorf R. Towards semantic interoperability: finding and repairing hidden contradictions in biomedical ontologies. BMC Med Inform Decis Mak 2020; 20:311. [PMID: 33319712 PMCID: PMC7736131 DOI: 10.1186/s12911-020-01336-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/11/2020] [Accepted: 11/16/2020] [Indexed: 12/25/2022] Open
Abstract
Background Ontologies are widely used throughout the biomedical domain. These ontologies formally represent the classes and relations assumed to exist within a domain. As scientific domains are deeply interlinked, so too are their representations. While individual ontologies can be tested for consistency and coherency using automated reasoning methods, systematically combining ontologies of multiple domains together may reveal previously hidden contradictions. Methods We developed a method that tests for hidden unsatisfiabilities in an ontology that arise when combined with other ontologies. For this purpose, we combined sets of ontologies and use automated reasoning to determine whether unsatisfiable classes are present. In addition, we designed and implemented a novel algorithm that can determine justifications for contradictions across extremely large and complicated ontologies, and use these justifications to semi-automatically repair ontologies by identifying a small set of axioms that, when removed, result in a consistent and coherent set of ontologies.
Results We tested the mutual consistency of the OBO Foundry and the OBO ontologies and find that the combined OBO Foundry gives rise to at least 636 unsatisfiable classes, while the OBO ontologies give rise to more than 300,000 unsatisfiable classes. We also applied our semi-automatic repair algorithm to each combination of OBO ontologies that resulted in unsatisfiable classes, finding that only 117 axioms could be removed to account for all cases of unsatisfiability across all OBO ontologies. Conclusions We identified a large set of hidden unsatisfiability across a broad range of biomedical ontologies, and we find that this large set of unsatisfiable classes is the result of a relatively small amount of axiomatic disagreements. Our results show that hidden unsatisfiability is a serious problem in ontology interoperability; however, our results also provide a way towards more consistent ontologies by addressing the issues we identified.
Collapse
Affiliation(s)
- Luke T Slater
- College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, B15 2TT, UK. .,Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, Birmingham, B15 2TT, UK.
| | - Georgios V Gkoutos
- College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, B15 2TT, UK.,Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, Birmingham, B15 2TT, UK.,NIHR Experimental Cancer Medicine Centre, Birmingham, B15 2TT, UK.,NIHR Surgical Reconstruction and Microbiology Research Centre, Birmingham, B15 2TT, UK.,NIHR Biomedical Research Centre, Birmingham, B15 2TT, UK.,MRC Health Data Research UK (HDR UK Midlands, Birmingham, B15 2TT, UK
| | - Robert Hoehndorf
- Computer, Electrical and Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, 23955, Saudi Arabia
| |
Collapse
|
19
|
Bonatti PA, Ioffredo L, Petrova IM, Sauro L, Siahaan IR. Real-time reasoning in OWL2 for GDPR compliance. ARTIF INTELL 2020. [DOI: 10.1016/j.artint.2020.103389] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/27/2022]
|
20
|
Protein ontology on the semantic web for knowledge discovery. Sci Data 2020; 7:337. [PMID: 33046717 PMCID: PMC7550340 DOI: 10.1038/s41597-020-00679-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/20/2020] [Accepted: 09/17/2020] [Indexed: 11/26/2022] Open
Abstract
The Protein Ontology (PRO) provides an ontological representation of protein-related entities, ranging from protein families to proteoforms to complexes. Protein Ontology Linked Open Data (LOD) exposes, shares, and connects knowledge about protein-related entities on the Semantic Web using Resource Description Framework (RDF), thus enabling integration with other Linked Open Data for biological knowledge discovery. For example, proteins (or variants thereof) can be retrieved on the basis of specific disease associations. As a community resource, we strive to follow the Findability, Accessibility, Interoperability, and Reusability (FAIR) principles, disseminate regular updates of our data, support multiple methods for accessing, querying and downloading data in various formats, and provide documentation both for scientists and programmers. PRO Linked Open Data can be browsed via faceted browser interface and queried using SPARQL via YASGUI. RDF data dumps are also available for download. Additionally, we developed RESTful APIs to support programmatic data access. We also provide W3C HCLS specification compliant metadata description for our data. The PRO Linked Open Data is available at https://lod.proconsortium.org/.
Collapse
|
21
|
Schneider T, Šimkus M. Ontologies and Data Management: A Brief Survey. KUNSTLICHE INTELLIGENZ 2020; 34:329-353. [PMID: 32999532 PMCID: PMC7497697 DOI: 10.1007/s13218-020-00686-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/25/2020] [Accepted: 07/22/2020] [Indexed: 11/30/2022]
Abstract
Information systems have to deal with an increasing amount of data that is heterogeneous, unstructured, or incomplete. In order to align and complete data, systems may rely on taxonomies and background knowledge that are provided in the form of an ontology. This survey gives an overview of research work on the use of ontologies for accessing incomplete and/or heterogeneous data.
Collapse
|
22
|
|
23
|
Tripodi IJ, Callahan TJ, Westfall JT, Meitzer NS, Dowell RD, Hunter LE. Applying knowledge-driven mechanistic inference to toxicogenomics. Toxicol In Vitro 2020; 66:104877. [PMID: 32387679 DOI: 10.1016/j.tiv.2020.104877] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/20/2020] [Revised: 04/13/2020] [Accepted: 04/23/2020] [Indexed: 02/07/2023]
Abstract
When considering toxic chemicals in the environment, a mechanistic, causal explanation of toxicity may be preferred over a statistical or machine learning-based prediction by itself. Elucidating a mechanism of toxicity is, however, a costly and time-consuming process that requires the participation of specialists from a variety of fields, often relying on animal models. We present an innovative mechanistic inference framework (MechSpy), which can be used as a hypothesis generation aid to narrow the scope of mechanistic toxicology analysis. MechSpy generates hypotheses of the most likely mechanisms of toxicity, by combining a semantically-interconnected knowledge representation of human biology, toxicology and biochemistry with gene expression time series on human tissue. Using vector representations of biological entities, MechSpy seeks enrichment in a manually curated list of high-level mechanisms of toxicity, represented as biochemically- and causally-linked ontology concepts. Besides predicting the canonical mechanism of toxicity for many well-studied compounds, we experimentally validated some of our predictions for other chemicals without an established mechanism of toxicity. This mechanistic inference framework is an advantageous tool for predictive toxicology, and the first of its kind to produce a mechanistic explanation for each prediction. MechSpy can be modified to include additional mechanisms of toxicity, and is generalizable to other types of mechanisms of human biology.
Collapse
Affiliation(s)
- Ignacio J Tripodi
- University of Colorado, Computer Science / Interdisciplinary Quantitative Biology, Boulder, CO 80309, USA.
| | - Tiffany J Callahan
- University of Colorado Anschutz Medical Campus, Computational Bioscience, Denver, CO 80045, USA
| | - Jessica T Westfall
- University of Colorado, Molecular, Cellular and Developmental Biology, Boulder, CO 80309, USA
| | | | - Robin D Dowell
- University of Colorado, Molecular, Cellular and Developmental Biology / Interdisciplinary Quantitative Biology, Boulder, CO 80309, USA
| | - Lawrence E Hunter
- University of Colorado Anschutz Medical Campus, Computational Bioscience / Interdisciplinary Quantitative Biology, Denver, CO 80045, USA
| |
Collapse
|
24
|
Abeysinghe R, Hinderer EW, Moseley HNB, Cui L. SSIF: Subsumption-based Sub-term Inference Framework to audit Gene Ontology. Bioinformatics 2020; 36:3207-3214. [PMID: 32065617 PMCID: PMC7214018 DOI: 10.1093/bioinformatics/btaa106] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/23/2019] [Revised: 02/08/2020] [Accepted: 02/11/2020] [Indexed: 01/02/2023] Open
Abstract
MOTIVATION The Gene Ontology (GO) is the unifying biological vocabulary for codifying, managing and sharing biological knowledge. Quality issues in GO, if not addressed, can cause misleading results or missed biological discoveries. Manual identification of potential quality issues in GO is a challenging and arduous task, given its growing size. We introduce an automated auditing approach for suggesting potentially missing is-a relations, which may further reveal erroneous is-a relations. RESULTS We developed a Subsumption-based Sub-term Inference Framework (SSIF) by leveraging a novel term-algebra on top of a sequence-based representation of GO concepts along with three conditional rules (monotonicity, intersection and sub-concept rules). Applying SSIF to the October 3, 2018 release of GO suggested 1938 unique potentially missing is-a relations. Domain experts evaluated a random sample of 210 potentially missing is-a relations. The results showed SSIF achieved a precision of 60.61, 60.49 and 46.03% for the monotonicity, intersection and sub-concept rules, respectively. AVAILABILITY AND IMPLEMENTATION SSIF is implemented in Java. The source code is available at https://github.com/rashmie/SSIF. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Rashmie Abeysinghe
- School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, USA.,Department of Computer Science
| | | | - Hunter N B Moseley
- Department of Molecular and Cellular Biochemistry.,Institute for Biomedical Informatics.,Markey Cancer Center.,Center for Environmental and Systems Biochemistry, University of Kentucky, Lexington, KY 40506, USA
| | - Licong Cui
- School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, USA
| |
Collapse
|
25
|
|
26
|
Mabee PM, Balhoff JP, Dahdul WM, Lapp H, Mungall CJ, Vision TJ. A Logical Model of Homology for Comparative Biology. Syst Biol 2020; 69:345-362. [PMID: 31596473 PMCID: PMC7672696 DOI: 10.1093/sysbio/syz067] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/09/2019] [Revised: 09/20/2019] [Accepted: 09/26/2019] [Indexed: 01/09/2023] Open
Abstract
There is a growing body of research on the evolution of anatomy in a wide variety of organisms. Discoveries in this field could be greatly accelerated by computational methods and resources that enable these findings to be compared across different studies and different organisms and linked with the genes responsible for anatomical modifications. Homology is a key concept in comparative anatomy; two important types are historical homology (the similarity of organisms due to common ancestry) and serial homology (the similarity of repeated structures within an organism). We explored how to most effectively represent historical and serial homology across anatomical structures to facilitate computational reasoning. We assembled a collection of homology assertions from the literature with a set of taxon phenotypes for the skeletal elements of vertebrate fins and limbs from the Phenoscape Knowledgebase. Using seven competency questions, we evaluated the reasoning ramifications of two logical models: the Reciprocal Existential Axioms (REA) homology model and the Ancestral Value Axioms (AVA) homology model. The AVA model returned all user-expected results in addition to the search term and any of its subclasses. The AVA model also returns any superclass of the query term in which a homology relationship has been asserted. The REA model returned the user-expected results for five out of seven queries. We identify some challenges of implementing complete homology queries due to limitations of OWL reasoning. This work lays the foundation for homology reasoning to be incorporated into other ontology-based tools, such as those that enable synthetic supermatrix construction and candidate gene discovery. [Homology; ontology; anatomy; morphology; evolution; knowledgebase; phenoscape.].
Collapse
Affiliation(s)
- Paula M Mabee
- Department of Biology, University of South Dakota, 414 East Clark Street, Vermillion, SD 57069, USA
| | - James P Balhoff
- Renaissance Computing Institute, University of North Carolina, 100 Europa Drive, Suite 540, Chapel Hill, NC 27517, USA
| | - Wasila M Dahdul
- Department of Biology, University of South Dakota, 414 East Clark Street, Vermillion, SD 57069, USA
| | - Hilmar Lapp
- Center for Genomic and Computational Biology, Duke University, 101 Science Drive, Durham, NC 27708, USA
| | - Christopher J Mungall
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| | - Todd J Vision
- Department of Biology and School of Information and Library Sciences, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599-3280, USA
| |
Collapse
|
27
|
Harth A, Kirrane S, Ngonga Ngomo AC, Paulheim H, Rula A, Gentile AL, Haase P, Cochez M. Hybrid Reasoning Over Large Knowledge Bases Using On-The-Fly Knowledge Extraction. THE SEMANTIC WEB 2020. [PMCID: PMC7250607 DOI: 10.1007/978-3-030-49461-2_5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Abstract
The success of logic-based methods for comparing entities heavily depends on the axioms that have been described for them in the Knowledge Base (KB). Due to the incompleteness of even large and well engineered KBs, such methods suffer from low recall when applied in real-world use cases. To address this, we designed a reasoning framework that combines logic-based subsumption with statistical methods for on-the-fly knowledge extraction. Statistical methods extract additional (missing) axioms for the compared entities with the goal of tackling the incompleteness of KBs and thus improving recall. Although this can be beneficial, it can also introduce noise (false positives or false negatives). Hence, our framework uses heuristics to assess whether knowledge extraction is likely to be advantageous and only activates the statistical components if this is the case. We instantiate our framework by combining lightweight logic-based reasoning implemented on top of existing triple-stores with an axiom extraction method that is based on the labels of concepts. Our work was motivated by industrial use cases over which we evaluate our instantiated framework, showing that it outperforms approaches that are only based on textual information. Besides the best combination of precision and recall, our implementation is also scalable and is currently used in an industrial production environment.
Collapse
Affiliation(s)
- Andreas Harth
- University of Erlangen-Nuremberg, Nuremberg, Germany
| | - Sabrina Kirrane
- Vienna University of Economics and Business, Vienna, Austria
| | | | | | - Anisa Rula
- University of Milano-Bicocca, Milan, Italy
| | | | | | - Michael Cochez
- Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
| |
Collapse
|
28
|
|
29
|
|
30
|
Jackson RC, Balhoff JP, Douglass E, Harris NL, Mungall CJ, Overton JA. ROBOT: A Tool for Automating Ontology Workflows. BMC Bioinformatics 2019; 20:407. [PMID: 31357927 PMCID: PMC6664714 DOI: 10.1186/s12859-019-3002-3] [Citation(s) in RCA: 68] [Impact Index Per Article: 13.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/09/2018] [Accepted: 07/19/2019] [Indexed: 11/21/2022] Open
Abstract
BACKGROUND Ontologies are invaluable in the life sciences, but building and maintaining ontologies often requires a challenging number of distinct tasks such as running automated reasoners and quality control checks, extracting dependencies and application-specific subsets, generating standard reports, and generating release files in multiple formats. Similar to more general software development, automation is the key to executing and managing these tasks effectively and to releasing more robust products in standard forms. For ontologies using the Web Ontology Language (OWL), the OWL API Java library is the foundation for a range of software tools, including the Protégé ontology editor. In the Open Biological and Biomedical Ontologies (OBO) community, we recognized the need to package a wide range of low-level OWL API functionality into a library of common higher-level operations and to make those operations available as a command-line tool. RESULTS ROBOT (a recursive acronym for "ROBOT is an OBO Tool") is an open source library and command-line tool for automating ontology development tasks. The library can be called from any programming language that runs on the Java Virtual Machine (JVM). Most usage is through the command-line tool, which runs on macOS, Linux, and Windows. ROBOT provides ontology processing commands for a variety of tasks, including commands for converting formats, running a reasoner, creating import modules, running reports, and various other tasks. These commands can be combined into larger workflows using a separate task execution system such as GNU Make, and workflows can be automatically executed within continuous integration systems. CONCLUSIONS ROBOT supports automation of a wide range of ontology development tasks, focusing on OBO conventions. It packages common high-level ontology development functionality into a convenient library, and makes it easy to configure, combine, and execute individual tasks in comprehensive, automated workflows. This helps ontology developers to efficiently create, maintain, and release high-quality ontologies, so that they can spend more time focusing on development tasks. It also helps guarantee that released ontologies are free of certain types of logical errors and conform to standard quality control checks, increasing the overall robustness and efficiency of the ontology development lifecycle.
Collapse
Affiliation(s)
| | - James P Balhoff
- Renaissance Computing Institute, University of North Carolina, Chapel Hill, North Carolina, USA
| | - Eric Douglass
- Lawrence Berkeley National Laboratory, Berkeley, California, USA
| | - Nomi L Harris
- Lawrence Berkeley National Laboratory, Berkeley, California, USA
| | | | | |
Collapse
|
31
|
Smaili FZ, Gao X, Hoehndorf R. OPA2Vec: combining formal and informal content of biomedical ontologies to improve similarity-based prediction. Bioinformatics 2018; 35:2133-2140. [DOI: 10.1093/bioinformatics/bty933] [Citation(s) in RCA: 65] [Impact Index Per Article: 10.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/05/2018] [Revised: 11/02/2018] [Accepted: 11/07/2018] [Indexed: 12/11/2022] Open
Affiliation(s)
- Fatima Zohra Smaili
- Computer, Electrical & Mathematical Sciences and Engineering (CEMSE) Division, Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
| | - Xin Gao
- Computer, Electrical & Mathematical Sciences and Engineering (CEMSE) Division, Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
| | - Robert Hoehndorf
- Computer, Electrical & Mathematical Sciences and Engineering (CEMSE) Division, Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
| |
Collapse
|
32
|
Alshahrani M, Khan MA, Maddouri O, Kinjo AR, Queralt-Rosinach N, Hoehndorf R. Neuro-symbolic representation learning on biological knowledge graphs. Bioinformatics 2018; 33:2723-2730. [PMID: 28449114 PMCID: PMC5860058 DOI: 10.1093/bioinformatics/btx275] [Citation(s) in RCA: 54] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/13/2016] [Accepted: 04/18/2017] [Indexed: 11/12/2022] Open
Abstract
Motivation Biological data and knowledge bases increasingly rely on Semantic Web technologies and the use of knowledge graphs for data integration, retrieval and federated queries. In the past years, feature learning methods that are applicable to graph-structured data are becoming available, but have not yet widely been applied and evaluated on structured biological knowledge. Results: We develop a novel method for feature learning on biological knowledge graphs. Our method combines symbolic methods, in particular knowledge representation using symbolic logic and automated reasoning, with neural networks to generate embeddings of nodes that encode for related information within knowledge graphs. Through the use of symbolic logic, these embeddings contain both explicit and implicit information. We apply these embeddings to the prediction of edges in the knowledge graph representing problems of function prediction, finding candidate genes of diseases, protein-protein interactions, or drug target relations, and demonstrate performance that matches and sometimes outperforms traditional approaches based on manually crafted features. Our method can be applied to any biological knowledge graph, and will thereby open up the increasing amount of Semantic Web based knowledge bases in biology to use in machine learning and data analytics. Availability and implementation https://github.com/bio-ontology-research-group/walking-rdf-and-owl. Contact robert.hoehndorf@kaust.edu.sa. Supplementary information Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Mona Alshahrani
- Computer, Electrical and Mathematical Sciences & Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal 23955-6900, Kingdom of Saudi Arabia
| | - Mohammad Asif Khan
- Computer, Electrical and Mathematical Sciences & Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal 23955-6900, Kingdom of Saudi Arabia
| | - Omar Maddouri
- Computer, Electrical and Mathematical Sciences & Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal 23955-6900, Kingdom of Saudi Arabia.,Life Sciences Division, College of Science & Engineering, Hamad Bin Khalifa University, HBKU, Doha, Qatar
| | - Akira R Kinjo
- Institute for Protein Research, Osaka University 3-2 Yamadaoka, Suita, Osaka 565-0871, Japan
| | - Núria Queralt-Rosinach
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, 92037 USA
| | - Robert Hoehndorf
- Computer, Electrical and Mathematical Sciences & Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal 23955-6900, Kingdom of Saudi Arabia
| |
Collapse
|
33
|
Stucky BJ, Guralnick R, Deck J, Denny EG, Bolmgren K, Walls R. The Plant Phenology Ontology: A New Informatics Resource for Large-Scale Integration of Plant Phenology Data. FRONTIERS IN PLANT SCIENCE 2018; 9:517. [PMID: 29765382 PMCID: PMC5938398 DOI: 10.3389/fpls.2018.00517] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/01/2018] [Accepted: 04/04/2018] [Indexed: 05/25/2023]
Abstract
Plant phenology - the timing of plant life-cycle events, such as flowering or leafing out - plays a fundamental role in the functioning of terrestrial ecosystems, including human agricultural systems. Because plant phenology is often linked with climatic variables, there is widespread interest in developing a deeper understanding of global plant phenology patterns and trends. Although phenology data from around the world are currently available, truly global analyses of plant phenology have so far been difficult because the organizations producing large-scale phenology data are using non-standardized terminologies and metrics during data collection and data processing. To address this problem, we have developed the Plant Phenology Ontology (PPO). The PPO provides the standardized vocabulary and semantic framework that is needed for large-scale integration of heterogeneous plant phenology data. Here, we describe the PPO, and we also report preliminary results of using the PPO and a new data processing pipeline to build a large dataset of phenology information from North America and Europe.
Collapse
Affiliation(s)
- Brian J. Stucky
- Florida Museum of Natural History, University of Florida, Gainesville, FL, United States
| | - Rob Guralnick
- Florida Museum of Natural History, University of Florida, Gainesville, FL, United States
| | - John Deck
- Berkeley Natural History Museums, University of California, Berkeley, Berkeley, CA, United States
| | - Ellen G. Denny
- USA National Phenology Network, The University of Arizona, Tucson, AZ, United States
| | - Kjell Bolmgren
- Unit for Field-based Forest Research, Swedish University of Agricultural Sciences, Lammhult, Sweden
| | - Ramona Walls
- CyVerse, The University of Arizona, Tucson, AZ, United States
| |
Collapse
|
34
|
Rodríguez-García MÁ, Hoehndorf R. Inferring ontology graph structures using OWL reasoning. BMC Bioinformatics 2018; 19:7. [PMID: 29304741 PMCID: PMC5756413 DOI: 10.1186/s12859-017-1999-8] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/24/2017] [Accepted: 12/13/2017] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND Ontologies are representations of a conceptualization of a domain. Traditionally, ontologies in biology were represented as directed acyclic graphs (DAG) which represent the backbone taxonomy and additional relations between classes. These graphs are widely exploited for data analysis in the form of ontology enrichment or computation of semantic similarity. More recently, ontologies are developed in a formal language such as the Web Ontology Language (OWL) and consist of a set of axioms through which classes are defined or constrained. While the taxonomy of an ontology can be inferred directly from the axioms of an ontology as one of the standard OWL reasoning tasks, creating general graph structures from OWL ontologies that exploit the ontologies' semantic content remains a challenge. RESULTS We developed a method to transform ontologies into graphs using an automated reasoner while taking into account all relations between classes. Searching for (existential) patterns in the deductive closure of ontologies, we can identify relations between classes that are implied but not asserted and generate graph structures that encode for a large part of the ontologies' semantic content. We demonstrate the advantages of our method by applying it to inference of protein-protein interactions through semantic similarity over the Gene Ontology and demonstrate that performance is increased when graph structures are inferred using deductive inference according to our method. Our software and experiment results are available at http://github.com/bio-ontology-research-group/Onto2Graph . CONCLUSIONS Onto2Graph is a method to generate graph structures from OWL ontologies using automated reasoning. The resulting graphs can be used for improved ontology visualization and ontology-based data analysis.
Collapse
Affiliation(s)
- Miguel Ángel Rodríguez-García
- Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, 23955-6900 Kingdom of Saudi Arabia
- Computer, Electrical and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, Thuwal, 23955-6900 Kingdom of Saudi Arabia
| | - Robert Hoehndorf
- Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, 23955-6900 Kingdom of Saudi Arabia
- Computer, Electrical and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, Thuwal, 23955-6900 Kingdom of Saudi Arabia
| |
Collapse
|
35
|
Abstract
Background Data-driven cell classification is becoming common and is now being implemented on a massive scale by projects such as the Human Cell Atlas. The scale of these efforts poses a challenge. How can the results be made searchable and accessible to biologists in general? How can they be related back to the rich classical knowledge of cell-types, anatomy and development? How will data from the various types of single cell analysis be made cross-searchable? Structured annotation with ontology terms provides a potential solution to these problems. In turn, there is great potential for using the outputs of data-driven cell classification to structure ontologies and integrate them with data-driven cell query systems. Results Focusing on examples from the mouse retina and Drosophila olfactory system, I present worked examples illustrating how formalization of cell ontologies can enhance querying of data-driven cell-classifications and how ontologies can be extended by integrating the outputs of data-driven cell classifications. Conclusions Annotation with ontology terms can play an important role in making data driven classifications searchable and query-able, but fulfilling this potential requires standardized formal patterns for structuring ontologies and annotations and for linking ontologies to the outputs of data-driven classification.
Collapse
Affiliation(s)
- David Osumi-Sutherland
- European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, CB10 1SD, UK.
| |
Collapse
|
36
|
Rodríguez-García MÁ, Gkoutos GV, Schofield PN, Hoehndorf R. Integrating phenotype ontologies with PhenomeNET. J Biomed Semantics 2017; 8:58. [PMID: 29258588 PMCID: PMC5735523 DOI: 10.1186/s13326-017-0167-4] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/17/2017] [Accepted: 11/22/2017] [Indexed: 01/05/2023] Open
Abstract
BACKGROUND Integration and analysis of phenotype data from humans and model organisms is a key challenge in building our understanding of normal biology and pathophysiology. However, the range of phenotypes and anatomical details being captured in clinical and model organism databases presents complex problems when attempting to match classes across species and across phenotypes as diverse as behaviour and neoplasia. We have previously developed PhenomeNET, a system for disease gene prioritization that includes as one of its components an ontology designed to integrate phenotype ontologies. While not applicable to matching arbitrary ontologies, PhenomeNET can be used to identify related phenotypes in different species, including human, mouse, zebrafish, nematode worm, fruit fly, and yeast. RESULTS Here, we apply the PhenomeNET to identify related classes from two phenotype and two disease ontologies using automated reasoning. We demonstrate that we can identify a large number of mappings, some of which require automated reasoning and cannot easily be identified through lexical approaches alone. Combining automated reasoning with lexical matching further improves results in aligning ontologies. CONCLUSIONS PhenomeNET can be used to align and integrate phenotype ontologies. The results can be utilized for biomedical analyses in which phenomena observed in model organisms are used to identify causative genes and mutations underlying human disease.
Collapse
Affiliation(s)
- Miguel Ángel Rodríguez-García
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955-6900, Saudi Arabia.,Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology, 4700 KAUST, PO Box 2882, Thuwal, 23955-6900, Saudi Arabia
| | - Georgios V Gkoutos
- College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, Centre for Computational Biology, University of Birmingham, Birmingham, B15 2TT, UK.,Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, Birmingham, B15 2TT, UK.,Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth, SY23 2AX, UK
| | - Paul N Schofield
- Department of Physiology, Development & Neuroscience, University of Cambridge, Downing Street, Cambridge, CB2 3EG, UK
| | - Robert Hoehndorf
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955-6900, Saudi Arabia. .,Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology, 4700 KAUST, PO Box 2882, Thuwal, 23955-6900, Saudi Arabia.
| |
Collapse
|
37
|
Boudellioua I, Mahamad Razali RB, Kulmanov M, Hashish Y, Bajic VB, Goncalves-Serra E, Schoenmakers N, Gkoutos GV, Schofield PN, Hoehndorf R. Semantic prioritization of novel causative genomic variants. PLoS Comput Biol 2017; 13:e1005500. [PMID: 28414800 PMCID: PMC5411092 DOI: 10.1371/journal.pcbi.1005500] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/08/2016] [Revised: 05/01/2017] [Accepted: 04/04/2017] [Indexed: 12/14/2022] Open
Abstract
Discriminating the causative disease variant(s) for individuals with inherited or de novo mutations presents one of the main challenges faced by the clinical genetics community today. Computational approaches for variant prioritization include machine learning methods utilizing a large number of features, including molecular information, interaction networks, or phenotypes. Here, we demonstrate the PhenomeNET Variant Predictor (PVP) system that exploits semantic technologies and automated reasoning over genotype-phenotype relations to filter and prioritize variants in whole exome and whole genome sequencing datasets. We demonstrate the performance of PVP in identifying causative variants on a large number of synthetic whole exome and whole genome sequences, covering a wide range of diseases and syndromes. In a retrospective study, we further illustrate the application of PVP for the interpretation of whole exome sequencing data in patients suffering from congenital hypothyroidism. We find that PVP accurately identifies causative variants in whole exome and whole genome sequencing datasets and provides a powerful resource for the discovery of causal variants. We address the problem of how to distinguish which of the many thousands of DNA sequence variants carried by an individual with a rare disease is responsible for the disease phenotypes. This can help clinicians arrive at a diagnosis, but also can be instrumental in improving our understanding of the pathobiology of the disease. Many methods are currently available to help with the problem of determining causative variant, using information about evolutionary conservation and prediction of the functional consequences of the sequence variant. We have developed a novel algorithm (PVP) which augments existing strategies by using the similarity of the patients phenotype to known phenotype-genotype data in human and model organism databases to further rank potential candidate genes. In a retrospective study, we apply PVP to the interpretation of whole exome sequencing data in patients suffering from congenital hypothyroidism, and find that PVP accurately identifies causative variants in whole exome and whole genome sequencing datasets and provides a powerful resource for the discovery of causal variants.
Collapse
Affiliation(s)
- Imane Boudellioua
- King Abdullah University of Science and Technology, Computer, Electrical & Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, Thuwal, Saudi Arabia
| | - Rozaimi B. Mahamad Razali
- King Abdullah University of Science and Technology, Computer, Electrical & Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, Thuwal, Saudi Arabia
| | - Maxat Kulmanov
- King Abdullah University of Science and Technology, Computer, Electrical & Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, Thuwal, Saudi Arabia
| | - Yasmeen Hashish
- King Abdullah University of Science and Technology, Computer, Electrical & Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, Thuwal, Saudi Arabia
| | - Vladimir B. Bajic
- King Abdullah University of Science and Technology, Computer, Electrical & Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, Thuwal, Saudi Arabia
| | - Eva Goncalves-Serra
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom
| | - Nadia Schoenmakers
- University of Cambridge Metabolic Research Laboratories, Wellcome Trust—Medical Research Council, Institute of Metabolic Science, Addenbrooke’s Hospital, Cambridge, United Kingdom
| | - Georgios V. Gkoutos
- College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, Centre for Computational Biology, University of Birmingham, Birmingham, United Kingdom
- Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, Birmingham, United Kingdom
- Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth, United Kingdom
- * E-mail: (GVG); (PNS); (RH)
| | - Paul N. Schofield
- Department of Physiology, Development & Neuroscience, University of Cambridge, Cambridge, United Kingdom
- * E-mail: (GVG); (PNS); (RH)
| | - Robert Hoehndorf
- King Abdullah University of Science and Technology, Computer, Electrical & Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, Thuwal, Saudi Arabia
- * E-mail: (GVG); (PNS); (RH)
| |
Collapse
|
38
|
Parsia B, Matentzoglu N, Gonçalves RS, Glimm B, Steigmiller A. The OWL Reasoner Evaluation (ORE) 2015 Competition Report. J Autom Reason 2017; 59:455-482. [PMID: 30069067 PMCID: PMC6044265 DOI: 10.1007/s10817-017-9406-8] [Citation(s) in RCA: 37] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/31/2016] [Accepted: 02/01/2017] [Indexed: 11/25/2022]
Abstract
The OWL Reasoner Evaluation competition is an annual competition (with an associated workshop) that pits OWL 2 compliant reasoners against each other on various standard reasoning tasks over naturally occurring problems. The 2015 competition was the third of its sort and had 14 reasoners competing in six tracks comprising three tasks (consistency, classification, and realisation) over two profiles (OWL 2 DL and EL). In this paper, we discuss the design, execution and results of the 2015 competition with particular attention to lessons learned for benchmarking, comparative experiments, and future competitions.
Collapse
Affiliation(s)
- Bijan Parsia
- Information Management Group, University of Manchester, Manchester, UK
| | | | - Rafael S. Gonçalves
- Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA USA
| | - Birte Glimm
- Institute of Artificial Intelligence, University of Ulm, Ulm, Germany
| | | |
Collapse
|
39
|
Zhou Z, Qi G. GEL: A Platform-Independent Reasoner for Parallel Classification with OWL EL Ontologies Using Graph Representation. INT J ARTIF INTELL T 2017. [DOI: 10.1142/s0218213017600016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
Abstract
In the community of Semantic Web, the state-of-the-art reasoners are all implemented on specific platforms. As the Web Ontology Language (OWL) has been widely used in different real-world applications, it is required that the task of reasoning can be flexibly adapted to different platforms. In this paper, we take the first effort to give a platformindependent approach for parallel classification of the description logic OWL EL, which is a tractable fragment of OWL 2. Classification, which is the task of computing a subsumption hierarchy between concepts, plays an important role in many applications of OWL EL. The current classification methods rely on the specific parallel computation platforms and can hardly achieve a tradeoff between efficiency and scalability. To develop a platform-independent approach for performing classification in OWL EL, we first give a novel and well defined graph formalism GEL for representing EL ontologies and present a group of sound and complete rules for ontology classification on a GEL graph. Based on this formalism, we design a parallel classification algorithm, which can be implemented on different parallel computation platforms, and we also show the correctness of the algorithm. We implement a system, which can easily switch between multi-core and a distributed cluster. Finally, we conduct experiments on several real-world OWL EL ontologies. The experimental results show that our system outperforms two state-of-the-art EL reasoning systems, and has a linear scalability on the extensions of GO ontologies.
Collapse
Affiliation(s)
- Zhangquan Zhou
- The School of Computer Science and Engineering, Southeast University, Nanjing (086-211189), China
| | - Guilin Qi
- The School of Computer Science and Engineering, Southeast University, Nanjing (086-211189), China
| |
Collapse
|
40
|
Esposito A, Esposito AM, Troncone A, Cordasco G, Orlandini A, Tsoukalas L. Editorial. INT J ARTIF INTELL T 2017. [DOI: 10.1142/s0218213017020018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
Affiliation(s)
- Anna Esposito
- Università degli Studi della Campania “Luigi Vanvitelli”, Dipartimento di Psicologia, Viale Ellittico n. 31, 81100 Caserta, Italy
| | - Antonietta M. Esposito
- Istituto Nazionale di Geofisica E Vulcanologia, Sezione di Napoli, Osservatorio Vesuviano, Napoli, Italy
| | - Alda Troncone
- Università degli Studi della Campania “Luigi Vanvitelli”, Dipartimento di Psicologia, Viale Ellittico n. 31, 81100 Caserta, Italy
| | - Gennaro Cordasco
- Università degli Studi della Campania “Luigi Vanvitelli”, Dipartimento di Psicologia, Viale Ellittico n. 31, 81100 Caserta, Italy
| | - Andrea Orlandini
- Consiglio Nazionale delle Ricerche, Istituto di Scienze e Tecnologie della Cognizione (CNR-ISTC), Via S. Martino della Battaglia 44 I-00185, Roma, Italia
| | - Lefteris Tsoukalas
- Applied Intelligent Systems Lab (AISL), Purdue University, 400 Central Drive, West Lafayette, Indiana, USA
| |
Collapse
|
41
|
Hoehndorf R, Alshahrani M, Gkoutos GV, Gosline G, Groom Q, Hamann T, Kattge J, de Oliveira SM, Schmidt M, Sierra S, Smets E, Vos RA, Weiland C. The flora phenotype ontology (FLOPO): tool for integrating morphological traits and phenotypes of vascular plants. J Biomed Semantics 2016; 7:65. [PMID: 27842607 PMCID: PMC5109718 DOI: 10.1186/s13326-016-0107-8] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/02/2015] [Accepted: 11/01/2016] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The systematic analysis of a large number of comparable plant trait data can support investigations into phylogenetics and ecological adaptation, with broad applications in evolutionary biology, agriculture, conservation, and the functioning of ecosystems. Floras, i.e., books collecting the information on all known plant species found within a region, are a potentially rich source of such plant trait data. Floras describe plant traits with a focus on morphology and other traits relevant for species identification in addition to other characteristics of plant species, such as ecological affinities, distribution, economic value, health applications, traditional uses, and so on. However, a key limitation in systematically analyzing information in Floras is the lack of a standardized vocabulary for the described traits as well as the difficulties in extracting structured information from free text. RESULTS We have developed the Flora Phenotype Ontology (FLOPO), an ontology for describing traits of plant species found in Floras. We used the Plant Ontology (PO) and the Phenotype And Trait Ontology (PATO) to extract entity-quality relationships from digitized taxon descriptions in Floras, and used a formal ontological approach based on phenotype description patterns and automated reasoning to generate the FLOPO. The resulting ontology consists of 25,407 classes and is based on the PO and PATO. The classified ontology closely follows the structure of Plant Ontology in that the primary axis of classification is the observed plant anatomical structure, and more specific traits are then classified based on parthood and subclass relations between anatomical structures as well as subclass relations between phenotypic qualities. CONCLUSIONS The FLOPO is primarily intended as a framework based on which plant traits can be integrated computationally across all species and higher taxa of flowering plants. Importantly, it is not intended to replace established vocabularies or ontologies, but rather serve as an overarching framework based on which different application- and domain-specific ontologies, thesauri and vocabularies of phenotypes observed in flowering plants can be integrated.
Collapse
Affiliation(s)
- Robert Hoehndorf
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955–6900 Kingdom of Saudi Arabia
- Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955–6900 Kingdom of Saudi Arabia
| | - Mona Alshahrani
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955–6900 Kingdom of Saudi Arabia
- Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955–6900 Kingdom of Saudi Arabia
| | - Georgios V. Gkoutos
- College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, Centre for Computational Biology, University of Birmingham, Birmingham, B15 2TT United Kingdom
- Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, Birmingham, B15 2TT United Kingdom
- Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth, SY23 2AX United Kingdom
| | - George Gosline
- Royal Botanical Gardens, Kew, Richmond, Surrey, TW9 3AB United Kingdom
| | - Quentin Groom
- Botanic Garden Meise, Nieuwelaan 38, Meise, 1860 Belgium
| | - Thomas Hamann
- Naturalis Biodiversity Center, P.O. Box 9517, Leiden, 2300 RA The Netherlands
| | - Jens Kattge
- Max Planck Institute for Biogeochemistry, Hans Knoell Str. 10, Jena, 07745 Germany
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Deutscher Platz 5e, Leipzig, 04103 Germany
| | | | - Marco Schmidt
- Senckenberg Biodiversity and Climate Research Centre (BiK-F), Senckenberganlage 25, Frankfurt am Main, 60325 Germany
| | - Soraya Sierra
- Naturalis Biodiversity Center, P.O. Box 9517, Leiden, 2300 RA The Netherlands
| | - Erik Smets
- Naturalis Biodiversity Center, P.O. Box 9517, Leiden, 2300 RA The Netherlands
| | - Rutger A. Vos
- Naturalis Biodiversity Center, P.O. Box 9517, Leiden, 2300 RA The Netherlands
| | - Claus Weiland
- Senckenberg Biodiversity and Climate Research Centre (BiK-F), Senckenberganlage 25, Frankfurt am Main, 60325 Germany
| |
Collapse
|
42
|
Eiter T, Fink M, Stepanova D. Data repair of inconsistent nonmonotonic description logic programs. ARTIF INTELL 2016. [DOI: 10.1016/j.artint.2016.06.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
|
43
|
|
44
|
Hill DP, D'Eustachio P, Berardini TZ, Mungall CJ, Renedo N, Blake JA. Modeling biochemical pathways in the gene ontology. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2016; 2016:baw126. [PMID: 27589964 PMCID: PMC5009323 DOI: 10.1093/database/baw126] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 05/24/2016] [Accepted: 08/10/2016] [Indexed: 12/05/2022]
Abstract
The concept of a biological pathway, an ordered sequence of molecular transformations, is used to collect and represent molecular knowledge for a broad span of organismal biology. Representations of biomedical pathways typically are rich but idiosyncratic presentations of organized knowledge about individual pathways. Meanwhile, biomedical ontologies and associated annotation files are powerful tools that organize molecular information in a logically rigorous form to support computational analysis. The Gene Ontology (GO), representing Molecular Functions, Biological Processes and Cellular Components, incorporates many aspects of biological pathways within its ontological representations. Here we present a methodology for extending and refining the classes in the GO for more comprehensive, consistent and integrated representation of pathways, leveraging knowledge embedded in current pathway representations such as those in the Reactome Knowledgebase and MetaCyc. With carbohydrate metabolic pathways as a use case, we discuss how our representation supports the integration of variant pathway classes into a unified ontological structure that can be used for data comparison and analysis.
Collapse
Affiliation(s)
- David P Hill
- The Jackson Laboratory, Bar Harbor, ME 04609, USA
| | - Peter D'Eustachio
- Department of Biochemistry and Molecular Pharmacology, NYU School of Medicine, New York, NY 10016, USA
| | - Tanya Z Berardini
- Arabidopsis Information Resource, Phoenix Bioinformatics, Redwood City, CA 94063, USA
| | | | | | | |
Collapse
|
45
|
|
46
|
Slater L, Gkoutos GV, Schofield PN, Hoehndorf R. Using AberOWL for fast and scalable reasoning over BioPortal ontologies. J Biomed Semantics 2016; 7:49. [PMID: 27502585 PMCID: PMC4976511 DOI: 10.1186/s13326-016-0090-0] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/03/2016] [Accepted: 07/08/2016] [Indexed: 11/30/2022] Open
Abstract
Background Reasoning over biomedical ontologies using their OWL semantics has traditionally been a challenging task due to the high theoretical complexity of OWL-based automated reasoning. As a consequence, ontology repositories, as well as most other tools utilizing ontologies, either provide access to ontologies without use of automated reasoning, or limit the number of ontologies for which automated reasoning-based access is provided. Methods We apply the AberOWL infrastructure to provide automated reasoning-based access to all accessible and consistent ontologies in BioPortal (368 ontologies). We perform an extensive performance evaluation to determine query times, both for queries of different complexity and for queries that are performed in parallel over the ontologies. Results and conclusions We demonstrate that, with the exception of a few ontologies, even complex and parallel queries can now be answered in milliseconds, therefore allowing automated reasoning to be used on a large scale, to run in parallel, and with rapid response times.
Collapse
Affiliation(s)
- Luke Slater
- College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, Centre for Computational Biology, University of Birmingham, Birmingham, B15 2TT, United Kingdom.
| | - Georgios V Gkoutos
- College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, Centre for Computational Biology, University of Birmingham, Birmingham, B15 2TT, United Kingdom.,Institute of Translational Medicine, University Hospitals Birmingham NHS Foundation Trust, Birmingham, B15 2TT, United Kingdom
| | - Paul N Schofield
- Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, England, CB2 3EG, United Kingdom
| | - Robert Hoehndorf
- Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955-6900, Saudi Arabia
| |
Collapse
|
47
|
Dececchi TA, Mabee PM, Blackburn DC. Data Sources for Trait Databases: Comparing the Phenomic Content of Monographs and Evolutionary Matrices. PLoS One 2016; 11:e0155680. [PMID: 27191170 PMCID: PMC4871461 DOI: 10.1371/journal.pone.0155680] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/27/2016] [Accepted: 05/03/2016] [Indexed: 01/17/2023] Open
Abstract
Databases of organismal traits that aggregate information from one or multiple sources can be leveraged for large-scale analyses in biology. Yet the differences among these data streams and how well they capture trait diversity have never been explored. We present the first analysis of the differences between phenotypes captured in free text of descriptive publications ('monographs') and those used in phylogenetic analyses ('matrices'). We focus our analysis on osteological phenotypes of the limbs of four extinct vertebrate taxa critical to our understanding of the fin-to-limb transition. We find that there is low overlap between the anatomical entities used in these two sources of phenotype data, indicating that phenotypes represented in matrices are not simply a subset of those found in monographic descriptions. Perhaps as expected, compared to characters found in matrices, phenotypes in monographs tend to emphasize descriptive and positional morphology, be somewhat more complex, and relate to fewer additional taxa. While based on a small set of focal taxa, these qualitative and quantitative data suggest that either source of phenotypes alone will result in incomplete knowledge of variation for a given taxon. As a broader community develops to use and expand databases characterizing organismal trait diversity, it is important to recognize the limitations of the data sources and develop strategies to more fully characterize variation both within species and across the tree of life.
Collapse
Affiliation(s)
- T. Alex Dececchi
- Department of Biology, University of South Dakota, Vermillion, South Dakota, United States of America
| | - Paula M. Mabee
- Department of Biology, University of South Dakota, Vermillion, South Dakota, United States of America
| | - David C. Blackburn
- Florida Museum of Natural History, University of Florida, Gainesville, Florida, United States of America
| |
Collapse
|
48
|
Detwiler LT, Mejino JLV, Brinkley JF. From frames to OWL2: Converting the Foundational Model of Anatomy. Artif Intell Med 2016; 69:12-21. [PMID: 27235801 DOI: 10.1016/j.artmed.2016.04.003] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/26/2015] [Revised: 04/22/2016] [Accepted: 04/23/2016] [Indexed: 12/17/2022]
Abstract
OBJECTIVE The Foundational Model of Anatomy (FMA) [Rosse C, Mejino JLV. A reference ontology for bioinformatics: the Foundational Model of Anatomy. J. Biomed. Inform. 2003;36:478-500] is an ontology that represents canonical anatomy at levels ranging from the entire body to biological macromolecules, and has rapidly become the primary reference ontology for human anatomy, and a template for model organisms. Prior to this work, the FMA was developed in a knowledge modeling language known as Protégé Frames. Frames is an intuitive representational language, but is no longer the industry standard. Recognizing the need for an official version of the FMA in the more modern semantic web language OWL2 (hereafter referred to as OWL), the objective of this work was to create a generalizable Frames-to-OWL conversion tool, to use the tool to convert the FMA to OWL, to "clean up" the converted FMA so that it classifies under an EL reasoner, and then to do all further development in OWL. METHODS The conversion tool is a Java application that uses the Protégé knowledge representation API for interacting with the initial Frames ontology, and uses the OWL-API for producing new statements (axioms, etc.) in OWL. The converter is relation centric. The conversion is configurable, on a property-by-property basis, via user-specifiable XML configuration files. The best conversion, for each property, was determined in conjunction with the FMA knowledge author. The convertor is potentially generalizable, which we partially demonstrate by using it to convert our Ontology of Craniofacial Development and Malformation as well as the FMA. Post-conversion cleanup involved using the Explain feature of Protégé to trace classification errors under the ELK reasoner in Protégé, fixing the errors, then re-running the reasoner. RESULTS We are currently doing all our development in the converted and cleaned-up version of the FMA. The FMA (updated every 3 months) is available via our FMA web page http://si.washington.edu/projects/fma, which also provides access to mailing lists, an issue tracker, a SPARQL endpoint (updated every week), and an online browser. The converted OCDM is available at http://www.si.washington.edu/projects/ocdm. The conversion code is open source, and available at http://purl.org/sig/software/frames2owl. Prior to the post-conversion cleanup 73% of the more than 100,000 classes were unsatisfiable. After correction of six types of errors no classes remained unsatisfiable. CONCLUSION Because our FMA conversion captures all or most of the information in the Frames version, is the only complete OWL version that classifies under an EL reasoner, and is maintained by the FMA authors themselves, we propose that this version should be the only official release version of the FMA in OWL, supplanting all other versions. Although several issues remain to be resolved post-conversion, release of a single, standardized version of the FMA in OWL will greatly facilitate its use in informatics research and in the development of a global knowledge base within the semantic web. Because of the fundamental nature of anatomy in both understanding and organizing biomedical information, and because of the importance of the FMA in particular in representing human anatomy, the FMA in OWL should greatly accelerate the development of an anatomically based structural information framework for organizing and linking a large amount of biomedical information.
Collapse
Affiliation(s)
- Landon T Detwiler
- Biological Structure, University of Washington Structural Informatics Group, 1959 NE Pacific Street, Box 357420, Seattle, Washington 98195, USA.
| | - Jose L V Mejino
- Biological Structure, University of Washington Structural Informatics Group, 1959 NE Pacific Street, Box 357420, Seattle, Washington 98195, USA
| | - James F Brinkley
- Biological Structure, University of Washington Structural Informatics Group, 1959 NE Pacific Street, Box 357420, Seattle, Washington 98195, USA; Computer Science and Engineering, University of Washington, Seattle, Washington 98195, USA; Biomedical Informatics and Medical Education, University of Washington, Seattle, Washington 98195, USA
| |
Collapse
|
49
|
Druzinsky RE, Balhoff JP, Crompton AW, Done J, German RZ, Haendel MA, Herrel A, Herring SW, Lapp H, Mabee PM, Muller HM, Mungall CJ, Sternberg PW, Van Auken K, Vinyard CJ, Williams SH, Wall CE. Muscle Logic: New Knowledge Resource for Anatomy Enables Comprehensive Searches of the Literature on the Feeding Muscles of Mammals. PLoS One 2016; 11:e0149102. [PMID: 26870952 PMCID: PMC4752357 DOI: 10.1371/journal.pone.0149102] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/14/2015] [Accepted: 01/27/2016] [Indexed: 01/27/2023] Open
Abstract
Background In recent years large bibliographic databases have made much of the published literature of biology available for searches. However, the capabilities of the search engines integrated into these databases for text-based bibliographic searches are limited. To enable searches that deliver the results expected by comparative anatomists, an underlying logical structure known as an ontology is required. Development and Testing of the Ontology Here we present the Mammalian Feeding Muscle Ontology (MFMO), a multi-species ontology focused on anatomical structures that participate in feeding and other oral/pharyngeal behaviors. A unique feature of the MFMO is that a simple, computable, definition of each muscle, which includes its attachments and innervation, is true across mammals. This construction mirrors the logical foundation of comparative anatomy and permits searches using language familiar to biologists. Further, it provides a template for muscles that will be useful in extending any anatomy ontology. The MFMO is developed to support the Feeding Experiments End-User Database Project (FEED, https://feedexp.org/), a publicly-available, online repository for physiological data collected from in vivo studies of feeding (e.g., mastication, biting, swallowing) in mammals. Currently the MFMO is integrated into FEED and also into two literature-specific implementations of Textpresso, a text-mining system that facilitates powerful searches of a corpus of scientific publications. We evaluate the MFMO by asking questions that test the ability of the ontology to return appropriate answers (competency questions). We compare the results of queries of the MFMO to results from similar searches in PubMed and Google Scholar. Results and Significance Our tests demonstrate that the MFMO is competent to answer queries formed in the common language of comparative anatomy, but PubMed and Google Scholar are not. Overall, our results show that by incorporating anatomical ontologies into searches, an expanded and anatomically comprehensive set of results can be obtained. The broader scientific and publishing communities should consider taking up the challenge of semantically enabled search capabilities.
Collapse
Affiliation(s)
- Robert E. Druzinsky
- Department of Oral Biology, University of Illinois at Chicago, Chicago, Illinois, United States of America
- * E-mail:
| | - James P. Balhoff
- RTI International, Research Triangle Park, North Carolina, United States of America
| | - Alfred W. Crompton
- Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts, United States of America
| | - James Done
- Division of Biology and Biological Engineering, M/C 156–29, California Institute of Technology, Pasadena, California, United States of America
| | - Rebecca Z. German
- Department of Anatomy and Neurobiology, Northeast Ohio Medical University, Rootstown, Ohio, United States of America
| | - Melissa A. Haendel
- Oregon Health and Science University, Portland, Oregon, United States of America
| | - Anthony Herrel
- Département d’Ecologie et de Gestion de la Biodiversité, Museum National d’Histoire Naturelle, Paris, France
| | - Susan W. Herring
- University of Washington, Department of Orthodontics, Seattle, Washington, United States of America
| | - Hilmar Lapp
- National Evolutionary Synthesis Center, Durham, North Carolina, United States of America
- Center for Genomic and Computational Biology, Duke University, Durham, North Carolina, United States of America
| | - Paula M. Mabee
- Department of Biology, University of South Dakota, Vermillion, South Dakota, United States of America
| | - Hans-Michael Muller
- Division of Biology and Biological Engineering, M/C 156–29, California Institute of Technology, Pasadena, California, United States of America
| | - Christopher J. Mungall
- Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
| | - Paul W. Sternberg
- Division of Biology and Biological Engineering, M/C 156–29, California Institute of Technology, Pasadena, California, United States of America
- Howard Hughes Medical Institute, M/C 156–29, California Institute of Technology, Pasadena, California, United States of America
| | - Kimberly Van Auken
- Division of Biology and Biological Engineering, M/C 156–29, California Institute of Technology, Pasadena, California, United States of America
| | - Christopher J. Vinyard
- Department of Anatomy and Neurobiology, Northeast Ohio Medical University, Rootstown, Ohio, United States of America
| | - Susan H. Williams
- Department of Biomedical Sciences, Ohio University Heritage College of Osteopathic Medicine, Athens, Ohio, United States of America
| | - Christine E. Wall
- Department of Evolutionary Anthropology, Duke University, Durham, North Carolina, United States of America
| |
Collapse
|
50
|
Hoehndorf R, Schofield PN, Gkoutos GV. The role of ontologies in biological and biomedical research: a functional perspective. Brief Bioinform 2015; 16:1069-80. [PMID: 25863278 PMCID: PMC4652617 DOI: 10.1093/bib/bbv011] [Citation(s) in RCA: 119] [Impact Index Per Article: 13.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/26/2014] [Revised: 01/20/2015] [Indexed: 12/19/2022] Open
Abstract
Ontologies are widely used in biological and biomedical research. Their success lies in their combination of four main features present in almost all ontologies: provision of standard identifiers for classes and relations that represent the phenomena within a domain; provision of a vocabulary for a domain; provision of metadata that describes the intended meaning of the classes and relations in ontologies; and the provision of machine-readable axioms and definitions that enable computational access to some aspects of the meaning of classes and relations. While each of these features enables applications that facilitate data integration, data access and analysis, a great potential lies in the possibility of combining these four features to support integrative analysis and interpretation of multimodal data. Here, we provide a functional perspective on ontologies in biology and biomedicine, focusing on what ontologies can do and describing how they can be used in support of integrative research. We also outline perspectives for using ontologies in data-driven science, in particular their application in structured data mining and machine learning applications.
Collapse
|