1
Vogt L, Mikó I, Bartolomaeus T. Anatomy and the type concept in biology show that ontologies must be adapted to the diagnostic needs of research. J Biomed Semantics 2022; 13:18. [PMID: 35761389 PMCID: PMC9235205 DOI: 10.1186/s13326-022-00268-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2021] [Accepted: 04/12/2022] [Indexed: 11/10/2022]
Abstract
BACKGROUND In times of exponential data growth in the life sciences, machine-supported approaches are becoming increasingly important and with them the need for FAIR (Findable, Accessible, Interoperable, Reusable) and eScience-compliant data and metadata standards. Ontologies, with their queryable knowledge resources, play an essential role in providing these standards. Unfortunately, biomedical ontologies only provide ontological definitions that answer What is it? questions, but no method-dependent empirical recognition criteria that answer How does it look? questions. Consequently, biomedical ontologies contain knowledge of the underlying ontological nature of structural kinds, but often lack sufficient diagnostic knowledge to unambiguously determine the reference of a term. RESULTS We argue that this is because ontology terms are usually textually defined and conceived as essentialistic classes, while recognition criteria often require perception-based definitions because perception-based contents more efficiently document and communicate spatial and temporal information: a picture is worth a thousand words. Therefore, diagnostic knowledge often must be conceived as cluster classes or fuzzy sets. Using several examples from anatomy, we point out the importance of diagnostic knowledge in anatomical research and discuss the role of cluster classes and fuzzy sets as concepts of grouping needed in anatomy ontologies in addition to essentialistic classes. In this context, we evaluate the role of the biological type concept and discuss its function as a general container concept for groupings not covered by the essentialistic class concept. CONCLUSIONS We conclude that many recognition criteria can be conceptualized as text-based cluster classes that use terms that are in turn based on perception-based fuzzy set concepts.
Finally, we point out that only if biomedical ontologies model also relevant diagnostic knowledge in addition to ontological knowledge, they will fully realize their potential and contribute even more substantially to the establishment of FAIR and eScience-compliant data and metadata standards in the life sciences.
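The three grouping concepts named in this abstract (essentialistic classes, cluster classes, fuzzy sets) can be contrasted in a minimal Python sketch. This is only an illustration under stated assumptions, not the authors' formalization: the feature names, the linear membership function, and the threshold `min_matches` are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical specimen record: feature name -> measured value (illustrative only).
Specimen = Dict[str, float]
Criterion = Callable[[Specimen], bool]


def essentialistic_member(specimen: Specimen, required: List[Criterion]) -> bool:
    """Essentialistic class: every criterion is necessary, all together sufficient."""
    return all(test(specimen) for test in required)


def cluster_member(specimen: Specimen, criteria: List[Criterion], min_matches: int) -> bool:
    """Cluster class: enough criteria must hold, but no single one is necessary."""
    return sum(1 for test in criteria if test(specimen)) >= min_matches


def fuzzy_membership(value: float, low: float, high: float) -> float:
    """Fuzzy set: degree of membership in [0, 1], rising linearly from low to high."""
    if value <= low:
        return 0.0
    if value >= high:
        return 1.0
    return (value - low) / (high - low)


# Assumed recognition criteria for some anatomical structure (made up for this sketch).
criteria: List[Criterion] = [
    lambda s: s["has_lumen"] == 1.0,
    lambda s: s["has_cilia"] == 1.0,
    lambda s: s["elongation"] > 0.5,
]

specimen: Specimen = {"has_lumen": 1.0, "has_cilia": 0.0, "elongation": 0.75}
```

With this specimen, the essentialistic test fails because one criterion is missing, while the cluster test with `min_matches=2` succeeds; `fuzzy_membership` returns a graded value instead of a yes/no answer.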
Affiliation(s)
- Lars Vogt
  - TIB Leibniz Information Centre for Science and Technology, Welfengarten 1B, 30167, Hannover, Germany.
- István Mikó
  - Don Chandler Entomological Collection, University of New Hampshire, Durham, NH, USA
- Thomas Bartolomaeus
  - Institut für Evolutionsbiologie und Ökologie, Universität Bonn, An der Immenburg 1, 53121, Bonn, Germany
2
Vogt L. FAIR data representation in times of eScience: a comparison of instance-based and class-based semantic representations of empirical data using phenotype descriptions as example. J Biomed Semantics 2021; 12:20. [PMID: 34823588 PMCID: PMC8613519 DOI: 10.1186/s13326-021-00254-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/17/2021] [Accepted: 11/11/2021] [Indexed: 12/27/2022]
Abstract
BACKGROUND The size, velocity, and heterogeneity of Big Data outclasses conventional data management tools and requires data and metadata to be fully machine-actionable (i.e., eScience-compliant) and thus findable, accessible, interoperable, and reusable (FAIR). This can be achieved by using ontologies and through representing them as semantic graphs. Here, we discuss two different semantic graph approaches of representing empirical data and metadata in a knowledge graph, with phenotype descriptions as an example. Almost all phenotype descriptions are still being published as unstructured natural language texts, with far-reaching consequences for their FAIRness, substantially impeding their overall usability within the life sciences. However, with an increasing amount of anatomy ontologies becoming available and semantic applications emerging, a solution to this problem becomes available. Researchers are starting to document and communicate phenotype descriptions through the Web in the form of highly formalized and structured semantic graphs that use ontology terms and Uniform Resource Identifiers (URIs) to circumvent the problems connected with unstructured texts. RESULTS Using phenotype descriptions as an example, we compare and evaluate two basic representations of empirical data and their accompanying metadata in the form of semantic graphs: the class-based TBox semantic graph approach called Semantic Phenotype and the instance-based ABox semantic graph approach called Phenotype Knowledge Graph. Their main difference is that only the ABox approach allows for identifying every individual part and property mentioned in the description in a knowledge graph. This technical difference results in substantial practical consequences that significantly affect the overall usability of empirical data. 
The consequences affect findability, accessibility, and explorability of empirical data as well as their comparability, expandability, universal usability and reusability, and overall machine-actionability. Moreover, TBox semantic graphs often require querying under entailment regimes, which is computationally more complex. CONCLUSIONS We conclude that, from a conceptual point of view, the advantages of the instance-based ABox semantic graph approach outweigh its shortcomings and outweigh the advantages of the class-based TBox semantic graph approach. Therefore, we recommend the instance-based ABox approach as a FAIR approach for documenting and communicating empirical data and metadata in a knowledge graph.
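The difference between the two representations can be illustrated with a small Python sketch. It is a deliberate simplification: plain tuples stand in for RDF triples, nested tuples stand in for OWL class expressions, and every identifier (`ex:specimen1`, `ex:RedEyedPhenotype`, and so on) is hypothetical rather than taken from the paper.

```python
from typing import Dict, Set, Tuple, Union

Triple = Tuple[str, str, str]
ClassExpr = Union[str, tuple]

# Instance-based (ABox-style): every part mentioned in the description
# gets its own identifier and can be addressed directly.
abox: Set[Triple] = {
    ("ex:specimen1", "rdf:type", "ex:Organism"),
    ("ex:specimen1", "ex:has_part", "ex:head1"),
    ("ex:head1", "rdf:type", "ex:Head"),
    ("ex:head1", "ex:has_part", "ex:eye1"),
    ("ex:eye1", "rdf:type", "ex:Eye"),
    ("ex:eye1", "ex:has_colour", "ex:red"),
}

# Class-based (TBox-style): the whole description is one named class whose
# definition is a nested expression; the head and the eye have no identifiers.
tbox: Dict[str, ClassExpr] = {
    "ex:RedEyedPhenotype": (
        "some", "ex:has_part",
        ("and", "ex:Head",
         ("some", "ex:has_part",
          ("and", "ex:Eye", ("some", "ex:has_colour", "ex:Red")))),
    ),
}


def instances_of(graph: Set[Triple], cls: str) -> Set[str]:
    """Direct lookup: identifiers of all individuals typed with cls."""
    return {s for (s, p, o) in graph if p == "rdf:type" and o == cls}


def properties_of(graph: Set[Triple], subject: str) -> Set[Tuple[str, str]]:
    """All non-type statements about one identified individual."""
    return {(p, o) for (s, p, o) in graph if s == subject and p != "rdf:type"}


def mentions(expr: ClassExpr, cls: str) -> bool:
    """Walk a nested class expression; a crude stand-in for reasoning over a TBox."""
    if isinstance(expr, str):
        return expr == cls
    return any(mentions(part, cls) for part in expr[1:])
```

In the instance-based graph, `instances_of(abox, "ex:Eye")` yields an identifier that further statements can attach to. In the class-based sketch the only addressable resource is the class name, and finding the eye requires traversing the expression.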
Affiliation(s)
- Lars Vogt
  - TIB Leibniz Information Centre for Science and Technology, Welfengarten 1B, 30167, Hanover, Germany.
3
Stöver BC, Wiechers S, Müller KF. JPhyloIO: a Java library for event-based reading and writing of different phylogenetic file formats through a common interface. BMC Bioinformatics 2019; 20:402. [PMID: 31331268 PMCID: PMC6647125 DOI: 10.1186/s12859-019-2982-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 05/17/2019] [Accepted: 07/02/2019] [Indexed: 12/14/2022]
Abstract
BACKGROUND Today a variety of phylogenetic file formats exists, some of which are well-established but limited in their data model, while other more recently introduced ones offer advanced features for metadata representation. Although most currently available software only supports the classical formats with a limited metadata model, it would be desirable to have support for the more advanced formats. This is necessary for users to produce richly annotated data that can be efficiently reused and make underlying workflows easily reproducible. A programming library that abstracts over the data and metadata models of the different formats and allows supporting all of them in one step would significantly simplify the development of new and the extension of existing software to address the need for better metadata annotation. RESULTS We developed the Java library JPhyloIO, which allows event-based reading and writing of the most common alignment and tree/network formats. It allows full access to all features of the nine currently supported formats. By implementing a single JPhyloIO-based reader and writer, application developers can support all of these formats. Due to the event-based architecture, JPhyloIO can be combined with any application data structure, and is memory efficient for large datasets. JPhyloIO is distributed under LGPL. Detailed documentation and example applications (available on http://bioinfweb.info/JPhyloIO/ ) significantly lower the entry barrier for bioinformaticians who wish to benefit from JPhyloIO's features in their own software. CONCLUSION JPhyloIO enables simplified development of new and extension of existing applications that support various standard formats simultaneously. This has the potential to improve interoperability between phylogenetic software tools and at the same time motivate usage of more recent metadata-rich formats such as NeXML or phyloXML.
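The general idea of event-based reading can be sketched in Python as follows. This is not JPhyloIO's API (the library is written in Java and its classes are not reproduced here); the event names and the functions `read_fasta_events` and `collect_alignment` are hypothetical and only show how a reader that emits events stays independent of the application's data structure.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple

# An event is a (kind, payload) pair; the kinds used here are made up for this sketch.
Event = Tuple[str, str]


def read_fasta_events(lines: Iterable[str]) -> Iterator[Event]:
    """Translate FASTA-like lines into a stream of events, one line at a time."""
    yield ("ALIGNMENT_START", "")
    in_sequence = False
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if in_sequence:
                yield ("SEQUENCE_END", "")
            yield ("SEQUENCE_START", line[1:].strip())
            in_sequence = True
        elif in_sequence:
            yield ("SEQUENCE_TOKENS", line)
    if in_sequence:
        yield ("SEQUENCE_END", "")
    yield ("ALIGNMENT_END", "")


def collect_alignment(events: Iterable[Event]) -> Dict[str, str]:
    """One possible consumer: store sequences in a dict keyed by sequence name."""
    alignment: Dict[str, str] = {}
    current: Optional[str] = None
    for kind, payload in events:
        if kind == "SEQUENCE_START":
            current = payload
            alignment[current] = ""
        elif kind == "SEQUENCE_TOKENS" and current is not None:
            alignment[current] += payload
        elif kind == "SEQUENCE_END":
            current = None
    return alignment


example_lines = [">A", "ACGT", "AC", ">B", "TTGA"]
```

Because the reader is a generator, only one line is held at a time, and a different consumer (for example one that merely counts sequences) could process the same event stream without building the dictionary.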
Affiliation(s)
- Ben C Stöver
  - Institute for Evolution and Biodiversity, WWU Münster, Hüfferstraße 1, 48149, Münster, Germany.
- Sarah Wiechers
  - Institute for Evolution and Biodiversity, WWU Münster, Hüfferstraße 1, 48149, Münster, Germany
- Kai F Müller
  - Institute for Evolution and Biodiversity, WWU Münster, Hüfferstraße 1, 48149, Münster, Germany
4
Vogt L. Organizing phenotypic data-a semantic data model for anatomy. J Biomed Semantics 2019; 10:12. [PMID: 31221226 PMCID: PMC6585074 DOI: 10.1186/s13326-019-0204-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 01/30/2019] [Accepted: 06/05/2019] [Indexed: 12/24/2022]
Abstract
BACKGROUND Currently, almost all morphological data are published as unstructured free text descriptions. This not only brings about terminological problems regarding semantic transparency, which hampers their re-use by non-experts, but the data cannot be parsed by computers either, which in turn hampers their integration across many fields in the life sciences, including genomics, systems biology, development, medicine, evolution, ecology, and systematics. With an ever-increasing amount of available ontologies and the development of adequate semantic technology, however, a solution to this problem becomes available. Instead of free text descriptions, morphological data can be recorded, stored, and communicated through the Web in the form of highly formalized and structured directed graphs (semantic graphs) that use ontology terms and URIs as terminology. RESULTS After introducing an instance-based approach of recording morphological descriptions as semantic graphs (i.e., Semantic Instance Anatomy Knowledge Graphs) and discussing accompanying metadata graphs, I propose a general scheme of how to efficiently organize the resulting graphs in a tuple store framework based on instances of defined named graph ontology classes. The use of such named graph resources allows meaningful fragmentation of the data, which in turn enables subsequent specification of all kinds of data views for managing and accessing morphological data. CONCLUSIONS Morphological data that comply with the here proposed semantic data model will not only be computer-parsable but also re-usable by non-experts and could be better integrated with other sources of data in the life sciences. This would allow morphology as a discipline to further participate in eScience and Big Data.
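The role of named graphs described here can be sketched in Python with quads, that is, triples extended by the name of the graph they belong to. All identifiers and the class name `ex:AnatomyDescriptionGraph` are hypothetical, and the in-memory list is only a stand-in for a tuple store; the paper's actual named graph ontology classes are not reproduced.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]
Quad = Tuple[str, str, str, str]  # (subject, predicate, object, named graph)

quads: List[Quad] = [
    # Data graph: the morphological description itself.
    ("ex:specimen1", "ex:has_part", "ex:head1", "ex:graph/anatomy1"),
    ("ex:head1", "rdf:type", "ex:Head", "ex:graph/anatomy1"),
    # Metadata graph: statements about the data graph as a resource.
    ("ex:graph/anatomy1", "rdf:type", "ex:AnatomyDescriptionGraph", "ex:graph/metadata1"),
    ("ex:graph/anatomy1", "ex:created_by", "ex:researcherA", "ex:graph/metadata1"),
]


def graphs_of_class(store: List[Quad], graph_class: str) -> Set[str]:
    """Names of all graphs that are typed as instances of graph_class."""
    return {s for (s, p, o, _g) in store if p == "rdf:type" and o == graph_class}


def view(store: List[Quad], graph_class: str) -> List[Triple]:
    """A simple data view: all triples held in graphs of the given class."""
    wanted = graphs_of_class(store, graph_class)
    return [(s, p, o) for (s, p, o, g) in store if g in wanted]
```

Typing the named graph itself is what makes the fragment selectable: `view(quads, "ex:AnatomyDescriptionGraph")` returns only the description triples and leaves the provenance statements out.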
Affiliation(s)
- Lars Vogt
  - Institut für Evolutionsbiologie und Ökologie, Rheinische Friedrich-Wilhelms-Universität Bonn, An der Immenburg 1, 53121, Bonn, Germany.
5
Vogt L. Levels and building blocks-toward a domain granularity framework for the life sciences. J Biomed Semantics 2019; 10:4. [PMID: 30691505 PMCID: PMC6348634 DOI: 10.1186/s13326-019-0196-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 09/24/2018] [Accepted: 01/14/2019] [Indexed: 11/26/2022]
Abstract
BACKGROUND With the emergence of high-throughput technologies, Big Data and eScience, the use of online data repositories and the establishment of new data standards that require data to be computer-parsable become increasingly important. As a consequence, there is an increasing need for an integrated system of hierarchies of levels of different types of material entities that helps with organizing, structuring and integrating data from disparate sources to facilitate data exploration, data comparison and analysis. Theories of granularity provide such integrated systems. RESULTS On the basis of formal approaches to theories of granularity authored by information scientists and ontology researchers, I discuss the shortcomings of some applications of the concept of levels and argue that the general theory of granularity proposed by Keet circumvents these problems. I introduce the concept of building blocks, which gives rise to a hierarchy of levels that can be formally characterized by Keet's theory. This hierarchy functions as an organizational backbone for integrating various other hierarchies that I briefly discuss, resulting in a domain granularity framework for the life sciences. I also discuss the consequences of this granularity framework for the structure of the top-level category of 'material entity' in Basic Formal Ontology. CONCLUSIONS The domain granularity framework suggested here is meant to provide the basis on which a more comprehensive information framework for the life sciences can be developed, which would provide the much needed conceptual framework for representing domains that cover multiple granularity levels. This framework can be used for intuitively structuring data in the life sciences, facilitating data exploration, and it can be employed for reasoning over different granularity levels across different hierarchies. 
It would provide a methodological basis for establishing comparability between data sets and for quantitatively measuring their degree of semantic similarity.
Affiliation(s)
- Lars Vogt
  - Rheinische Friedrich-Wilhelms-Universität Bonn, Institut für Evolutionsbiologie und Ökologie, An der Immenburg 1, 53121, Bonn, Germany.
6
Vogt L, Baum R, Bhatty P, Köhler C, Meid S, Quast B, Grobe P. SOCCOMAS: a FAIR web content management system that uses knowledge graphs and that is based on semantic programming. Database (Oxford) 2019; 2019:baz067. [PMID: 31392324 PMCID: PMC6686081 DOI: 10.1093/database/baz067] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 08/01/2018] [Revised: 01/08/2019] [Accepted: 03/29/2019] [Indexed: 11/13/2022]
Abstract
We introduce Semantic Ontology-Controlled application for web Content Management Systems (SOCCOMAS), a development framework for FAIR ('findable', 'accessible', 'interoperable', 'reusable') Semantic Web Content Management Systems (S-WCMSs). Each S-WCMS run by SOCCOMAS has its contents managed through a corresponding knowledge base that stores all data and metadata in the form of semantic knowledge graphs in a Jena tuple store. Automated procedures track provenance, user contributions and detailed change history. Each S-WCMS is accessible via both a graphical user interface (GUI), utilizing the JavaScript framework AngularJS, and a SPARQL endpoint. As a consequence, all data and metadata are maximally findable, accessible, interoperable and reusable and comply with the FAIR Guiding Principles. The source code of SOCCOMAS is written using the Semantic Programming Ontology (SPrO). SPrO consists of commands, attributes and variables, with which one can describe an S-WCMS. We used SPrO to describe all the features and workflows typically required by any S-WCMS and documented these descriptions in a SOCCOMAS source code ontology (SC-Basic). SC-Basic specifies a set of default features, such as provenance tracking and publication life cycle with versioning, which will be available in all S-WCMS run by SOCCOMAS. All features and workflows specific to a particular S-WCMS, however, must be described within an instance source code ontology (INST-SCO), defining, e.g. the function and composition of the GUI, with all its user interactions, the underlying data schemes and representations and all its workflow processes. The combination of descriptions in SC-Basic and a given INST-SCO specify the behavior of an S-WCMS. SOCCOMAS controls this S-WCMS through the Java-based middleware that accompanies SPrO, which functions as an interpreter. 
Because of the ontology-controlled design, SOCCOMAS allows easy customization with a minimum of technical programming background required, thereby seamlessly integrating conventional web page technologies with semantic web technologies. SOCCOMAS and the Java Interpreter are available from (https://github.com/SemanticProgramming).
Affiliation(s)
- Lars Vogt
  - Institut für Evolutionsbiologie und Ökologie, Rheinische Friedrich-Wilhelms-Universität Bonn, An der Immenburg 1, 53121 Bonn, Germany
- Roman Baum
  - Institut für Evolutionsbiologie und Ökologie, Rheinische Friedrich-Wilhelms-Universität Bonn, An der Immenburg 1, 53121 Bonn, Germany
- Philipp Bhatty
  - Zoologisches Forschungsmuseum Alexander Koenig, Adenauerallee 160, 53113 Bonn, Germany
- Christian Köhler
  - Institut für Evolutionsbiologie und Ökologie, Rheinische Friedrich-Wilhelms-Universität Bonn, An der Immenburg 1, 53121 Bonn, Germany
  - Zoologisches Forschungsmuseum Alexander Koenig, Adenauerallee 160, 53113 Bonn, Germany
- Sandra Meid
  - Institut für Evolutionsbiologie und Ökologie, Rheinische Friedrich-Wilhelms-Universität Bonn, An der Immenburg 1, 53121 Bonn, Germany
- Björn Quast
  - Zoologisches Forschungsmuseum Alexander Koenig, Adenauerallee 160, 53113 Bonn, Germany
- Peter Grobe
  - Zoologisches Forschungsmuseum Alexander Koenig, Adenauerallee 160, 53113 Bonn, Germany
7
Vogt L. Towards a semantic approach to numerical tree inference in phylogenetics. Cladistics 2018; 34:200-224. [PMID: 34645075 DOI: 10.1111/cla.12195] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Accepted: 02/03/2017] [Indexed: 12/24/2022]
Abstract
Conventional approaches to phylogeny reconstruction require a character analysis step prior to and methodologically separated from a numerical tree inference step. The former results in a character matrix that contains the empirical data analysed in the latter. This separation of steps involves various methodological and conceptual problems (e.g. homology assessment independent of tree inference and character optimization, character dependencies, discounting of alternative homology hypotheses). In morphology, the character analysis step covers the stages of morphological comparative studies, homology assessment and the identification and coding of morphological characters. Unfortunately, only the last stage requires some formalism, whereas the preceding stages are commonly regarded to be pre-rational and intuitive, which is why their reproducibility and analytical accessibility is limited. Here, I introduce a rationale for a semantic approach to numerical tree inference that uses sets of semantic instance anatomies as data source instead of character matrices, thereby avoiding the above-mentioned problems. A semantic instance anatomy is an ontology-based description of the anatomical organization of a specimen in the form of a semantic graph. The semantic approach to numerical tree inference combines and integrates the steps of character analysis and numerical tree inference and makes both analytically accessible and communicable. Before outlining first steps for a research programme dedicated to the semantic approach to numerical tree inference, I discuss in detail the methodological, conceptual, and computational challenges and requirements that first have to be dealt with before adequate algorithms can be developed.
Affiliation(s)
- Lars Vogt
  - Institut für Evolutionsbiologie und Ökologie, Universität Bonn, An der Immenburg 1, Bonn, D-53121, Germany
8
Abstract
With a million described species and more than half a billion preserved specimens, the large scale of insect collections is unequaled by those of any other group. Advances in genomics, collection digitization, and imaging have begun to more fully harness the power that such large data stores can provide. These new approaches and technologies have transformed how entomological collections are managed and utilized. While genomic research has fundamentally changed the way many specimens are collected and curated, advances in technology have shown promise for extracting sequence data from the vast holdings already in museums. Efforts to mainstream specimen digitization have taken root and have accelerated traditional taxonomic studies as well as distribution modeling and global change research. Emerging imaging technologies such as microcomputed tomography and confocal laser scanning microscopy are changing how morphology can be investigated. This review provides an overview of how the realization of big data has transformed our field and what may lie in store.
Affiliation(s)
- Andrew Edward Z Short
  - Department of Ecology and Evolutionary Biology; and Division of Entomology, Biodiversity Institute, University of Kansas, Lawrence, Kansas 66045, USA
- Torsten Dikow
  - Department of Entomology, National Museum of Natural History, Smithsonian Institution, Washington, DC 20013, USA
- Corrie S Moreau
  - Department of Science and Education, Field Museum of Natural History, Chicago, Illinois 60605, USA
9
Vogt L. The logical basis for coding ontologically dependent characters. Cladistics 2017; 34:438-458. [DOI: 10.1111/cla.12209] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Accepted: 05/23/2017] [Indexed: 01/26/2023]
Affiliation(s)
- Lars Vogt
  - Institut für Evolutionsbiologie und Ökologie; Universität Bonn; An der Immenburg 1 D-53121 Bonn Germany
10
Benson EE, Harding K, Mackenzie-Dodds J. A new quality management perspective for biodiversity conservation and research: Investigating Biospecimen Reporting for Improved Study Quality (BRISQ) and the Standard PRE-analytical Code (SPREC) using Natural History Museum and culture collections as case studies. Syst Biodivers 2016. [DOI: 10.1080/14772000.2016.1201167] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/21/2022]
Affiliation(s)
- Erica E. Benson
  - Damar Research Scientists, Damar, Drum Road, Cuparmuir, Fife, Scotland KY15 5RJ, UK
- Keith Harding
  - Damar Research Scientists, Damar, Drum Road, Cuparmuir, Fife, Scotland KY15 5RJ, UK
- Jacqueline Mackenzie-Dodds
  - Molecular Collections, Department of Life Sciences, Natural History Museum, Cromwell Road, London SW7 5BD, UK