1
Vogt L, Kuhn T, Hoehndorf R. Semantic units: organizing knowledge graphs into semantically meaningful units of representation. J Biomed Semantics 2024; 15:7. [PMID: 38802877 PMCID: PMC11131308 DOI: 10.1186/s13326-024-00310-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2022] [Accepted: 05/14/2024] [Indexed: 05/29/2024]
Abstract
BACKGROUND In today's landscape of data management, the importance of knowledge graphs and ontologies is escalating as critical mechanisms aligned with the FAIR Guiding Principles for ensuring that data and metadata are Findable, Accessible, Interoperable, and Reusable. We discuss three challenges that may hinder the effective exploitation of the full potential of FAIR knowledge graphs. RESULTS We introduce "semantic units" as a conceptual solution, although currently exemplified only in a limited prototype. Semantic units structure a knowledge graph into identifiable and semantically meaningful subgraphs by adding another layer of triples on top of the conventional data layer. Semantic units and their subgraphs are represented by their own resource that instantiates a corresponding semantic unit class. We distinguish statement and compound units as basic categories of semantic units. A statement unit is the smallest, independent proposition that is semantically meaningful for a human reader. Depending on the relation of its underlying proposition, it consists of one or more triples. Organizing a knowledge graph into statement units results in a partition of the graph, with each triple belonging to exactly one statement unit. A compound unit, on the other hand, is a semantically meaningful collection of statement and compound units that form larger subgraphs. Some semantic units organize the graph into different levels of representational granularity, others orthogonally into different types of granularity trees or different frames of reference, structuring and organizing the knowledge graph into partially overlapping, partially enclosed subgraphs, each of which can be referenced by its own resource. CONCLUSIONS Semantic units, applicable in RDF/OWL and labeled property graphs, offer support for making statements about statements and facilitate graph alignment, subgraph matching, knowledge graph profiling, and the management of access restrictions to sensitive data.
Additionally, we argue that organizing the graph into semantic units promotes the differentiation of ontological and discursive information, and that it also supports the differentiation of multiple frames of reference within the graph.
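The partition property described in this abstract, where every triple belongs to exactly one statement unit and compound units group other units, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The class names `StatementUnit` and `CompoundUnit`, the functions `is_partition`, `triples_of` and `unit_layer`, the `ex:` IRIs, and the representation of triples as plain string tuples are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass

# Hypothetical model: a triple is (subject, predicate, object); strings stand in for IRIs/literals.
Triple = tuple[str, str, str]


@dataclass(frozen=True)
class StatementUnit:
    """Smallest human-meaningful proposition; covers one or more data-layer triples."""
    iri: str
    triples: frozenset[Triple]


@dataclass(frozen=True)
class CompoundUnit:
    """Collection of statement and/or compound units, referenced by IRI (may overlap)."""
    iri: str
    members: tuple[str, ...]


def is_partition(graph: set[Triple], units: list[StatementUnit]) -> bool:
    """True if each triple of `graph` belongs to exactly one statement unit
    and no unit claims a triple that is not in `graph`."""
    seen: set[Triple] = set()
    for unit in units:
        if not unit.triples:        # a statement unit needs at least one triple
            return False
        if unit.triples & seen:     # triple already claimed by another unit
            return False
        seen |= unit.triples
    return seen == graph            # nothing left over, nothing extra


def triples_of(iri: str,
               statements: dict[str, StatementUnit],
               compounds: dict[str, CompoundUnit]) -> set[Triple]:
    """Resolve a unit IRI to the data-layer triples of its subgraph."""
    if iri in statements:
        return set(statements[iri].triples)
    result: set[Triple] = set()
    for member in compounds[iri].members:  # members may themselves be compound units
        result |= triples_of(member, statements, compounds)
    return result


def unit_layer(statements, compounds) -> set[Triple]:
    """The extra layer of triples: each unit resource typed by a unit class.
    The class IRIs used here are hypothetical placeholders."""
    layer = {(s.iri, "rdf:type", "ex:StatementUnit") for s in statements.values()}
    layer |= {(c.iri, "rdf:type", "ex:CompoundUnit") for c in compounds.values()}
    return layer


# Usage: a weight statement (three triples) and a type statement (one triple).
data = {
    ("ex:apple1", "ex:hasWeight", "ex:w1"),
    ("ex:w1", "ex:value", "212"),
    ("ex:w1", "ex:unit", "ex:gram"),
    ("ex:apple1", "rdf:type", "ex:Apple"),
}
weight = StatementUnit("ex:su1", frozenset({
    ("ex:apple1", "ex:hasWeight", "ex:w1"),
    ("ex:w1", "ex:value", "212"),
    ("ex:w1", "ex:unit", "ex:gram"),
}))
kind = StatementUnit("ex:su2", frozenset({("ex:apple1", "rdf:type", "ex:Apple")}))
apple = CompoundUnit("ex:cu1", ("ex:su1", "ex:su2"))

statements = {u.iri: u for u in (weight, kind)}
compounds = {apple.iri: apple}
print(is_partition(data, [weight, kind]))                   # True
print(triples_of("ex:cu1", statements, compounds) == data)  # True
print(len(unit_layer(statements, compounds)))               # 3
```

In this sketch only statement units are checked for the partition property. Compound units are left free to overlap, in line with the abstract's description of partially overlapping subgraphs.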
Affiliation(s)
- Lars Vogt
- TIB Leibniz Information Centre for Science and Technology, Welfengarten 1B, 30167, Hanover, Germany.
- Tobias Kuhn
- Department of Computer Science, Vrije Universiteit, Amsterdam, Netherlands
- Robert Hoehndorf
- Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 4700 KAUST, 23955, Thuwal, Saudi Arabia
2
Bernabé CH, Queralt-Rosinach N, Silva Souza VE, Bonino da Silva Santos LO, Mons B, Jacobsen A, Roos M. The use of foundational ontologies in biomedical research. J Biomed Semantics 2023; 14:21. [PMID: 38082345 PMCID: PMC10712036 DOI: 10.1186/s13326-023-00300-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2022] [Accepted: 11/29/2023] [Indexed: 12/18/2023]
Abstract
BACKGROUND The FAIR principles recommend the use of controlled vocabularies, such as ontologies, to define data and metadata concepts. Ontologies are currently modelled following different approaches, sometimes describing conflicting definitions of the same concepts, which can affect interoperability. To cope with that, prior literature suggests organising ontologies in levels, where domain-specific (low-level) ontologies are grounded in domain-independent high-level ontologies (i.e., foundational ontologies). In this level-based organisation, foundational ontologies work as translators of intended meaning, thus improving interoperability. Despite their considerable acceptance in biomedical research, there are very few studies testing foundational ontologies. This paper describes a systematic literature mapping that was conducted to understand how foundational ontologies are used in biomedical research and to find empirical evidence supporting their claimed (dis)advantages. RESULTS From a set of 79 selected papers, we identified that foundational ontologies are used for several purposes: ontology construction, repair, mapping, and ontology-based data analysis. Foundational ontologies are claimed to improve interoperability, enhance reasoning, speed up ontology development and facilitate maintainability. The complexity of using foundational ontologies is the most commonly cited downside. Although foundational ontologies are used for several purposes, hardly any experiments (1 paper) tested the claims for or against their use. In the subset of 49 papers that describe the development of an ontology, we observed low adherence to formal methods for ontology construction (16 papers) and ontology evaluation (4 papers). CONCLUSION Our findings have two main implications. First, the lack of empirical evidence about the use of foundational ontologies indicates a need for evaluating the use of such artefacts in biomedical research.
Second, the low adherence to formal methods illustrates how the field could benefit from a more systematic approach when dealing with the development and evaluation of ontologies. The understanding of how foundational ontologies are used in the biomedical field can drive future research towards the improvement of ontologies and, consequently, data FAIRness. The adoption of formal methods can impact the quality and sustainability of ontologies, and reusing these methods from other fields is encouraged.
Affiliation(s)
- César H Bernabé
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands.
- Luiz Olavo Bonino da Silva Santos
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- University of Twente, Enschede, The Netherlands
- Barend Mons
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Annika Jacobsen
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Marco Roos
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands.
3
Vogt L, Mikó I, Bartolomaeus T. Anatomy and the type concept in biology show that ontologies must be adapted to the diagnostic needs of research. J Biomed Semantics 2022; 13:18. [PMID: 35761389 PMCID: PMC9235205 DOI: 10.1186/s13326-022-00268-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2021] [Accepted: 04/12/2022] [Indexed: 11/10/2022]
Abstract
BACKGROUND In times of exponential data growth in the life sciences, machine-supported approaches are becoming increasingly important and with them the need for FAIR (Findable, Accessible, Interoperable, Reusable) and eScience-compliant data and metadata standards. Ontologies, with their queryable knowledge resources, play an essential role in providing these standards. Unfortunately, biomedical ontologies only provide ontological definitions that answer "What is it?" questions, but no method-dependent empirical recognition criteria that answer "How does it look?" questions. Consequently, biomedical ontologies contain knowledge of the underlying ontological nature of structural kinds, but often lack sufficient diagnostic knowledge to unambiguously determine the reference of a term. RESULTS We argue that this is because ontology terms are usually textually defined and conceived as essentialistic classes, while recognition criteria often require perception-based definitions, because perception-based contents more efficiently document and communicate spatial and temporal information (a picture is worth a thousand words). Therefore, diagnostic knowledge often must be conceived as cluster classes or fuzzy sets. Using several examples from anatomy, we point out the importance of diagnostic knowledge in anatomical research and discuss the role of cluster classes and fuzzy sets as concepts of grouping needed in anatomy ontologies in addition to essentialistic classes. In this context, we evaluate the role of the biological type concept and discuss its function as a general container concept for groupings not covered by the essentialistic class concept. CONCLUSIONS We conclude that many recognition criteria can be conceptualized as text-based cluster classes that use terms that are in turn based on perception-based fuzzy set concepts.
Finally, we point out that only if biomedical ontologies also model relevant diagnostic knowledge in addition to ontological knowledge will they fully realize their potential and contribute even more substantially to the establishment of FAIR and eScience-compliant data and metadata standards in the life sciences.
Affiliation(s)
- Lars Vogt
- TIB Leibniz Information Centre for Science and Technology, Welfengarten 1B, 30167, Hannover, Germany.
- István Mikó
- Don Chandler Entomological Collection, University of New Hampshire, Durham, NH, USA
- Thomas Bartolomaeus
- Institut für Evolutionsbiologie und Ökologie, Universität Bonn, An der Immenburg 1, 53121, Bonn, Germany
4
Vogt L. Levels and building blocks-toward a domain granularity framework for the life sciences. J Biomed Semantics 2019; 10:4. [PMID: 30691505 PMCID: PMC6348634 DOI: 10.1186/s13326-019-0196-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 09/24/2018] [Accepted: 01/14/2019] [Indexed: 11/26/2022]
Abstract
BACKGROUND With the emergence of high-throughput technologies, Big Data and eScience, the use of online data repositories and the establishment of new data standards that require data to be computer-parsable become increasingly important. As a consequence, there is an increasing need for an integrated system of hierarchies of levels of different types of material entities that helps with organizing, structuring and integrating data from disparate sources to facilitate data exploration, data comparison and analysis. Theories of granularity provide such integrated systems. RESULTS On the basis of formal approaches to theories of granularity authored by information scientists and ontology researchers, I discuss the shortcomings of some applications of the concept of levels and argue that the general theory of granularity proposed by Keet circumvents these problems. I introduce the concept of building blocks, which gives rise to a hierarchy of levels that can be formally characterized by Keet's theory. This hierarchy functions as an organizational backbone for integrating various other hierarchies that I briefly discuss, resulting in a domain granularity framework for the life sciences. I also discuss the consequences of this granularity framework for the structure of the top-level category of 'material entity' in Basic Formal Ontology. CONCLUSIONS The domain granularity framework suggested here is meant to provide the basis on which a more comprehensive information framework for the life sciences can be developed, which would provide the much needed conceptual framework for representing domains that cover multiple granularity levels. This framework can be used for intuitively structuring data in the life sciences, facilitating data exploration, and it can be employed for reasoning over different granularity levels across different hierarchies. 
It would provide a methodological basis for establishing comparability between data sets and for quantitatively measuring their degree of semantic similarity.
Affiliation(s)
- Lars Vogt
- Rheinische Friedrich-Wilhelms-Universität Bonn, Institut für Evolutionsbiologie und Ökologie, An der Immenburg 1, 53121, Bonn, Germany.
5
Vogt L. Towards a semantic approach to numerical tree inference in phylogenetics. Cladistics 2018; 34:200-224. [PMID: 34645075 DOI: 10.1111/cla.12195] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Accepted: 02/03/2017] [Indexed: 12/24/2022]
Abstract
Conventional approaches to phylogeny reconstruction require a character analysis step prior to and methodologically separated from a numerical tree inference step. The former results in a character matrix that contains the empirical data analysed in the latter. This separation of steps involves various methodological and conceptual problems (e.g. homology assessment independent of tree inference and character optimization, character dependencies, discounting of alternative homology hypotheses). In morphology, the character analysis step covers the stages of morphological comparative studies, homology assessment and the identification and coding of morphological characters. Unfortunately, only the last stage requires some formalism, whereas the preceding stages are commonly regarded as pre-rational and intuitive, which is why their reproducibility and analytical accessibility are limited. Here, I introduce a rationale for a semantic approach to numerical tree inference that uses sets of semantic instance anatomies as data source instead of character matrices, thereby avoiding the above-mentioned problems. A semantic instance anatomy is an ontology-based description of the anatomical organization of a specimen in the form of a semantic graph. The semantic approach to numerical tree inference combines and integrates the steps of character analysis and numerical tree inference and makes both analytically accessible and communicable. Before outlining first steps for a research programme dedicated to the semantic approach to numerical tree inference, I discuss in detail the methodological, conceptual, and computational challenges and requirements that first have to be dealt with before adequate algorithms can be developed.
Affiliation(s)
- Lars Vogt
- Institut für Evolutionsbiologie und Ökologie, Universität Bonn, An der Immenburg 1, Bonn, D-53121, Germany
6
Vogt L. The logical basis for coding ontologically dependent characters. Cladistics 2017; 34:438-458. [DOI: 10.1111/cla.12209] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Accepted: 05/23/2017] [Indexed: 01/26/2023]
Affiliation(s)
- Lars Vogt
- Institut für Evolutionsbiologie und Ökologie; Universität Bonn; An der Immenburg 1 D-53121 Bonn Germany
7
Vogt L. Assessing similarity: on homology, characters and the need for a semantic approach to non-evolutionary comparative homology. Cladistics 2016; 33:513-539. [DOI: 10.1111/cla.12179] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Accepted: 09/20/2016] [Indexed: 01/09/2023]
Affiliation(s)
- Lars Vogt
- Institut für Evolutionsbiologie und Ökologie; Universität Bonn; An der Immenburg 1 Bonn D-53121 Germany
8
Vogt L, Nickel M, Jenner RA, Deans AR. The need for data standards in zoomorphology. J Morphol 2013; 274:793-808. [PMID: 23508988 DOI: 10.1002/jmor.20138] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Received: 07/08/2012] [Revised: 12/10/2012] [Accepted: 01/18/2013] [Indexed: 11/05/2022]
Abstract
eScience is a new approach to research that focuses on data mining and exploration rather than data generation or simulation. This new approach is arguably a driving force for scientific progress and requires data to be openly available, easily accessible via the Internet, and compatible with each other. eScience relies on modern standards for the reporting and documentation of data and metadata. Here, we suggest necessary components (i.e., content, concept, nomenclature, format) of such standards in the context of zoomorphology. We document the need for using data repositories to prevent data loss and how publication practice is currently changing, with the emergence of dynamic publications and the publication of digital datasets. Subsequently, we demonstrate that in zoomorphology the scientific record is still limited to published literature and that zoomorphological data are usually not accessible through data repositories. The underlying problem is that zoomorphology lacks the standards for data and metadata. As a consequence, zoomorphology cannot participate in eScience. We argue that the standardization of morphological data requires i) a standardized framework for terminologies for anatomy and ii) a formalized method of description that allows computer-parsable morphological data to be communicable, compatible, and comparable. The role of controlled vocabularies (e.g., ontologies) for developing respective terminologies and methods of description is discussed, especially in the context of data annotation and semantic enhancement of publications. Finally, we introduce the International Consortium for Zoomorphology Standards, a working group that is open to everyone and whose aim is to stimulate and synthesize dialog about standards. It is the Consortium's ultimate goal to assist the zoomorphology community in developing modern data and metadata standards, including anatomy ontologies, thereby facilitating the participation of zoomorphology in eScience.
Affiliation(s)
- Lars Vogt
- Abteilung Zoologie und Evolutionsbiologie, Institut für Evolutionsbiologie und Ökologie, Fachgruppe Biologie, Universität Bonn; An der Immenburg 1, Bonn D-53121, Germany.
9
Vogt L, Grobe P, Quast B, Bartolomaeus T. Fiat or bona fide boundary--a matter of granular perspective. PLoS One 2012; 7:e48603. [PMID: 23251333 PMCID: PMC3520998 DOI: 10.1371/journal.pone.0048603] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Received: 07/02/2012] [Accepted: 09/27/2012] [Indexed: 11/19/2022]
Abstract
BACKGROUND Distinguishing bona fide (i.e. natural) and fiat (i.e. artificial) physical boundaries plays a key role in distinguishing natural from artificial material entities and is thus relevant to any scientific formal foundational top-level ontology, such as the Basic Formal Ontology (BFO). In BFO, the distinction is essential for demarcating two foundational categories of material entity: object and fiat object part. The commonly used basis for demarcating bona fide from fiat boundaries rests on two criteria: (i) intrinsic qualities of the boundary bearers (i.e. spatial/physical discontinuity, qualitative heterogeneity) and (ii) mind-independent existence of the boundary. The resulting distinction of bona fide and fiat boundaries is considered to be categorial and exhaustive. METHODOLOGY/PRINCIPAL FINDINGS By referring to various examples from biology, we demonstrate that the hitherto used distinction of boundaries is not categorial: (i) spatial/physical discontinuity is a matter of scale and the differentiation of bona fide and fiat boundaries is thus granularity-dependent, and (ii) this differentiation is not absolute, but comes in degrees. By reducing the demarcation criteria to mind-independence and by also considering dispositions and historical relations of the bearers of boundaries, instead of only considering their spatio-structural properties, we demonstrate with various examples that spatio-structurally fiat boundaries can nevertheless be mind-independent and in this sense bona fide. CONCLUSIONS/SIGNIFICANCE We argue that the ontological status of a given boundary is perspective-dependent and that the strictly spatio-structural demarcation criteria follow a static perspective that is ignorant of causality and the dynamics of reality.
Based on a distinction of several ontologically independent perspectives, we suggest different types of boundaries and corresponding material entities, including boundaries based on function (locomotion, physiology, ecology, development, reproduction) and common history (development, heredity, evolution). We argue that for each perspective one can differentiate respective bona fide from fiat boundaries.
Affiliation(s)
- Lars Vogt
- Institut für Evolutionsbiologie und Ökologie, Universität Bonn, Bonn, Germany.
10
Vogt L, Grobe P, Quast B, Bartolomaeus T. Accommodating ontologies to biological reality--top-level categories of cumulative-constitutively organized material entities. PLoS One 2012; 7:e30004. [PMID: 22253856 PMCID: PMC3253816 DOI: 10.1371/journal.pone.0030004] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Received: 11/14/2011] [Accepted: 12/11/2011] [Indexed: 11/18/2022]
Abstract
Background The Basic Formal Ontology (BFO) is a top-level formal foundational ontology for the biomedical domain. It was developed to serve as an ontologically consistent template for top-level categories of application oriented and domain reference ontologies within the Open Biological and Biomedical Ontologies Foundry (OBO). BFO is important for enabling OBO ontologies to support the reliable communication and management of data and metadata within and across biomedical databases. Following its intended single inheritance policy, BFO's three top-level categories of material entity (i.e. ‘object’, ‘fiat object part’, ‘object aggregate’) must be exhaustive and mutually disjoint. We have shown elsewhere that for accommodating all types of constitutively organized material entities, BFO must be extended by additional categories of material entity. Methodology/Principal Findings Unfortunately, most biomedical material entities are cumulative-constitutively organized. We show that even the extended BFO does not exhaustively cover cumulative-constitutively organized material entities. We provide examples from biology and everyday life that demonstrate the necessity for ‘portion of matter’ as another material building block. This implies the necessity for further extending BFO by ‘portion of matter’ as well as three additional categories that possess portions of matter as aggregate components. These extensions are necessary if the basic assumption that all parts that share the same granularity level exhaustively sum to the whole should also apply to cumulative-constitutively organized material entities. By suggesting a notion of granular representation we provide a way to maintain the single inheritance principle when dealing with cumulative-constitutively organized material entities. Conclusions/Significance We suggest extending BFO to incorporate additional categories of material entity and to rearrange its top-level material entity taxonomy.
With these additions and the notion of granular representation, BFO would exhaustively cover all top-level types of material entities that application oriented ontologies may use as templates, while still maintaining the single inheritance principle.
Affiliation(s)
- Lars Vogt
- Institut für Evolutionsbiologie und Ökologie, Universität Bonn, Bonn, Germany.
11
Wu WZ, Leung Y. Theory and applications of granular labelled partitions in multi-scale decision tables. Inf Sci (N Y) 2011. [DOI: 10.1016/j.ins.2011.04.047] [Citation(s) in RCA: 160] [Impact Index Per Article: 12.3] [Indexed: 11/24/2022]
12
Vogt L, Grobe P, Quast B, Bartolomaeus T. Top-level categories of constitutively organized material entities--suggestions for a formal top-level ontology. PLoS One 2011; 6:e18794. [PMID: 21533043 PMCID: PMC3080885 DOI: 10.1371/journal.pone.0018794] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 10/21/2010] [Accepted: 03/18/2011] [Indexed: 11/23/2022]
Abstract
Background Application oriented ontologies are important for reliably communicating and managing data in databases. Unfortunately, they often differ in the definitions they use and thus do not live up to their potential. This problem can be reduced when using a standardized and ontologically consistent template for the top-level categories from a top-level formal foundational ontology. This would support ontological consistency within application oriented ontologies and compatibility between them. The Basic Formal Ontology (BFO) is such a foundational ontology for the biomedical domain that has been developed following the single inheritance policy. It provides the top-level template within the Open Biological and Biomedical Ontologies Foundry. If it wants to live up to its expected role, its three top-level categories of material entity (i.e., ‘object’, ‘fiat object part’, ‘object aggregate’) must be exhaustive, i.e. every concrete material entity must instantiate exactly one of them. Methodology/Principal Findings By systematically evaluating all possible basic configurations of material building blocks we show that BFO's top-level categories of material entity are not exhaustive. We provide examples from biology and everyday life that demonstrate the necessity for two additional categories: ‘fiat object part aggregate’ and ‘object with fiat object part aggregate’. By distinguishing topological coherence, topological adherence, and metric proximity we furthermore provide a differentiation of clusters and groups as two distinct subcategories for each of the three categories of material entity aggregates, resulting in six additional subcategories of material entity. Conclusions/Significance We suggest extending BFO to incorporate two additional categories of material entity as well as two subcategories for each of the three categories of material entity aggregates. With these additions, BFO would exhaustively cover all top-level types of material entity that application oriented ontologies may use as templates. Our result, however, depends on the premise that all material entities are organized according to a constitutive granularity.
Affiliation(s)
- Lars Vogt
- Institut für Evolutionsbiologie und Ökologie, Universität Bonn, Bonn, Germany.