1
Bernabé-Díaz JA, Franco M, Vivo JM, Quesada-Martínez M, Fernández-Breis JT. An automated process for supporting decisions in clustering-based data analysis. Comput Methods Programs Biomed 2022; 219:106765. [PMID: 35367914 DOI: 10.1016/j.cmpb.2022.106765] [Received: 12/28/2021] [Revised: 03/14/2022] [Accepted: 03/18/2022]
Abstract
BACKGROUND AND OBJECTIVE Metrics are commonly used by biomedical researchers and practitioners to measure and evaluate properties of individuals, instruments, models, methods, or datasets. Because there is no standardized validation procedure for metrics, it is often assumed that a metric that is appropriate for analyzing one dataset in a certain domain will also be appropriate for other datasets in the same domain. However, such generalizability cannot be taken for granted, since the behavior of a metric can vary across scenarios. Studying this behavior is the objective of this paper, since it allows the reliability of a metric to be assessed before drawing any conclusions about biomedical datasets. METHODS We present a method to support the evaluation of the behavior of quantitative metrics on datasets. Our approach assesses a metric through clustering-based data analysis and supports decision-making about the optimal classification. The method applies two important criteria of unsupervised classification validation, the stability and the goodness of the clusters, calculated on the clusterings generated by each metric. Our evaluomeR tool makes the method available to biomedical researchers. RESULTS The analytical power of the method is shown by applying it to analyze (1) the behavior of the impact factor metric for a series of journal categories; (2) which structural metrics provide a better partitioning of the content of a repository of biomedical ontologies; and (3) the sources of heterogeneity in effect size metrics of biomedical primary studies. CONCLUSIONS Statistical properties such as the stability and goodness of classifications enable a useful analysis of the behavior of quantitative metrics, which can support decisions about which metrics to apply to a certain dataset.
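A minimal sketch of the two validation criteria, assuming k-means on the values of a single metric, the mean silhouette width as the goodness criterion, and a bootstrap Jaccard similarity (in the spirit of cluster-wise bootstrap stability) as the stability criterion. evaluomeR itself is an R package; this Python code is not its implementation, and every function name here is hypothetical.

```python
import random
from statistics import mean


def kmeans_1d(values, k, iters=100):
    """Lloyd's k-means on scalar metric values; returns one cluster label per value."""
    ordered = sorted(values)
    # Deterministic initialisation: centres spread across the sorted values.
    centres = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(v - centres[c])) for v in values]
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(mean(members) if members else centres[c])
        if updated == centres:
            break
        centres = updated
    return labels


def mean_silhouette(values, labels):
    """Goodness: average silhouette width (assumes at least two non-empty clusters)."""
    scores = []
    for i, v in enumerate(values):
        own = [abs(v - w) for j, w in enumerate(values) if j != i and labels[j] == labels[i]]
        if not own:  # singleton cluster: silhouette conventionally set to 0
            scores.append(0.0)
            continue
        a = mean(own)
        b = min(
            mean(abs(v - w) for j, w in enumerate(values) if labels[j] == c)
            for c in set(labels) if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return mean(scores)


def bootstrap_stability(values, k, n_boot=50, seed=0):
    """Stability: mean best-match Jaccard of each original cluster over bootstrap re-clusterings."""
    rng = random.Random(seed)
    base = kmeans_1d(values, k)
    totals = [0.0] * k
    for _ in range(n_boot):
        # Simplification: keep only the distinct indices drawn with replacement.
        idx = sorted(set(rng.choices(range(len(values)), k=len(values))))
        boot = kmeans_1d([values[i] for i in idx], k)
        for c in range(k):
            original = {i for i in idx if base[i] == c}
            best = 0.0
            for d in range(k):
                found = {i for i, lab in zip(idx, boot) if lab == d}
                if original | found:
                    best = max(best, len(original & found) / len(original | found))
            totals[c] += best
    return [t / n_boot for t in totals]
```

Values close to 1 for both criteria would indicate that the metric partitions the dataset into well-separated, reproducible groups for the chosen k.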
Affiliation(s)
- Manuel Franco
- Dept. Statistics and Operations Research, University of Murcia, IMIB-Arrixaca, Spain
- Juana-María Vivo
- Dept. Statistics and Operations Research, University of Murcia, IMIB-Arrixaca, Spain
2
Abad-Navarro F, Quesada-Martínez M, Duque-Ramos A, Fernández-Breis JT. Analysis of readability and structural accuracy in SNOMED CT. BMC Med Inform Decis Mak 2020; 20:284. [PMID: 33319711 PMCID: PMC7737250 DOI: 10.1186/s12911-020-01291-y] [Received: 09/30/2020] [Accepted: 10/13/2020]
Abstract
Background The increasing adoption of ontologies in biomedical research and the growing number of available ontologies have made it necessary to assure the quality of these resources. Most of the well-established ontologies, such as the Gene Ontology or SNOMED CT, have their own quality assurance processes. These have proven useful for maintaining the resources but are unable to detect all of the modelling flaws in the ontologies. Consequently, efficient and effective quality assurance methods need to be developed. Methods Here, we propose a series of quantitative metrics based on the processing of the lexical regularities existing in the content of the ontology, to analyse readability and structural accuracy. The readability metrics account for the ratio of labels, descriptions, and synonyms associated with the ontology entities. The structural accuracy metrics evaluate how well two ontology modelling best practices are followed: (1) lexically suggest, logically define (LSLD), that is, whether what is expressed in natural language for humans is available as logical axioms for machines; and (2) systematic naming, which accounts for the amount of label content shared by the classes in a given taxonomy. Results We applied the metrics to different versions of SNOMED CT. Both the readability and the structural accuracy metrics remained stable over time but captured some changes in the modelling decisions in SNOMED CT. The value of the LSLD metric increased from 0.27 to 0.31, and the value of the systematic naming metric was around 0.17. We analysed the readability and structural accuracy of the SNOMED CT July 2019 release. The results showed that the fulfilment of the structural accuracy criteria varied among the SNOMED CT hierarchies. The values of the metrics for the hierarchies ranged from 0 to 0.92 (LSLD) and from 0.08 to 1 (systematic naming). We also identified the cases that did not meet the best practices.
Conclusions We generated useful information about the engineering of the ontology, making the following contributions: (1) a set of readability metrics, (2) the use of lexical regularities to define structural accuracy metrics, and (3) the generation of quality assurance information for SNOMED CT.
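The readability ratios, and one possible reading of systematic naming, can be sketched as follows. The `Entity` data model, the example classes, and the rule that a child label should contain its parent's label are assumptions for illustration, not the paper's exact definitions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entity:
    """Hypothetical, minimal view of an ontology class and its annotations."""
    label: Optional[str] = None
    description: Optional[str] = None
    synonyms: list = field(default_factory=list)


def readability_ratios(entities):
    """Share of entities that carry a label, a description, and at least one synonym."""
    n = len(entities)
    return {
        "labels": sum(1 for e in entities.values() if e.label) / n,
        "descriptions": sum(1 for e in entities.values() if e.description) / n,
        "synonyms": sum(1 for e in entities.values() if e.synonyms) / n,
    }


def systematic_naming(entities, is_a):
    """Share of labelled (child, parent) pairs whose child label contains the parent label."""
    pairs = [(c, p) for c, p in is_a if entities[c].label and entities[p].label]
    if not pairs:
        return 0.0
    hits = sum(1 for c, p in pairs if entities[p].label.lower() in entities[c].label.lower())
    return hits / len(pairs)
```

With a parent labelled "stenosis", a child labelled "congenital stenosis" counts as systematically named, while a sibling labelled "pulmonary atresia" does not.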
Affiliation(s)
- Francisco Abad-Navarro
- Departamento de Informática y Sistemas, Universidad de Murcia, Campus de Espinardo, 30100, Murcia, Spain; Instituto Murciano de Investigación Biosanitaria (IMIB-Arrixaca), Hospital Clínico Universitario Virgen de la Arrixaca, 30120, Murcia, Spain
- Manuel Quesada-Martínez
- Center of Operations Research (CIO), Miguel Hernández University of Elche, Avda. de la Universidad, 03202, Alicante, Spain
- Astrid Duque-Ramos
- Facultad de Ingenierías, Universidad Autónoma Latinoamericana, Carrera 55 49, 050010, Medellín, Colombia
- Jesualdo Tomás Fernández-Breis
- Departamento de Informática y Sistemas, Universidad de Murcia, Campus de Espinardo, 30100, Murcia, Spain; Instituto Murciano de Investigación Biosanitaria (IMIB-Arrixaca), Hospital Clínico Universitario Virgen de la Arrixaca, 30120, Murcia, Spain
3
Franco M, Vivo JM, Quesada-Martínez M, Duque-Ramos A, Fernández-Breis JT. Evaluation of ontology structural metrics based on public repository data. Brief Bioinform 2020; 21:473-485. [PMID: 30715146 DOI: 10.1093/bib/bbz009] [Received: 10/22/2018] [Revised: 12/20/2018] [Accepted: 01/05/2019]
Abstract
The development and application of biological ontologies have increased significantly in recent years. These ontologies can be retrieved from different repositories, which provide little information about the quality of the ontologies. In recent years, some ontology structural metrics have been proposed, but their validity as measurement instruments has not been sufficiently studied to date. In this work, we evaluate a set of reproducible and objective ontology structural metrics. Given the lack of standard methods for this purpose, we applied an evaluation method based on the stability and goodness of the classifications of ontologies produced by each metric on an ontology corpus. The evaluation used ontology repositories as corpora: specifically, 119 ontologies from the OBO Foundry repository and 78 ontologies from AgroPortal. First, we study the correlations between the metrics. Second, we study whether the clusters for a given metric are stable and have a good structure. The results show that the existing correlations do not bias the evaluation, that no metric generates unstable clusterings, and that all the evaluated metrics provide at least a reasonable clustering structure. Furthermore, our work makes it possible to review and recommend the most reliable ontology structural metrics in terms of the stability and goodness of their classifications. Availability: http://sele.inf.um.es/ontology-metrics.
Affiliation(s)
- Manuel Franco
- Departamento de Estadística e Investigación Operativa, Universidad de Murcia, Murcia, Spain
- Juana María Vivo
- Departamento de Estadística e Investigación Operativa, Universidad de Murcia, Murcia, Spain
- Astrid Duque-Ramos
- Departamento de Sistemas, Facultad de Ingenierías, Universidad de Antioquia, Medellín, Colombia
4
Quesada-Martínez M, Marcos M, Abad-Navarro F, Martínez-Salvador B, Fernández-Breis JT. Towards the semantic enrichment of Computer Interpretable Guidelines: a method for the identification of relevant ontological terms. AMIA Annu Symp Proc 2018; 2018:922-931. [PMID: 30815135 PMCID: PMC6371308]
Abstract
Clinical Practice Guidelines (CPGs) contain recommendations intended to optimize patient care, produced on the basis of a systematic review of evidence. In turn, Computer-Interpretable Guidelines (CIGs) are formalized versions of CPGs for use as decision-support systems. We consider the enrichment of a CIG by means of an OWL ontology that describes the clinical domain of the CIG, which could be exploited, e.g., for interoperability with the Electronic Health Record (EHR). As a first step, in this paper we describe a method to support the development of such an ontology starting from a CIG. The method uses an alignment algorithm for the automated identification of ontological terms relevant to the clinical domain of the CIG, as well as a web platform to manually review the alignments and select the appropriate ones. Finally, we present the results of applying the method to a small corpus of CIGs.
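A toy version of the automated alignment step, assuming token-set Jaccard similarity between a CIG term and each ontology label. The abstract does not describe the actual alignment algorithm, and the terms, class identifiers, and threshold below are hypothetical; the ranked candidates would then go to manual review.

```python
import re


def tokens(text):
    """Lower-cased alphanumeric tokens of a term or label."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def candidate_alignments(cig_terms, ontology_labels, min_score=0.5):
    """For each CIG term, rank ontology classes by token-set Jaccard similarity."""
    result = {}
    for term in cig_terms:
        t = tokens(term)
        scored = []
        for class_id, label in ontology_labels.items():
            lab = tokens(label)
            union = t | lab
            score = len(t & lab) / len(union) if union else 0.0
            if score >= min_score:
                scored.append((-score, class_id))
        # Best score first; ties broken by class identifier.
        result[term] = [class_id for _, class_id in sorted(scored)]
    return result
```

Token sets ignore word order, so "type 2 diabetes mellitus" and "Diabetes mellitus type 2" align with a score of 1.0.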
Affiliation(s)
- Manuel Quesada-Martínez
- Departamento de Informática y Sistemas, Universidad de Murcia, IMIB-Arrixaca, Spain
- Center of Operations Research (CIO), Miguel Hernández University of Elche, Spain
- Mar Marcos
- Dept. of Computer Engineering and Science, Universitat Jaume I, Spain
5
van Damme P, Quesada-Martínez M, Cornet R, Fernández-Breis JT. From lexical regularities to axiomatic patterns for the quality assurance of biomedical terminologies and ontologies. J Biomed Inform 2018; 84:59-74. [PMID: 29908358 DOI: 10.1016/j.jbi.2018.06.008] [Received: 10/15/2017] [Revised: 06/10/2018] [Accepted: 06/12/2018]
Abstract
Ontologies and terminologies have been identified as key resources for achieving semantic interoperability in biomedical domains. Ontologies are developed jointly by domain experts and knowledge engineers. The maintenance and auditing of these resources is also the responsibility of such experts, and this is usually a time-consuming, mostly manual task. Manual auditing is impractical and ineffective for most biomedical ontologies, especially larger ones. An example is SNOMED CT, a key resource in many countries for codifying medical information. SNOMED CT contains more than 300,000 concepts. Consequently, its auditing requires the support of automatic methods. Many biomedical ontologies contain natural language content for humans and logical axioms for machines. The 'lexically suggest, logically define' principle means that there should be a relation between what is expressed in natural language and what is expressed as logical axioms, and that such a relation should be useful for auditing and quality assurance. Moreover, the principle implies that the natural language content for humans could be used to generate the logical axioms for machines. In this work, we propose a method that combines lexical analysis and clustering techniques to (1) identify regularities in the natural language content of ontologies; (2) cluster, by similarity, labels exhibiting a regularity; (3) extract relevant information from those clusters; and (4) propose logical axioms for each cluster with the support of axiom templates. These logical axioms can then be evaluated against the existing axioms in the ontology to check their correctness and completeness, which are two fundamental objectives of auditing and quality assurance.
In this paper, we describe the application of the method to two SNOMED CT modules: a 'congenital' module, obtained using concepts exhibiting the attribute Occurrence - Congenital, and a 'chronic' module, using concepts exhibiting the attribute Clinical course - Chronic. We obtained a precision and a recall of 75% and 28%, respectively, for the 'congenital' module, and 64% and 40% for the 'chronic' one. We consider these results promising: our method can support content editors by providing automatic methods for assuring the quality of biomedical ontologies and terminologies.
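Steps (1) and (4) can be illustrated with a small sketch: word n-grams shared by at least two labels stand in for lexical regularities, and a hypothetical axiom template is instantiated for every class exhibiting one. This groups classes by exact shared n-grams rather than by the similarity-based clustering of step (2), and the template string is illustrative only, not SNOMED CT concept model syntax.

```python
from collections import defaultdict


def lexical_regularities(labels, max_n=3, min_classes=2):
    """Map each word n-gram (n <= max_n) shared by >= min_classes labels to those classes."""
    occurrences = defaultdict(set)
    for class_id, label in labels.items():
        words = label.lower().split()
        for n in range(1, max_n + 1):
            for i in range(len(words) - n + 1):
                occurrences[" ".join(words[i:i + n])].add(class_id)
    return {lr: ids for lr, ids in occurrences.items() if len(ids) >= min_classes}


def propose_axioms(lr_classes, template):
    """Instantiate a hypothetical axiom template once per class exhibiting an LR."""
    return sorted(template.format(cls=c) for c in lr_classes)
```

In practice, regularities made only of function words such as "of" would need to be filtered out before proposing axioms.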
Affiliation(s)
- Philip van Damme
- Department of Medical Informatics, Amsterdam Public Health research institute, Academic Medical Center, University of Amsterdam, The Netherlands.
- Manuel Quesada-Martínez
- Departamento de Informática y Sistemas, Universidad de Murcia, IMIB-Arrixaca, Murcia, Spain; Center of Operations Research (CIO), University Miguel Hernandez of Elche (UMH), Spain.
- Ronald Cornet
- Department of Medical Informatics, Amsterdam Public Health research institute, Academic Medical Center, University of Amsterdam, The Netherlands.
6
Quesada-Martínez M, Duque-Ramos A, Iniesta-Moreno M, Fernández-Breis JT. Preliminary Analysis of the OBO Foundry Ontologies and Their Evolution Using OQuaRE. Stud Health Technol Inform 2017; 235:426-430. [PMID: 28423828]
Abstract
The biomedical community has now developed a significant number of ontologies. The curation of biomedical ontologies is a complex task, as they evolve rapidly, with new versions published regularly. Therefore, methods are needed to support ontology developers in analysing and tracking the evolution of their ontologies. OQuaRE is an ontology evaluation framework based on quantitative metrics that produces normalised scores for different ontologies. In this work, OQuaRE has been applied to 408 versions of the eight OBO Foundry member ontologies. The OBO Foundry member ontologies are expected to have been built following the OBO Foundry principles. Our results show that this set of ontologies does follow principles such as the naming convention, and that the evolution of the OBO Foundry member ontologies is producing ontologies with higher OQuaRE quality scores.
7
Duque-Ramos A, Quesada-Martínez M, Iniesta-Moreno M, Fernández-Breis JT, Stevens R. Supporting the analysis of ontology evolution processes through the combination of static and dynamic scaling functions in OQuaRE. J Biomed Semantics 2016; 7:63. [PMID: 27751176 PMCID: PMC5067895 DOI: 10.1186/s13326-016-0091-z] [Received: 03/08/2016] [Accepted: 08/02/2016]
Abstract
Background The biomedical community has now developed a significant number of ontologies. The curation of biomedical ontologies is a complex task, and biomedical ontologies evolve rapidly, so new versions are regularly and frequently published in ontology repositories. As a result, a large number of ontology versions appear over a short time span. Given this level of activity, ontology designers need support in effectively managing the evolution of biomedical ontologies, as the different changes may affect the engineering and quality of the ontology. This is why methods are needed that contribute to analysing the effects of changes and the evolution of ontologies. Results In this paper we approach this issue from the ontology quality perspective. In previous work we developed an ontology evaluation framework based on quantitative metrics, called OQuaRE. Here, OQuaRE is used as a core component in a method that enables the analysis of the different versions of biomedical ontologies using the quality dimensions included in OQuaRE. Moreover, we describe and use two scales for evaluating the changes between the versions of a given ontology. The first is the static scale used in OQuaRE and the second is a new, dynamic scale, based on the observed values of the quality metrics in a corpus defined by all the versions of a given ontology (life-cycle). In this work we explain how OQuaRE can be adapted for understanding the evolution of ontologies. Its use is illustrated with the ontology of bioinformatics operations, types of data, formats, and topics (EDAM). Conclusions The two scales included in OQuaRE provide complementary information about the evolution of the ontologies. The application of the static scale, which is the original OQuaRE scale, to the versions of the EDAM ontology reveals a design based on good ontological engineering principles.
The application of the dynamic scale has enabled a more detailed analysis of the evolution of the ontology, measured through differences between versions. The statistics of change based on the OQuaRE quality scores make it possible to identify key versions where some changes in the engineering of the ontology triggered a change from the OQuaRE quality perspective. In the case of EDAM, this study allowed us to identify that the fifth version of the ontology has the largest impact on the quality metrics of the ontology when pairs of consecutive versions are compared.
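A sketch of the difference between the two scales, assuming each maps a raw metric value to a 1 to 5 score through four ascending cut points: the static scale uses fixed cut points, while the dynamic one derives them here from quintiles of the values observed across all versions of the ontology. The quintile choice and the example cut points are assumptions, not OQuaRE's published thresholds.

```python
from bisect import bisect_right
from statistics import quantiles


def scale(value, cut_points, higher_is_better=True):
    """Map a raw metric value to a 1-5 score given four ascending cut points."""
    score = 1 + bisect_right(cut_points, value)
    return score if higher_is_better else 6 - score


def dynamic_cut_points(observed):
    """Derive cut points from the values observed across all versions (quintiles)."""
    return quantiles(observed, n=5)


# Hypothetical fixed thresholds standing in for a static scale.
STATIC_CUT_POINTS = [0.2, 0.4, 0.6, 0.8]
```

Because the dynamic cut points come from the ontology's own life-cycle, small changes between versions can move a score on the dynamic scale even when the static score stays the same.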
Affiliation(s)
- Astrid Duque-Ramos
- Universidad de Murcia, IMIB-Arrixaca, Campus de Espinardo, Murcia, 30071, Spain
- Robert Stevens
- School of Computer Science, University of Manchester, Oxford Road, Manchester, M13 9PL, UK
8
Quesada-Martínez M, Fernández-Breis JT, Karlsson D. Suggesting Missing Relations in Biomedical Ontologies Based on Lexical Regularities. Stud Health Technol Inform 2016; 228:384-388. [PMID: 27577409]
Abstract
The number of biomedical ontologies has increased significantly in recent years. Many such ontologies are the result of efforts by communities of domain experts and ontology engineers. The development and application of quality assurance (QA) methods should help these communities develop ontologies that are useful for both humans and machines. According to previous studies, biomedical ontologies are rich in natural language content, but most of them are much less rich in axioms. Here, we study the relation between content in natural language and content in axiomatic form. Analysing the labels of the classes makes it possible to identify lexical regularities (LRs), which are sets of words shared by the labels of different classes. Our assumption is that the classes exhibiting an LR should be logically related through axioms, and we use this assumption to propose an algorithm that detects missing relations in the ontology. Here, we analyse a lexical regularity of SNOMED CT, congenital stenosis, which has been reported as problematic by the SNOMED CT maintenance team.
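A sketch of the missing-relation check under the stated assumption: if some class is labelled with the LR itself (for example, congenital stenosis), every other class whose label contains the LR but is not subsumed by that class is flagged as a candidate missing subclass axiom. The class identifiers and is-a graph are hypothetical, and this subsumption rule is a simplification, not the published algorithm.

```python
def ancestors(cls, parents):
    """All transitive superclasses of cls, given an is-a graph as child -> list of parents."""
    seen, stack = set(), list(parents.get(cls, ()))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents.get(p, ()))
    return seen


def suggest_missing_subclass_axioms(lr, labels, parents):
    """Return (class, anchor) pairs: label contains lr, but class is not under the lr class."""
    anchors = [c for c, lab in labels.items() if lab.lower() == lr]
    if not anchors:
        return []
    anchor = anchors[0]
    return sorted(
        (c, anchor) for c, lab in labels.items()
        if c != anchor and lr in lab.lower() and anchor not in ancestors(c, parents)
    )
```

Each returned pair is only a suggestion for a curator to review; a shared phrase does not guarantee that the subsumption holds.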
Affiliation(s)
- Daniel Karlsson
- Department of Biomedical Engineering, Linköping University, Sweden
9
Quesada-Martínez M, Fernández-Breis J, Stevens R. Lexical Characterisation of Bio-Ontologies by the Inspection of Regularities in Labels. Curr Bioinform 2015. [DOI: 10.2174/157489361002150518124739]
10
Quesada-Martínez M, Fernández-Breis JT, Stevens R, Mikroyannidi E. Prioritising lexical patterns to increase axiomatisation in biomedical ontologies. The role of localisation and modularity. Methods Inf Med 2014; 54:56-64. [PMID: 24993110 DOI: 10.3414/me13-02-0026] [Received: 06/17/2013] [Accepted: 05/07/2014]
Abstract
INTRODUCTION This article is part of the Focus Theme of METHODS of Information in Medicine on "Managing Interoperability and Complexity in Health Systems". OBJECTIVES In previous work, we defined methods for extracting lexical patterns from labels as an initial step towards semi-automatic ontology enrichment methods. Our previous findings revealed that many biomedical ontologies could benefit from enrichment methods using lexical patterns as a starting point. Here, we aim to identify which lexical patterns are appropriate for ontology enrichment, using metrics to prioritise the patterns. METHODS We propose metrics for suggesting which lexical regularities should be the starting point for enriching complex ontologies. Our method determines the relevance of a lexical pattern by measuring its locality in the ontology, that is, the distance between the classes associated with the pattern, and the distribution of the pattern in a certain module of the ontology. The methods have been applied to four significant biomedical ontologies, including the Gene Ontology and SNOMED CT. RESULTS The metrics provide information about the engineering of the ontologies and the relevance of the patterns. Our method enables the suggestion of links between classes that are not made explicit in the ontology. We propose a prioritisation of the lexical patterns found in the analysed ontologies. CONCLUSIONS The locality and distribution of lexical patterns offer insights into the further engineering of the ontology. Developers can use this information to improve the axiomatisation of their ontologies.
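Locality can be approximated, as an assumption, by the mean shortest-path distance between the classes sharing a pattern, treating the is-a hierarchy as an undirected graph; lower values mean the pattern is more localised. The edge list and class names are hypothetical, and the paper's exact distance definition may differ.

```python
from collections import deque
from itertools import combinations


def build_adjacency(is_a_edges):
    """Undirected adjacency sets from (child, parent) is-a edges."""
    adj = {}
    for child, parent in is_a_edges:
        adj.setdefault(child, set()).add(parent)
        adj.setdefault(parent, set()).add(child)
    return adj


def distance(a, b, adj):
    """Breadth-first shortest-path length between two classes, or None if unreachable."""
    dist, queue = {a: 0}, deque([a])
    while queue:
        node = queue.popleft()
        if node == b:
            return dist[node]
        for nxt in adj.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return None


def locality(pattern_classes, is_a_edges):
    """Mean pairwise taxonomic distance between the classes that share a lexical pattern."""
    adj = build_adjacency(is_a_edges)
    dists = [distance(a, b, adj) for a, b in combinations(sorted(pattern_classes), 2)]
    dists = [d for d in dists if d is not None]
    return sum(dists) / len(dists) if dists else 0.0
```

Under this reading, a pattern whose classes sit in distant branches would rank as a stronger hint of a missing link than one whose classes are already siblings.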
Affiliation(s)
- M Quesada-Martínez
- Manuel Quesada-Martínez, Universidad de Murcia, Departamento de Informática y Sistemas, Facultad de Informática, Campus de Espinardo, 30100 Murcia, Spain, E-mail: