1
|
Liu G, Jin C, Shi L, Yang C, Shuai J, Ying J. Enhancing Cross-Lingual Entity Alignment in Knowledge Graphs through Structure Similarity Rearrangement. SENSORS (BASEL, SWITZERLAND) 2023; 23:7096. [PMID: 37631633 PMCID: PMC10459157 DOI: 10.3390/s23167096] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2023] [Revised: 07/22/2023] [Accepted: 08/07/2023] [Indexed: 08/27/2023]
Abstract
Cross-lingual entity alignment in knowledge graphs is a crucial task in knowledge fusion. This task involves learning low-dimensional embeddings for nodes in different knowledge graphs and identifying equivalent entities across them by measuring the distances between their representation vectors. Existing alignment models use neural network modules and the nearest neighbors algorithm to find suitable entity pairs. However, these models often ignore the importance of local structural features of entities during the alignment stage, which may lead to reduced matching accuracy. Specifically, nodes that are poorly represented may not benefit from their surrounding context. In this article, we propose a novel alignment model called SSR, which leverages the node embedding algorithm in graphs to select candidate entities and then rearranges them by local structural similarity in the source and target knowledge graphs. Our approach improves the performance of existing approaches and is compatible with them. We demonstrate the effectiveness of our approach on the DBP15k dataset, showing that it outperforms existing methods while requiring less time.
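The two-stage idea in this abstract (select candidates by embedding distance, then reorder them by local structure) can be illustrated with a minimal sketch. This is not the authors' SSR implementation: the cosine ranking, the Jaccard overlap of neighbours mapped through a seed alignment, and all entity names and vectors below are assumptions chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_candidates(src_vec, tgt_embeddings, k):
    """Stage 1: rank target entities by embedding similarity and keep k."""
    ranked = sorted(
        tgt_embeddings,
        key=lambda t: cosine(src_vec, tgt_embeddings[t]),
        reverse=True,
    )
    return ranked[:k]


def structural_similarity(src, tgt, src_neighbors, tgt_neighbors, seed_alignment):
    """Jaccard overlap between the target's neighbours and the source's
    neighbours after mapping them through already-known aligned pairs."""
    mapped = {
        seed_alignment[n]
        for n in src_neighbors.get(src, ())
        if n in seed_alignment
    }
    tgt_set = set(tgt_neighbors.get(tgt, ()))
    union = mapped | tgt_set
    return len(mapped & tgt_set) / len(union) if union else 0.0


def rerank(src, candidates, src_neighbors, tgt_neighbors, seed_alignment):
    """Stage 2: reorder candidates by structural similarity.
    sorted() is stable, so ties keep their embedding-based order."""
    return sorted(
        candidates,
        key=lambda t: structural_similarity(
            src, t, src_neighbors, tgt_neighbors, seed_alignment
        ),
        reverse=True,
    )


# Hypothetical toy data: the embedding alone prefers the wrong entity.
src_embeddings = {"Paris_fr": [1.0, 0.0]}
tgt_embeddings = {
    "Paris_en": [0.8, 0.6],
    "Paris_Texas_en": [0.9, 0.44],
    "Berlin_en": [0.0, 1.0],
}
src_neighbors = {"Paris_fr": ["France_fr", "Seine_fr"]}
tgt_neighbors = {
    "Paris_en": ["France_en", "Seine_en"],
    "Paris_Texas_en": ["Texas_en"],
}
seed_alignment = {"France_fr": "France_en", "Seine_fr": "Seine_en"}

candidates = top_k_candidates(src_embeddings["Paris_fr"], tgt_embeddings, k=2)
reranked = rerank("Paris_fr", candidates, src_neighbors, tgt_neighbors, seed_alignment)
print(candidates)
print(reranked)
```

In this toy case the embedding ranks the wrong entity first, and the neighbourhood overlap moves the correct one to the top, which is the kind of correction for poorly represented nodes that the abstract describes.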
Affiliation(s)
- Guiyang Liu
  - School of Computer and Computing Science, Hangzhou City University, Hangzhou 310015, China
  - College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
- Canghong Jin
  - School of Computer and Computing Science, Hangzhou City University, Hangzhou 310015, China
- Longxiang Shi
  - School of Computer and Computing Science, Hangzhou City University, Hangzhou 310015, China
- Cheng Yang
  - School of Computer and Computing Science, Hangzhou City University, Hangzhou 310015, China
- Jiangbing Shuai
  - Zhejiang Academy of Science & Technology for Inspection & Quarantine, Hangzhou 310051, China
- Jing Ying
  - College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
|
2
|
Zhou X, Lv Q, Geng A. Matching heterogeneous ontologies based on multi-strategy adaptive co-firefly algorithm. Knowl Inf Syst 2023. [DOI: 10.1007/s10115-023-01845-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/06/2023]
|
3
|
Abstract
Community-developed minimum information checklists are designed to drive the rich and consistent reporting of metadata, underpinning the reproducibility and reuse of the data. These reporting guidelines, however, are usually in the form of narratives intended for human consumption. Modular and reusable machine-readable versions are also needed, for two reasons: first, to provide the necessary quantitative and verifiable measures of the degree to which the metadata descriptors meet these community requirements, which is itself a requirement of the FAIR Principles; second, to encourage the creation of standards-driven templates for metadata authoring, especially when describing complex experiments that require multiple reporting guidelines to be used in combination or extended. We present new functionalities to support the creation and improvement of machine-readable models. We apply the approach to an exemplar set of reporting guidelines in Life Science and discuss the challenges. Our work, targeted at developers of standards and those familiar with standards, promotes the concept of compositional metadata elements and encourages the creation of community standards that are modular and interoperable from the outset.
|
4
|
van Damme P, Fernández-Breis JT, Benis N, Miñarro-Gimenez JA, de Keizer NF, Cornet R. Performance assessment of ontology matching systems for FAIR data. J Biomed Semantics 2022; 13:19. [PMID: 35841031 PMCID: PMC9284868 DOI: 10.1186/s13326-022-00273-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2021] [Accepted: 06/15/2022] [Indexed: 11/24/2022] Open
Abstract
Background: Ontology matching should contribute to the interoperability aspect of FAIR data (Findable, Accessible, Interoperable, and Reusable). Multiple data sources can use different ontologies for annotating their data, thus creating the need for dynamic ontology matching services. In this experimental study, we assessed the performance of ontology matching systems in the context of a real-life application from the rare disease domain. Additionally, we present a method for analyzing top-level classes to improve precision. Results: We included three ontologies (NCIt, SNOMED CT, ORDO) and three matching systems (AgreementMakerLight 2.0, FCA-Map, LogMap 2.0). We evaluated the performance of the matching systems against reference alignments from BioPortal and the Unified Medical Language System Metathesaurus (UMLS). Then, we analyzed the top-level ancestors of matched classes, to detect incorrect mappings without consulting a reference alignment. To detect such incorrect mappings, we manually matched semantically equivalent top-level classes of ontology pairs. AgreementMakerLight 2.0, FCA-Map, and LogMap 2.0 had F1-scores of 0.55, 0.46, 0.55 for BioPortal and 0.66, 0.53, 0.58 for the UMLS respectively. Using vote-based consensus alignments increased performance across the board. Evaluation with manually created top-level hierarchy mappings revealed that on average 90% of the mappings’ classes belonged to top-level classes that matched. Conclusions: Our findings show that the included ontology matching systems automatically produced mappings that were modestly accurate according to our evaluation. The hierarchical analysis of mappings seems promising when no reference alignments are available. All in all, the systems show potential to be implemented as part of an ontology matching service for querying FAIR data.
Future research should focus on developing methods for the evaluation of mappings used in such mapping services, leading to their implementation in a FAIR data ecosystem. Supplementary Information The online version contains supplementary material available at (10.1186/s13326-022-00273-5).
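The vote-based consensus mentioned in the results can be sketched as follows. This is a hedged illustration rather than the study's actual procedure: the voting threshold, the helper names, and the placeholder identifiers (`a:1`, `b:1`, and so on) are all assumptions, and the toy alignments are not data from the paper.

```python
from collections import Counter


def consensus(alignments, min_votes):
    """Keep every mapping proposed by at least min_votes systems."""
    votes = Counter(pair for alignment in alignments for pair in set(alignment))
    return {pair for pair, count in votes.items() if count >= min_votes}


def precision_recall_f1(predicted, reference):
    """Standard precision, recall and F1 of a mapping set against a reference."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical outputs of three matching systems (placeholder identifiers).
system_a = {("a:1", "b:1"), ("a:2", "b:2"), ("a:3", "b:9")}
system_b = {("a:1", "b:1"), ("a:2", "b:2")}
system_c = {("a:1", "b:1"), ("a:4", "b:4")}
reference = {("a:1", "b:1"), ("a:2", "b:2"), ("a:4", "b:4")}

voted = consensus([system_a, system_b, system_c], min_votes=2)
precision, recall, f1 = precision_recall_f1(voted, reference)
print(sorted(voted), precision, recall, f1)
```

Raising `min_votes` generally trades recall for precision, so the threshold is a design choice that depends on how costly a wrong mapping is in the target application.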
Affiliation(s)
- Philip van Damme
  - Amsterdam UMC location University of Amsterdam, Department of Medical Informatics, Meibergdreef 9, Amsterdam, The Netherlands
  - Amsterdam Public Health, Digital Health & Methodology, Amsterdam, The Netherlands
- Nirupama Benis
  - Amsterdam UMC location University of Amsterdam, Department of Medical Informatics, Meibergdreef 9, Amsterdam, The Netherlands
  - Amsterdam Public Health, Digital Health & Methodology, Amsterdam, The Netherlands
- Nicolette F de Keizer
  - Amsterdam UMC location University of Amsterdam, Department of Medical Informatics, Meibergdreef 9, Amsterdam, The Netherlands
  - Amsterdam Public Health, Methodology & Quality of Care, Amsterdam, The Netherlands
- Ronald Cornet
  - Amsterdam UMC location University of Amsterdam, Department of Medical Informatics, Meibergdreef 9, Amsterdam, The Netherlands
  - Amsterdam Public Health, Digital Health & Methodology, Amsterdam, The Netherlands
|
5
|
Matentzoglu N, Balhoff JP, Bello SM, Bizon C, Brush M, Callahan TJ, Chute CG, Duncan WD, Evelo CT, Gabriel D, Graybeal J, Gray A, Gyori BM, Haendel M, Harmse H, Harris NL, Harrow I, Hegde HB, Hoyt AL, Hoyt CT, Jiao D, Jiménez-Ruiz E, Jupp S, Kim H, Koehler S, Liener T, Long Q, Malone J, McLaughlin JA, McMurry JA, Moxon S, Munoz-Torres MC, Osumi-Sutherland D, Overton JA, Peters B, Putman T, Queralt-Rosinach N, Shefchek K, Solbrig H, Thessen A, Tudorache T, Vasilevsky N, Wagner AH, Mungall CJ. A Simple Standard for Sharing Ontological Mappings (SSSOM). Database (Oxford) 2022; 2022:baac035. [PMID: 35616100 PMCID: PMC9216545 DOI: 10.1093/database/baac035] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 12/15/2021] [Revised: 03/08/2022] [Accepted: 05/11/2022] [Indexed: 02/03/2023]
Abstract
Despite progress in the development of standards for describing and exchanging scientific information, the lack of easy-to-use standards for mapping between different representations of the same or similar objects in different databases poses a major impediment to data integration and interoperability. Mappings often lack the metadata needed to be correctly interpreted and applied. For example, are two terms equivalent or merely related? Are they narrow or broad matches? Or are they associated in some other way? Such relationships between the mapped terms are often not documented, which leads to incorrect assumptions and makes them hard to use in scenarios that require a high degree of precision (such as diagnostics or risk prediction). Furthermore, the lack of descriptions of how mappings were done makes it hard to combine and reconcile mappings, particularly curated and automated ones. We have developed the Simple Standard for Sharing Ontological Mappings (SSSOM) which addresses these problems by: (i) Introducing a machine-readable and extensible vocabulary to describe metadata that makes imprecision, inaccuracy and incompleteness in mappings explicit. (ii) Defining an easy-to-use simple table-based format that can be integrated into existing data science pipelines without the need to parse or query ontologies, and that integrates seamlessly with Linked Data principles. (iii) Implementing open and community-driven collaborative workflows that are designed to evolve the standard continuously to address changing requirements and mapping practices. (iv) Providing reference tools and software libraries for working with the standard. In this paper, we present the SSSOM standard, describe several use cases in detail and survey some of the existing work on standardizing the exchange of mappings, with the goal of making mappings Findable, Accessible, Interoperable and Reusable (FAIR). The SSSOM specification can be found at http://w3id.org/sssom/spec. 
Database URL: http://w3id.org/sssom/spec.
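To show why a table-based mapping format fits ordinary data pipelines, here is a minimal sketch that reads a small SSSOM-style TSV with only the Python standard library. The column names `subject_id`, `predicate_id`, `object_id` and `confidence`, and the `#`-prefixed metadata header, reflect my understanding of the SSSOM format and should be checked against the specification; real files carry more metadata than shown. The `EXA`/`EXB` identifiers, the example.org prefixes and the helper names are hypothetical.

```python
import csv

# A tiny SSSOM-style table. Lines starting with "#" hold metadata; the
# identifiers and prefixes are made up for illustration.
SSSOM_TSV = (
    "#curie_map:\n"
    "#  EXA: http://example.org/a/\n"
    "#  EXB: http://example.org/b/\n"
    "#  skos: http://www.w3.org/2004/02/skos/core#\n"
    "subject_id\tpredicate_id\tobject_id\tconfidence\n"
    "EXA:001\tskos:exactMatch\tEXB:101\t0.95\n"
    "EXA:002\tskos:broadMatch\tEXB:102\t0.80\n"
    "EXA:003\tskos:exactMatch\tEXB:103\t0.60\n"
)


def read_mappings(text):
    """Parse the tab-separated rows, skipping the commented metadata header."""
    data_lines = [
        line for line in text.splitlines() if line and not line.startswith("#")
    ]
    return list(csv.DictReader(data_lines, delimiter="\t"))


def high_confidence_exact(rows, threshold):
    """Keep only exact matches whose confidence meets the threshold."""
    return [
        (row["subject_id"], row["object_id"])
        for row in rows
        if row["predicate_id"] == "skos:exactMatch"
        and float(row["confidence"]) >= threshold
    ]


rows = read_mappings(SSSOM_TSV)
exact_pairs = high_confidence_exact(rows, threshold=0.9)
print(exact_pairs)
```

Because the mapping predicate is an explicit column, a consumer can tell an exact match from a broad one without parsing either ontology, which addresses the "equivalent or merely related?" question raised in the abstract.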
Affiliation(s)
- James P Balhoff
  - RENCI, University of North Carolina, Chapel Hill, NC 27517, USA
- Chris Bizon
  - RENCI, University of North Carolina, Chapel Hill, NC 27517, USA
- Matthew Brush
  - University of Colorado Anschutz Medical Campus, Aurora, CO 80217, USA
- Chris T Evelo
  - Maastricht University, Maastricht 6211 LK, The Netherlands
- Alasdair Gray
  - Department of Computer Science, Heriot-Watt University, Edinburgh, Currie EH14 4AS, UK
- Melissa Haendel
  - University of Colorado Anschutz Medical Campus, Aurora, CO 80217, USA
- Henriette Harmse
  - European Bioinformatics Institute (EMBL-EBI), Hinxton CB10 1SD, UK
- Nomi L Harris
  - Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Harshad B Hegde
  - Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Amelia L Hoyt
  - Beth Israel Deaconess Medical Center, Boston, MA 02215, USA
- Dazhi Jiao
  - Johns Hopkins University, Baltimore, MD 21210, USA
- Ernesto Jiménez-Ruiz
  - City University of London, London EC1V 0HB, UK
  - University of Oslo, Oslo 0315, Norway
- Simon Jupp
  - SciBite Limited, Bio Data Innovation Centre, Wellcome Genome Campus, Hinxton, Saffron Walden CB10 1DR, UK
- Qinqin Long
  - Leiden University Medical Center, Leiden 2333 ZA, The Netherlands
- James Malone
  - BenchSci, 25 York St Suite 1100, Toronto, ON M5J 2V5, Canada
- Julie A McMurry
  - University of Colorado Anschutz Medical Campus, Aurora, CO 80217, USA
- Sierra Moxon
  - Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Bjoern Peters
  - La Jolla Institute for Immunology, 9420 Athena Circle, La Jolla, CA 92037, USA
- Tim Putman
  - University of Colorado Anschutz Medical Campus, Aurora, CO 80217, USA
- Kent Shefchek
  - University of Colorado Anschutz Medical Campus, Aurora, CO 80217, USA
- Anne Thessen
  - University of Colorado Anschutz Medical Campus, Aurora, CO 80217, USA
- Nicole Vasilevsky
  - University of Colorado Anschutz Medical Campus, Aurora, CO 80217, USA
- Alex H Wagner
  - The Steve and Cindy Rasmussen Institute for Genomic Medicine, Nationwide Children’s Hospital, Columbus, OH 43205, USA
  - The Ohio State University College of Medicine, Columbus, OH 43210, USA
|
6
|
Nguyen V, Yip HY, Bajaj G, Wijesiriwardene T, Javangula V, Parthasarathy S, Sheth A, Bodenreider O. Context-Enriched Learning Models for Aligning Biomedical Vocabularies at Scale in the UMLS Metathesaurus. PROCEEDINGS OF THE ... INTERNATIONAL WORLD-WIDE WEB CONFERENCE. INTERNATIONAL WWW CONFERENCE 2022; 2022:1037-1046. [PMID: 36108322 PMCID: PMC9455675 DOI: 10.1145/3485447.3511946] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/15/2023]
Abstract
The Unified Medical Language System (UMLS) Metathesaurus construction process mainly relies on lexical algorithms and manual expert curation for integrating over 200 biomedical vocabularies. A lexical-based learning model (LexLM) was developed to predict synonymy among Metathesaurus terms and largely outperforms a rule-based approach (RBA) that approximates the current construction process. However, the LexLM has the potential for being improved further because it only uses lexical information from the source vocabularies, while the RBA also takes advantage of contextual information. We investigate the role of multiple types of contextual information available to the UMLS editors, namely source synonymy (SS), source semantic group (SG), and source hierarchical relations (HR), for the UMLS vocabulary alignment (UVA) problem. In this paper, we develop multiple variants of context-enriched learning models (ConLMs) by adding to the LexLM the types of contextual information listed above. We represent these context types in context-enriched knowledge graphs (ConKGs) with four variants ConSS, ConSG, ConHR, and ConAll. We train these ConKG embeddings using seven KG embedding techniques. We create the ConLMs by concatenating the ConKG embedding vectors with the word embedding vectors from the LexLM. We evaluate the performance of the ConLMs using the UVA generalization test datasets with hundreds of millions of pairs. Our extensive experiments show a significant performance improvement from the ConLMs over the LexLM, namely +5.0% in precision (93.75%), +0.69% in recall (93.23%), +2.88% in F1 (93.49%) for the best ConLM. Our experiments also show that the ConAll variant including the three context types takes more time, but does not always perform better than other variants with a single context type. 
Finally, our experiments show that the pairs of terms with high lexical similarity benefit most from adding contextual information, namely +6.56% in precision (94.97%), +2.13% in recall (93.23%), +4.35% in F1 (94.09%) for the best ConLM. The pairs with lower degrees of lexical similarity also show performance improvement with +0.85% in F1 (96%) for low similarity and +1.31% in F1 (96.34%) for no similarity. These results demonstrate the importance of using contextual information in the UVA problem.
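The precision, recall and F1 figures quoted for the best ConLM can be cross-checked with the usual harmonic-mean definition of F1. The short sketch below does only that arithmetic, using the percentages from the abstract; the function name is illustrative and nothing here reproduces the models themselves.

```python
def f1_from_percentages(precision_pct, recall_pct):
    """F1 as the harmonic mean of precision and recall (all in percent)."""
    return 2 * precision_pct * recall_pct / (precision_pct + recall_pct)


# Overall result for the best ConLM: precision 93.75%, recall 93.23%.
overall_f1 = round(f1_from_percentages(93.75, 93.23), 2)

# Pairs with high lexical similarity: precision 94.97%, recall 93.23%.
high_similarity_f1 = round(f1_from_percentages(94.97, 93.23), 2)

print(overall_f1, high_similarity_f1)
```

Both values agree with the F1 scores reported in the abstract (93.49% and 94.09%), so the three metrics in each group are mutually consistent.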
Affiliation(s)
- Vinh Nguyen
  - National Library of Medicine, Bethesda, Maryland, USA
- Hong Yung Yip
  - University of South Carolina, Columbia, South Carolina, USA
- Amit Sheth
  - University of South Carolina, Columbia, South Carolina, USA
|
7
|
McKenna L, Debruyne C, O’Sullivan D. Using linked data to create provenance-rich metadata interlinks: the design and evaluation of the NAISC-L interlinking framework for libraries, archives and museums. AI & SOCIETY 2022. [DOI: 10.1007/s00146-021-01373-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
Linked data (LD) have the capability to open up and share materials, held in libraries, archives and museums (LAMs), in ways that are restricted by many existing metadata standards. Specifically, LD interlinking can be used to enrich data and to improve data discoverability on the Web through interlinking related resources across datasets and institutions. However, there is currently a notable lack of interlinking across leading LD projects in LAMs, impacting upon the discoverability of their materials. This research describes the Novel Authoritative Interlinking for Semantic Web Cataloguing in Libraries (NAISC-L) interlinking framework. Unlike existing interlinking frameworks, NAISC-L was designed specifically with the requirements of the LAM domain in mind. The framework was evaluated by Information Professionals (IPs), including librarians, archivists and metadata cataloguers, via three user-experiments including a think-aloud test, an online interlink creation test and a field test in a music archive. Across all experiments, participants achieved a high level of interlink accuracy, and usability measures indicated that IPs found NAISC-L to be useful and user-friendly. Overall, NAISC-L was shown to be an effective framework for engaging IPs in the process of LD interlinking, and for facilitating the creation of richer and more authoritative interlinks between LAM resources. NAISC-L supports the linking of related resources across datasets and institutions, thereby enabling richer and more varied search queries, and can thus be used to improve the discoverability of materials held in LAMs.
|
8
|
An Improved Structural-Based Ontology Matching Approach Using Similarity Spreading. INT J SEMANT WEB INF 2022. [DOI: 10.4018/ijswis.300825] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
The increasing number of ontologies demands interoperability between them in order to obtain accurate information, and ontology heterogeneity makes achieving interoperability even more difficult. These conditions call for effective and efficient ontology matching. Existing ontology matching systems mainly focus on subject derivatives of the domain concerned. Since ontologies are represented as data models in a structured format, this paper proposes a new modified model of similarity spreading for ontology mapping. In this approach, the mapping mainly involves node clustering based on edge affinity, after which graph matching is achieved by applying coefficient similarity propagation. The process is carried out iteratively, and at the end the similarity score is calculated for each iteration. The model is evaluated in terms of precision, recall and F-measure and is found to outperform comparable systems.
|
9
|
Multimatcher Model to Enhance Ontology Matching Using Background Knowledge. INFORMATION 2021. [DOI: 10.3390/info12110487] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022] Open
Abstract
Ontology matching is a rapidly emerging topic crucial for Semantic Web efforts, data integration, and interoperability. Semantic heterogeneity is one of the most challenging aspects of ontology matching. Consequently, background knowledge (BK) resources are utilized to bridge the semantic gap between the ontologies. Generic BK approaches use a single matcher to discover correspondences between entities from different ontologies. However, the Ontology Alignment Evaluation Initiative (OAEI) results show that not all matchers identify the same correct mappings. Moreover, none of the matchers can obtain good results across all matching tasks. This study proposes a novel BK multimatcher approach for improving ontology matching by effectively generating and combining mappings from biomedical ontologies. Aggregation strategies to create more effective mappings are discussed. Then, a matcher path confidence measure that helps select the most promising paths using the final mapping selection algorithm is proposed. The performance of the proposed model is tested using the Anatomy and Large Biomed tracks offered by the OAEI 2020. Results show that higher recall levels have been obtained. Moreover, the F-measure values achieved with our model are comparable with those obtained by the state-of-the-art matchers.
|
10
|
Wang P, Hu Y, Bai S, Zou S. Matching Biomedical Ontologies: Construction of Matching Clues and Systematic Evaluation of Different Combinations of Matchers. JMIR Med Inform 2021; 9:e28212. [PMID: 34420930 PMCID: PMC8414291 DOI: 10.2196/28212] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/26/2021] [Revised: 04/23/2021] [Accepted: 05/19/2021] [Indexed: 11/20/2022] Open
Abstract
Background: Ontology matching seeks to find semantic correspondences between ontologies. With an increasing number of biomedical ontologies being developed independently, matching these ontologies to solve the interoperability problem has become a critical task in biomedical applications. However, some challenges remain. First, extracting and constructing matching clues from biomedical ontologies is a nontrivial problem. Second, it is unknown whether there are dominant matchers while matching biomedical ontologies. Finally, ontology matching also suffers from computational complexity owing to the large-scale sizes of biomedical ontologies. Objective: To investigate the effectiveness of matching clues and composite match approaches, this paper presents a spectrum of matchers with different combination strategies and empirically studies their influence on matching biomedical ontologies. Besides, extended reduction anchors are introduced to effectively decrease the time complexity while matching large biomedical ontologies. Methods: In this paper, atomic and composite matching clues are first constructed in 4 dimensions: terminology, structure, external knowledge, and representation learning. Then, a spectrum of matchers based on a flexible combination of atomic clues is designed and utilized to comprehensively study the effectiveness. Besides, we carry out a systematic comparative evaluation of different combinations of matchers. Finally, an extended reduction anchor is proposed to significantly alleviate the time complexity for matching large-scale biomedical ontologies. Results: Experimental results show that considering distinguishable matching clues in biomedical ontologies leads to a substantial improvement in all available information. Besides, incorporating different types of matchers with reliability results in a marked improvement, which is comparable to the state-of-the-art methods.
The dominant matchers achieve F1 measures of 0.9271, 0.8218, and 0.5 on Anatomy, FMA-NCI (Foundation Model of Anatomy-National Cancer Institute), and FMA-SNOMED data sets, respectively. Extended reduction anchor is able to solve the scalability problem of matching large biomedical ontologies. It achieves a significant reduction in time complexity with little loss of F1 measure at the same time, with a 0.21% decrease on the Anatomy data set and 0.84% decrease on the FMA-NCI data set, but with a 2.65% increase on the FMA-SNOMED data set. Conclusions This paper systematically analyzes and compares the effectiveness of different matching clues, matchers, and combination strategies. Multiple empirical studies demonstrate that distinguishing clues have significant implications for matching biomedical ontologies. In contrast to the matchers with single clue, those combining multiple clues exhibit more stable and accurate performance. In addition, our results provide evidence that the approach based on extended reduction anchors performs well for large ontology matching tasks, demonstrating an effective solution for the problem.
Affiliation(s)
- Peng Wang
  - School of Computer Science and Engineering, Southeast University, Nanjing, China
  - School of Artificial Intelligence, Southeast University, Nanjing, China
- Yunyan Hu
  - School of Computer Science and Engineering, Southeast University, Nanjing, China
- Shaochen Bai
  - School of Artificial Intelligence, Southeast University, Nanjing, China
- Shiyi Zou
  - Southeast University - Monash University Joint Graduate School, Suzhou, China
|
11
|
Nguyen V, Yip HY, Bodenreider O. Biomedical Vocabulary Alignment at Scale in the UMLS Metathesaurus. PROCEEDINGS OF THE ... INTERNATIONAL WORLD-WIDE WEB CONFERENCE. INTERNATIONAL WWW CONFERENCE 2021; 2021:2672-2683. [PMID: 34514472 PMCID: PMC8434895 DOI: 10.1145/3442381.3450128] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 12/04/2022]
Abstract
With 214 source vocabularies, the construction and maintenance process of the UMLS (Unified Medical Language System) Metathesaurus terminology integration system is costly, time-consuming, and error-prone as it primarily relies on (1) lexical and semantic processing for suggesting groupings of synonymous terms, and (2) the expertise of UMLS editors for curating these synonymy predictions. This paper aims to improve the UMLS Metathesaurus construction process by developing a novel supervised learning approach for improving the task of suggesting synonymous pairs that can scale to the size and diversity of the UMLS source vocabularies. We evaluate this deep learning (DL) approach against a rule-based approach (RBA) that approximates the current UMLS Metathesaurus construction process. The key to the generalizability of our approach is the use of various degrees of lexical similarity in negative pairs during the training process. Our initial experiments demonstrate the strong performance across multiple datasets of our DL approach in terms of recall (91-92%), precision (88-99%), and F1 score (89-95%). Our DL approach largely outperforms the RBA method in recall (+23%), precision (+2.4%), and F1 score (+14.1%). This novel approach has great potential for improving the UMLS Metathesaurus construction process by providing better synonymy suggestions to the UMLS editors.
Affiliation(s)
- Vinh Nguyen
  - National Library of Medicine, Bethesda, Maryland, USA
- Hong Yung Yip
  - University of South Carolina, Columbia, South Carolina, USA
|
12
|
|
13
|
Cabau-Laporta J, Ascensión AM, Arrospide-Elgarresta M, Gerovska D, Araúzo-Bravo MJ. FOntCell: Fusion of Ontologies of Cells. Front Cell Dev Biol 2021; 9:562908. [PMID: 33644039 PMCID: PMC7905052 DOI: 10.3389/fcell.2021.562908] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2020] [Accepted: 01/05/2021] [Indexed: 11/25/2022] Open
Abstract
High-throughput cell-data technologies such as single-cell RNA-seq create a demand for algorithms for automatic cell classification and characterization. There exist several cell classification ontologies with complementary information. However, one needs to merge them to synergistically combine their information. The main difficulty in merging is to match the ontologies since they use different naming conventions. Therefore, we developed an algorithm that merges ontologies by integrating the name matching between class label names with the structure mapping between the ontology elements based on graph convolution. Since the structure mapping is a time consuming process, we designed two methods to perform the graph convolution: vectorial structure matching and constraint-based structure matching. To perform the vectorial structure matching, we designed a general method to calculate the similarities between vectors of different lengths for different metrics. Additionally, we adapted the slower Blondel method to work for structure matching. We implemented our algorithms into FOntCell, a software module in Python for efficient automatic parallel-computed merging/fusion of ontologies in the same or similar knowledge domains. FOntCell can unify dispersed knowledge from one domain into a unique ontology in OWL format and iteratively reuse it to continuously adapt ontologies with new data endlessly produced by data-driven classification methods, such as of the Human Cell Atlas. To navigate easily across the merged ontologies, it generates HTML files with tabulated and graphic summaries, and interactive circular Directed Acyclic Graphs. We used FOntCell to merge the CELDA, LifeMap and LungMAP Human Anatomy cell ontologies into a comprehensive cell ontology. 
We compared FOntCell with tools used for the mouse and human anatomy ontology alignment task proposed by the Ontology Alignment Evaluation Initiative (OAEI) and found that the Fβ alignment accuracies of FOntCell are above the geometric mean of the other tools; more importantly, it significantly outperforms the best OAEI tools in cell ontology alignment in terms of Fβ alignment accuracies.
Affiliation(s)
- Javier Cabau-Laporta
  - Computational Biology and Systems Biomedicine Group, Biodonostia Health Research Institute, San Sebastián, Spain
- Alex M Ascensión
  - Computational Biology and Systems Biomedicine Group, Biodonostia Health Research Institute, San Sebastián, Spain
- Mikel Arrospide-Elgarresta
  - Computational Biology and Systems Biomedicine Group, Biodonostia Health Research Institute, San Sebastián, Spain
- Daniela Gerovska
  - Computational Biology and Systems Biomedicine Group, Biodonostia Health Research Institute, San Sebastián, Spain
  - Computational Biomedicine Data Analysis Platform, Biodonostia Health Research Institute, San Sebastián, Spain
- Marcos J Araúzo-Bravo
  - Computational Biology and Systems Biomedicine Group, Biodonostia Health Research Institute, San Sebastián, Spain
  - Computational Biomedicine Data Analysis Platform, Biodonostia Health Research Institute, San Sebastián, Spain
  - Basque Foundation for Science (IKERBASQUE), Bilbao, Spain
  - Centro de Investigación Biomédica en Red (CIBER) of Frailty and Healthy Aging (CIBERfes), Madrid, Spain
  - TransBioNet Thematic Network of Excellence for Transitional Bioinformatics, Barcelona Supercomputing Center, Barcelona, Spain
  - Computational Biology and Bioinformatics, Department Cell and Developmental Biology Max Planck Institute for Molecular Biomedicine, Münster, Germany
|
14
|
SANOM-HOBBIT: simulated annealing-based ontology matching on HOBBIT platform. KNOWL ENG REV 2020. [DOI: 10.1017/s026988892000017x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]
Abstract
Ontology alignment is an important and inescapable problem when interconnecting two ontologies that describe the same concepts. The Ontology Alignment Evaluation Initiative (OAEI) has taken place for more than a decade to monitor and support progress in the field and to systematically compare existing alignment systems. As of 2018, the evaluation of systems has partly transitioned to the HOBBIT platform. This paper describes our alignment system, simulated annealing-based ontology matching (SANOM), and its adaptation to the HOBBIT platform. The outcomes of SANOM on HOBBIT for several OAEI tracks are reported, and the results are compared with those of competing systems in the corresponding tracks.
|
15
|
SemTab 2019: Resources to Benchmark Tabular Data to Knowledge Graph Matching Systems. THE SEMANTIC WEB 2020. [PMCID: PMC7250611 DOI: 10.1007/978-3-030-49461-2_30] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/31/2022]
Abstract
Tabular data to Knowledge Graph matching is the process of assigning semantic tags from knowledge graphs (e.g., Wikidata or DBpedia) to the elements of a table. This task is challenging for various reasons, including the lack of metadata (e.g., table and column names) and the noisiness, heterogeneity, incompleteness and ambiguity of the data. The results of this task provide significant insights about potentially highly valuable tabular data, as recent works have shown, enabling a new family of data analytics and data science applications. Despite a significant amount of work on various flavors of this problem, there is no common framework for conducting a systematic evaluation of state-of-the-art systems. The Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab) was created to fill this gap. In this paper, we report on the datasets, infrastructure and lessons learned from the first edition of the SemTab challenge.
|
16
|
The Knowledge Graph Track at OAEI. THE SEMANTIC WEB 2020. [PMCID: PMC7250608 DOI: 10.1007/978-3-030-49461-2_20] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
The Ontology Alignment Evaluation Initiative (OAEI) is an annual evaluation of ontology matching tools. In 2018, we started the Knowledge Graph track, whose goal is to evaluate the simultaneous matching of entities and schemas of large-scale knowledge graphs. In this paper, we discuss the design of the track and two different strategies of gold standard creation. We analyze results and experiences obtained in the first editions of the track and, by revealing a hidden task, we show that all tools submitted to the track (and probably also to other tracks) suffer from a bias which we name the golden hammer bias.
|
17
|
Harth A, Kirrane S, Ngonga Ngomo AC, Paulheim H, Rula A, Gentile AL, Haase P, Cochez M. Detecting Synonymous Properties by Shared Data-Driven Definitions. THE SEMANTIC WEB 2020. [PMCID: PMC7250622 DOI: 10.1007/978-3-030-49461-2_21] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Abstract
Knowledge graphs have become an essential source of entity-centric information for modern applications. Today’s KGs have reached a size of billions of RDF triples extracted from a variety of sources, including structured sources and text. While this definitely improves completeness, the inherent variety of sources leads to severe heterogeneity, negatively affecting data quality by introducing duplicate information. We present a novel technique for detecting synonymous properties in large knowledge graphs by mining interpretable definitions of properties using association rule mining. Relying on such shared definitions, our technique is able to mine even synonym rules that have only little support in the data. In particular, our extensive experiments on DBpedia and Wikidata show that our rule-based approach can outperform state-of-the-art knowledge graph embedding techniques, while offering good interpretability through shared logical rules.
Affiliation(s)
- Andreas Harth
- University of Erlangen-Nuremberg, Nuremberg, Germany
| | - Sabrina Kirrane
- Vienna University of Economics and Business, Vienna, Austria
| | | | | | - Anisa Rula
- University of Milano-Bicocca, Milan, Italy
| | | | | | - Michael Cochez
- Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
|
18
|
Abstract
User validation is one of the challenges facing the ontology alignment community, as there are limits to the quality of the alignments produced by automated alignment algorithms. In this paper, we present a broad study on user validation of ontology alignments that encompasses three distinct but inter-related aspects: the profile of the user, the services of the alignment system, and its user interface. We discuss key issues pertaining to the alignment validation process under each of these aspects and provide an overview of how current systems address them. Finally, we use experiments from the Interactive Matching track of the Ontology Alignment Evaluation Initiative 2015–2018 to assess the impact of errors in alignment validation, and how systems cope with them as a function of their services.
|
19
|
Diversicon: Pluggable Lexical Domain Knowledge. JOURNAL ON DATA SEMANTICS 2019. [DOI: 10.1007/s13740-019-00107-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
|
20
|
A Framework for Efficient Matching of Large-Scale Metadata Models. ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING 2019. [DOI: 10.1007/s13369-018-3443-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/28/2022]
|
21
|
Alghamdi SM, Sundberg BA, Sundberg JP, Schofield PN, Hoehndorf R. Quantitative evaluation of ontology design patterns for combining pathology and anatomy ontologies. Sci Rep 2019; 9:4025. [PMID: 30858527 PMCID: PMC6411989 DOI: 10.1038/s41598-019-40368-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 07/31/2018] [Accepted: 02/14/2019] [Indexed: 12/28/2022]
Abstract
Data are increasingly annotated with multiple ontologies to capture rich information about the features of the subject under investigation. Analysis may be performed over each ontology separately, but recently there has been a move to combine multiple ontologies to provide more powerful analytical possibilities. However, it is often not clear how to combine ontologies or how to assess or evaluate the potential design patterns available. Here we use a large and well-characterized dataset of anatomic pathology descriptions from a major study of aging mice. We show how different design patterns based on the MPATH and MA ontologies provide orthogonal axes of analysis, and perform differently in over-representation and semantic similarity applications. We discuss how such a data-driven approach might be used generally to generate and evaluate ontology design patterns.
Affiliation(s)
- Sarah M Alghamdi
- King Abdullah University of Science and Technology, Computer, Electrical & Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, Thuwal, 23955-6900, Saudi Arabia
- King Abdul-Aziz University, Faculty of Computing and Information Technology, Rabigh, 25732, Saudi Arabia
| | - Beth A Sundberg
- The Jackson Laboratory, 600, Main Street, Bar Harbor, ME, 04609, USA
| | - John P Sundberg
- The Jackson Laboratory, 600, Main Street, Bar Harbor, ME, 04609, USA
| | - Paul N Schofield
- The Jackson Laboratory, 600, Main Street, Bar Harbor, ME, 04609, USA.
- Department of Physiology, Development & Neuroscience, University of Cambridge, Downing Street, Cambridge, CB2 3EG, UK.
| | - Robert Hoehndorf
- King Abdullah University of Science and Technology, Computer, Electrical & Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, Thuwal, 23955-6900, Saudi Arabia.
|
22
|
Aligning Biomedical Metadata with Ontologies Using Clustering and Embeddings. THE SEMANTIC WEB 2019. [DOI: 10.1007/978-3-030-21348-0_10] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 12/04/2022]
|
23
|
|
24
|
|
25
|
Li W, Zhang S, Qi G. A graph-based approach for resolving incoherent ontology mappings. WEB INTELLIGENCE 2018. [DOI: 10.3233/web-180371] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 11/15/2022]
Affiliation(s)
- Weizhuo Li
- MADIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Songmao Zhang
- MADIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
| | - Guilin Qi
- School of Computer Science and Engineering, Southeast University, Nanjing, China
|
26
|
Dragisic Z, Ivanova V, Li H, Lambrix P. Experiences from the anatomy track in the ontology alignment evaluation initiative. J Biomed Semantics 2017; 8:56. [PMID: 29202830 PMCID: PMC5715990 DOI: 10.1186/s13326-017-0166-5] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 03/04/2017] [Accepted: 10/27/2017] [Indexed: 11/10/2022]
Abstract
BACKGROUND: One of the longest running tracks in the Ontology Alignment Evaluation Initiative is the Anatomy track which focuses on aligning two anatomy ontologies. The Anatomy track was started in 2005. In 2005 and 2006 the task in this track was to align the Foundational Model of Anatomy and the OpenGalen Anatomy Model. Since 2007 the ontologies used in the track are the Adult Mouse Anatomy and a part of the NCI Thesaurus. Since 2015 the data in the Anatomy track is also used in the Interactive track of the Ontology Alignment Evaluation Initiative.
RESULTS: In this paper we focus on the Anatomy track in the years 2007-2016 and the Anatomy part of the Interactive track in 2015-2016. We describe the data set and the changes it went through during the years as well as the challenges it poses for ontology alignment systems. Further, we give an overview of all systems that participated in the track and the techniques they have used. We discuss the performance results of the systems and summarize the general trends.
CONCLUSIONS: About 50 systems have participated in the Anatomy track. Many different techniques were used. The most popular matching techniques are string-based strategies and structure-based techniques. Many systems also use auxiliary information. The quality of the alignment has increased for the best performing systems since the beginning of the track and more and more systems check the coherence of the proposed alignment and implement a repair strategy. Further, interacting with an oracle is beneficial.
Affiliation(s)
- Zlatan Dragisic
- Department of Computer and Information Science and Swedish e-Science Research Centre, Linköping University, Linköping, Sweden
| | - Valentina Ivanova
- Department of Computer and Information Science and Swedish e-Science Research Centre, Linköping University, Linköping, Sweden
| | - Huanyu Li
- Department of Computer and Information Science and Swedish e-Science Research Centre, Linköping University, Linköping, Sweden
| | - Patrick Lambrix
- Department of Computer and Information Science and Swedish e-Science Research Centre, Linköping University, Linköping, Sweden.
|
27
|
|
28
|
|
29
|
|
30
|
|
31
|
Abstract
Purpose
– Ontologies are used to formally describe the concepts within a domain in a machine-understandable way. Matching heterogeneous ontologies is often essential for applications such as semantic annotation, query answering or ontology integration. Some ontologies include a large number of entities, which makes the ontology matching process very complex in terms of search space and execution time. The purpose of this paper is to present a technique for finding the degree of similarity between ontologies that trims the search space by eliminating the ontology concepts that are less likely to be matched.
Design/methodology/approach
– Algorithms are written for finding key concepts, concept matching and relationship matching. WordNet is used to resolve synonyms during the matching process. The technique is evaluated using the reference alignments between ontologies from the Ontology Alignment Evaluation Initiative benchmark in terms of degree of similarity, Pearson’s correlation coefficient and the IR measures precision, recall and F-measure.
Findings
– A positive correlation between the computed degree of similarity and the degree of similarity of the reference alignment, together with the computed values of precision, recall and F-measure, showed that a time- and search-space-efficient ontology matching system can be developed if only the key concepts of ontologies are compared.
Originality/value
– On the basis of the present novel approach for ontology matching, it is concluded that using key concepts for ontology matching gives comparable results in reduced time and space.
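As a rough illustration of the IR measures named in this abstract, the sketch below scores a set of proposed correspondences against a reference alignment. The function name `alignment_scores`, the representation of a correspondence as a pair of strings, and the sample concept names are all assumptions made for this example; none of them comes from the paper.

```python
def alignment_scores(found, reference, beta=1.0):
    """Return (precision, recall, f_beta) for two sets of correspondences.

    Each correspondence is a hypothetical (source_concept, target_concept) pair.
    """
    found, reference = set(found), set(reference)
    correct = len(found & reference)  # correspondences also present in the reference

    precision = correct / len(found) if found else 0.0
    recall = correct / len(reference) if reference else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0

    b2 = beta * beta
    f_beta = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f_beta


# Illustrative data only: four reference pairs, three proposed pairs, two correct.
reference = {
    ("a:Heart", "b:Heart"),
    ("a:Lung", "b:Lung"),
    ("a:Liver", "b:Hepar"),
    ("a:Kidney", "b:Kidney"),
}
found = {
    ("a:Heart", "b:Heart"),
    ("a:Lung", "b:Lung"),
    ("a:Skin", "b:Bone"),
}

p, r, f = alignment_scores(found, reference)
print(f"precision={p:.3f} recall={r:.3f} F1={f:.3f}")
```

With `beta=1.0` this is the usual balanced F-measure; a different `beta` weights recall more or less heavily than precision.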
|
32
|
Tools for Ontology Matching—Practical Considerations from INTER-IoT Perspective. INTERNET AND DISTRIBUTED COMPUTING SYSTEMS 2016. [DOI: 10.1007/978-3-319-45940-0_27] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Indexed: 12/13/2022]
|
33
|
Khan WA, Amin MB, Khattak AM, Hussain M, Afzal M, Lee S, Kim ES. Object-oriented and ontology-alignment patterns-based expressive Mediation Bridge Ontology (MBO). J Inf Sci 2015. [DOI: 10.1177/0165551514560952] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 11/15/2022]
Abstract
The Semantic Web depends on extensive knowledge management that interlinks resources on the web using matching techniques. This role is played by the progressing domain of ontology matching and its ontology-matching tools. The focus of these matching tools is limited to matching techniques and automation, rather than expressive formal representation of alignments. We propose Mediation Bridge Ontology (MBO), an expressive alignment representation ontology used to store correspondences between ontologies matched by our ontology-matching tool, System for Parallel Heterogeneity Resolution (SPHeRe). The MBO utilizes object-oriented design patterns and the proposed ontology-alignment design patterns to provide extendibility and reusability to the SPHeRe system. We compared our proposed system with existing systems using the Coupling Factor, Number of Polymorphic Methods and Rate of Change metrics to support extendibility and reusability. These factors contribute to the overall objective of interoperability for knowledge management in the Semantic Web.
|
34
|
Sesen MB, Peake MD, Banares-Alcantara R, Tse D, Kadir T, Stanley R, Gleeson F, Brady M. Lung Cancer Assistant: a hybrid clinical decision support application for lung cancer care. J R Soc Interface 2015; 11:20140534. [PMID: 24990290 PMCID: PMC4233704 DOI: 10.1098/rsif.2014.0534] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Indexed: 11/30/2022]
Abstract
Multidisciplinary team (MDT) meetings are becoming the model of care for cancer patients worldwide. While MDTs have improved the quality of cancer care, the meetings impose substantial time pressure on the members, who generally attend several such MDTs. We describe Lung Cancer Assistant (LCA), a clinical decision support (CDS) prototype designed to assist the experts in the treatment selection decisions in the lung cancer MDTs. A novel feature of LCA is its ability to provide rule-based and probabilistic decision support within a single platform. The guideline-based CDS is based on clinical guideline rules, while the probabilistic CDS is based on a Bayesian network trained on the English Lung Cancer Audit Database (LUCADA). We assess rule-based and probabilistic recommendations based on their concordances with the treatments recorded in LUCADA. Our results reveal that the guideline rule-based recommendations perform well in simulating the recorded treatments with exact and partial concordance rates of 0.57 and 0.79, respectively. On the other hand, the exact and partial concordance rates achieved with probabilistic results are relatively poorer with 0.27 and 0.76. However, probabilistic decision support fulfils a complementary role in providing accurate survival estimations. Compared to recorded treatments, both CDS approaches promote higher resection rates and multimodality treatments.
Affiliation(s)
- M Berkan Sesen
- Department of Engineering Science, University of Oxford, Oxford OX1 3PJ, UK
| | - Michael D Peake
- Clinical Effectiveness and Evaluation Unit, Royal College of Physicians of London, London NW1 4LE, UK; Department of Respiratory Medicine, Glenfield Hospital, Leicester LE3 9QP, UK
| | | | - Donald Tse
- Department of Clinical Radiology, Oxford University Hospitals NHS Trust, Oxford OX3 7LJ, UK
| | | | - Roz Stanley
- Clinical Effectiveness and Evaluation Unit, Royal College of Physicians of London, London NW1 4LE, UK
| | - Fergus Gleeson
- Department of Clinical Radiology, Oxford University Hospitals NHS Trust, Oxford OX3 7LJ, UK
| | - Michael Brady
- Department of Oncology, University of Oxford, Oxford OX3 7DQ, UK
|
35
|
|
36
|
Context-Based Matching: Design of a Flexible Framework and Experiment. JOURNAL ON DATA SEMANTICS 2013. [DOI: 10.1007/s13740-013-0019-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 10/26/2022]
|
37
|
|