Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

Cited by Other Article(s)

Wikipedia bi-linear link (WBLM) model: A new approach for measuring semantic similarity and relatedness between linguistic concepts using Wikipedia link structure. Inf Process Manag 2023. [DOI: 10.1016/j.ipm.2022.103202] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/13/2022]

Babalou S, Algergawy A, König-Ries B. SimBio: Adopting Particle Swarm Optimization for ontology-based biomedical term similarity assessment. DATA KNOWL ENG 2023. [DOI: 10.1016/j.datak.2022.102137] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/15/2023]

Hussain MJ, Bai H, Wasti SH, Huang G, Jiang Y. Evaluating semantic similarity and relatedness between concepts by combining taxonomic and non-taxonomic semantic features of WordNet and Wikipedia. Inf Sci (N Y) 2023. [DOI: 10.1016/j.ins.2023.01.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/07/2023]

Jiang S, Zhu Y, Liu C, Song X, Li X, Min W. Dataset Bias in Few-Shot Image Recognition. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2023;45:229-246. [PMID: 35201982 DOI: 10.1109/tpami.2022.3153611] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/14/2023]

Abstract

The goal of few-shot image recognition (FSIR) is to identify novel categories with a small number of annotated samples by exploiting transferable knowledge from training data (base categories). Most current studies assume that the transferable knowledge can be well used to identify novel categories. However, such transferable capability may be impacted by the dataset bias, and this problem has rarely been investigated before. Besides, most of few-shot learning methods are biased to different datasets, which is also an important issue that needs to be investigated deeply. In this paper, we first investigate the impact of transferable capabilities learned from base categories. Specifically, we use the relevance to measure relationships between base categories and novel categories. Distributions of base categories are depicted via the instance density and category diversity. The FSIR model learns better transferable knowledge from relevant training data. In the relevant data, dense instances or diverse categories can further enrich the learned knowledge. Experimental results on different sub-datasets of Imagenet demonstrate category relevance, instance density and category diversity can depict transferable bias from distributions of base categories. Second, we investigate performance differences on different datasets from the aspects of dataset structures and different few-shot learning methods. Specifically, we introduce image complexity, intra-concept visual consistency, and inter-concept visual similarity to quantify characteristics of dataset structures. We use these quantitative characteristics and eight few-shot learning methods to analyze performance differences on multiple datasets. Based on the experimental analysis, some insightful observations are obtained from the perspective of both dataset structures and few-shot learning methods. We hope these observations are useful to guide future few-shot learning research on new datasets or tasks. 
Our data is available at http://123.57.42.89/dataset-bias/dataset-bias.html.

Deng Y, Bai W, Jiang Y, Tang Y. Subgraph-based feature fusion models for semantic similarity computation in heterogeneous knowledge graphs. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.109906] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]

Lara-Clares A, Lastra-Díaz JJ, Garcia-Serrano A. A reproducible experimental survey on biomedical sentence similarity: A string-based method sets the state of the art. PLoS One 2022;17:e0276539. [PMID: 36409715 PMCID: PMC9678326 DOI: 10.1371/journal.pone.0276539] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2022] [Accepted: 10/08/2022] [Indexed: 11/22/2022] Open

Abstract

This registered report introduces the largest, and for the first time, reproducible experimental survey on biomedical sentence similarity with the following aims: (1) to elucidate the state of the art of the problem; (2) to solve some reproducibility problems preventing the evaluation of most current methods; (3) to evaluate several unexplored sentence similarity methods; (4) to evaluate for the first time an unexplored benchmark, called Corpus-Transcriptional-Regulation (CTR); (5) to carry out a study on the impact of the pre-processing stages and Named Entity Recognition (NER) tools on the performance of the sentence similarity methods; and finally, (6) to bridge the lack of software and data reproducibility resources for methods and experiments in this line of research. Our reproducible experimental survey is based on a single software platform, which is provided with a detailed reproducibility protocol and dataset as supplementary material to allow the exact replication of all our experiments and results. In addition, we introduce a new aggregated string-based sentence similarity method, called LiBlock, together with eight variants of current ontology-based methods, and a new pre-trained word embedding model trained on the full-text articles in the PMC-BioC corpus. Our experiments show that our novel string-based measure establishes the new state of the art in sentence similarity analysis in the biomedical domain and significantly outperforms all the methods evaluated herein, with the only exception of one ontology-based method. Likewise, our experiments confirm that the pre-processing stages, and the choice of the NER tool for ontology-based methods, have a very significant impact on the performance of the sentence similarity methods. We also detail some drawbacks and limitations of current methods, and highlight the need to refine the current benchmarks. 
Finally, a notable finding is that our new string-based method significantly outperforms all state-of-the-art Machine Learning (ML) models evaluated herein.

Semantic Relatedness in DBpedia: A Comparative and Experimental Assessment. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.11.025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]

Llinas J, Malhotra R. An Expanded Framework for Situation Control. Front Syst Neurosci 2022;16:796100. [PMID: 35965997 PMCID: PMC9366210 DOI: 10.3389/fnsys.2022.796100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2021] [Accepted: 04/12/2022] [Indexed: 11/13/2022] Open

Abstract

There is an extensive body of literature on the topic of estimating situational states, in applications ranging from cyber-defense to military operations to traffic situations and autonomous cars. In the military/defense/intelligence literature, situation assessment seems to be the sine qua non for any research on surveillance and reconnaissance, command and control, and intelligence analysis. Virtually all of this work focuses on assessing the situation-at-the-moment; many if not most of the estimation techniques are based on Data and Information Fusion (DIF) approaches, with some recent schemes employing Artificial Intelligence (AI) and Machine Learning (ML) methods. But estimating and recognizing situational conditions is most often couched in a decision-making, action-taking context, implying that actions may be needed so that certain goal situations will be reached as a result of such actions, or at least that progress toward such goal states will be made. This context thus frames the estimation of situational states in the larger context of a control-loop, with a need to understand the temporal evolution of situational states, not just a snapshot at a given time. Estimating situational dynamics requires the important functions of situation recognition, situation prediction, and situation understanding that are also central to such an integrated estimation + action-taking architecture. The varied processes for all of these combined capabilities lie in a closed-loop “situation control” framework, where the core operations of a stochastic control process involve situation recognition—learning—prediction—situation “error” assessment—and action taking to move the situation to a goal state. We propose several additional functionalities for this closed-loop control process in relation to some prior work on this topic, to include remarks on the integration of control-theoretic principles.
Expanded remarks are also made on the state of the art of the schemas and computational technologies for situation recognition, prediction and understanding, as well as the roles for human intelligence in this larger framework.

Product image retrieval using category-aware siamese convolutional neural network feature. JOURNAL OF KING SAUD UNIVERSITY - COMPUTER AND INFORMATION SCIENCES 2022. [DOI: 10.1016/j.jksuci.2022.03.005] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/22/2022]

Active-learning-based reconstruction of circuit model. APPL INTELL 2022. [DOI: 10.1007/s10489-021-02700-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]

Xiang J, Zhang J, Zhao Y, Wu FX, Li M. Biomedical data, computational methods and tools for evaluating disease-disease associations. Brief Bioinform 2022;23:6522999. [PMID: 35136949 DOI: 10.1093/bib/bbac006] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/19/2021] [Revised: 01/04/2022] [Accepted: 01/05/2022] [Indexed: 12/12/2022] Open

Slater LT, Russell S, Makepeace S, Carberry A, Karwath A, Williams JA, Fanning H, Ball S, Hoehndorf R, Gkoutos GV. Evaluating semantic similarity methods for comparison of text-derived phenotype profiles. BMC Med Inform Decis Mak 2022;22:33. [PMID: 35123470 PMCID: PMC8818208 DOI: 10.1186/s12911-022-01770-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2021] [Accepted: 01/21/2022] [Indexed: 11/16/2022] Open

Lastra-Díaz JJ, Lara-Clares A, Garcia-Serrano A. HESML: a real-time semantic measures library for the biomedical domain with a reproducible survey. BMC Bioinformatics 2022;23:23. [PMID: 34991460 PMCID: PMC8734250 DOI: 10.1186/s12859-021-04539-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2020] [Accepted: 12/15/2021] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

Ontology-based semantic similarity measures based on SNOMED-CT, MeSH, and Gene Ontology are being extensively used in many applications in biomedical text mining and genomics respectively, which has encouraged the development of semantic measures libraries based on the aforementioned ontologies. However, current state-of-the-art semantic measures libraries have some performance and scalability drawbacks derived from their ontology representations based on relational databases, or naive in-memory graph representations. Likewise, a recent reproducible survey on word similarity shows that one hybrid IC-based measure which integrates a shortest-path computation sets the state of the art in the family of ontology-based semantic measures. However, the lack of an efficient shortest-path algorithm for their real-time computation prevents both their practical use in any application and the use of any other path-based semantic similarity measure.

RESULTS

To bridge the two aforementioned gaps, this work introduces for the first time an updated version of the HESML Java software library especially designed for the biomedical domain, which implements the most efficient and scalable ontology representation reported in the literature, together with a new method for the approximation of the Dijkstra's algorithm for taxonomies, called Ancestors-based Shortest-Path Length (AncSPL), which allows the real-time computation of any path-based semantic similarity measure.

CONCLUSIONS

We introduce a set of reproducible benchmarks showing that HESML outperforms by several orders of magnitude the current state-of-the-art libraries in the three aforementioned biomedical ontologies, as well as the real-time performance and approximation quality of the new AncSPL shortest-path algorithm. Likewise, we show that AncSPL linearly scales regarding the dimension of the common ancestor subgraph regardless of the ontology size. Path-based measures based on the new AncSPL algorithm are up to six orders of magnitude faster than their exact implementation in large ontologies like SNOMED-CT and GO. Finally, we provide a detailed reproducibility protocol and dataset as supplementary material to allow the exact replication of all our experiments and results.

Krishna Siva Prasad M, Sharma P. Exploring intrinsic information content models for addressing the issues of traditional semantic measures to evaluate verb similarity. COMPUT SPEECH LANG 2022. [DOI: 10.1016/j.csl.2021.101280] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]

Paul M, Anand A. A New Family of Similarity Measures for Scoring Confidence of Protein Interactions Using Gene Ontology. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022;19:19-30. [PMID: 34029194 DOI: 10.1109/tcbb.2021.3083150] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/12/2023]

Hamad AH, Mahmood AA, Abed SA, Ying X. Semantic relatedness maximisation for word sense disambiguation using a hybrid firefly algorithm. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2021. [DOI: 10.3233/jifs-210934] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]

HAZOP Ontology Semantic Similarity Algorithm Based on ACO-GRNN. Processes (Basel) 2021. [DOI: 10.3390/pr9122115] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022] Open

Martinez-Gil J, Mokadem R, Morvan F, Küng J, Hameurlain A. Interpretable entity meta-alignment in knowledge graphs using penalized regression: a case study in the biomedical domain. PROGRESS IN ARTIFICIAL INTELLIGENCE 2021. [DOI: 10.1007/s13748-021-00263-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/24/2022]

Knowledge-based sentence semantic similarity: algebraical properties. PROGRESS IN ARTIFICIAL INTELLIGENCE 2021. [DOI: 10.1007/s13748-021-00248-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]

Abstract

Determining the extent to which two text snippets are semantically equivalent is a well-researched topic in the areas of natural language processing, information retrieval and text summarization. The sentence-to-sentence similarity scoring is extensively used in both generic and query-based summarization of documents as a significance or a similarity indicator. Nevertheless, most of these applications utilize the concept of semantic similarity measure only as a tool, without paying importance to the inherent properties of such tools that ultimately restrict the scope and technical soundness of the underlined applications. This paper aims to contribute to fill in this gap. It investigates three popular WordNet hierarchical semantic similarity measures, namely path-length, Wu and Palmer and Leacock and Chodorow, from both algebraical and intuitive properties, highlighting their inherent limitations and theoretical constraints. We have especially examined properties related to range and scope of the semantic similarity score, incremental monotonicity evolution, monotonicity with respect to hyponymy/hypernymy relationship as well as a set of interactive properties. Extension from word semantic similarity to sentence similarity has also been investigated using a pairwise canonical extension. Properties of the underlined sentence-to-sentence similarity are examined and scrutinized. Next, to overcome inherent limitations of WordNet semantic similarity in terms of accounting for various Part-of-Speech word categories, a WordNet “All word-To-Noun conversion” that makes use of Categorial Variation Database (CatVar) is put forward and evaluated using a publicly available dataset with a comparison with some state-of-the-art methods. The finding demonstrates the feasibility of the proposal and opens up new opportunities in information retrieval and natural language processing tasks.
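
For readers unfamiliar with the three hierarchical measures named in this abstract (path-length, Wu and Palmer, Leacock and Chodorow), the following is a minimal illustrative sketch, not the paper's code. It assumes the commonly used formulations (path similarity as 1 / (1 + edges), Wu and Palmer as 2·depth(LCS) / (depth(a) + depth(b)), and Leacock and Chodorow as -log((edges + 1) / (2·D)) with D the maximum taxonomy depth), with depth counted in nodes from the root. The six-node taxonomy and all function names are hypothetical and are not taken from WordNet.

```python
import math

# Hypothetical toy taxonomy (child -> parent); an assumption for illustration only.
PARENT = {
    "entity": None,
    "animal": "entity",
    "plant": "entity",
    "dog": "animal",
    "cat": "animal",
    "tree": "plant",
}


def chain(node):
    """Return the nodes from `node` up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def depth(node):
    """Depth counted in nodes, so the root has depth 1."""
    return len(chain(node))


def lcs(a, b):
    """Least common subsumer: the deepest ancestor shared by a and b."""
    ancestors_a = set(chain(a))
    for node in chain(b):  # walks upward from b, so the first hit is the deepest
        if node in ancestors_a:
            return node
    raise ValueError("concepts share no ancestor")


def edge_distance(a, b):
    """Number of edges on the path a -> LCS -> b."""
    c = lcs(a, b)
    return (depth(a) - depth(c)) + (depth(b) - depth(c))


MAX_DEPTH = max(depth(n) for n in PARENT)  # D in the Leacock-Chodorow formula


def path_similarity(a, b):
    return 1.0 / (1 + edge_distance(a, b))


def wu_palmer(a, b):
    return 2.0 * depth(lcs(a, b)) / (depth(a) + depth(b))


def leacock_chodorow(a, b):
    return -math.log((edge_distance(a, b) + 1) / (2.0 * MAX_DEPTH))


if __name__ == "__main__":
    for a, b in [("dog", "cat"), ("dog", "tree"), ("dog", "dog")]:
        print(a, b, path_similarity(a, b), wu_palmer(a, b), leacock_chodorow(a, b))
```

Even on this toy example the differing ranges are visible: path similarity and Wu and Palmer stay within (0, 1], while the Leacock and Chodorow score is bounded by log(2·D) and so depends on the depth of the taxonomy.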

Measuring associational thinking through word embeddings. Artif Intell Rev 2021. [DOI: 10.1007/s10462-021-10056-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]

Kulmanov M, Smaili FZ, Gao X, Hoehndorf R. Semantic similarity and machine learning with ontologies. Brief Bioinform 2021;22:bbaa199. [PMID: 33049044 PMCID: PMC8293838 DOI: 10.1093/bib/bbaa199] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 05/07/2020] [Revised: 08/03/2020] [Accepted: 08/04/2020] [Indexed: 12/13/2022] Open

Bouvier B. Protein-Protein Interface Topology as a Predictor of Secondary Structure and Molecular Function Using Convolutional Deep Learning. J Chem Inf Model 2021;61:3292-3303. [PMID: 34225449 DOI: 10.1021/acs.jcim.1c00644] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 01/20/2023]

Alkhamees MA, Alnuem MA, Al-Saleem SM, Al-Ssulami AM. A semantic metric for concepts similarity in knowledge graphs. J Inf Sci 2021. [DOI: 10.1177/01655515211020580] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]

An improved patent similarity measurement based on entities and semantic relations. J Informetr 2021. [DOI: 10.1016/j.joi.2021.101135] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 11/20/2022]

Lara-Clares A, Lastra-Díaz JJ, Garcia-Serrano A. Protocol for a reproducible experimental survey on biomedical sentence similarity. PLoS One 2021;16:e0248663. [PMID: 33760855 PMCID: PMC7990182 DOI: 10.1371/journal.pone.0248663] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/09/2020] [Accepted: 03/02/2021] [Indexed: 11/28/2022] Open

Abstract

Measuring semantic similarity between sentences is a significant task in the fields of Natural Language Processing (NLP), Information Retrieval (IR), and biomedical text mining. For this reason, the proposal of sentence similarity methods for the biomedical domain has attracted a lot of attention in recent years. However, most sentence similarity methods and experimental results reported in the biomedical domain cannot be reproduced for multiple reasons as follows: the copying of previous results without confirmation, the lack of source code and data to replicate both methods and experiments, and the lack of a detailed definition of the experimental setup, among others. As a consequence of this reproducibility gap, the state of the problem can be neither elucidated nor new lines of research be soundly set. On the other hand, there are other significant gaps in the literature on biomedical sentence similarity as follows: (1) the evaluation of several unexplored sentence similarity methods which deserve to be studied; (2) the evaluation of an unexplored benchmark on biomedical sentence similarity, called Corpus-Transcriptional-Regulation (CTR); (3) a study on the impact of the pre-processing stage and Named Entity Recognition (NER) tools on the performance of the sentence similarity methods; and finally, (4) the lack of software and data resources for the reproducibility of methods and experiments in this line of research. Identified these open problems, this registered report introduces a detailed experimental setup, together with a categorization of the literature, to develop the largest, updated, and for the first time, reproducible experimental survey on biomedical sentence similarity. 
Our aforementioned experimental survey will be based on our own software replication and the evaluation of all methods being studied on the same software platform, which will be specially developed for this work, and it will become the first publicly available software library for biomedical sentence similarity. Finally, we will provide a very detailed reproducibility protocol and dataset as supplementary material to allow the exact replication of all our experiments and results.

Wang D, Zhao Y, Lin H, Zuo X. Automatic scoring of Chinese fill-in-the-blank questions based on improved P-means. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2021. [DOI: 10.3233/jifs-202317] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/15/2022]

Short text similarity measurement methods: a review. Soft comput 2021. [DOI: 10.1007/s00500-020-05479-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 10/22/2022]

A large reproducible benchmark of ontology-based methods and word embeddings for word similarity. INFORM SYST 2021. [DOI: 10.1016/j.is.2020.101636] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 11/22/2022]

Top-k star queries on knowledge graphs through semantic-aware bounding match scores. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2020.106655] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/17/2022]

Besbes G, Ben Abdallah Ben Lamine S, Baazaoui-Zghal H. Personalized Retrieval in the Medical Domain: A NoSQL Solution Based on Ontology Building. JOURNAL OF INFORMATION & KNOWLEDGE MANAGEMENT 2020. [DOI: 10.1142/s0219649220500410] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]

Jiang S, Wu W, Tomita N, Ganoe C, Hassanpour S. Multi-Ontology Refined Embeddings (MORE): A hybrid multi-ontology and corpus-based semantic representation model for biomedical concepts. J Biomed Inform 2020;111:103581. [PMID: 33010425 DOI: 10.1016/j.jbi.2020.103581] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 04/14/2020] [Revised: 09/22/2020] [Accepted: 09/26/2020] [Indexed: 11/25/2022]

Abstract

OBJECTIVE

Currently, a major limitation for natural language processing (NLP) analyses in clinical applications is that concepts are not effectively referenced in various forms across different texts. This paper introduces Multi-Ontology Refined Embeddings (MORE), a novel hybrid framework that incorporates domain knowledge from multiple ontologies into a distributional semantic model, learned from a corpus of clinical text.

MATERIALS AND METHODS

We use the RadCore and MIMIC-III free-text datasets for the corpus-based component of MORE. For the ontology-based part, we use the Medical Subject Headings (MeSH) ontology and three state-of-the-art ontology-based similarity measures. In our approach, we propose a new learning objective, modified from the sigmoid cross-entropy objective function.

RESULTS AND DISCUSSION

We used two established datasets of semantic similarities among biomedical concept pairs to evaluate the quality of the generated word embeddings. On the first dataset with 29 concept pairs, with similarity scores established by physicians and medical coders, MORE's similarity scores have the highest combined correlation (0.633), which is 5.0% higher than that of the baseline model, and 12.4% higher than that of the best ontology-based similarity measure. On the second dataset with 449 concept pairs, MORE's similarity scores have a correlation of 0.481, based on the average of four medical residents' similarity ratings, and that outperforms the skip-gram model by 8.1%, and the best ontology measure by 6.9%. Furthermore, MORE outperforms three pre-trained transformer-based word embedding models (i.e., BERT, ClinicalBERT, and BioBERT) on both datasets.

CONCLUSION

MORE incorporates knowledge from several biomedical ontologies into an existing corpus-based distributional semantics model, improving both the accuracy of the learned word embeddings and the extensibility of the model to a broader range of biomedical concepts. MORE allows for more accurate clustering of concepts across a wide range of applications, such as analyzing patient health records to identify subjects with similar pathologies, or integrating heterogeneous clinical data to improve interoperability between hospitals.

Colla D, Mensa E, Radicioni DP. Novel metrics for computing semantic similarity with sense embeddings. Knowl Based Syst 2020. [DOI: 10.1016/j.knosys.2020.106346] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 11/24/2022]

Hier DB, Kopel J, Brint SU, Wunsch DC, Olbricht GR, Azizi S, Allen B. Evaluation of standard and semantically-augmented distance metrics for neurology patients. BMC Med Inform Decis Mak 2020;20:203. [PMID: 32843023 PMCID: PMC7448345 DOI: 10.1186/s12911-020-01217-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 03/27/2020] [Accepted: 08/12/2020] [Indexed: 12/23/2022] Open

Budán PD, Escañuela Gonzalez MG, Budán MCD, Martinez MV, Simari GR. Similarity notions in bipolar abstract argumentation. ARGUMENT & COMPUTATION 2020. [DOI: 10.3233/aac-190479] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/15/2022]

IntelliBot: A Dialogue-based chatbot for the insurance industry. Knowl Based Syst 2020. [DOI: 10.1016/j.knosys.2020.105810] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Indexed: 11/19/2022]

Semantic Search Enhanced with Rating Scores. FUTURE INTERNET 2020. [DOI: 10.3390/fi12040067] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 11/16/2022] Open

Measuring distance-based semantic similarity using meronymy and hyponymy relations. Neural Comput Appl 2020. [DOI: 10.1007/s00521-018-3766-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 10/28/2022]

Leveraging synonymy and polysemy to improve semantic similarity assessments based on intrinsic information content. Artif Intell Rev 2020. [DOI: 10.1007/s10462-019-09725-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 11/30/2022]

An overview of distance and similarity functions for structured data. Artif Intell Rev 2020. [DOI: 10.1007/s10462-020-09821-w] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Indexed: 11/25/2022]

Tsaramirsis K, Tsaramirsis G, Khan FQ, Ahmad A, Khadidos AO, Khadidos A. More Agility to Semantic Similarities Algorithm Implementations. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2019;17:ijerph17010267. [PMID: 31905999 PMCID: PMC6982023 DOI: 10.3390/ijerph17010267] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 12/08/2019] [Revised: 12/22/2019] [Accepted: 12/28/2019] [Indexed: 02/07/2023]

Semantic association computation: a comprehensive survey. Artif Intell Rev 2019. [DOI: 10.1007/s10462-019-09781-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022]

Outsourcing analyses on privacy-protected multivariate categorical data stored in untrusted clouds. Knowl Inf Syst 2019. [DOI: 10.1007/s10115-019-01424-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022]

Molina Beltrán C, Segura Navarrete AA, Vidal-Castro C, Rubio-Manzano C, Martínez-Araneda C. Improving the affective analysis in texts. ELECTRONIC LIBRARY 2019. [DOI: 10.1108/el-11-2018-0219] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/17/2022]

Abstract

Purpose

This paper aims to propose a method for automatically labelling an affective lexicon with intensity values by using the WordNet Similarity (WS) software package with the purpose of improving the results of an affective analysis process, which is relevant to interpreting the textual information that is available in social networks. The hypothesis states that it is possible to improve affective analysis by using a lexicon that is enriched with the intensity values obtained from similarity metrics. Encouraging results were obtained when an affective analysis based on a labelled lexicon was compared with that based on another lexicon without intensity values.

Design/methodology/approach

The authors propose a method for the automatic extraction of the affective intensity values of words using the similarity metrics implemented in WS. First, the intensity values were calculated for words having an affective root in WordNet. Then, to evaluate the effectiveness of the proposal, the results of the affective analysis based on a labelled lexicon were compared to the results of an analysis with and without affective intensity values.

Findings

The main contribution of this research is a method for the automatic extraction of the intensity values of affective words used to enrich a lexicon compared with the manual labelling process. The results obtained from the affective analysis with the new lexicon are encouraging, as they provide a better performance than those achieved using a lexicon without affective intensity values.

Research limitations/implications

Given the restrictions for calculating the similarity between two words, the lexicon labelled with intensity values is a subset of the original lexicon, which means that a large proportion of the words in the corpus are not labelled in the new lexicon.

Practical implications

The practical implications of this work include providing tools to improve the analysis of the feelings of the users of social networks. In particular, it is of interest to provide an affective lexicon that improves attempts to solve the problems of a digital society, such as the detection of cyberbullying. In this case, by achieving greater precision in the detection of emotions, it is possible to detect the roles of participants in a situation of cyberbullying, for example, the bully and victim. Other problems in which the application of affective lexicons is of importance are the detection of aggressiveness against women or gender violence or the detection of depressive states in young people and children.

Social implications

This work is interested in providing an affective lexicon that improves attempts to solve the problems of a digital society, such as the detection of cyberbullying. In this case, by achieving greater precision in the detection of emotions, it is possible to detect the roles of participants in a situation of cyberbullying, for example, the bully and victim. Other problems in which the application of affective lexicons is of importance are the detection of aggressiveness against women or gender violence or the detection of depressive states in young people and children.

Originality/value

The originality of the research lies in the proposed method for automatically labelling the words of an affective lexicon with intensity values by using WS. To date, a lexicon labelled with intensity values has been constructed using the opinions of experts, but that method is more expensive and requires more time than other existing methods. On the other hand, the new method developed herein is applicable to larger lexicons, requires less time and facilitates automatic updating.

Nguyen HT, Duong PH, Cambria E. Learning short-text semantic similarity with word embeddings and external knowledge sources. Knowl Based Syst 2019. [DOI: 10.1016/j.knosys.2019.07.013] [Citation(s) in RCA: 47] [Impact Index Per Article: 9.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/27/2022]

Li S, Wang G, Yang J. Survey on cloud model based similarity measure of uncertain concepts. CAAI TRANSACTIONS ON INTELLIGENCE TECHNOLOGY 2019. [DOI: 10.1049/trit.2019.0021] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022]

Bazan J, Bazan-Socha S, Ochab M, Buregwa-Czuma S, Nowakowski T, Woźniak M. Effective construction of classifiers with the k-NN method supported by a concept ontology. Knowl Inf Syst 2019. [DOI: 10.1007/s10115-019-01391-w] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2022]

Fuentes-Lorenzo D, Morato J, Sanchez-Cuadrado S, Sanchez L. Building concept maps by adapting semantic distance metrics to Wikipedia. EDUCATION FOR INFORMATION 2019. [DOI: 10.3233/efi-190279] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]

Zhu X, Yang X, Huang Y, Guo Q, Zhang B. Measuring similarity and relatedness using multiple semantic relations in WordNet. Knowl Inf Syst 2019. [DOI: 10.1007/s10115-019-01387-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/26/2022]

Gopalakrishnan V, Jha K, Xun G, Ngo HQ, Zhang A. Towards self-learning based hypotheses generation in biomedical text domain. Bioinformatics 2019;34:2103-2115. [PMID: 29293920 DOI: 10.1093/bioinformatics/btx837] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/26/2017] [Accepted: 12/22/2017] [Indexed: 01/01/2023]

Abstract

Motivation

The overwhelming number of research articles in the domain of biomedicine might cause important connections to remain unnoticed. Literature Based Discovery is a sub-field within biomedical text mining that peruses these articles to formulate high-confidence hypotheses on possible connections between medical concepts. Although many alternative methodologies have been proposed over the last decade, they still suffer from scalability issues. The primary reason, apart from the dense inter-connections between biological concepts, is the absence of information on the factors that lead to edge formation. In this work, we formulate this problem as a collaborative filtering task and leverage the relatively new concept of word vectors to learn and mimic the implicit edge-formation process. Along with a single-class classifier, we prune the search space of redundant and irrelevant hypotheses to increase the efficiency of the system while maintaining, and in some cases even boosting, the overall accuracy.

Results

We show that our proposed framework is able to prune up to 90% of the hypotheses while still retaining high recall in top-K results. This level of efficiency enables the discovery algorithm to look for higher-order hypotheses, something that was infeasible until now. Furthermore, the generic formulation makes our approach agile enough to perform both open and closed discovery. We also experimentally validate that the core data structures upon which the system bases its decisions have a high concordance with the opinion of the experts. This, coupled with the ability to understand the edge-formation process, provides us with interpretable results without any manual intervention.

Availability and implementation

The relevant Java code is available at: https://github.com/vishrawas/Medline-Code_v2.

Supplementary information

Supplementary data are available at Bioinformatics online.

Torjmen-Khemakhem M, Gasmi K. Document/query expansion based on selecting significant concepts for context based retrieval of medical images. J Biomed Inform 2019;95:103210. [DOI: 10.1016/j.jbi.2019.103210] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/05/2019] [Revised: 05/15/2019] [Accepted: 05/16/2019] [Indexed: 11/28/2022]