1
|
Wikipedia bi-linear link (WBLM) model: A new approach for measuring semantic similarity and relatedness between linguistic concepts using Wikipedia link structure. Inf Process Manag 2023. [DOI: 10.1016/j.ipm.2022.103202] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/13/2022]
|
2
|
Babalou S, Algergawy A, König-Ries B. SimBio: Adopting Particle Swarm Optimization for ontology-based biomedical term similarity assessment. DATA KNOWL ENG 2023. [DOI: 10.1016/j.datak.2022.102137] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/15/2023]
|
3
|
Hussain MJ, Bai H, Wasti SH, Huang G, Jiang Y. Evaluating semantic similarity and relatedness between concepts by combining taxonomic and non-taxonomic semantic features of WordNet and Wikipedia. Inf Sci (N Y) 2023. [DOI: 10.1016/j.ins.2023.01.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/07/2023]
|
4
|
Jiang S, Zhu Y, Liu C, Song X, Li X, Min W. Dataset Bias in Few-Shot Image Recognition. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2023; 45:229-246. [PMID: 35201982 DOI: 10.1109/tpami.2022.3153611] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/14/2023]
Abstract
The goal of few-shot image recognition (FSIR) is to identify novel categories with a small number of annotated samples by exploiting transferable knowledge from training data (base categories). Most current studies assume that the transferable knowledge can be well used to identify novel categories. However, such transferable capability may be impacted by the dataset bias, and this problem has rarely been investigated before. Besides, most few-shot learning methods are biased toward different datasets, which is also an important issue that needs to be investigated deeply. In this paper, we first investigate the impact of transferable capabilities learned from base categories. Specifically, we use the relevance to measure relationships between base categories and novel categories. Distributions of base categories are depicted via the instance density and category diversity. The FSIR model learns better transferable knowledge from relevant training data. In the relevant data, dense instances or diverse categories can further enrich the learned knowledge. Experimental results on different sub-datasets of ImageNet demonstrate that category relevance, instance density and category diversity can depict transferable bias from distributions of base categories. Second, we investigate performance differences on different datasets from the aspects of dataset structures and different few-shot learning methods. Specifically, we introduce image complexity, intra-concept visual consistency, and inter-concept visual similarity to quantify characteristics of dataset structures. We use these quantitative characteristics and eight few-shot learning methods to analyze performance differences on multiple datasets. Based on the experimental analysis, some insightful observations are obtained from the perspective of both dataset structures and few-shot learning methods. We hope these observations are useful to guide future few-shot learning research on new datasets or tasks.
Our data is available at http://123.57.42.89/dataset-bias/dataset-bias.html.
|
5
|
Deng Y, Bai W, Jiang Y, Tang Y. Subgraph-based feature fusion models for semantic similarity computation in heterogeneous knowledge graphs. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.109906] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
|
6
|
Lara-Clares A, Lastra-Díaz JJ, Garcia-Serrano A. A reproducible experimental survey on biomedical sentence similarity: A string-based method sets the state of the art. PLoS One 2022; 17:e0276539. [PMID: 36409715 PMCID: PMC9678326 DOI: 10.1371/journal.pone.0276539] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2022] [Accepted: 10/08/2022] [Indexed: 11/22/2022] Open
Abstract
This registered report introduces the largest, and for the first time, reproducible experimental survey on biomedical sentence similarity with the following aims: (1) to elucidate the state of the art of the problem; (2) to solve some reproducibility problems preventing the evaluation of most current methods; (3) to evaluate several unexplored sentence similarity methods; (4) to evaluate for the first time an unexplored benchmark, called Corpus-Transcriptional-Regulation (CTR); (5) to carry out a study on the impact of the pre-processing stages and Named Entity Recognition (NER) tools on the performance of the sentence similarity methods; and finally, (6) to bridge the lack of software and data reproducibility resources for methods and experiments in this line of research. Our reproducible experimental survey is based on a single software platform, which is provided with a detailed reproducibility protocol and dataset as supplementary material to allow the exact replication of all our experiments and results. In addition, we introduce a new aggregated string-based sentence similarity method, called LiBlock, together with eight variants of current ontology-based methods, and a new pre-trained word embedding model trained on the full-text articles in the PMC-BioC corpus. Our experiments show that our novel string-based measure establishes the new state of the art in sentence similarity analysis in the biomedical domain and significantly outperforms all the methods evaluated herein, with the only exception of one ontology-based method. Likewise, our experiments confirm that the pre-processing stages, and the choice of the NER tool for ontology-based methods, have a very significant impact on the performance of the sentence similarity methods. We also detail some drawbacks and limitations of current methods, and highlight the need to refine the current benchmarks. 
Finally, a notable finding is that our new string-based method significantly outperforms all state-of-the-art Machine Learning (ML) models evaluated herein.
Affiliation(s)
- Alicia Lara-Clares
- NLP & IR Research Group, E.T.S.I. Informática, Universidad Nacional de Educación a Distancia (UNED), Madrid, Spain
- Juan J. Lastra-Díaz
- NLP & IR Research Group, E.T.S.I. Informática, Universidad Nacional de Educación a Distancia (UNED), Madrid, Spain
- Ana Garcia-Serrano
- NLP & IR Research Group, E.T.S.I. Informática, Universidad Nacional de Educación a Distancia (UNED), Madrid, Spain
|
7
|
Semantic Relatedness in DBpedia: A Comparative and Experimental Assessment. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.11.025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
|
8
|
Llinas J, Malhotra R. An Expanded Framework for Situation Control. Front Syst Neurosci 2022; 16:796100. [PMID: 35965997 PMCID: PMC9366210 DOI: 10.3389/fnsys.2022.796100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2021] [Accepted: 04/12/2022] [Indexed: 11/13/2022] Open
Abstract
There is an extensive body of literature on the topic of estimating situational states, in applications ranging from cyber-defense to military operations to traffic situations and autonomous cars. In the military/defense/intelligence literature, situation assessment seems to be the sine qua non for any research on surveillance and reconnaissance, command and control, and intelligence analysis. Virtually all of this work focuses on assessing the situation-at-the-moment; many if not most of the estimation techniques are based on Data and Information Fusion (DIF) approaches, with some recent schemes employing Artificial Intelligence (AI) and Machine Learning (ML) methods. But estimating and recognizing situational conditions is most often couched in a decision-making, action-taking context, implying that actions may be needed so that certain goal situations will be reached as a result of such actions, or at least that progress toward such goal states will be made. This context thus frames the estimation of situational states in the larger context of a control-loop, with a need to understand the temporal evolution of situational states, not just a snapshot at a given time. Estimating situational dynamics requires the important functions of situation recognition, situation prediction, and situation understanding that are also central to such an integrated estimation + action-taking architecture. The varied processes for all of these combined capabilities lie in a closed-loop “situation control” framework, where the core operations of a stochastic control process involve situation recognition—learning—prediction—situation “error” assessment—and action taking to move the situation to a goal state. We propose several additional functionalities for this closed-loop control process in relation to some prior work on this topic, to include remarks on the integration of control-theoretic principles. 
Expanded remarks are also made on the state of the art of the schemas and computational technologies for situation recognition, prediction and understanding, as well as the roles for human intelligence in this larger framework.
Affiliation(s)
- James Llinas
- Industrial and Systems Engineering Department, University at Buffalo, Buffalo, NY, United States
- *Correspondence: James Llinas,
- Raj Malhotra
- U.S. Air Force Research Laboratory Sensors Directorate, Wright-Patterson Air Force Base, Dayton, OH, United States
|
9
|
Product image retrieval using category-aware siamese convolutional neural network feature. JOURNAL OF KING SAUD UNIVERSITY - COMPUTER AND INFORMATION SCIENCES 2022. [DOI: 10.1016/j.jksuci.2022.03.005] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/22/2022]
|
10
|
Active-learning-based reconstruction of circuit model. APPL INTELL 2022. [DOI: 10.1007/s10489-021-02700-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
|
11
|
Xiang J, Zhang J, Zhao Y, Wu FX, Li M. Biomedical data, computational methods and tools for evaluating disease-disease associations. Brief Bioinform 2022; 23:6522999. [PMID: 35136949 DOI: 10.1093/bib/bbac006] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/19/2021] [Revised: 01/04/2022] [Accepted: 01/05/2022] [Indexed: 12/12/2022] Open
Abstract
In recent decades, exploring potential relationships between diseases has been an active research field. With the rapid accumulation of disease-related biomedical data, many computational methods and tools/platforms have been developed to reveal intrinsic relationships between diseases, which can provide useful insights into the study of complex diseases, e.g. understanding molecular mechanisms of diseases and discovering new treatments. Human complex diseases involve both external phenotypic abnormalities and complex internal molecular mechanisms in organisms. Computational methods with different types of biomedical data from phenotype to genotype can evaluate disease-disease associations at different levels, providing a comprehensive perspective for understanding diseases. In this review, available biomedical data and databases for evaluating disease-disease associations are first summarized. Then, existing computational methods for disease-disease associations are reviewed and classified into five groups according to how they use biomedical data: disease semantic-based, phenotype-based, function-based, representation learning-based and text mining-based methods. Further, we summarize software tools/platforms for computation and analysis of disease-disease associations. Finally, we discuss and summarize the research on disease-disease associations. This review provides a systematic overview of current disease association research, which could promote the development and applications of computational methods and tools/platforms for disease-disease associations.
Affiliation(s)
- Ju Xiang
- School of Computer Science and Engineering, Central South University, China
- Jiashuai Zhang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
- Yichao Zhao
- School of Computer Science and Engineering, Central South University, China
- Fang-Xiang Wu
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
- Min Li
- Division of Biomedical Engineering and Department of Mechanical Engineering at University of Saskatchewan, Saskatoon, Canada
|
12
|
Slater LT, Russell S, Makepeace S, Carberry A, Karwath A, Williams JA, Fanning H, Ball S, Hoehndorf R, Gkoutos GV. Evaluating semantic similarity methods for comparison of text-derived phenotype profiles. BMC Med Inform Decis Mak 2022; 22:33. [PMID: 35123470 PMCID: PMC8818208 DOI: 10.1186/s12911-022-01770-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2021] [Accepted: 01/21/2022] [Indexed: 11/16/2022] Open
Abstract
Background: Semantic similarity is a valuable tool for analysis in biomedicine. When applied to phenotype profiles derived from clinical text, it has the capacity to enable and enhance ‘patient-like me’ analyses, automated coding, differential diagnosis, and outcome prediction. While a large body of work exists exploring the use of semantic similarity for multiple tasks, including protein interaction prediction and rare disease differential diagnosis, there is less work exploring comparison of patient phenotype profiles for clinical tasks. Moreover, there are no experimental explorations of optimal parameters or better methods in the area. Methods: We develop a platform for reproducible benchmarking and comparison of experimental conditions for patient phenotype similarity. Using the platform, we evaluate the task of ranking shared primary diagnosis from uncurated phenotype profiles derived from all text narrative associated with admissions in the medical information mart for intensive care (MIMIC-III). Results: 300 semantic similarity configurations were evaluated, as well as one embedding-based approach. On average, measures that did not make use of an external information content measure performed slightly better; however, the best-performing configurations when measured by area under receiver operating characteristic curve and Top Ten Accuracy used term-specificity and annotation-frequency measures. Conclusion: We identified and interpreted the performance of a large number of semantic similarity configurations for the task of classifying diagnosis from text-derived phenotype profiles in one setting. We also provided a basis for further research on other settings and related tasks in the area.
|
13
|
Lastra-Díaz JJ, Lara-Clares A, Garcia-Serrano A. HESML: a real-time semantic measures library for the biomedical domain with a reproducible survey. BMC Bioinformatics 2022; 23:23. [PMID: 34991460 PMCID: PMC8734250 DOI: 10.1186/s12859-021-04539-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2020] [Accepted: 12/15/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Ontology-based semantic similarity measures based on SNOMED-CT, MeSH, and Gene Ontology are being extensively used in many applications in biomedical text mining and genomics respectively, which has encouraged the development of semantic measures libraries based on the aforementioned ontologies. However, current state-of-the-art semantic measures libraries have some performance and scalability drawbacks derived from their ontology representations based on relational databases, or naive in-memory graph representations. Likewise, a recent reproducible survey on word similarity shows that one hybrid IC-based measure which integrates a shortest-path computation sets the state of the art in the family of ontology-based semantic measures. However, the lack of an efficient shortest-path algorithm for their real-time computation prevents both their practical use in any application and the use of any other path-based semantic similarity measure. RESULTS: To bridge the two aforementioned gaps, this work introduces for the first time an updated version of the HESML Java software library especially designed for the biomedical domain, which implements the most efficient and scalable ontology representation reported in the literature, together with a new method for the approximation of Dijkstra's algorithm for taxonomies, called Ancestors-based Shortest-Path Length (AncSPL), which allows the real-time computation of any path-based semantic similarity measure. CONCLUSIONS: We introduce a set of reproducible benchmarks showing that HESML outperforms by several orders of magnitude the current state-of-the-art libraries in the three aforementioned biomedical ontologies, as well as the real-time performance and approximation quality of the new AncSPL shortest-path algorithm. Likewise, we show that AncSPL scales linearly with the dimension of the common ancestor subgraph regardless of the ontology size.
Path-based measures based on the new AncSPL algorithm are up to six orders of magnitude faster than their exact implementation in large ontologies like SNOMED-CT and GO. Finally, we provide a detailed reproducibility protocol and dataset as supplementary material to allow the exact replication of all our experiments and results.
Affiliation(s)
- Juan J. Lastra-Díaz
- NLP & IR Research Group, E.T.S.I. Informática, Universidad Nacional de Educación a Distancia (UNED), C/Juan del Rosal 16, 28040 Madrid, Spain
- Alicia Lara-Clares
- NLP & IR Research Group, E.T.S.I. Informática, Universidad Nacional de Educación a Distancia (UNED), C/Juan del Rosal 16, 28040 Madrid, Spain
- Ana Garcia-Serrano
- NLP & IR Research Group, E.T.S.I. Informática, Universidad Nacional de Educación a Distancia (UNED), C/Juan del Rosal 16, 28040 Madrid, Spain
|
14
|
Krishna Siva Prasad M, Sharma P. Exploring intrinsic information content models for addressing the issues of traditional semantic measures to evaluate verb similarity. COMPUT SPEECH LANG 2022. [DOI: 10.1016/j.csl.2021.101280] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
|
15
|
Paul M, Anand A. A New Family of Similarity Measures for Scoring Confidence of Protein Interactions Using Gene Ontology. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:19-30. [PMID: 34029194 DOI: 10.1109/tcbb.2021.3083150] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/12/2023]
Abstract
The large-scale protein-protein interaction (PPI) data has the potential to play a significant role in the endeavor of understanding cellular processes. However, the presence of a considerable fraction of false positives is a bottleneck in realizing this potential. There have been continuous efforts to utilize complementary resources for scoring confidence of PPIs in a manner that false positive interactions get a low confidence score. Gene Ontology (GO), a taxonomy of biological terms to represent the properties of gene products and their relations, has been widely used for this purpose. We utilize GO to introduce a new set of specificity measures: Relative Depth Specificity (RDS), Relative Node-based Specificity (RNS), and Relative Edge-based Specificity (RES), leading to a new family of similarity measures. We use these similarity measures to obtain a confidence score for each PPI. We evaluate the new measures using four different benchmarks. We show that all the three measures are quite effective. Notably, RNS and RES more effectively distinguish true PPIs from false positives than the existing alternatives. RES also shows a robust set-discriminating power and can be useful for protein functional clustering as well.
|
16
|
Hamad AH, Mahmood AA, Abed SA, Ying X. Semantic relatedness maximisation for word sense disambiguation using a hybrid firefly algorithm. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2021. [DOI: 10.3233/jifs-210934] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Word sense disambiguation (WSD) refers to determining the right meaning of a vague word using its context. As an intermediate step, WSD supports downstream tasks in achieving high accuracy. Mainly, a WSD solution improves the accuracy of text summarisation, information retrieval, and machine translation. This study addresses WSD by assigning a set of senses to a given text such that the maximum semantic relatedness is obtained. This is achieved by proposing a swarm intelligence method, called the firefly algorithm (FA), to find the best possible set of senses. Because the FA is based on a population of solutions, it explores the problem space more than it exploits it. Hence, we hybridise the FA with a one-point search algorithm to improve its exploitation capacity. Practically, this hybridisation aims to maximise the semantic relatedness of an eligible set of senses. In this study, the semantic relatedness is measured by proposing a glosses-overlapping method enriched by the notion of information content. To evaluate the proposed method, we have conducted intensive experiments with comparisons to the related works based on benchmark datasets. The obtained results showed that our method is comparable if not superior to the related works. Thus, the proposed method can be considered an efficient solver for the WSD task.
Affiliation(s)
- Aws Hamed Hamad
- Ministry of Higher Education & Scientific Research, Baghdad, Iraq
- Saad Adnan Abed
- High Performance Cloud Computing Center, Universiti Teknologi PETRONAS, Seri Iskandar, Perak, Malaysia
- Xu Ying
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan, China
|
17
|
HAZOP Ontology Semantic Similarity Algorithm Based on ACO-GRNN. Processes (Basel) 2021. [DOI: 10.3390/pr9122115] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022] Open
Abstract
Hazard and operability (HAZOP) is an important safety analysis method, which is widely used in the safety evaluation of the petrochemical industry. The HAZOP analysis report contains a large amount of expert knowledge and experience. In order to realize the effective expression and reuse of knowledge, the knowledge ontology is constructed to store the risk propagation path and realize the standardization of knowledge expression. On this basis, a comprehensive algorithm of ontology semantic similarity based on the ant colony optimization generalized regression neural network (ACO-GRNN) model is proposed to improve the accuracy of semantic comparison. This method combines the concept name, semantic distance, and improved attribute coincidence calculation method, and ACO-GRNN is used to train the weights of each part, avoiding the influence of manual weighting. The results show that the Pearson coefficient of this method reaches 0.9819, which is 45.83% higher than the traditional method. It could solve the problems of semantic comparison and matching, and lays a good foundation for subsequent knowledge retrieval and reuse.
|
18
|
Martinez-Gil J, Mokadem R, Morvan F, Küng J, Hameurlain A. Interpretable entity meta-alignment in knowledge graphs using penalized regression: a case study in the biomedical domain. PROGRESS IN ARTIFICIAL INTELLIGENCE 2021. [DOI: 10.1007/s13748-021-00263-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/24/2022]
|
19
|
Knowledge-based sentence semantic similarity: algebraical properties. PROGRESS IN ARTIFICIAL INTELLIGENCE 2021. [DOI: 10.1007/s13748-021-00248-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Abstract
Determining the extent to which two text snippets are semantically equivalent is a well-researched topic in the areas of natural language processing, information retrieval and text summarization. The sentence-to-sentence similarity scoring is extensively used in both generic and query-based summarization of documents as a significance or a similarity indicator. Nevertheless, most of these applications utilize the concept of semantic similarity measure only as a tool, without paying attention to the inherent properties of such tools that ultimately restrict the scope and technical soundness of the underlying applications. This paper aims to help fill this gap. It investigates three popular WordNet hierarchical semantic similarity measures, namely path-length, Wu and Palmer, and Leacock and Chodorow, in terms of both algebraical and intuitive properties, highlighting their inherent limitations and theoretical constraints. We have especially examined properties related to range and scope of the semantic similarity score, incremental monotonicity evolution, monotonicity with respect to hyponymy/hypernymy relationship as well as a set of interactive properties. Extension from word semantic similarity to sentence similarity has also been investigated using a pairwise canonical extension. Properties of the underlying sentence-to-sentence similarity are examined and scrutinized. Next, to overcome inherent limitations of WordNet semantic similarity in terms of accounting for various Part-of-Speech word categories, a WordNet “All word-To-Noun conversion” that makes use of Categorial Variation Database (CatVar) is put forward and evaluated using a publicly available dataset with a comparison with some state-of-the-art methods. The findings demonstrate the feasibility of the proposal and open up new opportunities in information retrieval and natural language processing tasks.
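The three hierarchical measures named in this abstract can be illustrated on a tiny hand-made taxonomy. The sketch below is not the paper's code: the taxonomy, the function names, and the node-counting depth convention (root depth 1) are assumptions chosen for illustration, and the formulas are the commonly used textbook forms of path-length, Wu and Palmer, and Leacock and Chodorow similarity.

```python
import math

# Hypothetical toy taxonomy (child -> parent); not taken from WordNet.
PARENT = {
    "entity": None,
    "animal": "entity",
    "artifact": "entity",
    "mammal": "animal",
    "bird": "animal",
    "dog": "mammal",
    "cat": "mammal",
    "car": "artifact",
}

def ancestors(concept):
    """Return the chain from the concept up to the root, concept first."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain

def depth(concept):
    """Depth counted in nodes, so the root has depth 1 (an assumed convention)."""
    return len(ancestors(concept))

def lcs(a, b):
    """Least common subsumer: the deepest concept that subsumes both a and b."""
    b_ancestors = set(ancestors(b))
    return next(c for c in ancestors(a) if c in b_ancestors)

def path_length(a, b):
    """Number of edges on the shortest path between a and b in the tree."""
    return depth(a) + depth(b) - 2 * depth(lcs(a, b))

def path_sim(a, b):
    """Path-length similarity: 1 / (1 + shortest path length)."""
    return 1.0 / (1 + path_length(a, b))

def wup_sim(a, b):
    """Wu and Palmer: 2 * depth(lcs) / (depth(a) + depth(b))."""
    return 2.0 * depth(lcs(a, b)) / (depth(a) + depth(b))

# Maximum depth of the toy taxonomy, used to normalise Leacock and Chodorow.
MAX_DEPTH = max(depth(c) for c in PARENT)

def lch_sim(a, b):
    """Leacock and Chodorow: -log((path length + 1) / (2 * max depth))."""
    return -math.log((path_length(a, b) + 1) / (2.0 * MAX_DEPTH))

if __name__ == "__main__":
    for pair in [("dog", "cat"), ("dog", "bird"), ("dog", "car")]:
        print(pair, path_sim(*pair), wup_sim(*pair), lch_sim(*pair))
```

On this toy tree all three scores decrease as the least common subsumer moves toward the root, which is the kind of monotonicity with respect to the hierarchy that the abstract says it examines. Note also that the Leacock and Chodorow score is not bounded by 1, unlike the other two.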
|
20
|
Measuring associational thinking through word embeddings. Artif Intell Rev 2021. [DOI: 10.1007/s10462-021-10056-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Abstract
The development of a model to quantify semantic similarity and relatedness between words has been the major focus of many studies in various fields, e.g. psychology, linguistics, and natural language processing. Unlike the measures proposed by most previous research, this article aims to automatically estimate the strength of association between words that may or may not be semantically related. We demonstrate that the performance of the model depends not only on the combination of independently constructed word embeddings (namely, corpus- and network-based embeddings) but also on the way these word vectors interact. The research concludes that the weighted average of the cosine-similarity coefficients derived from independent word embeddings in a double vector space tends to yield high correlations with human judgements. Moreover, we demonstrate that evaluating word associations through a measure that relies on not only the rank ordering of word pairs but also the strength of associations can reveal some findings that go unnoticed by traditional measures such as Spearman’s and Pearson’s correlation coefficients.
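The "weighted average of the cosine-similarity coefficients derived from independent word embeddings" mentioned in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the two tiny embedding tables, the function names, and the weight `alpha` are hypothetical, and the abstract does not say how the authors choose or tune the weights.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def combined_similarity(w1, w2, corpus_emb, network_emb, alpha=0.5):
    """Weighted average of the cosine similarities from two embedding spaces.

    alpha is a hypothetical weight in [0, 1] given to the corpus-based space;
    the network-based space receives 1 - alpha.
    """
    s_corpus = cosine(corpus_emb[w1], corpus_emb[w2])
    s_network = cosine(network_emb[w1], network_emb[w2])
    return alpha * s_corpus + (1.0 - alpha) * s_network

# Made-up vectors; the two spaces are independent and may differ in dimension.
corpus_emb = {"doctor": [1.0, 0.0], "nurse": [1.0, 1.0]}
network_emb = {"doctor": [0.0, 2.0, 0.0], "nurse": [0.0, 1.0, 0.0]}

if __name__ == "__main__":
    print(combined_similarity("doctor", "nurse", corpus_emb, network_emb, alpha=0.5))
```

Because each cosine is computed inside its own space, the two embeddings never need a shared dimensionality; only the scalar similarities are mixed.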
|
21
|
Kulmanov M, Smaili FZ, Gao X, Hoehndorf R. Semantic similarity and machine learning with ontologies. Brief Bioinform 2021; 22:bbaa199. [PMID: 33049044 PMCID: PMC8293838 DOI: 10.1093/bib/bbaa199] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 05/07/2020] [Revised: 08/03/2020] [Accepted: 08/04/2020] [Indexed: 12/13/2022] Open
Abstract
Ontologies have long been employed in the life sciences to formally represent and reason over domain knowledge and they are employed in almost every major biological database. Recently, ontologies are increasingly being used to provide background knowledge in similarity-based analysis and machine learning models. The methods employed to combine ontologies and machine learning are still novel and actively being developed. We provide an overview over the methods that use ontologies to compute similarity and incorporate them in machine learning methods; in particular, we outline how semantic similarity measures and ontology embeddings can exploit the background knowledge in ontologies and how ontologies can provide constraints that improve machine learning models. The methods and experiments we describe are available as a set of executable notebooks, and we also provide a set of slides and additional resources at https://github.com/bio-ontology-research-group/machine-learning-with-ontologies.
Affiliation(s)
- Xin Gao
- Computational Bioscience Research Center and lead of the Structural and Functional Bioinformatics Group at King Abdullah University of Science and Technology
|
22
|
Bouvier B. Protein-Protein Interface Topology as a Predictor of Secondary Structure and Molecular Function Using Convolutional Deep Learning. J Chem Inf Model 2021; 61:3292-3303. [PMID: 34225449 DOI: 10.1021/acs.jcim.1c00644] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 01/20/2023]
Abstract
To power the specific recognition and binding of protein partners into functional complexes, a wealth of information about the structure and function of the partners is necessarily encoded into the global shape of protein-protein interfaces and their local topological features. To identify whether this is the case, this study uses convolutional deep learning methods (typically leveraged for 2D image recognition) on 3D voxel representations of protein-protein interfaces colored by burial depth. A novel two-stage network fed with voxelizations of each interface at two distinct resolutions achieves balance between performance and computational cost. From the shape of the interfaces, the network tries to predict the presence of secondary structure motifs at the interface and the molecular function of the corresponding complex. Secondary structure and certain classes of function are found to be very well predicted, validating the hypothesis that interface shape is a conveyor of higher-level information. Interface patterns triggering the recognition of specific classes are also identified and described.
Affiliation(s)
- Benjamin Bouvier
- Laboratoire de Glycochimie, des Antimicrobiens et des Agroressources, CNRS UMR7378/Université de Picardie Jules Verne, 10 rue Baudelocque, 80039 Amiens Cedex, France
|
23
|
Alkhamees MA, Alnuem MA, Al-Saleem SM, Al-Ssulami AM. A semantic metric for concepts similarity in knowledge graphs. J Inf Sci 2021. [DOI: 10.1177/01655515211020580] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
Semantic similarity between concepts concerns expressing the degree of similarity in meaning between two concepts in a computational model. This problem has recently attracted considerable attention from researchers in attempting to automate the understanding of word meanings to expedite the classification of users’ opinions and attitudes embedded in text. In this article, a semantic similarity metric is presented. The proposed metric, namely, weighted information-content (wic), exploits the information content of the least common subsumer of two compared concepts and the depth information in knowledge graphs such as DBPedia and YAGO. The two similarity components were combined using calibrated cooperative contributions from both similarity components. A statistical test using the Spearman correlations on well-known human judgement word-similarity data sets showed that the wic metric produced more highly correlated similarities compared with state-of-the-art metrics. In addition, a real-world aspect category classification was evaluated, which exhibited further increased accuracy and recall.
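The abstract above does not give the wic formula, so the following is only a rough sketch of the general idea of combining an information-content term and a depth term, both computed at the least common subsumer. It swaps in two well-known measures (Lin for the information-content part, Wu-Palmer for the depth part) and a hypothetical mixing weight `alpha`; the toy taxonomy, the probabilities, and all function names are assumptions made for illustration and are not taken from the paper.

```python
import math

# Hypothetical toy taxonomy (child -> parent) and concept probabilities.
PARENT = {"entity": None, "animal": "entity", "plant": "entity",
          "dog": "animal", "cat": "animal"}
PROBABILITY = {"entity": 1.0, "animal": 0.5, "plant": 0.5,
               "dog": 0.25, "cat": 0.25}


def path_to_root(concept):
    """Return the concept followed by its ancestors, ending at the root."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def least_common_subsumer(a, b):
    """Deepest concept that is an ancestor of (or equal to) both a and b."""
    ancestors_of_b = set(path_to_root(b))
    for node in path_to_root(a):
        if node in ancestors_of_b:
            return node
    raise ValueError("concepts share no ancestor")


def information_content(concept):
    """IC(c) = -log p(c): rarer concepts carry more information."""
    return -math.log(PROBABILITY[concept])


def depth(concept):
    """Number of edges between the concept and the root."""
    return len(path_to_root(concept)) - 1


def ratio(shared, total):
    """Guard against 0/0 when both concepts are the root."""
    return 1.0 if total == 0 else shared / total


def toy_similarity(a, b, alpha=0.5):
    """Weighted mix of a Lin-style IC term and a Wu-Palmer-style depth term.

    alpha is a hypothetical weight, not the calibration used by wic.
    """
    lcs = least_common_subsumer(a, b)
    ic_part = ratio(2 * information_content(lcs),
                    information_content(a) + information_content(b))
    depth_part = ratio(2 * depth(lcs), depth(a) + depth(b))
    return alpha * ic_part + (1 - alpha) * depth_part
```

In this toy setup, `toy_similarity("dog", "cat")` is higher than `toy_similarity("dog", "plant")`, because the first pair shares the more specific subsumer `animal` while the second pair only shares the root.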
Affiliation(s)
- Majed A Alkhamees
- Department of Information Systems, King Saud University, Saudi Arabia
- Mohammed A Alnuem
- Department of Information Systems, King Saud University, Saudi Arabia
- Saleh M Al-Saleem
- Department of Information Systems, King Saud University, Saudi Arabia
|
24
|
|
25
|
Lara-Clares A, Lastra-Díaz JJ, Garcia-Serrano A. Protocol for a reproducible experimental survey on biomedical sentence similarity. PLoS One 2021; 16:e0248663. [PMID: 33760855 PMCID: PMC7990182 DOI: 10.1371/journal.pone.0248663] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/09/2020] [Accepted: 03/02/2021] [Indexed: 11/28/2022] Open
Abstract
Measuring semantic similarity between sentences is a significant task in the fields of Natural Language Processing (NLP), Information Retrieval (IR), and biomedical text mining. For this reason, the proposal of sentence similarity methods for the biomedical domain has attracted a lot of attention in recent years. However, most sentence similarity methods and experimental results reported in the biomedical domain cannot be reproduced for several reasons: the copying of previous results without confirmation, the lack of source code and data to replicate both methods and experiments, and the lack of a detailed definition of the experimental setup, among others. As a consequence of this reproducibility gap, the state of the problem cannot be elucidated, nor can new lines of research be soundly set. In addition, there are other significant gaps in the literature on biomedical sentence similarity, as follows: (1) the evaluation of several unexplored sentence similarity methods which deserve to be studied; (2) the evaluation of an unexplored benchmark on biomedical sentence similarity, called Corpus-Transcriptional-Regulation (CTR); (3) a study on the impact of the pre-processing stage and Named Entity Recognition (NER) tools on the performance of the sentence similarity methods; and finally, (4) the lack of software and data resources for the reproducibility of methods and experiments in this line of research. Having identified these open problems, this registered report introduces a detailed experimental setup, together with a categorization of the literature, to develop the largest, most up-to-date and, for the first time, reproducible experimental survey on biomedical sentence similarity.
Our aforementioned experimental survey will be based on our own software replication and the evaluation of all methods being studied on the same software platform, which will be specially developed for this work, and it will become the first publicly available software library for biomedical sentence similarity. Finally, we will provide a very detailed reproducibility protocol and dataset as supplementary material to allow the exact replication of all our experiments and results.
Affiliation(s)
- Alicia Lara-Clares
- NLP & IR Research Group, E.T.S.I. Informática, Universidad Nacional de Educación a Distancia (UNED), Madrid, Spain
- Juan J. Lastra-Díaz
- NLP & IR Research Group, E.T.S.I. Informática, Universidad Nacional de Educación a Distancia (UNED), Madrid, Spain
- Ana Garcia-Serrano
- NLP & IR Research Group, E.T.S.I. Informática, Universidad Nacional de Educación a Distancia (UNED), Madrid, Spain
|
26
|
Wang D, Zhao Y, Lin H, Zuo X. Automatic scoring of Chinese fill-in-the-blank questions based on improved P-means. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2021. [DOI: 10.3233/jifs-202317] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
Abstract
Chinese fill-in-the-blank questions contain both objective and subjective characteristics, and thus it has always been difficult to score them automatically. In this paper, fill-in-the-blank items are divided into those with word-level or sentence-level granularity; then, the items are automatically scored by different strategies. The automatic scoring framework combines semantic dictionary matching and semantic similarity calculations. First, fill-in-the-blank items with word-level granularity are divided into two types of test sites: the subject term test site and the common word test site. We propose an algorithm for identifying an item’s test site. Then, a subject term dictionary with self-feedback learning ability is constructed to support the scoring of subject term test sites. The Tongyici Cilin semantic dictionary is used for scoring common word test sites. For fill-in-the-blank items with sentence-level granularity, an improved P-means model is used to generate sentence vectors for the standard answer and the examinee’s answer, and then the semantic similarity between the two answers is obtained by calculating the cosine distance between the two sentence vectors. Experimental results on actual test data show that the proposed algorithm has a maximum accuracy of 94.3% and achieves good results.
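As a hedged illustration of the sentence-level step described in this abstract, the sketch below builds a sentence vector by concatenating per-dimension power means of word vectors (the plain power-means idea, not the improved variant the authors propose) and compares two answers by cosine similarity. The choice of powers, the two-dimensional toy word vectors, and the function names are assumptions made for the example.

```python
import math


def power_mean_embedding(word_vectors, powers=(1.0, float("inf"), float("-inf"))):
    """Concatenate per-dimension power means of a sentence's word vectors.

    p = 1 is the arithmetic mean, p = +inf the maximum, p = -inf the minimum.
    Other finite powers are only well defined here for non-negative values.
    """
    dims = len(word_vectors[0])
    sentence = []
    for p in powers:
        for d in range(dims):
            column = [vec[d] for vec in word_vectors]
            if p == float("inf"):
                sentence.append(max(column))
            elif p == float("-inf"):
                sentence.append(min(column))
            elif p == 1.0:
                sentence.append(sum(column) / len(column))
            else:
                mean_of_powers = sum(x ** p for x in column) / len(column)
                sentence.append(mean_of_powers ** (1.0 / p))
    return sentence


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical word vectors for a standard answer and an examinee's answer.
standard_answer = [[0.2, 0.8], [0.6, 0.4]]
examinee_answer = [[0.9, 0.1], [0.7, 0.2]]

standard_vector = power_mean_embedding(standard_answer)
examinee_vector = power_mean_embedding(examinee_answer)
score = cosine_similarity(standard_vector, examinee_vector)
```

With three powers and two-dimensional word vectors, each sentence vector has six components. A scoring rule would then compare `score` against a threshold, which the abstract does not specify.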
Affiliation(s)
- Dong Wang
- School of Mathematics and Big Data, Guizhou Education University, Guiyang, China
- Big Data Science and Intelligent Engineering Research Institute, Guizhou Education University, Guiyang, China
- Yong Zhao
- School of Mathematics and Big Data, Guizhou Education University, Guiyang, China
- Big Data Science and Intelligent Engineering Research Institute, Guizhou Education University, Guiyang, China
- Hong Lin
- School of Mathematics and Big Data, Guizhou Education University, Guiyang, China
- Xin Zuo
- School of Mathematics and Big Data, Guizhou Education University, Guiyang, China
|
27
|
|
28
|
A large reproducible benchmark of ontology-based methods and word embeddings for word similarity. INFORM SYST 2021. [DOI: 10.1016/j.is.2020.101636] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]
|
29
|
|
30
|
Besbes G, Ben Abdallah Ben Lamine S, Baazaoui-Zghal H. Personalized Retrieval in the Medical Domain: A NoSQL Solution Based on Ontology Building. JOURNAL OF INFORMATION & KNOWLEDGE MANAGEMENT 2020. [DOI: 10.1142/s0219649220500410] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
Abstract
Managing medical information in a Big Data context is a challenging task since searching for relevant information in a large volume of data needs advanced treatments. Medical data is a special type of data because it comes from different sources and in different formats and encapsulates medical knowledge. Personalized retrieval is necessary when it comes to medical data management. In fact, the patient’s medical record needs to be taken into account in order to offer relevant documents since it contains his/her medical history. The proposed approach offers an ontology building process based on the patient’s medical record. The built ontology is then used for personalized information retrieval as well as user similarity computation. The approach is composed of three layers: (1) Data layer, (2) Treatment layer and (3) Semantic layer and offers three treatments: (1) Ontology building, (2) Query reformulation and (3) User similarity computation. An application supporting all three layers has been implemented and it allowed an experimental evaluation of the proposal. The results show an improvement in the relevancy of returned medical documents.
Affiliation(s)
- Ghada Besbes
- Riadi Laboratory, ENSI, University of Manouba, Tunisia
|
31
|
Jiang S, Wu W, Tomita N, Ganoe C, Hassanpour S. Multi-Ontology Refined Embeddings (MORE): A hybrid multi-ontology and corpus-based semantic representation model for biomedical concepts. J Biomed Inform 2020; 111:103581. [PMID: 33010425 DOI: 10.1016/j.jbi.2020.103581] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/14/2020] [Revised: 09/22/2020] [Accepted: 09/26/2020] [Indexed: 11/25/2022]
Abstract
OBJECTIVE Currently, a major limitation for natural language processing (NLP) analyses in clinical applications is that concepts are not effectively referenced in various forms across different texts. This paper introduces Multi-Ontology Refined Embeddings (MORE), a novel hybrid framework that incorporates domain knowledge from multiple ontologies into a distributional semantic model, learned from a corpus of clinical text. MATERIALS AND METHODS We use the RadCore and MIMIC-III free-text datasets for the corpus-based component of MORE. For the ontology-based part, we use the Medical Subject Headings (MeSH) ontology and three state-of-the-art ontology-based similarity measures. In our approach, we propose a new learning objective, modified from the sigmoid cross-entropy objective function. RESULTS AND DISCUSSION We used two established datasets of semantic similarities among biomedical concept pairs to evaluate the quality of the generated word embeddings. On the first dataset with 29 concept pairs, with similarity scores established by physicians and medical coders, MORE's similarity scores have the highest combined correlation (0.633), which is 5.0% higher than that of the baseline model, and 12.4% higher than that of the best ontology-based similarity measure. On the second dataset with 449 concept pairs, MORE's similarity scores have a correlation of 0.481, based on the average of four medical residents' similarity ratings, and that outperforms the skip-gram model by 8.1%, and the best ontology measure by 6.9%. Furthermore, MORE outperforms three pre-trained transformer-based word embedding models (i.e., BERT, ClinicalBERT, and BioBERT) on both datasets. CONCLUSION MORE incorporates knowledge from several biomedical ontologies into an existing corpus-based distributional semantics model, improving both the accuracy of the learned word embeddings and the extensibility of the model to a broader range of biomedical concepts. 
MORE allows for more accurate clustering of concepts across a wide range of applications, such as analyzing patient health records to identify subjects with similar pathologies, or integrating heterogeneous clinical data to improve interoperability between hospitals.
Affiliation(s)
- Steven Jiang
- Department of Computer Science, Dartmouth College, Hanover, NH 03755, USA
- Weiyi Wu
- Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA
- Naofumi Tomita
- Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA
- Craig Ganoe
- Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA
- Saeed Hassanpour
- Department of Computer Science, Dartmouth College, Hanover, NH 03755, USA; Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA; Department of Epidemiology, Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA
|
32
|
Colla D, Mensa E, Radicioni DP. Novel metrics for computing semantic similarity with sense embeddings. Knowl Based Syst 2020. [DOI: 10.1016/j.knosys.2020.106346] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]
|
33
|
Hier DB, Kopel J, Brint SU, Wunsch DC, Olbricht GR, Azizi S, Allen B. Evaluation of standard and semantically-augmented distance metrics for neurology patients. BMC Med Inform Decis Mak 2020; 20:203. [PMID: 32843023 PMCID: PMC7448345 DOI: 10.1186/s12911-020-01217-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/27/2020] [Accepted: 08/12/2020] [Indexed: 12/23/2022] Open
Abstract
Background Patient distances can be calculated based on signs and symptoms derived from an ontological hierarchy. There is controversy as to whether patient distance metrics that consider the semantic similarity between concepts can outperform standard patient distance metrics that are agnostic to concept similarity. The choice of distance metric can dominate the performance of classification or clustering algorithms. Our objective was to determine if semantically augmented distance metrics would outperform standard metrics on machine learning tasks. Methods We converted the neurological findings from 382 published neurology cases into sets of concepts with corresponding machine-readable codes. We calculated patient distances by four different metrics (cosine distance, a semantically augmented cosine distance, Jaccard distance, and a semantically augmented bipartite distance). Semantic augmentation for two of the metrics depended on concept similarities from a hierarchical neuro-ontology. For machine learning algorithms, we used the patient diagnosis as the ground truth label and patient findings as machine learning features. We assessed classification accuracy for four classifiers and cluster quality for two clustering algorithms for each of the distance metrics. Results Inter-patient distances were smaller when the distance metric was semantically augmented. Classification accuracy and cluster quality were not significantly different by distance metric. Conclusion Although semantic augmentation reduced inter-patient distances, we did not find improved classification accuracy or improved cluster quality with semantically augmented patient distance metrics when applied to a dataset of neurology patients. Further work is needed to assess the utility of semantically augmented patient distances.
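The two standard (non-augmented) metrics named in this abstract can be sketched for patients represented as sets of finding codes. This is a minimal illustration that treats each patient as a binary vector over findings; the finding names are hypothetical, and the semantically augmented variants, which depend on the authors' neuro-ontology, are not reproduced here.

```python
import math


def jaccard_distance(findings_a, findings_b):
    """1 - |A ∩ B| / |A ∪ B|; two empty sets are treated as identical."""
    a, b = set(findings_a), set(findings_b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def cosine_distance(findings_a, findings_b):
    """Cosine distance between binary indicator vectors of the two sets.

    For binary vectors the dot product is |A ∩ B| and the norms are
    sqrt(|A|) and sqrt(|B|). An empty set gets the maximum distance 1.0.
    """
    a, b = set(findings_a), set(findings_b)
    if not a or not b:
        return 1.0
    return 1.0 - len(a & b) / math.sqrt(len(a) * len(b))


# Hypothetical patients described by neurological findings.
patient_1 = {"ataxia", "tremor", "diplopia"}
patient_2 = {"ataxia", "tremor", "aphasia"}

jaccard = jaccard_distance(patient_1, patient_2)
cosine = cosine_distance(patient_1, patient_2)
```

Both metrics count only exact matches between findings, which is what "agnostic to concept similarity" means in the abstract: two closely related but distinct findings contribute nothing to the overlap.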
Affiliation(s)
- Daniel B Hier
- Department of Neurology and Rehabilitation, University of Illinois at Chicago, Chicago, IL, 60612, USA
- Jonathan Kopel
- Department of Internal Medicine, Texas Tech University Health Sciences Center, Lubbock, TX, USA
- Steven U Brint
- Department of Neurology and Rehabilitation, University of Illinois at Chicago, Chicago, IL, 60612, USA
- Donald C Wunsch
- Department of Electrical and Computer Engineering, Missouri University of Science and Technology, Rolla, MO, 65401, USA
- Gayla R Olbricht
- Department of Mathematics and Statistics, Missouri University of Science and Technology, Rolla, MO, 65401, USA
- Sima Azizi
- Department of Electrical and Computer Engineering, Missouri University of Science and Technology, Rolla, MO, 65401, USA
- Blaine Allen
- Department of Electrical and Computer Engineering, Missouri University of Science and Technology, Rolla, MO, 65401, USA
|
34
|
Budán PD, Escañuela Gonzalez MG, Budán MCD, Martinez MV, Simari GR. Similarity notions in bipolar abstract argumentation. ARGUMENT & COMPUTATION 2020. [DOI: 10.3233/aac-190479] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
Affiliation(s)
- Paola Daniela Budán
- Faculty of Exact Sciences and Technologies, Universidad Nacional de Santiago del Estero, Santiago del Estero, Argentina
- Artificial Intelligence R&D Laboratory, Department of Computer Science and Engineering, Universidad Nacional del Sur, Bahía Blanca, Argentina
- Melisa Gisselle Escañuela Gonzalez
- Faculty of Exact Sciences and Technologies, Universidad Nacional de Santiago del Estero, Santiago del Estero, Argentina
- Argentine National Council of Scientific and Technical Research (CONICET), Buenos Aires, Argentina
- Maximiliano Celmo David Budán
- Faculty of Exact Sciences and Technologies, Universidad Nacional de Santiago del Estero, Santiago del Estero, Argentina
- Institute for Computer Science and Engineering (CONICET-UNS), Buenos Aires-Bahía Blanca, Argentina
- Argentine National Council of Scientific and Technical Research (CONICET), Buenos Aires, Argentina
- Maria Vanina Martinez
- Argentine National Council of Scientific and Technical Research (CONICET), Buenos Aires, Argentina
- Institute for Computer Science Research (CONICET-UBA), Buenos Aires, Argentina
- Department of Computer Science, Universidad de Buenos Aires (UBA), Buenos Aires, Argentina
- Guillermo Ricardo Simari
- Artificial Intelligence R&D Laboratory, Department of Computer Science and Engineering, Universidad Nacional del Sur, Bahía Blanca, Argentina
- Institute for Computer Science and Engineering (CONICET-UNS), Buenos Aires-Bahía Blanca, Argentina
- Department of Computer Science and Engineering, Universidad Nacional del Sur (UNS), Buenos Aires-Bahía Blanca, Argentina
|
35
|
|
36
|
Abstract
This paper presents SemSime, a method based on semantic similarity for searching over a set of digital resources previously annotated by means of concepts from a weighted reference ontology. SemSime is an enhancement of SemSim: it uses a frequency approach for weighting the ontology, and it refines both the user request and the digital resources with the addition of rating scores. The scores are High, Medium, and Low. In the user request, they indicate the preferences assigned by the user to each of the concepts representing the search criteria, whereas in the annotation of the digital resources they represent the levels of quality associated with each concept in describing the resources. SemSime has been evaluated, and the results of the experiment show that it performs better than SemSim and an evolution of it, referred to as SemSimRV.
|
37
|
|
38
|
Leveraging synonymy and polysemy to improve semantic similarity assessments based on intrinsic information content. Artif Intell Rev 2020. [DOI: 10.1007/s10462-019-09725-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
|
39
|
|
40
|
Tsaramirsis K, Tsaramirsis G, Khan FQ, Ahmad A, Khadidos AO, Khadidos A. More Agility to Semantic Similarities Algorithm Implementations. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2019; 17:ijerph17010267. [PMID: 31905999 PMCID: PMC6982023 DOI: 10.3390/ijerph17010267] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 12/08/2019] [Revised: 12/22/2019] [Accepted: 12/28/2019] [Indexed: 02/07/2023]
Abstract
Algorithms for measuring semantic similarity between Gene Ontology (GO) terms have become a popular area of research in bioinformatics, as they can help to detect functional associations between genes and their potential impact on the health and well-being of humans, animals, and plants. While the focus of the research is on the design and improvement of GO semantic similarity algorithms, such algorithms still need to be implemented before they can be used to solve actual biological problems. This can be challenging given that the potential users usually come from a biology background and are not programmers. A number of implementations exist for some well-established algorithms, but these implementations are not generic enough to support any algorithm other than the ones they were designed for. The aim of this paper is to shift the focus away from implementation, allowing researchers to focus on algorithm design and execution rather than implementation. This is achieved by an implementation approach capable of understanding and executing user-defined GO semantic similarity algorithms. Questions and answers were used to define the user-defined algorithm. Additionally, this approach understands any directed acyclic graph in an Open Biomedical Ontologies (OBO)-like format and its annotations. Software developers of similar applications can also benefit by using this as a template for their applications.
Affiliation(s)
- Kostandinos Tsaramirsis
- Infosuccess3D, 55 Navarxou Kountourgiotou Road, Aigaleo, 122 42 Athens, Greece
- Correspondence: (K.T.); (G.T.)
- Georgios Tsaramirsis
- Department of Information Technology, Faculty of Computing And IT, King Abdulaziz University, Jeddah 21589, Saudi Arabia
- Correspondence: (K.T.); (G.T.)
- Fazal Qudus Khan
- Department of Information Technology, Faculty of Computing And IT, King Abdulaziz University, Jeddah 21589, Saudi Arabia
- Awais Ahmad
- Dipartimento di Informatica, Università degli Studi di Milano, 20122 Milan, Italy
- Alaa Omar Khadidos
- Department of Information Systems, Faculty of Computing And IT, King Abdulaziz University, Jeddah 21589, Saudi Arabia
- Adil Khadidos
- Department of Information Technology, Faculty of Computing And IT, King Abdulaziz University, Jeddah 21589, Saudi Arabia
|
41
|
Semantic association computation: a comprehensive survey. Artif Intell Rev 2019. [DOI: 10.1007/s10462-019-09781-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/25/2022]
|
42
|
Outsourcing analyses on privacy-protected multivariate categorical data stored in untrusted clouds. Knowl Inf Syst 2019. [DOI: 10.1007/s10115-019-01424-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/25/2022]
|
43
|
Molina Beltrán C, Segura Navarrete AA, Vidal-Castro C, Rubio-Manzano C, Martínez-Araneda C. Improving the affective analysis in texts. ELECTRONIC LIBRARY 2019. [DOI: 10.1108/el-11-2018-0219] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]
Abstract
Purpose
This paper aims to propose a method for automatically labelling an affective lexicon with intensity values by using the WordNet Similarity (WS) software package with the purpose of improving the results of an affective analysis process, which is relevant to interpreting the textual information that is available in social networks. The hypothesis states that it is possible to improve affective analysis by using a lexicon that is enriched with the intensity values obtained from similarity metrics. Encouraging results were obtained when an affective analysis based on a labelled lexicon was compared with that based on another lexicon without intensity values.
Design/methodology/approach
The authors propose a method for the automatic extraction of the affective intensity values of words using the similarity metrics implemented in WS. First, the intensity values were calculated for words having an affective root in WordNet. Then, to evaluate the effectiveness of the proposal, the results of the affective analysis based on a labelled lexicon were compared to the results of an analysis with and without affective intensity values.
Findings
The main contribution of this research is a method for the automatic extraction of the intensity values of affective words used to enrich a lexicon compared with the manual labelling process. The results obtained from the affective analysis with the new lexicon are encouraging, as they provide a better performance than those achieved using a lexicon without affective intensity values.
Research limitations/implications
Given the restrictions for calculating the similarity between two words, the lexicon labelled with intensity values is a subset of the original lexicon, which means that a large proportion of the words in the corpus are not labelled in the new lexicon.
Practical implications
The practical implications of this work include providing tools to improve the analysis of the feelings of the users of social networks. In particular, it is of interest to provide an affective lexicon that improves attempts to solve the problems of a digital society, such as the detection of cyberbullying. In this case, by achieving greater precision in the detection of emotions, it is possible to detect the roles of participants in a situation of cyberbullying, for example, the bully and victim. Other problems in which the application of affective lexicons is of importance are the detection of aggressiveness against women or gender violence or the detection of depressive states in young people and children.
Social implications
This work is interested in providing an affective lexicon that improves attempts to solve the problems of a digital society, such as the detection of cyberbullying. In this case, by achieving greater precision in the detection of emotions, it is possible to detect the roles of participants in a situation of cyber bullying, for example, the bully and victim. Other problems in which the application of affective lexicons is of importance are the detection of aggressiveness against women or gender violence or the detection of depressive states in young people and children.
Originality/value
The originality of the research lies in the proposed method for automatically labelling the words of an affective lexicon with intensity values by using WS. To date, a lexicon labelled with intensity values has been constructed using the opinions of experts, but that method is more expensive and requires more time than other existing methods. On the other hand, the new method developed herein is applicable to larger lexicons, requires less time and facilitates automatic updating.
|
44
|
Nguyen HT, Duong PH, Cambria E. Learning short-text semantic similarity with word embeddings and external knowledge sources. Knowl Based Syst 2019. [DOI: 10.1016/j.knosys.2019.07.013] [Citation(s) in RCA: 47] [Impact Index Per Article: 9.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/27/2022]
|
45
|
Li S, Wang G, Yang J. Survey on cloud model based similarity measure of uncertain concepts. CAAI TRANSACTIONS ON INTELLIGENCE TECHNOLOGY 2019. [DOI: 10.1049/trit.2019.0021] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022] Open
Affiliation(s)
- Shuai Li
- Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, No. 2 Chongwen Street, Nan'an District, Chongqing, People's Republic of China
- Guoyin Wang
- Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, No. 2 Chongwen Street, Nan'an District, Chongqing, People's Republic of China
- Jie Yang
- Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, No. 2 Chongwen Street, Nan'an District, Chongqing, People's Republic of China
|
46
|
Bazan J, Bazan-Socha S, Ochab M, Buregwa-Czuma S, Nowakowski T, Woźniak M. Effective construction of classifiers with the k-NN method supported by a concept ontology. Knowl Inf Syst 2019. [DOI: 10.1007/s10115-019-01391-w] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2022]
|
47
|
Fuentes-Lorenzo D, Morato J, Sanchez-Cuadrado S, Sanchez L. Building concept maps by adapting semantic distance metrics to Wikipedia. EDUCATION FOR INFORMATION 2019. [DOI: 10.3233/efi-190279] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
Affiliation(s)
- Jorge Morato
- Computer Science Department, Universidad Carlos III, Madrid, Spain
- Luis Sanchez
- Telematics Department, Universidad Carlos III, Madrid, Spain
|
48
|
Zhu X, Yang X, Huang Y, Guo Q, Zhang B. Measuring similarity and relatedness using multiple semantic relations in WordNet. Knowl Inf Syst 2019. [DOI: 10.1007/s10115-019-01387-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/26/2022]
|
49
|
Gopalakrishnan V, Jha K, Xun G, Ngo HQ, Zhang A. Towards self-learning based hypotheses generation in biomedical text domain. Bioinformatics 2019; 34:2103-2115. [PMID: 29293920 DOI: 10.1093/bioinformatics/btx837] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/26/2017] [Accepted: 12/22/2017] [Indexed: 01/01/2023] Open
Abstract
Motivation The overwhelming number of research articles in the domain of biomedicine might cause important connections to remain unnoticed. Literature Based Discovery is a sub-field within biomedical text mining that peruses these articles to formulate high-confidence hypotheses on possible connections between medical concepts. Although many alternative methodologies have been proposed over the last decade, they still suffer from scalability issues. The primary reason, apart from the dense inter-connections between biological concepts, is the absence of information on the factors that lead to edge formation. In this work, we formulate this problem as a collaborative filtering task and leverage the relatively new concept of word vectors to learn and mimic the implicit edge-formation process. Using a single-class classifier, we prune the search space of redundant and irrelevant hypotheses to increase the efficiency of the system while maintaining, and in some cases even boosting, the overall accuracy. Results We show that our proposed framework is able to prune up to 90% of the hypotheses while still retaining high recall in top-K results. This level of efficiency enables the discovery algorithm to look for higher-order hypotheses, something that was infeasible until now. Furthermore, the generic formulation makes our approach flexible enough to perform both open and closed discovery. We also experimentally validate that the core data structures upon which the system bases its decisions have a high concordance with the opinion of the experts. This, coupled with the ability to understand the edge-formation process, provides us with interpretable results without any manual intervention. Availability and implementation The relevant JAVA codes are available at: https://github.com/vishrawas/Medline-Code_v2. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Vishrawas Gopalakrishnan
- Department of Computer Science and Engineering, State University of New York at Buffalo, Buffalo, NY, USA
- Kishlay Jha
- Department of Computer Science and Engineering, State University of New York at Buffalo, Buffalo, NY, USA
- Guangxu Xun
- Department of Computer Science and Engineering, State University of New York at Buffalo, Buffalo, NY, USA
- Hung Q Ngo
- Department of Computer Science and Engineering, State University of New York at Buffalo, Buffalo, NY, USA
- Aidong Zhang
- Department of Computer Science and Engineering, State University of New York at Buffalo, Buffalo, NY, USA
|
50
|
Torjmen-Khemakhem M, Gasmi K. Document/query expansion based on selecting significant concepts for context based retrieval of medical images. J Biomed Inform 2019; 95:103210. [DOI: 10.1016/j.jbi.2019.103210] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/05/2019] [Revised: 05/15/2019] [Accepted: 05/16/2019] [Indexed: 11/28/2022]
|