1. Alnowaiser K. Scientific text citation analysis using CNN features and ensemble learning model. PLoS One 2024; 19:e0302304. [PMID: 38805427 PMCID: PMC11132466 DOI: 10.1371/journal.pone.0302304] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2023] [Accepted: 04/02/2024] [Indexed: 05/30/2024] [Open Access]
Abstract
Citation illustrates the link between citing and cited documents. Citations are used to analyze different aspects of achievement, such as a journal's impact factor, an author's ranking, and peer judgment. However, all citations are given the same weight when these important metrics are determined, even though academics contend that not all citations deserve equal weight. Such rankings are predominantly based on quantitative measures, and the qualitative aspect is ignored entirely. A fair evaluation needs a qualitative evaluation of citations in addition to a quantitative one. Many existing works that use qualitative evaluation treat the task as binary and categorize citations as important or unimportant. This study considers a multi-class citation sentiment task on imbalanced data and presents a novel framework for sentiment analysis of in-text citations in research articles. In the proposed technique, features are extracted using a convolutional neural network (CNN), and classification is performed by a voting classifier that combines Logistic Regression (LR) and Stochastic Gradient Descent (SGD). The class imbalance problem is handled with the synthetic minority oversampling technique (SMOTE). To evaluate the efficacy of the proposed approach for citation analysis, extensive experiments on SMOTE-generated data compare it with machine learning models that use term frequency (TF) and term frequency-inverse document frequency (TF-IDF) features. The proposed voting classifier using CNN features achieves a score of 0.99 for each of accuracy, precision, recall, and F1. This work not only advances the field of sentiment analysis in academic citations but also underscores the importance of incorporating qualitative aspects when evaluating the impact and sentiment conveyed through citations.
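SMOTE, named in the abstract above, balances classes by creating synthetic minority samples on the line segment between a minority sample and one of its k nearest minority-class neighbours. A minimal pure-Python sketch of that interpolation step follows, assuming features are dense numeric vectors (for example, CNN or TF-IDF features) stored as lists; the function name `smote` and the toy data are hypothetical, and this is not the paper's implementation.

```python
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points on line segments between minority samples
    and one of their k nearest minority-class neighbours (SMOTE-style)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # k nearest minority neighbours of every minority sample (itself excluded)
    neighbours = []
    for i, x in enumerate(minority):
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: sq_dist(x, minority[j]))
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        nn = minority[rng.choice(neighbours[i])]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (m - b) for b, m in zip(base, nn)])
    return synthetic


# Toy minority class: four 2-D feature vectors (hypothetical values).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_synthetic=6, k=2)
print(len(new_points))
```

In a real evaluation, oversampling is normally applied only to the training split, so that synthetic points derived from test items cannot leak into the test set.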
Affiliation(s)
- Khaled Alnowaiser
- Department of Computer Engineering, College of Computer Engineering and Sciences, Prince Sattam Bin Abdulaziz University, Al-Kharj, Saudi Arabia
2. Wang M, Zhang X, Zhong H, Wang D. AIRank: An algorithm on evaluating the academic influence of papers based on heterogeneous academic network. J Inf Sci 2023. [DOI: 10.1177/01655515231151406] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/05/2023]
Abstract
Evaluating the academic influence of papers is an active issue in scientific research management. Academic big data, in which different types of academic entities coexist, is a rich resource for evaluating academic influence from a broader and more comprehensive perspective. Based on academic big data, a heterogeneous academic network is constructed from the links within and between three types of academic entities: authors, papers and venues. A new academic influence ranking algorithm, AIRank, is then proposed to evaluate papers’ academic influence. Unlike existing academic influence ranking algorithms, AIRank innovates in two respects. (1) AIRank distinguishes the intensity of influence transmission between different node pairs. Instead of distributing influence evenly among node pairs, AIRank quantifies the transmission intensity between node pairs by examining the citation emotional attribute, the semantic similarity and the difference in academic quality between them. Based on these intensity characteristics, AIRank distributes and transmits influence among node pairs. (2) AIRank incorporates influence transmitted from heterogeneous neighbours when evaluating papers’ influence. Using the academic influence of author and venue nodes, AIRank fine-tunes the iteration formula for paper influence to rank papers under the joint influence of homogeneous and heterogeneous neighbours. Experimental results show that, compared with rankings based on citation frequency and the PageRank algorithm, AIRank produces more differentiated and reasonable academic influence rankings.
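The abstract contrasts AIRank with strategies that split influence evenly among node pairs. The sketch below illustrates only that contrast with a generic weighted PageRank power iteration, in which each node passes its score to its out-neighbours in proportion to edge weight. It is not AIRank: the edge weights are assumed to be given (standing in for an intensity score such as one built from citation sentiment or semantic similarity), heterogeneous author and venue neighbours are omitted, and the function name `weighted_rank` is hypothetical.

```python
def weighted_rank(edges, damping=0.85, iters=100):
    """Power iteration in which each node passes its score to its out-neighbours
    in proportion to edge weight (equal weights give ordinary PageRank)."""
    nodes = set(edges)
    for targets in edges.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)
    score = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in nodes}
        dangling = 0.0  # score held by nodes with no outgoing weight
        for v in nodes:
            targets = edges.get(v, {})
            total = sum(targets.values())
            if total == 0:
                dangling += score[v]
                continue
            for u, w in targets.items():
                new[u] += damping * score[v] * w / total
        # Spread dangling score uniformly so the scores keep summing to 1.
        for v in nodes:
            new[v] += damping * dangling / n
        score = new
    return score


# Paper "A" cites "B" and "C"; influence flows from the citing to the cited paper.
# The weights stand in for a hypothetical per-citation intensity score.
weighted = weighted_rank({"A": {"B": 3.0, "C": 1.0}})
even = weighted_rank({"A": {"B": 1.0, "C": 1.0}})
print(weighted["B"] > weighted["C"], abs(even["B"] - even["C"]) < 1e-12)
```

With equal weights, "B" and "C" receive identical scores; with intensity weights, the more strongly cited paper ranks higher, which is the kind of differentiation the abstract attributes to intensity-based transmission.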
Affiliation(s)
- Mingyang Wang
- College of Information and Computer Engineering, Northeast Forestry University, People’s Republic of China
- Xinyue Zhang
- College of Information and Computer Engineering, Northeast Forestry University, People’s Republic of China
- Hongwei Zhong
- College of Information and Computer Engineering, Northeast Forestry University, People’s Republic of China
- Dailin Wang
- Library, Northeast Forestry University, People’s Republic of China
3. Toward potential hybrid features evaluation using MLP-ANN binary classification model to tackle meaningful citations. Scientometrics 2022. [DOI: 10.1007/s11192-022-04530-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
4. SDCF: semi-automatically structured dataset of citation functions. Scientometrics 2022. [DOI: 10.1007/s11192-022-04471-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2022]
Abstract
There is increasing research interest in the automatic detection of citation functions, that is, the reasons why authors of academic papers cite previous work. A machine learning approach to this task requires a large dataset with varied citation function labels. However, existing datasets contain few instances and a limited number of labels, and most labels were built for narrow research fields. To address these issues, this paper proposes a semi-automatic approach to developing a large dataset of citation functions based on two types of datasets. The first type contains 5668 manually labeled instances used to develop a new labeling scheme of citation functions, and the second type is the final dataset, which is built automatically. Our labeling scheme covers papers from various areas of computer science and results in five coarse labels and 21 fine-grained labels. To validate the scheme, two annotators labeled 421 instances in annotation experiments, producing Cohen’s Kappa values of 0.85 for coarse labels and 0.71 for fine-grained labels. We then built models on the first dataset in two classification stages: filtering and fine-grained classification. The classification followed several scenarios, including active learning (AL) in a low-resource setting. In the filtering stage, Bidirectional Encoder Representations from Transformers (BERT)-based AL achieved 90.29% accuracy and outperformed the other methods. In the fine-grained stage, the SciBERT-based AL strategy achieved a competitive 81.15% accuracy, slightly lower than the non-AL strategy. These results show that AL is promising, since it requires less than half of the dataset. Considering the number of labels, this paper releases the largest dataset, consisting of 1,840,815 instances.
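The agreement figures above are Cohen's kappa values, defined as kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement between two annotators and p_e is the agreement expected by chance from each annotator's label frequencies. A minimal sketch of that computation follows; the function name `cohens_kappa` and the label strings are hypothetical and are not taken from the SDCF labeling scheme.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # p_o: observed proportion of items on which the annotators agree
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # p_e: agreement expected by chance from each annotator's label frequencies
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical coarse citation-function labels from two annotators.
ann_1 = ["background", "background", "uses", "uses", "background", "compare"]
ann_2 = ["background", "uses", "uses", "uses", "background", "compare"]
print(round(cohens_kappa(ann_1, ann_2), 4))  # 17/23, prints 0.7391
```

Here the annotators agree on 5 of 6 items (p_o = 5/6), chance agreement is p_e = 13/36, and kappa = 17/23, which is lower than the raw 83% agreement because part of that agreement is expected by chance.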
5. Scientometric Analysis and Classification of Research Using Convolutional Neural Networks: A Case Study in Data Science and Analytics. Electronics 2022. [DOI: 10.3390/electronics11132066] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 01/27/2023]
Abstract
As the volume of published literature grows, classification methods based on bibliometric information and traditional machine learning approaches face performance problems: classifications are overly coarse and accuracy is low. This study presents a deep learning approach for scientometric analysis and classification of scientific literature based on convolutional neural networks (CNN). Three dimensions (publication features, author features, and content features) were divided into explicit and implicit features, forming a set of scientometric terms through explicit feature extraction and implicit feature mapping. The weighted scientometric term vectors are fed into a CNN model to achieve dual-label classification of literature by research content and methods. The effectiveness of the proposed model is demonstrated with an application example from the data science and analytics literature. The empirical results show that the proposed scientometric classification model outperforms comparable machine learning classification methods in precision, recall, and F1-score. It is also more accurate than deep learning classification based solely on explicit and dominant features. This study provides a methodological guide for fine-grained classification of scientific literature and a thorough investigation of its practice.
6. Wang Z, Wang K, Liu J, Huang J, Chen H. Measuring the innovation of method knowledge elements in scientific literature. Scientometrics 2022. [DOI: 10.1007/s11192-022-04350-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
7. Ghosal T, Tiwary P, Patton R, Stahl C. Towards establishing a research lineage via identification of significant citations. Quantitative Science Studies 2022. [DOI: 10.1162/qss_a_00170] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/04/2022] [Open Access]
Abstract
Finding the lineage of a research topic is crucial for understanding the prior state of the art and advancing scientific displacement. The deluge of scholarly articles makes it difficult to locate the most relevant previous work. It causes researchers to spend a considerable amount of time building up their literature list. Citations play a crucial role in discovering relevant literature. However, not all citations are created equal. The majority of the citations that a paper receives provide contextual and background information to the citing papers. In those cases, the cited paper is not central to the theme of citing papers. However, some papers build upon a given paper and further the research frontier. In those cases, the concerned cited paper plays a pivotal role in the citing paper. Hence, the nature of the citation that the former receives from the latter is significant. In this work, we discuss our investigations towards discovering significant citations of a given paper. We further show how we can leverage significant citations to build a research lineage via a significant citation graph. We demonstrate the efficacy of our idea with two real-life case studies. Our experiments yield promising results with respect to the current state of the art in classifying significant citations, outperforming the earlier ones by a relative margin of 20 points in terms of precision. We hypothesize that such an automated system can facilitate relevant literature discovery and help identify knowledge flow for a particular category of papers.
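The abstract does not spell out how the lineage is assembled from the significant citation graph. One simple reading, assumed here, is to keep only the citation edges classified as significant and follow them backward from a paper in breadth-first order. The sketch assumes the significance labels already exist (for example, from a classifier); the function name `research_lineage` and the paper IDs are hypothetical, and this is not the authors' code.

```python
from collections import deque


def research_lineage(citations, start):
    """Return (paper, hops) pairs reachable from `start` by following only
    significant citations from citing paper to cited paper, in BFS order."""
    graph = {}
    for citing, cited, significant in citations:
        if significant:
            graph.setdefault(citing, []).append(cited)
    depth = {start: 0}
    order = []
    queue = deque([start])
    while queue:
        paper = queue.popleft()
        for prior in graph.get(paper, []):
            if prior not in depth:  # visit each ancestor once, at its shortest distance
                depth[prior] = depth[paper] + 1
                order.append(prior)
                queue.append(prior)
    return [(p, depth[p]) for p in order]


# Hypothetical (citing, cited, is_significant) triples.
citations = [
    ("P4", "P3", True),
    ("P4", "X", False),  # background citation: excluded from the lineage
    ("P3", "P2", True),
    ("P3", "P1", True),
    ("P2", "P1", True),
    ("X", "Y", True),
]
print(research_lineage(citations, "P4"))  # [('P3', 1), ('P2', 2), ('P1', 2)]
```

Because the background citation to "X" is dropped, neither "X" nor its own significant ancestor "Y" appears in the lineage of "P4".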
8. An X, Sun X, Xu S. Important citations identification with semi-supervised classification model. Scientometrics 2022. [DOI: 10.1007/s11192-021-04212-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 10/19/2022]
9. Kunnath SN, Herrmannova D, Pride D, Knoth P. A meta-analysis of semantic classification of citations. Quantitative Science Studies 2021. [DOI: 10.1162/qss_a_00159] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/04/2022] [Open Access]
Abstract
The aim of this literature review is to examine the current state of the art in the area of citation classification. In particular, we investigate the approaches for characterizing citations based on their semantic type. We conduct this literature review as a meta-analysis covering 60 scholarly articles in this domain. Although we included some of the manual pioneering works in this review, more emphasis is placed on the later automated methods, which use Machine Learning and Natural Language Processing (NLP) for analyzing the fine-grained linguistic features in the surrounding text of citations. The sections are organized based on the steps involved in the pipeline for citation classification. Specifically, we explore the existing classification schemes, data sets, preprocessing methods, extraction of contextual and noncontextual features, and the different types of classifiers and evaluation approaches. The review highlights the importance of identifying the citation types for research evaluation, the challenges faced by the researchers in the process, and the existing research gaps in this field.
Affiliation(s)
- David Pride
- Knowledge Media Institute (KMi), The Open University, Milton Keynes, UK
- Petr Knoth
- Knowledge Media Institute (KMi), The Open University, Milton Keynes, UK
10. Applying text similarity algorithm to analyze the triangular citation behavior of scientists. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2021.107362] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Indexed: 11/21/2022]
11. An X, Sun X, Xu S, Hao L, Li J. Important citations identification by exploiting generative model into discriminative model. J Inf Sci 2021. [DOI: 10.1177/0165551521991034] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/16/2022]
Abstract
Although citations between scientific documents are regarded as a vehicle for the dissemination, inheritance and development of scientific knowledge, not all citations are equal. A plethora of taxonomies and machine-learning models have been used to tackle citation function and importance classification from a qualitative perspective. Inspired by the success of kernel functions derived from generative models in improving the performance of the support vector machine (SVM) model, this work exploits the potential of combining generative and discriminative models for citation importance classification. Specifically, generative features are produced by a topic model, the citation influence model (CIM), and then fed, together with 13 other traditional features, to two traditional discriminative machine-learning models, SVM and random forest (RF), and to a deep learning model, a convolutional neural network (CNN), to identify important citations. Extensive experiments are performed on two data sets with different characteristics. All three models perform better on the data set from a single discipline. A likely explanation is that the patterns of important citations vary by field, which prevents machine-learning models from effectively learning discriminative patterns from publications in multiple domains. The RF classifier outperforms the SVM classifier, in line with many prior studies. However, the CNN model does not achieve the desired performance because of the small size of the data set. Finally, our CIM-based features further improve the identification of important citations.
Affiliation(s)
- Xin An
- School of Economics & Management, Beijing Forestry University, P.R. China
- Xin Sun
- School of Economics & Management, Beijing Forestry University, P.R. China
- Shuo Xu
- Research Base of Beijing Modern Manufacturing Development, College of Economics and Management, Beijing University of Technology, P.R. China
- Liyuan Hao
- Research Base of Beijing Modern Manufacturing Development, College of Economics and Management, Beijing University of Technology, P.R. China
- Jinghong Li
- School of Economics & Management, Beijing Forestry University, P.R. China