1
Yang J, Zhang T, Tsai CY, Lu Y, Yao L. Evolution and emerging trends of named entity recognition: Bibliometric analysis from 2000 to 2023. Heliyon 2024; 10:e30053. [PMID: 38707358 PMCID: PMC11066397 DOI: 10.1016/j.heliyon.2024.e30053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2024] [Revised: 04/16/2024] [Accepted: 04/18/2024] [Indexed: 05/07/2024] Open
Abstract
Identifying valuable information within extensive natural-language texts presents a significant challenge in various disciplines. Named Entity Recognition (NER), one of the critical technologies in text data processing and mining, has become a research hotspot. To review the progress in NER accurately and objectively, this paper employs bibliometric methods. It analyzes 1300 documents related to NER obtained from the Web of Science database using CiteSpace software. First, statistical analysis is performed on the retrieved literature and journals to explore the distribution characteristics of the literature. Second, the core authors in the field of NER, the development of the technology in different countries, and the leading institutions are explored by analyzing the number of publications and the cooperation network graph. Finally, the research frontiers, development tracks, research hotspots, and other information in this field are explored from a scientific point of view, and five research frontiers and seven research hotspots are discussed in depth. This paper explores the progress of NER research from both macro and micro perspectives. It aims to help researchers quickly grasp relevant information and offers constructive ideas and suggestions to promote the development of NER.
Affiliation(s)
- Jun Yang
  - School of Mechanical and Electrical Engineering, Guizhou Normal University, Guiyang, Guizhou, 550025, China
- Taihua Zhang
  - School of Mechanical and Electrical Engineering, Guizhou Normal University, Guiyang, Guizhou, 550025, China
  - Technical Engineering Center of Manufacturing Service and Knowledge Engineering, Guizhou Normal University, Guiyang, Guizhou, 550025, China
- Chieh-Yuan Tsai
  - Department of Industrial Engineering and Management, Yuan Ze University, Taoyuan, 32003, Taiwan
- Yao Lu
  - School of Mechanical and Electrical Engineering, Guizhou Normal University, Guiyang, Guizhou, 550025, China
  - Technical Engineering Center of Manufacturing Service and Knowledge Engineering, Guizhou Normal University, Guiyang, Guizhou, 550025, China
- Liguo Yao
  - School of Mechanical and Electrical Engineering, Guizhou Normal University, Guiyang, Guizhou, 550025, China
  - Technical Engineering Center of Manufacturing Service and Knowledge Engineering, Guizhou Normal University, Guiyang, Guizhou, 550025, China
2
ELM-Based Active Learning via Asymmetric Samplers: Constructing a Multi-Class Text Corpus for Emotion Classification. Symmetry (Basel) 2022. [DOI: 10.3390/sym14081698] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022] Open
Abstract
A high-quality annotated text corpus is vital when training a deep learning model. However, acquiring abundant, label-balanced data is rarely feasible because of the large labor and time costs of the labeling stage. To alleviate this situation, this paper proposes a novel active learning (AL) method designed to select samples for constructing multi-class and multi-label Chinese emotional text corpora. The work leverages the advantages of extreme learning machines (ELMs), namely short learning time and randomly generated parameters, to obtain an initial measure of textual emotion features. In addition, we designed a novel combined query strategy called an asymmetric sampler (which simultaneously considers uncertainty and representativeness) to verify and extract ideal samples. Furthermore, this model progressively modulates state-of-the-art prescriptions through cross-entropy, Kullback–Leibler divergence, and Earth Mover’s distance. Finally, a stepwise assessment of the experimental results shows that the updated corpora have richer label distributions and a higher weight of correlated emotional information. Likewise, in emotion classification experiments with ELM, the precision, recall, and F1 scores improved by 7.17%, 6.31%, and 6.71%, respectively. Extensive emotion classification experiments were also conducted with two widely used classifiers, SVM and LR, and the comparisons likewise support the method’s effectiveness in selecting emotional texts.
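The abstract does not specify how the asymmetric sampler combines its two criteria, so the following is only a minimal sketch of a generic query strategy that weighs uncertainty against representativeness. The function names, the entropy-based uncertainty, the mean-cosine-similarity representativeness, and the product scoring rule are all assumptions made for illustration, not the authors' method.

```python
import math


def entropy(probs):
    """Shannon entropy of a predicted class distribution (higher = more uncertain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_queries(features, probs, k=1):
    """Return indices of the k unlabeled samples with the highest combined score.

    Hypothetical scoring rule: uncertainty (entropy of the model's prediction)
    multiplied by representativeness (mean similarity to the rest of the pool).
    """
    scores = []
    for i, (x, p) in enumerate(zip(features, probs)):
        sims = [cosine(x, y) for j, y in enumerate(features) if j != i]
        representativeness = sum(sims) / len(sims) if sims else 0.0
        scores.append((entropy(p) * representativeness, i))
    # Highest score first; ties are broken by the larger index.
    return [i for _, i in sorted(scores, reverse=True)[:k]]
```

In this sketch a sample is queried only when the model is unsure about it and it also resembles many other pool samples, which is one common way to avoid spending annotation effort on outliers.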
3
A review: development of named entity recognition (NER) technology for aeronautical information intelligence. Artif Intell Rev 2022. [DOI: 10.1007/s10462-022-10197-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]
4
Xu H, Hu B. Legal Text Recognition Using LSTM-CRF Deep Learning Model. Computational Intelligence and Neuroscience 2022; 2022:9933929. [PMID: 35341203 PMCID: PMC8947905 DOI: 10.1155/2022/9933929] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/26/2021] [Revised: 01/09/2022] [Accepted: 01/17/2022] [Indexed: 11/17/2022]
Abstract
This paper studies named entity recognition (NER) in legal texts using deep learning models. First, a bidirectional long short-term memory (Bi-LSTM) conditional random field (CRF) model for NER in legal texts is established. Second, different annotation methods are used to compare and analyze the entity recognition performance of the Bi-LSTM-CRF model. Finally, different objective loss functions are set to compare and analyze the entity recognition performance of the Bi-LSTM-CRF model. The research results show that the F1 value of the model trained on the word sequence labeling corpus on the named entity is 88.13%, higher than that of the word sequence labeling corpus. For the two types of entities, place names and organization names, the F1 values obtained by the Bi-LSTM-CRF model using word segmentation are 67.60% and 89.45%, respectively, higher than the F1 values obtained by the model using character segmentation. Therefore, the Bi-LSTM-CRF model using word segmentation is more suitable for recognizing extended entities. Parameter learning with log-likelihood gives better results than with the maximum interval criterion, making log-likelihood well suited to the Bi-LSTM-CRF model. This method provides ideas for research on legal text recognition and has some practical value.
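The F1 values quoted above depend on how predicted entities are matched against gold entities, which the abstract does not spell out. As a hedged illustration, the sketch below computes precision, recall, and F1 under exact span matching over BIO tags, a common convention in NER evaluation; the tag scheme, the function names, and the exact-match rule are assumptions, not details taken from the paper.

```python
def extract_entities(tags):
    """Collect (start, end, type) spans from a BIO tag sequence.

    `end` is exclusive. A stray I- tag with no matching open span is ignored.
    """
    spans, start, etype = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # trailing "O" closes the last span
        continues = tag.startswith("I-") and start is not None and tag[2:] == etype
        if not continues:
            if start is not None:
                spans.add((start, i, etype))
            # Only a B- tag opens a new span.
            start, etype = (i, tag[2:]) if tag.startswith("B-") else (None, None)
    return spans


def entity_f1(gold_tags, pred_tags):
    """Entity-level precision, recall, and F1 under exact span-and-type matching."""
    gold, pred = extract_entities(gold_tags), extract_entities(pred_tags)
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

Under this convention a prediction with the right boundaries but the wrong type (for example a place name tagged as an organization) counts as both a false positive and a false negative, which is one reason per-type scores can differ widely.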
Affiliation(s)
- Hesheng Xu
  - Department of Law, Zhejiang University City College, Hangzhou 310015, China
- Bin Hu
  - Department of Law, Zhejiang University City College, Hangzhou 310015, China
5
Fine-Grained Named Entity Recognition Using a Multi-Stacked Feature Fusion and Dual-Stacked Output in Korean. Applied Sciences-Basel 2021. [DOI: 10.3390/app112210795] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/17/2022]
Abstract
Named entity recognition (NER) is a natural language processing task to identify spans that mention named entities and to annotate them with predefined named entity classes. Although many NER models based on machine learning have been proposed, their performance in terms of processing fine-grained NER tasks was less than acceptable. This is because the training data of a fine-grained NER task is much more unbalanced than those of a coarse-grained NER task. To overcome the problem presented by unbalanced data, we propose a fine-grained NER model that compensates for the sparseness of fine-grained NEs by using the contextual information of coarse-grained NEs. From another viewpoint, many NER models have used different levels of features, such as part-of-speech tags and gazetteer look-up results, in a nonhierarchical manner. Unfortunately, these models experience the feature interference problem. Our solution to this problem is to adopt a multi-stacked feature fusion scheme, which accepts different levels of features as its input. The proposed model is based on multi-stacked long short-term memories (LSTMs) with a multi-stacked feature fusion layer for acquiring multilevel embeddings and a dual-stacked output layer for predicting fine-grained NEs based on the categorical information of coarse-grained NEs. Our experiments indicate that the proposed model is capable of state-of-the-art performance. The results show that the proposed model can effectively alleviate the unbalanced data problem that frequently occurs in a fine-grained NER task. In addition, the multi-stacked feature fusion layer contributes to the improvement of NER performance, confirming that the proposed model can alleviate the feature interference problem. Based on this experimental result, we conclude that the proposed model is well-designed to effectively perform NER tasks.
6
A Commodity Classification Framework Based on Machine Learning for Analysis of Trade Declaration. Symmetry (Basel) 2021. [DOI: 10.3390/sym13060964] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/16/2022] Open
Abstract
Text, voice, images, and videos can express intentions and facts in daily life. By understanding this content, people can identify and analyze behaviors. This paper focuses on the commodity trade declaration process and identifies commodity categories based on the text information in customs declarations. Although text recognition technology is mature in many application fields, there are few studies on the classification and recognition of customs declaration goods. In this paper, we propose a classification framework based on machine learning (ML) models for commodity trade declaration that reaches a high rate of accuracy. We also propose a symmetrical decision fusion method for this task based on a convolutional neural network (CNN) and a transformer. The experimental results show that the fusion model can make up for the shortcomings of the two original models and yields some improvement. On the two datasets used in this paper, the accuracy reaches 88% and 99%, respectively. To promote the study of customs declaration business and Chinese text recognition, we also release the proprietary datasets used in this study.
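The abstract does not define the symmetrical decision fusion rule. Purely as an illustration of decision-level fusion, the sketch below combines the class probabilities of two classifiers with a weighted average and picks the highest fused score; the function name `fuse_decisions`, the equal default weighting, and the averaging rule itself are assumptions and may differ from what the paper does.

```python
def fuse_decisions(probs_a, probs_b, weight_a=0.5):
    """Fuse two models' class probabilities by weighted averaging.

    probs_a and probs_b are per-class probabilities from two classifiers
    (for example a CNN and a transformer). With weight_a=0.5 both models
    contribute equally. Returns (predicted_class_index, fused_probabilities).
    """
    if len(probs_a) != len(probs_b):
        raise ValueError("both models must score the same set of classes")
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    fused = [weight_a * a + (1.0 - weight_a) * b for a, b in zip(probs_a, probs_b)]
    # The first class with the highest fused probability wins.
    label = max(range(len(fused)), key=fused.__getitem__)
    return label, fused
```

For instance, `fuse_decisions([0.6, 0.4], [0.2, 0.8])` selects class 1: the second model's confident vote outweighs the first model's weaker preference for class 0, which is how fusion can compensate for one model's mistakes.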
7
Abstract
Named entity recognition (NER) is an important task in natural language processing that requires determining entity boundaries and classifying the entities into pre-defined categories. For low-resource languages, most state-of-the-art systems require tens of thousands of annotated sentences to obtain high performance. However, little annotated data is available for Uyghur and Hungarian (UH languages) NER tasks. Each task also has its own specificities: differences in words and word order across languages make it a challenging problem. In this paper, we present an effective solution that provides a meaningful and easy-to-use feature extractor for named entity recognition tasks: fine-tuning the pre-trained language model. We propose a fine-tuning method for a low-resource language model, which constructs a fine-tuning dataset through data augmentation; then the dataset of a high-resource language is added; and finally the cross-language pre-trained model is fine-tuned on this dataset. In addition, we propose an attention-based fine-tuning strategy that uses symmetry to better select relevant semantic and syntactic information from pre-trained language models and applies these symmetry features to named entity recognition tasks. We evaluated our approach on Uyghur and Hungarian datasets, where it performed well compared to some strong baselines. We close with an overview of the available resources for named entity recognition and some of the open research questions.