1
|
A deep active learning-based and crowdsourcing-assisted solution for named entity recognition in Chinese historical corpora. ASLIB J INFORM MANAG 2022. [DOI: 10.1108/ajim-03-2022-0107] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/14/2022]
Abstract
Purpose: The majority of existing studies on named entity recognition (NER) concentrate on improving the predictions of deep neural network (DNN)-based models themselves, but the scarcity of training corpora and the difficulty of annotation quality control are not fully solved, especially for ancient Chinese corpora. Therefore, designing a new integrated solution for Chinese historical NER, including automatic entity extraction and man-machine cooperative annotation, is valuable for improving the effectiveness of Chinese historical NER and fostering the development of low-resource information extraction.
Design/methodology/approach: The research provides a systematic approach for Chinese historical NER with a three-stage framework. In addition to the stage of basic preprocessing, the authors build and retrain a high-performance NER model using only limited labeled resources during the stage of augmented deep active learning (ADAL), which entails three steps: DNN-based NER modeling, hybrid pool-based sampling (HPS) based on active learning (AL), and NER-oriented data augmentation (DA). ADAL is thought to have the capacity to keep the performance of the DNN as high as possible under the few-shot constraint. Then, to realize machine-aided quality control in crowdsourcing settings, the authors design a stage of globally-optimized automatic label consolidation (GALC). The core of GALC is a newly designed label consolidation model called simulated annealing-based automatic label aggregation (“SA-ALC”), which incorporates the factors of worker reliability and global label estimation. The model can assure the annotation quality of data from a crowdsourcing annotation system.
Findings: Extensive experiments on two types of Chinese classical historical datasets show that the authors’ solution can effectively reduce the corpus dependency of a DNN-based NER model and alleviate the problem of label quality. Moreover, the results also show the superior performance of the authors’ pipeline approaches (i.e. HPS + DA and SA-ALC) compared to equivalent baselines in each stage.
Originality/value: The study sheds new light on the automatic extraction of Chinese historical entities by integrating the whole technical process. The solution helps to effectively reduce the annotation cost and control the labeling quality for the NER task. It can be further applied to similar information extraction tasks and other low-resource fields, both in theory and in practice.
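The abstract says label consolidation takes worker reliability into account but does not spell out SA-ALC itself. As a rough illustration of the general idea only, the sketch below uses a much simpler swapped-in technique: an iterative reliability-weighted majority vote, with no simulated annealing. The function name `aggregate_labels`, the `(worker, item, label)` input format, and the tie-breaking rule are all assumptions made for this example, not details from the paper.

```python
from collections import defaultdict


def aggregate_labels(annotations, n_iter=10):
    """Iterative reliability-weighted voting (illustrative, NOT SA-ALC).

    annotations: list of (worker, item, label) tuples (hypothetical format).
    Returns (consensus, reliability): item -> label and worker -> score in [0, 1].
    """
    workers = {w for w, _, _ in annotations}
    reliability = {w: 1.0 for w in workers}  # start by trusting everyone equally
    consensus = {}
    for _ in range(n_iter):
        # Step 1: per-item vote, each label weighted by its worker's reliability.
        scores = defaultdict(lambda: defaultdict(float))
        for w, item, label in annotations:
            scores[item][label] += reliability[w]
        # Ties are broken by taking the alphabetically first label.
        new_consensus = {
            item: max(sorted(votes), key=votes.get)
            for item, votes in scores.items()
        }
        # Step 2: reliability = share of a worker's labels matching the consensus.
        hits, totals = defaultdict(int), defaultdict(int)
        for w, item, label in annotations:
            totals[w] += 1
            hits[w] += int(label == new_consensus[item])
        reliability = {w: hits[w] / totals[w] for w in workers}
        if new_consensus == consensus:  # labels stopped changing
            break
        consensus = new_consensus
    return consensus, reliability
```

A worker who often disagrees with the emerging consensus gradually loses voting weight, which is the intuition behind reliability-aware consolidation; a global optimizer such as simulated annealing would search over label assignments rather than follow this greedy loop.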
|
2
|
Yasmin R, Hassan MM, Grassel JT, Bhogaraju H, Escobedo AR, Fuentes O. Improving Crowdsourcing-Based Image Classification Through Expanded Input Elicitation and Machine Learning. Front Artif Intell 2022; 5:848056. [PMID: 35845435 PMCID: PMC9276979 DOI: 10.3389/frai.2022.848056] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/03/2022] [Accepted: 05/24/2022] [Indexed: 11/13/2022]
Abstract
This work investigates how different forms of input elicitation obtained from crowdsourcing can be utilized to improve the quality of inferred labels for image classification tasks, where an image must be labeled as either positive or negative depending on the presence/absence of a specified object. Five types of input elicitation methods are tested: binary classification (positive or negative); the (x, y)-coordinate of the position participants believe a target object is located; level of confidence in binary response (on a scale from 0 to 100%); what participants believe the majority of the other participants' binary classification is; and participant's perceived difficulty level of the task (on a discrete scale). We design two crowdsourcing studies to test the performance of a variety of input elicitation methods and utilize data from over 300 participants. Various existing voting and machine learning (ML) methods are applied to make the best use of these inputs. In an effort to assess their performance on classification tasks of varying difficulty, a systematic synthetic image generation process is developed. Each generated image combines items from the MPEG-7 Core Experiment CE-Shape-1 Test Set into a single image using multiple parameters (e.g., density, transparency, etc.) and may or may not contain a target object. The difficulty of these images is validated by the performance of an automated image classification method. Experiment results suggest that more accurate results can be achieved with smaller training datasets when both the crowdsourced binary classification labels and the average of the self-reported confidence values in these labels are used as features for the ML classifiers. 
Moreover, when a relatively larger properly annotated dataset is available, in some cases augmenting these ML algorithms with the results (i.e., probability of outcome) from an automated classifier can achieve even higher performance than what can be obtained by using any one of the individual classifiers. Lastly, supplementary analysis of the collected data demonstrates that other performance metrics of interest, namely reduced false-negative rates, can be prioritized through special modifications of the proposed aggregation methods.
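The feature construction described in this abstract, combining crowdsourced binary labels with the average self-reported confidence, can be sketched roughly as follows. This is a minimal illustration under stated assumptions: the function name `image_features`, the `(label, confidence)` input format, and the choice to summarize the binary labels as a share of positive votes are hypothetical and not taken from the paper.

```python
from statistics import mean


def image_features(responses):
    """Build per-image features from crowd responses (illustrative sketch).

    responses: list of (label, confidence) pairs for one image, where
    label is 0 or 1 and confidence is on a 0-100 scale (assumed format).
    """
    labels = [label for label, _ in responses]
    confidences = [conf for _, conf in responses]
    return {
        # Fraction of participants who voted "positive".
        "positive_share": mean(labels),
        # Average self-reported confidence, rescaled to [0, 1].
        "mean_confidence": mean(confidences) / 100.0,
    }
```

Feature dictionaries like this, one per image, could then be passed to any standard classifier together with the known ground-truth labels of a small training set.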
Affiliation(s)
- Romena Yasmin, School of Computing and Augmented Intelligence, Arizona State University, Tempe, AZ, United States
- Md Mahmudulla Hassan, Department of Computer Science, University of Texas at El Paso, El Paso, TX, United States
- Joshua T. Grassel, School of Computing and Augmented Intelligence, Arizona State University, Tempe, AZ, United States
- Harika Bhogaraju, School of Computing and Augmented Intelligence, Arizona State University, Tempe, AZ, United States
- Adolfo R. Escobedo, School of Computing and Augmented Intelligence, Arizona State University, Tempe, AZ, United States
- Olac Fuentes, Department of Computer Science, University of Texas at El Paso, El Paso, TX, United States
- *Correspondence: Romena Yasmin
|
3
|
Gao L, Gan Y, Yao Z, Zhang X. A user-knowledge dynamic pattern matching process and optimization strategy based on the expert knowledge recommendation system. APPL INTELL 2021. [DOI: 10.1007/s10489-021-02289-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/13/2022]
|
4
|
A novel order evaluation model with nested probabilistic-numerical linguistic information applied to traditional order grabbing mode. APPL INTELL 2021. [DOI: 10.1007/s10489-020-02088-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]
|
5
|
Co-destruction Patterns in Crowdsourcing. ADVANCED INFORMATION SYSTEMS ENGINEERING 2020. [PMCID: PMC7266435 DOI: 10.1007/978-3-030-49435-3_4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
Abstract
Crowdsourcing has been a successful paradigm in organising a large number of actors to work on specific tasks and contribute to knowledge collectively. However, the openness of such systems allows destructive patterns to form through actors’ dynamics. As a result, the collective effort of actors may not achieve the targeted objective due to lower engagement and lower quality contributions. There are varying forms of actor dynamics that can lead to suboptimal outcomes, and this paper provides a systematic analysis of these in the form of a collection of patterns, derived from both the literature and from our own experiences with crowdsourcing systems. This collection of so-called co-destruction patterns allows for an in-depth analysis of crowdsourcing systems, which can benefit a comparative analysis and also assist with improvements of existing systems or the set-up of new ones. A survey reveals that these patterns have been observed in practice and are perceived as worth addressing.
|