1
Yang J, Zhang T, Tsai CY, Lu Y, Yao L. Evolution and emerging trends of named entity recognition: Bibliometric analysis from 2000 to 2023. Heliyon 2024; 10:e30053. [PMID: 38707358 PMCID: PMC11066397 DOI: 10.1016/j.heliyon.2024.e30053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2024] [Revised: 04/16/2024] [Accepted: 04/18/2024] [Indexed: 05/07/2024] Open
Abstract
Identifying valuable information within the extensive texts documented in natural language presents a significant challenge in various disciplines. Named Entity Recognition (NER), as one of the critical technologies in text data processing and mining, has become a current research hotspot. To review the progress in NER accurately and objectively, this paper employs bibliometric methods. It analyzes 1300 documents related to NER obtained from the Web of Science database using CiteSpace software. First, the retrieved literature and journals are analyzed statistically to explore the distribution characteristics of the literature. Second, the core authors in the field of NER, the development of the technology in different countries, and the leading institutions are explored by analyzing the number of publications and the cooperation network graph. Finally, the research frontiers, development tracks, research hotspots, and other information in this field are explored from a scientific point of view, and five research frontiers and seven research hotspots are discussed in depth. This paper explores the progress of NER research from both macro and micro perspectives. It aims to assist researchers in quickly grasping relevant information and offers constructive ideas and suggestions to promote the development of NER.
Affiliation(s)
- Jun Yang
  - School of Mechanical and Electrical Engineering, Guizhou Normal University, Guiyang, Guizhou, 550025, China
- Taihua Zhang
  - School of Mechanical and Electrical Engineering, Guizhou Normal University, Guiyang, Guizhou, 550025, China
  - Technical Engineering Center of Manufacturing Service and Knowledge Engineering, Guizhou Normal University, Guiyang, Guizhou, 550025, China
- Chieh-Yuan Tsai
  - Department of Industrial Engineering and Management, Yuan Ze University, Taoyuan, 32003, Taiwan
- Yao Lu
  - School of Mechanical and Electrical Engineering, Guizhou Normal University, Guiyang, Guizhou, 550025, China
  - Technical Engineering Center of Manufacturing Service and Knowledge Engineering, Guizhou Normal University, Guiyang, Guizhou, 550025, China
- Liguo Yao
  - School of Mechanical and Electrical Engineering, Guizhou Normal University, Guiyang, Guizhou, 550025, China
  - Technical Engineering Center of Manufacturing Service and Knowledge Engineering, Guizhou Normal University, Guiyang, Guizhou, 550025, China
2
Gu Z, He X, Yu P, Jia W, Yang X, Peng G, Hu P, Chen S, Chen H, Lin Y. Automatic quantitative stroke severity assessment based on Chinese clinical named entity recognition with domain-adaptive pre-trained large language model. Artif Intell Med 2024; 150:102822. [PMID: 38553162 DOI: 10.1016/j.artmed.2024.102822] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2023] [Revised: 01/28/2024] [Accepted: 02/21/2024] [Indexed: 04/02/2024]
Abstract
BACKGROUND Stroke is a prevalent disease with a significant global impact. Effective assessment of stroke severity is vital for an accurate diagnosis, appropriate treatment, and optimal clinical outcomes. The National Institutes of Health Stroke Scale (NIHSS) is a widely used scale for quantitatively assessing stroke severity. However, the current manual scoring of NIHSS is labor-intensive, time-consuming, and sometimes unreliable. Applying artificial intelligence (AI) techniques to automate the quantitative assessment of stroke on vast amounts of electronic health records (EHRs) has attracted much interest. OBJECTIVE This study aims to develop an automatic, quantitative stroke severity assessment framework through automating the entire NIHSS scoring process on Chinese clinical EHRs. METHODS Our approach consists of two major parts: Chinese clinical named entity recognition (CNER) with a domain-adaptive pre-trained large language model (LLM) and automated NIHSS scoring. To build a high-performing CNER model, we first construct a stroke-specific, densely annotated dataset "Chinese Stroke Clinical Records" (CSCR) from EHRs provided by our partner hospital, based on a stroke ontology that defines semantically related entities for stroke assessment. We then pre-train a Chinese clinical LLM coined "CliRoberta" through domain-adaptive transfer learning and construct a deep learning-based CNER model that can accurately extract entities directly from Chinese EHRs. Finally, an automated, end-to-end NIHSS scoring pipeline is proposed by mapping the extracted entities to relevant NIHSS items and values, to quantitatively assess the stroke severity. RESULTS Results obtained on a benchmark dataset CCKS2019 and our newly created CSCR dataset demonstrate the superior performance of our domain-adaptive pre-trained LLM and the CNER model, compared with the existing benchmark LLMs and CNER models. 
The high F1 score of 0.990 ensures the reliability of our model in accurately extracting the entities for the subsequent automatic NIHSS scoring. Subsequently, our automated, end-to-end NIHSS scoring approach achieved excellent inter-rater agreement (0.823) and intraclass consistency (0.986) with the ground truth and significantly reduced the processing time from minutes to a few seconds. CONCLUSION Our proposed automatic and quantitative framework for assessing stroke severity demonstrates exceptional performance and reliability through directly scoring the NIHSS from diagnostic notes in Chinese clinical EHRs. Moreover, this study also contributes a new clinical dataset, a pre-trained clinical LLM, and an effective deep learning-based CNER model. The deployment of these advanced algorithms can improve the accuracy and efficiency of clinical assessment, and help improve the quality, affordability and productivity of healthcare services.
Affiliation(s)
- Zhanzhong Gu
  - School of Electrical and Data Engineering, University of Technology Sydney, NSW, 2007, Australia.
- Xiangjian He
  - School of Electrical and Data Engineering, University of Technology Sydney, NSW, 2007, Australia; School of Computer Science, University of Nottingham Ningbo China, Ningbo, China
- Ping Yu
  - School of Computing and Information Technology, University of Wollongong, NSW, 2522, Australia
- Wenjing Jia
  - School of Electrical and Data Engineering, University of Technology Sydney, NSW, 2007, Australia
- Xiguang Yang
  - School of Electrical and Data Engineering, University of Technology Sydney, NSW, 2007, Australia
- Gang Peng
  - Intergenepharm Pty Ltd, Sydney, NSW, 2000, Australia
- Penghui Hu
  - Department of Oncology, The First Affiliated Hospital of Jinan University, Guangzhou, China
- Shiyan Chen
  - Department of Neurology, The First Affiliated Hospital of Fujian Medical University, Fuzhou, China
- Hongjie Chen
  - Department of Traditional Chinese Medicine, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, China
- Yiguang Lin
  - Department of Traditional Chinese Medicine, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, China; Department of Immuno-Oncology, The First Affiliated Hospital of Guangdong Pharmaceutical University, China; School of Life Sciences, University of Technology Sydney, NSW, 2007, Australia
3
Li M, Wang L, Wu Q, Zhu J, Zhang M. Diagnosis knowledge constrained network based on first-order logic for syndrome differentiation. Artif Intell Med 2024; 147:102739. [PMID: 38044249 DOI: 10.1016/j.artmed.2023.102739] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2022] [Revised: 10/16/2023] [Accepted: 11/28/2023] [Indexed: 12/05/2023]
Abstract
Traditional Chinese medicine (TCM) has been recognized worldwide as a valuable asset of human medicine. TCM treatment is based on syndrome differentiation. However, the effect of TCM syndrome differentiation relies heavily on the experience of doctors. The gratifying progress of machine learning research in recent years has brought new ideas for TCM syndrome differentiation. In this paper, we propose a deep network model for TCM syndrome differentiation, which improves network performance by injecting TCM syndrome differentiation knowledge in the form of first-order logic into the deep network. Experimental results show that the accuracy of our proposed model reaches 89%, which is significantly better than the deep learning model MLP and other traditional machine learning models. In addition, we present the collected and formatted TCM syndrome differentiation (TSD) dataset, which contains more than 40,000 TCM clinical records. Moreover, 45 symptoms (""), 322 patterns (""), and more than 500 symptoms are labeled in TSD respectively. To the best of our knowledge, this is the first TCM syndrome differentiation dataset labeling diseases, syndromes and patterns. Such detailed labeling is helpful for exploring the relationships among the various elements of syndrome differentiation.
Affiliation(s)
- Meiwen Li
  - School of Information Engineering, Henan University of Science and Technology, Luoyang, 471023, China.
- Lin Wang
  - School of Information Engineering, Henan University of Science and Technology, Luoyang, 471023, China.
- Qingtao Wu
  - School of Information Engineering, Henan University of Science and Technology, Luoyang, 471023, China.
- Junlong Zhu
  - School of Information Engineering, Henan University of Science and Technology, Luoyang, 471023, China.
- Mingchuan Zhang
  - School of Information Engineering, Henan University of Science and Technology, Luoyang, 471023, China.
4
Asudani DS, Nagwani NK, Singh P. Impact of word embedding models on text analytics in deep learning environment: a review. Artif Intell Rev 2023; 56:1-81. [PMID: 36844886 PMCID: PMC9944441 DOI: 10.1007/s10462-023-10419-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 02/01/2023] [Indexed: 02/25/2023]
Abstract
The selection of word embedding and deep learning models for better outcomes is vital. Word embeddings are an n-dimensional distributed representation of a text that attempts to capture the meanings of the words. Deep learning models utilize multiple computing layers to learn hierarchical representations of data. The word embedding technique represented by deep learning has received much attention. It is used in various natural language processing (NLP) applications, such as text classification, sentiment analysis, named entity recognition, topic modeling, etc. This paper reviews the representative methods of the most prominent word embedding and deep learning models. It presents an overview of recent research trends in NLP and a detailed understanding of how to use these models to achieve efficient results on text analytics tasks. The review summarizes, contrasts, and compares numerous word embedding and deep learning models and includes a list of prominent datasets, tools, APIs, and popular publications. A reference for selecting a suitable word embedding and deep learning approach is presented based on a comparative analysis of different techniques to perform text analytics tasks. This paper can serve as a quick reference for learning the basics, benefits, and challenges of various word representation approaches and deep learning models, with their application to text analytics and a future outlook on research. It can be concluded from the findings of this study that domain-specific word embedding and the long short term memory model can be employed to improve overall text analytics task performance.
Affiliation(s)
- Deepak Suresh Asudani
  - Department of Computer Science and Engineering, National Institute of Technology, Raipur, Chhattisgarh India
- Naresh Kumar Nagwani
  - Department of Computer Science and Engineering, National Institute of Technology, Raipur, Chhattisgarh India
- Pradeep Singh
  - Department of Computer Science and Engineering, National Institute of Technology, Raipur, Chhattisgarh India
5
Dong S, Lei Z, Fei Y. Data-driven based four examinations in TCM: a survey. Digital Chinese Medicine 2022. [DOI: 10.1016/j.dcmed.2022.12.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/06/2023] Open
6
Shi J, Sun M, Sun Z, Li M, Gu Y, Zhang W. Multi-level semantic fusion network for Chinese medical named entity recognition. J Biomed Inform 2022; 133:104144. [PMID: 35878823 DOI: 10.1016/j.jbi.2022.104144] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 01/03/2022] [Revised: 07/06/2022] [Accepted: 07/10/2022] [Indexed: 11/17/2022]
Abstract
Medical named entity recognition (MNER) is a fundamental component of understanding the unstructured medical texts in electronic health records, and it has received widespread attention in both academia and industry. However, previous MNER approaches do not make full use of hierarchical semantics, from morphology to syntactic relationships such as word dependency. Furthermore, extracting entities from Chinese medical texts is a more complex task because such texts often contain, for example, homophones or pictophonetic characters. In this paper, we propose a multi-level semantic fusion network for Chinese medical named entity recognition, which fuses semantic information at the morphology, character, word and syntactic levels. We take radicals as morphological semantics, pinyin and a character dictionary as character semantics, and a word dictionary as word semantics; these semantic features are fused by a BiLSTM to obtain a contextualized representation. We then use a graph neural network to model word dependency as syntactic semantics to enhance the contextualized representation. The experimental results show the effectiveness of the proposed model on two public datasets and its robustness in real-world scenarios.
Affiliation(s)
- Jintong Shi
  - University of Chinese Academy of Sciences, Beijing, 100049, China; Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China
- Mengxuan Sun
  - University of Chinese Academy of Sciences, Beijing, 100049, China; Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China
- Zhengya Sun
  - University of Chinese Academy of Sciences, Beijing, 100049, China; Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China
- Mingda Li
  - University of Chinese Academy of Sciences, Beijing, 100049, China; Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China
- Yifan Gu
  - University of Chinese Academy of Sciences, Beijing, 100049, China; Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China
- Wensheng Zhang
  - University of Chinese Academy of Sciences, Beijing, 100049, China; Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China.
7
8
Zhang Q, Zhou J, Zhang B. Computational Traditional Chinese Medicine diagnosis: A literature survey. Comput Biol Med 2021; 133:104358. [PMID: 33831712 DOI: 10.1016/j.compbiomed.2021.104358] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 02/18/2021] [Revised: 03/23/2021] [Accepted: 03/24/2021] [Indexed: 12/22/2022]
Abstract
BACKGROUND AND OBJECTIVE Traditional Chinese Medicine (TCM) diagnosis is based on theoretical principles and knowledge steeped in thousands of years of history, and it is used to diagnose various types of diseases and syndromes. It can be generally divided into four main diagnostic approaches: (1) inspection, (2) auscultation and olfaction, (3) inquiry, and (4) palpation, which are widely used in TCM hospitals in China and around the world. With the development of intelligent computing technology in recent years, computational TCM diagnosis has grown rapidly. METHODS In this paper, we aim to systematically summarize the development of computational TCM diagnosis based on the four diagnostic approaches, mainly focusing on digital acquisition devices, collected datasets, and computational detection approaches (algorithms). Furthermore, all related works in this field are compared and explored in detail. RESULTS This survey provides readers and researchers with the principles, applications, and current progress of computational TCM diagnosis. Moreover, the future development direction, prospects, and technological trends of computational TCM diagnosis are also discussed. CONCLUSIONS Recent computational TCM diagnosis works are compared in detail to show their pros and cons, and we provide some meaningful suggestions and opinions on future research approaches in this area. This work is useful for disease detection in computational TCM diagnosis as well as health management in the smart healthcare area. INDEX TERMS Computational diagnosis, Traditional Chinese Medicine, survey, smart healthcare.
Affiliation(s)
- Qi Zhang
  - The PAMI Research Group, Department of Computer and Information Science, Faculty of Science and Technology, University of Macau, Macau SAR, People's Republic of China
- Jianhang Zhou
  - The PAMI Research Group, Department of Computer and Information Science, Faculty of Science and Technology, University of Macau, Macau SAR, People's Republic of China
- Bob Zhang
  - The PAMI Research Group, Department of Computer and Information Science, Faculty of Science and Technology, University of Macau, Macau SAR, People's Republic of China.
9
Ramachandran R, Arutchelvan K. Named entity recognition on bio-medical literature documents using hybrid based approach. Journal of Ambient Intelligence and Humanized Computing 2021:1-10. [PMID: 33723489 PMCID: PMC7947151 DOI: 10.1007/s12652-021-03078-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/22/2020] [Accepted: 03/02/2021] [Indexed: 06/02/2023]
Abstract
There have been many changes in the medical field due to technological advances. Progress in technology provides many opportunities to extract valuable insights from huge amounts of unstructured data. The literature documents published by researchers in the medical domain contain an enormous amount of knowledge. Many organizations are involved in retrieving the hidden information from these documents. Extracting the drug names, diseases, symptoms, route of administration, species and dosage forms from the textual document is an easy task due to innovations in Natural Language Processing. In this article, a new hybrid based approach is proposed to identify named entities in medical literature documents. A new dictionary has been built for route of administration, dosage forms and symptoms to annotate the entities in the medical documents. The annotated entities are used to train a blank spaCy machine learning model. The trained model provides decent accuracy when compared with the existing model. The hybrid model is validated with the dictionary and a human (optional) to calculate the confusion matrix. It is able to identify more entities than the prevailing model. The average F1 score for five entities of the proposed hybrid based approach is 73.79%.
Affiliation(s)
- R. Ramachandran
  - Department of Computer and Information Science, Annamalai University, Tamil Nadu, Chidambaram, India
- K. Arutchelvan
  - Department of Computer and Information Science, Annamalai University, Tamil Nadu, Chidambaram, India