1
Ayre K, Bittar A, Kam J, Verma S, Howard LM, Dutta R. Developing a Natural Language Processing tool to identify perinatal self-harm in electronic healthcare records. PLoS One 2021; 16:e0253809. [PMID: 34347787 PMCID: PMC8336818 DOI: 10.1371/journal.pone.0253809] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/25/2021] [Accepted: 06/14/2021] [Indexed: 11/19/2022] Open
Abstract
BACKGROUND Self-harm occurring within pregnancy and the postnatal year ("perinatal self-harm") is a clinically important yet under-researched topic. Current research likely under-estimates prevalence due to methodological limitations. Electronic healthcare records (EHRs) provide a source of clinically rich data on perinatal self-harm.
AIMS (1) To create a Natural Language Processing (NLP) tool that can, with acceptable precision and recall, identify mentions of acts of perinatal self-harm within EHRs. (2) To use this tool to identify service-users who have self-harmed perinatally, based on their EHRs.
METHODS We used the Clinical Record Interactive Search system to extract de-identified EHRs of secondary mental healthcare service-users at South London and Maudsley NHS Foundation Trust. We developed a tool that applied several layers of linguistic processing based on the spaCy NLP library for Python. We evaluated mention-level performance in the following domains: span, status, temporality and polarity. Evaluation was done against a manually coded reference standard. Mention-level performance was reported as precision, recall, F-score and Cohen's kappa for each domain. We also assessed performance at 'service-user' level and explored whether a heuristic rule improved it. We report per-class statistics for service-user performance, as well as likelihood ratios and post-test probabilities.
RESULTS Mention-level performance: micro-averaged F-score, precision and recall for span, polarity and temporality >0.8. Kappa for status 0.68, temporality 0.62, polarity 0.91. Service-user level performance with heuristic: F-score, precision, recall of minority class 0.69, macro-averaged F-score 0.81, positive LR 9.4 (4.8-19), post-test probability 69.0% (53-82%). Considering the task difficulty, the tool performs well, although temporality was the attribute with the lowest level of annotator agreement.
CONCLUSIONS It is feasible to develop an NLP tool that identifies, with acceptable validity, mentions of perinatal self-harm within EHRs, although with limitations regarding temporality. Using a heuristic rule, it can also function at a service-user-level.
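The abstract above reports a positive likelihood ratio alongside a post-test probability. As a minimal sketch of how the two quantities relate (Bayes' rule in odds form), the following may help; the function names are illustrative, and the pre-test probability in the usage lines is a hypothetical value, since the abstract does not state one.

```python
def positive_likelihood_ratio(sensitivity: float, specificity: float) -> float:
    """LR+ = sensitivity / (1 - specificity)."""
    if not 0.0 <= sensitivity <= 1.0 or not 0.0 <= specificity < 1.0:
        raise ValueError("sensitivity must be in [0, 1] and specificity in [0, 1)")
    return sensitivity / (1.0 - specificity)


def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Convert a pre-test probability to a post-test probability via odds."""
    if not 0.0 < pre_test_probability < 1.0:
        raise ValueError("pre_test_probability must be strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood_ratio must be non-negative")
    pre_test_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


# Hypothetical pre-test probability (assumption, not taken from the abstract),
# combined with the reported positive LR of 9.4.
hypothetical_pre_test = 0.20
print(post_test_probability(hypothetical_pre_test, 9.4))
```

A likelihood ratio of 1 leaves the probability unchanged, which is a quick sanity check on any implementation of this conversion.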
Affiliation(s)
- Karyn Ayre
  - Section of Women’s Mental Health, Health Service and Population Research Department, Institute of Psychiatry, Psychology and Neuroscience, King’s College London, London, United Kingdom
  - South London and Maudsley NHS Foundation Trust, Bethlem Royal Hospital, Kent, London, United Kingdom
- André Bittar
  - Academic Department of Psychological Medicine, Institute of Psychiatry, Psychology and Neuroscience, King’s College London, London, United Kingdom
- Joyce Kam
  - King’s College London GKT School of Medical Education, London, United Kingdom
- Somain Verma
  - King’s College London GKT School of Medical Education, London, United Kingdom
- Louise M. Howard
  - Section of Women’s Mental Health, Health Service and Population Research Department, Institute of Psychiatry, Psychology and Neuroscience, King’s College London, London, United Kingdom
  - South London and Maudsley NHS Foundation Trust, Bethlem Royal Hospital, Kent, London, United Kingdom
- Rina Dutta
  - South London and Maudsley NHS Foundation Trust, Bethlem Royal Hospital, Kent, London, United Kingdom
  - Academic Department of Psychological Medicine, Institute of Psychiatry, Psychology and Neuroscience, King’s College London, London, United Kingdom
2
Fu S, Chen D, He H, Liu S, Moon S, Peterson KJ, Shen F, Wang L, Wang Y, Wen A, Zhao Y, Sohn S, Liu H. Clinical concept extraction: A methodology review. J Biomed Inform 2020; 109:103526. [PMID: 32768446 PMCID: PMC7746475 DOI: 10.1016/j.jbi.2020.103526] [Citation(s) in RCA: 60] [Impact Index Per Article: 15.0] [Received: 03/02/2020] [Revised: 07/30/2020] [Accepted: 08/02/2020] [Indexed: 01/11/2023]
Abstract
BACKGROUND Concept extraction, a subdomain of natural language processing (NLP) with a focus on extracting concepts of interest, has been adopted to computationally extract clinical information from text for a wide range of applications, from clinical decision support to care quality improvement.
OBJECTIVES In this literature review, we provide a methodology review of clinical concept extraction, aiming to catalog development processes, available methods and tools, and specific considerations when developing clinical concept extraction applications.
METHODS Based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, a literature search was conducted for retrieving EHR-based information extraction articles written in English and published from January 2009 through June 2019 from Ovid MEDLINE In-Process & Other Non-Indexed Citations, Ovid MEDLINE, Ovid EMBASE, Scopus, Web of Science, and the ACM Digital Library.
RESULTS A total of 6,686 publications were retrieved. After title and abstract screening, 228 publications were selected. The methods used for developing clinical concept extraction applications were discussed in this review.
Affiliation(s)
- Sunyang Fu
  - Department of Health Sciences Research, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, United States
  - University of Minnesota - Twin Cities, Minneapolis, MN 55455, United States
- David Chen
  - Department of Health Sciences Research, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, United States
- Huan He
  - Department of Health Sciences Research, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, United States
- Sijia Liu
  - Department of Health Sciences Research, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, United States
- Sungrim Moon
  - Department of Health Sciences Research, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, United States
- Kevin J Peterson
  - Department of Information Technology, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, United States
  - University of Minnesota - Twin Cities, Minneapolis, MN 55455, United States
- Feichen Shen
  - Department of Health Sciences Research, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, United States
- Liwei Wang
  - Department of Health Sciences Research, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, United States
- Yanshan Wang
  - Department of Health Sciences Research, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, United States
- Andrew Wen
  - Department of Health Sciences Research, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, United States
- Yiqing Zhao
  - Department of Health Sciences Research, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, United States
- Sunghwan Sohn
  - Department of Health Sciences Research, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, United States
- Hongfang Liu
  - Department of Health Sciences Research, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, United States
  - University of Minnesota - Twin Cities, Minneapolis, MN 55455, United States
3
Liu S, Nie W, Gao D, Yang H, Yan J, Hao T. Clinical quantitative information recognition and entity-quantity association from Chinese electronic medical records. Int J Mach Learn Cyb 2020. [DOI: 10.1007/s13042-020-01160-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/03/2023]
4
Pan X, Chen B, Weng H, Gong Y, Qu Y. Temporal Expression Classification and Normalization From Chinese Narrative Clinical Texts: Pattern Learning Approach. JMIR Med Inform 2020; 8:e17652. [PMID: 32716307 PMCID: PMC7418025 DOI: 10.2196/17652] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 12/31/2019] [Revised: 02/28/2020] [Accepted: 03/13/2020] [Indexed: 11/13/2022] Open
Abstract
Background Temporal information frequently exists in the representation of the disease progress, prescription, medication, surgery progress, or discharge summary in narrative clinical text. The accurate extraction and normalization of temporal expressions can positively boost the analysis and understanding of narrative clinical texts to promote clinical research and practice.
Objective The goal of the study was to propose a novel approach for extracting and normalizing temporal expressions from Chinese narrative clinical text.
Methods TNorm, a rule-based and pattern learning-based approach, has been developed for automatic temporal expression extraction and normalization from unstructured Chinese clinical text data. TNorm consists of three stages: extraction, classification, and normalization. It applies a set of heuristic rules and automatically generated patterns for temporal expression identification and extraction of clinical texts. Then, it collects the features of extracted temporal expressions for temporal type prediction and classification by using machine learning algorithms. Finally, the features are combined with the rule-based and pattern learning-based approaches to normalize the extracted temporal expressions.
Results The evaluation dataset is a set of narrative clinical texts in Chinese containing 1459 discharge summaries of a domestic Grade A Class 3 hospital. The results show that TNorm, combined with temporal expression extraction and temporal type prediction, achieves a precision of 0.8491, a recall of 0.8328, and an F1 score of 0.8409 in temporal expression normalization.
Conclusions This study illustrates an automatic approach, TNorm, that extracts and normalizes temporal expressions from Chinese narrative clinical texts. TNorm was evaluated on the basis of discharge summary data, and results demonstrate its effectiveness on temporal expression normalization.
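The reported F1 score can be checked against the reported precision and recall, since F1 is their harmonic mean. The short sketch below does that arithmetic; the function name is illustrative and nothing here reproduces TNorm itself.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract above.
tnorm_f1 = f1_score(0.8491, 0.8328)
print(round(tnorm_f1, 4))  # prints 0.8409
```

The recomputed value agrees with the F1 score of 0.8409 stated in the abstract.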
Affiliation(s)
- Xiaoyi Pan
  - School of Information Science and Technology, Guangdong University of Foreign Studies, Guangzhou, China
- Boyu Chen
  - School of Information Science and Technology, Guangdong University of Foreign Studies, Guangzhou, China
- Heng Weng
  - Department of Big Data Research of Medicine, The Second Affiliated Hospital of Guangzhou University of Chinese Medicine, Guangzhou, China
- Yongyi Gong
  - School of Information Science and Technology, Guangdong University of Foreign Studies, Guangzhou, China
- Yingying Qu
  - School of Business, Guangdong University of Foreign Studies, Guangzhou, China
5
Lee W, Choi J. Precursor-induced conditional random fields: connecting separate entities by induction for improved clinical named entity recognition. BMC Med Inform Decis Mak 2019; 19:132. [PMID: 31307440 PMCID: PMC6632205 DOI: 10.1186/s12911-019-0865-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/24/2018] [Accepted: 07/03/2019] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND This paper presents a conditional random fields (CRF) method that enables the capture of specific high-order label transition factors to improve clinical named entity recognition performance. Consecutive clinical entities in a sentence are usually separated from each other, and the textual descriptions in clinical narrative documents frequently indicate causal or posterior relationships that can be used to facilitate clinical named entity recognition. However, the CRF that is generally used for named entity recognition is a first-order model that constrains label transition dependency of adjoining labels under the Markov assumption.
METHODS Based on the first-order structure, our proposed model utilizes non-entity tokens between separated entities as an information transmission medium by applying a label induction method. The model is referred to as precursor-induced CRF because its non-entity state memorizes precursor entity information, and the model's structure allows the precursor entity information to propagate forward through the label sequence.
RESULTS We compared the proposed model with both first- and second-order CRFs in terms of their F1-scores, using two clinical named entity recognition corpora (the i2b2 2012 challenge and the Seoul National University Hospital electronic health record). The proposed model demonstrated better entity recognition performance than both the first- and second-order CRFs and was also more efficient than the higher-order model.
CONCLUSION The proposed precursor-induced CRF which uses non-entity labels as label transition information improves entity recognition F1 score by exploiting long-distance transition factors without exponentially increasing the computational time. In contrast, a conventional second-order CRF model that uses longer distance transition factors showed even worse results than the first-order model and required the longest computation time. Thus, the proposed model could offer a considerable performance improvement over current clinical named entity recognition methods based on the CRF models.
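The label induction idea described in METHODS can be illustrated with a small sketch. This is only one possible reading of the abstract, not the authors' implementation: it assumes BIO-style tags, and the entity type names and the `O|TYPE` label format are hypothetical. Each non-entity label that follows an entity is rewritten to carry the type of the most recent (precursor) entity, so that a first-order model over the induced labels can see that information across the gap.

```python
from typing import List, Optional


def induce_precursor_labels(labels: List[str], outside: str = "O") -> List[str]:
    """Rewrite non-entity labels so they remember the preceding entity type.

    Assumes BIO tags such as "B-PROBLEM" / "I-PROBLEM"; the "O|TYPE" output
    format is an illustrative choice, not taken from the paper.
    """
    induced: List[str] = []
    precursor: Optional[str] = None  # type of the most recent entity seen
    for label in labels:
        if label == outside:
            # Before any entity there is nothing to remember.
            induced.append(outside if precursor is None else f"{outside}|{precursor}")
        else:
            # "B-PROBLEM" -> "PROBLEM"; labels without a prefix are used as-is.
            precursor = label.split("-", 1)[1] if "-" in label else label
            induced.append(label)
    return induced


# Hypothetical tag sequence for a single sentence.
example = ["O", "B-PROBLEM", "I-PROBLEM", "O", "O", "B-TREATMENT", "O"]
print(induce_precursor_labels(example))
```

Because the rewriting only enlarges the label set by one non-entity label per entity type, the model remains first-order, which is consistent with the abstract's point about avoiding the cost of a higher-order CRF.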
Affiliation(s)
- Wangjin Lee
  - Interdisciplinary Program for Bioengineering, Graduate School, Seoul National University, 103 Daehak-ro, Jongno-gu, Seoul, 03080, South Korea
- Jinwook Choi
  - Interdisciplinary Program for Bioengineering, Graduate School, Seoul National University, 103 Daehak-ro, Jongno-gu, Seoul, 03080, South Korea
  - Department of Biomedical Engineering, Seoul National University College of Medicine, 103 Daehak-ro, Jongno-gu, Seoul, 03080, South Korea
  - Institute of Medical and Biological Engineering, Medical Research Center, Seoul National University, 101 Daehak-ro, Jongno-gu, Seoul, 03080, South Korea
6
Hao T, Pan X, Gu Z, Qu Y, Weng H. Correction to: A pattern learning-based method for temporal expression extraction and normalization from multi-lingual heterogeneous clinical texts. BMC Med Inform Decis Mak 2018; 18:25. [PMID: 29653522 PMCID: PMC5898051 DOI: 10.1186/s12911-018-0603-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/03/2022] Open