1
Mensa E, Martínez Fernández P, Roller R, Radicioni DP. Editorial: Information extraction for health documents. Front Artif Intell 2023; 6:1224529. [PMID: 37396971 PMCID: PMC10313187 DOI: 10.3389/frai.2023.1224529] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Accepted: 05/31/2023] [Indexed: 07/04/2023]
Affiliation(s)
- Enrico Mensa
- Dipartimento di Informatica, Università degli Studi di Torino, Turin, Italy
- Paloma Martínez Fernández
- Departamento de Informática, Computer Science and Engineering Department, Universidad Carlos III de Madrid, Leganés, Spain
- Roland Roller
- German Research Center for Artificial Intelligence (DFKI), Berlin, Germany
2
Ali SR, Strafford H, Dobbs TD, Fonferko-Shadrach B, Lacey AS, Pickrell WO, Hutchings HA, Whitaker IS. Development and validation of an automated basal cell carcinoma histopathology information extraction system using natural language processing. Front Surg 2022; 9:870494. [PMID: 36439548 PMCID: PMC9683031 DOI: 10.3389/fsurg.2022.870494] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 02/06/2022] [Accepted: 07/11/2022] [Indexed: 01/26/2024]
Abstract
Introduction: Routinely collected healthcare data are a powerful research resource, but they often lack detailed disease-specific information that is recorded in clinical free text such as histopathology reports. We aim to use natural language processing (NLP) techniques to extract detailed clinical and pathological information from histopathology reports to enrich routinely collected data.
Methods: We used the General Architecture for Text Engineering (GATE) framework to build an NLP information extraction system using rule-based techniques. During validation, we deployed our rule-based NLP pipeline on 200 previously unseen, de-identified and pseudonymised basal cell carcinoma (BCC) histopathology reports from Swansea Bay University Health Board, Wales, UK. The output of our algorithm was compared with a gold standard of human annotations produced by two independent, blinded expert clinicians involved in skin cancer care.
Results: We identified 11,224 items of information, with a mean precision, recall, and F1 score of 86.0% (95% CI: 75.1-96.9), 84.2% (95% CI: 72.8-96.1), and 84.5% (95% CI: 73.0-95.1), respectively. The difference between the two clinician annotators' F1 scores was 7.9%, compared with 15.5% between the NLP pipeline and the gold standard corpus. Cohen's kappa on annotated tokens was 0.85.
Conclusion: Using a rule-based NLP approach to named entity recognition in BCC, we developed and validated a pipeline with potential applications in improving the quality of cancer registry data, supporting service planning, and enhancing the quality of routinely collected data for research.
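The abstract reports precision, recall, F1, and token-level Cohen's kappa without stating their formulas. The sketch below shows the standard definitions, assuming simple true-positive/false-positive/false-negative counts and two equal-length sequences of token labels. The function names and the toy labels ("BCC", "O") are hypothetical illustrations, not taken from the paper or from GATE. A mean of per-item F1 scores need not equal the F1 computed from the mean precision and recall, so these toy numbers are not meant to reproduce the reported figures.

```python
from collections import Counter
from typing import Sequence


def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Standard precision, recall and F1; returns 0.0 where a denominator is zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators who labelled the same, aligned tokens."""
    if len(a) != len(b) or not a:
        raise ValueError("annotations must be non-empty and aligned token by token")
    n = len(a)
    # Observed agreement: fraction of tokens given the same label by both annotators.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of each annotator's marginal label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout; kappa is undefined,
        # so this sketch treats it as perfect agreement by convention.
        return 1.0
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical toy data, not from the study.
    print(precision_recall_f1(tp=8, fp=2, fn=2))
    annotator_1 = ["BCC", "O", "O", "BCC", "O", "O"]
    annotator_2 = ["BCC", "O", "BCC", "BCC", "O", "O"]
    print(cohens_kappa(annotator_1, annotator_2))
```

In the toy example the annotators agree on 5 of 6 tokens (observed agreement 0.83) against a chance agreement of 0.5, giving a kappa of about 0.67. This illustrates why kappa is reported alongside raw agreement: it discounts the agreement expected from label frequencies alone.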
Affiliation(s)
- Stephen R. Ali
- Reconstructive Surgery and Regenerative Medicine Research Centre, Institute of Life Sciences, Swansea University Medical School, Swansea, United Kingdom
- Welsh Centre for Burns and Plastic Surgery, Morriston Hospital, Swansea, United Kingdom
- Huw Strafford
- Neurology and Molecular Neuroscience Group, Institute of Life Science, Swansea University Medical School, Swansea University, Swansea, United Kingdom
- Health Data Research UK, Data Science Building, Swansea University Medical School, Swansea University, Swansea, United Kingdom
- Thomas D. Dobbs
- Reconstructive Surgery and Regenerative Medicine Research Centre, Institute of Life Sciences, Swansea University Medical School, Swansea, United Kingdom
- Welsh Centre for Burns and Plastic Surgery, Morriston Hospital, Swansea, United Kingdom
- Beata Fonferko-Shadrach
- Neurology and Molecular Neuroscience Group, Institute of Life Science, Swansea University Medical School, Swansea University, Swansea, United Kingdom
- Arron S. Lacey
- Neurology and Molecular Neuroscience Group, Institute of Life Science, Swansea University Medical School, Swansea University, Swansea, United Kingdom
- Health Data Research UK, Data Science Building, Swansea University Medical School, Swansea University, Swansea, United Kingdom
- William Owen Pickrell
- Neurology and Molecular Neuroscience Group, Institute of Life Science, Swansea University Medical School, Swansea University, Swansea, United Kingdom
- Department of Neurology, Morriston Hospital, Swansea, United Kingdom
- Hayley A. Hutchings
- Patient and Population Health and Informatics Research, Swansea University Medical School, Swansea, United Kingdom
- Iain S. Whitaker
- Reconstructive Surgery and Regenerative Medicine Research Centre, Institute of Life Sciences, Swansea University Medical School, Swansea, United Kingdom
- Welsh Centre for Burns and Plastic Surgery, Morriston Hospital, Swansea, United Kingdom