1
Casey A, Davidson E, Grover C, Tobin R, Grivas A, Zhang H, Schrempf P, O’Neil AQ, Lee L, Walsh M, Pellie F, Ferguson K, Cvoro V, Wu H, Whalley H, Mair G, Whiteley W, Alex B. Understanding the performance and reliability of NLP tools: a comparison of four NLP tools predicting stroke phenotypes in radiology reports. Front Digit Health 2023; 5:1184919. [PMID: 37840686 PMCID: PMC10569314 DOI: 10.3389/fdgth.2023.1184919] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2023] [Accepted: 09/06/2023] [Indexed: 10/17/2023]
Abstract
Background: Natural language processing (NLP) has the potential to automate the reading of radiology reports, but NLP methods must be shown to be adaptable and reliable before they are used in real-world clinical applications.

Methods: We used F1 score, precision, and recall to compare NLP tools on two cohorts: a cohort from a study on delirium using images and radiology reports from NHS Fife, and a population-based cohort (Generation Scotland) that spans multiple National Health Service health boards. We compared four off-the-shelf rule-based and neural NLP tools (EdIE-R, ALARM+, ESPRESSO, and Sem-EHR) and reported their performance for three cerebrovascular phenotypes: ischaemic stroke, small vessel disease (SVD), and atrophy. Clinical experts from the EdIE-R team defined phenotypes using labelling techniques established during the development of EdIE-R, in conjunction with an expert researcher who read the underlying images.

Results: EdIE-R obtained the highest F1 score for ischaemic stroke in both cohorts (≥93%), followed by ALARM+ (≥87%). The F1 score of ESPRESSO was ≥74% and that of Sem-EHR was ≥66%, although ESPRESSO had the highest precision in both cohorts (90% and 98%). For SVD, EdIE-R scored an F1 of ≥98% and ALARM+ ≥90%; ESPRESSO scored lowest with ≥77%, and Sem-EHR scored ≥81%. In NHS Fife, F1 scores for atrophy by EdIE-R and ALARM+ were 99%, dropping in Generation Scotland to 96% for EdIE-R and 91% for ALARM+. Sem-EHR performed lowest for atrophy, at 89% in NHS Fife and 73% in Generation Scotland. When NLP tool output was compared with brain image reads using F1 scores, ALARM+ scored 80% for ischaemic stroke, outperforming EdIE-R at 66%. For SVD, EdIE-R performed best, scoring 84%, with Sem-EHR at 82%. For atrophy, EdIE-R and both ALARM+ versions were comparable at 80%.

Conclusions: The four NLP tools show varying F1 (and precision/recall) scores across all three phenotypes, although the variation is more apparent for ischaemic stroke. NLP tools cannot be used "out of the box" in clinical settings. It is essential to understand the context of their development to assess whether they are suitable for the task at hand or whether further training, re-training, or modification is required to adapt them to the target task.
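The abstract above compares tools by precision, recall, and F1 score. As a reminder of how the three relate, here is a minimal Python sketch. The `Counts` container and the counts in the usage note are illustrative assumptions, not data or code from the study.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Confusion counts for one phenotype (hypothetical container)."""
    tp: int  # tool and reference both mark the phenotype as present
    fp: int  # tool marks it present, reference marks it absent
    fn: int  # tool marks it absent, reference marks it present


def precision(c: Counts) -> float:
    """Fraction of the tool's positive predictions that are correct."""
    total = c.tp + c.fp
    return c.tp / total if total else 0.0


def recall(c: Counts) -> float:
    """Fraction of reference positives that the tool found."""
    total = c.tp + c.fn
    return c.tp / total if total else 0.0


def f1(c: Counts) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Made-up counts for illustration only (not taken from the paper).
example = Counts(tp=90, fp=10, fn=30)
print(f"precision={precision(example):.3f} "
      f"recall={recall(example):.3f} f1={f1(example):.3f}")
```

With these made-up counts, precision is 0.90 but recall is only 0.75, so F1 is lower than precision. This is one way a tool can have the highest precision without the highest F1 score.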
Affiliation(s)
- Arlene Casey
  - Advanced Care Research Centre, Usher Institute, University of Edinburgh, Edinburgh, United Kingdom
- Emma Davidson
  - Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, United Kingdom
- Claire Grover
  - School of Informatics, University of Edinburgh, Edinburgh, United Kingdom
- Richard Tobin
  - School of Informatics, University of Edinburgh, Edinburgh, United Kingdom
- Andreas Grivas
  - School of Informatics, University of Edinburgh, Edinburgh, United Kingdom
- Huayu Zhang
  - Advanced Care Research Centre, Usher Institute, University of Edinburgh, Edinburgh, United Kingdom
- Patrick Schrempf
  - Canon Medical Research Europe Ltd., AI Research, Edinburgh, United Kingdom
  - School of Computer Science, University of St Andrews, St Andrews, United Kingdom
- Alison Q. O’Neil
  - Canon Medical Research Europe Ltd., AI Research, Edinburgh, United Kingdom
  - School of Engineering, University of Edinburgh, Edinburgh, United Kingdom
- Liam Lee
  - Medical School, University of Edinburgh, Edinburgh, United Kingdom
- Michael Walsh
  - Intensive Care Department, University Hospitals Bristol and Weston, Bristol, United Kingdom
- Freya Pellie
  - National Horizons Centre, Teesside University, Darlington, United Kingdom
  - School of Health and Life Sciences, Teesside University, Middlesbrough, United Kingdom
- Karen Ferguson
  - Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, United Kingdom
- Vera Cvoro
  - Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, United Kingdom
  - Department of Geriatric Medicine, NHS Fife, Fife, United Kingdom
- Honghan Wu
  - Institute of Health Informatics, University College London, London, United Kingdom
  - Alan Turing Institute, London, United Kingdom
- Heather Whalley
  - Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, United Kingdom
  - Generation Scotland, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, United Kingdom
- Grant Mair
  - Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, United Kingdom
  - Neuroradiology, Department of Clinical Neurosciences, NHS Lothian, Edinburgh, United Kingdom
- William Whiteley
  - Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, United Kingdom
  - Neuroradiology, Department of Clinical Neurosciences, NHS Lothian, Edinburgh, United Kingdom
- Beatrice Alex
  - Edinburgh Futures Institute, University of Edinburgh, Edinburgh, United Kingdom
  - School of Literatures, Languages and Cultures, University of Edinburgh, Edinburgh, United Kingdom
2
Cutforth M, Watson H, Brown C, Wang C, Thomson S, Fell D, Dilys V, Scrimgeour M, Schrempf P, Lesh J, Muir K, Weir A, O’Neil AQ. Acute stroke CDS: automatic retrieval of thrombolysis contraindications from unstructured clinical letters. Front Digit Health 2023; 5:1186516. [PMID: 37388253 PMCID: PMC10305776 DOI: 10.3389/fdgth.2023.1186516] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2023] [Accepted: 05/15/2023] [Indexed: 07/01/2023]
Abstract
Introduction: Thrombolysis treatment for acute ischaemic stroke can lead to better outcomes if administered early enough. However, some contraindications put the patient at greater risk of a bleed (e.g. recent major surgery, anticoagulant medication), so clinicians must check a patient's past medical history before proceeding with treatment. In this work we present a machine learning approach for accurate automatic detection of this information in unstructured text documents such as discharge letters or referral letters, to support the clinician in deciding whether to administer thrombolysis.

Methods: We consulted local and national guidelines for thrombolysis eligibility, identifying 86 entities that are relevant to the thrombolysis decision. A total of 8,067 documents from 2,912 patients were manually annotated with these entities by medical students and clinicians. Using these data, we trained and validated several transformer-based named entity recognition (NER) models, focusing on transformer models pre-trained on a biomedical corpus, as these have shown the most promise in the biomedical NER literature.

Results: Our best model was a PubMedBERT-based approach, which obtained a lenient micro/macro F1 score of 0.829/0.723. Ensembling 5 variants of this model gave a significant boost to precision, obtaining a micro/macro F1 of 0.846/0.734, which approaches the human annotator performance of 0.847/0.839. We further propose numeric definitions for the concepts of name regularity (similarity of all spans which refer to an entity) and context regularity (similarity of all context surrounding mentions of an entity). We use these to analyse the types of errors made by the system and find that the name regularity of an entity is a stronger predictor of model performance than raw training set frequency.

Discussion: Overall, this work shows the potential of machine learning to provide clinical decision support (CDS) for the time-critical decision of thrombolysis administration in ischaemic stroke by quickly surfacing relevant information, leading to prompt treatment and hence to better patient outcomes.
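The results above are reported as micro/macro F1 pairs. The following Python sketch shows the usual difference between the two averages. It assumes plain per-entity counts of true positives, false positives, and false negatives; it does not model the paper's lenient span matching, and the entity names and counts are invented for illustration.

```python
from typing import Dict, Tuple

# (true positives, false positives, false negatives) for one entity type
Counts = Tuple[int, int, int]


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 written directly in terms of counts: 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(per_entity: Dict[str, Counts]) -> float:
    """Pool the counts over all entity types, then compute a single F1."""
    tp = sum(c[0] for c in per_entity.values())
    fp = sum(c[1] for c in per_entity.values())
    fn = sum(c[2] for c in per_entity.values())
    return f1_from_counts(tp, fp, fn)


def macro_f1(per_entity: Dict[str, Counts]) -> float:
    """Compute F1 per entity type, then take the unweighted mean."""
    scores = [f1_from_counts(*c) for c in per_entity.values()]
    return sum(scores) / len(scores) if scores else 0.0


# Invented counts: one frequent, well-recognised entity and one rare, harder one.
counts = {
    "anticoagulant_medication": (80, 10, 10),
    "recent_major_surgery": (2, 2, 2),
}
print(f"micro={micro_f1(counts):.3f} macro={macro_f1(counts):.3f}")
```

In this toy case the micro average is dominated by the frequent entity, while the macro average weights both entities equally. A macro score below the micro score is therefore what one would expect when rarer entities are recognised less well.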
Affiliation(s)
- Hannah Watson
  - Canon Medical Research Europe, Edinburgh, United Kingdom
- Cameron Brown
  - Institute of Neuroscience & Psychology, University of Glasgow, Glasgow, United Kingdom
- Chaoyang Wang
  - Canon Medical Research Europe, Edinburgh, United Kingdom
- Stuart Thomson
  - Canon Medical Research Europe, Edinburgh, United Kingdom
- Dickon Fell
  - Canon Medical Research Europe, Edinburgh, United Kingdom
- James Lesh
  - Canon Medical Research Europe, Edinburgh, United Kingdom
- Keith Muir
  - Institute of Neuroscience & Psychology, University of Glasgow, Glasgow, United Kingdom
- Alexander Weir
  - Canon Medical Research Europe, Edinburgh, United Kingdom
- Alison Q O’Neil
  - Canon Medical Research Europe, Edinburgh, United Kingdom
  - School of Engineering, University of Edinburgh, Edinburgh, United Kingdom
3
Sanchez P, Voisey JP, Xia T, Watson HI, O’Neil AQ, Tsaftaris SA. Causal machine learning for healthcare and precision medicine. R Soc Open Sci 2022; 9:220638. [PMID: 35950198 PMCID: PMC9346354 DOI: 10.1098/rsos.220638] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 05/13/2022] [Accepted: 07/15/2022] [Indexed: 06/15/2023]
Abstract
Causal machine learning (CML) has experienced increasing popularity in healthcare. Beyond its inherent capability to add domain knowledge to learning systems, CML provides a complete toolset for investigating how a system would react to an intervention (e.g. outcome given a treatment). Quantifying the effects of interventions allows actionable decisions to be made while maintaining robustness in the presence of confounders. Here, we explore how causal inference can be incorporated into different aspects of clinical decision support systems by using recent advances in machine learning. Throughout this paper, we use Alzheimer's disease to create examples illustrating how CML can be advantageous in clinical scenarios. Furthermore, we discuss important challenges present in healthcare applications, such as processing high-dimensional and unstructured data, generalization to out-of-distribution samples, and temporal relationships, which remain unsolved despite great effort from the research community. Finally, we review lines of research within causal representation learning, causal discovery, and causal reasoning that offer potential for addressing these challenges.
Affiliation(s)
- Pedro Sanchez
  - School of Engineering, University of Edinburgh, Edinburgh, UK
- Jeremy P. Voisey
  - AI Research, Canon Medical Research Europe, Edinburgh, Lothian, UK
- Tian Xia
  - School of Engineering, University of Edinburgh, Edinburgh, UK
- Hannah I. Watson
  - AI Research, Canon Medical Research Europe, Edinburgh, Lothian, UK
- Alison Q. O’Neil
  - School of Engineering, University of Edinburgh, Edinburgh, UK
  - AI Research, Canon Medical Research Europe, Edinburgh, Lothian, UK