1. Hunter SA, Bullen J, Hunter KJ, Bhatt K. Analysis of Longitudinal Assessment: Role of Radiology Online Longitudinal Assessment-Type Questions. J Am Coll Radiol 2024:S1546-1440(24)00299-0. PMID: 38527644. DOI: 10.1016/j.jacr.2024.03.011.
Abstract
OBJECTIVE: The purpose of this investigation was to assess gaps in radiologists' medical knowledge using abdominal subspecialty online longitudinal assessment (OLA)-type questions. Secondarily, we evaluated which question-centric factors influenced radiologists to pursue self-directed additional reading on the topics presented.

METHODS: A prospective OLA-type test was distributed nationally to radiologists over a 4-month period. Questions were divided into multiple groupings: drawn from three different time periods of literature (≤5 years, 6-15 years, and >20 years), relating to common versus uncommon modalities, and guideline-based versus knowledge-based. After each question, participants rated their confidence in the diagnosis and the perceived relevance of the question. Answers were then provided, and links to answer explanations and references were supplied and tracked. A series of regression models tested potential predictors of correct response, participant confidence, and perceived question relevance.

RESULTS: In all, 119 participants initiated the survey, with 100 answering at least one question. Participants reported significantly lower perceived relevance (mean: 51.3, 59.2, and 62.1 for topics ≤5 years old, 6-15 years old, and >20 years old, respectively; P < .001) and confidence (mean: 48.4, 57.8, and 63.4, respectively; P < .001) for questions on newer literature compared with older literature. Participants were significantly more likely to read question explanations for questions on common modalities than on uncommon ones (46% versus 40%; P = .005) and for guideline-based questions than for knowledge-based questions (49% versus 43%; P = .01).

DISCUSSION: OLA-type questions function by identifying areas in which radiologists lack knowledge or confidence, and they highlight areas in which participants are interested in further education.
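A minimal sketch of the kind of regression analysis the abstract describes, assuming a hypothetical long-format table (one row per participant-question pair) with made-up column names; the published models may differ, for instance in how repeated measures per participant were handled:

```python
# Hypothetical sketch, not the authors' analysis code. Column names
# (correct, confidence, period, modality, participant) are assumptions.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("ola_responses.csv")  # one row per participant-question pair

# Logistic regression: do literature period and modality predict accuracy?
acc_model = smf.logit("correct ~ C(period) + C(modality)", data=df).fit()
print(acc_model.summary())

# Mixed linear model for the 0-100 confidence rating, with a random
# intercept per participant to account for repeated measures.
conf_model = smf.mixedlm("confidence ~ C(period)", df,
                         groups=df["participant"]).fit()
print(conf_model.summary())
```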
Affiliation(s)
- Sara A Hunter: Assistant Professor of Radiology, Imaging Institute, Cleveland Clinic, Cleveland, Ohio
- Jennifer Bullen: Senior Biostatistician, Department of Quantitative Health Sciences, Cleveland Clinic, Cleveland, Ohio
- Kyle J Hunter: Vice Chair of Quality for the Department of Radiology and Assistant Professor of Radiology, Department of Radiology, MetroHealth, Cleveland, Ohio
- Kavita Bhatt: Diagnostic Radiology Residency Associate Program Director and Assistant Professor of Radiology, Imaging Institute, Cleveland Clinic, Cleveland, Ohio
2. Zaman S, Vimalesvaran K, Chappell D, Varela M, Peters NS, Shiwani H, Knott KD, Davies RH, Moon JC, Bharath AA, Linton NW, Francis DP, Cole GD, Howard JP. Quality assurance of late gadolinium enhancement cardiac magnetic resonance images: a deep learning classifier for confidence in the presence or absence of abnormality with potential to prompt real-time image optimization. J Cardiovasc Magn Reson 2024; 26:101040. PMID: 38522522. PMCID: PMC11129090. DOI: 10.1016/j.jocmr.2024.101040.
Abstract
BACKGROUND: Late gadolinium enhancement (LGE) of the myocardium has significant diagnostic and prognostic implications, with even small areas of enhancement being important. Distinguishing between definitely normal and definitely abnormal LGE images is usually straightforward, but diagnostic uncertainty arises when reporters are unsure whether the observed LGE is genuine. This uncertainty might be resolved by repetition (to remove artifact) or by acquiring intersecting images, but this must take place before the scan finishes. Real-time quality assurance by humans is a complex task requiring training and experience, so the ability to identify, while the scan is ongoing and without an expert present, which images have an intermediate likelihood of LGE is of high value. Such decision support could prompt immediate image optimization or acquisition of supplementary images to confirm or refute the presence of genuine LGE, reducing ambiguity in reports.

METHODS: Short-axis, phase-sensitive inversion recovery late gadolinium images were extracted from our clinical cardiac magnetic resonance (CMR) database and shuffled. Two independent, blinded experts scored each individual slice for "LGE likelihood" on a visual analog scale from 0 (absolute certainty of no LGE) to 100 (absolute certainty of LGE), with 50 representing clinical equipoise. The scored images were split into two classes: "high certainty" of whether LGE was present or not, or "low certainty". The dataset was split into training, validation, and test sets (70:15:15). A deep learning binary classifier based on the EfficientNetV2 convolutional neural network architecture was trained to distinguish between these categories. Classifier performance on the test set was evaluated by calculating accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve (ROC AUC). Performance was also evaluated on an external test set of images from a different center.

RESULTS: 1645 images (from 272 patients) were labeled and split at the patient level into training (1151 images), validation (247 images), and test (247 images) sets. Of these, 1208 images were "high certainty" (255 for LGE, 953 for no LGE) and 437 were "low certainty". An external test set comprising 247 images from 41 patients at another center was also employed. After 100 epochs, performance on the internal test set was accuracy = 0.94, recall = 0.80, precision = 0.97, F1-score = 0.87, and ROC AUC = 0.94. The classifier also performed robustly on the external test set (accuracy = 0.91, recall = 0.73, precision = 0.93, F1-score = 0.82, and ROC AUC = 0.91). These results were benchmarked against a reference inter-expert accuracy of 0.86.

CONCLUSION: Deep learning shows potential to automate quality control of late gadolinium imaging in CMR. The ability to identify short-axis images with intermediate LGE likelihood in real time may serve as a useful decision-support tool, guiding immediate further imaging while the patient is still in the scanner and thereby reducing the frequency of recalls and inconclusive reports due to diagnostic indecision.
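As a rough sketch of the model class named in the abstract (not the authors' code), the following adapts torchvision's EfficientNetV2-S to a single-logit binary head for the high- versus low-certainty task; the pretrained weights, optimizer, and preprocessing are assumptions:

```python
# Hedged sketch, not the published pipeline. Assumes grayscale CMR
# slices have been replicated to three channels and resized upstream.
import torch
import torch.nn as nn
from torchvision import models

model = models.efficientnet_v2_s(weights="IMAGENET1K_V1")
in_features = model.classifier[1].in_features
model.classifier[1] = nn.Linear(in_features, 1)  # single-logit binary head

criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One optimization step on a batch shaped (B, 3, H, W)."""
    model.train()
    optimizer.zero_grad()
    logits = model(images).squeeze(1)
    loss = criterion(logits, labels.float())
    loss.backward()
    optimizer.step()
    return loss.item()
```

At evaluation time, sigmoid probabilities thresholded at 0.5 would yield the accuracy, precision, recall, and F1 figures, while the raw probabilities feed the ROC AUC.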
Affiliation(s)
- Sameer Zaman: National Heart and Lung Institute, Imperial College London, London SW7 2AZ, UK; Imperial College Healthcare NHS Trust, London W12 0HS, UK; AI for Healthcare Centre for Doctoral Training, Imperial College London, London SW7 2AZ, UK
- Kavitha Vimalesvaran: National Heart and Lung Institute, Imperial College London, London SW7 2AZ, UK; AI for Healthcare Centre for Doctoral Training, Imperial College London, London SW7 2AZ, UK
- Digby Chappell: AI for Healthcare Centre for Doctoral Training, Imperial College London, London SW7 2AZ, UK
- Marta Varela: National Heart and Lung Institute, Imperial College London, London SW7 2AZ, UK
- Nicholas S Peters: National Heart and Lung Institute, Imperial College London, London SW7 2AZ, UK; Imperial College Healthcare NHS Trust, London W12 0HS, UK
- Hunain Shiwani: Institute of Cardiovascular Science, University College London, London WC1E 6DD, UK; Barts Heart Centre, St. Bartholomew's Hospital, London EC1A 7BE, UK
- Kristopher D Knott: Institute of Cardiovascular Science, University College London, London WC1E 6DD, UK; St. George's University Hospitals NHS Foundation Trust, London SW17 0QT, UK
- Rhodri H Davies: Institute of Cardiovascular Science, University College London, London WC1E 6DD, UK; Barts Heart Centre, St. Bartholomew's Hospital, London EC1A 7BE, UK
- James C Moon: Institute of Cardiovascular Science, University College London, London WC1E 6DD, UK; Barts Heart Centre, St. Bartholomew's Hospital, London EC1A 7BE, UK
- Anil A Bharath: Department of Bioengineering, Imperial College London, London SW7 2AZ, UK
- Nick WF Linton: Imperial College Healthcare NHS Trust, London W12 0HS, UK; Department of Bioengineering, Imperial College London, London SW7 2AZ, UK
- Darrel P Francis: National Heart and Lung Institute, Imperial College London, London SW7 2AZ, UK; Imperial College Healthcare NHS Trust, London W12 0HS, UK
- Graham D Cole: National Heart and Lung Institute, Imperial College London, London SW7 2AZ, UK; Imperial College Healthcare NHS Trust, London W12 0HS, UK
- James P Howard: National Heart and Lung Institute, Imperial College London, London SW7 2AZ, UK; Imperial College Healthcare NHS Trust, London W12 0HS, UK
3. Cuna A, Rathore D, Bourret K, Opfer E, Chan S. Degree of Uncertainty in Reporting Imaging Findings for Necrotizing Enterocolitis: A Secondary Analysis from a Pilot Randomized Diagnostic Trial. Healthcare (Basel) 2024; 12:511. PMID: 38470621. PMCID: PMC10931429. DOI: 10.3390/healthcare12050511.
Abstract
Diagnosis of necrotizing enterocolitis (NEC) relies heavily on imaging, but uncertainty in the language used in imaging reports can result in ambiguity, miscommunication, and potential diagnostic errors. To determine the degree of uncertainty in reporting imaging findings for NEC, we conducted a secondary analysis of data from a previously completed pilot diagnostic randomized controlled trial (2019-2020). The study population comprised sixteen preterm infants with suspected NEC randomized to abdominal radiographs (AXR) or AXR plus bowel ultrasound (BUS). The level of uncertainty was determined using a four-point Likert scale. Overall, we reviewed radiology reports of 113 AXR and 24 BUS examinations from sixteen preterm infants with NEC concern. The BUS reports showed less uncertainty in reporting pneumatosis, portal venous gas, and free air compared with AXR reports (pneumatosis: 1 [1-1.75] vs. 3 [2-3], p < 0.0001; portal venous gas: 1 [1-1] vs. 1 [1-1], p = 0.02; free air: 1 [1-1] vs. 2 [1-3], p < 0.0001). In conclusion, BUS reports carry a lower degree of uncertainty in reporting imaging findings of NEC than AXR reports. Whether the lower degree of uncertainty of BUS reports positively impacts clinical decision making in infants with possible NEC remains unknown.
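For illustration only, a rank-based comparison of 4-point Likert uncertainty scores of the sort summarized above (median [IQR]) could be run as follows; the abstract does not state which test was used, and the score vectors here are placeholders, not study data:

```python
# Hedged sketch with made-up scores; not the study's data or analysis.
import numpy as np
from scipy.stats import mannwhitneyu

axr_pneumatosis = np.array([3, 2, 3, 2, 3, 1, 2, 3])  # placeholder scores
bus_pneumatosis = np.array([1, 1, 1, 2, 1, 1, 1, 2])

stat, p = mannwhitneyu(bus_pneumatosis, axr_pneumatosis,
                       alternative="two-sided")
q1, med, q3 = np.percentile(bus_pneumatosis, [25, 50, 75])
print(f"BUS median [IQR]: {med:.0f} [{q1:.0f}-{q3:.0f}], p = {p:.4f}")
```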
Affiliation(s)
- Alain Cuna: Division of Neonatology, Children’s Mercy Kansas City, Kansas City, MO 64108, USA; School of Medicine, University of Missouri-Kansas City, Kansas City, MO 64108, USA
- Disa Rathore: School of Medicine, Kansas City University, Kansas City, MO 64106, USA
- Kira Bourret: School of Medicine, Kansas City University, Kansas City, MO 64106, USA
- Erin Opfer: School of Medicine, University of Missouri-Kansas City, Kansas City, MO 64108, USA; Department of Radiology, Children’s Mercy Kansas City, Kansas City, MO 64108, USA
- Sherwin Chan: School of Medicine, University of Missouri-Kansas City, Kansas City, MO 64108, USA; Department of Radiology, Children’s Mercy Kansas City, Kansas City, MO 64108, USA
4. Nobel JM, Puts S, Krdzalic J, Zegers KML, Lobbes MBI, Robben SGF, Dekker ALAJ. Natural Language Processing Algorithm Used for Staging Pulmonary Oncology from Free-Text Radiological Reports: "Including PET-CT and Validation Towards Clinical Use". J Imaging Inform Med 2024; 37:3-12. PMID: 38343237. DOI: 10.1007/s10278-023-00913-x.
Abstract
Natural language processing (NLP) can be used to process and structure free text, such as free-text radiological reports. In radiology, it is important that reports are complete and accurate for clinical staging of, for instance, pulmonary oncology. A computed tomography (CT) or positron emission tomography (PET)-CT scan is of great importance in tumor staging, and NLP may add value to the radiological report when used in the staging process, as it may be able to extract the T and N stage of the 8th tumor-node-metastasis (TNM) classification system. The purpose of this study was to evaluate a new TN algorithm (TN-PET-CT) created by adding a layer of metabolic activity to an existing rule-based NLP algorithm (TN-CT). The new TN-PET-CT algorithm can stage chest CT examinations as well as PET-CT scans. The study design also allowed a subgroup analysis testing the external validation of the prior TN-CT algorithm. For information extraction and matching, pyContextNLP, spaCy, and regular expressions were used. Overall TN accuracy of the TN-PET-CT algorithm was 0.73 in the training set and 0.62 in the validation set (N = 63 and N = 100, respectively). External validation accuracy of the TN-CT classifier (N = 65) was 0.72. Overall, it is possible to adapt the TN-CT algorithm into a TN-PET-CT algorithm. However, outcomes depend strongly on the accuracy of the report, the vocabulary used, and its context for expressing, for example, uncertainty. This holds both for the adjusted PET-CT algorithm and for the CT algorithm when applied in another hospital.
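The published algorithm is rule-based and combines pyContextNLP, spaCy, and regular expressions; as a toy illustration of that general approach (not the TN-PET-CT implementation), the snippet below pulls an explicit TN mention from report text and flags hedging language of the kind the abstract notes can derail such systems:

```python
# Toy illustration only; pattern, hedge list, and example are assumptions.
import re

TNM_PATTERN = re.compile(r"\b[cp]?(T[0-4][a-c]?)[\s,]*(N[0-3])\b")
UNCERTAIN = re.compile(
    r"\b(possible|suspicious for|cannot exclude|probably)\b", re.IGNORECASE)

def extract_tn(report: str):
    """Return the first explicit TN mention plus an uncertainty flag."""
    match = TNM_PATTERN.search(report)
    if not match:
        return None
    return {"T": match.group(1), "N": match.group(2),
            "uncertain_language": bool(UNCERTAIN.search(report))}

print(extract_tn("FDG-avid mass, staged cT2a N1; cannot exclude N2 disease."))
# -> {'T': 'T2a', 'N': 'N1', 'uncertain_language': True}
```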
Affiliation(s)
- J Martijn Nobel: Department of Radiology and Nuclear Medicine, Maastricht University Medical Center+, Postbox 5800, 6202 AZ Maastricht, Netherlands; School of Health Professions Education, Maastricht University, Maastricht, Netherlands
- Sander Puts: Department of Radiation Oncology (MAASTRO), Maastricht, Netherlands; GROW School for Oncology and Reproduction, Maastricht University, Maastricht, Netherlands
- Jasenko Krdzalic: Department of Medical Imaging, Zuyderland Medical Center, Sittard-Geleen, Netherlands
- Karen M L Zegers: Department of Radiation Oncology (MAASTRO), Maastricht, Netherlands; GROW School for Oncology and Reproduction, Maastricht University, Maastricht, Netherlands
- Marc B I Lobbes: Department of Radiology and Nuclear Medicine, Maastricht University Medical Center+, Maastricht, Netherlands; GROW School for Oncology and Reproduction, Maastricht University, Maastricht, Netherlands; Department of Medical Imaging, Zuyderland Medical Center, Sittard-Geleen, Netherlands
- Simon G F Robben: Department of Radiology and Nuclear Medicine, Maastricht University Medical Center+, Maastricht, Netherlands; School of Health Professions Education, Maastricht University, Maastricht, Netherlands
- André L A J Dekker: Department of Radiation Oncology (MAASTRO), Maastricht, Netherlands; GROW School for Oncology and Reproduction, Maastricht University, Maastricht, Netherlands
5. Hunter SA, Baker ME, Ream JM, Sweet DE, Austin NA, Remer EM, Primak A, Bullen J, Obuchowski N, Karim W, Herts BR. Visceral adipose tissue volume effect in Crohn's disease using reduced exposure CT enterography. J Appl Clin Med Phys 2024; 25:e14235. PMID: 38059633. PMCID: PMC10795447. DOI: 10.1002/acm2.14235.
Abstract
PURPOSE: The purpose of this investigation was to assess the effect of visceral adipose tissue volume (VA) on reader efficacy in diagnosing and characterizing small bowel Crohn's disease using lower exposure CT enterography (CTE). Secondarily, we investigated the effect of lower exposure and VA on reader diagnostic confidence.

METHODS: In this prospective paired investigation, 256 CTE examinations, 129 with Crohn's disease, were reconstructed at 100% and at simulated 50% and 30% exposure. The senior author provided the disease classification for the 129 patients with Crohn's disease. Patient VA was measured, and exams were evaluated by six readers for the presence or absence of Crohn's disease and for phenotype using a 0-10-point scale. Logistic regression models assessed the effect of VA on sensitivity and specificity.

RESULTS: The effect of VA on sensitivity was significantly reduced at 30% exposure (odds ratio [OR]: 1.00) compared with 100% exposure (OR: 1.12) (p = 0.048). There was no statistically significant difference among the exposures with respect to the effect of visceral fat on specificity (p = 0.159). The study readers' probability of agreement with the senior author on disease classification was 60%, 56%, and 53% at 100%, 50%, and 30% exposure, respectively (p = 0.004). When detecting low severity Crohn's disease, readers' mean sensitivity was 83%, 75%, and 74% at 100%, 50%, and 30% exposure, respectively (p = 0.002). In low severity disease, sensitivity also tended to increase as visceral fat increased (ORs per 1000 cm³ increase in visceral fat: 1.32, 1.31, and 1.18; p = 0.010, 0.016, and 0.100 at 100%, 50%, and 30% exposure, respectively).

CONCLUSIONS: While the interaction is complex, VA plays a role in detecting and characterizing small bowel Crohn's disease when exposure is altered, particularly in low severity disease.
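A hedged sketch of this type of analysis with hypothetical variable names; the published models also had to account for the paired, six-reader design, which this minimal version ignores:

```python
# Hypothetical sketch, not the study's code. 'detected' is 1 when a
# reader called Crohn's disease present on a disease-positive exam.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("cte_reads.csv")            # one row per reader-exam read
df["va_1000"] = df["visceral_fat_cm3"] / 1000.0
pos = df[df["crohns_present"] == 1]           # sensitivity analysis subset

# Visceral fat by exposure interaction, with 100% exposure as reference.
model = smf.logit(
    "detected ~ va_1000 * C(exposure, Treatment(reference=100))",
    data=pos).fit()
print("OR per 1000 cm³ at 100% exposure:", np.exp(model.params["va_1000"]))
```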
Affiliation(s)
- Mark E. Baker: Imaging Institute, Cleveland Clinic, Cleveland, Ohio, USA
- Jennifer Bullen: Department of Quantitative Health Sciences, Cleveland Clinic, Cleveland, Ohio, USA
- Nancy Obuchowski: Department of Quantitative Health Sciences, Cleveland Clinic, Cleveland, Ohio, USA
- Wadih Karim: Imaging Institute, Cleveland Clinic, Cleveland, Ohio, USA
6. Casey A, Davidson E, Grover C, Tobin R, Grivas A, Zhang H, Schrempf P, O’Neil AQ, Lee L, Walsh M, Pellie F, Ferguson K, Cvoro V, Wu H, Whalley H, Mair G, Whiteley W, Alex B. Understanding the performance and reliability of NLP tools: a comparison of four NLP tools predicting stroke phenotypes in radiology reports. Front Digit Health 2023; 5:1184919. PMID: 37840686. PMCID: PMC10569314. DOI: 10.3389/fdgth.2023.1184919.
Abstract
Background: Natural language processing (NLP) has the potential to automate the reading of radiology reports, but there is a need to demonstrate that NLP methods are adaptable and reliable for use in real-world clinical applications.

Methods: We used F1 score, precision, and recall to compare NLP tools on a cohort from a study on delirium using images and radiology reports from NHS Fife and on a population-based cohort (Generation Scotland) that spans multiple National Health Service health boards. We compared four off-the-shelf rule-based and neural NLP tools (EdIE-R, ALARM+, ESPRESSO, and Sem-EHR) and report their performance for three cerebrovascular phenotypes: ischaemic stroke, small vessel disease (SVD), and atrophy. Clinical experts from the EdIE-R team defined phenotypes using labelling techniques developed for EdIE-R, in conjunction with an expert researcher who read the underlying images.

Results: EdIE-R obtained the highest F1 score in both cohorts for ischaemic stroke, ≥93%, followed by ALARM+, ≥87%. The F1 score of ESPRESSO was ≥74%, whilst that of Sem-EHR was ≥66%, although ESPRESSO had the highest precision in both cohorts, 90% and 98%. For SVD, EdIE-R scored ≥98% on F1 and ALARM+ ≥90%; ESPRESSO scored lowest with ≥77% and Sem-EHR ≥81%. In NHS Fife, F1 scores for atrophy by EdIE-R and ALARM+ were 99%, dropping in Generation Scotland to 96% for EdIE-R and 91% for ALARM+. Sem-EHR performed lowest for atrophy, at 89% in NHS Fife and 73% in Generation Scotland. When comparing NLP tool output with brain image reads using F1 scores, ALARM+ scored 80%, outperforming EdIE-R at 66% for ischaemic stroke. For SVD, EdIE-R performed best, scoring 84%, with Sem-EHR at 82%. For atrophy, EdIE-R and both ALARM+ versions were comparable at 80%.

Conclusions: The four NLP tools show varying F1 (and precision/recall) scores across all three phenotypes, most apparently for ischaemic stroke. If NLP tools are to be used in clinical settings, this cannot be done "out of the box." It is essential to understand the context of their development to assess whether they are suitable for the task at hand or whether further training, re-training, or modification is required to adapt a tool to the target task.
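For readers reproducing this kind of benchmark, a minimal sketch of computing per-phenotype precision, recall, and F1 from tool output against expert labels; the label vectors are placeholders, not study data:

```python
# Illustrative only; binary per-report labels are made up.
from sklearn.metrics import precision_recall_fscore_support

# 1 = phenotype present (e.g., ischaemic stroke), 0 = absent.
expert = [1, 0, 1, 1, 0, 0, 1, 0]
tool   = [1, 0, 1, 0, 0, 1, 1, 0]

precision, recall, f1, _ = precision_recall_fscore_support(
    expert, tool, average="binary")
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```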
Affiliation(s)
- Arlene Casey: Advanced Care Research Centre, Usher Institute, University of Edinburgh, Edinburgh, United Kingdom
- Emma Davidson: Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, United Kingdom
- Claire Grover: School of Informatics, University of Edinburgh, Edinburgh, United Kingdom
- Richard Tobin: School of Informatics, University of Edinburgh, Edinburgh, United Kingdom
- Andreas Grivas: School of Informatics, University of Edinburgh, Edinburgh, United Kingdom
- Huayu Zhang: Advanced Care Research Centre, Usher Institute, University of Edinburgh, Edinburgh, United Kingdom
- Patrick Schrempf: Canon Medical Research Europe Ltd., AI Research, Edinburgh, United Kingdom; School of Computer Science, University of St Andrews, St Andrews, United Kingdom
- Alison Q. O’Neil: Canon Medical Research Europe Ltd., AI Research, Edinburgh, United Kingdom; School of Engineering, University of Edinburgh, Edinburgh, United Kingdom
- Liam Lee: Medical School, University of Edinburgh, Edinburgh, United Kingdom
- Michael Walsh: Intensive Care Department, University Hospitals Bristol and Weston, Bristol, United Kingdom
- Freya Pellie: National Horizons Centre, Teesside University, Darlington, United Kingdom; School of Health and Life Sciences, Teesside University, Middlesbrough, United Kingdom
- Karen Ferguson: Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, United Kingdom
- Vera Cvoro: Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, United Kingdom; Department of Geriatric Medicine, NHS Fife, Fife, United Kingdom
- Honghan Wu: Institute of Health Informatics, University College London, London, United Kingdom; Alan Turing Institute, London, United Kingdom
- Heather Whalley: Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, United Kingdom; Generation Scotland, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, United Kingdom
- Grant Mair: Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, United Kingdom; Neuroradiology, Department of Clinical Neurosciences, NHS Lothian, Edinburgh, United Kingdom
- William Whiteley: Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, United Kingdom; Neuroradiology, Department of Clinical Neurosciences, NHS Lothian, Edinburgh, United Kingdom
- Beatrice Alex: Edinburgh Futures Institute, University of Edinburgh, Edinburgh, United Kingdom; School of Literatures, Languages and Cultures, University of Edinburgh, Edinburgh, United Kingdom
7. Marshall TL, Nickels LC, Brady PW, Edgerton EJ, Lee JJ, Hagedorn PA. Developing a machine learning model to detect diagnostic uncertainty in clinical documentation. J Hosp Med 2023; 18:405-412. PMID: 36919861. DOI: 10.1002/jhm.13080.
Abstract
BACKGROUND AND OBJECTIVE: Diagnostic uncertainty, when unrecognized or poorly communicated, can result in diagnostic error. However, diagnostic uncertainty is challenging to study due to a lack of validated identification methods. This study aims to identify distinct linguistic patterns associated with diagnostic uncertainty in clinical documentation.

DESIGN, SETTING AND PARTICIPANTS: This case-control study compares the clinical documentation of hospitalized children who received a novel uncertain diagnosis (UD) label during their admission with that of a set of matched controls. Linguistic analyses identified potential linguistic indicators (i.e., words or phrases) of diagnostic uncertainty, which were then manually reviewed by a linguist and clinical experts to identify those most relevant to diagnostic uncertainty. A natural language processing program categorized medical terminology into semantic types (e.g., sign or symptom), from which we identified the subset that both categorized reliably and was relevant to diagnostic uncertainty. Finally, a competitive machine learning modeling strategy utilizing the linguistic indicators and semantic types compared different predictive models for identifying diagnostic uncertainty.

RESULTS: Our cohort included 242 UD-labeled patients and 932 matched controls, with a combined total of 3070 clinical notes. The best-performing model was a random forest utilizing a combination of linguistic indicators and semantic types, yielding a sensitivity of 89.4% and a positive predictive value of 96.7%.

CONCLUSION: Expert labeling, natural language processing, and machine learning methods combined with human validation resulted in highly predictive models for detecting diagnostic uncertainty in clinical documentation and represent a promising approach to detecting, studying, and ultimately mitigating diagnostic uncertainty in clinical practice.
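An illustrative sketch of the winning model class, a random forest over indicator and semantic-type features, with sensitivity and positive predictive value computed from the confusion matrix; the features and labels below are synthetic placeholders, not the study's pipeline:

```python
# Synthetic-data sketch only; feature meanings and labels are assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.poisson(2, size=(1174, 20))   # e.g., indicator/semantic-type counts
y = rng.integers(0, 2, size=1174)     # 1 = note from a UD-labeled patient

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)

tn, fp, fn, tp = confusion_matrix(y_te, clf.predict(X_te)).ravel()
print(f"sensitivity = {tp / (tp + fn):.3f}, PPV = {tp / (tp + fp):.3f}")
```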
Affiliation(s)
- Trisha L Marshall: Division of Hospital Medicine, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA; Department of Pediatrics, College of Medicine, University of Cincinnati, Cincinnati, Ohio, USA
- Lindsay C Nickels: Digital Scholarship Center, University of Cincinnati Libraries and College of Arts and Sciences, Cincinnati, Ohio, USA; AI for All Lab, Digital Futures Program, University of Cincinnati, Cincinnati, Ohio, USA
- Patrick W Brady: Division of Hospital Medicine, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA; Department of Pediatrics, College of Medicine, University of Cincinnati, Cincinnati, Ohio, USA; James M. Anderson Center for Health Systems Excellence, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA
- Ezra J Edgerton: Digital Scholarship Center, University of Cincinnati Libraries and College of Arts and Sciences, Cincinnati, Ohio, USA; AI for All Lab, Digital Futures Program, University of Cincinnati, Cincinnati, Ohio, USA
- James J Lee: Digital Scholarship Center, University of Cincinnati Libraries and College of Arts and Sciences, Cincinnati, Ohio, USA; AI for All Lab, Digital Futures Program, University of Cincinnati, Cincinnati, Ohio, USA
- Philip A Hagedorn: Division of Hospital Medicine, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA; Department of Pediatrics, College of Medicine, University of Cincinnati, Cincinnati, Ohio, USA; Department of Information Services, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA; Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA
8. Li D, Pehrson LM, Bonnevie R, Fraccaro M, Thrane J, Tøttrup L, Lauridsen CA, Butt Balaganeshan S, Jankovic J, Andersen TT, Mayar A, Hansen KL, Carlsen JF, Darkner S, Nielsen MB. Performance and Agreement When Annotating Chest X-ray Text Reports—A Preliminary Step in the Development of a Deep Learning-Based Prioritization and Detection System. Diagnostics (Basel) 2023; 13:1070. PMID: 36980376. PMCID: PMC10047142. DOI: 10.3390/diagnostics13061070.
Abstract
A chest X-ray report is a communicative tool and can also serve as data for developing artificial intelligence-based decision support systems. For both purposes, consistent understanding and labeling are important. Our aim was to investigate how readers would comprehend and annotate 200 chest X-ray reports. Reports written between 1 January 2015 and 11 March 2022 were selected based on search words. Annotators included three board-certified radiologists, two trained radiologists (physicians), two radiographers (radiological technicians), a non-radiological physician, and a medical student. Consensus labels by two or more of the experienced radiologists were considered the “gold standard”. The Matthews correlation coefficient (MCC) was calculated to assess annotation performance, and descriptive statistics were used to assess agreement between individual annotators and labels. The intermediate radiologist had the best correlation with the “gold standard” (MCC 0.77), followed by the novice radiologist and medical student (MCC 0.71 for both), the novice radiographer (MCC 0.65), the non-radiological physician (MCC 0.64), and the experienced radiographer (MCC 0.57). Our findings show that, when developing an artificial intelligence-based support system and trained radiologists are not available, annotations from non-radiological annotators with basic and general knowledge may align better with radiologists' annotations than those from sub-specialized medical staff whose sub-specialization lies outside diagnostic radiology.
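A small sketch of the agreement metric used in the study, comparing one annotator's labels against the radiologist-consensus "gold standard"; the label vectors are placeholders:

```python
# Placeholder labels; illustrates the metric, not the study's data.
from sklearn.metrics import matthews_corrcoef

gold      = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]  # consensus label per report
annotator = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

print(f"MCC = {matthews_corrcoef(gold, annotator):.2f}")
```

Unlike raw percent agreement, MCC accounts for chance agreement under class imbalance, which matters when most reports lack a given finding.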
Affiliation(s)
- Dana Li: Department of Diagnostic Radiology, Copenhagen University Hospital, Rigshospitalet, 2100 Copenhagen, Denmark; Department of Clinical Medicine, University of Copenhagen, 2100 Copenhagen, Denmark
- Lea Marie Pehrson: Department of Diagnostic Radiology, Copenhagen University Hospital, Rigshospitalet, 2100 Copenhagen, Denmark; Department of Computer Science, University of Copenhagen, 2100 Copenhagen, Denmark
- Carsten Ammitzbøl Lauridsen: Department of Diagnostic Radiology, Copenhagen University Hospital, Rigshospitalet, 2100 Copenhagen, Denmark; Radiography Education, University College Copenhagen, 2200 Copenhagen, Denmark
- Sedrah Butt Balaganeshan: Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2100 Copenhagen, Denmark
- Jelena Jankovic: Department of Diagnostic Radiology, Copenhagen University Hospital, Rigshospitalet, 2100 Copenhagen, Denmark
- Tobias Thostrup Andersen: Department of Diagnostic Radiology, Copenhagen University Hospital, Rigshospitalet, 2100 Copenhagen, Denmark
- Alyas Mayar: Department of Health Sciences, Panum Institute, University of Copenhagen, 2100 Copenhagen, Denmark
- Kristoffer Lindskov Hansen: Department of Diagnostic Radiology, Copenhagen University Hospital, Rigshospitalet, 2100 Copenhagen, Denmark; Department of Clinical Medicine, University of Copenhagen, 2100 Copenhagen, Denmark
- Jonathan Frederik Carlsen: Department of Diagnostic Radiology, Copenhagen University Hospital, Rigshospitalet, 2100 Copenhagen, Denmark; Department of Clinical Medicine, University of Copenhagen, 2100 Copenhagen, Denmark
- Sune Darkner: Department of Computer Science, University of Copenhagen, 2100 Copenhagen, Denmark
- Michael Bachmann Nielsen: Department of Diagnostic Radiology, Copenhagen University Hospital, Rigshospitalet, 2100 Copenhagen, Denmark; Department of Clinical Medicine, University of Copenhagen, 2100 Copenhagen, Denmark
9. Dhami MK, Mandel DR. Communicating uncertainty using words and numbers. Trends Cogn Sci 2022; 26:514-526. DOI: 10.1016/j.tics.2022.03.002.
10. Oren O, Gersh BJ, Bhatt DL. Improving Communication of Incidental Imaging Findings: Transforming Uncertainty Into Opportunity. Mayo Clin Proc 2021; 96:2753-2756. PMID: 34579946. DOI: 10.1016/j.mayocp.2021.06.021.
Affiliation(s)
- Ohad Oren: Division of Cardiology, Massachusetts General Hospital, Boston, MA
- Bernard J Gersh: Department of Cardiovascular Medicine, Mayo Clinic College of Medicine, Rochester, MN
- Deepak L Bhatt: Brigham and Women's Hospital Heart and Vascular Center and Harvard Medical School, Boston, MA
11. Anatomic Point-Based Lung Region with Zone Identification for Radiologist Annotation and Machine Learning for Chest Radiographs. J Digit Imaging 2021; 34:922-931. PMID: 34327625. DOI: 10.1007/s10278-021-00494-7.
Abstract
Our objective was to investigate the reliability and usefulness of anatomic point-based lung zone segmentation on chest radiographs (CXRs) as a reference standard framework, and to evaluate the accuracy of automated point placement. Two hundred frontal CXRs were presented to two radiologists who identified five anatomic points: two at the lung apices, one at the top of the aortic arch, and two at the costophrenic angles. Of these 1000 anatomic points, 161 (16.1%) were obscured (mostly by pleural effusions). Observer variations were investigated. Eight anatomic zones were then automatically generated from the manually placed anatomic points, and a prototype algorithm was developed that used the point-based lung zone segmentation to detect cardiomegaly and the levels of the diaphragm and pleural effusions. A trained U-Net neural network was used to automatically place the five points within 379 CXRs of an independent database. Intra- and inter-observer variation in mean distance between corresponding anatomic points was larger for obscured points (8.7 mm and 20 mm, respectively) than for visible points (4.3 mm and 7.6 mm, respectively). The computer algorithm using the point-based lung zone segmentation could measure the cardiothoracic ratio and the position of the diaphragm or pleural effusion. The mean distance between corresponding points placed by the radiologist and by the neural network was 6.2 mm. The network identified 95% of the radiologist-indicated points, with only 3% of network-identified points being false positives. In conclusion, a reliable anatomic point-based lung segmentation method for CXRs has been developed, with expected utility for establishing reference standards for machine learning applications.
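A brief sketch of the point-placement evaluation described above: mean Euclidean distance between corresponding radiologist and network points, plus a tolerance-based detection rate. The coordinates and the 15 mm tolerance are placeholders, not study values:

```python
# Placeholder coordinates (one row per anatomic point, in mm); the
# tolerance and data are assumptions for illustration only.
import numpy as np

radiologist = np.array([[50, 40], [250, 42], [150, 120], [30, 380], [270, 385]])
network     = np.array([[53, 44], [247, 40], [155, 118], [36, 374], [265, 390]])

dist = np.linalg.norm(radiologist - network, axis=1)
print(f"mean point distance: {dist.mean():.1f} mm")
print(f"points recovered within tolerance: {(dist < 15).mean():.0%}")
```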