1
Miró Catalina Q, Vidal-Alaball J, Fuster-Casanovas A, Escalé-Besa A, Ruiz Comellas A, Solé-Casals J. Real-world testing of an artificial intelligence algorithm for the analysis of chest X-rays in primary care settings. Sci Rep 2024; 14:5199. PMID: 38431731; PMCID: PMC10908781; DOI: 10.1038/s41598-024-55792-1. Received 10/30/2023; accepted 02/27/2024.
Abstract
Interpreting chest X-rays is a complex task, and artificial intelligence algorithms for this purpose are currently being developed. External validation of these algorithms is essential before they can be implemented. This study therefore aims to externally validate an AI algorithm's diagnoses in real clinical practice, comparing them with a radiologist's diagnoses, and to identify diagnoses for which the algorithm may not have been trained. We conducted a prospective observational study for the external validation of the AI algorithm in a region of Catalonia, comparing the AI algorithm's diagnosis with that of the reference radiologist, which was considered the gold standard. The external validation was performed with a sample of 278 images and reports, 51.8% of which showed no radiological abnormalities according to the radiologist's report. Analysing the validity of the AI algorithm, the average accuracy was 0.95 (95% CI 0.92; 0.98), the sensitivity was 0.48 (95% CI 0.30; 0.66) and the specificity was 0.98 (95% CI 0.97; 0.99). The conditions for which the algorithm was most sensitive were external, upper abdominal, and cardiac and/or valvular implants, whereas it was least sensitive to conditions of the mediastinum, vessels and bone. The algorithm has been validated in the primary care setting and has proven useful for identifying images with or without conditions. However, to be a valuable tool for supporting experts, it requires additional real-world training to enhance its diagnostic capabilities for some of the conditions analysed. Our study emphasizes the need for continuous improvement to ensure the algorithm's effectiveness in primary care.
Affiliation(s)
- Queralt Miró Catalina
- Unitat de Suport a la Recerca de la Catalunya Central, Fundació Institut Universitari per a la Recerca a l'Atenció Primària de Salut Jordi Gol i Gurina, Sant Fruitós de Bages, Spain
- Health Promotion in Rural Areas Research Group, Gerència d'Atenció Primària i a la Comunitat de la Catalunya Central, Institut Català de la Salut, Carrer Pica d'Estats, 13-15, 08272, Sant Fruitós de Bages, Barcelona, Spain
- Faculty of Science Technology and Engineering, University of Vic-Central University of Catalonia, Vic, Spain
- Josep Vidal-Alaball
- Unitat de Suport a la Recerca de la Catalunya Central, Fundació Institut Universitari per a la Recerca a l'Atenció Primària de Salut Jordi Gol i Gurina, Sant Fruitós de Bages, Spain.
- Health Promotion in Rural Areas Research Group, Gerència d'Atenció Primària i a la Comunitat de la Catalunya Central, Institut Català de la Salut, Carrer Pica d'Estats, 13-15, 08272, Sant Fruitós de Bages, Barcelona, Spain.
- Faculty of Medicine, University of Vic-Central University of Catalonia, Vic, Spain.
- Aïna Fuster-Casanovas
- Unitat de Suport a la Recerca de la Catalunya Central, Fundació Institut Universitari per a la Recerca a l'Atenció Primària de Salut Jordi Gol i Gurina, Sant Fruitós de Bages, Spain
- Health Promotion in Rural Areas Research Group, Gerència d'Atenció Primària i a la Comunitat de la Catalunya Central, Institut Català de la Salut, Carrer Pica d'Estats, 13-15, 08272, Sant Fruitós de Bages, Barcelona, Spain
- Anna Escalé-Besa
- Unitat de Suport a la Recerca de la Catalunya Central, Fundació Institut Universitari per a la Recerca a l'Atenció Primària de Salut Jordi Gol i Gurina, Sant Fruitós de Bages, Spain
- Health Promotion in Rural Areas Research Group, Gerència d'Atenció Primària i a la Comunitat de la Catalunya Central, Institut Català de la Salut, Carrer Pica d'Estats, 13-15, 08272, Sant Fruitós de Bages, Barcelona, Spain
- Faculty of Medicine, University of Vic-Central University of Catalonia, Vic, Spain
- Anna Ruiz Comellas
- Unitat de Suport a la Recerca de la Catalunya Central, Fundació Institut Universitari per a la Recerca a l'Atenció Primària de Salut Jordi Gol i Gurina, Sant Fruitós de Bages, Spain
- Health Promotion in Rural Areas Research Group, Gerència d'Atenció Primària i a la Comunitat de la Catalunya Central, Institut Català de la Salut, Carrer Pica d'Estats, 13-15, 08272, Sant Fruitós de Bages, Barcelona, Spain
- Faculty of Medicine, University of Vic-Central University of Catalonia, Vic, Spain
- Jordi Solé-Casals
- Data and Signal Processing Group, Faculty of Science, Technology and Engineering, University of Vic-Central University of Catalonia, Vic, Spain.
- Department of Psychiatry, University of Cambridge, Cambridge, UK.
2
Maiter A, Hocking K, Matthews S, Taylor J, Sharkey M, Metherall P, Alabed S, Dwivedi K, Shahin Y, Anderson E, Holt S, Rowbotham C, Kamil MA, Hoggard N, Balasubramanian SP, Swift A, Johns CS. Evaluating the performance of artificial intelligence software for lung nodule detection on chest radiographs in a retrospective real-world UK population. BMJ Open 2023; 13:e077348. PMID: 37940155; PMCID: PMC10632826; DOI: 10.1136/bmjopen-2023-077348. Received 07/07/2023; accepted 10/16/2023.
Abstract
OBJECTIVES Early identification of lung cancer on chest radiographs improves patient outcomes. Artificial intelligence (AI) tools may increase diagnostic accuracy and streamline this pathway. This study evaluated the performance of commercially available AI-based software trained to identify cancerous lung nodules on chest radiographs. DESIGN This retrospective study included primary care chest radiographs acquired in a UK centre. The software evaluated each radiograph independently and outputs were compared with two reference standards: (1) the radiologist report and (2) the diagnosis of cancer by multidisciplinary team decision. Failure analysis was performed by interrogating the software marker locations on radiographs. PARTICIPANTS 5722 consecutive chest radiographs were included from 5592 patients (median age 59 years, 53.8% women, 1.6% prevalence of cancer). RESULTS Compared with radiologist reports for nodule detection, the software demonstrated sensitivity 54.5% (95% CI 44.2% to 64.4%), specificity 83.2% (82.2% to 84.1%), positive predictive value (PPV) 5.5% (4.6% to 6.6%) and negative predictive value (NPV) 99.0% (98.8% to 99.2%). Compared with cancer diagnosis, the software demonstrated sensitivity 60.9% (50.1% to 70.9%), specificity 83.3% (82.3% to 84.2%), PPV 5.6% (4.8% to 6.6%) and NPV 99.2% (99.0% to 99.4%). Normal or variant anatomy was misidentified as an abnormality in 69.9% of the 943 false positive cases. CONCLUSIONS The software demonstrated considerable underperformance in this real-world patient cohort. Failure analysis suggested a lack of generalisability in the training and testing datasets as a potential factor. The low PPV carries the risk of over-investigation and limits the translation of the software to clinical practice. Our findings highlight the importance of training and testing software in representative datasets, with broader implications for the implementation of AI tools in imaging.
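The low PPV reported above is exactly what Bayes' theorem predicts at a 1.6% cancer prevalence: even a moderately specific test produces far more false positives than true positives in a low-prevalence population. A minimal Python sketch (function names are illustrative, not from the paper) reproduces the figures reported against the cancer-diagnosis reference standard:

```python
def ppv(sens, spec, prev):
    # P(disease | positive test) via Bayes' theorem
    return sens * prev / (sens * prev + (1 - spec) * (1 - prev))

def npv(sens, spec, prev):
    # P(no disease | negative test)
    return spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)

# Point estimates reported against the cancer-diagnosis reference standard
prev, sens, spec = 0.016, 0.609, 0.833
print(f"PPV = {ppv(sens, spec, prev):.1%}")  # ≈ 5.6%
print(f"NPV = {npv(sens, spec, prev):.1%}")  # ≈ 99.2%
```

This makes the authors' point concrete: the 5.6% PPV is dominated by the prevalence term, so improving specificity, not sensitivity, is what would most reduce over-investigation.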
Affiliation(s)
- Ahmed Maiter
- School of Medicine and Population Health, The University of Sheffield, Sheffield, UK
- Radiology, Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, UK
- Katherine Hocking
- Radiology, Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, UK
- Suzanne Matthews
- Radiology, Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, UK
- Medical Imaging and Medical Physics, Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, UK
- Jonathan Taylor
- Medical Imaging and Medical Physics, Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, UK
- Michael Sharkey
- Medical Imaging and Medical Physics, Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, UK
- Peter Metherall
- Medical Imaging and Medical Physics, Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, UK
- Samer Alabed
- School of Medicine and Population Health, The University of Sheffield, Sheffield, UK
- Radiology, Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, UK
- Krit Dwivedi
- School of Medicine and Population Health, The University of Sheffield, Sheffield, UK
- Radiology, Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, UK
- Yousef Shahin
- School of Medicine and Population Health, The University of Sheffield, Sheffield, UK
- Radiology, Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, UK
- Elizabeth Anderson
- Radiology, Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, UK
- Sarah Holt
- Radiology, Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, UK
- Mohamed A Kamil
- Radiology, Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, UK
- Nigel Hoggard
- School of Medicine and Population Health, The University of Sheffield, Sheffield, UK
- Radiology, Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, UK
- NIHR Sheffield Biomedical Research Centre, Sheffield, UK
- Saba P Balasubramanian
- Medical Imaging and Medical Physics, Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, UK
- Surgical directorate, Sheffield Teaching Hospitals Foundation NHS Trust, Sheffield, UK
- Andrew Swift
- School of Medicine and Population Health, The University of Sheffield, Sheffield, UK
- Radiology, Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, UK
- NIHR Sheffield Biomedical Research Centre, Sheffield, UK
3
van Beek EJR, Ahn JS, Kim MJ, Murchison JT. Validation study of machine-learning chest radiograph software in primary and emergency medicine. Clin Radiol 2023; 78:1-7. PMID: 36171164; DOI: 10.1016/j.crad.2022.08.129. Received 05/24/2022; revised 07/20/2022; accepted 08/08/2022.
Abstract
AIM To evaluate the performance of a machine-learning-based algorithm for chest radiographs (CXRs), applied to a consecutive cohort of historical clinical cases, in comparison to expert chest radiologists. MATERIALS AND METHODS The study comprised 1,960 consecutive CXRs from primary care referrals and the emergency department (992 and 968 cases, respectively), obtained in 2015 at a UK hospital. Two chest radiologists, each with >20 years of experience, read all studies in consensus to serve as a reference standard. A chest artificial intelligence (AI) algorithm, Lunit INSIGHT CXR, was run on the CXRs, and results were correlated with those of the expert readers. The area under the receiver operating characteristic curve (AUROC) was calculated for normal studies and 10 common findings: atelectasis, fibrosis, calcification, consolidation, lung nodules, cardiomegaly, mediastinal widening, pleural effusion, pneumothorax, and pneumoperitoneum. RESULTS The ground-truth annotation identified 398 primary care and 578 emergency department datasets containing pathologies. The AI algorithm showed an AUROC of 0.881-0.999 in the emergency department dataset and 0.881-0.998 in the primary care dataset. The AUROC for each of the findings did not differ between the primary care and emergency department datasets, except for pleural effusion (0.954 versus 0.988, p<0.001). CONCLUSIONS The AI algorithm can accurately and consistently differentiate normal from major thoracic abnormalities in both acute and non-acute settings, and can serve as a triage tool.
Affiliation(s)
- E J R van Beek
- Edinburgh Imaging, Queen's Medical Research Institute, University of Edinburgh, Edinburgh, UK; Department of Radiology, Royal Infirmary of Edinburgh, Edinburgh, UK.
- J T Murchison
- Department of Radiology, Royal Infirmary of Edinburgh, Edinburgh, UK
4
Lee SY, Ha S, Jeon MG, Li H, Choi H, Kim HP, Choi YR, I H, Jeong YJ, Park YH, Ahn H, Hong SH, Koo HJ, Lee CW, Kim MJ, Kim YJ, Kim KW, Choi JM. Localization-adjusted diagnostic performance and assistance effect of a computer-aided detection system for pneumothorax and consolidation. NPJ Digit Med 2022; 5:107. PMID: 35908091; PMCID: PMC9339006; DOI: 10.1038/s41746-022-00658-x. Received 01/24/2022; accepted 07/11/2022.
Abstract
While many deep-learning-based computer-aided detection (CAD) systems have been developed and commercialized for abnormality detection in chest radiographs (CXRs), their ability to localize a target abnormality is rarely reported. Localization accuracy is important for model interpretability, which is crucial in clinical settings. Moreover, diagnostic performance is likely to vary depending on the threshold that defines an accurate localization. In a multi-center, stand-alone clinical trial using temporal and external validation datasets of 1,050 CXRs, we evaluated the localization accuracy, localization-adjusted discrimination, and calibration of a commercially available deep-learning-based CAD for detecting consolidation and pneumothorax. For consolidation, the CAD achieved an image-level AUROC (95% CI) of 0.960 (0.945, 0.975), sensitivity of 0.933 (0.899, 0.959), specificity of 0.948 (0.930, 0.963), a Dice score of 0.691 (0.664, 0.718), and moderate calibration; for pneumothorax, an image-level AUROC of 0.978 (0.965, 0.991), sensitivity of 0.956 (0.923, 0.978), specificity of 0.996 (0.989, 0.999), a Dice score of 0.798 (0.770, 0.826), and moderate calibration. Diagnostic performance varied substantially when localization accuracy was accounted for but remained high at the minimum threshold of clinical relevance. In a separate trial of diagnostic impact using 461 CXRs, the causal effect of CAD assistance on clinicians' diagnostic performance was estimated. After adjusting for age, sex, dataset, and abnormality type, the CAD improved clinicians' diagnostic performance on average (OR [95% CI] = 1.73 [1.30, 2.32]; p < 0.001), although the effect varied substantially by clinical background. The CAD was found to have high stand-alone diagnostic performance and may beneficially impact clinicians' diagnostic performance when used in clinical settings.
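The Dice score used above to grade localization is the standard overlap measure 2|A∩B|/(|A|+|B|) between a predicted abnormality mask and the ground-truth mask. A minimal NumPy sketch (the toy masks are illustrative, not from the paper):

```python
import numpy as np

def dice(pred: np.ndarray, truth: np.ndarray) -> float:
    # Dice coefficient: 2*|A ∩ B| / (|A| + |B|) over binary masks
    pred, truth = pred.astype(bool), truth.astype(bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * np.logical_and(pred, truth).sum() / denom

# Toy 4x4 masks: the predicted box overlaps the true box in 2 of 4 pixels
truth = np.zeros((4, 4), dtype=int)
truth[0:2, 0:2] = 1          # true abnormality occupies 4 pixels
pred = np.zeros((4, 4), dtype=int)
pred[1:3, 0:2] = 1           # prediction shifted down by one row
print(dice(pred, truth))     # 2*2 / (4+4) = 0.5
```

Because the metric penalizes both missed and spurious pixels, a detector can score well on image-level AUROC while its Dice score (0.691 and 0.798 here) reveals imperfect marker placement, which is exactly the gap this study set out to measure.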
Affiliation(s)
- Sun Yeop Lee
- Department of Medical Artificial Intelligence, Deepnoid, Inc., Seoul, Republic of Korea
- Sangwoo Ha
- Department of Medical Artificial Intelligence, Deepnoid, Inc., Seoul, Republic of Korea
- Min Gyeong Jeon
- Department of Medical Artificial Intelligence, Deepnoid, Inc., Seoul, Republic of Korea
- Hao Li
- Department of Medical Artificial Intelligence, Deepnoid, Inc., Seoul, Republic of Korea
- Hyunju Choi
- Department of Medical Artificial Intelligence, Deepnoid, Inc., Seoul, Republic of Korea
- Hwa Pyung Kim
- Department of Medical Artificial Intelligence, Deepnoid, Inc., Seoul, Republic of Korea
- Ye Ra Choi
- Department of Radiology, Seoul Metropolitan Government-Seoul National University Boramae Medical Center, Seoul, Republic of Korea
- Department of Radiology, Seoul National University College of Medicine, Seoul, Republic of Korea
- Hoseok I
- Department of Thoracic and Cardiovascular Surgery, Pusan National University School of Medicine, Busan, Republic of Korea
- Convergence Medical Institute of Technology, Biomedical Research Institute, Pusan National University Hospital, Busan, Republic of Korea
- Yeon Joo Jeong
- Department of Radiology and Biomedical Research Institute, Pusan National University Hospital, Busan, Republic of Korea
- Yoon Ha Park
- Department of Internal Medicine, Jawol Health Center, Incheon, Republic of Korea
- Hyemin Ahn
- Department of Radiology and Research Institute of Radiology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Sang Hyup Hong
- Department of Radiology and Research Institute of Radiology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Hyun Jung Koo
- Department of Radiology and Research Institute of Radiology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Choong Wook Lee
- Department of Radiology and Research Institute of Radiology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Min Jae Kim
- Department of Infectious Disease, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Yeon Joo Kim
- Department of Respiratory Allergy Medicine, Nowon Eulji Medical Center, Seoul, Republic of Korea
- Kyung Won Kim
- Department of Radiology and Research Institute of Radiology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Jong Mun Choi
- Department of Medical Artificial Intelligence, Deepnoid, Inc., Seoul, Republic of Korea.
5
Li D, Pehrson LM, Lauridsen CA, Tøttrup L, Fraccaro M, Elliott D, Zając HD, Darkner S, Carlsen JF, Nielsen MB. The Added Effect of Artificial Intelligence on Physicians' Performance in Detecting Thoracic Pathologies on CT and Chest X-ray: A Systematic Review. Diagnostics (Basel) 2021; 11:2206. PMID: 34943442; PMCID: PMC8700414; DOI: 10.3390/diagnostics11122206. Received 10/20/2021; revised 11/18/2021; accepted 11/23/2021.
Abstract
Our systematic review investigated the additional effect of artificial intelligence-based devices on human observers when diagnosing and/or detecting thoracic pathologies using different diagnostic imaging modalities, such as chest X-ray and CT. Peer-reviewed, original research articles from EMBASE, PubMed, Cochrane library, SCOPUS, and Web of Science were retrieved. Included articles were published within the last 20 years and used a device based on artificial intelligence (AI) technology to detect or diagnose pulmonary findings. The AI-based device had to be used in an observer test where the performance of human observers with and without addition of the device was measured as sensitivity, specificity, accuracy, AUC, or time spent on image reading. A total of 38 studies were included for final assessment. The quality assessment tool for diagnostic accuracy studies (QUADAS-2) was used for bias assessment. The average sensitivity increased from 67.8% to 74.6%; specificity from 82.2% to 85.4%; accuracy from 75.4% to 81.7%; and Area Under the ROC Curve (AUC) from 0.75 to 0.80. Generally, a faster reading time was reported when radiologists were aided by AI-based devices. Our systematic review showed that performance generally improved for the physicians when assisted by AI-based devices compared to unaided interpretation.
Affiliation(s)
- Dana Li
- Department of Diagnostic Radiology, Copenhagen University Hospital, Rigshospitalet, 2100 Copenhagen, Denmark; (L.M.P.); (C.A.L.); (J.F.C.); (M.B.N.)
- Department of Clinical Medicine, University of Copenhagen, 2100 Copenhagen, Denmark
- Correspondence:
- Lea Marie Pehrson
- Department of Diagnostic Radiology, Copenhagen University Hospital, Rigshospitalet, 2100 Copenhagen, Denmark; (L.M.P.); (C.A.L.); (J.F.C.); (M.B.N.)
- Carsten Ammitzbøl Lauridsen
- Department of Diagnostic Radiology, Copenhagen University Hospital, Rigshospitalet, 2100 Copenhagen, Denmark; (L.M.P.); (C.A.L.); (J.F.C.); (M.B.N.)
- Department of Technology, Faculty of Health and Technology, University College Copenhagen, 2200 Copenhagen, Denmark
- Lea Tøttrup
- Unumed Aps, 1055 Copenhagen, Denmark; (L.T.); (M.F.)
- Desmond Elliott
- Department of Computer Science, University of Copenhagen, 2100 Copenhagen, Denmark; (D.E.); (H.D.Z.); (S.D.)
- Hubert Dariusz Zając
- Department of Computer Science, University of Copenhagen, 2100 Copenhagen, Denmark; (D.E.); (H.D.Z.); (S.D.)
- Sune Darkner
- Department of Computer Science, University of Copenhagen, 2100 Copenhagen, Denmark; (D.E.); (H.D.Z.); (S.D.)
- Jonathan Frederik Carlsen
- Department of Diagnostic Radiology, Copenhagen University Hospital, Rigshospitalet, 2100 Copenhagen, Denmark; (L.M.P.); (C.A.L.); (J.F.C.); (M.B.N.)
- Michael Bachmann Nielsen
- Department of Diagnostic Radiology, Copenhagen University Hospital, Rigshospitalet, 2100 Copenhagen, Denmark; (L.M.P.); (C.A.L.); (J.F.C.); (M.B.N.)
- Department of Clinical Medicine, University of Copenhagen, 2100 Copenhagen, Denmark