1
Bennani S, Regnard NE, Ventre J, Lassalle L, Nguyen T, Ducarouge A, Dargent L, Guillo E, Gouhier E, Zaimi SH, Canniff E, Malandrin C, Khafagy P, Koulakian H, Revel MP, Chassagnon G. Using AI to Improve Radiologist Performance in Detection of Abnormalities on Chest Radiographs. Radiology 2023; 309:e230860. [PMID: 38085079 DOI: 10.1148/radiol.230860]
Abstract
Background: Chest radiography remains the most common radiologic examination, and interpretation of its results can be difficult.

Purpose: To explore the potential benefit of artificial intelligence (AI) assistance in the detection of thoracic abnormalities on chest radiographs by evaluating the performance of radiologists with different levels of expertise, with and without AI assistance.

Materials and Methods: Patients who underwent both chest radiography and thoracic CT within 72 hours between January 2010 and December 2020 in a French public hospital were screened retrospectively. Radiographs were randomly included until a total of 500 was reached, with approximately 50% showing abnormal findings. A senior thoracic radiologist annotated the radiographs for five abnormalities (pneumothorax, pleural effusion, consolidation, mediastinal and hilar mass, lung nodule) based on the corresponding CT results (ground truth). A total of 12 readers (four thoracic radiologists, four general radiologists, four radiology residents) read half the radiographs without AI and half with AI (ChestView; Gleamer). Changes in sensitivity and specificity were measured using paired t tests.

Results: The study included 500 patients (mean age, 54 years ± 19 [SD]; 261 female, 239 male), with 522 abnormalities visible on 241 radiographs. On average, for all readers, AI use resulted in an absolute increase in sensitivity of 26% (95% CI: 20, 32), 14% (95% CI: 11, 17), 12% (95% CI: 10, 14), 8.5% (95% CI: 6, 11), and 5.9% (95% CI: 4, 8) for pneumothorax, consolidation, nodule, pleural effusion, and mediastinal and hilar mass, respectively (P < .001). Specificity also increased with AI assistance (3.9% [95% CI: 3.2, 4.6], 3.7% [95% CI: 3, 4.4], 2.9% [95% CI: 2.3, 3.5], and 2.1% [95% CI: 1.6, 2.6] for pleural effusion, mediastinal and hilar mass, consolidation, and nodule, respectively), except in the diagnosis of pneumothorax (-0.2%; 95% CI: -0.36, -0.04; P = .01). The mean reading time was 81 seconds without AI versus 56 seconds with AI (a 31% decrease; P < .001).

Conclusion: AI-assisted chest radiography interpretation resulted in absolute increases in sensitivity for radiologists of all levels of expertise and reduced reading times; specificity increased with AI except in the diagnosis of pneumothorax. © RSNA, 2023. Supplemental material is available for this article.
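As a rough illustration of the paired analysis described in the abstract, per-reader sensitivity can be computed from true-positive and false-negative counts, and the with-AI versus without-AI difference summarized with a paired t statistic. The counts and the number of readers below are invented for illustration; this is not the study's data or code.

```python
# Hedged sketch: per-reader sensitivity with vs without AI and a paired
# t statistic, mirroring the paired-samples comparison in the abstract.
# All counts and the number of readers are invented for illustration.
from math import sqrt
from statistics import mean, stdev

def sensitivity(tp, fn):
    # True-positive rate: TP / (TP + FN).
    return tp / (tp + fn)

def paired_t(before, after):
    # Paired t statistic on per-reader differences (after - before).
    d = [a - b for a, b in zip(after, before)]
    return mean(d) / (stdev(d) / sqrt(len(d)))

# Hypothetical per-reader (TP, FN) counts on the same set of radiographs.
without_ai = [(55, 45), (60, 40), (70, 30), (65, 35)]
with_ai = [(80, 20), (82, 18), (88, 12), (86, 14)]

sens_without = [sensitivity(tp, fn) for tp, fn in without_ai]
sens_with = [sensitivity(tp, fn) for tp, fn in with_ai]
t = paired_t(sens_without, sens_with)  # large positive t: sensitivity rose
```

A large t statistic relative to the critical value for n − 1 degrees of freedom corresponds to the small P values reported in the abstract.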
Affiliation(s)
- Souhail Bennani, Nor-Eddine Regnard, Jeanne Ventre, Louis Lassalle, Toan Nguyen, Alexis Ducarouge, Lucas Dargent, Enora Guillo, Elodie Gouhier, Sophie-Hélène Zaimi, Emma Canniff, Cécile Malandrin, Philippe Khafagy, Hasmik Koulakian, Marie-Pierre Revel, Guillaume Chassagnon
- From the Department of Thoracic Imaging, Cochin Hospital, AP-HP, 27 Rue du Faubourg Saint-Jacques, Paris 75014, France (S.B., L.D., E. Guillo, E. Gouhier, S.H.Z., E.C., M.P.R., G.C.); Gleamer, Paris, France (S.B., N.E.R., J.V., L.L., T.N., A.D.); Réseau d'Imagerie Sud Francilien, Lieusant, France (N.E.R., L.L., C.M.); Department of Pediatric Radiology, Armand Trousseau Hospital, AP-HP, Paris, France (T.N.); HFR Fribourg, Fribourg, Switzerland (P.K.); and Centre d'Imagerie Médicale de l'Ouest Parisien, Paris, France (H.K.)
2
Yoon J, Han J, Ko J, Choi S, Park JI, Hwang JS, Han JM, Hwang DDJ. Developing and Evaluating an AI-Based Computer-Aided Diagnosis System for Retinal Disease: Diagnostic Study for Central Serous Chorioretinopathy. J Med Internet Res 2023; 25:e48142. [PMID: 38019564 PMCID: PMC10719821 DOI: 10.2196/48142]
Abstract
Background: Although previous research has made substantial progress in developing high-performance artificial intelligence (AI)-based computer-aided diagnosis (AI-CAD) systems in various medical domains, little attention has been paid to developing and evaluating AI-CAD systems in ophthalmology, particularly for diagnosing retinal diseases using optical coherence tomography (OCT) images.

Objective: This diagnostic study aimed to determine the usefulness of a proposed AI-CAD system in assisting ophthalmologists with the diagnosis of central serous chorioretinopathy (CSC), which is known to be difficult to diagnose, using OCT images.

Methods: For training and evaluation of the proposed deep learning model, 1693 OCT images were collected and annotated. The data set included 929 acute and 764 chronic CSC cases. In total, 66 ophthalmologists (36 retina specialists and 30 nonretina specialists) participated in the observer performance test. To evaluate the deep learning algorithm, the data were split into training, validation, and test sets in an 8:1:1 ratio. A further 100 OCT images randomly sampled from the test set were used for the observer performance test, in which participants were asked to select a CSC subtype for each image. Each image was presented under three conditions: (1) without AI assistance, (2) with AI assistance providing a probability score, and (3) with AI assistance providing a probability score and a visual evidence heatmap. Sensitivity, specificity, and area under the receiver operating characteristic curve were used to measure the diagnostic performance of the model and the ophthalmologists.

Results: The proposed system achieved high detection performance for CSC (area under the curve, 0.99), outperforming the 66 participating ophthalmologists. In both groups, ophthalmologists achieved their highest mean diagnostic performance with AI assistance that provided both a probability score and a visual evidence heatmap, compared with the other two conditions. Nonretina specialists achieved expert-level diagnostic performance with the support of the proposed AI-CAD system.

Conclusions: The proposed AI-CAD system improved the diagnosis of CSC by ophthalmologists, which may support decision-making in retinal disease detection and alleviate ophthalmologists' workload.
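The 8:1:1 train/validation/test split described in the methods can be sketched as follows; the fixed seed and simple index shuffling are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of an 8:1:1 train/validation/test split over 1693 items,
# as described in the abstract. The seed and shuffling strategy are
# illustrative assumptions, not the authors' code.
import random

def split_811(items, seed=0):
    items = list(items)
    random.Random(seed).shuffle(items)  # reproducible shuffle
    n = len(items)
    n_train = int(n * 0.8)
    n_val = int(n * 0.1)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

train, val, test = split_811(range(1693))  # 1354 / 169 / 170 items
```

Taking the remainder as the test set guarantees every item lands in exactly one partition even when the counts do not divide evenly.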
Affiliation(s)
- Jeewoo Yoon: Department of Applied Artificial Intelligence, Sungkyunkwan University, Seoul, Republic of Korea; Raondata, Seoul, Republic of Korea
- Jinyoung Han: Department of Applied Artificial Intelligence, Sungkyunkwan University, Seoul, Republic of Korea; Department of Human-Artificial Intelligence Interaction, Sungkyunkwan University, Seoul, Republic of Korea
- Junseo Ko: Department of Applied Artificial Intelligence, Sungkyunkwan University, Seoul, Republic of Korea; Raondata, Seoul, Republic of Korea
- Seong Choi: Department of Applied Artificial Intelligence, Sungkyunkwan University, Seoul, Republic of Korea; Raondata, Seoul, Republic of Korea
- Ji In Park: Department of Medicine, Kangwon National University School of Medicine, Kangwon National University Hospital, Chuncheon, Republic of Korea
- Jeong Mo Han: Seoul Bombit Eye Clinic, Sejong, Republic of Korea
- Daniel Duck-Jin Hwang: Department of Ophthalmology, Hangil Eye Hospital, Incheon, Republic of Korea; Lux Mind, Incheon, Republic of Korea
3
Maiter A, Hocking K, Matthews S, Taylor J, Sharkey M, Metherall P, Alabed S, Dwivedi K, Shahin Y, Anderson E, Holt S, Rowbotham C, Kamil MA, Hoggard N, Balasubramanian SP, Swift A, Johns CS. Evaluating the performance of artificial intelligence software for lung nodule detection on chest radiographs in a retrospective real-world UK population. BMJ Open 2023; 13:e077348. [PMID: 37940155 PMCID: PMC10632826 DOI: 10.1136/bmjopen-2023-077348]
Abstract
Objectives: Early identification of lung cancer on chest radiographs improves patient outcomes. Artificial intelligence (AI) tools may increase diagnostic accuracy and streamline this pathway. This study evaluated the performance of commercially available AI-based software trained to identify cancerous lung nodules on chest radiographs.

Design: This retrospective study included primary care chest radiographs acquired in a UK centre. The software evaluated each radiograph independently and outputs were compared with two reference standards: (1) the radiologist report and (2) the diagnosis of cancer by multidisciplinary team decision. Failure analysis was performed by interrogating the software marker locations on radiographs.

Participants: 5722 consecutive chest radiographs were included from 5592 patients (median age 59 years, 53.8% women, 1.6% prevalence of cancer).

Results: Compared with radiologist reports for nodule detection, the software demonstrated sensitivity 54.5% (95% CI 44.2% to 64.4%), specificity 83.2% (82.2% to 84.1%), positive predictive value (PPV) 5.5% (4.6% to 6.6%) and negative predictive value (NPV) 99.0% (98.8% to 99.2%). Compared with cancer diagnosis, the software demonstrated sensitivity 60.9% (50.1% to 70.9%), specificity 83.3% (82.3% to 84.2%), PPV 5.6% (4.8% to 6.6%) and NPV 99.2% (99.0% to 99.4%). Normal or variant anatomy was misidentified as an abnormality in 69.9% of the 943 false positive cases.

Conclusions: The software demonstrated considerable underperformance in this real-world patient cohort. Failure analysis suggested a lack of generalisability in the training and testing datasets as a potential factor. The low PPV carries the risk of over-investigation and limits the translation of the software to clinical practice. Our findings highlight the importance of training and testing software in representative datasets, with broader implications for the implementation of AI tools in imaging.
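The low PPV reported here is largely a consequence of the 1.6% cancer prevalence. Applying Bayes' rule to the reported sensitivity and specificity reproduces the order of magnitude; this is back-of-envelope arithmetic for illustration, not the authors' calculation, which used the actual case counts.

```python
# Hedged sketch: PPV and NPV from sensitivity, specificity, and prevalence
# via Bayes' rule. Inputs are the values reported in the abstract
# (radiologist-report reference standard); the arithmetic is standard,
# not the study's own computation.
def ppv(sens, spec, prev):
    # P(disease | positive flag)
    return sens * prev / (sens * prev + (1 - spec) * (1 - prev))

def npv(sens, spec, prev):
    # P(no disease | negative flag)
    return spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)

p = ppv(0.545, 0.832, 0.016)  # ~0.05: only about 1 in 20 flags is a true nodule
n = npv(0.545, 0.832, 0.016)  # ~0.99: negatives are almost always correct
```

Even a perfectly sensitive test with 83% specificity would have a PPV under 9% at this prevalence, which is why the conclusions emphasize the over-investigation risk.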
Affiliation(s)
- Ahmed Maiter: School of Medicine and Population Health, The University of Sheffield, Sheffield, UK; Radiology, Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, UK
- Katherine Hocking: Radiology, Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, UK
- Suzanne Matthews: Radiology, Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, UK; Medical Imaging and Medical Physics, Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, UK
- Jonathan Taylor: Medical Imaging and Medical Physics, Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, UK
- Michael Sharkey: Medical Imaging and Medical Physics, Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, UK
- Peter Metherall: Medical Imaging and Medical Physics, Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, UK
- Samer Alabed: School of Medicine and Population Health, The University of Sheffield, Sheffield, UK; Radiology, Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, UK
- Krit Dwivedi: School of Medicine and Population Health, The University of Sheffield, Sheffield, UK; Radiology, Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, UK
- Yousef Shahin: School of Medicine and Population Health, The University of Sheffield, Sheffield, UK; Radiology, Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, UK
- Elizabeth Anderson: Radiology, Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, UK
- Sarah Holt: Radiology, Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, UK
- Mohamed A Kamil: Radiology, Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, UK
- Nigel Hoggard: School of Medicine and Population Health, The University of Sheffield, Sheffield, UK; Radiology, Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, UK; NIHR Sheffield Biomedical Research Centre, Sheffield, UK
- Saba P Balasubramanian: Medical Imaging and Medical Physics, Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, UK; Surgical Directorate, Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, UK
- Andrew Swift: School of Medicine and Population Health, The University of Sheffield, Sheffield, UK; Radiology, Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, UK; NIHR Sheffield Biomedical Research Centre, Sheffield, UK
4
Lind Plesner L, Müller FC, Brejnebøl MW, Laustrup LC, Rasmussen F, Nielsen OW, Boesen M, Brun Andersen M. Commercially Available Chest Radiograph AI Tools for Detecting Airspace Disease, Pneumothorax, and Pleural Effusion. Radiology 2023; 308:e231236. [PMID: 37750768 DOI: 10.1148/radiol.231236]
Abstract
Background: Commercially available artificial intelligence (AI) tools can assist radiologists in interpreting chest radiographs, but their real-life diagnostic accuracy remains unclear.

Purpose: To evaluate the diagnostic accuracy of four commercially available AI tools for detection of airspace disease, pneumothorax, and pleural effusion on chest radiographs.

Materials and Methods: This retrospective study included consecutive adult patients who underwent chest radiography at one of four Danish hospitals in January 2020. Two thoracic radiologists (or three, in cases of disagreement), with access to all previous and subsequent imaging, independently labeled chest radiographs to establish the reference standard. Area under the receiver operating characteristic curve (AUC), sensitivity, and specificity were calculated. Sensitivity and specificity were additionally stratified according to severity of findings, number of findings per radiograph, and radiographic projection. The χ2 and McNemar tests were used for comparisons.

Results: The data set comprised 2040 patients (median age, 72 years [IQR, 58-81 years]; 1033 female), of whom 669 (32.8%) had target findings. The AI tools demonstrated AUCs ranging from 0.83 to 0.88 for airspace disease, 0.89 to 0.97 for pneumothorax, and 0.94 to 0.97 for pleural effusion. Sensitivities ranged from 72% to 91% for airspace disease, 63% to 90% for pneumothorax, and 62% to 95% for pleural effusion. Negative predictive values ranged from 92% to 100% for all target findings. Specificity was high for chest radiographs with normal or single findings (ranges, 85%-96% for airspace disease, 99%-100% for pneumothorax, and 95%-100% for pleural effusion) and markedly lower for radiographs with four or more findings (ranges, 27%-69%, 96%-99%, and 65%-92%, respectively) (P < .001). AI sensitivity was lower for vague airspace disease (range, 33%-61%) and small pneumothorax or pleural effusion (range, 9%-94%) than for larger findings (range, 81%-100%; P value range, > .99 to < .001).

Conclusion: Current-generation AI tools showed moderate to high sensitivity for detecting airspace disease, pneumothorax, and pleural effusion on chest radiographs. However, they produced more false-positive findings than radiology reports, and their performance decreased for smaller target findings and when multiple findings were present. © RSNA, 2023. Supplemental material is available for this article. See also the editorial by Yanagawa and Tomiyama in this issue.
Affiliation(s)
- Louis Lind Plesner, Felix C Müller, Mathias W Brejnebøl, Lene C Laustrup, Finn Rasmussen, Olav W Nielsen, Mikael Boesen, Michael Brun Andersen
- From the Department of Radiology, Herlev and Gentofte Hospital, Borgmester Ib Juuls Vej 1, Herlev, Copenhagen 2730, Denmark (L.L.P., F.C.M., M.W.B., L.C.L., M.B.A.); Faculty of Health Sciences, University of Copenhagen, Copenhagen, Denmark (L.L.P., M.W.B., O.W.N., M.B., M.B.A.); Radiological Artificial Intelligence Testcenter, RAIT.dk, Capital Region of Denmark (L.L.P., F.C.M., M.W.B., M.B., M.B.A.); Departments of Radiology (M.W.B., M.B.) and Cardiology (O.W.N.), Bispebjerg and Frederiksberg Hospital, Copenhagen, Denmark; and Department of Radiology, Aarhus University Hospital, Aarhus, Denmark (F.R.)
5
Sugibayashi T, Walston SL, Matsumoto T, Mitsuyama Y, Miki Y, Ueda D. Deep learning for pneumothorax diagnosis: a systematic review and meta-analysis. Eur Respir Rev 2023; 32:220259. [PMID: 37286217 DOI: 10.1183/16000617.0259-2022]
Abstract
Background: Deep learning (DL), a subset of artificial intelligence (AI), has been applied to pneumothorax diagnosis to aid physician diagnosis, but no meta-analysis has been performed.

Methods: Multiple electronic databases were searched through September 2022 to identify studies that applied DL to pneumothorax diagnosis using imaging. A meta-analysis using a hierarchical model was performed to calculate the summary area under the curve (AUC) and pooled sensitivity and specificity for both DL models and physicians. Risk of bias was assessed using a modified Prediction Model Study Risk of Bias Assessment Tool.

Results: In 56 of the 63 primary studies, pneumothorax was identified from chest radiography. The summary AUC was 0.97 (95% CI 0.96-0.98) for both DL and physicians. The pooled sensitivity was 84% (95% CI 79-89%) for DL and 85% (95% CI 73-92%) for physicians, and the pooled specificity was 96% (95% CI 94-98%) for DL and 98% (95% CI 95-99%) for physicians. More than half of the primary studies (57%) had a high risk of bias.

Conclusions: The diagnostic performance of DL models was similar to that of physicians, although the majority of studies had a high risk of bias. Further research on AI for pneumothorax diagnosis is needed.
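The pooling step can be illustrated with a much simpler fixed-effect, inverse-variance average on the logit scale; the review itself fitted a hierarchical model, and the per-study sensitivities and sample sizes below are invented for illustration.

```python
# Hedged sketch: fixed-effect pooling of study-level sensitivities on the
# logit scale with inverse-variance weights. A simplification for
# illustration only; the review used a hierarchical model, and the study
# values below are invented.
from math import exp, log

def logit(p):
    return log(p / (1 - p))

def inv_logit(x):
    return 1 / (1 + exp(-x))

def pooled(props, ns):
    # Delta-method variance of logit(p) is ~1 / (n * p * (1 - p)),
    # so the inverse-variance weight is n * p * (1 - p).
    weights = [n * p * (1 - p) for p, n in zip(props, ns)]
    num = sum(w * logit(p) for p, w in zip(props, weights))
    return inv_logit(num / sum(weights))

# Hypothetical per-study sensitivities and sample sizes.
sens = [0.80, 0.85, 0.90]
n = [100, 200, 150]
pooled_sens = pooled(sens, n)  # lands between the smallest and largest input
```

Working on the logit scale keeps the pooled estimate inside (0, 1) and weights precise (large-n) studies more heavily, which is the same intuition behind the hierarchical models used in diagnostic meta-analysis.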
Affiliation(s)
- Takahiro Sugibayashi: Department of Diagnostic and Interventional Radiology, Graduate School of Medicine, Osaka Metropolitan University, Osaka, Japan
- Shannon L Walston: Department of Diagnostic and Interventional Radiology, Graduate School of Medicine, Osaka Metropolitan University, Osaka, Japan
- Toshimasa Matsumoto: Department of Diagnostic and Interventional Radiology, Graduate School of Medicine, Osaka Metropolitan University, Osaka, Japan; Smart Life Science Lab, Center for Health Science Innovation, Osaka Metropolitan University, Osaka, Japan
- Yasuhito Mitsuyama: Department of Diagnostic and Interventional Radiology, Graduate School of Medicine, Osaka Metropolitan University, Osaka, Japan
- Yukio Miki: Department of Diagnostic and Interventional Radiology, Graduate School of Medicine, Osaka Metropolitan University, Osaka, Japan
- Daiju Ueda: Department of Diagnostic and Interventional Radiology, Graduate School of Medicine, Osaka Metropolitan University, Osaka, Japan; Smart Life Science Lab, Center for Health Science Innovation, Osaka Metropolitan University, Osaka, Japan
Collapse
|
6
|
Nuutinen M, Leskelä RL. Systematic review of the performance evaluation of clinicians with or without the aid of machine learning clinical decision support system. HEALTH AND TECHNOLOGY 2023; 13:1-14. [PMID: 37363342 PMCID: PMC10262137 DOI: 10.1007/s12553-023-00763-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/13/2023] [Accepted: 06/01/2023] [Indexed: 06/28/2023]
Abstract
Background For the adoption of machine learning clinical decision support systems (ML-CDSS), it is critical to understand how much the ML-CDSS aids performance. However, how this performance aid should be evaluated is not trivial. Designing a reliable performance evaluation study requires both knowledge of the practical framework of experimental study design and an understanding of domain-specific design factors. Objective The aim of this review was to form a practical framework and identify key design factors for experiments that evaluate the performance of clinicians with or without the aid of an ML-CDSS. Methods The review was based on published ML-CDSS performance evaluation studies. We systematically searched for articles published between January 2016 and December 2022 and collected a set of design factors from them. Only articles that compared the performance of clinicians with and without ML-CDSS aid using experimental study methods were considered. Results The key design factors identified for the practical framework of ML-CDSS experimental study design were performance measures, user interface, ground truth data, and the selection of samples and participants. We also identified the importance of randomization, crossover design, and training and practice rounds. Previous studies had shortcomings in the rationale and documentation of choices regarding the number of participants and the duration of the experiment. Conclusion The design factors of an ML-CDSS experimental study are interdependent, and all of them must be considered when making individual choices. Supplementary Information The online version contains supplementary material available at 10.1007/s12553-023-00763-1.
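The randomization and crossover design this review highlights can be sketched as follows. This is an illustrative assignment scheme with hypothetical reader and case identifiers, not the framework from the paper: each reader interprets half of the cases with ML-CDSS aid and half without, with the split and session order randomized per reader.

```python
import random

def assign_crossover(case_ids, reader_ids, seed=0):
    # Randomized crossover assignment: per reader, shuffle cases,
    # give half to the aided arm and half to the unaided arm,
    # and randomize which session (aided/unaided) comes first.
    rng = random.Random(seed)
    plan = {}
    for reader in reader_ids:
        cases = case_ids[:]
        rng.shuffle(cases)
        half = len(cases) // 2
        plan[reader] = {
            "aided": set(cases[:half]),
            "unaided": set(cases[half:]),
            "aided_session_first": rng.random() < 0.5,
        }
    return plan

plan = assign_crossover(list(range(100)), ["r1", "r2", "r3"])
# Each reader sees every case exactly once, in exactly one arm.
for p in plan.values():
    assert p["aided"].isdisjoint(p["unaided"])
    assert len(p["aided"] | p["unaided"]) == 100
```

Shuffling per reader means the aided/unaided split differs between readers, which helps separate the CDSS effect from case difficulty.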
Affiliation(s)
- Mikko Nuutinen: Nordic Healthcare Group, Helsinki, Finland; Haartman Institute, University of Helsinki, Helsinki, Finland

7
Kim C, Yang Z, Park SH, Hwang SH, Oh YW, Kang EY, Yong HS. Multicentre external validation of a commercial artificial intelligence software to analyse chest radiographs in health screening environments with low disease prevalence. Eur Radiol 2023; 33:3501-3509. [PMID: 36624227 DOI: 10.1007/s00330-022-09315-z] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/28/2022] [Revised: 10/13/2022] [Accepted: 11/22/2022] [Indexed: 01/11/2023]
Abstract
OBJECTIVES To externally validate the performance of a commercial AI software program for interpreting CXRs in a large, consecutive, real-world cohort from primary healthcare centres. METHODS A total of 3047 CXRs were collected from two primary healthcare centres, characterised by low disease prevalence, between January and December 2018. All CXRs were labelled as normal or abnormal according to CT findings. Four radiology residents read all CXRs twice, with and without AI assistance. The performance of the AI and of the readers with and without AI assistance was measured in terms of area under the receiver operating characteristic curve (AUROC), sensitivity, and specificity. RESULTS The prevalence of clinically significant lesions was 2.2% (68 of 3047). The AUROC, sensitivity, and specificity of the AI were 0.648 (95% confidence interval [CI] 0.630-0.665), 35.3% (CI 24.7-47.8), and 94.2% (CI 93.3-95.0), respectively. The AI detected 12 of 41 pneumonias, 3 of 5 tuberculosis cases, and 9 of 22 tumours. AI-undetected lesions tended to be smaller than true-positive lesions. The readers' AUROCs were 0.534-0.676 without AI and 0.571-0.688 with AI (all p values < 0.05). For all readers, the mean reading time was 2.96-10.27 s longer with AI assistance (all p values < 0.05). CONCLUSIONS The performance of commercial AI in this high-volume, low-prevalence setting was poorer than expected, although it modestly boosted the performance of less-experienced readers. The technical prowess of AI demonstrated in experimental settings and approved by regulatory bodies may not directly translate to real-world practice, especially where the demand for AI assistance is highest. KEY POINTS
• This study shows the limited applicability of commercial AI software for detecting abnormalities in CXRs in a health screening population.
• When using AI software in a clinical setting that differs from the training setting, it is necessary to adjust the threshold or perform additional training with data that reflects that environment well.
• Prospective test accuracy studies, randomised controlled trials, or cohort studies are needed to examine AI software before it is implemented in real clinical practice.
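The low-prevalence caveat in the key points can be made concrete with Bayes' rule: plugging the operating point reported in the abstract (sensitivity 35.3%, specificity 94.2%) into a 2.2%-prevalence population gives a positive predictive value of roughly 12%, i.e. most AI flags are false alarms. A minimal sketch:

```python
def ppv(sens, spec, prevalence):
    # Positive predictive value via Bayes' rule:
    # P(disease | positive) = TP rate / (TP rate + FP rate).
    tp = sens * prevalence
    fp = (1 - spec) * (1 - prevalence)
    return tp / (tp + fp)

# Operating point and prevalence reported in the abstract above.
print(round(ppv(0.353, 0.942, 0.022), 3))  # ≈ 0.12
```

The same operating point at, say, 50% prevalence would yield a PPV above 85%, which is one way to see why performance "approved in experimental settings" can degrade in screening populations.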
Affiliation(s)
- Cherry Kim: Department of Radiology, Ansan Hospital, Korea University College of Medicine, 123, Jeokgeum-ro, Danwon-gu, Ansan-si, Gyeonggi, 15355, South Korea
- Zepa Yang: Biomedical Research Center, Guro Hospital, Korea University College of Medicine, Seoul, 08308, South Korea
- Seong Ho Park: Department of Radiology and Research Institute of Radiology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, 05505, South Korea
- Sung Ho Hwang: Department of Radiology, Anam Hospital, Korea University College of Medicine, Seoul, 02841, South Korea
- Yu-Whan Oh: Department of Radiology, Anam Hospital, Korea University College of Medicine, Seoul, 02841, South Korea
- Eun-Young Kang: Department of Radiology, Guro Hospital, Korea University College of Medicine, 33-41, Gurodong-ro 28-gil, Guro-gu, Seoul, 08308, South Korea
- Hwan Seok Yong: Department of Radiology, Guro Hospital, Korea University College of Medicine, 33-41, Gurodong-ro 28-gil, Guro-gu, Seoul, 08308, South Korea

8
Ahmad HK, Milne MR, Buchlak QD, Ektas N, Sanderson G, Chamtie H, Karunasena S, Chiang J, Holt X, Tang CHM, Seah JCY, Bottrell G, Esmaili N, Brotchie P, Jones C. Machine Learning Augmented Interpretation of Chest X-rays: A Systematic Review. Diagnostics (Basel) 2023; 13:743. [PMID: 36832231 PMCID: PMC9955112 DOI: 10.3390/diagnostics13040743] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/19/2023] [Revised: 02/13/2023] [Accepted: 02/14/2023] [Indexed: 02/18/2023] Open
Abstract
Limitations of the chest X-ray (CXR) have resulted in attempts to create machine learning systems to assist clinicians and improve interpretation accuracy. An understanding of the capabilities and limitations of modern machine learning systems is necessary for clinicians as these tools begin to permeate practice. This systematic review aimed to provide an overview of machine learning applications designed to facilitate CXR interpretation. A systematic search strategy was executed to identify research, published between January 2020 and September 2022, into machine learning algorithms capable of detecting more than two radiographic findings on CXRs. Model details and study characteristics, including risk of bias and quality, were summarized. Initially, 2248 articles were retrieved, with 46 included in the final review. Published models demonstrated strong standalone performance and were typically as accurate as, or more accurate than, radiologists or non-radiologist clinicians. Multiple studies demonstrated an improvement in the clinical finding classification performance of clinicians when models acted as a diagnostic assistance device. Device performance was compared with that of clinicians in 30% of studies, while effects on clinical perception and diagnosis were evaluated in 19%. Only one study was run prospectively. On average, 128,662 images were used to train and validate models. Most classified fewer than eight clinical findings, while the three most comprehensive models classified 54, 72, and 124 findings. This review suggests that machine learning devices designed to facilitate CXR interpretation perform strongly, improve the detection performance of clinicians, and improve the efficiency of radiology workflow. Several limitations were identified, and clinician involvement and expertise will be key to driving the safe implementation of quality CXR machine learning systems.
Affiliation(s)
- Hassan K. Ahmad: Annalise.ai, Sydney, NSW 2000, Australia; Department of Emergency Medicine, Royal North Shore Hospital, Sydney, NSW 2065, Australia
- Quinlan D. Buchlak: Annalise.ai, Sydney, NSW 2000, Australia; School of Medicine, University of Notre Dame Australia, Sydney, NSW 2007, Australia; Department of Neurosurgery, Monash Health, Melbourne, VIC 3168, Australia
- Jason Chiang: Annalise.ai, Sydney, NSW 2000, Australia; Department of General Practice, University of Melbourne, Melbourne, VIC 3010, Australia; Westmead Applied Research Centre, University of Sydney, Sydney, NSW 2006, Australia
- Jarrel C. Y. Seah: Annalise.ai, Sydney, NSW 2000, Australia; Department of Radiology, Alfred Health, Melbourne, VIC 3004, Australia
- Nazanin Esmaili: School of Medicine, University of Notre Dame Australia, Sydney, NSW 2007, Australia; Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, NSW 2007, Australia
- Peter Brotchie: Annalise.ai, Sydney, NSW 2000, Australia; Department of Radiology, St Vincent’s Health Australia, Melbourne, VIC 3065, Australia
- Catherine Jones: Annalise.ai, Sydney, NSW 2000, Australia; I-MED Radiology Network, Brisbane, QLD 4006, Australia; School of Public and Preventive Health, Monash University, Clayton, VIC 3800, Australia; Department of Clinical Imaging Science, University of Sydney, Sydney, NSW 2006, Australia

9
Toda N, Hashimoto M, Iwabuchi Y, Nagasaka M, Takeshita R, Yamada M, Yamada Y, Jinzaki M. Validation of deep learning-based computer-aided detection software use for interpretation of pulmonary abnormalities on chest radiographs and examination of factors that influence readers' performance and final diagnosis. Jpn J Radiol 2023; 41:38-44. [PMID: 36121622 DOI: 10.1007/s11604-022-01330-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/01/2022] [Accepted: 08/15/2022] [Indexed: 01/07/2023]
Abstract
PURPOSE To evaluate the performance of a deep learning-based computer-aided detection (CAD) software for detecting pulmonary nodules, masses, and consolidation on chest radiographs (CRs) and to examine the effect of readers' experience and data characteristics on the sensitivity and final diagnosis. MATERIALS AND METHODS The CRs of 453 patients were retrospectively selected from two institutions. Among these CRs, 60 images with abnormal findings (pulmonary nodules, masses, and consolidation) and 140 without abnormal findings were randomly selected for sequential observer-performance testing. In the test, 12 readers (three radiologists, three pulmonologists, three non-pulmonology physicians, and three junior residents) interpreted 200 images with and without CAD, and the findings were compared. The weighted alternative free-response receiver operating characteristic (wAFROC) figure of merit (FOM) was used to analyze observer performance. The lesions that readers initially missed but CAD detected were stratified by anatomic location and degree of subtlety, and the adoption rate was calculated. Fisher's exact test was used for comparison. RESULTS The mean wAFROC FOM score of the 12 readers significantly improved from 0.746 to 0.810 with software assistance (P = 0.007). In the reader group with < 6 years of experience, the mean FOM score significantly improved from 0.680 to 0.779 (P = 0.011), while that in the reader group with ≥ 6 years of experience increased from 0.811 to 0.841 (P = 0.12). The sensitivity of the CAD software and the adoption rate for subtlety level 2 or 3 (obscure) lesions were significantly lower than those for level 4 or 5 (distinct) lesions (50% vs. 93%, P < 0.001; and 55% vs. 74%, P = 0.04, respectively).
CONCLUSION CAD software use improved doctors' performance in detecting nodules/masses and consolidation on CRs, particularly for non-expert doctors, by preventing doctors from missing distinct lesions rather than helping them to detect obscure lesions.
Affiliation(s)
- Naoki Toda: Department of Radiology, Keio University School of Medicine, 35 Shinanomachi, Shinjuku-ku, Tokyo, 160-8582, Japan
- Masahiro Hashimoto: Department of Radiology, Keio University School of Medicine, 35 Shinanomachi, Shinjuku-ku, Tokyo, 160-8582, Japan
- Yu Iwabuchi: Department of Radiology, Keio University School of Medicine, 35 Shinanomachi, Shinjuku-ku, Tokyo, 160-8582, Japan
- Misa Nagasaka: Department of Radiology, Keio University School of Medicine, 35 Shinanomachi, Shinjuku-ku, Tokyo, 160-8582, Japan
- Ryo Takeshita: Department of Radiology, Keio University School of Medicine, 35 Shinanomachi, Shinjuku-ku, Tokyo, 160-8582, Japan
- Minoru Yamada: Department of Radiology, Keio University School of Medicine, 35 Shinanomachi, Shinjuku-ku, Tokyo, 160-8582, Japan
- Yoshitake Yamada: Department of Radiology, Keio University School of Medicine, 35 Shinanomachi, Shinjuku-ku, Tokyo, 160-8582, Japan
- Masahiro Jinzaki: Department of Radiology, Keio University School of Medicine, 35 Shinanomachi, Shinjuku-ku, Tokyo, 160-8582, Japan

10
Lee SY, Ha S, Jeon MG, Li H, Choi H, Kim HP, Choi YR, I H, Jeong YJ, Park YH, Ahn H, Hong SH, Koo HJ, Lee CW, Kim MJ, Kim YJ, Kim KW, Choi JM. Localization-adjusted diagnostic performance and assistance effect of a computer-aided detection system for pneumothorax and consolidation. NPJ Digit Med 2022; 5:107. [PMID: 35908091 PMCID: PMC9339006 DOI: 10.1038/s41746-022-00658-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/24/2022] [Accepted: 07/11/2022] [Indexed: 11/24/2022] Open
Abstract
While many deep-learning-based computer-aided detection (CAD) systems have been developed and commercialized for abnormality detection in chest radiographs (CXR), their ability to localize a target abnormality is rarely reported. Localization accuracy is important for model interpretability, which is crucial in clinical settings. Moreover, diagnostic performance is likely to vary depending on the threshold that defines an accurate localization. In a multi-center, stand-alone clinical trial using temporal and external validation datasets of 1,050 CXRs, we evaluated the localization accuracy, localization-adjusted discrimination, and calibration of a commercially available deep-learning-based CAD for detecting consolidation and pneumothorax. For consolidation, the CAD achieved an image-level AUROC (95% CI) of 0.960 (0.945, 0.975), sensitivity of 0.933 (0.899, 0.959), specificity of 0.948 (0.930, 0.963), Dice score of 0.691 (0.664, 0.718), and moderate calibration; for pneumothorax, it achieved an image-level AUROC of 0.978 (0.965, 0.991), sensitivity of 0.956 (0.923, 0.978), specificity of 0.996 (0.989, 0.999), Dice score of 0.798 (0.770, 0.826), and moderate calibration. Diagnostic performance varied substantially when localization accuracy was accounted for but remained high at the minimum threshold of clinical relevance. In a separate trial of diagnostic impact using 461 CXRs, the causal effect of CAD assistance on clinicians’ diagnostic performance was estimated. After adjusting for age, sex, dataset, and abnormality type, the CAD improved clinicians’ diagnostic performance on average (OR [95% CI] = 1.73 [1.30, 2.32]; p < 0.001), although the effects varied substantially by clinical background. The CAD was found to have high stand-alone diagnostic performance and may beneficially impact clinicians’ diagnostic performance when used in clinical settings.
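The Dice scores above quantify overlap between predicted and reference abnormality regions. A minimal sketch of the metric on toy masks, represented here as sets of pixel indices (illustrative only, not the trial's implementation):

```python
def dice(pred, truth):
    # Dice coefficient: 2|A ∩ B| / (|A| + |B|) over sets of abnormal pixels.
    if not pred and not truth:
        return 1.0  # convention: empty vs. empty counts as perfect agreement
    return 2 * len(pred & truth) / (len(pred) + len(truth))

# Toy 1-D "masks" as sets of pixel indices (hypothetical values).
pred = {2, 3, 4, 5}
truth = {3, 4, 5, 6, 7}
print(round(dice(pred, truth), 3))  # → 0.667
```

Thresholding this score (e.g. counting a detection as correct only when Dice exceeds some cutoff) is what makes the "localization-adjusted" performance in the abstract threshold-dependent.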
Affiliation(s)
- Sun Yeop Lee: Department of Medical Artificial Intelligence, Deepnoid, Inc., Seoul, Republic of Korea
- Sangwoo Ha: Department of Medical Artificial Intelligence, Deepnoid, Inc., Seoul, Republic of Korea
- Min Gyeong Jeon: Department of Medical Artificial Intelligence, Deepnoid, Inc., Seoul, Republic of Korea
- Hao Li: Department of Medical Artificial Intelligence, Deepnoid, Inc., Seoul, Republic of Korea
- Hyunju Choi: Department of Medical Artificial Intelligence, Deepnoid, Inc., Seoul, Republic of Korea
- Hwa Pyung Kim: Department of Medical Artificial Intelligence, Deepnoid, Inc., Seoul, Republic of Korea
- Ye Ra Choi: Department of Radiology, Seoul Metropolitan Government-Seoul National University Boramae Medical Center, Seoul, Republic of Korea; Department of Radiology, Seoul National University College of Medicine, Seoul, Republic of Korea
- Hoseok I: Department of Thoracic and Cardiovascular Surgery, Pusan National University School of Medicine, Busan, Republic of Korea; Convergence Medical Institute of Technology, Biomedical Research Institute, Pusan National University Hospital, Busan, Republic of Korea
- Yeon Joo Jeong: Department of Radiology and Biomedical Research Institute, Pusan National University Hospital, Busan, Republic of Korea
- Yoon Ha Park: Department of Internal Medicine, Jawol Health Center, Incheon, Republic of Korea
- Hyemin Ahn: Department of Radiology and Research Institute of Radiology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Sang Hyup Hong: Department of Radiology and Research Institute of Radiology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Hyun Jung Koo: Department of Radiology and Research Institute of Radiology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Choong Wook Lee: Department of Radiology and Research Institute of Radiology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Min Jae Kim: Department of Infectious Disease, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Yeon Joo Kim: Department of Respiratory Allergy Medicine, Nowon Eulji Medical Center, Seoul, Republic of Korea
- Kyung Won Kim: Department of Radiology and Research Institute of Radiology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Jong Mun Choi: Department of Medical Artificial Intelligence, Deepnoid, Inc., Seoul, Republic of Korea