1
|
Bándi P, Balkenhol M, van Dijk M, Kok M, van Ginneken B, van der Laak J, Litjens G. Continual learning strategies for cancer-independent detection of lymph node metastases. Med Image Anal 2023; 85:102755. [PMID: 36724605 DOI: 10.1016/j.media.2023.102755] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 10/26/2021] [Revised: 01/08/2023] [Accepted: 01/18/2023] [Indexed: 01/26/2023]
Abstract
Recently, large, high-quality public datasets have led to the development of convolutional neural networks that can detect lymph node metastases of breast cancer at the level of expert pathologists. Many cancers, regardless of the site of origin, can metastasize to lymph nodes. However, collecting and annotating high-volume, high-quality datasets for every cancer type is challenging. In this paper we investigate how to leverage existing high-quality datasets most efficiently in multi-task settings for closely related tasks. Specifically, we explore different training and domain adaptation strategies, including prevention of catastrophic forgetting, for breast, colon and head-and-neck cancer metastasis detection in lymph nodes. Our results show state-of-the-art performance on colon and head-and-neck cancer metastasis detection tasks. We show the effectiveness of adapting networks from one cancer type to another to obtain multi-task metastasis detection networks. Furthermore, we show that leveraging existing high-quality datasets can significantly boost performance on new target tasks and that catastrophic forgetting can be effectively mitigated. Last, we compare different mitigation strategies.
|
2
|
Jarkman S, Karlberg M, Pocevičiūtė M, Bodén A, Bándi P, Litjens G, Lundström C, Treanor D, van der Laak J. Generalization of Deep Learning in Digital Pathology: Experience in Breast Cancer Metastasis Detection. Cancers (Basel) 2022; 14:5424. [PMID: 36358842 PMCID: PMC9659028 DOI: 10.3390/cancers14215424] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 08/22/2022] [Revised: 10/13/2022] [Accepted: 10/28/2022] [Indexed: 11/06/2022] Open
Abstract
Poor generalizability is a major barrier to clinical implementation of artificial intelligence in digital pathology. The aim of this study was to test the generalizability of a pretrained deep learning model to a new diagnostic setting and to a small change in surgical indication. A deep learning model for breast cancer metastases detection in sentinel lymph nodes, trained on CAMELYON multicenter data, was used as a base model, and achieved an AUC of 0.969 (95% CI 0.926-0.998) and FROC of 0.838 (95% CI 0.757-0.913) on CAMELYON16 test data. On local sentinel node data, the base model performance dropped to AUC 0.929 (95% CI 0.800-0.998) and FROC 0.744 (95% CI 0.566-0.912). On data with a change in surgical indication (axillary dissections) the base model performance indicated an even larger drop with a FROC of 0.503 (95% CI 0.201-0.911). The model was retrained with addition of local data, resulting in about a 4% increase for both AUC and FROC for sentinel nodes, and an increase of 11% in AUC and 49% in FROC for axillary nodes. Pathologist qualitative evaluation of the retrained model's output showed no missed positive slides. False positives, false negatives and one previously undetected micro-metastasis were observed. The study highlights the generalization challenge even when using a multicenter trained model, and that a small change in indication can considerably impact the model's performance.
Affiliation(s)
- Sofia Jarkman
- Department of Clinical Pathology, and Department of Biomedical and Clinical Sciences, Linköping University, 581 83 Linköping, Sweden
- Center for Medical Image Science and Visualization (CMIV), Linköping University, 581 85 Linköping, Sweden
- Micael Karlberg
- Center for Medical Image Science and Visualization (CMIV), Linköping University, 581 85 Linköping, Sweden
- Department of Pathology, Radboud University Medical Center, P.O. Box 9101, 6500 HB Nijmegen, The Netherlands
- Milda Pocevičiūtė
- Center for Medical Image Science and Visualization (CMIV), Linköping University, 581 85 Linköping, Sweden
- Anna Bodén
- Department of Clinical Pathology, and Department of Biomedical and Clinical Sciences, Linköping University, 581 83 Linköping, Sweden
- Center for Medical Image Science and Visualization (CMIV), Linköping University, 581 85 Linköping, Sweden
- Péter Bándi
- Department of Pathology, Radboud University Medical Center, P.O. Box 9101, 6500 HB Nijmegen, The Netherlands
- Geert Litjens
- Department of Pathology, Radboud University Medical Center, P.O. Box 9101, 6500 HB Nijmegen, The Netherlands
- Claes Lundström
- Center for Medical Image Science and Visualization (CMIV), Linköping University, 581 85 Linköping, Sweden
- Sectra AB, Teknikringen 20, 583 30 Linköping, Sweden
- Darren Treanor
- Department of Clinical Pathology, and Department of Biomedical and Clinical Sciences, Linköping University, 581 83 Linköping, Sweden
- Center for Medical Image Science and Visualization (CMIV), Linköping University, 581 85 Linköping, Sweden
- Leeds Teaching Hospitals NHS Trust, St James's University Hospital, Beckett Street, Leeds LS9 7TF, UK
- Department of Pathology, University of Leeds, Woodhouse Lane, Leeds LS2 9JT, UK
- Jeroen van der Laak
- Center for Medical Image Science and Visualization (CMIV), Linköping University, 581 85 Linköping, Sweden
- Department of Pathology, Radboud University Medical Center, P.O. Box 9101, 6500 HB Nijmegen, The Netherlands
|
3
|
Generating synthetic contrast enhancement from non-contrast chest computed tomography using a generative adversarial network. Sci Rep 2021; 11:20403. [PMID: 34650076 PMCID: PMC8516920 DOI: 10.1038/s41598-021-00058-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 04/10/2021] [Accepted: 10/01/2021] [Indexed: 11/09/2022] Open
Abstract
This study aimed to evaluate a deep learning model for generating synthetic contrast-enhanced CT (sCECT) from non-contrast chest CT (NCCT). A deep learning model was applied to generate sCECT from NCCT. We collected three separate data sets, the development set (n = 25) for model training and tuning, test set 1 (n = 25) for technical evaluation, and test set 2 (n = 12) for clinical utility evaluation. In test set 1, image similarity metrics were calculated. In test set 2, the lesion contrast-to-noise ratio of the mediastinal lymph nodes was measured, and an observer study was conducted to compare lesion conspicuity. Comparisons were performed using the paired t-test or Wilcoxon signed-rank test. In test set 1, sCECT showed a lower mean absolute error (41.72 vs 48.74; P < .001), higher peak signal-to-noise ratio (17.44 vs 15.97; P < .001), higher multiscale structural similarity index measurement (0.84 vs 0.81; P < .001), and lower learned perceptual image patch similarity metric (0.14 vs 0.15; P < .001) than NCCT. In test set 2, the contrast-to-noise ratio of the mediastinal lymph nodes was higher in the sCECT group than in the NCCT group (6.15 ± 5.18 vs 0.74 ± 0.69; P < .001). The observer study showed for all reviewers higher lesion conspicuity in NCCT with sCECT than in NCCT alone (P ≤ .001). Synthetic CECT generated from NCCT improves the depiction of mediastinal lymph nodes.
|
4
|
Nam JG, Kim HJ, Lee EH, Hong W, Park J, Hwang EJ, Park CM, Goo JM. Value of a deep learning-based algorithm for detecting Lung-RADS category 4 nodules on chest radiographs in a health checkup population: estimation of the sample size for a randomized controlled trial. Eur Radiol 2021; 32:213-222. [PMID: 34264351 DOI: 10.1007/s00330-021-08162-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2020] [Revised: 05/20/2021] [Accepted: 06/10/2021] [Indexed: 11/24/2022]
Abstract
OBJECTIVE To explore the value of a deep learning-based algorithm in detecting Lung CT Screening Reporting and Data System category 4 nodules on chest radiographs from an asymptomatic health checkup population. METHODS Data from an annual retrospective cohort of individuals who underwent chest radiographs for health checkup purposes and chest CT scanning within 3 months were collected. Among 3073 individuals, 118 with category 4 nodules on CT were selected. A reader performance test was performed using those 118 radiographs and randomly selected 51 individuals without any nodules. Four radiologists independently evaluated the radiographs without and with the results of the algorithm; and sensitivities/specificities were compared. The sample size needed to confirm the difference in detection rates was calculated, i.e., the number of true-positive radiographs divided by the total number of radiographs. RESULTS The sensitivity of the radiologists substantially increased aided by the algorithm (38.8% [183/472] to 45.1% [213/472]; p < .001) without significant change in specificity (94.1% [192/204] vs. 92.2% [188/204]; p = .22). Pooled radiologists detected more nodules with the algorithm (32.0% [156/488] vs. 38.9% [190/488]; p < .001), without alteration of false-positive rates (0.09 [62/676], both). Pooled detection rates for the annual cohort were 1.49% (183/12,292) and 1.73% (213/12,292) without and with the algorithm, respectively. A sample size of 41,776 in each arm would be required to demonstrate significant detection rate difference with < 5% type I error and > 80% power. CONCLUSION Although readers substantially increased sensitivity in detecting nodules on chest radiographs from a health checkup population aided by the algorithm, detection rate difference was only 0.24%, requiring a sample size >80,000 for a randomized controlled trial. 
KEY POINTS • Aided by a deep learning algorithm, pooled radiologists improved their sensitivity in detecting Lung-RADS category 4 nodules on chest radiographs from a health checkup population (38.8% [183/472] to 45.1% [213/472]; p < .001), without increasing false-positive rate. • The prevalence of the Lung-RADS category 4 nodules was 3.8% (118/3073) on the population, resulting in only 0.24% increase of the detection rate for the radiologists with assistance of the algorithm. • To confirm the significant detection rate increase by a randomized controlled trial, a sample size of 84,000 would be required.
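The sample-size figure in this abstract can be roughly reproduced from the pooled detection rates it reports. The sketch below is only an illustration under stated assumptions: it uses the textbook unpooled normal-approximation formula for comparing two independent proportions, a two-sided 5% type I error, and 80% power. The abstract does not say which formula the authors used, and the function name `two_proportion_sample_size` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def two_proportion_sample_size(p1, p2, alpha=0.05, power=0.80):
    """Per-arm sample size for detecting a difference between two proportions.

    Assumption: unpooled normal approximation with a two-sided test.
    This may differ from the method used in the cited study.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for type I error
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)       # sum of the two binomial variances
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


# Pooled detection rates quoted in the abstract (four readers x 3073 individuals).
rate_unaided = 183 / 12292
rate_aided = 213 / 12292

n_per_arm = two_proportion_sample_size(rate_unaided, rate_aided)
print(f"detection rates: {rate_unaided:.2%} vs {rate_aided:.2%}")
print(f"approximate sample size per arm: {n_per_arm}")
```

Under these assumptions the result comes out close to the 41,776 per arm reported in the abstract, and doubling it gives a total above 80,000, in line with the conclusion. The small remaining gap most likely reflects a different variance estimate or continuity correction in the original calculation.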
Affiliation(s)
- Ju Gang Nam
- Department of Radiology, Seoul National University Hospital and College of Medicine, Seoul, 03080, Republic of Korea
- Hyun Jin Kim
- Department of Radiology, Ewha Womans University Seoul Hospital, Seoul, 07804, Republic of Korea
- Eun Hee Lee
- Center for Health Promotion and Optimal Aging, Seoul National University Hospital, Seoul, 03080, Republic of Korea
- Wonju Hong
- Department of Radiology, Seoul National University Hospital and College of Medicine, Seoul, 03080, Republic of Korea
- Jongsoo Park
- Department of Radiology, Seoul National University Hospital and College of Medicine, Seoul, 03080, Republic of Korea
- Eui Jin Hwang
- Department of Radiology, Seoul National University Hospital and College of Medicine, Seoul, 03080, Republic of Korea
- Chang Min Park
- Department of Radiology, Seoul National University Hospital and College of Medicine, Seoul, 03080, Republic of Korea; Cancer Research Institute, Seoul National University, Seoul, 03080, Republic of Korea
- Jin Mo Goo
- Department of Radiology, Seoul National University Hospital and College of Medicine, Seoul, 03080, Republic of Korea; Cancer Research Institute, Seoul National University, Seoul, 03080, Republic of Korea; Department of Radiology, Seoul National University College of Medicine, 101 Daehak-ro, Jongno-gu, Seoul, 03080, Republic of Korea
|
5
|
Nam JG, Hwang EJ, Kim DS, Yoo SJ, Choi H, Goo JM, Park CM. Undetected Lung Cancer at Posteroanterior Chest Radiography: Potential Role of a Deep Learning-based Detection Algorithm. Radiol Cardiothorac Imaging 2020; 2:e190222. [PMID: 33778635 DOI: 10.1148/ryct.2020190222] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 10/21/2019] [Revised: 08/12/2020] [Accepted: 10/16/2020] [Indexed: 12/25/2022]
Abstract
Purpose To evaluate the performance of a deep learning-based algorithm in detecting lung cancers not reported on posteroanterior chest radiographs during routine practice. Materials and Methods The retrospective test dataset included 168 posteroanterior chest radiographs acquired between March 2017 and December 2018 (168 patients; mean age, 71.9 years ± 9.5 [standard deviation]; age range, 42-91 years) with 187 lung cancers (mean size, 2.3 cm ± 1.2) undetected during initial clinical evaluation, and 50 normal chest radiographs. CT served as the reference standard for ground truth. Four thoracic radiologists independently reevaluated the chest radiographs for lung nodules both without and with the aid of the algorithm. The performances of the algorithm and the radiologists were evaluated and compared on a per-chest radiograph basis and a per-lesion basis, according to the area under the receiver operating characteristic curve (AUROC) and area under the jackknife free-response ROC curve (AUFROC). Results The algorithm showed excellent diagnostic performances both in terms of per-chest radiograph classification (AUROC, 0.899) and per-lesion localization (AUFROC, 0.744); both of these values were significantly higher than those of the radiologists (AUROC, 0.634-0.663; AUFROC, 0.619-0.651; P < .001 for all). The algorithm also demonstrated higher sensitivity (69.6% [117 of 168] vs 47.0% [316 of 672]; P < .001) and specificity (94.0% [47 of 50] vs 78.0% [156 of 200]; P = .01). When assisted by the algorithm, the radiologists' AUROC (0.634-0.663 vs 0.685-0.724; P < 0.01 for all) and pooled AUFROC (0.636 vs 0.688; P = .03) substantially improved. The false-positive rate of the algorithm, that is, the total number of false-positive nodules divided by the total number of chest radiographs, was similar to that of pooled radiologists (21.1% [46 of 218] vs 19.0% [166 of 872]; P > .05). 
Conclusion A deep learning-based nodule detection algorithm showed excellent detection performance of lung cancers that were not reported on chest radiographs during routine practice and significantly reduced reading errors when used as a second reader. Supplemental material is available for this article. © RSNA, 2020. See also commentary by White in this issue.
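The per-radiograph rates quoted in this abstract follow directly from the counts given in brackets. The short check below recomputes them; it is only an arithmetic illustration, and the dictionary name `reported_counts` and the helper `percent` are introduced here for convenience. The pooled reader denominators (672, 200, 872) are four times the per-reader totals (168, 50, 218), consistent with four radiologists each reading every radiograph.

```python
def percent(numerator, denominator):
    """Return a proportion as a percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


# (numerator, denominator) pairs copied from the abstract.
reported_counts = {
    "algorithm sensitivity": (117, 168),
    "radiologist sensitivity (pooled)": (316, 672),
    "algorithm specificity": (47, 50),
    "radiologist specificity (pooled)": (156, 200),
    "algorithm false-positive rate": (46, 218),
    "radiologist false-positive rate (pooled)": (166, 872),
}

recomputed = {name: percent(*pair) for name, pair in reported_counts.items()}
for name, value in recomputed.items():
    print(f"{name}: {value}%")
```

Note that the false-positive rate here is defined per radiograph (false-positive nodules divided by all radiographs read), as stated in the abstract, rather than as one minus specificity.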
Affiliation(s)
- Ju Gang Nam
- Department of Radiology, Seoul National University Hospital and College of Medicine, 101 Daehak-ro, Jongno-gu, Seoul 03080, Republic of Korea (J.G.N., E.J.H., D.S.K., S.J.Y., H.C., J.M.G., C.M.P.); and Institute of Radiation Medicine, Seoul National University Medical Research Center, Seoul, Republic of Korea (J.M.G., C.M.P.)
- Eui Jin Hwang
- Department of Radiology, Seoul National University Hospital and College of Medicine, 101 Daehak-ro, Jongno-gu, Seoul 03080, Republic of Korea (J.G.N., E.J.H., D.S.K., S.J.Y., H.C., J.M.G., C.M.P.); and Institute of Radiation Medicine, Seoul National University Medical Research Center, Seoul, Republic of Korea (J.M.G., C.M.P.)
- Da Som Kim
- Department of Radiology, Seoul National University Hospital and College of Medicine, 101 Daehak-ro, Jongno-gu, Seoul 03080, Republic of Korea (J.G.N., E.J.H., D.S.K., S.J.Y., H.C., J.M.G., C.M.P.); and Institute of Radiation Medicine, Seoul National University Medical Research Center, Seoul, Republic of Korea (J.M.G., C.M.P.)
- Seung-Jin Yoo
- Department of Radiology, Seoul National University Hospital and College of Medicine, 101 Daehak-ro, Jongno-gu, Seoul 03080, Republic of Korea (J.G.N., E.J.H., D.S.K., S.J.Y., H.C., J.M.G., C.M.P.); and Institute of Radiation Medicine, Seoul National University Medical Research Center, Seoul, Republic of Korea (J.M.G., C.M.P.)
- Hyewon Choi
- Department of Radiology, Seoul National University Hospital and College of Medicine, 101 Daehak-ro, Jongno-gu, Seoul 03080, Republic of Korea (J.G.N., E.J.H., D.S.K., S.J.Y., H.C., J.M.G., C.M.P.); and Institute of Radiation Medicine, Seoul National University Medical Research Center, Seoul, Republic of Korea (J.M.G., C.M.P.)
- Jin Mo Goo
- Department of Radiology, Seoul National University Hospital and College of Medicine, 101 Daehak-ro, Jongno-gu, Seoul 03080, Republic of Korea (J.G.N., E.J.H., D.S.K., S.J.Y., H.C., J.M.G., C.M.P.); and Institute of Radiation Medicine, Seoul National University Medical Research Center, Seoul, Republic of Korea (J.M.G., C.M.P.)
- Chang Min Park
- Department of Radiology, Seoul National University Hospital and College of Medicine, 101 Daehak-ro, Jongno-gu, Seoul 03080, Republic of Korea (J.G.N., E.J.H., D.S.K., S.J.Y., H.C., J.M.G., C.M.P.); and Institute of Radiation Medicine, Seoul National University Medical Research Center, Seoul, Republic of Korea (J.M.G., C.M.P.)
|
6
|
Koo YH, Shin KE, Park JS, Lee JW, Byun S, Lee H. Extravalidation and reproducibility results of a commercial deep learning-based automatic detection algorithm for pulmonary nodules on chest radiographs at tertiary hospital. J Med Imaging Radiat Oncol 2020; 65:15-22. [PMID: 33090731 DOI: 10.1111/1754-9485.13105] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 06/23/2020] [Revised: 08/06/2020] [Accepted: 08/31/2020] [Indexed: 12/25/2022]
Abstract
INTRODUCTION To extra validate and evaluate the reproducibility of a commercial deep convolutional neural network (DCNN) algorithm for pulmonary nodules on chest radiographs (CRs) and to compare its performance with radiologists. METHODS This retrospective study enrolled 434 CRs (normal to abnormal ratio, 246:188) from 378 patients that visited a tertiary hospital. DCNN performance was compared with two radiology residents and two thoracic radiologists. Abnormality assessment (using the area under the receiver operating characteristics (AUROC)) and nodule detection (using jackknife alternative free-response ROC (JAFROC)) were compared among three groups (DCNN only, radiologist without DCNN and radiologist with DCNN). A subset of 56 paired cases, having two CRs taken within a 7-day period, were assessed for intraobserver reproducibility using the intraclass correlation coefficient. Independent characteristics of pulmonary nodules detected by DCNN were assessed by multiple logistic regression analysis. RESULTS The AUROC for abnormality detection for the three groups were 0.87, 0.93 and 0.96, respectively (P < 0.05), whereas the JAFROC analysis of nodule detection was 0.926, 0.929 and 0.964. Reproducibility for the three groups was 0.80, 0.67 and 0.80, which shows an increase in radiologists using DCNN (P < 0.05). Nodules detected by DCNN were more solid, round-shaped and well marginated, not masked and laterally located (P < 0.05). CONCLUSIONS Extra validation results of DCNN showed high ROC results and there was a significant improvement in the performance when radiologists used DCNN. Reproducibility by DCNN alone showed good agreement, and there was an improvement from moderate to good agreement for radiologists using DCNN.
Affiliation(s)
- Young Hoon Koo
- Department of Radiology, Soonchunhyang University Bucheon Hospital, Gyeonggi-do, Korea
- Kyung Eun Shin
- Department of Radiology, Soonchunhyang University Bucheon Hospital, Gyeonggi-do, Korea
- Jai Soung Park
- Department of Radiology, Soonchunhyang University Bucheon Hospital, Gyeonggi-do, Korea
- Jae Wook Lee
- Department of Radiology, Soonchunhyang University Bucheon Hospital, Gyeonggi-do, Korea
- Seonghwan Byun
- Department of Radiology, Soonchunhyang University Bucheon Hospital, Gyeonggi-do, Korea
- Heon Lee
- Department of Radiology, Soonchunhyang University Bucheon Hospital, Gyeonggi-do, Korea
|
7
|
Detectability of small objects in PET/computed tomography phantom images with Bayesian penalised likelihood reconstruction. Nucl Med Commun 2020; 41:666-673. [DOI: 10.1097/mnm.0000000000001204] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/26/2022]
|
8
|
Carter RE, Holmes DR, Fletcher JG, McCollough CH. Evaluation of Pseudoreader Study Designs to Estimate Observer Performance Results as an Alternative to Fully Crossed, Multireader, Multicase Studies. Acad Radiol 2020; 27:244-252. [PMID: 31076331 DOI: 10.1016/j.acra.2019.03.014] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/25/2018] [Revised: 02/27/2019] [Accepted: 03/05/2019] [Indexed: 11/17/2022]
Abstract
RATIONALE AND OBJECTIVES To examine the ability of a pseudoreader study design to estimate the observer performance obtained using a traditional fully crossed, multireader, multicase (MRMC) study. MATERIALS AND METHODS A 10-reader MRMC study with 20 computed tomography datasets was designed to measure observer performance on four novel noise reduction methods. This study served as the foundation for the empirical evaluation of three different pseudoreader designs, each of which used a similar bootstrap approach for generating 2000 realizations from the fully crossed study. Our three approaches to generating a pseudoreader varied in the degree to which reader performance was matched and integrated into the pseudoreader design. One simulation was randomly selected as a "mock study" to represent a hypothetical, prospective implementation of the design. RESULTS Using the traditional fully crossed design, figures of merit (95% CIs) for the four noise reduction methods were 68.2 (55.5-81.0), 69.6 (58.4-80.8), 70.8 (60.2-81.4), and 70.9 (60.4-81.3), respectively. When radiologists' performances on the fourth noise reduction method were used to pair readers during the mock study, there was strong agreement in the estimated figures of merit, with estimates using the pseudoreader design being within ±3% of the fully crossed design. CONCLUSION Fully crossed MRMC studies require significant investment in resources and time, often resulting in delayed implementation or minimal human testing before dissemination. The pseudoreader approach accelerates study conduct by combining readers judiciously and was found to provide comparable results to the traditional fully crossed design by making strong assumptions about exchangeability of the readers.
Affiliation(s)
- Rickey E Carter
- Department of Health Sciences Research, Mayo Clinic, 4500 San Pablo Road South, Jacksonville, FL 32224.
- David R Holmes
- Department of Physiology and Biomedical Engineering, Mayo Clinic, Rochester, Minnesota
|
9
|
Automatic liver tumor segmentation in CT with fully convolutional neural networks and object-based postprocessing. Sci Rep 2018; 8:15497. [PMID: 30341319 PMCID: PMC6195599 DOI: 10.1038/s41598-018-33860-7] [Citation(s) in RCA: 112] [Impact Index Per Article: 18.7] [Received: 07/16/2018] [Accepted: 10/06/2018] [Indexed: 02/07/2023] Open
Abstract
Automatic liver tumor segmentation would have a big impact on liver therapy planning procedures and follow-up assessment, thanks to standardization and incorporation of full volumetric information. In this work, we develop a fully automatic method for liver tumor segmentation in CT images based on a 2D fully convolutional neural network with an object-based postprocessing step. We describe our experiments on the LiTS challenge training data set and evaluate segmentation and detection performance. Our proposed design cascading two models working on voxel- and object-level allowed for a significant reduction of false positive findings by 85% when compared with the raw neural network output. In comparison with the human performance, our approach achieves a similar segmentation quality for detected tumors (mean Dice 0.69 vs. 0.72), but is inferior in the detection performance (recall 63% vs. 92%). Finally, we describe how we participated in the LiTS challenge and achieved state-of-the-art performance.
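The abstract reports two different kinds of metric: a Dice score for segmentation quality on detected tumors and a recall for detection. The sketch below shows one common way to compute both on toy data. It is not the authors' evaluation code: representing masks as sets of pixel coordinates, and counting a tumor as detected when any predicted pixel overlaps it, are assumptions made here for illustration, and the function names are hypothetical.

```python
def dice(predicted, reference):
    """Dice coefficient between two sets of pixel coordinates."""
    predicted, reference = set(predicted), set(reference)
    if not predicted and not reference:
        return 1.0  # two empty masks agree perfectly
    overlap = len(predicted & reference)
    return 2 * overlap / (len(predicted) + len(reference))


def detection_recall(predicted, reference_tumors):
    """Fraction of reference tumors hit by at least one predicted pixel.

    Assumption: any overlap counts as a detection; real challenges
    often require a minimum overlap instead.
    """
    predicted = set(predicted)
    detected = sum(1 for tumor in reference_tumors if predicted & set(tumor))
    return detected / len(reference_tumors)


# Toy 2D example: two reference tumors, one partially segmented, one missed.
tumor_a = {(0, 0), (0, 1), (1, 0), (1, 1)}
tumor_b = {(5, 5), (5, 6)}
prediction = {(0, 1), (1, 1), (1, 2)}

dice_a = dice(prediction, tumor_a)
recall = detection_recall(prediction, [tumor_a, tumor_b])
print(f"Dice on detected tumor: {dice_a:.3f}, detection recall: {recall:.2f}")
```

Separating the two measures this way mirrors the abstract's finding: a method can match human Dice on the tumors it finds while still missing a larger share of tumors overall.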
|
10
|
Nam JG, Park S, Hwang EJ, Lee JH, Jin KN, Lim KY, Vu TH, Sohn JH, Hwang S, Goo JM, Park CM. Development and Validation of Deep Learning-based Automatic Detection Algorithm for Malignant Pulmonary Nodules on Chest Radiographs. Radiology 2018; 290:218-228. [PMID: 30251934 DOI: 10.1148/radiol.2018180237] [Citation(s) in RCA: 305] [Impact Index Per Article: 50.8] [Indexed: 12/12/2022]
Abstract
Purpose To develop and validate a deep learning-based automatic detection algorithm (DLAD) for malignant pulmonary nodules on chest radiographs and to compare its performance with physicians including thoracic radiologists. Materials and Methods For this retrospective study, DLAD was developed by using 43 292 chest radiographs (normal radiograph-to-nodule radiograph ratio, 34 067:9225) in 34 676 patients (healthy-to-nodule ratio, 30 784:3892; 19 230 men [mean age, 52.8 years; age range, 18-99 years]; 15 446 women [mean age, 52.3 years; age range, 18-98 years]) obtained between 2010 and 2015, which were labeled and partially annotated by 13 board-certified radiologists, in a convolutional neural network. Radiograph classification and nodule detection performances of DLAD were validated by using one internal and four external data sets from three South Korean hospitals and one U.S. hospital. For internal and external validation, radiograph classification and nodule detection performances of DLAD were evaluated by using the area under the receiver operating characteristic curve (AUROC) and jackknife alternative free-response receiver-operating characteristic (JAFROC) figure of merit (FOM), respectively. An observer performance test involving 18 physicians, including nine board-certified radiologists, was conducted by using one of the four external validation data sets. Performances of DLAD, physicians, and physicians assisted with DLAD were evaluated and compared. Results According to one internal and four external validation data sets, radiograph classification and nodule detection performances of DLAD were a range of 0.92-0.99 (AUROC) and 0.831-0.924 (JAFROC FOM), respectively. DLAD showed a higher AUROC and JAFROC FOM at the observer performance test than 17 of 18 and 15 of 18 physicians, respectively (P < .05), and all physicians showed improved nodule detection performances with DLAD (mean JAFROC FOM improvement, 0.043; range, 0.006-0.190; P < .05). 
Conclusion This deep learning-based automatic detection algorithm outperformed physicians in radiograph classification and nodule detection performance for malignant pulmonary nodules on chest radiographs, and it enhanced physicians' performances when used as a second reader. © RSNA, 2018 Online supplemental material is available for this article.
Affiliation(s)
- Ju Gang Nam
- From the Department of Radiology and Institute of Radiation Medicine, Seoul National University Hospital and College of Medicine, 101 Daehak-ro, Jongno-gu, Seoul 03080, Republic of Korea (J.G.N., E.J.H., J.M.G., C.M.P.); Lunit Incorporated, Seoul, Republic of Korea (S.P.); Department of Radiology, Armed Forces Seoul Hospital, Seoul, Republic of Korea (J.H.L.); Department of Radiology, Seoul National University Boramae Medical Center, Seoul, Republic of Korea (K.N.J.); Department of Radiology, National Cancer Center, Goyang, Republic of Korea (K.Y.L.); Department of Radiology and Biomedical Imaging, University of California, San Francisco, San Francisco, Calif (T.H.V., J.H.S.); and Department of Industrial & Information Systems Engineering, Seoul National University of Science and Technology, Seoul, Republic of Korea (S.H.)
- Sunggyun Park
- From the Department of Radiology and Institute of Radiation Medicine, Seoul National University Hospital and College of Medicine, 101 Daehak-ro, Jongno-gu, Seoul 03080, Republic of Korea (J.G.N., E.J.H., J.M.G., C.M.P.); Lunit Incorporated, Seoul, Republic of Korea (S.P.); Department of Radiology, Armed Forces Seoul Hospital, Seoul, Republic of Korea (J.H.L.); Department of Radiology, Seoul National University Boramae Medical Center, Seoul, Republic of Korea (K.N.J.); Department of Radiology, National Cancer Center, Goyang, Republic of Korea (K.Y.L.); Department of Radiology and Biomedical Imaging, University of California, San Francisco, San Francisco, Calif (T.H.V., J.H.S.); and Department of Industrial & Information Systems Engineering, Seoul National University of Science and Technology, Seoul, Republic of Korea (S.H.)
- Eui Jin Hwang
- From the Department of Radiology and Institute of Radiation Medicine, Seoul National University Hospital and College of Medicine, 101 Daehak-ro, Jongno-gu, Seoul 03080, Republic of Korea (J.G.N., E.J.H., J.M.G., C.M.P.); Lunit Incorporated, Seoul, Republic of Korea (S.P.); Department of Radiology, Armed Forces Seoul Hospital, Seoul, Republic of Korea (J.H.L.); Department of Radiology, Seoul National University Boramae Medical Center, Seoul, Republic of Korea (K.N.J.); Department of Radiology, National Cancer Center, Goyang, Republic of Korea (K.Y.L.); Department of Radiology and Biomedical Imaging, University of California, San Francisco, San Francisco, Calif (T.H.V., J.H.S.); and Department of Industrial & Information Systems Engineering, Seoul National University of Science and Technology, Seoul, Republic of Korea (S.H.)
- Jong Hyuk Lee
- From the Department of Radiology and Institute of Radiation Medicine, Seoul National University Hospital and College of Medicine, 101 Daehak-ro, Jongno-gu, Seoul 03080, Republic of Korea (J.G.N., E.J.H., J.M.G., C.M.P.); Lunit Incorporated, Seoul, Republic of Korea (S.P.); Department of Radiology, Armed Forces Seoul Hospital, Seoul, Republic of Korea (J.H.L.); Department of Radiology, Seoul National University Boramae Medical Center, Seoul, Republic of Korea (K.N.J.); Department of Radiology, National Cancer Center, Goyang, Republic of Korea (K.Y.L.); Department of Radiology and Biomedical Imaging, University of California, San Francisco, San Francisco, Calif (T.H.V., J.H.S.); and Department of Industrial & Information Systems Engineering, Seoul National University of Science and Technology, Seoul, Republic of Korea (S.H.)
- Kwang-Nam Jin
- From the Department of Radiology and Institute of Radiation Medicine, Seoul National University Hospital and College of Medicine, 101 Daehak-ro, Jongno-gu, Seoul 03080, Republic of Korea (J.G.N., E.J.H., J.M.G., C.M.P.); Lunit Incorporated, Seoul, Republic of Korea (S.P.); Department of Radiology, Armed Forces Seoul Hospital, Seoul, Republic of Korea (J.H.L.); Department of Radiology, Seoul National University Boramae Medical Center, Seoul, Republic of Korea (K.N.J.); Department of Radiology, National Cancer Center, Goyang, Republic of Korea (K.Y.L.); Department of Radiology and Biomedical Imaging, University of California, San Francisco, San Francisco, Calif (T.H.V., J.H.S.); and Department of Industrial & Information Systems Engineering, Seoul National University of Science and Technology, Seoul, Republic of Korea (S.H.)
| | - Kun Young Lim
- From the Department of Radiology and Institute of Radiation Medicine, Seoul National University Hospital and College of Medicine, 101 Daehak-ro, Jongno-gu, Seoul 03080, Republic of Korea (J.G.N., E.J.H., J.M.G., C.M.P.); Lunit Incorporated, Seoul, Republic of Korea (S.P.); Department of Radiology, Armed Forces Seoul Hospital, Seoul, Republic of Korea (J.H.L.); Department of Radiology, Seoul National University Boramae Medical Center, Seoul, Republic of Korea (K.N.J.); Department of Radiology, National Cancer Center, Goyang, Republic of Korea (K.Y.L.); Department of Radiology and Biomedical Imaging, University of California, San Francisco, San Francisco, Calif (T.H.V., J.H.S.); and Department of Industrial & Information Systems Engineering, Seoul National University of Science and Technology, Seoul, Republic of Korea (S.H.)
| | - Thienkai Huy Vu
- From the Department of Radiology and Institute of Radiation Medicine, Seoul National University Hospital and College of Medicine, 101 Daehak-ro, Jongno-gu, Seoul 03080, Republic of Korea (J.G.N., E.J.H., J.M.G., C.M.P.); Lunit Incorporated, Seoul, Republic of Korea (S.P.); Department of Radiology, Armed Forces Seoul Hospital, Seoul, Republic of Korea (J.H.L.); Department of Radiology, Seoul National University Boramae Medical Center, Seoul, Republic of Korea (K.N.J.); Department of Radiology, National Cancer Center, Goyang, Republic of Korea (K.Y.L.); Department of Radiology and Biomedical Imaging, University of California, San Francisco, San Francisco, Calif (T.H.V., J.H.S.); and Department of Industrial & Information Systems Engineering, Seoul National University of Science and Technology, Seoul, Republic of Korea (S.H.)
| | - Jae Ho Sohn
- From the Department of Radiology and Institute of Radiation Medicine, Seoul National University Hospital and College of Medicine, 101 Daehak-ro, Jongno-gu, Seoul 03080, Republic of Korea (J.G.N., E.J.H., J.M.G., C.M.P.); Lunit Incorporated, Seoul, Republic of Korea (S.P.); Department of Radiology, Armed Forces Seoul Hospital, Seoul, Republic of Korea (J.H.L.); Department of Radiology, Seoul National University Boramae Medical Center, Seoul, Republic of Korea (K.N.J.); Department of Radiology, National Cancer Center, Goyang, Republic of Korea (K.Y.L.); Department of Radiology and Biomedical Imaging, University of California, San Francisco, San Francisco, Calif (T.H.V., J.H.S.); and Department of Industrial & Information Systems Engineering, Seoul National University of Science and Technology, Seoul, Republic of Korea (S.H.)
| | - Sangheum Hwang
- From the Department of Radiology and Institute of Radiation Medicine, Seoul National University Hospital and College of Medicine, 101 Daehak-ro, Jongno-gu, Seoul 03080, Republic of Korea (J.G.N., E.J.H., J.M.G., C.M.P.); Lunit Incorporated, Seoul, Republic of Korea (S.P.); Department of Radiology, Armed Forces Seoul Hospital, Seoul, Republic of Korea (J.H.L.); Department of Radiology, Seoul National University Boramae Medical Center, Seoul, Republic of Korea (K.N.J.); Department of Radiology, National Cancer Center, Goyang, Republic of Korea (K.Y.L.); Department of Radiology and Biomedical Imaging, University of California, San Francisco, San Francisco, Calif (T.H.V., J.H.S.); and Department of Industrial & Information Systems Engineering, Seoul National University of Science and Technology, Seoul, Republic of Korea (S.H.)
| | - Jin Mo Goo
- From the Department of Radiology and Institute of Radiation Medicine, Seoul National University Hospital and College of Medicine, 101 Daehak-ro, Jongno-gu, Seoul 03080, Republic of Korea (J.G.N., E.J.H., J.M.G., C.M.P.); Lunit Incorporated, Seoul, Republic of Korea (S.P.); Department of Radiology, Armed Forces Seoul Hospital, Seoul, Republic of Korea (J.H.L.); Department of Radiology, Seoul National University Boramae Medical Center, Seoul, Republic of Korea (K.N.J.); Department of Radiology, National Cancer Center, Goyang, Republic of Korea (K.Y.L.); Department of Radiology and Biomedical Imaging, University of California, San Francisco, San Francisco, Calif (T.H.V., J.H.S.); and Department of Industrial & Information Systems Engineering, Seoul National University of Science and Technology, Seoul, Republic of Korea (S.H.)
| | - Chang Min Park
- From the Department of Radiology and Institute of Radiation Medicine, Seoul National University Hospital and College of Medicine, 101 Daehak-ro, Jongno-gu, Seoul 03080, Republic of Korea (J.G.N., E.J.H., J.M.G., C.M.P.); Lunit Incorporated, Seoul, Republic of Korea (S.P.); Department of Radiology, Armed Forces Seoul Hospital, Seoul, Republic of Korea (J.H.L.); Department of Radiology, Seoul National University Boramae Medical Center, Seoul, Republic of Korea (K.N.J.); Department of Radiology, National Cancer Center, Goyang, Republic of Korea (K.Y.L.); Department of Radiology and Biomedical Imaging, University of California, San Francisco, San Francisco, Calif (T.H.V., J.H.S.); and Department of Industrial & Information Systems Engineering, Seoul National University of Science and Technology, Seoul, Republic of Korea (S.H.)
| |
|
11
|
Ehteshami Bejnordi B, Veta M, Johannes van Diest P, van Ginneken B, Karssemeijer N, Litjens G, van der Laak JAWM. Diagnostic Assessment of Deep Learning Algorithms for Detection of Lymph Node Metastases in Women With Breast Cancer. JAMA 2017; 318:2199-2210. [PMID: 29234806 PMCID: PMC5820737 DOI: 10.1001/jama.2017.14585] [Citation(s) in RCA: 1399] [Impact Index Per Article: 199.9] [Received: 05/22/2017] [Accepted: 10/26/2017] [Indexed: 02/06/2023]
Abstract
Importance: Application of deep learning algorithms to whole-slide pathology images can potentially improve diagnostic accuracy and efficiency.

Objective: Assess the performance of automated deep learning algorithms at detecting metastases in hematoxylin and eosin-stained tissue sections of lymph nodes of women with breast cancer and compare it with pathologists' diagnoses in a diagnostic setting.

Design, Setting, and Participants: Researcher challenge competition (CAMELYON16) to develop automated solutions for detecting lymph node metastases (November 2015-November 2016). A training data set of whole-slide images from 2 centers in the Netherlands with (n = 110) and without (n = 160) nodal metastases verified by immunohistochemical staining were provided to challenge participants to build algorithms. Algorithm performance was evaluated in an independent test set of 129 whole-slide images (49 with and 80 without metastases). The same test set of corresponding glass slides was also evaluated by a panel of 11 pathologists with time constraint (WTC) from the Netherlands to ascertain likelihood of nodal metastases for each slide in a flexible 2-hour session, simulating routine pathology workflow, and by 1 pathologist without time constraint (WOTC).

Exposures: Deep learning algorithms submitted as part of a challenge competition or pathologist interpretation.

Main Outcomes and Measures: The presence of specific metastatic foci and the absence vs presence of lymph node metastasis in a slide or image using receiver operating characteristic curve analysis. The 11 pathologists participating in the simulation exercise rated their diagnostic confidence as definitely normal, probably normal, equivocal, probably tumor, or definitely tumor.

Results: The area under the receiver operating characteristic curve (AUC) for the algorithms ranged from 0.556 to 0.994. The top-performing algorithm achieved a lesion-level, true-positive fraction comparable with that of the pathologist WOTC (72.4% [95% CI, 64.3%-80.4%]) at a mean of 0.0125 false-positives per normal whole-slide image. For the whole-slide image classification task, the best algorithm (AUC, 0.994 [95% CI, 0.983-0.999]) performed significantly better than the pathologists WTC in a diagnostic simulation (mean AUC, 0.810 [range, 0.738-0.884]; P < .001). The top 5 algorithms had a mean AUC that was comparable with the pathologist interpreting the slides in the absence of time constraints (mean AUC, 0.960 [range, 0.923-0.994] for the top 5 algorithms vs 0.966 [95% CI, 0.927-0.998] for the pathologist WOTC).

Conclusions and Relevance: In the setting of a challenge competition, some deep learning algorithms achieved better diagnostic performance than a panel of 11 pathologists participating in a simulation exercise designed to mimic routine pathology workflow; algorithm performance was comparable with an expert pathologist interpreting whole-slide images without time constraints. Whether this approach has clinical utility will require evaluation in a clinical setting.
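The slide-level comparison in this abstract rests on the area under the ROC curve computed from ordinal confidence ratings. As a minimal sketch (not the analysis code used in the study), the AUC can be estimated as the probability that a randomly chosen slide with metastasis receives a higher rating than a randomly chosen slide without, counting ties as one half. The function name `auc_from_ratings` and the example ratings are hypothetical; the 1-to-5 coding of "definitely normal" through "definitely tumor" is an assumption made for illustration.

```python
def auc_from_ratings(positive_scores, negative_scores):
    """Estimate ROC AUC as the fraction of (positive, negative) pairs in which
    the positive case is rated higher; tied pairs count as half a win."""
    if not positive_scores or not negative_scores:
        raise ValueError("both groups need at least one rating")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical ratings on an assumed scale: 1 = definitely normal ... 5 = definitely tumor.
ratings_with_metastasis = [5, 4, 3]
ratings_without_metastasis = [1, 2, 3]
auc = auc_from_ratings(ratings_with_metastasis, ratings_without_metastasis)
```

The pairwise form is quadratic in the number of slides, which is acceptable for a test set of this size; a rank-based computation would give the same value for larger sets.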
Affiliation(s)
- Babak Ehteshami Bejnordi
- Diagnostic Image Analysis Group, Department of Radiology and Nuclear Medicine, Radboud University Medical Center, Nijmegen, the Netherlands
- Mitko Veta
- Medical Image Analysis Group, Eindhoven University of Technology, Eindhoven, the Netherlands
- Bram van Ginneken
- Diagnostic Image Analysis Group, Department of Radiology and Nuclear Medicine, Radboud University Medical Center, Nijmegen, the Netherlands
- Nico Karssemeijer
- Diagnostic Image Analysis Group, Department of Radiology and Nuclear Medicine, Radboud University Medical Center, Nijmegen, the Netherlands
- Geert Litjens
- Department of Pathology, Radboud University Medical Center, Nijmegen, the Netherlands
|
12
|
A brief history of free-response receiver operating characteristic paradigm data analysis. Acad Radiol 2013; 20:915-9. [PMID: 23583665 DOI: 10.1016/j.acra.2013.03.001] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.0] [Received: 01/07/2013] [Revised: 03/01/2013] [Accepted: 03/07/2013] [Indexed: 11/23/2022]
Abstract
In the receiver operating characteristic paradigm the observer assigns a single rating to each image and the location of the perceived abnormality, if any, is ignored. In the free-response receiver operating characteristic paradigm the observer is free to mark and rate as many suspicious regions as are considered clinically reportable. Credit for a correct localization is given only if a mark is sufficiently close to an actual lesion; otherwise, the observer's mark is scored as a location-level false positive. Until fairly recently there existed no accepted method for analyzing the resulting relatively unstructured data containing random numbers of mark-rating pairs per image. This report reviews the history of work in this field, which has now spanned more than five decades. It introduces terminology used to describe the paradigm, proposed measures of performance (figures of merit), ways of visualizing the data (operating characteristics), and software for analyzing free-response receiver operating characteristic studies.
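The scoring rule described in this abstract can be sketched in a few lines. The following is a minimal illustration, not one of the analysis packages the report reviews: a mark counts as a lesion localization when it lies within an acceptance distance of a lesion, and otherwise as a location-level false positive. The names `score_marks` and `froc_point`, the Euclidean acceptance radius `max_dist`, and the rule that a lesion keeps its highest-rated mark are all assumptions made for illustration.

```python
from math import hypot


def score_marks(marks, lesions, max_dist):
    """Split (x, y, rating) marks into lesion localizations and false positives.

    Returns a dict mapping lesion index to the highest rating of any mark
    within max_dist of that lesion, and a list of ratings for marks that are
    not close enough to any lesion.
    """
    lesion_ratings = {}
    fp_ratings = []
    for x, y, rating in marks:
        nearest, nearest_dist = None, None
        for i, (lx, ly) in enumerate(lesions):
            d = hypot(x - lx, y - ly)
            if d <= max_dist and (nearest_dist is None or d < nearest_dist):
                nearest, nearest_dist = i, d
        if nearest is None:
            fp_ratings.append(rating)  # location-level false positive
        else:
            previous = lesion_ratings.get(nearest, rating)
            lesion_ratings[nearest] = max(previous, rating)
    return lesion_ratings, fp_ratings


def froc_point(cases, threshold, max_dist):
    """One FROC operating point: (mean false positives per image,
    fraction of lesions localized), counting marks rated >= threshold."""
    n_lesions = n_hits = n_fp = 0
    for marks, lesions in cases:
        hits, fps = score_marks(marks, lesions, max_dist)
        n_lesions += len(lesions)
        n_hits += sum(1 for r in hits.values() if r >= threshold)
        n_fp += sum(1 for r in fps if r >= threshold)
    fraction = n_hits / n_lesions if n_lesions else 0.0
    return n_fp / len(cases), fraction


# Hypothetical data: one image with a lesion and two marks, one lesion-free image.
cases = [
    ([(10, 10, 4), (50, 50, 2)], [(11, 11)]),
    ([(5, 5, 3)], []),
]
point = froc_point(cases, threshold=3, max_dist=5)
```

Sweeping `threshold` over the rating scale traces out the operating characteristic; each image contributes a variable number of mark-rating pairs, which is the unstructured data the abstract refers to.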
|
13
|
Abstract
A common task in medical imaging is assessing whether a new imaging system, or a variant of an existing one, is an improvement over an existing imaging technology. Imaging systems are generally quite complex, consisting of several components (for example, image acquisition hardware, image processing and display hardware and software, and image interpretation by radiologists), each of which can affect performance. Although it may appear odd to include the radiologist as a "component" of the imaging chain, the radiologist's decision determines subsequent patient care, so the effect of the human interpretation has to be included. Physical measurements such as modulation transfer function and signal-to-noise ratio are useful for characterizing the nonhuman parts of the imaging chain under idealized and often unrealistic conditions, such as uniform background phantoms and target objects with sharp edges. Measuring the performance of the entire imaging chain, including the radiologist, on real clinical images requires different methods that fall under the rubric of observer performance methods or "ROC" analysis, which involve collecting rating data on images. The purpose of this work is to review recent developments in this field, particularly with respect to the free-response method, where location information is also collected.
Affiliation(s)
- Dev P Chakraborty
- Department of Radiology, University of Pittsburgh, Pittsburgh, PA 15213, USA.
|