1
Grudza M, Salinel B, Zeien S, Murphy M, Adkins J, Jensen CT, Bay C, Kodibagkar V, Koo P, Dragovich T, Choti MA, Kundranda M, Syeda-Mahmood T, Wang HZ, Chang J. Methods for improving colorectal cancer annotation efficiency for artificial intelligence-observer training. World J Radiol 2023; 15:359-369. [PMID: 38179201] [PMCID: PMC10762523] [DOI: 10.4329/wjr.v15.i12.359]
Abstract
BACKGROUND: Missed occult cancer lesions account for most diagnostic errors in retrospective radiology reviews, as early cancers can be small or subtle and therefore difficult to detect. A second observer is the most effective technique for reducing these events, and it can be implemented economically with artificial intelligence (AI).

AIM: Training an AI model requires a large annotated dataset. Our goal in this research was to compare two methods for decreasing the annotation time needed to establish ground truth: skip-slice annotation and AI-initiated annotation.

METHODS: We developed a 2D U-Net as an AI second observer for detecting colorectal cancer (CRC), plus an ensemble of 5 differently initialized 2D U-Nets for the ensemble technique. Each model was trained with 51 annotated CRC computed tomography scans of the abdomen and pelvis, tested with 7 cases, and validated with 20 cases from The Cancer Imaging Archive. Sensitivity, false positives per case, and estimated Dice coefficient were obtained for each training method. We compared the two annotation methods and the time reduction associated with each technique; time differences were tested using Friedman's two-way analysis of variance.

RESULTS: Sparse annotation significantly reduced annotation time, particularly when skipping 2 slices at a time (P < 0.001). Removing up to 2/3 of the annotation did not reduce AI model sensitivity or increase false positives per case. Initializing human annotation with AI also reduced annotation time, but the reduction was minimal, even when an ensemble AI was used to decrease false positives.

CONCLUSION: Our data support sparse annotation as an efficient technique for reducing the time needed to establish ground truth.
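The Dice coefficient used above to grade segmentation quality can be sketched as follows. This is a minimal illustration, not the authors' code; the function name `dice_coefficient` and the toy masks are our own:

```python
import numpy as np

def dice_coefficient(pred, truth):
    """Dice similarity coefficient between two binary masks:
    2 * |pred AND truth| / (|pred| + |truth|)."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    intersection = np.logical_and(pred, truth).sum()
    denom = pred.sum() + truth.sum()
    # both masks empty -> perfect agreement by convention
    return 2.0 * intersection / denom if denom else 1.0

# toy 2D masks standing in for one annotated CT slice
truth = np.array([[1, 1, 0], [0, 1, 0]])
pred = np.array([[1, 0, 0], [0, 1, 1]])
print(dice_coefficient(pred, truth))  # 2*2 / (3+3) = 0.666...
```

In practice the masks would be full 3D CT volumes rather than single slices, but the formula is unchanged.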
Affiliation(s)
- Matthew Grudza
- School of Biological and Health Systems Engineering, Arizona State University, Tempe, AZ 85287, United States
- Brandon Salinel
- Department of Radiology, Banner MD Anderson Cancer Center, Gilbert, AZ 85234, United States
- Sarah Zeien
- School of Osteopathic Medicine, A.T. Still University, Mesa, AZ 85206, United States
- Matthew Murphy
- School of Osteopathic Medicine, A.T. Still University, Mesa, AZ 85206, United States
- Jake Adkins
- Department of Abdominal Imaging, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, United States
- Corey T Jensen
- Department of Abdominal Imaging, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, United States
- Curtis Bay
- Department of Interdisciplinary Sciences, A.T. Still University, Mesa, AZ 85206, United States
- Vikram Kodibagkar
- School of Biological and Health Systems Engineering, Arizona State University, Tempe, AZ 85287, United States
- Phillip Koo
- Department of Radiology, Banner MD Anderson Cancer Center, Gilbert, AZ 85234, United States
- Tomislav Dragovich
- Division of Cancer Medicine, Banner MD Anderson Cancer Center, Gilbert, AZ 85234, United States
- Michael A Choti
- Department of Surgical Oncology, Banner MD Anderson Cancer Center, Gilbert, AZ 85234, United States
- Madappa Kundranda
- Division of Cancer Medicine, Banner MD Anderson Cancer Center, Gilbert, AZ 85234, United States
- Hong-Zhi Wang
- IBM Almaden Research Center, IBM, San Jose, CA 95120, United States
- John Chang
- Department of Radiology, Banner MD Anderson Cancer Center, Gilbert, AZ 85234, United States
2
Salinel B, Grudza M, Zeien S, Murphy M, Adkins J, Jensen C, Bay C, Kodibagkar V, Koo P, Dragovich T, Choti MA, Kundranda MN, Wang H, Syeda-Mahmood T, Chang J. Ensemble voting decreases false positives in AI second-observer reads for detecting colorectal cancer. J Clin Oncol 2022. [DOI: 10.1200/jco.2022.40.4_suppl.141]
Abstract
Abstract 141.

Background: Colorectal cancer (CRC) is the second leading cause of cancer-related deaths, and survival can be improved if early, suspect imaging features on CT of the abdomen and pelvis (CTAP) are routinely identified. At present, up to 40% of these features go undiagnosed on routine CTAP, but detection can be improved with a second observer. In this study, we developed a deep ensemble learning method for detecting CRC on CTAP to determine whether increasing agreement between ensemble models can decrease the false positives produced by an artificial intelligence (AI) second observer.

Methods: A 2D U-Net convolutional neural network (CNN) containing 31 million trainable parameters was trained with 58 CRC CT images from Banner MD Anderson Cancer Center (AZ) and MD Anderson Cancer Center (TX) (51 used for training and 7 for validation) and 59 normal CT scans from Banner MD Anderson Cancer Center. Twenty of the 25 CRC cases from public domain data (The Cancer Genome Atlas) were used to evaluate the performance of the models. The CRC was segmented using the ITK-SNAP open-source software (v. 3.8). For the deep ensemble approach, five CNN models were trained independently with random initialization using the same U-Net architecture and the same training data. Given a testing CT scan, each of the five trained CNN models produced a tumor segmentation, and the five results were fused using a simple majority voting rule to produce a consensus segmentation. Segmentation was analyzed by the percentage of correct detections, the number of false positives per case, and the Dice similarity coefficient (DSC). A detection was considered correct if any part of the CRC was flagged by the AI, and false positive if the marked lesion did not overlap with any CRC; contiguous false positives across different slices of a CT image were counted as a single false positive. DSC measures the quality of the segmentation as the overlap between the ground truth and the AI-detected lesion.

Results: Increasing the agreement required between the 5 models dramatically decreased the number of false positives per CT at the expense of a slight decrease in accuracy and DSC, as described in the table.

Conclusions: An AI-based second observer can potentially detect CRC on routine CTAP. Although the initial result yields a high number of false positives per case, ensemble voting is an effective method for decreasing false positives with only a slight decrease in accuracy. This technique can be further improved for eventual clinical application. [Table: see text]
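The simple majority voting rule described above can be sketched as follows. This is a toy illustration under our own naming (the function `majority_vote` and the 3-of-5 threshold choice are assumptions, not the authors' code); each "model output" here is a small binary mask standing in for a full segmentation volume:

```python
import numpy as np

def majority_vote(masks, min_votes=3):
    """Fuse binary segmentation masks from an ensemble: a voxel is
    labeled tumor if at least min_votes models marked it
    (3 of 5 = simple majority)."""
    votes = np.sum(np.stack(masks, axis=0), axis=0)
    return (votes >= min_votes).astype(np.uint8)

# five toy model outputs for one 2x2 slice
masks = [
    np.array([[1, 1], [0, 1]]),
    np.array([[1, 0], [0, 1]]),
    np.array([[1, 1], [0, 0]]),
    np.array([[0, 1], [0, 1]]),
    np.array([[1, 0], [1, 1]]),
]
fused = majority_vote(masks)
print(fused)  # [[1 1]
              #  [0 1]]
```

Raising `min_votes` toward 5 demands stricter agreement, which is the mechanism the abstract reports for trading a slight drop in accuracy and DSC for far fewer false positives per CT.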
Affiliation(s)
- Sarah Zeien
- A.T. Still University of Health Sciences, Kirksville, MO
- Matthew Murphy
- A.T. Still University of Health Sciences, Kirksville, MO
- Curt Bay
- A.T. Still University of Health Sciences, Kirksville, MO
- Phillip Koo
- Banner MD Anderson Cancer Center, Gilbert, AZ
- John Chang
- Banner MD Anderson Cancer Center, Gilbert, AZ
3
Salinel B, Grudza M, Zeien S, Murphy M, Adkins J, Jensen C, Bay C, Kodibagkar V, Koo P, Dragovich T, Choti MA, Kundranda MN, Wang H, Syeda-Mahmood T, Chang J. Comparison of segmentation methods to improve throughput in annotating AI-observer for detecting colorectal cancer. J Clin Oncol 2022. [DOI: 10.1200/jco.2022.40.4_suppl.142]
Abstract
Abstract 142.

Background: Colorectal cancer (CRC) is the second leading cause of cancer-related deaths, and its outcome can be improved with better detection of incidental early CRC on routine CT of the abdomen and pelvis (CTAP). An AI second observer has this potential, as shown in our companion abstract. The bottleneck in training the AI is the time required for radiologists to segment the CRC. We compared two techniques for accelerating the segmentation process: 1) sparse annotation (annotating only some of the CT slices containing CRC instead of every slice); and 2) letting the AI perform an initial segmentation followed by human adjustment.

Methods: A 2D U-Net convolutional neural network (CNN) containing 31 million trainable parameters was trained with 58 CRC CT images from Banner MD Anderson Cancer Center (AZ) and MD Anderson Cancer Center (TX) (51 used for training and 7 for validation) and 59 normal CT scans from Banner MD Anderson Cancer Center. Twenty of the 25 CRC cases from public domain data (The Cancer Genome Atlas) were used to evaluate the performance of the models. The CRC was segmented using the ITK-SNAP open-source software (v. 3.8). For the first objective, 3 separate models were trained (fully annotated CRC, every other slice, and every third slice). The AI annotation on the TCGA dataset was analyzed by the percentage of correct detections of CRC, the number of false positives, and the Dice similarity coefficient (DSC). A detection was considered correct if any part of the CRC was flagged by the AI, and false positive if the marked lesion did not overlap with the CRC; contiguous false positives across different slices of a CT image were counted as a single false positive. DSC measures the quality of the segmentation as the overlap between the ground truth and the AI-detected lesion. For the second objective, the time required to adjust the AI-produced annotation was compared with the time required to annotate the entire CRC without AI assistance. The AI models were trained using ensemble learning (see our companion abstract for details of the technique).

Results: Skipping slices of tumor in training did not alter the accuracy, false positives, or DSC classification of the model. When adjusting the AI-observer segmentation, there was a trend toward a shorter adjustment time compared with full manual segmentation, but the difference was not statistically significant (Table; P = 0.121).

Conclusions: Both skipping slices of tumor and starting from an AI-produced annotation can potentially decrease the effort required to produce high-quality ground truth without compromising AI performance. These techniques can help improve the throughput needed to obtain a large volume of cases to train an AI for detecting CRC. [Table: see text]
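The skip-slice scheme compared above (every slice, every other slice, every third slice) amounts to choosing which slice indices of the tumor span a radiologist actually annotates. A minimal sketch, with our own function name `sparse_annotation_slices` (not from the abstracts):

```python
def sparse_annotation_slices(n_slices, skip):
    """Indices of slices to annotate when skipping `skip` slices at a time:
    skip=0 -> every slice, skip=1 -> every other, skip=2 -> every third."""
    return list(range(0, n_slices, skip + 1))

# a tumor spanning 10 CT slices: annotating every third slice
# keeps only 4 of the 10 slices
print(sparse_annotation_slices(10, 2))  # [0, 3, 6, 9]
```

Skipping 2 slices at a time thus drops roughly 2/3 of the annotation work, which matches the time reduction the companion study reports without a loss in model sensitivity.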
Affiliation(s)
- Sarah Zeien
- A.T. Still University of Health Sciences, Kirksville, MO
- Matthew Murphy
- A.T. Still University of Health Sciences, Kirksville, MO
- Curt Bay
- A.T. Still University of Health Sciences, Kirksville, MO
- Phillip Koo
- Banner MD Anderson Cancer Center, Gilbert, AZ
- John Chang
- Banner MD Anderson Cancer Center, Gilbert, AZ