1
Jaspers TJM, Boers TGW, Kusters CHJ, Jong MR, Jukema JB, de Groof AJ, Bergman JJ, de With PHN, van der Sommen F. Robustness evaluation of deep neural networks for endoscopic image analysis: Insights and strategies. Med Image Anal 2024;94:103157. [PMID: 38574544] [DOI: 10.1016/j.media.2024.103157]
Abstract
Computer-aided detection and diagnosis (CADe/CADx) systems in endoscopy are commonly trained on high-quality imagery, which is not representative of the heterogeneous input typically encountered in clinical practice. In endoscopy, image quality depends heavily on both the skills and experience of the endoscopist and the specifications of the system used for screening. Factors such as poor illumination, motion blur, and specific post-processing settings can significantly alter the quality and general appearance of these images. This so-called domain gap between the data used to develop a system and the data it encounters after deployment, and its impact on the performance of the deep neural networks (DNNs) underlying endoscopic CAD systems, remains largely unexplored. As many such systems, e.g., for polyp detection, are already being rolled out in clinical practice, this poses severe patient risks, particularly in community hospitals, where both the imaging equipment and operator experience are subject to considerable variation. Therefore, this study evaluates the impact of this domain gap on the clinical performance of CADe/CADx systems for various endoscopic applications. To this end, we leverage two publicly available datasets (KVASIR-SEG and GIANA) and two in-house datasets. We investigate the performance of commonly used DNN architectures under synthetic, clinically calibrated image degradations and on a prospectively collected dataset of 342 endoscopic images of lower subjective quality. Additionally, we assess the influence of DNN architecture and complexity, data augmentation, and pretraining techniques on robustness. The results reveal a considerable performance decline of 11.6% (±1.5) relative to the reference within the clinically calibrated boundaries of image degradation. Nevertheless, employing more advanced DNN architectures and self-supervised in-domain pretraining mitigates this drop to 7.7% (±2.03). These enhancements also yield the highest performance on the manually collected test set containing images of lower subjective quality. By comprehensively assessing the robustness of popular DNN architectures and training strategies across multiple datasets, this study provides valuable insights into their performance and limitations for endoscopic applications. The findings highlight the importance of including robustness evaluation when developing DNNs for endoscopy applications and propose strategies to mitigate performance loss.
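The "synthetic, clinically calibrated image degradations" described above can be illustrated with a minimal sketch. The function below is a hypothetical stand-in, not the authors' pipeline: it applies a horizontal motion blur and a global brightness reduction to a float image in [0, 1], two of the degradation factors the abstract names. The parameter values are illustrative assumptions, not the paper's calibrated boundaries.

```python
import numpy as np

def degrade(image, blur_len=7, brightness=0.6):
    """Apply two simple synthetic degradations to a float image in [0, 1]:
    a horizontal motion blur (moving-average kernel of length `blur_len`)
    followed by a global brightness reduction (scale by `brightness`)."""
    kernel = np.ones(blur_len) / blur_len
    # Convolve every horizontal 1-D slice with the averaging kernel.
    blurred = np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="same"), 1, image)
    # Darken and clip back to the valid intensity range.
    return np.clip(blurred * brightness, 0.0, 1.0)
```

In a robustness evaluation of this kind, a model's score on such degraded copies of the test set is compared against its score on the originals to quantify the performance drop.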
Affiliation(s)
- Tim J M Jaspers
- Department of Electrical Engineering, Video Coding & Architectures, Eindhoven University of Technology, Eindhoven, The Netherlands.
- Tim G W Boers
- Department of Electrical Engineering, Video Coding & Architectures, Eindhoven University of Technology, Eindhoven, The Netherlands.
- Carolus H J Kusters
- Department of Electrical Engineering, Video Coding & Architectures, Eindhoven University of Technology, Eindhoven, The Netherlands.
- Martijn R Jong
- Department of Gastroenterology and Hepatology, Amsterdam University Medical Centers, University of Amsterdam, Amsterdam, The Netherlands.
- Jelmer B Jukema
- Department of Gastroenterology and Hepatology, Amsterdam University Medical Centers, University of Amsterdam, Amsterdam, The Netherlands.
- Albert J de Groof
- Department of Gastroenterology and Hepatology, Amsterdam University Medical Centers, University of Amsterdam, Amsterdam, The Netherlands.
- Jacques J Bergman
- Department of Gastroenterology and Hepatology, Amsterdam University Medical Centers, University of Amsterdam, Amsterdam, The Netherlands.
- Peter H N de With
- Department of Electrical Engineering, Video Coding & Architectures, Eindhoven University of Technology, Eindhoven, The Netherlands.
- Fons van der Sommen
- Department of Electrical Engineering, Video Coding & Architectures, Eindhoven University of Technology, Eindhoven, The Netherlands.
2
Bakker FHA, de Nijs JV, Jaspers TJM, de With PHN, Beulens AJW, van der Poel H, van der Sommen F, Brinkman WM. Estimating Surgical Urethral Length on Intraoperative Robot-Assisted Prostatectomy Images using Artificial Intelligence Anatomy Recognition. J Endourol 2024. [PMID: 38613819] [DOI: 10.1089/end.2023.0697]
Abstract
Objective: To construct a convolutional neural network (CNN) model that can recognize and delineate anatomical structures on intraoperative video frames of robot-assisted radical prostatectomy (RARP), and to use these annotations to predict the surgical urethral length (SUL). Background: Urethral dissection during RARP impacts patients' urinary incontinence (UI) outcomes and requires extensive training. Large differences exist between the incontinence outcomes of different urologists and hospitals, and surgeon experience and education are critical to optimal outcomes; therefore, new approaches are warranted. SUL is associated with UI. Artificial intelligence (AI) surgical image segmentation using a CNN could automate SUL estimation and contribute to future AI-assisted RARP and surgeon guidance. Methods: Eighty-eight intraoperative RARP videos recorded between June 2009 and September 2014 were collected from a single center. A total of 264 frames were annotated for four classes: prostate, urethra, ligated plexus, and catheter. Thirty annotated images from different RARP videos were used as a test dataset. The Dice similarity coefficient (DSC) and 95th-percentile Hausdorff distance (Hd95) were used to determine model performance. SUL was calculated using the catheter as a reference. Results: The DSCs of the best-performing model were 0.735 and 0.755 for the catheter and urethra classes, respectively, with Hd95 values of 29.27 and 72.62. The model performed moderately on the ligated plexus and prostate classes. The predicted SUL showed a mean difference of 0.64-1.86 mm versus human annotators, but with considerable deviation (SD 3.28-3.56). Conclusion: This study shows that an AI image segmentation model can predict vital structures during RARP urethral dissection with moderate to fair accuracy. The derived SUL estimates showed large deviations and outliers compared with human annotators, but a very small mean difference (<2 mm). This is a promising development for further research on AI-assisted RARP.
Keywords: Prostate cancer, anatomy recognition, artificial intelligence, continence, urethral length.
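The two quantities at the core of this evaluation, the Dice similarity coefficient and a pixel-to-millimetre SUL conversion using the catheter as the scale reference, can be sketched as follows. This is a minimal illustration, not the paper's implementation; in particular, the 6 mm catheter diameter is an assumed example value, not a figure taken from the study.

```python
import numpy as np

def dice(pred, target, eps=1e-7):
    """Dice similarity coefficient between two binary segmentation masks:
    2 * |pred & target| / (|pred| + |target|), smoothed by `eps`."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def estimate_sul(urethra_px, catheter_px, catheter_mm=6.0):
    """Convert a urethral length measured in pixels to millimetres,
    using the visible catheter of known physical diameter (here an
    assumed 6 mm) as the scale reference."""
    mm_per_px = catheter_mm / catheter_px
    return urethra_px * mm_per_px
```

For example, a urethra measured as 100 px next to a catheter appearing 50 px wide would map to 100 * (6.0 / 50) = 12 mm under these assumptions.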
Affiliation(s)
- Joris V de Nijs
- Eindhoven University of Technology, Electrical Engineering, Eindhoven, Noord-Brabant, The Netherlands.
- Tim J M Jaspers
- Eindhoven University of Technology, Electrical Engineering, Eindhoven, Noord-Brabant, The Netherlands.
- Peter H N de With
- Eindhoven University of Technology, Electrical Engineering, Eindhoven, Noord-Brabant, The Netherlands.
- Henk van der Poel
- Antoni van Leeuwenhoek, Urology, Amsterdam, Noord-Holland, The Netherlands.
- Fons van der Sommen
- Eindhoven University of Technology, Electrical Engineering, Eindhoven, Noord-Brabant, The Netherlands.
- Willem M Brinkman
- Universitair Medisch Centrum Utrecht, Urology, Heidelberglaan 100, 3584 CG, Utrecht, The Netherlands.
3
den Boer RB, Jaspers TJM, de Jongh C, Pluim JPW, van der Sommen F, Boers T, van Hillegersberg R, Van Eijnatten MAJM, Ruurda JP. Deep learning-based recognition of key anatomical structures during robot-assisted minimally invasive esophagectomy. Surg Endosc 2023. [PMID: 36947221] [DOI: 10.1007/s00464-023-09990-z]
Abstract
OBJECTIVE To develop a deep learning algorithm for anatomy recognition in thoracoscopic video frames from robot-assisted minimally invasive esophagectomy (RAMIE) procedures. BACKGROUND RAMIE is a complex operation with substantial perioperative morbidity and a considerable learning curve. Automatic anatomy recognition may improve surgical orientation and recognition of anatomical structures, and might contribute to reducing morbidity or learning curves. Studies regarding anatomy recognition in complex surgical procedures are currently lacking. METHODS Eighty-three videos of consecutive RAMIE procedures between 2018 and 2022 were retrospectively collected at University Medical Center Utrecht. A surgical PhD candidate and an expert surgeon annotated the azygos vein and vena cava, aorta, and right lung on 1050 thoracoscopic frames. Of these, 850 frames were used to train a convolutional neural network (CNN) to segment the anatomical structures; the remaining 200 frames were used for testing. The Dice coefficient and 95th-percentile Hausdorff distance (95HD) were calculated to assess algorithm accuracy. RESULTS The median Dice coefficient of the algorithm was 0.79 (IQR = 0.20) for segmentation of the azygos vein and/or vena cava. Median Dice coefficients of 0.74 (IQR = 0.86) and 0.89 (IQR = 0.30) were obtained for segmentation of the aorta and lung, respectively. Inference time was 0.026 s (39 Hz). Compared against the expert surgeon's annotations, the algorithm's predictions achieved median Dice coefficients of 0.70 (IQR = 0.19), 0.88 (IQR = 0.07), and 0.90 (IQR = 0.10) for the vena cava and/or azygos vein, aorta, and lung, respectively. CONCLUSION This study shows that deep learning-based semantic segmentation has potential for anatomy recognition in RAMIE video frames. The inference time of the algorithm enables real-time anatomy recognition. Clinical applicability should be assessed in prospective clinical studies.
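The 95th-percentile Hausdorff distance (95HD) reported above can be sketched in a few lines. This is a generic illustration, not the study's evaluation code: it computes the symmetric 95HD between two point sets, such as the boundary pixel coordinates of a predicted and a reference segmentation mask. Using the 95th percentile instead of the maximum makes the metric robust to a few outlier boundary points.

```python
import numpy as np

def hd95(points_a, points_b):
    """95th-percentile symmetric Hausdorff distance between two
    (N, 2) / (M, 2) arrays of point coordinates."""
    # Pairwise Euclidean distances between every point in A and B.
    d = np.linalg.norm(points_a[:, None, :] - points_b[None, :, :], axis=-1)
    a_to_b = d.min(axis=1)  # distance from each A point to its nearest B point
    b_to_a = d.min(axis=0)  # distance from each B point to its nearest A point
    return np.percentile(np.concatenate([a_to_b, b_to_a]), 95)
```

Two identical contours yield 0; translating one contour by 3 px yields a 95HD of 3. Note that distances computed this way are in pixels unless converted with a known physical spacing.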
Affiliation(s)
- R B den Boer
- Department of Surgery, University Medical Center Utrecht, Heidelberglaan 100, 3584 CX, Utrecht, The Netherlands
- T J M Jaspers
- Department of Biomedical Engineering, Eindhoven University of Technology, Groene Loper 3, 5612 AE, Eindhoven, The Netherlands.
- C de Jongh
- Department of Surgery, University Medical Center Utrecht, Heidelberglaan 100, 3584 CX, Utrecht, The Netherlands.
- J P W Pluim
- Department of Biomedical Engineering, Eindhoven University of Technology, Groene Loper 3, 5612 AE, Eindhoven, The Netherlands.
- F van der Sommen
- Department of Electrical Engineering, Eindhoven University of Technology, Groene Loper 19, 5612 AP, Eindhoven, The Netherlands.
- T Boers
- Department of Electrical Engineering, Eindhoven University of Technology, Groene Loper 19, 5612 AP, Eindhoven, The Netherlands.
- R van Hillegersberg
- Department of Surgery, University Medical Center Utrecht, Heidelberglaan 100, 3584 CX, Utrecht, The Netherlands.
- M A J M Van Eijnatten
- Department of Biomedical Engineering, Eindhoven University of Technology, Groene Loper 3, 5612 AE, Eindhoven, The Netherlands.
- J P Ruurda
- Department of Surgery, University Medical Center Utrecht, Heidelberglaan 100, 3584 CX, Utrecht, The Netherlands.