1. Venkatesh DK, Rivoir D, Pfeiffer M, Kolbinger F, Distler M, Weitz J, Speidel S. Exploring semantic consistency in unpaired image translation to generate data for surgical applications. Int J Comput Assist Radiol Surg 2024;19:985-993. [PMID: 38407730] [DOI: 10.1007/s11548-024-03079-1]
Abstract
PURPOSE In surgical computer vision applications, data privacy and expert annotation challenges impede the acquisition of labeled training data. Unpaired image-to-image translation techniques have been explored to automatically generate annotated datasets by translating synthetic images into a realistic domain. The preservation of structure and semantic consistency, i.e., per-class distribution during translation, poses a significant challenge, particularly in cases of semantic distributional mismatch. METHOD This study empirically investigates various translation methods for generating data in surgical applications, explicitly focusing on semantic consistency. Through our analysis, we introduce a novel and simple combination of effective approaches, which we call ConStructS. The defined losses within this approach operate on multiple image patches and spatial resolutions during translation. RESULTS Various state-of-the-art models were extensively evaluated on two challenging surgical datasets. With two different evaluation schemes, the semantic consistency and the usefulness of the translated images on downstream semantic segmentation tasks were evaluated. The results demonstrate the effectiveness of the ConStructS method in minimizing semantic distortion, with images generated by this model showing superior utility for downstream training. CONCLUSION In this study, we tackle semantic inconsistency in unpaired image translation for surgical applications with minimal labeled data. The simple model (ConStructS) enhances consistency during translation and serves as a practical way of generating fully labeled and semantically consistent datasets at minimal cost. Our code is available at https://gitlab.com/nct_tso_public/constructs .
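The core idea, consistency losses evaluated over multiple patches and spatial resolutions of the source and translated images, can be illustrated with a short PyTorch sketch. This is a simplification under stated assumptions (the feature extractor, patch size, and scales are placeholders), not the released ConStructS implementation.

```python
import torch
import torch.nn.functional as F

def multiscale_patch_consistency(src_feats, trans_feats, scales=(1.0, 0.5, 0.25), patch=16):
    """Patch-wise L1 consistency between feature maps of a source image and its
    translation, computed at several spatial resolutions (illustrative only)."""
    loss = src_feats.new_zeros(())
    for s in scales:
        a = F.interpolate(src_feats, scale_factor=s, mode="bilinear", align_corners=False)
        b = F.interpolate(trans_feats, scale_factor=s, mode="bilinear", align_corners=False)
        k = min(patch, a.shape[-2], a.shape[-1])      # shrink patch at coarse scales
        pa = F.unfold(a, kernel_size=k, stride=k)      # (B, C*k*k, num_patches)
        pb = F.unfold(b, kernel_size=k, stride=k)
        loss = loss + F.l1_loss(pa, pb)
    return loss / len(scales)
```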
Affiliation(s)
- Danush Kumar Venkatesh
- Department of Translational Surgical Oncology, National Centre for Tumor Diseases (NCT/UCC), 01307 Dresden, Germany
- SECAI, TU Dresden, Dresden, Germany
- Department of Visceral, Thoracic and Vascular Surgery, University Hospital and Faculty of Medicine, TU Dresden, 01307 Dresden, Germany
- Dominik Rivoir
- Department of Translational Surgical Oncology, National Centre for Tumor Diseases (NCT/UCC), 01307 Dresden, Germany
- The Centre for Tactile Internet (CeTI), TU Dresden, Dresden, Germany
- Micha Pfeiffer
- Department of Translational Surgical Oncology, National Centre for Tumor Diseases (NCT/UCC), 01307 Dresden, Germany
- Fiona Kolbinger
- Department of Translational Surgical Oncology, National Centre for Tumor Diseases (NCT/UCC), 01307 Dresden, Germany
- Department of Visceral, Thoracic and Vascular Surgery, University Hospital and Faculty of Medicine, TU Dresden, 01307 Dresden, Germany
- Marius Distler
- Department of Visceral, Thoracic and Vascular Surgery, University Hospital and Faculty of Medicine, TU Dresden, 01307 Dresden, Germany
- Jürgen Weitz
- Department of Visceral, Thoracic and Vascular Surgery, University Hospital and Faculty of Medicine, TU Dresden, 01307 Dresden, Germany
- The Centre for Tactile Internet (CeTI), TU Dresden, Dresden, Germany
- Stefanie Speidel
- Department of Translational Surgical Oncology, National Centre for Tumor Diseases (NCT/UCC), 01307 Dresden, Germany
- SECAI, TU Dresden, Dresden, Germany
- Department of Visceral, Thoracic and Vascular Surgery, University Hospital and Faculty of Medicine, TU Dresden, 01307 Dresden, Germany
- The Centre for Tactile Internet (CeTI), TU Dresden, Dresden, Germany
2. Kaleta J, Dall'Alba D, Płotka S, Korzeniowski P. Minimal data requirement for realistic endoscopic image generation with Stable Diffusion. Int J Comput Assist Radiol Surg 2024;19:531-539. [PMID: 37934401] [PMCID: PMC10881618] [DOI: 10.1007/s11548-023-03030-w]
Abstract
PURPOSE Computer-assisted surgical systems provide support information to the surgeon, which can improve the execution and overall outcome of the procedure. These systems are based on deep learning models that are trained on complex and challenging-to-annotate data. Generating synthetic data can overcome these limitations, but it is necessary to reduce the domain gap between real and synthetic data. METHODS We propose a method for image-to-image translation based on a Stable Diffusion model, which generates realistic images starting from synthetic data. Compared to previous works, the proposed method is better suited for clinical application as it requires a much smaller amount of input data and allows finer control over the generation of details by introducing different variants of supporting control networks. RESULTS The proposed method is applied in the context of laparoscopic cholecystectomy, using synthetic and real data from public datasets. It achieves a mean Intersection over Union of 69.76%, significantly improving the baseline results (69.76 vs. 42.21%). CONCLUSIONS The proposed method for translating synthetic images into images with realistic characteristics will enable the training of deep learning methods that can generalize optimally to real-world contexts, thereby improving computer-assisted intervention guidance systems.
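For orientation, generating a realistic frame from a synthetic conditioning image with a supporting control network typically looks like the following diffusers sketch. The checkpoints, prompt, and conditioning signal are placeholders; the paper additionally fine-tunes on a small real endoscopic dataset and compares several ControlNet variants.

```python
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Conditioning image exported from the surgical simulator (placeholder path).
cond = Image.open("synthetic_render.png")

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-seg", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# Translate the synthetic layout into a realistic-looking endoscopic frame.
image = pipe(
    prompt="laparoscopic cholecystectomy, photorealistic endoscopic view",
    image=cond,
    num_inference_steps=30,
).images[0]
image.save("translated_frame.png")
```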
Affiliation(s)
- Joanna Kaleta
- Sano Centre for Computational Medicine, Krakow, Poland
- Diego Dall'Alba
- Sano Centre for Computational Medicine, Krakow, Poland
- Department of Computer Science, University of Verona, Verona, Italy
- Szymon Płotka
- Sano Centre for Computational Medicine, Krakow, Poland
- Informatics Institute, University of Amsterdam, Amsterdam, The Netherlands
- Department of Biomedical Engineering and Physics, Amsterdam University Medical Center, Amsterdam, The Netherlands
3. Demir KC, Schieber H, Weise T, Roth D, May M, Maier A, Yang SH. Deep Learning in Surgical Workflow Analysis: A Review of Phase and Step Recognition. IEEE J Biomed Health Inform 2023;27:5405-5417. [PMID: 37665700] [DOI: 10.1109/jbhi.2023.3311628]
Abstract
OBJECTIVE In the last two decades, there has been growing interest in exploring surgical procedures with statistical models to analyze operations at different semantic levels. This information is necessary for developing context-aware intelligent systems that can assist physicians during operations, evaluate procedures afterward, or help the management team utilize the operating room effectively. The objective is to extract reliable patterns from surgical data for the robust estimation of surgical activities performed during operations. The purpose of this article is to review state-of-the-art deep learning methods published after 2018 for analyzing surgical workflows, with a focus on phase and step recognition. METHODS Three databases, IEEE Xplore, Scopus, and PubMed, were searched, and additional studies were added through a manual search. After the database search, 343 studies were screened and a total of 44 studies were selected for this review. CONCLUSION The use of temporal information is essential for identifying the next surgical action. Contemporary methods mainly use RNNs, hierarchical CNNs, and Transformers to preserve long-distance temporal relations. The lack of large publicly available datasets for various procedures remains a major challenge for the development of new and robust models. While supervised learning strategies are used to show proof of concept, self-supervised, semi-supervised, and active learning methods are used to mitigate the dependency on annotated data. SIGNIFICANCE The present study provides a comprehensive review of recent methods in surgical workflow analysis, summarizes commonly used architectures and datasets, and discusses challenges.
4. Mao F, Huang T, Ma L, Zhang X, Liao H. A Monocular Variable Magnifications 3D Laparoscope System Using Double Liquid Lenses. IEEE J Transl Eng Health Med 2023;12:32-42. [PMID: 38059130] [PMCID: PMC10697296] [DOI: 10.1109/jtehm.2023.3311022]
Abstract
During minimally invasive surgery (MIS), the laparoscope provides only a single viewpoint to the surgeon, leaving a lack of 3D perception. Many works have been proposed to obtain depth and 3D reconstruction by designing a new optical structure or by relying on the camera pose and image sequences. Most of these works modify the structure of conventional laparoscopes and cannot provide 3D reconstruction at different magnification levels. In this study, we propose a laparoscopic system based on double liquid lenses, which provides doctors with variable magnification rates, near observation, and real-time monocular 3D reconstruction. Our system consists of an optical structure that achieves automatic magnification change and autofocus without any physically moving element, and a deep learning network based on the Depth from Defocus (DFD) method, trained to handle varying camera intrinsics and estimate depth from images of different focal lengths. The optical structure is portable and can be mounted on conventional laparoscopes. The depth estimation network estimates depth in real time from monocular images of different focal lengths and magnification rates. Experiments show that our system provides a 0.68-1.44x zoom rate and can estimate depth at different magnification rates at 6 fps. Monocular 3D reconstruction reaches an accuracy of at least 6 mm. The system also provides a clear view even at a close working distance of 1 mm. Ex vivo experiments and application to clinical images show that our system provides doctors with a magnified, clear view of the lesion as well as rapid monocular depth perception during laparoscopy, helping surgeons with detection and size diagnosis in the abdomen during laparoscopic surgery.
Affiliation(s)
- Fan Mao
- Department of Biomedical Engineering, School of Medicine, Tsinghua University, Beijing 100084, China
- Tianqi Huang
- Department of Biomedical Engineering, School of Medicine, Tsinghua University, Beijing 100084, China
- Longfei Ma
- Department of Biomedical Engineering, School of Medicine, Tsinghua University, Beijing 100084, China
- Xinran Zhang
- Department of Biomedical Engineering, School of Medicine, Tsinghua University, Beijing 100084, China
- Hongen Liao
- Department of Biomedical Engineering, School of Medicine, Tsinghua University, Beijing 100084, China
5. Jalal NA, Alshirbaji TA, Docherty PD, Arabian H, Laufer B, Krueger-Ziolek S, Neumuth T, Moeller K. Laparoscopic Video Analysis Using Temporal, Attention, and Multi-Feature Fusion Based-Approaches. Sensors (Basel) 2023;23:1958. [PMID: 36850554] [PMCID: PMC9964851] [DOI: 10.3390/s23041958]
Abstract
Adapting intelligent context-aware systems (CAS) to future operating rooms (OR) aims to improve situational awareness and provide surgical decision support systems to medical teams. CAS analyzes data streams from available devices during surgery and communicates real-time knowledge to clinicians. Indeed, recent advances in computer vision and machine learning, particularly deep learning, have paved the way for extensive research to develop CAS. In this work, a deep learning approach was proposed for surgical phase recognition, tool classification, and weakly supervised tool localization in laparoscopic videos. The ResNet-50 convolutional neural network (CNN) architecture was adapted by adding attention modules and fusing features from multiple stages to generate better-focused, generalized, and well-representative features. Then, a multi-map convolutional layer followed by tool-wise and spatial pooling operations was utilized to perform tool localization and generate tool presence confidences. Finally, a long short-term memory (LSTM) network was employed to model temporal information and perform tool classification and phase recognition. The proposed approach was evaluated on the Cholec80 dataset. The experimental results (88.5% and 89.0% mean precision and recall for phase recognition, respectively, 95.6% mean average precision for tool presence detection, and a 70.1% F1-score for tool localization) demonstrated the ability of the model to learn discriminative features for all tasks. The results also revealed the importance of integrating attention modules and multi-stage feature fusion for more robust and precise detection of surgical phases and tools.
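A stripped-down stand-in for the described pipeline (per-frame CNN features followed by an LSTM that produces phase and tool-presence outputs) is sketched below. The attention modules, multi-stage feature fusion, and multi-map localization layer are omitted, and the class counts simply follow Cholec80's 7 phases and 7 tool classes; this is not the authors' exact architecture.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class PhaseToolNet(nn.Module):
    """Simplified per-frame CNN + LSTM for phase recognition and tool presence."""
    def __init__(self, num_phases=7, num_tools=7, hidden=512):
        super().__init__()
        backbone = resnet50(weights=None)
        self.features = nn.Sequential(*list(backbone.children())[:-1])  # -> (B*T, 2048, 1, 1)
        self.lstm = nn.LSTM(2048, hidden, batch_first=True)
        self.phase_head = nn.Linear(hidden, num_phases)
        self.tool_head = nn.Linear(hidden, num_tools)

    def forward(self, clips):                      # clips: (B, T, 3, H, W)
        b, t = clips.shape[:2]
        f = self.features(clips.flatten(0, 1)).flatten(1)   # (B*T, 2048)
        seq, _ = self.lstm(f.view(b, t, -1))                 # (B, T, hidden)
        return self.phase_head(seq), torch.sigmoid(self.tool_head(seq))
```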
Affiliation(s)
- Nour Aldeen Jalal
- Institute of Technical Medicine (ITeM), Furtwangen University, 78054 Villingen-Schwenningen, Germany
- Innovation Center Computer Assisted Surgery (ICCAS), University of Leipzig, 04103 Leipzig, Germany
- Tamer Abdulbaki Alshirbaji
- Institute of Technical Medicine (ITeM), Furtwangen University, 78054 Villingen-Schwenningen, Germany
- Innovation Center Computer Assisted Surgery (ICCAS), University of Leipzig, 04103 Leipzig, Germany
- Paul David Docherty
- Institute of Technical Medicine (ITeM), Furtwangen University, 78054 Villingen-Schwenningen, Germany
- Department of Mechanical Engineering, University of Canterbury, Christchurch 8041, New Zealand
- Herag Arabian
- Institute of Technical Medicine (ITeM), Furtwangen University, 78054 Villingen-Schwenningen, Germany
- Bernhard Laufer
- Institute of Technical Medicine (ITeM), Furtwangen University, 78054 Villingen-Schwenningen, Germany
- Sabine Krueger-Ziolek
- Institute of Technical Medicine (ITeM), Furtwangen University, 78054 Villingen-Schwenningen, Germany
- Thomas Neumuth
- Innovation Center Computer Assisted Surgery (ICCAS), University of Leipzig, 04103 Leipzig, Germany
- Knut Moeller
- Institute of Technical Medicine (ITeM), Furtwangen University, 78054 Villingen-Schwenningen, Germany
- Department of Mechanical Engineering, University of Canterbury, Christchurch 8041, New Zealand
- Department of Microsystems Engineering, University of Freiburg, 79110 Freiburg, Germany
6. Minimally invasive and invasive liver surgery based on augmented reality training: a review of the literature. J Robot Surg 2022;17:753-763. [DOI: 10.1007/s11701-022-01499-2]
7. Dowrick T, Davidson B, Gurusamy K, Clarkson MJ. Large scale simulation of labeled intraoperative scenes in Unity. Int J Comput Assist Radiol Surg 2022;17:961-963. [PMID: 35355211] [PMCID: PMC9110486] [DOI: 10.1007/s11548-022-02598-z]
Abstract
PURPOSE The use of synthetic or simulated data has the potential to greatly improve the availability and volume of training data for image guided surgery and other medical applications, where access to real-life training data is limited. METHODS By using the Unity game engine, complex intraoperative scenes can be simulated. The Unity Perception package allows for randomisation of parameters within the scene and automatic labelling, making the simulation of large datasets a trivial operation. In this work, the approach has been prototyped for liver segmentation from laparoscopic video images. 50,000 simulated images were used to train a U-Net, without the need for any manual labelling. The use of simulated data was compared against a model trained with 950 manually labelled laparoscopic images. RESULTS When evaluated on data from 10 separate patients, synthetic data outperformed real data in 4 out of 10 cases. Average Dice scores across the 10 cases were 0.59 (synthetic data), 0.64 (real data) and 0.75 (both synthetic and real data). CONCLUSION Models trained on synthetic data generated with this method are able to make valid inferences on real data, with average performance slightly below that of models trained on real data. Using the simulated data for pre-training boosts model performance compared with training on real data only.
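The reported comparison rests on Dice overlap between predicted and ground-truth liver masks; a minimal Dice implementation and the implied two-stage (synthetic pre-training, then real fine-tuning) schedule are sketched below as an assumption, not the authors' code.

```python
import torch

def dice_score(pred, target, eps=1e-6):
    """Dice overlap between binary prediction and ground-truth masks, averaged over the batch."""
    pred = pred.float().flatten(1)
    target = target.float().flatten(1)
    inter = (pred * target).sum(dim=1)
    return ((2 * inter + eps) / (pred.sum(dim=1) + target.sum(dim=1) + eps)).mean()

# Hypothetical two-stage schedule mirroring the finding that synthetic pre-training
# followed by fine-tuning on real frames performs best:
#   model = UNet()                       # any liver-segmentation U-Net
#   train(model, synthetic_loader)       # 50,000 simulated, automatically labelled frames
#   train(model, real_loader)            # 950 manually labelled frames
```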
Affiliation(s)
- Thomas Dowrick
- Wellcome EPSRC Centre for Interventional and Surgical Sciences, UCL, London, UK
- Brian Davidson
- Wellcome EPSRC Centre for Interventional and Surgical Sciences, UCL, London, UK
- Division of Surgery and Interventional Science, UCL, London, UK
- Kurinchi Gurusamy
- Wellcome EPSRC Centre for Interventional and Surgical Sciences, UCL, London, UK
- Division of Surgery and Interventional Science, UCL, London, UK
- Matthew J Clarkson
- Wellcome EPSRC Centre for Interventional and Surgical Sciences, UCL, London, UK
8. Liu J, Guo X, Yuan Y. Graph-Based Surgical Instrument Adaptive Segmentation via Domain-Common Knowledge. IEEE Trans Med Imaging 2022;41:715-726. [PMID: 34673485] [DOI: 10.1109/tmi.2021.3121138]
Abstract
Unsupervised domain adaptation (UDA), aiming to adapt a model to an unseen domain without annotations, has drawn sustained attention in surgical instrument segmentation. Existing UDA methods neglect the domain-common knowledge of the two datasets, thus failing to grasp the inter-category relationship in the target domain and leading to poor performance. To address these issues, we propose a graph-based unsupervised domain adaptation framework, named Interactive Graph Network (IGNet), to effectively adapt a model to an unlabeled new domain in surgical instrument segmentation tasks. In detail, the Domain-common Prototype Constructor (DPC) is first advanced to adaptively aggregate the feature map into domain-common prototypes using a probability mixture model, and to construct a prototypical graph that exchanges information among prototypes from a global perspective. In this way, DPC can grasp the co-occurrent and long-range relationships for both domains. To further narrow the domain gap, we design a Domain-common Knowledge Incorporator (DKI) to guide the evolution of feature maps in the domain-common direction via a common-knowledge guidance graph and category-attentive graph reasoning. Finally, the Cross-category Mismatch Estimator (CME) is developed to evaluate the category-level alignment from a graph perspective and assign each pixel a different adversarial weight, so as to refine the feature distribution alignment. Extensive experiments on three types of tasks demonstrate the feasibility and superiority of IGNet compared with other state-of-the-art methods. Furthermore, ablation studies verify the effectiveness of each component of IGNet. The source code is available at https://github.com/CityU-AIM-Group/Prototypical-Graph-DA.
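As a rough illustration of the prototype idea, the sketch below softly assigns pixel features to a small set of prototypes and aggregates them by the assignment weights. It is a simplification of the paper's mixture-model-based DPC; the number of prototypes, the temperature, and the randomly initialized centers are arbitrary assumptions.

```python
import torch

def build_prototypes(feats, k=8, temperature=0.1):
    """Soft-assignment prototype aggregation over a feature map (illustrative only)."""
    b, c, h, w = feats.shape
    x = feats.flatten(2).transpose(1, 2)                    # (B, HW, C)
    centers = torch.randn(k, c, device=feats.device)        # learnable parameters in practice
    logits = x @ centers.t() / temperature                   # (B, HW, K)
    assign = logits.softmax(dim=-1)                          # soft pixel-to-prototype weights
    protos = assign.transpose(1, 2) @ x                      # (B, K, C) weighted feature sums
    norm = assign.sum(dim=1, keepdim=True).transpose(1, 2)   # (B, K, 1)
    return protos / (norm + 1e-6)                            # normalized prototypes
```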
9. Edwards PJE, Psychogyios D, Speidel S, Maier-Hein L, Stoyanov D. SERV-CT: A disparity dataset from cone-beam CT for validation of endoscopic 3D reconstruction. Med Image Anal 2022;76:102302. [PMID: 34906918] [PMCID: PMC8961000] [DOI: 10.1016/j.media.2021.102302]
Abstract
In computer vision, reference datasets from simulation and real outdoor scenes have been highly successful in promoting algorithmic development in stereo reconstruction. Endoscopic stereo reconstruction for surgical scenes gives rise to specific problems, including the lack of clear corner features, highly specular surface properties and the presence of blood and smoke. These issues present difficulties both for stereo reconstruction itself and for standardised dataset production. Previous datasets have been produced using computed tomography (CT) or structured light reconstruction on phantom or ex vivo models. We present a stereo-endoscopic reconstruction validation dataset based on cone-beam CT (SERV-CT). Two small ex vivo porcine full-torso cadavers were placed within the view of the endoscope, with both the endoscope and target anatomy visible in the CT scan. The endoscope orientation was then manually aligned to match the stereoscopic view, and benchmark disparities, depths and occlusions were calculated. The requirement for a CT scan limited the number of stereo pairs to 8 from each ex vivo sample. For the second sample, an RGB surface was acquired to aid the alignment of smooth, featureless surfaces. Repeated manual alignments showed an RMS disparity accuracy of around 2 pixels and a depth accuracy of about 2 mm. A simplified reference dataset is provided, consisting of endoscope image pairs with corresponding calibration, disparities, depths and occlusions covering the majority of the endoscopic image and a range of tissue types, including smooth specular surfaces, as well as significant variation in depth. We assessed the performance of various stereo algorithms from publicly available online repositories. There is significant variation between algorithms, highlighting some of the challenges of surgical endoscopic images. The SERV-CT dataset provides an easy-to-use stereoscopic validation resource for surgical applications, with smooth reference disparities and depths covering the majority of the endoscopic image. This complements existing resources well, and we hope it will aid the development of surgical endoscopic anatomical reconstruction algorithms.
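Disparity benchmarks of this kind are converted to depth through the standard rectified-stereo relation depth = f·B/d; a small helper is shown below, with the focal length and baseline as placeholders rather than the SERV-CT calibration values.

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_mm):
    """Convert a rectified-stereo disparity map to depth: depth = f * B / d.
    focal_px (focal length in pixels) and baseline_mm are placeholder calibration values."""
    d = np.asarray(disparity, dtype=np.float64)
    depth = np.full_like(d, np.inf)          # zero disparity -> point at infinity
    valid = d > 0
    depth[valid] = focal_px * baseline_mm / d[valid]
    return depth                              # same units as the baseline (mm here)
```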
Affiliation(s)
- P J Eddie Edwards
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS), University College London (UCL), Charles Bell House, 43-45 Foley Street, London W1W 7TS, UK
- Dimitris Psychogyios
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS), University College London (UCL), Charles Bell House, 43-45 Foley Street, London W1W 7TS, UK
- Stefanie Speidel
- Division of Translational Surgical Oncology, National Center for Tumor Diseases (NCT) Dresden, 01307 Dresden, Germany
- Lena Maier-Hein
- Division of Medical and Biological Informatics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Danail Stoyanov
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS), University College London (UCL), Charles Bell House, 43-45 Foley Street, London W1W 7TS, UK
10. Automatic, global registration in laparoscopic liver surgery. Int J Comput Assist Radiol Surg 2021;17:167-176. [PMID: 34697757] [PMCID: PMC8739294] [DOI: 10.1007/s11548-021-02518-7]
Abstract
Purpose The initial registration of a 3D pre-operative CT model to a 2D laparoscopic video image in augmented reality systems for liver surgery needs to be fast, intuitive to perform and cause minimal interruption to the surgical intervention. Several recent methods have focused on using easily recognisable landmarks across modalities. However, these methods still need manual annotation or manual alignment. We propose a novel, fully automatic pipeline for 3D-2D global registration in laparoscopic liver interventions. Methods First, we train a fully convolutional network for the semantic detection of liver contours in laparoscopic images. Second, we propose a novel contour-based global registration algorithm to estimate the camera pose without any manual input during surgery. The contours used are the anterior ridge and the silhouette of the liver. Results We show excellent generalisation of the semantic contour detection on test data from 8 clinical cases. In quantitative experiments, the proposed contour-based registration can successfully estimate a global alignment with as little as 30% of the liver surface visible, a visibility ratio characteristic of laparoscopic interventions. Moreover, the proposed pipeline showed very promising results on clinical data from 5 laparoscopic interventions. Conclusions Our proposed automatic global registration could make augmented reality systems more intuitive and usable for surgeons and easier to translate to operating rooms. However, as the liver deforms significantly during surgery, incorporating deformation into our method will be very beneficial for more accurate registration.
11. Sharan L, Romano G, Koehler S, Kelm H, Karck M, De Simone R, Engelhardt S. Mutually improved endoscopic image synthesis and landmark detection in unpaired image-to-image translation. IEEE J Biomed Health Inform 2021;26:127-138. [PMID: 34310335] [DOI: 10.1109/jbhi.2021.3099858]
Abstract
The CycleGAN framework allows for unsupervised image-to-image translation of unpaired data. In a scenario of surgical training on a physical surgical simulator, this method can be used to transform endoscopic images of phantoms into images which more closely resemble the intra-operative appearance of the same surgical target structure. This can be viewed as a novel augmented reality approach, which we coined Hyperrealism in previous work. In this use case, it is of paramount importance to display objects like needles, sutures or instruments consistently in both domains while altering the style to a more tissue-like appearance. Segmentation of these objects would allow for a direct transfer; however, contouring these partly tiny and thin foreground objects is cumbersome and perhaps inaccurate. Instead, we propose to use landmark detection at the points where sutures pass into the tissue. This objective is directly incorporated into a CycleGAN framework by treating the performance of pre-trained detector models as an additional optimization goal. We show that a task defined on these sparse landmark labels improves consistency of synthesis by the generator network in both domains. Comparing a baseline CycleGAN architecture to our proposed extension (DetCycleGAN), mean precision (PPV) improved by +61.32, mean sensitivity (TPR) by +37.91, and mean F1 score by +0.4743. Furthermore, it could be shown that, by dataset fusion, generated intra-operative images can be leveraged as additional training data for the detection network itself. The data is released within the scope of the AdaptOR MICCAI Challenge 2021 at https://adaptor2021.github.io/, and the code at https://github.com/Cardio-AI/detcyclegan_pytorch.
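The essential change is that a frozen, pre-trained landmark detector contributes an extra consistency term to the generator objective. A minimal sketch of such a combined loss is given below; the loss weights, LSGAN formulation, and detector interface are assumptions rather than the released DetCycleGAN code.

```python
import torch
import torch.nn.functional as F

def generator_objective(G, F_inv, D_B, detector, real_A, heatmaps_A,
                        lambda_cyc=10.0, lambda_det=1.0):
    """Hypothetical CycleGAN generator loss with an added landmark-consistency term:
    suture entry points detected in the translated image must match the source labels."""
    fake_B = G(real_A)
    pred = D_B(fake_B)
    adv = F.mse_loss(pred, torch.ones_like(pred))       # LSGAN adversarial term
    cyc = F.l1_loss(F_inv(fake_B), real_A)              # cycle consistency
    det = F.mse_loss(detector(fake_B), heatmaps_A)      # landmarks must survive translation
    return adv + lambda_cyc * cyc + lambda_det * det
```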
12. Xi L, Zhao Y, Chen L, Gao QH, Tang W, Wan TR, Xue T. Recovering dense 3D point clouds from single endoscopic image. Comput Methods Programs Biomed 2021;205:106077. [PMID: 33910150] [DOI: 10.1016/j.cmpb.2021.106077]
Abstract
BACKGROUND AND OBJECTIVE Recovering high-quality 3D point clouds from monocular endoscopic images is a challenging task. This paper proposes a novel deep learning-based computational framework for 3D point cloud reconstruction from single monocular endoscopic images. METHODS An unsupervised mono-depth learning network is used to generate depth information from monocular images. Given a single monocular endoscopic image, the network predicts a depth map, which is then used to recover a dense 3D point cloud. A generative Endo-AE network based on an auto-encoder is trained to repair defects in the dense point cloud by generating the best representation from the incomplete data. The performance of the proposed framework is evaluated against state-of-the-art learning-based methods. The results are also compared with non-learning-based stereo 3D reconstruction algorithms. RESULTS Our proposed methods outperform both the state-of-the-art learning-based and the non-learning-based methods for 3D point cloud reconstruction. The Endo-AE model for point cloud completion can generate high-quality, dense 3D endoscopic point clouds from incomplete point clouds with holes. Our framework is able to recover complete 3D point clouds with up to 60% of the information missing. Five large medical in-vivo databases of 3D point clouds of real endoscopic scenes have been generated, and two synthetic 3D medical datasets have been created. We have made these datasets publicly available for researchers free of charge. CONCLUSIONS The proposed computational framework can produce high-quality, dense 3D point clouds from single monocular endoscopy images for augmented reality, virtual reality and other computer-mediated medical applications.
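Once a depth map is available, the dense point cloud follows from pinhole back-projection; a short helper illustrating that step is shown below, with the camera intrinsics as placeholders for the endoscope calibration.

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth map into a 3D point cloud using the pinhole camera model.
    Intrinsics (fx, fy, cx, cy) are placeholders for real endoscope calibration values."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.astype(np.float64)
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[np.isfinite(pts).all(axis=1) & (pts[:, 2] > 0)]   # keep valid points only
```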
Affiliation(s)
- Long Xi
- Bournemouth University, Poole, Dorset BH12 5BB, UK
- Yan Zhao
- Bournemouth University, Poole, Dorset BH12 5BB, UK
- Wen Tang
- Bournemouth University, Poole, Dorset BH12 5BB, UK
- Tao Xue
- Xian Polytechnic University, Xian, Shaanxi 710048, China
13. Ozawa T, Hayashi Y, Oda H, Oda M, Kitasaka T, Takeshita N, Ito M, Mori K. Synthetic laparoscopic video generation for machine learning-based surgical instrument segmentation from real laparoscopic video and virtual surgical instruments. Comput Methods Biomech Biomed Eng Imaging Vis 2021. [DOI: 10.1080/21681163.2020.1835560]
Affiliation(s)
- Takuya Ozawa
- Graduate School of Informatics, Nagoya University, Nagoya, Japan
- Yuichiro Hayashi
- Graduate School of Informatics, Nagoya University, Nagoya, Japan
- Hirohisa Oda
- Graduate School of Medicine, Nagoya University, Nagoya, Japan
- Masahiro Oda
- Graduate School of Informatics, Nagoya University, Nagoya, Japan
- Takayuki Kitasaka
- School of Information Science, Aichi Institute of Technology, Toyota, Japan
- Nobuyoshi Takeshita
- Department of Colorectal Surgery, National Cancer Center Hospital East, Kashiwa, Japan
- Masaaki Ito
- Department of Colorectal Surgery, National Cancer Center Hospital East, Kashiwa, Japan
- Kensaku Mori
- Graduate School of Informatics, Nagoya University, Nagoya, Japan
14. Garcia-Peraza-Herrera LC, Fidon L, D'Ettorre C, Stoyanov D, Vercauteren T, Ourselin S. Image Compositing for Segmentation of Surgical Tools Without Manual Annotations. IEEE Trans Med Imaging 2021;40:1450-1460. [PMID: 33556005] [PMCID: PMC8092331] [DOI: 10.1109/tmi.2021.3057884]
Abstract
Producing manual, pixel-accurate, image segmentation labels is tedious and time-consuming. This is often a rate-limiting factor when large amounts of labeled images are required, such as for training deep convolutional networks for instrument-background segmentation in surgical scenes. No large datasets comparable to industry standards in the computer vision community are available for this task. To circumvent this problem, we propose to automate the creation of a realistic training dataset by exploiting techniques stemming from special effects and harnessing them to target training performance rather than visual appeal. Foreground data is captured by placing sample surgical instruments over a chroma key (a.k.a. green screen) in a controlled environment, thereby making extraction of the relevant image segment straightforward. Multiple lighting conditions and viewpoints can be captured and introduced in the simulation by moving the instruments and camera and modulating the light source. Background data is captured by collecting videos that do not contain instruments. In the absence of pre-existing instrument-free background videos, minimal labeling effort is required, just to select frames that do not contain surgical instruments from videos of surgical interventions freely available online. We compare different methods to blend instruments over tissue and propose a novel data augmentation approach that takes advantage of the plurality of options. We show that by training a vanilla U-Net on semi-synthetic data only and applying a simple post-processing, we are able to match the results of the same network trained on a publicly available manually labeled real dataset.
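The chroma-key step (extracting the instrument from a green-screen frame and blending it over an instrument-free background, which also yields the segmentation label for free) can be approximated with a few OpenCV operations, as in the sketch below. The HSV thresholds and morphology are illustrative assumptions, and the paper's more elaborate blending and augmentation strategies are not reproduced here.

```python
import cv2
import numpy as np

def composite_instrument(fg_bgr, bg_bgr):
    """Rough chroma-key compositing: extract instrument pixels from a green-screen
    frame and blend them over a tissue background; thresholds are illustrative."""
    hsv = cv2.cvtColor(fg_bgr, cv2.COLOR_BGR2HSV)
    green = cv2.inRange(hsv, np.array([35, 60, 60]), np.array([85, 255, 255]))
    mask = cv2.bitwise_not(green)                                   # instrument pixels
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    bg = cv2.resize(bg_bgr, (fg_bgr.shape[1], fg_bgr.shape[0]))
    alpha = (mask.astype(np.float32) / 255.0)[..., None]
    out = (alpha * fg_bgr + (1.0 - alpha) * bg).astype(np.uint8)
    return out, mask                                                # training image and its label
```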
15. Sahu M, Mukhopadhyay A, Zachow S. Simulation-to-real domain adaptation with teacher-student learning for endoscopic instrument segmentation. Int J Comput Assist Radiol Surg 2021;16:849-859. [PMID: 33982232] [PMCID: PMC8134307] [DOI: 10.1007/s11548-021-02383-4]
Abstract
PURPOSE Segmentation of surgical instruments in endoscopic video streams is essential for automated surgical scene understanding and process modeling. However, relying on fully supervised deep learning for this task is challenging because manual annotation consumes valuable time of clinical experts. METHODS We introduce a teacher-student learning approach that learns jointly from annotated simulation data and unlabeled real data to tackle the challenges of simulation-to-real unsupervised domain adaptation for endoscopic image segmentation. RESULTS Empirical results on three datasets highlight the effectiveness of the proposed framework over current approaches for the endoscopic instrument segmentation task. Additionally, we provide an analysis of the major factors affecting performance on all datasets to highlight the strengths and failure modes of our approach. CONCLUSIONS We show that our proposed approach can successfully exploit unlabeled real endoscopic video frames and improve generalization performance over pure simulation-based training and the previous state of the art. This takes us one step closer to effective segmentation of surgical instruments in the annotation-scarce setting.
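A minimal mean-teacher-style sketch of joint training on annotated simulation data and pseudo-labeled real frames is given below; the EMA momentum, equal loss weighting, and absence of augmentation-consistency terms are simplifications relative to the paper's method.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def ema_update(teacher, student, momentum=0.999):
    """Exponential-moving-average update of the teacher weights from the student."""
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.mul_(momentum).add_(s, alpha=1.0 - momentum)

def adaptation_step(student, teacher, sim_img, sim_mask, real_img, optimizer):
    """One joint step: supervised loss on simulation, consistency loss on real frames."""
    sup = F.cross_entropy(student(sim_img), sim_mask)      # labeled simulation data
    with torch.no_grad():
        pseudo = teacher(real_img).argmax(dim=1)            # teacher pseudo-labels
    cons = F.cross_entropy(student(real_img), pseudo)       # unlabeled real data
    loss = sup + cons
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    ema_update(teacher, student)
    return loss.item()
```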
Affiliation(s)
- Manish Sahu
- Zuse Institute Berlin (ZIB), Berlin, Germany
16. Colleoni E, Stoyanov D. Robotic Instrument Segmentation With Image-to-Image Translation. IEEE Robot Autom Lett 2021. [DOI: 10.1109/lra.2021.3056354]
17. Bodenstedt S, Wagner M, Müller-Stich BP, Weitz J, Speidel S. Artificial Intelligence-Assisted Surgery: Potential and Challenges. Visc Med 2020;36:450-455. [PMID: 33447600] [PMCID: PMC7768095] [DOI: 10.1159/000511351]
Abstract
BACKGROUND Artificial intelligence (AI) has recently achieved considerable success in different domains, including medical applications. Although current advances are expected to impact surgery, up until now AI has not been able to leverage its full potential due to several challenges that are specific to this field. SUMMARY This review summarizes data-driven methods and technologies needed as a prerequisite for different AI-based assistance functions in the operating room. Potential effects of AI usage in surgery are highlighted, concluding with ongoing challenges to enabling AI for surgery. KEY MESSAGES AI-assisted surgery will enable data-driven decision-making via decision support systems and cognitive robotic assistance. The use of AI for workflow analysis will help provide appropriate assistance in the right context. The requirements for such assistance must be defined by surgeons in close cooperation with computer scientists and engineers. Once the existing challenges have been solved, AI assistance has the potential to improve patient care by supporting the surgeon without replacing him or her.
Affiliation(s)
- Sebastian Bodenstedt
- Division of Translational Surgical Oncology, National Center for Tumor Diseases Dresden, Dresden, Germany
- Centre for Tactile Internet with Human-in-the-Loop (CeTI), TU Dresden, Dresden, Germany
- Martin Wagner
- Department of General, Visceral and Transplantation Surgery, Heidelberg University Hospital, Heidelberg, Germany
- Beat Peter Müller-Stich
- Department of General, Visceral and Transplantation Surgery, Heidelberg University Hospital, Heidelberg, Germany
- Jürgen Weitz
- Department for Visceral, Thoracic and Vascular Surgery, University Hospital Carl-Gustav-Carus, TU Dresden, Dresden, Germany
- Centre for Tactile Internet with Human-in-the-Loop (CeTI), TU Dresden, Dresden, Germany
- Stefanie Speidel
- Division of Translational Surgical Oncology, National Center for Tumor Diseases Dresden, Dresden, Germany
- Centre for Tactile Internet with Human-in-the-Loop (CeTI), TU Dresden, Dresden, Germany