1
Liu Z, Yang B, Shen Y, Ni X, Tsaftaris SA, Zhou H. Long-short diffeomorphism memory network for weakly-supervised ultrasound landmark tracking. Med Image Anal 2024;94:103138. PMID: 38479152. DOI: 10.1016/j.media.2024.103138.
Abstract
Ultrasound is a promising medical imaging modality, benefiting from low-cost, real-time acquisition. Accurate tracking of anatomical landmarks is of high interest for various clinical workflows, such as minimally invasive surgery and ultrasound-guided radiation therapy. However, tracking an anatomical landmark accurately in ultrasound video is very challenging due to landmark deformation, visual ambiguity and partial observation. In this paper, we propose a long-short diffeomorphism memory network (LSDM), a multi-task framework with an auxiliary learnable deformation prior that supports accurate landmark tracking. Specifically, we design a novel diffeomorphic representation, which contains both long and short temporal information stored in separate memory banks, for delineating motion margins and reducing cumulative errors. We further propose an expectation maximization memory alignment (EMMA) algorithm to iteratively optimize both the long and short deformation memory, updating the memory queue to mitigate local anatomical ambiguity. The proposed multi-task system can be trained in a weakly-supervised manner, requiring only a few landmark annotations for tracking and no annotations for deformation learning. We conduct extensive experiments on both public and private ultrasound landmark tracking datasets. Experimental results show that LSDM achieves better or competitive landmark tracking performance, with strong generalization across different scanner types and ultrasound modalities, compared with other state-of-the-art methods.
Affiliation(s)
- Zhihua Liu
- School of Computing and Mathematical Sciences, University of Leicester, Leicester, LE1 7RH, UK
- Bin Yang
- Department of Cardiovascular Sciences, University Hospitals of Leicester NHS Trust, Leicester, LE1 9HN, UK; Nantong-Leicester Joint Institute of Kidney Science, Department of Nephrology, Affiliated Hospital of Nantong University, Nantong, 226001, China
- Yan Shen
- Department of Emergency Medicine, Affiliated Hospital of Nantong University, Nantong, 226001, China
- Xuejun Ni
- Department of Emergency Medicine, Affiliated Hospital of Nantong University, Nantong, 226001, China
- Sotirios A Tsaftaris
- School of Engineering, The University of Edinburgh, Edinburgh EH9 3FG, UK; The Alan Turing Institute, London NW1 2DB, UK
- Huiyu Zhou
- School of Computing and Mathematical Sciences, University of Leicester, Leicester, LE1 7RH, UK
2
Rivoir D, Funke I, Speidel S. On the pitfalls of Batch Normalization for end-to-end video learning: A study on surgical workflow analysis. Med Image Anal 2024;94:103126. PMID: 38452578. DOI: 10.1016/j.media.2024.103126.
Abstract
Batch Normalization's (BN) unique property of depending on other samples in a batch is known to cause problems in several tasks, including sequence modeling. Yet, BN-related issues are hardly studied for long video understanding, despite the ubiquitous use of BN in CNNs (Convolutional Neural Networks) for feature extraction. Especially in surgical workflow analysis, where the lack of pretrained feature extractors has led to complex, multi-stage training pipelines, limited awareness of BN issues may have hidden the benefits of training CNNs and temporal models end to end. In this paper, we analyze pitfalls of BN in video learning, including issues specific to online tasks such as a 'cheating' effect in anticipation. We observe that BN's properties create major obstacles for end-to-end learning. However, using BN-free backbones, even simple CNN-LSTMs beat the state of the art on three surgical workflow benchmarks by utilizing adequate end-to-end training strategies which maximize temporal context. We conclude that awareness of BN's pitfalls is crucial for effective end-to-end learning in surgical tasks. By reproducing results on natural-video datasets, we hope our insights will benefit other areas of video learning as well. Code is available at: https://gitlab.com/nct_tso_public/pitfalls_bn.
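The batch dependence at the heart of these pitfalls is easy to demonstrate. The sketch below is a minimal NumPy re-implementation of batch normalization (not the authors' code): it normalizes the same frame inside two different batches and shows that its normalized features change with its batch neighbours, the mechanism behind the "cheating" effect in online anticipation, where other frames in the batch leak into the current prediction.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """Normalize features using statistics computed across the batch axis."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
frame = rng.normal(size=4)          # features of one video frame

# The same frame, normalized inside two different batches.
batch_a = np.stack([frame, rng.normal(size=4), rng.normal(size=4)])
batch_b = np.stack([frame, rng.normal(loc=5.0, size=4), rng.normal(loc=5.0, size=4)])

out_a = batch_norm(batch_a)[0]
out_b = batch_norm(batch_b)[0]

# The frame's normalized features depend on its batch neighbours; in a
# temporal batch, this lets information from other (e.g. future) frames
# leak into an online prediction.
print(np.allclose(out_a, out_b))    # False
```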
Affiliation(s)
- Dominik Rivoir
- Department of Translational Surgical Oncology, National Center for Tumor Diseases (NCT/UCC Dresden), Fetscherstraße 74, 01307 Dresden, Germany: German Cancer Research Center (DKFZ), Heidelberg, Germany; Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany; Helmholtz-Zentrum Dresden-Rossendorf (HZDR), Dresden, Germany; Centre for Tactile Internet with Human-in-the-Loop (CeTI), TUD Dresden University of Technology, Dresden, Germany.
| | - Isabel Funke
- Department of Translational Surgical Oncology, National Center for Tumor Diseases (NCT/UCC Dresden), Fetscherstraße 74, 01307 Dresden, Germany: German Cancer Research Center (DKFZ), Heidelberg, Germany; Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany; Helmholtz-Zentrum Dresden-Rossendorf (HZDR), Dresden, Germany; Centre for Tactile Internet with Human-in-the-Loop (CeTI), TUD Dresden University of Technology, Dresden, Germany
| | - Stefanie Speidel
- Department of Translational Surgical Oncology, National Center for Tumor Diseases (NCT/UCC Dresden), Fetscherstraße 74, 01307 Dresden, Germany: German Cancer Research Center (DKFZ), Heidelberg, Germany; Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany; Helmholtz-Zentrum Dresden-Rossendorf (HZDR), Dresden, Germany; Centre for Tactile Internet with Human-in-the-Loop (CeTI), TUD Dresden University of Technology, Dresden, Germany
| |
3
Abiyev RH, Altabel MZ, Darwish M, Helwan A. A Multimodal Transformer Model for Recognition of Images from Complex Laparoscopic Surgical Videos. Diagnostics (Basel) 2024;14:681. PMID: 38611594. PMCID: PMC11011728. DOI: 10.3390/diagnostics14070681.
Abstract
The potential role and advantages of artificial intelligence-based models in surgery remain to be determined. This research marks an initial stride towards creating a multimodal model, inspired by the Video-Audio-Text Transformer, that aims to reduce adverse events and enhance patient safety. The model employs state-of-the-art text and image embedding models (BERT and ViT) to assess their efficacy in extracting hidden and distinct features from surgical video frames. These features are then used as inputs to convolution-free Transformer architectures to extract comprehensive multidimensional representations. A joint space then combines the text and image features extracted by the two Transformer encoders, ensuring that the relationships between the different modalities are preserved during the combination process. The entire model was trained and tested on laparoscopic cholecystectomy (LC) videos of varying complexity. Experimentally, the model reached a mean accuracy of 91.0%, a precision of 81%, and a recall of 83% when tested on 30 of the 80 videos in the Cholec80 dataset.
Affiliation(s)
- Rahib H. Abiyev
- Applied Artificial Intelligence Research Centre, Department of Computer Engineering, Near East University, 99132 North Cyprus, Turkey
- Mohamad Ziad Altabel
- Applied Artificial Intelligence Research Centre, Department of Computer Engineering, Near East University, 99132 North Cyprus, Turkey
- Manal Darwish
- Applied Artificial Intelligence Research Centre, Department of Computer Engineering, Near East University, 99132 North Cyprus, Turkey
- Abdulkader Helwan
- Department of Health, Medicine and Caring Sciences, Linköping University, 581 85 Linköping, Sweden
4
Kostiuchik G, Sharan L, Mayer B, Wolf I, Preim B, Engelhardt S. Surgical phase and instrument recognition: how to identify appropriate dataset splits. Int J Comput Assist Radiol Surg 2024. PMID: 38285380. DOI: 10.1007/s11548-024-03063-9.
Abstract
PURPOSE Machine learning approaches can only be reliably evaluated if the training, validation, and test data splits are representative and not affected by the absence of classes. Surgical workflow and instrument recognition are complicated in this regard by heavy data imbalances, resulting from the different lengths of phases and their potentially erratic occurrence. Furthermore, sub-properties like instrument (co-)occurrence are usually not explicitly considered when defining a split. METHODS We present a publicly available data visualization tool that enables interactive exploration of dataset partitions for surgical phase and instrument recognition. The application focuses on visualizing the occurrence of phases, phase transitions, instruments, and instrument combinations across sets, and in particular facilitates the identification of sub-optimal dataset splits. RESULTS We analyzed the datasets Cholec80, CATARACTS, CaDIS, M2CAI-workflow, and M2CAI-tool using the proposed application and were able to uncover phase transitions, individual instruments, and combinations of surgical instruments that were not represented in one of the sets. Addressing these issues, we identified possible improvements to the splits using our tool. A user study with ten participants demonstrated that participants could successfully solve a selection of data exploration tasks. CONCLUSION With highly unbalanced class distributions, special care should be taken when selecting a dataset split, because it can greatly influence the assessment of machine learning approaches. Our interactive tool helps determine better splits and improve current practices in the field. The live application is available at https://cardio-ai.github.io/endovis-ml/ .
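The core check such a tool automates can be sketched in a few lines. The example below uses hypothetical phase labels and a hypothetical split, not data from any of the datasets above: for each partition, it reports the classes that never occur in it, and a split is sub-optimal in the paper's sense whenever that set is non-empty.

```python
from itertools import chain

# Hypothetical per-video phase annotations; in practice these would be
# loaded from a dataset such as Cholec80.
splits = {
    "train": [["prep", "dissection", "clipping"], ["prep", "dissection"]],
    "val":   [["prep", "clipping"]],
    "test":  [["prep", "dissection", "clipping", "cleaning"]],
}

# All classes observed anywhere in the dataset.
all_classes = set(chain.from_iterable(chain.from_iterable(splits.values())))

# Classes absent from each partition.
missing = {
    name: all_classes - set(chain.from_iterable(videos))
    for name, videos in splits.items()
}
print(missing)  # "cleaning" never occurs in train or val
```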
Affiliation(s)
- Georgii Kostiuchik
- Department of Cardiac Surgery, Heidelberg University Hospital, Heidelberg, Germany
- DZHK (German Centre for Cardiovascular Research), Partner Site Heidelberg/Mannheim, Heidelberg, Germany
- Lalith Sharan
- Department of Cardiac Surgery, Heidelberg University Hospital, Heidelberg, Germany
- DZHK (German Centre for Cardiovascular Research), Partner Site Heidelberg/Mannheim, Heidelberg, Germany
- Benedikt Mayer
- Department of Simulation and Graphics, University of Magdeburg, Magdeburg, Germany
- Ivo Wolf
- Department of Computer Science, Mannheim University of Applied Sciences, Mannheim, Germany
- Bernhard Preim
- Department of Simulation and Graphics, University of Magdeburg, Magdeburg, Germany
- Sandy Engelhardt
- Department of Cardiac Surgery, Heidelberg University Hospital, Heidelberg, Germany
- DZHK (German Centre for Cardiovascular Research), Partner Site Heidelberg/Mannheim, Heidelberg, Germany
5
Men Y, Zhao Z, Chen W, Wu H, Zhang G, Luo F, Yu M. Research on workflow recognition for liver rupture repair surgery. Math Biosci Eng 2024;21:1844-1856. PMID: 38454663. DOI: 10.3934/mbe.2024080.
Abstract
Liver rupture repair surgery is one treatment for liver rupture, especially beneficial in cases of mild hemorrhage. Liver rupture can precipitate critical conditions such as hemorrhage and shock. Surgical workflow recognition in liver rupture repair surgery videos is a significant task aimed at reducing surgical mistakes and enhancing the quality of surgeries. A liver rupture repair simulation surgery dataset is proposed in this paper, consisting of 45 videos collaboratively completed by nine surgeons. Furthermore, an end-to-end SA-RLNet, a self-attention-based recurrent convolutional neural network, is introduced. The self-attention mechanism is used to automatically identify the importance of input features in various instances and to associate the relationships between input features. The surgical phase classification accuracy of the SA-RLNet approach is 90.6%. The present study demonstrates that SA-RLNet generalizes well on the dataset and is advantageous in capturing subtle variations between surgical phases. The application of surgical workflow recognition to liver rupture repair surgery appears promising and feasible.
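As a rough illustration of the mechanism this abstract describes, the sketch below implements bare scaled dot-product self-attention in NumPy. It is a simplification (single head, no learned query/key/value projections), not SA-RLNet itself: each frame feature is re-weighted by its similarity to every other frame.

```python
import numpy as np

def self_attention(x):
    """Single-head self-attention without learned projections: each
    position re-weights all positions by scaled dot-product similarity."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ x, weights

rng = np.random.default_rng(0)
frames = rng.normal(size=(6, 8))    # 6 frame features of dimension 8
out, attn = self_attention(frames)

# Each row of the attention matrix is a probability distribution over frames.
print(attn.sum(axis=-1))            # all ones
```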
Affiliation(s)
- Yutao Men
- Tianjin Key Laboratory for Advanced Mechatronic System Design and Intelligent Control, School of Mechanical Engineering, Tianjin University of Technology, Tianjin 300384, China
- National Demonstration Center for Experimental Mechanical and Electrical Engineering Education, Tianjin University of Technology, Tianjin 300384, China
- Zixian Zhao
- Tianjin Key Laboratory for Advanced Mechatronic System Design and Intelligent Control, School of Mechanical Engineering, Tianjin University of Technology, Tianjin 300384, China
- Medical Support Technology Research Department, Systems Engineering Institute, Academy of Military Sciences, People's Liberation Army, Tianjin 300161, China
- National Demonstration Center for Experimental Mechanical and Electrical Engineering Education, Tianjin University of Technology, Tianjin 300384, China
- Wei Chen
- Tianjin Key Laboratory for Advanced Mechatronic System Design and Intelligent Control, School of Mechanical Engineering, Tianjin University of Technology, Tianjin 300384, China
- National Demonstration Center for Experimental Mechanical and Electrical Engineering Education, Tianjin University of Technology, Tianjin 300384, China
- Hang Wu
- Medical Support Technology Research Department, Systems Engineering Institute, Academy of Military Sciences, People's Liberation Army, Tianjin 300161, China
- Guang Zhang
- Medical Support Technology Research Department, Systems Engineering Institute, Academy of Military Sciences, People's Liberation Army, Tianjin 300161, China
- Feng Luo
- Medical Support Technology Research Department, Systems Engineering Institute, Academy of Military Sciences, People's Liberation Army, Tianjin 300161, China
- Ming Yu
- Medical Support Technology Research Department, Systems Engineering Institute, Academy of Military Sciences, People's Liberation Army, Tianjin 300161, China
6
Feng X, Zhang X, Shi X, Li L, Wang S. ST-ITEF: Spatio-Temporal Intraoperative Task Estimating Framework to recognize surgical phase and predict instrument path based on multi-object tracking in keratoplasty. Med Image Anal 2024;91:103026. PMID: 37976868. DOI: 10.1016/j.media.2023.103026.
Abstract
Computer-assisted cognition guidance for surgical robots based on computer vision is a promising direction that could improve both operation accuracy and the level of autonomy. In this paper, multiple-object segmentation and feature extraction from the segmentation are combined to determine and predict surgical manipulation. A novel three-stage Spatio-Temporal Intraoperative Task Estimating Framework is proposed, with a quantitative expression derived from ophthalmologists' visual information processing and with multi-object tracking of the surgical instruments and human corneas involved in keratoplasty. In the estimation of intraoperative workflow, quantifying the operation parameters is still an open challenge. This problem is tackled by extracting key geometric properties from multi-object segmentation and calculating the relative positions of instruments and corneas. A decision framework is further proposed, based on prior geometric properties, to recognize the current surgical phase and predict the instrument path for each phase. Our framework is tested and evaluated on real human keratoplasty videos. The optimized DeepLabV3 with image filtration achieved competitive class IoU in the segmentation task, and the mean phase Jaccard reached 55.58% for phase recognition. Both the qualitative and quantitative results indicate that our framework achieves accurate segmentation and surgical phase recognition under complex disturbance. The Intraoperative Task Estimating Framework thus has high potential to guide surgical robots in clinical practice.
Affiliation(s)
- Xiaojing Feng
- School of Mechanical Engineering at Xi'an Jiaotong University, 28 Xianning West Road, Xi'an 710049, China
- Xiaodong Zhang
- School of Mechanical Engineering at Xi'an Jiaotong University, 28 Xianning West Road, Xi'an 710049, China
- Xiaojun Shi
- School of Mechanical Engineering at Xi'an Jiaotong University, 28 Xianning West Road, Xi'an 710049, China
- Li Li
- Department of Ophthalmology at the First Affiliated Hospital of Xi'an Jiaotong University, 277 Yanta West Road, Xi'an 710061, China
- Shaopeng Wang
- School of Mechanical Engineering at Xi'an Jiaotong University, 28 Xianning West Road, Xi'an 710049, China
7
Zhang J, Barbarisi S, Kadkhodamohammadi A, Stoyanov D, Luengo I. Self-knowledge distillation for surgical phase recognition. Int J Comput Assist Radiol Surg 2024;19:61-68. PMID: 37340283. DOI: 10.1007/s11548-023-02970-7.
Abstract
PURPOSE Advances in surgical phase recognition are generally led by training deeper networks. Rather than pursuing a more complex solution, we believe that current models can be exploited better. We propose a self-knowledge distillation framework that can be integrated into current state-of-the-art (SOTA) models without adding complexity to the models or requiring extra annotations. METHODS Knowledge distillation is a framework for network regularization in which knowledge is distilled from a teacher network to a student network. In self-knowledge distillation, the student model becomes the teacher, such that the network learns from itself. Most phase recognition models follow an encoder-decoder framework. Our framework utilizes self-knowledge distillation in both stages: the teacher model guides the training of the student model to extract enhanced feature representations from the encoder and to build a more robust temporal decoder that tackles the over-segmentation problem. RESULTS We validate our proposed framework on the public dataset Cholec80. Our framework is embedded on top of four popular SOTA approaches and consistently improves their performance. Specifically, our best GRU model boosts performance by [Formula: see text] accuracy and [Formula: see text] F1-score over the same baseline model. CONCLUSION We embed a self-knowledge distillation framework in the surgical phase recognition training pipeline for the first time. Experimental results demonstrate that our simple yet powerful framework can improve the performance of existing phase recognition models. Moreover, our extensive experiments show that even with 75% of the training set we still achieve performance on par with the same baseline model trained on the full set.
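A generic (self-)distillation loss of the kind described can be sketched as follows. This is a NumPy illustration under stated assumptions, not the paper's exact formulation: a snapshot of the same network stands in for the teacher, and the temperature and weighting are illustrative choices.

```python
import numpy as np

def softmax(z, t=1.0):
    z = z / t
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def self_distillation_loss(student_logits, teacher_logits, labels,
                           temperature=2.0, alpha=0.5):
    """Cross-entropy on hard labels plus KL to the teacher's softened
    predictions; in self-distillation the 'teacher' is a snapshot of the
    same network (e.g. from a previous epoch)."""
    p_student = softmax(student_logits)
    ce = -np.log(p_student[np.arange(len(labels)), labels] + 1e-12).mean()

    q_t = softmax(teacher_logits, t=temperature)
    q_s = softmax(student_logits, t=temperature)
    kl = (q_t * (np.log(q_t + 1e-12) - np.log(q_s + 1e-12))).sum(-1).mean()

    return (1 - alpha) * ce + alpha * (temperature ** 2) * kl

rng = np.random.default_rng(0)
student = rng.normal(size=(4, 7))                   # 4 frames, 7 phases
teacher = student + 0.1 * rng.normal(size=(4, 7))   # earlier self-snapshot
labels = np.array([0, 2, 2, 5])
print(self_distillation_loss(student, teacher, labels))
```

When the teacher equals the student, the KL term vanishes and only the (weighted) cross-entropy remains, so the distillation term acts purely as a regularizer.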
Affiliation(s)
- Jinglu Zhang
- Medtronic Digital Surgery, 230 City Road, London, UK
- Danail Stoyanov
- Medtronic Digital Surgery, 230 City Road, London, UK
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, London, UK
- Imanol Luengo
- Medtronic Digital Surgery, 230 City Road, London, UK
8
Park B, Chi H, Park B, Lee J, Jin HS, Park S, Hyung WJ, Choi MK. Visual modalities-based multimodal fusion for surgical phase recognition. Comput Biol Med 2023;166:107453. PMID: 37774560. DOI: 10.1016/j.compbiomed.2023.107453.
Abstract
Surgical workflow analysis is essential for optimizing surgery by encouraging efficient communication and use of resources. However, phase recognition performance is limited when only information related to the presence of surgical instruments is used. To address this, we propose visual modality-based multimodal fusion for surgical phase recognition, which overcomes the limited diversity of such information. Using the proposed methods, we extracted a visual kinematics-based index (VKI) related to instrument use, such as movement and the interrelations between instruments during surgery. In addition, we improved recognition performance with an effective convolutional neural network (CNN)-based method for fusing visual features with the visual kinematics-based index. The index improves the understanding of a surgical procedure because it captures instrument interaction; furthermore, it can be extracted in any environment, such as laparoscopic surgery, and provides complementary information when system kinematics logs are erroneous. The proposed methodology was applied to two multimodal datasets, a virtual reality (VR) simulator-based dataset (PETRAW) and a private distal gastrectomy surgery dataset, to verify that it can improve recognition performance in clinical environments. We also explored the influence of the visual kinematics-based index on recognizing each surgical workflow through the instruments' presence and trajectories. The experimental results on the distal gastrectomy video dataset validate the effectiveness of our proposed fusion approach for surgical phase recognition. The relatively simple, index-incorporated fusion we propose yields significant performance improvements over CNN-only training and trains effectively compared with Transformer-based fusion, which requires a large amount of pre-training data.
Affiliation(s)
- Bogyu Park
- AI Dev. Group, Hutom, Dokmak-ro 279, Mapo-gu, 04151, Seoul, Republic of Korea
- Hyeongyu Chi
- AI Dev. Group, Hutom, Dokmak-ro 279, Mapo-gu, 04151, Seoul, Republic of Korea
- Bokyung Park
- AI Dev. Group, Hutom, Dokmak-ro 279, Mapo-gu, 04151, Seoul, Republic of Korea
- Jiwon Lee
- AI Dev. Group, Hutom, Dokmak-ro 279, Mapo-gu, 04151, Seoul, Republic of Korea
- Hye Su Jin
- AI Dev. Group, Hutom, Dokmak-ro 279, Mapo-gu, 04151, Seoul, Republic of Korea
- Sunghyun Park
- Yonsei University College of Medicine, Yonsei-ro 50, Seodaemun-gu, 03722, Seoul, Republic of Korea
- Woo Jin Hyung
- AI Dev. Group, Hutom, Dokmak-ro 279, Mapo-gu, 04151, Seoul, Republic of Korea; Yonsei University College of Medicine, Yonsei-ro 50, Seodaemun-gu, 03722, Seoul, Republic of Korea
- Min-Kook Choi
- AI Dev. Group, Hutom, Dokmak-ro 279, Mapo-gu, 04151, Seoul, Republic of Korea
9
Demir KC, Schieber H, Weise T, Roth D, May M, Maier A, Yang SH. Deep Learning in Surgical Workflow Analysis: A Review of Phase and Step Recognition. IEEE J Biomed Health Inform 2023;27:5405-5417. PMID: 37665700. DOI: 10.1109/jbhi.2023.3311628.
Abstract
OBJECTIVE In the last two decades, there has been growing interest in exploring surgical procedures with statistical models to analyze operations at different semantic levels. This information is necessary for developing context-aware intelligent systems that can assist physicians during operations, evaluate procedures afterward, or help the management team utilize the operating room effectively. The objective is to extract reliable patterns from surgical data for the robust estimation of surgical activities performed during operations. The purpose of this article is to review the state-of-the-art deep learning methods published after 2018 for analyzing surgical workflows, with a focus on phase and step recognition. METHODS Three databases, IEEE Xplore, Scopus, and PubMed, were searched, and additional studies were added through a manual search. After the database search, 343 studies were screened, and a total of 44 studies were selected for this review. CONCLUSION The use of temporal information is essential for identifying the next surgical action. Contemporary methods mainly used RNNs, hierarchical CNNs, and Transformers to preserve long-distance temporal relations. The lack of large publicly available datasets for various procedures is a great challenge for the development of new and robust models. While supervised learning strategies are used to show proof of concept, self-supervised, semi-supervised, and active learning methods are used to mitigate the dependency on annotated data. SIGNIFICANCE The present study provides a comprehensive review of recent methods in surgical workflow analysis, summarizes commonly used architectures and datasets, and discusses challenges.
10
Zhang J, Zhou S, Wang Y, Shi S, Wan C, Zhao H, Cai X, Ding H. Laparoscopic Image-Based Critical Action Recognition and Anticipation With Explainable Features. IEEE J Biomed Health Inform 2023;27:5393-5404. PMID: 37603480. DOI: 10.1109/jbhi.2023.3306818.
Abstract
Surgical workflow analysis integrates perception, comprehension, and prediction of the surgical workflow, helping real-time surgical support systems provide proper guidance and assistance for surgeons. This article promotes the idea of critical actions, the essential surgical actions that progress towards the fulfillment of the operation. Fine-grained workflow analysis involves recognizing current critical actions and previewing the instruments' moving tendency in the early stage of a critical action. To this end, we propose a framework that incorporates operational experience to improve the robustness and interpretability of action recognition in in-vivo situations. High-dimensional images are mapped into a low-dimensional, experience-based, explainable feature space to achieve critical action recognition through a hierarchical classification structure. To forecast an instrument's motion tendency, we model motion primitives in the polar coordinate system (PCS) to represent patterns of complex trajectories. Given laparoscopy variance, an adaptive pattern recognition (APR) method, which adapts to uncertain trajectories by modifying model parameters, is designed to improve prediction accuracy. Validations on an in-vivo dataset show that our framework fulfills these surgical awareness tasks with exceptional accuracy and real-time performance.
11
Tao R, Zou X, Zheng G. LAST: LAtent Space-Constrained Transformers for Automatic Surgical Phase Recognition and Tool Presence Detection. IEEE Trans Med Imaging 2023;42:3256-3268. PMID: 37227905. DOI: 10.1109/tmi.2023.3279838.
Abstract
When developing context-aware systems, automatic surgical phase recognition and tool presence detection are two essential tasks. Previous attempts to develop methods for both tasks exist, but the majority of existing methods utilize a frame-level loss function (e.g., cross-entropy) that does not fully leverage the underlying semantic structure of a surgery, leading to sub-optimal results. In this paper, we propose multi-task learning-based, LAtent Space-constrained Transformers, referred to as LAST, for automatic surgical phase recognition and tool presence detection. Our design features a two-branch transformer architecture with a novel and generic way to leverage video-level semantic information during network training. This is done by learning a non-linear compact representation of the underlying semantic structure of surgical videos through a transformer variational autoencoder (VAE) and by encouraging models to follow the learned statistical distributions. In other words, LAST is structure-aware and favors predictions that lie on the extracted low-dimensional data manifold. Validated on two public datasets of cholecystectomy surgery, the Cholec80 dataset and the M2cai16 dataset, our method achieves better results than other state-of-the-art methods. Specifically, on the Cholec80 dataset, our method achieves an average accuracy of 93.12±4.71%, an average precision of 89.25±5.49%, an average recall of 90.10±5.45% and an average Jaccard of 81.11±7.62% for phase recognition, and an average mAP of 95.15±3.87% for tool presence detection. Similarly superior performance is observed when LAST is applied to the M2cai16 dataset.
12
Li Y, Xia T, Luo H, He B, Jia F. MT-FiST: A Multi-Task Fine-Grained Spatial-Temporal Framework for Surgical Action Triplet Recognition. IEEE J Biomed Health Inform 2023;27:4983-4994. PMID: 37498758. DOI: 10.1109/jbhi.2023.3299321.
Abstract
Surgical action triplet recognition plays a significant role in helping surgeons facilitate scene analysis and decision-making in computer-assisted surgeries. Compared to traditional context-aware tasks such as phase recognition, surgical action triplets, comprising the instrument, verb, and target, offer more comprehensive and detailed information. However, current triplet recognition methods fall short in distinguishing fine-grained subclasses and disregard temporal correlation in action triplets. In this article, we propose a multi-task fine-grained spatial-temporal framework for surgical action triplet recognition named MT-FiST. The proposed method utilizes a multi-label mutual channel loss, which consists of diversity and discriminative components. This loss function decouples global task features into class-aligned features, enabling the learning of more local details from the surgical scene. The framework utilizes partially shared-parameter LSTM units to capture temporal correlations between adjacent frames. We conducted experiments on the CholecT50 dataset proposed in the MICCAI 2021 Surgical Action Triplet Recognition Challenge. Our framework is evaluated on the private test set of the challenge to ensure fair comparisons. Our model clearly outperformed state-of-the-art models in instrument, verb, target, and action triplet recognition tasks, with mAPs of 82.1% (+4.6%), 51.5% (+4.0%), 45.50% (+7.8%), and 35.8% (+3.1%), respectively. The proposed MT-FiST boosts the recognition of surgical action triplets in a context-aware surgical assistant system, further advancing multi-task recognition through effective temporal aggregation and fine-grained features.
13
|
Ramesh S, Dall'Alba D, Gonzalez C, Yu T, Mascagni P, Mutter D, Marescaux J, Fiorini P, Padoy N. Weakly Supervised Temporal Convolutional Networks for Fine-Grained Surgical Activity Recognition. IEEE Trans Med Imaging 2023; 42:2592-2602. [PMID: 37030859 DOI: 10.1109/tmi.2023.3262847] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/19/2023]
Abstract
Automatic recognition of fine-grained surgical activities, called steps, is a challenging but crucial task for intelligent intra-operative computer assistance. The development of current vision-based activity recognition methods relies heavily on a high volume of manually annotated data. This data is difficult and time-consuming to generate and requires domain-specific knowledge. In this work, we propose to use coarser and easier-to-annotate activity labels, namely phases, as weak supervision to learn step recognition with fewer step-annotated videos. We introduce a step-phase dependency loss to exploit the weak supervision signal. We then employ a Single-Stage Temporal Convolutional Network (SS-TCN) with a ResNet-50 backbone, trained in an end-to-end fashion from weakly annotated videos, for temporal activity segmentation and recognition. We extensively evaluate and show the effectiveness of the proposed method on a large video dataset consisting of 40 laparoscopic gastric bypass procedures and the public benchmark CATARACTS containing 50 cataract surgeries.
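The SS-TCN described above is built from dilated causal temporal convolutions over per-frame features. As a rough, generic sketch of that building block (not the authors' implementation; the sequence and kernel here are toy values):

```python
def dilated_causal_conv1d(seq, kernel, dilation=1):
    """Causal 1-D convolution: output[t] mixes inputs at t, t-d, t-2d, ...
    with zero padding on the left, so no future frame leaks into time t."""
    k = len(kernel)
    out = []
    for t in range(len(seq)):
        acc = 0.0
        for i, w in enumerate(kernel):
            idx = t - (k - 1 - i) * dilation
            if idx >= 0:  # indices before the start are treated as zeros
                acc += w * seq[idx]
        out.append(acc)
    return out
```

Stacking such layers with growing dilation widens the temporal receptive field cheaply, which is what makes TCNs attractive for long surgical videos.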
14
|
Ramesh S, Dall'Alba D, Gonzalez C, Yu T, Mascagni P, Mutter D, Marescaux J, Fiorini P, Padoy N. TRandAugment: temporal random augmentation strategy for surgical activity recognition from videos. Int J Comput Assist Radiol Surg 2023; 18:1665-1672. [PMID: 36944845 PMCID: PMC10491694 DOI: 10.1007/s11548-023-02864-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/05/2023] [Accepted: 03/01/2023] [Indexed: 03/23/2023]
Abstract
PURPOSE Automatic recognition of surgical activities from intraoperative surgical videos is crucial for developing intelligent support systems for computer-assisted interventions. Current state-of-the-art recognition methods are based on deep learning, where data augmentation has shown the potential to improve the generalization of these methods. This has spurred work on automated and simplified augmentation strategies for image classification and object detection on datasets of still images. Extending such augmentation methods to videos is not straightforward, as the temporal dimension needs to be considered. Furthermore, surgical videos pose additional challenges as they are composed of multiple, interconnected, and long-duration activities. METHODS This work proposes a new simplified augmentation method, called TRandAugment, specifically designed for long surgical videos, that treats each video as an assembly of temporal segments and applies consistent but random transformations to each segment. The proposed augmentation method is used to train an end-to-end spatiotemporal model consisting of a CNN (ResNet50) followed by a TCN. RESULTS The effectiveness of the proposed method is demonstrated on two surgical video datasets, namely Bypass40 and CATARACTS, and two tasks, surgical phase and step recognition. TRandAugment adds a performance boost of 1-6% over previous state-of-the-art methods that use manually designed augmentations. CONCLUSION This work presents a simplified and automated augmentation method for long surgical videos. The proposed method has been validated on different datasets and tasks, indicating the importance of devising temporal augmentation methods for long surgical videos.
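The core idea of segment-wise consistent augmentation can be sketched as follows (a simplified illustration, not the published implementation; `transforms` stands in for arbitrary frame-level transformations):

```python
import random

def trandaugment(frames, n_segments, transforms, rng=None):
    """Treat the video as consecutive temporal segments and apply one
    randomly drawn transform consistently to every frame of a segment."""
    rng = rng or random.Random()
    seg_len = max(1, len(frames) // n_segments)
    out = []
    for s in range(n_segments):
        start = s * seg_len
        end = len(frames) if s == n_segments - 1 else start + seg_len
        t = rng.choice(transforms)  # same transform for the whole segment
        out.extend(t(f) for f in frames[start:end])
    return out
```

Applying the same transform within a segment preserves temporal coherence, while randomizing across segments still diversifies the training data.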
Affiliation(s)
- Sanat Ramesh
- Altair Robotics Lab, University of Verona, 37134, Verona, Italy.
- ICube, University of Strasbourg, CNRS, 67000, Strasbourg, France.
- Diego Dall'Alba
- Altair Robotics Lab, University of Verona, 37134, Verona, Italy
- Cristians Gonzalez
- University Hospital of Strasbourg, 67000, Strasbourg, France
- Institute of Image-Guided Surgery, IHU Strasbourg, 67000, Strasbourg, France
- Tong Yu
- ICube, University of Strasbourg, CNRS, 67000, Strasbourg, France
- Pietro Mascagni
- Institute of Image-Guided Surgery, IHU Strasbourg, 67000, Strasbourg, France
- Fondazione Policlinico Universitario Agostino Gemelli IRCCS, 00168, Rome, Italy
- Didier Mutter
- University Hospital of Strasbourg, 67000, Strasbourg, France
- IRCAD, 67000, Strasbourg, France
- Institute of Image-Guided Surgery, IHU Strasbourg, 67000, Strasbourg, France
- Paolo Fiorini
- Altair Robotics Lab, University of Verona, 37134, Verona, Italy
- Nicolas Padoy
- ICube, University of Strasbourg, CNRS, 67000, Strasbourg, France
- Institute of Image-Guided Surgery, IHU Strasbourg, 67000, Strasbourg, France
15
|
Yu T, Mascagni P, Verde J, Marescaux J, Mutter D, Padoy N. Live laparoscopic video retrieval with compressed uncertainty. Med Image Anal 2023; 88:102866. [PMID: 37356320 DOI: 10.1016/j.media.2023.102866] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/16/2022] [Revised: 04/14/2023] [Accepted: 06/07/2023] [Indexed: 06/27/2023]
Abstract
Searching through large volumes of medical data to retrieve relevant information is a challenging yet crucial task for clinical care. However, the primitive and most common approach to retrieval, involving text in the form of keywords, is severely limited when dealing with complex media formats. Content-based retrieval offers a way to overcome this limitation, by using rich media as the query itself. Surgical video-to-video retrieval in particular is a new and largely unexplored research problem with high clinical value, especially in the real-time case: using real-time video hashing, search can be achieved directly inside the operating room. Indeed, the process of hashing converts large data entries into compact binary arrays or hashes, enabling large-scale search operations at a very fast rate. However, due to fluctuations over the course of a video, not all bits in a given hash are equally reliable. In this work, we propose a method capable of mitigating this uncertainty while maintaining a light computational footprint. We present superior retrieval results (a 3%-4% gain in top-10 mean average precision) on a multi-task evaluation protocol for surgery, using cholecystectomy phases, bypass phases, and surgical events across six different surgery types, the last drawn from an entirely new dataset introduced here. Success on this multi-task benchmark shows the generalizability of our approach for surgical video retrieval.
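Hash-based retrieval reduces search to Hamming-distance comparisons between compact binary codes. One simple way to express bit-level uncertainty, sketched here purely as an assumption (the paper's actual mechanism for compressing uncertainty is different), is to mask out unreliable bits before comparing:

```python
def hamming(a, b):
    """Number of differing bits between two binary hashes stored as ints."""
    return bin(a ^ b).count("1")

def masked_hamming(a, b, reliable_mask):
    """Compare only the bits flagged as reliable, ignoring uncertain ones."""
    return bin((a ^ b) & reliable_mask).count("1")

def top_k(query, database, k, reliable_mask=None):
    """Return the k (id, hash) entries closest to the query hash."""
    if reliable_mask is not None:
        dist = lambda h: masked_hamming(query, h, reliable_mask)
    else:
        dist = lambda h: hamming(query, h)
    return sorted(database, key=lambda item: dist(item[1]))[:k]
```

Because XOR and popcount are cheap, even large databases can be ranked at video frame rate, which is what enables live in-OR search.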
Affiliation(s)
- Tong Yu
- ICube, University of Strasbourg, CNRS, France; IHU Strasbourg, France.
- Pietro Mascagni
- ICube, University of Strasbourg, CNRS, France; IHU Strasbourg, France; Fondazione Policlinico Universitario Agostino Gemelli IRCCS, Rome, Italy
- Didier Mutter
- IHU Strasbourg, France; University Hospital of Strasbourg, France
- Nicolas Padoy
- ICube, University of Strasbourg, CNRS, France; IHU Strasbourg, France
16
|
Ramesh S, Srivastav V, Alapatt D, Yu T, Murali A, Sestini L, Nwoye CI, Hamoud I, Sharma S, Fleurentin A, Exarchakis G, Karargyris A, Padoy N. Dissecting self-supervised learning methods for surgical computer vision. Med Image Anal 2023; 88:102844. [PMID: 37270898 DOI: 10.1016/j.media.2023.102844] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/06/2022] [Revised: 05/08/2023] [Accepted: 05/15/2023] [Indexed: 06/06/2023]
Abstract
The field of surgical computer vision has undergone considerable breakthroughs in recent years with the rising popularity of deep neural network-based methods. However, standard fully-supervised approaches for training such models require vast amounts of annotated data, imposing a prohibitively high cost, especially in the clinical domain. Self-Supervised Learning (SSL) methods, which have begun to gain traction in the general computer vision community, represent a potential solution to these annotation costs, making it possible to learn useful representations from unlabeled data alone. Still, the effectiveness of SSL methods in more complex and impactful domains, such as medicine and surgery, remains limited and underexplored. In this work, we address this critical need by investigating four state-of-the-art SSL methods (MoCo v2, SimCLR, DINO, SwAV) in the context of surgical computer vision. We present an extensive analysis of the performance of these methods on the Cholec80 dataset for two fundamental and popular tasks in surgical context understanding, phase recognition and tool presence detection. We examine their parameterization, then their behavior with respect to training data quantities in semi-supervised settings. Correct transfer of these methods to surgery, as described and conducted in this work, leads to substantial performance gains over generic uses of SSL - up to 7.4% on phase recognition and 20% on tool presence detection - as well as gains over state-of-the-art semi-supervised phase recognition approaches of up to 14%. Further results obtained on a highly diverse selection of surgical datasets exhibit strong generalization properties. The code is available at https://github.com/CAMMA-public/SelfSupSurg.
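Two of the SSL methods studied (MoCo v2, SimCLR) are contrastive. A minimal sketch of an NT-Xent-style loss for one positive pair, written in plain Python on toy feature vectors (an illustration of the principle, not the SelfSupSurg code):

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def nt_xent(z_i, z_j, negatives, tau=0.5):
    """Contrastive loss for one positive pair (z_i, z_j): the negative
    log-probability of the positive among all candidates, at temperature tau."""
    sims = [cosine(z_i, z_j)] + [cosine(z_i, n) for n in negatives]
    exps = [math.exp(s / tau) for s in sims]
    return -math.log(exps[0] / sum(exps))
```

The loss shrinks when the two augmented views of the same frame agree and the negatives (other frames) disagree, which is how useful representations emerge without labels.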
Affiliation(s)
- Sanat Ramesh
- ICube, University of Strasbourg, CNRS, Strasbourg 67000, France; Altair Robotics Lab, Department of Computer Science, University of Verona, Verona 37134, Italy
- Vinkle Srivastav
- ICube, University of Strasbourg, CNRS, Strasbourg 67000, France.
- Deepak Alapatt
- ICube, University of Strasbourg, CNRS, Strasbourg 67000, France
- Tong Yu
- ICube, University of Strasbourg, CNRS, Strasbourg 67000, France
- Aditya Murali
- ICube, University of Strasbourg, CNRS, Strasbourg 67000, France
- Luca Sestini
- ICube, University of Strasbourg, CNRS, Strasbourg 67000, France; Department of Electronics, Information and Bioengineering, Politecnico di Milano, Milano 20133, Italy
- Idris Hamoud
- ICube, University of Strasbourg, CNRS, Strasbourg 67000, France
- Saurav Sharma
- ICube, University of Strasbourg, CNRS, Strasbourg 67000, France
- Georgios Exarchakis
- ICube, University of Strasbourg, CNRS, Strasbourg 67000, France; IHU Strasbourg, Strasbourg 67000, France
- Alexandros Karargyris
- ICube, University of Strasbourg, CNRS, Strasbourg 67000, France; IHU Strasbourg, Strasbourg 67000, France
- Nicolas Padoy
- ICube, University of Strasbourg, CNRS, Strasbourg 67000, France; IHU Strasbourg, Strasbourg 67000, France
17
|
Nyangoh Timoh K, Huaulme A, Cleary K, Zaheer MA, Lavoué V, Donoho D, Jannin P. A systematic review of annotation for surgical process model analysis in minimally invasive surgery based on video. Surg Endosc 2023:10.1007/s00464-023-10041-w. [PMID: 37157035 DOI: 10.1007/s00464-023-10041-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/16/2022] [Accepted: 03/25/2023] [Indexed: 05/10/2023]
Abstract
BACKGROUND Annotated data are foundational to applications of supervised machine learning. However, there seems to be a lack of common language used in the field of surgical data science. The aim of this study is to review the annotation process and semantics used in the creation of surgical process models (SPMs) for minimally invasive surgery videos. METHODS For this systematic review, we reviewed articles indexed in the MEDLINE database from January 2000 until March 2022. We selected articles using surgical video annotations to describe a surgical process model in the field of minimally invasive surgery. We excluded studies focusing only on instrument detection or recognition of anatomical areas. The risk of bias was evaluated with the Newcastle Ottawa Quality assessment tool. Data from the studies were visually presented in a table using the SPIDER tool. RESULTS Of the 2806 articles identified, 34 were selected for review. Twenty-two were in the field of digestive surgery, six in ophthalmologic surgery only, one in neurosurgery, three in gynecologic surgery, and two in mixed fields. Thirty-one studies (88.2%) were dedicated to phase, step, or action recognition and mainly relied on a very simple formalization (29, 85.2%). Clinical information in the datasets was lacking for studies using available public datasets. The annotation process for surgical process models was poorly described, and descriptions of the surgical procedures were highly variable between studies. CONCLUSION Surgical video annotation lacks a rigorous and reproducible framework. This leads to difficulties in sharing videos between institutions and hospitals because of the different languages used. There is a need to develop and use a common ontology to improve libraries of annotated surgical videos.
Affiliation(s)
- Krystel Nyangoh Timoh
- Department of Gynecology and Obstetrics and Human Reproduction, CHU Rennes, Rennes, France.
- INSERM, LTSI - UMR 1099, University Rennes 1, Rennes, France.
- Laboratoire d'Anatomie et d'Organogenèse, Faculté de Médecine, Centre Hospitalier Universitaire de Rennes, 2 Avenue du Professeur Léon Bernard, 35043, Rennes Cedex, France.
- Department of Obstetrics and Gynecology, Rennes Hospital, Rennes, France.
- Arnaud Huaulme
- INSERM, LTSI - UMR 1099, University Rennes 1, Rennes, France
- Kevin Cleary
- Sheikh Zayed Institute for Pediatric Surgical Innovation, Children's National Hospital, Washington, DC, 20010, USA
- Myra A Zaheer
- George Washington University School of Medicine and Health Sciences, Washington, DC, USA
- Vincent Lavoué
- Department of Gynecology and Obstetrics and Human Reproduction, CHU Rennes, Rennes, France
- Dan Donoho
- Division of Neurosurgery, Center for Neuroscience, Children's National Hospital, Washington, DC, 20010, USA
- Pierre Jannin
- INSERM, LTSI - UMR 1099, University Rennes 1, Rennes, France
18
|
Sharma S, Nwoye CI, Mutter D, Padoy N. Rendezvous in time: an attention-based temporal fusion approach for surgical triplet recognition. Int J Comput Assist Radiol Surg 2023:10.1007/s11548-023-02914-1. [PMID: 37097518 DOI: 10.1007/s11548-023-02914-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/13/2023] [Accepted: 04/07/2023] [Indexed: 04/26/2023]
Abstract
PURPOSE One of the recent advances in surgical AI is the recognition of surgical activities as triplets of ⟨instrument, verb, target⟩. Albeit providing detailed information for computer-assisted intervention, current triplet recognition approaches rely only on single-frame features. Exploiting the temporal cues from earlier frames would improve the recognition of surgical action triplets from videos. METHODS In this paper, we propose Rendezvous in Time (RiT) - a deep learning model that extends the state-of-the-art model, Rendezvous, with temporal modeling. Focusing more on the verbs, our RiT explores the connectedness of current and past frames to learn temporal attention-based features for enhanced triplet recognition. RESULTS We validate our proposal on the challenging surgical triplet dataset, CholecT45, demonstrating an improved recognition of the verb and triplet along with other interactions involving the verb, such as ⟨instrument, verb⟩. Qualitative results show that RiT produces smoother predictions for most triplet instances than the state of the art. CONCLUSION We present a novel attention-based approach that leverages the temporal fusion of video frames to model the evolution of surgical actions and exploit their benefits for surgical triplet recognition.
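Attention-based temporal fusion of the kind described can be sketched generically (not the RiT architecture itself): the current frame's feature vector queries past frames, and a softmax over similarity scores weights their fusion.

```python
import math

def temporal_attention(current, past_frames):
    """Softmax dot-product attention: the current frame's features act as
    the query; past frame features are weighted and fused accordingly."""
    scores = [sum(c * p for c, p in zip(current, past)) for past in past_frames]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    fused = [
        sum(w * past[d] for w, past in zip(weights, past_frames))
        for d in range(len(current))
    ]
    return weights, fused
```

Frames resembling the current one receive larger weights, so the fused feature emphasizes temporally consistent evidence for the ongoing action.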
Affiliation(s)
- Saurav Sharma
- ICube, University of Strasbourg, CNRS, Strasbourg, France.
- Didier Mutter
- IHU Strasbourg, Strasbourg, France
- University Hospital of Strasbourg, Strasbourg, France
- Nicolas Padoy
- ICube, University of Strasbourg, CNRS, Strasbourg, France
- IHU Strasbourg, Strasbourg, France
19
|
Zhang B, Goel B, Sarhan MH, Goel VK, Abukhalil R, Kalesan B, Stottler N, Petculescu S. Surgical workflow recognition with temporal convolution and transformer for action segmentation. Int J Comput Assist Radiol Surg 2023; 18:785-794. [PMID: 36542253 DOI: 10.1007/s11548-022-02811-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/08/2022] [Accepted: 12/09/2022] [Indexed: 12/24/2022]
Abstract
PURPOSE Automatic surgical workflow recognition enabled by computer vision algorithms plays a key role in enhancing the learning experience of surgeons. It also supports building context-aware systems that allow better surgical planning and decision making which may in turn improve outcomes. Utilizing temporal information is crucial for recognizing context; hence, various recent approaches use recurrent neural networks or transformers to recognize actions. METHODS We design and implement a two-stage method for surgical workflow recognition. We utilize R(2+1)D for video clip modeling in the first stage. We propose Action Segmentation Temporal Convolutional Transformer (ASTCFormer) network for full video modeling in the second stage. ASTCFormer utilizes action segmentation transformers (ASFormers) and temporal convolutional networks (TCNs) to build a temporally aware surgical workflow recognition system. RESULTS We compare the proposed ASTCFormer with recurrent neural networks, multi-stage TCN, and ASFormer approaches. The comparison is done on a dataset comprised of 207 robotic and laparoscopic cholecystectomy surgical videos annotated for 7 surgical phases. The proposed method outperforms the compared methods achieving a [Formula: see text] relative improvement in the average segmental F1-score over the state-of-the-art ASFormer method. Moreover, our proposed method achieves state-of-the-art results on the publicly available Cholec80 dataset. CONCLUSION The improvement in the results when using the proposed method suggests that temporal context could be better captured when adding information from TCN to the ASFormer paradigm. This addition leads to better surgical workflow recognition.
Affiliation(s)
- Bokai Zhang
- Johnson & Johnson MedTech, 1100 Olive Way, Suite 1100, Seattle, 98101, WA, USA.
- Bharti Goel
- Johnson & Johnson MedTech, 5490 Great America Pkwy, Santa Clara, CA, 95054, USA
- Mohammad Hasan Sarhan
- Johnson & Johnson MedTech, Robert-Koch-Straße 1, 22851, Norderstedt, Schleswig-Holstein, Germany
- Varun Kejriwal Goel
- Johnson & Johnson MedTech, 5490 Great America Pkwy, Santa Clara, CA, 95054, USA
- Rami Abukhalil
- Johnson & Johnson MedTech, 5490 Great America Pkwy, Santa Clara, CA, 95054, USA
- Bindu Kalesan
- Johnson & Johnson MedTech, 5490 Great America Pkwy, Santa Clara, CA, 95054, USA
- Natalie Stottler
- Johnson & Johnson MedTech, 1100 Olive Way, Suite 1100, Seattle, 98101, WA, USA
- Svetlana Petculescu
- Johnson & Johnson MedTech, 1100 Olive Way, Suite 1100, Seattle, 98101, WA, USA
20
|
Chadebecq F, Lovat LB, Stoyanov D. Artificial intelligence and automation in endoscopy and surgery. Nat Rev Gastroenterol Hepatol 2023; 20:171-182. [PMID: 36352158 DOI: 10.1038/s41575-022-00701-y] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Accepted: 10/03/2022] [Indexed: 11/10/2022]
Abstract
Modern endoscopy relies on digital technology, from high-resolution imaging sensors and displays to electronics connecting configurable illumination and actuation systems for robotic articulation. In addition to enabling more effective diagnostic and therapeutic interventions, the digitization of the procedural toolset enables video data capture of the internal human anatomy at unprecedented levels. Interventional video data encapsulate functional and structural information about a patient's anatomy as well as events, activity and action logs about the surgical process. This detailed but difficult-to-interpret record from endoscopic procedures can be linked to preoperative and postoperative records or patient imaging information. Rapid advances in artificial intelligence, especially in supervised deep learning, can utilize data from endoscopic procedures to develop systems for assisting procedures leading to computer-assisted interventions that can enable better navigation during procedures, automation of image interpretation and robotically assisted tool manipulation. In this Perspective, we summarize state-of-the-art artificial intelligence for computer-assisted interventions in gastroenterology and surgery.
Affiliation(s)
- François Chadebecq
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, London, UK
- Laurence B Lovat
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, London, UK
- Danail Stoyanov
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, London, UK.
21
|
Generating Rare Surgical Events Using CycleGAN: Addressing Lack of Data for Artificial Intelligence Event Recognition. J Surg Res 2023; 283:594-605. [PMID: 36442259 DOI: 10.1016/j.jss.2022.11.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/08/2022] [Revised: 10/16/2022] [Accepted: 11/06/2022] [Indexed: 11/26/2022]
Abstract
INTRODUCTION Artificial Intelligence (AI) has shown promise in facilitating surgical video review through automatic recognition of surgical activities/events. Few public video data sources demonstrate critical yet rare events, and those available are insufficient to train AI for reliable video event recognition. We suggest that a generative AI algorithm can create artificial massive bleeding images for minimally invasive lobectomy, which can be used to mitigate the current lack of data in this field. MATERIALS AND METHODS A generative adversarial network (GAN) algorithm, CycleGAN, was used to generate artificial massive bleeding event images. To train CycleGAN, six videos of minimally invasive lobectomies were utilized, from which 1819 frames of nonbleeding instances and 3178 frames of massive bleeding instances were used. RESULTS The performance of the CycleGAN algorithm was tested on a new video that was not used during the training process. The trained CycleGAN was able to translate laparoscopic lobectomy images into corresponding massive bleeding versions, where the content of the original images was preserved (e.g., location of tools in the scene) and the style of each image was changed to massive bleeding (i.e., blood automatically added to appropriate locations in the images). CONCLUSIONS The results suggest a promising approach to supplement the lack of data for the rare massive bleeding events that can occur during minimally invasive lobectomy. Future work could be dedicated to developing AI algorithms to identify surgical strategies and actions that potentially lead to massive bleeding and warn surgeons prior to the event's occurrence.
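CycleGAN's key training signal for unpaired translation is the cycle-consistency loss. A minimal numeric sketch with scalar stand-ins for images and hypothetical generators G (nonbleeding → bleeding) and F (bleeding → nonbleeding):

```python
def cycle_consistency_loss(G, F, xs, ys):
    """L1 cycle loss: x -> G(x) -> F(G(x)) should reconstruct x,
    and y -> F(y) -> G(F(y)) should reconstruct y."""
    fwd = sum(abs(F(G(x)) - x) for x in xs) / len(xs)
    bwd = sum(abs(G(F(y)) - y) for y in ys) / len(ys)
    return fwd + bwd
```

This term is what forces the generator to preserve scene content (e.g., tool positions) while changing only the style, since any content destroyed by G cannot be recovered by F.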
22
|
Jalal NA, Alshirbaji TA, Docherty PD, Arabian H, Laufer B, Krueger-Ziolek S, Neumuth T, Moeller K. Laparoscopic Video Analysis Using Temporal, Attention, and Multi-Feature Fusion Based-Approaches. Sensors (Basel) 2023; 23:1958. [PMID: 36850554 PMCID: PMC9964851 DOI: 10.3390/s23041958] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 01/01/2023] [Revised: 02/06/2023] [Accepted: 02/07/2023] [Indexed: 06/18/2023]
Abstract
Adapting intelligent context-aware systems (CAS) to future operating rooms (OR) aims to improve situational awareness and provide surgical decision support systems to medical teams. CAS analyzes data streams from available devices during surgery and communicates real-time knowledge to clinicians. Indeed, recent advances in computer vision and machine learning, particularly deep learning, paved the way for extensive research to develop CAS. In this work, a deep learning approach for analyzing laparoscopic videos for surgical phase recognition, tool classification, and weakly-supervised tool localization in laparoscopic videos was proposed. The ResNet-50 convolutional neural network (CNN) architecture was adapted by adding attention modules and fusing features from multiple stages to generate better-focused, generalized, and well-representative features. Then, a multi-map convolutional layer followed by tool-wise and spatial pooling operations was utilized to perform tool localization and generate tool presence confidences. Finally, the long short-term memory (LSTM) network was employed to model temporal information and perform tool classification and phase recognition. The proposed approach was evaluated on the Cholec80 dataset. The experimental results (i.e., 88.5% and 89.0% mean precision and recall for phase recognition, respectively, 95.6% mean average precision for tool presence detection, and a 70.1% F1-score for tool localization) demonstrated the ability of the model to learn discriminative features for all tasks. The performances revealed the importance of integrating attention modules and multi-stage feature fusion for more robust and precise detection of surgical phases and tools.
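The tool-wise spatial pooling step can be illustrated with a toy activation map (a generic sketch, not the authors' code): max pooling over one tool's map yields a presence confidence, and the argmax position serves as a weak localization cue without any box-level supervision.

```python
def presence_and_location(activation_map):
    """Spatial max pooling over a single tool's 2-D activation map:
    the max value is the presence confidence, its (row, col) position
    a weakly supervised localization estimate."""
    best_v, best_rc = None, None
    for r, row in enumerate(activation_map):
        for c, v in enumerate(row):
            if best_v is None or v > best_v:
                best_v, best_rc = v, (r, c)
    return best_v, best_rc
```

Running this per class map turns a classification-only network into a crude localizer, which is the essence of weakly supervised tool localization.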
Affiliation(s)
- Nour Aldeen Jalal
- Institute of Technical Medicine (ITeM), Furtwangen University, 78054 Villingen-Schwenningen, Germany
- Innovation Center Computer Assisted Surgery (ICCAS), University of Leipzig, 04103 Leipzig, Germany
- Tamer Abdulbaki Alshirbaji
- Institute of Technical Medicine (ITeM), Furtwangen University, 78054 Villingen-Schwenningen, Germany
- Innovation Center Computer Assisted Surgery (ICCAS), University of Leipzig, 04103 Leipzig, Germany
- Paul David Docherty
- Institute of Technical Medicine (ITeM), Furtwangen University, 78054 Villingen-Schwenningen, Germany
- Department of Mechanical Engineering, University of Canterbury, Christchurch 8041, New Zealand
- Herag Arabian
- Institute of Technical Medicine (ITeM), Furtwangen University, 78054 Villingen-Schwenningen, Germany
- Bernhard Laufer
- Institute of Technical Medicine (ITeM), Furtwangen University, 78054 Villingen-Schwenningen, Germany
- Sabine Krueger-Ziolek
- Institute of Technical Medicine (ITeM), Furtwangen University, 78054 Villingen-Schwenningen, Germany
- Thomas Neumuth
- Innovation Center Computer Assisted Surgery (ICCAS), University of Leipzig, 04103 Leipzig, Germany
- Knut Moeller
- Institute of Technical Medicine (ITeM), Furtwangen University, 78054 Villingen-Schwenningen, Germany
- Department of Mechanical Engineering, University of Canterbury, Christchurch 8041, New Zealand
- Department of Microsystems Engineering, University of Freiburg, 79110 Freiburg, Germany
23
|
Analysing multi-perspective patient-related data during laparoscopic gynaecology procedures. Sci Rep 2023; 13:1604. [PMID: 36709360 PMCID: PMC9884204 DOI: 10.1038/s41598-023-28652-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/18/2022] [Accepted: 01/23/2023] [Indexed: 01/29/2023] Open
Abstract
Fusing data from different medical perspectives inside the operating room (OR) sets the stage for developing intelligent context-aware systems. These systems aim to promote better awareness inside the OR by keeping every medical team well informed about the work of other teams and thus mitigate conflicts resulting from different targets. In this research, a descriptive analysis of data collected from anaesthesiology and surgery was performed to investigate the relationships between the intra-abdominal pressure (IAP) and lung mechanics for patients during laparoscopic procedures. Data of nineteen patients who underwent laparoscopic gynaecology were included. Statistical analysis of all subjects showed a strong relationship between the IAP and dynamic lung compliance (r = 0.91). Additionally, the peak airway pressure was also strongly correlated to the IAP in volume-controlled ventilated patients (r = 0.928). Statistical results obtained by this study demonstrate the importance of analysing the relationship between surgical actions and physiological responses. Moreover, these results form the basis for developing medical decision support models, e.g., automatic compensation of IAP effects on lung function.
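The reported associations are Pearson correlations (e.g., r = 0.91 between IAP and dynamic lung compliance). For reference, the coefficient can be computed as:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples:
    covariance divided by the product of the standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

Values near +1 or -1 indicate a strong linear relationship, which is the basis for the compensation models proposed in the study.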
24
|
Fer D, Zhang B, Abukhalil R, Goel V, Goel B, Barker J, Kalesan B, Barragan I, Gaddis ML, Kilroy PG. An artificial intelligence model that automatically labels roux-en-Y gastric bypasses, a comparison to trained surgeon annotators. Surg Endosc 2023:10.1007/s00464-023-09870-6. [PMID: 36658282 DOI: 10.1007/s00464-023-09870-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/21/2022] [Accepted: 01/04/2023] [Indexed: 01/21/2023]
Abstract
INTRODUCTION Artificial intelligence (AI) can automate certain tasks to improve data collection. Models have been created to annotate the steps of Roux-en-Y Gastric Bypass (RYGB). However, model performance has not been compared with the performance of individual surgeon annotators. We developed a model that automatically labels RYGB steps and compared its performance to that of surgeons. METHODS AND PROCEDURES 545 videos (17 surgeons) of laparoscopic RYGB procedures were collected. An annotation guide (12 steps, 52 tasks) was developed. Steps were annotated by 11 surgeons. Each video was annotated by two surgeons, and a third reconciled the differences. A convolutional AI model was trained to identify steps and compared with manual annotation. For modeling, we used 390 videos for training, 95 for validation, and 60 for testing. The performance comparison between the AI model and manual annotation was performed using ANOVA (Analysis of Variance) on a subset of 60 testing videos. We assessed the performance of the model at each step, with poor performance defined as an F1-score < 80%. RESULTS The convolutional model identified the 12 steps in the RYGB architecture. Model performance varied by step (F1 > 90% for 7 steps, and > 80% for 2). The reconciled manual annotation data (F1 > 80% for > 5 steps) performed better than the trainees' (F1 > 80% for 2-5 steps for 4 annotators, and for < 2 steps for 4 annotators). In the testing subset, certain steps had low performance, indicating potential ambiguities in surgical landmarks. Additionally, some videos were easier to annotate than others, suggesting variability. After controlling for variability, the AI algorithm was comparable to manual annotation (p < 0.0001). CONCLUSION AI can identify surgical landmarks in RYGB comparably to the manual process, and it recognized some landmarks more accurately than surgeons. This technology has the potential to improve surgical training by assessing the learning curves of surgeons at scale.
Affiliation(s)
- Danyal Fer
- University of California, San Francisco-East Bay, General Surgery, Oakland, CA, USA; Johnson & Johnson MedTech, New Brunswick, NJ, USA
- Bokai Zhang
- Johnson & Johnson MedTech, New Brunswick, NJ, USA
- Rami Abukhalil
- Johnson & Johnson MedTech, New Brunswick, NJ, USA; 5490 Great America Parkway, Santa Clara, CA 95054, USA
- Varun Goel
- University of California, San Francisco-East Bay, General Surgery, Oakland, CA, USA; Johnson & Johnson MedTech, New Brunswick, NJ, USA
- Bharti Goel
- Johnson & Johnson MedTech, New Brunswick, NJ, USA
25
Kawamura K, Ebata R, Nakamura R, Otori N. Improving situation recognition using endoscopic videos and navigation information for endoscopic sinus surgery. Int J Comput Assist Radiol Surg 2023; 18:9-16. [PMID: 36151349 DOI: 10.1007/s11548-022-02754-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/30/2022] [Accepted: 09/09/2022] [Indexed: 02/01/2023]
Abstract
PURPOSE Endoscopic sinus surgery (ESS) is widely used to treat chronic sinusitis. However, it involves the use of surgical instruments in a narrow surgical field in close proximity to vital organs, such as the brain and eyes. Thus, an advanced level of surgical skill is expected of surgeons performing this surgery. In a previous study, endoscopic images and surgical navigation information were used to develop an automatic situation recognition method for ESS. In this study, we aimed to develop a more accurate automatic surgical situation recognition method for ESS by improving the method proposed in our previous study and adding post-processing to remove incorrect recognitions. METHOD We examined the training model parameters and the number of long short-term memory (LSTM) units, modified the input data augmentation method, and added post-processing. We also evaluated the modified method using clinical data. RESULT The proposed modifications improved the overall scene recognition accuracy compared with the previous study; however, phase recognition did not exhibit significant improvement. In addition, applying a one-dimensional median filter to the time-series results significantly reduced short-duration false recognitions. Furthermore, post-processing that constrains scene transitions was required to further improve recognition accuracy. CONCLUSION Scene recognition could be improved by tuning the model parameters and adding the one-dimensional filter and post-processing. However, the scene recognition accuracy remained unsatisfactory; thus, more accurate scene recognition and appropriate post-processing methods are required.
Affiliation(s)
- Kazuya Kawamura
- Center for Frontier Medical Engineering, Chiba University, Chiba, Japan
- Ryu Ebata
- Graduate School of Science and Engineering, Chiba University, Chiba, Japan
- Ryoichi Nakamura
- Center for Frontier Medical Engineering, Chiba University, Chiba, Japan; Institute of Biomaterials and Bioengineering, Tokyo Medical and Dental University, Tokyo, Japan
- Nobuyoshi Otori
- Department of Otorhinolaryngology, The Jikei University School of Medicine, Tokyo, Japan
26
Temporal-based Swin Transformer network for workflow recognition of surgical video. Int J Comput Assist Radiol Surg 2023; 18:139-147. [PMID: 36331795 DOI: 10.1007/s11548-022-02785-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/21/2022] [Accepted: 10/21/2022] [Indexed: 11/06/2022]
Abstract
PURPOSE Surgical workflow recognition has emerged as an important part of computer-assisted intervention systems for the modern operating room, and it remains a very challenging problem. Although CNN-based approaches achieve excellent performance, they do not learn global and long-range semantic interactions well due to the inductive bias inherent in convolution. METHODS In this paper, we propose a temporal-based Swin Transformer network (TSTNet) for the surgical video workflow recognition task. TSTNet contains two main parts: the Swin Transformer and the LSTM. The Swin Transformer incorporates the attention mechanism to encode remote dependencies and learn highly expressive representations. The LSTM is capable of learning long-range dependencies and is used to extract temporal information. TSTNet organically combines the two components to extract spatiotemporal features that contain more contextual information. In particular, based on a full understanding of the natural characteristics of surgical video, we propose a priori revision algorithm (PRA) that uses a priori information about the sequence of surgical phases. This strategy optimizes the output of TSTNet and further improves recognition performance. RESULTS We conduct extensive experiments using the Cholec80 dataset to validate the effectiveness of the TSTNet-PRA method. Our method achieves excellent performance on the Cholec80 dataset, with accuracy of up to 92.8%, greatly exceeding state-of-the-art methods. CONCLUSION By modelling remote temporal information and multi-scale visual information, we propose the TSTNet-PRA method. It was evaluated on a large public dataset, showing a high recognition capability superior to other spatiotemporal networks.
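Illustrative sketch (not from the cited paper): the abstract describes a two-part design in which per-frame features are fed to an LSTM temporal model. The following minimal numpy implementation of a single LSTM cell step over hypothetical per-frame feature vectors shows the gating mechanism that lets such a model carry long-range temporal state; dimensions and weights are made up for illustration.

```python
import numpy as np

def lstm_step(x, h, c, W, U, b):
    """One LSTM step over a frame-feature vector x.

    W: (4*H, D) input weights, U: (4*H, H) recurrent weights, b: (4*H,) bias.
    Gate order in the stacked weights: input, forget, candidate, output.
    """
    H = h.shape[0]
    z = W @ x + U @ h + b
    i = 1 / (1 + np.exp(-z[:H]))          # input gate
    f = 1 / (1 + np.exp(-z[H:2*H]))       # forget gate
    g = np.tanh(z[2*H:3*H])               # candidate cell state
    o = 1 / (1 + np.exp(-z[3*H:]))        # output gate
    c_new = f * c + i * g                 # blend old state with new candidate
    h_new = o * np.tanh(c_new)            # exposed hidden state
    return h_new, c_new

# Run a toy sequence of 5 "frame features" through the cell.
rng = np.random.default_rng(0)
D, H = 8, 4
W = rng.normal(size=(4 * H, D)) * 0.1
U = rng.normal(size=(4 * H, H)) * 0.1
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for _ in range(5):
    h, c = lstm_step(rng.normal(size=D), h, c, W, U, b)
print(h.shape)  # (4,)
```

In a TSTNet-style pipeline the toy random inputs would be replaced by the Swin Transformer's per-frame embeddings.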
27
Park M, Oh S, Jeong T, Yu S. Multi-Stage Temporal Convolutional Network with Moment Loss and Positional Encoding for Surgical Phase Recognition. Diagnostics (Basel) 2022; 13:diagnostics13010107. [PMID: 36611399 PMCID: PMC9818879 DOI: 10.3390/diagnostics13010107] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/29/2022] [Revised: 12/28/2022] [Accepted: 12/28/2022] [Indexed: 12/31/2022] Open
Abstract
In recent times, many studies concerning surgical video analysis have been conducted due to its growing importance in many medical applications. In particular, it is very important to be able to recognize the current surgical phase because the phase information can be utilized in various ways both during and after surgery. This paper proposes an efficient phase recognition network, called MomentNet, for cholecystectomy endoscopic videos. Unlike LSTM-based networks, MomentNet is based on a multi-stage temporal convolutional network. In addition, to improve phase prediction accuracy, the proposed method adopts a new loss function to supplement the general cross-entropy loss function. The new loss function significantly improves the performance of the phase recognition network by constraining undesirable phase transitions and preventing over-segmentation. MomentNet also effectively applies positional encoding techniques, which are commonly used in transformer architectures, to the multi-stage temporal convolutional network. By using positional encoding, MomentNet can provide important temporal context, resulting in higher phase prediction accuracy. Furthermore, MomentNet applies a label smoothing technique to suppress overfitting and replaces the backbone network for feature extraction to further improve performance. As a result, MomentNet achieves 92.31% accuracy in the phase recognition task on the Cholec80 dataset, which is 4.55% higher than that of the baseline architecture.
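Illustrative sketch (not the paper's implementation): the positional encoding the abstract borrows from transformer architectures is typically the standard sinusoidal scheme, in which each time step gets a fixed vector of sines and cosines at geometrically spaced frequencies. The sequence length and dimension below are arbitrary.

```python
import numpy as np

def positional_encoding(T, D):
    """Sinusoidal positional encoding: PE[t, 2i] = sin, PE[t, 2i+1] = cos."""
    pos = np.arange(T)[:, None]                     # (T, 1) time steps
    i = np.arange(D // 2)[None, :]                  # (1, D/2) frequency indices
    angles = pos / (10000 ** (2 * i / D))           # (T, D/2) phase arguments
    pe = np.zeros((T, D))
    pe[:, 0::2] = np.sin(angles)                    # even dims: sine
    pe[:, 1::2] = np.cos(angles)                    # odd dims: cosine
    return pe

pe = positional_encoding(100, 16)
print(pe.shape)  # (100, 16)
```

Added to per-frame features before the temporal convolutions, this gives the network an explicit signal of where each frame sits in the sequence.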
Affiliation(s)
- Minyoung Park
- School of Electrical and Electronics Engineering, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul 06974, Republic of Korea
- Seungtaek Oh
- School of Electrical and Electronics Engineering, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul 06974, Republic of Korea
- Taikyeong Jeong
- School of Artificial Intelligence Convergence, Hallym University, Chuncheon 24252, Republic of Korea
- Correspondence: (T.J.); (S.Y.)
- Sungwook Yu
- School of Electrical and Electronics Engineering, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul 06974, Republic of Korea
- Correspondence: (T.J.); (S.Y.)
28
Ali S. Where do we stand in AI for endoscopic image analysis? Deciphering gaps and future directions. NPJ Digit Med 2022; 5:184. [PMID: 36539473 PMCID: PMC9767933 DOI: 10.1038/s41746-022-00733-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/04/2022] [Accepted: 11/29/2022] [Indexed: 12/24/2022] Open
Abstract
Recent developments in deep learning have enabled data-driven algorithms that can reach human-level performance and beyond. The development and deployment of medical image analysis methods face several challenges, including data heterogeneity due to population diversity and different device manufacturers. In addition, more input from experts is required for a reliable method development process. While the exponential growth in clinical imaging data has enabled deep learning to flourish, data heterogeneity, multi-modality, and rare or inconspicuous disease cases still need to be explored. Because endoscopy is highly operator-dependent, with grim clinical outcomes in some disease cases, reliable and accurate automated system guidance can improve patient care. Most existing methods are insufficiently generalisable to unseen target data, patient population variability, and variable disease appearances. This paper reviews recent works on endoscopic image analysis with artificial intelligence (AI) and emphasises the current unmet needs in this field. Finally, it outlines future directions for clinically relevant, complex AI solutions to improve patient outcomes.
Affiliation(s)
- Sharib Ali
- School of Computing, University of Leeds, Leeds LS2 9JT, UK
29
Bastian L, Czempiel T, Heiliger C, Karcz K, Eck U, Busam B, Navab N. Know your sensors — a modality study for surgical action classification. COMPUTER METHODS IN BIOMECHANICS AND BIOMEDICAL ENGINEERING: IMAGING & VISUALIZATION 2022. [DOI: 10.1080/21681163.2022.2152377] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/23/2022]
Affiliation(s)
- Lennart Bastian
- Chair for Computer Aided Medical Procedures, TU Munich, Munich, Germany
- Tobias Czempiel
- Chair for Computer Aided Medical Procedures, TU Munich, Munich, Germany
- Christian Heiliger
- Department of General, Visceral, and Transplant Surgery, University Hospital, LMU Munich, Munich, Germany
- Konrad Karcz
- Department of General, Visceral, and Transplant Surgery, University Hospital, LMU Munich, Munich, Germany
- Ulrich Eck
- Chair for Computer Aided Medical Procedures, TU Munich, Munich, Germany
- Benjamin Busam
- Chair for Computer Aided Medical Procedures, TU Munich, Munich, Germany
- Nassir Navab
- Chair for Computer Aided Medical Procedures, TU Munich, Munich, Germany
- Computer Aided Medical Procedures, Johns Hopkins University, Baltimore, Maryland, USA
30
Zhang B, Sturgeon D, Shankar AR, Goel VK, Barker J, Ghanem A, Lee P, Milecky M, Stottler N, Petculescu S. Surgical instrument recognition for instrument usage documentation and surgical video library indexing. COMPUTER METHODS IN BIOMECHANICS AND BIOMEDICAL ENGINEERING: IMAGING & VISUALIZATION 2022. [DOI: 10.1080/21681163.2022.2152371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]
Affiliation(s)
- Bokai Zhang
- Digital Solutions, Johnson & Johnson MedTech, Seattle, WA, USA
- Darrick Sturgeon
- Digital Solutions, Johnson & Johnson MedTech, Santa Clara, CA, USA
- Jocelyn Barker
- Digital Solutions, Johnson & Johnson MedTech, Santa Clara, CA, USA
- Amer Ghanem
- Digital Solutions, Johnson & Johnson MedTech, Seattle, WA, USA
- Philip Lee
- Digital Solutions, Johnson & Johnson MedTech, Santa Clara, CA, USA
- Meghan Milecky
- Digital Solutions, Johnson & Johnson MedTech, Seattle, WA, USA
31
Fang L, Mou L, Gu Y, Hu Y, Chen B, Chen X, Wang Y, Liu J, Zhao Y. Global-local multi-stage temporal convolutional network for cataract surgery phase recognition. Biomed Eng Online 2022; 21:82. [PMID: 36451164 PMCID: PMC9710114 DOI: 10.1186/s12938-022-01048-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/10/2022] [Accepted: 11/04/2022] [Indexed: 12/02/2022] Open
Abstract
BACKGROUND Surgical video phase recognition is an essential technique in computer-assisted surgical systems for monitoring surgical procedures; it can assist surgeons in standardizing procedures and enhancing postsurgical assessment and indexing. However, the high similarity between phases and the temporal variations of cataract videos still pose the greatest challenges for video phase recognition. METHODS In this paper, we introduce a global-local multi-stage temporal convolutional network (GL-MSTCN) to explore the subtle differences between highly similar surgical phases and mitigate the temporal variations of surgical videos. The presented work consists of a triple-stream network (i.e., pupil stream, instrument stream, and video frame stream) and a multi-stage temporal convolutional network. The triple-stream network first detects the pupil and surgical instrument regions in each frame separately and then obtains fine-grained semantic features of the video frames. The proposed multi-stage temporal convolutional network improves surgical phase recognition performance by capturing longer time-series features through dilated convolutional layers with varying receptive fields. RESULTS Our method is thoroughly validated on the CSVideo dataset with 32 cataract surgery videos and the public Cataract101 dataset with 101 cataract surgery videos, outperforming state-of-the-art approaches with 95.8% and 96.5% accuracy, respectively. CONCLUSIONS The experimental results show that the use of global and local feature information can effectively help the model explore fine-grained features and mitigate temporal and spatial variations, thus improving the surgical phase recognition performance of the proposed GL-MSTCN.
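Illustrative sketch (not the GL-MSTCN configuration): the dilated convolutional layers the abstract mentions grow the receptive field exponentially with depth. A minimal causal dilated 1-D convolution, plus the receptive-field arithmetic for a toy stack with kernel size 3 and dilations 1, 2, 4, 8:

```python
import numpy as np

def causal_dilated_conv1d(x, w, dilation):
    """Causal dilated 1-D convolution: output[t] depends only on x[<= t]."""
    k = len(w)
    pad = (k - 1) * dilation                 # left-pad so no future leakage
    xp = np.concatenate([np.zeros(pad), x])
    return np.array([
        sum(w[j] * xp[t + pad - j * dilation] for j in range(k))
        for t in range(len(x))
    ])

# Receptive field of a stack of causal dilated layers: 1 + sum((k - 1) * d).
k = 3
dilations = [1, 2, 4, 8]
rf = 1 + sum((k - 1) * d for d in dilations)
print(rf)  # 31
```

Four small layers already see 31 frames back, which is why stacked dilations suit long, untrimmed surgical videos.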
Affiliation(s)
- Lixin Fang
- College of Mechanical Engineering, Zhejiang University of Technology, Hangzhou 310014, China; Cixi Institute of Biomedical Engineering, Ningbo Institute of Materials Technology and Engineering, Chinese Academy of Sciences, Ningbo, China
- Lei Mou
- Cixi Institute of Biomedical Engineering, Ningbo Institute of Materials Technology and Engineering, Chinese Academy of Sciences, Ningbo, China
- Yuanyuan Gu
- Cixi Institute of Biomedical Engineering, Ningbo Institute of Materials Technology and Engineering, Chinese Academy of Sciences, Ningbo, China; Zhejiang Engineering Research Center for Biomedical Materials, Cixi Institute of Biomedical Engineering, Ningbo Institute of Materials Technology and Engineering, Chinese Academy of Sciences, Ningbo 315300, China
- Yan Hu
- Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen 518055, China
- Bang Chen
- Cixi Institute of Biomedical Engineering, Ningbo Institute of Materials Technology and Engineering, Chinese Academy of Sciences, Ningbo, China
- Xu Chen
- Department of Ophthalmology, Shanghai Aier Eye Hospital, Shanghai, China; Department of Ophthalmology, Shanghai Aier Qingliang Eye Hospital, Shanghai, China; Aier Eye Hospital, Jinan University, No. 601, Huangpu Road West, Guangzhou, China; Aier School of Ophthalmology, Central South University, Changsha, Hunan, China
- Yang Wang
- Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China
- Jiang Liu
- Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen 518055, China
- Yitian Zhao
- Cixi Institute of Biomedical Engineering, Ningbo Institute of Materials Technology and Engineering, Chinese Academy of Sciences, Ningbo, China; Zhejiang Engineering Research Center for Biomedical Materials, Cixi Institute of Biomedical Engineering, Ningbo Institute of Materials Technology and Engineering, Chinese Academy of Sciences, Ningbo 315300, China
32
Nema S, Vachhani L. Surgical instrument detection and tracking technologies: Automating dataset labeling for surgical skill assessment. Front Robot AI 2022; 9:1030846. [PMID: 36405072 PMCID: PMC9671944 DOI: 10.3389/frobt.2022.1030846] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/29/2022] [Accepted: 10/14/2022] [Indexed: 11/06/2022] Open
Abstract
Surgical skills can be improved by continuous surgical training and feedback, thus reducing adverse outcomes while performing an intervention. With the advent of new technologies, researchers now have the tools to analyze surgical instrument motion to differentiate surgeons’ levels of technical skill. Surgical skills assessment is time-consuming and prone to subjective interpretation. Surgical instrument detection and tracking algorithms analyze the images captured by the surgical robotic endoscope and extract the movement and orientation information of a surgical instrument to provide surgical navigation. This information can be used to label raw surgical video datasets that form an action space for surgical skill analysis. Instrument detection and tracking is a challenging problem in minimally invasive surgery (MIS), including robot-assisted surgeries, but vision-based approaches provide promising solutions with minimal hardware integration requirements. This study offers an overview of the development of assessment systems for surgical intervention analysis. Its purpose is to identify the research gap and advance the technology needed to automate the incorporation of new surgical skills. A prime factor in automating this learning is the creation of datasets from raw surgical videos with minimal manual intervention. This review encapsulates the current trends in artificial intelligence (AI)-based visual detection and tracking technologies for surgical instruments and their application to surgical skill assessment.
33
Jin Y, Long Y, Gao X, Stoyanov D, Dou Q, Heng PA. Trans-SVNet: hybrid embedding aggregation Transformer for surgical workflow analysis. Int J Comput Assist Radiol Surg 2022; 17:2193-2202. [PMID: 36129573 DOI: 10.1007/s11548-022-02743-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/29/2022] [Accepted: 08/31/2022] [Indexed: 11/05/2022]
Abstract
PURPOSE Real-time surgical workflow analysis has been a key component of computer-assisted intervention systems for improving cognitive assistance. Most existing methods rely solely on conventional temporal models and encode features with a successive spatial-temporal arrangement, so the supportive benefits of intermediate features are partially lost from both the visual and temporal aspects. In this paper, we rethink feature encoding to attend to and preserve the critical information needed for accurate workflow recognition and anticipation. METHODS We introduce the Transformer to surgical workflow analysis to reconsider the complementary effects of spatial and temporal representations. We propose a hybrid embedding aggregation Transformer, named Trans-SVNet, that effectively interacts with the designed spatial and temporal embeddings by employing the spatial embedding to query the temporal embedding sequence. We jointly optimize with loss objectives from both analysis tasks to leverage their high correlation. RESULTS We extensively evaluate our method on three large surgical video datasets. Our method consistently outperforms the state-of-the-art across all three datasets on the workflow recognition task, and recognition results gain a large improvement when jointly learned with anticipation. Our approach also shows promising performance on the anticipation task. Our model achieves a real-time inference speed of 0.0134 seconds per frame. CONCLUSION Experimental results demonstrate the efficacy of our hybrid embedding integration in rediscovering crucial cues from complementary spatial-temporal embeddings. The better performance under multi-task learning indicates that the anticipation task brings additional knowledge to the recognition task. The effectiveness and efficiency of our method also show its potential for use in the operating room.
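Illustrative sketch (not the Trans-SVNet architecture): "employing the spatial embedding to query the temporal embedding sequence" is, at its core, scaled dot-product attention with one query vector over a sequence of keys and values. Dimensions below are hypothetical.

```python
import numpy as np

def attend(q, K, V):
    """Scaled dot-product attention: one query vector over a key/value sequence."""
    scores = K @ q / np.sqrt(len(q))        # (T,) similarity of query to each step
    w = np.exp(scores - scores.max())       # numerically stable softmax
    w = w / w.sum()                         # attention weights sum to 1
    return w @ V                            # weighted sum of temporal values

rng = np.random.default_rng(1)
d, T = 6, 10
q = rng.normal(size=d)                      # spatial embedding (query)
K = rng.normal(size=(T, d))                 # temporal embeddings (keys)
V = rng.normal(size=(T, d))                 # temporal embeddings (values)
out = attend(q, K, V)
print(out.shape)  # (6,)
```

The output is a convex combination of the temporal values, weighted by how well each time step matches the current frame's spatial embedding.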
Affiliation(s)
- Yueming Jin
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS), Department of Computer Science, University College London, London, UK
- Yonghao Long
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, Hong Kong, China
- Xiaojie Gao
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, Hong Kong, China
- Danail Stoyanov
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS), Department of Computer Science, University College London, London, UK
- Qi Dou
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, Hong Kong, China; Institute of Medical Intelligence and XR, The Chinese University of Hong Kong, Shatin, Hong Kong, China
- Pheng-Ann Heng
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, Hong Kong, China; Institute of Medical Intelligence and XR, The Chinese University of Hong Kong, Shatin, Hong Kong, China
34
Anticipation for surgical workflow through instrument interaction and recognized Signals. Med Image Anal 2022; 82:102611. [PMID: 36162336 DOI: 10.1016/j.media.2022.102611] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/20/2022] [Revised: 07/16/2022] [Accepted: 08/30/2022] [Indexed: 10/31/2022]
Abstract
Surgical workflow anticipation is an essential task for computer-assisted intervention (CAI) systems. It aims at predicting future surgical phase and instrument occurrence, providing support for intra-operative decision-support systems. Recent studies have promoted the development of the anticipation task by transforming it into a remaining-time prediction problem, but without factoring the surgical instruments' behaviors and their interactions with surrounding anatomies into the network design. In this paper, we propose an Instrument Interaction Aware Anticipation Network (IIA-Net) to overcome this deficiency while retaining the merits of two-stage models through a spatial feature extractor and a temporal model. Spatially, the feature extractor utilizes tooltip movement to extract instrument-instrument interactions, which helps the model concentrate on the surgeon's actions. It also introduces a segmentation map to capture rich features about the instruments' surroundings. Temporally, the temporal model applies a causal dilated multi-stage temporal convolutional network to capture long-term dependencies in long, untrimmed surgical videos with a large receptive field. IIA-Net enforces online inference with reliable predictions even under severe noise and artifacts in the recorded videos and presence signals. Extensive experiments on the Cholec80 dataset demonstrate that the performance of our proposed method exceeds the state-of-the-art method by a large margin (1.03 vs. 1.12 for MAEw, 1.40 vs. 1.75 for MAEin and 2.14 vs. 2.68 for MAEe). For reproduction purposes, all original code is made public at https://github.com/Flaick/Surgical-Workflow-Anticipation.
35
Surgical Tool Datasets for Machine Learning Research: A Survey. Int J Comput Vis 2022. [DOI: 10.1007/s11263-022-01640-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/05/2022]
Abstract
This paper is a comprehensive survey of datasets for surgical tool detection and related surgical data science and machine learning techniques and algorithms. The survey offers a high-level perspective of current research in this area, analyses the taxonomy of approaches adopted by researchers using surgical tool datasets, and addresses key areas of research, such as the datasets used, evaluation metrics applied and deep learning techniques utilised. Our presentation and taxonomy provide a framework that facilitates greater understanding of current work, and highlight the challenges and opportunities for further innovative and useful research.
36
Sánchez-Brizuela G, Santos-Criado FJ, Sanz-Gobernado D, de la Fuente-López E, Fraile JC, Pérez-Turiel J, Cisnal A. Gauze Detection and Segmentation in Minimally Invasive Surgery Video Using Convolutional Neural Networks. SENSORS (BASEL, SWITZERLAND) 2022; 22:5180. [PMID: 35890857 PMCID: PMC9319965 DOI: 10.3390/s22145180] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 06/13/2022] [Revised: 06/30/2022] [Accepted: 07/07/2022] [Indexed: 06/15/2023]
Abstract
Medical instrument detection in laparoscopic video has been carried out to increase the autonomy of surgical robots, evaluate skills, or index recordings. However, it has not been extended to surgical gauzes. Gauzes can provide valuable information for numerous tasks in the operating room, but the lack of an annotated dataset has hampered research on them. In this article, we present a segmentation dataset with 4003 hand-labelled frames from laparoscopic video. To prove the dataset's potential, we analyzed several baselines: detection using YOLOv3, coarse segmentation, and segmentation with a U-Net. Our results show that YOLOv3 can be executed in real time but provides only a modest recall. Coarse segmentation presents satisfactory results but lacks inference speed. Finally, the U-Net baseline achieves a good speed-quality compromise, running above 30 FPS while obtaining an IoU of 0.85. The accuracy reached by the U-Net and its execution speed demonstrate that precise, real-time gauze segmentation can be achieved by training convolutional neural networks on the proposed dataset.
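Illustrative sketch (not from the cited paper): the IoU figure quoted above is the standard intersection-over-union overlap metric for segmentation masks. The toy binary masks below are made up to show the computation.

```python
import numpy as np

def iou(pred, gt):
    """Intersection-over-union between two binary masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 1.0   # both empty: perfect agreement

pred = np.zeros((4, 4), bool)
pred[:2, :] = True                           # predicted mask: top two rows
gt = np.zeros((4, 4), bool)
gt[:3, :] = True                             # ground truth: top three rows
print(round(iou(pred, gt), 3))  # 0.667
```

An IoU of 0.85, as reported for the U-Net baseline, means the predicted and ground-truth gauze masks share 85% of their combined area.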
Affiliation(s)
- Guillermo Sánchez-Brizuela
- Instituto de las Tecnologías Avanzadas de la Producción (ITAP), Universidad de Valladolid, Paseo del Cauce 59, 47011 Valladolid, Spain
- Francisco-Javier Santos-Criado
- Escuela Técnica Superior de Ingenieros Industriales, Universidad Politécnica de Madrid, Calle de José Gutiérrez Abascal, 2, 28006 Madrid, Spain
- Daniel Sanz-Gobernado
- Instituto de las Tecnologías Avanzadas de la Producción (ITAP), Universidad de Valladolid, Paseo del Cauce 59, 47011 Valladolid, Spain
- Eusebio de la Fuente-López
- Instituto de las Tecnologías Avanzadas de la Producción (ITAP), Universidad de Valladolid, Paseo del Cauce 59, 47011 Valladolid, Spain
- Juan-Carlos Fraile
- Instituto de las Tecnologías Avanzadas de la Producción (ITAP), Universidad de Valladolid, Paseo del Cauce 59, 47011 Valladolid, Spain
- Javier Pérez-Turiel
- Instituto de las Tecnologías Avanzadas de la Producción (ITAP), Universidad de Valladolid, Paseo del Cauce 59, 47011 Valladolid, Spain
- Ana Cisnal
- Instituto de las Tecnologías Avanzadas de la Producción (ITAP), Universidad de Valladolid, Paseo del Cauce 59, 47011 Valladolid, Spain
37
Alshirbaji TA, Jalal NA, Docherty PD, Neumuth PT, Moller K. Improving the Generalisability of Deep CNNs by Combining Multi-stage Features for Surgical Tool Classification. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2022; 2022:533-536. [PMID: 36086626 DOI: 10.1109/embc48229.2022.9870883] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/15/2023]
Abstract
Dataset characteristics play an important role in training convolutional neural networks (CNNs) to evolve the optimal features required to perform a specific task. Due to the high cost of recording and labelling surgical data, available datasets are relatively small and have been predominantly acquired at single sites. CNN-based approaches have been widely adopted for analysing surgical workflow using single-site datasets; therefore, generalised performance on data from different institutions has not been investigated. In this work, a CNN model that combines features from multiple stages to achieve more accurate and generalised tool classification was introduced. An extensive evaluation of the proposed approach on three different datasets showed better generalised performance compared with base CNN models. The proposed approach achieved mAP values of 91.46%, 69.02% and 37.14% on the Cholec80, Cholec20 and Gyna05 datasets, respectively, improving on the base CNN models' mAP by about 7%. Clinical relevance: this research proposes a method to improve the generalisation capability of CNN models, which will have a positive impact on developing more robust assistive systems that can support the surgeon and improve patient care.
38
Data-centric multi-task surgical phase estimation with sparse scene segmentation. Int J Comput Assist Radiol Surg 2022; 17:953-960. [PMID: 35505149 PMCID: PMC9110447 DOI: 10.1007/s11548-022-02616-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/21/2022] [Accepted: 03/22/2022] [Indexed: 11/27/2022]
Abstract
Purpose Surgical workflow estimation techniques aim to divide a surgical video into temporal segments based on predefined surgical actions or objectives, which can be of different granularity, such as steps or phases. Potential applications range from real-time intra-operative feedback to automatic post-operative reports and analysis. A common approach in the literature for automatic surgical phase estimation is to decouple the problem into two stages: feature extraction from a single frame and temporal feature fusion. The problem is split into two stages because of computational restrictions when processing large spatio-temporal sequences. Methods The majority of existing works focus on pushing performance solely through temporal model development. Differently, we follow a data-centric approach and propose a training pipeline that enables models to maximise the usage of existing datasets, which are generally used in isolation. Specifically, we use the dense phase annotations available in Cholec80 and the sparse scene (i.e., instrument and anatomy) segmentation annotations available in CholecSeg8k for less than 5% of the overlapping frames. We propose a simple multi-task encoder that effectively fuses both streams, when available, based on their importance, and jointly optimises them for accurate phase prediction. Results and conclusion We show that with a small fraction of scene segmentation annotations, a relatively simple model can obtain results comparable to previous state-of-the-art and more complex architectures when evaluated in similar settings. We hope that this data-centric approach can encourage new research directions where data, and how to use it, plays an important role alongside model development.
|
39
|
Hybrid Spatiotemporal Contrastive Representation Learning for Content-Based Surgical Video Retrieval. ELECTRONICS 2022. [DOI: 10.3390/electronics11091353] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/04/2023]
Abstract
In the medical field, due to their economic and clinical benefits, there is a growing interest in minimally invasive surgeries and microscopic surgeries. These types of surgeries are often recorded during operations, and these recordings have become a key resource for education, patient disease analysis, surgical error analysis, and surgical skill assessment. However, manually searching this collection of long surgical videos is an extremely labor-intensive and time-consuming task, requiring an effective content-based video analysis system. In this regard, previous methods for surgical video retrieval are based on handcrafted features which do not represent the video effectively. On the other hand, deep learning-based solutions were found to be effective in both surgical image and video analysis, where CNN-, LSTM- and CNN-LSTM-based methods were proposed in most surgical video analysis tasks. In this paper, we propose a hybrid spatiotemporal embedding method to enhance spatiotemporal representations using an adaptive fusion layer on top of the LSTM and temporal causal convolutional modules. To learn surgical video representations, we explore a supervised contrastive learning approach to leverage label information in addition to augmented versions. By validating our approach on a video retrieval task on two datasets, Surgical Actions 160 and Cataract-101, we significantly improve on previous results in terms of mean average precision, 30.012 ± 1.778 vs. 22.54 ± 1.557 for Surgical Actions 160 and 81.134 ± 1.28 vs. 33.18 ± 1.311 for Cataract-101. We also validate the proposed method’s suitability for the surgical phase recognition task using the benchmark Cholec80 surgical dataset, where our approach outperforms the state of the art (with 90.2% accuracy).
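The retrieval scores above are reported as mean average precision (mAP). For reference, here is a minimal sketch of how mAP is computed for a retrieval task; the function names are illustrative and not taken from the paper:

```python
def average_precision(ranked_relevance):
    """AP for one query: ranked_relevance is a list of 0/1 relevance
    flags, ordered by decreasing retrieval score."""
    hits, precisions = 0, []
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)  # precision at each hit
    return sum(precisions) / max(hits, 1)   # 0.0 if no relevant items

def mean_average_precision(all_queries):
    """mAP: average of per-query AP values."""
    return sum(average_precision(q) for q in all_queries) / len(all_queries)
```

For example, a ranking `[1, 0, 1]` (relevant, irrelevant, relevant) yields AP = (1/1 + 2/3) / 2 ≈ 0.833.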
|
40
|
Li L, Li X, Ding S, Fang Z, Xu M, Ren H, Yang S. SIRNet: Fine-Grained Surgical Interaction Recognition. IEEE Robot Autom Lett 2022. [DOI: 10.1109/lra.2022.3148454] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]
|
41
|
Das A, Bano S, Vasconcelos F, Khan DZ, Marcus HJ, Stoyanov D. Reducing prediction volatility in the surgical workflow recognition of endoscopic pituitary surgery. Int J Comput Assist Radiol Surg 2022; 17:1445-1452. [PMID: 35362848 PMCID: PMC9307536 DOI: 10.1007/s11548-022-02599-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/03/2022] [Accepted: 03/08/2022] [Indexed: 11/25/2022]
Abstract
Purpose: Workflow recognition can aid surgeons before an operation when used as a training tool, during an operation by increasing operating room efficiency, and after an operation in the completion of operation notes. Although several methods have been applied to this task, they have been tested on few surgical datasets. Therefore, their generalisability is not well tested, particularly for surgical approaches utilising smaller working spaces, which are susceptible to occlusion and necessitate frequent withdrawal of the endoscope. This leads to rapidly changing predictions, which reduces the clinical confidence in the methods and hence limits their suitability for clinical translation. Methods: Firstly, the optimal neural network is found using established methods, with endoscopic pituitary surgery as an exemplar. Then, prediction volatility is formally defined as a new evaluation metric serving as a proxy for uncertainty, and two temporal smoothing functions are created. The first (modal, $M_n$) mode-averages over the previous n predictions, and the second (threshold, $T_n$) ensures a class is only changed after being continuously predicted for n predictions. Both functions are independently applied to the predictions of the optimal network. Results: The methods are evaluated on a 50-video dataset using fivefold cross-validation, and the optimised evaluation metric is the weighted $F_1$ score. The optimal model is ResNet-50+LSTM, achieving 0.84 in 3-phase classification and 0.74 in 7-step classification. Applying threshold smoothing further improves these results, achieving 0.86 in 3-phase classification and 0.75 in 7-step classification, while also drastically reducing the prediction volatility. Conclusion: The results confirm that the established methods generalise to endoscopic pituitary surgery, and show that simple temporal smoothing not only reduces prediction volatility but actively improves performance.
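The two smoothing functions described in the abstract are simple enough to sketch. The following is an illustrative reimplementation based only on the abstract's description (modal averaging and threshold switching), not the authors' code; it assumes a non-empty prediction sequence:

```python
from collections import Counter, deque

def modal_smoothing(preds, n):
    """M_n: replace each prediction with the mode of a sliding window
    over the previous n predictions (including the current one)."""
    window, out = deque(maxlen=n), []
    for p in preds:
        window.append(p)
        out.append(Counter(window).most_common(1)[0][0])
    return out

def threshold_smoothing(preds, n):
    """T_n: only switch to a new class after it has been predicted for
    n consecutive frames; otherwise keep emitting the current class."""
    current, candidate, run, out = preds[0], None, 0, []
    for p in preds:
        if p == current:
            candidate, run = None, 0          # agreement resets the count
        elif p == candidate:
            run += 1
            if run >= n:                      # n consecutive votes: switch
                current, candidate, run = p, None, 0
        else:
            candidate, run = p, 1             # start counting a new candidate
        out.append(current)
    return out
```

With `n = 3`, a one-frame flicker in `[0, 0, 1, 0, 1, 1, 1]` is suppressed by `threshold_smoothing` until the new class has held for three consecutive frames.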
Affiliation(s)
- Adrito Das
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, London, United Kingdom.
- Sophia Bano
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, London, United Kingdom
- Francisco Vasconcelos
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, London, United Kingdom
- Danyal Z Khan
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, London, United Kingdom
- Department of Neurosurgery, National Hospital for Neurology and Neurosurgery, London, United Kingdom
- Hani J Marcus
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, London, United Kingdom
- Department of Neurosurgery, National Hospital for Neurology and Neurosurgery, London, United Kingdom
- Danail Stoyanov
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, London, United Kingdom
|
42
|
Kadkhodamohammadi A, Luengo I, Stoyanov D. PATG: position-aware temporal graph networks for surgical phase recognition on laparoscopic videos. Int J Comput Assist Radiol Surg 2022; 17:849-856. [PMID: 35353299 DOI: 10.1007/s11548-022-02600-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/27/2022] [Accepted: 03/08/2022] [Indexed: 11/27/2022]
Abstract
PURPOSE We tackle the problem of online surgical phase recognition in laparoscopic procedures, which is key to developing context-aware supporting systems. We propose a novel approach that takes temporal context in surgical videos into account through precise modeling of temporal neighborhoods. METHODS We propose a two-stage model to perform phase recognition. A CNN model is used as a feature extractor to project RGB frames into a high-dimensional feature space. We introduce a novel paradigm for surgical phase recognition which utilizes graph neural networks to incorporate temporal information. Unlike recurrent neural networks and temporal convolution networks, our graph-based approach offers a more generic and flexible way of modeling temporal relationships. Each frame is a node in the graph, and the edges in the graph are used to define temporal connections among the nodes. The flexible configuration of the temporal neighborhood comes at the price of losing temporal order. To mitigate this, our approach takes temporal order into account by encoding frame positions, which is important for reliably predicting surgical phases. RESULTS Experiments are carried out on the public Cholec80 dataset, which contains 80 annotated videos. The experimental results highlight the superior performance of the proposed approach compared to state-of-the-art models on this dataset. CONCLUSION A novel approach for formulating video-based surgical phase recognition is presented. The results indicate that temporal information can be incorporated using graph-based models, and that positional encoding is important to efficiently utilize temporal information. Graph networks open possibilities for using evidence theory for uncertainty analysis in surgical phase recognition.
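To make the graph formulation concrete, here is a hedged sketch of the two ingredients the abstract names: a temporal-neighborhood edge set over frame nodes, and a positional encoding of frame index. The function names and the choice of a standard sinusoidal scheme are assumptions for illustration, not details taken from the paper:

```python
import numpy as np

def temporal_edges(num_frames, k):
    """Connect each frame (node) to its k preceding frames, giving an
    online (causal) temporal neighborhood as (src, dst) index pairs."""
    return [(j, i) for i in range(num_frames)
                   for j in range(max(0, i - k), i)]

def positional_encoding(num_frames, dim):
    """Sinusoidal encoding of frame position, added to node features so
    a graph model can recover temporal order lost by the edge set."""
    pos = np.arange(num_frames)[:, None]          # (frames, 1)
    i = np.arange(dim)[None, :]                   # (1, dim)
    angles = pos / np.power(10000.0, (2 * (i // 2)) / dim)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))
```

Node features for frame `t` would then be the CNN embedding plus `positional_encoding(...)[t]`, with `temporal_edges` defining message-passing connectivity.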
Affiliation(s)
- Imanol Luengo
- Innovation Department, Medtronic Digital Surgery, 230 City Road, London, EC1V 2QY, UK
- Danail Stoyanov
- Innovation Department, Medtronic Digital Surgery, 230 City Road, London, EC1V 2QY, UK
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, London, UK
|
43
|
Junger D, Frommer SM, Burgert O. State-of-the-art of situation recognition systems for intraoperative procedures. Med Biol Eng Comput 2022; 60:921-939. [PMID: 35178622 PMCID: PMC8933302 DOI: 10.1007/s11517-022-02520-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/16/2020] [Accepted: 01/30/2022] [Indexed: 11/05/2022]
Abstract
One of the key challenges for automatic assistance is supporting actors in the operating room depending on the status of the procedure. Therefore, context information collected in the operating room is used to gain knowledge about the current situation. In the literature, solutions already exist for specific use cases, but it is unclear to what extent these approaches can be transferred to other conditions. We conducted a comprehensive literature review of existing situation recognition systems for the intraoperative area, covering 274 articles and 95 cross-references published between 2010 and 2019. We contrasted and compared 58 identified approaches based on defined aspects such as the sensor data used or the application area. In addition, we discussed applicability and transferability. Most of the papers focus on video data for recognizing situations within laparoscopic and cataract surgeries. Not all of the approaches can be used online for real-time recognition. Using different methods, good results with recognition accuracies above 90% could be achieved. Overall, transferability is less addressed, and the applicability of approaches to other circumstances seems possible only to a limited extent. Future research should place a stronger focus on adaptability. The literature review shows differences within existing approaches for situation recognition and outlines research trends.
Affiliation(s)
- D Junger
- School of Informatics, Research Group Computer Assisted Medicine (CaMed), Reutlingen University, Alteburgstr. 150, 72762, Reutlingen, Germany.
- S M Frommer
- School of Informatics, Research Group Computer Assisted Medicine (CaMed), Reutlingen University, Alteburgstr. 150, 72762, Reutlingen, Germany
- O Burgert
- School of Informatics, Research Group Computer Assisted Medicine (CaMed), Reutlingen University, Alteburgstr. 150, 72762, Reutlingen, Germany
|
44
|
Artificial Intelligence in Surgery: A Research Team Perspective. Curr Probl Surg 2022; 59:101125. [DOI: 10.1016/j.cpsurg.2022.101125] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2022]
|
45
|
Zhang Y, Marsic I, Burd RS. Real-time medical phase recognition using long-term video understanding and progress gate method. Med Image Anal 2021; 74:102224. [PMID: 34543914 PMCID: PMC8560574 DOI: 10.1016/j.media.2021.102224] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/09/2020] [Revised: 08/31/2021] [Accepted: 09/02/2021] [Indexed: 01/10/2023]
Abstract
We introduce a real-time system for recognizing five phases of the trauma resuscitation process, the initial management of injured patients in the emergency department. We used depth videos as input to preserve the privacy of the patients and providers. The depth videos were recorded using a Kinect-v2 mounted on the sidewall of the room. Our dataset consisted of 183 depth videos of trauma resuscitations. The model was trained on 150 cases, each longer than 30 minutes, and tested on the remaining 33 cases. We introduced a reduced long-term operation (RLO) method for extracting features from long segments of video and combined it with the regular model, which uses short-term information only. The model with RLO outperformed the regular short-term model by 5% in accuracy. We also introduced a progress gate (PG) method to distinguish visually similar phases using video progress. The final system achieved 91% accuracy and significantly outperformed previous systems for phase recognition in this setting.
Affiliation(s)
- Yanyi Zhang
- Department of Electrical and Computer Engineering, Rutgers University, Piscataway, NJ 08854, USA.
- Ivan Marsic
- Department of Electrical and Computer Engineering, Rutgers University, Piscataway, NJ 08854, USA
- Randall S Burd
- Division of Trauma and Burn Surgery, Children's National Medical Center, Washington, DC 20010, USA
|
46
|
Unsupervised feature disentanglement for video retrieval in minimally invasive surgery. Med Image Anal 2021; 75:102296. [PMID: 34781159 DOI: 10.1016/j.media.2021.102296] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/16/2020] [Revised: 10/19/2021] [Accepted: 10/27/2021] [Indexed: 11/23/2022]
Abstract
In this paper, we propose a novel method of Unsupervised Disentanglement of Scene and Motion (UDSM) representations for minimally invasive surgery video retrieval within large databases, which has the potential to advance intelligent and efficient surgical teaching systems. To extract more discriminative video representations, two designed encoders with a triplet ranking loss and an adversarial learning mechanism are established to respectively capture the spatial and temporal information, achieving disentangled features from each frame with promising interpretability. In addition, long-range temporal dependencies are captured at the integrated video level using a temporal aggregation module, and a set of compact binary codes carrying representative features is then yielded to enable fast retrieval. The entire framework is trained in an unsupervised scheme, i.e., purely learning from raw surgical videos without using any annotation. We construct two large-scale minimally invasive surgery video datasets based on the public dataset Cholec80 and our in-house dataset of laparoscopic hysterectomy, to establish the learning process and validate the effectiveness of our proposed method qualitatively and quantitatively on the surgical video retrieval task. Extensive experiments show that our approach significantly outperforms state-of-the-art video retrieval methods on both datasets, revealing a promising future for injecting intelligence into the next generation of surgical teaching systems.
|
47
|
Huaulmé A, Sarikaya D, Le Mut K, Despinoy F, Long Y, Dou Q, Chng CB, Lin W, Kondo S, Bravo-Sánchez L, Arbeláez P, Reiter W, Mitsuishi M, Harada K, Jannin P. MIcro-surgical anastomose workflow recognition challenge report. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2021; 212:106452. [PMID: 34688174 DOI: 10.1016/j.cmpb.2021.106452] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/28/2021] [Accepted: 09/28/2021] [Indexed: 05/22/2023]
Abstract
BACKGROUND AND OBJECTIVE Automatic surgical workflow recognition is an essential step in developing context-aware computer-assisted surgical systems. Video recordings of surgeries are becoming widely accessible, as the operational field view is captured during laparoscopic surgeries. Head- and ceiling-mounted cameras are also increasingly being used to record videos in open surgeries. This makes videos a common choice in surgical workflow recognition. Additional modalities, such as kinematic data captured during robot-assisted surgeries, could also improve workflow recognition. This paper presents the design and results of the MIcro-Surgical Anastomose Workflow recognition on training sessions (MISAW) challenge, whose objective was to develop workflow recognition models based on kinematic data and/or videos. METHODS The MISAW challenge provided a data set of 27 sequences of micro-surgical anastomosis on artificial blood vessels. This data set was composed of videos, kinematics, and workflow annotations. The latter described the sequences at three different granularity levels: phase, step, and activity. Four tasks were proposed to the participants: three of them were related to the recognition of surgical workflow at the three granularity levels, while the last one addressed the recognition of all granularity levels in the same model. We used the average application-dependent balanced accuracy (AD-Accuracy) as the evaluation metric. This takes unbalanced classes into account and is more clinically relevant than a frame-by-frame score. RESULTS Six teams participated in at least one task. All models employed deep learning, such as convolutional neural networks (CNN), recurrent neural networks (RNN), or a combination of both. The best models achieved accuracy above 95%, 80%, 60%, and 75%, respectively, for recognition of phases, steps, activities, and multi-granularity. The RNN-based models outperformed the CNN-based ones, and the models dedicated to a single granularity outperformed the multi-granularity model, except for activity recognition. CONCLUSION For high levels of granularity, the best models had a recognition rate that may be sufficient for applications such as prediction of remaining surgical time. However, for activities, the recognition rate was still too low for applications that can be employed clinically. The MISAW data set is publicly available at http://www.synapse.org/MISAW to encourage further research in surgical workflow recognition.
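The challenge's AD-Accuracy is described as a balanced accuracy that accounts for unbalanced classes. As a point of reference, here is a minimal sketch of plain balanced accuracy (mean per-class recall); it omits whatever application-dependent weighting the challenge metric adds, so treat it as illustrative only:

```python
def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall over the classes present in y_true.
    Unlike frame-by-frame accuracy, a rare class counts as much as a
    frequent one."""
    recalls = []
    for c in set(y_true):
        idx = [i for i, t in enumerate(y_true) if t == c]
        correct = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)
```

For instance, with `y_true = [0, 0, 0, 1]` and `y_pred = [0, 0, 1, 1]`, class 0 has recall 2/3 and class 1 has recall 1, so the score is 5/6, even though frame-level accuracy is 3/4.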
Affiliation(s)
- Arnaud Huaulmé
- Univ Rennes, INSERM, LTSI - UMR 1099, Rennes, F35000, France.
- Duygu Sarikaya
- Gazi University, Faculty of Engineering; Department of Computer Engineering, Ankara, Turkey
- Kévin Le Mut
- Univ Rennes, INSERM, LTSI - UMR 1099, Rennes, F35000, France
- Yonghao Long
- Department of Computer Science & Engineering, The Chinese University of Hong Kong, China; T Stone Robotics Institute, The Chinese University of Hong Kong, China
- Qi Dou
- Department of Computer Science & Engineering, The Chinese University of Hong Kong, China; T Stone Robotics Institute, The Chinese University of Hong Kong, China
- Chin-Boon Chng
- National University of Singapore (NUS), Singapore, Singapore; Southern University of Science and Technology (SUSTech), Shenzhen, China
- Wenjun Lin
- National University of Singapore (NUS), Singapore, Singapore; Southern University of Science and Technology (SUSTech), Shenzhen, China
- Laura Bravo-Sánchez
- Center for Research and Formation in Artificial Intelligence, Department of Biomedical Engineering, Universidad de los Andes, Bogotá, Colombia
- Pablo Arbeláez
- Center for Research and Formation in Artificial Intelligence, Department of Biomedical Engineering, Universidad de los Andes, Bogotá, Colombia
- Mamoru Mitsuishi
- Department of Mechanical Engineering, The University of Tokyo, Tokyo 113-8656, Japan
- Kanako Harada
- Department of Mechanical Engineering, The University of Tokyo, Tokyo 113-8656, Japan
- Pierre Jannin
- Univ Rennes, INSERM, LTSI - UMR 1099, Rennes, F35000, France.
|
48
|
Pradeep CS, Sinha N. Spatio-Temporal Features Based Surgical Phase Classification Using CNNs. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2021; 2021:3332-3335. [PMID: 34891953 DOI: 10.1109/embc46164.2021.9630829] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/14/2023]
Abstract
In this paper, we propose a novel encoder-decoder based surgical phase classification technique leveraging the spatio-temporal features extracted from videos of laparoscopic cholecystectomy surgery. We use a combined margin loss function to train the computationally efficient PeleeNet architecture to extract features that exhibit: (1) intra-phase similarity, (2) inter-phase dissimilarity. Using these features, we propose to encapsulate sequential feature embeddings, 64 at a time, and classify the surgical phase based on a customized efficient residual factorized CNN architecture (ST-ERFNet). We obtained a surgical phase classification accuracy of 86.07% on the publicly available Cholec80 dataset, which consists of 7 surgical phases. The number of parameters required for the computation is reduced by approximately 84%, yet the model achieves performance comparable to the state of the art. Clinical relevance: Automatic surgical phase classification sets the platform for automatically analyzing the entire surgical workflow. Additionally, it could streamline the assessment of a surgery in terms of efficiency and the early detection of errors or deviations from usual practice, potentially resulting in improved patient care.
|
49
|
Wang J, Jin Y, Cai S, Xu H, Heng PA, Qin J, Wang L. Real-time landmark detection for precise endoscopic submucosal dissection via shape-aware relation network. Med Image Anal 2021; 75:102291. [PMID: 34753019 DOI: 10.1016/j.media.2021.102291] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/22/2021] [Revised: 10/22/2021] [Accepted: 10/25/2021] [Indexed: 10/19/2022]
Abstract
We propose a novel shape-aware relation network for accurate and real-time landmark detection in endoscopic submucosal dissection (ESD) surgery. This task is of great clinical significance but extremely challenging due to bleeding, lighting reflection, and motion blur in the complicated surgical environment. Compared with existing solutions, which either neglect geometric relationships among targeting objects or capture the relationships by using complicated aggregation schemes, the proposed network is capable of achieving satisfactory accuracy while maintaining real-time performance by taking full advantage of the spatial relations among landmarks. We first devise an algorithm to automatically generate relation keypoint heatmaps, which are able to intuitively represent the prior knowledge of spatial relations among landmarks without using any extra manual annotation efforts. We then develop two complementary regularization schemes to progressively incorporate the prior knowledge into the training process. While one scheme introduces pixel-level regularization by multi-task learning, the other integrates global-level regularization by harnessing a newly designed grouped consistency evaluator, which adds relation constraints to the proposed network in an adversarial manner. Both schemes are beneficial to the model in training, and can be readily unloaded in inference to achieve real-time detection. We establish a large in-house dataset of ESD surgery for esophageal cancer to validate the effectiveness of our proposed method. Extensive experimental results demonstrate that our approach outperforms state-of-the-art methods in terms of accuracy and efficiency, achieving better detection results faster. Promising results on two downstream applications further corroborate the great potential of our method in ESD clinical practice.
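Keypoint heatmaps of the kind the abstract's heatmap-generation algorithm would produce as regression targets are typically rendered as 2-D Gaussians centred on each landmark. A generic sketch follows; the exact rendering scheme and the relation-heatmap construction in the paper may differ, and the function name is illustrative:

```python
import numpy as np

def keypoint_heatmap(h, w, cx, cy, sigma=4.0):
    """Render an h-by-w heatmap with a 2-D Gaussian of spread sigma
    centred on the landmark at pixel (cx, cy); peak value is 1.0."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
```

A network regressing such maps is then trained with a pixel-wise loss, and the landmark is read back as the argmax of the predicted map.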
Affiliation(s)
- Jiacheng Wang
- Department of Computer Science at School of Informatics, Xiamen University, Xiamen 361005, China
- Yueming Jin
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, China
- Shuntian Cai
- Department of Gastroenterology, Zhongshan Hospital affiliated to Xiamen University, Xiamen, China
- Hongzhi Xu
- Department of Gastroenterology, Zhongshan Hospital affiliated to Xiamen University, Xiamen, China
- Pheng-Ann Heng
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, China
- Jing Qin
- Center for Smart Health, School of Nursing, The Hong Kong Polytechnic University, Hong Kong
- Liansheng Wang
- Department of Computer Science at School of Informatics, Xiamen University, Xiamen 361005, China.
|
50
|
Zhang B, Ghanem A, Simes A, Choi H, Yoo A. Surgical workflow recognition with 3DCNN for Sleeve Gastrectomy. Int J Comput Assist Radiol Surg 2021; 16:2029-2036. [PMID: 34415503 PMCID: PMC8589754 DOI: 10.1007/s11548-021-02473-3] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/10/2021] [Accepted: 08/04/2021] [Indexed: 01/07/2023]
Abstract
PURPOSE Surgical workflow recognition is a crucial and challenging problem when building a computer-assisted surgery system. Current techniques focus on utilizing a convolutional neural network and a recurrent neural network (CNN-RNN) to solve the surgical workflow recognition problem. In this paper, we attempt to use a deep 3DCNN to solve this problem. METHODS In order to tackle the surgical workflow recognition problem and the imbalanced data problem, we implement a 3DCNN workflow referred to as I3D-FL-PKF. We utilize focal loss (FL) to train a 3DCNN architecture known as Inflated 3D ConvNet (I3D) for surgical workflow recognition. We use prior knowledge filtering (PKF) to filter the recognition results. RESULTS We evaluate our proposed workflow on a large sleeve gastrectomy surgical video dataset. We show that focal loss can help to address the imbalanced data problem, and that our PKF can be used to generate smoothed prediction results and improve the overall accuracy. The proposed workflow achieves 84.16% frame-level accuracy and reaches a weighted Jaccard score of 0.7327, which outperforms the traditional CNN-RNN design. CONCLUSION The proposed workflow can obtain consistent and smooth predictions not only within the surgical phases but also for phase transitions. By utilizing focal loss and prior knowledge filtering, our implementation of a deep 3DCNN has great potential to solve surgical workflow recognition problems in clinical practice.
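Focal loss, which the abstract uses against class imbalance, down-weights well-classified frames so that rare phases contribute more to training. A minimal single-prediction sketch, assuming the standard formulation rather than the paper's exact implementation:

```python
import math

def focal_loss(p_true, gamma=2.0):
    """Focal loss for one prediction, where p_true is the model's
    probability for the ground-truth class. The (1 - p_true)**gamma
    factor shrinks the loss of confident (easy) frames; gamma = 0
    recovers plain cross-entropy."""
    return -((1.0 - p_true) ** gamma) * math.log(p_true)
```

With `gamma = 2`, a frame predicted at probability 0.9 contributes only 1% of its cross-entropy loss, leaving the gradient dominated by hard, often minority-class, frames.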
Affiliation(s)
- Bokai Zhang
- C-SATS, Inc. Johnson & Johnson, 1100 Olive Way, Suite 1100, Seattle, WA, 98101, USA.
- Amer Ghanem
- C-SATS, Inc. Johnson & Johnson, 1100 Olive Way, Suite 1100, Seattle, WA, 98101, USA
- Alexander Simes
- C-SATS, Inc. Johnson & Johnson, 1100 Olive Way, Suite 1100, Seattle, WA, 98101, USA
- Henry Choi
- C-SATS, Inc. Johnson & Johnson, 1100 Olive Way, Suite 1100, Seattle, WA, 98101, USA
- Andrew Yoo
- C-SATS, Inc. Johnson & Johnson, 1100 Olive Way, Suite 1100, Seattle, WA, 98101, USA
|