1. Lin W, Hu Y, Fu H, Yang M, Chng CB, Kawasaki R, Chui C, Liu J. Instrument-Tissue Interaction Detection Framework for Surgical Video Understanding. IEEE Trans Med Imaging 2024; 43:2803-2813. [PMID: 38530715] [DOI: 10.1109/tmi.2024.3381209]
Abstract
The instrument-tissue interaction detection task, which helps in understanding surgical activities, is vital for building computer-assisted surgery systems but poses many challenges. First, most models represent instrument-tissue interaction in a coarse-grained way that focuses only on classification and cannot automatically detect instruments and tissues. Second, existing works do not fully consider intra- and inter-frame relations between instruments and tissues. In this paper, we propose to represent instrument-tissue interaction as a 〈instrument class, instrument bounding box, tissue class, tissue bounding box, action class〉 quintuple and present an Instrument-Tissue Interaction Detection Network (ITIDNet) to detect the quintuple for surgical video understanding. Specifically, we propose a Snippet Consecutive Feature (SCF) Layer to enhance features by modeling relationships among proposals in the current frame using global context information from the video snippet. We also propose a Spatial Corresponding Attention (SCA) Layer to incorporate features of proposals between adjacent frames through spatial encoding. To reason about relationships between instruments and tissues, a Temporal Graph (TG) Layer is proposed, with intra-frame connections to exploit relationships between instruments and tissues in the same frame and inter-frame connections to model temporal information for the same instance. For evaluation, we built a cataract surgery video dataset (PhacoQ) and a cholecystectomy surgery video dataset (CholecQ). Experimental results demonstrate the promising performance of our model, which outperforms other state-of-the-art models on both datasets.
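To make the representation concrete, here is a minimal sketch of the quintuple as a plain Python record; the class names and box values are illustrative, not from the PhacoQ/CholecQ label sets:

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class InteractionQuintuple:
    """One instrument-tissue interaction, as represented in the paper."""
    instrument_class: str
    instrument_box: Tuple[float, float, float, float]  # (x1, y1, x2, y2)
    tissue_class: str
    tissue_box: Tuple[float, float, float, float]
    action_class: str

# Illustrative example (labels are hypothetical):
q = InteractionQuintuple("forceps", (12, 40, 88, 120),
                         "lens", (60, 80, 200, 220), "grasp")
print(q)
```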
2. Younis R, Yamlahi A, Bodenstedt S, Scheikl PM, Kisilenko A, Daum M, Schulze A, Wise PA, Nickel F, Mathis-Ullrich F, Maier-Hein L, Müller-Stich BP, Speidel S, Distler M, Weitz J, Wagner M. A surgical activity model of laparoscopic cholecystectomy for co-operation with collaborative robots. Surg Endosc 2024; 38:4316-4328. [PMID: 38872018] [PMCID: PMC11289174] [DOI: 10.1007/s00464-024-10958-w]
Abstract
BACKGROUND Laparoscopic cholecystectomy is a very frequent surgical procedure. However, in an ageing society, fewer surgical staff will be available to perform surgery on patients. Collaborative surgical robots (cobots) could address surgical staff shortages and workload. To achieve context-awareness for surgeon-robot collaboration, intraoperative recognition of the action workflow is a key challenge. METHODS A surgical process model was developed for intraoperative surgical activities, each comprising actor, instrument, action, and target, in laparoscopic cholecystectomy (excluding camera guidance). These activities, as well as instrument presence and surgical phases, were annotated in videos of laparoscopic cholecystectomy performed on human patients (n = 10) and on explanted porcine livers (n = 10). The machine learning algorithm Distilled-Swin was trained on our own annotated dataset and the CholecT45 dataset. The model was validated using fivefold cross-validation. RESULTS In total, 22,351 activities were annotated, with a cumulative duration of 24.9 h of video segments. The machine learning algorithm trained and validated on our own dataset scored a mean average precision (mAP) of 25.7% and a top K = 5 accuracy of 85.3%. With training and validation on our dataset and CholecT45, the algorithm scored a mAP of 37.9%. CONCLUSIONS An activity model was developed and applied for the fine-grained annotation of laparoscopic cholecystectomies in two surgical settings. A machine recognition algorithm trained on our own annotated dataset and CholecT45 achieved higher performance than training only on CholecT45 and can recognize frequently occurring activities well, but not infrequent ones. Analysis of the annotated dataset allowed quantification of the potential of collaborative surgical robots to address the workload of surgical staff: if such robots could grasp and hold tissue, up to 83.5% of the assistant's tissue-interacting tasks (i.e., excluding camera guidance) could be performed by robots.
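For reference, the reported "top K = 5 accuracy" can be computed as below; this is a generic sketch, not the authors' evaluation code:

```python
import torch

def topk_accuracy(logits: torch.Tensor, labels: torch.Tensor, k: int = 5) -> float:
    """Fraction of samples whose true activity class appears among the
    k highest-scoring predictions."""
    topk = logits.topk(k, dim=1).indices             # (N, k) predicted classes
    hits = (topk == labels.unsqueeze(1)).any(dim=1)  # true class among top k?
    return hits.float().mean().item()

logits = torch.randn(32, 100)            # e.g. 32 clips, 100 activity classes
labels = torch.randint(0, 100, (32,))
print(topk_accuracy(logits, labels, k=5))
```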
Affiliation(s)
- R Younis: Department for General, Visceral and Transplant Surgery, Heidelberg University Hospital, Heidelberg, Germany; National Center for Tumor Diseases (NCT), Heidelberg, Germany; Centre for the Tactile Internet with Human-in-the-Loop (CeTI), TUD Dresden University of Technology, Dresden, Germany
- A Yamlahi: Division of Intelligent Medical Systems (IMSY), German Cancer Research Center (DKFZ), Heidelberg, Germany
- S Bodenstedt: Department for Translational Surgical Oncology, National Center for Tumor Diseases, Partner Site Dresden, Dresden, Germany; Centre for the Tactile Internet with Human-in-the-Loop (CeTI), TUD Dresden University of Technology, Dresden, Germany
- P M Scheikl: Surgical Planning and Robotic Cognition (SPARC), Department Artificial Intelligence in Biomedical Engineering (AIBE), Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
- A Kisilenko: Department for General, Visceral and Transplant Surgery, Heidelberg University Hospital, Heidelberg, Germany; National Center for Tumor Diseases (NCT), Heidelberg, Germany
- M Daum: Centre for the Tactile Internet with Human-in-the-Loop (CeTI), TUD Dresden University of Technology, Dresden, Germany; Department of Visceral, Thoracic and Vascular Surgery, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Fetscherstraße 74, 01307 Dresden, Germany
- A Schulze: Centre for the Tactile Internet with Human-in-the-Loop (CeTI), TUD Dresden University of Technology, Dresden, Germany; Department of Visceral, Thoracic and Vascular Surgery, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Fetscherstraße 74, 01307 Dresden, Germany
- P A Wise: Department for General, Visceral and Transplant Surgery, Heidelberg University Hospital, Heidelberg, Germany
- F Nickel: Department for General, Visceral and Transplant Surgery, Heidelberg University Hospital, Heidelberg, Germany; Department of General, Visceral and Thoracic Surgery, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- F Mathis-Ullrich: Surgical Planning and Robotic Cognition (SPARC), Department Artificial Intelligence in Biomedical Engineering (AIBE), Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
- L Maier-Hein: National Center for Tumor Diseases (NCT), Heidelberg, Germany; Division of Intelligent Medical Systems (IMSY), German Cancer Research Center (DKFZ), Heidelberg, Germany
- B P Müller-Stich: Department for Abdominal Surgery, University Center for Gastrointestinal and Liver Diseases, Basel, Switzerland
- S Speidel: Department for Translational Surgical Oncology, National Center for Tumor Diseases, Partner Site Dresden, Dresden, Germany; Centre for the Tactile Internet with Human-in-the-Loop (CeTI), TUD Dresden University of Technology, Dresden, Germany
- M Distler: Department of Visceral, Thoracic and Vascular Surgery, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Fetscherstraße 74, 01307 Dresden, Germany
- J Weitz: Centre for the Tactile Internet with Human-in-the-Loop (CeTI), TUD Dresden University of Technology, Dresden, Germany; Department of Visceral, Thoracic and Vascular Surgery, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Fetscherstraße 74, 01307 Dresden, Germany
- M Wagner: Department for General, Visceral and Transplant Surgery, Heidelberg University Hospital, Heidelberg, Germany; National Center for Tumor Diseases (NCT), Heidelberg, Germany; Department for Translational Surgical Oncology, National Center for Tumor Diseases, Partner Site Dresden, Dresden, Germany; Centre for the Tactile Internet with Human-in-the-Loop (CeTI), TUD Dresden University of Technology, Dresden, Germany; Department of Visceral, Thoracic and Vascular Surgery, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Fetscherstraße 74, 01307 Dresden, Germany
3. Li Y, Bai B, Jia F. Parameter-efficient framework for surgical action triplet recognition. Int J Comput Assist Radiol Surg 2024; 19:1291-1299. [PMID: 38689146] [DOI: 10.1007/s11548-024-03147-6]
Abstract
PURPOSE Surgical action triplet recognition is a clinically significant yet challenging task. It provides surgeons with detailed information about surgical scenarios, thereby facilitating clinical decision-making. However, the high similarity among action triplets presents a formidable obstacle to recognition. To enhance accuracy, prior methods relied on ever larger models, incurring a considerable computational burden. METHODS We propose a novel framework known as the Lite and Mega Models (LAM). It comprises a CNN-based, fully fine-tuned model (LAM-Lite) and a parameter-efficient fine-tuned model built on a Transformer-based foundation model (LAM-Mega). Temporal multi-label data augmentation is introduced to extract robust class-level features. RESULTS Our study demonstrates that LAM outperforms prior methods across various parameter scales on the CholecT50 dataset. Using fewer tunable parameters, LAM achieves a mean average precision (mAP) of 42.1%, a 3.6% improvement over the previous state of the art. CONCLUSION Leveraging an effective structural design and the robust capabilities of the foundation model, our approach strikes a balance between accuracy and computational efficiency. The source code is available at https://github.com/Lycus99/LAM.
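The abstract does not detail LAM-Mega's parameter-efficient scheme; the sketch below only illustrates the general idea of freezing a foundation backbone and training a small task head, which is what keeps the tunable parameter count low:

```python
import torch.nn as nn

def freeze_all_but_head(model: nn.Module, head_prefix: str = "head") -> None:
    """Freeze every parameter except those under `head_prefix` and
    report the tunable fraction (a generic PEFT sketch, not LAM's
    actual scheme)."""
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith(head_prefix)
    tunable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    print(f"tunable parameters: {tunable}/{total} ({100 * tunable / total:.2f}%)")
```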
Affiliation(s)
- Yuchong Li: Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; Shenzhen College of Advanced Technology, University of Chinese Academy of Sciences, Shenzhen, China
- Bizhe Bai: University of Toronto, Toronto, ON, Canada
- Fucang Jia: Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; The Key Laboratory of Biomedical Imaging Science and System, Chinese Academy of Sciences, Shenzhen, China
4. Batić D, Holm F, Özsoy E, Czempiel T, Navab N. EndoViT: pretraining vision transformers on a large collection of endoscopic images. Int J Comput Assist Radiol Surg 2024; 19:1085-1091. [PMID: 38570373] [PMCID: PMC11178556] [DOI: 10.1007/s11548-024-03091-5]
Abstract
PURPOSE Automated endoscopy video analysis is essential for assisting surgeons during medical procedures, but it faces challenges due to complex surgical scenes and limited annotated data. Large-scale pretraining has shown great success in the natural language processing and computer vision communities in recent years. Such approaches reduce the need for annotated data, which is of great interest in the medical domain. In this work, we investigate endoscopy-domain-specific self-supervised pretraining on large collections of data. METHODS To this end, we first collect Endo700k, the largest publicly available corpus of endoscopic images, extracted from nine public minimally invasive surgery (MIS) datasets. Endo700k comprises more than 700,000 images. Next, we introduce EndoViT, an endoscopy-pretrained Vision Transformer (ViT), and evaluate it on a diverse set of surgical downstream tasks. RESULTS Our findings indicate that domain-specific pretraining with EndoViT yields notable advantages in complex downstream tasks. In action triplet recognition, our approach outperforms ImageNet pretraining. In semantic segmentation, we surpass the state-of-the-art (SOTA) performance. These results demonstrate the effectiveness of our domain-specific pretraining approach in addressing the challenges of automated endoscopy video analysis. CONCLUSION Our study contributes to the field of medical computer vision by showcasing the benefits of domain-specific, large-scale, self-supervised pretraining of vision transformers. We release both our code and pretrained models to facilitate further research in this direction: https://github.com/DominikBatic/EndoViT.
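The abstract does not spell out the pretraining objective; the snippet below sketches one common masked-image-modeling objective (SimMIM-style) to illustrate what self-supervised pretraining of a ViT on unlabeled endoscopic frames can look like. All sizes are toy values, not EndoViT's configuration:

```python
import torch
import torch.nn as nn

class MaskedImageModel(nn.Module):
    """Toy masked-image-modeling objective: replace random patch tokens
    with a learned mask token and regress the missing pixels."""
    def __init__(self, patch: int = 16, dim: int = 192, img: int = 224):
        super().__init__()
        self.p, self.n = patch, (img // patch) ** 2
        self.embed = nn.Linear(3 * patch * patch, dim)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos = nn.Parameter(torch.zeros(1, self.n, dim))
        layer = nn.TransformerEncoderLayer(dim, 4, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, 2)
        self.head = nn.Linear(dim, 3 * patch * patch)

    def forward(self, imgs: torch.Tensor, mask_ratio: float = 0.6) -> torch.Tensor:
        B, p = imgs.size(0), self.p
        patches = imgs.unfold(2, p, p).unfold(3, p, p)           # (B,3,h,w,p,p)
        patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(B, self.n, -1)
        mask = torch.rand(B, self.n, device=imgs.device) < mask_ratio
        tokens = self.embed(patches)
        tokens = torch.where(mask[..., None], self.mask_token.expand_as(tokens), tokens)
        recon = self.head(self.encoder(tokens + self.pos))
        return ((recon - patches) ** 2)[mask].mean()             # loss on masked patches

loss = MaskedImageModel()(torch.randn(2, 3, 224, 224))
print(loss)
```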
Affiliation(s)
- Dominik Batić: Chair for Computer Aided Medical Procedures, Technical University Munich, Munich, Germany
- Felix Holm: Chair for Computer Aided Medical Procedures, Technical University Munich, Munich, Germany; Carl Zeiss AG, Munich, Germany
- Ege Özsoy: Chair for Computer Aided Medical Procedures, Technical University Munich, Munich, Germany
- Tobias Czempiel: Chair for Computer Aided Medical Procedures, Technical University Munich, Munich, Germany
- Nassir Navab: Chair for Computer Aided Medical Procedures, Technical University Munich, Munich, Germany
5. Lavanchy JL, Ramesh S, Dall'Alba D, Gonzalez C, Fiorini P, Müller-Stich BP, Nett PC, Marescaux J, Mutter D, Padoy N. Challenges in multi-centric generalization: phase and step recognition in Roux-en-Y gastric bypass surgery. Int J Comput Assist Radiol Surg 2024. [PMID: 38761319] [DOI: 10.1007/s11548-024-03166-3]
Abstract
PURPOSE Most studies on surgical activity recognition using artificial intelligence (AI) have focused on recognizing one type of activity from small, mono-centric surgical video datasets. Whether those models generalize to other centers remains speculative. METHODS In this work, we introduce a large multi-centric, multi-activity dataset of 140 laparoscopic Roux-en-Y gastric bypass (LRYGB) surgery videos (MultiBypass140) performed at two medical centers: the University Hospital of Strasbourg, France (StrasBypass70) and Inselspital, Bern University Hospital, Switzerland (BernBypass70). The dataset has been fully annotated with phases and steps by two board-certified surgeons. Furthermore, we assess generalizability and benchmark different deep learning models for phase and step recognition in seven experiments: (1) training and evaluation on BernBypass70; (2) training and evaluation on StrasBypass70; (3) training and evaluation on the joint MultiBypass140 dataset; (4) training on BernBypass70, evaluation on StrasBypass70; (5) training on StrasBypass70, evaluation on BernBypass70; (6) training on MultiBypass140, evaluation on BernBypass70; and (7) training on MultiBypass140, evaluation on StrasBypass70. RESULTS Model performance is markedly influenced by the training data. The worst results were obtained in experiments (4) and (5), confirming the limited generalization capability of models trained on mono-centric data. Multi-centric training data, experiments (6) and (7), improves generalization, bringing the models beyond the level of independent mono-centric training and validation (experiments (1) and (2)). CONCLUSION MultiBypass140 shows considerable variation in surgical technique and workflow of LRYGB procedures between centers. The generalization experiments therefore show a remarkable difference in model performance. These results highlight the importance of multi-centric datasets for AI models to account for variance in surgical technique and workflow. The dataset and code are publicly available at https://github.com/CAMMA-public/MultiBypass140.
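Written out as a train/evaluate grid, the seven pairings are easy to script; `train_phase_step_model` and `evaluate` below are hypothetical stand-ins for the authors' training and evaluation code:

```python
def train_phase_step_model(train_set: str) -> str:
    """Hypothetical stand-in for the authors' training routine."""
    return f"model[{train_set}]"

def evaluate(model: str, eval_set: str) -> str:
    """Hypothetical stand-in for phase/step evaluation."""
    return f"{model} evaluated on {eval_set}"

EXPERIMENTS = [  # (training data, evaluation data), experiments (1)-(7)
    ("BernBypass70", "BernBypass70"),
    ("StrasBypass70", "StrasBypass70"),
    ("MultiBypass140", "MultiBypass140"),
    ("BernBypass70", "StrasBypass70"),     # cross-center
    ("StrasBypass70", "BernBypass70"),     # cross-center
    ("MultiBypass140", "BernBypass70"),    # multi-centric training
    ("MultiBypass140", "StrasBypass70"),   # multi-centric training
]

for i, (train_set, eval_set) in enumerate(EXPERIMENTS, start=1):
    print(f"({i})", evaluate(train_phase_step_model(train_set), eval_set))
```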
Affiliation(s)
- Joël L Lavanchy: University Digestive Health Care Center - Clarunis, 4002 Basel, Switzerland; Department of Biomedical Engineering, University of Basel, 4123 Allschwil, Switzerland; Institute of Image-Guided Surgery, IHU Strasbourg, 67000 Strasbourg, France
- Sanat Ramesh: Institute of Image-Guided Surgery, IHU Strasbourg, 67000 Strasbourg, France; ICube, University of Strasbourg, CNRS, 67000 Strasbourg, France; Altair Robotics Lab, University of Verona, 37134 Verona, Italy
- Diego Dall'Alba: Altair Robotics Lab, University of Verona, 37134 Verona, Italy
- Cristians Gonzalez: Institute of Image-Guided Surgery, IHU Strasbourg, 67000 Strasbourg, France; University Hospital of Strasbourg, 67000 Strasbourg, France
- Paolo Fiorini: Altair Robotics Lab, University of Verona, 37134 Verona, Italy
- Beat P Müller-Stich: University Digestive Health Care Center - Clarunis, 4002 Basel, Switzerland; Department of Biomedical Engineering, University of Basel, 4123 Allschwil, Switzerland
- Philipp C Nett: Department of Visceral Surgery and Medicine, Inselspital Bern University Hospital, 3010 Bern, Switzerland
- Didier Mutter: Institute of Image-Guided Surgery, IHU Strasbourg, 67000 Strasbourg, France; University Hospital of Strasbourg, 67000 Strasbourg, France
- Nicolas Padoy: Institute of Image-Guided Surgery, IHU Strasbourg, 67000 Strasbourg, France; ICube, University of Strasbourg, CNRS, 67000 Strasbourg, France
6. Wang Y, Ye Z, Wen M, Liang H, Zhang X. TransVFS: A spatio-temporal local-global transformer for vision-based force sensing during ultrasound-guided prostate biopsy. Med Image Anal 2024; 94:103130. [PMID: 38437787] [DOI: 10.1016/j.media.2024.103130]
Abstract
Robot-assisted prostate biopsy is a new technology for diagnosing prostate cancer, but its safety is limited by the inability of robots to accurately sense the tool-tissue interaction force during biopsy. Recently, vision-based force sensing (VFS) has offered a potential solution to this issue by inferring the interaction force from image sequences. However, existing mainstream VFS methods cannot achieve accurate force sensing because they rely on convolutional or recurrent neural networks to learn deformation from optical images, and some of these methods are inefficient, especially when recurrent convolutional operations are involved. This paper presents a Transformer-based VFS (TransVFS) method that leverages ultrasound volume sequences acquired during prostate biopsy. TransVFS uses a spatio-temporal local-global Transformer to capture local image details and global dependencies simultaneously, learning prostate deformations for force estimation. Distinctively, our method exploits both spatial and temporal attention mechanisms for image feature learning, thereby addressing the impact of low ultrasound image resolution and unclear prostate boundaries on force estimation accuracy. Meanwhile, two efficient local-global attention modules reduce the 4D spatio-temporal computation burden through a factorized spatio-temporal processing strategy, enabling fast force estimation. Experiments on a prostate phantom and on beagle dogs show that our method significantly outperforms existing VFS methods and other spatio-temporal Transformer models. On the transabdominal ultrasound dataset of dogs, TransVFS surpasses the most competitive baseline, ResNet3dGRU, with a mean absolute force-estimation error of 70.4 ± 60.0 millinewton (mN) versus 123.7 ± 95.6 mN.
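As a rough illustration of the factorized spatio-temporal strategy (attend over space within each frame, then over time at each spatial location, instead of full joint 4D attention), consider this sketch; it is not the paper's TransVFS architecture:

```python
import torch
import torch.nn as nn

class FactorizedSTBlock(nn.Module):
    """Spatial attention per frame, then temporal attention per location."""
    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.spatial = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.temporal = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_s = nn.LayerNorm(dim)
        self.norm_t = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, T, N, C)
        B, T, N, C = x.shape
        s = x.reshape(B * T, N, C)                  # attend over N spatial tokens
        q = self.norm_s(s)
        s = s + self.spatial(q, q, q)[0]
        t = s.reshape(B, T, N, C).permute(0, 2, 1, 3).reshape(B * N, T, C)
        q = self.norm_t(t)                          # attend over T time steps
        t = t + self.temporal(q, q, q)[0]
        return t.reshape(B, N, T, C).permute(0, 2, 1, 3)

out = FactorizedSTBlock()(torch.randn(2, 4, 49, 64))  # -> (2, 4, 49, 64)
```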
Affiliation(s)
- Yibo Wang: Department of Biomedical Engineering, College of Life Science and Technology, Huazhong University of Science and Technology, No 1037 Luoyu Road, Wuhan, China
- Zhichao Ye: Department of Urology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, No 13 Hangkong Road, Wuhan, China
- Mingwei Wen: Department of Biomedical Engineering, College of Life Science and Technology, Huazhong University of Science and Technology, No 1037 Luoyu Road, Wuhan, China
- Huageng Liang: Department of Urology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, No 13 Hangkong Road, Wuhan, China
- Xuming Zhang: Department of Biomedical Engineering, College of Life Science and Technology, Huazhong University of Science and Technology, No 1037 Luoyu Road, Wuhan, China
7. Gui S, Wang Z, Chen J, Zhou X, Zhang C, Cao Y. MT4MTL-KD: A Multi-Teacher Knowledge Distillation Framework for Triplet Recognition. IEEE Trans Med Imaging 2024; 43:1628-1639. [PMID: 38127608] [DOI: 10.1109/tmi.2023.3345736]
Abstract
The recognition of surgical triplets plays a critical role in the practical analysis of surgical videos. It involves the sub-tasks of recognizing instruments, verbs, and targets while establishing precise associations between them. Existing methods face two significant challenges in triplet recognition: 1) the imbalanced class distribution of surgical triplets may lead to spurious task-association learning, and 2) feature extractors cannot reconcile local and global context modeling. To overcome these challenges, this paper presents MT4MTL-KD, a multi-teacher knowledge distillation framework for multi-task triplet learning. MT4MTL-KD leverages teacher models trained on less imbalanced sub-tasks to assist multi-task student learning for triplet recognition. Moreover, we adopt different categories of backbones for the teacher and student models, facilitating the integration of local and global context modeling. To further align semantic knowledge between the triplet task and its sub-tasks, we propose a novel feature attention module (FAM) that uses attention mechanisms to assign multi-task features to specific sub-tasks. We evaluate MT4MTL-KD on both the 5-fold cross-validation and the CholecTriplet challenge splits of the CholecT45 dataset. The experimental results consistently demonstrate the superiority of our framework over state-of-the-art methods, with improvements of up to 6.4% on the cross-validation split.
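A generic multi-teacher distillation objective, for orientation only (MT4MTL-KD's actual per-task weighting and feature-level alignment are described in the paper, not reproduced here):

```python
import torch
import torch.nn.functional as F

def multi_teacher_kd_loss(student_logits, teacher_logits_list, labels,
                          temperature: float = 2.0, alpha: float = 0.5):
    """Supervised cross-entropy plus the average softened KL term
    against several teachers (single-label sketch)."""
    ce = F.cross_entropy(student_logits, labels)
    t = temperature
    kd = sum(
        F.kl_div(F.log_softmax(student_logits / t, dim=-1),
                 F.softmax(teacher / t, dim=-1),
                 reduction="batchmean") * t * t
        for teacher in teacher_logits_list
    ) / len(teacher_logits_list)
    return alpha * ce + (1.0 - alpha) * kd

student = torch.randn(8, 10)
teachers = [torch.randn(8, 10) for _ in range(3)]
print(multi_teacher_kd_loss(student, teachers, torch.randint(0, 10, (8,))))
```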
8. Liu Y, Hayashi Y, Oda M, Kitasaka T, Mori K. YOLOv7-RepFPN: Improving real-time performance of laparoscopic tool detection on embedded systems. Healthc Technol Lett 2024; 11:157-166. [PMID: 38638498] [PMCID: PMC11022232] [DOI: 10.1049/htl2.12072]
Abstract
This study focuses on enhancing the inference speed of laparoscopic tool detection on embedded devices. Laparoscopy, a minimally invasive surgical technique, markedly reduces patient recovery times and postoperative complications. Real-time laparoscopic tool detection assists laparoscopy by providing information for surgical navigation, and its implementation on embedded devices is gaining interest due to their portability, network independence, and scalability. However, embedded devices often face computational resource limitations that can hinder inference speed. To mitigate this concern, this work introduces a two-fold modification to the YOLOv7 model: the feature channels are halved and RepBlock is integrated, yielding the YOLOv7-RepFPN model. This configuration leads to a significant reduction in computational complexity. Additionally, the focal EIoU (efficient intersection over union) loss function is employed for bounding-box regression. Experimental results on an embedded device show that, for frame-by-frame laparoscopic tool detection, the proposed YOLOv7-RepFPN achieved an mAP of 88.2% (at an IoU threshold of 0.5) on a custom dataset based on EndoVis17 and an inference speed of 62.9 FPS. Compared with the original YOLOv7, which achieved an 89.3% mAP and 41.8 FPS under identical conditions, the method increases speed by 21.1 FPS while largely maintaining detection accuracy, which demonstrates its effectiveness.
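For reference, a self-contained sketch of the focal EIoU loss used for bounding-box regression (the standard formulation; the gamma value is illustrative, not necessarily the paper's setting):

```python
import torch

def focal_eiou_loss(pred: torch.Tensor, target: torch.Tensor,
                    gamma: float = 0.5, eps: float = 1e-7) -> torch.Tensor:
    """Focal EIoU for (x1, y1, x2, y2) boxes: 1 - IoU plus center,
    width, and height penalty terms, reweighted per box by IoU**gamma."""
    x1 = torch.max(pred[:, 0], target[:, 0]); y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2]); y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)
    # smallest enclosing box
    cw = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    ch = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])
    # normalized squared distance between box centers
    dcx = (pred[:, 0] + pred[:, 2] - target[:, 0] - target[:, 2]) / 2
    dcy = (pred[:, 1] + pred[:, 3] - target[:, 1] - target[:, 3]) / 2
    center = (dcx ** 2 + dcy ** 2) / (cw ** 2 + ch ** 2 + eps)
    # width/height discrepancies, each normalized separately
    w_term = (pred[:, 2] - pred[:, 0] - target[:, 2] + target[:, 0]) ** 2 / (cw ** 2 + eps)
    h_term = (pred[:, 3] - pred[:, 1] - target[:, 3] + target[:, 1]) ** 2 / (ch ** 2 + eps)
    eiou = 1 - iou + center + w_term + h_term
    return (iou.detach() ** gamma * eiou).mean()

print(focal_eiou_loss(torch.tensor([[0., 0., 2., 2.]]),
                      torch.tensor([[1., 1., 3., 3.]])))
```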
Affiliation(s)
- Yuzhang Liu: Graduate School of Informatics, Nagoya University, Nagoya, Aichi, Japan
- Yuichiro Hayashi: Graduate School of Informatics, Nagoya University, Nagoya, Aichi, Japan
- Masahiro Oda: Graduate School of Informatics, Nagoya University, Nagoya, Aichi, Japan; Information and Communications, Nagoya University, Nagoya, Aichi, Japan
- Takayuki Kitasaka: Department of Information Science, Aichi Institute of Technology, Nagoya, Aichi, Japan
- Kensaku Mori: Graduate School of Informatics, Nagoya University, Nagoya, Aichi, Japan; Information and Communications, Nagoya University, Nagoya, Aichi, Japan; Research Center of Medical Bigdata, National Institute of Informatics, Tokyo, Japan
9. Kaleta J, Dall'Alba D, Płotka S, Korzeniowski P. Minimal data requirement for realistic endoscopic image generation with Stable Diffusion. Int J Comput Assist Radiol Surg 2024; 19:531-539. [PMID: 37934401] [PMCID: PMC10881618] [DOI: 10.1007/s11548-023-03030-w]
Abstract
PURPOSE Computer-assisted surgical systems provide support information to the surgeon, which can improve the execution and overall outcome of the procedure. These systems are based on deep learning models that are trained on complex and challenging-to-annotate data. Generating synthetic data can overcome these limitations, but it is necessary to reduce the domain gap between real and synthetic data. METHODS We propose a method for image-to-image translation based on a Stable Diffusion model, which generates realistic images starting from synthetic data. Compared to previous works, the proposed method is better suited for clinical application as it requires a much smaller amount of input data and allows finer control over the generation of details by introducing different variants of supporting control networks. RESULTS The proposed method is applied in the context of laparoscopic cholecystectomy, using synthetic and real data from public datasets. It achieves a mean Intersection over Union of 69.76%, significantly improving the baseline results (69.76 vs. 42.21%). CONCLUSIONS The proposed method for translating synthetic images into images with realistic characteristics will enable the training of deep learning methods that can generalize optimally to real-world contexts, thereby improving computer-assisted intervention guidance systems.
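In the diffusers library, a ControlNet-supported image-to-image pipeline of this flavor can be assembled as below; the checkpoints, prompt, and the choice of a segmentation-conditioned ControlNet are assumptions for illustration, not the authors' exact configuration:

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetImg2ImgPipeline
from PIL import Image

# Assumed checkpoints; the paper's models and conditioning may differ.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-seg", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
    torch_dtype=torch.float16).to("cuda")

synthetic = Image.open("synthetic_frame.png")   # rendered laparoscopic scene
control = Image.open("synthetic_labels.png")    # per-pixel label map as control
out = pipe("photorealistic laparoscopic cholecystectomy scene",
           image=synthetic, control_image=control, strength=0.7).images[0]
out.save("realistic_frame.png")
```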
Affiliation(s)
- Joanna Kaleta: Sano Centre for Computational Medicine, Krakow, Poland
- Diego Dall'Alba: Sano Centre for Computational Medicine, Krakow, Poland; Department of Computer Science, University of Verona, Verona, Italy
- Szymon Płotka: Sano Centre for Computational Medicine, Krakow, Poland; Informatics Institute, University of Amsterdam, Amsterdam, The Netherlands; Department of Biomedical Engineering and Physics, Amsterdam University Medical Center, Amsterdam, The Netherlands
10. Zhai Y, Chen Z, Zheng Z, Wang X, Yan X, Liu X, Yin J, Wang J, Zhang J. Artificial intelligence for automatic surgical phase recognition of laparoscopic gastrectomy in gastric cancer. Int J Comput Assist Radiol Surg 2024; 19:345-353. [PMID: 37914911] [DOI: 10.1007/s11548-023-03027-5]
Abstract
PURPOSE This study aimed to classify the phases of laparoscopic gastric cancer surgery, to develop a transformer-based artificial intelligence (AI) model for automatic surgical phase recognition, and to evaluate the model's performance on laparoscopic gastric cancer surgery videos. METHODS One hundred patients who underwent laparoscopic surgery for gastric cancer were included in this study. All surgical videos were labeled and classified into eight phases (P0, preparation; P1, separation of the greater gastric curvature; P2, separation of the distal stomach; P3, separation of the lesser gastric curvature; P4, dissection of the superior margin of the pancreas; P5, separation of the proximal stomach; P6, digestive tract reconstruction; P7, end of operation). This study proposed an AI phase recognition model consisting of a convolutional neural network-based visual feature extractor and a temporal relational transformer. RESULTS The proposed visual and temporal relationship network performed accurate automatic surgical phase prediction. The average duration of the surgical videos was 9114 ± 2571 s, and the longest phase was P1 (3388 s). The final accuracy, F1 score, recall, and precision were 90.128%, 87.04%, 87.04%, and 87.32%, respectively. The phase with the highest recognition accuracy was P1, and the phase with the lowest was P2. CONCLUSION An AI model based on convolutional and transformer networks was developed in this study. The model can accurately identify the phases of laparoscopic surgery for gastric cancer, and AI can serve as an analytical tool for gastric cancer surgical videos.
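The reported frame-level metrics can be reproduced with standard tooling; a minimal sketch with toy predictions (macro averaging is an assumption, since the averaging scheme is not stated in the abstract):

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = ["P0", "P1", "P1", "P2", "P4", "P6", "P7", "P1"]   # toy phase labels
y_pred = ["P0", "P1", "P2", "P2", "P4", "P6", "P7", "P1"]

acc = accuracy_score(y_true, y_pred)
prec, rec, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0)
print(f"acc={acc:.3f} precision={prec:.3f} recall={rec:.3f} f1={f1:.3f}")
```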
Affiliation(s)
- Yuhao Zhai: Department of General Surgery, Beijing Friendship Hospital, Capital Medical University, 95 Yong'an Road, Xicheng District, Beijing, China; State Key Lab of Digestive Health, Beijing, China
- Zhen Chen: Centre for Artificial Intelligence and Robotics (CAIR), Hong Kong Institute of Science and Innovation, Chinese Academy of Sciences, Hong Kong SAR, China
- Zhi Zheng: Department of General Surgery, Beijing Friendship Hospital, Capital Medical University, 95 Yong'an Road, Xicheng District, Beijing, China; State Key Lab of Digestive Health, Beijing, China
- Xi Wang: Department of General Surgery, Beijing Friendship Hospital, Capital Medical University, 95 Yong'an Road, Xicheng District, Beijing, China; State Key Lab of Digestive Health, Beijing, China
- Xiaosheng Yan: Department of General Surgery, Beijing Friendship Hospital, Capital Medical University, 95 Yong'an Road, Xicheng District, Beijing, China; State Key Lab of Digestive Health, Beijing, China
- Xiaoye Liu: Department of General Surgery, Beijing Friendship Hospital, Capital Medical University, 95 Yong'an Road, Xicheng District, Beijing, China; State Key Lab of Digestive Health, Beijing, China
- Jie Yin: Department of General Surgery, Beijing Friendship Hospital, Capital Medical University, 95 Yong'an Road, Xicheng District, Beijing, China; State Key Lab of Digestive Health, Beijing, China
- Jinqiao Wang: Centre for Artificial Intelligence and Robotics (CAIR), Hong Kong Institute of Science and Innovation, Chinese Academy of Sciences, Hong Kong SAR, China; National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, 95 Zhongguancun East Road, Haidian District, Beijing, China; Wuhan AI Research, Wuhan, China
- Jun Zhang: Department of General Surgery, Beijing Friendship Hospital, Capital Medical University, 95 Yong'an Road, Xicheng District, Beijing, China; State Key Lab of Digestive Health, Beijing, China
11. Yang X, Huang K, Yang D, Zhao W, Zhou X. Biomedical Big Data Technologies, Applications, and Challenges for Precision Medicine: A Review. Global Challenges (Hoboken, NJ) 2024; 8:2300163. [PMID: 38223896] [PMCID: PMC10784210] [DOI: 10.1002/gch2.202300163]
Abstract
The explosive growth of biomedical big data presents both significant opportunities and challenges for knowledge discovery and translational applications in precision medicine. Efficient management, analysis, and interpretation of big data can pave the way for groundbreaking advancements in precision medicine. However, the unprecedented strides in the automated collection of large-scale molecular and clinical data have also introduced formidable challenges in data analysis and interpretation, necessitating novel computational approaches. Potential challenges include the curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability issues. This overview article focuses on recent progress and breakthroughs in the application of big data to precision medicine. Key aspects are summarized, including content, data sources, technologies, tools, challenges, and existing gaps. Nine fields are discussed: data warehousing and data management, electronic medical records, biomedical imaging informatics, artificial intelligence-aided surgical design and surgery optimization, omics data, health monitoring data, knowledge graphs, public health informatics, and security and privacy.
Affiliation(s)
- Xue Yang: Department of Pancreatic Surgery and West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu 610041, China
- Kexin Huang: Department of Pancreatic Surgery and West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu 610041, China
- Dewei Yang: College of Advanced Manufacturing Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400000, China
- Weiling Zhao: Center for Systems Medicine, School of Biomedical Informatics, UTHealth at Houston, Houston, TX 77030, USA
- Xiaobo Zhou: Center for Systems Medicine, School of Biomedical Informatics, UTHealth at Houston, Houston, TX 77030, USA
12. Pinto-Coelho L. How Artificial Intelligence Is Shaping Medical Imaging Technology: A Survey of Innovations and Applications. Bioengineering (Basel) 2023; 10:1435. [PMID: 38136026] [PMCID: PMC10740686] [DOI: 10.3390/bioengineering10121435]
Abstract
The integration of artificial intelligence (AI) into medical imaging has ushered in an era of transformation in healthcare. This literature review explores the latest innovations and applications of AI in the field, highlighting its profound impact on medical diagnosis and patient care. The innovation segment examines cutting-edge developments such as deep learning algorithms, convolutional neural networks, and generative adversarial networks, which have significantly improved the accuracy and efficiency of medical image analysis. These innovations have enabled the rapid and accurate detection of abnormalities, from identifying tumors during radiological examinations to detecting early signs of eye disease in retinal images. The article also surveys applications of AI across medical imaging, including radiology, pathology, cardiology, and more. AI-based diagnostic tools not only speed up the interpretation of complex images but also improve the early detection of disease, ultimately delivering better outcomes for patients. Additionally, AI-based image processing facilitates personalized treatment plans, thereby optimizing healthcare delivery. This review underscores the paradigm shift that AI has brought to medical imaging and its role in revolutionizing diagnosis and patient care. Combining cutting-edge AI techniques with their practical applications makes clear that AI will continue to shape the future of healthcare in profound and positive ways.
Affiliation(s)
- Luís Pinto-Coelho: ISEP - School of Engineering, Polytechnic Institute of Porto, 4200-465 Porto, Portugal; INESCTEC, Campus of the Engineering Faculty of the University of Porto, 4200-465 Porto, Portugal
13. Hutchinson K, Reyes I, Li Z, Alemzadeh H. COMPASS: a formal framework and aggregate dataset for generalized surgical procedure modeling. Int J Comput Assist Radiol Surg 2023; 18:2143-2154. [PMID: 37145250] [DOI: 10.1007/s11548-023-02922-1]
Abstract
PURPOSE We propose a formal framework for the modeling and segmentation of minimally invasive surgical tasks using a unified set of motion primitives (MPs) to enable more objective labeling and the aggregation of different datasets. METHODS We model dry-lab surgical tasks as finite state machines, representing how the execution of MPs as the basic surgical actions changes the surgical context, which characterizes the physical interactions among tools and objects in the surgical environment. We develop methods for labeling surgical context based on video data and for automatically translating context to MP labels. We then use our framework to create the COntext and Motion Primitive Aggregate Surgical Set (COMPASS), comprising six dry-lab surgical tasks from three publicly available datasets (JIGSAWS, DESK, and ROSMA) with kinematic and video data and context and MP labels. RESULTS Our context labeling method achieves near-perfect agreement between consensus labels from crowd-sourcing and expert surgeons. Segmentation of tasks into MPs yields the COMPASS dataset, which nearly triples the amount of data for modeling and analysis and enables the generation of separate transcripts for the left and right tools. CONCLUSION The proposed framework results in high-quality labeling of surgical data based on context and fine-grained MPs. Modeling surgical tasks with MPs enables the aggregation of different datasets and the separate analysis of left and right hands for bimanual coordination assessment. Our formal framework and aggregate dataset can support the development of explainable and multi-granularity models for improved surgical process analysis, skill assessment, error detection, and autonomy.
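The context-to-MP translation can be pictured as a lookup over context transitions in a finite state machine; the states and MP names below are illustrative stand-ins, not the COMPASS vocabulary:

```python
# Illustrative transition table: (previous context, new context) -> MP label.
TRANSITIONS = {
    ("tool_free", "tool_holds_needle"): "Grasp",
    ("tool_holds_needle", "tool_free"): "Release",
    ("needle_out", "needle_in_tissue"): "Push",
    ("needle_in_tissue", "needle_out"): "Pull",
}

def context_to_mps(context_sequence):
    """Translate a sequence of context states into MP labels by looking
    up each state change."""
    return [TRANSITIONS.get((prev, curr), "Unknown")
            for prev, curr in zip(context_sequence, context_sequence[1:])
            if prev != curr]

print(context_to_mps(["tool_free", "tool_holds_needle", "tool_free"]))
# -> ['Grasp', 'Release']
```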
Affiliation(s)
- Kay Hutchinson: Department of Electrical and Computer Engineering, University of Virginia, Charlottesville, VA 22903, USA
- Ian Reyes: Department of Computer Science, University of Virginia, Charlottesville, VA 22903, USA; IBM, RTP, Durham, NC 27709, USA
- Zongyu Li: Department of Electrical and Computer Engineering, University of Virginia, Charlottesville, VA 22903, USA
- Homa Alemzadeh: Department of Electrical and Computer Engineering, University of Virginia, Charlottesville, VA 22903, USA; Department of Computer Science, University of Virginia, Charlottesville, VA 22903, USA
14. Park B, Chi H, Park B, Lee J, Jin HS, Park S, Hyung WJ, Choi MK. Visual modalities-based multimodal fusion for surgical phase recognition. Comput Biol Med 2023; 166:107453. [PMID: 37774560] [DOI: 10.1016/j.compbiomed.2023.107453]
Abstract
Surgical workflow analysis is essential for optimizing surgery by encouraging efficient communication and resource use. However, phase recognition performance is limited when it relies only on information about the presence of surgical instruments. To address this problem, we propose visual modality-based multimodal fusion for surgical phase recognition, which overcomes this limited diversity of information. Using the proposed methods, we extract a visual kinematics-based index (VKI) describing instrument use, such as movements and their interrelations during surgery. In addition, we improve recognition performance with an effective convolutional neural network (CNN)-based method for fusing visual features with the VKI. The index improves the understanding of a surgical procedure because it captures instrument interactions, and it can be extracted in any environment, including laparoscopic surgery, providing complementary information where system kinematics logs are unavailable or erroneous. The proposed methodology was applied to two multimodal datasets, a virtual reality (VR) simulator-based dataset (PETRAW) and a private distal gastrectomy dataset, to verify that it can improve recognition performance in clinical environments. We also explored the influence of the VKI on recognizing each surgical workflow through the instruments' presence and trajectories. Experimental results on the distal gastrectomy video dataset validate the effectiveness of our fusion approach: the relatively simple, index-incorporating fusion yields significant performance improvements over CNN-only training and trains effectively compared with Transformer-based fusion, which requires large amounts of pretraining data.
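A minimal late-fusion sketch in the spirit of the method: CNN frame features concatenated with a projected VKI vector ahead of the phase classifier. Dimensions and the phase count are placeholders, not the paper's configuration:

```python
import torch
import torch.nn as nn

class VisualVKIFusion(nn.Module):
    """Concatenate backbone features with a projected visual
    kinematics-based index (VKI) and classify the surgical phase."""
    def __init__(self, feat_dim: int = 512, vki_dim: int = 16, n_phases: int = 7):
        super().__init__()
        self.vki_proj = nn.Sequential(nn.Linear(vki_dim, 64), nn.ReLU())
        self.classifier = nn.Sequential(
            nn.Linear(feat_dim + 64, 128), nn.ReLU(), nn.Linear(128, n_phases))

    def forward(self, visual_feat: torch.Tensor, vki: torch.Tensor) -> torch.Tensor:
        return self.classifier(torch.cat([visual_feat, self.vki_proj(vki)], dim=1))

logits = VisualVKIFusion()(torch.randn(4, 512), torch.randn(4, 16))  # -> (4, 7)
print(logits.shape)
```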
Affiliation(s)
- Bogyu Park: AI Dev. Group, Hutom, Dokmak-ro 279, Mapo-gu, 04151 Seoul, Republic of Korea
- Hyeongyu Chi: AI Dev. Group, Hutom, Dokmak-ro 279, Mapo-gu, 04151 Seoul, Republic of Korea
- Bokyung Park: AI Dev. Group, Hutom, Dokmak-ro 279, Mapo-gu, 04151 Seoul, Republic of Korea
- Jiwon Lee: AI Dev. Group, Hutom, Dokmak-ro 279, Mapo-gu, 04151 Seoul, Republic of Korea
- Hye Su Jin: AI Dev. Group, Hutom, Dokmak-ro 279, Mapo-gu, 04151 Seoul, Republic of Korea
- Sunghyun Park: Yonsei University College of Medicine, Yonsei-ro 50, Seodaemun-gu, 03722 Seoul, Republic of Korea
- Woo Jin Hyung: AI Dev. Group, Hutom, Dokmak-ro 279, Mapo-gu, 04151 Seoul, Republic of Korea; Yonsei University College of Medicine, Yonsei-ro 50, Seodaemun-gu, 03722 Seoul, Republic of Korea
- Min-Kook Choi: AI Dev. Group, Hutom, Dokmak-ro 279, Mapo-gu, 04151 Seoul, Republic of Korea
15. Demir KC, Schieber H, Weise T, Roth D, May M, Maier A, Yang SH. Deep Learning in Surgical Workflow Analysis: A Review of Phase and Step Recognition. IEEE J Biomed Health Inform 2023; 27:5405-5417. [PMID: 37665700] [DOI: 10.1109/jbhi.2023.3311628]
Abstract
OBJECTIVE In the last two decades, there has been growing interest in analyzing surgical procedures with statistical models at different semantic levels. This information is necessary for developing context-aware intelligent systems that can assist physicians during operations, evaluate procedures afterward, or help the management team utilize the operating room effectively. The objective is to extract reliable patterns from surgical data for the robust estimation of the surgical activities performed during operations. The purpose of this article is to review state-of-the-art deep learning methods published after 2018 for analyzing surgical workflows, with a focus on phase and step recognition. METHODS Three databases, IEEE Xplore, Scopus, and PubMed, were searched, and additional studies were added through a manual search. After the database search, 343 studies were screened and 44 were selected for this review. CONCLUSION The use of temporal information is essential for identifying the next surgical action. Contemporary methods mainly used RNNs, hierarchical CNNs, and Transformers to preserve long-range temporal relations. The lack of large, publicly available datasets for various procedures is a major challenge for developing new and robust models. While supervised learning strategies are used for proof-of-concept studies, self-supervised, semi-supervised, and active learning methods are used to mitigate the dependency on annotated data. SIGNIFICANCE The present study provides a comprehensive review of recent methods in surgical workflow analysis, summarizes commonly used architectures and datasets, and discusses open challenges.
16. Zhang J, Zhou S, Wang Y, Shi S, Wan C, Zhao H, Cai X, Ding H. Laparoscopic Image-Based Critical Action Recognition and Anticipation With Explainable Features. IEEE J Biomed Health Inform 2023; 27:5393-5404. [PMID: 37603480] [DOI: 10.1109/jbhi.2023.3306818]
Abstract
Surgical workflow analysis integrates perception, comprehension, and prediction of the surgical workflow, helping real-time surgical support systems provide proper guidance and assistance to surgeons. This article promotes the idea of critical actions, the essential surgical actions that progress toward the fulfillment of the operation. Fine-grained workflow analysis involves recognizing the current critical action and previewing the moving tendency of instruments in its early stage. To this end, we propose a framework that incorporates operational experience to improve the robustness and interpretability of action recognition in in-vivo situations. High-dimensional images are mapped into a low-dimensional, experience-based, explainable feature space, and critical actions are recognized through a hierarchical classification structure. To forecast an instrument's motion tendency, we model motion primitives in the polar coordinate system (PCS) to represent patterns of complex trajectories. Given the variance across laparoscopic scenes, an adaptive pattern recognition (APR) method, which adapts to uncertain trajectories by modifying model parameters, is designed to improve prediction accuracy. Validations on an in-vivo dataset show that our framework fulfills these surgical awareness tasks with exceptional accuracy and real-time performance.
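As a simple stand-in for the PCS modeling of trajectories, each step of an instrument-tip path can be expressed in polar form (step length, heading); this is illustrative only, not the paper's motion-primitive model:

```python
import numpy as np

def to_polar_steps(traj: np.ndarray) -> np.ndarray:
    """Convert a (T, 2) trajectory into per-step polar descriptors
    (r = step length, theta = step direction)."""
    steps = np.diff(traj, axis=0)                  # (T-1, 2) displacements
    r = np.hypot(steps[:, 0], steps[:, 1])
    theta = np.arctan2(steps[:, 1], steps[:, 0])
    return np.stack([r, theta], axis=1)

traj = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
print(to_polar_steps(traj))
```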
17. Nwoye CI, Yu T, Sharma S, Murali A, Alapatt D, Vardazaryan A, Yuan K, Hajek J, Reiter W, Yamlahi A, Smidt FH, Zou X, Zheng G, Oliveira B, Torres HR, Kondo S, Kasai S, Holm F, Özsoy E, Gui S, Li H, Raviteja S, Sathish R, Poudel P, Bhattarai B, Wang Z, Rui G, Schellenberg M, Vilaça JL, Czempiel T, Wang Z, Sheet D, Thapa SK, Berniker M, Godau P, Morais P, Regmi S, Tran TN, Fonseca J, Nölke JH, Lima E, Vazquez E, Maier-Hein L, Navab N, Mascagni P, Seeliger B, Gonzalez C, Mutter D, Padoy N. CholecTriplet2022: Show me a tool and tell me the triplet - an endoscopic vision challenge for surgical action triplet detection. Med Image Anal 2023; 89:102888. [PMID: 37451133] [DOI: 10.1016/j.media.2023.102888]
Abstract
Formalizing surgical activities as triplets of the instrument used, the action performed, and the target anatomy is becoming a gold-standard approach to surgical activity modeling. This formalization provides a more detailed understanding of tool-tissue interaction, which can be used to develop better artificial intelligence assistance for image-guided surgery. Earlier efforts, including the CholecTriplet challenge introduced in 2021, put together techniques aimed at recognizing these triplets from surgical footage. Estimating the spatial locations of the triplets as well would offer more precise intraoperative context-aware decision support for computer-assisted intervention. This paper presents the CholecTriplet2022 challenge, which extends surgical action triplet modeling from recognition to detection. It includes weakly supervised bounding-box localization of every visible surgical instrument (or tool), as the key actor, and the modeling of each tool activity in the form of an ‹instrument, verb, target› triplet. The paper describes a baseline method and 10 new deep learning algorithms presented at the challenge to solve the task. It also provides thorough methodological comparisons of the methods; an in-depth analysis of the results across multiple metrics and of the visual and procedural challenges and their significance; and useful insights for future research directions and applications in surgery.
Affiliation(s)
- Tong Yu: ICube, University of Strasbourg, CNRS, France
- Kun Yuan: ICube, University of Strasbourg, CNRS, France; Technical University Munich, Germany
- Amine Yamlahi: Division of Intelligent Medical Systems (IMSY), German Cancer Research Center (DKFZ), Heidelberg, Germany
- Finn-Henri Smidt: Division of Intelligent Medical Systems (IMSY), German Cancer Research Center (DKFZ), Heidelberg, Germany
- Xiaoyang Zou: Institute of Medical Robotics, School of Biomedical Engineering, Shanghai Jiao Tong University, China
- Guoyan Zheng: Institute of Medical Robotics, School of Biomedical Engineering, Shanghai Jiao Tong University, China
- Bruno Oliveira: 2Ai School of Technology, IPCA, Barcelos, Portugal; Life and Health Science Research Institute (ICVS), School of Medicine, University of Minho, Braga, Portugal; Algoritmi Center, School of Engineering, University of Minho, Guimarães, Portugal
- Helena R Torres: 2Ai School of Technology, IPCA, Barcelos, Portugal; Life and Health Science Research Institute (ICVS), School of Medicine, University of Minho, Braga, Portugal; Algoritmi Center, School of Engineering, University of Minho, Guimarães, Portugal
- Ege Özsoy: Technical University Munich, Germany
- Han Li: Southern University of Science and Technology, China
- Melanie Schellenberg: Division of Intelligent Medical Systems (IMSY), German Cancer Research Center (DKFZ), Heidelberg, Germany; National Center for Tumor Diseases (NCT), Heidelberg, Germany
- Zhenkun Wang: Southern University of Science and Technology, China
- Shrawan Kumar Thapa: Nepal Applied Mathematics and Informatics Institute for Research (NAAMII), Nepal
- Patrick Godau: Division of Intelligent Medical Systems (IMSY), German Cancer Research Center (DKFZ), Heidelberg, Germany; National Center for Tumor Diseases (NCT), Heidelberg, Germany
- Pedro Morais: 2Ai School of Technology, IPCA, Barcelos, Portugal
- Sudarshan Regmi: Nepal Applied Mathematics and Informatics Institute for Research (NAAMII), Nepal
- Thuy Nuong Tran: Division of Intelligent Medical Systems (IMSY), German Cancer Research Center (DKFZ), Heidelberg, Germany
- Jaime Fonseca: Algoritmi Center, School of Engineering, University of Minho, Guimarães, Portugal
- Jan-Hinrich Nölke: Division of Intelligent Medical Systems (IMSY), German Cancer Research Center (DKFZ), Heidelberg, Germany; National Center for Tumor Diseases (NCT), Heidelberg, Germany
- Estevão Lima: Life and Health Science Research Institute (ICVS), School of Medicine, University of Minho, Braga, Portugal
- Lena Maier-Hein: Division of Intelligent Medical Systems (IMSY), German Cancer Research Center (DKFZ), Heidelberg, Germany
- Pietro Mascagni: Fondazione Policlinico Universitario Agostino Gemelli IRCCS, Rome, Italy
- Barbara Seeliger: ICube, University of Strasbourg, CNRS, France; University Hospital of Strasbourg, France; IHU Strasbourg, France
- Didier Mutter: University Hospital of Strasbourg, France; IHU Strasbourg, France
- Nicolas Padoy: ICube, University of Strasbourg, CNRS, France; IHU Strasbourg, France
18. Li Y, Xia T, Luo H, He B, Jia F. MT-FiST: A Multi-Task Fine-Grained Spatial-Temporal Framework for Surgical Action Triplet Recognition. IEEE J Biomed Health Inform 2023; 27:4983-4994. [PMID: 37498758] [DOI: 10.1109/jbhi.2023.3299321]
Abstract
Surgical action triplet recognition plays a significant role in helping surgeons with scene analysis and decision-making in computer-assisted surgery. Compared to traditional context-aware tasks such as phase recognition, surgical action triplets, comprising instrument, verb, and target, offer more comprehensive and detailed information. However, current triplet recognition methods fall short in distinguishing fine-grained subclasses and disregard temporal correlations among action triplets. In this article, we propose MT-FiST, a multi-task fine-grained spatial-temporal framework for surgical action triplet recognition. The proposed method uses a multi-label mutual channel loss consisting of diversity and discriminative components; this loss decouples global task features into class-aligned features, enabling more local details to be learned from the surgical scene. The framework uses partially shared-parameter LSTM units to capture temporal correlations between adjacent frames. We conducted experiments on the CholecT50 dataset proposed in the MICCAI 2021 Surgical Action Triplet Recognition Challenge, evaluating on the private challenge test set to ensure fair comparison. Our model outperformed state-of-the-art models on the instrument, verb, target, and action triplet recognition tasks, with mAPs of 82.1% (+4.6%), 51.5% (+4.0%), 45.5% (+7.8%), and 35.8% (+3.1%), respectively. MT-FiST thus boosts the recognition of surgical action triplets in context-aware surgical assistant systems through effective temporal aggregation and fine-grained features.
19. Zang C, Turkcan MK, Narasimhan S, Cao Y, Yarali K, Xiang Z, Szot S, Ahmad F, Choksi S, Bitner DP, Filicori F, Kostic Z. Surgical Phase Recognition in Inguinal Hernia Repair - AI-Based Confirmatory Baseline and Exploration of Competitive Models. Bioengineering (Basel) 2023; 10:654. [PMID: 37370585] [DOI: 10.3390/bioengineering10060654]
Abstract
Video-recorded robotic-assisted surgeries allow the use of automated computer vision and artificial intelligence/deep learning methods for quality assessment and workflow analysis, including surgical phase recognition. We considered a dataset of 209 videos of robotic-assisted laparoscopic inguinal hernia repair (RALIHR) collected from 8 surgeons, defined rigorous ground-truth annotation rules, then pre-processed and annotated the videos. We deployed seven deep learning models to establish the baseline accuracy for surgical phase recognition and explored four advanced architectures. For rapid execution of the studies, we initially engaged three dozen MS-level engineering students in a competitive classroom setting, followed by focused research. We unified the data processing pipeline in a confirmatory study and explored a number of scenarios that differed in how the deep learning (DL) networks were trained and evaluated. For the scenario with 21 validation videos from all surgeons, the Video Swin Transformer model achieved ~0.85 validation accuracy, and the Perceiver IO model achieved ~0.84. Our studies affirm the necessity of close collaborative research between medical experts and engineers for developing automated surgical phase recognition models deployable in clinical settings.
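As a concrete illustration of the best-performing model family, the sketch below adapts an off-the-shelf Video Swin Transformer to phase classification. It assumes torchvision 0.14 or later (which ships swin3d_t) and a hypothetical phase label count; it is not the study's pipeline.

```python
# Hedged sketch: adapting torchvision's Video Swin Transformer to surgical
# phase classification. NUM_PHASES is an assumption; substitute your label set.
import torch
import torch.nn as nn
from torchvision.models.video import swin3d_t

NUM_PHASES = 14                             # hypothetical RALIHR phase count
model = swin3d_t(weights=None)              # or Kinetics-pretrained weights
model.head = nn.Linear(model.head.in_features, NUM_PHASES)

clip = torch.randn(1, 3, 16, 224, 224)      # (batch, channels, time, H, W)
logits = model(clip)                        # (1, NUM_PHASES)
pred_phase = logits.argmax(dim=1)           # frame-window phase prediction
```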
Affiliation(s)
- Chengbo Zang: Department of Electrical Engineering, Columbia University, New York, NY 10027, USA
- Mehmet Kerem Turkcan: Department of Electrical Engineering, Columbia University, New York, NY 10027, USA
- Sanjeev Narasimhan: Department of Computer Science, Columbia University, New York, NY 10027, USA
- Yuqing Cao: Department of Electrical Engineering, Columbia University, New York, NY 10027, USA
- Kaan Yarali: Department of Electrical Engineering, Columbia University, New York, NY 10027, USA
- Zixuan Xiang: Department of Electrical Engineering, Columbia University, New York, NY 10027, USA
- Skyler Szot: Department of Electrical Engineering, Columbia University, New York, NY 10027, USA
- Feroz Ahmad: Department of Computer Science, Columbia University, New York, NY 10027, USA
- Sarah Choksi: Intraoperative Performance Analytics Laboratory (IPAL), Lenox Hill Hospital, New York, NY 10021, USA
- Daniel P Bitner: Intraoperative Performance Analytics Laboratory (IPAL), Lenox Hill Hospital, New York, NY 10021, USA
- Filippo Filicori: Intraoperative Performance Analytics Laboratory (IPAL), Lenox Hill Hospital, New York, NY 10021, USA; Zucker School of Medicine at Hofstra/Northwell Health, Hempstead, NY 11549, USA
- Zoran Kostic: Department of Electrical Engineering, Columbia University, New York, NY 10027, USA
20
Ramesh S, Srivastav V, Alapatt D, Yu T, Murali A, Sestini L, Nwoye CI, Hamoud I, Sharma S, Fleurentin A, Exarchakis G, Karargyris A, Padoy N. Dissecting self-supervised learning methods for surgical computer vision. Med Image Anal 2023; 88:102844. [PMID: 37270898 DOI: 10.1016/j.media.2023.102844] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/06/2022] [Revised: 05/08/2023] [Accepted: 05/15/2023] [Indexed: 06/06/2023]
Abstract
The field of surgical computer vision has undergone considerable breakthroughs in recent years with the rising popularity of deep neural network-based methods. However, standard fully supervised approaches for training such models require vast amounts of annotated data, imposing a prohibitively high cost, especially in the clinical domain. Self-Supervised Learning (SSL) methods, which have begun to gain traction in the general computer vision community, represent a potential solution to these annotation costs by learning useful representations from unlabeled data alone. Still, the effectiveness of SSL methods in more complex and impactful domains, such as medicine and surgery, remains largely unexplored. In this work, we address this critical need by investigating four state-of-the-art SSL methods (MoCo v2, SimCLR, DINO, SwAV) in the context of surgical computer vision. We present an extensive analysis of the performance of these methods on the Cholec80 dataset for two fundamental and popular tasks in surgical context understanding: phase recognition and tool presence detection. We examine their parameterization, then their behavior with respect to training data quantities in semi-supervised settings. Correct transfer of these methods to surgery, as described and conducted in this work, leads to substantial performance gains over generic uses of SSL - up to 7.4% on phase recognition and 20% on tool presence detection - as well as gains over state-of-the-art semi-supervised phase recognition approaches of up to 14%. Further results obtained on a highly diverse selection of surgical datasets exhibit strong generalization properties. The code is available at https://github.com/CAMMA-public/SelfSupSurg.
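As a flavor of the objectives benchmarked here, below is a compact sketch of the NT-Xent loss used by SimCLR, one of the four SSL methods studied. Batch size, embedding dimension and temperature are illustrative assumptions, and this is not the released SelfSupSurg code.

```python
# Minimal NT-Xent (SimCLR) contrastive loss sketch: each frame's two augmented
# views are positives; all other samples in the batch act as negatives.
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.5):
    """z1, z2: (N, D) projections of two augmented views of the same frames."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)     # (2N, D)
    sim = z @ z.t() / temperature                          # cosine similarities
    n = z1.size(0)
    mask = torch.eye(2 * n, dtype=torch.bool)
    sim.masked_fill_(mask, float('-inf'))                  # drop self-pairs
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)                   # positive as label

loss = nt_xent(torch.randn(8, 128), torch.randn(8, 128))   # toy batch
```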
Affiliation(s)
- Sanat Ramesh: ICube, University of Strasbourg, CNRS, Strasbourg 67000, France; Altair Robotics Lab, Department of Computer Science, University of Verona, Verona 37134, Italy
- Vinkle Srivastav: ICube, University of Strasbourg, CNRS, Strasbourg 67000, France
- Deepak Alapatt: ICube, University of Strasbourg, CNRS, Strasbourg 67000, France
- Tong Yu: ICube, University of Strasbourg, CNRS, Strasbourg 67000, France
- Aditya Murali: ICube, University of Strasbourg, CNRS, Strasbourg 67000, France
- Luca Sestini: ICube, University of Strasbourg, CNRS, Strasbourg 67000, France; Department of Electronics, Information and Bioengineering, Politecnico di Milano, Milano 20133, Italy
- Idris Hamoud: ICube, University of Strasbourg, CNRS, Strasbourg 67000, France
- Saurav Sharma: ICube, University of Strasbourg, CNRS, Strasbourg 67000, France
- Georgios Exarchakis: ICube, University of Strasbourg, CNRS, Strasbourg 67000, France; IHU Strasbourg, Strasbourg 67000, France
- Alexandros Karargyris: ICube, University of Strasbourg, CNRS, Strasbourg 67000, France; IHU Strasbourg, Strasbourg 67000, France
- Nicolas Padoy: ICube, University of Strasbourg, CNRS, Strasbourg 67000, France; IHU Strasbourg, Strasbourg 67000, France
21
Sharma S, Nwoye CI, Mutter D, Padoy N. Rendezvous in time: an attention-based temporal fusion approach for surgical triplet recognition. Int J Comput Assist Radiol Surg 2023:10.1007/s11548-023-02914-1. [PMID: 37097518 DOI: 10.1007/s11548-023-02914-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/13/2023] [Accepted: 04/07/2023] [Indexed: 04/26/2023]
Abstract
PURPOSE One of the recent advances in surgical AI is the recognition of surgical activities as triplets of ‹instrument, verb, target›. Albeit providing detailed information for computer-assisted intervention, current triplet recognition approaches rely only on single-frame features. Exploiting temporal cues from earlier frames would improve the recognition of surgical action triplets from videos. METHODS In this paper, we propose Rendezvous in Time (RiT) - a deep learning model that extends the state-of-the-art model, Rendezvous, with temporal modeling. Focusing more on the verbs, our RiT explores the connectedness of current and past frames to learn temporal attention-based features for enhanced triplet recognition. RESULTS We validate our proposal on the challenging surgical triplet dataset, CholecT45, demonstrating improved recognition of the verb and triplet, along with other interactions involving the verb such as ‹instrument, verb›. Qualitative results show that RiT produces smoother predictions than the state of the art for most triplet instances. CONCLUSION We present a novel attention-based approach that leverages the temporal fusion of video frames to model the evolution of surgical actions and exploits its benefits for surgical triplet recognition.
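The core idea, letting the current frame attend over a short window of past frames, can be sketched in a few lines of PyTorch. This is an illustrative reading of attention-based temporal fusion, not the published RiT implementation; window length and feature size are assumptions.

```python
# Illustrative temporal-fusion sketch in the spirit of RiT: the current
# frame's feature queries a window of past-frame features via attention.
import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=256, num_heads=4, batch_first=True)

past = torch.randn(2, 5, 256)        # (batch, 5 past frames, feature dim)
current = torch.randn(2, 1, 256)     # (batch, 1 current frame, feature dim)
fused, weights = attn(query=current, key=past, value=past)
enhanced = current + fused           # residual temporal enhancement
```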
Affiliation(s)
- Saurav Sharma: ICube, University of Strasbourg, CNRS, Strasbourg, France
- Didier Mutter: IHU Strasbourg, Strasbourg, France; University Hospital of Strasbourg, Strasbourg, France
- Nicolas Padoy: ICube, University of Strasbourg, CNRS, Strasbourg, France; IHU Strasbourg, Strasbourg, France
22
Kiyasseh D, Ma R, Haque TF, Miles BJ, Wagner C, Donoho DA, Anandkumar A, Hung AJ. A vision transformer for decoding surgeon activity from surgical videos. Nat Biomed Eng 2023:10.1038/s41551-023-01010-8. [PMID: 36997732 DOI: 10.1038/s41551-023-01010-8] [Citation(s) in RCA: 25] [Impact Index Per Article: 25.0] [Reference Citation Analysis] [Abstract] [Grants] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/22/2022] [Accepted: 02/15/2023] [Indexed: 04/01/2023]
Abstract
The intraoperative activity of a surgeon has a substantial impact on postoperative outcomes. However, for most surgical procedures, the details of intraoperative surgical actions, which can vary widely, are not well understood. Here we report a machine learning system leveraging a vision transformer and supervised contrastive learning for decoding elements of intraoperative surgical activity from videos commonly collected during robotic surgeries. The system accurately identified surgical steps, actions performed by the surgeon, the quality of these actions, and the relative contribution of individual video frames to the decoding of the actions. Through extensive testing on data from three different hospitals located on two different continents, we show that the system generalizes across videos, surgeons, hospitals and surgical procedures, and that it can provide information on surgical gestures and skills from unannotated videos. Decoding intraoperative activity via accurate machine learning systems could be used to provide surgeons with feedback on their operating skills, and may allow for the identification of optimal surgical behaviour and for the study of relationships between intraoperative factors and postoperative outcomes.
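The training signal paired with the vision transformer can be illustrated with a compact supervised contrastive (SupCon) loss. The sketch below follows the standard SupCon formulation rather than the authors' released code; shapes, label semantics and temperature are assumptions.

```python
# Compact supervised contrastive (SupCon) loss sketch: samples sharing a
# label (e.g. the same gesture or step) are pulled together in embedding space.
import torch
import torch.nn.functional as F

def supcon_loss(features, labels, temperature=0.1):
    """features: (N, D) embeddings; labels: (N,) integer class ids."""
    features = F.normalize(features, dim=1)
    logits = features @ features.t() / temperature          # (N, N)
    n = features.size(0)
    self_mask = torch.eye(n, dtype=torch.bool)
    logits.masked_fill_(self_mask, float('-inf'))           # exclude self
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    pos_count = pos_mask.sum(dim=1).clamp(min=1)            # avoid div by zero
    return -(log_prob * pos_mask).sum(dim=1).div(pos_count).mean()

loss = supcon_loss(torch.randn(16, 128), torch.randint(0, 4, (16,)))
```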
Affiliation(s)
- Dani Kiyasseh: Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, USA
- Runzhuo Ma: Center for Robotic Simulation and Education, Catherine & Joseph Aresty Department of Urology, University of Southern California, Los Angeles, CA, USA
- Taseen F Haque: Center for Robotic Simulation and Education, Catherine & Joseph Aresty Department of Urology, University of Southern California, Los Angeles, CA, USA
- Brian J Miles: Department of Urology, Houston Methodist Hospital, Houston, TX, USA
- Christian Wagner: Department of Urology, Pediatric Urology and Uro-Oncology, Prostate Center Northwest, St. Antonius-Hospital, Gronau, Germany
- Daniel A Donoho: Division of Neurosurgery, Center for Neuroscience, Children's National Hospital, Washington, DC, USA
- Animashree Anandkumar: Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, USA
- Andrew J Hung: Center for Robotic Simulation and Education, Catherine & Joseph Aresty Department of Urology, University of Southern California, Los Angeles, CA, USA
23
Nwoye CI, Alapatt D, Yu T, Vardazaryan A, Xia F, Zhao Z, Xia T, Jia F, Yang Y, Wang H, Yu D, Zheng G, Duan X, Getty N, Sanchez-Matilla R, Robu M, Zhang L, Chen H, Wang J, Wang L, Zhang B, Gerats B, Raviteja S, Sathish R, Tao R, Kondo S, Pang W, Ren H, Abbing JR, Sarhan MH, Bodenstedt S, Bhasker N, Oliveira B, Torres HR, Ling L, Gaida F, Czempiel T, Vilaça JL, Morais P, Fonseca J, Egging RM, Wijma IN, Qian C, Bian G, Li Z, Balasubramanian V, Sheet D, Luengo I, Zhu Y, Ding S, Aschenbrenner JA, van der Kar NE, Xu M, Islam M, Seenivasan L, Jenke A, Stoyanov D, Mutter D, Mascagni P, Seeliger B, Gonzalez C, Padoy N. CholecTriplet2021: A benchmark challenge for surgical action triplet recognition. Med Image Anal 2023; 86:102803. [PMID: 37004378 DOI: 10.1016/j.media.2023.102803] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/17/2022] [Revised: 12/13/2022] [Accepted: 03/23/2023] [Indexed: 03/29/2023]
Abstract
Context-aware decision support in the operating room can foster surgical safety and efficiency by leveraging real-time feedback from surgical workflow analysis. Most existing works recognize surgical activities at a coarse-grained level, such as phases, steps or events, leaving out the fine-grained interaction details of the surgical activity; yet those are needed for more helpful AI assistance in the operating room. Recognizing surgical actions as ‹instrument, verb, target› triplets delivers more comprehensive details about the activities taking place in surgical videos. This paper presents CholecTriplet2021: an endoscopic vision challenge organized at MICCAI 2021 for the recognition of surgical action triplets in laparoscopic videos. The challenge granted private access to the large-scale CholecT50 dataset, which is annotated with action triplet information. In this paper, we present the challenge setup and an assessment of the state-of-the-art deep learning methods proposed by the participants during the challenge. A total of 4 baseline methods from the challenge organizers and 19 new deep learning algorithms from the competing teams are presented, recognizing surgical action triplets directly from surgical videos with mean average precision (mAP) ranging from 4.2% to 38.1%. This study also analyzes the significance of the results obtained by the presented approaches, performs a thorough methodological comparison between them with an in-depth analysis of the results, and proposes a novel ensemble method for enhanced recognition. Our analysis shows that surgical workflow analysis is not yet solved; it also highlights interesting directions for future research on fine-grained surgical activity recognition, which is of utmost importance for the development of AI in surgery.
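The headline numbers are per-class average precisions averaged over triplet classes. A toy sketch of that computation with scikit-learn follows; the challenge itself is evaluated with the dedicated ivtmetrics library, so this is only an illustration on random data.

```python
# Toy mAP computation for multi-label triplet recognition: average precision
# per triplet class, then the mean over classes that actually occur.
import numpy as np
from sklearn.metrics import average_precision_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=(200, 100))   # 200 frames, 100 triplet classes
y_score = rng.random(size=(200, 100))          # per-class confidences

per_class_ap = [average_precision_score(y_true[:, c], y_score[:, c])
                for c in range(y_true.shape[1])
                if y_true[:, c].any()]          # skip classes never present
print(f"mAP over {len(per_class_ap)} classes: {np.mean(per_class_ap):.3f}")
```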
24
Chadebecq F, Lovat LB, Stoyanov D. Artificial intelligence and automation in endoscopy and surgery. Nat Rev Gastroenterol Hepatol 2023; 20:171-182. [PMID: 36352158 DOI: 10.1038/s41575-022-00701-y] [Citation(s) in RCA: 21] [Impact Index Per Article: 21.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Accepted: 10/03/2022] [Indexed: 11/10/2022]
Abstract
Modern endoscopy relies on digital technology, from high-resolution imaging sensors and displays to electronics connecting configurable illumination and actuation systems for robotic articulation. In addition to enabling more effective diagnostic and therapeutic interventions, the digitization of the procedural toolset enables video data capture of the internal human anatomy at unprecedented levels. Interventional video data encapsulate functional and structural information about a patient's anatomy as well as events, activity and action logs about the surgical process. This detailed but difficult-to-interpret record from endoscopic procedures can be linked to preoperative and postoperative records or patient imaging information. Rapid advances in artificial intelligence, especially in supervised deep learning, make it possible to use data from endoscopic procedures to develop computer-assisted interventions that enable better navigation during procedures, automated image interpretation and robotically assisted tool manipulation. In this Perspective, we summarize state-of-the-art artificial intelligence for computer-assisted interventions in gastroenterology and surgery.
Affiliation(s)
- François Chadebecq: Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, London, UK
- Laurence B Lovat: Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, London, UK
- Danail Stoyanov: Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, London, UK
25
Seenivasan L, Islam M, Xu M, Lim CM, Ren H. Task-aware asynchronous multi-task model with class incremental contrastive learning for surgical scene understanding. Int J Comput Assist Radiol Surg 2023; 18:921-928. [PMID: 36648701 DOI: 10.1007/s11548-022-02800-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/26/2022] [Accepted: 11/19/2022] [Indexed: 01/18/2023]
Abstract
PURPOSE Surgical scene understanding with tool-tissue interaction recognition and automatic report generation can play an important role in intraoperative guidance, decision-making and postoperative analysis in robotic surgery. However, domain shifts between different surgeries, with inter- and intra-patient variation and the appearance of novel instruments, degrade the performance of model predictions. Moreover, these tasks require output from multiple models, which can be computationally expensive and affect real-time performance. METHODOLOGY A multi-task learning (MTL) model is proposed for surgical report generation and tool-tissue interaction prediction that deals with domain shift problems. The model consists of a shared feature extractor, a mesh-transformer branch for captioning and a graph attention branch for tool-tissue interaction prediction. The shared feature extractor employs class incremental contrastive learning to tackle intensity shifts and novel class appearance in the target domain. We design Laplacian-of-Gaussian-based curriculum learning into both the shared and task-specific branches to enhance model learning. We incorporate a task-aware asynchronous MTL optimization technique to fine-tune the shared weights and converge both tasks optimally. RESULTS The proposed MTL model trained using task-aware optimization and fine-tuning techniques reported a balanced performance (BLEU score of 0.4049 for scene captioning and accuracy of 0.3508 for interaction detection) for both tasks on the target domain and performed on par with single-task models in domain adaptation. CONCLUSION The proposed multi-task model was able to adapt to domain shifts, incorporate novel instruments in the target domain, and perform tool-tissue interaction detection and report generation on par with single-task models.
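The task-aware asynchronous optimization can be pictured as a shared extractor whose weights are updated by alternating task losses. The sketch below is our schematic reading of that idea, not the authors' code; dimensions, class counts and the strict alternation schedule are assumptions.

```python
# Schematic multi-task sketch: a shared feature extractor feeds two task
# heads, and training alternates which task's loss updates the shared weights.
import torch
import torch.nn as nn

shared = nn.Sequential(nn.Linear(512, 256), nn.ReLU())
caption_head = nn.Linear(256, 1000)        # toy vocabulary logits
interaction_head = nn.Linear(256, 13)      # toy interaction classes
opt = torch.optim.Adam([*shared.parameters(),
                        *caption_head.parameters(),
                        *interaction_head.parameters()], lr=1e-4)

for step in range(4):
    x = torch.randn(8, 512)                # stand-in for visual features
    feats = shared(x)
    if step % 2 == 0:                      # alternate tasks across steps
        loss = nn.functional.cross_entropy(caption_head(feats),
                                           torch.randint(0, 1000, (8,)))
    else:
        loss = nn.functional.cross_entropy(interaction_head(feats),
                                           torch.randint(0, 13, (8,)))
    opt.zero_grad()
    loss.backward()
    opt.step()
```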
Affiliation(s)
- Lalithkumar Seenivasan: Department of Biomedical Engineering, National University of Singapore, Singapore
- Mobarakol Islam: Department of Computing, Imperial College London, London, UK
- Mengya Xu: Department of Biomedical Engineering, National University of Singapore, Singapore
- Chwee Ming Lim: Head and Neck Surgery, Singapore General Hospital, Singapore
- Hongliang Ren: Department of Biomedical Engineering, National University of Singapore, Singapore; Department of Electrical Engineering and Shun Hing Institute of Advanced Engineering, The Chinese University of Hong Kong, Shatin, China
26
Bastian L, Czempiel T, Heiliger C, Karcz K, Eck U, Busam B, Navab N. Know your sensors — a modality study for surgical action classification. COMPUTER METHODS IN BIOMECHANICS AND BIOMEDICAL ENGINEERING: IMAGING & VISUALIZATION 2022. [DOI: 10.1080/21681163.2022.2152377] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/23/2022]
Affiliation(s)
- Lennart Bastian: Chair for Computer Aided Medical Procedures, TU Munich, Munich, Germany
- Tobias Czempiel: Chair for Computer Aided Medical Procedures, TU Munich, Munich, Germany
- Christian Heiliger: Department of General, Visceral, and Transplant Surgery, University Hospital, LMU Munich, Munich, Germany
- Konrad Karcz: Department of General, Visceral, and Transplant Surgery, University Hospital, LMU Munich, Munich, Germany
- Ulrich Eck: Chair for Computer Aided Medical Procedures, TU Munich, Munich, Germany
- Benjamin Busam: Chair for Computer Aided Medical Procedures, TU Munich, Munich, Germany
- Nassir Navab: Chair for Computer Aided Medical Procedures, TU Munich, Munich, Germany; Computer Aided Medical Procedures, Johns Hopkins University, Baltimore, Maryland, USA
27
Zou X, Liu W, Wang J, Tao R, Zheng G. ARST: auto-regressive surgical transformer for phase recognition from laparoscopic videos. COMPUTER METHODS IN BIOMECHANICS AND BIOMEDICAL ENGINEERING: IMAGING & VISUALIZATION 2022. [DOI: 10.1080/21681163.2022.2145238] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
Affiliation(s)
- Xiaoyang Zou: Institute of Medical Robotics, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
- Wenyong Liu: Key Laboratory for Biomechanics and Mechanobiology of Ministry of Education, Beijing Advanced Innovation Center for Biomedical Engineering, School of Biological Science and Medical Engineering, Beihang University, Beijing, China
- Junchen Wang: School of Mechanical Engineering and Automation, Beihang University, Beijing, China
- Rong Tao: Institute of Medical Robotics, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
- Guoyan Zheng: Institute of Medical Robotics, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
28
Wagner M, Brandenburg JM, Bodenstedt S, Schulze A, Jenke AC, Stern A, Daum MTJ, Mündermann L, Kolbinger FR, Bhasker N, Schneider G, Krause-Jüttler G, Alwanni H, Fritz-Kebede F, Burgert O, Wilhelm D, Fallert J, Nickel F, Maier-Hein L, Dugas M, Distler M, Weitz J, Müller-Stich BP, Speidel S. Surgomics: personalized prediction of morbidity, mortality and long-term outcome in surgery using machine learning on multimodal data. Surg Endosc 2022; 36:8568-8591. [PMID: 36171451 PMCID: PMC9613751 DOI: 10.1007/s00464-022-09611-1] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/10/2022] [Accepted: 09/03/2022] [Indexed: 01/06/2023]
Abstract
BACKGROUND Personalized medicine requires the integration and analysis of vast amounts of patient data to realize individualized care. With Surgomics, we aim to facilitate personalized therapy recommendations in surgery by integrating intraoperative surgical data and analyzing them with machine learning methods, leveraging the potential of these data in analogy to Radiomics and Genomics. METHODS We defined Surgomics as the entirety of surgomic features, i.e. process characteristics of a surgical procedure automatically derived from multimodal intraoperative data to quantify processes in the operating room. In a multidisciplinary team, we discussed potential data sources, such as endoscopic videos, vital sign monitoring, and medical devices and instruments, and the respective surgomic features. Subsequently, an online questionnaire was sent to experts from surgery and (computer) science at multiple centers to rate the features' clinical relevance and technical feasibility. RESULTS In total, 52 surgomic features were identified and assigned to eight feature categories. Based on the expert survey (n = 66 participants), the feature category with the highest clinical relevance as rated by surgeons was "surgical skill and quality of performance", both for morbidity and mortality (9.0 ± 1.3 on a numerical rating scale from 1 to 10) and for long-term (oncological) outcome (8.2 ± 1.8). The feature category rated by (computer) scientists as most feasible to extract automatically was "Instrument" (8.5 ± 1.7). Among the surgomic features ranked as most relevant in their respective categories were "intraoperative adverse events", "action performed with instruments", "vital sign monitoring", and "difficulty of surgery". CONCLUSION Surgomics is a promising concept for the analysis of intraoperative data. Surgomics may be used together with preoperative features from clinical data and Radiomics to predict postoperative morbidity, mortality and long-term outcome, as well as to provide tailored feedback for surgeons.
Affiliation(s)
- Martin Wagner: Department of General, Visceral and Transplantation Surgery, Heidelberg University Hospital, Im Neuenheimer Feld 420, 69120 Heidelberg, Germany; National Center for Tumor Diseases (NCT), Heidelberg, Germany
- Johanna M Brandenburg: Department of General, Visceral and Transplantation Surgery, Heidelberg University Hospital, Im Neuenheimer Feld 420, 69120 Heidelberg, Germany; National Center for Tumor Diseases (NCT), Heidelberg, Germany
- Sebastian Bodenstedt: Department of Translational Surgical Oncology, National Center for Tumor Diseases (NCT/UCC), Dresden, Germany; Cluster of Excellence "Centre for Tactile Internet with Human-in-the-Loop" (CeTI), Technische Universität Dresden, 01062 Dresden, Germany
- André Schulze: Department of General, Visceral and Transplantation Surgery, Heidelberg University Hospital, Im Neuenheimer Feld 420, 69120 Heidelberg, Germany; National Center for Tumor Diseases (NCT), Heidelberg, Germany
- Alexander C Jenke: Department of Translational Surgical Oncology, National Center for Tumor Diseases (NCT/UCC), Dresden, Germany
- Antonia Stern: Corporate Research and Technology, Karl Storz SE & Co KG, Tuttlingen, Germany
- Marie T J Daum: Department of General, Visceral and Transplantation Surgery, Heidelberg University Hospital, Im Neuenheimer Feld 420, 69120 Heidelberg, Germany; National Center for Tumor Diseases (NCT), Heidelberg, Germany
- Lars Mündermann: Corporate Research and Technology, Karl Storz SE & Co KG, Tuttlingen, Germany
- Fiona R Kolbinger: Department of Visceral, Thoracic and Vascular Surgery, University Hospital Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany; Else Kröner Fresenius Center for Digital Health, Technische Universität Dresden, Dresden, Germany
- Nithya Bhasker: Department of Translational Surgical Oncology, National Center for Tumor Diseases (NCT/UCC), Dresden, Germany
- Gerd Schneider: Institute of Medical Informatics, Heidelberg University Hospital, Heidelberg, Germany
- Grit Krause-Jüttler: Department of Visceral, Thoracic and Vascular Surgery, University Hospital Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
- Hisham Alwanni: Corporate Research and Technology, Karl Storz SE & Co KG, Tuttlingen, Germany
- Fleur Fritz-Kebede: Institute of Medical Informatics, Heidelberg University Hospital, Heidelberg, Germany
- Oliver Burgert: Research Group Computer Assisted Medicine (CaMed), Reutlingen University, Reutlingen, Germany
- Dirk Wilhelm: Department of Surgery, Faculty of Medicine, Klinikum Rechts der Isar, Technical University of Munich, Munich, Germany
- Johannes Fallert: Corporate Research and Technology, Karl Storz SE & Co KG, Tuttlingen, Germany
- Felix Nickel: Department of General, Visceral and Transplantation Surgery, Heidelberg University Hospital, Im Neuenheimer Feld 420, 69120 Heidelberg, Germany
- Lena Maier-Hein: Department of Intelligent Medical Systems (IMSY), German Cancer Research Center (DKFZ), Heidelberg, Germany
- Martin Dugas: Institute of Medical Informatics, Heidelberg University Hospital, Heidelberg, Germany
- Marius Distler: Department of Visceral, Thoracic and Vascular Surgery, University Hospital Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany; National Center for Tumor Diseases (NCT/UCC), Dresden, Germany; German Cancer Research Center (DKFZ), Heidelberg, Germany; Faculty of Medicine and University Hospital Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany; Helmholtz-Zentrum Dresden-Rossendorf (HZDR), Dresden, Germany
- Jürgen Weitz: Department of Visceral, Thoracic and Vascular Surgery, University Hospital Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany; National Center for Tumor Diseases (NCT/UCC), Dresden, Germany; German Cancer Research Center (DKFZ), Heidelberg, Germany; Faculty of Medicine and University Hospital Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany; Helmholtz-Zentrum Dresden-Rossendorf (HZDR), Dresden, Germany
- Beat-Peter Müller-Stich: Department of General, Visceral and Transplantation Surgery, Heidelberg University Hospital, Im Neuenheimer Feld 420, 69120 Heidelberg, Germany; National Center for Tumor Diseases (NCT), Heidelberg, Germany
- Stefanie Speidel: Department of Translational Surgical Oncology, National Center for Tumor Diseases (NCT/UCC), Dresden, Germany; Cluster of Excellence "Centre for Tactile Internet with Human-in-the-Loop" (CeTI), Technische Universität Dresden, 01062 Dresden, Germany
29
Mascagni P, Alapatt D, Sestini L, Altieri MS, Madani A, Watanabe Y, Alseidi A, Redan JA, Alfieri S, Costamagna G, Boškoski I, Padoy N, Hashimoto DA. Computer vision in surgery: from potential to clinical value. NPJ Digit Med 2022; 5:163. [PMID: 36307544 PMCID: PMC9616906 DOI: 10.1038/s41746-022-00707-5] [Citation(s) in RCA: 35] [Impact Index Per Article: 17.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/15/2022] [Accepted: 10/10/2022] [Indexed: 11/09/2022] Open
Abstract
Hundreds of millions of operations are performed worldwide each year, and the rising uptake of minimally invasive surgery has enabled fiber optic cameras and robots to become both important tools for conducting surgery and sensors from which to capture information about surgery. Computer vision (CV), the application of algorithms to analyze and interpret visual data, has become a critical technology for studying the intraoperative phase of care, with the goals of augmenting surgeons' decision-making processes, supporting safer surgery, and expanding access to surgical care. While much work has been performed on potential use cases, there are currently no CV tools widely used for diagnostic or therapeutic applications in surgery. Using laparoscopic cholecystectomy as an example, we reviewed current CV techniques that have been applied to minimally invasive surgery and their clinical applications. Finally, we discuss the challenges and obstacles that remain to be overcome for broader implementation and adoption of CV in surgery.
Affiliation(s)
- Pietro Mascagni: Gemelli Hospital, Catholic University of the Sacred Heart, Rome, Italy; IHU-Strasbourg, Institute of Image-Guided Surgery, Strasbourg, France; Global Surgical Artificial Intelligence Collaborative, Toronto, ON, Canada
- Deepak Alapatt: ICube, University of Strasbourg, CNRS, IHU, Strasbourg, France
- Luca Sestini: ICube, University of Strasbourg, CNRS, IHU, Strasbourg, France; Department of Electronics, Information and Bioengineering, Politecnico di Milano, Milano, Italy
- Maria S Altieri: Global Surgical Artificial Intelligence Collaborative, Toronto, ON, Canada; Department of Surgery, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Amin Madani: Global Surgical Artificial Intelligence Collaborative, Toronto, ON, Canada; Department of Surgery, University Health Network, Toronto, ON, Canada
- Yusuke Watanabe: Global Surgical Artificial Intelligence Collaborative, Toronto, ON, Canada; Department of Surgery, University of Hokkaido, Hokkaido, Japan
- Adnan Alseidi: Global Surgical Artificial Intelligence Collaborative, Toronto, ON, Canada; Department of Surgery, University of California San Francisco, San Francisco, CA, USA
- Jay A Redan: Department of Surgery, AdventHealth-Celebration Health, Celebration, FL, USA
- Sergio Alfieri: Fondazione Policlinico Universitario A. Gemelli IRCCS, Rome, Italy
- Guido Costamagna: Fondazione Policlinico Universitario A. Gemelli IRCCS, Rome, Italy
- Ivo Boškoski: Fondazione Policlinico Universitario A. Gemelli IRCCS, Rome, Italy
- Nicolas Padoy: IHU-Strasbourg, Institute of Image-Guided Surgery, Strasbourg, France; ICube, University of Strasbourg, CNRS, IHU, Strasbourg, France
- Daniel A Hashimoto: Global Surgical Artificial Intelligence Collaborative, Toronto, ON, Canada; Department of Surgery, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
30
Loukas C, Gazis A, Schizas D. Multiple instance convolutional neural network for gallbladder assessment from laparoscopic images. Int J Med Robot 2022; 18:e2445. [DOI: 10.1002/rcs.2445] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/17/2022] [Revised: 06/01/2022] [Accepted: 07/21/2022] [Indexed: 12/07/2022]
Affiliation(s)
- Constantinos Loukas: Laboratory of Medical Physics, Medical School, National and Kapodistrian University of Athens, Athens, Greece
- Athanasios Gazis: Laboratory of Medical Physics, Medical School, National and Kapodistrian University of Athens, Athens, Greece
- Dimitrios Schizas: 1st Department of Surgery, Medical School, Laikon General Hospital, National and Kapodistrian University of Athens, Athens, Greece
31
Surgical Tool Datasets for Machine Learning Research: A Survey. Int J Comput Vis 2022. [DOI: 10.1007/s11263-022-01640-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/05/2022]
Abstract
This paper is a comprehensive survey of datasets for surgical tool detection and of related surgical data science and machine learning techniques and algorithms. The survey offers a high-level perspective of current research in this area, analyses the taxonomy of approaches adopted by researchers using surgical tool datasets, and addresses key areas of research, such as the datasets used, the evaluation metrics applied and the deep learning techniques utilised. Our presentation and taxonomy provide a framework that facilitates greater understanding of current work, and highlight the challenges and opportunities for further innovative and useful research.