1. Sirajudeen N, Boal M, Anastasiou D, Xu J, Stoyanov D, Kelly J, Collins JW, Sridhar A, Mazomenos E, Francis NK. Deep learning prediction of error and skill in robotic prostatectomy suturing. Surg Endosc 2024. PMID: 39433583. DOI: 10.1007/s00464-024-11341-5.
Abstract
BACKGROUND Manual objective assessment of skill and errors in minimally invasive surgery has been validated with correlation to surgical expertise and patient outcomes. However, assessment and error annotation can be subjective and are time-consuming processes, often precluding their use. Recent years have seen the development of artificial intelligence models that work towards automating the process to allow reduction of errors and truly objective assessment. This study aimed to validate surgical skill rating and error annotations in suturing gestures to inform the development and evaluation of AI models. METHODS The SAR-RARP50 open dataset was blindly and independently annotated at the gesture level for Robotic-Assisted Radical Prostatectomy (RARP) suturing. Manual objective assessment tools and an error annotation methodology, Objective Clinical Human Reliability Analysis (OCHRA), were used as ground truth to train and test vision-based deep learning methods that estimate skill and errors. Analysis included descriptive statistics as well as tool validity and reliability. RESULTS Fifty-four RARP videos (266 min) were analysed. Strong/excellent inter-rater reliability (range r = 0.70-0.89, p < 0.001) and very strong correlation (r = 0.92, p < 0.001) between objective assessment tools were demonstrated. Skill estimation of OSATS and M-GEARS had a Spearman's correlation coefficient of 0.37 and 0.36, respectively, with a normalised mean absolute error representing a prediction error of 17.92% (inverted "accuracy" 82.08%) and 20.6% (inverted "accuracy" 79.4%), respectively. The best performing models in error prediction achieved a mean absolute precision of 37.14%, an area under the curve of 65.10% and a Macro-F1 of 58.97%. CONCLUSIONS This is the first study to employ detailed error detection methodology and deep learning models within real robotic surgical video. This benchmark evaluation of AI models sets a foundation and a promising approach for future advancements in automated technical skill assessment.
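A minimal sketch of how skill-estimation figures of this kind can be computed, assuming ratings on a bounded scale: Spearman's correlation between predicted and reference scores, and a mean absolute error normalised by the scale width, whose complement corresponds to the "inverted accuracy" quoted above. The scale bounds, example scores and function name are illustrative assumptions, not values from the study.

```python
import numpy as np
from scipy.stats import spearmanr

def skill_metrics(y_true, y_pred, score_min=1.0, score_max=5.0):
    """Spearman rho and normalised MAE (in %) for predicted vs. reference skill scores."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    rho, p_value = spearmanr(y_true, y_pred)
    # Normalise the MAE by the rating-scale width so it reads as a percentage error.
    nmae = np.mean(np.abs(y_true - y_pred)) / (score_max - score_min) * 100.0
    return rho, p_value, nmae

# Example with made-up scores only:
rho, p, nmae = skill_metrics([3, 4, 2, 5, 3], [3.2, 3.5, 2.4, 4.6, 3.1])
print(f"Spearman rho={rho:.2f} (p={p:.3f}), prediction error={nmae:.1f}%, "
      f"inverted 'accuracy'={100 - nmae:.1f}%")
```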
Affiliation(s)
- N Sirajudeen: Wellcome/EPSRC Centre for Interventional Surgical Sciences (WEISS), University College London (UCL), London, UK
- M Boal: Wellcome/EPSRC Centre for Interventional Surgical Sciences (WEISS), University College London (UCL), London, UK; The Griffin Institute, Northwick Park and St Marks Hospital, London, UK; Division of Surgery and Interventional Science, Research Department of Targeted Intervention, UCL, London, UK
- D Anastasiou: Wellcome/EPSRC Centre for Interventional Surgical Sciences (WEISS), University College London (UCL), London, UK; Medical Physics and Biomedical Engineering, UCL, London, UK
- J Xu: Wellcome/EPSRC Centre for Interventional Surgical Sciences (WEISS), University College London (UCL), London, UK; Medical Physics and Biomedical Engineering, UCL, London, UK
- D Stoyanov: Wellcome/EPSRC Centre for Interventional Surgical Sciences (WEISS), University College London (UCL), London, UK; Computer Vision, UCL, London, UK
- J Kelly: Division of Surgery and Interventional Science, Research Department of Targeted Intervention, UCL, London, UK; Computer Vision, UCL, London, UK
- J W Collins: Division of Surgery and Interventional Science, Research Department of Targeted Intervention, UCL, London, UK; University College London Hospitals NHS Foundation Trust, London, UK
- A Sridhar: Division of Surgery and Interventional Science, Research Department of Targeted Intervention, UCL, London, UK; University College London Hospitals NHS Foundation Trust, London, UK
- E Mazomenos: Wellcome/EPSRC Centre for Interventional Surgical Sciences (WEISS), University College London (UCL), London, UK; Medical Physics and Biomedical Engineering, UCL, London, UK
- N K Francis: The Griffin Institute, Northwick Park and St Marks Hospital, London, UK; Division of Surgery and Interventional Science, Research Department of Targeted Intervention, UCL, London, UK; University College London Hospitals NHS Foundation Trust, London, UK; Yeovil District Hospital, Somerset Foundation NHS Trust, Yeovil, UK
2. Liu Y, Boels M, Garcia-Peraza-Herrera LC, Vercauteren T, Dasgupta P, Granados A, Ourselin S. LoViT: Long Video Transformer for surgical phase recognition. Med Image Anal 2024; 99:103366. PMID: 39418831. DOI: 10.1016/j.media.2024.103366.
Abstract
Online surgical phase recognition plays a significant role in building contextual tools that can quantify performance and oversee the execution of surgical workflows. Current approaches are limited in two ways: they train spatial feature extractors with frame-level supervision, which can lead to incorrect predictions because similar frames appear in different phases, and they fuse local and global features poorly owing to computational constraints, which hampers the analysis of the long videos commonly encountered in surgical interventions. In this paper, we present a two-stage method, called Long Video Transformer (LoViT), emphasizing the development of a temporally-rich spatial feature extractor and a phase transition map. The temporally-rich spatial feature extractor is designed to capture critical temporal information within the surgical video frames. The phase transition map provides essential insights into the dynamic transitions between different surgical phases. LoViT combines these innovations with a multiscale temporal aggregator consisting of two cascaded L-Trans modules based on self-attention, followed by a G-Informer module based on ProbSparse self-attention for processing global temporal information. The multi-scale temporal head then leverages the temporally-rich spatial features and the phase transition map to classify surgical phases using phase transition-aware supervision. Our approach consistently outperforms state-of-the-art methods on the Cholec80 and AutoLaparo datasets. Compared to Trans-SVNet, LoViT achieves a 2.4 pp (percentage point) improvement in video-level accuracy on Cholec80 and a 3.1 pp improvement on AutoLaparo. These results demonstrate the effectiveness of our approach in achieving state-of-the-art surgical phase recognition on two datasets with different surgical procedures and temporal sequencing characteristics. The project page is available at https://github.com/MRUIL/LoViT.
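For orientation, a heavily simplified PyTorch sketch of the two-stage pattern described above: pre-extracted per-frame spatial features pass through a local temporal encoder and then a global temporal aggregator before per-frame phase classification. The layer sizes are placeholders, and the stand-in modules do not reproduce the actual L-Trans, G-Informer, or ProbSparse attention designs.

```python
import torch
import torch.nn as nn

class TwoStagePhaseRecognizer(nn.Module):
    def __init__(self, feat_dim=768, num_phases=7):
        super().__init__()
        local_layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=8, batch_first=True)
        self.local_temporal = nn.TransformerEncoder(local_layer, num_layers=2)    # "L-Trans"-like stage
        global_layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=8, batch_first=True)
        self.global_temporal = nn.TransformerEncoder(global_layer, num_layers=1)  # stand-in for G-Informer
        self.head = nn.Linear(feat_dim, num_phases)

    def forward(self, frame_features):            # (batch, time, feat_dim) from a spatial extractor
        x = self.local_temporal(frame_features)   # short-range temporal context
        x = self.global_temporal(x)                # long-range context over the whole clip
        return self.head(x)                        # per-frame phase logits

logits = TwoStagePhaseRecognizer()(torch.randn(1, 128, 768))
print(logits.shape)  # torch.Size([1, 128, 7])
```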
Affiliation(s)
- Yang Liu: Department of Surgical & Interventional Engineering, King's College London, United Kingdom
- Maxence Boels: Department of Surgical & Interventional Engineering, King's College London, United Kingdom
- Tom Vercauteren: Department of Surgical & Interventional Engineering, King's College London, United Kingdom
- Prokar Dasgupta: Peter Gorer Department of Immunobiology, King's College London, United Kingdom
- Alejandro Granados: Department of Surgical & Interventional Engineering, King's College London, United Kingdom
- Sébastien Ourselin: Department of Surgical & Interventional Engineering, King's College London, United Kingdom
3. Theocharopoulos C, Davakis S, Ziogas DC, Theocharopoulos A, Foteinou D, Mylonakis A, Katsaros I, Gogas H, Charalabopoulos A. Deep Learning for Image Analysis in the Diagnosis and Management of Esophageal Cancer. Cancers (Basel) 2024; 16:3285. PMID: 39409906. PMCID: PMC11475041. DOI: 10.3390/cancers16193285.
Abstract
Esophageal cancer has a dismal prognosis and necessitates a multimodal and multidisciplinary approach from diagnosis to treatment. High-definition white-light endoscopy and histopathological confirmation remain the gold standard for the definitive diagnosis of premalignant and malignant lesions. Artificial intelligence using deep learning (DL) methods for image analysis constitutes a promising adjunct for the clinical endoscopist that could effectively decrease overdiagnosis of Barrett's esophagus (BE) and unnecessary surveillance, while also assisting in the timely detection of dysplastic BE and esophageal cancer. A plethora of studies published during the last five years have consistently reported highly accurate DL algorithms with performance comparable or superior to that of endoscopists. Recent efforts aim to expand DL utilization into further aspects of esophageal neoplasia management, including histologic diagnosis, segmentation of gross tumor volume, pretreatment prediction and post-treatment evaluation of patient response to systemic therapy, and operative guidance during minimally invasive esophagectomy. Our manuscript serves as an introduction to the growing literature on DL applications for image analysis in the management of esophageal neoplasia, concisely presenting all currently published studies. We also aim to guide the clinician across basic functional principles, evaluation metrics and limitations of DL for image recognition to facilitate the comprehension and critical evaluation of the presented studies.
Affiliation(s)
- Spyridon Davakis: First Department of Surgery, School of Medicine, Laiko General Hospital, National and Kapodistrian University of Athens, 11527 Athens, Greece
- Dimitrios C. Ziogas: First Department of Medicine, School of Medicine, Laiko General Hospital, National and Kapodistrian University of Athens, 11527 Athens, Greece
- Achilleas Theocharopoulos: Department of Electrical and Computer Engineering, National Technical University of Athens, 10682 Athens, Greece
- Dimitra Foteinou: First Department of Medicine, School of Medicine, Laiko General Hospital, National and Kapodistrian University of Athens, 11527 Athens, Greece
- Adam Mylonakis: First Department of Surgery, School of Medicine, Laiko General Hospital, National and Kapodistrian University of Athens, 11527 Athens, Greece
- Ioannis Katsaros: First Department of Surgery, School of Medicine, Laiko General Hospital, National and Kapodistrian University of Athens, 11527 Athens, Greece
- Helen Gogas: First Department of Medicine, School of Medicine, Laiko General Hospital, National and Kapodistrian University of Athens, 11527 Athens, Greece
- Alexandros Charalabopoulos: First Department of Surgery, School of Medicine, Laiko General Hospital, National and Kapodistrian University of Athens, 11527 Athens, Greece
4. Rao M, Qin Y, Kolouri S, Wu JY, Moyer D. Zero-shot prompt-based video encoder for surgical gesture recognition. Int J Comput Assist Radiol Surg 2024. PMID: 39287713. DOI: 10.1007/s11548-024-03257-1.
Abstract
PURPOSE In order to produce a surgical gesture recognition system that can support a wide variety of procedures, either a very large annotated dataset must be acquired, or fitted models must generalize to new labels (so-called zero-shot capability). In this paper we investigate the feasibility of the latter option. METHODS Leveraging the Bridge-Prompt framework, we prompt-tune a pre-trained vision-text model (CLIP) for gesture recognition in surgical videos. This approach can draw on extensive outside video and text data, while also making use of label meta-data and weakly supervised contrastive losses. RESULTS Our experiments show that the prompt-based video encoder outperforms standard encoders in surgical gesture recognition tasks. Notably, it displays strong performance in zero-shot scenarios, where gestures/tasks that were not provided during the encoder training phase are included in the prediction phase. Additionally, we measure the benefit of including text descriptions in the feature extractor training schema. CONCLUSION Bridge-Prompt and similar pre-trained, prompt-tuned video encoder models provide strong visual representations for surgical robotics, especially in gesture recognition tasks. Given the diverse range of surgical tasks (gestures), the ability of these models to transfer zero-shot, without the need for any task (gesture) specific retraining, makes them invaluable.
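As a rough illustration of the zero-shot idea, the snippet below scores a single video frame against free-text gesture descriptions with an off-the-shelf CLIP model. The paper itself prompt-tunes CLIP within the Bridge-Prompt framework and works on frame sequences; the prompts, dummy image, and model checkpoint here are assumptions for demonstration only.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

gesture_prompts = [  # hypothetical free-text gesture descriptions
    "a surgical tool pushing a needle through tissue",
    "a surgical tool pulling suture thread",
    "a surgical tool tying a knot",
]
frame = Image.new("RGB", (224, 224))  # stand-in for an extracted laparoscopic video frame

inputs = processor(text=gesture_prompts, images=frame, return_tensors="pt", padding=True)
with torch.no_grad():
    probs = model(**inputs).logits_per_image.softmax(dim=-1)
print(dict(zip(gesture_prompts, probs.squeeze().tolist())))
```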
Affiliation(s)
- Mingxing Rao: Department of Computer Science, Vanderbilt University, Nashville, USA
- Yinhong Qin: Department of Computer Science, Vanderbilt University, Nashville, USA
- Soheil Kolouri: Department of Computer Science, Vanderbilt University, Nashville, USA
- Jie Ying Wu: Department of Computer Science, Vanderbilt University, Nashville, USA
- Daniel Moyer: Department of Computer Science, Vanderbilt University, Nashville, USA
5. Fernicola A, Palomba G, Capuano M, De Palma GD, Aprea G. Artificial intelligence applied to laparoscopic cholecystectomy: what is the next step? A narrative review. Updates Surg 2024; 76:1655-1667. PMID: 38839723. PMCID: PMC11455722. DOI: 10.1007/s13304-024-01892-6.
Abstract
Artificial Intelligence (AI) is playing an increasing role in several fields of medicine. AI is also used during laparoscopic cholecystectomy (LC) surgeries. In the literature, there is no review that groups together the various fields of application of AI applied to LC. The aim of this review is to describe the use of AI in these contexts. We performed a narrative literature review by searching PubMed, Web of Science, Scopus and Embase for all studies on AI applied to LC, published from January 01, 2010, to December 30, 2023. Our focus was on randomized controlled trials (RCTs), meta-analysis, systematic reviews, and observational studies, dealing with large cohorts of patients. We then gathered further relevant studies from the reference list of the selected publications. Based on the studies reviewed, it emerges that AI could strongly improve surgical efficiency and accuracy during LC. Future prospects include speeding up, implementing, and improving the automaticity with which AI recognizes, differentiates and classifies the phases of the surgical intervention and the anatomic structures that are safe and those at risk.
Affiliation(s)
- Agostino Fernicola: Division of Endoscopic Surgery, Department of Clinical Medicine and Surgery, "Federico II" University of Naples, Via Pansini 5, 80131, Naples, Italy
- Giuseppe Palomba: Division of Endoscopic Surgery, Department of Clinical Medicine and Surgery, "Federico II" University of Naples, Via Pansini 5, 80131, Naples, Italy
- Marianna Capuano: Division of Endoscopic Surgery, Department of Clinical Medicine and Surgery, "Federico II" University of Naples, Via Pansini 5, 80131, Naples, Italy
- Giovanni Domenico De Palma: Division of Endoscopic Surgery, Department of Clinical Medicine and Surgery, "Federico II" University of Naples, Via Pansini 5, 80131, Naples, Italy
- Giovanni Aprea: Division of Endoscopic Surgery, Department of Clinical Medicine and Surgery, "Federico II" University of Naples, Via Pansini 5, 80131, Naples, Italy
6. Wagner L, Jourdan S, Mayer L, Müller C, Bernhard L, Kolb S, Harb F, Jell A, Berlet M, Feussner H, Buxmann P, Knoll A, Wilhelm D. Robotic scrub nurse to anticipate surgical instruments based on real-time laparoscopic video analysis. Commun Med 2024; 4:156. PMID: 39095639. PMCID: PMC11297199. DOI: 10.1038/s43856-024-00581-0.
Abstract
BACKGROUND Machine learning and robotics technologies are increasingly being used in the healthcare domain to improve the quality and efficiency of surgeries and to address challenges such as staff shortages. Robotic scrub nurses in particular offer great potential to address staff shortages by assuming nursing tasks such as the handover of surgical instruments. METHODS We introduce a robotic scrub nurse system designed to enhance the quality of surgeries and the efficiency of surgical workflows by predicting and delivering the required surgical instruments based on real-time laparoscopic video analysis. We propose a three-stage deep learning architecture consisting of a single-frame model, a temporal multi-frame model, and an informed model to anticipate surgical instruments. The anticipation model was trained on a total of 62 laparoscopic cholecystectomies. RESULTS Here, we show that our prediction system can accurately anticipate 71.54% of the surgical instruments required during laparoscopic cholecystectomies in advance, facilitating a smoother surgical workflow and reducing the need for verbal communication. As the instruments in the left working trocar are changed less frequently and according to a standardized procedure, the prediction system works particularly well for this trocar. CONCLUSIONS The robotic scrub nurse thus acts as a mind reader and helps to mitigate staff shortages by taking over a great share of the workload during surgeries, while additionally enabling enhanced process standardization.
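A hedged sketch of the anticipation idea: per-frame visual features from a sliding window are aggregated by a temporal network that outputs the instrument expected next. The paper's three-stage design (single-frame, temporal multi-frame, and informed model) is not reproduced; the instrument list and layer sizes below are invented for illustration.

```python
import torch
import torch.nn as nn

INSTRUMENTS = ["grasper", "hook", "clip_applier", "scissors", "irrigator"]  # illustrative only

class InstrumentAnticipator(nn.Module):
    def __init__(self, feat_dim=512, hidden=256, num_instruments=len(INSTRUMENTS)):
        super().__init__()
        self.temporal = nn.GRU(feat_dim, hidden, batch_first=True)  # stand-in for the multi-frame stage
        self.head = nn.Linear(hidden, num_instruments)

    def forward(self, window_features):            # (batch, time, feat_dim) from a frame encoder
        _, last_hidden = self.temporal(window_features)
        return self.head(last_hidden.squeeze(0))   # logits for the next required instrument

logits = InstrumentAnticipator()(torch.randn(2, 30, 512))
print(logits.argmax(dim=-1))  # predicted next-instrument index per clip
```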
Affiliation(s)
- Lars Wagner: Technical University of Munich, TUM School of Medicine and Health, Klinikum rechts der Isar, Research Group MITI, Munich, Germany
- Sara Jourdan: Technical University of Darmstadt, Software & Digital Business Group, Darmstadt, Germany
- Leon Mayer: Technical University of Munich, TUM School of Medicine and Health, Klinikum rechts der Isar, Research Group MITI, Munich, Germany
- Carolin Müller: Technical University of Munich, TUM School of Medicine and Health, Klinikum rechts der Isar, Research Group MITI, Munich, Germany
- Lukas Bernhard: Technical University of Munich, TUM School of Medicine and Health, Klinikum rechts der Isar, Research Group MITI, Munich, Germany
- Sven Kolb: Technical University of Munich, TUM School of Medicine and Health, Klinikum rechts der Isar, Research Group MITI, Munich, Germany
- Farid Harb: Technical University of Munich, TUM School of Medicine and Health, Klinikum rechts der Isar, Research Group MITI, Munich, Germany
- Alissa Jell: Technical University of Munich, TUM School of Medicine and Health, Klinikum rechts der Isar, Research Group MITI, Munich, Germany; Technical University of Munich, TUM School of Medicine and Health, Klinikum rechts der Isar, Department of Surgery, Munich, Germany
- Maximilian Berlet: Technical University of Munich, TUM School of Medicine and Health, Klinikum rechts der Isar, Research Group MITI, Munich, Germany; Technical University of Munich, TUM School of Medicine and Health, Klinikum rechts der Isar, Department of Surgery, Munich, Germany
- Hubertus Feussner: Technical University of Munich, TUM School of Medicine and Health, Klinikum rechts der Isar, Research Group MITI, Munich, Germany; Technical University of Munich, TUM School of Medicine and Health, Klinikum rechts der Isar, Department of Surgery, Munich, Germany
- Peter Buxmann: Technical University of Darmstadt, Software & Digital Business Group, Darmstadt, Germany
- Alois Knoll: Technical University of Munich, TUM School of Computation, Information and Technology, Chair of Robotics, Artificial Intelligence and Real-Time Systems, Garching, Germany
- Dirk Wilhelm: Technical University of Munich, TUM School of Medicine and Health, Klinikum rechts der Isar, Research Group MITI, Munich, Germany; Technical University of Munich, TUM School of Medicine and Health, Klinikum rechts der Isar, Department of Surgery, Munich, Germany; Technical University of Munich, Munich Institute of Robotics and Machine Intelligence, Munich, Germany
7. Batić D, Holm F, Özsoy E, Czempiel T, Navab N. EndoViT: pretraining vision transformers on a large collection of endoscopic images. Int J Comput Assist Radiol Surg 2024; 19:1085-1091. PMID: 38570373. PMCID: PMC11178556. DOI: 10.1007/s11548-024-03091-5.
Abstract
PURPOSE Automated endoscopy video analysis is essential for assisting surgeons during medical procedures, but it faces challenges due to complex surgical scenes and limited annotated data. Large-scale pretraining has shown great success in natural language processing and computer vision communities in recent years. These approaches reduce the need for annotated data, which is of great interest in the medical domain. In this work, we investigate endoscopy domain-specific self-supervised pretraining on large collections of data. METHODS To this end, we first collect Endo700k, the largest publicly available corpus of endoscopic images, extracted from nine public Minimally Invasive Surgery (MIS) datasets. Endo700k comprises more than 700,000 images. Next, we introduce EndoViT, an endoscopy-pretrained Vision Transformer (ViT), and evaluate it on a diverse set of surgical downstream tasks. RESULTS Our findings indicate that domain-specific pretraining with EndoViT yields notable advantages in complex downstream tasks. In the case of action triplet recognition, our approach outperforms ImageNet pretraining. In semantic segmentation, we surpass the state-of-the-art (SOTA) performance. These results demonstrate the effectiveness of our domain-specific pretraining approach in addressing the challenges of automated endoscopy video analysis. CONCLUSION Our study contributes to the field of medical computer vision by showcasing the benefits of domain-specific large-scale self-supervised pretraining for vision transformers. We release both our code and pretrained models to facilitate further research in this direction: https://github.com/DominikBatic/EndoViT .
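The downstream-use pattern described above can be sketched as follows: take a pretrained ViT backbone and fine-tune a small task head on surgical frames. A generic timm ViT stands in here for EndoViT; the released EndoViT weights would instead be obtained from the authors' repository linked above, and the seven-class head is an assumption.

```python
import timm
import torch

# pretrained=False keeps the sketch offline-runnable; pretrained=True would load ImageNet
# weights, and in practice the EndoViT checkpoint from the repository would be loaded instead.
backbone = timm.create_model("vit_base_patch16_224", pretrained=False, num_classes=0)  # features only
head = torch.nn.Linear(backbone.num_features, 7)  # e.g. a 7-class surgical phase head (assumed)

images = torch.randn(4, 3, 224, 224)              # a batch of endoscopic frames
with torch.no_grad():
    features = backbone(images)                    # (4, 768) pooled ViT features
logits = head(features)
print(logits.shape)                                # torch.Size([4, 7])
```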
Affiliation(s)
- Dominik Batić: Chair for Computer Aided Medical Procedures, Technical University Munich, Munich, Germany
- Felix Holm: Chair for Computer Aided Medical Procedures, Technical University Munich, Munich, Germany; Carl Zeiss AG, Munich, Germany
- Ege Özsoy: Chair for Computer Aided Medical Procedures, Technical University Munich, Munich, Germany
- Tobias Czempiel: Chair for Computer Aided Medical Procedures, Technical University Munich, Munich, Germany
- Nassir Navab: Chair for Computer Aided Medical Procedures, Technical University Munich, Munich, Germany
8. Lavanchy JL, Ramesh S, Dall'Alba D, Gonzalez C, Fiorini P, Müller-Stich BP, Nett PC, Marescaux J, Mutter D, Padoy N. Challenges in multi-centric generalization: phase and step recognition in Roux-en-Y gastric bypass surgery. Int J Comput Assist Radiol Surg 2024. PMID: 38761319. DOI: 10.1007/s11548-024-03166-3.
Abstract
PURPOSE Most studies on surgical activity recognition utilizing artificial intelligence (AI) have focused mainly on recognizing one type of activity from small and mono-centric surgical video datasets. It remains speculative whether those models would generalize to other centers. METHODS In this work, we introduce a large multi-centric multi-activity dataset consisting of 140 surgical videos (MultiBypass140) of laparoscopic Roux-en-Y gastric bypass (LRYGB) surgeries performed at two medical centers: the University Hospital of Strasbourg, France (StrasBypass70) and Inselspital, Bern University Hospital, Switzerland (BernBypass70). The dataset has been fully annotated with phases and steps by two board-certified surgeons. Furthermore, we benchmark different deep learning models and assess their generalizability for the task of phase and step recognition in 7 experimental studies: (1) training and evaluation on BernBypass70; (2) training and evaluation on StrasBypass70; (3) training and evaluation on the joint MultiBypass140 dataset; (4) training on BernBypass70, evaluation on StrasBypass70; (5) training on StrasBypass70, evaluation on BernBypass70; (6) training on MultiBypass140, evaluation on BernBypass70; and (7) training on MultiBypass140, evaluation on StrasBypass70. RESULTS The models' performance is markedly influenced by the training data. The worst results were obtained in experiments (4) and (5), confirming the limited generalization capabilities of models trained on mono-centric data. The use of multi-centric training data, experiments (6) and (7), improves the generalization capabilities of the models, bringing them beyond the level of independent mono-centric training and validation (experiments (1) and (2)). CONCLUSION MultiBypass140 shows considerable variation in surgical technique and workflow of LRYGB procedures between centers. Therefore, the generalization experiments demonstrate a remarkable difference in model performance. These results highlight the importance of multi-centric datasets for AI model generalization to account for variance in surgical technique and workflows. The dataset and code are publicly available at https://github.com/CAMMA-public/MultiBypass140.
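The seven train/evaluation configurations enumerated above can be written out as a simple experiment grid; the dataset names follow the paper, while the dictionary layout is only an illustrative way to organise the runs.

```python
EXPERIMENTS = {
    1: {"train": "BernBypass70",   "eval": "BernBypass70"},
    2: {"train": "StrasBypass70",  "eval": "StrasBypass70"},
    3: {"train": "MultiBypass140", "eval": "MultiBypass140"},
    4: {"train": "BernBypass70",   "eval": "StrasBypass70"},   # cross-centre
    5: {"train": "StrasBypass70",  "eval": "BernBypass70"},    # cross-centre
    6: {"train": "MultiBypass140", "eval": "BernBypass70"},    # multi-centric training
    7: {"train": "MultiBypass140", "eval": "StrasBypass70"},   # multi-centric training
}

for idx, cfg in EXPERIMENTS.items():
    print(f"Experiment {idx}: train on {cfg['train']}, evaluate on {cfg['eval']}")
```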
Affiliation(s)
- Joël L Lavanchy: University Digestive Health Care Center - Clarunis, 4002, Basel, Switzerland; Department of Biomedical Engineering, University of Basel, 4123, Allschwil, Switzerland; Institute of Image-Guided Surgery, IHU Strasbourg, 67000, Strasbourg, France
- Sanat Ramesh: Institute of Image-Guided Surgery, IHU Strasbourg, 67000, Strasbourg, France; ICube, University of Strasbourg, CNRS, 67000, Strasbourg, France; Altair Robotics Lab, University of Verona, 37134, Verona, Italy
- Diego Dall'Alba: Altair Robotics Lab, University of Verona, 37134, Verona, Italy
- Cristians Gonzalez: Institute of Image-Guided Surgery, IHU Strasbourg, 67000, Strasbourg, France; University Hospital of Strasbourg, 67000, Strasbourg, France
- Paolo Fiorini: Altair Robotics Lab, University of Verona, 37134, Verona, Italy
- Beat P Müller-Stich: University Digestive Health Care Center - Clarunis, 4002, Basel, Switzerland; Department of Biomedical Engineering, University of Basel, 4123, Allschwil, Switzerland
- Philipp C Nett: Department of Visceral Surgery and Medicine, Inselspital Bern University Hospital, 3010, Bern, Switzerland
- Didier Mutter: Institute of Image-Guided Surgery, IHU Strasbourg, 67000, Strasbourg, France; University Hospital of Strasbourg, 67000, Strasbourg, France
- Nicolas Padoy: Institute of Image-Guided Surgery, IHU Strasbourg, 67000, Strasbourg, France; ICube, University of Strasbourg, CNRS, 67000, Strasbourg, France
9. Özsoy E, Czempiel T, Örnek EP, Eck U, Tombari F, Navab N. Holistic OR domain modeling: a semantic scene graph approach. Int J Comput Assist Radiol Surg 2024; 19:791-799. PMID: 37823976. PMCID: PMC11098880. DOI: 10.1007/s11548-023-03022-w.
Abstract
PURPOSE Surgical procedures take place in highly complex operating rooms (OR), involving medical staff, patients, devices and their interactions. Until now, only medical professionals have been capable of comprehending these intricate links and interactions. This work advances the field toward automated, comprehensive and semantic understanding and modeling of the OR domain by introducing semantic scene graphs (SSG) as a novel approach to describing and summarizing surgical environments in a structured and semantically rich manner. METHODS We create the first open-source 4D SSG dataset. 4D-OR includes simulated total knee replacement surgeries captured by RGB-D sensors in a realistic OR simulation center. It includes annotations for SSGs, human and object pose, clinical roles and surgical phase labels. We introduce a neural network-based SSG generation pipeline for semantic reasoning in the OR and apply our approach to two downstream tasks: clinical role prediction and surgical phase recognition. RESULTS We show that our pipeline can successfully reason within the OR domain. The capabilities of our scene graphs are further highlighted by their successful application to clinical role prediction and surgical phase recognition tasks. CONCLUSION This work paves the way for multimodal holistic operating room modeling, with the potential to significantly enhance the state of the art in surgical data analysis, such as enabling more efficient and precise decision-making during surgical procedures, and ultimately improving patient safety and surgical outcomes. We release our code and dataset at github.com/egeozsoy/4D-OR.
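A toy illustration of the scene-graph representation: OR entities become nodes and their interactions become labelled directed edges. The entity and relation names below are invented for illustration and are not taken from the 4D-OR annotation schema.

```python
import networkx as nx

ssg = nx.DiGraph()
ssg.add_nodes_from(["surgeon", "assistant", "patient", "drill", "operating_table"])
ssg.add_edge("surgeon", "drill", relation="holding")
ssg.add_edge("drill", "patient", relation="drilling")
ssg.add_edge("patient", "operating_table", relation="lying_on")
ssg.add_edge("assistant", "patient", relation="assisting")

# Print the graph as subject --relation--> object triples.
for subject, obj, data in ssg.edges(data=True):
    print(f"{subject} --{data['relation']}--> {obj}")
```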
Affiliation(s)
- Ege Özsoy: Computer Aided Medical Procedures, Technische Universität München, Garching, Germany
- Tobias Czempiel: Computer Aided Medical Procedures, Technische Universität München, Garching, Germany
- Evin Pınar Örnek: Computer Aided Medical Procedures, Technische Universität München, Garching, Germany
- Ulrich Eck: Computer Aided Medical Procedures, Technische Universität München, Garching, Germany
- Federico Tombari: Computer Aided Medical Procedures, Technische Universität München, Garching, Germany; Google, Zurich, Switzerland
- Nassir Navab: Computer Aided Medical Procedures, Technische Universität München, Garching, Germany
10. Gui S, Wang Z, Chen J, Zhou X, Zhang C, Cao Y. MT4MTL-KD: A Multi-Teacher Knowledge Distillation Framework for Triplet Recognition. IEEE Trans Med Imaging 2024; 43:1628-1639. PMID: 38127608. DOI: 10.1109/tmi.2023.3345736.
Abstract
The recognition of surgical triplets plays a critical role in the practical application of surgical videos. It involves the sub-tasks of recognizing instruments, verbs, and targets, while establishing precise associations between them. Existing methods face two significant challenges in triplet recognition: 1) the imbalanced class distribution of surgical triplets may lead to spurious task association learning, and 2) the feature extractors cannot reconcile local and global context modeling. To overcome these challenges, this paper presents a novel multi-teacher knowledge distillation framework for multi-task triplet learning, known as MT4MTL-KD. MT4MTL-KD leverages teacher models trained on less imbalanced sub-tasks to assist multi-task student learning for triplet recognition. Moreover, we adopt different categories of backbones for the teacher and student models, facilitating the integration of local and global context modeling. To further align the semantic knowledge between the triplet task and its sub-tasks, we propose a novel feature attention module (FAM). This module utilizes attention mechanisms to assign multi-task features to specific sub-tasks. We evaluate the performance of MT4MTL-KD on both the 5-fold cross-validation and the CholecTriplet challenge splits of the CholecT45 dataset. The experimental results consistently demonstrate the superiority of our framework over state-of-the-art methods, achieving significant improvements of up to 6.4% on the cross-validation split.
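A hedged sketch of the multi-teacher distillation idea: a teacher trained on each sub-task (instrument, verb, target) supervises the matching student head with a soft-label KL term on top of the usual cross-entropy. The temperatures, weights, and head layout are assumptions rather than the paper's settings.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Combine hard-label cross-entropy with softened teacher supervision."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)
    return alpha * hard + (1 - alpha) * soft

def multi_teacher_loss(student_heads, teacher_heads, labels):
    # All arguments are dicts keyed by sub-task name; one teacher per sub-task.
    return sum(distillation_loss(student_heads[t], teacher_heads[t], labels[t])
               for t in ("instrument", "verb", "target"))

batch = 4
students = {t: torch.randn(batch, 10) for t in ("instrument", "verb", "target")}
teachers = {t: torch.randn(batch, 10) for t in ("instrument", "verb", "target")}
labels = {t: torch.randint(0, 10, (batch,)) for t in ("instrument", "verb", "target")}
print(multi_teacher_loss(students, teachers, labels))
```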
11. Zhang B, Sarhan MH, Goel B, Petculescu S, Ghanem A. SF-TMN: SlowFast temporal modeling network for surgical phase recognition. Int J Comput Assist Radiol Surg 2024. PMID: 38512588. DOI: 10.1007/s11548-024-03095-1.
Abstract
PURPOSE Automatic surgical phase recognition is crucial for video-based assessment systems in surgical education. Utilizing temporal information is crucial for surgical phase recognition; hence, various recent approaches extract frame-level features to conduct full video temporal modeling. METHODS For better temporal modeling, we propose SlowFast temporal modeling network (SF-TMN) for offline surgical phase recognition that can achieve not only frame-level full video temporal modeling but also segment-level full video temporal modeling. We employ a feature extraction network, pretrained on the target dataset, to extract features from video frames as the training data for SF-TMN. The Slow Path in SF-TMN utilizes all frame features for frame temporal modeling. The Fast Path in SF-TMN utilizes segment-level features summarized from frame features for segment temporal modeling. The proposed paradigm is flexible regarding the choice of temporal modeling networks. RESULTS We explore MS-TCN and ASFormer as temporal modeling networks and experiment with multiple combination strategies for Slow and Fast Paths. We evaluate SF-TMN on Cholec80 and Cataract-101 surgical phase recognition tasks and demonstrate that SF-TMN can achieve state-of-the-art results on all considered metrics. SF-TMN with ASFormer backbone outperforms the state-of-the-art Swin BiGRU by approximately 1% in accuracy and 1.5% in recall on Cholec80. We also evaluate SF-TMN on action segmentation datasets including 50salads, GTEA, and Breakfast, and achieve state-of-the-art results. CONCLUSION The improvement in the results shows that combining temporal information from both frame level and segment level by refining outputs with temporal refinement stages is beneficial for the temporal modeling of surgical phases.
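A minimal sketch of the Slow/Fast idea: the slow path models every frame feature, the fast path models segment-level summaries (here simple average pooling over fixed-length chunks), and the two predictions are fused at frame rate. Plain 1D convolutions stand in for the MS-TCN/ASFormer backbones, and all sizes are assumptions.

```python
import torch
import torch.nn as nn

class SlowFastTemporal(nn.Module):
    def __init__(self, feat_dim=2048, num_phases=7, segment_len=16):
        super().__init__()
        self.segment_len = segment_len
        self.slow = nn.Conv1d(feat_dim, num_phases, kernel_size=9, padding=4)  # frame-level path
        self.fast = nn.Conv1d(feat_dim, num_phases, kernel_size=3, padding=1)  # segment-level path

    def forward(self, feats):                        # (batch, time, feat_dim) per-frame features
        x = feats.transpose(1, 2)                    # -> (batch, feat_dim, time)
        slow_logits = self.slow(x)                   # per-frame logits
        # Summarise frames into segments, model them, then upsample back to frame rate.
        segments = nn.functional.avg_pool1d(x, self.segment_len, stride=self.segment_len)
        fast_logits = self.fast(segments)
        fast_logits = nn.functional.interpolate(fast_logits, size=x.shape[-1], mode="nearest")
        return (slow_logits + fast_logits).transpose(1, 2)  # fused per-frame phase logits

print(SlowFastTemporal()(torch.randn(1, 160, 2048)).shape)  # torch.Size([1, 160, 7])
```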
Affiliation(s)
- Bokai Zhang: Johnson & Johnson MedTech, 1100 Olive Way, Suite 1100, Seattle, WA, 98101, USA
- Mohammad Hasan Sarhan: Johnson & Johnson MedTech, Robert-Koch-Straße 1, 22851, Norderstedt, Schleswig-Holstein, Germany
- Bharti Goel: Johnson & Johnson MedTech, 5490 Great America Pkwy, Santa Clara, CA, 95054, USA
- Svetlana Petculescu: Johnson & Johnson MedTech, 1100 Olive Way, Suite 1100, Seattle, WA, 98101, USA
- Amer Ghanem: Johnson & Johnson MedTech, 1100 Olive Way, Suite 1100, Seattle, WA, 98101, USA
12. Zhai Y, Chen Z, Zheng Z, Wang X, Yan X, Liu X, Yin J, Wang J, Zhang J. Artificial intelligence for automatic surgical phase recognition of laparoscopic gastrectomy in gastric cancer. Int J Comput Assist Radiol Surg 2024; 19:345-353. PMID: 37914911. DOI: 10.1007/s11548-023-03027-5.
Abstract
PURPOSE This study aimed to classify the phases of laparoscopic gastric cancer surgery, to develop a transformer-based artificial intelligence (AI) model for automatic surgical phase recognition, and to evaluate the model's performance using laparoscopic gastric cancer surgical videos. METHODS One hundred patients who underwent laparoscopic surgery for gastric cancer were included in this study. All surgical videos were labeled and classified into eight phases (P0: preparation; P1: separate the greater gastric curvature; P2: separate the distal stomach; P3: separate the lesser gastric curvature; P4: dissect the superior margin of the pancreas; P5: separation of the proximal stomach; P6: digestive tract reconstruction; P7: end of operation). This study proposed an AI phase recognition model consisting of a convolutional neural network-based visual feature extractor and a temporal relational transformer. RESULTS A visual and temporal relationship network was proposed to automatically perform accurate surgical phase prediction. The average duration of the surgical videos in the video set was 9114 ± 2571 s. The longest phase was P1 (3388 s). The final accuracy, F1, recall, and precision were 90.128%, 87.04%, 87.04%, and 87.32%, respectively. The phase with the highest recognition accuracy was P1, and that with the lowest accuracy was P2. CONCLUSION An AI model based on neural and transformer networks was developed in this study. This model can accurately identify the phases of laparoscopic surgery for gastric cancer. AI can be used as an analytical tool for gastric cancer surgical videos.
Affiliation(s)
- Yuhao Zhai: Department of General Surgery, Beijing Friendship Hospital, Capital Medical University, 95 Yong'an Road, Xicheng District, Beijing, China; State Key Lab of Digestive Health, Beijing, China
- Zhen Chen: Centre for Artificial Intelligence and Robotics (CAIR), Hong Kong Institute of Science and Innovation, Chinese Academy of Sciences, Hong Kong SAR, China
- Zhi Zheng: Department of General Surgery, Beijing Friendship Hospital, Capital Medical University, 95 Yong'an Road, Xicheng District, Beijing, China; State Key Lab of Digestive Health, Beijing, China
- Xi Wang: Department of General Surgery, Beijing Friendship Hospital, Capital Medical University, 95 Yong'an Road, Xicheng District, Beijing, China; State Key Lab of Digestive Health, Beijing, China
- Xiaosheng Yan: Department of General Surgery, Beijing Friendship Hospital, Capital Medical University, 95 Yong'an Road, Xicheng District, Beijing, China; State Key Lab of Digestive Health, Beijing, China
- Xiaoye Liu: Department of General Surgery, Beijing Friendship Hospital, Capital Medical University, 95 Yong'an Road, Xicheng District, Beijing, China; State Key Lab of Digestive Health, Beijing, China
- Jie Yin: Department of General Surgery, Beijing Friendship Hospital, Capital Medical University, 95 Yong'an Road, Xicheng District, Beijing, China; State Key Lab of Digestive Health, Beijing, China
- Jinqiao Wang: Centre for Artificial Intelligence and Robotics (CAIR), Hong Kong Institute of Science and Innovation, Chinese Academy of Sciences, Hong Kong SAR, China; National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, 95 Zhongguancun East Road, Haidian District, Beijing, China; Wuhan AI Research, Wuhan, China
- Jun Zhang: Department of General Surgery, Beijing Friendship Hospital, Capital Medical University, 95 Yong'an Road, Xicheng District, Beijing, China; State Key Lab of Digestive Health, Beijing, China
13. Kostiuchik G, Sharan L, Mayer B, Wolf I, Preim B, Engelhardt S. Surgical phase and instrument recognition: how to identify appropriate dataset splits. Int J Comput Assist Radiol Surg 2024. PMID: 38285380. DOI: 10.1007/s11548-024-03063-9.
Abstract
PURPOSE Machine learning approaches can only be reliably evaluated if training, validation, and test data splits are representative and not affected by the absence of classes. Surgical workflow and instrument recognition are two tasks that are complicated in this manner because of heavy data imbalances resulting from the different lengths of phases and their potentially erratic occurrence. Furthermore, sub-properties like instrument (co-)occurrence are usually not particularly considered when defining the split. METHODS We present a publicly available data visualization tool that enables interactive exploration of dataset partitions for surgical phase and instrument recognition. The application focuses on the visualization of the occurrence of phases, phase transitions, instruments, and instrument combinations across sets. In particular, it facilitates assessment of dataset splits, especially regarding identification of sub-optimal dataset splits. RESULTS We performed analysis of the datasets Cholec80, CATARACTS, CaDIS, M2CAI-workflow, and M2CAI-tool using the proposed application. We were able to uncover phase transitions, individual instruments, and combinations of surgical instruments that were not represented in one of the sets. Addressing these issues, we identify possible improvements to the splits using our tool. A user study with ten participants demonstrated that the participants were able to successfully solve a selection of data exploration tasks. CONCLUSION In highly unbalanced class distributions, special care should be taken with respect to the selection of an appropriate dataset split because it can greatly influence the assessment of machine learning approaches. Our interactive tool allows better splits to be determined, improving current practices in the field. The live application is available at https://cardio-ai.github.io/endovis-ml/.
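A small check in the spirit of the tool described above: verify that every phase label and instrument combination occurring in the full dataset also occurs in each split. The per-frame record layout is an assumption for illustration.

```python
from collections import defaultdict

def split_coverage(frames, splits):
    """frames: list of dicts with 'video', 'phase', 'instruments' (frozenset).
    splits: dict mapping split name -> set of video ids. Returns missing classes per split."""
    all_phases = {f["phase"] for f in frames}
    all_combos = {f["instruments"] for f in frames}
    report = defaultdict(dict)
    for name, videos in splits.items():
        subset = [f for f in frames if f["video"] in videos]
        report[name]["missing_phases"] = all_phases - {f["phase"] for f in subset}
        report[name]["missing_instrument_combos"] = all_combos - {f["instruments"] for f in subset}
    return dict(report)

frames = [  # toy per-frame annotations
    {"video": "v1", "phase": "preparation", "instruments": frozenset({"grasper"})},
    {"video": "v2", "phase": "dissection", "instruments": frozenset({"grasper", "hook"})},
    {"video": "v3", "phase": "dissection", "instruments": frozenset({"hook"})},
]
print(split_coverage(frames, {"train": {"v1", "v2"}, "test": {"v3"}}))
```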
Affiliation(s)
- Georgii Kostiuchik: Department of Cardiac Surgery, Heidelberg University Hospital, Heidelberg, Germany; DZHK (German Centre for Cardiovascular Research), Partner Site Heidelberg/Mannheim, Heidelberg, Germany
- Lalith Sharan: Department of Cardiac Surgery, Heidelberg University Hospital, Heidelberg, Germany; DZHK (German Centre for Cardiovascular Research), Partner Site Heidelberg/Mannheim, Heidelberg, Germany
- Benedikt Mayer: Department of Simulation and Graphics, University of Magdeburg, Magdeburg, Germany
- Ivo Wolf: Department of Computer Science, Mannheim University of Applied Sciences, Mannheim, Germany
- Bernhard Preim: Department of Simulation and Graphics, University of Magdeburg, Magdeburg, Germany
- Sandy Engelhardt: Department of Cardiac Surgery, Heidelberg University Hospital, Heidelberg, Germany; DZHK (German Centre for Cardiovascular Research), Partner Site Heidelberg/Mannheim, Heidelberg, Germany
14. Zhang J, Barbarisi S, Kadkhodamohammadi A, Stoyanov D, Luengo I. Self-knowledge distillation for surgical phase recognition. Int J Comput Assist Radiol Surg 2024; 19:61-68. PMID: 37340283. DOI: 10.1007/s11548-023-02970-7.
Abstract
PURPOSE Advances in surgical phase recognition are generally led by training deeper networks. Rather than going further with a more complex solution, we believe that current models can be exploited better. We propose a self-knowledge distillation framework that can be integrated into current state-of-the-art (SOTA) models without requiring any extra complexity to the models or annotations. METHODS Knowledge distillation is a framework for network regularization where knowledge is distilled from a teacher network to a student network. In self-knowledge distillation, the student model becomes the teacher such that the network learns from itself. Most phase recognition models follow an encoder-decoder framework. Our framework utilizes self-knowledge distillation in both stages. The teacher model guides the training process of the student model to extract enhanced feature representations from the encoder and build a more robust temporal decoder to tackle the over-segmentation problem. RESULTS We validate our proposed framework on the public dataset Cholec80. Our framework is embedded on top of four popular SOTA approaches and consistently improves their performance. Specifically, our best GRU model boosts performance by [Formula: see text] accuracy and [Formula: see text] F1-score over the same baseline model. CONCLUSION We embed a self-knowledge distillation framework for the first time in the surgical phase recognition training pipeline. Experimental results demonstrate that our simple yet powerful framework can improve performance of existing phase recognition models. Moreover, our extensive experiments show that even with 75% of the training set we still achieve performance on par with the same baseline model trained on the full set.
Affiliation(s)
- Jinglu Zhang: Medtronic Digital Surgery, 230 City Road, London, UK
- Danail Stoyanov: Medtronic Digital Surgery, 230 City Road, London, UK; Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, London, UK
- Imanol Luengo: Medtronic Digital Surgery, 230 City Road, London, UK
15. Demir KC, Schieber H, Weise T, Roth D, May M, Maier A, Yang SH. Deep Learning in Surgical Workflow Analysis: A Review of Phase and Step Recognition. IEEE J Biomed Health Inform 2023; 27:5405-5417. PMID: 37665700. DOI: 10.1109/jbhi.2023.3311628.
Abstract
OBJECTIVE In the last two decades, there has been a growing interest in exploring surgical procedures with statistical models to analyze operations at different semantic levels. This information is necessary for developing context-aware intelligent systems that can assist physicians during operations, evaluate procedures afterward, or help the management team to use the operating room effectively. The objective is to extract reliable patterns from surgical data for the robust estimation of surgical activities performed during operations. The purpose of this article is to review the state-of-the-art deep learning methods published after 2018 for analyzing surgical workflows, with a focus on phase and step recognition. METHODS Three databases, IEEE Xplore, Scopus, and PubMed, were searched, and additional studies were added through a manual search. After the database search, 343 studies were screened and a total of 44 studies were selected for this review. CONCLUSION The use of temporal information is essential for identifying the next surgical action. Contemporary methods mainly use RNNs, hierarchical CNNs, and Transformers to preserve long-distance temporal relations. The lack of large publicly available datasets for various procedures is a great challenge for the development of new and robust models. While supervised learning strategies are used to show proof of concept, self-supervised, semi-supervised, or active learning methods are used to mitigate dependency on annotated data. SIGNIFICANCE The present study provides a comprehensive review of recent methods in surgical workflow analysis, summarizes commonly used architectures and datasets, and discusses challenges.
16. Tao R, Zou X, Zheng G. LAST: LAtent Space-Constrained Transformers for Automatic Surgical Phase Recognition and Tool Presence Detection. IEEE Trans Med Imaging 2023; 42:3256-3268. PMID: 37227905. DOI: 10.1109/tmi.2023.3279838.
Abstract
When developing context-aware systems, automatic surgical phase recognition and tool presence detection are two essential tasks. There have been previous attempts to develop methods for both tasks, but the majority of existing methods utilize a frame-level loss function (e.g., cross-entropy) that does not fully leverage the underlying semantic structure of a surgery, leading to sub-optimal results. In this paper, we propose multi-task learning-based, LAtent Space-constrained Transformers, referred to as LAST, for automatic surgical phase recognition and tool presence detection. Our design features a two-branch transformer architecture with a novel and generic way to leverage video-level semantic information during network training. This is done by learning a non-linear compact representation of the underlying semantic structure of surgical videos through a transformer variational autoencoder (VAE) and by encouraging models to follow the learned statistical distributions. In other words, LAST is structure-aware and favors predictions that lie on the extracted low-dimensional data manifold. Validated on two public datasets of cholecystectomy surgery, i.e., the Cholec80 dataset and the M2cai16 dataset, our method achieves better results than other state-of-the-art methods. Specifically, on the Cholec80 dataset, our method achieves an average accuracy of 93.12±4.71%, an average precision of 89.25±5.49%, an average recall of 90.10±5.45% and an average Jaccard of 81.11±7.62% for phase recognition, and an average mAP of 95.15±3.87% for tool presence detection. Similar superior performance is also observed when LAST is applied to the M2cai16 dataset.
17. Cao J, Yip HC, Chen Y, Scheppach M, Luo X, Yang H, Cheng MK, Long Y, Jin Y, Chiu PWY, Yam Y, Meng HML, Dou Q. Intelligent surgical workflow recognition for endoscopic submucosal dissection with real-time animal study. Nat Commun 2023; 14:6676. PMID: 37865629. PMCID: PMC10590425. DOI: 10.1038/s41467-023-42451-8.
Abstract
Recent advancements in artificial intelligence have achieved human-level performance; however, AI-enabled cognitive assistance for therapeutic procedures has not been fully explored nor pre-clinically validated. Here we propose AI-Endo, an intelligent surgical workflow recognition suite for endoscopic submucosal dissection (ESD). Our AI-Endo is trained on high-quality ESD cases from an expert endoscopist, spanning a decade of practice and consisting of 201,026 labeled frames. The learned model demonstrates outstanding performance on validation data, including cases from relatively junior endoscopists with various skill levels, procedures conducted with different endoscopy systems and therapeutic skills, and cohorts from international multi-centers. Furthermore, we integrate our AI-Endo with the Olympus endoscopic system and validate the AI-enabled cognitive assistance system with animal studies in live ESD training sessions. Dedicated data analysis from surgical phase recognition results is summarized in an automatically generated report for skill assessment.
Affiliation(s)
- Jianfeng Cao: Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China
- Hon-Chi Yip: Department of Surgery, The Chinese University of Hong Kong, Hong Kong, China
- Yueyao Chen: Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China
- Markus Scheppach: Internal Medicine III-Gastroenterology, University Hospital of Augsburg, Augsburg, Germany
- Xiaobei Luo: Guangdong Provincial Key Laboratory of Gastroenterology, Nanfang Hospital, Southern Medical University, Guangzhou, China
- Hongzheng Yang: Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China
- Ming Kit Cheng: Department of Mechanical and Automation Engineering, The Chinese University of Hong Kong, Hong Kong, China
- Yonghao Long: Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China
- Yueming Jin: Department of Biomedical Engineering, National University of Singapore, Singapore, Singapore
- Philip Wai-Yan Chiu: Multi-scale Medical Robotics Center and The Chinese University of Hong Kong, Hong Kong, China
- Yeung Yam: Department of Mechanical and Automation Engineering, The Chinese University of Hong Kong, Hong Kong, China; Multi-scale Medical Robotics Center and The Chinese University of Hong Kong, Hong Kong, China; Centre for Perceptual and Interactive Intelligence and The Chinese University of Hong Kong, Hong Kong, China
- Helen Mei-Ling Meng: Centre for Perceptual and Interactive Intelligence and The Chinese University of Hong Kong, Hong Kong, China
- Qi Dou: Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China
18. Ramesh S, Dall'Alba D, Gonzalez C, Yu T, Mascagni P, Mutter D, Marescaux J, Fiorini P, Padoy N. Weakly Supervised Temporal Convolutional Networks for Fine-Grained Surgical Activity Recognition. IEEE Trans Med Imaging 2023; 42:2592-2602. PMID: 37030859. DOI: 10.1109/tmi.2023.3262847.
Abstract
Automatic recognition of fine-grained surgical activities, called steps, is a challenging but crucial task for intelligent intra-operative computer assistance. The development of current vision-based activity recognition methods relies heavily on a high volume of manually annotated data. This data is difficult and time-consuming to generate and requires domain-specific knowledge. In this work, we propose to use coarser and easier-to-annotate activity labels, namely phases, as weak supervision to learn step recognition with fewer step annotated videos. We introduce a step-phase dependency loss to exploit the weak supervision signal. We then employ a Single-Stage Temporal Convolutional Network (SS-TCN) with a ResNet-50 backbone, trained in an end-to-end fashion from weakly annotated videos, for temporal activity segmentation and recognition. We extensively evaluate and show the effectiveness of the proposed method on a large video dataset consisting of 40 laparoscopic gastric bypass procedures and the public benchmark CATARACTS containing 50 cataract surgeries.
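One way to picture the weak-supervision idea is sketched below: per-frame step probabilities are marginalised into phase probabilities through a fixed step-to-phase mapping and penalised against the phase labels. The mapping, tensor shapes, and loss form are illustrative assumptions; the paper's step-phase dependency loss may differ.

```python
import torch
import torch.nn.functional as F

STEP_TO_PHASE = torch.tensor([0, 0, 1, 1, 1, 2])   # 6 hypothetical steps mapped to 3 phases

def step_phase_loss(step_logits, phase_labels, num_phases=3):
    """Penalise step predictions whose implied phase disagrees with the (weak) phase label."""
    step_probs = step_logits.softmax(dim=-1)                   # (batch, time, num_steps)
    one_hot = F.one_hot(STEP_TO_PHASE, num_phases).float()     # (num_steps, num_phases)
    phase_probs = step_probs @ one_hot                         # marginalise steps into phases
    return F.nll_loss(phase_probs.clamp_min(1e-8).log().flatten(0, 1),
                      phase_labels.flatten())

step_logits = torch.randn(2, 50, 6)                 # per-frame step logits from a TCN
phase_labels = torch.randint(0, 3, (2, 50))         # per-frame phase annotations (weak labels)
print(step_phase_loss(step_logits, phase_labels))
```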
19. Ramesh S, Dall'Alba D, Gonzalez C, Yu T, Mascagni P, Mutter D, Marescaux J, Fiorini P, Padoy N. TRandAugment: temporal random augmentation strategy for surgical activity recognition from videos. Int J Comput Assist Radiol Surg 2023; 18:1665-1672. PMID: 36944845. PMCID: PMC10491694. DOI: 10.1007/s11548-023-02864-8.
Abstract
PURPOSE Automatic recognition of surgical activities from intraoperative surgical videos is crucial for developing intelligent support systems for computer-assisted interventions. Current state-of-the-art recognition methods are based on deep learning, where data augmentation has shown the potential to improve the generalization of these methods. This has spurred work on automated and simplified augmentation strategies for image classification and object detection on datasets of still images. Extending such augmentation methods to videos is not straightforward, as the temporal dimension needs to be considered. Furthermore, surgical videos pose additional challenges as they are composed of multiple, interconnected, and long-duration activities. METHODS This work proposes a new simplified augmentation method, called TRandAugment, specifically designed for long surgical videos, that treats each video as an assembly of temporal segments and applies consistent but random transformations to each segment. The proposed augmentation method is used to train an end-to-end spatiotemporal model consisting of a CNN (ResNet50) followed by a TCN. RESULTS The effectiveness of the proposed method is demonstrated on two surgical video datasets, namely Bypass40 and CATARACTS, and two tasks, surgical phase and step recognition. TRandAugment adds a performance boost of 1-6% over previous state-of-the-art methods that use manually designed augmentations. CONCLUSION This work presents a simplified and automated augmentation method for long surgical videos. The proposed method has been validated on different datasets and tasks, indicating the importance of devising temporal augmentation methods for long surgical videos.
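A minimal sketch of the segment-wise strategy: the video is split into temporal segments, one transform with fixed parameters is sampled per segment, and it is applied to every frame of that segment. The transform pool, parameter ranges, and segment count are assumptions, not the paper's configuration.

```python
import random
from PIL import Image
import torchvision.transforms.functional as TF

def _random_transform():
    """Sample one deterministic transform so every frame in a segment gets the same change."""
    choice = random.choice(["rotate", "brightness", "contrast", "hflip"])
    if choice == "rotate":
        angle = random.uniform(-10, 10)
        return lambda img: TF.rotate(img, angle)
    if choice == "brightness":
        factor = random.uniform(0.6, 1.4)
        return lambda img: TF.adjust_brightness(img, factor)
    if choice == "contrast":
        factor = random.uniform(0.6, 1.4)
        return lambda img: TF.adjust_contrast(img, factor)
    return TF.hflip

def trand_augment(frames, num_segments=4):
    """frames: list of PIL images (or image tensors) for one video, in temporal order."""
    seg_len = max(1, len(frames) // num_segments)
    augmented = []
    for start in range(0, len(frames), seg_len):
        transform = _random_transform()                                        # one transform per segment...
        augmented.extend(transform(f) for f in frames[start:start + seg_len])  # ...applied to all its frames
    return augmented

frames = [Image.new("RGB", (64, 64)) for _ in range(8)]  # toy "video"
print(len(trand_augment(frames)))  # 8 augmented frames
```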
Affiliation(s)
- Sanat Ramesh: Altair Robotics Lab, University of Verona, 37134, Verona, Italy; ICube, University of Strasbourg, CNRS, 67000, Strasbourg, France
- Diego Dall'Alba: Altair Robotics Lab, University of Verona, 37134, Verona, Italy
- Cristians Gonzalez: University Hospital of Strasbourg, 67000, Strasbourg, France; Institute of Image-Guided Surgery, IHU Strasbourg, 67000, Strasbourg, France
- Tong Yu: ICube, University of Strasbourg, CNRS, 67000, Strasbourg, France
- Pietro Mascagni: Institute of Image-Guided Surgery, IHU Strasbourg, 67000, Strasbourg, France; Fondazione Policlinico Universitario Agostino Gemelli IRCCS, 00168, Rome, Italy
- Didier Mutter: University Hospital of Strasbourg, 67000, Strasbourg, France; IRCAD, 67000, Strasbourg, France; Institute of Image-Guided Surgery, IHU Strasbourg, 67000, Strasbourg, France
- Paolo Fiorini: Altair Robotics Lab, University of Verona, 37134, Verona, Italy
- Nicolas Padoy: ICube, University of Strasbourg, CNRS, 67000, Strasbourg, France; Institute of Image-Guided Surgery, IHU Strasbourg, 67000, Strasbourg, France
20
Karargyris A, Umeton R, Sheller MJ, Aristizabal A, George J, Wuest A, Pati S, Kassem H, Zenk M, Baid U, Narayana Moorthy P, Chowdhury A, Guo J, Nalawade S, Rosenthal J, Kanter D, Xenochristou M, Beutel DJ, Chung V, Bergquist T, Eddy J, Abid A, Tunstall L, Sanseviero O, Dimitriadis D, Qian Y, Xu X, Liu Y, Goh RSM, Bala S, Bittorf V, Reddy Puchala S, Ricciuti B, Samineni S, Sengupta E, Chaudhari A, Coleman C, Desinghu B, Diamos G, Dutta D, Feddema D, Fursin G, Huang X, Kashyap S, Lane N, Mallick I, Mascagni P, Mehta V, Ferro Moraes C, Natarajan V, Nikolov N, Padoy N, Pekhimenko G, Reddi VJ, Reina GA, Ribalta P, Singh A, Thiagarajan JJ, Albrecht J, Wolf T, Miller G, Fu H, Shah P, Xu D, Yadav P, Talby D, Awad MM, Howard JP, Rosenthal M, Marchionni L, Loda M, Johnson JM, Bakas S, Mattson P. Federated benchmarking of medical artificial intelligence with MedPerf. NAT MACH INTELL 2023; 5:799-810. [PMID: 38706981 PMCID: PMC11068064 DOI: 10.1038/s42256-023-00652-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/30/2021] [Accepted: 04/06/2023] [Indexed: 05/07/2024]
Abstract
Medical artificial intelligence (AI) has tremendous potential to advance healthcare by supporting and contributing to the evidence-based practice of medicine, personalizing patient treatment, reducing costs, and improving both healthcare provider and patient experience. Unlocking this potential requires systematic, quantitative evaluation of the performance of medical AI models on large-scale, heterogeneous data capturing diverse patient populations. Here, to meet this need, we introduce MedPerf, an open platform for benchmarking AI models in the medical domain. MedPerf focuses on enabling federated evaluation of AI models, by securely distributing them to different facilities, such as healthcare organizations. This process of bringing the model to the data empowers each facility to assess and verify the performance of AI models in an efficient and human-supervised process, while prioritizing privacy. We describe the current challenges healthcare and AI communities face, the need for an open platform, the design philosophy of MedPerf, its current implementation status and real-world deployment, our roadmap and, importantly, the use of MedPerf with multiple international institutions within cloud-based technology and on-premises scenarios. Finally, we welcome new contributions by researchers and organizations to further strengthen MedPerf as an open benchmarking platform.
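MedPerf's actual API is not reproduced here; the toy sketch below only illustrates the federated-evaluation idea the abstract describes, in which each facility evaluates a model on its own data and only aggregate metrics leave the site. All names and numbers are hypothetical.

```python
def evaluate_locally(model_fn, local_dataset):
    """Run inference inside the facility; raw patient data never leaves the site."""
    correct = sum(model_fn(x) == y for x, y in local_dataset)
    return {"n": len(local_dataset), "accuracy": correct / len(local_dataset)}

def federated_benchmark(model_fn, facilities):
    """Collect only per-site metrics and aggregate them centrally."""
    site_metrics = {name: evaluate_locally(model_fn, data)
                    for name, data in facilities.items()}
    total = sum(m["n"] for m in site_metrics.values())
    pooled = sum(m["accuracy"] * m["n"] for m in site_metrics.values()) / total
    return site_metrics, pooled

# Toy usage with two hypothetical sites and a trivial threshold "model".
model = lambda x: int(x > 0.5)
sites = {"hospital_a": [(0.9, 1), (0.2, 0), (0.7, 1)],
         "hospital_b": [(0.4, 0), (0.8, 1)]}
per_site, overall = federated_benchmark(model, sites)
```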
Affiliation(s)
- Alexandros Karargyris: IHU Strasbourg, Strasbourg, France; University of Strasbourg, Strasbourg, France
- Renato Umeton: Dana-Farber Cancer Institute, Boston, MA, USA; Weill Cornell Medicine, New York, NY, USA; Harvard T.H. Chan School of Public Health, Boston, MA, USA; Massachusetts Institute of Technology, Cambridge, MA, USA
- Micah J. Sheller: Intel, Santa Clara, CA, USA
- Anna Wuest: Dana-Farber Cancer Institute, Boston, MA, USA; Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Sarthak Pati: Perelman School of Medicine, Philadelphia, PA, USA; University of Pennsylvania, Philadelphia, PA, USA
- Maximilian Zenk: German Cancer Research Center, Heidelberg, Germany; University of Heidelberg, Heidelberg, Germany
- Ujjwal Baid: Perelman School of Medicine, Philadelphia, PA, USA; University of Pennsylvania, Philadelphia, PA, USA
- Junyi Guo: Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Jacob Rosenthal: Dana-Farber Cancer Institute, Boston, MA, USA; Weill Cornell Medicine, New York, NY, USA
- Daniel J. Beutel: University of Cambridge, Cambridge, UK; Flower Labs, Hamburg, Germany
- Akshay Chaudhari: Stanford University, Stanford, CA, USA; Stanford University School of Medicine, Stanford, CA, USA
- Nicholas Lane: University of Cambridge, Cambridge, UK; Flower Labs, Hamburg, Germany
- Pietro Mascagni: IHU Strasbourg, Strasbourg, France; Fondazione Policlinico Universitario A. Gemelli IRCCS, Rome, Italy
- Nicolas Padoy: IHU Strasbourg, Strasbourg, France; University of Strasbourg, Strasbourg, France
- Gennady Pekhimenko: University of Toronto, Toronto, Ontario, Canada; Vector Institute, Toronto, Ontario, Canada
- Abhishek Singh: Massachusetts Institute of Technology, Cambridge, MA, USA
- Mark M. Awad: Dana-Farber Cancer Institute, Boston, MA, USA; Harvard Medical School, Boston, MA, USA
- Jeremy P. Howard: fast.ai, San Francisco, CA, USA; University of Queensland, Brisbane, Queensland, Australia
- Michael Rosenthal: Dana-Farber Cancer Institute, Boston, MA, USA; Harvard Medical School, Boston, MA, USA; Brigham and Women’s Hospital, Boston, MA, USA
- Massimo Loda: Dana-Farber Cancer Institute, Boston, MA, USA; Weill Cornell Medicine, New York, NY, USA; Harvard Medical School, Boston, MA, USA; Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Spyridon Bakas: Perelman School of Medicine, Philadelphia, PA, USA; University of Pennsylvania, Philadelphia, PA, USA
- Peter Mattson: MLCommons, San Francisco, CA, USA; Google, Mountain View, CA, USA
- Alexandros Karargyris, Renato Umeton, and Micah J. Sheller contributed equally; Spyridon Bakas and Peter Mattson jointly supervised this work.
21
Zang C, Turkcan MK, Narasimhan S, Cao Y, Yarali K, Xiang Z, Szot S, Ahmad F, Choksi S, Bitner DP, Filicori F, Kostic Z. Surgical Phase Recognition in Inguinal Hernia Repair-AI-Based Confirmatory Baseline and Exploration of Competitive Models. Bioengineering (Basel) 2023; 10:654. [PMID: 37370585 DOI: 10.3390/bioengineering10060654] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/05/2023] [Revised: 05/18/2023] [Accepted: 05/23/2023] [Indexed: 06/29/2023] Open
Abstract
Video-recorded robotic-assisted surgeries allow the use of automated computer vision and artificial intelligence/deep learning methods for quality assessment and workflow analysis in surgical phase recognition. We considered a dataset of 209 videos of robotic-assisted laparoscopic inguinal hernia repair (RALIHR) collected from 8 surgeons, defined rigorous ground-truth annotation rules, then pre-processed and annotated the videos. We deployed seven deep learning models to establish the baseline accuracy for surgical phase recognition and explored four advanced architectures. For rapid execution of the studies, we initially engaged three dozen MS-level engineering students in a competitive classroom setting, followed by focused research. We unified the data processing pipeline in a confirmatory study, and explored a number of scenarios which differ in how the DL networks were trained and evaluated. For the scenario with 21 validation videos of all surgeons, the Video Swin Transformer model achieved ~0.85 validation accuracy, and the Perceiver IO model achieved ~0.84. Our studies affirm the necessity of close collaborative research between medical experts and engineers for developing automated surgical phase recognition models deployable in clinical settings.
Affiliation(s)
- Chengbo Zang: Department of Electrical Engineering, Columbia University, New York, NY 10027, USA
- Mehmet Kerem Turkcan: Department of Electrical Engineering, Columbia University, New York, NY 10027, USA
- Sanjeev Narasimhan: Department of Computer Science, Columbia University, New York, NY 10027, USA
- Yuqing Cao: Department of Electrical Engineering, Columbia University, New York, NY 10027, USA
- Kaan Yarali: Department of Electrical Engineering, Columbia University, New York, NY 10027, USA
- Zixuan Xiang: Department of Electrical Engineering, Columbia University, New York, NY 10027, USA
- Skyler Szot: Department of Electrical Engineering, Columbia University, New York, NY 10027, USA
- Feroz Ahmad: Department of Computer Science, Columbia University, New York, NY 10027, USA
- Sarah Choksi: Intraoperative Performance Analytics Laboratory (IPAL), Lenox Hill Hospital, New York, NY 10021, USA
- Daniel P Bitner: Intraoperative Performance Analytics Laboratory (IPAL), Lenox Hill Hospital, New York, NY 10021, USA
- Filippo Filicori: Intraoperative Performance Analytics Laboratory (IPAL), Lenox Hill Hospital, New York, NY 10021, USA; Zucker School of Medicine at Hofstra/Northwell Health, Hempstead, NY 11549, USA
- Zoran Kostic: Department of Electrical Engineering, Columbia University, New York, NY 10027, USA
22
Ramesh S, Srivastav V, Alapatt D, Yu T, Murali A, Sestini L, Nwoye CI, Hamoud I, Sharma S, Fleurentin A, Exarchakis G, Karargyris A, Padoy N. Dissecting self-supervised learning methods for surgical computer vision. Med Image Anal 2023; 88:102844. [PMID: 37270898 DOI: 10.1016/j.media.2023.102844] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/06/2022] [Revised: 05/08/2023] [Accepted: 05/15/2023] [Indexed: 06/06/2023]
Abstract
The field of surgical computer vision has undergone considerable breakthroughs in recent years with the rising popularity of deep neural network-based methods. However, standard fully-supervised approaches for training such models require vast amounts of annotated data, imposing a prohibitively high cost; especially in the clinical domain. Self-Supervised Learning (SSL) methods, which have begun to gain traction in the general computer vision community, represent a potential solution to these annotation costs, allowing to learn useful representations from only unlabeled data. Still, the effectiveness of SSL methods in more complex and impactful domains, such as medicine and surgery, remains limited and unexplored. In this work, we address this critical need by investigating four state-of-the-art SSL methods (MoCo v2, SimCLR, DINO, SwAV) in the context of surgical computer vision. We present an extensive analysis of the performance of these methods on the Cholec80 dataset for two fundamental and popular tasks in surgical context understanding, phase recognition and tool presence detection. We examine their parameterization, then their behavior with respect to training data quantities in semi-supervised settings. Correct transfer of these methods to surgery, as described and conducted in this work, leads to substantial performance gains over generic uses of SSL - up to 7.4% on phase recognition and 20% on tool presence detection - as well as state-of-the-art semi-supervised phase recognition approaches by up to 14%. Further results obtained on a highly diverse selection of surgical datasets exhibit strong generalization properties. The code is available at https://github.com/CAMMA-public/SelfSupSurg.
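A minimal sketch of the downstream protocol the abstract implies: a ResNet-50 whose weights would come from self-supervised pretraining (e.g., MoCo v2, SimCLR, DINO or SwAV on unlabeled surgical frames) is fine-tuned for phase recognition on a small labelled subset. The checkpoint path, frozen layers, and hyperparameters below are assumptions, not the SelfSupSurg configuration.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

N_PHASES = 7  # Cholec80 defines 7 surgical phases

# Backbone assumed to be pretrained with an SSL method on unlabeled frames;
# here we only instantiate the architecture and would load such a checkpoint.
backbone = resnet50(weights=None)
# ssl_state = torch.load("ssl_pretrained_resnet50.pth")  # hypothetical checkpoint
# backbone.load_state_dict(ssl_state, strict=False)
backbone.fc = nn.Linear(backbone.fc.in_features, N_PHASES)

# Semi-supervised setting: freeze most of the encoder and fine-tune the head
# plus the last residual stage on the small labelled subset.
for name, param in backbone.named_parameters():
    param.requires_grad = name.startswith(("layer4", "fc"))

optimizer = torch.optim.AdamW(
    [p for p in backbone.parameters() if p.requires_grad], lr=1e-4)
criterion = nn.CrossEntropyLoss()

frames = torch.randn(8, 3, 224, 224)          # a toy labelled mini-batch
labels = torch.randint(0, N_PHASES, (8,))
loss = criterion(backbone(frames), labels)
loss.backward()
optimizer.step()
```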
Affiliation(s)
- Sanat Ramesh: ICube, University of Strasbourg, CNRS, Strasbourg 67000, France; Altair Robotics Lab, Department of Computer Science, University of Verona, Verona 37134, Italy
- Vinkle Srivastav: ICube, University of Strasbourg, CNRS, Strasbourg 67000, France
- Deepak Alapatt: ICube, University of Strasbourg, CNRS, Strasbourg 67000, France
- Tong Yu: ICube, University of Strasbourg, CNRS, Strasbourg 67000, France
- Aditya Murali: ICube, University of Strasbourg, CNRS, Strasbourg 67000, France
- Luca Sestini: ICube, University of Strasbourg, CNRS, Strasbourg 67000, France; Department of Electronics, Information and Bioengineering, Politecnico di Milano, Milano 20133, Italy
- Idris Hamoud: ICube, University of Strasbourg, CNRS, Strasbourg 67000, France
- Saurav Sharma: ICube, University of Strasbourg, CNRS, Strasbourg 67000, France
- Georgios Exarchakis: ICube, University of Strasbourg, CNRS, Strasbourg 67000, France; IHU Strasbourg, Strasbourg 67000, France
- Alexandros Karargyris: ICube, University of Strasbourg, CNRS, Strasbourg 67000, France; IHU Strasbourg, Strasbourg 67000, France
- Nicolas Padoy: ICube, University of Strasbourg, CNRS, Strasbourg 67000, France; IHU Strasbourg, Strasbourg 67000, France
23
Sharma S, Nwoye CI, Mutter D, Padoy N. Rendezvous in time: an attention-based temporal fusion approach for surgical triplet recognition. Int J Comput Assist Radiol Surg 2023:10.1007/s11548-023-02914-1. [PMID: 37097518 DOI: 10.1007/s11548-023-02914-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/13/2023] [Accepted: 04/07/2023] [Indexed: 04/26/2023]
Abstract
PURPOSE One of the recent advances in surgical AI is the recognition of surgical activities as triplets of ⟨instrument, verb, target⟩. Albeit providing detailed information for computer-assisted intervention, current triplet recognition approaches rely only on single-frame features. Exploiting temporal cues from earlier frames would improve the recognition of surgical action triplets from videos. METHODS In this paper, we propose Rendezvous in Time (RiT), a deep learning model that extends the state-of-the-art model, Rendezvous, with temporal modeling. Focusing more on the verbs, our RiT explores the connectedness of current and past frames to learn temporal attention-based features for enhanced triplet recognition. RESULTS We validate our proposal on the challenging surgical triplet dataset, CholecT45, demonstrating improved recognition of the verb and triplet along with other interactions involving the verb, such as ⟨instrument, verb⟩. Qualitative results show that RiT produces smoother predictions for most triplet instances than the state-of-the-art methods. CONCLUSION We present a novel attention-based approach that leverages the temporal fusion of video frames to model the evolution of surgical actions and exploit their benefits for surgical triplet recognition.
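One plausible reading of the temporal attention described above, sketched with PyTorch's multi-head attention: the current frame's features act as the query over a window of past-frame features. Dimensions and the residual design are assumptions, not the published RiT architecture.

```python
import torch
import torch.nn as nn

class TemporalFrameAttention(nn.Module):
    """Current-frame features attend over a short window of past-frame features,
    one way to realise the temporal fusion described in the abstract."""

    def __init__(self, dim=512, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, current, past):
        # current: (B, 1, dim) features of the present frame
        # past:    (B, T, dim) features of the preceding T frames
        fused, _ = self.attn(query=current, key=past, value=past)
        return self.norm(current + fused)   # residual connection

current = torch.randn(2, 1, 512)
past = torch.randn(2, 10, 512)
enhanced = TemporalFrameAttention()(current, past)   # (2, 1, 512)
```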
Affiliation(s)
- Saurav Sharma: ICube, University of Strasbourg, CNRS, Strasbourg, France
- Didier Mutter: IHU Strasbourg, Strasbourg, France; University Hospital of Strasbourg, Strasbourg, France
- Nicolas Padoy: ICube, University of Strasbourg, CNRS, Strasbourg, France; IHU Strasbourg, Strasbourg, France
24
Zhang B, Goel B, Sarhan MH, Goel VK, Abukhalil R, Kalesan B, Stottler N, Petculescu S. Surgical workflow recognition with temporal convolution and transformer for action segmentation. Int J Comput Assist Radiol Surg 2023; 18:785-794. [PMID: 36542253 DOI: 10.1007/s11548-022-02811-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/08/2022] [Accepted: 12/09/2022] [Indexed: 12/24/2022]
Abstract
PURPOSE Automatic surgical workflow recognition enabled by computer vision algorithms plays a key role in enhancing the learning experience of surgeons. It also supports building context-aware systems that allow better surgical planning and decision making which may in turn improve outcomes. Utilizing temporal information is crucial for recognizing context; hence, various recent approaches use recurrent neural networks or transformers to recognize actions. METHODS We design and implement a two-stage method for surgical workflow recognition. We utilize R(2+1)D for video clip modeling in the first stage. We propose Action Segmentation Temporal Convolutional Transformer (ASTCFormer) network for full video modeling in the second stage. ASTCFormer utilizes action segmentation transformers (ASFormers) and temporal convolutional networks (TCNs) to build a temporally aware surgical workflow recognition system. RESULTS We compare the proposed ASTCFormer with recurrent neural networks, multi-stage TCN, and ASFormer approaches. The comparison is done on a dataset comprised of 207 robotic and laparoscopic cholecystectomy surgical videos annotated for 7 surgical phases. The proposed method outperforms the compared methods achieving a [Formula: see text] relative improvement in the average segmental F1-score over the state-of-the-art ASFormer method. Moreover, our proposed method achieves state-of-the-art results on the publicly available Cholec80 dataset. CONCLUSION The improvement in the results when using the proposed method suggests that temporal context could be better captured when adding information from TCN to the ASFormer paradigm. This addition leads to better surgical workflow recognition.
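The segmental F1-score reported above is a standard action-segmentation metric. The sketch below shows the commonly used IoU-thresholded definition (F1@k), which may differ in detail from the paper's own evaluation code.

```python
def segments(labels):
    """Collapse a frame-wise label sequence into (label, start, end) segments."""
    segs, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            segs.append((labels[start], start, i))
            start = i
    return segs

def segmental_f1(pred, gt, iou_threshold=0.5):
    """Segmental F1@k: a predicted segment counts as a true positive if its IoU
    with an unmatched ground-truth segment of the same label exceeds the threshold."""
    pred_segs, gt_segs = segments(pred), segments(gt)
    matched = [False] * len(gt_segs)
    tp = 0
    for label, p_start, p_end in pred_segs:
        best_iou, best_j = 0.0, None
        for j, (g_label, g_start, g_end) in enumerate(gt_segs):
            if g_label != label or matched[j]:
                continue
            inter = max(0, min(p_end, g_end) - max(p_start, g_start))
            union = max(p_end, g_end) - min(p_start, g_start)
            if inter / union > best_iou:
                best_iou, best_j = inter / union, j
        if best_j is not None and best_iou >= iou_threshold:
            matched[best_j] = True
            tp += 1
    fp, fn = len(pred_segs) - tp, len(gt_segs) - tp
    return 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0

print(segmental_f1([0, 0, 1, 1, 1, 2], [0, 0, 0, 1, 1, 2]))
```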
Affiliation(s)
- Bokai Zhang: Johnson & Johnson MedTech, 1100 Olive Way, Suite 1100, Seattle, 98101, WA, USA
- Bharti Goel: Johnson & Johnson MedTech, 5490 Great America Pkwy, Santa Clara, CA, 95054, USA
- Mohammad Hasan Sarhan: Johnson & Johnson MedTech, Robert-Koch-Straße 1, 22851, Norderstedt, Schleswig-Holstein, Germany
- Varun Kejriwal Goel: Johnson & Johnson MedTech, 5490 Great America Pkwy, Santa Clara, CA, 95054, USA
- Rami Abukhalil: Johnson & Johnson MedTech, 5490 Great America Pkwy, Santa Clara, CA, 95054, USA
- Bindu Kalesan: Johnson & Johnson MedTech, 5490 Great America Pkwy, Santa Clara, CA, 95054, USA
- Natalie Stottler: Johnson & Johnson MedTech, 1100 Olive Way, Suite 1100, Seattle, 98101, WA, USA
- Svetlana Petculescu: Johnson & Johnson MedTech, 1100 Olive Way, Suite 1100, Seattle, 98101, WA, USA
25
Evaluation of surgical complexity by automated surgical process recognition in robotic distal gastrectomy using artificial intelligence. Surg Endosc 2023:10.1007/s00464-023-09924-9. [PMID: 36823363 PMCID: PMC9949687 DOI: 10.1007/s00464-023-09924-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/25/2022] [Accepted: 01/28/2023] [Indexed: 02/25/2023]
Abstract
BACKGROUND Although radical gastrectomy with lymph node dissection is the standard treatment for gastric cancer, the complication rate remains high. Thus, estimation of surgical complexity is required for safety. We aim to investigate the association between the surgical process and complexity, such as a risk of complications in robotic distal gastrectomy (RDG), to establish an artificial intelligence (AI)-based automated surgical phase recognition by analyzing robotic surgical videos, and to investigate the predictability of surgical complexity by AI. METHOD This study assessed clinical data and robotic surgical videos for 56 patients who underwent RDG for gastric cancer. We investigated (1) the relationship between surgical complexity and perioperative factors (patient characteristics, surgical process); (2) AI training for automated phase recognition and model performance was assessed by comparing predictions to the surgeon-annotated reference; (3) AI model predictability for surgical complexity was calculated by the area under the curve. RESULT Surgical complexity score comprised extended total surgical duration, bleeding, and complications and was strongly associated with the intraoperative surgical process, especially in the beginning phases (area under the curve 0.913). We established an AI model that can recognize surgical phases from video with 87% accuracy; AI can determine intraoperative surgical complexity by calculating the duration of beginning phases from phases 1-3 (area under the curve 0.859). CONCLUSION Surgical complexity, as a surrogate of short-term outcomes, can be predicted by the surgical process, especially in the extended duration of beginning phases. Surgical complexity can also be evaluated with automation using our artificial intelligence-based model.
26
Jalal NA, Alshirbaji TA, Docherty PD, Arabian H, Laufer B, Krueger-Ziolek S, Neumuth T, Moeller K. Laparoscopic Video Analysis Using Temporal, Attention, and Multi-Feature Fusion Based-Approaches. SENSORS (BASEL, SWITZERLAND) 2023; 23:1958. [PMID: 36850554 PMCID: PMC9964851 DOI: 10.3390/s23041958] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 01/01/2023] [Revised: 02/06/2023] [Accepted: 02/07/2023] [Indexed: 06/18/2023]
Abstract
Adapting intelligent context-aware systems (CAS) to future operating rooms (OR) aims to improve situational awareness and provide surgical decision support systems to medical teams. CAS analyzes data streams from available devices during surgery and communicates real-time knowledge to clinicians. Indeed, recent advances in computer vision and machine learning, particularly deep learning, paved the way for extensive research to develop CAS. In this work, a deep learning approach for analyzing laparoscopic videos for surgical phase recognition, tool classification, and weakly-supervised tool localization in laparoscopic videos was proposed. The ResNet-50 convolutional neural network (CNN) architecture was adapted by adding attention modules and fusing features from multiple stages to generate better-focused, generalized, and well-representative features. Then, a multi-map convolutional layer followed by tool-wise and spatial pooling operations was utilized to perform tool localization and generate tool presence confidences. Finally, the long short-term memory (LSTM) network was employed to model temporal information and perform tool classification and phase recognition. The proposed approach was evaluated on the Cholec80 dataset. The experimental results (i.e., 88.5% and 89.0% mean precision and recall for phase recognition, respectively, 95.6% mean average precision for tool presence detection, and a 70.1% F1-score for tool localization) demonstrated the ability of the model to learn discriminative features for all tasks. The performances revealed the importance of integrating attention modules and multi-stage feature fusion for more robust and precise detection of surgical phases and tools.
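A sketch of how a multi-map convolutional layer with tool-wise and spatial pooling can yield tool presence confidences from frame-level labels only, in the spirit of the description above; channel counts and the pooling choices are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MultiMapToolHead(nn.Module):
    """Weakly supervised tool localisation head: a 1x1 convolution produces
    several localisation maps per tool, tool-wise pooling merges them, and
    spatial pooling yields presence confidences trainable with frame-level
    tool labels only."""

    def __init__(self, in_channels=2048, n_tools=7, maps_per_tool=4):
        super().__init__()
        self.n_tools, self.maps_per_tool = n_tools, maps_per_tool
        self.multi_map = nn.Conv2d(in_channels, n_tools * maps_per_tool, kernel_size=1)

    def forward(self, features):
        # features: (B, C, H, W) from a CNN backbone such as ResNet-50
        maps = self.multi_map(features)                            # (B, T*M, H, W)
        b, _, h, w = maps.shape
        maps = maps.view(b, self.n_tools, self.maps_per_tool, h, w)
        tool_maps = maps.mean(dim=2)                               # tool-wise pooling
        presence = tool_maps.amax(dim=(2, 3))                      # spatial max pooling
        return presence, tool_maps   # logits per tool + coarse localisation maps

head = MultiMapToolHead()
feats = torch.randn(2, 2048, 7, 7)
presence_logits, loc_maps = head(feats)
loss = nn.BCEWithLogitsLoss()(presence_logits, torch.randint(0, 2, (2, 7)).float())
```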
Affiliation(s)
- Nour Aldeen Jalal: Institute of Technical Medicine (ITeM), Furtwangen University, 78054 Villingen-Schwenningen, Germany; Innovation Center Computer Assisted Surgery (ICCAS), University of Leipzig, 04103 Leipzig, Germany
- Tamer Abdulbaki Alshirbaji: Institute of Technical Medicine (ITeM), Furtwangen University, 78054 Villingen-Schwenningen, Germany; Innovation Center Computer Assisted Surgery (ICCAS), University of Leipzig, 04103 Leipzig, Germany
- Paul David Docherty: Institute of Technical Medicine (ITeM), Furtwangen University, 78054 Villingen-Schwenningen, Germany; Department of Mechanical Engineering, University of Canterbury, Christchurch 8041, New Zealand
- Herag Arabian: Institute of Technical Medicine (ITeM), Furtwangen University, 78054 Villingen-Schwenningen, Germany
- Bernhard Laufer: Institute of Technical Medicine (ITeM), Furtwangen University, 78054 Villingen-Schwenningen, Germany
- Sabine Krueger-Ziolek: Institute of Technical Medicine (ITeM), Furtwangen University, 78054 Villingen-Schwenningen, Germany
- Thomas Neumuth: Innovation Center Computer Assisted Surgery (ICCAS), University of Leipzig, 04103 Leipzig, Germany
- Knut Moeller: Institute of Technical Medicine (ITeM), Furtwangen University, 78054 Villingen-Schwenningen, Germany; Department of Mechanical Engineering, University of Canterbury, Christchurch 8041, New Zealand; Department of Microsystems Engineering, University of Freiburg, 79110 Freiburg, Germany
27
Fer D, Zhang B, Abukhalil R, Goel V, Goel B, Barker J, Kalesan B, Barragan I, Gaddis ML, Kilroy PG. An artificial intelligence model that automatically labels roux-en-Y gastric bypasses, a comparison to trained surgeon annotators. Surg Endosc 2023:10.1007/s00464-023-09870-6. [PMID: 36658282 DOI: 10.1007/s00464-023-09870-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/21/2022] [Accepted: 01/04/2023] [Indexed: 01/21/2023]
Abstract
INTRODUCTION Artificial intelligence (AI) can automate certain tasks to improve data collection. Models have been created to annotate the steps of Roux-en-Y Gastric Bypass (RYGB). However, model performance has not been compared with individual surgeon annotator performance. We developed a model that automatically labels RYGB steps and compared its performance to surgeons. METHODS AND PROCEDURES 545 videos (17 surgeons) of laparoscopic RYGB procedures were collected. An annotation guide (12 steps, 52 tasks) was developed. Steps were annotated by 11 surgeons. Each video was annotated by two surgeons and a third reconciled the differences. A convolutional AI model was trained to identify steps and compared with manual annotation. For modeling, we used 390 videos for training, 95 for validation, and 60 for testing. The performance comparison between the AI model and manual annotation was performed using ANOVA (Analysis of Variance) in a subset of 60 testing videos. We assessed the performance of the model at each step, and poor performance was defined as an F1-score < 80%. RESULTS The convolutional model identified 12 steps in the RYGB architecture. Model performance varied at each step (F1 > 90% for 7 steps and > 80% for 2). The reconciled manual annotation data (F1 > 80% for > 5 steps) performed better than that of trainees (F1 > 80% for 2-5 steps for 4 annotators, and < 2 steps for 4 annotators). In the testing subset, certain steps had low performance, indicating potential ambiguities in surgical landmarks. Additionally, some videos were easier to annotate than others, suggesting variability. After controlling for variability, the AI algorithm was comparable to manual annotation (p < 0.0001). CONCLUSION AI can be used to identify surgical landmarks in RYGB comparably to the manual process, and it recognized some landmarks more accurately than surgeons. This technology has the potential to improve surgical training by assessing the learning curves of surgeons at scale.
Affiliation(s)
- Danyal Fer: University of California, San Francisco-East Bay, General Surgery, Oakland, CA, USA; Johnson & Johnson MedTech, New Brunswick, NJ, USA
- Bokai Zhang: Johnson & Johnson MedTech, New Brunswick, NJ, USA
- Rami Abukhalil: Johnson & Johnson MedTech, New Brunswick, NJ, USA; 5490 Great America Parkway, Santa Clara, CA, 95054, USA
- Varun Goel: University of California, San Francisco-East Bay, General Surgery, Oakland, CA, USA; Johnson & Johnson MedTech, New Brunswick, NJ, USA
- Bharti Goel: Johnson & Johnson MedTech, New Brunswick, NJ, USA
28
Park M, Oh S, Jeong T, Yu S. Multi-Stage Temporal Convolutional Network with Moment Loss and Positional Encoding for Surgical Phase Recognition. Diagnostics (Basel) 2022; 13:107. [PMID: 36611399 PMCID: PMC9818879 DOI: 10.3390/diagnostics13010107] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/29/2022] [Revised: 12/28/2022] [Accepted: 12/28/2022] [Indexed: 12/31/2022] Open
Abstract
In recent times, many studies concerning surgical video analysis have been conducted due to its growing importance in many medical applications. In particular, it is very important to be able to recognize the current surgical phase because the phase information can be utilized in various ways both during and after surgery. This paper proposes an efficient phase recognition network, called MomentNet, for cholecystectomy endoscopic videos. Unlike LSTM-based networks, MomentNet is based on a multi-stage temporal convolutional network. In addition, to improve the phase prediction accuracy, the proposed method adopts a new loss function to supplement the general cross-entropy loss function. The new loss function significantly improves the performance of the phase recognition network by constraining undesirable phase transitions and preventing over-segmentation. MomentNet also effectively applies positional encoding techniques, which are commonly applied in transformer architectures, to the multi-stage temporal convolutional network. By using positional encoding, MomentNet can provide important temporal context, resulting in higher phase prediction accuracy. Furthermore, MomentNet applies a label smoothing technique to suppress overfitting and replaces the backbone network for feature extraction to further improve network performance. As a result, MomentNet achieves 92.31% accuracy in the phase recognition task on the Cholec80 dataset, which is 4.55% higher than that of the baseline architecture.
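The paper's moment loss is not spelled out in the abstract; as an illustration of a supplementary loss that penalises undesirable phase transitions and over-segmentation, the sketch below uses the truncated smoothing term commonly paired with cross-entropy in multi-stage TCNs. It is a stand-in for explanation, not the published moment loss.

```python
import torch
import torch.nn.functional as F

def transition_smoothing_loss(logits, tau=4.0):
    """Truncated MSE between log-probabilities of consecutive frames, a term
    often added to the frame-wise cross-entropy in multi-stage TCNs to
    discourage over-segmentation.

    logits: (T, C) frame-wise class scores for one video.
    """
    log_probs = F.log_softmax(logits, dim=-1)
    diff = log_probs[1:] - log_probs[:-1].detach()
    return torch.clamp(diff ** 2, max=tau * tau).mean()

logits = torch.randn(500, 7, requires_grad=True)   # 500 frames, 7 phases
labels = torch.randint(0, 7, (500,))
loss = F.cross_entropy(logits, labels) + 0.15 * transition_smoothing_loss(logits)
loss.backward()
```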
Affiliation(s)
- Minyoung Park: School of Electrical and Electronics Engineering, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul 06974, Republic of Korea
- Seungtaek Oh: School of Electrical and Electronics Engineering, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul 06974, Republic of Korea
- Taikyeong Jeong: School of Artificial Intelligence Convergence, Hallym University, Chuncheon 24252, Republic of Korea
- Sungwook Yu: School of Electrical and Electronics Engineering, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul 06974, Republic of Korea
29
Golany T, Aides A, Freedman D, Rabani N, Liu Y, Rivlin E, Corrado GS, Matias Y, Khoury W, Kashtan H, Reissman P. Artificial intelligence for phase recognition in complex laparoscopic cholecystectomy. Surg Endosc 2022; 36:9215-9223. [PMID: 35941306 PMCID: PMC9652206 DOI: 10.1007/s00464-022-09405-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/01/2021] [Accepted: 06/19/2022] [Indexed: 01/06/2023]
Abstract
BACKGROUND The potential role and benefits of AI in surgery has yet to be determined. This study is a first step in developing an AI system for minimizing adverse events and improving patient's safety. We developed an Artificial Intelligence (AI) algorithm and evaluated its performance in recognizing surgical phases of laparoscopic cholecystectomy (LC) videos spanning a range of complexities. METHODS A set of 371 LC videos with various complexity levels and containing adverse events was collected from five hospitals. Two expert surgeons segmented each video into 10 phases including Calot's triangle dissection and clipping and cutting. For each video, adverse events were also annotated when present (major bleeding; gallbladder perforation; major bile leakage; and incidental finding) and complexity level (on a scale of 1-5) was also recorded. The dataset was then split in an 80:20 ratio (294 and 77 videos), stratified by complexity, hospital, and adverse events to train and test the AI model, respectively. The AI-surgeon agreement was then compared to the agreement between surgeons. RESULTS The mean accuracy of the AI model for surgical phase recognition was 89% [95% CI 87.1%, 90.6%], comparable to the mean inter-annotator agreement of 90% [95% CI 89.4%, 90.5%]. The model's accuracy was inversely associated with procedure complexity, decreasing from 92% (complexity level 1) to 88% (complexity level 3) to 81% (complexity level 5). CONCLUSION The AI model successfully identified surgical phases in both simple and complex LC procedures. Further validation and system training is warranted to evaluate its potential applications such as to increase patient safety during surgery.
Affiliation(s)
- Yun Liu: Google Health, Tel Aviv, Israel
- Wisam Khoury: Department of Surgery, Rappaport Faculty of Medicine, Carmel Medical Center, Technion, Haifa, Israel
- Hanoch Kashtan: Department of Surgery, Rabin Medical Center, The Sackler School of Medicine, Tel-Aviv University, Petah Tikva, Israel
- Petachia Reissman: Department of Surgery, The Hebrew University School of Medicine, Shaare Zedek Medical Center, Jerusalem, Israel; Digestive Disease Institute, Shaare-Zedek Medical Center, The Hebrew University School of Medicine, P.O. Box 3235, 91031, Jerusalem, Israel
30
Fang L, Mou L, Gu Y, Hu Y, Chen B, Chen X, Wang Y, Liu J, Zhao Y. Global-local multi-stage temporal convolutional network for cataract surgery phase recognition. Biomed Eng Online 2022; 21:82. [PMID: 36451164 PMCID: PMC9710114 DOI: 10.1186/s12938-022-01048-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/10/2022] [Accepted: 11/04/2022] [Indexed: 12/02/2022] Open
Abstract
BACKGROUND Surgical video phase recognition is an essential technique in computer-assisted surgical systems for monitoring surgical procedures, which can assist surgeons in standardizing procedures and enhancing postsurgical assessment and indexing. However, the high similarity between the phases and temporal variations of cataract videos still poses the greatest challenge for video phase recognition. METHODS In this paper, we introduce a global-local multi-stage temporal convolutional network (GL-MSTCN) to explore the subtle differences between high similarity surgical phases and mitigate the temporal variations of surgical videos. The presented work consists of a triple-stream network (i.e., pupil stream, instrument stream, and video frame stream) and a multi-stage temporal convolutional network. The triple-stream network first detects the pupil and surgical instruments regions in the frame separately and then obtains the fine-grained semantic features of the video frames. The proposed multi-stage temporal convolutional network improves the surgical phase recognition performance by capturing longer time series features through dilated convolutional layers with varying receptive fields. RESULTS Our method is thoroughly validated on the CSVideo dataset with 32 cataract surgery videos and the public Cataract101 dataset with 101 cataract surgery videos, outperforming state-of-the-art approaches with 95.8% and 96.5% accuracy, respectively. CONCLUSIONS The experimental results show that the use of global and local feature information can effectively enhance the model to explore fine-grained features and mitigate temporal and spatial variations, thus improving the surgical phase recognition performance of the proposed GL-MSTCN.
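A compact sketch of the dilated temporal convolution stage referred to above, in which the dilation factor doubles at every layer so the receptive field grows with depth; layer counts and dimensions are illustrative, not the published GL-MSTCN configuration.

```python
import torch
import torch.nn as nn

class DilatedTCNStage(nn.Module):
    """Single TCN stage built from residual dilated 1-D convolutions whose
    dilation doubles at every layer, so deeper layers see progressively longer
    temporal context."""

    def __init__(self, n_classes=10, dim=64, n_layers=10):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.Conv1d(dim, dim, kernel_size=3, padding=2 ** i, dilation=2 ** i)
            for i in range(n_layers)
        ])
        self.out = nn.Conv1d(dim, n_classes, kernel_size=1)

    def forward(self, x):
        # x: (B, dim, T) frame feature sequence
        for conv in self.layers:
            x = x + torch.relu(conv(x))   # residual dilated convolution
        return self.out(x)                # (B, n_classes, T)

stage = DilatedTCNStage()
features = torch.randn(1, 64, 3000)       # one cataract video, 3000 frames
frame_logits = stage(features)
```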
Affiliation(s)
- Lixin Fang: College of Mechanical Engineering, Zhejiang University of Technology, Hangzhou, 310014, China; Cixi Institute of Biomedical Engineering, Ningbo Institute of Materials Technology and Engineering, Chinese Academy of Sciences, Ningbo, China
- Lei Mou: Cixi Institute of Biomedical Engineering, Ningbo Institute of Materials Technology and Engineering, Chinese Academy of Sciences, Ningbo, China
- Yuanyuan Gu: Cixi Institute of Biomedical Engineering, Ningbo Institute of Materials Technology and Engineering, Chinese Academy of Sciences, Ningbo, China; Zhejiang Engineering Research Center for Biomedical Materials, Cixi Institute of Biomedical Engineering, Ningbo Institute of Materials Technology and Engineering, Chinese Academy of Sciences, Ningbo, 315300, China
- Yan Hu: Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen, 518055, China
- Bang Chen: Cixi Institute of Biomedical Engineering, Ningbo Institute of Materials Technology and Engineering, Chinese Academy of Sciences, Ningbo, China
- Xu Chen: Department of Ophthalmology, Shanghai Aier Eye Hospital, Shanghai, China; Department of Ophthalmology, Shanghai Aier Qingliang Eye Hospital, Shanghai, China; Aier Eye Hospital, Jinan University, No. 601, Huangpu Road West, Guangzhou, China; Aier School of Ophthalmology, Central South University, Changsha, Hunan, China
- Yang Wang: Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China
- Jiang Liu: Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen, 518055, China
- Yitian Zhao: Cixi Institute of Biomedical Engineering, Ningbo Institute of Materials Technology and Engineering, Chinese Academy of Sciences, Ningbo, China; Zhejiang Engineering Research Center for Biomedical Materials, Cixi Institute of Biomedical Engineering, Ningbo Institute of Materials Technology and Engineering, Chinese Academy of Sciences, Ningbo, 315300, China
31
Zou X, Liu W, Wang J, Tao R, Zheng G. ARST: auto-regressive surgical transformer for phase recognition from laparoscopic videos. COMPUTER METHODS IN BIOMECHANICS AND BIOMEDICAL ENGINEERING: IMAGING & VISUALIZATION 2022. [DOI: 10.1080/21681163.2022.2145238] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
Affiliation(s)
- Xiaoyang Zou: Institute of Medical Robotics, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
- Wenyong Liu: Key Laboratory for Biomechanics and Mechanobiology of Ministry of Education, Beijing Advanced Innovation Center for Biomedical Engineering, School of Biological Science and Medical Engineering, Beihang University, Beijing, China
- Junchen Wang: School of Mechanical Engineering and Automation, Beihang University, Beijing, China
- Rong Tao: Institute of Medical Robotics, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
- Guoyan Zheng: Institute of Medical Robotics, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
32
Jin Y, Long Y, Gao X, Stoyanov D, Dou Q, Heng PA. Trans-SVNet: hybrid embedding aggregation Transformer for surgical workflow analysis. Int J Comput Assist Radiol Surg 2022; 17:2193-2202. [PMID: 36129573 DOI: 10.1007/s11548-022-02743-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/29/2022] [Accepted: 08/31/2022] [Indexed: 11/05/2022]
Abstract
PURPOSE Real-time surgical workflow analysis is a key component of computer-assisted intervention systems to improve cognitive assistance. Most existing methods rely solely on conventional temporal models and encode features with a successive spatial-temporal arrangement. Supportive benefits of intermediate features are partially lost from both visual and temporal aspects. In this paper, we rethink feature encoding to attend to and preserve the critical information for accurate workflow recognition and anticipation. METHODS We introduce the Transformer into surgical workflow analysis to reconsider the complementary effects of spatial and temporal representations. We propose a hybrid embedding aggregation Transformer, named Trans-SVNet, to effectively interact with the designed spatial and temporal embeddings by employing the spatial embedding to query the temporal embedding sequence. The model is jointly optimized with loss objectives from both analysis tasks to leverage their high correlation. RESULTS We extensively evaluate our method on three large surgical video datasets. Our method consistently outperforms the state of the art across the three datasets on the workflow recognition task. When jointly learned with anticipation, recognition results gain a large improvement. Our approach also shows its effectiveness on anticipation, with promising performance achieved. Our model achieves a real-time inference speed of 0.0134 seconds per frame. CONCLUSION Experimental results demonstrate the efficacy of our hybrid embedding integration by rediscovering crucial cues from complementary spatial-temporal embeddings. The better performance obtained with multi-task learning indicates that the anticipation task brings additional knowledge to the recognition task. The effectiveness and efficiency of our method also show its potential for use in the operating room.
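A minimal sketch of the hybrid aggregation idea, assuming a transformer decoder layer in which the current frame's spatial embedding queries a sequence of temporal embeddings; dimensions and the decoder choice are assumptions rather than the published Trans-SVNet design.

```python
import torch
import torch.nn as nn

class HybridEmbeddingAggregator(nn.Module):
    """The spatial embedding of the current frame queries a sequence of
    temporal embeddings, so complementary cues from both streams are fused
    before phase classification."""

    def __init__(self, dim=512, n_heads=8, n_phases=7):
        super().__init__()
        self.fuse = nn.TransformerDecoderLayer(d_model=dim, nhead=n_heads,
                                               batch_first=True)
        self.classifier = nn.Linear(dim, n_phases)

    def forward(self, spatial_emb, temporal_seq):
        # spatial_emb:  (B, 1, dim)  embedding of the current frame
        # temporal_seq: (B, T, dim)  embeddings produced by a temporal model
        fused = self.fuse(tgt=spatial_emb, memory=temporal_seq)
        return self.classifier(fused.squeeze(1))   # (B, n_phases)

model = HybridEmbeddingAggregator()
logits = model(torch.randn(4, 1, 512), torch.randn(4, 30, 512))
```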
Affiliation(s)
- Yueming Jin: Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS), Department of Computer Science, University College London, London, UK
- Yonghao Long: Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, Hong Kong, China
- Xiaojie Gao: Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, Hong Kong, China
- Danail Stoyanov: Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS), Department of Computer Science, University College London, London, UK
- Qi Dou: Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, Hong Kong, China; Institute of Medical Intelligence and XR, The Chinese University of Hong Kong, Shatin, Hong Kong, China
- Pheng-Ann Heng: Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, Hong Kong, China; Institute of Medical Intelligence and XR, The Chinese University of Hong Kong, Shatin, Hong Kong, China
33
Chen HB, Li Z, Fu P, Ni ZL, Bian GB. Spatio-Temporal Causal Transformer for Multi-Grained Surgical Phase Recognition. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2022; 2022:1663-1666. [PMID: 36086459 DOI: 10.1109/embc48229.2022.9871004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/15/2023]
Abstract
Automatic surgical phase recognition plays a key role in surgical workflow analysis and overall optimization in clinical work. In the complicated surgical procedures, similar inter-class appearance and drastic variability in phase duration make this still a challenging task. In this paper, a spatio-temporal transformer is proposed for online surgical phase recognition with different granularity. To extract rich spatial information, a spatial transformer is used to model global spatial dependencies of each time index. To overcome the variability in phase duration, a temporal transformer captures the multi-scale temporal context of different time indexes with a dual pyramid pattern. Our method is thoroughly validated on the public Cholec80 dataset with 7 coarse-grained phases and the CATARACTS2020 dataset with 19 fine-grained phases, outperforming state-of-the-art approaches with 91.4% and 84.2% accuracy, taking only 24.5M parameters.
34
Takeuchi M, Kawakubo H, Saito K, Maeda Y, Matsuda S, Fukuda K, Nakamura R, Kitagawa Y. Automated Surgical-Phase Recognition for Robot-Assisted Minimally Invasive Esophagectomy Using Artificial Intelligence. Ann Surg Oncol 2022; 29:6847-6855. [PMID: 35763234 DOI: 10.1245/s10434-022-11996-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/10/2022] [Accepted: 05/11/2022] [Indexed: 11/18/2022]
Abstract
BACKGROUND Although a number of robot-assisted minimally invasive esophagectomy (RAMIE) procedures have been performed due to three-dimensional field of view, image stabilization, and flexible joint function, both the surgeons and surgical teams require proficiency. This study aimed to establish an artificial intelligence (AI)-based automated surgical-phase recognition system for RAMIE by analyzing robotic surgical videos. METHODS This study enrolled 31 patients who underwent RAMIE. The videos were annotated into the following nine surgical phases: preparation, lower mediastinal dissection, upper mediastinal dissection, azygos vein division, subcarinal lymph node dissection (LND), right recurrent laryngeal nerve (RLN) LND, left RLN LND, esophageal transection, and post-dissection to completion of surgery to train the AI for automated phase recognition. An additional phase ("no step") was used to indicate video sequences upon removal of the camera from the thoracic cavity. All the patients were divided into two groups, namely, early period (20 patients) and late period (11 patients), after which the relationship between the surgical-phase duration and the surgical periods was assessed. RESULTS Fourfold cross validation was applied to evaluate the performance of the current model. The AI had an accuracy of 84%. The preparation (p = 0.012), post-dissection to completion of surgery (p = 0.003), and "no step" (p < 0.001) phases predicted by the AI were significantly shorter in the late period than in the early period. CONCLUSIONS A highly accurate automated surgical-phase recognition system for RAMIE was established using deep learning. Specific phase durations were significantly associated with the surgical period at the authors' institution.
Affiliation(s)
- Masashi Takeuchi, Hirofumi Kawakubo, Kosuke Saito, Yusuke Maeda, Satoru Matsuda, Kazumasa Fukuda, Rieko Nakamura, Yuko Kitagawa: Department of Surgery, Keio University School of Medicine, Tokyo, Japan
35
Surgical reporting for laparoscopic cholecystectomy based on phase annotation by a convolutional neural network (CNN) and the phenomenon of phase flickering: a proof of concept. Int J Comput Assist Radiol Surg 2022; 17:1991-1999. [PMID: 35643827 PMCID: PMC9515052 DOI: 10.1007/s11548-022-02680-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/10/2022] [Accepted: 05/10/2022] [Indexed: 12/03/2022]
Abstract
Purpose Surgical documentation is an important yet time-consuming necessity in clinical routine. Besides its core function of transmitting information about a surgery to other medical professionals, the surgical report has gained even more significance in terms of information extraction for scientific, administrative and judicial applications. A possible basis for computer-aided reporting is phase detection by convolutional neural networks (CNN). In this article we propose a workflow to generate operative notes based on the output of the TeCNO CNN. Methods Video recordings of 15 cholecystectomies were used for inference. The annotation of TeCNO was compared to that of an expert surgeon (HE) and the algorithm-based annotation of a scientist (HA). The CNN output was then used to identify deviations from the standard course as the basis for the final report. Moreover, we assessed the phenomenon of ‘phase flickering’, defined as clusters of incorrectly labeled frames, and evaluated its usability. Results The agreement between HE and the CNN was 79.7%, and that between HA and the CNN was 87.0%. ‘Phase flickering’ indicated an aberrant course with AUCs of 0.91 and 0.89 in ROC analyses based on the number and extent of the affected frames, respectively. Finally, we created operative notes based on a standard text, deviation alerts, and manual completion by the surgeon. Conclusion Computer-aided documentation is a noteworthy use case for phase recognition in standardized surgery. The analysis of phase flickering in a CNN’s annotation has the potential to retrieve more information about the course of a particular procedure to complement an automated report.
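A simple reading of 'phase flickering' as short predicted-phase runs that interrupt an otherwise continuous phase; the cluster definition and the length threshold below are assumptions for illustration, not the authors' exact criteria.

```python
from itertools import groupby

def flicker_clusters(predicted_phases, max_len=5):
    """Count 'phase flickering' clusters: short runs of frames whose predicted
    phase interrupts an otherwise continuous phase (the runs before and after
    carry the same label).

    predicted_phases: per-frame phase labels predicted by the CNN for one video.
    Returns (number of flicker clusters, total number of flickering frames).
    """
    runs = [(label, sum(1 for _ in grp)) for label, grp in groupby(predicted_phases)]
    n_clusters = n_frames = 0
    for i in range(1, len(runs) - 1):
        label, length = runs[i]
        if length <= max_len and runs[i - 1][0] == runs[i + 1][0]:
            n_clusters += 1
            n_frames += length
    return n_clusters, n_frames

# Toy example: phase 3 briefly "flickers" inside a long phase-2 block.
predictions = [2] * 50 + [3] * 3 + [2] * 40 + [4] * 60
print(flicker_clusters(predictions))   # (1, 3)
```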
36
Takeuchi M, Collins T, Ndagijimana A, Kawakubo H, Kitagawa Y, Marescaux J, Mutter D, Perretta S, Hostettler A, Dallemagne B. Automatic surgical phase recognition in laparoscopic inguinal hernia repair with artificial intelligence. Hernia 2022; 26:1669-1678. [PMID: 35536371 DOI: 10.1007/s10029-022-02621-x] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/10/2022] [Accepted: 04/21/2022] [Indexed: 11/26/2022]
Abstract
BACKGROUND Because of the complexity of the intra-abdominal anatomy in the posterior approach, a longer learning curve has been observed in laparoscopic transabdominal preperitoneal (TAPP) inguinal hernia repair. Consequently, automatic tools using artificial intelligence (AI) to monitor TAPP procedures and assess learning curves are required. The primary objective of this study was to establish a deep learning-based automated surgical phase recognition system for TAPP. A secondary objective was to investigate the relationship between surgical skills and phase duration. METHODS This study enrolled 119 patients who underwent the TAPP procedure. The surgical videos were annotated (delineated in time) and split into seven surgical phases (preparation, peritoneal flap incision, peritoneal flap dissection, hernia dissection, mesh deployment, mesh fixation, peritoneal flap closure, and additional closure). An AI model was trained to automatically recognize surgical phases from videos. The relationship between phase duration and surgical skills were also evaluated. RESULTS A fourfold cross-validation was used to assess the performance of the AI model. The accuracy was 88.81 and 85.82%, in unilateral and bilateral cases, respectively. In unilateral hernia cases, the duration of peritoneal incision (p = 0.003) and hernia dissection (p = 0.014) detected via AI were significantly shorter for experts than for trainees. CONCLUSION An automated surgical phase recognition system was established for TAPP using deep learning with a high accuracy. Our AI-based system can be useful for the automatic monitoring of surgery progress, improving OR efficiency, evaluating surgical skills and video-based surgical education. Specific phase durations detected via the AI model were significantly associated with the surgeons' learning curve.
Affiliation(s)
- M Takeuchi: IRCAD, Research Institute Against Digestive Cancer (IRCAD) France, 1, place de l'Hôpital, 67091, Strasbourg, France; Department of Surgery, Keio University School of Medicine, Tokyo, Japan
- T Collins: IRCAD, Research Institute Against Digestive Cancer (IRCAD) France, 1, place de l'Hôpital, 67091, Strasbourg, France; IRCAD, Research Institute Against Digestive Cancer (IRCAD) Africa, Kigali, Rwanda
- A Ndagijimana: IRCAD, Research Institute Against Digestive Cancer (IRCAD) Africa, Kigali, Rwanda
- H Kawakubo: Department of Surgery, Keio University School of Medicine, Tokyo, Japan
- Y Kitagawa: Department of Surgery, Keio University School of Medicine, Tokyo, Japan
- J Marescaux: IRCAD, Research Institute Against Digestive Cancer (IRCAD) France, 1, place de l'Hôpital, 67091, Strasbourg, France; IRCAD, Research Institute Against Digestive Cancer (IRCAD) Africa, Kigali, Rwanda
- D Mutter: IRCAD, Research Institute Against Digestive Cancer (IRCAD) France, 1, place de l'Hôpital, 67091, Strasbourg, France; Department of Digestive and Endocrine Surgery, University Hospital, Strasbourg, France
- S Perretta: IRCAD, Research Institute Against Digestive Cancer (IRCAD) France, 1, place de l'Hôpital, 67091, Strasbourg, France; Department of Digestive and Endocrine Surgery, University Hospital, Strasbourg, France
- A Hostettler: IRCAD, Research Institute Against Digestive Cancer (IRCAD) France, 1, place de l'Hôpital, 67091, Strasbourg, France; IRCAD, Research Institute Against Digestive Cancer (IRCAD) Africa, Kigali, Rwanda
- B Dallemagne: IRCAD, Research Institute Against Digestive Cancer (IRCAD) France, 1, place de l'Hôpital, 67091, Strasbourg, France; Department of Digestive and Endocrine Surgery, University Hospital, Strasbourg, France
37
Data-centric multi-task surgical phase estimation with sparse scene segmentation. Int J Comput Assist Radiol Surg 2022; 17:953-960. [PMID: 35505149 PMCID: PMC9110447 DOI: 10.1007/s11548-022-02616-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/21/2022] [Accepted: 03/22/2022] [Indexed: 11/27/2022]
Abstract
Purpose Surgical workflow estimation techniques aim to divide a surgical video into temporal segments based on predefined surgical actions or objectives, which can be of different granularity, such as steps or phases. Potential applications range from real-time intra-operative feedback to automatic post-operative reports and analysis. A common approach in the literature for performing automatic surgical phase estimation is to decouple the problem into two stages: feature extraction from a single frame and temporal feature fusion. This approach is performed in two stages due to computational restrictions when processing large spatio-temporal sequences. Methods The majority of existing works focus on pushing the performance solely through temporal model development. In contrast, we follow a data-centric approach and propose a training pipeline that enables models to maximise the usage of existing datasets, which are generally used in isolation. Specifically, we use the dense phase annotations available in Cholec80 and the sparse scene (i.e., instrument and anatomy) segmentation annotations available in CholecSeg8k for less than 5% of the overlapping frames. We propose a simple multi-task encoder that effectively fuses both streams, when available, based on their importance, and we jointly optimise both tasks for accurate phase prediction. Results and conclusion We show that with a small fraction of scene segmentation annotations, a relatively simple model can obtain results comparable to previous state-of-the-art and more complex architectures when evaluated in similar settings. We hope that this data-centric approach can encourage new research directions where data, and how to use it, plays an important role alongside model development.
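A sketch of a joint objective in the spirit of the setup above: every frame contributes a phase term, while the segmentation term is computed only on the small fraction of frames that actually carry masks. The shapes, the unlabeled-mask convention, and the weighting are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def multitask_loss(phase_logits, phase_labels, seg_logits, seg_labels, seg_weight=0.5):
    """Joint phase + sparse segmentation objective. Frames without masks are
    marked with -1 everywhere and contribute only to the phase term.

    phase_logits: (B, n_phases)      phase_labels: (B,)
    seg_logits:   (B, n_cls, H, W)   seg_labels:   (B, H, W) with -1 = unlabeled
    """
    loss = F.cross_entropy(phase_logits, phase_labels)
    labelled = (seg_labels >= 0).flatten(1).any(dim=1)   # frames that carry masks
    if labelled.any():
        seg_loss = F.cross_entropy(seg_logits[labelled], seg_labels[labelled],
                                   ignore_index=-1)
        loss = loss + seg_weight * seg_loss
    return loss

phase_logits = torch.randn(4, 7)
phase_labels = torch.randint(0, 7, (4,))
seg_logits = torch.randn(4, 13, 64, 64)
seg_labels = torch.full((4, 64, 64), -1)          # only one frame has masks
seg_labels[0] = torch.randint(0, 13, (64, 64))
print(multitask_loss(phase_logits, phase_labels, seg_logits, seg_labels))
```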
Collapse
|
38
|
Hybrid Spatiotemporal Contrastive Representation Learning for Content-Based Surgical Video Retrieval. Electronics 2022. [DOI: 10.3390/electronics11091353] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/04/2023]
Abstract
In the medical field, there is growing interest in minimally invasive and microscopic surgeries owing to their economic and clinical benefits. These surgeries are often recorded, and the recordings have become a key resource for education, patient disease analysis, surgical error analysis, and surgical skill assessment. However, manually searching through this collection of long surgical videos is extremely labor-intensive and time-consuming, requiring an effective content-based video analysis system. Previous methods for surgical video retrieval are based on handcrafted features, which do not represent the video effectively. In contrast, deep learning-based solutions have proved effective in both surgical image and video analysis, with CNN-, LSTM- and CNN-LSTM-based methods proposed for most surgical video analysis tasks. In this paper, we propose a hybrid spatiotemporal embedding method that enhances spatiotemporal representations using an adaptive fusion layer on top of LSTM and temporal causal convolutional modules. To learn surgical video representations, we explore a supervised contrastive learning approach that leverages label information in addition to augmented versions of the samples. Validating our approach on a video retrieval task on two datasets, Surgical Actions 160 and Cataract-101, we significantly improve on previous results in terms of mean average precision: 30.012 ± 1.778 vs. 22.54 ± 1.557 for Surgical Actions 160 and 81.134 ± 1.28 vs. 33.18 ± 1.311 for Cataract-101. We also validate the proposed method's suitability for the surgical phase recognition task on the benchmark Cholec80 dataset, where our approach outperforms the state of the art with 90.2% accuracy.
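The adaptive fusion idea can be sketched as follows (shapes, dimensions and names are assumptions, not the published implementation): a per-time-step gate learns how much to weight an LSTM stream versus a causal temporal-convolution stream of frame embeddings.

```python
# Minimal sketch: fuse an LSTM stream and a causal temporal-convolution stream
# of frame features with a learned, per-time-step gate.
import torch
import torch.nn as nn

class AdaptiveFusion(nn.Module):
    def __init__(self, feat_dim=512, hidden=256):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.tcn = nn.Conv1d(feat_dim, hidden, kernel_size=3, padding=2)
        self.gate = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.Sigmoid())

    def forward(self, x):                        # x: (B, T, feat_dim) frame embeddings
        h_lstm, _ = self.lstm(x)                 # (B, T, hidden)
        # keep only the first T outputs so each step sees past frames only (causal)
        h_tcn = self.tcn(x.transpose(1, 2))[:, :, : x.size(1)].transpose(1, 2)
        g = self.gate(torch.cat([h_lstm, h_tcn], dim=-1))   # per-step weights in [0, 1]
        return g * h_lstm + (1 - g) * h_tcn      # fused spatiotemporal embedding

print(AdaptiveFusion()(torch.randn(2, 30, 512)).shape)  # torch.Size([2, 30, 256])
```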
Collapse
|
39
|
Das A, Bano S, Vasconcelos F, Khan DZ, Marcus HJ, Stoyanov D. Reducing prediction volatility in the surgical workflow recognition of endoscopic pituitary surgery. Int J Comput Assist Radiol Surg 2022; 17:1445-1452. [PMID: 35362848 PMCID: PMC9307536 DOI: 10.1007/s11548-022-02599-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/03/2022] [Accepted: 03/08/2022] [Indexed: 11/25/2022]
Abstract
Purpose: Workflow recognition can aid surgeons before an operation when used as a training tool, during an operation by increasing operating room efficiency, and after an operation in the completion of operation notes. Although several methods have been applied to this task, they have been tested on few surgical datasets. Therefore, their generalisability is not well tested, particularly for surgical approaches utilising smaller working spaces which are susceptible to occlusion and necessitate frequent withdrawal of the endoscope. This leads to rapidly changing predictions, which reduces the clinical confidence of the methods, and hence limits their suitability for clinical translation. Methods: Firstly, the optimal neural network is found using established methods, using endoscopic pituitary surgery as an exemplar. Then, prediction volatility is formally defined as a new evaluation metric and a proxy for uncertainty, and two temporal smoothing functions are created. The first (modal, $M_n$) mode-averages over the previous n predictions, and the second (threshold, $T_n$) ensures a class is only changed after being continuously predicted for n predictions. Both functions are independently applied to the predictions of the optimal network. Results: The methods are evaluated on a 50-video dataset using fivefold cross-validation, and the optimised evaluation metric is the weighted-$F_1$ score. The optimal model is ResNet-50+LSTM, achieving 0.84 in 3-phase classification and 0.74 in 7-step classification. Applying threshold smoothing further improves these results, achieving 0.86 in 3-phase classification and 0.75 in 7-step classification, while also drastically reducing the prediction volatility. Conclusion: The results confirm the established methods generalise to endoscopic pituitary surgery, and show simple temporal smoothing not only reduces prediction volatility, but actively improves performance.
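The two smoothing functions are defined precisely enough in the abstract to sketch directly; the function interface below is our own choice.

```python
# Sketch of the two temporal smoothing functions described above:
# M_n mode-averages the previous n predictions; T_n only switches class after the
# new class has been predicted n times in a row.
from collections import Counter

def modal_smoothing(preds, n):
    """M_n: replace each prediction with the mode of the last n predictions."""
    smoothed = []
    for t in range(len(preds)):
        window = preds[max(0, t - n + 1): t + 1]
        smoothed.append(Counter(window).most_common(1)[0][0])
    return smoothed

def threshold_smoothing(preds, n):
    """T_n: keep the current class until a new class is predicted n times in a row."""
    if not preds:
        return []
    smoothed, current, run, candidate = [preds[0]], preds[0], 0, None
    for p in preds[1:]:
        if p == current:
            run, candidate = 0, None
        elif p == candidate:
            run += 1
            if run >= n:
                current, run, candidate = p, 0, None
        else:
            candidate, run = p, 1
        smoothed.append(current)
    return smoothed

# Example: a volatile sequence of phase predictions becomes stable.
print(threshold_smoothing([0, 0, 1, 0, 1, 1, 1, 1, 2, 1, 1], n=3))
# -> [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

With threshold smoothing, isolated spurious predictions are suppressed because the output only switches once the new class has persisted for n consecutive frames.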
Collapse
Affiliation(s)
- Adrito Das
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, London, United Kingdom.
| | - Sophia Bano
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, London, United Kingdom
| | - Francisco Vasconcelos
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, London, United Kingdom
| | - Danyal Z Khan
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, London, United Kingdom
- Department of Neurosurgery, National Hospital for Neurology and Neurosurgery, London, United Kingdom
| | - Hani J Marcus
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, London, United Kingdom
- Department of Neurosurgery, National Hospital for Neurology and Neurosurgery, London, United Kingdom
| | - Danail Stoyanov
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, London, United Kingdom
| |
Collapse
|
40
|
Kadkhodamohammadi A, Luengo I, Stoyanov D. PATG: position-aware temporal graph networks for surgical phase recognition on laparoscopic videos. Int J Comput Assist Radiol Surg 2022; 17:849-856. [PMID: 35353299 DOI: 10.1007/s11548-022-02600-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/27/2022] [Accepted: 03/08/2022] [Indexed: 11/27/2022]
Abstract
PURPOSE We tackle the problem of online surgical phase recognition in laparoscopic procedures, which is key in developing context-aware supporting systems. We propose a novel approach to take temporal context in surgical videos into account by precise modeling of temporal neighborhoods. METHODS We propose a two-stage model to perform phase recognition. A CNN model is used as a feature extractor to project RGB frames into a high-dimensional feature space. We introduce a novel paradigm for surgical phase recognition which utilizes graph neural networks to incorporate temporal information. Unlike recurrent neural networks and temporal convolution networks, our graph-based approach offers a more generic and flexible way of modeling temporal relationships. Each frame is a node in the graph, and the edges in the graph are used to define temporal connections among the nodes. The flexible configuration of the temporal neighborhood comes at the price of losing temporal order. To mitigate this, our approach takes temporal order into account by encoding frame positions, which is important for reliably predicting surgical phases. RESULTS Experiments are carried out on the public Cholec80 dataset that contains 80 annotated videos. The experimental results highlight the superior performance of the proposed approach compared to the state-of-the-art models on this dataset. CONCLUSION A novel approach for formulating video-based surgical phase recognition is presented. The results indicate that temporal information can be incorporated using graph-based models, and positional encoding is important to efficiently utilize temporal information. Graph networks open possibilities to use evidence theory for uncertainty analysis in surgical phase recognition.
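A minimal sketch of the position-aware temporal-graph idea (the graph construction, encoding and layer below are simplified stand-ins, not the PATG architecture): each frame becomes a node connected to its temporal neighbours, a sinusoidal positional encoding preserves frame order, and a single message-passing layer aggregates neighbour features.

```python
# Minimal sketch: temporal neighbourhood graph + positional encoding + mean message passing.
import torch
import torch.nn as nn

def positional_encoding(num_frames, dim):
    pos = torch.arange(num_frames).unsqueeze(1).float()
    i = torch.arange(0, dim, 2).float()
    angles = pos / torch.pow(torch.tensor(10000.0), i / dim)
    pe = torch.zeros(num_frames, dim)
    pe[:, 0::2] = torch.sin(angles)
    pe[:, 1::2] = torch.cos(angles)
    return pe

def temporal_edges(num_frames, radius=2):
    """Connect each frame i to frames i-radius..i+radius (its temporal neighbourhood)."""
    return [(i, j) for i in range(num_frames)
            for j in range(max(0, i - radius), min(num_frames, i + radius + 1))]

class TemporalGraphLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.update = nn.Linear(2 * dim, dim)

    def forward(self, feats, edges):              # feats: (T, dim) node features
        agg = torch.zeros_like(feats)
        deg = torch.zeros(feats.size(0), 1)
        for i, j in edges:                        # mean-aggregate neighbour features
            agg[i] += feats[j]
            deg[i] += 1
        return torch.relu(self.update(torch.cat([feats, agg / deg], dim=-1)))

# Usage: per-frame CNN features made position-aware before message passing.
T, D = 12, 32
feats = torch.randn(T, D) + positional_encoding(T, D)
out = TemporalGraphLayer(D)(feats, temporal_edges(T, radius=2))
print(out.shape)  # torch.Size([12, 32])
```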
Collapse
Affiliation(s)
| | - Imanol Luengo
- Innovation Department, Medtronic Digital Surgery, 230 City Road, London, EC1V 2QY, UK
| | - Danail Stoyanov
- Innovation Department, Medtronic Digital Surgery, 230 City Road, London, EC1V 2QY, UK
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, London, UK
| |
Collapse
|
41
|
Zhang Y, Bano S, Page AS, Deprest J, Stoyanov D, Vasconcelos F. Large-scale surgical workflow segmentation for laparoscopic sacrocolpopexy. Int J Comput Assist Radiol Surg 2022; 17:467-477. [PMID: 35050468 PMCID: PMC8873061 DOI: 10.1007/s11548-021-02544-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/09/2021] [Accepted: 12/07/2021] [Indexed: 12/03/2022]
Abstract
Purpose Laparoscopic sacrocolpopexy is the gold standard procedure for the management of vaginal vault prolapse. Studying surgical skills and different approaches to this procedure requires an analysis at the level of each of its individual phases, thus motivating investigation of automated surgical workflow for expediting this research. Phase durations in this procedure are significantly larger and more variable than in commonly available benchmarks such as Cholec80, and we assess these differences. Methodology We introduce sequence-to-sequence (seq2seq) models for coarse-level phase segmentation in order to deal with highly variable phase durations in sacrocolpopexy. Multiple architectures (LSTM and transformer), configurations (time-shifted, time-synchronous), and training strategies are tested with this novel framework to explore its flexibility. Results We perform 7-fold cross-validation on a dataset with 14 complete videos of sacrocolpopexy. We perform both a frame-based (accuracy, F1-score) and an event-based (Ward metric) evaluation of our algorithms and show that different architectures present a trade-off between a higher number of accurate frames (LSTM, mode average) and more consistent ordering of phase transitions (transformer). We compare the implementations on the widely used Cholec80 dataset and verify that relative performances differ from those in sacrocolpopexy. Conclusions We show that workflow segmentation of sacrocolpopexy videos presents specific challenges that differ from the widely used Cholec80 benchmark and require dedicated approaches to deal with the significantly larger phase durations. We demonstrate the feasibility of seq2seq models in sacrocolpopexy, a broad framework that can be further explored with new configurations. We show that an event-based evaluation metric is useful to evaluate workflow segmentation algorithms and provides complementary insight to the more commonly used metrics such as accuracy or F1-score. Supplementary Information The online version contains supplementary material available at 10.1007/s11548-021-02544-5.
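A minimal seq2seq sketch under assumed dimensions (not the paper's exact configuration): an LSTM encoder reads the full sequence of per-frame features and an LSTM decoder emits a coarser sequence of phase labels, feeding back its own prediction at each step.

```python
# Minimal seq2seq sketch for coarse phase segmentation (dimensions are placeholders).
import torch
import torch.nn as nn
import torch.nn.functional as F

class Seq2SeqPhase(nn.Module):
    def __init__(self, feat_dim=512, hidden=256, n_phases=8):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.decoder = nn.LSTM(n_phases, hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, n_phases)
        self.n_phases = n_phases

    def forward(self, feats, out_len):
        # feats: (B, T_in, feat_dim); the decoder unrolls out_len steps,
        # feeding back its own one-hot prediction at each step.
        _, state = self.encoder(feats)
        step = torch.zeros(feats.size(0), 1, self.n_phases)   # all-zero start token
        outputs = []
        for _ in range(out_len):
            dec_out, state = self.decoder(step, state)
            logits = self.classifier(dec_out)                  # (B, 1, n_phases)
            outputs.append(logits)
            step = F.one_hot(logits.argmax(-1), self.n_phases).float()
        return torch.cat(outputs, dim=1)                       # (B, out_len, n_phases)

# Usage: 100 frame features in, 10 coarse phase predictions out.
print(Seq2SeqPhase()(torch.randn(2, 100, 512), out_len=10).shape)  # torch.Size([2, 10, 8])
```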
Collapse
Affiliation(s)
- Yitong Zhang
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS) and Department of Computer Science, University College London, London, UK.
| | - Sophia Bano
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS) and Department of Computer Science, University College London, London, UK
| | - Ann-Sophie Page
- Department of Development and Regeneration, University Hospital Leuven, Leuven, Belgium
| | - Jan Deprest
- Department of Development and Regeneration, University Hospital Leuven, Leuven, Belgium
| | - Danail Stoyanov
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS) and Department of Computer Science, University College London, London, UK
| | - Francisco Vasconcelos
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS) and Department of Computer Science, University College London, London, UK
| |
Collapse
|
42
|
Bernhard L, Krumpholz R, Krieger Y, Czempiel T, Meining A, Navab N, Lüth T, Wilhelm D. PLAFOKON: a new concept for a patient-individual and intervention-specific flexible surgical platform. Surg Endosc 2021; 36:5303-5312. [PMID: 34919177 PMCID: PMC9160157 DOI: 10.1007/s00464-021-08908-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/06/2021] [Accepted: 11/21/2021] [Indexed: 11/28/2022]
Abstract
Background Research in the field of surgery is mainly driven by aiming for trauma reduction as well as for personalized treatment concepts. Beyond laparoscopy, other proposed approaches for further reduction of the therapeutic trauma have failed to achieve clinical translation, with few notable exceptions. We believe that this is mainly due to a lack of flexibility and high associated costs. We aimed at addressing these issues by developing a novel minimally invasive operating platform and a preoperative design workflow for patient-individual adaptation and cost-effective rapid manufacturing of surgical manipulators. In this article, we report on the first in-vitro cholecystectomy performed with our operating platform. Methods The single-port overtube (SPOT) is a snake-like surgical manipulator for minimally invasive interventions. The system layout is highly flexible and can be adapted in design and dimensions for different kinds of surgery, based on patient- and disease-specific parameters. For collecting and analyzing this data, we developed a graphical user interface, which assists clinicians during the preoperative planning phase. Other major components of our operating platform include an instrument management system and a non-sterile user interface. For the trial surgery, we used a validated phantom which was further equipped with a porcine liver including the gallbladder. Results Following our envisioned preoperative design workflow, a suitable geometry of the surgical manipulator was determined for our trial surgery and rapidly manufactured by means of 3D printing. With this setup, we successfully performed a first in-vitro cholecystectomy, which was completed in 78 min. Conclusions By conducting the trial surgery, we demonstrated the effectiveness of our PLAFOKON operating platform. While some aspects – especially regarding usability and ergonomics – can be further optimized, the overall performance of the system is highly promising, with sufficient flexibility and strength for conducting the necessary tissue manipulations. Supplementary Information The online version contains supplementary material available at 10.1007/s00464-021-08908-x.
Collapse
Affiliation(s)
- Lukas Bernhard
- Research Group for Minimally Invasive Interdisciplinary Therapeutic Intervention (MITI), Klinikum rechts der Isar, Technical University of Munich, Munich, Germany.
| | - Roman Krumpholz
- Research Group for Minimally Invasive Interdisciplinary Therapeutic Intervention (MITI), Klinikum rechts der Isar, Technical University of Munich, Munich, Germany
| | - Yannick Krieger
- Institute of Micro Technology and Medical Device Technology, Department of Mechanical Engineering, Technical University of Munich, Munich, Germany
| | - Tobias Czempiel
- Chair for Computer Aided Medical Procedures & Augmented Reality, Technical University of Munich, Munich, Germany
| | - Alexander Meining
- Department of Internal Medicine II, Gastroenterology, University Hospital Würzburg, Würzburg, Germany
| | - Nassir Navab
- Chair for Computer Aided Medical Procedures & Augmented Reality, Technical University of Munich, Munich, Germany
| | - Tim Lüth
- Institute of Micro Technology and Medical Device Technology, Department of Mechanical Engineering, Technical University of Munich, Munich, Germany
| | - Dirk Wilhelm
- Research Group for Minimally Invasive Interdisciplinary Therapeutic Intervention (MITI), Klinikum rechts der Isar, Technical University of Munich, Munich, Germany.,Department of Surgery, Klinikum rechts der Isar, Munich, Germany
| |
Collapse
|
43
|
Pradeep CS, Sinha N. Spatio-Temporal Features Based Surgical Phase Classification Using CNNs. Annu Int Conf IEEE Eng Med Biol Soc 2021; 2021:3332-3335. [PMID: 34891953 DOI: 10.1109/embc46164.2021.9630829] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/14/2023]
Abstract
In this paper, we propose a novel encoder-decoder-based surgical phase classification technique that leverages spatio-temporal features extracted from videos of laparoscopic cholecystectomy. We use a combined margin loss function to train the computationally efficient PeleeNet architecture to extract features that exhibit (1) intra-phase similarity and (2) inter-phase dissimilarity. Using these features, we encapsulate sequential feature embeddings, 64 at a time, and classify the surgical phase with a customized efficient residual factorized CNN architecture (ST-ERFNet). We obtained a surgical phase classification accuracy of 86.07% on the publicly available Cholec80 dataset, which consists of 7 surgical phases. The number of parameters required is reduced by approximately 84%, yet the model achieves performance comparable to the state of the art. Clinical relevance: Autonomous surgical phase classification sets the platform for automatically analyzing the entire surgical workflow. Additionally, it could streamline the assessment of a surgery in terms of efficiency and the early detection of errors or deviations from usual practice, potentially resulting in improved patient care.
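The windowed classification step can be sketched as follows (the temporal trunk is a simple stand-in for ST-ERFNet, and all dimensions are assumptions): 64 consecutive frame embeddings are stacked and classified into one surgical phase.

```python
# Minimal sketch: classify a window of 64 sequential frame embeddings into one phase.
import torch
import torch.nn as nn

class WindowPhaseClassifier(nn.Module):
    def __init__(self, feat_dim=128, n_phases=7):
        super().__init__()
        # 1-D temporal convolutions as a simple stand-in for the residual
        # factorized CNN used in the paper.
        self.temporal = nn.Sequential(
            nn.Conv1d(feat_dim, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.classifier = nn.Linear(64, n_phases)

    def forward(self, embeddings):                 # (B, 64, feat_dim) frame embeddings
        x = self.temporal(embeddings.transpose(1, 2)).squeeze(-1)   # (B, 64)
        return self.classifier(x)                  # (B, n_phases)

print(WindowPhaseClassifier()(torch.randn(4, 64, 128)).shape)  # torch.Size([4, 7])
```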
Collapse
|
44
|
Chen IHA, Ghazi A, Sridhar A, Stoyanov D, Slack M, Kelly JD, Collins JW. Evolving robotic surgery training and improving patient safety, with the integration of novel technologies. World J Urol 2021; 39:2883-2893. [PMID: 33156361 PMCID: PMC8405494 DOI: 10.1007/s00345-020-03467-7] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/01/2020] [Accepted: 09/21/2020] [Indexed: 12/18/2022] Open
Abstract
INTRODUCTION Robot-assisted surgery is becoming increasingly adopted by multiple surgical specialties. There is evidence of inherent risks of utilising new technologies that are unfamiliar early in the learning curve. The development of standardised and validated training programmes is crucial to deliver safe introduction. In this review, we aim to evaluate the current evidence and opportunities to integrate novel technologies into modern digitalised robotic training curricula. METHODS A systematic literature review of the current evidence for novel technologies in surgical training was conducted online and relevant publications and information were identified. Evaluation was made on how these technologies could further enable digitalisation of training. RESULTS Overall, the quality of available studies was found to be low with current available evidence consisting largely of expert opinion, consensus statements and small qualitative studies. The review identified that there are several novel technologies already being utilised in robotic surgery training. There is also a trend towards standardised validated robotic training curricula. Currently, the majority of the validated curricula do not incorporate novel technologies and training is delivered with more traditional methods that includes centralisation of training services with wet laboratories that have access to cadavers and dedicated training robots. CONCLUSIONS Improvements to training standards and understanding performance data have good potential to significantly lower complications in patients. Digitalisation automates data collection and brings data together for analysis. Machine learning has potential to develop automated performance feedback for trainees. Digitalised training aims to build on the current gold standards and to further improve the 'continuum of training' by integrating PBP training, 3D-printed models, telementoring, telemetry and machine learning.
Collapse
Affiliation(s)
- I-Hsuan Alan Chen
- Division of Surgery and Interventional Science, Research Department of Targeted Intervention, University College London, London, UK.
- Department of Surgery, Division of Urology, Kaohsiung Veterans General Hospital, No. 386, Dazhong 1st Rd., Zuoying District, Kaohsiung, 81362, Taiwan.
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS), University College London, London, UK.
| | - Ahmed Ghazi
- Department of Urology, Simulation Innovation Laboratory, University of Rochester, New York, USA
| | - Ashwin Sridhar
- Division of Uro-Oncology, University College London Hospital, London, UK
| | - Danail Stoyanov
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS), University College London, London, UK
| | | | - John D Kelly
- Division of Surgery and Interventional Science, Research Department of Targeted Intervention, University College London, London, UK
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS), University College London, London, UK
- Division of Uro-Oncology, University College London Hospital, London, UK
| | - Justin W Collins
- Division of Surgery and Interventional Science, Research Department of Targeted Intervention, University College London, London, UK.
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS), University College London, London, UK.
- Division of Uro-Oncology, University College London Hospital, London, UK.
| |
Collapse
|
45
|
Aspart F, Bolmgren JL, Lavanchy JL, Beldi G, Woods MS, Padoy N, Hosgor E. ClipAssistNet: bringing real-time safety feedback to operating rooms. Int J Comput Assist Radiol Surg 2021; 17:5-13. [PMID: 34297269 PMCID: PMC8739308 DOI: 10.1007/s11548-021-02441-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/22/2021] [Accepted: 06/17/2021] [Indexed: 12/18/2022]
Abstract
Purpose Cholecystectomy is one of the most common laparoscopic procedures. A critical phase of laparoscopic cholecystectomy consists of clipping the cystic duct and artery before cutting them. Surgeons can improve clipping safety by ensuring full visibility of the clipper while enclosing the artery or the duct with the clip applier jaws. This can prevent unintentional interaction with neighboring tissues or clip misplacement. In this article, we present a novel real-time feedback to ensure safe visibility of the instrument during this critical phase. This feedback encourages surgeons to keep the tip of their clip applier visible while operating. Methods We present a new dataset of 300 laparoscopic cholecystectomy videos with frame-wise annotation of clipper tip visibility. We further present ClipAssistNet, a neural network-based image classifier which detects the clipper tip visibility in single frames. ClipAssistNet ensembles predictions from 5 neural networks trained on different subsets of the dataset. Results Our model learns to classify the clipper tip visibility by detecting its presence in the image. Measured on a separate test set, ClipAssistNet classifies the clipper tip visibility with an AUROC of 0.9107, and 66.15% specificity at 95% sensitivity. Additionally, it can perform real-time inference (16 FPS) on an embedded computing board; this enables its deployment in operating room settings. Conclusion This work presents a new application of computer-assisted surgery for laparoscopic cholecystectomy, namely real-time feedback on adequate visibility of the clip applier. We believe this feedback can increase surgeons’ attentiveness when departing from safe visibility during the critical clipping of the cystic duct and artery. Supplementary Information The online version contains supplementary material available at 10.1007/s11548-021-02441-x.
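The ensembling step can be sketched as follows (the member architecture and the decision threshold are placeholders): five independently trained binary classifiers each score a frame, and the mean probability decides clipper-tip visibility.

```python
# Minimal sketch: ensemble the clipper-tip visibility probability over five classifiers.
import torch
import torch.nn as nn

def make_classifier():
    # Stand-in for each member network; in practice this would be a CNN backbone.
    return nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 1))

models = [make_classifier() for _ in range(5)]   # trained on different data subsets

def ensemble_visibility(frame, threshold=0.5):
    """frame: (1, 3, 64, 64). Returns mean probability and the binary decision."""
    with torch.no_grad():
        probs = torch.stack([torch.sigmoid(m(frame)) for m in models])
    p = probs.mean().item()
    return p, p >= threshold

print(ensemble_visibility(torch.randn(1, 3, 64, 64)))
```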
Collapse
Affiliation(s)
- Florian Aspart
- Caresyntax GmbH, Komturstraße 18A, 12099, Berlin, Germany.
| | - Jon L Bolmgren
- Caresyntax GmbH, Komturstraße 18A, 12099, Berlin, Germany
| | - Joël L Lavanchy
- Department of Visceral Surgery and Medicine, Inselspital, Bern University Hospital, University of Bern, 3010, Bern, Switzerland
| | - Guido Beldi
- Department of Visceral Surgery and Medicine, Inselspital, Bern University Hospital, University of Bern, 3010, Bern, Switzerland
| | | | - Nicolas Padoy
- ICube, University of Strasbourg, CNRS, IHU, Strasbourg, France
| | - Enes Hosgor
- Caresyntax GmbH, Komturstraße 18A, 12099, Berlin, Germany
| |
Collapse
|
46
|
Reiter W. Co-occurrence balanced time series classification for the semi-supervised recognition of surgical smoke. Int J Comput Assist Radiol Surg 2021; 16:2021-2027. [PMID: 34032964 DOI: 10.1007/s11548-021-02411-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/21/2020] [Accepted: 05/14/2021] [Indexed: 12/14/2022]
Abstract
PURPOSE Automatic recognition and removal of smoke in surgical procedures can reduce risks to the patient by supporting the surgeon. Surgical smoke changes its visibility over time, impacting vision depending on its amount and the volume of the body cavity. While modern deep learning algorithms for computer vision require large amounts of data, annotations for training are scarce. This paper investigates the use of unlabeled training data with a modern time-based deep learning algorithm. METHODS We propose to improve the state of the art in smoke recognition by enhancing an image classifier based on convolutional neural networks with a recurrent architecture, thereby providing temporal context to the algorithm. We enrich the training with unlabeled recordings from similar procedures. The influence of surgical tools on the smoke recognition task is studied to reduce a possible bias. RESULTS The evaluations show that smoke recognition benefits from the additional temporal information during training. The use of unlabeled data from the same domain in a semi-supervised training procedure shows additional improvements, reaching an accuracy of 86.8%. The proposed balancing policy is shown to have a positive impact on learning the discrimination of co-occurring surgical tools. CONCLUSIONS This study presents, to the best of our knowledge, the first use of a time series algorithm for the recognition of surgical smoke and the first use of this algorithm in the described semi-supervised setting. We show that the performance improvements with unlabeled data can be enhanced by integrating temporal context. We also show that adaptation of the data distribution is beneficial to avoid learning biases.
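A generic pseudo-labelling sketch with a recurrent clip classifier follows; this is one common way to use unlabeled recordings in a semi-supervised setting and not necessarily the paper's exact scheme (all names and thresholds are assumptions).

```python
# Generic semi-supervised sketch: a recurrent smoke classifier predicts on unlabeled
# clips, and confident predictions are added to the training pool as pseudo-labels.
import torch
import torch.nn as nn

class SmokeClassifier(nn.Module):
    def __init__(self, feat_dim=256, hidden=128):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden, batch_first=True)   # temporal context
        self.head = nn.Linear(hidden, 1)

    def forward(self, clip_feats):               # (B, T, feat_dim) per-frame features
        out, _ = self.gru(clip_feats)
        return self.head(out[:, -1])             # smoke logit for the clip

def pseudo_label(model, unlabeled_clips, conf=0.9):
    """Keep only clips the model is confident about, with their predicted labels."""
    model.eval()
    kept = []
    with torch.no_grad():
        for clip in unlabeled_clips:             # each clip: (T, feat_dim)
            p = torch.sigmoid(model(clip.unsqueeze(0))).item()
            if p >= conf or p <= 1 - conf:
                kept.append((clip, int(p >= 0.5)))
    return kept
```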
Collapse
|
47
|
Ramesh S, Dall'Alba D, Gonzalez C, Yu T, Mascagni P, Mutter D, Marescaux J, Fiorini P, Padoy N. Multi-task temporal convolutional networks for joint recognition of surgical phases and steps in gastric bypass procedures. Int J Comput Assist Radiol Surg 2021; 16:1111-1119. [PMID: 34013464 PMCID: PMC8260406 DOI: 10.1007/s11548-021-02388-z] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/22/2021] [Accepted: 04/27/2021] [Indexed: 12/31/2022]
Abstract
PURPOSE Automatic segmentation and classification of surgical activity is crucial for providing advanced support in computer-assisted interventions and autonomous functionalities in robot-assisted surgeries. Prior works have focused on recognizing either coarse activities, such as phases, or fine-grained activities, such as gestures. This work aims at jointly recognizing two complementary levels of granularity directly from videos, namely phases and steps. METHODS We introduce two correlated surgical activities, phases and steps, for the laparoscopic gastric bypass procedure. We propose a multi-task multi-stage temporal convolutional network (MTMS-TCN) along with a multi-task convolutional neural network (CNN) training setup to jointly predict the phases and steps and benefit from their complementarity to better evaluate the execution of the procedure. We evaluate the proposed method on a large video dataset consisting of 40 surgical procedures (Bypass40). RESULTS We present experimental results from several baseline models for both phase and step recognition on the Bypass40. The proposed MTMS-TCN method outperforms single-task methods in both phase and step recognition by 1-2% in accuracy, precision and recall. Furthermore, for step recognition, MTMS-TCN achieves a superior performance of 3-6% compared to LSTM-based models on all metrics. CONCLUSION In this work, we present a multi-task multi-stage temporal convolutional network for surgical activity recognition, which shows improved results compared to single-task models on a gastric bypass dataset with multi-level annotations. The proposed method shows that the joint modeling of phases and steps is beneficial to improve the overall recognition of each type of activity.
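A simplified single-stage sketch of joint phase and step prediction (not the full MTMS-TCN; class counts and dimensions are placeholders): a dilated temporal-convolution trunk with two output heads, trained with the sum of both cross-entropy losses.

```python
# Simplified single-stage sketch: shared temporal trunk, separate phase and step heads.
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointPhaseStepTCN(nn.Module):
    def __init__(self, feat_dim=512, hidden=64, n_phases=11, n_steps=44):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv1d(feat_dim, hidden, kernel_size=3, padding=1, dilation=1), nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=3, padding=2, dilation=2), nn.ReLU(),
        )
        self.phase_head = nn.Conv1d(hidden, n_phases, 1)
        self.step_head = nn.Conv1d(hidden, n_steps, 1)

    def forward(self, feats):                     # feats: (B, T, feat_dim)
        h = self.trunk(feats.transpose(1, 2))
        return self.phase_head(h), self.step_head(h)   # each (B, classes, T)

def joint_loss(phase_logits, step_logits, phase_gt, step_gt):
    # Joint objective: both granularities supervised on every frame.
    return (F.cross_entropy(phase_logits, phase_gt)
            + F.cross_entropy(step_logits, step_gt))

# Toy usage on a 120-frame sequence.
model = JointPhaseStepTCN()
phase_logits, step_logits = model(torch.randn(2, 120, 512))
loss = joint_loss(phase_logits, step_logits,
                  torch.randint(0, 11, (2, 120)), torch.randint(0, 44, (2, 120)))
```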
Collapse
Affiliation(s)
- Sanat Ramesh
- Altair Robotics Lab, Department of Computer Science, University of Verona, Verona, Italy. .,ICube, University of Strasbourg, CNRS, IHU Strasbourg, France.
| | - Diego Dall'Alba
- Altair Robotics Lab, Department of Computer Science, University of Verona, Verona, Italy
| | | | - Tong Yu
- ICube, University of Strasbourg, CNRS, IHU Strasbourg, France
| | - Pietro Mascagni
- ICube, University of Strasbourg, CNRS, IHU Strasbourg, France.,Fondazione Policlinico Universitario Agostino Gemelli IRCCS, Rome, Italy
| | - Didier Mutter
- University Hospital of Strasbourg, IHU Strasbourg, France.,IRCAD, Strasbourg, France
| | | | - Paolo Fiorini
- Altair Robotics Lab, Department of Computer Science, University of Verona, Verona, Italy
| | - Nicolas Padoy
- ICube, University of Strasbourg, CNRS, IHU Strasbourg, France
| |
Collapse
|