1. Fuchtmann J, Riedel T, Berlet M, Jell A, Wegener L, Wagner L, Graf S, Wilhelm D, Ostler-Mildner D. Audio-based event detection in the operating room. Int J Comput Assist Radiol Surg 2024. [PMID: 38862745] [DOI: 10.1007/s11548-024-03211-1]
Abstract
PURPOSE Even though workflow analysis in the operating room has come a long way, current systems are still limited to research. In the quest for a robust, universal setup, hardly any attention has been given to the dimension of audio, despite its numerous advantages such as low cost, independence of location and line of sight, and low processing-power requirements. METHODS We present an approach for audio-based event detection that relies solely on two microphones capturing the sound in the operating room. For this purpose, a new data set with over 63 h of audio was recorded and annotated at the University Hospital rechts der Isar. Sound files were labeled, preprocessed, augmented, and subsequently converted to log-mel spectrograms that served as visual input for event classification using pretrained convolutional neural networks. RESULTS Comparing multiple architectures, we were able to show that even lightweight models, such as MobileNet, can provide promising results. Data augmentation additionally improved the classification of the 11 defined classes, including, inter alia, different types of coagulation, operating table movements, and an idle class. With the newly created audio data set, an overall accuracy of 90%, a precision of 91%, and an F1-score of 91% were achieved, demonstrating the feasibility of audio-based event recognition in the operating room. CONCLUSION With this first proof of concept, we demonstrated that audio events can serve as a meaningful source of information that goes beyond spoken language and can easily be integrated into future workflow recognition pipelines using computationally inexpensive architectures.
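As a rough illustration of the pipeline this abstract describes (audio window to log-mel spectrogram to pretrained CNN classifier), here is a minimal PyTorch sketch; the sample rate, mel settings, and the use of MobileNetV2 with ImageNet weights are assumptions for illustration, not the authors' exact configuration:

```python
import torch
import torchaudio
from torchvision.models import mobilenet_v2

NUM_CLASSES = 11  # e.g., coagulation types, table movements, idle

to_log_mel = torch.nn.Sequential(
    torchaudio.transforms.MelSpectrogram(sample_rate=16000, n_fft=1024,
                                         hop_length=256, n_mels=128),
    torchaudio.transforms.AmplitudeToDB(),  # log-mel "image"
)

model = mobilenet_v2(weights="IMAGENET1K_V1")
model.classifier[1] = torch.nn.Linear(model.last_channel, NUM_CLASSES)
model.eval()

def classify_window(waveform: torch.Tensor) -> int:
    """waveform: (1, n_samples) mono audio at 16 kHz."""
    spec = to_log_mel(waveform)                # (1, 128, frames)
    image = spec.repeat(3, 1, 1).unsqueeze(0)  # replicate to 3 channels
    with torch.no_grad():
        return model(image).argmax(dim=1).item()

label = classify_window(torch.randn(1, 16000 * 4))  # a 4-second window
```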
Affiliation(s)
- Jonas Fuchtmann: Research Group MITI and Department of Surgery, Klinikum rechts der Isar, TUM School of Medicine and Health, Technical University of Munich, Munich, Germany
- Thomas Riedel: Research Group MITI, Klinikum rechts der Isar, TUM School of Medicine and Health, Technical University of Munich, Munich, Germany
- Maximilian Berlet: Research Group MITI and Department of Surgery, Klinikum rechts der Isar, TUM School of Medicine and Health, Technical University of Munich, Munich, Germany
- Alissa Jell: Research Group MITI and Department of Surgery, Klinikum rechts der Isar, TUM School of Medicine and Health, Technical University of Munich, Munich, Germany
- Luca Wegener: Research Group MITI, Klinikum rechts der Isar, TUM School of Medicine and Health, Technical University of Munich, Munich, Germany
- Lars Wagner: Research Group MITI, Klinikum rechts der Isar, TUM School of Medicine and Health, Technical University of Munich, Munich, Germany
- Simone Graf: University Hospital of Hearing, Speech and Voice Disorders, Medical University of Innsbruck, Innsbruck, Austria
- Dirk Wilhelm: Research Group MITI and Department of Surgery, Klinikum rechts der Isar, TUM School of Medicine and Health, Technical University of Munich, Munich, Germany
- Daniel Ostler-Mildner: Research Group MITI, Klinikum rechts der Isar, TUM School of Medicine and Health, Technical University of Munich, Munich, Germany
2. Gui S, Wang Z, Chen J, Zhou X, Zhang C, Cao Y. MT4MTL-KD: A Multi-Teacher Knowledge Distillation Framework for Triplet Recognition. IEEE Trans Med Imaging 2024; 43:1628-1639. [PMID: 38127608] [DOI: 10.1109/tmi.2023.3345736]
Abstract
The recognition of surgical triplets plays a critical role in the practical application of surgical videos. It involves the sub-tasks of recognizing instruments, verbs, and targets, while establishing precise associations between them. Existing methods face two significant challenges in triplet recognition: 1) the imbalanced class distribution of surgical triplets may lead to spurious task association learning, and 2) the feature extractors cannot reconcile local and global context modeling. To overcome these challenges, this paper presents a novel multi-teacher knowledge distillation framework for multi-task triplet learning, known as MT4MTL-KD. MT4MTL-KD leverages teacher models trained on less imbalanced sub-tasks to assist multi-task student learning for triplet recognition. Moreover, we adopt different categories of backbones for the teacher and student models, facilitating the integration of local and global context modeling. To further align the semantic knowledge between the triplet task and its sub-tasks, we propose a novel feature attention module (FAM). This module utilizes attention mechanisms to assign multi-task features to specific sub-tasks. We evaluate the performance of MT4MTL-KD on both the 5-fold cross-validation and the CholecTriplet challenge splits of the CholecT45 dataset. The experimental results consistently demonstrate the superiority of our framework over state-of-the-art methods, achieving significant improvements of up to 6.4% on the cross-validation split.
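A minimal sketch of the distillation objective described above, assuming single-label cross-entropy per sub-task and conventional temperature-scaled KL distillation; the paper's exact losses and weighting may differ:

```python
import torch
import torch.nn.functional as F

def kd_term(student_logits, teacher_logits, T=2.0):
    # Standard temperature-scaled distillation term.
    return F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)

def multi_teacher_loss(student, teachers, labels, alpha=0.5):
    """student/teachers: dicts of logits keyed by 'instrument'/'verb'/'target';
    each teacher was trained on its own, less imbalanced sub-task."""
    total = 0.0
    for task in ("instrument", "verb", "target"):
        total += alpha * kd_term(student[task], teachers[task])
        total += (1.0 - alpha) * F.cross_entropy(student[task], labels[task])
    return total
```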
3. Tao R, Zou X, Zheng G. LAST: LAtent Space-Constrained Transformers for Automatic Surgical Phase Recognition and Tool Presence Detection. IEEE Trans Med Imaging 2023; 42:3256-3268. [PMID: 37227905] [DOI: 10.1109/tmi.2023.3279838]
Abstract
When developing context-aware systems, automatic surgical phase recognition and tool presence detection are two essential tasks. Previous attempts exist for both tasks, but the majority of existing methods utilize a frame-level loss function (e.g., cross-entropy) that does not fully leverage the underlying semantic structure of a surgery, leading to sub-optimal results. In this paper, we propose multi-task learning-based LAtent Space-constrained Transformers, referred to as LAST, for automatic surgical phase recognition and tool presence detection. Our design features a two-branch transformer architecture with a novel and generic way to leverage video-level semantic information during network training. This is done by learning a non-linear compact representation of the underlying semantic structure of surgical videos through a transformer variational autoencoder (VAE) and by encouraging models to follow the learned statistical distributions. In other words, LAST is structure-aware and favors predictions that lie on the extracted low-dimensional data manifold. Validated on two public cholecystectomy datasets, the Cholec80 dataset and the M2cai16 dataset, our method achieves better results than other state-of-the-art methods. Specifically, on the Cholec80 dataset, our method achieves an average accuracy of 93.12±4.71%, an average precision of 89.25±5.49%, an average recall of 90.10±5.45%, and an average Jaccard of 81.11±7.62% for phase recognition, and an average mAP of 95.15±3.87% for tool presence detection. Similar superior performance is also observed when LAST is applied to the M2cai16 dataset.
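The latent-constraint idea can be illustrated generically: fit a small VAE to video-level semantics and penalize predictions the VAE cannot re-encode and reconstruct well. This sketch is an assumption-laden simplification, not the LAST architecture; dimensions are illustrative:

```python
import torch
import torch.nn as nn

class SeqVAE(nn.Module):
    def __init__(self, in_dim=7, latent=16, hidden=64):
        super().__init__()
        self.enc = nn.Linear(in_dim, hidden)
        self.mu, self.logvar = nn.Linear(hidden, latent), nn.Linear(hidden, latent)
        self.dec = nn.Sequential(nn.Linear(latent, hidden), nn.ReLU(),
                                 nn.Linear(hidden, in_dim))

    def forward(self, x):                      # x: (B, in_dim) pooled semantics
        h = torch.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        return self.dec(z), mu, logvar

def manifold_penalty(vae, pred):
    """pred: (B, in_dim), e.g., phase probabilities pooled over a clip.
    Penalizes predictions off the learned low-dimensional manifold."""
    recon, mu, logvar = vae(pred)
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=1).mean()
    return nn.functional.mse_loss(recon, pred) + 1e-3 * kl
```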
4. Fang L, Mou L, Gu Y, Hu Y, Chen B, Chen X, Wang Y, Liu J, Zhao Y. Global-local multi-stage temporal convolutional network for cataract surgery phase recognition. Biomed Eng Online 2022; 21:82. [PMID: 36451164] [PMCID: PMC9710114] [DOI: 10.1186/s12938-022-01048-w]
Abstract
BACKGROUND Surgical video phase recognition is an essential technique in computer-assisted surgical systems for monitoring surgical procedures; it can assist surgeons in standardizing procedures and enhancing postsurgical assessment and indexing. However, the high similarity between phases and the temporal variations of cataract videos still pose the greatest challenge for video phase recognition. METHODS In this paper, we introduce a global-local multi-stage temporal convolutional network (GL-MSTCN) to explore the subtle differences between highly similar surgical phases and mitigate the temporal variations of surgical videos. The presented work consists of a triple-stream network (i.e., pupil stream, instrument stream, and video frame stream) and a multi-stage temporal convolutional network. The triple-stream network first detects the pupil and surgical instrument regions in the frame separately and then obtains the fine-grained semantic features of the video frames. The proposed multi-stage temporal convolutional network improves surgical phase recognition performance by capturing longer time-series features through dilated convolutional layers with varying receptive fields. RESULTS Our method is thoroughly validated on the CSVideo dataset with 32 cataract surgery videos and the public Cataract101 dataset with 101 cataract surgery videos, outperforming state-of-the-art approaches with 95.8% and 96.5% accuracy, respectively. CONCLUSIONS The experimental results show that the use of global and local feature information can effectively enhance the model's ability to explore fine-grained features and mitigate temporal and spatial variations, thus improving the surgical phase recognition performance of the proposed GL-MSTCN.
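A compact sketch of a multi-stage temporal convolutional refinement stack with dilated convolutions of growing receptive field, in the spirit of the temporal module described here; layer counts and widths are illustrative assumptions:

```python
import torch
import torch.nn as nn

class DilatedStage(nn.Module):
    def __init__(self, dim, num_classes, layers=8):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Conv1d(dim, dim, 3, padding=2 ** i, dilation=2 ** i)
            for i in range(layers))                 # receptive field doubles per layer
        self.out = nn.Conv1d(dim, num_classes, 1)

    def forward(self, x):                           # x: (B, dim, T) frame features
        for conv in self.blocks:
            x = x + torch.relu(conv(x))             # residual dilated block
        return self.out(x)                          # (B, num_classes, T)

class MultiStageTCN(nn.Module):
    def __init__(self, dim, num_classes, stages=3):
        super().__init__()
        self.first = DilatedStage(dim, num_classes)
        self.rest = nn.ModuleList(DilatedStage(num_classes, num_classes)
                                  for _ in range(stages - 1))

    def forward(self, x):
        logits = self.first(x)
        for stage in self.rest:                     # each stage refines the last
            logits = stage(torch.softmax(logits, dim=1))
        return logits
```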
Affiliation(s)
- Lixin Fang: College of Mechanical Engineering, Zhejiang University of Technology, Hangzhou 310014, China; Cixi Institute of Biomedical Engineering, Ningbo Institute of Materials Technology and Engineering, Chinese Academy of Sciences, Ningbo, China
- Lei Mou: Cixi Institute of Biomedical Engineering, Ningbo Institute of Materials Technology and Engineering, Chinese Academy of Sciences, Ningbo, China
- Yuanyuan Gu: Cixi Institute of Biomedical Engineering, Ningbo Institute of Materials Technology and Engineering, Chinese Academy of Sciences, Ningbo, China; Zhejiang Engineering Research Center for Biomedical Materials, Cixi Institute of Biomedical Engineering, Ningbo Institute of Materials Technology and Engineering, Chinese Academy of Sciences, Ningbo 315300, China
- Yan Hu: Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen 518055, China
- Bang Chen: Cixi Institute of Biomedical Engineering, Ningbo Institute of Materials Technology and Engineering, Chinese Academy of Sciences, Ningbo, China
- Xu Chen: Department of Ophthalmology, Shanghai Aier Eye Hospital, Shanghai, China; Department of Ophthalmology, Shanghai Aier Qingliang Eye Hospital, Shanghai, China; Aier Eye Hospital, Jinan University, No. 601, Huangpu Road West, Guangzhou, China; Aier School of Ophthalmology, Central South University, Changsha, Hunan, China
- Yang Wang: Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China
- Jiang Liu: Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen 518055, China
- Yitian Zhao: Cixi Institute of Biomedical Engineering, Ningbo Institute of Materials Technology and Engineering, Chinese Academy of Sciences, Ningbo, China; Zhejiang Engineering Research Center for Biomedical Materials, Cixi Institute of Biomedical Engineering, Ningbo Institute of Materials Technology and Engineering, Chinese Academy of Sciences, Ningbo 315300, China
5. Junger D, Frommer SM, Burgert O. State-of-the-art of situation recognition systems for intraoperative procedures. Med Biol Eng Comput 2022; 60:921-939. [PMID: 35178622] [PMCID: PMC8933302] [DOI: 10.1007/s11517-022-02520-4]
Abstract
One of the key challenges for automatic assistance is the support of actors in the operating room depending on the status of the procedure. Therefore, context information collected in the operating room is used to gain knowledge about the current situation. In literature, solutions already exist for specific use cases, but it is doubtful to what extent these approaches can be transferred to other conditions. We conducted a comprehensive literature research on existing situation recognition systems for the intraoperative area, covering 274 articles and 95 cross-references published between 2010 and 2019. We contrasted and compared 58 identified approaches based on defined aspects such as used sensor data or application area. In addition, we discussed applicability and transferability. Most of the papers focus on video data for recognizing situations within laparoscopic and cataract surgeries. Not all of the approaches can be used online for real-time recognition. Using different methods, good results with recognition accuracies above 90% could be achieved. Overall, transferability is less addressed. The applicability of approaches to other circumstances seems to be possible to a limited extent. Future research should place a stronger focus on adaptability. The literature review shows differences within existing approaches for situation recognition and outlines research trends. Applicability and transferability to other conditions are less addressed in current work.
Affiliation(s)
- D Junger: School of Informatics, Research Group Computer Assisted Medicine (CaMed), Reutlingen University, Alteburgstr. 150, 72762 Reutlingen, Germany
- S M Frommer: School of Informatics, Research Group Computer Assisted Medicine (CaMed), Reutlingen University, Alteburgstr. 150, 72762 Reutlingen, Germany
- O Burgert: School of Informatics, Research Group Computer Assisted Medicine (CaMed), Reutlingen University, Alteburgstr. 150, 72762 Reutlingen, Germany
6. Jin Y, Long Y, Chen C, Zhao Z, Dou Q, Heng PA. Temporal Memory Relation Network for Workflow Recognition From Surgical Video. IEEE Trans Med Imaging 2021; 40:1911-1923. [PMID: 33780335] [DOI: 10.1109/tmi.2021.3069471]
Abstract
Automatic surgical workflow recognition is a key component for developing context-aware computer-assisted systems in the operating theatre. Previous works either jointly modeled the spatial features with short fixed-range temporal information, or separately learned visual and long temporal cues. In this paper, we propose a novel end-to-end temporal memory relation network (TMRNet) for relating long-range and multi-scale temporal patterns to augment the present features. We establish a long-range memory bank to serve as a memory cell storing the rich supportive information. Through our designed temporal variation layer, the supportive cues are further enhanced by multi-scale temporal-only convolutions. To effectively incorporate the two types of cues without disturbing the joint learning of spatio-temporal features, we introduce a non-local bank operator to attentively relate the past to the present. In this regard, our TMRNet enables the current feature to view the long-range temporal dependency, as well as tolerate complex temporal extents. We have extensively validated our approach on two benchmark surgical video datasets, the M2CAI challenge dataset and the Cholec80 dataset. Experimental results demonstrate the outstanding performance of our method, consistently exceeding the state-of-the-art methods by a large margin (e.g., 67.0% vs. 78.9% Jaccard on the Cholec80 dataset).
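The non-local bank operator can be sketched as scaled dot-product attention from the present feature to a stored bank of past features; the sizes and the residual combination here are assumptions:

```python
import torch
import torch.nn as nn

class NonLocalBankOperator(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.q, self.k, self.v = (nn.Linear(dim, dim) for _ in range(3))
        self.dim = dim

    def forward(self, current, bank):
        # current: (B, dim) present feature; bank: (B, L, dim) past features
        q = self.q(current).unsqueeze(1)                      # (B, 1, dim)
        attn = torch.softmax(q @ self.k(bank).transpose(1, 2)
                             / self.dim ** 0.5, dim=-1)       # (B, 1, L)
        supportive = (attn @ self.v(bank)).squeeze(1)         # (B, dim)
        return current + supportive                           # augment, not replace
```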
7. Xia T, Jia F. Against spatial-temporal discrepancy: contrastive learning-based network for surgical workflow recognition. Int J Comput Assist Radiol Surg 2021; 16:839-848. [PMID: 33950398] [DOI: 10.1007/s11548-021-02382-5]
Abstract
PURPOSE Automatic workflow recognition from surgical videos is fundamental and significant for developing context-aware systems in modern operating rooms. Although many approaches have been proposed to tackle challenges in this complex task, there are still many problems such as the fine-grained characteristics and spatial-temporal discrepancies in surgical videos. METHODS We propose a contrastive learning-based convolutional recurrent network with multi-level prediction to tackle these problems. Specifically, split-attention blocks are employed to extract spatial features. Through a mapping function in the step-phase branch, the current workflow can be predicted on two mutual-boosting levels. Furthermore, a contrastive branch is introduced to learn the spatial-temporal features that eliminate irrelevant changes in the environment. RESULTS We evaluate our method on the Cataract-101 dataset. The results show that our method achieves an accuracy of 96.37% with only surgical step labels, which outperforms other state-of-the-art approaches. CONCLUSION The proposed convolutional recurrent network based on step-phase prediction and contrastive learning can leverage fine-grained characteristics and alleviate spatial-temporal discrepancies to improve the performance of surgical workflow recognition.
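The contrastive branch can be illustrated with a standard NT-Xent-style loss between two views of the same workflow state (e.g., temporally neighboring frames whose environment differs); the pairing strategy and temperature are assumptions, not the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, tau=0.1):
    """z1, z2: (B, d) embeddings of paired views of the same workflow state."""
    z = F.normalize(torch.cat([z1, z2]), dim=1)       # (2B, d)
    sim = z @ z.t() / tau                             # cosine similarities
    sim.fill_diagonal_(float("-inf"))                 # exclude self-pairs
    B = z1.size(0)
    # Positive for row i is its counterpart in the other view.
    targets = torch.cat([torch.arange(B, 2 * B), torch.arange(0, B)])
    return F.cross_entropy(sim, targets)
```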
Affiliation(s)
- Tong Xia: Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; University of Chinese Academy of Sciences, Beijing, China
- Fucang Jia: Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; University of Chinese Academy of Sciences, Beijing, China
8. Beyersdorffer P, Kunert W, Jansen K, Miller J, Wilhelm P, Burgert O, Kirschniak A, Rolinger J. Detection of adverse events leading to inadvertent injury during laparoscopic cholecystectomy using convolutional neural networks. Biomed Tech (Berl) 2021; 66:413-421. [PMID: 33655738] [DOI: 10.1515/bmt-2020-0106]
Abstract
Uncontrolled movements of laparoscopic instruments can lead to inadvertent injury of adjacent structures. The risk becomes evident when the dissecting instrument is located outside the field of view of the laparoscopic camera. Technical solutions to ensure patient safety are therefore desirable. The present work evaluated the feasibility of an automated binary classification of laparoscopic image data using convolutional neural networks (CNN) to determine whether the dissecting instrument is located within the laparoscopic image section. A unique set of images was generated from six laparoscopic cholecystectomies in a surgical training environment to configure and train the CNN. By using a temporary version of the neural network, the annotation of the training image files could be automated and accelerated. A combination of oversampling and selective data augmentation was used to enlarge the fully labeled image data set and prevent loss of accuracy due to imbalanced class volumes. Subsequently, the same approach was applied to the comprehensive, fully annotated Cholec80 database. The described process led to the generation of extensive and balanced training image data sets. The performance of the CNN-based binary classifiers was evaluated on separate test records from both databases. On our recorded data, an accuracy of 0.88 with regard to the safety-relevant classification was achieved. The subsequent evaluation on the Cholec80 data set yielded an accuracy of 0.84. The presented results demonstrate the feasibility of a binary classification of laparoscopic image data for the detection of adverse events in a surgical training environment using a specifically configured CNN architecture.
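The oversampling step for imbalanced classes can be sketched with a weighted sampler; the data below are random placeholders and the label convention (1 = instrument outside the field of view) is assumed:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Placeholder data: 200 frames, 10 of which show the safety-relevant class.
images = torch.randn(200, 3, 64, 64)
labels = torch.cat([torch.zeros(190, dtype=torch.long),
                    torch.ones(10, dtype=torch.long)])

# Draw rare-class samples more often so each batch is roughly balanced.
weights = 1.0 / torch.bincount(labels)[labels].float()
sampler = WeightedRandomSampler(weights, num_samples=len(labels), replacement=True)
loader = DataLoader(TensorDataset(images, labels), batch_size=32, sampler=sampler)
```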
Affiliation(s)
- Wolfgang Kunert: Department of Surgery and Transplantation, Tübingen University Hospital, Tübingen, Germany
- Kai Jansen: Department of Surgery and Transplantation, Tübingen University Hospital, Tübingen, Germany
- Johanna Miller: Department of Surgery and Transplantation, Tübingen University Hospital, Tübingen, Germany
- Peter Wilhelm: Department of Surgery and Transplantation, Tübingen University Hospital, Tübingen, Germany
- Oliver Burgert: Department of Medical Informatics, Reutlingen University, Reutlingen, Germany
- Andreas Kirschniak: Department of Surgery and Transplantation, Tübingen University Hospital, Tübingen, Germany
- Jens Rolinger: Department of Surgery and Transplantation, Tübingen University Hospital, Tübingen, Germany
9. Alnafisee N, Zafar S, Vedula SS, Sikder S. Current methods for assessing technical skill in cataract surgery. J Cataract Refract Surg 2021; 47:256-264. [PMID: 32675650] [DOI: 10.1097/j.jcrs.0000000000000322]
Abstract
Surgery is a major source of errors in patient care. Preventing complications from surgical errors in the operating room is estimated to lead to a reduction of up to 41,846 readmissions and save $620.3 million per year. It is now established that poor technical skill is associated with an increased risk of severe adverse events postoperatively, and traditional models to train surgeons are being challenged by rapid advances in technology, an intensified patient-safety culture, and a need for value-driven health systems. This review discusses the current methods available for evaluating technical skills in cataract surgery and the recent technological advancements that have enabled capture and analysis of large amounts of complex surgical data for more automated objective skills assessment.
Affiliation(s)
- Nouf Alnafisee: The Wilmer Eye Institute, Johns Hopkins University School of Medicine (Alnafisee, Zafar, Sikder), Baltimore, and the Department of Computer Science, Malone Center for Engineering in Healthcare, The Johns Hopkins University Whiting School of Engineering (Vedula), Baltimore, Maryland, USA
10. Deep learning for surgical phase recognition using endoscopic videos. Surg Endosc 2020; 35:6150-6157. [PMID: 33237461] [DOI: 10.1007/s00464-020-08110-5]
Abstract
BACKGROUND Operating room planning is a complex task, as pre-operative estimations of procedure duration have limited accuracy. This is due to large variations in the course of procedures. Therefore, information about the progress of procedures is essential to adapt the daily operating room schedule accordingly. This information should ideally be objective, automatically retrievable, and available in real-time. Recordings made during endoscopic surgeries are a potential source of progress information. A trained observer is able to recognize the ongoing surgical phase from watching these videos. The introduction of deep learning techniques brought up opportunities to automatically retrieve information from surgical videos. The aim of this study was to apply state-of-the-art deep learning techniques on a new set of endoscopic videos to automatically recognize the progress of a procedure, and to assess the feasibility of the approach in terms of performance, scalability, and practical considerations. METHODS A dataset of 33 laparoscopic cholecystectomies (LC) and 35 total laparoscopic hysterectomies (TLH) was used. The surgical tools that were used and the ongoing surgical phases were annotated in the recordings. Neural networks were trained on a subset of annotated videos. The automatic recognition of surgical tools and phases was then assessed on another subset. The scalability of the networks was tested and practical considerations were noted. RESULTS The performance of the surgical tool and phase recognition reached an average precision and recall between 0.77 and 0.89. The scalability tests showed diverging results. Legal considerations had to be taken into account, and a considerable amount of time was needed to annotate the datasets. CONCLUSION This study shows the potential of deep learning to automatically recognize information contained in surgical videos. This study also provides insights into the applicability of such a technique to support operating room planning.
11. Surgical phase recognition by learning phase transitions. Current Directions in Biomedical Engineering 2020. [DOI: 10.1515/cdbme-2020-0037]
Abstract
Automatic recognition of surgical phases is an important component for developing an intra-operative context-aware system. Prior work in this area focuses on recognizing short-term tool usage patterns within surgical phases. However, the difference between intra- and inter-phase tool usage patterns has not been investigated for automatic phase recognition. We developed a Recurrent Neural Network (RNN), in particular a state-preserving Long Short-Term Memory (LSTM) architecture, to utilize the long-term evolution of tool usage within complete surgical procedures. For fully automatic tool presence detection from surgical video frames, a Convolutional Neural Network (CNN)-based architecture, ZIBNet, is employed. Our proposed approach outperformed EndoNet by 8.1% on overall precision for phase detection tasks and by 12.5% on mean AP for tool recognition tasks.
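A minimal sketch of a state-preserving LSTM over per-frame tool-presence vectors, carrying the hidden state across chunks of one procedure so the long-term tool-usage evolution is visible; the tool and phase counts are Cholec80-style assumptions:

```python
import torch
import torch.nn as nn

class PhaseLSTM(nn.Module):
    def __init__(self, num_tools=7, hidden=128, num_phases=8):
        super().__init__()
        self.lstm = nn.LSTM(num_tools, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_phases)

    def forward(self, tools, state=None):
        # tools: (B, T, num_tools) binary tool presence; state: carried (h, c)
        out, state = self.lstm(tools, state)
        return self.head(out), state          # per-frame phase logits + state

model = PhaseLSTM()
state = None
for chunk in torch.rand(10, 1, 100, 7):       # 10 chunks of one video
    logits, state = model(chunk, state)
    state = tuple(s.detach() for s in state)  # truncated BPTT, state preserved
```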
12. Real-time automatic surgical phase recognition in laparoscopic sigmoidectomy using the convolutional neural network-based deep learning approach. Surg Endosc 2019; 34:4924-4931. [PMID: 31797047] [DOI: 10.1007/s00464-019-07281-0]
Abstract
BACKGROUND Automatic surgical workflow recognition is a key component for developing context-aware computer-assisted surgery (CA-CAS) systems. However, automatic surgical phase recognition focused on colorectal surgery has not been reported. We aimed to develop a deep learning model for automatic surgical phase recognition based on laparoscopic sigmoidectomy (Lap-S) videos, which could be used for real-time phase recognition, and to clarify the accuracies of automatic surgical phase and action recognition using visual information. METHODS The dataset used contained 71 cases of Lap-S. The video data were divided into frame units every 1/30 s as static images. Every Lap-S video was manually divided into 11 surgical phases (Phases 0-10) and manually annotated for each surgical action on every frame. The model was generated based on the training data. Validation of the model was performed on a set of unseen test data. Convolutional neural network (CNN)-based deep learning was used. RESULTS The average surgical time was 175 min (± 43 min SD), with the individual surgical phases also showing high variation in duration between cases. Each surgery started in the first phase (Phase 0) and ended in the last phase (Phase 10), and phase transitions occurred 14 (± 2 SD) times per procedure on average. The accuracy of the automatic surgical phase recognition was 91.9%, and the accuracies of automatic surgical action recognition for extracorporeal action and irrigation were 89.4% and 82.5%, respectively. Moreover, this system could perform real-time automatic surgical phase recognition at 32 fps. CONCLUSIONS The CNN-based deep learning approach enabled the recognition of surgical phases and actions in 71 Lap-S cases based on manually annotated data. This system could perform automatic surgical phase recognition and automatic target surgical action recognition with high accuracy. Moreover, this study showed the feasibility of real-time automatic surgical phase recognition with a high frame rate.
13. Bodenstedt S, Wagner M, Mündermann L, Kenngott H, Müller-Stich B, Breucha M, Mees ST, Weitz J, Speidel S. Prediction of laparoscopic procedure duration using unlabeled, multimodal sensor data. Int J Comput Assist Radiol Surg 2019; 14:1089-1095. [PMID: 30968352] [DOI: 10.1007/s11548-019-01966-6]
Abstract
PURPOSE The course of surgical procedures is often unpredictable, making it difficult to estimate the duration of procedures beforehand. This uncertainty makes scheduling surgical procedures a difficult task. A context-aware method that analyses the workflow of an intervention online and automatically predicts the remaining duration would alleviate these problems. As a basis for such an estimate, information regarding the current state of the intervention is required. METHODS Today, the operating room contains a diverse range of sensors. During laparoscopic interventions, the endoscopic video stream is an ideal source of such information. Extracting quantitative information from the video is challenging, however, due to its high dimensionality. Other surgical devices (e.g., insufflator, lights, etc.) provide data streams which are, in contrast to the video stream, more compact and easier to quantify, though it is uncertain whether such streams offer sufficient information for estimating the duration of surgery. In this paper, we propose and compare methods, based on convolutional neural networks, for continuously predicting the duration of laparoscopic interventions based on unlabeled data, such as endoscopic image and surgical device streams. RESULTS The methods are evaluated on 80 recorded laparoscopic interventions of various types, for which surgical device data and the endoscopic video streams are available. Here, the combined method performs best, with an overall average error of 37% and an average halftime error of approximately 28%. CONCLUSION In this paper, we present, to our knowledge, the first approach for online procedure duration prediction using unlabeled endoscopic video data and surgical device data in a laparoscopic setting. Furthermore, we show that a method incorporating both vision and device data performs better than methods based only on vision, while methods based only on tool usage and surgical device data perform poorly, showing the importance of the visual channel.
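The regression idea can be sketched as a small 1D CNN mapping a window of surgical device signals to estimated progress, from which the remaining time follows; the channel count and the progress-to-remaining conversion are assumptions, not the authors' architecture:

```python
import torch
import torch.nn as nn

class ProgressNet(nn.Module):
    def __init__(self, channels=8):              # e.g., insufflator, lights, ...
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(channels, 32, 5, padding=2), nn.ReLU(),
            nn.Conv1d(32, 32, 5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(32, 1), nn.Sigmoid())       # progress in (0, 1)

    def forward(self, x):                         # x: (B, channels, T)
        return self.net(x).squeeze(1)

def remaining_minutes(progress: float, elapsed_min: float) -> float:
    # If p of the procedure took elapsed_min, the rest scales proportionally.
    return elapsed_min * (1.0 - progress) / max(progress, 1e-3)

p = ProgressNet()(torch.randn(1, 8, 600)).item()  # 10 min of 1 Hz device data
print(remaining_minutes(p, elapsed_min=10.0))
```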
Affiliation(s)
- Sebastian Bodenstedt: Department for Translational Surgical Oncology, National Center for Tumor Diseases (NCT), Partner Site Dresden, Dresden, Germany
- Martin Wagner: Department of General, Visceral and Transplant Surgery, University of Heidelberg, Heidelberg, Germany
- Hannes Kenngott: Department of General, Visceral and Transplant Surgery, University of Heidelberg, Heidelberg, Germany
- Beat Müller-Stich: Department of General, Visceral and Transplant Surgery, University of Heidelberg, Heidelberg, Germany
- Michael Breucha: Department of Visceral, Thoracic and Vascular Surgery, Faculty of Medicine and University Hospital Carl Gustav Carus, TU Dresden, Dresden, Germany
- Sören Torge Mees: Department of Visceral, Thoracic and Vascular Surgery, Faculty of Medicine and University Hospital Carl Gustav Carus, TU Dresden, Dresden, Germany
- Jürgen Weitz: Department of Visceral, Thoracic and Vascular Surgery, Faculty of Medicine and University Hospital Carl Gustav Carus, TU Dresden, Dresden, Germany
- Stefanie Speidel: Department for Translational Surgical Oncology, National Center for Tumor Diseases (NCT), Partner Site Dresden, Dresden, Germany
14. Gholinejad M, Loeve AJ, Dankelman J. Surgical process modelling strategies: which method to choose for determining workflow? Minim Invasive Ther Allied Technol 2019; 28:91-104. [PMID: 30915885] [DOI: 10.1080/13645706.2019.1591457]
Abstract
The vital role of surgeries in healthcare requires constant attention to improvement. Surgical process modelling is an innovative and rather recently introduced approach for tackling the issues in today's complex surgeries. This modelling field is very challenging and still under development; therefore, it is not always clear which modelling strategy would best fit the needs of which situations. The aim of this study was to provide a guide for matching the choice of modelling strategies to the needs of determining surgical workflows. In this work, the concepts associated with surgical process modelling are described, aiming to clarify them and to promote their use in future studies. The relationships between these concepts and the possible combinations of suitable approaches for modelling strategies are elaborated, and the criteria for opting for the proper modelling strategy are discussed.
Affiliation(s)
- Maryam Gholinejad: Department of Biomechanical Engineering, Faculty of Mechanical, Maritime and Materials Engineering, Delft University of Technology, Delft, the Netherlands
- Arjo J Loeve: Department of Biomechanical Engineering, Faculty of Mechanical, Maritime and Materials Engineering, Delft University of Technology, Delft, the Netherlands
- Jenny Dankelman: Department of Biomechanical Engineering, Faculty of Mechanical, Maritime and Materials Engineering, Delft University of Technology, Delft, the Netherlands
15. Hard Frame Detection and Online Mapping for Surgical Phase Recognition. Lecture Notes in Computer Science 2019. [DOI: 10.1007/978-3-030-32254-0_50]
16. Nakawala H, Bianchi R, Pescatori LE, De Cobelli O, Ferrigno G, De Momi E. “Deep-Onto” network for surgical workflow and context recognition. Int J Comput Assist Radiol Surg 2018; 14:685-696. [DOI: 10.1007/s11548-018-1882-8]
17. A Kalman-Filter-Based Common Algorithm Approach for Object Detection in Surgery Scene to Assist Surgeon's Situation Awareness in Robot-Assisted Laparoscopic Surgery. J Healthc Eng 2018; 2018:8079713. [PMID: 29854366] [PMCID: PMC5954863] [DOI: 10.1155/2018/8079713]
Abstract
Although the use of the surgical robot is rapidly expanding for various medical treatments, there still exist safety issues and concerns about robot-assisted surgeries due to the limited vision through a laparoscope, which may cause compromised situation awareness and surgical errors requiring rapid emergency conversion to open surgery. To assist the surgeon's situation awareness and preventive emergency response, this study proposes situation information guidance through a vision-based common algorithm architecture for automatic detection and tracking of intraoperative hemorrhage and surgical instruments. The proposed common architecture comprises localization of the object of interest using texture features and morphological information, and tracking of the object based on a Kalman filter for robustness with reduced error. The average recall and precision of the instrument detection in four prostate surgery videos were 96% and 86%, and the accuracy of the hemorrhage detection in two prostate surgery videos was 98%. The results demonstrate the robustness of the automatic intraoperative object detection and tracking, which can be used to enhance the surgeon's preventive state recognition during robot-assisted surgery.
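A constant-velocity Kalman filter for smoothing per-frame detections (e.g., an instrument tip position) can be sketched in a few lines of NumPy; the noise magnitudes are assumptions:

```python
import numpy as np

class Kalman2D:
    """Constant-velocity Kalman filter over 2D image positions."""
    def __init__(self, q=1e-2, r=1.0):
        self.x = np.zeros(4)                    # state: [px, py, vx, vy]
        self.P = np.eye(4)
        self.F = np.eye(4); self.F[0, 2] = self.F[1, 3] = 1.0  # dt = 1 frame
        self.H = np.eye(2, 4)                   # we only measure position
        self.Q, self.R = q * np.eye(4), r * np.eye(2)

    def step(self, z):
        """z: measured (x, y) detection for this frame; returns smoothed (x, y)."""
        self.x = self.F @ self.x                          # predict
        self.P = self.F @ self.P @ self.F.T + self.Q
        S = self.H @ self.P @ self.H.T + self.R           # update
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (np.asarray(z) - self.H @ self.x)
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]
```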
18. Jin Y, Dou Q, Chen H, Yu L, Qin J, Fu CW, Heng PA. SV-RCNet: Workflow Recognition From Surgical Videos Using Recurrent Convolutional Network. IEEE Trans Med Imaging 2018; 37:1114-1126. [PMID: 29727275] [DOI: 10.1109/tmi.2017.2787657]
Abstract
We propose a novel recurrent convolutional network (SV-RCNet) for automatic workflow recognition from surgical videos online, which is a key component for developing context-aware computer-assisted intervention systems. Different from previous methods which harness visual and temporal information separately, the proposed SV-RCNet seamlessly integrates a convolutional neural network (CNN) and a recurrent neural network (RNN) to form a novel recurrent convolutional architecture that takes full advantage of the complementary visual and temporal features learned from surgical videos. We effectively train the SV-RCNet in an end-to-end manner so that the visual representations and sequential dynamics can be jointly optimized in the learning process. In order to produce more discriminative spatio-temporal features, we exploit a deep residual network (ResNet) and a long short-term memory (LSTM) network to extract visual features and temporal dependencies, respectively, and integrate them into the SV-RCNet. Moreover, based on the phase transition-sensitive predictions from the SV-RCNet, we propose a simple yet effective inference scheme, the prior knowledge inference (PKI), which leverages the natural characteristics of surgical video. Such a strategy further improves the consistency of results and largely boosts the recognition performance. Extensive experiments have been conducted with the MICCAI 2016 Modeling and Monitoring of Computer Assisted Interventions Workflow Challenge dataset and the Cholec80 dataset to validate SV-RCNet. Our approach not only achieves superior performance on these two datasets but also outperforms the state-of-the-art methods by a significant margin.
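The recurrent-convolutional idea (CNN frame features feeding an RNN, trained jointly end to end) can be sketched as follows; the ResNet-50 backbone, hidden width, and phase count are assumptions:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class RecurrentConvNet(nn.Module):
    def __init__(self, num_phases=8, hidden=512):
        super().__init__()
        backbone = resnet50(weights="IMAGENET1K_V1")
        backbone.fc = nn.Identity()               # expose 2048-d frame features
        self.cnn = backbone
        self.lstm = nn.LSTM(2048, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_phases)

    def forward(self, clip):                      # clip: (B, T, 3, H, W)
        B, T = clip.shape[:2]
        feats = self.cnn(clip.flatten(0, 1)).view(B, T, -1)  # per-frame features
        out, _ = self.lstm(feats)                 # temporal dependencies
        return self.head(out)                     # per-frame phase logits
```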
19. Loukas C. Video content analysis of surgical procedures. Surg Endosc 2017; 32:553-568. [PMID: 29075965] [DOI: 10.1007/s00464-017-5878-1]
Abstract
BACKGROUND In addition to its therapeutic benefits, minimally invasive surgery offers the potential for video recording of the operation. The videos may be archived and used later for purposes such as cognitive training, skills assessment, and workflow analysis. Methods from the wider field of video content analysis and representation are increasingly applied in the surgical domain. In this paper, we review recent developments and analyze future directions in the field of content-based video analysis of surgical operations. METHODS The review was based on PubMed and Google Scholar searches on combinations of the following keywords: 'surgery', 'video', 'phase', 'task', 'skills', 'event', 'shot', 'analysis', 'retrieval', 'detection', 'classification', and 'recognition'. The collected articles were categorized and reviewed based on the technical goal sought, the type of surgery performed, and the structure of the operation. RESULTS A total of 81 articles were included. The publication activity is constantly increasing; more than 50% of these articles were published in the last 3 years. Significant research has been performed on video task detection and retrieval in eye surgery. In endoscopic surgery, the research activity is more diverse: gesture/task classification, skills assessment, tool type recognition, and shot/event detection and retrieval. Recent works employ deep neural networks for phase and tool recognition as well as shot detection. CONCLUSIONS Content-based video analysis of surgical operations is a rapidly expanding field. Several future prospects for research exist, including, inter alia, shot boundary detection, keyframe extraction, video summarization, pattern discovery, and video annotation. The development of publicly available benchmark datasets to evaluate and compare task-specific algorithms is essential.
Affiliation(s)
- Constantinos Loukas: Laboratory of Medical Physics, Medical School, National and Kapodistrian University of Athens, Mikras Asias 75 str., 11527 Athens, Greece
20. Li X, Zhang Y, Zhang J, Zhou M, Chen S, Gu Y, Chen Y, Marsic I, Farneth RA, Burd RS. Progress Estimation and Phase Detection for Sequential Processes. Proc ACM Interact Mob Wearable Ubiquitous Technol 2017; 1. [PMID: 30417164] [DOI: 10.1145/3130936]
Abstract
Process modeling and understanding are fundamental for advanced human-computer interfaces and automation systems. Most recent research has focused on activity recognition, but little has been done on sensor-based detection of process progress. We introduce a real-time, sensor-based system for modeling, recognizing and estimating the progress of a work process. We implemented a multimodal deep learning structure to extract the relevant spatio-temporal features from multiple sensory inputs and used a novel deep regression structure for overall completeness estimation. Using process completeness estimation with a Gaussian mixture model, our system can predict the phase for sequential processes. The performance speed, calculated using completeness estimation, allows online estimation of the remaining time. To train our system, we introduced a novel rectified hyperbolic tangent (rtanh) activation function and conditional loss. Our system was tested on data obtained from the medical process (trauma resuscitation) and sports events (Olympic swimming competition). Our system outperformed the existing trauma-resuscitation phase detectors with a phase detection accuracy of over 86%, an F1-score of 0.67, a completeness estimation error of under 12.6%, and a remaining-time estimation error of less than 7.5 minutes. For the Olympic swimming dataset, our system achieved an accuracy of 88%, an F1-score of 0.58, a completeness estimation error of 6.3% and a remaining-time estimation error of 2.9 minutes.
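Phase prediction from completeness can be illustrated with a Gaussian mixture over completeness values; the data below are synthetic, and in practice the component-to-phase assignment would be fixed from labeled procedures rather than left to the unsupervised fit:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic completeness values observed during three overlapping phases.
rng = np.random.default_rng(0)
completeness = np.concatenate([rng.uniform(lo, hi, 200)
                               for lo, hi in [(0.0, 0.3), (0.25, 0.7), (0.65, 1.0)]])

gmm = GaussianMixture(n_components=3, random_state=0)
gmm.fit(completeness.reshape(-1, 1))
component = gmm.predict([[0.5]])[0]   # most likely component at 50% completeness
```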
Affiliation(s)
- Xinyu Li, Yanyi Zhang, Jianyu Zhang, Moliang Zhou, Shuhong Chen, Yue Gu, Yueyang Chen, Ivan Marsic: Electrical & Computer Engineering, Rutgers, The State University of New Jersey, 94 Brett Road, Piscataway, New Jersey, USA
- Richard A Farneth, Randall S Burd: Division of Trauma and Burn Surgery, Children's National Medical Center, Washington, D.C. 20010, USA
21. Stauder R, Ostler D, Vogel T, Wilhelm D, Koller S, Kranzfelder M, Navab N. Surgical data processing for smart intraoperative assistance systems. Innov Surg Sci 2017; 2:145-152. [PMID: 31579746] [PMCID: PMC6754013] [DOI: 10.1515/iss-2017-0035]
Abstract
Different components of the newly defined field of surgical data science have been under research in our groups for more than a decade. In this paper, we describe our sensor-driven approaches to workflow recognition without the need for explicit models, and our current aim of applying this knowledge to enable context-aware surgical assistance systems, such as a unified surgical display and robotic assistance systems. The methods we evaluated over time include dynamic time warping, hidden Markov models, random forests, and, recently, deep neural networks, specifically convolutional neural networks.
Affiliation(s)
- Ralf Stauder: Chair for Computer Aided Medical Procedures, Technical University of Munich, Munich, Germany
- Daniel Ostler: Research Group for Minimally Invasive Interdisciplinary Therapeutical Interventions, Klinikum rechts der Isar, Technical University of Munich, Munich, Germany
- Thomas Vogel: Research Group for Minimally Invasive Interdisciplinary Therapeutical Interventions, Klinikum rechts der Isar, Technical University of Munich, Munich, Germany
- Dirk Wilhelm: Research Group for Minimally Invasive Interdisciplinary Therapeutical Interventions, Klinikum rechts der Isar, Technical University of Munich, Munich, Germany
- Sebastian Koller: Research Group for Minimally Invasive Interdisciplinary Therapeutical Interventions, Klinikum rechts der Isar, Technical University of Munich, Munich, Germany
- Michael Kranzfelder: Research Group for Minimally Invasive Interdisciplinary Therapeutical Interventions, Klinikum rechts der Isar, Technical University of Munich, Munich, Germany
- Nassir Navab: Chair for Computer Aided Medical Procedures, Technical University of Munich, Munich, Germany; Department of Computer Science, The Johns Hopkins University, Baltimore, MD, USA
22. Sahu M, Mukhopadhyay A, Szengel A, Zachow S. Addressing multi-label imbalance problem of surgical tool detection using CNN. Int J Comput Assist Radiol Surg 2017; 12:1013-1020. [PMID: 28357628] [DOI: 10.1007/s11548-017-1565-x]
Abstract
PURPOSE A fully automated surgical tool detection framework is proposed for endoscopic video streams. State-of-the-art surgical tool detection methods rely on supervised one-vs-all or multi-class classification techniques, completely ignoring the co-occurrence relationship of the tools and the associated class imbalance. METHODS In this paper, we formulate tool detection as a multi-label classification task where tool co-occurrences are treated as separate classes. In addition, imbalance on tool co-occurrences is analyzed and stratification techniques are employed to address the imbalance during convolutional neural network (CNN) training. Moreover, temporal smoothing is introduced as an online post-processing step to enhance runtime prediction. RESULTS Quantitative analysis is performed on the M2CAI16 tool detection dataset to highlight the importance of stratification, temporal smoothing and the overall framework for tool detection. CONCLUSION The analysis on tool imbalance, backed by the empirical results, indicates the need and superiority of the proposed framework over state-of-the-art techniques.
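The online temporal-smoothing post-processing step can be sketched as a trailing moving average over per-tool probabilities before thresholding; the window size and threshold are assumptions:

```python
import numpy as np

def smooth_online(prob_history: np.ndarray, window: int = 5) -> np.ndarray:
    """prob_history: (t, num_tools) per-frame tool probabilities up to now.
    Returns the smoothed binary tool-presence vector for the current frame."""
    recent = prob_history[-window:]        # trailing window (shorter at start)
    return recent.mean(axis=0) > 0.5

history = np.random.rand(30, 7)            # 30 frames, 7 tools (placeholder)
present = smooth_online(history)
```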
23. Guédon ACP, Paalvast M, Meeuwsen FC, Tax DMJ, van Dijke AP, Wauben LSGL, van der Elst M, Dankelman J, van den Dobbelsteen JJ. 'It is Time to Prepare the Next Patient' Real-Time Prediction of Procedure Duration in Laparoscopic Cholecystectomies. J Med Syst 2016; 40:271. [PMID: 27743243] [PMCID: PMC5065600] [DOI: 10.1007/s10916-016-0631-1]
Abstract
Operating Room (OR) scheduling is crucial to allow efficient use of ORs. Currently, the predicted durations of surgical procedures are unreliable and the OR schedulers have to follow the progress of the procedures in order to update the daily planning accordingly. The OR schedulers often acquire the needed information through verbal communication with the OR staff, which causes undesired interruptions of the surgical process. The aim of this study was to develop a system that predicts in real-time the remaining procedure duration and to test this prediction system for reliability and usability in an OR. The prediction system was based on the activation pattern of one single piece of equipment, the electrosurgical device. The prediction system was tested during 21 laparoscopic cholecystectomies, in which the activation of the electrosurgical device was recorded and processed in real-time using pattern recognition methods. The remaining surgical procedure duration was estimated and the optimal timing to prepare the next patient for surgery was communicated to the OR staff. The mean absolute error was smaller for the prediction system (14 min) than for the OR staff (19 min). The OR staff doubted whether the prediction system could take all relevant factors into account but were positive about its potential to shorten waiting times for patients. The prediction system is a promising tool to automatically and objectively predict the remaining procedure duration, and thereby achieve optimal OR scheduling and streamline the patient flow from the nursing department to the OR.
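The single-signal idea can be sketched by summarizing the electrosurgical on/off pattern observed so far and regressing the remaining time against historical procedures; the features and the regressor choice are assumptions, not the paper's exact pattern-recognition method:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

def activation_features(on_off: np.ndarray, elapsed_min: float) -> np.ndarray:
    """on_off: binary per-second electrosurgical activation trace so far."""
    switches = np.abs(np.diff(on_off)).sum()           # number of on/off toggles
    return np.array([elapsed_min,
                     on_off.mean(),                    # duty cycle so far
                     switches / max(elapsed_min, 1e-6)])  # toggle rate

# Placeholder training set: features sampled from historical procedures at
# many time points, paired with the true remaining minutes at those points.
X = np.random.rand(500, 3)
y = np.random.rand(500) * 60
model = KNeighborsRegressor(n_neighbors=5).fit(X, y)

trace = np.random.randint(0, 2, 1800)                  # 30 min of 1 Hz data
remaining = model.predict([activation_features(trace, elapsed_min=30.0)])
```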
Affiliation(s)
- Annetje C P Guédon: Department of BioMechanical Engineering, Faculty of Mechanical, Maritime and Materials Engineering, Delft University of Technology, Mekelweg 2, 2628 CD Delft, The Netherlands
- M Paalvast: Department of BioMechanical Engineering, Faculty of Mechanical, Maritime and Materials Engineering, Delft University of Technology, Mekelweg 2, 2628 CD Delft, The Netherlands
- F C Meeuwsen: Department of BioMechanical Engineering, Faculty of Mechanical, Maritime and Materials Engineering, Delft University of Technology, Mekelweg 2, 2628 CD Delft, The Netherlands
- D M J Tax: Pattern Recognition Laboratory, Delft University of Technology, Mekelweg 4, 2628 CD Delft, The Netherlands
- A P van Dijke: Department of BioMechanical Engineering, Faculty of Mechanical, Maritime and Materials Engineering, Delft University of Technology, Mekelweg 2, 2628 CD Delft, The Netherlands
- L S G L Wauben: Department of BioMechanical Engineering, Faculty of Mechanical, Maritime and Materials Engineering, Delft University of Technology, Mekelweg 2, 2628 CD Delft, The Netherlands
- M van der Elst: Department of Surgery, Reinier de Graaf Groep, Reinier de Graafweg 3-11, 2625 AD Delft, The Netherlands
- J Dankelman: Department of BioMechanical Engineering, Faculty of Mechanical, Maritime and Materials Engineering, Delft University of Technology, Mekelweg 2, 2628 CD Delft, The Netherlands
- J J van den Dobbelsteen: Department of BioMechanical Engineering, Faculty of Mechanical, Maritime and Materials Engineering, Delft University of Technology, Mekelweg 2, 2628 CD Delft, The Netherlands
24. Franke S, Neumuth T. Rule-based medical device adaptation for the digital operating room. Annu Int Conf IEEE Eng Med Biol Soc 2015; 2015:1733-6. [PMID: 26736612] [DOI: 10.1109/embc.2015.7318712]
Abstract
A workflow-driven cooperative operating room needs to be established in order to successfully unburden the surgeon and the operating room staff of very time-consuming information-seeking and configuration tasks. We propose an approach towards the integration of intraoperative surgical workflow management and integration technologies. The concept of rule-based behavior is adapted to situation-aware medical devices. A prototype was implemented, and experiments with sixty recorded brain tumor removal procedures were conducted to test the proposed approach. An analysis of the recordings indicated numerous applications, such as automatic display configuration, room light adaptation, and pre-configuration of medical devices and systems.
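Rule-based device adaptation reduces to dispatching device commands keyed on recognized situations; the rules, situation names, and commands below are illustrative only, not the prototype's actual configuration:

```python
# Each rule maps a recognized workflow situation to device commands.
RULES = {
    "phase:tumor_resection": [("room_lights", "dim"),
                              ("main_display", "microscope")],
    "phase:closure":         [("room_lights", "full"),
                              ("main_display", "overview")],
}

def on_situation(situation: str, send_command) -> None:
    """Fire all commands registered for the recognized situation."""
    for device, setting in RULES.get(situation, []):
        send_command(device, setting)   # e.g., via an OR device-integration bus

on_situation("phase:closure", lambda dev, val: print(f"{dev} -> {val}"))
```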
25. Automated video-based assessment of surgical skills for training and evaluation in medical schools. Int J Comput Assist Radiol Surg 2016; 11:1623-36. [PMID: 27567917] [DOI: 10.1007/s11548-016-1468-2]
Abstract
PURPOSE Routine evaluation of basic surgical skills in medical schools requires considerable time and effort from supervising faculty. For each surgical trainee, a supervisor has to observe the trainee in person. Alternatively, supervisors may use training videos, which reduces some of the logistical overhead. All these approaches, however, are still very time consuming and involve human bias. In this paper, we present an automated system for surgical skills assessment by analyzing video data of surgical activities. METHOD We compare different techniques for video-based surgical skill evaluation. We use techniques that capture motion information at a coarser granularity using symbols or words, extract motion dynamics using textural patterns in a frame kernel matrix, and analyze fine-grained motion information using frequency analysis. RESULTS We were able to classify surgeons into different skill levels with high accuracy. Our results indicate that fine-grained analysis of motion dynamics via frequency analysis is most effective in capturing the skill-relevant information in surgical videos. CONCLUSION Our evaluations show that frequency features perform better than motion-texture features, which in turn perform better than symbol-/word-based features. Put succinctly, skill classification accuracy is positively correlated with motion granularity, as demonstrated by our results on two challenging video datasets.
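The frequency-analysis idea can be sketched by summarizing the power of an instrument-motion trajectory in low- and high-frequency bands; the band edge and sampling rate are assumptions:

```python
import numpy as np

def frequency_features(trajectory: np.ndarray, fs: float = 30.0) -> np.ndarray:
    """trajectory: (T,) one motion coordinate sampled at fs Hz."""
    spectrum = np.abs(np.fft.rfft(trajectory - trajectory.mean())) ** 2
    freqs = np.fft.rfftfreq(len(trajectory), d=1.0 / fs)
    low = spectrum[freqs < 2.0].sum()      # smooth, deliberate motion
    high = spectrum[freqs >= 2.0].sum()    # jitter and hesitation
    return np.array([low, high, high / (low + 1e-9)])

features = frequency_features(np.random.randn(900))  # 30 s of motion (placeholder)
```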
Collapse
|
26
|
Shot boundary detection in endoscopic surgery videos using a variational Bayesian framework. Int J Comput Assist Radiol Surg 2016; 11:1937-1949. [DOI: 10.1007/s11548-016-1431-2] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/02/2015] [Accepted: 05/27/2016] [Indexed: 01/07/2023]
|
27
|
System events: readily accessible features for surgical phase detection. Int J Comput Assist Radiol Surg 2016; 11:1201-9. [PMID: 27177760 DOI: 10.1007/s11548-016-1409-0] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/11/2016] [Accepted: 03/31/2016] [Indexed: 10/21/2022]
Abstract
PURPOSE Surgical phase recognition using sensor data is challenging due to high variation in patient anatomy and surgeon-specific operating styles. Segmenting surgical procedures into constituent phases is of significant utility for resident training, education, self-review, and context-aware operating room technologies. Phase annotation is a highly labor-intensive task and would benefit greatly from automated solutions. METHODS We propose a novel approach using system events (for example, activation of cautery tools) that are easily captured in most surgical procedures. Our method involves extracting event-based features over 90-s intervals and assigning a phase label to each interval. We explore three classification techniques: support vector machines, random forests, and temporal convolution neural networks. Each of these models independently predicts a label for each time interval. We also examine segmental inference using an approach based on the semi-Markov conditional random field, which jointly performs phase segmentation and classification. Our method is evaluated on a data set of 24 robot-assisted hysterectomy procedures. RESULTS Our framework is able to detect surgical phases with an accuracy of 74 % using event-based features over a set of five different phases: ligation, dissection, colpotomy, cuff closure, and background. Precision and recall values for the cuff closure (precision: 83 %, recall: 98 %) and dissection (precision: 75 %, recall: 88 %) classes were higher than for the other classes. The normalized Levenshtein distance between the predicted and ground-truth phase sequences was 25 %. CONCLUSIONS Our findings demonstrate that system-event features are useful for automatically detecting surgical phases. Events contain phase information that cannot be obtained from motion data and that would require advanced computer vision algorithms to extract from a video. Many of these events are not specific to robotic surgery and can easily be recorded in non-robotic surgical modalities. In future work, we plan to combine information from system events, tool motion, and videos to automate phase detection in surgical procedures.
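A minimal sketch of this event-based pipeline on synthetic data: per-interval event counts feed one of the classifiers the authors compare (a random forest), and the normalized Levenshtein distance they report is computed between the predicted and ground-truth phase sequences. Feature dimensions and counts are illustrative.

```python
# Event-count features per 90-s interval -> random forest phase labels,
# plus the normalized Levenshtein metric. All data are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
n_intervals, n_event_types, n_phases = 200, 6, 5
X = rng.poisson(2.0, size=(n_intervals, n_event_types))  # e.g., cautery firings
y = rng.integers(0, n_phases, size=n_intervals)          # toy phase labels

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
pred = clf.predict(X)

def normalized_levenshtein(a, b):
    """Edit distance between two phase sequences, scaled to [0, 1]."""
    d = np.zeros((len(a) + 1, len(b) + 1), dtype=int)
    d[:, 0] = np.arange(len(a) + 1)
    d[0, :] = np.arange(len(b) + 1)
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d[i, j] = min(d[i - 1, j] + 1, d[i, j - 1] + 1,
                          d[i - 1, j - 1] + int(a[i - 1] != b[j - 1]))
    return d[-1, -1] / max(len(a), len(b))

print(normalized_levenshtein(list(pred), list(y)))
```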
Collapse
|
28
|
Bridging the gap between formal and experience-based knowledge for context-aware laparoscopy. Int J Comput Assist Radiol Surg 2016; 11:881-8. [DOI: 10.1007/s11548-016-1379-2] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/29/2016] [Accepted: 03/07/2016] [Indexed: 10/22/2022]
|
29
|
Dergachyova O, Bouget D, Huaulmé A, Morandi X, Jannin P. Automatic data-driven real-time segmentation and recognition of surgical workflow. Int J Comput Assist Radiol Surg 2016; 11:1081-9. [PMID: 26995598 DOI: 10.1007/s11548-016-1371-x] [Citation(s) in RCA: 58] [Impact Index Per Article: 7.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/12/2016] [Accepted: 02/26/2016] [Indexed: 11/30/2022]
Abstract
PURPOSE With the intention of extending the perception and action of surgical staff inside the operating room, the medical community has expressed a growing interest in context-aware systems. Requiring an accurate identification of the surgical workflow, such systems make use of data from a diverse set of available sensors. In this paper, we propose a fully data-driven and real-time method for segmentation and recognition of surgical phases using a combination of video data and instrument usage signals, exploiting no prior knowledge. We also introduce new validation metrics for the assessment of workflow detection. METHODS The segmentation and recognition are based on a four-stage process. First, during training, a Surgical Process Model is automatically constructed from data annotations to guide the following stages. Second, data samples are described using a combination of low-level visual cues and instrument information. In the third stage, these descriptions are employed to train a set of AdaBoost classifiers, each capable of distinguishing one surgical phase from the others. Finally, the AdaBoost responses are used as input to a hidden semi-Markov model to obtain a final decision. RESULTS On the MICCAI EndoVis challenge laparoscopic dataset, we achieved a precision and a recall of 91 % in the classification of 7 phases. CONCLUSION Compared to an analysis based on one data type only, the combination of visual features and instrument signals allows better segmentation, reduces the detection delay, and recovers the correct phase order.
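The last two stages can be approximated as below: one-vs-rest AdaBoost classifiers produce per-phase scores that are then temporally decoded. A sticky-transition Viterbi pass is used here as a simplified stand-in for the paper's hidden semi-Markov model; all data are synthetic.

```python
# One-vs-rest AdaBoost scores per phase, then Viterbi smoothing with a
# sticky transition matrix (a simplified stand-in for the paper's HSMM).
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

rng = np.random.default_rng(2)
n_frames, n_feats, n_phases = 300, 8, 4
X = rng.standard_normal((n_frames, n_feats))
y = np.repeat(np.arange(n_phases), n_frames // n_phases)  # toy ordered phases

# One AdaBoost classifier per phase (one-vs-rest), as in the paper.
scores = np.column_stack([
    AdaBoostClassifier(n_estimators=50, random_state=0)
    .fit(X, (y == k).astype(int))
    .decision_function(X)
    for k in range(n_phases)
])

log_emit = scores - scores.max(axis=1, keepdims=True)
stay, switch = np.log(0.95), np.log(0.05 / (n_phases - 1))
trans = np.full((n_phases, n_phases), switch)
np.fill_diagonal(trans, stay)

dp = log_emit[0].copy()
back = np.zeros((n_frames, n_phases), dtype=int)
for t in range(1, n_frames):
    cand = dp[:, None] + trans          # cand[i, j]: best path ending i -> j
    back[t] = cand.argmax(axis=0)
    dp = cand.max(axis=0) + log_emit[t]

path = [int(dp.argmax())]               # backtrack the best phase sequence
for t in range(n_frames - 1, 0, -1):
    path.append(back[t, path[-1]])
smoothed = path[::-1]
```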
Collapse
Affiliation(s)
- Olga Dergachyova
- INSERM, U1099, Rennes, 35000, France; Université de Rennes 1, LTSI, Rennes, 35000, France.
| | - David Bouget
- INSERM, U1099, Rennes, 35000, France; Université de Rennes 1, LTSI, Rennes, 35000, France
| | - Arnaud Huaulmé
- INSERM, U1099, Rennes, 35000, France; Université de Rennes 1, LTSI, Rennes, 35000, France; Université Joseph Fourier, TIMC-IMAG UMR 5525, Grenoble, 38041, France
| | - Xavier Morandi
- INSERM, U1099, Rennes, 35000, France; Université de Rennes 1, LTSI, Rennes, 35000, France; CHU Rennes, Département de Neurochirurgie, Rennes, 35000, France
| | - Pierre Jannin
- INSERM, U1099, Rennes, 35000, France; Université de Rennes 1, LTSI, Rennes, 35000, France
| |
Collapse
|
30
|
Loukas C, Georgiou E. Performance comparison of various feature detector-descriptors and temporal models for video-based assessment of laparoscopic skills. Int J Med Robot 2015; 12:387-98. [PMID: 26415583 DOI: 10.1002/rcs.1702] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/16/2013] [Revised: 07/17/2015] [Accepted: 08/21/2015] [Indexed: 11/07/2022]
Abstract
BACKGROUND Despite the significant progress in hand gesture analysis for surgical skills assessment, video-based analysis has not received much attention. In this study we investigate the application of various feature detector-descriptors and temporal modeling techniques for laparoscopic skills assessment. METHODS Two different setups were designed: static and dynamic video-histogram analysis. Four well-known feature detection-extraction methods were investigated: SIFT, SURF, STAR-BRIEF and STIP-HOG. For the dynamic setup two temporal models were employed (LDS and GMMAR model). Each method was evaluated for its ability to classify experts and novices on peg transfer and knot tying. RESULTS STIP-HOG yielded the best performance (static: 74-79%; dynamic: 80-89%). Temporal models had equivalent performance. Important differences were found between the two groups with respect to the underlying dynamics of the video-histogram sequences. CONCLUSIONS Temporal modeling of feature histograms extracted from laparoscopic training videos provides information about the skill level and motion pattern of the operator. Copyright © 2015 John Wiley & Sons, Ltd.
Collapse
Affiliation(s)
- Constantinos Loukas
- Medical Physics Lab-Simulation Center, School of Medicine, University of Athens, Greece
| | - Evangelos Georgiou
- Medical Physics Lab-Simulation Center, School of Medicine, University of Athens, Greece
| |
Collapse
|
31
|
Rockstroh M, Franke S, Neumuth T. Closed-loop approach for situation awareness of medical devices and operating room infrastructure. Curr Dir Biomed Eng 2015. [DOI: 10.1515/cdbme-2015-0044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022] Open
Abstract
In recent years, approaches for information and control integration in the digital operating room have emerged. A major step towards an intelligent operating room and a cooperative technical environment would be autonomous adaptation of medical devices and systems to the surgical workflow. The OR staff should be freed from information seeking and maintenance tasks. We propose a closed-loop concept integrating workflow monitoring, processing and (semi-)automatic interaction to bridge the gap between OR integration of medical devices and workflow-related information management.
Four steps were identified for the implementation of workflow-driven assistance functionalities. The processing steps in the closed loop of workflow-driven assistance could either be implemented with centralized responsible components or in a cooperative agent-based approach. However, both strategies require a common framework and terminology to ensure interoperability between the components, the medical devices (actors) and the OR infrastructure.
Collapse
Affiliation(s)
- Max Rockstroh
- Universität Leipzig, Innovation Center Computer Assisted Surgery (ICCAS), Semmelweisstraße 14, D-04103 Leipzig, Germany
| | - Stefan Franke
- Universität Leipzig, Innovation Center Computer Assisted Surgery (ICCAS), Semmelweisstraße 14, D-04103 Leipzig, Germany
| | - Thomas Neumuth
- Universität Leipzig, Innovation Center Computer Assisted Surgery (ICCAS), Semmelweisstraße 14, D-04103 Leipzig, Germany
| |
Collapse
|
32
|
Maktabi M, Vinz ST, Neumuth T. Frequency based assessment of surgical activities. Curr Dir Biomed Eng 2015. [DOI: 10.1515/cdbme-2015-0038] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022] Open
Abstract
In hospitals, the duration of surgeries plays a decisive role in many areas, such as patient safety and financial planning. Accurate automated online prediction enables efficient surgical patient care and effective resource management. In this work, several surgical activities during an intervention were examined for their potential to forecast the remaining intervention time. The method was based on frequency-domain analysis of time series representing the status of surgical activities during an intervention. A nonparametric estimate of the power spectral density (PSD) was calculated for single surgical tasks, and the PSDs of different surgical activities were compared in a leave-one-out cross-validation of forty surgical workflow recordings of lumbar discectomies. The results showed that the activity "irrigate", with a mean prediction error of 26 min 23 s, is best suited for determining the remainder of the intervention. To build scheduling support for a wider range of surgery types, the actions performed by the surgeon's right and left hands would be more suitable, although the error for the right-hand actions was still 41 min 39 s. In conclusion, refinement of the presented frequency-based method might support time and resource management in a general manner.
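The central computation, a nonparametric power spectral density estimate per activity status series, can be sketched with Welch's method; the activity generator and all parameters below are invented, and the remaining-time regression itself is omitted.

```python
# Welch PSD estimate of binary activity status series, per activity.
# Activity generator, periods, and window length are illustrative.
import numpy as np
from scipy.signal import welch

fs = 1.0  # one status sample per second

def activity_series(period_s: float, n: int = 3600) -> np.ndarray:
    """Toy binary series: 1 while the activity is being performed."""
    t = np.arange(n) / fs
    return (np.sin(2 * np.pi * t / period_s) > 0.2).astype(float)

for name, period in [("irrigate", 300.0), ("right hand", 45.0)]:
    freqs, psd = welch(activity_series(period), fs=fs, nperseg=512)
    peak = freqs[np.argmax(psd[1:]) + 1]  # skip the DC bin
    print(f"{name}: dominant rhythm ~{1 / peak:.0f} s")
```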
Collapse
Affiliation(s)
- Marianne Maktabi
- University Leipzig, ICCAS, Semmelweisstr. 14, 04103 Leipzig, Germany
| | - Sascha T. Vinz
- University Leipzig, ICCAS, Semmelweisstr. 14, 04103 Leipzig, Germany
| | - Thomas Neumuth
- University Leipzig, ICCAS, Semmelweisstr. 14, 04103 Leipzig, Germany
| |
Collapse
|
33
|
Katić D, Julliard C, Wekerle AL, Kenngott H, Müller-Stich BP, Dillmann R, Speidel S, Jannin P, Gibaud B. LapOntoSPM: an ontology for laparoscopic surgeries and its application to surgical phase recognition. Int J Comput Assist Radiol Surg 2015; 10:1427-34. [PMID: 26062794 DOI: 10.1007/s11548-015-1222-1] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/05/2014] [Accepted: 05/01/2015] [Indexed: 10/23/2022]
Abstract
PURPOSE The rise of intraoperative information threatens to outpace our abilities to process it. Context-aware systems, filtering information to automatically adapt to the current needs of the surgeon, are necessary to fully profit from computerized surgery. To attain context awareness, the representation of medical knowledge is crucial. However, most existing systems do not represent knowledge in a reusable way, which also hinders the reuse of data. Our purpose is therefore to make our computational models of medical knowledge sharable, extensible, and interoperable with established knowledge representations, in the form of the LapOntoSPM ontology. To show its usefulness, we apply it to situation interpretation, i.e., the recognition of surgical phases based on surgical activities. METHODS Considering best practices in ontology engineering and building on our ontology for laparoscopy, we formalized the workflow of laparoscopic adrenalectomies, cholecystectomies, and pancreatic resections in the framework of OntoSPM, a new standard for surgical process models. Furthermore, we provide a rule-based situation interpretation algorithm based on SQWRL to recognize surgical phases using the ontology. RESULTS The system was evaluated on ground-truth data from 19 manually annotated surgeries. The aim was to show that its phase recognition capabilities are equal to those of a specialized solution. The recognition rates of the new system were indeed equal to those of the specialized one. However, the time needed to interpret a situation rose from 0.5 to 1.8 s on average, which is still viable for practical application. CONCLUSION We successfully integrated medical knowledge for laparoscopic surgeries into OntoSPM, facilitating knowledge and data sharing. This is especially important for the reproducibility of results and the unbiased comparison of recognition algorithms. The associated recognition algorithm was adapted to the new representation without any loss of classification power. This work is an important step towards standardized knowledge and data representation in the field of context awareness and thus toward unified benchmark data sets.
Collapse
Affiliation(s)
- Darko Katić
- Karlsruhe Institute of Technology (KIT), Adenauerring 2, 76131 Karlsruhe, Germany
Collapse
|
34
|
Twinanda AP, Alkan EO, Gangi A, de Mathelin M, Padoy N. Data-driven spatio-temporal RGBD feature encoding for action recognition in operating rooms. Int J Comput Assist Radiol Surg 2015; 10:737-47. [PMID: 25847670 DOI: 10.1007/s11548-015-1186-1] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/20/2015] [Accepted: 03/20/2015] [Indexed: 11/30/2022]
Abstract
PURPOSE Context-aware systems for the operating room (OR) provide the possibility to significantly improve surgical workflow through various applications such as efficient OR scheduling, context-sensitive user interfaces, and automatic transcription of medical procedures. Being an essential element of such a system, surgical action recognition is thus an important research area. In this paper, we tackle the problem of classifying surgical actions from video clips that capture the activities taking place in the OR. METHODS We acquire recordings using a multi-view RGBD camera system mounted on the ceiling of a hybrid OR dedicated to X-ray-based procedures and annotate clips of the recordings with the corresponding actions. To recognize the surgical actions from the video clips, we use a classification pipeline based on the bag-of-words (BoW) approach. We propose a novel feature encoding method that extends the classical BoW approach. Instead of using the typical rigid grid layout to divide the space of the feature locations, we propose to learn the layout from the actual 4D spatio-temporal locations of the visual features. This results in a data-driven and non-rigid layout which retains more spatio-temporal information compared to the rigid counterpart. RESULTS We classify multi-view video clips from a new dataset generated from 11-day recordings of real operations. This dataset is composed of 1734 video clips of 15 actions. These include generic actions (e.g., moving patient to the OR bed) and actions specific to the vertebroplasty procedure (e.g., hammering). The experiments show that the proposed non-rigid feature encoding method performs better than the rigid encoding one. The classifier's accuracy is increased by over 4 %, from 81.08 to 85.53 %. CONCLUSION The combination of both intensity and depth information from the RGBD data provides more discriminative power in carrying out the surgical action recognition task as compared to using either one of them alone. Furthermore, the proposed non-rigid spatio-temporal feature encoding scheme provides more discriminative histogram representations than the rigid counterpart. To the best of our knowledge, this is also the first work that presents action recognition results on multi-view RGBD data recorded in the OR.
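The data-driven layout can be pictured as clustering the 4D feature locations instead of imposing a rigid grid, then building one bag-of-words histogram per learned cell. The sketch below uses random stand-in features; vocabulary and cell counts are arbitrary assumptions.

```python
# Non-rigid BoW encoding: cluster 4D feature locations into layout cells,
# then build a per-cell visual-word histogram. Features are random stand-ins.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)
n_words, n_cells = 50, 8

locs = rng.uniform(0, 1, size=(5000, 4))   # (x, y, z, t) per interest point
desc = rng.standard_normal((5000, 32))     # appearance descriptors

vocab = KMeans(n_clusters=n_words, n_init=4, random_state=0).fit(desc)
layout = KMeans(n_clusters=n_cells, n_init=4, random_state=0).fit(locs)

def encode(locs, desc):
    """One normalized histogram per learned layout cell, concatenated."""
    words = vocab.predict(desc)
    cells = layout.predict(locs)
    hist = np.zeros((n_cells, n_words))
    for c, w in zip(cells, words):
        hist[c, w] += 1
    return (hist / max(1, len(words))).ravel()  # one vector per video clip

clip_vector = encode(locs, desc)
```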
Collapse
Affiliation(s)
- Andru P Twinanda
- ICube Laboratory, University of Strasbourg, CNRS, IHU Strasbourg, Strasbourg, France
Collapse
|
35
|
Twinanda AP, Marescaux J, de Mathelin M, Padoy N. Classification approach for automatic laparoscopic video database organization. Int J Comput Assist Radiol Surg 2015; 10:1449-60. [PMID: 25847668 DOI: 10.1007/s11548-015-1183-4] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/12/2014] [Accepted: 03/18/2015] [Indexed: 10/23/2022]
Abstract
PURPOSE One of the advantages of minimally invasive surgery (MIS) is that the underlying digitization provides invaluable information regarding the execution of procedures in various patient-specific conditions. However, such information can only be obtained conveniently if the laparoscopic video database comes with semantic annotations, which are typically provided manually by experts. Considering the growing popularity of MIS, manual annotation becomes a laborious and costly task. In this paper, we tackle the problem of laparoscopic video classification, which consists of automatically identifying the type of abdominal surgery performed in a video. In addition to performing classifications on the full recordings of the procedures, we also carry out sub-video and video clip classifications. These classifications are carried out to investigate how many frames from a video are needed to get a good classification performance and which parts of the procedures contain more discriminative features. METHOD Our classification pipeline is as follows. First, we reject the irrelevant frames from the videos using the color properties of the video frames. Second, we extract visual features from the relevant frames. Third, we quantize the features using several feature encoding methods, i.e., vector quantization, sparse coding (SC), and Fisher encoding. Fourth, we carry out the classification using support vector machines. While the sub-video classification is carried out by uniformly downsampling the video frames, the video clip classification is carried out by taking three parts of the videos (i.e., beginning, middle, and end) and running the classification pipeline separately for every video part. Ultimately, we build our final classification model by combining the features using a multiple kernel learning (MKL) approach. RESULTS To carry out the experiments, we use a dataset containing 208 videos of eight different surgeries performed by 10 different surgeons. The results show that SC with K-singular value decomposition (K-SVD) yields the best classification accuracy. The results also demonstrate that the classification accuracy decreases by only 3 % when just 60 % of the video frames are utilized. Furthermore, it is also shown that the end part of the procedures is the most discriminative part of the surgery. Specifically, by using only the last 20 % of the video frames, a classification accuracy greater than 70 % can be achieved. Finally, the combination of all features yields the best performance of 90.38 % accuracy. CONCLUSIONS The SC with K-SVD provides the best representation of our videos, yielding the best accuracies for all features. In terms of information, the end part of the laparoscopic videos is the most discriminative compared to the other parts of the videos. In addition to their good performance individually, the features yield even better classification results when all of them are combined using the MKL approach.
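A common simplification of the final MKL step is a uniformly weighted sum of per-feature kernels fed to a precomputed-kernel SVM, as sketched below on toy histograms; the paper learns the kernel weights rather than fixing them, and the chi-squared kernel choice is an assumption.

```python
# Uniform kernel combination as a simplified stand-in for learned MKL:
# one kernel per feature type, averaged, then a precomputed-kernel SVM.
import numpy as np
from sklearn.metrics.pairwise import chi2_kernel
from sklearn.svm import SVC

rng = np.random.default_rng(5)
n = 120
y = rng.integers(0, 8, size=n)  # eight surgery types, toy labels

# Two hypothetical per-video feature histograms (e.g., color, texture).
feats = [np.abs(rng.standard_normal((n, 64))),
         np.abs(rng.standard_normal((n, 128)))]

K = sum(chi2_kernel(f) for f in feats) / len(feats)  # uniform MKL weights
clf = SVC(kernel="precomputed").fit(K, y)
pred = clf.predict(K)
```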
Collapse
|
36
|
Quellec G, Lamard M, Cochener B, Cazuguel G. Real-time segmentation and recognition of surgical tasks in cataract surgery videos. IEEE Trans Med Imaging 2014; 33:2352-60. [PMID: 25055383 DOI: 10.1109/tmi.2014.2340473] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/25/2023]
Abstract
In ophthalmology, it is now common practice to record every surgical procedure and to archive the resulting videos for documentation purposes. In this paper, we present a solution to automatically segment and categorize surgical tasks in real-time during the surgery, using the video recording. The goal would be to communicate information to the surgeon in due time, such as recommendations to the less experienced surgeons. The proposed solution relies on the content-based video retrieval paradigm: it reuses previously archived videos to automatically analyze the current surgery, by analogy reasoning. Each video is segmented, in real-time, into an alternating sequence of idle phases, during which no clinically-relevant motions are visible, and action phases. As soon as an idle phase is detected, the previous action phase is categorized and the next action phase is predicted. A conditional random field is used for categorization and prediction. The proposed system was applied to the automatic segmentation and categorization of cataract surgery tasks. A dataset of 186 surgeries, performed by ten different surgeons, was manually annotated: ten possibly overlapping surgical tasks were delimited in each surgery. Using the content of action phases and the duration of idle phases as sources of evidence, an average recognition performance of Az = 0.832 ± 0.070 was achieved.
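The idle/action alternation can be crudely emulated by thresholding frame-difference energy, as below; this heuristic only stands in for the paper's conditional-random-field categorization, and all thresholds and data are invented.

```python
# Idle-phase detection via frame-difference motion energy (a heuristic
# stand-in for the paper's CRF-based pipeline; thresholds are invented).
import numpy as np

def idle_segments(frames: np.ndarray, thresh: float = 2.0, min_len: int = 15):
    """Return (start, end) index pairs of low-motion (idle) stretches."""
    energy = np.abs(np.diff(frames.astype(float), axis=0)).mean(axis=(1, 2))
    idle = energy < thresh
    segs, start = [], None
    for i, flag in enumerate(idle):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_len:
                segs.append((start, i))
            start = None
    if start is not None and len(idle) - start >= min_len:
        segs.append((start, len(idle)))
    return segs

rng = np.random.default_rng(6)
frames = rng.integers(0, 255, size=(300, 48, 64)).astype(float)
frames[100:200] = frames[100]       # synthetic idle stretch
print(idle_segments(frames))        # -> [(100, 199)]
```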
Collapse
|
37
|
Katić D, Spengler P, Bodenstedt S, Castrillon-Oberndorfer G, Seeberger R, Hoffmann J, Dillmann R, Speidel S. A system for context-aware intraoperative augmented reality in dental implant surgery. Int J Comput Assist Radiol Surg 2014; 10:101-8. [PMID: 24771315 DOI: 10.1007/s11548-014-1005-0] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/20/2013] [Accepted: 04/03/2014] [Indexed: 01/27/2023]
Abstract
PURPOSE Large volumes of information in the OR are ignored by surgeons when the amount outpaces human mental processing abilities. We developed an augmented reality (AR) system for dental implant surgery that acts as an automatic information filter, selectively displaying only relevant information. The purpose is to reduce information overflow and offer intuitive image guidance. The system was evaluated in a pig cadaver experiment. METHODS Information filtering is implemented via rule-based situation interpretation with description logics. The interpretation is based on intraoperative distance measurements between anatomical structures and the optically tracked dental drill. For AR, a head-mounted display is used, which was calibrated with a novel method based on SPAAM. To adapt to surgeon-specific preferences, we offer two alternative display formats: one with static and another with contact-analog AR. RESULTS The system made the surgery easier and showed ergonomic benefits, as assessed by a questionnaire. All relevant phases were recognized reliably. The new calibration showed significant improvements, while the deviation of the realized implants was <2.5 mm. CONCLUSION The system allowed the surgeon to fully concentrate on the surgery itself. It offered greater flexibility since the surgeon received all relevant information but was free to deviate from it. The accuracy of the realized implants remains an open issue and part of future work.
Collapse
Affiliation(s)
- Darko Katić
- Department of Informatics, Institute for Anthropomatics, Karlsruhe Institute of Technology (KIT), Adenauerring 2, 76131 Karlsruhe, Germany
Collapse
|
38
|
Quellec G, Charrière K, Lamard M, Droueche Z, Roux C, Cochener B, Cazuguel G. Real-time recognition of surgical tasks in eye surgery videos. Med Image Anal 2014; 18:579-90. [DOI: 10.1016/j.media.2014.02.007] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/28/2012] [Revised: 02/07/2014] [Accepted: 02/17/2014] [Indexed: 01/23/2023]
|
39
|
Unger M, Chalopin C, Neumuth T. Vision-based online recognition of surgical activities. Int J Comput Assist Radiol Surg 2014; 9:979-86. [PMID: 24664268 DOI: 10.1007/s11548-014-0994-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/11/2013] [Accepted: 03/07/2014] [Indexed: 10/25/2022]
Abstract
PURPOSE Surgical processes are complex entities characterized by expressive models and data. Recognizable activities define each surgical process. The principal limitation of current vision-based recognition methods is inefficiency due to the large amount of information captured during a surgical procedure. To overcome this technical challenge, we introduce a surgical gesture recognition system based on temperature sensing. METHODS An infrared thermal camera was combined with a hierarchical temporal memory and used during surgical procedures. The recordings were analyzed for the recognition of surgical activities. The acquired image sequences included hand temperatures, which were analyzed to perform gesture extraction and recognition based on heat differences between the surgeon's warm hands and the colder background of the environment. RESULTS The system was validated by simulating a functional endoscopic sinus surgery, a common type of otolaryngologic surgery. The thermal camera was directed toward the hands of the surgeon while handling different instruments. The system achieved an online recognition accuracy of 96%, with precision and recall rates of approximately 60%. CONCLUSION Vision-based recognition methods are the current best-practice approach for monitoring surgical processes. Problems of information overflow and extended recognition times in vision-based approaches were overcome by changing the spectral range to infrared. This change enables the real-time recognition of surgical activities and provides online monitoring information to surgical assistance systems and workflow management systems.
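The underlying cue reduces to thresholding warm pixels against the colder OR background. A sketch on synthetic temperatures, with the hierarchical temporal memory deliberately left out:

```python
# Hand segmentation in a thermal frame by temperature thresholding.
# Temperatures and the threshold are invented; the paper's hierarchical
# temporal memory for gesture recognition is not reproduced here.
import numpy as np

def segment_hands(thermal: np.ndarray, t_skin: float = 30.0) -> np.ndarray:
    """Binary mask of pixels warmer than the (colder) OR background."""
    return thermal > t_skin

rng = np.random.default_rng(7)
frame = rng.normal(22.0, 1.0, size=(120, 160))           # cool background, deg C
frame[40:70, 60:100] = rng.normal(34.0, 0.5, (30, 40))   # warm hand region
mask = segment_hands(frame)
print(mask.sum(), "hand pixels")  # roughly 30 * 40
```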
Collapse
Affiliation(s)
- Michael Unger
- Innovation Center Computer Assisted Surgery, University of Leipzig, Semmelweisstr. 14, Leipzig, 04103, Germany.
| | - Claire Chalopin
- Innovation Center Computer Assisted Surgery, University of Leipzig, Semmelweisstr. 14, Leipzig, 04103, Germany
| | - Thomas Neumuth
- Innovation Center Computer Assisted Surgery, University of Leipzig, Semmelweisstr. 14, Leipzig, 04103, Germany
| |
Collapse
|
40
|
Loukas C, Georgiou E. Smoke detection in endoscopic surgery videos: a first step towards retrieval of semantic events. Int J Med Robot 2014; 11:80-94. [DOI: 10.1002/rcs.1578] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/17/2013] [Revised: 12/16/2013] [Accepted: 01/18/2014] [Indexed: 12/17/2022]
Affiliation(s)
- Constantinos Loukas
- Medical Physics Laboratory Simulation Centre, School of Medicine; University of Athens; Greece
| | - Evangelos Georgiou
- Medical Physics Laboratory Simulation Centre, School of Medicine; University of Athens; Greece
| |
Collapse
|
41
|
42
|
Fisher Kernel Based Task Boundary Retrieval in Laparoscopic Database with Single Video Query. Med Image Comput Comput Assist Interv 2014; 17:409-16. [DOI: 10.1007/978-3-319-10443-0_52] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/12/2023]
|
43
|
Katić D, Wekerle AL, Gärtner F, Kenngott H, Müller-Stich BP, Dillmann R, Speidel S. Knowledge-Driven Formalization of Laparoscopic Surgeries for Rule-Based Intraoperative Context-Aware Assistance. Information Processing in Computer-Assisted Interventions 2014. [DOI: 10.1007/978-3-319-07521-1_17] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/14/2022]
|
44
|
Forestier G, Lalys F, Riffaud L, Louis Collins D, Meixensberger J, Wassef SN, Neumuth T, Goulet B, Jannin P. Multi-site study of surgical practice in neurosurgery based on surgical process models. J Biomed Inform 2013; 46:822-9. [PMID: 23810856 DOI: 10.1016/j.jbi.2013.06.006] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/12/2013] [Revised: 06/11/2013] [Accepted: 06/12/2013] [Indexed: 11/26/2022]
Abstract
Surgical Process Modelling (SPM) was introduced to improve the understanding of the different parameters that influence the performance of a Surgical Process (SP). The data acquired with the SPM methodology are voluminous and complex. Several analysis methods based on the comparison or classification of Surgical Process Models (SPMs) have previously been proposed. Such methods compare a set of SPMs to highlight specific parameters explaining differences between populations of patients, surgeons, or systems. In this study, procedures performed at three different international university hospitals were compared using the SPM methodology, based on a similarity metric focusing on the sequence of activities occurring during surgery. The proposed approach combines the Dynamic Time Warping (DTW) algorithm with a clustering algorithm. SPMs of 41 Anterior Cervical Discectomy (ACD) surgeries were acquired at three neurosurgical departments, in France, Germany, and Canada. The proposed approach distinguished different surgical behaviors according to the location where the surgery was performed, as well as between the categorized surgical experience of individual surgeons. We also propose the use of Multidimensional Scaling to induce a new space of representation for the sequences of activities. The approach was compared to a time-based approach (e.g., duration of surgeries) and was shown to be more precise. We also discuss the integration of other criteria in order to better understand what influences the way surgeries are performed. This first multi-site study represents an important step towards the creation of robust analysis tools for processing SPMs. It opens new perspectives for the assessment of surgical approaches, tools, or systems, as well as the objective assessment and comparison of surgeons' expertise.
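A compact sketch of the DTW-plus-clustering analysis on symbolic activity sequences; the sequences, the unit substitution cost, and the cluster count are illustrative assumptions.

```python
# DTW distance between symbolic activity sequences, then hierarchical
# clustering of the surgeries. Sequences and costs are invented.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def dtw(a, b):
    """Dynamic time warping distance with unit substitution cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0.0 if a[i - 1] == b[j - 1] else 1.0
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Toy activity sequences (one symbol per surgical activity), six surgeries.
seqs = [list("AABBCCDD"), list("AABBCCCDD"), list("ABBDDCC"),
        list("ABDDCCC"), list("AACCBBDD"), list("ACCBBDDD")]

n = len(seqs)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        dist[i, j] = dist[j, i] = dtw(seqs[i], seqs[j])

Z = linkage(squareform(dist), method="average")
print(fcluster(Z, t=3, criterion="maxclust"))  # cluster label per surgery
```

The same precomputed distance matrix can also feed multidimensional scaling or nearest-neighbour classification, the two companion analyses used in this and the related DTW study listed below.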
Collapse
|
45
|
Surgical gesture classification from video and kinematic data. Med Image Anal 2013; 17:732-45. [PMID: 23706754 DOI: 10.1016/j.media.2013.04.007] [Citation(s) in RCA: 65] [Impact Index Per Article: 5.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/21/2012] [Revised: 03/22/2013] [Accepted: 04/15/2013] [Indexed: 11/21/2022]
Abstract
Much of the existing work on automatic classification of gestures and skill in robotic surgery is based on dynamic cues (e.g., time to completion, speed, forces, torque) or kinematic data (e.g., robot trajectories and velocities). While videos could be equally or more discriminative (e.g., videos contain semantic information not present in kinematic data), they are typically not used because of the difficulties associated with automatic video interpretation. In this paper, we propose several methods for automatic surgical gesture classification from video data. We assume that the video of a surgical task (e.g., suturing) has been segmented into video clips corresponding to a single gesture (e.g., grabbing the needle, passing the needle) and propose three methods to classify the gesture of each video clip. In the first one, we model each video clip as the output of a linear dynamical system (LDS) and use metrics in the space of LDSs to classify new video clips. In the second one, we use spatio-temporal features extracted from each video clip to learn a dictionary of spatio-temporal words, and use a bag-of-features (BoF) approach to classify new video clips. In the third one, we use multiple kernel learning (MKL) to combine the LDS and BoF approaches. Since the LDS approach is also applicable to kinematic data, we also use MKL to combine both types of data in order to exploit their complementarity. Our experiments on a typical surgical training setup show that methods based on video data perform equally well, if not better, than state-of-the-art approaches based on kinematic data. In turn, the combination of both kinematic and video data outperforms any other algorithm based on one type of data alone.
Collapse
|
46
|
Loukas C, Georgiou E. Surgical workflow analysis with Gaussian mixture multivariate autoregressive (GMMAR) models: a simulation study. Comput Aided Surg 2013; 18:47-62. [DOI: 10.3109/10929088.2012.762944] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022]
|
47
|
Tao L, Zappella L, Hager GD, Vidal R. Surgical gesture segmentation and recognition. Med Image Comput Comput Assist Interv 2013; 16:339-46. [PMID: 24505779 DOI: 10.1007/978-3-642-40760-4_43] [Citation(s) in RCA: 56] [Impact Index Per Article: 5.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/02/2022]
Abstract
Automatic surgical gesture segmentation and recognition can provide useful feedback for surgical training in robotic surgery. Most prior work in this field relies on the robot's kinematic data. Although recent work [1,2] shows that the robot's video data can be equally effective for surgical gesture recognition, the segmentation of the video into gestures is assumed to be known. In this paper, we propose a framework for joint segmentation and recognition of surgical gestures from kinematic and video data. Unlike prior work that relies on either frame-level kinematic cues, or segment-level kinematic or video cues, our approach exploits both cues by using a combined Markov/semi-Markov conditional random field (MsM-CRF) model. Our experiments show that the proposed model improves over a Markov or semi-Markov CRF when using video data alone, gives results that are comparable to state-of-the-art methods on kinematic data alone, and improves over state-of-the-art methods when combining kinematic and video data.
Collapse
Affiliation(s)
- Lingling Tao
- Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218, USA
| | - Luca Zappella
- Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218, USA
| | - Gregory D Hager
- Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218, USA
| | - René Vidal
- Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218, USA
| |
Collapse
|
48
|
Haro BB, Zappella L, Vidal R. Surgical gesture classification from video data. Med Image Comput Comput Assist Interv 2012; 15:34-41. [PMID: 23285532 DOI: 10.1007/978-3-642-33415-3_5] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/13/2023]
Abstract
Much of the existing work on automatic classification of gestures and skill in robotic surgery is based on kinematic and dynamic cues, such as time to completion, speed, forces, torque, or robot trajectories. In this paper we show that in a typical surgical training setup, video data can be equally discriminative. To that end, we propose and evaluate three approaches to surgical gesture classification from video. In the first one, we model each video clip from each surgical gesture as the output of a linear dynamical system (LDS) and use metrics in the space of LDSs to classify new video clips. In the second one, we use spatio-temporal features extracted from each video clip to learn a dictionary of spatio-temporal words and use a bag-of-features (BoF) approach to classify new video clips. In the third approach, we use multiple kernel learning to combine the LDS and BoF approaches. Our experiments show that methods based on video data perform equally well as the state-of-the-art approaches based on kinematic data.
Collapse
|
49
|
Lalys F, Riffaud L, Bouget D, Jannin P. A framework for the recognition of high-level surgical tasks from video images for cataract surgeries. IEEE Trans Biomed Eng 2011; 59:966-76. [PMID: 22203700 DOI: 10.1109/tbme.2011.2181168] [Citation(s) in RCA: 82] [Impact Index Per Article: 6.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
The need for a better integration of the new generation of computer-assisted surgical systems has recently been emphasized. One necessity for achieving this objective is to retrieve data from the operating room (OR) with different sensors and then to derive models from these data. Recently, the use of videos from cameras in the OR has demonstrated its efficiency. In this paper, we propose a framework to assist in the development of systems for the automatic recognition of high-level surgical tasks using microscope video analysis, and we validated its use on cataract procedures. The idea is to combine state-of-the-art computer vision techniques with time-series analysis. The first step of the framework consists in defining several visual cues for extracting semantic information and thereby characterizing each frame of the video. Five image-based classifiers were therefore implemented. A pupil segmentation step was also applied for dedicated visual cue detection. Time-series classification algorithms were then applied to model the time-varying data; dynamic time warping and hidden Markov models were tested. This combination draws on the advantages of all methods for a better understanding of the problem. The framework was finally validated through various studies: six binary visual cues were chosen along with 12 phases to detect, obtaining accuracies of 94%.
Collapse
Affiliation(s)
- F Lalys
- U1099 Institut National de la Santé et de la Recherche Médicale and the Faculté de Médecine, University of Rennes I, Rennes, France.
Collapse
|
50
|
Forestier G, Lalys F, Riffaud L, Trelhu B, Jannin P. Classification of surgical processes using dynamic time warping. J Biomed Inform 2011; 45:255-64. [PMID: 22120773 DOI: 10.1016/j.jbi.2011.11.002] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/27/2011] [Revised: 11/09/2011] [Accepted: 11/10/2011] [Indexed: 10/15/2022]
Abstract
In the creation of new computer-assisted intervention systems, Surgical Process Models (SPMs) are an emerging concept used for analyzing and assessing surgical interventions. SPMs represent Surgical Processes (SPs), which are formalized as symbolic structured descriptions of surgical interventions using a pre-defined level of granularity and a dedicated terminology. In this context, one major challenge is the creation of new metrics for the comparison and evaluation of SPs. Correlations between these metrics and pre-operative data can then be used to classify surgeries and highlight specific information on the surgery itself and on the surgeon, such as his/her level of expertise. In this paper, we explore the automatic classification of a set of SPs based on the Dynamic Time Warping (DTW) algorithm. DTW is used to compute a similarity measure between two SPs that focuses on the different types of activities performed during surgery and their sequencing, by minimizing time differences. Indeed, it turns out to be a complementary approach to the classical methods that only focus on differences in the duration and number of activities. Experiments were carried out on 24 lumbar disk herniation surgeries to discriminate the surgeons' level of expertise according to a prior classification of SPs. Supervised and unsupervised classification experiments showed that this approach was able to automatically identify groups of surgeons according to their level of expertise (senior and junior), and it opens many perspectives for the creation of new metrics for comparing and evaluating surgeries.
Collapse
|