1. Kannan S, Yengera G, Mutter D, Marescaux J, Padoy N. Future-State Predicting LSTM for Early Surgery Type Recognition. IEEE Trans Med Imaging 2020; 39:556-566. [PMID: 31352339] [DOI: 10.1109/tmi.2019.2931158]
Abstract
This work presents a novel approach for the early recognition of the type of a laparoscopic surgery from its video. Early recognition algorithms can benefit the development of "smart" OR systems that provide automatic context-aware assistance, and they also enable quick database indexing. The task, however, is riddled with challenges specific to videos from the domain of laparoscopy, such as high visual similarity across surgeries and large variations in video durations. To capture the spatio-temporal dependencies in these videos, we choose as our model a combination of a convolutional neural network (CNN) and a long short-term memory (LSTM) network. We then propose two complementary approaches for improving early recognition performance. The first is a CNN fine-tuning method that encourages surgeries to be distinguished based on the initial frames of laparoscopic videos. The second, referred to as "Future-State Predicting LSTM," trains an LSTM to predict information related to future frames, which helps in distinguishing between the different types of surgeries. We evaluate our approaches on a large dataset of 425 laparoscopic videos containing nine types of surgeries (Laparo425) and achieve an average accuracy of 75% after observing only the first 10 minutes of a surgery. These results are quite promising from a practical standpoint and also encouraging for other types of image-guided surgeries.
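As a rough illustration of the CNN-LSTM pairing described above, the sketch below wires a small per-frame CNN into an LSTM over the frame sequence. The layer sizes, class count, and readout are illustrative assumptions, not the authors' model, which additionally uses the fine-tuning and future-state prediction schemes summarized in the abstract.

```python
# Minimal CNN + LSTM video classifier sketch; sizes are illustrative.
import torch
import torch.nn as nn

class CNNLSTMClassifier(nn.Module):
    def __init__(self, num_classes=9, feat_dim=128, hidden_dim=256):
        super().__init__()
        # Tiny per-frame CNN; a pretrained backbone would be used in practice.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, frames):                     # frames: (B, T, 3, H, W)
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).view(b, t, -1)
        out, _ = self.lstm(feats)
        # Classify from the last hidden state; early recognition would
        # instead read out a prediction at every time step.
        return self.head(out[:, -1])

model = CNNLSTMClassifier()
logits = model(torch.randn(2, 8, 3, 64, 64))       # 2 clips of 8 frames
print(logits.shape)                                # torch.Size([2, 9])
```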

2. Zia A, Essa I. Automated surgical skill assessment in RMIS training. Int J Comput Assist Radiol Surg 2018; 13:731-739. [PMID: 29549553] [DOI: 10.1007/s11548-018-1735-5]
Abstract
PURPOSE Manual feedback in basic robot-assisted minimally invasive surgery (RMIS) training can consume a significant amount of time from expert surgeons' schedules and is prone to subjectivity. In this paper, we explore the use of different holistic features for automated skill assessment using only robot kinematic data, and propose a weighted feature fusion technique for improving score prediction performance. Moreover, we propose a method for generating 'task highlights' that can give surgeons more directed feedback about which segments had the most effect on the final skill score. METHODS We perform our experiments on the publicly available JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS) and evaluate four different types of holistic features from robot kinematic data: sequential motion texture (SMT), discrete Fourier transform (DFT), discrete cosine transform (DCT), and approximate entropy (ApEn). The features are then used for skill classification and exact skill score prediction. Along with using these features individually, we also evaluate performance using our proposed weighted combination technique. The task highlights are produced using DCT features. RESULTS Our results demonstrate that these holistic features outperform all previous hidden Markov model (HMM)-based state-of-the-art methods for skill classification on the JIGSAWS dataset. Our proposed feature fusion strategy also significantly improves performance for skill score prediction, achieving up to a 0.61 average Spearman correlation coefficient. Moreover, we provide an analysis of how the proposed task highlights can relate to different surgical gestures within a task. CONCLUSIONS Holistic features capturing global information from robot kinematic data can successfully be used for evaluating surgeon skill in basic surgical tasks on the da Vinci robot. The presented framework can potentially allow for real-time score feedback in RMIS training and help surgical trainees have more focused training.
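Approximate entropy (ApEn), one of the holistic features named above, has a standard textbook definition; the sketch below computes it for a single kinematic channel. The parameter choices (m = 2, tolerance r = 0.2 times the signal's standard deviation) are common defaults, not necessarily those of the paper.

```python
# Standard approximate-entropy (ApEn) computation on a 1-D signal.
import numpy as np

def approximate_entropy(x, m=2, r_factor=0.2):
    x = np.asarray(x, dtype=float)
    r = r_factor * x.std()              # tolerance scaled to signal spread

    def phi(m):
        n = len(x) - m + 1
        # Embed the series into overlapping length-m windows.
        emb = np.array([x[i:i + m] for i in range(n)])
        # Chebyshev distance between all pairs of windows.
        d = np.max(np.abs(emb[:, None] - emb[None, :]), axis=2)
        c = (d <= r).mean(axis=1)       # fraction of near neighbours
        return np.log(c).mean()

    return phi(m) - phi(m + 1)

rng = np.random.default_rng(0)
regular = np.sin(np.linspace(0, 20 * np.pi, 500))
noisy = rng.standard_normal(500)
print(approximate_entropy(regular))     # low: predictable motion
print(approximate_entropy(noisy))       # high: irregular motion
```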
Affiliation(s)
- Aneeq Zia, College of Computing, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Irfan Essa, College of Computing, Georgia Institute of Technology, Atlanta, GA 30332, USA

3. Video and accelerometer-based motion analysis for automated surgical skills assessment. Int J Comput Assist Radiol Surg 2018; 13:443-455. [PMID: 29380122] [DOI: 10.1007/s11548-018-1704-z]
Abstract
PURPOSE The basic surgical skills of suturing and knot tying are an essential part of medical training. Having an automated system for surgical skills assessment could help save experts' time and improve training efficiency. There have been some recent attempts at automated surgical skills assessment using either video analysis or acceleration data. In this paper, we present a novel approach for automated assessment of OSATS-like surgical skills and provide an analysis of different features on multi-modal data (video and accelerometer data). METHODS We conduct a large study of basic surgical skill assessment on a dataset that contains video and accelerometer data for suturing and knot-tying tasks. We introduce 'entropy-based' features: approximate entropy and cross-approximate entropy, which quantify the amount of predictability and regularity of fluctuations in time-series data. The proposed features are compared to existing methods (sequential motion texture, discrete cosine transform, and discrete Fourier transform) for surgical skills assessment. RESULTS We report the average performance of different features across all applicable OSATS-like criteria for suturing and knot-tying tasks. Our analysis shows that the proposed entropy-based features outperform previous state-of-the-art methods using video data, achieving average classification accuracies of 95.1% and 92.2% for suturing and knot tying, respectively. For accelerometer data, our method performs better for suturing, achieving 86.8% average accuracy. We also show that fusing video and acceleration features can improve overall performance for skill assessment. CONCLUSION Automated surgical skills assessment can be achieved with high accuracy using the proposed entropy features. Such a system can significantly improve the efficiency of surgical training in medical schools and teaching hospitals.
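The DCT/DFT features compared above are low-frequency transform coefficients of the motion channels; a minimal version might look like the sketch below, where the cut-off k and the channel layout are illustrative assumptions.

```python
# Keep the first k low-frequency DCT/DFT coefficients of each motion
# channel as a fixed-length holistic descriptor.
import numpy as np
from scipy.fft import dct

def frequency_features(motion, k=20):
    """motion: (T, C) array of acceleration/position channels."""
    feats = []
    for ch in motion.T:
        feats.append(dct(ch, norm='ortho')[:k])          # DCT coefficients
        feats.append(np.abs(np.fft.rfft(ch))[:k])        # DFT magnitudes
    return np.concatenate(feats)

demo = np.random.default_rng(1).standard_normal((400, 3))  # 3-axis accel
print(frequency_features(demo).shape)                      # (120,)
```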

4. Loukas C. Video content analysis of surgical procedures. Surg Endosc 2017; 32:553-568. [PMID: 29075965] [DOI: 10.1007/s00464-017-5878-1]
Abstract
BACKGROUND In addition to its therapeutic benefits, minimally invasive surgery offers the potential for video recording of the operation. The videos may be archived and used later for purposes such as cognitive training, skills assessment, and workflow analysis. Methods from the broader field of video content analysis and representation are increasingly applied in the surgical domain. In this paper, we review recent developments and analyze future directions in the field of content-based video analysis of surgical operations. METHODS The reviewed literature was retrieved from PubMed and Google Scholar searches on combinations of the following keywords: 'surgery', 'video', 'phase', 'task', 'skills', 'event', 'shot', 'analysis', 'retrieval', 'detection', 'classification', and 'recognition'. The collected articles were categorized and reviewed based on the technical goal sought, the type of surgery performed, and the structure of the operation. RESULTS A total of 81 articles were included. Publication activity is constantly increasing; more than 50% of these articles were published in the last 3 years. Significant research has been performed on video task detection and retrieval in eye surgery. In endoscopic surgery, the research activity is more diverse: gesture/task classification, skills assessment, tool type recognition, and shot/event detection and retrieval. Recent works employ deep neural networks for phase and tool recognition as well as shot detection. CONCLUSIONS Content-based video analysis of surgical operations is a rapidly expanding field. Several future prospects for research exist, including, inter alia, shot boundary detection, keyframe extraction, video summarization, pattern discovery, and video annotation. The development of publicly available benchmark datasets to evaluate and compare task-specific algorithms is essential.
Affiliation(s)
- Constantinos Loukas, Laboratory of Medical Physics, Medical School, National and Kapodistrian University of Athens, Mikras Asias 75 str., 11527 Athens, Greece

5. Stauder R, Ostler D, Vogel T, Wilhelm D, Koller S, Kranzfelder M, Navab N. Surgical data processing for smart intraoperative assistance systems. Innov Surg Sci 2017; 2:145-152. [PMID: 31579746] [PMCID: PMC6754013] [DOI: 10.1515/iss-2017-0035]
Abstract
Different components of the newly defined field of surgical data science have been under investigation in our groups for more than a decade. In this paper, we describe our sensor-driven approaches to workflow recognition, which require no explicit models; our current aim is to apply this knowledge to enable context-aware surgical assistance systems, such as a unified surgical display and robotic assistance systems. The methods we have evaluated over time include dynamic time warping, hidden Markov models, random forests, and, more recently, deep neural networks, specifically convolutional neural networks.
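As a concrete example of one of the listed methods, the sketch below trains a random forest to map windowed sensor features to workflow phases. The features, labels, and windowing are synthetic placeholders, not the authors' setup.

```python
# Random-forest phase recognition on windowed sensor features (sketch).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.standard_normal((600, 12))        # e.g. 12 sensor channels per window
y = rng.integers(0, 4, 600)               # 4 workflow phases (placeholder labels)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X[:500], y[:500])
print(clf.score(X[500:], y[500:]))        # chance-level on random data
```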
Affiliation(s)
- Ralf Stauder, Chair for Computer Aided Medical Procedures, Technical University of Munich, Munich, Germany
- Daniel Ostler, Research Group for Minimally Invasive Interdisciplinary Therapeutical Interventions, Klinikum rechts der Isar, Technical University of Munich, Munich, Germany
- Thomas Vogel, Research Group for Minimally Invasive Interdisciplinary Therapeutical Interventions, Klinikum rechts der Isar, Technical University of Munich, Munich, Germany
- Dirk Wilhelm, Research Group for Minimally Invasive Interdisciplinary Therapeutical Interventions, Klinikum rechts der Isar, Technical University of Munich, Munich, Germany
- Sebastian Koller, Research Group for Minimally Invasive Interdisciplinary Therapeutical Interventions, Klinikum rechts der Isar, Technical University of Munich, Munich, Germany
- Michael Kranzfelder, Research Group for Minimally Invasive Interdisciplinary Therapeutical Interventions, Klinikum rechts der Isar, Technical University of Munich, Munich, Germany
- Nassir Navab, Chair for Computer Aided Medical Procedures, Technical University of Munich, Munich, Germany; Department of Computer Science, The Johns Hopkins University, Baltimore, MD, USA

6. Bovo F, De Rossi G, Visentin F. Surgical robot simulation with BBZ console. J Vis Surg 2017; 3:57. [PMID: 29078620] [DOI: 10.21037/jovs.2017.03.16]
Abstract
This paper presents a lean approach to training in robot-assisted surgery. Minimally invasive surgical procedures can be decomposed into a sequence of tasks, and each surgical task can be further decomposed into basic gestures. Each surgical gesture may appear similar to perform in laparoscopic and robot-assisted technique, but surgeon posture, tool dexterity, and force and vision feedback differ. As a consequence, performing a robot-assisted procedure requires specific training. Currently, the robot most used in abdominal and pelvic surgery is the da Vinci Surgical System, and a different set of skills is needed to master the human-machine interface of this device. Training with the real robot is very expensive due to the high initial cost of purchasing and maintaining the robotic surgical system and the ethics involved in in vivo practice. For these reasons, different training systems based on virtual reality have been developed. The realism of the simulation physics and the objective metrics collected during task execution are the main features determining the effectiveness of a virtual-reality-based training device. Availability of training systems is another issue. To help surgeons train in virtual reality, BBZ presents a compact, lightweight, and portable console, suitable also for "home" training.
Affiliation(s)
- Giacomo De Rossi, Department of Computer Science, Università degli Studi di Verona, Verona, Italy
- Francesco Visentin, Department of Computer Science, Università degli Studi di Verona, Verona, Italy; Department of Intelligent Interaction Technologies, Tsukuba University, Tsukuba, Japan

7. Ahmidi N, Tao L, Sefati S, Gao Y, Lea C, Haro BB, Zappella L, Khudanpur S, Vidal R, Hager GD. A Dataset and Benchmarks for Segmentation and Recognition of Gestures in Robotic Surgery. IEEE Trans Biomed Eng 2017; 64:2025-2041. [PMID: 28060703] [DOI: 10.1109/tbme.2016.2647680]
Abstract
OBJECTIVE State-of-the-art techniques for surgical data analysis report promising results for automated skill assessment and action recognition. The contributions of many of these techniques, however, are limited to study-specific data and validation metrics, making assessment of progress across the field extremely challenging. METHODS In this paper, we address two major problems for surgical data analysis: first, the lack of uniformly shared datasets and benchmarks, and second, the lack of consistent validation processes. We address the former by presenting the JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS), a public dataset that we have created to support comparative research benchmarking. JIGSAWS contains synchronized video and kinematic data from multiple performances of robotic surgical tasks by operators of varying skill. We address the latter by presenting a well-documented evaluation methodology and reporting results for six techniques for automated segmentation and classification of time-series data on JIGSAWS. These techniques comprise four temporal approaches for joint segmentation and classification (hidden Markov model (HMM), sparse HMM, Markov semi-Markov conditional random field, and skip-chain conditional random field) and two feature-based approaches that classify fixed segments (bag of spatiotemporal features and linear dynamical systems). RESULTS Most methods recognize gesture activities with approximately 80% overall accuracy under both leave-one-super-trial-out and leave-one-user-out cross-validation settings. CONCLUSION Current methods show promising results on this shared dataset, but room for significant progress remains, particularly for consistent prediction of gesture activities across different surgeons. SIGNIFICANCE The results reported in this paper provide the first systematic and uniform evaluation of surgical activity recognition techniques on the benchmark database.
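The leave-one-user-out protocol reported above tests generalization across surgeons by holding out all trials of one operator at a time. A minimal version, with placeholder features and an off-the-shelf classifier standing in for the benchmarked methods, might look like this:

```python
# Leave-one-user-out (LOUO) cross-validation sketch on synthetic data.
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))     # per-segment features
y = rng.integers(0, 3, 200)            # gesture labels
users = rng.integers(0, 8, 200)        # which surgeon produced each segment

scores = []
for tr, te in LeaveOneGroupOut().split(X, y, groups=users):
    model = LogisticRegression(max_iter=1000).fit(X[tr], y[tr])
    scores.append(model.score(X[te], y[te]))
print(np.mean(scores))                 # mean held-out-surgeon accuracy
```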

9. Automated video-based assessment of surgical skills for training and evaluation in medical schools. Int J Comput Assist Radiol Surg 2016; 11:1623-1636. [PMID: 27567917] [DOI: 10.1007/s11548-016-1468-2]
Abstract
PURPOSE Routine evaluation of basic surgical skills in medical schools requires considerable time and effort from supervising faculty: for each surgical trainee, a supervisor has to observe the trainee in person. Alternatively, supervisors may use training videos, which reduces some of the logistical overhead. All of these approaches, however, remain highly time-consuming and involve human bias. In this paper, we present an automated system for surgical skills assessment that analyzes video data of surgical activities. METHOD We compare different techniques for video-based surgical skill evaluation: techniques that capture motion information at a coarser granularity using symbols or words, techniques that extract motion dynamics using textural patterns in a frame kernel matrix, and techniques that analyze fine-grained motion information using frequency analysis. RESULTS We were able to classify surgeons into different skill levels with high accuracy. Our results indicate that fine-grained analysis of motion dynamics via frequency analysis is most effective at capturing the skill-relevant information in surgical videos. CONCLUSION Our evaluations show that frequency features perform better than motion texture features, which in turn perform better than symbol-/word-based features. Put succinctly, skill classification accuracy is positively correlated with motion granularity, as demonstrated by our results on two challenging video datasets.
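Among the feature families compared above, the frame-kernel-matrix idea turns per-frame motion features into a T x T similarity matrix whose texture reflects the motion dynamics. The sketch below summarizes that matrix with simple lag statistics rather than the paper's texture descriptors.

```python
# Frame-kernel-matrix sketch: similarity of frame features over time.
import numpy as np

def kernel_matrix_features(frame_feats, sigma=1.0):
    d2 = ((frame_feats[:, None] - frame_feats[None, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * sigma ** 2))          # T x T similarity matrix
    # Mean similarity at each temporal lag: smooth motion decays slowly.
    lags = [np.mean(np.diag(K, k)) for k in range(1, 11)]
    return np.array(lags)

feats = np.cumsum(np.random.default_rng(2).standard_normal((120, 5)), axis=0)
print(kernel_matrix_features(feats))
```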

10. System events: readily accessible features for surgical phase detection. Int J Comput Assist Radiol Surg 2016; 11:1201-1209. [PMID: 27177760] [DOI: 10.1007/s11548-016-1409-0]
Abstract
PURPOSE Surgical phase recognition using sensor data is challenging due to high variation in patient anatomy and surgeon-specific operating styles. Segmenting surgical procedures into constituent phases is of significant utility for resident training, education, self-review, and context-aware operating room technologies. Phase annotation is a highly labor-intensive task and would benefit greatly from automated solutions. METHODS We propose a novel approach using system events (for example, activation of cautery tools) that are easily captured in most surgical procedures. Our method involves extracting event-based features over 90-second intervals and assigning a phase label to each interval. We explore three classification techniques: support vector machines, random forests, and temporal convolutional neural networks. Each of these models independently predicts a label for each time interval. We also examine segmental inference using an approach based on the semi-Markov conditional random field, which jointly performs phase segmentation and classification. Our method is evaluated on a dataset of 24 robot-assisted hysterectomy procedures. RESULTS Our framework detects surgical phases with an accuracy of 74% using event-based features over a set of five different phases: ligation, dissection, colpotomy, cuff closure, and background. Precision and recall values for the cuff closure (precision: 83%, recall: 98%) and dissection (precision: 75%, recall: 88%) classes were higher than for the other classes. The normalized Levenshtein distance between the predicted and ground-truth phase sequences was 25%. CONCLUSIONS Our findings demonstrate that system-event features are useful for automatically detecting surgical phases. Events contain phase information that cannot be obtained from motion data and that would require advanced computer vision algorithms to extract from video. Many of these events are not specific to robotic surgery and can easily be recorded in non-robotic surgical modalities. In future work, we plan to combine information from system events, tool motion, and videos to automate phase detection in surgical procedures.
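The 25% figure above is a normalized Levenshtein (edit) distance between the predicted and ground-truth phase sequences; a textbook implementation follows. The phase names come from the abstract, but the example sequences are invented.

```python
# Normalized Levenshtein distance between two label sequences.
import numpy as np

def normalized_levenshtein(a, b):
    m, n = len(a), len(b)
    d = np.zeros((m + 1, n + 1), dtype=int)
    d[:, 0] = np.arange(m + 1)
    d[0, :] = np.arange(n + 1)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i, j] = min(d[i - 1, j] + 1,         # deletion
                          d[i, j - 1] + 1,         # insertion
                          d[i - 1, j - 1] + cost)  # substitution
    return d[m, n] / max(m, n)

truth = ["ligation", "dissection", "colpotomy", "cuff closure", "background"]
pred  = ["ligation", "dissection", "dissection", "cuff closure", "background"]
print(normalized_levenshtein(truth, pred))   # 0.2
```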

11. Gao Y, Vedula SS, Lee GI, Lee MR, Khudanpur S, Hager GD. Query-by-example surgical activity detection. Int J Comput Assist Radiol Surg 2016; 11:987-996. [PMID: 27072835] [DOI: 10.1007/s11548-016-1386-3]
Abstract
PURPOSE Easy acquisition of surgical data opens many opportunities to automate skill evaluation and teaching. Current technology for searching tool motion data for surgical activity segments of interest is limited by the need for manual pre-processing, which can be prohibitive at scale. We developed a content-based information retrieval method, query-by-example (QBE), to automatically detect activity segments within long surgical data recordings that match a query. METHODS The example segment of interest (query) and the surgical data recording (target trial) are time series of kinematics. Our approach includes an unsupervised feature learning module using a stacked denoising autoencoder (SDAE), two scoring modules based on asymmetric subsequence dynamic time warping (AS-DTW) and template matching, respectively, and a detection module. A distance matrix of the query against the trial is computed using the SDAE features, followed by AS-DTW combined with template scoring, to generate a ranked list of candidate subsequences (substrings). To evaluate the quality of the ranked list against the ground truth, thresholding of conventional DTW distances and bipartite matching are applied. We computed the recall, precision, F1-score, and a Jaccard-index-based score on three experimental setups. We evaluated our QBE method using a suture throw maneuver as the query, on two tool motion datasets (JIGSAWS and MISTIC-SL) captured in a training laboratory. RESULTS We observed recall of 93%, 90%, and 87% and precision of 93%, 91%, and 88% with the same-surgeon-same-trial (SSST), same-surgeon-different-trial (SSDT), and different-surgeon (DS) setups on JIGSAWS, and recall of 87%, 81%, and 75% and precision of 72%, 61%, and 53% with the SSST, SSDT, and DS setups on MISTIC-SL, respectively. CONCLUSION We developed a novel content-based information retrieval method to automatically detect multiple instances of an activity within long surgical recordings. Our method demonstrated adequate recall across datasets of different complexity and experimental conditions.
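Subsequence DTW, the core of the AS-DTW scoring above, lets a short query start and end anywhere inside a long trial by not accumulating cost along the first query row. The sketch below is the standard simplification on 1-D features, not the paper's asymmetric variant over SDAE features.

```python
# Subsequence DTW: best-matching substring of a long trial for a query.
import numpy as np

def subsequence_dtw(query, trial):
    q, t = len(query), len(trial)
    cost = np.abs(query[:, None] - trial[None, :])   # local distances (1-D)
    D = np.full((q, t), np.inf)
    D[0] = cost[0]                                   # free start anywhere in trial
    for i in range(1, q):
        D[i, 0] = cost[i, 0] + D[i - 1, 0]
        for j in range(1, t):
            D[i, j] = cost[i, j] + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    end = int(np.argmin(D[-1]))                      # free end: best last-row column
    return float(D[-1, end]), end

query = np.sin(np.linspace(0, np.pi, 30))
trial = np.concatenate([np.zeros(50), query, np.zeros(50)])
print(subsequence_dtw(query, trial))                 # score 0.0, end index 79
```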
Affiliation(s)
- Yixin Gao, Department of Computer Science, Whiting School of Engineering, The Johns Hopkins University, Baltimore, MD 21218, USA
- S Swaroop Vedula, Department of Computer Science, Whiting School of Engineering, The Johns Hopkins University, Baltimore, MD 21218, USA
- Gyusung I Lee, Department of Surgery, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
- Mija R Lee, Department of Surgery, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
- Sanjeev Khudanpur, Department of Electrical and Computer Engineering, Whiting School of Engineering, The Johns Hopkins University, Baltimore, MD 21218, USA
- Gregory D Hager, Department of Computer Science, Whiting School of Engineering, The Johns Hopkins University, Baltimore, MD 21218, USA

12. Vedula SS, Malpani AO, Tao L, Chen G, Gao Y, Poddar P, Ahmidi N, Paxton C, Vidal R, Khudanpur S, Hager GD, Chen CCG. Analysis of the Structure of Surgical Activity for a Suturing and Knot-Tying Task. PLoS One 2016; 11:e0149174. [PMID: 26950551] [PMCID: PMC4780814] [DOI: 10.1371/journal.pone.0149174]
Abstract
Background Surgical tasks are performed in a sequence of steps, and technical skill evaluation includes assessing task flow efficiency. Our objective was to describe differences in task flow between expert and novice surgeons for a basic surgical task. Methods We used a hierarchical semantic vocabulary to decompose and annotate maneuvers and gestures for 135 instances of a surgeon's knot performed by 18 surgeons. We compared counts of maneuvers and gestures, and analyzed task flow by skill level. Results Experts used fewer gestures to perform the task (26.29; 95% CI = 25.21 to 27.38 for experts vs. 31.30; 95% CI = 29.05 to 33.55 for novices) and made fewer errors in gestures than novices (1.00; 95% CI = 0.61 to 1.39 vs. 2.84; 95% CI = 2.3 to 3.37). Transitions among maneuvers, and among gestures within each maneuver, were more predictable in expert trials than in novice trials. Conclusions Activity segments and state flow transitions within a basic surgical task differ by surgical skill level and can be used to provide targeted feedback to surgical trainees.
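The predictability result above can be made concrete with a gesture-transition matrix whose row entropies measure how stereotyped the task flow is; the sketch below uses invented label sequences, not the study's annotations.

```python
# Transition-matrix predictability: lower row entropy = more stereotyped flow.
import numpy as np

def transition_matrix(sequences, n_labels):
    T = np.zeros((n_labels, n_labels))
    for seq in sequences:
        for a, b in zip(seq[:-1], seq[1:]):
            T[a, b] += 1
    T /= T.sum(axis=1, keepdims=True).clip(min=1)    # row-normalise
    return T

def mean_row_entropy(T):
    P = T.clip(min=1e-12)
    return float(-(T * np.log2(P)).sum(axis=1).mean())  # bits

expert = [[0, 1, 2, 3, 0, 1, 2, 3]] * 5                 # stereotyped flow
novice = [[0, 2, 1, 3, 2, 0, 3, 1], [0, 1, 3, 2, 0, 2, 1, 3]]
print(mean_row_entropy(transition_matrix(expert, 4)))   # 0.0: fully predictable
print(mean_row_entropy(transition_matrix(novice, 4)))   # higher: variable flow
```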
Affiliation(s)
- S. Swaroop Vedula, Department of Computer Science, The Johns Hopkins University, Baltimore, Maryland, United States of America
- Anand O. Malpani, Department of Computer Science, The Johns Hopkins University, Baltimore, Maryland, United States of America
- Lingling Tao, Department of Electrical and Computer Engineering, The Johns Hopkins University, Baltimore, Maryland, United States of America
- George Chen, Department of Biomedical Engineering, The Johns Hopkins University, Baltimore, Maryland, United States of America
- Yixin Gao, Department of Computer Science, The Johns Hopkins University, Baltimore, Maryland, United States of America
- Piyush Poddar, Department of Computer Science, The Johns Hopkins University, Baltimore, Maryland, United States of America
- Narges Ahmidi, Department of Computer Science, The Johns Hopkins University, Baltimore, Maryland, United States of America
- Christopher Paxton, Department of Computer Science, The Johns Hopkins University, Baltimore, Maryland, United States of America
- Rene Vidal, Department of Biomedical Engineering, The Johns Hopkins University, Baltimore, Maryland, United States of America
- Sanjeev Khudanpur, Department of Computer Science and Department of Electrical and Computer Engineering, The Johns Hopkins University, Baltimore, Maryland, United States of America
- Gregory D. Hager, Department of Computer Science, The Johns Hopkins University, Baltimore, Maryland, United States of America
- Chi Chiung Grace Chen, Department of Gynecology and Obstetrics, Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America

13. A study of crowdsourced segment-level surgical skill assessment using pairwise rankings. Int J Comput Assist Radiol Surg 2015; 10:1435-1447. [PMID: 26133652] [DOI: 10.1007/s11548-015-1238-6]
Abstract
PURPOSE Currently available methods for surgical skills assessment are either subjective or provide only global evaluations for the overall task. Such global evaluations do not inform trainees about where in the task they need to perform better. In this study, we investigated the reliability and validity of a framework for generating objective skill assessments for segments within a task, and compared assessments from our framework, using crowdsourced segment ratings from surgically untrained individuals and expert surgeons, against manually assigned global rating scores. METHODS Our framework includes (1) a binary classifier trained to generate preferences for pairs of task segments (i.e., given a pair of segments, a specification of which one was performed better), (2) computation of segment-level percentile scores based on the preferences, and (3) prediction of task-level scores using the segment-level scores. We conducted a crowdsourcing user study to obtain manual preferences for segments within a suturing and knot-tying task from a crowd of surgically untrained individuals and a group of experts. We analyzed the inter-rater reliability of preferences obtained from the crowd and experts, and investigated the validity of task-level scores obtained using our framework. In addition, we compared the accuracy of the crowd and expert preference classifiers, as well as the segment- and task-level scores obtained from the classifiers. RESULTS We observed moderate inter-rater reliability within the crowd (Fleiss' kappa, κ = 0.41) and experts (κ = 0.55). For both the crowd and experts, the accuracy of an automated classifier trained using all the task segments was above par compared with the inter-rater agreement (crowd classifier 85% (SE 2%), expert classifier 89% (SE 3%)). We predicted the overall global rating scores (GRS) for the task with a root-mean-squared error lower than one standard deviation of the ground-truth GRS. We observed a high correlation between segment-level scores (ρ ≥ 0.86) obtained using the crowd and expert preference classifiers. The task-level scores obtained using the crowd and expert preference classifiers were also highly correlated with each other (ρ ≥ 0.84), and statistically equivalent within a margin of two points (for a score ranging from 6 to 30). Our analyses, however, did not demonstrate statistical significance in the equivalence of accuracy between the crowd and expert classifiers within a 10% margin. CONCLUSIONS Our framework, implemented using crowdsourced pairwise comparisons, leads to valid objective surgical skill assessment for segments within a task and for the task overall. Crowdsourcing yields reliable pairwise comparisons of skill for segments within a task with high efficiency. Our framework may be deployed within surgical training programs for objective, automated, and standardized evaluation of technical skills.
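Step (2) of the framework, converting pairwise preferences into segment-level percentile scores, can be approximated with simple win rates, as in the sketch below; the paper's exact scoring may differ.

```python
# Percentile scores from pairwise segment preferences via win rates.
import numpy as np

def percentile_scores(prefs, n_segments):
    """prefs: list of (winner, loser) segment-index pairs."""
    wins = np.zeros(n_segments)
    games = np.zeros(n_segments)
    for w, l in prefs:
        wins[w] += 1
        games[w] += 1
        games[l] += 1
    rates = wins / games.clip(min=1)
    ranks = rates.argsort().argsort()            # rank each segment by win rate
    return 100.0 * ranks / (n_segments - 1)

prefs = [(0, 1), (0, 2), (1, 2), (0, 3), (3, 2)]
print(percentile_scores(prefs, 4))               # segment 0 at the 100th percentile
```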

14. Quellec G, Charrière K, Lamard M, Droueche Z, Roux C, Cochener B, Cazuguel G. Real-time recognition of surgical tasks in eye surgery videos. Med Image Anal 2014; 18:579-590. [DOI: 10.1016/j.media.2014.02.007]

15. Unger M, Chalopin C, Neumuth T. Vision-based online recognition of surgical activities. Int J Comput Assist Radiol Surg 2014; 9:979-986. [PMID: 24664268] [DOI: 10.1007/s11548-014-0994-z]
Abstract
PURPOSE Surgical processes are complex entities characterized by expressive models and data, and recognizable activities define each surgical process. The principal limitation of current vision-based recognition methods is inefficiency due to the large amount of information captured during a surgical procedure. To overcome this technical challenge, we introduce a surgical gesture recognition system based on temperature information. METHODS An infrared thermal camera was combined with a hierarchical temporal memory and used during surgical procedures. The recordings were analyzed for recognition of surgical activities. The acquired image sequences included hand temperatures, which were analyzed to perform gesture extraction and recognition based on heat differences between the surgeon's warm hands and the colder background of the environment. RESULTS The system was validated by simulating functional endoscopic sinus surgery, a common type of otolaryngologic surgery. The thermal camera was directed toward the hands of the surgeon while handling different instruments. The system achieved an online recognition accuracy of 96%, with precision and recall rates of approximately 60%. CONCLUSION Vision-based recognition methods are the current best-practice approach for monitoring surgical processes. The problems of information overflow and extended recognition times in vision-based approaches were overcome by changing the spectral range to infrared. This change enables real-time recognition of surgical activities and provides online monitoring information to surgical assistance systems and workflow management systems.
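The core cue is that hands are warmer than the operating background, so even a fixed temperature threshold yields a hand mask. The temperatures and threshold below are illustrative, and the paper's hierarchical temporal memory is not reproduced here.

```python
# Threshold-based hand segmentation in a thermal frame (sketch).
import numpy as np

def segment_hands(thermal_frame, threshold_c=30.0):
    mask = thermal_frame > threshold_c               # warm pixels = hands
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return mask, None
    centroid = (float(xs.mean()), float(ys.mean()))  # crude hand position
    return mask, centroid

frame = np.full((120, 160), 22.0)                    # room-temperature background
frame[40:60, 70:100] = 34.0                          # a warm hand region
mask, centroid = segment_hands(frame)
print(mask.sum(), centroid)                          # 600 pixels, centre (84.5, 49.5)
```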
Affiliation(s)
- Michael Unger, Innovation Center Computer Assisted Surgery, University of Leipzig, Semmelweisstr. 14, 04103 Leipzig, Germany
- Claire Chalopin, Innovation Center Computer Assisted Surgery, University of Leipzig, Semmelweisstr. 14, 04103 Leipzig, Germany
- Thomas Neumuth, Innovation Center Computer Assisted Surgery, University of Leipzig, Semmelweisstr. 14, 04103 Leipzig, Germany

16. Fast part-based classification for instrument detection in minimally invasive surgery. Med Image Comput Comput Assist Interv 2014; 17:692-699. [PMID: 25485440] [DOI: 10.1007/978-3-319-10470-6_86]
Abstract
Automatic visual detection of instruments in minimally invasive surgery (MIS) can significantly augment the procedure experience for operating clinicians. In this paper, we present a novel technique for detecting surgical instruments by constructing a robust and reliable instrument-part detector. While such detectors are typically slow to evaluate, we introduce a novel early-stopping scheme for multiclass ensemble classifiers which acts as a cascade and significantly reduces the computational requirements at test time, ultimately allowing the detector to run at frame rate. We evaluate the effectiveness of our approach on instrument detection in retinal microsurgery and laparoscopic image sequences and demonstrate significant improvements in both accuracy and speed.
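The early-stopping idea can be sketched as halting a multiclass ensemble once the leading class's margin exceeds what the remaining stages could plausibly overturn. The stopping rule and margin below are illustrative stand-ins for the paper's learned criterion.

```python
# Early-exit evaluation of a multiclass ensemble (cascade-style sketch).
import numpy as np

def cascade_predict(weak_scores, margin=2.0):
    """weak_scores: (n_weak, n_classes) per-stage class scores."""
    total = np.zeros(weak_scores.shape[1])
    remaining = weak_scores.shape[0]
    for k, s in enumerate(weak_scores):
        total += s
        remaining -= 1
        best, second = np.sort(total)[-2:][::-1]
        # Stop once the lead exceeds what remaining stages could overturn
        # (assumes per-stage score swings are bounded by `margin`).
        if best - second > margin * remaining:
            return int(total.argmax()), k + 1    # class, stages evaluated
    return int(total.argmax()), weak_scores.shape[0]

rng = np.random.default_rng(3)
scores = rng.standard_normal((50, 4))
scores[:, 2] += 1.5                              # class 2 dominates
print(cascade_predict(scores))                   # early exit before 50 stages
```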

17. Kwitt R, Vasconcelos N, Razzaque S, Aylward S. Localizing target structures in ultrasound video - a phantom study. Med Image Anal 2013; 17:712-722. [PMID: 23746488] [PMCID: PMC3737575] [DOI: 10.1016/j.media.2013.05.003]
Abstract
The problem of localizing specific anatomic structures using ultrasound (US) video is considered. This involves automatically determining when a US probe is acquiring images of a previously defined object of interest during the course of a US examination. Localization using US is motivated by the increased availability of portable, low-cost US probes, which inspire applications where inexperienced personnel and even first-time users acquire US data that is then sent to experts for further assessment. This process is of particular interest for routine examinations in underserved populations as well as for patient triage after natural disasters and large-scale accidents, where experts may be in short supply. The proposed localization approach is motivated by research in the area of dynamic texture analysis and leverages several recent advances in the field of activity recognition. For evaluation, we introduce an annotated and publicly available database of US video acquired on three phantoms. Several experiments reveal the challenges of applying video analysis approaches to US images and demonstrate that good localization performance is possible with the proposed solution.
Affiliation(s)
- R Kwitt, Kitware Inc., Chapel Hill, NC, USA

18. Lalys F, Jannin P. Surgical process modelling: a review. Int J Comput Assist Radiol Surg 2013; 9:495-511. [PMID: 24014322] [DOI: 10.1007/s11548-013-0940-5]
Abstract
PURPOSE Surgery is continuously subject to technological and medical innovations that are transforming daily surgical routines. In order to gain a better understanding and description of surgeries, the field of surgical process modelling (SPM) has recently emerged. The challenge is to support surgery through the quantitative analysis and understanding of operating room activities. Related surgical process models can then be introduced into a new generation of computer-assisted surgery systems. METHODS In this paper, we present a review of the literature dealing with SPM. This methodological review was based on a Google Scholar search using the specific keywords: 'surgical process analysis', 'surgical process model' and 'surgical workflow analysis'. RESULTS This paper gives an overview of current approaches in the field that study the procedural aspects of surgery. We propose a classification of the domain that helps to summarise and describe the most important components of each paper reviewed, i.e., acquisition, modelling, analysis, application, and validation/evaluation. These five aspects are presented independently, along with an exhaustive list of their possible instantiations taken from the studied publications. CONCLUSION This review provides a greater understanding of the SPM field and introduces related future prospects.
Affiliation(s)
- Florent Lalys, University of Rennes I, LTSI, 35000 Rennes, France

19. Surgical gesture classification from video and kinematic data. Med Image Anal 2013; 17:732-745. [PMID: 23706754] [DOI: 10.1016/j.media.2013.04.007]
Abstract
Much of the existing work on automatic classification of gestures and skill in robotic surgery is based on dynamic cues (e.g., time to completion, speed, forces, torque) or kinematic data (e.g., robot trajectories and velocities). While videos could be equally or more discriminative (e.g., videos contain semantic information not present in kinematic data), they are typically not used because of the difficulties associated with automatic video interpretation. In this paper, we propose several methods for automatic surgical gesture classification from video data. We assume that the video of a surgical task (e.g., suturing) has been segmented into video clips corresponding to a single gesture (e.g., grabbing the needle, passing the needle) and propose three methods to classify the gesture of each video clip. In the first, we model each video clip as the output of a linear dynamical system (LDS) and use metrics in the space of LDSs to classify new video clips. In the second, we use spatio-temporal features extracted from each video clip to learn a dictionary of spatio-temporal words and use a bag-of-features (BoF) approach to classify new video clips. In the third, we use multiple kernel learning (MKL) to combine the LDS and BoF approaches. Since the LDS approach is also applicable to kinematic data, we also use MKL to combine both types of data in order to exploit their complementarity. Our experiments on a typical surgical training setup show that methods based on video data perform as well as, if not better than, state-of-the-art approaches based on kinematic data. In turn, the combination of kinematic and video data outperforms any algorithm based on one type of data alone.
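Of the three methods above, the bag-of-features route is the simplest to sketch: cluster local spatio-temporal descriptors into a dictionary, histogram each clip over the dictionary words, and classify with an SVM. The descriptors below are synthetic stand-ins for the paper's spatio-temporal features.

```python
# Bag-of-features gesture classification sketch with a k-means dictionary.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(4)
labels = rng.integers(0, 2, 40)                    # gesture class per clip
# Pretend each clip yields 80 local descriptors of dimension 16.
clips = [rng.standard_normal((80, 16)) + y for y in labels]

kmeans = KMeans(n_clusters=32, n_init=10, random_state=0)
kmeans.fit(np.vstack(clips))                       # learn the visual dictionary

def histogram(clip):
    words = kmeans.predict(clip)                   # assign descriptors to words
    return np.bincount(words, minlength=32) / len(words)

X = np.array([histogram(c) for c in clips])
clf = SVC(kernel='rbf').fit(X[:30], labels[:30])
print(clf.score(X[30:], labels[30:]))              # held-out clip accuracy
```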

20. Tao L, Zappella L, Hager GD, Vidal R. Surgical gesture segmentation and recognition. Med Image Comput Comput Assist Interv 2013; 16:339-346. [PMID: 24505779] [DOI: 10.1007/978-3-642-40760-4_43]
Abstract
Automatic surgical gesture segmentation and recognition can provide useful feedback for surgical training in robotic surgery. Most prior work in this field relies on the robot's kinematic data. Although recent work [1,2] shows that the robot's video data can be equally effective for surgical gesture recognition, the segmentation of the video into gestures is assumed to be known. In this paper, we propose a framework for joint segmentation and recognition of surgical gestures from kinematic and video data. Unlike prior work that relies on either frame-level kinematic cues, or segment-level kinematic or video cues, our approach exploits both cues by using a combined Markov/semi-Markov conditional random field (MsM-CRF) model. Our experiments show that the proposed model improves over a Markov or semi-Markov CRF when using video data alone, gives results that are comparable to state-of-the-art methods on kinematic data alone, and improves over state-of-the-art methods when combining kinematic and video data.
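The MsM-CRF itself is beyond a short sketch, but its Markov half reduces to Viterbi decoding of per-frame scores with a label-switch penalty, which already yields a joint segmentation and labeling. The scores and penalty below are synthetic stand-ins, not the paper's learned potentials.

```python
# Viterbi decoding: per-frame gesture scores plus a transition penalty.
import numpy as np

def viterbi(frame_scores, switch_penalty=1.0):
    """frame_scores: (T, K) per-frame log-scores for K gestures."""
    T, K = frame_scores.shape
    trans = -switch_penalty * (1 - np.eye(K))      # penalise label changes
    best = frame_scores[0].copy()
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = best[:, None] + trans               # cand[prev, next]
        back[t] = cand.argmax(axis=0)
        best = cand.max(axis=0) + frame_scores[t]
    path = [int(best.argmax())]
    for t in range(T - 1, 0, -1):                  # backtrace the best path
        path.append(int(back[t, path[-1]]))
    return path[::-1]

rng = np.random.default_rng(5)
truth = np.repeat([0, 1, 2], 20)
scores = rng.standard_normal((60, 3)) * 0.5
scores[np.arange(60), truth] += 1.0                # noisy evidence for truth
print(viterbi(scores))                             # smoothed gesture segments
```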
Affiliation(s)
- Lingling Tao, Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218, USA
- Luca Zappella, Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218, USA
- Gregory D Hager, Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218, USA
- René Vidal, Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218, USA