1
Tomita H, Ienaga N, Kajita H, Hayashida T, Sugimoto M. An analysis on the effect of body tissues and surgical tools on workflow recognition in first person surgical videos. Int J Comput Assist Radiol Surg 2024:10.1007/s11548-024-03074-6. [PMID: 38411780 DOI: 10.1007/s11548-024-03074-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2023] [Accepted: 02/09/2024] [Indexed: 02/28/2024]
Abstract
PURPOSE Analysis of operative fields is expected to aid in estimating procedural workflow and evaluating surgeons' procedural skills by considering the temporal transitions during the progression of the surgery. This study proposes an automatic recognition system for procedural workflow that uses machine learning to identify and distinguish elements in the operative field, including body tissues such as fat, muscle, and dermis, along with surgical tools. METHODS We annotated approximately 908 first-person-view images of breast surgery for segmentation. The annotated images were used to train a pixel-level classifier based on Mask R-CNN. To assess the impact on procedural workflow recognition, we annotated an additional 43,007 images. A network based on the Transformer architecture was then trained on surgical images incorporating masks for body tissues and surgical tools. RESULTS Instance segmentation of each body tissue in the segmentation phase provided insights into the trend of area transitions for each tissue. At the same time, the spatial features of the surgical tools were effectively captured. For procedural workflow recognition, accounting for body tissues improved accuracy by an average of 3% over the baseline. Furthermore, the inclusion of surgical tools yielded an additional increase in accuracy of 4% compared to the baseline. CONCLUSION In this study, we revealed the contribution of the temporal transitions of body tissues and the spatial features of surgical tools to the recognition of procedural workflow in first-person-view surgical videos. Body tissues can be a crucial element, especially in open surgery. This study suggests that further improvements can be achieved by accurately identifying the surgical tools specific to each procedural workflow step.
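The "trend of area transitions for each tissue" mentioned in the results can be pictured with a small sketch. It is not the authors' pipeline: it assumes the segmentation output has already been flattened into one class label per pixel, and the function names, class list, and toy frames are hypothetical.

```python
from collections import Counter

def area_fractions(label_map, classes):
    """Fraction of pixels assigned to each class in a single frame."""
    counts = Counter(pixel for row in label_map for pixel in row)
    total = sum(len(row) for row in label_map)
    return {c: counts.get(c, 0) / total for c in classes}

def area_trend(frames, classes):
    """Per-frame area fractions, in temporal order."""
    return [area_fractions(frame, classes) for frame in frames]

# Hypothetical class names taken from the abstract's examples.
CLASSES = ["fat", "muscle", "dermis", "tool"]

# Two toy 2x2 frames standing in for per-pixel segmentation results.
frames = [
    [["dermis", "dermis"], ["dermis", "fat"]],
    [["fat", "fat"], ["muscle", "tool"]],
]

trend = area_trend(frames, CLASSES)
for t, fractions in enumerate(trend):
    print(t, fractions)
```

Plotting each class's fraction against the frame index would give one curve per tissue, which is one plausible reading of an area transition trend.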
Affiliation(s)
- Hisako Tomita
  - Graduate School of Science and Technology, Keio University, Yokohama, 2238522, Japan
- Naoto Ienaga
  - Institute of Systems and Information Engineering, University of Tsukuba, Tsukuba, 3058573, Japan
- Hiroki Kajita
  - Department of Plastic and Reconstructive Surgery, Keio University School of Medicine, Tokyo, 1608582, Japan
- Tetsu Hayashida
  - Department of Surgery, Keio University School of Medicine, Tokyo, 1608582, Japan
- Maki Sugimoto
  - Graduate School of Science and Technology, Keio University, Yokohama, 2238522, Japan
2
Huaulmé A, Harada K, Nguyen QM, Park B, Hong S, Choi MK, Peven M, Li Y, Long Y, Dou Q, Kumar S, Lalithkumar S, Hongliang R, Matsuzaki H, Ishikawa Y, Harai Y, Kondo S, Mitsuishi M, Jannin P. PEg TRAnsfer Workflow recognition challenge report: Do multimodal data improve recognition? Comput Methods Programs Biomed 2023; 236:107561. [PMID: 37119774 DOI: 10.1016/j.cmpb.2023.107561] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2022] [Revised: 04/06/2023] [Accepted: 04/18/2023] [Indexed: 05/21/2023]
Abstract
BACKGROUND AND OBJECTIVE To be context-aware, computer-assisted surgical systems require accurate, real-time automatic surgical workflow recognition. In the past several years, surgical video has been the most commonly used modality for surgical workflow recognition. However, with the democratization of robot-assisted surgery, new modalities, such as kinematics, are now accessible. Some previous methods use these new modalities as input for their models, but their added value has rarely been studied. This paper presents the design and results of the "PEg TRAnsfer Workflow recognition" (PETRAW) challenge, whose objective was to develop surgical workflow recognition methods based on one or more modalities and to study their added value. METHODS The PETRAW challenge included a data set of 150 peg transfer sequences performed on a virtual simulator. This data set included videos, kinematic data, semantic segmentation data, and annotations, which described the workflow at three levels of granularity: phase, step, and activity. Five tasks were proposed to the participants: three were related to the recognition at all granularities simultaneously using a single modality, and two addressed the recognition using multiple modalities. The mean application-dependent balanced accuracy (AD-Accuracy) was used as the evaluation metric because it takes class balance into account and is more clinically relevant than a frame-by-frame score. RESULTS Seven teams participated in at least one task, with four participating in every task. The best results were obtained by combining video and kinematic data (AD-Accuracy between 90% and 93% for the four teams that participated in all tasks). CONCLUSION The improvement of surgical workflow recognition methods using multiple modalities compared with unimodal methods was significant for all teams. However, the longer execution time required for video/kinematic-based methods (compared to only kinematic-based methods) must be considered. Indeed, one must ask whether it is wise to increase computing time by 2,000% to 20,000% only to increase accuracy by 3%. The PETRAW data set is publicly available at www.synapse.org/PETRAW to encourage further research in surgical workflow recognition.
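To see why a class-balanced score differs from a frame-by-frame score, here is a minimal sketch of plain balanced accuracy (the mean of per-class recalls). It is not the challenge's AD-Accuracy: whatever application-dependent adjustment the organizers apply is not reproduced here, and the labels and function name are hypothetical.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so each class counts equally."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)

# Toy sequence: a long "idle" phase and a short "transfer" phase.
y_true = ["idle"] * 8 + ["transfer"] * 2
# A degenerate model that always predicts the majority class.
y_pred = ["idle"] * 10

frame_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
balanced = balanced_accuracy(y_true, y_pred)

print(frame_accuracy)  # 0.8
print(balanced)        # 0.5
```

The majority-class predictor looks acceptable frame by frame but scores no better than chance once each class is weighted equally.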
Affiliation(s)
- Arnaud Huaulmé
  - Univ Rennes, INSERM, LTSI - UMR 1099, Rennes, F35000, France
- Kanako Harada
  - Department of Mechanical Engineering, the University of Tokyo, Tokyo 113-8656, Japan
- Bogyu Park
  - VisionAI hutom, Seoul, Republic of Korea
- Yonghao Long
  - Department of Computer Science & Engineering, The Chinese University of Hong Kong, Hong Kong
- Qi Dou
  - Department of Computer Science & Engineering, The Chinese University of Hong Kong, Hong Kong
- Ren Hongliang
  - National University of Singapore, Singapore, Singapore; The Chinese University of Hong Kong, Hong Kong, Hong Kong
- Hiroki Matsuzaki
  - National Cancer Center Japan East Hospital, Tokyo 104-0045, Japan
- Yuto Ishikawa
  - National Cancer Center Japan East Hospital, Tokyo 104-0045, Japan
- Yuriko Harai
  - National Cancer Center Japan East Hospital, Tokyo 104-0045, Japan
- Mamoru Mitsuishi
  - Department of Mechanical Engineering, the University of Tokyo, Tokyo 113-8656, Japan
- Pierre Jannin
  - Univ Rennes, INSERM, LTSI - UMR 1099, Rennes, F35000, France
3
Jin Y, Long Y, Gao X, Stoyanov D, Dou Q, Heng PA. Trans-SVNet: hybrid embedding aggregation Transformer for surgical workflow analysis. Int J Comput Assist Radiol Surg 2022; 17:2193-2202. [PMID: 36129573 DOI: 10.1007/s11548-022-02743-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2022] [Accepted: 08/31/2022] [Indexed: 11/05/2022]
Abstract
PURPOSE Real-time surgical workflow analysis has been a key component of computer-assisted intervention systems for improving cognitive assistance. Most existing methods rely solely on conventional temporal models and encode features with a successive spatial-temporal arrangement. The supportive benefits of intermediate features are partially lost in both visual and temporal aspects. In this paper, we rethink feature encoding to attend to and preserve the critical information for accurate workflow recognition and anticipation. METHODS We introduce the Transformer to surgical workflow analysis to reconsider the complementary effects of spatial and temporal representations. We propose a hybrid embedding aggregation Transformer, named Trans-SVNet, in which the designed spatial and temporal embeddings interact effectively by employing the spatial embedding to query the temporal embedding sequence. The model is jointly optimized with loss objectives from both analysis tasks to leverage their high correlation. RESULTS We extensively evaluate our method on three large surgical video datasets. Our method consistently outperforms state-of-the-art methods across the three datasets on the workflow recognition task. When learned jointly with anticipation, recognition results improve substantially. Our approach is also effective on anticipation, achieving promising performance. Our model achieves a real-time inference speed of 0.0134 seconds per frame. CONCLUSION Experimental results demonstrate the efficacy of our hybrid embedding integration, which rediscovers crucial cues from complementary spatial-temporal embeddings. The better performance obtained with multi-task learning indicates that the anticipation task brings additional knowledge to the recognition task. The effectiveness and efficiency of our method also show its potential for use in the operating room.
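The phrase "employing the spatial embedding to query the temporal embedding sequence" reads like a cross-attention step. The sketch below shows generic single-head scaled dot-product attention under that assumption; it is not the Trans-SVNet architecture, it omits learned projections, and all names and toy vectors are hypothetical.

```python
import math

def attend(query, keys, values):
    """Scaled dot-product attention for one query over a sequence."""
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys
    ]
    # Numerically stable softmax over the sequence positions.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    # Weighted sum of the value vectors.
    out = [
        sum(w * v[j] for w, v in zip(weights, values))
        for j in range(len(values[0]))
    ]
    return out, weights

# Toy spatial embedding of the current frame (the query).
spatial = [1.0, 0.0]
# Toy temporal embeddings of three past frames (keys and values).
temporal = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]

fused, weights = attend(spatial, temporal, temporal)
print(weights)
print(fused)
```

Frames whose temporal embedding resembles the current spatial embedding receive larger weights, so the fused vector leans toward them.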
Affiliation(s)
- Yueming Jin
  - Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS), Department of Computer Science, University College London, UK
- Yonghao Long
  - Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, HK, China
- Xiaojie Gao
  - Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, HK, China
- Danail Stoyanov
  - Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS), Department of Computer Science, University College London, UK
- Qi Dou
  - Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, HK, China; Institute of Medical Intelligence and XR, The Chinese University of Hong Kong, Shatin, HK, China
- Pheng-Ann Heng
  - Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, HK, China; Institute of Medical Intelligence and XR, The Chinese University of Hong Kong, Shatin, HK, China
4
Huaulmé A, Sarikaya D, Le Mut K, Despinoy F, Long Y, Dou Q, Chng CB, Lin W, Kondo S, Bravo-Sánchez L, Arbeláez P, Reiter W, Mitsuishi M, Harada K, Jannin P. MIcro-surgical anastomose workflow recognition challenge report. Comput Methods Programs Biomed 2021; 212:106452. [PMID: 34688174 DOI: 10.1016/j.cmpb.2021.106452] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 06/28/2021] [Accepted: 09/28/2021] [Indexed: 05/22/2023]
Abstract
BACKGROUND AND OBJECTIVE Automatic surgical workflow recognition is an essential step in developing context-aware computer-assisted surgical systems. Video recordings of surgeries are becoming widely accessible, as the operative field view is captured during laparoscopic surgeries. Head-mounted and ceiling-mounted cameras are also increasingly being used to record videos in open surgeries. This makes videos a common choice in surgical workflow recognition. Additional modalities, such as kinematic data captured during robot-assisted surgeries, could also improve workflow recognition. This paper presents the design and results of the MIcro-Surgical Anastomose Workflow recognition on training sessions (MISAW) challenge, whose objective was to develop workflow recognition models based on kinematic data and/or videos. METHODS The MISAW challenge provided a data set of 27 sequences of micro-surgical anastomosis on artificial blood vessels. This data set was composed of videos, kinematics, and workflow annotations. The latter described the sequences at three different granularity levels: phase, step, and activity. Four tasks were proposed to the participants: three of them were related to the recognition of surgical workflow at three different granularity levels, while the last one addressed the recognition of all granularity levels in the same model. We used the average application-dependent balanced accuracy (AD-Accuracy) as the evaluation metric. This takes unbalanced classes into account and is more clinically relevant than a frame-by-frame score. RESULTS Six teams participated in at least one task. All teams employed deep learning models, such as convolutional neural networks (CNN), recurrent neural networks (RNN), or a combination of both. The best models achieved accuracy above 95%, 80%, 60%, and 75% for recognition of phases, steps, activities, and multi-granularity, respectively. The RNN-based models outperformed the CNN-based ones, and the dedicated modality models outperformed the multi-granularity models, except for activity recognition. CONCLUSION For high levels of granularity, the best models had a recognition rate that may be sufficient for applications such as prediction of remaining surgical time. However, for activities, the recognition rate was still too low for applications that can be employed clinically. The MISAW data set is publicly available at http://www.synapse.org/MISAW to encourage further research in surgical workflow recognition.
Affiliation(s)
- Arnaud Huaulmé
  - Univ Rennes, INSERM, LTSI - UMR 1099, Rennes, F35000, France
- Duygu Sarikaya
  - Gazi University, Faculty of Engineering; Department of Computer Engineering, Ankara, Turkey
- Kévin Le Mut
  - Univ Rennes, INSERM, LTSI - UMR 1099, Rennes, F35000, France
- Yonghao Long
  - Department of Computer Science & Engineering, The Chinese University of Hong Kong, China; T Stone Robotics Institute, The Chinese University of Hong Kong, China
- Qi Dou
  - Department of Computer Science & Engineering, The Chinese University of Hong Kong, China; T Stone Robotics Institute, The Chinese University of Hong Kong, China
- Chin-Boon Chng
  - National University of Singapore (NUS), Singapore, Singapore; Southern University of Science and Technology (SUSTech), Shenzhen, China
- Wenjun Lin
  - National University of Singapore (NUS), Singapore, Singapore; Southern University of Science and Technology (SUSTech), Shenzhen, China
- Laura Bravo-Sánchez
  - Center for Research and Formation in Artificial Intelligence, Department of Biomedical Engineering, Universidad de los Andes, Bogotá, Colombia
- Pablo Arbeláez
  - Center for Research and Formation in Artificial Intelligence, Department of Biomedical Engineering, Universidad de los Andes, Bogotá, Colombia
- Mamoru Mitsuishi
  - Department of Mechanical Engineering, the University of Tokyo, Tokyo 113-8656, Japan
- Kanako Harada
  - Department of Mechanical Engineering, the University of Tokyo, Tokyo 113-8656, Japan
- Pierre Jannin
  - Univ Rennes, INSERM, LTSI - UMR 1099, Rennes, F35000, France
5
Xia T, Jia F. Against spatial-temporal discrepancy: contrastive learning-based network for surgical workflow recognition. Int J Comput Assist Radiol Surg 2021; 16:839-848. [PMID: 33950398 DOI: 10.1007/s11548-021-02382-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/15/2021] [Accepted: 04/16/2021] [Indexed: 11/27/2022]
Abstract
PURPOSE Automatic workflow recognition from surgical videos is fundamental and significant for developing context-aware systems in modern operating rooms. Although many approaches have been proposed to tackle challenges in this complex task, there are still many problems such as the fine-grained characteristics and spatial-temporal discrepancies in surgical videos. METHODS We propose a contrastive learning-based convolutional recurrent network with multi-level prediction to tackle these problems. Specifically, split-attention blocks are employed to extract spatial features. Through a mapping function in the step-phase branch, the current workflow can be predicted on two mutual-boosting levels. Furthermore, a contrastive branch is introduced to learn the spatial-temporal features that eliminate irrelevant changes in the environment. RESULTS We evaluate our method on the Cataract-101 dataset. The results show that our method achieves an accuracy of 96.37% with only surgical step labels, which outperforms other state-of-the-art approaches. CONCLUSION The proposed convolutional recurrent network based on step-phase prediction and contrastive learning can leverage fine-grained characteristics and alleviate spatial-temporal discrepancies to improve the performance of surgical workflow recognition.
Affiliation(s)
- Tong Xia
  - Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; University of Chinese Academy of Sciences, Beijing, China
- Fucang Jia
  - Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; University of Chinese Academy of Sciences, Beijing, China