1. Junger D, Kücherer C, Hirt B, Burgert O. Transferable situation recognition system for scenario-independent context-aware surgical assistance systems: a proof of concept. Int J Comput Assist Radiol Surg 2025;20:579-590. PMID: 39604547; PMCID: PMC11929725; DOI: 10.1007/s11548-024-03283-z.
Abstract
PURPOSE Surgical interventions and the intraoperative environment can vary greatly. A system that reliably recognizes the situation in the operating room should therefore be flexibly applicable to different surgical settings. To achieve this, transferability should be a focus during system design and development. In this paper, we demonstrate the feasibility of a transferable, scenario-independent situation recognition system (SRS) through its definition and an evaluation based on non-functional requirements. METHODS Based on a high-level concept for a transferable SRS, a proof-of-concept implementation was demonstrated using scenarios. The architecture was evaluated with a focus on the non-functional requirements of compatibility, maintainability, and portability. Moreover, transferability aspects beyond these requirements, such as the effort needed to cover new scenarios, were discussed in a subsequent argumentative evaluation. RESULTS The evaluation demonstrated the development of an SRS that can be applied to various scenarios. Furthermore, the investigation of transferability to other settings highlighted the system's characteristics regarding configurability, interchangeability, and expandability. The components can be optimized step by step to realize a versatile and efficient situation recognition that can be easily adapted to different scenarios. CONCLUSION The prototype provides a framework for scenario-independent situation recognition, suggesting broad applicability and transferability to different surgical settings. For the transfer into clinical routine, the system's modules need to be evolved, further transferability challenges addressed, and comprehensive scenarios integrated.
Affiliation(s)
- D Junger: School of Informatics, Research Group Computer Assisted Medicine (CaMed), Reutlingen University, Reutlingen, Germany
- C Kücherer: School of Informatics, Research Group Computer Assisted Medicine (CaMed), Reutlingen University, Reutlingen, Germany
- B Hirt: Faculty of Medicine, Department of Anatomy, Institute for Clinical Anatomy and Cell Analytics, Eberhard Karls University Tübingen, Tübingen, Germany
- O Burgert: School of Informatics, Research Group Computer Assisted Medicine (CaMed), Reutlingen University, Reutlingen, Germany
2. Leiderman YI, Gerber MJ, Hubschman JP, Yi D. Artificial intelligence applications in ophthalmic surgery. Curr Opin Ophthalmol 2024;35:526-532. PMID: 39145488; DOI: 10.1097/icu.0000000000001033.
Abstract
PURPOSE OF REVIEW Healthcare technologies incorporating artificial intelligence (AI) tools are experiencing rapid growth in static-image-based applications such as diagnostic imaging. Given the proliferation of AI technologies created for video-based imaging, ophthalmic microsurgery is likely to benefit significantly from the application of emerging technologies to multiple facets of the care of the surgical patient. RECENT FINDINGS Proof-of-concept research and early-phase clinical trials are in progress for AI-based surgical technologies that aim to provide preoperative planning and decision support, intraoperative image enhancement, surgical guidance, surgical decision-making support, tactical assistive technologies, enhanced surgical training and assessment of trainee progress, and semi-autonomous tool control or autonomous elements of surgical procedures. SUMMARY The proliferation of AI-based technologies in static imaging in clinical ophthalmology, the continued refinement of AI tools designed for video-based applications, and the development of AI-based digital tools in allied surgical fields suggest that ophthalmic surgery is poised for the integration of AI into the microsurgical paradigm.
Affiliation(s)
- Yannek I Leiderman: Departments of Ophthalmology and Bioengineering, University of Illinois Chicago
- Matthew J Gerber: Department of Ophthalmology, University of California at Los Angeles, Los Angeles, California, USA
- Jean-Pierre Hubschman: Department of Ophthalmology, University of California at Los Angeles, Los Angeles, California, USA
- Darvin Yi: Departments of Ophthalmology and Bioengineering, University of Illinois Chicago
3. Müller S, Jain M, Sachdeva B, Shah PN, Holz FG, Finger RP, Murali K, Wintergerst MWM, Schultz T. Artificial Intelligence in Cataract Surgery: A Systematic Review. Transl Vis Sci Technol 2024;13:20. PMID: 38618893; PMCID: PMC11033603; DOI: 10.1167/tvst.13.4.20.
Abstract
Purpose The purpose of this study was to assess the current use and reliability of artificial intelligence (AI)-based algorithms for analyzing cataract surgery videos. Methods A systematic review of the literature on intra-operative analysis of cataract surgery videos with machine learning techniques was performed. Cataract diagnosis and detection algorithms were excluded. The resulting algorithms were compared, descriptively analyzed, and their metrics summarized or visually reported. The reproducibility and reliability of the methods and results were assessed using a modified version of the Medical Image Computing and Computer-Assisted Intervention (MICCAI) checklist. Results Thirty-eight of the 550 screened studies were included: 20 addressed the challenge of instrument detection or tracking, 9 focused on phase discrimination, and 8 predicted skill and complications. Instrument detection achieves an area under the receiver operating characteristic curve (ROC AUC) between 0.976 and 0.998, instrument tracking an mAP between 0.685 and 0.929, phase recognition an ROC AUC between 0.773 and 0.990, and prediction of complications or surgical skill an ROC AUC between 0.570 and 0.970. Conclusions The studies showed wide variation in quality and pose a challenge for replication owing to the small number of public datasets (none for manual small-incision cataract surgery) and the rarity of published source code. There is no standard for reported outcome metrics, and validation of the models on external datasets is rare, making comparisons difficult. The data suggest that tracking of instruments and phase detection work well, but surgical skill and complication recognition remain a challenge for deep learning. Translational Relevance This overview of cataract surgery analysis with AI models provides translational value for improving clinician training by identifying successes and challenges.
Affiliation(s)
- Simon Müller: University Hospital Bonn, Department of Ophthalmology, Bonn, Germany
- Bhuvan Sachdeva: Microsoft Research, Bengaluru, India; Sankara Eye Hospital, Bengaluru, Karnataka, India
- Frank G. Holz: University Hospital Bonn, Department of Ophthalmology, Bonn, Germany
- Robert P. Finger: University Hospital Bonn, Department of Ophthalmology, Bonn, Germany; Department of Ophthalmology, University Medical Center Mannheim, Heidelberg University, Mannheim, Germany
- Thomas Schultz: B-IT and Department of Computer Science, University of Bonn, Bonn, Germany; Lamarr Institute for Machine Learning and Artificial Intelligence, Dortmund, Germany
4. Park M, Oh S, Jeong T, Yu S. Multi-Stage Temporal Convolutional Network with Moment Loss and Positional Encoding for Surgical Phase Recognition. Diagnostics (Basel) 2022;13:107. PMID: 36611399; PMCID: PMC9818879; DOI: 10.3390/diagnostics13010107.
Abstract
In recent times, many studies concerning surgical video analysis are being conducted due to its growing importance in many medical applications. In particular, it is very important to be able to recognize the current surgical phase because the phase information can be utilized in various ways both during and after surgery. This paper proposes an efficient phase recognition network, called MomentNet, for cholecystectomy endoscopic videos. Unlike LSTM-based networks, MomentNet is based on a multi-stage temporal convolutional network. In addition, to improve the phase prediction accuracy, the proposed method adopts a new loss function to supplement the general cross entropy loss function. The new loss function significantly improves the performance of the phase recognition network by constraining undesirable phase transitions and preventing over-segmentation. MomentNet also effectively applies positional encoding techniques, which are commonly used in transformer architectures, to the multi-stage temporal convolutional network. By using positional encoding, MomentNet can exploit important temporal context, resulting in higher phase prediction accuracy. Furthermore, MomentNet applies a label smoothing technique to suppress overfitting and replaces the backbone network for feature extraction to further improve network performance. As a result, MomentNet achieves 92.31% accuracy in the phase recognition task on the Cholec80 dataset, which is 4.55% higher than that of the baseline architecture.
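To make the ingredients named in this abstract concrete, the following is a minimal PyTorch sketch (not the authors' code) of one dilated temporal-convolution stage, a sinusoidal positional encoding added to the frame features, and a simple transition-smoothing penalty on top of label-smoothed cross entropy; the layer sizes, class count, and loss weighting are illustrative assumptions.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class DilatedTCNStage(nn.Module):
    """One stage of a multi-stage temporal convolutional network (illustrative)."""
    def __init__(self, in_dim, hidden, n_classes, n_layers=8):
        super().__init__()
        self.inp = nn.Conv1d(in_dim, hidden, 1)
        self.layers = nn.ModuleList(
            [nn.Conv1d(hidden, hidden, 3, padding=2 ** i, dilation=2 ** i)
             for i in range(n_layers)]
        )
        self.out = nn.Conv1d(hidden, n_classes, 1)

    def forward(self, x):            # x: (batch, in_dim, time)
        h = self.inp(x)
        for conv in self.layers:     # residual dilated convolutions
            h = h + F.relu(conv(h))
        return self.out(h)           # frame-wise class logits

def positional_encoding(t_len, dim):
    """Standard sinusoidal encoding, added to frame features before the TCN."""
    pos = torch.arange(t_len).unsqueeze(1).float()
    div = torch.exp(torch.arange(0, dim, 2).float() * (-math.log(10000.0) / dim))
    pe = torch.zeros(t_len, dim)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe.T                      # (dim, time)

def smoothed_phase_loss(logits, labels, lam=0.15):
    """Label-smoothed cross entropy plus a penalty on abrupt frame-to-frame
    changes of the predicted log-probabilities, discouraging over-segmentation;
    lam is an assumed weighting, not the published value."""
    ce = F.cross_entropy(logits.transpose(1, 2).reshape(-1, logits.size(1)),
                         labels.reshape(-1), label_smoothing=0.1)
    logp = F.log_softmax(logits, dim=1)
    smooth = torch.clamp((logp[:, :, 1:] - logp[:, :, :-1]) ** 2, max=16).mean()
    return ce + lam * smooth

# usage: features (batch=1, dim=64, time=200) from a frame-level backbone
feats = torch.randn(1, 64, 200) + positional_encoding(200, 64)
model = DilatedTCNStage(64, 64, n_classes=7)
logits = model(feats)
loss = smoothed_phase_loss(logits, torch.randint(0, 7, (1, 200)))
```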
Affiliation(s)
- Minyoung Park: School of Electrical and Electronics Engineering, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul 06974, Republic of Korea
- Seungtaek Oh: School of Electrical and Electronics Engineering, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul 06974, Republic of Korea
- Taikyeong Jeong: School of Artificial Intelligence Convergence, Hallym University, Chuncheon 24252, Republic of Korea
- Sungwook Yu: School of Electrical and Electronics Engineering, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul 06974, Republic of Korea
5. Hybrid Spatiotemporal Contrastive Representation Learning for Content-Based Surgical Video Retrieval. Electronics 2022. DOI: 10.3390/electronics11091353.
Abstract
In the medical field, due to their economic and clinical benefits, there is growing interest in minimally invasive and microscopic surgeries. These types of surgeries are often recorded during operations, and the recordings have become a key resource for education, patient disease analysis, surgical error analysis, and surgical skill assessment. However, manually searching such collections of long surgical videos is an extremely labor-intensive and time-consuming task, requiring an effective content-based video analysis system. Previous methods for surgical video retrieval are based on handcrafted features, which do not represent the video effectively. Deep learning-based solutions, on the other hand, have been found effective in both surgical image and video analysis, where CNN-, LSTM- and CNN-LSTM-based methods have been proposed for most surgical video analysis tasks. In this paper, we propose a hybrid spatiotemporal embedding method to enhance spatiotemporal representations using an adaptive fusion layer on top of LSTM and temporal causal convolutional modules. To learn surgical video representations, we explore a supervised contrastive learning approach that leverages label information in addition to augmented views. By validating our approach on a video retrieval task on two datasets, Surgical Actions 160 and Cataract-101, we significantly improve on previous results in terms of mean average precision: 30.012 ± 1.778 vs. 22.54 ± 1.557 for Surgical Actions 160 and 81.134 ± 1.28 vs. 33.18 ± 1.311 for Cataract-101. We also validate the proposed method's suitability for the surgical phase recognition task on the benchmark Cholec80 surgical dataset, where our approach outperforms the state of the art with 90.2% accuracy.
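The adaptive fusion described above can be pictured as a learned gate that mixes an LSTM summary and a causal temporal-convolution summary of the frame features. The sketch below is one illustrative reading, not the published implementation; all dimensions and the mean-pooled clip embedding are assumptions.

```python
import torch
import torch.nn as nn

class HybridSpatiotemporalEncoder(nn.Module):
    """Fuse LSTM and causal temporal-convolution branches with a learned gate
    (illustrative sketch of the hybrid embedding described in the abstract)."""
    def __init__(self, feat_dim=512, embed_dim=128):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, embed_dim, batch_first=True)
        # causal convolution: left-pad so frame t never sees frames after t
        self.causal = nn.Sequential(
            nn.ConstantPad1d((2, 0), 0.0),
            nn.Conv1d(feat_dim, embed_dim, kernel_size=3),
            nn.ReLU(),
        )
        self.gate = nn.Sequential(nn.Linear(2 * embed_dim, embed_dim), nn.Sigmoid())

    def forward(self, frame_feats):            # (batch, time, feat_dim)
        h_lstm, _ = self.lstm(frame_feats)     # (batch, time, embed_dim)
        h_conv = self.causal(frame_feats.transpose(1, 2)).transpose(1, 2)
        g = self.gate(torch.cat([h_lstm, h_conv], dim=-1))
        fused = g * h_lstm + (1 - g) * h_conv  # adaptive per-frame mixture
        # clip-level embedding for retrieval: mean over time, L2-normalized
        clip = fused.mean(dim=1)
        return nn.functional.normalize(clip, dim=-1)

# usage with dummy CNN frame features for four 16-frame clips
enc = HybridSpatiotemporalEncoder()
emb = enc(torch.randn(4, 16, 512))            # (4, 128) unit-norm clip embeddings
```

A supervised contrastive loss over such clip embeddings, pulling together clips that share a label and pushing apart the rest, would then supply the label-aware training signal the abstract describes.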
6. Junger D, Frommer SM, Burgert O. State-of-the-art of situation recognition systems for intraoperative procedures. Med Biol Eng Comput 2022;60:921-939. PMID: 35178622; PMCID: PMC8933302; DOI: 10.1007/s11517-022-02520-4.
Abstract
One of the key challenges for automatic assistance is the support of actors in the operating room depending on the status of the procedure. Therefore, context information collected in the operating room is used to gain knowledge about the current situation. Solutions already exist in the literature for specific use cases, but it is doubtful to what extent these approaches can be transferred to other conditions. We conducted a comprehensive literature review of existing situation recognition systems for the intraoperative area, covering 274 articles and 95 cross-references published between 2010 and 2019. We contrasted and compared 58 identified approaches based on defined aspects such as the sensor data used or the application area. In addition, we discussed applicability and transferability. Most of the papers focus on video data for recognizing situations within laparoscopic and cataract surgeries. Not all of the approaches can be used online for real-time recognition. Using different methods, good results with recognition accuracies above 90% could be achieved. Overall, transferability is rarely addressed. The applicability of approaches to other circumstances seems possible only to a limited extent. Future research should place a stronger focus on adaptability. The literature review shows differences within existing approaches for situation recognition and outlines research trends. Applicability and transferability to other conditions are less addressed in current work.
Affiliation(s)
- D Junger: School of Informatics, Research Group Computer Assisted Medicine (CaMed), Reutlingen University, Alteburgstr. 150, 72762 Reutlingen, Germany
- S M Frommer: School of Informatics, Research Group Computer Assisted Medicine (CaMed), Reutlingen University, Alteburgstr. 150, 72762 Reutlingen, Germany
- O Burgert: School of Informatics, Research Group Computer Assisted Medicine (CaMed), Reutlingen University, Alteburgstr. 150, 72762 Reutlingen, Germany
7. Yeh HH, Jain AM, Fox O, Wang SY. PhacoTrainer: A Multicenter Study of Deep Learning for Activity Recognition in Cataract Surgical Videos. Transl Vis Sci Technol 2021;10:23. PMID: 34784415; PMCID: PMC8606857; DOI: 10.1167/tvst.10.13.23.
Abstract
Purpose To build and evaluate deep learning models for recognizing cataract surgical steps from whole-length surgical videos with minimal preprocessing, including identification of routine and complex steps. Methods We collected 298 cataract surgical videos from 12 resident surgeons across 6 sites and excluded 30 incomplete, duplicated, and combination surgery videos. Videos were downsampled to 1 frame/second. Trained annotators labeled 13 steps of surgery: create wound, injection into the eye, capsulorrhexis, hydrodissection, phacoemulsification, irrigation/aspiration, place lens, remove viscoelastic, close wound, advanced technique/other, stain with trypan blue, manipulating iris, and subconjunctival injection. We trained two deep learning models, one based on the VGG16 architecture (VGG model) and the second using VGG16 followed by a long short-term memory network (convolutional neural network [CNN]-recurrent neural network [RNN] model). Class activation maps were visualized using Grad-CAM. Results Overall top-1 prediction accuracy was 76% for the VGG model (93% top-3 accuracy) and 84% for the CNN-RNN model (97% top-3 accuracy). The microaveraged area under the receiver operating characteristic curve was 0.97 for the VGG model and 0.99 for the CNN-RNN model. The microaveraged average precision score was 0.83 for the VGG model and 0.92 for the CNN-RNN model. Class activation maps revealed the model was appropriately focused on the instrumentation used in each step to identify which step was being performed. Conclusions Deep learning models can classify cataract surgical activities on a frame-by-frame basis with remarkably high accuracy, especially for routine surgical steps. Translational Relevance An automated system for recognition of cataract surgical steps could provide residents with automated feedback metrics, such as the length of time spent on each step.
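As a rough illustration of the CNN-RNN design described above (VGG16 frame features followed by an LSTM for frame-wise step prediction), a minimal PyTorch/torchvision sketch follows; the hidden size, class count, and everything else about training are assumptions, not the study's configuration.

```python
import torch
import torch.nn as nn
from torchvision import models

class CnnRnnStepClassifier(nn.Module):
    """VGG16 frame features -> LSTM -> per-frame surgical step logits (sketch)."""
    def __init__(self, n_steps=13, hidden=256):
        super().__init__()
        vgg = models.vgg16(weights=None)          # pretrained weights omitted here
        self.backbone = vgg.features              # convolutional feature extractor
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.lstm = nn.LSTM(512, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_steps)

    def forward(self, frames):                    # (batch, time, 3, H, W)
        b, t = frames.shape[:2]
        x = frames.flatten(0, 1)                  # run the CNN on every frame
        feats = self.pool(self.backbone(x)).flatten(1)     # (b*t, 512)
        feats = feats.view(b, t, -1)
        h, _ = self.lstm(feats)                   # temporal context across frames
        return self.head(h)                       # (batch, time, n_steps)

# usage: a 1 fps clip of 8 frames at 224x224
model = CnnRnnStepClassifier()
logits = model(torch.randn(1, 8, 3, 224, 224))   # (1, 8, 13)
```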
Affiliation(s)
- Hsu-Hang Yeh: Department of Biomedical Data Science, Stanford University, Palo Alto, CA, USA
- Anjal M Jain: Department of Ophthalmology, Byers Eye Institute, Stanford University, Palo Alto, CA, USA
- Olivia Fox: Krieger School of Arts and Sciences, Johns Hopkins University, Baltimore, MD, USA
- Sophia Y Wang: Department of Biomedical Data Science, Stanford University, Palo Alto, CA, USA; Department of Ophthalmology, Byers Eye Institute, Stanford University, Palo Alto, CA, USA
8. Tognetto D, Giglio R, Vinciguerra AL, Milan S, Rejdak R, Rejdak M, Zaluska-Ogryzek K, Zweifel S, Toro MD. Artificial intelligence applications and cataract management: A systematic review. Surv Ophthalmol 2021;67:817-829. PMID: 34606818; DOI: 10.1016/j.survophthal.2021.09.004.
Abstract
Artificial intelligence (AI)-based applications exhibit the potential to improve the quality and efficiency of patient care in different fields, including cataract management. A systematic review of the different applications of AI-based software on all aspects of a cataract patient's management, from diagnosis to follow-up, was carried out in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines. All selected articles were analyzed to assess the level of evidence according to the Oxford Centre for Evidence-Based Medicine 2011 guidelines and the quality of evidence according to the Grading of Recommendations Assessment, Development and Evaluation system. Of the articles analyzed, 49 met the inclusion criteria. No data synthesis was possible owing to the heterogeneity of the available data and the design of the available studies. AI-driven diagnosis appeared comparable to, and in selected cases even exceeded, the accuracy of experienced clinicians in classifying disease, supporting operating room scheduling, and managing intraoperative and postoperative complications. Considering the heterogeneity of the data analyzed, however, further randomized controlled trials to assess the efficacy and safety of AI applications in the management of cataract are highly warranted.
Affiliation(s)
- Daniele Tognetto: Eye Clinic, Department of Medicine, Surgery and Health Sciences, University of Trieste, Trieste, Italy
- Rosa Giglio: Eye Clinic, Department of Medicine, Surgery and Health Sciences, University of Trieste, Trieste, Italy
- Alex Lucia Vinciguerra: Eye Clinic, Department of Medicine, Surgery and Health Sciences, University of Trieste, Trieste, Italy
- Serena Milan: Eye Clinic, Department of Medicine, Surgery and Health Sciences, University of Trieste, Trieste, Italy
- Robert Rejdak: Chair and Department of General and Pediatric Ophthalmology, Medical University of Lublin, Lublin, Poland
- Mario Damiano Toro: Department of Ophthalmology, University of Zurich, Zurich, Switzerland; Department of Medical Sciences, Collegium Medicum, Cardinal Stefan Wyszyński University, Warsaw, Poland
9. van Amsterdam B, Clarkson MJ, Stoyanov D. Gesture Recognition in Robotic Surgery: A Review. IEEE Trans Biomed Eng 2021;68:2021-2035. PMID: 33497324; DOI: 10.1109/tbme.2021.3054828.
Abstract
OBJECTIVE Surgical activity recognition is a fundamental step in computer-assisted interventions. This paper reviews the state-of-the-art in methods for automatic recognition of fine-grained gestures in robotic surgery focusing on recent data-driven approaches and outlines the open questions and future research directions. METHODS An article search was performed on 5 bibliographic databases with the following search terms: robotic, robot-assisted, JIGSAWS, surgery, surgical, gesture, fine-grained, surgeme, action, trajectory, segmentation, recognition, parsing. Selected articles were classified based on the level of supervision required for training and divided into different groups representing major frameworks for time series analysis and data modelling. RESULTS A total of 52 articles were reviewed. The research field is showing rapid expansion, with the majority of articles published in the last 4 years. Deep-learning-based temporal models with discriminative feature extraction and multi-modal data integration have demonstrated promising results on small surgical datasets. Currently, unsupervised methods perform significantly less well than the supervised approaches. CONCLUSION The development of large and diverse open-source datasets of annotated demonstrations is essential for development and validation of robust solutions for surgical gesture recognition. While new strategies for discriminative feature extraction and knowledge transfer, or unsupervised and semi-supervised approaches, can mitigate the need for data and labels, they have not yet been demonstrated to achieve comparable performance. Important future research directions include detection and forecast of gesture-specific errors and anomalies. SIGNIFICANCE This paper is a comprehensive and structured analysis of surgical gesture recognition methods aiming to summarize the status of this rapidly evolving field.
10. A Guide to Annotation of Neurosurgical Intraoperative Video for Machine Learning Analysis and Computer Vision. World Neurosurg 2021;150:26-30. PMID: 33722717; DOI: 10.1016/j.wneu.2021.03.022.
Abstract
OBJECTIVE Computer vision (CV) is a subset of artificial intelligence that performs computations on image or video data, permitting the quantitative analysis of visual information. Common CV tasks that may be relevant to surgeons include image classification, object detection and tracking, and extraction of higher order features. Despite the potential applications of CV to intraoperative video, however, few surgeons describe the use of CV. A primary roadblock in implementing CV is the lack of a clear workflow to create an intraoperative video dataset to which CV can be applied. We report general principles for creating usable surgical video datasets and the result of their application. METHODS Video annotations from cadaveric endoscopic endonasal skull base simulations (n = 20 trials of 1-5 minutes, size = 8 GB) were reviewed by 2 researcher-annotators. An internal, retrospective analysis of the workflow for development of the intraoperative video annotations was performed to identify guiding practices. RESULTS Approximately 34,000 frames of surgical video were annotated. Key considerations in developing annotation workflows include 1) overcoming software and personnel constraints; 2) ensuring adequate storage and access infrastructure; 3) optimization and standardization of the annotation protocol; and 4) operationalizing the annotated data. Potential tools include CVAT (Computer Vision Annotation Tool) and VoTT, open-source annotation software allowing for local video storage, easy setup, and the use of interpolation. CONCLUSIONS CV techniques can be applied to surgical video, but challenges for novice users may limit adoption. We outline principles in annotation workflow that can mitigate the initial challenges groups may face when converting raw video into usable, annotated datasets.
11. Alnafisee N, Zafar S, Vedula SS, Sikder S. Current methods for assessing technical skill in cataract surgery. J Cataract Refract Surg 2021;47:256-264. PMID: 32675650; DOI: 10.1097/j.jcrs.0000000000000322.
Abstract
Surgery is a major source of errors in patient care. Preventing complications from surgical errors in the operating room is estimated to lead to a reduction of up to 41,846 readmissions and save $620.3 million per year. It is now established that poor technical skill is associated with an increased risk of severe adverse events postoperatively, and traditional models of training surgeons are being challenged by rapid advances in technology, an intensified patient-safety culture, and a need for value-driven health systems. This review discusses the current methods available for evaluating technical skills in cataract surgery and the recent technological advancements that have enabled the capture and analysis of large amounts of complex surgical data for more automated, objective skills assessment.
Affiliation(s)
- Nouf Alnafisee: The Wilmer Eye Institute, Johns Hopkins University School of Medicine (Alnafisee, Zafar, Sikder), Baltimore, and the Department of Computer Science, Malone Center for Engineering in Healthcare, The Johns Hopkins University Whiting School of Engineering (Vedula), Baltimore, Maryland, USA
12. LRTD: long-range temporal dependency based active learning for surgical workflow recognition. Int J Comput Assist Radiol Surg 2020;15:1573-1584. PMID: 32588246; DOI: 10.1007/s11548-020-02198-9.
Abstract
PURPOSE Automatic surgical workflow recognition in video is a fundamental yet challenging problem for developing computer-assisted and robot-assisted surgery. Existing deep learning approaches have achieved remarkable performance in the analysis of surgical videos; however, they rely heavily on large-scale labelled datasets. Unfortunately, annotations are often not available in abundance, because they require the domain knowledge of surgeons. Even for experts, producing a sufficient amount of annotations is tedious and time-consuming. METHODS In this paper, we propose a novel active learning method for cost-effective surgical video analysis. Specifically, we propose a non-local recurrent convolutional network, which introduces a non-local block to capture the long-range temporal dependency (LRTD) among continuous frames. We then formulate an intra-clip dependency score to represent the overall dependency within a clip. By ranking scores among clips in the unlabelled data pool, we select the clips with weak dependencies to annotate, as these are the most informative ones for network training. RESULTS We validate our approach on a large surgical video dataset (Cholec80) by performing the surgical workflow recognition task. Using our LRTD-based selection strategy, we outperform other state-of-the-art active learning methods that only consider neighboring-frame information. Using only up to 50% of the samples, our approach can exceed the performance of full-data training. CONCLUSION By modeling intra-clip dependency, our LRTD-based strategy shows a stronger capability to select informative video clips for annotation compared with other active learning methods, as evaluated on a popular public surgical dataset. The results also show the promising potential of our framework for reducing annotation workload in clinical practice.
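One way to picture the selection strategy: compute pairwise (non-local) similarities between the frame features of each unlabelled clip, take their mean as an intra-clip dependency score, and send the clips with the weakest dependency to the annotator. The sketch below is an assumption-laden stand-in, not the paper's non-local recurrent network.

```python
import numpy as np

def intra_clip_dependency(frame_feats: np.ndarray) -> float:
    """Mean pairwise cosine similarity between all frame pairs of one clip,
    a simple stand-in for the long-range dependency score."""
    f = frame_feats / (np.linalg.norm(frame_feats, axis=1, keepdims=True) + 1e-8)
    sim = f @ f.T                                   # (T, T) similarity matrix
    off_diag = sim[~np.eye(len(f), dtype=bool)]     # ignore self-similarity
    return float(off_diag.mean())

def select_clips_to_annotate(clips, budget):
    """Rank unlabelled clips by dependency and pick the weakest-dependency ones,
    assumed here to be the most informative for annotation."""
    scores = [intra_clip_dependency(c) for c in clips]
    order = np.argsort(scores)                      # ascending: weak dependency first
    return order[:budget]

# usage: 100 unlabelled clips of 30 frames with 128-dim features each
rng = np.random.default_rng(0)
clips = [rng.normal(size=(30, 128)) for _ in range(100)]
to_label = select_clips_to_annotate(clips, budget=10)
```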
13. Gómez-Correa JE, Torres-Treviño LM, Moragrega-Adame E, Mayorquin-Ruiz M, Villalobos-Ojeda C, Velasco-Barona C, Chávez-Cerda S. Intelligent-assistant system for scleral spur location. Appl Opt 2020;59:3026-3032. PMID: 32400579; DOI: 10.1364/ao.384440.
Abstract
A system based on two artificial neural networks (ANNs) is presented for determining the location of the scleral spur of the human eye in ocular images generated by ultrasound biomicroscopy. The two ANNs establish a relationship between the distances of four manually placed landmarks in an ocular image and the coordinates of the scleral spur. The latter coordinates are provided by the expert knowledge of a subject-matter specialist. Trained ANNs that generate good results for scleral spur location are incorporated into a software system. Statistical indicators and results yield an efficiency performance above 95%.
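The described mapping from four landmark distances to scleral spur coordinates is, in essence, a small regression network. A hedged scikit-learn sketch on invented data follows; the real system uses two ANNs trained on expert-provided coordinates, and the ranges and noise level below are placeholders.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# synthetic stand-in data: 4 landmark distances -> (x, y) scleral spur position
rng = np.random.default_rng(42)
X = rng.uniform(0.0, 5.0, size=(500, 4))                  # distances (assumed units)
true_w = rng.normal(size=(4, 2))
y = X @ true_w + rng.normal(scale=0.05, size=(500, 2))    # expert coordinates

# one small ANN regressing both coordinates jointly (the paper uses two ANNs)
net = MLPRegressor(hidden_layer_sizes=(16, 16), max_iter=2000, random_state=0)
net.fit(X[:400], y[:400])
print("held-out R^2:", net.score(X[400:], y[400:]))
```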
14. Schoeb D, Suarez-Ibarrola R, Hein S, Dressler FF, Adams F, Schlager D, Miernik A. Use of Artificial Intelligence for Medical Literature Search: Randomized Controlled Trial Using the Hackathon Format. Interact J Med Res 2020;9:e16606. PMID: 32224481; PMCID: PMC7154940; DOI: 10.2196/16606.
Abstract
Background Mapping out the research landscape around a project is often time-consuming and difficult. Objective This study evaluates a commercial artificial intelligence (AI) search engine (IRIS.AI) for its applicability in an automated literature search on a specific medical topic. Methods To evaluate the AI search engine in a standardized manner, the concept of a science hackathon was applied. Three groups of researchers were tasked with performing a literature search on a clearly defined scientific project. All participants had a high level of expertise in this specific field of research. Two groups were given access to the AI search engine IRIS.AI. All groups were given the same amount of time for their search and were instructed to document their results. Search results were summarized and ranked according to a predetermined scoring system. Results The final scoring awarded 49 and 39 points out of 60 to AI groups 1 and 2, respectively, and the control group received 46 points. A total of 20 scientific studies with high relevance were identified, and 5 highly relevant studies (“spot on”) were reported by each group. Conclusions AI technology is a promising approach to facilitate literature searches and the management of medical libraries. In this study, however, the application of AI technology led to a more focused literature search without a significant improvement in the number of results.
Affiliation(s)
- Dominik Schoeb: Medical Center - Department of Urology, Faculty of Medicine, University of Freiburg, Freiburg, Germany
- Rodrigo Suarez-Ibarrola: Medical Center - Department of Urology, Faculty of Medicine, University of Freiburg, Freiburg, Germany
- Simon Hein: Medical Center - Department of Urology, Faculty of Medicine, University of Freiburg, Freiburg, Germany
- Franz Friedrich Dressler: Medical Center - Department of Urology, Faculty of Medicine, University of Freiburg, Freiburg, Germany
- Fabian Adams: Medical Center - Department of Urology, Faculty of Medicine, University of Freiburg, Freiburg, Germany
- Daniel Schlager: Medical Center - Department of Urology, Faculty of Medicine, University of Freiburg, Freiburg, Germany
- Arkadiusz Miernik: Medical Center - Department of Urology, Faculty of Medicine, University of Freiburg, Freiburg, Germany
15. Real-time automatic surgical phase recognition in laparoscopic sigmoidectomy using the convolutional neural network-based deep learning approach. Surg Endosc 2019;34:4924-4931. PMID: 31797047; DOI: 10.1007/s00464-019-07281-0.
Abstract
BACKGROUND Automatic surgical workflow recognition is a key component for developing context-aware computer-assisted surgery (CA-CAS) systems. However, automatic surgical phase recognition focused on colorectal surgery has not been reported. We aimed to develop a deep learning model for automatic surgical phase recognition based on laparoscopic sigmoidectomy (Lap-S) videos, which could be used for real-time phase recognition, and to clarify the accuracies of automatic surgical phase and action recognition using visual information. METHODS The dataset used contained 71 cases of Lap-S. The video data were divided into static image frames at 1/30-s intervals. Every Lap-S video was manually divided into 11 surgical phases (Phases 0-10) and manually annotated for each surgical action on every frame. The model was generated based on the training data, and validation was performed on a set of unseen test data. Convolutional neural network (CNN)-based deep learning was used. RESULTS The average surgical time was 175 min (± 43 min SD), with the individual surgical phases also showing high variation in duration between cases. Each surgery started in the first phase (Phase 0) and ended in the last phase (Phase 10), and phase transitions occurred 14 (± 2 SD) times per procedure on average. The accuracy of automatic surgical phase recognition was 91.9%, and the accuracies of automatic surgical action recognition for extracorporeal action and irrigation were 89.4% and 82.5%, respectively. Moreover, the system could perform real-time automatic surgical phase recognition at 32 fps. CONCLUSIONS The CNN-based deep learning approach enabled the recognition of surgical phases and actions in 71 Lap-S cases based on manually annotated data. The system could perform automatic surgical phase recognition and automatic target surgical action recognition with high accuracy. Moreover, this study showed the feasibility of real-time automatic surgical phase recognition with a high frame rate.
16. Jin Y, Li H, Dou Q, Chen H, Qin J, Fu CW, Heng PA. Multi-task recurrent convolutional network with correlation loss for surgical video analysis. Med Image Anal 2019;59:101572. PMID: 31639622; DOI: 10.1016/j.media.2019.101572.
Abstract
Surgical tool presence detection and surgical phase recognition are two fundamental yet challenging tasks in surgical video analysis and essential components of various applications in modern operating rooms. While these two analysis tasks are highly correlated in clinical practice, as the surgical process is typically well-defined, most previous methods tackled them separately, without making full use of their relatedness. In this paper, we present a novel method that develops a multi-task recurrent convolutional network with correlation loss (MTRCNet-CL) to exploit their relatedness and simultaneously boost the performance of both tasks. Specifically, our proposed MTRCNet-CL model has an end-to-end architecture with two branches, which share earlier feature encoders to extract general visual features while holding respective higher layers targeting specific tasks. Given that temporal information is crucial for phase recognition, long short-term memory (LSTM) is explored to model the sequential dependencies in the phase recognition branch. More importantly, a novel and effective correlation loss is designed to model the relatedness between tool presence and phase identification of each video frame, by minimizing the divergence of predictions from the two branches. Mutually leveraging both low-level feature sharing and high-level prediction correlation, our MTRCNet-CL method can encourage the interactions between the two tasks to a large extent, and hence each can benefit the other. Extensive experiments on a large surgical video dataset (Cholec80) demonstrate the outstanding performance of our proposed method, consistently exceeding the state-of-the-art methods by a large margin, e.g., 89.1% vs. 81.0% mAP in tool presence detection and 87.4% vs. 84.5% F1 score in phase recognition.
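The correlation loss can be pictured as a penalty that maps the phase branch's prediction to an expected tool-presence pattern and penalizes its divergence from the tool branch's own prediction. The learned linear mapping and the binary KL form below are one plausible reading of "minimizing the divergence of predictions from the two branches", not the exact published formulation, and the class counts and loss weight are invented.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CorrelationLoss(nn.Module):
    """Penalize disagreement between the tool-presence branch and a tool-presence
    pattern predicted from the phase branch (illustrative reading of MTRCNet-CL)."""
    def __init__(self, n_phases=7, n_tools=7):
        super().__init__()
        self.phase_to_tool = nn.Linear(n_phases, n_tools)  # learned mapping

    def forward(self, phase_logits, tool_logits):
        # expected tool probabilities given the phase prediction
        expected = torch.sigmoid(self.phase_to_tool(F.softmax(phase_logits, dim=-1)))
        predicted = torch.sigmoid(tool_logits)
        # binary KL divergence between the two per-tool Bernoulli predictions
        eps = 1e-6
        p, q = predicted.clamp(eps, 1 - eps), expected.clamp(eps, 1 - eps)
        kl = (p * (p / q).log() + (1 - p) * ((1 - p) / (1 - q)).log()).mean()
        return kl

# usage inside a training step (assumed class counts and weighting)
corr = CorrelationLoss()
phase_logits = torch.randn(8, 7)       # one frame per sample, 7 phases
tool_logits = torch.randn(8, 7)        # 7 tools, multi-label
total = (F.cross_entropy(phase_logits, torch.randint(0, 7, (8,)))
         + F.binary_cross_entropy_with_logits(tool_logits,
                                              torch.randint(0, 2, (8, 7)).float())
         + 0.1 * corr(phase_logits, tool_logits))
```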
Affiliation(s)
- Yueming Jin: Department of Computer Science and Engineering, The Chinese University of Hong Kong, China
- Huaxia Li: Department of Computer Science and Engineering, The Chinese University of Hong Kong, China
- Qi Dou: Department of Computer Science and Engineering, The Chinese University of Hong Kong, China
- Hao Chen: Department of Computer Science and Engineering, The Chinese University of Hong Kong, China
- Jing Qin: Centre for Smart Health, School of Nursing, The Hong Kong Polytechnic University, China
- Chi-Wing Fu: Department of Computer Science and Engineering, The Chinese University of Hong Kong, China
- Pheng-Ann Heng: Department of Computer Science and Engineering, The Chinese University of Hong Kong, China; T Stone Robotics Institute, The Chinese University of Hong Kong, China
17. Yu F, Silva Croso G, Kim TS, Song Z, Parker F, Hager GD, Reiter A, Vedula SS, Ali H, Sikder S. Assessment of Automated Identification of Phases in Videos of Cataract Surgery Using Machine Learning and Deep Learning Techniques. JAMA Netw Open 2019;2:e191860. PMID: 30951163; PMCID: PMC6450320; DOI: 10.1001/jamanetworkopen.2019.1860.
Abstract
IMPORTANCE Competence in cataract surgery is a public health necessity, and videos of cataract surgery are routinely available to educators and trainees but currently are of limited use in training. Machine learning and deep learning techniques can yield tools that efficiently segment videos of cataract surgery into constituent phases for subsequent automated skill assessment and feedback. OBJECTIVE To evaluate machine learning and deep learning algorithms for automated phase classification of manually presegmented phases in videos of cataract surgery. DESIGN, SETTING, AND PARTICIPANTS This was a cross-sectional study using a data set of videos from a convenience sample of 100 cataract procedures performed by faculty and trainee surgeons in an ophthalmology residency program from July 2011 to December 2017. Demographic characteristics for surgeons and patients were not captured. Ten standard labels in the procedure and 14 instruments used during surgery were manually annotated, which served as the ground truth. EXPOSURES Five algorithms with different input data: (1) a support vector machine input with cross-sectional instrument label data; (2) a recurrent neural network (RNN) input with a time series of instrument labels; (3) a convolutional neural network (CNN) input with cross-sectional image data; (4) a CNN-RNN input with a time series of images; and (5) a CNN-RNN input with time series of images and instrument labels. Each algorithm was evaluated with 5-fold cross-validation. MAIN OUTCOMES AND MEASURES Accuracy, area under the receiver operating characteristic curve, sensitivity, specificity, and precision. RESULTS Unweighted accuracy for the 5 algorithms ranged between 0.915 and 0.959. Area under the receiver operating characteristic curve for the 5 algorithms ranged between 0.712 and 0.773, with small differences among them. The area under the receiver operating characteristic curve for the image-only CNN-RNN (0.752) was significantly greater than that of the CNN with cross-sectional image data (0.712) (difference, -0.040; 95% CI, -0.049 to -0.033) and the CNN-RNN with images and instrument labels (0.737) (difference, 0.016; 95% CI, 0.014 to 0.018). While specificity was uniformly high for all phases with all 5 algorithms (range, 0.877 to 0.999), sensitivity ranged between 0.005 (95% CI, 0.000 to 0.015) for the support vector machine for wound closure (corneal hydration) and 0.974 (95% CI, 0.957 to 0.991) for the RNN for main incision. Precision ranged between 0.283 and 0.963. CONCLUSIONS AND RELEVANCE Time series modeling of instrument labels and video images using deep learning techniques may yield potentially useful tools for the automated detection of phases in cataract surgery procedures.
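The simplest of the five exposures, a support vector machine on cross-sectional instrument labels, can be illustrated in a few lines of scikit-learn; the instrument count, phase count, and data below are invented placeholders standing in for the study's manual annotations.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n_frames, n_instruments, n_phases = 2000, 14, 10

# binary instrument-presence vector per frame (placeholder for manual annotations)
X = rng.integers(0, 2, size=(n_frames, n_instruments))
# phase label per frame, loosely tied to the instruments for demonstration only
y = (X[:, :3].sum(axis=1) + rng.integers(0, 4, size=n_frames)) % n_phases

svm = SVC(kernel="rbf", C=1.0)
scores = cross_val_score(svm, X, y, cv=5)          # 5-fold cross-validation
print("mean accuracy:", scores.mean())
```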
Affiliation(s)
- Felix Yu: Department of Computer Science, Johns Hopkins University, Baltimore, Maryland
- Tae Soo Kim: Department of Computer Science, Johns Hopkins University, Baltimore, Maryland
- Ziang Song: Department of Computer Science, Johns Hopkins University, Baltimore, Maryland
- Felix Parker: Department of Computer Science, Johns Hopkins University, Baltimore, Maryland
- Gregory D. Hager: Department of Computer Science, Johns Hopkins University, Baltimore, Maryland; Malone Center for Engineering in Healthcare, Johns Hopkins University, Baltimore, Maryland
- Austin Reiter: Department of Computer Science, Johns Hopkins University, Baltimore, Maryland; Malone Center for Engineering in Healthcare, Johns Hopkins University, Baltimore, Maryland
- S. Swaroop Vedula: Malone Center for Engineering in Healthcare, Johns Hopkins University, Baltimore, Maryland
- Haider Ali: Department of Computer Science, Johns Hopkins University, Baltimore, Maryland; Malone Center for Engineering in Healthcare, Johns Hopkins University, Baltimore, Maryland
- Shameema Sikder: Wilmer Eye Institute, Johns Hopkins University School of Medicine, Baltimore, Maryland
18.
Abstract
Recent years have seen tremendous progress in artificial intelligence (AI), such as with the automatic and real-time recognition of objects and activities in videos in the field of computer vision. Due to its increasing digitalization, the operating room (OR) promises to directly benefit from this progress in the form of new assistance tools that can enhance the abilities and performance of surgical teams. Key for such tools is the recognition of the surgical workflow, because efficient assistance by an AI system requires this system to be aware of the surgical context, namely of all activities taking place inside the operating room. We present here how several recent techniques relying on machine and deep learning can be used to analyze the activities taking place during surgery, using videos captured from either endoscopic or ceiling-mounted cameras. We also present two potential clinical applications that we are developing at the University of Strasbourg with our clinical partners.
Affiliation(s)
- Nicolas Padoy: ICube, IHU Strasbourg, CNRS, University of Strasbourg, Strasbourg, France
19. Loukas C. Video content analysis of surgical procedures. Surg Endosc 2017;32:553-568. PMID: 29075965; DOI: 10.1007/s00464-017-5878-1.
Abstract
BACKGROUND In addition to its therapeutic benefits, minimally invasive surgery offers the potential for video recording of the operation. The videos may be archived and used later for purposes such as cognitive training, skills assessment, and workflow analysis. Methods from the broader field of video content analysis and representation are increasingly applied in the surgical domain. In this paper, we review recent developments and analyze future directions in the field of content-based video analysis of surgical operations. METHODS The reviewed articles were obtained from PubMed and Google Scholar searches on combinations of the following keywords: 'surgery', 'video', 'phase', 'task', 'skills', 'event', 'shot', 'analysis', 'retrieval', 'detection', 'classification', and 'recognition'. The collected articles were categorized and reviewed based on the technical goal sought, type of surgery performed, and structure of the operation. RESULTS A total of 81 articles were included. Publication activity is constantly increasing; more than 50% of these articles were published in the last 3 years. Significant research has been performed on video task detection and retrieval in eye surgery. In endoscopic surgery, the research activity is more diverse: gesture/task classification, skills assessment, tool type recognition, and shot/event detection and retrieval. Recent works employ deep neural networks for phase and tool recognition as well as shot detection. CONCLUSIONS Content-based video analysis of surgical operations is a rapidly expanding field. Several future prospects for research exist, including, inter alia, shot boundary detection, keyframe extraction, video summarization, pattern discovery, and video annotation. The development of publicly available benchmark datasets to evaluate and compare task-specific algorithms is essential.
Affiliation(s)
- Constantinos Loukas: Laboratory of Medical Physics, Medical School, National and Kapodistrian University of Athens, Mikras Asias 75 str., 11527 Athens, Greece
20.
Abstract
Due to the rapidly evolving medical, technological, and technical possibilities, surgical procedures are becoming more and more complex. On the one hand, this offers an increasing number of advantages for patients, such as enhanced patient safety, minimally invasive interventions, and fewer instances of medical malpractice. On the other hand, it also heightens pressure on surgeons and other clinical staff and has brought about a new policy in hospitals, which must rely on a great number of economic, social, psychological, qualitative, practical, and technological resources. As a result, medical disciplines, such as surgery, are slowly merging with technical disciplines. However, this synergy has not yet fully matured. The current information and communication technology in hospitals cannot manage the clinical and operational sequence adequately. The consequences are breaches in the surgical workflow, extensions in procedure times, and media disruptions. Furthermore, the data accrued in operating rooms (ORs) by surgeons and systems are not sufficiently exploited. A flood of information, “big data”, is available from information systems. It could be deployed in the context of Medicine 4.0 to facilitate surgical treatment, but it remains unused due to infrastructure breaches or communication errors. Surgical process models (SPMs) alleviate these problems. They can be defined as simplified, formal, or semiformal representations of a network of surgery-related activities, reflecting a predefined subset of interest. They can employ different means of generation, languages, and data acquisition strategies. They can represent surgical interventions with high resolution, offering qualifiable and quantifiable information on the course of the intervention at the level of single, minute surgical work-steps. The basic idea is to gather information concerning the surgical intervention and its activities, such as performance time, surgical instruments used, trajectories, movements, or intervention phases. These data can be gathered by means of workflow recordings. The recordings are abstracted to represent an individual surgical process as a model and are an essential requirement to enable Medicine 4.0 in the OR. Further abstraction can be generated by merging individual process models to form generic SPMs, increasing their validity for a larger number of patients. Furthermore, these models can be applied in a wide variety of use-cases. In this regard, the term “modeling” can be used to support one or more of the following tasks: “to describe”, “to understand”, “to explain”, “to optimize”, “to learn”, “to teach”, or “to automate”. Possible use-cases are requirements analyses, evaluating surgical assist systems, generating surgeon-specific training recommendations, creating workflow management systems for ORs, and comparing different surgical strategies. The presented chapter gives an introduction to this challenging topic, presenting different methods to generate SPMs from the workflow in the OR, as well as various use-cases and state-of-the-art research in this field. Although many examples in the article are given according to SPMs that were computed based on observations, the same approaches can easily be applied to SPMs that were measured automatically and mined from big data.
Affiliation(s)
- Thomas Neumuth: Innovation Center Computer Assisted Surgery (ICCAS), Universität Leipzig, Leipzig, Germany
21. Quellec G, Cazuguel G, Cochener B, Lamard M. Multiple-Instance Learning for Medical Image and Video Analysis. IEEE Rev Biomed Eng 2017;10:213-234. DOI: 10.1109/rbme.2017.2651164.
22. Charrière K, Quellec G, Lamard M, Coatrieux G, Cochener B, Cazuguel G. Automated surgical step recognition in normalized cataract surgery videos. Annu Int Conf IEEE Eng Med Biol Soc 2014;2014:4647-4650. PMID: 25571028; DOI: 10.1109/embc.2014.6944660.
Abstract
Huge amounts of surgical data are recorded during video-monitored surgery. Content-based video retrieval systems intend to reuse these data for computer-aided surgery. In this paper, we focus on real-time recognition of cataract surgery steps: the goal is to retrieve from a database surgery videos that were recorded during the same surgery step. The proposed system relies on motion features for video characterization. Motion features are usually impacted by eye motion or zoom level variations, which are not necessarily relevant for surgery step recognition. These problems limit the performance of the retrieval system. We therefore propose to refine motion feature extraction by applying pre-processing steps based on a novel pupil center and scale tracking method. These pre-processing steps are evaluated for two different motion features. In this paper, a similarity measure adapted from Piciarelli's video surveillance system is evaluated for the first time on a surgery dataset. This similarity measure provides good results, and for both motion features the proposed pre-processing steps significantly improved the retrieval performance of the system.
23. Voros S, Moreau-Gaudry A. How Sensor, Signal, and Imaging Informatics May Impact Patient Centered Care and Care Coordination. Yearb Med Inform 2015;10:102-105. PMID: 26293856; DOI: 10.15265/iy-2015-025.
Abstract
OBJECTIVE This synopsis presents a selection for the IMIA (International Medical Informatics Association) Yearbook 2015 of excellent research in the broad field of Sensor, Signal, and Imaging Informatics published in the year 2014, with a focus on patient centered care coordination. METHODS The two section editors performed a systematic initial selection and a double blind peer review process to select a list of candidate best papers in the domain published in 2014, from the PubMed and Web of Science databases. A set of MeSH keywords provided by experts was used. This selection was peer-reviewed by external reviewers. RESULTS The review process highlighted articles illustrating two current trends related to care coordination and patient centered care: the enhanced capacity to predict the evolution of a disease based on patient-specific information can impact care coordination; similarly, better perception of the patient and his treatment could lead to enhanced personalized care with a potential impact on care coordination. CONCLUSIONS This review shows the multiplicity of angles from which the question of patient-centered care can be addressed, with consequences on care coordination that will need to be confirmed and demonstrated in the future.
Affiliation(s)
- S Voros: Laboratoire TIMC-IMAG, équipe GMCAO, IN3S, pavillon Taillefer, Faculté de Médecine, 38706 La Tronche Cedex, France
24. Quellec G, Lamard M, Cochener B, Cazuguel G. Real-time task recognition in cataract surgery videos using adaptive spatiotemporal polynomials. IEEE Trans Med Imaging 2015;34:877-887. PMID: 25373078; DOI: 10.1109/tmi.2014.2366726.
Abstract
This paper introduces a new algorithm for recognizing surgical tasks in real time in a video stream. The goal is to communicate information to the surgeon in due time during a video-monitored surgery. The proposed algorithm is applied to cataract surgery, which is the most common eye surgery. To compensate for eye motion and zoom level variations, cataract surgery videos are first normalized. Then, the motion content of short video subsequences is characterized with spatiotemporal polynomials: a multiscale motion characterization based on adaptive spatiotemporal polynomials is presented. The proposed solution is particularly suited to characterizing deformable moving objects with fuzzy borders, which are typically found in surgical videos. Given a target surgical task, the system is trained to identify which spatiotemporal polynomials are usually extracted from videos when, and only when, this task is being performed. These key spatiotemporal polynomials are then searched in new videos to recognize the target surgical task. For improved performance, the system jointly adapts the spatiotemporal polynomial basis and identifies the key spatiotemporal polynomials using the multiple-instance learning paradigm. The proposed system runs in real time and outperforms the previous solution from our group, both for surgical task recognition (Az = 0.851 on average, as opposed to Az = 0.794 previously) and for the joint segmentation and recognition of surgical tasks (Az = 0.856 on average, as opposed to Az = 0.832 previously).
25. Quellec G, Lamard M, Cochener B, Cazuguel G. Real-time segmentation and recognition of surgical tasks in cataract surgery videos. IEEE Trans Med Imaging 2014;33:2352-2360. PMID: 25055383; DOI: 10.1109/tmi.2014.2340473.
Abstract
In ophthalmology, it is now common practice to record every surgical procedure and to archive the resulting videos for documentation purposes. In this paper, we present a solution to automatically segment and categorize surgical tasks in real time during the surgery, using the video recording. The goal is to communicate information to the surgeon in due time, such as recommendations to less experienced surgeons. The proposed solution relies on the content-based video retrieval paradigm: it reuses previously archived videos to automatically analyze the current surgery by analogy reasoning. Each video is segmented, in real time, into an alternating sequence of idle phases, during which no clinically relevant motions are visible, and action phases. As soon as an idle phase is detected, the previous action phase is categorized and the next action phase is predicted. A conditional random field is used for categorization and prediction. The proposed system was applied to the automatic segmentation and categorization of cataract surgery tasks. A dataset of 186 surgeries, performed by ten different surgeons, was manually annotated: ten possibly overlapping surgical tasks were delimited in each surgery. Using the content of action phases and the duration of idle phases as sources of evidence, an average recognition performance of Az = 0.832 ± 0.070 was achieved.
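The alternating idle/action segmentation driving this system can be approximated, for illustration only, by thresholding a per-frame motion-magnitude signal and extracting the resulting runs; the published approach additionally categorizes and predicts action phases with a conditional random field, which is not reproduced here, and the threshold and minimum run length below are assumptions.

```python
import numpy as np

def segment_idle_action(motion, threshold=0.2, min_len=5):
    """Split a per-frame motion-magnitude signal into alternating idle/action
    phases (crude stand-in for the real-time segmentation described above)."""
    active = motion > threshold
    segments, start = [], 0
    for t in range(1, len(active) + 1):
        if t == len(active) or active[t] != active[start]:
            if t - start >= min_len:               # drop spuriously short runs
                segments.append(("action" if active[start] else "idle", start, t))
            start = t
    return segments

# usage on a synthetic 1-minute, 25 fps motion trace with one "action" burst
rng = np.random.default_rng(3)
motion = np.abs(rng.normal(0.1, 0.05, 1500))
motion[300:700] += 0.5
print(segment_idle_action(motion)[:5])
```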
26. Quellec G, Charrière K, Lamard M, Cochener B, Cazuguel G. Normalizing videos of anterior eye segment surgeries. Annu Int Conf IEEE Eng Med Biol Soc 2014;2014:122-125. PMID: 25569912; DOI: 10.1109/embc.2014.6943544.
Abstract
Anterior eye segment surgeries are usually video-recorded. If we are able to efficiently analyze surgical videos in real time, new decision support tools will emerge. The main anatomical landmarks in these videos are the pupil boundaries and the limbus, but segmenting them is challenging due to the variety of colors and textures in the pupil, the iris, the sclera and the lids. In this paper, we present a solution to reliably normalize the center and the scale in videos without explicitly segmenting these landmarks. First, a robust solution to track the pupil center is presented: it uses the fact that the pupil boundaries, the limbus and the sclera/lid interface are concentric. Second, a solution to estimate the zoom level is presented: it relies on the illumination pattern reflected on the cornea. The proposed solution was assessed on a dataset of 186 real-life cataract surgery videos. The distance between the true and estimated pupil centers was equal to 8.0 ± 6.9% of the limbus radius. The correlation between the estimated zoom level and the true limbus size in images was high: R = 0.834.
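For orientation only, the sketch below is a crude OpenCV stand-in for the two normalization steps: estimating a center from roughly concentric circular structures with a circular Hough transform and taking the detected radius as a zoom proxy. The actual method instead tracks the pupil center from the concentricity of the boundaries and estimates zoom from the corneal illumination reflection; all parameter values here are assumptions.

```python
import cv2
import numpy as np

def estimate_center_and_scale(frame_bgr):
    """Detect circular structures (pupil/limbus candidates) and return their
    mean center plus the largest radius as a scale proxy (rough stand-in)."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    gray = cv2.medianBlur(gray, 5)
    circles = cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, dp=1.5,
                               minDist=gray.shape[0] // 4,
                               param1=100, param2=30,
                               minRadius=20, maxRadius=gray.shape[0] // 2)
    if circles is None:
        return None
    circles = circles[0]                       # (n, 3): x, y, radius
    center = circles[:, :2].mean(axis=0)
    scale = circles[:, 2].max()
    return center, scale

def normalize_frame(frame_bgr, center, scale, out_size=256, target_radius=100):
    """Shift the estimated center to the image center and rescale so the
    detected radius maps to a fixed target radius."""
    s = target_radius / float(scale)
    M = np.float32([[s, 0, out_size / 2 - s * center[0]],
                    [0, s, out_size / 2 - s * center[1]]])
    return cv2.warpAffine(frame_bgr, M, (out_size, out_size))
```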