1
Giap BD, Srinivasan K, Mahmoud O, Ballouz D, Lustre J, Likosky K, Mian SI, Tannen BL, Nallasamy N. A Computational Framework for Intraoperative Pupil Analysis in Cataract Surgery. Ophthalmol Sci 2025;5:100597. [PMID: 39435136 PMCID: PMC11492071 DOI: 10.1016/j.xops.2024.100597]
Abstract
Purpose Pupillary instability is a known risk factor for complications in cataract surgery. This study aims to develop and validate an innovative and reliable computational framework for the automated assessment of pupil morphologic changes during the various phases of cataract surgery. Design Retrospective surgical video analysis. Subjects Two hundred forty complete surgical video recordings, among which 190 surgeries were conducted without the use of pupil expansion devices (PEDs) and 50 were performed with the use of a PED. Methods The proposed framework consists of 3 stages: feature extraction, deep learning (DL)-based anatomy recognition, and obstruction (OB) detection/compensation. In the first stage, surgical video frames undergo noise reduction using a tensor-based wavelet feature extraction method. In the second stage, DL-based segmentation models are trained and employed to segment the pupil, limbus, and palpebral fissure. In the third stage, obstructed visualization of the pupil is detected and compensated for using a DL-based algorithm. A dataset of 5700 intraoperative video frames across 190 cataract surgeries in the BigCat database was collected for validating algorithm performance. Main Outcome Measures The pupil analysis framework was assessed on the basis of segmentation performance for both obstructed and unobstructed pupils. Classification performance of models utilizing the segmented pupil time series to predict surgeon use of a PED was also assessed. Results An architecture based on the Feature Pyramid Network model with a Visual Geometry Group 16 backbone, integrated with the adaptive wavelet tensor feature extraction method, demonstrated the highest performance in anatomy segmentation, with a Dice coefficient of 96.52%. Incorporation of an OB compensation algorithm improved performance further (Dice coefficient 96.82%). Downstream analysis of framework output enabled the development of a Support Vector Machine-based classifier that could predict surgeon usage of a PED prior to its placement with 96.67% accuracy and an area under the curve of 99.44%. Conclusions The experimental results demonstrate that the proposed framework (1) provides high accuracy in pupil analysis compared with human-annotated ground truth, (2) substantially outperforms isolated use of a DL segmentation model, and (3) can enable downstream analytics with clinically valuable predictive capacity. Financial Disclosures Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.
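Segmentation accuracy in this study is reported as a Dice coefficient. For reference, a minimal NumPy sketch of that overlap metric for a pair of binary masks; the smoothing constant eps is an illustrative implementation detail, not taken from the paper:

```python
import numpy as np

def dice_coefficient(pred_mask, gt_mask, eps=1e-7):
    """Dice overlap between predicted and ground-truth binary masks (e.g., pupil segmentations)."""
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    # 2*|A∩B| / (|A|+|B|); eps guards against two empty masks
    return (2.0 * intersection + eps) / (pred.sum() + gt.sum() + eps)
```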
Affiliation(s)
- Binh Duong Giap
- Kellogg Eye Center, Department of Ophthalmology & Visual Sciences, University of Michigan, 1000 Wall Street, Ann Arbor, Michigan, 48105
- Karthik Srinivasan
- Department of Vitreo Retinal, Aravind Eye Hospital, Chennai, Tamil Nadu, 600077, India
- Ossama Mahmoud
- Kellogg Eye Center, Department of Ophthalmology & Visual Sciences, University of Michigan, 1000 Wall Street, Ann Arbor, Michigan, 48105
- Wayne State University School of Medicine, 540 E Canfield Street, Detroit, Michigan, 48201
- Dena Ballouz
- Kellogg Eye Center, Department of Ophthalmology & Visual Sciences, University of Michigan, 1000 Wall Street, Ann Arbor, Michigan, 48105
- Jefferson Lustre
- Kellogg Eye Center, Department of Ophthalmology & Visual Sciences, University of Michigan, 1000 Wall Street, Ann Arbor, Michigan, 48105
- Keely Likosky
- Kellogg Eye Center, Department of Ophthalmology & Visual Sciences, University of Michigan, 1000 Wall Street, Ann Arbor, Michigan, 48105
- Shahzad I. Mian
- Kellogg Eye Center, Department of Ophthalmology & Visual Sciences, University of Michigan, 1000 Wall Street, Ann Arbor, Michigan, 48105
- Bradford L. Tannen
- Kellogg Eye Center, Department of Ophthalmology & Visual Sciences, University of Michigan, 1000 Wall Street, Ann Arbor, Michigan, 48105
- Nambi Nallasamy
- Kellogg Eye Center, Department of Ophthalmology & Visual Sciences, University of Michigan, 1000 Wall Street, Ann Arbor, Michigan, 48105
- Department of Computational Medicine & Bioinformatics, University of Michigan, 100 Washtenaw Avenue, Ann Arbor, Michigan, 48109
2
Ahuja AS, Paredes AA III, Eisel MLS, Kodwani S, Wagner IV, Miller DD, Dorairaj S. Applications of Artificial Intelligence in Cataract Surgery: A Review. Clin Ophthalmol 2024;18:2969-2975. [PMID: 39434720 PMCID: PMC11492897 DOI: 10.2147/opth.s489054]
Abstract
Cataract surgery is one of the most frequently performed procedures worldwide, and cataracts are rising in prevalence in our aging population. With the increasing utilization of artificial intelligence (AI) in the medical field, we aimed to understand the extent of present AI applications in ophthalmic microsurgery, specifically cataract surgery. We conducted a literature search on PubMed and Google Scholar using keywords related to the application of AI in cataract surgery and included relevant articles published since 2010 in our review. The literature search yielded information on AI mechanisms such as machine learning (ML), deep learning (DL), and convolutional neural networks (CNNs) as they are being incorporated into the pre-operative, intraoperative, and post-operative stages of cataract surgery. AI is currently integrated in the pre-operative stage of cataract surgery to calculate intraocular lens (IOL) power and diagnose cataracts with slit-lamp microscopy and retinal imaging. During the intraoperative stage, AI has been applied to risk calculation, tracking surgical workflow, multimodal imaging data analysis, and instrument location via the use of "smart instruments". AI is also involved in predicting post-operative complications, such as posterior capsular opacification and intraocular lens dislocation, and organizing follow-up patient care. Challenges such as limited imaging dataset availability, unstandardized deep learning analysis metrics, and lack of generalizability to novel datasets currently present obstacles to the enhanced application of AI in cataract surgery. Once these barriers are addressed in upcoming research, AI stands to improve cataract screening accessibility, junior physician training, and identification of surgical complications.
Affiliation(s)
- Abhimanyu S Ahuja
- Department of Ophthalmology, Casey Eye Institute, Oregon Health and Science University, Portland, OR, USA
- Alfredo A Paredes III
- Charles E. Schmidt College of Medicine, Florida Atlantic University, Boca Raton, FL, USA
- Sejal Kodwani
- Windsor University School of Medicine, Cayon, St. Kitts, KN
- Isabella V Wagner
- Department of Ophthalmology, Mayo Clinic Florida, Jacksonville, FL, USA
- Darby D Miller
- Department of Ophthalmology, Mayo Clinic Florida, Jacksonville, FL, USA
- Syril Dorairaj
- Department of Ophthalmology, Mayo Clinic Florida, Jacksonville, FL, USA
3
Satyanaik S, Murali A, Alapatt D, Wang X, Mascagni P, Padoy N. Optimizing latent graph representations of surgical scenes for unseen domain generalization. Int J Comput Assist Radiol Surg 2024;19:1243-1250. [PMID: 38678488 DOI: 10.1007/s11548-024-03121-2]
Abstract
PURPOSE Advances in deep learning have resulted in effective models for surgical video analysis; however, these models often fail to generalize across medical centers due to domain shift caused by variations in surgical workflow, camera setups, and patient demographics. Recently, object-centric learning has emerged as a promising approach for improved surgical scene understanding, capturing and disentangling visual and semantic properties of surgical tools and anatomy to improve downstream task performance. In this work, we conduct a multicentric performance benchmark of object-centric approaches, focusing on critical view of safety assessment in laparoscopic cholecystectomy, then propose an improved approach for unseen domain generalization. METHODS We evaluate four object-centric approaches for domain generalization, establishing baseline performance. Next, leveraging the disentangled nature of object-centric representations, we dissect one of these methods through a series of ablations (e.g., ignoring either visual or semantic features for downstream classification). Finally, based on the results of these ablations, we develop an optimized method specifically tailored for domain generalization, LG-DG, that includes a novel disentanglement loss function. RESULTS Our optimized approach, LG-DG, achieves an improvement of 9.28% over the best baseline approach. More broadly, we show that object-centric approaches are highly effective for domain generalization thanks to their modular approach to representation learning. CONCLUSION We investigate the use of object-centric methods for unseen domain generalization, identify method-agnostic factors critical for performance, and present an optimized approach that substantially outperforms existing methods.
Affiliation(s)
- Aditya Murali
- ICube, University of Strasbourg, CNRS, Strasbourg, France
- Deepak Alapatt
- ICube, University of Strasbourg, CNRS, Strasbourg, France
- Xin Wang
- West China Hospital of Sichuan University, Chengdu, China
- Pietro Mascagni
- IHU, Strasbourg, France
- Fondazione Policlinico Universitario A. Gemelli IRCCS, Rome, Italy
- Nicolas Padoy
- ICube, University of Strasbourg, CNRS, Strasbourg, France
- IHU, Strasbourg, France
4
Ghamsarian N, Wolf S, Zinkernagel M, Schoeffmann K, Sznitman R. DeepPyramid+: medical image segmentation using Pyramid View Fusion and Deformable Pyramid Reception. Int J Comput Assist Radiol Surg 2024;19:851-859. [PMID: 38189905 DOI: 10.1007/s11548-023-03046-2]
Abstract
PURPOSE Semantic segmentation plays a pivotal role in many applications related to medical image and video analysis. However, designing a neural network architecture for medical image and surgical video segmentation is challenging due to the diverse features of relevant classes, including heterogeneity, deformability, transparency, blunt boundaries, and various distortions. We propose a network architecture, DeepPyramid+, which addresses diverse challenges encountered in medical image and surgical video segmentation. METHODS The proposed DeepPyramid+ incorporates two major modules, namely "Pyramid View Fusion" (PVF) and "Deformable Pyramid Reception" (DPR), to address the outlined challenges. PVF replicates a deduction process within the neural network, aligning with the human visual system, thereby enhancing the representation of relative information at each pixel position. Complementarily, DPR introduces shape- and scale-adaptive feature extraction techniques using dilated deformable convolutions, enhancing accuracy and robustness in handling heterogeneous classes and deformable shapes. RESULTS Extensive experiments conducted on diverse datasets, including endometriosis videos, MRI images, OCT scans, and cataract and laparoscopy videos, demonstrate the effectiveness of DeepPyramid+ in handling various challenges such as shape and scale variation, reflection, and blur degradation. DeepPyramid+ demonstrates significant improvements in segmentation performance, achieving up to a 3.65% increase in Dice coefficient for intra-domain segmentation and up to a 17% increase in Dice coefficient for cross-domain segmentation. CONCLUSIONS DeepPyramid+ consistently outperforms state-of-the-art networks across diverse modalities considering different backbone networks, showcasing its versatility. Accordingly, DeepPyramid+ emerges as a robust and effective solution, successfully overcoming the intricate challenges associated with relevant content segmentation in medical images and surgical videos. Its consistent performance and adaptability indicate its potential to enhance precision in computerized medical image and surgical video analysis applications.
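The PVF module aggregates context at multiple scales around each pixel position. A generic PyTorch sketch of pyramid-pooling fusion in that spirit follows; the grid sizes and the 1x1 fusion convolution are illustrative assumptions, not the actual DeepPyramid+ PVF module:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScalePoolFusion(nn.Module):
    """Pool features over several grid sizes, upsample, and fuse with the input.

    A generic multi-scale context-fusion sketch, not the published PVF module.
    """
    def __init__(self, channels, grids=(1, 2, 4)):
        super().__init__()
        self.grids = grids
        self.fuse = nn.Conv2d(channels * (len(grids) + 1), channels, kernel_size=1)

    def forward(self, x):                      # x: (B, C, H, W)
        h, w = x.shape[-2:]
        feats = [x]
        for g in self.grids:
            pooled = F.adaptive_avg_pool2d(x, output_size=g)   # (B, C, g, g)
            feats.append(F.interpolate(pooled, size=(h, w), mode='bilinear', align_corners=False))
        return self.fuse(torch.cat(feats, dim=1))              # back to (B, C, H, W)
```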
Affiliation(s)
- Negin Ghamsarian
- ARTORG Center for Biomedical Engineering Research, University of Bern, Bern, Switzerland
- Sebastian Wolf
- Department of Ophthalmology, Inselspital, Bern, Switzerland
- Klaus Schoeffmann
- Department of Information Technology, University of Klagenfurt, Klagenfurt, Austria
- Raphael Sznitman
- ARTORG Center for Biomedical Engineering Research, University of Bern, Bern, Switzerland
5
Ghamsarian N, El-Shabrawi Y, Nasirihaghighi S, Putzgruber-Adamitsch D, Zinkernagel M, Wolf S, Schoeffmann K, Sznitman R. Cataract-1K Dataset for Deep-Learning-Assisted Analysis of Cataract Surgery Videos. Sci Data 2024;11:373. [PMID: 38609405 PMCID: PMC11014927 DOI: 10.1038/s41597-024-03193-4]
Abstract
In recent years, the landscape of computer-assisted interventions and post-operative surgical video analysis has been dramatically reshaped by deep-learning techniques, resulting in significant advancements in surgeons' skills, operating room management, and overall surgical outcomes. However, the progression of deep-learning-powered surgical technologies is profoundly reliant on large-scale datasets and annotations. In particular, surgical scene understanding and phase recognition stand as pivotal pillars within the realm of computer-assisted surgery and post-operative assessment of cataract surgery videos. In this context, we present the largest cataract surgery video dataset that addresses diverse requisites for constructing computerized surgical workflow analysis and detecting post-operative irregularities in cataract surgery. We validate the quality of annotations by benchmarking the performance of several state-of-the-art neural network architectures for phase recognition and surgical scene segmentation. In addition, we initiate research on domain adaptation for instrument segmentation in cataract surgery by evaluating cross-domain instrument segmentation performance on these videos. The dataset and annotations are publicly available in Synapse.
Affiliation(s)
- Negin Ghamsarian
- Center for Artificial Intelligence in Medicine (CAIM), Department of Medicine, University of Bern, Bern, Switzerland
- Yosuf El-Shabrawi
- Department of Ophthalmology, Klinikum Klagenfurt, Klagenfurt, Austria
- Sahar Nasirihaghighi
- Department of Information Technology, University of Klagenfurt, Klagenfurt, Austria
- Sebastian Wolf
- Department of Ophthalmology, Inselspital, Bern, Switzerland
- Klaus Schoeffmann
- Department of Information Technology, University of Klagenfurt, Klagenfurt, Austria
- Raphael Sznitman
- Center for Artificial Intelligence in Medicine (CAIM), Department of Medicine, University of Bern, Bern, Switzerland
6
Liu M, Han Y, Wang J, Wang C, Wang Y, Meijering E. LSKANet: Long Strip Kernel Attention Network for Robotic Surgical Scene Segmentation. IEEE Trans Med Imaging 2024;43:1308-1322. [PMID: 38015689 DOI: 10.1109/tmi.2023.3335406]
Abstract
Surgical scene segmentation is a critical task in robotic-assisted surgery. However, the complexity of the surgical scene, which mainly includes local feature similarity (e.g., between different anatomical tissues), intraoperative complex artifacts, and indistinguishable boundaries, poses significant challenges to accurate segmentation. To tackle these problems, we propose the Long Strip Kernel Attention network (LSKANet), including two well-designed modules named the Dual-block Large Kernel Attention module (DLKA) and the Multiscale Affinity Feature Fusion module (MAFF), which together enable precise segmentation of surgical images. Specifically, by introducing strip convolutions with different topologies (cascaded and parallel) in two blocks and a large kernel design, DLKA can make full use of region- and strip-like surgical features and extract both visual and structural information to reduce the false segmentation caused by local feature similarity. In MAFF, affinity matrices calculated from multiscale feature maps are applied as feature fusion weights, which helps to address the interference of artifacts by suppressing the activations of irrelevant regions. In addition, a hybrid loss with a Boundary Guided Head (BGH) is proposed to help the network segment indistinguishable boundaries effectively. We evaluate the proposed LSKANet on three datasets with different surgical scenes. The experimental results show that our method achieves new state-of-the-art results on all three datasets with improvements of 2.6%, 1.4%, and 3.4% mIoU, respectively. Furthermore, our method is compatible with different backbones and can significantly increase their segmentation accuracy. Code is available at https://github.com/YubinHan73/LSKANet.
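The DLKA module builds on strip convolutions arranged in cascaded and parallel topologies. A bare-bones PyTorch sketch of that idea is shown below; the kernel length, depthwise grouping, and residual sum are assumptions for illustration rather than the published module:

```python
import torch
import torch.nn as nn

class StripConvBlock(nn.Module):
    """Cascaded and parallel 1xk / kx1 strip convolutions over the same input.

    A simplified sketch of long-strip-kernel feature extraction, not the exact DLKA design.
    """
    def __init__(self, channels, k=11):
        super().__init__()
        pad = k // 2
        self.horiz = nn.Conv2d(channels, channels, (1, k), padding=(0, pad), groups=channels)
        self.vert = nn.Conv2d(channels, channels, (k, 1), padding=(pad, 0), groups=channels)

    def forward(self, x):
        cascaded = self.vert(self.horiz(x))       # serial topology: 1xk then kx1
        parallel = self.horiz(x) + self.vert(x)   # parallel topology: fuse both directions
        return x + cascaded + parallel            # residual combination of the two views
```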
7
Rueckert T, Rueckert D, Palm C. Methods and datasets for segmentation of minimally invasive surgical instruments in endoscopic images and videos: A review of the state of the art. Comput Biol Med 2024;169:107929. [PMID: 38184862 DOI: 10.1016/j.compbiomed.2024.107929]
Abstract
In the field of computer- and robot-assisted minimally invasive surgery, enormous progress has been made in recent years based on the recognition of surgical instruments in endoscopic images and videos. In particular, the determination of the position and type of instruments is of great interest. Current work involves both spatial and temporal information, with the idea that predicting the movement of surgical tools over time may improve the quality of the final segmentations. The provision of publicly available datasets has recently encouraged the development of new methods, mainly based on deep learning. In this review, we identify and characterize datasets used for method development and evaluation and quantify their frequency of use in the literature. We further present an overview of the current state of research regarding the segmentation and tracking of minimally invasive surgical instruments in endoscopic images and videos. The paper focuses on methods that work purely visually, without markers of any kind attached to the instruments, considering both single-frame semantic and instance segmentation approaches, as well as those that incorporate temporal information. The publications analyzed were identified through the platforms Google Scholar, Web of Science, and PubMed. The search terms used were "instrument segmentation", "instrument tracking", "surgical tool segmentation", and "surgical tool tracking", resulting in a total of 741 articles published between 01/2015 and 07/2023, of which 123 were included using systematic selection criteria. A discussion of the reviewed literature is provided, highlighting existing shortcomings and emphasizing the available potential for future developments.
Affiliation(s)
- Tobias Rueckert
- Regensburg Medical Image Computing (ReMIC), Ostbayerische Technische Hochschule Regensburg (OTH Regensburg), Germany
- Daniel Rueckert
- Artificial Intelligence in Healthcare and Medicine, Klinikum rechts der Isar, Technical University of Munich, Germany; Department of Computing, Imperial College London, UK
- Christoph Palm
- Regensburg Medical Image Computing (ReMIC), Ostbayerische Technische Hochschule Regensburg (OTH Regensburg), Germany; Regensburg Center of Health Sciences and Technology (RCHST), OTH Regensburg, Germany
8
Kostiuchik G, Sharan L, Mayer B, Wolf I, Preim B, Engelhardt S. Surgical phase and instrument recognition: how to identify appropriate dataset splits. Int J Comput Assist Radiol Surg 2024. [PMID: 38285380 DOI: 10.1007/s11548-024-03063-9]
Abstract
PURPOSE Machine learning approaches can only be reliably evaluated if training, validation, and test data splits are representative and not affected by the absence of classes. Surgical workflow and instrument recognition are two tasks that are complicated in this manner because of heavy data imbalances resulting from the different lengths of phases and their potentially erratic occurrence. Furthermore, sub-properties like instrument (co-)occurrence are usually not explicitly considered when defining the split. METHODS We present a publicly available data visualization tool that enables interactive exploration of dataset partitions for surgical phase and instrument recognition. The application focuses on the visualization of the occurrence of phases, phase transitions, instruments, and instrument combinations across sets. In particular, it facilitates assessment of dataset splits and identification of sub-optimal splits. RESULTS We performed analysis of the datasets Cholec80, CATARACTS, CaDIS, M2CAI-workflow, and M2CAI-tool using the proposed application. We were able to uncover phase transitions, individual instruments, and combinations of surgical instruments that were not represented in one of the sets. Addressing these issues, we identify possible improvements in the splits using our tool. A user study with ten participants demonstrated that participants were able to successfully solve a selection of data exploration tasks. CONCLUSION In highly unbalanced class distributions, special care should be taken with respect to the selection of an appropriate dataset split because it can greatly influence the assessment of machine learning approaches. Our interactive tool allows for determination of better splits to improve current practices in the field. The live application is available at https://cardio-ai.github.io/endovis-ml/.
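The core check the tool supports, verifying that every phase or instrument class is represented in each split, can be sketched in a few lines of Python; the dictionary-of-label-sets input format below is a hypothetical example, not the tool's actual data model:

```python
from collections import Counter

def check_split_coverage(splits):
    """Report phase/instrument labels that never occur in a given dataset split.

    `splits` maps split name -> list of per-frame label sets, e.g.
    {"train": [{"phaco", "capsulorhexis"}, ...], "val": [...], "test": [...]}.
    """
    all_labels = set().union(*(labels for frames in splits.values() for labels in frames))
    report = {}
    for name, frames in splits.items():
        counts = Counter(label for labels in frames for label in labels)
        report[name] = sorted(all_labels - set(counts))   # labels missing from this split
    return report
```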
Affiliation(s)
- Georgii Kostiuchik
- Department of Cardiac Surgery, Heidelberg University Hospital, Heidelberg, Germany
- DZHK (German Centre for Cardiovascular Research), Partner Site Heidelberg/Mannheim, Heidelberg, Germany
- Lalith Sharan
- Department of Cardiac Surgery, Heidelberg University Hospital, Heidelberg, Germany
- DZHK (German Centre for Cardiovascular Research), Partner Site Heidelberg/Mannheim, Heidelberg, Germany
- Benedikt Mayer
- Department of Simulation and Graphics, University of Magdeburg, Magdeburg, Germany
- Ivo Wolf
- Department of Computer Science, Mannheim University of Applied Sciences, Mannheim, Germany
- Bernhard Preim
- Department of Simulation and Graphics, University of Magdeburg, Magdeburg, Germany
- Sandy Engelhardt
- Department of Cardiac Surgery, Heidelberg University Hospital, Heidelberg, Germany
- DZHK (German Centre for Cardiovascular Research), Partner Site Heidelberg/Mannheim, Heidelberg, Germany
9
Badilla-Solórzano J, Ihler S, Gellrich NC, Spalthoff S. Improving instrument detection for a robotic scrub nurse using multi-view voting. Int J Comput Assist Radiol Surg 2023;18:1961-1968. [PMID: 37530904 PMCID: PMC10589190 DOI: 10.1007/s11548-023-03002-0]
Abstract
PURPOSE A basic task of a robotic scrub nurse is surgical instrument detection. Deep learning techniques could potentially address this task; nevertheless, their performance is subject to some degree of error, which could render them unsuitable for real-world applications. In this work, we aim to demonstrate how the combination of a trained instrument detector with an instance-based voting scheme that considers several frames and viewpoints is enough to guarantee a strong improvement in the instrument detection task. METHODS We exploit the typical setup of a robotic scrub nurse to collect RGB data and point clouds from different viewpoints. Using trained Mask R-CNN models, we obtain predictions from each view. We propose a multi-view voting scheme based on predicted instances that combines the gathered data and predictions to produce a reliable map of the location of the instruments in the scene. RESULTS Our approach reduces the number of errors by more than 82% compared with the single-view case. On average, the data from five viewpoints are sufficient to infer the correct instrument arrangement with our best model. CONCLUSION Our approach can drastically improve an instrument detector's performance. Our method is practical and can be applied during an actual medical procedure without negatively affecting the surgical workflow. Our implementation and data are made available for the scientific community ( https://github.com/Jorebs/Multi-view-Voting-Scheme ).
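As a deliberately simplified stand-in for the instance-based scheme, the sketch below majority-votes the predicted instrument class at each tray position across viewpoints; the position-keyed input format is a hypothetical simplification of the point-cloud-based matching used in the paper:

```python
from collections import Counter

def vote_instrument_labels(per_view_predictions):
    """Majority-vote the instrument label at each position across several viewpoints.

    `per_view_predictions` is a list (one entry per viewpoint) of dicts mapping a
    position id to the predicted instrument class for that view.
    """
    votes = {}
    for view in per_view_predictions:
        for position, label in view.items():
            votes.setdefault(position, Counter())[label] += 1
    # keep the most frequent label per position; ties resolve by insertion order
    return {position: counts.most_common(1)[0][0] for position, counts in votes.items()}
```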
Affiliation(s)
- Sontje Ihler
- Institute of Mechatronic Systems, Leibniz University Hannover, Garbsen, Germany
- Simon Spalthoff
- Department of Cranio-Maxillofacial Surgery, Hannover Medical School, Hannover, Germany
10
Lou A, Tawfik K, Yao X, Liu Z, Noble J. Min-Max Similarity: A Contrastive Semi-Supervised Deep Learning Network for Surgical Tools Segmentation. IEEE Trans Med Imaging 2023;42:2832-2841. [PMID: 37037256 PMCID: PMC10597739 DOI: 10.1109/tmi.2023.3266137]
Abstract
A common problem with segmentation of medical images using neural networks is the difficulty of obtaining a sufficient amount of pixel-level annotated data for training. To address this issue, we propose a semi-supervised segmentation network based on contrastive learning. In contrast to the previous state of the art, we introduce Min-Max Similarity (MMS), a contrastive-learning form of dual-view training that employs classifiers and projectors to build all-negative and positive/negative feature pairs, respectively, formulating learning as solving an MMS problem. The all-negative pairs are used to supervise the networks learning from different views and to capture general features, and the consistency of unlabeled predictions is measured by a pixel-wise contrastive loss between positive and negative pairs. To quantitatively and qualitatively evaluate our proposed method, we test it on four public endoscopy surgical tool segmentation datasets and one cochlear implant surgery dataset, which we manually annotated. Results indicate that our proposed method consistently outperforms state-of-the-art semi-supervised and fully supervised segmentation algorithms. Our semi-supervised segmentation algorithm can successfully recognize unknown surgical tools and provide good predictions, and the MMS approach achieves inference speeds of about 40 frames per second (fps), making it suitable for real-time video segmentation.
11
Ramesh S, Srivastav V, Alapatt D, Yu T, Murali A, Sestini L, Nwoye CI, Hamoud I, Sharma S, Fleurentin A, Exarchakis G, Karargyris A, Padoy N. Dissecting self-supervised learning methods for surgical computer vision. Med Image Anal 2023;88:102844. [PMID: 37270898 DOI: 10.1016/j.media.2023.102844]
Abstract
The field of surgical computer vision has undergone considerable breakthroughs in recent years with the rising popularity of deep neural network-based methods. However, standard fully-supervised approaches for training such models require vast amounts of annotated data, imposing a prohibitively high cost, especially in the clinical domain. Self-Supervised Learning (SSL) methods, which have begun to gain traction in the general computer vision community, represent a potential solution to these annotation costs, allowing useful representations to be learned from unlabeled data alone. Still, the effectiveness of SSL methods in more complex and impactful domains, such as medicine and surgery, remains limited and largely unexplored. In this work, we address this critical need by investigating four state-of-the-art SSL methods (MoCo v2, SimCLR, DINO, SwAV) in the context of surgical computer vision. We present an extensive analysis of the performance of these methods on the Cholec80 dataset for two fundamental and popular tasks in surgical context understanding, phase recognition and tool presence detection. We examine their parameterization, then their behavior with respect to training data quantities in semi-supervised settings. Correct transfer of these methods to surgery, as described and conducted in this work, leads to substantial performance gains over generic uses of SSL (up to 7.4% on phase recognition and 20% on tool presence detection), as well as gains over state-of-the-art semi-supervised phase recognition approaches of up to 14%. Further results obtained on a highly diverse selection of surgical datasets exhibit strong generalization properties. The code is available at https://github.com/CAMMA-public/SelfSupSurg.
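For readers unfamiliar with the benchmarked objectives, a minimal NT-Xent (SimCLR-style) contrastive loss is sketched below; the temperature and batch layout are generic defaults, not the configurations evaluated in this work:

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.1):
    """NT-Xent loss for a batch of paired embeddings z1, z2 of shape (N, D)."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2N, D), unit-norm rows
    sim = z @ z.t() / temperature                         # pairwise cosine similarities
    sim.fill_diagonal_(float("-inf"))                     # exclude self-similarity
    # for row i < N the positive is row i + N, and vice versa
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)
```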
Affiliation(s)
- Sanat Ramesh
- ICube, University of Strasbourg, CNRS, Strasbourg 67000, France; Altair Robotics Lab, Department of Computer Science, University of Verona, Verona 37134, Italy
- Vinkle Srivastav
- ICube, University of Strasbourg, CNRS, Strasbourg 67000, France
- Deepak Alapatt
- ICube, University of Strasbourg, CNRS, Strasbourg 67000, France
- Tong Yu
- ICube, University of Strasbourg, CNRS, Strasbourg 67000, France
- Aditya Murali
- ICube, University of Strasbourg, CNRS, Strasbourg 67000, France
- Luca Sestini
- ICube, University of Strasbourg, CNRS, Strasbourg 67000, France; Department of Electronics, Information and Bioengineering, Politecnico di Milano, Milano 20133, Italy
- Idris Hamoud
- ICube, University of Strasbourg, CNRS, Strasbourg 67000, France
- Saurav Sharma
- ICube, University of Strasbourg, CNRS, Strasbourg 67000, France
- Georgios Exarchakis
- ICube, University of Strasbourg, CNRS, Strasbourg 67000, France; IHU Strasbourg, Strasbourg 67000, France
- Alexandros Karargyris
- ICube, University of Strasbourg, CNRS, Strasbourg 67000, France; IHU Strasbourg, Strasbourg 67000, France
- Nicolas Padoy
- ICube, University of Strasbourg, CNRS, Strasbourg 67000, France; IHU Strasbourg, Strasbourg 67000, France
12
Bhattarai B, Subedi R, Gaire RR, Vazquez E, Stoyanov D. Histogram of Oriented Gradients meet deep learning: A novel multi-task deep network for 2D surgical image semantic segmentation. Med Image Anal 2023;85:102747. [PMID: 36702038 PMCID: PMC10626764 DOI: 10.1016/j.media.2023.102747]
Abstract
We present our novel deep multi-task learning method for medical image segmentation. Existing multi-task methods demand ground truth annotations for both the primary and auxiliary tasks. In contrast, we propose to generate the pseudo-labels of an auxiliary task in an unsupervised manner. To generate the pseudo-labels, we leverage Histograms of Oriented Gradients (HOGs), one of the most widely used and powerful hand-crafted features for detection. Together with the ground truth semantic segmentation masks for the primary task and pseudo-labels for the auxiliary task, we learn the parameters of the deep network to minimize the loss of both the primary task and the auxiliary task jointly. We applied our method to two powerful and widely used semantic segmentation networks, UNet and U2Net, training them in a multi-task setup. To validate our hypothesis, we performed experiments on two different medical image segmentation datasets. From the extensive quantitative and qualitative results, we observe that our method consistently improves performance compared with the counterpart methods. Moreover, our method is the winner of the FetReg EndoVis sub-challenge on semantic segmentation organised in conjunction with MICCAI 2021. Code and implementation details are available at: https://github.com/thetna/medical_image_segmentation.
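A minimal sketch of deriving per-cell orientation pseudo-labels from HOG descriptors with scikit-image is shown below; the cell size, bin count, and "dominant bin as label" choice are illustrative assumptions, not the authors' exact pseudo-label recipe:

```python
import numpy as np
from skimage.feature import hog

def hog_pseudo_labels(gray_image, cell=8, orientations=9):
    """Assign each HOG cell the index of its dominant orientation bin as a pseudo-label."""
    # feature_vector=False keeps the per-block layout: (rows, cols, 1, 1, orientations)
    h = hog(gray_image, orientations=orientations,
            pixels_per_cell=(cell, cell), cells_per_block=(1, 1),
            feature_vector=False)
    h = h.reshape(h.shape[0], h.shape[1], orientations)   # collapse the 1x1 block dims
    return np.argmax(h, axis=-1)                           # one orientation class per cell
```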
Affiliation(s)
- Ronast Subedi
- Nepal Applied Mathematics and Informatics Institute for research (NAAMII), Nepal
- Rebati Raman Gaire
- Nepal Applied Mathematics and Informatics Institute for research (NAAMII), Nepal
13
Yeh HH, Jain AM, Fox O, Sebov K, Wang SY. PhacoTrainer: Deep Learning for Cataract Surgical Videos to Track Surgical Tools. Transl Vis Sci Technol 2023;12:23. [PMID: 36947046 PMCID: PMC10050900 DOI: 10.1167/tvst.12.3.23]
Abstract
Purpose The purpose of this study was to build a deep-learning model that automatically analyzes cataract surgical videos for the locations of surgical landmarks, and to derive skill-related motion metrics. Methods The locations of the pupil, limbus, and 8 classes of surgical instruments were identified by a 2-step algorithm: (1) mask segmentation and (2) landmark identification from the masks. To perform mask segmentation, we trained the YOLACT model on 1156 frames sampled from 268 videos and the public Cataract Dataset for Image Segmentation (CaDIS) dataset. Landmark identification was performed by fitting ellipses or lines to the contours of the masks and deriving locations of interest, including surgical tooltips and the pupil center. Landmark identification was evaluated by the distance between the predicted and true positions in 5853 frames of 10 phacoemulsification video clips. We derived the total path length, maximal speed, and covered area using the tip positions and examined the correlation with human-rated surgical performance. Results The mean average precision score and intersection-over-union for mask detection were 0.78 and 0.82. The average distance between the predicted and true positions of the pupil center, phaco tip, and second instrument tip was 5.8, 9.1, and 17.1 pixels. The total path length and covered areas of these landmarks were negatively correlated with surgical performance. Conclusions We developed a deep-learning method to localize key anatomical portions of the eye and cataract surgical tools, which can be used to automatically derive metrics correlated with surgical skill. Translational Relevance Our system could form the basis of an automated feedback system that helps cataract surgeons evaluate their performance.
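The landmark step (fitting ellipses to mask contours and tracking positions over time) can be illustrated with OpenCV; the largest-contour heuristic and the simple path-length metric below are simplified assumptions, not the study's full pipeline:

```python
import cv2
import numpy as np

def pupil_center_from_mask(mask):
    """Fit an ellipse to the largest contour of a binary pupil mask and return its center."""
    # OpenCV >= 4 returns (contours, hierarchy)
    contours, _ = cv2.findContours(mask.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    if not contours:
        return None
    largest = max(contours, key=cv2.contourArea)
    if len(largest) < 5:                       # fitEllipse needs at least 5 contour points
        return None
    (cx, cy), _axes, _angle = cv2.fitEllipse(largest)
    return cx, cy

def total_path_length(points):
    """Sum of frame-to-frame displacements of a tracked tooltip or pupil center."""
    pts = np.asarray(points, dtype=float)
    return float(np.linalg.norm(np.diff(pts, axis=0), axis=1).sum())
```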
Affiliation(s)
- Hsu-Hang Yeh
- Department of Biomedical Data Science, Stanford University, Palo Alto, CA, USA
- Anjal M. Jain
- Department of Ophthalmology, Byers Eye Institute, Stanford University, Palo Alto, CA, USA
- Olivia Fox
- Krieger School of Arts and Sciences, Johns Hopkins University, Baltimore, MD, USA
- Kostya Sebov
- Department of Biomedical Data Science, Stanford University, Palo Alto, CA, USA
- Sophia Y. Wang
- Department of Biomedical Data Science, Stanford University, Palo Alto, CA, USA
- Department of Ophthalmology, Byers Eye Institute, Stanford University, Palo Alto, CA, USA
14
Bydon M, Durrani S, Mualem W. Commentary: Validation of Machine Learning-Based Automated Surgical Instrument Annotation Using Publicly Available Intraoperative Video. Oper Neurosurg (Hagerstown) 2022;23:e158-e159. [PMID: 35972093 DOI: 10.1227/ons.0000000000000285]
Affiliation(s)
- Mohamad Bydon
- Mayo Clinic Neuro-Informatics Laboratory, Department of Neurologic Surgery, Mayo Clinic, Rochester, Minnesota, USA; Department of Neurologic Surgery, Mayo Clinic, Rochester, Minnesota, USA
- Sulaman Durrani
- Mayo Clinic Neuro-Informatics Laboratory, Department of Neurologic Surgery, Mayo Clinic, Rochester, Minnesota, USA; Department of Neurologic Surgery, Mayo Clinic, Rochester, Minnesota, USA
- William Mualem
- Mayo Clinic Neuro-Informatics Laboratory, Department of Neurologic Surgery, Mayo Clinic, Rochester, Minnesota, USA; Department of Neurologic Surgery, Mayo Clinic, Rochester, Minnesota, USA
15
Evolution and Applications of Artificial Intelligence to Cataract Surgery. Ophthalmol Sci 2022;2:100164. [PMID: 36245750 PMCID: PMC9559105 DOI: 10.1016/j.xops.2022.100164]
Abstract
Topic Despite significant recent advances in artificial intelligence (AI) technology within several ophthalmic subspecialties, AI seems to be underutilized in the diagnosis and management of cataracts. In this article, we review AI technology that may soon become central to the cataract surgical pathway, from diagnosis to completion of surgery. Clinical Relevance This review describes recent advances in AI in the preoperative, intraoperative, and postoperative phase of cataract surgery, demonstrating its impact on the pathway and the surgical team. Methods A systematic search of PubMed was conducted to identify relevant publications on the topic of AI for cataract surgery. Articles of high quality and relevance to the topic were selected. Results Before surgery, diagnosis and grading of cataracts through AI-based image analysis has been demonstrated in several research settings. Optimal intraocular lens (IOL) power to achieve the desired postoperative refraction can be calculated with a higher degree of accuracy using AI-based modeling compared with traditional IOL formulae. During surgery, innovative AI-based video analysis tools are in development, promoting a paradigm shift for documentation, storage, and cataloging libraries of surgical videos with applications for teaching and training, complication review, and surgical research. Situation-aware computer-assisted devices can be connected to surgical microscopes for automated video capture and cloud storage upload. Artificial intelligence-based software can provide workflow analysis, tool detection, and video segmentation for skill evaluation by the surgeon and the trainee. Mixed reality features, such as real-time intraoperative warnings, may have a role in improving surgical decision-making with the key aim of reducing complications by recognizing surgical risks in advance and alerting the operator to them. For the management of patient flow through the pathway, AI-based mathematical models generating patient referral patterns are in development, as are simulations to optimize operating room use. In the postoperative phase, AI has been shown to predict the posterior capsule status with reasonable accuracy, and can therefore improve the triage pathway in the treatment of posterior capsular opacification. Discussion Artificial intelligence for cataract surgery will be as relevant as in other subspecialties of ophthalmology and will eventually constitute a future cornerstone for an enhanced cataract surgery pathway.
16
Vedula SS, Ghazi A, Collins JW, Pugh C, Stefanidis D, Meireles O, Hung AJ, Schwaitzberg S, Levy JS, Sachdeva AK. Artificial Intelligence Methods and Artificial Intelligence-Enabled Metrics for Surgical Education: A Multidisciplinary Consensus. J Am Coll Surg 2022;234:1181-1192. [PMID: 35703817 PMCID: PMC10634198 DOI: 10.1097/xcs.0000000000000190]
Abstract
BACKGROUND Artificial intelligence (AI) methods and AI-enabled metrics hold tremendous potential to advance surgical education. Our objective was to generate consensus guidance on specific needs for AI methods and AI-enabled metrics for surgical education. STUDY DESIGN The study included a systematic literature search, a virtual conference, and a 3-round Delphi survey of 40 representative multidisciplinary stakeholders with domain expertise selected through purposeful sampling. The accelerated Delphi process was completed within 10 days. The survey covered overall utility, anticipated future (10-year time horizon), and applications for surgical training, assessment, and feedback. Consensus was defined as agreement among 80% or more of respondents. We coded survey questions into 11 themes and descriptively analyzed the responses. RESULTS The respondents included surgeons (40%), engineers (15%), affiliates of industry (27.5%), professional societies (7.5%), regulatory agencies (7.5%), and a lawyer (2.5%). The survey included 155 questions; consensus was achieved on 136 (87.7%). The panel listed 6 deliverables each for AI-enhanced learning curve analytics and surgical skill assessment. For feedback, the panel identified 10 priority deliverables spanning 2-year (n = 2), 5-year (n = 4), and 10-year (n = 4) timeframes. Within 2 years, the panel expects development of methods to recognize anatomy in images of the surgical field and to provide surgeons with performance feedback immediately after an operation. The panel also identified 5 essentials that should be included in operative performance reports for surgeons. CONCLUSIONS The Delphi panel consensus provides a specific, bold, and forward-looking roadmap for AI methods and AI-enabled metrics for surgical education.
Affiliation(s)
- S Swaroop Vedula
- Malone Center for Engineering in Healthcare, Johns Hopkins University, Baltimore, MD (Vedula)
- Ahmed Ghazi
- Department of Urology, University of Rochester Medical Center, Rochester, NY (Ghazi)
- Justin W Collins
- Division of Surgery and Interventional Science, Research Department of Targeted Intervention and Wellcome/Engineering and Physical Sciences Research Council Center for Interventional and Surgical Sciences, University College London, London, UK (Collins)
- Carla Pugh
- Department of Surgery, Stanford University, Stanford, CA (Pugh)
- Ozanan Meireles
- Department of Surgery, Massachusetts General Hospital, Boston, MA (Meireles)
- Andrew J Hung
- Artificial Intelligence Center at University of Southern California Urology, Department of Urology, University of Southern California, Los Angeles, CA (Hung)
- Jeffrey S Levy
- Institute for Surgical Excellence, Washington, DC (Levy)
- Ajit K Sachdeva
- Division of Education, American College of Surgeons, Chicago, IL (Sachdeva)
17
Seidlitz S, Sellner J, Odenthal J, Özdemir B, Studier-Fischer A, Knödler S, Ayala L, Adler TJ, Kenngott HG, Tizabi M, Wagner M, Nickel F, Müller-Stich BP, Maier-Hein L. Robust deep learning-based semantic organ segmentation in hyperspectral images. Med Image Anal 2022;80:102488. [DOI: 10.1016/j.media.2022.102488]
18
Kugener G, Pangal DJ, Cardinal T, Collet C, Lechtholz-Zey E, Lasky S, Sundaram S, Markarian N, Zhu Y, Roshannai A, Sinha A, Han XY, Papyan V, Hung A, Anandkumar A, Wrobel B, Zada G, Donoho DA. Utility of the Simulated Outcomes Following Carotid Artery Laceration Video Data Set for Machine Learning Applications. JAMA Netw Open 2022;5:e223177. [PMID: 35311962 PMCID: PMC8938712 DOI: 10.1001/jamanetworkopen.2022.3177]
Abstract
IMPORTANCE Surgical data scientists lack video data sets that depict adverse events, which may affect model generalizability and introduce bias. Hemorrhage may be particularly challenging for computer vision-based models because blood obscures the scene. OBJECTIVE To assess the utility of the Simulated Outcomes Following Carotid Artery Laceration (SOCAL) data set, a publicly available surgical video data set of hemorrhage complication management with instrument annotations and task outcomes, to provide benchmarks for surgical data science techniques, including computer vision instrument detection, instrument use metrics and outcome associations, and validation of a SOCAL-trained neural network using real operative video. DESIGN, SETTING, AND PARTICIPANTS For this quality improvement study, a total of 75 surgeons with 1 to 30 years' experience (mean, 7 years) were filmed from January 1, 2017, to December 31, 2020, managing catastrophic surgical hemorrhage in a high-fidelity cadaveric training exercise at nationwide training courses. Videos were annotated from January 1 to June 30, 2021. INTERVENTIONS Surgeons received expert coaching between 2 trials. MAIN OUTCOMES AND MEASURES Hemostasis within 5 minutes (task success, dichotomous), time to hemostasis (in seconds), and blood loss (in milliliters) were recorded. Deep neural networks (DNNs) were trained to detect surgical instruments in view. Model performance was measured using mean average precision (mAP), sensitivity, and positive predictive value. RESULTS SOCAL contains 31 443 frames with 65 071 surgical instrument annotations from 147 trials with associated surgeon demographic characteristics, time to hemostasis, and recorded blood loss for each trial. Computer vision-based instrument detection methods using DNNs trained on SOCAL achieved an mAP of 0.67 overall and 0.91 for the most common surgical instrument (suction). Hemorrhage control challenges standard object detectors: detection of some surgical instruments remained poor (mAP, 0.25). On real intraoperative video, the model achieved a sensitivity of 0.77 and a positive predictive value of 0.96. Instrument use metrics derived from the SOCAL video were significantly associated with performance (blood loss). CONCLUSIONS AND RELEVANCE Hemorrhage control is a high-stakes adverse event that poses unique challenges for video analysis, but no data sets of hemorrhage control exist. The use of SOCAL, the first data set to depict hemorrhage control, allows the benchmarking of data science applications, including object detection, performance metric development, and identification of metrics associated with outcomes. In the future, SOCAL may be used to build and validate surgical data science models.
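The reported detection sensitivity and positive predictive value can be computed from greedy IoU matching of predicted to annotated boxes; the 0.5 IoU threshold and greedy matching below are common defaults assumed for illustration, not necessarily the study's exact protocol:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as [x1, y1, x2, y2]."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def sensitivity_ppv(pred_boxes, gt_boxes, iou_thresh=0.5):
    """Greedily match predictions to ground truth, then report sensitivity and PPV."""
    matched_gt, tp = set(), 0
    for p in pred_boxes:
        best_j, best_iou = -1, 0.0
        for j, g in enumerate(gt_boxes):
            if j in matched_gt:
                continue
            v = iou(p, g)
            if v > best_iou:
                best_j, best_iou = j, v
        if best_iou >= iou_thresh:
            tp += 1
            matched_gt.add(best_j)
    fp = len(pred_boxes) - tp
    fn = len(gt_boxes) - tp
    sens = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    return sens, ppv
```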
Affiliation(s)
- Guillaume Kugener
- Department of Neurosurgery, Keck School of Medicine of the University of Southern California, Los Angeles
- Dhiraj J. Pangal
- Department of Neurosurgery, Keck School of Medicine of the University of Southern California, Los Angeles
- Tyler Cardinal
- Department of Neurosurgery, Keck School of Medicine of the University of Southern California, Los Angeles
- Casey Collet
- Department of Neurosurgery, Keck School of Medicine of the University of Southern California, Los Angeles
- Elizabeth Lechtholz-Zey
- Department of Neurosurgery, Keck School of Medicine of the University of Southern California, Los Angeles
- Sasha Lasky
- Department of Neurosurgery, Keck School of Medicine of the University of Southern California, Los Angeles
- Shivani Sundaram
- Department of Neurosurgery, Keck School of Medicine of the University of Southern California, Los Angeles
- Nicholas Markarian
- Department of Neurosurgery, Keck School of Medicine of the University of Southern California, Los Angeles
- Yichao Zhu
- Department of Computer Science, Viterbi School of Engineering, University of Southern California, Los Angeles
- Arman Roshannai
- Department of Neurosurgery, Keck School of Medicine of the University of Southern California, Los Angeles
- Aditya Sinha
- Department of Neurosurgery, Keck School of Medicine of the University of Southern California, Los Angeles
- X. Y. Han
- Department of Operations Research and Information Engineering, Cornell University, Ithaca, New York
- Vardan Papyan
- Department of Mathematics, University of Toronto, Toronto, Ontario, Canada
- Andrew Hung
- Center for Robotic Simulation and Education, USC Institute of Urology, Keck School of Medicine of the University of Southern California, Los Angeles
- Animashree Anandkumar
- Department of Computer Science and Mathematics, California Institute of Technology, Pasadena
- Bozena Wrobel
- Department of Otolaryngology, Keck School of Medicine of the University of Southern California, Los Angeles
- Gabriel Zada
- Department of Neurosurgery, Keck School of Medicine of the University of Southern California, Los Angeles
- Daniel A. Donoho
- Division of Neurosurgery, Center for Neuroscience, Children’s National Hospital, Washington, DC
19
Ni ZL, Bian GB, Li Z, Zhou XH, Li RQ, Hou ZG. Space Squeeze Reasoning and Low-Rank Bilinear Feature Fusion for Surgical Image Segmentation. IEEE J Biomed Health Inform 2022;26:3209-3217. [PMID: 35226612 DOI: 10.1109/jbhi.2022.3154925]
Abstract
Surgical image segmentation is critical for surgical robot control and computer-assisted surgery. In the surgical scene, the local features of objects are highly similar and the illumination interference is strong, which makes image segmentation challenging. To address the above issues, a bilinear squeeze reasoning network is proposed for surgical image segmentation. In it, the space squeeze reasoning module is proposed, which adopts height pooling and width pooling to squeeze global contexts in the vertical and horizontal directions, respectively. The similarity between each horizontal position and each vertical position is calculated to encode long-range semantic dependencies and establish the affinity matrix. The feature maps are also squeezed from both the vertical and horizontal directions to model channel relations. Guided by channel relations, the affinity matrix is embedded in the original feature space. It captures long-range semantic dependencies from different directions, helping address the local similarity issue. In addition, a low-rank bilinear fusion module is proposed to enhance the model's ability to recognize similar features. This module is based on the low-rank bilinear model to capture the inter-layer feature relations. It integrates the location details from low-level features and semantic information from high-level features. Various semantics can be represented more accurately, which effectively improves feature representation. The proposed network achieves state-of-the-art performance on the cataract image segmentation dataset CataSeg and the robotic image segmentation dataset EndoVis 2018.
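A minimal PyTorch sketch of the directional "squeeze" idea (pooling along height and width and forming an affinity between the two descriptors) follows; it is a simplified illustration, not the full space squeeze reasoning module with its channel-relation guidance:

```python
import torch
import torch.nn as nn

class HeightWidthSqueeze(nn.Module):
    """Directional global pooling plus a horizontal-vertical affinity re-weighting.

    A simplified sketch of squeezing global context along height and width.
    """
    def forward(self, x):                      # x: (B, C, H, W)
        col = x.mean(dim=2)                    # width descriptor:  (B, C, W)
        row = x.mean(dim=3)                    # height descriptor: (B, C, H)
        # affinity between every vertical and horizontal position: (B, H, W)
        affinity = torch.softmax(torch.einsum('bch,bcw->bhw', row, col), dim=-1)
        return x * affinity.unsqueeze(1)       # re-weight the original features
```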