1. Cui B, Islam M, Bai L, Ren H. Surgical-DINO: adapter learning of foundation models for depth estimation in endoscopic surgery. Int J Comput Assist Radiol Surg 2024;19:1013-1020. PMID: 38459402; PMCID: PMC11178563; DOI: 10.1007/s11548-024-03083-5.
Abstract
PURPOSE: Depth estimation in robotic surgery is vital for 3D reconstruction, surgical navigation, and augmented reality visualization. Although foundation models exhibit outstanding performance in many vision tasks, including depth estimation (e.g., DINOv2), recent works have observed their limitations in medical and surgical domain-specific applications. This work presents a low-rank adaptation (LoRA) of a foundation model for surgical depth estimation.
METHODS: We design a foundation model-based depth estimation method, referred to as Surgical-DINO, a low-rank adaptation of DINOv2 for depth estimation in endoscopic surgery. We build LoRA layers and integrate them into DINO to adapt to surgery-specific domain knowledge instead of performing conventional fine-tuning. During training, we freeze the DINO image encoder, which shows excellent visual representation capacity, and optimize only the LoRA layers and the depth decoder to integrate features from the surgical scene.
RESULTS: Our model is extensively validated on the SCARED MICCAI challenge dataset, collected during da Vinci Xi endoscopic surgery. We empirically show that Surgical-DINO significantly outperforms all state-of-the-art models in endoscopic depth estimation tasks. Ablation studies provide evidence of the substantial effect of our LoRA layers and adaptation.
CONCLUSION: Surgical-DINO sheds light on the successful adaptation of foundation models to the surgical domain for depth estimation. The results clearly show that neither zero-shot prediction with weights pre-trained on computer vision datasets nor naive fine-tuning is sufficient to use a foundation model directly in the surgical domain.
Affiliation(s)
- Beilei Cui: The Chinese University of Hong Kong, Hong Kong, China
- Mobarakol Islam: Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS), University College London, London, UK
- Long Bai: The Chinese University of Hong Kong, Hong Kong, China
- Hongliang Ren: The Chinese University of Hong Kong, Hong Kong, China; Department of BME, National University of Singapore, Singapore, Singapore
2. Yang Z, Dai J, Pan J. 3D reconstruction from endoscopy images: A survey. Comput Biol Med 2024;175:108546. PMID: 38704902; DOI: 10.1016/j.compbiomed.2024.108546.
Abstract
Three-dimensional reconstruction of images acquired through endoscopes plays a vital role in a growing number of medical applications. Endoscopes used in the clinic are commonly classified as monocular or binocular. We review the classification of depth estimation methods according to the type of endoscope. Fundamentally, depth estimation relies on feature matching between images and on multi-view geometry theory. However, these traditional techniques face many problems in the endoscopic environment. With the continuing development of deep learning, a growing number of works use learning-based methods to address challenges such as inconsistent illumination and texture sparsity. We reviewed over 170 papers published in the 10 years from 2013 to 2023. The commonly used public datasets and performance metrics are summarized. We also give a taxonomy of methods and analyze the advantages and drawbacks of the algorithms. Summary tables and a results atlas are provided to facilitate comparison of the qualitative and quantitative performance of the methods in each category. In addition, we summarize commonly used scene representation methods in endoscopy and speculate on the prospects of depth estimation research in medical applications. We also compare the robustness, processing time, and scene representation of the methods to help doctors and researchers select appropriate methods for their surgical applications.
Affiliation(s)
- Zhuoyue Yang: State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, 37 Xueyuan Road, Haidian District, Beijing, 100191, China; Peng Cheng Lab, 2 Xingke 1st Street, Nanshan District, Shenzhen, Guangdong Province, 518000, China
- Ju Dai: Peng Cheng Lab, 2 Xingke 1st Street, Nanshan District, Shenzhen, Guangdong Province, 518000, China
- Junjun Pan: State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, 37 Xueyuan Road, Haidian District, Beijing, 100191, China; Peng Cheng Lab, 2 Xingke 1st Street, Nanshan District, Shenzhen, Guangdong Province, 518000, China
3. Richter A, Steinmann T, Rosenthal JC, Rupitsch SJ. Advances in Real-Time 3D Reconstruction for Medical Endoscopy. J Imaging 2024;10:120. PMID: 38786574; PMCID: PMC11122342; DOI: 10.3390/jimaging10050120. Open access.
Abstract
This contribution provides researchers with a comprehensive overview of the current state of the art in real-time 3D reconstruction methods suitable for medical endoscopy. Over the past decade, there have been various advancements in computational power and increased research effort in many computer vision fields, such as autonomous driving, robotics, and unmanned aerial vehicles. Some of these advances can be adapted to medical endoscopy while coping with challenges such as featureless surfaces, varying lighting conditions, and deformable structures. To provide a comprehensive overview, we make a logical division into monocular, binocular, trinocular, and multiocular methods, and we further distinguish active from passive methods. Within these categories, we consider both flexible and non-flexible endoscopes to cover the state of the art as fully as possible. We discuss the error metrics relevant for comparing the publications presented here and debate when to choose a GPU rather than an FPGA for camera-based 3D reconstruction. We elaborate on good practice in using datasets and provide a direct comparison of the presented work. Note that, in addition to medical publications, publications evaluated on the KITTI and Middlebury datasets are also considered, to include related methods that may be suited for medical 3D reconstruction.
Collapse
Affiliation(s)
- Alexander Richter
- Fraunhofer Institute for High-Speed Dynamics, Ernst–Mach–Institut (EMI), Ernst-Zermelo-Straße 4, 79104 Freiburg, Germany
- Electrical Instrumentation and Embedded Systems, Albert–Ludwigs–Universität Freiburg, Goerges-Köhler-Allee 106, 79110 Freiburg, Germany; (T.S.); (S.J.R.)
| | - Till Steinmann
- Electrical Instrumentation and Embedded Systems, Albert–Ludwigs–Universität Freiburg, Goerges-Köhler-Allee 106, 79110 Freiburg, Germany; (T.S.); (S.J.R.)
| | - Jean-Claude Rosenthal
- Fraunhofer Institute for Telecommunications, Heinrich–Hertz–Institut (HHI), Einsteinufer 37, 10587 Berlin, Germany
| | - Stefan J. Rupitsch
- Electrical Instrumentation and Embedded Systems, Albert–Ludwigs–Universität Freiburg, Goerges-Köhler-Allee 106, 79110 Freiburg, Germany; (T.S.); (S.J.R.)
| |
Collapse
|
4. Yang Z, Pan J, Dai J, Sun Z, Xiao Y. Self-Supervised Lightweight Depth Estimation in Endoscopy Combining CNN and Transformer. IEEE Trans Med Imaging 2024;43:1934-1944. PMID: 38198275; DOI: 10.1109/tmi.2024.3352390.
Abstract
In recent years, an increasing number of medical engineering tasks, such as surgical navigation, pre-operative registration, and surgical robotics, have relied on 3D reconstruction techniques. Self-supervised depth estimation has attracted interest in endoscopic scenarios because it does not require ground truth. Most existing methods depend on increasing the number of parameters to improve performance; designing a lightweight self-supervised model that achieves competitive results is therefore a hot topic. We propose a lightweight network with a tight coupling of a convolutional neural network (CNN) and a Transformer for depth estimation. Unlike other methods that use a CNN and a Transformer to extract features separately and then fuse them at the deepest layer, we use CNN and Transformer modules to extract features at different scales in the encoder. This hierarchical structure leverages the advantages of CNNs in texture perception and of Transformers in shape extraction. At the same feature-extraction scale, the CNN acquires local features while the Transformer encodes global information. Finally, we add multi-head attention modules to the pose network to improve the accuracy of the predicted poses. Experiments demonstrate that our approach obtains comparable results on two datasets while effectively compressing the model parameters.
5. Schmidt A, Mohareri O, DiMaio S, Yip MC, Salcudean SE. Tracking and mapping in medical computer vision: A review. Med Image Anal 2024;94:103131. PMID: 38442528; DOI: 10.1016/j.media.2024.103131.
Abstract
As computer vision algorithms increase in capability, their applications in clinical systems will become more pervasive. These applications include diagnostics, such as colonoscopy and bronchoscopy; guiding biopsies, minimally invasive interventions, and surgery; automating instrument motion; and providing image guidance using pre-operative scans. Many of these applications depend on the specific visual nature of medical scenes and require algorithms designed to perform in this environment. In this review, we provide an update on the field of camera-based tracking and scene mapping in surgery and diagnostics in medical computer vision. We begin by describing our review process, which results in a final list of 515 papers. We then give a high-level summary of the state of the art and provide relevant background for those who need tracking and mapping for their clinical applications. Next, we review datasets provided in the field and the clinical needs that motivate their design. We then delve into the algorithmic side and summarize recent developments; this summary should be especially useful for algorithm designers and for those looking to understand the capability of off-the-shelf methods. We maintain focus on algorithms for deformable environments while also reviewing the essential building blocks of rigid tracking and mapping, since there is a large amount of crossover in methods. With the field summarized, we discuss the current state of tracking and mapping methods, the needs for future algorithms and for quantification, and the viability of clinical applications. We then provide some research directions and questions. We conclude that new methods need to be designed or combined to support clinical applications in deformable environments, and that more focus needs to be put on collecting datasets for training and evaluation.
Affiliation(s)
- Adam Schmidt: Department of Electrical and Computer Engineering, University of British Columbia, 2329 West Mall, Vancouver V6T 1Z4, BC, Canada
- Omid Mohareri: Advanced Research, Intuitive Surgical, 1020 Kifer Rd, Sunnyvale, CA 94086, USA
- Simon DiMaio: Advanced Research, Intuitive Surgical, 1020 Kifer Rd, Sunnyvale, CA 94086, USA
- Michael C Yip: Department of Electrical and Computer Engineering, University of California San Diego, 9500 Gilman Dr, La Jolla, CA 92093, USA
- Septimiu E Salcudean: Department of Electrical and Computer Engineering, University of British Columbia, 2329 West Mall, Vancouver V6T 1Z4, BC, Canada
6. Wang Y, Gong B, Long Y, Fan SH, Dou Q. Efficient EndoNeRF reconstruction and its application for data-driven surgical simulation. Int J Comput Assist Radiol Surg 2024;19:821-829. PMID: 38658450; PMCID: PMC11098936; DOI: 10.1007/s11548-024-03114-1.
Abstract
PURPOSE: The healthcare industry has a growing need for realistic modeling and efficient simulation of surgical scenes. With effective models of deformable surgical scenes, clinicians can conduct surgical planning and surgical training on scenarios close to real-world cases. However, a significant challenge in achieving this goal is the scarcity of high-quality soft tissue models with accurate shapes and textures. To address this gap, we present a data-driven framework that leverages emerging neural radiance field (NeRF) technology to enable high-quality surgical reconstruction, and we explore its application to surgical simulation.
METHODS: We first develop a fast NeRF-based surgical scene 3D reconstruction approach that achieves state-of-the-art performance. This method significantly outperforms traditional 3D reconstruction methods, which fail to capture large deformations or to produce fine-grained shapes and textures. We then propose an automated pipeline for creating interactive surgical simulation environments through a closed-mesh extraction algorithm.
RESULTS: Our experiments validate the superior performance and efficiency of the proposed approach for surgical scene 3D reconstruction. We further use the reconstructed soft tissues to conduct FEM and MPM simulations, showcasing the practical application of our method in data-driven surgical simulation.
CONCLUSION: We have proposed a novel NeRF-based reconstruction framework with an emphasis on simulation purposes. Our framework facilitates the efficient creation of high-quality surgical soft tissue 3D models. Through multiple soft tissue simulations, we show that our work has the potential to benefit downstream clinical tasks, such as surgical education.
Affiliation(s)
- Yuehao Wang: Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China
- Bingchen Gong: Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China
- Yonghao Long: Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China
- Siu Hin Fan: Department of Biomedical Engineering, The Chinese University of Hong Kong, Hong Kong, China
- Qi Dou: Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China
7. Regef J, Talasila L, Wiercigroch J, Lin RJ, Kahrs LA. Laryngeal surface reconstructions from monocular endoscopic videos: a structure from motion pipeline for periodic deformations. Int J Comput Assist Radiol Surg 2024 (online ahead of print). PMID: 38652415; DOI: 10.1007/s11548-024-03118-x.
Abstract
PURPOSE: Surface reconstructions from laryngoscopic videos have the potential to assist clinicians in diagnosing, quantifying, and monitoring airway diseases using minimally invasive techniques. However, tissue movements and deformations make these reconstructions challenging for conventional pipelines.
METHODS: To facilitate such reconstructions, we developed video-frame pre-filtering and featureless dense-matching steps to enhance the AliceVision Meshroom SfM pipeline. Time and the anterior glottic angle were used to approximate the rigid state of the airway and to collect frames with different camera poses. Featureless dense matches were tracked with a correspondence transformer across subsets of images to extract matched points for estimating the point cloud and the reconstructed surface. The proposed pipeline was tested on a simulated dataset under various conditions, such as illumination and resolution, as well as on real laryngoscopic videos.
RESULTS: Our pipeline reconstructed the laryngeal region from 4, 8, and 16 images obtained from simulated and real patient exams. The pipeline was robust to sparse inputs, blur, and extreme lighting conditions, unlike the Meshroom pipeline, which failed to produce a point cloud for 6 of 15 simulated datasets.
CONCLUSION: The pre-filtering and featureless dense-matching modules specialize the conventional SfM pipeline to handle challenging laryngoscopic examinations directly from patient videos. These 3D visualizations have the potential to improve spatial understanding of airway conditions.
Affiliation(s)
- Justin Regef: Medical Computer Vision and Robotics Lab, University of Toronto, Toronto, ON, Canada; Department of Mathematical and Computational Sciences, University of Toronto Mississauga, 3359 Mississauga Rd, Mississauga, ON, L5L 1C6, Canada
- Likhit Talasila: Medical Computer Vision and Robotics Lab, University of Toronto, Toronto, ON, Canada; Department of Mathematical and Computational Sciences, University of Toronto Mississauga, 3359 Mississauga Rd, Mississauga, ON, L5L 1C6, Canada
- Julia Wiercigroch: Medical Computer Vision and Robotics Lab, University of Toronto, Toronto, ON, Canada; Department of Computer Science, University of Toronto, 40 St George St, Toronto, ON, M5S 2E4, Canada
- R Jun Lin: Department of Otolaryngology - Head & Neck Surgery, Unity Health Toronto - St. Michael's Hospital, Temerty Faculty of Medicine, University of Toronto, 36 Queen St E, Toronto, ON, M5B 1W8, Canada
- Lueder A Kahrs: Medical Computer Vision and Robotics Lab, University of Toronto, Toronto, ON, Canada; Department of Mathematical and Computational Sciences, University of Toronto Mississauga, 3359 Mississauga Rd, Mississauga, ON, L5L 1C6, Canada; Department of Computer Science, University of Toronto, 40 St George St, Toronto, ON, M5S 2E4, Canada; Department of Otolaryngology - Head & Neck Surgery, Unity Health Toronto - St. Michael's Hospital, Temerty Faculty of Medicine, University of Toronto, 36 Queen St E, Toronto, ON, M5B 1W8, Canada; Institute of Biomedical Engineering, University of Toronto, 164 College Street, Toronto, ON, M5S 3G9, Canada
8. Sun X, Wang F, Ma Z, Su H. Dynamic surface reconstruction in robot-assisted minimally invasive surgery based on neural radiance fields. Int J Comput Assist Radiol Surg 2024;19:519-530. PMID: 37768485; DOI: 10.1007/s11548-023-03016-8.
Abstract
PURPOSE: The purpose of this study was to improve surgical scene perception by addressing the challenge of reconstructing highly dynamic surgical scenes. We propose a novel depth estimation network and a reconstruction framework that incorporates neural radiance fields to provide more accurate scene information for surgical task automation and AR navigation.
METHODS: We added a spatial pyramid pooling module and a Swin Transformer module to enhance the robustness of stereo depth estimation. We also improved depth accuracy by adding unique matching constraints from optimal transport. To avoid deformation distortion in highly dynamic scenes, we used neural radiance fields to implicitly represent scenes along the time dimension and optimized them with depth and color information in a learning-based manner.
RESULTS: Our experiments on the KITTI and SCARED datasets show that the proposed depth estimation network performs close to the state-of-the-art (SOTA) method on natural images and surpasses the SOTA method on medical images by 1.12% in 3 px error and 0.45 px in EPE. The proposed dynamic reconstruction framework successfully reconstructed the dynamic cardiac surface in a totally endoscopic coronary artery bypass video, achieving SOTA performance with 27.983 dB PSNR, 0.812 SSIM, and 0.189 LPIPS.
CONCLUSION: Our proposed depth estimation network and reconstruction framework make a significant contribution to the field of surgical scene perception. The framework achieves better results than SOTA methods on medical datasets, reducing mismatches and producing more accurate depth maps with clearer edges. The proposed ER framework is verified on a series of dynamic cardiac surgical images. Future efforts will focus on improving the training speed and addressing the limited field of view.
Affiliation(s)
- Xinan Sun: School of Mechanical Engineering, Tianjin University, 135 Yaguan Road, Jinnan District, Tianjin, 300350, China; Key Laboratory of Mechanism Theory and Equipment Design of Ministry of Education, Tianjin University, 135 Yaguan Road, Tianjin, 300350, China
- Feng Wang: School of Mechanical Engineering, Tianjin University, 135 Yaguan Road, Jinnan District, Tianjin, 300350, China
- Zhikang Ma: School of Mechanical Engineering, Tianjin University, 135 Yaguan Road, Jinnan District, Tianjin, 300350, China
- He Su: School of Mechanical Engineering, Tianjin University, 135 Yaguan Road, Jinnan District, Tianjin, 300350, China; Key Laboratory of Mechanism Theory and Equipment Design of Ministry of Education, Tianjin University, 135 Yaguan Road, Tianjin, 300350, China
9. Liu S, Fan J, Yang Y, Xiao D, Ai D, Song H, Wang Y, Yang J. Monocular endoscopy images depth estimation with multi-scale residual fusion. Comput Biol Med 2024;169:107850. PMID: 38145602; DOI: 10.1016/j.compbiomed.2023.107850.
Abstract
BACKGROUND: Monocular depth estimation plays a fundamental role in clinical endoscopic surgery. However, the coherent illumination, smooth surfaces, and texture-less nature of endoscopy images present significant challenges to traditional depth estimation methods, and existing approaches struggle to perceive depth accurately in such settings.
METHODS: To overcome these challenges, this paper proposes a novel multi-scale residual fusion method for estimating the depth of monocular endoscopy images. Specifically, we address the issue of coherent illumination by leveraging an image frequency-domain component space transformation, thereby enhancing the stability of the scene's light source. Moreover, we employ an image radiation intensity attenuation model to estimate the initial depth map. Finally, to refine the accuracy of depth estimation, we apply a multi-scale residual fusion optimization technique.
RESULTS: To evaluate the performance of the proposed method, extensive experiments were conducted on public datasets. The structural similarity measures for continuous frames in three distinct clinical data scenes reached 0.94, 0.82, and 0.84, respectively, demonstrating the effectiveness of our approach in capturing the intricate details of endoscopy images. Furthermore, the depth estimation accuracy reached 89.3% and 91.2% on the two models' data, respectively, underscoring the robustness of our method.
CONCLUSIONS: Overall, the promising results obtained on public datasets highlight the significant potential of our method for clinical applications, facilitating reliable depth estimation and enhancing the quality of endoscopic surgical procedures.
Affiliation(s)
- Shiyuan Liu: Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China; China Center for Information Industry Development, Beijing, 100081, China
- Jingfan Fan: Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
- Yun Yang: Department of General Surgery, Beijing Friendship Hospital, Capital Medical University, National Clinical Research Center for Digestive Diseases, Beijing, 100050, China
- Deqiang Xiao: Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
- Danni Ai: Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
- Hong Song: School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China
- Yongtian Wang: Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
- Jian Yang: Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
10. Liu S, Fan J, Zang L, Yang Y, Fu T, Song H, Wang Y, Yang J. Pose estimation via structure-depth information from monocular endoscopy images sequence. Biomed Opt Express 2024;15:460-478. PMID: 38223180; PMCID: PMC10783895; DOI: 10.1364/boe.498262.
Abstract
Image-based endoscopy pose estimation has been shown to significantly improve the visualization and accuracy of minimally invasive surgery (MIS). This paper proposes a method for pose estimation based on structure-depth information from a monocular endoscopy image sequence. First, the initial frame location is constrained using the image structure difference (ISD) network. Second, endoscopy image depth information is used to estimate the pose of sequence frames. Finally, adaptive boundary constraints are used to optimize continuous-frame endoscopy pose estimation, resulting in more accurate intraoperative pose estimation. Evaluations were conducted on publicly available datasets, with pose estimation errors of 1.43 mm and 3.64 mm on bronchoscopy and colonoscopy datasets, respectively. These results meet the real-time requirements of various scenarios, demonstrating the capability of this method to generate reliable pose estimates for endoscopy images and its meaningful applications in clinical practice. This method enables accurate localization of endoscopy images during surgery, assisting physicians in performing safer and more effective procedures.
Affiliation(s)
- Shiyuan Liu: Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China; China Center for Information Industry Development, Beijing, 100081, China
- Jingfan Fan: Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
- Liugeng Zang: Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
- Yun Yang: Department of General Surgery, Beijing Friendship Hospital, Capital Medical University; National Clinical Research Center for Digestive Diseases, Beijing, 100050, China
- Tianyu Fu: Institute of Engineering Medicine, Beijing Institute of Technology, Beijing, 100081, China
- Hong Song: School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China
- Yongtian Wang: Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
- Jian Yang: Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
11. Bobrow TL, Golhar M, Vijayan R, Akshintala VS, Garcia JR, Durr NJ. Colonoscopy 3D video dataset with paired depth from 2D-3D registration. Med Image Anal 2023;90:102956. PMID: 37713764; PMCID: PMC10591895; DOI: 10.1016/j.media.2023.102956.
Abstract
Screening colonoscopy is an important clinical application for several 3D computer vision techniques, including depth estimation, surface reconstruction, and missing-region detection. However, the development, evaluation, and comparison of these techniques on real colonoscopy videos remain largely qualitative due to the difficulty of acquiring ground truth data. In this work, we present a Colonoscopy 3D Video Dataset (C3VD), acquired with a high-definition clinical colonoscope and high-fidelity colon models, for benchmarking computer vision methods in colonoscopy. We introduce a novel multimodal 2D-3D registration technique to register optical video sequences with ground truth rendered views of a known 3D model. The different modalities are registered by transforming optical images into depth maps with a generative adversarial network and aligning edge features with an evolutionary optimizer. This registration method achieves an average translation error of 0.321 millimeters and an average rotation error of 0.159 degrees in simulation experiments where error-free ground truth is available. The method also leverages video information, improving registration accuracy by 55.6% for translation and 60.4% for rotation compared with single-frame registration. Twenty-two short video sequences were registered to generate 10,015 total frames with paired ground truth depth, surface normals, optical flow, occlusion, six-degree-of-freedom pose, coverage maps, and 3D models. The dataset also includes screening videos acquired by a gastroenterologist with paired ground truth pose and 3D surface models. The dataset and registration source code are available at https://durr.jhu.edu/C3VD.
Affiliation(s)
- Taylor L Bobrow, Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
- Mayank Golhar, Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
- Rohan Vijayan, Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
- Venkata S Akshintala, Division of Gastroenterology and Hepatology, Johns Hopkins Medicine, Baltimore, MD 21287, USA
- Juan R Garcia, Department of Art as Applied to Medicine, Johns Hopkins School of Medicine, Baltimore, MD 21287, USA
- Nicholas J Durr, Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA

12
Hirohata Y, Sogabe M, Miyazaki T, Kawase T, Kawashima K. Confidence-aware self-supervised learning for dense monocular depth estimation in dynamic laparoscopic scene. Sci Rep 2023; 13:15380. [PMID: 37717055 PMCID: PMC10505201 DOI: 10.1038/s41598-023-42713-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/05/2023] [Accepted: 09/13/2023] [Indexed: 09/18/2023] Open
Abstract
This paper tackles the challenge of accurate depth estimation from monocular laparoscopic images in dynamic surgical environments. The lack of reliable ground truth due to inconsistencies within these images makes this a complex task, and the learning process is further complicated by noise sources such as bleeding and smoke. We propose a model learning framework that uses a generic laparoscopic surgery video dataset for training, aimed at achieving precise monocular depth estimation in dynamic surgical settings. The architecture employs binocular disparity confidence information, along with the disparity information from a stereo laparoscope, as a self-supervisory signal. Our method ensures robust learning in the presence of outliers caused by tissue deformation, smoke, and surgical instruments by using a loss function that adjusts the selection and weighting of depth data for learning based on their given confidence. We trained the model using the Hamlyn Dataset and verified it with Hamlyn test data and a static dataset. The results show exceptional generalization performance and efficacy for various scene dynamics, laparoscope types, and surgical sites.
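The confidence-based selection and weighting described above might look like the following minimal sketch; the hard threshold `tau` and the plain L1 form are assumptions for illustration, not the paper's exact loss.

```python
import numpy as np

def confidence_weighted_l1(pred_disp, target_disp, confidence, tau=0.5):
    """Confidence-weighted L1 disparity loss (illustrative sketch).

    Pixels whose stereo-matching confidence falls below `tau` are
    excluded as likely outliers (smoke, blood, instrument edges);
    the remaining pixels are weighted by their confidence.
    """
    mask = (confidence >= tau).astype(float)   # selection
    w = confidence * mask                      # weighting
    err = np.abs(pred_disp - target_disp)
    return float((w * err).sum() / max(w.sum(), 1e-8))
```

For example, with confidences `[0.9, 0.2, 0.8]` and `tau=0.5`, the middle pixel is dropped entirely and the other two contribute in proportion to their confidence.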
Affiliation(s)
- Yasuhide Hirohata, Department of Information Physics and Computing, The University of Tokyo, Tokyo 113-8656, Japan
- Maina Sogabe, Department of Information Physics and Computing, The University of Tokyo, Tokyo 113-8656, Japan
- Tetsuro Miyazaki, Department of Information Physics and Computing, The University of Tokyo, Tokyo 113-8656, Japan
- Toshihiro Kawase, School of Engineering, Department of Information and Communication Engineering, Tokyo Denki University, Tokyo 120-8551, Japan
- Kenji Kawashima, Department of Information Physics and Computing, The University of Tokyo, Tokyo 113-8656, Japan

13
Mao F, Huang T, Ma L, Zhang X, Liao H. A Monocular Variable Magnifications 3D Laparoscope System Using Double Liquid Lenses. IEEE J Transl Eng Health Med 2023; 12:32-42. [PMID: 38059130 PMCID: PMC10697296 DOI: 10.1109/jtehm.2023.3311022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 01/17/2023] [Revised: 08/13/2023] [Accepted: 08/21/2023] [Indexed: 12/08/2023]
Abstract
During minimally invasive surgery (MIS), the laparoscope provides only a single viewpoint to the surgeon, resulting in a lack of 3D perception. Many works have been proposed to obtain depth and 3D reconstruction by designing a new optical structure or by depending on the camera pose and image sequences. Most of these works modify the structure of conventional laparoscopes and cannot provide 3D reconstruction at different magnification views. In this study, we propose a laparoscopic system based on double liquid lenses, which provides doctors with variable magnification rates, near observation, and real-time monocular 3D reconstruction. Our system consists of an optical structure that achieves automatic magnification change and autofocus without any physically moving element, and a deep learning network based on the Depth from Defocus (DFD) method, trained to handle inconsistent camera intrinsics and estimate depth from images of different focal lengths. The optical structure is portable and can be mounted on conventional laparoscopes. The depth estimation network estimates depth in real time from monocular images of different focal lengths and magnification rates. Experiments show that our system provides a 0.68-1.44x zoom rate and can estimate depth at different magnification rates at 6 fps. Monocular 3D reconstruction reaches at least 6 mm accuracy. The system also provides a clear view even at a working distance as close as 1 mm. Ex vivo experiments and implementation on clinical images prove that our system provides doctors with a magnified, clear view of the lesion as well as quick monocular depth perception during laparoscopy, helping surgeons achieve better detection and size diagnosis of the abdomen during laparoscopic surgeries.
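The Depth from Defocus principle the system builds on can be illustrated with the standard thin-lens blur model; this is a geometric-optics sketch with made-up parameter values, not the paper's learned network.

```python
def blur_diameter(u, f, uf, A):
    """Geometric blur-circle diameter for an object at distance u when a
    thin lens (focal length f, aperture diameter A) is focused at uf.
    All quantities in millimetres; illustrative values only."""
    return A * f * abs(u - uf) / (u * (uf - f))

def depth_from_blur(c, f, uf, A, lo, hi, iters=60):
    """Invert blur_diameter on the far side of focus (u >= uf) by
    bisection; blur grows monotonically with distance there."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if blur_diameter(mid, f, uf, A) < c:
            lo = mid      # not blurred enough: object is farther away
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

The learned network in the paper replaces this closed-form inversion so it can cope with the liquid lenses' changing focal lengths and intrinsics, but the monotone blur-versus-distance relation is the underlying cue.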
Affiliation(s)
- Fan Mao, Department of Biomedical Engineering, School of Medicine, Tsinghua University, Beijing 100084, China
- Tianqi Huang, Department of Biomedical Engineering, School of Medicine, Tsinghua University, Beijing 100084, China
- Longfei Ma, Department of Biomedical Engineering, School of Medicine, Tsinghua University, Beijing 100084, China
- Xinran Zhang, Department of Biomedical Engineering, School of Medicine, Tsinghua University, Beijing 100084, China
- Hongen Liao, Department of Biomedical Engineering, School of Medicine, Tsinghua University, Beijing 100084, China

14
Yu X, Zhao J, Wu H, Wang A. A Novel Evaluation Method for SLAM-Based 3D Reconstruction of Lumen Panoramas. Sensors (Basel) 2023; 23:7188. [PMID: 37631725 PMCID: PMC10459170 DOI: 10.3390/s23167188] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/21/2023] [Revised: 08/09/2023] [Accepted: 08/10/2023] [Indexed: 08/27/2023]
Abstract
Laparoscopy is employed in conventional minimally invasive surgery to inspect internal cavities by viewing two-dimensional images on a monitor. This method has a limited field of view and provides insufficient information for surgeons, increasing surgical complexity. Utilizing simultaneous localization and mapping (SLAM) technology to reconstruct laparoscopic scenes can offer more comprehensive and intuitive visual feedback. Moreover, the precision of the reconstructed models is a crucial factor for further applications of surgical assistance systems. However, challenges such as data scarcity and scale uncertainty hinder effective assessment of the accuracy of endoscopic monocular SLAM reconstructions. Therefore, this paper proposes a technique that incorporates existing knowledge from calibration objects to supplement metric information and resolve scale ambiguity issues, and it quantifies the endoscopic reconstruction accuracy based on local alignment metrics. The experimental results demonstrate that the reconstructed models restore realistic scales and enable error analysis for laparoscopic SLAM reconstruction systems. This suggests that for the evaluation of monocular SLAM three-dimensional (3D) reconstruction accuracy in minimally invasive surgery scenarios, our proposed scheme for recovering scale factors is viable, and our evaluation outcomes can serve as criteria for measuring reconstruction precision.
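At its core, the scale-recovery scheme above reduces to one known metric distance on the calibration object fixing the global scale factor of the up-to-scale monocular reconstruction. A minimal sketch, with a hypothetical helper name and toy coordinates:

```python
import numpy as np

def recover_scale(points, idx_a, idx_b, known_length_mm):
    """Recover the metric scale of an up-to-scale monocular SLAM
    reconstruction from one known distance on a calibration object.
    `idx_a`/`idx_b` index the two reconstructed calibration landmarks;
    the returned point cloud is rescaled to millimetres."""
    d = np.linalg.norm(points[idx_a] - points[idx_b])
    s = known_length_mm / d
    return points * s, s
```

The paper additionally quantifies reconstruction error with local alignment metrics after rescaling; this sketch covers only the scale-ambiguity step.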
Affiliation(s)
- Xiaoyu Yu, College of Electron and Information, University of Electronic Science and Technology of China, Zhongshan Institute, Zhongshan 528402, China; Heilongjiang Province Key Laboratory of Laser Spectroscopy Technology and Application, Harbin University of Science and Technology, Harbin 150080, China
- Jianbo Zhao, Heilongjiang Province Key Laboratory of Laser Spectroscopy Technology and Application, Harbin University of Science and Technology, Harbin 150080, China
- Haibin Wu, Heilongjiang Province Key Laboratory of Laser Spectroscopy Technology and Application, Harbin University of Science and Technology, Harbin 150080, China
- Aili Wang, Heilongjiang Province Key Laboratory of Laser Spectroscopy Technology and Application, Harbin University of Science and Technology, Harbin 150080, China

15
Liu R, Liu Z, Lu J, Zhang G, Zuo Z, Sun B, Zhang J, Sheng W, Guo R, Zhang L, Hua X. Sparse-to-dense coarse-to-fine depth estimation for colonoscopy. Comput Biol Med 2023; 160:106983. [PMID: 37187133 DOI: 10.1016/j.compbiomed.2023.106983] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/23/2023] [Revised: 04/17/2023] [Accepted: 04/27/2023] [Indexed: 05/17/2023]
Abstract
Colonoscopy, as the gold standard for screening colon cancer and diseases, offers considerable benefits to patients. However, it also imposes challenges on diagnosis and potential surgery due to the narrow observation perspective and limited perception dimension. Dense depth estimation can overcome the above limitations and offer doctors straightforward 3D visual feedback. To this end, we propose a novel sparse-to-dense coarse-to-fine depth estimation solution for colonoscopic scenes based on the direct SLAM algorithm. The highlight of our solution is that we utilize the scattered 3D points obtained from SLAM to generate an accurate and dense depth map at full resolution. This is done by a deep learning (DL)-based depth completion network and a reconstruction system. The depth completion network effectively extracts texture, geometry, and structure features from sparse depth along with RGB data to recover the dense depth map. The reconstruction system further updates the dense depth map using photometric error-based optimization and a mesh modeling approach to reconstruct a more accurate 3D model of the colon with detailed surface texture. We show the effectiveness and accuracy of our depth estimation method on near photo-realistic, challenging colon datasets. Experiments demonstrate that the sparse-to-dense coarse-to-fine strategy significantly improves depth estimation performance and smoothly fuses direct SLAM and DL-based depth estimation into a complete dense reconstruction system.
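The coarse sparse-to-dense step, turning scattered SLAM depths into a full-resolution map, can be approximated classically by inverse-distance weighting; this stand-in ignores the RGB texture and structure cues the paper's learned completion network exploits.

```python
import numpy as np

def densify(sparse_uv, sparse_depth, shape, p=2.0, eps=1e-6):
    """Inverse-distance-weighted interpolation of scattered SLAM depths
    (pixel coordinates `sparse_uv`, values `sparse_depth`) over the full
    image grid. A classical stand-in for a learned completion network."""
    H, W = shape
    vv, uu = np.mgrid[0:H, 0:W]
    grid = np.stack([uu.ravel(), vv.ravel()], axis=1).astype(float)
    # Distance of every grid pixel to every sparse sample.
    d = np.linalg.norm(grid[:, None, :] - sparse_uv[None, :, :], axis=-1)
    w = 1.0 / (d ** p + eps)
    dense = (w * sparse_depth).sum(axis=1) / w.sum(axis=1)
    return dense.reshape(H, W)
```

The result reproduces each sparse sample at its own pixel and blends smoothly in between; the paper's "fine" stage then refines such a coarse map with photometric optimization.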
Affiliation(s)
- Ruyu Liu, School of Information Science and Technology, Hangzhou Normal University, Hangzhou 311121, China; Quanzhou Institute of Equipment Manufacturing, Haixi Institutes, Chinese Academy of Sciences, Quanzhou 362000, China
- Zhengzhe Liu, School of Information Science and Technology, Hangzhou Normal University, Hangzhou 311121, China
- Jiaming Lu, School of Computer Science and Engineering, Tianjin University of Technology, Tianjin 300384, China
- Guodao Zhang, Department of Digital Media Technology, Hangzhou Dianzi University, Hangzhou 310018, China
- Zhigui Zuo, Department of Colorectal Surgery, the First Affiliated Hospital of Wenzhou Medical University, Wenzhou 325035, China
- Bo Sun, Quanzhou Institute of Equipment Manufacturing, Haixi Institutes, Chinese Academy of Sciences, Quanzhou 362000, China
- Jianhua Zhang, School of Computer Science and Engineering, Tianjin University of Technology, Tianjin 300384, China
- Weiguo Sheng, School of Information Science and Technology, Hangzhou Normal University, Hangzhou 311121, China
- Ran Guo, Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou 510006, China
- Lejun Zhang, Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou 510006, China; College of Information Engineering, Yangzhou University, Yangzhou 225127, China; Research and Development Center for E-Learning, Ministry of Education, Beijing 100039, China
- Xiaozhen Hua, Department of Pediatrics, Cangnan Affiliated Hospital of Wenzhou Medical University, Wenzhou 325800, China

16
Chadebecq F, Lovat LB, Stoyanov D. Artificial intelligence and automation in endoscopy and surgery. Nat Rev Gastroenterol Hepatol 2023; 20:171-182. [PMID: 36352158 DOI: 10.1038/s41575-022-00701-y] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Accepted: 10/03/2022] [Indexed: 11/10/2022]
Abstract
Modern endoscopy relies on digital technology, from high-resolution imaging sensors and displays to electronics connecting configurable illumination and actuation systems for robotic articulation. In addition to enabling more effective diagnostic and therapeutic interventions, the digitization of the procedural toolset enables video data capture of the internal human anatomy at unprecedented levels. Interventional video data encapsulate functional and structural information about a patient's anatomy as well as events, activity and action logs about the surgical process. This detailed but difficult-to-interpret record from endoscopic procedures can be linked to preoperative and postoperative records or patient imaging information. Rapid advances in artificial intelligence, especially in supervised deep learning, can utilize data from endoscopic procedures to develop systems for assisting procedures leading to computer-assisted interventions that can enable better navigation during procedures, automation of image interpretation and robotically assisted tool manipulation. In this Perspective, we summarize state-of-the-art artificial intelligence for computer-assisted interventions in gastroenterology and surgery.
Affiliation(s)
- François Chadebecq, Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, London, UK
- Laurence B Lovat, Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, London, UK
- Danail Stoyanov, Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, London, UK

17
Li Y. Deep causal learning for robotic intelligence. Front Neurorobot 2023; 17:1128591. [PMID: 36910267 PMCID: PMC9992986 DOI: 10.3389/fnbot.2023.1128591] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/20/2022] [Accepted: 01/30/2023] [Indexed: 02/24/2023] Open
Abstract
This invited Review discusses causal learning in the context of robotic intelligence. The Review introduces the psychological findings on causal learning in human cognition, as well as the traditional statistical solutions for causal discovery and causal inference. Additionally, we examine recent deep causal learning algorithms, with a focus on their architectures and the benefits of using deep nets, and discuss the gap between deep causal learning and the needs of robotic intelligence.
Affiliation(s)
- Yangming Li, RoCAL, Rochester Institute of Technology, Rochester, NY, United States

18
Yang Z, Pan J, Li R, Qin H. Scene-graph-driven semantic feature matching for monocular digestive endoscopy. Comput Biol Med 2022; 146:105616. [DOI: 10.1016/j.compbiomed.2022.105616] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/04/2022] [Revised: 04/11/2022] [Accepted: 05/11/2022] [Indexed: 11/28/2022]
19
Yang Z, Lin S, Simon R, Linte CA. Endoscope Localization and Dense Surgical Scene Reconstruction for Stereo Endoscopy by Unsupervised Optical Flow and Kanade-Lucas-Tomasi Tracking. Annu Int Conf IEEE Eng Med Biol Soc 2022; 2022:4839-4842. [PMID: 36086106 PMCID: PMC10153602 DOI: 10.1109/embc48229.2022.9871588] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/04/2023]
Abstract
In image-guided surgery, endoscope tracking and surgical scene reconstruction are critical yet equally challenging tasks. We present a hybrid visual odometry and reconstruction framework for stereo endoscopy that leverages unsupervised learning-based and traditional optical flow methods to enable concurrent endoscope tracking and dense scene reconstruction. More specifically, to reconstruct texture-less tissue surfaces, we use an unsupervised learning-based optical flow method to estimate dense depth maps from stereo images. Robust 3D landmarks are selected from the dense depth maps and tracked via the Kanade-Lucas-Tomasi tracking algorithm. The hybrid visual odometry also benefits from traditional visual odometry modules, such as keyframe insertion and local bundle adjustment. We evaluate the proposed framework on endoscopic video sequences openly available via the SCARED dataset, against ground truth data as well as two other state-of-the-art methods, ORB-SLAM2 and Endo-depth. Our proposed method achieved comparable results in terms of both RMS Absolute Trajectory Error and Cloud-to-Mesh RMS Error, suggesting its potential to enable accurate endoscope tracking and scene reconstruction.
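The dense-depth and 3D-landmark machinery above rests on the standard pinhole stereo relations, sketched here with illustrative camera parameters (not values from the SCARED dataset):

```python
import numpy as np

def depth_from_disparity(disparity_px, focal_px, baseline_mm):
    """Pinhole stereo model: Z = f * B / d, depth in millimetres."""
    return focal_px * baseline_mm / disparity_px

def backproject(u, v, Z, fx, fy, cx, cy):
    """Lift a pixel (u, v) with depth Z to a 3D camera-frame landmark,
    e.g. a point selected for Kanade-Lucas-Tomasi tracking."""
    return np.array([(u - cx) * Z / fx, (v - cy) * Z / fy, Z])
```

In the paper the per-pixel disparity comes from an unsupervised optical flow network rather than block matching, but the geometry that converts it into depth and landmarks is this one.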
20
Tracking better, tracking longer: automatic keyframe selection in model-based laparoscopic augmented reality. Int J Comput Assist Radiol Surg 2022; 17:1507-1511. [DOI: 10.1007/s11548-022-02643-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/09/2022] [Accepted: 04/08/2022] [Indexed: 11/05/2022]
21
Shao S, Pei Z, Chen W, Zhu W, Wu X, Sun D, Zhang B. Self-supervised monocular depth and ego-motion estimation in endoscopy: Appearance flow to the rescue. Med Image Anal 2021; 77:102338. [PMID: 35016079 DOI: 10.1016/j.media.2021.102338] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/24/2021] [Revised: 10/24/2021] [Accepted: 12/14/2021] [Indexed: 11/25/2022]
Abstract
Recently, self-supervised learning technology has been applied to calculate depth and ego-motion from monocular videos, achieving remarkable performance in autonomous driving scenarios. One widely adopted assumption of depth and ego-motion self-supervised learning is that the image brightness remains constant within nearby frames. Unfortunately, the endoscopic scene does not meet this assumption because there are severe brightness fluctuations induced by illumination variations, non-Lambertian reflections and interreflections during data collection, and these brightness fluctuations inevitably deteriorate the depth and ego-motion estimation accuracy. In this work, we introduce a novel concept referred to as appearance flow to address the brightness inconsistency problem. The appearance flow takes into consideration any variations in the brightness pattern and enables us to develop a generalized dynamic image constraint. Furthermore, we build a unified self-supervised framework to estimate monocular depth and ego-motion simultaneously in endoscopic scenes, which comprises a structure module, a motion module, an appearance module and a correspondence module, to accurately reconstruct the appearance and calibrate the image brightness. Extensive experiments are conducted on the SCARED dataset and EndoSLAM dataset, and the proposed unified framework exceeds other self-supervised approaches by a large margin. To validate our framework's generalization ability on different patients and cameras, we train our model on SCARED but test it on the SERV-CT and Hamlyn datasets without any fine-tuning, and the superior results reveal its strong generalization ability. Code is available at: https://github.com/ShuweiShao/AF-SfMLearner.
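The generalized dynamic image constraint can be caricatured as an additive brightness-compensation term in the photometric residual. This sketch assumes a precomputed warped source frame and ignores the paper's structure, motion, appearance, and correspondence modules; the exact formulation is in the linked repository.

```python
import numpy as np

def photometric_residual(I_t, I_s_warped, appearance_flow=None):
    """Compare the target frame with the warped source frame after
    compensating per-pixel brightness change with an additive
    appearance-flow term. Under strict brightness constancy the
    appearance flow is zero and this reduces to the usual L1 loss."""
    if appearance_flow is None:
        appearance_flow = np.zeros_like(I_s_warped)
    return float(np.abs(I_t - (I_s_warped + appearance_flow)).mean())
```

With a global brightness shift between frames, the plain residual stays large no matter how good the warp is, while a matching appearance-flow term drives it to zero, which is exactly the failure mode of the brightness-constancy assumption the paper targets.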
Affiliation(s)
- Shuwei Shao, School of Automation Science and Electrical Engineering, Beihang University, Beijing, China
- Zhongcai Pei, School of Automation Science and Electrical Engineering, Beihang University, Beijing, China; Hangzhou Innovation Institute, Beihang University, Hangzhou, China
- Weihai Chen, School of Automation Science and Electrical Engineering, Beihang University, Beijing, China; Hangzhou Innovation Institute, Beihang University, Hangzhou, China
- Xingming Wu, School of Automation Science and Electrical Engineering, Beihang University, Beijing, China
- Dianmin Sun, Shandong Cancer Hospital Affiliated to Shandong University, Shandong First Medical University and Shandong Academy of Medical Sciences, Jinan, China
- Baochang Zhang, Institute of Artificial Intelligence, Beihang University, Beijing, China