1
Zhou Y, Li R, Dai Y, Chen G, Zhang J, Cui L, Yin X. Taking measurement in every direction: Implicit scene representation for accurately estimating target dimensions under monocular endoscope. Comput Methods Programs Biomed 2024; 256:108380. [PMID: 39178502] [DOI: 10.1016/j.cmpb.2024.108380]
Abstract
BACKGROUND AND OBJECTIVES In endoscopy, measuring target size can assist medical diagnosis. However, limited operating space, low image quality, and irregular target shapes pose great challenges to traditional vision-based measurement methods. METHODS In this paper, we propose a novel approach to measuring irregular target size under a monocular endoscope using image rendering. First, virtual poses are synthesized on the same main optical axis as known camera poses, and an implicit neural representation module that accounts for brightness and target boundaries renders the images corresponding to these virtual poses. Then, Swin-Unet and the rotating calipers algorithm are used to obtain the maximum pixel length of the target in image pairs sharing the same main optical axis. Finally, the similar-triangle relationship of the endoscopic imaging model is used to measure the size of the target. RESULTS The evaluation is conducted using renal stone fragments from patients, placed in a kidney model and in an isolated porcine kidney. The mean measurement error is 0.12 mm. CONCLUSIONS The proposed method can automatically measure object size within narrow body cavities in any visible direction, improving the effectiveness and accuracy of measurement in the limited endoscopic space.
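The final measurement step described here reduces to the pinhole similar-triangle relation: two views taken along the same main optical axis, separated by a known displacement, determine both the target depth and its absolute size. A minimal sketch of that relation (the pinhole assumption and all variable names are ours, not the paper's):

```python
def target_size_from_axial_pair(p_near, p_far, baseline_mm, focal_px):
    """Absolute target size from two views on the same main optical axis.

    Pinhole model: pixel length p = focal_px * L / z, so with a known
    axial displacement baseline_mm between the two poses,
        p_near = f*L/z   and   p_far = f*L/(z + baseline_mm),
    solve for the near-view depth z, then for the real length L.
    """
    assert p_near > p_far > 0, "the nearer view must project larger"
    z_near = baseline_mm * p_far / (p_near - p_far)   # target depth (mm)
    return p_near * z_near / focal_px                 # real max length (mm)
```

For example, with p_near = 120 px, p_far = 100 px, a 2 mm baseline, and an 800 px focal length, this gives z_near = 10 mm and a target length of 1.5 mm.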
Affiliation(s)
- Yuchen Zhou
- The College of Artificial Intelligence, Nankai University, Tianjin 300350, China; The Institute of Robotics and Automatic Information System, Tianjin Key Laboratory of Intelligent Robotics, Tianjin 300350, China
- Rui Li
- The College of Artificial Intelligence, Nankai University, Tianjin 300350, China; The Institute of Robotics and Automatic Information System, Tianjin Key Laboratory of Intelligent Robotics, Tianjin 300350, China
- Yu Dai
- The College of Artificial Intelligence, Nankai University, Tianjin 300350, China; The Institute of Robotics and Automatic Information System, Tianjin Key Laboratory of Intelligent Robotics, Tianjin 300350, China
- Gongping Chen
- The College of Artificial Intelligence, Nankai University, Tianjin 300350, China; The Institute of Robotics and Automatic Information System, Tianjin Key Laboratory of Intelligent Robotics, Tianjin 300350, China
- Jianxun Zhang
- The College of Artificial Intelligence, Nankai University, Tianjin 300350, China; The Institute of Robotics and Automatic Information System, Tianjin Key Laboratory of Intelligent Robotics, Tianjin 300350, China
- Liang Cui
- Department of Urology, Civil Aviation General Hospital, Beijing 100123, China
- Xiaotao Yin
- Department of Urology, Fourth Medical Center of Chinese PLA General Hospital, Beijing 100048, China
2
He Q, Feng G, Bano S, Stoyanov D, Zuo S. MonoLoT: Self-Supervised Monocular Depth Estimation in Low-Texture Scenes for Automatic Robotic Endoscopy. IEEE J Biomed Health Inform 2024; 28:6078-6091. [PMID: 38968011] [DOI: 10.1109/jbhi.2024.3423791]
Abstract
The self-supervised monocular depth estimation framework is well-suited for medical images that lack ground-truth depth, such as those from digestive endoscopes, facilitating navigation and 3D reconstruction in the gastrointestinal tract. However, this framework faces several limitations, including poor performance in low-texture environments, limited generalisation to real-world datasets, and unclear applicability in downstream tasks like visual servoing. To tackle these challenges, we propose MonoLoT, a self-supervised monocular depth estimation framework featuring two key innovations: point matching loss and batch image shuffle. Extensive ablation studies on two publicly available datasets, namely C3VD and SimCol, have shown that methods enabled by MonoLoT achieve substantial improvements, with accuracies of 0.944 on C3VD and 0.959 on SimCol, surpassing both depth-supervised and self-supervised baselines on C3VD. Qualitative evaluations on real-world endoscopic data underscore the generalisation capabilities of our methods, outperforming both depth-supervised and self-supervised baselines. To demonstrate the feasibility of using monocular depth estimation for visual servoing, we have successfully integrated our method into a proof-of-concept robotic platform, enabling real-time automatic intervention and control in digestive endoscopy. In summary, our method represents a significant advancement in monocular depth estimation for digestive endoscopy, overcoming key challenges and opening promising avenues for medical applications.
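The accuracy figures quoted here (0.944 and 0.959) are on the scale of the standard monocular-depth threshold metric, the fraction of pixels whose predicted-to-true depth ratio falls below 1.25. A sketch of that conventional metric, on the assumption that this is the quantity being reported:

```python
import numpy as np

def threshold_accuracy(pred, gt, delta=1.25):
    """Fraction of pixels with max(pred/gt, gt/pred) < delta.
    delta = 1.25 is the usual first threshold in depth benchmarks."""
    pred, gt = np.asarray(pred, float), np.asarray(gt, float)
    valid = gt > 0                      # ignore invalid ground-truth pixels
    ratio = np.maximum(pred[valid] / gt[valid], gt[valid] / pred[valid])
    return float((ratio < delta).mean())
```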
3
Jeong BH, Kim HK, Son YD. Depth estimation from monocular endoscopy using simulation and image transfer approach. Comput Biol Med 2024; 181:109038. [PMID: 39178804] [DOI: 10.1016/j.compbiomed.2024.109038]
Abstract
Obtaining accurate distance or depth information in endoscopy is crucial for the effective utilization of navigation systems. However, due to space constraints, incorporating depth cameras into endoscopic systems is often impractical. Our goal is to estimate depth images directly from endoscopic images using deep learning. This study presents a three-step methodology for training a depth-estimation network model. Initially, simulated endoscopy images and corresponding depth maps are generated using Unity based on a colon surface model obtained from segmented computed tomography colonography data. Subsequently, a cycle generative adversarial network model is employed to enhance the realism of the simulated endoscopy images. Finally, a deep learning model is trained using the synthesized endoscopy images and depth maps to estimate depths accurately. The performance of the proposed approach is evaluated and compared against prior studies utilizing unsupervised training methods. The results demonstrate the superior precision of the proposed technique in estimating depth images within endoscopy. The proposed depth estimation method holds promise for advancing the field by enabling enhanced navigation, improved lesion marking capabilities, and ultimately leading to better clinical outcomes.
Affiliation(s)
- Bong Hyuk Jeong
- Department of Health Sciences and Technology, GAIHST, Gachon University, Incheon, 21999, South Korea
- Hang Keun Kim
- Department of Health Sciences and Technology, GAIHST, Gachon University, Incheon, 21999, South Korea; Department of Biomedical Engineering, Gachon University, Seongnam, 13120, South Korea
- Young Don Son
- Department of Health Sciences and Technology, GAIHST, Gachon University, Incheon, 21999, South Korea; Department of Biomedical Engineering, Gachon University, Seongnam, 13120, South Korea
4
Lee Y. Three-Dimensional Dense Reconstruction: A Review of Algorithms and Datasets. Sensors (Basel) 2024; 24:5861. [PMID: 39338606] [PMCID: PMC11435907] [DOI: 10.3390/s24185861]
Abstract
Three-dimensional dense reconstruction involves extracting the full shape and texture details of three-dimensional objects from two-dimensional images. Although 3D reconstruction is a crucial and well-researched area, it remains an unsolved challenge in dynamic or complex environments. This work provides a comprehensive overview of classical 3D dense reconstruction techniques, including those based on geometric and optical models, as well as approaches leveraging deep learning. It also discusses the datasets used for deep learning and evaluates the performance and the strengths and limitations of deep learning methods on these datasets.
Affiliation(s)
- Yangming Lee
- RoCAL Lab, Rochester Institute of Technology, Rochester, NY 14623, USA
5
Li C, Zhang G, Zhao B, Xie D, Du H, Duan X, Hu Y, Zhang L. Advances of surgical robotics: image-guided classification and application. Natl Sci Rev 2024; 11:nwae186. [PMID: 39144738] [PMCID: PMC11321255] [DOI: 10.1093/nsr/nwae186]
Abstract
The application of surgical robotics in minimally invasive surgery has developed rapidly and has attracted increasing research attention in recent years. A common consensus has been reached that surgical procedures should become less traumatic while implementing more intelligence and higher autonomy, which poses a serious challenge to the environmental sensing capabilities of robotic systems. One of the main sources of environmental information for robots is images, which form the basis of robot vision. In this review article, we divide clinical images into direct and indirect based on the object of information acquisition, and into continuous, intermittent continuous, and discontinuous according to the target-tracking frequency. The characteristics and applications of existing surgical robots in each category are introduced along these two dimensions. Our purpose in conducting this review was to analyze, summarize, and discuss the current evidence on the general rules governing the application of image technologies for medical purposes. Our analysis provides insight and guidance conducive to the development of more advanced surgical robotics systems in the future.
Affiliation(s)
- Changsheng Li
- School of Mechatronical Engineering, Beijing Institute of Technology, Beijing 100081, China
- Gongzi Zhang
- Department of Orthopedics, Chinese PLA General Hospital, Beijing 100141, China
- Baoliang Zhao
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Dongsheng Xie
- School of Mechatronical Engineering, Beijing Institute of Technology, Beijing 100081, China
- School of Medical Technology, Beijing Institute of Technology, Beijing 100081, China
- Hailong Du
- Department of Orthopedics, Chinese PLA General Hospital, Beijing 100141, China
- Xingguang Duan
- School of Mechatronical Engineering, Beijing Institute of Technology, Beijing 100081, China
- School of Medical Technology, Beijing Institute of Technology, Beijing 100081, China
- Ying Hu
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Lihai Zhang
- Department of Orthopedics, Chinese PLA General Hospital, Beijing 100141, China
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
6
Regef J, Talasila L, Wiercigroch J, Lin RJ, Kahrs LA. Laryngeal surface reconstructions from monocular endoscopic videos: a structure from motion pipeline for periodic deformations. Int J Comput Assist Radiol Surg 2024; 19:1895-1907. [PMID: 38652415] [DOI: 10.1007/s11548-024-03118-x]
Abstract
PURPOSE Surface reconstructions from laryngoscopic videos have the potential to assist clinicians in diagnosing, quantifying, and monitoring airway diseases using minimally invasive techniques. However, tissue movements and deformations make these reconstructions challenging for conventional pipelines. METHODS To facilitate such reconstructions, we developed video frame pre-filtering and featureless dense matching steps to enhance the Alicevision Meshroom SfM pipeline. Time and the anterior glottic angle were used to approximate the rigid state of the airway and to collect frames with different camera poses. Featureless dense matches were tracked with a correspondence transformer across subsets of images to extract matched points for estimating the point cloud and reconstructed surface. The proposed pipeline was tested on a simulated dataset under various conditions, such as illumination and resolution, as well as on real laryngoscopic videos. RESULTS Our pipeline was able to reconstruct the laryngeal region from 4, 8, and 16 images obtained from simulated and real patient exams. The pipeline was robust to sparse inputs, blur, and extreme lighting conditions, unlike the Meshroom pipeline, which failed to produce a point cloud for 6 of 15 simulated datasets. CONCLUSION The pre-filtering and featureless dense matching modules specialize the conventional SfM pipeline to handle challenging laryngoscopic examinations directly from patient videos. These 3D visualizations have the potential to improve the spatial understanding of airway conditions.
Affiliation(s)
- Justin Regef
- Medical Computer Vision and Robotics Lab, University of Toronto, Toronto, ON, Canada
- Department of Mathematical and Computational Sciences, University of Toronto Mississauga, 3359 Mississauga Rd, Mississauga, ON, L5L 1C6, Canada
- Likhit Talasila
- Medical Computer Vision and Robotics Lab, University of Toronto, Toronto, ON, Canada
- Department of Mathematical and Computational Sciences, University of Toronto Mississauga, 3359 Mississauga Rd, Mississauga, ON, L5L 1C6, Canada
- Julia Wiercigroch
- Medical Computer Vision and Robotics Lab, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, 40 St George St, Toronto, ON, M5S 2E4, Canada
- R Jun Lin
- Department of Otolaryngology - Head & Neck Surgery, Unity Health Toronto - St. Michael's Hospital, Temerty Faculty of Medicine, University of Toronto, 36 Queen St E, Toronto, ON, M5B 1W8, Canada
- Lueder A Kahrs
- Medical Computer Vision and Robotics Lab, University of Toronto, Toronto, ON, Canada
- Department of Mathematical and Computational Sciences, University of Toronto Mississauga, 3359 Mississauga Rd, Mississauga, ON, L5L 1C6, Canada
- Department of Computer Science, University of Toronto, 40 St George St, Toronto, ON, M5S 2E4, Canada
- Department of Otolaryngology - Head & Neck Surgery, Unity Health Toronto - St. Michael's Hospital, Temerty Faculty of Medicine, University of Toronto, 36 Queen St E, Toronto, ON, M5B 1W8, Canada
- Institute of Biomedical Engineering, University of Toronto, 164 College Street, Toronto, ON, M5S 3G9, Canada
7
Rau A, Bano S, Jin Y, Azagra P, Morlana J, Kader R, Sanderson E, Matuszewski BJ, Lee JY, Lee DJ, Posner E, Frank N, Elangovan V, Raviteja S, Li Z, Liu J, Lalithkumar S, Islam M, Ren H, Lovat LB, Montiel JMM, Stoyanov D. SimCol3D - 3D reconstruction during colonoscopy challenge. Med Image Anal 2024; 96:103195. [PMID: 38815359] [DOI: 10.1016/j.media.2024.103195]
Abstract
Colorectal cancer is one of the most common cancers in the world. While colonoscopy is an effective screening technique, navigating an endoscope through the colon to detect polyps is challenging. A 3D map of the observed surfaces could enhance the identification of unscreened colon tissue and serve as a training platform. However, reconstructing the colon from video footage remains difficult. Learning-based approaches hold promise as robust alternatives but necessitate extensive datasets. To establish a benchmark dataset, the 2022 EndoVis sub-challenge SimCol3D aimed to facilitate data-driven depth and pose prediction during colonoscopy. The challenge was hosted as part of MICCAI 2022 in Singapore. Six teams from around the world, with representatives from academia and industry, participated in the three sub-challenges: synthetic depth prediction, synthetic pose prediction, and real pose prediction. This paper describes the challenge, the submitted methods, and their results. We show that depth prediction from synthetic colonoscopy images is robustly solvable, while pose estimation remains an open research question.
Affiliation(s)
- Anita Rau
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS) and Department of Computer Science, University College London, London, UK; Stanford University, Stanford, CA, USA
- Sophia Bano
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS) and Department of Computer Science, University College London, London, UK
- Yueming Jin
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS) and Department of Computer Science, University College London, London, UK; National University of Singapore, Singapore
- Rawen Kader
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS) and Department of Computer Science, University College London, London, UK
- Edward Sanderson
- Computer Vision and Machine Learning (CVML) Group, University of Central Lancashire, Preston, UK
- Bogdan J Matuszewski
- Computer Vision and Machine Learning (CVML) Group, University of Central Lancashire, Preston, UK
- Jae Young Lee
- Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea
- Dong-Jae Lee
- Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea
- Sista Raviteja
- Indian Institute of Technology Kharagpur, Kharagpur, India
- Zhengwen Li
- Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering & Instrument Science, Zhejiang University, China
- Jiquan Liu
- Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering & Instrument Science, Zhejiang University, China
- Seenivasan Lalithkumar
- National University of Singapore, Singapore; The Chinese University of Hong Kong, Hong Kong, China
- Hongliang Ren
- National University of Singapore, Singapore; The Chinese University of Hong Kong, Hong Kong, China
- Laurence B Lovat
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS) and Department of Computer Science, University College London, London, UK
- Danail Stoyanov
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS) and Department of Computer Science, University College London, London, UK
8
Teufel T, Shu H, Soberanis-Mukul RD, Mangulabnan JE, Sahu M, Vedula SS, Ishii M, Hager G, Taylor RH, Unberath M. OneSLAM to map them all: a generalized approach to SLAM for monocular endoscopic imaging based on tracking any point. Int J Comput Assist Radiol Surg 2024; 19:1259-1266. [PMID: 38775904] [DOI: 10.1007/s11548-024-03171-6]
Abstract
PURPOSE Monocular SLAM algorithms are the key enabling technology for image-based surgical navigation systems in endoscopic procedures. Due to the visual feature scarcity and unique lighting conditions encountered in endoscopy, classical SLAM approaches perform inconsistently. Many of the recent approaches to endoscopic SLAM rely on deep learning models. They show promising results when optimized on singular domains such as arthroscopy, sinus endoscopy, colonoscopy, or laparoscopy, but are limited by an inability to generalize to different domains without retraining. METHODS To address this generality issue, we propose OneSLAM, a monocular SLAM algorithm for surgical endoscopy that works out of the box across several endoscopic domains, including sinus endoscopy, colonoscopy, arthroscopy, and laparoscopy. Our pipeline builds upon robust tracking any point (TAP) foundation models to reliably track sparse correspondences across multiple frames and runs local bundle adjustment to jointly optimize camera poses and a sparse 3D reconstruction of the anatomy. RESULTS We compare the performance of our method against three strong baselines previously proposed for monocular SLAM in endoscopy and general scenes. OneSLAM delivers better or comparable performance than existing approaches targeted to specific data in all four tested domains, generalizing across domains without the need for retraining. CONCLUSION OneSLAM benefits from the convincing performance of TAP foundation models and generalizes to endoscopic sequences of different anatomies, all while demonstrating better or comparable performance over domain-specific SLAM approaches. Future research on global loop closure will investigate how to reliably detect loops in endoscopic scenes to reduce accumulated drift and enhance long-term navigation capabilities.
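The optimization step this abstract names, local bundle adjustment over TAP-derived 2D tracks, can be written compactly with a generic least-squares solver. A dense (non-sparse) sketch under our own variable names, with the TAP tracker treated as a black box that supplies the tracks:

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def project(pts, rvec, tvec, K):
    """Pinhole projection of world points into a camera (axis-angle pose)."""
    cam = pts @ Rotation.from_rotvec(rvec).as_matrix().T + tvec
    uv = cam @ K.T
    return uv[:, :2] / uv[:, 2:3]

def local_bundle_adjust(rvecs, tvecs, pts3d, tracks, K):
    """Jointly refine camera poses and sparse structure against 2D tracks.

    tracks: list of (N, 2) arrays, one per frame, NaN where a point
    was not observed in that frame.
    """
    n_f, n_p = len(rvecs), len(pts3d)

    def residuals(x):
        rv = x[:3 * n_f].reshape(n_f, 3)
        tv = x[3 * n_f:6 * n_f].reshape(n_f, 3)
        pts = x[6 * n_f:].reshape(n_p, 3)
        res = []
        for f, obs in enumerate(tracks):
            seen = ~np.isnan(obs[:, 0])
            res.append((project(pts[seen], rv[f], tv[f], K) - obs[seen]).ravel())
        return np.concatenate(res)

    x0 = np.concatenate([np.ravel(rvecs), np.ravel(tvecs), np.ravel(pts3d)])
    sol = least_squares(residuals, x0)  # real systems exploit Jacobian sparsity
    rv = sol.x[:3 * n_f].reshape(n_f, 3)
    tv = sol.x[3 * n_f:6 * n_f].reshape(n_f, 3)
    return rv, tv, sol.x[6 * n_f:].reshape(n_p, 3)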
Affiliation(s)
- Timo Teufel
- Johns Hopkins University, Baltimore, MD, 21211, USA
- Hongchao Shu
- Johns Hopkins University, Baltimore, MD, 21211, USA
- Manish Sahu
- Johns Hopkins University, Baltimore, MD, 21211, USA
- Masaru Ishii
- Johns Hopkins Medical Institutions, Baltimore, MD, 21287, USA
- Russell H Taylor
- Johns Hopkins University, Baltimore, MD, 21211, USA
- Johns Hopkins Medical Institutions, Baltimore, MD, 21287, USA
- Mathias Unberath
- Johns Hopkins University, Baltimore, MD, 21211, USA
- Johns Hopkins Medical Institutions, Baltimore, MD, 21287, USA
9
Schmidt A, Mohareri O, DiMaio SP, Salcudean SE. Surgical Tattoos in Infrared: A Dataset for Quantifying Tissue Tracking and Mapping. IEEE Trans Med Imaging 2024; 43:2634-2645. [PMID: 38437151] [DOI: 10.1109/tmi.2024.3372828]
Abstract
Quantifying performance of methods for tracking and mapping tissue in endoscopic environments is essential for enabling image guidance and automation of medical interventions and surgery. Datasets developed so far either use rigid environments, visible markers, or require annotators to label salient points in videos after collection. These are respectively: not general, visible to algorithms, or costly and error-prone. We introduce a novel labeling methodology along with a dataset that uses said methodology, Surgical Tattoos in Infrared (STIR). STIR has labels that are persistent but invisible to visible spectrum algorithms. This is done by labelling tissue points with IR-fluorescent dye, indocyanine green (ICG), and then collecting visible light video clips. STIR comprises hundreds of stereo video clips in both in vivo and ex vivo scenes with start and end points labelled in the IR spectrum. With over 3,000 labelled points, STIR will help to quantify and enable better analysis of tracking and mapping methods. After introducing STIR, we analyze multiple different frame-based tracking methods on STIR using both 3D and 2D endpoint error and accuracy metrics. STIR is available at https://dx.doi.org/10.21227/w8g4-g548.
10
Cui B, Islam M, Bai L, Ren H. Surgical-DINO: adapter learning of foundation models for depth estimation in endoscopic surgery. Int J Comput Assist Radiol Surg 2024; 19:1013-1020. [PMID: 38459402] [PMCID: PMC11178563] [DOI: 10.1007/s11548-024-03083-5]
Abstract
PURPOSE Depth estimation in robotic surgery is vital for 3D reconstruction, surgical navigation, and augmented reality visualization. Although foundation models exhibit outstanding performance in many vision tasks, including depth estimation (e.g., DINOv2), recent works have observed their limitations in medical and surgical domain-specific applications. This work presents a low-rank adaptation (LoRA) of the foundation model for surgical depth estimation. METHODS We design a foundation model-based depth estimation method, referred to as Surgical-DINO, a low-rank adaptation of DINOv2 for depth estimation in endoscopic surgery. We build LoRA layers and integrate them into DINO to adapt to surgery-specific domain knowledge, instead of conventional fine-tuning. During training, we freeze the DINO image encoder, which shows excellent visual representation capacity, and optimize only the LoRA layers and depth decoder to integrate features from the surgical scene. RESULTS Our model is extensively validated on the SCARED MICCAI challenge dataset, which was collected from da Vinci Xi endoscope surgery. We empirically show that Surgical-DINO significantly outperforms all state-of-the-art models in endoscopic depth estimation tasks. Ablation studies provide evidence of the remarkable effect of our LoRA layers and adaptation. CONCLUSION Surgical-DINO sheds light on the successful adaptation of foundation models to the surgical domain for depth estimation. There is clear evidence in the results that zero-shot prediction with weights pre-trained on computer vision datasets, or naive fine-tuning, is not sufficient to use a foundation model in the surgical domain directly.
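The adaptation pattern described, freezing the pretrained encoder and training only low-rank residual weights, follows the standard LoRA construction. A generic PyTorch sketch of that pattern (not the authors' exact Surgical-DINO layers or hyperparameters):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a pretrained linear layer with a trainable low-rank update:
    y = W x + (alpha / r) * B A x, where only A and B are optimized."""
    def __init__(self, base: nn.Linear, r: int = 4, alpha: float = 4.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():     # freeze pretrained weights
            p.requires_grad = False
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r               # B starts at zero, so the wrapped
                                             # layer initially matches the base
    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```

Zero-initializing B means training starts exactly from the frozen model's behavior, which is what makes this scheme stable on small surgical datasets.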
Affiliation(s)
- Beilei Cui
- The Chinese University of Hong Kong, Hong Kong, China
- Mobarakol Islam
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS), University College London, London, UK
- Long Bai
- The Chinese University of Hong Kong, Hong Kong, China
- Hongliang Ren
- The Chinese University of Hong Kong, Hong Kong, China
- Department of BME, National University of Singapore, Singapore, Singapore
11
Yang Z, Dai J, Pan J. 3D reconstruction from endoscopy images: A survey. Comput Biol Med 2024; 175:108546. [PMID: 38704902] [DOI: 10.1016/j.compbiomed.2024.108546]
Abstract
Three-dimensional reconstruction of images acquired through endoscopes is playing a vital role in an increasing number of medical applications. Endoscopes used in the clinic are commonly classified as monocular or binocular. We have reviewed the classification of depth estimation methods according to the type of endoscope. Fundamentally, depth estimation relies on feature matching between images and multi-view geometry theory. However, these traditional techniques face many problems in the endoscopic environment. With the continuing development of deep learning, a growing number of works use learning-based methods to address challenges such as inconsistent illumination and texture sparsity. We have reviewed over 170 papers published in the 10 years from 2013 to 2023. The commonly used public datasets and performance metrics are summarized. We also give a taxonomy of methods and analyze the advantages and drawbacks of each class of algorithms. Summary tables and a results atlas are provided to facilitate comparison of the qualitative and quantitative performance of the different methods in each category. In addition, we summarize commonly used scene representation methods in endoscopy and speculate on the prospects of depth estimation research in medical applications. We also compare the robustness, processing time, and scene representation of the methods to help doctors and researchers select appropriate methods for surgical applications.
Affiliation(s)
- Zhuoyue Yang
- State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, 37 Xueyuan Road, Haidian District, Beijing, 100191, China; Peng Cheng Lab, 2 Xingke 1st Street, Nanshan District, Shenzhen, Guangdong Province, 518000, China
- Ju Dai
- Peng Cheng Lab, 2 Xingke 1st Street, Nanshan District, Shenzhen, Guangdong Province, 518000, China
- Junjun Pan
- State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, 37 Xueyuan Road, Haidian District, Beijing, 100191, China; Peng Cheng Lab, 2 Xingke 1st Street, Nanshan District, Shenzhen, Guangdong Province, 518000, China
12
Richter A, Steinmann T, Rosenthal JC, Rupitsch SJ. Advances in Real-Time 3D Reconstruction for Medical Endoscopy. J Imaging 2024; 10:120. [PMID: 38786574] [PMCID: PMC11122342] [DOI: 10.3390/jimaging10050120]
Abstract
This contribution is intended to provide researchers with a comprehensive overview of the current state of the art concerning real-time 3D reconstruction methods suitable for medical endoscopy. Over the past decade, there have been various technological advancements in computational power and an increased research effort in many computer vision fields, such as autonomous driving, robotics, and unmanned aerial vehicles. Some of these advancements can also be adapted to the field of medical endoscopy while coping with challenges such as featureless surfaces, varying lighting conditions, and deformable structures. To provide a comprehensive overview, a logical division into monocular, binocular, trinocular, and multiocular methods is performed, and active and passive methods are distinguished. Within these categories, we consider both flexible and non-flexible endoscopes to cover the state of the art as fully as possible. The relevant error metrics for comparing the publications presented here are discussed, and the question of when to choose a GPU rather than an FPGA for camera-based 3D reconstruction is debated. We elaborate on the good practice of using datasets and provide a direct comparison of the presented work. It is important to note that, in addition to medical publications, publications evaluated on the KITTI and Middlebury datasets are also considered, to include related methods that may be suited to medical 3D reconstruction.
Affiliation(s)
- Alexander Richter
- Fraunhofer Institute for High-Speed Dynamics, Ernst-Mach-Institut (EMI), Ernst-Zermelo-Straße 4, 79104 Freiburg, Germany
- Electrical Instrumentation and Embedded Systems, Albert-Ludwigs-Universität Freiburg, Georges-Köhler-Allee 106, 79110 Freiburg, Germany
- Till Steinmann
- Electrical Instrumentation and Embedded Systems, Albert-Ludwigs-Universität Freiburg, Georges-Köhler-Allee 106, 79110 Freiburg, Germany
- Jean-Claude Rosenthal
- Fraunhofer Institute for Telecommunications, Heinrich-Hertz-Institut (HHI), Einsteinufer 37, 10587 Berlin, Germany
- Stefan J. Rupitsch
- Electrical Instrumentation and Embedded Systems, Albert-Ludwigs-Universität Freiburg, Georges-Köhler-Allee 106, 79110 Freiburg, Germany
13
Yang Z, Pan J, Dai J, Sun Z, Xiao Y. Self-Supervised Lightweight Depth Estimation in Endoscopy Combining CNN and Transformer. IEEE Trans Med Imaging 2024; 43:1934-1944. [PMID: 38198275] [DOI: 10.1109/tmi.2024.3352390]
Abstract
In recent years, an increasing number of medical engineering tasks, such as surgical navigation, pre-operative registration, and surgical robotics, rely on 3D reconstruction techniques. Self-supervised depth estimation has attracted interest in endoscopic scenarios because it does not require ground truth. Most existing methods depend on expanding the number of parameters to improve their performance. Therefore, designing a lightweight self-supervised model that can obtain competitive results is a hot topic. We propose a lightweight network with a tight coupling of convolutional neural network (CNN) and Transformer for depth estimation. Unlike other methods that use a CNN and a Transformer to extract features separately and then fuse them at the deepest layer, we utilize CNN and Transformer modules to extract features at different scales in the encoder. This hierarchical structure leverages the advantages of the CNN in texture perception and of the Transformer in shape extraction. At the same feature-extraction scale, the CNN acquires local features while the Transformer encodes global information. Finally, we add multi-head attention modules to the pose network to improve the accuracy of the predicted poses. Experiments demonstrate that our approach obtains comparable results while effectively compressing the model parameters on two datasets.
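One plausible realization of the per-scale CNN/Transformer coupling described here is a conv branch for local texture plus a self-attention branch for global shape, fused residually at the same resolution; the structure and names below are ours, not the paper's:

```python
import torch
import torch.nn as nn

class HybridBlock(nn.Module):
    """One encoder scale: a depthwise conv branch captures local texture,
    multi-head self-attention encodes global shape; both fuse residually."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(dim, dim, 3, padding=1, groups=dim),  # depthwise: cheap
            nn.GELU(),
            nn.Conv2d(dim, dim, 1),
        )
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                             # x: (B, C, H, W)
        b, c, h, w = x.shape
        t = self.norm(x.flatten(2).transpose(1, 2))   # (B, HW, C) tokens
        g, _ = self.attn(t, t, t)                     # global branch
        g = g.transpose(1, 2).reshape(b, c, h, w)
        return x + self.conv(x) + g                   # residual fusion
```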
14
Yu Y, Feng T, Qiu H, Gu Y, Chen Q, Zuo C, Ma H. Simultaneous photoacoustic and ultrasound imaging: A review. Ultrasonics 2024; 139:107277. [PMID: 38460216] [DOI: 10.1016/j.ultras.2024.107277]
Abstract
Photoacoustic imaging (PAI) is an emerging biomedical imaging technique that combines the advantages of optical and ultrasound imaging, enabling the generation of images with both optical resolution and acoustic penetration depth. By leveraging similar signal acquisition and processing methods, the integration of photoacoustic and ultrasound imaging has introduced a novel hybrid imaging modality suitable for clinical applications. Photoacoustic-ultrasound imaging allows for non-invasive, high-resolution, and deep-penetrating imaging, providing a wealth of image information. In recent years, with the deepening research and the expanding biomedical application scenarios of photoacoustic-ultrasound bimodal systems, the immense potential of photoacoustic-ultrasound bimodal imaging in basic research and clinical applications has been demonstrated, with some research achievements already commercialized. In this review, we introduce the principles, technical advantages, and biomedical applications of photoacoustic-ultrasound bimodal imaging techniques, specifically focusing on tomographic, microscopic, and endoscopic imaging modalities. Furthermore, we discuss the future directions of photoacoustic-ultrasound bimodal imaging technology.
Affiliation(s)
- Yinshi Yu
- Smart Computational Imaging Laboratory (SCILab), School of Electronic and Optical Engineering, Nanjing University of Science and Technology, Nanjing, Jiangsu Province 210094, China; Smart Computational Imaging Research Institute (SCIRI) of Nanjing University of Science and Technology, Nanjing, Jiangsu Province 210019, China; Jiangsu Key Laboratory of Spectral Imaging & Intelligent Sense, Nanjing, Jiangsu Province 210094, China
- Ting Feng
- Academy for Engineering & Technology, Fudan University, Shanghai 200433, China
- Haixia Qiu
- First Medical Center of PLA General Hospital, Beijing, China
- Ying Gu
- First Medical Center of PLA General Hospital, Beijing, China
- Qian Chen
- Smart Computational Imaging Laboratory (SCILab), School of Electronic and Optical Engineering, Nanjing University of Science and Technology, Nanjing, Jiangsu Province 210094, China; Smart Computational Imaging Research Institute (SCIRI) of Nanjing University of Science and Technology, Nanjing, Jiangsu Province 210019, China; Jiangsu Key Laboratory of Spectral Imaging & Intelligent Sense, Nanjing, Jiangsu Province 210094, China
- Chao Zuo
- Smart Computational Imaging Laboratory (SCILab), School of Electronic and Optical Engineering, Nanjing University of Science and Technology, Nanjing, Jiangsu Province 210094, China; Smart Computational Imaging Research Institute (SCIRI) of Nanjing University of Science and Technology, Nanjing, Jiangsu Province 210019, China; Jiangsu Key Laboratory of Spectral Imaging & Intelligent Sense, Nanjing, Jiangsu Province 210094, China
- Haigang Ma
- Smart Computational Imaging Laboratory (SCILab), School of Electronic and Optical Engineering, Nanjing University of Science and Technology, Nanjing, Jiangsu Province 210094, China; Smart Computational Imaging Research Institute (SCIRI) of Nanjing University of Science and Technology, Nanjing, Jiangsu Province 210019, China; Jiangsu Key Laboratory of Spectral Imaging & Intelligent Sense, Nanjing, Jiangsu Province 210094, China
15
Masuda T, Sagawa R, Furukawa R, Kawasaki H. Scale-preserving shape reconstruction from monocular endoscope image sequences by supervised depth learning. Healthc Technol Lett 2024; 11:76-84. [PMID: 38638502] [PMCID: PMC11022228] [DOI: 10.1049/htl2.12064]
Abstract
Reconstructing 3D shapes from images is becoming popular, but such methods usually estimate relative depth maps with ambiguous scale. A method is proposed for reconstructing a scale-preserving 3D shape from monocular endoscope image sequences by training an absolute depth prediction network. First, a dataset of synchronized sequences of RGB images and depth maps is created using an endoscope simulator. Then, a supervised depth prediction network is trained to estimate a depth map from an RGB image, minimizing the loss against the ground-truth depth map. The predicted depth map sequence is aligned to reconstruct a 3D shape. Finally, the proposed method is applied to a real endoscope image sequence.
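Because the simulator supplies synchronized depth maps, training reduces to plain supervised regression; a generic sketch of one such step (the paper's actual network and loss may differ):

```python
import torch
import torch.nn.functional as F

def train_step(model, rgb, gt_depth, optimizer):
    """One supervised step: regress an absolute depth map against the
    simulator's ground truth. No median scaling is applied, so the
    network learns metric (scale-preserving) depth directly."""
    optimizer.zero_grad()
    pred = model(rgb)                  # (B, 1, H, W) absolute depth
    loss = F.l1_loss(pred, gt_depth)
    loss.backward()
    optimizer.step()
    return loss.item()
```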
Affiliation(s)
- Takeshi Masuda
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba, Ibaraki, Japan
- Ryusuke Sagawa
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba, Ibaraki, Japan
- Ryo Furukawa
- Faculty of Engineering, Kindai University, Higashihiroshima, Hiroshima, Japan
- Hiroshi Kawasaki
- Faculty of Information Science and Electrical Engineering, Kyushu University, Fukuoka, Japan
16
Xu K, Wu H, Iwahori Y, Yu X, Hu Z, Wang A. A Vascular Feature Detection and Matching Method Based on Dual-Branch Fusion and Structure Enhancement. Sensors (Basel) 2024; 24:1880. [PMID: 38544143] [PMCID: PMC10975952] [DOI: 10.3390/s24061880]
Abstract
Obtaining internal cavity features and performing image matching are great challenges for laparoscopic 3D reconstruction. This paper proposes a method for detecting and associating vascular features based on dual-branch weighted fusion vascular structure enhancement. Our proposed method is divided into three stages. First, we analyze various types of minimally invasive surgery (MIS) images and design a universal preprocessing framework to make our method generalizable. We then propose a Gaussian weighted fusion vascular structure enhancement algorithm using the dual-branch Frangi measure and MFAT (multiscale fractional anisotropic tensor) to address the structural measurement differences and uneven responses between venous vessels and microvessels, providing effective structural information for vascular feature extraction. We extract vascular features through dual-circle detection based on branch point characteristics and introduce NMS (non-maximum suppression) to reduce feature point redundancy. We also calculate the ZSSD (zero sum of squared differences) and perform feature matching on the neighboring blocks of feature points extracted from consecutive frames. The experimental results show that the proposed method achieves an average accuracy of 0.7149 and a repeatability score of 0.5612 on the in vivo dataset. Evaluated on the quantity, repeatability, and accuracy of feature detection, our method is more advantageous and robust than existing methods.
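The ZSSD score used in the matching stage is commonly computed as a zero-mean sum of squared differences over a patch. A sketch under that reading, with the neighborhood search simplified to an exhaustive window (the dual-circle detector and NMS steps are omitted):

```python
import numpy as np

def zssd(a, b):
    """Zero-mean sum of squared differences between two equal-size patches."""
    a = a - a.mean()
    b = b - b.mean()
    return float(((a - b) ** 2).sum())

def match_point(prev_img, next_img, pt, half=7, search=15):
    """Match one feature point between consecutive frames by exhaustive
    ZSSD over a (2*search+1)^2 window around its previous location."""
    y, x = pt
    ref = prev_img[y - half:y + half + 1, x - half:x + half + 1].astype(float)
    best_score, best_pt = np.inf, pt
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            yy, xx = y + dy, x + dx
            cand = next_img[yy - half:yy + half + 1,
                            xx - half:xx + half + 1].astype(float)
            if cand.shape != ref.shape:
                continue                  # window fell off the image border
            s = zssd(ref, cand)
            if s < best_score:
                best_score, best_pt = s, (yy, xx)
    return best_pt, best_score
```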
Affiliation(s)
- Kaiyang Xu
- Heilongjiang Province Key Laboratory of Laser Spectroscopy Technology and Application, Harbin University of Science and Technology, Harbin 150080, China
- Haibin Wu
- Heilongjiang Province Key Laboratory of Laser Spectroscopy Technology and Application, Harbin University of Science and Technology, Harbin 150080, China
- Yuji Iwahori
- Computer Science, Chubu University, Kasugai 487-8501, Japan
- Xiaoyu Yu
- College of Electron and Information, University of Electronic Science and Technology of China, Zhongshan Institute, Zhongshan 528402, China
- Zeyu Hu
- Heilongjiang Province Key Laboratory of Laser Spectroscopy Technology and Application, Harbin University of Science and Technology, Harbin 150080, China
- Aili Wang
- Heilongjiang Province Key Laboratory of Laser Spectroscopy Technology and Application, Harbin University of Science and Technology, Harbin 150080, China
17
Zhang C, Wei R, Mo H, Zhai Y, Sun D. Deep learning-assisted 3D laser steering using an optofluidic laser scanner. Biomed Opt Express 2024; 15:1668-1681. [PMID: 38495701] [PMCID: PMC10942714] [DOI: 10.1364/boe.514489]
Abstract
Laser ablation is an effective treatment modality. However, current laser scanners suffer from laser defocusing when scanning targets at different depths in a 3D surgical scene. This study proposes a deep learning-assisted 3D laser steering strategy for minimally invasive surgery that eliminates laser defocusing, increases the working distance, and extends the scanning range. An optofluidic laser scanner is developed to conduct 3D laser steering. The optofluidic laser scanner has no mechanical moving components, enabling a miniature size, light weight, and low driving voltage. A deep learning-based monocular depth estimation method provides real-time target depth estimation so that the focal length of the laser scanner can be adjusted for laser focusing. Simulations and experiments indicate that the proposed method can significantly increase the working distance and maintain laser focusing while performing 2D laser steering, demonstrating its potential for application in minimally invasive surgery.
Affiliation(s)
- Chunqi Zhang
- Department of Biomedical Engineering, City University of Hong Kong, Hong Kong SAR, 999077, China
- Ruofeng Wei
- Department of Biomedical Engineering, City University of Hong Kong, Hong Kong SAR, 999077, China
- Hangjie Mo
- Department of Biomedical Engineering, City University of Hong Kong, Hong Kong SAR, 999077, China
- Yujia Zhai
- Department of Biomedical Engineering, City University of Hong Kong, Hong Kong SAR, 999077, China
- Dong Sun
- Department of Biomedical Engineering, City University of Hong Kong, Hong Kong SAR, 999077, China
- Center of Robotics and Automation, Shenzhen Research Institute, Shenzhen, Guangdong, 518000, China
18
Zhang Z, Song H, Fan J, Fu T, Li Q, Ai D, Xiao D, Yang J. Dual-correlate optimized coarse-fine strategy for monocular laparoscopic videos feature matching via multilevel sequential coupling feature descriptor. Comput Biol Med 2024; 169:107890. [PMID: 38168646] [DOI: 10.1016/j.compbiomed.2023.107890]
Abstract
Feature matching in monocular laparoscopic videos is crucial for visualization enhancement in computer-assisted surgery, and the keys to high-quality matching are accurate homography estimation and relative pose estimation, as well as sufficient matches and fast calculation. However, limited by monocular laparoscopic imaging characteristics such as highlight noise, motion blur, texture interference, and illumination variation, most existing feature matching methods struggle to produce high-quality matches efficiently and in sufficient numbers. To overcome these limitations, this paper presents a novel sequential coupling feature descriptor to extract and express multilevel feature maps efficiently, and a dual-correlate optimized coarse-fine strategy to establish dense matches at the coarse level and adjust pixel-wise matches at the fine level. First, a novel sequential coupling Swin Transformer layer is designed in the feature descriptor to learn and extract rich multilevel feature representations without increasing complexity. Then, a dual-correlate optimized coarse-fine strategy is proposed to match coarse feature sequences at low resolution, and the correlated fine feature sequences are optimized to refine pixel-wise matches based on coarse matching priors. Finally, the sequential coupling feature descriptor and dual-correlate optimization are merged into the Sequential Coupling Dual-Correlate Network (SeCo DC-Net) to produce high-quality matches. The evaluation is conducted on two public laparoscopic datasets, SCARED and EndoSLAM, and the experimental results show that the proposed network outperforms state-of-the-art methods in homography estimation, relative pose estimation, reprojection error, number of matching pairs, and inference runtime. The source code is publicly available at https://github.com/Iheckzza/FeatureMatching.
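The coarse stage of a coarse-fine strategy like this is typically a correlation of low-resolution feature sequences followed by mutual-nearest-neighbor selection. A generic sketch of that step (a LoFTR-style dual softmax, not the paper's exact SeCo DC-Net):

```python
import torch

def coarse_matches(feat_a, feat_b, thresh=0.2):
    """Coarse matching of two flattened low-resolution feature maps.

    feat_a: (Na, C), feat_b: (Nb, C). Returns (K, 2) index pairs that are
    mutual nearest neighbors with dual-softmax confidence above thresh;
    pixel-wise refinement then follows at the fine level.
    """
    sim = feat_a @ feat_b.T / feat_a.shape[-1] ** 0.5   # correlation matrix
    conf = sim.softmax(dim=0) * sim.softmax(dim=1)      # dual-softmax score
    mask = (
        (conf > thresh)
        & (conf == conf.max(dim=1, keepdim=True).values)
        & (conf == conf.max(dim=0, keepdim=True).values)
    )
    return mask.nonzero()
```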
Affiliation(s)
- Ziang Zhang
- The School of Medical Technology, Beijing Institute of Technology, Beijing, 100081, China
- Hong Song
- The School of Computer Science & Technology, Beijing Institute of Technology, Beijing, 100081, China
- Jingfan Fan
- The School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
- Tianyu Fu
- The School of Medical Technology, Beijing Institute of Technology, Beijing, 100081, China
- Qiang Li
- The School of Computer Science & Technology, Beijing Institute of Technology, Beijing, 100081, China
- Danni Ai
- The School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
- Deqiang Xiao
- The School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
- Jian Yang
- The School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
19
Liu S, Fan J, Yang Y, Xiao D, Ai D, Song H, Wang Y, Yang J. Monocular endoscopy images depth estimation with multi-scale residual fusion. Comput Biol Med 2024; 169:107850. [PMID: 38145602] [DOI: 10.1016/j.compbiomed.2023.107850]
Abstract
BACKGROUND Monocular depth estimation plays a fundamental role in clinical endoscopic surgery. However, the coherent illumination, smooth surfaces, and texture-less nature of endoscopy images present significant challenges to traditional depth estimation methods, which struggle to perceive depth accurately in such settings. METHOD To overcome these challenges, this paper proposes a novel multi-scale residual fusion method for estimating the depth of monocular endoscopy images. Specifically, we address the issue of coherent illumination by leveraging an image frequency-domain component space transformation, thereby enhancing the stability of the scene's light source. Moreover, we employ an image radiation intensity attenuation model to estimate the initial depth map. Finally, to refine the accuracy of depth estimation, we utilize a multi-scale residual fusion optimization technique. RESULTS To evaluate the performance of our proposed method, extensive experiments were conducted on public datasets. The structural similarity measures for continuous frames in three distinct clinical data scenes reached 0.94, 0.82, and 0.84, respectively. These results demonstrate the effectiveness of our approach in capturing the intricate details of endoscopy images. Furthermore, the depth estimation accuracy reached 89.3% and 91.2% on the two models' data, respectively, underscoring the robustness of our method. CONCLUSIONS Overall, the promising results obtained on public datasets highlight the significant potential of our method for clinical applications, facilitating reliable depth estimation and enhancing the quality of endoscopic surgical procedures.
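An initial depth map from a radiation intensity attenuation model is in the spirit of the classic co-located point-light falloff, I proportional to 1/d^2. A crude sketch of that kind of photometric prior (the constant c and the exact attenuation law are our assumptions; the paper then refines such an estimate by multi-scale residual fusion):

```python
import numpy as np

def initial_depth_from_intensity(gray_img, c=1.0, eps=1e-6):
    """Depth prior from point-light falloff: I ~ c / d**2  =>  d ~ sqrt(c / I).
    gray_img: uint8 grayscale endoscopy frame; c is an unknown scale factor,
    so the output is a relative depth map up to that scale."""
    intensity = gray_img.astype(float) / 255.0
    return np.sqrt(c / np.maximum(intensity, eps))
```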
Affiliation(s)
- Shiyuan Liu
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China; China Center for Information Industry Development, Beijing, 100081, China
- Jingfan Fan
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
- Yun Yang
- Department of General Surgery, Beijing Friendship Hospital, Capital Medical University, National Clinical Research Center for Digestive Diseases, Beijing 100050, China
- Deqiang Xiao
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
- Danni Ai
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
- Hong Song
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China
- Yongtian Wang
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
- Jian Yang
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
20
Yu J, Pruitt K, Nawawithan N, Johnson BA, Gahan J, Fei B. Dense surface reconstruction using a learning-based monocular vSLAM model for laparoscopic surgery. Proc SPIE Int Soc Opt Eng 2024; 12928:129280J. [PMID: 38745863] [PMCID: PMC11093590] [DOI: 10.1117/12.3008768]
Abstract
Augmented reality (AR) has seen increased interest and attention for its application in surgical procedures. AR-guided surgical systems can overlay segmented anatomy from pre-operative imaging onto the user's environment to delineate hard-to-see structures and subsurface lesions intraoperatively. While previous works have utilized pre-operative imaging such as computed tomography or magnetic resonance images, registration methods still lack the ability to accurately register deformable anatomical structures without fiducial markers across modalities and dimensionalities. This is especially true of minimally invasive abdominal surgical techniques, which often employ a monocular laparoscope, due to inherent limitations. Surgical scene reconstruction is a critical component towards accurate registrations needed for AR-guided surgery and other downstream AR applications such as remote assistance or surgical simulation. In this work, we utilize a state-of-the-art (SOTA) deep-learning-based visual simultaneous localization and mapping (vSLAM) algorithm to generate a dense 3D reconstruction with camera pose estimations and depth maps from video obtained with a monocular laparoscope. The proposed method can robustly reconstruct surgical scenes using real-time data and provide camera pose estimations without stereo or additional sensors, which increases its usability and is less intrusive. We also demonstrate a framework to evaluate current vSLAM algorithms on non-Lambertian, low-texture surfaces and explore using its outputs on downstream tasks. We expect these evaluation methods can be utilized for the continual refinement of newer algorithms for AR-guided surgery.
Affiliation(s)
- James Yu
- Center for Imaging and Surgical Innovation, University of Texas at Dallas, Richardson, TX
- Department of Radiology, University of Texas Southwestern Medical Center, Dallas, TX
- Department of Bioengineering, University of Texas at Dallas, Richardson, TX
- Kelden Pruitt
- Center for Imaging and Surgical Innovation, University of Texas at Dallas, Richardson, TX
- Department of Bioengineering, University of Texas at Dallas, Richardson, TX
- Nati Nawawithan
- Center for Imaging and Surgical Innovation, University of Texas at Dallas, Richardson, TX
- Department of Bioengineering, University of Texas at Dallas, Richardson, TX
- Brett A. Johnson
- Department of Urology, University of Texas Southwestern Medical Center, Dallas, TX
- Jeffrey Gahan
- Department of Urology, University of Texas Southwestern Medical Center, Dallas, TX
- Baowei Fei
- Center for Imaging and Surgical Innovation, University of Texas at Dallas, Richardson, TX
- Department of Radiology, University of Texas Southwestern Medical Center, Dallas, TX
- Department of Bioengineering, University of Texas at Dallas, Richardson, TX
21
Cartucho J, Weld A, Tukra S, Xu H, Matsuzaki H, Ishikawa T, Kwon M, Jang YE, Kim KJ, Lee G, Bai B, Kahrs LA, Boecking L, Allmendinger S, Müller L, Zhang Y, Jin Y, Bano S, Vasconcelos F, Reiter W, Hajek J, Silva B, Lima E, Vilaça JL, Queirós S, Giannarou S. SurgT challenge: Benchmark of soft-tissue trackers for robotic surgery. Med Image Anal 2024; 91:102985. [PMID: 37844472] [DOI: 10.1016/j.media.2023.102985]
Abstract
This paper introduces the "SurgT: Surgical Tracking" challenge, which was organized in conjunction with the 25th International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI 2022). There were two purposes for creating this challenge: (1) to establish the first standardized benchmark for the research community to assess soft-tissue trackers; and (2) to encourage the development of unsupervised deep learning methods, given the lack of annotated data in surgery. A dataset of 157 stereo endoscopic videos from 20 clinical cases, along with stereo camera calibration parameters, was provided. Participants were assigned the task of developing algorithms to track the movement of soft tissues, represented by bounding boxes, in stereo endoscopic videos. At the end of the challenge, the developed methods were assessed on a previously hidden test subset. This assessment uses benchmarking metrics that were purposely developed for this challenge to verify the efficacy of unsupervised deep learning algorithms in tracking soft tissue. The metric used for ranking the methods was the Expected Average Overlap (EAO) score, which measures the average overlap between a tracker's bounding boxes and the ground truth. The deep learning submission by ICVS-2Ai came first in the challenge with a superior EAO score of 0.617; this method employs ARFlow to estimate unsupervised dense optical flow from cropped images, using photometric and regularization losses. Jmees came second with an EAO of 0.583, using deep learning for surgical tool segmentation on top of a non-deep-learning baseline method, CSRT; CSRT by itself scores a similar EAO of 0.563. The results from this challenge show that non-deep-learning methods are currently still competitive. The dataset and benchmarking tool created for this challenge have been made publicly available at https://surgt.grand-challenge.org/. This challenge is expected to contribute to the development of autonomous robotic surgery and other digital surgical technologies.
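The ranking metric is built on per-frame bounding-box IoU averaged over a sequence. A simplified sketch of that core computation (the full EAO protocol additionally averages over sequences and handles tracker re-initialization):

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as (x, y, w, h)."""
    ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def average_overlap(pred_boxes, gt_boxes):
    """Average per-frame overlap between tracker output and ground truth."""
    return float(np.mean([iou(p, g) for p, g in zip(pred_boxes, gt_boxes)]))
```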
Collapse
Affiliation(s)
- João Cartucho
- The Hamlyn Centre for Robotic Surgery, Imperial College London, United Kingdom.
| | - Alistair Weld
- The Hamlyn Centre for Robotic Surgery, Imperial College London, United Kingdom
| | - Samyakh Tukra
- The Hamlyn Centre for Robotic Surgery, Imperial College London, United Kingdom
| | - Haozheng Xu
- The Hamlyn Centre for Robotic Surgery, Imperial College London, United Kingdom
| | | | | | - Minjun Kwon
- Electronics and Telecommunications Research Institute (ETRI), Daejeon, South Korea
| | - Yong Eun Jang
- Electronics and Telecommunications Research Institute (ETRI), Daejeon, South Korea
| | - Kwang-Ju Kim
- Electronics and Telecommunications Research Institute (ETRI), Daejeon, South Korea
| | - Gwang Lee
- Ajou University, Gyeonggi-do, South Korea
| | - Bizhe Bai
- Medical Computer Vision and Robotics Lab, University of Toronto, Canada
| | - Lueder A Kahrs
- Medical Computer Vision and Robotics Lab, University of Toronto, Canada
| | | | | | | | - Yitong Zhang
- Surgical Robot Vision, University College London, United Kingdom
| | - Yueming Jin
- Surgical Robot Vision, University College London, United Kingdom
| | - Sophia Bano
- Surgical Robot Vision, University College London, United Kingdom
| | | | | | | | - Bruno Silva
- Life and Health Sciences Research Institute (ICVS), School of Medicine, University of Minho, Braga, Portugal; ICVS/3B's - PT Government Associate Laboratory, Braga/Guimarães, Portugal; 2Ai - School of Technology, IPCA, Barcelos, Portugal
| | - Estevão Lima
- Life and Health Sciences Research Institute (ICVS), School of Medicine, University of Minho, Braga, Portugal; ICVS/3B's - PT Government Associate Laboratory, Braga/Guimarães, Portugal
| | - João L Vilaça
- 2Ai - School of Technology, IPCA, Barcelos, Portugal
| | - Sandro Queirós
- Life and Health Sciences Research Institute (ICVS), School of Medicine, University of Minho, Braga, Portugal; ICVS/3B's - PT Government Associate Laboratory, Braga/Guimarães, Portugal
| | - Stamatia Giannarou
- The Hamlyn Centre for Robotic Surgery, Imperial College London, United Kingdom
| |
Collapse
|
22
|
Liu S, Fan J, Zang L, Yang Y, Fu T, Song H, Wang Y, Yang J. Pose estimation via structure-depth information from monocular endoscopy images sequence. BIOMEDICAL OPTICS EXPRESS 2024; 15:460-478. [PMID: 38223180 PMCID: PMC10783895 DOI: 10.1364/boe.498262] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/16/2023] [Revised: 12/08/2023] [Accepted: 12/14/2023] [Indexed: 01/16/2024]
Abstract
Image-based endoscopy pose estimation has been shown to significantly improve the visualization and accuracy of minimally invasive surgery (MIS). This paper proposes a method for pose estimation based on structure-depth information from a monocular endoscopy image sequence. Firstly, the initial frame location is constrained using the image structure difference (ISD) network. Secondly, endoscopy image depth information is used to estimate the pose of sequence frames. Finally, adaptive boundary constraints are used to optimize pose estimation across continuous frames, resulting in more accurate intraoperative endoscopy pose estimation. Evaluations were conducted on publicly available datasets, with the pose estimation error reaching 1.43 mm on a bronchoscopy dataset and 3.64 mm on a colonoscopy dataset. These results meet the real-time requirements of various scenarios, demonstrating that the method generates reliable pose estimates for endoscopy images and has meaningful clinical applications. By enabling accurate localization of endoscopy images during surgery, it can assist physicians in performing safer and more effective procedures.
Collapse
Affiliation(s)
- Shiyuan Liu
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing 100081, China
- China Center for Information Industry Development, Beijing 100081, China
| | - Jingfan Fan
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing 100081, China
| | - Liugeng Zang
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing 100081, China
| | - Yun Yang
- Department of General Surgery, Beijing Friendship Hospital, Capital Medical University; National Clinical Research Center for Digestive Diseases, Beijing 100050, China
| | - Tianyu Fu
- Institute of Engineering Medicine, Beijing Institute of Technology, Beijing 100081, China
| | - Hong Song
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
| | - Yongtian Wang
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing 100081, China
| | - Jian Yang
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing 100081, China
| |
Collapse
|
23
|
Bordbar M, Helfroush MS, Danyali H, Ejtehadi F. Wireless capsule endoscopy multiclass classification using three-dimensional deep convolutional neural network model. Biomed Eng Online 2023; 22:124. [PMID: 38098015 PMCID: PMC10722702 DOI: 10.1186/s12938-023-01186-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/10/2023] [Accepted: 11/29/2023] [Indexed: 12/17/2023] Open
Abstract
BACKGROUND Wireless capsule endoscopy (WCE) is a patient-friendly and non-invasive technology that scans the whole gastrointestinal tract, including difficult-to-access regions like the small bowel. A major drawback of this technology is that the visual inspection of the large number of video frames produced during each examination makes the physician's diagnosis process tedious and prone to error. Several computer-aided diagnosis (CAD) systems, such as deep network models, have been developed for the automatic recognition of abnormalities in WCE frames. Nevertheless, most of these studies have focused only on spatial information within individual WCE frames, missing the crucial temporal information across consecutive frames. METHODS In this article, an automatic multiclass classification system based on a three-dimensional deep convolutional neural network (3D-CNN) is proposed, which utilizes spatiotemporal information to facilitate the WCE diagnosis process. The 3D-CNN model is fed with a series of sequential WCE frames, in contrast to the two-dimensional (2D) model, which treats frames as independent. Moreover, the proposed 3D deep model is compared with several pre-trained networks. The proposed models are trained and evaluated with WCE videos from 29 subjects (14,691 frames before augmentation). The performance advantages of 3D-CNN over 2D-CNN and pre-trained networks are verified in terms of sensitivity, specificity, and accuracy. RESULTS The 3D-CNN outperforms the 2D technique in all evaluation metrics (sensitivity: 98.92% vs. 98.05%, specificity: 99.50% vs. 86.94%, accuracy: 99.20% vs. 92.60%). CONCLUSION A novel 3D-CNN model for lesion detection in WCE frames is proposed in this study. The results indicate the superior performance of 3D-CNN over 2D-CNN and several well-known pre-trained classifier networks. The proposed 3D-CNN model uses the rich temporal information in adjacent frames, as well as spatial data, to provide an accurate and efficient model.
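The key architectural difference from a 2D classifier is that frames are stacked into a clip and convolved with 3D kernels, so features mix spatial and temporal information. The paper's exact layer configuration is not given in the abstract, so the following is only a minimal PyTorch sketch of the idea (channel counts, clip length, and the three-class output are assumptions):

```python
import torch
import torch.nn as nn

class WCE3DCNN(nn.Module):
    """Minimal 3D-CNN: T consecutive WCE frames form a (C, T, H, W) clip,
    so each convolution sees spatial AND temporal neighbourhoods."""
    def __init__(self, num_classes=3, in_channels=3):  # num_classes is illustrative
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(in_channels, 16, kernel_size=3, padding=1),  # spatiotemporal kernel
            nn.ReLU(inplace=True),
            nn.MaxPool3d((1, 2, 2)),           # pool space only, keep temporal depth
            nn.Conv3d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d(1),           # global spatiotemporal pooling
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, clips):                  # clips: (B, C, T, H, W)
        x = self.features(clips).flatten(1)
        return self.classifier(x)

logits = WCE3DCNN()(torch.randn(2, 3, 8, 112, 112))  # two 8-frame clips
```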
Collapse
Affiliation(s)
- Mehrdokht Bordbar
- Department of Electrical Engineering, Shiraz University of Technology, Shiraz, Iran
| | | | - Habibollah Danyali
- Department of Electrical Engineering, Shiraz University of Technology, Shiraz, Iran
| | - Fardad Ejtehadi
- Department of Internal Medicine, Gastroenterohepatology Research Center, School of Medicine, Shiraz University of Medical Sciences, Shiraz, Iran
| |
Collapse
|
24
|
Daher R, Vasconcelos F, Stoyanov D. A Temporal Learning Approach to Inpainting Endoscopic Specularities and Its Effect on Image Correspondence. Med Image Anal 2023; 90:102994. [PMID: 37812856 PMCID: PMC10958122 DOI: 10.1016/j.media.2023.102994] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/04/2021] [Revised: 08/31/2023] [Accepted: 10/02/2023] [Indexed: 10/11/2023]
Abstract
Video streams are utilised to guide minimally-invasive surgery and diagnosis in a wide range of procedures, and many computer-assisted techniques have been developed to automatically analyse them. These approaches can provide additional information to the surgeon, such as lesion detection, instrument navigation, or 3D modelling of anatomical shape. However, the image features necessary to recognise these patterns are not always reliably detected due to the presence of irregular light patterns such as specular highlight reflections. In this paper, we aim to remove specular highlights from endoscopic videos using machine learning. We propose using a temporal generative adversarial network (GAN) to inpaint the hidden anatomy under specularities, inferring its appearance both spatially and from neighbouring frames in which the specularities are not present in the same location. This is achieved using in-vivo data from gastric endoscopy (Hyper Kvasir) in a fully unsupervised manner that relies on the automatic detection of specular highlights. System evaluations show significant improvements over other methods through direct comparison and ablation studies that depict the importance of the network's temporal and transfer learning components. The generalisability of our system to different surgical setups and procedures was also evaluated qualitatively on in-vivo data of gastric endoscopy and ex-vivo porcine data (SERV-CT, SCARED). We also assess the effect of our method, in comparison to other methods, on computer vision tasks that underpin 3D reconstruction and camera motion estimation, namely stereo disparity, optical flow, and sparse point feature matching. These are evaluated quantitatively and qualitatively, and the results of this novel comprehensive analysis show a positive effect of our specular inpainting method on these tasks. Our code and dataset are made available at https://github.com/endomapper/Endo-STTN.
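The pipeline depends on automatically detecting specular highlights before the temporal GAN inpaints them. A common heuristic (not necessarily the authors' detector) flags near-white pixels, i.e. low saturation and high value in HSV. The sketch below uses that heuristic with OpenCV, with the thresholds and file path as assumptions, and single-frame Telea inpainting standing in only as a baseline for the GAN:

```python
import cv2
import numpy as np

def specular_mask(bgr, sat_max=60, val_min=230, dilate_px=3):
    """Heuristic specular-highlight detector: highlights appear near-white,
    i.e. low saturation and high value in HSV. Returns a binary mask
    usable as an inpainting region. Thresholds are illustrative."""
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    mask = ((hsv[..., 1] < sat_max) & (hsv[..., 2] > val_min)).astype(np.uint8) * 255
    kernel = np.ones((dilate_px, dilate_px), np.uint8)
    return cv2.dilate(mask, kernel)  # grow mask to cover highlight borders

frame = cv2.imread("frame.png")                 # placeholder path
mask = specular_mask(frame)
baseline = cv2.inpaint(frame, mask, 5, cv2.INPAINT_TELEA)  # single-frame baseline
```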
Collapse
Affiliation(s)
- Rema Daher
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS), University College London, Gower Street, London, WC1E 6BT, UK.
| | - Francisco Vasconcelos
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS), University College London, Gower Street, London, WC1E 6BT, UK.
| | - Danail Stoyanov
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS), University College London, Gower Street, London, WC1E 6BT, UK.
| |
Collapse
|
25
|
Bobrow TL, Golhar M, Vijayan R, Akshintala VS, Garcia JR, Durr NJ. Colonoscopy 3D video dataset with paired depth from 2D-3D registration. Med Image Anal 2023; 90:102956. [PMID: 37713764 PMCID: PMC10591895 DOI: 10.1016/j.media.2023.102956] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/16/2022] [Revised: 06/29/2023] [Accepted: 09/04/2023] [Indexed: 09/17/2023]
Abstract
Screening colonoscopy is an important clinical application for several 3D computer vision techniques, including depth estimation, surface reconstruction, and missing region detection. However, the development, evaluation, and comparison of these techniques in real colonoscopy videos remain largely qualitative due to the difficulty of acquiring ground truth data. In this work, we present a Colonoscopy 3D Video Dataset (C3VD) acquired with a high-definition clinical colonoscope and high-fidelity colon models for benchmarking computer vision methods in colonoscopy. We introduce a novel multimodal 2D-3D registration technique to register optical video sequences with ground truth rendered views of a known 3D model. The different modalities are registered by transforming optical images to depth maps with a Generative Adversarial Network and aligning edge features with an evolutionary optimizer. This registration method achieves an average translation error of 0.321 millimeters and an average rotation error of 0.159 degrees in simulation experiments where error-free ground truth is available. The method also leverages video information, improving registration accuracy by 55.6% for translation and 60.4% for rotation compared to single-frame registration. Twenty-two short video sequences were registered to generate 10,015 total frames with paired ground truth depth, surface normals, optical flow, occlusion, six-degree-of-freedom pose, coverage maps, and 3D models. The dataset also includes screening videos acquired by a gastroenterologist with paired ground truth pose and 3D surface models. The dataset and registration source code are available at https://durr.jhu.edu/C3VD.
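The reported accuracy pairs a translation error in millimeters with a rotation error in degrees. A standard way to compute both from an estimated and a ground-truth pose, assumed here rather than taken from the paper's code, uses the Euclidean distance between translations and the geodesic angle of the relative rotation:

```python
import numpy as np

def registration_errors(R_est, t_est, R_gt, t_gt):
    """Translation error (Euclidean, in the units of t) and rotation error
    (geodesic angle, degrees) between estimated and ground-truth poses."""
    t_err = np.linalg.norm(t_est - t_gt)
    R_delta = R_est.T @ R_gt                    # relative rotation
    cos_theta = np.clip((np.trace(R_delta) - 1.0) / 2.0, -1.0, 1.0)
    r_err_deg = np.degrees(np.arccos(cos_theta))
    return t_err, r_err_deg
```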
Collapse
Affiliation(s)
- Taylor L Bobrow
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Mayank Golhar
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Rohan Vijayan
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Venkata S Akshintala
- Division of Gastroenterology and Hepatology, Johns Hopkins Medicine, Baltimore, MD 21287, USA
| | - Juan R Garcia
- Department of Art as Applied to Medicine, Johns Hopkins School of Medicine, Baltimore, MD 21287, USA
| | - Nicholas J Durr
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA.
| |
Collapse
|
26
|
Azagra P, Sostres C, Ferrández Á, Riazuelo L, Tomasini C, Barbed OL, Morlana J, Recasens D, Batlle VM, Gómez-Rodríguez JJ, Elvira R, López J, Oriol C, Civera J, Tardós JD, Murillo AC, Lanas A, Montiel JMM. Endomapper dataset of complete calibrated endoscopy procedures. Sci Data 2023; 10:671. [PMID: 37789003 PMCID: PMC10547713 DOI: 10.1038/s41597-023-02564-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/24/2022] [Accepted: 09/14/2023] [Indexed: 10/05/2023] Open
Abstract
Computer-assisted systems are becoming broadly used in medicine. In endoscopy, most research focuses on the automatic detection of polyps or other pathologies, but localization and navigation of the endoscope are performed entirely manually by physicians. To broaden this research and bring spatial Artificial Intelligence to endoscopies, data from complete procedures are needed. This paper introduces the Endomapper dataset, the first collection of complete endoscopy sequences acquired during regular medical practice, making secondary use of medical data. Its main purpose is to facilitate the development and evaluation of Visual Simultaneous Localization and Mapping (VSLAM) methods on real endoscopy data. The dataset contains more than 24 hours of video. It is the first endoscopic dataset that includes endoscope calibration as well as the original calibration videos. Meta-data and annotations associated with the dataset range from anatomical landmarks, procedure labels, segmentations, and reconstructions to simulated sequences with ground truth and same-patient procedures. The software used in this paper is publicly available.
Collapse
Affiliation(s)
- Pablo Azagra
- Instituto de Investigación en Ingeniería de Aragón (I3A), Universidad de Zaragoza, Zaragoza, Spain.
| | - Carlos Sostres
- Digestive Disease Service, Hospital Clínico Universitario Lozano Blesa, Zaragoza, Spain
- Department of Medicine, Universidad de Zaragoza, Zaragoza, Spain
- Instituto de Investigación Sanitaria Aragón (IIS Aragón), Zaragoza, Spain
- Centro de Investigación Biomédica en Red, Enfermedades Hepáticas y Digestivas (CIBEREHD), Madrid, Spain
| | - Ángel Ferrández
- Digestive Disease Service, Hospital Clínico Universitario Lozano Blesa, Zaragoza, Spain
- Department of Medicine, Universidad de Zaragoza, Zaragoza, Spain
- Instituto de Investigación Sanitaria Aragón (IIS Aragón), Zaragoza, Spain
- Centro de Investigación Biomédica en Red, Enfermedades Hepáticas y Digestivas (CIBEREHD), Madrid, Spain
| | - Luis Riazuelo
- Instituto de Investigación en Ingeniería de Aragón (I3A), Universidad de Zaragoza, Zaragoza, Spain
| | - Clara Tomasini
- Instituto de Investigación en Ingeniería de Aragón (I3A), Universidad de Zaragoza, Zaragoza, Spain
| | - O León Barbed
- Instituto de Investigación en Ingeniería de Aragón (I3A), Universidad de Zaragoza, Zaragoza, Spain
| | - Javier Morlana
- Instituto de Investigación en Ingeniería de Aragón (I3A), Universidad de Zaragoza, Zaragoza, Spain
| | - David Recasens
- Instituto de Investigación en Ingeniería de Aragón (I3A), Universidad de Zaragoza, Zaragoza, Spain
| | - Víctor M Batlle
- Instituto de Investigación en Ingeniería de Aragón (I3A), Universidad de Zaragoza, Zaragoza, Spain
| | - Juan J Gómez-Rodríguez
- Instituto de Investigación en Ingeniería de Aragón (I3A), Universidad de Zaragoza, Zaragoza, Spain
| | - Richard Elvira
- Instituto de Investigación en Ingeniería de Aragón (I3A), Universidad de Zaragoza, Zaragoza, Spain
| | - Julia López
- Digestive Disease Service, Hospital Clínico Universitario Lozano Blesa, Zaragoza, Spain
| | - Cristina Oriol
- Instituto de Investigación en Ingeniería de Aragón (I3A), Universidad de Zaragoza, Zaragoza, Spain
| | - Javier Civera
- Instituto de Investigación en Ingeniería de Aragón (I3A), Universidad de Zaragoza, Zaragoza, Spain
| | - Juan D Tardós
- Instituto de Investigación en Ingeniería de Aragón (I3A), Universidad de Zaragoza, Zaragoza, Spain
| | - Ana C Murillo
- Instituto de Investigación en Ingeniería de Aragón (I3A), Universidad de Zaragoza, Zaragoza, Spain
| | - Angel Lanas
- Digestive Disease Service, Hospital Clínico Universitario Lozano Blesa, Zaragoza, Spain
- Department of Medicine, Universidad de Zaragoza, Zaragoza, Spain
- Instituto de Investigación Sanitaria Aragón (IIS Aragón), Zaragoza, Spain
- Centro de Investigación Biomédica en Red, Enfermedades Hepáticas y Digestivas (CIBEREHD), Madrid, Spain
| | - José M M Montiel
- Instituto de Investigación en Ingeniería de Aragón (I3A), Universidad de Zaragoza, Zaragoza, Spain
| |
Collapse
|
27
|
Wang X, Nie Y, Ren W, Wei M, Zhang J. Multi-scale, multi-dimensional binocular endoscopic image depth estimation network. Comput Biol Med 2023; 164:107305. [PMID: 37597409 DOI: 10.1016/j.compbiomed.2023.107305] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/26/2023] [Revised: 07/07/2023] [Accepted: 07/28/2023] [Indexed: 08/21/2023]
Abstract
During invasive surgery, the use of deep learning techniques to acquire depth information from lesion sites in real time is hindered by the lack of endoscopic environment datasets. This work aims to develop a high-accuracy three-dimensional (3D) simulation model for generating image datasets and acquiring depth information in real time. Here, we propose an end-to-end multi-scale supervisory depth estimation network (MMDENet) model for the depth estimation of pairs of binocular images. The proposed MMDENet highlights a multi-scale feature extraction module incorporating contextual information to enhance the correspondence precision of poorly exposed regions. A multi-dimensional information-guidance refinement module is also proposed to refine the initial coarse disparity map. Statistical experimentation demonstrated a 3.14% reduction in endpoint error compared to state-of-the-art methods, with a processing speed of approximately 30 fps that satisfies the requirements of real-time applications. To validate the performance of the trained MMDENet on actual endoscopic images, we conducted both qualitative and quantitative analyses, achieving a precision of 93.38%, which holds great promise for applications in surgical navigation.
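The network's output is a disparity map; turning it into metric depth uses the standard stereo relation depth = f · B / d, with the focal length f in pixels and the baseline B between the two endoscope cameras. A minimal sketch of that conversion, with parameter names as assumptions:

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_mm, eps=1e-6):
    """Standard pinhole-stereo relation: depth = f * B / d.
    focal_px: focal length in pixels; baseline_mm: camera separation;
    disparity: per-pixel disparity map in pixels (clamped to avoid /0)."""
    return focal_px * baseline_mm / np.maximum(disparity, eps)
```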
Collapse
Affiliation(s)
- Xiongzhi Wang
- School of Future Technology, University of Chinese Academy of Sciences, Beijing 100039, China; School of Aerospace Science And Technology, Xidian University, Xian 710071, China.
| | - Yunfeng Nie
- Brussel Photonics, Department of Applied Physics and Photonics, Vrije Universiteit Brussel and Flanders Make, 1050 Brussels, Belgium
| | - Wenqi Ren
- State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences, Beijing, 100093, China
| | - Min Wei
- Department of Orthopedics, the Fourth Medical Center, Chinese PLA General Hospital, Beijing 100853, China
| | - Jingang Zhang
- School of Future Technology, University of Chinese Academy of Sciences, Beijing 100039, China; School of Aerospace Science And Technology, Xidian University, Xian 710071, China.
| |
Collapse
|
28
|
Yu X, Zhao J, Wu H, Wang A. A Novel Evaluation Method for SLAM-Based 3D Reconstruction of Lumen Panoramas. SENSORS (BASEL, SWITZERLAND) 2023; 23:7188. [PMID: 37631725 PMCID: PMC10459170 DOI: 10.3390/s23167188] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/21/2023] [Revised: 08/09/2023] [Accepted: 08/10/2023] [Indexed: 08/27/2023]
Abstract
Laparoscopy is employed in conventional minimally invasive surgery to inspect internal cavities by viewing two-dimensional images on a monitor. This method has a limited field of view and provides insufficient information for surgeons, increasing surgical complexity. Utilizing simultaneous localization and mapping (SLAM) technology to reconstruct laparoscopic scenes can offer more comprehensive and intuitive visual feedback. Moreover, the precision of the reconstructed models is a crucial factor for further applications of surgical assistance systems. However, challenges such as data scarcity and scale uncertainty hinder effective assessment of the accuracy of endoscopic monocular SLAM reconstructions. Therefore, this paper proposes a technique that incorporates existing knowledge from calibration objects to supplement metric information and resolve scale ambiguity issues, and it quantifies the endoscopic reconstruction accuracy based on local alignment metrics. The experimental results demonstrate that the reconstructed models restore realistic scales and enable error analysis for laparoscopic SLAM reconstruction systems. This suggests that for the evaluation of monocular SLAM three-dimensional (3D) reconstruction accuracy in minimally invasive surgery scenarios, our proposed scheme for recovering scale factors is viable, and our evaluation outcomes can serve as criteria for measuring reconstruction precision.
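The core of the proposed scale recovery is simple to state: a calibration object of known physical size appears in the reconstruction, and the ratio between its known length and its reconstructed length gives a global scale factor for the whole model. A minimal sketch under that reading (function and argument names are illustrative, not the paper's API):

```python
import numpy as np

def recover_scale(points, p_idx, q_idx, known_length_mm):
    """Resolve monocular-SLAM scale ambiguity from a calibration object:
    points: (N, 3) reconstructed point cloud; p_idx, q_idx index the two
    endpoints of a segment whose true length (mm) is known."""
    measured = np.linalg.norm(points[p_idx] - points[q_idx])
    scale = known_length_mm / measured
    return scale * points  # metrically scaled point cloud
```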
Collapse
Affiliation(s)
- Xiaoyu Yu
- College of Electron and Information, University of Electronic Science and Technology of China, Zhongshan Institute, Zhongshan 528402, China;
- Heilongjiang Province Key Laboratory of Laser Spectroscopy Technology and Application, Harbin University of Science and Technology, Harbin 150080, China (A.W.)
| | - Jianbo Zhao
- Heilongjiang Province Key Laboratory of Laser Spectroscopy Technology and Application, Harbin University of Science and Technology, Harbin 150080, China (A.W.)
| | - Haibin Wu
- Heilongjiang Province Key Laboratory of Laser Spectroscopy Technology and Application, Harbin University of Science and Technology, Harbin 150080, China (A.W.)
| | - Aili Wang
- Heilongjiang Province Key Laboratory of Laser Spectroscopy Technology and Application, Harbin University of Science and Technology, Harbin 150080, China (A.W.)
| |
Collapse
|
29
|
Liu Y, Zuo S. Self-supervised monocular depth estimation for gastrointestinal endoscopy. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2023; 238:107619. [PMID: 37235969 DOI: 10.1016/j.cmpb.2023.107619] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/07/2022] [Revised: 04/26/2023] [Accepted: 05/18/2023] [Indexed: 05/28/2023]
Abstract
BACKGROUND AND OBJECTIVE Gastrointestinal (GI) endoscopy represents a promising tool for GI cancer screening. However, the limited field of view and the uneven skills of endoscopists make it difficult to accurately identify polyps and follow up on precancerous lesions under endoscopy. Estimating depth from GI endoscopic sequences is essential for a series of AI-assisted surgical techniques. Nonetheless, depth estimation for GI endoscopy is a challenging task due to the particularity of the environment and the limitation of datasets. In this paper, we propose a self-supervised monocular depth estimation method for GI endoscopy. METHODS A depth estimation network and a camera ego-motion estimation network are first constructed to obtain the depth and pose information of the sequence, respectively. The model is then trained in a self-supervised manner by using the multi-scale structural similarity with L1 norm (MS-SSIM+L1) loss between the target frame and the reconstructed image as part of the training loss. The MS-SSIM+L1 loss is good at preserving high-frequency information and maintains invariance to brightness and color. Our model consists of a U-shaped convolutional network with a dual-attention mechanism, which is beneficial for capturing multi-scale contextual information and greatly improves the accuracy of depth estimation. We evaluated our method qualitatively and quantitatively against different state-of-the-art methods. RESULTS AND CONCLUSIONS The experimental results show that our method has superior generality, achieving lower error metrics and higher accuracy metrics on both the UCL dataset and the EndoSLAM dataset. The proposed method has also been validated on clinical GI endoscopy, demonstrating the potential clinical value of the model.
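To make the photometric supervision concrete, the sketch below blends a single-scale SSIM term with an L1 term; the paper uses the multi-scale variant, which repeats the SSIM computation over a downsampled pyramid. The 0.85 blending weight follows a common convention in self-supervised depth work and is an assumption here, as is the uniform averaging window:

```python
import torch
import torch.nn.functional as F

def ssim(x, y, window=3, c1=0.01 ** 2, c2=0.03 ** 2):
    """Single-scale SSIM map with a uniform window.
    x, y: (B, C, H, W) images in [0, 1]."""
    pad = window // 2
    mu_x = F.avg_pool2d(x, window, 1, padding=pad)
    mu_y = F.avg_pool2d(y, window, 1, padding=pad)
    var_x = F.avg_pool2d(x * x, window, 1, padding=pad) - mu_x ** 2
    var_y = F.avg_pool2d(y * y, window, 1, padding=pad) - mu_y ** 2
    cov = F.avg_pool2d(x * y, window, 1, padding=pad) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return (num / den).clamp(0, 1)

def ssim_l1_loss(recon, target, alpha=0.85):
    """Blend of structural and absolute-intensity error, as used in
    photometric reconstruction losses."""
    structural = 1 - ssim(recon, target)
    return (alpha * structural + (1 - alpha) * (recon - target).abs()).mean()
```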
Collapse
Affiliation(s)
- Yuying Liu
- Key Laboratory of Mechanism Theory and Equipment Design of Ministry of Education, Tianjin University, Tianjin, China
| | - Siyang Zuo
- Key Laboratory of Mechanism Theory and Equipment Design of Ministry of Education, Tianjin University, Tianjin, China.
| |
Collapse
|
30
|
Hayoz M, Hahne C, Gallardo M, Candinas D, Kurmann T, Allan M, Sznitman R. Learning how to robustly estimate camera pose in endoscopic videos. Int J Comput Assist Radiol Surg 2023:10.1007/s11548-023-02919-w. [PMID: 37184768 PMCID: PMC10329609 DOI: 10.1007/s11548-023-02919-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/06/2023] [Accepted: 04/13/2023] [Indexed: 05/16/2023]
Abstract
PURPOSE Surgical scene understanding plays a critical role in the technology stack of tomorrow's intervention-assisting systems for endoscopic surgery. Tracking the endoscope pose is a key component of this, but it remains challenging due to illumination conditions, deforming tissues, and the breathing motion of organs. METHOD We propose a solution for stereo endoscopes that estimates depth and optical flow to minimize two geometric losses for camera pose estimation. Most importantly, we introduce two learned adaptive per-pixel weight mappings that balance contributions according to the input image content. To do so, we train a Deep Declarative Network to take advantage of the expressiveness of deep learning and the robustness of a novel geometric-based optimization approach. We validate our approach on the publicly available SCARED dataset and introduce a new in vivo dataset, StereoMIS, which includes a wider spectrum of typically observed surgical settings. RESULTS Our method outperforms state-of-the-art methods on average and, more importantly, in difficult scenarios where tissue deformations and breathing motion are visible. We observed that our proposed weight mappings attenuate the contribution of pixels in ambiguous regions of the images, such as deforming tissues. CONCLUSION We demonstrate the effectiveness of our solution for robustly estimating camera pose in challenging endoscopic surgical scenes. Our contributions can be used to improve related tasks such as simultaneous localization and mapping (SLAM) and 3D reconstruction, thereby advancing surgical scene understanding in minimally invasive surgery.
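The weighting idea can be illustrated apart from the Deep Declarative Network machinery: a learned per-pixel map down-weights ambiguous pixels (such as deforming tissue) before their residuals enter the pose objective. The following is only a sketch of that idea; the sigmoid squashing and mean-normalization are assumptions, not the paper's formulation:

```python
import torch

def weighted_residual_loss(residuals, weight_logits):
    """Per-pixel weighted loss: a learned map attenuates the contribution
    of ambiguous pixels to the geometric objective.
    residuals, weight_logits: (B, 1, H, W)."""
    w = torch.sigmoid(weight_logits)                       # weights in (0, 1)
    w = w / (w.mean(dim=(2, 3), keepdim=True) + 1e-8)      # keep average influence ~1
    return (w * residuals.abs()).mean()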
Collapse
Affiliation(s)
- Michel Hayoz
- ARTORG Center, University of Bern, Bern, Switzerland.
| | | | | | - Daniel Candinas
- Department of Visceral Surgery and Medicine, Inselspital, Bern, Switzerland
| | | | | | | |
Collapse
|
31
|
Horovistiz A, Oliveira M, Araújo H. Computer vision-based solutions to overcome the limitations of wireless capsule endoscopy. J Med Eng Technol 2023; 47:242-261. [PMID: 38231042 DOI: 10.1080/03091902.2024.2302025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/09/2022] [Accepted: 12/28/2023] [Indexed: 01/18/2024]
Abstract
Endoscopic investigation plays a critical role in the diagnosis of gastrointestinal (GI) diseases. Since 2001, Wireless Capsule Endoscopy (WCE) has been available for small bowel exploration and is in continuous development. Over the last decade, WCE has achieved impressive improvements in areas such as miniaturisation, image quality and battery life. As a result, WCE is currently a very useful alternative to wired enteroscopy in the investigation of various small bowel abnormalities and has the potential to become the leading screening technique for the entire gastrointestinal tract. However, commercial solutions still have several limitations, namely incomplete examination and limited diagnostic capacity. These deficiencies are related to technical issues, such as image quality, motion estimation and power consumption management. Computational methods, based on image processing and analysis, can help to overcome these challenges and reduce both the time required by reviewers and human interpretation errors. Research groups have proposed a series of methods including algorithms for locating the capsule or lesion, assessing intestinal motility, and improving image quality. In this work, we provide a critical review of computational vision-based methods for WCE image analysis aimed at overcoming the technological challenges of capsules. This article also reviews several representative public datasets used to evaluate the performance of WCE techniques and methods. Finally, some promising computational methods based on the analysis of multiple-camera endoscopic images are presented.
Collapse
Affiliation(s)
- Ana Horovistiz
- Institute of Systems and Robotics, University of Coimbra, Coimbra, Portugal
| | - Marina Oliveira
- Institute of Systems and Robotics, University of Coimbra, Coimbra, Portugal
- Department of Electrical and Computer Engineering (DEEC), Faculty of Sciences and Technology, University of Coimbra, Coimbra, Portugal
| | - Helder Araújo
- Institute of Systems and Robotics, University of Coimbra, Coimbra, Portugal
- Department of Electrical and Computer Engineering (DEEC), Faculty of Sciences and Technology, University of Coimbra, Coimbra, Portugal
| |
Collapse
|
32
|
Semantic Segmentation of Digestive Abnormalities from WCE Images by Using AttResU-Net Architecture. Life (Basel) 2023; 13:life13030719. [PMID: 36983874 PMCID: PMC10051085 DOI: 10.3390/life13030719] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/22/2022] [Revised: 02/04/2023] [Accepted: 03/03/2023] [Indexed: 03/09/2023] Open
Abstract
Colorectal cancer is one of the most common malignancies and the leading cause of cancer death worldwide. Wireless capsule endoscopy is currently the most frequent method for detecting precancerous digestive diseases. Thus, precise and early polyp segmentation has significant clinical value in reducing the probability of cancer development. However, manual examination is a time-consuming and tedious task for doctors. Therefore, scientists have proposed many computational techniques to automatically segment anomalies from endoscopic images. In this paper, we present an end-to-end 2D attention residual U-Net architecture (AttResU-Net), which concurrently integrates the attention mechanism and residual units into U-Net to further enhance polyp and bleeding segmentation performance. To suppress irrelevant regions in an input image while emphasizing salient features, AttResU-Net inserts a sequence of attention units among the related downsampling and upsampling steps. Meanwhile, the residual blocks propagate information across layers, allowing the construction of a deeper neural network that overcomes the vanishing gradient issue in each encoder; this improves channel interdependencies while lowering the computational cost. Multiple publicly available datasets were employed in this work to evaluate and verify the proposed method. Our highest-performing model, AttResU-Net, achieved an accuracy of 99.16%, a Dice coefficient of 94.91%, and a Jaccard index of 90.32% on the MICCAI 2017 WCE dataset. The experimental findings show that the proposed AttResU-Net outperforms its baselines and provides performance comparable to existing polyp segmentation approaches.
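The attention units described here follow the additive-attention pattern popularized by Attention U-Net; whether AttResU-Net's gate matches it exactly is not stated in the abstract, so the sketch below shows only the generic form: a gating signal from the decoder scores each skip-connection pixel, and the resulting mask suppresses background before the features are merged.

```python
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    """Generic additive attention gate (Attention U-Net style): the decoder's
    gating signal g highlights salient skip-connection features x_skip and
    suppresses irrelevant regions before concatenation."""
    def __init__(self, ch_skip, ch_gate, ch_inter):
        super().__init__()
        self.w_x = nn.Conv2d(ch_skip, ch_inter, kernel_size=1)
        self.w_g = nn.Conv2d(ch_gate, ch_inter, kernel_size=1)
        self.psi = nn.Conv2d(ch_inter, 1, kernel_size=1)

    def forward(self, x_skip, g):  # assumes x_skip and g share spatial size
        a = torch.sigmoid(self.psi(torch.relu(self.w_x(x_skip) + self.w_g(g))))
        return x_skip * a          # attended skip features, ready to merge

gate = AttentionGate(ch_skip=64, ch_gate=128, ch_inter=32)
out = gate(torch.randn(1, 64, 56, 56), torch.randn(1, 128, 56, 56))
```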
Collapse
|
33
|
Ali S. Where do we stand in AI for endoscopic image analysis? Deciphering gaps and future directions. NPJ Digit Med 2022; 5:184. [PMID: 36539473 PMCID: PMC9767933 DOI: 10.1038/s41746-022-00733-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/04/2022] [Accepted: 11/29/2022] [Indexed: 12/24/2022] Open
Abstract
Recent developments in deep learning have enabled data-driven algorithms that can reach human-level performance and beyond. The development and deployment of medical image analysis methods face several challenges, including data heterogeneity due to population diversity and different device manufacturers. In addition, more input from experts is required for a reliable method development process. While the exponential growth in clinical imaging data has enabled deep learning to flourish, data heterogeneity, multi-modality, and rare or inconspicuous disease cases still need to be explored. Because endoscopy is highly operator-dependent, with grim clinical outcomes in some disease cases, reliable and accurate automated system guidance can improve patient care. Most existing methods need to be more generalisable to unseen target data, patient population variability, and variable disease appearances. This paper reviews recent work on endoscopic image analysis with artificial intelligence (AI) and emphasises the current unmatched needs in this field. Finally, it outlines future directions for clinically relevant, complex AI solutions to improve patient outcomes.
Collapse
Affiliation(s)
- Sharib Ali
- School of Computing, University of Leeds, LS2 9JT, Leeds, UK.
| |
Collapse
|
34
|
Gu Y, Gu C, Yang J, Sun J, Yang GZ. Vision-Kinematics Interaction for Robotic-Assisted Bronchoscopy Navigation. IEEE TRANSACTIONS ON MEDICAL IMAGING 2022; 41:3600-3610. [PMID: 35839186 DOI: 10.1109/tmi.2022.3191317] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/15/2023]
Abstract
Endobronchial intervention is increasingly used as a minimally invasive means for the treatment of pulmonary diseases. To acquire the position of the bronchoscope, vision-based localization approaches are clinically preferable but are sensitive to visual variations. The static nature of pre-operative planning makes mapping of intraoperative anatomical features challenging for learning-based methods using visual features alone. In this work, we propose a robust navigation framework based on Vision-Kinematics Interaction (VKI) for monocular bronchoscopic videos. To address the visual imbalance between the virtual and real views of bronchoscopy images, a Visual Similarity Network (VSN) is proposed to extract domain-invariant features that represent the lumen structure from endoscopic views, as well as domain-specific features that characterize the surface texture and visual artefacts. To improve the robustness of online camera pose estimation, we also introduce a Kinematic Refinement Network (KRN) that allows progressive refinement of camera pose estimates based on network predictions and robot control signals. The accuracy of camera localization is validated on phantom and porcine lung datasets from a robotically controlled endobronchial intervention system, with both quantitative and qualitative results demonstrating the performance of the techniques. The results show that the features extracted by the proposed method preserve the structural information of small airways in the presence of large visual variations, along with much-improved camera localization accuracy. The absolute trajectory errors (ATE) on the phantom and porcine data are 8.01 mm and 8.62 mm, respectively.
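ATE, the headline metric here and in several of the tracking papers above, is the RMS distance between estimated and ground-truth camera positions after the two trajectories are aligned. The sketch below uses a simple mean-offset alignment for brevity; full evaluations typically apply a rigid (e.g. Umeyama) alignment first:

```python
import numpy as np

def absolute_trajectory_error(est_xyz, gt_xyz):
    """RMS Absolute Trajectory Error between two (N, 3) position sequences.
    Only the mean offset is removed here; a complete evaluation would also
    align rotation (and possibly scale) before computing the RMS."""
    est = est_xyz - est_xyz.mean(axis=0)
    gt = gt_xyz - gt_xyz.mean(axis=0)
    return float(np.sqrt(np.mean(np.sum((est - gt) ** 2, axis=1))))
```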
Collapse
|
35
|
Turan M, Durmus F. UC-NfNet: Deep learning-enabled assessment of ulcerative colitis from colonoscopy images. Med Image Anal 2022; 82:102587. [DOI: 10.1016/j.media.2022.102587] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/16/2022] [Revised: 07/12/2022] [Accepted: 08/17/2022] [Indexed: 10/31/2022]
|
36
|
Bagadthey D, Prabhu S, Khan SS, Fredrick DT, Boominathan V, Veeraraghavan A, Mitra K. FlatNet3D: intensity and absolute depth from single-shot lensless capture. JOURNAL OF THE OPTICAL SOCIETY OF AMERICA. A, OPTICS, IMAGE SCIENCE, AND VISION 2022; 39:1903-1912. [PMID: 36215563 DOI: 10.1364/josaa.466286] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/08/2022] [Accepted: 09/14/2022] [Indexed: 06/16/2023]
Abstract
Lensless cameras are ultra-thin imaging systems that replace the lens with a thin passive optical mask and computation. Passive mask-based lensless cameras encode depth information in their measurements over a certain depth range. Early works have shown that this encoded depth can be used to perform 3D reconstruction of close-range scenes. However, these approaches to 3D reconstruction are typically optimization-based and require strong hand-crafted priors and hundreds of iterations to reconstruct. Moreover, the reconstructions suffer from low resolution, noise, and artifacts. In this work, we propose FlatNet3D, a feed-forward deep network that can estimate both depth and intensity from a single lensless capture. FlatNet3D is an end-to-end trainable deep network that directly reconstructs depth and intensity from a lensless measurement using an efficient physics-based 3D mapping stage and a fully convolutional network. Our algorithm is fast and produces high-quality results, which we validate using both simulated and real scenes captured using PhlatCam.
Collapse
|
37
|
A Robust Deep Model for Classification of Peptic Ulcer and Other Digestive Tract Disorders Using Endoscopic Images. Biomedicines 2022; 10:biomedicines10092195. [PMID: 36140296 PMCID: PMC9496137 DOI: 10.3390/biomedicines10092195] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/14/2022] [Revised: 08/23/2022] [Accepted: 08/24/2022] [Indexed: 11/17/2022] Open
Abstract
Accurate patient disease classification and detection through deep-learning (DL) models are increasingly contributing to the area of biomedical imaging. The most frequent gastrointestinal (GI) tract ailments are peptic ulcers and stomach cancer. Conventional endoscopy is a painful and hectic procedure for the patient, while Wireless Capsule Endoscopy (WCE) is a useful technology for diagnosing GI problems and performing painless gut imaging. However, it remains a challenge to investigate the thousands of images captured during the WCE procedure accurately and efficiently, because existing deep models do not achieve sufficient accuracy on WCE image analysis. To prevent emergency conditions among patients, we therefore need an efficient and accurate DL model for real-time analysis. In this study, we propose a reliable and efficient approach for classifying GI tract abnormalities using WCE images by applying a deep Convolutional Neural Network (CNN). For this purpose, we propose a custom CNN architecture named GI Disease-Detection Network (GIDD-Net), designed from scratch with relatively few parameters, to detect GI tract disorders more accurately and efficiently at a low computational cost. Moreover, our model successfully distinguishes GI disorders by visualizing class activation patterns in the stomach and bowel as a heat map. Because the Kvasir-Capsule image dataset has a significant class imbalance problem, we exploited the synthetic oversampling technique Borderline-SMOTE (BL-SMOTE) to distribute the images evenly among the classes and prevent the class imbalance problem. The proposed model was evaluated against various metrics and achieved the following values: 98.9%, 99.8%, 98.9%, 98.9%, 98.8%, and 0.0474 for accuracy, AUC, F1-score, precision, recall, and loss, respectively. The simulation results show that the proposed model outperforms other state-of-the-art models in all the evaluation metrics.
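Borderline-SMOTE is available off the shelf in the imbalanced-learn library. A minimal usage sketch follows, with synthetic stand-in features; a real pipeline would feed image feature vectors, and the sampler settings shown are defaults rather than the paper's configuration:

```python
import numpy as np
from imblearn.over_sampling import BorderlineSMOTE

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 64))             # stand-in for flattened frame features
y = np.array([0] * 270 + [1] * 30)         # imbalanced class labels

# BL-SMOTE synthesises minority-class samples near the class border,
# where misclassification risk is highest.
sampler = BorderlineSMOTE(kind="borderline-1", random_state=42)
X_bal, y_bal = sampler.fit_resample(X, y)
print(np.bincount(y_bal))                  # classes now evenly represented
```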
Collapse
|
38
|
Moen S, Vuik FER, Kuipers EJ, Spaander MCW. Artificial Intelligence in Colon Capsule Endoscopy—A Systematic Review. Diagnostics (Basel) 2022; 12:diagnostics12081994. [PMID: 36010345 PMCID: PMC9407289 DOI: 10.3390/diagnostics12081994] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/06/2022] [Revised: 08/05/2022] [Accepted: 08/10/2022] [Indexed: 12/17/2022] Open
Abstract
Background and aims: The applicability of colon capsule endoscopy in daily practice is limited by the accompanying labor-intensive reviewing time and the risk of inter-observer variability. Automated reviewing of colon capsule endoscopy images using artificial intelligence could be time-saving while providing an objective and reproducible outcome. This systematic review aims to provide an overview of the available literature on artificial intelligence for reviewing colonic mucosa by colon capsule endoscopy and to assess the necessary action points for its use in clinical practice. Methods: A systematic search of literature published up to January 2022 was conducted using Embase, Web of Science, OVID MEDLINE and Cochrane CENTRAL. Studies reporting on the use of artificial intelligence to review second-generation colon capsule endoscopy colonic images were included. Results: 1017 studies were evaluated for eligibility, of which nine were included. Two studies reported on computed bowel cleansing assessment, five studies reported on computed polyp or colorectal neoplasia detection, and two studies reported on other applications. Overall, the sensitivity of the proposed artificial intelligence models was 86.5–95.5% for bowel cleansing and 47.4–98.1% for the detection of polyps and colorectal neoplasia. Two studies performed per-lesion analysis in addition to per-frame analysis, which improved the sensitivity of polyp or colorectal neoplasia detection to 81.3–98.1%. By applying a convolutional neural network, the highest sensitivity of 98.1% for polyp detection was found. Conclusion: The use of artificial intelligence for reviewing second-generation colon capsule endoscopy images is promising. The highest sensitivity of 98.1% for polyp detection was achieved by deep learning with a convolutional neural network. Convolutional neural network algorithms should be optimized and tested with more data, possibly requiring the set-up of a large international colon capsule endoscopy database. Finally, the accuracy of the optimized convolutional neural network models needs to be confirmed in a prospective setting.
Collapse
|
39
|
Yang Z, Pan J, Li R, Qin H. Scene-graph-driven semantic feature matching for monocular digestive endoscopy. Comput Biol Med 2022; 146:105616. [DOI: 10.1016/j.compbiomed.2022.105616] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/04/2022] [Revised: 04/11/2022] [Accepted: 05/11/2022] [Indexed: 11/28/2022]
|
40
|
Yang Z, Lin S, Simon R, Linte CA. Endoscope Localization and Dense Surgical Scene Reconstruction for Stereo Endoscopy by Unsupervised Optical Flow and Kanade-Lucas-Tomasi Tracking. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2022; 2022:4839-4842. [PMID: 36086106 PMCID: PMC10153602 DOI: 10.1109/embc48229.2022.9871588] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/04/2023]
Abstract
In image-guided surgery, endoscope tracking and surgical scene reconstruction are critical, yet equally challenging, tasks. We present a hybrid visual odometry and reconstruction framework for stereo endoscopy that leverages unsupervised learning-based and traditional optical flow methods to enable concurrent endoscope tracking and dense scene reconstruction. More specifically, to reconstruct texture-less tissue surfaces, we use an unsupervised learning-based optical flow method to estimate dense depth maps from stereo images. Robust 3D landmarks are selected from the dense depth maps and tracked via the Kanade-Lucas-Tomasi tracking algorithm. The hybrid visual odometry also benefits from traditional visual odometry modules, such as keyframe insertion and local bundle adjustment. We evaluate the proposed framework on endoscopic video sequences openly available via the SCARED dataset against ground truth data as well as two other state-of-the-art methods, ORB-SLAM2 and Endo-depth. Our proposed method achieved comparable results in terms of both RMS Absolute Trajectory Error and Cloud-to-Mesh RMS Error, suggesting its potential to enable accurate endoscope tracking and scene reconstruction.
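Kanade-Lucas-Tomasi tracking is readily reproduced with OpenCV's pyramidal Lucas-Kanade implementation. The sketch below tracks generic corner features frame to frame; the paper instead seeds its landmarks from dense depth maps, and the video filename and detector parameters here are placeholders:

```python
import cv2

cap = cv2.VideoCapture("endoscopy.mp4")     # placeholder path
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                              qualityLevel=0.01, minDistance=7)

while True:
    ok, frame = cap.read()
    if not ok or pts is None or len(pts) == 0:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Pyramidal Lucas-Kanade: track each landmark from the previous frame
    nxt, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None,
                                                 winSize=(21, 21), maxLevel=3)
    pts = nxt[status.ravel() == 1].reshape(-1, 1, 2)  # drop lost tracks
    prev_gray = gray
```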
Collapse
|
41
|
SelfVIO: Self-supervised deep monocular Visual–Inertial Odometry and depth estimation. Neural Netw 2022; 150:119-136. [DOI: 10.1016/j.neunet.2022.03.005] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/05/2022] [Revised: 03/01/2022] [Accepted: 03/03/2022] [Indexed: 01/31/2023]
|
42
|
Padovan E, Marullo G, Tanzi L, Piazzolla P, Moos S, Porpiglia F, Vezzetti E. A deep learning framework for real-time 3D model registration in robot-assisted laparoscopic surgery. Int J Med Robot 2022; 18:e2387. [PMID: 35246913 PMCID: PMC9286374 DOI: 10.1002/rcs.2387] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/14/2021] [Revised: 01/31/2022] [Accepted: 03/03/2022] [Indexed: 12/18/2022]
Abstract
Introduction The current study presents a deep learning framework to determine, in real time, the position and rotation of a target organ from an endoscopic video. These inferred data are used to overlay the 3D model of the patient's organ onto its real counterpart. The resulting augmented video flow is streamed back to the surgeon as support during laparoscopic robot-assisted procedures. Methods This framework exploits semantic segmentation; thereafter, two techniques, based on Convolutional Neural Networks and motion analysis, are used to infer the rotation. Results The segmentation shows high accuracy, with a mean IoU score greater than 80% in all tests. Different performance levels are obtained for rotation, depending on the surgical procedure. Discussion Although the presented methodology achieves varying degrees of precision depending on the testing scenario, this work takes a first step toward the adoption of deep learning and augmented reality to generalise the automatic registration process.
Collapse
Affiliation(s)
- Erica Padovan
- Department of Management, Production and Design Engineering, Polytechnic University of Turin, Turin, Italy
| | - Giorgia Marullo
- Department of Management, Production and Design Engineering, Polytechnic University of Turin, Turin, Italy
| | - Leonardo Tanzi
- Department of Management, Production and Design Engineering, Polytechnic University of Turin, Turin, Italy
| | - Pietro Piazzolla
- Department of Oncology, Division of Urology, School of Medicine, University of Turin, Turin, Italy
| | - Sandro Moos
- Department of Management, Production and Design Engineering, Polytechnic University of Turin, Turin, Italy
| | - Francesco Porpiglia
- Department of Oncology, Division of Urology, School of Medicine, University of Turin, Turin, Italy
| | - Enrico Vezzetti
- Department of Management, Production and Design Engineering, Polytechnic University of Turin, Turin, Italy
| |
Collapse
|
43
|
Boese A, Wex C, Croner R, Liehr UB, Wendler JJ, Weigt J, Walles T, Vorwerk U, Lohmann CH, Friebe M, Illanes A. Endoscopic Imaging Technology Today. Diagnostics (Basel) 2022; 12:1262. [PMID: 35626417 PMCID: PMC9140648 DOI: 10.3390/diagnostics12051262] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/09/2022] [Revised: 05/02/2022] [Accepted: 05/16/2022] [Indexed: 02/04/2023] Open
Abstract
One of the most widely applied imaging methods in medicine is endoscopy. A highly specialized imaging modality has developed since the first modern endoscope, the "Lichtleiter" of Bozzini, was introduced in the early 19th century. Multiple medical disciplines use endoscopy for diagnostics or to visualize and support therapeutic procedures. The shapes, functionalities, handling concepts, and the integrated and surrounding technology of endoscopic systems have therefore been adapted to meet these dedicated medical application requirements. This survey gives an overview of the state of the art of modern endoscopic technology. To this end, the portfolios of several manufacturers with commercially available products on the market were screened and summarized. Additionally, some trends for upcoming developments were collected.
Collapse
Affiliation(s)
- Axel Boese
- INKA Health Tech Innovation Lab., Medical Faculty, Otto-von-Guericke University Magdeburg, 39120 Magdeburg, Germany; (M.F.); (A.I.)
| | - Cora Wex
- Clinic of General-, Visceral-, Vascular- and Transplant Surgery, University Hospital Magdeburg, 39120 Magdeburg, Germany; (C.W.); (R.C.)
| | - Roland Croner
- Clinic of General-, Visceral-, Vascular- and Transplant Surgery, University Hospital Magdeburg, 39120 Magdeburg, Germany; (C.W.); (R.C.)
| | - Uwe Bernd Liehr
- Uro-Oncology, Roboter-Assisted and Focal Therapy, Clinic for Urology, University Hospital Magdeburg, 39120 Magdeburg, Germany; (U.B.L.); (J.J.W.)
| | - Johann Jakob Wendler
- Uro-Oncology, Roboter-Assisted and Focal Therapy, Clinic for Urology, University Hospital Magdeburg, 39120 Magdeburg, Germany; (U.B.L.); (J.J.W.)
| | - Jochen Weigt
- Hepatology, and Infectious Diseases, Clinic of Gastroenterology, University Hospital Magdeburg, 39120 Magdeburg, Germany;
| | - Thorsten Walles
- Clinic of Cardiac and Thoracic Surgery, University Hospital Magdeburg, 39120 Magdeburg, Germany;
| | - Ulrich Vorwerk
- Clinic of Throat, Nose, and Ear, Head and Neck Surgery, University Hospital Magdeburg, 39120 Magdeburg, Germany;
| | | | - Michael Friebe
- INKA Health Tech Innovation Lab., Medical Faculty, Otto-von-Guericke University Magdeburg, 39120 Magdeburg, Germany; (M.F.); (A.I.)
- Department of Measurement and Electronics, AGH University of Science and Technology, 31-503 Kraków, Poland
| | - Alfredo Illanes
- INKA Health Tech Innovation Lab., Medical Faculty, Otto-von-Guericke University Magdeburg, 39120 Magdeburg, Germany; (M.F.); (A.I.)
| |
Collapse
|
44
|
Liu S, Fan J, Song D, Fu T, Lin Y, Xiao D, Song H, Wang Y, Yang J. Joint estimation of depth and motion from a monocular endoscopy image sequence using a multi-loss rebalancing network. BIOMEDICAL OPTICS EXPRESS 2022; 13:2707-2727. [PMID: 35774318 PMCID: PMC9203100 DOI: 10.1364/boe.457475] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/01/2022] [Revised: 04/01/2022] [Accepted: 04/01/2022] [Indexed: 06/15/2023]
Abstract
Building an in vivo three-dimensional (3D) surface model from a monocular endoscopy is an effective technology for improving the intuitiveness and precision of clinical laparoscopic surgery. This paper proposes a multi-loss rebalancing-based method for the joint estimation of depth and motion from a monocular endoscopy image sequence. Feature descriptors are used to provide supervision signals for the depth estimation network and the motion estimation network. Epipolar constraints between sequential frames are incorporated, together with neighborhood spatial information, by the depth estimation network to enhance the accuracy of depth estimation. Reprojection information from the depth estimation is used by the motion estimation network to reconstruct the camera motion with a multi-view relative-pose fusion mechanism. Relative response loss, feature consistency loss, and epipolar consistency loss functions are defined to improve the robustness and accuracy of the proposed unsupervised learning-based method. Evaluations were implemented on public datasets. The error of motion estimation in three scenes decreased by 42.1%, 53.6%, and 50.2%, respectively, and the average error of 3D reconstruction is 6.456 ± 1.798 mm. This demonstrates the capability of the method to generate reliable depth estimation and trajectory reconstruction results for endoscopy images, with meaningful clinical applications.
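An epipolar consistency loss penalises correspondences that drift off their epipolar lines; in its simplest algebraic form the per-match residual is |x2^T F x1| for a fundamental matrix F. A minimal sketch of that residual follows; the paper's exact loss may weight or normalise it differently (e.g. as a Sampson distance):

```python
import numpy as np

def epipolar_residuals(F, pts1, pts2):
    """Algebraic epipolar residual |x2^T F x1| per correspondence.
    F: (3, 3) fundamental matrix; pts1, pts2: (N, 2) pixel coordinates
    of matched points in two frames."""
    x1 = np.hstack([pts1, np.ones((len(pts1), 1))])  # homogeneous coords
    x2 = np.hstack([pts2, np.ones((len(pts2), 1))])
    return np.abs(np.sum(x2 * (x1 @ F.T), axis=1))   # row i: x2_i^T F x1_i
```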
Collapse
Affiliation(s)
- Shiyuan Liu
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
| | - Jingfan Fan
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
| | - Dengpan Song
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
| | - Tianyu Fu
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
| | - Yucong Lin
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
| | - Deqiang Xiao
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
| | - Hong Song
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China
| | - Yongtian Wang
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
| | - Jian Yang
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
| |
Collapse
|
45
|
Shao S, Pei Z, Chen W, Zhu W, Wu X, Sun D, Zhang B. Self-Supervised monocular depth and ego-Motion estimation in endoscopy: Appearance flow to the rescue. Med Image Anal 2021; 77:102338. [PMID: 35016079 DOI: 10.1016/j.media.2021.102338] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/24/2021] [Revised: 10/24/2021] [Accepted: 12/14/2021] [Indexed: 11/25/2022]
Abstract
Recently, self-supervised learning technology has been applied to calculate depth and ego-motion from monocular videos, achieving remarkable performance in autonomous driving scenarios. One widely adopted assumption of depth and ego-motion self-supervised learning is that the image brightness remains constant within nearby frames. Unfortunately, the endoscopic scene does not meet this assumption because there are severe brightness fluctuations induced by illumination variations, non-Lambertian reflections, and interreflections during data collection, and these brightness fluctuations inevitably deteriorate the accuracy of depth and ego-motion estimation. In this work, we introduce a novel concept referred to as appearance flow to address the brightness inconsistency problem. The appearance flow takes into consideration any variations in the brightness pattern and enables us to develop a generalized dynamic image constraint. Furthermore, we build a unified self-supervised framework to estimate monocular depth and ego-motion simultaneously in endoscopic scenes, which comprises a structure module, a motion module, an appearance module, and a correspondence module, to accurately reconstruct the appearance and calibrate the image brightness. Extensive experiments are conducted on the SCARED and EndoSLAM datasets, and the proposed unified framework outperforms other self-supervised approaches by a large margin. To validate our framework's generalization ability across different patients and cameras, we train our model on SCARED but test it on the SERV-CT and Hamlyn datasets without any fine-tuning, and the superior results reveal its strong generalization ability. Code is available at: https://github.com/ShuweiShao/AF-SfMLearner.
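The sketch below shows, under stated assumptions, how a predicted appearance flow can relax the brightness-constancy assumption in a self-supervised photometric loss: the warped source frame is calibrated by a per-pixel brightness offset before comparison. Shapes and the additive-offset form are assumptions; the authors' released code gives the actual implementation.
```python
# A minimal sketch, assuming appearance flow is a per-pixel brightness
# offset predicted by an appearance module (shapes are hypothetical).
import torch
import torch.nn.functional as F

def photometric_loss(target, source_warped, appearance_flow):
    # Calibrate the warped source frame with the predicted brightness
    # change before comparing it against the target frame.
    calibrated = source_warped + appearance_flow
    return F.l1_loss(calibrated, target)

B, C, H, W = 2, 3, 128, 160
target = torch.rand(B, C, H, W)
source_warped = torch.rand(B, C, H, W)   # source view warped by depth + pose
appearance_flow = torch.zeros(B, C, H, W, requires_grad=True)
loss = photometric_loss(target, source_warped, appearance_flow)
loss.backward()
```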
Collapse
Affiliation(s)
- Shuwei Shao
- School of Automation Science and Electrical Engineering, Beihang University, Beijing, China
| | - Zhongcai Pei
- School of Automation Science and Electrical Engineering, Beihang University, Beijing, China; Hangzhou Innovation Institute, Beihang University, Hangzhou, China
| | - Weihai Chen
- School of Automation Science and Electrical Engineering, Beihang University, Beijing, China; Hangzhou Innovation Institute, Beihang University, Hangzhou, China.
| | | | - Xingming Wu
- School of Automation Science and Electrical Engineering, Beihang University, Beijing, China
| | - Dianmin Sun
- Shandong Cancer Hospital Affiliated to Shandong University, Shandong First Medical University and Shandong Academy of Medical Sciences, Jinan, China
| | - Baochang Zhang
- Institute of Artificial Intelligence, Beihang University, Beijing, China.
| |
Collapse
|
46
|
Widya AR, Monno Y, Okutomi M, Suzuki S, Gotoda T, Miki K. Learning-Based Depth and Pose Estimation for Monocular Endoscope with Loss Generalization. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2021; 2021:3547-3552. [PMID: 34892005 DOI: 10.1109/embc46164.2021.9630156] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/14/2023]
Abstract
Gastroendoscopy has been a clinical standard for diagnosing and treating conditions that affect part of a patient's digestive system, such as the stomach. Although gastroendoscopy offers many advantages for patients, practitioners face challenges such as the lack of 3D perception, including depth and endoscope pose information. These challenges make it difficult to navigate the endoscope and localize any lesion found in the digestive tract. To tackle these problems, deep learning-based approaches have been proposed to provide monocular gastroendoscopy with additional yet important depth and pose information. In this paper, we propose a novel supervised approach that trains depth and pose estimation networks on consecutive endoscopy images to assist endoscope navigation in the stomach. We first generate real depth and pose training data using our previously proposed whole-stomach 3D reconstruction pipeline, avoiding the poor generalization from computer-generated (CG) stomach models to real data. In addition, we propose a novel generalized photometric loss function that avoids the complicated process of finding proper weights to balance the depth and pose loss terms, which existing direct depth and pose supervision approaches require. We then experimentally show that our proposed generalized loss performs better than existing direct supervision losses.
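As one illustration of avoiding hand-tuned weights between depth and pose loss terms, the sketch below uses learnable homoscedastic-uncertainty weighting (Kendall et al., 2018). This is a standard stand-in for the general idea, not necessarily the generalized loss the paper proposes.
```python
# A hedged sketch: learnable uncertainty weighting as one standard way to
# balance depth and pose losses without manual weight search. Illustrative
# only; the paper's generalized photometric loss may differ.
import torch
import torch.nn as nn

class UncertaintyWeightedLoss(nn.Module):
    def __init__(self):
        super().__init__()
        # Log-variances learned jointly with the estimation networks.
        self.log_var_depth = nn.Parameter(torch.zeros(()))
        self.log_var_pose = nn.Parameter(torch.zeros(()))

    def forward(self, depth_loss, pose_loss):
        # Each term is scaled by exp(-log_var); the +log_var regularizer
        # prevents the learned variances from growing without bound.
        return (torch.exp(-self.log_var_depth) * depth_loss + self.log_var_depth
                + torch.exp(-self.log_var_pose) * pose_loss + self.log_var_pose)

criterion = UncertaintyWeightedLoss()
total = criterion(torch.tensor(0.8, requires_grad=True),
                  torch.tensor(0.1, requires_grad=True))
total.backward()
```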
Collapse
|
47
|
Zhuang H, Zhang J, Liao F. A systematic review on application of deep learning in digestive system image processing. THE VISUAL COMPUTER 2021; 39:2207-2222. [PMID: 34744231 PMCID: PMC8557108 DOI: 10.1007/s00371-021-02322-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Accepted: 09/30/2021] [Indexed: 05/07/2023]
Abstract
With the advent of the big data era, the application of artificial intelligence represented by deep learning in medicine has become a hot topic. In gastroenterology, deep learning has achieved remarkable results in endoscopy, imaging, and pathology. Artificial intelligence has been applied to benign gastrointestinal tract lesions, early cancer, tumors, inflammatory bowel disease, liver and pancreatic diseases, and other conditions. Computer-aided diagnosis significantly improves diagnostic accuracy, reduces physicians' workload, and provides supporting evidence for clinical diagnosis and treatment. In the near future, artificial intelligence will have high application value in the field of medicine. This paper summarizes the latest research on artificial intelligence in diagnosing and treating digestive system diseases and discusses the future of artificial intelligence in this field. We sincerely hope that our work can become a stepping stone for gastroenterologists and computer experts in artificial intelligence research and facilitate the application and development of computer-aided image processing technology in gastroenterology.
Collapse
Affiliation(s)
- Huangming Zhuang
- Gastroenterology Department, Renmin Hospital of Wuhan University, Wuhan, 430060 Hubei China
| | - Jixiang Zhang
- Gastroenterology Department, Renmin Hospital of Wuhan University, Wuhan, 430060 Hubei China
| | - Fei Liao
- Gastroenterology Department, Renmin Hospital of Wuhan University, Wuhan, 430060 Hubei China
| |
Collapse
|
48
|
Gastrointestinal Tract Disease Classification from Wireless Endoscopy Images Using Pretrained Deep Learning Model. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2021; 2021:5940433. [PMID: 34545292 PMCID: PMC8449743 DOI: 10.1155/2021/5940433] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 05/10/2021] [Revised: 07/03/2021] [Accepted: 08/16/2021] [Indexed: 12/28/2022]
Abstract
Wireless capsule endoscopy is a noninvasive wireless imaging technology that has become increasingly popular in recent years. One of its major drawbacks is that it generates a large number of images that medical personnel must analyze, which is time-consuming. Various research groups have proposed different image processing and machine learning techniques to classify gastrointestinal tract diseases in recent years. In this research, traditional image processing algorithms and a data augmentation technique are combined with adjusted pretrained deep convolutional neural networks to classify diseases in the gastrointestinal tract from wireless endoscopy images. We take advantage of the pretrained convolutional neural network (CNN) models VGG16, ResNet-18, and GoogLeNet, with adjusted fully connected and output layers. The proposed models are validated on a dataset consisting of 6702 images across 8 classes. The VGG16 model achieved the highest results, with 96.33% accuracy, 96.37% recall, 96.5% precision, and 96.5% F1-measure. Compared to other state-of-the-art models, the VGG16 model has the highest Matthews Correlation Coefficient value of 0.95 and a Cohen's kappa score of 0.96.
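A minimal sketch of the transfer-learning setup the abstract describes: take an ImageNet-pretrained VGG16 and replace its final classifier layer with an 8-class head. The 8-class count follows the abstract; the freezing policy, weights enum (torchvision ≥ 0.13), and training details are assumptions.
```python
# A hedged sketch of adjusting a pretrained VGG16 for 8-class wireless
# endoscopy classification; optimizer and input pipeline are omitted.
import torch
import torch.nn as nn
from torchvision import models

model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)

# Freeze the convolutional feature extractor; fine-tune only the classifier.
for param in model.features.parameters():
    param.requires_grad = False

# Replace the last fully connected layer (4096 -> 1000) with an 8-class head.
model.classifier[6] = nn.Linear(model.classifier[6].in_features, 8)

x = torch.rand(1, 3, 224, 224)    # dummy batch standing in for capsule images
logits = model(x)
print(logits.shape)               # torch.Size([1, 8])
```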
Collapse
|
49
|
Motion-based camera localization system in colonoscopy videos. Med Image Anal 2021; 73:102180. [PMID: 34303888 DOI: 10.1016/j.media.2021.102180] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/09/2020] [Revised: 07/09/2021] [Accepted: 07/13/2021] [Indexed: 11/20/2022]
Abstract
Optical colonoscopy is an essential diagnostic and prognostic tool for many gastrointestinal diseases, including cancer screening and staging, intestinal bleeding, diarrhea, abdominal symptom evaluation, and inflammatory bowel disease assessment. However, the evaluation, classification, and quantification of findings from colonoscopy are subject to inter-observer variation. Automated assessment of colonoscopy is of interest considering the subjectivity present in qualitative human interpretations of colonoscopy findings. Localization of the camera is essential to interpreting the meaning and context of findings for diseases evaluated by colonoscopy. In this study, we propose a camera localization system that estimates the relative location of the camera and classifies the colon into anatomical segments. The camera localization system begins with non-informative frame detection and removal. Then, a self-training end-to-end convolutional neural network is built to estimate the camera motion, with several strategies proposed to improve its robustness and generalization on endoscopic videos. Using the estimated camera motion, a camera trajectory can be derived and a relative location index calculated. Based on the estimated location index, anatomical colon segment classification is performed by constructing a colon template. The proposed motion estimation algorithm was evaluated on an external dataset containing ground truth for the camera pose. The experimental results show that the performance of the proposed method is superior to that of other published methods. The relative location index estimation and anatomical region classification were further validated using colonoscopy videos collected from routine clinical practice. This validation yielded an average classification accuracy of 0.754, which is substantially higher than the performance obtained using location indices built from other methods.
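The sketch below illustrates, under stated assumptions, the step from per-frame motion estimates to a relative location index: accumulate forward displacements into a trajectory and normalize over the video. The normalization to [0, 1] and the sample motion values are assumptions for illustration, not the paper's exact formulation.
```python
# A hedged sketch: per-frame camera motion -> trajectory -> location index.
import numpy as np

def location_index(forward_displacements):
    """Cumulative insertion depth, normalized to [0, 1] over the video."""
    trajectory = np.cumsum(forward_displacements)   # 1D camera trajectory
    span = trajectory.max() - trajectory.min()
    return (trajectory - trajectory.min()) / (span + 1e-8)

# Per-frame forward motion along the colon (positive = insertion,
# negative = withdrawal), e.g. produced by a motion-estimation network.
motions = np.array([0.5, 0.4, 0.6, -0.1, 0.3, 0.7, 0.2])
print(np.round(location_index(motions), 3))
```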
Collapse
|
50
|
Yang Z, Simon R, Li Y, Linte CA. Dense Depth Estimation from Stereo Endoscopy Videos Using Unsupervised Optical Flow Methods. MEDICAL IMAGE UNDERSTANDING AND ANALYSIS. MEDICAL IMAGE UNDERSTANDING AND ANALYSIS (CONFERENCE) 2021; 12722:337-349. [PMID: 35610998 PMCID: PMC9125693 DOI: 10.1007/978-3-030-80432-9_26] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/20/2023]
Abstract
In the context of Minimally Invasive Surgery, estimating depth from stereo endoscopy plays a crucial role in three-dimensional (3D) reconstruction, surgical navigation, and augmented reality (AR) visualization. However, the challenges associated with this task are three-fold: 1) featureless surfaces, often polluted by artifacts, make correspondences difficult to identify; 2) ground truth depth is difficult to obtain; and 3) endoscopic image acquisitions accompanied by accurately calibrated camera parameters are rare, as the camera is often adjusted during an intervention. To address these difficulties, we propose an unsupervised depth estimation framework (END-flow) based on an unsupervised optical flow network trained on unrectified binocular videos without calibrated camera parameters. The proposed END-flow architecture is compared with traditional stereo matching, self-supervised depth estimation, unsupervised optical flow, and supervised methods implemented on the Stereo Correspondence and Reconstruction of Endoscopic Data (SCARED) Challenge dataset. Experimental results show that our method outperforms several state-of-the-art techniques and achieves performance close to that of supervised methods.
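As a final illustration, here is a minimal sketch of the generic unsupervised objective behind optical-flow training on unrectified stereo pairs: warp one view with the predicted flow and penalize photometric error plus flow smoothness. Shapes, the Charbonnier-style penalty, and the smoothness weight are assumptions; this shows the idea, not END-flow itself.
```python
# A hedged sketch of an unsupervised optical-flow loss for stereo pairs.
import torch
import torch.nn.functional as F

def warp(img, flow):
    """Backward-warp img (B,C,H,W) by flow (B,2,H,W) using grid_sample."""
    B, _, H, W = img.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().unsqueeze(0) + flow
    # Normalize pixel coordinates to [-1, 1] as grid_sample expects.
    gx = 2.0 * grid[:, 0] / (W - 1) - 1.0
    gy = 2.0 * grid[:, 1] / (H - 1) - 1.0
    return F.grid_sample(img, torch.stack((gx, gy), dim=-1), align_corners=True)

def unsupervised_flow_loss(left, right, flow, smooth_weight=0.1):
    right_warped = warp(right, flow)
    # Charbonnier-style photometric penalty on the warped pair.
    photometric = torch.sqrt((left - right_warped) ** 2 + 1e-6).mean()
    # First-order smoothness on the flow field.
    smoothness = ((flow[:, :, :, 1:] - flow[:, :, :, :-1]).abs().mean()
                  + (flow[:, :, 1:, :] - flow[:, :, :-1, :]).abs().mean())
    return photometric + smooth_weight * smoothness

left = torch.rand(1, 3, 64, 80)
right = torch.rand(1, 3, 64, 80)
flow = torch.zeros(1, 2, 64, 80, requires_grad=True)
loss = unsupervised_flow_loss(left, right, flow)
loss.backward()
```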
Collapse
Affiliation(s)
- Zixin Yang
- Center for Imaging Science, Rochester Institute of Technology, Rochester, NY 14623, USA
| | - Richard Simon
- Biomedical Engineering, Rochester Institute of Technology, Rochester, NY 14623, USA
| | - Yangming Li
- Electrical Computer and Telecommunications Engineering Technology, Rochester Institute of Technology, Rochester, NY 14623, USA
| | - Cristian A Linte
- Center for Imaging Science, Rochester Institute of Technology, Rochester, NY 14623, USA
- Biomedical Engineering, Rochester Institute of Technology, Rochester, NY 14623, USA
| |
Collapse
|