1
Zhou Y, Li R, Dai Y, Chen G, Zhang J, Cui L, Yin X. Taking measurement in every direction: Implicit scene representation for accurately estimating target dimensions under monocular endoscope. Comput Methods Programs Biomed 2024; 256:108380. [PMID: 39178502] [DOI: 10.1016/j.cmpb.2024.108380]
Abstract
BACKGROUND AND OBJECTIVES In endoscopy, measuring target size can assist medical diagnosis. However, limited operating space, low image quality, and irregular target shapes pose great challenges to traditional vision-based measurement methods. METHODS In this paper, we propose a novel approach to measuring irregular target size under a monocular endoscope using image rendering. First, virtual poses are synthesized on the same main optical axis as known camera poses, and an implicit neural representation module that accounts for brightness and target boundaries renders the images corresponding to these virtual poses. Then, Swin-Unet and the rotating-calipers algorithm are used to obtain the maximum pixel length of the target in image pairs sharing the same main optical axis. Finally, the similar-triangle relationship of the endoscopic imaging model is used to measure the size of the target. RESULTS The evaluation is conducted using renal stone fragments from patients, placed in a kidney model and an isolated porcine kidney. The mean measurement error is 0.12 mm. CONCLUSIONS The proposed method can automatically measure object size within narrow body cavities in any visible direction, improving the effectiveness and accuracy of measurement in the limited endoscopic space.
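The final measurement step above rests on the pinhole (similar-triangle) imaging model: a target's physical extent equals its pixel extent scaled by depth over focal length. A minimal sketch of that conversion, with the rotating-calipers step replaced by a brute-force maximum pairwise pixel distance (function names and numeric values are illustrative, not from the paper):

```python
import math

def max_pixel_extent(points):
    """Brute-force stand-in for rotating calipers: the maximum
    pairwise distance between boundary pixels (the target's diameter)."""
    return max(
        math.dist(p, q)
        for i, p in enumerate(points)
        for q in points[i + 1:]
    )

def physical_size_mm(pixel_extent, depth_mm, focal_length_px):
    """Similar triangles: size / depth = pixel_extent / focal_length."""
    return pixel_extent * depth_mm / focal_length_px

# Illustrative numbers: a 200 px extent seen at 10 mm depth with a 500 px focal length.
boundary = [(0.0, 0.0), (120.0, 160.0), (60.0, 0.0)]  # max pairwise distance: 200 px
size = physical_size_mm(max_pixel_extent(boundary), depth_mm=10.0, focal_length_px=500.0)
print(size)  # 4.0 (mm)
```

Rotating calipers achieves the same diameter in O(n log n) on the convex hull; the quadratic scan here is only for clarity.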
Affiliation(s)
- Yuchen Zhou
- The College of Artificial Intelligence, Nankai University, Tianjin 300350, China; The Institute of Robotics and Automatic Information System, Tianjin Key Laboratory of Intelligent Robotics, Tianjin 300350, China
- Rui Li
- The College of Artificial Intelligence, Nankai University, Tianjin 300350, China; The Institute of Robotics and Automatic Information System, Tianjin Key Laboratory of Intelligent Robotics, Tianjin 300350, China
- Yu Dai
- The College of Artificial Intelligence, Nankai University, Tianjin 300350, China; The Institute of Robotics and Automatic Information System, Tianjin Key Laboratory of Intelligent Robotics, Tianjin 300350, China
- Gongping Chen
- The College of Artificial Intelligence, Nankai University, Tianjin 300350, China; The Institute of Robotics and Automatic Information System, Tianjin Key Laboratory of Intelligent Robotics, Tianjin 300350, China
- Jianxun Zhang
- The College of Artificial Intelligence, Nankai University, Tianjin 300350, China; The Institute of Robotics and Automatic Information System, Tianjin Key Laboratory of Intelligent Robotics, Tianjin 300350, China
- Liang Cui
- Department of Urology, Civil Aviation General Hospital, Beijing 100123, China
- Xiaotao Yin
- Department of Urology, Fourth Medical Center of Chinese PLA General Hospital, Beijing 100048, China
2
Jeong BH, Kim HK, Son YD. Depth estimation from monocular endoscopy using simulation and image transfer approach. Comput Biol Med 2024; 181:109038. [PMID: 39178804] [DOI: 10.1016/j.compbiomed.2024.109038]
Abstract
Obtaining accurate distance or depth information in endoscopy is crucial for the effective utilization of navigation systems. However, due to space constraints, incorporating depth cameras into endoscopic systems is often impractical. Our goal is to estimate depth images directly from endoscopic images using deep learning. This study presents a three-step methodology for training a depth-estimation network model. Initially, simulated endoscopy images and corresponding depth maps are generated using Unity based on a colon surface model obtained from segmented computed tomography colonography data. Subsequently, a cycle generative adversarial network model is employed to enhance the realism of the simulated endoscopy images. Finally, a deep learning model is trained using the synthesized endoscopy images and depth maps to estimate depths accurately. The performance of the proposed approach is evaluated and compared against prior studies utilizing unsupervised training methods. The results demonstrate the superior precision of the proposed technique in estimating depth images within endoscopy. The proposed depth estimation method holds promise for advancing the field by enabling enhanced navigation, improved lesion marking capabilities, and ultimately leading to better clinical outcomes.
Affiliation(s)
- Bong Hyuk Jeong
- Department of Health Sciences and Technology, GAIHST, Gachon University, Incheon, 21999, South Korea
- Hang Keun Kim
- Department of Health Sciences and Technology, GAIHST, Gachon University, Incheon, 21999, South Korea; Department of Biomedical Engineering, Gachon University, Seongnam, 13120, South Korea
- Young Don Son
- Department of Health Sciences and Technology, GAIHST, Gachon University, Incheon, 21999, South Korea; Department of Biomedical Engineering, Gachon University, Seongnam, 13120, South Korea
3
Rampinelli V, Paderno A, Conti C, Testa G, Modesti CL, Agosti E, Dohin I, Saccardo T, Vinciguerra A, Ferrari M, Schreiber A, Mattavelli D, Nicolai P, Holsinger C, Piazza C. Artificial intelligence for automatic detection and segmentation of nasal polyposis: a pilot study. Eur Arch Otorhinolaryngol 2024. [PMID: 39001915] [DOI: 10.1007/s00405-024-08809-4]
Abstract
PURPOSE Accurate diagnosis and quantification of polyps and symptoms are pivotal for planning the therapeutic strategy for chronic rhinosinusitis with nasal polyposis (CRSwNP). This pilot study aimed to develop an artificial intelligence (AI)-based image analysis system capable of segmenting nasal polyps in nasal endoscopy videos. METHODS Recorded nasal videoendoscopies from 52 patients diagnosed with CRSwNP between 2019 and 2022 were retrospectively analyzed. Extracted images were manually segmented on the web application Roboflow. A dataset of 342 images was generated and divided into training (80%), validation (10%), and testing (10%) sets. The Ultralytics YOLOv8.0.28 model was employed for automated segmentation. RESULTS The YOLOv8s-seg model consisted of 195 layers and required 42.4 GFLOPs for operation. When tested against the validation set, the algorithm achieved a precision of 0.91, a recall of 0.839, and a mean average precision at 50% IoU (mAP50) of 0.949. For the segmentation task, similar metrics were observed, including an mAP ranging from 0.675 to 0.679 for IoUs between 50% and 95%. CONCLUSIONS The study shows that a carefully trained AI algorithm can effectively identify and delineate nasal polyps in patients with CRSwNP. Despite certain limitations, such as the focus on CRSwNP-specific samples, the algorithm presents a promising complementary tool to existing diagnostic methods.
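The IoU thresholds behind metrics such as mAP50 measure the overlap-to-union ratio between a predicted and a ground-truth mask. A minimal sketch of mask IoU (an illustrative implementation, not the YOLOv8 code):

```python
import numpy as np

def mask_iou(pred, gt):
    """Intersection over union of two boolean segmentation masks."""
    pred, gt = np.asarray(pred, bool), np.asarray(gt, bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return np.logical_and(pred, gt).sum() / union

pred = np.zeros((4, 4), bool); pred[:2, :] = True  # top two rows predicted
gt = np.zeros((4, 4), bool); gt[1:3, :] = True     # middle two rows annotated
print(mask_iou(pred, gt))  # 4 / 12 ≈ 0.333, below the 0.5 threshold used by mAP50
```

A prediction counts as a true positive at mAP50 only when this ratio reaches 0.5; mAP50-95 averages the score over thresholds from 0.5 to 0.95.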
Affiliation(s)
- Vittorio Rampinelli
- Unit of Otorhinolaryngology - Head and Neck Surgery, Department of Surgical and Medical Specialties, Radiological Sciences, and Public Health, School of Medicine, ASST Spedali Civili, University of Brescia, Brescia, Italy
- Alberto Paderno
- Otorhinolaryngology Unit, IRCCS Humanitas Research Hospital, Milano, Italy
- Carlo Conti
- Unit of Otorhinolaryngology - Head and Neck Surgery, Department of Surgical and Medical Specialties, Radiological Sciences, and Public Health, School of Medicine, ASST Spedali Civili, University of Brescia, Brescia, Italy
- Gabriele Testa
- Unit of Otorhinolaryngology - Head and Neck Surgery, Department of Surgical and Medical Specialties, Radiological Sciences, and Public Health, School of Medicine, ASST Spedali Civili, University of Brescia, Brescia, Italy
- Claudia Lodovica Modesti
- Unit of Otorhinolaryngology - Head and Neck Surgery, Department of Surgical and Medical Specialties, Radiological Sciences, and Public Health, School of Medicine, ASST Spedali Civili, University of Brescia, Brescia, Italy
- Edoardo Agosti
- Division of Neurosurgery, Department of Surgical and Medical Specialties, Radiological Sciences, and Public Health, School of Medicine, ASST Spedali Civili, University of Brescia, Brescia, Italy
- Isabelle Dohin
- Unit of Otorhinolaryngology - Head and Neck Surgery, Department of Surgical and Medical Specialties, Radiological Sciences, and Public Health, School of Medicine, ASST Spedali Civili, University of Brescia, Brescia, Italy
- Tommaso Saccardo
- Section of Otorhinolaryngology - Head and Neck Surgery, Department of Neurosciences, University of Padova, Padova, PD, Italy
- Marco Ferrari
- Section of Otorhinolaryngology - Head and Neck Surgery, Department of Neurosciences, University of Padova, Padova, PD, Italy
- Alberto Schreiber
- Unit of Otorhinolaryngology - Head and Neck Surgery, Department of Surgical and Medical Specialties, Radiological Sciences, and Public Health, School of Medicine, ASST Spedali Civili, University of Brescia, Brescia, Italy
- Davide Mattavelli
- Unit of Otorhinolaryngology - Head and Neck Surgery, Department of Surgical and Medical Specialties, Radiological Sciences, and Public Health, School of Medicine, ASST Spedali Civili, University of Brescia, Brescia, Italy
- Piero Nicolai
- Section of Otorhinolaryngology - Head and Neck Surgery, Department of Neurosciences, University of Padova, Padova, PD, Italy
- Chris Holsinger
- Division of Head and Neck Surgery, Department of Otolaryngology, Stanford University, Palo Alto, CA, USA
- Cesare Piazza
- Unit of Otorhinolaryngology - Head and Neck Surgery, Department of Surgical and Medical Specialties, Radiological Sciences, and Public Health, School of Medicine, ASST Spedali Civili, University of Brescia, Brescia, Italy
4
Mangulabnan JE, Soberanis-Mukul RD, Teufel T, Sahu M, Porras JL, Vedula SS, Ishii M, Hager G, Taylor RH, Unberath M. An endoscopic chisel: intraoperative imaging carves 3D anatomical models. Int J Comput Assist Radiol Surg 2024; 19:1359-1366. [PMID: 38753135] [DOI: 10.1007/s11548-024-03151-w]
Abstract
PURPOSE Preoperative imaging plays a pivotal role in sinus surgery, where CTs offer patient-specific insights into complex anatomy, enabling real-time intraoperative navigation to complement endoscopic imaging. However, surgery elicits anatomical changes not represented in the preoperative model, generating an inaccurate basis for navigation as surgery progresses. METHODS We propose a first vision-based approach to updating the preoperative 3D anatomical model using intraoperative endoscopic video in navigated sinus surgery, where relative camera poses are known. We rely on comparisons of intraoperative monocular depth estimates and preoperative depth renders to identify modified regions. The new depths in these regions are integrated through volumetric fusion in a truncated signed distance function representation to generate an intraoperative 3D model that reflects tissue manipulation. RESULTS We quantitatively evaluate our approach by sequentially updating models for a five-step surgical progression in an ex vivo specimen. We compute the error between correspondences from the updated model and ground-truth intraoperative CT in the region of anatomical modification. The resulting models show a decrease in error during surgical progression, as opposed to an increase when no update is employed. CONCLUSION Our findings suggest that preoperative 3D anatomical models can be updated using intraoperative endoscopic video in navigated sinus surgery. Future work will investigate improvements to monocular depth estimation as well as removing the need for external navigation systems. The resulting ability to continuously update the patient model may provide surgeons with a more precise understanding of the current anatomical state and paves the way toward a digital-twin paradigm for sinus surgery.
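The volumetric fusion step described above can be illustrated with the standard TSDF weighted-average update, in which each voxel keeps a running truncated signed distance and an accumulated weight (a generic sketch of TSDF fusion, not the authors' code; the truncation value is illustrative):

```python
import numpy as np

TRUNC = 5.0  # truncation distance (e.g., mm); an assumed illustrative value

def tsdf_update(tsdf, weight, new_dist, new_weight=1.0):
    """Fuse a new signed-distance observation into each voxel by
    weighted averaging, truncating distances to [-TRUNC, TRUNC]."""
    d = np.clip(new_dist, -TRUNC, TRUNC)
    fused = (tsdf * weight + d * new_weight) / (weight + new_weight)
    return fused, weight + new_weight

# One voxel observed twice: first 2 mm in front of the surface, then 1 mm.
tsdf, w = np.array([2.0]), np.array([1.0])
tsdf, w = tsdf_update(tsdf, w, np.array([1.0]))
print(tsdf)  # [1.5] -- the running average of the two observations
```

Because newer depth observations are simply averaged in, regions carved away by the surgeon acquire new distances over successive frames, which is what lets the model track tissue removal.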
Affiliation(s)
- Timo Teufel
- Johns Hopkins University, Baltimore, MD, 21211, USA
- Manish Sahu
- Johns Hopkins University, Baltimore, MD, 21211, USA
- Jose L Porras
- Johns Hopkins Medical Institutions, Baltimore, MD, 21287, USA
- Masaru Ishii
- Johns Hopkins Medical Institutions, Baltimore, MD, 21287, USA
- Russell H Taylor
- Johns Hopkins University, Baltimore, MD, 21211, USA; Johns Hopkins Medical Institutions, Baltimore, MD, 21287, USA
- Mathias Unberath
- Johns Hopkins University, Baltimore, MD, 21211, USA; Johns Hopkins Medical Institutions, Baltimore, MD, 21287, USA
5
Cui B, Islam M, Bai L, Ren H. Surgical-DINO: adapter learning of foundation models for depth estimation in endoscopic surgery. Int J Comput Assist Radiol Surg 2024; 19:1013-1020. [PMID: 38459402] [PMCID: PMC11178563] [DOI: 10.1007/s11548-024-03083-5]
Abstract
PURPOSE Depth estimation in robotic surgery is vital for 3D reconstruction, surgical navigation, and augmented reality visualization. Although foundation models exhibit outstanding performance in many vision tasks, including depth estimation (e.g., DINOv2), recent works have observed their limitations in medical and surgical domain-specific applications. This work presents a low-rank adaptation (LoRA) of the foundation model for surgical depth estimation. METHODS We design a foundation-model-based depth estimation method, referred to as Surgical-DINO, a low-rank adaptation of DINOv2 for depth estimation in endoscopic surgery. We build LoRA layers and integrate them into DINO to adapt to surgery-specific domain knowledge, instead of conventional fine-tuning. During training, we freeze the DINO image encoder, which shows excellent visual representation capacity, and optimize only the LoRA layers and depth decoder to integrate features from the surgical scene. RESULTS Our model is extensively validated on the SCARED MICCAI challenge dataset, which was collected during da Vinci Xi endoscopic surgery. We empirically show that Surgical-DINO significantly outperforms the state-of-the-art models in endoscopic depth estimation tasks. Ablation studies provide evidence of the remarkable effect of our LoRA layers and adaptation. CONCLUSION Surgical-DINO sheds light on the successful adaptation of foundation models to the surgical domain for depth estimation. The results provide clear evidence that zero-shot prediction with weights pre-trained on computer vision datasets, or naive fine-tuning, is not sufficient to use foundation models in the surgical domain directly.
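LoRA adapts a frozen weight matrix W by adding a trainable low-rank product B·A scaled by alpha/r, so only the small factors are optimized while W stays fixed. A minimal numpy sketch of the idea (generic LoRA with illustrative dimensions, not the Surgical-DINO code):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, alpha = 8, 2, 4.0             # feature dim, LoRA rank, scaling factor

W = rng.normal(size=(d, d))         # frozen pretrained weight
A = rng.normal(size=(r, d)) * 0.01  # trainable down-projection
B = np.zeros((d, r))                # trainable up-projection, zero-initialized

def lora_forward(x):
    """y = W x + (alpha/r) * B (A x); only A and B receive gradients."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d)
# With B zero-initialized, the adapted layer reproduces the frozen layer
# exactly at the start of training, so adaptation begins from the
# pretrained behavior rather than perturbing it.
assert np.allclose(lora_forward(x), W @ x)
```

The trainable parameter count is 2·d·r instead of d², which is what makes adapting a large frozen encoder cheap.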
Affiliation(s)
- Beilei Cui
- The Chinese University of Hong Kong, Hong Kong, China
- Mobarakol Islam
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS), University College London, London, UK
- Long Bai
- The Chinese University of Hong Kong, Hong Kong, China
- Hongliang Ren
- The Chinese University of Hong Kong, Hong Kong, China; Department of BME, National University of Singapore, Singapore, Singapore
6
Zhang C, Tang X, Yang M, Zhao H, Sun D. Performance analysis of a liquid lens for laser ablation using OCT imaging. Appl Opt 2024; 63:4271-4277. [PMID: 38856602] [DOI: 10.1364/ao.525094]
Abstract
Laser ablation has been used in different surgical procedures to perform precise treatments. Compared with previous free-beam laser delivery systems, flexible-optical-fiber-based systems can deliver laser energy into a curved space, avoiding the requirement of a straight working path to the target. However, the fiber tip must maintain direct contact with the tissue to prevent laser divergence, resulting in fiber damage, uneven ablation, and tissue carbonization. Here, a liquid lens is used to address the problem of laser defocusing when irradiating targets at different depths in flexible-optical-fiber-based systems. The liquid lens focuses a laser with a maximum power of 3 W onto a medium-density fiberboard at a focal length of 40-180 mm. The relationships of ablation crater diameter and depth with radiation time and laser power have been quantitatively evaluated through optical coherence tomography (OCT) imaging. Experiments demonstrate that the liquid lens can continuously focus the high-power laser to different depths, with the advantages of compact size, fast response, light weight, and easy operation. This study explores liquid-lens-based focused laser ablation, which can potentially improve the performance of future medical image-guided laser ablation.
7
Yang Z, Dai J, Pan J. 3D reconstruction from endoscopy images: A survey. Comput Biol Med 2024; 175:108546. [PMID: 38704902] [DOI: 10.1016/j.compbiomed.2024.108546]
Abstract
Three-dimensional reconstruction of images acquired through endoscopes plays a vital role in an increasing number of medical applications. Endoscopes used in the clinic are commonly classified as monocular or binocular. We review the classification of depth estimation methods according to the type of endoscope. Basically, depth estimation relies on feature matching across images and multi-view geometry theory. However, these traditional techniques face many problems in the endoscopic environment. With the continuing development of deep learning, a growing number of works use learning-based methods to address challenges such as inconsistent illumination and texture sparsity. We have reviewed over 170 papers published in the 10 years from 2013 to 2023. The commonly used public datasets and performance metrics are summarized. We also give a taxonomy of methods and analyze the advantages and drawbacks of the algorithms. Summary tables and a results atlas are provided to facilitate comparison of the qualitative and quantitative performance of different methods in each category. In addition, we summarize commonly used scene representation methods in endoscopy and speculate on the prospects of depth estimation research in medical applications. We also compare the robustness, processing time, and scene representation of the methods to help doctors and researchers select appropriate methods for their surgical applications.
Affiliation(s)
- Zhuoyue Yang
- State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, 37 Xueyuan Road, Haidian District, Beijing, 100191, China; Peng Cheng Lab, 2 Xingke 1st Street, Nanshan District, Shenzhen, Guangdong Province, 518000, China
- Ju Dai
- Peng Cheng Lab, 2 Xingke 1st Street, Nanshan District, Shenzhen, Guangdong Province, 518000, China
- Junjun Pan
- State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, 37 Xueyuan Road, Haidian District, Beijing, 100191, China; Peng Cheng Lab, 2 Xingke 1st Street, Nanshan District, Shenzhen, Guangdong Province, 518000, China
8
Yang Z, Pan J, Dai J, Sun Z, Xiao Y. Self-Supervised Lightweight Depth Estimation in Endoscopy Combining CNN and Transformer. IEEE Trans Med Imaging 2024; 43:1934-1944. [PMID: 38198275] [DOI: 10.1109/tmi.2024.3352390]
Abstract
In recent years, an increasing number of medical engineering tasks, such as surgical navigation, pre-operative registration, and surgical robotics, rely on 3D reconstruction techniques. Self-supervised depth estimation has attracted interest in endoscopic scenarios because it does not require ground truth. Most existing methods depend on expanding the number of parameters to improve performance; designing a lightweight self-supervised model that can obtain competitive results is therefore a topic of active interest. We propose a lightweight network with a tight coupling of a convolutional neural network (CNN) and a Transformer for depth estimation. Unlike other methods that use a CNN and a Transformer to extract features separately and then fuse them at the deepest layer, we use CNN and Transformer modules to extract features at different scales in the encoder. This hierarchical structure leverages the advantages of the CNN in texture perception and the Transformer in shape extraction. At the same feature-extraction scale, the CNN acquires local features while the Transformer encodes global information. Finally, we add multi-head attention modules to the pose network to improve the accuracy of the predicted poses. Experiments demonstrate that our approach obtains comparable results while effectively compressing the model parameters on two datasets.
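The attention modules added to the pose network follow the standard scaled dot-product formulation, softmax(QKᵀ/√d)·V per head. A minimal single-head numpy sketch (generic attention with illustrative shapes, not the paper's network):

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d))  # each row sums to 1
    return weights @ V

rng = np.random.default_rng(1)
Q = rng.normal(size=(4, 16))
K = rng.normal(size=(4, 16))
V = rng.normal(size=(4, 16))
out = attention(Q, K, V)
# Each output row is a convex combination of the rows of V,
# weighted by query-key similarity.
assert out.shape == (4, 16)
```

A multi-head version simply runs several such maps on learned projections of the input and concatenates the results.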
9
Schmidt A, Mohareri O, DiMaio S, Yip MC, Salcudean SE. Tracking and mapping in medical computer vision: A review. Med Image Anal 2024; 94:103131. [PMID: 38442528] [DOI: 10.1016/j.media.2024.103131]
Abstract
As computer vision algorithms increase in capability, their applications in clinical systems will become more pervasive. These applications include diagnostics, such as colonoscopy and bronchoscopy; guiding biopsies, minimally invasive interventions, and surgery; automating instrument motion; and providing image guidance using pre-operative scans. Many of these applications depend on the specific visual nature of medical scenes and require designing algorithms to perform in this environment. In this review, we provide an update on the field of camera-based tracking and scene mapping in surgery and diagnostics in medical computer vision. We begin by describing our review process, which yields a final list of 515 papers. We then give a high-level summary of the state of the art and provide relevant background for those who need tracking and mapping for their clinical applications. Next, we review datasets provided in the field and the clinical needs that motivate their design. We then delve into the algorithmic side and summarize recent developments. This summary should be especially useful for algorithm designers and those looking to understand the capability of off-the-shelf methods. We maintain a focus on algorithms for deformable environments while also reviewing the essential building blocks of rigid tracking and mapping, since there is a large amount of crossover in methods. With the field summarized, we discuss the current state of tracking and mapping methods along with needs for future algorithms, needs for quantification, and the viability of clinical applications. We then provide some research directions and questions. We conclude that new methods need to be designed or combined to support clinical applications in deformable environments, and that more focus needs to be put into collecting datasets for training and evaluation.
Affiliation(s)
- Adam Schmidt
- Department of Electrical and Computer Engineering, University of British Columbia, 2329 West Mall, Vancouver V6T 1Z4, BC, Canada
- Omid Mohareri
- Advanced Research, Intuitive Surgical, 1020 Kifer Rd, Sunnyvale, CA 94086, USA
- Simon DiMaio
- Advanced Research, Intuitive Surgical, 1020 Kifer Rd, Sunnyvale, CA 94086, USA
- Michael C Yip
- Department of Electrical and Computer Engineering, University of California San Diego, 9500 Gilman Dr, La Jolla, CA 92093, USA
- Septimiu E Salcudean
- Department of Electrical and Computer Engineering, University of British Columbia, 2329 West Mall, Vancouver V6T 1Z4, BC, Canada
10
Guo H, Somayajula SA, Hosseini R, Xie P. Improving image classification of gastrointestinal endoscopy using curriculum self-supervised learning. Sci Rep 2024; 14:6100. [PMID: 38480815] [PMCID: PMC10937990] [DOI: 10.1038/s41598-024-53955-8]
Abstract
Endoscopy, a widely used medical procedure for examining the gastrointestinal (GI) tract to detect potential disorders, poses challenges in manual diagnosis due to non-specific symptoms and difficulties in accessing affected areas. While supervised machine learning models have proven effective in assisting the clinical diagnosis of GI disorders, the scarcity of image-label pairs created by medical experts limits their applicability. To address these limitations, we propose a curriculum self-supervised learning framework inspired by human curriculum learning. Our approach leverages the HyperKvasir dataset, which comprises 100k unlabeled GI images for pre-training and 10k labeled GI images for fine-tuning. With the proposed method, we achieved a top-1 accuracy of 88.92% and an F1 score of 73.39%, a 2.1% increase over vanilla SimSiam in top-1 accuracy and a 1.9% increase in F1 score. The combination of self-supervised learning and a curriculum-based approach demonstrates the efficacy of our framework in advancing the diagnosis of GI disorders. Our study highlights the potential of curriculum self-supervised learning in utilizing unlabeled GI tract images to improve the diagnosis of GI disorders, paving the way for more accurate and efficient diagnosis in GI endoscopy.
Affiliation(s)
- Han Guo
- Department of Electrical and Computer Engineering, University of California, San Diego, San Diego, 92093, USA
- Sai Ashish Somayajula
- Department of Electrical and Computer Engineering, University of California, San Diego, San Diego, 92093, USA
- Ramtin Hosseini
- Department of Electrical and Computer Engineering, University of California, San Diego, San Diego, 92093, USA
- Pengtao Xie
- Department of Electrical and Computer Engineering, University of California, San Diego, San Diego, 92093, USA
11
Zhang C, Wei R, Mo H, Zhai Y, Sun D. Deep learning-assisted 3D laser steering using an optofluidic laser scanner. Biomed Opt Express 2024; 15:1668-1681. [PMID: 38495701] [PMCID: PMC10942714] [DOI: 10.1364/boe.514489]
Abstract
Laser ablation is an effective treatment modality. However, current laser scanners suffer from laser defocusing when scanning targets at different depths in a 3D surgical scene. This study proposes a deep learning-assisted 3D laser steering strategy for minimally invasive surgery that eliminates laser defocusing, increases working distance, and extends scanning range. An optofluidic laser scanner is developed to conduct 3D laser steering. The optofluidic laser scanner has no mechanical moving components, enabling miniature size, lightweight, and low driving voltage. A deep learning-based monocular depth estimation method provides real-time target depth estimation so that the focal length of the laser scanner can be adjusted for laser focusing. Simulations and experiments indicate that the proposed method can significantly increase the working distance and maintain laser focusing while performing 2D laser steering, demonstrating the potential for application in minimally invasive surgery.
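Keeping the laser focused on a target at an estimated depth amounts to solving the thin-lens equation 1/f = 1/d_o + 1/d_i for the required focal length. A small sketch of that update rule (all numbers are illustrative assumptions; the scanner's actual optics are more involved):

```python
def required_focal_length(object_dist, image_dist):
    """Thin-lens equation 1/f = 1/d_o + 1/d_i, solved for f.
    object_dist: distance from the lens to the light source (fiber end).
    image_dist:  distance from the lens to the desired focus (the target)."""
    return 1.0 / (1.0 / object_dist + 1.0 / image_dist)

# Hypothetical geometry: fiber end 60 mm before the lens, and a depth
# network reporting the target 120 mm beyond it -> tune the lens to f = 40 mm.
f = required_focal_length(object_dist=60.0, image_dist=120.0)
print(round(f, 6))  # 40.0
```

As the depth estimate changes frame to frame, re-solving this equation gives the focal-length command for the tunable lens, which is the sense in which depth estimation enables continuous focusing.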
Affiliation(s)
- Chunqi Zhang
- Department of Biomedical Engineering, City University of Hong Kong, Hong Kong SAR, 999077, China
- Ruofeng Wei
- Department of Biomedical Engineering, City University of Hong Kong, Hong Kong SAR, 999077, China
- Hangjie Mo
- Department of Biomedical Engineering, City University of Hong Kong, Hong Kong SAR, 999077, China
- Yujia Zhai
- Department of Biomedical Engineering, City University of Hong Kong, Hong Kong SAR, 999077, China
- Dong Sun
- Department of Biomedical Engineering, City University of Hong Kong, Hong Kong SAR, 999077, China; Center of Robotics and Automation, Shenzhen Research Institute, Shenzhen, Guangdong, 518000, China
12
Liu S, Fan J, Yang Y, Xiao D, Ai D, Song H, Wang Y, Yang J. Monocular endoscopy images depth estimation with multi-scale residual fusion. Comput Biol Med 2024; 169:107850. [PMID: 38145602] [DOI: 10.1016/j.compbiomed.2023.107850]
Abstract
BACKGROUND Monocular depth estimation plays a fundamental role in clinical endoscopic surgery. However, the coherent illumination, smooth surfaces, and texture-less nature of endoscopy images present significant challenges to traditional depth estimation methods, which struggle to perceive depth accurately in such settings. METHOD To overcome these challenges, this paper proposes a novel multi-scale residual fusion method for estimating the depth of monocular endoscopy images. Specifically, we address the issue of coherent illumination by leveraging an image frequency-domain component space transformation, thereby enhancing the stability of the scene's light source. Moreover, we employ an image radiation intensity attenuation model to estimate the initial depth map. Finally, to refine the accuracy of depth estimation, we utilize a multi-scale residual fusion optimization technique. RESULTS To evaluate the performance of our proposed method, extensive experiments were conducted on public datasets. The structural similarity measures for continuous frames in three distinct clinical data scenes reached 0.94, 0.82, and 0.84, respectively, demonstrating the effectiveness of our approach in capturing the intricate details of endoscopy images. Furthermore, the depth estimation accuracy reached 89.3% and 91.2% on the two model datasets, respectively, underscoring the robustness of our method. CONCLUSIONS Overall, the promising results obtained on public datasets highlight the significant potential of our method for clinical applications, facilitating reliable depth estimation and enhancing the quality of endoscopic surgical procedures.
Affiliation(s)
- Shiyuan Liu
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China; China Center for Information Industry Development, Beijing, 100081, China
- Jingfan Fan
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China.
- Yun Yang
- Department of General Surgery, Beijing Friendship Hospital, Capital Medical University, National Clinical Research Center for Digestive Diseases, Beijing 100050, China
- Deqiang Xiao
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
- Danni Ai
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
- Hong Song
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China
- Yongtian Wang
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China.
- Jian Yang
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
13
Liu S, Fan J, Zang L, Yang Y, Fu T, Song H, Wang Y, Yang J. Pose estimation via structure-depth information from monocular endoscopy images sequence. Biomed Opt Express 2024; 15:460-478. [PMID: 38223180 PMCID: PMC10783895 DOI: 10.1364/boe.498262] [Received: 06/16/2023] [Revised: 12/08/2023] [Accepted: 12/14/2023] [Indexed: 01/16/2024]
Abstract
Image-based endoscopy pose estimation has been shown to significantly improve the visualization and accuracy of minimally invasive surgery (MIS). This paper proposes a method for pose estimation based on structure-depth information from a monocular endoscopy image sequence. Firstly, the initial frame location is constrained using the image structure difference (ISD) network. Secondly, endoscopy image depth information is used to estimate the pose of sequence frames. Finally, adaptive boundary constraints are used to optimize continuous frame endoscopy pose estimation, resulting in more accurate intraoperative endoscopy pose estimation. Evaluations were conducted on publicly available datasets, with the pose estimation error in bronchoscopy and colonoscopy datasets reaching 1.43 mm and 3.64 mm, respectively. These results meet the real-time requirements of various scenarios, demonstrating the capability of this method to generate reliable pose estimation results for endoscopy images and its meaningful applications in clinical practice. This method enables accurate localization of endoscopy images during surgery, assisting physicians in performing safer and more effective procedures.
Affiliation(s)
- Shiyuan Liu
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing 100081, China
- China Center for Information Industry Development, Beijing 100081, China
- Jingfan Fan
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing 100081, China
- Liugeng Zang
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing 100081, China
- Yun Yang
- Department of General Surgery, Beijing Friendship Hospital, Capital Medical University; National Clinical Research Center for Digestive Diseases, Beijing 100050, China
- Tianyu Fu
- Institute of Engineering Medicine, Beijing Institute of Technology, Beijing 100081, China
- Hong Song
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- Yongtian Wang
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing 100081, China
- Jian Yang
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing 100081, China
14
Lin G, Zhang Z, Long K, Zhang Y, Lu Y, Geng J, Zhou Z, Feng Q, Lu L, Cao L. GCLR: A self-supervised representation learning pretext task for glomerular filtration barrier segmentation in TEM images. Artif Intell Med 2023; 146:102720. [PMID: 38042604 DOI: 10.1016/j.artmed.2023.102720] [Received: 12/11/2022] [Revised: 10/04/2023] [Accepted: 11/14/2023] [Indexed: 12/04/2023]
Abstract
Automatic segmentation of the three substructures of glomerular filtration barrier (GFB) in transmission electron microscopy (TEM) images holds immense potential for aiding pathologists in renal disease diagnosis. However, the labor-intensive nature of manual annotations limits the training data for a fully-supervised deep learning model. Addressing this, our study harnesses self-supervised representation learning (SSRL) to utilize vast unlabeled data and mitigate annotation scarcity. Our innovation, GCLR, is a hybrid pixel-level pretext task tailored for GFB segmentation, integrating two subtasks: global clustering (GC) and local restoration (LR). GC captures the overall GFB by learning global context representations, while LR refines three substructures by learning local detail representations. Experiments on 18,928 unlabeled glomerular TEM images for self-supervised pre-training and 311 labeled ones for fine-tuning demonstrate that our proposed GCLR obtains the state-of-the-art segmentation results for all three substructures of GFB with the Dice similarity coefficient of 86.56 ± 0.16%, 75.56 ± 0.36%, and 79.41 ± 0.16%, respectively, compared with other representative self-supervised pretext tasks. Our proposed GCLR also outperforms the fully-supervised pre-training methods based on the three large-scale public datasets - MitoEM, COCO, and ImageNet - with less training data and time.
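The Dice similarity coefficient used for the evaluation above compares a predicted mask against a reference annotation; a minimal sketch for binary masks (the multi-substructure case reported in the abstract applies this per class):

```python
import numpy as np

def dice(pred, gt):
    """Dice similarity coefficient for binary masks (1.0 = perfect overlap)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    return 2.0 * np.logical_and(pred, gt).sum() / denom if denom else 1.0

gt = np.zeros((4, 4)); gt[:, :2] = 1      # reference: left two columns
pred = np.zeros((4, 4)); pred[:, 1:3] = 1  # prediction shifted by one column
print(dice(pred, gt))  # 2*4 / (8+8) = 0.5
```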
Affiliation(s)
- Guoyu Lin
- School of Biomedical Engineering, Southern Medical University, Guangzhou, 510515, China; Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou, 510515, China; Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou, 510515, China
- Zhentai Zhang
- School of Biomedical Engineering, Southern Medical University, Guangzhou, 510515, China; Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou, 510515, China; Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou, 510515, China
- Kaixing Long
- School of Biomedical Engineering, Southern Medical University, Guangzhou, 510515, China; Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou, 510515, China; Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou, 510515, China
- Yiwen Zhang
- School of Biomedical Engineering, Southern Medical University, Guangzhou, 510515, China; Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou, 510515, China; Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou, 510515, China
- Yanmeng Lu
- Central Laboratory, Southern Medical University, Guangzhou, 510515, China
- Jian Geng
- Department of Pathology, School of Basic Medical Sciences, Southern Medical University, Guangzhou, 510515, China; Guangzhou Huayin Medical Laboratory Center, Guangzhou, 510515, China
- Zhitao Zhou
- Central Laboratory, Southern Medical University, Guangzhou, 510515, China
- Qianjin Feng
- School of Biomedical Engineering, Southern Medical University, Guangzhou, 510515, China; Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou, 510515, China; Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou, 510515, China
- Lijun Lu
- School of Biomedical Engineering, Southern Medical University, Guangzhou, 510515, China; Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou, 510515, China; Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou, 510515, China.
- Lei Cao
- School of Biomedical Engineering, Southern Medical University, Guangzhou, 510515, China; Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou, 510515, China; Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou, 510515, China.
15
Luo X, Xie L, Zeng HQ, Wang X, Li S. Monocular endoscope 6-DoF tracking with constrained evolutionary stochastic filtering. Med Image Anal 2023; 89:102928. [PMID: 37603943 DOI: 10.1016/j.media.2023.102928] [Received: 12/03/2022] [Revised: 05/15/2023] [Accepted: 08/03/2023] [Indexed: 08/23/2023]
Abstract
Monocular endoscopic 6-DoF camera tracking plays a vital role in surgical navigation that involves multimodal images to build augmented or virtual reality surgery. Such 6-DoF camera tracking can generally be formulated as a nonlinear optimization problem. To resolve this nonlinear problem, this work proposes a new pipeline of constrained evolutionary stochastic filtering that introduces spatial constraints and evolutionary stochastic diffusion to deal with particle degeneracy and impoverishment in current stochastic filtering methods. With its application to endoscope 6-DoF tracking and validation on clinical data including more than 59,000 endoscopic video frames acquired from various surgical procedures, the experimental results demonstrate the effectiveness of the new pipeline, which substantially outperforms state-of-the-art tracking methods. In particular, it significantly improves the accuracy of current monocular endoscope tracking approaches from (4.83 mm, 10.2°) to (2.78 mm, 7.44°).
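The (mm, degrees) pairs above are translation and rotation errors. A common way to compute them, sketched here under the assumption of 4×4 homogeneous camera poses (the paper's exact error convention may differ):

```python
import numpy as np

def pose_errors(T_est, T_gt):
    """Translation error (same units as the poses, e.g. mm) and
    rotation error (degrees) between two 4x4 homogeneous camera poses."""
    t_err = np.linalg.norm(T_est[:3, 3] - T_gt[:3, 3])
    # Relative rotation; its geodesic angle is the rotation error.
    R_rel = T_est[:3, :3].T @ T_gt[:3, :3]
    cos_angle = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
    r_err = np.degrees(np.arccos(cos_angle))
    return t_err, r_err

# Example: estimate off by 2 mm along z and 5 degrees about the x-axis.
a = np.radians(5.0)
Rx = np.array([[1.0, 0.0, 0.0],
               [0.0, np.cos(a), -np.sin(a)],
               [0.0, np.sin(a),  np.cos(a)]])
T_gt = np.eye(4)
T_est = np.eye(4)
T_est[:3, :3] = Rx
T_est[2, 3] = 2.0
t_err, r_err = pose_errors(T_est, T_gt)
print(round(t_err, 3), round(r_err, 3))  # 2.0 5.0
```

The `np.clip` guards against floating-point trace values slightly outside [-1, 1], which would otherwise make `arccos` return NaN for near-identical rotations.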
Affiliation(s)
- Xiongbiao Luo
- National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen 361102, China; Department of Computer Science and Technology, Xiamen University, Xiamen 361005, China; Discipline of Intelligent Instrument and Equipment, Xiamen University, Xiamen 361102, China; Fujian Key Laboratory of Sensing and Computing for Smart Cities, Xiamen University, Xiamen 361005, China.
- Lixin Xie
- College of Pulmonary and Critical Care Medicine, Chinese PLA General Hospital, Beijing 100853, China
- Hui-Qing Zeng
- Department of Pulmonary and Critical Care Medicine, Zhongshan Hospital, Xiamen University, Xiamen 361004, China.
- Xiaoying Wang
- Department of Liver Surgery, Zhongshan Hospital, Fudan University, Shanghai 200032, China.
- Shiyue Li
- The First Affiliated Hospital of Guangzhou Medical University, Guangzhou 510120, China
16
Hirohata Y, Sogabe M, Miyazaki T, Kawase T, Kawashima K. Confidence-aware self-supervised learning for dense monocular depth estimation in dynamic laparoscopic scene. Sci Rep 2023; 13:15380. [PMID: 37717055 PMCID: PMC10505201 DOI: 10.1038/s41598-023-42713-x] [Received: 06/05/2023] [Accepted: 09/13/2023] [Indexed: 09/18/2023]
Abstract
This paper tackles the challenge of accurate depth estimation from monocular laparoscopic images in dynamic surgical environments. The lack of reliable ground truth due to inconsistencies within these images makes this a complex task. Further complicating the learning process is the presence of noise elements like bleeding and smoke. We propose a model learning framework that uses a generic laparoscopic surgery video dataset for training, aimed at achieving precise monocular depth estimation in dynamic surgical settings. The architecture employs binocular disparity confidence information as a self-supervisory signal, along with the disparity information from a stereo laparoscope. Our method ensures robust learning amidst outliers, influenced by tissue deformation, smoke, and surgical instruments, by utilizing a unique loss function. This function adjusts the selection and weighting of depth data for learning based on their given confidence. We trained the model using the Hamlyn Dataset and verified it with Hamlyn Dataset test data and a static dataset. The results show exceptional generalization performance and efficacy for various scene dynamics, laparoscope types, and surgical sites.
Affiliation(s)
- Yasuhide Hirohata
- The Department of Information Physics and Computing, The University of Tokyo, Tokyo, 113-8656, Japan
- Maina Sogabe
- The Department of Information Physics and Computing, The University of Tokyo, Tokyo, 113-8656, Japan.
- Tetsuro Miyazaki
- The Department of Information Physics and Computing, The University of Tokyo, Tokyo, 113-8656, Japan
- Toshihiro Kawase
- The School of Engineering Department of Information and Communication Engineering, Tokyo Denki University, Tokyo, 120-8551, Japan
- Kenji Kawashima
- The Department of Information Physics and Computing, The University of Tokyo, Tokyo, 113-8656, Japan
17
Yu X, Zhao J, Wu H, Wang A. A Novel Evaluation Method for SLAM-Based 3D Reconstruction of Lumen Panoramas. Sensors (Basel) 2023; 23:7188. [PMID: 37631725 PMCID: PMC10459170 DOI: 10.3390/s23167188] [Received: 03/21/2023] [Revised: 08/09/2023] [Accepted: 08/10/2023] [Indexed: 08/27/2023]
Abstract
Laparoscopy is employed in conventional minimally invasive surgery to inspect internal cavities by viewing two-dimensional images on a monitor. This method has a limited field of view and provides insufficient information for surgeons, increasing surgical complexity. Utilizing simultaneous localization and mapping (SLAM) technology to reconstruct laparoscopic scenes can offer more comprehensive and intuitive visual feedback. Moreover, the precision of the reconstructed models is a crucial factor for further applications of surgical assistance systems. However, challenges such as data scarcity and scale uncertainty hinder effective assessment of the accuracy of endoscopic monocular SLAM reconstructions. Therefore, this paper proposes a technique that incorporates existing knowledge from calibration objects to supplement metric information and resolve scale ambiguity issues, and it quantifies the endoscopic reconstruction accuracy based on local alignment metrics. The experimental results demonstrate that the reconstructed models restore realistic scales and enable error analysis for laparoscopic SLAM reconstruction systems. This suggests that for the evaluation of monocular SLAM three-dimensional (3D) reconstruction accuracy in minimally invasive surgery scenarios, our proposed scheme for recovering scale factors is viable, and our evaluation outcomes can serve as criteria for measuring reconstruction precision.
Affiliation(s)
- Xiaoyu Yu
- College of Electron and Information, University of Electronic Science and Technology of China, Zhongshan Institute, Zhongshan 528402, China
- Heilongjiang Province Key Laboratory of Laser Spectroscopy Technology and Application, Harbin University of Science and Technology, Harbin 150080, China (A.W.)
- Jianbo Zhao
- Heilongjiang Province Key Laboratory of Laser Spectroscopy Technology and Application, Harbin University of Science and Technology, Harbin 150080, China (A.W.)
- Haibin Wu
- Heilongjiang Province Key Laboratory of Laser Spectroscopy Technology and Application, Harbin University of Science and Technology, Harbin 150080, China (A.W.)
- Aili Wang
- Heilongjiang Province Key Laboratory of Laser Spectroscopy Technology and Application, Harbin University of Science and Technology, Harbin 150080, China (A.W.)
18
Amanian A, Heffernan A, Ishii M, Creighton FX, Thamboo A. The Evolution and Application of Artificial Intelligence in Rhinology: A State of the Art Review. Otolaryngol Head Neck Surg 2023; 169:21-30. [PMID: 35787221 PMCID: PMC11110957 DOI: 10.1177/01945998221110076] [Received: 03/31/2022] [Accepted: 06/10/2022] [Indexed: 11/16/2022]
Abstract
OBJECTIVE To provide a comprehensive overview on the applications of artificial intelligence (AI) in rhinology, highlight its limitations, and propose strategies for its integration into surgical practice. DATA SOURCES Medline, Embase, CENTRAL, Ei Compendex, IEEE, and Web of Science. REVIEW METHODS English studies from inception until January 2022 and those focusing on any application of AI in rhinology were included. Study selection was independently performed by 2 authors; discrepancies were resolved by the senior author. Studies were categorized by rhinology theme, and data collection comprised type of AI utilized, sample size, and outcomes, including accuracy and precision among others. CONCLUSIONS Overall, 5435 articles were identified. Following abstract and title screening, 130 articles underwent full-text review, and 59 articles were selected for analysis. Eleven studies were from the gray literature. Articles were stratified into image processing, segmentation, and diagnostics (n = 27); rhinosinusitis classification (n = 14); treatment and disease outcome prediction (n = 8); optimizing surgical navigation and phase assessment (n = 3); robotic surgery (n = 2); olfactory dysfunction (n = 2); and diagnosis of allergic rhinitis (n = 3). Most AI studies were published from 2016 onward (n = 45). IMPLICATIONS FOR PRACTICE This state-of-the-art review aimed to highlight the increasing applications of AI in rhinology. Next steps will entail multidisciplinary collaboration to ensure data integrity, ongoing validation of AI algorithms, and integration into clinical practice. Future research should be directed at the interplay of AI with robotics and surgical education.
Affiliation(s)
- Ameen Amanian
- Division of Otolaryngology–Head and Neck Surgery, Department of Surgery, University of British Columbia, Vancouver, Canada
- Austin Heffernan
- Division of Otolaryngology–Head and Neck Surgery, Department of Surgery, University of British Columbia, Vancouver, Canada
- Masaru Ishii
- Department of Otolaryngology–Head and Neck Surgery, School of Medicine, Johns Hopkins University, Baltimore, Maryland, USA
- Francis X. Creighton
- Department of Otolaryngology–Head and Neck Surgery, School of Medicine, Johns Hopkins University, Baltimore, Maryland, USA
- Andrew Thamboo
- Division of Otolaryngology–Head and Neck Surgery, Department of Surgery, University of British Columbia, Vancouver, Canada
19
Liu R, Liu Z, Lu J, Zhang G, Zuo Z, Sun B, Zhang J, Sheng W, Guo R, Zhang L, Hua X. Sparse-to-dense coarse-to-fine depth estimation for colonoscopy. Comput Biol Med 2023; 160:106983. [PMID: 37187133 DOI: 10.1016/j.compbiomed.2023.106983] [Received: 02/23/2023] [Revised: 04/17/2023] [Accepted: 04/27/2023] [Indexed: 05/17/2023]
Abstract
Colonoscopy, as the gold standard for screening colon cancer and diseases, offers considerable benefits to patients. However, it also imposes challenges on diagnosis and potential surgery due to the narrow observation perspective and limited perception dimension. Dense depth estimation can overcome the above limitations and offer doctors straightforward 3D visual feedback. To this end, we propose a novel sparse-to-dense coarse-to-fine depth estimation solution for colonoscopic scenes based on the direct SLAM algorithm. The highlight of our solution is that we utilize the scattered 3D points obtained from SLAM to generate accurate and dense depth in full resolution. This is done by a deep learning (DL)-based depth completion network and a reconstruction system. The depth completion network effectively extracts texture, geometry, and structure features from sparse depth along with RGB data to recover the dense depth map. The reconstruction system further updates the dense depth map using a photometric error-based optimization and a mesh modeling approach to reconstruct a more accurate 3D model of colons with detailed surface texture. We show the effectiveness and accuracy of our depth estimation method on near photo-realistic challenging colon datasets. Experiments demonstrate that the strategy of sparse-to-dense coarse-to-fine can significantly improve the performance of depth estimation and smoothly fuse direct SLAM and DL-based depth estimation into a complete dense reconstruction system.
Affiliation(s)
- Ruyu Liu
- School of Information Science and Technology, Hangzhou Normal University, Hangzhou, 311121, China; Haixi Institutes, Chinese Academy of Sciences Quanzhou Institute of Equipment Manufacturing, Quanzhou, 362000, China
- Zhengzhe Liu
- School of Information Science and Technology, Hangzhou Normal University, Hangzhou, 311121, China
- Jiaming Lu
- School of Computer Science and Engineering, Tianjin University of Technology, Tianjin, 300384, China
- Guodao Zhang
- Department of Digital Media Technology, Hangzhou Dianzi University, Hangzhou, 310018, China
- Zhigui Zuo
- Department of Colorectal Surgery, the First Affiliated Hospital of Wenzhou Medical University, Wenzhou, 325035, China
- Bo Sun
- Haixi Institutes, Chinese Academy of Sciences Quanzhou Institute of Equipment Manufacturing, Quanzhou, 362000, China
- Jianhua Zhang
- School of Computer Science and Engineering, Tianjin University of Technology, Tianjin, 300384, China
- Weiguo Sheng
- School of Information Science and Technology, Hangzhou Normal University, Hangzhou, 311121, China
- Ran Guo
- Cyberspace Institute Advanced Technology, Guangzhou University, Guangzhou, 510006, China.
- Lejun Zhang
- Cyberspace Institute Advanced Technology, Guangzhou University, Guangzhou, 510006, China; College of Information Engineering, Yangzhou University, Yangzhou, 225127, China; Research and Development Center for E-Learning, Ministry of Education, Beijing, 100039, China
- Xiaozhen Hua
- Department of Pediatrics, Cangnan Affiliated Hospital of Wenzhou Medical University, Wenzhou, 325800, China.
20
Deng Z, Jiang P, Guo Y, Zhang S, Hu Y, Zheng X, He B. Safety-aware robotic steering of a flexible endoscope for nasotracheal intubation. Biomed Signal Process Control 2023. [DOI: 10.1016/j.bspc.2022.104504] [Indexed: 01/04/2023]
21
Horovistiz A, Oliveira M, Araújo H. Computer vision-based solutions to overcome the limitations of wireless capsule endoscopy. J Med Eng Technol 2023; 47:242-261. [PMID: 38231042 DOI: 10.1080/03091902.2024.2302025] [Received: 09/09/2022] [Accepted: 12/28/2023] [Indexed: 01/18/2024]
Abstract
Endoscopic investigation plays a critical role in the diagnosis of gastrointestinal (GI) diseases. Since 2001, Wireless Capsule Endoscopy (WCE) has been available for small bowel exploration and is in continuous development. Over the last decade, WCE has achieved impressive improvements in areas such as miniaturisation, image quality and battery life. As a result, WCE is currently a very useful alternative to wired enteroscopy in the investigation of various small bowel abnormalities and has the potential to become the leading screening technique for the entire gastrointestinal tract. However, commercial solutions still have several limitations, namely incomplete examination and limited diagnostic capacity. These deficiencies are related to technical issues, such as image quality, motion estimation and power consumption management. Computational methods, based on image processing and analysis, can help to overcome these challenges and reduce both the time required by reviewers and human interpretation errors. Research groups have proposed a series of methods including algorithms for locating the capsule or lesion, assessing intestinal motility and improving image quality. In this work, we provide a critical review of computational vision-based methods for WCE image analysis aimed at overcoming the technological challenges of capsules. This article also reviews several representative public datasets used to evaluate the performance of WCE techniques and methods. Finally, some promising solutions of computational methods based on the analysis of multiple-camera endoscopic images are presented.
Affiliation(s)
- Ana Horovistiz
- Institute of Systems and Robotics, University of Coimbra, Coimbra, Portugal
- Marina Oliveira
- Institute of Systems and Robotics, University of Coimbra, Coimbra, Portugal
- Department of Electrical and Computer Engineering (DEEC), Faculty of Sciences and Technology, University of Coimbra, Coimbra, Portugal
- Helder Araújo
- Institute of Systems and Robotics, University of Coimbra, Coimbra, Portugal
- Department of Electrical and Computer Engineering (DEEC), Faculty of Sciences and Technology, University of Coimbra, Coimbra, Portugal
22
Lee H, Park J, Jeong W, Jung SW. Monocular depth estimation network with single-pixel depth guidance. Opt Lett 2023; 48:594-597. [PMID: 36723539 DOI: 10.1364/ol.478375] [Received: 10/17/2022] [Accepted: 12/14/2022] [Indexed: 06/18/2023]
Abstract
Due to the scale ambiguity problem, the performance of monocular depth estimation (MDE) is inherently restricted. Multi-camera systems, especially those equipped with active depth cameras, have addressed this problem at the expense of increased hardware costs and space. In this Letter, we adopt a similar but cost-effective solution using only single-pixel depth guidance with a single-photon avalanche diode. To this end, we design a single-pixel guidance module (SPGM) that combines the global information from the single-pixel depth guidance with the spatial information from the image at the feature level. By integrating SPGMs into an MDE network, we introduce PhoMoNet, the first, to the best of our knowledge, end-to-end MDE network with single-pixel depth guidance. Experimental results show the effectiveness and superiority of PhoMoNet over state-of-the-art MDE networks on synthetic and real-world datasets.
23
Ali S. Where do we stand in AI for endoscopic image analysis? Deciphering gaps and future directions. NPJ Digit Med 2022; 5:184. [PMID: 36539473 PMCID: PMC9767933 DOI: 10.1038/s41746-022-00733-3] [Received: 07/04/2022] [Accepted: 11/29/2022] [Indexed: 12/24/2022]
Abstract
Recent developments in deep learning have enabled data-driven algorithms that can reach human-level performance and beyond. The development and deployment of medical image analysis methods have several challenges, including data heterogeneity due to population diversity and different device manufacturers. In addition, more input from experts is required for a reliable method development process. While the exponential growth in clinical imaging data has enabled deep learning to flourish, data heterogeneity, multi-modality, and rare or inconspicuous disease cases still need to be explored. Because endoscopy is highly operator-dependent, with grim clinical outcomes in some disease cases, reliable and accurate automated system guidance can improve patient care. Most existing methods need to generalise better to unseen target data, patient population variability, and variable disease appearances. The paper reviews recent works on endoscopic image analysis with artificial intelligence (AI) and emphasises the current unmatched needs in this field. Finally, it outlines the future directions for clinically relevant complex AI solutions to improve patient outcomes.
Affiliation(s)
- Sharib Ali
- School of Computing, University of Leeds, LS2 9JT, Leeds, UK.
24
Psychogyios D, Mazomenos E, Vasconcelos F, Stoyanov D. MSDESIS: Multitask Stereo Disparity Estimation and Surgical Instrument Segmentation. IEEE Trans Med Imaging 2022; 41:3218-3230. [PMID: 35675257 PMCID: PMC7613770 DOI: 10.1109/tmi.2022.3181229] [Indexed: 06/15/2023]
Abstract
Reconstructing the 3D geometry of the surgical site and detecting instruments within it are important tasks for surgical navigation systems and robotic surgery automation. Traditional approaches treat each problem in isolation and do not account for the intrinsic relationship between segmentation and stereo matching. In this paper, we present a learning-based framework that jointly estimates disparity and binary tool segmentation masks. The core component of our architecture is a shared feature encoder which allows strong interaction between the aforementioned tasks. Experimentally, we train two variants of our network with different capacities and explore different training schemes including both multi-task and single-task learning. Our results show that supervising the segmentation task improves our network's disparity estimation accuracy. We demonstrate a domain adaptation scheme where we supervise the segmentation task with monocular data and achieve domain adaptation of the adjacent disparity task, reducing disparity End-Point-Error and depth mean absolute error by 77.73% and 61.73% respectively compared to the pre-trained baseline model. Our best overall multi-task model, trained with both disparity and segmentation data in subsequent phases, achieves 89.15% mean Intersection-over-Union on the RIS test set and 3.18 millimetre depth mean absolute error on the SCARED test set. Our proposed multi-task architecture is real-time, able to process (1280×1024) stereo input and simultaneously estimate disparity maps and segmentation masks at 22 frames per second. The model code and pre-trained models are made available: https://github.com/dimitrisPs/msdesis.
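The End-Point-Error and Intersection-over-Union figures above are standard metrics for the two tasks. As a point of reference only (the paper's validity masking may differ), minimal numpy versions of both:

```python
import numpy as np

def end_point_error(disp_pred, disp_gt, valid=None):
    """Mean absolute disparity difference (EPE) over valid pixels."""
    err = np.abs(disp_pred - disp_gt)
    if valid is not None:
        err = err[valid]
    return float(err.mean())

def binary_iou(mask_pred, mask_gt):
    """Intersection-over-Union for binary tool segmentation masks."""
    inter = np.logical_and(mask_pred, mask_gt).sum()
    union = np.logical_or(mask_pred, mask_gt).sum()
    return float(inter) / float(union) if union else 1.0

disp_gt = np.full((4, 4), 10.0)
disp_pred = disp_gt + 0.5                       # uniformly half a pixel off
mask_gt = np.zeros((4, 4), bool); mask_gt[:, :2] = True
mask_pred = np.zeros((4, 4), bool); mask_pred[:, 1:3] = True
print(end_point_error(disp_pred, disp_gt))      # 0.5
print(binary_iou(mask_pred, mask_gt))           # 4/12 ≈ 0.33
```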
|
25
|
Yang B, Xu S, Chen H, Zheng W, Liu C. Reconstruct Dynamic Soft-Tissue With Stereo Endoscope Based on a Single-Layer Network. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2022; 31:5828-5840. [PMID: 36054398 DOI: 10.1109/tip.2022.3202367] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/15/2023]
Abstract
In dynamic minimally invasive surgery environments, 3D reconstruction of deformable soft-tissue surfaces from stereo endoscopic images is very challenging. A simple self-supervised stereo reconstruction framework is proposed to address this issue, bridging traditional geometric deformable models and the newly revived neural networks. The equivalence between the classical thin plate spline (TPS) model and a single-layer fully-connected or convolutional network is studied. By alternately training two TPS-equivalent networks within the self-supervised framework, disparity priors are learnt from past stereo frames of the target tissue to form an optimized disparity basis, on which disparity maps of subsequent frames can be estimated more accurately without sacrificing computational efficiency or robustness. The proposed method was verified on stereo-endoscopic videos recorded by da Vinci® surgical robots.
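As a rough illustration of the TPS-network equivalence mentioned above: the classical TPS interpolant is an affine term plus a weighted sum of fixed radial basis kernels U(r) = r² log r², which is structurally the same computation as one fully-connected layer with fixed basis activations. A sketch with made-up control points and weights, not values learned by this paper's method:

```python
import math

def tps_kernel(r_sq: float) -> float:
    # Thin plate spline radial basis U(r) = r^2 * log(r^2), defined as 0 at r = 0.
    return 0.0 if r_sq == 0.0 else r_sq * math.log(r_sq)

def tps_eval(x, y, controls, weights, affine):
    """Evaluate f(x, y) = a0 + ax*x + ay*y + sum_i w_i * U(||(x, y) - c_i||^2)."""
    a0, ax, ay = affine
    value = a0 + ax * x + ay * y
    for (cx, cy), w in zip(controls, weights):
        value += w * tps_kernel((x - cx) ** 2 + (y - cy) ** 2)
    return value

# With all kernel weights zero, only the affine part remains:
controls = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
val = tps_eval(0.5, 0.5, controls, [0.0, 0.0, 0.0], affine=(1.0, 2.0, 3.0))
```

Training the weights of such a fixed-basis layer is what makes the TPS model expressible as a single-layer network.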
|
26
|
Masoumian A, Rashwan HA, Cristiano J, Asif MS, Puig D. Monocular Depth Estimation Using Deep Learning: A Review. SENSORS (BASEL, SWITZERLAND) 2022; 22:5353. [PMID: 35891033 PMCID: PMC9325018 DOI: 10.3390/s22145353] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 05/10/2022] [Revised: 07/01/2022] [Accepted: 07/15/2022] [Indexed: 06/15/2023]
Abstract
In recent decades, significant advances in robotics engineering and autonomous vehicles have increased the demand for precise depth measurement. Depth estimation (DE) is a classical computer vision task that can be addressed by numerous procedures, and it is vital in disparate applications such as augmented reality and target tracking. Conventional monocular DE (MDE) procedures rely on depth cues for depth prediction, while various deep learning techniques have demonstrated their potential for managing this traditionally ill-posed problem. The principal purpose of this paper is to present a state-of-the-art review of current developments in MDE based on deep learning techniques. To this end, the paper highlights the critical points of state-of-the-art work on MDE from disparate aspects, including input data shapes and training manners such as supervised, semi-supervised, and unsupervised learning approaches, in combination with different datasets and evaluation indicators. Finally, limitations regarding the accuracy of DL-based MDE models, computational time requirements, real-time inference, transferability, input image shape and domain adaptation, and generalization are discussed to open new directions for future research.
Affiliation(s)
- Armin Masoumian
- Department of Computer Engineering and Mathematics, University of Rovira i Virgili, 43007 Tarragona, Spain
- Department of Electrical and Computer Engineering, University of California, Riverside, CA 92521, USA
- Hatem A. Rashwan
- Department of Computer Engineering and Mathematics, University of Rovira i Virgili, 43007 Tarragona, Spain
- Julián Cristiano
- Department of Computer Engineering and Mathematics, University of Rovira i Virgili, 43007 Tarragona, Spain
- M. Salman Asif
- Department of Electrical and Computer Engineering, University of California, Riverside, CA 92521, USA
- Domenec Puig
- Department of Computer Engineering and Mathematics, University of Rovira i Virgili, 43007 Tarragona, Spain
|
27
|
Oda M, Itoh H, Tanaka K, Takabatake H, Mori M, Natori H, Mori K. Depth estimation from single-shot monocular endoscope image using image domain adaptation and edge-aware depth estimation. COMPUTER METHODS IN BIOMECHANICS AND BIOMEDICAL ENGINEERING: IMAGING & VISUALIZATION 2022. [DOI: 10.1080/21681163.2021.2012835] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/19/2022]
Affiliation(s)
- Masahiro Oda
- Information and Communications, Nagoya University, Nagoya, Japan
- Graduate School of Informatics, Nagoya University, Nagoya, Japan
- Hayato Itoh
- Graduate School of Informatics, Nagoya University, Nagoya, Japan
- Kiyohito Tanaka
- Department of Gastroenterology, Kyoto Second Red Cross Hospital, Kyoto, Japan
- Hirotsugu Takabatake
- Department of Respiratory Medicine, Sapporo-Minami-Sanjo Hospital, Sapporo, Japan
- Masaki Mori
- Department of Respiratory Medicine, Sapporo-Kosei General Hospital, Sapporo, Japan
- Hiroshi Natori
- Department of Respiratory Medicine, Keiwakai Nishioka Hospital, Sapporo, Japan
- Kensaku Mori
- Information and Communications, Nagoya University, Nagoya, Japan
- Graduate School of Informatics, Nagoya University, Nagoya, Japan
- Research Center for Medical Bigdata, National Institute of Informatics, Tokyo, Japan
|
28
|
Xu C, Huang B, Elson DS. Self-supervised Monocular Depth Estimation with 3D Displacement Module for Laparoscopic Images. IEEE TRANSACTIONS ON MEDICAL ROBOTICS AND BIONICS 2022; 4:331-334. [PMID: 36148138 PMCID: PMC7613618 DOI: 10.1109/tmrb.2022.3170206] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/16/2023]
Abstract
We present a novel self-supervised training framework with a 3D displacement (3DD) module for accurately estimating per-pixel depth maps from single laparoscopic images. Recently, several self-supervised monocular depth estimation models have achieved good results on the KITTI dataset under the hypothesis that the camera is dynamic and the objects are stationary; however, this hypothesis is often reversed in the surgical setting (the laparoscope is stationary while the surgical instruments and tissues are dynamic). Therefore, a 3DD module is proposed to establish the relation between frames instead of ego-motion estimation. In the 3DD module, a convolutional neural network (CNN) analyses source and target frames to predict the 3D displacement of a 3D point cloud from the target frame to the source frame in camera coordinates. Since it is difficult to constrain the depth displacement from two 2D images, a novel depth consistency module is proposed that maintains consistency between displacement-updated depth and model-estimated depth, constraining the 3D displacement effectively. Our proposed method achieves remarkable performance for monocular depth estimation on the Hamlyn surgical dataset and acquired ground-truth depth maps, outperforming the monodepth, monodepth2 and packnet models.
Affiliation(s)
- Chi Xu
- The Hamlyn Centre for Robotic Surgery, Department of Surgery and Cancer, Imperial College London, London SW7 2AZ, UK
- Baoru Huang
- The Hamlyn Centre for Robotic Surgery, Department of Surgery and Cancer, Imperial College London, London SW7 2AZ, UK
- Daniel S. Elson
- The Hamlyn Centre for Robotic Surgery, Department of Surgery and Cancer, Imperial College London, London SW7 2AZ, UK
|
29
|
Huang B, Nguyen A, Wang S, Wang Z, Mayer E, Tuch D, Vyas K, Giannarou S, Elson DS. Simultaneous Depth Estimation and Surgical Tool Segmentation in Laparoscopic Images. IEEE TRANSACTIONS ON MEDICAL ROBOTICS AND BIONICS 2022; 4:335-338. [PMID: 36148137 PMCID: PMC7613616 DOI: 10.1109/tmrb.2022.3170215] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/16/2023]
Abstract
Surgical instrument segmentation and depth estimation are crucial steps to improve autonomy in robotic surgery. Most recent works treat these problems separately, making the deployment challenging. In this paper, we propose a unified framework for depth estimation and surgical tool segmentation in laparoscopic images. The network has an encoder-decoder architecture and comprises two branches for simultaneously performing depth estimation and segmentation. To train the network end to end, we propose a new multi-task loss function that effectively learns to estimate depth in an unsupervised manner, while requiring only semi-ground truth for surgical tool segmentation. We conducted extensive experiments on different datasets to validate these findings. The results showed that the end-to-end network successfully improved the state-of-the-art for both tasks while reducing the complexity during their deployment.
Affiliation(s)
- Baoru Huang
- The Hamlyn Centre for Robotic Surgery, Imperial College London, SW7 2AZ, UK
- Department of Surgery & Cancer, Imperial College London, SW7 2AZ, UK
- Anh Nguyen
- The Hamlyn Centre for Robotic Surgery, Imperial College London, SW7 2AZ, UK
- Department of Computer Science, University of Liverpool, UK
- Siyao Wang
- The Hamlyn Centre for Robotic Surgery, Imperial College London, SW7 2AZ, UK
- Ziyang Wang
- Department of Computer Science, University of Oxford, UK
- Erik Mayer
- Department of Surgery & Cancer, Imperial College London, SW7 2AZ, UK
- Stamatia Giannarou
- The Hamlyn Centre for Robotic Surgery, Imperial College London, SW7 2AZ, UK
- Department of Surgery & Cancer, Imperial College London, SW7 2AZ, UK
- Daniel S Elson
- The Hamlyn Centre for Robotic Surgery, Imperial College London, SW7 2AZ, UK
- Department of Surgery & Cancer, Imperial College London, SW7 2AZ, UK
|
30
|
Liu S, Fan J, Song D, Fu T, Lin Y, Xiao D, Song H, Wang Y, Yang J. Joint estimation of depth and motion from a monocular endoscopy image sequence using a multi-loss rebalancing network. BIOMEDICAL OPTICS EXPRESS 2022; 13:2707-2727. [PMID: 35774318 PMCID: PMC9203100 DOI: 10.1364/boe.457475] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/01/2022] [Revised: 04/01/2022] [Accepted: 04/01/2022] [Indexed: 06/15/2023]
Abstract
Building an in vivo three-dimensional (3D) surface model from a monocular endoscopy is an effective technology for improving the intuitiveness and precision of clinical laparoscopic surgery. This paper proposes a multi-loss rebalancing-based method for joint estimation of depth and motion from a monocular endoscopy image sequence. Feature descriptors are used to provide supervisory signals for the depth estimation network and the motion estimation network. Epipolar constraints between sequential frames are incorporated, together with neighborhood spatial information, by the depth estimation network to enhance the accuracy of depth estimation. The reprojection information from depth estimation is used by the motion estimation network to reconstruct the camera motion with a multi-view relative pose fusion mechanism. Relative response loss, feature consistency loss, and epipolar consistency loss functions are defined to improve the robustness and accuracy of the proposed unsupervised learning-based method. Evaluations are performed on public datasets. The error of motion estimation in three scenes decreased by 42.1%, 53.6%, and 50.2%, respectively, and the average error of 3D reconstruction is 6.456 ± 1.798 mm. This demonstrates the method's capability to generate reliable depth estimation and trajectory reconstruction results for endoscopy images, with meaningful applications in clinical practice.
Affiliation(s)
- Shiyuan Liu
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
- Jingfan Fan
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
- Dengpan Song
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
- Tianyu Fu
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
- Yucong Lin
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
- Deqiang Xiao
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
- Hong Song
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China
- Yongtian Wang
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
- Jian Yang
- Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing, 100081, China
|
31
|
Gruijthuijsen C, Garcia-Peraza-Herrera LC, Borghesan G, Reynaerts D, Deprest J, Ourselin S, Vercauteren T, Vander Poorten E. Robotic Endoscope Control Via Autonomous Instrument Tracking. Front Robot AI 2022; 9:832208. [PMID: 35480090 PMCID: PMC9035496 DOI: 10.3389/frobt.2022.832208] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/09/2021] [Accepted: 02/17/2022] [Indexed: 11/13/2022] Open
Abstract
Many keyhole interventions rely on bi-manual handling of surgical instruments, forcing the main surgeon to rely on a second surgeon to act as a camera assistant. In addition to the burden of excessively involving surgical staff, this may lead to reduced image stability, increased task completion time and sometimes errors due to the monotony of the task. Robotic endoscope holders, controlled by a set of basic instructions, have been proposed as an alternative, but their unnatural handling may increase the cognitive load of the (solo) surgeon, which hinders their clinical acceptance. More seamless integration in the surgical workflow would be achieved if robotic endoscope holders collaborated with the operating surgeon via semantically rich instructions that closely resemble instructions that would otherwise be issued to a human camera assistant, such as “focus on my right-hand instrument.” As a proof of concept, this paper presents a novel system that paves the way towards a synergistic interaction between surgeons and robotic endoscope holders. The proposed platform allows the surgeon to perform a bimanual coordination and navigation task, while a robotic arm autonomously performs the endoscope positioning tasks. Within our system, we propose a novel tooltip localization method based on surgical tool segmentation and a novel visual servoing approach that ensures smooth and appropriate motion of the endoscope camera. We validate our vision pipeline and run a user study of this system. The clinical relevance of the study is ensured through the use of a laparoscopic exercise validated by the European Academy of Gynaecological Surgery which involves bi-manual coordination and navigation. Successful application of our proposed system provides a promising starting point towards broader clinical adoption of robotic endoscope holders.
Affiliation(s)
- Luis C. Garcia-Peraza-Herrera
- Department of Medical Physics and Biomedical Engineering, University College London, London, United Kingdom
- Department of Surgical and Interventional Engineering, King’s College London, London, United Kingdom
- *Correspondence: Luis C. Garcia-Peraza-Herrera
- Gianni Borghesan
- Department of Mechanical Engineering, KU Leuven, Leuven, Belgium
- Core Lab ROB, Flanders Make, Lommel, Belgium
- Jan Deprest
- Department of Development and Regeneration, Division Woman and Child, KU Leuven, Leuven, Belgium
- Sebastien Ourselin
- Department of Surgical and Interventional Engineering, King’s College London, London, United Kingdom
- Tom Vercauteren
- Department of Surgical and Interventional Engineering, King’s College London, London, United Kingdom
|
32
|
Luo H, Wang C, Duan X, Liu H, Wang P, Hu Q, Jia F. Unsupervised learning of depth estimation from imperfect rectified stereo laparoscopic images. Comput Biol Med 2022; 140:105109. [PMID: 34891097 DOI: 10.1016/j.compbiomed.2021.105109] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/01/2021] [Revised: 11/30/2021] [Accepted: 12/02/2021] [Indexed: 11/03/2022]
Abstract
BACKGROUND Learning-based methods have achieved remarkable performance in depth estimation. However, most self-supervised and unsupervised learning methods are premised on rigorous, geometrically-aligned stereo rectification, and their performance degrades when the rectification is not accurate. We therefore explore an approach for unsupervised depth estimation from stereo images that can handle imperfect camera parameters. METHODS We propose an unsupervised deep convolutional network that takes rectified stereo image pairs as input and outputs corresponding dense disparity maps. First, a new vertical correction module is designed to predict a correction map that compensates for imperfect geometric alignment. Second, the left and right images, reconstructed from the input image pair using the corresponding disparities and the vertical correction maps, are regarded as the outputs of the generative term of a generative adversarial network (GAN). The discriminator term of the GAN then distinguishes the reconstructed images from the original inputs, forcing the generator to output increasingly realistic images. In addition, a residual mask is introduced to exclude pixels that conflict with the appearance of the original image from the loss calculation. RESULTS The proposed model is validated on the publicly available Stereo Correspondence and Reconstruction of Endoscopic Data (SCARED) dataset, with an average MAE of 3.054 mm. CONCLUSION Our model can effectively handle imperfectly rectified stereo images for depth estimation.
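The reconstruction idea underlying this GAN's generative term can be shown on a single scanline: synthesise the left image by sampling the right image at x − d, then penalise the photometric difference. A toy sketch with integer disparities and border padding (real systems use sub-pixel bilinear sampling over full images, plus the vertical correction this paper adds):

```python
def reconstruct_left(right_row, disparity_row):
    """Synthesise a left scanline by sampling the right scanline at x - d."""
    recon = []
    for x, d in enumerate(disparity_row):
        src = x - d
        # Out-of-view pixels are clamped to the border (a common convention).
        src = min(max(src, 0), len(right_row) - 1)
        recon.append(right_row[src])
    return recon

def photometric_l1(left_row, recon_row):
    # Mean absolute photometric difference between the real and synthesised rows.
    return sum(abs(a - b) for a, b in zip(left_row, recon_row)) / len(left_row)

right = [10, 20, 30, 40, 50]
left = [10, 10, 20, 30, 40]          # the scene shifted right by one pixel
recon = reconstruct_left(right, [1, 1, 1, 1, 1])
loss = photometric_l1(left, recon)   # correct disparity -> zero photometric loss
```

A wrong disparity estimate samples the wrong pixels, raising the loss; this is the signal that trains the network without ground-truth depth.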
Affiliation(s)
- Huoling Luo
- Research Lab for Medical Imaging and Digital Surgery, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; Shenzhen College of Advanced Technology, University of Chinese Academy of Sciences, Shenzhen, China
- Congcong Wang
- School of Computer Science and Engineering, Tianjin University of Technology, Tianjin, China; Department of Computer Science, Norwegian University of Science and Technology, Trondheim, Norway
- Xingguang Duan
- Advanced Innovation Centre for Intelligent Robots & Systems, Beijing Institute of Technology, Beijing, China
- Hao Liu
- State Key Lab for Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang, China
- Ping Wang
- Department of Hepatobiliary Surgery, The First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Qingmao Hu
- Research Lab for Medical Imaging and Digital Surgery, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; Shenzhen College of Advanced Technology, University of Chinese Academy of Sciences, Shenzhen, China
- Fucang Jia
- Research Lab for Medical Imaging and Digital Surgery, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; Shenzhen College of Advanced Technology, University of Chinese Academy of Sciences, Shenzhen, China; Pazhou Lab, Guangzhou, China
|
33
|
Shao S, Pei Z, Chen W, Zhu W, Wu X, Sun D, Zhang B. Self-Supervised monocular depth and ego-Motion estimation in endoscopy: Appearance flow to the rescue. Med Image Anal 2021; 77:102338. [PMID: 35016079 DOI: 10.1016/j.media.2021.102338] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/24/2021] [Revised: 10/24/2021] [Accepted: 12/14/2021] [Indexed: 11/25/2022]
Abstract
Recently, self-supervised learning technology has been applied to calculate depth and ego-motion from monocular videos, achieving remarkable performance in autonomous driving scenarios. One widely adopted assumption of depth and ego-motion self-supervised learning is that the image brightness remains constant within nearby frames. Unfortunately, the endoscopic scene does not meet this assumption because there are severe brightness fluctuations induced by illumination variations, non-Lambertian reflections and interreflections during data collection, and these brightness fluctuations inevitably deteriorate the depth and ego-motion estimation accuracy. In this work, we introduce a novel concept referred to as appearance flow to address the brightness inconsistency problem. The appearance flow takes into consideration any variations in the brightness pattern and enables us to develop a generalized dynamic image constraint. Furthermore, we build a unified self-supervised framework to estimate monocular depth and ego-motion simultaneously in endoscopic scenes, which comprises a structure module, a motion module, an appearance module and a correspondence module, to accurately reconstruct the appearance and calibrate the image brightness. Extensive experiments are conducted on the SCARED dataset and EndoSLAM dataset, and the proposed unified framework exceeds other self-supervised approaches by a large margin. To validate our framework's generalization ability on different patients and cameras, we train our model on SCARED but test it on the SERV-CT and Hamlyn datasets without any fine-tuning, and the superior results reveal its strong generalization ability. Code is available at: https://github.com/ShuweiShao/AF-SfMLearner.
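The brightness-inconsistency problem this work targets can be seen in miniature: a raw L1 photometric loss penalises a pure illumination shift between frames, while comparing mean/variance-normalised intensities does not. The sketch below illustrates only that underlying idea, not the paper's appearance-flow module:

```python
import math

def normalise(patch):
    # Zero-mean, unit-variance normalisation of a flat list of intensities.
    mean = sum(patch) / len(patch)
    var = sum((p - mean) ** 2 for p in patch) / len(patch)
    std = math.sqrt(var) or 1.0  # guard against a constant patch
    return [(p - mean) / std for p in patch]

def l1(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

frame_a = [10.0, 20.0, 30.0, 40.0]
frame_b = [30.0, 40.0, 50.0, 60.0]   # same structure, brighter illumination

raw_loss = l1(frame_a, frame_b)                         # raw L1 is fooled by brightness
norm_loss = l1(normalise(frame_a), normalise(frame_b))  # ~0 after calibration
```

Appearance flow generalises this global calibration to a spatially varying brightness correction predicted per pixel.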
Affiliation(s)
- Shuwei Shao
- School of Automation Science and Electrical Engineering, Beihang University, Beijing, China
- Zhongcai Pei
- School of Automation Science and Electrical Engineering, Beihang University, Beijing, China; Hangzhou Innovation Institute, Beihang University, Hangzhou, China
- Weihai Chen
- School of Automation Science and Electrical Engineering, Beihang University, Beijing, China; Hangzhou Innovation Institute, Beihang University, Hangzhou, China
- Xingming Wu
- School of Automation Science and Electrical Engineering, Beihang University, Beijing, China
- Dianmin Sun
- Shandong Cancer Hospital Affiliated to Shandong University, Shandong First Medical University and Shandong Academy of Medical Sciences, Jinan, China
- Baochang Zhang
- Institute of Artificial Intelligence, Beihang University, Beijing, China
|
34
|
Banach A, King F, Masaki F, Tsukada H, Hata N. Visually Navigated Bronchoscopy using three cycle-Consistent generative adversarial network for depth estimation. Med Image Anal 2021; 73:102164. [PMID: 34314953 PMCID: PMC8453111 DOI: 10.1016/j.media.2021.102164] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/02/2021] [Revised: 06/29/2021] [Accepted: 07/06/2021] [Indexed: 11/30/2022]
Abstract
BACKGROUND Electromagnetically Navigated Bronchoscopy (ENB) is currently the state-of-the-art in diagnostic and interventional bronchoscopy. CT-to-body divergence is a critical hurdle in ENB, causing navigation error and ultimately limiting the clinical efficacy of diagnosis and treatment. In this study, Visually Navigated Bronchoscopy (VNB) is proposed to address this issue of CT-to-body divergence. MATERIALS AND METHODS We extended and validated an unsupervised learning method that generates a depth map directly from bronchoscopic images using a Three Cycle-Consistent Generative Adversarial Network (3cGAN) and registers the depth map to preprocedural CTs. We tested the working hypothesis that the proposed VNB can be integrated into a navigated bronchoscopic system based on 3D Slicer and can accurately register bronchoscopic images to pre-procedural CTs to navigate transbronchial biopsies. The quantitative metrics used to assess this hypothesis were the Absolute Tracking Error (ATE) of tracking and the Target Registration Error (TRE) of the total navigation system. We validated our method on phantoms produced from the pre-procedural CTs of five patients who underwent ENB and on two ex-vivo pig lung specimens. RESULTS The ATE using 3cGAN was 6.2 ± 2.9 mm. The ATE of 3cGAN was statistically significantly lower than that of cGAN, particularly in the trachea and lobar bronchus (p < 0.001). The TRE of the proposed method ranged from 11.7 to 40.5 mm. The TRE computed by 3cGAN was statistically significantly smaller than that computed by cGAN in two of the five cases enrolled (p < 0.05). CONCLUSION VNB using 3cGAN to generate the depth maps was technically and clinically feasible. While the accuracy of tracking by cGAN was acceptable, the TRE warrants further investigation and improvement.
Affiliation(s)
- Artur Banach
- National Center for Image-guided Therapy, Department of Radiology, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, United States; QUT Centre for Robotics, Queensland University of Technology, Brisbane, Australia
- Franklin King
- National Center for Image-guided Therapy, Department of Radiology, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, United States
- Fumitaro Masaki
- National Center for Image-guided Therapy, Department of Radiology, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, United States; Healthcare Optics Research Laboratory, Canon U.S.A., Cambridge, MA, United States
- Hisashi Tsukada
- Division of Thoracic Surgery, Department of Surgery, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, United States
- Nobuhiko Hata
- National Center for Image-guided Therapy, Department of Radiology, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, United States
|
35
|
Recasens D, Lamarca J, Facil JM, Montiel JMM, Civera J. Endo-Depth-and-Motion: Reconstruction and Tracking in Endoscopic Videos Using Depth Networks and Photometric Constraints. IEEE Robot Autom Lett 2021. [DOI: 10.1109/lra.2021.3095528] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]
|
36
|
Colonoscopic 3D reconstruction by tubular non-rigid structure-from-motion. Int J Comput Assist Radiol Surg 2021; 16:1237-1241. [PMID: 34031817 DOI: 10.1007/s11548-021-02409-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/31/2021] [Accepted: 05/11/2021] [Indexed: 10/21/2022]
Abstract
PURPOSE The visual examination of colonoscopic images fails to extract precise geometric information about the colonic surface. Reconstructing the 3D surface of the colon from colonoscopic image sequences may thus add valuable clinical information. We address this problem of extracting precise spatio-temporal 3D structure information from colonoscopic images. METHODS Using just the intrinsically calibrated monocular image stream, we develop a technique to compute the depth of feature points that have been tracked across images. Our method uses prior knowledge of the approximate geometry of the colon, referred to as the TTP. It works by fitting a deformable cylindrical model to points reconstructed independently by non-rigid structure-from-motion (NRSfM), compromising between the data term and a novel tubular smoothing prior. Our method is the first ever to exploit a very weak topological prior to improve NRSfM. As such, it lies in between standard NRSfM, which does not use a topological prior beyond the mere plane, and shape-from-template (SfT), which uses a very strong prior in the form of a full deformable 3D object model. RESULTS We validate our method on both synthetic images of tubular structures and real colonoscopic data. Our method improves on the results obtained by existing NRSfM methods by 71.74% on average on synthetic data, and succeeds in obtaining a 3D reconstruction from a real colonoscopic sequence where the existing methods fail. CONCLUSION Colonoscopic 3D reconstruction is a difficult problem that remains unresolved by existing computer vision methods. Our dedicated NRSfM method and experiments show that visual motion may be the right visual cue to use in colonoscopy.
|
37
|
Tong HS, Ng YL, Liu Z, Ho JDL, Chan PL, Chan JYK, Kwok KW. Real-to-virtual domain transfer-based depth estimation for real-time 3D annotation in transnasal surgery: a study of annotation accuracy and stability. Int J Comput Assist Radiol Surg 2021; 16:731-739. [PMID: 33786777 PMCID: PMC8134290 DOI: 10.1007/s11548-021-02346-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/19/2021] [Accepted: 03/05/2021] [Indexed: 11/29/2022]
Abstract
PURPOSE Surgical annotation promotes effective communication between medical personnel during surgical procedures. However, existing approaches to 2D annotation are mostly static with respect to the display. In this work, we propose a method to achieve 3D annotations that anchor rigidly and stably to target structures upon camera movement in a transnasal endoscopic surgery setting. METHODS This is accomplished through intra-operative endoscope tracking and monocular depth estimation. A virtual endoscopic environment is utilized to train a supervised depth estimation network. An adversarial network transfers the style from the real endoscopic view to a synthetic-like view for input into the depth estimation network, from which framewise depth can be obtained in real time. RESULTS (1) Accuracy: framewise depth was predicted from images captured within a nasal airway phantom and compared with ground truth, achieving an SSIM value of 0.8310 ± 0.0655. (2) Stability: the mean absolute error (MAE) between reference and predicted depth of a target point was 1.1330 ± 0.9957 mm. CONCLUSION Both the accuracy and stability evaluations demonstrate the feasibility and practicality of our proposed method for achieving 3D annotations.
Affiliation(s)
- Hon-Sing Tong
- Department of Mechanical Engineering, The University of Hong Kong, Pokfulam Road, Hong Kong
- Yui-Lun Ng
- Department of Mechanical Engineering, The University of Hong Kong, Pokfulam Road, Hong Kong
- Zhiyu Liu
- Department of Mechanical Engineering, The University of Hong Kong, Pokfulam Road, Hong Kong
- Justin D L Ho
- Department of Mechanical Engineering, The University of Hong Kong, Pokfulam Road, Hong Kong
- Po-Ling Chan
- Department of Otorhinolaryngology, Head and Neck Surgery, The Chinese University of Hong Kong, Sha Tin, Hong Kong SAR
- Jason Y K Chan
- Department of Otorhinolaryngology, Head and Neck Surgery, The Chinese University of Hong Kong, Sha Tin, Hong Kong SAR
- Ka-Wai Kwok
- Department of Mechanical Engineering, The University of Hong Kong, Pokfulam Road, Hong Kong
|
38
|
Ozyoruk KB, Gokceler GI, Bobrow TL, Coskun G, Incetan K, Almalioglu Y, Mahmood F, Curto E, Perdigoto L, Oliveira M, Sahin H, Araujo H, Alexandrino H, Durr NJ, Gilbert HB, Turan M. EndoSLAM dataset and an unsupervised monocular visual odometry and depth estimation approach for endoscopic videos. Med Image Anal 2021; 71:102058. [PMID: 33930829 DOI: 10.1016/j.media.2021.102058] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2020] [Revised: 01/23/2021] [Accepted: 03/29/2021] [Indexed: 02/07/2023]
Abstract
Deep learning techniques hold promise for developing dense topography reconstruction and pose estimation methods for endoscopic videos. However, currently available datasets do not support effective quantitative benchmarking. In this paper, we introduce a comprehensive endoscopic SLAM dataset consisting of 3D point cloud data for six porcine organs, capsule and standard endoscopy recordings, synthetically generated data, and a recording of a phantom colon made with a conventional endoscope in clinical use, with computed tomography (CT) scan ground truth. A Panda robotic arm, two commercially available capsule endoscopes, three conventional endoscopes with different camera properties, two high-precision 3D scanners, and a CT scanner were employed to collect data from eight ex-vivo porcine gastrointestinal (GI) tract organs and a silicone colon phantom model. In total, 35 sub-datasets are provided with 6D pose ground truth for the ex-vivo part: 18 sub-datasets for the colon, 12 for the stomach, and 5 for the small intestine, four of which contain polyp-mimicking elevations created by an expert gastroenterologist. To verify the applicability of these data to real clinical systems, we recorded a video sequence with a state-of-the-art colonoscope from a full-representation silicone colon phantom. Synthetic capsule endoscopy frames from the stomach, colon, and small intestine with both depth and pose annotations are included to facilitate the study of simulation-to-real transfer learning algorithms. Additionally, we propose Endo-SfMLearner, an unsupervised monocular depth and pose estimation method that combines residual networks with a spatial attention module to direct the network's focus toward distinguishable and highly textured tissue regions. The proposed approach uses a brightness-aware photometric loss to improve robustness under the fast frame-to-frame illumination changes commonly seen in endoscopic videos.
To exemplify the use-case of the EndoSLAM dataset, the performance of Endo-SfMLearner is extensively compared with the state-of-the-art: SC-SfMLearner, Monodepth2, and SfMLearner. The codes and the link for the dataset are publicly available at https://github.com/CapsuleEndoscope/EndoSLAM. A video demonstrating the experimental setup and procedure is accessible as Supplementary Video 1.
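The brightness-aware photometric loss mentioned above can be illustrated by fitting an affine brightness alignment (gain and bias) between the warped source frame and the target before comparing them. This is a minimal numpy sketch under stated assumptions: the function name, the gain/bias fit, and the toy signals are illustrative, and Endo-SfMLearner's actual loss also involves SSIM and image warping.

```python
import numpy as np

def brightness_aligned_l1(src, tgt, eps=1e-6):
    """Photometric L1 loss after affine brightness alignment of the warped
    source to the target, making the comparison robust to frame-to-frame
    illumination changes (illustrative sketch, not the paper's exact loss)."""
    gain = (tgt.std() + eps) / (src.std() + eps)
    bias = tgt.mean() - gain * src.mean()
    return np.abs((gain * src + bias) - tgt).mean()

# Same underlying structure, different illumination: a naive L1 loss is large,
# the brightness-aligned loss is near zero.
tgt = np.linspace(0.0, 1.0, 100)
src = 0.5 * tgt + 0.2
loss = brightness_aligned_l1(src, tgt)
naive = np.abs(src - tgt).mean()
```

The design point is that a self-supervised photometric loss should not penalize the pose/depth networks for global illumination shifts that carry no geometric information.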
Collapse
Affiliation(s)
| | | | - Taylor L Bobrow
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Gulfize Coskun
- Institute of Biomedical Engineering, Bogazici University, Turkey
| | - Kagan Incetan
- Institute of Biomedical Engineering, Bogazici University, Turkey
| | | | - Faisal Mahmood
- Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA; Cancer Data Science, Dana Farber Cancer Institute, Boston, MA, USA; Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA
| | - Eva Curto
- Institute for Systems and Robotics, University of Coimbra, Portugal
| | - Luis Perdigoto
- Institute for Systems and Robotics, University of Coimbra, Portugal
| | - Marina Oliveira
- Institute for Systems and Robotics, University of Coimbra, Portugal
| | - Hasan Sahin
- Institute of Biomedical Engineering, Bogazici University, Turkey
| | - Helder Araujo
- Institute for Systems and Robotics, University of Coimbra, Portugal
| | - Henrique Alexandrino
- Faculty of Medicine, Clinical Academic Center of Coimbra, University of Coimbra, Coimbra, Portugal
| | - Nicholas J Durr
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Hunter B Gilbert
- Department of Mechanical and Industrial Engineering, Louisiana State University, Baton Rouge, LA, USA
| | - Mehmet Turan
- Institute of Biomedical Engineering, Bogazici University, Turkey.
| |
Collapse
|
39
|
Gehrig D, Ruegg M, Gehrig M, Hidalgo-Carrio J, Scaramuzza D. Combining Events and Frames Using Recurrent Asynchronous Multimodal Networks for Monocular Depth Prediction. IEEE Robot Autom Lett 2021. [DOI: 10.1109/lra.2021.3060707] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
|
40
|
Widya AR, Monno Y, Okutomi M, Suzuki S, Gotoda T, Miki K. Stomach 3D Reconstruction Using Virtual Chromoendoscopic Images. IEEE JOURNAL OF TRANSLATIONAL ENGINEERING IN HEALTH AND MEDICINE-JTEHM 2021; 9:1700211. [PMID: 33796417 PMCID: PMC8009143 DOI: 10.1109/jtehm.2021.3062226] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 11/16/2020] [Revised: 01/19/2021] [Accepted: 02/15/2021] [Indexed: 12/23/2022]
Abstract
Gastric endoscopy is the gold standard in the clinical process that enables medical practitioners to diagnose various lesions inside a patient’s stomach. If a lesion is found, successfully identifying its location relative to a global view of the stomach leads to better decision making for subsequent clinical treatment. Our previous research showed that lesion localization can be achieved by reconstructing the whole stomach shape from chromoendoscopic indigo carmine (IC) dye-sprayed images using a structure-from-motion (SfM) pipeline. However, spraying IC dye over the whole stomach requires additional time, which is undesirable for both patients and practitioners. Our objective is to propose an alternative way to achieve whole-stomach 3D reconstruction without the need for IC dye. We generate virtual IC-sprayed (VIC) images based on image-to-image style translation trained on unpaired real no-IC and IC-sprayed images, and investigate the effect of input and output color channel selection on VIC image generation. We validate our reconstruction results by comparing them with results using real IC-sprayed images and confirm that the obtained stomach 3D structures are comparable. We also propose a local reconstruction technique to obtain a more detailed surface and texture around a region of interest. The proposed method achieves whole-stomach reconstruction with SfM without the need for real IC dye. We found that translating no-IC green-channel images to IC-sprayed red-channel images gives the best SfM reconstruction result. Clinical impact We offer a method for frame localization and local 3D reconstruction of a detected gastric lesion using standard endoscopy images, leading to better clinical decisions.
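The channel selection that the paper reports as best (no-IC green channel as translation input, IC-sprayed red channel as target) amounts to a simple preprocessing step before the style-translation network. A hypothetical helper, for illustration only:

```python
import numpy as np

def select_channel(img_rgb, channel):
    """Extract a single color channel (as a grayscale image) from an HxWx3
    RGB array. Per Widya et al., the green channel of no-IC images and the
    red channel of IC-sprayed images gave the best SfM reconstruction."""
    idx = {"red": 0, "green": 1, "blue": 2}[channel]
    return img_rgb[..., idx]

# Toy image with a dominant green channel, as is typical of mucosal scenes.
img = np.zeros((4, 4, 3))
img[..., 1] = 0.7
g = select_channel(img, "green")
```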
Collapse
Affiliation(s)
- Aji Resindra Widya
- Department of Systems and Control Engineering, School of Engineering, Tokyo Institute of Technology, Tokyo 152-8550, Japan
| | - Yusuke Monno
- Department of Systems and Control Engineering, School of Engineering, Tokyo Institute of Technology, Tokyo 152-8550, Japan
| | - Masatoshi Okutomi
- Department of Systems and Control Engineering, School of Engineering, Tokyo Institute of Technology, Tokyo 152-8550, Japan
| | - Sho Suzuki
- Division of Gastroenterology and Hepatology, Department of Medicine, Nihon University School of Medicine, Tokyo 101-8309, Japan
| | - Takuji Gotoda
- Division of Gastroenterology and Hepatology, Department of Medicine, Nihon University School of Medicine, Tokyo 101-8309, Japan
| | - Kenji Miki
- Department of Internal Medicine, Tsujinaka Hospital Kashiwanoha, Kashiwa 277-0871, Japan
| |
Collapse
|
41
|
Sharan L, Burger L, Kostiuchik G, Wolf I, Karck M, De Simone R, Engelhardt S. Domain gap in adapting self-supervised depth estimation methods for stereo-endoscopy. CURRENT DIRECTIONS IN BIOMEDICAL ENGINEERING 2020. [DOI: 10.1515/cdbme-2020-0004] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022] Open
Abstract
In endoscopy, depth estimation is a task that can help quantify visual information for better scene understanding. A plethora of depth estimation algorithms have been proposed in the computer vision community. The endoscopic domain, however, differs from the typical depth estimation scenario in the setup and nature of the scene. Furthermore, obtaining ground-truth depth information is infeasible owing to the unsuitable detection range of off-the-shelf depth sensors and the difficulty of setting up a depth sensor in a surgical environment. In this paper, an existing self-supervised approach from the field of autonomous driving, called Monodepth [1], is applied to a novel dataset of stereo-endoscopic images from reconstructive mitral valve surgery. While it is already known that endoscopic scenes are more challenging than outdoor driving scenes, this paper performs experiments to quantify the comparison and describes the domain gap and challenges involved in transferring these methods.
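The self-supervised signal behind Monodepth-style training can be sketched in one dimension: predict a disparity, reconstruct the left view by sampling the right view, and penalize the photometric difference, with no ground-truth depth needed. This is a deliberately simplified 1D sketch with assumed toy signals; the real method operates on full images with SSIM and smoothness terms.

```python
import numpy as np

def photometric_loss(left, right, disparity):
    """1D Monodepth-style reconstruction loss: a point at x in the left view
    appears at x - disparity in the right view, so we rebuild the left row by
    sampling the right row at x - disparity and compare with the true left."""
    w = left.shape[0]
    xs = np.clip(np.arange(w) - disparity, 0, w - 1)
    recon = np.interp(xs, np.arange(w), right)
    return np.abs(recon - left).mean()

# Toy stereo pair: the right view sees each point shifted by 2 pixels.
x = np.arange(32.0)
left = np.sin(x / 5.0)
right = np.sin((x + 2.0) / 5.0)
loss_true = photometric_loss(left, right, 2.0)   # correct disparity
loss_wrong = photometric_loss(left, right, 5.0)  # wrong disparity
```

The training signal comes from `loss_true < loss_wrong`: minimizing the reconstruction error pushes the network toward the correct disparity, which is exactly the supervision that breaks down when endoscopic scenes violate the photometric-consistency assumptions.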
Collapse
Affiliation(s)
- Lalith Sharan
- WG Artificial Intelligence in Cardiovascular Medicine (AICM), University Hospital Heidelberg, Heidelberg, Germany
- Informatics for Life, Heidelberg, Germany
| | - Lukas Burger
- WG Artificial Intelligence in Cardiovascular Medicine (AICM), University Hospital Heidelberg, Heidelberg, Germany
- Department of Computer Science, Mannheim University of Applied Sciences, Mannheim, Germany
| | - Georgii Kostiuchik
- Department of Computer Science, Mannheim University of Applied Sciences, Mannheim, Germany
| | - Ivo Wolf
- Department of Computer Science, Mannheim University of Applied Sciences, Mannheim, Germany
| | - Matthias Karck
- Department of Cardiac Surgery, University Hospital Heidelberg, Heidelberg, Germany
| | - Raffaele De Simone
- Department of Cardiac Surgery, University Hospital Heidelberg, Heidelberg, Germany
| | - Sandy Engelhardt
- WG Artificial Intelligence in Cardiovascular Medicine (AICM), University Hospital Heidelberg, Heidelberg, Germany
- Informatics for Life, Heidelberg, Germany
| |
Collapse
|
42
|
Deep Learning-Based Monocular Depth Estimation Methods-A State-of-the-Art Review. SENSORS 2020; 20:s20082272. [PMID: 32316336 PMCID: PMC7219073 DOI: 10.3390/s20082272] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/27/2020] [Revised: 04/09/2020] [Accepted: 04/12/2020] [Indexed: 12/11/2022]
Abstract
Monocular depth estimation from Red-Green-Blue (RGB) images is a well-studied, ill-posed problem in computer vision that has been investigated intensively over the past decade using Deep Learning (DL) approaches. Recent approaches to monocular depth estimation mostly rely on Convolutional Neural Networks (CNNs). Estimating depth from two-dimensional images plays an important role in various applications, including scene reconstruction, 3D object detection, robotics, and autonomous driving. This survey provides a comprehensive overview of the research topic, including the problem formulation and a short description of traditional methods for depth estimation. Relevant datasets and 13 state-of-the-art deep learning-based approaches for monocular depth estimation are reviewed, evaluated, and discussed. We conclude with a perspective on the challenges in monocular depth estimation that require further investigation.
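Methods of the kind this survey reviews are conventionally compared with a small set of standard metrics: absolute relative error, RMSE, and threshold accuracy (δ < 1.25). A minimal sketch of the standard definitions, with toy values for illustration:

```python
import numpy as np

def depth_metrics(pred, gt):
    """Standard monocular depth evaluation metrics: absolute relative error,
    root-mean-square error, and threshold accuracy delta < 1.25 (the fraction
    of pixels whose prediction/ground-truth ratio is within 1.25)."""
    abs_rel = np.mean(np.abs(pred - gt) / gt)
    rmse = np.sqrt(np.mean((pred - gt) ** 2))
    ratio = np.maximum(pred / gt, gt / pred)
    delta1 = np.mean(ratio < 1.25)
    return abs_rel, rmse, delta1

gt = np.array([1.0, 2.0, 4.0, 8.0])     # ground-truth depths (e.g. meters)
pred = np.array([1.1, 2.0, 4.4, 8.0])   # toy predictions
abs_rel, rmse, delta1 = depth_metrics(pred, gt)
```

Note that abs_rel and δ are scale-relative while RMSE is absolute, which is why papers typically report both families side by side.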
Collapse
|