1. Wang Y, Ye Z, Wen M, Liang H, Zhang X. TransVFS: A spatio-temporal local-global transformer for vision-based force sensing during ultrasound-guided prostate biopsy. Med Image Anal 2024;94:103130. [PMID: 38437787] [DOI: 10.1016/j.media.2024.103130]
Abstract
Robot-assisted prostate biopsy is a new technology for diagnosing prostate cancer, but its safety is limited by the inability of robots to accurately sense the tool-tissue interaction force during biopsy. Vision-based force sensing (VFS) offers a potential solution by inferring the interaction force from image sequences. However, existing mainstream VFS methods cannot achieve accurate force sensing because they rely on convolutional or recurrent neural networks to learn deformation from optical images, and some are inefficient, especially when recurrent convolutional operations are involved. This paper presents a Transformer-based VFS (TransVFS) method that leverages ultrasound volume sequences acquired during prostate biopsy. TransVFS uses a spatio-temporal local-global Transformer to capture local image details and global dependencies simultaneously, learning prostate deformations for force estimation. Distinctively, the method exploits both spatial and temporal attention mechanisms for image feature learning, thereby addressing the impact of low ultrasound image resolution and unclear prostate boundaries on force estimation accuracy. Meanwhile, two efficient local-global attention modules reduce the 4D spatio-temporal computation burden through a factorized spatio-temporal processing strategy, enabling fast force estimation. Experiments on a prostate phantom and on beagle dogs show that the method significantly outperforms existing VFS methods and other spatio-temporal Transformer models. TransVFS surpasses the most competitive baseline, ResNet3dGRU, with a mean absolute force estimation error of 70.4 ± 60.0 millinewtons (mN) vs 123.7 ± 95.6 mN on the transabdominal ultrasound dataset of dogs.
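The factorized spatio-temporal processing strategy mentioned above can be illustrated with a short sketch. This is not the authors' implementation; it is a minimal PyTorch example with assumed module names and dimensions, showing how attention over a volume sequence can be split into a spatial pass and a temporal pass instead of attending over all 4D tokens jointly.

```python
import torch
import torch.nn as nn

class FactorizedSpatioTemporalAttention(nn.Module):
    """Illustrative sketch: attend over space and time separately
    instead of over all (time x voxel) tokens jointly."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.spatial_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.temporal_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_s = nn.LayerNorm(dim)
        self.norm_t = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, tokens, dim), tokens = flattened voxel patches
        b, t, n, d = x.shape
        # Spatial pass: each time step attends over its own voxel tokens.
        xs = x.reshape(b * t, n, d)
        hs = self.norm_s(xs)
        xs = xs + self.spatial_attn(hs, hs, hs)[0]
        # Temporal pass: each voxel token attends along the time axis.
        xt = xs.reshape(b, t, n, d).permute(0, 2, 1, 3).reshape(b * n, t, d)
        ht = self.norm_t(xt)
        xt = xt + self.temporal_attn(ht, ht, ht)[0]
        return xt.reshape(b, n, t, d).permute(0, 2, 1, 3)

# Attention cost drops from O((t*n)^2) to O(t*n^2 + n*t^2) token pairs.
seq = torch.randn(2, 4, 64, 96)   # batch, time, voxel tokens, channels
out = FactorizedSpatioTemporalAttention(96)(seq)
print(out.shape)                   # torch.Size([2, 4, 64, 96])
```

The comment on cost is the point of the factorization: joint 4D attention scales quadratically in the product of time steps and voxel tokens, while the two separate passes scale quadratically in each factor alone.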
Affiliation(s)
- Yibo Wang: Department of Biomedical Engineering, College of Life Science and Technology, Huazhong University of Science and Technology, No 1037, Luyou Road, Wuhan, China
- Zhichao Ye: Department of Urology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, No 13, Hangkong Road, Wuhan, China
- Mingwei Wen: Department of Biomedical Engineering, College of Life Science and Technology, Huazhong University of Science and Technology, No 1037, Luyou Road, Wuhan, China
- Huageng Liang: Department of Urology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, No 13, Hangkong Road, Wuhan, China
- Xuming Zhang: Department of Biomedical Engineering, College of Life Science and Technology, Huazhong University of Science and Technology, No 1037, Luyou Road, Wuhan, China
2. Bengs M, Sprenger J, Gerlach S, Neidhardt M, Schlaefer A. Real-time motion analysis with 4D deep learning for ultrasound-guided radiotherapy. IEEE Trans Biomed Eng 2023;70:2690-2699. [PMID: 37030809] [DOI: 10.1109/tbme.2023.3262422]
Abstract
Motion compensation in radiation therapy is a challenging scenario that requires estimating and forecasting the motion of tissue structures to deliver the target dose. Ultrasound offers direct, real-time imaging of tissue and is being considered for image guidance in radiation therapy. Recently, fast volumetric ultrasound has gained traction, but motion analysis with such high-dimensional data remains difficult. While deep learning could bring many advantages, such as fast data processing and high performance, it remains unclear how to process sequences of hundreds of image volumes efficiently and effectively. We present a 4D deep learning approach for real-time motion estimation and forecasting using long-term 4D ultrasound data. Using motion traces acquired during radiation therapy combined with various tissue types, our results demonstrate that long-term motion estimation can be performed marker-free with a tracking error of 0.35 ± 0.2 mm and an inference time of less than 5 ms. We also demonstrate forecasting directly from the image data up to 900 ms into the future. Overall, our findings highlight that 4D deep learning is a promising approach for motion analysis during radiotherapy.
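A hedged sketch of estimation and forecasting from a volume stream follows. It is not the authors' network: the 3D-CNN encoder, GRU head, layer sizes, and the way the forecast horizon is discretized are all assumptions, chosen only to show how a long stream of volumes can be reduced to per-step features before regressing current and future motion.

```python
import torch
import torch.nn as nn

class VolumeStreamForecaster(nn.Module):
    """Sketch: encode each ultrasound volume with a small 3D CNN, then use
    a GRU to estimate current motion and forecast it several steps ahead."""

    def __init__(self, feat: int = 64, horizon_steps: int = 3):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv3d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
            nn.Linear(32, feat),
        )
        self.gru = nn.GRU(feat, feat, batch_first=True)
        # One 3D displacement for "now" plus one per forecast step.
        self.head = nn.Linear(feat, 3 * (1 + horizon_steps))

    def forward(self, volumes: torch.Tensor) -> torch.Tensor:
        # volumes: (batch, time, 1, D, H, W)
        b, t = volumes.shape[:2]
        feats = self.encoder(volumes.flatten(0, 1)).reshape(b, t, -1)
        hidden, _ = self.gru(feats)
        return self.head(hidden[:, -1])     # (batch, 3 * (1 + horizon))

stream = torch.randn(2, 8, 1, 32, 32, 32)   # eight past volumes
pred = VolumeStreamForecaster()(stream)
print(pred.shape)                            # torch.Size([2, 12])
```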
3. Optical force estimation for interactions between tool and soft tissues. Sci Rep 2023;13:506. [PMID: 36627354] [PMCID: PMC9831996] [DOI: 10.1038/s41598-022-27036-7]
Abstract
Robotic assistance in minimally invasive surgery offers numerous advantages for both patient and surgeon. However, the lack of force feedback in robotic surgery is a major limitation, and accurately estimating tool-tissue interaction forces remains a challenge. Image-based force estimation offers a promising solution without the need to integrate sensors into surgical tools. In this indirect approach, interaction forces are derived from the observed deformation, with learning-based methods improving accuracy and real-time capability. However, the relationship between deformation and force is determined by the stiffness of the tissue. Consequently, both deformation and local tissue properties must be observed for an approach applicable to heterogeneous tissue. In this work, we use optical coherence tomography, which can combine the detection of tissue deformation with shear wave elastography in a single modality. We present a multi-input deep learning network that jointly processes local elasticity estimates and volumetric image data. Our results demonstrate that accounting for elastic properties is critical for accurate image-based force estimation across different tissue types and properties. Joint processing of local elasticity information yields the best performance throughout our phantom study. Furthermore, we test our approach on soft tissue samples that were not present during training and show that generalization to other tissue properties is possible.
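A minimal sketch of this multi-input idea, assuming a scalar elasticity estimate and placeholder layer sizes (not the published architecture): the volumetric deformation data and the local stiffness estimate are encoded in separate branches and fused before the force regression head.

```python
import torch
import torch.nn as nn

class ElasticityAwareForceNet(nn.Module):
    """Sketch: fuse a volumetric deformation branch with a local
    elasticity input so force regression can account for stiffness."""

    def __init__(self):
        super().__init__()
        self.volume_branch = nn.Sequential(
            nn.Conv3d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),      # -> (batch, 32)
        )
        self.elasticity_branch = nn.Sequential(
            nn.Linear(1, 16), nn.ReLU(),                # scalar stiffness estimate
        )
        self.head = nn.Sequential(
            nn.Linear(32 + 16, 32), nn.ReLU(),
            nn.Linear(32, 1),                           # axial force, e.g. in mN
        )

    def forward(self, volume: torch.Tensor, elasticity: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([self.volume_branch(volume),
                           self.elasticity_branch(elasticity)], dim=1)
        return self.head(fused)

net = ElasticityAwareForceNet()
force = net(torch.randn(4, 1, 32, 32, 32), torch.rand(4, 1))
print(force.shape)   # torch.Size([4, 1])
```

The design choice worth noting is that stiffness enters as a separate input rather than being inferred from appearance, which is what allows the same deformation to map to different forces on different tissues.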
4. Wu S, Zhao W, Ji S. Real-time dynamic simulation for highly accurate spatiotemporal brain deformation from impact. Comput Methods Appl Mech Eng 2022;394:114913. [PMID: 35572209] [PMCID: PMC9097909] [DOI: 10.1016/j.cma.2022.114913]
Abstract
Real-time dynamic simulation remains a significant challenge for spatiotemporal data of high dimension and resolution. In this study, we adapt a transformer neural network (TNN), originally developed for natural language processing, and a separate convolutional neural network (CNN) to estimate five-dimensional (5D) spatiotemporal brain-skull relative displacement resulting from impact (isotropic spatial resolution of 4 mm with temporal resolution of 1 ms). Sequential training on N = 5184 samples is applied to the two neural networks to estimate the complete 5D displacement across a temporal duration of 60 ms. We find that the TNN slightly but consistently outperforms the CNN in accuracy for both displacement and the resulting voxel-wise four-dimensional (4D) maximum principal strain (e.g., root mean squared error (RMSE) of ~1.0% vs. ~1.6%, coefficient of determination R² > 0.99 vs. > 0.98, respectively, and normalized RMSE (NRMSE) at peak displacement of 2%-3%, based on an independent testing dataset; N = 314). Their accuracies are similar for a range of real-world impacts drawn from various published sources (dummy, helmet, football, soccer, and car crash; average RMSE/NRMSE of ~0.3 mm/~4%-5% and average R² of ~0.98 at peak displacement). Sequential training is effective for allowing instantaneous estimation of 5D displacement with high accuracy, although the TNN poses a heavier computational burden in training. This work enables efficient characterization of the intrinsically dynamic brain strain in impact, which is critical for downstream multiscale axonal injury model simulation. This is also the first application of a TNN in biomechanics, offering insight into how real-time dynamic simulations can be achieved across diverse engineering fields.
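To make the TNN idea concrete, here is a hedged sketch (not the paper's configuration; the input features, model width, voxel count, and layer count are placeholders) of a transformer encoder that regresses a flattened three-component displacement field at each of 60 time steps from an input kinematics time series.

```python
import torch
import torch.nn as nn

class DisplacementTNN(nn.Module):
    """Sketch: a transformer encoder regressing, per time step, a
    flattened 3-component displacement field from an input time series
    (e.g., impact kinematics)."""

    def __init__(self, in_dim: int = 6, model_dim: int = 128,
                 n_voxels: int = 512, steps: int = 60):
        super().__init__()
        self.embed = nn.Linear(in_dim, model_dim)
        self.pos = nn.Parameter(torch.zeros(1, steps, model_dim))
        layer = nn.TransformerEncoderLayer(model_dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.out = nn.Linear(model_dim, 3 * n_voxels)   # x/y/z per voxel

    def forward(self, kinematics: torch.Tensor) -> torch.Tensor:
        # kinematics: (batch, steps, in_dim), e.g. angular velocity components
        h = self.encoder(self.embed(kinematics) + self.pos)
        return self.out(h)                               # (batch, steps, 3*n_voxels)

x = torch.randn(2, 60, 6)          # 60 ms at 1 ms resolution
disp = DisplacementTNN()(x)
print(disp.shape)                   # torch.Size([2, 60, 1536])
```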
Affiliation(s)
- Shaoju Wu: Department of Biomedical Engineering, Worcester Polytechnic Institute, Worcester, MA, United States of America
- Wei Zhao: Department of Biomedical Engineering, Worcester Polytechnic Institute, Worcester, MA, United States of America
- Songbai Ji: Department of Biomedical Engineering and Department of Mechanical Engineering, Worcester Polytechnic Institute, Worcester, MA, United States of America. Correspondence: Department of Biomedical Engineering, Worcester Polytechnic Institute, 60 Prescott Street, Worcester, MA 01506, USA.
5. Jiang Z, Wang Y, Shi C, Wu Y, Hu R, Chen S, Hu S, Wang X, Qiu B. Attention module improves both performance and interpretability of four-dimensional functional magnetic resonance imaging decoding neural network. Hum Brain Mapp 2022;43:2683-2692. [PMID: 35212436] [PMCID: PMC9057093] [DOI: 10.1002/hbm.25813]
Abstract
Decoding brain cognitive states from neuroimaging signals is an important topic in neuroscience. In recent years, deep neural networks (DNNs) have been recruited for decoding multiple brain states and have achieved good performance. However, the open question of how to interpret the DNN black box remains unanswered. Capitalizing on advances in machine learning, we integrated attention modules into brain decoders to facilitate an in-depth interpretation of DNN channels. A four-dimensional (4D) convolution operation was also included to extract temporo-spatial interactions within the fMRI signal. The experiments showed that the proposed model obtains very high accuracy (97.4%) and outperforms previous studies on seven task benchmarks from the Human Connectome Project (HCP) dataset. The visualization analysis further illustrated the hierarchical emergence of task-specific masks with depth. Finally, the model was retrained to regress individual traits within the HCP and to classify viewed images from the BOLD5000 dataset, respectively. Transfer learning also achieved good performance. Further visualization analysis showed that, after transfer learning, low-level attention masks remained similar to the source domain, whereas high-level attention masks changed adaptively. In conclusion, the proposed 4D model with attention modules performed well and facilitated the interpretation of DNNs, which is helpful for subsequent research.
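The kind of channel attention that makes such models inspectable can be sketched as a squeeze-and-excitation style block over volumetric feature maps. This is a generic illustration, not the paper's exact module; the reduction ratio and the way weights are cached for later visualization are assumptions.

```python
import torch
import torch.nn as nn

class ChannelAttention3D(nn.Module):
    """Sketch: squeeze-and-excitation style channel attention whose
    learned weights can be read out to interpret which feature
    channels a decoder relies on."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool3d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, D, H, W) feature maps from one fMRI frame
        w = self.fc(self.pool(x).flatten(1))           # per-channel weights
        self.last_weights = w.detach()                 # keep for interpretation
        return x * w[:, :, None, None, None]

feat = torch.randn(2, 32, 8, 8, 8)
attn = ChannelAttention3D(32)
out = attn(feat)
print(out.shape, attn.last_weights.shape)   # (2, 32, 8, 8, 8) and (2, 32)
```

Caching `last_weights` is the interpretability hook: averaging it over a dataset gives per-channel attention masks of the kind visualized in the study.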
Affiliation(s)
- Zhoufan Jiang: Center for Biomedical Imaging, University of Science and Technology of China, Hefei, Anhui, China
- Yanming Wang: Center for Biomedical Imaging, University of Science and Technology of China, Hefei, Anhui, China
- ChenWei Shi: Center for Biomedical Imaging, University of Science and Technology of China, Hefei, Anhui, China
- Yueyang Wu: Center for Biomedical Imaging, University of Science and Technology of China, Hefei, Anhui, China
- Rongjie Hu: Center for Biomedical Imaging, University of Science and Technology of China, Hefei, Anhui, China
- Shishuo Chen: Center for Biomedical Imaging, University of Science and Technology of China, Hefei, Anhui, China
- Sheng Hu: Center for Biomedical Imaging, University of Science and Technology of China, Hefei, Anhui, China
- Xiaoxiao Wang: Center for Biomedical Imaging, University of Science and Technology of China, Hefei, Anhui, China; Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, Anhui, China
- Bensheng Qiu: Center for Biomedical Imaging, University of Science and Technology of China, Hefei, Anhui, China; Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, Anhui, China
6. Vision-based suture tensile force estimation in robotic surgery. Sensors 2020;21:110. [PMID: 33375388] [PMCID: PMC7796030] [DOI: 10.3390/s21010110]
Abstract
Compared to laparoscopy, robot-assisted minimally invasive surgery lacks force feedback, which is important for preventing suture breakage. To overcome this problem, surgeons infer the suture force from their proprioception and the 2D image, comparing them with their training experience. Based on this idea, a deep-learning-based method is proposed that uses a single image and the robot position to estimate the tensile force on the suture without a force sensor. A neural network combining a modified Inception-ResNet-V2 with Long Short-Term Memory (LSTM) networks is used to estimate the suture pulling force. The feasibility of the proposed network is verified on a generated database recording interactions with two different artificial skins in two different situations (in vivo and in vitro), captured at 13 viewing angles while the tool positions were varied on a master-slave robotic system. In the evaluation, the proposed learning models successfully estimated the tensile force at 10 viewing angles unseen during training.
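A minimal sketch of this image-plus-position design, with a small generic CNN standing in for the modified Inception-ResNet-V2 and all layer sizes assumed: per-frame image features are concatenated with the robot tool position and passed to an LSTM that regresses the tensile force.

```python
import torch
import torch.nn as nn

class SutureForceEstimator(nn.Module):
    """Sketch: per-frame CNN features concatenated with the robot tool
    position, fed to an LSTM that regresses suture tensile force."""

    def __init__(self, feat: int = 64, pos_dim: int = 3):
        super().__init__()
        self.cnn = nn.Sequential(                     # stand-in for Inception-ResNet-V2
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat),
        )
        self.lstm = nn.LSTM(feat + pos_dim, 64, batch_first=True)
        self.head = nn.Linear(64, 1)                  # tensile force

    def forward(self, frames: torch.Tensor, positions: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, 3, H, W); positions: (batch, time, 3)
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).reshape(b, t, -1)
        h, _ = self.lstm(torch.cat([feats, positions], dim=-1))
        return self.head(h[:, -1])

force = SutureForceEstimator()(torch.randn(2, 5, 3, 64, 64), torch.randn(2, 5, 3))
print(force.shape)   # torch.Size([2, 1])
```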
7. Behrendt F, Gessert N, Schlaefer A. Generalization of spatio-temporal deep learning for vision-based force estimation. Curr Dir Biomed Eng 2020. [DOI: 10.1515/cdbme-2020-0024]
Abstract
Robot-assisted minimally invasive surgery is increasingly used in clinical practice. Force estimation offers the potential to provide haptic feedback in surgical systems. Forces can be estimated in a vision-based way by capturing the deformation observed in 2D image sequences with deep learning models. Variations in tissue appearance and mechanical properties are likely to influence how well force estimation methods generalize. In this work, we study the generalization capabilities of different spatial and spatio-temporal deep learning methods across different tissue samples. We acquire several datasets using a clinical laparoscope and use both purely spatial and spatio-temporal deep learning models. The results show that generalization across different tissues is challenging. Nevertheless, we demonstrate that using spatio-temporal data instead of individual frames is valuable for force estimation. In particular, processing spatial and temporal data separately with a combination of a ResNet and a GRU architecture shows promising results, with a mean absolute error of 15.450 mN compared to 19.744 mN for a purely spatial CNN.
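The cross-tissue evaluation described here amounts to a leave-one-tissue-out protocol. The sketch below is schematic and hedged: the dataset names, model builder, and training/evaluation helpers are hypothetical placeholders, and only the hold-out loop itself reflects the study design.

```python
import torch

def leave_one_tissue_out(datasets, make_model, train_fn, eval_fn):
    """Sketch: train on all tissue datasets but one, test on the held-out
    tissue, and collect a per-tissue error metric (e.g., MAE in mN)."""
    results = {}
    for held_out in datasets:
        train_sets = [ds for name, ds in datasets.items() if name != held_out]
        model = make_model()                 # e.g., a ResNet encoder plus a GRU
        train_fn(model, torch.utils.data.ConcatDataset(train_sets))
        results[held_out] = eval_fn(model, datasets[held_out])
    return results

# Usage (all names below are hypothetical placeholders):
# datasets = {"liver": liver_ds, "heart": heart_ds, "gelatin": gel_ds}
# mae = leave_one_tissue_out(datasets, build_resnet_gru, train, evaluate_mae)
```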
Affiliation(s)
- Finn Behrendt: Institute of Medical Technology, Hamburg University of Technology, Hamburg, Germany
- Nils Gessert: Institute of Medical Technology, Hamburg University of Technology, Hamburg, Germany
- Alexander Schlaefer: Institute of Medical Technology, Hamburg University of Technology, Hamburg, Germany
8. Neidhardt M, Gessert N, Gosau T, Kemmling J, Feldhaus S, Schumacher U, Schlaefer A. Force estimation from 4D OCT data in a human tumor xenograft mouse model. Curr Dir Biomed Eng 2020. [DOI: 10.1515/cdbme-2020-0022]
Abstract
Minimally invasive robotic surgery offers benefits such as reduced physical trauma, faster recovery, and less pain for the patient. In these procedures, visual and haptic feedback is crucial for the surgeon, who operates surgical tools through a robot without line-of-sight. External force sensors are biased by friction at the tool shaft and therefore cannot estimate the forces between tool tip and tissue. As an alternative, vision-based force estimation has been proposed, in which interaction forces are learned directly from the deformation observed by an external imaging system. Recently, an approach based on optical coherence tomography and deep learning has shown promising results. However, most experiments are performed on ex vivo tissue. In this work, we demonstrate that models trained on dead tissue do not perform well on in vivo data. We performed multiple experiments on a human tumor xenograft mouse model, on both perfused in vivo tissue and dead tissue, and compared two deep learning models in different training scenarios. Training on perfused in vivo data improved model performance for in vivo force estimation by 24%.
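The training-scenario comparison can be summarized by the schematic below. It is a hedged sketch, not the authors' code: the dataset objects, model factory, and training/evaluation helpers are hypothetical, and only the train-on-one-domain, test-on-in-vivo structure mirrors the experiment.

```python
def domain_shift_study(dead_ds, vivo_train_ds, vivo_test_ds,
                       make_model, train_fn, eval_fn):
    """Sketch: train identical models on dead vs. perfused in vivo tissue,
    evaluate both on the same in vivo test set, and report the relative
    improvement from in vivo training."""
    errors = {}
    for domain, train_ds in [("dead", dead_ds), ("in_vivo", vivo_train_ds)]:
        model = make_model()              # hypothetical model factory
        train_fn(model, train_ds)         # hypothetical training loop
        errors[domain] = eval_fn(model, vivo_test_ds)   # e.g., MAE in mN
    relative_gain = (errors["dead"] - errors["in_vivo"]) / errors["dead"]
    return errors, relative_gain          # the paper reports a 24% improvement
```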
Affiliation(s)
- Maximilian Neidhardt: Institute of Medical Technology and Intelligent Systems, Hamburg University of Technology, Hamburg, Germany
- Nils Gessert: Institute of Medical Technology and Intelligent Systems, Hamburg University of Technology, Hamburg, Germany
- Tobias Gosau: Department of Anatomy and Experimental Morphology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Julia Kemmling: Department of Anatomy and Experimental Morphology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Susanne Feldhaus: Department of Anatomy and Experimental Morphology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Udo Schumacher: Department of Anatomy and Experimental Morphology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Alexander Schlaefer: Institute of Medical Technology and Intelligent Systems, Hamburg University of Technology, Hamburg, Germany
9. Bengs M, Gessert N, Schlaefer A. 4D spatio-temporal convolutional networks for object position estimation in OCT volumes. Curr Dir Biomed Eng 2020. [DOI: 10.1515/cdbme-2020-0001]
Abstract
Tracking and localizing objects is a central problem in computer-assisted surgery. Optical coherence tomography (OCT) can be employed as an optical tracking system due to its high spatial and temporal resolution. Recently, 3D convolutional neural networks (CNNs) have shown promising performance for pose estimation of a marker object from single volumetric OCT images. While this approach relies on spatial information only, OCT also provides a temporal stream of image volumes that captures the motion of an object at high volume rates. In this work, we systematically extend 3D CNNs to 4D spatio-temporal CNNs to evaluate the impact of the additional temporal information on marker object tracking. Across various architectures, our results demonstrate that using a stream of OCT volumes with 4D spatio-temporal convolutions leads to a 30% lower mean absolute error compared to single-volume processing with 3D CNNs.
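PyTorch offers no native Conv4d, so 4D spatio-temporal convolutions are commonly realized by factorizing the kernel. The sketch below uses such a (3+1)D factorization as an assumption rather than as the paper's implementation: a spatial Conv3d is applied per volume, followed by a 1D convolution along the time axis.

```python
import torch
import torch.nn as nn

class SpatioTemporalConv4D(nn.Module):
    """Sketch: a (3+1)D stand-in for 4D convolution. A spatial Conv3d
    runs per volume, then a temporal Conv1d mixes information across
    the volume stream."""

    def __init__(self, in_ch: int, out_ch: int, t_kernel: int = 3):
        super().__init__()
        self.spatial = nn.Conv3d(in_ch, out_ch, 3, padding=1)
        self.temporal = nn.Conv1d(out_ch, out_ch, t_kernel, padding=t_kernel // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, channels, D, H, W) stream of OCT volumes
        b, t, c, d, h, w = x.shape
        y = self.spatial(x.flatten(0, 1))                  # (b*t, out, D, H, W)
        o = y.shape[1]
        y = y.reshape(b, t, o, -1).permute(0, 3, 2, 1)     # (b, voxels, out, t)
        y = self.temporal(y.flatten(0, 1))                 # mix along time
        y = y.reshape(b, -1, o, t).permute(0, 3, 2, 1)
        return y.reshape(b, t, o, d, h, w)

x = torch.randn(2, 4, 1, 16, 16, 16)
print(SpatioTemporalConv4D(1, 8)(x).shape)   # torch.Size([2, 4, 8, 16, 16, 16])
```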
Affiliation(s)
- Marcel Bengs: Institute of Medical Technology and Intelligent Systems, Hamburg University of Technology, Hamburg, Germany
- Nils Gessert: Institute of Medical Technology and Intelligent Systems, Hamburg University of Technology, Hamburg, Germany
- Alexander Schlaefer: Institute of Medical Technology and Intelligent Systems, Hamburg University of Technology, Hamburg, Germany