1
Ahmadian M, Bodalal Z, van der Hulst HJ, Vens C, Karssemakers LHE, Bogveradze N, Castagnoli F, Landolfi F, Hong EK, Gennaro N, Pizzi AD, Beets-Tan RGH, van den Brekel MWM, Castelijns JA. Overcoming data scarcity in radiomics/radiogenomics using synthetic radiomic features. Comput Biol Med 2024; 174:108389. [PMID: 38593640] [DOI: 10.1016/j.compbiomed.2024.108389]
Abstract
PURPOSE To evaluate the potential of synthetic radiomic data generation in addressing data scarcity in radiomics/radiogenomics models. METHODS This study was conducted on a retrospectively collected cohort of 386 colorectal cancer patients (n = 2570 lesions) for whom matched contrast-enhanced CT images and gene TP53 mutational status were available. The full cohort data was divided into a training cohort (n = 2055 lesions) and an independent and fixed test set (n = 515 lesions). Differently sized training sets were subsampled from the training cohort to measure the impact of sample size on model performance and assess the added value of synthetic radiomic augmentation at different sizes. Five different tabular synthetic data generation models were used to generate synthetic radiomic data based on "real-world" radiomics data extracted from this cohort. The quality and reproducibility of the generated synthetic radiomic data were assessed. Synthetic radiomics were then combined with "real-world" radiomic training data to evaluate their impact on the predictive model's performance. RESULTS A prediction model was generated using only "real-world" radiomic data, revealing the impact of data scarcity in this particular data set through a lack of predictive performance at low training sample numbers (n = 200, 400, 1000 lesions with average AUC = 0.52, 0.53, and 0.56 respectively, compared to 0.64 when using 2055 training lesions). Synthetic tabular data generation models created reproducible synthetic radiomic data with properties highly similar to "real-world" data (for n = 1000 lesions, average Chi-square = 0.932, average basic statistical correlation = 0.844). The integration of synthetic radiomic data consistently enhanced the performance of predictive models trained with small sample size sets (AUC enhanced by 9.6%, 11.3%, and 16.7% for models trained on n_samples = 200, 400, and 1000 lesions, respectively). 
In contrast, synthetic data generated from randomised/noisy radiomic data failed to enhance predictive performance, underlining that a true signal in the source data is required. CONCLUSION Synthetic radiomic data, when combined with real radiomics, could enhance the performance of predictive models. Tabular synthetic data generation might help to overcome limitations in medical AI stemming from data scarcity.
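The five tabular generators used in the study are not named in this abstract. As a minimal, hypothetical illustration of the general idea (fitting the joint distribution of real radiomic features and sampling synthetic rows to augment a small training set), here is a sketch using a single multivariate Gaussian; the feature table is toy data, not radiomics:

```python
import numpy as np

def augment_with_synthetic(X_real, n_synthetic, rng=None):
    """Fit a multivariate Gaussian to a real tabular feature matrix and
    sample synthetic rows; return real + synthetic stacked, plus the
    synthetic rows alone."""
    rng = np.random.default_rng(rng)
    mean = X_real.mean(axis=0)
    cov = np.cov(X_real, rowvar=False)
    X_syn = rng.multivariate_normal(mean, cov, size=n_synthetic)
    return np.vstack([X_real, X_syn]), X_syn

# toy "radiomic" feature table: 200 lesions x 5 correlated features
rng = np.random.default_rng(0)
X_real = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))
X_aug, X_syn = augment_with_synthetic(X_real, n_synthetic=300, rng=1)
```

The synthetic rows reproduce the first- and second-order statistics of the real data, which is the kind of similarity the study quantifies before mixing synthetic rows into the training set.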
Affiliation(s)
- Milad Ahmadian
- Department of Head and Neck Oncology and Surgery, The Netherlands Cancer Institute/Antoni van Leeuwenhoek Hospital, Amsterdam, the Netherlands; Department of Radiology, The Netherlands Cancer Institute/Antoni van Leeuwenhoek Hospital, Amsterdam, the Netherlands; Amsterdam Center for Language and Communication, University of Amsterdam, Amsterdam, the Netherlands
- Zuhir Bodalal
- Department of Radiology, The Netherlands Cancer Institute/Antoni van Leeuwenhoek Hospital, Amsterdam, the Netherlands; GROW School for Oncology and Developmental Biology, Maastricht University, Maastricht, the Netherlands
- Hedda J van der Hulst
- Department of Radiology, The Netherlands Cancer Institute/Antoni van Leeuwenhoek Hospital, Amsterdam, the Netherlands; GROW School for Oncology and Developmental Biology, Maastricht University, Maastricht, the Netherlands
- Conchita Vens
- Department of Head and Neck Oncology and Surgery, The Netherlands Cancer Institute/Antoni van Leeuwenhoek Hospital, Amsterdam, the Netherlands; School of Cancer Science, University of Glasgow, Glasgow, Scotland, UK
- Luc H E Karssemakers
- Department of Head and Neck Oncology and Surgery, The Netherlands Cancer Institute/Antoni van Leeuwenhoek Hospital, Amsterdam, the Netherlands
- Nino Bogveradze
- Department of Radiology, The Netherlands Cancer Institute/Antoni van Leeuwenhoek Hospital, Amsterdam, the Netherlands; GROW School for Oncology and Developmental Biology, Maastricht University, Maastricht, the Netherlands; Department of Radiology, American Hospital Tbilisi, Tbilisi, Georgia
- Francesca Castagnoli
- Department of Radiology, The Netherlands Cancer Institute/Antoni van Leeuwenhoek Hospital, Amsterdam, the Netherlands; Department of Radiology, Royal Marsden Hospital, London, UK; Division of Radiotherapy and Imaging, The Institute of Cancer Research, London, UK
- Federica Landolfi
- Department of Radiology, The Netherlands Cancer Institute/Antoni van Leeuwenhoek Hospital, Amsterdam, the Netherlands; Radiology Unit, Sant'Andrea Hospital, Sapienza University of Rome, Rome, Italy
- Eun Kyoung Hong
- Department of Radiology, The Netherlands Cancer Institute/Antoni van Leeuwenhoek Hospital, Amsterdam, the Netherlands; GROW School for Oncology and Developmental Biology, Maastricht University, Maastricht, the Netherlands; Seoul National University Hospital, Seoul, South Korea
- Nicolo Gennaro
- Department of Radiology, The Netherlands Cancer Institute/Antoni van Leeuwenhoek Hospital, Amsterdam, the Netherlands; Department of Radiology, Northwestern University, Chicago, USA
- Andrea Delli Pizzi
- Department of Radiology, The Netherlands Cancer Institute/Antoni van Leeuwenhoek Hospital, Amsterdam, the Netherlands; ITAB - Institute for Advanced Biomedical Technologies, G. d'Annunzio University, Chieti, Italy; Department of Innovative Technologies in Medicine and Dentistry, G. d'Annunzio University, Chieti, Italy
- Regina G H Beets-Tan
- Department of Radiology, The Netherlands Cancer Institute/Antoni van Leeuwenhoek Hospital, Amsterdam, the Netherlands; GROW School for Oncology and Developmental Biology, Maastricht University, Maastricht, the Netherlands; Institute of Regional Health Research, University of Southern Denmark, Odense, Denmark
- Michiel W M van den Brekel
- Department of Head and Neck Oncology and Surgery, The Netherlands Cancer Institute/Antoni van Leeuwenhoek Hospital, Amsterdam, the Netherlands; Amsterdam Center for Language and Communication, University of Amsterdam, Amsterdam, the Netherlands
- Jonas A Castelijns
- Department of Radiology, The Netherlands Cancer Institute/Antoni van Leeuwenhoek Hospital, Amsterdam, the Netherlands
2
Kaleta J, Dall'Alba D, Płotka S, Korzeniowski P. Minimal data requirement for realistic endoscopic image generation with Stable Diffusion. Int J Comput Assist Radiol Surg 2024; 19:531-539. [PMID: 37934401] [PMCID: PMC10881618] [DOI: 10.1007/s11548-023-03030-w]
Abstract
PURPOSE Computer-assisted surgical systems provide support information to the surgeon, which can improve the execution and overall outcome of the procedure. These systems are based on deep learning models that are trained on complex and challenging-to-annotate data. Generating synthetic data can overcome these limitations, but it is necessary to reduce the domain gap between real and synthetic data. METHODS We propose a method for image-to-image translation based on a Stable Diffusion model, which generates realistic images starting from synthetic data. Compared to previous works, the proposed method is better suited for clinical application as it requires a much smaller amount of input data and allows finer control over the generation of details by introducing different variants of supporting control networks. RESULTS The proposed method is applied in the context of laparoscopic cholecystectomy, using synthetic and real data from public datasets. It achieves a mean Intersection over Union of 69.76%, significantly improving the baseline results (69.76 vs. 42.21%). CONCLUSIONS The proposed method for translating synthetic images into images with realistic characteristics will enable the training of deep learning methods that can generalize optimally to real-world contexts, thereby improving computer-assisted intervention guidance systems.
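The reported segmentation metric, mean Intersection over Union, can be computed directly from label masks. A minimal sketch (the masks here are toy arrays, not the paper's data):

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean Intersection over Union, averaged over classes that occur in
    either mask; classes absent from both are skipped."""
    ious = []
    for c in range(num_classes):
        p, t = (pred == c), (target == c)
        union = np.logical_or(p, t).sum()
        if union == 0:          # class absent everywhere: skip it
            continue
        inter = np.logical_and(p, t).sum()
        ious.append(inter / union)
    return float(np.mean(ious))

target = np.array([[0, 0, 1], [1, 1, 2], [2, 2, 2]])
pred_perfect = target.copy()
```

A perfect prediction scores 1.0; an all-background prediction scores far lower, which is the gap (69.76 vs. 42.21%) the paper measures between its method and the baseline.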
Affiliation(s)
- Joanna Kaleta
- Sano Centre for Computational Medicine, Krakow, Poland
- Diego Dall'Alba
- Sano Centre for Computational Medicine, Krakow, Poland
- Department of Computer Science, University of Verona, Verona, Italy
- Szymon Płotka
- Sano Centre for Computational Medicine, Krakow, Poland
- Informatics Institute, University of Amsterdam, Amsterdam, The Netherlands
- Department of Biomedical Engineering and Physics, Amsterdam University Medical Center, Amsterdam, The Netherlands
3
Kopeliovich MV, Petrushan MV, Matukhno AE, Lysenko LV. Towards detection of cancer biomarkers in human exhaled air by transfer-learning-powered analysis of odor-evoked calcium activity in rat olfactory bulb. Heliyon 2024; 10:e20173. [PMID: 38173493] [PMCID: PMC10761347] [DOI: 10.1016/j.heliyon.2023.e20173]
Abstract
Detection of volatile organic compounds in exhaled air is a promising approach to non-invasive and scalable gastric cancer screening. This work proposes a new approach to detecting volatile organic compounds by analyzing odor-evoked calcium responses in the rat olfactory bulb. We estimate the feasibility of detecting a gastric cancer biomarker added to the exhaled air of healthy participants. Our detector consists of a convolutional encoder and a similarity-based classifier over the encoder outputs. To minimize overfitting on the small available training set, we pre-train the encoder on synthetic data representing spatiotemporal patterns similar to real calcium responses in the olfactory bulb. We estimate the classification accuracy of exhaled air samples by matching their encodings with encodings of calibration samples of two classes: 1) exhaled air and 2) a mixture of exhaled air with the cancer biomarker. On our data, accuracy increased from 0.68 to 0.74 when pre-training on synthetic data was included. Our work focuses on proving the feasibility of the proposed approach rather than comparing its efficiency with existing methods. Such detection is often performed with an electronic nose, but its output becomes unstable over time due to sensor drift. In contrast to the electronic nose, rats can robustly detect low concentrations of biomarkers over their lifetime. The feasibility of gastric cancer biomarker detection in exhaled air by a bio-hybrid system is shown. Pre-training neural models for image analysis increases detection accuracy.
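The similarity-based classification step (matching test encodings against calibration-sample encodings of the two classes) can be sketched minimally. The encoder is omitted here and replaced with toy encodings, and nearest-centroid matching is one simple choice of similarity rule, not necessarily the authors' exact one:

```python
import numpy as np

def nearest_centroid_predict(test_enc, calib_enc, calib_labels):
    """Assign each test encoding to the class whose calibration centroid
    is closest in Euclidean distance (0 = plain exhaled air,
    1 = exhaled air + biomarker)."""
    classes = np.unique(calib_labels)
    centroids = np.stack([calib_enc[calib_labels == c].mean(axis=0)
                          for c in classes])
    d = np.linalg.norm(test_enc[:, None, :] - centroids[None, :, :], axis=-1)
    return classes[np.argmin(d, axis=1)]

# toy 8-dimensional encodings: class 1 shifted away from class 0
rng = np.random.default_rng(0)
calib_enc = np.vstack([rng.normal(0, 1, (20, 8)), rng.normal(3, 1, (20, 8))])
calib_labels = np.array([0] * 20 + [1] * 20)
test_enc = np.vstack([rng.normal(0, 1, (5, 8)), rng.normal(3, 1, (5, 8))])
pred = nearest_centroid_predict(test_enc, calib_enc, calib_labels)
```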
Affiliation(s)
- Mikhail V. Petrushan
- WiznTech LLC, Rostov-on-Don, 344082, Russia
- Research Center for Neurotechnology, Southern Federal University, Rostov-on-Don, 344090, Russia
- Aleksey E. Matukhno
- Research Center for Neurotechnology, Southern Federal University, Rostov-on-Don, 344090, Russia
- Larisa V. Lysenko
- Research Center for Neurotechnology, Southern Federal University, Rostov-on-Don, 344090, Russia
- Department of Physics, Southern Federal University, Rostov-on-Don, 344090, Russia
4
Guo L, Nahm W. Texture synthesis for generating realistic-looking bronchoscopic videos. Int J Comput Assist Radiol Surg 2023; 18:2287-2293. [PMID: 37162734] [PMCID: PMC10632244] [DOI: 10.1007/s11548-023-02874-6]
Abstract
PURPOSE Synthetic realistic-looking bronchoscopic videos are needed to develop and evaluate depth estimation methods as part of investigating vision-based bronchoscopic navigation systems. To generate these synthetic videos when access to real bronchoscopic images or image sequences is limited, we need to create varied, realistic-looking, large-size image textures of the airway inner surface from a small number of real bronchoscopic image texture patches. METHODS A generative adversarial network-based method is applied to create realistic-looking textures of the airway inner surface by learning from a limited number of small texture patches from real bronchoscopic images. By applying a purely convolutional architecture without any fully connected layers, this method allows the production of textures of arbitrary size. RESULTS Authentic image textures of the airway inner surface are created. An example of the synthesized textures and two frames of the bronchoscopic video generated from them are shown. The necessity and sufficiency of the generated textures as image features for further depth estimation methods are demonstrated. CONCLUSIONS The method can generate textures of the airway inner surface that meet the requirements for the texture itself and for the bronchoscopic videos generated from it, including "realistic-looking," "long-term temporal consistency," "sufficient image features for depth estimation," and "large size and variety of synthesized textures." It also shows advantages with respect to easy access to the required data source. A further validation of this approach is planned by using realistic-looking bronchoscopic videos with textures generated by this method as training and test data for depth estimation networks.
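The size-agnostic property claimed for the purely convolutional architecture can be illustrated with a toy stack of 'same'-padded convolutions in plain numpy (not the paper's GAN): because no fully connected layer fixes the spatial dimensions, the output texture inherits whatever size the input noise map has.

```python
import numpy as np

def conv2d_same(x, kernel):
    """Naive 'same'-padded 2D convolution: output size equals input size."""
    kh, kw = kernel.shape
    xp = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * kernel)
    return out

def toy_conv_generator(noise, kernels):
    """Stack of conv layers with tanh nonlinearity; with no dense layer,
    the emitted 'texture' keeps the spatial size of the input noise map."""
    x = noise
    for k in kernels:
        x = np.tanh(conv2d_same(x, k))
    return x

rng = np.random.default_rng(0)
kernels = [rng.normal(size=(3, 3)) for _ in range(3)]
small = toy_conv_generator(rng.normal(size=(16, 16)), kernels)
large = toy_conv_generator(rng.normal(size=(64, 64)), kernels)
```

The same fixed kernels produce a 16x16 texture from 16x16 noise and a 64x64 texture from 64x64 noise, which is the mechanism that lets the trained generator emit arbitrarily large textures.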
Affiliation(s)
- Lu Guo
- Karlsruhe Institute of Technology, Kaiserstraße 12, Karlsruhe, 76131, Germany
- Werner Nahm
- Karlsruhe Institute of Technology, Kaiserstraße 12, Karlsruhe, 76131, Germany
5
Luo Y, Gao Y, Zhang Z, Fan J, Zhang H, Xu M. Long-range zero-shot generative deep network quantization. Neural Netw 2023; 166:683-691. [PMID: 37604077] [DOI: 10.1016/j.neunet.2023.07.042]
Abstract
Quantization approximates a deep network model that uses floating-point numbers with a model that uses low-bit-width numbers, thereby accelerating inference and reducing computation. Zero-shot quantization, which aims to quantize a model without access to the original data, can be achieved by fitting the real data distribution through data synthesis. However, it has been observed that zero-shot quantization leads to inferior performance compared to post-training quantization with real data, for two primary reasons: 1) a normal generator has difficulty obtaining high diversity in its synthetic data, since it lacks the long-range information needed to allocate attention to global features, and 2) synthetic images aim to simulate the statistics of real data, which leads to weak intra-class heterogeneity and limited feature richness. To overcome these problems, we propose a novel deep network quantizer called long-range zero-shot generative deep network quantization (LRQ). Technically, we propose a long-range generator (LRG) to learn long-range information instead of simple local features. To incorporate more global features into the synthetic data, we use long-range attention with large-kernel convolution in the generator. In addition, we present an adversarial margin add (AMA) module to force intra-class angular enlargement between the feature vector and the class center. The AMA module forms an adversarial process that increases the convergence difficulty of the loss function, which is opposite to the training objective of the original loss function. Furthermore, to transfer knowledge from the full-precision network, we also utilize decoupled knowledge distillation. Extensive experiments demonstrate that LRQ obtains better performance than other competitors.
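For background, ordinary post-training quantization (the baseline that zero-shot methods try to match without access to real data) can be sketched as a symmetric uniform quantizer. This is a generic illustration, not the LRQ method:

```python
import numpy as np

def quantize_dequantize(w, bits=8):
    """Symmetric uniform quantization of a weight tensor to `bits` bits,
    then dequantization back to float (what low-bit inference emulates)."""
    qmax = 2 ** (bits - 1) - 1                 # e.g. 127 for 8 bits
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int32)
    return q * scale, q

rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=(64, 64)).astype(np.float32)
w8, q8 = quantize_dequantize(w, bits=8)        # 8-bit: small error
w4, q4 = quantize_dequantize(w, bits=4)        # 4-bit: coarser grid
```

The rounding error per weight is bounded by half the quantization step, and shrinking the bit width enlarges that step, which is why low-bit settings need the careful calibration the paper studies.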
Affiliation(s)
- Yan Luo
- School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, China
- Yangcheng Gao
- School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, China
- Zhao Zhang
- School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, China; Shenzhen Research Institute of Big Data, Shenzhen, China
- Jicong Fan
- School of Data Science, The Chinese University of Hong Kong, Shenzhen, China; Shenzhen Research Institute of Big Data, Shenzhen, China
- Haijun Zhang
- School of Computer Science, Harbin Institute of Technology, Shenzhen, China
- Mingliang Xu
- School of Information Engineering, Zhengzhou University, Zhengzhou, China
6
Dials J, Demirel D, Sanchez-Arias R, Halic T, Kruger U, De S, Gromski MA. Skill-level classification and performance evaluation for endoscopic sleeve gastroplasty. Surg Endosc 2023. [PMID: 36897405] [PMCID: PMC10000349] [DOI: 10.1007/s00464-023-09955-2]
Abstract
BACKGROUND We previously developed grading metrics for quantitative performance measurement of simulated endoscopic sleeve gastroplasty (ESG) to create a scalar reference for classifying subjects into experts and novices. In this work, we used synthetic data generation and expanded our skill-level analysis using machine learning techniques. METHODS We used the synthetic data generation algorithm SMOTE to expand and balance our dataset of seven actual simulated ESG procedures with synthetic data. We performed optimization to seek the optimum metrics for classifying experts and novices by identifying the most critical and distinctive sub-tasks. We used support vector machine (SVM), AdaBoost, K-nearest neighbors (KNN), kernel Fisher discriminant analysis (KFDA), random forest, and decision tree classifiers to classify surgeons as experts or novices after grading. Furthermore, we used an optimization model to create weights for each task and separate the clusters by maximizing the distance between the expert and novice scores. RESULTS We split our dataset into a training set of 15 samples and a testing set of five samples. We put this dataset through the six classifiers (SVM, KFDA, AdaBoost, KNN, random forest, and decision tree), resulting in training accuracies of 0.94, 0.94, 1.00, 1.00, 1.00, and 1.00, respectively, and a testing accuracy of 1.00 for SVM and AdaBoost. Our optimization model increased the distance between the expert and novice groups from 2 to 53.72. CONCLUSION This paper shows that feature reduction, in combination with classification algorithms such as SVM and KNN, can be used to classify endoscopists as experts or novices based on their results recorded using our grading metrics. Furthermore, this work introduces a non-linear constrained optimization to separate the two clusters and find the most important tasks using weights.
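SMOTE generates a synthetic minority-class sample by interpolating between a real sample and one of its k nearest minority-class neighbours. A hand-rolled sketch of that idea follows (the study itself presumably used a standard implementation such as the one in imbalanced-learn):

```python
import numpy as np

def smote_like(X_min, n_new, k=3, rng=None):
    """SMOTE-style oversampling: for each new point, pick a random minority
    sample, one of its k nearest minority neighbours, and interpolate
    uniformly between the two."""
    rng = np.random.default_rng(rng)
    d = np.linalg.norm(X_min[:, None] - X_min[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)                 # exclude self as neighbour
    nn = np.argsort(d, axis=1)[:, :k]           # k nearest per sample
    out = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        j = nn[i, rng.integers(k)]
        lam = rng.random()
        out.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(out)

rng = np.random.default_rng(0)
X_min = rng.normal(size=(7, 4))     # e.g. seven graded procedures, 4 metrics
X_syn = smote_like(X_min, n_new=13, k=3, rng=1)
```

Because each synthetic point is a convex combination of two real samples, it stays inside the per-feature range of the original data, which is what makes SMOTE a conservative way to expand a seven-sample dataset to twenty.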
Affiliation(s)
- James Dials
- Department of Computer Science, Florida Polytechnic University, Lakeland, FL, USA
- Doga Demirel
- Department of Computer Science, Florida Polytechnic University, Lakeland, FL, USA
- Reinaldo Sanchez-Arias
- Department of Data Science and Business Analytics, Florida Polytechnic University, Lakeland, FL, USA
- Uwe Kruger
- Department of Biomedical Engineering, Rensselaer Polytechnic Institute, Troy, NY, USA
- Suvranu De
- College of Engineering, Florida A&M University - Florida State University, Tallahassee, FL, USA
- Mark A Gromski
- Division of Gastroenterology and Hepatology, Indiana University School of Medicine, Indianapolis, IN, USA
7
Posilović L, Medak D, Subašić M, Budimir M, Lončarić S. Generating ultrasonic images indistinguishable from real images using Generative Adversarial Networks. Ultrasonics 2022; 119:106610. [PMID: 34735930] [DOI: 10.1016/j.ultras.2021.106610]
Abstract
Ultrasonic imaging is widely used for non-destructive evaluation in various industry applications. Early detection of defects in materials is the key to preserving the integrity of inspected structures. There have been some attempts to develop models for automated defect detection on ultrasonic data. To push the performance of these models even further, more data are needed to train deep convolutional neural networks. A lot of data is also needed for training human experts. However, gathering a sufficient amount of training data is a challenge due to the rare occurrence of defects in real inspection scenarios, which is why inspection results depend heavily on the inspector's previous experience. To overcome these challenges, we propose the use of Generative Adversarial Networks for generating realistic ultrasonic images. To the best of our knowledge, this work is the first to show that a Generative Adversarial Network is able to generate images indistinguishable from real ultrasonic images. The most thorough statistical quality analysis to date of generated ultrasonic images has been conducted with the participation of human expert inspectors. The experimental results show that images generated using our Generative Adversarial Network are of the highest quality compared to other published methods.
Affiliation(s)
- Luka Posilović
- University of Zagreb, Faculty of Electrical Engineering and Computing, Zagreb, Croatia
- Duje Medak
- University of Zagreb, Faculty of Electrical Engineering and Computing, Zagreb, Croatia
- Marko Subašić
- University of Zagreb, Faculty of Electrical Engineering and Computing, Zagreb, Croatia
- Marko Budimir
- INETEC Institute for Nuclear Technology, Zagreb, Croatia
- Sven Lončarić
- University of Zagreb, Faculty of Electrical Engineering and Computing, Zagreb, Croatia
8
Momeni S, Fazlollahi A, Yates P, Rowe C, Gao Y, Liew AWC, Salvado O. Synthetic microbleeds generation for classifier training without ground truth. Comput Methods Programs Biomed 2021; 207:106127. [PMID: 34051412] [DOI: 10.1016/j.cmpb.2021.106127]
Abstract
BACKGROUND AND OBJECTIVE Cerebral microbleeds (CMB) are important biomarkers of cerebrovascular diseases and cognitive dysfunctions. Susceptibility weighted imaging (SWI) is a common MRI sequence on which CMB appear as small hypointense blobs. The prevalence of CMB in the population and in each scan is low, making visual assessment tedious and time-consuming. Automated detection methods would be of value but are challenged by the low CMB prevalence, the presence of mimics such as blood vessels, and the difficulty of obtaining sufficient ground truth for training and testing. In this paper, synthetic CMB (sCMB) generation using an analytical model is proposed for training and testing machine learning methods. The main aim is to create perfect synthetic ground truth, as similar as possible to real lesions, in high numbers and with high diversity of shape, volume, intensity, and location, to improve the training of supervised methods. METHOD sCMB were modelled as random Gaussian shapes added at healthy brain locations. We compared training on our synthetic data to standard augmentation techniques. We performed a validation experiment using sCMB and report results for whole-brain detection using a 10-fold cross-validation design with an ensemble of 10 neural networks. RESULTS Performance was close to the state of the art (~9 false positives per scan) when a random forest was trained on synthetic data only and tested on real lesions. Other experiments showed that top detection performance could be achieved when training on synthetic CMB only. Our dataset is made available, including a version with 37,000 synthetic lesions, which could be used for benchmarking and training. CONCLUSION Our proposed synthetic microbleed model is a powerful data augmentation approach for CMB classification and should be considered for training automated lesion detection systems from MRI SWI.
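The analytical model described (a random Gaussian shape added as a hypointensity at a healthy brain location) can be sketched in a few lines; the patch size, position, and intensity values below are illustrative only, not the paper's parameters:

```python
import numpy as np

def add_gaussian_microbleed(volume, center, sigma, depth):
    """Subtract an isotropic 3D Gaussian blob from `volume` at `center`,
    mimicking a hypointense microbleed on SWI; `depth` is the intensity
    drop at the blob centre."""
    z, y, x = np.indices(volume.shape)
    cz, cy, cx = center
    r2 = (z - cz) ** 2 + (y - cy) ** 2 + (x - cx) ** 2
    blob = depth * np.exp(-r2 / (2.0 * sigma ** 2))
    return volume - blob

vol = np.full((32, 32, 32), 100.0)   # toy uniform "healthy" SWI patch
lesioned = add_gaussian_microbleed(vol, center=(16, 16, 16),
                                   sigma=2.0, depth=40.0)
```

Randomising the centre, sigma (and, more generally, the covariance for anisotropic shapes), and depth yields the diversity of shape, volume, intensity, and location that the paper exploits, together with a perfect label at each planted centre.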
Affiliation(s)
- Saba Momeni
- CSIRO Health and Biosecurity, Australian E-Health Research Centre, Brisbane, Australia; School of Engineering and Built Environment, Griffith University, Brisbane, Australia
- Amir Fazlollahi
- CSIRO Health and Biosecurity, Australian E-Health Research Centre, Brisbane, Australia
- Paul Yates
- Department of Aged Care, Austin Health, Heidelberg, Victoria, Australia
- Christopher Rowe
- Department of Nuclear Medicine and Centre for PET, Austin Health, Heidelberg, Australia
- Yongsheng Gao
- School of Engineering and Built Environment, Griffith University, Brisbane, Australia
- Alan Wee-Chung Liew
- School of Information & Communication Technology, Griffith University, Gold Coast, Australia
9
Hein J, Seibold M, Bogo F, Farshad M, Pollefeys M, Fürnstahl P, Navab N. Towards markerless surgical tool and hand pose estimation. Int J Comput Assist Radiol Surg 2021; 16:799-808. [PMID: 33881732] [PMCID: PMC8134312] [DOI: 10.1007/s11548-021-02369-2]
Abstract
Purpose: Tracking of tools and surgical activity is becoming more and more important in the context of computer assisted surgery. In this work, we present a data generation framework, dataset and baseline methods to facilitate further research in the direction of markerless hand and instrument pose estimation in realistic surgical scenarios. Methods: We developed a rendering pipeline to create inexpensive and realistic synthetic data for model pretraining. Subsequently, we propose a pipeline to capture and label real data with hand and object pose ground truth in an experimental setup to gather high-quality real data. We furthermore present three state-of-the-art RGB-based pose estimation baselines. Results: We evaluate the three baseline models on the proposed datasets. The best performing baseline achieves an average tool 3D vertex error of 16.7 mm on synthetic data and 13.8 mm on real data, which is comparable to the state of the art in RGB-based hand/object pose estimation. Conclusion: To the best of our knowledge, we propose the first synthetic and real data generation pipelines to generate hand and object pose labels for open surgery. We present three baseline models for RGB-based object and object/hand pose estimation. Our realistic synthetic data generation pipeline may contribute to overcoming the data bottleneck in the surgical domain and can easily be transferred to other medical applications. Supplementary Information: The online version contains supplementary material available at 10.1007/s11548-021-02369-2.
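The headline metric, average tool 3D vertex error, is the mean Euclidean distance between corresponding predicted and ground-truth mesh vertices. A minimal sketch with toy vertices (the correspondence and units, e.g. mm, are assumptions of this illustration):

```python
import numpy as np

def mean_vertex_error(pred_vertices, gt_vertices):
    """Average Euclidean distance between corresponding predicted and
    ground-truth 3D mesh vertices (same units as the input, e.g. mm)."""
    return float(np.linalg.norm(pred_vertices - gt_vertices, axis=1).mean())

gt = np.zeros((100, 3))
pred = gt + np.array([3.0, 4.0, 0.0])   # every vertex offset by a 3-4-0 shift
```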
Affiliation(s)
- Jonas Hein
- Research in Orthopedic Computer Science, University Hospital Balgrist, University of Zurich, Balgrist CAMPUS, Zurich, Switzerland; Computer Vision and Geometry Group, ETH Zurich, Zurich, Switzerland
- Matthias Seibold
- Research in Orthopedic Computer Science, University Hospital Balgrist, University of Zurich, Balgrist CAMPUS, Zurich, Switzerland; Computer Aided Medical Procedures, Technical University Munich, Garching, Germany
- Federica Bogo
- Mixed Reality & AI Zurich Lab, Microsoft, Zurich, Switzerland
- Mazda Farshad
- Balgrist University Hospital, University of Zurich, Zurich, Switzerland
- Marc Pollefeys
- Computer Vision and Geometry Group, ETH Zurich, Zurich, Switzerland; Mixed Reality & AI Zurich Lab, Microsoft, Zurich, Switzerland
- Philipp Fürnstahl
- Research in Orthopedic Computer Science, University Hospital Balgrist, University of Zurich, Balgrist CAMPUS, Zurich, Switzerland
- Nassir Navab
- Computer Aided Medical Procedures, Technical University Munich, Garching, Germany
10
İncetan K, Celik IO, Obeid A, Gokceler GI, Ozyoruk KB, Almalioglu Y, Chen RJ, Mahmood F, Gilbert H, Durr NJ, Turan M. VR-Caps: A Virtual Environment for Capsule Endoscopy. Med Image Anal 2021; 70:101990. [PMID: 33609920] [DOI: 10.1016/j.media.2021.101990]
Abstract
Current capsule endoscopes and next-generation robotic capsules for diagnosis and treatment of gastrointestinal diseases are complex cyber-physical platforms that must orchestrate complex software and hardware functions. The desired tasks for these systems include visual localization, depth estimation, 3D mapping, disease detection and segmentation, automated navigation, active control, path realization and optional therapeutic modules such as targeted drug delivery and biopsy sampling. Data-driven algorithms promise to enable many advanced functionalities for capsule endoscopes, but real-world data is challenging to obtain. Physically-realistic simulations providing synthetic data have emerged as a solution to the development of data-driven algorithms. In this work, we present a comprehensive simulation platform for capsule endoscopy operations and introduce VR-Caps, a virtual active capsule environment that simulates a range of normal and abnormal tissue conditions (e.g., inflated, dry, wet etc.) and varied organ types, capsule endoscope designs (e.g., mono, stereo, dual and 360∘ camera), and the type, number, strength, and placement of internal and external magnetic sources that enable active locomotion. VR-Caps makes it possible to both independently or jointly develop, optimize, and test medical imaging and analysis software for the current and next-generation endoscopic capsule systems. To validate this approach, we train state-of-the-art deep neural networks to accomplish various medical image analysis tasks using simulated data from VR-Caps and evaluate the performance of these models on real medical data. Results demonstrate the usefulness and effectiveness of the proposed virtual platform in developing algorithms that quantify fractional coverage, camera trajectory, 3D map reconstruction, and disease classification. 
All of the code, pre-trained weights, and created 3D organ models of the virtual environment, with detailed instructions on how to set up and use the environment, are made publicly available at https://github.com/CapsuleEndoscope/VirtualCapsuleEndoscopy, and a video demonstration can be seen in the supplementary videos (Video-I).
Affiliation(s)
- Kağan İncetan
- Institute of Biomedical Engineering, Bogazici University, Istanbul, Turkey
- Ibrahim Omer Celik
- Department of Computer Engineering, Bogazici University, Istanbul, Turkey
- Abdulhamid Obeid
- Institute of Biomedical Engineering, Bogazici University, Istanbul, Turkey
- Richard J Chen
- Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Faisal Mahmood
- Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA; Cancer Data Science, Dana Farber Cancer Institute, Boston, MA, USA; Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Hunter Gilbert
- Department of Mechanical and Industrial Engineering, Louisiana State University, Baton Rouge, LA, USA
- Nicholas J Durr
- Department of Biomedical Engineering, Johns Hopkins University (JHU), Baltimore, MD, USA
- Mehmet Turan
- Institute of Biomedical Engineering, Bogazici University, Istanbul, Turkey.
11
Lobo J, Henriques R, Madeira SC. G-Tric: generating three-way synthetic datasets with triclustering solutions. BMC Bioinformatics 2021; 22:16. [PMID: 33413095 PMCID: PMC7789692 DOI: 10.1186/s12859-020-03925-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/16/2020] [Accepted: 12/07/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Three-way data started to gain popularity due to their increasing capacity to describe inherently multivariate and temporal events, such as biological responses, social interactions along time, urban dynamics, or complex geophysical phenomena. Triclustering, the subspace clustering of three-way data, enables the discovery of patterns corresponding to data subspaces (triclusters) with values correlated across the three dimensions (observations × features × contexts). With an increasing number of algorithms being proposed, effectively comparing them with the state of the art is paramount. These comparisons are usually performed using real data without a known ground truth, thus limiting the assessments. In this context, we propose a synthetic data generator, G-Tric, allowing the creation of synthetic datasets with configurable properties and the possibility to plant triclusters. The generator is prepared to create datasets resembling real three-way data from biomedical and social data domains, with the additional advantage of providing the ground truth (triclustering solution) as output. RESULTS G-Tric can replicate real-world datasets and create new ones that match researchers' needs across several properties, including data type (numeric or symbolic), dimensions, and background distribution. Users can tune the patterns and structure that characterize the planted triclusters (subspaces) and how they interact (overlapping). Data quality can also be controlled by defining the amount of missing values, noise, or errors. Furthermore, a benchmark of datasets resembling real data is made available, together with the corresponding triclustering solutions (planted triclusters) and generating parameters. CONCLUSIONS Triclustering evaluation using G-Tric provides the possibility to combine both intrinsic and extrinsic metrics to compare solutions, producing more reliable analyses.
A set of predefined datasets, mimicking widely used three-way data and exploring crucial properties, was generated and made available, highlighting G-Tric's potential to advance the triclustering state of the art by easing the evaluation of new triclustering approaches.
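The planting mechanism this abstract describes can be illustrated with a minimal sketch: a random background tensor (observations × features × contexts) into which a constant-pattern tricluster is written at known indices, so the planted subspace doubles as the ground-truth triclustering solution. This is an illustrative toy in NumPy, not G-Tric's actual implementation; the dimensions, indices, and `pattern_value` are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Background tensor: observations x features x contexts, U(0, 1) background.
data = rng.uniform(0.0, 1.0, size=(50, 20, 8))

# Plant a constant-pattern tricluster on a chosen subspace, with mild noise.
obs_idx = [3, 7, 12, 30]
feat_idx = [1, 4, 9]
ctx_idx = [0, 2, 5]
pattern_value = 5.0
block = pattern_value + rng.normal(0.0, 0.05, size=(len(obs_idx), len(feat_idx), len(ctx_idx)))
data[np.ix_(obs_idx, feat_idx, ctx_idx)] = block

# The planted indices are the ground-truth triclustering solution,
# which is exactly what a benchmark would hand to an evaluated algorithm.
ground_truth = {"obs": obs_idx, "features": feat_idx, "contexts": ctx_idx}
print(data[3, 1, 0])  # a value near 5.0, inside the planted tricluster
```

A real generator layers further options on this idea (symbolic data types, overlapping triclusters, configurable background distributions, injected missing values and noise), but the plant-then-record-ground-truth loop is the core.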
Affiliation(s)
- João Lobo
- LASIGE, Faculdade de Ciências, Universidade de Lisboa, Campo Grande 016, 1749-016, Lisbon, Portugal
- Rui Henriques
- INESC-ID and Instituto Superior Técnico, Universidade de Lisboa, Av. Rovisco Pais 1, 1900-001, Lisbon, Portugal
- Sara C Madeira
- LASIGE, Faculdade de Ciências, Universidade de Lisboa, Campo Grande 016, 1749-016, Lisbon, Portugal.
12
Goncalves A, Ray P, Soper B, Stevens J, Coyle L, Sales AP. Generation and evaluation of synthetic patient data. BMC Med Res Methodol 2020; 20:108. [PMID: 32381039 PMCID: PMC7204018 DOI: 10.1186/s12874-020-00977-1] [Citation(s) in RCA: 76] [Impact Index Per Article: 19.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/29/2019] [Accepted: 04/13/2020] [Indexed: 01/12/2023] Open
Abstract
BACKGROUND Machine learning (ML) has made a significant impact in medicine and cancer research; however, its impact in these areas has been undeniably slower and more limited than in other application domains. A major reason for this has been the lack of availability of patient data to the broader ML research community, in large part due to patient privacy protection concerns. High-quality, realistic, synthetic datasets can be leveraged to accelerate methodological developments in medicine. By and large, medical data is high-dimensional and often categorical. These characteristics pose multiple modeling challenges. METHODS In this paper, we evaluate three classes of synthetic data generation approaches: probabilistic models, classification-based imputation models, and generative adversarial neural networks. Metrics for evaluating the quality of the generated synthetic datasets are presented and discussed. RESULTS While the results and discussions are broadly applicable to medical data, for demonstration purposes we generate synthetic datasets for cancer based on the publicly available cancer registry data from the Surveillance Epidemiology and End Results (SEER) program. Specifically, our cohort consists of breast, respiratory, and non-solid cancer cases diagnosed between 2010 and 2015, which includes over 360,000 individual cases. CONCLUSIONS We discuss the trade-offs of the different methods and metrics, providing guidance on considerations for the generation and usage of medical synthetic data.
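One simple instance of the quality metrics such evaluations rely on is a per-column comparison of categorical marginals between real and synthetic data. The sketch below computes a total variation distance over one hypothetical categorical column; the `marginal_distance` helper and the stage labels are invented for illustration and are not the paper's specific metrics.

```python
import numpy as np
from collections import Counter

def marginal_distance(real_col, synth_col):
    """Total variation distance between the categorical marginals of one
    real column and its synthetic counterpart (0 = identical, 1 = disjoint)."""
    categories = set(real_col) | set(synth_col)
    real_freq, synth_freq = Counter(real_col), Counter(synth_col)
    n_real, n_synth = len(real_col), len(synth_col)
    return 0.5 * sum(
        abs(real_freq[c] / n_real - synth_freq[c] / n_synth)
        for c in categories
    )

rng = np.random.default_rng(1)
# Hypothetical "real" registry column and a synthetic imitation of it.
real = rng.choice(["stage_I", "stage_II", "stage_III"], size=1000, p=[0.5, 0.3, 0.2])
synth = rng.choice(["stage_I", "stage_II", "stage_III"], size=1000, p=[0.48, 0.32, 0.2])
print(round(marginal_distance(list(real), list(synth)), 3))  # small => similar marginals
```

Marginal agreement is necessary but not sufficient: a generator can match every univariate marginal while missing cross-column dependencies, which is why evaluations also use joint or model-based metrics.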
Affiliation(s)
- Andre Goncalves
- Lawrence Livermore National Laboratory, 7000 East Ave, Livermore, CA, USA.
- Priyadip Ray
- Lawrence Livermore National Laboratory, 7000 East Ave, Livermore, CA, USA
- Braden Soper
- Lawrence Livermore National Laboratory, 7000 East Ave, Livermore, CA, USA
- Jennifer Stevens
- Information Management Systems, 1455 Research Blvd, Suite 315, Rockville, MD, USA
- Linda Coyle
- Information Management Systems, 1455 Research Blvd, Suite 315, Rockville, MD, USA
- Ana Paula Sales
- Lawrence Livermore National Laboratory, 7000 East Ave, Livermore, CA, USA
13
Fowler EE, Berglund A, Schell MJ, Sellers TA, Eschrich S, Heine J. Empirically-derived synthetic populations to mitigate small sample sizes. J Biomed Inform 2020; 105:103408. [PMID: 32173502 DOI: 10.1016/j.jbi.2020.103408] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/21/2019] [Revised: 02/10/2020] [Accepted: 03/10/2020] [Indexed: 01/28/2023]
Abstract
Limited sample sizes can lead to spurious modeling findings in biomedical research. The objective of this work is to present a new method to generate synthetic populations (SPs) from limited samples using matched case-control data (n = 180 pairs), considered as two separate limited samples. SPs were generated with multivariate kernel density estimations (KDEs) with unconstrained bandwidth matrices. We included four continuous variables and one categorical variable for each individual. Bandwidth matrices were determined with Differential Evolution (DE) optimization based on covariance comparisons. Four synthetic samples (n = 180) were derived from their respective SPs. Similarity between observed and synthetic samples was assessed by comparing their empirical probability density functions (EPDFs). EPDFs were compared with the maximum mean discrepancy (MMD) test statistic based on the kernel two-sample test. To evaluate similarity within a modeling context, EPDFs derived from Principal Component Analysis (PCA) scores and residuals were summarized with the distance to the model in X-space (DModX) as additional comparisons. Four SPs were generated from each sample. The probability of selecting a replicate when randomly constructing synthetic samples (n = 180) was infinitesimally small. MMD tests indicated that the observed sample EPDFs were similar to the respective synthetic EPDFs. PCA scores and residuals of the samples did not deviate significantly when compared with their respective synthetic samples. The feasibility of this approach was demonstrated by producing synthetic data at the individual level that was statistically similar to the observed samples. The methodology coupled KDE with DE optimization and deployed novel similarity metrics derived from PCA. This approach could be used to generate larger-sized synthetic samples.
To develop this approach into a research tool for data exploration purposes, additional evaluation with increased dimensionality is required. Moreover, given a fully specified population, the degree to which individuals can be discarded while synthesizing the respective population accurately will be investigated. When these objectives are addressed, comparisons with other techniques such as bootstrapping will be required for a complete evaluation.
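The core generation step described here, a multivariate KDE fitted on a small sample and then resampled into a larger synthetic population, can be sketched as follows. Note that scipy's `gaussian_kde` uses a bandwidth proportional to the sample covariance rather than the unconstrained, DE-optimized bandwidth matrices the paper describes, and the Gaussian stand-in data and all sizes here are invented for illustration.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(2)

# Stand-in "limited sample": 180 individuals, 4 continuous variables
# (the paper additionally handles one categorical variable, omitted here).
sample = rng.multivariate_normal(
    mean=[0.0, 1.0, -1.0, 2.0],
    cov=np.diag([1.0, 0.5, 2.0, 1.5]),
    size=180,
)

# Fit a multivariate KDE on the limited sample. gaussian_kde expects
# variables in rows, so the sample is transposed.
kde = gaussian_kde(sample.T)

# Draw a larger synthetic population from the fitted density; each row
# is a new synthetic individual, not a replicate of an observed one.
synthetic = kde.resample(size=2000, seed=3).T
print(synthetic.shape)  # (2000, 4)
```

A faithful reimplementation would replace the default bandwidth with a full matrix chosen by DE optimization against covariance-similarity criteria, and would validate the output with MMD and PCA-based comparisons as the abstract describes.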
Affiliation(s)
- Erin E Fowler
- Cancer Epidemiology Department, MCC, Moffitt Cancer Center & Research Institute, 12901 Bruce B. Downs Blvd, Tampa, FL 33612, United States.
- Anders Berglund
- Department of Biostatistics and Bioinformatics, MCC, Moffitt Cancer Center & Research Institute, 12901 Bruce B. Downs Blvd, Tampa, FL 33612, United States.
- Michael J Schell
- Department of Biostatistics and Bioinformatics, MCC, Moffitt Cancer Center & Research Institute, 12901 Bruce B. Downs Blvd, Tampa, FL 33612, United States.
- Steven Eschrich
- Department of Biostatistics and Bioinformatics, MCC, Moffitt Cancer Center & Research Institute, 12901 Bruce B. Downs Blvd, Tampa, FL 33612, United States.
- John Heine
- Cancer Epidemiology Department, MCC, Moffitt Cancer Center & Research Institute, 12901 Bruce B. Downs Blvd, Tampa, FL 33612, United States.
14
Abstract
With the growth of the GNSS user base, studies of the threats and vulnerabilities of GNSS systems have also increased. Among these threats, spoofing is of particular interest because of the risk associated with it. Studies on spoofing are generally limited to simulated scenarios, as a real-world spoofing attack is very difficult to create, or to spot and record, for analysis. This paper presents a method of generating baseband spoofing data from real-world signals by simultaneously recording GNSS signals with two separate receivers, where one simulates the receiver under attack and the other simulates the response the spoofer would produce to fabricate the attack. After taking the recordings and merging them to create the spoofing baseband signal, the result is checked against several spoofing detection methods to verify that a valid spoofing attack is present in the signal. This method produces signal recordings that contain real-world disturbances that may be difficult to simulate. The developed method has the following advantages: it does not require very expensive hardware to produce an intermediate spoofing signal; the user has control over the spoofing power advantage; and the same scenario can be reproduced with varying parameters.
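The merging step, combining the two recordings so the spoofer has a user-controlled power advantage, can be sketched as a scale-and-sum over complex baseband arrays. The `merge_spoofing` helper and the noise stand-ins for the two receiver captures are hypothetical; the paper works with actual recorded GNSS signals.

```python
import numpy as np

def merge_spoofing(authentic, spoofer, power_advantage_db=3.0):
    """Combine an authentic baseband recording with a spoofer recording,
    scaling the spoofer so it carries the requested power advantage (dB)
    over the authentic signal."""
    p_auth = np.mean(np.abs(authentic) ** 2)
    p_spoof = np.mean(np.abs(spoofer) ** 2)
    target = p_auth * 10 ** (power_advantage_db / 10)
    scale = np.sqrt(target / p_spoof)
    return authentic + scale * spoofer

rng = np.random.default_rng(4)
n = 10_000
# Stand-ins for the two simultaneous receiver recordings
# (unit-power complex baseband noise for illustration).
authentic = (rng.normal(size=n) + 1j * rng.normal(size=n)) / np.sqrt(2)
spoofer = (rng.normal(size=n) + 1j * rng.normal(size=n)) / np.sqrt(2)

# Merged signal with the spoofer 3 dB above the authentic signal.
merged = merge_spoofing(authentic, spoofer, power_advantage_db=3.0)
```

Because the scale factor is an explicit parameter, the same pair of recordings can be re-merged at different power advantages to reproduce the scenario under varying conditions, which is one of the advantages the abstract lists.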
Affiliation(s)
- Abdul Malik Khan
- National University of Sciences and Technology, Islamabad, Pakistan
- Naveed Iqbal
- National University of Sciences and Technology, Islamabad, Pakistan