1. Sourlos N, Vliegenthart R, Santinha J, Klontzas ME, Cuocolo R, Huisman M, van Ooijen P. Recommendations for the creation of benchmark datasets for reproducible artificial intelligence in radiology. Insights Imaging 2024; 15:248. PMID: 39400639; PMCID: PMC11473745; DOI: 10.1186/s13244-024-01833-2.
Abstract
Various healthcare domains, including radiology, have witnessed successful preliminary implementation of artificial intelligence (AI) solutions, though limited generalizability hinders their widespread adoption. Currently, most research groups and industry have limited access to the data needed for external validation studies. The creation and accessibility of benchmark datasets to validate such solutions represents a critical step towards generalizability, for which an array of aspects ranging from preprocessing to regulatory issues and biostatistical principles come into play. In this article, the authors provide recommendations for the creation of benchmark datasets in radiology, explain current limitations in this realm, and explore potential new approaches. CLINICAL RELEVANCE STATEMENT: Benchmark datasets, facilitating validation of AI software performance, can contribute to the adoption of AI in clinical practice. KEY POINTS: Benchmark datasets are essential for the validation of AI software performance. Factors like image quality and representativeness of cases should be considered. Benchmark datasets can help adoption by increasing the trustworthiness and robustness of AI.
Affiliation(s)
- Nikos Sourlos
- Department of Radiology, University Medical Center of Groningen, Groningen, The Netherlands
- DataScience Center in Health, University Medical Center Groningen, Groningen, The Netherlands
- Rozemarijn Vliegenthart
- Department of Radiology, University Medical Center of Groningen, Groningen, The Netherlands
- DataScience Center in Health, University Medical Center Groningen, Groningen, The Netherlands
- Joao Santinha
- Digital Surgery LAB, Champalimaud Foundation, Champalimaud Clinical Centre, Lisbon, Portugal
- Michail E Klontzas
- Department of Medical Imaging, University Hospital of Heraklion, Heraklion, Greece
- Department of Radiology, School of Medicine, University of Crete, Heraklion, Greece
- Renato Cuocolo
- Department of Medicine, Surgery, and Dentistry, University of Salerno, Baronissi, Italy
- Merel Huisman
- Department of Radiology and Nuclear Medicine, Radboud University Medical Center, Nijmegen, The Netherlands
- Peter van Ooijen
- DataScience Center in Health, University Medical Center Groningen, Groningen, The Netherlands
- Department of Radiation Oncology, University Medical Center Groningen, Groningen, The Netherlands
2. Wahid KA, Kaffey ZY, Farris DP, Humbert-Vidan L, Moreno AC, Rasmussen M, Ren J, Naser MA, Netherton TJ, Korreman S, Balakrishnan G, Fuller CD, Fuentes D, Dohopolski MJ. Artificial intelligence uncertainty quantification in radiotherapy applications - A scoping review. Radiother Oncol 2024; 201:110542. PMID: 39299574; DOI: 10.1016/j.radonc.2024.110542.
Abstract
BACKGROUND/PURPOSE The use of artificial intelligence (AI) in radiotherapy (RT) is expanding rapidly. However, there exists a notable lack of clinician trust in AI models, underscoring the need for effective uncertainty quantification (UQ) methods. The purpose of this study was to scope existing literature related to UQ in RT, identify areas of improvement, and determine future directions. METHODS We followed the PRISMA-ScR scoping review reporting guidelines. We utilized the population (human cancer patients), concept (utilization of AI UQ), context (radiotherapy applications) framework to structure our search and screening process. We conducted a systematic search spanning seven databases, supplemented by manual curation, up to January 2024. Our search yielded a total of 8980 articles for initial review. Manuscript screening and data extraction were performed in Covidence. Data extraction categories included general study characteristics, RT characteristics, AI characteristics, and UQ characteristics. RESULTS We identified 56 articles published from 2015 to 2024. Ten domains of RT applications were represented; most studies evaluated auto-contouring (50%), followed by image synthesis (13%) and multiple applications simultaneously (11%). Twelve disease sites were represented, with head and neck cancer being the most common disease site independent of application space (32%). Imaging data were used in 91% of studies, while only 13% incorporated RT dose information. Most studies focused on failure detection as the main application of UQ (60%), with Monte Carlo dropout being the most commonly implemented UQ method (32%), followed by ensembling (16%). 55% of studies did not share code or datasets. CONCLUSION Our review revealed a lack of diversity in UQ for RT applications beyond auto-contouring. Moreover, we identified a clear need to study additional UQ methods, such as conformal prediction. Our results may incentivize the development of guidelines for reporting and implementation of UQ in RT.
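The review singles out Monte Carlo dropout as the most commonly implemented UQ method. As a concrete illustration of the technique itself (not of any reviewed study's implementation), here is a minimal PyTorch sketch; the toy network, dropout rate, and pass count are all arbitrary assumptions.

```python
# Minimal Monte Carlo dropout sketch (PyTorch). The model, shapes, and
# pass count are illustrative, not taken from any reviewed study.
import torch
import torch.nn as nn

class TinySegNet(nn.Module):
    """Toy 3D segmentation head with dropout so MC sampling has an effect."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv3d(1, 8, kernel_size=3, padding=1)
        self.drop = nn.Dropout3d(p=0.2)
        self.head = nn.Conv3d(8, 1, kernel_size=1)

    def forward(self, x):
        return self.head(self.drop(torch.relu(self.conv(x))))

def mc_dropout_predict(model, x, n_passes=20):
    """Keep dropout active at inference and average sampled predictions."""
    model.train()  # enables dropout; in practice freeze batch-norm layers
    with torch.no_grad():
        probs = torch.stack([torch.sigmoid(model(x)) for _ in range(n_passes)])
    mean = probs.mean(dim=0)   # predictive probability per voxel
    std = probs.std(dim=0)     # voxel-wise uncertainty estimate
    return mean, std

model = TinySegNet()
scan = torch.randn(1, 1, 16, 32, 32)  # dummy single-channel volume
mean, std = mc_dropout_predict(model, scan)
print(mean.shape, std.max().item())
```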
Affiliation(s)
- Kareem A Wahid
- Department of Imaging Physics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA; Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Zaphanlene Y Kaffey
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- David P Farris
- Research Medical Library, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Laia Humbert-Vidan
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Amy C Moreno
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Jintao Ren
- Department of Oncology, Aarhus University Hospital, Denmark
- Mohamed A Naser
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Tucker J Netherton
- Department of Radiation Physics, University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Stine Korreman
- Department of Oncology, Aarhus University Hospital, Denmark
- Clifton D Fuller
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- David Fuentes
- Department of Imaging Physics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Michael J Dohopolski
- Department of Radiation Oncology, The University of Texas Southwestern Medical Center, Dallas, TX, USA
3. Li L, Jiang C, Yu L, Zeng X, Zheng S. Efficient model-informed co-segmentation of tumors on PET/CT driven by clustering and classification information. Comput Biol Med 2024; 180:108980. PMID: 39137668; DOI: 10.1016/j.compbiomed.2024.108980.
Abstract
Automatic tumor segmentation from positron emission tomography (PET) and computed tomography (CT) images plays a critical role in cancer prevention, diagnosis, and treatment in radiation oncology. However, segmenting these tumors is challenging due to the heterogeneity of grayscale levels and fuzzy boundaries. To address these issues, this paper proposes an efficient model-informed PET/CT tumor co-segmentation method that combines fuzzy C-means clustering and Bayesian classification information. To alleviate the grayscale heterogeneity of multi-modal images, a novel grayscale similar region term is designed based on the background region information of PET and the foreground region information of CT. An edge stop function is presented to enhance the localization of fuzzy edges by incorporating the fuzzy C-means clustering strategy. To further improve segmentation accuracy, a unique data fidelity term is introduced based on the distribution characteristics of pixel points in PET images. Finally, experimental validation on head and neck tumor (HECKTOR) and non-small cell lung cancer (NSCLC) datasets achieved values of 0.85, 5.32, and 0.17 for the three key evaluation metrics DSC, RVD, and HD5, respectively. These results indicate that image segmentation methods based on mathematical models perform strongly in handling grayscale heterogeneity and fuzzy boundaries in multi-modal images.
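For orientation, a minimal NumPy sketch of the fuzzy C-means update at the heart of such a clustering strategy, applied to a synthetic 1-D intensity distribution; the paper's model-informed region, edge, and fidelity terms are not reproduced, and all parameters below are arbitrary.

```python
# Minimal fuzzy C-means sketch (NumPy) on scalar intensities; the paper's
# model-informed co-segmentation terms are not reproduced here.
import numpy as np

def fuzzy_cmeans(x, n_clusters=2, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """x: 1-D array of voxel intensities. Returns memberships and centers."""
    rng = np.random.default_rng(seed)
    u = rng.random((len(x), n_clusters))
    u /= u.sum(axis=1, keepdims=True)           # memberships sum to 1
    for _ in range(n_iter):
        um = u ** m
        centers = (um.T @ x) / um.sum(axis=0)   # fuzzily weighted centers
        d = np.abs(x[:, None] - centers[None, :]) + 1e-12
        new_u = 1.0 / (d ** (2 / (m - 1)))      # standard FCM membership update
        new_u /= new_u.sum(axis=1, keepdims=True)
        if np.abs(new_u - u).max() < tol:
            u = new_u
            break
        u = new_u
    return u, centers

intensities = np.concatenate([np.random.normal(0.2, 0.05, 500),
                              np.random.normal(0.8, 0.05, 200)])
u, centers = fuzzy_cmeans(intensities)
tumor_mask = u[:, centers.argmax()] > 0.5       # membership-based labeling
print(centers, tumor_mask.sum())
```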
Affiliation(s)
- Laquan Li
- College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
- Chuangbo Jiang
- School of Science, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
- Lei Yu
- Emergency Department, The Second Affiliated Hospital of Chongqing Medical University, Chongqing, 400010, China
- Xianhua Zeng
- College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
- Shenhai Zheng
- College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
4. Ren J, Teuwen J, Nijkamp J, Rasmussen M, Gouw Z, Grau Eriksen J, Sonke JJ, Korreman S. Enhancing the reliability of deep learning-based head and neck tumour segmentation using uncertainty estimation with multi-modal images. Phys Med Biol 2024; 69:165018. PMID: 39059432; DOI: 10.1088/1361-6560/ad682d.
Abstract
Objective. Deep learning shows promise in autosegmentation of head and neck cancer (HNC) primary tumours (GTV-T) and nodal metastases (GTV-N). However, errors such as including non-tumour regions or missing nodal metastases still occur. Conventional methods often make overconfident predictions, compromising reliability. Incorporating uncertainty estimation, which provides calibrated confidence intervals, can address this issue. Our aim was to investigate the efficacy of various uncertainty estimation methods in improving segmentation reliability. We evaluated their confidence levels in voxel predictions and their ability to reveal potential segmentation errors. Approach. We retrospectively collected data from 567 HNC patients with diverse cancer sites and multi-modality images (CT, PET, T1- and T2-weighted MRI) along with their clinical GTV-T/N delineations. Using the nnUNet 3D segmentation pipeline, we compared seven uncertainty estimation methods, evaluating them based on segmentation accuracy (Dice similarity coefficient, DSC), confidence calibration (expected calibration error, ECE), and their ability to reveal segmentation errors (uncertainty-error overlap using DSC, UE-DSC). Main results. Evaluated on the hold-out test dataset (n = 97), the median DSC scores for GTV-T and GTV-N segmentation across all uncertainty estimation methods had a narrow range, from 0.73 to 0.76 and 0.78 to 0.80, respectively. In contrast, the median ECE exhibited a wider range, from 0.30 to 0.12 for GTV-T and 0.25 to 0.09 for GTV-N. Similarly, the median UE-DSC also ranged broadly, from 0.21 to 0.38 for GTV-T and 0.22 to 0.36 for GTV-N. A probabilistic network, PhiSeg, consistently demonstrated the best performance in terms of ECE and UE-DSC. Significance. Our study highlights the importance of uncertainty estimation in enhancing the reliability of deep learning for autosegmentation of HNC GTV. The results show that while segmentation accuracy can be similar across methods, their reliability, measured by calibration error and uncertainty-error overlap, varies significantly. Used with visualisation maps, these methods may effectively pinpoint uncertainties and potential errors at the voxel level.
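Expected calibration error is one of the headline metrics here. A small NumPy sketch of binned ECE for binary voxel predictions follows; the ten-bin choice and the synthetic, roughly calibrated toy data are assumptions, not the paper's setup.

```python
# Expected Calibration Error sketch (NumPy) for binary voxel predictions,
# using equal-width confidence bins; the bin count is an arbitrary choice.
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """probs: predicted foreground probabilities; labels: 0/1 ground truth."""
    probs, labels = probs.ravel(), labels.ravel()
    conf = np.maximum(probs, 1 - probs)        # confidence of predicted class
    pred = (probs >= 0.5).astype(int)
    correct = (pred == labels).astype(float)
    edges = np.linspace(0.5, 1.0, n_bins + 1)  # binary confidence lives in [0.5, 1]
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - conf[in_bin].mean())
            ece += in_bin.mean() * gap         # weight gap by bin frequency
    return ece

rng = np.random.default_rng(0)
p = rng.random(10000)
y = (rng.random(10000) < p).astype(int)        # well-calibrated toy data
print(round(expected_calibration_error(p, y), 4))
```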
Affiliation(s)
- Jintao Ren
- Danish Centre for Particle Therapy, Aarhus University Hospital, Palle Juul-Jensens Boulevard 25, 8200 Aarhus N, Denmark
- Department of Oncology, Aarhus University Hospital, Palle Juul-Jensens Boulevard 25, 8200 Aarhus N, Denmark
- Department of Clinical Medicine, Aarhus University, Palle Juul-Jensens Boulevard 25, 8200 Aarhus N, Denmark
- Jonas Teuwen
- Department of Radiation Oncology, Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX Amsterdam, The Netherlands
- Jasper Nijkamp
- Danish Centre for Particle Therapy, Aarhus University Hospital, Palle Juul-Jensens Boulevard 25, 8200 Aarhus N, Denmark
- Department of Clinical Medicine, Aarhus University, Palle Juul-Jensens Boulevard 25, 8200 Aarhus N, Denmark
- Mathis Rasmussen
- Danish Centre for Particle Therapy, Aarhus University Hospital, Palle Juul-Jensens Boulevard 25, 8200 Aarhus N, Denmark
- Department of Oncology, Aarhus University Hospital, Palle Juul-Jensens Boulevard 25, 8200 Aarhus N, Denmark
- Department of Clinical Medicine, Aarhus University, Palle Juul-Jensens Boulevard 25, 8200 Aarhus N, Denmark
- Zeno Gouw
- Department of Radiation Oncology, Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX Amsterdam, The Netherlands
- Jesper Grau Eriksen
- Department of Oncology, Aarhus University Hospital, Palle Juul-Jensens Boulevard 25, 8200 Aarhus N, Denmark
- Department of Clinical Medicine, Aarhus University, Palle Juul-Jensens Boulevard 25, 8200 Aarhus N, Denmark
- Jan-Jakob Sonke
- Department of Radiation Oncology, Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX Amsterdam, The Netherlands
- Stine Korreman
- Danish Centre for Particle Therapy, Aarhus University Hospital, Palle Juul-Jensens Boulevard 25, 8200 Aarhus N, Denmark
- Department of Oncology, Aarhus University Hospital, Palle Juul-Jensens Boulevard 25, 8200 Aarhus N, Denmark
- Department of Clinical Medicine, Aarhus University, Palle Juul-Jensens Boulevard 25, 8200 Aarhus N, Denmark
5. Ma B, Guo J, De Biase A, van Dijk LV, van Ooijen PMA, Langendijk JA, Both S, Sijtsema NM. PET/CT based transformer model for multi-outcome prediction in oropharyngeal cancer. Radiother Oncol 2024; 197:110368. PMID: 38834153; DOI: 10.1016/j.radonc.2024.110368.
Abstract
BACKGROUND AND PURPOSE To optimize our previously proposed TransRP, a model integrating a CNN (convolutional neural network) and ViT (Vision Transformer) designed for recurrence-free survival prediction in oropharyngeal cancer, and to extend its application to the prediction of multiple clinical outcomes, including locoregional control (LRC), distant metastasis-free survival (DMFS), and overall survival (OS). MATERIALS AND METHODS Data were collected from 400 patients (300 for training and 100 for testing) diagnosed with oropharyngeal squamous cell carcinoma (OPSCC) who underwent (chemo)radiotherapy at University Medical Center Groningen. Each patient's data comprised pre-treatment PET/CT scans, clinical parameters, and the clinical outcome endpoints LRC, DMFS, and OS. The prediction performance of TransRP was compared with that of CNNs when inputting image data only. Additionally, three distinct methods (m1-m3) of incorporating clinical predictors into TransRP training and one method (m4) that uses the TransRP prediction as a parameter in a clinical Cox model were compared. RESULTS TransRP achieved higher test C-index values of 0.61, 0.84, and 0.70 than CNNs for LRC, DMFS, and OS, respectively. Furthermore, when incorporating TransRP's prediction into a clinical Cox model (m4), a higher C-index of 0.77 for OS was obtained. Compared with a clinical routine risk stratification model for OS, our model, using clinical variables, radiomics, and the TransRP prediction as predictors, achieved larger separations of survival curves between low-, intermediate-, and high-risk groups. CONCLUSION TransRP outperformed CNN models for all endpoints. Combining clinical data and the TransRP prediction in a Cox model achieved better OS prediction.
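Model discrimination above is reported via the C-index. As a reference, here is a sketch of Harrell's concordance index with right-censoring, written as an O(n^2) loop for clarity rather than efficiency; the toy data are invented.

```python
# Harrell's C-index sketch (NumPy) with right-censoring; a plain double
# loop for clarity, not the paper's implementation.
import numpy as np

def c_index(risk, time, event):
    """risk: higher = worse predicted outcome; event: 1 = event observed."""
    concordant, comparable = 0.0, 0
    n = len(risk)
    for i in range(n):
        for j in range(n):
            # Pair is comparable if i had the event before j's follow-up ended.
            if event[i] == 1 and time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1
                elif risk[i] == risk[j]:
                    concordant += 0.5          # ties count as half
    return concordant / comparable

time = np.array([5.0, 8.0, 12.0, 3.0, 9.0])
event = np.array([1, 1, 0, 1, 0])              # 0 = censored
risk = np.array([0.8, 0.6, 0.2, 0.9, 0.4])
print(c_index(risk, time, event))              # 1.0: risks perfectly anti-ordered with time
```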
Affiliation(s)
- Baoqiang Ma
- Department of Radiation Oncology, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
- Jiapan Guo
- Department of Radiation Oncology, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands; Machine Learning Lab, Data Science Center in Health (DASH), Groningen, the Netherlands; Bernoulli Institute for Mathematics, Computer Science and Artificial Intelligence, University of Groningen, Groningen, the Netherlands
- Alessia De Biase
- Department of Radiation Oncology, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands; Machine Learning Lab, Data Science Center in Health (DASH), Groningen, the Netherlands
- Lisanne V van Dijk
- Department of Radiation Oncology, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands; Department of Radiation Oncology, University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Peter M A van Ooijen
- Department of Radiation Oncology, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands; Machine Learning Lab, Data Science Center in Health (DASH), Groningen, the Netherlands
- Johannes A Langendijk
- Department of Radiation Oncology, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
- Stefan Both
- Department of Radiation Oncology, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
- Nanna M Sijtsema
- Department of Radiation Oncology, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
6. De Biase A, Ziegfeld L, Sijtsema NM, Steenbakkers R, Wijsman R, van Dijk LV, Langendijk JA, Cnossen F, van Ooijen P. Probability maps for deep learning-based head and neck tumor segmentation: Graphical User Interface design and test. Comput Biol Med 2024; 177:108675. PMID: 38820779; DOI: 10.1016/j.compbiomed.2024.108675.
Abstract
BACKGROUND The different appearance of head and neck tumors across imaging modalities, scanners, and acquisition parameters accounts for the highly subjective nature of the manual tumor segmentation task. The variability of the manual contours is one of the causes of the lack of generalizability and the suboptimal performance of deep learning (DL) based tumor auto-segmentation models. Therefore, a DL-based method was developed that outputs predicted tumor probabilities for each PET-CT voxel in the form of a probability map instead of one fixed contour. The aim of this study was to show that DL-generated probability maps for tumor segmentation are clinically relevant, intuitive, and a more suitable solution to assist radiation oncologists in gross tumor volume segmentation on PET-CT images of head and neck cancer patients. METHOD A graphical user interface (GUI) was designed, and a prototype was developed, to allow the user to interact with tumor probability maps. Furthermore, a user study was conducted in which nine experts in tumor delineation interacted with the interface prototype and its functionality. The participants' experience was assessed qualitatively and quantitatively. RESULTS The interviews with radiation oncologists revealed their preference for using a rainbow colormap to visualize tumor probability maps during contouring, which they found intuitive. They also appreciated the slider feature, which facilitated interaction by allowing the selection of threshold values to create single contours for editing and use as a starting point. Feedback on the prototype highlighted its excellent usability and positive integration into clinical workflows. CONCLUSIONS This study shows that DL-generated tumor probability maps are explainable, transparent, intuitive, and a better alternative to the single output of tumor segmentation models.
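Mechanically, the slider described above amounts to binarizing the probability map at a user-chosen threshold and extracting an editable contour. A NumPy sketch on a synthetic blob-shaped map, with invented threshold stops; the actual GUI logic is not reproduced.

```python
# Sketch of the slider idea: threshold a predicted tumor probability map
# into a single editable contour. Map and thresholds are synthetic.
import numpy as np

def probability_map_to_mask(prob_map, threshold):
    """Binarize a [0, 1] probability map at a user-chosen slider value."""
    return (prob_map >= threshold).astype(np.uint8)

def mask_boundary(mask):
    """Mark boundary pixels: foreground with at least one background neighbor."""
    padded = np.pad(mask, 1)
    interior = np.ones_like(mask, dtype=bool)
    for axis in range(mask.ndim):
        for shift in (-1, 1):
            neighbor = np.roll(padded, shift, axis=axis)[
                tuple(slice(1, -1) for _ in range(mask.ndim))
            ]
            interior &= neighbor.astype(bool)   # interior = all neighbors foreground
    return mask.astype(bool) & ~interior

yy, xx = np.mgrid[0:64, 0:64]
prob = np.exp(-((yy - 32) ** 2 + (xx - 32) ** 2) / 200)    # blob-like map
for t in (0.3, 0.5, 0.8):                                  # three slider stops
    mask = probability_map_to_mask(prob, t)
    print(t, mask.sum(), mask_boundary(mask).sum())
```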
Affiliation(s)
- Alessia De Biase
- Department of Radiation Oncology, University Medical Center Groningen (UMCG), 9700 RB, Groningen, the Netherlands; Data Science Center in Health (DASH), University Medical Center Groningen (UMCG), 9700 RB, Groningen, the Netherlands
- Liv Ziegfeld
- University of Groningen (RUG), 9700 AK, Groningen, the Netherlands
- Nanna Maria Sijtsema
- Department of Radiation Oncology, University Medical Center Groningen (UMCG), 9700 RB, Groningen, the Netherlands
- Roel Steenbakkers
- Department of Radiation Oncology, University Medical Center Groningen (UMCG), 9700 RB, Groningen, the Netherlands
- Robin Wijsman
- Department of Radiation Oncology, University Medical Center Groningen (UMCG), 9700 RB, Groningen, the Netherlands
- Lisanne V van Dijk
- Department of Radiation Oncology, University Medical Center Groningen (UMCG), 9700 RB, Groningen, the Netherlands
- Johannes A Langendijk
- Department of Radiation Oncology, University Medical Center Groningen (UMCG), 9700 RB, Groningen, the Netherlands
- Fokie Cnossen
- Department of Artificial Intelligence, Bernoulli Institute of Mathematics, Computer Science and Artificial Intelligence, University of Groningen (RUG), 9700 AK, Groningen, the Netherlands
- Peter van Ooijen
- Department of Radiation Oncology, University Medical Center Groningen (UMCG), 9700 RB, Groningen, the Netherlands; Data Science Center in Health (DASH), University Medical Center Groningen (UMCG), 9700 RB, Groningen, the Netherlands
7. Sahlsten J, Jaskari J, Wahid KA, Ahmed S, Glerean E, He R, Kann BH, Mäkitie A, Fuller CD, Naser MA, Kaski K. Application of simultaneous uncertainty quantification and segmentation for oropharyngeal cancer use-case with Bayesian deep learning. Commun Med 2024; 4:110. PMID: 38851837; PMCID: PMC11162474; DOI: 10.1038/s43856-024-00528-5.
Abstract
BACKGROUND Radiotherapy is a core treatment modality for oropharyngeal cancer (OPC), where the primary gross tumor volume (GTVp) is manually segmented with high interobserver variability. This calls for reliable and trustworthy automated tools in the clinician workflow. Therefore, accurate uncertainty quantification and its downstream utilization are critical. METHODS Here we propose uncertainty-aware deep learning for OPC GTVp segmentation, and illustrate the utility of uncertainty in multiple applications. We examine two Bayesian deep learning (BDL) models and eight uncertainty measures, and utilize a large multi-institute dataset of 292 PET/CT scans to systematically analyze our approach. RESULTS We show that our uncertainty-based approach accurately predicts the quality of the deep learning segmentation in 86.6% of cases, identifies low-performance cases for semi-automated correction, and visualizes regions of the scans where the segmentations likely fail. CONCLUSIONS Our BDL-based analysis provides a first step towards more widespread implementation of uncertainty quantification in OPC GTVp segmentation.
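One common way such models localize likely failures is a voxel-wise uncertainty map computed from sampled predictions. Below is a NumPy sketch of predictive entropy over stand-in Bayesian samples; the sample count, the beta-distributed fake probabilities, and the top-5% flagging rule are assumptions, and the paper's specific measures may differ.

```python
# Voxel-wise predictive entropy sketch (NumPy) from sampled foreground
# probabilities, one of several uncertainty measures of this general kind.
import numpy as np

def predictive_entropy(sampled_probs, eps=1e-12):
    """sampled_probs: (n_samples, *volume) foreground probabilities in [0, 1]."""
    p = sampled_probs.mean(axis=0)               # mean predictive probability
    return -(p * np.log(p + eps) + (1 - p) * np.log(1 - p + eps))

rng = np.random.default_rng(1)
samples = rng.beta(2, 2, size=(8, 16, 16, 16))   # stand-in for 8 BDL samples
entropy = predictive_entropy(samples)
# Flag the most uncertain voxels for clinician review, e.g. the top 5%.
flagged = entropy > np.quantile(entropy, 0.95)
print(entropy.mean(), flagged.sum())
```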
Affiliation(s)
- Jaakko Sahlsten
- Department of Computer Science, Aalto University School of Science, Espoo, Finland
- Joel Jaskari
- Department of Computer Science, Aalto University School of Science, Espoo, Finland
- Kareem A Wahid
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Sara Ahmed
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Enrico Glerean
- Department of Neuroscience and Biomedical Engineering, Aalto University School of Science, Espoo, Finland
- Renjie He
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Benjamin H Kann
- Artificial Intelligence in Medicine Program, Brigham and Women's Hospital, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
- Antti Mäkitie
- Department of Otorhinolaryngology, Head and Neck Surgery, University of Helsinki and Helsinki University Hospital, Helsinki, Finland
- Research Program in Systems Oncology, University of Helsinki, Helsinki, Finland
- Clifton D Fuller
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Mohamed A Naser
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Kimmo Kaski
- Department of Computer Science, Aalto University School of Science, Espoo, Finland
8. Wu J, Ma Q, Zhou X, Wei Y, Liu Z, Kang H. Segmentation and quantitative analysis of optical coherence tomography (OCT) images of laser burned skin based on deep learning. Biomed Phys Eng Express 2024; 10:045026. PMID: 38718764; DOI: 10.1088/2057-1976/ad488f.
Abstract
Evaluation of skin recovery is an important step in the treatment of burns. However, conventional methods only observe the surface of the skin and cannot quantify the injury volume. Optical coherence tomography (OCT) is a non-invasive, non-contact, real-time technique. Swept-source OCT uses near-infrared light and analyzes the intensity of the light echo at different depths to generate images from optical interference signals. To quantify the dynamic recovery of skin burns over time, laser-induced skin burns in mice were evaluated using deep learning on swept-source OCT images. A laser-induced mouse skin thermal injury model was established in thirty Kunming mice, and OCT images of normal and burned areas of mouse skin were acquired at days 0, 1, 3, 7, and 14 after laser irradiation. This resulted in 7000 normal and 1400 burn B-scan images, which were divided into training, validation, and test sets in an 8:1.5:0.5 ratio for the normal data and 8:1:1 for the burn data. Normal images were manually annotated, and the deep learning U-Net model (verified against PSPNet and HRNet models) was used to segment the skin into three layers: the dermal-epidermal layer, subcutaneous fat layer, and muscle layer. For the burn images, the models were trained to segment just the damaged area. Three-dimensional reconstruction was then used to reconstruct the damaged tissue and calculate the damaged tissue volume. The average IoU value and F-score of the normal tissue layer U-Net segmentation model were 0.876 and 0.934, respectively. The IoU value of the burn area segmentation model reached 0.907, and the F-score reached 0.951. Compared with manual labeling, the U-Net model was faster with higher accuracy for skin stratification. OCT and U-Net segmentation can provide rapid and accurate analysis of tissue changes and clinical guidance in the treatment of burns.
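For reference, a NumPy sketch of the overlap metrics reported here, IoU and F-score (the latter equivalent to Dice for binary masks), computed on synthetic masks rather than OCT data.

```python
# IoU and F-score (Dice) sketch (NumPy) for binary masks, matching the
# kind of overlap metrics reported; the masks here are synthetic.
import numpy as np

def iou_and_fscore(pred, gt):
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    iou = inter / union if union else 1.0
    total = pred.sum() + gt.sum()
    f = 2 * inter / total if total else 1.0
    return iou, f

gt = np.zeros((64, 64), dtype=np.uint8)
gt[20:40, 20:40] = 1
pred = np.zeros_like(gt)
pred[22:42, 22:42] = 1                  # shifted prediction, partial overlap
print(iou_and_fscore(pred, gt))         # both scores drop as masks shift apart
```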
Affiliation(s)
- Jingyuan Wu
- Beijing Institute of Radiation Medicine, Beijing 100850, People's Republic of China
- College of Life Sciences, Hebei University, Baoding, Hebei 071002, People's Republic of China
- Qiong Ma
- Beijing Institute of Radiation Medicine, Beijing 100850, People's Republic of China
- Xun Zhou
- Beijing Institute of Radiation Medicine, Beijing 100850, People's Republic of China
- Yu Wei
- Beijing Institute of Radiation Medicine, Beijing 100850, People's Republic of China
- College of Life Sciences, Hebei University, Baoding, Hebei 071002, People's Republic of China
- Zhibo Liu
- Beijing Institute of Radiation Medicine, Beijing 100850, People's Republic of China
- Hongxiang Kang
- Beijing Institute of Radiation Medicine, Beijing 100850, People's Republic of China
9. Wahid KA, Kaffey ZY, Farris DP, Humbert-Vidan L, Moreno AC, Rasmussen M, Ren J, Naser MA, Netherton TJ, Korreman S, Balakrishnan G, Fuller CD, Fuentes D, Dohopolski MJ. Artificial Intelligence Uncertainty Quantification in Radiotherapy Applications - A Scoping Review. medRxiv [Preprint] 2024:2024.05.13.24307226. PMID: 38798581; PMCID: PMC11118597; DOI: 10.1101/2024.05.13.24307226.
Abstract
Background/purpose The use of artificial intelligence (AI) in radiotherapy (RT) is expanding rapidly. However, there exists a notable lack of clinician trust in AI models, underscoring the need for effective uncertainty quantification (UQ) methods. The purpose of this study was to scope existing literature related to UQ in RT, identify areas of improvement, and determine future directions. Methods We followed the PRISMA-ScR scoping review reporting guidelines. We utilized the population (human cancer patients), concept (utilization of AI UQ), context (radiotherapy applications) framework to structure our search and screening process. We conducted a systematic search spanning seven databases, supplemented by manual curation, up to January 2024. Our search yielded a total of 8980 articles for initial review. Manuscript screening and data extraction were performed in Covidence. Data extraction categories included general study characteristics, RT characteristics, AI characteristics, and UQ characteristics. Results We identified 56 articles published from 2015 to 2024. Ten domains of RT applications were represented; most studies evaluated auto-contouring (50%), followed by image synthesis (13%) and multiple applications simultaneously (11%). Twelve disease sites were represented, with head and neck cancer being the most common disease site independent of application space (32%). Imaging data were used in 91% of studies, while only 13% incorporated RT dose information. Most studies focused on failure detection as the main application of UQ (60%), with Monte Carlo dropout being the most commonly implemented UQ method (32%), followed by ensembling (16%). 55% of studies did not share code or datasets. Conclusion Our review revealed a lack of diversity in UQ for RT applications beyond auto-contouring. Moreover, there was a clear need to study additional UQ methods, such as conformal prediction. Our results may incentivize the development of guidelines for reporting and implementation of UQ in RT.
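Since this review flags conformal prediction as understudied in RT, a generic split conformal regression sketch shows the basic recipe; the toy data, the linear point predictor, and the 90% coverage target are all assumptions, not drawn from any included study.

```python
# Split conformal prediction sketch (NumPy): calibrate residuals on held-out
# data to get finite-sample-valid prediction intervals. Toy data only.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 300)
y = 2.0 * x + rng.normal(0, 1.0, 300)            # toy dose-like target

fit, cal = slice(0, 150), slice(150, 300)        # fit / calibration split
coef = np.polyfit(x[fit], y[fit], deg=1)         # any point predictor works
predict = lambda t: np.polyval(coef, t)

alpha = 0.1                                      # target 90% coverage
scores = np.abs(y[cal] - predict(x[cal]))        # calibration residuals
n = scores.size
# Conformal quantile with the finite-sample (n + 1) correction.
q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")

x_new = np.array([1.0, 5.0, 9.0])
lo, hi = predict(x_new) - q, predict(x_new) + q  # valid prediction intervals
print(np.c_[lo, hi])
```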
Affiliation(s)
- Kareem A. Wahid
- Department of Imaging Physics, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA
- Zaphanlene Y. Kaffey
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA
- David P. Farris
- Research Medical Library, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA
- Laia Humbert-Vidan
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA
- Amy C. Moreno
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA
- Jintao Ren
- Department of Oncology, Aarhus University Hospital, Denmark
- Mohamed A. Naser
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA
- Tucker J. Netherton
- Department of Radiation Physics, University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Stine Korreman
- Department of Oncology, Aarhus University Hospital, Denmark
- Clifton D. Fuller
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA
- David Fuentes
- Department of Imaging Physics, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA
- Michael J. Dohopolski
- Department of Radiation Oncology, The University of Texas Southwestern Medical Center, Dallas, Texas, USA
10. De Biase A, Ma B, Guo J, van Dijk LV, Langendijk JA, Both S, van Ooijen PMA, Sijtsema NM. Deep learning-based outcome prediction using PET/CT and automatically predicted probability maps of primary tumor in patients with oropharyngeal cancer. Comput Methods Programs Biomed 2024; 244:107939. PMID: 38008678; DOI: 10.1016/j.cmpb.2023.107939.
Abstract
BACKGROUND AND OBJECTIVE Recently, deep learning (DL) algorithms have shown promise in predicting outcomes such as distant metastasis-free survival (DMFS) and overall survival (OS) using pre-treatment imaging in head and neck cancer. The segmentation of the gross tumor volume of the primary tumor (GTVp) is used as an additional input channel for DL algorithms to improve model performance. However, a binary segmentation mask of the GTVp directs the focus of the network to the defined tumor region only and uniformly. DL models trained for tumor segmentation have also been used to generate predicted tumor probability maps (TPM), in which each pixel value corresponds to the degree of certainty of that pixel being classified as tumor. The aim of this study was to explore the effect of using a TPM as an extra input channel of CT- and PET-based DL prediction models for oropharyngeal cancer (OPC) patients in terms of local control (LC), regional control (RC), DMFS, and OS. METHODS We included 399 OPC patients from our institute who were treated with definitive (chemo)radiation. For each patient, CT and PET scans and GTVp contours, used for radiotherapy treatment planning, were collected. We first trained a previously developed 2.5D DL framework for tumor probability prediction by 5-fold cross validation using 131 patients. Then, a 3D ResNet18 was trained for outcome prediction using the 3D TPM as one of the possible inputs. The endpoints were LC, RC, DMFS, and OS. We performed 3-fold cross validation on 168 patients for each endpoint using different combinations of image modalities as input. The final prediction in the test set (100 patients) was obtained by averaging the predictions of the 3-fold models. The C-index was used to evaluate the discriminative performance of the models. RESULTS The models trained replacing the GTVp contours with the TPM achieved the highest C-indexes for LC (0.74) and RC (0.60) prediction. For OS, using the TPM or the GTVp as an additional image modality resulted in comparable C-indexes (0.72 and 0.74). CONCLUSIONS Adding predicted TPMs instead of GTVp contours as an additional input channel for DL-based outcome prediction models improved model performance for LC and RC.
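Mechanically, swapping the GTVp mask for a TPM just changes one input channel. A PyTorch sketch of the channel stacking follows; the shapes are made up and a single convolution stands in for the 3D ResNet18.

```python
# Sketch of replacing a binary GTVp mask channel with a tumor probability
# map (TPM) in a multi-channel model input; arrays and shapes are invented.
import torch

ct = torch.randn(1, 1, 32, 64, 64)      # CT volume (batch, channel, D, H, W)
pet = torch.randn(1, 1, 32, 64, 64)     # co-registered PET volume
tpm = torch.rand(1, 1, 32, 64, 64)      # predicted tumor probabilities in [0, 1]

x = torch.cat([ct, pet, tpm], dim=1)    # 3-channel input: CT + PET + TPM
model = torch.nn.Conv3d(3, 16, kernel_size=3, padding=1)  # stand-in for a 3D ResNet18 stem
print(model(x).shape)                   # torch.Size([1, 16, 32, 64, 64])
```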
Affiliation(s)
- Alessia De Biase
- Department of Radiation Oncology, University Medical Centre Groningen (UMCG), RB, Groningen 9700, the Netherlands; Data Science Centre in Health (DASH), University Medical Centre Groningen (UMCG), RB, Groningen 9700, the Netherlands
- Baoqiang Ma
- Department of Radiation Oncology, University Medical Centre Groningen (UMCG), RB, Groningen 9700, the Netherlands
- Jiapan Guo
- Computer Science and Artificial Intelligence, Bernoulli Institute for Mathematics, University of Groningen (RUG), Groningen, AK 9700, the Netherlands
- Lisanne V van Dijk
- Department of Radiation Oncology, University Medical Centre Groningen (UMCG), RB, Groningen 9700, the Netherlands
- Johannes A Langendijk
- Department of Radiation Oncology, University Medical Centre Groningen (UMCG), RB, Groningen 9700, the Netherlands
- Stefan Both
- Department of Radiation Oncology, University Medical Centre Groningen (UMCG), RB, Groningen 9700, the Netherlands
- Peter M A van Ooijen
- Department of Radiation Oncology, University Medical Centre Groningen (UMCG), RB, Groningen 9700, the Netherlands; Data Science Centre in Health (DASH), University Medical Centre Groningen (UMCG), RB, Groningen 9700, the Netherlands
- Nanna M Sijtsema
- Department of Radiation Oncology, University Medical Centre Groningen (UMCG), RB, Groningen 9700, the Netherlands
11. Wahid KA, Sahlsten J, Jaskari J, Dohopolski MJ, Kaski K, He R, Glerean E, Kann BH, Mäkitie A, Fuller CD, Naser MA, Fuentes D. Harnessing uncertainty in radiotherapy auto-segmentation quality assurance. Phys Imaging Radiat Oncol 2024; 29:100526. PMID: 38179210; PMCID: PMC10765294; DOI: 10.1016/j.phro.2023.100526.
Affiliation(s)
- Kareem A. Wahid
- Department of Imaging Physics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Jaakko Sahlsten
- Department of Computer Science, Aalto University School of Science, Espoo, Finland
- Joel Jaskari
- Department of Computer Science, Aalto University School of Science, Espoo, Finland
- Michael J. Dohopolski
- Department of Radiation Oncology, The University of Texas Southwestern Medical Center, Dallas, TX, USA
- Kimmo Kaski
- Department of Computer Science, Aalto University School of Science, Espoo, Finland
- Renjie He
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Enrico Glerean
- Department of Neuroscience and Biomedical Engineering, Aalto University School of Science, Espoo, Finland
- Benjamin H. Kann
- Artificial Intelligence in Medicine Program, Brigham and Women's Hospital, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
- Antti Mäkitie
- Department of Otorhinolaryngology, Head and Neck Surgery, University of Helsinki and Helsinki University Hospital, Research Program in Systems Oncology, University of Helsinki, Helsinki, Finland
- Clifton D. Fuller
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Mohamed A. Naser
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- David Fuentes
- Department of Imaging Physics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
12. Zhang X, Cheng G, Han X, Li S, Xiong J, Wu Z, Zhang H, Chen D. Deep learning-based multi-stage postoperative type-b aortic dissection segmentation using global-local fusion learning. Phys Med Biol 2023; 68:235011. PMID: 37774717; DOI: 10.1088/1361-6560/acfec7.
Abstract
Objective. Type-b aortic dissection (AD) is a life-threatening cardiovascular disease, and the primary treatment is thoracic endovascular aortic repair (TEVAR). Due to the lack of a rapid and accurate segmentation technique, patient-specific postoperative AD models are unavailable in clinical practice, making 3D morphological and hemodynamic analyses during TEVAR assessment impracticable. This work aims to construct a deep learning-based segmentation framework for postoperative type-b AD. Approach. The segmentation is performed in a two-stage manner. A multi-class segmentation of the contrast-enhanced aorta, thrombus (TH), and branch vessels (BV) is achieved in the first stage based on cropped image patches. True lumen (TL) and false lumen (FL) are extracted from a straightened image containing the entire aorta in the second stage. A global-local fusion learning mechanism is designed to improve the segmentation of TH and BV by compensating for the missing contextual features of the cropped images in the first stage. Results. The experiments are conducted on a multi-center dataset comprising 133 patients with 306 follow-up images. Our framework achieves state-of-the-art Dice similarity coefficients (DSC) of 0.962, 0.921, 0.811, and 0.884 for TL, FL, TH, and BV, respectively. The global-local fusion learning mechanism increases the DSC of TH and BV by 2.3% (p < 0.05) and 1.4% (p < 0.05), respectively, over the baseline. Segmenting TH in stage 1 achieves significantly better DSC for FL (0.921 ± 0.055 versus 0.857 ± 0.220, p < 0.01) and TH (0.811 ± 0.137 versus 0.797 ± 0.146, p < 0.05) than in stage 2. Our framework supports more accurate vascular volume quantification than previous segmentation models, especially for patients with enlarged TH+FL after TEVAR, and shows good generalizability to different hospital settings. Significance. Our framework can quickly provide accurate patient-specific AD models, supporting the clinical practice of 3D morphological and hemodynamic analyses for quantitative and more comprehensive patient-specific TEVAR assessments.
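For reference, a NumPy sketch of per-class Dice over an integer label map, the metric style used for the TL/FL/TH/BV results; the label coding and volumes below are synthetic.

```python
# Per-class Dice sketch (NumPy) over an integer label map (e.g. 1=TL, 2=FL,
# 3=TH, 4=BV); the labels and volumes here are synthetic.
import numpy as np

def per_class_dice(pred, gt, labels, eps=1e-8):
    out = {}
    for c in labels:
        p, g = pred == c, gt == c
        out[c] = 2.0 * (p & g).sum() / (p.sum() + g.sum() + eps)
    return out

rng = np.random.default_rng(0)
gt = rng.integers(0, 5, size=(16, 16, 16))   # 0 = background, 1-4 = classes
pred = gt.copy()
pred[rng.random(gt.shape) < 0.1] = 0         # corrupt ~10% of voxels
print(per_class_dice(pred, gt, labels=[1, 2, 3, 4]))
```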
Affiliation(s)
- Xuyang Zhang
- School of Medical Technology, Beijing Institute of Technology, Beijing, People's Republic of China
- Guoliang Cheng
- School of Medical Technology, Beijing Institute of Technology, Beijing, People's Republic of China
- Xiaofeng Han
- Department of Diagnostic and Interventional Radiology, Beijing Anzhen Hospital, Capital Medical University, Beijing, People's Republic of China
- Shilong Li
- School of Medical Technology, Beijing Institute of Technology, Beijing, People's Republic of China
- Jiang Xiong
- Department of Vascular and Endovascular Surgery, Chinese PLA General Hospital, Beijing, People's Republic of China
- Ziheng Wu
- Department of Vascular Surgery, The First Affiliated Hospital, Zhejiang University, Hangzhou, People's Republic of China
- Hongkun Zhang
- Department of Vascular Surgery, The First Affiliated Hospital, Zhejiang University, Hangzhou, People's Republic of China
- Duanduan Chen
- School of Medical Technology, Beijing Institute of Technology, Beijing, People's Republic of China
13. Tsilivigkos C, Athanasopoulos M, Micco RD, Giotakis A, Mastronikolis NS, Mulita F, Verras GI, Maroulis I, Giotakis E. Deep Learning Techniques and Imaging in Otorhinolaryngology - A State-of-the-Art Review. J Clin Med 2023; 12:6973. PMID: 38002588; PMCID: PMC10672270; DOI: 10.3390/jcm12226973.
Abstract
Over the last decades, the field of medicine has witnessed significant progress in artificial intelligence (AI), the Internet of Medical Things (IoMT), and deep learning (DL) systems. Otorhinolaryngology, and imaging in its various subspecialties, has not remained untouched by this transformative trend. As the medical landscape evolves, the integration of these technologies becomes imperative in augmenting patient care, fostering innovation, and actively participating in the ever-evolving synergy between computer vision techniques in otorhinolaryngology and AI. To that end, we conducted a thorough search on MEDLINE for papers published until June 2023, utilizing the keywords 'otorhinolaryngology', 'imaging', 'computer vision', 'artificial intelligence', and 'deep learning', supplemented by manual searching of the reference sections of the included articles. Our search yielded 121 related articles, which were subsequently subdivided into the following categories: imaging in head and neck, otology, and rhinology. Our objective is to provide a comprehensive introduction to this burgeoning field, tailored for both experienced specialists and aspiring residents in the domain of deep learning algorithms in imaging techniques in otorhinolaryngology.
Affiliation(s)
- Christos Tsilivigkos
- 1st Department of Otolaryngology, National and Kapodistrian University of Athens, Hippocrateion Hospital, 115 27 Athens, Greece
- Michail Athanasopoulos
- Department of Otolaryngology, University Hospital of Patras, 265 04 Patras, Greece
- Riccardo di Micco
- Department of Otolaryngology and Head and Neck Surgery, Medical School of Hannover, 30625 Hannover, Germany
- Aris Giotakis
- 1st Department of Otolaryngology, National and Kapodistrian University of Athens, Hippocrateion Hospital, 115 27 Athens, Greece
- Nicholas S. Mastronikolis
- Department of Otolaryngology, University Hospital of Patras, 265 04 Patras, Greece
- Francesk Mulita
- Department of Surgery, University Hospital of Patras, 265 04 Patras, Greece
- Georgios-Ioannis Verras
- Department of Surgery, University Hospital of Patras, 265 04 Patras, Greece
- Ioannis Maroulis
- Department of Surgery, University Hospital of Patras, 265 04 Patras, Greece
- Evangelos Giotakis
- 1st Department of Otolaryngology, National and Kapodistrian University of Athens, Hippocrateion Hospital, 115 27 Athens, Greece
14. Gu X, Strijbis VIJ, Slotman BJ, Dahele MR, Verbakel WFAR. Dose distribution prediction for head-and-neck cancer radiotherapy using a generative adversarial network: influence of input data. Front Oncol 2023; 13:1251132. PMID: 37829347; PMCID: PMC10565853; DOI: 10.3389/fonc.2023.1251132.
Abstract
Purpose A three-dimensional deep generative adversarial network (GAN) was used to predict dose distributions for locally advanced head and neck cancer radiotherapy. Given the labor- and time-intensive nature of manual planning target volume (PTV) and organ-at-risk (OAR) segmentation, we investigated whether dose distributions could be predicted without the need for fully segmented datasets. Materials and methods GANs were trained/validated/tested using 320/30/35 previously segmented CT datasets and treatment plans. The following input combinations were used to train and test the models: CT scan only (C); CT+PTVboost/elective (CP); CT+PTVs+OARs+body structure (CPOB); PTVs+OARs+body structure (POB); PTVs+body structure (PB). Mean absolute errors (MAEs) for the predicted dose distribution and mean doses to individual OARs (individual salivary glands, individual swallowing structures) were analyzed. Results For the five models listed, MAEs were 7.3 Gy, 3.5 Gy, 3.4 Gy, 3.4 Gy, and 3.5 Gy, respectively, with no significant differences for the comparisons CP-CPOB, CP-POB, CP-PB, and CPOB-POB. Dose volume histograms showed that all four models that included PTV contours predicted dose distributions in close agreement with the clinical treatment plans. The best model (CPOB) and the worst model apart from C (PB) predicted mean dose to within ±3 Gy of the clinical dose for 82.6%/88.6%/82.9% and 71.4%/67.1%/72.2% of all OARs, parotid glands (PG), and submandibular glands (SMG), respectively. The R2 values (0.17/0.96/0.97/0.95/0.95) of OAR mean doses for each model also indicated that, except for model C, the predictions correlated highly with the clinical dose distributions. Interestingly, model C could reasonably predict the dose in eight patients, but on average it performed inadequately. Conclusion We demonstrated the influence of the CT scan and the PTV and OAR contours on dose prediction. Model CP was not statistically different from model CPOB and represents the minimum data statistically required to adequately predict the clinical dose distribution in a group of patients.
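A NumPy sketch of the reported error metric, mean absolute dose difference, here restricted to a body mask; the dose grids, noise level, and mask are synthetic stand-ins.

```python
# Dose-prediction MAE sketch (NumPy): mean absolute dose difference inside
# the body contour, the style of error summarized per model; data synthetic.
import numpy as np

def dose_mae(pred_dose, true_dose, body_mask):
    """Mean |Gy| error restricted to voxels inside the body structure."""
    diff = np.abs(pred_dose - true_dose)
    return diff[body_mask.astype(bool)].mean()

rng = np.random.default_rng(0)
true_dose = rng.uniform(0, 70, size=(32, 64, 64))          # Gy
pred_dose = true_dose + rng.normal(0, 3.0, true_dose.shape)
body = np.zeros_like(true_dose, dtype=np.uint8)
body[:, 8:56, 8:56] = 1                                    # crude body mask
print(round(dose_mae(pred_dose, true_dose, body), 2))      # ~2.4 Gy for sigma = 3 Gy
```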
Affiliation(s)
- Xiaojin Gu
- Department of Radiation Oncology, Amsterdam UMC Location Vrije Universiteit Amsterdam, Amsterdam, Netherlands
- Cancer Center Amsterdam, Cancer Treatment and Quality of Life, Amsterdam, Netherlands
- Victor I. J. Strijbis
- Department of Radiation Oncology, Amsterdam UMC Location Vrije Universiteit Amsterdam, Amsterdam, Netherlands
- Cancer Center Amsterdam, Cancer Treatment and Quality of Life, Amsterdam, Netherlands
- Ben J. Slotman
- Department of Radiation Oncology, Amsterdam UMC Location Vrije Universiteit Amsterdam, Amsterdam, Netherlands
- Cancer Center Amsterdam, Cancer Treatment and Quality of Life, Amsterdam, Netherlands
- Max R. Dahele
- Department of Radiation Oncology, Amsterdam UMC Location Vrije Universiteit Amsterdam, Amsterdam, Netherlands
- Cancer Center Amsterdam, Cancer Treatment and Quality of Life, Amsterdam, Netherlands
- Wilko F. A. R. Verbakel
- Department of Radiation Oncology, Amsterdam UMC Location Vrije Universiteit Amsterdam, Amsterdam, Netherlands
- Cancer Center Amsterdam, Cancer Treatment and Quality of Life, Amsterdam, Netherlands
15. Sahlsten J, Jaskari J, Wahid KA, Ahmed S, Glerean E, He R, Kann BH, Mäkitie A, Fuller CD, Naser MA, Kaski K. Application of simultaneous uncertainty quantification for image segmentation with probabilistic deep learning: Performance benchmarking of oropharyngeal cancer target delineation as a use-case. medRxiv [Preprint] 2023:2023.02.20.23286188. PMID: 36865296; PMCID: PMC9980236; DOI: 10.1101/2023.02.20.23286188.
Abstract
Background Oropharyngeal cancer (OPC) is a widespread disease, with radiotherapy being a core treatment modality. Manual segmentation of the primary gross tumor volume (GTVp) is currently employed for OPC radiotherapy planning, but is subject to significant interobserver variability. Deep learning (DL) approaches have shown promise in automating GTVp segmentation, but comparative (auto)confidence metrics of these models' predictions have not been well explored. Quantifying instance-specific DL model uncertainty is crucial to improving clinician trust and facilitating broad clinical implementation. Therefore, in this study, probabilistic DL models for GTVp auto-segmentation were developed using large-scale PET/CT datasets, and various uncertainty auto-estimation methods were systematically investigated and benchmarked. Methods We utilized the publicly available 2021 HECKTOR Challenge training dataset with 224 co-registered PET/CT scans of OPC patients with corresponding GTVp segmentations as a development set. A separate set of 67 co-registered PET/CT scans of OPC patients with corresponding GTVp segmentations was used for external validation. Two approximate Bayesian deep learning methods, the MC Dropout Ensemble and the Deep Ensemble, both with five submodels, were evaluated for GTVp segmentation and uncertainty performance. The segmentation performance was evaluated using the volumetric Dice similarity coefficient (DSC), mean surface distance (MSD), and Hausdorff distance at 95% (95HD). The uncertainty was evaluated using four measures from the literature, namely the coefficient of variation (CV), structure expected entropy, structure predictive entropy, and structure mutual information, and additionally with our novel Dice-risk measure. The utility of uncertainty information was evaluated by the accuracy of uncertainty-based segmentation performance prediction using the Accuracy vs Uncertainty (AvU) metric, and by examining the linear correlation between uncertainty estimates and DSC. In addition, batch-based and instance-based referral processes were examined, in which patients with high uncertainty were rejected from the set. In the batch referral process, the area under the referral curve with DSC (R-DSC AUC) was used for evaluation, whereas in the instance referral process, the DSC at various uncertainty thresholds was examined. Results Both models behaved similarly in terms of segmentation performance and uncertainty estimation. Specifically, the MC Dropout Ensemble had 0.776 DSC, 1.703 mm MSD, and 5.385 mm 95HD. The Deep Ensemble had 0.767 DSC, 1.717 mm MSD, and 5.477 mm 95HD. The uncertainty measure with the highest DSC correlation was structure predictive entropy, with correlation coefficients of 0.699 and 0.692 for the MC Dropout Ensemble and the Deep Ensemble, respectively. The highest AvU value was 0.866 for both models. The best performing uncertainty measure for both models was the CV, which had an R-DSC AUC of 0.783 and 0.782 for the MC Dropout Ensemble and Deep Ensemble, respectively. With referral of patients based on uncertainty thresholds from 0.85 validation DSC for all uncertainty measures, on average the DSC improved from the full dataset by 4.7% and 5.0% while referring 21.8% and 22% of patients for the MC Dropout Ensemble and Deep Ensemble, respectively. Conclusion We found that many of the investigated methods provide overall similar but distinct utility in terms of predicting segmentation quality and referral performance. These findings are a critical first step towards more widespread implementation of uncertainty quantification in OPC GTVp segmentation.
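A NumPy sketch of the coefficient-of-variation measure together with a batch referral step (refer the most uncertain fraction of cases). All data below are random stand-ins, so no CV-DSC correlation should be expected here; the 0.5 threshold, case counts, and 20% referral fraction are assumptions.

```python
# Coefficient-of-variation uncertainty with batch referral sketch (NumPy):
# rank cases by CV of ensemble volume estimates, refer the most uncertain.
import numpy as np

def structure_cv(member_probs, thr=0.5):
    """CV of segmented-volume estimates across ensemble members."""
    vols = np.array([(p >= thr).sum() for p in member_probs], dtype=float)
    return vols.std() / (vols.mean() + 1e-12)

rng = np.random.default_rng(0)
n_cases, n_members = 20, 5
dscs = rng.uniform(0.6, 0.9, n_cases)          # stand-in per-case DSC
cvs = []
for _ in range(n_cases):
    members = rng.random((n_members, 16, 16, 16))  # fake per-member probabilities
    cvs.append(structure_cv(members))
cvs = np.array(cvs)

refer = cvs > np.quantile(cvs, 0.8)            # refer worst 20% by CV
print("kept-case mean DSC:", dscs[~refer].mean().round(3),
      "| referred:", refer.sum(), "cases")
```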
Affiliation(s)
- Jaakko Sahlsten
- Department of Computer Science, Aalto University School of Science, Espoo, Finland
- Joel Jaskari
- Department of Computer Science, Aalto University School of Science, Espoo, Finland
- Kareem A Wahid
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Sara Ahmed
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Enrico Glerean
- Department of Neuroscience and Biomedical Engineering, Aalto University School of Science, Espoo, Finland
- Renjie He
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Benjamin H Kann
- Artificial Intelligence in Medicine Program, Brigham and Women's Hospital, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
- Antti Mäkitie
- Department of Otorhinolaryngology, Head and Neck Surgery, University of Helsinki and Helsinki University Hospital, Helsinki, Finland
- Clifton D Fuller
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Mohamed A Naser
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Kimmo Kaski
- Department of Computer Science, Aalto University School of Science, Espoo, Finland