1. Hetz MJ, Bucher TC, Brinker TJ. Multi-domain stain normalization for digital pathology: A cycle-consistent adversarial network for whole slide images. Med Image Anal 2024; 94:103149. PMID: 38574542; DOI: 10.1016/j.media.2024.103149.
Abstract
The variation in histologic staining between different medical centers is one of the most profound challenges in the field of computer-aided diagnosis. The appearance disparity of pathological whole slide images causes algorithms to become less reliable, which in turn impedes the widespread applicability of downstream tasks like cancer diagnosis. Furthermore, different stainings introduce biases into training that, under domain shift, negatively affect test performance. Therefore, in this paper we propose MultiStain-CycleGAN, a multi-domain approach to stain normalization based on CycleGAN. Our modifications to CycleGAN allow us to normalize images of different origins without retraining or using different models. We perform an extensive evaluation of our method using various metrics and compare it to commonly used multi-domain-capable methods. First, we evaluate how well our method fools a domain classifier that tries to assign a medical center to an image. Then, we test our normalization on the tumor classification performance of a downstream classifier. Furthermore, we evaluate the image quality of the normalized images using the structural similarity index and the ability to reduce the domain shift using the Fréchet inception distance. We show that our method is multi-domain capable, provides very high image quality among the compared methods, and most reliably fools the domain classifier while keeping the tumor classifier performance high. By reducing the domain influence, biases in the data can be removed on the one hand, and the origin of the whole slide image can be disguised on the other, thus enhancing patient data privacy.
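The Fréchet inception distance used above to measure domain shift reduces, after feature extraction, to the Fréchet distance between two Gaussian fits. A minimal numpy/scipy sketch of that final computation (the Inception feature extractor is omitted; random features stand in):

```python
import numpy as np
from scipy import linalg

def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussian fits of two feature sets (rows = samples)."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    covmean = linalg.sqrtm(cov_a @ cov_b)   # matrix square root of the covariance product
    if np.iscomplexobj(covmean):            # numerical noise can yield tiny imaginary parts
        covmean = covmean.real
    diff = mu_a - mu_b
    return diff @ diff + np.trace(cov_a + cov_b - 2.0 * covmean)

rng = np.random.default_rng(0)
same = rng.normal(size=(500, 8))
shifted = rng.normal(loc=1.0, size=(500, 8))    # simulated domain shift in feature space
d_same = frechet_distance(same, same)
d_shift = frechet_distance(same, shifted)
print(round(d_same, 4), round(d_shift, 2))      # near-zero vs clearly positive
```

A smaller distance between normalized and reference features indicates a reduced domain gap.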
Affiliation(s)
- Martin J Hetz
- Division of Digital Biomarkers for Oncology, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Tabea-Clara Bucher
- Division of Digital Biomarkers for Oncology, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Titus J Brinker
- Division of Digital Biomarkers for Oncology, German Cancer Research Center (DKFZ), Heidelberg, Germany.
2. Faryna K, van der Laak J, Litjens G. Automatic data augmentation to improve generalization of deep learning in H&E stained histopathology. Comput Biol Med 2024; 170:108018. PMID: 38281317; DOI: 10.1016/j.compbiomed.2024.108018.
Abstract
In histopathology practice, scanners, tissue processing, staining, and image acquisition protocols vary from center to center, resulting in subtle variations in images. Vanilla convolutional neural networks are sensitive to such domain shifts. Data augmentation is a popular way to improve domain generalization. Currently, state-of-the-art domain generalization in computational pathology is achieved using a manually curated set of augmentation transforms. However, manual tuning of augmentation parameters is time-consuming and can lead to sub-optimal generalization performance. Meta-learning frameworks can provide efficient ways to find optimal training hyper-parameters, including data augmentation. In this study, we hypothesize that an automated search of augmentation hyper-parameters can provide superior generalization performance and reduce experimental optimization time. We select four state-of-the-art automatic augmentation methods from general computer vision and investigate their capacity to improve domain generalization in histopathology. We analyze their performance on data from 25 centers across two different tasks: tumor metastasis detection in lymph nodes and breast cancer tissue type classification. On tumor metastasis detection, most automatic augmentation methods achieve comparable performance to state-of-the-art manual augmentation. On breast cancer tissue type classification, the leading automatic augmentation method significantly outperforms state-of-the-art manual data augmentation.
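The automated search over augmentation hyper-parameters described above can be sketched in its simplest form as a random search over per-transform magnitudes scored on held-out centers. Everything here is illustrative: the transform names and the validation proxy are hypothetical stand-ins, not the paper's search space or any of the four benchmarked methods:

```python
import random

# Hypothetical search space: per-transform magnitudes in [0, 1].
SPACE = ["hue_jitter", "saturation_jitter", "elastic_strength", "blur_sigma"]

def sample_policy(rng):
    return {t: rng.random() for t in SPACE}

def validation_score(policy):
    # Stand-in for training a model with `policy` and scoring it on held-out
    # centers; this toy proxy simply prefers moderate magnitudes.
    return -sum((m - 0.5) ** 2 for m in policy.values())

def random_search(n_trials, seed=0):
    rng = random.Random(seed)
    trials = [sample_policy(rng) for _ in range(n_trials)]
    return max(trials, key=validation_score)

best = random_search(200)
print(best)
```

Methods such as RandAugment or learned augmentation policies replace this brute-force loop with cheaper or differentiable search, but the objective (validation performance on unseen domains) is the same.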
Affiliation(s)
- Khrystyna Faryna
- Department of Pathology, Radboud Institute for Health Sciences, Radboud University Medical Center, Geert Grooteplein Zuid 10, 6525 GA, Nijmegen, The Netherlands.
- Jeroen van der Laak
- Department of Pathology, Radboud Institute for Health Sciences, Radboud University Medical Center, Geert Grooteplein Zuid 10, 6525 GA, Nijmegen, The Netherlands; Center for Medical Image Science and Visualization, Linköping University, SE-581 83, Linköping, Sweden
- Geert Litjens
- Department of Pathology, Radboud Institute for Health Sciences, Radboud University Medical Center, Geert Grooteplein Zuid 10, 6525 GA, Nijmegen, The Netherlands
3. Mehrtens HA, Kurz A, Bucher TC, Brinker TJ. Benchmarking common uncertainty estimation methods with histopathological images under domain shift and label noise. Med Image Anal 2023; 89:102914. PMID: 37544085; DOI: 10.1016/j.media.2023.102914.
Abstract
In recent years, deep learning has seen increasing use in histopathological applications. However, while these approaches have shown great potential, in high-risk environments deep learning models need to be able to judge their uncertainty and reject inputs when there is a significant chance of misclassification. In this work, we conduct a rigorous evaluation of the most commonly used uncertainty and robustness methods for the classification of whole slide images, with a focus on the task of selective classification, where the model should reject the classification in situations in which it is uncertain. We conduct our experiments at tile level under the aspects of domain shift and label noise, as well as at slide level. In our experiments, we compare Deep Ensembles, Monte-Carlo Dropout, Stochastic Variational Inference, Test-Time Data Augmentation, as well as ensembles of the latter approaches. We observe that ensembles of methods generally lead to better uncertainty estimates and increased robustness towards domain shift and label noise, while, contrary to results from classical computer vision benchmarks, no systematic gain from the other methods can be shown. Across methods, rejecting the most uncertain samples reliably leads to a significant increase in classification accuracy on both in-distribution and out-of-distribution data. Furthermore, we conduct experiments comparing these methods under varying conditions of label noise. Lastly, we publish our code framework to facilitate further research on uncertainty estimation on histopathological data.
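The selective-classification protocol evaluated here boils down to: rank predictions by an uncertainty proxy, reject the most uncertain fraction, and score accuracy on what remains. A minimal illustrative sketch (softmax confidence as the uncertainty proxy; not the paper's framework):

```python
import numpy as np

def selective_accuracy(probs, labels, coverage):
    """Accuracy on the `coverage` fraction of samples the model is most confident about."""
    confidence = probs.max(axis=1)             # softmax confidence as uncertainty proxy
    order = np.argsort(-confidence)            # most confident first
    keep = order[: max(1, int(coverage * len(labels)))]
    preds = probs.argmax(axis=1)
    return (preds[keep] == labels[keep]).mean()

# Toy example: confident predictions are correct, uncertain ones are wrong.
probs = np.array([[0.99, 0.01], [0.95, 0.05], [0.55, 0.45], [0.52, 0.48]])
labels = np.array([0, 0, 1, 1])                # the two uncertain samples are misclassified
print(selective_accuracy(probs, labels, 1.0))  # 0.5 at full coverage
print(selective_accuracy(probs, labels, 0.5))  # 1.0 after rejecting the uncertain half
```

Sweeping `coverage` from 1.0 downward yields the accuracy-coverage curves such benchmarks compare across uncertainty methods.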
Affiliation(s)
- Hendrik A Mehrtens
- Division of Digital Biomarkers for Oncology, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Alexander Kurz
- Division of Digital Biomarkers for Oncology, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Tabea-Clara Bucher
- Division of Digital Biomarkers for Oncology, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Titus J Brinker
- Division of Digital Biomarkers for Oncology, German Cancer Research Center (DKFZ), Heidelberg, Germany.
4. Nijskens L, van den Berg CAT, Verhoeff JJC, Maspero M. Exploring contrast generalisation in deep learning-based brain MRI-to-CT synthesis. Phys Med 2023; 112:102642. PMID: 37473612; DOI: 10.1016/j.ejmp.2023.102642. Open access.
Abstract
BACKGROUND: Synthetic computed tomography (sCT) has been proposed and increasingly clinically adopted to enable magnetic resonance imaging (MRI)-based radiotherapy. Deep learning (DL) has recently demonstrated the ability to generate accurate sCT from fixed MRI acquisitions. However, MRI protocols may change over time or differ between centres, resulting in low-quality sCT due to poor model generalisation. PURPOSE: To investigate domain randomisation (DR) as a way to increase the generalisation of a DL model for brain sCT generation. METHODS: CT and corresponding T1-weighted MRI with/without contrast, T2-weighted, and FLAIR MRI from 95 patients undergoing RT were collected, with FLAIR treated as the unseen sequence on which generalisation was investigated. A "Baseline" generative adversarial network was trained with/without the FLAIR sequence to test how a model performs without DR. Image similarity and accuracy of sCT-based dose plans were assessed against CT to select the best-performing DR approach against the Baseline. RESULTS: The Baseline model had the poorest performance on FLAIR, with mean absolute error (MAE) = 106 ± 20.7 HU (mean ± σ). Performance on FLAIR significantly improved for the DR model, with MAE = 99.0 ± 14.9 HU, but remained inferior to the Baseline+FLAIR model (MAE = 72.6 ± 10.1 HU). Similarly, an improvement in γ-pass rate was obtained for DR vs Baseline. CONCLUSION: DR improved image similarity and dose accuracy on the unseen sequence compared to training only on acquired MRI. DR makes the model more robust, reducing the need for re-training when applying a model to sequences unseen and unavailable for retraining.
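Domain randomisation of the kind studied here exposes the model to synthetic contrast variation at training time. As an illustrative sketch only, a simple gamma-plus-affine intensity randomisation stands in for the paper's sequence-level randomisation (the ranges below are arbitrary assumptions):

```python
import numpy as np

def randomise_contrast(volume, rng):
    """Domain-randomisation-style intensity augmentation: random gamma plus a
    random linear rescaling, applied to a normalised MR volume in [0, 1]."""
    gamma = rng.uniform(0.5, 2.0)              # assumed range, for illustration
    scale = rng.uniform(0.9, 1.1)
    shift = rng.uniform(-0.05, 0.05)
    out = np.clip(volume, 0.0, 1.0) ** gamma * scale + shift
    return np.clip(out, 0.0, 1.0)

rng = np.random.default_rng(42)
vol = rng.random((4, 8, 8))                    # toy volume standing in for a brain MRI
aug = randomise_contrast(vol, rng)
print(aug.shape)
```

Training on many such randomised views encourages the sCT generator to rely on anatomy rather than on any one sequence's contrast.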
Affiliation(s)
- Lotte Nijskens
- Computational Imaging Group for MR Diagnostics & Therapy, Center for Image Science, University Medical Center Utrecht, Heidelberglaan 100, Utrecht, 3584CX, The Netherlands; Department of Radiotherapy, University Medical Center Utrecht, Heidelberglaan 100, Utrecht, 3584CX, The Netherlands
- Cornelis A T van den Berg
- Computational Imaging Group for MR Diagnostics & Therapy, Center for Image Science, University Medical Center Utrecht, Heidelberglaan 100, Utrecht, 3584CX, The Netherlands; Department of Radiotherapy, University Medical Center Utrecht, Heidelberglaan 100, Utrecht, 3584CX, The Netherlands
- Joost J C Verhoeff
- Department of Radiotherapy, University Medical Center Utrecht, Heidelberglaan 100, Utrecht, 3584CX, The Netherlands
- Matteo Maspero
- Computational Imaging Group for MR Diagnostics & Therapy, Center for Image Science, University Medical Center Utrecht, Heidelberglaan 100, Utrecht, 3584CX, The Netherlands; Department of Radiotherapy, University Medical Center Utrecht, Heidelberglaan 100, Utrecht, 3584CX, The Netherlands.
5. Fogelberg K, Chamarthi S, Maron RC, Niebling J, Brinker TJ. Domain shifts in dermoscopic skin cancer datasets: Evaluation of essential limitations for clinical translation. N Biotechnol 2023:S1871-6784(23)00021-3. PMID: 37146681; DOI: 10.1016/j.nbt.2023.04.006.
Abstract
The limited ability of convolutional neural networks (CNNs) to generalize to images from previously unseen domains is a major limitation, in particular for safety-critical clinical tasks such as dermoscopic skin cancer classification. In order to translate CNN-based applications into the clinic, it is essential that they are able to adapt to domain shifts. Such new conditions can arise through the use of different image acquisition systems or varying lighting conditions. In dermoscopy, shifts can also occur through a change in patient age or the occurrence of rare lesion localizations (e.g. palms). These are not prominently represented in most training datasets and can therefore lead to a decrease in performance. In order to verify the generalizability of classification models in real-world clinical settings, it is crucial to have access to data which mimics such domain shifts. To our knowledge, no dermoscopic image dataset exists where such domain shifts are properly described and quantified. We therefore grouped publicly available images from the ISIC archive based on their metadata (e.g. acquisition location, lesion localization, patient age) to generate meaningful domains. To verify that these domains are in fact distinct, we used multiple quantification measures to estimate the presence and intensity of domain shifts. Additionally, we analyzed the performance on these domains with and without an unsupervised domain adaptation technique. We observed that domain shifts do in fact exist in most of our grouped domains. Based on our results, we believe these datasets to be helpful for testing the generalization capabilities of dermoscopic skin cancer classifiers.
Affiliation(s)
- Katharina Fogelberg
- Digital Biomarkers for Oncology, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Sireesha Chamarthi
- Data Analysis and Intelligence, German Aerospace Center (DLR - Institute of Data science), Jena, Germany
- Roman C Maron
- Digital Biomarkers for Oncology, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Julia Niebling
- Data Analysis and Intelligence, German Aerospace Center (DLR - Institute of Data science), Jena, Germany
- Titus J Brinker
- Digital Biomarkers for Oncology, German Cancer Research Center (DKFZ), Heidelberg, Germany.
6. Wang X, Zhang J, Yang S, Xiang J, Luo F, Wang M, Zhang J, Yang W, Huang J, Han X. A generalizable and robust deep learning algorithm for mitosis detection in multicenter breast histopathological images. Med Image Anal 2023; 84:102703. PMID: 36481608; DOI: 10.1016/j.media.2022.102703.
Abstract
Mitosis counting of biopsies is an important biomarker for breast cancer patients, which supports disease prognostication and treatment planning. Developing a robust mitotic cell detection model is highly challenging due to the complex growth pattern of mitotic cells and their high similarity to non-mitotic cells. Most mitosis detection algorithms have poor generalizability across image domains and lack reproducibility and validation in multicenter settings. To overcome these issues, we propose a generalizable and robust mitosis detection algorithm (called FMDet), which is independently tested on multicenter breast histopathological images. To capture more refined morphological features of cells, we recast the object detection task as a semantic segmentation problem. The pixel-level annotations for mitotic nuclei are obtained by taking the intersection of the masks generated from a well-trained nuclear segmentation model and the bounding boxes provided by the MIDOG 2021 challenge. In our segmentation framework, a robust feature extractor is developed to capture the appearance variations of mitotic cells; it is constructed by integrating a channel-wise multi-scale attention mechanism into a fully convolutional network structure. Benefiting from the fact that changes in the low-level spectrum do not affect high-level semantic perception, we employ a Fourier-based data augmentation method that reduces domain discrepancies by exchanging the low-frequency spectrum between two domains. Our FMDet algorithm was tested in the MIDOG 2021 challenge and ranked first. Further, our algorithm is externally validated on four independent datasets for mitosis detection, exhibiting state-of-the-art performance in comparison with previously published results. These results demonstrate that our algorithm has the potential to be deployed as an assistant decision support tool in clinical practice.
Our code has been released at https://github.com/Xiyue-Wang/1st-in-MICCAI-MIDOG-2021-challenge.
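The Fourier-based augmentation described above (exchanging the low-frequency spectrum between two domains) can be sketched as follows. This is the generic FDA-style recipe, assuming single-channel images and an arbitrary band width `beta`, not the authors' exact implementation:

```python
import numpy as np

def swap_low_frequency(src, trg, beta=0.1):
    """Replace the low-frequency amplitude of `src` with that of `trg`,
    keeping the source phase (which carries the semantic content)."""
    fft_src = np.fft.fftshift(np.fft.fft2(src))
    fft_trg = np.fft.fftshift(np.fft.fft2(trg))
    amp_src, phase_src = np.abs(fft_src), np.angle(fft_src)
    amp_trg = np.abs(fft_trg)

    h, w = src.shape
    bh, bw = max(1, int(beta * h)), max(1, int(beta * w))
    ch, cw = h // 2, w // 2
    # Swap only the centred low-frequency band of the amplitude spectrum.
    amp_src[ch - bh:ch + bh, cw - bw:cw + bw] = amp_trg[ch - bh:ch + bh, cw - bw:cw + bw]

    mixed = amp_src * np.exp(1j * phase_src)
    return np.real(np.fft.ifft2(np.fft.ifftshift(mixed)))

rng = np.random.default_rng(0)
src = rng.random((64, 64))
trg = rng.random((64, 64)) * 0.5 + 0.25   # different global intensity statistics
out = swap_low_frequency(src, trg)
print(out.shape)
```

Because the DC component sits inside the swapped band, the output inherits the target domain's global brightness while retaining the source image's structure.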
7. Zhou Y, Koyuncu C, Lu C, Grobholz R, Katz I, Madabhushi A, Janowczyk A. Multi-site cross-organ calibrated deep learning (MuSClD): Automated diagnosis of non-melanoma skin cancer. Med Image Anal 2023; 84:102702. PMID: 36516556; PMCID: PMC9825103; DOI: 10.1016/j.media.2022.102702.
Abstract
Although deep learning (DL) has demonstrated impressive diagnostic performance for a variety of computational pathology tasks, this performance often markedly deteriorates on whole slide images (WSI) generated at external test sites. This phenomenon is due in part to domain shift, wherein differences in test-site pre-analytical variables (e.g., slide scanner, staining procedure) result in WSI with notably different visual presentations compared to training data. To ameliorate pre-analytic variances, approaches such as CycleGAN can be used to calibrate visual properties of images between sites, with the intent of improving DL classifier generalizability. In this work, we present a new approach termed Multi-Site Cross-Organ Calibration based Deep Learning (MuSClD) that employs WSIs of an off-target organ for calibration created at the same site as the on-target organ, based on the assumption that cross-organ slides are subjected to a common set of pre-analytical sources of variance. We demonstrate that by using an off-target organ from the test site to calibrate training data, the domain shift between training and testing data can be mitigated. Importantly, this strategy uniquely guards against potential data leakage introduced during calibration, wherein information only available in the testing data is imparted on the training data. We evaluate MuSClD in the context of the automated diagnosis of non-melanoma skin cancer (NMSC). Specifically, we evaluated MuSClD for identifying and distinguishing (a) basal cell carcinoma (BCC), (b) in-situ squamous cell carcinomas (SCC-In Situ), and (c) invasive squamous cell carcinomas (SCC-Invasive), using an Australian (training, n = 85) and a Swiss (held-out testing, n = 352) cohort. Our experiments reveal that MuSClD reduces the Wasserstein distances between sites in terms of color, contrast, and brightness metrics, without imparting noticeable artifacts to training data.
The NMSC-subtyping performance is statistically improved as a result of MuSClD in terms of one-vs-rest AUC: BCC (0.92 vs 0.87, p = 0.01), SCC-In Situ (0.87 vs 0.73, p = 0.15) and SCC-Invasive (0.92 vs 0.82, p = 1e-5). Compared to baseline NMSC-subtyping with no calibration, the internal validation results of MuSClD (BCC (0.98), SCC-In Situ (0.92), and SCC-Invasive (0.97)) suggest that while domain shift indeed degrades classification performance, our on-target calibration using off-target tissue can safely compensate for pre-analytical variabilities while improving the robustness of the model.
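The site-level Wasserstein comparison above can be illustrated with `scipy.stats.wasserstein_distance` on per-tile summary statistics. The numbers below are synthetic stand-ins for WSI tile brightness values, and the "+30" calibration is a toy surrogate for the CycleGAN-based style transfer:

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
# Stand-in per-tile brightness values for two sites (real inputs would be
# summary statistics computed from WSI tiles).
site_a = rng.normal(loc=180, scale=10, size=2000)   # brighter training site
site_b = rng.normal(loc=150, scale=10, size=2000)   # darker test site
calibrated = site_b + 30.0                          # toy surrogate for style calibration

before = wasserstein_distance(site_a, site_b)
after = wasserstein_distance(site_a, calibrated)
print(round(before, 1), round(after, 1))            # distance shrinks after calibration
```

A drop in this distance across color, contrast, and brightness statistics is exactly what the paper reports after MuSClD calibration.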
Affiliation(s)
- Yufei Zhou
- Department of Computer and Data Sciences, Case Western Reserve University, Cleveland, OH, USA
- Can Koyuncu
- Department of Biomedical Engineering, Emory University and Georgia Institute of Technology, Atlanta, GA, USA; Louis Stokes Cleveland Veterans Administration Medical Center, Cleveland, USA
- Cheng Lu
- Department of Biomedical Engineering, Emory University and Georgia Institute of Technology, Atlanta, GA, USA
- Rainer Grobholz
- Institute of Pathology, Cantonal Hospital Aarau, Aarau, Switzerland; Medical Faculty University of Zurich, Zurich, Switzerland
- Ian Katz
- Southern Sun Pathology, Sydney, NSW, Australia; University of Queensland, Brisbane, Qld, Australia
- Anant Madabhushi
- Department of Biomedical Engineering, Emory University and Georgia Institute of Technology, Atlanta, GA, USA; Atlanta VA Medical Center, Atlanta, USA.
- Andrew Janowczyk
- Department of Biomedical Engineering, Emory University and Georgia Institute of Technology, Atlanta, GA, USA; Department of Oncology, Lausanne University Hospital; Department of Diagnostics, Division of Clinical Pathology, Geneva University Hospitals
8. Sun Y, Liu J, Sun Z, Han J, Yu N. [A generative adversarial network-based unsupervised domain adaptation method for magnetic resonance image segmentation]. Sheng Wu Yi Xue Gong Cheng Xue Za Zhi 2022; 39:1181-1188. PMID: 36575088; PMCID: PMC9927195; DOI: 10.7507/1001-5515.202203009.
Abstract
Intelligent medical image segmentation methods have been rapidly developed and applied, but a significant challenge is domain shift: segmentation performance degrades due to distribution differences between the source domain and the target domain. This paper proposed an unsupervised, end-to-end domain adaptation method for medical image segmentation based on the generative adversarial network (GAN). A network training and adjustment model was designed, including segmentation and discriminant networks. In the segmentation network, the residual module was used as the basic module to increase feature reusability and reduce model optimization difficulty. Further, the segmentation network learned cross-domain features at the image feature level with the help of the discriminant network and a combination of segmentation loss with adversarial loss. The discriminant network was a convolutional neural network that used labels from the source domain to distinguish whether a segmentation result produced by the generator came from the source domain or the target domain. The whole training process was unsupervised. The proposed method was tested in experiments on a public dataset of knee magnetic resonance (MR) images and a clinical dataset from our cooperating hospital. With our method, the mean Dice similarity coefficient (DSC) of segmentation results increased by 2.52% and 6.10% over classical feature-level and image-level domain adaptation methods, respectively. The proposed method effectively improves the domain adaptation ability of the segmentation method, significantly improves the segmentation accuracy of the tibia and femur, and can better solve the domain shift problem in MR image segmentation.
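The combination of segmentation loss with adversarial loss described above can be sketched numerically. This is a generic single-step numpy illustration of the generator objective (Dice loss plus a discriminator-fooling term); the weighting `lam` and the array values are arbitrary assumptions, not the paper's configuration:

```python
import numpy as np

def dice_loss(pred, target, eps=1e-6):
    inter = (pred * target).sum()
    return 1.0 - (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def bce(pred, target, eps=1e-7):
    pred = np.clip(pred, eps, 1.0 - eps)
    return -(target * np.log(pred) + (1 - target) * np.log(1 - pred)).mean()

def generator_loss(seg_pred, seg_target, disc_on_target, lam=0.01):
    """Segmentation loss on labeled source data plus an adversarial term that
    pushes target-domain predictions to look source-like to the discriminator."""
    adv = bce(disc_on_target, np.ones_like(disc_on_target))  # fool the discriminator
    return dice_loss(seg_pred, seg_target) + lam * adv

pred = np.array([[0.9, 0.1], [0.2, 0.8]])     # toy source-domain soft prediction
target = np.array([[1.0, 0.0], [0.0, 1.0]])   # toy source-domain label
disc = np.full((2, 2), 0.4)                   # discriminator output on target predictions
loss = generator_loss(pred, target, disc)
print(loss)
```

Minimising this joint objective is what drives the segmentation network toward domain-invariant features.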
Affiliation(s)
- Yubo Sun
- College of Artificial Intelligence, Nankai University, Tianjin 300350, P. R. China
- Tianjin Key Laboratory of Intelligent Robotics, Nankai University, Tianjin 300350, P. R. China
- Jianan Liu
- College of Artificial Intelligence, Nankai University, Tianjin 300350, P. R. China
- Tianjin Key Laboratory of Intelligent Robotics, Nankai University, Tianjin 300350, P. R. China
- Zewen Sun
- College of Artificial Intelligence, Nankai University, Tianjin 300350, P. R. China
- Jianda Han
- College of Artificial Intelligence, Nankai University, Tianjin 300350, P. R. China
- Tianjin Key Laboratory of Intelligent Robotics, Nankai University, Tianjin 300350, P. R. China
- Institute of Sports Medicine, Peking University Third Hospital, Beijing 100083, P. R. China
- Ningbo Yu
- College of Artificial Intelligence, Nankai University, Tianjin 300350, P. R. China
- Tianjin Key Laboratory of Intelligent Robotics, Nankai University, Tianjin 300350, P. R. China
- Institute of Sports Medicine, Peking University Third Hospital, Beijing 100083, P. R. China
9.
Abstract
Many previous studies claim to have developed machine learning models that diagnose COVID-19 from blood tests. However, we hypothesize that changes in the underlying distribution of the data, so-called domain shifts, affect the predictive performance and reliability and are a reason for the failure of such machine learning models in clinical application. Domain shifts can be caused, e.g., by changes in the disease prevalence (spreading or tested population), by refined RT-PCR testing procedures (way of taking samples, laboratory procedures), or by virus mutations. Therefore, machine learning models for diagnosing COVID-19 or other diseases may not be reliable and may degrade in performance over time. We investigate whether domain shifts are present in COVID-19 datasets and how they affect machine learning methods. We further set out to estimate mortality risk based on routinely acquired blood tests in a hospital setting throughout the pandemic and under domain shifts. We reveal domain shifts by evaluating the models on a large-scale dataset with different assessment strategies, such as temporal validation. We present the novel finding that domain shifts strongly affect machine learning models for COVID-19 diagnosis and deteriorate their predictive performance and credibility. Therefore, frequent re-training and re-assessment are indispensable for robust models enabling clinical utility.
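Temporal validation, the assessment strategy mentioned above, simply splits data by acquisition date rather than at random, so that evaluation mimics prospective deployment under drift. A minimal stdlib sketch (field names and dates are illustrative):

```python
from datetime import date

def temporal_split(records, cutoff):
    """Train on samples acquired before `cutoff`, test on those on/after it,
    so that evaluation reflects prospective deployment under distribution drift."""
    train = [r for r in records if r["date"] < cutoff]
    test = [r for r in records if r["date"] >= cutoff]
    return train, test

records = [
    {"date": date(2020, 3, 1), "label": 0},
    {"date": date(2020, 6, 1), "label": 1},
    {"date": date(2021, 1, 15), "label": 1},
    {"date": date(2021, 9, 1), "label": 0},
]
train, test = temporal_split(records, date(2021, 1, 1))
print(len(train), len(test))  # 2 2
```

A random split over the same records would mix pandemic phases into both sets and hide exactly the degradation this study sets out to measure.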
Affiliation(s)
- Theresa Roland
- ELLIS Unit Linz, LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Linz, Austria.
- Carl Böck
- Department of Anesthesiology and Critical Care Medicine, Kepler University Hospital GmbH, Johannes Kepler University Linz, Linz, Austria
- Thomas Tschoellitsch
- Department of Anesthesiology and Critical Care Medicine, Kepler University Hospital GmbH, Johannes Kepler University Linz, Linz, Austria
- Sepp Hochreiter
- ELLIS Unit Linz, LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Linz, Austria
- Jens Meier
- Department of Anesthesiology and Critical Care Medicine, Kepler University Hospital GmbH, Johannes Kepler University Linz, Linz, Austria
- Günter Klambauer
- ELLIS Unit Linz, LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Linz, Austria
10. Liu L, Zhang Z, Li S, Ma K, Zheng Y. S-CUDA: Self-cleansing unsupervised domain adaptation for medical image segmentation. Med Image Anal 2021; 74:102214. PMID: 34464837; DOI: 10.1016/j.media.2021.102214.
Abstract
Medical image segmentation tasks have hitherto achieved excellent progress with large-scale datasets, which empower us to train potent deep convolutional neural networks (DCNNs). However, labeling such large-scale datasets is laborious and error-prone, which makes noisy (or incorrect) labels a ubiquitous problem in real-world scenarios. In addition, data collected from different sites usually exhibit significant data distribution shift (or domain shift). As a result, noisy labels and domain shift become two common problems in medical imaging application scenarios, especially in medical image segmentation, and significantly degrade the performance of deep learning models. In this paper, we identify a novel problem hidden in medical image segmentation, namely unsupervised domain adaptation on noisy labeled data, and propose a novel algorithm named "Self-Cleansing Unsupervised Domain Adaptation" (S-CUDA) to address it. S-CUDA sets up a realistic scenario to solve the above problems simultaneously, where the training data (i.e., source domain) not only shows domain shift w.r.t. the unsupervised test data (i.e., target domain) but also contains noisy labels. The key idea of S-CUDA is to learn noise-excluding and domain-invariant knowledge from noisy supervised data, which is then applied to the highly corrupted data for label cleansing and further data recycling, as well as to the test data with domain shift for supervised propagation. To this end, we propose a novel framework leveraging noisy-label learning and domain adaptation techniques to cleanse the noisy labels and learn from trustable clean samples, thus enabling robust adaptation and prediction on the target domain. Specifically, we train two peer adversarial networks to identify high-confidence clean data and exchange it between companions to eliminate the error accumulation problem and narrow the domain gap simultaneously.
In the meantime, high-confidence noisy data are detected and cleansed in order to reuse the contaminated training data. Therefore, our proposed method can not only cleanse the noisy labels in the training set but also take full advantage of the existing noisy data to update the parameters of the network. For evaluation, we conduct experiments on two popular datasets (REFUGE and Drishti-GS) for optic disc (OD) and optic cup (OC) segmentation, and on another public multi-vendor dataset for spinal cord gray matter (SCGM) segmentation. Experimental results show that our proposed method can cleanse noisy labels efficiently and at the same time obtain a model with better generalization performance, outperforming previous state-of-the-art methods by a large margin. Our code can be found at https://github.com/zzdxjtu/S-cuda.
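The peer-network exchange of high-confidence clean samples follows the small-loss heuristic from noisy-label learning: low-loss samples are treated as likely clean, and each network trains on its peer's selection to avoid confirming its own errors. A minimal single-step sketch (toy losses; the paper's full framework adds adversarial adaptation on top):

```python
import numpy as np

def select_clean(losses, keep_ratio):
    """Indices of the `keep_ratio` fraction of samples with the smallest loss,
    treated as likely-clean labels (the small-loss heuristic)."""
    k = max(1, int(keep_ratio * len(losses)))
    return np.argsort(losses)[:k]

# Per-sample losses from two peer networks on the same batch (toy values;
# large losses mark samples whose labels are probably corrupted).
loss_net1 = np.array([0.1, 2.5, 0.2, 3.0, 0.15, 0.12])
loss_net2 = np.array([0.2, 2.8, 0.1, 2.9, 0.18, 0.11])

# Each network trains on the samples its *peer* considers clean,
# which limits the accumulation of each network's own selection errors.
clean_for_net1 = select_clean(loss_net2, 0.5)
clean_for_net2 = select_clean(loss_net1, 0.5)
print(sorted(clean_for_net1.tolist()))
```

The remaining high-loss samples are the candidates for the label-cleansing and data-recycling step described in the abstract.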
Affiliation(s)
- Luyan Liu
- Tencent Jarvis Lab, Shenzhen 518040, China; Tencent Healthcare (Shenzhen) Co., LTD, China.
- Zhengdong Zhang
- State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing 100191, China
- Shuai Li
- State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing 100191, China
- Kai Ma
- Tencent Jarvis Lab, Shenzhen 518040, China; Tencent Healthcare (Shenzhen) Co., LTD, China
- Yefeng Zheng
- Tencent Jarvis Lab, Shenzhen 518040, China; Tencent Healthcare (Shenzhen) Co., LTD, China
11. Mårtensson G, Ferreira D, Granberg T, Cavallin L, Oppedal K, Padovani A, Rektorova I, Bonanni L, Pardini M, Kramberger MG, Taylor JP, Hort J, Snædal J, Kulisevsky J, Blanc F, Antonini A, Mecocci P, Vellas B, Tsolaki M, Kłoszewska I, Soininen H, Lovestone S, Simmons A, Aarsland D, Westman E. The reliability of a deep learning model in clinical out-of-distribution MRI data: A multicohort study. Med Image Anal 2020; 66:101714. PMID: 33007638; DOI: 10.1016/j.media.2020.101714.
Abstract
Deep learning (DL) methods have in recent years yielded impressive results in medical imaging, with the potential to function as a clinical aid to radiologists. However, DL models in medical imaging are often trained on public research cohorts with images acquired with a single scanner or under strict protocol harmonization, which is not representative of a clinical setting. The aim of this study was to investigate how well a DL model performs on unseen clinical datasets (collected with different scanners, protocols and disease populations) and whether more heterogeneous training data improves generalization. In total, 3117 brain MRI scans from multiple dementia research cohorts and memory clinics, each visually rated by a neuroradiologist according to Scheltens' scale of medial temporal atrophy (MTA), were included in this study. By training multiple versions of a convolutional neural network on different subsets of these data to predict MTA ratings, we assessed the impact that including images from a wider distribution during training had on performance in external memory clinic data. Our results showed that the model generalized well to datasets acquired with protocols similar to those of the training data, but performed substantially worse in clinical cohorts with visibly different tissue contrasts. This implies that future DL studies investigating performance on out-of-distribution (OOD) MRI data need to assess multiple external cohorts to obtain reliable results. Furthermore, including data from a wider range of scanners and protocols improved performance on OOD data, which suggests that more heterogeneous training data make the model generalize better. To conclude, this is the most comprehensive study to date of domain shift in deep learning on MRI data, and we advocate rigorous evaluation of DL models on clinical data before they are certified for deployment.
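The study's central evaluation practice, scoring the model separately on each external cohort rather than on one pooled test set, can be sketched in a few lines. This is a hypothetical illustration (cohort names and scores are invented), not the authors' evaluation code; it only shows why a pooled metric can hide a cohort where the model fails.

```python
import statistics

def per_cohort_mae(predictions, ratings):
    """Per-cohort mean absolute error between model-predicted and
    radiologist-assigned MTA ratings (0-4 on Scheltens' scale).

    predictions, ratings: dicts mapping cohort name -> list of scores.
    Reporting one MAE per cohort exposes out-of-distribution failures
    that averaging over all cohorts would mask.
    """
    return {
        cohort: statistics.mean(
            abs(p - r) for p, r in zip(predictions[cohort], ratings[cohort])
        )
        for cohort in predictions
    }
```

For example, a model with a pooled MAE that looks acceptable may still show a large error on one memory-clinic cohort with different tissue contrast, which is exactly the multi-cohort reporting the abstract advocates.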
Affiliation(s)
- Gustav Mårtensson
- Division of Clinical Geriatrics, Department of Neurobiology, Care Sciences and Society, Karolinska Institutet, Stockholm, Sweden.
- Daniel Ferreira
- Division of Clinical Geriatrics, Department of Neurobiology, Care Sciences and Society, Karolinska Institutet, Stockholm, Sweden
- Tobias Granberg
- Department of Clinical Neuroscience, Karolinska Institutet, Stockholm, Sweden; Department of Radiology, Karolinska University Hospital, Stockholm, Sweden
- Lena Cavallin
- Department of Clinical Neuroscience, Karolinska Institutet, Stockholm, Sweden; Department of Radiology, Karolinska University Hospital, Stockholm, Sweden
- Ketil Oppedal
- Centre for Age-Related Medicine, Stavanger University Hospital, Stavanger, Norway; Stavanger Medical Imaging Laboratory (SMIL), Department of Radiology, Stavanger University Hospital, Stavanger, Norway; Department of Electrical Engineering and Computer Science, University of Stavanger, Stavanger, Norway
- Alessandro Padovani
- Neurology Unit, Department of Clinical and Experimental Sciences, University of Brescia, Brescia, Italy
- Irena Rektorova
- 1st Department of Neurology, Medical Faculty, St. Anne's Hospital and CEITEC, Masaryk University, Brno, Czech Republic
- Laura Bonanni
- Department of Neuroscience Imaging and Clinical Sciences and CESI, University G d'Annunzio of Chieti-Pescara, Chieti, Italy
- Matteo Pardini
- Department of Neuroscience (DINOGMI), University of Genoa and Neurology Clinics, Polyclinic San Martino Hospital, Genoa, Italy
- Milica G Kramberger
- Department of Neurology, University Medical Centre Ljubljana, Medical faculty, University of Ljubljana, Slovenia
- John-Paul Taylor
- Institute of Neuroscience, Newcastle University, Newcastle upon Tyne, UK
- Jakub Hort
- Memory Clinic, Department of Neurology, Charles University, 2nd Faculty of Medicine and Motol University Hospital, Prague, Czech Republic
- Jón Snædal
- Landspitali University Hospital, Reykjavik, Iceland
- Jaime Kulisevsky
- Movement Disorders Unit, Neurology Department, Sant Pau Hospital, Barcelona, Spain; Institut d'Investigacions Biomédiques Sant Pau (IIB-Sant Pau), Barcelona, Spain; Centro de Investigación en Red-Enfermedades Neurodegenerativas (CIBERNED), Barcelona, Spain; Universitat Autónoma de Barcelona (U.A.B.), Barcelona, Spain
- Frederic Blanc
- Day Hospital of Geriatrics, Memory Resource and Research Centre (CM2R) of Strasbourg, Department of Geriatrics, Hôpitaux Universitaires de Strasbourg, Strasbourg, France; University of Strasbourg and French National Centre for Scientific Research (CNRS), ICube Laboratory and Fédération de Médecine Translationnelle de Strasbourg (FMTS), Team Imagerie Multimodale Intégrative en Santé (IMIS)/ICONE, Strasbourg, France
- Angelo Antonini
- Department of Neuroscience, University of Padua, Padua & Fondazione Ospedale San Camillo, Venezia, Venice, Italy
- Patrizia Mecocci
- Institute of Gerontology and Geriatrics, University of Perugia, Perugia, Italy
- Bruno Vellas
- UMR INSERM 1027, gerontopole, CHU, University of Toulouse, France
- Magda Tsolaki
- 3rd Department of Neurology, Memory and Dementia Unit, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Hilkka Soininen
- Institute of Clinical Medicine, Neurology, University of Eastern Finland, Finland; Neurocenter, Neurology, Kuopio University Hospital, Kuopio, Finland
- Simon Lovestone
- Department of Psychiatry, Warneford Hospital, University of Oxford, Oxford, UK
- Andrew Simmons
- NIHR Biomedical Research Centre for Mental Health, London, UK; NIHR Biomedical Research Unit for Dementia, London, UK; Department of Neuroimaging, Centre for Neuroimaging Sciences, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Dag Aarsland
- Centre for Age-Related Medicine, Stavanger University Hospital, Stavanger, Norway; Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Eric Westman
- Division of Clinical Geriatrics, Department of Neurobiology, Care Sciences and Society, Karolinska Institutet, Stockholm, Sweden; Department of Neuroimaging, Centre for Neuroimaging Sciences, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
|