1
Djahnine A, Jupin-Delevaux E, Nempont O, Si-Mohamed SA, Craighero F, Cottin V, Douek P, Popoff A, Boussel L. Weakly-supervised learning-based pathology detection and localization in 3D chest CT scans. Med Phys 2024. PMID: 39140793. DOI: 10.1002/mp.17302. Open access.
Abstract
BACKGROUND Recent advancements in anomaly detection have paved the way for novel radiological reading assistance tools that support the identification of findings, aimed at saving time. The clinical adoption of such applications requires a low rate of false positives while maintaining high sensitivity. PURPOSE In light of recent interest and development in multi-pathology identification, we present a novel method, based on a recent contrastive self-supervised approach, for identifying multiple chest-related abnormalities, including low lung density area ("LLDA"), consolidation ("CONS"), nodules ("NOD"), and interstitial pattern ("IP"). Our approach alerts radiologists to abnormal regions within a computed tomography (CT) scan by providing 3D localization. METHODS We introduce a new method for the classification and localization of multiple chest pathologies in 3D chest CT scans. Our goal is to distinguish four common chest-related abnormalities ("LLDA", "CONS", "NOD", and "IP") from the "NORMAL" class. The method is based on a 3D patch-based classifier with a ResNet backbone encoder, pretrained using a recent contrastive self-supervised approach, and a fine-tuned classification head. We leverage the SimCLR contrastive framework for pretraining on an unannotated dataset of randomly selected patches and then fine-tune it on a labeled dataset. During inference, this classifier generates probability maps for each abnormality across the CT volume, which are aggregated to produce a multi-label patient-level prediction. We compare different training strategies, including random initialization, ImageNet weight initialization, frozen SimCLR pretrained weights, and fine-tuned SimCLR pretrained weights. Each training strategy is evaluated on a validation set for hyperparameter selection and tested on a test set. Additionally, we explore the fine-tuned SimCLR pretrained classifier for 3D pathology localization and conduct a qualitative evaluation.
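The SimCLR pretraining mentioned in the methods optimizes the NT-Xent contrastive loss over pairs of augmented views. A minimal NumPy sketch, assuming an illustrative batch layout in which rows 2i and 2i+1 hold the two augmented views of patch i (the paper's actual implementation details are not given here):

```python
import numpy as np

def nt_xent_loss(z, temperature=0.5):
    """NT-Xent (normalized temperature-scaled cross-entropy) loss as
    used by SimCLR. Rows 2i and 2i+1 of z are assumed to be the two
    augmented views of the same patch. Returns the mean loss over views."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit-normalize embeddings
    sim = z @ z.T / temperature                        # scaled cosine similarities
    n = len(z)
    np.fill_diagonal(sim, -np.inf)                     # exclude self-similarity
    # the positive for view k is its sibling view; k ^ 1 flips the last bit
    pos = np.array([sim[k, k ^ 1] for k in range(n)])
    log_denom = np.log(np.exp(sim).sum(axis=1))        # log of softmax denominator
    return float(np.mean(log_denom - pos))
```

With well-separated, perfectly aligned pairs the loss approaches its minimum; pulling sibling views together while pushing all other patches apart is what makes the resulting encoder useful for the downstream patch classifier.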
RESULTS Validated on 111 chest scans for hyperparameter selection and subsequently tested on 251 chest scans with multiple abnormalities, our method achieves an area under the receiver operating characteristic curve (AUROC) of 0.931 (95% confidence interval [CI]: [0.9034, 0.9557], p-value < 0.001) and 0.963 (95% CI: [0.952, 0.976], p-value < 0.001) in the multi-label and binary (i.e., normal versus abnormal) settings, respectively. Notably, our method surpasses the AUROC threshold of 0.9 for two abnormalities, IP (0.974) and LLDA (0.952), while achieving values of 0.853 and 0.791 for NOD and CONS, respectively. Furthermore, our results highlight the superiority of incorporating contrastive pretraining within the patch classifier, outperforming ImageNet-pretrained and randomly initialized counterparts (F1 score = 0.943, 0.792, and 0.677, respectively). Qualitatively, the method achieved a satisfactory 88.8% completeness rate in localization and maintained an 88.3% accuracy rate against false positives. CONCLUSIONS The proposed method integrates self-supervised learning algorithms for pretraining, utilizes a patch-based approach for 3D pathology localization, and develops an aggregation method for multi-label prediction at the patient level. It shows promise in efficiently detecting and localizing multiple anomalies within a single scan.
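The patch-to-patient aggregation described above can be sketched as follows. Max-pooling over patches and the 0.5 decision threshold are illustrative assumptions; the paper's exact aggregation rule may differ:

```python
def aggregate_patches(patch_probs, threshold=0.5):
    """Collapse per-patch abnormality probabilities into one
    patient-level multi-label prediction.

    patch_probs: list of dicts mapping each abnormality class
        ("LLDA", "CONS", "NOD", "IP") to a probability for one patch.
    Returns (volume_probs, labels): the per-class maximum over all
    patches, and the classes above threshold ("NORMAL" if none).
    """
    classes = ["LLDA", "CONS", "NOD", "IP"]
    # keep the most suspicious patch per class across the whole volume
    volume_probs = {c: max(p[c] for p in patch_probs) for c in classes}
    labels = [c for c in classes if volume_probs[c] > threshold]
    return volume_probs, labels or ["NORMAL"]
```

Because the per-patch maxima are retained, the same structure that yields the patient-level label also points back to the patch locations responsible for it, which is what enables the 3D localization described in the conclusions.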
Affiliation(s)
- Aissam Djahnine
- CREATIS UMR5220, INSERM U1044, Claude Bernard University Lyon 1, INSA, Lyon, France
- Philips Health Technology innovation, Paris, France
- Salim Aymeric Si-Mohamed
- CREATIS UMR5220, INSERM U1044, Claude Bernard University Lyon 1, INSA, Lyon, France
- Department of Radiology, Hospices Civils de Lyon, Lyon, France
- Vincent Cottin
- National Reference Center for Rare Pulmonary Diseases, Louis Pradel Hospital, Lyon, France
- Claude Bernard University Lyon 1, Lyon, France
- Philippe Douek
- CREATIS UMR5220, INSERM U1044, Claude Bernard University Lyon 1, INSA, Lyon, France
- Department of Radiology, Hospices Civils de Lyon, Lyon, France
- Loic Boussel
- CREATIS UMR5220, INSERM U1044, Claude Bernard University Lyon 1, INSA, Lyon, France
- Department of Radiology, Hospices Civils de Lyon, Lyon, France
2
Misera L, Müller-Franzes G, Truhn D, Kather JN. Weakly Supervised Deep Learning in Radiology. Radiology 2024; 312:e232085. PMID: 39041937. DOI: 10.1148/radiol.232085.
Abstract
Deep learning (DL) is currently the standard artificial intelligence tool for computer-based image analysis in radiology. Traditionally, DL models have been trained with strongly supervised learning methods. These methods depend on reference standard labels, typically applied manually by experts. In contrast, weakly supervised learning is more scalable. Weak supervision comprises situations in which only a portion of the data are labeled (incomplete supervision), labels refer to a whole region or case as opposed to a precisely delineated image region (inexact supervision), or labels contain errors (inaccurate supervision). In many applications, weak labels are sufficient to train useful models. Thus, weakly supervised learning can unlock a large amount of otherwise unusable data for training DL models. One example of this is using large language models to automatically extract weak labels from free-text radiology reports. Here, we outline the key concepts in weakly supervised learning and provide an overview of applications in radiologic image analysis. With more fundamental and clinical translational work, weakly supervised learning could facilitate the uptake of DL in radiology and research workflows by enabling large-scale image analysis and advancing the development of new DL-based biomarkers.
Affiliation(s)
- Leo Misera
- From the Institute and Polyclinic for Diagnostic and Interventional Radiology (L.M.), Else Kröner Fresenius Center for Digital Health (L.M., J.N.K.), and Department of Medicine I (J.N.K.), Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Fetscherstrasse 74, 01307 Dresden, Germany; Department of Diagnostic and Interventional Radiology, University Hospital Aachen, Aachen, Germany (G.M.F., D.T.); and Medical Oncology, National Center for Tumor Diseases (NCT), University Hospital Heidelberg, Heidelberg, Germany (J.N.K.)
- Gustav Müller-Franzes
- Daniel Truhn
- Jakob Nikolas Kather
3
Teneggi J, Yi PH, Sulam J. Examination-Level Supervision for Deep Learning-based Intracranial Hemorrhage Detection on Head CT Scans. Radiol Artif Intell 2024; 6:e230159. PMID: 38294324. PMCID: PMC10831525. DOI: 10.1148/ryai.230159.
Abstract
Purpose To compare the effectiveness of weak supervision (ie, with examination-level labels only) and strong supervision (ie, with image-level labels) in training deep learning models for detection of intracranial hemorrhage (ICH) on head CT scans. Materials and Methods In this retrospective study, an attention-based convolutional neural network was trained with either local (ie, image level) or global (ie, examination level) binary labels on the Radiological Society of North America (RSNA) 2019 Brain CT Hemorrhage Challenge dataset of 21 736 examinations (8876 [40.8%] ICH) and 752 422 images (107 784 [14.3%] ICH). The CQ500 (436 examinations; 212 [48.6%] ICH) and CT-ICH (75 examinations; 36 [48.0%] ICH) datasets were employed for external testing. Performance in detecting ICH was compared between weak (examination-level labels) and strong (image-level labels) learners as a function of the number of labels available during training. Results On examination-level binary classification, strong and weak learners did not have different area under the receiver operating characteristic curve values on the internal validation split (0.96 vs 0.96; P = .64) and the CQ500 dataset (0.90 vs 0.92; P = .15). Weak learners outperformed strong ones on the CT-ICH dataset (0.95 vs 0.92; P = .03). Weak learners had better section-level ICH detection performance when more than 10 000 labels were available for training (average F1 score = 0.73 vs 0.65; P < .001). Weakly supervised models trained on the entire RSNA dataset required 35 times fewer labels than equivalent strong learners. Conclusion Strongly supervised models did not achieve better performance than weakly supervised ones, so weak supervision could reduce radiologist labor requirements for prospective dataset curation. Keywords: CT, Head/Neck, Brain/Brain Stem, Hemorrhage. Supplemental material is available for this article. © RSNA, 2023. See also commentary by Wahid and Fuentes in this issue.
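The attention mechanism that makes examination-level supervision workable pools per-slice features into a single exam-level representation, so training needs only one binary label per examination. A pure-Python toy sketch (the study's network learns the attention logits end to end; the fixed inputs here are illustrative):

```python
import math

def attention_pool(slice_logits, slice_feats):
    """Combine per-slice feature vectors into one examination-level
    feature using softmax attention weights.

    slice_logits: one attention score per CT slice.
    slice_feats: per-slice feature vectors (lists of equal length).
    Returns (weights, exam_feat).
    """
    m = max(slice_logits)
    exps = [math.exp(s - m) for s in slice_logits]   # numerically stable softmax
    total = sum(exps)
    weights = [e / total for e in exps]              # sum to 1 over slices
    dim = len(slice_feats[0])
    exam_feat = [sum(w * f[d] for w, f in zip(weights, slice_feats))
                 for d in range(dim)]                # attention-weighted average
    return weights, exam_feat
```

A classifier head on `exam_feat` can then be trained against the examination-level ICH label alone, while the learned weights indicate which slices drove the prediction.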
Affiliation(s)
- Jacopo Teneggi
- From the Department of Computer Science (J.T.), Department of Biomedical Engineering (J.S.), and Mathematical Institute for Data Science (MINDS) (J.S., J.T.), Johns Hopkins University, 3400 N Charles St, Clark Hall, Suite 320, Baltimore, MD 21218; and University of Maryland Medical Intelligent Imaging Center (UM2ii), Department of Diagnostic Radiology and Nuclear Medicine, University of Maryland School of Medicine, Baltimore, Md (P.H.Y.)
- Paul H. Yi
- Jeremias Sulam
4
A review of deep learning-based multiple-lesion recognition from medical images: classification, detection and segmentation. Comput Biol Med 2023; 157:106726. PMID: 36924732. DOI: 10.1016/j.compbiomed.2023.106726.
Abstract
Deep learning-based methods have become the dominant methodology in medical image processing with the advancement of deep learning in natural image classification, detection, and segmentation. Deep learning-based approaches have proven to be quite effective in single-lesion recognition and segmentation. Multiple-lesion recognition is more difficult than single-lesion recognition because of the subtle variation between lesions and the wide range of lesion types involved. Several studies have recently explored deep learning-based algorithms to solve the multiple-lesion recognition challenge. This paper provides an in-depth overview and analysis of deep learning-based methods for multiple-lesion recognition developed in recent years, covering multiple-lesion recognition in diverse body areas as well as recognition of whole-body multiple diseases. We discuss the challenges that still persist in multiple-lesion recognition tasks by critically assessing these efforts. Finally, we outline existing problems and potential future research areas, with the hope that this review will help researchers develop future approaches that drive additional advances.
5
Zhang D, Neely B, Lo JY, Patel BN, Hyslop T, Gupta RT. Utility of a Rule-Based Algorithm in the Assessment of Standardized Reporting in PI-RADS. Acad Radiol 2022; 30:1141-1147. PMID: 35909050. DOI: 10.1016/j.acra.2022.06.024.
Abstract
RATIONALE AND OBJECTIVES Adoption of the Prostate Imaging Reporting & Data System (PI-RADS) has been shown to increase detection of clinically significant prostate cancer on prostate mpMRI. We propose that a rule-based algorithm based on Regular Expression (RegEx) matching can be used to automatically categorize prostate mpMRI reports into categories as a means by which to assess for opportunities for quality improvement. MATERIALS AND METHODS All prostate mpMRIs performed in the Duke University Health System from January 2, 2015, to January 29, 2021, were analyzed. Exclusion criteria were applied, for a total of 5343 male patients and 6264 prostate mpMRI reports. These reports were then analyzed by our RegEx algorithm to be categorized as PI-RADS 1 through PI-RADS 5, Recurrent Disease, or "No Information Available." A stratified, random sample of 502 mpMRI reports was reviewed by a blinded clinical team to assess performance of the RegEx algorithm. RESULTS Compared to manual review, the RegEx algorithm achieved overall accuracy of 92.6%, average precision of 88.8%, average recall of 85.6%, and F1 score of 0.871. The clinical team also reviewed 344 cases that were classified as "No Information Available," and found that in 150 instances, no numerical PI-RADS score for any lesion was included in the impression section of the mpMRI report. CONCLUSION Rule-based processing is an accurate method for the large-scale, automated extraction of PI-RADS scores from the text of radiology reports. These natural language processing approaches can be used for future initiatives in quality improvement in prostate mpMRI reporting with PI-RADS.
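A minimal version of such RegEx matching might look like the sketch below. The pattern and fallback label are illustrative assumptions; the published rule set is more extensive and also handles categories such as recurrent disease:

```python
import re

# Hypothetical pattern -- the study's actual rule set is larger and
# covers more report phrasings than this single expression.
PIRADS_RE = re.compile(
    r"PI-?RADS\s*(?:v2)?\s*(?:category|score)?\s*[:=]?\s*([1-5])",
    re.IGNORECASE,
)

def classify_report(impression):
    """Return the highest PI-RADS category mentioned in a report's
    impression section, or 'No Information Available' if none is found."""
    scores = [int(m) for m in PIRADS_RE.findall(impression)]
    return f"PI-RADS {max(scores)}" if scores else "No Information Available"
```

Taking the maximum over all matches mirrors the clinical convention of reporting a study by its most suspicious lesion, and the fallback branch corresponds to the "No Information Available" category described in the abstract.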
6
D’Anniballe VM, Tushar FI, Faryna K, Han S, Mazurowski MA, Rubin GD, Lo JY. Multi-label annotation of text reports from computed tomography of the chest, abdomen, and pelvis using deep learning. BMC Med Inform Decis Mak 2022; 22:102. PMID: 35428335. PMCID: PMC9011942. DOI: 10.1186/s12911-022-01843-4. Open access.
Abstract
Background
There is progress to be made in building artificially intelligent abnormality-detection systems that are not only accurate but can also handle the true breadth of findings that radiologists encounter in body (chest, abdomen, and pelvis) computed tomography (CT). Currently, the major bottleneck for developing multi-disease classifiers is a lack of manually annotated data. The purpose of this work was to develop high-throughput multi-label annotators for body CT reports that can be applied across a variety of abnormalities, organs, and disease states, thereby mitigating the need for human annotation.
Methods
We used a dictionary approach to develop rule-based algorithms (RBA) for extraction of disease labels from radiology text reports. We targeted three organ systems (lungs/pleura, liver/gallbladder, kidneys/ureters) with four diseases per system based on their prevalence in our dataset. To expand the algorithms beyond pre-defined keywords, attention-guided recurrent neural networks (RNN) were trained using the RBA-extracted labels to classify reports as being positive for one or more diseases or normal for each organ system. Alternative effects on disease classification performance were evaluated using random initialization or pre-trained embedding as well as different sizes of training datasets. The RBA was tested on a subset of 2158 manually labeled reports and performance was reported as accuracy and F-score. The RNN was tested against a test set of 48,758 reports labeled by RBA and performance was reported as area under the receiver operating characteristic curve (AUC), with 95% CIs calculated using the DeLong method.
Results
Manual validation of the RBA confirmed 91–99% accuracy across the 15 different labels. Our models extracted disease labels from 261,229 radiology reports of 112,501 unique subjects. Pre-trained models outperformed random initialization across all diseases. As the training dataset size was reduced, performance was robust except for a few diseases with a relatively small number of cases. Pre-trained classification AUCs reached > 0.95 for all four disease outcomes and normality across all three organ systems.
Conclusions
Our label-extracting pipeline was able to encompass a variety of cases and diseases in body CT reports by generalizing beyond strict rules with exceptional accuracy. The method described can be easily adapted to enable automated labeling of hospital-scale medical data sets for training image-based disease classifiers.
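The dictionary-based rule extraction described in the methods can be sketched as follows. The keyword lists and the single negation pattern are illustrative assumptions; the published RBA uses far larger curated term dictionaries and more sophisticated negation handling:

```python
import re

# Hypothetical keyword dictionary for two of the three organ systems.
DISEASE_TERMS = {
    "lungs_pleura": {
        "nodule": ["nodule", "nodular opacity"],
        "emphysema": ["emphysema", "emphysematous"],
    },
    "liver_gallbladder": {
        "lesion": ["hepatic lesion", "liver mass"],
    },
}

# A simple negation cue; real systems use richer context windows.
NEGATION = re.compile(r"\bno\b|\bwithout\b|\bnegative for\b", re.IGNORECASE)

def label_report(text):
    """Assign weak multi-label (organ, disease) annotations to a report
    by dictionary matching, skipping sentences with a negation cue."""
    labels = set()
    for sentence in re.split(r"[.;]\s*", text):
        if NEGATION.search(sentence):
            continue  # treat negated sentences as non-informative
        for organ, diseases in DISEASE_TERMS.items():
            for disease, terms in diseases.items():
                if any(t in sentence.lower() for t in terms):
                    labels.add((organ, disease))
    return labels
```

Labels produced this way are weak in the sense of the abstract: they are noisy but cheap, and at hospital scale they suffice to train the RNN classifiers that then generalize beyond the fixed keyword lists.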