1
Wu D, Smith D, VanBerlo B, Roshankar A, Lee H, Li B, Ali F, Rahman M, Basmaji J, Tschirhart J, Ford A, VanBerlo B, Durvasula A, Vannelli C, Dave C, Deglint J, Ho J, Chaudhary R, Clausdorff H, Prager R, Millington S, Shah S, Buchanan B, Arntfield R. Improving the Generalizability and Performance of an Ultrasound Deep Learning Model Using Limited Multicenter Data for Lung Sliding Artifact Identification. Diagnostics (Basel) 2024; 14:1081. [PMID: 38893608] [PMCID: PMC11172006] [DOI: 10.3390/diagnostics14111081] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2024] [Revised: 05/18/2024] [Accepted: 05/20/2024] [Indexed: 06/21/2024] Open Access
Abstract
Deep learning (DL) models for medical image classification frequently struggle to generalize to data from outside institutions. Additional clinical data are also rarely collected to comprehensively assess and understand model performance amongst subgroups. Following the development of a single-center model to identify the lung sliding artifact on lung ultrasound (LUS), we pursued a validation strategy using external LUS data. As annotated LUS data are relatively scarce compared to other medical imaging data, we adopted a novel technique to optimize the use of limited external data to improve model generalizability. Externally acquired LUS data from three tertiary care centers, totaling 641 clips from 238 patients, were used to assess the baseline generalizability of our lung sliding model. We then employed our novel Threshold-Aware Accumulative Fine-Tuning (TAAFT) method to fine-tune the baseline model and determine the minimum amount of data required to achieve predefined performance goals. A subgroup analysis was also performed, and Grad-CAM++ explanations were examined. The final model was fine-tuned on one-third of the external dataset to achieve 0.917 sensitivity, 0.817 specificity, and 0.920 area under the receiver operating characteristic curve (AUC) on the external validation dataset, exceeding our predefined performance goals. Subgroup analyses identified the LUS characteristics that most challenged the model's performance. Grad-CAM++ saliency maps highlighted clinically relevant regions on M-mode images. We report a multicenter study that exploits limited available external data to improve the generalizability and performance of our lung sliding model while identifying poorly performing subgroups to inform future iterative improvements. This approach may contribute to efficiencies for DL researchers working with smaller quantities of external validation data.
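The headline metrics in this abstract (sensitivity and specificity at a decision threshold, plus AUC) can be reproduced with a few lines of code. The sketch below is illustrative only: the labels, scores, and 0.5 threshold are invented stand-ins, not data or thresholds from the study.

```python
# Threshold-based sensitivity/specificity plus a rank-based (Mann-Whitney) AUC.
# All inputs here are made-up examples, not data from the paper.

def confusion(labels, scores, threshold=0.5):
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp, fp, tn, fn

def sensitivity(labels, scores, threshold=0.5):
    tp, _, _, fn = confusion(labels, scores, threshold)
    return tp / (tp + fn)

def specificity(labels, scores, threshold=0.5):
    _, fp, tn, _ = confusion(labels, scores, threshold)
    return tn / (tn + fp)

def auc(labels, scores):
    # Probability that a random positive outranks a random negative.
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
print(f"sens={sensitivity(labels, scores):.3f} "
      f"spec={specificity(labels, scores):.3f} "
      f"auc={auc(labels, scores):.3f}")  # sens=0.667 spec=0.667 auc=0.889
```

A fine-tuning loop like TAAFT would evaluate checkpoints against predefined goals such as these; the accumulative data-selection logic itself is described in the paper, not reproduced here.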
Affiliation(s)
- Derek Wu
- Department of Medicine, Western University, London, ON N6A 5C1, Canada
- Delaney Smith
- Faculty of Mathematics, University of Waterloo, Waterloo, ON N2L 3G1, Canada
- Blake VanBerlo
- Faculty of Mathematics, University of Waterloo, Waterloo, ON N2L 3G1, Canada
- Amir Roshankar
- Faculty of Engineering, University of Waterloo, Waterloo, ON N2L 3G1, Canada
- Hoseok Lee
- Faculty of Mathematics, University of Waterloo, Waterloo, ON N2L 3G1, Canada
- Brian Li
- Faculty of Engineering, University of Waterloo, Waterloo, ON N2L 3G1, Canada
- Faraz Ali
- Faculty of Engineering, University of Waterloo, Waterloo, ON N2L 3G1, Canada
- Marwan Rahman
- Faculty of Engineering, University of Waterloo, Waterloo, ON N2L 3G1, Canada
- John Basmaji
- Division of Critical Care Medicine, Western University, London, ON N6A 5C1, Canada
- Jared Tschirhart
- Schulich School of Medicine and Dentistry, Western University, London, ON N6A 5C1, Canada
- Alex Ford
- Independent Researcher, London, ON N6A 1L8, Canada
- Bennett VanBerlo
- Faculty of Engineering, Western University, London, ON N6A 5C1, Canada
- Ashritha Durvasula
- Schulich School of Medicine and Dentistry, Western University, London, ON N6A 5C1, Canada
- Claire Vannelli
- Schulich School of Medicine and Dentistry, Western University, London, ON N6A 5C1, Canada
- Chintan Dave
- Division of Critical Care Medicine, Western University, London, ON N6A 5C1, Canada
- Jason Deglint
- Faculty of Engineering, University of Waterloo, Waterloo, ON N2L 3G1, Canada
- Jordan Ho
- Department of Family Medicine, Western University, London, ON N6A 5C1, Canada
- Rushil Chaudhary
- Department of Medicine, Western University, London, ON N6A 5C1, Canada
- Hans Clausdorff
- Departamento de Medicina de Urgencia, Pontificia Universidad Católica de Chile, Santiago 8331150, Chile
- Ross Prager
- Division of Critical Care Medicine, Western University, London, ON N6A 5C1, Canada
- Scott Millington
- Department of Critical Care Medicine, University of Ottawa, Ottawa, ON K1N 6N5, Canada
- Samveg Shah
- Department of Medicine, University of Alberta, Edmonton, AB T6G 2R3, Canada
- Brian Buchanan
- Department of Critical Care, University of Alberta, Edmonton, AB T6G 2R3, Canada
- Robert Arntfield
- Division of Critical Care Medicine, Western University, London, ON N6A 5C1, Canada
2
Rajaraman S, Zamzmi G, Yang F, Liang Z, Xue Z, Antani S. Uncovering the effects of model initialization on deep model generalization: A study with adult and pediatric chest X-ray images. PLOS Digital Health 2024; 3:e0000286. [PMID: 38232121] [DOI: 10.1371/journal.pdig.0000286] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2023] [Accepted: 12/04/2023] [Indexed: 01/19/2024]
Abstract
Model initialization techniques are vital for improving the performance and reliability of deep learning models in medical computer vision applications. While much literature exists on non-medical images, the impacts on medical images, particularly chest X-rays (CXRs), are less understood. Addressing this gap, our study explores three deep model initialization techniques: Cold-start, Warm-start, and Shrink and Perturb start, focusing on adult and pediatric populations. We specifically focus on scenarios with periodically arriving data for training, thereby embracing the real-world scenario of an ongoing data influx and the need for model updates. We evaluate these models for generalizability against external adult and pediatric CXR datasets. We also propose novel ensemble methods: F-score-weighted Sequential Least-Squares Quadratic Programming (F-SLSQP) and Attention-Guided Ensembles with Learnable Fuzzy Softmax, which aggregate weight parameters from multiple models to capitalize on their collective knowledge and complementary representations. We perform statistical significance tests with 95% confidence intervals and p-values to analyze model performance. Our evaluations indicate that models initialized with ImageNet-pretrained weights demonstrate superior generalizability over randomly initialized counterparts, contradicting some findings for non-medical images. Notably, ImageNet-pretrained models exhibit consistent performance during internal and external testing across different training scenarios. Weight-level ensembles of these models show significantly higher recall (p<0.05) during testing compared to individual models. Thus, our study accentuates the benefits of ImageNet-pretrained weight initialization, especially when used with weight-level ensembles, for creating robust and generalizable deep learning solutions.
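The "Shrink and Perturb" warm-start evaluated above shrinks each existing weight toward zero and adds small Gaussian noise before training resumes on newly arrived data. A minimal sketch, with illustrative `lam`/`sigma` values (the study's hyperparameters are not given in the abstract):

```python
import random

def shrink_and_perturb(weights, lam=0.5, sigma=0.01, seed=0):
    """Shrink every weight toward zero by factor lam, then jitter with noise.

    lam and sigma are illustrative defaults, not the study's settings.
    """
    rng = random.Random(seed)
    return [lam * w + rng.gauss(0.0, sigma) for w in weights]

old = [0.8, -1.2, 0.05]          # toy flattened weight vector
new = shrink_and_perturb(old)    # each entry is ~0.5 * old, plus small noise
```

A cold start would replace `old` with fresh random values, and a plain warm start would reuse `old` unchanged; shrink-and-perturb sits between the two, retaining direction while restoring trainability.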
Affiliation(s)
- Sivaramakrishnan Rajaraman
- Computational Health Research Branch, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
- Ghada Zamzmi
- Computational Health Research Branch, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
- Feng Yang
- Computational Health Research Branch, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
- Zhaohui Liang
- Computational Health Research Branch, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
- Zhiyun Xue
- Computational Health Research Branch, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
- Sameer Antani
- Computational Health Research Branch, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
3
Rajaraman S, Yang F, Zamzmi G, Xue Z, Antani S. Can Deep Adult Lung Segmentation Models Generalize to the Pediatric Population? Expert Systems with Applications 2023; 229:120531. [PMID: 37397242] [PMCID: PMC10310063] [DOI: 10.1016/j.eswa.2023.120531] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/04/2023]
Abstract
Lung segmentation in chest X-rays (CXRs) is an important prerequisite for improving the specificity of diagnoses of cardiopulmonary diseases in a clinical decision support system. Current deep learning models for lung segmentation are trained and evaluated on CXR datasets in which the radiographic projections are captured predominantly from the adult population. However, the shape of the lungs is reported to differ significantly across the developmental stages from infancy to adulthood. This may result in age-related data domain shifts that adversely impact lung segmentation performance when models trained on the adult population are deployed for pediatric lung segmentation. In this work, our goal is to (i) analyze the generalizability of deep adult lung segmentation models to the pediatric population and (ii) improve performance through a stage-wise, systematic approach consisting of CXR modality-specific weight initializations, stacked ensembles, and an ensemble of stacked ensembles. To evaluate segmentation performance and generalizability, novel evaluation metrics consisting of mean lung contour distance (MLCD) and average hash score (AHS) are proposed in addition to the multi-scale structural similarity index measure (MS-SSIM), intersection over union (IoU), Dice score, 95% Hausdorff distance (HD95), and average symmetric surface distance (ASSD). Our results showed a significant improvement (p < 0.05) in cross-domain generalization through our approach. This study could serve as a paradigm to analyze the cross-domain generalizability of deep segmentation models for other medical imaging modalities and applications.
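Two of the standard overlap metrics named in this abstract, Dice and IoU, are easy to state concretely. The sketch below operates on flattened binary masks; the example masks are invented for illustration:

```python
# Dice and IoU for binary segmentation masks given as flat 0/1 lists.
# The example masks are toy data, not from the study.

def dice(a, b):
    inter = sum(x & y for x, y in zip(a, b))
    return 2 * inter / (sum(a) + sum(b))

def iou(a, b):
    inter = sum(x & y for x, y in zip(a, b))
    union = sum(x | y for x, y in zip(a, b))
    return inter / union

a = [1, 1, 0, 0]  # "ground truth" mask, flattened
b = [1, 0, 1, 0]  # "prediction" mask, flattened
# overlap = 1 pixel, so dice = 2*1/(2+2) = 0.5 and iou = 1/3
```

Boundary-sensitive metrics such as HD95 and ASSD, and the paper's proposed MLCD and AHS, require contour extraction and are not sketched here.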
Affiliation(s)
- Sivaramakrishnan Rajaraman
- Computational Health Research Branch, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Feng Yang
- Computational Health Research Branch, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Ghada Zamzmi
- Computational Health Research Branch, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Zhiyun Xue
- Computational Health Research Branch, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Sameer Antani
- Computational Health Research Branch, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
4
Krokos G, MacKewn J, Dunn J, Marsden P. A review of PET attenuation correction methods for PET-MR. EJNMMI Phys 2023; 10:52. [PMID: 37695384] [PMCID: PMC10495310] [DOI: 10.1186/s40658-023-00569-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 04/14/2023] [Accepted: 08/07/2023] [Indexed: 09/12/2023] Open Access
Abstract
Although it has been thirteen years since the installation of the first PET-MR system, these scanners constitute a very small proportion of the total hybrid PET systems installed. This is in stark contrast to the rapid expansion of PET-CT, which quickly established its importance in patient diagnosis within a similar timeframe. One of the main hurdles is the development of an accurate, reproducible and easy-to-use method for attenuation correction. Quantitative discrepancies in PET images between the manufacturer-provided MR methods and the more established CT- or transmission-based attenuation correction methods have led the scientific community into a continuous effort to develop a robust and accurate alternative. The approaches can be divided into four broad categories: (i) MR-based, (ii) emission-based, (iii) atlas-based and (iv) machine learning-based attenuation correction, the last of which is rapidly gaining momentum. The first is based on segmenting the MR images into various tissues and allocating a predefined attenuation coefficient to each tissue. Emission-based attenuation correction methods aim to utilise the PET emission data by simultaneously reconstructing the radioactivity distribution and the attenuation image. Atlas-based attenuation correction methods aim to predict a CT or transmission image for a new patient, given that patient's MR image, by using databases containing CT or transmission images from the general population. Finally, in machine learning methods, a model that predicts the required image from the acquired MR or non-attenuation-corrected PET image is developed by exploiting the underlying features of the images. Deep learning methods are the dominant approach in this category. Compared to more traditional machine learning, which uses structured data to build a model, deep learning makes direct use of the acquired images to identify underlying features.
This up-to-date review goes through the literature of attenuation correction approaches in PET-MR after categorising them. The various approaches in each category are described and discussed. After exploring each category separately, a general overview is given of the current status and potential future approaches along with a comparison of the four outlined categories.
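The first category described in this abstract (segmentation-based MR attenuation correction) amounts to a lookup from segmented tissue class to a predefined linear attenuation coefficient at 511 keV. The sketch below uses approximate, commonly quoted coefficient values; exact values vary between implementations:

```python
# Segmentation-based MRAC sketch: map each segmented tissue class to an
# approximate linear attenuation coefficient (cm^-1) at 511 keV.
# Coefficient values are ballpark figures for illustration only.
MU_511 = {
    "air": 0.0,
    "lung": 0.018,
    "soft_tissue": 0.096,
    "bone": 0.13,
}

def mu_map(tissue_labels):
    """Convert a 2-D grid of tissue-class labels into an attenuation map."""
    return [[MU_511[t] for t in row] for row in tissue_labels]

seg = [["air", "soft_tissue"],
       ["lung", "bone"]]       # toy 2x2 segmentation
mu = mu_map(seg)
```

The known weakness of this category, noted in the review literature, is that early MR segmentation methods could not distinguish bone from air, so both were often assigned soft-tissue or air coefficients, biasing quantification near bone.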
Affiliation(s)
- Georgios Krokos
- School of Biomedical Engineering and Imaging Sciences, The PET Centre at St Thomas' Hospital London, King's College London, 1st Floor Lambeth Wing, Westminster Bridge Road, London, SE1 7EH, UK
- Jane MacKewn
- School of Biomedical Engineering and Imaging Sciences, The PET Centre at St Thomas' Hospital London, King's College London, 1st Floor Lambeth Wing, Westminster Bridge Road, London, SE1 7EH, UK
- Joel Dunn
- School of Biomedical Engineering and Imaging Sciences, The PET Centre at St Thomas' Hospital London, King's College London, 1st Floor Lambeth Wing, Westminster Bridge Road, London, SE1 7EH, UK
- Paul Marsden
- School of Biomedical Engineering and Imaging Sciences, The PET Centre at St Thomas' Hospital London, King's College London, 1st Floor Lambeth Wing, Westminster Bridge Road, London, SE1 7EH, UK
5
Beheshtian E, Putman K, Santomartino SM, Parekh VS, Yi PH. Generalizability and Bias in a Deep Learning Pediatric Bone Age Prediction Model Using Hand Radiographs. Radiology 2023; 306:e220505. [PMID: 36165796] [DOI: 10.1148/radiol.220505] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Indexed: 01/26/2023]
Abstract
Background Although deep learning (DL) models have demonstrated expert-level ability for pediatric bone age prediction, they have shown poor generalizability and bias in other use cases. Purpose To quantify generalizability and bias in a bone age DL model measured by performance on external versus internal test sets and performance differences between different demographic groups, respectively. Materials and Methods The winning DL model of the 2017 RSNA Pediatric Bone Age Challenge was retrospectively evaluated and trained on 12 611 pediatric hand radiographs from two U.S. hospitals. The DL model was tested from September 2021 to December 2021 on an internal validation set and an external test set of pediatric hand radiographs with diverse demographic representation. Images reporting ground-truth bone age were included for study. Mean absolute difference (MAD) between ground-truth bone age and the model prediction bone age was calculated for each set. Generalizability was evaluated by comparing MAD between internal and external evaluation sets with use of t tests. Bias was evaluated by comparing MAD and clinically significant error rate (rate of errors changing the clinical diagnosis) between demographic groups with use of t tests or analysis of variance and χ2 tests, respectively (statistically significant difference defined as P < .05). Results The internal validation set had images from 1425 individuals (773 boys), and the external test set had images from 1202 individuals (mean age, 133 months ± 60 [SD]; 614 boys). The bone age model generalized well to the external test set, with no difference in MAD (6.8 months in the validation set vs 6.9 months in the external set; P = .64). Model predictions would have led to clinically significant errors in 194 of 1202 images (16%) in the external test set. 
The MAD was greater for girls than boys in the internal validation set (P = .01) and in the subcategories of age and Tanner stage in the external test set (P < .001 for both). Conclusion A deep learning (DL) bone age model generalized well to an external test set, although clinically significant sex-, age-, and sexual maturity-based biases in DL bone age prediction were identified. © RSNA, 2022. Online supplemental material is available for this article. See also the editorial by Larson in this issue.
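The study's headline metric, mean absolute difference (MAD) between ground-truth and predicted bone age, is simple to compute. The sketch below also includes a threshold-based error rate as a crude stand-in for the paper's diagnosis-change criterion; the 12-month cutoff and the example ages are illustrative assumptions, not the study's definitions:

```python
# MAD and a simple threshold-based error rate for bone-age predictions
# (ages in months). All numbers below are invented examples.

def mean_absolute_difference(truth, pred):
    """MAD between ground-truth and predicted bone ages."""
    return sum(abs(t - p) for t, p in zip(truth, pred)) / len(truth)

def significant_error_rate(truth, pred, threshold=12.0):
    """Fraction of predictions off by more than `threshold` months.

    A stand-in for the paper's 'clinically significant error' criterion,
    which is defined by change in clinical diagnosis, not a fixed cutoff.
    """
    return sum(abs(t - p) > threshold for t, p in zip(truth, pred)) / len(truth)

truth = [120, 60, 96, 140]
pred = [126, 58, 90, 120]
# absolute errors: 6, 2, 6, 20 -> MAD = 34/4 = 8.5 months; one error > 12 months
```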
Affiliation(s)
- Elham Beheshtian
- From the University of Maryland Medical Intelligent Imaging (UM2ii) Center, Department of Diagnostic Radiology and Nuclear Medicine, University of Maryland School of Medicine, 670 W Baltimore St, First Floor, Room 1172, Baltimore, MD 21201
- Kristin Putman
- From the University of Maryland Medical Intelligent Imaging (UM2ii) Center, Department of Diagnostic Radiology and Nuclear Medicine, University of Maryland School of Medicine, 670 W Baltimore St, First Floor, Room 1172, Baltimore, MD 21201
- Samantha M Santomartino
- From the University of Maryland Medical Intelligent Imaging (UM2ii) Center, Department of Diagnostic Radiology and Nuclear Medicine, University of Maryland School of Medicine, 670 W Baltimore St, First Floor, Room 1172, Baltimore, MD 21201
- Vishwa S Parekh
- From the University of Maryland Medical Intelligent Imaging (UM2ii) Center, Department of Diagnostic Radiology and Nuclear Medicine, University of Maryland School of Medicine, 670 W Baltimore St, First Floor, Room 1172, Baltimore, MD 21201
- Paul H Yi
- From the University of Maryland Medical Intelligent Imaging (UM2ii) Center, Department of Diagnostic Radiology and Nuclear Medicine, University of Maryland School of Medicine, 670 W Baltimore St, First Floor, Room 1172, Baltimore, MD 21201
6
Chua M, Kim D, Choi J, Lee NG, Deshpande V, Schwab J, Lev MH, Gonzalez RG, Gee MS, Do S. Tackling prediction uncertainty in machine learning for healthcare. Nat Biomed Eng 2022. [PMID: 36581695] [DOI: 10.1038/s41551-022-00988-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 04/25/2022] [Accepted: 11/17/2022] [Indexed: 12/31/2022]
Abstract
Predictive machine-learning systems often do not convey the degree of confidence in the correctness of their outputs. To prevent unsafe prediction failures from machine-learning models, the users of the systems should be aware of the general accuracy of the model and understand the degree of confidence in each individual prediction. In this Perspective, we convey the need of prediction-uncertainty metrics in healthcare applications, with a focus on radiology. We outline the sources of prediction uncertainty, discuss how to implement prediction-uncertainty metrics in applications that require zero tolerance to errors and in applications that are error-tolerant, and provide a concise framework for understanding prediction uncertainty in healthcare contexts. For machine-learning-enabled automation to substantially impact healthcare, machine-learning models with zero tolerance for false-positive or false-negative errors must be developed intentionally.
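One concrete way to convey per-prediction confidence, in the spirit of this Perspective, is to compute the predictive entropy of the model's class probabilities and defer to a human reader when it is high. A minimal sketch; the 0.3-nat cutoff is an illustrative assumption, not a value from the article:

```python
import math

def predictive_entropy(probs):
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def triage(probs, max_entropy=0.3):
    """Route confident predictions to automation; defer uncertain ones.

    max_entropy is an invented threshold for illustration.
    """
    return "defer" if predictive_entropy(probs) > max_entropy else "auto"

confident = triage([0.95, 0.05])   # low entropy -> handled automatically
uncertain = triage([0.60, 0.40])   # high entropy -> deferred to a human
```

More elaborate sources of uncertainty (model/epistemic vs. data/aleatoric) need techniques such as ensembling or Monte Carlo dropout, but the triage decision at the end takes the same shape.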
Affiliation(s)
- Michelle Chua
- Department of Radiology, Massachusetts General Hospital, Boston, MA, USA
- Doyun Kim
- Department of Radiology, Massachusetts General Hospital, Boston, MA, USA
- Jongmun Choi
- Department of Radiology, Massachusetts General Hospital, Boston, MA, USA
- Nahyoung G Lee
- Department of Ophthalmology, Massachusetts Eye and Ear Infirmary, Boston, MA, USA
- Vikram Deshpande
- Department of Pathology, Massachusetts General Hospital, Boston, MA, USA
- Joseph Schwab
- Department of Orthopedic Surgery, Massachusetts General Hospital, Boston, MA, USA
- Michael H Lev
- Department of Radiology, Massachusetts General Hospital, Boston, MA, USA
- Ramon G Gonzalez
- Department of Radiology, Massachusetts General Hospital, Boston, MA, USA
- Michael S Gee
- Department of Radiology, Massachusetts General Hospital, Boston, MA, USA
- Synho Do
- Department of Radiology, Massachusetts General Hospital, Boston, MA, USA; Department of Pathology, Massachusetts General Hospital, Boston, MA, USA
7
Can images crowdsourced from the internet be used to train generalizable joint dislocation deep learning algorithms? Skeletal Radiol 2022; 51:2121-2128. [PMID: 35624310] [DOI: 10.1007/s00256-022-04077-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2022] [Revised: 05/18/2022] [Accepted: 05/19/2022] [Indexed: 02/02/2023]
Abstract
OBJECTIVE Deep learning has the potential to automatically triage orthopedic emergencies, such as joint dislocations. However, because these injuries are rare, collecting large numbers of images to train algorithms may be infeasible for many centers. We evaluated whether the Internet could be used as a source of images to train convolutional neural networks (CNNs) for joint dislocations that would generalize well to real-world clinical cases. METHODS We collected datasets from online radiology repositories of 100 radiographs each (50 dislocated, 50 located) for four joints: native shoulder, elbow, hip, and total hip arthroplasty (THA). We trained a variety of CNN binary classifiers, using both on-the-fly and static data augmentation, to identify the various joint dislocations. The best-performing classifier for each joint was evaluated on an external test set of 100 corresponding radiographs (50 dislocations) from three hospitals. CNN performance was evaluated using the area under the ROC curve (AUROC). To determine the areas emphasized by each CNN for decision-making, class activation map (CAM) heatmaps were generated for test images. RESULTS The best-performing CNNs for elbow, hip, shoulder, and THA dislocation achieved high AUROCs on both internal and external test sets (internal/external AUROC): elbow (1.0/0.998), hip (0.993/0.880), shoulder (1.0/0.993), THA (1.0/0.950). Heatmaps demonstrated appropriate emphasis of the joints for both located and dislocated joints. CONCLUSION With modest numbers of images, radiographs from the Internet can be used to train clinically generalizable CNNs for joint dislocations. Given the rarity of joint dislocations at many centers, online repositories may be a viable source of CNN training data.
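The CAM heatmaps mentioned in this abstract are formed by weighting each final-convolutional-layer feature map by the classifier weight for the predicted class and summing. A minimal sketch with tiny invented feature maps (real CAMs are computed from a trained network's activations):

```python
# Class activation map (CAM) sketch: a weighted sum of final-layer
# feature maps, using the classifier weights for the predicted class.
# Feature maps and weights below are toy values for illustration.

def cam(feature_maps, class_weights):
    """Sum the feature maps, each scaled by its class weight."""
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    heat = [[0.0] * w for _ in range(h)]
    for fmap, cw in zip(feature_maps, class_weights):
        for i in range(h):
            for j in range(w):
                heat[i][j] += cw * fmap[i][j]
    return heat

fmaps = [[[1, 0],
          [0, 1]],
         [[0, 1],
          [1, 0]]]                # two toy 2x2 feature maps
heat = cam(fmaps, [1.0, 2.0])     # -> [[1.0, 2.0], [2.0, 1.0]]
```

The heatmap is then upsampled to the input image size and overlaid on the radiograph, which is how the authors verified that the network attended to the joint rather than to background artifacts.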