1
Nisanova A, Yavary A, Deaner J, Ali FS, Gogte P, Kaplan R, Chen KC, Nudleman E, Grewal D, Gupta M, Wolfe J, Klufas M, Yiu G, Soltani I, Emami-Naeini P. Performance of Automated Machine Learning in Predicting Outcomes of Pneumatic Retinopexy. Ophthalmology Science 2024; 4:100470. [PMID: 38827487 PMCID: PMC11141253 DOI: 10.1016/j.xops.2024.100470]
Abstract
Purpose: Automated machine learning (AutoML) has emerged as a novel tool for medical professionals who lack coding experience, enabling them to develop predictive models for treatment outcomes. This study evaluated the performance of AutoML tools in developing models that predict the success of pneumatic retinopexy (PR) in the treatment of rhegmatogenous retinal detachment (RRD). These models were then compared with custom models created by machine learning (ML) experts.
Design: Retrospective multicenter study.
Participants: Five hundred thirty-nine consecutive patients with primary RRD who underwent PR by a vitreoretinal fellow at 6 training hospitals between 2002 and 2022.
Methods: We used 2 AutoML platforms: MATLAB Classification Learner and Google Cloud AutoML. Additional models were developed by computer scientists. We included patient demographics and baseline characteristics, including lens and macula status, RRD size, number and location of breaks, presence of vitreous hemorrhage and lattice degeneration, and physicians' experience. The dataset was split into a training set (n = 483) and a test set (n = 56). The training set, with a 2:1 success-to-failure ratio, was used to train the MATLAB models. Because Google Cloud AutoML requires a minimum of 1000 samples, the training set was tripled to create a new set with 1449 datapoints. Additionally, balanced datasets with a 1:1 success-to-failure ratio were created using Python.
Main Outcome Measures: Single-procedure anatomic success rate, as predicted by the ML models. F2 scores and area under the receiver operating characteristic curve (AUROC) were used as the primary metrics to compare models.
Results: The best-performing AutoML model (F2 score: 0.85; AUROC: 0.90; MATLAB) showed performance comparable to the custom model (F2 score: 0.92; AUROC: 0.86) when trained on the balanced datasets. However, training the AutoML model with imbalanced data yielded a misleadingly high AUROC (0.81) despite a low F2 score (0.2) and low sensitivity (0.17).
Conclusions: We demonstrated the feasibility of using AutoML as an accessible tool for medical professionals to develop models from clinical data. Such models can ultimately aid clinical decision-making, contributing to better patient outcomes. However, results can be misleading or unreliable if the tools are used naively; limitations exist, particularly if datasets contain missing variables or are highly imbalanced. Proper model selection and data preprocessing can improve the reliability of AutoML tools.
Financial Disclosures: Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.
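The contrast the authors draw between AUROC and F2 on imbalanced data is easy to reproduce with standard tooling. A minimal scikit-learn sketch on synthetic labels (the threshold, class ratio, and score distribution are assumptions, not study data):

```python
import numpy as np
from sklearn.metrics import fbeta_score, roc_auc_score
from sklearn.utils import resample

# Synthetic stand-ins for PR outcomes (1 = single-procedure success, 0 = failure).
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)
y_score = np.clip(0.6 * y_true + rng.normal(0.2, 0.25, size=200), 0, 1)
y_pred = (y_score >= 0.5).astype(int)

print("F2   :", fbeta_score(y_true, y_pred, beta=2))  # weights recall over precision
print("AUROC:", roc_auc_score(y_true, y_score))       # can look fine even when recall is poor

# 1:1 rebalancing by oversampling the minority class before training.
idx_fail, idx_success = np.where(y_true == 0)[0], np.where(y_true == 1)[0]
minority, majority = (idx_fail, idx_success) if len(idx_fail) < len(idx_success) else (idx_success, idx_fail)
upsampled = resample(minority, replace=True, n_samples=len(majority), random_state=0)
balanced_idx = np.concatenate([majority, upsampled])
```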
Affiliation(s)
- Arina Nisanova: School of Medicine, University of California Davis, Davis, California
- Arefeh Yavary: Department of Computer Science, University of California Davis, Davis, California
- Jordan Deaner: Mid Atlantic Retina, Wills Eye Hospital, Philadelphia, Pennsylvania
- Richard Kaplan: New York Eye and Ear Infirmary of Mount Sinai, New York, New York
- Eric Nudleman: Shiley Eye Center, University of California San Diego, La Jolla, California
- Meenakashi Gupta: New York Eye and Ear Infirmary of Mount Sinai, New York, New York
- Jeremy Wolfe: Associated Retinal Consultants, Royal Oak, Michigan
- Michael Klufas: Wills Eye Hospital, Thomas Jefferson University, Philadelphia, Pennsylvania
- Glenn Yiu: Tschannen Eye Institute, University of California Davis, Sacramento, California
- Iman Soltani: Department of Mechanical and Aerospace Engineering, University of California Davis, Davis, California
- Parisa Emami-Naeini: Tschannen Eye Institute, University of California Davis, Sacramento, California
2
Schmidt K, Bearce B, Chang K, Coombs L, Farahani K, Elbatel M, Mouheb K, Marti R, Zhang R, Zhang Y, Wang Y, Hu Y, Ying H, Xu Y, Testagrose C, Demirer M, Gupta V, Akünal Ü, Bujotzek M, Maier-Hein KH, Qin Y, Li X, Kalpathy-Cramer J, Roth HR. Fair evaluation of federated learning algorithms for automated breast density classification: The results of the 2022 ACR-NCI-NVIDIA federated learning challenge. Med Image Anal 2024; 95:103206. [PMID: 38776844 DOI: 10.1016/j.media.2024.103206]
Abstract
The correct interpretation of breast density is important in the assessment of breast cancer risk. AI has been shown capable of accurately predicting breast density; however, because imaging characteristics differ across mammography systems, models built using data from one system do not generalize well to others. Though federated learning (FL) has emerged as a way to improve the generalizability of AI without the need to share data, the best way to preserve features from all training data during FL is an active area of research. To explore FL methodology, the breast density classification FL challenge was hosted in partnership with the American College of Radiology, Harvard Medical School's Mass General Brigham, the University of Colorado, NVIDIA, and the National Institutes of Health National Cancer Institute. Challenge participants submitted Docker containers capable of implementing FL on three simulated medical facilities, each containing a unique large mammography dataset. The breast density FL challenge ran from June 15 to September 5, 2022, attracting seven finalists from around the world. The winning FL submission reached a linearly weighted kappa score of 0.653 on the challenge test data and 0.413 on an external testing dataset, scoring comparably to a model trained on the same data in a central location.
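The challenge metric, a linearly weighted kappa over the four BI-RADS density grades, can be computed directly with scikit-learn; the grades below are illustrative, not challenge data:

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical BI-RADS density grades (0 = A ... 3 = D) from a model and a reference reader.
reference = [0, 1, 2, 3, 2, 1, 0, 2, 3, 1]
predicted = [0, 1, 1, 3, 2, 2, 0, 2, 3, 0]

# 'linear' weights penalize a two-grade disagreement twice as much as a one-grade one.
print(cohen_kappa_score(reference, predicted, weights="linear"))
```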
Affiliation(s)
- Benjamin Bearce: The Massachusetts General Hospital, USA; University of Colorado, USA
- Ken Chang: The Massachusetts General Hospital, USA
- Keyvan Farahani: National Institutes of Health National Cancer Institute, USA
- Marawan Elbatel: Computer Vision and Robotics Institute, University of Girona, Spain
- Kaouther Mouheb: Computer Vision and Robotics Institute, University of Girona, Spain
- Robert Marti: Computer Vision and Robotics Institute, University of Girona, Spain
- Ruipeng Zhang: Cooperative Medianet Innovation Center, Shanghai Jiao Tong University, China; Shanghai AI Laboratory, China
- Yanfeng Wang: Cooperative Medianet Innovation Center, Shanghai Jiao Tong University, China; Shanghai AI Laboratory, China
- Yaojun Hu: Real Doctor AI Research Centre, Zhejiang University, China
- Haochao Ying: Real Doctor AI Research Centre, Zhejiang University, China; School of Public Health, Zhejiang University, China
- Yuyang Xu: Real Doctor AI Research Centre, Zhejiang University, China; College of Computer Science and Technology, Zhejiang University, China
- Ünal Akünal: Division of Medical Image Computing, German Cancer Research Center, Heidelberg, Germany
- Markus Bujotzek: Division of Medical Image Computing, German Cancer Research Center, Heidelberg, Germany
- Klaus H Maier-Hein: Division of Medical Image Computing, German Cancer Research Center, Heidelberg, Germany
- Yi Qin: Electronic and Computer Engineering, Hong Kong University of Science and Technology, China
- Xiaomeng Li: Electronic and Computer Engineering, Hong Kong University of Science and Technology, China
3
Sim SY, Hwang J, Ryu J, Kim H, Kim EJ, Lee JY. Differential Diagnosis of OKC and SBC on Panoramic Radiographs: Leveraging Deep Learning Algorithms. Diagnostics (Basel) 2024; 14:1144. [PMID: 38893670 PMCID: PMC11172000 DOI: 10.3390/diagnostics14111144]
Abstract
This study aims to determine whether a deep learning algorithm can distinguish odontogenic keratocyst (OKC) from simple bone cyst (SBC) based solely on preoperative panoramic radiographs. Methods: We conducted a retrospective analysis of patient data from January 2018 to December 2022 at Pusan National University Dental Hospital. This study included 63 cases of OKC confirmed by histological examination after surgical excision and 125 cases of SBC that underwent surgical curettage. All panoramic radiographs were obtained using the Proline XC system (Planmeca Co., Helsinki, Finland) and carried established diagnoses. The panoramic images were cropped to 299 × 299 pixels and divided into 80% training and 20% validation sets for 5-fold cross-validation. The Inception-ResNet-V2 architecture was adopted to train the network to discriminate OKC from SBC. Results: The classification network achieved 0.829 accuracy, 0.800 precision, 0.615 recall, and a 0.695 F1 score. Conclusions: The deep learning algorithm demonstrated notable accuracy in distinguishing OKC from SBC, with class activation map (CAM) visualization aiding interpretability. This progress is expected to become an essential resource for clinicians, improving diagnostic and treatment outcomes.
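A fine-tuning setup along these lines can be sketched with the Keras InceptionResNetV2 implementation, whose native input size matches the 299 × 299 crops; the head layers, dropout rate, and optimizer are assumptions, not the authors' configuration:

```python
import tensorflow as tf

# ImageNet-pretrained backbone; 299 x 299 matches the cropped panoramic ROIs.
base = tf.keras.applications.InceptionResNetV2(
    include_top=False, weights="imagenet", input_shape=(299, 299, 3))
base.trainable = False  # train the new head first, then optionally unfreeze

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.3),                    # assumed regularization
    tf.keras.layers.Dense(1, activation="sigmoid"),  # OKC vs. SBC
])
model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=["accuracy", tf.keras.metrics.Precision(), tf.keras.metrics.Recall()],
)
```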
Affiliation(s)
- Su-Yi Sim: Department of Oral and Maxillofacial Surgery, Dental and Life Science Institute & Dental Research Institute, School of Dentistry, Pusan National University, Yangsan 50612, Republic of Korea
- JaeJoon Hwang: Department of Oral and Maxillofacial Radiology, Dental and Life Science Institute & Dental Research Institute, School of Dentistry, Pusan National University, Yangsan 50612, Republic of Korea
- Jihye Ryu: Department of Oral and Maxillofacial Surgery, Dental and Life Science Institute & Dental Research Institute, School of Dentistry, Pusan National University, Yangsan 50612, Republic of Korea
- Hyeonjin Kim: Department of Oral and Maxillofacial Surgery, Dental and Life Science Institute & Dental Research Institute, School of Dentistry, Pusan National University, Yangsan 50612, Republic of Korea
- Eun-Jung Kim: Department of Dental Anesthesia and Pain Medicine, School of Dentistry, Pusan National University, Yangsan 50612, Republic of Korea
- Jae-Yeol Lee: Department of Oral and Maxillofacial Surgery, Dental and Life Science Institute & Dental Research Institute, School of Dentistry, Pusan National University, Yangsan 50612, Republic of Korea
4
Egemen D, Perkins RB, Cheung LC, Befano B, Rodriguez AC, Desai K, Lemay A, Ahmed SR, Antani S, Jeronimo J, Wentzensen N, Kalpathy-Cramer J, De Sanjose S, Schiffman M. Artificial intelligence-based image analysis in clinical testing: lessons from cervical cancer screening. J Natl Cancer Inst 2024; 116:26-33. [PMID: 37758250 PMCID: PMC10777665 DOI: 10.1093/jnci/djad202]
Abstract
Novel screening and diagnostic tests based on artificial intelligence (AI) image recognition algorithms are proliferating. Some initial reports claim outstanding accuracy followed by disappointing lack of confirmation, including our own early work on cervical screening. This is a presentation of lessons learned, organized as a conceptual step-by-step approach to bridge the gap between the creation of an AI algorithm and clinical efficacy. The first fundamental principle is specifying rigorously what the algorithm is designed to identify and what the test is intended to measure (eg, screening, diagnostic, or prognostic). The second is designing the AI algorithm to minimize the most clinically important errors. For example, many equivocal cervical images cannot yet be labeled because the borderline between cases and controls is blurred. To avoid a misclassified case-control dichotomy, we have isolated the equivocal cases and formally included an intermediate, indeterminate class (severity order of classes: case > indeterminate > control). The third principle is evaluating AI algorithms like any other test, using clinical epidemiologic criteria. Repeatability of the algorithm at the borderline, for indeterminate images, has proven extremely informative. Distinguishing between internal and external validation is also essential. Linking the AI algorithm results to clinical risk estimation is the fourth principle. Absolute risk (not relative) is the critical metric for translating a test result into clinical use. Finally, generating risk-based guidelines for clinical use that match local resources and priorities is the last principle in our approach. We are particularly interested in applications to lower-resource settings to address health disparities. We note that similar principles apply to other domains of AI-based image analysis for medical diagnostic testing.
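The fourth principle, reporting absolute rather than relative risk, reduces to Bayes' rule: the same test result implies very different absolute risks at different prevalences. A toy calculation with invented operating characteristics:

```python
def absolute_risk(sensitivity, specificity, prevalence):
    """P(disease | positive test) via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# The same algorithm implies very different absolute risks in a screening
# population versus a referral population (numbers are illustrative only).
print(absolute_risk(0.90, 0.90, prevalence=0.01))  # ~0.08
print(absolute_risk(0.90, 0.90, prevalence=0.20))  # ~0.69
```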
Affiliation(s)
- Didem Egemen: Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD, USA
- Rebecca B Perkins: Department of Obstetrics and Gynecology, Boston Medical Center/Boston University School of Medicine, Boston, MA, USA
- Li C Cheung: Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD, USA
- Brian Befano: Information Management Services Inc, Calverton, MD, USA; Department of Epidemiology, School of Public Health, University of Washington, Seattle, WA, USA
- Ana Cecilia Rodriguez: Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD, USA
- Kanan Desai: Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD, USA
- Andreanne Lemay: Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Boston, MA, USA
- Syed Rakin Ahmed: Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Boston, MA, USA; Harvard Graduate Program in Biophysics, Harvard Medical School, Harvard University, Cambridge, MA, USA; Massachusetts Institute of Technology, Cambridge, MA, USA; Geisel School of Medicine at Dartmouth, Dartmouth College, Hanover, NH, USA
- Sameer Antani: National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Jose Jeronimo: Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD, USA
- Nicolas Wentzensen: Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD, USA
- Jayashree Kalpathy-Cramer: Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Boston, MA, USA
- Silvia De Sanjose: Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD, USA; ISGlobal, Barcelona, Spain
- Mark Schiffman: Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD, USA
5
Watanabe AT, Retson T, Wang J, Mantey R, Chim C, Karimabadi H. Mammographic Breast Density Model Using Semi-Supervised Learning Reduces Inter-/Intra-Reader Variability. Diagnostics (Basel) 2023; 13:2694. [PMID: 37627953 PMCID: PMC10453732 DOI: 10.3390/diagnostics13162694]
Abstract
Breast density is an important risk factor for breast cancer development; however, imager inconsistency in density reporting can lead to patient and clinician confusion. A deep learning (DL) model for mammographic density grading was examined in a retrospective multi-reader multi-case study consisting of 928 image pairs and assessed for its impact on inter- and intra-reader variability and reading time. Seven readers assigned density categories to the images, then re-read the test set aided by the model after a 4-week washout. To measure intra-reader agreement, 100 image pairs were blindly double read in both sessions. Linearly weighted Cohen's kappa (κ) and Student's t-test were used to assess model and reader performance. The model achieved a κ of 0.87 (95% CI: 0.84, 0.89) for four-class density assessment and a κ of 0.91 (95% CI: 0.88, 0.93) for binary non-dense/dense assessment. Superiority tests showed a significant reduction in inter-reader variability (κ improved from 0.70 to 0.88, p ≤ 0.001) and intra-reader variability (κ improved from 0.83 to 0.95, p ≤ 0.01) for four-class density, and a significant reduction in inter-reader variability (κ improved from 0.77 to 0.96, p ≤ 0.001) and intra-reader variability (κ improved from 0.89 to 0.97, p ≤ 0.01) for binary non-dense/dense assessment when readers were aided by DL. The average reader mean reading time per image pair also decreased by 30% (0.86 s; 95% CI: 0.01, 1.71), with six of seven readers showing reading-time reductions.
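The reading-time comparison is a paired design (the same seven readers, unaided and then aided), which maps onto a paired t-test; a SciPy sketch with invented per-reader times:

```python
from scipy import stats

# Hypothetical mean reading time (seconds) per image pair for seven readers.
unaided = [3.1, 2.8, 3.5, 2.9, 3.3, 2.7, 3.0]
aided   = [2.2, 2.1, 2.6, 2.3, 2.5, 2.9, 2.4]

# Paired test because each reader is their own control across sessions.
t_stat, p_value = stats.ttest_rel(unaided, aided)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```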
Affiliation(s)
- Alyssa T. Watanabe: Department of Radiology, Keck School of Medicine, University of Southern California, Los Angeles, CA 90007, USA; CureMetrix, Inc., San Diego, CA 92101, USA
- Tara Retson: Department of Radiology, University of California, San Diego, CA 92093, USA
- Junhao Wang: CureMetrix, Inc., San Diego, CA 92101, USA
- Richard Mantey: CureMetrix, Inc., San Diego, CA 92101, USA
- Chiyung Chim: CureMetrix, Inc., San Diego, CA 92101, USA
6
Kai C, Ishizuka S, Otsuka T, Nara M, Kondo S, Futamura H, Kodama N, Kasai S. Automated Estimation of Mammary Gland Content Ratio Using Regression Deep Convolutional Neural Network and the Effectiveness in Clinical Practice as Explainable Artificial Intelligence. Cancers (Basel) 2023; 15:2794. [PMID: 37345132 DOI: 10.3390/cancers15102794]
Abstract
Breast composition types were recently categorized into four classes based on the Breast Imaging Reporting and Data System (BI-RADS) atlas, and evaluating them is vital in clinical practice. A Japanese guideline for breast composition, based on BI-RADS, characterizes the breast using a continuous value, the mammary gland content ratio, allowing a more objective and visual evaluation. Although discriminative deep convolutional neural networks (DCNNs) have conventionally been developed to classify breast composition, they can make errors of two categories or more. Hence, we propose an alternative regression DCNN based on the mammary gland content ratio. We used 1476 images evaluated by an expert physician. Our regression DCNN contained four convolution layers and three fully connected layers. We obtained a high correlation of 0.93 (p < 0.01) between estimated and expert ratios. Furthermore, to scrutinize the effectiveness of the regression DCNN, we categorized breast composition using the estimated ratios. The agreement rate was high at 84.8%, suggesting that breast composition can be determined from the regression DCNN output with high accuracy. Moreover, the proposed method makes errors of two categories or more unlikely, and its estimated results can be understood intuitively.
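A regression network of the stated shape, four convolution layers, three fully connected layers, and one continuous output for the gland content ratio, might look as follows in PyTorch; the channel widths, pooling, and sigmoid output range are assumptions:

```python
import torch
import torch.nn as nn

class GlandRatioRegressor(nn.Module):
    """Sketch of a 4-conv / 3-FC regression CNN; layer sizes are assumptions."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(4),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 4 * 4, 128), nn.ReLU(),
            nn.Linear(128, 32), nn.ReLU(),
            nn.Linear(32, 1), nn.Sigmoid(),  # ratio constrained to [0, 1]
        )

    def forward(self, x):
        return self.head(self.features(x))

model = GlandRatioRegressor()
loss_fn = nn.MSELoss()  # regress the continuous ratio, then threshold into categories
```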
Affiliation(s)
- Chiharu Kai: Department of Radiological Technology, Faculty of Medical Technology, Niigata University of Health and Welfare, Niigata City 950-3198, Niigata, Japan
- Sachi Ishizuka: Department of Radiological Technology, Faculty of Medical Technology, Niigata University of Health and Welfare, Niigata City 950-3198, Niigata, Japan
- Miyako Nara: Department of Breast Surgery, Tokyo Metropolitan Cancer and Infectious Disease Center, Komagome Hospital, Tokyo 113-8677, Japan
- Satoshi Kondo: Graduate School of Engineering, Muroran Institute of Technology, Muroran City 050-8585, Hokkaido, Japan
- Naoki Kodama: Department of Radiological Technology, Faculty of Medical Technology, Niigata University of Health and Welfare, Niigata City 950-3198, Niigata, Japan
- Satoshi Kasai: Department of Radiological Technology, Faculty of Medical Technology, Niigata University of Health and Welfare, Niigata City 950-3198, Niigata, Japan
7
Gupta S, Kumar S, Chang K, Lu C, Singh P, Kalpathy-Cramer J. Collaborative Privacy-preserving Approaches for Distributed Deep Learning Using Multi-Institutional Data. Radiographics 2023; 43:e220107. [PMID: 36862082 PMCID: PMC10091220 DOI: 10.1148/rg.220107]
Abstract
Deep learning (DL) algorithms have shown remarkable potential in automating various tasks in medical imaging and radiologic reporting. However, models trained on small quantities of data or only on data from a single institution often do not generalize to other institutions, which may have different patient demographics or data acquisition characteristics. Training DL algorithms using data from multiple institutions is therefore crucial to improving the robustness and generalizability of clinically useful DL models. In the context of medical data, simply pooling data from each institution into a central location to train a model poses several issues, such as increased risk to patient privacy, increased costs for data storage and transfer, and regulatory challenges. These challenges of centrally hosting data have motivated the development of distributed machine learning techniques and frameworks for collaborative learning that facilitate the training of DL models without the need to explicitly share private medical data. The authors describe several popular methods for collaborative training and review the main considerations for deploying these models. They also highlight publicly available software frameworks for federated learning and showcase several real-world examples of collaborative learning. The authors conclude by discussing some key challenges and future research directions for distributed DL. They aim to introduce clinicians to the benefits, limitations, and risks of using distributed DL for the development of medical artificial intelligence algorithms. ©RSNA, 2023
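The canonical aggregation rule in many of the federated methods reviewed here is federated averaging (FedAvg): each site trains locally, and a server averages the resulting parameters weighted by local dataset size. A minimal NumPy sketch with invented sites and parameter shapes:

```python
import numpy as np

def fed_avg(client_weights, client_sizes):
    """One FedAvg round: average client parameters, weighted by dataset size."""
    total = sum(client_sizes)
    return [
        sum(w[k] * (n / total) for w, n in zip(client_weights, client_sizes))
        for k in range(len(client_weights[0]))
    ]

# Three hypothetical sites, each holding two parameter arrays locally;
# raw patient data never leaves a site, only these parameters do.
site_a = [np.ones((2, 2)), np.zeros(3)]
site_b = [np.full((2, 2), 3.0), np.ones(3)]
site_c = [np.full((2, 2), 2.0), np.ones(3)]
global_params = fed_avg([site_a, site_b, site_c], client_sizes=[100, 300, 200])
```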
Affiliation(s)
- Ken Chang, Charles Lu, Praveer Singh, Jayashree Kalpathy-Cramer: Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Harvard Medical School, 13th Street, Building 149, Room 2301, Charlestown, MA 02129 (S.G., S.K., K.C., C.L., P.S., J.K.C.); Indian Institute of Technology Delhi, New Delhi, India (S.G., S.K.)
8
Cole E, Valikodath NG, Al-Khaled T, Bajimaya S, KC S, Chuluunbat T, Munkhuu B, Jonas KE, Chuluunkhuu C, MacKeen LD, Yap V, Hallak J, Ostmo S, Wu WC, Coyner AS, Singh P, Kalpathy-Cramer J, Chiang MF, Campbell JP, Chan RVP. Evaluation of an Artificial Intelligence System for Retinopathy of Prematurity Screening in Nepal and Mongolia. Ophthalmology Science 2022; 2:100165. [PMID: 36531583 PMCID: PMC9754980 DOI: 10.1016/j.xops.2022.100165]
Abstract
Purpose: To evaluate the performance of a deep learning (DL) algorithm for retinopathy of prematurity (ROP) screening in Nepal and Mongolia.
Design: Retrospective analysis of prospectively collected clinical data.
Participants: Clinical information and fundus images were obtained from infants in 2 ROP screening programs in Nepal and Mongolia.
Methods: Fundus images were obtained using the Forus 3nethra neo (Forus Health) in Nepal and the RetCam Portable (Natus Medical, Inc.) in Mongolia. The overall severity of ROP was determined from the medical record using the International Classification of ROP (ICROP). The presence of plus disease was determined independently in each image using a reference standard diagnosis. The Imaging and Informatics for ROP (i-ROP) DL algorithm was trained on images from the RetCam to classify plus disease and to assign a vascular severity score (VSS) from 1 through 9.
Main Outcome Measures: Area under the receiver operating characteristic curve and area under the precision-recall curve for the presence of plus disease or type 1 ROP, and association between VSS and ICROP disease category.
Results: The prevalence of type 1 ROP was found to be higher in Mongolia (14.0%) than in Nepal (2.2%; P < 0.001) in these data sets. In Mongolia (RetCam images), the area under the receiver operating characteristic curve for examination-level plus disease detection was 0.968, and the area under the precision-recall curve was 0.823. In Nepal (Forus images), these values were 0.999 and 0.993, respectively. The ROP VSS was associated with ICROP classification in both datasets (P < 0.001). At the population level, the median VSS was found to be higher in Mongolia (2.7; interquartile range [IQR], 1.3-5.4) as compared with Nepal (1.9; IQR, 1.2-3.4; P < 0.001).
Conclusions: These data provide preliminary evidence of the effectiveness of the i-ROP DL algorithm for ROP screening in neonatal populations in Nepal and Mongolia using multiple camera systems and are useful for consideration in future clinical implementation of artificial intelligence-based ROP screening in low- and middle-income countries.
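Both reported metrics have direct scikit-learn counterparts; average precision is the usual estimator of the area under the precision-recall curve, which is more informative than AUROC when positives such as plus disease are rare. Labels and scores below are illustrative:

```python
from sklearn.metrics import roc_auc_score, average_precision_score

# Hypothetical per-examination plus-disease labels and model scores.
y_true  = [0, 0, 1, 0, 1, 1, 0, 0, 1, 0]
y_score = [0.1, 0.3, 0.8, 0.2, 0.9, 0.6, 0.4, 0.1, 0.7, 0.5]

print("AUROC:", roc_auc_score(y_true, y_score))
# Average precision summarizes the precision-recall curve and degrades
# visibly as positives become rarer, unlike AUROC.
print("AUPRC:", average_precision_score(y_true, y_score))
```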
Key Words
- Artificial intelligence
- BW, birth weight
- DL, deep learning
- Deep learning
- GA, gestational age
- ICROP, International Classification of Retinopathy of Prematurity
- IQR, interquartile range
- LMIC, low- and middle-income country
- Mongolia
- Nepal
- ROP, retinopathy of prematurity
- RSD, reference standard diagnosis
- Retinopathy of prematurity
- TR, treatment-requiring
- VSS, vascular severity score
- i-ROP, Imaging and Informatics for Retinopathy of Prematurity
Affiliation(s)
- Emily Cole: Department of Ophthalmology and Visual Sciences, Illinois Eye and Ear Infirmary, University of Illinois Chicago, Chicago, Illinois
- Nita G. Valikodath: Department of Ophthalmology and Visual Sciences, Illinois Eye and Ear Infirmary, University of Illinois Chicago, Chicago, Illinois
- Tala Al-Khaled: Department of Ophthalmology and Visual Sciences, Illinois Eye and Ear Infirmary, University of Illinois Chicago, Chicago, Illinois
- Sagun KC: Helen Keller International, Kathmandu, Nepal
- Bayalag Munkhuu: National Center for Maternal and Child Health, Ulaanbaatar, Mongolia
- Karyn E. Jonas: Department of Ophthalmology and Visual Sciences, Illinois Eye and Ear Infirmary, University of Illinois Chicago, Chicago, Illinois
- Leslie D. MacKeen: The Hospital for Sick Children, Toronto, Canada; Phoenix Technology Group, Pleasanton, California
- Vivien Yap: Department of Pediatrics, Weill Cornell Medical College, New York, New York
- Joelle Hallak: Department of Ophthalmology and Visual Sciences, Illinois Eye and Ear Infirmary, University of Illinois Chicago, Chicago, Illinois
- Susan Ostmo: Department of Ophthalmology, Casey Eye Institute, Oregon Health & Science University, Portland, Oregon
- Wei-Chi Wu: Chang Gung Memorial Hospital, Taoyuan, Taiwan, and Chang Gung University, College of Medicine, Taoyuan, Taiwan
- Aaron S. Coyner: Department of Ophthalmology, Casey Eye Institute, Oregon Health & Science University, Portland, Oregon
- Michael F. Chiang: National Eye Institute, National Institutes of Health, Bethesda, Maryland
- J. Peter Campbell: Department of Ophthalmology, Casey Eye Institute, Oregon Health & Science University, Portland, Oregon
- R. V. Paul Chan: Department of Ophthalmology and Visual Sciences, Illinois Eye and Ear Infirmary, University of Illinois Chicago, Chicago, Illinois. Correspondence: R. V. Paul Chan, MD, MSc, MBA, Department of Ophthalmology and Visual Sciences, Illinois Eye and Ear Infirmary, University of Illinois at Chicago, 1905 West Taylor Street, Chicago, IL 60612.
9
Improving the repeatability of deep learning models with Monte Carlo dropout. NPJ Digit Med 2022; 5:174. [PMID: 36400939 PMCID: PMC9674698 DOI: 10.1038/s41746-022-00709-3]
Abstract
The integration of artificial intelligence into clinical workflows requires reliable and robust models. Repeatability is a key attribute of model robustness. Ideally, repeatable models output predictions without variation during independent tests carried out under similar conditions. However, slight variations, though not ideal, may be unavoidable and acceptable in practice. During model development and evaluation, much attention is given to classification performance, while model repeatability is rarely assessed, leading to the development of models that are unusable in clinical practice. In this work, we evaluate the repeatability of four model types (binary classification, multi-class classification, ordinal classification, and regression) on images that were acquired from the same patient during the same visit. We study each model's performance on four medical image classification tasks from public and private datasets: knee osteoarthritis, cervical cancer screening, breast density estimation, and retinopathy of prematurity. Repeatability is measured and compared on ResNet and DenseNet architectures. Moreover, we assess the impact of sampling Monte Carlo dropout predictions at test time on classification performance and repeatability. Leveraging Monte Carlo predictions significantly increases repeatability, in particular at the class boundaries, for all tasks on the binary, multi-class, and ordinal models, leading to an average reduction of the 95% limits of agreement by 16 percentage points and of the class disagreement rate by 7 percentage points. The classification accuracy improves in most settings along with the repeatability. Our results suggest that beyond about 20 Monte Carlo iterations, there is no further gain in repeatability. In addition to the higher test-retest agreement, Monte Carlo predictions are better calibrated, which leads to output probabilities reflecting more accurately the true likelihood of being correctly classified.
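A test-time Monte Carlo dropout loop of the kind evaluated here takes a few lines in PyTorch: keep dropout layers stochastic at inference and average the sampled outputs. Only the sampling pattern follows the paper; the toy model is an assumption:

```python
import torch
import torch.nn as nn

def mc_dropout_predict(model, x, n_iter=20):
    """Average softmax outputs over n_iter stochastic forward passes.

    Dropout layers are forced into training mode so they stay active at
    test time; ~20 iterations matched the point of diminishing returns
    reported above.
    """
    model.eval()
    for m in model.modules():
        if isinstance(m, nn.Dropout):
            m.train()
    with torch.no_grad():
        probs = torch.stack([model(x).softmax(dim=1) for _ in range(n_iter)])
    return probs.mean(dim=0)

# Example with a toy classifier containing dropout.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Dropout(0.5), nn.Linear(16, 3))
x = torch.randn(4, 8)
mean_probs = mc_dropout_predict(model, x)
```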
10
Bahl M. Artificial Intelligence in Clinical Practice: Implementation Considerations and Barriers. Journal of Breast Imaging 2022; 4:632-639. [PMID: 36530476 PMCID: PMC9741727 DOI: 10.1093/jbi/wbac065]
Abstract
The rapid growth of artificial intelligence (AI) in radiology has led to Food and Drug Administration clearance of more than 20 AI algorithms for breast imaging. The steps involved in the clinical implementation of an AI product include identifying all stakeholders, selecting the appropriate product to purchase, evaluating it with a local data set, integrating it into the workflow, and monitoring its performance over time. Despite the potential benefits of improved quality and increased efficiency with AI, several barriers, such as high costs and liability concerns, may limit its widespread implementation. This article lists currently available AI products for breast imaging, describes the key elements of clinical implementation, and discusses barriers to clinical implementation.
Affiliation(s)
- Manisha Bahl: Massachusetts General Hospital, Department of Radiology, Boston, MA, USA
11
Brink L, Coombs LP, Kattil Veettil D, Kuchipudi K, Marella S, Schmidt K, Nair SS, Tilkin M, Treml C, Chang K, Kalpathy-Cramer J. ACR's Connect and AI-LAB technical framework. JAMIA Open 2022; 5:ooac094. [PMID: 36380846 PMCID: PMC9651971 DOI: 10.1093/jamiaopen/ooac094]
Abstract
Objective: To develop a free, vendor-neutral software suite, the American College of Radiology (ACR) Connect, which serves as a platform for democratizing artificial intelligence (AI) for all individuals and institutions.
Materials and Methods: Among its core capabilities, ACR Connect provides educational resources; tools for dataset annotation; model building and evaluation; and an interface for collaboration and federated learning across institutions without the need to move data off hospital premises.
Results: The AI-LAB application within ACR Connect allows users to investigate AI models using their own local data while maintaining data security. The software enables non-technical users to participate in the evaluation and training of AI models as part of a larger, collaborative network.
Discussion: Advancements in AI have transformed automated quantitative analysis for medical imaging. Despite the significant progress in research, AI is currently underutilized in clinical workflows. The success of AI model development depends critically on the synergy between physicians who can drive clinical direction, data scientists who can design effective algorithms, and the availability of high-quality datasets. ACR Connect and AI-LAB provide a way to perform external validation as well as collaborative, distributed training.
Conclusion: In order to create a collaborative AI ecosystem across clinical and technical domains, the ACR developed a platform that enables non-technical users to participate in education and model development.
Affiliation(s)
- Laura Brink, Laura P Coombs, Deepak Kattil Veettil, Kashyap Kuchipudi, Sailaja Marella, Kendall Schmidt, Sujith Surendran Nair, Michael Tilkin, Christopher Treml: Department of Information Technology, American College of Radiology, Reston, Virginia, USA
- Ken Chang: Department of Radiology, Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Boston, Massachusetts, USA
- Jayashree Kalpathy-Cramer: Department of Radiology, Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Boston, Massachusetts, USA; Department of Ophthalmology, University of Colorado School of Medicine, Aurora, Colorado, USA
12
Classifying Breast Density from Mammogram with Pretrained CNNs and Weighted Average Ensembles. Applied Sciences (Basel) 2022. [DOI: 10.3390/app12115599]
Abstract
We are currently experiencing a revolution in data production and artificial intelligence (AI) applications. Data are produced much faster than they can be consumed, so there is an urgent need to develop AI algorithms for all aspects of modern life, and the medical field is fertile ground for applying AI techniques. Breast cancer is one of the most common cancers and a leading cause of death around the world. Early detection is critical to treating the disease effectively. Breast density plays a significant role in determining the likelihood and risk of breast cancer; it describes the amount of fibrous and glandular tissue compared with the amount of fatty tissue in the breast. Breast density is categorized using the American College of Radiology (ACR) BI-RADS system, which assigns breast density to one of four classes: in class A, breasts are almost entirely fatty; in class B, scattered areas of fibroglandular density appear in the breasts; in class C, the breasts are heterogeneously dense; and in class D, the breasts are extremely dense. This paper applies pre-trained Convolutional Neural Networks (CNNs) to a local mammogram dataset to classify breast density. Several transfer learning models were tested on a dataset consisting of more than 800 mammogram screenings from King Abdulaziz Medical City (KAMC). Inception V3, EfficientNetV2-B0, and Xception gave the highest accuracy for both four- and two-class classification. To enhance the accuracy of density classification, we applied weighted average ensembles, and performance visibly improved. The overall accuracy of ACR classification with weighted average ensembles was 78.11%.
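Weighted averaging of class probabilities from several fine-tuned backbones is a one-liner once the per-model softmax outputs are available; the outputs and weights below are invented, not the paper's tuned values:

```python
import numpy as np

def weighted_ensemble(prob_list, weights):
    """Weighted average of per-model class-probability arrays (n_samples x n_classes)."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return sum(w * p for w, p in zip(weights, prob_list))

# Hypothetical softmax outputs from three fine-tuned backbones for one mammogram.
inception = np.array([[0.10, 0.20, 0.50, 0.20]])
efficient = np.array([[0.05, 0.15, 0.60, 0.20]])
xception  = np.array([[0.10, 0.10, 0.55, 0.25]])

probs = weighted_ensemble([inception, efficient, xception], weights=[0.4, 0.3, 0.3])
density_class = "ABCD"[int(probs.argmax())]  # BI-RADS class A-D
```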
13
Bhowmik A, Eskreis-Winkler S. Deep learning in breast imaging. BJR Open 2022; 4:20210060. [PMID: 36105427 PMCID: PMC9459862 DOI: 10.1259/bjro.20210060]
Abstract
Millions of breast imaging exams are performed each year in an effort to reduce the morbidity and mortality of breast cancer. Breast imaging exams are performed for cancer screening, diagnostic work-up of suspicious findings, evaluating extent of disease in recently diagnosed breast cancer patients, and determining treatment response. Yet, the interpretation of breast imaging can be subjective, tedious, time-consuming, and prone to human error. Retrospective and small reader studies suggest that deep learning (DL) has great potential to perform medical imaging tasks at or above human-level performance, and may be used to automate aspects of the breast cancer screening process, improve cancer detection rates, decrease unnecessary callbacks and biopsies, optimize patient risk assessment, and open up new possibilities for disease prognostication. Prospective trials are urgently needed to validate these proposed tools, paving the way for real-world clinical use. New regulatory frameworks must also be developed to address the unique ethical, medicolegal, and quality control issues that DL algorithms present. In this article, we review the basics of DL, describe recent DL breast imaging applications including cancer detection and risk prediction, and discuss the challenges and future directions of artificial intelligence-based systems in the field of breast cancer.
Affiliation(s)
- Arka Bhowmik: Department of Radiology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, United States
- Sarah Eskreis-Winkler: Department of Radiology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, United States
14
Lu C, Hanif A, Singh P, Chang K, Coyner AS, Brown JM, Ostmo S, Chan RP, Rubin D, Chiang MF, Campbell JP, Kalpathy-Cramer J. Federated learning for multi-center collaboration in ophthalmology: improving classification performance in retinopathy of prematurity. Ophthalmol Retina 2022; 6:657-663. [PMID: 35296449 DOI: 10.1016/j.oret.2022.02.015]
Abstract
Objective: To compare the performance of deep learning (DL) classifiers for the diagnosis of plus disease in retinopathy of prematurity (ROP) trained using two methods of developing models on multi-institutional datasets: centralizing data versus federated learning (FL), in which no data leave each institution.
Design: Evaluation of a diagnostic test or technology.
Subjects, Participants, and/or Controls: DL models were trained, validated, and tested on 5,255 wide-angle retinal images from the neonatal intensive care units of 7 institutions as part of the Imaging and Informatics in ROP (i-ROP) study. All images were labeled for the presence of plus, pre-plus, or no plus disease with a clinical label and a reference standard diagnosis (RSD) determined by three image-based ROP graders and the clinical diagnosis.
Methods, Intervention, or Testing: We compared the area under the receiver operating characteristic curve (AUROC) for models developed on multi-institutional data using a central approach and then FL, and compared locally trained models with both approaches. We correlated model performance (kappa) with label agreement (between clinical and RSD labels), dataset size, and number of plus disease cases in each training cohort using Spearman's correlation coefficient (CC).
Main Outcome Measures: Model performance using AUROC and linearly weighted kappa.
Results: FL and centrally trained models were compared in four settings: FL trained on RSD versus central trained on RSD, FL on clinical labels versus central on clinical labels, FL on RSD versus central on clinical labels, and FL on clinical labels versus central on RSD (p = 0.046, p = 0.126, p = 0.224, and p = 0.0173, respectively). Models trained on local institutional data performed inferiorly to the FL models in 4 of 7 cases (57%). Local model performance was positively correlated with label agreement between clinical and RSD labels (CC = 0.389, p = 0.387), total number of plus cases (CC = 0.759, p = 0.047), and overall training set size (CC = 0.924, p = 0.002).
Conclusions: We show that an FL-trained model performs comparably to a centralized model, confirming that FL may provide an effective, more feasible solution for inter-institutional learning. Smaller institutions benefit more from collaboration than larger institutions, showing the potential of FL for addressing disparities in resource access.
Affiliation(s)
- Charles Lu: Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Charlestown, Massachusetts; Center for Clinical Data Science, Massachusetts General Hospital and Brigham and Women's Hospital, Boston, Massachusetts
- Adam Hanif: Department of Ophthalmology, Oregon Health & Science University, Portland, OR
- Praveer Singh: Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Charlestown, Massachusetts; Center for Clinical Data Science, Massachusetts General Hospital and Brigham and Women's Hospital, Boston, Massachusetts
- Ken Chang: Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Charlestown, Massachusetts; Center for Clinical Data Science, Massachusetts General Hospital and Brigham and Women's Hospital, Boston, Massachusetts
- Aaron S Coyner: Department of Ophthalmology, Oregon Health & Science University, Portland, OR
- James M Brown: School of Computer Science, University of Lincoln, Lincoln, UK
- Susan Ostmo: Department of Ophthalmology, Oregon Health & Science University, Portland, OR
- R. V. Paul Chan: Ophthalmology and Visual Sciences, University of Illinois at Chicago, Chicago, IL
- Daniel Rubin: Center for Biomedical Informatics Research, Stanford University School of Medicine, Stanford, California
- Michael F Chiang: National Eye Institute, National Institutes of Health, Bethesda, MD
- J Peter Campbell: Department of Ophthalmology, Oregon Health & Science University, Portland, OR
- Jayashree Kalpathy-Cramer: Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Charlestown, Massachusetts; Center for Clinical Data Science, Massachusetts General Hospital and Brigham and Women's Hospital, Boston, Massachusetts
15
Gastounioti A, Desai S, Ahluwalia VS, Conant EF, Kontos D. Artificial intelligence in mammographic phenotyping of breast cancer risk: a narrative review. Breast Cancer Res 2022; 24:14. [PMID: 35184757 PMCID: PMC8859891 DOI: 10.1186/s13058-022-01509-z]
Abstract
Background: Improved breast cancer risk assessment models are needed to enable personalized screening strategies that achieve a better harm-to-benefit ratio, based on earlier detection and better breast cancer outcomes, than existing screening guidelines. Computational mammographic phenotypes have demonstrated a promising role in breast cancer risk prediction. With the recent exponential growth of computational efficiency, the artificial intelligence (AI) revolution, driven by the introduction of deep learning, has expanded the utility of imaging in predictive models. Consequently, AI-based imaging-derived data have led to some of the most promising tools for precision breast cancer screening.
Main Body: This review aims to synthesize the current state-of-the-art applications of AI in mammographic phenotyping of breast cancer risk. We discuss the fundamentals of AI and explore the computing advancements that have made AI-based image analysis essential in refining breast cancer risk assessment. Specifically, we discuss the use of data derived from digital mammography as well as digital breast tomosynthesis. Different aspects of breast cancer risk assessment are targeted, including (a) robust and reproducible evaluations of breast density, a well-established breast cancer risk factor; (b) assessment of a woman's inherent breast cancer risk; and (c) identification of women who are likely to be diagnosed with breast cancers after a negative or routine screen due to masking or the rapid and aggressive growth of a tumor. Lastly, we discuss AI challenges unique to the computational analysis of mammographic imaging as well as future directions for this promising research field.
Conclusions: We provide a useful reference for AI researchers investigating image-based breast cancer risk assessment while indicating key priorities and challenges that, if properly addressed, could accelerate the implementation of AI-assisted risk stratification to further refine and individualize breast cancer screening strategies.
Affiliation(s)
- Aimilia Gastounioti: Department of Radiology, Center for Biomedical Image Computing and Analytics (CBICA), University of Pennsylvania, Philadelphia, PA, 19104, USA; Mallinckrodt Institute of Radiology, Washington University School of Medicine, St. Louis, MO, 63110, USA
- Shyam Desai: Department of Radiology, Center for Biomedical Image Computing and Analytics (CBICA), University of Pennsylvania, Philadelphia, PA, 19104, USA
- Vinayak S Ahluwalia: Department of Radiology, Center for Biomedical Image Computing and Analytics (CBICA), University of Pennsylvania, Philadelphia, PA, 19104, USA; Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Emily F Conant: Department of Radiology, Hospital of the University of Pennsylvania, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Despina Kontos: Department of Radiology, Center for Biomedical Image Computing and Analytics (CBICA), University of Pennsylvania, Philadelphia, PA, 19104, USA
16
Lee SE, Son NH, Kim MH, Kim EK. Mammographic Density Assessment by Artificial Intelligence-Based Computer-Assisted Diagnosis: A Comparison with Automated Volumetric Assessment. J Digit Imaging 2022; 35:173-179. [PMID: 35015180 PMCID: PMC8921363 DOI: 10.1007/s10278-021-00555-x]
Abstract
We evaluated the mammographic density assessments of an artificial intelligence-based computer-assisted diagnosis (AI-CAD) program by comparing its inter-rater agreement with radiologists against that of an automated volumetric density assessment program. Between March and May 2020, 488 consecutive mammograms of 488 patients (56.2 ± 10.9 years) were collected from a single institution. We assigned four classes of mammographic density based on BI-RADS (Breast Imaging Reporting and Data System) using a commercial AI-CAD program (Lunit INSIGHT MMG), and compared inter-rater agreement between radiologists, the AI-CAD program, and another commercial automated density assessment program (Volpara®). The inter-rater agreement between AI-CAD and the reader consensus was 0.52, with a matched rate of 68.2% (333/488). The inter-rater agreement between Volpara® and the reader consensus was similar at 0.50, with a matched rate of 62.7% (306/488). The inter-rater agreement between AI-CAD and Volpara® was 0.54, with a matched rate of 61.5% (300/488). In conclusion, density assessments by AI-CAD showed fair agreement with those of radiologists, similar to the agreement between the commercial automated density assessment program and radiologists.
Affiliation(s)
- Si Eun Lee: Department of Radiology, Research Institute of Radiological Science and Center for Clinical Imaging Data Science, Yongin Severance Hospital, Yonsei University College of Medicine, Yongin, Gyeonggi-do, Korea
- Nak-Hoon Son: Division of Biostatistics, Yongin Severance Hospital, Yonsei University College of Medicine, Yongin, Gyeonggi-do, Republic of Korea
- Myung Hyun Kim: Department of Radiology, Research Institute of Radiological Science and Center for Clinical Imaging Data Science, Yongin Severance Hospital, Yonsei University College of Medicine, Yongin, Gyeonggi-do, Korea
- Eun-Kyung Kim: Department of Radiology, Research Institute of Radiological Science and Center for Clinical Imaging Data Science, Yongin Severance Hospital, Yonsei University College of Medicine, Yongin, Gyeonggi-do, Korea
17
Tardy M, Mateus D. Leveraging Multi-Task Learning to Cope With Poor and Missing Labels of Mammograms. Frontiers in Radiology 2022; 1:796078. [PMID: 37492176 PMCID: PMC10365086 DOI: 10.3389/fradi.2021.796078]
Abstract
In breast cancer screening, binary classification of mammograms is a common task aiming to determine whether a case is malignant or benign. A Computer-Aided Diagnosis (CADx) system based on a trainable classifier requires clean data and labels coming from a confirmed diagnosis. Unfortunately, such labels are not easy to obtain in clinical practice, since the histopathological reports of biopsy may not be available alongside mammograms, while normal cases may not have an explicit follow-up confirmation. Such ambiguities result either in reducing the number of samples eligible for training or in a label uncertainty that may decrease performance. In this work, we maximize the number of samples available for training by relying on multi-task learning. We design a deep-neural-network-based classifier yielding multiple outputs in one forward pass. The predicted classes include binary malignancy, cancer probability estimation, breast density, and image laterality. Since few samples have all classes available and confirmed, we propose to introduce the uncertainty related to each class as a per-sample weight during training. Such weighting prevents updating the network's parameters when training on uncertain or missing labels. We evaluate our approach on the public INbreast and private datasets, showing statistically significant improvements compared with baseline and independent state-of-the-art approaches. Moreover, we use mammograms from the Susan G. Komen Tissue Bank for fine-tuning, further demonstrating the ability of our multi-task learning setup to improve performance from raw clinical data. We achieved binary classification performance of AUC = 80.46% on our private dataset and AUC = 85.23% on the INbreast dataset.
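The per-sample weighting idea, zero weight for a missing label and reduced weight for an uncertain one, can be written as a masked multi-task loss; this PyTorch sketch uses two invented binary heads rather than the authors' full four-output setup:

```python
import torch
import torch.nn.functional as F

def masked_multitask_loss(outputs, labels, weights):
    """Sum of per-task BCE losses, weighted per sample.

    weights[task] is 0 for a missing label and can be < 1 for an uncertain
    one, so those samples contribute nothing (or less) to the gradient.
    """
    total = 0.0
    for task in outputs:
        loss = F.binary_cross_entropy_with_logits(
            outputs[task], labels[task], reduction="none")
        total = total + (weights[task] * loss).mean()
    return total

# Two hypothetical heads (malignancy, laterality) over a batch of 3 samples.
outputs = {"malignancy": torch.randn(3), "laterality": torch.randn(3)}
labels  = {"malignancy": torch.tensor([1., 0., 1.]), "laterality": torch.tensor([0., 1., 1.])}
weights = {"malignancy": torch.tensor([1., 0., 0.5]),  # 0 = label missing
           "laterality": torch.tensor([1., 1., 1.])}
loss = masked_multitask_loss(outputs, labels, weights)
```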
Affiliation(s)
- Mickael Tardy: Ecole Centrale de Nantes, LS2N, UMR CNRS 6004, Nantes, France; Hera-MI SAS, Saint-Herblain, France
- Diana Mateus: Ecole Centrale de Nantes, LS2N, UMR CNRS 6004, Nantes, France
18
Kumar I, Kumar A, Kumar VDA, Kannan R, Vimal V, Singh KU, Mahmud M. Dense Tissue Pattern Characterization Using Deep Neural Network. Cognit Comput 2022. [DOI: 10.1007/s12559-021-09970-2]
Abstract
Breast tumors are among the most common diseases affecting women around the world. Classifying the various types of breast tumors contributes to treating them more effectively; however, this classification task is often hindered by the dense tissue patterns captured in mammograms. The present study proposes a dense tissue pattern characterization framework using a deep neural network. A total of 322 mammograms from the mini-MIAS dataset and 4880 mammograms from the DDSM dataset were taken, and an ROI of fixed size 224 × 224 pixels was extracted from each mammogram. Extensive experimentation was executed using different combinations of training and testing sets and different activation functions with the AlexNet and ResNet-18 models. Data augmentation was used to create similar virtual images for proper training of the DL models, and the testing set was then applied to the trained models for validation. During the experiments, four activation functions were used, sigmoid, tanh, ReLU, and Leaky ReLU, and the outcome for each is reported. The ReLU activation function consistently outperformed the others. For each experiment, classification accuracy and the kappa coefficient were computed. The obtained accuracy and kappa value for the MIAS dataset using the ResNet-18 model were 91.3% and 0.803, respectively. For the DDSM dataset, an accuracy of 92.3% and a kappa coefficient of 0.846 were achieved. After combining images from both datasets, the achieved accuracy was 91.9% with a kappa coefficient of 0.839 using the ResNet-18 model. We conclude that the ResNet-18 model with the ReLU activation function yields outstanding performance for this task.
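A transfer-learning setup of the kind described can be sketched with torchvision; the augmentation choices and the activation-swapping helper are assumptions about how such a comparison could be wired, not the authors' code:

```python
import torch.nn as nn
from torchvision import models, transforms

# ImageNet-pretrained ResNet-18 with a two-class dense/fatty head, matching
# the 224 x 224 ROI setup.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 2)

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(10),
    transforms.ToTensor(),
])

# One way to replicate the activation comparison: swap every ReLU in place.
def replace_relu(module, act_cls=nn.LeakyReLU):
    for name, child in module.named_children():
        if isinstance(child, nn.ReLU):
            setattr(module, name, act_cls())
        else:
            replace_relu(child, act_cls)

replace_relu(model)  # e.g., ReLU -> Leaky ReLU variant of the network
```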
Collapse
|
19
|
Singh NM, Harrod JB, Subramanian S, Robinson M, Chang K, Cetin-Karayumak S, Dalca AV, Eickhoff S, Fox M, Franke L, Golland P, Haehn D, Iglesias JE, O’Donnell LJ, Ou Y, Rathi Y, Siddiqi SH, Sun H, Westover MB, Whitfield-Gabrieli S, Gollub RL. How Machine Learning is Powering Neuroimaging to Improve Brain Health. Neuroinformatics 2022; 20:943-964. [PMID: 35347570 PMCID: PMC9515245 DOI: 10.1007/s12021-022-09572-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 02/07/2022] [Indexed: 12/31/2022]
Abstract
This report presents an overview of how machine learning is rapidly advancing clinical translational imaging in ways that will aid in the early detection, prediction, and treatment of diseases that threaten brain health. Towards this goal, we are sharing the information presented at a symposium, "Neuroimaging Indicators of Brain Structure and Function - Closing the Gap Between Research and Clinical Application", co-hosted by the McCance Center for Brain Health at Mass General Hospital and the MIT HST Neuroimaging Training Program on February 12, 2021. The symposium focused on the potential for machine learning approaches, applied to increasingly large-scale neuroimaging datasets, to transform healthcare delivery and change the trajectory of brain health by addressing brain care earlier in the lifespan. While not exhaustive, this overview uniquely addresses many of the technical challenges from image formation, to analysis and visualization, to synthesis and incorporation into the clinical workflow. Some of the ethical challenges inherent to this work are also explored, as are some of the regulatory requirements for implementation. We seek to educate, motivate, and inspire graduate students, postdoctoral fellows, and early career investigators to contribute to a future where neuroimaging meaningfully contributes to the maintenance of brain health.
Collapse
Affiliation(s)
- Nalini M. Singh
- Harvard-MIT Health Sciences and Technology, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
| | - Jordan B. Harrod
- Harvard-MIT Health Sciences and Technology, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
| | - Sandya Subramanian
- Harvard-MIT Health Sciences and Technology, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
| | - Mitchell Robinson
- Harvard-MIT Health Sciences and Technology, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
| | - Ken Chang
- Harvard-MIT Health Sciences and Technology, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
| | - Suheyla Cetin-Karayumak
- Department of Psychiatry, Brigham and Women’s Hospital and Harvard Medical School, Boston, 02115 USA
| | | | - Simon Eickhoff
- Institute of Systems Neuroscience, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany; Institute of Neuroscience and Medicine, Brain & Behaviour (INM-7), Research Centre Jülich, Jülich, Germany
| | - Michael Fox
- Center for Brain Circuit Therapeutics, Department of Neurology, Psychiatry, and Radiology, Brigham and Women’s Hospital and Harvard Medical School, 02115 Boston, USA
| | - Loraine Franke
- University of Massachusetts Boston, Boston, MA 02125 USA
| | - Polina Golland
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
| | - Daniel Haehn
- University of Massachusetts Boston, Boston, MA 02125 USA
| | - Juan Eugenio Iglesias
- Centre for Medical Image Computing, University College London, London, UK; Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, 02114 USA; Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
| | - Lauren J. O’Donnell
- Department of Radiology, Brigham and Women’s Hospital and Harvard Medical School, MA 02115 Boston, USA
| | - Yangming Ou
- Department of Radiology, Boston Children’s Hospital, Harvard Medical School, Boston, MA 02115 USA
| | - Yogesh Rathi
- Department of Psychiatry, Brigham and Women’s Hospital and Harvard Medical School, Boston, 02115 USA
| | - Shan H. Siddiqi
- Department of Psychiatry, Brigham and Women’s Hospital and Harvard Medical School, Boston, 02115 USA
| | - Haoqi Sun
- Department of Neurology and McCance Center for Brain Health / Harvard Medical School, Massachusetts General Hospital, Boston, 02114 USA
| | - M. Brandon Westover
- Department of Neurology and McCance Center for Brain Health / Harvard Medical School, Massachusetts General Hospital, Boston, 02114 USA
| | | | - Randy L. Gollub
- Department of Psychiatry and Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Boston, MA 02114 USA
| |
Collapse
|
20
|
Desai KT, Befano B, Xue Z, Kelly H, Campos NG, Egemen D, Gage JC, Rodriguez AC, Sahasrabuddhe V, Levitz D, Pearlman P, Jeronimo J, Antani S, Schiffman M, de Sanjosé S. The development of "automated visual evaluation" for cervical cancer screening: The promise and challenges in adapting deep-learning for clinical testing. Int J Cancer 2021; 150:741-752. [PMID: 34800038 PMCID: PMC8732320 DOI: 10.1002/ijc.33879] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/07/2021] [Revised: 09/24/2021] [Accepted: 10/15/2021] [Indexed: 12/22/2022]
Abstract
There is limited access to effective cervical cancer screening programs in many resource-limited settings, resulting in a continued high cervical cancer burden. Human papillomavirus (HPV) testing is increasingly recognized as the preferable primary screening approach where affordable, owing to its superior long-term reassurance when negative and its adaptability to self-sampling. Visual inspection with acetic acid (VIA) is an inexpensive but subjective and inaccurate method widely used in resource-limited settings, either for primary screening or for triage of HPV-positive individuals. A deep learning (DL)-based automated visual evaluation (AVE) of cervical images has been developed as assistive technology to help improve the accuracy and reproducibility of VIA. However, like any new clinical technology, rigorous evaluation and proof of clinical effectiveness are required before AVE is implemented widely. In the current article, we outline essential clinical and technical considerations involved in building a validated DL-based AVE tool for broad use as a clinical test.
Collapse
Affiliation(s)
- Kanan T Desai
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland, USA
| | - Brian Befano
- Information Management Services Inc., Calverton, Maryland, USA; Department of Epidemiology, University of Washington School of Public Health, Seattle, Washington, USA
| | - Zhiyun Xue
- US National Library of Medicine, Bethesda, Maryland, USA
| | - Helen Kelly
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland, USA
| | - Nicole G Campos
- Center for Health Decision Science, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
| | - Didem Egemen
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland, USA
| | - Julia C Gage
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland, USA
| | - Ana-Cecilia Rodriguez
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland, USA
| | | | - David Levitz
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland, USA
| | - Paul Pearlman
- Center for Global Health, National Cancer Institute, Rockville, Maryland, USA
| | - Jose Jeronimo
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland, USA
| | - Sameer Antani
- US National Library of Medicine, Bethesda, Maryland, USA
| | - Mark Schiffman
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland, USA
| | - Silvia de Sanjosé
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland, USA; ISGlobal, Barcelona, Spain
| |
Collapse
|
21
|
Arun N, Gaw N, Singh P, Chang K, Aggarwal M, Chen B, Hoebel K, Gupta S, Patel J, Gidwani M, Adebayo J, Li MD, Kalpathy-Cramer J. Assessing the Trustworthiness of Saliency Maps for Localizing Abnormalities in Medical Imaging. Radiol Artif Intell 2021; 3:e200267. [PMID: 34870212 PMCID: PMC8637231 DOI: 10.1148/ryai.2021200267] [Citation(s) in RCA: 58] [Impact Index Per Article: 19.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/17/2020] [Revised: 09/13/2021] [Accepted: 09/20/2021] [Indexed: 11/11/2022]
Abstract
PURPOSE To evaluate the trustworthiness of saliency maps for abnormality localization in medical imaging. MATERIALS AND METHODS Using two large publicly available radiology datasets (the Society for Imaging Informatics in Medicine-American College of Radiology Pneumothorax Segmentation dataset and the Radiological Society of North America Pneumonia Detection Challenge dataset), the performance of eight commonly used saliency map techniques was quantified in regard to (a) localization utility (segmentation and detection), (b) sensitivity to model weight randomization, (c) repeatability, and (d) reproducibility. Their performance was compared against baseline methods and localization network architectures, using area under the precision-recall curve (AUPRC) and structural similarity index measure (SSIM) as metrics. RESULTS All eight saliency map techniques failed at least one of the criteria and were inferior in performance compared with localization networks. For pneumothorax segmentation, the AUPRC ranged from 0.024 to 0.224, while a U-Net achieved a significantly superior AUPRC of 0.404 (P < .005). For pneumonia detection, the AUPRC ranged from 0.160 to 0.519, while a RetinaNet achieved a significantly superior AUPRC of 0.596 (P < .005). Five and two saliency methods (of eight) failed the model randomization test on the segmentation and detection datasets, respectively, suggesting that these methods are not sensitive to changes in model parameters. The repeatability and reproducibility of the majority of the saliency methods were worse than those of localization networks for both the segmentation and detection datasets. CONCLUSION The use of saliency maps in the high-risk domain of medical imaging warrants additional scrutiny; we recommend that detection or segmentation models be used if localization is the desired output of the network. Keywords: Technology Assessment, Technical Aspects, Feature Detection, Convolutional Neural Network (CNN). Supplemental material is available for this article. © RSNA, 2021.
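The model weight randomization test mentioned above is a simple sanity check: a saliency map that barely changes when the model's learned weights are destroyed cannot be explaining the model. A minimal sketch follows, assuming a `saliency_fn` callable that returns a 2-D NumPy heatmap; the interface and the randomization scheme are illustrative, not the paper's exact protocol.

```python
import copy
import torch.nn.init as init
from skimage.metrics import structural_similarity as ssim

def randomization_test(model, saliency_fn, image):
    """saliency_fn(model, image) -> 2-D numpy heatmap (assumed interface)."""
    base_map = saliency_fn(model, image)

    randomized = copy.deepcopy(model)
    for p in randomized.parameters():
        init.normal_(p, std=0.02)          # discard all learned weights
    rand_map = saliency_fn(randomized, image)

    # A high SSIM here is a red flag: the map barely depends on the weights.
    return ssim(base_map, rand_map,
                data_range=float(base_map.max() - base_map.min()))
```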
Collapse
Affiliation(s)
- Praveer Singh, Ken Chang, Mehak Aggarwal, Bryan Chen, Katharina Hoebel, Sharut Gupta, Jay Patel, Mishka Gidwani, Julius Adebayo, Matthew D. Li, Jayashree Kalpathy-Cramer
- From the Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Harvard Medical School, 149 13th St, Boston, MA 02129 (N.A., P.S., K.C., M.A., B.C., K.H., S.G., J.P., M.G., M.D.L., J.K.C.); Department of Computer Science, Shiv Nadar University, Greater Noida, India (N.A.); Department of Operational Sciences, Graduate School of Engineering and Management, Air Force Institute of Technology, Wright-Patterson AFB, Dayton, Ohio (N.G.); and Massachusetts Institute of Technology, Cambridge, Mass (K.C., B.C., K.H., J.P., J.A.)
| |
Collapse
|
22
|
Kalpathy-Cramer J, Patel JB, Bridge C, Chang K. Basic Artificial Intelligence Techniques: Evaluation of Artificial Intelligence Performance. Radiol Clin North Am 2021; 59:941-954. [PMID: 34689879 DOI: 10.1016/j.rcl.2021.06.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Affiliation(s)
- Jayashree Kalpathy-Cramer
- Radiology, Martinos Center for Biomedical Imaging, Massachusetts General Hospital, 149 13th Street, Boston, MA 02129, USA.
| | - Jay B Patel
- Radiology, Martinos Center for Biomedical Imaging, Massachusetts General Hospital, 149 13th Street, Boston, MA 02129, USA
| | - Christopher Bridge
- Radiology, Martinos Center for Biomedical Imaging, Massachusetts General Hospital, 149 13th Street, Boston, MA 02129, USA
| | - Ken Chang
- Radiology, Martinos Center for Biomedical Imaging, Massachusetts General Hospital, 149 13th Street, Boston, MA 02129, USA
| |
Collapse
|
23
|
Radiology Implementation Considerations for Artificial Intelligence (AI) Applied to COVID-19, From the AJR Special Series on AI Applications. AJR Am J Roentgenol 2021; 219:15-23. [PMID: 34612681 DOI: 10.2214/ajr.21.26717] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/15/2022]
Abstract
Hundreds of imaging-based artificial intelligence (AI) models have been developed in response to the COVID-19 pandemic. AI systems that incorporate imaging have shown promise in primary detection, severity grading, and prognostication of outcomes in COVID-19, and have enabled integration of imaging with a broad range of additional clinical and epidemiologic data. However, systematic reviews of AI models applied to COVID-19 medical imaging have highlighted problems in the field, including methodologic issues and problems in real-world deployment. Clinical use of such models should be informed by both the promise and potential pitfalls of implementation. How does a practicing radiologist make sense of this complex topic, and what factors should be considered in the implementation of AI tools for imaging of COVID-19? This critical review aims to help the radiologist understand the nuances that impact the clinical deployment of AI for imaging of COVID-19. We review imaging use cases for AI models in COVID-19 (e.g., diagnosis, severity assessment, and prognostication) and explore considerations for AI model development and testing, deployment infrastructure, clinical user interfaces, quality control, and institutional review board and regulatory approvals, with a practical focus on what a radiologist should consider when implementing an AI tool for COVID-19.
Collapse
|
24
|
Allen B, Dreyer K, Stibolt R, Agarwal S, Coombs L, Treml C, Elkholy M, Brink L, Wald C. Evaluation and Real-World Performance Monitoring of Artificial Intelligence Models in Clinical Practice: Try It, Buy It, Check It. J Am Coll Radiol 2021; 18:1489-1496. [PMID: 34599876 DOI: 10.1016/j.jacr.2021.08.022] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/30/2021] [Accepted: 08/02/2021] [Indexed: 01/16/2023]
Abstract
The pace of regulatory clearance of artificial intelligence (AI) algorithms for radiology continues to accelerate, and numerous algorithms are becoming available for use in clinical practice. End users of AI in radiology should be aware that AI algorithms may not work as expected when used beyond the institutions in which they were trained, and model performance may degrade over time. In this article, we discuss why regulatory clearance alone may not be enough to ensure AI will be safe and effective in all radiological practices, and we review strategies and available resources for evaluating AI models before clinical use and for monitoring their performance to ensure efficacy and patient safety.
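The "check it" step can be made concrete with a rolling-window performance monitor. The sketch below, with assumed window size and alert threshold, flags windows whose AUROC drops below the level measured at acceptance testing; it illustrates the idea, not a method from the article.

```python
from sklearn.metrics import roc_auc_score

def check_drift(labels, scores, baseline_auroc, window=500, tolerance=0.05):
    """Scan consecutive windows of recent cases; return (start, auroc)
    pairs for any window falling more than `tolerance` below baseline."""
    alerts = []
    for start in range(0, len(labels) - window + 1, window):
        auroc = roc_auc_score(labels[start:start + window],
                              scores[start:start + window])
        if auroc < baseline_auroc - tolerance:
            alerts.append((start, auroc))
    return alerts
```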
Collapse
Affiliation(s)
- Bibb Allen
- Chief Medical Officer ACR Data Science Institute; and Department of Radiology, Grandview Medical Center, Birmingham, Alabama.
| | - Keith Dreyer
- Chief Science Officer ACR Data Science Institute; and Massachusetts General Hospital, Boston, Massachusetts
| | - Robert Stibolt
- Diagnostic Radiology, Brookwood Baptist Health, Birmingham, Alabama
| | | | | | - Chris Treml
- ACR Data Science Institute, Reston, Virginia
| | | | - Laura Brink
- ACR Data Science Institute, Reston, Virginia
| | | |
Collapse
|
25
|
Chen JS, Coyner AS, Ostmo S, Sonmez K, Bajimaya S, Pradhan E, Valikodath N, Cole ED, Al-Khaled T, Chan RVP, Singh P, Kalpathy-Cramer J, Chiang MF, Campbell JP. Deep Learning for the Diagnosis of Stage in Retinopathy of Prematurity: Accuracy and Generalizability across Populations and Cameras. Ophthalmol Retina 2021; 5:1027-1035. [PMID: 33561545 PMCID: PMC8364291 DOI: 10.1016/j.oret.2020.12.013] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2020] [Revised: 12/02/2020] [Accepted: 12/16/2020] [Indexed: 12/23/2022]
Abstract
PURPOSE Stage is an important feature to identify in retinal images of infants at risk of retinopathy of prematurity (ROP). The purpose of this study was to implement a convolutional neural network (CNN) for binary detection of stages 1, 2, and 3 of ROP and to evaluate its generalizability across different populations and camera systems. DESIGN Diagnostic validation study of a CNN for stage detection. PARTICIPANTS Retinal fundus images obtained from preterm infants during routine ROP screenings. METHODS Two datasets were used: 5943 fundus images obtained by RetCam camera (Natus Medical, Pleasanton, CA) from 9 North American institutions and 5049 images obtained by 3nethra camera (Forus Health Incorporated, Bengaluru, India) from 4 hospitals in Nepal. Images were labeled based on the presence of stage by 1 to 3 expert graders. Three CNN models were trained using 5-fold cross-validation on the North American dataset alone, the Nepali dataset alone, and a combined dataset, and were evaluated on 2 held-out test sets consisting of 708 and 247 images from the Nepali and North American datasets, respectively. MAIN OUTCOME MEASURES Convolutional neural network performance was evaluated using area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC), sensitivity, and specificity. RESULTS Both the North American- and Nepali-trained models demonstrated high performance on a test set from the same population: AUROC of 0.99, AUPRC of 0.98, and sensitivity of 94% for the former; AUROC of 0.97, AUPRC of 0.91, and sensitivity of 73% for the latter. However, performance decreased to an AUROC of 0.96, AUPRC of 0.88, and sensitivity of 52%, and an AUROC of 0.62, AUPRC of 0.36, and sensitivity of 44%, respectively, when each model was evaluated on a test set from the other population. Compared with the models trained on individual datasets, the model trained on the combined dataset achieved improved performance on each respective test set: sensitivity improved from 94% to 98% on the North American test set and from 73% to 82% on the Nepali test set. CONCLUSIONS A CNN can accurately identify the presence of ROP stage in retinal images, but performance depends on the similarity between the training and testing populations. We demonstrated that internal and external performance can be improved by increasing the heterogeneity of the training dataset, in this case by combining images from different populations and cameras.
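The headline comparison, scoring one trained model on both an internal and an external held-out test set with AUROC and AUPRC, reduces to a few lines with scikit-learn. The function and array names below are placeholders for per-image stage probabilities and binary labels, not the study's code.

```python
from sklearn.metrics import roc_auc_score, average_precision_score

def evaluate_generalizability(y_internal, p_internal, y_external, p_external):
    """Report AUROC/AUPRC on an internal and an external test set."""
    results = {}
    for name, y, p in [("internal", y_internal, p_internal),
                       ("external", y_external, p_external)]:
        results[name] = {
            "AUROC": roc_auc_score(y, p),            # threshold-free ranking
            "AUPRC": average_precision_score(y, p),  # robust to class imbalance
        }
    return results
```

A gap between the internal and external rows of this report is exactly the generalizability failure the abstract quantifies.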
Collapse
Affiliation(s)
- Jimmy S Chen
- Department of Ophthalmology, Casey Eye Institute, Oregon Health & Science University, Portland, Oregon
| | - Aaron S Coyner
- Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, Oregon
| | - Susan Ostmo
- Department of Ophthalmology, Casey Eye Institute, Oregon Health & Science University, Portland, Oregon
| | - Kemal Sonmez
- Cancer Early Detection Advanced Research Center, Knight Cancer Institute, Oregon Health & Science University, Portland, Oregon
| | | | - Eli Pradhan
- Tilganga Institute of Ophthalmology, Kathmandu, Nepal
| | - Nita Valikodath
- Department of Ophthalmology and Visual Sciences, Illinois Eye and Ear Infirmary, University of Illinois at Chicago, Chicago, Illinois
| | - Emily D Cole
- Department of Ophthalmology and Visual Sciences, Illinois Eye and Ear Infirmary, University of Illinois at Chicago, Chicago, Illinois
| | - Tala Al-Khaled
- Department of Ophthalmology and Visual Sciences, Illinois Eye and Ear Infirmary, University of Illinois at Chicago, Chicago, Illinois
| | - R V Paul Chan
- Department of Ophthalmology and Visual Sciences, Illinois Eye and Ear Infirmary, University of Illinois at Chicago, Chicago, Illinois
| | - Praveer Singh
- Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Charlestown, Massachusetts; Center for Clinical Data Science, Massachusetts General Hospital and Brigham and Women's Hospital, Boston, Massachusetts
| | - Jayashree Kalpathy-Cramer
- Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Charlestown, Massachusetts; Center for Clinical Data Science, Massachusetts General Hospital and Brigham and Women's Hospital, Boston, Massachusetts
| | - Michael F Chiang
- Department of Ophthalmology, Casey Eye Institute, Oregon Health & Science University, Portland, Oregon; Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, Oregon
| | - J Peter Campbell
- Department of Ophthalmology, Casey Eye Institute, Oregon Health & Science University, Portland, Oregon.
| |
Collapse
|
26
|
Lee A, Kim MS, Han SS, Park P, Lee C, Yun JP. Deep learning neural networks to differentiate Stafne's bone cavity from pathological radiolucent lesions of the mandible in heterogeneous panoramic radiography. PLoS One 2021; 16:e0254997. [PMID: 34283883 PMCID: PMC8291753 DOI: 10.1371/journal.pone.0254997] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/04/2021] [Accepted: 07/07/2021] [Indexed: 11/18/2022] Open
Abstract
This study aimed to develop a high-performance deep learning algorithm to differentiate Stafne’s bone cavity (SBC) from cysts and tumors of the jaw based on images acquired from various panoramic radiographic systems. The dataset included 176 Stafne’s bone cavities and 282 odontogenic cysts and tumors of the mandible (98 dentigerous cysts, 91 odontogenic keratocysts, and 93 ameloblastomas) that required surgical removal. Panoramic radiographs were obtained using three different imaging systems. The trained model showed 99.25% accuracy, 98.08% sensitivity, and 100% specificity for SBC classification, with one misclassified SBC case. When traced back with the Grad-CAM and Guided Grad-CAM methods, the algorithm was confirmed to recognize the typical imaging features of SBC in panoramic radiography regardless of the imaging system. The deep learning model for differentiating SBC from odontogenic cysts and tumors showed high performance on images obtained from multiple panoramic systems. The present algorithm is expected to be a useful tool for clinicians, as it diagnoses SBCs in panoramic radiography and thereby prevents unnecessary examinations for patients. Additionally, it would support clinicians in deciding on further examinations or surgical referrals in cases where even experts are unsure of the diagnosis using panoramic radiography alone.
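Grad-CAM, which the authors use for tracing, weights a convolutional layer's activations by the spatially pooled gradients of the target class score. A minimal PyTorch sketch follows; `model`, `target_layer`, and the class index are assumptions, and production code would typically use a maintained library rather than raw hooks.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, target_layer, image, class_idx):
    """Return a normalized heatmap over target_layer's spatial grid.
    image: CHW tensor; target_layer: a conv module inside model."""
    acts, grads = {}, {}
    h1 = target_layer.register_forward_hook(
        lambda m, i, o: acts.update(a=o))
    h2 = target_layer.register_full_backward_hook(
        lambda m, gi, go: grads.update(g=go[0]))

    logits = model(image.unsqueeze(0))
    logits[0, class_idx].backward()
    h1.remove(); h2.remove()

    weights = grads["g"].mean(dim=(2, 3), keepdim=True)  # pool gradients
    cam = F.relu((weights * acts["a"]).sum(dim=1))       # weighted activations
    return (cam / cam.max().clamp(min=1e-8))[0]          # scale to [0, 1]
```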
Collapse
Affiliation(s)
- Ari Lee
- Department of Oral and Maxillofacial Radiology, Yonsei University College of Dentistry, Seoul, Republic of Korea
| | - Min Su Kim
- Department of Electrical Engineering, Pohang University of Science and Technology, Pohang, Gyeongbuk, Republic of Korea
| | - Sang-Sun Han
- Department of Oral and Maxillofacial Radiology, Yonsei University College of Dentistry, Seoul, Republic of Korea
| | - PooGyeon Park
- Department of Electrical Engineering, Pohang University of Science and Technology, Pohang, Gyeongbuk, Republic of Korea
| | - Chena Lee
- Department of Oral and Maxillofacial Radiology, Yonsei University College of Dentistry, Seoul, Republic of Korea
- * E-mail: (CL); (JPY)
| | - Jong Pil Yun
- Daegyeong Division, Korea Institute of Industrial Technology, Daegu, Republic of Korea
- * E-mail: (CL); (JPY)
| |
Collapse
|
27
|
Chang K, Singh P, Vepakomma P, Poirot MG, Raskar R, Rubin DL, Kalpathy-Cramer J. Privacy-preserving collaborative deep learning methods for multiinstitutional training without sharing patient data. Artif Intell Med 2021. [DOI: 10.1016/b978-0-12-821259-2.00006-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/23/2022]
|