1
Nisanova A, Yavary A, Deaner J, Ali FS, Gogte P, Kaplan R, Chen KC, Nudleman E, Grewal D, Gupta M, Wolfe J, Klufas M, Yiu G, Soltani I, Emami-Naeini P. Performance of Automated Machine Learning in Predicting Outcomes of Pneumatic Retinopexy. Ophthalmology Science 2024; 4:100470. [PMID: 38827487 PMCID: PMC11141253 DOI: 10.1016/j.xops.2024.100470]
Abstract
Purpose: Automated machine learning (AutoML) has emerged as a novel tool for medical professionals who lack coding experience, enabling them to develop predictive models for treatment outcomes. This study evaluated the performance of AutoML tools in developing models that predict the success of pneumatic retinopexy (PR) in the treatment of rhegmatogenous retinal detachment (RRD). These models were then compared with custom models created by machine learning (ML) experts.
Design: Retrospective multicenter study.
Participants: Five hundred thirty-nine consecutive patients with primary RRD who underwent PR by a vitreoretinal fellow at 6 training hospitals between 2002 and 2022.
Methods: We used 2 AutoML platforms: MATLAB Classification Learner and Google Cloud AutoML. Additional models were developed by computer scientists. We included patient demographics and baseline characteristics, including lens and macula status, RRD size, number and location of breaks, presence of vitreous hemorrhage and lattice degeneration, and physicians' experience. The dataset was split into a training set (n = 483) and a test set (n = 56). The training set, with a 2:1 success-to-failure ratio, was used to train the MATLAB models. Because Google Cloud AutoML requires a minimum of 1000 samples, the training set was tripled to create a new set with 1449 datapoints. Additionally, balanced datasets with a 1:1 success-to-failure ratio were created using Python.
Main Outcome Measures: Single-procedure anatomic success rate, as predicted by the ML models. F2 scores and area under the receiver operating characteristic curve (AUROC) were used as the primary metrics to compare models.
Results: The best-performing AutoML model (F2 score: 0.85; AUROC: 0.90; MATLAB) showed performance comparable to the custom model (F2 score: 0.92; AUROC: 0.86) when trained on the balanced datasets. However, training the AutoML model with imbalanced data yielded a misleadingly high AUROC (0.81) despite a low F2 score (0.2) and low sensitivity (0.17).
Conclusions: We demonstrated the feasibility of using AutoML as an accessible tool for medical professionals to develop models from clinical data. Such models can ultimately aid clinical decision-making, contributing to better patient outcomes. However, results can be misleading or unreliable if the tools are used naively; limitations exist, particularly if datasets contain missing variables or are highly imbalanced. Proper model selection and data preprocessing can improve the reliability of AutoML tools.
Financial Disclosures: Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.
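The contrast the authors draw between AUROC and F2 on imbalanced data is easy to reproduce with standard tooling. A minimal scikit-learn sketch on synthetic labels (the threshold, class ratio, and score distribution are assumptions, not study data):

```python
import numpy as np
from sklearn.metrics import fbeta_score, roc_auc_score
from sklearn.utils import resample

# Synthetic stand-ins for PR outcomes (1 = single-procedure success, 0 = failure).
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)
y_score = np.clip(0.6 * y_true + rng.normal(0.2, 0.25, size=200), 0, 1)
y_pred = (y_score >= 0.5).astype(int)

print("F2   :", fbeta_score(y_true, y_pred, beta=2))  # weights recall over precision
print("AUROC:", roc_auc_score(y_true, y_score))       # can look fine even when recall is poor

# 1:1 rebalancing by oversampling the minority class before training.
idx_fail, idx_success = np.where(y_true == 0)[0], np.where(y_true == 1)[0]
minority, majority = (idx_fail, idx_success) if len(idx_fail) < len(idx_success) else (idx_success, idx_fail)
upsampled = resample(minority, replace=True, n_samples=len(majority), random_state=0)
balanced_idx = np.concatenate([majority, upsampled])
```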
Affiliation(s)
- Arina Nisanova: School of Medicine, University of California Davis, Davis, California
- Arefeh Yavary: Department of Computer Science, University of California Davis, Davis, California
- Jordan Deaner: Mid Atlantic Retina, Wills Eye Hospital, Philadelphia, Pennsylvania
- Richard Kaplan: New York Eye and Ear Infirmary of Mount Sinai, New York, New York
- Eric Nudleman: Shiley Eye Center, University of California San Diego, La Jolla, California
- Meenakashi Gupta: New York Eye and Ear Infirmary of Mount Sinai, New York, New York
- Jeremy Wolfe: Associated Retinal Consultants, Royal Oak, Michigan
- Michael Klufas: Wills Eye Hospital, Thomas Jefferson University, Philadelphia, Pennsylvania
- Glenn Yiu: Tschannen Eye Institute, University of California Davis, Sacramento, California
- Iman Soltani: Department of Mechanical and Aerospace Engineering, University of California Davis, Davis, California
- Parisa Emami-Naeini: Tschannen Eye Institute, University of California Davis, Sacramento, California
2
Schmidt K, Bearce B, Chang K, Coombs L, Farahani K, Elbatel M, Mouheb K, Marti R, Zhang R, Zhang Y, Wang Y, Hu Y, Ying H, Xu Y, Testagrose C, Demirer M, Gupta V, Akünal Ü, Bujotzek M, Maier-Hein KH, Qin Y, Li X, Kalpathy-Cramer J, Roth HR. Fair evaluation of federated learning algorithms for automated breast density classification: The results of the 2022 ACR-NCI-NVIDIA federated learning challenge. Med Image Anal 2024; 95:103206. [PMID: 38776844 DOI: 10.1016/j.media.2024.103206]
Abstract
The correct interpretation of breast density is important in the assessment of breast cancer risk. AI has been shown capable of accurately predicting breast density; however, because imaging characteristics differ across mammography systems, models built using data from one system do not generalize well to others. Though federated learning (FL) has emerged as a way to improve the generalizability of AI without the need to share data, the best way to preserve features from all training data during FL is an active area of research. To explore FL methodology, the breast density classification FL challenge was hosted in partnership with the American College of Radiology, Harvard Medical School's Mass General Brigham, the University of Colorado, NVIDIA, and the National Institutes of Health National Cancer Institute. Challenge participants submitted Docker containers capable of implementing FL on three simulated medical facilities, each containing a unique large mammography dataset. The breast density FL challenge ran from June 15 to September 5, 2022, attracting seven finalists from around the world. The winning FL submission reached a linearly weighted kappa score of 0.653 on the challenge test data and 0.413 on an external testing dataset, scoring comparably to a model trained on the same data in a central location.
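The challenge metric, a linearly weighted kappa over the four BI-RADS density grades, can be computed directly with scikit-learn; the grades below are illustrative, not challenge data:

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical BI-RADS density grades (0 = A ... 3 = D) from a model and a reference reader.
reference = [0, 1, 2, 3, 2, 1, 0, 2, 3, 1]
predicted = [0, 1, 1, 3, 2, 2, 0, 2, 3, 0]

# 'linear' weights penalize a two-grade disagreement twice as much as a one-grade one.
print(cohen_kappa_score(reference, predicted, weights="linear"))
```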
Affiliation(s)
- Benjamin Bearce: The Massachusetts General Hospital, USA; University of Colorado, USA
- Ken Chang: The Massachusetts General Hospital, USA
- Keyvan Farahani: National Institutes of Health National Cancer Institute, USA
- Marawan Elbatel: Computer Vision and Robotics Institute, University of Girona, Spain
- Kaouther Mouheb: Computer Vision and Robotics Institute, University of Girona, Spain
- Robert Marti: Computer Vision and Robotics Institute, University of Girona, Spain
- Ruipeng Zhang: Cooperative Medianet Innovation Center, Shanghai Jiao Tong University, China; Shanghai AI Laboratory, China
- Yanfeng Wang: Cooperative Medianet Innovation Center, Shanghai Jiao Tong University, China; Shanghai AI Laboratory, China
- Yaojun Hu: Real Doctor AI Research Centre, Zhejiang University, China
- Haochao Ying: Real Doctor AI Research Centre, Zhejiang University, China; School of Public Health, Zhejiang University, China
- Yuyang Xu: Real Doctor AI Research Centre, Zhejiang University, China; College of Computer Science and Technology, Zhejiang University, China
- Ünal Akünal: Division of Medical Image Computing, German Cancer Research Center, Heidelberg, Germany
- Markus Bujotzek: Division of Medical Image Computing, German Cancer Research Center, Heidelberg, Germany
- Klaus H Maier-Hein: Division of Medical Image Computing, German Cancer Research Center, Heidelberg, Germany
- Yi Qin: Electronic and Computer Engineering, Hong Kong University of Science and Technology, China
- Xiaomeng Li: Electronic and Computer Engineering, Hong Kong University of Science and Technology, China
3
Sim SY, Hwang J, Ryu J, Kim H, Kim EJ, Lee JY. Differential Diagnosis of OKC and SBC on Panoramic Radiographs: Leveraging Deep Learning Algorithms. Diagnostics (Basel) 2024; 14:1144. [PMID: 38893670 PMCID: PMC11172000 DOI: 10.3390/diagnostics14111144]
Abstract
This study aims to determine whether a deep learning algorithm can distinguish odontogenic keratocyst (OKC) from simple bone cyst (SBC) based solely on preoperative panoramic radiographs. Methods: We conducted a retrospective analysis of patient data from January 2018 to December 2022 at Pusan National University Dental Hospital. This study included 63 cases of OKC confirmed by histological examination after surgical excision and 125 cases of SBC that underwent surgical curettage. All panoramic radiographs were obtained using the Proline XC system (Planmeca Co., Helsinki, Finland) and carried established diagnoses. The panoramic images were cropped to 299 × 299 pixels and divided into 80% training and 20% validation sets for 5-fold cross-validation. The Inception-ResNet-V2 architecture was adopted to train the network to discriminate OKC from SBC. Results: The classification network achieved 0.829 accuracy, 0.800 precision, 0.615 recall, and a 0.695 F1 score. Conclusions: The deep learning algorithm demonstrated notable accuracy in distinguishing OKC from SBC, with class activation map (CAM) visualization aiding interpretability. This progress is expected to become an essential resource for clinicians, improving diagnostic and treatment outcomes.
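A fine-tuning setup along these lines can be sketched with the Keras InceptionResNetV2 implementation, whose native input size matches the 299 × 299 crops; the head layers, dropout rate, and optimizer are assumptions, not the authors' configuration:

```python
import tensorflow as tf

# ImageNet-pretrained backbone; 299 x 299 matches the cropped panoramic ROIs.
base = tf.keras.applications.InceptionResNetV2(
    include_top=False, weights="imagenet", input_shape=(299, 299, 3))
base.trainable = False  # train the new head first, then optionally unfreeze

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.3),                    # assumed regularization
    tf.keras.layers.Dense(1, activation="sigmoid"),  # OKC vs. SBC
])
model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=["accuracy", tf.keras.metrics.Precision(), tf.keras.metrics.Recall()],
)
```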
Affiliation(s)
- Su-Yi Sim: Department of Oral and Maxillofacial Surgery, Dental and Life Science Institute & Dental Research Institute, School of Dentistry, Pusan National University, Yangsan 50612, Republic of Korea
- JaeJoon Hwang: Department of Oral and Maxillofacial Radiology, Dental and Life Science Institute & Dental Research Institute, School of Dentistry, Pusan National University, Yangsan 50612, Republic of Korea
- Jihye Ryu: Department of Oral and Maxillofacial Surgery, Dental and Life Science Institute & Dental Research Institute, School of Dentistry, Pusan National University, Yangsan 50612, Republic of Korea
- Hyeonjin Kim: Department of Oral and Maxillofacial Surgery, Dental and Life Science Institute & Dental Research Institute, School of Dentistry, Pusan National University, Yangsan 50612, Republic of Korea
- Eun-Jung Kim: Department of Dental Anesthesia and Pain Medicine, School of Dentistry, Pusan National University, Yangsan 50612, Republic of Korea
- Jae-Yeol Lee: Department of Oral and Maxillofacial Surgery, Dental and Life Science Institute & Dental Research Institute, School of Dentistry, Pusan National University, Yangsan 50612, Republic of Korea
4
Egemen D, Perkins RB, Cheung LC, Befano B, Rodriguez AC, Desai K, Lemay A, Ahmed SR, Antani S, Jeronimo J, Wentzensen N, Kalpathy-Cramer J, De Sanjose S, Schiffman M. Artificial intelligence-based image analysis in clinical testing: lessons from cervical cancer screening. J Natl Cancer Inst 2024; 116:26-33. [PMID: 37758250 PMCID: PMC10777665 DOI: 10.1093/jnci/djad202]
Abstract
Novel screening and diagnostic tests based on artificial intelligence (AI) image recognition algorithms are proliferating. Some initial reports claim outstanding accuracy followed by disappointing lack of confirmation, including our own early work on cervical screening. This is a presentation of lessons learned, organized as a conceptual step-by-step approach to bridge the gap between the creation of an AI algorithm and clinical efficacy. The first fundamental principle is specifying rigorously what the algorithm is designed to identify and what the test is intended to measure (eg, screening, diagnostic, or prognostic). The second is designing the AI algorithm to minimize the most clinically important errors. For example, many equivocal cervical images cannot yet be labeled because the borderline between cases and controls is blurred. To avoid a misclassified case-control dichotomy, we have isolated the equivocal cases and formally included an intermediate, indeterminate class (severity order of classes: case > indeterminate > control). The third principle is evaluating AI algorithms like any other test, using clinical epidemiologic criteria. Repeatability of the algorithm at the borderline, for indeterminate images, has proven extremely informative. Distinguishing between internal and external validation is also essential. Linking the AI algorithm results to clinical risk estimation is the fourth principle. Absolute risk (not relative) is the critical metric for translating a test result into clinical use. Finally, generating risk-based guidelines for clinical use that match local resources and priorities is the last principle in our approach. We are particularly interested in applications to lower-resource settings to address health disparities. We note that similar principles apply to other domains of AI-based image analysis for medical diagnostic testing.
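The fourth principle, reporting absolute rather than relative risk, reduces to Bayes' rule: the same test result implies very different absolute risks at different prevalences. A toy calculation with invented operating characteristics:

```python
def absolute_risk(sensitivity, specificity, prevalence):
    """P(disease | positive test) via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# The same algorithm implies very different absolute risks in a screening
# population versus a referral population (numbers are illustrative only).
print(absolute_risk(0.90, 0.90, prevalence=0.01))  # ~0.08
print(absolute_risk(0.90, 0.90, prevalence=0.20))  # ~0.69
```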
Affiliation(s)
- Didem Egemen: Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD, USA
- Rebecca B Perkins: Department of Obstetrics and Gynecology, Boston Medical Center/Boston University School of Medicine, Boston, MA, USA
- Li C Cheung: Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD, USA
- Brian Befano: Information Management Services Inc, Calverton, MD, USA; Department of Epidemiology, School of Public Health, University of Washington, Seattle, WA, USA
- Ana Cecilia Rodriguez: Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD, USA
- Kanan Desai: Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD, USA
- Andreanne Lemay: Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Boston, MA, USA
- Syed Rakin Ahmed: Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Boston, MA, USA; Harvard Graduate Program in Biophysics, Harvard Medical School, Harvard University, Cambridge, MA, USA; Massachusetts Institute of Technology, Cambridge, MA, USA; Geisel School of Medicine at Dartmouth, Dartmouth College, Hanover, NH, USA
- Sameer Antani: National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Jose Jeronimo: Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD, USA
- Nicolas Wentzensen: Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD, USA
- Jayashree Kalpathy-Cramer: Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Boston, MA, USA
- Silvia De Sanjose: Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD, USA; ISGlobal, Barcelona, Spain
- Mark Schiffman: Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD, USA
5
Watanabe AT, Retson T, Wang J, Mantey R, Chim C, Karimabadi H. Mammographic Breast Density Model Using Semi-Supervised Learning Reduces Inter-/Intra-Reader Variability. Diagnostics (Basel) 2023; 13:2694. [PMID: 37627953 PMCID: PMC10453732 DOI: 10.3390/diagnostics13162694]
Abstract
Breast density is an important risk factor for breast cancer development; however, imager inconsistency in density reporting can lead to patient and clinician confusion. A deep learning (DL) model for mammographic density grading was examined in a retrospective multi-reader multi-case study consisting of 928 image pairs and assessed for its impact on inter- and intra-reader variability and reading time. Seven readers assigned density categories to the images, then re-read the test set aided by the model after a 4-week washout. To measure intra-reader agreement, 100 image pairs were blindly double read in both sessions. Linearly weighted Cohen's kappa (κ) and Student's t-test were used to assess model and reader performance. The model achieved a κ of 0.87 (95% CI: 0.84, 0.89) for four-class density assessment and a κ of 0.91 (95% CI: 0.88, 0.93) for binary non-dense/dense assessment. Superiority tests showed a significant reduction in inter-reader variability (κ improved from 0.70 to 0.88, p ≤ 0.001) and intra-reader variability (κ improved from 0.83 to 0.95, p ≤ 0.01) for four-class density, and a significant reduction in inter-reader variability (κ improved from 0.77 to 0.96, p ≤ 0.001) and intra-reader variability (κ improved from 0.89 to 0.97, p ≤ 0.01) for binary non-dense/dense assessment when readers were aided by DL. The average reader mean reading time per image pair also decreased by 30% (0.86 s; 95% CI: 0.01, 1.71), with six of seven readers showing reading-time reductions.
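The reading-time comparison is a paired design (the same seven readers, unaided and then aided), which maps onto a paired t-test; a SciPy sketch with invented per-reader times:

```python
from scipy import stats

# Hypothetical mean reading time (seconds) per image pair for seven readers.
unaided = [3.1, 2.8, 3.5, 2.9, 3.3, 2.7, 3.0]
aided   = [2.2, 2.1, 2.6, 2.3, 2.5, 2.9, 2.4]

# Paired test because each reader is their own control across sessions.
t_stat, p_value = stats.ttest_rel(unaided, aided)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```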
Affiliation(s)
- Alyssa T. Watanabe: Department of Radiology, Keck School of Medicine, University of Southern California, Los Angeles, CA 90007, USA; CureMetrix, Inc., San Diego, CA 92101, USA
- Tara Retson: Department of Radiology, University of California, San Diego, CA 92093, USA
- Junhao Wang: CureMetrix, Inc., San Diego, CA 92101, USA
- Richard Mantey: CureMetrix, Inc., San Diego, CA 92101, USA
- Chiyung Chim: CureMetrix, Inc., San Diego, CA 92101, USA
6
Kai C, Ishizuka S, Otsuka T, Nara M, Kondo S, Futamura H, Kodama N, Kasai S. Automated Estimation of Mammary Gland Content Ratio Using Regression Deep Convolutional Neural Network and the Effectiveness in Clinical Practice as Explainable Artificial Intelligence. Cancers (Basel) 2023; 15:2794. [PMID: 37345132 DOI: 10.3390/cancers15102794]
Abstract
Breast composition types were recently categorized into four classes based on the Breast Imaging Reporting and Data System (BI-RADS) atlas, and evaluating them is vital in clinical practice. A Japanese guideline for breast composition, based on BI-RADS, characterizes the breast using a continuous value, the mammary gland content ratio, allowing a more objective and visual evaluation. Although discriminative deep convolutional neural networks (DCNNs) have conventionally been developed to classify breast composition, they can make errors of two categories or more. Hence, we propose an alternative regression DCNN based on the mammary gland content ratio. We used 1476 images evaluated by an expert physician. Our regression DCNN contained four convolution layers and three fully connected layers. We obtained a high correlation of 0.93 (p < 0.01) between estimated and expert ratios. Furthermore, to scrutinize the effectiveness of the regression DCNN, we categorized breast composition using the estimated ratios. The agreement rate was high at 84.8%, suggesting that breast composition can be determined from the regression DCNN output with high accuracy. Moreover, the proposed method makes errors of two categories or more unlikely, and its estimated results can be understood intuitively.
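A regression network of the stated shape, four convolution layers, three fully connected layers, and one continuous output for the gland content ratio, might look as follows in PyTorch; the channel widths, pooling, and sigmoid output range are assumptions:

```python
import torch
import torch.nn as nn

class GlandRatioRegressor(nn.Module):
    """Sketch of a 4-conv / 3-FC regression CNN; layer sizes are assumptions."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(4),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 4 * 4, 128), nn.ReLU(),
            nn.Linear(128, 32), nn.ReLU(),
            nn.Linear(32, 1), nn.Sigmoid(),  # ratio constrained to [0, 1]
        )

    def forward(self, x):
        return self.head(self.features(x))

model = GlandRatioRegressor()
loss_fn = nn.MSELoss()  # regress the continuous ratio, then threshold into categories
```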
Affiliation(s)
- Chiharu Kai: Department of Radiological Technology, Faculty of Medical Technology, Niigata University of Health and Welfare, Niigata City 950-3198, Niigata, Japan
- Sachi Ishizuka: Department of Radiological Technology, Faculty of Medical Technology, Niigata University of Health and Welfare, Niigata City 950-3198, Niigata, Japan
- Miyako Nara: Department of Breast Surgery, Tokyo Metropolitan Cancer and Infectious Disease Center, Komagome Hospital, Tokyo 113-8677, Japan
- Satoshi Kondo: Graduate School of Engineering, Muroran Institute of Technology, Muroran City 050-8585, Hokkaido, Japan
- Naoki Kodama: Department of Radiological Technology, Faculty of Medical Technology, Niigata University of Health and Welfare, Niigata City 950-3198, Niigata, Japan
- Satoshi Kasai: Department of Radiological Technology, Faculty of Medical Technology, Niigata University of Health and Welfare, Niigata City 950-3198, Niigata, Japan
7
Gupta S, Kumar S, Chang K, Lu C, Singh P, Kalpathy-Cramer J. Collaborative Privacy-preserving Approaches for Distributed Deep Learning Using Multi-Institutional Data. Radiographics 2023; 43:e220107. [PMID: 36862082 PMCID: PMC10091220 DOI: 10.1148/rg.220107]
Abstract
Deep learning (DL) algorithms have shown remarkable potential in automating various tasks in medical imaging and radiologic reporting. However, models trained on small quantities of data or only on data from a single institution often do not generalize to other institutions, which may have different patient demographics or data acquisition characteristics. Training DL algorithms using data from multiple institutions is therefore crucial to improving the robustness and generalizability of clinically useful DL models. In the context of medical data, simply pooling data from each institution into a central location to train a model poses several issues, such as increased risk to patient privacy, increased costs for data storage and transfer, and regulatory challenges. These challenges of centrally hosting data have motivated the development of distributed machine learning techniques and frameworks for collaborative learning that facilitate the training of DL models without the need to explicitly share private medical data. The authors describe several popular methods for collaborative training and review the main considerations for deploying these models. They also highlight publicly available software frameworks for federated learning and showcase several real-world examples of collaborative learning. The authors conclude by discussing some key challenges and future research directions for distributed DL. They aim to introduce clinicians to the benefits, limitations, and risks of using distributed DL for the development of medical artificial intelligence algorithms. ©RSNA, 2023
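The canonical aggregation rule in many of the federated methods reviewed here is federated averaging (FedAvg): each site trains locally, and a server averages the resulting parameters weighted by local dataset size. A minimal NumPy sketch with invented sites and parameter shapes:

```python
import numpy as np

def fed_avg(client_weights, client_sizes):
    """One FedAvg round: average client parameters, weighted by dataset size."""
    total = sum(client_sizes)
    return [
        sum(w[k] * (n / total) for w, n in zip(client_weights, client_sizes))
        for k in range(len(client_weights[0]))
    ]

# Three hypothetical sites, each holding two parameter arrays locally;
# raw patient data never leaves a site, only these parameters do.
site_a = [np.ones((2, 2)), np.zeros(3)]
site_b = [np.full((2, 2), 3.0), np.ones(3)]
site_c = [np.full((2, 2), 2.0), np.ones(3)]
global_params = fed_avg([site_a, site_b, site_c], client_sizes=[100, 300, 200])
```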
Affiliation(s)
- Ken Chang, Charles Lu, Praveer Singh, Jayashree Kalpathy-Cramer: Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Harvard Medical School, 13th Street, Building 149, Room 2301, Charlestown, MA 02129 (S.G., S.K., K.C., C.L., P.S., J.K.C.); Indian Institute of Technology Delhi, New Delhi, India (S.G., S.K.)
8
Cole E, Valikodath NG, Al-Khaled T, Bajimaya S, KC S, Chuluunbat T, Munkhuu B, Jonas KE, Chuluunkhuu C, MacKeen LD, Yap V, Hallak J, Ostmo S, Wu WC, Coyner AS, Singh P, Kalpathy-Cramer J, Chiang MF, Campbell JP, Chan RVP. Evaluation of an Artificial Intelligence System for Retinopathy of Prematurity Screening in Nepal and Mongolia. Ophthalmology Science 2022; 2:100165. [PMID: 36531583 PMCID: PMC9754980 DOI: 10.1016/j.xops.2022.100165]
Abstract
Purpose: To evaluate the performance of a deep learning (DL) algorithm for retinopathy of prematurity (ROP) screening in Nepal and Mongolia.
Design: Retrospective analysis of prospectively collected clinical data.
Participants: Clinical information and fundus images were obtained from infants in 2 ROP screening programs in Nepal and Mongolia.
Methods: Fundus images were obtained using the Forus 3nethra neo (Forus Health) in Nepal and the RetCam Portable (Natus Medical, Inc.) in Mongolia. The overall severity of ROP was determined from the medical record using the International Classification of ROP (ICROP). The presence of plus disease was determined independently in each image using a reference standard diagnosis. The Imaging and Informatics for ROP (i-ROP) DL algorithm was trained on images from the RetCam to classify plus disease and to assign a vascular severity score (VSS) from 1 through 9.
Main Outcome Measures: Area under the receiver operating characteristic curve and area under the precision-recall curve for the presence of plus disease or type 1 ROP, and association between VSS and ICROP disease category.
Results: The prevalence of type 1 ROP was found to be higher in Mongolia (14.0%) than in Nepal (2.2%; P < 0.001) in these data sets. In Mongolia (RetCam images), the area under the receiver operating characteristic curve for examination-level plus disease detection was 0.968, and the area under the precision-recall curve was 0.823. In Nepal (Forus images), these values were 0.999 and 0.993, respectively. The ROP VSS was associated with ICROP classification in both datasets (P < 0.001). At the population level, the median VSS was found to be higher in Mongolia (2.7; interquartile range [IQR], 1.3-5.4) as compared with Nepal (1.9; IQR, 1.2-3.4; P < 0.001).
Conclusions: These data provide preliminary evidence of the effectiveness of the i-ROP DL algorithm for ROP screening in neonatal populations in Nepal and Mongolia using multiple camera systems and are useful for consideration in future clinical implementation of artificial intelligence-based ROP screening in low- and middle-income countries.
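Both reported metrics have direct scikit-learn counterparts; average precision is the usual estimator of the area under the precision-recall curve, which is more informative than AUROC when positives such as plus disease are rare. Labels and scores below are illustrative:

```python
from sklearn.metrics import roc_auc_score, average_precision_score

# Hypothetical per-examination plus-disease labels and model scores.
y_true  = [0, 0, 1, 0, 1, 1, 0, 0, 1, 0]
y_score = [0.1, 0.3, 0.8, 0.2, 0.9, 0.6, 0.4, 0.1, 0.7, 0.5]

print("AUROC:", roc_auc_score(y_true, y_score))
# Average precision summarizes the precision-recall curve and degrades
# visibly as positives become rarer, unlike AUROC.
print("AUPRC:", average_precision_score(y_true, y_score))
```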
Key Words
- Artificial intelligence
- BW, birth weight
- DL, deep learning
- Deep learning
- GA, gestational age
- ICROP, International Classification of Retinopathy of Prematurity
- IQR, interquartile range
- LMIC, low- and middle-income country
- Mongolia
- Nepal
- ROP, retinopathy of prematurity
- RSD, reference standard diagnosis
- Retinopathy of prematurity
- TR, treatment-requiring
- VSS, vascular severity score
- i-ROP, Imaging and Informatics for Retinopathy of Prematurity
Affiliation(s)
- Emily Cole: Department of Ophthalmology and Visual Sciences, Illinois Eye and Ear Infirmary, University of Illinois Chicago, Chicago, Illinois
- Nita G. Valikodath: Department of Ophthalmology and Visual Sciences, Illinois Eye and Ear Infirmary, University of Illinois Chicago, Chicago, Illinois
- Tala Al-Khaled: Department of Ophthalmology and Visual Sciences, Illinois Eye and Ear Infirmary, University of Illinois Chicago, Chicago, Illinois
- Sagun KC: Helen Keller International, Kathmandu, Nepal
- Bayalag Munkhuu: National Center for Maternal and Child Health, Ulaanbaatar, Mongolia
- Karyn E. Jonas: Department of Ophthalmology and Visual Sciences, Illinois Eye and Ear Infirmary, University of Illinois Chicago, Chicago, Illinois
- Leslie D. MacKeen: The Hospital for Sick Children, Toronto, Canada; Phoenix Technology Group, Pleasanton, California
- Vivien Yap: Department of Pediatrics, Weill Cornell Medical College, New York, New York
- Joelle Hallak: Department of Ophthalmology and Visual Sciences, Illinois Eye and Ear Infirmary, University of Illinois Chicago, Chicago, Illinois
- Susan Ostmo: Department of Ophthalmology, Casey Eye Institute, Oregon Health & Science University, Portland, Oregon
- Wei-Chi Wu: Chang Gung Memorial Hospital, Taoyuan, Taiwan, and Chang Gung University, College of Medicine, Taoyuan, Taiwan
- Aaron S. Coyner: Department of Ophthalmology, Casey Eye Institute, Oregon Health & Science University, Portland, Oregon
- Michael F. Chiang: National Eye Institute, National Institutes of Health, Bethesda, Maryland
- J. Peter Campbell: Department of Ophthalmology, Casey Eye Institute, Oregon Health & Science University, Portland, Oregon
- R. V. Paul Chan: Department of Ophthalmology and Visual Sciences, Illinois Eye and Ear Infirmary, University of Illinois Chicago, Chicago, Illinois. Correspondence: R. V. Paul Chan, MD, MSc, MBA, Department of Ophthalmology and Visual Sciences, Illinois Eye and Ear Infirmary, University of Illinois at Chicago, 1905 West Taylor Street, Chicago, IL 60612.
9
Improving the repeatability of deep learning models with Monte Carlo dropout. NPJ Digit Med 2022; 5:174. [PMID: 36400939 PMCID: PMC9674698 DOI: 10.1038/s41746-022-00709-3]
Abstract
The integration of artificial intelligence into clinical workflows requires reliable and robust models. Repeatability is a key attribute of model robustness. Ideally, repeatable models output predictions without variation during independent tests carried out under similar conditions. However, slight variations, though not ideal, may be unavoidable and acceptable in practice. During model development and evaluation, much attention is given to classification performance, while model repeatability is rarely assessed, leading to the development of models that are unusable in clinical practice. In this work, we evaluate the repeatability of four model types (binary classification, multi-class classification, ordinal classification, and regression) on images that were acquired from the same patient during the same visit. We study each model's performance on four medical image classification tasks from public and private datasets: knee osteoarthritis, cervical cancer screening, breast density estimation, and retinopathy of prematurity. Repeatability is measured and compared on ResNet and DenseNet architectures. Moreover, we assess the impact of sampling Monte Carlo dropout predictions at test time on classification performance and repeatability. Leveraging Monte Carlo predictions significantly increases repeatability, in particular at the class boundaries, for all tasks on the binary, multi-class, and ordinal models, leading to an average reduction of the 95% limits of agreement by 16 percentage points and of the class disagreement rate by 7 percentage points. The classification accuracy improves in most settings along with the repeatability. Our results suggest that beyond about 20 Monte Carlo iterations, there is no further gain in repeatability. In addition to the higher test-retest agreement, Monte Carlo predictions are better calibrated, which leads to output probabilities reflecting more accurately the true likelihood of being correctly classified.
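A test-time Monte Carlo dropout loop of the kind evaluated here takes a few lines in PyTorch: keep dropout layers stochastic at inference and average the sampled outputs. Only the sampling pattern follows the paper; the toy model is an assumption:

```python
import torch
import torch.nn as nn

def mc_dropout_predict(model, x, n_iter=20):
    """Average softmax outputs over n_iter stochastic forward passes.

    Dropout layers are forced into training mode so they stay active at
    test time; ~20 iterations matched the point of diminishing returns
    reported above.
    """
    model.eval()
    for m in model.modules():
        if isinstance(m, nn.Dropout):
            m.train()
    with torch.no_grad():
        probs = torch.stack([model(x).softmax(dim=1) for _ in range(n_iter)])
    return probs.mean(dim=0)

# Example with a toy classifier containing dropout.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Dropout(0.5), nn.Linear(16, 3))
x = torch.randn(4, 8)
mean_probs = mc_dropout_predict(model, x)
```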
10
Bahl M. Artificial Intelligence in Clinical Practice: Implementation Considerations and Barriers. Journal of Breast Imaging 2022; 4:632-639. [PMID: 36530476 PMCID: PMC9741727 DOI: 10.1093/jbi/wbac065]
Abstract
The rapid growth of artificial intelligence (AI) in radiology has led to Food and Drug Administration clearance of more than 20 AI algorithms for breast imaging. The steps involved in the clinical implementation of an AI product include identifying all stakeholders, selecting the appropriate product to purchase, evaluating it with a local data set, integrating it into the workflow, and monitoring its performance over time. Despite the potential benefits of improved quality and increased efficiency with AI, several barriers, such as high costs and liability concerns, may limit its widespread implementation. This article lists currently available AI products for breast imaging, describes the key elements of clinical implementation, and discusses barriers to clinical implementation.
Affiliation(s)
- Manisha Bahl: Massachusetts General Hospital, Department of Radiology, Boston, MA, USA
11
Brink L, Coombs LP, Kattil Veettil D, Kuchipudi K, Marella S, Schmidt K, Nair SS, Tilkin M, Treml C, Chang K, Kalpathy-Cramer J. ACR's Connect and AI-LAB technical framework. JAMIA Open 2022; 5:ooac094. [PMID: 36380846 PMCID: PMC9651971 DOI: 10.1093/jamiaopen/ooac094]
Abstract
Objective: To develop a free, vendor-neutral software suite, the American College of Radiology (ACR) Connect, which serves as a platform for democratizing artificial intelligence (AI) for all individuals and institutions.
Materials and Methods: Among its core capabilities, ACR Connect provides educational resources; tools for dataset annotation; model building and evaluation; and an interface for collaboration and federated learning across institutions without the need to move data off hospital premises.
Results: The AI-LAB application within ACR Connect allows users to investigate AI models using their own local data while maintaining data security. The software enables non-technical users to participate in the evaluation and training of AI models as part of a larger, collaborative network.
Discussion: Advancements in AI have transformed automated quantitative analysis for medical imaging. Despite the significant progress in research, AI is currently underutilized in clinical workflows. The success of AI model development depends critically on the synergy between physicians who can drive clinical direction, data scientists who can design effective algorithms, and the availability of high-quality datasets. ACR Connect and AI-LAB provide a way to perform external validation as well as collaborative, distributed training.
Conclusion: In order to create a collaborative AI ecosystem across clinical and technical domains, the ACR developed a platform that enables non-technical users to participate in education and model development.
Affiliation(s)
- Laura Brink, Laura P Coombs, Deepak Kattil Veettil, Kashyap Kuchipudi, Sailaja Marella, Kendall Schmidt, Sujith Surendran Nair, Michael Tilkin, Christopher Treml: Department of Information Technology, American College of Radiology, Reston, Virginia, USA
- Ken Chang: Department of Radiology, Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Boston, Massachusetts, USA
- Jayashree Kalpathy-Cramer: Department of Radiology, Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Boston, Massachusetts, USA; Department of Ophthalmology, University of Colorado School of Medicine, Aurora, Colorado, USA
12
Classifying Breast Density from Mammogram with Pretrained CNNs and Weighted Average Ensembles. Applied Sciences (Basel) 2022. [DOI: 10.3390/app12115599]
Abstract
We are currently experiencing a revolution in data production and artificial intelligence (AI) applications. Data are produced much faster than they can be consumed, so there is an urgent need to develop AI algorithms for all aspects of modern life, and the medical field is fertile ground for applying AI techniques. Breast cancer is one of the most common cancers and a leading cause of death around the world. Early detection is critical to treating the disease effectively. Breast density plays a significant role in determining the likelihood and risk of breast cancer; it describes the amount of fibrous and glandular tissue compared with the amount of fatty tissue in the breast. Breast density is categorized using the American College of Radiology (ACR) BI-RADS system, which assigns breast density to one of four classes: in class A, breasts are almost entirely fatty; in class B, scattered areas of fibroglandular density appear in the breasts; in class C, the breasts are heterogeneously dense; and in class D, the breasts are extremely dense. This paper applies pre-trained Convolutional Neural Networks (CNNs) to a local mammogram dataset to classify breast density. Several transfer learning models were tested on a dataset consisting of more than 800 mammogram screenings from King Abdulaziz Medical City (KAMC). Inception V3, EfficientNetV2-B0, and Xception gave the highest accuracy for both four- and two-class classification. To enhance the accuracy of density classification, we applied weighted average ensembles, and performance visibly improved. The overall accuracy of ACR classification with weighted average ensembles was 78.11%.
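Weighted averaging of class probabilities from several fine-tuned backbones is a one-liner once the per-model softmax outputs are available; the outputs and weights below are invented, not the paper's tuned values:

```python
import numpy as np

def weighted_ensemble(prob_list, weights):
    """Weighted average of per-model class-probability arrays (n_samples x n_classes)."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return sum(w * p for w, p in zip(weights, prob_list))

# Hypothetical softmax outputs from three fine-tuned backbones for one mammogram.
inception = np.array([[0.10, 0.20, 0.50, 0.20]])
efficient = np.array([[0.05, 0.15, 0.60, 0.20]])
xception  = np.array([[0.10, 0.10, 0.55, 0.25]])

probs = weighted_ensemble([inception, efficient, xception], weights=[0.4, 0.3, 0.3])
density_class = "ABCD"[int(probs.argmax())]  # BI-RADS class A-D
```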
13
Bhowmik A, Eskreis-Winkler S. Deep learning in breast imaging. BJR Open 2022; 4:20210060. [PMID: 36105427 PMCID: PMC9459862 DOI: 10.1259/bjro.20210060]
Abstract
Millions of breast imaging exams are performed each year in an effort to reduce the morbidity and mortality of breast cancer. Breast imaging exams are performed for cancer screening, diagnostic work-up of suspicious findings, evaluating extent of disease in recently diagnosed breast cancer patients, and determining treatment response. Yet, the interpretation of breast imaging can be subjective, tedious, time-consuming, and prone to human error. Retrospective and small reader studies suggest that deep learning (DL) has great potential to perform medical imaging tasks at or above human-level performance, and may be used to automate aspects of the breast cancer screening process, improve cancer detection rates, decrease unnecessary callbacks and biopsies, optimize patient risk assessment, and open up new possibilities for disease prognostication. Prospective trials are urgently needed to validate these proposed tools, paving the way for real-world clinical use. New regulatory frameworks must also be developed to address the unique ethical, medicolegal, and quality control issues that DL algorithms present. In this article, we review the basics of DL, describe recent DL breast imaging applications including cancer detection and risk prediction, and discuss the challenges and future directions of artificial intelligence-based systems in the field of breast cancer.
Affiliation(s)
- Arka Bhowmik: Department of Radiology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, United States
- Sarah Eskreis-Winkler: Department of Radiology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, United States
14
Lu C, Hanif A, Singh P, Chang K, Coyner AS, Brown JM, Ostmo S, Chan RP, Rubin D, Chiang MF, Campbell JP, Kalpathy-Cramer J. Federated learning for multi-center collaboration in ophthalmology: improving classification performance in retinopathy of prematurity. Ophthalmol Retina 2022; 6:657-663. [PMID: 35296449 DOI: 10.1016/j.oret.2022.02.015]
Abstract
Objective: To compare the performance of deep learning (DL) classifiers for the diagnosis of plus disease in retinopathy of prematurity (ROP) trained using two methods of developing models on multi-institutional datasets: centralizing data versus federated learning (FL), in which no data leave each institution.
Design: Evaluation of a diagnostic test or technology.
Subjects, Participants, and/or Controls: DL models were trained, validated, and tested on 5,255 wide-angle retinal images from the neonatal intensive care units of 7 institutions as part of the Imaging and Informatics in ROP (i-ROP) study. All images were labeled for the presence of plus, pre-plus, or no plus disease with a clinical label and a reference standard diagnosis (RSD) determined by three image-based ROP graders and the clinical diagnosis.
Methods, Intervention, or Testing: We compared the area under the receiver operating characteristic curve (AUROC) for models developed on multi-institutional data using a central approach and then FL, and compared locally trained models with both approaches. We correlated model performance (kappa) with label agreement (between clinical and RSD labels), dataset size, and number of plus disease cases in each training cohort using Spearman's correlation coefficient (CC).
Main Outcome Measures: Model performance using AUROC and linearly weighted kappa.
Results: FL and centrally trained models were compared in four settings: FL trained on RSD versus central trained on RSD, FL on clinical labels versus central on clinical labels, FL on RSD versus central on clinical labels, and FL on clinical labels versus central on RSD (p = 0.046, p = 0.126, p = 0.224, and p = 0.0173, respectively). Models trained on local institutional data performed inferiorly to the FL models in 4 of 7 cases (57%). Local model performance was positively correlated with label agreement between clinical and RSD labels (CC = 0.389, p = 0.387), total number of plus cases (CC = 0.759, p = 0.047), and overall training set size (CC = 0.924, p = 0.002).
Conclusions: We show that an FL-trained model performs comparably to a centralized model, confirming that FL may provide an effective, more feasible solution for inter-institutional learning. Smaller institutions benefit more from collaboration than larger institutions, showing the potential of FL for addressing disparities in resource access.
Affiliation(s)
- Charles Lu: Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Charlestown, Massachusetts; Center for Clinical Data Science, Massachusetts General Hospital and Brigham and Women's Hospital, Boston, Massachusetts
- Adam Hanif: Department of Ophthalmology, Oregon Health & Science University, Portland, OR
- Praveer Singh: Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Charlestown, Massachusetts; Center for Clinical Data Science, Massachusetts General Hospital and Brigham and Women's Hospital, Boston, Massachusetts
- Ken Chang: Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Charlestown, Massachusetts; Center for Clinical Data Science, Massachusetts General Hospital and Brigham and Women's Hospital, Boston, Massachusetts
- Aaron S Coyner: Department of Ophthalmology, Oregon Health & Science University, Portland, OR
- James M Brown: School of Computer Science, University of Lincoln, Lincoln, UK
- Susan Ostmo: Department of Ophthalmology, Oregon Health & Science University, Portland, OR
- R. V. Paul Chan: Ophthalmology and Visual Sciences, University of Illinois at Chicago, Chicago, IL
- Daniel Rubin: Center for Biomedical Informatics Research, Stanford University School of Medicine, Stanford, California
- Michael F Chiang: National Eye Institute, National Institutes of Health, Bethesda, MD
- J Peter Campbell: Department of Ophthalmology, Oregon Health & Science University, Portland, OR
- Jayashree Kalpathy-Cramer: Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Charlestown, Massachusetts; Center for Clinical Data Science, Massachusetts General Hospital and Brigham and Women's Hospital, Boston, Massachusetts
15
Gastounioti A, Desai S, Ahluwalia VS, Conant EF, Kontos D. Artificial intelligence in mammographic phenotyping of breast cancer risk: a narrative review. Breast Cancer Res 2022; 24:14. [PMID: 35184757 PMCID: PMC8859891 DOI: 10.1186/s13058-022-01509-z]
Abstract
Background: Improved breast cancer risk assessment models are needed to enable personalized screening strategies that achieve a better harm-to-benefit ratio, based on earlier detection and better breast cancer outcomes, than existing screening guidelines. Computational mammographic phenotypes have demonstrated a promising role in breast cancer risk prediction. With the recent exponential growth of computational efficiency, the artificial intelligence (AI) revolution, driven by the introduction of deep learning, has expanded the utility of imaging in predictive models. Consequently, AI-based imaging-derived data have led to some of the most promising tools for precision breast cancer screening.
Main Body: This review aims to synthesize the current state-of-the-art applications of AI in mammographic phenotyping of breast cancer risk. We discuss the fundamentals of AI and explore the computing advancements that have made AI-based image analysis essential in refining breast cancer risk assessment. Specifically, we discuss the use of data derived from digital mammography as well as digital breast tomosynthesis. Different aspects of breast cancer risk assessment are targeted, including (a) robust and reproducible evaluations of breast density, a well-established breast cancer risk factor; (b) assessment of a woman's inherent breast cancer risk; and (c) identification of women who are likely to be diagnosed with breast cancers after a negative or routine screen due to masking or the rapid and aggressive growth of a tumor. Lastly, we discuss AI challenges unique to the computational analysis of mammographic imaging as well as future directions for this promising research field.
Conclusions: We provide a useful reference for AI researchers investigating image-based breast cancer risk assessment while indicating key priorities and challenges that, if properly addressed, could accelerate the implementation of AI-assisted risk stratification to further refine and individualize breast cancer screening strategies.
Affiliation(s)
- Aimilia Gastounioti: Department of Radiology, Center for Biomedical Image Computing and Analytics (CBICA), University of Pennsylvania, Philadelphia, PA, 19104, USA; Mallinckrodt Institute of Radiology, Washington University School of Medicine, St. Louis, MO, 63110, USA
- Shyam Desai: Department of Radiology, Center for Biomedical Image Computing and Analytics (CBICA), University of Pennsylvania, Philadelphia, PA, 19104, USA
- Vinayak S Ahluwalia: Department of Radiology, Center for Biomedical Image Computing and Analytics (CBICA), University of Pennsylvania, Philadelphia, PA, 19104, USA; Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Emily F Conant: Department of Radiology, Hospital of the University of Pennsylvania, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Despina Kontos: Department of Radiology, Center for Biomedical Image Computing and Analytics (CBICA), University of Pennsylvania, Philadelphia, PA, 19104, USA
16
Lee SE, Son NH, Kim MH, Kim EK. Mammographic Density Assessment by Artificial Intelligence-Based Computer-Assisted Diagnosis: A Comparison with Automated Volumetric Assessment. J Digit Imaging 2022; 35:173-179. [PMID: 35015180 PMCID: PMC8921363 DOI: 10.1007/s10278-021-00555-x]
Abstract
We evaluated the mammographic density assessments of an artificial intelligence-based computer-assisted diagnosis (AI-CAD) program by comparing its inter-rater agreement with radiologists against that of an automated volumetric density assessment program. Between March and May 2020, 488 consecutive mammograms of 488 patients (56.2 ± 10.9 years) were collected from a single institution. We assigned four classes of mammographic density based on BI-RADS (Breast Imaging Reporting and Data System) using a commercial AI-CAD program (Lunit INSIGHT MMG), and compared inter-rater agreement between radiologists, the AI-CAD program, and another commercial automated density assessment program (Volpara®). The inter-rater agreement between AI-CAD and the reader consensus was 0.52, with a matched rate of 68.2% (333/488). The inter-rater agreement between Volpara® and the reader consensus was similar at 0.50, with a matched rate of 62.7% (306/488). The inter-rater agreement between AI-CAD and Volpara® was 0.54, with a matched rate of 61.5% (300/488). In conclusion, density assessments by AI-CAD showed fair agreement with those of radiologists, similar to the agreement between the commercial automated density assessment program and radiologists.
Affiliation(s)
- Si Eun Lee: Department of Radiology, Research Institute of Radiological Science and Center for Clinical Imaging Data Science, Yongin Severance Hospital, Yonsei University College of Medicine, Yongin, Gyeonggi-do, Korea
- Nak-Hoon Son: Division of Biostatistics, Yongin Severance Hospital, Yonsei University College of Medicine, Yongin, Gyeonggi-do, Republic of Korea
- Myung Hyun Kim: Department of Radiology, Research Institute of Radiological Science and Center for Clinical Imaging Data Science, Yongin Severance Hospital, Yonsei University College of Medicine, Yongin, Gyeonggi-do, Korea
- Eun-Kyung Kim: Department of Radiology, Research Institute of Radiological Science and Center for Clinical Imaging Data Science, Yongin Severance Hospital, Yonsei University College of Medicine, Yongin, Gyeonggi-do, Korea
17
Tardy M, Mateus D. Leveraging Multi-Task Learning to Cope With Poor and Missing Labels of Mammograms. Frontiers in Radiology 2022; 1:796078. [PMID: 37492176 PMCID: PMC10365086 DOI: 10.3389/fradi.2021.796078]
Abstract
In breast cancer screening, binary classification of mammograms is a common task aiming to determine whether a case is malignant or benign. A Computer-Aided Diagnosis (CADx) system based on a trainable classifier requires clean data and labels coming from a confirmed diagnosis. Unfortunately, such labels are not easy to obtain in clinical practice, since the histopathological reports of biopsy may not be available alongside mammograms, while normal cases may not have an explicit follow-up confirmation. Such ambiguities result either in reducing the number of samples eligible for training or in a label uncertainty that may decrease performance. In this work, we maximize the number of samples available for training by relying on multi-task learning. We design a deep-neural-network-based classifier yielding multiple outputs in one forward pass. The predicted classes include binary malignancy, cancer probability estimation, breast density, and image laterality. Since few samples have all classes available and confirmed, we propose to introduce the uncertainty related to each class as a per-sample weight during training. Such weighting prevents updating the network's parameters when training on uncertain or missing labels. We evaluate our approach on the public INbreast and private datasets, showing statistically significant improvements compared with baseline and independent state-of-the-art approaches. Moreover, we use mammograms from the Susan G. Komen Tissue Bank for fine-tuning, further demonstrating the ability of our multi-task learning setup to improve performance from raw clinical data. We achieved binary classification performance of AUC = 80.46% on our private dataset and AUC = 85.23% on the INbreast dataset.
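The per-sample weighting idea, zero weight for a missing label and reduced weight for an uncertain one, can be written as a masked multi-task loss; this PyTorch sketch uses two invented binary heads rather than the authors' full four-output setup:

```python
import torch
import torch.nn.functional as F

def masked_multitask_loss(outputs, labels, weights):
    """Sum of per-task BCE losses, weighted per sample.

    weights[task] is 0 for a missing label and can be < 1 for an uncertain
    one, so those samples contribute nothing (or less) to the gradient.
    """
    total = 0.0
    for task in outputs:
        loss = F.binary_cross_entropy_with_logits(
            outputs[task], labels[task], reduction="none")
        total = total + (weights[task] * loss).mean()
    return total

# Two hypothetical heads (malignancy, laterality) over a batch of 3 samples.
outputs = {"malignancy": torch.randn(3), "laterality": torch.randn(3)}
labels  = {"malignancy": torch.tensor([1., 0., 1.]), "laterality": torch.tensor([0., 1., 1.])}
weights = {"malignancy": torch.tensor([1., 0., 0.5]),  # 0 = label missing
           "laterality": torch.tensor([1., 1., 1.])}
loss = masked_multitask_loss(outputs, labels, weights)
```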
Affiliation(s)
- Mickael Tardy: Ecole Centrale de Nantes, LS2N, UMR CNRS 6004, Nantes, France; Hera-MI SAS, Saint-Herblain, France
- Diana Mateus: Ecole Centrale de Nantes, LS2N, UMR CNRS 6004, Nantes, France
18
Kumar I, Kumar A, Kumar VDA, Kannan R, Vimal V, Singh KU, Mahmud M. Dense Tissue Pattern Characterization Using Deep Neural Network. Cognit Comput 2022. [DOI: 10.1007/s12559-021-09970-2]
Abstract
Breast tumors are among the most common diseases affecting women around the world. Classifying the various types of breast tumors contributes to treating them more effectively; however, this classification task is often hindered by the dense tissue patterns captured in mammograms. The present study proposes a dense tissue pattern characterization framework using a deep neural network. A total of 322 mammograms from the mini-MIAS dataset and 4880 mammograms from the DDSM dataset were taken, and an ROI of fixed size 224 × 224 pixels was extracted from each mammogram. Extensive experimentation was executed using different combinations of training and testing sets and different activation functions with the AlexNet and ResNet-18 models. Data augmentation was used to create similar virtual images for proper training of the DL models, and the testing set was then applied to the trained models for validation. During the experiments, four activation functions were used, sigmoid, tanh, ReLU, and Leaky ReLU, and the outcome for each is reported. The ReLU activation function consistently outperformed the others. For each experiment, classification accuracy and the kappa coefficient were computed. The obtained accuracy and kappa value for the MIAS dataset using the ResNet-18 model were 91.3% and 0.803, respectively. For the DDSM dataset, an accuracy of 92.3% and a kappa coefficient of 0.846 were achieved. After combining images from both datasets, the achieved accuracy was 91.9% with a kappa coefficient of 0.839 using the ResNet-18 model. We conclude that the ResNet-18 model with the ReLU activation function yields outstanding performance for this task.
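A transfer-learning setup of the kind described can be sketched with torchvision; the augmentation choices and the activation-swapping helper are assumptions about how such a comparison could be wired, not the authors' code:

```python
import torch.nn as nn
from torchvision import models, transforms

# ImageNet-pretrained ResNet-18 with a two-class dense/fatty head, matching
# the 224 x 224 ROI setup.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 2)

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(10),
    transforms.ToTensor(),
])

# One way to replicate the activation comparison: swap every ReLU in place.
def replace_relu(module, act_cls=nn.LeakyReLU):
    for name, child in module.named_children():
        if isinstance(child, nn.ReLU):
            setattr(module, name, act_cls())
        else:
            replace_relu(child, act_cls)

replace_relu(model)  # e.g., ReLU -> Leaky ReLU variant of the network
```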
Collapse
|
19
|
Singh NM, Harrod JB, Subramanian S, Robinson M, Chang K, Cetin-Karayumak S, Dalca AV, Eickhoff S, Fox M, Franke L, Golland P, Haehn D, Iglesias JE, O’Donnell LJ, Ou Y, Rathi Y, Siddiqi SH, Sun H, Westover MB, Whitfield-Gabrieli S, Gollub RL. How Machine Learning is Powering Neuroimaging to Improve Brain Health. Neuroinformatics 2022; 20:943-964. [PMID: 35347570 PMCID: PMC9515245 DOI: 10.1007/s12021-022-09572-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 02/07/2022] [Indexed: 12/31/2022]
Abstract
This report presents an overview of how machine learning is rapidly advancing clinical translational imaging in ways that will aid in the early detection, prediction, and treatment of diseases that threaten brain health. Towards this goal, we are sharing the information presented at a symposium, "Neuroimaging Indicators of Brain Structure and Function - Closing the Gap Between Research and Clinical Application", co-hosted by the McCance Center for Brain Health at Mass General Hospital and the MIT HST Neuroimaging Training Program on February 12, 2021. The symposium focused on the potential for machine learning approaches, applied to increasingly large-scale neuroimaging datasets, to transform healthcare delivery and change the trajectory of brain health by addressing brain care earlier in the lifespan. While not exhaustive, this overview uniquely addresses many of the technical challenges from image formation, to analysis and visualization, to synthesis and incorporation into the clinical workflow. Some of the ethical challenges inherent to this work are also explored, as are some of the regulatory requirements for implementation. We seek to educate, motivate, and inspire graduate students, postdoctoral fellows, and early career investigators to contribute to a future where neuroimaging meaningfully contributes to the maintenance of brain health.
Collapse
Affiliation(s)
- Nalini M. Singh
- Harvard-MIT Health Sciences and Technology, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
| | - Jordan B. Harrod
- Harvard-MIT Health Sciences and Technology, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
| | - Sandya Subramanian
- Harvard-MIT Health Sciences and Technology, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
| | - Mitchell Robinson
- Harvard-MIT Health Sciences and Technology, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
| | - Ken Chang
- Harvard-MIT Health Sciences and Technology, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
| | - Suheyla Cetin-Karayumak
- Department of Psychiatry, Brigham and Women’s Hospital and Harvard Medical School, Boston, 02115 USA
| | | | - Simon Eickhoff
- Institute of Systems Neuroscience, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany; Institute of Neuroscience and Medicine, Brain & Behaviour (INM-7), Research Centre Jülich, Jülich, Germany
| | - Michael Fox
- Center for Brain Circuit Therapeutics, Department of Neurology, Psychiatry, and Radiology, Brigham and Women’s Hospital and Harvard Medical School, 02115 Boston, USA
| | - Loraine Franke
- University of Massachusetts Boston, Boston, MA 02125 USA
| | - Polina Golland
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
| | - Daniel Haehn
- University of Massachusetts Boston, Boston, MA 02125 USA
| | - Juan Eugenio Iglesias
- Centre for Medical Image Computing, University College London, London, UK; Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, 02114 USA; Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
| | - Lauren J. O’Donnell
- Department of Radiology, Brigham and Women’s Hospital and Harvard Medical School, MA 02115 Boston, USA
| | - Yangming Ou
- Department of Radiology, Boston Children’s Hospital, Harvard Medical School, Boston, MA 02115 USA
| | - Yogesh Rathi
- Department of Psychiatry, Brigham and Women’s Hospital and Harvard Medical School, Boston, 02115 USA
| | - Shan H. Siddiqi
- Department of Psychiatry, Brigham and Women’s Hospital and Harvard Medical School, Boston, 02115 USA
| | - Haoqi Sun
- Department of Neurology and McCance Center for Brain Health / Harvard Medical School, Massachusetts General Hospital, Boston, 02114 USA
| | - M. Brandon Westover
- Department of Neurology and McCance Center for Brain Health / Harvard Medical School, Massachusetts General Hospital, Boston, 02114 USA
| | | | - Randy L. Gollub
- Department of Psychiatry and Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Boston, MA 02114 USA
| |
Collapse
|
20
|
Desai KT, Befano B, Xue Z, Kelly H, Campos NG, Egemen D, Gage JC, Rodriguez AC, Sahasrabuddhe V, Levitz D, Pearlman P, Jeronimo J, Antani S, Schiffman M, de Sanjosé S. The development of "automated visual evaluation" for cervical cancer screening: The promise and challenges in adapting deep-learning for clinical testing. Int J Cancer 2021; 150:741-752. [PMID: 34800038 PMCID: PMC8732320 DOI: 10.1002/ijc.33879] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/07/2021] [Revised: 09/24/2021] [Accepted: 10/15/2021] [Indexed: 12/22/2022]
Abstract
There is limited access to effective cervical cancer screening programs in many resource-limited settings, resulting in a continued high cervical cancer burden. Human papillomavirus (HPV) testing is increasingly recognized as the preferable primary screening approach where affordable, owing to its superior long-term reassurance when negative and its adaptability to self-sampling. Visual inspection with acetic acid (VIA) is an inexpensive but subjective and inaccurate method widely used in resource-limited settings, either for primary screening or for triage of HPV-positive individuals. A deep learning (DL)-based automated visual evaluation (AVE) of cervical images has been developed as assistive technology to help improve the accuracy and reproducibility of VIA. However, like any new clinical technology, rigorous evaluation and proof of clinical effectiveness are required before AVE is implemented widely. In the current article, we outline essential clinical and technical considerations involved in building a validated DL-based AVE tool for broad use as a clinical test.
Collapse
Affiliation(s)
- Kanan T Desai
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland, USA
| | - Brian Befano
- Information Management Services Inc., Calverton, Maryland, USA; Department of Epidemiology, University of Washington School of Public Health, Seattle, Washington, USA
| | - Zhiyun Xue
- US National Library of Medicine, Bethesda, Maryland, USA
| | - Helen Kelly
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland, USA
| | - Nicole G Campos
- Center for Health Decision Science, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
| | - Didem Egemen
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland, USA
| | - Julia C Gage
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland, USA
| | - Ana-Cecilia Rodriguez
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland, USA
| | | | - David Levitz
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland, USA
| | - Paul Pearlman
- Center for Global Health, National Cancer Institute, Rockville, Maryland, USA
| | - Jose Jeronimo
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland, USA
| | - Sameer Antani
- US National Library of Medicine, Bethesda, Maryland, USA
| | - Mark Schiffman
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland, USA
| | - Silvia de Sanjosé
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland, USA; ISGlobal, Barcelona, Spain
| |
Collapse
|
21
|
Arun N, Gaw N, Singh P, Chang K, Aggarwal M, Chen B, Hoebel K, Gupta S, Patel J, Gidwani M, Adebayo J, Li MD, Kalpathy-Cramer J. Assessing the Trustworthiness of Saliency Maps for Localizing Abnormalities in Medical Imaging. Radiol Artif Intell 2021; 3:e200267. [PMID: 34870212 PMCID: PMC8637231 DOI: 10.1148/ryai.2021200267] [Citation(s) in RCA: 58] [Impact Index Per Article: 19.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/17/2020] [Revised: 09/13/2021] [Accepted: 09/20/2021] [Indexed: 11/11/2022]
Abstract
PURPOSE To evaluate the trustworthiness of saliency maps for abnormality localization in medical imaging. MATERIALS AND METHODS Using two large publicly available radiology datasets (the Society for Imaging Informatics in Medicine-American College of Radiology Pneumothorax Segmentation dataset and the Radiological Society of North America Pneumonia Detection Challenge dataset), the performance of eight commonly used saliency map techniques was quantified in regard to (a) localization utility (segmentation and detection), (b) sensitivity to model weight randomization, (c) repeatability, and (d) reproducibility. Their performance was compared against baseline methods and localization network architectures, using area under the precision-recall curve (AUPRC) and structural similarity index measure (SSIM) as metrics. RESULTS All eight saliency map techniques failed at least one of the criteria and were inferior in performance compared with localization networks. For pneumothorax segmentation, the AUPRC ranged from 0.024 to 0.224, while a U-Net achieved a significantly superior AUPRC of 0.404 (P < .005). For pneumonia detection, the AUPRC ranged from 0.160 to 0.519, while a RetinaNet achieved a significantly superior AUPRC of 0.596 (P < .005). Five and two saliency methods (of eight) failed the model randomization test on the segmentation and detection datasets, respectively, suggesting that these methods are not sensitive to changes in model parameters. The repeatability and reproducibility of the majority of the saliency methods were worse than those of localization networks for both the segmentation and detection datasets. CONCLUSION The use of saliency maps in the high-risk domain of medical imaging warrants additional scrutiny; we recommend that detection or segmentation models be used if localization is the desired output of the network. Keywords: Technology Assessment, Technical Aspects, Feature Detection, Convolutional Neural Network (CNN). Supplemental material is available for this article. © RSNA, 2021.
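The model weight randomization test mentioned above is a simple sanity check: a saliency map that barely changes when the model's learned weights are destroyed cannot be explaining the model. A minimal sketch follows, assuming a `saliency_fn` callable that returns a 2-D NumPy heatmap; the interface and the randomization scheme are illustrative, not the paper's exact protocol.

```python
import copy
import torch.nn.init as init
from skimage.metrics import structural_similarity as ssim

def randomization_test(model, saliency_fn, image):
    """saliency_fn(model, image) -> 2-D numpy heatmap (assumed interface)."""
    base_map = saliency_fn(model, image)

    randomized = copy.deepcopy(model)
    for p in randomized.parameters():
        init.normal_(p, std=0.02)          # discard all learned weights
    rand_map = saliency_fn(randomized, image)

    # A high SSIM here is a red flag: the map barely depends on the weights.
    return ssim(base_map, rand_map,
                data_range=float(base_map.max() - base_map.min()))
```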
Collapse
Affiliation(s)
- Praveer Singh, Ken Chang, Mehak Aggarwal, Bryan Chen, Katharina Hoebel, Sharut Gupta, Jay Patel, Mishka Gidwani, Julius Adebayo, Matthew D. Li, Jayashree Kalpathy-Cramer
- From the Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Harvard Medical School, 149 13th St, Boston, MA 02129 (N.A., P.S., K.C., M.A., B.C., K.H., S.G., J.P., M.G., M.D.L., J.K.C.); Department of Computer Science, Shiv Nadar University, Greater Noida, India (N.A.); Department of Operational Sciences, Graduate School of Engineering and Management, Air Force Institute of Technology, Wright-Patterson AFB, Dayton, Ohio (N.G.); and Massachusetts Institute of Technology, Cambridge, Mass (K.C., B.C., K.H., J.P., J.A.)
| |
Collapse
|
22
|
Kalpathy-Cramer J, Patel JB, Bridge C, Chang K. Basic Artificial Intelligence Techniques: Evaluation of Artificial Intelligence Performance. Radiol Clin North Am 2021; 59:941-954. [PMID: 34689879 DOI: 10.1016/j.rcl.2021.06.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Affiliation(s)
- Jayashree Kalpathy-Cramer
- Radiology, Martinos Center for Biomedical Imaging, Massachusetts General Hospital, 149 13th Street, Boston, MA 02129, USA.
| | - Jay B Patel
- Radiology, Martinos Center for Biomedical Imaging, Massachusetts General Hospital, 149 13th Street, Boston, MA 02129, USA
| | - Christopher Bridge
- Radiology, Martinos Center for Biomedical Imaging, Massachusetts General Hospital, 149 13th Street, Boston, MA 02129, USA
| | - Ken Chang
- Radiology, Martinos Center for Biomedical Imaging, Massachusetts General Hospital, 149 13th Street, Boston, MA 02129, USA
| |
Collapse
|
23
|
Radiology Implementation Considerations for Artificial Intelligence (AI) Applied to COVID-19, From the AJR Special Series on AI Applications. AJR Am J Roentgenol 2021; 219:15-23. [PMID: 34612681 DOI: 10.2214/ajr.21.26717] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/15/2022]
Abstract
Hundreds of imaging-based artificial intelligence (AI) models have been developed in response to the COVID-19 pandemic. AI systems that incorporate imaging have shown promise in primary detection, severity grading, and prognostication of outcomes in COVID-19, and have enabled integration of imaging with a broad range of additional clinical and epidemiologic data. However, systematic reviews of AI models applied to COVID-19 medical imaging have highlighted problems in the field, including methodologic issues and problems in real-world deployment. Clinical use of such models should be informed by both the promise and potential pitfalls of implementation. How does a practicing radiologist make sense of this complex topic, and what factors should be considered in the implementation of AI tools for imaging of COVID-19? This critical review aims to help the radiologist understand the nuances that impact the clinical deployment of AI for imaging of COVID-19. We review imaging use cases for AI models in COVID-19 (e.g., diagnosis, severity assessment, and prognostication) and explore considerations for AI model development and testing, deployment infrastructure, clinical user interfaces, quality control, and institutional review board and regulatory approvals, with a practical focus on what a radiologist should consider when implementing an AI tool for COVID-19.
Collapse
|
24
|
Allen B, Dreyer K, Stibolt R, Agarwal S, Coombs L, Treml C, Elkholy M, Brink L, Wald C. Evaluation and Real-World Performance Monitoring of Artificial Intelligence Models in Clinical Practice: Try It, Buy It, Check It. J Am Coll Radiol 2021; 18:1489-1496. [PMID: 34599876 DOI: 10.1016/j.jacr.2021.08.022] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/30/2021] [Accepted: 08/02/2021] [Indexed: 01/16/2023]
Abstract
The pace of regulatory clearance of artificial intelligence (AI) algorithms for radiology continues to accelerate, and numerous algorithms are becoming available for use in clinical practice. End users of AI in radiology should be aware that AI algorithms may not work as expected when used beyond the institutions in which they were trained, and model performance may degrade over time. In this article, we discuss why regulatory clearance alone may not be enough to ensure AI will be safe and effective in all radiological practices, and we review strategies and available resources for evaluating AI models before clinical use and for monitoring their performance to ensure efficacy and patient safety.
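The "check it" step can be made concrete with a rolling-window performance monitor. The sketch below, with assumed window size and alert threshold, flags windows whose AUROC drops below the level measured at acceptance testing; it illustrates the idea, not a method from the article.

```python
from sklearn.metrics import roc_auc_score

def check_drift(labels, scores, baseline_auroc, window=500, tolerance=0.05):
    """Scan consecutive windows of recent cases; return (start, auroc)
    pairs for any window falling more than `tolerance` below baseline."""
    alerts = []
    for start in range(0, len(labels) - window + 1, window):
        auroc = roc_auc_score(labels[start:start + window],
                              scores[start:start + window])
        if auroc < baseline_auroc - tolerance:
            alerts.append((start, auroc))
    return alerts
```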
Collapse
Affiliation(s)
- Bibb Allen
- Chief Medical Officer ACR Data Science Institute; and Department of Radiology, Grandview Medical Center, Birmingham, Alabama.
| | - Keith Dreyer
- Chief Science Officer ACR Data Science Institute; and Massachusetts General Hospital, Boston, Massachusetts
| | - Robert Stibolt
- Diagnostic Radiology, Brookwood Baptist Health, Birmingham, Alabama
| | | | | | - Chris Treml
- ACR Data Science Institute, Reston, Virginia
| | | | - Laura Brink
- ACR Data Science Institute, Reston, Virginia
| | | |
Collapse
|
25
|
Chen JS, Coyner AS, Ostmo S, Sonmez K, Bajimaya S, Pradhan E, Valikodath N, Cole ED, Al-Khaled T, Chan RVP, Singh P, Kalpathy-Cramer J, Chiang MF, Campbell JP. Deep Learning for the Diagnosis of Stage in Retinopathy of Prematurity: Accuracy and Generalizability across Populations and Cameras. Ophthalmol Retina 2021; 5:1027-1035. [PMID: 33561545 PMCID: PMC8364291 DOI: 10.1016/j.oret.2020.12.013] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2020] [Revised: 12/02/2020] [Accepted: 12/16/2020] [Indexed: 12/23/2022]
Abstract
PURPOSE Stage is an important feature to identify in retinal images of infants at risk of retinopathy of prematurity (ROP). The purpose of this study was to implement a convolutional neural network (CNN) for binary detection of stages 1, 2, and 3 of ROP and to evaluate its generalizability across different populations and camera systems. DESIGN Diagnostic validation study of a CNN for stage detection. PARTICIPANTS Retinal fundus images obtained from preterm infants during routine ROP screenings. METHODS Two datasets were used: 5943 fundus images obtained by RetCam camera (Natus Medical, Pleasanton, CA) from 9 North American institutions and 5049 images obtained by 3nethra camera (Forus Health Incorporated, Bengaluru, India) from 4 hospitals in Nepal. Images were labeled based on the presence of stage by 1 to 3 expert graders. Three CNN models were trained using 5-fold cross-validation on the North American dataset alone, the Nepali dataset alone, and a combined dataset, and were evaluated on 2 held-out test sets consisting of 708 and 247 images from the Nepali and North American datasets, respectively. MAIN OUTCOME MEASURES Convolutional neural network performance was evaluated using area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC), sensitivity, and specificity. RESULTS Both the North American- and Nepali-trained models demonstrated high performance on a test set from the same population: AUROC of 0.99, AUPRC of 0.98, and sensitivity of 94% for the former; AUROC of 0.97, AUPRC of 0.91, and sensitivity of 73% for the latter. However, performance decreased to an AUROC of 0.96, AUPRC of 0.88, and sensitivity of 52%, and an AUROC of 0.62, AUPRC of 0.36, and sensitivity of 44%, respectively, when each model was evaluated on a test set from the other population. Compared with the models trained on individual datasets, the model trained on the combined dataset achieved improved performance on each respective test set: sensitivity improved from 94% to 98% on the North American test set and from 73% to 82% on the Nepali test set. CONCLUSIONS A CNN can accurately identify the presence of ROP stage in retinal images, but performance depends on the similarity between the training and testing populations. We demonstrated that internal and external performance can be improved by increasing the heterogeneity of the training dataset, in this case by combining images from different populations and cameras.
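The headline comparison, scoring one trained model on both an internal and an external held-out test set with AUROC and AUPRC, reduces to a few lines with scikit-learn. The function and array names below are placeholders for per-image stage probabilities and binary labels, not the study's code.

```python
from sklearn.metrics import roc_auc_score, average_precision_score

def evaluate_generalizability(y_internal, p_internal, y_external, p_external):
    """Report AUROC/AUPRC on an internal and an external test set."""
    results = {}
    for name, y, p in [("internal", y_internal, p_internal),
                       ("external", y_external, p_external)]:
        results[name] = {
            "AUROC": roc_auc_score(y, p),            # threshold-free ranking
            "AUPRC": average_precision_score(y, p),  # robust to class imbalance
        }
    return results
```

A gap between the internal and external rows of this report is exactly the generalizability failure the abstract quantifies.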
Collapse
Affiliation(s)
- Jimmy S Chen
- Department of Ophthalmology, Casey Eye Institute, Oregon Health & Science University, Portland, Oregon
| | - Aaron S Coyner
- Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, Oregon
| | - Susan Ostmo
- Department of Ophthalmology, Casey Eye Institute, Oregon Health & Science University, Portland, Oregon
| | - Kemal Sonmez
- Cancer Early Detection Advanced Research Center, Knight Cancer Institute, Oregon Health & Science University, Portland, Oregon
| | | | - Eli Pradhan
- Tilganga Institute of Ophthalmology, Kathmandu, Nepal
| | - Nita Valikodath
- Department of Ophthalmology and Visual Sciences, Illinois Eye and Ear Infirmary, University of Illinois at Chicago, Chicago, Illinois
| | - Emily D Cole
- Department of Ophthalmology and Visual Sciences, Illinois Eye and Ear Infirmary, University of Illinois at Chicago, Chicago, Illinois
| | - Tala Al-Khaled
- Department of Ophthalmology and Visual Sciences, Illinois Eye and Ear Infirmary, University of Illinois at Chicago, Chicago, Illinois
| | - R V Paul Chan
- Department of Ophthalmology and Visual Sciences, Illinois Eye and Ear Infirmary, University of Illinois at Chicago, Chicago, Illinois
| | - Praveer Singh
- Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Charlestown, Massachusetts; Center for Clinical Data Science, Massachusetts General Hospital and Brigham and Women's Hospital, Boston, Massachusetts
| | - Jayashree Kalpathy-Cramer
- Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Charlestown, Massachusetts; Center for Clinical Data Science, Massachusetts General Hospital and Brigham and Women's Hospital, Boston, Massachusetts
| | - Michael F Chiang
- Department of Ophthalmology, Casey Eye Institute, Oregon Health & Science University, Portland, Oregon; Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, Oregon
| | - J Peter Campbell
- Department of Ophthalmology, Casey Eye Institute, Oregon Health & Science University, Portland, Oregon.
| |
Collapse
|
26
|
Lee A, Kim MS, Han SS, Park P, Lee C, Yun JP. Deep learning neural networks to differentiate Stafne's bone cavity from pathological radiolucent lesions of the mandible in heterogeneous panoramic radiography. PLoS One 2021; 16:e0254997. [PMID: 34283883 PMCID: PMC8291753 DOI: 10.1371/journal.pone.0254997] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/04/2021] [Accepted: 07/07/2021] [Indexed: 11/18/2022] Open
Abstract
This study aimed to develop a high-performance deep learning algorithm to differentiate Stafne’s bone cavity (SBC) from cysts and tumors of the jaw based on images acquired from various panoramic radiographic systems. The dataset included 176 Stafne’s bone cavities and 282 odontogenic cysts and tumors of the mandible (98 dentigerous cysts, 91 odontogenic keratocysts, and 93 ameloblastomas) that required surgical removal. Panoramic radiographs were obtained using three different imaging systems. The trained model showed 99.25% accuracy, 98.08% sensitivity, and 100% specificity for SBC classification, with one misclassified SBC case. When traced back with the Grad-CAM and Guided Grad-CAM methods, the algorithm was confirmed to recognize the typical imaging features of SBC in panoramic radiography regardless of the imaging system. The deep learning model for differentiating SBC from odontogenic cysts and tumors showed high performance on images obtained from multiple panoramic systems. The present algorithm is expected to be a useful tool for clinicians, as it diagnoses SBCs in panoramic radiography and thereby prevents unnecessary examinations for patients. Additionally, it would support clinicians in deciding on further examinations or surgical referrals in cases where even experts are unsure of the diagnosis using panoramic radiography alone.
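Grad-CAM, which the authors use for tracing, weights a convolutional layer's activations by the spatially pooled gradients of the target class score. A minimal PyTorch sketch follows; `model`, `target_layer`, and the class index are assumptions, and production code would typically use a maintained library rather than raw hooks.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, target_layer, image, class_idx):
    """Return a normalized heatmap over target_layer's spatial grid.
    image: CHW tensor; target_layer: a conv module inside model."""
    acts, grads = {}, {}
    h1 = target_layer.register_forward_hook(
        lambda m, i, o: acts.update(a=o))
    h2 = target_layer.register_full_backward_hook(
        lambda m, gi, go: grads.update(g=go[0]))

    logits = model(image.unsqueeze(0))
    logits[0, class_idx].backward()
    h1.remove(); h2.remove()

    weights = grads["g"].mean(dim=(2, 3), keepdim=True)  # pool gradients
    cam = F.relu((weights * acts["a"]).sum(dim=1))       # weighted activations
    return (cam / cam.max().clamp(min=1e-8))[0]          # scale to [0, 1]
```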
Collapse
Affiliation(s)
- Ari Lee
- Department of Oral and Maxillofacial Radiology, Yonsei University College of Dentistry, Seoul, Republic of Korea
| | - Min Su Kim
- Department of Electrical Engineering, Pohang University of Science and Technology, Pohang, Gyeongbuk, Republic of Korea
| | - Sang-Sun Han
- Department of Oral and Maxillofacial Radiology, Yonsei University College of Dentistry, Seoul, Republic of Korea
| | - PooGyeon Park
- Department of Electrical Engineering, Pohang University of Science and Technology, Pohang, Gyeongbuk, Republic of Korea
| | - Chena Lee
- Department of Oral and Maxillofacial Radiology, Yonsei University College of Dentistry, Seoul, Republic of Korea
- * E-mail: (CL); (JPY)
| | - Jong Pil Yun
- Daegyeong Division, Korea Institute of Industrial Technology, Daegu, Republic of Korea
- * E-mail: (CL); (JPY)
| |
Collapse
|
27
|
Chang K, Singh P, Vepakomma P, Poirot MG, Raskar R, Rubin DL, Kalpathy-Cramer J. Privacy-preserving collaborative deep learning methods for multiinstitutional training without sharing patient data. Artif Intell Med 2021. [DOI: 10.1016/b978-0-12-821259-2.00006-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/23/2022]
|