1. Wayne N, Wu Q, Moore SC, Ferrari VA, Metzler SD, Guerraty MA. Multimodality assessment of the coronary microvasculature with TIMI frame count versus perfusion PET highlights coronary changes characteristic of coronary microvascular disease. Front Cardiovasc Med 2024; 11:1395036. [PMID: 38966750 PMCID: PMC11222597 DOI: 10.3389/fcvm.2024.1395036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2024] [Accepted: 06/07/2024] [Indexed: 07/06/2024] [Open Access]
Abstract
Background The diagnosis of coronary microvascular disease (CMVD) remains challenging. Perfusion PET-derived myocardial blood flow (MBF) reserve (MBFR) can quantify CMVD but is not widely available. Thrombolysis in Myocardial Infarction (TIMI) frame count (TFC) is an angiography-based method that has been proposed as a measure of CMVD. Here, we compare TFC and PET-derived MBF measurements to establish the role of TFC in assessing for CMVD. We use coronary modeling to elucidate the relationship between MBFR and TFC and propose TFC thresholds for identifying CMVD. Methods In a cohort of 123 individuals (age 58 ± 12.1, 63% women, 41% Caucasian) without obstructive coronary artery disease who had undergone perfusion PET and coronary angiography for clinical indications, we compared TFC and perfusion PET parameters using Pearson correlation (PCC) and linear regression modeling. We used mathematical modeling of the coronary circulation to understand the relationship between these parameters and performed receiver operating characteristic (ROC) analysis. Results We found a significant negative correlation between TFC and MBFR. Sex, race and ethnicity, and nitroglycerin administration impact this relationship. Coronary modeling showed an uncoupling between TFC and flow in epicardial vessels. In ROC analysis, TFC performed well in women (AUC 0.84-0.89) and moderately in men (AUC 0.68-0.78). Conclusions We established an inverse relationship between TFC and PET-derived MBFR, which is affected by patient selection and procedural factors. TFC represents a measure of the volume of the epicardial coronary compartment, which is increased in patients with CMVD, and performs well in identifying women with CMVD.
Affiliation(s)
- Nicole Wayne
  - Department of Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, United States
- Qufei Wu
  - Center for Clinical Epidemiology and Biostatistics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, United States
- Stephen C. Moore
  - Department of Radiology, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, United States
- Victor A. Ferrari
  - Department of Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, United States
- Scott D. Metzler
  - Department of Radiology, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, United States
- Marie A. Guerraty
  - Department of Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, United States
2. Jiang Y, Iuanow E, Malik B, Klock J. A Multireader Multicase (MRMC) Receiver Operating Characteristic (ROC) Study Evaluating Noninferiority of Quantitative Transmission (QT) Ultrasound to Digital Breast Tomosynthesis (DBT) on Detection and Recall of Breast Lesions. Acad Radiol 2024; 31:2248-2258. [PMID: 38290888 DOI: 10.1016/j.acra.2023.12.038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2023] [Revised: 12/16/2023] [Accepted: 12/26/2023] [Indexed: 02/01/2024]
Abstract
RATIONALE AND OBJECTIVES Quantitative transmission (QT) imaging is an emerging volumetric ultrasound modality for women too young for mammography. QT images tissue without the overlap seen in mammography and can thereby potentially improve breast mass detection and characterization and noncancer recall. We compared radiologists' interpretation of QT vs digital breast tomosynthesis (DBT) with a multireader multicase observer performance study. MATERIALS AND METHODS Study subjects received screening DBT and QT scans in HIPAA-compliant, institutional review board-approved prospective case-collection studies at four clinical sites. Twenty-four Mammography Quality Standards Act-qualified radiologists interpreted 177 cases (66 with cancer, atypia, or solid mass and 111 normal or with nonsolid benign abnormality), first QT, then 2 weeks later DBT synthesized 2D-views. Readers reported up to three findings per case and, for each finding, a recall or no-recall decision and their confidence in that decision. The study hypothesis was that the area under the receiver operating characteristic curve (AUC) of QT was noninferior to that of DBT. Sensitivity and specificity were also compared. RESULTS AUC of QT (0.746 ± 0.028, mean ± SD) was noninferior to DBT (0.700 ± 0.028) for an AUC difference margin of -0.05 (P < .05). AUC difference was 0.046 ± 0.028 (95% CI: [-0.008, 0.101]). Sensitivity was 70.6 ± 7.2% for QT and 85.2 ± 6.4% for DBT, specificity was 60.1 ± 12.3% vs 37.2 ± 11.0%, and both differences were statistically significant. Of a total of 21 cases of cysts, readers recommended recall, on average, in 1.1 ± 1.4 cases with QT, but not with DBT, and 10.6 ± 2.2 cases with DBT, but not with QT. CONCLUSION QT can be a potential alternative to mammography for breast cancer screening of women too young to undergo mammography.
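The noninferiority reasoning in this abstract can be illustrated with a short sketch. It uses only the numbers reported above. The function name `is_noninferior` is hypothetical, and comparing the lower bound of the reported 95% interval against the margin is an assumption about how the test was framed, not the authors' stated procedure.

```python
def is_noninferior(ci_lower: float, margin: float) -> bool:
    """Return True when the lower confidence bound of (new - reference)
    lies above the negative noninferiority margin."""
    return ci_lower > -abs(margin)


# Values reported in the abstract.
auc_qt, auc_dbt = 0.746, 0.700
ci_lower, ci_upper = -0.008, 0.101   # 95% CI of the AUC difference
margin = 0.05                        # magnitude of the -0.05 margin

diff = round(auc_qt - auc_dbt, 3)
noninferior = is_noninferior(ci_lower, margin)

# Superiority would additionally require the whole interval to lie above zero.
superior = ci_lower > 0

print(f"AUC difference: {diff}")
print(f"noninferior: {noninferior}, superior: {superior}")
```

Under these assumptions the interval excludes differences worse than the margin but still includes zero, so the sketch supports a noninferiority claim and not a superiority claim.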
Affiliation(s)
- Yulei Jiang
  - Department of Radiology, the University of Chicago, 5841 South Maryland Ave, MC2026, Chicago, IL 60637.
3. Whitney HM, Drukker K, Vieceli M, Dusen AV, de Oliveira M, Abe H, Giger ML. Role of sureness in evaluating AI/CADx: Lesion-based repeatability of machine learning classification performance on breast MRI. Med Phys 2024; 51:1812-1821. [PMID: 37602841 PMCID: PMC10879454 DOI: 10.1002/mp.16673] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2023] [Revised: 07/24/2023] [Accepted: 07/24/2023] [Indexed: 08/22/2023] [Open Access]
Abstract
BACKGROUND Artificial intelligence/computer-aided diagnosis (AI/CADx) and its use of radiomics have shown potential in diagnosis and prognosis of breast cancer. Performance metrics such as the area under the receiver operating characteristic (ROC) curve (AUC) are frequently used as figures of merit for the evaluation of CADx. Methods for evaluating lesion-based measures of performance may enhance the assessment of AI/CADx pipelines, particularly in the situation of comparing performances by classifier. PURPOSE The purpose of this study was to investigate the use case of two standard classifiers to (1) compare overall classification performance of the classifiers in the task of distinguishing between benign and malignant breast lesions using radiomic features extracted from dynamic contrast-enhanced magnetic resonance (DCE-MR) images, (2) define a new repeatability metric (termed sureness), and (3) use sureness to examine if one classifier provides an advantage in AI diagnostic performance by lesion when using radiomic features. METHODS Images of 1052 breast lesions (201 benign, 851 cancers) had been retrospectively collected under HIPAA/IRB compliance. The lesions had been segmented automatically using a fuzzy c-means method and thirty-two radiomic features had been extracted. Classification was investigated for the task of malignant lesions (81% of the dataset) versus benign lesions (19%). Two classifiers (linear discriminant analysis, LDA and support vector machines, SVM) were trained and tested within 0.632 bootstrap analyses (2000 iterations). Whole-set classification performance was evaluated at two levels: (1) the 0.632+ bias-corrected area under the ROC curve (AUC) and (2) performance metric curves which give variability in operating sensitivity and specificity at a target operating point (95% target sensitivity). Sureness was defined as 1-95% confidence interval of the classifier output for each lesion for each classifier. 
Lesion-based repeatability was evaluated at two levels: (1) repeatability profiles, which represent the distribution of sureness across the decision threshold and (2) sureness of each lesion. The latter was used to identify lesions with better sureness with one classifier over another while maintaining lesion-based performance across the bootstrap iterations. RESULTS In classification performance assessment, the median and 95% CI of difference in AUC between the two classifiers did not show evidence of difference (ΔAUC = -0.003 [-0.031, 0.018]). Both classifiers achieved the target sensitivity. Sureness was more consistent across the classifier output range for the SVM classifier than the LDA classifier. The SVM resulted in a net gain of 33 benign lesions and 307 cancers with higher sureness and maintained lesion-based performance. However, with the LDA there was a notable percentage of benign lesions (42%) with better sureness but lower lesion-based performance. CONCLUSIONS When there is no evidence for difference in performance between classifiers using AUC or other performance summary measures, a lesion-based sureness metric may provide additional insight into AI pipeline design. These findings present and emphasize the utility of lesion-based repeatability via sureness in AI/CADx as a complementary enhancement to other evaluation measures.
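A minimal sketch of a per-lesion sureness calculation follows. It reads the abstract's definition as one minus the width of the central 95% interval of a lesion's classifier outputs across bootstrap iterations, which is an interpretation and not a confirmed formula. The helper names, the synthetic outputs, and the assumption that outputs lie in [0, 1] are all illustrative.

```python
import random


def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in [0, 100]) of a pre-sorted list."""
    if not sorted_vals:
        raise ValueError("need at least one value")
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1.0 - frac) + sorted_vals[hi] * frac


def sureness(outputs):
    """One minus the width of the central 95% interval of the outputs
    (assumed interpretation; outputs assumed to lie in [0, 1])."""
    vals = sorted(outputs)
    return 1.0 - (percentile(vals, 97.5) - percentile(vals, 2.5))


def clipped_gauss(rng, mean, sd):
    """Draw a synthetic classifier output and clip it to [0, 1]."""
    return min(1.0, max(0.0, rng.gauss(mean, sd)))


# Synthetic outputs for two hypothetical lesions over 2000 bootstrap iterations.
rng = random.Random(0)
stable_lesion = [clipped_gauss(rng, 0.80, 0.02) for _ in range(2000)]
unstable_lesion = [clipped_gauss(rng, 0.50, 0.20) for _ in range(2000)]

print(f"stable lesion sureness:   {sureness(stable_lesion):.2f}")
print(f"unstable lesion sureness: {sureness(unstable_lesion):.2f}")
```

A lesion whose output barely moves across iterations scores near 1, while a lesion whose output swings widely scores much lower, which is the contrast the repeatability profiles are meant to expose.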
Affiliation(s)
- Heather M. Whitney
  - Department of Radiology, The University of Chicago, Chicago, IL USA 60637
- Karen Drukker
  - Department of Radiology, The University of Chicago, Chicago, IL USA 60637
- Michael Vieceli
  - Department of Physics, Wheaton College, Wheaton, IL USA 60187
- Amy Van Dusen
  - Department of Physics, Wheaton College, Wheaton, IL USA 60187
- Hiroyuki Abe
  - Department of Radiology, The University of Chicago, Chicago, IL USA 60637
- Maryellen L. Giger
  - Department of Radiology, The University of Chicago, Chicago, IL USA 60637
4. Pretorius PH, Liu J, Kalluri KS, Jiang Y, Leppo JA, Dahlberg ST, Kikut J, Parker MW, Keating FK, Licho R, Auer B, Lindsay C, Konik A, Yang Y, Wernick MN, King MA. Observer studies of image quality of denoising reduced-count cardiac single photon emission computed tomography myocardial perfusion imaging by three-dimensional Gaussian post-reconstruction filtering and deep learning. J Nucl Cardiol 2023; 30:2427-2437. [PMID: 37221409 PMCID: PMC11401514 DOI: 10.1007/s12350-023-03295-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 12/06/2022] [Accepted: 04/25/2023] [Indexed: 05/25/2023]
Abstract
BACKGROUND The aim of this research was to assess perfusion-defect detection accuracy by human observers as a function of reduced counts for 3D Gaussian post-reconstruction filtering vs deep learning (DL) denoising to determine if there was improved performance with DL. METHODS SPECT projection data of 156 normally interpreted patients were used for these studies. Half were altered to include hybrid perfusion defects with defect presence and location known. Ordered-subset expectation-maximization (OSEM) reconstruction was employed with the optional correction of attenuation (AC) and scatter (SC) in addition to distance-dependent resolution (RC). Count levels varied from full-counts (100%) to 6.25% of full-counts. The denoising strategies were previously optimized for defect detection using total perfusion deficit (TPD). Four medical physicist (PhD) and six physician (MD) observers rated the slices using a graphical user interface. Observer ratings were analyzed using the LABMRMC multi-reader, multi-case receiver-operating-characteristic (ROC) software to calculate and statistically compare the areas under the ROC curves (AUCs). RESULTS For the same count level, no statistically significant increase in AUCs for DL over Gaussian denoising was found when counts were reduced to either 25% or 12.5% of full-counts. The average AUC for full-count OSEM with solely RC and Gaussian filtering was lower than for the strategies with AC and SC, except for a reduction to 6.25% of full-counts, thus verifying the utility of employing AC and SC with RC. CONCLUSION We did not find any indication that, at the dose levels investigated and with the DL network employed, DL denoising was superior in AUC to optimized 3D post-reconstruction Gaussian filtering.
Affiliation(s)
- P Hendrik Pretorius
  - Division of Nuclear Medicine, Department of Radiology, University of Massachusetts Chan Medical School, Worcester, MA, USA.
- Junchi Liu
  - Electrical and Computer Engineering, Illinois Institute of Technology, Chicago, IL, USA
- Kesava S Kalluri
  - Division of Nuclear Medicine, Department of Radiology, University of Massachusetts Chan Medical School, Worcester, MA, USA
- Seth T Dahlberg
  - Cardiovascular Medicine, University of Massachusetts Chan Medical School, Worcester, MA, USA
- Janusz Kikut
  - University of Vermont Medical Center, Burlington, VT, USA
- Matthew W Parker
  - Cardiovascular Medicine, University of Massachusetts Chan Medical School, Worcester, MA, USA
- Robert Licho
  - UMass Memorial Medical Center - University Campus, Worcester, MA, USA
- Benjamin Auer
  - Brigham and Women's Hospital Department of Radiology, Boston, MA, USA
- Clifford Lindsay
  - Division of Nuclear Medicine, Department of Radiology, University of Massachusetts Chan Medical School, Worcester, MA, USA
- Arda Konik
  - Dana-Farber Cancer Institute Department of Radiation Oncology, Boston, MA, USA
- Yongyi Yang
  - Electrical and Computer Engineering, Illinois Institute of Technology, Chicago, IL, USA
- Miles N Wernick
  - Electrical and Computer Engineering, Illinois Institute of Technology, Chicago, IL, USA
- Michael A King
  - Division of Nuclear Medicine, Department of Radiology, University of Massachusetts Chan Medical School, Worcester, MA, USA
5. Zhou W, Villa U, Anastasio MA. Ideal Observer Computation by Use of Markov-Chain Monte Carlo With Generative Adversarial Networks. IEEE Transactions on Medical Imaging 2023; 42:3715-3724. [PMID: 37578916 PMCID: PMC10769588 DOI: 10.1109/tmi.2023.3304907] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/16/2023]
Abstract
Medical imaging systems are often evaluated and optimized via objective, or task-specific, measures of image quality (IQ) that quantify the performance of an observer on a specific clinically-relevant task. The performance of the Bayesian Ideal Observer (IO) sets an upper limit among all observers, numerical or human, and has been advocated for use as a figure-of-merit (FOM) for evaluating and optimizing medical imaging systems. However, the IO test statistic corresponds to the likelihood ratio that is intractable to compute in the majority of cases. A sampling-based method that employs Markov-chain Monte Carlo (MCMC) techniques was previously proposed to estimate the IO performance. However, current applications of MCMC methods for IO approximation have been limited to a small number of situations where the considered distribution of to-be-imaged objects can be described by a relatively simple stochastic object model (SOM). As such, there remains an important need to extend the domain of applicability of MCMC methods to address a large variety of scenarios where IO-based assessments are needed but the associated SOMs have not been available. In this study, a novel MCMC method that employs a generative adversarial network (GAN)-based SOM, referred to as MCMC-GAN, is described and evaluated. The MCMC-GAN method was quantitatively validated by use of test-cases for which reference solutions were available. The results demonstrate that the MCMC-GAN method can extend the domain of applicability of MCMC methods for conducting IO analyses of medical imaging systems.
6. Shenouda M, Flerlage I, Kaveti A, Giger ML, Armato SG. Assessment of a deep learning model for COVID-19 classification on chest radiographs: a comparison across image acquisition techniques and clinical factors. J Med Imaging (Bellingham) 2023; 10:064504. [PMID: 38162317 PMCID: PMC10753846 DOI: 10.1117/1.jmi.10.6.064504] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2023] [Revised: 11/30/2023] [Accepted: 12/06/2023] [Indexed: 01/03/2024] [Open Access]
Abstract
Purpose The purpose is to assess the performance of a pre-trained deep learning model in the task of classifying between coronavirus disease (COVID)-positive and COVID-negative patients from chest radiographs (CXRs) while considering various image acquisition parameters, clinical factors, and patient demographics. Methods Standard and soft-tissue CXRs of 9860 patients comprised the "original dataset," consisting of training and test sets and were used to train a DenseNet-121 architecture model to classify COVID-19 using three classification algorithms: standard, soft tissue, and a combination of both types of images via feature fusion. A larger more-current test set of 5893 patients (the "current test set") was used to assess the performance of the pretrained model. The current test set contained a larger span of dates, incorporated different variants of the virus and included different immunization statuses. Model performance between the original and current test sets was evaluated using area under the receiver operating characteristic curve (ROC AUC) [95% CI]. Results The model achieved AUC values of 0.67 [0.65, 0.70] for cropped standard images, 0.65 [0.63, 0.67] for cropped soft-tissue images, and 0.67 [0.65, 0.69] for both types of cropped images. These were all significantly lower than the performance of the model on the original test set. Investigations regarding matching the acquisition dates between the test sets (i.e., controlling for virus variants), immunization status, disease severity, and age and sex distributions did not fully explain the discrepancy in performance. Conclusions Several relevant factors were considered to determine whether differences existed in the test sets, including time period of image acquisition, vaccination status, and disease severity. The lower performance on the current test set may have occurred due to model overfitting and a lack of generalizability.
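The "AUC [95% CI]" figures reported in this abstract can be illustrated with a small sketch of an empirical AUC and a percentile bootstrap interval. The abstract does not state how its intervals were computed, so the stratified bootstrap below is an assumption, and the scores and function names are made up for illustration.

```python
import random


def auc_mann_whitney(pos_scores, neg_scores):
    """Empirical AUC: fraction of (positive, negative) pairs ranked
    correctly, counting ties as half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def bootstrap_auc_ci(pos_scores, neg_scores, n_boot=500, seed=0):
    """Percentile 95% CI from a bootstrap stratified by class (assumed scheme)."""
    rng = random.Random(seed)
    aucs = []
    for _ in range(n_boot):
        boot_pos = [rng.choice(pos_scores) for _ in pos_scores]
        boot_neg = [rng.choice(neg_scores) for _ in neg_scores]
        aucs.append(auc_mann_whitney(boot_pos, boot_neg))
    aucs.sort()
    return aucs[int(0.025 * n_boot)], aucs[int(0.975 * n_boot) - 1]


# Hypothetical model outputs for COVID-positive and COVID-negative cases.
positives = [0.90, 0.80, 0.70, 0.60, 0.55]
negatives = [0.50, 0.40, 0.65, 0.30, 0.20]

auc = auc_mann_whitney(positives, negatives)
ci_low, ci_high = bootstrap_auc_ci(positives, negatives)
print(f"AUC = {auc:.2f} [{ci_low:.2f}, {ci_high:.2f}]")
```

Comparing such intervals between an original and a later test set is one simple way to see whether a drop in AUC exceeds sampling variability, although a formal comparison would need a dedicated test.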
Affiliation(s)
- Mena Shenouda
  - The University of Chicago, Committee on Medical Physics, Department of Radiology, Chicago, Illinois, United States
- Aditi Kaveti
  - Stony Brook University, Stony Brook, New York, United States
- Maryellen L. Giger
  - The University of Chicago, Committee on Medical Physics, Department of Radiology, Chicago, Illinois, United States
- Samuel G. Armato
  - The University of Chicago, Committee on Medical Physics, Department of Radiology, Chicago, Illinois, United States
7. Granstedt JL, Zhou W, Anastasio MA. Approximating the Hotelling observer with autoencoder-learned efficient channels for binary signal detection tasks. J Med Imaging (Bellingham) 2023; 10:055501. [PMID: 37767114 PMCID: PMC10520791 DOI: 10.1117/1.jmi.10.5.055501] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2023] [Revised: 08/31/2023] [Accepted: 09/05/2023] [Indexed: 09/29/2023] [Open Access]
Abstract
Purpose The objective assessment of image quality (IQ) has been advocated for the analysis and optimization of medical imaging systems. One method of computing such IQ metrics is through a numerical observer. The Hotelling observer (HO) is the optimal linear observer, but conventional methods for obtaining the HO can become intractable due to large image sizes or insufficient data. Channelized methods are sometimes employed in such circumstances to approximate the HO. The performance of such channelized methods varies, with different methods obtaining superior performance to others depending on the imaging conditions and detection task. A channelized HO method using an autoencoder (AE) is presented and implemented across several tasks to characterize its performance. Approach The process for training an AE is demonstrated to be equivalent to developing a set of channels for approximating the HO. The efficiency of the learned AE-channels is increased by modifying the conventional AE loss function to incorporate task-relevant information. Multiple binary detection tasks involving lumpy and breast phantom backgrounds across varying dataset sizes are considered to evaluate the performance of the proposed method and compare to current state-of-the-art channelized methods. Additionally, the ability of the channelized methods to generalize to images outside of the training dataset is investigated. Results AE-learned channels are demonstrated to have comparable performance with other state-of-the-art channel methods in the detection studies and superior performance in the generalization studies. Incorporating a cleaner estimate of the signal for the detection task is also demonstrated to significantly improve the performance of the proposed method, particularly in datasets with fewer images. Conclusions AEs are demonstrated to be capable of learning efficient channels for the HO. The resulting significant increase in detection performance for small dataset sizes when incorporating a signal prior holds promising implications for future assessments of imaging technologies.
Affiliation(s)
- Jason L. Granstedt
  - University of Illinois Urbana-Champaign, Department of Computer Science, Champaign, Illinois, United States
- Weimin Zhou
  - Shanghai Jiao Tong University, Global Institute of Future Technology, Shanghai, China
- Mark A. Anastasio
  - University of Illinois Urbana-Champaign, Department of Computer Science, Champaign, Illinois, United States
  - University of Illinois Urbana-Champaign, Department of Bioengineering, Champaign, Illinois, United States
8. Ji Y, Whitney HM, Li H, Liu P, Giger ML, Zhang X. Differences in Molecular Subtype Reference Standards Impact AI-based Breast Cancer Classification with Dynamic Contrast-enhanced MRI. Radiology 2023; 307:e220984. [PMID: 36594836 PMCID: PMC10068887 DOI: 10.1148/radiol.220984] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 05/02/2022] [Revised: 10/20/2022] [Accepted: 11/01/2022] [Indexed: 01/04/2023]
Abstract
Background Breast cancer tumors can be identified as different luminal molecular subtypes depending on either immunohistochemical (IHC) staining or St Gallen criteria that includes Ki-67. Purpose To characterize molecular subtypes and understand the impact of disagreement among IHC and St Gallen molecular subtype reference standards on artificial intelligence classification of luminal A and luminal B tumors with use of radiomic features extracted from dynamic contrast-enhanced (DCE) MRI scans. Materials and Methods In this retrospective study, 28 radiomic features previously extracted from DCE-MRI scans of breast tumors imaged between February 2015 and October 2017 were examined in the following groups: (a) tumors classified as luminal A by both reference standards ("agreement"), (b) tumors classified as luminal A by IHC and luminal B by St Gallen ("disagreement"), and (c) tumors classified as luminal B by both ("agreement"). Luminal A or luminal B tumor classification with use of radiomic features was conducted with use of three sets: (a) IHC molecular subtyping, (b) St Gallen molecular subtyping, and (c) agreement tumors. The Kruskal-Wallis test was followed by the Mann-Whitney U test to determine pair-wise differences of radiomic features among agreement and disagreement tumors. Fivefold cross-validation with use of stepwise feature selection and linear discriminant analysis classified tumors in each set, with performance measured with use of area under the receiver operating characteristic curve (AUC). Results A total of 877 breast cancer tumors from 872 women (mean age, 48 years [range, 19-75 years]) were analyzed. Six features (sphericity, irregularity, surface area to volume ratio, variance of radial gradient histogram, sum average, volume of most enhancing voxels) were different (P ≤ .001) among agreement and disagreement tumors. 
With agreement tumors, AUC (median, 0.74 [95% CI: 0.68, 0.80]) was higher than when using tumors subtyped by either reference standard (IHC, 0.66 [0.60, 0.71], P = .003; St Gallen, 0.62 [0.58, 0.67], P = .001). Conclusion Differences in reference standards can hinder artificial intelligence classification performance of luminal molecular subtypes with dynamic contrast-enhanced MRI. © RSNA, 2023 Supplemental material is available for this article. See also the editorial by Bae in this issue.
Affiliation(s)
- Yu Ji
  - Department of Radiology, The Second Hospital of Tianjin Medical University, No. 23 Pingjiang Rd, Hexi District, Tianjin, China 300211
  - National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin, China
- Heather M. Whitney
  - Department of Radiology, The University of Chicago, Chicago, Ill
  - Department of Physics, Wheaton College, Wheaton, Ill
- Hui Li
  - Department of Radiology, The University of Chicago, Chicago, Ill
- Peifang Liu
  - National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin, China
- Maryellen L. Giger
  - Department of Radiology, The University of Chicago, Chicago, Ill
- Xuening Zhang
  - Department of Radiology, The Second Hospital of Tianjin Medical University, No. 23 Pingjiang Rd, Hexi District, Tianjin, China 300211
9. Al-Labadi L, Evans M, Liang Q. ROC Analyses Based on Measuring Evidence Using the Relative Belief Ratio. Entropy (Basel, Switzerland) 2022; 24:1710. [PMID: 36554115 PMCID: PMC9777999 DOI: 10.3390/e24121710] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2022] [Revised: 11/18/2022] [Accepted: 11/19/2022] [Indexed: 06/17/2023]
Abstract
ROC (Receiver Operating Characteristic) analyses are considered under a variety of assumptions concerning the distributions of a measurement X in two populations. These include the binormal model as well as nonparametric models where little is assumed about the form of distributions. The methodology is based on a characterization of statistical evidence which is dependent on the specification of prior distributions for the unknown population distributions as well as for the relevant prevalence w of the disease in a given population. In all cases, elicitation algorithms are provided to guide the selection of the priors. Inferences are derived for the AUC (Area Under the Curve), the cutoff c used for classification as well as the error characteristics used to assess the quality of the classification.
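For readers unfamiliar with the binormal model mentioned in this abstract, the sketch below computes the AUC and one ROC operating point under the standard binormal assumptions: the measurement X is normal in each population, and larger X indicates disease. This is the classical closed form, not the relative-belief inference the paper develops, and the function names and parameter values are illustrative.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def binormal_auc(mu0, sd0, mu1, sd1):
    """AUC = Phi((mu1 - mu0) / sqrt(sd0^2 + sd1^2)) for nondiseased
    X ~ N(mu0, sd0^2) and diseased X ~ N(mu1, sd1^2)."""
    return normal_cdf((mu1 - mu0) / math.sqrt(sd0 ** 2 + sd1 ** 2))


def binormal_roc_point(cutoff, mu0, sd0, mu1, sd1):
    """(FPR, TPR) when X > cutoff is classified as diseased."""
    fpr = 1.0 - normal_cdf((cutoff - mu0) / sd0)
    tpr = 1.0 - normal_cdf((cutoff - mu1) / sd1)
    return fpr, tpr


# Illustrative parameters: unit-variance populations one unit apart.
auc = binormal_auc(0.0, 1.0, 1.0, 1.0)
fpr, tpr = binormal_roc_point(0.5, 0.0, 1.0, 1.0, 1.0)
print(f"AUC = {auc:.4f}, cutoff 0.5 -> FPR = {fpr:.4f}, TPR = {tpr:.4f}")
```

Moving the cutoff c trades false positives against true positives along the same curve, which is why the abstract treats the choice of c as a separate inference problem from the AUC.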
Affiliation(s)
- Luai Al-Labadi
  - Department of Mathematical and Computational Sciences, University of Toronto Mississauga, Mississauga, ON L5L 1C6, Canada
- Michael Evans
  - Department of Statistical Sciences, University of Toronto, Toronto, ON M5S 3G3, Canada
- Qiaoyu Liang
  - Department of Statistical Sciences, University of Toronto, Toronto, ON M5S 3G3, Canada
10. Martínez-Camblor P. The fundamental role of density functions in the binary classification problem. J Stat Comput Sim 2022. [DOI: 10.1080/00949655.2022.2051026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
Affiliation(s)
- Pablo Martínez-Camblor
  - Department of Anesthesiology, Dartmouth-Hitchcock Medical Center, Lebanon, NH, USA
  - Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth, Lebanon, NH, USA
11. Ghebremichael M, Michael H. Comparison of the binormal and Lehman receiver operating characteristic curves. Commun Stat-Simul C 2022. [DOI: 10.1080/03610918.2022.2032159] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022]
Affiliation(s)
- Musie Ghebremichael
  - Department of Medicine, Harvard Medical School and Ragon Institute, Cambridge, Massachusetts, USA
- Haben Michael
  - Department of Mathematics, University of Massachusetts, Amherst, Massachusetts, USA
12. Whitney HM, Li H, Ji Y, Liu P, Giger ML. Multi-Stage Harmonization for Robust AI across Breast MR Databases. Cancers (Basel) 2021; 13:cancers13194809. [PMID: 34638294 PMCID: PMC8508003 DOI: 10.3390/cancers13194809] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/02/2021] [Revised: 09/16/2021] [Accepted: 09/18/2021] [Indexed: 12/22/2022] [Open Access]
Abstract
Simple Summary Batch harmonization of radiomic features extracted from magnetic resonance images of breast lesions from two databases was applied to an artificial intelligence/machine learning classification workflow. Training and independent test sets from the two databases, as well as the combination of them, were used in pre-harmonization and post-harmonization forms to investigate the generalizability of performance in the task of distinguishing between malignant and benign lesions. Most training and independent test scenarios were statistically equivalent, demonstrating that batch harmonization with feature selection harmonization can potentially develop generalizable classification models. Abstract Radiomic features extracted from medical images may demonstrate a batch effect when cases come from different sources. We investigated classification performance using training and independent test sets drawn from two sources using both pre-harmonization and post-harmonization features. In this retrospective study, a database of thirty-two radiomic features, extracted from DCE-MR images of breast lesions after fuzzy c-means segmentation, was collected. There were 944 unique lesions in Database A (208 benign lesions, 736 cancers) and 1986 unique lesions in Database B (481 benign lesions, 1505 cancers). The lesions from each database were divided by year of image acquisition into training and independent test sets, separately by database and in combination. ComBat batch harmonization was conducted on the combined training set to minimize the batch effect on eligible features by database. The empirical Bayes estimates from the feature harmonization were applied to the eligible features of the combined independent test set. The training sets (A, B, and combined) were then used in training linear discriminant analysis classifiers after stepwise feature selection. The classifiers were then run on the A, B, and combined independent test sets. 
Classification performance was compared using pre-harmonization features to post-harmonization features, including their corresponding feature selection, evaluated using the area under the receiver operating characteristic curve (AUC) as the figure of merit. Four out of five training and independent test scenarios demonstrated statistically equivalent classification performance when compared pre- and post-harmonization. These results demonstrate that translation of machine learning techniques with batch data harmonization can potentially yield generalizable models that maintain classification performance.
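The train-then-apply pattern described in this abstract (estimate the batch correction on the combined training set, then reuse those estimates on the independent test set) can be sketched as follows. This is a simplified location-and-scale adjustment only, not the ComBat method itself: it omits the empirical Bayes shrinkage and covariate handling, and the function names `fit_batch_params` and `apply_batch_params` and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def fit_batch_params(features, batch):
    """Estimate per-batch and pooled mean/std for each feature (training set only)."""
    features = np.asarray(features, dtype=float)
    batch = np.asarray(batch)
    per_batch = {}
    for b in np.unique(batch):
        rows = features[batch == b]
        per_batch[str(b)] = (rows.mean(axis=0), rows.std(axis=0, ddof=1))
    return {
        "pooled_mean": features.mean(axis=0),
        "pooled_std": features.std(axis=0, ddof=1),
        "per_batch": per_batch,
    }

def apply_batch_params(features, batch, params):
    """Map each batch onto the pooled location and scale using stored estimates."""
    features = np.asarray(features, dtype=float)
    batch = np.asarray(batch)
    out = features.copy()  # rows from unknown batches are left unchanged
    for b, (mean_b, std_b) in params["per_batch"].items():
        mask = batch == b
        standardized = (features[mask] - mean_b) / std_b
        out[mask] = standardized * params["pooled_std"] + params["pooled_mean"]
    return out

# Synthetic stand-ins for two databases with different feature distributions.
rng = np.random.default_rng(0)
train = np.vstack([rng.normal(0.0, 1.0, (200, 3)), rng.normal(2.0, 3.0, (200, 3))])
train_batch = np.array(["A"] * 200 + ["B"] * 200)
test = np.vstack([rng.normal(0.0, 1.0, (50, 3)), rng.normal(2.0, 3.0, (50, 3))])
test_batch = np.array(["A"] * 50 + ["B"] * 50)

params = fit_batch_params(train, train_batch)               # fit on training data only
train_h = apply_batch_params(train, train_batch, params)
test_h = apply_batch_params(test, test_batch, params)       # reuse training estimates
```

Keeping the fit and apply steps separate means the test set never influences the correction, which is what lets it remain an independent test set.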
Affiliation(s)
- Heather M. Whitney
  - Department of Radiology, The University of Chicago, Chicago, IL 60637, USA; (H.L.); (Y.J.)
  - Department of Physics, Wheaton College, Wheaton, IL 60187, USA
  - Correspondence: (H.M.W.); (M.L.G.)
- Hui Li
  - Department of Radiology, The University of Chicago, Chicago, IL 60637, USA; (H.L.); (Y.J.)
- Yu Ji
  - Department of Radiology, The University of Chicago, Chicago, IL 60637, USA; (H.L.); (Y.J.)
  - Tianjin Medical University Cancer Institute and Hospital, Tianjin 300060, China
- Peifang Liu
  - Tianjin Medical University Cancer Institute and Hospital, Tianjin 300060, China
- Maryellen L. Giger
  - Department of Radiology, The University of Chicago, Chicago, IL 60637, USA; (H.L.); (Y.J.)
  - Correspondence: (H.M.W.); (M.L.G.)
|
13
|
Hu Q, Whitney HM, Li H, Ji Y, Liu P, Giger ML. Improved Classification of Benign and Malignant Breast Lesions Using Deep Feature Maximum Intensity Projection MRI in Breast Cancer Diagnosis Using Dynamic Contrast-enhanced MRI. Radiol Artif Intell 2021; 3:e200159. [PMID: 34235439 PMCID: PMC8231792 DOI: 10.1148/ryai.2021200159] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 07/02/2020] [Revised: 02/04/2021] [Accepted: 02/09/2021] [Indexed: 04/16/2023]
Abstract
PURPOSE To develop a deep transfer learning method that incorporates four-dimensional (4D) information in dynamic contrast-enhanced (DCE) MRI to classify benign and malignant breast lesions. MATERIALS AND METHODS The retrospective dataset is composed of 1990 distinct lesions (1494 malignant and 496 benign) from 1979 women (mean age, 47 years ± 10). Lesions were split into a training and validation set of 1455 lesions (acquired in 2015-2016) and an independent test set of 535 lesions (acquired in 2017). Features were extracted from a convolutional neural network (CNN), and lesions were classified as benign or malignant using support vector machines. Volumetric information was collapsed into two dimensions by taking the maximum intensity projection (MIP) at the image level or feature level within the CNN architecture. Performances were evaluated using the area under the receiver operating characteristic curve (AUC) as the figure of merit and were compared using the DeLong test. RESULTS The image MIP and feature MIP methods yielded AUCs of 0.91 (95% CI: 0.87, 0.94) and 0.93 (95% CI: 0.91, 0.96), respectively, for the independent test set. The feature MIP method achieved higher performance than the image MIP method (∆AUC 95% CI: 0.003, 0.051; P = .03). CONCLUSION Incorporating 4D information in DCE MRI by MIP of features in deep transfer learning demonstrated superior classification performance compared with using MIP images as input in the task of distinguishing between benign and malignant breast lesions. Keywords: Breast, Computer Aided Diagnosis (CAD), Convolutional Neural Network (CNN), MR-Dynamic Contrast Enhanced, Supervised learning, Support vector machines (SVM), Transfer learning, Volume Analysis © RSNA, 2021.
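The difference between taking the MIP at the image level and at the feature level can be illustrated with a small NumPy sketch. The `toy_extractor` function is a hypothetical stand-in for the CNN feature extractor, and the array shapes are assumptions; the sketch only shows where the maximum is taken, not the study's actual pipeline.

```python
import numpy as np

def toy_extractor(image):
    """Hypothetical stand-in for a CNN: maps one 2D image to a feature vector."""
    return np.array([image.mean(), image.max(), image.std()])

def image_level_mip(volume):
    """Collapse a (slices, height, width) volume first, then extract features once."""
    mip_image = volume.max(axis=0)
    return toy_extractor(mip_image)

def feature_level_mip(volume):
    """Extract features from every slice, then take the maximum of each feature."""
    per_slice = np.stack([toy_extractor(s) for s in volume])  # (slices, n_features)
    return per_slice.max(axis=0)

# Tiny synthetic volume: 2 slices of 2 x 2 pixels.
volume = np.arange(8, dtype=float).reshape(2, 2, 2)
image_mip_features = image_level_mip(volume)
feature_mip_features = feature_level_mip(volume)
```

With a real network the two routes generally give different vectors, because pixel-wise maximization discards slice-specific context before the features are computed.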
|
14
|
Martínez-Camblor P, Pérez-Fernández S, Díaz-Coto S. The area under the generalized receiver-operating characteristic curve. Int J Biostat 2021; 18:293-306. [PMID: 33761578 DOI: 10.1515/ijb-2020-0091] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/13/2020] [Accepted: 03/01/2021] [Indexed: 12/22/2022]
Abstract
The receiver operating-characteristic (ROC) curve is a well-known graphical tool routinely used for evaluating the discriminatory ability of continuous markers, referring to a binary characteristic. The area under the curve (AUC) has been proposed as a summarized accuracy index. Higher values of the marker are usually associated with higher probabilities of having the characteristic under study. However, there are other situations where both higher and lower marker scores are associated with a positive result. The generalized ROC (gROC) curve has been proposed as a proper extension of the ROC curve to fit these situations. Of course, the corresponding area under the gROC curve, gAUC, has also been introduced as a global measure of the classification capacity. In this paper, we study in depth the gAUC properties. The weak convergence of its empirical estimator is provided while deriving an explicit and useful expression for the asymptotic variance. We also obtain the expression for the asymptotic covariance of related gAUCs and propose a non-parametric procedure to compare them. The finite-samples behavior is studied through Monte Carlo simulations under different scenarios, presenting a real-world problem in order to illustrate its practical application. The R code functions implementing the procedures are provided as Supplementary Material.
Affiliation(s)
- Pablo Martínez-Camblor
  - Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth, 7 Lebanon Street, Suite 309, Hinman Box 7261, Hanover, NH 03755, USA
- Susana Díaz-Coto
  - Department of Statistics, Oviedo University, Oviedo, Asturies, Spain
|
15
|
Hu Q, Drukker K, Giger ML. Role of standard and soft tissue chest radiography images in deep-learning-based early diagnosis of COVID-19. J Med Imaging (Bellingham) 2021; 8:014503. [PMID: 34595245 PMCID: PMC8478672 DOI: 10.1117/1.jmi.8.s1.014503] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 05/26/2021] [Accepted: 09/13/2021] [Indexed: 12/24/2022] Open
Abstract
Purpose: We propose a deep learning method for the automatic diagnosis of COVID-19 at patient presentation on chest radiography (CXR) images and investigate the role of standard and soft tissue CXR in this task. Approach: The dataset consisted of the first CXR exams of 9860 patients acquired within 2 days after their initial reverse transcription polymerase chain reaction tests for the SARS-CoV-2 virus, 1523 (15.5%) of whom tested positive and 8337 (84.5%) of whom tested negative for COVID-19. A sequential transfer learning strategy was employed to fine-tune a convolutional neural network in phases on increasingly specific and complex tasks. The COVID-19 positive/negative classification was performed on standard images, soft tissue images, and both combined via feature fusion. A U-Net variant was used to segment and crop the lung region from each image prior to performing classification. Classification performances were evaluated and compared on a held-out test set of 1972 patients using the area under the receiver operating characteristic curve (AUC) and the DeLong test. Results: Using full standard, cropped standard, cropped soft tissue, and both types of cropped CXR yielded AUC values of 0.74 [0.70, 0.77], 0.76 [0.73, 0.79], 0.73 [0.70, 0.76], and 0.78 [0.74, 0.81], respectively. Using soft tissue images significantly underperformed standard images, and using both types of CXR failed to significantly outperform using standard images alone. Conclusions: The proposed method was able to automatically diagnose COVID-19 at patient presentation with promising performance, and the inclusion of soft tissue images did not result in a significant performance improvement.
Affiliation(s)
- Qiyuan Hu
  - The University of Chicago, Committee on Medical Physics, Department of Radiology, Chicago, Illinois, United States
- Karen Drukker
  - The University of Chicago, Committee on Medical Physics, Department of Radiology, Chicago, Illinois, United States
- Maryellen L. Giger
  - The University of Chicago, Committee on Medical Physics, Department of Radiology, Chicago, Illinois, United States
|
16
|
Fuhrman JD, Chen J, Dong Z, Lure FYM, Luo Z, Giger ML. Cascaded deep transfer learning on thoracic CT in COVID-19 patients treated with steroids. J Med Imaging (Bellingham) 2021; 8:014501. [PMID: 33415179 PMCID: PMC7773028 DOI: 10.1117/1.jmi.8.s1.014501] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/09/2020] [Accepted: 11/04/2020] [Indexed: 12/15/2022] Open
Abstract
Purpose: Given the recent COVID-19 pandemic and its stress on global medical resources, presented here is the development of a machine intelligent method for thoracic computed tomography (CT) to inform management of patients on steroid treatment. Approach: Transfer learning has demonstrated strong performance when applied to medical imaging, particularly when only limited data are available. A cascaded transfer learning approach extracted quantitative features from thoracic CT sections using a fine-tuned VGG19 network. The extracted slice features were axially pooled to provide a CT-scan-level representation of thoracic characteristics and a support vector machine was trained to distinguish between patients who required steroid administration and those who did not, with performance evaluated through receiver operating characteristic (ROC) curve analysis. Least-squares fitting was used to assess temporal trends using the transfer learning approach, providing a preliminary method for monitoring disease progression. Results: In the task of identifying patients who should receive steroid treatments, this approach yielded an area under the ROC curve of 0.85 ± 0.10 and demonstrated significant separation between patients who received steroids and those who did not. Furthermore, temporal trend analysis of the prediction score matched expected progression during hospitalization for both groups, with separation at early timepoints prior to convergence near the end of the duration of hospitalization. Conclusions: The proposed cascade deep learning method has strong clinical potential for informing clinical decision-making and monitoring patient treatment.
Affiliation(s)
- Jordan D. Fuhrman
  - The University of Chicago, Committee on Medical Physics, Department of Radiology, Chicago, United States
- Jun Chen
  - Renmin Hospital of Wuhan University, Department of Radiology, Wuhan, China
- Zegang Dong
  - MS Technologies Corp., Rockville, Maryland, United States
- Zhe Luo
  - Fudan University, Zhongshan Hospital, Department of Critical Care Medicine, Shanghai, China
  - Fudan University, Zhongshan Hospital, Department of Critical Care Medicine, Xiamen Branch, Xiamen, China
- Maryellen L. Giger
  - The University of Chicago, Committee on Medical Physics, Department of Radiology, Chicago, United States
|
17
|
Jiang Y. Receiver Operating Characteristic (ROC) Analysis of Image Search-and-Localize Tasks. Acad Radiol 2020; 27:1742-1750. [PMID: 32033862 DOI: 10.1016/j.acra.2019.12.020] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/20/2019] [Revised: 12/18/2019] [Accepted: 12/20/2019] [Indexed: 10/25/2022]
Abstract
RATIONALE AND OBJECTIVES Receiver operating characteristic (ROC) analysis for the common image search-and-localize task, in which readers search an image for a lesion or lesions without knowing a priori whether any exist, has been studied for over four decades. However, a satisfactory solution seems elusive. MATERIALS AND METHODS We show that the ROC curve predictive of clinical outcomes where readers are penalized appropriately for not correctly localizing known lesions cannot be obtained because it is a missing data problem. Further, this ROC curve is between the case-based ROC curve where readers are not penalized and the lesion-based ROC curve where penalty applies. Moreover, the lesion-based ROC curve is the LROC curve proposed by Starr et al. We show maximum-likelihood (ML) estimation of the LROC curve, validation of this procedure with Monte Carlo simulations, and its application to reader ROC datasets. RESULTS Monte Carlo simulations validated ML estimation of area under the LROC curve (AUC) and its variance. Example applications showed that the ML estimate of the LROC curve fits experimental datasets. CONCLUSION The ROC curve predictive of clinical performance cannot be estimated from reader ROC data alone because it is a missing data problem, and is between the case-based ROC curve where readers are not penalized for not correctly identifying known lesions and the lesion-based ROC curve where penalty applies. The lesion-based ROC curve is the LROC curve proposed by Starr et al. and can be estimated via ML estimation.
|
18
|
Zhou W, Li H, Anastasio MA. Approximating the Ideal Observer for Joint Signal Detection and Localization Tasks by use of Supervised Learning Methods. IEEE Trans Med Imaging 2020; 39:3992-4000. [PMID: 32746143 PMCID: PMC7768793 DOI: 10.1109/tmi.2020.3009022] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/11/2023]
Abstract
Medical imaging systems are commonly assessed and optimized by use of objective measures of image quality (IQ). The Ideal Observer (IO) performance has been advocated to provide a figure-of-merit for use in assessing and optimizing imaging systems because the IO sets an upper performance limit among all observers. When joint signal detection and localization tasks are considered, the IO that employs a modified generalized likelihood ratio test maximizes observer performance as characterized by the localization receiver operating characteristic (LROC) curve. Computations of likelihood ratios are analytically intractable in the majority of cases. Therefore, sampling-based methods that employ Markov-Chain Monte Carlo (MCMC) techniques have been developed to approximate the likelihood ratios. However, the applications of MCMC methods have been limited to relatively simple object models. Supervised learning-based methods that employ convolutional neural networks have been recently developed to approximate the IO for binary signal detection tasks. In this paper, the ability of supervised learning-based methods to approximate the IO for joint signal detection and localization tasks is explored. Both background-known-exactly and background-known-statistically signal detection and localization tasks are considered. The considered object models include a lumpy object model and a clustered lumpy model, and the considered measurement noise models include Laplacian noise, Gaussian noise, and mixed Poisson-Gaussian noise. The LROC curves produced by the supervised learning-based method are compared to those produced by the MCMC approach or analytical computation when feasible. The potential utility of the proposed method for computing objective measures of IQ for optimizing imaging system performance is explored.
|
19
|
Jiang Y, Edwards AV, Newstead GM. Artificial Intelligence Applied to Breast MRI for Improved Diagnosis. Radiology 2020; 298:38-46. [PMID: 33078996 DOI: 10.1148/radiol.2020200292] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Indexed: 01/19/2023]
Abstract
Background Recognition of salient MRI morphologic and kinetic features of various malignant tumor subtypes and benign diseases, either visually or with artificial intelligence (AI), allows radiologists to improve diagnoses that may improve patient treatment. Purpose To evaluate whether the diagnostic performance of radiologists in the differentiation of cancer from noncancer at dynamic contrast material-enhanced (DCE) breast MRI is improved when using an AI system compared with conventionally available software. Materials and Methods In a retrospective clinical reader study, images from breast DCE MRI examinations were interpreted by 19 breast imaging radiologists from eight academic and 11 private practices. Readers interpreted each examination twice. In the "first read," they were provided with conventionally available computer-aided evaluation software, including kinetic maps. In the "second read," they were also provided with AI analytics through computer-aided diagnosis software. Reader diagnostic performance was evaluated with receiver operating characteristic (ROC) analysis, with the area under the ROC curve (AUC) as a figure of merit in the task of distinguishing between malignant and benign lesions. The primary study end point was the difference in AUC between the first-read and the second-read conditions. Results One hundred eleven women (mean age, 52 years ± 13 [standard deviation]) were evaluated with a total of 111 breast DCE MRI examinations (54 malignant and 57 nonmalignant lesions). The average AUC of all readers improved from 0.71 to 0.76 (P = .04) when using the AI system. The average sensitivity improved when Breast Imaging Reporting and Data System (BI-RADS) category 3 was used as the cut point (from 90% to 94%; 95% confidence interval [CI] for the change: 0.8%, 7.4%) but not when using BI-RADS category 4a (from 80% to 85%; 95% CI: -0.9%, 11%). 
The average specificity showed no difference when using either BI-RADS category 4a or category 3 as the cut point (52% and 52% [95% CI: -7.3%, 6.0%], and from 29% to 28% [95% CI: -6.4%, 4.3%], respectively). Conclusion Use of an artificial intelligence system improves radiologists' performance in the task of differentiating benign and malignant MRI breast lesions. © RSNA, 2020 Online supplemental material is available for this article. See also the editorial by Krupinski in this issue.
Affiliation(s)
- Yulei Jiang
  - From the Department of Radiology, University of Chicago, 5841 S Maryland Ave, MC2026, Chicago, IL 60637
- Alexandra V Edwards
  - From the Department of Radiology, University of Chicago, 5841 S Maryland Ave, MC2026, Chicago, IL 60637
- Gillian M Newstead
  - From the Department of Radiology, University of Chicago, 5841 S Maryland Ave, MC2026, Chicago, IL 60637
|
20
|
Yousef WA. Prudence when assuming normality: An advice for machine learning practitioners. Pattern Recognit Lett 2020. [DOI: 10.1016/j.patrec.2020.06.026] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/16/2022]
|
21
|
Hu Q, Whitney HM, Giger ML. Radiomics methodology for breast cancer diagnosis using multiparametric magnetic resonance imaging. J Med Imaging (Bellingham) 2020; 7:044502. [PMID: 32864390 PMCID: PMC7444714 DOI: 10.1117/1.jmi.7.4.044502] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 04/10/2020] [Accepted: 07/29/2020] [Indexed: 12/30/2022] Open
Abstract
Purpose: This study aims to develop and compare human-engineered radiomics methodologies that use multiparametric magnetic resonance imaging (mpMRI) to diagnose breast cancer. Approach: The dataset comprises clinical multiparametric MR images of 852 unique lesions from 612 patients. Each MR study included a dynamic contrast-enhanced (DCE)-MRI sequence and a T2-weighted (T2w) MRI sequence, and a subset of 389 lesions were also imaged with a diffusion-weighted imaging (DWI) sequence. Lesions were automatically segmented using the fuzzy C-means algorithm. Radiomic features were extracted from each MRI sequence. Two approaches to utilizing multiparametric information, feature fusion and classifier fusion, were investigated. A support vector machine classifier was trained for each method to differentiate between benign and malignant lesions. Area under the receiver operating characteristic curve (AUC) was used to evaluate and compare diagnostic performance. Analyses were first performed on the entire dataset and then on the subset that was imaged using the three-sequence protocol. Results: When using the full dataset, the single-parametric classifiers yielded the following AUCs and 95% confidence intervals: AUC_DCE = 0.84 [0.82, 0.87], AUC_T2w = 0.83 [0.80, 0.86], and AUC_DWI = 0.69 [0.62, 0.75]. The two multiparametric classifiers both yielded AUCs of 0.87 [0.84, 0.89] and significantly outperformed all single-parametric classifiers. When using the three-sequence subset, the mpMRI classifiers' performances significantly decreased. Conclusions: The proposed mpMRI radiomics methods can improve the performance of computer-aided diagnostics for breast cancer and handle missing sequences in the imaging protocol.
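The two fusion strategies named in this abstract differ in where the sequences are combined. A minimal sketch follows, assuming that feature fusion means concatenating the per-sequence feature vectors before a single classifier, and that classifier fusion means averaging the outputs of per-sequence classifiers over whichever sequences are available. The averaging rule, the function names, and all numbers are illustrative assumptions, not details taken from the study.

```python
import numpy as np

def feature_fusion(feature_sets):
    """Concatenate per-sequence feature matrices (lesions x features) column-wise."""
    return np.concatenate(feature_sets, axis=1)

def classifier_fusion(score_sets):
    """Average per-sequence classifier scores, skipping missing (NaN) sequences."""
    scores = np.column_stack(score_sets)  # lesions x sequences
    return np.nanmean(scores, axis=1)

# Feature fusion: 4 lesions with 5 DCE features and 3 T2w features each.
dce_features = np.ones((4, 5))
t2w_features = np.zeros((4, 3))
fused_features = feature_fusion([dce_features, t2w_features])  # shape (4, 8)

# Classifier fusion: made-up per-sequence scores; DWI is missing for two lesions.
dce_scores = np.array([0.9, 0.2, 0.6, 0.4])
t2w_scores = np.array([0.7, 0.4, 0.8, 0.2])
dwi_scores = np.array([np.nan, 0.3, np.nan, 0.6])
fused_scores = classifier_fusion([dce_scores, t2w_scores, dwi_scores])
```

Averaging only over available scores is one simple way a classifier-fusion scheme can tolerate a missing sequence, whereas concatenated features need every sequence present or some form of imputation.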
Affiliation(s)
- Qiyuan Hu
  - University of Chicago, Department of Radiology, Committee on Medical Physics, Chicago, Illinois, United States
- Heather M. Whitney
  - University of Chicago, Department of Radiology, Committee on Medical Physics, Chicago, Illinois, United States
  - Wheaton College, Department of Physics, Wheaton, Illinois, United States
- Maryellen L. Giger
  - University of Chicago, Department of Radiology, Committee on Medical Physics, Chicago, Illinois, United States
|
22
|
A deep learning methodology for improved breast cancer diagnosis using multiparametric MRI. Sci Rep 2020; 10:10536. [PMID: 32601367 PMCID: PMC7324398 DOI: 10.1038/s41598-020-67441-4] [Citation(s) in RCA: 52] [Impact Index Per Article: 13.0] [Received: 09/30/2019] [Accepted: 06/05/2020] [Indexed: 12/21/2022] Open
Abstract
Multiparametric magnetic resonance imaging (mpMRI) has been shown to improve radiologists’ performance in the clinical diagnosis of breast cancer. This machine learning study develops a deep transfer learning computer-aided diagnosis (CADx) methodology to diagnose breast cancer using mpMRI. The retrospective study included clinical MR images of 927 unique lesions from 616 women. Each MR study included a dynamic contrast-enhanced (DCE)-MRI sequence and a T2-weighted (T2w) MRI sequence. A pretrained convolutional neural network (CNN) was used to extract features from the DCE and T2w sequences, and support vector machine classifiers were trained on the CNN features to distinguish between benign and malignant lesions. Three methods that integrate the sequences at different levels (image fusion, feature fusion, and classifier fusion) were investigated. Classification performance was evaluated using the receiver operating characteristic (ROC) curve and compared using the DeLong test. The single-sequence classifiers yielded areas under the ROC curves (AUCs) [95% confidence intervals] of AUC_DCE = 0.85 [0.82, 0.88] and AUC_T2w = 0.78 [0.75, 0.81]. The multiparametric schemes yielded AUC_ImageFusion = 0.85 [0.82, 0.88], AUC_FeatureFusion = 0.87 [0.84, 0.89], and AUC_ClassifierFusion = 0.86 [0.83, 0.88]. The feature fusion method statistically significantly outperformed using DCE alone (P < 0.001). In conclusion, the proposed deep transfer learning CADx method for mpMRI may improve diagnostic performance by reducing the false positive rate and improving the positive predictive value in breast imaging interpretation.
|
23
|
Blangero Y, Rabilloud M, Laurent-Puig P, Le Malicot K, Lepage C, Ecochard R, Taieb J, Subtil F. The area between ROC curves, a non-parametric method to evaluate a biomarker for patient treatment selection. Biom J 2020; 62:1476-1493. [PMID: 32346912 DOI: 10.1002/bimj.201900171] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 06/05/2019] [Revised: 09/26/2019] [Accepted: 01/10/2020] [Indexed: 12/19/2022]
Abstract
Treatment selection markers are generally sought for when the benefit of an innovative treatment in comparison with a reference treatment is considered, and this benefit is suspected to vary according to the characteristics of the patients. Classically, such quantitative markers are detected through testing a marker-by-treatment interaction in a parametric regression model. Most alternative methods rely on modeling the risk of event occurrence in each treatment arm or the benefit of the innovative treatment over the marker values, but with assumptions that may be difficult to verify. Herein, a simple non-parametric approach is proposed to detect and assess the general capacity of a quantitative marker for treatment selection when no overall difference in efficacy could be demonstrated between two treatments in a clinical trial. This graphical method relies on the area between treatment-arm-specific receiver operating characteristic curves (ABC), which reflects the treatment selection capacity of the marker. A simulation study assessed the inference properties of the ABC estimator and compared them with other parametric and non-parametric indicators. The simulations showed that the estimate of the ABC had low bias, power comparable to parametric indicators, and that its confidence interval had a good coverage probability (better than the other non-parametric indicator in some cases). Thus, the ABC is a good alternative to parametric indicators. The ABC method was applied to data of the PETACC-8 trial that investigated FOLFOX4 versus FOLFOX4 + cetuximab in stage III colon adenocarcinoma. It enabled the detection of a treatment selection marker: the DDR2 gene.
Affiliation(s)
- Yoann Blangero
  - Service de Biostatistique, Pôle Santé Publique, Hospices Civils de Lyon, Lyon, France; Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR 5558, Villeurbanne, France
- Muriel Rabilloud
  - Service de Biostatistique, Pôle Santé Publique, Hospices Civils de Lyon, Lyon, France; Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR 5558, Villeurbanne, France
- Pierre Laurent-Puig
  - Université Paris Descartes, Sorbonne Paris Cité, Paris, France; Service de génétique, Hôpital Européen Georges Pompidou, Paris, France; INSERM UMR-S 1147, Paris, France
- Côme Lepage
  - Fédération Francophone de Cancérologie Digestive, Dijon, France; Hépato-gastroentérologie et cancérologie digestive, Centre hospitalier universitaire Dijon Bourgogne, Dijon, France; INSERM U 866, Dijon, France
- René Ecochard
  - Service de Biostatistique, Pôle Santé Publique, Hospices Civils de Lyon, Lyon, France; Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR 5558, Villeurbanne, France
- Julien Taieb
  - Université Paris Descartes, Sorbonne Paris Cité, Paris, France; Chirurgie digestive générale et cancérologique, Hôpital Européen Georges Pompidou, Paris, France
- Fabien Subtil
  - Service de Biostatistique, Pôle Santé Publique, Hospices Civils de Lyon, Lyon, France; Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR 5558, Villeurbanne, France
|
24
|
Jang EJ, Nandram B, Ko Y, Kim DH. Small area estimation of receiver operating characteristic curves for ordinal data under stochastic ordering. Stat Med 2020; 39:1514-1528. [PMID: 32017182 DOI: 10.1002/sim.8493] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 08/15/2018] [Revised: 08/31/2019] [Accepted: 11/21/2019] [Indexed: 11/11/2022]
Abstract
There has been a recent increase in the diagnosis of diseases through radiographic images such as x-rays, magnetic resonance imaging, and computed tomography. The outcome of a radiological diagnostic test is often in the form of discrete ordinal data, and we usually summarize the performance of the diagnostic test using the receiver operating characteristic (ROC) curve and the area under the curve (AUC). The ROC curve will be concave and called proper when the outcomes of the diagnostic test in the actually positive subjects are higher than in the actually negative subjects. The diagnostic test for disease detection is clinically useful when a ROC curve is proper. In this study, we develop a hierarchical Bayesian model to estimate the proper ROC curve and AUC using stochastic ordering in several domains when the outcome of the diagnostic test is discrete ordinal data and compare it with the model without stochastic ordering. The model without stochastic ordering can estimate the improper ROC curve with a nonconcave shape or a hook when the true ROC curve of the population is a proper ROC curve. Therefore, the model with stochastic ordering is preferable over the model without stochastic ordering to estimate the proper ROC curve with clinical usefulness for ordinal data.
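For ordinal test outcomes, the empirical ROC operating points, the trapezoidal AUC, and a concavity check for a "proper" curve can be sketched directly from rating-category counts. This is a plain empirical calculation, not the hierarchical Bayesian model of the paper; the category counts and function names below are made up for illustration.

```python
import numpy as np

def ordinal_roc(neg_counts, pos_counts):
    """ROC points from counts per rating category (least to most suspicious)."""
    neg = np.asarray(neg_counts, dtype=float)
    pos = np.asarray(pos_counts, dtype=float)
    # Lower the threshold one category at a time, starting from the strictest.
    fpr = np.concatenate([[0.0], np.cumsum(neg[::-1]) / neg.sum()])
    tpr = np.concatenate([[0.0], np.cumsum(pos[::-1]) / pos.sum()])
    return fpr, tpr

def trapezoid_auc(fpr, tpr):
    """Area under the piecewise-linear empirical ROC curve."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

def is_concave(fpr, tpr, tol=1e-12):
    """True if segment slopes never increase from left to right (no hook)."""
    dx, dy = np.diff(fpr), np.diff(tpr)
    keep = (dx > 0) | (dy > 0)  # drop zero-length segments
    dx, dy = dx[keep], dy[keep]
    # slope_i >= slope_(i+1), cross-multiplied so vertical segments need no division
    return bool(np.all(dy[:-1] * dx[1:] >= dy[1:] * dx[:-1] - tol))

# Hypothetical 5-category ratings for actually negative and actually positive cases.
neg = [30, 20, 10, 5, 5]
pos_proper = [5, 5, 10, 20, 30]
pos_hooked = [20, 5, 5, 10, 30]  # many positives in the lowest category

fpr_p, tpr_p = ordinal_roc(neg, pos_proper)
fpr_h, tpr_h = ordinal_roc(neg, pos_hooked)
auc_proper = trapezoid_auc(fpr_p, tpr_p)
proper_is_concave = is_concave(fpr_p, tpr_p)
hooked_is_concave = is_concave(fpr_h, tpr_h)
```

The second set of counts produces a curve whose final segment is steeper than the one before it, which is the kind of nonconcave shape or hook the abstract describes for an improper curve.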
Affiliation(s)
- Eun Jin Jang
  - Department of Information Statistics, Andong National University, Andong, South Korea
- Balgobin Nandram
  - Department of Mathematical Sciences, Worcester Polytechnic Institute, Worcester, Massachusetts
- Yousun Ko
  - Program in Biomedical Radiation Sciences, Department of Transdisciplinary Studies, Graduate School of Convergence Science and Technology, Seoul National University, Seoul, South Korea; Biomedical Research Center, Asan Institute for Life Sciences, Asan Medical Center, Seoul, South Korea
- Dal Ho Kim
  - Department of Statistics, Kyungpook National University, Daegu, South Korea
|
25
|
McKinney SM, Sieniek M, Godbole V, Godwin J, Antropova N, Ashrafian H, Back T, Chesus M, Corrado GS, Darzi A, Etemadi M, Garcia-Vicente F, Gilbert FJ, Halling-Brown M, Hassabis D, Jansen S, Karthikesalingam A, Kelly CJ, King D, Ledsam JR, Melnick D, Mostofi H, Peng L, Reicher JJ, Romera-Paredes B, Sidebottom R, Suleyman M, Tse D, Young KC, De Fauw J, Shetty S. International evaluation of an AI system for breast cancer screening. Nature 2020; 577:89-94. [PMID: 31894144 DOI: 10.1038/s41586-019-1799-6] [Citation(s) in RCA: 1000] [Impact Index Per Article: 250.0] [Received: 07/27/2019] [Accepted: 11/05/2019] [Indexed: 02/07/2023]
Abstract
Screening mammography aims to identify breast cancer at earlier stages of the disease, when treatment can be more successful. Despite the existence of screening programmes worldwide, the interpretation of mammograms is affected by high rates of false positives and false negatives. Here we present an artificial intelligence (AI) system that is capable of surpassing human experts in breast cancer prediction. To assess its performance in the clinical setting, we curated a large representative dataset from the UK and a large enriched dataset from the USA. We show an absolute reduction of 5.7% and 1.2% (USA and UK) in false positives and 9.4% and 2.7% in false negatives. We provide evidence of the ability of the system to generalize from the UK to the USA. In an independent study of six radiologists, the AI system outperformed all of the human readers: the area under the receiver operating characteristic curve (AUC-ROC) for the AI system was greater than the AUC-ROC for the average radiologist by an absolute margin of 11.5%. We ran a simulation in which the AI system participated in the double-reading process that is used in the UK, and found that the AI system maintained non-inferior performance and reduced the workload of the second reader by 88%. This robust assessment of the AI system paves the way for clinical trials to improve the accuracy and efficiency of breast cancer screening.
Affiliation(s)
- Hutan Ashrafian
  - Department of Surgery and Cancer, Imperial College London, London, UK
  - Institute of Global Health Innovation, Imperial College London, London, UK
- Ara Darzi
  - Department of Surgery and Cancer, Imperial College London, London, UK
  - Institute of Global Health Innovation, Imperial College London, London, UK
  - Cancer Research UK Imperial Centre, Imperial College London, London, UK
- Fiona J Gilbert
  - Department of Radiology, Cambridge Biomedical Research Centre, University of Cambridge, Cambridge, UK
- Sunny Jansen
  - Verily Life Sciences, South San Francisco, CA, USA
- Richard Sidebottom
  - The Royal Marsden Hospital, London, UK
  - Thirlestaine Breast Centre, Cheltenham, UK
|
26
|
Whitney HM, Li H, Ji Y, Liu P, Giger ML. Harmonization of radiomic features of breast lesions across international DCE-MRI datasets. J Med Imaging (Bellingham) 2020; 7:012707. [PMID: 32206682 PMCID: PMC7056633 DOI: 10.1117/1.jmi.7.1.012707] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 10/29/2019] [Accepted: 02/24/2020] [Indexed: 12/12/2022] Open
Abstract
Purpose: Radiomic features extracted from medical images acquired in different countries may demonstrate a batch effect. Thus, we investigated the effect of harmonization on a database of radiomic features extracted from dynamic contrast-enhanced magnetic resonance (DCE-MR) breast imaging studies of 3150 benign lesions and cancers collected from international datasets, as well as the potential of harmonization to improve classification of malignancy. Approach: Eligible features were harmonized by category using the ComBat method. Harmonization effect on features was evaluated using the Davies-Bouldin index for degree of clustering between populations for both benign lesions and cancers. Performance in distinguishing between cancers and benign lesions was evaluated for each dataset using 10-fold cross validation with the area under the receiver operating characteristic curve (AUC) determined on the pre- and postharmonization sets of radiomic features in each dataset and a combined one. Differences in AUCs were evaluated for statistical significance. Results: The Davies-Bouldin index increased by 27% for benign lesions and by 43% for cancers, indicating that the postharmonization features were more similar. Classification performance using postharmonization features performed better than that using preharmonization features ( p < 0.001 for all three). Conclusion: Harmonization of radiomic features may enable combining databases from different populations for more comprehensive computer-aided diagnosis models of breast cancer.
Affiliation(s)
- Heather M. Whitney
  - The University of Chicago, Department of Radiology, Chicago, Illinois, United States
  - Wheaton College, Department of Physics, Wheaton, Illinois, United States
- Hui Li
  - The University of Chicago, Department of Radiology, Chicago, Illinois, United States
- Yu Ji
  - Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, National Clinical Research Center for Cancer, Department of Breast Imaging, Tianjin, China
- Peifang Liu
  - Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, National Clinical Research Center for Cancer, Department of Breast Imaging, Tianjin, China
- Maryellen L. Giger
  - The University of Chicago, Department of Radiology, Chicago, Illinois, United States
|
27
|
Zhou W, Li H, Anastasio MA. Approximating the Ideal Observer and Hotelling Observer for Binary Signal Detection Tasks by Use of Supervised Learning Methods. IEEE Transactions on Medical Imaging 2019; 38:2456-2468. [PMID: 30990425 PMCID: PMC6858982 DOI: 10.1109/tmi.2019.2911211] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Indexed: 05/26/2023]
Abstract
It is widely accepted that the optimization of medical imaging system performance should be guided by task-based measures of image quality (IQ). Task-based measures of IQ quantify the ability of an observer to perform a specific task, such as detection or estimation of a signal (e.g., a tumor). For binary signal detection tasks, the Bayesian Ideal Observer (IO) sets an upper limit of observer performance and has been advocated for use in optimizing medical imaging systems and data-acquisition designs. Except in special cases, the determination of the IO test statistic is analytically intractable. Markov-chain Monte Carlo (MCMC) techniques can be employed to approximate the IO detection performance, but their reported applications have been limited to relatively simple object models. In cases where the IO test statistic is difficult to compute, the Hotelling Observer (HO) can be employed. To compute the HO test statistic, potentially large covariance matrices must be accurately estimated and subsequently inverted, which can present computational challenges. This paper investigates the supervised learning-based methodologies for approximating the IO and HO test statistics. Convolutional neural networks (CNNs) and single-layer neural networks (SLNNs) are employed to approximate the IO and HO test statistics, respectively. The numerical simulations were conducted for both signal-known-exactly (SKE) and signal-known-statistically (SKS) signal detection tasks. The considered background models include the lumpy object model and the clustered lumpy object model. The measurement noise models considered are Gaussian, Laplacian, and mixed Poisson-Gaussian. The performances of the supervised learning methods are assessed via receiver operating characteristic (ROC) analysis, and the results are compared to those produced by the use of traditional numerical methods or analytical calculations when feasible. 
The potential advantages of the proposed supervised learning approaches for approximating the IO and HO test statistics are discussed.
|
28
|
Accuracy of the Vancouver Lung Cancer Risk Prediction Model Compared With That of Radiologists. Chest 2019; 156:112-119. [DOI: 10.1016/j.chest.2019.04.002] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 12/19/2018] [Revised: 02/20/2019] [Accepted: 04/02/2019] [Indexed: 12/17/2022] Open
|
29
|
Kim SH. Assessment of solid components of borderline ovarian tumor and stage I carcinoma: added value of combined diffusion- and perfusion-weighted magnetic resonance imaging. Yeungnam Univ J Med 2019; 36:231-240. [PMID: 31620638 PMCID: PMC6784647 DOI: 10.12701/yujm.2019.00234] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 05/08/2019] [Revised: 06/04/2019] [Accepted: 06/06/2019] [Indexed: 01/29/2023] Open
Abstract
Background We sought to determine the value of combining diffusion-weighted (DW) and perfusion-weighted (PW) sequences with a conventional magnetic resonance (MR) sequence to assess solid components of borderline ovarian tumors (BOTs) and stage I carcinomas. Methods Conventional, DW, and PW sequences in the tumor imaging studies of 70 patients (BOTs, n=38; stage I carcinomas, n=32) who underwent surgery with pathologic correlation were assessed. Two independent radiologists calculated the parameters apparent diffusion coefficient (ADC), Ktrans (vessel permeability), and Ve (cell density) for the solid components. The distribution on conventional MR sequence and mean, standard deviation, and 95% confidence interval of each DW and PW parameter were calculated. The inter-observer agreement between the two radiologists was assessed. Area under the receiver operating characteristic curve (AUC) analysis and multivariate logistic regression were performed to compare the effectiveness of DW and PW sequences for average values and to characterize the diagnostic performance of combined DW and PW sequences. Results Agreement between radiologists was excellent for DW and PW parameters. The distributions of ADC, Ktrans, and Ve values were significantly different between BOTs and stage I carcinomas, yielding AUCs of 0.58 and 0.68, 0.78 and 0.82, and 0.70 and 0.72, respectively, with ADC yielding the lowest diagnostic performance. The AUCs of the DW, PW, and combined PW and DW sequences were 0.71±0.05, 0.80±0.05, and 0.85±0.05, respectively. Conclusion Combining PW and DW sequences with a conventional sequence potentially improves the diagnostic accuracy in the differentiation of BOTs and stage I carcinomas.
Affiliation(s)
- See Hyung Kim
  - Department of Radiology, School of Medicine, Kyungpook National University, Daegu, Korea
|
30
|
Bayesian hierarchical model for the estimation of proper receiver operating characteristic curves using stochastic ordering. Communications for Statistical Applications and Methods 2019. [DOI: 10.29220/csam.2019.26.2.205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
|
31
|
Samala RK, Hadjiiski L, Helvie MA, Richter CD, Cha KH. Breast Cancer Diagnosis in Digital Breast Tomosynthesis: Effects of Training Sample Size on Multi-Stage Transfer Learning Using Deep Neural Nets. IEEE Transactions on Medical Imaging 2019; 38:686-696. [PMID: 31622238 PMCID: PMC6812655 DOI: 10.1109/tmi.2018.2870343] [Citation(s) in RCA: 86] [Impact Index Per Article: 17.2] [Indexed: 05/27/2023]
Abstract
In this paper, we developed a deep convolutional neural network (CNN) for the classification of malignant and benign masses in digital breast tomosynthesis (DBT) using a multi-stage transfer learning approach that utilized data from similar auxiliary domains for intermediate-stage fine-tuning. Breast imaging data from DBT, digitized screen-film mammography, and digital mammography totaling 4039 unique regions of interest (1797 malignant and 2242 benign) were collected. Using cross validation, we selected the best transfer network from six transfer networks by varying the level up to which the convolutional layers were frozen. In a single-stage transfer learning approach, knowledge from CNN trained on the ImageNet data was fine-tuned directly with the DBT data. In a multi-stage transfer learning approach, knowledge learned from ImageNet was first fine-tuned with the mammography data and then fine-tuned with the DBT data. Two transfer networks were compared for the second-stage transfer learning by freezing most of the CNN structures versus freezing only the first convolutional layer. We studied the dependence of the classification performance on training sample size for various transfer learning and fine-tuning schemes by varying the training data from 1% to 100% of the available sets. The area under the receiver operating characteristic curve (AUC) was used as a performance measure. The view-based AUC on the test set for single-stage transfer learning was 0.85 ± 0.05 and improved significantly (p < 0.05) to 0.91 ± 0.03 for multi-stage learning. This paper demonstrated that, when the training sample size from the target domain is limited, an additional stage of transfer learning using data from a similar auxiliary domain is advantageous.
|
32
|
Charles Edgar Metz, Ph.D. (1942–2012): pioneer in receiver operating characteristic (ROC) analysis. Radiol Phys Technol 2019; 12:1-5. [DOI: 10.1007/s12194-018-0483-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/28/2022]
|
33
|
Binormal Precision–Recall Curves for Optimal Classification of Imbalanced Data. Statistics in Biosciences 2019. [DOI: 10.1007/s12561-019-09231-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 12/14/2022]
|
34
|
Hillis SL, Mohammad BA, Brennan PC. Estimating latent reader-performance variability using the Obuchowski-Rockette method. Proceedings of SPIE--The International Society for Optical Engineering 2019; 10952:10952F. [PMID: 32390679 PMCID: PMC7210714 DOI: 10.1117/12.2513106] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
We describe how the Obuchowski-Rockette (OR) method of analysis for multi-reader diagnostic studies can be used to estimate the variability of latent reader-performance outcomes, such as the area under the ROC curve (AUC). For a specific reader the latent or true reader performance outcome can conceptually be thought of as the average of the estimates that would result if the reader were to read a very large number of case samples. We note that for the sample sizes used in typical diagnostic studies, the latent reader-performance outcome is equal to the observed outcome minus measurement error. An often-cited study that assesses the variability of various reader-performance outcomes, including the AUC, is the study by Craig Beam et al., "Variability in the Interpretation of Screening Mammograms by US Radiologists," published in 1996. However, a problem with this type of study is that the variability estimates include measurement error. Thus this approach overestimates latent reader variability and gives variability estimates that are dependent on case sample size. The proposed method overcomes this problem. We illustrate the proposed method for 29 radiologists in Jordan, with each reading 60 chest computed tomography (CT) scans. Using the OR method we estimate the middle 95% range for latent AUC values to be 0.07; i.e., we estimate that 95% of radiologists differ by less than 0.07 in their ability to successfully discriminate between a pair of diseased and nondiseased cases. In contrast, the estimate for the 95% range for the observed AUCs is 0.18. Thus we see how the conventional method of describing variability of reader performance estimates can greatly overstate the variability of the true abilities of the readers.
Affiliation(s)
- Stephen L Hillis
  - Departments of Radiology and Biostatistics, University of Iowa, Iowa City, IA, USA
- Badera Al Mohammad
  - Department of Medical Imaging and Radiation Sciences, Faculty of Health Sciences, The University of Sydney, Lidcombe, NSW, Australia
- Patrick C Brennan
  - Department of Medical Imaging and Radiation Sciences, Faculty of Health Sciences, The University of Sydney, Lidcombe, NSW, Australia
|
35
|
Kim ST, Lee JH, Lee H, Ro YM. Visually interpretable deep network for diagnosis of breast masses on mammograms. Phys Med Biol 2018; 63:235025. [PMID: 30511660 DOI: 10.1088/1361-6560/aaef0a] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Indexed: 01/24/2023]
Abstract
Recently, deep learning technology has achieved various successes in medical image analysis studies including computer-aided diagnosis (CADx). However, current CADx approaches based on deep learning have a limitation in interpreting diagnostic decisions. The limited interpretability is a major challenge for practical use of current deep learning approaches. In this paper, a novel visually interpretable deep network framework is proposed to provide diagnostic decisions with visual interpretation. The proposed method is motivated by the fact that radiologists characterize breast masses according to the breast imaging reporting and data system (BIRADS). The proposed deep network framework consists of a BIRADS guided diagnosis network and a BIRADS critic network. A 2D map, named the BIRADS guide map, is generated in the inference process of the deep network. The visual features extracted from the breast masses can be refined by the BIRADS guide map, which helps the deep network to focus on more informative areas. The BIRADS critic network makes the BIRADS guide map relevant to the characterization of masses in terms of BIRADS description. To verify the proposed method, comparative experiments were conducted on a public mammogram database. On the independent test set (170 malignant masses and 170 benign masses), the proposed method was found to have significantly higher performance compared to the deep network approach without using the BIRADS guide map (p < 0.05). Moreover, visualization was conducted to show the locations where the deep network exploited more information. This study demonstrated that the proposed visually interpretable CADx framework could be a promising approach for visually interpreting the diagnostic decision of the deep network.
Affiliation(s)
- Seong Tae Kim
  - School of Electrical Engineering, KAIST, 291, Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
|
36
|
Radiologist performance in the detection of lung cancer using CT. Clin Radiol 2018; 74:67-75. [PMID: 30470412 DOI: 10.1016/j.crad.2018.10.008] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 04/20/2018] [Accepted: 10/16/2018] [Indexed: 12/17/2022]
Abstract
AIM To measure the level of radiologists' performance in lung cancer detection, and to explore radiologists' performance in cancer specialised and non-specialised centres. MATERIALS AND METHODS Thirty radiologists read 60 chest computed tomography (CT) examinations. Thirty cases had surgically or biopsy-proven lung cancer and 30 were cancer-free cases. The cancer cases were validated by four expert radiologists who located the malignant lung nodules. Reader performance was evaluated by calculating sensitivity, location sensitivity, specificity, and area under the receiver operating characteristic (ROC) curve (AUC). In addition, sensitivity at fixed specificity (0.794) was computed from each reader's estimated ROC curve. RESULTS The radiologists had a mean sensitivity of 0.749, sensitivity at fixed specificity of 0.744, location sensitivity of 0.666, specificity of 0.81 and AUC of 0.846. Radiologists in the specialised and non-specialised cancer centres had the following (specialised, non-specialised) pairs of values: sensitivity=(0.80, 0.719); sensitivity for fixed 0.794 specificity=(0.752, 0.740); location sensitivity=(0.712, 0.637); specificity=(0.794, 0.82) and AUC=(0.846, 0.846). CONCLUSION The efficacy of radiologists was comparable to other studies. Furthermore, AUC outcomes were similar for specialised and non-specialised cancer centre radiologists, suggesting they have similar discriminatory ability and that the higher sensitivity and lower specificity for specialised-centre radiologists can be attributed to them being less conservative in interpreting case images.
|
37
|
Shiraishi J, Fukuoka D, Iha R, Inada H, Tanaka R, Hara T. Verification of modified receiver-operating characteristic software using simulated rating data. Radiol Phys Technol 2018; 11:406-414. [PMID: 30244314 DOI: 10.1007/s12194-018-0479-9] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 04/11/2018] [Revised: 09/18/2018] [Accepted: 09/18/2018] [Indexed: 11/25/2022]
Abstract
ROCKIT, a receiver-operating characteristic (ROC) curve-fitting software package, was developed by Metz et al. in the early 1990s and is very frequently used throughout the world. In addition to ROCKIT, DBM-MRMC software was developed for multi-reader multi-case analysis of the difference in average area under ROC curves (AUCs). Because this old software cannot run on a PC with Windows 7 or a more recent operating system, we developed new software that employs the same basic algorithms with minor modifications. In this study, we verified our modified software and tested the differences between the index of diagnostic accuracies using simulated rating data. In our simulation model, all data were generated using target AUCs and a binormal parameter b. In ROC curve fitting with simulated rating data, we varied four factors: the total number of case samples, the ratio of positive-to-negative cases, a binormal parameter b, and the preset AUC. To investigate the differences between the statistical test results obtained from our software and the existing software, we generated simulated rating data sets with three levels of case difficulty and three degrees of difference in AUCs obtained from two modalities. As a result of the simulation, the AUCs estimated by the new and existing software were highly correlated (R > 0.98), and there were high agreements (85% or more) in the statistical test results. In conclusion, we believe that our modified software is as capable as the existing software.
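Generating rating data from a target AUC and a binormal parameter b, as this abstract describes, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' software: it assumes the conventional binormal parameterization (negatives distributed as N(0, 1), positives as N(a/b, 1/b²), so that AUC = Φ(a/√(1 + b²))), and the function names, the category thresholds, and the seed are hypothetical.

```python
import math
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def binormal_a(target_auc, b):
    """Solve AUC = Phi(a / sqrt(1 + b^2)) for the binormal parameter a."""
    return _STD_NORMAL.inv_cdf(target_auc) * math.sqrt(1.0 + b * b)


def binormal_auc(a, b):
    """Area under the binormal ROC curve for parameters a and b."""
    return _STD_NORMAL.cdf(a / math.sqrt(1.0 + b * b))


def simulate_ratings(target_auc, b, n_pos, n_neg, thresholds, seed=0):
    """Draw latent decision variables and bin them into ordinal ratings.

    thresholds is an assumed, increasing list of category boundaries;
    k thresholds produce ratings 1..k+1.
    """
    rng = random.Random(seed)
    a = binormal_a(target_auc, b)
    # Assumed convention: negatives ~ N(0, 1), positives ~ N(a/b, (1/b)^2).
    neg_latent = [rng.gauss(0.0, 1.0) for _ in range(n_neg)]
    pos_latent = [rng.gauss(a / b, 1.0 / b) for _ in range(n_pos)]

    def to_rating(x):
        # Rating is 1 plus the number of thresholds the latent value exceeds.
        return 1 + sum(x > t for t in thresholds)

    pos_ratings = [to_rating(x) for x in pos_latent]
    neg_ratings = [to_rating(x) for x in neg_latent]
    return pos_ratings, neg_ratings


# Hypothetical settings: target AUC 0.85, b = 1.0, five rating categories.
a_value = binormal_a(0.85, 1.0)
pos_ratings, neg_ratings = simulate_ratings(
    0.85, 1.0, n_pos=50, n_neg=50, thresholds=[-0.5, 0.5, 1.0, 1.5]
)
```

Varying `n_pos`, `n_neg`, `b`, and `target_auc` corresponds to the four factors the abstract says were varied; the thresholds here are arbitrary and would need to be chosen to match the rating scale of interest.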
Affiliation(s)
- Junji Shiraishi
  - Faculty of Life Sciences, Kumamoto University, 4-24-1 Kuhonji, Kumamoto, 862-0976, Japan
- Daisuke Fukuoka
  - Faculty of Education, Gifu University, 1-1 Yanagido, Gifu, 501-1193, Japan
- Reimi Iha
  - School of Health Sciences, Kumamoto University, 4-24-1 Kuhonji, Kumamoto, 862-0976, Japan
- Haruka Inada
  - School of Health Sciences, Kumamoto University, 4-24-1 Kuhonji, Kumamoto, 862-0976, Japan
- Rie Tanaka
  - College of Medical, Pharmaceutical and Health Sciences, Kanazawa University, 5-11-80 Kodatsuno, Kanazawa, Ishikawa, 920-0942, Japan
- Takeshi Hara
  - Faculty of Engineering, Gifu University, 1-1 Yanagido, Gifu, 501-1193, Japan
|
38
|
Hillis SL. Relationship between Roe and Metz simulation model for multireader diagnostic data and Obuchowski-Rockette model parameters. Stat Med 2018; 37:2067-2093. [PMID: 29609206 PMCID: PMC5980727 DOI: 10.1002/sim.7616] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 07/01/2016] [Revised: 10/08/2017] [Accepted: 01/02/2018] [Indexed: 11/06/2022]
Abstract
For the typical diagnostic radiology study design, each case (ie, patient) undergoes several diagnostic tests (or modalities) and the resulting images are interpreted by several readers. Often, each reader is asked to assign a confidence-of-disease rating to each case for each test, and the diagnostic tests are compared with respect to reader-performance outcomes that are functions of the reader receiver operating characteristic (ROC) curves, such as the area under the ROC curve. These reader-performance outcomes are frequently analyzed using the Obuchowski and Rockette method, which allows conclusions to generalize to both the reader and case populations. The simulation model proposed by Roe and Metz (RM) in 1997 emulates confidence-of-disease data collected from such studies and has been an important tool for empirically evaluating various reader-performance analysis methods. However, because the RM model parameters are expressed in terms of a continuous decision variable rather than in terms of reader-performance outcomes, it has not been possible to evaluate the realism of the RM model. I derive the relationships between the RM and Obuchowski-Rockette model parameters for the empirical area under the ROC curve reader-performance outcome. These relationships make it possible to evaluate the realism of the RM parameter models and to assess the performance of Obuchowski-Rockette parameter estimates. An example illustrates the application of the relationships for assessing the performance of a proposed upper one-sided confidence bound for the Obuchowski-Rockette test-by-reader variance component, which is useful for sample size estimation.
Affiliation(s)
- Stephen L Hillis
  - Departments of Radiology and Biostatistics, The University of Iowa, 3710 Medical Laboratories, 200 Hawkins Drive, Iowa City, 52242-1077, IA, U.S.A
|
39
|
Interpretation Time Using a Concurrent-Read Computer-Aided Detection System for Automated Breast Ultrasound in Breast Cancer Screening of Women With Dense Breast Tissue. AJR Am J Roentgenol 2018; 211:452-461. [PMID: 29792747 DOI: 10.2214/ajr.18.19516] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Indexed: 11/18/2022]
Abstract
OBJECTIVE The purpose of this study was to compare diagnostic accuracy and interpretation time of screening automated breast ultrasound (ABUS) for women with dense breast tissue without and with use of a recently U.S. Food and Drug Administration-approved computer-aided detection (CAD) system for concurrent read. MATERIALS AND METHODS In a retrospective observer performance study, 18 radiologists interpreted a cancer-enriched set (i.e., cancer prevalence higher than in the original screening cohort) of 185 screening ABUS studies (52 with and 133 without breast cancer). These studies were from a large cohort of ABUS-screened patients interpreted as BI-RADS density C or D. Each reader interpreted each case twice in a counterbalanced study, once without the CAD system and once with it, separated by 4 weeks. For each case, each reader identified abnormal findings and reported BI-RADS assessment category and level of suspicion for breast cancer. Interpretation time was recorded. Level of suspicion data were compared to evaluate diagnostic accuracy by means of the Dorfman-Berbaum-Metz method of jackknife with ANOVA ROC analysis. Interpretation times were compared by ANOVA. RESULTS The ROC AUC was 0.848 with the CAD system, compared with 0.828 without it, for a difference of 0.020 (95% CI, -0.011 to 0.051) and was statistically noninferior to the AUC without the CAD system with respect to a margin of -0.05 (p = 0.000086). The mean interpretation time was 3 minutes 33 seconds per case without the CAD system and 2 minutes 24 seconds with it, for a difference of 1 minute 9 seconds saved (95% CI, 44-93 seconds; p = 0.000014), or a reduction in interpretation time to 67% of the time without the CAD system. 
CONCLUSION Use of the concurrent-read CAD system for interpretation of screening ABUS studies of women with dense breast tissue who do not have symptoms is expected to make interpretation significantly faster and produce noninferior diagnostic accuracy compared with interpretation without the CAD system.
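The arithmetic behind the reported results can be checked with a short sketch. It uses only the figures quoted in the abstract and assumes the common confidence-interval reading of noninferiority (the lower bound of the interval for the AUC difference must lie above the margin); the study itself reports a p-value, so this is an illustration rather than the authors' procedure, and the helper names are hypothetical.

```python
def to_seconds(minutes, seconds):
    """Convert a minutes-and-seconds duration to seconds."""
    return 60 * minutes + seconds


def is_noninferior(ci_lower, margin):
    """Assumed CI-based rule: noninferior if the lower bound exceeds the margin."""
    return ci_lower > margin


# Values quoted in the abstract.
auc_with_cad = 0.848
auc_without_cad = 0.828
auc_difference = auc_with_cad - auc_without_cad  # about 0.020
ci_lower, ci_upper = -0.011, 0.051               # 95% CI for the difference
margin = -0.05                                   # noninferiority margin

noninferior = is_noninferior(ci_lower, margin)

time_without_cad = to_seconds(3, 33)             # 213 seconds per case
time_with_cad = to_seconds(2, 24)                # 144 seconds per case
seconds_saved = time_without_cad - time_with_cad # 69 seconds
time_ratio = time_with_cad / time_without_cad    # fraction of original time

print(f"AUC difference: {auc_difference:.3f}, noninferior: {noninferior}")
print(f"Saved {seconds_saved} s per case; ratio {time_ratio:.3f}")
```

The ratio computed from the rounded mean times is close to the 67% stated in the abstract; the small gap presumably reflects rounding of the reported means.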
|
40
|
Chen W, Sahiner B, Samuelson F, Pezeshk A, Petrick N. Calibration of medical diagnostic classifier scores to the probability of disease. Stat Methods Med Res 2018; 27:1394-1409. [PMID: 27507287 PMCID: PMC5548655 DOI: 10.1177/0962280216661371] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Indexed: 11/30/2022]
Abstract
Scores produced by statistical classifiers in many clinical decision support systems and other medical diagnostic devices are generally on an arbitrary scale, so the clinical meaning of these scores is unclear. Calibration of classifier scores to a meaningful scale such as the probability of disease is potentially useful when such scores are used by a physician. In this work, we investigated three methods (parametric, semi-parametric, and non-parametric) for calibrating classifier scores to the probability of disease scale and developed uncertainty estimation techniques for these methods. We showed that classifier scores on arbitrary scales can be calibrated to the probability of disease scale without affecting their discrimination performance. With a finite dataset to train the calibration function, it is important to accompany the probability estimate with its confidence interval. Our simulations indicate that, when a dataset used for finding the transformation for calibration is also used for estimating the performance of calibration, the resubstitution bias exists for a performance metric involving the truth states in evaluating the calibration performance. However, the bias is small for the parametric and semi-parametric methods when the sample size is moderate to large (>100 per class).
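The claim that calibration need not affect discrimination can be illustrated with a small sketch: any strictly increasing transformation of the scores preserves their ranking, and the empirical AUC depends only on ranking. The example below is an assumption-laden illustration, not one of the three calibration methods studied in the paper: the scores are made up, and the logistic slope and intercept are hypothetical values that would normally be fitted to data.

```python
import math


def empirical_auc(pos_scores, neg_scores):
    """Fraction of positive-negative pairs ranked correctly (ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def logistic_calibration(score, slope, intercept):
    """Map an arbitrary-scale score to (0, 1); strictly increasing when slope > 0."""
    return 1.0 / (1.0 + math.exp(-(slope * score + intercept)))


# Hypothetical classifier scores on an arbitrary scale.
pos_scores = [2.1, 3.5, 0.7, 4.2]
neg_scores = [0.3, 1.9, 0.9, -1.0]

# Hypothetical calibration parameters (in practice estimated from training data).
slope, intercept = 1.2, -2.0
pos_probs = [logistic_calibration(s, slope, intercept) for s in pos_scores]
neg_probs = [logistic_calibration(s, slope, intercept) for s in neg_scores]

auc_raw = empirical_auc(pos_scores, neg_scores)
auc_calibrated = empirical_auc(pos_probs, neg_probs)
print(auc_raw, auc_calibrated)
```

Both AUC values agree because the transformation only relabels the score axis; what calibration changes is the interpretability of each individual score, which is why the abstract stresses reporting a confidence interval alongside the probability estimate.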
Affiliation(s)
- Weijie Chen
  - Office of Science and Engineering Laboratories, Center for Devices and Radiological Health, Food and Drug Administration, Silver Spring, USA
- Berkman Sahiner
  - Office of Science and Engineering Laboratories, Center for Devices and Radiological Health, Food and Drug Administration, Silver Spring, USA
- Frank Samuelson
  - Office of Science and Engineering Laboratories, Center for Devices and Radiological Health, Food and Drug Administration, Silver Spring, USA
- Aria Pezeshk
  - Office of Science and Engineering Laboratories, Center for Devices and Radiological Health, Food and Drug Administration, Silver Spring, USA
- Nicholas Petrick
  - Office of Science and Engineering Laboratories, Center for Devices and Radiological Health, Food and Drug Administration, Silver Spring, USA
|
41
|
Antropova N, Abe H, Giger ML. Use of clinical MRI maximum intensity projections for improved breast lesion classification with deep convolutional neural networks. J Med Imaging (Bellingham) 2018; 5:014503. [PMID: 29430478 PMCID: PMC5798576 DOI: 10.1117/1.jmi.5.1.014503] [Citation(s) in RCA: 44] [Impact Index Per Article: 7.3] [Received: 10/10/2017] [Accepted: 01/11/2018] [Indexed: 12/26/2022] Open
Abstract
Deep learning methods have been shown to improve breast cancer diagnostic and prognostic decisions based on selected slices of dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI). However, incorporation of volumetric and temporal components into DCE-MRIs has not been well studied. We propose maximum intensity projection (MIP) images of subtraction MRI as a way to simultaneously include four-dimensional (4-D) images into lesion classification using convolutional neural networks (CNN). The study was performed on a dataset of 690 cases. Regions of interest were selected around each lesion on three MRI presentations: (i) the MIP image generated on the second postcontrast subtraction MRI, (ii) the central slice of the second postcontrast MRI, and (iii) the central slice of the second postcontrast subtraction MRI. CNN features were extracted from the ROIs using pretrained VGGNet. The features were utilized in the training of three support vector machine classifiers to characterize lesions as malignant or benign. Classifier performances were evaluated with fivefold cross-validation and compared based on area under the ROC curve (AUC). The approach using MIPs [Formula: see text] outperformed that using central-slices of either second postcontrast MRIs [Formula: see text] or second postcontrast subtraction MRIs [Formula: see text], at statistically significant levels.
Collapse
Affiliation(s)
- Natalia Antropova
- The University of Chicago, Department of Radiology, Chicago, Illinois, United States
- Hiroyuki Abe
- The University of Chicago, Department of Radiology, Chicago, Illinois, United States
- Maryellen L. Giger
- The University of Chicago, Department of Radiology, Chicago, Illinois, United States
|
42
|
Samala RK, Chan HP, Hadjiiski LM, Helvie MA, Cha KH, Richter CD. Multi-task transfer learning deep convolutional neural network: application to computer-aided diagnosis of breast cancer on mammograms. Phys Med Biol 2017; 62:8894-8908. [PMID: 29035873 PMCID: PMC5859950 DOI: 10.1088/1361-6560/aa93d4] [Citation(s) in RCA: 97] [Impact Index Per Article: 13.9] [Indexed: 11/12/2022]
Abstract
Transfer learning in deep convolutional neural networks (DCNNs) is an important step in its application to medical imaging tasks. We propose a multi-task transfer learning DCNN with the aim of translating the 'knowledge' learned from non-medical images to medical diagnostic tasks through supervised training and increasing the generalization capabilities of DCNNs by simultaneously learning auxiliary tasks. We studied this approach in an important application: classification of malignant and benign breast masses. With Institutional Review Board (IRB) approval, digitized screen-film mammograms (SFMs) and digital mammograms (DMs) were collected from our patient files and additional SFMs were obtained from the Digital Database for Screening Mammography. The data set consisted of 2242 views with 2454 masses (1057 malignant, 1397 benign). In single-task transfer learning, the DCNN was trained and tested on SFMs. In multi-task transfer learning, SFMs and DMs were used to train the DCNN, which was then tested on SFMs. N-fold cross-validation with the training set was used for training and parameter optimization. On the independent test set, the multi-task transfer learning DCNN was found to have significantly (p = 0.007) higher performance compared to the single-task transfer learning DCNN. This study demonstrates that multi-task transfer learning may be an effective approach for training DCNN in medical imaging applications when training samples from a single modality are limited.
Affiliation(s)
- Ravi K Samala
- Department of Radiology, University of Michigan, Ann Arbor, MI 48109-5842, United States of America
|
43
|
Effectiveness of Bone Suppression Imaging in the Detection of Lung Nodules on Chest Radiographs. J Thorac Imaging 2017; 32:398-405. [DOI: 10.1097/rti.0000000000000299] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 11/25/2022]
|
44
|
Antropova N, Huynh BQ, Giger ML. A deep feature fusion methodology for breast cancer diagnosis demonstrated on three imaging modality datasets. Med Phys 2017; 44:5162-5171. [PMID: 28681390 DOI: 10.1002/mp.12453] [Citation(s) in RCA: 207] [Impact Index Per Article: 29.6] [Received: 04/06/2017] [Revised: 06/12/2017] [Accepted: 06/25/2017] [Indexed: 12/13/2022]
Abstract
BACKGROUND Deep learning methods for radiomics/computer-aided diagnosis (CADx) are often prohibited by small datasets, long computation time, and the need for extensive image preprocessing. AIMS We aim to develop a breast CADx methodology that addresses the aforementioned issues by exploiting the efficiency of pre-trained convolutional neural networks (CNNs) and using pre-existing handcrafted CADx features. MATERIALS & METHODS We present a methodology that extracts and pools low- to mid-level features using a pretrained CNN and fuses them with handcrafted radiomic features computed using conventional CADx methods. Our methodology is tested on three different clinical imaging modalities (dynamic contrast enhanced-MRI [690 cases], full-field digital mammography [245 cases], and ultrasound [1125 cases]). RESULTS From ROC analysis, our fusion-based method demonstrates, on all three imaging modalities, statistically significant improvements in terms of AUC as compared to previous breast cancer CADx methods in the task of distinguishing between malignant and benign lesions. (DCE-MRI [AUC = 0.89 (se = 0.01)], FFDM [AUC = 0.86 (se = 0.01)], and ultrasound [AUC = 0.90 (se = 0.01)]). DISCUSSION/CONCLUSION We proposed a novel breast CADx methodology that can be used to more effectively characterize breast lesions in comparison to existing methods. Furthermore, our proposed methodology is computationally efficient and circumvents the need for image preprocessing.
Affiliation(s)
- Natalia Antropova
- Department of Radiology, University of Chicago, 5841 S Maryland Ave., Chicago, IL, 60637, USA
- Benjamin Q Huynh
- Department of Radiology, University of Chicago, 5841 S Maryland Ave., Chicago, IL, 60637, USA
- Maryellen L Giger
- Department of Radiology, University of Chicago, 5841 S Maryland Ave., Chicago, IL, 60637, USA
|
45
|
Leng S, Takahashi N, Gomez Cardona D, Kitajima K, McCollough B, Li Z, Kawashima A, Leibovich BC, McCollough CH. Subjective and objective heterogeneity scores for differentiating small renal masses using contrast-enhanced CT. Abdom Radiol (NY) 2017; 42:1485-1492. [PMID: 28025654 DOI: 10.1007/s00261-016-1014-2] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Indexed: 12/22/2022]
Abstract
PURPOSE The aim of this study was to assess the effect of denoising on objective heterogeneity scores and its diagnostic capability for the diagnosis of angiomyolipoma (AML) and renal cell carcinoma (RCC). MATERIALS AND METHODS A total of 158 resected renal masses ≤4 cm [98 clear cell (cc) RCCs, 36 papillary (pap)-RCCs, and 24 AMLs] from 139 patients were evaluated. A representative contrast-enhanced computed tomography (CT) image for each mass was selected by a genitourinary radiologist. A largest possible region of interest was drawn on each mass by the radiologist, from which three objective heterogeneity indices were calculated: standard deviation (SD), entropy (Ent), and uniformity (Uni). Objective heterogeneity indices were also calculated after images were processed with a denoising algorithm (non-local means) at three strengths: weak, medium, and strong. Two genitourinary radiologists also subjectively scored each mass independently using a three-point scale (1-3; with 1 the least and 3 the most heterogeneous), which were added to represent the final subjective heterogeneity score of each mass. Heterogeneity scores were compared among mass types, and area under the ROC curve (AUC) was calculated. RESULTS For all heterogeneity indices, cc-RCC was significantly more heterogeneous than pap-RCC and AML (p < 0.001), but no significant difference was found between pap-RCC and AML (p > 0.01). For cc-RCC and pap-RCC differentiation, AUCs were 0.91, 0.81, 0.78, and 0.78 for the subjective score, SD, Ent, and Uni, respectively, using original images. The corresponding AUC values were 0.84, 0.74, 0.79, and 0.80 for differentiation of AML and cc-RCC. Noise reduction at weak setting improves AUC values by 0.03, 0.05, and 0.05 for SD, entropy, and uniformity for differentiation of cc-RCC from pap-RCC. Further increase of filtering strength did not improve AUC values. For differentiation of AML vs. cc-RCC, the AUC values stayed relatively flat using the noise reduction technique at different strengths for all three indices. CONCLUSIONS Both subjective and objective heterogeneity indices can differentiate cc-RCC from pap-RCC and AML. Noise reduction improved differentiation of cc-RCC from pap-RCC, but not differentiation of AML from cc-RCC.
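The three objective indices named in this abstract (SD, entropy, uniformity) have common first-order definitions, sketched below. This is an assumption-laden illustration rather than the authors' code: it treats each distinct integer intensity as its own histogram bin, uses the population standard deviation, and uses base-2 entropy. The abstract does not state the binning or the logarithm base, and `heterogeneity_indices` is a hypothetical name.

```python
import math
import statistics
from collections import Counter


def heterogeneity_indices(roi_values):
    """Return (SD, entropy, uniformity) for a flat list of ROI intensities.

    Assumptions (not from the paper): one histogram bin per distinct value,
    population SD, entropy in bits.
    """
    n = len(roi_values)
    probs = [count / n for count in Counter(roi_values).values()]
    sd = statistics.pstdev(roi_values)
    entropy = -sum(p * math.log2(p) for p in probs)  # higher = more heterogeneous
    uniformity = sum(p * p for p in probs)           # higher = more homogeneous
    return sd, entropy, uniformity


# Hypothetical ROI with two equally frequent intensities.
sd, ent, uni = heterogeneity_indices([10, 10, 20, 20])
print(sd, ent, uni)
```

Entropy and uniformity move in opposite directions as a region becomes more heterogeneous, which is why the abstract reports them as separate indices.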
Affiliation(s)
- Shuai Leng
- Department of Radiology, Mayo Clinic, 200 First Street SW, Rochester, MN, 55905, USA.
- Naoki Takahashi
- Department of Radiology, Mayo Clinic, 200 First Street SW, Rochester, MN, 55905, USA
- Daniel Gomez Cardona
- Department of Radiology, Mayo Clinic, 200 First Street SW, Rochester, MN, 55905, USA
- Department of Medical Physics, University of Wisconsin-Madison, 1111 Highland Avenue, Madison, WI, 53705-2275, USA
- Kazuhiro Kitajima
- Department of Radiology, Mayo Clinic, 200 First Street SW, Rochester, MN, 55905, USA
- Department of Radiology, Faculty of Medicine, Kobe University, Kobe, Hyogo, Japan
- Brian McCollough
- Department of Radiology, Mayo Clinic, 200 First Street SW, Rochester, MN, 55905, USA
- Zhoubo Li
- Department of Radiology, Mayo Clinic, 200 First Street SW, Rochester, MN, 55905, USA
- GE Healthcare, 3000 N. Grandview Blvd, Waukesha, WI, 53188, USA
- Akira Kawashima
- Department of Radiology, Mayo Clinic, 200 First Street SW, Rochester, MN, 55905, USA
- Bradley C Leibovich
- Department of Urology, Mayo Clinic, 200 First Street SW, Rochester, MN, 55905, USA
- Cynthia H McCollough
- Department of Radiology, Mayo Clinic, 200 First Street SW, Rochester, MN, 55905, USA
|
46
|
Zhai X, Chakraborty DP. A bivariate contaminated binormal model for robust fitting of proper ROC curves to a pair of correlated, possibly degenerate, ROC datasets. Med Phys 2017; 44:2207-2222. [PMID: 28382718 DOI: 10.1002/mp.12263] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2016] [Revised: 03/13/2017] [Accepted: 03/27/2017] [Indexed: 11/06/2022]
Abstract
PURPOSE The objective was to design and implement a bivariate extension to the contaminated binormal model (CBM) to fit paired receiver operating characteristic (ROC) datasets, possibly degenerate, with proper ROC curves. Paired datasets yield two correlated ratings per case. Degenerate datasets have no interior operating points, and proper ROC curves do not inappropriately cross the chance diagonal. The existing method, developed more than three decades ago, utilizes a bivariate extension to the binormal model, implemented in CORROC2 software, which yields improper ROC curves and cannot fit degenerate datasets. CBM can fit proper ROC curves to unpaired (i.e., yielding one rating per case) and degenerate datasets, and there is a clear scientific need to extend it to handle paired datasets. METHODS In CBM, nondiseased cases are modeled by a probability density function (pdf) consisting of a unit variance peak centered at zero. Diseased cases are modeled with a mixture distribution whose pdf consists of two unit variance peaks, one centered at positive μ with integrated probability α, the mixing fraction parameter, corresponding to the fraction of diseased cases where the disease was visible to the radiologist, and one centered at zero, with integrated probability (1-α), corresponding to disease that was not visible. It is shown that: (a) for nondiseased cases the bivariate extension is a unit-variance bivariate normal distribution centered at (0,0) with a specified correlation ρ1; (b) for diseased cases the bivariate extension is a mixture distribution with four peaks, corresponding to disease not visible in either condition, disease visible in only one condition, contributing two peaks, and disease visible in both conditions. An expression for the likelihood function is derived.
A maximum likelihood estimation (MLE) algorithm, CORCBM, was implemented in the R programming language that yields parameter estimates and the covariance matrix of the parameters, and other statistics. A limited simulation validation of the method was performed. RESULTS CORCBM and CORROC2 were applied to two datasets containing nine readers each contributing paired interpretations. CORCBM successfully fitted the data for all readers, whereas CORROC2 failed to fit a degenerate dataset. All fits were visually reasonable. All CORCBM fits were proper, whereas all CORROC2 fits were improper. CORCBM and CORROC2 were in agreement (a) in declaring only one of the nine readers as having significantly different performances in the two modalities; (b) in estimating higher correlations for diseased cases than for nondiseased ones; and (c) in finding that the intermodality correlation estimates for nondiseased cases were consistent between the two methods. All CORCBM fits yielded higher area under curve (AUC) than the CORROC2 fits, consistent with the fact that a proper ROC model like CORCBM is based on a likelihood-ratio-equivalent decision variable, and consequently yields higher performance than the binormal model-based CORROC2. The method gave satisfactory fits to four simulated datasets. CONCLUSIONS CORCBM is a robust method for fitting paired ROC datasets, always yielding proper ROC curves, and able to fit degenerate datasets.
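The univariate CBM described in the methods above implies a simple ROC curve, sketched here for a single reader and modality. The operating-point and AUC expressions below are worked out from the stated mixture (nondiseased N(0,1); diseased (1-α)·N(0,1) + α·N(μ,1)) and are not copied from the CORCBM software, which is written in R; the function names are hypothetical, and the bivariate extension and the MLE fit are not attempted.

```python
import math


def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def cbm_operating_point(zeta, mu, alpha):
    """(FPF, TPF) at reporting threshold zeta under the univariate CBM."""
    fpf = phi(-zeta)                                          # nondiseased: N(0, 1)
    tpf = (1.0 - alpha) * phi(-zeta) + alpha * phi(mu - zeta) # diseased: two-peak mixture
    return fpf, tpf


def cbm_auc(mu, alpha):
    """Area under the CBM ROC curve: invisible disease contributes chance (0.5)."""
    return 0.5 * (1.0 - alpha) + alpha * phi(mu / math.sqrt(2.0))


# Hypothetical parameters: 70% of diseased cases visible, separation mu = 2.
points = [cbm_operating_point(z / 2.0, mu=2.0, alpha=0.7) for z in range(-6, 7)]
print(round(cbm_auc(2.0, 0.7), 3))
```

Because Φ(μ−ζ) ≥ Φ(−ζ) whenever μ ≥ 0, the true-positive fraction never falls below the false-positive fraction, which is the sense in which the fitted curve is proper and does not cross the chance diagonal.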
Affiliation(s)
- Xuetong Zhai
- Swanson School of Engineering, Department of Bioengineering, University of Pittsburgh, 302 Benedum Hall, 3700 O'Hara Street, Pittsburgh, PA, 15260, USA
- Dev P Chakraborty
- CEO, ExpertCAD Analytics, LLC, 2103 Noble Ct, Murrysville, PA, 15668, USA
|
47
|
Yin J. Using the ROC Curve to Measure Association and Evaluate Prediction Accuracy for a Binary Outcome. Biom Biostat Int J 2017. [DOI: 10.15406/bbij.2017.05.00134] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 11/06/2022]
|
48
|
|
49
|
Samala RK, Chan HP, Hadjiiski LM, Helvie MA. Analysis of computer-aided detection techniques and signal characteristics for clustered microcalcifications on digital mammography and digital breast tomosynthesis. Phys Med Biol 2016; 61:7092-7112. [PMID: 27648708 DOI: 10.1088/0031-9155/61/19/7092] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Indexed: 01/08/2023]
Abstract
With IRB approval, digital breast tomosynthesis (DBT) images of human subjects were collected using a GE GEN2 DBT prototype system. Corresponding digital mammograms (DMs) of the same subjects were collected retrospectively from patient files. The data set contained a total of 237 DBT views and an equal number of DM views from 120 human subjects; each set included 163 views with microcalcification clusters (MCs) and 74 views without MCs. The data set was separated into training and independent test sets. The pre-processing, object prescreening and segmentation, false positive reduction, and clustering strategies for MC detection by three computer-aided detection (CADe) systems designed for DM, DBT, and a planar projection image generated from DBT were analyzed. Receiver operating characteristic (ROC) curves based on features extracted from microcalcifications and free-response ROC (FROC) curves based on scores from MCs were used to quantify the performance of the systems. Jackknife FROC (JAFROC) and non-parametric analysis methods were used to determine the statistical difference between the FROC curves. The difference between the CAD_DM and CAD_DBT systems when the false positive rate was estimated from cases without MCs did not reach statistical significance. The study indicates that the large search space in DBT may not be a limiting factor for CADe to achieve performance similar to that observed in DM.
Affiliation(s)
- Ravi K Samala
- Department of Radiology, University of Michigan, Ann Arbor, MI 48109-5842, USA
|
50
|
Li H, Zhu Y, Burnside ES, Huang E, Drukker K, Hoadley KA, Fan C, Conzen SD, Zuley M, Net JM, Sutton E, Whitman GJ, Morris E, Perou CM, Ji Y, Giger ML. Quantitative MRI radiomics in the prediction of molecular classifications of breast cancer subtypes in the TCGA/TCIA data set. NPJ Breast Cancer 2016; 2. [PMID: 27853751 PMCID: PMC5108580 DOI: 10.1038/npjbcancer.2016.12] [Citation(s) in RCA: 230] [Impact Index Per Article: 28.8] [Indexed: 12/26/2022]
Abstract
Using quantitative radiomics, we demonstrate that computer-extracted magnetic resonance (MR) image-based tumor phenotypes can be predictive of the molecular classification of invasive breast cancers. Radiomics analysis was performed on 91 MRIs of biopsy-proven invasive breast cancers from National Cancer Institute’s multi-institutional TCGA/TCIA. Immunohistochemistry molecular classification was performed including estrogen receptor, progesterone receptor, human epidermal growth factor receptor 2, and for 84 cases, the molecular subtype (normal-like, luminal A, luminal B, HER2-enriched, and basal-like). Computerized quantitative image analysis included: three-dimensional lesion segmentation, phenotype extraction, and leave-one-case-out cross validation involving stepwise feature selection and linear discriminant analysis. The performance of the classifier model for molecular subtyping was evaluated using receiver operating characteristic analysis. The computer-extracted tumor phenotypes were able to distinguish between molecular prognostic indicators; area under the ROC curve values of 0.89, 0.69, 0.65, and 0.67 in the tasks of distinguishing between ER+ versus ER−, PR+ versus PR−, HER2+ versus HER2−, and triple-negative versus others, respectively. Statistically significant associations between tumor phenotypes and receptor status were observed. More aggressive cancers are likely to be larger in size with more heterogeneity in their contrast enhancement. Even after controlling for tumor size, a statistically significant trend was observed within each size group (P=0.04 for lesions ⩽2 cm; P=0.02 for lesions >2 to ⩽5 cm) as with the entire data set (P-value=0.006) for the relationship between enhancement texture (entropy) and molecular subtypes (normal-like, luminal A, luminal B, HER2-enriched, basal-like). 
In conclusion, computer-extracted image phenotypes show promise for high-throughput discrimination of breast cancer subtypes and may yield a quantitative predictive signature for advancing precision medicine.
Affiliation(s)
- Hui Li
- Department of Radiology, The University of Chicago, Chicago, IL, USA
- Yitan Zhu
- Program of Computational Genomics & Medicine, NorthShore University HealthSystem, Evanston, IL, USA
- Erich Huang
- National Cancer Institute, Cancer Imaging Program, Bethesda, MA, USA
- Karen Drukker
- Department of Radiology, The University of Chicago, Chicago, IL, USA
- Katherine A Hoadley
- Department of Genetics, University of North Carolina-Chapel Hill, Chapel Hill, NC, USA
- Cheng Fan
- Department of Genetics, University of North Carolina-Chapel Hill, Chapel Hill, NC, USA
- Suzanne D Conzen
- Department of Medicine, The University of Chicago, Chicago, IL, USA
- Margarita Zuley
- Department of Radiology, University of Pittsburgh, Pittsburgh, PA, USA
- Jose M Net
- Department of Radiology, University of Miami Health System, Miami, FL, USA
- Elizabeth Sutton
- Department of Radiology, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Gary J Whitman
- Department of Radiology, MD Anderson Cancer Center, Houston, TX, USA
- Elizabeth Morris
- Department of Radiology, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Charles M Perou
- Department of Genetics, University of North Carolina-Chapel Hill, Chapel Hill, NC, USA
- Yuan Ji
- Program of Computational Genomics & Medicine, NorthShore University HealthSystem, Evanston, IL, USA; Department of Public Health Sciences, University of Chicago, Chicago, IL, USA
- Maryellen L Giger
- Department of Radiology, The University of Chicago, Chicago, IL, USA
|