1
Yapp KE, Ekpo E. Clinical history and incidental abnormality detection in endodontic cone beam computed tomography. J Med Imaging (Bellingham) 2023; 10:045502. [PMID: 37529625 PMCID: PMC10390029 DOI: 10.1117/1.jmi.10.4.045502] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2023] [Revised: 06/28/2023] [Accepted: 07/21/2023] [Indexed: 08/03/2023]
Abstract
Purpose To assess the effect of clinical history on incidental abnormality detection, false positive (FP) marks, and diagnostic confidence in endodontic cone beam computed tomography (CBCT) imaging. Approach A reader performance study using a free-response, factorial study design was undertaken, which accounted for changes in the independent variables: native case type, native case severity, reader type, and reading modality. Twenty-three readers interpreted 26 cases (18 diseased and 8 non-diseased) twice, once with and once without access to clinical history. Each case had at least one incidental abnormality that was not a native endodontic finding. Lesion localization (LL), non-localizations (FPs), and diagnostic confidence (rating 2, 3, or 4: lowest, middle, and highest, respectively) of incidental abnormalities were analyzed. Results Clinical history increased the detection of incidental abnormalities in non-diseased subtle cases (76 versus 59, p = 0.04). Reader experience and monthly CBCT reading volume did not affect incidental abnormality detection. FPs were neither affected by clinical history nor reader characteristics. The highest confidence rating was most often used in each case type when clinical history was available. For this rating, history had significantly greater LLs in subtle diseased (53 versus 41, p = 0.03) and non-diseased images (53 versus 33, p = 0.02). Conclusions Clinical history improved the detection of incidental endodontic abnormalities in non-diseased subtle CBCT images and did not affect the number of FP marks. Reader confidence in correctly identified abnormalities was higher with clinical history when disease and non-disease were subtle but was not associated with an improvement in diagnostic performance.
Affiliation(s)
- Kehn E. Yapp
- The University of Sydney, School of Health Sciences, Faculty of Medicine and Health, Medical Image Optimisation and Perception Group (MIOPeG), Discipline of Medical Imaging Science, Camperdown, New South Wales, Australia
- Ernest Ekpo
- The University of Sydney, School of Health Sciences, Faculty of Medicine and Health, Medical Image Optimisation and Perception Group (MIOPeG), Discipline of Medical Imaging Science, Camperdown, New South Wales, Australia
2
Yapp KE, Suleiman M, Brennan P, Ekpo E. Periapical Radiography versus Cone Beam Computed Tomography in Endodontic Disease Detection: A Free-response, Factorial Study. J Endod 2023; 49:419-429. [PMID: 36773745 DOI: 10.1016/j.joen.2023.02.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2022] [Revised: 01/17/2023] [Accepted: 02/01/2023] [Indexed: 02/11/2023]
Abstract
AIM To assess and compare reader performance in interpreting digital periapical (PA) radiography and cone beam computed tomography (CBCT) in endodontic disease detection, using a free-response, factorial model. MATERIALS AND METHODS A reader performance study of 2 image test sets was undertaken using a factorial, free-response design, accounting for the independent variables: case type, case severity, reader type, and imaging modality. Twenty-two readers interpreted 60 PA and 60 CBCT images divided into 5 categories: diseased-subtle, diseased-moderate, diseased-obvious, nondiseased-subtle, and nondiseased-obvious. Lesion localization fraction, specificity, false positive (FP) marks, and the weighted alternative free-response receiver operating characteristic figure of merit were calculated. RESULTS CBCT had greater specificity than PA in the obvious nondiseased cases (P = .01) and no significant difference in the subtle nondiseased category. Weighted alternative free-response receiver operating characteristic values were higher for PA than CBCT in the subtle diseased (P = .02) and moderate diseased (P = .01) groups, with no significant difference between modalities in the obvious diseased group. CBCT had higher mean FPs than PA (P < .05) in subtle diseased cases. Mean lesion localization fraction in the moderate diseased group was higher in PA than CBCT (P = .003). No relationships were found between clinical experience and any diagnostic performance measure, except in the obvious diseased CBCT group, where increasing experience was associated with mean FP marks (P = .04). CONCLUSIONS Reader performance in the detection of endodontic disease is better with PA radiography than CBCT. Clinical experience does not impact upon the accuracy of interpretation of both PA radiography and CBCT.
Affiliation(s)
- Kehn E Yapp
- Medical Image Optimisation and Perception Group (MIOPeG), Discipline of Medical Imaging Science, Faculty of Medicine and Health, School of Health Sciences, The University of Sydney, Camperdown, New South Wales, Australia.
- Mo'ayyad Suleiman
- Medical Image Optimisation and Perception Group (MIOPeG), Discipline of Medical Imaging Science, Faculty of Medicine and Health, School of Health Sciences, The University of Sydney, Camperdown, New South Wales, Australia
- Patrick Brennan
- Medical Image Optimisation and Perception Group (MIOPeG), Discipline of Medical Imaging Science, Faculty of Medicine and Health, School of Health Sciences, The University of Sydney, Camperdown, New South Wales, Australia
- Ernest Ekpo
- Medical Image Optimisation and Perception Group (MIOPeG), Discipline of Medical Imaging Science, Faculty of Medicine and Health, School of Health Sciences, The University of Sydney, Camperdown, New South Wales, Australia
3
The effect of clinical history on diagnostic performance of endodontic cone-beam CT interpretation. Clin Radiol 2023; 78:e433-e441. [PMID: 36702710 DOI: 10.1016/j.crad.2022.12.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2022] [Revised: 11/21/2022] [Accepted: 12/09/2022] [Indexed: 01/12/2023]
Abstract
AIM To assess the effect of clinical history on the interpretation of endodontic disease in dental cone-beam computed tomography (CBCT). MATERIALS AND METHODS A reader performance study of an image test-set was undertaken using a factorial, free-response, crossover design, accounting for the independent variables: case type, case severity, reader type, and reading modality. Twenty-three readers interpreted 60 CBCT images twice over two reading sessions using a balanced design, once with access to clinical history and once without, where 30 in each session included history. Lesion localisations, specificity, false-positive marks and the weighted alternative free-response receiver operating characteristic (wAFROC1) figure of merit were calculated. RESULTS Clinical history had no significant effect on specificity and false-positive rates in non-diseased cases (p>0.05), but improved lesion localisation in subtle and obvious diseased cases (p<0.01). wAFROC1 values were higher with clinical history for subtle (0.58 versus 0.48; p<0.001) and obvious (0.77 versus 0.71; p=0.006) diseased categories. No associations were observed between clinical history and both readers' years of experience and reading volume in the non-diseased categories. Readers with fewer (p=0.03) and moderate (p=0.008) years of experience and low (p=0.002) CBCT reading volume demonstrated better lesion localisation in subtle diseased cases when clinical history was available. CONCLUSIONS Clinical history improved the interpretation of CBCT images with disease without affecting the interpretation of images without disease. Less and moderately experienced readers and low-volume readers benefitted more from availability of clinical history.
4
Establishment of image quality for MRI of the knee joint using a list of anatomical criteria. Radiography (Lond) 2018; 24:196-203. [DOI: 10.1016/j.radi.2018.01.008] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 07/14/2017] [Revised: 01/28/2018] [Accepted: 01/30/2018] [Indexed: 11/21/2022]
5
Chakraborty DP, Zhai X. On the meaning of the weighted alternative free-response operating characteristic figure of merit. Med Phys 2017; 43:2548. [PMID: 27147365 DOI: 10.1118/1.4947125] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 01/31/2023]
Abstract
PURPOSE The free-response receiver operating characteristic (FROC) method is being increasingly used to evaluate observer performance in search tasks. Data analysis requires definition of a figure of merit (FOM) quantifying performance. While a number of FOMs have been proposed, the recommended one, namely, the weighted alternative FROC (wAFROC) FOM, is not well understood. The aim of this work is to clarify the meaning of this FOM by relating it to the empirical area under a proposed wAFROC curve. METHODS The weighted wAFROC FOM is defined in terms of a quasi-Wilcoxon statistic that involves weights, coding the clinical importance, assigned to each lesion. A new wAFROC curve is proposed, the y-axis of which incorporates the weights, giving more credit for marking clinically important lesions, while the x-axis is identical to that of the AFROC curve. An expression is derived relating the area under the empirical wAFROC curve to the wAFROC FOM. Examples are presented with small numbers of cases showing how AFROC and wAFROC curves are affected by correct and incorrect decisions and how the corresponding FOMs credit or penalize these decisions. The wAFROC, AFROC, and inferred ROC FOMs were applied to three clinical data sets involving multiple reader FROC interpretations in different modalities. RESULTS It is shown analytically that the area under the empirical wAFROC curve equals the wAFROC FOM. This theorem is the FROC analog of a well-known theorem developed in 1975 for ROC analysis, which gave meaning to a Wilcoxon statistic based ROC FOM. A similar equivalence applies between the area under the empirical AFROC curve and the AFROC FOM. The examples show explicitly that the wAFROC FOM gives equal importance to all diseased cases, regardless of the number of lesions, a desirable statistical property not shared by the AFROC FOM. Applications to the clinical data sets show that the wAFROC FOM yields results comparable to that using the AFROC FOM. 
CONCLUSIONS The equivalence theorem gives meaning to the weighted AFROC FOM, namely, it is identical to the empirical area under weighted AFROC curve.
Affiliation(s)
- Dev P Chakraborty
- Department of Radiology, University of Pittsburgh, Pittsburgh, Pennsylvania 15668
- Xuetong Zhai
- Department of Bioengineering, University of Pittsburgh, Pittsburgh, Pennsylvania 15668
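As an illustration of the quasi-Wilcoxon statistic described in the Chakraborty and Zhai abstract above, the following is a minimal Python sketch of a wAFROC-style figure of merit. It assumes one common convention that is not spelled out in the abstract: each non-diseased case contributes its highest-rated false-positive mark, unmarked cases and unmarked lesions are assigned a rating of minus infinity, and the lesion weights within each diseased case sum to 1. The function names `psi` and `wafroc_fom` and the data layout are hypothetical, chosen only for this example.

```python
import math


def psi(fp_rating, lesion_rating):
    """Wilcoxon kernel: 1 if the lesion outranks the false positive, 0.5 on a tie, 0 otherwise."""
    if lesion_rating > fp_rating:
        return 1.0
    if lesion_rating == fp_rating:
        return 0.5
    return 0.0


def wafroc_fom(nondiseased_fp, diseased_lesions):
    """Weighted AFROC-style figure of merit (illustrative sketch, assumed convention).

    nondiseased_fp: highest false-positive rating per non-diseased case
                    (-inf when the case has no marks).
    diseased_lesions: one list per diseased case of (weight, rating) pairs,
                      one pair per lesion; weights in a case sum to 1 and the
                      rating is -inf when the lesion was not marked.
    """
    total = 0.0
    for fp in nondiseased_fp:
        for lesions in diseased_lesions:
            # Each diseased case contributes at most 1, regardless of lesion count.
            total += sum(w * psi(fp, z) for w, z in lesions)
    return total / (len(nondiseased_fp) * len(diseased_lesions))


# Two non-diseased cases (one unmarked) and two diseased cases
# (one single-lesion case, one two-lesion case with a missed lesion).
fom = wafroc_fom(
    [-math.inf, 2.0],
    [[(1.0, 3.0)], [(0.5, 1.0), (0.5, -math.inf)]],
)
print(fom)  # 0.6875
```

Because the weights within a case sum to 1, every diseased case carries the same maximum credit no matter how many lesions it contains, which is the property the abstract highlights.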
6
Zarb F, McEntee MF, Rainford L. Visual grading characteristics and ordinal regression analysis during optimisation of CT head examinations. Insights Imaging 2015; 6:393-401. [PMID: 25510470 PMCID: PMC4444791 DOI: 10.1007/s13244-014-0374-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 04/15/2014] [Revised: 11/16/2014] [Accepted: 11/21/2014] [Indexed: 11/26/2022]
Abstract
OBJECTIVES To evaluate visual grading characteristics (VGC) and ordinal regression analysis during head CT optimisation as a potential alternative to visual grading assessment (VGA), traditionally employed to score anatomical visualisation. METHODS Patient images (n = 66) were obtained using current and optimised imaging protocols from two CT suites: a 16-slice scanner at the national Maltese centre for trauma and a 64-slice scanner in a private centre. Local resident radiologists (n = 6) performed VGA followed by VGC and ordinal regression analysis. RESULTS VGC alone indicated that optimised protocols had similar image quality to current protocols. Ordinal logistic regression analysis provided an in-depth evaluation, criterion by criterion, allowing the selective implementation of the protocols. The local radiology review panel supported the implementation of optimised protocols for brain CT examinations (including trauma) in one centre, achieving radiation dose reductions ranging from 24% to 36%. In the second centre a 29% reduction in radiation dose was achieved for follow-up cases. CONCLUSIONS The combined use of VGC and ordinal logistic regression analysis led to clinical decisions being taken on the implementation of the optimised protocols. This improved method of image quality analysis provided the evidence to support imaging protocol optimisation, resulting in significant radiation dose savings. MAIN MESSAGES • There is a need for scientifically based image quality evaluation during CT optimisation. • VGC and ordinal regression analysis in combination led to better informed clinical decisions. • VGC and ordinal regression analysis led to dose reductions without compromising diagnostic efficacy.
Affiliation(s)
- Francis Zarb
- Department of Radiography, Faculty of Health Sciences, University of Malta, Msida, Malta
- Mark F. McEntee
- Discipline of Medical Radiation Sciences and Brain and Mind Research Institute, Faculty of Health Sciences, The University of Sydney, Sydney, Australia
- Louise Rainford
- School of Medicine & Medical Science, Health Science Centre, University College Dublin, Belfield, Dublin 4, Ireland
7
Zarb F, McEntee MF, Rainford L. A multi-phased study of optimisation methodologies and radiation dose savings for head CT examinations. RADIATION PROTECTION DOSIMETRY 2015; 163:480-490. [PMID: 25009189 DOI: 10.1093/rpd/ncu227] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 06/03/2023]
Abstract
A study of the impact of optimisation methods on dose reductions for head computerised tomography was undertaken in three phases for two manufacturer models. Phase 1: a Catphan® 600 was employed to evaluate protocols where the impact of parameter manipulation on dose and image quality was gauged by psychophysical measurements of contrast and spatial resolution in terms of contrast discs and line pairs. mA, kV and pitch were systematically altered until the optimisation threshold was identified. Phantom studies provide dose comparisons during optimisation but lack anatomical detail. Phase 2: optimised protocols were tested on a porcine model, permitting further dose reductions over phantom findings and providing anatomical structures for image quality evaluation using relative visual grading analysis of anatomical criteria. Phase 3: patient images using pre- and post-optimised protocols were clinically audited using visual grading characteristic analysis and ordinal regression analysis, providing a robust analysis of image quality data prior to clinical implementation.
Affiliation(s)
- Francis Zarb
- Department of Radiography, Faculty of Health Sciences, University of Malta, Msida, Malta
- Mark F McEntee
- Discipline of Medical Radiation Sciences and Brain and Mind Research Institute, Faculty of Health Sciences, The University of Sydney, Sydney, Australia
- Louise Rainford
- School of Medicine and Medical Science, Health Science Centre, University College Dublin, Belfield Dublin 4, Ireland
8
He X, Samuelson F, Zeng R, Sahiner B. Discovering intrinsic properties of human observers' visual search and mathematical observers' scanning. JOURNAL OF THE OPTICAL SOCIETY OF AMERICA. A, OPTICS, IMAGE SCIENCE, AND VISION 2014; 31:2495-2510. [PMID: 25401363 DOI: 10.1364/josaa.31.002495] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 06/04/2023]
Abstract
There is a lack of consensus in measuring observer performance in search tasks. To pursue a consensus, we set our goal to obtain metrics that are practical, meaningful, and predictive. We consider a metric practical if it can be implemented to measure human and computer observers' performance. To be meaningful, we propose to discover intrinsic properties of search observers and formulate the metrics to characterize these properties. If the discovered properties allow verifiable predictions, we consider them predictive. We propose a theory and a conjecture toward two intrinsic properties of search observers: rationality in classification as measured by the location-known-exactly (LKE) receiver operating characteristic (ROC) curve and location uncertainty as measured by the effective set size (M*). These two properties are used to develop search models in both single-response and free-response search tasks. To confirm whether these properties are "intrinsic," we investigate their ability in predicting search performance of both human and scanning channelized Hotelling observers. In particular, for each observer, we designed experiments to measure the LKE-ROC curve and M*, which were then used to predict the same observer's performance in other search tasks. The predictions were then compared to the experimentally measured observer performance. Our results indicate that modeling the search performance using the LKE-ROC curve and M* leads to successful predictions in most cases.
9
Petrick N, Sahiner B, Armato SG, Bert A, Correale L, Delsanto S, Freedman MT, Fryd D, Gur D, Hadjiiski L, Huo Z, Jiang Y, Morra L, Paquerault S, Raykar V, Samuelson F, Summers RM, Tourassi G, Yoshida H, Zheng B, Zhou C, Chan HP. Evaluation of computer-aided detection and diagnosis systems. Med Phys 2014; 40:087001. [PMID: 23927365 DOI: 10.1118/1.4816310] [Citation(s) in RCA: 65] [Impact Index Per Article: 6.5] [Indexed: 01/09/2023]
Abstract
Computer-aided detection and diagnosis (CAD) systems are increasingly being used as an aid by clinicians for detection and interpretation of diseases. Computer-aided detection systems mark regions of an image that may reveal specific abnormalities and are used to alert clinicians to these regions during image interpretation. Computer-aided diagnosis systems provide an assessment of a disease using image-based information alone or in combination with other relevant diagnostic data and are used by clinicians as a decision support in developing their diagnoses. While CAD systems are commercially available, standardized approaches for evaluating and reporting their performance have not yet been fully formalized in the literature or in a standardization effort. This deficiency has led to difficulty in the comparison of CAD devices and in understanding how the reported performance might translate into clinical practice. To address these important issues, the American Association of Physicists in Medicine (AAPM) formed the Computer Aided Detection in Diagnostic Imaging Subcommittee (CADSC), in part, to develop recommendations on approaches for assessing CAD system performance. The purpose of this paper is to convey the opinions of the AAPM CADSC members and to stimulate the development of consensus approaches and "best practices" for evaluating CAD systems. Both the assessment of a standalone CAD system and the evaluation of the impact of CAD on end-users are discussed. It is hoped that awareness of these important evaluation elements and the CADSC recommendations will lead to further development of structured guidelines for CAD performance assessment. 
Proper assessment of CAD system performance is expected to increase the understanding of a CAD system's effectiveness and limitations, which is expected to stimulate further research and development efforts on CAD technologies, reduce problems due to improper use, and eventually improve the utility and efficacy of CAD in clinical practice.
Affiliation(s)
- Nicholas Petrick
- Center for Devices and Radiological Health, U.S. Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, Maryland 20993, USA
10
A brief history of free-response receiver operating characteristic paradigm data analysis. Acad Radiol 2013; 20:915-9. [PMID: 23583665 DOI: 10.1016/j.acra.2013.03.001] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.0] [Received: 01/07/2013] [Revised: 03/01/2013] [Accepted: 03/07/2013] [Indexed: 11/23/2022]
Abstract
In the receiver operating characteristic paradigm the observer assigns a single rating to each image and the location of the perceived abnormality, if any, is ignored. In the free-response receiver operating characteristic paradigm the observer is free to mark and rate as many suspicious regions as are considered clinically reportable. Credit for a correct localization is given only if a mark is sufficiently close to an actual lesion; otherwise, the observer's mark is scored as a location-level false positive. Until fairly recently there existed no accepted method for analyzing the resulting relatively unstructured data containing random numbers of mark-rating pairs per image. This report reviews the history of work in this field, which has now spanned more than five decades. It introduces terminology used to describe the paradigm, proposed measures of performance (figures of merit), ways of visualizing the data (operating characteristics), and software for analyzing free-response receiver operating characteristic studies.
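The scoring rule in the abstract above (a mark earns credit only if it is sufficiently close to an actual lesion, otherwise it counts as a location-level false positive) can be sketched as follows. This is a minimal illustration under stated assumptions, not the scoring procedure of any particular software package: the proximity criterion is taken to be a fixed Euclidean radius, each mark is assigned to its nearest lesion within that radius, and only the highest rating per lesion is retained. The function name `score_marks` and its arguments are hypothetical.

```python
import math


def score_marks(marks, lesions, radius):
    """Split free-response marks into lesion localizations and non-localizations.

    marks:   list of (x, y, rating) tuples placed by the observer on one image.
    lesions: list of (x, y) true lesion centres for that image.
    radius:  assumed proximity criterion (same units as the coordinates).

    Returns (ll, nl): ll maps lesion index -> highest rating of any mark within
    `radius` of that lesion; nl lists ratings of marks that hit no lesion.
    """
    ll = {}
    nl = []
    for x, y, rating in marks:
        nearest = None  # (distance, lesion index) of the closest lesion in range
        for i, (lx, ly) in enumerate(lesions):
            d = math.hypot(x - lx, y - ly)
            if d <= radius and (nearest is None or d < nearest[0]):
                nearest = (d, i)
        if nearest is None:
            nl.append(rating)  # location-level false positive
        else:
            idx = nearest[1]
            ll[idx] = max(rating, ll.get(idx, -math.inf))
    return ll, nl


# One lesion at (11, 10); two marks fall near it and one lands far away.
ll, nl = score_marks([(10, 10, 4), (50, 50, 2), (12, 11, 3)], [(11, 10)], 5.0)
print(ll, nl)  # {0: 4} [2]
```

The choice of radius directly changes which marks count as localizations, so in practice it would need to be fixed before the study is scored.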
11
Zanca F, Hillis SL, Claus F, Van Ongeval C, Celis V, Provoost V, Yoon HJ, Bosmans H. Correlation of free-response and receiver-operating-characteristic area-under-the-curve estimates: results from independently conducted FROC∕ROC studies in mammography. Med Phys 2012; 39:5917-29. [PMID: 23039631 DOI: 10.1118/1.4747262] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 01/13/2023]
Abstract
PURPOSE From independently conducted free-response receiver operating characteristic (FROC) and receiver operating characteristic (ROC) experiments, to study fixed-reader associations between three estimators: the area under the alternative FROC (AFROC) curve computed from FROC data, the area under the ROC curve computed from FROC highest rating data, and the area under the ROC curve computed from confidence-of-disease ratings. METHODS Two hundred mammograms, 100 of which were abnormal, were processed by two image-processing algorithms and interpreted by four radiologists under the FROC paradigm. From the FROC data, inferred-ROC data were derived, using the highest rating assumption. Eighteen months afterwards, the images were interpreted by the same radiologists under the conventional ROC paradigm; conventional-ROC data (in contrast to inferred-ROC data) were obtained. FROC and ROC (inferred, conventional) data were analyzed using the nonparametric area-under-the-curve (AUC), (AFROC and ROC curve, respectively). Pearson correlation was used to quantify the degree of association between the modality-specific AUC indices and standard errors were computed using the bootstrap-after-bootstrap method. The magnitude of the correlations was assessed by comparison with computed Obuchowski-Rockette fixed reader correlations. RESULTS Average Pearson correlations (with 95% confidence intervals in square brackets) were: Corr(FROC, inferred ROC) = 0.76[0.64, 0.84] > Corr(inferred ROC, conventional ROC) = 0.40[0.18, 0.58] > Corr (FROC, conventional ROC) = 0.32[0.16, 0.46]. CONCLUSIONS Correlation between FROC and inferred-ROC data AUC estimates was high. Correlation between inferred- and conventional-ROC AUC was similar to the correlation between two modalities for a single reader using one estimation method, suggesting that the highest rating assumption might be questionable.
Affiliation(s)
- Federica Zanca
- Department of Radiology, University Hospitals Leuven, Leuven, Belgium.
12
Chakraborty DP, Yoon HJ, Mello-Thoms C. Application of threshold-bias independent analysis to eye-tracking and FROC data. Acad Radiol 2012; 19:1474-83. [PMID: 23040503 PMCID: PMC3489965 DOI: 10.1016/j.acra.2012.09.002] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 09/03/2012] [Revised: 09/08/2012] [Accepted: 09/08/2012] [Indexed: 10/27/2022]
Abstract
RATIONALE AND OBJECTIVES Studies of medical image interpretation have focused on either assessing radiologists' performance using, for example, the receiver operating characteristic (ROC) paradigm, or assessing the interpretive process by analyzing their eye-tracking (ET) data. Analysis of ET data has not benefited from threshold-bias independent figures of merit (FOMs) analogous to the area under the ROC curve. The aim was to demonstrate the feasibility of such FOMs and to measure the agreement between FOMs derived from free-response ROC (FROC) and ET data. METHODS Eight expert breast radiologists interpreted a case set of 120 two-view mammograms while eye-position data and FROC data were continuously collected during the interpretation interval. Regions that attract prolonged (>800 ms) visual attention were considered to be virtual marks, and ratings based on the dwell and approach-rate (inverse of time-to-hit) were assigned to them. The virtual ratings were used to define threshold-bias independent FOMs in a manner analogous to the area under the trapezoidal alternative FROC (AFROC) curve (0 = worst, 1 = best). Agreement at the case level (0.5 = chance, 1 = perfect) was measured using the jackknife and 95% confidence intervals (CI) for the FOMs and agreement were estimated using the bootstrap. RESULTS The AFROC mark-ratings' FOM was largest at 0.734 (CI 0.65-0.81) followed by the dwell at 0.460 (0.34-0.59) and then by the approach-rate FOM 0.336 (0.25-0.46). The differences between the FROC mark-ratings' FOM and the perceptual FOMs were significant (P < .05). All pairwise agreements were significantly better than chance: ratings vs. dwell 0.707 (0.63-0.88), dwell vs. approach-rate 0.703 (0.60-0.79) and rating vs. approach-rate 0.606 (0.53-0.68). The ratings vs. approach-rate agreement was significantly smaller than the dwell vs. approach-rate agreement (P = .008). CONCLUSIONS Leveraging current methods developed for analyzing observer performance data could complement current ways of analyzing ET data and lead to new insights.
Affiliation(s)
- Dev P. Chakraborty
- Department of Radiology, University of Pittsburgh, Presbyterian South Tower, Room 4771, 200 Lothrop Street, Pittsburgh, PA 15213, 412-605-1553 (phone), 412-605-1554 (fax)
- Hong-Jun Yoon
- Department of Radiology, University of Pittsburgh, Presbyterian South Tower, Room 4771, 200 Lothrop Street, Pittsburgh, PA 15213, 412-605-1553 (phone), 412-605-1554 (fax)
- Claudia Mello-Thoms
- University of Pittsburgh School of Medicine, Department of Biomedical Informatics and Department of Radiology, The Offices at Baum, 5th floor, Room 516, 5607 Baum Blvd, Pittsburgh, PA 15206-3701, Phone: (412) 648–9314
13
Svahn TM, Chakraborty DP, Ikeda D, Zackrisson S, Do Y, Mattsson S, Andersson I. Breast tomosynthesis and digital mammography: a comparison of diagnostic accuracy. Br J Radiol 2012; 85:e1074-82. [PMID: 22674710 PMCID: PMC3500806 DOI: 10.1259/bjr/53282892] [Citation(s) in RCA: 107] [Impact Index Per Article: 8.9] [Received: 10/24/2011] [Revised: 02/28/2012] [Accepted: 03/14/2012] [Indexed: 11/05/2022]
Abstract
OBJECTIVE Our aim was to compare the ability of radiologists to detect breast cancers using one-view breast tomosynthesis (BT) and two-view digital mammography (DM) in an enriched population of diseased patients and benign and/or healthy patients. METHODS All participants gave informed consent. The BT and DM examinations were performed with about the same average glandular dose to the breast. The study population comprised patients with subtle signs of malignancy seen on DM and/or ultrasonography. Ground truth was established by pathology, needle biopsy and/or by 1-year follow-up by mammography, which retrospectively resulted in 89 diseased breasts (1 breast per patient) with 95 malignant lesions and 96 healthy or benign breasts. Two experienced radiologists, who were not participants in the study, determined the locations of the malignant lesions. Five radiologists, experienced in mammography, interpreted the cases independently in a free-response study. The data were analysed by the receiver operating characteristic (ROC) and jackknife alternative free-response ROC (JAFROC) methods, regarding both readers and cases as random effects. RESULTS The diagnostic accuracy of BT was significantly better than that of DM (JAFROC: p=0.0031, ROC: p=0.0415). The average sensitivity of BT was higher than that of DM (∼90% vs ∼79%; 95% confidence interval of difference: 0.036, 0.108) while the average false-positive fraction was not significantly different (95% confidence interval of difference: -0.117, 0.010). CONCLUSION The diagnostic accuracy of BT was superior to DM in an enriched population.
Affiliation(s)
- T M Svahn
- Medical Radiation Physics, Department of Clinical Sciences Malmö, Lund University, Skåne University Hospital, Malmö, Sweden.
14
Abstract
A common task in medical imaging is assessing whether a new imaging system, or a variant of an existing one, is an improvement over an existing imaging technology. Imaging systems are generally quite complex, consisting of several components (for example, image acquisition hardware, image processing and display hardware and software, and image interpretation by radiologists), each of which can affect performance. Although it may appear odd to include the radiologist as a "component" of the imaging chain, because the radiologist's decision determines subsequent patient care, the effect of the human interpretation has to be included. Physical measurements such as modulation transfer function and signal-to-noise ratio are useful for characterizing the nonhuman parts of the imaging chain under idealized and often unrealistic conditions, such as uniform background phantoms and target objects with sharp edges. Measuring the performance of the entire imaging chain, including the radiologist, and using real clinical images requires different methods that fall under the rubric of observer performance methods or "ROC" analysis, which involve collecting rating data on images. The purpose of this work is to review recent developments in this field, particularly with respect to the free-response method, where location information is also collected.
Affiliation(s)
- Dev P Chakraborty
- Department of Radiology, University of Pittsburgh, Pittsburgh, PA 15213, USA.
|
15
|
Chakraborty DP. Recent developments in imaging system assessment methodology, FROC analysis and the search model. Nucl Instrum Methods Phys Res A 2011; 648 Supplement 1:S297-S301. [PMID: 21804679 PMCID: PMC3144765 DOI: 10.1016/j.nima.2010.11.042] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 05/31/2023]
Abstract
A frequent problem in imaging is assessing whether a new imaging system is an improvement over an existing standard. Observer performance methods, in particular the receiver operating characteristic (ROC) paradigm, are widely used in this context. In ROC analysis lesion location information is not used and consequently scoring ambiguities can arise in tasks, such as nodule detection, involving finding localized lesions. This paper reviews progress in the free-response ROC (FROC) paradigm in which the observer marks and rates suspicious regions and the location information is used to determine whether lesions were correctly localized. Reviewed are FROC data analysis, a search-model for simulating FROC data, predictions of the model and a method for estimating the parameters. The search model parameters are physically meaningful quantities that can guide system optimization.
|
16
|
Zarb F, Rainford L, McEntee MF. Image quality assessment tools for optimization of CT images. Radiography (Lond) 2010. [DOI: 10.1016/j.radi.2009.10.002] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.1] [Indexed: 01/12/2023]
|
17
|
Abstract
The purpose of this paper is to summarise recent progress in free-response receiver operating characteristic (FROC) methodology. The topics covered are: (1) jackknife alternative FROC analysis including recent extensions and alternative methods; (2) the search-model simulator that enables validation and objective comparison of different methods of analysing the data; (3) case-based analysis that has the potential of greater clinical relevance than conventional free-response analysis; (4) a method for collectively analysing the multiple lesion types in an image (e.g. microcalcifications, masses and architectural distortions); (5) a method for sample-size estimation for FROC studies; and (6) a method for determining an objective proximity criterion, namely how close a mark must be to a true lesion in order to credit the observer for a true localisation. FROC analysis is being increasingly used to evaluate imaging systems, and an understanding of recent progress should help researchers conduct better FROC studies.
Affiliation(s)
- D P Chakraborty
- Department of Radiology, University of Pittsburgh, Pittsburgh, PA, USA.
|
18
|
Zanca F, Chakraborty DP, Marchal G, Bosmans H. Consistency of methods for analysing location-specific data. Radiat Prot Dosimetry 2010; 139:52-56. [PMID: 20159917 PMCID: PMC2868070 DOI: 10.1093/rpd/ncq030] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 05/28/2023]
Abstract
Although the receiver operating characteristic (ROC) method is the acknowledged gold-standard for imaging system assessment, it ignores localisation information and differentiation between multiple abnormalities per case. As the free-response ROC (FROC) method uses localisation information and more closely resembles the clinical reporting process, it is being increasingly used. A number of methods have been proposed to analyse the data that result from an FROC study: jackknife alternative FROC (JAFROC) and a variant termed JAFROC1, initial detection and candidate analysis (IDCA) and ROC analysis via the reduction of the multiple ratings on a case to a single rating. The focus of this paper was to compare JAFROC1, IDCA and the ROC analysis methods using a clinical FROC human data set. All methods agreed on the ordering of the modalities and all yielded statistically significant differences of the figures-of-merit, i.e. p < 0.05. Both IDCA and JAFROC1 yielded much smaller p-values than ROC. The results are consistent with a recent simulation-based validation study comparing these and other methods. In conclusion, IDCA or JAFROC1 analysis of FROC human data may be superior to ROC analysis at detecting modality differences.
Affiliation(s)
- F Zanca
- Department of Radiology, Leuven University Center of Medical Physics in Radiology, University Hospitals Leuven, 3000 Leuven, Belgium.
|
19
|
Chakraborty DP. Clinical relevance of the ROC and free-response paradigms for comparing imaging system efficacies. Radiat Prot Dosimetry 2010; 139:37-41. [PMID: 20139268 PMCID: PMC2868120 DOI: 10.1093/rpd/ncq017] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 05/28/2023]
Abstract
Observer performance studies are widely used to assess medical imaging systems. Unlike technical/engineering measurements, observer performance studies include the entire imaging chain and the radiologist. However, the widely used receiver operating characteristic (ROC) method ignores lesion localisation information. The free-response ROC (FROC) method uses the location information to appropriately reward or penalise correct or incorrect localisations, respectively. This paper describes a method for improving the clinical relevance of FROC studies. The method consists of assigning appropriate risk values to the different lesions that may be present on a single image. A high-risk lesion is one that is critical to detect and act upon, and is assigned a higher risk value than a low-risk lesion, one that is relatively innocuous. Instead of simply counting the number of lesions that are detected, as is done in conventional FROC analysis, a risk-weighted count is used. This has the advantage of rewarding detections of high-risk lesions commensurately more than detections of lower risk lesions. Simulations were used to demonstrate that the new method, termed case-based analysis, results in a higher figure of merit for an expert who detects more high-risk lesions than for a naive observer who detects more low-risk lesions, even though both detect the same total number of lesions. Conventional free-response analysis is unable to distinguish between the two types of observers. This paper also comments on the issue of clinical relevance of ROC analysis vs. FROC for tasks that involve lesion localisation.
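The risk-weighting idea in this abstract can be illustrated with a minimal Python sketch. It shows only the weighting step, not the paper's full case-based figure of merit. The function name `risk_weighted_fraction`, the risk values, and the choice to normalise by the total risk on the case are all assumptions made for illustration.

```python
def risk_weighted_fraction(detected, risks):
    """Risk-weighted fraction of lesions detected on one case.

    detected: list of bools, True if the lesion was correctly localised.
    risks:    list of positive risk values, one per lesion (hypothetical).
    Normalising by the total risk is an assumption of this sketch.
    """
    if len(detected) != len(risks):
        raise ValueError("detected and risks must have the same length")
    total = sum(risks)
    if total <= 0:
        raise ValueError("risks must sum to a positive value")
    return sum(r for d, r in zip(detected, risks) if d) / total


# Hypothetical case with two high-risk and two low-risk lesions.
risks = [4.0, 4.0, 1.0, 1.0]
expert = [True, True, False, False]   # finds the two high-risk lesions
naive = [False, False, True, True]    # finds the two low-risk lesions

# A plain count cannot separate the two observers (both detect 2 lesions),
# whereas the risk-weighted fraction rewards the expert more.
plain_counts = (sum(expert), sum(naive))
expert_score = risk_weighted_fraction(expert, risks)
naive_score = risk_weighted_fraction(naive, risks)
```

With these illustrative weights both observers have the same plain count, while the weighted score is higher for the observer who finds the high-risk lesions.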
|
20
|
Obuchowski NA, Mazzone PJ, Dachman AH. Bias, underestimation of risk, and loss of statistical power in patient-level analyses of lesion detection. Eur Radiol 2009; 20:584-94. [PMID: 19763582 DOI: 10.1007/s00330-009-1590-4] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Received: 03/16/2009] [Revised: 07/15/2009] [Accepted: 07/27/2009] [Indexed: 11/28/2022]
Abstract
PURPOSE Sensitivity and the false positive rate are usually defined with the patient as the unit of observation, i.e., the diagnostic test detects or does not detect disease in a patient. For tests designed to find and diagnose lesions, e.g., lung nodules, the usual definitions of sensitivity and specificity may be misleading. In this paper we describe and compare five measures of accuracy of lesion detection. METHODS The five levels of evaluation considered were patient level without localization, patient level with localization, region of interest (ROI) level without localization, ROI level with localization, and lesion level. RESULTS We found that estimators of sensitivity that do not require the reader to correctly locate the lesion overstate sensitivity. Patient-level estimators of sensitivity can be misleading when there is more than one lesion per patient and they reduce study power. Patient-level estimators of the false positive rate can conceal important differences between techniques. Referring clinicians rely on a test's reported accuracy to both choose the appropriate test and plan management for their patients. If reported sensitivity is overstated, the clinician could choose the test for disease screening, and have false confidence that a negative test represents the true absence of lesions. Similarly, the lower false positive rate associated with patient-level estimators can mislead clinicians about the diagnostic value of the test and consequently that a positive finding is real. CONCLUSION We present clear recommendations for studies assessing and comparing the accuracy of tests tasked with the detection and interpretation of lesions...
Affiliation(s)
- Nancy A Obuchowski
- Department of Quantitative Health Sciences/JJN3 and the Imaging Institute, Cleveland Clinic Foundation, 9500 Euclid Ave, Cleveland, OH 44195, USA.
|
21
|
Counterpoint to "Performance assessment of diagnostic systems under the FROC paradigm" by Gur and Rockette. Acad Radiol 2009; 16:507-10. [PMID: 19268864 DOI: 10.1016/j.acra.2008.12.011] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 11/25/2008] [Revised: 12/25/2008] [Accepted: 11/25/2008] [Indexed: 11/23/2022]
|
22
|
Popescu LM. Model for the detection of signals in images with multiple suspicious locations. Med Phys 2009; 35:5565-74. [PMID: 19175114 DOI: 10.1118/1.3002413] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/07/2022]
Abstract
A signal detection model is presented that combines a signal model and a noise model providing mathematical descriptions of the frequency of appearance of the signals, and of the signal-like features naturally occurring in the background. We derive expressions for the likelihood functions for the whole ensemble of observed suspicious locations, in various possible combinations of signals and false signal candidates. As a result, this formalism is able to describe several new types of detection tests using likelihood ratio statistics. We have a global image abnormality test and an individual signal detection test. The model also provides an alternative mechanism in which the combination of signal and noise feature candidates that has the maximum likelihood is selected. These tests can be analyzed with a variety of operating characteristic curves (ROC, LROC, FROC, etc.). In the mathematical formalism of the model, all the details characterizing the suspicious features are reduced to a single scalar function, which we name the signal specificity function, representing the frequency that a signal takes a certain value relative to the frequency of having a false signal with the same value in an image of given size. The signal specificity function ranks the degree of suspiciousness of the features found, and can be used to unify into a single score all the suspicious feature characteristics, and then apply the usual decision conventions as in Swensson's detection model [Med. Phys. 23, 1709-1725 (1996)]. We present several examples in which these tests are compared. We also show how the signal specificity function can be used to model various degrees of accuracy of the observer's knowledge about image noise and signal statistical properties. Aspects concerning modeling of the human observer are also discussed.
Affiliation(s)
- Lucreţiu M Popescu
- Department of Radiology, University of Pennsylvania, Philadelphia, Pennsylvania 19104-6021, USA.
|
23
|
Gur D, Bandos AI, King JL, Klym AH, Cohen CS, Hakim CM, Hardesty LA, Ganott MA, Perrin RL, Poller WR, Shah R, Sumkin JH, Wallace LP, Rockette HE. Binary and multi-category ratings in a laboratory observer performance study: a comparison. Med Phys 2008; 35:4404-9. [PMID: 18975686 DOI: 10.1118/1.2977766] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/07/2022]
Abstract
The authors investigated radiologists' performances during retrospective interpretation of screening mammograms when using a binary decision whether to recall a woman for additional procedures or not and compared them with their receiver operating characteristic (ROC) type performance curves using a semi-continuous rating scale. Under an Institutional Review Board approved protocol nine experienced radiologists independently rated an enriched set of 155 examinations that they had not personally read in the clinic, mixed with other enriched sets of examinations that they had individually read in the clinic, using both a screening BI-RADS rating scale (recall/not recall) and a semi-continuous ROC type rating scale (0 to 100). The vertical distance, namely the difference in sensitivity levels at the same specificity levels, between the empirical ROC curve and the binary operating point was computed for each reader. The vertical distance averaged over all readers was used to assess the proximity of the performance levels under the binary and ROC-type rating scale. There does not appear to be any systematic tendency of the readers towards a better performance when using either of the two rating approaches, namely four readers performed better using the semi-continuous rating scale, four readers performed better with the binary scale, and one reader had the point exactly on the empirical ROC curve. Only one of the nine readers had a binary "operating point" that was statistically distant from the same reader's empirical ROC curve. Reader-specific differences ranged from -0.046 to 0.128 with an average width of the corresponding 95% confidence intervals of 0.2 and p-values ranging for individual readers from 0.050 to 0.966. On average, radiologists performed similarly when using the two rating scales in that the average distance between an individual reader's binary operating point and their ROC curve was close to zero. The 95% confidence interval for the fixed-reader average (0.016) was (-0.0206, 0.0631) (two-sided p-value 0.35). In conclusion the authors found that in retrospective observer performance studies the use of a binary response or a semi-continuous rating scale led to consistent results in terms of performance as measured by sensitivity-specificity operating points.
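The "vertical distance" described in this abstract can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' procedure: the function names, the use of linear interpolation between empirical ROC points, the sign convention (binary sensitivity minus ROC sensitivity), and the toy ratings are all hypothetical. NumPy is assumed to be available.

```python
import numpy as np


def empirical_roc(neg_ratings, pos_ratings):
    """Empirical ROC points (FPF, TPF), from (0, 0) up to (1, 1)."""
    neg = np.asarray(neg_ratings, dtype=float)
    pos = np.asarray(pos_ratings, dtype=float)
    # Sweep the threshold from the highest observed rating to the lowest.
    thresholds = np.unique(np.concatenate([neg, pos]))[::-1]
    fpf = [0.0] + [float(np.mean(neg >= t)) for t in thresholds]
    tpf = [0.0] + [float(np.mean(pos >= t)) for t in thresholds]
    return np.array(fpf), np.array(tpf)


def vertical_distance(binary_sens, binary_spec, neg_ratings, pos_ratings):
    """Binary sensitivity minus ROC sensitivity at the same specificity.

    The sign convention and the linear interpolation are assumptions.
    Where the empirical curve has a vertical segment exactly at the
    requested FPF, np.interp returns one of the tied values.
    """
    fpf, tpf = empirical_roc(neg_ratings, pos_ratings)
    roc_sens = float(np.interp(1.0 - binary_spec, fpf, tpf))
    return binary_sens - roc_sens


# Hypothetical 0-100 ratings for one reader.
neg = [10, 20, 30, 40]
pos = [30, 60, 70, 90]
d = vertical_distance(binary_sens=0.9, binary_spec=0.625,
                      neg_ratings=neg, pos_ratings=pos)
```

Under this convention a positive `d` means the binary operating point lies above the empirical ROC curve; averaging `d` over readers gives the kind of summary the abstract reports.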
Affiliation(s)
- David Gur
- Department of Radiology, University of Pittsburgh, Pittsburgh, Pennsylvania 15213, USA.
|
24
|
Chakraborty DP. Validation and statistical power comparison of methods for analyzing free-response observer performance studies. Acad Radiol 2008; 15:1554-66. [PMID: 19000872 DOI: 10.1016/j.acra.2008.07.018] [Citation(s) in RCA: 75] [Impact Index Per Article: 4.7] [Received: 06/24/2008] [Revised: 07/16/2008] [Accepted: 07/17/2008] [Indexed: 11/26/2022]
Abstract
RATIONALE AND OBJECTIVES The aim of this work was to validate and compare the statistical powers of proposed methods for analyzing free-response data using a search-model-based simulator. MATERIALS AND METHODS A free-response data simulator is described that can model a single reader interpreting the same cases in two modalities, or two computer-aided detection (CAD) algorithms, or two human observers, interpreting the same cases in one modality. A variance components model, analogous to the Roe and Metz receiver-operating characteristic (ROC) data simulator, is described; it models intracase and intermodality correlations in free-response studies. Two generic observers were simulated: a quasi-human observer and a quasi-CAD algorithm. Null hypothesis (NH) validity and statistical powers of ROC, jackknife alternative free-response operating characteristic (JAFROC), a variant of JAFROC termed JAFROC-1, initial detection and candidate analysis (IDCA), and a nonparametric (NP) approach were investigated. RESULTS All methods had valid NH behavior over a wide range of simulator parameters. For equal numbers of normal and abnormal cases, for the human observer, the statistical power ranking of the methods was JAFROC-1 > JAFROC > (IDCA approximately NP) > ROC. For the CAD algorithm, the ranking was (NP approximately IDCA) > (JAFROC-1 approximately JAFROC) > ROC. In either case, the statistical power of the highest ranked method exceeded that of the lowest ranked method by about a factor of two. Dependence of statistical power on simulator parameters followed expected trends. For data sets with more abnormal cases than normal cases, JAFROC-1 power significantly exceeded JAFROC power. CONCLUSION Based on this work, the recommendation is to use JAFROC-1 for human observers (including human observers with CAD assist) and the NP method for evaluating CAD algorithms.
|
25
|
Performance assessments of diagnostic systems under the FROC paradigm: experimental, analytical, and results interpretation issues. Acad Radiol 2008; 15:1312-5. [PMID: 18790403 DOI: 10.1016/j.acra.2008.05.006] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Received: 04/22/2008] [Revised: 05/22/2008] [Accepted: 04/29/2008] [Indexed: 11/22/2022]
Abstract
As use of free response receiver-operating characteristic (FROC) curves gains more acceptance for quantitatively assessing the performance of diagnostic systems, it is important that the experimentalist understands the possible role of this approach as one of the experimental design paradigms that are available to him or her among all other approaches as well as some of the issues associated with FROC type studies. In a number of experimental scenarios, the FROC paradigm and associated analytical tools have theoretical and practical advantages over both the binary and the ROC approaches to performance assessments of diagnostic systems, but it also has some limitations related to experimental design, data analyses, clinical relevance, and complexity in the interpretation of the results. These issues are rarely discussed and are the focus of this work.
|
26
|
Song T, Bandos AI, Rockette HE, Gur D. On comparing methods for discriminating between actually negative and actually positive subjects with FROC type data. Med Phys 2008; 35:1547-58. [PMID: 18491549 DOI: 10.1118/1.2890410] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Indexed: 11/07/2022]
Abstract
The task of searching and detecting multiple abnormalities depicted on an image, or a series of images, is a common problem in different areas such as military target detection or diagnostic medical imaging. A free response receiver operating characteristic (FROC) approach for assessing performance in many of these scenarios entails marking the locations of suspected abnormalities and indicating a level of suspicion at each of the marked locations. One of the important characteristics of a system being evaluated under the FROC paradigm is its performance in the conventional ROC domain, namely classifying a subject (or a unit of interest) as "negative" or "positive" in regard to the presence of the abnormality (or any of the abnormalities) of interest. With FROC data we can compare subjects by specifying a function of multiple scores within a subject. This approach allows formulating subject-based ROC type indices that can be estimated using existing ROC concepts. In this article we focus on indices that reflect the ability of the system to discriminate between actually negative and actually positive subjects. We consider a previously proposed index that is based on the comparison of the highest scores on subjects and two new indices that are based on potentially more stable comparison functions, namely comparison of average scores and stochastic dominance. Based on these indices we develop nonparametric procedures for comparing subject-based discriminative ability of diagnostic systems being evaluated under the FROC paradigm. We also investigate the properties of the statistical procedures in a simulation study.
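The idea of reducing the multiple scores on a subject to one comparison value can be illustrated with a short Python sketch. It covers the highest-score and average-score reductions named in the abstract and estimates a subject-based ROC area with the usual Wilcoxon (Mann-Whitney) statistic; the stochastic-dominance index is not implemented. The function names, the treatment of unmarked subjects as scoring below every mark, and the toy data are assumptions for illustration only.

```python
import math


def subject_score(marks, reduce="max"):
    """Collapse the mark scores on one subject to a single value.

    An unmarked subject gets -inf, i.e. it ranks below any marked
    subject (an assumption of this sketch).
    """
    if not marks:
        return -math.inf
    if reduce == "max":
        return max(marks)
    if reduce == "mean":
        return sum(marks) / len(marks)
    raise ValueError("reduce must be 'max' or 'mean'")


def subject_auc(neg_subjects, pos_subjects, reduce="max"):
    """Wilcoxon estimate of P(positive subject outranks negative subject)."""
    neg = [subject_score(m, reduce) for m in neg_subjects]
    pos = [subject_score(m, reduce) for m in pos_subjects]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical FROC data: one list of mark scores per subject.
neg_subjects = [[], [20], [35, 10]]
pos_subjects = [[80, 15], [30], []]

auc_max = subject_auc(neg_subjects, pos_subjects, reduce="max")
auc_mean = subject_auc(neg_subjects, pos_subjects, reduce="mean")
```

The two reductions can rank the same subjects differently, which is why the choice of comparison function matters for the resulting index.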
Affiliation(s)
- Tao Song
- Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, USA
|
27
|
Bandos AI, Rockette HE, Song T, Gur D. Area under the free-response ROC curve (FROC) and a related summary index. Biometrics 2008; 65:247-56. [PMID: 18479482 DOI: 10.1111/j.1541-0420.2008.01049.x] [Citation(s) in RCA: 83] [Impact Index Per Article: 5.2] [Indexed: 11/29/2022]
Abstract
Free-response assessment of diagnostic systems continues to gain acceptance in areas related to the detection, localization, and classification of one or more "abnormalities" within a subject. A free-response receiver operating characteristic (FROC) curve is a tool for characterizing the performance of a free-response system at all decision thresholds simultaneously. Although the importance of a single index summarizing the entire curve over all decision thresholds is well recognized in ROC analysis (e.g., area under the ROC curve), currently there is no widely accepted summary of a system being evaluated under the FROC paradigm. In this article, we propose a new index of the free-response performance at all decision thresholds simultaneously, and develop a nonparametric method for its analysis. Algebraically, the proposed summary index is the area under the empirical FROC curve penalized for the number of erroneous marks, rewarded for the fraction of detected abnormalities, and adjusted for the effect of the target size (or "acceptance radius"). Geometrically, the proposed index can be interpreted as a measure of average performance superiority over an artificial "guessing" free-response process and it represents an analogy to the area between the ROC curve and the "guessing" or diagonal line. We derive the ideal bootstrap estimator of the variance, which can be used for a resampling-free construction of asymptotic bootstrap confidence intervals and for sample size estimation using standard expressions. The proposed procedure is free from any parametric assumptions and does not require an assumption of independence of observations within a subject. We provide an example with a dataset sampled from a diagnostic imaging study and conduct simulations that demonstrate the appropriateness of the developed procedure for the considered sample sizes and ranges of parameters.
Affiliation(s)
- Andriy I Bandos
- Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, USA.
|
28
|
Chakraborty DP, Yoon HJ. Operating characteristics predicted by models for diagnostic tasks involving lesion localization. Med Phys 2008; 35:435-45. [PMID: 18383663 DOI: 10.1118/1.2820902] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 11/07/2022]
Abstract
In 1996 Swensson published an observer model that predicted receiver operating characteristic (ROC), localization ROC (LROC), free-response ROC (FROC) and alternative FROC (AFROC) curves, thereby achieving "unification" of different observer performance paradigms. More recently a model termed initial detection and candidate analysis (IDCA) has been proposed for fitting computer aided detection (CAD) generated FROC data, and recently a search model for human observer FROC data has been proposed. The purpose of this study was to derive IDCA and the search model based expressions for operating characteristics, and to compare the predictions to the Swensson model. For three out of four mammography CAD data sets all models yielded good fits in the high-confidence region, i.e., near the lower end of the plots. The search model and IDCA tended to better fit the data in the low-confidence region, i.e., near the upper end of the plots, particularly for FROC curves for which the Swensson model predictions departed markedly from the data. For one data set none of the models yielded satisfactory fits. A unique characteristic of search model and IDCA predicted operating characteristics is that the operating point is not allowed to move continuously to the lowest confidence limit of the corresponding Swensson model curves. This prediction is actually observed in the CAD raw data and it is the primary reason for the poor FROC fits of the Swensson model in the low-confidence region.
Affiliation(s)
- D P Chakraborty
- Department of Radiology, University of Pittsburgh, 3520 Forbes Avenue, Parkvale Building, Room 109, Pittsburgh, Pennsylvania 15261, USA.
|
29
|
ROC analysis in medical imaging: a tutorial review of the literature. Radiol Phys Technol 2007; 1:2-12. [PMID: 20821157 DOI: 10.1007/s12194-007-0002-1] [Citation(s) in RCA: 77] [Impact Index Per Article: 4.5] [Received: 09/24/2007] [Accepted: 09/25/2007] [Indexed: 10/22/2022]
|
30
|
Abstract
Computer-aided detection (CAD) has been attracting extensive research interest during the last two decades. It is recognized that the full potential of CAD can only be realized by improving the performance and robustness of CAD algorithms and this requires good evaluation methodology that would permit CAD designers to optimize their algorithms. Free-response receiver operating characteristic (FROC) curves are widely used to assess CAD performance; however, evaluation rarely proceeds beyond determination of lesion localization fraction (sensitivity) at an arbitrarily selected value of nonlesion localizations (false marks) per image. This work describes a FROC curve fitting procedure that uses a recent model of visual search that serves as a framework for the free-response task. A maximum likelihood procedure for estimating the parameters of the model from free-response data and fitting CAD generated FROC curves was implemented. Procedures were implemented to estimate two figures of merit and associated statistics such as 95% confidence intervals and goodness of fit. One of the figures of merit does not require the arbitrary specification of an operating point at which to evaluate CAD performance. For comparison a related method termed initial detection and candidate analysis was also implemented that is applicable when all suspicious regions are reported. The two methods were tested on seven mammography CAD data sets and both yielded good to excellent fits. The search model approach has the advantage that it can potentially be applied to radiologist generated free-response data where not all suspicious regions are reported, only the ones that are deemed sufficiently suspicious to warrant clinical follow-up. This work represents the first practical application of the search model to an important evaluation problem in diagnostic radiology. Software based on this work is expected to benefit CAD developers working in diverse areas of medical imaging.
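The operating points mentioned in this abstract, lesion localization fraction (LLF) against nonlesion localizations per image, can be computed empirically as in the minimal Python sketch below. It is not the search-model fit the abstract describes; the function name `froc_points`, the mark representation, and the toy data are hypothetical, and each lesion is assumed to be marked at most once.

```python
def froc_points(marks, n_lesions, n_images):
    """Empirical FROC operating points as (NL per image, LLF) pairs.

    marks: list of (rating, is_lesion_localization) tuples pooled over
           all images (a hypothetical representation).
    """
    thresholds = sorted({rating for rating, _ in marks}, reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        # Count marks at or above the current rating threshold.
        ll = sum(1 for r, is_ll in marks if is_ll and r >= t)
        nl = sum(1 for r, is_ll in marks if not is_ll and r >= t)
        points.append((nl / n_images, ll / n_lesions))
    return points


# Hypothetical CAD output: 5 images containing 4 lesions in total.
marks = [(0.9, True), (0.8, False), (0.7, True), (0.4, False), (0.3, True)]
points = froc_points(marks, n_lesions=4, n_images=5)
end_nlf, end_llf = points[-1]
```

In this toy data set one lesion is never marked, so the curve ends below an LLF of 1; reporting LLF at a single chosen value of false marks per image corresponds to reading off one of these points.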
Affiliation(s)
- Hong Jun Yoon
- Department of Radiology, University of Pittsburgh, Pittsburgh, PA 15261
- Bin Zheng
- Department of Radiology, University of Pittsburgh, Pittsburgh, PA 15261
- Berkman Sahiner
- Department of Radiology, University of Michigan, Ann Arbor, MI 48109
|
31
|
Gur D, Rockette HE, Bandos AI. "Binary" and "non-binary" detection tasks: are current performance measures optimal? Acad Radiol 2007; 14:871-6. [PMID: 17626312 DOI: 10.1016/j.acra.2007.03.014] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Indexed: 11/20/2022]
Abstract
We have observed that a very large fraction of responses for several detection tasks during the performance of observer studies are in the extreme ranges of lower than 11% or higher than 89% regardless of the actual presence or absence of the abnormality in question or its subjectively rated "subtleness." This observation raises questions regarding the validity and appropriateness of using multicategory rating scales for such detection tasks. Monte Carlo simulation of binary and multicategory ratings for these tasks demonstrates that the use of the former (binary) often results in a less biased and more precise summary index and hence may lead to a higher statistical power for determining differences between modalities.
Affiliation(s)
- David Gur
- Department of Radiology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA.
|
32
|
Gur D. Objectively measuring and comparing performance levels of diagnostic imaging systems and practices. Acad Radiol 2007; 14:641-2. [PMID: 17502252 DOI: 10.1016/j.acra.2007.04.007] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 06/15/2007] [Revised: 04/16/2007] [Accepted: 06/16/2007] [Indexed: 11/30/2022]
|
33
|
Wagner RF, Metz CE, Campbell G. Assessment of medical imaging systems and computer aids: a tutorial review. Acad Radiol 2007; 14:723-48. [PMID: 17502262 DOI: 10.1016/j.acra.2007.03.001] [Citation(s) in RCA: 99] [Impact Index Per Article: 5.8] [Received: 01/05/2007] [Revised: 03/05/2007] [Accepted: 03/06/2007] [Indexed: 11/22/2022]
Abstract
This article reviews the central issues that arise in the assessment of diagnostic imaging and computer-assist modalities. The paradigm of the receiver operating characteristic (ROC) curve--the dependence of the true-positive fraction versus the false-positive fraction as a function of the level of aggressiveness of the reader/radiologist toward a positive call--is essential to this field because diagnostic imaging systems are used in multiple settings, including controlled laboratory studies in which the prevalence of disease is different from that encountered in a study in the field. The basic equation of statistical decision theory is used to display how readers can vary their level of aggressiveness according to this diagnostic context. Most studies of diagnostic modalities in the last 15 years have demonstrated not only a range of levels of reader aggressiveness, but also a range of level of reader performance. These characteristics require a multivariate approach to ROC analysis that accounts for both the variation of case difficulty and the variation of reader skill in a study. The resulting paradigm is called the multiple-reader, multiple-case ROC paradigm. Highlights of historic as well as contemporary work in this field are reviewed. Many practical issues related to study design and resulting statistical power are included, together with recent developments and availability of analytical software.
Affiliation(s)
- Robert F Wagner
- Office of Science and Engineering Laboratories, Center for Devices & Radiological Health, Food and Drug Administration, Rockville, MD 20850, USA.
|
34
|
Abstract
When a person or an algorithm searches for targets throughout an image, there are no discrete trials, and the probability of a false alarm cannot be computed. Instead, what is observable is the rate of production of false alarms (per image, say), and data analysis uses the free-response version of signal detection theory. A previously-proposed model implies a power relationship for the free-response operating characteristic. The present letter introduces an extra parameter. The relationship between the logarithms of probability of correct detection and rate of false alarm production is no longer forced to be linear.
|
35
|
Abstract
In this paper we review several results of the nonparametric receiver operating characteristic (ROC) analysis and present an extension to the nonparametric localization ROC (LROC) analysis. Equations for the estimation of the area under the characteristic curve and for the variance calculations are derived. Expressions for the choice of the optimal ratio between the number of signal-absent and signal-present image samples are also presented. The results can be applied with either continuous or discrete scoring scales. The simulation studies carried out validate the theoretical derivations and show that the LROC analysis is considerably more sensitive than the ROC analysis.
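For orientation, the standard nonparametric estimate of the area under the ROC curve is the Mann-Whitney statistic: the fraction of (signal-present, signal-absent) score pairs that are correctly ordered, with ties counted as one half. The sketch below shows only this ordinary ROC estimate; it does not reproduce the LROC extension or the variance formulas derived in the cited paper, and the function name and scores are hypothetical.

```python
def auc_mann_whitney(signal_present, signal_absent):
    """Nonparametric AUC: share of correctly ordered score pairs, ties = 1/2."""
    wins = 0.0
    for x in signal_present:
        for y in signal_absent:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5  # tie handling makes discrete rating scales usable
    return wins / (len(signal_present) * len(signal_absent))


# Made-up scores; works for continuous or discrete scales alike.
auc = auc_mann_whitney([3, 4, 4, 2], [1, 2, 1, 3])
```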
Affiliation(s)
- Lucretiu M Popescu
- University of Pennsylvania, Department of Radiology, 423 Guardian Drive, 4th floor Blockley Hall, Philadelphia, PA 19104-6021, USA.
|
36
|
Chakraborty D, Yoon HJ, Mello-Thoms C. Spatial localization accuracy of radiologists in free-response studies: Inferring perceptual FROC curves from mark-rating data. Acad Radiol 2007; 14:4-18. [PMID: 17178361 PMCID: PMC1829298 DOI: 10.1016/j.acra.2006.10.015] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.5] [Received: 09/12/2006] [Revised: 10/24/2006] [Accepted: 10/26/2006] [Indexed: 11/25/2022]
Abstract
RATIONALE AND OBJECTIVES Free-response data consist of a set of mark-rating pairs. Before analysis, the data are classified or "scored" into lesion and non-lesion localizations. The scoring is done by choosing an acceptance-radius and classifying marks within the acceptance-radius of lesion centers as lesion localizations, and all other marks are classified as non-lesion localizations. The scored data are plotted as a free-response receiver operating characteristic (FROC) curve, essentially a plot of appropriately normalized numbers of lesion localizations vs. non-lesion localizations. Scored FROC curves are frequently used to compare imaging systems and computer-aided detection (CAD) algorithms. However, the choice of acceptance-radius is arbitrary. This makes it difficult to compare curves from different studies and to estimate true performance. MATERIALS AND METHODS To resolve this issue the concept of two types of marks is introduced: perceptual hits and perceptual misses. A perceptual hit is a mark made in response to the observer seeing the lesion. A perceptual miss is a mark made in response to the observer seeing a (lesion-like) non-lesion. A method of estimating the most probable numbers of perceptual hits and misses is described. This allows one to plot a perceptual FROC operating point and by extension a perceptual FROC curve. Unlike a scored FROC operating point, a perceptual point is independent of the choice of acceptance-radius. The method does not allow one to identify individual marks as perceptual hits or misses, only the most probable numbers. It is based on a three-parameter statistical model of the spatial distributions of perceptual hits and misses relative to lesion centers. RESULTS The method has been applied to an observer dataset in which mammographers and residents with different levels of experience were asked to locate lesions in mammograms. The perceptual operating points suggest superior performance for the mammographers and equivalent performance for residents in the first and second mammography rotations. These results and the model validation are preliminary as they are based on a small dataset. CONCLUSION The significance of this study is showing that it is possible to probabilistically determine if a mark resulted from seeing a lesion or a non-lesion. Using the method developed in this study one could perform acceptance-radius independent estimation of observer performance.
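The conventional scoring step described in this abstract (classifying marks by an acceptance-radius around lesion centers) can be sketched as follows. This is only the radius-based scoring that the paper criticizes as arbitrary, not the three-parameter perceptual model it proposes; the function name, the 2D coordinates, and the example values are hypothetical.

```python
import math


def score_marks(marks, lesion_centers, acceptance_radius):
    """Split marks into lesion and non-lesion localizations.

    marks: list of (x, y, rating) tuples.
    lesion_centers: list of (x, y) tuples.
    A mark within acceptance_radius of any lesion center is a lesion
    localization; every other mark is a non-lesion localization.
    """
    lesion_locs, non_lesion_locs = [], []
    for x, y, rating in marks:
        hit = any(
            math.hypot(x - cx, y - cy) <= acceptance_radius
            for cx, cy in lesion_centers
        )
        (lesion_locs if hit else non_lesion_locs).append((x, y, rating))
    return lesion_locs, non_lesion_locs


marks = [(10, 10, 4), (50, 50, 2), (13, 14, 3)]
lesions = [(12, 12)]
wide_ll, wide_nl = score_marks(marks, lesions, acceptance_radius=3.0)
tight_ll, tight_nl = score_marks(marks, lesions, acceptance_radius=2.5)
```

In this made-up example the mark at (10, 10) counts as a lesion localization with a radius of 3.0 but as a non-lesion localization with a radius of 2.5, which illustrates why scored FROC points depend on the chosen radius.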
Affiliation(s)
- Dev Chakraborty
- Department of Radiology, University of Pittsburgh, 3520 5th Avenue, Suite 300, Pittsburgh, PA 15261, USA.
|
37
|
Popescu LM, Lewitt RM. Small nodule detectability evaluation using a generalized scan-statistic model. Phys Med Biol 2006; 51:6225-44. [PMID: 17110782 DOI: 10.1088/0031-9155/51/23/020] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.9] [Indexed: 11/12/2022]
Abstract
This paper investigates the use of the scan statistic for evaluating the detectability of small nodules in medical images. The scan-statistic method is often used in applications in which random fields must be searched for abnormal local features. Several results of detection-with-localization theory are reviewed and a generalization is presented using the noise nodule distribution obtained by scanning arbitrary areas. One benefit of the noise nodule model is that it enables determination of the scan-statistic distribution by using only a few image samples in a way suitable both for simulation and experimental setups. Also, based on the noise nodule model, the case of multiple targets per image is addressed and an image abnormality test using the likelihood ratio and an alternative test using multiple decision thresholds are derived. The results obtained reveal that in the case of low contrast nodules or multiple nodules the usual test strategy based on a single decision threshold underperforms compared with the alternative tests. That is a consequence of the fact that not only the contrast or the size, but also the number of suspicious nodules is a clue indicating the image abnormality. In the case of the likelihood ratio test, the multiple clues are unified in a single decision variable. Other tests that process multiple clues differently do not necessarily produce a unique ROC curve, as shown in examples using a test involving two decision thresholds. We present examples with two-dimensional time-of-flight (TOF) and non-TOF PET image sets analysed using the scan statistic for different search areas, as well as the fixed position observer.
Affiliation(s)
- Lucreţiu M Popescu
- Department of Radiology, University of Pennsylvania, 423 Guardian Drive, 4th floor Blockley Hall, Philadelphia, PA 19104-6021, USA.
|
38
|
Abstract
In imaging tasks where the observer is uncertain whether lesions are present, and where they could be present, the image is searched for lesions. In the free-response paradigm, which closely reflects this task, the observer provides data in the form of a variable number of mark-rating pairs per image. In a companion paper a statistical model of visual search has been proposed that has parameters characterizing the perceived lesion signal-to-noise ratio, the ability of the observer to avoid marking non-lesion locations, and the ability of the observer to find lesions. The aim of this work is to relate the search model parameters to receiver operating characteristic (ROC) curves that would result if the observer reported the rating of the most suspicious finding on an image as the overall rating. Also presented are the probability density functions (pdfs) of the underlying latent decision variables corresponding to the highest rating for normal and abnormal images. The search-model-predicted ROC curves are 'proper' in the sense of never crossing the chance diagonal and having a monotonically changing slope. They also have the interesting property of not allowing the observer to move the operating point continuously from the origin to (1, 1). For certain choices of parameters the operating points are predicted to be clustered near the initial steep region of the curve, as has been observed by other investigators. The pdfs are non-Gaussian, markedly so for the abnormal images and for certain choices of parameter values, and provide an explanation for the well-known observation that experimental ROC data generally imply a wider pdf for abnormal images than for normal images. Some features of search-model-predicted ROC curves and pdfs resemble those predicted by the contaminated binormal model, but there are significant differences. The search model appears to provide physical explanations for several aspects of experimental ROC curves.
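The link between free-response data and ROC data described in this abstract rests on taking the rating of the most suspicious mark as the overall image rating. A minimal sketch of that reduction is shown below; it does not implement the search model itself. The function name is hypothetical, and assigning a rating of 0 to images with no marks is an assumption made here for illustration.

```python
def inferred_roc_ratings(mark_ratings_per_image, unmarked_rating=0):
    """Reduce each image's mark ratings to a single ROC-style rating.

    The overall rating is the highest mark rating on the image.
    Images without marks get unmarked_rating (an assumed convention).
    """
    return [
        max(ratings, default=unmarked_rating)
        for ratings in mark_ratings_per_image
    ]


# Three images: two marks, no marks, one mark (made-up ratings).
ratings = inferred_roc_ratings([[2, 4], [], [3]])
```

Unmarked images all share the same lowest rating, which is consistent with the abstract's remark that the operating point cannot be moved continuously all the way to (1, 1).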
Affiliation(s)
- D P Chakraborty
- Department of Radiology, University of Pittsburgh, 3520 5th Avenue, Suite 300, Pittsburgh, PA 15261, USA.
|