Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Chakraborty DP. A search model and figure of merit for observer data acquired according to the free-response paradigm. Phys Med Biol 2006;51:3449-62. [PMID: 16825742 PMCID: PMC2230665 DOI: 10.1088/0031-9155/51/14/012] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.3] [Indexed: 11/12/2022]

Cited by Other Article(s)

Yapp KE, Ekpo E. Clinical history and incidental abnormality detection in endodontic cone beam computed tomography. J Med Imaging (Bellingham) 2023;10:045502. [PMID: 37529625 PMCID: PMC10390029 DOI: 10.1117/1.jmi.10.4.045502] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2023] [Revised: 06/28/2023] [Accepted: 07/21/2023] [Indexed: 08/03/2023]

Yapp KE, Suleiman M, Brennan P, Ekpo E. Periapical Radiography versus Cone Beam Computed Tomography in Endodontic Disease Detection: A Free-response, Factorial Study. J Endod 2023;49:419-429. [PMID: 36773745 DOI: 10.1016/j.joen.2023.02.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2022] [Revised: 01/17/2023] [Accepted: 02/01/2023] [Indexed: 02/11/2023]

The effect of clinical history on diagnostic performance of endodontic cone-beam CT interpretation. Clin Radiol 2023;78:e433-e441. [PMID: 36702710 DOI: 10.1016/j.crad.2022.12.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2022] [Revised: 11/21/2022] [Accepted: 12/09/2022] [Indexed: 01/12/2023]

Establishment of image quality for MRI of the knee joint using a list of anatomical criteria. Radiography (Lond) 2018;24:196-203. [DOI: 10.1016/j.radi.2018.01.008] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 07/14/2017] [Revised: 01/28/2018] [Accepted: 01/30/2018] [Indexed: 11/21/2022]

Chakraborty DP, Zhai X. On the meaning of the weighted alternative free-response operating characteristic figure of merit. Med Phys 2017;43:2548. [PMID: 27147365 DOI: 10.1118/1.4947125] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 01/31/2023]

Abstract

PURPOSE

The free-response receiver operating characteristic (FROC) method is being increasingly used to evaluate observer performance in search tasks. Data analysis requires definition of a figure of merit (FOM) quantifying performance. While a number of FOMs have been proposed, the recommended one, namely, the weighted alternative FROC (wAFROC) FOM, is not well understood. The aim of this work is to clarify the meaning of this FOM by relating it to the empirical area under a proposed wAFROC curve.

METHODS

The weighted wAFROC FOM is defined in terms of a quasi-Wilcoxon statistic that involves weights, coding the clinical importance, assigned to each lesion. A new wAFROC curve is proposed, the y-axis of which incorporates the weights, giving more credit for marking clinically important lesions, while the x-axis is identical to that of the AFROC curve. An expression is derived relating the area under the empirical wAFROC curve to the wAFROC FOM. Examples are presented with small numbers of cases showing how AFROC and wAFROC curves are affected by correct and incorrect decisions and how the corresponding FOMs credit or penalize these decisions. The wAFROC, AFROC, and inferred ROC FOMs were applied to three clinical data sets involving multiple reader FROC interpretations in different modalities.

RESULTS

It is shown analytically that the area under the empirical wAFROC curve equals the wAFROC FOM. This theorem is the FROC analog of a well-known theorem developed in 1975 for ROC analysis, which gave meaning to a Wilcoxon statistic based ROC FOM. A similar equivalence applies between the area under the empirical AFROC curve and the AFROC FOM. The examples show explicitly that the wAFROC FOM gives equal importance to all diseased cases, regardless of the number of lesions, a desirable statistical property not shared by the AFROC FOM. Applications to the clinical data sets show that the wAFROC FOM yields results comparable to that using the AFROC FOM.

CONCLUSIONS

The equivalence theorem gives meaning to the weighted AFROC FOM, namely, it is identical to the empirical area under the weighted AFROC curve.
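
The ROC theorem that the RESULTS section cites as the analog (the area under the empirical ROC curve equals the Wilcoxon statistic) can be checked numerically. The following is a minimal Python sketch for the plain ROC case only; it does not implement the wAFROC FOM, and the function names and toy ratings are hypothetical illustrations rather than anything taken from the paper.

```python
from itertools import product


def wilcoxon_fom(neg, pos):
    """Wilcoxon statistic: fraction of (neg, pos) pairs ranked correctly, ties count 0.5."""
    total = 0.0
    for x, y in product(neg, pos):
        if y > x:
            total += 1.0
        elif y == x:
            total += 0.5
    return total / (len(neg) * len(pos))


def empirical_roc_area(neg, pos):
    """Trapezoidal area under the empirical ROC curve built from the same ratings."""
    # Sweep the threshold from the highest observed rating down to the lowest.
    thresholds = sorted(set(neg) | set(pos), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        fpf = sum(x >= t for x in neg) / len(neg)  # false positive fraction
        tpf = sum(y >= t for y in pos) / len(pos)  # true positive fraction
        points.append((fpf, tpf))
    # The lowest threshold admits every case, so the last point is (1, 1).
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical ratings for non-diseased (neg) and diseased (pos) cases.
neg = [1, 2, 2, 3, 4]
pos = [2, 3, 4, 4, 5]

fom = wilcoxon_fom(neg, pos)
area = empirical_roc_area(neg, pos)
print(fom, area)
```

For these toy ratings both quantities come out to 0.78 (up to floating-point rounding), which is the equivalence the abstract extends to the weighted AFROC setting.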

Zarb F, McEntee MF, Rainford L. Visual grading characteristics and ordinal regression analysis during optimisation of CT head examinations. Insights Imaging 2015;6:393-401. [PMID: 25510470 PMCID: PMC4444791 DOI: 10.1007/s13244-014-0374-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 04/15/2014] [Revised: 11/16/2014] [Accepted: 11/21/2014] [Indexed: 11/26/2022]

Zarb F, McEntee MF, Rainford L. A multi-phased study of optimisation methodologies and radiation dose savings for head CT examinations. Radiation Protection Dosimetry 2015;163:480-490. [PMID: 25009189 DOI: 10.1093/rpd/ncu227] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 06/03/2023]

He X, Samuelson F, Zeng R, Sahiner B. Discovering intrinsic properties of human observers' visual search and mathematical observers' scanning. Journal of the Optical Society of America. A, Optics, Image Science, and Vision 2014;31:2495-2510. [PMID: 25401363 DOI: 10.1364/josaa.31.002495] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 06/04/2023]

Petrick N, Sahiner B, Armato SG, Bert A, Correale L, Delsanto S, Freedman MT, Fryd D, Gur D, Hadjiiski L, Huo Z, Jiang Y, Morra L, Paquerault S, Raykar V, Samuelson F, Summers RM, Tourassi G, Yoshida H, Zheng B, Zhou C, Chan HP. Evaluation of computer-aided detection and diagnosis systems. Med Phys 2014;40:087001. [PMID: 23927365 DOI: 10.1118/1.4816310] [Citation(s) in RCA: 65] [Impact Index Per Article: 6.5] [Indexed: 01/09/2023]

Abstract

Computer-aided detection and diagnosis (CAD) systems are increasingly being used as an aid by clinicians for detection and interpretation of diseases. Computer-aided detection systems mark regions of an image that may reveal specific abnormalities and are used to alert clinicians to these regions during image interpretation. Computer-aided diagnosis systems provide an assessment of a disease using image-based information alone or in combination with other relevant diagnostic data and are used by clinicians as a decision support in developing their diagnoses. While CAD systems are commercially available, standardized approaches for evaluating and reporting their performance have not yet been fully formalized in the literature or in a standardization effort. This deficiency has led to difficulty in the comparison of CAD devices and in understanding how the reported performance might translate into clinical practice. To address these important issues, the American Association of Physicists in Medicine (AAPM) formed the Computer Aided Detection in Diagnostic Imaging Subcommittee (CADSC), in part, to develop recommendations on approaches for assessing CAD system performance. The purpose of this paper is to convey the opinions of the AAPM CADSC members and to stimulate the development of consensus approaches and "best practices" for evaluating CAD systems. Both the assessment of a standalone CAD system and the evaluation of the impact of CAD on end-users are discussed. It is hoped that awareness of these important evaluation elements and the CADSC recommendations will lead to further development of structured guidelines for CAD performance assessment. 
Proper assessment of CAD system performance is expected to increase the understanding of a CAD system's effectiveness and limitations, which is expected to stimulate further research and development efforts on CAD technologies, reduce problems due to improper use, and eventually improve the utility and efficacy of CAD in clinical practice.

A brief history of free-response receiver operating characteristic paradigm data analysis. Acad Radiol 2013;20:915-9. [PMID: 23583665 DOI: 10.1016/j.acra.2013.03.001] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.0] [Received: 01/07/2013] [Revised: 03/01/2013] [Accepted: 03/07/2013] [Indexed: 11/23/2022]

Zanca F, Hillis SL, Claus F, Van Ongeval C, Celis V, Provoost V, Yoon HJ, Bosmans H. Correlation of free-response and receiver-operating-characteristic area-under-the-curve estimates: results from independently conducted FROC/ROC studies in mammography. Med Phys 2012;39:5917-29. [PMID: 23039631 DOI: 10.1118/1.4747262] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 01/13/2023]

Abstract

PURPOSE

From independently conducted free-response receiver operating characteristic (FROC) and receiver operating characteristic (ROC) experiments, to study fixed-reader associations between three estimators: the area under the alternative FROC (AFROC) curve computed from FROC data, the area under the ROC curve computed from FROC highest rating data, and the area under the ROC curve computed from confidence-of-disease ratings.

METHODS

Two hundred mammograms, 100 of which were abnormal, were processed by two image-processing algorithms and interpreted by four radiologists under the FROC paradigm. From the FROC data, inferred-ROC data were derived, using the highest rating assumption. Eighteen months afterwards, the images were interpreted by the same radiologists under the conventional ROC paradigm; conventional-ROC data (in contrast to inferred-ROC data) were obtained. FROC and ROC (inferred, conventional) data were analyzed using the nonparametric area-under-the-curve (AUC), (AFROC and ROC curve, respectively). Pearson correlation was used to quantify the degree of association between the modality-specific AUC indices and standard errors were computed using the bootstrap-after-bootstrap method. The magnitude of the correlations was assessed by comparison with computed Obuchowski-Rockette fixed reader correlations.

RESULTS

Average Pearson correlations (with 95% confidence intervals in square brackets) were: Corr(FROC, inferred ROC) = 0.76[0.64, 0.84] > Corr(inferred ROC, conventional ROC) = 0.40[0.18, 0.58] > Corr (FROC, conventional ROC) = 0.32[0.16, 0.46].

CONCLUSIONS

Correlation between FROC and inferred-ROC data AUC estimates was high. Correlation between inferred- and conventional-ROC AUC was similar to the correlation between two modalities for a single reader using one estimation method, suggesting that the highest rating assumption might be questionable.
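
The "highest rating assumption" named in METHODS reduces each case's FROC marks to a single ROC-style rating. A minimal Python sketch of that reduction follows; the helper name, the case identifiers, and the choice of negative infinity for a case with no marks are assumptions made for illustration and are not taken from the study.

```python
import math


def inferred_roc_rating(mark_ratings):
    """Return the highest mark rating on a case as its inferred ROC rating.

    Assumption: a case with no marks is assigned -inf, so it ranks below
    every marked case regardless of the rating scale in use.
    """
    return max(mark_ratings, default=-math.inf)


# Hypothetical FROC data: the list of mark ratings recorded on each case.
froc_marks = {
    "case_a": [2, 4, 1],  # three marks, highest rating 4
    "case_b": [],         # unmarked case
    "case_c": [3],        # single mark
}

inferred = {case: inferred_roc_rating(r) for case, r in froc_marks.items()}
print(inferred)
```

The inferred ratings can then be fed to any ROC estimator; whether they agree with ratings collected directly under the ROC paradigm is exactly what the study's correlations probe.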

Chakraborty DP, Yoon HJ, Mello-Thoms C. Application of threshold-bias independent analysis to eye-tracking and FROC data. Acad Radiol 2012;19:1474-83. [PMID: 23040503 PMCID: PMC3489965 DOI: 10.1016/j.acra.2012.09.002] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 09/03/2012] [Revised: 09/08/2012] [Accepted: 09/08/2012] [Indexed: 10/27/2022]

Abstract

RATIONALE AND OBJECTIVES

Studies of medical image interpretation have focused on either assessing radiologists' performance using, for example, the receiver operating characteristic (ROC) paradigm, or assessing the interpretive process by analyzing their eye-tracking (ET) data. Analysis of ET data has not benefited from threshold-bias independent figures of merit (FOMs) analogous to the area under the receiver operating characteristic (ROC) curve. The aim was to demonstrate the feasibility of such FOMs and to measure the agreement between FOMs derived from free-response ROC (FROC) and ET data.

METHODS

Eight expert breast radiologists interpreted a case set of 120 two-view mammograms while eye-position data and FROC data were continuously collected during the interpretation interval. Regions that attract prolonged (>800 ms) visual attention were considered to be virtual marks, and ratings based on the dwell and approach-rate (inverse of time-to-hit) were assigned to them. The virtual ratings were used to define threshold-bias independent FOMs in a manner analogous to the area under the trapezoidal alternative FROC (AFROC) curve (0 = worst, 1 = best). Agreement at the case level (0.5 = chance, 1 = perfect) was measured using the jackknife and 95% confidence intervals (CI) for the FOMs and agreement were estimated using the bootstrap.

RESULTS

The AFROC mark-ratings' FOM was largest at 0.734 (CI 0.65-0.81) followed by the dwell at 0.460 (0.34-0.59) and then by the approach-rate FOM 0.336 (0.25-0.46). The differences between the FROC mark-ratings' FOM and the perceptual FOMs were significant (P < .05). All pairwise agreements were significantly better than chance: ratings vs. dwell 0.707 (0.63-0.88), dwell vs. approach-rate 0.703 (0.60-0.79) and rating vs. approach-rate 0.606 (0.53-0.68). The ratings vs. approach-rate agreement was significantly smaller than the dwell vs. approach-rate agreement (P = .008).

CONCLUSIONS

Leveraging current methods developed for analyzing observer performance data could complement current ways of analyzing ET data and lead to new insights.

Svahn TM, Chakraborty DP, Ikeda D, Zackrisson S, Do Y, Mattsson S, Andersson I. Breast tomosynthesis and digital mammography: a comparison of diagnostic accuracy. Br J Radiol 2012;85:e1074-82. [PMID: 22674710 PMCID: PMC3500806 DOI: 10.1259/bjr/53282892] [Citation(s) in RCA: 107] [Impact Index Per Article: 8.9] [Received: 10/24/2011] [Revised: 02/28/2012] [Accepted: 03/14/2012] [Indexed: 11/05/2022]

Chakraborty DP. New developments in observer performance methodology in medical imaging. Semin Nucl Med 2012;41:401-18. [PMID: 21978444 DOI: 10.1053/j.semnuclmed.2011.07.001] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.9] [Indexed: 11/11/2022]

Chakraborty DP. Recent developments in imaging system assessment methodology, FROC analysis and the search model. Nuclear Instruments & Methods in Physics Research. Section A, Accelerators, Spectrometers, Detectors and Associated Equipment 2011;648 Supplement 1:S297-S301. [PMID: 21804679 PMCID: PMC3144765 DOI: 10.1016/j.nima.2010.11.042] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 05/31/2023]

Zarb F, Rainford L, McEntee MF. Image quality assessment tools for optimization of CT images. Radiography (Lond) 2010. [DOI: 10.1016/j.radi.2009.10.002] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.1] [Indexed: 01/12/2023]

Chakraborty DP. A status report on free-response analysis. Radiation Protection Dosimetry 2010;139:20-25. [PMID: 20085898 PMCID: PMC2868098 DOI: 10.1093/rpd/ncp305] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 05/28/2023]

Zanca F, Chakraborty DP, Marchal G, Bosmans H. Consistency of methods for analysing location-specific data. Radiation Protection Dosimetry 2010;139:52-56. [PMID: 20159917 PMCID: PMC2868070 DOI: 10.1093/rpd/ncq030] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 05/28/2023]

Chakraborty DP. Clinical relevance of the ROC and free-response paradigms for comparing imaging system efficacies. Radiation Protection Dosimetry 2010;139:37-41. [PMID: 20139268 PMCID: PMC2868120 DOI: 10.1093/rpd/ncq017] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 05/28/2023]

Obuchowski NA, Mazzone PJ, Dachman AH. Bias, underestimation of risk, and loss of statistical power in patient-level analyses of lesion detection. Eur Radiol 2009;20:584-94. [PMID: 19763582 DOI: 10.1007/s00330-009-1590-4] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Received: 03/16/2009] [Revised: 07/15/2009] [Accepted: 07/27/2009] [Indexed: 11/28/2022]

Counterpoint to "Performance assessment of diagnostic systems under the FROC paradigm" by Gur and Rockette. Acad Radiol 2009;16:507-10. [PMID: 19268864 DOI: 10.1016/j.acra.2008.12.011] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 11/25/2008] [Revised: 12/25/2008] [Accepted: 11/25/2008] [Indexed: 11/23/2022]

Popescu LM. Model for the detection of signals in images with multiple suspicious locations. Med Phys 2009;35:5565-74. [PMID: 19175114 DOI: 10.1118/1.3002413] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/07/2022]

Abstract

A signal detection model is presented that combines a signal model and a noise model providing mathematical descriptions of the frequency of appearance of the signals, and of the signal-like features naturally occurring in the background. We derive expressions for the likelihood functions for the whole ensemble of observed suspicious locations, in various possible combinations of signals and false signal candidates. As a result, this formalism is able to describe several new types of detection tests using likelihood ratio statistics, including a global image abnormality test and an individual signal detection test. The model also provides an alternative mechanism that selects the combination of signal and noise feature candidates with the maximum likelihood. These tests can be analyzed with a variety of operating characteristic curves (ROC, LROC, FROC, etc.). In the mathematical formalism of the model, all the details characterizing the suspicious features are reduced to a single scalar function, which we name the signal specificity function, representing the frequency with which a signal takes a certain value relative to the frequency of having a false signal with the same value in an image of given size. The signal specificity function ranks the degree of suspiciousness of the features found, and can be used to unify all the suspicious feature characteristics into a single score, to which the usual decision conventions are then applied as in Swensson's detection model [Med. Phys. 23, 1709-1725 (1996)]. We present several examples in which these tests are compared. We also show how the signal specificity function can be used to model various degrees of accuracy of the observer's knowledge about image noise and signal statistical properties. Aspects concerning modeling of the human observer are also discussed.

Gur D, Bandos AI, King JL, Klym AH, Cohen CS, Hakim CM, Hardesty LA, Ganott MA, Perrin RL, Poller WR, Shah R, Sumkin JH, Wallace LP, Rockette HE. Binary and multi-category ratings in a laboratory observer performance study: a comparison. Med Phys 2008;35:4404-9. [PMID: 18975686 DOI: 10.1118/1.2977766] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/07/2022]

Abstract

The authors investigated radiologists' performance during retrospective interpretation of screening mammograms when using a binary decision whether to recall a woman for additional procedures or not and compared it with their receiver operating characteristic (ROC) type performance curves using a semi-continuous rating scale. Under an Institutional Review Board approved protocol nine experienced radiologists independently rated an enriched set of 155 examinations that they had not personally read in the clinic, mixed with other enriched sets of examinations that they had individually read in the clinic, using both a screening BI-RADS rating scale (recall/not recall) and a semi-continuous ROC type rating scale (0 to 100). The vertical distance, namely the difference in sensitivity levels at the same specificity levels, between the empirical ROC curve and the binary operating point was computed for each reader. The vertical distance averaged over all readers was used to assess the proximity of the performance levels under the binary and ROC-type rating scale. There does not appear to be any systematic tendency of the readers towards a better performance when using either of the two rating approaches, namely four readers performed better using the semi-continuous rating scale, four readers performed better with the binary scale, and one reader had the point exactly on the empirical ROC curve. Only one of the nine readers had a binary "operating point" that was statistically distant from the same reader's empirical ROC curve. Reader-specific differences ranged from -0.046 to 0.128 with an average width of the corresponding 95% confidence intervals of 0.2 and p-values ranging for individual readers from 0.050 to 0.966. On average, radiologists performed similarly when using the two rating scales in that the average distance between each individual reader's binary operating point and their ROC curve was close to zero. 
The 95% confidence interval for the fixed-reader average (0.016) was (-0.0206, 0.0631) (two-sided p-value 0.35). In conclusion the authors found that in retrospective observer performance studies the use of a binary response or a semi-continuous rating scale led to consistent results in terms of performance as measured by sensitivity-specificity operating points.

Chakraborty DP. Validation and statistical power comparison of methods for analyzing free-response observer performance studies. Acad Radiol 2008;15:1554-66. [PMID: 19000872 DOI: 10.1016/j.acra.2008.07.018] [Citation(s) in RCA: 75] [Impact Index Per Article: 4.7] [Received: 06/24/2008] [Revised: 07/16/2008] [Accepted: 07/17/2008] [Indexed: 11/26/2022]

Abstract

RATIONALE AND OBJECTIVES

The aim of this work was to validate and compare the statistical powers of proposed methods for analyzing free-response data using a search-model-based simulator.

MATERIALS AND METHODS

A free-response data simulator is described that can model a single reader interpreting the same cases in two modalities, or two computer-aided detection (CAD) algorithms, or two human observers, interpreting the same cases in one modality. A variance components model, analogous to the Roe and Metz receiver-operating characteristic (ROC) data simulator, is described; it models intracase and intermodality correlations in free-response studies. Two generic observers were simulated: a quasi-human observer and a quasi-CAD algorithm. Null hypothesis (NH) validity and statistical powers of ROC, jackknife alternative free-response operating characteristic (JAFROC), a variant of JAFROC termed JAFROC-1, initial detection and candidate analysis (IDCA), and a nonparametric (NP) approach were investigated.

RESULTS

All methods had valid NH behavior over a wide range of simulator parameters. For equal numbers of normal and abnormal cases, for the human observer, the statistical power ranking of the methods was JAFROC-1 > JAFROC > (IDCA approximately NP) > ROC. For the CAD algorithm, the ranking was (NP approximately IDCA) > (JAFROC-1 approximately JAFROC) > ROC. In either case, the statistical power of the highest ranked method exceeded that of the lowest ranked method by about a factor of two. Dependence of statistical power on simulator parameters followed expected trends. For data sets with more abnormal cases than normal cases, JAFROC-1 power significantly exceeded JAFROC power.

CONCLUSION

Based on this work, the recommendation is to use JAFROC-1 for human observers (including human observers with CAD assist) and the NP method for evaluating CAD algorithms.

Performance assessments of diagnostic systems under the FROC paradigm: experimental, analytical, and results interpretation issues. Acad Radiol 2008;15:1312-5. [PMID: 18790403 DOI: 10.1016/j.acra.2008.05.006] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Received: 04/22/2008] [Revised: 05/22/2008] [Accepted: 04/29/2008] [Indexed: 11/22/2022]

Song T, Bandos AI, Rockette HE, Gur D. On comparing methods for discriminating between actually negative and actually positive subjects with FROC type data. Med Phys 2008;35:1547-58. [PMID: 18491549 DOI: 10.1118/1.2890410] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Indexed: 11/07/2022]

Bandos AI, Rockette HE, Song T, Gur D. Area under the free-response ROC curve (FROC) and a related summary index. Biometrics 2008;65:247-56. [PMID: 18479482 DOI: 10.1111/j.1541-0420.2008.01049.x] [Citation(s) in RCA: 83] [Impact Index Per Article: 5.2] [Indexed: 11/29/2022]

Abstract

Free-response assessment of diagnostic systems continues to gain acceptance in areas related to the detection, localization, and classification of one or more "abnormalities" within a subject. A free-response receiver operating characteristic (FROC) curve is a tool for characterizing the performance of a free-response system at all decision thresholds simultaneously. Although the importance of a single index summarizing the entire curve over all decision thresholds is well recognized in ROC analysis (e.g., area under the ROC curve), currently there is no widely accepted summary of a system being evaluated under the FROC paradigm. In this article, we propose a new index of the free-response performance at all decision thresholds simultaneously, and develop a nonparametric method for its analysis. Algebraically, the proposed summary index is the area under the empirical FROC curve penalized for the number of erroneous marks, rewarded for the fraction of detected abnormalities, and adjusted for the effect of the target size (or "acceptance radius"). Geometrically, the proposed index can be interpreted as a measure of average performance superiority over an artificial "guessing" free-response process and it represents an analogy to the area between the ROC curve and the "guessing" or diagonal line. We derive the ideal bootstrap estimator of the variance, which can be used for a resampling-free construction of asymptotic bootstrap confidence intervals and for sample size estimation using standard expressions. The proposed procedure is free from any parametric assumptions and does not require an assumption of independence of observations within a subject. We provide an example with a dataset sampled from a diagnostic imaging study and conduct simulations that demonstrate the appropriateness of the developed procedure for the considered sample sizes and ranges of parameters.

Chakraborty DP, Yoon HJ. Operating characteristics predicted by models for diagnostic tasks involving lesion localization. Med Phys 2008;35:435-45. [PMID: 18383663 DOI: 10.1118/1.2820902] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 11/07/2022]

ROC analysis in medical imaging: a tutorial review of the literature. Radiol Phys Technol 2007;1:2-12. [PMID: 20821157 DOI: 10.1007/s12194-007-0002-1] [Citation(s) in RCA: 77] [Impact Index Per Article: 4.5] [Received: 09/24/2007] [Accepted: 09/25/2007] [Indexed: 10/22/2022]

Yoon HJ, Zheng B, Sahiner B, Chakraborty DP. Evaluating computer-aided detection algorithms. Med Phys 2007;34:2024-38. [PMID: 17654906 PMCID: PMC2041901 DOI: 10.1118/1.2736289] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.8] [Indexed: 11/07/2022]

Abstract

Computer-aided detection (CAD) has been attracting extensive research interest during the last two decades. It is recognized that the full potential of CAD can only be realized by improving the performance and robustness of CAD algorithms and this requires good evaluation methodology that would permit CAD designers to optimize their algorithms. Free-response receiver operating characteristic (FROC) curves are widely used to assess CAD performance; however, evaluation rarely proceeds beyond determination of lesion localization fraction (sensitivity) at an arbitrarily selected value of nonlesion localizations (false marks) per image. This work describes a FROC curve fitting procedure that uses a recent model of visual search that serves as a framework for the free-response task. A maximum likelihood procedure for estimating the parameters of the model from free-response data and fitting CAD generated FROC curves was implemented. Procedures were implemented to estimate two figures of merit and associated statistics such as 95% confidence intervals and goodness of fit. One of the figures of merit does not require the arbitrary specification of an operating point at which to evaluate CAD performance. For comparison, a related method termed initial detection and candidate analysis, which is applicable when all suspicious regions are reported, was also implemented. The two methods were tested on seven mammography CAD data sets and both yielded good to excellent fits. The search model approach has the advantage that it can potentially be applied to radiologist generated free-response data where not all suspicious regions are reported, only the ones that are deemed sufficiently suspicious to warrant clinical follow-up. This work represents the first practical application of the search model to an important evaluation problem in diagnostic radiology. Software based on this work is expected to benefit CAD developers working in diverse areas of medical imaging.

Gur D, Rockette HE, Bandos AI. "Binary" and "non-binary" detection tasks: are current performance measures optimal? Acad Radiol 2007;14:871-6. [PMID: 17626312 DOI: 10.1016/j.acra.2007.03.014] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022]

Gur D. Objectively measuring and comparing performance levels of diagnostic imaging systems and practices. Acad Radiol 2007;14:641-2. [PMID: 17502252 DOI: 10.1016/j.acra.2007.04.007] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/15/2007] [Revised: 04/16/2007] [Accepted: 06/16/2007] [Indexed: 11/30/2022]

Wagner RF, Metz CE, Campbell G. Assessment of medical imaging systems and computer aids: a tutorial review. Acad Radiol 2007;14:723-48. [PMID: 17502262 DOI: 10.1016/j.acra.2007.03.001] [Citation(s) in RCA: 99] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/05/2007] [Revised: 03/05/2007] [Accepted: 03/06/2007] [Indexed: 11/22/2022]

Hutchinson TP. Free-response operator characteristic models for visual search. Phys Med Biol 2007;52:L1-3. [PMID: 17473337 DOI: 10.1088/0031-9155/52/10/l01] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022]

Popescu LM. Nonparametric ROC and LROC analysis. Med Phys 2007;34:1556-64. [PMID: 17555237 DOI: 10.1118/1.2717407] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022] Open

Chakraborty D, Yoon HJ, Mello-Thoms C. Spatial localization accuracy of radiologists in free-response studies: Inferring perceptual FROC curves from mark-rating data. Acad Radiol 2007;14:4-18. [PMID: 17178361 PMCID: PMC1829298 DOI: 10.1016/j.acra.2006.10.015] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/12/2006] [Revised: 10/24/2006] [Accepted: 10/26/2006] [Indexed: 11/25/2022]

Abstract

RATIONALE AND OBJECTIVES

Free-response data consist of a set of mark-rating pairs. Before analysis, the data are classified or "scored" into lesion and non-lesion localizations. The scoring is done by choosing an acceptance-radius and classifying marks within the acceptance-radius of lesion centers as lesion localizations and all other marks as non-lesion localizations. The scored data are plotted as a free-response receiver operating characteristic (FROC) curve, essentially a plot of appropriately normalized numbers of lesion localizations vs. non-lesion localizations. Scored FROC curves are frequently used to compare imaging systems and computer-aided detection (CAD) algorithms. However, the choice of acceptance-radius is arbitrary. This makes it difficult to compare curves from different studies and to estimate true performance.

MATERIALS AND METHODS

To resolve this issue, the concept of two types of marks is introduced: perceptual hits and perceptual misses. A perceptual hit is a mark made in response to the observer seeing the lesion. A perceptual miss is a mark made in response to the observer seeing a (lesion-like) non-lesion. A method of estimating the most probable numbers of perceptual hits and misses is described. This allows one to plot a perceptual FROC operating point and by extension a perceptual FROC curve. Unlike a scored FROC operating point, a perceptual point is independent of the choice of acceptance-radius. The method does not allow one to identify individual marks as perceptual hits or misses, only the most probable numbers. It is based on a three-parameter statistical model of the spatial distributions of perceptual hits and misses relative to lesion centers.

RESULTS

The method has been applied to an observer dataset in which mammographers and residents with different levels of experience were asked to locate lesions in mammograms. The perceptual operating points suggest superior performance for the mammographers and equivalent performance for residents in the first and second mammography rotations. These results and the model validation are preliminary as they are based on a small dataset.

CONCLUSION

The significance of this study is showing that it is possible to probabilistically determine if a mark resulted from seeing a lesion or a non-lesion. Using the method developed in this study one could perform acceptance-radius independent estimation of observer performance.
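
The acceptance-radius scoring step described under Rationale and Objectives can be illustrated with a minimal Python sketch. It is not the authors' software: the function names, the (x, y, rating) mark layout, the Euclidean distance rule, and the toy data are all assumptions made for illustration, and it simplifies by counting every mark within the radius as a lesion localization, even if several marks fall on the same lesion.

```python
import math

def score_marks(marks, lesion_centers, acceptance_radius):
    """Split one image's (x, y, rating) marks into lesion and non-lesion
    localizations. Hypothetical helper; Euclidean distance is an assumption."""
    lesion_locs, non_lesion_locs = [], []
    for x, y, rating in marks:
        hit = any(math.hypot(x - cx, y - cy) <= acceptance_radius
                  for cx, cy in lesion_centers)
        (lesion_locs if hit else non_lesion_locs).append(rating)
    return lesion_locs, non_lesion_locs

def froc_point(images, acceptance_radius, threshold):
    """Return one scored FROC operating point: (non-lesion localizations per
    image, lesion localizations per lesion) for marks rated >= threshold.
    Assumes the dataset contains at least one image and one lesion."""
    n_ll = n_nl = n_lesions = 0
    for marks, lesion_centers in images:
        ll, nl = score_marks(marks, lesion_centers, acceptance_radius)
        n_ll += sum(r >= threshold for r in ll)
        n_nl += sum(r >= threshold for r in nl)
        n_lesions += len(lesion_centers)
    return n_nl / len(images), n_ll / n_lesions

# Toy data (invented): each image is (marks, lesion_centers).
images = [
    ([(10, 10, 4), (50, 50, 2)], [(12, 11)]),  # abnormal image, two marks
    ([(30, 30, 3)], []),                       # normal image, one mark
]
print(froc_point(images, acceptance_radius=5.0, threshold=1))
print(froc_point(images, acceptance_radius=1.0, threshold=1))
```

On this toy data the same marks give a different operating point when the radius shrinks from 5.0 to 1.0, which is the arbitrariness the abstract objects to.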

Popescu LM, Lewitt RM. Small nodule detectability evaluation using a generalized scan-statistic model. Phys Med Biol 2006;51:6225-44. [PMID: 17110782 DOI: 10.1088/0031-9155/51/23/020] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022]

Chakraborty DP. ROC curves predicted by a model of visual search. Phys Med Biol 2006;51:3463-82. [PMID: 16825743 PMCID: PMC2230636 DOI: 10.1088/0031-9155/51/14/013] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]

Abstract

In imaging tasks where the observer is uncertain whether lesions are present, and where they could be present, the image is searched for lesions. In the free-response paradigm, which closely reflects this task, the observer provides data in the form of a variable number of mark-rating pairs per image. In a companion paper a statistical model of visual search has been proposed that has parameters characterizing the perceived lesion signal-to-noise ratio, the ability of the observer to avoid marking non-lesion locations, and the ability of the observer to find lesions. The aim of this work is to relate the search model parameters to receiver operating characteristic (ROC) curves that would result if the observer reported the rating of the most suspicious finding on an image as the overall rating. Also presented are the probability density functions (pdfs) of the underlying latent decision variables corresponding to the highest rating for normal and abnormal images. The search-model-predicted ROC curves are 'proper' in the sense that they never cross the chance diagonal and their slope changes monotonically. They also have the interesting property of not allowing the observer to move the operating point continuously from the origin to (1, 1). For certain choices of parameters the operating points are predicted to be clustered near the initial steep region of the curve, as has been observed by other investigators. The pdfs are non-Gaussian, markedly so for the abnormal images and for certain choices of parameter values, and provide an explanation for the well-known observation that experimental ROC data generally imply a wider pdf for abnormal images than for normal images. Some features of search-model-predicted ROC curves and pdfs resemble those predicted by the contaminated binormal model, but there are significant differences. The search model appears to provide physical explanations for several aspects of experimental ROC curves.
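
The highest-rating rule mentioned in the abstract (the rating of the most suspicious finding becomes the image's overall rating) can be illustrated with a short Python sketch. This is an empirical illustration only, not the search model itself: the function names, the convention of giving unmarked images a rating of negative infinity, and the toy ratings are assumptions.

```python
def highest_rating(mark_ratings, unmarked=float("-inf")):
    """Overall image rating = rating of the most suspicious mark.
    Images with no marks get `unmarked` (an assumed convention)."""
    return max(mark_ratings, default=unmarked)

def roc_point(normal_images, abnormal_images, threshold):
    """Return (false positive fraction, true positive fraction) when an image
    is called positive if its highest rating is >= threshold."""
    fpf = sum(highest_rating(m) >= threshold for m in normal_images) / len(normal_images)
    tpf = sum(highest_rating(m) >= threshold for m in abnormal_images) / len(abnormal_images)
    return fpf, tpf

# Toy data (invented): each image is the list of ratings of its marks.
normal_images = [[], [2], [1, 3], []]
abnormal_images = [[4], [2, 5], [], [3]]

print(roc_point(normal_images, abnormal_images, threshold=3))
print(roc_point(normal_images, abnormal_images, threshold=1))
```

In this toy example, lowering the threshold to the lowest rating used still leaves the unmarked images below it, so the empirical operating point stops short of (1, 1). This loosely mirrors the property described in the abstract, under the assumptions stated above.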
