1
Obuchowski NA, Bullen J. Multireader Diagnostic Accuracy Imaging Studies: Fundamentals of Design and Analysis. Radiology 2022; 303:26-34. [PMID: 35166584 DOI: 10.1148/radiol.211593]
Abstract
The design and analysis of multireader multicase (MRMC) studies are quite challenging. These studies differ from most medical studies because they need a reference standard and sampling from two populations (ie, reader and patient populations). They are quite expensive to conduct, requiring a good deal of readers' time for image interpretation. One common problem is the use of imperfect reference standards, often correlated with the test or tests being evaluated. Another common issue is oversimplification of the multidimensional MRMC data. In this study, the fundamentals of MRMC study design and analysis are reviewed. The goal is to provide investigators with a guide to the fundamentals of MRMC design and analysis, with references to more detailed discussions. In addition, readers are updated on newer areas of research, including correction for studies with multiple diagnostic accuracy end points and adjustment for location bias.
Affiliation(s)
- Nancy A Obuchowski
- From the Department of Quantitative Health Sciences, Cleveland Clinic Foundation, 9500 Euclid Ave, JJN3, Cleveland, OH 44195
- Jennifer Bullen
- From the Department of Quantitative Health Sciences, Cleveland Clinic Foundation, 9500 Euclid Ave, JJN3, Cleveland, OH 44195
2
Jha AK, Myers KJ, Obuchowski NA, Liu Z, Rahman MA, Saboury B, Rahmim A, Siegel BA. Objective Task-Based Evaluation of Artificial Intelligence-Based Medical Imaging Methods: Framework, Strategies, and Role of the Physician. PET Clin 2021; 16:493-511. [PMID: 34537127 DOI: 10.1016/j.cpet.2021.06.013]
Abstract
Artificial intelligence-based methods are showing promise in medical imaging applications. There is substantial interest in clinical translation of these methods, requiring that they be evaluated rigorously. We lay out a framework for objective task-based evaluation of artificial intelligence methods. We provide a list of available tools to conduct this evaluation. We outline the important role of physicians in conducting these evaluation studies. The examples in this article are proposed in the context of PET scans with a focus on evaluating neural network-based methods. However, the framework is also applicable to evaluate other medical imaging modalities and other types of artificial intelligence methods.
Affiliation(s)
- Abhinav K Jha
- Department of Biomedical Engineering, Mallinckrodt Institute of Radiology, Alvin J. Siteman Cancer Center, Washington University in St. Louis, 510 S Kingshighway Boulevard, St Louis, MO 63110, USA
- Kyle J Myers
- Division of Imaging, Diagnostics, and Software Reliability, Office of Science and Engineering Laboratories, Center for Devices and Radiological Health, Food and Drug Administration (FDA), Silver Spring, MD, USA
- Ziping Liu
- Department of Biomedical Engineering, Washington University in St. Louis, 1 Brookings Drive, St Louis, MO 63130, USA
- Md Ashequr Rahman
- Department of Biomedical Engineering, Washington University in St. Louis, 1 Brookings Drive, St Louis, MO 63130, USA
- Babak Saboury
- Department of Radiology and Imaging Sciences, Clinical Center, National Institutes of Health, 9000 Rockville Pike, Bethesda, MD 20892, USA
- Arman Rahmim
- Department of Radiology, Department of Physics, University of British Columbia, BC Cancer, BC Cancer Research Institute, 675 West 10th Avenue, Office 6-112, Vancouver, British Columbia V5Z 1L3, Canada
- Barry A Siegel
- Division of Nuclear Medicine, Mallinckrodt Institute of Radiology, Alvin J. Siteman Cancer Center, Washington University School of Medicine, 510 S Kingshighway Boulevard #956, St Louis, MO 63110, USA
3
Li J, Fine JP, Pencina MJ. Multi-category diagnostic accuracy based on logistic regression. Statistical Theory and Related Fields 2017. [DOI: 10.1080/24754269.2017.1319105]
Affiliation(s)
- Jialiang Li
- Department of Statistics and Applied Probability, Duke-NUS Graduate Medical School, Singapore Eye Research Institute, National University of Singapore, Singapore
- Jason P. Fine
- Department of Biostatistics, University of North Carolina, Chapel Hill, NC, USA
4
Dong T, Tian L. Confidence Interval Estimation for Sensitivity to the Early Diseased Stage Based on Empirical Likelihood. J Biopharm Stat 2014; 25:1215-33. [PMID: 25372999 PMCID: PMC5540368 DOI: 10.1080/10543406.2014.971173]
Abstract
Many disease processes can be divided into three stages: the non-diseased stage, the early diseased stage, and the fully diseased stage. To assess the accuracy of diagnostic tests for such diseases, various summary indexes have been proposed, such as volume under the surface (VUS), partial volume under the surface (PVUS), and the sensitivity to the early diseased stage given specificity and the sensitivity to the fully diseased stage (P2). This paper focuses on confidence interval estimation for P2 based on empirical likelihood. Simulation studies are carried out to assess the performance of the new methods compared to the existing parametric and nonparametric ones. A real dataset from the Alzheimer's Disease Neuroimaging Initiative (ADNI) is analyzed.
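As a rough illustration of the P2 index named in this abstract, the following is a minimal sketch of a plug-in point estimate. It is not the empirical likelihood interval method of the paper. It assumes that larger test values indicate more advanced disease, and the function name `empirical_p2` and its parameters are hypothetical: the lower cutoff is set from the non-diseased sample to reach the given specificity, the upper cutoff from the fully diseased sample to reach the given sensitivity, and P2 is the fraction of early-stage values that fall between the two.

```python
import numpy as np


def empirical_p2(x1, x2, x3, specificity, sens_full):
    """Plug-in estimate of sensitivity to the early diseased stage (P2).

    x1, x2, x3: test values for non-diseased, early diseased, and fully
    diseased subjects (assumption: larger values mean more disease).
    specificity: target P(X1 <= c1); sens_full: target P(X3 > c2).
    """
    x1, x2, x3 = (np.asarray(a, dtype=float) for a in (x1, x2, x3))
    # Lower cutoff: the specificity-quantile of the non-diseased sample.
    c1 = np.quantile(x1, specificity)
    # Upper cutoff: the (1 - sens_full)-quantile of the fully diseased sample.
    c2 = np.quantile(x3, 1.0 - sens_full)
    if c2 <= c1:
        # The cutoffs cross, so no value can be classified as early stage.
        return 0.0
    # Fraction of early-stage values classified between the two cutoffs.
    return float(np.mean((x2 > c1) & (x2 <= c2)))
```

Calling `empirical_p2(x1, x2, x3, 0.8, 0.8)` would give the estimated early-stage sensitivity at 80% specificity and 80% sensitivity to the fully diseased stage. A confidence interval would need an additional step such as the empirical likelihood approach the paper studies.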
Affiliation(s)
- Tuochuan Dong
- Department of Biostatistics, University at Buffalo, Buffalo, NY 14214, USA
- Lili Tian
- Department of Biostatistics, University at Buffalo, Buffalo, NY 14214, USA
5
He X, Samuelson F, Zeng R, Sahiner B. Discovering intrinsic properties of human observers' visual search and mathematical observers' scanning. J Opt Soc Am A Opt Image Sci Vis 2014; 31:2495-2510. [PMID: 25401363 DOI: 10.1364/josaa.31.002495]
Abstract
There is a lack of consensus in measuring observer performance in search tasks. To pursue a consensus, we set our goal to obtain metrics that are practical, meaningful, and predictive. We consider a metric practical if it can be implemented to measure human and computer observers' performance. To be meaningful, we propose to discover intrinsic properties of search observers and formulate the metrics to characterize these properties. If the discovered properties allow verifiable predictions, we consider them predictive. We propose a theory and a conjecture toward two intrinsic properties of search observers: rationality in classification as measured by the location-known-exactly (LKE) receiver operating characteristic (ROC) curve and location uncertainty as measured by the effective set size (M*). These two properties are used to develop search models in both single-response and free-response search tasks. To confirm whether these properties are "intrinsic," we investigate their ability in predicting search performance of both human and scanning channelized Hotelling observers. In particular, for each observer, we designed experiments to measure the LKE-ROC curve and M*, which were then used to predict the same observer's performance in other search tasks. The predictions were then compared to the experimentally measured observer performance. Our results indicate that modeling the search performance using the LKE-ROC curve and M* leads to successful predictions in most cases.
6
He X, Park S. Model observers in medical imaging research. Theranostics 2013; 3:774-86. [PMID: 24312150 PMCID: PMC3840411 DOI: 10.7150/thno.5138] [Received: 08/31/2012] [Accepted: 04/15/2013]
Abstract
Model observers play an important role in the optimization and assessment of imaging devices. In this review paper, we first discuss the basic concepts of model observers, which include the mathematical foundations and psychophysical considerations in designing both optimal observers for optimizing imaging systems and anthropomorphic observers for modeling human observers. Second, we survey a few state-of-the-art computational techniques for estimating model observers and the principles of implementing these techniques. Finally, we review a few applications of model observers in medical imaging research.
7
Validation of Monte Carlo estimates of three-class ideal observer operating points for normal data. Acad Radiol 2013; 20:908-14. [PMID: 23747155 DOI: 10.1016/j.acra.2013.04.002] [Received: 01/31/2013] [Revised: 04/15/2013] [Accepted: 04/16/2013]
Abstract
RATIONALE AND OBJECTIVES: Traditional two-class receiver operating characteristic (ROC) analysis is inadequate for the complete evaluation of observer performance in tasks with more than two classes. MATERIALS AND METHODS: Here, a Monte Carlo estimation method for operating point coordinates on a three-class ROC surface is developed and compared with analytically calculated coordinates in two special cases: (1) univariate and (2) restricted bivariate trinormal underlying data. RESULTS: In both cases, the statistical estimates were found to be good in the sense that the analytical values lay within the 95% confidence interval of the estimated values about 95% of the time. CONCLUSIONS: The statistical estimation method should be key in the development of a pragmatic performance metric for evaluation of observers in classification tasks with three or more classes.
8
Edwards DC, Metz CE. The three-class ideal observer for univariate normal data: Decision variable and ROC surface properties. J Math Psychol 2012; 56:256-273. [PMID: 23162165 PMCID: PMC3496401 DOI: 10.1016/j.jmp.2012.05.003]
Abstract
Although a fully general extension of ROC analysis to classification tasks with more than two classes has yet to be developed, the potential benefits to be gained from a practical performance evaluation methodology for classification tasks with three classes have motivated a number of research groups to propose methods based on constrained or simplified observer or data models. Here we consider an ideal observer in a task with underlying data drawn from three univariate normal distributions. We investigate the behavior of the resulting ideal observer's decision variables and ROC surface. In particular, we show that the pair of ideal observer decision variables is constrained to a parametric curve in two-dimensional likelihood ratio space, and that the decision boundary line segments used by the ideal observer can intersect this curve in at most six places. From this, we further show that the resulting ROC surface has at most four degrees of freedom at any point, and not the five that would be required, in general, for a surface in a six-dimensional space to be non-degenerate. In light of the difficulties we have previously pointed out in generalizing the well-known area under the ROC curve performance metric to tasks with three or more classes, the problem of developing a suitable and fully general performance metric for classification tasks with three or more classes remains unsolved.
9
Abstract
Medical images constitute a core portion of the information a physician utilizes to render diagnostic and treatment decisions. At a fundamental level, this diagnostic process involves two basic processes: visually inspecting the image (visual perception) and rendering an interpretation (cognition). The likelihood of error in the interpretation of medical images is, unfortunately, not negligible. Errors do occur, and patients' lives are impacted, underscoring our need to understand how physicians interact with the information in an image during the interpretation process. With improved understanding, we can develop ways to further improve decision making and, thus, to improve patient care. The science of medical image perception is dedicated to understanding and improving the clinical interpretation process.