1
Liu Z, Mhlanga JC, Xia H, Siegel BA, Jha AK. Need for Objective Task-Based Evaluation of Image Segmentation Algorithms for Quantitative PET: A Study with ACRIN 6668/RTOG 0235 Multicenter Clinical Trial Data. J Nucl Med 2024; 65:jnumed.123.266018. [PMID: 38360049 PMCID: PMC10924158 DOI: 10.2967/jnumed.123.266018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2023] [Revised: 12/19/2023] [Accepted: 12/19/2023] [Indexed: 02/17/2024] Open
Abstract
Reliable performance of PET segmentation algorithms on clinically relevant tasks is required for their clinical translation. However, these algorithms are typically evaluated using figures of merit (FoMs) that are not explicitly designed to correlate with clinical task performance. Such FoMs include the Dice similarity coefficient (DSC), the Jaccard similarity coefficient (JSC), and the Hausdorff distance (HD). The objective of this study was to investigate whether evaluating PET segmentation algorithms using these task-agnostic FoMs yields interpretations consistent with evaluation on clinically relevant quantitative tasks. Methods: We conducted a retrospective study to assess the concordance in the evaluation of segmentation algorithms using the DSC, JSC, and HD and on the tasks of estimating the metabolic tumor volume (MTV) and total lesion glycolysis (TLG) of primary tumors from PET images of patients with non-small cell lung cancer. The PET images were collected from the American College of Radiology Imaging Network 6668/Radiation Therapy Oncology Group 0235 multicenter clinical trial data. The study was conducted in 2 contexts: (1) evaluating conventional segmentation algorithms, namely those based on thresholding (SUVmax40% and SUVmax50%), boundary detection (Snakes), and stochastic modeling (Markov random field-Gaussian mixture model); (2) evaluating the impact of network depth and loss function on the performance of a state-of-the-art U-net-based segmentation algorithm. Results: Evaluation of conventional segmentation algorithms based on the DSC, JSC, and HD showed that SUVmax40% significantly outperformed SUVmax50%. However, SUVmax40% yielded lower accuracy on the tasks of estimating MTV and TLG, with a 51% and 54% increase, respectively, in the ensemble normalized bias. Similarly, the Markov random field-Gaussian mixture model significantly outperformed Snakes on the basis of the task-agnostic FoMs but yielded a 24% increased bias in estimated MTV. 
For the U-net-based algorithm, our evaluation showed that although the network depth did not significantly alter the DSC, JSC, and HD values, a deeper network yielded substantially higher accuracy in the estimated MTV and TLG, with a decreased bias of 91% and 87%, respectively. Additionally, whereas there was no significant difference in the DSC, JSC, and HD values for different loss functions, up to a 73% and 58% difference in the bias of the estimated MTV and TLG, respectively, existed. Conclusion: Evaluation of PET segmentation algorithms using task-agnostic FoMs could yield findings discordant with evaluation on clinically relevant quantitative tasks. This study emphasizes the need for objective task-based evaluation of image segmentation algorithms for quantitative PET.
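The discordance this abstract describes between overlap-based FoMs and task-based quantities can be illustrated with a toy example. The sketch below is not the study's evaluation code: the one-dimensional masks, the SUV values, and the 0.5 mL voxel volume are made-up assumptions, and the helper names are hypothetical. It uses the standard definitions of the DSC and JSC, and TLG = MTV × mean SUV inside the mask.

```python
def dice(a, b):
    """Dice similarity coefficient between two binary masks (flat lists of 0/1)."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(a) + sum(b)
    return 2.0 * inter / total if total else 1.0


def jaccard(a, b):
    """Jaccard similarity coefficient between two binary masks."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return inter / union if union else 1.0


def mtv(mask, voxel_volume_ml):
    """Metabolic tumor volume: number of segmented voxels times voxel volume."""
    return sum(mask) * voxel_volume_ml


def tlg(mask, suv, voxel_volume_ml):
    """Total lesion glycolysis: MTV times the mean SUV inside the mask."""
    n = sum(mask)
    if n == 0:
        return 0.0
    suv_mean = sum(s for m, s in zip(mask, suv) if m) / n
    return mtv(mask, voxel_volume_ml) * suv_mean


# Made-up 1-D example (assumption, for illustration only).
reference = [0, 0, 1, 1, 1, 1, 0, 0, 0, 0]   # 4 voxels
seg_a     = [0, 1, 1, 1, 1, 1, 1, 0, 0, 0]   # covers the lesion but is 2 voxels too large
seg_b     = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0]   # correct size, shifted by 2 voxels
suv       = [1, 1, 4, 6, 6, 4, 2, 2, 1, 1]
voxel_ml  = 0.5

for name, seg in (("A", seg_a), ("B", seg_b)):
    print(name,
          "DSC=%.2f" % dice(reference, seg),
          "JSC=%.2f" % jaccard(reference, seg),
          "MTV error=%.1f mL" % (mtv(seg, voxel_ml) - mtv(reference, voxel_ml)),
          "TLG=%.1f" % tlg(seg, suv, voxel_ml))
```

In this toy case segmentation A has the higher DSC and JSC, yet segmentation B reproduces the reference MTV exactly, so ranking by overlap and ranking by MTV error disagree.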
Affiliation(s)
- Ziping Liu
  - Department of Biomedical Engineering, Washington University, St. Louis, Missouri
- Joyce C Mhlanga
  - Mallinckrodt Institute of Radiology, Washington University School of Medicine, St. Louis, Missouri
- Huitian Xia
  - Department of Biomedical Engineering, Washington University, St. Louis, Missouri
- Barry A Siegel
  - Mallinckrodt Institute of Radiology, Washington University School of Medicine, St. Louis, Missouri
  - Alvin J. Siteman Cancer Center, Washington University School of Medicine, St. Louis, Missouri
- Abhinav K Jha
  - Department of Biomedical Engineering, Washington University, St. Louis, Missouri
  - Mallinckrodt Institute of Radiology, Washington University School of Medicine, St. Louis, Missouri
  - Alvin J. Siteman Cancer Center, Washington University School of Medicine, St. Louis, Missouri
2
Liu Y, Jha AK. How accurately can quantitative imaging methods be ranked without ground truth: An upper bound on no-gold-standard evaluation. PROCEEDINGS OF SPIE--THE INTERNATIONAL SOCIETY FOR OPTICAL ENGINEERING 2024; 12929:129290W. [PMID: 39610808 PMCID: PMC11601990 DOI: 10.1117/12.3006888] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2024]
Abstract
Objective evaluation of quantitative imaging (QI) methods with patient data, while important, is typically hindered by the lack of gold standards. To address this challenge, no-gold-standard evaluation (NGSE) techniques have been proposed. These techniques have demonstrated efficacy in accurately ranking QI methods without access to gold standards. The development of NGSE methods has raised an important question: how accurately can QI methods be ranked without ground truth? To answer this question, we propose a Cramér-Rao bound (CRB)-based framework that quantifies the upper bound in ranking QI methods without any ground truth. We present the application of this framework in guiding the use of a well-known NGSE technique, namely the regression-without-truth (RWT) technique. Our results show the utility of this framework in quantifying the performance of this NGSE technique for different patient numbers. These results provide motivation towards studying other applications of this upper bound.
Affiliation(s)
- Yan Liu
  - Department of Biomedical Engineering, Washington University in St. Louis, St. Louis, MO, USA
  - Mallinckrodt Institute of Radiology, Washington University in St. Louis, St. Louis, MO, USA
- Abhinav K. Jha
  - Department of Biomedical Engineering, Washington University in St. Louis, St. Louis, MO, USA
  - Mallinckrodt Institute of Radiology, Washington University in St. Louis, St. Louis, MO, USA
3
Liu Z, Mhlanga JC, Siegel BA, Jha AK. Need for objective task-based evaluation of AI-based segmentation methods for quantitative PET. PROCEEDINGS OF SPIE--THE INTERNATIONAL SOCIETY FOR OPTICAL ENGINEERING 2023; 12467:124670R. [PMID: 37990707 PMCID: PMC10659582 DOI: 10.1117/12.2647894] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/23/2023]
Abstract
Artificial intelligence (AI)-based methods are showing substantial promise in segmenting oncologic positron emission tomography (PET) images. For clinical translation of these methods, assessing their performance on clinically relevant tasks is important. However, these methods are typically evaluated using metrics that may not correlate with the task performance. One such widely used metric is the Dice score, a figure of merit that measures the spatial overlap between the estimated segmentation and a reference standard (e.g., manual segmentation). In this work, we investigated whether evaluating AI-based segmentation methods using Dice scores yields a similar interpretation as evaluation on the clinical tasks of quantifying metabolic tumor volume (MTV) and total lesion glycolysis (TLG) of primary tumor from PET images of patients with non-small cell lung cancer. The investigation was conducted via a retrospective analysis with the ECOG-ACRIN 6668/RTOG 0235 multi-center clinical trial data. Specifically, we evaluated different structures of a commonly used AI-based segmentation method using both Dice scores and the accuracy in quantifying MTV/TLG. Our results show that evaluation using Dice scores can lead to findings that are inconsistent with evaluation using the task-based figure of merit. Thus, our study motivates the need for objective task-based evaluation of AI-based segmentation methods for quantitative PET.
Affiliation(s)
- Ziping Liu
  - Department of Biomedical Engineering, Washington University in St. Louis, St. Louis, MO, USA
- Joyce C. Mhlanga
  - Mallinckrodt Institute of Radiology, Washington University School of Medicine, St. Louis, MO, USA
- Barry A. Siegel
  - Mallinckrodt Institute of Radiology, Washington University School of Medicine, St. Louis, MO, USA
  - Alvin J. Siteman Cancer Center, Washington University School of Medicine, St. Louis, MO, USA
- Abhinav K. Jha
  - Department of Biomedical Engineering, Washington University in St. Louis, St. Louis, MO, USA
  - Mallinckrodt Institute of Radiology, Washington University School of Medicine, St. Louis, MO, USA
  - Alvin J. Siteman Cancer Center, Washington University School of Medicine, St. Louis, MO, USA
4
Liu Z, Moon HS, Li Z, Laforest R, Perlmutter JS, Norris SA, Jha AK. A tissue-fraction estimation-based segmentation method for quantitative dopamine transporter SPECT. Med Phys 2022; 49:5121-5137. [PMID: 35635327 PMCID: PMC9703616 DOI: 10.1002/mp.15778] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2022] [Revised: 04/25/2022] [Accepted: 05/16/2022] [Indexed: 11/09/2022] Open
Abstract
BACKGROUND Quantitative measures of dopamine transporter (DaT) uptake in caudate, putamen, and globus pallidus (GP) derived from dopamine transporter-single-photon emission computed tomography (DaT-SPECT) images have potential as biomarkers for measuring the severity of Parkinson's disease. Reliable quantification of this uptake requires accurate segmentation of the considered regions. However, segmentation of these regions from DaT-SPECT images is challenging, a major reason being partial-volume effects (PVEs) in SPECT. The PVEs arise from two sources, namely the limited system resolution and reconstruction of images over finite-sized voxel grids. The limited system resolution results in blurred boundaries of the different regions. The finite voxel size leads to tissue-fraction effects (TFEs), that is, voxels contain a mixture of regions. Thus, there is an important need for methods that can account for the PVEs, including the TFEs, and accurately segment the caudate, putamen, and GP from DaT-SPECT images. PURPOSE Design and objectively evaluate a fully automated tissue-fraction estimation-based segmentation method that segments the caudate, putamen, and GP from DaT-SPECT images. METHODS The proposed method estimates the posterior mean of the fractional volumes occupied by the caudate, putamen, and GP within each voxel of a three-dimensional DaT-SPECT image. The estimate is obtained by minimizing a cost function based on the binary cross-entropy loss between the true and estimated fractional volumes over a population of SPECT images, where the distribution of true fractional volumes is obtained from existing populations of clinical magnetic resonance images. The method is implemented using a supervised deep-learning-based approach. 
RESULTS Evaluations using clinically guided highly realistic simulation studies show that the proposed method accurately segmented the caudate, putamen, and GP with high mean Dice similarity coefficients of ∼0.80 and significantly outperformed ($p < 0.01$) all other considered segmentation methods. Further, an objective evaluation of the proposed method on the task of quantifying regional uptake shows that the method yielded reliable quantification with low ensemble normalized root mean square error (NRMSE) < 20% for all the considered regions. In particular, the method yielded an even lower ensemble NRMSE of ∼10% for the caudate and putamen. CONCLUSIONS The proposed tissue-fraction estimation-based segmentation method for DaT-SPECT images demonstrated the ability to accurately segment the caudate, putamen, and GP, and reliably quantify the uptake within these regions. The results motivate further evaluation of the method with physical-phantom and patient studies.
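The link stated in this abstract between a binary cross-entropy cost on fractional volumes and a mean-type estimate can be checked numerically in a simplified setting. The sketch below is not the authors' deep-learning implementation: it considers a single voxel whose true fractional volume varies over a made-up population (the values in `true_fractions` are assumptions) and searches for the constant estimate that minimizes the average binary cross-entropy. Setting the derivative of the average loss to zero gives the population mean, and the grid search agrees.

```python
import math


def bce(true_fraction, est_fraction, eps=1e-12):
    """Binary cross-entropy between a true and an estimated fractional volume."""
    p = min(max(est_fraction, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(true_fraction * math.log(p)
             + (1.0 - true_fraction) * math.log(1.0 - p))


def mean_bce(true_fractions, est_fraction):
    """Average binary cross-entropy of one constant estimate over a population."""
    return sum(bce(v, est_fraction) for v in true_fractions) / len(true_fractions)


# Hypothetical true fractional volumes of one region in one voxel (assumption).
true_fractions = [0.0, 0.1, 0.25, 0.4, 0.75]

# Grid search over candidate estimates in (0, 1) with step 0.001.
candidates = [i / 1000 for i in range(1, 1000)]
best = min(candidates, key=lambda c: mean_bce(true_fractions, c))

population_mean = sum(true_fractions) / len(true_fractions)
print("BCE-minimizing estimate:", best)
print("Mean of true fractions:  ", population_mean)
```

In the method described above the estimate is conditioned on the SPECT image, so the corresponding minimizer is a conditional (posterior) mean; this sketch only shows the unconditioned analogue.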
Affiliation(s)
- Ziping Liu
  - Department of Biomedical Engineering, Washington University, St. Louis, Missouri, USA
- Hae Sol Moon
  - Department of Biomedical Engineering, Washington University, St. Louis, Missouri, USA
- Zekun Li
  - Department of Biomedical Engineering, Washington University, St. Louis, Missouri, USA
- Richard Laforest
  - Mallinckrodt Institute of Radiology, Washington University School of Medicine, St. Louis, Missouri, USA
- Joel S. Perlmutter
  - Mallinckrodt Institute of Radiology, Washington University School of Medicine, St. Louis, Missouri, USA
  - Department of Neurology, Washington University School of Medicine, St. Louis, Missouri, USA
- Scott A. Norris
  - Mallinckrodt Institute of Radiology, Washington University School of Medicine, St. Louis, Missouri, USA
  - Department of Neurology, Washington University School of Medicine, St. Louis, Missouri, USA
- Abhinav K. Jha
  - Department of Biomedical Engineering, Washington University, St. Louis, Missouri, USA
  - Mallinckrodt Institute of Radiology, Washington University School of Medicine, St. Louis, Missouri, USA
5
Liu Z, Li Z, Mhlanga JC, Siegel BA, Jha AK. No-gold-standard evaluation of quantitative imaging methods in the presence of correlated noise. PROCEEDINGS OF SPIE--THE INTERNATIONAL SOCIETY FOR OPTICAL ENGINEERING 2022; 12035:120350M. [PMID: 36465994 PMCID: PMC9717481 DOI: 10.1117/12.2605762] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 06/17/2023]
Abstract
Objective evaluation of quantitative imaging (QI) methods with patient data is highly desirable, but is hindered by the lack or unreliability of an available gold standard. To address this issue, techniques that can evaluate QI methods without access to a gold standard are being actively developed. These techniques assume that the true and measured values are linearly related by a slope, bias, and Gaussian-distributed noise term, where the noise between measurements made by different methods is independent of each other. However, this noise arises in the process of measuring the same quantitative value, and thus can be correlated. To address this limitation, we propose a no-gold-standard evaluation (NGSE) technique that models this correlated noise by a multi-variate Gaussian distribution parameterized by a covariance matrix. We derive a maximum-likelihood-based approach to estimate the parameters that describe the relationship between the true and measured values, without any knowledge of the true values. We then use the estimated slopes and diagonal elements of the covariance matrix to compute the noise-to-slope ratio (NSR) to rank the QI methods on the basis of precision. The proposed NGSE technique was evaluated with multiple numerical experiments. Our results showed that the technique reliably estimated the NSR values and yielded accurate rankings of the considered methods for 83% of 160 trials. In particular, the technique correctly identified the most precise method for ∼ 97% of the trials. Overall, this study demonstrates the efficacy of the NGSE technique to accurately rank different QI methods when correlated noise is present, and without access to any knowledge of the ground truth. The results motivate further validation of this technique with realistic simulation studies and patient data.
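The final ranking step described in this abstract, computing the noise-to-slope ratio (NSR) from estimated slopes and the diagonal of the covariance matrix, can be sketched as follows. This is only the post-processing step, not the maximum-likelihood estimation itself. It assumes the diagonal elements are noise variances, so that the NSR of method k is the square root of the k-th diagonal element divided by the k-th slope; the numerical values are made up.

```python
import math


def noise_to_slope_ratios(slopes, covariance):
    """NSR per method: noise standard deviation divided by slope.

    Assumes covariance[k][k] is the noise variance of method k.
    """
    return [math.sqrt(covariance[k][k]) / slopes[k] for k in range(len(slopes))]


def rank_by_precision(slopes, covariance):
    """Method indices ordered from most precise (lowest NSR) to least precise."""
    nsr = noise_to_slope_ratios(slopes, covariance)
    return sorted(range(len(nsr)), key=lambda k: nsr[k])


# Hypothetical estimates for three methods (assumption, for illustration only).
slopes = [1.0, 0.8, 1.2]
covariance = [
    [0.04, 0.01, 0.00],
    [0.01, 0.09, 0.02],
    [0.00, 0.02, 0.0144],
]

print("NSR:", noise_to_slope_ratios(slopes, covariance))
print("Ranking (most precise first):", rank_by_precision(slopes, covariance))
```

The off-diagonal elements model the correlation between methods and affect the parameter estimation, but under the stated assumption they do not enter the NSR directly.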
Affiliation(s)
- Ziping Liu
  - Department of Biomedical Engineering, Washington University, St. Louis, MO, USA
- Zekun Li
  - Department of Biomedical Engineering, Washington University, St. Louis, MO, USA
- Joyce C. Mhlanga
  - Mallinckrodt Institute of Radiology, Washington University School of Medicine, St. Louis, MO, USA
- Barry A. Siegel
  - Mallinckrodt Institute of Radiology, Washington University School of Medicine, St. Louis, MO, USA
- Abhinav K. Jha
  - Department of Biomedical Engineering, Washington University, St. Louis, MO, USA
  - Mallinckrodt Institute of Radiology, Washington University School of Medicine, St. Louis, MO, USA
6
Jha AK, Myers KJ, Obuchowski NA, Liu Z, Rahman MA, Saboury B, Rahmim A, Siegel BA. Objective Task-Based Evaluation of Artificial Intelligence-Based Medical Imaging Methods: Framework, Strategies, and Role of the Physician. PET Clin 2021; 16:493-511. [PMID: 34537127 DOI: 10.1016/j.cpet.2021.06.013] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Indexed: 12/23/2022]
Abstract
Artificial intelligence-based methods are showing promise in medical imaging applications. There is substantial interest in clinical translation of these methods, requiring that they be evaluated rigorously. We lay out a framework for objective task-based evaluation of artificial intelligence methods. We provide a list of available tools to conduct this evaluation. We outline the important role of physicians in conducting these evaluation studies. The examples in this article are proposed in the context of PET scans with a focus on evaluating neural network-based methods. However, the framework is also applicable to evaluate other medical imaging modalities and other types of artificial intelligence methods.
Affiliation(s)
- Abhinav K Jha
  - Department of Biomedical Engineering, Mallinckrodt Institute of Radiology, Alvin J. Siteman Cancer Center, Washington University in St. Louis, 510 S Kingshighway Boulevard, St Louis, MO 63110, USA
- Kyle J Myers
  - Division of Imaging, Diagnostics, and Software Reliability, Office of Science and Engineering Laboratories, Center for Devices and Radiological Health, Food and Drug Administration (FDA), Silver Spring, MD, USA
- Ziping Liu
  - Department of Biomedical Engineering, Washington University in St. Louis, 1 Brookings Drive, St Louis, MO 63130, USA
- Md Ashequr Rahman
  - Department of Biomedical Engineering, Washington University in St. Louis, 1 Brookings Drive, St Louis, MO 63130, USA
- Babak Saboury
  - Department of Radiology and Imaging Sciences, Clinical Center, National Institutes of Health, 9000 Rockville Pike, Bethesda, MD 20892, USA
- Arman Rahmim
  - Department of Radiology, Department of Physics, University of British Columbia, BC Cancer, BC Cancer Research Institute, 675 West 10th Avenue, Office 6-112, Vancouver, British Columbia V5Z 1L3, Canada
- Barry A Siegel
  - Division of Nuclear Medicine, Mallinckrodt Institute of Radiology, Alvin J. Siteman Cancer Center, Washington University School of Medicine, 510 S Kingshighway Boulevard #956, St Louis, MO 63110, USA
7
Madan H, Berlot R, Ray NJ, Pernus F, Spiclin Z. Practical Priors for Bayesian Inference of Latent Biomarkers. IEEE J Biomed Health Inform 2019; 24:396-406. [PMID: 31581104 DOI: 10.1109/jbhi.2019.2945077] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022]
Abstract
Latent biomarkers are quantities that strongly relate to a patient's disease diagnosis and prognosis, but are difficult to measure or even not directly observable. The objective of this study was to develop, analyze and validate new priors for Bayesian inference of such biomarkers. Theoretical analysis revealed a relationship between the estimates inferred from the model and the true values of measured quantities, and the impact of the priors. This led to a new prior encoding scheme that incorporates objectively measurable domain knowledge, i.e. by performing two measurements with a reference method, which imply the scale of the prior distribution. Second, priors on parameters of systematic error are non-informative, which enables biomarker estimation from a set of different quantities. Analysis showed that the volume of nucleus basalis of Meynert, which is reduced in early stages of Alzheimer's dementia and Parkinson's disease, is inter-related and could be inferred from compartmental brain volume measurements performed on routine clinical MR scans. Another experiment showed that total lesion load, associated with future disability progression in multiple sclerosis patients, could be inferred from lesion volume measurements based on multiple automated MR scan segmentations. Moreover, figures of merit derived from the estimates could, without comparing against reference gold standard segmentations, identify the best performing lesion segmentation method. The proposed new priors substantially simplify the application of Bayesian inference for latent biomarkers and thus open an avenue for clinical implementation of new biomarkers, which may ultimately advance evidence-based medicine.
8
Madan H, Pernuš F, Špiclin Ž. Reference-free error estimation for multiple measurement methods. Stat Methods Med Res 2018; 28:2196-2209. [PMID: 29384043 DOI: 10.1177/0962280217754231] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 12/20/2022]
Abstract
We present a computational framework to select the most accurate and precise method of measurement of a certain quantity, when there is no access to the true value of the measurand. A typical use case is when several image analysis methods are applied to measure the value of a particular quantitative imaging biomarker from the same images. The accuracy of each measurement method is characterized by systematic error (bias), which is modeled as a polynomial in true values of the measurand, and the precision as random error modeled with a Gaussian random variable. In contrast to previous works, the random errors are modeled jointly across all methods, thereby enabling the framework to analyze measurement methods based on similar principles, which may have correlated random errors. Furthermore, the posterior distribution of the error model parameters is estimated from samples obtained by Markov chain Monte Carlo and analyzed to estimate the parameter values and the unknown true values of the measurand. The framework was validated on six synthetic datasets and one clinical dataset containing measurements of total lesion load, a biomarker of neurodegenerative diseases, which was obtained with four automatic methods by analyzing brain magnetic resonance images. The estimates of bias and random error were in good agreement with the corresponding least squares regression estimates against a reference.
Affiliation(s)
- Hennadii Madan
  - Faculty of Electrical Engineering, University of Ljubljana, Ljubljana, Slovenia
- Franjo Pernuš
  - Faculty of Electrical Engineering, University of Ljubljana, Ljubljana, Slovenia
- Žiga Špiclin
  - Faculty of Electrical Engineering, University of Ljubljana, Ljubljana, Slovenia
9
Harmonic subtraction for evaluating right ventricle ejection fraction from planar equilibrium radionuclide angiography. Int J Cardiovasc Imaging 2017; 33:1857-1862. [DOI: 10.1007/s10554-017-1164-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2017] [Accepted: 05/08/2017] [Indexed: 10/19/2022]
10
Osadebey M, Pedersen M, Arnold D, Wendel-Mitoraj K. Bayesian framework inspired no-reference region-of-interest quality measure for brain MRI images. J Med Imaging (Bellingham) 2017. [PMID: 28630885 DOI: 10.1117/1.jmi.4.2.025504] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 11/14/2022] Open
Abstract
We describe a postacquisition, attribute-based quality assessment method for brain magnetic resonance imaging (MRI) images. It is based on the application of Bayes theory to the relationship between entropy and image quality attributes. The entropy feature image of a slice is segmented into low- and high-entropy regions. For each entropy region, there are three separate observations of contrast, standard deviation, and sharpness quality attributes. A quality index for a quality attribute is the posterior probability of an entropy region given any corresponding region in a feature image where quality attribute is observed. Prior belief in each entropy region is determined from normalized total clique potential (TCP) energy of the slice. For TCP below the predefined threshold, the prior probability for a region is determined by deviation of its percentage composition in the slice from a standard normal distribution built from 250 MRI volume data provided by Alzheimer's Disease Neuroimaging Initiative. For TCP above the threshold, the prior is computed using a mathematical model that describes the TCP-noise level relationship in brain MRI images. Our proposed method assesses the image quality of each entropy region and the global image. Experimental results demonstrate good correlation with subjective opinions of radiologists for different types and levels of quality distortions.
Affiliation(s)
- Michael Osadebey
  - NeuroRx Research Inc., MRI Reader Group, Montreal, Québec, Canada
- Marius Pedersen
  - Norwegian University of Science and Technology, Department of Computer Science, Gjøvik, Norway
11
Jha AK, Mena E, Caffo B, Ashrafinia S, Rahmim A, Frey E, Subramaniam RM. Practical no-gold-standard evaluation framework for quantitative imaging methods: application to lesion segmentation in positron emission tomography. J Med Imaging (Bellingham) 2017; 4:011011. [PMID: 28331883 DOI: 10.1117/1.jmi.4.1.011011] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 06/01/2016] [Accepted: 02/09/2017] [Indexed: 11/14/2022] Open
Abstract
Recently, a class of no-gold-standard (NGS) techniques has been proposed to evaluate quantitative imaging methods using patient data. These techniques provide figures of merit (FoMs) quantifying the precision of the estimated quantitative value without requiring repeated measurements and without requiring a gold standard. However, applying these techniques to patient data presents several practical difficulties including assessing the underlying assumptions, accounting for patient-sampling-related uncertainty, and assessing the reliability of the estimated FoMs. To address these issues, we propose statistical tests that provide confidence in the underlying assumptions and in the reliability of the estimated FoMs. Furthermore, the NGS technique is integrated within a bootstrap-based methodology to account for patient-sampling-related uncertainty. The developed NGS framework was applied to evaluate four methods for segmenting lesions from 18F-fluoro-2-deoxyglucose positron emission tomography images of patients with head-and-neck cancer on the task of precisely measuring the metabolic tumor volume. The NGS technique consistently predicted the same segmentation method as the most precise method. The proposed framework provided confidence in these results, even when gold-standard data were not available. The bootstrap-based methodology indicated improved performance of the NGS technique with larger numbers of patient studies, as was expected, and yielded consistent results as long as data from more than 80 lesions were available for the analysis.
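The bootstrap idea mentioned in this abstract, resampling lesions to see how consistently one method is selected, can be sketched generically. The code below is not the authors' framework. The function names are hypothetical, the data are synthetic, and the selection rule `smallest_spread_about_consensus` is a deliberately simple stand-in, not the NGS technique; in practice the `select_best` callable would wrap the actual NGS estimation.

```python
import random


def bootstrap_selection_counts(measurements, select_best, n_boot=200, seed=0):
    """Count how often each method is selected across bootstrap resamples.

    measurements: list of per-lesion tuples, one measured value per method.
    select_best:  callable mapping such a list to the index of the preferred method.
    """
    rng = random.Random(seed)
    counts = [0] * len(measurements[0])
    for _ in range(n_boot):
        # Resample lesions with replacement, keeping the sample size fixed.
        resample = [rng.choice(measurements) for _ in measurements]
        counts[select_best(resample)] += 1
    return counts


def smallest_spread_about_consensus(sample):
    """Stand-in rule (NOT the NGS technique): prefer the method whose values
    deviate least from the per-lesion mean across methods."""
    n_methods = len(sample[0])
    totals = [0.0] * n_methods
    for row in sample:
        consensus = sum(row) / n_methods
        for k, value in enumerate(row):
            totals[k] += (value - consensus) ** 2
    return min(range(n_methods), key=lambda k: totals[k])


# Synthetic data (assumption): 100 lesions, 4 methods, method 0 is least noisy.
data_rng = random.Random(1)
lesions = []
for _ in range(100):
    true_volume = data_rng.uniform(5.0, 50.0)
    lesions.append(tuple(true_volume + data_rng.gauss(0.0, sd)
                         for sd in (0.5, 3.0, 3.0, 3.0)))

counts = bootstrap_selection_counts(lesions, smallest_spread_about_consensus)
print("Selection counts per method:", counts)
```

A method that is selected in nearly all resamples is a stable choice for the available sample size. Repeating the analysis with fewer lesions gives a rough sense of how many studies are needed before the selection stabilizes.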
Affiliation(s)
- Abhinav K Jha
  - Johns Hopkins University, Department of Radiology and Radiological Sciences, Baltimore, Maryland, United States
- Esther Mena
  - Johns Hopkins University, Department of Radiology and Radiological Sciences, Baltimore, Maryland, United States
- Brian Caffo
  - Johns Hopkins University, Department of Biostatistics, Baltimore, Maryland, United States
- Saeed Ashrafinia
  - Johns Hopkins University, Department of Radiology and Radiological Sciences, Baltimore, Maryland, United States
  - Johns Hopkins University, Department of Electrical & Computer Engineering, Baltimore, Maryland, United States
- Arman Rahmim
  - Johns Hopkins University, Department of Radiology and Radiological Sciences, Baltimore, Maryland, United States
  - Johns Hopkins University, Department of Electrical & Computer Engineering, Baltimore, Maryland, United States
- Eric Frey
  - Johns Hopkins University, Department of Radiology and Radiological Sciences, Baltimore, Maryland, United States
  - Johns Hopkins University, Department of Electrical & Computer Engineering, Baltimore, Maryland, United States
- Rathan M Subramaniam
  - University of Texas Southwestern Medical Center, Department of Radiology and Advanced Imaging Research Center, Dallas, Texas, United States
12
Jha AK, Frey E. No-gold-standard evaluation of image-acquisition methods using patient data. PROCEEDINGS OF SPIE--THE INTERNATIONAL SOCIETY FOR OPTICAL ENGINEERING 2017; 10136. [PMID: 28596636 DOI: 10.1117/12.2255902] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/14/2022]
Abstract
Several new and improved modalities, scanners, and protocols, together referred to as image-acquisition methods (IAMs), are being developed to provide reliable quantitative imaging. Objective evaluation of these IAMs on the clinically relevant quantitative tasks is highly desirable. Such evaluation is most reliable and clinically decisive when performed with patient data, but that requires the availability of a gold standard, which is often rare. While no-gold-standard (NGS) techniques have been developed to clinically evaluate quantitative imaging methods, these techniques require that each of the patients be scanned using all the IAMs, which is expensive, time consuming, and could lead to increased radiation dose. A more clinically practical scenario is one where different sets of patients are scanned using different IAMs. We have developed an NGS technique that uses patient data where different patient sets are imaged using different IAMs to compare the different IAMs. The technique posits a linear relationship, characterized by a slope, bias, and noise standard-deviation term, between the true and measured quantitative values. Under the assumption that the true quantitative values have been sampled from a unimodal distribution, a maximum-likelihood procedure was developed that estimates these linear relationship parameters for the different IAMs. Figures of merit can be estimated using these linear relationship parameters to evaluate the IAMs on the basis of accuracy, precision, and overall reliability. The proposed technique has several potential applications such as in protocol optimization, quantifying difference in system performance, and system harmonization using patient data.
Affiliation(s)
- Abhinav K Jha
  - Department of Radiology, Johns Hopkins University, Baltimore, MD, USA
- Eric Frey
  - Department of Radiology, Johns Hopkins University, Baltimore, MD, USA
13
Jha AK, Caffo B, Frey EC. A no-gold-standard technique for objective assessment of quantitative nuclear-medicine imaging methods. Phys Med Biol 2016; 61:2780-800. [PMID: 26982626 DOI: 10.1088/0031-9155/61/7/2780] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Indexed: 11/11/2022]
Abstract
The objective optimization and evaluation of nuclear-medicine quantitative imaging methods using patient data is highly desirable but often hindered by the lack of a gold standard. Previously, a regression-without-truth (RWT) approach has been proposed for evaluating quantitative imaging methods in the absence of a gold standard, but this approach implicitly assumes that bounds on the distribution of true values are known. Several quantitative imaging methods in nuclear-medicine imaging measure parameters where these bounds are not known, such as the activity concentration in an organ or the volume of a tumor. We extended the RWT approach to develop a no-gold-standard (NGS) technique for objectively evaluating such quantitative nuclear-medicine imaging methods with patient data in the absence of any ground truth. Using the parameters estimated with the NGS technique, a figure of merit, the noise-to-slope ratio (NSR), can be computed, which can rank the methods on the basis of precision. An issue with NGS evaluation techniques is the requirement of a large number of patient studies. To reduce this requirement, the proposed method explored the use of multiple quantitative measurements from the same patient, such as the activity concentration values from different organs in the same patient. The proposed technique was evaluated using rigorous numerical experiments and using data from realistic simulation studies. The numerical experiments demonstrated that the NSR was estimated accurately using the proposed NGS technique when the bounds on the distribution of true values were not precisely known, thus serving as a very reliable metric for ranking the methods on the basis of precision. In the realistic simulation study, the NGS technique was used to rank reconstruction methods for quantitative single-photon emission computed tomography (SPECT) based on their performance on the task of estimating the mean activity concentration within a known volume of interest. Results showed that the proposed technique provided accurate ranking of the reconstruction methods for 97.5% of the 50 noise realizations. Further, the technique was robust to the choice of evaluated reconstruction methods. The simulation study pointed to possible violations of the assumptions made in the NGS technique under clinical scenarios. However, numerical experiments indicated that the NGS technique was robust in ranking methods even when there was some degree of such violation.
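As a minimal sketch of how a noise-to-slope ratio could be used for ranking, the Python snippet below takes slope and noise standard-deviation estimates as given and sorts methods by NSR. It assumes NSR is the noise standard deviation divided by the slope; the class name, function names, and numeric values are hypothetical and serve only as an illustration, and the estimation step itself (the NGS technique) is not reproduced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearFit:
    """Hypothetical container for per-method estimates (illustrative only)."""
    name: str
    slope: float      # estimated slope of measured vs. true values
    noise_std: float  # estimated noise standard deviation


def noise_to_slope_ratio(fit: LinearFit) -> float:
    """Return noise_std / slope; a lower value indicates better precision.

    Assumption: the slope is positive, so the ratio is well defined.
    """
    if fit.slope <= 0:
        raise ValueError("slope must be positive for this sketch")
    return fit.noise_std / fit.slope


def rank_by_nsr(fits):
    """Sort methods from most precise (lowest NSR) to least precise."""
    return sorted(fits, key=noise_to_slope_ratio)


# Made-up example values, not results from the paper.
fits = [
    LinearFit("method_A", slope=0.9, noise_std=0.18),
    LinearFit("method_B", slope=1.1, noise_std=0.11),
    LinearFit("method_C", slope=0.5, noise_std=0.20),
]
ranking = [fit.name for fit in rank_by_nsr(fits)]
print(ranking)
```

Dividing by the slope puts the noise on the scale of the true values, so a method with low noise but a heavily compressed slope (method_C above) is not ranked as precise.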
Affiliation(s)
- Abhinav K Jha
- Division of Medical Imaging Physics, Department of Radiology and Radiological Sciences, Johns Hopkins University, Baltimore, MD 21218, USA
14
|
Sullivan DC, Obuchowski NA, Kessler LG, Raunig DL, Gatsonis C, Huang EP, Kondratovich M, McShane LM, Reeves AP, Barboriak DP, Guimaraes AR, Wahl RL. Metrology Standards for Quantitative Imaging Biomarkers. Radiology 2015; 277:813-25. [PMID: 26267831 PMCID: PMC4666097 DOI: 10.1148/radiol.2015142202] [Citation(s) in RCA: 305] [Impact Index Per Article: 30.5] [Indexed: 02/01/2023]
Abstract
Although investigators in the imaging community have been active in developing and evaluating quantitative imaging biomarkers (QIBs), the development and implementation of QIBs have been hampered by the inconsistent or incorrect use of terminology or methods for technical performance and statistical concepts. Technical performance is an assessment of how a test performs in reference objects or subjects under controlled conditions. In this article, some of the relevant statistical concepts are reviewed, methods that can be used for evaluating and comparing QIBs are described, and some of the technical performance issues related to imaging biomarkers are discussed. More consistent and correct use of terminology and study design principles will improve clinical research, advance regulatory science, and foster better care for patients who undergo imaging studies.
Affiliation(s)
- Daniel C. Sullivan
- From the Department of Radiology, Duke University Medical Center, Box 2715, Durham, NC 27710 (D.C.S., D.P.B.); Department of Quantitative Health Sciences, Cleveland Clinic Foundation, Cleveland, Ohio (N.A.O.); Department of Public Health, University of Washington, Seattle, Wash (L.G.K.); Department of Informatics, ICON Medical, Washington, Pa (D.L.R.); Center for Statistical Sciences, Brown University, Providence, RI (C.G.); National Cancer Institute, Bethesda, Md (E.P.H., L.M.M.); Center for Devices and Radiological Health, U.S. Food and Drug Administration, White Oak, Md (M.K.); Department of Electrical and Computer Engineering, Cornell University, Ithaca, NY (A.P.R.); Department of Radiology, Oregon Health & Science University, Portland, Ore (A.R.G.); and Mallinckrodt Institute of Radiology, Washington University School of Medicine, St Louis, Mo (R.L.W.)
- Nancy A. Obuchowski
- Larry G. Kessler
- David L. Raunig
- Constantine Gatsonis
- Erich P. Huang
- Marina Kondratovich
- Lisa M. McShane
- Anthony P. Reeves
- Daniel P. Barboriak
- Alexander R. Guimaraes
- Richard L. Wahl
15
|
Lebenberg J, Lalande A, Clarysse P, Buvat I, Casta C, Cochet A, Constantinidès C, Cousty J, de Cesare A, Jehan-Besson S, Lefort M, Najman L, Roullot E, Sarry L, Tilmant C, Frouin F, Garreau M. Improved Estimation of Cardiac Function Parameters Using a Combination of Independent Automated Segmentation Results in Cardiovascular Magnetic Resonance Imaging. PLoS One 2015; 10:e0135715. [PMID: 26287691 PMCID: PMC4545395 DOI: 10.1371/journal.pone.0135715] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 04/24/2015] [Accepted: 07/24/2015] [Indexed: 11/18/2022] Open
Abstract
This work aimed to combine different segmentation approaches to produce a robust and accurate segmentation result. Three to five segmentation results of the left ventricle were combined using the STAPLE algorithm, and the reliability of the resulting segmentation was evaluated in comparison with the result of each individual segmentation method. This comparison was performed using a supervised approach based on a reference method. Then, we used an unsupervised statistical evaluation, the extended Regression Without Truth (eRWT), which ranks different methods according to their accuracy in estimating a specific biomarker in a population. The segmentation accuracy was evaluated by estimating six cardiac function parameters resulting from the left ventricle contour delineation using a public cardiac cine MRI database. Eight different segmentation methods, including three expert delineations and five automated methods, were considered, and sixteen combinations of the automated methods using STAPLE were investigated. The supervised and unsupervised evaluations demonstrated that in most cases, STAPLE results provided better estimates than individual automated segmentation methods. Overall, combining different automated segmentation methods improved the reliability of the segmentation result compared to that obtained using an individual method and could achieve the accuracy of an expert.
Affiliation(s)
- Jessica Lebenberg
- Laboratoire d’Imagerie Biomédicale, Institut National de la Santé et de la Recherche Médicale, Centre National de la Recherche Scientifique, Université Pierre et Marie Curie, Paris, France
- École Spéciale de Mécanique et d’Électricité-Sudria, Ivry-sur-Seine, France
- Alain Lalande
- Laboratoire Electronique, Informatique et Image, Centre National de la Recherche Scientifique, Université de Bourgogne, Dijon, France
- Patrick Clarysse
- Centre de Recherche en Acquisition et Traitement de l’Image pour la Santé, Centre National de la Recherche Scientifique, Institut National de la Santé et de la Recherche Médicale, Institut National des Sciences Appliquées Lyon, Université de Lyon, Villeurbanne, France
- Irene Buvat
- Unité d’Imagerie Moléculaire In Vivo, Service Hospitalier Frédéric Joliot, Institut National de la Santé et de la Recherche Médicale, Centre National de la Recherche Scientifique, Commissariat à l’Energie Atomique, Université Paris Sud, Orsay, France
- Christopher Casta
- Centre de Recherche en Acquisition et Traitement de l’Image pour la Santé, Centre National de la Recherche Scientifique, Institut National de la Santé et de la Recherche Médicale, Institut National des Sciences Appliquées Lyon, Université de Lyon, Villeurbanne, France
- Alexandre Cochet
- Laboratoire Electronique, Informatique et Image, Centre National de la Recherche Scientifique, Université de Bourgogne, Dijon, France
- Constantin Constantinidès
- Laboratoire d’Imagerie Biomédicale, Institut National de la Santé et de la Recherche Médicale, Centre National de la Recherche Scientifique, Université Pierre et Marie Curie, Paris, France
- École Spéciale de Mécanique et d’Électricité-Sudria, Ivry-sur-Seine, France
- Jean Cousty
- Laboratoire d’Informatique Gaspard Monge, Centre National de la Recherche Scientifique, Université Paris-Est Marne-la-Vallée, École Supérieure d’Ingénieurs en Électrotechnique et Électronique, Marne-la-Vallée, France
- Alain de Cesare
- Laboratoire d’Imagerie Biomédicale, Institut National de la Santé et de la Recherche Médicale, Centre National de la Recherche Scientifique, Université Pierre et Marie Curie, Paris, France
- Stephanie Jehan-Besson
- Groupe de Recherche en Informatique, Image, Automatique et Instrumentation de Caen, Centre National de la Recherche Scientifique, Caen, France
- Muriel Lefort
- Laboratoire d’Imagerie Biomédicale, Institut National de la Santé et de la Recherche Médicale, Centre National de la Recherche Scientifique, Université Pierre et Marie Curie, Paris, France
- Laurent Najman
- Laboratoire d’Informatique Gaspard Monge, Centre National de la Recherche Scientifique, Université Paris-Est Marne-la-Vallée, École Supérieure d’Ingénieurs en Électrotechnique et Électronique, Marne-la-Vallée, France
- Elodie Roullot
- École Spéciale de Mécanique et d’Électricité-Sudria, Ivry-sur-Seine, France
- Laurent Sarry
- Image Science for Interventional Techniques, Centre National de la Recherche Scientifique, Université d’Auvergne, Clermont-Ferrand, France
- Christophe Tilmant
- Institut Pascal, Centre National de la Recherche Scientifique, Université Blaise Pascal, Clermont-Ferrand, France
- Frederique Frouin
- Unité d’Imagerie Moléculaire In Vivo, Service Hospitalier Frédéric Joliot, Institut National de la Santé et de la Recherche Médicale, Centre National de la Recherche Scientifique, Commissariat à l’Energie Atomique, Université Paris Sud, Orsay, France
- Mireille Garreau
- Laboratoire de Traitement du Signal et des Images, Institut National de la Santé et de la Recherche Médicale, Université de Rennes, Rennes, France
16
|
Branscum AJ, Johnson WO, Hanson TE, Baron AT. Flexible regression models for ROC and risk analysis, with or without a gold standard. Stat Med 2015; 34:3997-4015. [DOI: 10.1002/sim.6610] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 12/16/2014] [Accepted: 07/06/2015] [Indexed: 11/07/2022]
Affiliation(s)
- Adam J. Branscum
- Biostatistics Program; Oregon State University; Corvallis 97331 Oregon U.S.A
- Timothy E. Hanson
- Department of Statistics; University of South Carolina; Columbia SC U.S.A
17
|
Jha AK, Song N, Caffo B, Frey EC. Objective evaluation of reconstruction methods for quantitative SPECT imaging in the absence of ground truth. Proceedings of SPIE--The International Society for Optical Engineering 2015; 9416:94161K. [PMID: 26430292 DOI: 10.1117/12.2081286] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/14/2022]
Abstract
Quantitative single-photon emission computed tomography (SPECT) imaging is emerging as an important tool in clinical studies and biomedical research. There is thus a need for optimization and evaluation of systems and algorithms that are being developed for quantitative SPECT imaging. An appropriate objective method to evaluate these systems is to compare their performance on the end task required in quantitative SPECT imaging, such as estimating the mean activity concentration in a volume of interest (VOI) in a patient image. This objective evaluation can be performed if the true value of the estimated parameter is known, i.e. we have a gold standard. However, this gold standard is very rarely known in human studies. Thus, no-gold-standard techniques to optimize and evaluate systems and algorithms in the absence of a gold standard are required. In this work, we developed a no-gold-standard technique to objectively evaluate reconstruction methods used in quantitative SPECT when the parameter to be estimated is the mean activity concentration in a VOI. We studied the performance of the technique with realistic simulated image data generated from an object database consisting of five phantom anatomies with all possible combinations of five sets of organ uptakes, where each anatomy consisted of eight different organ VOIs. Results indicate that the method provided accurate ranking of the reconstruction methods. We also demonstrated the application of consistency checks to test the no-gold-standard output.
Affiliation(s)
- Abhinav K Jha
- Division of Medical Imaging Physics, Department of Radiology and Radiological Sciences, Johns Hopkins University, Baltimore, MD, USA
- Na Song
- Division of Nuclear Medicine, Department of Radiology, Albert Einstein College of Medicine, Yeshiva University, Bronx, NY, USA
- Brian Caffo
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Eric C Frey
- Division of Medical Imaging Physics, Department of Radiology and Radiological Sciences, Johns Hopkins University, Baltimore, MD, USA
18
|
Liu H, Wang J, Xu X, Song E, Wang Q, Jin R, Hung CC, Fei B. A robust and accurate center-frequency estimation (RACE) algorithm for improving motion estimation performance of SinMod on tagged cardiac MR images without known tagging parameters. Magn Reson Imaging 2014; 32:1139-55. [PMID: 25087857 DOI: 10.1016/j.mri.2014.07.005] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 01/20/2014] [Revised: 04/30/2014] [Accepted: 07/24/2014] [Indexed: 10/25/2022]
Abstract
This study proposes a robust and accurate center-frequency (CF) estimation (RACE) algorithm to improve the performance of the local sine-wave modeling (SinMod) method, an effective motion estimation method for tagged cardiac magnetic resonance (MR) images. When the tagging parameters are unknown, the RACE algorithm automatically and efficiently produces an appropriate CF estimate for SinMod by means of two key techniques: (1) the well-known mean-shift algorithm, which provides accurate and rapid CF estimation; and (2) an original two-direction-combination strategy, which further enhances the accuracy and robustness of CF estimation. Several other available CF estimation algorithms are included for comparison. Several validation approaches that work on real data without ground truth are specially designed. Experimental results on in vivo human cardiac data demonstrate the significance of accurate CF estimation for SinMod and validate the effectiveness of RACE in improving the motion estimation performance of SinMod.
Affiliation(s)
- Hong Liu
- School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China; Key Laboratory of Education Ministry for Image Processing and Intelligence Control, Wuhan, Hubei 430074, China
- Jie Wang
- School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China; Key Laboratory of Education Ministry for Image Processing and Intelligence Control, Wuhan, Hubei 430074, China
- Xiangyang Xu
- School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China; Key Laboratory of Education Ministry for Image Processing and Intelligence Control, Wuhan, Hubei 430074, China
- Enmin Song
- School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China; Key Laboratory of Education Ministry for Image Processing and Intelligence Control, Wuhan, Hubei 430074, China
- Qian Wang
- Key Laboratory of Education Ministry for Image Processing and Intelligence Control, Wuhan, Hubei 430074, China; Department of Information and Safety Engineering, Zhongnan University of Economics and Law, Wuhan, Hubei 430073, China
- Renchao Jin
- School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China; Key Laboratory of Education Ministry for Image Processing and Intelligence Control, Wuhan, Hubei 430074, China
- Chih-Cheng Hung
- School of Computing and Software Engineering, Southern Polytechnic State University, Marietta, GA 30060, USA
- Baowei Fei
- Quantitative BioImaging Laboratory, Emory University School of Medicine, Atlanta, GA 30322, USA
19
|
Obuchowski NA, Reeves AP, Huang EP, Wang XF, Buckler AJ, Kim HJG, Barnhart HX, Jackson EF, Giger ML, Pennello G, Toledano AY, Kalpathy-Cramer J, Apanasovich TV, Kinahan PE, Myers KJ, Goldgof DB, Barboriak DP, Gillies RJ, Schwartz LH, Sullivan DC. Quantitative imaging biomarkers: a review of statistical methods for computer algorithm comparisons. Stat Methods Med Res 2014; 24:68-106. [PMID: 24919829 DOI: 10.1177/0962280214537390] [Citation(s) in RCA: 123] [Impact Index Per Article: 11.2] [Indexed: 12/28/2022]
Abstract
Quantitative biomarkers from medical images are becoming important tools for clinical diagnosis, staging, monitoring, treatment planning, and development of new therapies. While there is a rich history of the development of quantitative imaging biomarker (QIB) techniques, little attention has been paid to the validation and comparison of the computer algorithms that implement the QIB measurements. In this paper we provide a framework for QIB algorithm comparisons. We first review and compare various study designs, including designs with the true value (e.g. phantoms, digital reference images, and zero-change studies), designs with a reference standard (e.g. studies testing equivalence with a reference standard), and designs without a reference standard (e.g. agreement studies and studies of algorithm precision). The statistical methods for comparing QIB algorithms are then presented for various study types using both aggregate and disaggregate approaches. We propose a series of steps for establishing the performance of a QIB algorithm, identify limitations in the current statistical literature, and suggest future directions for research.
Affiliation(s)
- Gene Pennello
- Food and Drug Administration/CDRH, Silver Spring, MD, USA
- Kyle J Myers
- Food and Drug Administration/CDRH, Silver Spring, MD, USA
20
|
Four-Dimensional Image Reconstruction Strategies in Cardiac-Gated and Respiratory-Gated PET Imaging. PET Clin 2012; 8:51-67. [PMID: 27157815 DOI: 10.1016/j.cpet.2012.10.005] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.7] [Indexed: 01/27/2023]
Abstract
Cardiac and respiratory movements pose significant challenges to image quality and quantitative accuracy in PET imaging. Cardiac and/or respiratory gating attempts to address this issue but leads to increased noise levels. Direct four-dimensional (4D) PET image reconstruction incorporating motion compensation has the potential to minimize noise amplification while removing considerable motion blur. A wide range of such techniques is reviewed in this work. Future opportunities and the challenges facing the adoption of 4D PET reconstruction and its role in basic and clinical research are also discussed.
21
|
Lebenberg J, Buvat I, Lalande A, Clarysse P, Casta C, Cochet A, Constantinides C, Cousty J, de Cesare A, Jehan-Besson S, Lefort M, Najman L, Roullot E, Sarry L, Tilmant C, Garreau M, Frouin F. Nonsupervised ranking of different segmentation approaches: application to the estimation of the left ventricular ejection fraction from cardiac cine MRI sequences. IEEE Transactions on Medical Imaging 2012; 31:1651-1660. [PMID: 22665506 DOI: 10.1109/tmi.2012.2201737] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Indexed: 06/01/2023]
Abstract
A statistical methodology is proposed to rank several estimation methods of a relevant clinical parameter when no gold standard is available. Based on a regression without truth method, the proposed approach was applied to rank eight methods without using any a priori information regarding the reliability of each method and its degree of automation. It was only based on a prior concerning the statistical distribution of the parameter of interest in the database. The ranking of the methods relies on figures of merit derived from the regression and computed using a bootstrap process. The methodology was applied to the estimation of the left ventricular ejection fraction derived from cardiac magnetic resonance images segmented using eight approaches with different degrees of automation: three segmentations were entirely manually performed and the others were variously automated. The ranking of methods was consistent with the expected performance of the estimation methods: the most accurate estimates of the ejection fraction were obtained using manual segmentations. The robustness of the ranking was demonstrated when at least three methods were compared. These results suggest that the proposed statistical approach might be helpful to assess the performance of estimation methods on clinical data for which no gold standard is available.
Affiliation(s)
- Jessica Lebenberg
- LIF, INSERM UMR_S 678 Université Pierre et Marie Curie, 75013 Paris, France.
|
22
|
Jha AK, Kupinski MA, Rodríguez JJ, Stephen RM, Stopeck AT. Task-based evaluation of segmentation algorithms for diffusion-weighted MRI without using a gold standard. Phys Med Biol 2012; 57:4425-46. [PMID: 22713231 DOI: 10.1088/0031-9155/57/13/4425]
Abstract
In many studies, the estimation of the apparent diffusion coefficient (ADC) of lesions in visceral organs in diffusion-weighted (DW) magnetic resonance images requires an accurate lesion-segmentation algorithm. To evaluate these lesion-segmentation algorithms, region-overlap measures are used currently. However, the end task from the DW images is accurate ADC estimation, and the region-overlap measures do not evaluate the segmentation algorithms on this task. Moreover, these measures rely on the existence of gold-standard segmentation of the lesion, which is typically unavailable. In this paper, we study the problem of task-based evaluation of segmentation algorithms in DW imaging in the absence of a gold standard. We first show that using manual segmentations instead of gold-standard segmentations for this task-based evaluation is unreliable. We then propose a method to compare the segmentation algorithms that does not require gold-standard or manual segmentation results. The no-gold-standard method estimates the bias and the variance of the error between the true ADC values and the ADC values estimated using the automated segmentation algorithm. The method can be used to rank the segmentation algorithms on the basis of both the ensemble mean square error and precision. We also propose consistency checks for this evaluation technique.
Affiliation(s)
- Abhinav K Jha
- College of Optical Sciences, University of Arizona, Tucson, AZ, USA.
|
23
|
Manatunga AK, Binongo JNG, Taylor AT. Computer-aided diagnosis of renal obstruction: utility of log-linear modeling versus standard ROC and kappa analysis. EJNMMI Res 2011; 1:1-8. [PMID: 21935501 PMCID: PMC3175375 DOI: 10.1186/2191-219x-1-5]
Abstract
Background: The accuracy of computer-aided diagnosis (CAD) software is best evaluated by comparison to a gold standard which represents the true status of disease. In many settings, however, knowledge of the true status of disease is not possible and accuracy is evaluated against the interpretations of an expert panel. Common statistical approaches to evaluate accuracy include receiver operating characteristic (ROC) and kappa analysis but both of these methods have significant limitations and cannot answer the question of equivalence: Is the CAD performance equivalent to that of an expert? The goal of this study is to show the strength of log-linear analysis over standard ROC and kappa statistics in evaluating the accuracy of computer-aided diagnosis of renal obstruction compared to the diagnosis provided by expert readers. Methods: Log-linear modeling was utilized to analyze a previously published database that used ROC and kappa statistics to compare diuresis renography scan interpretations (non-obstructed, equivocal, or obstructed) generated by a renal expert system (RENEX) in 185 kidneys (95 patients) with the independent and consensus scan interpretations of three experts who were blinded to clinical information and prospectively and independently graded each kidney as obstructed, equivocal, or non-obstructed. Results: Log-linear modeling showed that RENEX and the expert consensus had beyond-chance agreement in both non-obstructed and obstructed readings (both p < 0.0001). Moreover, pairwise agreement between experts and pairwise agreement between each expert and RENEX were not significantly different (p = 0.41, 0.95, 0.81 for the non-obstructed, equivocal, and obstructed categories, respectively). Similarly, the three-way agreement of the three experts and three-way agreement of two experts and RENEX was not significantly different for non-obstructed (p = 0.79) and obstructed (p = 0.49) categories.
Conclusion: Log-linear modeling showed that RENEX was equivalent to any expert in rating kidneys, particularly in the obstructed and non-obstructed categories. This conclusion, which could not be derived from the original ROC and kappa analysis, emphasizes and illustrates the role and importance of log-linear modeling in the absence of a gold standard. The log-linear analysis also provides additional evidence that RENEX has the potential to assist in the interpretation of diuresis renography studies.
Affiliation(s)
- Amita K Manatunga
- Department of Biostatistics and Bioinformatics, Emory University School of Public Health, 1364 Clifton Road NE, Atlanta, GA 30322, USA
|
24
|
Zaidi H, El Naqa I. PET-guided delineation of radiation therapy treatment volumes: a survey of image segmentation techniques. Eur J Nucl Med Mol Imaging 2010; 37:2165-87. [PMID: 20336455 DOI: 10.1007/s00259-010-1423-3] [Received: 10/27/2009] [Accepted: 02/20/2010]
Abstract
Historically, anatomical CT and MR images were used to delineate the gross tumour volumes (GTVs) for radiotherapy treatment planning. The capabilities offered by modern radiation therapy units and the widespread availability of combined PET/CT scanners stimulated the development of biological PET imaging-guided radiation therapy treatment planning with the aim to produce highly conformal radiation dose distribution to the tumour. One of the most difficult issues facing PET-based treatment planning is the accurate delineation of target regions from typical blurred and noisy functional images. The major problems encountered are image segmentation and imperfect system response function. Image segmentation is defined as the process of classifying the voxels of an image into a set of distinct classes. The difficulty in PET image segmentation is compounded by the low spatial resolution and high noise characteristics of PET images. Despite the difficulties and known limitations, several image segmentation approaches have been proposed and used in the clinical setting including thresholding, edge detection, region growing, clustering, stochastic models, deformable models, classifiers and several other approaches. A detailed description of the various approaches proposed in the literature is reviewed. Moreover, we also briefly discuss some important considerations and limitations of the widely used techniques to guide practitioners in the field of radiation oncology. The strategies followed for validation and comparative assessment of various PET segmentation approaches are described. Future opportunities and the current challenges facing the adoption of PET-guided delineation of target volumes and its role in basic and clinical research are also addressed.
Affiliation(s)
- Habib Zaidi
- Geneva University Hospital, Geneva 4, Switzerland.
|
25
|
Jha AK, Kupinski MA, Rodríguez JJ, Stephen RM, Stopeck AT. Evaluating segmentation algorithms for diffusion-weighted MR images: a task-based approach. Proc SPIE Int Soc Opt Eng 2010; 7627:76270L. [PMID: 21152379 PMCID: PMC2997747 DOI: 10.1117/12.845515]
Abstract
Apparent Diffusion Coefficient (ADC) of lesions obtained from Diffusion Weighted Magnetic Resonance Imaging is an emerging biomarker for evaluating anti-cancer therapy response. To compute the lesion's ADC, accurate lesion segmentation must be performed. To quantitatively compare these lesion segmentation algorithms, standard methods are used currently. However, the end task from these images is accurate ADC estimation, and these standard methods don't evaluate the segmentation algorithms on this task-based measure. Moreover, standard methods rely on the highly unlikely scenario of there being perfectly manually segmented lesions. In this paper, we present two methods for quantitatively comparing segmentation algorithms on the above task-based measure; the first method compares them given good manual segmentations from a radiologist, the second compares them even in absence of good manual segmentations.
Affiliation(s)
- Abhinav K. Jha
- College of Optical Sciences, University of Arizona, Tucson, Arizona
- Jeffrey J. Rodríguez
- Dept. of Electrical and Computer Engineering, University of Arizona, Tucson, Arizona
- Renu M. Stephen
- Arizona Cancer Center, University of Arizona, Tucson, Arizona
|
26
|
Abstract
The purpose of this paper is to present a statistical reviewer's perspective on some technical aspects of reviewing Bayesian medical device trials submitted to the Food and Drug Administration. The discussion reflects the experiences of the authors and should not be misconstrued as official guidance by the FDA. A variety of applications are described, reflecting our experience with therapeutic and diagnostic devices. In addition to Bayesian analysis of trials, Bayesian trial design and Bayesian monitoring are discussed. Analyses were implemented in WinBUGS (http://www.mrc-bsu.cam.ac.uk/bugs/winbugs/contents.shtml), with the code provided.
Affiliation(s)
- Gene Pennello
- Division of Biostatistics, Rockville, Maryland 20850, USA
|
27
|
Ducreux D, Buvat I, Meder JF, Mikulis D, Crawley A, Fredy D, TerBrugge K, Lasjaunias P, Bittoun J. Perfusion-weighted MR imaging studies in brain hypervascular diseases: comparison of arterial input function extractions for perfusion measurement. AJNR Am J Neuroradiol 2006; 27:1059-69. [PMID: 16687543 PMCID: PMC7975726]
Abstract
Background and Purpose: Brain hypervascular diseases are complex and induce hemodynamic disturbances on brain parenchyma, which are difficult to accurately evaluate by using perfusion-weighted (PWI) MR imaging. Our purpose was to test 4 arterial input function (AIF) estimation methods and to identify the best one in patients with brain hypervascular disease and in healthy volunteers. Methods: Thirty-three patients and 10 healthy volunteers underwent brain perfusion studies by using a 1.5T MR imaging scanner with gadolinium-chelate bolus injection. PWI was performed with the indicator dilution method. The AIF was estimated with local, regional, regional scaled, and global methods, and PWI measurements (cerebral blood volume [CBV] and cerebral blood flow [CBF]) were performed with regions of interest drawn on the thalami and centrum semiovale in all subjects, remote from the brain hypervascular disease nidus. Abnormal PWI results were assessed by using Z Score, and evaluation of the best AIF estimation method was performed by using a no gold standard evaluation method. Results: From 88% to 97% of patients had overall abnormal perfusion areas of hypo- (decreased CBV and CBF) and/or hyperperfusion (increased CBV and CBF) and/or venous congestion (increased CBV, normal or decreased CBF), depending on the AIF estimation method used for PWI computations. No gold standard evaluation of the 4 AIF estimates found the regional and the regional scaled methods to be the most accurate. Conclusion: Brain hypervascular disease induces remote brain perfusion abnormalities that can be better detected by using PWI with regional or regional scaled AIF estimation methods.
Affiliation(s)
- D Ducreux
- Department of Neuroradiology, C.H.U. de Bicêtre, Paris XI University, Le Kremlin-Bicêtre, France
|
28
|
Khalil MM, Elgazzar A, Khalil W. Evaluation of left ventricular ejection fraction by the quantitative algorithms QGS, ECTb, LMC and LVGTF using gated myocardial perfusion SPECT: investigation of relative accuracy. Nucl Med Commun 2006; 27:321-32. [PMID: 16531917 DOI: 10.1097/01.mnm.0000202861.67293.95]
Abstract
Aim: To compare the quantitative algorithms Emory Cardiac Toolbox (ECTb), quantitative gated SPECT (QGS), layer of maximum counts (LMC), and left ventricular global thickening fraction (LVGTF) using gated myocardial tomography in the calculation of the left ventricular ejection fraction using the regression without truth (RWT) technique. Materials and Methods: Seventy-four consecutive patients were included in the study (59 males). All patients underwent stress-rest myocardial perfusion SPECT using Tc-tetrofosmin. Analysis of variance (ANOVA), the paired Student's t-test, the Pearson correlation coefficient and Bland-Altman analysis were used for comparing the methods. Relative accuracy was assessed by RWT. Results: ANOVA revealed a significant difference among the methods in calculating the ejection fraction. RWT showed that ECTb and QGS outperformed the other two methods. The ECTb was slightly better than QGS, and LMC was slightly better than LVGTF. QGS and ECTb achieved good correlations in end diastolic volume, end systolic volume and ejection fraction measurements. One-way ANOVA demonstrated that QGS was the only software program affected by the category of the perfusion summed stress score (SSS), P=0.038. The ejection fraction determined by the QGS, ECTb and LVGTF methods correlated significantly with defect size (r=0.545, P<0.0001; r=0.530, P<0.0001; and r=0.419, P<0.0001, respectively), but the LMC method was not significantly correlated (r=0.216, P=0.067). Conclusions: There was a considerable variation among the quantitative gated SPECT methods in the evaluation of the ejection fraction. RWT revealed that the ECTb and QGS outperformed the other two methods with respect to the bias and precision of the measurements. Pair-wise correlations of the four methods ranged from mild to good with large agreement limits. Results of RWT provided important information in ranking the quantitative gated SPECT methods.
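The Bland-Altman comparison named in this abstract can be sketched as follows. This is a minimal illustration, not the authors' analysis: the function name `bland_altman_limits`, the 1.96 multiplier (which assumes roughly normally distributed differences), and the ejection-fraction values are all hypothetical.

```python
import statistics


def bland_altman_limits(method_a, method_b, z=1.96):
    """Return (bias, lower, upper) limits of agreement for paired measurements."""
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need at least two paired measurements")
    # Per-patient difference between the two methods.
    diffs = [x - y for x, y in zip(method_a, method_b)]
    bias = statistics.mean(diffs)        # mean difference
    spread = statistics.stdev(diffs)     # sample standard deviation of differences
    return bias, bias - z * spread, bias + z * spread


# Hypothetical ejection fractions (%) from two software packages, same patients.
ef_method_a = [50.0, 60.0, 55.0, 45.0]
ef_method_b = [48.0, 63.0, 54.0, 47.0]

bias, lower, upper = bland_altman_limits(ef_method_a, ef_method_b)
print(f"bias={bias:.2f}, limits of agreement=({lower:.2f}, {upper:.2f})")
```

Wide limits of agreement of this kind describe how far two methods can disagree on a single patient; they do not say which method is closer to the truth, which is the question RWT is meant to address.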
Affiliation(s)
- Magdy Mohamed Khalil
- Nuclear Medicine Department, Faculty of Medicine, Kuwait University, Faculty of Science, Cairo University, Egypt.
|
29
|
Kupinski MA, Hoppin JW, Krasnow J, Dahlberg S, Leppo JA, King MA, Clarkson E, Barrett HH. Comparing cardiac ejection fraction estimation algorithms without a gold standard. Acad Radiol 2006; 13:329-37. [PMID: 16488845 PMCID: PMC2464280 DOI: 10.1016/j.acra.2005.12.005] [Received: 09/23/2005] [Revised: 12/01/2005] [Accepted: 12/02/2005]
Abstract
Rationale and Objectives: Imaging and estimation of left ventricular function have major diagnostic and prognostic importance in patients with coronary artery disease. It is vital that the method used to estimate cardiac ejection fraction (EF) allows the observer to best perform this task. To measure task-based performance, one must clearly define the task in question, the observer performing the task, and the patient population being imaged. In this report, the task is to accurately and precisely measure cardiac EF, and the observers are human-assisted computer algorithms that analyze the images and estimate cardiac EF. It is very difficult to measure the performance of an observer by using clinical data because estimation tasks typically lack a gold standard. A solution to this "no-gold-standard" problem recently was proposed, called regression without truth (RWT). Materials and Methods: Results of three different software packages used to analyze gated, cardiac, and nuclear medicine images, each of which uses a different algorithm to estimate a patient's cardiac EF, are compared. The three methods are the Emory method, Quantitative Gated Single-Photon Emission Computed Tomographic method, and the Wackers-Liu Circumferential Quantification method. The same set of images is used as input to each of the three algorithms. Data were analyzed from the three different algorithms by using RWT to determine which produces the best estimates of cardiac EF in terms of accuracy and precision. Results and Discussion: In performing this study, three different consistency checks were developed to ensure that the RWT method is working properly. The Emory method of estimating EF slightly outperformed the other two methods. In addition, the RWT method passed all three consistency checks, garnering confidence in the method and its application to clinical data.
Affiliation(s)
- Matthew A Kupinski
- Optical Sciences Center, The University of Arizona, 1630 East University Blvd, Tucson, AZ 85721, USA.
|
30
|
Ruiz-de-Jesus O, Yanez-Suarez O, Jimenez-Angeles L, Vallejo-Venegas E. Software phantom for the synthesis of equilibrium radionuclide ventriculography images. Conf Proc IEEE Eng Med Biol Soc 2006; 2006:1085-1088. [PMID: 17946442 DOI: 10.1109/iembs.2006.260276]
Abstract
This paper presents the novel design of a software phantom for the evaluation of equilibrium radionuclide ventriculography systems. Through singular value decomposition, the data matrix corresponding to an equilibrium image series is decomposed into both spatial and temporal fundamental components that can be parametrized. This parametric model allows for the application of user-controlled conditions related to a desired dynamic behavior. Being invertible, the decomposition is used to regenerate the radionuclide image series, which is then translated into a DICOM ventriculography file that can be read by commercial equipment.
Affiliation(s)
- Oscar Ruiz-de-Jesus
- Department of Electrical Engineering, Universidad Autonoma Metropolitana-Iztapalapa, Mexico City, Mexico.
|
31
|
Hoppin JW, Kupinski MA, Wilson DW, Peterson T, Gershman B, Kastis G, Clarkson E, Furenlid L, Barrett HH. Evaluating Estimation Techniques in Medical Imaging Without a Gold Standard: Experimental Validation. Proc SPIE Int Soc Opt Eng 2003; 5034:10.1117/12.480330. [PMID: 26346933 PMCID: PMC4558919 DOI: 10.1117/12.480330]
Abstract
Imaging is often used for the purpose of estimating the value of some parameter of interest. For example, a cardiologist may measure the ejection fraction (EF) of the heart to quantify how much blood is being pumped out of the heart on each stroke. In clinical practice, however, it is difficult to evaluate an estimation method because the gold standard is not known, e.g., a cardiologist does not know the true EF of a patient. An estimation method is typically evaluated by plotting its results against the results of another (more accepted) estimation method. This approach results in the use of one set of estimates as the pseudo-gold standard. We have developed a maximum-likelihood approach for comparing different estimation methods to the gold standard without the use of the gold standard. In previous works we have displayed the results of numerous simulation studies indicating the method can precisely and accurately estimate the parameters of a regression line without a gold standard, i.e., without the x-axis. In an attempt to further validate our method we have designed an experiment performing volume estimation using a physical phantom and two imaging systems (SPECT,CT).
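For orientation, the kind of model behind "estimating the parameters of a regression line without the x-axis" can be written as below. This is a sketch based on the general description in the abstract, not a quotation from the paper: the symbols, the Gaussian noise term, and the assumed prior pr(θ) on the unknown true values are notation and assumptions introduced here. Each of K methods gives an estimate for each of P patients, assumed linearly related to the unknown true value θ_p, and the true values are integrated out of the likelihood.

```latex
% Assumed linear model for method k on patient p
\hat{\theta}_{p,k} = a_k\,\theta_p + b_k + \epsilon_{p,k},
\qquad \epsilon_{p,k} \sim \mathcal{N}\!\left(0, \sigma_k^2\right)

% Log-likelihood of the estimates, with the true values marginalized out
\lambda\!\left(\{a_k, b_k, \sigma_k\}\right)
  = \sum_{p=1}^{P} \ln \int d\theta \; \mathrm{pr}(\theta)
    \prod_{k=1}^{K} \frac{1}{\sqrt{2\pi\sigma_k^2}}
    \exp\!\left[-\frac{\left(\hat{\theta}_{p,k} - a_k\theta - b_k\right)^2}{2\sigma_k^2}\right]
```

Maximizing λ over the slopes a_k, intercepts b_k, and noise levels σ_k yields regression-line parameters for each method without the true values ever being observed.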
Affiliation(s)
- John W Hoppin
- Program in Applied Mathematics, The University of Arizona, Tucson, AZ
- Matthew A Kupinski
- Department of Radiology, The University of Arizona, Tucson, AZ ; Optical Sciences Center, The University of Arizona, Tucson, AZ
- Donald W Wilson
- Department of Radiology, The University of Arizona, Tucson, AZ
- Todd Peterson
- Department of Radiology, The University of Arizona, Tucson, AZ
- George Kastis
- Department of Radiology, The University of Arizona, Tucson, AZ
- Eric Clarkson
- Program in Applied Mathematics, The University of Arizona, Tucson, AZ ; Department of Radiology, The University of Arizona, Tucson, AZ ; Optical Sciences Center, The University of Arizona, Tucson, AZ
- Lars Furenlid
- Department of Radiology, The University of Arizona, Tucson, AZ ; Optical Sciences Center, The University of Arizona, Tucson, AZ
- Harrison H Barrett
- Program in Applied Mathematics, The University of Arizona, Tucson, AZ ; Department of Radiology, The University of Arizona, Tucson, AZ ; Optical Sciences Center, The University of Arizona, Tucson, AZ
|
32
|
Hoppin JW, Kupinski MA, Kastis GA, Clarkson E, Barrett HH. Objective comparison of quantitative imaging modalities without the use of a gold standard. IEEE Trans Med Imaging 2002; 21:441-9. [PMID: 12071615 PMCID: PMC3150581 DOI: 10.1109/tmi.2002.1009380]
Abstract
Imaging is often used for the purpose of estimating the value of some parameter of interest. For example, a cardiologist may measure the ejection fraction (EF) of the heart in order to know how much blood is being pumped out of the heart on each stroke. In clinical practice, however, it is difficult to evaluate an estimation method because the gold standard is not known, e.g., a cardiologist does not know the true EF of a patient. Thus, researchers have often evaluated an estimation method by plotting its results against the results of another (more accepted) estimation method, which amounts to using one set of estimates as the pseudogold standard. In this paper, we present a maximum-likelihood approach for evaluating and comparing different estimation methods without the use of a gold standard with specific emphasis on the problem of evaluating EF estimation methods. Results of numerous simulation studies will be presented and indicate that the method can precisely and accurately estimate the parameters of a regression line without a gold standard, i.e., without the x axis.
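A maximum-likelihood evaluation without a gold standard, as described in this abstract, can be sketched in a few lines. The code below is an illustration under stated assumptions, not the authors' implementation: it assumes each method's estimate is a linear function of the unknown true value plus Gaussian noise, assumes a uniform prior on true values in [0, 1], and integrates the true value out numerically. All names (`rwt_log_likelihood`, `simulate`, the parameter values) are hypothetical.

```python
import math
import random


def rwt_log_likelihood(estimates, a, b, sigma, n_grid=200):
    """Log-likelihood of estimates[p][k] with the true value integrated out.

    Assumes estimate = a[k] * theta + b[k] + Gaussian noise (std sigma[k]) and a
    uniform prior on theta over [0, 1]; the integral uses the midpoint rule.
    """
    grid = [(i + 0.5) / n_grid for i in range(n_grid)]
    total = 0.0
    for row in estimates:
        log_terms = []
        for theta in grid:
            lp = 0.0
            for y, ak, bk, sk in zip(row, a, b, sigma):
                z = (y - (ak * theta + bk)) / sk
                lp += -0.5 * z * z - math.log(sk * math.sqrt(2.0 * math.pi))
            log_terms.append(lp)
        # Log-sum-exp keeps the numerical integral stable for small densities.
        m = max(log_terms)
        total += m + math.log(sum(math.exp(t - m) for t in log_terms) / n_grid)
    return total


def simulate(n_patients, a, b, sigma, seed=0):
    """Draw hidden true values and the noisy estimates each method would report."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_patients):
        theta = rng.random()  # true value, never shown to the likelihood
        data.append([ak * theta + bk + rng.gauss(0.0, sk)
                     for ak, bk, sk in zip(a, b, sigma)])
    return data


# Hypothetical slopes, intercepts, and noise levels for three methods.
A_TRUE, B_TRUE, S_TRUE = [0.8, 1.0, 1.2], [0.10, 0.00, -0.05], [0.05, 0.03, 0.08]
data = simulate(200, A_TRUE, B_TRUE, S_TRUE)

ll_true = rwt_log_likelihood(data, A_TRUE, B_TRUE, S_TRUE)
ll_wrong = rwt_log_likelihood(data, [1.0] * 3, [0.0] * 3, [0.05] * 3)
print(f"log-likelihood at generating parameters: {ll_true:.1f}")
print(f"log-likelihood at a mis-specified guess: {ll_wrong:.1f}")
```

In practice the parameters would be found by handing the negative of this function to a numerical optimizer; the comparison above only shows that the likelihood separates the generating parameters from a poor guess even though the true values are never used.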
Affiliation(s)
- John W Hoppin
- Department of Radiology, Arizona Health Sciences Center, Tucson 85724-5067, USA.
|