1
Hatt M, Lee JA, Schmidtlein CR, Naqa IE, Caldwell C, De Bernardi E, Lu W, Das S, Geets X, Gregoire V, Jeraj R, MacManus MP, Mawlawi OR, Nestle U, Pugachev AB, Schöder H, Shepherd T, Spezi E, Visvikis D, Zaidi H, Kirov AS. Classification and evaluation strategies of auto-segmentation approaches for PET: Report of AAPM task group No. 211. Med Phys 2017; 44:e1-e42. [PMID: 28120467 DOI: 10.1002/mp.12124] [Citation(s) in RCA: 134] [Impact Index Per Article: 19.1] [Received: 06/07/2016] [Revised: 12/09/2016] [Accepted: 01/04/2017] [Indexed: 12/14/2022]
Abstract
PURPOSE The purpose of this educational report is to provide an overview of the present state-of-the-art PET auto-segmentation (PET-AS) algorithms and their respective validation, with an emphasis on providing the user with help in understanding the challenges and pitfalls associated with selecting and implementing a PET-AS algorithm for a particular application.
APPROACH A brief description of the different types of PET-AS algorithms is provided using a classification based on method complexity and type. The advantages and the limitations of the current PET-AS algorithms are highlighted based on current publications and existing comparison studies. A review of the available image datasets and contour evaluation metrics in terms of their applicability for establishing a standardized evaluation of PET-AS algorithms is provided. The performance requirements for the algorithms and their dependence on the application, the radiotracer used and the evaluation criteria are described and discussed. Finally, a procedure for algorithm acceptance and implementation, as well as the complementary role of manual and auto-segmentation are addressed.
FINDINGS A large number of PET-AS algorithms have been developed within the last 20 years. Many of the proposed algorithms are based on either fixed or adaptively selected thresholds. More recently, numerous papers have proposed the use of more advanced image analysis paradigms to perform semi-automated delineation of the PET images. However, the level of algorithm validation is variable and for most published algorithms is either insufficient or inconsistent which prevents recommending a single algorithm. This is compounded by the fact that realistic image configurations with low signal-to-noise ratios (SNR) and heterogeneous tracer distributions have rarely been used. Large variations in the evaluation methods used in the literature point to the need for a standardized evaluation protocol.
CONCLUSIONS Available comparison studies suggest that PET-AS algorithms relying on advanced image analysis paradigms provide generally more accurate segmentation than approaches based on PET activity thresholds, particularly for realistic configurations. However, this may not be the case for simple shape lesions in situations with a narrower range of parameters, where simpler methods may also perform well. Recent algorithms which employ some type of consensus or automatic selection between several PET-AS methods have potential to overcome the limitations of the individual methods when appropriately trained. In either case, accuracy evaluation is required for each different PET scanner and scanning and image reconstruction protocol. For the simpler, less robust approaches, adaptation to scanning conditions, tumor type, and tumor location by optimization of parameters is necessary. The results from the method evaluation stage can be used to estimate the contouring uncertainty. All PET-AS contours should be critically verified by a physician. A standard test, i.e., a benchmark dedicated to evaluating both existing and future PET-AS algorithms needs to be designed, to aid clinicians in evaluating and selecting PET-AS algorithms and to establish performance limits for their acceptance for clinical use. The initial steps toward designing and building such a standard are undertaken by the task group members.
Affiliation(s)
- Mathieu Hatt
  - INSERM, UMR 1101, LaTIM, University of Brest, IBSAM, Brest, France
- John A Lee
  - Université catholique de Louvain (IREC/MIRO) & FNRS, Brussels, 1200, Belgium
- Curtis Caldwell
  - Sunnybrook Health Sciences Center, Toronto, ON, M4N 3M5, Canada
- Wei Lu
  - Memorial Sloan Kettering Cancer Center, New York, NY, 10065, USA
- Shiva Das
  - University of North Carolina, Chapel Hill, NC, 27599, USA
- Xavier Geets
  - Université catholique de Louvain (IREC/MIRO) & FNRS, Brussels, 1200, Belgium
- Vincent Gregoire
  - Université catholique de Louvain (IREC/MIRO) & FNRS, Brussels, 1200, Belgium
- Robert Jeraj
  - University of Wisconsin, Madison, WI, 53705, USA
- Ursula Nestle
  - Universitätsklinikum Freiburg, Freiburg, 79106, Germany
- Andrei B Pugachev
  - University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- Heiko Schöder
  - Memorial Sloan Kettering Cancer Center, New York, NY, 10065, USA
- Emiliano Spezi
  - School of Engineering, Cardiff University, Cardiff, Wales, United Kingdom
- Habib Zaidi
  - Geneva University Hospital, Geneva, CH-1211, Switzerland
- Assen S Kirov
  - Memorial Sloan Kettering Cancer Center, New York, NY, 10065, USA
2
Beichel RR, Smith BJ, Bauer C, Ulrich EJ, Ahmadvand P, Budzevich MM, Gillies RJ, Goldgof D, Grkovski M, Hamarneh G, Huang Q, Kinahan PE, Laymon CM, Mountz JM, Muzi JP, Muzi M, Nehmeh S, Oborski MJ, Tan Y, Zhao B, Sunderland JJ, Buatti JM. Multi-site quality and variability analysis of 3D FDG PET segmentations based on phantom and clinical image data. Med Phys 2017; 44:479-496. [PMID: 28205306 DOI: 10.1002/mp.12041] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 07/21/2016] [Revised: 11/15/2016] [Accepted: 11/21/2016] [Indexed: 01/03/2023]
Abstract
PURPOSE Radiomics utilizes a large number of image-derived features for quantifying tumor characteristics that can in turn be correlated with response and prognosis. Unfortunately, extraction and analysis of such image-based features is subject to measurement variability and bias. The challenge for radiomics is particularly acute in Positron Emission Tomography (PET) where limited resolution, a high noise component related to the limited stochastic nature of the raw data, and the wide variety of reconstruction options confound quantitative feature metrics. Extracted feature quality is also affected by tumor segmentation methods used to define regions over which to calculate features, making it challenging to produce consistent radiomics analysis results across multiple institutions that use different segmentation algorithms in their PET image analysis. Understanding each element contributing to these inconsistencies in quantitative image feature and metric generation is paramount for ultimate utilization of these methods in multi-institutional trials and clinical oncology decision making.
METHODS To assess segmentation quality and consistency at the multi-institutional level, we conducted a study of seven institutional members of the National Cancer Institute Quantitative Imaging Network. For the study, members were asked to segment a common set of phantom PET scans acquired over a range of imaging conditions as well as a second set of head and neck cancer (HNC) PET scans. Segmentations were generated at each institution using their preferred approach. In addition, participants were asked to repeat segmentations with a time interval between initial and repeat segmentation. This procedure resulted in overall 806 phantom insert and 641 lesion segmentations. Subsequently, the volume was computed from the segmentations and compared to the corresponding reference volume by means of statistical analysis.
RESULTS On the two test sets (phantom and HNC PET scans), the performance of the seven segmentation approaches was as follows. On the phantom test set, the mean relative volume errors ranged from 29.9 to 87.8% of the ground truth reference volumes, and the repeat difference for each institution ranged from -36.4 to 39.9%. On the HNC test set, the mean relative volume error ranged from -50.5 to 701.5%, and the repeat difference for each institution ranged from -37.7 to 31.5%. In addition, performance measures per phantom insert/lesion size categories are given in the paper. On phantom data, regression analysis resulted in coefficient of variation (CV) components of 42.5% for scanners, 26.8% for institutional approaches, 21.1% for repeated segmentations, 14.3% for relative contrasts, 5.3% for count statistics (acquisition times), and 0.0% for repeated scans. Analysis showed that the CV components for approaches and repeated segmentations were significantly larger on the HNC test set with increases by 112.7% and 102.4%, respectively.
CONCLUSION Analysis results underline the importance of PET scanner reconstruction harmonization and imaging protocol standardization for quantification of lesion volumes. In addition, to enable a distributed multi-site analysis of FDG PET images, harmonization of analysis approaches and operator training in combination with highly automated segmentation methods seems to be advisable. Future work will focus on quantifying the impact of segmentation variation on radiomics system performance.
Affiliation(s)
- Reinhard R Beichel
  - Department of Electrical and Computer Engineering, The University of Iowa, Iowa City, IA, USA
  - Department of Internal Medicine, The University of Iowa, Iowa City, IA, USA
- Brian J Smith
  - Department of Biostatistics, The University of Iowa, Iowa City, IA, USA
- Christian Bauer
  - Department of Electrical and Computer Engineering, The University of Iowa, Iowa City, IA, USA
- Ethan J Ulrich
  - Department of Electrical and Computer Engineering, The University of Iowa, Iowa City, IA, USA
  - Department of Biomedical Engineering, The University of Iowa, Iowa City, IA, USA
- Payam Ahmadvand
  - School of Computing Science, Simon Fraser University, Burnaby, Canada
- Dmitry Goldgof
  - Department of Computer Science and Engineering, University of South Florida, Tampa, FL, USA
- Milan Grkovski
  - Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Ghassan Hamarneh
  - School of Computing Science, Simon Fraser University, Burnaby, Canada
- Qiao Huang
  - Department of Radiology, Columbia University Medical Center, New York, NY, USA
- Paul E Kinahan
  - Department of Radiology, University of Washington Medical Center, Seattle, WA, USA
- Charles M Laymon
  - Department of Bioengineering, University of Pittsburgh, Pittsburgh, PA, USA
  - Department of Radiology, University of Pittsburgh, Pittsburgh, PA, USA
- James M Mountz
  - Department of Radiology, University of Pittsburgh, Pittsburgh, PA, USA
- John P Muzi
  - Department of Radiology, University of Washington Medical Center, Seattle, WA, USA
- Mark Muzi
  - Department of Radiology, University of Washington Medical Center, Seattle, WA, USA
- Sadek Nehmeh
  - National Center for Cancer Care and Research, Doha, Qatar
- Matthew J Oborski
  - Department of Bioengineering, University of Pittsburgh, Pittsburgh, PA, USA
- Yongqiang Tan
  - Department of Radiology, Columbia University Medical Center, New York, NY, USA
- Binsheng Zhao
  - Department of Radiology, Columbia University Medical Center, New York, NY, USA
- John M Buatti
  - Department of Radiation Oncology, The University of Iowa, Iowa City, IA, USA
4
Haack S, Tanderup K, Kallehauge JF, Mohamed SMI, Lindegaard JC, Pedersen EM, Jespersen SN. Diffusion-weighted magnetic resonance imaging during radiotherapy of locally advanced cervical cancer--treatment response assessment using different segmentation methods. Acta Oncol 2015. [PMID: 26217984 DOI: 10.3109/0284186x.2015.1062545] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Indexed: 01/23/2023]
Abstract
BACKGROUND Diffusion-weighted magnetic resonance imaging (DW-MRI) and the derived apparent diffusion coefficient (ADC) value has potential for monitoring tumor response to radiotherapy (RT). Method used for segmentation of volumes with reduced diffusion will influence both volume size and observed distribution of ADC values. This study evaluates: 1) different segmentation methods; and 2) how they affect assessment of tumor ADC value during RT.
MATERIAL AND METHODS Eleven patients with locally advanced cervical cancer underwent MRI three times during their RT: prior to start of RT (PRERT), two weeks into external beam RT (WK2RT) and one week prior to brachytherapy (PREBT). Volumes on DW-MRI were segmented using three semi-automatic segmentation methods: "cluster analysis", "relative signal intensity (SD4)" and "region growing". Segmented volumes were compared to the gross tumor volume (GTV) identified on T2-weighted MR images using the Jaccard similarity index (JSI). ADC values from segmented volumes were compared and changes of ADC values during therapy were evaluated.
RESULTS Significant difference between the four volumes (GTV, DWI_cluster, DWI_SD4 and DWI_region) was found (p < 0.01), and the volumes changed significantly during treatment (p < 0.01). There was a significant difference in JSI among segmentation methods at time of PRERT (p < 0.016) with region growing having the lowest JSI_GTV (mean ± sd: 0.35 ± 0.1), followed by the SD4 method (mean ± sd: 0.50 ± 0.1) and clustering (mean ± sd: 0.52 ± 0.3). There was no significant difference in mean ADC value compared at same treatment time. Mean tumor ADC value increased significantly (p < 0.01) for all methods across treatment time.
CONCLUSION Among the three semi-automatic segmentations of hyper-intense intensities on DW-MR images implemented, cluster analysis and relative signal thresholding had the greatest similarity to the clinical tumor volume. Evaluation of mean ADC value did not depend on segmentation method.
Affiliation(s)
- Søren Haack
  - Department of Clinical Engineering, Aarhus University Hospital, Aarhus, Denmark
  - Department of Oncology, Aarhus University Hospital, Aarhus, Denmark
- Kari Tanderup
  - Department of Oncology, Aarhus University Hospital, Aarhus, Denmark
- Sandy Mohamed Ismail Mohamed
  - Department of Oncology, Aarhus University Hospital, Aarhus, Denmark
  - Department of Radiotherapy, National Cancer Institute, Cairo University, Cairo, Egypt
- Sune Nørhøj Jespersen
  - CFIN/MindLab, Aarhus University, Aarhus, Denmark
  - Department of Physics and Astronomy, Aarhus University, Aarhus, Denmark