Koganezawa AS, Matsuura T, Kawahara D, Nakashima T, Shiba E, Murakami Y, Nagata Y. Unbiased evaluation of predicted gamma passing rate by an event-mixing technique.
Med Phys 2024;51:5-17. [PMID: 38009570 DOI: 10.1002/mp.16848]
[Received: 06/21/2023] [Revised: 10/31/2023] [Accepted: 10/31/2023] [Indexed: 11/29/2023]
Abstract
BACKGROUND
Models that predict the gamma passing rate (GPR) have been studied as a substitute for measurement-based gamma analysis. Because these studies used data from different radiotherapy systems, each comprising a treatment planning system (TPS), a linear accelerator, and a detector array, it has been difficult to compare the performance of prediction models across institutions with different radiotherapy systems.
PURPOSE
We aimed to develop unbiased scoring methods for evaluating the performance of GPR prediction models by introducing both best and worst limits on the performance of GPR prediction.
METHODS
Two hundred head-and-neck VMAT plans were used to develop the framework. The GPRs were measured using the ArcCHECK device. The predicted GPR [$p$] was generated using a deep learning (DL)-based model [$p_{\mathrm{DL}}$]. The prediction model was evaluated using four metrics: standard deviation (SD) [$\sigma$], Pearson's correlation coefficient (CC) [$r$], mean squared error (MSE) [$s$], and mean absolute error (MAE) [$a$]. The best limit [$\sigma_m$, $r_m$, $s_m$, and $a_m$] was estimated from the SD of the measured GPR [$m$], obtained by shifting the device along the longitudinal direction to measure different sampling points. Mimicked best and worst predictions [$p_{\mathrm{best}}$ and $p_{\mathrm{worst}}$] were generated from $p_{\mathrm{DL}}$. The worst limit was defined as the case in which $m$ and $p$ are uncorrelated (CC $\sim 0$); this limit [$\sigma_{\mathrm{Mix}}$, $r_{\mathrm{Mix}}$, $s_{\mathrm{Mix}}$, and $a_{\mathrm{Mix}}$] was generated using the event-mixing (EM) technique originally introduced in high-energy physics experiments. The ranges of $\sigma$, $r$, $s$, and $a$ were defined as $[\sigma_m, \sigma_{\mathrm{Mix}}]$, $[0, r_m]$, $[s_m, s_{\mathrm{Mix}}]$, and $[a_m, a_{\mathrm{Mix}}]$, respectively. The achievement score (AS) was calculated independently from each of $\sigma$, $r$, $s$, and $a$ for $p_{\mathrm{DL}}$, $p_{\mathrm{best}}$, and $p_{\mathrm{worst}}$. The probability that $p$ fails the gamma analysis (alert frequency; AF) was estimated as a function of $\sigma_d$ within the $[\sigma_m, \sigma_{\mathrm{Mix}}]$ range for the 3%/2 mm data with a 95% criterion.
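The four metrics, the event-mixing worst limit, and a score on the [best, worst] range can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' implementation: $\sigma$ is assumed to be the SD of the residual $p - m$; event mixing is implemented here by pairing each measured GPR with the predictions of every other plan; and `achievement_score` is a hypothetical linear normalization (1 at the best limit, 0 at the worst), since the abstract does not give the exact AS formula. All function names and the synthetic data are our own.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def metrics(m, p):
    """Return (sigma, r, s, a) for measured GPRs m and predicted GPRs p.

    Assumption: sigma is the sample SD of the residual p - m.
    """
    d = [pi - mi for mi, pi in zip(m, p)]
    n = len(d)
    mean_d = sum(d) / n
    sigma = math.sqrt(sum((x - mean_d) ** 2 for x in d) / (n - 1))
    s = sum(x * x for x in d) / n   # mean squared error
    a = sum(abs(x) for x in d) / n  # mean absolute error
    return sigma, pearson_r(m, p), s, a


def event_mix(m, p):
    """Event mixing (assumed variant): pair each measured GPR with the
    predictions of all *other* plans, removing plan-by-plan correlation."""
    mixed_m, mixed_p = [], []
    for i, mi in enumerate(m):
        for j, pj in enumerate(p):
            if i != j:
                mixed_m.append(mi)
                mixed_p.append(pj)
    return mixed_m, mixed_p


def achievement_score(value, best, worst):
    """Hypothetical linear score: 1.0 at the best limit, 0.0 at the worst.

    Works for lower-is-better metrics (sigma, s, a) and for r,
    whose range is [0, r_m] (best = r_m, worst = 0).
    """
    return (worst - value) / (worst - best)


if __name__ == "__main__":
    random.seed(0)
    # Synthetic, illustrative data only; not the study's measurements.
    m = [random.uniform(90.0, 100.0) for _ in range(200)]
    p = [mi + random.gauss(0.0, 1.5) for mi in m]

    sigma, r, s, a = metrics(m, p)
    sigma_mix, r_mix, s_mix, a_mix = metrics(*event_mix(m, p))
    print(f"model: sigma={sigma:.2f} r={r:.3f} MSE={s:.2f} MAE={a:.2f}")
    print(f"mixed: sigma={sigma_mix:.2f} r={r_mix:.4f} MSE={s_mix:.2f} MAE={a_mix:.2f}")

    # Hypothetical best limit sigma_m = 0.5 (percentage points).
    print(f"AS(sigma) = {achievement_score(sigma, 0.5, sigma_mix):.2f}")
```

With all off-diagonal pairings, the mixed correlation works out to exactly $-r/(n-1)$ for $n$ plans, so for 200 plans it is effectively zero, which is the "no correlation" condition the worst limit requires. A random-pairing variant would give a similar result with sampling noise.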
RESULTS
The SDs of the best limit were well reproduced by $\sigma_m = 0.531\sqrt{100 - m}$. The EM technique successfully generated $(m, p)$ pairs with no correlation. The ASs based on the four metrics agreed well, indicating successful definitions of both the best and worst limits, consistent definitions of the AS, and successful generation of mixed events. The AF for the DL-based model with the 3%/2 mm tolerance was 31.5% and 63.0% at CLs of 99% and 99.9%, respectively.
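The fitted best-limit relation can be evaluated directly, which makes its behavior concrete: the expected measurement spread shrinks toward zero as the measured GPR approaches 100%. This is a plain transcription of the formula above; the function name `sigma_m` and the range check are our own additions, and the coefficient 0.531 is the value reported for this study's ArcCHECK measurements, so it should not be assumed to carry over to other systems.

```python
import math


def sigma_m(gpr_measured: float) -> float:
    """Best-limit SD of the GPR (percentage points) from the reported fit
    sigma_m = 0.531 * sqrt(100 - m), with m the measured GPR in percent."""
    if not 0.0 <= gpr_measured <= 100.0:
        raise ValueError("GPR must be a percentage in [0, 100]")
    return 0.531 * math.sqrt(100.0 - gpr_measured)


for m in (99.0, 96.0, 91.0):
    print(f"m = {m:5.1f}%  ->  sigma_m = {sigma_m(m):.3f}")
# m =  99.0%  ->  sigma_m = 0.531
# m =  96.0%  ->  sigma_m = 1.062
# m =  91.0%  ->  sigma_m = 1.593
```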
CONCLUSION
We developed the AS to evaluate GPR prediction models in an unbiased manner by excluding the effects of the precision of the radiotherapy system and the spread of the GPR. The best and worst limits of the GPR prediction were successfully generated using the measured precision of the GPR and the EM technique, respectively. The AS and $\sigma_p$ are expected to enable objective evaluation of prediction models and the setting of an exact achievement goal for the precision of the predicted GPR.