Payne DL, Purohit K, Borrero WM, Chung K, Hao M, Mpoy M, Jin M, Prasanna P, Hill V. Performance of GPT-4 on the American College of Radiology In-training Examination: Evaluating Accuracy, Model Drift, and Fine-tuning. Acad Radiol 2024; 31:3046-3054. [PMID: 38653599 DOI: 10.1016/j.acra.2024.04.006] [Received: 02/28/2024] [Revised: 04/01/2024] [Accepted: 04/06/2024] [Indexed: 04/25/2024]
Abstract
RATIONALE AND OBJECTIVES: In our study, we evaluate GPT-4's performance on the American College of Radiology (ACR) 2022 Diagnostic Radiology In-Training Examination (DXIT). We perform multiple experiments across time points to assess for model drift, as well as after fine-tuning to assess for differences in accuracy.
MATERIALS AND METHODS: Questions were sequentially input into GPT-4 with a standardized prompt. Each answer was recorded, and overall accuracy, logic-adjusted accuracy, and accuracy on image-based questions were calculated. This experiment was repeated several months later to assess for model drift, then again after fine-tuning to assess for changes in GPT-4's performance.
RESULTS: GPT-4 achieved 58.5% overall accuracy, lower than the PGY-3 average (61.9%) but higher than the PGY-2 average (52.8%). Adjusted accuracy was 52.8%. GPT-4 showed significantly higher (p = 0.012) confidence for correct answers (87.1%) than for incorrect answers (84.0%). Performance on image-based questions was significantly poorer (p < 0.001) at 45.4% compared to text-only questions (80.0%), with adjusted accuracy for image-based questions of 36.4%. When the questions were repeated, GPT-4 chose a different answer 25.5% of the time and there was no change in accuracy. Fine-tuning did not improve accuracy.
CONCLUSION: GPT-4 performed between PGY-2 and PGY-3 levels on the 2022 DXIT, significantly poorer on image-based questions, and with large variability in answer choices across time points. Exploratory experiments in fine-tuning did not improve performance. This study underscores the potential and risks of using minimally prompted general AI models in interpreting radiologic images as a diagnostic tool. Implementers of general AI radiology systems should exercise caution given the possibility of spurious yet confident responses.
Affiliation(s)
- David L Payne
  - Stony Brook University Hospital Department of Radiology, 101 Nicolls Road, Stony Brook, New York 11794, USA (D.L.P., K.P., W.M.B., K.C., M.H., M.M., M.J.); Stony Brook University Department of Biomedical Informatics, 1 Lauterbur Drive, Stony Brook, New York 11794, USA (D.L.P., P.P.).
- Kush Purohit
  - Stony Brook University Hospital Department of Radiology, 101 Nicolls Road, Stony Brook, New York 11794, USA (D.L.P., K.P., W.M.B., K.C., M.H., M.M., M.J.)
- Walter Morales Borrero
  - Stony Brook University Hospital Department of Radiology, 101 Nicolls Road, Stony Brook, New York 11794, USA (D.L.P., K.P., W.M.B., K.C., M.H., M.M., M.J.)
- Katherine Chung
  - Stony Brook University Hospital Department of Radiology, 101 Nicolls Road, Stony Brook, New York 11794, USA (D.L.P., K.P., W.M.B., K.C., M.H., M.M., M.J.)
- Max Hao
  - Stony Brook University Hospital Department of Radiology, 101 Nicolls Road, Stony Brook, New York 11794, USA (D.L.P., K.P., W.M.B., K.C., M.H., M.M., M.J.)
- Mutshipay Mpoy
  - Stony Brook University Hospital Department of Radiology, 101 Nicolls Road, Stony Brook, New York 11794, USA (D.L.P., K.P., W.M.B., K.C., M.H., M.M., M.J.)
- Michael Jin
  - Stony Brook University Hospital Department of Radiology, 101 Nicolls Road, Stony Brook, New York 11794, USA (D.L.P., K.P., W.M.B., K.C., M.H., M.M., M.J.)
- Prateek Prasanna
  - Stony Brook University Department of Biomedical Informatics, 1 Lauterbur Drive, Stony Brook, New York 11794, USA (D.L.P., P.P.)
- Virginia Hill
  - Northwestern University Feinberg School of Medicine Department of Radiology, 676 North Clair Street, Chicago, Illinois 60611, USA (V.H.)