Silva TP, Andrade-Bortoletto MFS, Ocampo TSC, Alencar-Palha C, Bornstein MM, Oliveira-Santos C, Oliveira ML. Performance of a commercially available Generative Pre-trained Transformer (GPT) in describing radiolucent lesions in panoramic radiographs and establishing differential diagnoses.
Clin Oral Investig 2024; 28:204. [PMID: 38459362 PMCID: PMC10924032 DOI: 10.1007/s00784-024-05587-5]
[Received: 01/11/2024] [Accepted: 02/25/2024] [Indexed: 03/10/2024]
Abstract
OBJECTIVES
To evaluate the performance of a commercially available Generative Pre-trained Transformer (GPT) in describing and establishing differential diagnoses for radiolucent lesions in panoramic radiographs.
MATERIALS AND METHODS
Twenty-eight panoramic radiographs, each containing a single radiolucent lesion, were evaluated in consensus by three examiners and by a commercially available ChatGPT-3.5 model. Both provided descriptions of internal structure (radiodensity, loculation), periphery (margin type, cortication), shape, location (bone, side, region, teeth/structures), and effects on adjacent structures (effect, adjacent structure). Diagnostic impressions related to origin, behavior, and nature were also provided. The GPT program was additionally prompted to provide differential diagnoses. Keywords used by the GPT program were compared to those used by the examiners and scored as 0 (incorrect), 0.5 (partially correct), or 1 (correct). The mean score and standard deviation were calculated for each description. Performance in establishing differential diagnoses was assessed using Rank-1, -2, and -3 accuracy.
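The scoring scheme described above can be illustrated with a minimal Python sketch. It is not the authors' analysis code. The scores, diagnoses, and function names below are hypothetical. The sketch also assumes that Rank-k counts a case as correct when the reference diagnosis appears among the first k differential diagnoses, and that the standard deviation is the sample standard deviation; the abstract states neither explicitly.

```python
from statistics import mean, stdev

# Hypothetical keyword scores for one descriptor across five radiographs:
# 0 = incorrect, 0.5 = partially correct, 1 = correct.
margin_scores = [1, 1, 0.5, 1, 0]


def summarize(scores):
    """Return (mean, sample standard deviation) of a list of scores."""
    return mean(scores), stdev(scores)


def rank_k_accuracy(ranked_lists, truths, k):
    """Fraction of cases whose reference diagnosis is among the top k.

    Assumption: Rank-k is cumulative (top-k), so accuracy cannot
    decrease as k grows.
    """
    hits = sum(truth in ranked[:k] for ranked, truth in zip(ranked_lists, truths))
    return hits / len(truths)


# Hypothetical differential diagnoses (most likely first) for four cases,
# with the hypothetical reference diagnosis for each case.
differentials = [
    ["dentigerous cyst", "ameloblastoma", "odontogenic keratocyst"],
    ["radicular cyst", "odontogenic keratocyst", "ameloblastoma"],
    ["ameloblastoma", "radicular cyst", "simple bone cyst"],
    ["radicular cyst", "dentigerous cyst", "ameloblastoma"],
]
reference = [
    "dentigerous cyst",
    "odontogenic keratocyst",
    "simple bone cyst",
    "central giant cell granuloma",
]

if __name__ == "__main__":
    avg, sd = summarize(margin_scores)
    print(f"margin: mean={avg:.2f}, sd={sd:.2f}")
    for k in (1, 2, 3):
        print(f"Rank-{k} accuracy: {rank_k_accuracy(differentials, reference, k):.2%}")
```

Under the top-k reading, the percentage can only rise from Rank-1 to Rank-3, which matches the pattern reported in the results.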
RESULTS
Descriptions of margination, affected bone, and origin received the highest scores: 0.93, 0.93, and 0.87, respectively. Shape, region, teeth/structures, effect, affected region, and nature received considerably lower scores, ranging from 0.22 to 0.50. Rank-1, -2, and -3 demonstrated accuracy in 25%, 57.14%, and 67.85% of cases, respectively.
CONCLUSION
The performance of the GPT program in describing and providing differential diagnoses for radiolucent lesions in panoramic radiographs is variable and at this stage limited in its use for clinical application.
CLINICAL RELEVANCE
Understanding the potential role of GPT systems as an auxiliary tool in image interpretation is imperative to validate their clinical applicability.