Johns WL, Martinazzi BJ, Miltenberg B, Nam HH, Hammoud S. ChatGPT Provides Unsatisfactory Responses to Frequently Asked Questions Regarding Anterior Cruciate Ligament Reconstruction. Arthroscopy 2024;40:2067-2079.e1. [PMID: 38311261 DOI: 10.1016/j.arthro.2024.01.017]
[Received: 08/23/2023] [Revised: 01/01/2024] [Accepted: 01/08/2024] [Indexed: 02/10/2024]
Abstract
PURPOSE
To determine whether the free online artificial intelligence platform ChatGPT could accurately, adequately, and appropriately answer questions regarding anterior cruciate ligament (ACL) reconstruction surgery.
METHODS
A list of 10 questions about ACL surgery was created based on a review of frequently asked questions that appeared on websites of various orthopaedic institutions. Each question was separately entered into ChatGPT (version 3.5), and responses were recorded, scored, and graded independently by 3 authors. The reading level of the ChatGPT response was calculated using the WordCalc software package, and readability was assessed using the Flesch-Kincaid grade level, Simple Measure of Gobbledygook index, Coleman-Liau index, Gunning fog index, and automated readability index.
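The readability indices named above are simple closed-form formulas over word, sentence, and syllable counts. As a minimal illustration of the kind of calculation the WordCalc package performs, the sketch below computes the Flesch-Kincaid grade level (0.39 x words/sentence + 11.8 x syllables/word - 15.59) with a heuristic syllable counter; the function names and the vowel-group heuristic are illustrative assumptions, not the study's actual implementation.

```python
import re


def count_syllables(word: str) -> int:
    # Heuristic: count groups of consecutive vowels, then
    # subtract one for a trailing silent "e". Real syllable
    # counters (and WordCalc) use dictionary-based methods.
    word = word.lower()
    groups = re.findall(r"[aeiouy]+", word)
    n = len(groups)
    if word.endswith("e") and n > 1:
        n -= 1
    return max(n, 1)


def flesch_kincaid_grade(text: str) -> float:
    # FK grade = 0.39*(words/sentences)
    #          + 11.8*(syllables/words) - 15.59
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)
```

A score near 13, as reported below, corresponds to roughly a college-sophomore reading level; the other indices (SMOG, Coleman-Liau, Gunning fog, ARI) follow the same pattern with different word- and sentence-length weightings.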
RESULTS
Of the 10 frequently asked questions entered into ChatGPT, 6 responses were deemed unsatisfactory, requiring substantial clarification; 1 was adequate, requiring moderate clarification; 1 was adequate, requiring minor clarification; and 2 were satisfactory, requiring minimal clarification. The mean DISCERN score was 41 (inter-rater reliability, 0.721), indicating that the responses were of average quality. According to the readability assessments, a full understanding of the ChatGPT responses required 13.4 years of education, corresponding to the reading level of a college sophomore.
CONCLUSIONS
Most of the ChatGPT-generated responses were outdated and failed to provide an adequate foundation for patients' understanding regarding their injury and treatment options. The reading level required to understand the responses was too advanced for some patients, leading to potential misunderstanding and misinterpretation of information. ChatGPT lacks the ability to differentiate and prioritize information that is presented to patients.
CLINICAL RELEVANCE
Recognizing the shortcomings in artificial intelligence platforms may equip surgeons to better set expectations and provide support for patients considering and preparing for ACL reconstruction.