1
Mahedia M, Rohrich RN, Sadiq KO, Bailey L, Harrison LM, Hallac RR. Exploring the Utility of ChatGPT in Cleft Lip Repair Education. J Clin Med 2025; 14:993. [PMID: 39941663] [PMCID: PMC11818196] [DOI: 10.3390/jcm14030993]
Abstract
Background/Objectives: The evolving capabilities of large language models, such as generative pre-trained transformers (ChatGPT), offer new avenues for disseminating health information online. These models, trained on extensive datasets, are designed to deliver customized responses to user queries. However, as these outputs are unsupervised, understanding their quality and accuracy is essential to gauge their reliability for potential applications in healthcare. This study evaluates responses generated by ChatGPT addressing common patient concerns and questions about cleft lip repair. Methods: Ten commonly asked questions about cleft lip repair procedures were selected from the American Society of Plastic Surgeons' patient information resources. These questions were input as ChatGPT prompts, and five board-certified plastic surgeons assessed the generated responses on quality of content, clarity, relevance, and trustworthiness using a 4-point Likert scale. Readability was evaluated using the Flesch reading ease score (FRES) and the Flesch-Kincaid grade level (FKGL). Results: ChatGPT responses scored an aggregated mean rating of 2.9 out of 4 across all evaluation criteria. Clarity and content quality received the highest ratings (3.1 ± 0.6), while trustworthiness had the lowest rating (2.7 ± 0.6). Readability metrics revealed a mean FRES of 44.35 and an FKGL of 10.87, corresponding to approximately a 10th-grade reading level. None of the responses contained grossly inaccurate or potentially harmful medical information, although all of them lacked citations. Conclusions: ChatGPT demonstrates potential as a supplementary tool for patient education in cleft lip management by delivering generally accurate, relevant, and understandable information. Despite the value that AI-powered tools can provide to clinicians and patients, the lack of human oversight underscores the importance of user awareness regarding their limitations.
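For context on the readability metrics above, the standard published Flesch formulas (general definitions, not reported in the article itself) are: FRES = 206.835 - 1.015 × (total words / total sentences) - 84.6 × (total syllables / total words), and FKGL = 0.39 × (total words / total sentences) + 11.8 × (total syllables / total words) - 15.59. Higher FRES values indicate easier reading, while FKGL approximates the U.S. school grade level needed to understand the text.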
Affiliation(s)
- Monali Mahedia
- Department of Surgery, Rutgers University—NJMS, Newark, NJ 07103, USA
- Rachel N. Rohrich
- Department of Plastic and Reconstructive Surgery, MedStar Georgetown University Hospital, Washington, DC 20007, USA
- Lauren Bailey
- Department of Plastic Surgery, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Lucas M. Harrison
- Department of Plastic Surgery, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Rami R. Hallac
- Department of Plastic Surgery, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Analytical Imaging and Modeling Center, Children’s Health, Dallas, TX 75235, USA
2
Shiraishi M, Sowa Y, Tomita K, Terao Y, Satake T, Muto M, Morita Y, Higai S, Toyohara Y, Kurokawa Y, Sunaga A, Okazaki M. Performance of Artificial Intelligence Chatbots in Answering Clinical Questions on Japanese Practical Guidelines for Implant-based Breast Reconstruction. Aesthetic Plast Surg 2024. [PMID: 39592492] [DOI: 10.1007/s00266-024-04515-y]
Abstract
BACKGROUND Artificial intelligence (AI) chatbots, including ChatGPT-4 (GPT-4) and Grok-1 (Grok), have been shown to be potentially useful in several medical fields but have not been examined in plastic and aesthetic surgery. The aim of this study was to evaluate the responses of these AI chatbots to clinical questions (CQs) related to the guidelines for implant-based breast reconstruction (IBBR) published by the Japan Society of Plastic and Reconstructive Surgery (JSPRS) in 2021. METHODS CQs in the JSPRS guidelines were used as question sources. Responses from two AI chatbots, GPT-4 and Grok, were evaluated for accuracy, informativeness, and readability by five Japanese board-certified breast reconstruction specialists and five Japanese clinical fellows of plastic surgery. RESULTS GPT-4 significantly outperformed Grok in terms of accuracy (p < 0.001), informativeness (p < 0.001), and readability (p < 0.001) when evaluated by plastic surgery fellows. Compared to the original guidelines, Grok scored significantly lower in all three areas (all p < 0.001). The accuracy of GPT-4 was rated significantly higher by plastic surgery fellows than by breast reconstruction specialists (p = 0.012), whereas there was no significant difference between these scores for Grok. CONCLUSIONS The study suggests that GPT-4 has the potential to assist in interpreting and applying clinical guidelines for IBBR, but, importantly, there is still a risk that AI chatbots can misinform. Further studies are needed to understand the broader role of current and future AI chatbots in breast reconstruction surgery. LEVEL OF EVIDENCE IV This journal requires that authors assign a level of evidence to each article. For a full description of these Evidence-Based Medicine Ratings, please refer to the Table of Contents or online Instructions to Authors www.springer.com/00266 .
Affiliation(s)
- Makoto Shiraishi
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, Tokyo, Japan
- Yoshihiro Sowa
- Department of Plastic Surgery, Jichi Medical University, Yakushiji, Shimotsuke, Tochigi, Japan.
- Koichi Tomita
- Department of Plastic and Reconstructive Surgery, Kindai University, Osaka, Japan
- Yasunobu Terao
- Department of Plastic and Reconstructive Surgery, Tokyo Metropolitan Cancer and Infectious Diseases Center, Komagome Hospital, Tokyo, Japan
- Toshihiko Satake
- Department of Plastic, Reconstructive and Aesthetic Surgery, Toyama University Hospital, Toyama, Japan
- Mayu Muto
- Department of Plastic, Reconstructive and Aesthetic Surgery, Toyama University Hospital, Toyama, Japan
- Lala Breast Reconstruction Clinic Yokohama, Yokohama, Japan
- Department of Plastic Surgery, Yokohama City University Medical Center, Yokohama, Japan
- Yuhei Morita
- Department of Plastic Surgery, Jichi Medical University, Yakushiji, Shimotsuke, Tochigi, Japan
- Japanese Red Cross Koga Hospital, Koga, Japan
- Shino Higai
- Department of Plastic Surgery, Jichi Medical University, Yakushiji, Shimotsuke, Tochigi, Japan
- Yoshihiro Toyohara
- Department of Plastic Surgery, Jichi Medical University, Yakushiji, Shimotsuke, Tochigi, Japan
- Yasue Kurokawa
- Department of Plastic Surgery, Jichi Medical University, Yakushiji, Shimotsuke, Tochigi, Japan
- Ataru Sunaga
- Department of Plastic Surgery, Jichi Medical University, Yakushiji, Shimotsuke, Tochigi, Japan
- Mutsumi Okazaki
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, Tokyo, Japan
3
Madaudo C, Parlati ALM, Di Lisi D, Carluccio R, Sucato V, Vadalà G, Nardi E, Macaione F, Cannata A, Manzullo N, Santoro C, Iervolino A, D'Angelo F, Marzano F, Basile C, Gargiulo P, Corrado E, Paolillo S, Novo G, Galassi AR, Filardi PP. Artificial intelligence in cardiology: a peek at the future and the role of ChatGPT in cardiology practice. J Cardiovasc Med (Hagerstown) 2024; 25:766-771. [PMID: 39347723] [DOI: 10.2459/jcm.0000000000001664]
Abstract
Artificial intelligence has increasingly become an integral part of our daily activities. ChatGPT, a natural language processing technology developed by OpenAI, is widely used in various industries, including healthcare. The application of ChatGPT in healthcare is still evolving, with studies exploring its potential in clinical decision-making, patient education, workflow optimization, and scientific literature. ChatGPT could be exploited in the medical field to improve patient education and information, thus increasing compliance. ChatGPT could facilitate information exchange on major cardiovascular diseases, provide clinical decision support, and improve patient communication and education. It could assist the clinician in differential diagnosis, suggest appropriate imaging modalities, and optimize treatment plans based on evidence-based guidelines. However, it is unclear whether it will be possible to use ChatGPT for the management of patients who require rapid decisions. Indeed, many drawbacks are associated with the daily use of these technologies in the medical field, such as insufficient expertise in specialized fields and a lack of comprehension of the context in which they work. The pros and cons of its use have been explored in this review, which was not written with the help of ChatGPT.
Affiliation(s)
- Cristina Madaudo
- Department of Health Promotion, Mother and Child Care, Internal Medicine and Medical Specialties, Cardiology Unit, University of Palermo, University Hospital P. Giaccone, Palermo
- Department of Cardiovascular Sciences, British Heart Foundation Centre of Research Excellence, School of Cardiovascular Medicine, Faculty of Life Sciences and Medicine, King's College London, The James Black Centre, 125 Coldharbour Lane, London, UK
- Antonio Luca Maria Parlati
- Department of Advanced Biomedical Sciences, University of Naples Federico II, Naples, Italy
- Department of Cardiovascular Sciences, British Heart Foundation Centre of Research Excellence, School of Cardiovascular Medicine, Faculty of Life Sciences and Medicine, King's College London, The James Black Centre, 125 Coldharbour Lane, London, UK
- Daniela Di Lisi
- Department of Health Promotion, Mother and Child Care, Internal Medicine and Medical Specialties, Cardiology Unit, University of Palermo, University Hospital P. Giaccone, Palermo
- Raffaele Carluccio
- Department of Advanced Biomedical Sciences, University of Naples Federico II, Naples, Italy
- Vincenzo Sucato
- Department of Health Promotion, Mother and Child Care, Internal Medicine and Medical Specialties, Cardiology Unit, University of Palermo, University Hospital P. Giaccone, Palermo
- Giuseppe Vadalà
- Department of Health Promotion, Mother and Child Care, Internal Medicine and Medical Specialties, Cardiology Unit, University of Palermo, University Hospital P. Giaccone, Palermo
- Ermanno Nardi
- Department of Advanced Biomedical Sciences, University of Naples Federico II, Naples, Italy
- Francesca Macaione
- Department of Health Promotion, Mother and Child Care, Internal Medicine and Medical Specialties, Cardiology Unit, University of Palermo, University Hospital P. Giaccone, Palermo
- Antonio Cannata
- Department of Cardiovascular Sciences, British Heart Foundation Centre of Research Excellence, School of Cardiovascular Medicine, Faculty of Life Sciences and Medicine, King's College London, The James Black Centre, 125 Coldharbour Lane, London, UK
- Nilla Manzullo
- Department of Health Promotion, Mother and Child Care, Internal Medicine and Medical Specialties, Cardiology Unit, University of Palermo, University Hospital P. Giaccone, Palermo
- Ciro Santoro
- Department of Advanced Biomedical Sciences, University of Naples Federico II, Naples, Italy
- Adelaide Iervolino
- Department of Clinical Medicine and Surgery, University of Naples Federico II, Naples, Italy
- Federica D'Angelo
- Department of Health Promotion, Mother and Child Care, Internal Medicine and Medical Specialties, Cardiology Unit, University of Palermo, University Hospital P. Giaccone, Palermo
- Federica Marzano
- Department of Advanced Biomedical Sciences, University of Naples Federico II, Naples, Italy
- Christian Basile
- Department of Advanced Biomedical Sciences, University of Naples Federico II, Naples, Italy
- Paola Gargiulo
- Department of Advanced Biomedical Sciences, University of Naples Federico II, Naples, Italy
- Egle Corrado
- Department of Health Promotion, Mother and Child Care, Internal Medicine and Medical Specialties, Cardiology Unit, University of Palermo, University Hospital P. Giaccone, Palermo
- Stefania Paolillo
- Department of Advanced Biomedical Sciences, University of Naples Federico II, Naples, Italy
- Giuseppina Novo
- Department of Health Promotion, Mother and Child Care, Internal Medicine and Medical Specialties, Cardiology Unit, University of Palermo, University Hospital P. Giaccone, Palermo
- Alfredo Ruggero Galassi
- Department of Health Promotion, Mother and Child Care, Internal Medicine and Medical Specialties, Cardiology Unit, University of Palermo, University Hospital P. Giaccone, Palermo
4
Aydin S, Karabacak M, Vlachos V, Margetis K. Large language models in patient education: a scoping review of applications in medicine. Front Med (Lausanne) 2024; 11:1477898. [PMID: 39534227] [PMCID: PMC11554522] [DOI: 10.3389/fmed.2024.1477898]
Abstract
Introduction Large Language Models (LLMs) are sophisticated algorithms that analyze and generate vast amounts of textual data, mimicking human communication. Notable LLMs include GPT-4o by OpenAI, Claude 3.5 Sonnet by Anthropic, and Gemini by Google. This scoping review aims to synthesize the current applications and potential uses of LLMs in patient education and engagement. Materials and methods Following the PRISMA-ScR checklist and methodologies by Arksey, O'Malley, and Levac, we conducted a scoping review. We searched PubMed in June 2024, using keywords and MeSH terms related to LLMs and patient education. Two authors conducted the initial screening, and discrepancies were resolved by consensus. We employed thematic analysis to address our primary research question. Results The review identified 201 studies, predominantly from the United States (58.2%). Six themes emerged: generating patient education materials, interpreting medical information, providing lifestyle recommendations, supporting customized medication use, offering perioperative care instructions, and optimizing doctor-patient interaction. LLMs were found to provide accurate responses to patient queries, enhance existing educational materials, and translate medical information into patient-friendly language. However, challenges such as readability, accuracy, and potential biases were noted. Discussion LLMs demonstrate significant potential in patient education and engagement by creating accessible educational materials, interpreting complex medical information, and enhancing communication between patients and healthcare providers. Nonetheless, issues related to the accuracy and readability of LLM-generated content, as well as ethical concerns, require further research and development. Future studies should focus on improving LLMs and ensuring content reliability while addressing ethical considerations.
Affiliation(s)
- Serhat Aydin
- School of Medicine, Koç University, Istanbul, Türkiye
- Mert Karabacak
- Department of Neurosurgery, Mount Sinai Health System, New York, NY, United States
- Victoria Vlachos
- College of Human Ecology, Cornell University, Ithaca, NY, United States
5
Mert M, Vahabi A, Daştan AE, Kuyucu A, Ünal YC, Tezgel O, Öztürk AM, Taşbakan M, Aktuğlu K. Artificial intelligence's suggestions for level of amputation in diabetic foot ulcers are highly correlated with those of clinicians, only with exception of hindfoot amputations. Int Wound J 2024; 21:e70055. [PMID: 39353602] [PMCID: PMC11444738] [DOI: 10.1111/iwj.70055]
Abstract
Diabetic foot ulcers (DFUs) are a growing public health problem, paralleling the increasing incidence of diabetes. While prevention is the most effective treatment for DFUs, selecting the optimal treatment in cases with established DFUs remains a challenge. Health sciences have greatly benefited from the integration of artificial intelligence (AI) applications across various fields. Regarding amputations in DFUs, both the literature and clinical practice have mainly focused on strategies to prevent amputation and identify avoidable risk factors. However, there are very limited data on assistive parameters/tools that can be used to determine the level of amputation. This study investigated how well ChatGPT, with its recently released version 4o, matches the amputation level selection of an experienced team in this field. For this purpose, clinical photographs from patients who underwent amputations due to diabetic foot ulcers between May 2023 and May 2024 were submitted to the ChatGPT-4o program. The AI was tasked with recommending an appropriate amputation level based on these clinical photographs. Data from a total of 60 patients were analysed, with a median age of 64.5 years (range: 41-91). According to the Wagner Classification, 32 patients (53.3%) had grade 4 ulcers, 16 patients (26.6%) had grade 5 ulcers, 10 patients (16.6%) had grade 3 ulcers and 2 patients (3.3%) had grade 2 ulcers. A one-to-one correspondence between the AI tool's recommended amputation level and the level actually performed was observed in 50 out of 60 cases (83.3%). In the remaining 10 cases, discrepancies were noted, with the AI consistently recommending a more proximal level of amputation than what was performed. The inter-rater agreement analysis between the actual surgeries and the AI tool's recommendations yielded a Cohen's kappa coefficient of 0.808 (SD: 0.055, 95% CI: 0.701-0.916), indicating substantial agreement. Relying solely on clinical photographs, ChatGPT-4o makes decisions that are largely consistent with those of an experienced team in determining the optimal level of amputation for DFUs, with the exception of hindfoot amputations.
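As background for the agreement statistic above, Cohen's kappa is computed as κ = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement between raters and p_e is the proportion of agreement expected by chance (standard definition, not specific to this study); values closer to 1 indicate stronger agreement beyond chance.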
Affiliation(s)
- Merve Mert
- Department of Orthopedics and Traumatology, Ege University School of Medicine, Izmir, Turkey
- Department of Infectious Diseases and Clinical Microbiology, Ege University School of Medicine, Izmir, Turkey
- Arman Vahabi
- Department of Orthopedics and Traumatology, Ege University School of Medicine, Izmir, Turkey
- Ali Engin Daştan
- Department of Orthopedics and Traumatology, Ege University School of Medicine, Izmir, Turkey
- Abdussamet Kuyucu
- Department of Orthopedics and Traumatology, Ege University School of Medicine, Izmir, Turkey
- Department of Infectious Diseases and Clinical Microbiology, Ege University School of Medicine, Izmir, Turkey
- Yunus Can Ünal
- Department of Orthopaedics and Traumatology, Van Educational and Research Hospital, Van, Turkey
- Okan Tezgel
- Department of Orthopaedics and Traumatology, Van Educational and Research Hospital, Van, Turkey
- Anıl Murat Öztürk
- Department of Orthopedics and Traumatology, Ege University School of Medicine, Izmir, Turkey
- Meltem Taşbakan
- Department of Infectious Diseases and Clinical Microbiology, Ege University School of Medicine, Izmir, Turkey
- Kemal Aktuğlu
- Department of Orthopedics and Traumatology, Ege University School of Medicine, Izmir, Turkey
6
Shiraishi M, Tsuruda S, Tomioka Y, Chang J, Hori A, Ishii S, Fujinaka R, Ando T, Ohba J, Okazaki M. Advancement of Generative Pre-trained Transformer Chatbots in Answering Clinical Questions in the Practical Rhinoplasty Guideline. Aesthetic Plast Surg 2024. [PMID: 39322837] [DOI: 10.1007/s00266-024-04377-4]
Abstract
BACKGROUND The Generative Pre-trained Transformer (GPT) series, which includes ChatGPT, is a family of large language models that produce human-like text dialogue. This study aimed to evaluate the performance of artificial intelligence chatbots in answering clinical questions based on practical rhinoplasty guidelines. METHODS Clinical questions (CQs) developed from the guidelines were used as question sources. For each question, we asked GPT-4 and GPT-3.5 (ChatGPT), developed by OpenAI, to provide answers for the CQs, Policy Level, Aggregate Evidence Quality, Level of Confidence in Evidence, and References. We compared the performance of the two artificial intelligence (AI) chatbots. RESULTS A total of 10 questions were included in the final analysis, and the AI chatbots correctly answered 90.0% of these. GPT-4 demonstrated a lower accuracy rate than GPT-3.5 in answering CQs, although the difference was not statistically significant (86.0% vs. 94.0%; p = 0.05), whereas GPT-4 showed significantly higher accuracy for Level of Confidence in Evidence than GPT-3.5 (52.0% vs. 28.0%; p < 0.01). No statistically significant differences were observed for Policy Level, Aggregate Evidence Quality, or Reference Match. In addition, GPT-4 presented existing references at a significantly higher rate than GPT-3.5 (36.9% vs. 24.1%; p = 0.01). CONCLUSIONS The overall performance of GPT-4 was similar to that of GPT-3.5. However, GPT-4 provided existing references at a higher rate than GPT-3.5 and has the potential to provide more accurate references in professional fields, including rhinoplasty. LEVEL OF EVIDENCE V This journal requires that authors assign a level of evidence to each article. For a full description of these Evidence-Based Medicine ratings, please refer to the Table of Contents or the online Instructions to Authors www.springer.com/00266 .
Affiliation(s)
- Makoto Shiraishi
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan.
- Saori Tsuruda
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
- Yoko Tomioka
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
- Jinwoo Chang
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
- Asei Hori
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
- Saaya Ishii
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
- Rei Fujinaka
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
- Taku Ando
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
- Jun Ohba
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
- Mutsumi Okazaki
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
7
Carnino JM, Pellegrini WR, Willis M, Cohen MB, Paz-Lansberg M, Davis EM, Grillone GA, Levi JR. Assessing ChatGPT's Responses to Otolaryngology Patient Questions. Ann Otol Rhinol Laryngol 2024; 133:658-664. [PMID: 38676440] [DOI: 10.1177/00034894241249621]
Abstract
OBJECTIVE This study aims to evaluate ChatGPT's performance in addressing real-world otolaryngology patient questions, focusing on accuracy, comprehensiveness, and patient safety, to assess its suitability for integration into healthcare. METHODS A cross-sectional study was conducted using patient questions from the public online forum Reddit's r/AskDocs, where medical advice is sought from healthcare professionals. Patient questions were input into ChatGPT (GPT-3.5), and responses were reviewed by 5 board-certified otolaryngologists. The evaluation criteria included difficulty, accuracy, comprehensiveness, and bedside manner/empathy. Statistical analysis explored the relationship between patient question characteristics and ChatGPT response scores. Potentially dangerous responses were also identified. RESULTS Patient questions averaged 224.93 words, while ChatGPT responses were longer at 414.93 words. The accuracy scores for ChatGPT responses were 3.76/5, comprehensiveness scores were 3.59/5, and bedside manner/empathy scores were 4.28/5. Longer patient questions did not correlate with higher response ratings. However, longer ChatGPT responses scored higher in bedside manner/empathy. Higher question difficulty correlated with lower comprehensiveness. Five responses were flagged as potentially dangerous. CONCLUSION While ChatGPT exhibits promise in addressing otolaryngology patient questions, this study demonstrates its limitations, particularly in accuracy and comprehensiveness. The identification of potentially dangerous responses underscores the need for a cautious approach to AI in medical advice. Responsible integration of AI into healthcare necessitates thorough assessments of model performance and ethical considerations for patient safety.
Affiliation(s)
- Jonathan M Carnino
- Boston University Chobanian & Avedisian School of Medicine, Boston, MA, USA
- William R Pellegrini
- Boston University Chobanian & Avedisian School of Medicine, Boston, MA, USA
- Department of Otolaryngology-Head and Neck Surgery, Boston Medical Center, Boston, MA, USA
- Megan Willis
- Department of Biostatistics, Boston University, Boston, MA, USA
- Michael B Cohen
- Boston University Chobanian & Avedisian School of Medicine, Boston, MA, USA
- Department of Otolaryngology-Head and Neck Surgery, Boston Medical Center, Boston, MA, USA
- Marianella Paz-Lansberg
- Boston University Chobanian & Avedisian School of Medicine, Boston, MA, USA
- Department of Otolaryngology-Head and Neck Surgery, Boston Medical Center, Boston, MA, USA
- Elizabeth M Davis
- Boston University Chobanian & Avedisian School of Medicine, Boston, MA, USA
- Department of Otolaryngology-Head and Neck Surgery, Boston Medical Center, Boston, MA, USA
- Gregory A Grillone
- Boston University Chobanian & Avedisian School of Medicine, Boston, MA, USA
- Department of Otolaryngology-Head and Neck Surgery, Boston Medical Center, Boston, MA, USA
- Jessica R Levi
- Boston University Chobanian & Avedisian School of Medicine, Boston, MA, USA
- Department of Otolaryngology-Head and Neck Surgery, Boston Medical Center, Boston, MA, USA
8
Shiraishi M, Tomioka Y, Miyakuni A, Ishii S, Hori A, Park H, Ohba J, Okazaki M. Performance of ChatGPT in Answering Clinical Questions on the Practical Guideline of Blepharoptosis. Aesthetic Plast Surg 2024; 48:2389-2398. [PMID: 38684536] [DOI: 10.1007/s00266-024-04005-1]
Abstract
BACKGROUND ChatGPT is a free artificial intelligence (AI) language model developed and released by OpenAI in late 2022. This study aimed to evaluate the ability of ChatGPT to accurately answer clinical questions (CQs) on the Guideline for the Management of Blepharoptosis published by the American Society of Plastic Surgeons (ASPS) in 2022. METHODS CQs in the guideline were used as question sources in both English and Japanese. For each question, ChatGPT provided answers for the CQs, evidence quality, recommendation strength, reference match, and answer word count. We compared the performance of ChatGPT on each component between English and Japanese queries. RESULTS A total of 11 questions were included in the final analysis, and ChatGPT answered 61.3% of these correctly. ChatGPT answered the CQs more accurately in English than in Japanese (76.4% versus 46.4%; p = 0.004) and produced longer answers in English (123 words versus 35.9 words; p = 0.004). No statistically significant differences were noted for evidence quality, recommendation strength, or reference match. A total of 697 references were proposed, but only 216 of them (31.0%) existed. CONCLUSIONS ChatGPT demonstrates potential as an adjunctive tool in the management of blepharoptosis. However, it is crucial to recognize that the existing AI model has distinct limitations, and its primary role should be to complement the expertise of medical professionals. LEVEL OF EVIDENCE V Observational study under respected authorities. This journal requires that authors assign a level of evidence to each article. For a full description of these Evidence-Based Medicine ratings, please refer to the Table of Contents or the online Instructions to Authors www.springer.com/00266 .
Affiliation(s)
- Makoto Shiraishi
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan.
- Yoko Tomioka
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
- Ami Miyakuni
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
- Saaya Ishii
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
- Asei Hori
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
- Hwayoung Park
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
- Jun Ohba
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
- Mutsumi Okazaki
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
9
Shiraishi M, Tomioka Y, Okazaki M. ChatGPT and Clinical Questions on the Practical Guideline of Blepharoptosis: Reply. Aesthetic Plast Surg 2024. [PMID: 38890161] [DOI: 10.1007/s00266-024-04193-w]
Abstract
In a recent Letter to the Editor authored by Daungsupawong et al. in Aesthetic Plastic Surgery, titled "ChatGPT and Clinical Questions on the Practical Guideline of Blepharoptosis: Correspondence," the authors emphasized important points regarding differences in input language and in the references provided as output. However, advanced versions, such as GPT-4, have shown marginal differences between English and Chinese inputs, possibly because of the use of larger training datasets. To address this issue, non-English-language-oriented large language models (LLMs) have been developed. The ability of LLMs to refer to existing references varies, with newer models, such as GPT-4, showing higher reference rates than GPT-3.5. Future research should focus on addressing the current limitations and enhancing the effectiveness of emerging LLMs in providing accurate and informative answers to medical questions across multiple languages. Level of Evidence V This journal requires that authors assign a level of evidence to each article. For a full description of these Evidence-Based Medicine ratings, please refer to the Table of Contents or the online Instructions to Authors www.springer.com/00266 .
Affiliation(s)
- Makoto Shiraishi
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan.
- Yoko Tomioka
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
- Mutsumi Okazaki
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
10
Shiraishi M, Tanigawa K, Tomioka Y, Miyakuni A, Moriwaki Y, Yang R, Oba J, Okazaki M. Blepharoptosis Consultation with Artificial Intelligence: Aesthetic Surgery Advice and Counseling from Chat Generative Pre-Trained Transformer (ChatGPT). Aesthetic Plast Surg 2024; 48:2057-2063. [PMID: 38589561] [DOI: 10.1007/s00266-024-04002-4]
Abstract
BACKGROUND Chat generative pre-trained transformer (ChatGPT) is a publicly available large artificial intelligence (AI) language model that leverages deep learning to generate text that mimics human conversation. In this study, the performance of ChatGPT was assessed by having it provide insightful and precise answers to a series of fictional questions, emulating a preliminary consultation on blepharoplasty. METHODS ChatGPT was presented with questions derived from a blepharoplasty checklist provided by the American Society of Plastic Surgeons. Board-certified plastic surgeons and non-medical staff members evaluated the responses for accuracy, informativeness, and accessibility. RESULTS Nine questions were used in this study. Regarding informativeness, the average score given by board-certified plastic surgeons was significantly lower than that given by non-medical staff members (2.89 ± 0.72 vs 4.41 ± 0.71; p = 0.042). No statistically significant differences were observed in accuracy (p = 0.56) or accessibility (p = 0.11). CONCLUSIONS Our results emphasize the effectiveness of ChatGPT in simulating doctor-patient conversations in blepharoplasty consultations. Non-medical individuals found its responses more informative than the surgeons did. Although limited in terms of specialized guidance, ChatGPT offers foundational surgical information. Further exploration is warranted to elucidate the broader role of AI in esthetic surgical consultations. LEVEL OF EVIDENCE V Observational study under respected authorities. This journal requires that authors assign a level of evidence to each article. For a full description of these Evidence-Based Medicine ratings, please refer to the Table of Contents or the online Instructions to Authors www.springer.com/00266 .
Affiliation(s)
- Makoto Shiraishi
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan.
- Koji Tanigawa
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
- Yoko Tomioka
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
- Ami Miyakuni
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
- Yuta Moriwaki
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
- Rui Yang
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
- Jun Oba
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
- Mutsumi Okazaki
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan