1
DiDonna N, Shetty PN, Khan K, Damitz L. Unveiling the Potential of AI in Plastic Surgery Education: A Comparative Study of Leading AI Platforms' Performance on In-training Examinations. Plast Reconstr Surg Glob Open 2024; 12:e5929. [PMID: 38911577; PMCID: PMC11191997; DOI: 10.1097/gox.0000000000005929] [Received: 03/03/2024; Accepted: 05/01/2024]
Abstract
Background: Within the last few years, artificial intelligence (AI) chatbots have sparked fascination for their potential as educational tools. Although one such chatbot, ChatGPT, has been shown to perform at a moderate level on plastic surgery examinations and could become a beneficial educational tool, the potential of other chatbots remains unexplored. Methods: To investigate the efficacy of AI chatbots in plastic surgery education, performance on the 2019-2023 Plastic Surgery In-service Training Examination (PSITE) was compared among seven popular AI platforms: ChatGPT-3.5, ChatGPT-4.0, Google Bard, Google PaLM, Microsoft Bing AI, Claude, and My AI by Snapchat. Answers were evaluated for accuracy, and incorrect responses were characterized by question category and error type. Results: ChatGPT-4.0 outperformed the other platforms, reaching accuracy rates of up to 79%. On the 2023 PSITE, ChatGPT-4.0 ranked in the 95th percentile of first-year residents; however, its relative performance worsened against upper-level residents, falling to the 12th percentile of sixth-year residents. Performance among the other chatbots was comparable, with average PSITE scores (2019-2023) ranging from 48.6% to 57.0%. Conclusions: Our results indicate that ChatGPT-4.0 has potential as an educational tool in plastic surgery; however, given their poor performance on the PSITE, the use of other chatbots cannot be recommended at this time. To our knowledge, this is the first article comparing the performance of multiple AI chatbots within the realm of plastic surgery education.
Affiliation(s)
- Nicole DiDonna: School of Medicine, University of North Carolina, Chapel Hill, N.C.
- Pragna N. Shetty: Division of Plastic and Reconstructive Surgery, University of North Carolina, Chapel Hill, N.C.
- Kamran Khan: Division of Plastic and Reconstructive Surgery, University of North Carolina, Chapel Hill, N.C.
- Lynn Damitz: Division of Plastic and Reconstructive Surgery, University of North Carolina, Chapel Hill, N.C.
2
Alfertshofer M, Hoch CC, Funk PF, Hollmann K, Wollenberg B, Knoedler S, Knoedler L. Sailing the Seven Seas: A Multinational Comparison of ChatGPT's Performance on Medical Licensing Examinations. Ann Biomed Eng 2024; 52:1542-1545. [PMID: 37553555; PMCID: PMC11082010; DOI: 10.1007/s10439-023-03338-3] [Received: 07/26/2023; Accepted: 07/28/2023]
Abstract
PURPOSE: The use of AI-powered technology, particularly OpenAI's ChatGPT, holds significant potential to reshape healthcare and medical education. Despite existing studies on the performance of ChatGPT in medical licensing examinations across different nations, a comprehensive multinational analysis using rigorous methodology is lacking. Our study sought to address this gap by evaluating the performance of ChatGPT on six national medical licensing exams and investigating the relationship between test question length and ChatGPT's accuracy. METHODS: We manually entered a total of 1800 test questions (300 each from the US, Italian, French, Spanish, UK, and Indian medical licensing examinations) into ChatGPT and recorded the accuracy of its responses. RESULTS: We found significant variance in ChatGPT's accuracy across countries, with the highest accuracy on the Italian examination (73% correct answers) and the lowest on the French examination (22% correct answers). Interestingly, question length correlated with ChatGPT's performance in the Italian and French state examinations only. In addition, questions requiring multiple correct answers, as seen in the French examination, posed a greater challenge to ChatGPT. CONCLUSION: Our findings underscore the need for future research to further delineate ChatGPT's strengths and limitations in medical test-taking across additional countries and to develop guidelines to prevent AI-assisted cheating in medical examinations.
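The length-accuracy analysis described in this abstract amounts to correlating a continuous variable (question length) with a binary one (correct/incorrect), i.e. a point-biserial correlation. A minimal pure-Python sketch; all data and names below are synthetic illustrations, not the study's material:

```python
# Illustrative sketch: relating question length to chatbot correctness.
# The data are made up; a negative coefficient means longer = harder.
from math import sqrt

def accuracy(results):
    """Fraction of correct answers; results is a list of 0/1 flags."""
    return sum(results) / len(results)

def point_biserial(lengths, correct):
    """Pearson correlation between a continuous variable (question length)
    and a binary one (correct/incorrect) -- the point-biserial coefficient."""
    n = len(lengths)
    mx = sum(lengths) / n
    my = sum(correct) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(lengths, correct))
    sx = sqrt(sum((x - mx) ** 2 for x in lengths))
    sy = sqrt(sum((y - my) ** 2 for y in correct))
    return cov / (sx * sy)

# Synthetic example: longer questions tend to be answered wrongly.
lengths = [120, 95, 210, 300, 80, 260, 150, 330]
correct = [1, 1, 0, 0, 1, 0, 1, 0]
print(round(accuracy(correct), 2))                  # -> 0.5
print(round(point_biserial(lengths, correct), 2))   # -> -0.91
```

With real data, `scipy.stats.pointbiserialr` computes the same coefficient and additionally returns a p-value.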
Affiliation(s)
- Michael Alfertshofer: Division of Hand, Plastic and Aesthetic Surgery, Ludwig-Maximilians University Munich, Ziemssenstrasse 5, 80336 Munich, Germany
- Cosima C Hoch: Department of Otolaryngology, Head and Neck Surgery, School of Medicine, Technical University of Munich (TUM), Ismaningerstrasse 22, 81675 Munich, Germany
- Paul F Funk: Department of Otolaryngology, Head and Neck Surgery, University Hospital Jena, Friedrich Schiller University Jena, Am Klinikum 1, 07747 Jena, Germany
- Katharina Hollmann: Department of Pathology, Massachusetts General Hospital, Harvard Medical School, 55 Fruit St, Boston, MA 02114, USA
- Barbara Wollenberg: Department of Otolaryngology, Head and Neck Surgery, School of Medicine, Technical University of Munich (TUM), Ismaningerstrasse 22, 81675 Munich, Germany
- Samuel Knoedler: Department of Plastic, Hand and Reconstructive Surgery, University Hospital Regensburg, Franz-Josef-Strauss-Allee 11, 93053 Regensburg, Germany
- Leonard Knoedler: Department of Plastic, Hand and Reconstructive Surgery, University Hospital Regensburg, Franz-Josef-Strauss-Allee 11, 93053 Regensburg, Germany
3
Knoedler S, Alfertshofer M, Simon S, Panayi AC, Saadoun R, Palackic A, Falkner F, Hundeshagen G, Kauke-Navarro M, Vollbach FH, Bigdeli AK, Knoedler L. Turn Your Vision into Reality: AI-Powered Pre-operative Outcome Simulation in Rhinoplasty Surgery. Aesthetic Plast Surg 2024. [PMID: 38777929; DOI: 10.1007/s00266-024-04043-9] [Received: 02/06/2024; Accepted: 03/28/2024]
Abstract
BACKGROUND: Increasing demand and changing trends in rhinoplasty surgery emphasize the need for effective doctor-patient communication, for which artificial intelligence (AI) could be a valuable tool in managing patient expectations during pre-operative consultations. OBJECTIVE: To develop an AI-based model that simulates realistic postoperative rhinoplasty outcomes. METHODS: We trained a Generative Adversarial Network (GAN) on pre- and postoperative images of 3030 rhinoplasty patients. One hundred and one study participants were presented with 30 pre-rhinoplasty patient photographs, each followed by an image pair consisting of the real postoperative image and the GAN-generated image, and were asked to identify the GAN-generated image. RESULTS: The study sample (48 males, 53 females; mean age 31.6 ± 9.0 years) correctly identified the GAN-generated images with an accuracy of 52.5 ± 14.3%. Male participants were more likely than female participants to identify the AI-generated images (55.4% versus 49.6%; p = 0.042). CONCLUSION: We present a GAN-based simulator for rhinoplasty outcomes that uses pre-operative patient images to generate realistic representations that were not perceived as different from real postoperative outcomes. LEVEL OF EVIDENCE III: This journal requires that authors assign a level of evidence to each article. For a full description of these Evidence-Based Medicine ratings, please refer to the Table of Contents or the online Instructions to Authors at www.springer.com/00266.
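Using only the summary statistics reported in this abstract (mean detection accuracy 52.5%, SD 14.3%, n = 101 raters), one can sanity-check the "not perceived as different" claim with a one-sample t-statistic against the 50% chance level. A sketch, assuming per-participant accuracies are approximately normal:

```python
# Rough check from summary data: is 52.5% detection accuracy distinguishable
# from 50% chance? One-sample t-statistic; numbers come from the abstract.
from math import sqrt

def one_sample_t(mean, sd, n, mu0=0.5):
    """t = (mean - mu0) / (sd / sqrt(n))"""
    return (mean - mu0) / (sd / sqrt(n))

t = one_sample_t(0.525, 0.143, 101)
print(round(t, 2))  # -> 1.76
```

The resulting t of about 1.76 sits below the approximate two-sided 5% critical value of 1.98 for df = 100, consistent with the abstract's conclusion that raters could not reliably tell real from generated images.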
Affiliation(s)
- Samuel Knoedler: Division of Plastic Surgery, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA; Department of Plastic and Hand Surgery, Klinikum Rechts der Isar, Technical University of Munich, Munich, Germany
- Michael Alfertshofer: Department of Plastic and Hand Surgery, Klinikum Rechts der Isar, Technical University of Munich, Munich, Germany; Department of Oromaxillofacial Surgery, Ludwig-Maximilians University Munich, Munich, Germany
- Siddharth Simon: Department of Oromaxillofacial Surgery, Ludwig-Maximilians University Munich, Munich, Germany
- Adriana C Panayi: Department of Hand, Plastic and Reconstructive Surgery, Microsurgery, Burn Center, BG Center Ludwigshafen, University of Heidelberg, Ludwigshafen, Germany; Department of Hand and Plastic Surgery, University of Heidelberg, Heidelberg, Germany
- Rakan Saadoun: Department of Plastic Surgery, University of Pittsburgh, Pittsburgh, PA, USA
- Alen Palackic: Department of Hand, Plastic and Reconstructive Surgery, Microsurgery, Burn Center, BG Center Ludwigshafen, University of Heidelberg, Ludwigshafen, Germany; Department of Hand and Plastic Surgery, University of Heidelberg, Heidelberg, Germany
- Florian Falkner: Department of Hand, Plastic and Reconstructive Surgery, Microsurgery, Burn Center, BG Center Ludwigshafen, University of Heidelberg, Ludwigshafen, Germany; Department of Hand and Plastic Surgery, University of Heidelberg, Heidelberg, Germany
- Gabriel Hundeshagen: Department of Hand, Plastic and Reconstructive Surgery, Microsurgery, Burn Center, BG Center Ludwigshafen, University of Heidelberg, Ludwigshafen, Germany; Department of Hand and Plastic Surgery, University of Heidelberg, Heidelberg, Germany
- Martin Kauke-Navarro: Department of Surgery, Division of Plastic Surgery, Yale School of Medicine, New Haven, CT, USA
- Felix H Vollbach: Department of Hand, Plastic and Reconstructive Surgery, Microsurgery, Burn Center, BG Center Ludwigshafen, University of Heidelberg, Ludwigshafen, Germany; Department of Hand and Plastic Surgery, University of Heidelberg, Heidelberg, Germany
- Amir K Bigdeli: Department of Hand, Plastic and Reconstructive Surgery, Microsurgery, Burn Center, BG Center Ludwigshafen, University of Heidelberg, Ludwigshafen, Germany; Department of Hand and Plastic Surgery, University of Heidelberg, Heidelberg, Germany
- Leonard Knoedler: Department of Surgery, Division of Plastic Surgery, Yale School of Medicine, New Haven, CT, USA; Department of Plastic, Hand and Reconstructive Surgery, University Hospital Regensburg, Regensburg, Germany
4
Knoedler L, Alfertshofer M, Geldner B, Sherwani K, Knoedler S, Kauke-Navarro M, Safi AF. Truth Lies in the Depths: Novel Insights into Facial Aesthetic Measurements from a U.S. Survey Panel. Aesthetic Plast Surg 2024. [PMID: 38772944; DOI: 10.1007/s00266-024-04022-0] [Received: 01/12/2024; Accepted: 03/11/2024]
Abstract
BACKGROUND: Aesthetic facial bone surgery and facial implantology expand the boundaries of conventional facial surgery, which focuses on facial soft tissue. This study aimed to identify novel aesthetic facial measurements to provide tailored treatment concepts and advance patient care. METHODS: A total of n = 101 study participants (46 females and 55 males) were presented with 120 patient portraits (frontal images in natural head posture; 60 females and 60 males) and asked to rate facial attractiveness (scale 0-10; "How attractive do you find the person in the image?") and a model capability score (MCS; scale 0-10; "How likely do you think the person in the image could pursue a modelling career?"). For each frontal photograph, defined facial measurements and ratios were taken to analyze their relationship with perceived facial attractiveness and MCS. RESULTS: The overall attractiveness rating was 4.3 ± 1.1, and the mean MCS was 3.4 ± 1.1. In young males, attractiveness correlated significantly with the zygoma-mandible angle (ZMA)2 (r = -0.553; p = 0.011). In young and middle-aged females, MCS correlated significantly with the facial width (FW)1-FW2 ratio (r = 0.475; p = 0.034). Across all male individuals, a ZMA1 value of 171.79 degrees (Y = 0.313; p = 0.024) was the most robust cut-off for determining facial attractiveness. The majority of human evaluators (n = 62; 51.7%) considered facial implants (FI) a potential treatment to improve the patient's facial attractiveness. CONCLUSION: This study introduced novel metrics of facial attractiveness focusing on the facial skeleton. Our findings emphasize the significance of zygomatic measurements and mandibular projection for facial aesthetics, with FI representing a promising surgical approach to optimizing facial aesthetics. LEVEL OF EVIDENCE IV: This journal requires that authors assign a level of evidence to each article. For a full description of these Evidence-Based Medicine ratings, please refer to the Table of Contents or the online Instructions to Authors at www.springer.com/00266.
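The Y value reported alongside the ZMA1 cut-off suggests a Youden-index style threshold search (maximizing sensitivity + specificity − 1). A minimal sketch of that technique with made-up angles and ratings, not the study's data:

```python
# Hypothetical sketch: choosing a single-angle cut-off for a binary rating
# via Youden's J statistic. All angles and labels below are invented.
def youden_cutoff(values, labels):
    """Return (threshold, J) maximizing sensitivity + specificity - 1,
    classifying label==1 when value <= threshold."""
    best_t, best_j = None, -1.0
    pos = sum(labels)
    neg = len(labels) - pos
    for t in sorted(set(values)):
        tp = sum(1 for v, y in zip(values, labels) if v <= t and y == 1)
        fp = sum(1 for v, y in zip(values, labels) if v <= t and y == 0)
        j = tp / pos - fp / neg  # sensitivity - (1 - specificity)
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j

angles = [168.0, 169.5, 170.2, 171.0, 171.8, 172.5, 173.1, 174.0]
rated_attractive = [1, 1, 1, 1, 0, 0, 1, 0]
print(youden_cutoff(angles, rated_attractive))  # -> (171.0, 0.8)
```

The direction of the comparison (`<=` vs `>=`) is itself an assumption; with real data it should follow whichever side of the threshold the positive class falls on.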
Affiliation(s)
- Leonard Knoedler: Department of Surgery, Division of Plastic and Reconstructive Surgery, Yale School of Medicine, New Haven, CT, USA
- Michael Alfertshofer: Division of Hand, Plastic and Aesthetic Surgery, Ludwig-Maximilians-University Munich, Munich, Germany
- Benedikt Geldner: Department of Hand, Plastic and Reconstructive Surgery, Microsurgery, Burn Center, BG Center Ludwigshafen, University of Heidelberg, Ludwigshafen, Germany
- Khalil Sherwani: Department of Hand, Plastic and Reconstructive Surgery, Microsurgery, Burn Center, BG Center Ludwigshafen, University of Heidelberg, Ludwigshafen, Germany
- Samuel Knoedler: Department of Surgery, Division of Plastic and Reconstructive Surgery, Yale School of Medicine, New Haven, CT, USA
- Martin Kauke-Navarro: Department of Surgery, Division of Plastic and Reconstructive Surgery, Yale School of Medicine, New Haven, CT, USA
- Ali-Farid Safi: Faculty of Medicine, University of Bern, Bern, Switzerland; Center for Cranio-Maxillo-Facial Surgery, Bern, Switzerland
5
Knoedler L, Alfertshofer M, Knoedler S, Hoch CC, Funk PF, Cotofana S, Maheta B, Frank K, Brébant V, Prantl L, Lamby P. Pure Wisdom or Potemkin Villages? A Comparison of ChatGPT 3.5 and ChatGPT 4 on USMLE Step 3 Style Questions: Quantitative Analysis. JMIR Med Educ 2024; 10:e51148. [PMID: 38180782; PMCID: PMC10799278; DOI: 10.2196/51148] [Received: 07/22/2023; Revised: 09/30/2023; Accepted: 10/20/2023]
Abstract
BACKGROUND: The United States Medical Licensing Examination (USMLE) has been critical in medical education since 1992, testing various aspects of a medical student's knowledge and skills through different steps, based on their training level. Artificial intelligence (AI) tools, including chatbots like ChatGPT, are emerging technologies with potential applications in medicine. However, comprehensive studies analyzing ChatGPT's performance on USMLE Step 3 in large-scale scenarios and comparing different versions of ChatGPT are limited. OBJECTIVE: This paper aimed to analyze ChatGPT's performance on USMLE Step 3 practice test questions to better elucidate the strengths and weaknesses of AI use in medical education and deduce evidence-based strategies to counteract AI cheating. METHODS: A total of 2069 USMLE Step 3 practice questions were extracted from the AMBOSS study platform. After excluding 229 image-based questions, a total of 1840 text-based questions were categorized and entered into ChatGPT 3.5, while a subset of 229 questions was entered into ChatGPT 4. Responses were recorded, and the accuracy of ChatGPT's answers, as well as its performance across test question categories and difficulty levels, was compared between both versions. RESULTS: Overall, ChatGPT 4 demonstrated statistically significantly superior performance compared to ChatGPT 3.5, achieving accuracies of 84.7% (194/229) and 56.9% (1047/1840), respectively. A noteworthy correlation was observed between the length of test questions and the performance of ChatGPT 3.5 (ρ=-0.069; P=.003), which was absent in ChatGPT 4 (P=.87). Additionally, the difficulty of test questions, as categorized by AMBOSS hammer ratings, showed a statistically significant correlation with performance for both versions, with ρ=-0.289 for ChatGPT 3.5 and ρ=-0.344 for ChatGPT 4. ChatGPT 4 surpassed ChatGPT 3.5 at all levels of test question difficulty except the two highest difficulty tiers (4 and 5 hammers), where statistical significance was not reached. CONCLUSIONS: In this study, ChatGPT 4 demonstrated remarkable proficiency on USMLE Step 3, with an accuracy of 84.7% (194/229), outshining ChatGPT 3.5 at 56.9% (1047/1840). Although ChatGPT 4 performed exceptionally, it encountered difficulties with questions requiring the application of theoretical concepts, particularly in cardiology and neurology. These insights are pivotal for developing examination strategies that are resilient to AI and underline the promising role of AI in medical education and diagnostics.
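The headline comparison (194/229 vs 1047/1840 correct) can be reproduced with a two-proportion z-test; the counts come from the abstract, while the choice of test statistic here is an illustrative assumption (the paper's own test may differ):

```python
# Sketch: ChatGPT 4 (194/229 correct) vs ChatGPT 3.5 (1047/1840 correct),
# compared via a pooled two-proportion z-test. Pure Python, no dependencies.
from math import sqrt

def two_proportion_z(x1, n1, x2, n2):
    """z-statistic for H0: p1 == p2, using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

z = two_proportion_z(194, 229, 1047, 1840)
print(round(z, 1))  # -> 8.1, far beyond any conventional significance cutoff
```

A z of about 8 corresponds to a vanishingly small p-value, consistent with the "statistically significant superior performance" reported above.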
Affiliation(s)
- Leonard Knoedler: Department of Plastic, Hand and Reconstructive Surgery, University Hospital Regensburg, Regensburg, Germany
- Michael Alfertshofer: Division of Hand, Plastic and Aesthetic Surgery, Ludwig-Maximilians University Munich, Munich, Germany
- Samuel Knoedler: Department of Plastic, Hand and Reconstructive Surgery, University Hospital Regensburg, Regensburg, Germany; Division of Plastic Surgery, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, United States
- Cosima C Hoch: Department of Otolaryngology, Head and Neck Surgery, School of Medicine, Technical University of Munich, Munich, Germany
- Paul F Funk: Department of Otolaryngology, Head and Neck Surgery, University Hospital Jena, Friedrich Schiller University Jena, Jena, Germany
- Sebastian Cotofana: Department of Dermatology, Erasmus Hospital, Rotterdam, Netherlands; Centre for Cutaneous Research, Blizard Institute, Queen Mary University of London, London, United Kingdom
- Bhagvat Maheta: College of Medicine, California Northstate University, Elk Grove, CA, United States
- Vanessa Brébant: Department of Plastic, Hand and Reconstructive Surgery, University Hospital Regensburg, Regensburg, Germany
- Lukas Prantl: Department of Plastic, Hand and Reconstructive Surgery, University Hospital Regensburg, Regensburg, Germany
- Philipp Lamby: Department of Plastic, Hand and Reconstructive Surgery, University Hospital Regensburg, Regensburg, Germany
6
Knoedler L, Alfertshofer M, Simon S, Prantl L, Kehrer A, Hoch CC, Knoedler S, Lamby P. Diagnosing lagophthalmos using artificial intelligence. Sci Rep 2023; 13:21657. [PMID: 38066112; PMCID: PMC10709577; DOI: 10.1038/s41598-023-49006-3] [Received: 06/30/2023; Accepted: 12/02/2023]
Abstract
Lagophthalmos is the incomplete closure of the eyelids, posing a risk of corneal ulceration and blindness, and is a common symptom of various pathologies. We aimed to program a convolutional neural network (CNN) to automate lagophthalmos diagnosis. From June 2019 to May 2021, prospective data acquisition was performed on 30 patients seen at the Department of Plastic, Hand, and Reconstructive Surgery at the University Hospital Regensburg, Germany (IRB reference number: 20-2081-101). In addition, comparative data were gathered from 10 healthy individuals as the control group. The training set comprised 826 images, while the validation and testing sets consisted of 91 patient images each. Validation accuracy was 97.8% over the span of 64 epochs, and the model was trained for 17.3 min. For training and validation, average losses of 0.304 and 0.358 and final losses of 0.276 and 0.157 were noted, respectively. Testing accuracy was 93.41% with a loss of 0.221. This study proposes a novel application for rapid and reliable lagophthalmos diagnosis. Our CNN-based approach combines effective anti-overfitting strategies, short training times, and high accuracy. Ultimately, this tool carries high translational potential to facilitate the physician's workflow and improve overall lagophthalmos patient care.
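The abstract mentions "effective anti-overfitting strategies" without naming them; early stopping on validation loss is one common such strategy. A minimal sketch with the training step stubbed out by a synthetic loss curve (the study's actual training loop is not reproduced here):

```python
# Minimal early-stopping loop, one common anti-overfitting strategy.
# The validation-loss curve below is synthetic and purely illustrative.
def train_with_early_stopping(val_losses, patience=3):
    """Stop once validation loss has not improved for `patience` epochs.
    Returns (best_epoch, best_loss); epochs are 0-indexed."""
    best_loss = float("inf")
    best_epoch = 0
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch, best_loss

# Synthetic curve: improves until epoch 4, then starts overfitting.
curve = [0.90, 0.60, 0.45, 0.40, 0.358, 0.41, 0.43, 0.46, 0.50]
print(train_with_early_stopping(curve))  # -> (4, 0.358)
```

In a real framework this logic is usually available off the shelf (e.g. Keras's `EarlyStopping` callback), with the best model weights restored at the stopping point.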
Affiliation(s)
- Leonard Knoedler: Department of Plastic, Hand and Reconstructive Surgery, University Hospital Regensburg, Franz-Josef-Strauss-Allee 11, 93053 Regensburg, Germany
- Michael Alfertshofer: Division of Hand, Plastic and Aesthetic Surgery, Ludwig-Maximilians-University Munich, Munich, Germany
- Lukas Prantl: Department of Plastic, Hand and Reconstructive Surgery, University Hospital Regensburg, Franz-Josef-Strauss-Allee 11, 93053 Regensburg, Germany
- Andreas Kehrer: Department of Plastic, Hand and Reconstructive Surgery, University Hospital Regensburg, Franz-Josef-Strauss-Allee 11, 93053 Regensburg, Germany
- Cosima C Hoch: Department of Otolaryngology, Head and Neck Surgery, School of Medicine, Technical University of Munich (TUM), 81675 Munich, Germany
- Samuel Knoedler: Department of Plastic, Hand and Reconstructive Surgery, University Hospital Regensburg, Franz-Josef-Strauss-Allee 11, 93053 Regensburg, Germany
- Philipp Lamby: Department of Plastic, Hand and Reconstructive Surgery, University Hospital Regensburg, Franz-Josef-Strauss-Allee 11, 93053 Regensburg, Germany
7
Knoedler L, Knoedler S, Allam O, Remy K, Miragall M, Safi AF, Alfertshofer M, Pomahac B, Kauke-Navarro M. Application possibilities of artificial intelligence in facial vascularized composite allotransplantation: a narrative review. Front Surg 2023; 10:1266399. [PMID: 38026484; PMCID: PMC10646214; DOI: 10.3389/fsurg.2023.1266399] [Received: 07/24/2023; Accepted: 09/26/2023]
Abstract
Facial vascularized composite allotransplantation (FVCA) is an emerging field of reconstructive surgery that represents a paradigm shift in the surgical treatment of patients with severe facial disfigurements. While conventional reconstructive strategies were previously considered the gold standard for patients with devastating facial trauma, FVCA has demonstrated promising short- and long-term outcomes. Yet, several obstacles complicate the integration of FVCA procedures into the standard workflow for facial trauma patients. Artificial intelligence (AI) has been shown to provide targeted and resource-effective solutions for persistent clinical challenges in various specialties. However, there is a paucity of studies elucidating the combination of FVCA and AI to overcome such hurdles. Here, we delineate the application possibilities of AI in the field of FVCA and discuss the use of AI technology for FVCA outcome simulation, diagnosis and prediction of rejection episodes, and malignancy screening. This line of research may serve as a foundation for future studies linking these two revolutionary technologies.
Affiliation(s)
- Leonard Knoedler: Department of Plastic, Hand and Reconstructive Surgery, University Hospital Regensburg, Regensburg, Germany; Division of Plastic Surgery, Department of Surgery, Yale New Haven Hospital, Yale School of Medicine, New Haven, CT, United States
- Samuel Knoedler: Division of Plastic Surgery, Department of Surgery, Yale New Haven Hospital, Yale School of Medicine, New Haven, CT, United States
- Omar Allam: Division of Plastic Surgery, Department of Surgery, Yale New Haven Hospital, Yale School of Medicine, New Haven, CT, United States
- Katya Remy: Department of Oral and Maxillofacial Surgery, University Hospital Regensburg, Regensburg, Germany
- Maximilian Miragall: Department of Oral and Maxillofacial Surgery, University Hospital Regensburg, Regensburg, Germany
- Ali-Farid Safi: Craniologicum, Center for Cranio-Maxillo-Facial Surgery, Bern, Switzerland; Faculty of Medicine, University of Bern, Bern, Switzerland
- Michael Alfertshofer: Division of Hand, Plastic and Aesthetic Surgery, Ludwig-Maximilians University Munich, Munich, Germany
- Bohdan Pomahac: Division of Plastic Surgery, Department of Surgery, Yale New Haven Hospital, Yale School of Medicine, New Haven, CT, United States
- Martin Kauke-Navarro: Division of Plastic Surgery, Department of Surgery, Yale New Haven Hospital, Yale School of Medicine, New Haven, CT, United States
8
Aljindan FK, Al Qurashi AA, Albalawi IAS, Alanazi AMM, Aljuhani HAM, Falah Almutairi F, Aldamigh OA, Halawani IR, K Zino Alarki SM. ChatGPT Conquers the Saudi Medical Licensing Exam: Exploring the Accuracy of Artificial Intelligence in Medical Knowledge Assessment and Implications for Modern Medical Education. Cureus 2023; 15:e45043. [PMID: 37829968; PMCID: PMC10566535; DOI: 10.7759/cureus.45043] [Accepted: 09/11/2023]
Abstract
Background: The application of artificial intelligence (AI) in education is advancing rapidly, with models such as ChatGPT-4 showing potential in medical education. This study aims to evaluate the proficiency of ChatGPT-4 in answering Saudi Medical Licensing Exam (SMLE) questions. Methodology: A dataset of 220 questions across four medical disciplines was used. The model was prompted with a specific instruction set to answer the questions, and its performance was assessed using key performance indicators, difficulty level, and exam section. Results: ChatGPT-4 demonstrated an overall accuracy of 88.6%. It showed high proficiency on Easy and Average questions, but accuracy decreased on Hard questions. Performance was consistent across all disciplines, indicating a broad knowledge base. However, an error analysis revealed areas for further refinement, particularly with category (Option A) questions across all sections. Conclusions: This study underscores the potential of ChatGPT-4 as an AI-assisted tool in medical education, demonstrating high proficiency in answering SMLE questions. Future research is recommended to expand the scope of training and evaluation, as well as to enhance the model's performance on complex clinical questions.
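The difficulty-stratified evaluation used here boils down to a group-by accuracy computation over labeled questions. A minimal sketch; the records and labels below are made up for illustration, not the study's data:

```python
# Sketch: stratifying answer accuracy by question difficulty, in the spirit
# of the Easy/Average/Hard breakdown. All records below are invented.
from collections import defaultdict

def accuracy_by_level(records):
    """records: iterable of (difficulty, correct) pairs -> {difficulty: accuracy}."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for level, correct in records:
        totals[level] += 1
        hits[level] += int(correct)
    return {level: hits[level] / totals[level] for level in totals}

records = [("easy", 1), ("easy", 1), ("easy", 1), ("easy", 0),
           ("average", 1), ("average", 1), ("average", 0),
           ("hard", 1), ("hard", 0), ("hard", 0)]
print(accuracy_by_level(records))  # easy 0.75, average ~0.67, hard ~0.33
```

The same pattern extends to any stratification variable, e.g. exam section or answer-option category.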
Affiliation(s)
- Fahad K Aljindan: Department of Plastic Surgery, King Abdullah Medical City, Makkah, SAU
- Abdullah A Al Qurashi: College of Medicine, King Saud Bin Abdulaziz University for Health Sciences, Jeddah, SAU
- Faisal Falah Almutairi: College of Medicine, Unaizah College of Medicine and Medical Sciences, Qassim University, Unaizah, SAU