1. Stoneham S, Livesey A, Cooper H, Mitchell C. ChatGPT versus clinician: challenging the diagnostic capabilities of artificial intelligence in dermatology. Clin Exp Dermatol 2024; 49:707-710. [PMID: 37979201 DOI: 10.1093/ced/llad402]
Abstract
BACKGROUND ChatGPT is an online language-based platform designed to answer questions in a human-like way using deep learning technology. OBJECTIVES To examine the diagnostic capabilities of ChatGPT using real-world anonymized medical dermatology cases. METHODS Clinical information from 90 consecutive patients referred to a single dermatology emergency clinic between June and December 2022 was examined. Thirty-six patients were included. Anonymized clinical information was transcribed and input into ChatGPT 4.0, followed by the question 'What is the most likely diagnosis?' The diagnosis suggested by ChatGPT was then compared with the diagnosis made by the dermatology team. RESULTS After inputting clinical history and examination data obtained by a dermatologist, ChatGPT made a correct primary diagnosis 56% of the time (n = 20). Using the clinical history and cutaneous signs recorded by nonspecialists, it made a correct diagnosis 39% of the time (n = 14). This was similar to the diagnostic rate of nonspecialists (36%; n = 13) but much lower than that of dermatologists (83%; n = 30). Referring sources offered no differential diagnosis 28% of the time (n = 10), whereas ChatGPT provided one 100% of the time. Qualitative analysis showed that ChatGPT offered responses with caution, often justifying its reasoning. CONCLUSIONS This study illustrates that while ChatGPT has diagnostic capability, in its current form it does not significantly improve diagnostic yield in primary or secondary care.
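Each reported rate follows directly from the correct-diagnosis counts over the 36 included cases; a minimal sanity-check sketch (counts taken from the abstract itself):

```python
# Correct-diagnosis counts out of the 36 included cases, per the abstract.
cases = 36
correct = {
    "ChatGPT (dermatologist-recorded findings)": 20,
    "ChatGPT (nonspecialist-recorded findings)": 14,
    "Nonspecialist referrers": 13,
    "Dermatologists": 30,
}
for source, n in correct.items():
    print(f"{source}: {n}/{cases} = {n / cases:.0%}")
# Prints 56%, 39%, 36% and 83%, matching the reported rates.
```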
Affiliation(s)
- Sophie Stoneham
- Department of Dermatology, Royal South Hants Hospital, University Hospitals Southampton, Southampton, UK
- Amy Livesey
- Department of Dermatology, St Mary's Hospital, Portsmouth Hospitals University NHS Trust, Portsmouth, UK
- Hywel Cooper
- Department of Dermatology, St Mary's Hospital, Portsmouth Hospitals University NHS Trust, Portsmouth, UK
- Charles Mitchell
- Department of Dermatology, St Mary's Hospital, Portsmouth Hospitals University NHS Trust, Portsmouth, UK

2. Buldur M, Sezer B. Evaluating the accuracy of Chat Generative Pre-trained Transformer version 4 (ChatGPT-4) responses to United States Food and Drug Administration (FDA) frequently asked questions about dental amalgam. BMC Oral Health 2024; 24:605. [PMID: 38789962 PMCID: PMC11127407 DOI: 10.1186/s12903-024-04358-8]
Abstract
BACKGROUND The use of artificial intelligence in the field of health sciences is becoming widespread. Patients are known to consult artificial intelligence applications on various health issues, especially since the pandemic. One of the most important issues in this regard is the accuracy of the information provided by such applications. OBJECTIVE The purpose of this study was to pose the frequently asked questions about dental amalgam, as compiled by the United States Food and Drug Administration (FDA), one such information resource, to Chat Generative Pre-trained Transformer version 4 (ChatGPT-4) and to compare the content of the application's answers with the FDA's answers. METHODS The questions were directed to ChatGPT-4 on May 8th and May 16th, 2023, and the responses were recorded and compared at the word and meaning levels using ChatGPT. The answers from the FDA webpage were also recorded. ChatGPT-4's and the FDA's responses were compared for content similarity in terms of "Main Idea", "Quality Analysis", "Common Ideas", and "Inconsistent Ideas". RESULTS ChatGPT-4 provided similar responses at the one-week interval. In comparison with FDA guidance, it provided answers with similar information content to the frequently asked questions. However, although the recommendations regarding amalgam removal shared some general aspects, the two texts were not identical and offered different perspectives on the replacement of fillings. CONCLUSIONS The findings of this study indicate that ChatGPT-4, an artificial intelligence-based application, encompasses current and accurate information regarding dental amalgam and its removal, and provides it to individuals seeking access to such information. Nevertheless, we believe that numerous studies are required to assess the validity and reliability of ChatGPT-4 across diverse subjects.
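The abstract does not specify how the word-level comparison was carried out; purely as an illustration, a hypothetical word-level similarity check between two short answers might look like the following (difflib stands in for whatever method the authors used, and both answer texts are invented):

```python
# Illustrative word-level similarity between two invented answer texts;
# SequenceMatcher compares the word sequences, not the meaning.
from difflib import SequenceMatcher

fda_answer = "Dental amalgam is a mixture of metals used to fill cavities."
gpt_answer = "Dental amalgam is a metal mixture used for filling cavities."

ratio = SequenceMatcher(None, fda_answer.split(), gpt_answer.split()).ratio()
print(f"word-level similarity: {ratio:.2f}")  # 1.00 would mean identical
```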
Affiliation(s)
- Mehmet Buldur
- Department of Restorative Dentistry, School of Dentistry, Çanakkale Onsekiz Mart University, Çanakkale, Türkiye
- Berkant Sezer
- Department of Pediatric Dentistry, School of Dentistry, Çanakkale Onsekiz Mart University, Çanakkale, Türkiye

3. Devranoglu B, Gurbuz T, Gokmen O. ChatGPT's Efficacy in Queries Regarding Polycystic Ovary Syndrome and Treatment Strategies for Women Experiencing Infertility. Diagnostics (Basel) 2024; 14:1082. [PMID: 38893609 PMCID: PMC11172366 DOI: 10.3390/diagnostics14111082]
Abstract
This study assesses the efficacy of ChatGPT-4, an advanced artificial intelligence (AI) language model, in delivering precise and comprehensive answers to inquiries regarding the management of polycystic ovary syndrome (PCOS)-related infertility. The research team, comprising experienced gynecologists, formulated 460 structured queries encompassing a wide range of common and intricate PCOS scenarios. The queries comprised true/false (170), open-ended (165), and multiple-choice (125) formats and were further classified as 'easy', 'moderate', and 'hard'. For true/false questions, ChatGPT-4 achieved a flawless accuracy rate of 100%, both initially and upon reassessment after 30 days. In the open-ended category, there was a noteworthy enhancement in accuracy, with scores increasing from 5.53 ± 0.89 initially to 5.88 ± 0.43 at the 30-day mark (p < 0.001). Completeness scores for open-ended queries also improved significantly, rising from 2.35 ± 0.58 to 2.92 ± 0.29 (p < 0.001). In the multiple-choice category, the accuracy score exhibited a minor, nonsignificant decline from 5.96 ± 0.44 to 5.92 ± 0.63 after 30 days (p > 0.05). Completeness scores for multiple-choice questions remained consistent, with initial and 30-day means of 2.98 ± 0.18 and 2.97 ± 0.25, respectively (p > 0.05). ChatGPT-4 demonstrated exceptional performance on true/false queries, and its handling of open-ended questions improved significantly over the 30 days. These findings emphasize the potential of AI, particularly ChatGPT-4, in enhancing decision-making support for healthcare professionals managing PCOS-related infertility.
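The abstract reports initial versus 30-day means with p-values but does not name the statistical test; a hedged sketch of one plausible analysis, a paired t-test on hypothetical per-question scores (not the authors' data), is:

```python
# Hypothetical per-question accuracy scores (1-6 scale) for the 165 open-ended
# queries at the initial run and at 30 days; the paired t-test is an assumption
# made for illustration, since the study does not specify its test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
initial = rng.normal(5.53, 0.89, size=165).clip(1, 6)
day30 = (initial + rng.normal(0.35, 0.50, size=165)).clip(1, 6)

t, p = stats.ttest_rel(day30, initial)
print(f"initial {initial.mean():.2f}, day 30 {day30.mean():.2f}, p = {p:.2g}")
```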
Affiliation(s)
- Belgin Devranoglu
- Department of Obstetrics and Gynecology, Zeynep Kamil Maternity/Children, Education and Training Hospital, Istanbul 34480, Turkey
- Tugba Gurbuz
- Department of Gynecology and Obstetrics Clinic, Medistate Hospital, Istanbul 34820, Turkey
- Oya Gokmen
- Department of Gynecology, Obstetrics and In Vitro Fertilization Clinic, Medistate Hospital, Istanbul 34820, Turkey

4. Cil G, Dogan K. The efficacy of artificial intelligence in urology: a detailed analysis of kidney stone-related queries. World J Urol 2024; 42:158. [PMID: 38483582 PMCID: PMC10940482 DOI: 10.1007/s00345-024-04847-z]
Abstract
PURPOSE The study aimed to assess the efficacy of OpenAI's advanced AI model, ChatGPT, in diagnosing urological conditions, focusing on kidney stones. MATERIALS AND METHODS A set of 90 structured questions, compliant with the EAU Guidelines 2023, was curated by seasoned urologists for this investigation. We evaluated ChatGPT's performance based on the accuracy and completeness of its responses to two types of questions [binary (true/false) and descriptive (multiple-choice)], stratified into three difficulty levels: easy, moderate, and complex. Furthermore, we analyzed the model's learning and adaptability capacity by reassessing the initially incorrect responses after a 2-week interval. RESULTS The model demonstrated commendable accuracy, correctly answering 80% of binary questions (n = 45) and 93.3% of descriptive questions (n = 45). The model's performance showed no significant variation across question difficulty levels (p = 0.548 for accuracy and p = 0.417 for completeness). Upon reassessment of the 12 initially incorrect responses (9 binary, 3 descriptive) after two weeks, ChatGPT's accuracy showed substantial improvement: the mean accuracy score increased significantly from 1.58 ± 0.51 to 2.83 ± 0.93 (p = 0.004), underlining the model's ability to learn and adapt over time. CONCLUSION These findings highlight the potential of ChatGPT in urological diagnostics but also underscore areas requiring enhancement, especially the completeness of responses to complex queries. The study endorses AI's incorporation into healthcare while advocating prudence and professional supervision in its application.
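The test behind p = 0.004 is not named in the abstract; one hedged possibility for twelve paired scores is a Wilcoxon signed-rank test, sketched here with hypothetical 1-3 accuracy scores rather than the authors' data:

```python
# Hypothetical 1-3 accuracy scores for the 12 initially incorrect responses,
# before and after the 2-week reassessment; the Wilcoxon signed-rank test is
# an illustrative assumption, not the study's stated method.
from scipy.stats import wilcoxon

before = [2, 2, 1, 2, 1, 2, 1, 2, 2, 1, 2, 1]  # mean ~1.58
after  = [3, 3, 2, 3, 3, 3, 3, 3, 3, 2, 3, 3]  # mean ~2.83
stat, p = wilcoxon(before, after)
print(f"W = {stat}, p = {p:.4f}")
```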
Affiliation(s)
- Gökhan Cil
- Department of Urology, Bagcilar Training and Research Hospital, University of Health Sciences, Istanbul, Turkey
- Kazim Dogan
- Department of Urology, Faculty of Medicine, Istinye University, Istanbul, Turkey

5. Lee Y, Kim SY. Potential applications of ChatGPT in obstetrics and gynecology in Korea: a review article. Obstet Gynecol Sci 2024; 67:153-159. [PMID: 38247132 PMCID: PMC10948210 DOI: 10.5468/ogs.23231]
Abstract
The use of chatbot technology, particularly chat generative pre-trained transformer (ChatGPT) with an impressive 175 billion parameters, has garnered significant attention across various domains, including obstetrics and gynecology (OBGYN). This comprehensive review delves into the transformative potential of chatbots, with a special focus on ChatGPT as a leading artificial intelligence (AI) technology. ChatGPT harnesses the power of deep learning algorithms to generate responses that closely mimic human language, opening up myriad applications in medicine, research, and education. In the field of medicine, ChatGPT can play a pivotal role in diagnosis, treatment, and personalized patient education. Notably, the technology has demonstrated remarkable capabilities, surpassing human performance on OBGYN examinations and delivering highly accurate diagnoses. However, challenges remain, including the need to verify the accuracy of generated information and to address ethical considerations and limitations. Across the wider scope of chatbot technology, AI systems play a vital role in healthcare processes, including documentation, diagnosis, research, and education. Although promising, their limitations and occasional inaccuracies require validation by healthcare professionals. This review also examines global chatbot adoption in healthcare, emphasizing the need for user awareness to ensure patient safety. Chatbot technology holds great promise in OBGYN and medicine, offering innovative solutions while necessitating responsible integration to ensure patient care and safety.
Affiliation(s)
- YooKyung Lee
- Division of Maternal Fetal Medicine, Department of Obstetrics and Gynecology, MizMedi Hospital, Seoul, Korea
- So Yun Kim
- Division of Maternal Fetal Medicine, Department of Obstetrics and Gynecology, MizMedi Hospital, Seoul, Korea

6. Peng Z, Ma R, Zhang Y, Yan M, Lu J, Cheng Q, Liao J, Zhang Y, Wang J, Zhao Y, Zhu J, Qin B, Jiang Q, Shi F, Qian J, Chen X, Zhao C. Development and evaluation of multimodal AI for diagnosis and triage of ophthalmic diseases using ChatGPT and anterior segment images: protocol for a two-stage cross-sectional study. Front Artif Intell 2023; 6:1323924. [PMID: 38145231 PMCID: PMC10748413 DOI: 10.3389/frai.2023.1323924]
Abstract
Introduction Artificial intelligence (AI) technology has made rapid progress in disease diagnosis and triage. In the field of ophthalmic diseases, image-based diagnosis has achieved high accuracy but still encounters limitations due to the lack of medical history. The emergence of ChatGPT enables human-computer interaction, allowing for the development of a multimodal AI system that integrates interactive text and image information. Objective To develop a multimodal AI system using ChatGPT and anterior segment images for diagnosing and triaging ophthalmic diseases, and to assess the system's performance through a two-stage cross-sectional study, starting with a silent evaluation followed by an early clinical evaluation in outpatient clinics. Methods and analysis Our study will be conducted across three distinct centers in Shanghai, Nanjing, and Suqian. The development of the smartphone-based multimodal AI system will take place in Shanghai, with the goal of achieving ≥90% sensitivity and ≥95% specificity for diagnosing and triaging ophthalmic diseases. The first stage of the cross-sectional study will explore the system's performance in Shanghai's outpatient clinics. Medical histories will be collected without patient interaction, and anterior segment images will be captured using slit lamp equipment. This stage aims for ≥85% sensitivity and ≥95% specificity with a sample size of 100 patients. The second stage will take place at all three locations, with Shanghai serving as the internal validation dataset, and Nanjing and Suqian as the external validation datasets. Medical history will be collected through patient interviews, and anterior segment images will be captured via smartphone devices. An expert panel will establish reference standards and assess AI accuracy for diagnosis and triage throughout all stages. A one-vs.-rest strategy will be used for data analysis, and a post-hoc power calculation will be performed to evaluate the impact of disease types on AI performance. Discussion Our study may provide a user-friendly smartphone-based multimodal AI system for the diagnosis and triage of ophthalmic diseases. This innovative system may support early detection of ocular abnormalities, facilitate the establishment of a tiered healthcare system, and reduce the burden on tertiary facilities. Trial registration The study was registered on ClinicalTrials.gov on June 25, 2023 (NCT05930444).
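The protocol's one-vs.-rest analysis treats each disease category as the positive class in turn, with every other category pooled as negative; a minimal sketch under that reading (all labels and predictions below are hypothetical):

```python
# Minimal one-vs-rest sketch: each class is scored against all others,
# yielding the per-class sensitivity and specificity the protocol targets.
def one_vs_rest_metrics(y_true, y_pred, classes):
    metrics = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        tn = sum(t != c and p != c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        metrics[c] = {
            "sensitivity": tp / (tp + fn) if tp + fn else None,
            "specificity": tn / (tn + fp) if tn + fp else None,
        }
    return metrics

# Hypothetical reference-standard labels and AI predictions.
y_true = ["cataract", "pterygium", "keratitis", "cataract", "pterygium"]
y_pred = ["cataract", "cataract", "keratitis", "cataract", "pterygium"]
print(one_vs_rest_metrics(y_true, y_pred, ["cataract", "keratitis", "pterygium"]))
```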
Affiliation(s)
- Zhiyu Peng
- Department of Ophthalmology, Fudan Eye & ENT Hospital, Shanghai, China
- Department of Ophthalmology, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
- Laboratory of Myopia, Chinese Academy of Medical Sciences, Shanghai, China
- NHC Key Laboratory of Myopia, Fudan University, Shanghai, China
- Ruiqi Ma
- Department of Ophthalmology, Fudan Eye & ENT Hospital, Shanghai, China
- Laboratory of Myopia, Chinese Academy of Medical Sciences, Shanghai, China
- NHC Key Laboratory of Myopia, Fudan University, Shanghai, China
- Yihan Zhang
- Department of Ophthalmology, Fudan Eye & ENT Hospital, Shanghai, China
- Laboratory of Myopia, Chinese Academy of Medical Sciences, Shanghai, China
- NHC Key Laboratory of Myopia, Fudan University, Shanghai, China
- Mingxu Yan
- Department of Ophthalmology, Fudan Eye & ENT Hospital, Shanghai, China
- Laboratory of Myopia, Chinese Academy of Medical Sciences, Shanghai, China
- NHC Key Laboratory of Myopia, Fudan University, Shanghai, China
- School of Basic Medical Sciences, Fudan University, Shanghai, China
- Jie Lu
- Department of Ophthalmology, Fudan Eye & ENT Hospital, Shanghai, China
- Laboratory of Myopia, Chinese Academy of Medical Sciences, Shanghai, China
- NHC Key Laboratory of Myopia, Fudan University, Shanghai, China
- School of Public Health, Fudan University, Shanghai, China
- Qian Cheng
- Medical Image Processing, Analysis, and Visualization (MIVAP) Lab, School of Electronics and Information Engineering, Soochow University, Suzhou, China
- Jingjing Liao
- Medical Image Processing, Analysis, and Visualization (MIVAP) Lab, School of Electronics and Information Engineering, Soochow University, Suzhou, China
- Yunqiu Zhang
- School of Public Health, Fudan University, Shanghai, China
- Jinghan Wang
- Department of Ophthalmology, Fudan Eye & ENT Hospital, Shanghai, China
- Laboratory of Myopia, Chinese Academy of Medical Sciences, Shanghai, China
- NHC Key Laboratory of Myopia, Fudan University, Shanghai, China
- Yue Zhao
- The Affiliated Eye Hospital, Nanjing Medical University, Nanjing, China
- Jiang Zhu
- Department of Ophthalmology, Suqian First Hospital, Suqian, China
- Bing Qin
- Department of Ophthalmology, Suqian First Hospital, Suqian, China
- Qin Jiang
- The Affiliated Eye Hospital, Nanjing Medical University, Nanjing, China
- The Fourth School of Clinical Medicine, Nanjing Medical University, Nanjing, China
- Fei Shi
- Medical Image Processing, Analysis, and Visualization (MIVAP) Lab, School of Electronics and Information Engineering, Soochow University, Suzhou, China
- Jiang Qian
- Department of Ophthalmology, Fudan Eye & ENT Hospital, Shanghai, China
- Laboratory of Myopia, Chinese Academy of Medical Sciences, Shanghai, China
- NHC Key Laboratory of Myopia, Fudan University, Shanghai, China
- Xinjian Chen
- Medical Image Processing, Analysis, and Visualization (MIVAP) Lab, School of Electronics and Information Engineering, Soochow University, Suzhou, China
- State Key Laboratory of Radiation Medicine and Protection, Soochow University, Suzhou, China
- Chen Zhao
- Department of Ophthalmology, Fudan Eye & ENT Hospital, Shanghai, China
- Laboratory of Myopia, Chinese Academy of Medical Sciences, Shanghai, China
- NHC Key Laboratory of Myopia, Fudan University, Shanghai, China

7. Chen TC, Multala E, Kearns P, Delashaw J, Dumont A, Maraganore D, Wang A. Assessment of ChatGPT's performance on neurology written board examination questions. BMJ Neurol Open 2023; 5:e000530. [PMID: 37936648 PMCID: PMC10626870 DOI: 10.1136/bmjno-2023-000530]
Abstract
Background and objectives ChatGPT has shown promise in healthcare. To assess the utility of this novel tool in healthcare education, we evaluated ChatGPT's performance in answering neurology board exam questions. Methods Neurology board-style examination questions were accessed from BoardVitals, a commercial neurology question bank. ChatGPT was provided with the full question prompt and the multiple answer choices, and was given up to three attempts to select the correct answer. A total of 560 questions (14 blocks of 40 questions) were used; image-based questions were disregarded because ChatGPT cannot process visual input. The artificial intelligence (AI) answers were then compared with human user data provided by the question bank to gauge its performance. Results Out of 509 eligible questions over 14 question blocks, ChatGPT correctly answered 335 questions (65.8%) on the first attempt and 383 (75.3%) over three attempts, scoring at approximately the 26th and 50th percentiles, respectively. The highest-performing subjects were pain (100%), epilepsy & seizures (85%) and genetics (82%), while the lowest-performing subjects were imaging/diagnostic studies (27%), critical care (41%) and cranial nerves (48%). Discussion This study found that ChatGPT performed similarly to its human counterparts. The accuracy of the AI increased with multiple attempts, and its performance fell within the expected range of neurology resident learners. This study demonstrates ChatGPT's potential for processing specialised medical information. Future studies should better define the extent to which AI can be integrated into medical decision-making.
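The percentile figures come from placing ChatGPT's percent-correct within the question bank's human user score distribution; as a purely illustrative sketch, with a hypothetical normal distribution whose parameters are chosen only so the mapping roughly reproduces the reported 26th and 50th percentiles (this is not BoardVitals data):

```python
# Map a raw percent-correct to a percentile against a hypothetical human
# score distribution (parameters invented to roughly match the abstract).
import numpy as np

rng = np.random.default_rng(7)
human_scores = rng.normal(75.3, 15.0, size=10_000)

def percentile_of(score, population):
    """Percent of the population scoring strictly below `score`."""
    return 100.0 * np.mean(population < score)

print(f"65.8% correct -> ~{percentile_of(65.8, human_scores):.0f}th percentile")
print(f"75.3% correct -> ~{percentile_of(75.3, human_scores):.0f}th percentile")
```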
Affiliation(s)
- Tse Chian Chen
- Neurology, Tulane University School of Medicine, New Orleans, Louisiana, USA
- Evan Multala
- Tulane University School of Medicine, New Orleans, Louisiana, USA
- Patrick Kearns
- Tulane University School of Medicine, New Orleans, Louisiana, USA
- Johnny Delashaw
- Neurosurgery, Tulane University School of Medicine, New Orleans, Louisiana, USA
- Aaron Dumont
- Neurosurgery, Tulane University School of Medicine, New Orleans, Louisiana, USA
- Arthur Wang
- Neurosurgery, Tulane University School of Medicine, New Orleans, Louisiana, USA