1. Ming S, Guo Q, Cheng W, Lei B. Influence of Model Evolution and System Roles on ChatGPT's Performance in Chinese Medical Licensing Exams: Comparative Study. JMIR Med Educ 2024;10:e52784. PMID: 39140269. DOI: 10.2196/52784.
Abstract
Background With the increasing application of large language models such as ChatGPT across industries, their potential in the medical domain, especially in standardized examinations, has become a focal point of research. Objective The aim of this study was to assess the clinical performance of ChatGPT, focusing on its accuracy and reliability on the Chinese National Medical Licensing Examination (CNMLE). Methods The CNMLE 2022 question set, consisting of 500 single-answer multiple-choice questions, was reclassified into 15 medical subspecialties. Each question was tested 8 to 12 times in Chinese on the OpenAI platform from April 24 to May 15, 2023. Three key factors were considered: the model version (GPT-3.5 vs GPT-4.0), the prompt's designation of system roles tailored to medical subspecialties, and repetition for coherence. The passing accuracy threshold was set at 60%. χ2 tests and κ values were used to evaluate the model's accuracy and consistency. Results GPT-4.0 achieved a passing accuracy of 72.7%, significantly higher than that of GPT-3.5 (54%; P<.001). The variability rate of repeated responses from GPT-4.0 was lower than that of GPT-3.5 (9% vs 19.5%; P<.001). Both models nevertheless showed relatively good response coherence, with κ values of 0.778 and 0.610, respectively. System roles numerically increased accuracy for both GPT-4.0 (0.3%-3.7%) and GPT-3.5 (1.3%-4.5%) and reduced variability by 1.7% and 1.8%, respectively (P>.05). In subgroup analysis, ChatGPT achieved comparable accuracy across question types (P>.05). GPT-4.0 surpassed the accuracy threshold in 14 of 15 subspecialties on the first response, whereas GPT-3.5 did so in 7 of 15. Conclusions GPT-4.0 passed the CNMLE and outperformed GPT-3.5 in key areas such as accuracy, consistency, and medical subspecialty expertise. Adding a system role enhanced the model's reliability and answer coherence, although not significantly. GPT-4.0 showed promising potential in medical education and clinical practice, meriting further study.
Affiliation(s)
- Shuai Ming
  - Department of Ophthalmology, Henan Eye Hospital, Henan Provincial People's Hospital, Zhengzhou, China
  - Eye Institute, Henan Academy of Innovations in Medical Science, Zhengzhou, China
  - Henan Clinical Research Center for Ocular Diseases, People's Hospital of Zhengzhou University, Zhengzhou, China
- Qingge Guo
  - Department of Ophthalmology, Henan Eye Hospital, Henan Provincial People's Hospital, Zhengzhou, China
  - Eye Institute, Henan Academy of Innovations in Medical Science, Zhengzhou, China
  - Henan Clinical Research Center for Ocular Diseases, People's Hospital of Zhengzhou University, Zhengzhou, China
- Wenjun Cheng
  - Department of Ophthalmology, People's Hospital of Zhengzhou University, Zhengzhou, China
- Bo Lei
  - Department of Ophthalmology, Henan Eye Hospital, Henan Provincial People's Hospital, Zhengzhou, China
  - Eye Institute, Henan Academy of Innovations in Medical Science, Zhengzhou, China
  - Henan Clinical Research Center for Ocular Diseases, People's Hospital of Zhengzhou University, Zhengzhou, China
2. García-Alonso EM, León-Mejía AC, Sánchez-Cabrero R, Guzmán-Ordaz R. Training and Technology Acceptance of ChatGPT in University Students of Social Sciences: A Netcoincidental Analysis. Behav Sci (Basel) 2024;14:612. PMID: 39062435. PMCID: PMC11274043. DOI: 10.3390/bs14070612.
Abstract
This study analyzes the perception and usage of ChatGPT through the lens of the technology acceptance model (TAM). Applying reticular analysis of coincidences (RAC) to a convenience survey of university students in the social sciences, it examines how this artificial intelligence tool is perceived and used, considering variables such as gender, academic year, prior experience with ChatGPT, and the training provided by university faculty. The networks created with the statistical tool "CARING" highlight the role of perceived utility, credibility, and prior experience in shaping attitudes and behaviors toward this emerging technology. Previous experience, familiarity with video games, and programming knowledge were associated with more favorable attitudes toward ChatGPT, whereas students who received specific training showed lower confidence in the tool. These findings underscore the importance of implementing training strategies that raise students' awareness of both the potential strengths and weaknesses of artificial intelligence in educational contexts.
Affiliation(s)
- Elena María García-Alonso
  - Department of Sociology and Communication, Faculty of Social Sciences, University of Salamanca, 37007 Salamanca, Spain
- Ana Cristina León-Mejía
  - Department of Sociology and Communication, Faculty of Social Sciences, University of Salamanca, 37007 Salamanca, Spain
- Roberto Sánchez-Cabrero
  - Department of Evolutionary Psychology and Education, Faculty of Teacher Training and Education, Autonomous University of Madrid, 28049 Madrid, Spain
- Raquel Guzmán-Ordaz
  - Department of Sociology and Communication, Faculty of Social Sciences, University of Salamanca, 37007 Salamanca, Spain
3. Worthing KA, Roberts M, Šlapeta J. Surveyed veterinary students in Australia find ChatGPT practical and relevant while expressing no concern about artificial intelligence replacing veterinarians. Vet Rec Open 2024;11:e280. PMID: 38854916. PMCID: PMC11162838. DOI: 10.1002/vro2.80.
Abstract
Background Chat Generative Pre-trained Transformer (ChatGPT) is a freely available online artificial intelligence (AI) program capable of understanding and generating human-like language. This study assessed veterinary students' perceptions about ChatGPT in education and practice. It compared perceptions about ChatGPT between students who had completed a critical analysis task and those who had not. Methods This cross-sectional study surveyed 498 Doctor of Veterinary Medicine (DVM) students at The University of Sydney, Australia. Second-year DVM students researched a veterinary pathogen and then completed a critical analysis of ChatGPT (version 3.5) output for the same pathogen. A survey based on the Technology Acceptance Model was then delivered to all DVM students from all years of the programme, collecting data using Likert-style, categorical and free-text items. Results Over 75% of the 100 respondents reported having used ChatGPT. The students found ChatGPT's output relevant and practical for their use but perceived it as inaccurate. They perceived ChatGPT output to be more useful for veterinary students than for pet owners or veterinarians. Those who had completed the critical analysis assignment had a more positive view of ChatGPT's practicality for veterinary students but noted its authoritative tone even when delivering inaccurate information. Over 50% of the students agreed that information about tools such as ChatGPT should be included in the veterinary curriculum. Students agreed that veterinarians should embrace AI but disagreed that AI would eventually replace the need for veterinarians. Conclusions A critical appraisal of outputs from AI tools such as ChatGPT may help prepare future veterinarians for the effective use of these tools.
Affiliation(s)
- Kate A. Worthing
  - Sydney School of Veterinary Science, Faculty of Science, The University of Sydney, Sydney, New South Wales, Australia
  - Sydney Infectious Diseases Institute, The University of Sydney, Sydney, New South Wales, Australia
- Madeleine Roberts
  - Sydney School of Veterinary Science, Faculty of Science, The University of Sydney, Sydney, New South Wales, Australia
- Jan Šlapeta
  - Sydney School of Veterinary Science, Faculty of Science, The University of Sydney, Sydney, New South Wales, Australia
4. Buldur M, Sezer B. Evaluating the accuracy of Chat Generative Pre-trained Transformer version 4 (ChatGPT-4) responses to United States Food and Drug Administration (FDA) frequently asked questions about dental amalgam. BMC Oral Health 2024;24:605. PMID: 38789962. PMCID: PMC11127407. DOI: 10.1186/s12903-024-04358-8.
Abstract
BACKGROUND The use of artificial intelligence in the health sciences is becoming widespread, and patients have benefited from artificial intelligence applications on various health issues, especially since the pandemic period. One of the most important issues in this regard is the accuracy of the information these applications provide. OBJECTIVE The purpose of this study was to pose the frequently asked questions about dental amalgam, as determined by the United States Food and Drug Administration (FDA), one of these information resources, to Chat Generative Pre-trained Transformer version 4 (ChatGPT-4) and to compare the content of the application's answers with the answers of the FDA. METHODS The questions were directed to ChatGPT-4 on May 8 and May 16, 2023, and the responses were recorded and compared at the word and meaning levels using ChatGPT. The answers from the FDA webpage were also recorded. ChatGPT-4's responses and the FDA's responses were compared for content similarity in terms of "Main Idea", "Quality Analysis", "Common Ideas", and "Inconsistent Ideas". RESULTS ChatGPT-4 provided similar responses at the one-week interval. In comparison with the FDA guidance, it provided answers with similar information content to the frequently asked questions. However, although there were some general similarities in the recommendations regarding amalgam removal, the two texts were not identical and offered different perspectives on the replacement of fillings. CONCLUSIONS The findings of this study indicate that ChatGPT-4, an artificial intelligence-based application, encompasses current and accurate information regarding dental amalgam and its removal, providing it to individuals seeking access to such information. Nevertheless, further studies are required to assess the validity and reliability of ChatGPT-4 across diverse subjects.
Affiliation(s)
- Mehmet Buldur
  - Department of Restorative Dentistry, School of Dentistry, Çanakkale Onsekiz Mart University, Çanakkale, Türkiye
- Berkant Sezer
  - Department of Pediatric Dentistry, School of Dentistry, Çanakkale Onsekiz Mart University, Çanakkale, Türkiye
5. Bhatia AP, Lambat A, Jain T. A Comparative Analysis of Conventional and Chat-Generative Pre-trained Transformer-Assisted Teaching Methods in Undergraduate Dental Education. Cureus 2024;16:e60006. PMID: 38854264. PMCID: PMC11162508. DOI: 10.7759/cureus.60006.
Abstract
INTRODUCTION In the present era, individuals can improve their study organization, class attendance, and use of mnemonics through contemporary technology. The use of OpenAI's Chat Generative Pre-Trained Transformer (ChatGPT) in dentistry is a developing domain, and the integration of this technology into dental education depends on the accessibility and efficacy of AI technology as well as the readiness of institutions to adopt it. Furthermore, it is crucial to consider the possible ethical ramifications of using AI in dentistry and the need for dental practitioners to be adequately trained in its use; incorporating ChatGPT into the dentistry curriculum would require thorough evaluation and consultation with field specialists. This study aimed to determine whether ChatGPT is more effective than conventional teaching methods in teaching undergraduate dental students. METHOD Comparative research was conducted at Shri. Yashwantrao Chavan Memorial Medical and Rural Development Foundation's Dental College, Ahmednagar. Computer-generated random numbers were used to divide 100 students into two groups of 50 students each. A didactic lecture was given using PowerPoint (Redmond, WA: Microsoft Corp.) to both groups; Group A was then given textbooks to read, while Group B used ChatGPT. A pre-validated online questionnaire using Google Forms (Menlo Park, CA: Google LLC) was sent via email to both groups, and the pre- and post-test scores were compared using the t-test. RESULT The calculated t-value was 12.263 (at 81 degrees of freedom) with a p-value below 0.01. Therefore, the null hypothesis was rejected, and it was concluded that conventional-method and ChatGPT-method post-test scores differed highly significantly. It was also observed that the mean post-test scores for the conventional method were higher than those for the ChatGPT method. CONCLUSION The study concluded that traditional teaching methods are more effective for learning than ChatGPT-assisted teaching.
Affiliation(s)
- Amrita P Bhatia
  - Prosthodontics, Shri. Yashwantrao Chavan Memorial Medical and Rural Development Foundation's Dental College, Ahmednagar, IND
- Apurva Lambat
  - Prosthodontics, Shri. Yashwantrao Chavan Memorial Medical and Rural Development Foundation's Dental College, Ahmednagar, IND
- Teerthesh Jain
  - General Dentistry, Affordable Dentures and Implants, Indianapolis, USA
6. Sallam M, Barakat M, Sallam M. A Preliminary Checklist (METRICS) to Standardize the Design and Reporting of Studies on Generative Artificial Intelligence-Based Models in Health Care Education and Practice: Development Study Involving a Literature Review. Interact J Med Res 2024;13:e54704. PMID: 38276872. PMCID: PMC10905357. DOI: 10.2196/54704.
Abstract
BACKGROUND Adherence to evidence-based practice is indispensable in health care. Recently, the utility of generative artificial intelligence (AI) models in health care has been evaluated extensively. However, the lack of consensus guidelines on the design and reporting of findings of these studies poses a challenge for the interpretation and synthesis of evidence. OBJECTIVE This study aimed to develop a preliminary checklist to standardize the reporting of generative AI-based studies in health care education and practice. METHODS A literature review was conducted in Scopus, PubMed, and Google Scholar. Published records with "ChatGPT," "Bing," or "Bard" in the title were retrieved. The methodologies employed in the included records were examined carefully to identify common pertinent themes and possible gaps in reporting. A panel discussion was held to establish a unified and thorough checklist for the reporting of AI studies in health care. The finalized checklist was used by 2 independent raters to evaluate the included records, with Cohen κ used to evaluate interrater reliability. RESULTS The final data set that formed the basis for pertinent theme identification and analysis comprised a total of 34 records. The finalized checklist included 9 pertinent themes, collectively referred to as METRICS (Model, Evaluation, Timing, Range/Randomization, Individual factors, Count, and Specificity of prompts and language), as follows: (1) Model used and its exact settings; (2) Evaluation approach for the generated content; (3) Timing of testing the model; (4) Transparency of the data source; (5) Range of tested topics; (6) Randomization of selecting the queries; (7) Individual factors in selecting the queries and interrater reliability; (8) Count of queries executed to test the model; and (9) Specificity of the prompts and language used. The overall mean METRICS score was 3.0 (SD 0.58). Interrater reliability was acceptable, with Cohen κ ranging from 0.558 to 0.962 (P<.001 for the 9 tested items). Classified per item, the highest average METRICS score was recorded for the "Model" item, followed by the "Specificity" item, while the lowest scores were recorded for the "Randomization" item (classified as suboptimal) and the "Individual factors" item (classified as satisfactory). CONCLUSIONS The METRICS checklist can facilitate the design of studies and guide researchers toward best practices in reporting results. The findings highlight the need for standardized reporting algorithms for generative AI-based studies in health care, given the variability observed in methodologies and reporting. The proposed METRICS checklist could be a helpful preliminary base for establishing a universally accepted approach to standardizing the design and reporting of generative AI-based studies in health care, a swiftly evolving research topic.
Affiliation(s)
- Malik Sallam
  - Department of Pathology, Microbiology and Forensic Medicine, School of Medicine, The University of Jordan, Amman, Jordan
  - Department of Clinical Laboratories and Forensic Medicine, Jordan University Hospital, Amman, Jordan
  - Department of Translational Medicine, Faculty of Medicine, Lund University, Malmo, Sweden
- Muna Barakat
  - Department of Clinical Pharmacy and Therapeutics, Faculty of Pharmacy, Applied Science Private University, Amman, Jordan
- Mohammed Sallam
  - Department of Pharmacy, Mediclinic Parkview Hospital, Mediclinic Middle East, Dubai, United Arab Emirates
7. Robleto E, Habashi A, Kaplan MAB, Riley RL, Zhang C, Bianchi L, Shehadeh LA. Medical students' perceptions of an artificial intelligence (AI) assisted diagnosing program. Med Teach 2024:1-7. PMID: 38306667. DOI: 10.1080/0142159x.2024.2305369.
Abstract
As artificial intelligence (AI)-assisted diagnostic systems become accessible and user-friendly, evaluating how first-year medical students perceive such systems holds substantial importance in medical education. This study aimed to assess medical students' perceptions of an AI-assisted diagnostic tool known as 'Glass AI.' Data were collected from first-year medical students enrolled in a 1.5-week Cell Physiology pre-clerkship unit. Students voluntarily participated in an activity that involved using Glass AI to solve a clinical case. A questionnaire was designed around 3 domains: 1) immediate experience with Glass AI, 2) potential for Glass AI utilization in medical education, and 3) student deliberations on AI-assisted diagnostic systems in future healthcare environments. 73/202 (36.1%) of students completed the survey. 96% of the participants noted that Glass AI increased confidence in the diagnosis, 43% thought Glass AI lacked sufficient explanation, and 68% expressed concerns about risks to the physician workforce. Students expressed positive outlooks for AI-assisted diagnostic systems in future healthcare, provided strict regulations are set to protect patient privacy and safety, address legal liability, remove system biases, and improve quality of patient care. In conclusion, first-year medical students are aware that AI will play a role in their careers as students and future physicians.
Affiliation(s)
- Emely Robleto
  - Department of Medicine, Division of Cardiology, University of Miami Miller School of Medicine, Miami, FL, USA
  - Interdisciplinary Stem Cell Institute, University of Miami Miller School of Medicine, Miami, FL, USA
- Ali Habashi
  - Department of Cinematic Arts, School of Communication, University of Miami, Miami, FL, USA
- Mary-Ann Benites Kaplan
  - Department of Medical Education, University of Miami Miller School of Medicine, Miami, FL, USA
- Richard L Riley
  - Department of Medical Education, University of Miami Miller School of Medicine, Miami, FL, USA
  - Department of Microbiology and Immunology, University of Miami Miller School of Medicine, Miami, FL, USA
- Chi Zhang
  - Department of Medical Education, University of Miami Miller School of Medicine, Miami, FL, USA
- Laura Bianchi
  - Department of Physiology and Biophysics, University of Miami Miller School of Medicine, Miami, FL, USA
- Lina A Shehadeh
  - Department of Medicine, Division of Cardiology, University of Miami Miller School of Medicine, Miami, FL, USA
  - Interdisciplinary Stem Cell Institute, University of Miami Miller School of Medicine, Miami, FL, USA
  - Department of Medical Education, University of Miami Miller School of Medicine, Miami, FL, USA
8. Thorat VA, Rao P, Joshi N, Talreja P, Shetty A. The Role of Chatbot GPT Technology in Undergraduate Dental Education. Cureus 2024;16:e54193. PMID: 38496058. PMCID: PMC10942112. DOI: 10.7759/cureus.54193.
Abstract
This comprehensive article explores the transformative role of Chatbot GPT, based on the GPT-3 architecture, in revolutionizing dental education. The focus is on its impact across various facets, including personalized learning pathways, integration into virtual patient simulation scenarios, 24/7 accessibility, multilingual support, interactive dental dictionary functionality, evidence-based learning, and assessment and evaluation of dental students. The objective is to showcase how Chatbot GPT enhances educational experiences, promotes inclusivity, and aligns with contemporary pedagogical principles. Chatbot GPT emerges as a powerful ally in dental education, offering personalized learning experiences, risk-free clinical simulations, continuous accessibility, multilingual support, instant terminology assistance, evidence-based learning resources, and real-time assessment capabilities. Its adaptability caters to diverse learning needs, fostering a learner-centered approach and promoting lifelong learning for both dental students and practitioners. As a versatile tool, Chatbot GPT not only transforms the educational journey but also serves as a valuable asset for continuous professional development in the dynamic landscape of dentistry.
Affiliation(s)
- Vinayak A Thorat
  - Department of Periodontology, Bharati Vidyapeeth (Deemed to Be University) Dental College and Hospital, Navi Mumbai, IND
- Prajakta Rao
  - Department of Periodontology, Bharati Vidyapeeth (Deemed to Be University) Dental College and Hospital, Navi Mumbai, IND
- Nilesh Joshi
  - Department of Periodontology, Bharati Vidyapeeth (Deemed to Be University) Dental College and Hospital, Navi Mumbai, IND
- Prakash Talreja
  - Department of Periodontology, Bharati Vidyapeeth (Deemed to Be University) Dental College and Hospital, Navi Mumbai, IND
- Anupa Shetty
  - Department of Periodontology, Bharati Vidyapeeth (Deemed to Be University) Dental College and Hospital, Navi Mumbai, IND
9. Kapsali MZ, Livanis E, Tsalikidis C, Oikonomou P, Voultsos P, Tsaroucha A. Ethical Concerns About ChatGPT in Healthcare: A Useful Tool or the Tombstone of Original and Reflective Thinking? Cureus 2024;16:e54759. PMID: 38523987. PMCID: PMC10961144. DOI: 10.7759/cureus.54759.
Abstract
Artificial intelligence (AI), the fast-rising field of computer science that aims to create digital systems with human-like behavior and intelligence, seems to have entered almost every field of modern life. Launched in November 2022, ChatGPT (Chat Generative Pre-trained Transformer) is a textual AI application capable of creating human-like responses characterized by original language and high coherence. Although AI-based language models have demonstrated impressive capabilities in healthcare, ChatGPT has drawn controversy in the scientific and academic communities. This chatbot already appears to have a massive impact as an educational tool for healthcare professionals, has transformative potential for clinical practice, and could lead to dramatic changes in scientific research. Nevertheless, reasonable concerns have been raised about whether pre-trained, AI-generated text threatens not only original thinking and new scientific ideas but also academic and research integrity, as its AI origin becomes increasingly difficult to distinguish owing to the coherence and fluency of the produced text. This short review aims to summarize the potential applications and consequent implications of ChatGPT in the three critical pillars of medicine: education, research, and clinical practice. In addition, this paper discusses whether the current use of this chatbot complies with the ethical principles for the safe use of AI in healthcare, as determined by the World Health Organization. Finally, this review highlights the need for an updated ethical framework and increased vigilance by healthcare stakeholders to harness the potential benefits and limit the imminent dangers of this innovative new technology.
Affiliation(s)
- Marina Z Kapsali
  - Postgraduate Program on Bioethics, Laboratory of Bioethics, Democritus University of Thrace, Alexandroupolis, GRC
- Efstratios Livanis
  - Department of Accounting and Finance, University of Macedonia, Thessaloniki, GRC
- Christos Tsalikidis
  - Department of General Surgery, Democritus University of Thrace, Alexandroupolis, GRC
- Panagoula Oikonomou
  - Laboratory of Experimental Surgery, Department of General Surgery, Democritus University of Thrace, Alexandroupolis, GRC
- Polychronis Voultsos
  - Laboratory of Forensic Medicine & Toxicology (Medical Law and Ethics), School of Medicine, Faculty of Health Sciences, Aristotle University of Thessaloniki, Thessaloniki, GRC
- Aleka Tsaroucha
  - Department of General Surgery, Democritus University of Thrace, Alexandroupolis, GRC
10. Sallam M, Barakat M, Sallam M. Pilot Testing of a Tool to Standardize the Assessment of the Quality of Health Information Generated by Artificial Intelligence-Based Models. Cureus 2023;15:e49373. PMID: 38024074. PMCID: PMC10674084. DOI: 10.7759/cureus.49373.
Abstract
Background Artificial intelligence (AI)-based conversational models, such as Chat Generative Pre-trained Transformer (ChatGPT), Microsoft Bing, and Google Bard, have emerged as valuable sources of health information for lay individuals. However, the accuracy of the information provided by these AI models remains a significant concern. This pilot study aimed to test a new tool, referred to as "CLEAR", designed to assess the quality of health information delivered by AI-based models across five key themes: Completeness of content, Lack of false information in the content, Evidence supporting the content, Appropriateness of the content, and Relevance. Methods Tool development involved a literature review on health information quality, followed by the initial establishment of the CLEAR tool, comprising five items that assess completeness, lack of false information, evidence support, appropriateness, and relevance. Each item was scored on a five-point Likert scale from excellent to poor. Content validity was checked by expert review. Pilot testing involved 32 healthcare professionals using the CLEAR tool to assess content on eight different health topics deliberately designed with varying quality. Internal consistency was checked with Cronbach α. Feedback from the pilot test resulted in language modifications to improve the clarity of the items. The final CLEAR tool was then used to assess the quality of health information generated by four distinct AI models on five health topics: ChatGPT-3.5, ChatGPT-4, Microsoft Bing, and Google Bard. The generated content was scored by two independent raters, with Cohen kappa (κ) used to assess inter-rater agreement. Results The final five CLEAR items were: (1) Is the content sufficient?; (2) Is the content accurate?; (3) Is the content evidence-based?; (4) Is the content clear, concise, and easy to understand?; and (5) Is the content free from irrelevant information? Pilot testing on the eight health topics revealed acceptable internal consistency, with Cronbach α ranging from 0.669 to 0.981. The final CLEAR tool yielded the following average scores: Microsoft Bing (mean=24.4±0.42), ChatGPT-4 (mean=23.6±0.96), Google Bard (mean=21.2±1.79), and ChatGPT-3.5 (mean=20.6±5.20). Inter-rater agreement yielded the following Cohen κ values: ChatGPT-3.5 (κ=0.875, P<.001), ChatGPT-4 (κ=0.780, P<.001), Microsoft Bing (κ=0.348, P=.037), and Google Bard (κ=0.749, P<.001). Conclusions The CLEAR tool is a brief yet helpful instrument that can aid in standardizing the testing of the quality of health information generated by AI-based models. Future studies are recommended to validate its utility in the quality assessment of AI-generated health-related content using larger samples across more complex health topics.
Affiliation(s)
- Malik Sallam
- Department of Pathology, Microbiology, and Forensic Medicine, School of Medicine, University of Jordan, Amman, JOR
- Department of Clinical Laboratories and Forensic Medicine, Jordan University Hospital, Amman, JOR
| | - Muna Barakat
- Department of Clinical Pharmacy and Therapeutics, School of Pharmacy, Applied Science Private University, Amman, JOR
- Department of Research, Middle East University, Amman, JOR
| | - Mohammed Sallam
- Department of Pharmacy, Mediclinic Parkview Hospital, Mediclinic Middle East, Dubai, ARE
11
Huang H. Performance of ChatGPT on Registered Nurse License Exam in Taiwan: A Descriptive Study. Healthcare (Basel) 2023; 11:2855. [PMID: 37958000 PMCID: PMC10649156 DOI: 10.3390/healthcare11212855] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/24/2023] [Revised: 10/17/2023] [Accepted: 10/27/2023] [Indexed: 11/15/2023] Open
Abstract
(1) Background: AI (artificial intelligence) chatbots have been widely applied. ChatGPT could enhance individual learning capabilities and clinical reasoning skills and facilitate students' understanding of complex concepts in healthcare education. However, its application in nursing education has received comparatively little attention and needs to be verified. (2) Methods: A descriptive study was used to analyze the scores of ChatGPT on the registered nurse license exam (RNLE) in 2022-2023 and to explore ChatGPT's responses and explanations. Data measurement encompassed input sourcing, encoding methods, and statistical analysis. (3) Results: ChatGPT responded promptly within seconds. ChatGPT's average scores on the four exams ranged from 51.6 to 63.75, and it passed the first RNLE of 2022 and the second of 2023. However, ChatGPT may generate misleading or inaccurate explanations, hallucinate, become confused by complicated scenarios, or exhibit language bias. (4) Conclusions: ChatGPT may have the potential to assist with nursing education because of its advantages. It is recommended to integrate ChatGPT into different nursing courses and to assess its limitations and effectiveness through a variety of tools and methods.
Affiliation(s)
- Huiman Huang
- School of Nursing, College of Nursing, Tzu Chi University of Science and Technology, Hualien 970302, Taiwan
12
Ghosh A, Maini Jindal N, Gupta VK, Bansal E, Kaur Bajwa N, Sett A. Is ChatGPT's Knowledge and Interpretative Ability Comparable to First Professional MBBS (Bachelor of Medicine, Bachelor of Surgery) Students of India in Taking a Medical Biochemistry Examination? Cureus 2023; 15:e47329. [PMID: 38021639 PMCID: PMC10657167 DOI: 10.7759/cureus.47329] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 10/19/2023] [Indexed: 12/01/2023] Open
Abstract
Introduction ChatGPT is a large language model (LLM)-based chatbot that uses natural language processing to create humanlike conversational dialogue. Since its inception, it has had a significant impact on the global landscape, especially in sectors such as finance and banking, e-commerce, education, legal, human resources (HR), and recruitment. There have been multiple ongoing controversies regarding the seamless integration of ChatGPT into the healthcare system because of concerns about its factual accuracy and its lack of experience, clarity, expertise, and, above all, empathy. Our study seeks to compare ChatGPT's knowledge and interpretative abilities with those of first-year medical students in India in the subject of medical biochemistry. Materials and methods A total of 79 medical biochemistry questions (40 multiple-choice questions and 39 subjective questions) were set for the Phase 1, Block II term examination. ChatGPT was enrolled as the 101st student in the class. The questions were entered into ChatGPT's interface and its responses were noted. The response time for the multiple-choice questions (MCQs) was also recorded. The answers given by ChatGPT and the 100 students of the class were checked by two subject experts, and marks were awarded according to the quality of the answers. The marks obtained by the AI chatbot were compared with those obtained by the students. Results ChatGPT scored 140 marks out of 200, outperforming almost all the students and ranking fifth in the class. It scored very well on information-based MCQs (92%) and descriptive logical reasoning questions (80%) but performed poorly on descriptive clinical scenario-based questions (52%). It took significantly more time to answer logical reasoning MCQs than simple information-based MCQs (3.10±0.882 sec vs. 2.02±0.477 sec, p<0.005). Conclusions ChatGPT was able to outperform almost all the students in the subject of medical biochemistry. If the ethical issues are dealt with efficiently, these LLMs have huge potential to be used successfully in the teaching and learning of modern medicine.
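The response-time comparison above reports only means and standard deviations. A two-sample comparison from such summary statistics can be sketched with Welch's unequal-variance t statistic; the group sizes below are hypothetical, since the abstract does not report how many MCQs fell into each category:

```python
import math

def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's t statistic and approximate degrees of freedom
    (Welch-Satterthwaite) from per-group mean, SD, and size."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2          # squared standard errors
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Reported times: logical-reasoning MCQs 3.10±0.882 s vs.
# information-based MCQs 2.02±0.477 s; n=20 per group is an assumption.
t, df = welch_t(3.10, 0.882, 20, 2.02, 0.477, 20)
```

A large |t| relative to the t distribution with `df` degrees of freedom corresponds to a small p-value; SciPy's `scipy.stats.ttest_ind_from_stats` with `equal_var=False` performs the same computation and returns the p-value directly.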
Affiliation(s)
- Abhra Ghosh
- Biochemistry, Dayanand Medical College and Hospital, Ludhiana, IND
- Vikram K Gupta
- Community Medicine, Dayanand Medical College, Ludhiana, IND
- Ekta Bansal
- Biochemistry, Dayanand Medical College and Hospital, Ludhiana, IND
- Abhishek Sett
- Healthcare, Deloitte Consulting US India Pvt Ltd, Bangalore, IND