1. Ming S, Guo Q, Cheng W, Lei B. Influence of Model Evolution and System Roles on ChatGPT's Performance in Chinese Medical Licensing Exams: Comparative Study. JMIR Med Educ 2024;10:e52784. PMID: 39140269. DOI: 10.2196/52784.
Abstract
Background With the increasing application of large language models such as ChatGPT across industries, their potential in the medical domain, especially in standardized examinations, has become a focal point of research. Objective The aim of this study was to assess the clinical performance of ChatGPT, focusing on its accuracy and reliability on the Chinese National Medical Licensing Examination (CNMLE). Methods The CNMLE 2022 question set, consisting of 500 single-answer multiple-choice questions, was reclassified into 15 medical subspecialties. Each question was tested 8 to 12 times in Chinese on the OpenAI platform from April 24 to May 15, 2023. Three key factors were considered: the model version (GPT-3.5 vs GPT-4.0), the prompt's designation of system roles tailored to medical subspecialties, and repetition for coherence. The passing accuracy threshold was set at 60%. χ2 tests and κ values were used to evaluate the model's accuracy and consistency. Results GPT-4.0 achieved a passing accuracy of 72.7%, significantly higher than that of GPT-3.5 (54%; P<.001). The variability rate of repeated responses from GPT-4.0 was lower than that of GPT-3.5 (9% vs 19.5%; P<.001). Both models nevertheless showed relatively good response coherence, with κ values of 0.778 and 0.610, respectively. System roles numerically increased accuracy for both GPT-4.0 (0.3%-3.7%) and GPT-3.5 (1.3%-4.5%) and reduced variability by 1.7% and 1.8%, respectively (P>.05). In subgroup analysis, ChatGPT achieved comparable accuracy across question types (P>.05). GPT-4.0 surpassed the accuracy threshold in 14 of 15 subspecialties on the first response, whereas GPT-3.5 did so in 7 of 15. Conclusions GPT-4.0 passed the CNMLE and outperformed GPT-3.5 in key areas such as accuracy, consistency, and medical subspecialty expertise. Adding a system role enhanced the model's reliability and answer coherence, although not significantly. GPT-4.0 showed promising potential in medical education and clinical practice, meriting further study.
Affiliation(s)
- Shuai Ming
  - Department of Ophthalmology, Henan Eye Hospital, Henan Provincial People's Hospital, Zhengzhou, China
  - Eye Institute, Henan Academy of Innovations in Medical Science, Zhengzhou, China
  - Henan Clinical Research Center for Ocular Diseases, People's Hospital of Zhengzhou University, Zhengzhou, China
- Qingge Guo
  - Department of Ophthalmology, Henan Eye Hospital, Henan Provincial People's Hospital, Zhengzhou, China
  - Eye Institute, Henan Academy of Innovations in Medical Science, Zhengzhou, China
  - Henan Clinical Research Center for Ocular Diseases, People's Hospital of Zhengzhou University, Zhengzhou, China
- Wenjun Cheng
  - Department of Ophthalmology, People's Hospital of Zhengzhou University, Zhengzhou, China
- Bo Lei
  - Department of Ophthalmology, Henan Eye Hospital, Henan Provincial People's Hospital, Zhengzhou, China
  - Eye Institute, Henan Academy of Innovations in Medical Science, Zhengzhou, China
  - Henan Clinical Research Center for Ocular Diseases, People's Hospital of Zhengzhou University, Zhengzhou, China
2. García-Alonso EM, León-Mejía AC, Sánchez-Cabrero R, Guzmán-Ordaz R. Training and Technology Acceptance of ChatGPT in University Students of Social Sciences: A Netcoincidental Analysis. Behav Sci (Basel) 2024;14:612. PMID: 39062435. PMCID: PMC11274043. DOI: 10.3390/bs14070612.
Abstract
This study analyzes the perception and usage of ChatGPT through the lens of the technology acceptance model (TAM). Applying reticular analysis of coincidences (RAC) to a convenience survey of university students in the social sciences, it examines how this artificial intelligence tool is perceived and used, considering variables such as gender, academic year, prior experience with ChatGPT, and the training provided by university faculty. The networks created with the statistical tool "CARING" highlight the role of perceived utility, credibility, and prior experience in shaping attitudes and behaviors toward this emerging technology. Previous experience, familiarity with video games, and programming knowledge were associated with more favorable attitudes toward ChatGPT, whereas students who received specific training showed lower confidence in the tool. These findings underscore the importance of implementing training strategies that raise students' awareness of both the potential strengths and weaknesses of artificial intelligence in educational contexts.
Affiliation(s)
- Elena María García-Alonso
  - Department of Sociology and Communication, Faculty of Social Sciences, University of Salamanca, 37007 Salamanca, Spain
- Ana Cristina León-Mejía
  - Department of Sociology and Communication, Faculty of Social Sciences, University of Salamanca, 37007 Salamanca, Spain
- Roberto Sánchez-Cabrero
  - Department of Evolutionary Psychology and Education, Faculty of Teacher Training and Education, Autonomous University of Madrid, 28049 Madrid, Spain
- Raquel Guzmán-Ordaz
  - Department of Sociology and Communication, Faculty of Social Sciences, University of Salamanca, 37007 Salamanca, Spain
3. Worthing KA, Roberts M, Šlapeta J. Surveyed veterinary students in Australia find ChatGPT practical and relevant while expressing no concern about artificial intelligence replacing veterinarians. Vet Rec Open 2024;11:e280. PMID: 38854916. PMCID: PMC11162838. DOI: 10.1002/vro2.80.
Abstract
Background Chat Generative Pre-trained Transformer (ChatGPT) is a freely available online artificial intelligence (AI) program capable of understanding and generating human-like language. This study assessed veterinary students' perceptions about ChatGPT in education and practice. It compared perceptions about ChatGPT between students who had completed a critical analysis task and those who had not. Methods This cross-sectional study surveyed 498 Doctor of Veterinary Medicine (DVM) students at The University of Sydney, Australia. Second-year DVM students researched a veterinary pathogen and then completed a critical analysis of ChatGPT (version 3.5) output for the same pathogen. A survey based on the Technology Acceptance Model was then delivered to all DVM students from all years of the programme, collecting data using Likert-style, categorical and free-text items. Results Over 75% of the 100 respondents reported having used ChatGPT. The students found ChatGPT's output relevant and practical for their use but perceived it as inaccurate. They perceived ChatGPT output to be more useful for veterinary students than for pet owners or veterinarians. Those who had completed the critical analysis assignment had a more positive view of ChatGPT's practicality for veterinary students but noted its authoritative tone even when delivering inaccurate information. Over 50% of the students agreed that information about tools such as ChatGPT should be included in the veterinary curriculum. Students agreed that veterinarians should embrace AI but disagreed that AI would eventually replace the need for veterinarians. Conclusions A critical appraisal of outputs from AI tools such as ChatGPT may help prepare future veterinarians for the effective use of these tools.
Affiliation(s)
- Kate A. Worthing
  - Sydney School of Veterinary Science, Faculty of Science, The University of Sydney, Sydney, New South Wales, Australia
  - Sydney Infectious Diseases Institute, The University of Sydney, Sydney, New South Wales, Australia
- Madeleine Roberts
  - Sydney School of Veterinary Science, Faculty of Science, The University of Sydney, Sydney, New South Wales, Australia
- Jan Šlapeta
  - Sydney School of Veterinary Science, Faculty of Science, The University of Sydney, Sydney, New South Wales, Australia
4. Buldur M, Sezer B. Evaluating the accuracy of Chat Generative Pre-trained Transformer version 4 (ChatGPT-4) responses to United States Food and Drug Administration (FDA) frequently asked questions about dental amalgam. BMC Oral Health 2024;24:605. PMID: 38789962. PMCID: PMC11127407. DOI: 10.1186/s12903-024-04358-8.
Abstract
BACKGROUND The use of artificial intelligence in the health sciences is becoming widespread, and patients have benefited from artificial intelligence applications on various health issues, especially since the pandemic period. One of the most important issues in this regard is the accuracy of the information these applications provide. OBJECTIVE The purpose of this study was to pose the frequently asked questions about dental amalgam, as determined by the United States Food and Drug Administration (FDA), one of these information resources, to Chat Generative Pre-trained Transformer version 4 (ChatGPT-4) and to compare the content of the application's answers with the answers of the FDA. METHODS The questions were directed to ChatGPT-4 on May 8 and May 16, 2023, and the responses were recorded and compared at the word and meaning levels using ChatGPT. The answers from the FDA webpage were also recorded. ChatGPT-4's responses and the FDA's responses were compared for content similarity in terms of "Main Idea", "Quality Analysis", "Common Ideas", and "Inconsistent Ideas". RESULTS ChatGPT-4 provided similar responses at the one-week interval. In comparison with the FDA guidance, it provided answers with similar information content to the frequently asked questions. However, although there were some general similarities in the recommendations regarding amalgam removal, the two texts were not identical and offered different perspectives on the replacement of fillings. CONCLUSIONS The findings of this study indicate that ChatGPT-4, an artificial intelligence-based application, encompasses current and accurate information regarding dental amalgam and its removal, providing it to individuals seeking access to such information. Nevertheless, further studies are required to assess the validity and reliability of ChatGPT-4 across diverse subjects.
Affiliation(s)
- Mehmet Buldur
  - Department of Restorative Dentistry, School of Dentistry, Çanakkale Onsekiz Mart University, Çanakkale, Türkiye
- Berkant Sezer
  - Department of Pediatric Dentistry, School of Dentistry, Çanakkale Onsekiz Mart University, Çanakkale, Türkiye
5. Bhatia AP, Lambat A, Jain T. A Comparative Analysis of Conventional and Chat-Generative Pre-trained Transformer-Assisted Teaching Methods in Undergraduate Dental Education. Cureus 2024;16:e60006. PMID: 38854264. PMCID: PMC11162508. DOI: 10.7759/cureus.60006.
Abstract
INTRODUCTION In the present era, individuals can improve their study organization, class attendance, and use of mnemonics through contemporary technology. The use of OpenAI's Chat Generative Pre-Trained Transformer (ChatGPT) in dentistry is a developing domain, and the integration of this technology into dental education depends on the accessibility and efficacy of AI technology as well as the readiness of institutions to adopt it. Furthermore, it is crucial to consider the possible ethical ramifications of using AI in dentistry and the need for dental practitioners to be adequately trained in its use; incorporating ChatGPT into the dentistry curriculum would require thorough evaluation and consultation with field specialists. This study aimed to determine whether ChatGPT is more effective than conventional teaching methods in teaching undergraduate dental students. METHOD Comparative research was conducted at Shri. Yashwantrao Chavan Memorial Medical and Rural Development Foundation's Dental College, Ahmednagar. Computer-generated random numbers were used to divide 100 students into two groups of 50 students each. A didactic lecture was given using PowerPoint (Redmond, WA: Microsoft Corp.) to both groups; Group A was then given textbooks to read, while Group B used ChatGPT. A pre-validated online questionnaire using Google Forms (Menlo Park, CA: Google LLC) was sent via email to both groups, and the pre- and post-test scores were compared using the t-test. RESULT The calculated t-value was 12.263 (at 81 degrees of freedom) with a p-value below 0.01. Therefore, the null hypothesis was rejected, and it was concluded that conventional-method and ChatGPT-method post-test scores differed highly significantly. It was also observed that the mean post-test scores for the conventional method were higher than those for the ChatGPT method. CONCLUSION The study concluded that traditional teaching methods are more effective for learning than ChatGPT-assisted teaching.
Affiliation(s)
- Amrita P Bhatia
  - Prosthodontics, Shri. Yashwantrao Chavan Memorial Medical and Rural Development Foundation's Dental College, Ahmednagar, IND
- Apurva Lambat
  - Prosthodontics, Shri. Yashwantrao Chavan Memorial Medical and Rural Development Foundation's Dental College, Ahmednagar, IND
- Teerthesh Jain
  - General Dentistry, Affordable Dentures and Implants, Indianapolis, USA
6. Sallam M, Barakat M, Sallam M. A Preliminary Checklist (METRICS) to Standardize the Design and Reporting of Studies on Generative Artificial Intelligence-Based Models in Health Care Education and Practice: Development Study Involving a Literature Review. Interact J Med Res 2024;13:e54704. PMID: 38276872. PMCID: PMC10905357. DOI: 10.2196/54704.
Abstract
BACKGROUND Adherence to evidence-based practice is indispensable in health care. Recently, the utility of generative artificial intelligence (AI) models in health care has been evaluated extensively. However, the lack of consensus guidelines on the design and reporting of findings of these studies poses a challenge for the interpretation and synthesis of evidence. OBJECTIVE This study aimed to develop a preliminary checklist to standardize the reporting of generative AI-based studies in health care education and practice. METHODS A literature review was conducted in Scopus, PubMed, and Google Scholar. Published records with "ChatGPT," "Bing," or "Bard" in the title were retrieved. The methodologies employed in the included records were examined carefully to identify common pertinent themes and possible gaps in reporting. A panel discussion was held to establish a unified and thorough checklist for the reporting of AI studies in health care. The finalized checklist was used by 2 independent raters to evaluate the included records, with Cohen κ used to evaluate interrater reliability. RESULTS The final data set that formed the basis for pertinent theme identification and analysis comprised a total of 34 records. The finalized checklist included 9 pertinent themes, collectively referred to as METRICS (Model, Evaluation, Timing, Range/Randomization, Individual factors, Count, and Specificity of prompts and language), as follows: (1) Model used and its exact settings; (2) Evaluation approach for the generated content; (3) Timing of testing the model; (4) Transparency of the data source; (5) Range of tested topics; (6) Randomization of selecting the queries; (7) Individual factors in selecting the queries and interrater reliability; (8) Count of queries executed to test the model; and (9) Specificity of the prompts and language used. The overall mean METRICS score was 3.0 (SD 0.58). Interrater reliability was acceptable, with Cohen κ ranging from 0.558 to 0.962 (P<.001 for the 9 tested items). Classified per item, the highest average METRICS score was recorded for the "Model" item, followed by the "Specificity" item, while the lowest scores were recorded for the "Randomization" item (classified as suboptimal) and the "Individual factors" item (classified as satisfactory). CONCLUSIONS The METRICS checklist can facilitate the design of studies and guide researchers toward best practices in reporting results. The findings highlight the need for standardized reporting algorithms for generative AI-based studies in health care, given the variability observed in methodologies and reporting. The proposed METRICS checklist could be a helpful preliminary base for establishing a universally accepted approach to standardizing the design and reporting of generative AI-based studies in health care, a swiftly evolving research topic.
Affiliation(s)
- Malik Sallam
  - Department of Pathology, Microbiology and Forensic Medicine, School of Medicine, The University of Jordan, Amman, Jordan
  - Department of Clinical Laboratories and Forensic Medicine, Jordan University Hospital, Amman, Jordan
  - Department of Translational Medicine, Faculty of Medicine, Lund University, Malmo, Sweden
- Muna Barakat
  - Department of Clinical Pharmacy and Therapeutics, Faculty of Pharmacy, Applied Science Private University, Amman, Jordan
- Mohammed Sallam
  - Department of Pharmacy, Mediclinic Parkview Hospital, Mediclinic Middle East, Dubai, United Arab Emirates
7. Robleto E, Habashi A, Kaplan MAB, Riley RL, Zhang C, Bianchi L, Shehadeh LA. Medical students' perceptions of an artificial intelligence (AI) assisted diagnosing program. Med Teach 2024:1-7. PMID: 38306667. DOI: 10.1080/0142159x.2024.2305369.
Abstract
As artificial intelligence (AI)-assisted diagnostic systems become accessible and user-friendly, evaluating how first-year medical students perceive such systems holds substantial importance in medical education. This study aimed to assess medical students' perceptions of an AI-assisted diagnostic tool known as 'Glass AI.' Data were collected from first-year medical students enrolled in a 1.5-week Cell Physiology pre-clerkship unit. Students voluntarily participated in an activity that involved using Glass AI to solve a clinical case. A questionnaire was designed around 3 domains: 1) immediate experience with Glass AI, 2) potential for Glass AI utilization in medical education, and 3) student deliberations on AI-assisted diagnostic systems in future healthcare environments. 73/202 (36.1%) of students completed the survey. 96% of the participants noted that Glass AI increased confidence in the diagnosis, 43% thought Glass AI lacked sufficient explanation, and 68% expressed concerns about risks to the physician workforce. Students expressed positive outlooks for AI-assisted diagnostic systems in future healthcare, provided strict regulations are set to protect patient privacy and safety, address legal liability, remove system biases, and improve quality of patient care. In conclusion, first-year medical students are aware that AI will play a role in their careers as students and future physicians.
Affiliation(s)
- Emely Robleto
  - Department of Medicine, Division of Cardiology, University of Miami Miller School of Medicine, Miami, FL, USA
  - Interdisciplinary Stem Cell Institute, University of Miami Miller School of Medicine, Miami, FL, USA
- Ali Habashi
  - Department of Cinematic Arts, School of Communication, University of Miami, Miami, FL, USA
- Mary-Ann Benites Kaplan
  - Department of Medical Education, University of Miami Miller School of Medicine, Miami, FL, USA
- Richard L Riley
  - Department of Medical Education, University of Miami Miller School of Medicine, Miami, FL, USA
  - Department of Microbiology and Immunology, University of Miami Miller School of Medicine, Miami, FL, USA
- Chi Zhang
  - Department of Medical Education, University of Miami Miller School of Medicine, Miami, FL, USA
- Laura Bianchi
  - Department of Physiology and Biophysics, University of Miami Miller School of Medicine, Miami, FL, USA
- Lina A Shehadeh
  - Department of Medicine, Division of Cardiology, University of Miami Miller School of Medicine, Miami, FL, USA
  - Interdisciplinary Stem Cell Institute, University of Miami Miller School of Medicine, Miami, FL, USA
  - Department of Medical Education, University of Miami Miller School of Medicine, Miami, FL, USA
8. Thorat VA, Rao P, Joshi N, Talreja P, Shetty A. The Role of Chatbot GPT Technology in Undergraduate Dental Education. Cureus 2024;16:e54193. PMID: 38496058. PMCID: PMC10942112. DOI: 10.7759/cureus.54193.
Abstract
This comprehensive article explores the transformative role of Chatbot GPT, based on the GPT-3 architecture, in revolutionizing dental education. The focus is on its impact across various facets, including personalized learning pathways, integration into virtual patient simulation scenarios, 24/7 accessibility, multilingual support, interactive dental dictionary functionality, evidence-based learning, and assessment and evaluation of dental students. The objective is to showcase how Chatbot GPT enhances educational experiences, promotes inclusivity, and aligns with contemporary pedagogical principles. Chatbot GPT emerges as a powerful ally in dental education, offering personalized learning experiences, risk-free clinical simulations, continuous accessibility, multilingual support, instant terminology assistance, evidence-based learning resources, and real-time assessment capabilities. Its adaptability caters to diverse learning needs, fostering a learner-centered approach and promoting lifelong learning for both dental students and practitioners. As a versatile tool, Chatbot GPT not only transforms the educational journey but also serves as a valuable asset for continuous professional development in the dynamic landscape of dentistry.
Affiliation(s)
- Vinayak A Thorat
  - Department of Periodontology, Bharati Vidyapeeth (Deemed to Be University) Dental College and Hospital, Navi Mumbai, IND
- Prajakta Rao
  - Department of Periodontology, Bharati Vidyapeeth (Deemed to Be University) Dental College and Hospital, Navi Mumbai, IND
- Nilesh Joshi
  - Department of Periodontology, Bharati Vidyapeeth (Deemed to Be University) Dental College and Hospital, Navi Mumbai, IND
- Prakash Talreja
  - Department of Periodontology, Bharati Vidyapeeth (Deemed to Be University) Dental College and Hospital, Navi Mumbai, IND
- Anupa Shetty
  - Department of Periodontology, Bharati Vidyapeeth (Deemed to Be University) Dental College and Hospital, Navi Mumbai, IND
9. Kapsali MZ, Livanis E, Tsalikidis C, Oikonomou P, Voultsos P, Tsaroucha A. Ethical Concerns About ChatGPT in Healthcare: A Useful Tool or the Tombstone of Original and Reflective Thinking? Cureus 2024;16:e54759. PMID: 38523987. PMCID: PMC10961144. DOI: 10.7759/cureus.54759.
Abstract
Artificial intelligence (AI), the fast-rising field of computer science that aims to create digital systems with human-like behavior and intelligence, seems to have entered almost every field of modern life. Launched in November 2022, ChatGPT (Chat Generative Pre-trained Transformer) is a textual AI application capable of creating human-like responses characterized by original language and high coherence. Although AI-based language models have demonstrated impressive capabilities in healthcare, ChatGPT has drawn controversy in the scientific and academic communities. This chatbot already appears to have a massive impact as an educational tool for healthcare professionals, has transformative potential for clinical practice, and could lead to dramatic changes in scientific research. Nevertheless, reasonable concerns have been raised about whether pre-trained, AI-generated text threatens not only original thinking and new scientific ideas but also academic and research integrity, as its AI origin becomes increasingly difficult to distinguish owing to the coherence and fluency of the produced text. This short review aims to summarize the potential applications and consequent implications of ChatGPT in the three critical pillars of medicine: education, research, and clinical practice. In addition, this paper discusses whether the current use of this chatbot complies with the ethical principles for the safe use of AI in healthcare, as determined by the World Health Organization. Finally, this review highlights the need for an updated ethical framework and increased vigilance by healthcare stakeholders to harness the potential benefits and limit the imminent dangers of this innovative new technology.
Affiliation(s)
- Marina Z Kapsali
  - Postgraduate Program on Bioethics, Laboratory of Bioethics, Democritus University of Thrace, Alexandroupolis, GRC
- Efstratios Livanis
  - Department of Accounting and Finance, University of Macedonia, Thessaloniki, GRC
- Christos Tsalikidis
  - Department of General Surgery, Democritus University of Thrace, Alexandroupolis, GRC
- Panagoula Oikonomou
  - Laboratory of Experimental Surgery, Department of General Surgery, Democritus University of Thrace, Alexandroupolis, GRC
- Polychronis Voultsos
  - Laboratory of Forensic Medicine & Toxicology (Medical Law and Ethics), School of Medicine, Faculty of Health Sciences, Aristotle University of Thessaloniki, Thessaloniki, GRC
- Aleka Tsaroucha
  - Department of General Surgery, Democritus University of Thrace, Alexandroupolis, GRC
10. Sallam M, Barakat M, Sallam M. Pilot Testing of a Tool to Standardize the Assessment of the Quality of Health Information Generated by Artificial Intelligence-Based Models. Cureus 2023;15:e49373. PMID: 38024074. PMCID: PMC10674084. DOI: 10.7759/cureus.49373.
Abstract
Background Artificial intelligence (AI)-based conversational models, such as Chat Generative Pre-trained Transformer (ChatGPT), Microsoft Bing, and Google Bard, have emerged as valuable sources of health information for lay individuals. However, the accuracy of the information provided by these AI models remains a significant concern. This pilot study aimed to test a new tool, referred to as "CLEAR", designed to assess the quality of health information delivered by AI-based models across five key themes: Completeness of content, Lack of false information in the content, Evidence supporting the content, Appropriateness of the content, and Relevance. Methods Tool development involved a literature review on health information quality, followed by the initial establishment of the CLEAR tool, comprising five items that assess completeness, lack of false information, evidence support, appropriateness, and relevance. Each item was scored on a five-point Likert scale from excellent to poor. Content validity was checked by expert review. Pilot testing involved 32 healthcare professionals using the CLEAR tool to assess content on eight different health topics deliberately designed with varying quality. Internal consistency was checked with Cronbach α. Feedback from the pilot test resulted in language modifications to improve the clarity of the items. The final CLEAR tool was then used to assess the quality of health information generated by four distinct AI models on five health topics: ChatGPT-3.5, ChatGPT-4, Microsoft Bing, and Google Bard. The generated content was scored by two independent raters, with Cohen kappa (κ) used to assess inter-rater agreement. Results The final five CLEAR items were: (1) Is the content sufficient?; (2) Is the content accurate?; (3) Is the content evidence-based?; (4) Is the content clear, concise, and easy to understand?; and (5) Is the content free from irrelevant information? Pilot testing on the eight health topics revealed acceptable internal consistency, with Cronbach α ranging from 0.669 to 0.981. The final CLEAR tool yielded the following average scores: Microsoft Bing (mean=24.4±0.42), ChatGPT-4 (mean=23.6±0.96), Google Bard (mean=21.2±1.79), and ChatGPT-3.5 (mean=20.6±5.20). Inter-rater agreement yielded the following Cohen κ values: ChatGPT-3.5 (κ=0.875, P<.001), ChatGPT-4 (κ=0.780, P<.001), Microsoft Bing (κ=0.348, P=.037), and Google Bard (κ=0.749, P<.001). Conclusions The CLEAR tool is a brief yet helpful instrument that can aid in standardizing the testing of the quality of health information generated by AI-based models. Future studies are recommended to validate its utility in the quality assessment of AI-generated health-related content using larger samples across more complex health topics.
Affiliation(s)
- Malik Sallam
- Department of Pathology, Microbiology, and Forensic Medicine, School of Medicine, University of Jordan, Amman, JOR
- Department of Clinical Laboratories and Forensic Medicine, Jordan University Hospital, Amman, JOR
| | - Muna Barakat
- Department of Clinical Pharmacy and Therapeutics, School of Pharmacy, Applied Science Private University, Amman, JOR
- Department of Research, Middle East University, Amman, JOR
| | - Mohammed Sallam
- Department of Pharmacy, Mediclinic Parkview Hospital, Mediclinic Middle East, Dubai, ARE
11
Huang H. Performance of ChatGPT on Registered Nurse License Exam in Taiwan: A Descriptive Study. Healthcare (Basel) 2023; 11:2855. [PMID: 37958000 PMCID: PMC10649156 DOI: 10.3390/healthcare11212855] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/24/2023] [Revised: 10/17/2023] [Accepted: 10/27/2023] [Indexed: 11/15/2023] Open
Abstract
(1) Background: AI (artificial intelligence) chatbots have been widely applied. ChatGPT could enhance individual learning capabilities and clinical reasoning skills and facilitate students' understanding of complex concepts in healthcare education. However, its application in nursing education has received comparatively little attention and needs to be verified. (2) Methods: A descriptive study was used to analyze the scores of ChatGPT on the registered nurse license exam (RNLE) in 2022-2023 and to explore ChatGPT's responses and explanations. Data measurement encompassed input sourcing, encoding methods, and statistical analysis. (3) Results: ChatGPT responded promptly within seconds. ChatGPT's average scores on the four exams ranged from 51.6 to 63.75, and it passed the first RNLE of 2022 and the second of 2023. However, ChatGPT may generate misleading or inaccurate explanations, hallucinate, become confused by complicated scenarios, or exhibit language bias. (4) Conclusions: ChatGPT may have the potential to assist with nursing education because of its advantages. It is recommended to integrate ChatGPT into different nursing courses and to assess its limitations and effectiveness through a variety of tools and methods.
Affiliation(s)
- Huiman Huang
- School of Nursing, College of Nursing, Tzu Chi University of Science and Technology, Hualien 970302, Taiwan
12
Ghosh A, Maini Jindal N, Gupta VK, Bansal E, Kaur Bajwa N, Sett A. Is ChatGPT's Knowledge and Interpretative Ability Comparable to First Professional MBBS (Bachelor of Medicine, Bachelor of Surgery) Students of India in Taking a Medical Biochemistry Examination? Cureus 2023; 15:e47329. [PMID: 38021639 PMCID: PMC10657167 DOI: 10.7759/cureus.47329] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 10/19/2023] [Indexed: 12/01/2023] Open
Abstract
Introduction ChatGPT is a large language model (LLM)-based chatbot that uses natural language processing to create humanlike conversational dialogue. Since its inception, it has had a significant impact on the global landscape, especially in sectors such as finance and banking, e-commerce, education, legal, human resources (HR), and recruitment. There have been multiple ongoing controversies regarding the seamless integration of ChatGPT into the healthcare system because of concerns about its factual accuracy and its lack of experience, clarity, expertise, and, above all, empathy. Our study seeks to compare ChatGPT's knowledge and interpretative abilities with those of first-year medical students in India in the subject of medical biochemistry. Materials and methods A total of 79 medical biochemistry questions (40 multiple-choice questions and 39 subjective questions) were set for the Phase 1, Block II term examination. ChatGPT was enrolled as the 101st student in the class. The questions were entered into ChatGPT's interface and its responses were noted. The response time for the multiple-choice questions (MCQs) was also recorded. The answers given by ChatGPT and the 100 students of the class were checked by two subject experts, and marks were awarded according to the quality of the answers. The marks obtained by the AI chatbot were compared with those obtained by the students. Results ChatGPT scored 140 marks out of 200, outperforming almost all the students and ranking fifth in the class. It scored very well on information-based MCQs (92%) and descriptive logical reasoning questions (80%) but performed poorly on descriptive clinical scenario-based questions (52%). It took significantly more time to answer logical reasoning MCQs than simple information-based MCQs (3.10±0.882 sec vs. 2.02±0.477 sec, p<0.005). Conclusions ChatGPT was able to outperform almost all the students in the subject of medical biochemistry. If the ethical issues are dealt with efficiently, these LLMs have huge potential to be used successfully in the teaching and learning of modern medicine.
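The response-time comparison above reports only means and standard deviations. A two-sample comparison from such summary statistics can be sketched with Welch's unequal-variance t statistic; the group sizes below are hypothetical, since the abstract does not report how many MCQs fell into each category:

```python
import math

def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's t statistic and approximate degrees of freedom
    (Welch-Satterthwaite) from per-group mean, SD, and size."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2          # squared standard errors
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Reported times: logical-reasoning MCQs 3.10±0.882 s vs.
# information-based MCQs 2.02±0.477 s; n=20 per group is an assumption.
t, df = welch_t(3.10, 0.882, 20, 2.02, 0.477, 20)
```

A large |t| relative to the t distribution with `df` degrees of freedom corresponds to a small p-value; SciPy's `scipy.stats.ttest_ind_from_stats` with `equal_var=False` performs the same computation and returns the p-value directly.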
Affiliation(s)
- Abhra Ghosh
- Biochemistry, Dayanand Medical College and Hospital, Ludhiana, IND
- Vikram K Gupta
- Community Medicine, Dayanand Medical College, Ludhiana, IND
- Ekta Bansal
- Biochemistry, Dayanand Medical College and Hospital, Ludhiana, IND
- Abhishek Sett
- Healthcare, Deloitte Consulting US India Pvt Ltd, Bangalore, IND