1. Bak M, Chin J. The potential and limitations of large language models in identification of the states of motivations for facilitating health behavior change. J Am Med Inform Assoc 2024;31:2047-2053. PMID: 38527272; PMCID: PMC11339501; DOI: 10.1093/jamia/ocae057.
Abstract
IMPORTANCE This study highlights the potential and limitations of large language models (LLMs) in recognizing different states of motivation in order to provide appropriate information for behavior change. Following the Transtheoretical Model (TTM), we identified a major gap in how LLMs respond to certain states of motivation through validated scenario studies, suggesting future directions for LLM research in health promotion. OBJECTIVES LLM-based generative conversational agents (GAs) have shown success in identifying user intents semantically. Little is known about their capability to identify motivation states and provide appropriate information to facilitate behavior change progression. MATERIALS AND METHODS We evaluated 3 GAs (ChatGPT, Google Bard, and Llama 2) in identifying motivation states following the TTM stages of change. The GAs were evaluated using 25 validated scenarios spanning 5 health topics and 5 TTM stages. We assessed the relevance and completeness of the responses in covering the TTM processes needed to proceed to the next stage of change. RESULTS All 3 GAs identified motivation states in the preparation stage and provided sufficient information to proceed to the action stage. Responses to motivation states in the action and maintenance stages were adequate, covering partial processes for individuals to initiate and maintain behavior change. However, the GAs did not identify users' motivation states in the precontemplation and contemplation stages, instead providing irrelevant information that covered only about 20%-30% of the processes. DISCUSSION GAs can identify users' motivation states and provide relevant information when individuals have established goals and commitments to take and maintain an action. However, individuals who are hesitant or ambivalent about behavior change are unlikely to receive sufficient and relevant guidance to proceed to the next stage of change.
CONCLUSION Current GAs effectively identify the motivation states of individuals with established goals but may lack support for those ambivalent toward behavior change.
Affiliation(s)
- Michelle Bak
- School of Information Sciences, University of Illinois Urbana-Champaign, Champaign, IL 61820, United States
- Jessie Chin
- School of Information Sciences, University of Illinois Urbana-Champaign, Champaign, IL 61820, United States
2. Delgado-Ruiz R, Kim AS, Zhang H, Sullivan D, Awan KH, Stathopoulou PG. Generative Artificial Intelligence (Gen AI) in dental education: Opportunities, cautions, and recommendations. J Dent Educ 2024. PMID: 39219015; DOI: 10.1002/jdd.13688.
Affiliation(s)
- Rafael Delgado-Ruiz
- Department of Prosthodontics and Digital Technology, Stony Brook University School of Dental Medicine, Stony Brook, New York, USA
- Amy S Kim
- Department of Pediatric Dentistry and Restorative Dentistry, University of Washington School of Dentistry, Seattle, Washington, USA
- Hai Zhang
- Department of Pediatric Dentistry and Restorative Dentistry, University of Washington School of Dentistry, Seattle, Washington, USA
- Diane Sullivan
- Department of Comprehensive Dentistry, University of Texas San Antonio School of Dentistry, San Antonio, Texas, USA
- Kamran H Awan
- Roseman University of Health Sciences College of Dental Medicine, South Jordan, Utah, USA
- Panagiota G Stathopoulou
- Division of Periodontology/Department of Regenerative and Reconstructive Sciences, Oregon Health & Science University School of Dentistry, Portland, Oregon, USA
3. Pool J, Indulska M, Sadiq S. Large language models and generative AI in telehealth: a responsible use lens. J Am Med Inform Assoc 2024;31:2125-2136. PMID: 38441296; PMCID: PMC11339524; DOI: 10.1093/jamia/ocae035.
Abstract
OBJECTIVE This scoping review aims to assess the current research landscape of the application and use of large language models (LLMs) and generative Artificial Intelligence (AI), through tools such as ChatGPT in telehealth. Additionally, the review seeks to identify key areas for future research, with a particular focus on AI ethics considerations for responsible use and ensuring trustworthy AI. MATERIALS AND METHODS Following the scoping review methodological framework, a search strategy was conducted across 6 databases. To structure our review, we employed AI ethics guidelines and principles, constructing a concept matrix for investigating the responsible use of AI in telehealth. Using the concept matrix in our review enabled the identification of gaps in the literature and informed future research directions. RESULTS Twenty studies were included in the review. Among the included studies, 5 were empirical, and 15 were reviews and perspectives focusing on different telehealth applications and healthcare contexts. Benefit and reliability concepts were frequently discussed in these studies. Privacy, security, and accountability were peripheral themes, with transparency, explainability, human agency, and contestability lacking conceptual or empirical exploration. CONCLUSION The findings emphasized the potential of LLMs, especially ChatGPT, in telehealth. They provide insights into understanding the use of LLMs, enhancing telehealth services, and taking ethical considerations into account. By proposing three future research directions with a focus on responsible use, this review further contributes to the advancement of this emerging phenomenon of healthcare AI.
Affiliation(s)
- Javad Pool
- ARC Industrial Transformation Training Centre for Information Resilience (CIRES), The University of Queensland, Brisbane 4072, Australia
- School of Electrical Engineering and Computer Science, The University of Queensland, Brisbane 4072, Australia
- Marta Indulska
- ARC Industrial Transformation Training Centre for Information Resilience (CIRES), The University of Queensland, Brisbane 4072, Australia
- Business School, The University of Queensland, Brisbane 4072, Australia
- Shazia Sadiq
- ARC Industrial Transformation Training Centre for Information Resilience (CIRES), The University of Queensland, Brisbane 4072, Australia
- School of Electrical Engineering and Computer Science, The University of Queensland, Brisbane 4072, Australia
4. Thomae AV, Witt CM, Barth J. Integration of ChatGPT Into a Course for Medical Students: Explorative Study on Teaching Scenarios, Students' Perception, and Applications. JMIR Med Educ 2024;10:e50545. PMID: 39177012; PMCID: PMC11360267; DOI: 10.2196/50545.
Abstract
Background Text-generating artificial intelligence (AI) such as ChatGPT offers many opportunities and challenges in medical education. Acquiring the practical skills necessary for using AI in a clinical context is crucial, especially for medical education. Objective This explorative study aimed to investigate the feasibility of integrating ChatGPT into teaching units and to evaluate the course and the importance of AI-related competencies for medical students. Since one possible application of ChatGPT in the medical field is the generation of information for patients, we further investigated how students perceive such information in terms of persuasiveness and quality. Methods ChatGPT was integrated into 3 different teaching units of a blended learning course for medical students. Using a mixed methods approach, quantitative and qualitative data were collected. As baseline data, we assessed students' characteristics, including their openness to digital innovation. The students evaluated the integration of ChatGPT into the course and shared their thoughts on the future of text-generating AI in medical education. The course was evaluated based on the Kirkpatrick Model, with satisfaction, learning progress, and applicable knowledge as key assessment levels. In the ChatGPT-integrating teaching units, students evaluated videos featuring information for patients regarding their persuasiveness on treatment expectations in a self-experience experiment and critically reviewed information for patients written using ChatGPT 3.5 based on different prompts. Results A total of 52 medical students participated in the study. The comprehensive evaluation of the course revealed high levels of satisfaction, learning progress, and applicability, specifically in relation to the ChatGPT-integrating teaching units, and all evaluation levels were associated with one another. Higher openness to digital innovation was associated with higher satisfaction and, to a lesser extent, with higher applicability. AI-related competencies in other courses of the medical curriculum were perceived as highly important by medical students. Qualitative analysis highlighted potential use cases of ChatGPT in teaching and learning. In the ChatGPT-integrating teaching units, students rated information for patients generated using a basic ChatGPT prompt as "moderate" in terms of comprehensibility, patient safety, and correct application of the communication rules taught during the course; these ratings improved considerably with an extended prompt. The same text, however, showed the smallest increase in treatment expectations when compared with information provided by humans (patient, clinician, and expert) via videos. Conclusions This study offers valuable insights into integrating the development of AI competencies into a blended learning course. Integration of ChatGPT enhanced learning experiences for medical students.
Affiliation(s)
- Anita V Thomae
- Institute for Complementary and Integrative Medicine, University Hospital Zurich and University of Zurich, Zurich, Switzerland
- Claudia M Witt
- Institute for Complementary and Integrative Medicine, University Hospital Zurich and University of Zurich, Zurich, Switzerland
- Institute of Social Medicine, Epidemiology and Health Economics, Charité – Universitätsmedizin Berlin, Corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Berlin, Germany
- Jürgen Barth
- Institute for Complementary and Integrative Medicine, University Hospital Zurich and University of Zurich, Zurich, Switzerland
5. Abi-Rafeh J, Bassiri-Tehrani B, Kazan R, Furnas H, Hammond D, Adams WP, Nahai F. Preoperative Patient Guidance and Education in Aesthetic Breast Plastic Surgery: A Novel Proposed Application of Artificial Intelligence Large Language Models. Aesthet Surg J Open Forum 2024;6:ojae062. PMID: 39257998; PMCID: PMC11385898; DOI: 10.1093/asjof/ojae062.
Abstract
Background At a time when Internet and social media use is omnipresent among patients in their self-directed research about their medical or surgical needs, artificial intelligence (AI) large language models (LLMs) are on track to represent hallmark resources in this context. Objectives The authors aim to explore and assess the performance of a novel AI LLM in answering questions posed by simulated patients interested in aesthetic breast plastic surgery procedures. Methods A publicly available AI LLM was queried using simulated interactions from the perspective of patients interested in breast augmentation, mastopexy, and breast reduction. Questions posed were standardized and categorized under aesthetic needs inquiries and awareness of appropriate procedures; patient candidacy and indications; procedure safety and risks; procedure information, steps, and techniques; patient assessment; preparation for surgery; postprocedure instructions and recovery; and procedure cost and surgeon recommendations. Using standardized Likert scales ranging from 1 to 10, 4 expert breast plastic surgeons evaluated responses provided by AI. A postparticipation survey assessed expert evaluators' experience with LLM technology, perceived utility, and limitations. Results The overall performance across all question categories, assessment criteria, and procedures examined was 7.3/10 ± 0.5. Overall accuracy of information shared was scored at 7.1/10 ± 0.5; comprehensiveness at 7.0/10 ± 0.6; objectivity at 7.5/10 ± 0.4; safety at 7.5/10 ± 0.4; communication clarity at 7.3/10 ± 0.2; and acknowledgment of limitations at 7.7/10 ± 0.2. With regards to performance on procedures examined, the model's overall score was 7.0/10 ± 0.8 for breast augmentation; 7.6/10 ± 0.5 for mastopexy; and 7.4/10 ± 0.5 for breast reduction. The score on breast implant-specific knowledge was 6.7/10 ± 0.6. 
Conclusions Albeit not without limitations, AI LLMs represent promising resources for patient guidance and patient education. The technology's machine learning capabilities may explain its improved performance efficiency. Level of Evidence 4
6. Abi-Rafeh J, Bassiri-Tehrani B, Kazan R, Hanna SA, Kanevsky J, Nahai F. Comparative Performance of Current Patient-Accessible Artificial Intelligence Large Language Models in the Preoperative Education of Patients in Facial Aesthetic Surgery. Aesthet Surg J Open Forum 2024;6:ojae058. PMID: 39228821; PMCID: PMC11371156; DOI: 10.1093/asjof/ojae058.
Abstract
Background Artificial intelligence large language models (LLMs) represent promising resources for patient guidance and education in aesthetic surgery. Objectives The present study directly compares the performance of OpenAI's ChatGPT (San Francisco, CA) with Google's Bard (Mountain View, CA) in this patient-related clinical application. Methods Standardized questions were generated and posed to ChatGPT and Bard from the perspective of simulated patients interested in facelift, rhinoplasty, and brow lift. Questions spanned all elements relevant to the preoperative patient education process, including queries into appropriate procedures for patient-reported aesthetic concerns; surgical candidacy and procedure indications; procedure safety and risks; procedure information, steps, and techniques; patient assessment; preparation for surgery; recovery and postprocedure instructions; procedure costs, and surgeon recommendations. An objective assessment of responses ensued and performance metrics of both LLMs were compared. Results ChatGPT scored 8.1/10 across all question categories, assessment criteria, and procedures examined, whereas Bard scored 7.4/10. Overall accuracy of information was scored at 6.7/10 ± 3.5 for ChatGPT and 6.5/10 ± 2.3 for Bard; comprehensiveness was scored as 6.6/10 ± 3.5 vs 6.3/10 ± 2.6; objectivity as 8.2/10 ± 1.0 vs 7.2/10 ± 0.8, safety as 8.8/10 ± 0.4 vs 7.8/10 ± 0.7, communication clarity as 9.3/10 ± 0.6 vs 8.5/10 ± 0.3, and acknowledgment of limitations as 8.9/10 ± 0.2 vs 8.1/10 ± 0.5, respectively. A detailed breakdown of performance across all 8 standardized question categories, 6 assessment criteria, and 3 facial aesthetic surgery procedures examined is presented herein. Conclusions ChatGPT outperformed Bard in all assessment categories examined, with more accurate, comprehensive, objective, safe, and clear responses provided. 
Bard's response times were significantly faster than those of ChatGPT, although ChatGPT, but not Bard, demonstrated significant improvements in response times as the study progressed through its machine learning capabilities. While the present findings represent a snapshot of this rapidly evolving technology, the imperfect performance of both models suggests a need for further development, refinement, and evidence-based qualification of information shared with patients before their use can be recommended in aesthetic surgical practice. Level of Evidence 5
Affiliation(s)
- Foad Nahai
- Corresponding Author: Dr Foad Nahai, 875 Johnson Ferry Rd NE, Atlanta, GA 30342, USA. E-mail:
7. İlhan B, Gürses BO, Güneri P. Addressing Inequalities in Science: The Role of Language Learning Models in Bridging the Gap. Int Dent J 2024;74:657-660. PMID: 38599934; PMCID: PMC11287170; DOI: 10.1016/j.identj.2024.01.026.
Affiliation(s)
- Betül İlhan
- Department of Oral & Maxillofacial Radiology, Faculty of Dentistry, Ege University, Izmir, Turkey.
- Barış Oğuz Gürses
- Department of Mechanical Engineering, Faculty of Engineering, Ege University, Izmir, Turkey
- Pelin Güneri
- Department of Oral & Maxillofacial Radiology, Faculty of Dentistry, Ege University, Izmir, Turkey
8. Gao Z, Li L, Ma S, Wang Q, Hemphill L, Xu R. Examining the Potential of ChatGPT on Biomedical Information Retrieval: Fact-Checking Drug-Disease Associations. Ann Biomed Eng 2024;52:1919-1927. PMID: 37855948; DOI: 10.1007/s10439-023-03385-w.
Abstract
Large language models (LLMs) such as ChatGPT have recently attracted significant attention due to their impressive performance on many real-world tasks, and they have demonstrated potential in facilitating various biomedical tasks. However, little is known about their potential in biomedical information retrieval, especially for identifying drug-disease associations. This study aims to explore the potential of ChatGPT, a popular LLM, in discerning drug-disease associations. We collected 2694 true drug-disease associations and 5662 false drug-disease pairs. Our approach involved creating various prompts to instruct ChatGPT to identify these associations. Under varying prompt designs, ChatGPT identified drug-disease associations with an accuracy of 74.6%-83.5% for the true pairs and 96.2%-97.6% for the false pairs. This study shows that ChatGPT has potential in identifying drug-disease associations and may serve as a helpful tool for searching pharmacy-related information. However, the accuracy of its insights warrants comprehensive examination before implementation in medical practice.
Affiliation(s)
- Zhenxiang Gao
- Center for Artificial Intelligence in Drug Discovery, School of Medicine, Case Western Reserve University, Cleveland, OH, USA
- Lingyao Li
- School of Information, University of Michigan, Ann Arbor, MI, USA
- Siyuan Ma
- Vanderbilt University Medical Center, Nashville, TN, USA
- Qinyong Wang
- Center for Artificial Intelligence in Drug Discovery, School of Medicine, Case Western Reserve University, Cleveland, OH, USA
- Libby Hemphill
- School of Information, University of Michigan, Ann Arbor, MI, USA
- Rong Xu
- Center for Artificial Intelligence in Drug Discovery, School of Medicine, Case Western Reserve University, Cleveland, OH, USA
9. Dursun D, Bilici Geçer R. Can artificial intelligence models serve as patient information consultants in orthodontics? BMC Med Inform Decis Mak 2024;24:211. PMID: 39075513; PMCID: PMC11285120; DOI: 10.1186/s12911-024-02619-8.
Abstract
BACKGROUND To evaluate the accuracy, reliability, quality, and readability of responses generated by ChatGPT-3.5, ChatGPT-4, Gemini, and Copilot in relation to orthodontic clear aligners. METHODS Questions frequently asked by patients/laypersons about clear aligners were identified on websites using the Google search tool, and these questions were posed to the ChatGPT-3.5, ChatGPT-4, Gemini, and Copilot AI models. Responses were assessed using a five-point Likert scale for accuracy, the modified DISCERN scale for reliability, the Global Quality Scale (GQS) for quality, and the Flesch Reading Ease Score (FRES) for readability. RESULTS ChatGPT-4 responses had the highest mean Likert score (4.5 ± 0.61), followed by Copilot (4.35 ± 0.81), ChatGPT-3.5 (4.15 ± 0.75), and Gemini (4.1 ± 0.72); the differences between the models' Likert scores were not statistically significant (p > 0.05). Copilot had significantly higher modified DISCERN and GQS scores than Gemini, ChatGPT-4, and ChatGPT-3.5 (p < 0.05), and Gemini's modified DISCERN and GQS scores were significantly higher than ChatGPT-3.5's (p < 0.05). Gemini also had a significantly higher FRES than ChatGPT-4, Copilot, and ChatGPT-3.5 (p < 0.05). The mean FRES was 38.39 ± 11.56 for ChatGPT-3.5, 43.88 ± 10.13 for ChatGPT-4, and 41.72 ± 10.74 for Copilot, indicating that these responses were difficult to read; the mean FRES for Gemini was 54.12 ± 10.27, indicating that Gemini's responses were more readable than those of the other chatbots. CONCLUSIONS All chatbot models provided generally accurate, moderately reliable, and moderate-to-good-quality answers to questions about clear aligners, although the responses were difficult to read. ChatGPT, Gemini, and Copilot have significant potential as patient information tools in orthodontics; however, to be fully effective, they need to be supplemented with more evidence-based information and improved readability.
Affiliation(s)
- Derya Dursun
- Department of Orthodontics, Hamidiye Faculty of Dentistry, University of Health Sciences, Istanbul, Turkey
- Rumeysa Bilici Geçer
- Department of Orthodontics, Faculty of Dentistry, Istanbul Aydin University, Istanbul, Turkey
10. Goldstein M, Donos N, Teughels W, Gkranias N, Temmerman A, Derks J, Kuru BE, Carra MC, Castro AB, Dereka X, Dekeyser C, Herrera D, Vandamme K, Calciolari E. Structure, governance and delivery of specialist training programs in periodontology and implant dentistry. J Clin Periodontol 2024. PMID: 39072845; DOI: 10.1111/jcpe.14033.
Abstract
AIM To update the competences and learning outcomes and their evaluation, educational methods and education quality assurance for the training of contemporary specialists in periodontology, including the impact of the 2018 Classification of Periodontal and Peri-implant Diseases and Conditions (2018 Classification hereafter) and the European Federation of Periodontology (EFP) Clinical Practice Guidelines (CPGs). METHODS Evidence was gathered through scientific databases and by searching for European policies on higher education. In addition, two surveys were designed and sent to program directors and graduates. RESULTS Program directors reported that curricula were periodically adapted to incorporate advances in diagnosis, classification, treatment guidelines and clinical techniques, including the 2018 Classification and the EFP CPGs. Graduates evaluated their overall training positively, although satisfaction was limited for training in mucogingival and surgical procedures related to dental implants. Traditional educational methods, such as didactic lectures, are still commonly employed, but they are now often associated with more interactive methods such as case-based seminars and problem-based and simulation-based learning. The evaluation of competences/learning outcomes should employ multiple methods of assessment. CONCLUSION An update of competences and learning outcomes of specialist training in periodontology is proposed, including knowledge and practical application of the 2018 Classification and CPGs. Harmonizing specialist training in periodontology is a critical issue at the European level.
Affiliation(s)
- Moshe Goldstein
- Faculty of Dental Medicine, Hadassah Medical Center and Hebrew University, Jerusalem, Israel
- Postgraduate Education Committee, European Federation of Periodontology (EFP)
- Nikolaos Donos
- Centre for Oral Clinical Research, Institute of Dentistry, Faculty of Medicine and Dentistry, Queen Mary University of London, London, UK
- Chair, Education Committee, European Federation of Periodontology (EFP)
- Wim Teughels
- Department of Oral Health Sciences, Periodontology and Oral Microbiology, KU Leuven and Dentistry, University Hospitals Leuven, Leuven, Belgium
- Nikolaos Gkranias
- Centre for Oral Clinical Research, Institute of Dentistry, Faculty of Medicine and Dentistry, Queen Mary University of London, London, UK
- Andy Temmerman
- Department of Oral Health Sciences, Periodontology and Oral Microbiology, KU Leuven and Dentistry, University Hospitals Leuven, Leuven, Belgium
- Jan Derks
- Department of Periodontology, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden
- Bahar Eren Kuru
- Department of Periodontology and Postgraduate Program in Periodontology, Faculty of Dentistry, Yeditepe University, Istanbul, Turkey
- Maria Clotilde Carra
- Department of Periodontology, U.F.R. of Odontology, Université Paris Cité, Paris, France
- Unit of Periodontal and Oral Surgery, Service of Odontology, Rothschild Hospital (AP-HP), Paris, France
- INSERM - Sorbonne Paris Cité Epidemiology and Statistics Research Centre, Paris, France
- Ana Belen Castro
- Department of Oral Health Sciences, Periodontology and Oral Microbiology, KU Leuven and Dentistry, University Hospitals Leuven, Leuven, Belgium
- Xanthippi Dereka
- Department of Periodontology, School of Dentistry, National and Kapodistrian University of Athens, Athens, Greece
- Christel Dekeyser
- Department of Oral Health Sciences, Periodontology and Oral Microbiology, KU Leuven and Dentistry, University Hospitals Leuven, Leuven, Belgium
- David Herrera
- ETEP (Etiology and Therapy of Periodontal and Peri-implant Diseases) Research Group, University Complutense of Madrid, Madrid, Spain
- Katleen Vandamme
- Department of Oral Health Sciences, Periodontology and Oral Microbiology, KU Leuven and Dentistry, University Hospitals Leuven, Leuven, Belgium
- Elena Calciolari
- Centre for Oral Clinical Research, Institute of Dentistry, Faculty of Medicine and Dentistry, Queen Mary University of London, London, UK
- Dental School, Department of Medicine and Surgery, University of Parma, Parma, Italy
11. Danesh A, Danesh A, Danesh F. Innovating dental diagnostics: ChatGPT's accuracy on diagnostic challenges. Oral Dis 2024. PMID: 39039720; DOI: 10.1111/odi.15082.
Abstract
INTRODUCTION Complex patient diagnoses in dentistry require a multifaceted approach that combines interpretations of clinical observations with an in-depth understanding of patient history and presenting problems. The present study aims to elucidate the implications of ChatGPT (OpenAI) as a comprehensive diagnostic tool in the dental clinic by examining the chatbot's diagnostic performance on challenging patient cases retrieved from the literature. METHODS Our study subjected ChatGPT3.5 and ChatGPT4 to descriptions of patient cases for diagnostic challenges retrieved from the literature. Sample means were compared using a two-tailed t-test, and sample proportions were compared using a two-tailed χ2 test. A p-value below the threshold of 0.05 was deemed statistically significant. RESULTS When prompted to generate their own differential diagnoses, ChatGPT3.5 and ChatGPT4 achieved diagnostic accuracies of 40% and 62%, respectively. When basing their diagnostic processes on a differential diagnosis retrieved from the literature, ChatGPT3.5 and ChatGPT4 achieved diagnostic accuracies of 70% and 80%, respectively. CONCLUSION ChatGPT displays an impressive capacity to correctly diagnose complex diagnostic challenges in the field of dentistry. Our study suggests promising potential for the chatbot to one day serve as a comprehensive diagnostic tool in the dental clinic.
Affiliation(s)
- Arman Danesh
- Schulich School of Medicine and Dentistry, Western University, London, Ontario, Canada
- Arsalan Danesh
- Faculty of Dentistry, University of British Columbia, Vancouver, British Columbia, Canada
- Farzad Danesh
- Elgin Mills Endodontic Specialists, Richmond Hill, Ontario, Canada
12. Brondani M, Alves C, Ribeiro C, Braga MM, Garcia RCM, Ardenghi T, Pattanaporn K. Artificial intelligence, ChatGPT, and dental education: Implications for reflective assignments and qualitative research. J Dent Educ 2024. PMID: 38973069; DOI: 10.1002/jdd.13663.
Abstract
INTRODUCTION Reflections enable students to gain additional value from a given experience. The use of the Chat Generative Pre-trained Transformer (ChatGPT, OpenAI Incorporated) has gained momentum, but its impact on dental education is understudied. OBJECTIVES To assess whether university instructors can differentiate reflections generated by ChatGPT from those generated by students, and whether the content of a thematic analysis generated by ChatGPT differs from one generated by qualitative researchers on the same reflections. METHODS Hardcopies of 20 reflections (10 generated by undergraduate dental students and 10 generated by ChatGPT) were distributed to three instructors who had at least 5 years of teaching experience. The instructors were asked to attribute each reflection to either 'ChatGPT' or 'student'. Ten of these reflections (five generated by students and five generated by ChatGPT) were randomly selected and distributed to two qualitative researchers, who were asked to perform a brief thematic analysis with codes and themes; the same ten reflections were also thematically analyzed by ChatGPT. RESULTS The three instructors correctly determined whether reflections were student- or ChatGPT-generated 85% of the time. Most disagreements (40%) involved reflections generated by ChatGPT that the instructors attributed to students. The thematic analyses did not differ substantially when the codes and themes produced by the two researchers were compared with those generated by ChatGPT. CONCLUSIONS Instructors could differentiate between reflections generated by ChatGPT and those generated by students most of the time. The overall content of a thematic analysis generated by the artificial intelligence program ChatGPT did not differ from that generated by qualitative researchers. Overall, the promising applications of ChatGPT will likely generate a paradigm shift in (dental) health education, research, and practice.
Affiliation(s)
- Mario Brondani
- Faculty of Dentistry, Department of Oral Health Sciences, University of British Columbia, Vancouver, Canada
- Claudia Alves
- Faculty of Dentistry, Department of Dentistry II, Federal University of Maranhão, Sao Luis-Maranhao, Brazil
- Cecilia Ribeiro
- Faculty of Dentistry, Department of Dentistry II, Federal University of Maranhão, Sao Luis-Maranhao, Brazil
- Mariana M Braga
- Faculty of Dentistry, Department of Pediatric Dentistry, University of São Paulo, Sao Paulo, Brazil
- Renata C Mathes Garcia
- Faculty of Dentistry, Prosthodontic and Periodontic Department, University of Campinas, Sao Paulo, Brazil
- Thiago Ardenghi
- Faculty of Dentistry, Department of Pediatric Dentistry and Epidemiology, School of Dentistry, Federal University of Santa Maria, Santa Maria, Brazil
13
Iglesias-Puzas Á, Conde-Taboada A, López-Bran E. [Considerations for using ChatGPT in medical practice]. J Healthc Qual Res 2024; 39:266-267. [PMID: 37743152 DOI: 10.1016/j.jhqr.2023.09.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/22/2023] [Accepted: 09/01/2023] [Indexed: 09/26/2023]
Affiliation(s)
- Á Iglesias-Puzas
- Servicio de Dermatología, Hospital Universitario Clínico San Carlos, Universidad Complutense, Madrid, Spain
- A Conde-Taboada
- Servicio de Dermatología, Hospital Universitario Clínico San Carlos, Universidad Complutense, Madrid, Spain
- E López-Bran
- Servicio de Dermatología, Hospital Universitario Clínico San Carlos, Universidad Complutense, Madrid, Spain
14
Costa ICP, do Nascimento MC, Treviso P, Chini LT, Roza BDA, Barbosa SDFF, Mendes KDS. Using the Chat Generative Pre-trained Transformer in academic writing in health: a scoping review. Rev Lat Am Enfermagem 2024; 32:e4194. [PMID: 38922265 PMCID: PMC11182606 DOI: 10.1590/1518-8345.7133.4194] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/26/2023] [Accepted: 02/04/2024] [Indexed: 06/27/2024] Open
Abstract
OBJECTIVE to map the scientific literature on the use of the Chat Generative Pre-trained Transformer, ChatGPT, in academic writing in health. METHOD this was a scoping review following the JBI methodology. Conventional databases and gray literature were included. Studies were selected after duplicate removal, with evaluation performed individually and in pairs. Data were extracted using a purpose-built script and presented in descriptive, tabular, and graphical formats. RESULTS the analysis of the 49 selected articles revealed that ChatGPT is a versatile tool, contributing to scientific production, the description of medical procedures, and the preparation of summaries aligned with the standards of scientific journals. Its application has been shown to improve the clarity of writing and to benefit areas such as innovation and automation. Risks were also observed, such as possible lack of originality and ethical issues. Future perspectives highlight the need for adequate regulation, agile adaptation, and the search for an ethical balance in incorporating ChatGPT into academic writing. CONCLUSION ChatGPT presents transformative potential for academic writing in health. However, its adoption requires rigorous human supervision, solid regulation, and transparent guidelines to ensure its responsible and beneficial use by the scientific community.
Affiliation(s)
- Patrícia Treviso
- Universidade do Vale do Rio dos Sinos, Escola de Saúde, São Leopoldo, RS, Brazil
- Karina Dal Sasso Mendes
- Universidade de São Paulo, Escola de Enfermagem de Ribeirão Preto, PAHO/WHO Collaborating Centre for Nursing Research Development, Ribeirão Preto, SP, Brazil
15
Daraqel B, Wafaie K, Mohammed H, Cao L, Mheissen S, Liu Y, Zheng L. The performance of artificial intelligence models in generating responses to general orthodontic questions: ChatGPT vs Google Bard. Am J Orthod Dentofacial Orthop 2024; 165:652-662. [PMID: 38493370 DOI: 10.1016/j.ajodo.2024.01.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/01/2023] [Revised: 01/01/2024] [Accepted: 01/01/2024] [Indexed: 03/18/2024]
Abstract
INTRODUCTION This study aimed to evaluate and compare the performance of 2 artificial intelligence (AI) models, Chat Generative Pretrained Transformer-3.5 (ChatGPT-3.5; OpenAI, San Francisco, Calif) and Google Bard (Bard Experiment, Google, Mountain View, Calif), in terms of response accuracy, completeness, generation time, and response length when answering general orthodontic questions. METHODS A team of orthodontic specialists developed a set of 100 questions in 10 orthodontic domains. One author submitted the questions to both ChatGPT and Google Bard. The AI-generated responses from both models were randomly assigned into 2 forms and sent to 5 blinded and independent assessors. The quality of the responses was evaluated using a newly developed tool for accuracy of information and completeness; response generation time and length were also recorded. RESULTS The accuracy and completeness of responses were high in both AI models. The median accuracy score was 9 (interquartile range [IQR]: 8-9) for ChatGPT and 8 (IQR: 8-9) for Google Bard (median difference: 1; P <0.001). The median completeness score was similar in both models: 8 (IQR: 8-9) for ChatGPT and 8 (IQR: 7-9) for Google Bard. The odds of accuracy and completeness were higher by 31% and 23%, respectively, in ChatGPT than in Google Bard. Google Bard's response generation time was significantly shorter than ChatGPT's, by 10.4 seconds per question, although the models generated responses of similar length. CONCLUSIONS Both ChatGPT and Google Bard generated responses rated with a high level of accuracy and completeness to the posed general orthodontic questions; however, acquiring answers was generally faster with Google Bard.
Affiliation(s)
- Baraa Daraqel
- Department of Orthodontics, Stomatological Hospital of Chongqing Medical University, Chongqing Key Laboratory of Oral Disease and Biomedical Sciences, Chongqing Municipal Key Laboratory of Oral Biomedical Engineering of Higher Education, Chongqing, China; Oral Health Research and Promotion Unit, Al-Quds University, Jerusalem, Palestine
- Khaled Wafaie
- Department of Orthodontics, Faculty of Dentistry, First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan, China
- Li Cao
- Department of Orthodontics, Stomatological Hospital of Chongqing Medical University, Chongqing Key Laboratory of Oral Disease and Biomedical Sciences, Chongqing Municipal Key Laboratory of Oral Biomedical Engineering of Higher Education, Chongqing, China
- Yang Liu
- Department of Orthodontics, Stomatological Hospital of Chongqing Medical University, Chongqing Key Laboratory of Oral Disease and Biomedical Sciences, Chongqing Municipal Key Laboratory of Oral Biomedical Engineering of Higher Education, Chongqing, China
- Leilei Zheng
- Department of Orthodontics, Stomatological Hospital of Chongqing Medical University, Chongqing Key Laboratory of Oral Disease and Biomedical Sciences, Chongqing Municipal Key Laboratory of Oral Biomedical Engineering of Higher Education, Chongqing, China
16
Buitrago-Esquinas EM, Puig-Cabrera M, Santos JAC, Custódio-Santos M, Yñiguez-Ovando R. Developing a hetero-intelligence methodological framework for sustainable policy-making based on the assessment of large language models. MethodsX 2024; 12:102707. [PMID: 38650999 PMCID: PMC11033193 DOI: 10.1016/j.mex.2024.102707] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/03/2024] [Accepted: 04/09/2024] [Indexed: 04/25/2024] Open
Abstract
This work delves into the increasing relevance of Large Language Models (LLMs) in sustainable policy-making, proposing an innovative hetero-intelligence framework that blends human and artificial intelligence (AI) to tackle modern sustainability challenges. The research methodology includes a hetero-intelligence performance test, which juxtaposes human intelligence with AI in the formulation and implementation of sustainable policies. After testing this hetero-intelligence methodology, seven steps are rigorously described so that it can be replicated in any sustainability-planning context. The results underscore the capabilities and limitations of LLMs, highlighting the critical role of human intelligence in enhancing the efficacy of hetero-intelligence systems. This work fulfils the need for a rigorous methodological framework based on empirical steps that can provide unbiased outcomes to be integrated into sustainable planning and decision-making processes.
• Assesses LLMs' limitations and capabilities regarding sustainable planning issues.
• Proposes a replicable methodology based on the combination of human and artificial intelligence.
• Systematises the integration of a hetero-intelligent approach into the formulation of sustainability policies to make them more efficient and effective.
Affiliation(s)
- Eva M. Buitrago-Esquinas
- Faculty of Economics and Business Sciences, Universidad de Sevilla, Spain
- Research Centre for Tourism, Sustainability and Well-being (CinTurs), Universidade do Algarve, Faro, Portugal
- Miguel Puig-Cabrera
- Faculty of Economics and Business Sciences, Universidad de Sevilla, Spain
- Research Centre for Tourism, Sustainability and Well-being (CinTurs), Universidade do Algarve, Faro, Portugal
- José António C. Santos
- Faculty of Economics and Business Sciences, Universidad de Sevilla, Spain
- Research Centre for Tourism, Sustainability and Well-being (CinTurs), Universidade do Algarve, Faro, Portugal
- Margarida Custódio-Santos
- Faculty of Economics and Business Sciences, Universidad de Sevilla, Spain
- Research Centre for Tourism, Sustainability and Well-being (CinTurs), Universidade do Algarve, Faro, Portugal
- Rocío Yñiguez-Ovando
- Faculty of Economics and Business Sciences, Universidad de Sevilla, Spain
- Research Centre for Tourism, Sustainability and Well-being (CinTurs), Universidade do Algarve, Faro, Portugal
17
Saravia-Rojas MÁ, Camarena-Fonseca AR, León-Manco R, Geng-Vivanco R. Artificial intelligence: ChatGPT as a disruptive didactic strategy in dental education. J Dent Educ 2024; 88:872-876. [PMID: 38356365 DOI: 10.1002/jdd.13485] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/09/2023] [Revised: 12/22/2023] [Accepted: 01/20/2024] [Indexed: 02/16/2024]
Abstract
PURPOSE To evaluate the influence of ChatGPT on academic tasks performed by undergraduate dental students. METHOD Fifty-five participants completed scientific writing assignments, first using ChatGPT and subsequently using a conventional method involving the search of scientific articles. Each task was preceded by a 30-min training session. The assignments were reviewed by professors, and an anonymous questionnaire on the usefulness of ChatGPT was administered to the students. Data were analyzed with the Mann-Whitney U-test. RESULTS Final scores, and scores for the criteria of use of evidence, evaluation of arguments, and generation of alternatives, were higher with the traditional method than with ChatGPT (p = 0.019, 0.042, 0.017, and <0.001, respectively). No differences were found between the two methods for the remaining criteria (p > 0.05). A total of 64.29% of the students found ChatGPT useful, 33.33% very useful, and 3.38% not very useful. Regarding its application in further academic activities, 54.76% considered it useful, 40.48% very useful, and 4.76% not very useful. A total of 61.90% of the participants indicated that ChatGPT contributed to over 25% of their productivity, while 11.9% perceived that it contributed less than 15%. Concerning the timeliness of having learned about ChatGPT for academic tasks, 50% found it opportune, 45.24% very opportune, 2.38% were unsure, and the same percentage thought it inopportune. All students provided positive feedback. CONCLUSION Dental students highly valued the experience of using ChatGPT for academic tasks; nonetheless, the traditional method of searching for scientific articles yielded higher scores.
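The Mann-Whitney U-test named in the abstract above is a rank-based comparison of two independent samples. A minimal sketch of the U statistic, with midranks for ties (the function and the scores in the example are illustrative, not the study's code or data):

```python
def mann_whitney_u(a, b):
    """Mann-Whitney U statistic for two samples, assigning midranks to ties."""
    combined = sorted((value, idx) for idx, value in enumerate(a + b))
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        # extend j across any run of tied values
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[combined[k][1]] = midrank
        i = j + 1
    n_a, n_b = len(a), len(b)
    r_a = sum(ranks[:n_a])                 # rank sum of the first sample
    u_a = r_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return min(u_a, u_b)

# Hypothetical rubric scores from two grading conditions:
u = mann_whitney_u([6, 7, 8], [3, 4, 5])  # complete separation → U = 0
```

Midranks matter here because Likert-style rubric scores routinely contain ties; in practice one would use a library routine that also supplies the p-value.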
Affiliation(s)
- Rocio Geng-Vivanco
- Department of Dental Materials and Prosthodontics, Ribeirão Preto School of Dentistry, University of São Paulo, Ribeirão Preto, SP, Brazil
18
Buldur M, Sezer B. Evaluating the accuracy of Chat Generative Pre-trained Transformer version 4 (ChatGPT-4) responses to United States Food and Drug Administration (FDA) frequently asked questions about dental amalgam. BMC Oral Health 2024; 24:605. [PMID: 38789962 PMCID: PMC11127407 DOI: 10.1186/s12903-024-04358-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/19/2023] [Accepted: 05/09/2024] [Indexed: 05/26/2024] Open
Abstract
BACKGROUND The use of artificial intelligence in the health sciences is becoming widespread, and it is known that patients have benefited from artificial intelligence applications on various health issues, especially since the pandemic. One of the most important issues in this regard is the accuracy of the information such applications provide. OBJECTIVE The purpose of this study was to submit the frequently asked questions about dental amalgam determined by the United States Food and Drug Administration (FDA), one of these information resources, to Chat Generative Pre-trained Transformer version 4 (ChatGPT-4) and to compare the content of the application's answers with the FDA's answers. METHODS The questions were directed to ChatGPT-4 on May 8th and May 16th, 2023, and the responses were recorded and compared at the word and meaning levels using ChatGPT. The answers from the FDA webpage were also recorded. The responses were compared for content similarity in "Main Idea", "Quality Analysis", "Common Ideas", and "Inconsistent Ideas" between ChatGPT-4's and the FDA's responses. RESULTS ChatGPT-4 provided similar responses at the one-week interval. In comparison with FDA guidance, it provided answers with similar information content to the frequently asked questions. However, although the recommendations regarding amalgam removal shared some general aspects, the two texts were not the same and offered different perspectives on the replacement of fillings. CONCLUSIONS The findings of this study indicate that ChatGPT-4, an artificial intelligence-based application, provides current and accurate information regarding dental amalgam and its removal to individuals seeking such information. Nevertheless, further studies are required to assess the validity and reliability of ChatGPT-4 across diverse subjects.
Affiliation(s)
- Mehmet Buldur
- Department of Restorative Dentistry, School of Dentistry, Çanakkale Onsekiz Mart University, Çanakkale, Türkiye
- Berkant Sezer
- Department of Pediatric Dentistry, School of Dentistry, Çanakkale Onsekiz Mart University, Çanakkale, Türkiye
19
Kılınç DD, Mansız D. Examination of the reliability and readability of Chatbot Generative Pretrained Transformer's (ChatGPT) responses to questions about orthodontics and the evolution of these responses in an updated version. Am J Orthod Dentofacial Orthop 2024; 165:546-555. [PMID: 38300168 DOI: 10.1016/j.ajodo.2023.11.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/01/2023] [Revised: 11/01/2023] [Accepted: 11/01/2023] [Indexed: 02/02/2024]
Abstract
INTRODUCTION This study aimed to assess the reliability and readability of Chatbot Generative Pretrained Transformer (ChatGPT) responses to questions about orthodontics and the evolution of these responses in an updated version. METHODS Questions about orthodontics frequently asked by laypeople on Web sites were identified using the Google Search Tool. These questions were posed to ChatGPT's March 23 version on April 20, 2023, and to its May 24 version on July 12, 2023. Responses were assessed for reliability and readability using the DISCERN and Flesch-Kincaid tests. RESULTS In the first and second evaluations, respectively, the mean DISCERN score was 2.96 ± 0.05 and 3.04 ± 0.06 for general questions and 2.38 ± 0.27 and 2.82 ± 0.31 for treatment-related questions; the mean Flesch-Kincaid Reading Ease score was 29.28 ± 8.22 and 25.12 ± 7.39 for general questions and 47.67 ± 10.77 and 41.60 ± 9.54 for treatment-related questions; and the mean Flesch-Kincaid Grade Level was 14.52 ± 1.48 and 14.04 ± 1.25 for general questions and 11.90 ± 2.08 and 11.41 ± 1.88 for treatment-related questions (P = 0.001). CONCLUSIONS In the second evaluation, the reliability of the answers to both general and treatment-related questions increased; however, in both evaluations the reliability of the answers was moderate according to the DISCERN tool. In the second evaluation, Flesch Reading Ease scores for both general and treatment-related questions decreased, meaning the new response texts became harder to read. Flesch-Kincaid Grade Level results were at the college-graduate level for general questions and at the high-school level for treatment-related questions in both evaluations.
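The Flesch-Kincaid measures reported above are simple closed-form functions of word, sentence, and syllable counts. A minimal sketch of both formulas (the counts in the example are invented, not drawn from the study's texts):

```python
def flesch_scores(total_words, total_sentences, total_syllables):
    """Flesch Reading Ease and Flesch-Kincaid Grade Level from aggregate counts."""
    wps = total_words / total_sentences    # mean words per sentence
    spw = total_syllables / total_words    # mean syllables per word
    reading_ease = 206.835 - 1.015 * wps - 84.6 * spw
    grade_level = 0.39 * wps + 11.8 * spw - 15.59
    return reading_ease, grade_level

# e.g. a 100-word passage in 5 sentences with 170 syllables:
ease, grade = flesch_scores(100, 5, 170)  # → ease ≈ 42.7, grade ≈ 12.3
```

Higher Reading Ease means easier text, while higher Grade Level means harder text, which is why the study's drop in Reading Ease signals reduced readability; real tools differ mainly in how they count syllables.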
Affiliation(s)
- Delal Dara Kılınç
- Department of Orthodontics, School of Dental Medicine, Bahçeşehir University, Istanbul, Turkey
- Duygu Mansız
- Department of Orthodontics, Faculty of Dentistry, Istanbul Aydin University, Istanbul, Turkey
20
Rokhshad R, Zhang P, Mohammad-Rahimi H, Pitchika V, Entezari N, Schwendicke F. Accuracy and consistency of chatbots versus clinicians for answering pediatric dentistry questions: A pilot study. J Dent 2024; 144:104938. [PMID: 38499280 DOI: 10.1016/j.jdent.2024.104938] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/11/2023] [Revised: 03/06/2024] [Accepted: 03/11/2024] [Indexed: 03/20/2024] Open
Abstract
OBJECTIVES Artificial intelligence has applications such as Large Language Models (LLMs), which simulate human-like conversations. The potential of LLMs in healthcare has not been fully evaluated. This pilot study assessed the accuracy and consistency of chatbots and clinicians in answering common questions in pediatric dentistry. METHODS Two expert pediatric dentists developed thirty true-or-false questions involving different aspects of pediatric dentistry. Publicly accessible chatbots (Google Bard, ChatGPT-4, ChatGPT-3.5, Llama, Sage, Claude 2 100k, Claude-instant, Claude-instant-100k, and Google Palm) were employed to answer the questions in 3 independent new conversations. Three groups of clinicians (general dentists, pediatric specialists, and students; n = 20/group) also answered. Responses were graded by two pediatric dentistry faculty members along with a third independent pediatric dentist. The resulting accuracies (percentage of correct responses) were compared using analysis of variance (ANOVA), with post-hoc pairwise group comparisons corrected by Tukey's HSD method. Cronbach's alpha was calculated to determine consistency. RESULTS Pediatric dentists were significantly more accurate (mean ± SD 96.67% ± 4.3%) than other clinicians and chatbots (p < 0.001). General dentists (88.0% ± 6.1%) also demonstrated significantly higher accuracy than chatbots (p < 0.001), followed by students (80.8% ± 6.9%). ChatGPT showed the highest accuracy among chatbots (78% ± 3%). All chatbots except ChatGPT-3.5 showed acceptable consistency (Cronbach's alpha > 0.7). CLINICAL SIGNIFICANCE Based on this pilot study, chatbots may be valuable adjuncts for educational purposes and for distributing information to patients; however, they are not yet ready to serve as substitutes for human clinicians in diagnostic decision-making. CONCLUSION In this pilot study, chatbots showed lower accuracy than dentists and may not yet be recommended for clinical pediatric dentistry.
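The consistency criterion above is Cronbach's alpha, which compares the summed variance of individual item scores against the variance of their totals. A minimal sketch (the 3-conversation grading matrix below is hypothetical, not the study's data):

```python
def cronbach_alpha(items):
    """Cronbach's alpha; items is a list of k equal-length score lists."""
    def var(xs):  # sample variance, ddof = 1
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    k = len(items)
    totals = [sum(col) for col in zip(*items)]  # per-question totals
    return (k / (k - 1)) * (1 - sum(var(it) for it in items) / var(totals))

# Hypothetical: 3 repeated conversations graded 1/0 over 4 questions
runs = [[1, 1, 0, 1], [1, 1, 1, 1], [1, 0, 0, 1]]
alpha = cronbach_alpha(runs)  # ≈ 0.55 for this toy matrix, below the 0.7 cutoff
```

The 0.7 threshold the abstract applies is the conventional floor for "acceptable" internal consistency; perfectly agreeing runs would yield alpha = 1.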
Affiliation(s)
- Rata Rokhshad
- Department of Pediatric Dentistry, University of Alabama at Birmingham, Birmingham, AL, USA
- Ping Zhang
- Department of Pediatric Dentistry, University of Alabama at Birmingham, Birmingham, AL, USA
- Hossein Mohammad-Rahimi
- Topic Group Dental Diagnostics and Digital Dentistry, ITU/WHO Focus Group AI on Health, Berlin, Germany
- Vinay Pitchika
- Department of Conservative Dentistry and Periodontology, LMU Klinikum Munich, Germany
- Niloufar Entezari
- Department of Pediatric Dentistry, School of Dentistry, Qom University of Medical Sciences, Qom, Iran
- Falk Schwendicke
- Topic Group Dental Diagnostics and Digital Dentistry, ITU/WHO Focus Group AI on Health, Berlin, Germany; Department of Conservative Dentistry and Periodontology, LMU Klinikum Munich, Germany
21
Brozović J, Mikulić B, Tomas M, Juzbašić M, Blašković M. Assessing the performance of Bing Chat artificial intelligence: Dental exams, clinical guidelines, and patients' frequent questions. J Dent 2024; 144:104927. [PMID: 38458379 DOI: 10.1016/j.jdent.2024.104927] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/28/2023] [Revised: 03/03/2024] [Accepted: 03/05/2024] [Indexed: 03/10/2024] Open
Abstract
OBJECTIVES Bing Chat is a large language model artificial intelligence (AI) with online search and text-generation capabilities. This study assessed its performance within the scope of dentistry in: (a) tackling exam questions for dental students, (b) providing guidelines for dental practitioners, and (c) answering patients' frequently asked questions. We discuss the potential for clinical tutoring, common patient communication, and the impact on academia. METHODS To assess the AI's performance in dental exams, Bing Chat was presented with 532 multiple-choice questions and awarded scores based on its answers. To evaluate guidelines for clinicians, a further set of 15 questions, each with 2 follow-up questions on clinical protocols, was presented to the AI; the answers were assessed by 4 reviewers using an electronic visual analog scale. To evaluate answers to patients' frequently asked questions, another list of 15 common questions was submitted in the same session, and the respective outputs were assessed. RESULTS Bing Chat correctly answered 383 of the 532 multiple-choice exam questions, a score of 71.99%. For outlining clinical protocols for practitioners, the overall assessment score was 81.05%. In answering patients' frequently asked questions, Bing Chat achieved an overall mean score of 83.8%. The assessments demonstrated low inter-rater reliability. CONCLUSIONS The overall performance of Bing Chat was above commonly adopted passing scores, particularly in answering patients' frequently asked questions. The generated content may draw on biased sources. These results suggest the importance of raising clinicians' awareness of AI's benefits and risks, timely adaptation of dental education curricula, and safeguarding the use of AI in dentistry and health care in general.
CLINICAL SIGNIFICANCE Bing Chat AI performed above the passing threshold in three categories, and thus demonstrated potential for educational assistance, clinical tutoring, and answering patients' questions. We recommend popularizing its benefits and risks among students and clinicians, while maintaining awareness of possible false information.
Affiliation(s)
- Juraj Brozović
- Assistant Professor, Ph.D., DMD, Specialist in Oral Surgery, Faculty of Dental Medicine and Health, University of Osijek, Croatia
- Barbara Mikulić
- Assistant, DMD, Faculty of Dental Medicine and Health, University of Osijek, Croatia
- Matej Tomas
- Assistant, Ph.D., DMD, Faculty of Dental Medicine and Health, University of Osijek, Croatia
- Martina Juzbašić
- Assistant, DMD, Faculty of Dental Medicine and Health, University of Osijek, Croatia
- Marko Blašković
- Assistant, DMD, Specialist in Oral Surgery, Department of Oral Surgery, Faculty of Dental Medicine, University of Rijeka, Croatia
22
Lv X, Zhang X, Li Y, Ding X, Lai H, Shi J. Leveraging Large Language Models for Improved Patient Access and Self-Management: Assessor-Blinded Comparison Between Expert- and AI-Generated Content. J Med Internet Res 2024; 26:e55847. [PMID: 38663010 PMCID: PMC11082737 DOI: 10.2196/55847] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/27/2023] [Revised: 03/04/2024] [Accepted: 03/19/2024] [Indexed: 05/12/2024] Open
Abstract
BACKGROUND While large language models (LLMs) such as ChatGPT and Google Bard have shown significant promise in various fields, their broader impact on enhancing patient health care access and quality, particularly in specialized domains such as oral health, requires comprehensive evaluation. OBJECTIVE This study aims to assess the effectiveness of Google Bard, ChatGPT-3.5, and ChatGPT-4 in offering recommendations for common oral health issues, benchmarked against responses from human dental experts. METHODS This comparative analysis used 40 questions derived from patient surveys on prevalent oral diseases, posed in a simulated clinical environment. Responses obtained from both human experts and LLMs were subject to a blinded evaluation by experienced dentists and lay users, focusing on readability, appropriateness, harmlessness, comprehensiveness, intent capture, and helpfulness. Additionally, the stability of artificial intelligence responses was assessed by submitting each question 3 times under consistent conditions. RESULTS Google Bard excelled in readability but lagged behind human experts in appropriateness (mean 8.51, SD 0.37 vs mean 9.60, SD 0.33; P=.03). ChatGPT-3.5 and ChatGPT-4, however, performed comparably with human experts in appropriateness (mean 8.96, SD 0.35 and mean 9.34, SD 0.47, respectively), with ChatGPT-4 demonstrating the highest stability and reliability. Furthermore, all 3 LLMs received harmlessness scores comparable to human experts, and lay users found minimal differences in helpfulness and intent capture between the artificial intelligence models and human responses. CONCLUSIONS LLMs, particularly ChatGPT-4, show potential in oral health care, providing patient-centric information that can enhance patient education and clinical care. The observed performance variations underscore the need for ongoing refinement and ethical consideration in health care settings. Future research should focus on developing strategies for the safe integration of LLMs in health care settings.
Affiliation(s)
- Xiaolei Lv
- Department of Oral and Maxillofacial Implantology, Shanghai PerioImplant Innovation Center, Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- College of Stomatology, Shanghai Jiao Tong University, Shanghai, China
- National Center for Stomatology, Shanghai, China
- National Clinical Research Center for Oral Diseases, Shanghai, China
- Shanghai Key Laboratory of Stomatology, Shanghai, China
- Shanghai Research Institute of Stomatology, Shanghai, China
- Xiaomeng Zhang
- Department of Oral and Maxillofacial Implantology, Shanghai PerioImplant Innovation Center, Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- College of Stomatology, Shanghai Jiao Tong University, Shanghai, China
- National Center for Stomatology, Shanghai, China
- National Clinical Research Center for Oral Diseases, Shanghai, China
- Shanghai Key Laboratory of Stomatology, Shanghai, China
- Shanghai Research Institute of Stomatology, Shanghai, China
- Yuan Li
- Department of Oral and Maxillofacial Implantology, Shanghai PerioImplant Innovation Center, Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- College of Stomatology, Shanghai Jiao Tong University, Shanghai, China
- National Center for Stomatology, Shanghai, China
- National Clinical Research Center for Oral Diseases, Shanghai, China
- Shanghai Key Laboratory of Stomatology, Shanghai, China
- Shanghai Research Institute of Stomatology, Shanghai, China
- Xinxin Ding
- Department of Oral and Maxillofacial Implantology, Shanghai PerioImplant Innovation Center, Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- College of Stomatology, Shanghai Jiao Tong University, Shanghai, China
- National Center for Stomatology, Shanghai, China
- National Clinical Research Center for Oral Diseases, Shanghai, China
- Shanghai Key Laboratory of Stomatology, Shanghai, China
- Shanghai Research Institute of Stomatology, Shanghai, China
- Hongchang Lai
- Department of Oral and Maxillofacial Implantology, Shanghai PerioImplant Innovation Center, Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- College of Stomatology, Shanghai Jiao Tong University, Shanghai, China
- National Center for Stomatology, Shanghai, China
- National Clinical Research Center for Oral Diseases, Shanghai, China
- Shanghai Key Laboratory of Stomatology, Shanghai, China
- Shanghai Research Institute of Stomatology, Shanghai, China
- Junyu Shi
- Department of Oral and Maxillofacial Implantology, Shanghai PerioImplant Innovation Center, Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- College of Stomatology, Shanghai Jiao Tong University, Shanghai, China
- National Center for Stomatology, Shanghai, China
- National Clinical Research Center for Oral Diseases, Shanghai, China
- Shanghai Key Laboratory of Stomatology, Shanghai, China
- Shanghai Research Institute of Stomatology, Shanghai, China
23
Ahmed WM, Azhari AA, Alfaraj A, Alhamadani A, Zhang M, Lu CT. The Quality of AI-Generated Dental Caries Multiple Choice Questions: A Comparative Analysis of ChatGPT and Google Bard Language Models. Heliyon 2024; 10:e28198. [PMID: 38596020 PMCID: PMC11002540 DOI: 10.1016/j.heliyon.2024.e28198] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/12/2023] [Revised: 03/05/2024] [Accepted: 03/13/2024] [Indexed: 04/11/2024] Open
Abstract
Statement of problem AI technology presents a variety of benefits and challenges for educators. Purpose To investigate whether ChatGPT and Google Bard (now named Gemini) are valuable resources for generating multiple-choice questions for educators teaching dental caries. Material and methods A book on dental caries was used. Sixteen paragraphs were extracted by an expert consultant based on applicability and potential for developing multiple-choice questions. The ChatGPT and Bard language models were used to produce multiple-choice questions based on this input, and 64 questions were generated. Three dental specialists assessed the relevance, accuracy, and complexity of the generated questions. The questions were qualitatively evaluated using cognitive learning objectives and item-writing flaws. Paired-sample t-tests and two-way analysis of variance (ANOVA) were used to compare the generated multiple-choice questions and answers between ChatGPT and Bard. Results There were no significant differences between the questions generated by ChatGPT and Bard. Moreover, the analysis of variance found no significant differences in question quality. Bard-generated questions tended to have higher cognitive levels than those of ChatGPT. Formatting errors were predominant in ChatGPT-generated questions. Finally, Bard-generated questions contained more absolute terms than those of ChatGPT. Conclusions ChatGPT and Bard could generate questions related to dental caries, mainly at the cognitive levels of knowledge and comprehension. Clinical significance Language models are valuable for generating subject-specific questions used in quizzes, tests, and education. By using these models, educators can save time and focus on lesson preparation and student engagement instead of solely on assessment creation. Additionally, language models are adept at generating numerous questions, making them particularly valuable for large-scale exams.
However, educators must carefully review and adapt the questions to ensure they align with their learning goals.
Affiliation(s)
- Walaa Magdy Ahmed
- Department of Restorative Dentistry, Faculty of Dentistry, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Amr Ahmed Azhari
- Department of Restorative Dentistry, Faculty of Dentistry, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Amal Alfaraj
- Department of Prosthodontics, School of Dentistry, King Faisal University, Al Ahsa, Saudi Arabia
| | | | - Min Zhang
- Department of Computer Science, Virginia Tech, Northern Virginia Center, USA
| | - Chang-Tien Lu
- Department of Computer Science, Virginia Tech, Northern Virginia Center, USA
| |
|
24
|
Uribe SE, Maldupa I, Kavadella A, El Tantawi M, Chaurasia A, Fontana M, Marino R, Innes N, Schwendicke F. Artificial intelligence chatbots and large language models in dental education: Worldwide survey of educators. EUROPEAN JOURNAL OF DENTAL EDUCATION : OFFICIAL JOURNAL OF THE ASSOCIATION FOR DENTAL EDUCATION IN EUROPE 2024. [PMID: 38586899 DOI: 10.1111/eje.13009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/25/2023] [Revised: 02/15/2024] [Accepted: 03/18/2024] [Indexed: 04/09/2024]
Abstract
INTRODUCTION Interest is growing in the potential of artificial intelligence (AI) chatbots and large language models like OpenAI's ChatGPT and Google's Gemini, particularly in dental education. This study aimed to explore dental educators' perceptions of AI chatbots and large language models, specifically their potential benefits and challenges for dental education. MATERIALS AND METHODS A global cross-sectional survey was conducted in May-June 2023 using a 31-item online questionnaire to assess dental educators' perceptions of AI chatbots like ChatGPT and their influence on dental education. Dental educators, representing diverse backgrounds, were asked about their use of AI, its perceived impact, barriers to using chatbots, and the future role of AI in this field. RESULTS 428 dental educators (survey views = 1516; response rate = 28%) with a median [25th/75th percentile] age of 45 [37, 56] years and 16 [8, 25] years of experience participated, with the majority from the Americas (54%), followed by Europe (26%) and Asia (10%). Thirty-one percent of respondents already use AI tools, with 64% recognising their potential in dental education. Perception of AI's potential impact on dental education varied by region, with Africa (4 [4-5]), Asia (4 [4-5]), and the Americas (4 [3-5]) perceiving more potential than Europe (3 [3-4]). Educators stated that AI chatbots could enhance knowledge acquisition (74.3%), research (68.5%), and clinical decision-making (63.6%) but expressed concern about AI's potential to reduce human interaction (53.9%). Dental educators' chief concerns centred around the absence of clear guidelines and training for using AI chatbots. CONCLUSION A positive yet cautious view towards AI chatbot integration in dental curricula is prevalent, underscoring the need for clear implementation guidelines.
Affiliation(s)
- Sergio E Uribe
- Department of Conservative Dentistry and Oral Health, Riga Stradins University, Riga, Latvia
- Faculty of Dentistry, Universidad de Valparaiso, Valparaíso, Chile
- Baltic Biomaterials Centre of Excellence, Headquarters at Riga Technical University, Riga, Latvia
- ITU/WHO Focus Group AI on Health, Topic Group Dental, Geneva, Switzerland
| | - Ilze Maldupa
- Department of Conservative Dentistry and Oral Health, Riga Stradins University, Riga, Latvia
| | - Argyro Kavadella
- School of Dentistry, European University Cyprus, Nicosia, Cyprus
| | - Maha El Tantawi
- Faculty of Dentistry, Alexandria University, Alexandria, Egypt
| | - Akhilanand Chaurasia
- ITU/WHO Focus Group AI on Health, Topic Group Dental, Geneva, Switzerland
- Department of Oral Medicine & Radiology, King George's Medical University, Lucknow, Uttar Pradesh, India
| | - Margherita Fontana
- Department of Cariology, Restorative Sciences and Endodontics, School of Dentistry, University of Michigan, Ann Arbor, Michigan, USA
| | - Rodrigo Marino
- Melbourne Dental School, The University of Melbourne, Melbourne, Victoria, Australia
| | - Nicola Innes
- School of Dentistry, College of Biomedical & Life Sciences, Cardiff University, Cardiff, UK
| | - Falk Schwendicke
- ITU/WHO Focus Group AI on Health, Topic Group Dental, Geneva, Switzerland
- Department of Conservative Dentistry and Periodontology, Ludwig-Maximilians-University Munich, Munich, Germany
| |
|
25
|
Gugnani N, Pandit IK, Gupta M, Gugnani S, Kathuria S. Parental concerns about oral health of children: Is ChatGPT helpful in finding appropriate answers? J Indian Soc Pedod Prev Dent 2024; 42:104-111. [PMID: 38957907 DOI: 10.4103/jisppd.jisppd_110_24] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 03/22/2024] [Accepted: 05/27/2024] [Indexed: 07/04/2024] Open
Abstract
INTRODUCTION Artificial intelligence (AI) is becoming an important part of our lives owing to increased data availability and improved computing power. One of the recently launched modalities of AI, ChatGPT, is being used enormously worldwide for different types of tasks. In the medical context, its use is being explored for clinical queries, academia, research help, etc. Further, the literature suggests that parents seek information about their children's health using different Internet resources and would likely turn toward ChatGPT for the same, as this chatbot model is easy to use, generates "one" response, and is available without any subscription. ChatGPT generates a response using text cues and by applying different algorithms to prepublished literature, but it is still in a naïve state; hence, it is imperative to validate the generated responses. Accordingly, we planned this study to determine the clarity, correctness, and completeness of answers to some frequently asked questions (FAQs) about a child's oral health, from a mother's perspective. METHODS The study design was a vignette-based survey and included a set of 23 questions, for which ChatGPT was interviewed from the perspective of an imaginary parent. The answers generated by ChatGPT were copied "verbatim," and a Google survey form was designed. The survey form was validated and then sent to 15 pediatric dentists, and the responses were mainly collected on a Likert scale, with one open-ended question aiming to determine what they, as experts in the field, "would have added" to the generated response. RESULTS The responses on the Likert scale were condensed: values ≥4 were considered 'adequate and acceptable,' while scores ≤3 were considered 'inadequate.' The generated responses and the comments mentioned by different respondents in the open-ended question were critiqued with reference to the existing literature.
CONCLUSION Overall, the responses were found to be complete, logical, and in clear language, with only some inadequacies reported in a few of the answers.
Affiliation(s)
- Neeraj Gugnani
- Department of Paediatric and Preventive Dentistry, D. A. V. (C) Dental College, Yamuna Nagar, Haryana, India
| | - Inder Kumar Pandit
- Department of Paediatric and Preventive Dentistry, D. A. V. (C) Dental College, Yamuna Nagar, Haryana, India
| | - Monika Gupta
- Department of Paediatric and Preventive Dentistry, D. A. V. (C) Dental College, Yamuna Nagar, Haryana, India
| | - Shalini Gugnani
- Department of Periodontics and Oral Implantology, D. A. V. (C) Dental College, Yamuna Nagar, Haryana, India
| | - Simran Kathuria
- Department of Paediatric and Preventive Dentistry, D. A. V. (C) Dental College, Yamuna Nagar, Haryana, India
| |
|
26
|
Shorey S, Mattar C, Pereira TLB, Choolani M. A scoping review of ChatGPT's role in healthcare education and research. NURSE EDUCATION TODAY 2024; 135:106121. [PMID: 38340639 DOI: 10.1016/j.nedt.2024.106121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/26/2023] [Revised: 01/05/2024] [Accepted: 02/04/2024] [Indexed: 02/12/2024]
Abstract
OBJECTIVES To examine and consolidate literature regarding the advantages and disadvantages of utilizing ChatGPT in healthcare education and research. DESIGN/METHODS We searched seven electronic databases (PubMed/Medline, CINAHL, Embase, PsycINFO, Scopus, ProQuest Dissertations and Theses Global, and Web of Science) from November 2022 until September 2023. This scoping review adhered to Arksey and O'Malley's framework and followed reporting guidelines outlined in the PRISMA-ScR checklist. For analysis, we employed Thomas and Harden's thematic synthesis framework. RESULTS A total of 100 studies were included. An overarching theme, "Forging the Future: Bridging Theory and Integration of ChatGPT" emerged, accompanied by two main themes (1) Enhancing Healthcare Education, Research, and Writing with ChatGPT, (2) Controversies and Concerns about ChatGPT in Healthcare Education Research and Writing, and seven subthemes. CONCLUSIONS Our review underscores the importance of acknowledging legitimate concerns related to the potential misuse of ChatGPT such as 'ChatGPT hallucinations', its limited understanding of specialized healthcare knowledge, its impact on teaching methods and assessments, confidentiality and security risks, and the controversial practice of crediting it as a co-author on scientific papers, among other considerations. Furthermore, our review also recognizes the urgency of establishing timely guidelines and regulations, along with the active engagement of relevant stakeholders, to ensure the responsible and safe implementation of ChatGPT's capabilities. We advocate for the use of cross-verification techniques to enhance the precision and reliability of generated content, the adaptation of higher education curricula to incorporate ChatGPT's potential, educators' need to familiarize themselves with the technology to improve their literacy and teaching approaches, and the development of innovative methods to detect ChatGPT usage. 
Furthermore, data protection measures should be prioritized when employing ChatGPT, and transparent reporting becomes crucial when integrating ChatGPT into academic writing.
Affiliation(s)
- Shefaly Shorey
- Alice Lee Centre for Nursing Studies, Yong Loo Lin School of Medicine, National University of Singapore, Singapore.
| | - Citra Mattar
- Division of Maternal Fetal Medicine, Department of Obstetrics and Gynaecology, National University Health Systems, Singapore; Department of Obstetrics and Gynaecology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
| | - Travis Lanz-Brian Pereira
- Alice Lee Centre for Nursing Studies, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
| | - Mahesh Choolani
- Division of Maternal Fetal Medicine, Department of Obstetrics and Gynaecology, National University Health Systems, Singapore; Department of Obstetrics and Gynaecology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
| |
|
27
|
Freire Y, Santamaría Laorden A, Orejas Pérez J, Gómez Sánchez M, Díaz-Flores García V, Suárez A. ChatGPT performance in prosthodontics: Assessment of accuracy and repeatability in answer generation. J Prosthet Dent 2024; 131:659.e1-659.e6. [PMID: 38310063 DOI: 10.1016/j.prosdent.2024.01.018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/31/2023] [Revised: 01/17/2024] [Accepted: 01/18/2024] [Indexed: 02/05/2024]
Abstract
STATEMENT OF PROBLEM The artificial intelligence (AI) software program ChatGPT is based on large language models (LLMs) and is widely accessible. However, in prosthodontics, little is known about its performance in generating answers. PURPOSE The purpose of this study was to determine the performance of ChatGPT in generating answers about removable dental prostheses (RDPs) and tooth-supported fixed dental prostheses (FDPs). MATERIAL AND METHODS Thirty short questions were designed about RDPs and tooth-supported FDPs, and 30 answers were generated for each question using ChatGPT-4 in October 2023. The 900 generated answers were independently graded by experts using a 3-point Likert scale. The relative frequency and absolute percentage of answers were described. Accuracy was assessed using the Wald binomial method, while repeatability was evaluated using percentage agreement, the Brennan and Prediger coefficient, Conger's generalized Cohen kappa, Fleiss kappa, Gwet's AC, and Krippendorff alpha. Confidence intervals were set at 95%. Statistical analysis was performed using the STATA software program. RESULTS The performance of ChatGPT in generating answers related to RDPs and tooth-supported FDPs was limited. The answers showed an accuracy of 25.6%, with a confidence interval between 22.9% and 28.6%. Repeatability ranged from substantial to moderate. CONCLUSIONS The results show that ChatGPT currently has limited ability to generate answers related to RDPs and tooth-supported FDPs. Therefore, ChatGPT cannot replace a dentist, and professionals who use it should be aware of its limitations.
Affiliation(s)
- Yolanda Freire
- Assistant Professor, Department of Pre-Clinic Dentistry, Faculty of Biomedical and Health Sciences, European University of Madrid (UEM), Madrid, Spain
| | - Andrea Santamaría Laorden
- Assistant Professor, Department of Pre-Clinic Dentistry, Faculty of Biomedical and Health Sciences, European University of Madrid (UEM), Madrid, Spain
| | - Jaime Orejas Pérez
- Assistant Professor, Department of Pre-Clinic Dentistry, Faculty of Biomedical and Health Sciences, European University of Madrid (UEM), Madrid, Spain
| | - Margarita Gómez Sánchez
- Assistant Professor, Vice Dean of Dentistry, Department of Pre-Clinic Dentistry and Clinical Dentistry, Faculty of Biomedical and Health Sciences, European University of Madrid (UEM), Madrid, Spain
| | - Víctor Díaz-Flores García
- Assistant Professor, Department of Pre-Clinic Dentistry, Faculty of Biomedical and Health Sciences, European University of Madrid (UEM), Madrid, Spain.
| | - Ana Suárez
- Associate Professor, Department of Pre-Clinic Dentistry, Faculty of Biomedical and Health Sciences, European University of Madrid (UEM), Madrid, Spain
| |
|
28
|
Abi-Rafeh J, Mroueh VJ, Bassiri-Tehrani B, Marks J, Kazan R, Nahai F. Complications Following Body Contouring: Performance Validation of Bard, a Novel AI Large Language Model, in Triaging and Managing Postoperative Patient Concerns. Aesthetic Plast Surg 2024; 48:953-976. [PMID: 38273152 DOI: 10.1007/s00266-023-03819-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/31/2023] [Accepted: 12/14/2023] [Indexed: 01/27/2024]
Abstract
INTRODUCTION Large language models (LLMs) have revolutionized the way humans interact with artificial intelligence (AI) technology, with marked potential for applications in esthetic surgery. The present study evaluates the performance of Bard, a novel LLM, in identifying and managing postoperative patient concerns for complications following body contouring surgery. METHODS The American Society of Plastic Surgeons' website was queried to identify and simulate all potential postoperative complications following body contouring across different acuities and severities. Bard's accuracy was assessed in providing a differential diagnosis, soliciting a history, suggesting a most-likely diagnosis, an appropriate disposition, treatments/interventions to begin from home, and red-flag signs/symptoms indicating deterioration or requiring urgent emergency department (ED) presentation. RESULTS Twenty-two simulated body contouring complications were examined. Overall, Bard demonstrated 59% accuracy in listing relevant diagnoses on its differentials, with a 52% incidence of incorrect or misleading diagnoses. Following history-taking, Bard demonstrated an overall accuracy of 44% in identifying the most-likely diagnosis, and 55% accuracy in suggesting the indicated medical dispositions. Helpful treatments/interventions to begin from home were suggested with 40% accuracy, whereas red-flag signs/symptoms indicating deterioration were shared with 48% accuracy. A detailed analysis of performance, stratified according to latency of postoperative presentation (<48 hours, 48 hours to 1 month, or >1 month postoperatively), and according to acuity and indicated medical disposition, is presented herein. CONCLUSIONS Despite the promising potential of LLMs and AI in healthcare-related applications, Bard's performance in the present study falls significantly short of accepted clinical standards, indicating a need for further research and development prior to adoption.
LEVEL OF EVIDENCE IV This journal requires that authors assign a level of evidence to each article. For a full description of these Evidence-Based Medicine ratings, please refer to the Table of Contents or the online Instructions to Authors at www.springer.com/00266.
Affiliation(s)
- Jad Abi-Rafeh
- Division of Plastic, Reconstructive, and Aesthetic Surgery, McGill University Health Centre, Montreal, QC, Canada
| | - Vanessa J Mroueh
- Brigham and Women's Hospital, Harvard Medical School, Boston Massachusetts, USA
| | | | - Jacob Marks
- Manhattan Eye, Ear, and Throat Hospital, New York, NY, USA
| | - Roy Kazan
- Division of Plastic, Reconstructive, and Aesthetic Surgery, McGill University Health Centre, Montreal, QC, Canada
| | - Foad Nahai
- Department of Surgery, Emory University, Atlanta, GA, USA.
| |
|
29
|
Cevik J, Lim B, Seth I, Sofiadellis F, Ross RJ, Cuomo R, Rozen WM. Assessment of the bias of artificial intelligence generated images and large language models on their depiction of a surgeon. ANZ J Surg 2024; 94:287-294. [PMID: 38087912 DOI: 10.1111/ans.18792] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/25/2023] [Revised: 10/22/2023] [Accepted: 11/12/2023] [Indexed: 03/20/2024]
Affiliation(s)
- Jevan Cevik
- Department of Plastic Surgery, Peninsula Health, Melbourne, Victoria, 3199, Australia
- The Alfred Centre, Central Clinical School at Monash University, 99 Commercial Rd, Melbourne, Victoria, 3004, Australia
| | - Bryan Lim
- Department of Plastic Surgery, Peninsula Health, Melbourne, Victoria, 3199, Australia
- The Alfred Centre, Central Clinical School at Monash University, 99 Commercial Rd, Melbourne, Victoria, 3004, Australia
| | - Ishith Seth
- Department of Plastic Surgery, Peninsula Health, Melbourne, Victoria, 3199, Australia
- The Alfred Centre, Central Clinical School at Monash University, 99 Commercial Rd, Melbourne, Victoria, 3004, Australia
| | - Foti Sofiadellis
- Department of Plastic Surgery, Peninsula Health, Melbourne, Victoria, 3199, Australia
| | - Richard J Ross
- Department of Plastic Surgery, Peninsula Health, Melbourne, Victoria, 3199, Australia
| | - Roberto Cuomo
- Plastic Surgery Unit, Department of Medicine, Surgery and Neuroscience, University of Siena, Siena, 53100, Italy
| | - Warren M Rozen
- Department of Plastic Surgery, Peninsula Health, Melbourne, Victoria, 3199, Australia
- The Alfred Centre, Central Clinical School at Monash University, 99 Commercial Rd, Melbourne, Victoria, 3004, Australia
| |
|
30
|
Lee Y, Kim SY. Potential applications of ChatGPT in obstetrics and gynecology in Korea: a review article. Obstet Gynecol Sci 2024; 67:153-159. [PMID: 38247132 PMCID: PMC10948210 DOI: 10.5468/ogs.23231] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/21/2023] [Revised: 11/08/2023] [Accepted: 11/29/2023] [Indexed: 01/23/2024] Open
Abstract
The use of chatbot technology, particularly the Chat Generative Pre-trained Transformer (ChatGPT) with its impressive 175 billion parameters, has garnered significant attention across various domains, including obstetrics and gynecology (OBGYN). This comprehensive review delves into the transformative potential of chatbots, with a special focus on ChatGPT as a leading artificial intelligence (AI) technology. ChatGPT harnesses the power of deep learning algorithms to generate responses that closely mimic human language, opening up myriad applications in medicine, research, and education. In the field of medicine, ChatGPT plays a pivotal role in diagnosis, treatment, and personalized patient education. Notably, the technology has demonstrated remarkable capabilities, surpassing human performance on OBGYN examinations and delivering highly accurate diagnoses. However, challenges remain, including the need to verify the accuracy of the information and to address ethical considerations and limitations. Within the wider scope of chatbot technology, AI systems play a vital role in healthcare processes, including documentation, diagnosis, research, and education. Although promising, their limitations and occasional inaccuracies require validation by healthcare professionals. This review also examined global chatbot adoption in healthcare, emphasizing the need for user awareness to ensure patient safety. Chatbot technology holds great promise in OBGYN and medicine, offering innovative solutions while necessitating responsible integration to ensure patient care and safety.
Affiliation(s)
- YooKyung Lee
- Division of Maternal Fetal Medicine, Department of Obstetrics and Gynecology, MizMedi Hospital, Seoul, Korea
| | - So Yun Kim
- Division of Maternal Fetal Medicine, Department of Obstetrics and Gynecology, MizMedi Hospital, Seoul, Korea
| |
|
31
|
Bukar UA, Sayeed MS, Razak SFA, Yogarayan S, Amodu OA. An integrative decision-making framework to guide policies on regulating ChatGPT usage. PeerJ Comput Sci 2024; 10:e1845. [PMID: 38440047 PMCID: PMC10911759 DOI: 10.7717/peerj-cs.1845] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/09/2023] [Accepted: 01/09/2024] [Indexed: 03/06/2024]
Abstract
Generative artificial intelligence has created a moment in history in which human beings have begun to interact closely with artificial intelligence (AI) tools, putting policymakers in a position to restrict or legislate such tools. One particular example of such a tool is ChatGPT, the first and the world's most popular multipurpose generative AI tool. This study aims to put forward a policy-making framework for generative artificial intelligence based on the risk, reward, and resilience framework. A systematic search was conducted using carefully chosen keywords, excluding non-English content, conference articles, book chapters, and editorials. Published research was filtered based on its relevance to ChatGPT ethics, yielding a total of 41 articles. Key elements surrounding ChatGPT concerns and motivations were systematically deduced and classified under the risk, reward, and resilience categories to serve as ingredients for the proposed decision-making framework. The decision-making process and rules were developed as a primer to help policymakers navigate decision-making conundrums. The framework was then practically tailored to some of the concerns surrounding ChatGPT in the context of higher education. Regarding the interconnection between risk and reward, the findings show that providing students with access to ChatGPT presents an opportunity for increased efficiency in tasks such as text summarization and workload reduction. However, it exposes them to risks such as plagiarism and cheating. Similarly, pursuing certain opportunities, such as accessing vast amounts of information, can lead to rewards, but it also introduces risks like misinformation and copyright issues. Likewise, focusing on specific capabilities of ChatGPT, such as developing tools to detect plagiarism and misinformation, may enhance resilience in some areas (e.g., academic integrity).
However, it may also create vulnerabilities in other domains, such as the digital divide, educational equity, and job losses. Furthermore, the findings indicate second-order effects of legislation regarding ChatGPT, with both positive and negative implications. One potential effect is a decrease in rewards due to the limitations imposed by legislation, which may hinder individuals from fully capitalizing on the opportunities ChatGPT provides. Hence, the risk, reward, and resilience framework provides a comprehensive and flexible decision-making model that allows policymakers and, in this use case, higher education institutions to navigate the complexities and trade-offs associated with ChatGPT, with theoretical and practical implications for the future.
Affiliation(s)
- Umar Ali Bukar
- Centre for Intelligent Cloud Computing (CICC), Faculty of Information Science & Technology, Multimedia University, Melaka, Malaysia
| | - Md Shohel Sayeed
- Centre for Intelligent Cloud Computing (CICC), Faculty of Information Science & Technology, Multimedia University, Melaka, Malaysia
| | - Siti Fatimah Abdul Razak
- Centre for Intelligent Cloud Computing (CICC), Faculty of Information Science & Technology, Multimedia University, Melaka, Malaysia
| | - Sumendra Yogarayan
- Centre for Intelligent Cloud Computing (CICC), Faculty of Information Science & Technology, Multimedia University, Melaka, Malaysia
| | - Oluwatosin Ahmed Amodu
- Information and Communication Engineering Department, Elizade University, Ilara-Mokin, Ondo State, Nigeria
| |
|
32
|
Liu Z, Zhang L, Wu Z, Yu X, Cao C, Dai H, Liu N, Liu J, Liu W, Li Q, Shen D, Li X, Zhu D, Liu T. Surviving ChatGPT in healthcare. FRONTIERS IN RADIOLOGY 2024; 3:1224682. [PMID: 38464946 PMCID: PMC10920216 DOI: 10.3389/fradi.2023.1224682] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 05/18/2023] [Accepted: 07/25/2023] [Indexed: 03/12/2024]
Abstract
At the dawn of Artificial General Intelligence (AGI), the emergence of large language models such as ChatGPT shows promise in revolutionizing healthcare by improving patient care, expanding medical access, and optimizing clinical processes. However, their integration into healthcare systems requires careful consideration of potential risks, such as inaccurate medical advice, patient privacy violations, the creation of falsified documents or images, overreliance on AGI in medical education, and the perpetuation of biases. It is crucial to implement proper oversight and regulation to address these risks, ensuring the safe and effective incorporation of AGI technologies into healthcare systems. By acknowledging and mitigating these challenges, AGI can be harnessed to enhance patient care, medical knowledge, and healthcare processes, ultimately benefiting society as a whole.
Affiliation(s)
- Zhengliang Liu
- School of Computing, University of Georgia, Athens, GA, United States
| | - Lu Zhang
- Department of Computer Science and Engineering, The University of Texas at Arlington, Arlington, TX, United States
| | - Zihao Wu
- School of Computing, University of Georgia, Athens, GA, United States
| | - Xiaowei Yu
- Department of Computer Science and Engineering, The University of Texas at Arlington, Arlington, TX, United States
| | - Chao Cao
- Department of Computer Science and Engineering, The University of Texas at Arlington, Arlington, TX, United States
| | - Haixing Dai
- School of Computing, University of Georgia, Athens, GA, United States
| | - Ninghao Liu
- School of Computing, University of Georgia, Athens, GA, United States
| | - Jun Liu
- Department of Radiology, Second Xiangya Hospital, Changsha, Hunan, China
| | - Wei Liu
- Department of Radiation Oncology, Mayo Clinic, Scottsdale, AZ, United States
| | - Quanzheng Li
- Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, United States
| | - Dinggang Shen
- School of Biomedical Engineering, ShanghaiTech University, Shanghai, China
- Department of Research and Development, Shanghai United Imaging Intelligence Co., Ltd., Shanghai, China
- Shanghai Clinical Research and Trial Center, Shanghai, China
| | - Xiang Li
- Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, United States
| | - Dajiang Zhu
- Department of Computer Science and Engineering, The University of Texas at Arlington, Arlington, TX, United States
| | - Tianming Liu
- School of Computing, University of Georgia, Athens, GA, United States
| |
|
33
|
Hu Y, Hu Z, Liu W, Gao A, Wen S, Liu S, Lin Z. Exploring the potential of ChatGPT as an adjunct for generating diagnosis based on chief complaint and cone beam CT radiologic findings. BMC Med Inform Decis Mak 2024; 24:55. [PMID: 38374067 PMCID: PMC10875853 DOI: 10.1186/s12911-024-02445-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/11/2023] [Accepted: 01/28/2024] [Indexed: 02/21/2024] Open
Abstract
AIM This study aimed to assess the performance of OpenAI's ChatGPT in generating diagnoses based on chief complaint and cone beam computed tomography (CBCT) radiologic findings. MATERIALS AND METHODS 102 CBCT reports (48 with dental diseases (DD) and 54 with neoplastic/cystic diseases (N/CD)) were collected. ChatGPT was provided with the chief complaint and CBCT radiologic findings, and its diagnostic outputs were scored on a five-point Likert scale. Diagnosis accuracy was scored on the accuracy of the chief complaint-related diagnosis and chief complaint-unrelated diagnoses (1-5 points); diagnosis completeness was scored on how many accurate diagnoses were included in ChatGPT's output for one case (1-5 points); text quality was scored on how many text errors were included in ChatGPT's output for one case (1-5 points). For the 54 N/CD cases, the consistency of the diagnosis generated by ChatGPT with the pathological diagnosis was also calculated. The composition of text errors in ChatGPT's outputs was evaluated. RESULTS After subjective rating by expert reviewers on the five-point Likert scale, the final scores of ChatGPT for diagnosis accuracy, diagnosis completeness, and text quality were 3.7, 4.5, and 4.6 across the 102 cases. For diagnostic accuracy, it performed significantly better on N/CD (3.8/5) than on DD (3.6/5). Of the 54 N/CD cases, 21 (38.9%) had a first diagnosis completely consistent with the pathological diagnosis. No text errors were observed in 88.7% of all 390 text items. CONCLUSION ChatGPT showed potential in generating radiographic diagnoses based on chief complaint and radiologic findings. However, its performance varied with task complexity, necessitating professional oversight due to a certain error rate.
Affiliation(s)
- Yanni Hu
- Department of Dentomaxillofacial Radiology, Nanjing Stomatological Hospital, Affiliated Hospital of Medical School, Institute of Stomatology, Nanjing University, Nanjing, Jiangsu, People's Republic of China
- Ziyang Hu
- Department of Dentomaxillofacial Radiology, Nanjing Stomatological Hospital, Affiliated Hospital of Medical School, Institute of Stomatology, Nanjing University, Nanjing, Jiangsu, People's Republic of China
- Department of Stomatology, Shenzhen Longhua District Central Hospital, Shenzhen, People's Republic of China
- Wenjing Liu
- Department of Dentomaxillofacial Radiology, Nanjing Stomatological Hospital, Affiliated Hospital of Medical School, Institute of Stomatology, Nanjing University, Nanjing, Jiangsu, People's Republic of China
- Antian Gao
- Department of Dentomaxillofacial Radiology, Nanjing Stomatological Hospital, Affiliated Hospital of Medical School, Institute of Stomatology, Nanjing University, Nanjing, Jiangsu, People's Republic of China
- Shanhui Wen
- Department of Dentomaxillofacial Radiology, Nanjing Stomatological Hospital, Affiliated Hospital of Medical School, Institute of Stomatology, Nanjing University, Nanjing, Jiangsu, People's Republic of China
- Shu Liu
- Department of Dentomaxillofacial Radiology, Nanjing Stomatological Hospital, Affiliated Hospital of Medical School, Institute of Stomatology, Nanjing University, Nanjing, Jiangsu, People's Republic of China
- Zitong Lin
- Department of Dentomaxillofacial Radiology, Nanjing Stomatological Hospital, Affiliated Hospital of Medical School, Institute of Stomatology, Nanjing University, Nanjing, Jiangsu, People's Republic of China
34
Denecke K, May R, Rivera-Romero O. Transformer Models in Healthcare: A Survey and Thematic Analysis of Potentials, Shortcomings and Risks. J Med Syst 2024; 48:23. [PMID: 38367119 PMCID: PMC10874304 DOI: 10.1007/s10916-024-02043-5]
Abstract
Large Language Models (LLMs) such as the Generative Pre-trained Transformer (GPT) and Bidirectional Encoder Representations from Transformers (BERT), which use transformer model architectures, have significantly advanced artificial intelligence and natural language processing. Recognized for their ability to capture associative relationships between words based on shared context, these models are poised to transform healthcare by improving diagnostic accuracy, tailoring treatment plans, and predicting patient outcomes. However, there are multiple risks and potentially unintended consequences associated with their use in healthcare applications. This study, conducted with 28 participants using a qualitative approach, explores the benefits, shortcomings, and risks of using transformer models in healthcare. It analyses responses to seven open-ended questions using a simplified thematic analysis. Our research reveals seven benefits, including improved operational efficiency, optimized processes and refined clinical documentation. Despite these benefits, there are significant concerns about the introduction of bias, auditability issues and privacy risks. Challenges include the need for specialized expertise, the emergence of ethical dilemmas and the potential reduction in the human element of patient care. For the medical profession, risks include the impact on employment, changes in the patient-doctor dynamic, and the need for extensive training in both system operation and data interpretation.
Affiliation(s)
- Kerstin Denecke
- Institute Patient-centered Digital Health, Bern University of Applied Sciences, Quellgasse 21, Biel, 2502, Switzerland
- Richard May
- Harz University of Applied Sciences, Friedrichstraße 57-59, 38855, Wernigerode, Germany
- Octavio Rivera-Romero
- Instituto de Ingeniería Informática (I3US), Universidad de Sevilla, Sevilla, Spain
- Department of Electronic Technology, Universidad de Sevilla, Avda Reina Mercedes s/n, ETSI Informática, G1.43, Sevilla, 41012, Spain
35
Zhang Z, Huang X. The impact of chatbots based on large language models on second language vocabulary acquisition. Heliyon 2024; 10:e25370. [PMID: 38333802 PMCID: PMC10850600 DOI: 10.1016/j.heliyon.2024.e25370]
Abstract
In recent years, the integration of artificial intelligence (AI) and machine learning (ML) into education, particularly for Personalized Language Learning (PLL), has garnered significant attention. This approach tailors interventions to address the unique challenges faced by individual learners. Large Language Models (LLMs), including Chatbots, have demonstrated substantial potential for automating and enhancing educational tasks, effectively capturing the complexity and diversity of human language. In this study, 52 foreign language students were randomly divided into two groups: one with the assistance of a Chatbot based on LLMs and one without. Both groups learned the same series of target words over eight weeks. Post-treatment assessments, including systematic observation and quantitative tests assessing both receptive and productive vocabulary knowledge, were conducted immediately after the study and again two weeks later. The findings demonstrate that employing an AI Chatbot based on LLMs significantly aids students in acquiring both receptive and productive vocabulary knowledge during their second language learning journey. Notably, Chatbots contribute to the long-term retention of productive vocabulary and facilitate incidental vocabulary learning. This study offers valuable insights into the practical benefits of LLM-based tools in language learning, with a specific emphasis on vocabulary development. Chatbots utilizing LLMs emerge as effective language learning aids; the study emphasizes the importance of educators understanding the potential of these technologies in L2 vocabulary instruction and encourages the adoption of strategic teaching methods incorporating such tools.
Affiliation(s)
- Zhihui Zhang
- Rossier School of Education, University of Southern California, 3551 Trousdale Pkwy, Los Angeles, CA, 90089, USA
- Xiaomeng Huang
- Alibaba Cloud, 969 West Wen Yi Road, Yu Hang District, Hangzhou, Zhejiang Province, 311121, China
36
Büttner M, Leser U, Schneider L, Schwendicke F. Natural Language Processing: Chances and Challenges in Dentistry. J Dent 2024; 141:104796. [PMID: 38072335 DOI: 10.1016/j.jdent.2023.104796]
Abstract
INTRODUCTION Natural language processing (NLP) is an intersection between Computer Science and Linguistics which aims to enable machines to process and understand human language. We here summarize applications and limitations of NLP in dentistry. DATA AND SOURCES Narrative review. FINDINGS NLP has evolved increasingly fast. For the dental domain, relevant NLP applications are text classification (e.g., symptom classification) and natural language generation and understanding (e.g., clinical chatbots assisting professionals in office work and patient communication). Analyzing large quantities of text will allow an understanding of diseases and their trajectories and support more precise and personalized care. Speech recognition systems may serve as virtual assistants and facilitate automated documentation. However, to date, NLP has rarely been applied in dentistry. Existing research focuses mainly on rule-based solutions for narrow tasks. Technologies such as Recurrent Neural Networks and Transformers have been shown to surpass the language processing capabilities of such rule-based solutions in many fields, but are data-hungry (i.e., they rely on large amounts of training data), which limits their application in the dental domain at present. Technologies such as federated learning, transfer learning, or data sharing concepts may help overcome this limitation, while challenges in terms of explainability, reproducibility, generalizability, and evaluation of NLP in dentistry remain to be resolved before such technologies can be approved in medical devices and services. CONCLUSIONS NLP will become a cornerstone of a number of applications in dentistry. The community is called to action to improve the current limitations and foster reliable, high-quality dental NLP.
CLINICAL SIGNIFICANCE NLP for text classification (e.g., dental symptom classification) and language generation and understanding (e.g., clinical chatbots, speech recognition) will support administrative tasks in dentistry, provide deeper insights for clinicians and support research and education.
Affiliation(s)
- Martha Büttner
- Department of Oral Diagnostics, Digital Health and Health Services Research, Charité - Universitätsmedizin Berlin, Germany
- Ulf Leser
- Department of Computer Science, Humboldt-Universität zu Berlin, Berlin, Germany
- Lisa Schneider
- Department of Oral Diagnostics, Digital Health and Health Services Research, Charité - Universitätsmedizin Berlin, Germany
- Falk Schwendicke
- Clinic for Operative, Preventive and Pediatric Dentistry and Periodontology, Ludwig-Maximilians-University, Munich, Germany
37
Suárez A, Díaz-Flores García V, Algar J, Gómez Sánchez M, Llorente de Pedro M, Freire Y. Unveiling the ChatGPT phenomenon: Evaluating the consistency and accuracy of endodontic question answers. Int Endod J 2024; 57:108-113. [PMID: 37814369 DOI: 10.1111/iej.13985]
Abstract
AIM Chat Generative Pre-trained Transformer (ChatGPT) is a generative artificial intelligence (AI) software based on large language models (LLMs), designed to simulate human conversations and generate novel content based on the training data it has been exposed to. The aim of this study was to evaluate the consistency and accuracy of ChatGPT-generated answers to clinical questions in endodontics, compared to answers provided by human experts. METHODOLOGY Ninety-one dichotomous (yes/no) questions were designed and categorized into three levels of difficulty. Twenty questions were randomly selected from each difficulty level. Sixty answers were generated by ChatGPT for each question. Two endodontic experts independently answered the 60 questions. Statistical analysis was performed using the SPSS program to calculate the consistency and accuracy of the answers generated by ChatGPT compared to those of the experts. Confidence intervals (95%) and standard deviations were used to estimate variability. RESULTS The answers generated by ChatGPT showed high consistency (85.44%). No significant differences in consistency were found based on question difficulty. In terms of answer accuracy, ChatGPT achieved an average accuracy of 57.33%. However, significant differences in accuracy were observed based on question difficulty, with lower accuracy for easier questions. CONCLUSIONS Currently, ChatGPT is not capable of replacing dentists in clinical decision-making. As ChatGPT's performance improves through deep learning, it is expected to become more useful and effective in the field of endodontics. However, careful attention and ongoing evaluation are needed to ensure its accuracy, reliability and safety in endodontics.
Affiliation(s)
- Ana Suárez
- Department of Pre-Clinic Dentistry, School of Biomedical Sciences, Universidad Europea de Madrid, Madrid, Spain
- Víctor Díaz-Flores García
- Department of Pre-Clinic Dentistry, School of Biomedical Sciences, Universidad Europea de Madrid, Madrid, Spain
- Juan Algar
- Department of Clinical Dentistry, School of Biomedical Sciences, Universidad Europea de Madrid, Madrid, Spain
- Margarita Gómez Sánchez
- Department of Pre-Clinic Dentistry, School of Biomedical Sciences, Universidad Europea de Madrid, Madrid, Spain
- María Llorente de Pedro
- Department of Pre-Clinic Dentistry, School of Biomedical Sciences, Universidad Europea de Madrid, Madrid, Spain
- Yolanda Freire
- Department of Pre-Clinic Dentistry, School of Biomedical Sciences, Universidad Europea de Madrid, Madrid, Spain
38
Albagieh H, Alzeer ZO, Alasmari ON, Alkadhi AA, Naitah AN, Almasaad KF, Alshahrani TS, Alshahrani KS, Almahmoud MI. Comparing Artificial Intelligence and Senior Residents in Oral Lesion Diagnosis: A Comparative Study. Cureus 2024; 16:e51584. [PMID: 38173951 PMCID: PMC10763647 DOI: 10.7759/cureus.51584]
Abstract
INTRODUCTION Artificial intelligence (AI) is a field of computer science that seeks to build intelligent machines that can carry out tasks that usually necessitate human intelligence. AI may help dentists with a variety of dental tasks, including clinical diagnosis and treatment planning. This study aims to compare the performance of AI and oral medicine residents in diagnosing different cases and providing treatment, and to determine whether AI is reliable enough to assist them in their field of work. METHODS The study conducted a comparative analysis of the responses of third- and fourth-year residents trained in Oral Medicine and Pathology at King Saud University, College of Dentistry. The residents were given a closed multiple-choice test consisting of 19 questions with four response options labeled A-D and one question with five response options labeled A-E. The test was administered via Google Forms, and each resident's response was stored electronically in an Excel sheet (Microsoft Corp., Redmond, WA). The residents' answers were then compared to the responses generated by three major language models: OpenAI, Stablediffusion, and PopAI. The questions were input into the language models in the same format as the original test, and prior to each question a new artificial intelligence chat session was created to eliminate memory retention bias. The input was done on November 19, 2023, the same day the official multiple-choice test was administered. The study had a sample size of 20 residents trained in Oral Medicine and Pathology at King Saud University, College of Dentistry, consisting of both third-year and fourth-year residents. RESULTS We analyzed the responses of the three large language models (LLMs), OpenAI, Stablediffusion, and PopAI, as well as the responses of the 20 senior residents, for 20 clinical cases on oral lesion diagnosis. Significant variations were observed in the responses to only two questions (10%); for the remaining questions, there were no significant differences. The median (IQR) score of the LLMs was 50.0 (45.0-60.0), with a minimum of 40 (Stablediffusion) and a maximum of 70 (OpenAI). The median (IQR) score of the senior residents was 65.0 (55.0-75.0); the lowest and highest resident scores were 40 and 90, respectively. There was no significant difference in the percent scores of residents and LLMs (p = 0.211). Agreement was measured using the Kappa value. Agreement among senior dental residents was weak (Kappa = 0.396). In contrast, agreement among the LLMs was moderate (Kappa = 0.622), suggesting a more cohesive alignment in responses among the artificial intelligence models. When residents' responses were compared with those generated by the different AI models, OpenAI, Stablediffusion, and PopAI, agreement was consistently weak, with Kappa values of 0.402, 0.381, and 0.392, respectively. CONCLUSION The current study reveals no significant difference in response scores; in contrast, agreement among the residents was low compared with the LLMs, among which it was high. Dentists should consider that AI can be very beneficial in providing diagnosis and treatment and may use it to assist them.
Affiliation(s)
- Zaid O Alzeer
- Dentistry, College of Dentistry, King Saud University, Riyadh, SAU
- Osama N Alasmari
- Dentistry, College of Dentistry, King Saud University, Riyadh, SAU
- Abdullah A Alkadhi
- College of Dentistry, Dental University Hospital/King Saud University, Riyadh, SAU
- Abdulaziz N Naitah
- College of Dentistry, Dental University Hospital/King Saud University, Riyadh, SAU
- Turki S Alshahrani
- College of Dentistry, Dental University Hospital/King Saud University, Riyadh, SAU
- Khalid S Alshahrani
- College of Dentistry, Dental University Hospital/King Saud University, Riyadh, SAU
39
Iglesias-Puzas A, Conde-Taboada A, López-Bran E. [Considerations for using ChatGPT in medical practice]. J Healthc Qual Res 2024; 39:55-56. [PMID: 37949772 DOI: 10.1016/j.jhqr.2023.09.007]
Affiliation(s)
- A Iglesias-Puzas
- Servicio de Dermatología, Hospital Universitario Clínico San Carlos, Universidad Complutense, Madrid, Spain
- A Conde-Taboada
- Servicio de Dermatología, Hospital Universitario Clínico San Carlos, Universidad Complutense, Madrid, Spain
- E López-Bran
- Servicio de Dermatología, Hospital Universitario Clínico San Carlos, Universidad Complutense, Madrid, Spain
40
Rahad K, Martin K, Amugo I, Ferguson S, Curtis A, Davis A, Gangula P, Wang Q. ChatGPT to enhance learning in dental education at a historically black medical college. RESEARCH SQUARE 2023:rs.3.rs-3546693. [PMID: 37986988 PMCID: PMC10659452 DOI: 10.21203/rs.3.rs-3546693/v2]
Abstract
The recent rise of powerful large language model (LLM)-based AI tools, exemplified by ChatGPT and Bard, poses a great challenge to contemporary dental education while simultaneously offering a unique resource and approach that potentially complements today's teaching and learning, where existing widely available learning resources have often fallen short. Although both the clinical and educational aspects of dentistry will be shaped profoundly by the LLM tools, the didactic curricula, which primarily rely on lecture-based courses where instructors impart knowledge through presentations and discussions, need to be upgraded urgently. In this paper, we used dental course materials, syllabi, and textbooks adopted currently in the School of Dentistry (SOD) at Meharry Medical College to assess the potential utility and effectiveness of ChatGPT in dental education. We collected the responses of the chatbot to questions as well as students' interactions with it for assessment. Our results showed that ChatGPT can assist in dental essay writing and generate relevant content for dental students, in addition to other benefits. The limitations of ChatGPT were also discussed in the paper.
41
Giannakopoulos K, Kavadella A, Aaqel Salim A, Stamatopoulos V, Kaklamanos EG. Evaluation of the Performance of Generative AI Large Language Models ChatGPT, Google Bard, and Microsoft Bing Chat in Supporting Evidence-Based Dentistry: Comparative Mixed Methods Study. J Med Internet Res 2023; 25:e51580. [PMID: 38009003 PMCID: PMC10784979 DOI: 10.2196/51580]
Abstract
BACKGROUND The increasing application of generative artificial intelligence large language models (LLMs) in various fields, including dentistry, raises questions about their accuracy. OBJECTIVE This study aims to comparatively evaluate the answers provided by 4 LLMs, namely Bard (Google LLC), ChatGPT-3.5 and ChatGPT-4 (OpenAI), and Bing Chat (Microsoft Corp), to clinically relevant questions from the field of dentistry. METHODS The LLMs were queried with 20 open-type, clinical dentistry-related questions from different disciplines, developed by the respective faculty of the School of Dentistry, European University Cyprus. The LLMs' answers were graded 0 (minimum) to 10 (maximum) points against strong, traditionally collected scientific evidence, such as guidelines and consensus statements, using a rubric, as if they were examination questions posed to students, by 2 experienced faculty members. The scores were statistically compared to identify the best-performing model using the Friedman and Wilcoxon tests. Moreover, the evaluators were asked to provide a qualitative evaluation of the comprehensiveness, scientific accuracy, clarity, and relevance of the LLMs' answers. RESULTS Overall, no statistically significant difference was detected between the scores given by the 2 evaluators; therefore, an average score was computed for every LLM. Although ChatGPT-4 statistically outperformed ChatGPT-3.5 (P=.008), Bing Chat (P=.049), and Bard (P=.045), all models occasionally exhibited inaccuracies, generality, outdated content, and a lack of source references. The evaluators noted instances where the LLMs delivered irrelevant information, vague answers, or information that was not fully accurate. CONCLUSIONS This study demonstrates that although LLMs hold promising potential as an aid in the implementation of evidence-based dentistry, their current limitations can lead to potentially harmful health care decisions if not used judiciously. 
Therefore, these tools should not replace the dentist's critical thinking and in-depth understanding of the subject matter. Further research, clinical validation, and model improvements are necessary for these tools to be fully integrated into dental practice. Dental practitioners must be aware of the limitations of LLMs, as their imprudent use could potentially impact patient care. Regulatory measures should be established to oversee the use of these evolving technologies.
Affiliation(s)
- Argyro Kavadella
- School of Dentistry, European University Cyprus, Nicosia, Cyprus
- Anas Aaqel Salim
- School of Dentistry, European University Cyprus, Nicosia, Cyprus
- Vassilis Stamatopoulos
- Information Management Systems Institute, ATHENA Research and Innovation Center, Athens, Greece
- Eleftherios G Kaklamanos
- School of Dentistry, European University Cyprus, Nicosia, Cyprus
- School of Dentistry, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Mohammed Bin Rashid University of Medicine and Health Sciences, Dubai, United Arab Emirates
42
Johnson M, Ribeiro AP, Drew TM, Pereira PNR. Generative AI use in dental education: Efficient exam item writing. J Dent Educ 2023; 87 Suppl 3:1865-1866. [PMID: 37354022 DOI: 10.1002/jdd.13294]
Affiliation(s)
- Margeaux Johnson
- College of Dentistry, University of Florida, Restorative Dental Sciences, Division of Operative Dentistry, Gainesville, Florida, USA
- Ana P Ribeiro
- College of Dentistry, University of Florida, Restorative Dental Sciences, Division of Operative Dentistry, Gainesville, Florida, USA
- Tiffany M Drew
- College of Dentistry, University of Florida, Restorative Dental Sciences, Division of Operative Dentistry, Gainesville, Florida, USA
- Patricia N R Pereira
- College of Dentistry, University of Florida, Restorative Dental Sciences, Division of Operative Dentistry, Gainesville, Florida, USA
43
Ohta K, Ohta S. The Performance of GPT-3.5, GPT-4, and Bard on the Japanese National Dentist Examination: A Comparison Study. Cureus 2023; 15:e50369. [PMID: 38213361 PMCID: PMC10782219 DOI: 10.7759/cureus.50369]
Abstract
Purpose This study aims to evaluate the performance of three large language models (LLMs), the Generative Pre-trained Transformer (GPT)-3.5, GPT-4, and Google Bard, on the 2023 Japanese National Dentist Examination (JNDE) and assess their potential clinical applications in Japan. Methods A total of 185 questions from the 2023 JNDE were used. These questions were categorized by question type and category. McNemar's test compared the correct response rates between two LLMs, while Fisher's exact test evaluated the performance of LLMs in each question category. Results The overall correct response rates were 73.5% for GPT-4, 66.5% for Bard, and 51.9% for GPT-3.5. GPT-4 showed a significantly higher correct response rate than Bard and GPT-3.5. In the category of essential questions, Bard achieved a correct response rate of 80.5%, surpassing the passing criterion of 80%. In contrast, both GPT-4 and GPT-3.5 fell short of this benchmark, with GPT-4 attaining 77.6% and GPT-3.5 only 52.5%. The scores of GPT-4 and Bard were significantly higher than that of GPT-3.5 (p<0.01). For general questions, the correct response rates were 71.2% for GPT-4, 58.5% for Bard, and 52.5% for GPT-3.5. GPT-4 outperformed GPT-3.5 and Bard (p<0.01). The correct response rates for professional dental questions were 51.6% for GPT-4, 45.3% for Bard, and 35.9% for GPT-3.5. The differences among the models were not statistically significant. All LLMs demonstrated significantly lower accuracy for dentistry questions compared to other types of questions (p<0.01). Conclusions GPT-4 achieved the highest overall score in the JNDE, followed by Bard and GPT-3.5. However, only Bard surpassed the passing score for essential questions. To further understand the application of LLMs in clinical dentistry worldwide, more research on their performance in dental examinations across different languages is required.
Affiliation(s)
- Satomi Ohta
- Dentistry, Dentist of Mama and Kodomo, Kobe, JPN
44
Chatterjee S, Bhattacharya M, Pal S, Lee SS, Chakraborty C. ChatGPT and large language models in orthopedics: from education and surgery to research. J Exp Orthop 2023; 10:128. [PMID: 38038796 PMCID: PMC10692045 DOI: 10.1186/s40634-023-00700-1]
Abstract
ChatGPT has rapidly gained popularity since its release in November 2022. Currently, large language models (LLMs) and ChatGPT have been applied in various domains of medical science, including cardiology, nephrology, orthopedics, ophthalmology, gastroenterology, and radiology. Researchers are exploring the potential of LLMs and ChatGPT for clinicians and surgeons in every domain. This study discusses how ChatGPT can help orthopedic clinicians and surgeons perform various medical tasks. LLMs and ChatGPT can help the patient community by providing suggestions and diagnostic guidelines. In this study, the use of LLMs and ChatGPT to enhance and expand the field of orthopedics, including orthopedic education, surgery, and research, is explored. Present LLMs have several shortcomings, which are discussed herein. However, next-generation and future domain-specific LLMs are expected to be more potent and to transform patients' quality of life.
Affiliation(s)
- Srijan Chatterjee
- Institute for Skeletal Aging & Orthopaedic Surgery, Hallym University-Chuncheon Sacred Heart Hospital, Chuncheon-Si, 24252, Gangwon-Do, Republic of Korea
- Manojit Bhattacharya
- Department of Zoology, Fakir Mohan University, Vyasa Vihar, Balasore, 756020, Odisha, India
- Soumen Pal
- School of Mechanical Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India
- Sang-Soo Lee
- Institute for Skeletal Aging & Orthopaedic Surgery, Hallym University-Chuncheon Sacred Heart Hospital, Chuncheon-Si, 24252, Gangwon-Do, Republic of Korea
- Chiranjib Chakraborty
- Department of Biotechnology, School of Life Science and Biotechnology, Adamas University, Kolkata, West Bengal, 700126, India
45
Danesh A, Pazouki H, Danesh K, Danesh F, Danesh A. The performance of artificial intelligence language models in board-style dental knowledge assessment: A preliminary study on ChatGPT. J Am Dent Assoc 2023; 154:970-974. [PMID: 37676187 DOI: 10.1016/j.adaj.2023.07.016]
Abstract
BACKGROUND Although Chat Generative Pre-trained Transformer (ChatGPT) (OpenAI) may be an appealing educational resource for students, the chatbot responses can be subject to misinformation. This study was designed to evaluate the performance of ChatGPT on a board-style multiple-choice dental knowledge assessment to gauge its capacity to output accurate dental content and in turn the risk of misinformation associated with use of the chatbot as an educational resource by dental students. METHODS ChatGPT3.5 and ChatGPT4 were asked questions obtained from 3 different sources: INBDE Bootcamp, ITDOnline, and a list of board-style questions provided by the Joint Commission on National Dental Examinations. Image-based questions were excluded, as ChatGPT only takes text-based inputs. The mean performance across 3 trials was reported for each model. RESULTS ChatGPT3.5 and ChatGPT4 answered 61.3% and 76.9% of the questions correctly on average, respectively. A 2-tailed t test was used to compare 2 independent sample means, and a 2-tailed χ2 test was used to compare 2 sample proportions. A P value less than .05 was considered to be statistically significant. CONCLUSION ChatGPT3.5 did not perform sufficiently well on the board-style knowledge assessment. ChatGPT4, however, displayed a competent ability to output accurate dental content. Future research should evaluate the proficiency of emerging models of ChatGPT in dentistry to assess its evolving role in dental education. PRACTICAL IMPLICATIONS Although ChatGPT showed an impressive ability to output accurate dental content, our findings should encourage dental students to incorporate ChatGPT to supplement their existing learning program instead of using it as their primary learning resource.
46
Waters MR, Aneja S, Hong JC. Unlocking the Power of ChatGPT, Artificial Intelligence, and Large Language Models: Practical Suggestions for Radiation Oncologists. Pract Radiat Oncol 2023; 13:e484-e490. [PMID: 37598727] [DOI: 10.1016/j.prro.2023.06.011]
Abstract
Recent advances in artificial intelligence (AI), such as generative AI and large language models (LLMs), have generated significant excitement about the potential of AI to revolutionize our lives, work, and interaction with technology. This article explores the practical applications of LLMs, particularly ChatGPT, in the field of radiation oncology. We offer a guide on how radiation oncologists can interact with LLMs like ChatGPT in their routine clinical and administrative tasks, highlighting potential use cases of the present and future. We also highlight limitations and ethical considerations, including the current state of LLMs in decision making, protection of sensitive data, and the important role of human review of AI-generated content.
Affiliation(s)
- Michael R Waters
- Department of Radiation Oncology, Washington University School of Medicine, St. Louis, Missouri
- Sanjay Aneja
- Department of Radiation Oncology, Yale School of Medicine, New Haven, Connecticut
- Julian C Hong
- Department of Radiation Oncology and Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, California
47
Biri SK, Kumar S, Panigrahi M, Mondal S, Behera JK, Mondal H. Assessing the Utilization of Large Language Models in Medical Education: Insights From Undergraduate Medical Students. Cureus 2023; 15:e47468. [PMID: 38021810] [PMCID: PMC10662537] [DOI: 10.7759/cureus.47468]
Abstract
Background Artificial intelligence (AI) has the potential to be integrated into medical education. Among AI-based technologies, large language models (LLMs) such as ChatGPT, Google Bard, Microsoft Bing, and Perplexity have emerged as powerful tools with natural language processing capabilities. Against this background, this study investigated the knowledge, attitude, and practice of undergraduate medical students regarding the utilization of LLMs in medical education at a medical college in Jharkhand, India. Methods A cross-sectional online survey was sent to 370 undergraduate medical students via Google Forms. The questionnaire comprised three domains: knowledge, attitude, and practice, each containing six questions. Cronbach's alphas for the knowledge, attitude, and practice domains were 0.703, 0.707, and 0.809, respectively; intraclass correlation coefficients were 0.82, 0.87, and 0.78, respectively. The average scores in the three domains were compared using ANOVA. Results A total of 172 students participated in the study (response rate: 46.49%). The largest share of students (45.93%) rarely used LLMs for teaching-learning purposes (chi-square (3) = 41.44, p < 0.0001). The overall scores for knowledge (3.21±0.55), attitude (3.47±0.54), and practice (3.26±0.61) differed significantly (ANOVA F (2, 513) = 10.2, p < 0.0001), with the highest score in attitude and the lowest in knowledge. Conclusion While there is a generally positive attitude toward the incorporation of LLMs in medical education, concerns about overreliance and potential inaccuracies are evident. LLMs offer the potential to enhance learning resources and provide accessible education, but their integration requires further planning. Further studies are required to explore the long-term impact of LLMs in diverse educational contexts.
Affiliation(s)
- Subir Kumar
- Pharmacology, Phulo Jhano Medical College, Dumka, IND
- Shaikat Mondal
- Physiology, Raiganj Government Medical College & Hospital, Raiganj, IND
- Joshil Kumar Behera
- Physiology, Nagaland Institute of Medical Sciences and Research, Kohima, IND
- Himel Mondal
- Physiology, All India Institute of Medical Sciences, Deoghar, IND
48
Huang H, Zheng O, Wang D, Yin J, Wang Z, Ding S, Yin H, Xu C, Yang R, Zheng Q, Shi B. ChatGPT for shaping the future of dentistry: the potential of multi-modal large language model. Int J Oral Sci 2023; 15:29. [PMID: 37507396] [PMCID: PMC10382494] [DOI: 10.1038/s41368-023-00239-y]
Abstract
ChatGPT, a lightweight conversational variant of Generative Pretrained Transformer 4 (GPT-4) developed by OpenAI, is one of the milestone Large Language Models (LLMs), with billions of parameters. LLMs have stirred up much interest among researchers and practitioners with their impressive performance on natural language processing tasks, which profoundly impacts various fields. This paper mainly discusses the future applications of LLMs in dentistry. We introduce two primary LLM deployment methods in dentistry, automated dental diagnosis and cross-modal dental diagnosis, and examine their potential applications. In particular, equipped with a cross-modal encoder, a single LLM can manage multi-source data and conduct advanced natural language reasoning to perform complex clinical operations. We also present cases to demonstrate the potential of a fully automatic multi-modal LLM AI system for clinical application in dentistry. While LLMs offer significant potential benefits, challenges such as data privacy, data quality, and model bias need further study. Overall, LLMs have the potential to revolutionize dental diagnosis and treatment, which indicates a promising avenue for clinical application and research in dentistry.
Affiliation(s)
- Hanyao Huang
- State Key Laboratory of Oral Diseases & National Clinical Research Center for Oral Diseases & Department of Oral and Maxillofacial Surgery, West China Hospital of Stomatology, Sichuan University, Chengdu, China
- Ou Zheng
- Department of Civil, Environmental & Construction Engineering, University of Central Florida, Orlando, USA
- Dongdong Wang
- Department of Civil, Environmental & Construction Engineering, University of Central Florida, Orlando, USA
- Jiayi Yin
- State Key Laboratory of Oral Diseases & National Clinical Research Center for Oral Diseases & Department of Oral and Maxillofacial Surgery, West China Hospital of Stomatology, Sichuan University, Chengdu, China
- Zijin Wang
- Department of Civil, Environmental & Construction Engineering, University of Central Florida, Orlando, USA
- Shengxuan Ding
- College of Transportation Engineering, University of Central Florida, Orlando, USA
- Heng Yin
- State Key Laboratory of Oral Diseases & National Clinical Research Center for Oral Diseases & Department of Oral and Maxillofacial Surgery, West China Hospital of Stomatology, Sichuan University, Chengdu, China
- Chuan Xu
- School of Transportation and Logistics, Southwest Jiaotong University, Chengdu, China
- C2SMART Center, Tandon School of Engineering, New York University, Brooklyn, USA
- Renjie Yang
- State Key Laboratory of Oral Diseases & National Clinical Research Center for Oral Diseases & Eastern Clinic, West China Hospital of Stomatology, Sichuan University, Chengdu, China
- Qian Zheng
- State Key Laboratory of Oral Diseases & National Clinical Research Center for Oral Diseases & Department of Oral and Maxillofacial Surgery, West China Hospital of Stomatology, Sichuan University, Chengdu, China
- Bing Shi
- State Key Laboratory of Oral Diseases & National Clinical Research Center for Oral Diseases & Department of Oral and Maxillofacial Surgery, West China Hospital of Stomatology, Sichuan University, Chengdu, China
49
Alhaidry HM, Fatani B, Alrayes JO, Almana AM, Alfhaed NK. ChatGPT in Dentistry: A Comprehensive Review. Cureus 2023; 15:e38317. [PMID: 37266053] [PMCID: PMC10230850] [DOI: 10.7759/cureus.38317]
Abstract
Chat Generative Pre-trained Transformer (ChatGPT) is an artificial intelligence chatbot that uses natural language processing to respond to human input in a conversational manner. ChatGPT has numerous applications in the health care system, including dentistry, where it is used in diagnosis, disease-risk assessment, and appointment scheduling; it also has a role in scientific research. In the dental field, it has provided many benefits, such as detecting dental and maxillofacial abnormalities on panoramic radiographs and identifying different dental restorations, thereby decreasing the workload. Even with these benefits, however, one should take into consideration the risks and limitations of this chatbot. Few articles have addressed the use of ChatGPT in dentistry. This comprehensive review draws on data collected from 66 relevant articles identified through the PubMed and Google Scholar databases and aims to discuss all relevant published articles on the use of ChatGPT in dentistry.
Affiliation(s)
- Hind M Alhaidry
- Advanced General Dentistry, Prince Sultan Military Medical City, Riyadh, SAU
- Bader Fatani
- Dentistry, College of Dentistry, King Saud University, Riyadh, SAU
- Jenan O Alrayes
- Dentistry, College of Dentistry, King Saud University, Riyadh, SAU
- Nawaf K Alfhaed
- Dentistry, College of Dentistry, King Saud University, Riyadh, SAU