1. Leng L. Challenge, integration, and change: ChatGPT and future anatomical education. Med Educ Online 2024; 29:2304973. [PMID: 38217884 PMCID: PMC10791098 DOI: 10.1080/10872981.2024.2304973]
Abstract
With the rapid development of ChatGPT and its application in education, a new era of collaboration between human and artificial intelligence in education has arrived. Integrating artificial intelligence (AI) into medical education has the potential to revolutionize it. Large language models such as ChatGPT can be used as virtual teaching aids that provide students with individualized, immediate medical knowledge and support interactive, simulation-based learning and assessment. In this paper, we describe the application of ChatGPT in anatomy teaching at its various levels, drawing on our own teaching experience, and discuss the advantages and disadvantages of ChatGPT in anatomy teaching. ChatGPT increases student engagement and strengthens students' ability to learn independently. At the same time, ChatGPT faces many challenges and limitations in medical education. Medical educators must keep pace with rapid technological change, taking into account ChatGPT's impact on curriculum design, assessment strategies, and teaching methods. Discussing the application of ChatGPT in medical education, and in anatomy teaching in particular, supports the effective integration and application of artificial intelligence tools in medical education.
Affiliation(s)
- Lige Leng
- Fujian Provincial Key Laboratory of Neurodegenerative Disease and Aging Research, Institute of Neuroscience, School of Medicine, Xiamen University, Xiamen, Fujian, P.R. China
2. Suárez A, Jiménez J, Llorente de Pedro M, Andreu-Vázquez C, Díaz-Flores García V, Gómez Sánchez M, Freire Y. Beyond the Scalpel: Assessing ChatGPT's potential as an auxiliary intelligent virtual assistant in oral surgery. Comput Struct Biotechnol J 2024; 24:46-52. [PMID: 38162955 PMCID: PMC10755495 DOI: 10.1016/j.csbj.2023.11.058]
Abstract
AI has revolutionized the way we interact with technology. Noteworthy advances in AI algorithms and large language models (LLMs) have led to the development of natural generative language (NGL) systems such as ChatGPT. Although these LLMs can simulate human conversations and generate content in real time, they face challenges related to the topicality and accuracy of the information they generate. This study aimed to assess whether ChatGPT-4 could provide accurate and reliable answers to general dentists in the field of oral surgery, and thus to explore its potential as an intelligent virtual assistant in clinical decision making in oral surgery. Thirty questions related to oral surgery were posed to ChatGPT-4, each question repeated 30 times, yielding a total of 900 responses. Two surgeons graded the answers according to the guidelines of the Spanish Society of Oral Surgery, using a three-point Likert scale (correct, partially correct/incomplete, and incorrect). Disagreements were arbitrated by an experienced oral surgeon, who provided the final grade. Accuracy was found to be 71.7%, and the consistency of the experts' grading across iterations ranged from moderate to almost perfect. ChatGPT-4, with its potential capabilities, will inevitably be integrated into dental disciplines, including oral surgery. In the future, it could be considered an auxiliary intelligent virtual assistant, though it would never replace oral surgery experts. Proper training and expert-verified information will remain vital to the implementation of the technology. More comprehensive research is needed to ensure the safe and successful application of AI in oral surgery.
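To make the grading workflow described in this abstract concrete, the following is a minimal Python sketch of three-point grading with arbitration, overall accuracy, and rater agreement; the toy grades and the use of scikit-learn's Cohen's kappa are illustrative assumptions, not the study's actual data or analysis code.

```python
# Minimal sketch: two raters score each response on a three-point scale,
# disagreements are arbitrated into a final grade, and accuracy plus
# rater agreement are computed. Toy data for illustration only.
from sklearn.metrics import cohen_kappa_score

# 0 = incorrect, 1 = partially correct/incomplete, 2 = correct
rater_a = [2, 2, 1, 0, 2, 1, 2, 2, 0, 2]
rater_b = [2, 1, 1, 0, 2, 1, 2, 2, 1, 2]
final   = [2, 2, 1, 0, 2, 1, 2, 2, 0, 2]   # grades after arbitration of disagreements

accuracy = final.count(2) / len(final)       # share of responses graded fully correct
kappa = cohen_kappa_score(rater_a, rater_b)  # agreement between the two raters
print(f"accuracy = {accuracy:.1%}, kappa = {kappa:.2f}")
```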
Affiliation(s)
- Ana Suárez
- Department of Pre-Clinic Dentistry, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, 28670 Madrid, Spain
- Jaime Jiménez
- Department of Clinic Dentistry, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, 28670 Madrid, Spain
- María Llorente de Pedro
- Department of Pre-Clinic Dentistry, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, 28670 Madrid, Spain
- Cristina Andreu-Vázquez
- Department of Veterinary Medicine, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, 28670 Madrid, Spain
- Víctor Díaz-Flores García
- Department of Pre-Clinic Dentistry, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, 28670 Madrid, Spain
- Margarita Gómez Sánchez
- Department of Pre-Clinic Dentistry, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, 28670 Madrid, Spain
- Yolanda Freire
- Department of Pre-Clinic Dentistry, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, 28670 Madrid, Spain
3. Baxter SL, Longhurst CA, Millen M, Sitapati AM, Tai-Seale M. Generative artificial intelligence responses to patient messages in the electronic health record: early lessons learned. JAMIA Open 2024; 7:ooae028. [PMID: 38601475 PMCID: PMC11006101 DOI: 10.1093/jamiaopen/ooae028]
Abstract
Background Electronic health record (EHR)-based patient messages can contribute to burnout. Messages with a negative tone are particularly challenging to address. In this perspective, we describe our initial evaluation of large language model (LLM)-generated responses to negative EHR patient messages and contend that using LLMs to generate initial drafts may be feasible, although refinement will be needed. Methods A retrospective sample (n = 50) of negative patient messages was extracted from a health system EHR, de-identified, and inputted into an LLM (ChatGPT). Qualitative analyses were conducted to compare LLM responses to actual care team responses. Results Some LLM-generated draft responses varied from human responses in relational connection, informational content, and recommendations for next steps. Occasionally, the LLM draft responses could have potentially escalated emotionally charged conversations. Conclusion Further work is needed to optimize the use of LLMs for responding to negative patient messages in the EHR.
Affiliation(s)
- Sally L Baxter
- Division of Ophthalmology Informatics and Data Science, Viterbi Family Department of Ophthalmology and Shiley Eye Institute, University of California San Diego, La Jolla, CA 92093, United States
- Department of Biomedical Informatics, University of California San Diego Health, La Jolla, CA 92093, United States
- Christopher A Longhurst
- Department of Biomedical Informatics, University of California San Diego Health, La Jolla, CA 92093, United States
- Marlene Millen
- Department of Biomedical Informatics, University of California San Diego Health, La Jolla, CA 92093, United States
- Division of Internal Medicine, Department of Medicine, University of California San Diego, La Jolla, CA 92093, United States
- Amy M Sitapati
- Department of Biomedical Informatics, University of California San Diego Health, La Jolla, CA 92093, United States
- Division of Internal Medicine, Department of Medicine, University of California San Diego, La Jolla, CA 92093, United States
- Ming Tai-Seale
- Department of Biomedical Informatics, University of California San Diego Health, La Jolla, CA 92093, United States
- Department of Family Medicine, University of California San Diego, La Jolla, CA 92093, United States
4. Tailor PD, Dalvin LA, Chen JJ, Iezzi R, Olsen TW, Scruggs BA, Barkmeier AJ, Bakri SJ, Ryan EH, Tang PH, Parke DW, Belin PJ, Sridhar J, Xu D, Kuriyan AE, Yonekawa Y, Starr MR. A Comparative Study of Responses to Retina Questions from Either Experts, Expert-Edited Large Language Models, or Large Language Models Alone. Ophthalmol Sci 2024; 4:100485. [PMID: 38660460 PMCID: PMC11041826 DOI: 10.1016/j.xops.2024.100485]
Abstract
Objective To assess the quality, empathy, and safety of expert-edited large language model (LLM), human expert-created, and LLM responses to common retina patient questions. Design Randomized, masked, multicenter study. Participants Twenty-one common retina patient questions were randomly assigned among 13 retina specialists. Methods Each expert created a response (Expert) and then edited an LLM (ChatGPT-4)-generated response to that question (Expert + artificial intelligence [AI]), timing themselves for both tasks. Five LLMs (ChatGPT-3.5, ChatGPT-4, Claude 2, Bing, and Bard) also generated responses to each question. The original question, along with the anonymized and randomized Expert + AI, Expert, and LLM responses, was evaluated by the other experts who had not written an expert response to that question. Evaluators judged quality and empathy (very poor, poor, acceptable, good, or very good) along with safety metrics (incorrect information, likelihood to cause harm, extent of harm, and missing content). Main Outcome Measures Mean quality and empathy scores, and the proportion of responses with incorrect information, likelihood to cause harm, extent of harm, and missing content for each response type. Results There were 4008 total grades collected (2608 for quality and empathy; 1400 for safety metrics), with significant differences in both quality and empathy (P < 0.001, P < 0.001) between the LLM, Expert, and Expert + AI groups. For quality, Expert + AI (3.86 ± 0.85) performed the best overall, while GPT-3.5 (3.75 ± 0.79) was the top-performing LLM. For empathy, GPT-3.5 (3.75 ± 0.69) had the highest mean score, followed by Expert + AI (3.73 ± 0.63). By mean score, Expert placed 4th out of 7 for quality and 6th out of 7 for empathy. For both quality (P < 0.001) and empathy (P < 0.001), expert-edited LLM responses performed better than expert-created responses. There were time savings for an expert-edited LLM response versus an expert-created response (P = 0.02). ChatGPT-4 performed similarly to Expert for inappropriate content (P = 0.35), missing content (P = 0.001), extent of possible harm (P = 0.356), and likelihood of possible harm (P = 0.129). Conclusions In this randomized, masked, multicenter study, LLM responses were comparable with expert responses in terms of quality, empathy, and safety metrics, warranting further exploration of their potential benefits in clinical settings. Financial Disclosures Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of the article.
Affiliation(s)
- John J. Chen
- Department of Ophthalmology, Mayo Clinic, Rochester, Minnesota
- Raymond Iezzi
- Department of Ophthalmology, Mayo Clinic, Rochester, Minnesota
- Sophie J. Bakri
- Department of Ophthalmology, Mayo Clinic, Rochester, Minnesota
- Edwin H. Ryan
- Retina Consultants of Minnesota, Edina, Minnesota
- Department of Ophthalmology & Visual Neurosciences, University of Minnesota Medical School, Minneapolis, Minnesota
- Peter H. Tang
- Retina Consultants of Minnesota, Edina, Minnesota
- Department of Ophthalmology & Visual Neurosciences, University of Minnesota Medical School, Minneapolis, Minnesota
- D. Wilkin Parke
- Retina Consultants of Minnesota, Edina, Minnesota
- Department of Ophthalmology & Visual Neurosciences, University of Minnesota Medical School, Minneapolis, Minnesota
- Jayanth Sridhar
- Olive View Medical Center, University of California Los Angeles, Los Angeles, California
- David Xu
- Wills Eye Hospital, Mid Atlantic Retina, Thomas Jefferson University, Philadelphia, Pennsylvania
- Ajay E. Kuriyan
- Wills Eye Hospital, Mid Atlantic Retina, Thomas Jefferson University, Philadelphia, Pennsylvania
- Yoshihiro Yonekawa
- Wills Eye Hospital, Mid Atlantic Retina, Thomas Jefferson University, Philadelphia, Pennsylvania
5. Young AT, Lane BN, Ozog D, Matthews NH. Patients and dermatologists are largely satisfied with ChatGPT-generated after-visit summaries: A pilot study. JAAD Int 2024; 15:33-35. [PMID: 38371667 PMCID: PMC10869927 DOI: 10.1016/j.jdin.2023.12.004]
Affiliation(s)
- Albert T. Young
- Department of Dermatology, Henry Ford Hospital, Detroit, Michigan
- Brittany N. Lane
- Department of Medicine, Michigan State University College of Human Medicine, East Lansing, Michigan
- David Ozog
- Department of Dermatology, Henry Ford Hospital, Detroit, Michigan
- Department of Medicine, Michigan State University College of Human Medicine, East Lansing, Michigan
- Natalie H. Matthews
- Department of Dermatology, Henry Ford Hospital, Detroit, Michigan
- Department of Medicine, Michigan State University College of Human Medicine, East Lansing, Michigan
6. An H, Li X, Huang Y, Wang W, Wu Y, Liu L, Ling W, Li W, Zhao H, Lu D, Liu Q, Jiang G. A new ChatGPT-empowered, easy-to-use machine learning paradigm for environmental science. Eco Environ Health 2024; 3:131-136. [PMID: 38638173 PMCID: PMC11021822 DOI: 10.1016/j.eehl.2024.01.006]
Abstract
The quantity and complexity of environmental data have grown exponentially in recent years. High-quality big data analysis is critical for performing a sophisticated characterization of the complex network of environmental pollution. Machine learning (ML) has been employed as a powerful tool for decoupling the complexities of environmental big data based on its remarkable fitting ability. Yet, due to the knowledge gap across different subjects, ML concepts and algorithms have not been well popularized among researchers in environmental sustainability. In this context, we introduce a new research paradigm, "ChatGPT + ML + Environment", providing an unprecedented chance for environmental researchers to reduce the difficulty of using ML models. For instance, each step involved in applying ML models to environmental sustainability, including data preparation, model selection and construction, model training and evaluation, and hyper-parameter optimization, can be easily performed with guidance from ChatGPT. We also discuss the challenges and limitations of using this research paradigm in the field of environmental sustainability. Furthermore, we highlight the importance of "secondary training" for future application of "ChatGPT + ML + Environment".
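As a rough illustration of the workflow steps listed in this abstract (data preparation, model selection and construction, training and evaluation, and hyper-parameter optimization), the following is a minimal scikit-learn sketch; the synthetic data and the random-forest model are placeholders chosen for illustration, not taken from the cited study.

```python
# Minimal sketch of a typical environmental ML workflow: prepare data,
# build a model pipeline, tune hyper-parameters, and evaluate. The
# synthetic "pollutant" data and model choice are illustrative only.
import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))                                   # e.g., meteorological / emission features
y = X[:, 0] * 2.0 - X[:, 1] + rng.normal(scale=0.5, size=500)   # mock pollutant concentration

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

pipe = Pipeline([("scale", StandardScaler()),
                 ("model", RandomForestRegressor(random_state=0))])

# Hyper-parameter optimization via cross-validated grid search
search = GridSearchCV(pipe,
                      param_grid={"model__n_estimators": [100, 300],
                                  "model__max_depth": [None, 10]},
                      cv=5, scoring="r2")
search.fit(X_train, y_train)

print("best params:", search.best_params_)
print("test R2:", r2_score(y_test, search.predict(X_test)))
```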
Affiliation(s)
- Haoyuan An
- State Key Laboratory of Environmental Chemistry and Toxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
- Biomedical Engineering Institute, School of Control Science and Engineering, Shandong University, Jinan 250061, China
- Xiangyu Li
- State Key Laboratory of Environmental Chemistry and Toxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
- Yuming Huang
- State Key Laboratory of Environmental Chemistry and Toxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
- Weichao Wang
- State Key Laboratory of Environmental Chemistry and Toxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
- Yuehan Wu
- State Key Laboratory of Environmental Chemistry and Toxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
- Lin Liu
- State Key Laboratory of Environmental Chemistry and Toxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
- Weibo Ling
- State Key Laboratory of Environmental Chemistry and Toxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
- Wei Li
- Biomedical Engineering Institute, School of Control Science and Engineering, Shandong University, Jinan 250061, China
- Hanzhu Zhao
- Biomedical Engineering Institute, School of Control Science and Engineering, Shandong University, Jinan 250061, China
- Dawei Lu
- State Key Laboratory of Environmental Chemistry and Toxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
- Qian Liu
- State Key Laboratory of Environmental Chemistry and Toxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
- Guibin Jiang
- State Key Laboratory of Environmental Chemistry and Toxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
7. Buitrago-Esquinas EM, Puig-Cabrera M, Santos JAC, Custódio-Santos M, Yñiguez-Ovando R. Developing a hetero-intelligence methodological framework for sustainable policy-making based on the assessment of large language models. MethodsX 2024; 12:102707. [PMID: 38650999 PMCID: PMC11033193 DOI: 10.1016/j.mex.2024.102707]
Abstract
This work delves into the increasing relevance of Large Language Models (LLMs) in sustainable policy-making, proposing an innovative hetero-intelligence framework that blends human and artificial intelligence (AI) to tackle modern sustainability challenges. The research methodology includes a hetero-intelligence performance test, which juxtaposes human intelligence with AI in the formulation and implementation of sustainable policies. After testing this hetero-intelligence methodology, seven steps are rigorously described so that it can be replicated in any sustainability-planning context. The results underscore the capabilities and limitations of LLMs and the critical role of human intelligence in enhancing the efficacy of hetero-intelligence systems. This work fulfils the need for a rigorous methodological framework, based on empirical steps, that can provide unbiased outcomes to be integrated into sustainable planning and decision-making processes.
- Assesses LLMs' limitations and capabilities regarding sustainable planning issues.
- Proposes a replicable methodology based on the combination of human and artificial intelligence.
- Systematises the integration of a hetero-intelligent approach into the formulation of sustainability policies to make them more efficient and effective.
Affiliation(s)
- Eva M. Buitrago-Esquinas
- Faculty of Economics and Business Sciences, Universidad de Sevilla, Spain
- Research Centre for Tourism, Sustainability and Well-being (CinTurs), Universidade do Algarve, Faro, Portugal
- Miguel Puig-Cabrera
- Faculty of Economics and Business Sciences, Universidad de Sevilla, Spain
- Research Centre for Tourism, Sustainability and Well-being (CinTurs), Universidade do Algarve, Faro, Portugal
- José António C. Santos
- Faculty of Economics and Business Sciences, Universidad de Sevilla, Spain
- Research Centre for Tourism, Sustainability and Well-being (CinTurs), Universidade do Algarve, Faro, Portugal
- Margarida Custódio-Santos
- Faculty of Economics and Business Sciences, Universidad de Sevilla, Spain
- Research Centre for Tourism, Sustainability and Well-being (CinTurs), Universidade do Algarve, Faro, Portugal
- Rocío Yñiguez-Ovando
- Faculty of Economics and Business Sciences, Universidad de Sevilla, Spain
- Research Centre for Tourism, Sustainability and Well-being (CinTurs), Universidade do Algarve, Faro, Portugal
8. Grippaudo F, Nigrelli S, Patrignani A, Ribuffo D. Quality of the Information provided by ChatGPT for Patients in Breast Plastic Surgery: Are we already in the future? JPRAS Open 2024; 40:99-105. [PMID: 38444627 PMCID: PMC10914413 DOI: 10.1016/j.jpra.2024.02.001]
Abstract
Introduction In recent years, artificial intelligence (AI) has gained popularity, even in the field of plastic surgery. It is increasingly common for patients to use the internet to gather information about plastic surgery, and AI-based chatbots, such as ChatGPT, could be employed to answer patients' questions. The aim of this study was to evaluate the quality of medical information provided by ChatGPT regarding three of the most common procedures in breast plastic surgery: breast reconstruction, breast reduction, and augmentation mammaplasty. Methods The quality of information was evaluated with the expanded EQIP scale. Responses were collected from a pool of ten resident doctors in plastic surgery and then processed with SPSS software ver. 28.0. Results The analysis of the content provided by ChatGPT revealed sufficient quality of information across all selected topics, although the scores were unevenly distributed across the different items: there was a critical lack in the "Information data" field (a 0/6 score in all three investigations) but a very high overall evaluation of the "Structure data" field (>7/11 in all three investigations). Conclusion Currently, AI serves as a valuable tool for patients; however, engineers and developers must address certain critical issues. Models like ChatGPT may play an important role in improving patients' awareness of medical procedures and surgical interventions in the future, but their role must be considered ancillary to that of surgeons.
Affiliation(s)
- F.R. Grippaudo
- Department of Plastic Reconstructive and Aesthetic Surgery, Policlinico Umberto I, Sapienza University of Rome, Viale del Policlinico 155, 00161, Rome, Italy
- S. Nigrelli
- Department of Plastic Reconstructive and Aesthetic Surgery, Policlinico Umberto I, Sapienza University of Rome, Viale del Policlinico 155, 00161, Rome, Italy
- A. Patrignani
- Department of Plastic Reconstructive and Aesthetic Surgery, Policlinico Umberto I, Sapienza University of Rome, Viale del Policlinico 155, 00161, Rome, Italy
- D. Ribuffo
- Department of Plastic Reconstructive and Aesthetic Surgery, Policlinico Umberto I, Sapienza University of Rome, Viale del Policlinico 155, 00161, Rome, Italy
9. Chang CT, Ticknor IL, Spinelli JA, Bhatia BK, Marwaha S, Mirmirani P, Seidler AM, Man JR, McCleskey PE. Comparison of large language models in generating patient handouts for the dermatology clinic: A blinded study. JAAD Int 2024; 15:152-154. [PMID: 38571697 PMCID: PMC10988028 DOI: 10.1016/j.jdin.2024.02.010]
Affiliation(s)
- Crystal T. Chang
- Kaiser Permanente Bernard J. Tyson School of Medicine, Pasadena, California
- Iesha L. Ticknor
- Kaiser Permanente Bernard J. Tyson School of Medicine, Pasadena, California
- Bhavnit K. Bhatia
- Department of Dermatology, The Permanente Medical Group, Richmond, California
- Sangeeta Marwaha
- Department of Dermatology, The Permanente Medical Group, Napa, California
- Paradi Mirmirani
- Department of Dermatology, The Permanente Medical Group, Vallejo, California
- Anne M. Seidler
- Department of Dermatology, The Permanente Medical Group, Oakland, California
- Jeremy R. Man
- Department of Dermatology, Southern California Permanente Medical Group, Los Angeles, California
10. Lechien JR, Carroll TL, Huston MN, Naunheim MR. ChatGPT-4 accuracy for patient education in laryngopharyngeal reflux. Eur Arch Otorhinolaryngol 2024; 281:2547-2552. [PMID: 38492008 DOI: 10.1007/s00405-024-08560-w]
Abstract
INTRODUCTION Chatbot Generative Pre-trained Transformer (ChatGPT) is an artificial intelligence-powered language model chatbot that can assist otolaryngologists in practice and research. The ability of ChatGPT to generate patient-centered information related to laryngopharyngeal reflux disease (LPRD) was evaluated. METHODS Twenty-five questions dedicated to the definition, clinical presentation, diagnosis, and treatment of LPRD were developed from the Dubai definition and management of LPRD consensus and recent reviews. Questions from the four aforementioned categories were entered into ChatGPT-4. Four board-certified laryngologists evaluated the accuracy of ChatGPT-4 with a 5-point Likert scale. Interrater reliability was evaluated. RESULTS The mean scores (SD) of ChatGPT-4 answers for definition, clinical presentation, additional examination, and treatments were 4.13 (0.52), 4.50 (0.72), 3.75 (0.61), and 4.18 (0.47), respectively. Experts reported high interrater reliability for sub-scores (ICC = 0.973). The lowest performances of ChatGPT-4 were on answers about the most prevalent LPR signs, the most reliable objective tool for diagnosis (hypopharyngeal-esophageal multichannel intraluminal impedance-pH monitoring, HEMII-pH), and the criteria for the diagnosis of LPR using HEMII-pH. CONCLUSION ChatGPT-4 may provide adequate information on the definition of LPR, its differences from GERD (gastroesophageal reflux disease), and its clinical presentation. Information provided on extra-laryngeal manifestations and HEMII-pH may need further optimization. Given recent trends of increasing patient use of internet sources for self-education, the findings of the present study may help draw attention to ChatGPT-4's accuracy on the topic of LPR.
Affiliation(s)
- Jerome R Lechien
- Research Committee, Young Otolaryngologists of the International Federation of Otorhinolaryngological Societies (IFOS), Paris, France.
- Division of Laryngology and Broncho-Esophagology, Department of Otolaryngology-Head Neck Surgery, EpiCURA Hospital, UMONS Research Institute for Health Sciences and Technology, University of Mons (UMons), Mons, Belgium.
- Department of Otorhinolaryngology and Head and Neck Surgery, Foch Hospital, School of Medicine, Phonetics and Phonology Laboratory (UMR 7018 CNRS, Université Sorbonne Nouvelle/Paris 3), Paris, France.
- Polyclinique Elsan de Poitiers, Poitiers, France.
- Thomas L Carroll
- Division of Otolaryngology-Head and Neck Surgery, Brigham and Women's Hospital, Department of Otolaryngology-Head and Neck Surgery, Harvard Medical School, Boston, MA, USA
- Molly N Huston
- Department of Otolaryngology, Washington University School of Medicine in St. Louis, St. Louis, MO, USA
- Matthew R Naunheim
- Research Committee, Young Otolaryngologists of the International Federation of Otorhinolaryngological Societies (IFOS), Paris, France
- Department of Otolaryngology-Head and Neck Surgery, Harvard Medical School, Boston, MA, USA
- Division of Laryngology, Massachusetts Eye and Ear, Boston, MA, USA
11. Fournier A, Fallet C, Sadeghipour F, Perrottet N. Assessing the applicability and appropriateness of ChatGPT in answering clinical pharmacy questions. Ann Pharm Fr 2024; 82:507-513. [PMID: 37992892 DOI: 10.1016/j.pharma.2023.11.001]
Abstract
OBJECTIVES Clinical pharmacists rely on different scientific references to ensure appropriate, safe, and cost-effective drug use. Tools based on artificial intelligence (AI), such as ChatGPT (Generative Pre-trained Transformer), could offer valuable support. The objective of this study was to assess ChatGPT's capacity to respond correctly to clinical pharmacy questions asked by healthcare professionals in our university hospital. MATERIAL AND METHODS ChatGPT's capacity to respond correctly to the last 100 consecutive questions recorded in our clinical pharmacy database was assessed. Questions were copied from our FileMaker Pro database and pasted into the ChatGPT (March 14 version) online platform. The generated answers were then copied verbatim into an Excel file. Two blinded clinical pharmacists reviewed all the questions and the answers given by the software. In case of disagreement, a third blinded pharmacist intervened to decide. RESULTS Questions about documentation (n=36) and mode of drug administration (n=30) predominated. Among the 69 applicable questions, the rate of correct answers varied from 30% to 57.1% depending on question type, with an overall rate of 44.9%. Regarding inappropriate answers (n=38), 20 were incorrect, 18 gave no answer, and 8 were incomplete, with 8 answers belonging to 2 different categories. In no case did ChatGPT provide a better answer than the pharmacists. CONCLUSIONS ChatGPT demonstrated mixed performance in answering clinical pharmacy questions. It should not replace human expertise, as a high rate of inappropriate answers was highlighted. Future studies should focus on the optimization of ChatGPT for specific clinical pharmacy questions and explore the potential benefits and limitations of integrating this technology into clinical practice.
Affiliation(s)
- A Fournier
- Service of Pharmacy, Centre Hospitalier Universitaire Vaudois (CHUV), Lausanne, Switzerland
- C Fallet
- Service of Pharmacy, Centre Hospitalier Universitaire Vaudois (CHUV), Lausanne, Switzerland
- F Sadeghipour
- Service of Pharmacy, Centre Hospitalier Universitaire Vaudois (CHUV), Lausanne, Switzerland; School of Pharmaceutical Sciences, University of Geneva, University of Lausanne, Geneva, Switzerland; Center for Research and Innovation in Clinical Pharmaceutical Sciences, Lausanne University Hospital and University of Lausanne, Lausanne, Switzerland
- N Perrottet
- Service of Pharmacy, Centre Hospitalier Universitaire Vaudois (CHUV), Lausanne, Switzerland; School of Pharmaceutical Sciences, University of Geneva, University of Lausanne, Geneva, Switzerland.
12. Papastratis I, Stergioulas A, Konstantinidis D, Daras P, Dimitropoulos K. Can ChatGPT provide appropriate meal plans for NCD patients? Nutrition 2024; 121:112291. [PMID: 38359704 DOI: 10.1016/j.nut.2023.112291]
Abstract
OBJECTIVES Dietary habits significantly affect health conditions and are closely related to the onset and progression of non-communicable diseases (NCDs). Consequently, a well-balanced diet plays an important role in lessening the effects of various disorders, including NCDs. Several artificial intelligence recommendation systems have been developed to propose healthy and nutritious diets. Most of these systems use expert knowledge and guidelines to provide tailored diets and encourage healthier eating habits. However, new advances in large language models such as ChatGPT, with their ability to produce human-like responses, have led individuals to seek their advice on a variety of tasks, including diet recommendations. This study aimed to determine the ability of ChatGPT models to generate appropriate personalized meal plans for patients with obesity, cardiovascular diseases, and type 2 diabetes. METHODS Using a state-of-the-art knowledge-based recommendation system as a reference, we assessed the meal plans generated by two large language models in terms of energy intake, nutrient accuracy, and meal variability. RESULTS Experimental results with different user profiles revealed the potential of ChatGPT models to provide personalized nutritional advice. CONCLUSION Additional supervision and guidance by nutrition experts or knowledge-based systems are required to ensure meal appropriateness for users with NCDs.
Affiliation(s)
- Ilias Papastratis
- The Visual Computing Lab, Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki, Central Macedonia, Greece.
- Andreas Stergioulas
- The Visual Computing Lab, Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki, Central Macedonia, Greece
- Dimitrios Konstantinidis
- The Visual Computing Lab, Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki, Central Macedonia, Greece
- Petros Daras
- The Visual Computing Lab, Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki, Central Macedonia, Greece
- Kosmas Dimitropoulos
- The Visual Computing Lab, Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki, Central Macedonia, Greece
13. Noda R, Izaki Y, Kitano F, Komatsu J, Ichikawa D, Shibagaki Y. Performance of ChatGPT and Bard in self-assessment questions for nephrology board renewal. Clin Exp Nephrol 2024; 28:465-469. [PMID: 38353783 DOI: 10.1007/s10157-023-02451-w]
Abstract
BACKGROUND Large language models (LLMs) have driven recent advances in artificial intelligence. While LLMs have demonstrated high performance on general medical examinations, their performance in specialized areas such as nephrology is unclear. This study aimed to evaluate ChatGPT and Bard in their potential nephrology applications. METHODS Ninety-nine questions from the Self-Assessment Questions for Nephrology Board Renewal from 2018 to 2022 were presented to two versions of ChatGPT (GPT-3.5 and GPT-4) and to Bard. We calculated the correct answer rates for the five years overall, for each year, and for each question category, and checked whether they exceeded the pass criterion. The correct answer rates were compared with those of nephrology residents. RESULTS The overall correct answer rates for GPT-3.5, GPT-4, and Bard were 31.3% (31/99), 54.5% (54/99), and 32.3% (32/99), respectively; GPT-4 thus significantly outperformed GPT-3.5 (p < 0.01) and Bard (p < 0.01). GPT-4 met the passing threshold in three of the five years, only barely in two of them. GPT-4 demonstrated significantly higher performance on problem-solving, clinical, and non-image questions than GPT-3.5 and Bard. GPT-4's performance was between that of third- and fourth-year nephrology residents. CONCLUSIONS GPT-4 outperformed GPT-3.5 and Bard and met the Nephrology Board renewal standards in specific years, albeit marginally. These results highlight LLMs' potential and limitations in nephrology. As LLMs advance, nephrologists should understand their performance characteristics for future applications.
Affiliation(s)
- Ryunosuke Noda
- Division of Nephrology and Hypertension, Department of Internal Medicine, St. Marianna University School of Medicine, 2-16-1 Sugao, Miyamae-Ku, Kawasaki, Kanagawa, 216-8511, Japan.
- Yuto Izaki
- Division of Nephrology and Hypertension, Department of Internal Medicine, St. Marianna University School of Medicine, 2-16-1 Sugao, Miyamae-Ku, Kawasaki, Kanagawa, 216-8511, Japan
- Fumiya Kitano
- Division of Nephrology and Hypertension, Department of Internal Medicine, St. Marianna University School of Medicine, 2-16-1 Sugao, Miyamae-Ku, Kawasaki, Kanagawa, 216-8511, Japan
- Jun Komatsu
- Division of Nephrology and Hypertension, Department of Internal Medicine, St. Marianna University School of Medicine, 2-16-1 Sugao, Miyamae-Ku, Kawasaki, Kanagawa, 216-8511, Japan
- Daisuke Ichikawa
- Division of Nephrology and Hypertension, Department of Internal Medicine, St. Marianna University School of Medicine, 2-16-1 Sugao, Miyamae-Ku, Kawasaki, Kanagawa, 216-8511, Japan
- Yugo Shibagaki
- Division of Nephrology and Hypertension, Department of Internal Medicine, St. Marianna University School of Medicine, 2-16-1 Sugao, Miyamae-Ku, Kawasaki, Kanagawa, 216-8511, Japan
14. González R, Poenaru D, Woo R, Trappey AF, Carter S, Darcy D, Encisco E, Gulack B, Miniati D, Tombash E, Huang EY. ChatGPT: What Every Pediatric Surgeon Should Know About Its Potential Uses and Pitfalls. J Pediatr Surg 2024; 59:941-947. [PMID: 38336588 DOI: 10.1016/j.jpedsurg.2024.01.007]
Abstract
ChatGPT - currently the most popular generative artificial intelligence system - has been revolutionizing the world and healthcare since its release in November 2022. ChatGPT is a conversational chatbot that uses machine learning algorithms to enhance its replies based on user interactions and is a part of a broader effort to develop natural language processing that can assist people in their daily lives by understanding and responding to human language in a useful and engaging way. Thus far, many potential applications within healthcare have been described, despite its relatively recent release. This manuscript offers the pediatric surgical community a primer on this new technology and discusses some initial observations about its potential uses and pitfalls. Moreover, it introduces the perspectives of medical journals and surgical societies regarding the use of this artificial intelligence chatbot. As ChatGPT and other large language models continue to evolve, it is the responsibility of the pediatric surgery community to stay abreast of these changes and play an active role in safely incorporating them into our field for the benefit of our patients. LEVEL OF EVIDENCE: V.
Affiliation(s)
- Raquel González
- Division of Pediatric Surgery, Johns Hopkins All Children's Hospital, 501 6th Avenue S, Saint Petersburg, FL, 33701, USA.
- Dan Poenaru
- McGill University, 5252 Boul. De Maissonneuve O. rm. 3E.05, Montréal, QC, H4a 3S5, Canada
- Russell Woo
- Department of Surgery, Division of Pediatric Surgery, University of Hawai'i, John A. Burns School of Medicine, 1319 Punahou Street, Suite 600, Honolulu, HI, 96826, USA
- A Francois Trappey
- Pediatric General and Thoracic Surgery, Brooke Army Medical Center, 3551 Roger Brooke Dr, Fort Sam Houston, TX, 78234, USA
- Stewart Carter
- Division of Pediatric Surgery, University of Louisville, Norton Children's Hospital, 315 East Broadway, Suite 565, Louisville, KY, 40202, USA
- David Darcy
- Golisano Children's Hospital, University of Rochester Medical Center, 601 Elmwood Avenue, Box SURG, Rochester, NY, 14642, USA
- Ellen Encisco
- Division of Pediatric General and Thoracic Surgery, Cincinnati Children's Hospital, 3333 Burnet Ave, Cincinnati, OH, 45229, USA
- Brian Gulack
- Rush University Medical Center, 1653 W Congress Parkway, Kellogg, Chicago, IL, 60612, USA
- Doug Miniati
- Department of Pediatric Surgery, Kaiser Permanente Roseville, 1600 Eureka Road, Building C, Suite C35, Roseville, CA, 95661, USA
- Edzhem Tombash
- Division of Pediatric General and Thoracic Surgery, Cincinnati Children's Hospital, 3333 Burnet Ave, Cincinnati, OH, 45229, USA
- Eunice Y Huang
- Vanderbilt University Medical Center, Monroe Carell Jr. Children's Hospital, 2200 Children's Way, Suite 7100, Nashville, TN, 37232, USA
15. Araji T, Brooks AD. Evaluating The Role of ChatGPT as a Study Aid in Medical Education in Surgery. J Surg Educ 2024; 81:753-757. [PMID: 38556438 DOI: 10.1016/j.jsurg.2024.01.014]
Abstract
OBJECTIVE Our aim was to assess how ChatGPT compares to Google search in assisting medical students during their surgery clerkships. DESIGN We conducted a crossover study in which participants were asked to complete 2 standardized assessments on different general surgery topics before and after using either Google search or ChatGPT. SETTING The study was conducted at the Perelman School of Medicine at the University of Pennsylvania (PSOM) in Philadelphia, Pennsylvania. PARTICIPANTS Nineteen third-year medical students participated in our study. RESULTS The baseline (preintervention) performance of participants on both quizzes did not differ between the Google search and ChatGPT groups (p = 0.728). Students overall performed better postintervention, and the difference in test scores was statistically significant for both the Google group (p < 0.001) and the ChatGPT group (p = 0.01). The mean percent increase in test scores pre- to postintervention was higher in the Google group at 11% vs. 10% in the ChatGPT group, but this difference was not statistically significant (p = 0.87). Similarly, there was no statistically significant difference in postintervention scores on both assessments between the 2 groups (p = 0.508). Postassessment surveys revealed that all students (100%) had heard of ChatGPT before, and 47% had previously used it for various purposes. On a scale of 1 to 10, with 1 being the lowest and 10 being the highest, the feasibility of ChatGPT and its usefulness in finding answers were rated as 8.4 and 6.6 on average, respectively. When asked to rate the likelihood of using ChatGPT in their surgery rotation, the answers fell between 1 and 3 ("unlikely", 47%), 4 and 6 ("intermediate", 26%), and 7 and 10 ("likely", 26%). CONCLUSION Our results show that even though ChatGPT was comparable to Google search in finding answers to surgery questions, many students were reluctant to use ChatGPT for learning purposes during their surgery clerkship.
Affiliation(s)
- Tarek Araji
- Hospital of the University of Pennsylvania, Department of Surgery, Philadelphia, Pennsylvania
- Ari D Brooks
- Hospital of the University of Pennsylvania, Department of Surgery, Philadelphia, Pennsylvania.
16. Amin KS, Forman HP, Davis MA. Even with ChatGPT, race matters. Clin Imaging 2024; 109:110113. [PMID: 38552383 DOI: 10.1016/j.clinimag.2024.110113]
Abstract
BACKGROUND Applications of large language models such as ChatGPT are increasingly being studied. Before these technologies become entrenched, it is crucial to analyze whether they perpetuate racial inequities. METHODS We asked OpenAI's ChatGPT-3.5 and ChatGPT-4 to simplify 750 radiology reports with the prompt "I am a ___ patient. Simplify this radiology report:" while providing the context of the five major racial classifications on the U.S. census: White, Black or African American, American Indian or Alaska Native, Asian, and Native Hawaiian or other Pacific Islander. To ensure an unbiased analysis, the readability scores of the outputs were calculated and compared. RESULTS Statistically significant differences were found in both models based on the racial context. For ChatGPT-3.5, output for White and Asian was at a significantly higher reading grade level than for both Black or African American and American Indian or Alaska Native, among other differences. For ChatGPT-4, output for Asian was at a significantly higher reading grade level than for American Indian or Alaska Native and Native Hawaiian or other Pacific Islander, among other differences. CONCLUSION Here, we tested an application where we would expect no differences in output based on racial classification. Hence, the differences found are alarming and demonstrate that the medical community must remain vigilant to ensure that large language models do not provide biased or otherwise harmful outputs.
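The readability comparison described in this abstract can be sketched in a few lines of Python; the textstat package, the toy report texts, and the simple group-mean comparison below are illustrative assumptions, not the authors' actual pipeline or data.

```python
# Score the reading grade level of model outputs generated under different
# racial contexts and compare group means. Toy data; textstat is one common
# way to compute Flesch-Kincaid grade levels.
import statistics
import textstat

outputs_by_context = {
    "White": [
        "The CT scan of your chest shows no evidence of acute disease.",
        "There is a small, stable nodule that should be rechecked in a year.",
    ],
    "Black or African American": [
        "Your chest scan looks fine. Nothing new or worrying was found.",
        "A tiny spot was seen before; it has not changed and needs a recheck next year.",
    ],
    # ... one list of simplified reports per census category
}

mean_grade = {
    context: statistics.mean(textstat.flesch_kincaid_grade(t) for t in texts)
    for context, texts in outputs_by_context.items()
}
print(mean_grade)
```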
Affiliation(s)
- Howard P Forman
- Department of Radiology and Biomedical Imaging, Yale School of Medicine, New Haven, CT, USA
- Melissa A Davis
- Department of Radiology and Biomedical Imaging, Yale School of Medicine, New Haven, CT, USA.
17. Makhoul M, Melkane AE, Khoury PE, Hadi CE, Matar N. A cross-sectional comparative study: ChatGPT 3.5 versus diverse levels of medical experts in the diagnosis of ENT diseases. Eur Arch Otorhinolaryngol 2024; 281:2717-2721. [PMID: 38365990 DOI: 10.1007/s00405-024-08509-z]
Abstract
PURPOSE With recent advances in artificial intelligence (AI), it has become crucial to thoroughly evaluate its applicability in healthcare. This study aimed to assess the accuracy of ChatGPT in diagnosing ear, nose, and throat (ENT) pathology and to compare its performance with that of medical experts. METHODS We conducted a cross-sectional comparative study in which 32 ENT cases were presented to ChatGPT 3.5, ENT physicians, ENT residents, family medicine (FM) specialists, second-year medical students (Med2), and third-year medical students (Med3). Each participant provided three differential diagnoses. The study analyzed diagnostic accuracy rates and inter-rater agreement within and between participant groups and ChatGPT. RESULTS The accuracy rate of ChatGPT was 70.8%, which was not significantly different from that of ENT physicians or ENT residents. However, a significant difference in correctness rate existed between ChatGPT and FM specialists (49.8%, p < 0.001) and between ChatGPT and medical students (Med2 47.5%, p < 0.001; Med3 47%, p < 0.001). Inter-rater agreement for the differential diagnosis between ChatGPT and each participant group was either poor or fair. In 68.75% of cases, ChatGPT failed to mention the most critical diagnosis. CONCLUSIONS ChatGPT demonstrated accuracy comparable to that of ENT physicians and ENT residents in diagnosing ENT pathology, outperforming FM specialists, Med2, and Med3. However, it showed limitations in identifying the most critical diagnosis.
Affiliation(s)
- Mikhael Makhoul
- Department of Otolaryngology-Head and Neck Surgery, Hotel Dieu de France Hospital, Saint Joseph University, Alfred Naccache Boulevard, Ashrafieh, PO Box: 166830, Beirut, Lebanon.
- Antoine E Melkane
- Department of Otolaryngology-Head and Neck Surgery, Hotel Dieu de France Hospital, Saint Joseph University, Alfred Naccache Boulevard, Ashrafieh, PO Box: 166830, Beirut, Lebanon
- Patrick El Khoury
- Department of Otolaryngology-Head and Neck Surgery, Hotel Dieu de France Hospital, Saint Joseph University, Alfred Naccache Boulevard, Ashrafieh, PO Box: 166830, Beirut, Lebanon
- Christopher El Hadi
- Department of Otolaryngology-Head and Neck Surgery, Hotel Dieu de France Hospital, Saint Joseph University, Alfred Naccache Boulevard, Ashrafieh, PO Box: 166830, Beirut, Lebanon
- Nayla Matar
- Department of Otolaryngology-Head and Neck Surgery, Hotel Dieu de France Hospital, Saint Joseph University, Alfred Naccache Boulevard, Ashrafieh, PO Box: 166830, Beirut, Lebanon
18. Abu Arqub S, Al-Moghrabi D, Allareddy V, Upadhyay M, Vaid N, Yadav S. Content analysis of AI-generated (ChatGPT) responses concerning orthodontic clear aligners. Angle Orthod 2024; 94:263-272. [PMID: 38195060 DOI: 10.2319/071123-484.1]
Abstract
OBJECTIVES To assess the accuracy of ChatGPT answers concerning orthodontic clear aligners. MATERIALS AND METHODS A cross-sectional content analysis of ChatGPT-generated responses to queries related to clear aligner treatment (CAT) was undertaken. A total of 111 questions were generated by three orthodontists based on a set of predefined domains and subdomains. The artificial intelligence (AI)-generated (ChatGPT) answers were extracted, and their accuracy was determined independently by five orthodontists. The accuracy of the answers was assessed using a prepiloted four-point scoring rubric. Descriptive statistics were performed. RESULTS The total mean accuracy score for the entire set was 2.6 ± 1.1. Of the AI-generated answers, 58% were scored as objectively true, 18% as selected facts, 9% as minimal facts, and 15% as false. False claims included the ability of CAT to reduce the need for orthognathic surgery (4.0 ± 0.0), improve airway function (3.8 ± 0.5), achieve root parallelism (3.6 ± 0.5), alleviate sleep apnea (3.8 ± 0.5), and produce more stable results compared with fixed appliances (3.8 ± 0.5). CONCLUSIONS The overall accuracy of ChatGPT responses to questions concerning CAT was suboptimal, and the responses lacked citations to relevant literature. The ability of the software to offer current and precise information was limited. Therefore, clinicians and patients must be mindful of false claims and of relevant facts omitted in the answers generated by ChatGPT.
19. Davis RJ, Ayo-Ajibola O, Lin ME, Swanson MS, Chambers TN, Kwon DI, Kokot NC. Evaluation of Oropharyngeal Cancer Information from Revolutionary Artificial Intelligence Chatbot. Laryngoscope 2024; 134:2252-2257. [PMID: 37983846 DOI: 10.1002/lary.31191]
Abstract
OBJECTIVE With the burgeoning popularity of artificial intelligence-based chatbots, oropharyngeal cancer patients now have access to a novel source of medical information. Because chatbot information is not reviewed by experts, we sought to evaluate an artificial intelligence-based chatbot's oropharyngeal cancer-related information for accuracy. METHODS Fifteen oropharyngeal cancer-related questions were developed and input into ChatGPT version 3.5. Four physician-graders independently assessed accuracy, comprehensiveness, and similarity to a physician response using 5-point Likert scales. Responses graded lower than three were then critiqued by the physician-graders. Critiques were analyzed using inductive thematic analysis. Readability of the responses was assessed using the Flesch Reading Ease (FRE) and Flesch-Kincaid Reading Grade Level (FKRGL) scales. RESULTS Average accuracy, comprehensiveness, and similarity to a physician response scores were 3.88 (SD = 0.99), 3.80 (SD = 1.14), and 3.67 (SD = 1.08), respectively. Posttreatment-related questions were the most accurate, comprehensive, and similar to a physician response, followed by treatment-related and then diagnosis-related questions. Posttreatment-related questions scored significantly higher than diagnosis-related questions in all three domains (p < 0.01). Two themes emerged from the physician critiques: suboptimal educational value and potential to misinform patients. The mean FRE and FKRGL scores both indicated a reading level above 11th grade, higher than the 6th grade level recommended for patients. CONCLUSION ChatGPT responses may not educate patients to an appropriate degree, could outright misinform them, and read at a more difficult grade level than is recommended for patient material. As oropharyngeal cancer patients represent a vulnerable population facing complex, life-altering diagnoses and treatments, they should be cautious when consuming chatbot-generated medical information. LEVEL OF EVIDENCE NA Laryngoscope, 134:2252-2257, 2024.
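For reference, the two readability indices named in this abstract are standard formulas over word, sentence, and syllable counts (the generic published definitions, not values specific to this study); higher FRE indicates easier text, while higher FKRGL indicates a higher reading grade level, and an FRE of roughly 80-90 corresponds to about a 6th grade level.

```latex
\begin{aligned}
\mathrm{FRE}   &= 206.835 - 1.015\,\frac{\text{total words}}{\text{total sentences}} - 84.6\,\frac{\text{total syllables}}{\text{total words}} \\
\mathrm{FKRGL} &= 0.39\,\frac{\text{total words}}{\text{total sentences}} + 11.8\,\frac{\text{total syllables}}{\text{total words}} - 15.59
\end{aligned}
```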
Affiliation(s)
- Ryan J Davis
- Keck School of Medicine of the University of Southern California, Los Angeles, California, USA
- Matthew E Lin
- Keck School of Medicine of the University of Southern California, Los Angeles, California, USA
- Mark S Swanson
- Caruso Department of Otolaryngology-Head & Neck Surgery, Keck School of Medicine of the University of Southern California, Los Angeles, California, USA
- Tamara N Chambers
- Caruso Department of Otolaryngology-Head & Neck Surgery, Keck School of Medicine of the University of Southern California, Los Angeles, California, USA
- Daniel I Kwon
- Caruso Department of Otolaryngology-Head & Neck Surgery, Keck School of Medicine of the University of Southern California, Los Angeles, California, USA
- Niels C Kokot
- Caruso Department of Otolaryngology-Head & Neck Surgery, Keck School of Medicine of the University of Southern California, Los Angeles, California, USA
20.
Abstract
The launch of OpenAI's chatbot, ChatGPT, has generated a great deal of attention and discussion among professionals in several fields. Researchers from various fields have raised many concerns and challenges, particularly in relation to the harm that using these tools for medical diagnosis and treatment recommendations can cause. In addition, it has been debated whether ChatGPT is dependable, efficient, and helpful for clinicians and medical professionals. Therefore, in this study, we assess ChatGPT's effectiveness in providing mental health support, particularly for issues related to anxiety and depression, based on the chatbot's responses and cross-questioning. The findings indicate that there are significant inconsistencies and that ChatGPT's reliability is low in this specific domain. As a result, caution must be exercised when using ChatGPT as a complementary mental health resource.
Collapse
Affiliation(s)
- Faiza Farhat
- Section of Parasitology, Department of Zoology, Aligarh Muslim University, Aligarh, UP, 202002, India.
| |
Collapse
|
21
|
Khan U. Revolutionizing Personalized Protein Energy Malnutrition Treatment: Harnessing the Power of Chat GPT. Ann Biomed Eng 2024; 52:1125-1127. [PMID: 37728811 DOI: 10.1007/s10439-023-03331-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/18/2023] [Accepted: 07/22/2023] [Indexed: 09/21/2023]
Abstract
Protein energy malnutrition (PEM) is a global public health concern, and personalized treatment approaches are crucial for improved outcomes. This study explores the transformative potential of Chat GPT, an AI language model, in revolutionizing personalized treatment for PEM. By providing accurate information, personalized dietary recommendations and food choices, psychological counseling, and real-time monitoring and support, Chat GPT can enhance the effectiveness of PEM interventions. Along with these benefits, it is also important to acknowledge its potential flaws and limitations. The study emphasizes the importance of collaboration between AI technology and healthcare professionals to leverage Chat GPT's capabilities effectively. By combining human expertise with AI capabilities, personalized PEM treatment can be revolutionized, leading to improved patient outcomes and a comprehensive approach to addressing this global public health concern. The study highlights the significant impact of Chat GPT in providing tailored guidance and continuous support throughout the treatment process, empowering individuals and improving their overall well-being.
Collapse
Affiliation(s)
- Urooj Khan
- Department of Human Nutrition and Dietetics, Faculty of Allied Health Science, Superior University, Sargodha Campus, Sargodha, Pakistan.
| |
Collapse
|
22
|
Timurkaynak Ö, Gönenli G. Response to Young et al., "The utility of ChatGPT in generating patient-facing and clinical responses for melanoma". J Am Acad Dermatol 2024; 90:e177. [PMID: 38215796 DOI: 10.1016/j.jaad.2023.12.052] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/04/2023] [Revised: 12/12/2023] [Accepted: 12/13/2023] [Indexed: 01/14/2024]
Affiliation(s)
- Özgür Timurkaynak
- Department of Dermatology, Acibadem Mehmet Ali Aydinlar University School of Medicine, Istanbul, Turkey.
| | - Gökhan Gönenli
- Department of Internal Medicine, Koç University School of Medicine, Istanbul, Turkey
| |
Collapse
|
23
|
Shifai N, van Doorn R, Malvehy J, Sangers TE. Can ChatGPT vision diagnose melanoma? An exploratory diagnostic accuracy study. J Am Acad Dermatol 2024; 90:1057-1059. [PMID: 38244612 DOI: 10.1016/j.jaad.2023.12.062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/03/2023] [Revised: 12/07/2023] [Accepted: 12/24/2023] [Indexed: 01/22/2024]
Affiliation(s)
- Naweed Shifai
- Department of Dermatology, Netherlands Cancer Institute, Amsterdam, The Netherlands
| | - Remco van Doorn
- Department of Dermatology, Leiden University Medical Center, Leiden, The Netherlands
| | - Josep Malvehy
- Melanoma Unit, Dermatology Department, Hospital Clínic de Barcelona, IDIBAPS, Universitat de Barcelona, Barcelona, Spain; Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), Barcelona, Spain
| | - Tobias E Sangers
- Department of Dermatology, Leiden University Medical Center, Leiden, The Netherlands.
| |
Collapse
|
24
|
Robinson MA, Belzberg M, Thakker S, Bibee K, Merkel E, MacFarlane DF, Lim J, Scott JF, Deng M, Lewin J, Soleymani D, Rosenfeld D, Liu R, Liu TYA, Ng E. Assessing the accuracy, usefulness, and readability of artificial-intelligence-generated responses to common dermatologic surgery questions for patient education: A double-blinded comparative study of ChatGPT and Google Bard. J Am Acad Dermatol 2024; 90:1078-1080. [PMID: 38296195 DOI: 10.1016/j.jaad.2024.01.037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/10/2023] [Revised: 12/26/2023] [Accepted: 01/14/2024] [Indexed: 02/17/2024]
Affiliation(s)
- Michelle A Robinson
- Department of Dermatology, Johns Hopkins School of Medicine, Baltimore, Maryland.
| | - Micah Belzberg
- Department of Dermatology, Johns Hopkins School of Medicine, Baltimore, Maryland
| | - Sach Thakker
- Georgetown University School of Medicine, Washington, DC
| | - Kristin Bibee
- Department of Dermatology, Johns Hopkins School of Medicine, Baltimore, Maryland
| | - Emily Merkel
- Department of Dermatology, Johns Hopkins School of Medicine, Baltimore, Maryland
| | - Deborah F MacFarlane
- Department of Dermatology, the University of Texas MD Anderson Cancer Center, Houston, Texas
| | - Jordan Lim
- Department of Dermatology, Emory University School of Medicine, Atlanta, Georgia
| | - Jeffrey F Scott
- Department of Dermatology, Johns Hopkins School of Medicine, Baltimore, Maryland
| | - Min Deng
- Department of Dermatology, MedStar Washington Hospital Center, Georgetown University Hospital, Washington, DC
| | - Jesse Lewin
- Kimberly and Eric J. Waldman Department of Dermatology, Icahn School of Medicine at Mount Sinai, New York, New York
| | | | | | | | - Tin Yan Alvin Liu
- Department of Ophthalmology, Johns Hopkins School of Medicine, Baltimore, Maryland
| | - Elise Ng
- Department of Ophthalmology, Johns Hopkins School of Medicine, Baltimore, Maryland
| |
Collapse
|
25
|
Ozgor BY, Simavi MA. Accuracy and reproducibility of ChatGPT's free version answers about endometriosis. Int J Gynaecol Obstet 2024; 165:691-695. [PMID: 38108232 DOI: 10.1002/ijgo.15309] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/11/2023] [Revised: 11/27/2023] [Accepted: 12/04/2023] [Indexed: 12/19/2023]
Abstract
OBJECTIVE To evaluate the accuracy and reproducibility of ChatGPT's free version answers about endometriosis for the first time. METHODS Detailed internet searches were performed to identify frequently asked questions (FAQs) about endometriosis. Scientific questions were prepared in accordance with the European Society of Human Reproduction and Embryology (ESHRE) endometriosis guidelines. An experienced gynecologist gave a score of 1-4 for each ChatGPT answer. The repeatability of ChatGPT answers about endometriosis was analyzed by asking each question twice, and reproducibility was defined as both answers to the same question falling into the same score category. RESULTS A total of 91.4% (n = 71) of all FAQs were answered completely, accurately, and sufficiently. ChatGPT had the highest accuracy in the symptom and diagnosis category (94.1%, 16/17 questions) and the lowest accuracy in the treatment category (81.3%, 13/16 questions). Furthermore, of the 40 questions based on the ESHRE endometriosis guidelines, 27 (67.5%) were classified as grade 1, seven (17.5%) as grade 2, and six (15.0%) as grade 3. The reproducibility rate of FAQs in the prevention, symptoms and diagnosis, and complications categories was the highest (100% for all categories). The reproducibility rate was the lowest for questions based on the ESHRE endometriosis guidelines (70.0%). CONCLUSION ChatGPT accurately and satisfactorily responded to more than 90% of the questions about endometriosis, but to only 67.5% of questions based on the ESHRE endometriosis guidelines.
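As a minimal illustration of the reproducibility definition used above (the two repeated answers to a question falling into the same score category), the toy scores below are invented, not the study's data.

```python
# Hypothetical 1-4 grades for ten questions, each asked twice.
first_ask  = [1, 1, 2, 1, 3, 1, 2, 1, 1, 4]
second_ask = [1, 2, 2, 1, 3, 1, 1, 1, 1, 4]

# A question is "reproducible" if both asks received the same grade category.
reproducible = sum(a == b for a, b in zip(first_ask, second_ask))
rate = reproducible / len(first_ask)
print(f"Reproducibility: {rate:.0%}")   # 80% in this toy example
```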
Collapse
Affiliation(s)
- Bahar Yuksel Ozgor
- Department of Obstetrics and Gynecology, Biruni University, Istanbul, Turkey
- Endometriosis Research and Support Organization (Endo Türkiye), Istanbul, Turkey
| | - Melek Azade Simavi
- Endometriosis Research and Support Organization (Endo Türkiye), Istanbul, Turkey
| |
Collapse
|
26
|
Coleman MC, Moore JN. Two artificial intelligence models underperform on examinations in a veterinary curriculum. J Am Vet Med Assoc 2024; 262:692-697. [PMID: 38382193 DOI: 10.2460/javma.23.12.0666] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/05/2023] [Accepted: 01/08/2024] [Indexed: 02/23/2024]
Abstract
OBJECTIVE Advancements in artificial intelligence (AI) and large language models have rapidly generated new possibilities for education and knowledge dissemination in various domains. Currently, our understanding of the knowledge of these models, such as ChatGPT, in the medical and veterinary sciences is in its nascent stage. Educators are faced with an urgent need to better understand these models in order to unleash student potential, promote responsible use, and align AI models with educational goals and learning objectives. The objectives of this study were to evaluate the knowledge level and consistency of responses of 2 platforms of ChatGPT, namely GPT-3.5 and GPT-4.0. SAMPLE A total of 495 multiple-choice and true/false questions from 15 courses used in the assessment of third-year veterinary students at a single veterinary institution were included in this study. METHODS The questions were manually entered 3 times into each platform, and answers were recorded. These answers were then compared against those provided by the faculty members coordinating the courses. RESULTS GPT-3.5 achieved an overall performance score of 55%, whereas GPT-4.0 had a significantly (P < .05) greater performance score of 77%. Importantly, the performance scores of both platforms were significantly (P < .05) below that of the veterinary students (86%). CLINICAL RELEVANCE Findings of this study suggested that veterinary educators and veterinary students retrieving information from these AI-based platforms should do so with caution.
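A hedged sketch of how such accuracy proportions can be compared with a chi-squared test; the correct/incorrect counts below are illustrative values scaled from the reported 55% and 77% accuracy on 495 questions, not the authors' raw data or exact analysis.

```python
from scipy.stats import chi2_contingency

#              correct  incorrect
gpt35 = [272, 223]   # roughly 55% of 495 questions
gpt4  = [381, 114]   # roughly 77% of 495 questions

# 2x2 contingency test of correct vs. incorrect answers by platform.
chi2, p, dof, expected = chi2_contingency([gpt35, gpt4])
print(f"chi2 = {chi2:.1f}, p = {p:.2g}")
```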
Collapse
|
27
|
Ferdush J, Begum M, Hossain ST. ChatGPT and Clinical Decision Support: Scope, Application, and Limitations. Ann Biomed Eng 2024; 52:1119-1124. [PMID: 37516680 DOI: 10.1007/s10439-023-03329-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/15/2023] [Accepted: 07/18/2023] [Indexed: 07/31/2023]
Abstract
This study examines ChatGPT's role in clinical decision support, by analyzing its scope, application, and limitations. By analyzing patient data and providing evidence-based recommendations, ChatGPT, an AI language model, can help healthcare professionals make well-informed decisions. This study examines ChatGPT's use in clinical decision support, including diagnosis and treatment planning. However, it acknowledges limitations like biases, lack of contextual understanding, and human oversight and also proposes a framework for the future clinical decision support system. Understanding these factors will allow healthcare professionals to utilize ChatGPT effectively and make accurate clinical decisions. Further research is needed to understand the implications of using ChatGPT in healthcare settings and to develop safeguards for responsible use.
Collapse
Affiliation(s)
- Jannatul Ferdush
- Department of Computer Science and Engineering, Jashore University of Science and Technology, Jashore, 7408, Bangladesh.
| | - Mahbuba Begum
- Department of Computer Science and Engineering, Mawlana Bhasani Science and Technology, Tangail, 1902, Bangladesh
| | - Sakib Tanvir Hossain
- Department of Mechanical Engineering, Khulna University of Engineering and Technology, Khulna, 9203, Bangladesh
| |
Collapse
|
28
|
Kıyak YS, Coşkun Ö, Budakoğlu Iİ, Uluoğlu C. ChatGPT for generating multiple-choice questions: Evidence on the use of artificial intelligence in automatic item generation for a rational pharmacotherapy exam. Eur J Clin Pharmacol 2024; 80:729-735. [PMID: 38353690 DOI: 10.1007/s00228-024-03649-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/27/2023] [Accepted: 02/03/2024] [Indexed: 04/09/2024]
Abstract
PURPOSE Artificial intelligence, specifically large language models such as ChatGPT, offers valuable potential benefits in question (item) writing. This study aimed to determine the feasibility of generating case-based multiple-choice questions using ChatGPT in terms of item difficulty and discrimination levels. METHODS This study involved 99 fourth-year medical students who participated in a rational pharmacotherapy clerkship carried out based on the WHO 6-Step Model. In response to a prompt that we provided, ChatGPT generated ten case-based multiple-choice questions on hypertension. Following an expert panel, two of these multiple-choice questions were incorporated into a medical school exam without any changes to the questions. Based on the administration of the test, we evaluated their psychometric properties, including item difficulty, item discrimination (point-biserial correlation), and functionality of the options. RESULTS Both questions exhibited acceptable point-biserial correlations (0.41 and 0.39), above the threshold of 0.30. However, one question had three non-functional options (options chosen by fewer than 5% of the exam participants) while the other question had none. CONCLUSIONS The findings showed that the questions can effectively differentiate between students who perform at high and low levels, which also points to the potential of ChatGPT as an artificial intelligence tool in test development. Future studies may use the prompt to generate items in order to enhance the external validity of the results by gathering data from diverse institutions and settings.
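The two classical item statistics named above can be computed as sketched below; the simulated response matrix is an assumption for illustration, and the point-biserial shown is the uncorrected item-total correlation.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical 0/1 answer matrix: 99 students x 20 items.
responses = (rng.random((99, 20)) > 0.35).astype(int)

def item_stats(responses, item):
    item_scores = responses[:, item]
    total_scores = responses.sum(axis=1)
    difficulty = item_scores.mean()                      # proportion answering correctly
    point_biserial = np.corrcoef(item_scores, total_scores)[0, 1]  # item-total correlation
    return difficulty, point_biserial

difficulty, rpb = item_stats(responses, item=0)
print(f"difficulty = {difficulty:.2f}, point-biserial = {rpb:.2f} (0.30 is a common threshold)")
```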
Collapse
Affiliation(s)
- Yavuz Selim Kıyak
- Department of Medical Education and Informatics, Faculty of Medicine, Gazi University, Ankara, Turkey.
- Gazi Üniversitesi Hastanesi E Blok 9, Kat 06500 Beşevler, Ankara, Turkey.
| | - Özlem Coşkun
- Department of Medical Education and Informatics, Faculty of Medicine, Gazi University, Ankara, Turkey
| | - Işıl İrem Budakoğlu
- Department of Medical Education and Informatics, Faculty of Medicine, Gazi University, Ankara, Turkey
| | - Canan Uluoğlu
- Department of Medical Pharmacology, Faculty of Medicine, Gazi University, Ankara, Turkey
| |
Collapse
|
29
|
Koga S, Martin NB, Dickson DW. Evaluating the performance of large language models: ChatGPT and Google Bard in generating differential diagnoses in clinicopathological conferences of neurodegenerative disorders. Brain Pathol 2024; 34:e13207. [PMID: 37553205 PMCID: PMC11006994 DOI: 10.1111/bpa.13207] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/22/2023] [Accepted: 07/31/2023] [Indexed: 08/10/2023] Open
Abstract
This study explores the utility of large language models (LLMs), specifically ChatGPT and Google Bard, in predicting neuropathologic diagnoses from clinical summaries. A total of 25 cases of neurodegenerative disorders presented at Mayo Clinic brain bank Clinico-Pathological Conferences were analyzed. The LLMs provided multiple pathologic diagnoses and their rationales, which were compared with the final clinical diagnoses made by physicians. ChatGPT-3.5, ChatGPT-4, and Google Bard correctly made primary diagnoses in 32%, 52%, and 40% of cases, respectively, while correct diagnoses were included in 76%, 84%, and 76% of cases, respectively. These findings highlight the potential of artificial intelligence tools like ChatGPT in neuropathology, suggesting they may facilitate more comprehensive discussions in clinicopathological conferences.
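A minimal sketch of the two accuracy measures reported above, i.e. whether the model's top diagnosis matched the reference and whether the reference appeared anywhere in the differential; the three toy cases and diagnosis labels are invented.

```python
# Each case: a reference diagnosis and the model's ranked differential (hypothetical data).
cases = [
    {"reference": "PSP", "differential": ["PSP", "CBD", "PD"]},
    {"reference": "DLB", "differential": ["AD", "DLB"]},
    {"reference": "MSA", "differential": ["PD", "PSP"]},
]

primary = sum(c["differential"][0] == c["reference"] for c in cases) / len(cases)
included = sum(c["reference"] in c["differential"] for c in cases) / len(cases)
print(f"primary diagnosis correct: {primary:.0%}, included in differential: {included:.0%}")
```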
Collapse
Affiliation(s)
- Shunsuke Koga
- Department of Neuroscience, Mayo Clinic, Jacksonville, Florida, USA
- Present address:
Department of Pathology and Laboratory Medicine, Hospital of the University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | | | | |
Collapse
|
30
|
Zaboli A, Brigo F, Sibilio S, Mian M, Turcato G. Human intelligence versus Chat-GPT: who performs better in correctly classifying patients in triage? Am J Emerg Med 2024; 79:44-47. [PMID: 38341993 DOI: 10.1016/j.ajem.2024.02.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/18/2023] [Revised: 02/02/2024] [Accepted: 02/04/2024] [Indexed: 02/13/2024] Open
Abstract
INTRODUCTION Chat-GPT is rapidly emerging as a promising and potentially revolutionary tool in medicine. One of its possible applications is the stratification of patients according to the severity of clinical conditions and prognosis during the triage evaluation in the emergency department (ED). METHODS Using a randomly selected sample of 30 vignettes recreated from real clinical cases, we compared the concordance in risk stratification of ED patients between healthcare personnel and Chat-GPT. The concordance was assessed with Cohen's kappa, and the performance was evaluated with the area under the receiver operating characteristic curve (AUROC). Among the outcomes, we considered mortality within 72 h, the need for hospitalization, and the presence of a severe or time-dependent condition. RESULTS The concordance in triage code assignment between triage nurses and Chat-GPT was 0.278 (unweighted Cohen's kappa; 95% confidence interval: 0.231-0.388). For all outcomes, the AUROC values were higher for the triage nurses. The most relevant difference was found in 72-h mortality, where triage nurses showed an AUROC of 0.910 (0.757-1.000) compared to only 0.669 (0.153-1.000) for Chat-GPT. CONCLUSIONS The current level of Chat-GPT reliability is insufficient to make it a valid substitute for the expertise of triage nurses in prioritizing ED patients. Further developments are required to enhance the safety and effectiveness of AI for risk stratification of ED patients.
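A hedged sketch, with invented data, of the two statistics used in this study: unweighted Cohen's kappa for agreement between nurse- and chatbot-assigned triage codes, and AUROC for the 72-hour mortality outcome. The assumption that a higher triage code indicates higher acuity is made only for this toy example and may differ from the study's coding scheme.

```python
from sklearn.metrics import cohen_kappa_score, roc_auc_score

# Hypothetical triage codes assigned to ten vignettes (the study used 30).
nurse_codes   = [1, 2, 3, 2, 4, 1, 3, 2, 2, 4]
chatgpt_codes = [1, 3, 3, 2, 3, 2, 3, 1, 2, 4]
print("unweighted kappa:", round(cohen_kappa_score(nurse_codes, chatgpt_codes), 3))

# Hypothetical 72-h mortality outcome; here a higher code is assumed to mean
# higher acuity, so the code itself serves as the risk score for AUROC.
died_72h = [0, 0, 1, 0, 1, 0, 0, 0, 0, 1]
print("nurse AUROC:  ", round(roc_auc_score(died_72h, nurse_codes), 3))
print("chatbot AUROC:", round(roc_auc_score(died_72h, chatgpt_codes), 3))
```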
Collapse
Affiliation(s)
- Arian Zaboli
- Innovation, Research and Teaching Service (SABES-ASDAA), Teaching Hospital of the Paracelsus Medical Private University (PMU), Bolzano, Italy.
| | - Francesco Brigo
- Innovation, Research and Teaching Service (SABES-ASDAA), Teaching Hospital of the Paracelsus Medical Private University (PMU), Bolzano, Italy
| | - Serena Sibilio
- Department of Emergency Medicine, Hospital of Merano-Meran (SABES-ASDAA), Merano-Meran, Italy; Teaching Hospital of the Paracelsus Medical Private University, Salzburg, Austria
| | - Michael Mian
- Innovation, Research and Teaching Service (SABES-ASDAA), Teaching Hospital of the Paracelsus Medical Private University (PMU), Bolzano, Italy; College of Health Care-Professions Claudiana, Bozen, Italy
| | - Gianni Turcato
- Department of Internal Medicine, Intermediate Care Unit, Hospital Alto Vicentino (AULSS-7), Santorso, Italy
| |
Collapse
|
31
|
Samaan JS, Rajeev N, Ng WH, Srinivasan N, Busam JA, Yeo YH, Samakar K. ChatGPT as a Source of Information for Bariatric Surgery Patients: a Comparative Analysis of Accuracy and Comprehensiveness Between GPT-4 and GPT-3.5. Obes Surg 2024; 34:1987-1989. [PMID: 38564173 PMCID: PMC11031485 DOI: 10.1007/s11695-024-07212-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/03/2023] [Revised: 03/22/2024] [Accepted: 03/28/2024] [Indexed: 04/04/2024]
Affiliation(s)
- Jamil S Samaan
- Karsh Division of Digestive and Liver Diseases, Department of Medicine, Cedars-Sinai Medical Center, 8700 Beverly Blvd, Los Angeles, CA, 90048, USA.
| | - Nithya Rajeev
- Division of Upper GI and General Surgery, Department of Surgery, Keck School of Medicine of USC, Health Care Consultation Center, 1510 San Pablo St #514, Los Angeles, CA, 90033, USA
| | - Wee Han Ng
- Bristol Medical School, University of Bristol, 5 Tyndall Ave, Bristol, BS8 1UD, UK
| | - Nitin Srinivasan
- Division of Upper GI and General Surgery, Department of Surgery, Keck School of Medicine of USC, Health Care Consultation Center, 1510 San Pablo St #514, Los Angeles, CA, 90033, USA
| | - Jonathan A Busam
- Karsh Division of Digestive and Liver Diseases, Department of Medicine, Cedars-Sinai Medical Center, 8700 Beverly Blvd, Los Angeles, CA, 90048, USA
| | - Yee Hui Yeo
- Karsh Division of Digestive and Liver Diseases, Department of Medicine, Cedars-Sinai Medical Center, 8700 Beverly Blvd, Los Angeles, CA, 90048, USA
| | - Kamran Samakar
- Division of Upper GI and General Surgery, Department of Surgery, Keck School of Medicine of USC, Health Care Consultation Center, 1510 San Pablo St #514, Los Angeles, CA, 90033, USA
| |
Collapse
|
32
|
Alessandri-Bonetti M, Liu HY, Giorgino R, Nguyen VT, Egro FM. The First Months of Life of ChatGPT and Its Impact in Healthcare: A Bibliometric Analysis of the Current Literature. Ann Biomed Eng 2024; 52:1107-1110. [PMID: 37482572 DOI: 10.1007/s10439-023-03325-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/12/2023] [Accepted: 07/14/2023] [Indexed: 07/25/2023]
Abstract
We aimed to evaluate current trends and future directions in the field of AI research since ChatGPT was launched. We performed a bibliometric analysis of the literature published during the first 7 months after ChatGPT's introduction, updated to July 1st, 2023. Seven hundred and twenty-four (724) articles were retrieved. This analysis highlights a significant increase in publications exploring ChatGPT use across various medical disciplines, indicating its expanding relevance in healthcare. A declining proportion of studies focusing on ethical considerations was observed. Simultaneously, there was a steady increase in studies focused on the exploration of possible applications of ChatGPT. As ChatGPT applications continue to expand, ongoing vigilance and collaborative efforts to optimize ChatGPT performance are essential in harnessing the benefits while mitigating the risks of AI use in healthcare.
Collapse
Affiliation(s)
- Mario Alessandri-Bonetti
- Department of Plastic Surgery, University of Pittsburgh Medical Center, 1350 Locust Street, G103, Pittsburgh, PA, 15213, USA
| | - Hilary Y Liu
- Department of Plastic Surgery, University of Pittsburgh Medical Center, 1350 Locust Street, G103, Pittsburgh, PA, 15213, USA
| | - Riccardo Giorgino
- Department of Orthopedics, IRCCS Istituto Ortopedico Galeazzi, Milan, Italy
| | - Vu T Nguyen
- Department of Plastic Surgery, University of Pittsburgh Medical Center, 1350 Locust Street, G103, Pittsburgh, PA, 15213, USA
| | - Francesco M Egro
- Department of Plastic Surgery, University of Pittsburgh Medical Center, 1350 Locust Street, G103, Pittsburgh, PA, 15213, USA.
- Department of Plastic Surgery, University of Pittsburgh Medical Center, 3550 Terrace Street 6B Scaife Hall, Pittsburgh, PA, 15261, USA.
| |
Collapse
|
33
|
Sabour A, Ghassemi F. Methodological issues on precision and prediction value of ChatGPT in emergency department triage decisions. Am J Emerg Med 2024; 79:198-199. [PMID: 38565486 DOI: 10.1016/j.ajem.2024.03.019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/06/2024] [Accepted: 03/15/2024] [Indexed: 04/04/2024] Open
Affiliation(s)
- Amirhossein Sabour
- Department of Computing and Software, McMaster University, Hamilton, ON, Canada.
| | - Fariba Ghassemi
- Eye Research Center, Farabi Eye Hospital, Tehran University of Medical Sciences, Tehran, Iran; Retina and Vitreous Service, Farabi Eye Hospital, Tehran University of Medical Sciences, Tehran, Iran.
| |
Collapse
|
34
|
Yang J, Ardavanis KS, Slack KE, Fernando ND, Della Valle CJ, Hernandez NM. Chat Generative Pretrained Transformer (ChatGPT) and Bard: Artificial Intelligence Does not yet Provide Clinically Supported Answers for Hip and Knee Osteoarthritis. J Arthroplasty 2024; 39:1184-1190. [PMID: 38237878 DOI: 10.1016/j.arth.2024.01.029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 08/20/2023] [Revised: 01/08/2024] [Accepted: 01/11/2024] [Indexed: 02/22/2024] Open
Abstract
BACKGROUND Advancements in artificial intelligence (AI) have led to the creation of large language models (LLMs), such as Chat Generative Pretrained Transformer (ChatGPT) and Bard, that analyze online resources to synthesize responses to user queries. Despite their popularity, the accuracy of LLM responses to medical questions remains unknown. This study aimed to compare the responses of ChatGPT and Bard regarding treatments for hip and knee osteoarthritis with the American Academy of Orthopaedic Surgeons (AAOS) Evidence-Based Clinical Practice Guidelines (CPGs) recommendations. METHODS Both ChatGPT (Open AI) and Bard (Google) were queried regarding 20 treatments (10 for hip and 10 for knee osteoarthritis) from the AAOS CPGs. Responses were classified by 2 reviewers as being in "Concordance," "Discordance," or "No Concordance" with AAOS CPGs. A Cohen's Kappa coefficient was used to assess inter-rater reliability, and Chi-squared analyses were used to compare responses between LLMs. RESULTS Overall, ChatGPT and Bard provided responses that were concordant with the AAOS CPGs for 16 (80%) and 12 (60%) treatments, respectively. Notably, ChatGPT and Bard encouraged the use of non-recommended treatments in 30% and 60% of queries, respectively. There were no differences in performance when evaluating by joint or by recommended versus non-recommended treatments. Studies were referenced in 6 (30%) of the Bard responses and none (0%) of the ChatGPT responses. Of the 6 Bard responses, studies could only be identified for 1 (16.7%). Of the remaining, 2 (33.3%) responses cited studies in journals that did not exist, 2 (33.3%) cited studies that could not be found with the information given, and 1 (16.7%) provided links to unrelated studies. CONCLUSIONS Both ChatGPT and Bard do not consistently provide responses that align with the AAOS CPGs. Consequently, physicians and patients should temper expectations on the guidance AI platforms can currently provide.
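Concordance rates like those above can be compared as shown in the sketch below; note that the study reports chi-squared analyses, whereas this example uses Fisher's exact test as a small-sample alternative, with the 16/20 and 12/20 counts taken from the abstract.

```python
from scipy.stats import fisher_exact

#                concordant  not concordant (with AAOS CPGs)
chatgpt = [16, 4]   # 80% of 20 treatments
bard    = [12, 8]   # 60% of 20 treatments

# Fisher's exact test on the 2x2 table of concordance by chatbot.
odds_ratio, p = fisher_exact([chatgpt, bard])
print(f"odds ratio = {odds_ratio:.2f}, p = {p:.2f}")
```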
Collapse
Affiliation(s)
- JaeWon Yang
- Department of Orthopaedic Surgery, University of Washington, Seattle, Washington
| | - Kyle S Ardavanis
- Department of Orthopaedic Surgery, Madigan Medical Center, Tacoma, Washington
| | - Katherine E Slack
- Elson S. Floyd College of Medicine, Washington State University, Spokane, Washington
| | - Navin D Fernando
- Department of Orthopaedic Surgery, University of Washington, Seattle, Washington
| | - Craig J Della Valle
- Department of Orthopaedic Surgery, Rush University Medical Center, Chicago, Illinois
| | - Nicholas M Hernandez
- Department of Orthopaedic Surgery, University of Washington, Seattle, Washington
| |
Collapse
|
35
|
Dahri NA, Yahaya N, Al-Rahmi WM, Aldraiweesh A, Alturki U, Almutairy S, Shutaleva A, Soomro RB. Extended TAM based acceptance of AI-Powered ChatGPT for supporting metacognitive self-regulated learning in education: A mixed-methods study. Heliyon 2024; 10:e29317. [PMID: 38628736 PMCID: PMC11016976 DOI: 10.1016/j.heliyon.2024.e29317] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/08/2023] [Revised: 04/02/2024] [Accepted: 04/04/2024] [Indexed: 04/19/2024] Open
Abstract
This mixed-method study explores the acceptance of ChatGPT as a tool for Metacognitive Self-Regulated Learning (MSRL) among academics. Despite the growing attention towards ChatGPT as a metacognitive learning tool, there is a need for a comprehensive understanding of the factors influencing its acceptance in academic settings. Engaging 300 preservice teachers through a ChatGPT-based scenario learning activity and utilizing convenience sampling, this study administered a questionnaire based on the proposed Technology Acceptance Model at UTM University's School of Education. Structural equation modelling was applied to analyze participants' perspectives on ChatGPT, considering factors like MSRL's impact on usage intention. Post-reflection sessions, semi-structured interviews, and record analysis were conducted to gather results. Findings indicate a high acceptance of ChatGPT, significantly influenced by personal competency, social influence, perceived AI usefulness, enjoyment, trust, AI intelligence, positive attitude, and metacognitive self-regulated learning. Interviews and record analysis suggest that academics view ChatGPT positively as an educational tool, seeing it as a solution to challenges in teaching and learning processes. The study highlights ChatGPT's potential to enhance MSRL and holds implications for teacher education and AI integration in educational settings.
Collapse
Affiliation(s)
- Nisar Ahmed Dahri
- Faculty of Social Sciences and Humanities, School of Education, University Teknologi Malaysia, UTM Sukadi, Johor, 81310, Malaysia
| | - Noraffandy Yahaya
- Faculty of Social Sciences and Humanities, School of Education, University Teknologi Malaysia, UTM Sukadi, Johor, 81310, Malaysia
| | - Waleed Mugahed Al-Rahmi
- Faculty of Social Sciences and Humanities, School of Education, University Teknologi Malaysia, UTM Sukadi, Johor, 81310, Malaysia
| | - Ahmed Aldraiweesh
- Educational Technology Department, College of Education, King Saud University, P.O. Box 21501, Riyadh, 11485, Saudi Arabia
| | - Uthman Alturki
- Educational Technology Department, College of Education, King Saud University, P.O. Box 21501, Riyadh, 11485, Saudi Arabia
| | - Sultan Almutairy
- Educational Technology Department, College of Education, King Saud University, P.O. Box 21501, Riyadh, 11485, Saudi Arabia
| | - Anna Shutaleva
- Ural Federal University Named After the First President of Russia B. N. Yeltsin, 620002, Ekaterinburg, Russia
| | - Rahim Bux Soomro
- Institute of Business Administration, Shah Abdul Latif University, Khairpur, Pakistan
| |
Collapse
|
36
|
Choudhury A, Chaudhry Z. Large Language Models and User Trust: Consequence of Self-Referential Learning Loop and the Deskilling of Health Care Professionals. J Med Internet Res 2024; 26:e56764. [PMID: 38662419 DOI: 10.2196/56764] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/25/2024] [Revised: 03/12/2024] [Accepted: 03/20/2024] [Indexed: 04/26/2024] Open
Abstract
As the health care industry increasingly embraces large language models (LLMs), understanding the consequence of this integration becomes crucial for maximizing benefits while mitigating potential pitfalls. This paper explores the evolving relationship among clinician trust in LLMs, the transition of data sources from predominantly human-generated to artificial intelligence (AI)-generated content, and the subsequent impact on the performance of LLMs and clinician competence. One of the primary concerns identified in this paper is the LLMs' self-referential learning loops, where AI-generated content feeds into the learning algorithms, threatening the diversity of the data pool, potentially entrenching biases, and reducing the efficacy of LLMs. While theoretical at this stage, this feedback loop poses a significant challenge as the integration of LLMs in health care deepens, emphasizing the need for proactive dialogue and strategic measures to ensure the safe and effective use of LLM technology. Another key takeaway from our investigation is the role of user expertise and the necessity for a discerning approach to trusting and validating LLM outputs. The paper highlights how expert users, particularly clinicians, can leverage LLMs to enhance productivity by off-loading routine tasks while maintaining a critical oversight to identify and correct potential inaccuracies in AI-generated content. This balance of trust and skepticism is vital for ensuring that LLMs augment rather than undermine the quality of patient care. We also discuss the risks associated with the deskilling of health care professionals. Frequent reliance on LLMs for critical tasks could result in a decline in health care providers' diagnostic and thinking skills, particularly affecting the training and development of future professionals. The legal and ethical considerations surrounding the deployment of LLMs in health care are also examined. We discuss the medicolegal challenges, including liability in cases of erroneous diagnoses or treatment advice generated by LLMs. The paper references recent legislative efforts, such as The Algorithmic Accountability Act of 2023, as crucial steps toward establishing a framework for the ethical and responsible use of AI-based technologies in health care. In conclusion, this paper advocates for a strategic approach to integrating LLMs into health care. By emphasizing the importance of maintaining clinician expertise, fostering critical engagement with LLM outputs, and navigating the legal and ethical landscape, we can ensure that LLMs serve as valuable tools in enhancing patient care and supporting health care professionals. This approach addresses the immediate challenges posed by integrating LLMs and sets a foundation for their maintainable and responsible use in the future.
Collapse
Affiliation(s)
- Avishek Choudhury
- Industrial and Management Systems Engineering, West Virginia University, Morgantown, WV, United States
| | - Zaira Chaudhry
- Industrial and Management Systems Engineering, West Virginia University, Morgantown, WV, United States
| |
Collapse
|
37
|
Carobene A, Padoan A, Cabitza F, Banfi G, Plebani M. Rising adoption of artificial intelligence in scientific publishing: evaluating the role, risks, and ethical implications in paper drafting and review process. Clin Chem Lab Med 2024; 62:835-843. [PMID: 38019961 DOI: 10.1515/cclm-2023-1136] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/12/2023] [Accepted: 11/13/2023] [Indexed: 12/01/2023]
Abstract
BACKGROUND In the rapidly evolving landscape of artificial intelligence (AI), scientific publishing is experiencing significant transformations. AI tools, while offering unparalleled efficiencies in paper drafting and peer review, also introduce notable ethical concerns. CONTENT This study delineates AI's dual role in scientific publishing: as a co-creator in the writing and review of scientific papers and as an ethical challenge. We first explore the potential of AI as an enhancer of efficiency, efficacy, and quality in creating scientific papers. A critical assessment follows, evaluating the risks vs. rewards for researchers, especially those early in their careers, emphasizing the need to maintain a balance between AI's capabilities and fostering independent reasoning and creativity. Subsequently, we delve into the ethical dilemmas of AI's involvement, particularly concerning originality, plagiarism, and preserving the genuine essence of scientific discourse. The evolving dynamics further highlight an overlooked aspect: the inadequate recognition of human reviewers in the academic community. With the increasing volume of scientific literature, tangible metrics and incentives for reviewers are proposed as essential to ensure a balanced academic environment. SUMMARY AI's incorporation in scientific publishing is promising yet comes with significant ethical and operational challenges. The role of human reviewers is accentuated, ensuring authenticity in an AI-influenced environment. OUTLOOK As the scientific community treads the path of AI integration, a balanced symbiosis between AI's efficiency and human discernment is pivotal. Emphasizing human expertise, while exploiting artificial intelligence responsibly, will determine the trajectory of an ethically sound and efficient AI-augmented future in scientific publishing.
Collapse
Affiliation(s)
- Anna Carobene
- Laboratory Medicine, IRCCS San Raffaele Scientific Institute, Milan, Italy
| | - Andrea Padoan
- Department of Medicine-DIMED, University of Padova, Padova, Italy
- Laboratory Medicine Unit, University Hospital of Padova, Padova, Italy
| | - Federico Cabitza
- DISCo, Università Degli Studi di Milano-Bicocca, Milan, Italy
- IRCCS Ospedale Galeazzi - Sant'Ambrogio, Milan, Italy
| | - Giuseppe Banfi
- IRCCS Ospedale Galeazzi - Sant'Ambrogio, Milan, Italy
- University Vita-Salute San Raffaele, Milan, Italy
| | - Mario Plebani
- Laboratory Medicine Unit, University Hospital of Padova, Padova, Italy
- University of Padova, Padova, Italy
| |
Collapse
|
38
|
Shin E, Yu Y, Bies RR, Ramanathan M. Evaluation of ChatGPT and Gemini large language models for pharmacometrics with NONMEM. J Pharmacokinet Pharmacodyn 2024:10.1007/s10928-024-09921-y. [PMID: 38656706 DOI: 10.1007/s10928-024-09921-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/29/2024] [Accepted: 04/16/2024] [Indexed: 04/26/2024]
Abstract
To assess ChatGPT 4.0 (ChatGPT) and Gemini Ultra 1.0 (Gemini) large language models on NONMEM coding tasks relevant to pharmacometrics and clinical pharmacology. ChatGPT and Gemini were assessed on tasks mimicking real-world applications of NONMEM. The tasks ranged from providing a curriculum for learning NONMEM and an overview of NONMEM code structure to generating code. We investigated lay-language prompts to elicit NONMEM code for a linear pharmacokinetic (PK) model with oral administration and for a more complex model with two parallel first-order absorption mechanisms. Reproducibility and the impact of "temperature" hyperparameter settings were assessed. The code was reviewed by two NONMEM experts. ChatGPT and Gemini provided NONMEM curriculum structures combining foundational knowledge with advanced concepts (e.g., covariate modeling and Bayesian approaches) and practical skills including NONMEM code structure and syntax. ChatGPT provided an informative summary of the NONMEM control stream structure and outlined the key NONMEM Translator (NM-TRAN) records needed. ChatGPT and Gemini were able to generate code blocks for the NONMEM control stream from the lay-language prompts for the two coding tasks. The control streams contained focal structural and syntax errors that required revision before they could be executed without errors and warnings. The code output from ChatGPT and Gemini was not reproducible, and varying the temperature hyperparameter did not reduce the errors and omissions substantively. Large language models may be useful in pharmacometrics for efficiently generating an initial coding template for modeling projects. However, the output can contain errors and omissions that require correction.
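As a loose illustration of what "key NM-TRAN records" means, the sketch below checks a control stream for the usual records. The embedded one-compartment oral-absorption control stream was written for this sketch; it is neither model output from the study nor the reviewers' checklist.

```python
# Illustrative completeness check only (not the study's review procedure).
REQUIRED_RECORDS = ["$PROBLEM", "$DATA", "$INPUT", "$SUBROUTINES",
                    "$PK", "$ERROR", "$THETA", "$OMEGA", "$SIGMA", "$ESTIMATION"]

control_stream = """
$PROBLEM One-compartment model, first-order oral absorption
$DATA data.csv IGNORE=@
$INPUT ID TIME AMT DV MDV
$SUBROUTINES ADVAN2 TRANS2
$PK
  CL = THETA(1)*EXP(ETA(1))
  V  = THETA(2)*EXP(ETA(2))
  KA = THETA(3)*EXP(ETA(3))
  S2 = V
$ERROR
  Y = F*(1 + EPS(1))
$THETA (0, 5) (0, 50) (0, 1)
$OMEGA 0.1 0.1 0.1
$SIGMA 0.04
$ESTIMATION METHOD=1 INTER MAXEVAL=9999
"""

missing = [record for record in REQUIRED_RECORDS if record not in control_stream]
print("missing records:", missing or "none")
```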
Collapse
Affiliation(s)
- Euibeom Shin
- Department of Pharmaceutical Sciences, University at Buffalo, The State University of New York, Buffalo, NY, 14214-8033, USA
| | - Yifan Yu
- Department of Pharmaceutical Sciences, University at Buffalo, The State University of New York, Buffalo, NY, 14214-8033, USA
| | - Robert R Bies
- Department of Pharmaceutical Sciences, University at Buffalo, The State University of New York, Buffalo, NY, 14214-8033, USA
| | - Murali Ramanathan
- Department of Pharmaceutical Sciences, University at Buffalo, The State University of New York, Buffalo, NY, 14214-8033, USA.
| |
Collapse
|
39
|
Ostrowska M, Kacała P, Onolememen D, Vaughan-Lane K, Sisily Joseph A, Ostrowski A, Pietruszewska W, Banaszewski J, Wróbel MJ. To trust or not to trust: evaluating the reliability and safety of AI responses to laryngeal cancer queries. Eur Arch Otorhinolaryngol 2024:10.1007/s00405-024-08643-8. [PMID: 38652298 DOI: 10.1007/s00405-024-08643-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/14/2024] [Accepted: 03/26/2024] [Indexed: 04/25/2024]
Abstract
PURPOSE As online health information-seeking surges, concerns mount over the quality and safety of accessible content, potentially leading to patient harm through misinformation. On one hand, the emergence of Artificial Intelligence (AI) in healthcare could prevent it; on the other hand, questions arise regarding the quality and safety of the medical information provided. As laryngeal cancer is a prevalent head and neck malignancy, this study aims to evaluate the utility and safety of three large language models (LLMs) as sources of patient information about laryngeal cancer. METHODS A cross-sectional study was conducted using three LLMs (ChatGPT 3.5, ChatGPT 4.0, and Bard). A questionnaire comprising 36 inquiries about laryngeal cancer was categorised into diagnosis (11 questions), treatment (9 questions), novelties and upcoming treatments (4 questions), controversies (8 questions), and sources of information (4 questions). The reviewers comprised 3 groups: ENT specialists, junior physicians, and non-medical reviewers, who graded the responses. Each physician evaluated each question twice for each model, while non-medical reviewers evaluated each only once. Everyone was blinded to the model type, and the question order was shuffled. Outcome evaluations were based on a safety score (1-3) and a Global Quality Score (GQS, 1-5). Results were compared between LLMs. The study included iterative assessments and statistical validations. RESULTS Analysis revealed that ChatGPT 3.5 scored highest in both safety (mean: 2.70) and GQS (mean: 3.95). ChatGPT 4.0 and Bard had lower safety scores of 2.56 and 2.42, respectively, with corresponding quality scores of 3.65 and 3.38. Inter-rater reliability was consistent, with less than 3% discrepancy. About 4.2% of responses fell into the lowest safety category (1), particularly in the novelty category. Non-medical reviewers' quality assessments correlated moderately (r = 0.67) with response length. CONCLUSIONS LLMs can be valuable resources for patients seeking information on laryngeal cancer. ChatGPT 3.5 provided the most reliable and safe responses among the models evaluated.
Collapse
Affiliation(s)
- Magdalena Ostrowska
- Department of Otolaryngology and Laryngological Oncology, Collegium Medicum, Nicolaus Copernicus University in Torun, ul.Marie Sklodowskiej-Curie 9, 85-094, Bydgoszcz, Poland
| | - Paulina Kacała
- ENT Scientific Club, Department of Otolaryngology and Laryngological Oncology, Collegium Medicum, Nicolaus Copernicus University in Torun, ul.Marie Sklodowskiej-Curie 9, 85-094, Bydgoszcz, Poland
| | - Deborah Onolememen
- ENT Scientific Club, Department of Otolaryngology and Laryngological Oncology, Collegium Medicum, Nicolaus Copernicus University in Torun, ul.Marie Sklodowskiej-Curie 9, 85-094, Bydgoszcz, Poland
| | - Katie Vaughan-Lane
- ENT Scientific Club, Department of Otolaryngology and Laryngological Oncology, Collegium Medicum, Nicolaus Copernicus University in Torun, ul.Marie Sklodowskiej-Curie 9, 85-094, Bydgoszcz, Poland.
| | - Anitta Sisily Joseph
- ENT Scientific Club, Department of Otolaryngology and Laryngological Oncology, Collegium Medicum, Nicolaus Copernicus University in Torun, ul.Marie Sklodowskiej-Curie 9, 85-094, Bydgoszcz, Poland
| | - Adam Ostrowski
- Department of Urology, Collegium Medicum, Nicolaus Copernicus University in Torun, ul.Marie Sklodowskiej-Curie 9, 85-094, Bydgoszcz, Poland
| | - Wioletta Pietruszewska
- Department of Otolaryngology, Laryngological Oncology, Audiology and Phoniatrics, Medical University of Lodz, ul Żeromskiego 113, 90-549, Lodz, Poland
| | - Jacek Banaszewski
- Department of Otolaryngology, Head and Neck Oncology, Poznan University of Medical Science, ul Przybyszewskiego 49, 60-355, Poznań, Poland
| | - Maciej J Wróbel
- Department of Otolaryngology and Laryngological Oncology, Collegium Medicum, Nicolaus Copernicus University in Torun, ul.Marie Sklodowskiej-Curie 9, 85-094, Bydgoszcz, Poland
| |
Collapse
|
40
|
Pinto VBP, Gomes CM. Insights and future directions for ChatGPT in medical practice: Addressing comments on our study. Neurourol Urodyn 2024. [PMID: 38651742 DOI: 10.1002/nau.25479] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/10/2024] [Accepted: 04/11/2024] [Indexed: 04/25/2024]
Affiliation(s)
- Vicktor B P Pinto
- Division of Urology, University of Sao Paulo School of Medicine, Sao Paulo, Brazil
| | - Cristiano M Gomes
- Division of Urology, University of Sao Paulo School of Medicine, Sao Paulo, Brazil
| |
Collapse
|
41
|
Moulin TC. Learning with AI Language Models: Guidelines for the Development and Scoring of Medical Questions for Higher Education. J Med Syst 2024; 48:45. [PMID: 38652327 DOI: 10.1007/s10916-024-02069-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/30/2023] [Accepted: 04/11/2024] [Indexed: 04/25/2024]
Abstract
In medical and biomedical education, traditional teaching methods often struggle to engage students and promote critical thinking. The use of AI language models has the potential to transform teaching and learning practices by offering an innovative, active learning approach that promotes intellectual curiosity and deeper understanding. To effectively integrate AI language models into biomedical education, it is essential for educators to understand the benefits and limitations of these tools and how they can be employed to achieve high-level learning outcomes. This article explores the use of AI language models in biomedical education, focusing on their application in both classroom teaching and learning assignments. Using the SOLO taxonomy as a framework, I discuss strategies for designing questions that challenge students to exercise critical thinking and problem-solving skills, even when assisted by AI models. Additionally, I propose a scoring rubric for evaluating student performance when collaborating with AI language models, ensuring a comprehensive assessment of their learning outcomes. AI language models offer a promising opportunity for enhancing student engagement and promoting active learning in the biomedical field. Understanding the potential use of these technologies allows educators to create learning experiences that are fit for their students' needs, encouraging intellectual curiosity and a deeper understanding of complex subjects. The application of these tools will be fundamental to provide more effective and engaging learning experiences for students in the future.
Collapse
Affiliation(s)
- Thiago C Moulin
- Department of Experimental Medical Science, Lund University, Lund, Sweden.
- Department of Surgical Sciences, Uppsala University, Uppsala, Sweden.
| |
Collapse
|
42
|
Jagiella-Lodise O, Suh N, Zelenski NA. Can Patients Rely on ChatGPT to Answer Hand Pathology-Related Medical Questions? Hand (N Y) 2024:15589447241247246. [PMID: 38654498 DOI: 10.1177/15589447241247246] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 04/26/2024]
Abstract
BACKGROUND In recent years, ChatGPT has become a popular source of information online. Physicians need to be aware of the resources their patients are using to self-inform of their conditions. This study investigates physician-graded accuracy and completeness of ChatGPT regarding various questions patients are likely to ask the artificial intelligence (AI) system concerning common upper limb orthopedic conditions. METHODS ChatGPT 3.5 was interrogated concerning 5 common orthopedic hand conditions: carpal tunnel syndrome, Dupuytren contracture, De Quervain tenosynovitis, trigger finger, and carpal metacarpal arthritis. Questions evaluated conditions' symptoms, pathology, management, surgical indications, recovery time, insurance coverage, and workers' compensation possibility. Each topic had 12 to 15 questions and was established as its own ChatGPT conversation. All questions regarding the same diagnosis were presented to the AI, and its answers were recorded. Each question was then graded for both accuracy (Likert scale of 1-6) and completeness (Likert scale of 1-3) by 10 fellowship-trained hand surgeons. Descriptive statistics were performed. RESULTS Overall, the mean accuracy score for ChatGPT's answers to common orthopedic hand diagnoses was 4.83 out of 6 ± 0.95. The mean completeness of answers was 2 out of 3 ± 0.59. CONCLUSIONS Easily accessible online AI such as ChatGPT is becoming more advanced and thus more reliable in its ability to answer common medical questions. Physicians can anticipate such online resources being mostly correct, though often incomplete. Patients should beware of relying on such resources in isolation.
Collapse
Affiliation(s)
| | - Nina Suh
- Division of Hand Surgery, Department of Orthopaedic Surgery, Emory University, Atlanta, GA, USA
| | - Nicole A Zelenski
- Division of Hand Surgery, Department of Orthopaedic Surgery, Emory University, Atlanta, GA, USA
| |
Collapse
|
43
|
Tsai CY, Hsieh SJ, Huang HH, Deng JH, Huang YY, Cheng PY. Performance of ChatGPT on the Taiwan urology board examination: insights into current strengths and shortcomings. World J Urol 2024; 42:250. [PMID: 38652322 DOI: 10.1007/s00345-024-04957-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/27/2023] [Accepted: 03/25/2024] [Indexed: 04/25/2024] Open
Abstract
PURPOSE To compare ChatGPT-4 and ChatGPT-3.5's performance on the Taiwan urology board examination (TUBE), focusing on answer accuracy, explanation consistency, and uncertainty management tactics to minimize score penalties from incorrect responses across 12 urology domains. METHODS 450 multiple-choice questions from TUBE (2020-2022) were presented to the two models. Three urologists assessed correctness and consistency of each response. Accuracy quantifies the proportion of correct answers; consistency assesses the logic and coherence of explanations across all responses. A penalty-reduction experiment with prompt variations was also conducted. Univariate logistic regression was applied for subgroup comparison. RESULTS ChatGPT-4 showed strengths in urology, achieving an overall accuracy of 57.8%, with annual accuracies of 64.7% (2020), 58.0% (2021), and 50.7% (2022), significantly surpassing ChatGPT-3.5 (33.8%, OR = 2.68, 95% CI [2.05-3.52]). It could have passed the TUBE written exams if scored solely on accuracy but failed in the final score due to penalties. ChatGPT-4 displayed a declining accuracy trend over time. Variability in accuracy across 12 urological domains was noted, with more frequently updated knowledge domains showing lower accuracy (53.2% vs. 62.2%, OR = 0.69, p = 0.05). A high consistency rate of 91.6% in explanations across all domains indicates reliable delivery of coherent and logical information. The simple prompt outperformed strategy-based prompts in accuracy (60% vs. 40%, p = 0.016), highlighting ChatGPT's inability to accurately self-assess uncertainty and its tendency towards overconfidence, which may hinder medical decision-making. CONCLUSIONS ChatGPT-4's high accuracy and consistent explanations in the urology board examination demonstrate its potential in medical information processing. However, its limitations in self-assessment and overconfidence necessitate caution in its application, especially for inexperienced users. These insights call for ongoing advancements of urology-specific AI tools.
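For a binary predictor such as model version, the odds ratio from univariate logistic regression equals the cross-product ratio of the 2x2 table, as sketched below. The counts are back-calculated from the reported accuracies (57.8% and 33.8% of 450 questions) and happen to reproduce the reported OR of 2.68 with 95% CI 2.05-3.52, but they remain an assumption rather than the authors' published table.

```python
import math

# Assumed counts: correct / incorrect answers out of 450 questions per model.
a, b = 260, 190   # ChatGPT-4:   ~57.8% correct
c, d = 152, 298   # ChatGPT-3.5: ~33.8% correct

or_ = (a * d) / (b * c)                              # cross-product odds ratio
se = math.sqrt(1/a + 1/b + 1/c + 1/d)                # SE of log(OR)
lo = math.exp(math.log(or_) - 1.96 * se)
hi = math.exp(math.log(or_) + 1.96 * se)
print(f"OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```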
Collapse
Affiliation(s)
- Chung-You Tsai
- Divisions of Urology, Department of Surgery, Far Eastern Memorial Hospital, No.21, Sec. 2, Nanya S. Rd., Banciao Dist., New Taipei City, 220, Taiwan
- Department of Electrical Engineering, Yuan Ze University, Taoyuan, Taiwan
| | - Shang-Ju Hsieh
- Divisions of Urology, Department of Surgery, Far Eastern Memorial Hospital, No.21, Sec. 2, Nanya S. Rd., Banciao Dist., New Taipei City, 220, Taiwan
| | - Hung-Hsiang Huang
- Divisions of Urology, Department of Surgery, Far Eastern Memorial Hospital, No.21, Sec. 2, Nanya S. Rd., Banciao Dist., New Taipei City, 220, Taiwan
| | - Juinn-Horng Deng
- Department of Electrical Engineering, Yuan Ze University, Taoyuan, Taiwan
| | - Yi-You Huang
- Department of Biomedical Engineering, College of Medicine and College of Engineering, National Taiwan University, Taipei, Taiwan
| | - Pai-Yu Cheng
- Divisions of Urology, Department of Surgery, Far Eastern Memorial Hospital, No.21, Sec. 2, Nanya S. Rd., Banciao Dist., New Taipei City, 220, Taiwan.
- Department of Biomedical Engineering, College of Medicine and College of Engineering, National Taiwan University, Taipei, Taiwan.
| |
Collapse
|
44
|
Mishra V, Sarraju A, Kalwani NM, Dexter JP. Evaluation of Prompts to Simplify Cardiovascular Disease Information Generated Using a Large Language Model: Cross-Sectional Study. J Med Internet Res 2024; 26:e55388. [PMID: 38648104 DOI: 10.2196/55388] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/11/2023] [Revised: 01/25/2024] [Accepted: 01/31/2024] [Indexed: 04/25/2024] Open
Abstract
In this cross-sectional study, we evaluated the completeness, readability, and syntactic complexity of cardiovascular disease prevention information produced by GPT-4 in response to 4 kinds of prompts.
Collapse
Affiliation(s)
- Vishala Mishra
- Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC, United States
| | - Ashish Sarraju
- Department of Cardiovascular Medicine, Cleveland Clinic, Cleveland, OH, United States
| | - Neil M Kalwani
- Veterans Affairs Palo Alto Health Care System, Palo Alto, CA, United States
- Division of Cardiovascular Medicine and the Cardiovascular Institute, Department of Medicine, Stanford University School of Medicine, Stanford, CA, United States
| | - Joseph P Dexter
- Data Science Initiative, Harvard University, Allston, MA, United States
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, United States
- Institute of Collaborative Innovation, University of Macau, Taipa, Macao
| |
Collapse
|
45
|
Kernberg A, Gold JA, Mohan V. Using ChatGPT-4 to Create Structured Medical Notes From Audio Recordings of Physician-Patient Encounters: Comparative Study. J Med Internet Res 2024; 26:e54419. [PMID: 38648636 DOI: 10.2196/54419] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/09/2023] [Revised: 02/20/2024] [Accepted: 03/10/2024] [Indexed: 04/25/2024] Open
Abstract
BACKGROUND Medical documentation plays a crucial role in clinical practice, facilitating accurate patient management and communication among health care professionals. However, inaccuracies in medical notes can lead to miscommunication and diagnostic errors. Additionally, the demands of documentation contribute to physician burnout. Although intermediaries like medical scribes and speech recognition software have been used to ease this burden, they have limitations in terms of accuracy and addressing provider-specific metrics. The integration of ambient artificial intelligence (AI)-powered solutions offers a promising way to improve documentation while fitting seamlessly into existing workflows. OBJECTIVE This study aims to assess the accuracy and quality of Subjective, Objective, Assessment, and Plan (SOAP) notes generated by ChatGPT-4, an AI model, using established transcripts of History and Physical Examination as the gold standard. We seek to identify potential errors and evaluate the model's performance across different categories. METHODS We conducted simulated patient-provider encounters representing various ambulatory specialties and transcribed the audio files. Key reportable elements were identified, and ChatGPT-4 was used to generate SOAP notes based on these transcripts. Three versions of each note were created and compared to the gold standard via chart review; errors generated from the comparison were categorized as omissions, incorrect information, or additions. We compared the accuracy of data elements across versions, transcript length, and data categories. Additionally, we assessed note quality using the Physician Documentation Quality Instrument (PDQI) scoring system. RESULTS Although ChatGPT-4 consistently generated SOAP-style notes, there were, on average, 23.6 errors per clinical case, with errors of omission (86%) being the most common, followed by addition errors (10.5%) and inclusion of incorrect facts (3.2%). There was significant variance between replicates of the same case, with only 52.9% of data elements reported correctly across all 3 replicates. The accuracy of data elements varied across cases, with the highest accuracy observed in the "Objective" section. Consequently, the measure of note quality, assessed by PDQI, demonstrated intra- and intercase variance. Finally, the accuracy of ChatGPT-4 was inversely correlated to both the transcript length (P=.05) and the number of scorable data elements (P=.05). CONCLUSIONS Our study reveals substantial variability in errors, accuracy, and note quality generated by ChatGPT-4. Errors were not limited to specific sections, and the inconsistency in error types across replicates complicated predictability. Transcript length and data complexity were inversely correlated with note accuracy, raising concerns about the model's effectiveness in handling complex medical cases. The quality and reliability of clinical notes produced by ChatGPT-4 do not meet the standards required for clinical use. Although AI holds promise in health care, caution should be exercised before widespread adoption. Further research is needed to address accuracy, variability, and potential errors. ChatGPT-4, while valuable in various applications, should not be considered a safe alternative to human-generated clinical documentation at this time.
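A minimal Python sketch of the replicate-versus-gold-standard comparison described here is shown below; the clinical elements and the three replicate outputs are invented for illustration, and the study's chart-review process was naturally more detailed.

```python
# Sketch: tallying omission and addition errors for three AI-generated note
# replicates against a gold-standard list of reportable elements.
from collections import Counter

gold_elements = {"chest pain onset", "aspirin allergy", "BP 142/90", "smoker"}
replicates = [
    {"chest pain onset", "BP 142/90", "smoker"},                         # omits aspirin allergy
    {"chest pain onset", "aspirin allergy", "BP 142/90", "statin use"},   # omits smoker, adds statin use
    {"chest pain onset", "aspirin allergy", "BP 142/90", "smoker"},       # matches the gold standard
]

errors = Counter()
for rep in replicates:
    errors["omission"] += len(gold_elements - rep)   # gold elements missing from the note
    errors["addition"] += len(rep - gold_elements)   # elements not in the gold standard

# Elements reported correctly in every replicate
always_correct = set.intersection(*replicates) & gold_elements
print(dict(errors))
print(f"{100 * len(always_correct) / len(gold_elements):.0f}% of elements correct across all replicates")
```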
Collapse
Affiliation(s)
- Annessa Kernberg
- Department of Medical Informatics and Clinical Epidemiology, Oregon Health and Sciences University, Portland, OR, United States
| | - Jeffrey A Gold
- Department of Medical Informatics and Clinical Epidemiology, Oregon Health and Sciences University, Portland, OR, United States
| | - Vishnu Mohan
- Department of Medical Informatics and Clinical Epidemiology, Oregon Health and Sciences University, Portland, OR, United States
| |
Collapse
|
46
|
Pham C, Govender R, Tehami S, Chavez S, Adepoju OE, Liaw W. ChatGPT's Performance in Cardiac Arrest and Bradycardia Simulations Using the American Heart Association's Advanced Cardiovascular Life Support Guidelines: Exploratory Study. J Med Internet Res 2024; 26:e55037. [PMID: 38648098 DOI: 10.2196/55037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/30/2023] [Revised: 02/22/2024] [Accepted: 03/10/2024] [Indexed: 04/25/2024] Open
Abstract
BACKGROUND ChatGPT is the most advanced large language model to date, with prior iterations having passed medical licensing examinations, provided clinical decision support, and improved diagnostics. Although limited, past studies of ChatGPT's performance found that artificial intelligence could pass the American Heart Association's advanced cardiovascular life support (ACLS) examinations with modifications. ChatGPT's accuracy has not been studied in more complex clinical scenarios. As heart disease and cardiac arrest remain leading causes of morbidity and mortality in the United States, finding technologies that help increase adherence to ACLS algorithms, which improves survival outcomes, is critical. OBJECTIVE This study aims to examine the accuracy of ChatGPT in following ACLS guidelines for bradycardia and cardiac arrest. METHODS We evaluated the accuracy of ChatGPT's responses to 2 simulations based on the 2020 American Heart Association ACLS guidelines, with 3 primary outcomes of interest: the mean individual step accuracy, the accuracy score per simulation attempt, and the accuracy score for each algorithm. For each simulation step, ChatGPT was scored for correctness (1 point) or incorrectness (0 points). Each simulation was conducted 20 times. RESULTS ChatGPT's median accuracy for each step was 85% (IQR 40%-100%) for cardiac arrest and 30% (IQR 13%-81%) for bradycardia. ChatGPT's median accuracy over 20 simulation attempts was 69% (IQR 67%-74%) for cardiac arrest and 42% (IQR 33%-50%) for bradycardia. We found that ChatGPT's outputs varied despite consistent input, the same actions were persistently missed, repetitive overemphasis hindered guidance, and erroneous medication information was presented. CONCLUSIONS This study highlights the need for consistent and reliable guidance to prevent potential medical errors and to optimize the application of ChatGPT, in order to enhance its reliability and effectiveness in clinical practice.
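The step-level scoring and median/IQR summary described in the methods can be reproduced along the following lines; the run-by-step matrix is invented and numpy is assumed, so this is an illustrative sketch rather than the study's analysis code.

```python
# Sketch: scoring each algorithm step 1/0 across repeated simulation runs
# and summarising per-attempt accuracy with median and IQR.
import numpy as np

# rows = simulation attempts, columns = ACLS algorithm steps (1 = correct, 0 = incorrect)
runs = np.array([
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 1],
    [1, 1, 1, 1, 0],
    [1, 1, 0, 0, 1],
])

attempt_scores = 100 * runs.mean(axis=1)   # accuracy per simulation attempt
step_scores = 100 * runs.mean(axis=0)      # accuracy per individual step
q1, med, q3 = np.percentile(attempt_scores, [25, 50, 75])
print(f"per-attempt accuracy: median {med:.0f}% (IQR {q1:.0f}%-{q3:.0f}%)")
print("per-step accuracy:", step_scores)
```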
Collapse
Affiliation(s)
- Cecilia Pham
- Tilman J Fertitta Family College of Medicine, University of Houston, Houston, TX, United States
| | - Romi Govender
- Tilman J Fertitta Family College of Medicine, University of Houston, Houston, TX, United States
| | - Salik Tehami
- Tilman J Fertitta Family College of Medicine, University of Houston, Houston, TX, United States
| | - Summer Chavez
- Tilman J Fertitta Family College of Medicine, University of Houston, Houston, TX, United States
- Humana Integrated Health Sciences Institute, University of Houston, Houston, TX, United States
- Department of Health Systems and Population Health Sciences, Tilman J Fertitta Family College of Medicine, Houston, TX, United States
| | - Omolola E Adepoju
- Tilman J Fertitta Family College of Medicine, University of Houston, Houston, TX, United States
- Humana Integrated Health Sciences Institute, University of Houston, Houston, TX, United States
- Department of Health Systems and Population Health Sciences, Tilman J Fertitta Family College of Medicine, Houston, TX, United States
| | - Winston Liaw
- Tilman J Fertitta Family College of Medicine, University of Houston, Houston, TX, United States
- Humana Integrated Health Sciences Institute, University of Houston, Houston, TX, United States
- Department of Health Systems and Population Health Sciences, Tilman J Fertitta Family College of Medicine, Houston, TX, United States
| |
Collapse
|
47
|
Tessler I, Wolfovitz A, Alon EE, Gecel NA, Livneh N, Zimlichman E, Klang E. ChatGPT's adherence to otolaryngology clinical practice guidelines. Eur Arch Otorhinolaryngol 2024:10.1007/s00405-024-08634-9. [PMID: 38647684 DOI: 10.1007/s00405-024-08634-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/01/2024] [Accepted: 03/22/2024] [Indexed: 04/25/2024]
Abstract
OBJECTIVES Large language models, including ChatGPT, have the potential to transform the way we approach medical knowledge, yet accuracy in clinical topics is critical. Here we assessed ChatGPT's performance in adhering to the American Academy of Otolaryngology-Head and Neck Surgery guidelines. METHODS We presented ChatGPT with 24 clinical otolaryngology questions based on the guidelines of the American Academy of Otolaryngology. This was done three times (N = 72) to test the model's consistency. Two otolaryngologists evaluated the responses for accuracy and relevance to the guidelines. Cohen's kappa was used to measure evaluator agreement, and Cronbach's alpha assessed the consistency of ChatGPT's responses. RESULTS The study revealed mixed results; 59.7% (43/72) of ChatGPT's responses were highly accurate, while only 2.8% (2/72) directly contradicted the guidelines. The model showed 100% accuracy in Head and Neck, but lower accuracy in Rhinology and Otology/Neurotology (66%), Laryngology (50%), and Pediatrics (8%). The model's responses were consistent for 17 of 24 questions (70.8%), with a Cronbach's alpha of 0.87, indicating reasonable consistency across tests. CONCLUSIONS Using a guideline-based set of structured questions, ChatGPT demonstrates consistency but variable accuracy in otolaryngology. Its lower performance in some areas, especially Pediatrics, suggests that further rigorous evaluation is needed before considering real-world clinical use.
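Both agreement statistics named in the methods are standard and straightforward to compute. The Python sketch below shows Cohen's kappa via scikit-learn and a hand-rolled Cronbach's alpha; the ratings and the 0/1/2 scale are invented purely for illustration and are not the study's data.

```python
# Sketch: evaluator agreement (Cohen's kappa) and response consistency
# (Cronbach's alpha) for repeated gradings of the same questions.
import numpy as np
from sklearn.metrics import cohen_kappa_score

# Hypothetical ratings: 0 = contradicts guideline, 1 = partially accurate, 2 = highly accurate
rater_a = [2, 1, 2, 0, 2, 1, 2, 2]
rater_b = [2, 1, 1, 0, 2, 1, 2, 2]
print("Cohen's kappa:", cohen_kappa_score(rater_a, rater_b))

def cronbach_alpha(items: np.ndarray) -> float:
    """items: rows = questions, columns = repeated test administrations."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars / total_var)

scores = np.array([[2, 2, 2], [1, 1, 0], [2, 2, 2], [0, 1, 0], [2, 2, 1]])
print("Cronbach's alpha:", cronbach_alpha(scores))
```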
Collapse
Affiliation(s)
- Idit Tessler
- Department of Otolaryngology and Head and Neck Surgery, Sheba Medical Center, Ramat Gan, Israel.
- School of Medicine, Tel Aviv University, Tel Aviv, Israel.
- ARC Innovation Center, Sheba Medical Center, Ramat Gan, Israel.
| | - Amit Wolfovitz
- Department of Otolaryngology and Head and Neck Surgery, Sheba Medical Center, Ramat Gan, Israel
- School of Medicine, Tel Aviv University, Tel Aviv, Israel
| | - Eran E Alon
- Department of Otolaryngology and Head and Neck Surgery, Sheba Medical Center, Ramat Gan, Israel
- School of Medicine, Tel Aviv University, Tel Aviv, Israel
| | - Nir A Gecel
- School of Medicine, Tel Aviv University, Tel Aviv, Israel
| | - Nir Livneh
- Department of Otolaryngology and Head and Neck Surgery, Sheba Medical Center, Ramat Gan, Israel
- School of Medicine, Tel Aviv University, Tel Aviv, Israel
| | - Eyal Zimlichman
- School of Medicine, Tel Aviv University, Tel Aviv, Israel
- ARC Innovation Center, Sheba Medical Center, Ramat Gan, Israel
- The Sheba Talpiot Medical Leadership Program, Ramat Gan, Israel
- Hospital Management, Sheba Medical Center, Ramat Gan, Israel
| | - Eyal Klang
- The Division of Data-Driven and Digital Medicine (D3M), Icahn School of Medicine at Mount Sinai, New York, USA
| |
Collapse
|
48
|
Angyal V, Bertalan Á, Domján P, Dinya E. [ScreenGPT - The opportunities and limitations of artificial intelligence in primary, secondary and tertiary prevention]. Orv Hetil 2024; 165:629-635. [PMID: 38643476 DOI: 10.1556/650.2024.33029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/07/2024] [Accepted: 02/29/2024] [Indexed: 04/23/2024]
Abstract
INTRODUCTION Prevention and screening examinations are increasingly popular. As patients become more health-conscious, they search the internet more often for information about their health, regardless of how reliable that information is. The arrival of ChatGPT has revolutionized information seeking, and people have begun using it for self-diagnosis and for managing their health. Although artificial intelligence-based services cannot replace consultation with healthcare professionals, they can play a complementary role alongside traditional screening procedures, so their opportunities and limitations are worth examining. OBJECTIVE The main objective of our research was to identify the areas in which ChatGPT can contribute to primary, secondary and tertiary prevention processes. A further aim was to create the concept of an artificial intelligence-based service that can support patients at the different levels of prevention. METHODS We mapped the opportunities offered by ChatGPT in prevention by posing specific questions to the system. Based on this experience, we built a web application with the GPT-4 model as its foundation. We sought to improve the correctness of the answers with structured, precise questions. The web application was written in Python and made available for testing through the cloud service of the Streamlit framework. RESULTS Based on the test results, we identified several areas of prevention where ChatGPT could be applied effectively. Building on these results, we successfully created the foundations of a web application, named ScreenGPT. CONCLUSIONS We found that ChatGPT can give useful answers to precise questions at all three levels of prevention. Its answers closely mirror human dialogue, but ChatGPT has no self-awareness, so it is important that users evaluate its responses critically. The ScreenGPT service was built on the basis of this experience, but many further studies are needed to establish its reliability. Orv Hetil. 2024; 165(16): 629–635.
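To make the methods concrete, here is a minimal sketch of a GPT-4-backed Streamlit app of the kind described above (Python, OpenAI SDK >= 1.0 assumed, with an OPENAI_API_KEY in the environment). It is not the authors' ScreenGPT code; the prompt, model name, and interface are illustrative.

```python
# Sketch of a minimal prevention-advice web app; not the ScreenGPT implementation.
import streamlit as st
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

st.title("Prevention assistant (demo)")
question = st.text_area("Describe your screening or prevention question")

if st.button("Ask") and question:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "You give general preventive-health information and always "
                        "advise consulting a healthcare professional."},
            {"role": "user", "content": question},
        ],
    )
    st.write(response.choices[0].message.content)
```

Such a script would typically be launched locally with `streamlit run app.py` or deployed through Streamlit's cloud service, as the abstract describes.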
Collapse
Affiliation(s)
- Viola Angyal
- 1 Semmelweis Egyetem, Doktori Iskola, Egészségtudományi Doktori Tagozat, Egészségügyi Közszolgálati Kar, Digitális Egészségtudományi Intézet Budapest Magyarország
| | - Ádám Bertalan
- 1 Semmelweis Egyetem, Doktori Iskola, Egészségtudományi Doktori Tagozat, Egészségügyi Közszolgálati Kar, Digitális Egészségtudományi Intézet Budapest Magyarország
| | - Péter Domján
- 2 Semmelweis Egyetem, Doktori Iskola, Egészségtudományi Doktori Tagozat Budapest Magyarország
| | - Elek Dinya
- 3 Semmelweis Egyetem, Egészségügyi Közszolgálati Kar, Digitális Egészségtudományi Intézet Budapest Magyarország
| |
Collapse
|
49
|
Stoneham AC, Walker LC, Newman MJ, Nicholls A, Avis D. Can artificial intelligence make elective hand clinic letters easier for patients to understand? J Hand Surg Eur Vol 2024:17531934241246479. [PMID: 38641940 DOI: 10.1177/17531934241246479] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 04/21/2024]
Abstract
We investigated whether ChatGPT could improve the Flesch reading ease score and the Flesch-Kincaid reading level of elective clinic letters written by hand surgeons. ChatGPT could not reliably simplify the hand clinic letters any further.
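An experiment of this sort can be sketched by asking a chat model to rewrite a letter and then re-scoring its readability. In the Python sketch below the letter text, prompt wording, and model name are illustrative assumptions, and the OpenAI SDK (>= 1.0) and textstat packages are assumed to be available; the study's own prompting and scoring may have differed.

```python
# Sketch: ask a chat model to simplify a clinic letter, then compare readability.
import textstat
from openai import OpenAI

letter = ("Following arthroscopic debridement of the triangular fibrocartilage "
          "complex, the patient should mobilise as tolerated and attend hand "
          "therapy for scar desensitisation.")

client = OpenAI()
reply = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user",
               "content": "Rewrite this clinic letter for a reading age of 11:\n" + letter}],
)
simplified = reply.choices[0].message.content

for label, text in [("original", letter), ("simplified", simplified)]:
    print(label,
          "reading ease:", textstat.flesch_reading_ease(text),
          "grade level:", textstat.flesch_kincaid_grade(text))
```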
Collapse
Affiliation(s)
- Adam C Stoneham
- Department of Trauma and Orthopaedics, University Hospitals Southampton, Southampton, UK
| | - Lucy C Walker
- Department of Trauma and Orthopaedics, Hampshire Hospitals Foundation Trust, Basingstoke and North Hampshire Hospital, Basingstoke, UK
| | - Michael J Newman
- Department of Trauma and Orthopaedics, University Hospitals Southampton, Southampton, UK
| | - Alex Nicholls
- Department of Trauma and Orthopaedics, Hampshire Hospitals Foundation Trust, Basingstoke and North Hampshire Hospital, Basingstoke, UK
| | - Duncan Avis
- Department of Trauma and Orthopaedics, Hampshire Hospitals Foundation Trust, Basingstoke and North Hampshire Hospital, Basingstoke, UK
| |
Collapse
|
50
|
A Fuller K, Morbitzer KA, Zeeman JM, M Persky A, C Savage A, McLaughlin JE. Exploring the use of ChatGPT to analyze student course evaluation comments. BMC Med Educ 2024; 24:423. [PMID: 38641798 PMCID: PMC11031883 DOI: 10.1186/s12909-024-05316-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/01/2023] [Accepted: 03/14/2024] [Indexed: 04/21/2024]
Abstract
BACKGROUND Since the release of ChatGPT, numerous positive applications for this artificial intelligence (AI) tool in higher education have emerged. Faculty can reduce their workload by using AI. While course evaluations are a common tool used across higher education, the process of identifying useful information from multiple open-ended comments is often time consuming. The purpose of this study was to explore the use of ChatGPT in analyzing course evaluation comments, including the time required to generate themes and the level of agreement between instructor-identified and AI-identified themes. METHODS Course instructors independently analyzed open-ended student course evaluation comments. Five prompts were provided to guide the coding process. Instructors were asked to note the time required to complete the analysis, the general process they used, and how they felt during their analysis. Student comments were also analyzed through two independent OpenAI ChatGPT user accounts. Thematic analysis was used to analyze the themes generated by instructors and ChatGPT. Percent agreement between the instructor and ChatGPT themes was calculated for each prompt, along with an overall agreement statistic between the instructor and the two ChatGPT themes. RESULTS There was high agreement between the instructor and ChatGPT results. The highest agreement was for course-related topics (range 0.71-0.82) and the lowest agreement was for weaknesses of the course (range 0.53-0.81). For all prompts except themes related to student experience, the two ChatGPT accounts demonstrated higher agreement with one another than with the instructors. On average, instructors took 27.50 ± 15.00 min to analyze their data (range 20-50). The ChatGPT users took 10.50 ± 1.00 min (range 10-12) and 12.50 ± 2.89 min (range 10-15) to analyze the data. In relation to reviewing and analyzing their own open-ended course evaluations, instructors reported feeling anxiety prior to the process, satisfaction during the process, and frustration related to the findings. CONCLUSIONS This study offers valuable insights into the potential of ChatGPT as a tool for analyzing open-ended student course evaluation comments in health professions education. However, it is crucial to ensure ChatGPT is used as a tool to assist with the analysis and to avoid relying solely on its outputs for conclusions.
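One simple way to express agreement between two theme lists is set overlap (Jaccard agreement), sketched below in Python; the theme labels are invented, and the study's own agreement statistic may have been computed differently.

```python
# Sketch: agreement between instructor-identified and ChatGPT-identified themes,
# treated as Jaccard overlap of the two theme sets.
def percent_agreement(themes_a: set, themes_b: set) -> float:
    """Share of all distinct themes that both coders identified."""
    union = themes_a | themes_b
    return len(themes_a & themes_b) / len(union) if union else 1.0

instructor = {"workload too high", "helpful cases", "unclear rubric"}
chatgpt = {"workload too high", "helpful cases", "more feedback wanted"}
print(f"{percent_agreement(instructor, chatgpt):.2f}")  # 0.50 for these illustrative sets
```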
Collapse
Affiliation(s)
- Kathryn A Fuller
- UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Kathryn A Morbitzer
- UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Center for Innovative Pharmacy Education and Research, UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Jacqueline M Zeeman
- UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Adam M Persky
- UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Center for Innovative Pharmacy Education and Research, UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Amanda C Savage
- UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Jacqueline E McLaughlin
- UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
- Center for Innovative Pharmacy Education and Research, UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
| |
Collapse
|