1
Meyer A, Soleman A, Riese J, Streichert T. Comparison of ChatGPT, Gemini, and Le Chat with physician interpretations of medical laboratory questions from an online health forum. Clin Chem Lab Med 2024; 62:2425-2434. [PMID: 38804035 DOI: 10.1515/cclm-2024-0246]
Abstract
OBJECTIVES Laboratory medical reports are often not intuitively comprehensible to non-medical professionals. Given their recent advancements, easier accessibility, and remarkable performance on medical licensing exams, patients are likely to turn to artificial intelligence-based chatbots to understand their laboratory results. However, empirical studies assessing the efficacy of these chatbots in responding to real-life patient queries regarding laboratory medicine are scarce. METHODS This investigation included 100 patient inquiries from an online health forum, specifically addressing complete blood count interpretation. The aim was to evaluate the proficiency of three artificial intelligence-based chatbots (ChatGPT, Gemini and Le Chat) against the online responses from certified physicians. RESULTS The chatbots' interpretations of laboratory results were inferior to those from online medical professionals. While the chatbots exhibited a higher degree of empathetic communication, they frequently produced erroneous or overly generalized responses to complex patient questions. The appropriateness of chatbot responses ranged from 51 to 64%, with 22 to 33% of responses overestimating patient conditions. A notable positive aspect was the chatbots' consistent inclusion of disclaimers regarding their non-medical nature and recommendations to seek professional medical advice. CONCLUSIONS The chatbots' interpretations of laboratory results from real patient queries highlight a dangerous dichotomy: a perceived trustworthiness that can obscure factual inaccuracies. Given the growing inclination towards self-diagnosis using AI platforms, further research on and improvement of these chatbots is imperative to increase patients' awareness and avoid future burdens on the healthcare system.
Affiliation(s)
- Annika Meyer
- Institute of Clinical Chemistry, Faculty of Medicine and University Hospital Cologne, Cologne, Germany
- Ari Soleman
- Faculty of Medicine and University Hospital Cologne, Cologne, Germany
- Janik Riese
- Institute of Pathology, Faculty of Medicine, RWTH Aachen University, Aachen, Germany
- Thomas Streichert
- Institute of Clinical Chemistry, Faculty of Medicine and University Hospital Cologne, Cologne, Germany
2
Berry CE, Fazilat AZ, Lavin C, Lintel H, Cole N, Stingl CS, Valencia C, Morgan AG, Momeni A, Wan DC. Both Patients and Plastic Surgeons Prefer Artificial Intelligence-Generated Microsurgical Information. J Reconstr Microsurg 2024; 40:657-664. [PMID: 38382637 DOI: 10.1055/a-2273-4163]
Abstract
BACKGROUND With the growing relevance of artificial intelligence (AI)-based patient-facing information, microsurgery-specific online information provided by professional organizations was compared with that of ChatGPT (Chat Generative Pre-Trained Transformer) and assessed for accuracy, comprehensiveness, clarity, and readability. METHODS Six plastic and reconstructive surgeons blindly assessed responses to 10 microsurgery-related medical questions written either by the American Society for Reconstructive Microsurgery (ASRM) or ChatGPT based on accuracy, comprehensiveness, and clarity. Surgeons were asked to choose which source provided the overall highest-quality microsurgical patient-facing information. Additionally, 30 individuals with no medical background (ages: 18-81, μ = 49.8) were asked to indicate a preference when blindly comparing materials. Readability scores were calculated and analyzed using the following seven readability formulas: Flesch-Kincaid Grade Level, Flesch-Kincaid Readability Ease, Gunning Fog Index, Simple Measure of Gobbledygook Index, Coleman-Liau Index, Linsear Write Formula, and Automated Readability Index. Statistical analysis of microsurgery-specific online sources was conducted using paired t-tests. RESULTS Statistically significant differences in comprehensiveness and clarity were seen in favor of ChatGPT. Surgeons blindly chose ChatGPT as the source that overall provided the highest-quality microsurgical patient-facing information 70.7% of the time. Nonmedical individuals likewise selected the AI-generated microsurgical materials 55.9% of the time. Neither ChatGPT- nor ASRM-generated materials were found to contain inaccuracies. Readability scores for both ChatGPT and ASRM materials exceeded recommended levels for patient proficiency across the seven readability formulas, with the AI-based material scoring as more complex.
CONCLUSION AI-generated patient-facing materials were preferred by surgeons in terms of comprehensiveness and clarity when blindly compared with online material provided by the ASRM. The studied AI-generated material was not found to contain inaccuracies. Additionally, surgeons and nonmedical individuals consistently indicated an overall preference for the AI-generated material. A readability analysis suggested that materials sourced from both ChatGPT and the ASRM surpassed recommended reading levels across the seven readability scores.
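The readability formulas cited in this entry are computed from simple text statistics. As an illustration, here is a minimal sketch of two of them, Flesch-Kincaid Grade Level and Flesch Reading Ease, using their standard published coefficients; the syllable counter is a rough vowel-group heuristic, not the tooling used in the study:

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic: count groups of consecutive vowels (at least 1)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def fk_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level, standard coefficients."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

def flesch_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease; higher is easier (60-70 is roughly plain English)."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def score_text(text: str) -> tuple[float, float]:
    """Return (grade level, reading ease) for a block of prose."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    tokens = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in tokens)
    return (fk_grade(len(tokens), sentences, syllables),
            flesch_ease(len(tokens), sentences, syllables))
```

Patient-facing materials are commonly recommended to sit at or below roughly a sixth-to-eighth grade level, which is the benchmark these studies report being exceeded.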
Affiliation(s)
- Charlotte E Berry
- Department of Surgery, Division of Plastic and Reconstructive Surgery, Hagey Laboratory for Pediatric Regenerative Medicine, Stanford University School of Medicine, Stanford, California
- Alexander Z Fazilat
- Department of Surgery, Division of Plastic and Reconstructive Surgery, Hagey Laboratory for Pediatric Regenerative Medicine, Stanford University School of Medicine, Stanford, California
- Christopher Lavin
- Department of Surgery, Division of Plastic and Reconstructive Surgery, Hagey Laboratory for Pediatric Regenerative Medicine, Stanford University School of Medicine, Stanford, California
- Hendrik Lintel
- Department of Surgery, Division of Plastic and Reconstructive Surgery, Hagey Laboratory for Pediatric Regenerative Medicine, Stanford University School of Medicine, Stanford, California
- Naomi Cole
- Department of Surgery, Division of Plastic and Reconstructive Surgery, Hagey Laboratory for Pediatric Regenerative Medicine, Stanford University School of Medicine, Stanford, California
- Cybil S Stingl
- Department of Surgery, Division of Plastic and Reconstructive Surgery, Hagey Laboratory for Pediatric Regenerative Medicine, Stanford University School of Medicine, Stanford, California
- Caleb Valencia
- Department of Surgery, Division of Plastic and Reconstructive Surgery, Hagey Laboratory for Pediatric Regenerative Medicine, Stanford University School of Medicine, Stanford, California
- Annah G Morgan
- Department of Surgery, Division of Plastic and Reconstructive Surgery, Hagey Laboratory for Pediatric Regenerative Medicine, Stanford University School of Medicine, Stanford, California
- Arash Momeni
- Department of Surgery, Division of Plastic and Reconstructive Surgery, Hagey Laboratory for Pediatric Regenerative Medicine, Stanford University School of Medicine, Stanford, California
- Derrick C Wan
- Department of Surgery, Division of Plastic and Reconstructive Surgery, Hagey Laboratory for Pediatric Regenerative Medicine, Stanford University School of Medicine, Stanford, California
3
Chen SY, Kuo HY, Chang SH. Perceptions of ChatGPT in healthcare: usefulness, trust, and risk. Front Public Health 2024; 12:1457131. [PMID: 39346584 PMCID: PMC11436320 DOI: 10.3389/fpubh.2024.1457131]
Abstract
Introduction This study explores the perceptions of ChatGPT in healthcare settings in Taiwan, focusing on its usefulness, trust, and associated risks. As AI technologies like ChatGPT increasingly influence various sectors, their potential in public health education, promotion, medical education, and clinical practice is significant but not without challenges. The study aims to assess how individuals with and without healthcare-related education perceive and adopt ChatGPT, contributing to a deeper understanding of AI's role in enhancing public health outcomes. Methods An online survey was conducted among 659 university and graduate students, all of whom had prior experience using ChatGPT. The survey measured perceptions of ChatGPT's ease of use, novelty, usefulness, trust, and risk, particularly within clinical practice, medical education, and research settings. Multiple linear regression models were used to analyze how these factors influence perception in healthcare applications, comparing responses between healthcare majors and non-healthcare majors. Results The study revealed that both healthcare and non-healthcare majors find ChatGPT more useful in medical education and research than in clinical practice. Regression analysis showed that for healthcare majors, general trust is crucial for ChatGPT's adoption in clinical practice and influences its use in medical education and research. For non-healthcare majors, novelty, perceived general usefulness, and trust are key predictors. Interestingly, while healthcare majors were cautious about ease of use, fearing it might increase risk, non-healthcare majors associated increased complexity with greater trust. Conclusion This study highlights the varying expectations between healthcare and non-healthcare majors regarding ChatGPT's role in healthcare.
The findings suggest the need for AI applications to be tailored to address specific user needs, particularly in clinical practice, where trust and reliability are paramount. Additionally, the potential of AI tools like ChatGPT to contribute to public health education and promotion is significant, as these technologies can enhance health literacy and encourage behavior change. These insights can inform future healthcare practices and policies by guiding the thoughtful and effective integration of AI tools like ChatGPT, ensuring they complement clinical judgment, enhance educational outcomes, support research integrity, and ultimately contribute to improved public health outcomes.
Affiliation(s)
- Su-Yen Chen
- Institute of Learning Sciences and Technologies, National Tsing Hua University, Hsinchu, Taiwan
- H Y Kuo
- Institute of Learning Sciences and Technologies, National Tsing Hua University, Hsinchu, Taiwan
- Shu-Hao Chang
- Department of Sport Management, College of Health and Human Performance, University of Florida, Gainesville, FL, United States
4
Hatherley J, Kinderlerer A, Bjerring JC, Munch LA, Threlfall L. The FHJ debate: Will artificial intelligence replace clinical decision making within our lifetimes? Future Healthc J 2024; 11:100178. [PMID: 39371529 PMCID: PMC11452837 DOI: 10.1016/j.fhj.2024.100178]
Affiliation(s)
- Joshua Hatherley
- Aarhus University, Department of Philosophy and History of Ideas, Denmark
- Lynsey Threlfall
- Royal Victoria Infirmary Newcastle, Newcastle University - BRC, England
5
Ayo-Ajibola O, Davis RJ, Lin ME, Riddell J, Kravitz RL. Characterizing the Adoption and Experiences of Users of Artificial Intelligence-Generated Health Information in the United States: Cross-Sectional Questionnaire Study. J Med Internet Res 2024; 26:e55138. [PMID: 39141910 PMCID: PMC11358651 DOI: 10.2196/55138]
Abstract
BACKGROUND OpenAI's ChatGPT is a source of advanced online health information (OHI) that may be integrated into individuals' health information-seeking routines. However, concerns have been raised about its factual accuracy and impact on health outcomes. To forecast implications for medical practice and public health, more information is needed on who uses the tool, how often, and for what. OBJECTIVE This study aims to characterize the reasons for and types of ChatGPT OHI use and describe the users most likely to engage with the platform. METHODS In this cross-sectional survey, patients received invitations to participate via the ResearchMatch platform, a nonprofit affiliate of the National Institutes of Health. A web-based survey measured demographic characteristics, use of ChatGPT and other sources of OHI, experience characterization, and resultant health behaviors. Descriptive statistics were used to summarize the data. Both 2-tailed t tests and Pearson chi-square tests were used to compare users of ChatGPT OHI with nonusers. RESULTS Of 2406 respondents, 21.5% (n=517) reported using ChatGPT for OHI. ChatGPT users were younger than nonusers (32.8 vs 39.1 years, P<.001), with lower advanced degree attainment (BA or higher; 49.9% vs 67%, P<.001) and greater use of transient health care (ED and urgent care; P<.001). ChatGPT users were also more avid consumers of general non-ChatGPT OHI (weekly or greater OHI-seeking frequency in the past 6 months, 28.2% vs 22.8%, P<.001). Around 39.3% (n=206) of respondents endorsed using the platform for OHI 2-3 times weekly or more, and most sought the tool to determine whether a consultation was required (47.4%, n=245) or to explore alternative treatment (46.2%, n=239). Use characterization was favorable, as many believed ChatGPT to be just as or more useful than other OHI sources (87.7%, n=429) and their doctor (81%, n=407).
About one-third of respondents requested a referral (35.6%, n=184) or changed medications (31%, n=160) based on the information received from ChatGPT. Although many users reported skepticism regarding the ChatGPT output (67.9%, n=336), most turned to their physicians (67.5%, n=349). CONCLUSIONS This study underscores the significant role of AI-generated OHI in shaping health-seeking behaviors and the potential evolution of patient-provider interactions. Given the proclivity of these users to enact health behavior changes based on AI-generated content, there is an opportunity for physicians to guide ChatGPT OHI users toward an informed and examined use of the technology.
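The Pearson chi-square comparison of users versus nonusers follows the usual contingency-table computation. A minimal sketch of the statistic (the 2x2 counts below are made-up for illustration, not data from the study):

```python
def chi_square(table: list[list[float]]) -> float:
    """Pearson chi-square statistic for an r x c contingency table.

    Expected counts are (row total * column total) / grand total;
    the statistic sums (observed - expected)^2 / expected over cells.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical 2x2 table: rows = ChatGPT user / nonuser,
# columns = used urgent care / did not.
example = chi_square([[10, 20], [20, 10]])
```

The statistic is then compared against a chi-square distribution with (r-1)(c-1) degrees of freedom to obtain the P value.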
Affiliation(s)
- Ryan J Davis
- Keck School of Medicine of the University of Southern California, Los Angeles, CA, United States
- Matthew E Lin
- Department of Head and Neck Surgery, David Geffen School of Medicine at University of California Los Angeles, Los Angeles, CA, United States
- Jeffrey Riddell
- Department of Emergency Medicine, Keck School of Medicine of the University of Southern California, Los Angeles, CA, United States
- Richard L Kravitz
- Division of General Medicine, University of California Davis, Sacramento, CA, United States
6
Cherrez-Ojeda I, Gallardo-Bastidas JC, Robles-Velasco K, Osorio MF, Velez Leon EM, Leon Velastegui M, Pauletto P, Aguilar-Díaz FC, Squassi A, González Eras SP, Cordero Carrasco E, Chavez Gonzalez KL, Calderon JC, Bousquet J, Bedbrook A, Faytong-Haro M. Understanding Health Care Students' Perceptions, Beliefs, and Attitudes Toward AI-Powered Language Models: Cross-Sectional Study. JMIR Med Educ 2024; 10:e51757. [PMID: 39137029 DOI: 10.2196/51757]
Abstract
BACKGROUND ChatGPT was not intended for use in health care, but it has potential benefits that depend on end-user understanding and acceptability, which is where health care students become crucial. There is still a limited amount of research in this area. OBJECTIVE The primary aim of our study was to assess the frequency of ChatGPT use, the perceived level of knowledge, the perceived risks associated with its use, and the ethical issues, as well as attitudes toward the use of ChatGPT in the context of education in the field of health. In addition, we aimed to examine whether there were differences across groups based on demographic variables. The second part of the study aimed to assess the association between the frequency of use, the level of perceived knowledge, the level of risk perception, and the level of ethics perception as predictive factors for participants' attitudes toward the use of ChatGPT. METHODS A cross-sectional survey was conducted from May to June 2023 encompassing students of medicine, nursing, dentistry, nutrition, and laboratory science across the Americas. The study used descriptive analysis, chi-square tests, and ANOVA to assess statistical significance across different categories. The study used several ordinal logistic regression models to analyze the impact of predictive factors (frequency of use, perception of knowledge, perception of risk, and ethics perception scores) on attitude as the dependent variable. The models were adjusted for gender, institution type, major, and country. Stata was used to conduct all the analyses. RESULTS Of 2661 health care students, 42.99% (n=1144) were unaware of ChatGPT. The median knowledge score was "minimal" (median 2.00, IQR 1.00-3.00). Most respondents (median 2.61, IQR 2.11-3.11) regarded ChatGPT as neither ethical nor unethical.
Most participants (median 3.89, IQR 3.44-4.34) "somewhat agreed" that ChatGPT (1) benefits health care settings, (2) provides trustworthy data, (3) is a helpful tool for accessing clinical and educational medical information, and (4) makes work easier. In total, 70% (7/10) of those who used it did so for homework. As perceived knowledge of ChatGPT increased, so did the tendency toward a favorable attitude toward ChatGPT. Higher ethical-consideration perception ratings increased the likelihood of considering ChatGPT a source of trustworthy health care information (odds ratio [OR] 1.620, 95% CI 1.498-1.752), beneficial in medical issues (OR 1.495, 95% CI 1.452-1.539), and useful for medical literature (OR 1.494, 95% CI 1.426-1.564; P<.001 for all results). CONCLUSIONS Over 40% of health care students in the Americas (1144/2661, 42.99%) were unaware of ChatGPT despite its extensive use in the health field. Our data revealed positive attitudes toward ChatGPT and a desire to learn more about it. Medical educators must explore how chatbots may be included in undergraduate health care education programs.
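The odds ratios reported above come from exponentiating ordinal logistic regression coefficients; the 95% confidence interval is obtained the same way from the coefficient's standard error. A minimal sketch of that conversion, using illustrative numbers rather than the study's fitted coefficients:

```python
import math

def odds_ratio_ci(beta: float, se: float, z: float = 1.96) -> tuple[float, float, float]:
    """Convert a logit-scale coefficient and its standard error
    into an odds ratio with a Wald-style confidence interval.

    Returns (OR, lower bound, upper bound); z = 1.96 gives ~95% coverage.
    """
    return (math.exp(beta),
            math.exp(beta - z * se),
            math.exp(beta + z * se))

# Hypothetical coefficient beta = 0.48 with SE = 0.04 (not from the study).
or_value, ci_low, ci_high = odds_ratio_ci(0.48, 0.04)
```

An OR above 1 (with a CI excluding 1) indicates that each one-unit increase in the predictor, here an ethics-perception score, raises the odds of a more favorable attitude category.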
Affiliation(s)
- Ivan Cherrez-Ojeda
- Universidad Espiritu Santo, Samborondon, Ecuador
- Respiralab Research Group, Guayaquil, Ecuador
- Karla Robles-Velasco
- Universidad Espiritu Santo, Samborondon, Ecuador
- Respiralab Research Group, Guayaquil, Ecuador
- María F Osorio
- Universidad Espiritu Santo, Samborondon, Ecuador
- Respiralab Research Group, Guayaquil, Ecuador
- F C Aguilar-Díaz
- Departamento Salud Pública, Escuela Nacional de Estudios Superiores, Universidad Nacional Autónoma de México, Guanajuato, Mexico
- Aldo Squassi
- Universidad de Buenos Aires, Facultad de Odontología, Cátedra de Odontología Preventiva y Comunitaria, Buenos Aires, Argentina
- Erita Cordero Carrasco
- Departamento de cirugía y traumatología bucal y maxilofacial, Universidad de Chile, Santiago, Chile
- Juan C Calderon
- Universidad Espiritu Santo, Samborondon, Ecuador
- Respiralab Research Group, Guayaquil, Ecuador
- Jean Bousquet
- Institute of Allergology, Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Fraunhofer Institute for Translational Medicine and Pharmacology ITMP, Allergology and Immunology, Berlin, Germany
- MASK-air, Montpellier, France
- Marco Faytong-Haro
- Respiralab Research Group, Guayaquil, Ecuador
- Universidad Estatal de Milagro, Cdla Universitaria "Dr. Rómulo Minchala Murillo", Milagro, Ecuador
- Ecuadorian Development Research Lab, Daule, Ecuador
7
Schütz P, Lob S, Chahed H, Dathe L, Löwer M, Reiß H, Weigel A, Albrecht J, Tokgöz P, Dockweiler C. ChatGPT as an Information Source for Patients with Migraines: A Qualitative Case Study. Healthcare (Basel) 2024; 12:1594. [PMID: 39201153 PMCID: PMC11354001 DOI: 10.3390/healthcare12161594]
Abstract
Migraines are among the most common and expensive neurological diseases worldwide. Non-pharmacological and digitally delivered treatment options have long been used in the treatment of migraines, for instance migraine management tools, online migraine diagnosis, and digitally networked patients. Recently, ChatGPT has been applied in fields of healthcare ranging from identifying potential research topics to assisting professionals in clinical diagnosis and helping patients manage their health. Despite advances in migraine management, only a minority of patients are adequately informed and treated. It is important to provide these patients with information that helps them manage their symptoms and daily activities. The primary aim of this case study was to examine whether ChatGPT handles symptom descriptions responsibly, suggests supplementary assistance from credible sources, provides valuable perspectives on treatment options, and exhibits potential influences on the daily life of patients with migraines. In a deductive, qualitative study, ten interactions with ChatGPT on different migraine types were analyzed through semi-structured interviews. ChatGPT provided relevant information aligned with common scientific patient resources. Responses were generally intelligible and situationally appropriate, providing personalized insights despite occasional discrepancies in interaction. ChatGPT's empathetic tone and linguistic clarity encouraged user engagement. However, source citations were found to be inconsistent and, in some cases, not comprehensible, which affected the overall comprehensibility of the information. ChatGPT might be promising for patients seeking information on migraine conditions; its user-specific responses demonstrate potential benefits over static web-based sources. However, reproducibility and accuracy issues highlight the need for digital health literacy. The findings underscore the necessity of continuously evaluating AI systems and their broader societal implications in health communication.
Affiliation(s)
- Pascal Schütz
- Department Digital Health Sciences and Biomedicine, Professorship of Digital Public Health, School of Life Sciences, University of Siegen, 57076 Siegen, Germany; (S.L.); (H.C.); (L.D.); (M.L.); (H.R.); (A.W.); (J.A.); (P.T.); (C.D.)
8
Sallam M, Al-Mahzoum K, Alshuaib O, Alhajri H, Alotaibi F, Alkhurainej D, Al-Balwah MY, Barakat M, Egger J. Language discrepancies in the performance of generative artificial intelligence models: an examination of infectious disease queries in English and Arabic. BMC Infect Dis 2024; 24:799. [PMID: 39118057 PMCID: PMC11308449 DOI: 10.1186/s12879-024-09725-y]
Abstract
BACKGROUND Assessment of artificial intelligence (AI)-based models across languages is crucial to ensure equitable access and accuracy of information in multilingual contexts. This study aimed to compare AI model efficiency in English and Arabic for infectious disease queries. METHODS The study employed the METRICS checklist for the design and reporting of AI-based studies in healthcare. The AI models tested included ChatGPT-3.5, ChatGPT-4, Bing, and Bard. The queries comprised 15 questions on HIV/AIDS, tuberculosis, malaria, COVID-19, and influenza. The AI-generated content was assessed by two bilingual experts using the validated CLEAR tool. RESULTS Variability was noted when comparing the AI models' performance in English and Arabic for infectious disease queries. English queries showed consistently superior performance, with Bard leading, followed by Bing, ChatGPT-4, and ChatGPT-3.5 (P = .012). The same trend was observed in Arabic, albeit without statistical significance (P = .082). Stratified analysis revealed higher scores for English in most CLEAR components, notably completeness, accuracy, appropriateness, and relevance, especially with ChatGPT-3.5 and Bard. Across the five infectious disease topics, English outperformed Arabic, except for influenza queries in Bing and Bard. The four AI models' performance in English was rated "excellent", significantly outperforming their "above-average" Arabic counterparts (P = .002). CONCLUSIONS A disparity in AI model performance was noted between English and Arabic responses to infectious disease queries. This language variation can negatively affect the quality of health content delivered by AI models to native speakers of Arabic. AI developers are recommended to address this issue, with the ultimate goal of enhancing health outcomes.
Affiliation(s)
- Malik Sallam
- Department of Pathology, Microbiology and Forensic Medicine, School of Medicine, The University of Jordan, Amman, 11942, Jordan
- Department of Translational Medicine, Faculty of Medicine, Lund University, Malmö, 22184, Sweden
- Department of Clinical Laboratories and Forensic Medicine, Jordan University Hospital, Queen Rania Al-Abdullah Street-Aljubeiha, P.O. Box: 13046, Amman, Jordan
- Omaima Alshuaib
- School of Medicine, The University of Jordan, Amman, 11942, Jordan
- Hawajer Alhajri
- School of Medicine, The University of Jordan, Amman, 11942, Jordan
- Fatmah Alotaibi
- School of Medicine, The University of Jordan, Amman, 11942, Jordan
- Muna Barakat
- Department of Clinical Pharmacy and Therapeutics, Faculty of Pharmacy, Applied Science Private University, Amman, 11931, Jordan
- MEU Research Unit, Middle East University, Amman, 11831, Jordan
- Jan Egger
- Institute for AI in Medicine (IKIM), University Medicine Essen (AöR), Essen, Germany
9
Fazilat AZ, Berry CE, Churukian A, Lavin C, Kameni L, Brenac C, Podda S, Bruckman K, Lorenz HP, Khosla RK, Wan DC. AI-based Cleft Lip and Palate Surgical Information is Preferred by Both Plastic Surgeons and Patients in a Blind Comparison. Cleft Palate Craniofac J 2024:10556656241266368. [PMID: 39091088 DOI: 10.1177/10556656241266368]
Abstract
INTRODUCTION The application of artificial intelligence (AI) in healthcare has expanded in recent years, and the use of tools such as ChatGPT to generate patient-facing information has garnered particular interest. Online cleft lip and palate (CL/P) surgical information supplied by academic/professional (A/P) sources was therefore evaluated against ChatGPT regarding accuracy, comprehensiveness, and clarity. METHODS Eleven plastic and reconstructive surgeons and 29 non-medical individuals blindly compared responses written by ChatGPT or A/P sources to 30 frequently asked CL/P surgery questions. Surgeons indicated preference, determined accuracy, and scored comprehensiveness and clarity. Non-medical individuals indicated preference. Readability scores were calculated using seven readability formulas. Statistical analysis of CL/P surgical online information was performed using paired t-tests. RESULTS Surgeons blindly preferred material generated by ChatGPT over A/P sources 60.88% of the time. Additionally, surgeons consistently indicated that ChatGPT-generated material was more comprehensive and had greater clarity. No significant difference was found between ChatGPT and resources provided by professional organizations in terms of accuracy. Among individuals with no medical background, ChatGPT-generated materials were preferred 60.46% of the time. For materials from both ChatGPT and A/P sources, readability scores surpassed advised levels for patient proficiency across the seven readability formulas. CONCLUSION As the prominence of ChatGPT-based language tools rises in the healthcare space, their potential applications should be assessed by experts against existing high-quality sources. Our results indicate that ChatGPT is capable of producing material that is high quality in terms of accuracy, comprehensiveness, and clarity, and that is preferred by both plastic surgeons and individuals with no medical background.
Affiliation(s)
- Alexander Z Fazilat, Charlotte E Berry, Andrew Churukian, Christopher Lavin, Lionel Kameni, Camille Brenac, Karl Bruckman, Hermann P Lorenz, Rohit K Khosla, Derrick C Wan
- Hagey Laboratory for Pediatric Regenerative Medicine, Division of Plastic and Reconstructive Surgery, Department of Surgery, Stanford University School of Medicine, Stanford, CA, USA
- Silvio Podda
- Division of Plastic and Reconstructive Surgery, St. Joseph's Regional Medical Center, Paterson, NJ, USA
10
Sallam M. Bibliometric top ten healthcare-related ChatGPT publications in the first ChatGPT anniversary. NARRA J 2024; 4:e917. [PMID: 39280327 PMCID: PMC11391998 DOI: 10.52225/narra.v4i2.917] [Received: 05/29/2024] [Accepted: 07/29/2024] [Indexed: 09/18/2024]
Abstract
Since its public release on November 30, 2022, ChatGPT has shown promising potential in diverse healthcare applications despite ethical challenges, privacy issues, and possible biases. The aim of this study was to identify and assess the most influential publications in the field of ChatGPT utility in healthcare using bibliometric analysis. The study employed an advanced search on three databases, Scopus, Web of Science, and Google Scholar, to identify ChatGPT-related records in healthcare education, research, and practice between November 27 and 30, 2023. The ranking was based on the retrieved citation count in each database. The additional alternative metrics that were evaluated included (1) Semantic Scholar highly influential citations, (2) PlumX captures, (3) PlumX mentions, (4) PlumX social media and (5) Altmetric Attention Scores (AASs). A total of 22 unique records published in 17 different scientific journals from 14 different publishers were identified in the three databases. Only two publications were in the top 10 list across the three databases. Variable publication types were identified, with the most common being editorial/commentary publications (n=8/22, 36.4%). Nine of the 22 records had corresponding authors affiliated with institutions in the United States (40.9%). The range of citation count varied per database, with the highest range identified in Google Scholar (1019-121), followed by Scopus (242-88), and Web of Science (171-23). Google Scholar citations were correlated significantly with the following metrics: Semantic Scholar highly influential citations (Spearman's correlation coefficient ρ=0.840, p<0.001), PlumX captures (ρ=0.831, p<0.001), PlumX mentions (ρ=0.609, p=0.004), and AASs (ρ=0.542, p=0.009). In conclusion, despite several acknowledged limitations, this study showed the evolving landscape of ChatGPT utility in healthcare. 
There is an urgent need for collaborative initiatives by all stakeholders involved to establish guidelines for ethical, transparent, and responsible use of ChatGPT in healthcare. The study revealed the correlation between citations and alternative metrics, highlighting its usefulness as a supplement to gauge the impact of publications, even in a rapidly growing research field.
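The Spearman correlations reported above can be reproduced in principle with a small rank-correlation routine. The sketch below is a minimal pure-Python implementation with average ranks for ties; the citation and capture counts are invented toy values, not the study's data:

```python
def average_ranks(values):
    # 1-based ranks; tied values share the average of their rank positions
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]

def spearman_rho(x, y):
    # Spearman's rho is the Pearson correlation computed on the ranks
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Toy data: citation counts vs. an altmetric for five hypothetical papers
citations = [1019, 480, 310, 205, 121]
captures = [900, 350, 400, 150, 90]
print(round(spearman_rho(citations, captures), 3))  # prints 0.9
```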
Affiliation(s)
- Malik Sallam
- Department of Pathology, Microbiology and Forensic Medicine, School of Medicine, The University of Jordan, Amman, Jordan
- Department of Clinical Laboratories and Forensic Medicine, Jordan University Hospital, Amman, Jordan
- Department of Translational Medicine, Faculty of Medicine, Lund University, Malmö, Sweden
11
Reis M, Reis F, Kunde W. Influence of believed AI involvement on the perception of digital medical advice. Nat Med 2024:10.1038/s41591-024-03180-7. [PMID: 39054373 DOI: 10.1038/s41591-024-03180-7] [Received: 03/05/2024] [Accepted: 07/04/2024] [Indexed: 07/27/2024]
Abstract
Large language models offer novel opportunities to seek digital medical advice. While previous research primarily addressed the performance of such artificial intelligence (AI)-based tools, public perception of these advancements received little attention. In two preregistered studies (n = 2,280), we presented participants with scenarios of patients obtaining medical advice. All participants received identical information, but we manipulated the putative source of this advice ('AI', 'human physician', 'human + AI'). 'AI'- and 'human + AI'-labeled advice was evaluated as significantly less reliable and less empathetic compared with 'human'-labeled advice. Moreover, participants indicated lower willingness to follow the advice when AI was believed to be involved in advice generation. Our findings point toward an anti-AI bias when receiving digital medical advice, even when AI is supposedly supervised by physicians. Given the tremendous potential of AI for medicine, elucidating ways to counteract this bias should be an important objective of future research.
Affiliation(s)
- Moritz Reis
- Institute of Psychology, Julius-Maximilians-Universität Würzburg, Würzburg, Germany
- Judge Business School, University of Cambridge, Cambridge, UK
- Florian Reis
- Medical Affairs, Pfizer Pharma GmbH, Berlin, Germany
- Wilfried Kunde
- Institute of Psychology, Julius-Maximilians-Universität Würzburg, Würzburg, Germany
12
Lin HL, Liao LL, Wang YN, Chang LC. Attitude and utilization of ChatGPT among registered nurses: A cross-sectional study. Int Nurs Rev 2024. [PMID: 38979771 DOI: 10.1111/inr.13012] [Received: 04/16/2024] [Accepted: 06/10/2024] [Indexed: 07/10/2024]
Abstract
AIM This study explores the influencing factors of attitudes and behaviors toward use of ChatGPT based on the Technology Acceptance Model among registered nurses in Taiwan. BACKGROUND The complexity of medical services and nursing shortages increases workloads. ChatGPT swiftly answers medical questions, provides clinical guidelines, and assists with patient information management, thereby improving nursing efficiency. INTRODUCTION To facilitate the development of effective ChatGPT training programs, it is essential to examine registered nurses' attitudes toward and utilization of ChatGPT across diverse workplace settings. METHODS An anonymous online survey was used to collect data from over 1000 registered nurses recruited through social media platforms between November 2023 and January 2024. Descriptive statistics and multiple linear regression analyses were conducted for data analysis. RESULTS Among respondents, some were unfamiliar with ChatGPT, while others had used it before, with higher usage among males, higher-educated individuals, experienced nurses, and supervisors. Gender and work settings influenced perceived risks, and those familiar with ChatGPT recognized its social impact. Perceived risk and usefulness significantly influenced its adoption. DISCUSSION Nurse attitudes to ChatGPT vary based on gender, education, experience, and role. Positive perceptions emphasize its usefulness, while risk concerns affect adoption. The insignificant role of perceived ease of use highlights ChatGPT's user-friendly nature. CONCLUSION Over half of the surveyed nurses had used or were familiar with ChatGPT and showed positive attitudes toward its use. Establishing rigorous guidelines to enhance their interaction with ChatGPT is crucial for future training. 
IMPLICATIONS FOR NURSING AND HEALTH POLICY Nurse managers should understand registered nurses' attitudes toward ChatGPT and integrate it into in-service education with tailored support and training, including appropriate prompt formulation and advanced decision-making, to prevent misuse.
Affiliation(s)
- Hui-Ling Lin
- Department of Nursing, Linkou Branch, Chang Gung Memorial Hospital, Taoyuan, Taiwan, ROC
- School of Nursing, College of Medicine, Chang Gung University, Taoyuan, Taiwan, ROC
- School of Nursing, Chang Gung University of Science and Technology, Gui-Shan Town, Taoyuan, Taiwan, ROC
- Taipei Medical University, Taipei, Taiwan
- Li-Ling Liao
- Department of Public Health, College of Health Science, Kaohsiung Medical University, Kaohsiung City, Taiwan
- Department of Medical Research, Kaohsiung Medical University Hospital, Kaohsiung City, Taiwan
- Ya-Ni Wang
- School of Nursing, College of Medicine, Chang Gung University, Taoyuan, Taiwan, ROC
- Li-Chun Chang
- Department of Nursing, Linkou Branch, Chang Gung Memorial Hospital, Taoyuan, Taiwan, ROC
- School of Nursing, College of Medicine, Chang Gung University, Taoyuan, Taiwan, ROC
- School of Nursing, Chang Gung University of Science and Technology, Gui-Shan Town, Taoyuan, Taiwan, ROC
13
Elshaer IA, Hasanein AM, Sobaih AEE. The Moderating Effects of Gender and Study Discipline in the Relationship between University Students' Acceptance and Use of ChatGPT. Eur J Investig Health Psychol Educ 2024; 14:1981-1995. [PMID: 39056647 PMCID: PMC11275491 DOI: 10.3390/ejihpe14070132] [Received: 05/23/2024] [Revised: 06/23/2024] [Accepted: 07/05/2024] [Indexed: 07/28/2024]
Abstract
The intensive adoption of ChatGPT by university students for learning has encouraged many scholars to test the variables that influence their use of such AI in their learning. This study adds to that growing body of work, especially regarding the moderating roles of students' gender and study discipline in their acceptance and usage of ChatGPT in the learning process. The study expanded the Unified Theory of Acceptance and Use of Technology (UTAUT) by integrating gender and study discipline as moderators, and collected responses from students of different genders and study disciplines at Saudi universities. The results of a structural model using SmartPLS showed a significant moderating effect of gender on the relationship between performance expectancy and ChatGPT usage: the impact of performance expectancy on ChatGPT usage was stronger in male than in female students. Moreover, social influence was shown to affect ChatGPT usage significantly more in males than in females. The findings also showed that study discipline significantly moderates the link between social influence and ChatGPT usage, with social influence affecting ChatGPT use more strongly in the social sciences than in the applied sciences. The study's implications are discussed.
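Moderation of the kind tested here (e.g., gender moderating the effect of performance expectancy on usage) is commonly assessed via an interaction term; equivalently, one can compare group-wise slopes. A minimal sketch on invented survey-style scores (not the study's data):

```python
def slope(x, y):
    # Least-squares slope of y on x: cov(x, y) / var(x)
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)

# Hypothetical performance-expectancy and usage scores per group
pe_male, use_male = [2, 3, 4, 5, 6], [2, 4, 5, 7, 8]
pe_female, use_female = [2, 3, 4, 5, 6], [3, 4, 4, 5, 5]

# The slope difference corresponds to the interaction (moderation) effect:
# a nonzero value means the PE -> usage slope differs between groups
interaction = slope(pe_male, use_male) - slope(pe_female, use_female)
print(interaction)  # prints 1.0
```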
Affiliation(s)
- Ibrahim A. Elshaer
- Management Department, College of Business Administration, King Faisal University, Al-Ahsa 31982, Saudi Arabia
- Faculty of Tourism and Hotel Management, Suez Canal University, Ismailia 41522, Egypt
- Ahmed M. Hasanein
- Management Department, College of Business Administration, King Faisal University, Al-Ahsa 31982, Saudi Arabia
- Faculty of Tourism and Hotel Management, Helwan University, Cairo 12612, Egypt
- Abu Elnasr E. Sobaih
- Management Department, College of Business Administration, King Faisal University, Al-Ahsa 31982, Saudi Arabia
- Faculty of Tourism and Hotel Management, Helwan University, Cairo 12612, Egypt
14
Law S, Oldfield B, Yang W. ChatGPT/GPT-4 (large language models): Opportunities and challenges of perspective in bariatric healthcare professionals. Obes Rev 2024; 25:e13746. [PMID: 38613164 DOI: 10.1111/obr.13746] [Received: 06/10/2023] [Revised: 03/14/2024] [Accepted: 03/15/2024] [Indexed: 04/14/2024]
Abstract
ChatGPT/GPT-4 is a conversational large language model (LLM) based on artificial intelligence (AI). The potential application of LLMs as virtual assistants for bariatric healthcare professionals in education and practice may be promising if relevant and valid issues are actively examined and addressed. In general medical terms, it is possible that AI models like ChatGPT/GPT-4 will be deeply integrated into medical scenarios, improving medical efficiency and quality, and allowing doctors more time to communicate with patients and implement personalized health management. Chatbots based on AI have great potential in bariatric healthcare and may play an important role in predicting and intervening in weight loss and obesity-related complications. However, given its potential limitations, we should carefully consider the medical, legal, ethical, data security, privacy, and liability issues arising from medical errors caused by ChatGPT/GPT-4. This concern also extends to ChatGPT/GPT-4's ability to justify wrong decisions, and there is an urgent need for appropriate guidelines and regulations to ensure the safe and responsible use of ChatGPT/GPT-4.
Affiliation(s)
- Saikam Law
- Department of Metabolic and Bariatric Surgery, The First Affiliated Hospital of Jinan University, Guangzhou, China
- School of Medicine, Jinan University, Guangzhou, China
- Brian Oldfield
- Department of Physiology, Monash Biomedicine Discovery Institute, Monash University, Melbourne, Australia
- Wah Yang
- Department of Metabolic and Bariatric Surgery, The First Affiliated Hospital of Jinan University, Guangzhou, China
15
Yang Z, Wang D, Zhou F, Song D, Zhang Y, Jiang J, Kong K, Liu X, Qiao Y, Chang RT, Han Y, Li F, Tham CC, Zhang X. Understanding natural language: Potential application of large language models to ophthalmology. Asia Pac J Ophthalmol (Phila) 2024; 13:100085. [PMID: 39059558 DOI: 10.1016/j.apjo.2024.100085] [Received: 04/17/2024] [Revised: 06/19/2024] [Accepted: 07/19/2024] [Indexed: 07/28/2024]
Abstract
Large language models (LLMs), a natural language processing technology based on deep learning, are currently in the spotlight. These models closely mimic natural language comprehension and generation. Their evolution has undergone several waves of innovation similar to convolutional neural networks. The transformer architecture advancement in generative artificial intelligence marks a monumental leap beyond early-stage pattern recognition via supervised learning. With the expansion of parameters and training data (terabytes), LLMs unveil remarkable human interactivity, encompassing capabilities such as memory retention and comprehension. These advances make LLMs particularly well-suited for roles in healthcare communication between medical practitioners and patients. In this comprehensive review, we discuss the trajectory of LLMs and their potential implications for clinicians and patients. For clinicians, LLMs can be used for automated medical documentation, and given better inputs and extensive validation, LLMs may be able to autonomously diagnose and treat in the future. For patient care, LLMs can be used for triage suggestions, summarization of medical documents, explanation of a patient's condition, and customizing patient education materials tailored to their comprehension level. The limitations of LLMs and possible solutions for real-world use are also presented. Given the rapid advancements in this area, this review attempts to briefly cover many roles that LLMs may play in the ophthalmic space, with a focus on improving the quality of healthcare delivery.
Affiliation(s)
- Zefeng Yang, Deming Wang, Yinhang Zhang, Jiaxuan Jiang, Kangjie Kong, Xiaoyi Liu, Fei Li, Xiulan Zhang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou 510060, China
- Fengqi Zhou
- Ophthalmology, Mayo Clinic Health System, Eau Claire, Wisconsin, USA
- Diping Song, Yu Qiao
- Shanghai Artificial Intelligence Laboratory, Shanghai, China
- Robert T Chang
- Department of Ophthalmology, Byers Eye Institute at Stanford University, Palo Alto, CA, USA
- Ying Han
- Department of Ophthalmology, University of California, San Francisco, San Francisco, CA, USA
- Clement C Tham
- Department of Ophthalmology and Visual Sciences, The Chinese University of Hong Kong, Hong Kong SAR, China; Hong Kong Eye Hospital, Kowloon, Hong Kong SAR, China; Department of Ophthalmology and Visual Sciences, Prince of Wales Hospital, Shatin, Hong Kong SAR, China
16
Shinan-Altman S, Elyoseph Z, Levkovich I. The impact of history of depression and access to weapons on suicide risk assessment: a comparison of ChatGPT-3.5 and ChatGPT-4. PeerJ 2024; 12:e17468. [PMID: 38827287 PMCID: PMC11143969 DOI: 10.7717/peerj.17468] [Received: 11/07/2023] [Accepted: 05/05/2024] [Indexed: 06/04/2024]
Abstract
The aim of this study was to evaluate the effectiveness of ChatGPT-3.5 and ChatGPT-4 in incorporating critical risk factors, namely history of depression and access to weapons, into suicide risk assessments. Both models assessed suicide risk using scenarios that featured individuals with and without a history of depression and access to weapons. The models estimated the likelihood of suicidal thoughts, suicide attempts, serious suicide attempts, and suicide-related mortality on a Likert scale. A multivariate three-way ANOVA with Bonferroni post hoc tests was conducted to examine the impact of the aforementioned independent factors (history of depression and access to weapons) on these outcome variables. Both models identified history of depression as a significant suicide risk factor. ChatGPT-4 demonstrated a more nuanced understanding of the relationship between depression, access to weapons, and suicide risk. In contrast, ChatGPT-3.5 displayed limited insight into this complex relationship. ChatGPT-4 consistently assigned higher severity ratings to suicide-related variables than did ChatGPT-3.5. The study highlights the potential of these two models, particularly ChatGPT-4, to enhance suicide risk assessment by considering complex risk factors.
Affiliation(s)
- Zohar Elyoseph
- Department of Brain Sciences, Faculty of Medicine, Imperial College London, London, England, United Kingdom
- The Center for Psychobiological Research, Department of Psychology and Educational Counseling, Max Stern Yezreel Valley College, Emek Yezreel, Israel
- Inbar Levkovich
- Faculty of Graduate Studies, Oranim Academic College of Education, Kiryat Tiv’on, Israel
17
Choudhury A, Shamszare H. The Impact of Performance Expectancy, Workload, Risk, and Satisfaction on Trust in ChatGPT: Cross-Sectional Survey Analysis. JMIR Hum Factors 2024; 11:e55399. [PMID: 38801658 PMCID: PMC11165287 DOI: 10.2196/55399] [Received: 12/11/2023] [Revised: 03/25/2024] [Accepted: 04/07/2024] [Indexed: 05/29/2024]
Abstract
BACKGROUND ChatGPT (OpenAI) is a powerful tool for a wide range of tasks, from entertainment and creativity to health care queries. There are potential risks and benefits associated with this technology. In the discourse concerning the deployment of ChatGPT and similar large language models, it is sensible to recommend their use primarily for tasks a human user can execute accurately. As we transition into the subsequent phase of ChatGPT deployment, establishing realistic performance expectations and understanding users' perceptions of risk associated with its use are crucial in determining the successful integration of this artificial intelligence (AI) technology. OBJECTIVE The aim of the study is to explore how perceived workload, satisfaction, performance expectancy, and risk-benefit perception influence users' trust in ChatGPT. METHODS A semistructured, web-based survey was conducted with 607 adults in the United States who actively use ChatGPT. The survey questions were adapted from constructs used in various models and theories such as the technology acceptance model, the theory of planned behavior, the unified theory of acceptance and use of technology, and research on trust and security in digital environments. To test our hypotheses and structural model, we used the partial least squares structural equation modeling method, a widely used approach for multivariate analysis. RESULTS A total of 607 people responded to our survey. A significant portion of the participants held at least a high school diploma (n=204, 33.6%), and the majority had a bachelor's degree (n=262, 43.1%). The primary motivations for participants to use ChatGPT were for acquiring information (n=219, 36.1%), amusement (n=203, 33.4%), and addressing problems (n=135, 22.2%). Some participants used it for health-related inquiries (n=44, 7.2%), while a few others (n=6, 1%) used it for miscellaneous activities such as brainstorming, grammar verification, and blog content creation. 
Our model explained 64.6% of the variance in trust. Our analysis indicated a significant relationship between (1) workload and satisfaction, (2) trust and satisfaction, (3) performance expectations and trust, and (4) risk-benefit perception and trust. CONCLUSIONS The findings underscore the importance of ensuring user-friendly design and functionality in AI-based applications to reduce workload and enhance user satisfaction, thereby increasing user trust. Future research should further explore the relationship between risk-benefit perception and trust in the context of AI chatbots.
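Full partial least squares structural equation modeling is beyond a short example, but the headline figure, 64.6% of the variance in trust explained, is an R² statistic. A minimal sketch of R² for a single-predictor least-squares fit, using invented 7-point survey scores rather than the study's data:

```python
def r_squared(x, y):
    # Fit y = a + b*x by ordinary least squares and return R^2,
    # the share of variance in y explained by the fit
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    a0 = my - b * mx
    ss_res = sum((c - (a0 + b * a)) ** 2 for a, c in zip(x, y))
    ss_tot = sum((c - my) ** 2 for c in y)
    return 1 - ss_res / ss_tot

# Hypothetical 1-7 survey scores: performance expectancy vs. trust
expectancy = [2, 3, 4, 5, 6, 7]
trust = [2, 3, 3, 5, 6, 6]
print(round(r_squared(expectancy, trust), 3))  # prints 0.926
```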
Affiliation(s)
- Avishek Choudhury
- Industrial and Management Systems Engineering, Benjamin M. Statler College of Engineering and Mineral Resources, West Virginia University, Morgantown, WV, United States
- Hamid Shamszare
- Industrial and Management Systems Engineering, Benjamin M. Statler College of Engineering and Mineral Resources, West Virginia University, Morgantown, WV, United States
18
Naqvi WM, Shaikh SZ, Mishra GV. Large language models in physical therapy: time to adapt and adept. Front Public Health 2024; 12:1364660. [PMID: 38887241 PMCID: PMC11182445 DOI: 10.3389/fpubh.2024.1364660] [Received: 01/02/2024] [Accepted: 05/10/2024] [Indexed: 06/20/2024]
Abstract
Healthcare is experiencing a transformative phase driven by artificial intelligence (AI) and machine learning (ML). Physical therapists (PTs) stand on the brink of a paradigm shift in education, practice, and research. Rather than a threat, AI presents an opportunity for the profession to revolutionize itself. This paper examines how large language models (LLMs), such as ChatGPT and BioMedLM, driven by deep ML, can offer human-like performance yet face accuracy challenges given the vastness of data in PT and rehabilitation practice. PTs can benefit by developing and training LLMs tailored to the field, using them to streamline administrative tasks, connect globally, and customize treatments. However, human touch and creativity remain invaluable. This paper urges PTs to engage in learning about and shaping AI models, highlighting the need for ethical use and human supervision to address potential biases. Embracing the role of AI contributor, and not just user, is crucial: by integrating AI and fostering collaboration, the field can move toward a future in which AI enriches PT practice, provided data accuracy and the challenges associated with feeding the AI model are sensitively addressed.
Affiliation(s)
- Waqar M. Naqvi
- Department of Interdisciplinary Sciences, Datta Meghe Institute of Higher Education and Research, Wardha, India
- Department of Physiotherapy, College of Health Sciences, Gulf Medical University, Ajman, United Arab Emirates
- NKP Salve Institute of Medical Sciences and Research Center, Nagpur, India
- Summaiya Zareen Shaikh
- Department of Neuro-Physiotherapy, The SIA College of Health Sciences, College of Physiotherapy, Thane, India
- Gaurav V. Mishra
- Department of Radiodiagnosis, Datta Meghe Institute of Higher Education and Research, Wardha, India
19
Jokar M, Abdous A, Rahmanian V. AI chatbots in pet health care: Opportunities and challenges for owners. Vet Med Sci 2024; 10:e1464. [PMID: 38678576 PMCID: PMC11056198 DOI: 10.1002/vms3.1464] [Received: 01/07/2024] [Accepted: 04/04/2024] [Indexed: 05/01/2024]
Abstract
The integration of artificial intelligence (AI) into health care has seen remarkable advancements, with applications extending to animal health. This article explores the potential benefits and challenges associated with employing AI chatbots as tools for pet health care. Focusing on ChatGPT, a prominent language model, the authors elucidate its capabilities and its potential impact on pet owners' decision-making processes. AI chatbots offer pet owners access to extensive information on animal health, research studies and diagnostic options, providing a cost-effective and convenient alternative to traditional veterinary consultations. The outcome of a case involving a Border Collie named Sassy demonstrates the potential benefits of AI in veterinary medicine: ChatGPT played a pivotal role in suggesting a diagnosis that led to successful treatment, showcasing the potential of AI chatbots as valuable tools in complex cases. However, concerns arise regarding pet owners relying solely on AI chatbots for medical advice, potentially resulting in misdiagnosis, inappropriate treatment and delayed professional intervention. We emphasize the need for a balanced approach, positioning AI chatbots as supplementary tools rather than substitutes for licensed veterinarians. To mitigate risks, the article proposes strategies such as educating pet owners on AI chatbots' limitations, implementing regulations to guide AI chatbot companies and fostering collaboration between AI chatbots and veterinarians. The intricate web of responsibilities in this dynamic landscape underscores the importance of government regulations, the educational role of AI chatbots and the symbiotic relationship between AI technology and veterinary expertise. In conclusion, while AI chatbots hold immense promise in transforming pet health care, cautious and informed usage is crucial.
By promoting awareness, establishing regulations and fostering collaboration, the article advocates for a responsible integration of AI chatbots to ensure optimal care for pets.
Affiliation(s)
- Mohammad Jokar
- Faculty of Veterinary Medicine, Karaj Branch, Islamic Azad University, Karaj, Iran
- Arman Abdous
- Faculty of Veterinary Medicine, Karaj Branch, Islamic Azad University, Karaj, Iran
- Vahid Rahmanian
- Department of Public Health, Torbat Jam Faculty of Medical Sciences, Torbat Jam, Iran
20
Sawamura S, Bito T, Ando T, Masuda K, Kameyama S, Ishida H. Evaluation of the accuracy of ChatGPT's responses to and references for clinical questions in physical therapy. J Phys Ther Sci 2024; 36:234-239. [PMID: 38694019 PMCID: PMC11060764 DOI: 10.1589/jpts.36.234] [Received: 12/12/2023] [Accepted: 01/29/2024] [Indexed: 05/03/2024]
Abstract
[Purpose] This study evaluated the accuracy of ChatGPT's responses to and references for five clinical questions in physical therapy based on the Physical Therapy Guidelines and assessed this language model's potential as a tool for supporting clinical decision-making in the rehabilitation field. [Participants and Methods] Five clinical questions from the "Stroke", "Musculoskeletal disorders", and "Internal disorders" sections of the Physical Therapy Guidelines, released by the Japanese Society of Physical Therapy, were presented to ChatGPT. ChatGPT was instructed to provide responses in Japanese accompanied by references such as PubMed IDs or digital object identifiers. The accuracy of the generated content and references was evaluated by two assessors with expertise in their respective sections by using a 4-point scale, and comments were provided for point deductions. The inter-rater agreement was evaluated using weighted kappa coefficients. [Results] ChatGPT demonstrated adequate accuracy in generating content for clinical questions in physical therapy. However, the accuracy of the references was poor, with a significant number of references being non-existent or misinterpreted. [Conclusion] ChatGPT has limitations in reference selection and reliability. While ChatGPT can offer accurate responses to clinical questions in physical therapy, it should be used with caution because it is not a completely reliable model.
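The inter-rater agreement reported here was evaluated with weighted kappa coefficients on a 4-point scale. A minimal pure-Python sketch of weighted Cohen's kappa (quadratic weights by default); the ratings below are invented, not the assessors' data:

```python
def weighted_kappa(ratings_a, ratings_b, n_levels, weights="quadratic"):
    # Weighted Cohen's kappa for two raters scoring on an ordinal scale
    # with categories 0 .. n_levels-1 (e.g., a 4-point accuracy scale)
    n = len(ratings_a)
    obs = [[0.0] * n_levels for _ in range(n_levels)]
    for a, b in zip(ratings_a, ratings_b):
        obs[a][b] += 1 / n
    pa = [sum(row) for row in obs]                                    # rater A marginals
    pb = [sum(obs[i][j] for i in range(n_levels)) for j in range(n_levels)]  # rater B marginals
    power = 2 if weights == "quadratic" else 1
    w = [[abs(i - j) ** power / (n_levels - 1) ** power
          for j in range(n_levels)] for i in range(n_levels)]
    d_obs = sum(w[i][j] * obs[i][j] for i in range(n_levels) for j in range(n_levels))
    d_exp = sum(w[i][j] * pa[i] * pb[j] for i in range(n_levels) for j in range(n_levels))
    return 1 - d_obs / d_exp

# Two assessors scoring four responses on a 4-point scale (0-3),
# disagreeing by one level on the last response
print(round(weighted_kappa([0, 1, 2, 3], [0, 1, 2, 2], 4), 3))  # prints 0.875
```

Quadratic weighting penalizes large disagreements more than adjacent-category ones, which suits ordinal rating scales like the 4-point scale used here.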
Affiliation(s)
- Shogo Sawamura: Department of Rehabilitation, Heisei College of Health Sciences, 180 Kurono, Gifu City, Gifu 501-1131, Japan
- Takanobu Bito: Department of Rehabilitation, Gifu University Hospital, Japan
- Takahiro Ando: Department of Rehabilitation, Gifu University Hospital, Japan
- Kento Masuda: Department of Rehabilitation, Gifu University Hospital, Japan
- Sakiko Kameyama: Department of Rehabilitation, Heisei College of Health Sciences, 180 Kurono, Gifu City, Gifu 501-1131, Japan
- Hiroyasu Ishida: Department of Rehabilitation, Heisei College of Health Sciences, 180 Kurono, Gifu City, Gifu 501-1131, Japan
21
Wang L, Wan Z, Ni C, Song Q, Li Y, Clayton EW, Malin BA, Yin Z. A Systematic Review of ChatGPT and Other Conversational Large Language Models in Healthcare. medRxiv [Preprint] 2024:2024.04.26.24306390. PMID: 38712148; PMCID: PMC11071576; DOI: 10.1101/2024.04.26.24306390.
Abstract
Background The launch of the Chat Generative Pre-trained Transformer (ChatGPT) in November 2022 has attracted public attention and academic interest to large language models (LLMs), facilitating the emergence of many other innovative LLMs. These LLMs have been applied in various fields, including healthcare. Numerous studies have since been conducted regarding how to employ state-of-the-art LLMs in health-related scenarios to assist patients, doctors, and public health administrators. Objective This review aims to summarize the applications and concerns of applying conversational LLMs in healthcare and provide an agenda for future research on LLMs in healthcare. Methods We utilized PubMed, ACM, and IEEE digital libraries as primary sources for this review. We followed the guidance of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) to screen and select peer-reviewed research articles that (1) were related to both healthcare applications and conversational LLMs and (2) were published before September 1st, 2023, the date when we started paper collection and screening. We investigated these papers and classified them according to their applications and concerns. Results Our search initially identified 820 papers according to targeted keywords, out of which 65 papers met our criteria and were included in the review. The most popular conversational LLM was ChatGPT from OpenAI (60 papers), followed by Bard from Google (1), Large Language Model Meta AI (LLaMA) from Meta (1), and other LLMs (5). These papers were classified into four categories of applications: 1) summarization, 2) medical knowledge inquiry, 3) prediction, and 4) administration, and four categories of concerns: 1) reliability, 2) bias, 3) privacy, and 4) public acceptability. Overall, 49 (75%) of the papers used LLMs for summarization and/or medical knowledge inquiry, and 58 (89%) expressed concerns about reliability and/or bias.
We found that conversational LLMs exhibit promising results in summarization and in providing medical knowledge to patients with relatively high accuracy. However, conversational LLMs like ChatGPT are often unable to provide reliable answers to complex health-related tasks that require specialized domain expertise. Additionally, none of the reviewed papers included experiments designed to examine how conversational LLMs introduce bias or privacy issues in healthcare research. Conclusions Future studies should focus on improving the reliability of LLM applications in complex health-related tasks, and on investigating the mechanisms by which LLM applications introduce bias and privacy issues. Given the broad accessibility of LLMs, legal, social, and technical efforts are all needed to address these concerns and to promote, improve, and regulate the application of LLMs in healthcare.
Affiliation(s)
- Leyao Wang: Department of Computer Science, Vanderbilt University, Nashville, TN 37212, USA
- Zhiyu Wan: Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, USA
- Congning Ni: Department of Computer Science, Vanderbilt University, Nashville, TN 37212, USA
- Qingyuan Song: Department of Computer Science, Vanderbilt University, Nashville, TN 37212, USA
- Yang Li: Department of Computer Science, Vanderbilt University, Nashville, TN 37212, USA
- Ellen Wright Clayton: Department of Pediatrics and Center for Biomedical Ethics and Society, Vanderbilt University Medical Center, Nashville, TN 37203, USA
- Bradley A. Malin: Department of Computer Science, Vanderbilt University, Nashville, TN 37212, USA; Departments of Biomedical Informatics and Biostatistics, Vanderbilt University Medical Center, Nashville, TN 37203, USA
- Zhijun Yin: Department of Computer Science, Vanderbilt University, Nashville, TN 37212, USA; Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, USA
22
Choudhury A, Chaudhry Z. Large Language Models and User Trust: Consequence of Self-Referential Learning Loop and the Deskilling of Health Care Professionals. J Med Internet Res 2024; 26:e56764. PMID: 38662419; PMCID: PMC11082730; DOI: 10.2196/56764.
Abstract
As the health care industry increasingly embraces large language models (LLMs), understanding the consequences of this integration becomes crucial for maximizing benefits while mitigating potential pitfalls. This paper explores the evolving relationship among clinician trust in LLMs, the transition of data sources from predominantly human-generated to artificial intelligence (AI)-generated content, and the subsequent impact on the performance of LLMs and clinician competence. One of the primary concerns identified in this paper is the LLMs' self-referential learning loops, where AI-generated content feeds into the learning algorithms, threatening the diversity of the data pool, potentially entrenching biases, and reducing the efficacy of LLMs. While theoretical at this stage, this feedback loop poses a significant challenge as the integration of LLMs in health care deepens, emphasizing the need for proactive dialogue and strategic measures to ensure the safe and effective use of LLM technology. Another key takeaway from our investigation is the role of user expertise and the necessity for a discerning approach to trusting and validating LLM outputs. The paper highlights how expert users, particularly clinicians, can leverage LLMs to enhance productivity by off-loading routine tasks while maintaining the critical oversight needed to identify and correct potential inaccuracies in AI-generated content. This balance of trust and skepticism is vital for ensuring that LLMs augment rather than undermine the quality of patient care. We also discuss the risks associated with the deskilling of health care professionals. Frequent reliance on LLMs for critical tasks could result in a decline in health care providers' diagnostic and thinking skills, particularly affecting the training and development of future professionals. The legal and ethical considerations surrounding the deployment of LLMs in health care are also examined.
We discuss the medicolegal challenges, including liability in cases of erroneous diagnoses or treatment advice generated by LLMs. The paper references recent legislative efforts, such as The Algorithmic Accountability Act of 2023, as crucial steps toward establishing a framework for the ethical and responsible use of AI-based technologies in health care. In conclusion, this paper advocates for a strategic approach to integrating LLMs into health care. By emphasizing the importance of maintaining clinician expertise, fostering critical engagement with LLM outputs, and navigating the legal and ethical landscape, we can ensure that LLMs serve as valuable tools in enhancing patient care and supporting health care professionals. This approach addresses the immediate challenges posed by integrating LLMs and sets a foundation for their maintainable and responsible use in the future.
Affiliation(s)
- Avishek Choudhury: Industrial and Management Systems Engineering, West Virginia University, Morgantown, WV, United States
- Zaira Chaudhry: Industrial and Management Systems Engineering, West Virginia University, Morgantown, WV, United States
23
Ostrowska M, Kacała P, Onolememen D, Vaughan-Lane K, Sisily Joseph A, Ostrowski A, Pietruszewska W, Banaszewski J, Wróbel MJ. To trust or not to trust: evaluating the reliability and safety of AI responses to laryngeal cancer queries. Eur Arch Otorhinolaryngol 2024 (online ahead of print). PMID: 38652298; DOI: 10.1007/s00405-024-08643-8.
Abstract
PURPOSE As online health information-seeking surges, concerns mount over the quality and safety of accessible content, which can lead to patient harm through misinformation. On the one hand, the emergence of artificial intelligence (AI) in healthcare could help prevent such harm; on the other, questions arise regarding the quality and safety of the medical information AI provides. As laryngeal cancer is a prevalent head and neck malignancy, this study aims to evaluate the utility and safety of three large language models (LLMs) as sources of patient information about laryngeal cancer. METHODS A cross-sectional study was conducted using three LLMs (ChatGPT 3.5, ChatGPT 4.0, and Bard). A questionnaire comprising 36 inquiries about laryngeal cancer was categorised into diagnosis (11 questions), treatment (9 questions), novelties and upcoming treatments (4 questions), controversies (8 questions), and sources of information (4 questions). The reviewers comprised three groups: ENT specialists, junior physicians, and non-medical reviewers, who graded the responses. Each physician evaluated each question twice per model, while non-medical reviewers evaluated each question once. All reviewers were blinded to the model type, and the question order was shuffled. Outcome evaluations were based on a safety score (1-3) and a Global Quality Score (GQS, 1-5). Results were compared between LLMs. The study included iterative assessments and statistical validations. RESULTS Analysis revealed that ChatGPT 3.5 scored highest in both safety (mean: 2.70) and GQS (mean: 3.95). ChatGPT 4.0 and Bard had lower safety scores of 2.56 and 2.42, respectively, with corresponding quality scores of 3.65 and 3.38. Inter-rater reliability was consistent, with less than 3% discrepancy. About 4.2% of responses fell into the lowest safety category (1), particularly in the novelties category. Non-medical reviewers' quality assessments correlated moderately (r = 0.67) with response length.
CONCLUSIONS LLMs can be valuable resources for patients seeking information on laryngeal cancer. ChatGPT 3.5 provided the most reliable and safe responses among the models evaluated.
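The moderate correlation reported above (r = 0.67 between non-medical quality ratings and response length) is a plain Pearson coefficient. A minimal sketch, using made-up GQS ratings and word counts rather than the study's data:

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# hypothetical GQS ratings (1-5) and response lengths in words
gqs = [3, 4, 5, 2, 4, 3, 5, 2]
length_words = [120, 180, 260, 90, 210, 150, 240, 110]
print(round(pearson_r(gqs, length_words), 2))
```

An r near 0.67 would mean longer answers tend to be rated higher, but length explains well under half of the variance in quality scores, consistent with the caution the study urges.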
Affiliation(s)
- Magdalena Ostrowska: Department of Otolaryngology and Laryngological Oncology, Collegium Medicum, Nicolaus Copernicus University in Torun, ul. Marie Sklodowskiej-Curie 9, 85-094 Bydgoszcz, Poland
- Paulina Kacała: ENT Scientific Club, Department of Otolaryngology and Laryngological Oncology, Collegium Medicum, Nicolaus Copernicus University in Torun, ul. Marie Sklodowskiej-Curie 9, 85-094 Bydgoszcz, Poland
- Deborah Onolememen: ENT Scientific Club, Department of Otolaryngology and Laryngological Oncology, Collegium Medicum, Nicolaus Copernicus University in Torun, ul. Marie Sklodowskiej-Curie 9, 85-094 Bydgoszcz, Poland
- Katie Vaughan-Lane: ENT Scientific Club, Department of Otolaryngology and Laryngological Oncology, Collegium Medicum, Nicolaus Copernicus University in Torun, ul. Marie Sklodowskiej-Curie 9, 85-094 Bydgoszcz, Poland
- Anitta Sisily Joseph: ENT Scientific Club, Department of Otolaryngology and Laryngological Oncology, Collegium Medicum, Nicolaus Copernicus University in Torun, ul. Marie Sklodowskiej-Curie 9, 85-094 Bydgoszcz, Poland
- Adam Ostrowski: Department of Urology, Collegium Medicum, Nicolaus Copernicus University in Torun, ul. Marie Sklodowskiej-Curie 9, 85-094 Bydgoszcz, Poland
- Wioletta Pietruszewska: Department of Otolaryngology, Laryngological Oncology, Audiology and Phoniatrics, Medical University of Lodz, ul. Żeromskiego 113, 90-549 Lodz, Poland
- Jacek Banaszewski: Department of Otolaryngology, Head and Neck Oncology, Poznan University of Medical Science, ul. Przybyszewskiego 49, 60-355 Poznań, Poland
- Maciej J Wróbel: Department of Otolaryngology and Laryngological Oncology, Collegium Medicum, Nicolaus Copernicus University in Torun, ul. Marie Sklodowskiej-Curie 9, 85-094 Bydgoszcz, Poland
24
Moise A, Centomo-Bozzo A, Orishchak O, Alnoury MK, Daniel SJ. Can ChatGPT Replace an Otolaryngologist in Guiding Parents on Tonsillectomy? Ear Nose Throat J 2024 (online ahead of print). PMID: 38563440; DOI: 10.1177/01455613241230841.
Abstract
Background: ChatGPT is an artificial intelligence tool that utilizes machine learning to analyze and generate human-like text. Its user-friendly accessibility enables patients to access medical information conveniently, without grappling with intricate terminology. The objective of this study was to assess the accuracy of ChatGPT in providing insights into the indications for tonsillectomy, a common pediatric otolaryngology procedure, and the management of its complications. Methods: The responses generated by ChatGPT were compared to the "Clinical practice guidelines: tonsillectomy in children-executive summary" developed by the American Academy of Otolaryngology-Head and Neck Surgery Foundation (AAO-HNSF). Predetermined questions regarding indications for tonsillectomy and post-tonsillectomy complications were presented to ChatGPT, and 2 otolaryngology experts compared its responses with the established guideline. The responses of both parties were reviewed by the senior author. Results: A total of 16 responses generated by ChatGPT were assessed. After a comprehensive review, 15 of 16 (93.8%) responses demonstrated a high degree of reliability and accuracy, closely adhering to the standard established by the AAO-HNSF guideline. Conclusion: The results support the potential of ChatGPT to enhance healthcare delivery by making guidelines more accessible to patients, while also emphasizing the importance of ensuring that the medical advice provided to patients is accurate and reliable.
Affiliation(s)
- Alexander Moise: Faculty of Medicine and Health Sciences, McGill University, Montreal, QC, Canada
- Adam Centomo-Bozzo: Faculty of Dental Medicine and Oral Health Sciences, McGill University, Montreal, QC, Canada
- Ostap Orishchak: Department of Pediatric Otolaryngology, Montreal Children's Hospital, Montreal, QC, Canada
- Mohammed K Alnoury: Department of Otolaryngology-Head and Neck Surgery, King Abdulaziz University, Jeddah, Saudi Arabia
- Sam J Daniel: Department of Pediatric Otolaryngology, Montreal Children's Hospital, Montreal, QC, Canada
25
Wimbarti S, Kairupan BHR, Tallei TE. Critical review of self-diagnosis of mental health conditions using artificial intelligence. Int J Ment Health Nurs 2024; 33:344-358. PMID: 38345132; DOI: 10.1111/inm.13303.
Abstract
The advent of artificial intelligence (AI) has revolutionised various aspects of our lives, including mental health nursing. AI-driven tools and applications have provided a convenient and accessible means for individuals to assess their mental well-being within the confines of their homes. Nonetheless, the widespread trend of self-diagnosing mental health conditions through AI poses considerable risks. This review article examines the perils associated with relying on AI for self-diagnosis in mental health, highlighting the constraints and possible adverse outcomes that can arise from such practices. It delves into the ethical, psychological, and social implications, underscoring the vital role of mental health professionals, including psychologists, psychiatrists, and nursing specialists, in providing professional assistance and guidance. This article aims to highlight the importance of seeking professional assistance and guidance in addressing mental health concerns, especially in the era of AI-driven self-diagnosis.
Affiliation(s)
- Supra Wimbarti: Faculty of Psychology, Universitas Gadjah Mada, Yogyakarta, Indonesia
- B H Ralph Kairupan: Department of Psychiatry, Faculty of Medicine, Sam Ratulangi University, Manado, North Sulawesi, Indonesia
- Trina Ekawati Tallei: Department of Biology, Faculty of Mathematics and Natural Sciences, and Department of Biology, Faculty of Medicine, Sam Ratulangi University, Manado, North Sulawesi, Indonesia
26
Parikh AO, Oca MC, Conger JR, McCoy A, Chang J, Zhang-Nunes S. Accuracy and Bias in Artificial Intelligence Chatbot Recommendations for Oculoplastic Surgeons. Cureus 2024; 16:e57611. PMID: 38707042; PMCID: PMC11069401; DOI: 10.7759/cureus.57611.
Abstract
Purpose The purpose of this study is to assess the accuracy of, and bias in, recommendations for oculoplastic surgeons from three artificial intelligence (AI) chatbot systems. Methods ChatGPT, Microsoft Bing Balanced, and Google Bard were asked for recommendations for oculoplastic surgeons practicing in the 20 most populous cities in the United States. Three prompts were used: "can you help me find (an oculoplastic surgeon)/(a doctor who does eyelid lifts)/(an oculofacial plastic surgeon) in (city)." Results A total of 672 suggestions were made across the three prompts; 19.8% of suggestions were excluded, leaving 539 suggested physicians. Of these, 64.1% were oculoplastics specialists (of whom 70.1% were American Society of Ophthalmic Plastic and Reconstructive Surgery (ASOPRS) members); 16.1% were general plastic surgery trained, 9.0% were ENT trained, 8.8% were ophthalmology but not oculoplastics trained, and 1.9% were trained in another specialty. Across all AI systems, 27.7% of recommended physicians were female. Conclusions Among the chatbot systems tested, rates of inaccuracy were high: up to 38% of recommended surgeons were nonexistent or not practicing in the city requested, and 35.9% of those recommended as oculoplastic/oculofacial plastic surgeons were not oculoplastics specialists. The choice of prompt affected the results, with requests for "a doctor who does eyelid lifts" returning more plastic surgeons and ENTs and fewer oculoplastic surgeons. Identifying inaccuracies and biases in AI-generated recommendations is important as more patients begin using these systems to choose a surgeon.
Affiliation(s)
- Alomi O Parikh: Ophthalmology, USC Roski Eye Institute, Keck School of Medicine, University of Southern California, Los Angeles, USA
- Michael C Oca: Ophthalmology, University of California San Diego School of Medicine, La Jolla, USA
- Jordan R Conger: Oculofacial Plastic Surgery, USC Roski Eye Institute, Keck School of Medicine, University of Southern California, Los Angeles, USA
- Allison McCoy: Oculofacial Plastic Surgery, Del Mar Plastic Surgery, San Diego, USA
- Jessica Chang: Oculofacial Plastic Surgery, USC Roski Eye Institute, Keck School of Medicine, University of Southern California, Los Angeles, USA
- Sandy Zhang-Nunes: Ophthalmology, USC Roski Eye Institute, Keck School of Medicine, University of Southern California, Los Angeles, USA
27
Xue Z, Zhang Y, Gan W, Wang H, She G, Zheng X. Quality and Dependability of ChatGPT and DingXiangYuan Forums for Remote Orthopedic Consultations: Comparative Analysis. J Med Internet Res 2024; 26:e50882. PMID: 38483451; PMCID: PMC10979330; DOI: 10.2196/50882.
Abstract
BACKGROUND The widespread use of artificial intelligence, such as ChatGPT (OpenAI), is transforming sectors, including health care, while separate advancements of the internet have enabled platforms such as China's DingXiangYuan to offer remote medical services. OBJECTIVE This study evaluates ChatGPT-4's responses against those of professional health care providers in telemedicine, assessing artificial intelligence's capability to support the surge in remote medical consultations and its impact on health care delivery. METHODS We sourced remote orthopedic consultations from "Doctor DingXiang," with responses from its certified physicians as the control group and ChatGPT's responses as the experimental group. In all, 3 blinded, experienced orthopedic surgeons assessed responses against 7 criteria: "logical reasoning," "internal information," "external information," "guiding function," "therapeutic effect," "medical knowledge popularization education," and "overall satisfaction." We used Fleiss κ to measure agreement among multiple raters. RESULTS Initially, consultation records covering 8 conditions (800 cases) were gathered. After primary screening and rescreening, which excluded records containing private information, images, or voice messages, we ultimately included 73 consultation records by May 2023. After statistical scoring, we found that ChatGPT's "internal information" score (mean 4.61, SD 0.52 points vs mean 4.66, SD 0.49 points; P=.43) and "therapeutic effect" score (mean 4.43, SD 0.75 points vs mean 4.55, SD 0.62 points; P=.32) were lower than those of the control group, but the differences were not statistically significant.
ChatGPT showed better performance with a higher "logical reasoning" score (mean 4.81, SD 0.36 points vs mean 4.75, SD 0.39 points; P=.38), "external information" score (mean 4.06, SD 0.72 points vs mean 3.92, SD 0.77 points; P=.25), and "guiding function" score (mean 4.73, SD 0.51 points vs mean 4.72, SD 0.54 points; P=.96), although the differences were not statistically significant. Meanwhile, the "medical knowledge popularization education" score of ChatGPT was better than that of the control group (mean 4.49, SD 0.67 points vs mean 3.87, SD 1.01 points; P<.001), and this difference was statistically significant. In terms of "overall satisfaction," the difference was not statistically significant between the groups (mean 8.35, SD 1.38 points vs mean 8.37, SD 1.24 points; P=.92). Under standard interpretations of Fleiss κ, 6 of the control group's score points displayed "fair agreement" (P<.001) and 1 showed "substantial agreement" (P<.001); in the experimental group, 3 points indicated "fair agreement" and 4 indicated "moderate agreement" (P<.001). CONCLUSIONS ChatGPT-4 matches the expertise found in DingXiangYuan forums' paid consultations, excelling particularly in scientific education. It presents a promising alternative for remote health advice. For health care professionals, it could act as an aid in patient education, while patients may use it as a convenient tool for health inquiries.
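Fleiss κ, the agreement statistic used above, extends Cohen's κ from two raters to any fixed number of raters. A minimal sketch of the standard formula follows; the rating table is hypothetical, not the study's data.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a subjects-by-categories table of rating counts.

    counts[i][j] = number of raters assigning subject i to category j;
    every row must sum to the same number of raters m.
    """
    n = len(counts)       # number of subjects
    m = sum(counts[0])    # raters per subject
    k = len(counts[0])    # number of categories
    total = n * m
    # overall proportion of ratings falling in each category
    p = [sum(counts[i][j] for i in range(n)) / total for j in range(k)]
    # per-subject observed agreement among the m raters
    agree = [(sum(c * c for c in row) - m) / (m * (m - 1)) for row in counts]
    p_bar = sum(agree) / n                  # mean observed agreement
    p_exp = sum(pj * pj for pj in p)        # agreement expected by chance
    return (p_bar - p_exp) / (1 - p_exp)

# hypothetical: 4 consultations scored into 3 ordinal bins by 3 raters
table = [
    [3, 0, 0],
    [0, 2, 1],
    [1, 1, 1],
    [0, 0, 3],
]
print(round(fleiss_kappa(table), 3))
```

Values of κ in roughly 0.21-0.40 are conventionally read as "fair agreement" and 0.41-0.60 as "moderate agreement" (Landis and Koch), which is the interpretation scheme the abstract appears to follow.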
Affiliation(s)
- Zhaowen Xue, Yiming Zhang, Wenyi Gan, Huajun Wang, Guorong She, Xiaofei Zheng: Department of Bone and Joint Surgery and Sports Medicine Center, The First Affiliated Hospital of Jinan University, Guangzhou, China
28
Akinci D’Antonoli T, Stanzione A, Bluethgen C, Vernuccio F, Ugga L, Klontzas ME, Cuocolo R, Cannella R, Koçak B. Large language models in radiology: fundamentals, applications, ethical considerations, risks, and future directions. Diagn Interv Radiol 2024; 30:80-90. PMID: 37789676; PMCID: PMC10916534; DOI: 10.4274/dir.2023.232417.
Abstract
With the advent of large language models (LLMs), the artificial intelligence revolution in medicine and radiology is now more tangible than ever. Every day, an increasingly large number of articles are published that utilize LLMs in radiology. To adopt and safely implement this new technology in the field, radiologists should be familiar with its key concepts, understand at least the technical basics, and be aware of the potential risks and ethical considerations that come with it. In this review article, the authors provide an overview of the LLMs that might be relevant to the radiology community and include a brief discussion of their short history, technical basics, ChatGPT, prompt engineering, potential applications in medicine and radiology, advantages, disadvantages and risks, ethical and regulatory considerations, and future directions.
Affiliation(s)
- Tugba Akinci D’Antonoli: Institute of Radiology and Nuclear Medicine, Cantonal Hospital Baselland, Liestal, Switzerland
- Arnaldo Stanzione: Department of Advanced Biomedical Sciences, University of Naples Federico II, Naples, Italy
- Christian Bluethgen: Institute for Diagnostic and Interventional Radiology, University Hospital Zurich, University of Zurich, Zurich, Switzerland
- Lorenzo Ugga: Department of Advanced Biomedical Sciences, University of Naples Federico II, Naples, Italy
- Michail E. Klontzas: Department of Medical Imaging, University Hospital of Heraklion, Crete, Greece; Department of Radiology, University of Crete, Heraklion, Crete, Greece; Computational Biomedicine Laboratory, Institute of Computer Science, FORTH, Heraklion, Crete, Greece
- Renato Cuocolo: Department of Medicine, Surgery and Dentistry, University of Salerno, Baronissi, Italy
- Roberto Cannella: Department of Biomedicine, Neuroscience and Advanced Diagnostics, Section of Radiology, University of Palermo, Palermo, Italy
- Burak Koçak: Clinic of Radiology, Basakşehir Çam and Sakura City Hospital, University of Health Sciences, İstanbul, Türkiye
29
Spotnitz M, Idnay B, Gordon ER, Shyu R, Zhang G, Liu C, Cimino JJ, Weng C. A Survey of Clinicians' Views of the Utility of Large Language Models. Appl Clin Inform 2024; 15:306-312. PMID: 38442909; PMCID: PMC11023712; DOI: 10.1055/a-2281-7092.
Abstract
OBJECTIVES Large language models (LLMs) such as ChatGPT (Chat Generative Pre-trained Transformer) are powerful algorithms that have been shown to produce human-like text from input data. Several potential clinical applications of this technology have been proposed and evaluated by biomedical informatics experts. However, few have surveyed health care providers for their opinions about whether the technology is fit for use. METHODS We distributed a validated mixed-methods survey to gauge practicing clinicians' comfort with LLMs for a breadth of tasks in clinical practice, research, and education, which were selected from the literature. RESULTS A total of 30 clinicians fully completed the survey. Of the 23 tasks, 16 were rated positively by more than 50% of the respondents. Based on our qualitative analysis, health care providers considered LLMs to have excellent synthesis skills and efficiency. However, our respondents had concerns that LLMs could generate false information and propagate training data bias. Survey respondents were most comfortable with scenarios that allow LLMs to function in an assistive role, like a physician extender or trainee. CONCLUSION In a mixed-methods survey of clinicians about LLM use, health care providers were encouraging of having LLMs in health care for many tasks, especially in assistive roles. There is a need for continued human-centered development of both LLMs and artificial intelligence in general.
Affiliation(s)
- Matthew Spotnitz: Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, New York, United States
- Betina Idnay: Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, New York, United States
- Emily R. Gordon: Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, New York, United States; Department of Dermatology, Vagelos College of Physicians and Surgeons, Columbia University Irving Medical Center, New York, New York, United States
- Rebecca Shyu: Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, New York, United States
- Gongbo Zhang: Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, New York, United States
- Cong Liu: Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, New York, United States
- James J. Cimino: Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, New York, United States; Department of Biomedical Informatics and Data Science, Informatics Institute, Heersink School of Medicine, University of Alabama at Birmingham, Birmingham, Alabama, United States
- Chunhua Weng: Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, New York, United States
30
Monteith S, Glenn T, Geddes JR, Whybrow PC, Achtyes ED, Bauer M. Implications of Online Self-Diagnosis in Psychiatry. Pharmacopsychiatry 2024; 57:45-52. [PMID: 38471511] [DOI: 10.1055/a-2268-5441]
Abstract
Online self-diagnosis of psychiatric disorders by the general public is increasing. The reasons for the increase include the expansion of Internet technologies and the use of social media, the rapid growth of direct-to-consumer e-commerce in healthcare, and the increased emphasis on patient involvement in decision making. The publicity given to artificial intelligence (AI) has also contributed to the increased use of online screening tools by the general public. This paper aims to review factors contributing to the expansion of online self-diagnosis by the general public, and discuss both the risks and benefits of online self-diagnosis of psychiatric disorders. A narrative review was performed with examples obtained from the scientific literature and commercial articles written for the general public. Online self-diagnosis of psychiatric disorders is growing rapidly. Some people with a positive result on a screening tool will seek professional help. However, there are many potential risks for patients who self-diagnose, including an incorrect or dangerous diagnosis, increased patient anxiety about the diagnosis, obtaining unfiltered advice on social media, using the self-diagnosis to self-treat, including online purchase of medications without a prescription, and technical issues including the loss of privacy. Physicians need to be aware of the increase in self-diagnosis by the general public and the potential risks, both medical and technical. Psychiatrists must recognize that the general public is often unaware of the challenging medical and technical issues involved in the diagnosis of a mental disorder, and be ready to treat patients who have already obtained an online self-diagnosis.
Affiliation(s)
- Scott Monteith: Michigan State University College of Human Medicine, Traverse City Campus, Traverse City, Michigan, USA
- Tasha Glenn: ChronoRecord Association, Fullerton, California, USA
- John R Geddes: Department of Psychiatry, University of Oxford, Warneford Hospital, Oxford, UK
- Peter C Whybrow: Department of Psychiatry and Biobehavioral Sciences, Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles (UCLA), Los Angeles, California, USA
- Eric D Achtyes: Department of Psychiatry, Western Michigan University Homer Stryker M.D. School of Medicine, Kalamazoo, Michigan, USA
- Michael Bauer: Department of Psychiatry and Psychotherapy, University Hospital Carl Gustav Carus Medical Faculty, Technische Universität Dresden, Dresden, Germany
31
Rouhi AD, Ghanem YK, Yolchieva L, Saleh Z, Joshi H, Moccia MC, Suarez-Pierre A, Han JJ. Can Artificial Intelligence Improve the Readability of Patient Education Materials on Aortic Stenosis? A Pilot Study. Cardiol Ther 2024; 13:137-147. [PMID: 38194058] [PMCID: PMC10899139] [DOI: 10.1007/s40119-023-00347-0]
Abstract
INTRODUCTION The advent of generative artificial intelligence (AI) dialogue platforms and large language models (LLMs) may help facilitate ongoing efforts to improve health literacy. Additionally, recent studies have highlighted inadequate health literacy among patients with cardiac disease. The aim of the present study was to ascertain whether two freely available generative AI dialogue platforms could rewrite online aortic stenosis (AS) patient education materials (PEMs) to meet recommended reading skill levels for the public. METHODS Online PEMs were gathered from a professional cardiothoracic surgical society and academic institutions in the USA. PEMs were then inputted into two AI-powered LLMs, ChatGPT-3.5 and Bard, with the prompt "translate to 5th-grade reading level". Readability of PEMs before and after AI conversion was measured using the validated Flesch Reading Ease (FRE), Flesch-Kincaid Grade Level (FKGL), Simple Measure of Gobbledygook Index (SMOGI), and Gunning-Fog Index (GFI) scores. RESULTS Overall, 21 PEMs on AS were gathered. Original readability measures indicated difficult readability at the 10th-12th grade reading level. ChatGPT-3.5 successfully improved readability across all four measures (p < 0.001) to the approximately 6th-7th grade reading level. Bard successfully improved readability across all measures (p < 0.001) except for SMOGI (p = 0.729) to the approximately 8th-9th grade level. Neither platform generated PEMs written below the recommended 6th-grade reading level. ChatGPT-3.5 demonstrated significantly more favorable post-conversion readability scores, percentage change in readability scores, and conversion time compared to Bard (all p < 0.001). CONCLUSION AI dialogue platforms can enhance the readability of PEMs for patients with AS but may not fully meet recommended reading skill levels, highlighting potential tools to help strengthen cardiac health literacy in the future.
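For readers unfamiliar with the readability formulas named in this abstract, the Flesch-Kincaid Grade Level (FKGL) is a simple function of average sentence length and average syllables per word. The following is a minimal illustrative sketch, not the validated implementation the study used; in particular, the regex-based syllable counter is a rough stand-in for the dictionary-backed counters real readability tools rely on:

```python
import re

def count_syllables(word: str) -> int:
    # Rough heuristic: one syllable per contiguous vowel group.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fkgl(text: str) -> float:
    # Flesch-Kincaid Grade Level:
    #   0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words))
            - 15.59)
```

Higher scores correspond to higher U.S. school grade levels, which is why a prompt like "translate to 5th-grade reading level" is, in effect, asking the model to push this score down.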
Affiliation(s)
- Armaun D Rouhi: Department of Surgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Yazid K Ghanem: Department of Surgery, Cooper University Hospital, Camden, NJ, USA
- Laman Yolchieva: College of Arts and Sciences, University of Pennsylvania, Philadelphia, PA, USA
- Zena Saleh: Department of Surgery, Cooper University Hospital, Camden, NJ, USA
- Hansa Joshi: Department of Surgery, Cooper University Hospital, Camden, NJ, USA
- Matthew C Moccia: Department of Surgery, Cooper University Hospital, Camden, NJ, USA
- Jason J Han: Division of Cardiovascular Surgery, Department of Surgery, Perelman School of Medicine, Hospital of the University of Pennsylvania, Philadelphia, PA, USA
32
Sallam M, Barakat M, Sallam M. A Preliminary Checklist (METRICS) to Standardize the Design and Reporting of Studies on Generative Artificial Intelligence-Based Models in Health Care Education and Practice: Development Study Involving a Literature Review. Interact J Med Res 2024; 13:e54704. [PMID: 38276872] [PMCID: PMC10905357] [DOI: 10.2196/54704]
Abstract
BACKGROUND Adherence to evidence-based practice is indispensable in health care. Recently, the utility of generative artificial intelligence (AI) models in health care has been evaluated extensively. However, the lack of consensus guidelines on the design and reporting of findings of these studies poses a challenge for the interpretation and synthesis of evidence. OBJECTIVE This study aimed to develop a preliminary checklist to standardize the reporting of generative AI-based studies in health care education and practice. METHODS A literature review was conducted in Scopus, PubMed, and Google Scholar. Published records with "ChatGPT," "Bing," or "Bard" in the title were retrieved. Careful examination of the methodologies employed in the included records was conducted to identify the common pertinent themes and the possible gaps in reporting. A panel discussion was held to establish a unified and thorough checklist for the reporting of AI studies in health care. The finalized checklist was used to evaluate the included records by 2 independent raters, with Cohen κ used to evaluate the interrater reliability. RESULTS The final data set that formed the basis for pertinent theme identification and analysis comprised a total of 34 records. The finalized checklist included 9 pertinent themes collectively referred to as METRICS (Model, Evaluation, Timing/Transparency, Range/Randomization, Individual factors, Count, and Specificity of prompts and language). Their details are as follows: (1) Model used and its exact settings; (2) Evaluation approach for the generated content; (3) Timing of testing the model; (4) Transparency of the data source; (5) Range of tested topics; (6) Randomization of selecting the queries; (7) Individual factors in selecting the queries and interrater reliability; (8) Count of queries executed to test the model; and (9) Specificity of the prompts and language used. The overall mean METRICS score was 3.0 (SD 0.58).
Interrater reliability was acceptable, with Cohen κ ranging from 0.558 to 0.962 (P<.001 for the 9 tested items). Per item, the highest average METRICS score was recorded for the "Model" item, followed by the "Specificity" item, while the lowest scores were recorded for the "Randomization" item (classified as suboptimal) and the "Individual factors" item (classified as satisfactory). CONCLUSIONS The METRICS checklist can facilitate the design of studies, guiding researchers toward best practices in reporting results. The findings highlight the need for standardized reporting algorithms for generative AI-based studies in health care, considering the variability observed in methodologies and reporting. The proposed METRICS checklist could be a helpful preliminary base for establishing a universally accepted approach to standardize the design and reporting of generative AI-based studies in health care, which is a swiftly evolving research topic.
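Cohen κ, the interrater-reliability statistic reported in this abstract, corrects the raters' observed agreement for the agreement expected by chance from each rater's marginal category frequencies. A minimal sketch for two raters scoring the same items on a nominal scale (illustrative only, not the study's code):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    # kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement
    # and p_e is chance agreement from each rater's marginal frequencies.
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    ca, cb = Counter(rater_a), Counter(rater_b)
    p_e = sum(ca[c] * cb[c] for c in ca) / (n * n)
    if p_e == 1:  # degenerate case: both raters used one identical category
        return 1.0
    return (p_o - p_e) / (1 - p_e)
```

On common benchmarks, the reported range of 0.558 to 0.962 spans moderate to almost perfect agreement.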
Affiliation(s)
- Malik Sallam: Department of Pathology, Microbiology and Forensic Medicine, School of Medicine, The University of Jordan, Amman, Jordan; Department of Clinical Laboratories and Forensic Medicine, Jordan University Hospital, Amman, Jordan; Department of Translational Medicine, Faculty of Medicine, Lund University, Malmo, Sweden
- Muna Barakat: Department of Clinical Pharmacy and Therapeutics, Faculty of Pharmacy, Applied Science Private University, Amman, Jordan
- Mohammed Sallam: Department of Pharmacy, Mediclinic Parkview Hospital, Mediclinic Middle East, Dubai, United Arab Emirates
33
Bilal M, Jamil Y, Rana D, Shah HH. Enhancing Awareness and Self-diagnosis of Obstructive Sleep Apnea Using AI-Powered Chatbots: The Role of ChatGPT in Revolutionizing Healthcare. Ann Biomed Eng 2024; 52:136-138. [PMID: 37389659] [DOI: 10.1007/s10439-023-03298-8]
Abstract
Since OpenAI (San Francisco, CA) released its generative AI chatbot, ChatGPT, we are on the cusp of technological transformation. The tool is capable of generating text according to the input that the user adds to it. Due to its ability to imitate human speech tone while extracting encyclopedic knowledge, ChatGPT can be a platform for personalized patient interaction. Thus, it has the potential to revolutionize the healthcare system. Our study aims to evaluate how ChatGPT can answer the queries of patients suffering from obstructive sleep apnea and aid in self-diagnosis. By analyzing symptoms and guiding patients' behavior toward prevention, ChatGPT can play a major role in avoiding serious health repercussions that develop in the later course of obstructive sleep apnea.
Affiliation(s)
- Maham Bilal: Dow Medical College, Dow University of Health Sciences, Karachi, Pakistan
- Yumna Jamil: Dow Medical College, Dow University of Health Sciences, Karachi, Pakistan
- Dua Rana: Dow Medical College, Dow University of Health Sciences, Karachi, Pakistan
34
Alanezi F. Assessing the Effectiveness of ChatGPT in Delivering Mental Health Support: A Qualitative Study. J Multidiscip Healthc 2024; 17:461-471. [PMID: 38314011] [PMCID: PMC10838501] [DOI: 10.2147/jmdh.s447368]
Abstract
Background Artificial Intelligence (AI) applications are widely researched for their potential in effectively improving the healthcare operations and disease management. However, the research trend shows that these applications also have significant negative implications on the service delivery. Purpose To assess the use of ChatGPT for mental health support. Methods Due to the novelty and unfamiliarity of the ChatGPT technology, a quasi-experimental design was chosen for this study. Outpatients from a public hospital were included in the sample. A two-week experiment followed by semi-structured interviews was conducted in which participants used ChatGPT for mental health support. Semi-structured interviews were conducted with 24 individuals with mental health conditions. Results Eight positive factors (psychoeducation, emotional support, goal setting and motivation, referral and resource information, self-assessment and monitoring, cognitive behavioral therapy, crisis interventions, and psychotherapeutic exercises) and four negative factors (ethical and legal considerations, accuracy and reliability, limited assessment capabilities, and cultural and linguistic considerations) were associated with the use of ChatGPT for mental health support. Conclusion It is important to carefully consider the ethical, reliability, accuracy, and legal challenges and develop appropriate strategies to mitigate them in order to ensure safe and effective use of AI-based applications like ChatGPT in mental health support.
Affiliation(s)
- Fahad Alanezi: College of Business Administration, Department of Management Information Systems, Imam Abdulrahman Bin Faisal University, Dammam, 31441, Saudi Arabia
35
Abdaljaleel M, Barakat M, Alsanafi M, Salim NA, Abazid H, Malaeb D, Mohammed AH, Hassan BAR, Wayyes AM, Farhan SS, Khatib SE, Rahal M, Sahban A, Abdelaziz DH, Mansour NO, AlZayer R, Khalil R, Fekih-Romdhane F, Hallit R, Hallit S, Sallam M. A multinational study on the factors influencing university students' attitudes and usage of ChatGPT. Sci Rep 2024; 14:1983. [PMID: 38263214] [PMCID: PMC10806219] [DOI: 10.1038/s41598-024-52549-8]
Abstract
Artificial intelligence models, like ChatGPT, have the potential to revolutionize higher education when implemented properly. This study aimed to investigate the factors influencing university students' attitudes and usage of ChatGPT in Arab countries. The survey instrument "TAME-ChatGPT" was administered to 2240 participants from Iraq, Kuwait, Egypt, Lebanon, and Jordan. Of those, 46.8% heard of ChatGPT, and 52.6% used it before the study. The results indicated that a positive attitude and usage of ChatGPT were determined by factors like ease of use, positive attitude towards technology, social influence, perceived usefulness, behavioral/cognitive influences, low perceived risks, and low anxiety. Confirmatory factor analysis indicated the adequacy of the "TAME-ChatGPT" constructs. Multivariate analysis demonstrated that the attitude towards ChatGPT usage was significantly influenced by country of residence, age, university type, and recent academic performance. This study validated "TAME-ChatGPT" as a useful tool for assessing ChatGPT adoption among university students. The successful integration of ChatGPT in higher education relies on the perceived ease of use, perceived usefulness, positive attitude towards technology, social influence, behavioral/cognitive elements, low anxiety, and minimal perceived risks. Policies for ChatGPT adoption in higher education should be tailored to individual contexts, considering the variations in student attitudes observed in this study.
Affiliation(s)
- Maram Abdaljaleel: Department of Pathology, Microbiology and Forensic Medicine, School of Medicine, The University of Jordan, Amman, 11942, Jordan; Department of Clinical Laboratories and Forensic Medicine, Jordan University Hospital, Amman, 11942, Jordan
- Muna Barakat: Department of Clinical Pharmacy and Therapeutics, Faculty of Pharmacy, Applied Science Private University, Amman, 11931, Jordan
- Mariam Alsanafi: Department of Pharmacy Practice, Faculty of Pharmacy, Kuwait University, Kuwait City, Kuwait; Department of Pharmaceutical Sciences, Public Authority for Applied Education and Training, College of Health Sciences, Safat, Kuwait
- Nesreen A Salim: Prosthodontic Department, School of Dentistry, The University of Jordan, Amman, 11942, Jordan; Prosthodontic Department, Jordan University Hospital, Amman, 11942, Jordan
- Husam Abazid: Department of Clinical Pharmacy and Therapeutics, Faculty of Pharmacy, Applied Science Private University, Amman, 11931, Jordan
- Diana Malaeb: College of Pharmacy, Gulf Medical University, P.O. Box 4184, Ajman, United Arab Emirates
- Ali Haider Mohammed: School of Pharmacy, Monash University Malaysia, Jalan Lagoon Selatan, 47500, Bandar Sunway, Selangor Darul Ehsan, Malaysia
- Sinan Subhi Farhan: Department of Anesthesia, Al Rafidain University College, Baghdad, 10001, Iraq
- Sami El Khatib: Department of Biomedical Sciences, School of Arts and Sciences, Lebanese International University, Bekaa, Lebanon; Center for Applied Mathematics and Bioinformatics (CAMB), Gulf University for Science and Technology (GUST), 32093, Hawally, Kuwait
- Mohamad Rahal: School of Pharmacy, Lebanese International University, Beirut, 961, Lebanon
- Ali Sahban: School of Dentistry, The University of Jordan, Amman, 11942, Jordan
- Doaa H Abdelaziz: Pharmacy Practice and Clinical Pharmacy Department, Faculty of Pharmacy, Future University in Egypt, Cairo, 11835, Egypt; Department of Clinical Pharmacy, Faculty of Pharmacy, Al-Baha University, Al-Baha, Saudi Arabia
- Noha O Mansour: Clinical Pharmacy and Pharmacy Practice Department, Faculty of Pharmacy, Mansoura University, Mansoura, 35516, Egypt; Clinical Pharmacy and Pharmacy Practice Department, Faculty of Pharmacy, Mansoura National University, Dakahlia Governorate, 7723730, Egypt
- Reem AlZayer: Clinical Pharmacy Practice, Department of Pharmacy, Mohammed Al-Mana College for Medical Sciences, 34222, Dammam, Saudi Arabia
- Roaa Khalil: Department of Pathology, Microbiology and Forensic Medicine, School of Medicine, The University of Jordan, Amman, 11942, Jordan
- Feten Fekih-Romdhane: The Tunisian Center of Early Intervention in Psychosis, Department of Psychiatry "Ibn Omrane", Razi Hospital, 2010, Manouba, Tunisia; Faculty of Medicine of Tunis, Tunis El Manar University, Tunis, Tunisia
- Rabih Hallit: School of Medicine and Medical Sciences, Holy Spirit University of Kaslik, Jounieh, Lebanon; Department of Infectious Disease, Bellevue Medical Center, Mansourieh, Lebanon; Department of Infectious Disease, Notre Dame des Secours, University Hospital Center, Byblos, Lebanon
- Souheil Hallit: School of Medicine and Medical Sciences, Holy Spirit University of Kaslik, Jounieh, Lebanon; Research Department, Psychiatric Hospital of the Cross, Jal Eddib, Lebanon
- Malik Sallam: Department of Pathology, Microbiology and Forensic Medicine, School of Medicine, The University of Jordan, Amman, 11942, Jordan; Department of Clinical Laboratories and Forensic Medicine, Jordan University Hospital, Amman, 11942, Jordan
36
Kim R, Margolis A, Barile J, Han K, Kalash S, Papaioannou H, Krevskaya A, Milanaik R. Challenging the Chatbot: An Assessment of ChatGPT's Diagnoses and Recommendations for DBP Case Studies. J Dev Behav Pediatr 2024; 45:e8-e13. [PMID: 38347665] [DOI: 10.1097/dbp.0000000000001255]
Abstract
OBJECTIVE Chat Generative Pretrained Transformer-3.5 (ChatGPT) is a publicly available and free artificial intelligence chatbot that logs billions of visits per day; parents may rely on such tools for developmental and behavioral medical consultations. The objective of this study was to determine how ChatGPT evaluates developmental and behavioral pediatrics (DBP) case studies and makes recommendations and diagnoses. METHODS ChatGPT was asked to list treatment recommendations and a diagnosis for each of 97 DBP case studies. A panel of 3 DBP physicians evaluated ChatGPT's diagnostic accuracy and scored treatment recommendations on accuracy (5-point Likert scale) and completeness (3-point Likert scale). Physicians also assessed whether ChatGPT's treatment plan correctly addressed cultural and ethical issues for relevant cases. Scores were analyzed using Python, and descriptive statistics were computed. RESULTS The DBP panel agreed with ChatGPT's diagnosis for 66.2% of the case reports. The mean accuracy score of ChatGPT's treatment plan was deemed by physicians to be 4.6 (between entirely correct and more correct than incorrect), and the mean completeness was 2.6 (between complete and adequate). Physicians agreed that ChatGPT addressed relevant cultural issues in 10 out of the 11 appropriate cases and the ethical issues in the single ethical case. CONCLUSION While ChatGPT can generate a comprehensive and adequate list of recommendations, the diagnosis accuracy rate is still low. Physicians must advise caution to patients when using such online sources.
Affiliation(s)
- Rachel Kim: Division of Developmental and Behavioral Pediatrics, Steven and Alexandra Cohen Children's Medical Center of New York, Lake Success, NY
37
Pushpanathan K, Lim ZW, Er Yew SM, Chen DZ, Hui'En Lin HA, Lin Goh JH, Wong WM, Wang X, Jin Tan MC, Chang Koh VT, Tham YC. Popular large language model chatbots' accuracy, comprehensiveness, and self-awareness in answering ocular symptom queries. iScience 2023; 26:108163. [PMID: 37915603] [PMCID: PMC10616302] [DOI: 10.1016/j.isci.2023.108163]
Abstract
In light of growing interest in using emerging large language models (LLMs) for self-diagnosis, we systematically assessed the performance of ChatGPT-3.5, ChatGPT-4.0, and Google Bard in delivering proficient responses to 37 common inquiries regarding ocular symptoms. Responses were masked, randomly shuffled, and then graded by three consultant-level ophthalmologists for accuracy (poor, borderline, good) and comprehensiveness. Additionally, we evaluated the self-awareness capabilities (ability to self-check and self-correct) of the LLM-Chatbots. 89.2% of ChatGPT-4.0 responses were 'good'-rated, outperforming ChatGPT-3.5 (59.5%) and Google Bard (40.5%) significantly (all p < 0.001). All three LLM-Chatbots showed optimal mean comprehensiveness scores as well (ranging from 4.6 to 4.7 out of 5). However, they exhibited subpar to moderate self-awareness capabilities. Our study underscores the potential of ChatGPT-4.0 in delivering accurate and comprehensive responses to ocular symptom inquiries. Future rigorous validation of their performance is crucial to ensure their reliability and appropriateness for actual clinical use.
Affiliation(s)
- Krithi Pushpanathan: Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore; Centre for Innovation and Precision Eye Health & Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Zhi Wei Lim: Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Samantha Min Er Yew: Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore; Centre for Innovation and Precision Eye Health & Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- David Ziyou Chen: Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore; Centre for Innovation and Precision Eye Health & Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore; Department of Ophthalmology, National University Hospital, Singapore, Singapore
- Hazel Anne Hui'En Lin: Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore; Centre for Innovation and Precision Eye Health & Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore; Department of Ophthalmology, National University Hospital, Singapore, Singapore
- Jocelyn Hui Lin Goh: Singapore Eye Research Institute, Singapore National Eye Centre, Singapore, Singapore
- Wendy Meihua Wong: Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore; Centre for Innovation and Precision Eye Health & Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore; Department of Ophthalmology, National University Hospital, Singapore, Singapore
- Xiaofei Wang: Key Laboratory for Biomechanics and Mechanobiology of Ministry of Education, Beijing, China; Advanced Innovation Centre for Biomedical Engineering, School of Biological Science and Medical Engineering, Beihang University, Beijing, China
- Marcus Chun Jin Tan: Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore; Centre for Innovation and Precision Eye Health & Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore; Department of Ophthalmology, National University Hospital, Singapore, Singapore
- Victor Teck Chang Koh: Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore; Centre for Innovation and Precision Eye Health & Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore; Department of Ophthalmology, National University Hospital, Singapore, Singapore
- Yih-Chung Tham: Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore; Centre for Innovation and Precision Eye Health & Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore; Singapore Eye Research Institute, Singapore National Eye Centre, Singapore, Singapore; Ophthalmology and Visual Sciences Academic Clinical Programme (Eye ACP), Duke NUS Medical School, Singapore, Singapore
38
Sallam M, Barakat M, Sallam M. Pilot Testing of a Tool to Standardize the Assessment of the Quality of Health Information Generated by Artificial Intelligence-Based Models. Cureus 2023; 15:e49373. [PMID: 38024074] [PMCID: PMC10674084] [DOI: 10.7759/cureus.49373]
Abstract
Background Artificial intelligence (AI)-based conversational models, such as Chat Generative Pre-trained Transformer (ChatGPT), Microsoft Bing, and Google Bard, have emerged as valuable sources of health information for lay individuals. However, the accuracy of the information provided by these AI models remains a significant concern. This pilot study aimed to test a new tool with key themes for inclusion as follows: Completeness of content, Lack of false information in the content, Evidence supporting the content, Appropriateness of the content, and Relevance, referred to as "CLEAR", designed to assess the quality of health information delivered by AI-based models. Methods Tool development involved a literature review on health information quality, followed by the initial establishment of the CLEAR tool, which comprised five items that aimed to assess the following: completeness, lack of false information, evidence support, appropriateness, and relevance. Each item was scored on a five-point Likert scale from excellent to poor. Content validity was checked by expert review. Pilot testing involved 32 healthcare professionals using the CLEAR tool to assess content on eight different health topics deliberately designed with varying qualities. The internal consistency was checked with Cronbach's alpha (α). Feedback from the pilot test resulted in language modifications to improve the clarity of the items. The final CLEAR tool was used to assess the quality of health information generated by four distinct AI models on five health topics. The AI models were ChatGPT 3.5, ChatGPT 4, Microsoft Bing, and Google Bard, and the content generated was scored by two independent raters with Cohen's kappa (κ) for inter-rater agreement. 
Results The final five CLEAR items were: (1) Is the content sufficient?; (2) Is the content accurate?; (3) Is the content evidence-based?; (4) Is the content clear, concise, and easy to understand?; and (5) Is the content free from irrelevant information? Pilot testing on the eight health topics revealed acceptable internal consistency with a Cronbach's α range of 0.669-0.981. The use of the final CLEAR tool yielded the following average scores: Microsoft Bing (mean=24.4±0.42), ChatGPT-4 (mean=23.6±0.96), Google Bard (mean=21.2±1.79), and ChatGPT-3.5 (mean=20.6±5.20). The inter-rater agreement revealed the following Cohen κ values: for ChatGPT-3.5 (κ=0.875, P<.001), ChatGPT-4 (κ=0.780, P<.001), Microsoft Bing (κ=0.348, P=.037), and Google Bard (κ=.749, P<.001). Conclusions The CLEAR tool is a brief yet helpful tool that can aid in standardizing testing of the quality of health information generated by AI-based models. Future studies are recommended to validate the utility of the CLEAR tool in the quality assessment of AI-generated health-related content using a larger sample across various complex health topics.
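Cronbach's α, used in this abstract to check the internal consistency of the five CLEAR items, compares the sum of the per-item score variances against the variance of the total score. A minimal illustrative sketch using population variances (not the study's code; real analyses typically use a statistics package):

```python
def cronbach_alpha(items):
    # items: one list per scale item, each holding one score per respondent.
    # alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores))
    k, n = len(items), len(items[0])

    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    totals = [sum(item[i] for item in items) for i in range(n)]
    return (k / (k - 1)) * (1 - sum(var(it) for it in items) / var(totals))
```

When items rise and fall together across respondents, α approaches 1; the 0.669-0.981 range reported above indicates acceptable to excellent consistency across the eight pilot topics.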
Affiliation(s)
- Malik Sallam: Department of Pathology, Microbiology, and Forensic Medicine, School of Medicine, University of Jordan, Amman, JOR; Department of Clinical Laboratories and Forensic Medicine, Jordan University Hospital, Amman, JOR
- Muna Barakat: Department of Clinical Pharmacy and Therapeutics, School of Pharmacy, Applied Science Private University, Amman, JOR; Department of Research, Middle East University, Amman, JOR
- Mohammed Sallam: Department of Pharmacy, Mediclinic Parkview Hospital, Mediclinic Middle East, Dubai, ARE
|
39
|
Islam MR, Urmi TJ, Mosharrafa RA, Rahman MS, Kadir MF. Role of ChatGPT in health science and research: A correspondence addressing potential application. Health Sci Rep 2023; 6:e1625. [PMID: 37841943 PMCID: PMC10568002 DOI: 10.1002/hsr2.1625] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 06/22/2023] [Revised: 09/01/2023] [Accepted: 09/28/2023] [Indexed: 10/17/2023] Open
Affiliation(s)
- Md. Rabiul Islam
- School of Pharmacy, BRAC University, Dhaka, Bangladesh
- Department of Pharmacy, University of Asia Pacific, Dhaka, Bangladesh
- Rana Al Mosharrafa
- Department of Business Administration, Faculty of Business Studies, Prime University, Dhaka, Bangladesh
- Mohammad Fahim Kadir
- Department of Pharmacology, Lake Erie College of Osteopathic Medicine (LECOM), Erie, Pennsylvania, USA
40
Levkovich I, Elyoseph Z. Suicide Risk Assessments Through the Eyes of ChatGPT-3.5 Versus ChatGPT-4: Vignette Study. JMIR Ment Health 2023; 10:e51232. [PMID: 37728984 PMCID: PMC10551796 DOI: 10.2196/51232] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Received: 07/25/2023] [Revised: 08/22/2023] [Accepted: 08/24/2023] [Indexed: 09/22/2023] Open
Abstract
BACKGROUND ChatGPT, a linguistic artificial intelligence (AI) model engineered by OpenAI, offers prospective contributions to mental health professionals. Despite its significant theoretical implications, ChatGPT's practical capabilities, particularly regarding suicide prevention, have not yet been substantiated. OBJECTIVE The study's aim was to evaluate ChatGPT's ability to assess suicide risk, taking into consideration 2 discernable factors (perceived burdensomeness and thwarted belongingness) over a 2-month period. In addition, we evaluated whether ChatGPT-4 more accurately evaluated suicide risk than did ChatGPT-3.5. METHODS ChatGPT was tasked with assessing a vignette that depicted a hypothetical patient exhibiting differing degrees of perceived burdensomeness and thwarted belongingness. The assessments generated by ChatGPT were subsequently contrasted with standard evaluations rendered by mental health professionals. Using both ChatGPT-3.5 and ChatGPT-4 (May 24, 2023), we executed 3 evaluative procedures in June and July 2023. Our intent was to scrutinize ChatGPT-4's proficiency in assessing various facets of suicide risk in relation to the evaluative abilities of both mental health professionals and an earlier version of ChatGPT-3.5 (March 14 version). RESULTS During the period of June and July 2023, we found that the likelihood of suicide attempts as evaluated by ChatGPT-4 was similar to the norms of mental health professionals (n=379) under all conditions (average Z score of 0.01). Nonetheless, a pronounced discrepancy was observed regarding the assessments performed by ChatGPT-3.5 (May version), which markedly underestimated the potential for suicide attempts, in comparison to the assessments carried out by the mental health professionals (average Z score of -0.83).
The empirical evidence suggests that ChatGPT-4's evaluation of the incidence of suicidal ideation and psychache was higher than that of the mental health professionals (average Z score of 0.47 and 1.00, respectively). Conversely, the level of resilience as assessed by both ChatGPT-4 and ChatGPT-3.5 (both versions) was observed to be lower in comparison to the assessments offered by mental health professionals (average Z score of -0.89 and -0.90, respectively). CONCLUSIONS The findings suggest that ChatGPT-4 estimates the likelihood of suicide attempts in a manner akin to evaluations provided by professionals. In terms of recognizing suicidal ideation, ChatGPT-4 appears to be more precise. However, regarding psychache, there was an observed overestimation by ChatGPT-4, indicating a need for further research. These results have implications regarding ChatGPT-4's potential to support gatekeepers, patients, and even mental health professionals' decision-making. Despite the clinical potential, intensive follow-up studies are necessary to establish the use of ChatGPT-4's capabilities in clinical practice. The finding that ChatGPT-3.5 frequently underestimates suicide risk, especially in severe cases, is particularly troubling. It indicates that ChatGPT may downplay one's actual suicide risk level.
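The "average Z score" reported in this abstract is a standard score: how far a model's rating sits from the mental health professionals' normative mean, in units of the norm's standard deviation. A minimal illustration with made-up norm values (not the study's data):

```python
def z_score(rating, norm_mean, norm_sd):
    """Standard score: distance from the professionals' norm mean, in norm SDs."""
    return (rating - norm_mean) / norm_sd

# Hypothetical: professionals' norm for attempt likelihood is mean 6.2 (SD 1.5);
# a model rating of 5.0 underestimates risk relative to that norm
print(round(z_score(5.0, 6.2, 1.5), 2))
```

A Z score near 0 (as reported for ChatGPT-4) means the model's ratings track the professional norm; a strongly negative average (as for ChatGPT-3.5) means systematic underestimation.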
Affiliation(s)
- Inbar Levkovich
- Oranim Academic College, Faculty of Graduate Studies, Kiryat Tivon, Israel
- Zohar Elyoseph
- Department of Psychology and Educational Counseling, The Center for Psychobiological Research, Max Stern Yezreel Valley College, Emek Yezreel, Israel
- Department of Brain Sciences, Faculty of Medicine, Imperial College London, London, United Kingdom
41
Kuroiwa T, Sarcon A, Ibara T, Yamada E, Yamamoto A, Tsukamoto K, Fujita K. The Potential of ChatGPT as a Self-Diagnostic Tool in Common Orthopedic Diseases: Exploratory Study. J Med Internet Res 2023; 25:e47621. [PMID: 37713254 PMCID: PMC10541638 DOI: 10.2196/47621] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 03/27/2023] [Revised: 05/17/2023] [Accepted: 08/17/2023] [Indexed: 09/16/2023] Open
Abstract
BACKGROUND Artificial intelligence (AI) has gained tremendous popularity recently, especially the use of natural language processing (NLP). ChatGPT is a state-of-the-art chatbot capable of creating natural conversations using NLP. The use of AI in medicine can have a tremendous impact on health care delivery. Although some studies have evaluated ChatGPT's accuracy in self-diagnosis, there is no research regarding its precision and the degree to which it recommends medical consultations. OBJECTIVE The aim of this study was to evaluate ChatGPT's ability to accurately and precisely self-diagnose common orthopedic diseases, as well as the degree of recommendation it provides for medical consultations. METHODS Over a 5-day course, each of the study authors submitted the same questions to ChatGPT. The conditions evaluated were carpal tunnel syndrome (CTS), cervical myelopathy (CM), lumbar spinal stenosis (LSS), knee osteoarthritis (KOA), and hip osteoarthritis (HOA). Answers were categorized as correct, partially correct, incorrect, or a differential diagnosis. The percentage of correct answers and reproducibility were calculated. The reproducibility between days and between raters was calculated using the Fleiss κ coefficient. Answers that recommended that the patient seek medical attention were recategorized according to the strength of the recommendation as defined by the study. RESULTS The ratios of correct answers were 25/25, 1/25, 24/25, 16/25, and 17/25 for CTS, CM, LSS, KOA, and HOA, respectively. The ratios of incorrect answers were 23/25 for CM and 0/25 for all other conditions. The reproducibility between days was 1.0, 0.15, 0.7, 0.6, and 0.6 for CTS, CM, LSS, KOA, and HOA, respectively. The reproducibility between raters was 1.0, 0.1, 0.64, -0.12, and 0.04 for CTS, CM, LSS, KOA, and HOA, respectively. Among the answers recommending medical attention, the phrases "essential," "recommended," "best," and "important" were used.
Specifically, "essential" occurred in 4 out of 125, "recommended" in 12 out of 125, "best" in 6 out of 125, and "important" in 94 out of 125 answers. Additionally, 7 out of the 125 answers did not include a recommendation to seek medical attention. CONCLUSIONS The accuracy and reproducibility of ChatGPT to self-diagnose five common orthopedic conditions were inconsistent. The accuracy could potentially be improved by adding symptoms that could easily identify a specific location. Only a few answers were accompanied by a strong recommendation to seek medical attention according to our study standards. Although ChatGPT could serve as a potential first step in accessing care, we found variability in accurate self-diagnosis. Given the risk of harm with self-diagnosis without medical follow-up, it would be prudent for an NLP to include clear language alerting patients to seek expert medical opinions. We hope to shed further light on the use of AI in a future clinical study.
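The Fleiss κ used above to quantify reproducibility generalizes Cohen's κ to more than two raters (or, here, repeated days). A minimal sketch, with tie-free hypothetical answer categories rather than the study's data:

```python
def fleiss_kappa(tables):
    """tables: one dict per item, mapping answer category -> count of raters choosing it."""
    n_items = len(tables)
    n_raters = sum(tables[0].values())
    # Observed agreement: mean pairwise agreement within each item
    p_bar = sum(
        (sum(c * c for c in t.values()) - n_raters) / (n_raters * (n_raters - 1))
        for t in tables
    ) / n_items
    # Chance agreement from the marginal category proportions
    total = n_items * n_raters
    categories = {cat for t in tables for cat in t}
    p_e = sum((sum(t.get(cat, 0) for t in tables) / total) ** 2 for cat in categories)
    return (p_bar - p_e) / (1 - p_e)

# Hypothetical: 3 raters categorize answers to 5 questions as
# correct (C), partially correct (P), or incorrect (I)
tables = [{"C": 3}, {"C": 2, "P": 1}, {"C": 3}, {"P": 2, "I": 1}, {"C": 1, "P": 1, "I": 1}]
print(round(fleiss_kappa(tables), 3))
```

Values near 1.0 (as for CTS) indicate near-perfect reproducibility; values near 0 or below (as for KOA and HOA between raters) indicate agreement no better than chance.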
Affiliation(s)
- Tomoyuki Kuroiwa
- Department of Orthopaedic and Spinal Surgery, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan
- Division of Orthopedic Surgery Research, Mayo Clinic, Rochester, MN, United States
- Aida Sarcon
- Department of Surgery, Mayo Clinic, Rochester, MN, United States
- Takuya Ibara
- Department of Functional Joint Anatomy, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan
- Eriku Yamada
- Department of Orthopaedic and Spinal Surgery, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan
- Akiko Yamamoto
- Department of Orthopaedic and Spinal Surgery, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan
- Kazuya Tsukamoto
- Department of Orthopaedic and Spinal Surgery, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan
- Koji Fujita
- Department of Functional Joint Anatomy, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan
- Division of Medical Design Innovations, Open Innovation Center, Institute of Research Innovation, Tokyo Medical and Dental University, Tokyo, Japan
42
Oca MC, Meller L, Wilson K, Parikh AO, McCoy A, Chang J, Sudharshan R, Gupta S, Zhang-Nunes S. Bias and Inaccuracy in AI Chatbot Ophthalmologist Recommendations. Cureus 2023; 15:e45911. [PMID: 37885556 PMCID: PMC10599183 DOI: 10.7759/cureus.45911] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Accepted: 09/25/2023] [Indexed: 10/28/2023] Open
Abstract
PURPOSE AND DESIGN To evaluate the accuracy and bias of ophthalmologist recommendations made by three AI chatbots, namely ChatGPT 3.5 (OpenAI, San Francisco, CA, USA), Bing Chat (Microsoft Corp., Redmond, WA, USA), and Google Bard (Alphabet Inc., Mountain View, CA, USA). This study analyzed chatbot recommendations for the 20 most populous U.S. cities. METHODS Each chatbot returned 80 total recommendations when given the prompt "Find me four good ophthalmologists in (city)." Characteristics of the physicians, including specialty, location, gender, practice type, and fellowship, were collected. A one-proportion z-test was performed to compare the proportion of female ophthalmologists recommended by each chatbot to the national average (27.2% per the Association of American Medical Colleges (AAMC)). Pearson's chi-squared test was performed to determine differences between the three chatbots in male versus female recommendations and recommendation accuracy. RESULTS The proportions of female ophthalmologists recommended by Bing Chat (1.61%) and Bard (8.0%) were significantly lower than the national proportion of 27.2% practicing female ophthalmologists (p<0.001 and p<0.01, respectively). The proportion recommended by ChatGPT (29.5%) did not differ significantly from the national average (p=0.722). ChatGPT (73.8%), Bing Chat (67.5%), and Bard (62.5%) gave high rates of inaccurate recommendations. Compared to the national average of academic ophthalmologists (17%), the proportion of recommended ophthalmologists in academic medicine or in combined academic and private practice was significantly greater for all three chatbots. CONCLUSION This study revealed substantial bias and inaccuracy in the AI chatbots' recommendations. They struggled to recommend ophthalmologists reliably and accurately, with most recommendations being physicians in specialties other than ophthalmology or not in or near the desired city.
Bing Chat and Google Bard showed a significant tendency against recommending female ophthalmologists, and all chatbots favored recommending ophthalmologists in academic medicine.
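The one-proportion z-test used above, comparing each chatbot's share of female recommendations against the 27.2% national benchmark, can be sketched as follows. The counts are hypothetical, not the study's:

```python
from math import erf, sqrt

def one_proportion_z_test(successes, n, p0):
    """Test an observed proportion against a hypothesized proportion p0."""
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided, normal approx.
    return z, p_value

# Hypothetical: 5 of 80 recommended ophthalmologists are women, vs. 27.2% nationally
z, p = one_proportion_z_test(5, 80, 0.272)
print(round(z, 2), p < 0.001)
```

The test is only an approximation for small expected counts; with proportions as extreme as Bing Chat's 1.61%, an exact binomial test would be the more conservative choice.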
Affiliation(s)
- Michael C Oca
- Orthopedic Surgery, Shiley Eye Institute, University of California (UC) San Diego Health, La Jolla, USA
- Leo Meller
- Orthopedic Surgery, Shiley Eye Institute, University of California (UC) San Diego Health, La Jolla, USA
- Katherine Wilson
- Orthopedic Surgery, Shiley Eye Institute, University of California (UC) San Diego Health, La Jolla, USA
- Alomi O Parikh
- Ophthalmology, University of Southern California (USC) Roski Eye Institute, Keck School of Medicine of University of Southern California, Los Angeles, USA
- Allison McCoy
- Plastic Surgery, Del Mar Plastic Surgery, San Diego, USA
- Jessica Chang
- Ophthalmology, University of Southern California (USC) Roski Eye Institute, Keck School of Medicine of University of Southern California, Los Angeles, USA
- Rasika Sudharshan
- Ophthalmology, University of Southern California (USC) Roski Eye Institute, Keck School of Medicine of University of Southern California, Los Angeles, USA
- Shreya Gupta
- Ophthalmology, University of Southern California (USC) Roski Eye Institute, Keck School of Medicine of University of Southern California, Los Angeles, USA
- Sandy Zhang-Nunes
- Ophthalmology, University of Southern California (USC) Roski Eye Institute, Keck School of Medicine of University of Southern California, Los Angeles, USA
43
Kumari A, Kumari A, Singh A, Singh SK, Juhi A, Dhanvijay AKD, Pinjar MJ, Mondal H. Large Language Models in Hematology Case Solving: A Comparative Study of ChatGPT-3.5, Google Bard, and Microsoft Bing. Cureus 2023; 15:e43861. [PMID: 37736448 PMCID: PMC10511207 DOI: 10.7759/cureus.43861] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Accepted: 08/21/2023] [Indexed: 09/23/2023] Open
Abstract
Background Large language models (LLMs), such as ChatGPT-3.5, Google Bard, and Microsoft Bing, have shown promising capabilities in various natural language processing (NLP) tasks. However, their performance and accuracy in solving domain-specific questions, particularly in the field of hematology, have not been extensively investigated. Objective This study aimed to explore the capability of LLMs, namely, ChatGPT-3.5, Google Bard, and Microsoft Bing (Precise), in solving hematology-related cases and comparing their performance. Methods This was a cross-sectional study conducted in the Department of Physiology and Pathology, All India Institute of Medical Sciences, Deoghar, Jharkhand, India. We curated a set of 50 cases on hematology covering a range of topics and complexities. The dataset included queries related to blood disorders, hematologic malignancies, laboratory test parameters, calculations, and treatment options. Each case and related question was prepared with a set of correct answers to compare with. We utilized ChatGPT-3.5, Google Bard Experiment, and Microsoft Bing (Precise) for question-answering tasks. The answers were checked by two physiologists and one pathologist. They rated the answers on a rating scale from one to five. The average score of the three models was compared by Friedman's test with Dunn's post-hoc test. The performance of the LLMs was compared with a median of 2.5 by a one-sample median test as the curriculum from which the questions were curated has a 50% pass grade. Results The scores among the three LLMs were significantly different (p-value < 0.0001) with the highest score by ChatGPT (3.15±1.19), followed by Bard (2.23±1.17) and Bing (1.98±1.01). The score of ChatGPT was significantly higher than 50% (p-value = 0.0004), Bard's score was close to 50% (p-value = 0.38), and Bing's score was significantly lower than the pass score (p-value = 0.0015). 
Conclusion The LLMs reveal significant differences in solving case vignettes in hematology. ChatGPT exhibited the highest score, followed by Google Bard and Microsoft Bing. The observed performance trends suggest that ChatGPT holds promising potential in the medical domain. However, none of the models was capable of answering all questions accurately. Further research and optimization of language models can offer valuable contributions to healthcare and medical education applications.
Affiliation(s)
- Amita Kumari
- Physiology, All India Institute of Medical Sciences, Deoghar, Deoghar, IND
- Anita Kumari
- Physiology, All India Institute of Medical Sciences, Deoghar, Deoghar, IND
- Amita Singh
- Physiology, All India Institute of Medical Sciences, Deoghar, Deoghar, IND
- Sanjeet K Singh
- Pathology, All India Institute of Medical Sciences, Deoghar, Deoghar, IND
- Ayesha Juhi
- Physiology, All India Institute of Medical Sciences, Deoghar, Deoghar, IND
- Himel Mondal
- Physiology, All India Institute of Medical Sciences, Deoghar, Deoghar, IND
44
Nov O, Singh N, Mann D. Putting ChatGPT's Medical Advice to the (Turing) Test: Survey Study. JMIR Med Educ 2023; 9:e46939. [PMID: 37428540 PMCID: PMC10366957 DOI: 10.2196/46939] [Citation(s) in RCA: 27] [Impact Index Per Article: 27.0] [Received: 03/02/2023] [Revised: 05/26/2023] [Accepted: 06/14/2023] [Indexed: 07/11/2023]
Abstract
BACKGROUND Chatbots are being piloted to draft responses to patient questions, but patients' ability to distinguish between provider and chatbot responses and patients' trust in chatbots' functions are not well established. OBJECTIVE This study aimed to assess the feasibility of using ChatGPT (Chat Generative Pre-trained Transformer) or a similar artificial intelligence-based chatbot for patient-provider communication. METHODS A survey study was conducted in January 2023. Ten representative, nonadministrative patient-provider interactions were extracted from the electronic health record. Patients' questions were entered into ChatGPT with a request for the chatbot to respond using approximately the same word count as the human provider's response. In the survey, each patient question was followed by a provider- or ChatGPT-generated response. Participants were informed that 5 responses were provider generated and 5 were chatbot generated. Participants were asked, and financially incentivized, to correctly identify the response source. Participants were also asked about their trust in chatbots' functions in patient-provider communication, using a Likert scale from 1 to 5. RESULTS A US-representative sample of 430 study participants aged 18 and older were recruited on Prolific, a crowdsourcing platform for academic studies. In all, 426 participants filled out the full survey. After removing participants who spent less than 3 minutes on the survey, 392 respondents remained. Overall, 53.3% (209/392) of respondents analyzed were women, and the average age was 47.1 (range 18-91) years. The correct classification of responses ranged from 49% (192/392) to 85.7% (336/392) for different questions. On average, chatbot responses were identified correctly in 65.5% (1284/1960) of the cases, and human provider responses were identified correctly in 65.1% (1276/1960) of the cases.
On average, responses toward patients' trust in chatbots' functions were weakly positive (mean Likert score 3.4 out of 5), with lower trust as the health-related complexity of the task in the questions increased. CONCLUSIONS ChatGPT responses to patient questions were weakly distinguishable from provider responses. Laypeople appear to trust the use of chatbots to answer lower-risk health questions. It is important to continue studying patient-chatbot interaction as chatbots move from administrative to more clinical roles in health care.
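A quick way to see that the 65.5% identification rate above, while "weakly distinguishable", is still reliably better than the 50% expected from guessing is a confidence interval on the proportion. This sketch applies a Wilson score interval to the abstract's 1284/1960 count; the interval method is an illustration, not an analysis the study reports:

```python
from math import sqrt

def wilson_interval(successes, n, z=1.96):
    """95% Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

# 1284 of 1960 chatbot responses were identified correctly
low, high = wilson_interval(1284, 1960)
print(round(low, 3), round(high, 3))  # the whole interval sits well above 0.5
```

With roughly (0.63, 0.68) as the interval, identification is clearly above chance yet far from certain, consistent with the abstract's conclusion that chatbot and provider responses were only weakly distinguishable.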
Affiliation(s)
- Oded Nov
- Department of Technology Management, Tandon School of Engineering, New York University, New York, NY, United States
- Nina Singh
- Department of Population Health, Grossman School of Medicine, New York University, New York, NY, United States
- Devin Mann
- Department of Population Health, Grossman School of Medicine, New York University, New York, NY, United States
- Medical Center Information Technology, Langone Health, New York University, New York, NY, United States