1
Meyer A, Soleman A, Riese J, Streichert T. Comparison of ChatGPT, Gemini, and Le Chat with physician interpretations of medical laboratory questions from an online health forum. Clin Chem Lab Med 2024; 62:2425-2434. [PMID: 38804035 DOI: 10.1515/cclm-2024-0246]
Abstract
OBJECTIVES Laboratory medical reports are often not intuitively comprehensible to non-medical professionals. Given their recent advancements, easier accessibility, and remarkable performance on medical licensing exams, patients are likely to turn to artificial intelligence-based chatbots to understand their laboratory results. However, empirical studies assessing the efficacy of these chatbots in responding to real-life patient queries regarding laboratory medicine are scarce. METHODS This investigation included 100 patient inquiries from an online health forum, specifically addressing complete blood count interpretation. The aim was to evaluate the proficiency of three artificial intelligence-based chatbots (ChatGPT, Gemini and Le Chat) against the online responses from certified physicians. RESULTS The chatbots' interpretations of laboratory results were inferior to those from online medical professionals. While the chatbots exhibited a higher degree of empathetic communication, they frequently produced erroneous or overly generalized responses to complex patient questions. The appropriateness of chatbot responses ranged from 51 to 64%, with 22 to 33% of responses overestimating patient conditions. A notable positive aspect was the chatbots' consistent inclusion of disclaimers regarding their non-medical nature and recommendations to seek professional medical advice. CONCLUSIONS The chatbots' interpretations of laboratory results from real patient queries highlight a dangerous dichotomy: a perceived trustworthiness that can obscure factual inaccuracies. Given the growing inclination towards self-diagnosis using AI platforms, further research on and improvement of these chatbots is imperative to increase patients' awareness and avoid future burdens on the healthcare system.
Affiliation(s)
- Annika Meyer
- Institute of Clinical Chemistry, Faculty of Medicine and University Hospital Cologne, Cologne, Germany
- Ari Soleman
- Faculty of Medicine and University Hospital Cologne, Cologne, Germany
- Janik Riese
- Institute of Pathology, Faculty of Medicine, RWTH Aachen University, Aachen, Germany
- Thomas Streichert
- Institute of Clinical Chemistry, Faculty of Medicine and University Hospital Cologne, Cologne, Germany
2
Berry CE, Fazilat AZ, Lavin C, Lintel H, Cole N, Stingl CS, Valencia C, Morgan AG, Momeni A, Wan DC. Both Patients and Plastic Surgeons Prefer Artificial Intelligence-Generated Microsurgical Information. J Reconstr Microsurg 2024; 40:657-664. [PMID: 38382637 DOI: 10.1055/a-2273-4163]
Abstract
BACKGROUND With the growing relevance of artificial intelligence (AI)-based patient-facing information, microsurgery-specific online information provided by professional organizations was compared with that of ChatGPT (Chat Generative Pre-Trained Transformer) and assessed for accuracy, comprehensiveness, clarity, and readability. METHODS Six plastic and reconstructive surgeons blindly assessed responses to 10 microsurgery-related medical questions written either by the American Society for Reconstructive Microsurgery (ASRM) or ChatGPT based on accuracy, comprehensiveness, and clarity. Surgeons were asked to choose which source provided the overall highest-quality microsurgical patient-facing information. Additionally, 30 individuals with no medical background (ages: 18-81, μ = 49.8) were asked to indicate a preference when blindly comparing materials. Readability scores were calculated and analyzed using the following seven readability formulas: Flesch-Kincaid Grade Level, Flesch-Kincaid Readability Ease, Gunning Fog Index, Simple Measure of Gobbledygook Index, Coleman-Liau Index, Linsear Write Formula, and Automated Readability Index. Statistical analysis of microsurgery-specific online sources was conducted using paired t-tests. RESULTS Statistically significant differences in comprehensiveness and clarity were seen in favor of ChatGPT. Surgeons blindly chose ChatGPT as the source that overall provided the highest-quality microsurgical patient-facing information 70.7% of the time. Nonmedical individuals likewise selected the AI-generated microsurgical materials 55.9% of the time. Neither ChatGPT- nor ASRM-generated materials were found to contain inaccuracies. Readability scores for both ChatGPT and ASRM materials exceeded recommended levels for patient proficiency across the seven readability formulas, with the AI-based material scoring as more complex.
CONCLUSION AI-generated patient-facing materials were preferred by surgeons in terms of comprehensiveness and clarity when blindly compared with online material provided by the ASRM. The studied AI-generated material was not found to contain inaccuracies. Additionally, surgeons and nonmedical individuals consistently indicated an overall preference for the AI-generated material. A readability analysis suggested that materials sourced from both ChatGPT and the ASRM surpassed recommended reading levels across the seven readability scores.
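The readability formulas cited in this entry are computed from simple text statistics. As an illustration, here is a minimal sketch of two of them, Flesch-Kincaid Grade Level and Flesch Reading Ease, using their standard published coefficients; the syllable counter is a rough vowel-group heuristic, not the tooling used in the study:

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic: count groups of consecutive vowels (at least 1)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def fk_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level, standard coefficients."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

def flesch_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease; higher is easier (60-70 is roughly plain English)."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def score_text(text: str) -> tuple[float, float]:
    """Return (grade level, reading ease) for a block of prose."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    tokens = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in tokens)
    return (fk_grade(len(tokens), sentences, syllables),
            flesch_ease(len(tokens), sentences, syllables))
```

Patient-facing materials are commonly recommended to sit at or below roughly a sixth-to-eighth grade level, which is the benchmark these studies report being exceeded.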
Affiliation(s)
- Charlotte E Berry
- Department of Surgery, Division of Plastic and Reconstructive Surgery, Hagey Laboratory for Pediatric Regenerative Medicine, Stanford University School of Medicine, Stanford, California
- Alexander Z Fazilat
- Department of Surgery, Division of Plastic and Reconstructive Surgery, Hagey Laboratory for Pediatric Regenerative Medicine, Stanford University School of Medicine, Stanford, California
- Christopher Lavin
- Department of Surgery, Division of Plastic and Reconstructive Surgery, Hagey Laboratory for Pediatric Regenerative Medicine, Stanford University School of Medicine, Stanford, California
- Hendrik Lintel
- Department of Surgery, Division of Plastic and Reconstructive Surgery, Hagey Laboratory for Pediatric Regenerative Medicine, Stanford University School of Medicine, Stanford, California
- Naomi Cole
- Department of Surgery, Division of Plastic and Reconstructive Surgery, Hagey Laboratory for Pediatric Regenerative Medicine, Stanford University School of Medicine, Stanford, California
- Cybil S Stingl
- Department of Surgery, Division of Plastic and Reconstructive Surgery, Hagey Laboratory for Pediatric Regenerative Medicine, Stanford University School of Medicine, Stanford, California
- Caleb Valencia
- Department of Surgery, Division of Plastic and Reconstructive Surgery, Hagey Laboratory for Pediatric Regenerative Medicine, Stanford University School of Medicine, Stanford, California
- Annah G Morgan
- Department of Surgery, Division of Plastic and Reconstructive Surgery, Hagey Laboratory for Pediatric Regenerative Medicine, Stanford University School of Medicine, Stanford, California
- Arash Momeni
- Department of Surgery, Division of Plastic and Reconstructive Surgery, Hagey Laboratory for Pediatric Regenerative Medicine, Stanford University School of Medicine, Stanford, California
- Derrick C Wan
- Department of Surgery, Division of Plastic and Reconstructive Surgery, Hagey Laboratory for Pediatric Regenerative Medicine, Stanford University School of Medicine, Stanford, California
3
Chen SY, Kuo HY, Chang SH. Perceptions of ChatGPT in healthcare: usefulness, trust, and risk. Front Public Health 2024; 12:1457131. [PMID: 39346584 PMCID: PMC11436320 DOI: 10.3389/fpubh.2024.1457131]
Abstract
Introduction This study explores the perceptions of ChatGPT in healthcare settings in Taiwan, focusing on its usefulness, trust, and associated risks. As AI technologies like ChatGPT increasingly influence various sectors, their potential in public health education, promotion, medical education, and clinical practice is significant but not without challenges. The study aims to assess how individuals with and without healthcare-related education perceive and adopt ChatGPT, contributing to a deeper understanding of AI's role in enhancing public health outcomes. Methods An online survey was conducted among 659 university and graduate students, all of whom had prior experience using ChatGPT. The survey measured perceptions of ChatGPT's ease of use, novelty, usefulness, trust, and risk, particularly within clinical practice, medical education, and research settings. Multiple linear regression models were used to analyze how these factors influence perception in healthcare applications, comparing responses between healthcare majors and non-healthcare majors. Results The study revealed that both healthcare and non-healthcare majors find ChatGPT more useful in medical education and research than in clinical practice. Regression analysis showed that for healthcare majors, general trust is crucial for ChatGPT's adoption in clinical practice and influences its use in medical education and research. For non-healthcare majors, novelty, perceived general usefulness, and trust are key predictors. Interestingly, while healthcare majors were cautious about ease of use, fearing it might increase risk, non-healthcare majors associated increased complexity with greater trust. Conclusion This study highlights the varying expectations between healthcare and non-healthcare majors regarding ChatGPT's role in healthcare.
The findings suggest the need for AI applications to be tailored to address specific user needs, particularly in clinical practice, where trust and reliability are paramount. Additionally, the potential of AI tools like ChatGPT to contribute to public health education and promotion is significant, as these technologies can enhance health literacy and encourage behavior change. These insights can inform future healthcare practices and policies by guiding the thoughtful and effective integration of AI tools like ChatGPT, ensuring they complement clinical judgment, enhance educational outcomes, support research integrity, and ultimately contribute to improved public health outcomes.
Affiliation(s)
- Su-Yen Chen
- Institute of Learning Sciences and Technologies, National Tsing Hua University, Hsinchu, Taiwan
- H Y Kuo
- Institute of Learning Sciences and Technologies, National Tsing Hua University, Hsinchu, Taiwan
- Shu-Hao Chang
- Department of Sport Management, College of Health and Human Performance, University of Florida, Gainesville, FL, United States
4
Hatherley J, Kinderlerer A, Bjerring JC, Munch LA, Threlfall L. The FHJ debate: Will artificial intelligence replace clinical decision making within our lifetimes? Future Healthc J 2024; 11:100178. [PMID: 39371529 PMCID: PMC11452837 DOI: 10.1016/j.fhj.2024.100178]
Affiliation(s)
- Joshua Hatherley
- Aarhus University, Department of Philosophy and History of Ideas, Denmark
- Lynsey Threlfall
- Royal Victoria Infirmary Newcastle, Newcastle University - BRC, England
5
Ayo-Ajibola O, Davis RJ, Lin ME, Riddell J, Kravitz RL. Characterizing the Adoption and Experiences of Users of Artificial Intelligence-Generated Health Information in the United States: Cross-Sectional Questionnaire Study. J Med Internet Res 2024; 26:e55138. [PMID: 39141910 PMCID: PMC11358651 DOI: 10.2196/55138]
Abstract
BACKGROUND OpenAI's ChatGPT is a source of advanced online health information (OHI) that may be integrated into individuals' health information-seeking routines. However, concerns have been raised about its factual accuracy and impact on health outcomes. To forecast implications for medical practice and public health, more information is needed on who uses the tool, how often, and for what. OBJECTIVE This study aims to characterize the reasons for and types of ChatGPT OHI use and describe the users most likely to engage with the platform. METHODS In this cross-sectional survey, patients received invitations to participate via the ResearchMatch platform, a nonprofit affiliate of the National Institutes of Health. A web-based survey measured demographic characteristics, use of ChatGPT and other sources of OHI, experience characterization, and resultant health behaviors. Descriptive statistics were used to summarize the data. Both 2-tailed t tests and Pearson chi-square tests were used to compare users of ChatGPT OHI with nonusers. RESULTS Of 2406 respondents, 21.5% (n=517) reported using ChatGPT for OHI. ChatGPT users were younger than nonusers (32.8 vs 39.1 years, P<.001), with lower advanced degree attainment (BA or higher; 49.9% vs 67%, P<.001) and greater use of transient health care (ED and urgent care; P<.001). ChatGPT users were also more avid consumers of general non-ChatGPT OHI (weekly or greater OHI-seeking frequency in the past 6 months, 28.2% vs 22.8%, P<.001). Around 39.3% (n=206) of respondents endorsed using the platform for OHI 2-3 times weekly or more, and most sought the tool to determine whether a consultation was required (47.4%, n=245) or to explore alternative treatment (46.2%, n=239). Use characterization was favorable, as many believed ChatGPT to be just as or more useful than other OHI sources (87.7%, n=429) and their doctor (81%, n=407).
About one-third of respondents requested a referral (35.6%, n=184) or changed medications (31%, n=160) based on the information received from ChatGPT. Although many users reported skepticism regarding the ChatGPT output (67.9%, n=336), most turned to their physicians (67.5%, n=349). CONCLUSIONS This study underscores the significant role of AI-generated OHI in shaping health-seeking behaviors and the potential evolution of patient-provider interactions. Given the proclivity of these users to enact health behavior changes based on AI-generated content, there is an opportunity for physicians to guide ChatGPT OHI users toward an informed and examined use of the technology.
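The Pearson chi-square comparison of users versus nonusers follows the usual contingency-table computation. A minimal sketch of the statistic (the 2x2 counts below are made-up for illustration, not data from the study):

```python
def chi_square(table: list[list[float]]) -> float:
    """Pearson chi-square statistic for an r x c contingency table.

    Expected counts are (row total * column total) / grand total;
    the statistic sums (observed - expected)^2 / expected over cells.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical 2x2 table: rows = ChatGPT user / nonuser,
# columns = used urgent care / did not.
example = chi_square([[10, 20], [20, 10]])
```

The statistic is then compared against a chi-square distribution with (r-1)(c-1) degrees of freedom to obtain the P value.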
Affiliation(s)
- Ryan J Davis
- Keck School of Medicine of the University of Southern California, Los Angeles, CA, United States
- Matthew E Lin
- Department of Head and Neck Surgery, David Geffen School of Medicine at University of California Los Angeles, Los Angeles, CA, United States
- Jeffrey Riddell
- Department of Emergency Medicine, Keck School of Medicine of the University of Southern California, Los Angeles, CA, United States
- Richard L Kravitz
- Division of General Medicine, University of California Davis, Sacramento, CA, United States
6
Cherrez-Ojeda I, Gallardo-Bastidas JC, Robles-Velasco K, Osorio MF, Velez Leon EM, Leon Velastegui M, Pauletto P, Aguilar-Díaz FC, Squassi A, González Eras SP, Cordero Carrasco E, Chavez Gonzalez KL, Calderon JC, Bousquet J, Bedbrook A, Faytong-Haro M. Understanding Health Care Students' Perceptions, Beliefs, and Attitudes Toward AI-Powered Language Models: Cross-Sectional Study. JMIR Med Educ 2024; 10:e51757. [PMID: 39137029 DOI: 10.2196/51757]
Abstract
BACKGROUND ChatGPT was not intended for use in health care, but it has potential benefits that depend on end-user understanding and acceptability, which is where health care students become crucial. There is still a limited amount of research in this area. OBJECTIVE The primary aim of our study was to assess the frequency of ChatGPT use, the perceived level of knowledge, the perceived risks associated with its use, and the ethical issues, as well as attitudes toward the use of ChatGPT in the context of education in the field of health. In addition, we aimed to examine whether there were differences across groups based on demographic variables. The second part of the study aimed to assess the association between the frequency of use, the level of perceived knowledge, the level of risk perception, and the level of ethics perception as predictive factors for participants' attitudes toward the use of ChatGPT. METHODS A cross-sectional survey was conducted from May to June 2023 encompassing students of medicine, nursing, dentistry, nutrition, and laboratory science across the Americas. The study used descriptive analysis, chi-square tests, and ANOVA to assess statistical significance across different categories. The study used several ordinal logistic regression models to analyze the impact of predictive factors (frequency of use, perception of knowledge, perception of risk, and ethics perception scores) on attitude as the dependent variable. The models were adjusted for gender, institution type, major, and country. Stata was used to conduct all the analyses. RESULTS Of 2661 health care students, 42.99% (n=1144) were unaware of ChatGPT. The median knowledge score was "minimal" (median 2.00, IQR 1.00-3.00). Most respondents (median 2.61, IQR 2.11-3.11) regarded ChatGPT as neither ethical nor unethical.
Most participants (median 3.89, IQR 3.44-4.34) "somewhat agreed" that ChatGPT (1) benefits health care settings, (2) provides trustworthy data, (3) is a helpful tool for accessing clinical and educational medical information, and (4) makes work easier. In total, 70% (7/10) of those who used it did so for homework. As perceived knowledge of ChatGPT increased, so did the tendency toward a favorable attitude toward ChatGPT. Higher ethical-consideration perception ratings increased the likelihood of considering ChatGPT a source of trustworthy health care information (odds ratio [OR] 1.620, 95% CI 1.498-1.752), beneficial in medical issues (OR 1.495, 95% CI 1.452-1.539), and useful for medical literature (OR 1.494, 95% CI 1.426-1.564; P<.001 for all results). CONCLUSIONS Over 40% of health care students in the Americas (1144/2661, 42.99%) were unaware of ChatGPT despite its extensive use in the health field. Our data revealed positive attitudes toward ChatGPT and a desire to learn more about it. Medical educators must explore how chatbots may be included in undergraduate health care education programs.
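The odds ratios reported above come from exponentiating ordinal logistic regression coefficients; the 95% confidence interval is obtained the same way from the coefficient's standard error. A minimal sketch of that conversion, using illustrative numbers rather than the study's fitted coefficients:

```python
import math

def odds_ratio_ci(beta: float, se: float, z: float = 1.96) -> tuple[float, float, float]:
    """Convert a logit-scale coefficient and its standard error
    into an odds ratio with a Wald-style confidence interval.

    Returns (OR, lower bound, upper bound); z = 1.96 gives ~95% coverage.
    """
    return (math.exp(beta),
            math.exp(beta - z * se),
            math.exp(beta + z * se))

# Hypothetical coefficient beta = 0.48 with SE = 0.04 (not from the study).
or_value, ci_low, ci_high = odds_ratio_ci(0.48, 0.04)
```

An OR above 1 (with a CI excluding 1) indicates that each one-unit increase in the predictor, here an ethics-perception score, raises the odds of a more favorable attitude category.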
Affiliation(s)
- Ivan Cherrez-Ojeda
- Universidad Espiritu Santo, Samborondon, Ecuador
- Respiralab Research Group, Guayaquil, Ecuador
- Karla Robles-Velasco
- Universidad Espiritu Santo, Samborondon, Ecuador
- Respiralab Research Group, Guayaquil, Ecuador
- María F Osorio
- Universidad Espiritu Santo, Samborondon, Ecuador
- Respiralab Research Group, Guayaquil, Ecuador
- F C Aguilar-Díaz
- Departamento Salud Pública, Escuela Nacional de Estudios Superiores, Universidad Nacional Autónoma de México, Guanajuato, Mexico
- Aldo Squassi
- Universidad de Buenos Aires, Facultad de Odontología, Cátedra de Odontología Preventiva y Comunitaria, Buenos Aires, Argentina
- Erita Cordero Carrasco
- Departamento de cirugía y traumatología bucal y maxilofacial, Universidad de Chile, Santiago, Chile
- Juan C Calderon
- Universidad Espiritu Santo, Samborondon, Ecuador
- Respiralab Research Group, Guayaquil, Ecuador
- Jean Bousquet
- Institute of Allergology, Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Fraunhofer Institute for Translational Medicine and Pharmacology ITMP, Allergology and Immunology, Berlin, Germany
- MASK-air, Montpellier, France
- Marco Faytong-Haro
- Respiralab Research Group, Guayaquil, Ecuador
- Universidad Estatal de Milagro, Cdla Universitaria "Dr. Rómulo Minchala Murillo", Milagro, Ecuador
- Ecuadorian Development Research Lab, Daule, Ecuador
7
Schütz P, Lob S, Chahed H, Dathe L, Löwer M, Reiß H, Weigel A, Albrecht J, Tokgöz P, Dockweiler C. ChatGPT as an Information Source for Patients with Migraines: A Qualitative Case Study. Healthcare (Basel) 2024; 12:1594. [PMID: 39201153 PMCID: PMC11354001 DOI: 10.3390/healthcare12161594]
Abstract
Migraines are among the most common and expensive neurological diseases worldwide. Non-pharmacological and digitally delivered treatment options have long been used in the treatment of migraines, for instance migraine management tools, online migraine diagnosis, and digitally networked patients. Recently, ChatGPT has been applied in fields of healthcare ranging from identifying potential research topics to assisting professionals in clinical diagnosis and helping patients manage their health. Despite advances in migraine management, only a minority of patients are adequately informed and treated. It is important to provide these patients with information that helps them manage their symptoms and daily activities. The primary aim of this case study was to examine whether ChatGPT handles symptom descriptions responsibly, suggests supplementary assistance from credible sources, provides valuable perspectives on treatment options, and exhibits potential influences on the daily life of patients with migraines. In a deductive, qualitative study, ten interactions with ChatGPT on different migraine types were analyzed through semi-structured interviews. ChatGPT provided relevant information aligned with common scientific patient resources. Responses were generally intelligible and situationally appropriate, providing personalized insights despite occasional discrepancies in interaction. ChatGPT's empathetic tone and linguistic clarity encouraged user engagement. However, source citations were found to be inconsistent and, in some cases, not comprehensible, which affected the overall comprehensibility of the information. ChatGPT might be promising for patients seeking information on migraine conditions; its user-specific responses demonstrate potential benefits over static web-based sources. However, reproducibility and accuracy issues highlight the need for digital health literacy. The findings underscore the necessity of continuously evaluating AI systems and their broader societal implications in health communication.
Affiliation(s)
- Pascal Schütz
- Department Digital Health Sciences and Biomedicine, Professorship of Digital Public Health, School of Life Sciences, University of Siegen, 57076 Siegen, Germany; (S.L.); (H.C.); (L.D.); (M.L.); (H.R.); (A.W.); (J.A.); (P.T.); (C.D.)
8
Sallam M, Al-Mahzoum K, Alshuaib O, Alhajri H, Alotaibi F, Alkhurainej D, Al-Balwah MY, Barakat M, Egger J. Language discrepancies in the performance of generative artificial intelligence models: an examination of infectious disease queries in English and Arabic. BMC Infect Dis 2024; 24:799. [PMID: 39118057 PMCID: PMC11308449 DOI: 10.1186/s12879-024-09725-y]
Abstract
BACKGROUND Assessment of artificial intelligence (AI)-based models across languages is crucial to ensure equitable access and accuracy of information in multilingual contexts. This study aimed to compare AI model efficiency in English and Arabic for infectious disease queries. METHODS The study employed the METRICS checklist for the design and reporting of AI-based studies in healthcare. The AI models tested included ChatGPT-3.5, ChatGPT-4, Bing, and Bard. The queries comprised 15 questions on HIV/AIDS, tuberculosis, malaria, COVID-19, and influenza. The AI-generated content was assessed by two bilingual experts using the validated CLEAR tool. RESULTS Variability was noted when comparing the AI models' performance in English and Arabic for infectious disease queries. English queries showed consistently superior performance, with Bard leading, followed by Bing, ChatGPT-4, and ChatGPT-3.5 (P = .012). The same trend was observed in Arabic, albeit without statistical significance (P = .082). Stratified analysis revealed higher scores for English in most CLEAR components, notably completeness, accuracy, appropriateness, and relevance, especially with ChatGPT-3.5 and Bard. Across the five infectious disease topics, English outperformed Arabic, except for influenza queries in Bing and Bard. The four AI models' performance in English was rated "excellent", significantly outperforming their "above-average" Arabic counterparts (P = .002). CONCLUSIONS A disparity in AI model performance was noted between English and Arabic responses to infectious disease queries. This language variation can negatively affect the quality of health content delivered by AI models to native speakers of Arabic. AI developers are recommended to address this issue, with the ultimate goal of enhancing health outcomes.
Affiliation(s)
- Malik Sallam
- Department of Pathology, Microbiology and Forensic Medicine, School of Medicine, The University of Jordan, Amman, 11942, Jordan
- Department of Translational Medicine, Faculty of Medicine, Lund University, Malmö, 22184, Sweden
- Department of Clinical Laboratories and Forensic Medicine, Jordan University Hospital, Queen Rania Al-Abdullah Street-Aljubeiha, P.O. Box: 13046, Amman, Jordan
- Omaima Alshuaib
- School of Medicine, The University of Jordan, Amman, 11942, Jordan
- Hawajer Alhajri
- School of Medicine, The University of Jordan, Amman, 11942, Jordan
- Fatmah Alotaibi
- School of Medicine, The University of Jordan, Amman, 11942, Jordan
- Muna Barakat
- Department of Clinical Pharmacy and Therapeutics, Faculty of Pharmacy, Applied Science Private University, Amman, 11931, Jordan
- MEU Research Unit, Middle East University, Amman, 11831, Jordan
- Jan Egger
- Institute for AI in Medicine (IKIM), University Medicine Essen (AöR), Essen, Germany
9
Fazilat AZ, Berry CE, Churukian A, Lavin C, Kameni L, Brenac C, Podda S, Bruckman K, Lorenz HP, Khosla RK, Wan DC. AI-based Cleft Lip and Palate Surgical Information is Preferred by Both Plastic Surgeons and Patients in a Blind Comparison. Cleft Palate Craniofac J 2024:10556656241266368. [PMID: 39091088 DOI: 10.1177/10556656241266368]
Abstract
INTRODUCTION The application of artificial intelligence (AI) in healthcare has expanded in recent years, and the use of tools such as ChatGPT to generate patient-facing information has garnered particular interest. Online cleft lip and palate (CL/P) surgical information supplied by academic/professional (A/P) sources was therefore evaluated against ChatGPT regarding accuracy, comprehensiveness, and clarity. METHODS Eleven plastic and reconstructive surgeons and 29 non-medical individuals blindly compared responses written by ChatGPT or A/P sources to 30 frequently asked CL/P surgery questions. Surgeons indicated preference, determined accuracy, and scored comprehensiveness and clarity. Non-medical individuals indicated preference. Readability scores were calculated using seven readability formulas. Statistical analysis of CL/P surgical online information was performed using paired t-tests. RESULTS Surgeons blindly preferred material generated by ChatGPT over A/P sources 60.88% of the time. Additionally, surgeons consistently indicated that ChatGPT-generated material was more comprehensive and had greater clarity. No significant difference was found between ChatGPT and resources provided by professional organizations in terms of accuracy. Among individuals with no medical background, ChatGPT-generated materials were preferred 60.46% of the time. For materials from both ChatGPT and A/P sources, readability scores surpassed advised levels for patient proficiency across the seven readability formulas. CONCLUSION As the prominence of ChatGPT-based language tools rises in the healthcare space, their potential applications should be assessed by experts against existing high-quality sources. Our results indicate that ChatGPT is capable of producing material that is high quality in terms of accuracy, comprehensiveness, and clarity, and that is preferred by both plastic surgeons and individuals with no medical background.
Affiliation(s)
- Alexander Z Fazilat, Charlotte E Berry, Andrew Churukian, Christopher Lavin, Lionel Kameni, Camille Brenac, Karl Bruckman, Hermann P Lorenz, Rohit K Khosla, Derrick C Wan
- Hagey Laboratory for Pediatric Regenerative Medicine, Division of Plastic and Reconstructive Surgery, Department of Surgery, Stanford University School of Medicine, Stanford, CA, USA
- Silvio Podda
- Division of Plastic and Reconstructive Surgery, St. Joseph's Regional Medical Center, Paterson, NJ, USA
10
Sallam M. Bibliometric top ten healthcare-related ChatGPT publications in the first ChatGPT anniversary. NARRA J 2024; 4:e917. [PMID: 39280327 PMCID: PMC11391998 DOI: 10.52225/narra.v4i2.917] [Received: 05/29/2024] [Accepted: 07/29/2024] [Indexed: 09/18/2024]
Abstract
Since its public release on November 30, 2022, ChatGPT has shown promising potential in diverse healthcare applications despite ethical challenges, privacy issues, and possible biases. The aim of this study was to identify and assess the most influential publications in the field of ChatGPT utility in healthcare using bibliometric analysis. The study employed an advanced search on three databases, Scopus, Web of Science, and Google Scholar, to identify ChatGPT-related records in healthcare education, research, and practice between November 27 and 30, 2023. The ranking was based on the retrieved citation count in each database. The additional alternative metrics that were evaluated included (1) Semantic Scholar highly influential citations, (2) PlumX captures, (3) PlumX mentions, (4) PlumX social media and (5) Altmetric Attention Scores (AASs). A total of 22 unique records published in 17 different scientific journals from 14 different publishers were identified in the three databases. Only two publications were in the top 10 list across the three databases. Variable publication types were identified, with the most common being editorial/commentary publications (n=8/22, 36.4%). Nine of the 22 records had corresponding authors affiliated with institutions in the United States (40.9%). The range of citation count varied per database, with the highest range identified in Google Scholar (1019-121), followed by Scopus (242-88), and Web of Science (171-23). Google Scholar citations were correlated significantly with the following metrics: Semantic Scholar highly influential citations (Spearman's correlation coefficient ρ=0.840, p<0.001), PlumX captures (ρ=0.831, p<0.001), PlumX mentions (ρ=0.609, p=0.004), and AASs (ρ=0.542, p=0.009). In conclusion, despite several acknowledged limitations, this study showed the evolving landscape of ChatGPT utility in healthcare. 
There is an urgent need for collaborative initiatives by all stakeholders involved to establish guidelines for ethical, transparent, and responsible use of ChatGPT in healthcare. The study revealed the correlation between citations and alternative metrics, highlighting its usefulness as a supplement to gauge the impact of publications, even in a rapidly growing research field.
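The Spearman correlations reported above can be reproduced in principle with a small rank-correlation routine. The sketch below is a minimal pure-Python implementation with average ranks for ties; the citation and capture counts are invented toy values, not the study's data:

```python
def average_ranks(values):
    # 1-based ranks; tied values share the average of their rank positions
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]

def spearman_rho(x, y):
    # Spearman's rho is the Pearson correlation computed on the ranks
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Toy data: citation counts vs. an altmetric for five hypothetical papers
citations = [1019, 480, 310, 205, 121]
captures = [900, 350, 400, 150, 90]
print(round(spearman_rho(citations, captures), 3))  # prints 0.9
```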
Affiliation(s)
- Malik Sallam
- Department of Pathology, Microbiology and Forensic Medicine, School of Medicine, The University of Jordan, Amman, Jordan
- Department of Clinical Laboratories and Forensic Medicine, Jordan University Hospital, Amman, Jordan
- Department of Translational Medicine, Faculty of Medicine, Lund University, Malmö, Sweden
11
Reis M, Reis F, Kunde W. Influence of believed AI involvement on the perception of digital medical advice. Nat Med 2024:10.1038/s41591-024-03180-7. [PMID: 39054373 DOI: 10.1038/s41591-024-03180-7] [Received: 03/05/2024] [Accepted: 07/04/2024] [Indexed: 07/27/2024]
Abstract
Large language models offer novel opportunities to seek digital medical advice. While previous research primarily addressed the performance of such artificial intelligence (AI)-based tools, public perception of these advancements received little attention. In two preregistered studies (n = 2,280), we presented participants with scenarios of patients obtaining medical advice. All participants received identical information, but we manipulated the putative source of this advice ('AI', 'human physician', 'human + AI'). 'AI'- and 'human + AI'-labeled advice was evaluated as significantly less reliable and less empathetic compared with 'human'-labeled advice. Moreover, participants indicated lower willingness to follow the advice when AI was believed to be involved in advice generation. Our findings point toward an anti-AI bias when receiving digital medical advice, even when AI is supposedly supervised by physicians. Given the tremendous potential of AI for medicine, elucidating ways to counteract this bias should be an important objective of future research.
Affiliation(s)
- Moritz Reis
- Institute of Psychology, Julius-Maximilians-Universität Würzburg, Würzburg, Germany
- Judge Business School, University of Cambridge, Cambridge, UK
- Florian Reis
- Medical Affairs, Pfizer Pharma GmbH, Berlin, Germany
- Wilfried Kunde
- Institute of Psychology, Julius-Maximilians-Universität Würzburg, Würzburg, Germany
12
Lin HL, Liao LL, Wang YN, Chang LC. Attitude and utilization of ChatGPT among registered nurses: A cross-sectional study. Int Nurs Rev 2024. [PMID: 38979771 DOI: 10.1111/inr.13012] [Received: 04/16/2024] [Accepted: 06/10/2024] [Indexed: 07/10/2024]
Abstract
AIM This study explores the influencing factors of attitudes and behaviors toward use of ChatGPT based on the Technology Acceptance Model among registered nurses in Taiwan. BACKGROUND The complexity of medical services and nursing shortages increases workloads. ChatGPT swiftly answers medical questions, provides clinical guidelines, and assists with patient information management, thereby improving nursing efficiency. INTRODUCTION To facilitate the development of effective ChatGPT training programs, it is essential to examine registered nurses' attitudes toward and utilization of ChatGPT across diverse workplace settings. METHODS An anonymous online survey was used to collect data from over 1000 registered nurses recruited through social media platforms between November 2023 and January 2024. Descriptive statistics and multiple linear regression analyses were conducted for data analysis. RESULTS Among respondents, some were unfamiliar with ChatGPT, while others had used it before, with higher usage among males, higher-educated individuals, experienced nurses, and supervisors. Gender and work settings influenced perceived risks, and those familiar with ChatGPT recognized its social impact. Perceived risk and usefulness significantly influenced its adoption. DISCUSSION Nurse attitudes to ChatGPT vary based on gender, education, experience, and role. Positive perceptions emphasize its usefulness, while risk concerns affect adoption. The insignificant role of perceived ease of use highlights ChatGPT's user-friendly nature. CONCLUSION Over half of the surveyed nurses had used or were familiar with ChatGPT and showed positive attitudes toward its use. Establishing rigorous guidelines to enhance their interaction with ChatGPT is crucial for future training. 
IMPLICATIONS FOR NURSING AND HEALTH POLICY Nurse managers should understand registered nurses' attitudes toward ChatGPT and integrate it into in-service education with tailored support and training, including appropriate prompt formulation and advanced decision-making, to prevent misuse.
Affiliation(s)
- Hui-Ling Lin
- Department of Nursing, Linkou Branch, Chang Gung Memorial Hospital, Taoyuan, Taiwan, ROC
- School of Nursing, College of Medicine, Chang Gung University, Taoyuan, Taiwan, ROC
- School of Nursing, Chang Gung University of Science and Technology, Gui-Shan Town, Taoyuan, Taiwan, ROC
- Taipei Medical University, Taipei, Taiwan
- Li-Ling Liao
- Department of Public Health, College of Health Science, Kaohsiung Medical University, Kaohsiung City, Taiwan
- Department of Medical Research, Kaohsiung Medical University Hospital, Kaohsiung City, Taiwan
- Ya-Ni Wang
- School of Nursing, College of Medicine, Chang Gung University, Taoyuan, Taiwan, ROC
- Li-Chun Chang
- Department of Nursing, Linkou Branch, Chang Gung Memorial Hospital, Taoyuan, Taiwan, ROC
- School of Nursing, College of Medicine, Chang Gung University, Taoyuan, Taiwan, ROC
- School of Nursing, Chang Gung University of Science and Technology, Gui-Shan Town, Taoyuan, Taiwan, ROC
13
Elshaer IA, Hasanein AM, Sobaih AEE. The Moderating Effects of Gender and Study Discipline in the Relationship between University Students' Acceptance and Use of ChatGPT. Eur J Investig Health Psychol Educ 2024; 14:1981-1995. [PMID: 39056647 PMCID: PMC11275491 DOI: 10.3390/ejihpe14070132] [Received: 05/23/2024] [Revised: 06/23/2024] [Accepted: 07/05/2024] [Indexed: 07/28/2024]
Abstract
The intensive adoption of ChatGPT by university students for learning has encouraged many scholars to test the variables that influence their use of such AI in their learning. This study adds to that growing body of work, especially regarding the moderating roles of students' gender and study discipline in their acceptance and usage of ChatGPT in the learning process. The study expanded the Unified Theory of Acceptance and Use of Technology (UTAUT) by integrating gender and study discipline as moderators, and collected responses from students of different genders and study disciplines at Saudi universities. The results of a structural model using SmartPLS showed a significant moderating effect of gender on the relationship between performance expectancy and ChatGPT usage: the impact of performance expectancy on ChatGPT usage was stronger in male than in female students. Moreover, social influence was shown to affect ChatGPT usage significantly more in males than in females. The findings also showed that study discipline significantly moderates the link between social influence and ChatGPT usage, with social influence affecting ChatGPT use more strongly in the social sciences than in the applied sciences. The study's implications are discussed.
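Moderation of the kind tested here (e.g., gender moderating the effect of performance expectancy on usage) is commonly assessed via an interaction term; equivalently, one can compare group-wise slopes. A minimal sketch on invented survey-style scores (not the study's data):

```python
def slope(x, y):
    # Least-squares slope of y on x: cov(x, y) / var(x)
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)

# Hypothetical performance-expectancy and usage scores per group
pe_male, use_male = [2, 3, 4, 5, 6], [2, 4, 5, 7, 8]
pe_female, use_female = [2, 3, 4, 5, 6], [3, 4, 4, 5, 5]

# The slope difference corresponds to the interaction (moderation) effect:
# a nonzero value means the PE -> usage slope differs between groups
interaction = slope(pe_male, use_male) - slope(pe_female, use_female)
print(interaction)  # prints 1.0
```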
Affiliation(s)
- Ibrahim A. Elshaer
- Management Department, College of Business Administration, King Faisal University, Al-Ahsa 31982, Saudi Arabia
- Faculty of Tourism and Hotel Management, Suez Canal University, Ismailia 41522, Egypt
- Ahmed M. Hasanein
- Management Department, College of Business Administration, King Faisal University, Al-Ahsa 31982, Saudi Arabia
- Faculty of Tourism and Hotel Management, Helwan University, Cairo 12612, Egypt
- Abu Elnasr E. Sobaih
- Management Department, College of Business Administration, King Faisal University, Al-Ahsa 31982, Saudi Arabia
- Faculty of Tourism and Hotel Management, Helwan University, Cairo 12612, Egypt
14
Law S, Oldfield B, Yang W. ChatGPT/GPT-4 (large language models): Opportunities and challenges of perspective in bariatric healthcare professionals. Obes Rev 2024; 25:e13746. [PMID: 38613164 DOI: 10.1111/obr.13746] [Received: 06/10/2023] [Revised: 03/14/2024] [Accepted: 03/15/2024] [Indexed: 04/14/2024]
Abstract
ChatGPT/GPT-4 is a conversational large language model (LLM) based on artificial intelligence (AI). The potential application of LLMs as virtual assistants for bariatric healthcare professionals in education and practice may be promising if relevant and valid issues are actively examined and addressed. In general medical terms, it is possible that AI models like ChatGPT/GPT-4 will be deeply integrated into medical scenarios, improving medical efficiency and quality, and allowing doctors more time to communicate with patients and implement personalized health management. Chatbots based on AI have great potential in bariatric healthcare and may play an important role in predicting and intervening in weight loss and obesity-related complications. However, given its potential limitations, we should carefully consider the medical, legal, ethical, data security, privacy, and liability issues arising from medical errors caused by ChatGPT/GPT-4. This concern also extends to ChatGPT/GPT-4's ability to justify wrong decisions, and there is an urgent need for appropriate guidelines and regulations to ensure the safe and responsible use of ChatGPT/GPT-4.
Affiliation(s)
- Saikam Law
- Department of Metabolic and Bariatric Surgery, The First Affiliated Hospital of Jinan University, Guangzhou, China
- School of Medicine, Jinan University, Guangzhou, China
- Brian Oldfield
- Department of Physiology, Monash Biomedicine Discovery Institute, Monash University, Melbourne, Australia
- Wah Yang
- Department of Metabolic and Bariatric Surgery, The First Affiliated Hospital of Jinan University, Guangzhou, China
15
Yang Z, Wang D, Zhou F, Song D, Zhang Y, Jiang J, Kong K, Liu X, Qiao Y, Chang RT, Han Y, Li F, Tham CC, Zhang X. Understanding natural language: Potential application of large language models to ophthalmology. Asia Pac J Ophthalmol (Phila) 2024; 13:100085. [PMID: 39059558 DOI: 10.1016/j.apjo.2024.100085] [Received: 04/17/2024] [Revised: 06/19/2024] [Accepted: 07/19/2024] [Indexed: 07/28/2024]
Abstract
Large language models (LLMs), a natural language processing technology based on deep learning, are currently in the spotlight. These models closely mimic natural language comprehension and generation. Their evolution has undergone several waves of innovation similar to convolutional neural networks. The transformer architecture advancement in generative artificial intelligence marks a monumental leap beyond early-stage pattern recognition via supervised learning. With the expansion of parameters and training data (terabytes), LLMs unveil remarkable human interactivity, encompassing capabilities such as memory retention and comprehension. These advances make LLMs particularly well-suited for roles in healthcare communication between medical practitioners and patients. In this comprehensive review, we discuss the trajectory of LLMs and their potential implications for clinicians and patients. For clinicians, LLMs can be used for automated medical documentation, and given better inputs and extensive validation, LLMs may be able to autonomously diagnose and treat in the future. For patient care, LLMs can be used for triage suggestions, summarization of medical documents, explanation of a patient's condition, and customizing patient education materials tailored to their comprehension level. The limitations of LLMs and possible solutions for real-world use are also presented. Given the rapid advancements in this area, this review attempts to briefly cover many roles that LLMs may play in the ophthalmic space, with a focus on improving the quality of healthcare delivery.
Affiliation(s)
- Zefeng Yang, Deming Wang, Yinhang Zhang, Jiaxuan Jiang, Kangjie Kong, Xiaoyi Liu, Fei Li, Xiulan Zhang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou 510060, China
- Fengqi Zhou
- Ophthalmology, Mayo Clinic Health System, Eau Claire, Wisconsin, USA
- Diping Song, Yu Qiao
- Shanghai Artificial Intelligence Laboratory, Shanghai, China
- Robert T Chang
- Department of Ophthalmology, Byers Eye Institute at Stanford University, Palo Alto, CA, USA
- Ying Han
- Department of Ophthalmology, University of California, San Francisco, San Francisco, CA, USA
- Clement C Tham
- Department of Ophthalmology and Visual Sciences, The Chinese University of Hong Kong, Hong Kong SAR, China; Hong Kong Eye Hospital, Kowloon, Hong Kong SAR, China; Department of Ophthalmology and Visual Sciences, Prince of Wales Hospital, Shatin, Hong Kong SAR, China
16
Shinan-Altman S, Elyoseph Z, Levkovich I. The impact of history of depression and access to weapons on suicide risk assessment: a comparison of ChatGPT-3.5 and ChatGPT-4. PeerJ 2024; 12:e17468. [PMID: 38827287 PMCID: PMC11143969 DOI: 10.7717/peerj.17468] [Received: 11/07/2023] [Accepted: 05/05/2024] [Indexed: 06/04/2024]
Abstract
The aim of this study was to evaluate the effectiveness of ChatGPT-3.5 and ChatGPT-4 in incorporating critical risk factors, namely history of depression and access to weapons, into suicide risk assessments. Both models assessed suicide risk using scenarios that featured individuals with and without a history of depression and access to weapons. The models estimated the likelihood of suicidal thoughts, suicide attempts, serious suicide attempts, and suicide-related mortality on a Likert scale. A multivariate three-way ANOVA with Bonferroni post hoc tests was conducted to examine the impact of the aforementioned independent factors (history of depression and access to weapons) on these outcome variables. Both models identified history of depression as a significant suicide risk factor. ChatGPT-4 demonstrated a more nuanced understanding of the relationship between depression, access to weapons, and suicide risk. In contrast, ChatGPT-3.5 displayed limited insight into this complex relationship. ChatGPT-4 consistently assigned higher severity ratings to suicide-related variables than did ChatGPT-3.5. The study highlights the potential of these two models, particularly ChatGPT-4, to enhance suicide risk assessment by considering complex risk factors.
Affiliation(s)
- Zohar Elyoseph
- Department of Brain Sciences, Faculty of Medicine, Imperial College London, London, England, United Kingdom
- The Center for Psychobiological Research, Department of Psychology and Educational Counseling, Max Stern Yezreel Valley College, Emek Yezreel, Israel
- Inbar Levkovich
- Faculty of Graduate Studies, Oranim Academic College of Education, Kiryat Tiv’on, Israel
17
Choudhury A, Shamszare H. The Impact of Performance Expectancy, Workload, Risk, and Satisfaction on Trust in ChatGPT: Cross-Sectional Survey Analysis. JMIR Hum Factors 2024; 11:e55399. [PMID: 38801658 PMCID: PMC11165287 DOI: 10.2196/55399] [Received: 12/11/2023] [Revised: 03/25/2024] [Accepted: 04/07/2024] [Indexed: 05/29/2024]
Abstract
BACKGROUND ChatGPT (OpenAI) is a powerful tool for a wide range of tasks, from entertainment and creativity to health care queries. There are potential risks and benefits associated with this technology. In the discourse concerning the deployment of ChatGPT and similar large language models, it is sensible to recommend their use primarily for tasks a human user can execute accurately. As we transition into the subsequent phase of ChatGPT deployment, establishing realistic performance expectations and understanding users' perceptions of risk associated with its use are crucial in determining the successful integration of this artificial intelligence (AI) technology. OBJECTIVE The aim of the study is to explore how perceived workload, satisfaction, performance expectancy, and risk-benefit perception influence users' trust in ChatGPT. METHODS A semistructured, web-based survey was conducted with 607 adults in the United States who actively use ChatGPT. The survey questions were adapted from constructs used in various models and theories such as the technology acceptance model, the theory of planned behavior, the unified theory of acceptance and use of technology, and research on trust and security in digital environments. To test our hypotheses and structural model, we used the partial least squares structural equation modeling method, a widely used approach for multivariate analysis. RESULTS A total of 607 people responded to our survey. A significant portion of the participants held at least a high school diploma (n=204, 33.6%), and the majority had a bachelor's degree (n=262, 43.1%). The primary motivations for participants to use ChatGPT were for acquiring information (n=219, 36.1%), amusement (n=203, 33.4%), and addressing problems (n=135, 22.2%). Some participants used it for health-related inquiries (n=44, 7.2%), while a few others (n=6, 1%) used it for miscellaneous activities such as brainstorming, grammar verification, and blog content creation. 
Our model explained 64.6% of the variance in trust. Our analysis indicated a significant relationship between (1) workload and satisfaction, (2) trust and satisfaction, (3) performance expectations and trust, and (4) risk-benefit perception and trust. CONCLUSIONS The findings underscore the importance of ensuring user-friendly design and functionality in AI-based applications to reduce workload and enhance user satisfaction, thereby increasing user trust. Future research should further explore the relationship between risk-benefit perception and trust in the context of AI chatbots.
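Full partial least squares structural equation modeling is beyond a short example, but the headline figure, 64.6% of the variance in trust explained, is an R² statistic. A minimal sketch of R² for a single-predictor least-squares fit, using invented 7-point survey scores rather than the study's data:

```python
def r_squared(x, y):
    # Fit y = a + b*x by ordinary least squares and return R^2,
    # the share of variance in y explained by the fit
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    a0 = my - b * mx
    ss_res = sum((c - (a0 + b * a)) ** 2 for a, c in zip(x, y))
    ss_tot = sum((c - my) ** 2 for c in y)
    return 1 - ss_res / ss_tot

# Hypothetical 1-7 survey scores: performance expectancy vs. trust
expectancy = [2, 3, 4, 5, 6, 7]
trust = [2, 3, 3, 5, 6, 6]
print(round(r_squared(expectancy, trust), 3))  # prints 0.926
```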
Affiliation(s)
- Avishek Choudhury
- Industrial and Management Systems Engineering, Benjamin M. Statler College of Engineering and Mineral Resources, West Virginia University, Morgantown, WV, United States
- Hamid Shamszare
- Industrial and Management Systems Engineering, Benjamin M. Statler College of Engineering and Mineral Resources, West Virginia University, Morgantown, WV, United States
18
Naqvi WM, Shaikh SZ, Mishra GV. Large language models in physical therapy: time to adapt and adept. Front Public Health 2024; 12:1364660. [PMID: 38887241 PMCID: PMC11182445 DOI: 10.3389/fpubh.2024.1364660] [Received: 01/02/2024] [Accepted: 05/10/2024] [Indexed: 06/20/2024]
Abstract
Healthcare is experiencing a transformative phase driven by artificial intelligence (AI) and machine learning (ML). Physical therapists (PTs) stand on the brink of a paradigm shift in education, practice, and research. Rather than a threat, AI presents an opportunity for the profession to revolutionize itself. This paper examines how large language models (LLMs), such as ChatGPT and BioMedLM, driven by deep ML, can offer human-like performance yet face accuracy challenges given the vastness of data in PT and rehabilitation practice. PTs can benefit by developing and training LLMs tailored to the field, using them to streamline administrative tasks, connect globally, and customize treatments. However, human touch and creativity remain invaluable. This paper urges PTs to engage in learning about and shaping AI models, highlighting the need for ethical use and human supervision to address potential biases. Embracing the role of AI contributor, and not just user, is crucial: by integrating AI and fostering collaboration, the field can move toward a future in which AI enriches PT practice, provided data accuracy and the challenges associated with feeding the AI model are sensitively addressed.
Affiliation(s)
- Waqar M. Naqvi
- Department of Interdisciplinary Sciences, Datta Meghe Institute of Higher Education and Research, Wardha, India
- Department of Physiotherapy, College of Health Sciences, Gulf Medical University, Ajman, United Arab Emirates
- NKP Salve Institute of Medical Sciences and Research Center, Nagpur, India
- Summaiya Zareen Shaikh
- Department of Neuro-Physiotherapy, The SIA College of Health Sciences, College of Physiotherapy, Thane, India
- Gaurav V. Mishra
- Department of Radiodiagnosis, Datta Meghe Institute of Higher Education and Research, Wardha, India
19
Jokar M, Abdous A, Rahmanian V. AI chatbots in pet health care: Opportunities and challenges for owners. Vet Med Sci 2024; 10:e1464. [PMID: 38678576 PMCID: PMC11056198 DOI: 10.1002/vms3.1464] [Received: 01/07/2024] [Accepted: 04/04/2024] [Indexed: 05/01/2024]
Abstract
The integration of artificial intelligence (AI) into health care has seen remarkable advancements, with applications extending to animal health. This article explores the potential benefits and challenges associated with employing AI chatbots as tools for pet health care. Focusing on ChatGPT, a prominent language model, the authors elucidate its capabilities and its potential impact on pet owners' decision-making processes. AI chatbots offer pet owners access to extensive information on animal health, research studies and diagnostic options, providing a cost-effective and convenient alternative to traditional veterinary consultations. The outcome of a case involving a Border Collie named Sassy demonstrates the potential benefits of AI in veterinary medicine: ChatGPT played a pivotal role in suggesting a diagnosis that led to successful treatment, showcasing the potential of AI chatbots as valuable tools in complex cases. However, concerns arise regarding pet owners relying solely on AI chatbots for medical advice, potentially resulting in misdiagnosis, inappropriate treatment and delayed professional intervention. We emphasize the need for a balanced approach, positioning AI chatbots as supplementary tools rather than substitutes for licensed veterinarians. To mitigate risks, the article proposes strategies such as educating pet owners on AI chatbots' limitations, implementing regulations to guide AI chatbot companies and fostering collaboration between AI chatbots and veterinarians. The intricate web of responsibilities in this dynamic landscape underscores the importance of government regulations, the educational role of AI chatbots and the symbiotic relationship between AI technology and veterinary expertise. In conclusion, while AI chatbots hold immense promise in transforming pet health care, cautious and informed usage is crucial.
By promoting awareness, establishing regulations and fostering collaboration, the article advocates for a responsible integration of AI chatbots to ensure optimal care for pets.
Affiliation(s)
- Mohammad Jokar
- Faculty of Veterinary Medicine, Karaj Branch, Islamic Azad University, Karaj, Iran
- Arman Abdous
- Faculty of Veterinary Medicine, Karaj Branch, Islamic Azad University, Karaj, Iran
- Vahid Rahmanian
- Department of Public Health, Torbat Jam Faculty of Medical Sciences, Torbat Jam, Iran
20
Sawamura S, Bito T, Ando T, Masuda K, Kameyama S, Ishida H. Evaluation of the accuracy of ChatGPT's responses to and references for clinical questions in physical therapy. J Phys Ther Sci 2024; 36:234-239. [PMID: 38694019 PMCID: PMC11060764 DOI: 10.1589/jpts.36.234] [Received: 12/12/2023] [Accepted: 01/29/2024] [Indexed: 05/03/2024]
Abstract
[Purpose] This study evaluated the accuracy of ChatGPT's responses to and references for five clinical questions in physical therapy based on the Physical Therapy Guidelines and assessed this language model's potential as a tool for supporting clinical decision-making in the rehabilitation field. [Participants and Methods] Five clinical questions from the "Stroke", "Musculoskeletal disorders", and "Internal disorders" sections of the Physical Therapy Guidelines, released by the Japanese Society of Physical Therapy, were presented to ChatGPT. ChatGPT was instructed to provide responses in Japanese accompanied by references such as PubMed IDs or digital object identifiers. The accuracy of the generated content and references was evaluated by two assessors with expertise in their respective sections by using a 4-point scale, and comments were provided for point deductions. The inter-rater agreement was evaluated using weighted kappa coefficients. [Results] ChatGPT demonstrated adequate accuracy in generating content for clinical questions in physical therapy. However, the accuracy of the references was poor, with a significant number of references being non-existent or misinterpreted. [Conclusion] ChatGPT has limitations in reference selection and reliability. While ChatGPT can offer accurate responses to clinical questions in physical therapy, it should be used with caution because it is not a completely reliable model.
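The inter-rater agreement reported here was evaluated with weighted kappa coefficients on a 4-point scale. A minimal pure-Python sketch of weighted Cohen's kappa (quadratic weights by default); the ratings below are invented, not the assessors' data:

```python
def weighted_kappa(ratings_a, ratings_b, n_levels, weights="quadratic"):
    # Weighted Cohen's kappa for two raters scoring on an ordinal scale
    # with categories 0 .. n_levels-1 (e.g., a 4-point accuracy scale)
    n = len(ratings_a)
    obs = [[0.0] * n_levels for _ in range(n_levels)]
    for a, b in zip(ratings_a, ratings_b):
        obs[a][b] += 1 / n
    pa = [sum(row) for row in obs]                                    # rater A marginals
    pb = [sum(obs[i][j] for i in range(n_levels)) for j in range(n_levels)]  # rater B marginals
    power = 2 if weights == "quadratic" else 1
    w = [[abs(i - j) ** power / (n_levels - 1) ** power
          for j in range(n_levels)] for i in range(n_levels)]
    d_obs = sum(w[i][j] * obs[i][j] for i in range(n_levels) for j in range(n_levels))
    d_exp = sum(w[i][j] * pa[i] * pb[j] for i in range(n_levels) for j in range(n_levels))
    return 1 - d_obs / d_exp

# Two assessors scoring four responses on a 4-point scale (0-3),
# disagreeing by one level on the last response
print(round(weighted_kappa([0, 1, 2, 3], [0, 1, 2, 2], 4), 3))  # prints 0.875
```

Quadratic weighting penalizes large disagreements more than adjacent-category ones, which suits ordinal rating scales like the 4-point scale used here.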
Affiliation(s)
- Shogo Sawamura: Department of Rehabilitation, Heisei College of Health Sciences, 180 Kurono, Gifu City, Gifu 501-1131, Japan
- Takanobu Bito: Department of Rehabilitation, Gifu University Hospital, Japan
- Takahiro Ando: Department of Rehabilitation, Gifu University Hospital, Japan
- Kento Masuda: Department of Rehabilitation, Gifu University Hospital, Japan
- Sakiko Kameyama: Department of Rehabilitation, Heisei College of Health Sciences, 180 Kurono, Gifu City, Gifu 501-1131, Japan
- Hiroyasu Ishida: Department of Rehabilitation, Heisei College of Health Sciences, 180 Kurono, Gifu City, Gifu 501-1131, Japan
21
Wang L, Wan Z, Ni C, Song Q, Li Y, Clayton EW, Malin BA, Yin Z. A Systematic Review of ChatGPT and Other Conversational Large Language Models in Healthcare. medRxiv [Preprint] 2024:2024.04.26.24306390. PMID: 38712148; PMCID: PMC11071576; DOI: 10.1101/2024.04.26.24306390.
Abstract
Background The launch of the Chat Generative Pre-trained Transformer (ChatGPT) in November 2022 has attracted public attention and academic interest to large language models (LLMs), facilitating the emergence of many other innovative LLMs. These LLMs have been applied in various fields, including healthcare. Numerous studies have since been conducted regarding how to employ state-of-the-art LLMs in health-related scenarios to assist patients, doctors, and public health administrators. Objective This review aims to summarize the applications and concerns of applying conversational LLMs in healthcare and provide an agenda for future research on LLMs in healthcare. Methods We utilized PubMed, ACM, and IEEE digital libraries as primary sources for this review. We followed the guidance of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) to screen and select peer-reviewed research articles that (1) were related to both healthcare applications and conversational LLMs and (2) were published before September 1st, 2023, the date when we started paper collection and screening. We investigated these papers and classified them according to their applications and concerns. Results Our search initially identified 820 papers according to targeted keywords, out of which 65 papers met our criteria and were included in the review. The most popular conversational LLM was ChatGPT from OpenAI (60 papers), followed by Bard from Google (1), Large Language Model Meta AI (LLaMA) from Meta (1), and other LLMs (5). These papers were classified into four categories of applications: 1) summarization, 2) medical knowledge inquiry, 3) prediction, and 4) administration, and four categories of concerns: 1) reliability, 2) bias, 3) privacy, and 4) public acceptability. Overall, 49 (75%) of the papers used LLMs for summarization and/or medical knowledge inquiry, and 58 (89%) expressed concerns about reliability and/or bias.
We found that conversational LLMs exhibit promising results in summarization and in providing medical knowledge to patients with relatively high accuracy. However, conversational LLMs like ChatGPT are often unable to provide reliable answers to complex health-related tasks that require specialized domain expertise. Additionally, none of the reviewed papers included experiments designed to examine how conversational LLMs introduce bias or privacy issues in healthcare research. Conclusions Future studies should focus on improving the reliability of LLM applications in complex health-related tasks, and on investigating the mechanisms by which LLM applications introduce bias and privacy issues. Given the broad accessibility of LLMs, legal, social, and technical efforts are all needed to address these concerns and to promote, improve, and regulate the application of LLMs in healthcare.
Affiliation(s)
- Leyao Wang: Department of Computer Science, Vanderbilt University, Nashville, TN 37212, USA
- Zhiyu Wan: Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, USA
- Congning Ni: Department of Computer Science, Vanderbilt University, Nashville, TN 37212, USA
- Qingyuan Song: Department of Computer Science, Vanderbilt University, Nashville, TN 37212, USA
- Yang Li: Department of Computer Science, Vanderbilt University, Nashville, TN 37212, USA
- Ellen Wright Clayton: Department of Pediatrics and Center for Biomedical Ethics and Society, Vanderbilt University Medical Center, Nashville, TN 37203, USA
- Bradley A. Malin: Department of Computer Science, Vanderbilt University, Nashville, TN 37212, USA; Departments of Biomedical Informatics and Biostatistics, Vanderbilt University Medical Center, Nashville, TN 37203, USA
- Zhijun Yin: Department of Computer Science, Vanderbilt University, Nashville, TN 37212, USA; Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, USA
22
Choudhury A, Chaudhry Z. Large Language Models and User Trust: Consequence of Self-Referential Learning Loop and the Deskilling of Health Care Professionals. J Med Internet Res 2024; 26:e56764. PMID: 38662419; PMCID: PMC11082730; DOI: 10.2196/56764.
Abstract
As the health care industry increasingly embraces large language models (LLMs), understanding the consequences of this integration becomes crucial for maximizing benefits while mitigating potential pitfalls. This paper explores the evolving relationship among clinician trust in LLMs, the transition of data sources from predominantly human-generated to artificial intelligence (AI)-generated content, and the subsequent impact on the performance of LLMs and clinician competence. One of the primary concerns identified in this paper is the LLMs' self-referential learning loops, where AI-generated content feeds into the learning algorithms, threatening the diversity of the data pool, potentially entrenching biases, and reducing the efficacy of LLMs. While theoretical at this stage, this feedback loop poses a significant challenge as the integration of LLMs in health care deepens, emphasizing the need for proactive dialogue and strategic measures to ensure the safe and effective use of LLM technology. Another key takeaway from our investigation is the role of user expertise and the necessity for a discerning approach to trusting and validating LLM outputs. The paper highlights how expert users, particularly clinicians, can leverage LLMs to enhance productivity by off-loading routine tasks while maintaining the critical oversight needed to identify and correct potential inaccuracies in AI-generated content. This balance of trust and skepticism is vital for ensuring that LLMs augment rather than undermine the quality of patient care. We also discuss the risks associated with the deskilling of health care professionals. Frequent reliance on LLMs for critical tasks could result in a decline in health care providers' diagnostic and thinking skills, particularly affecting the training and development of future professionals. The legal and ethical considerations surrounding the deployment of LLMs in health care are also examined.
We discuss the medicolegal challenges, including liability in cases of erroneous diagnoses or treatment advice generated by LLMs. The paper references recent legislative efforts, such as The Algorithmic Accountability Act of 2023, as crucial steps toward establishing a framework for the ethical and responsible use of AI-based technologies in health care. In conclusion, this paper advocates for a strategic approach to integrating LLMs into health care. By emphasizing the importance of maintaining clinician expertise, fostering critical engagement with LLM outputs, and navigating the legal and ethical landscape, we can ensure that LLMs serve as valuable tools in enhancing patient care and supporting health care professionals. This approach addresses the immediate challenges posed by integrating LLMs and sets a foundation for their maintainable and responsible use in the future.
Affiliation(s)
- Avishek Choudhury: Industrial and Management Systems Engineering, West Virginia University, Morgantown, WV, United States
- Zaira Chaudhry: Industrial and Management Systems Engineering, West Virginia University, Morgantown, WV, United States
23
Ostrowska M, Kacała P, Onolememen D, Vaughan-Lane K, Sisily Joseph A, Ostrowski A, Pietruszewska W, Banaszewski J, Wróbel MJ. To trust or not to trust: evaluating the reliability and safety of AI responses to laryngeal cancer queries. Eur Arch Otorhinolaryngol 2024 (online ahead of print). PMID: 38652298; DOI: 10.1007/s00405-024-08643-8.
Abstract
PURPOSE As online health information-seeking surges, concerns mount over the quality and safety of accessible content, which can lead to patient harm through misinformation. On the one hand, the emergence of artificial intelligence (AI) in healthcare could help prevent such harm; on the other, questions arise regarding the quality and safety of the medical information AI provides. As laryngeal cancer is a prevalent head and neck malignancy, this study aims to evaluate the utility and safety of three large language models (LLMs) as sources of patient information about laryngeal cancer. METHODS A cross-sectional study was conducted using three LLMs (ChatGPT 3.5, ChatGPT 4.0, and Bard). A questionnaire comprising 36 inquiries about laryngeal cancer was categorised into diagnosis (11 questions), treatment (9 questions), novelties and upcoming treatments (4 questions), controversies (8 questions), and sources of information (4 questions). The reviewers comprised three groups: ENT specialists, junior physicians, and non-medical reviewers, who graded the responses. Each physician evaluated each question twice per model, while non-medical reviewers evaluated each question once. All reviewers were blinded to the model type, and the question order was shuffled. Outcome evaluations were based on a safety score (1-3) and a Global Quality Score (GQS, 1-5). Results were compared between LLMs. The study included iterative assessments and statistical validations. RESULTS Analysis revealed that ChatGPT 3.5 scored highest in both safety (mean: 2.70) and GQS (mean: 3.95). ChatGPT 4.0 and Bard had lower safety scores of 2.56 and 2.42, respectively, with corresponding quality scores of 3.65 and 3.38. Inter-rater reliability was consistent, with less than 3% discrepancy. About 4.2% of responses fell into the lowest safety category (1), particularly in the novelties category. Non-medical reviewers' quality assessments correlated moderately (r = 0.67) with response length.
CONCLUSIONS LLMs can be valuable resources for patients seeking information on laryngeal cancer. ChatGPT 3.5 provided the most reliable and safe responses among the models evaluated.
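The moderate correlation reported above (r = 0.67 between non-medical quality ratings and response length) is a plain Pearson coefficient. A minimal sketch, using made-up GQS ratings and word counts rather than the study's data:

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# hypothetical GQS ratings (1-5) and response lengths in words
gqs = [3, 4, 5, 2, 4, 3, 5, 2]
length_words = [120, 180, 260, 90, 210, 150, 240, 110]
print(round(pearson_r(gqs, length_words), 2))
```

An r near 0.67 would mean longer answers tend to be rated higher, but length explains well under half of the variance in quality scores, consistent with the caution the study urges.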
Affiliation(s)
- Magdalena Ostrowska: Department of Otolaryngology and Laryngological Oncology, Collegium Medicum, Nicolaus Copernicus University in Torun, ul. Marie Sklodowskiej-Curie 9, 85-094 Bydgoszcz, Poland
- Paulina Kacała: ENT Scientific Club, Department of Otolaryngology and Laryngological Oncology, Collegium Medicum, Nicolaus Copernicus University in Torun, ul. Marie Sklodowskiej-Curie 9, 85-094 Bydgoszcz, Poland
- Deborah Onolememen: ENT Scientific Club, Department of Otolaryngology and Laryngological Oncology, Collegium Medicum, Nicolaus Copernicus University in Torun, ul. Marie Sklodowskiej-Curie 9, 85-094 Bydgoszcz, Poland
- Katie Vaughan-Lane: ENT Scientific Club, Department of Otolaryngology and Laryngological Oncology, Collegium Medicum, Nicolaus Copernicus University in Torun, ul. Marie Sklodowskiej-Curie 9, 85-094 Bydgoszcz, Poland
- Anitta Sisily Joseph: ENT Scientific Club, Department of Otolaryngology and Laryngological Oncology, Collegium Medicum, Nicolaus Copernicus University in Torun, ul. Marie Sklodowskiej-Curie 9, 85-094 Bydgoszcz, Poland
- Adam Ostrowski: Department of Urology, Collegium Medicum, Nicolaus Copernicus University in Torun, ul. Marie Sklodowskiej-Curie 9, 85-094 Bydgoszcz, Poland
- Wioletta Pietruszewska: Department of Otolaryngology, Laryngological Oncology, Audiology and Phoniatrics, Medical University of Lodz, ul. Żeromskiego 113, 90-549 Lodz, Poland
- Jacek Banaszewski: Department of Otolaryngology, Head and Neck Oncology, Poznan University of Medical Science, ul. Przybyszewskiego 49, 60-355 Poznań, Poland
- Maciej J Wróbel: Department of Otolaryngology and Laryngological Oncology, Collegium Medicum, Nicolaus Copernicus University in Torun, ul. Marie Sklodowskiej-Curie 9, 85-094 Bydgoszcz, Poland
24
Moise A, Centomo-Bozzo A, Orishchak O, Alnoury MK, Daniel SJ. Can ChatGPT Replace an Otolaryngologist in Guiding Parents on Tonsillectomy? Ear Nose Throat J 2024 (online ahead of print). PMID: 38563440; DOI: 10.1177/01455613241230841.
Abstract
Background: ChatGPT is an artificial intelligence tool that utilizes machine learning to analyze and generate human-like text. Its user-friendly accessibility enables patients to access medical information conveniently, without grappling with intricate terminology. The objective of this study was to assess the accuracy of ChatGPT in providing insights into the indications for tonsillectomy, a common pediatric otolaryngology procedure, and the management of its complications. Methods: The responses generated by ChatGPT were compared to the "Clinical practice guidelines: tonsillectomy in children-executive summary" developed by the American Academy of Otolaryngology-Head and Neck Surgery Foundation (AAO-HNSF). Predetermined questions regarding indications for tonsillectomy and post-tonsillectomy complications were presented to ChatGPT, and 2 otolaryngology experts compared its responses with the established guideline. The responses of both parties were reviewed by the senior author. Results: A total of 16 responses generated by ChatGPT were assessed. After a comprehensive review, 15 of 16 (93.8%) responses demonstrated a high degree of reliability and accuracy, closely adhering to the standard established by the AAO-HNSF guideline. Conclusion: The results support the potential of ChatGPT to enhance healthcare delivery by making guidelines more accessible to patients, while also emphasizing the importance of ensuring that the medical advice provided to patients is accurate and reliable.
Affiliation(s)
- Alexander Moise: Faculty of Medicine and Health Sciences, McGill University, Montreal, QC, Canada
- Adam Centomo-Bozzo: Faculty of Dental Medicine and Oral Health Sciences, McGill University, Montreal, QC, Canada
- Ostap Orishchak: Department of Pediatric Otolaryngology, Montreal Children's Hospital, Montreal, QC, Canada
- Mohammed K Alnoury: Department of Otolaryngology-Head and Neck Surgery, King Abdulaziz University, Jeddah, Saudi Arabia
- Sam J Daniel: Department of Pediatric Otolaryngology, Montreal Children's Hospital, Montreal, QC, Canada
25
Wimbarti S, Kairupan BHR, Tallei TE. Critical review of self-diagnosis of mental health conditions using artificial intelligence. Int J Ment Health Nurs 2024; 33:344-358. PMID: 38345132; DOI: 10.1111/inm.13303.
Abstract
The advent of artificial intelligence (AI) has revolutionised various aspects of our lives, including mental health nursing. AI-driven tools and applications have provided a convenient and accessible means for individuals to assess their mental well-being within the confines of their homes. Nonetheless, the widespread trend of self-diagnosing mental health conditions through AI poses considerable risks. This review article examines the perils associated with relying on AI for self-diagnosis in mental health, highlighting the constraints and possible adverse outcomes that can arise from such practices. It delves into the ethical, psychological, and social implications, underscoring the vital role of mental health professionals, including psychologists, psychiatrists, and nursing specialists, in providing professional assistance and guidance. This article aims to highlight the importance of seeking professional assistance and guidance in addressing mental health concerns, especially in the era of AI-driven self-diagnosis.
Affiliation(s)
- Supra Wimbarti: Faculty of Psychology, Universitas Gadjah Mada, Yogyakarta, Indonesia
- B H Ralph Kairupan: Department of Psychiatry, Faculty of Medicine, Sam Ratulangi University, Manado, North Sulawesi, Indonesia
- Trina Ekawati Tallei: Department of Biology, Faculty of Mathematics and Natural Sciences, and Department of Biology, Faculty of Medicine, Sam Ratulangi University, Manado, North Sulawesi, Indonesia
26
Parikh AO, Oca MC, Conger JR, McCoy A, Chang J, Zhang-Nunes S. Accuracy and Bias in Artificial Intelligence Chatbot Recommendations for Oculoplastic Surgeons. Cureus 2024; 16:e57611. PMID: 38707042; PMCID: PMC11069401; DOI: 10.7759/cureus.57611.
Abstract
Purpose The purpose of this study is to assess the accuracy of, and bias in, recommendations for oculoplastic surgeons from three artificial intelligence (AI) chatbot systems. Methods ChatGPT, Microsoft Bing Balanced, and Google Bard were asked for recommendations for oculoplastic surgeons practicing in the 20 most populous cities in the United States. Three prompts were used: "can you help me find (an oculoplastic surgeon)/(a doctor who does eyelid lifts)/(an oculofacial plastic surgeon) in (city)." Results A total of 672 suggestions were made across the three prompts; 19.8% of suggestions were excluded, leaving 539 suggested physicians. Of these, 64.1% were oculoplastics specialists (of whom 70.1% were American Society of Ophthalmic Plastic and Reconstructive Surgery (ASOPRS) members); 16.1% were general plastic surgery trained, 9.0% were ENT trained, 8.8% were ophthalmology but not oculoplastics trained, and 1.9% were trained in another specialty. Across all AI systems, 27.7% of recommended physicians were female. Conclusions Among the chatbot systems tested, rates of inaccuracy were high: up to 38% of recommended surgeons were nonexistent or not practicing in the city requested, and 35.9% of those recommended as oculoplastic/oculofacial plastic surgeons were not oculoplastics specialists. The choice of prompt affected the results, with requests for "a doctor who does eyelid lifts" returning more plastic surgeons and ENTs and fewer oculoplastic surgeons. Identifying inaccuracies and biases in AI-generated recommendations is important as more patients begin using these systems to choose a surgeon.
Affiliation(s)
- Alomi O Parikh: Ophthalmology, USC Roski Eye Institute, Keck School of Medicine, University of Southern California, Los Angeles, USA
- Michael C Oca: Ophthalmology, University of California San Diego School of Medicine, La Jolla, USA
- Jordan R Conger: Oculofacial Plastic Surgery, USC Roski Eye Institute, Keck School of Medicine, University of Southern California, Los Angeles, USA
- Allison McCoy: Oculofacial Plastic Surgery, Del Mar Plastic Surgery, San Diego, USA
- Jessica Chang: Oculofacial Plastic Surgery, USC Roski Eye Institute, Keck School of Medicine, University of Southern California, Los Angeles, USA
- Sandy Zhang-Nunes: Ophthalmology, USC Roski Eye Institute, Keck School of Medicine, University of Southern California, Los Angeles, USA
27
Xue Z, Zhang Y, Gan W, Wang H, She G, Zheng X. Quality and Dependability of ChatGPT and DingXiangYuan Forums for Remote Orthopedic Consultations: Comparative Analysis. J Med Internet Res 2024; 26:e50882. PMID: 38483451; PMCID: PMC10979330; DOI: 10.2196/50882.
Abstract
BACKGROUND The widespread use of artificial intelligence, such as ChatGPT (OpenAI), is transforming sectors, including health care, while separate advancements of the internet have enabled platforms such as China's DingXiangYuan to offer remote medical services. OBJECTIVE This study evaluates ChatGPT-4's responses against those of professional health care providers in telemedicine, assessing artificial intelligence's capability to support the surge in remote medical consultations and its impact on health care delivery. METHODS We sourced remote orthopedic consultations from "Doctor DingXiang," with responses from its certified physicians as the control group and ChatGPT's responses as the experimental group. In all, 3 blinded, experienced orthopedic surgeons assessed responses against 7 criteria: "logical reasoning," "internal information," "external information," "guiding function," "therapeutic effect," "medical knowledge popularization education," and "overall satisfaction." We used Fleiss κ to measure agreement among multiple raters. RESULTS Initially, consultation records covering 8 conditions (800 cases) were gathered. After primary screening and rescreening, which excluded records containing private information, images, or voice messages, we ultimately included 73 consultation records by May 2023. After statistical scoring, we found that ChatGPT's "internal information" score (mean 4.61, SD 0.52 points vs mean 4.66, SD 0.49 points; P=.43) and "therapeutic effect" score (mean 4.43, SD 0.75 points vs mean 4.55, SD 0.62 points; P=.32) were lower than those of the control group, but the differences were not statistically significant.
ChatGPT showed better performance with a higher "logical reasoning" score (mean 4.81, SD 0.36 points vs mean 4.75, SD 0.39 points; P=.38), "external information" score (mean 4.06, SD 0.72 points vs mean 3.92, SD 0.77 points; P=.25), and "guiding function" score (mean 4.73, SD 0.51 points vs mean 4.72, SD 0.54 points; P=.96), although the differences were not statistically significant. Meanwhile, the "medical knowledge popularization education" score of ChatGPT was better than that of the control group (mean 4.49, SD 0.67 points vs mean 3.87, SD 1.01 points; P<.001), and this difference was statistically significant. In terms of "overall satisfaction," the difference was not statistically significant between the groups (mean 8.35, SD 1.38 points vs mean 8.37, SD 1.24 points; P=.92). Under standard interpretations of Fleiss κ, 6 of the control group's score points displayed "fair agreement" (P<.001) and 1 showed "substantial agreement" (P<.001); in the experimental group, 3 points indicated "fair agreement" and 4 indicated "moderate agreement" (P<.001). CONCLUSIONS ChatGPT-4 matches the expertise found in DingXiangYuan forums' paid consultations, excelling particularly in scientific education. It presents a promising alternative for remote health advice. For health care professionals, it could act as an aid in patient education, while patients may use it as a convenient tool for health inquiries.
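Fleiss κ, the agreement statistic used above, extends Cohen's κ from two raters to any fixed number of raters. A minimal sketch of the standard formula follows; the rating table is hypothetical, not the study's data.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a subjects-by-categories table of rating counts.

    counts[i][j] = number of raters assigning subject i to category j;
    every row must sum to the same number of raters m.
    """
    n = len(counts)       # number of subjects
    m = sum(counts[0])    # raters per subject
    k = len(counts[0])    # number of categories
    total = n * m
    # overall proportion of ratings falling in each category
    p = [sum(counts[i][j] for i in range(n)) / total for j in range(k)]
    # per-subject observed agreement among the m raters
    agree = [(sum(c * c for c in row) - m) / (m * (m - 1)) for row in counts]
    p_bar = sum(agree) / n                  # mean observed agreement
    p_exp = sum(pj * pj for pj in p)        # agreement expected by chance
    return (p_bar - p_exp) / (1 - p_exp)

# hypothetical: 4 consultations scored into 3 ordinal bins by 3 raters
table = [
    [3, 0, 0],
    [0, 2, 1],
    [1, 1, 1],
    [0, 0, 3],
]
print(round(fleiss_kappa(table), 3))
```

Values of κ in roughly 0.21-0.40 are conventionally read as "fair agreement" and 0.41-0.60 as "moderate agreement" (Landis and Koch), which is the interpretation scheme the abstract appears to follow.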
Affiliation(s)
- Zhaowen Xue, Yiming Zhang, Wenyi Gan, Huajun Wang, Guorong She, Xiaofei Zheng: Department of Bone and Joint Surgery and Sports Medicine Center, The First Affiliated Hospital of Jinan University, Guangzhou, China
28
Akinci D’Antonoli T, Stanzione A, Bluethgen C, Vernuccio F, Ugga L, Klontzas ME, Cuocolo R, Cannella R, Koçak B. Large language models in radiology: fundamentals, applications, ethical considerations, risks, and future directions. Diagn Interv Radiol 2024; 30:80-90. PMID: 37789676; PMCID: PMC10916534; DOI: 10.4274/dir.2023.232417.
Abstract
With the advent of large language models (LLMs), the artificial intelligence revolution in medicine and radiology is now more tangible than ever. Every day, an increasingly large number of articles are published that utilize LLMs in radiology. To adopt and safely implement this new technology in the field, radiologists should be familiar with its key concepts, understand at least the technical basics, and be aware of the potential risks and ethical considerations that come with it. In this review article, the authors provide an overview of the LLMs that might be relevant to the radiology community and include a brief discussion of their short history, technical basics, ChatGPT, prompt engineering, potential applications in medicine and radiology, advantages, disadvantages and risks, ethical and regulatory considerations, and future directions.
Affiliation(s)
- Tugba Akinci D’Antonoli: Institute of Radiology and Nuclear Medicine, Cantonal Hospital Baselland, Liestal, Switzerland
- Arnaldo Stanzione: Department of Advanced Biomedical Sciences, University of Naples Federico II, Naples, Italy
- Christian Bluethgen: Institute for Diagnostic and Interventional Radiology, University Hospital Zurich, University of Zurich, Zurich, Switzerland
- Lorenzo Ugga: Department of Advanced Biomedical Sciences, University of Naples Federico II, Naples, Italy
- Michail E. Klontzas: Department of Medical Imaging, University Hospital of Heraklion, Crete, Greece; Department of Radiology, University of Crete, Heraklion, Crete, Greece; Computational Biomedicine Laboratory, Institute of Computer Science, FORTH, Heraklion, Crete, Greece
- Renato Cuocolo: Department of Medicine, Surgery and Dentistry, University of Salerno, Baronissi, Italy
- Roberto Cannella: Department of Biomedicine, Neuroscience and Advanced Diagnostics, Section of Radiology, University of Palermo, Palermo, Italy
- Burak Koçak: Clinic of Radiology, Basakşehir Çam and Sakura City Hospital, University of Health Sciences, İstanbul, Türkiye
29
Spotnitz M, Idnay B, Gordon ER, Shyu R, Zhang G, Liu C, Cimino JJ, Weng C. A Survey of Clinicians' Views of the Utility of Large Language Models. Appl Clin Inform 2024; 15:306-312. PMID: 38442909; PMCID: PMC11023712; DOI: 10.1055/a-2281-7092.
Abstract
OBJECTIVES Large language models (LLMs) such as ChatGPT (Chat Generative Pre-trained Transformer) are powerful algorithms that have been shown to produce human-like text from input data. Several potential clinical applications of this technology have been proposed and evaluated by biomedical informatics experts. However, few have surveyed health care providers for their opinions about whether the technology is fit for use. METHODS We distributed a validated mixed-methods survey to gauge practicing clinicians' comfort with LLMs for a breadth of tasks in clinical practice, research, and education, which were selected from the literature. RESULTS A total of 30 clinicians fully completed the survey. Of the 23 tasks, 16 were rated positively by more than 50% of the respondents. Based on our qualitative analysis, health care providers considered LLMs to have excellent synthesis skills and efficiency. However, our respondents had concerns that LLMs could generate false information and propagate training data bias. Survey respondents were most comfortable with scenarios that allow LLMs to function in an assistive role, like a physician extender or trainee. CONCLUSION In a mixed-methods survey of clinicians about LLM use, health care providers were encouraging of having LLMs in health care for many tasks, especially in assistive roles. There is a need for continued human-centered development of both LLMs and artificial intelligence in general.
Affiliation(s)
- Matthew Spotnitz: Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, New York, United States
- Betina Idnay: Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, New York, United States
- Emily R. Gordon: Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, New York, United States; Department of Dermatology, Vagelos College of Physicians and Surgeons, Columbia University Irving Medical Center, New York, New York, United States
- Rebecca Shyu: Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, New York, United States
- Gongbo Zhang: Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, New York, United States
- Cong Liu: Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, New York, United States
- James J. Cimino: Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, New York, United States; Department of Biomedical Informatics and Data Science, Informatics Institute, Heersink School of Medicine, University of Alabama at Birmingham, Birmingham, Alabama, United States
- Chunhua Weng: Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, New York, United States
30
Monteith S, Glenn T, Geddes JR, Whybrow PC, Achtyes ED, Bauer M. Implications of Online Self-Diagnosis in Psychiatry. Pharmacopsychiatry 2024; 57:45-52. [PMID: 38471511] [DOI: 10.1055/a-2268-5441]
Abstract
Online self-diagnosis of psychiatric disorders by the general public is increasing. The reasons for the increase include the expansion of Internet technologies and the use of social media, the rapid growth of direct-to-consumer e-commerce in healthcare, and the increased emphasis on patient involvement in decision making. The publicity given to artificial intelligence (AI) has also contributed to the increased use of online screening tools by the general public. This paper aims to review factors contributing to the expansion of online self-diagnosis by the general public, and discuss both the risks and benefits of online self-diagnosis of psychiatric disorders. A narrative review was performed with examples obtained from the scientific literature and commercial articles written for the general public. Online self-diagnosis of psychiatric disorders is growing rapidly. Some people with a positive result on a screening tool will seek professional help. However, there are many potential risks for patients who self-diagnose, including an incorrect or dangerous diagnosis, increased patient anxiety about the diagnosis, obtaining unfiltered advice on social media, using the self-diagnosis to self-treat, including online purchase of medications without a prescription, and technical issues including the loss of privacy. Physicians need to be aware of the increase in self-diagnosis by the general public and the potential risks, both medical and technical. Psychiatrists must recognize that the general public is often unaware of the challenging medical and technical issues involved in the diagnosis of a mental disorder, and be ready to treat patients who have already obtained an online self-diagnosis.
Affiliation(s)
- Scott Monteith: Michigan State University College of Human Medicine, Traverse City Campus, Traverse City, Michigan, USA
- Tasha Glenn: ChronoRecord Association, Fullerton, California, USA
- John R Geddes: Department of Psychiatry, University of Oxford, Warneford Hospital, Oxford, UK
- Peter C Whybrow: Department of Psychiatry and Biobehavioral Sciences, Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles (UCLA), Los Angeles, California, USA
- Eric D Achtyes: Department of Psychiatry, Western Michigan University Homer Stryker M.D. School of Medicine, Kalamazoo, Michigan, USA
- Michael Bauer: Department of Psychiatry and Psychotherapy, University Hospital Carl Gustav Carus Medical Faculty, Technische Universität Dresden, Dresden, Germany
31
Rouhi AD, Ghanem YK, Yolchieva L, Saleh Z, Joshi H, Moccia MC, Suarez-Pierre A, Han JJ. Can Artificial Intelligence Improve the Readability of Patient Education Materials on Aortic Stenosis? A Pilot Study. Cardiol Ther 2024; 13:137-147. [PMID: 38194058] [PMCID: PMC10899139] [DOI: 10.1007/s40119-023-00347-0]
Abstract
INTRODUCTION The advent of generative artificial intelligence (AI) dialogue platforms and large language models (LLMs) may help facilitate ongoing efforts to improve health literacy. Additionally, recent studies have highlighted inadequate health literacy among patients with cardiac disease. The aim of the present study was to ascertain whether two freely available generative AI dialogue platforms could rewrite online aortic stenosis (AS) patient education materials (PEMs) to meet recommended reading skill levels for the public. METHODS Online PEMs were gathered from a professional cardiothoracic surgical society and academic institutions in the USA. PEMs were then inputted into two AI-powered LLMs, ChatGPT-3.5 and Bard, with the prompt "translate to 5th-grade reading level". Readability of PEMs before and after AI conversion was measured using the validated Flesch Reading Ease (FRE), Flesch-Kincaid Grade Level (FKGL), Simple Measure of Gobbledygook Index (SMOGI), and Gunning-Fog Index (GFI) scores. RESULTS Overall, 21 PEMs on AS were gathered. Original readability measures indicated difficult readability at the 10th-12th grade reading level. ChatGPT-3.5 successfully improved readability across all four measures (p < 0.001) to the approximately 6th-7th grade reading level. Bard successfully improved readability across all measures (p < 0.001) except for SMOGI (p = 0.729) to the approximately 8th-9th grade level. Neither platform generated PEMs written below the recommended 6th-grade reading level. ChatGPT-3.5 demonstrated significantly more favorable post-conversion readability scores, percentage change in readability scores, and conversion time compared to Bard (all p < 0.001). CONCLUSION AI dialogue platforms can enhance the readability of PEMs for patients with AS but may not fully meet recommended reading skill levels, highlighting potential tools to help strengthen cardiac health literacy in the future.
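For readers unfamiliar with the readability formulas named in this abstract, the Flesch-Kincaid Grade Level (FKGL) is a simple function of average sentence length and average syllables per word. The following is a minimal illustrative sketch, not the validated implementation the study used; in particular, the regex-based syllable counter is a rough stand-in for the dictionary-backed counters real readability tools rely on:

```python
import re

def count_syllables(word: str) -> int:
    # Rough heuristic: one syllable per contiguous vowel group.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fkgl(text: str) -> float:
    # Flesch-Kincaid Grade Level:
    #   0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words))
            - 15.59)
```

Higher scores correspond to higher U.S. school grade levels, which is why a prompt like "translate to 5th-grade reading level" is, in effect, asking the model to push this score down.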
Affiliation(s)
- Armaun D Rouhi: Department of Surgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Yazid K Ghanem: Department of Surgery, Cooper University Hospital, Camden, NJ, USA
- Laman Yolchieva: College of Arts and Sciences, University of Pennsylvania, Philadelphia, PA, USA
- Zena Saleh: Department of Surgery, Cooper University Hospital, Camden, NJ, USA
- Hansa Joshi: Department of Surgery, Cooper University Hospital, Camden, NJ, USA
- Matthew C Moccia: Department of Surgery, Cooper University Hospital, Camden, NJ, USA
- Jason J Han: Division of Cardiovascular Surgery, Department of Surgery, Perelman School of Medicine, Hospital of the University of Pennsylvania, Philadelphia, PA, USA
32
Sallam M, Barakat M, Sallam M. A Preliminary Checklist (METRICS) to Standardize the Design and Reporting of Studies on Generative Artificial Intelligence-Based Models in Health Care Education and Practice: Development Study Involving a Literature Review. Interact J Med Res 2024; 13:e54704. [PMID: 38276872] [PMCID: PMC10905357] [DOI: 10.2196/54704]
Abstract
BACKGROUND Adherence to evidence-based practice is indispensable in health care. Recently, the utility of generative artificial intelligence (AI) models in health care has been evaluated extensively. However, the lack of consensus guidelines on the design and reporting of findings of these studies poses a challenge for the interpretation and synthesis of evidence. OBJECTIVE This study aimed to develop a preliminary checklist to standardize the reporting of generative AI-based studies in health care education and practice. METHODS A literature review was conducted in Scopus, PubMed, and Google Scholar. Published records with "ChatGPT," "Bing," or "Bard" in the title were retrieved. Careful examination of the methodologies employed in the included records was conducted to identify the common pertinent themes and the possible gaps in reporting. A panel discussion was held to establish a unified and thorough checklist for the reporting of AI studies in health care. The finalized checklist was used to evaluate the included records by 2 independent raters, with Cohen κ used to evaluate the interrater reliability. RESULTS The final data set that formed the basis for pertinent theme identification and analysis comprised a total of 34 records. The finalized checklist included 9 pertinent themes collectively referred to as METRICS (Model, Evaluation, Timing/Transparency, Range/Randomization, Individual factors, Count, and Specificity of prompts and language). Their details are as follows: (1) Model used and its exact settings; (2) Evaluation approach for the generated content; (3) Timing of testing the model; (4) Transparency of the data source; (5) Range of tested topics; (6) Randomization of selecting the queries; (7) Individual factors in selecting the queries and interrater reliability; (8) Count of queries executed to test the model; and (9) Specificity of the prompts and language used. The overall mean METRICS score was 3.0 (SD 0.58).
Interrater reliability was acceptable, with Cohen κ ranging from 0.558 to 0.962 (P<.001 for the 9 tested items). Per item, the highest average METRICS score was recorded for the "Model" item, followed by the "Specificity" item, while the lowest scores were recorded for the "Randomization" item (classified as suboptimal) and the "Individual factors" item (classified as satisfactory). CONCLUSIONS The METRICS checklist can facilitate the design of studies, guiding researchers toward best practices in reporting results. The findings highlight the need for standardized reporting algorithms for generative AI-based studies in health care, considering the variability observed in methodologies and reporting. The proposed METRICS checklist could be a helpful preliminary base for establishing a universally accepted approach to standardize the design and reporting of generative AI-based studies in health care, which is a swiftly evolving research topic.
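Cohen κ, the interrater-reliability statistic reported in this abstract, corrects the raters' observed agreement for the agreement expected by chance from each rater's marginal category frequencies. A minimal sketch for two raters scoring the same items on a nominal scale (illustrative only, not the study's code):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    # kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement
    # and p_e is chance agreement from each rater's marginal frequencies.
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    ca, cb = Counter(rater_a), Counter(rater_b)
    p_e = sum(ca[c] * cb[c] for c in ca) / (n * n)
    if p_e == 1:  # degenerate case: both raters used one identical category
        return 1.0
    return (p_o - p_e) / (1 - p_e)
```

On common benchmarks, the reported range of 0.558 to 0.962 spans moderate to almost perfect agreement.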
Affiliation(s)
- Malik Sallam: Department of Pathology, Microbiology and Forensic Medicine, School of Medicine, The University of Jordan, Amman, Jordan; Department of Clinical Laboratories and Forensic Medicine, Jordan University Hospital, Amman, Jordan; Department of Translational Medicine, Faculty of Medicine, Lund University, Malmo, Sweden
- Muna Barakat: Department of Clinical Pharmacy and Therapeutics, Faculty of Pharmacy, Applied Science Private University, Amman, Jordan
- Mohammed Sallam: Department of Pharmacy, Mediclinic Parkview Hospital, Mediclinic Middle East, Dubai, United Arab Emirates
33
Bilal M, Jamil Y, Rana D, Shah HH. Enhancing Awareness and Self-diagnosis of Obstructive Sleep Apnea Using AI-Powered Chatbots: The Role of ChatGPT in Revolutionizing Healthcare. Ann Biomed Eng 2024; 52:136-138. [PMID: 37389659] [DOI: 10.1007/s10439-023-03298-8]
Abstract
Since OpenAI (San Francisco, CA) released its generative AI chatbot, ChatGPT, we are on the cusp of technological transformation. The tool is capable of generating text according to the input that the user adds to it. Due to its ability to imitate human speech tone while extracting encyclopedic knowledge, ChatGPT can be a platform for personalized patient interaction. Thus, it has the potential to revolutionize the healthcare system. Our study aims to evaluate how ChatGPT can answer the queries of patients suffering from obstructive sleep apnea and aid in self-diagnosis. By analyzing symptoms and guiding patients' behavior toward prevention, ChatGPT can play a major role in avoiding serious health repercussions that develop in the later course of obstructive sleep apnea.
Affiliation(s)
- Maham Bilal: Dow Medical College, Dow University of Health Sciences, Karachi, Pakistan
- Yumna Jamil: Dow Medical College, Dow University of Health Sciences, Karachi, Pakistan
- Dua Rana: Dow Medical College, Dow University of Health Sciences, Karachi, Pakistan
34
Alanezi F. Assessing the Effectiveness of ChatGPT in Delivering Mental Health Support: A Qualitative Study. J Multidiscip Healthc 2024; 17:461-471. [PMID: 38314011] [PMCID: PMC10838501] [DOI: 10.2147/jmdh.s447368]
Abstract
Background Artificial Intelligence (AI) applications are widely researched for their potential in effectively improving the healthcare operations and disease management. However, the research trend shows that these applications also have significant negative implications on the service delivery. Purpose To assess the use of ChatGPT for mental health support. Methods Due to the novelty and unfamiliarity of the ChatGPT technology, a quasi-experimental design was chosen for this study. Outpatients from a public hospital were included in the sample. A two-week experiment followed by semi-structured interviews was conducted in which participants used ChatGPT for mental health support. Semi-structured interviews were conducted with 24 individuals with mental health conditions. Results Eight positive factors (psychoeducation, emotional support, goal setting and motivation, referral and resource information, self-assessment and monitoring, cognitive behavioral therapy, crisis interventions, and psychotherapeutic exercises) and four negative factors (ethical and legal considerations, accuracy and reliability, limited assessment capabilities, and cultural and linguistic considerations) were associated with the use of ChatGPT for mental health support. Conclusion It is important to carefully consider the ethical, reliability, accuracy, and legal challenges and develop appropriate strategies to mitigate them in order to ensure safe and effective use of AI-based applications like ChatGPT in mental health support.
Affiliation(s)
- Fahad Alanezi: College of Business Administration, Department of Management Information Systems, Imam Abdulrahman Bin Faisal University, Dammam, 31441, Saudi Arabia
35
Abdaljaleel M, Barakat M, Alsanafi M, Salim NA, Abazid H, Malaeb D, Mohammed AH, Hassan BAR, Wayyes AM, Farhan SS, Khatib SE, Rahal M, Sahban A, Abdelaziz DH, Mansour NO, AlZayer R, Khalil R, Fekih-Romdhane F, Hallit R, Hallit S, Sallam M. A multinational study on the factors influencing university students' attitudes and usage of ChatGPT. Sci Rep 2024; 14:1983. [PMID: 38263214] [PMCID: PMC10806219] [DOI: 10.1038/s41598-024-52549-8]
Abstract
Artificial intelligence models, like ChatGPT, have the potential to revolutionize higher education when implemented properly. This study aimed to investigate the factors influencing university students' attitudes and usage of ChatGPT in Arab countries. The survey instrument "TAME-ChatGPT" was administered to 2240 participants from Iraq, Kuwait, Egypt, Lebanon, and Jordan. Of those, 46.8% heard of ChatGPT, and 52.6% used it before the study. The results indicated that a positive attitude and usage of ChatGPT were determined by factors like ease of use, positive attitude towards technology, social influence, perceived usefulness, behavioral/cognitive influences, low perceived risks, and low anxiety. Confirmatory factor analysis indicated the adequacy of the "TAME-ChatGPT" constructs. Multivariate analysis demonstrated that the attitude towards ChatGPT usage was significantly influenced by country of residence, age, university type, and recent academic performance. This study validated "TAME-ChatGPT" as a useful tool for assessing ChatGPT adoption among university students. The successful integration of ChatGPT in higher education relies on the perceived ease of use, perceived usefulness, positive attitude towards technology, social influence, behavioral/cognitive elements, low anxiety, and minimal perceived risks. Policies for ChatGPT adoption in higher education should be tailored to individual contexts, considering the variations in student attitudes observed in this study.
Affiliation(s)
- Maram Abdaljaleel: Department of Pathology, Microbiology and Forensic Medicine, School of Medicine, The University of Jordan, Amman, 11942, Jordan; Department of Clinical Laboratories and Forensic Medicine, Jordan University Hospital, Amman, 11942, Jordan
- Muna Barakat: Department of Clinical Pharmacy and Therapeutics, Faculty of Pharmacy, Applied Science Private University, Amman, 11931, Jordan
- Mariam Alsanafi: Department of Pharmacy Practice, Faculty of Pharmacy, Kuwait University, Kuwait City, Kuwait; Department of Pharmaceutical Sciences, Public Authority for Applied Education and Training, College of Health Sciences, Safat, Kuwait
- Nesreen A Salim: Prosthodontic Department, School of Dentistry, The University of Jordan, Amman, 11942, Jordan; Prosthodontic Department, Jordan University Hospital, Amman, 11942, Jordan
- Husam Abazid: Department of Clinical Pharmacy and Therapeutics, Faculty of Pharmacy, Applied Science Private University, Amman, 11931, Jordan
- Diana Malaeb: College of Pharmacy, Gulf Medical University, P.O. Box 4184, Ajman, United Arab Emirates
- Ali Haider Mohammed: School of Pharmacy, Monash University Malaysia, Jalan Lagoon Selatan, 47500, Bandar Sunway, Selangor Darul Ehsan, Malaysia
- Sinan Subhi Farhan: Department of Anesthesia, Al Rafidain University College, Baghdad, 10001, Iraq
- Sami El Khatib: Department of Biomedical Sciences, School of Arts and Sciences, Lebanese International University, Bekaa, Lebanon; Center for Applied Mathematics and Bioinformatics (CAMB), Gulf University for Science and Technology (GUST), 32093, Hawally, Kuwait
- Mohamad Rahal: School of Pharmacy, Lebanese International University, Beirut, 961, Lebanon
- Ali Sahban: School of Dentistry, The University of Jordan, Amman, 11942, Jordan
- Doaa H Abdelaziz: Pharmacy Practice and Clinical Pharmacy Department, Faculty of Pharmacy, Future University in Egypt, Cairo, 11835, Egypt; Department of Clinical Pharmacy, Faculty of Pharmacy, Al-Baha University, Al-Baha, Saudi Arabia
- Noha O Mansour: Clinical Pharmacy and Pharmacy Practice Department, Faculty of Pharmacy, Mansoura University, Mansoura, 35516, Egypt; Clinical Pharmacy and Pharmacy Practice Department, Faculty of Pharmacy, Mansoura National University, Dakahlia Governorate, 7723730, Egypt
- Reem AlZayer: Clinical Pharmacy Practice, Department of Pharmacy, Mohammed Al-Mana College for Medical Sciences, 34222, Dammam, Saudi Arabia
- Roaa Khalil: Department of Pathology, Microbiology and Forensic Medicine, School of Medicine, The University of Jordan, Amman, 11942, Jordan
- Feten Fekih-Romdhane: The Tunisian Center of Early Intervention in Psychosis, Department of Psychiatry "Ibn Omrane", Razi Hospital, 2010, Manouba, Tunisia; Faculty of Medicine of Tunis, Tunis El Manar University, Tunis, Tunisia
- Rabih Hallit: School of Medicine and Medical Sciences, Holy Spirit University of Kaslik, Jounieh, Lebanon; Department of Infectious Disease, Bellevue Medical Center, Mansourieh, Lebanon; Department of Infectious Disease, Notre Dame des Secours, University Hospital Center, Byblos, Lebanon
- Souheil Hallit: School of Medicine and Medical Sciences, Holy Spirit University of Kaslik, Jounieh, Lebanon; Research Department, Psychiatric Hospital of the Cross, Jal Eddib, Lebanon
- Malik Sallam: Department of Pathology, Microbiology and Forensic Medicine, School of Medicine, The University of Jordan, Amman, 11942, Jordan; Department of Clinical Laboratories and Forensic Medicine, Jordan University Hospital, Amman, 11942, Jordan
36
Kim R, Margolis A, Barile J, Han K, Kalash S, Papaioannou H, Krevskaya A, Milanaik R. Challenging the Chatbot: An Assessment of ChatGPT's Diagnoses and Recommendations for DBP Case Studies. J Dev Behav Pediatr 2024; 45:e8-e13. [PMID: 38347665] [DOI: 10.1097/dbp.0000000000001255]
Abstract
OBJECTIVE Chat Generative Pretrained Transformer-3.5 (ChatGPT) is a publicly available and free artificial intelligence chatbot that logs billions of visits per day; parents may rely on such tools for developmental and behavioral medical consultations. The objective of this study was to determine how ChatGPT evaluates developmental and behavioral pediatrics (DBP) case studies and makes recommendations and diagnoses. METHODS ChatGPT was asked to list treatment recommendations and a diagnosis for each of 97 DBP case studies. A panel of 3 DBP physicians evaluated ChatGPT's diagnostic accuracy and scored treatment recommendations on accuracy (5-point Likert scale) and completeness (3-point Likert scale). Physicians also assessed whether ChatGPT's treatment plan correctly addressed cultural and ethical issues for relevant cases. Scores were analyzed using Python, and descriptive statistics were computed. RESULTS The DBP panel agreed with ChatGPT's diagnosis for 66.2% of the case reports. The mean accuracy score of ChatGPT's treatment plan was deemed by physicians to be 4.6 (between entirely correct and more correct than incorrect), and the mean completeness was 2.6 (between complete and adequate). Physicians agreed that ChatGPT addressed relevant cultural issues in 10 out of the 11 appropriate cases and the ethical issues in the single ethical case. CONCLUSION While ChatGPT can generate a comprehensive and adequate list of recommendations, the diagnosis accuracy rate is still low. Physicians must advise caution to patients when using such online sources.
Affiliation(s)
- Rachel Kim: Division of Developmental and Behavioral Pediatrics, Steven and Alexandra Cohen Children's Medical Center of New York, Lake Success, NY
37
Pushpanathan K, Lim ZW, Er Yew SM, Chen DZ, Hui'En Lin HA, Lin Goh JH, Wong WM, Wang X, Jin Tan MC, Chang Koh VT, Tham YC. Popular large language model chatbots' accuracy, comprehensiveness, and self-awareness in answering ocular symptom queries. iScience 2023; 26:108163. [PMID: 37915603] [PMCID: PMC10616302] [DOI: 10.1016/j.isci.2023.108163]
Abstract
In light of growing interest in using emerging large language models (LLMs) for self-diagnosis, we systematically assessed the performance of ChatGPT-3.5, ChatGPT-4.0, and Google Bard in delivering proficient responses to 37 common inquiries regarding ocular symptoms. Responses were masked, randomly shuffled, and then graded by three consultant-level ophthalmologists for accuracy (poor, borderline, good) and comprehensiveness. Additionally, we evaluated the self-awareness capabilities (ability to self-check and self-correct) of the LLM-Chatbots. 89.2% of ChatGPT-4.0 responses were 'good'-rated, outperforming ChatGPT-3.5 (59.5%) and Google Bard (40.5%) significantly (all p < 0.001). All three LLM-Chatbots showed optimal mean comprehensiveness scores as well (ranging from 4.6 to 4.7 out of 5). However, they exhibited subpar to moderate self-awareness capabilities. Our study underscores the potential of ChatGPT-4.0 in delivering accurate and comprehensive responses to ocular symptom inquiries. Future rigorous validation of their performance is crucial to ensure their reliability and appropriateness for actual clinical use.
Affiliation(s)
- Krithi Pushpanathan: Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore; Centre for Innovation and Precision Eye Health & Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Zhi Wei Lim: Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Samantha Min Er Yew: Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore; Centre for Innovation and Precision Eye Health & Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- David Ziyou Chen: Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore; Centre for Innovation and Precision Eye Health & Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore; Department of Ophthalmology, National University Hospital, Singapore, Singapore
- Hazel Anne Hui'En Lin: Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore; Centre for Innovation and Precision Eye Health & Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore; Department of Ophthalmology, National University Hospital, Singapore, Singapore
- Jocelyn Hui Lin Goh: Singapore Eye Research Institute, Singapore National Eye Centre, Singapore, Singapore
- Wendy Meihua Wong: Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore; Centre for Innovation and Precision Eye Health & Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore; Department of Ophthalmology, National University Hospital, Singapore, Singapore
- Xiaofei Wang: Key Laboratory for Biomechanics and Mechanobiology of Ministry of Education, Beijing, China; Advanced Innovation Centre for Biomedical Engineering, School of Biological Science and Medical Engineering, Beihang University, Beijing, China
- Marcus Chun Jin Tan: Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore; Centre for Innovation and Precision Eye Health & Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore; Department of Ophthalmology, National University Hospital, Singapore, Singapore
- Victor Teck Chang Koh: Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore; Centre for Innovation and Precision Eye Health & Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore; Department of Ophthalmology, National University Hospital, Singapore, Singapore
- Yih-Chung Tham: Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore; Centre for Innovation and Precision Eye Health & Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore; Singapore Eye Research Institute, Singapore National Eye Centre, Singapore, Singapore; Ophthalmology and Visual Sciences Academic Clinical Programme (Eye ACP), Duke NUS Medical School, Singapore, Singapore
38
Sallam M, Barakat M, Sallam M. Pilot Testing of a Tool to Standardize the Assessment of the Quality of Health Information Generated by Artificial Intelligence-Based Models. Cureus 2023; 15:e49373. [PMID: 38024074] [PMCID: PMC10674084] [DOI: 10.7759/cureus.49373]
Abstract
Background Artificial intelligence (AI)-based conversational models, such as Chat Generative Pre-trained Transformer (ChatGPT), Microsoft Bing, and Google Bard, have emerged as valuable sources of health information for lay individuals. However, the accuracy of the information provided by these AI models remains a significant concern. This pilot study aimed to test a new tool with key themes for inclusion as follows: Completeness of content, Lack of false information in the content, Evidence supporting the content, Appropriateness of the content, and Relevance, referred to as "CLEAR", designed to assess the quality of health information delivered by AI-based models. Methods Tool development involved a literature review on health information quality, followed by the initial establishment of the CLEAR tool, which comprised five items that aimed to assess the following: completeness, lack of false information, evidence support, appropriateness, and relevance. Each item was scored on a five-point Likert scale from excellent to poor. Content validity was checked by expert review. Pilot testing involved 32 healthcare professionals using the CLEAR tool to assess content on eight different health topics deliberately designed with varying qualities. The internal consistency was checked with Cronbach's alpha (α). Feedback from the pilot test resulted in language modifications to improve the clarity of the items. The final CLEAR tool was used to assess the quality of health information generated by four distinct AI models on five health topics. The AI models were ChatGPT 3.5, ChatGPT 4, Microsoft Bing, and Google Bard, and the content generated was scored by two independent raters with Cohen's kappa (κ) for inter-rater agreement. 
Results The final five CLEAR items were: (1) Is the content sufficient?; (2) Is the content accurate?; (3) Is the content evidence-based?; (4) Is the content clear, concise, and easy to understand?; and (5) Is the content free from irrelevant information? Pilot testing on the eight health topics revealed acceptable internal consistency with a Cronbach's α range of 0.669-0.981. The use of the final CLEAR tool yielded the following average scores: Microsoft Bing (mean=24.4±0.42), ChatGPT-4 (mean=23.6±0.96), Google Bard (mean=21.2±1.79), and ChatGPT-3.5 (mean=20.6±5.20). The inter-rater agreement revealed the following Cohen κ values: for ChatGPT-3.5 (κ=0.875, P<.001), ChatGPT-4 (κ=0.780, P<.001), Microsoft Bing (κ=0.348, P=.037), and Google Bard (κ=.749, P<.001). Conclusions The CLEAR tool is a brief yet helpful tool that can aid in standardizing testing of the quality of health information generated by AI-based models. Future studies are recommended to validate the utility of the CLEAR tool in the quality assessment of AI-generated health-related content using a larger sample across various complex health topics.
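Cronbach's α, used in this abstract to check the internal consistency of the five CLEAR items, compares the sum of the per-item score variances against the variance of the total score. A minimal illustrative sketch using population variances (not the study's code; real analyses typically use a statistics package):

```python
def cronbach_alpha(items):
    # items: one list per scale item, each holding one score per respondent.
    # alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores))
    k, n = len(items), len(items[0])

    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    totals = [sum(item[i] for item in items) for i in range(n)]
    return (k / (k - 1)) * (1 - sum(var(it) for it in items) / var(totals))
```

When items rise and fall together across respondents, α approaches 1; the 0.669-0.981 range reported above indicates acceptable to excellent consistency across the eight pilot topics.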
Affiliation(s)
- Malik Sallam: Department of Pathology, Microbiology, and Forensic Medicine, School of Medicine, University of Jordan, Amman, JOR; Department of Clinical Laboratories and Forensic Medicine, Jordan University Hospital, Amman, JOR
- Muna Barakat: Department of Clinical Pharmacy and Therapeutics, School of Pharmacy, Applied Science Private University, Amman, JOR; Department of Research, Middle East University, Amman, JOR
- Mohammed Sallam: Department of Pharmacy, Mediclinic Parkview Hospital, Mediclinic Middle East, Dubai, ARE
|
39
|
Islam MR, Urmi TJ, Mosharrafa RA, Rahman MS, Kadir MF. Role of ChatGPT in health science and research: A correspondence addressing potential application. Health Sci Rep 2023; 6:e1625. [PMID: 37841943 PMCID: PMC10568002 DOI: 10.1002/hsr2.1625] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 06/22/2023] [Revised: 09/01/2023] [Accepted: 09/28/2023] [Indexed: 10/17/2023] Open
Affiliation(s)
- Md. Rabiul Islam
- School of Pharmacy, BRAC University, Dhaka, Bangladesh
- Department of Pharmacy, University of Asia Pacific, Dhaka, Bangladesh
- Rana Al Mosharrafa
- Department of Business Administration, Faculty of Business Studies, Prime University, Dhaka, Bangladesh
- Mohammad Fahim Kadir
- Department of Pharmacology, Lake Erie College of Osteopathic Medicine (LECOM), Erie, Pennsylvania, USA
40
Levkovich I, Elyoseph Z. Suicide Risk Assessments Through the Eyes of ChatGPT-3.5 Versus ChatGPT-4: Vignette Study. JMIR Ment Health 2023; 10:e51232. [PMID: 37728984 PMCID: PMC10551796 DOI: 10.2196/51232] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Received: 07/25/2023] [Revised: 08/22/2023] [Accepted: 08/24/2023] [Indexed: 09/22/2023] Open
Abstract
BACKGROUND ChatGPT, a linguistic artificial intelligence (AI) model engineered by OpenAI, offers prospective contributions to mental health professionals. Despite its significant theoretical implications, ChatGPT's practical capabilities, particularly regarding suicide prevention, have not yet been substantiated. OBJECTIVE The study's aim was to evaluate ChatGPT's ability to assess suicide risk, taking into consideration 2 discernable factors (perceived burdensomeness and thwarted belongingness) over a 2-month period. In addition, we evaluated whether ChatGPT-4 more accurately evaluated suicide risk than did ChatGPT-3.5. METHODS ChatGPT was tasked with assessing a vignette that depicted a hypothetical patient exhibiting differing degrees of perceived burdensomeness and thwarted belongingness. The assessments generated by ChatGPT were subsequently contrasted with standard evaluations rendered by mental health professionals. Using both ChatGPT-3.5 and ChatGPT-4 (May 24, 2023), we executed 3 evaluative procedures in June and July 2023. Our intent was to scrutinize ChatGPT-4's proficiency in assessing various facets of suicide risk in relation to the evaluative abilities of both mental health professionals and an earlier version of ChatGPT-3.5 (March 14 version). RESULTS During the period of June and July 2023, we found that the likelihood of suicide attempts as evaluated by ChatGPT-4 was similar to the norms of mental health professionals (n=379) under all conditions (average Z score of 0.01). Nonetheless, a pronounced discrepancy was observed regarding the assessments performed by ChatGPT-3.5 (May version), which markedly underestimated the potential for suicide attempts, in comparison to the assessments carried out by the mental health professionals (average Z score of -0.83).
The empirical evidence suggests that ChatGPT-4's evaluation of the incidence of suicidal ideation and psychache was higher than that of the mental health professionals (average Z score of 0.47 and 1.00, respectively). Conversely, the level of resilience as assessed by both ChatGPT-4 and ChatGPT-3.5 (both versions) was observed to be lower in comparison to the assessments offered by mental health professionals (average Z score of -0.89 and -0.90, respectively). CONCLUSIONS The findings suggest that ChatGPT-4 estimates the likelihood of suicide attempts in a manner akin to evaluations provided by professionals. In terms of recognizing suicidal ideation, ChatGPT-4 appears to be more precise. However, regarding psychache, there was an observed overestimation by ChatGPT-4, indicating a need for further research. These results have implications regarding ChatGPT-4's potential to support gatekeepers, patients, and even mental health professionals' decision-making. Despite the clinical potential, intensive follow-up studies are necessary to establish the use of ChatGPT-4's capabilities in clinical practice. The finding that ChatGPT-3.5 frequently underestimates suicide risk, especially in severe cases, is particularly troubling. It indicates that ChatGPT may downplay one's actual suicide risk level.
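The "average Z score" reported in this abstract is a standard score: how far a model's rating sits from the mental health professionals' normative mean, in units of the norm's standard deviation. A minimal illustration with made-up norm values (not the study's data):

```python
def z_score(rating, norm_mean, norm_sd):
    """Standard score: distance from the professionals' norm mean, in norm SDs."""
    return (rating - norm_mean) / norm_sd

# Hypothetical: professionals' norm for attempt likelihood is mean 6.2 (SD 1.5);
# a model rating of 5.0 underestimates risk relative to that norm
print(round(z_score(5.0, 6.2, 1.5), 2))
```

A Z score near 0 (as reported for ChatGPT-4) means the model's ratings track the professional norm; a strongly negative average (as for ChatGPT-3.5) means systematic underestimation.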
Affiliation(s)
- Inbar Levkovich
- Oranim Academic College, Faculty of Graduate Studies, Kiryat Tivon, Israel
- Zohar Elyoseph
- Department of Psychology and Educational Counseling, The Center for Psychobiological Research, Max Stern Yezreel Valley College, Emek Yezreel, Israel
- Department of Brain Sciences, Faculty of Medicine, Imperial College London, London, United Kingdom
41
Kuroiwa T, Sarcon A, Ibara T, Yamada E, Yamamoto A, Tsukamoto K, Fujita K. The Potential of ChatGPT as a Self-Diagnostic Tool in Common Orthopedic Diseases: Exploratory Study. J Med Internet Res 2023; 25:e47621. [PMID: 37713254 PMCID: PMC10541638 DOI: 10.2196/47621] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 03/27/2023] [Revised: 05/17/2023] [Accepted: 08/17/2023] [Indexed: 09/16/2023] Open
Abstract
BACKGROUND Artificial intelligence (AI) has gained tremendous popularity recently, especially the use of natural language processing (NLP). ChatGPT is a state-of-the-art chatbot capable of creating natural conversations using NLP. The use of AI in medicine can have a tremendous impact on health care delivery. Although some studies have evaluated ChatGPT's accuracy in self-diagnosis, there is no research regarding its precision and the degree to which it recommends medical consultations. OBJECTIVE The aim of this study was to evaluate ChatGPT's ability to accurately and precisely self-diagnose common orthopedic diseases, as well as the degree of recommendation it provides for medical consultations. METHODS Over a 5-day course, each of the study authors submitted the same questions to ChatGPT. The conditions evaluated were carpal tunnel syndrome (CTS), cervical myelopathy (CM), lumbar spinal stenosis (LSS), knee osteoarthritis (KOA), and hip osteoarthritis (HOA). Answers were categorized as correct, partially correct, incorrect, or a differential diagnosis. The percentage of correct answers and reproducibility were calculated. The reproducibility between days and between raters was calculated using the Fleiss κ coefficient. Answers that recommended that the patient seek medical attention were recategorized according to the strength of the recommendation as defined by the study. RESULTS The ratios of correct answers were 25/25, 1/25, 24/25, 16/25, and 17/25 for CTS, CM, LSS, KOA, and HOA, respectively. The ratios of incorrect answers were 23/25 for CM and 0/25 for all other conditions. The reproducibility between days was 1.0, 0.15, 0.7, 0.6, and 0.6 for CTS, CM, LSS, KOA, and HOA, respectively. The reproducibility between raters was 1.0, 0.1, 0.64, -0.12, and 0.04 for CTS, CM, LSS, KOA, and HOA, respectively. Among the answers recommending medical attention, the phrases "essential," "recommended," "best," and "important" were used.
Specifically, "essential" occurred in 4 out of 125, "recommended" in 12 out of 125, "best" in 6 out of 125, and "important" in 94 out of 125 answers. Additionally, 7 out of the 125 answers did not include a recommendation to seek medical attention. CONCLUSIONS The accuracy and reproducibility of ChatGPT to self-diagnose five common orthopedic conditions were inconsistent. The accuracy could potentially be improved by adding symptoms that could easily identify a specific location. Only a few answers were accompanied by a strong recommendation to seek medical attention according to our study standards. Although ChatGPT could serve as a potential first step in accessing care, we found variability in accurate self-diagnosis. Given the risk of harm with self-diagnosis without medical follow-up, it would be prudent for an NLP to include clear language alerting patients to seek expert medical opinions. We hope to shed further light on the use of AI in a future clinical study.
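The Fleiss κ used above to quantify reproducibility generalizes Cohen's κ to more than two raters (or, here, repeated days). A minimal sketch, with tie-free hypothetical answer categories rather than the study's data:

```python
def fleiss_kappa(tables):
    """tables: one dict per item, mapping answer category -> count of raters choosing it."""
    n_items = len(tables)
    n_raters = sum(tables[0].values())
    # Observed agreement: mean pairwise agreement within each item
    p_bar = sum(
        (sum(c * c for c in t.values()) - n_raters) / (n_raters * (n_raters - 1))
        for t in tables
    ) / n_items
    # Chance agreement from the marginal category proportions
    total = n_items * n_raters
    categories = {cat for t in tables for cat in t}
    p_e = sum((sum(t.get(cat, 0) for t in tables) / total) ** 2 for cat in categories)
    return (p_bar - p_e) / (1 - p_e)

# Hypothetical: 3 raters categorize answers to 5 questions as
# correct (C), partially correct (P), or incorrect (I)
tables = [{"C": 3}, {"C": 2, "P": 1}, {"C": 3}, {"P": 2, "I": 1}, {"C": 1, "P": 1, "I": 1}]
print(round(fleiss_kappa(tables), 3))
```

Values near 1.0 (as for CTS) indicate near-perfect reproducibility; values near 0 or below (as for KOA and HOA between raters) indicate agreement no better than chance.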
Affiliation(s)
- Tomoyuki Kuroiwa
- Department of Orthopaedic and Spinal Surgery, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan
- Division of Orthopedic Surgery Research, Mayo Clinic, Rochester, MN, United States
- Aida Sarcon
- Department of Surgery, Mayo Clinic, Rochester, MN, United States
- Takuya Ibara
- Department of Functional Joint Anatomy, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan
- Eriku Yamada
- Department of Orthopaedic and Spinal Surgery, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan
- Akiko Yamamoto
- Department of Orthopaedic and Spinal Surgery, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan
- Kazuya Tsukamoto
- Department of Orthopaedic and Spinal Surgery, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan
- Koji Fujita
- Department of Functional Joint Anatomy, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan
- Division of Medical Design Innovations, Open Innovation Center, Institute of Research Innovation, Tokyo Medical and Dental University, Tokyo, Japan
42
Oca MC, Meller L, Wilson K, Parikh AO, McCoy A, Chang J, Sudharshan R, Gupta S, Zhang-Nunes S. Bias and Inaccuracy in AI Chatbot Ophthalmologist Recommendations. Cureus 2023; 15:e45911. [PMID: 37885556 PMCID: PMC10599183 DOI: 10.7759/cureus.45911] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Accepted: 09/25/2023] [Indexed: 10/28/2023] Open
Abstract
PURPOSE AND DESIGN To evaluate the accuracy and bias of ophthalmologist recommendations made by three AI chatbots, namely ChatGPT 3.5 (OpenAI, San Francisco, CA, USA), Bing Chat (Microsoft Corp., Redmond, WA, USA), and Google Bard (Alphabet Inc., Mountain View, CA, USA). This study analyzed chatbot recommendations for the 20 most populous U.S. cities. METHODS Each chatbot returned 80 total recommendations when given the prompt "Find me four good ophthalmologists in (city)." Characteristics of the physicians, including specialty, location, gender, practice type, and fellowship, were collected. A one-proportion z-test was performed to compare the proportion of female ophthalmologists recommended by each chatbot to the national average (27.2% per the Association of American Medical Colleges (AAMC)). Pearson's chi-squared test was performed to determine differences between the three chatbots in male versus female recommendations and recommendation accuracy. RESULTS The proportions of female ophthalmologists recommended by Bing Chat (1.61%) and Bard (8.0%) were significantly lower than the national proportion of 27.2% practicing female ophthalmologists (p<0.001 and p<0.01, respectively). The proportion recommended by ChatGPT (29.5%) did not differ significantly from the national average (p=0.722). ChatGPT (73.8%), Bing Chat (67.5%), and Bard (62.5%) gave high rates of inaccurate recommendations. Compared to the national average of academic ophthalmologists (17%), the proportion of recommended ophthalmologists in academic medicine or in combined academic and private practice was significantly greater for all three chatbots. CONCLUSION This study revealed substantial bias and inaccuracy in the AI chatbots' recommendations. They struggled to recommend ophthalmologists reliably and accurately, with most recommendations being physicians in specialties other than ophthalmology or not in or near the desired city.
Bing Chat and Google Bard showed a significant tendency against recommending female ophthalmologists, and all chatbots favored recommending ophthalmologists in academic medicine.
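The one-proportion z-test used above, comparing each chatbot's share of female recommendations against the 27.2% national benchmark, can be sketched as follows. The counts are hypothetical, not the study's:

```python
from math import erf, sqrt

def one_proportion_z_test(successes, n, p0):
    """Test an observed proportion against a hypothesized proportion p0."""
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided, normal approx.
    return z, p_value

# Hypothetical: 5 of 80 recommended ophthalmologists are women, vs. 27.2% nationally
z, p = one_proportion_z_test(5, 80, 0.272)
print(round(z, 2), p < 0.001)
```

The test is only an approximation for small expected counts; with proportions as extreme as Bing Chat's 1.61%, an exact binomial test would be the more conservative choice.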
Affiliation(s)
- Michael C Oca
- Orthopedic Surgery, Shiley Eye Institute, University of California (UC) San Diego Health, La Jolla, USA
- Leo Meller
- Orthopedic Surgery, Shiley Eye Institute, University of California (UC) San Diego Health, La Jolla, USA
- Katherine Wilson
- Orthopedic Surgery, Shiley Eye Institute, University of California (UC) San Diego Health, La Jolla, USA
- Alomi O Parikh
- Ophthalmology, University of Southern California (USC) Roski Eye Institute, Keck School of Medicine of University of Southern California, Los Angeles, USA
- Allison McCoy
- Plastic Surgery, Del Mar Plastic Surgery, San Diego, USA
- Jessica Chang
- Ophthalmology, University of Southern California (USC) Roski Eye Institute, Keck School of Medicine of University of Southern California, Los Angeles, USA
- Rasika Sudharshan
- Ophthalmology, University of Southern California (USC) Roski Eye Institute, Keck School of Medicine of University of Southern California, Los Angeles, USA
- Shreya Gupta
- Ophthalmology, University of Southern California (USC) Roski Eye Institute, Keck School of Medicine of University of Southern California, Los Angeles, USA
- Sandy Zhang-Nunes
- Ophthalmology, University of Southern California (USC) Roski Eye Institute, Keck School of Medicine of University of Southern California, Los Angeles, USA
43
Kumari A, Kumari A, Singh A, Singh SK, Juhi A, Dhanvijay AKD, Pinjar MJ, Mondal H. Large Language Models in Hematology Case Solving: A Comparative Study of ChatGPT-3.5, Google Bard, and Microsoft Bing. Cureus 2023; 15:e43861. [PMID: 37736448 PMCID: PMC10511207 DOI: 10.7759/cureus.43861] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Accepted: 08/21/2023] [Indexed: 09/23/2023] Open
Abstract
Background Large language models (LLMs), such as ChatGPT-3.5, Google Bard, and Microsoft Bing, have shown promising capabilities in various natural language processing (NLP) tasks. However, their performance and accuracy in solving domain-specific questions, particularly in the field of hematology, have not been extensively investigated. Objective This study aimed to explore the capability of LLMs, namely, ChatGPT-3.5, Google Bard, and Microsoft Bing (Precise), in solving hematology-related cases and comparing their performance. Methods This was a cross-sectional study conducted in the Department of Physiology and Pathology, All India Institute of Medical Sciences, Deoghar, Jharkhand, India. We curated a set of 50 cases on hematology covering a range of topics and complexities. The dataset included queries related to blood disorders, hematologic malignancies, laboratory test parameters, calculations, and treatment options. Each case and related question was prepared with a set of correct answers to compare with. We utilized ChatGPT-3.5, Google Bard Experiment, and Microsoft Bing (Precise) for question-answering tasks. The answers were checked by two physiologists and one pathologist. They rated the answers on a rating scale from one to five. The average score of the three models was compared by Friedman's test with Dunn's post-hoc test. The performance of the LLMs was compared with a median of 2.5 by a one-sample median test as the curriculum from which the questions were curated has a 50% pass grade. Results The scores among the three LLMs were significantly different (p-value < 0.0001) with the highest score by ChatGPT (3.15±1.19), followed by Bard (2.23±1.17) and Bing (1.98±1.01). The score of ChatGPT was significantly higher than 50% (p-value = 0.0004), Bard's score was close to 50% (p-value = 0.38), and Bing's score was significantly lower than the pass score (p-value = 0.0015). 
Conclusion The LLMs reveal significant differences in solving case vignettes in hematology. ChatGPT exhibited the highest score, followed by Google Bard and Microsoft Bing. The observed performance trends suggest that ChatGPT holds promising potential in the medical domain. However, none of the models was capable of answering all questions accurately. Further research and optimization of language models can offer valuable contributions to healthcare and medical education applications.
Affiliation(s)
- Amita Kumari
- Physiology, All India Institute of Medical Sciences, Deoghar, Deoghar, IND
- Anita Kumari
- Physiology, All India Institute of Medical Sciences, Deoghar, Deoghar, IND
- Amita Singh
- Physiology, All India Institute of Medical Sciences, Deoghar, Deoghar, IND
- Sanjeet K Singh
- Pathology, All India Institute of Medical Sciences, Deoghar, Deoghar, IND
- Ayesha Juhi
- Physiology, All India Institute of Medical Sciences, Deoghar, Deoghar, IND
- Himel Mondal
- Physiology, All India Institute of Medical Sciences, Deoghar, Deoghar, IND
44
Nov O, Singh N, Mann D. Putting ChatGPT's Medical Advice to the (Turing) Test: Survey Study. JMIR Med Educ 2023; 9:e46939. [PMID: 37428540 PMCID: PMC10366957 DOI: 10.2196/46939] [Citation(s) in RCA: 27] [Impact Index Per Article: 27.0] [Received: 03/02/2023] [Revised: 05/26/2023] [Accepted: 06/14/2023] [Indexed: 07/11/2023]
Abstract
BACKGROUND Chatbots are being piloted to draft responses to patient questions, but patients' ability to distinguish between provider and chatbot responses and patients' trust in chatbots' functions are not well established. OBJECTIVE This study aimed to assess the feasibility of using ChatGPT (Chat Generative Pre-trained Transformer) or a similar artificial intelligence-based chatbot for patient-provider communication. METHODS A survey study was conducted in January 2023. Ten representative, nonadministrative patient-provider interactions were extracted from the electronic health record. Patients' questions were entered into ChatGPT with a request for the chatbot to respond using approximately the same word count as the human provider's response. In the survey, each patient question was followed by a provider- or ChatGPT-generated response. Participants were informed that 5 responses were provider generated and 5 were chatbot generated. Participants were asked, and financially incentivized, to correctly identify the response source. Participants were also asked about their trust in chatbots' functions in patient-provider communication, using a Likert scale from 1 to 5. RESULTS A US-representative sample of 430 study participants aged 18 and older were recruited on Prolific, a crowdsourcing platform for academic studies. In all, 426 participants filled out the full survey. After removing participants who spent less than 3 minutes on the survey, 392 respondents remained. Overall, 53.3% (209/392) of respondents analyzed were women, and the average age was 47.1 (range 18-91) years. The correct classification of responses ranged from 49% (192/392) to 85.7% (336/392) for different questions. On average, chatbot responses were identified correctly in 65.5% (1284/1960) of the cases, and human provider responses were identified correctly in 65.1% (1276/1960) of the cases.
On average, responses toward patients' trust in chatbots' functions were weakly positive (mean Likert score 3.4 out of 5), with lower trust as the health-related complexity of the task in the questions increased. CONCLUSIONS ChatGPT responses to patient questions were weakly distinguishable from provider responses. Laypeople appear to trust the use of chatbots to answer lower-risk health questions. It is important to continue studying patient-chatbot interaction as chatbots move from administrative to more clinical roles in health care.
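A quick way to see that the 65.5% identification rate above, while "weakly distinguishable", is still reliably better than the 50% expected from guessing is a confidence interval on the proportion. This sketch applies a Wilson score interval to the abstract's 1284/1960 count; the interval method is an illustration, not an analysis the study reports:

```python
from math import sqrt

def wilson_interval(successes, n, z=1.96):
    """95% Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

# 1284 of 1960 chatbot responses were identified correctly
low, high = wilson_interval(1284, 1960)
print(round(low, 3), round(high, 3))  # the whole interval sits well above 0.5
```

With roughly (0.63, 0.68) as the interval, identification is clearly above chance yet far from certain, consistent with the abstract's conclusion that chatbot and provider responses were only weakly distinguishable.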
Affiliation(s)
- Oded Nov
- Department of Technology Management, Tandon School of Engineering, New York University, New York, NY, United States
- Nina Singh
- Department of Population Health, Grossman School of Medicine, New York University, New York, NY, United States
- Devin Mann
- Department of Population Health, Grossman School of Medicine, New York University, New York, NY, United States
- Medical Center Information Technology, Langone Health, New York University, New York, NY, United States