1. Suárez A, Jiménez J, Llorente de Pedro M, Andreu-Vázquez C, Díaz-Flores García V, Gómez Sánchez M, Freire Y. Beyond the Scalpel: Assessing ChatGPT's potential as an auxiliary intelligent virtual assistant in oral surgery. Comput Struct Biotechnol J 2024;24:46-52. [PMID: 38162955; PMCID: PMC10755495; DOI: 10.1016/j.csbj.2023.11.058]
Abstract
AI has revolutionized the way we interact with technology. Noteworthy advances in AI algorithms and large language models (LLMs) have led to the development of natural generative language (NGL) systems such as ChatGPT. Although these LLMs can simulate human conversations and generate content in real time, they face challenges related to the topicality and accuracy of the information they generate. This study aimed to assess whether ChatGPT-4 could provide accurate and reliable answers to general dentists in the field of oral surgery, and thus to explore its potential as an intelligent virtual assistant in clinical decision-making in oral surgery. Thirty questions related to oral surgery were posed to ChatGPT-4, each question repeated 30 times, yielding a total of 900 responses. Two surgeons graded the answers according to the guidelines of the Spanish Society of Oral Surgery, using a three-point Likert scale (correct, partially correct/incomplete, and incorrect). Disagreements were arbitrated by an experienced oral surgeon, who provided the final grade. Accuracy was found to be 71.7%, and the consistency of the experts' grading across iterations ranged from moderate to almost perfect. ChatGPT-4, with its potential capabilities, will inevitably be integrated into dental disciplines, including oral surgery. In the future, it could be considered an auxiliary intelligent virtual assistant, though it would never replace oral surgery experts. Proper training and expert-verified information will remain vital to the implementation of the technology. More comprehensive research is needed to ensure the safe and successful application of AI in oral surgery.
Affiliation(s)
- Ana Suárez
- Department of Pre-Clinic Dentistry, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, 28670 Madrid, Spain
- Jaime Jiménez
- Department of Clinic Dentistry, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, 28670 Madrid, Spain
- María Llorente de Pedro
- Department of Pre-Clinic Dentistry, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, 28670 Madrid, Spain
- Cristina Andreu-Vázquez
- Department of Veterinary Medicine, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, 28670 Madrid, Spain
- Víctor Díaz-Flores García
- Department of Pre-Clinic Dentistry, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, 28670 Madrid, Spain
- Margarita Gómez Sánchez
- Department of Pre-Clinic Dentistry, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, 28670 Madrid, Spain
- Yolanda Freire
- Department of Pre-Clinic Dentistry, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, 28670 Madrid, Spain
2. Chow JCL, Li K. Ethical Considerations in Human-Centered AI: Advancing Oncology Chatbots Through Large Language Models. JMIR Bioinformatics and Biotechnology 2024;5:e64406. [PMID: 39321336; DOI: 10.2196/64406]
Abstract
The integration of chatbots in oncology underscores the pressing need for human-centered artificial intelligence (AI) that addresses patient and family concerns with empathy and precision. Human-centered AI emphasizes ethical principles, empathy, and user-centric approaches, ensuring technology aligns with human values and needs. This review critically examines the ethical implications of using large language models (LLMs) like GPT-3 and GPT-4 (OpenAI) in oncology chatbots. It examines how these models replicate human-like language patterns, impacting the design of ethical AI systems. The paper identifies key strategies for ethically developing oncology chatbots, focusing on potential biases arising from extensive datasets and neural networks. Specific datasets, such as those sourced from predominantly Western medical literature and patient interactions, may introduce biases by overrepresenting certain demographic groups. Moreover, the training methodologies of LLMs, including fine-tuning processes, can exacerbate these biases, leading to outputs that may disproportionately favor affluent or Western populations while neglecting marginalized communities. By providing examples of biased outputs in oncology chatbots, the review highlights the ethical challenges LLMs present and the need for mitigation strategies. The study emphasizes integrating human-centric values into AI to mitigate these biases, ultimately advocating for the development of oncology chatbots that are aligned with ethical principles and capable of serving diverse patient populations equitably.
Affiliation(s)
- James C L Chow
- Department of Radiation Oncology, University of Toronto, Toronto, ON, Canada
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
- Kay Li
- Department of English, University of Toronto, Toronto, ON, Canada
3. Battisti ES, Roman MK, Bellei EA, Kirsten VR, De Marchi ACB, Da Silva Leal GV. A virtual assistant for primary care's food and nutrition surveillance system: Development and validation study in Brazil. Patient Education and Counseling 2024;130:108461. [PMID: 39413720; DOI: 10.1016/j.pec.2024.108461]
Abstract
OBJECTIVE The study aimed to develop and validate a conversational agent (chatbot) designed to support Food and Nutrition Surveillance (FNS) practices in primary health care settings. METHODS This mixed-methods research was conducted in three stages. Initially, the study identified barriers and challenges in FNS practices through a literature review and feedback from 655 health professionals and FNS experts across Brazil. Following this, a participatory design approach was employed to develop and validate the chatbot's content. The final stage involved evaluating the chatbot's user experience with FNS experts. RESULTS The chatbot could accurately understand and respond to 60 different intents or keywords related to FNS. Themes such as training, guidance, and access emerged as crucial for guiding FNS initiatives and addressing implementation challenges, primarily related to human resources. The chatbot achieved a Global Content Validation Index of 0.88. CONCLUSION The developed chatbot represents a significant advancement in supporting FNS practices within primary health care. PRACTICE IMPLICATIONS By providing an innovative, interactive, educational tool that is both accessible and reliable, this digital assistant has the potential to facilitate the operationalization of FNS practices, addressing the critical need for effective training and counseling in developing countries.
Affiliation(s)
- Eliza Sella Battisti
- Graduate Program in Human Aging, Institute of Health, University of Passo Fundo (UPF), Passo Fundo, RS, Brazil; Graduate Program in Gerontology, Department of Foods and Nutrition, Federal University of Santa Maria (UFSM), Palmeira das Missões, RS, Brazil
- Mateus Klein Roman
- Graduate Program in Applied Computing, Institute of Technology, University of Passo Fundo (UPF), Passo Fundo, RS, Brazil
- Ericles Andrei Bellei
- Graduate Program in Human Aging, Institute of Health, University of Passo Fundo (UPF), Passo Fundo, RS, Brazil
- Vanessa Ramos Kirsten
- Graduate Program in Gerontology, Department of Foods and Nutrition, Federal University of Santa Maria (UFSM), Palmeira das Missões, RS, Brazil
- Ana Carolina Bertoletti De Marchi
- Graduate Program in Human Aging, Institute of Health, University of Passo Fundo (UPF), Passo Fundo, RS, Brazil; Graduate Program in Applied Computing, Institute of Technology, University of Passo Fundo (UPF), Passo Fundo, RS, Brazil
- Greisse Viero Da Silva Leal
- Graduate Program in Gerontology, Department of Foods and Nutrition, Federal University of Santa Maria (UFSM), Palmeira das Missões, RS, Brazil
4. Huo B, Marfo N, Sylla P, Calabrese E, Kumar S, Slater BJ, Walsh DS, Vosburg W. Clinical artificial intelligence: teaching a large language model to generate recommendations that align with guidelines for the surgical management of GERD. Surg Endosc 2024;38:5668-5677. [PMID: 39134725; DOI: 10.1007/s00464-024-11155-5]
Abstract
BACKGROUND Large Language Models (LLMs) provide clinical guidance with inconsistent accuracy due to limitations of their training datasets. LLMs are "teachable" through customization. We compared the ability of the generic ChatGPT-4 model and a customized version of ChatGPT-4 to provide recommendations for the surgical management of gastroesophageal reflux disease (GERD) to both surgeons and patients. METHODS Sixty patient cases were developed using eligibility criteria from the Society of American Gastrointestinal and Endoscopic Surgeons (SAGES) & United European Gastroenterology (UEG)-European Association of Endoscopic Surgery (EAES) guidelines for the surgical management of GERD. Standardized prompts were engineered for physicians as the end user, with separate layperson prompts for patients. A customized GPT, called the GERD Tool for Surgery (GTS), was developed to generate recommendations based on the guidelines. Both the GTS and generic ChatGPT-4 were queried on July 21, 2024. Model performance was evaluated by comparing responses to the SAGES & UEG-EAES guideline recommendations. Outcome data were presented using descriptive statistics, including counts and percentages. RESULTS The GTS provided accurate recommendations for the surgical management of GERD for 60/60 (100.0%) surgeon inquiries and 40/40 (100.0%) patient inquiries based on guideline recommendations. The generic ChatGPT-4 model generated accurate guidance for 40/60 (66.7%) surgeon inquiries and 19/40 (47.5%) patient inquiries. The GTS produced recommendations based on the 2021 SAGES & UEG-EAES guidelines on the surgical management of GERD, while the generic ChatGPT-4 model generated guidance without citing evidence to support its recommendations. CONCLUSION ChatGPT-4 can be customized to overcome limitations of its training dataset and provide recommendations for the surgical management of GERD with reliable accuracy and consistency. The training of LLMs can help integrate this efficient technology into the creation of robust and accurate information for both surgeons and patients. Prospective data are needed to assess its effectiveness in a pragmatic clinical environment.
Affiliation(s)
- Bright Huo
- Division of General Surgery, Department of Surgery, McMaster University, Hamilton, ON, Canada
- Nana Marfo
- Ross University School of Medicine, Miramar, FL, USA
- Patricia Sylla
- Division of Colon and Rectal Surgery, Department of Surgery, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Sunjay Kumar
- Department of General Surgery, Thomas Jefferson University Hospital, Philadelphia, PA, USA
- Danielle S Walsh
- Department of Surgery, University of Kentucky, Lexington, KY, USA
- Wesley Vosburg
- Department of Surgery, Mount Auburn Hospital, Harvard Medical School, Cambridge, MA, USA
5. Haran C, Allan P, Dholakia J, Lai S, Lim E, Xu W, Hart O, Cain J, Narayanan A, Khashram M. The application and uses of telemedicine in vascular surgery: A narrative review. Semin Vasc Surg 2024;37:290-297. [PMID: 39277344; DOI: 10.1053/j.semvascsurg.2024.07.004]
Abstract
Technological advances over the past century have accelerated the pace and breadth of medical and surgical care. From the initial delivery of "telemedicine" over the radio in the 1920s, the delivery of medicine and surgery in the 21st century is no longer limited by connectivity. The COVID-19 pandemic hastened the uptake of telemedicine to ensure that health care can be maintained despite limited face-to-face contact. Like other areas of medicine, vascular surgery has adopted telemedicine, although its role is not well described in the literature. This narrative review explores how telemedicine has been delivered in vascular surgery. Specific themes of telemedicine are outlined with real-world examples, including consultation, triaging, collaboration, mentoring, monitoring and surveillance, mobile health, and education. This review also explores possible future advances in telemedicine and issues around equity of care. Finally, important ethical considerations and limitations related to the applications of telemedicine are outlined.
Affiliation(s)
- Cheyaanthan Haran
- Department of Vascular Surgery, Waikato Hospital, 183 Pembroke Street, Hamilton 3204, New Zealand; Faculty of Medical and Health Sciences, University of Auckland, Auckland, New Zealand
- Philip Allan
- Department of Vascular Surgery, Waikato Hospital, 183 Pembroke Street, Hamilton 3204, New Zealand
- Jhanvi Dholakia
- Department of Vascular Surgery, Waikato Hospital, 183 Pembroke Street, Hamilton 3204, New Zealand
- Simon Lai
- Department of Vascular Surgery, Waikato Hospital, 183 Pembroke Street, Hamilton 3204, New Zealand; Faculty of Medical and Health Sciences, University of Auckland, Auckland, New Zealand
- Eric Lim
- Department of Vascular Surgery, Waikato Hospital, 183 Pembroke Street, Hamilton 3204, New Zealand
- William Xu
- Department of Vascular Surgery, Waikato Hospital, 183 Pembroke Street, Hamilton 3204, New Zealand
- Odette Hart
- Department of Vascular Surgery, Waikato Hospital, 183 Pembroke Street, Hamilton 3204, New Zealand; Faculty of Medical and Health Sciences, University of Auckland, Auckland, New Zealand
- Justin Cain
- Department of Vascular Surgery, Waikato Hospital, 183 Pembroke Street, Hamilton 3204, New Zealand
- Anantha Narayanan
- Department of Vascular Surgery, Waikato Hospital, 183 Pembroke Street, Hamilton 3204, New Zealand; Faculty of Medical and Health Sciences, University of Auckland, Auckland, New Zealand
- Manar Khashram
- Department of Vascular Surgery, Waikato Hospital, 183 Pembroke Street, Hamilton 3204, New Zealand; Faculty of Medical and Health Sciences, University of Auckland, Auckland, New Zealand
6. Lee JW, Yoo IS, Kim JH, Kim WT, Jeon HJ, Yoo HS, Shin JG, Kim GH, Hwang S, Park S, Kim YJ. Development of AI-generated medical responses using the ChatGPT for cancer patients. Computer Methods and Programs in Biomedicine 2024;254:108302. [PMID: 38996805; DOI: 10.1016/j.cmpb.2024.108302]
Abstract
BACKGROUND AND OBJECTIVE To develop a healthcare chatbot service (AI-guide bot) that conducts real-time conversations using large language models to provide accurate health information to patients. METHODS To provide accurate and specialized medical responses, we integrated several cancer practice guidelines. The size of the integrated meta-dataset was 1.17 million tokens. The integrated and classified metadata were extracted, transformed into text, segmented to specific character lengths, and vectorized using the embedding model. The AI-guide bot was implemented using Python 3.9. To enhance scalability and incorporate the integrated dataset, we combined the AI-guide bot with OpenAI and the LangChain framework. To generate user-friendly conversations, a language model was developed based on Chat Generative Pretrained Transformer (ChatGPT), an interactive conversational chatbot powered by GPT-3.5. The AI-guide bot was implemented using ChatGPT-3.5 from September 2023 to January 2024. RESULTS The AI-guide bot allowed users to select their desired cancer type and language for conversational interactions, and was designed to expand its capabilities to encompass multiple major cancer types. The performance of the AI-guide bot responses was 90.98 ± 4.02 (obtained by summing the Likert scores). CONCLUSIONS The AI-guide bot can provide medical information quickly and accurately to patients with cancer who are concerned about their health.
Affiliation(s)
- Jae-Woo Lee
- Department of Family Medicine, Chungbuk National University Hospital, Cheongju, Republic of Korea; Department of Family Medicine, Chungbuk National University College of Medicine, Cheongju, Republic of Korea
- In-Sang Yoo
- Department of Biomedical Engineering, Chungbuk National University Hospital, Cheongju, Republic of Korea; Department of Medicine, Chungbuk National University College of Medicine, Cheongju, Republic of Korea
- Ji-Hye Kim
- Department of Biomedical Engineering, Chungbuk National University Hospital, Cheongju, Republic of Korea
- Won Tae Kim
- Department of Urology, Chungbuk National University Hospital, Cheongju, Republic of Korea; Department of Urology, Chungbuk National University College of Medicine, 1 Chungdae-ro, Seowon-gu, Cheongju, Chungcheongbuk-do 28644, Republic of Korea
- Hyun Jeong Jeon
- Department of Internal Medicine, Chungbuk National University Hospital, Cheongju, Republic of Korea; Department of Internal Medicine, College of Medicine, Chungbuk National University, Cheongju, Republic of Korea
- Hyo-Sun Yoo
- Department of Family Medicine, Chungbuk National University Hospital, Cheongju, Republic of Korea
- Jae Gwang Shin
- Department of Biomedical Engineering, Chungbuk National University Hospital, Cheongju, Republic of Korea
- Geun-Hyeong Kim
- Department of Biomedical Engineering, Chungbuk National University Hospital, Cheongju, Republic of Korea
- ShinJi Hwang
- Department of Biomedical Engineering, Chungbuk National University Hospital, Cheongju, Republic of Korea
- Seung Park
- Department of Biomedical Engineering, Chungbuk National University Hospital, Cheongju, Republic of Korea; Department of Medicine, Chungbuk National University College of Medicine, Cheongju, Republic of Korea
- Yong-June Kim
- Department of Urology, Chungbuk National University Hospital, Cheongju, Republic of Korea; Department of Urology, Chungbuk National University College of Medicine, 1 Chungdae-ro, Seowon-gu, Cheongju, Chungcheongbuk-do 28644, Republic of Korea
7. Mihalache A, Grad J, Patil NS, Huang RS, Popovic MM, Mallipatna A, Kertes PJ, Muni RH. Google Gemini and Bard artificial intelligence chatbot performance in ophthalmology knowledge assessment. Eye (Lond) 2024;38:2530-2535. [PMID: 38615098; PMCID: PMC11383935; DOI: 10.1038/s41433-024-03067-4]
Abstract
PURPOSE With the popularization of ChatGPT (OpenAI, San Francisco, California, United States) in recent months, understanding the potential of artificial intelligence (AI) chatbots in a medical context is important. Our study aims to evaluate Google Gemini and Bard's (Google, Mountain View, California, United States) knowledge in ophthalmology. METHODS In this study, we evaluated Google Gemini and Bard's performance on EyeQuiz, a platform containing ophthalmology board certification examination practice questions, when used from the United States (US). Accuracy, response length, response time, and provision of explanations were evaluated. Subspecialty-specific performance was noted. A secondary analysis was conducted using Bard from Vietnam, and Gemini from Vietnam, Brazil, and the Netherlands. RESULTS Overall, Google Gemini and Bard both had accuracies of 71% across 150 text-based multiple-choice questions. The secondary analysis revealed an accuracy of 67% using Bard from Vietnam, with 32 questions (21%) answered differently than when using Bard from the US. Moreover, the Vietnam version of Gemini achieved an accuracy of 74%, with 23 questions (15%) answered differently than with the US version of Gemini. While the Brazil (68%) and Netherlands (65%) versions of Gemini performed slightly worse than the US version, differences in performance across the various country-specific versions of Bard and Gemini were not statistically significant. CONCLUSION Google Gemini and Bard had an acceptable performance in responding to ophthalmology board examination practice questions. Subtle variability was noted in the performance of the chatbots across different countries. The chatbots also tended to provide a confident explanation even when giving an incorrect answer.
Affiliation(s)
- Andrew Mihalache
- Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada
- Justin Grad
- Michael G. DeGroote School of Medicine, McMaster University, Hamilton, ON, Canada
- Nikhil S Patil
- Michael G. DeGroote School of Medicine, McMaster University, Hamilton, ON, Canada
- Ryan S Huang
- Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada
- Marko M Popovic
- Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, ON, Canada
- Ashwin Mallipatna
- Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, ON, Canada
- Department of Ophthalmology, Hospital for Sick Children, University of Toronto, Toronto, ON, Canada
- Peter J Kertes
- Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, ON, Canada
- John and Liz Tory Eye Centre, Sunnybrook Health Sciences Centre, Toronto, ON, Canada
- Rajeev H Muni
- Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, ON, Canada
- Department of Ophthalmology, St. Michael's Hospital/Unity Health Toronto, Toronto, ON, Canada
8. Cherrez-Ojeda I, Gallardo-Bastidas JC, Robles-Velasco K, Osorio MF, Velez Leon EM, Leon Velastegui M, Pauletto P, Aguilar-Díaz FC, Squassi A, González Eras SP, Cordero Carrasco E, Chavez Gonzalez KL, Calderon JC, Bousquet J, Bedbrook A, Faytong-Haro M. Understanding Health Care Students' Perceptions, Beliefs, and Attitudes Toward AI-Powered Language Models: Cross-Sectional Study. JMIR Medical Education 2024;10:e51757. [PMID: 39137029; DOI: 10.2196/51757]
Abstract
BACKGROUND ChatGPT was not intended for use in health care, but it has potential benefits that depend on end-user understanding and acceptability, which is where health care students become crucial. There is still a limited amount of research in this area. OBJECTIVE The primary aim of our study was to assess the frequency of ChatGPT use, the perceived level of knowledge, the perceived risks associated with its use, and the ethical issues, as well as attitudes toward the use of ChatGPT in the context of education in the field of health. In addition, we aimed to examine whether there were differences across groups based on demographic variables. The second part of the study aimed to assess the association between the frequency of use, the level of perceived knowledge, the level of risk perception, and the level of perception of ethics as predictive factors for participants' attitudes toward the use of ChatGPT. METHODS A cross-sectional survey was conducted from May to June 2023 encompassing students of medicine, nursing, dentistry, nutrition, and laboratory science across the Americas. The study used descriptive analysis, chi-square tests, and ANOVA to assess statistical significance across different categories. The study used several ordinal logistic regression models to analyze the impact of predictive factors (frequency of use, perception of knowledge, perception of risk, and ethics perception scores) on attitude as the dependent variable. The models were adjusted for gender, institution type, major, and country. Stata was used to conduct all the analyses. RESULTS Of 2661 health care students, 42.99% (n=1144) were unaware of ChatGPT. The median score of knowledge was "minimal" (median 2.00, IQR 1.00-3.00). Most respondents (median 2.61, IQR 2.11-3.11) regarded ChatGPT as neither ethical nor unethical. Most participants (median 3.89, IQR 3.44-4.34) "somewhat agreed" that ChatGPT (1) benefits health care settings, (2) provides trustworthy data, (3) is a helpful tool for clinical and educational medical information access, and (4) makes the work easier. In total, 70% (7/10) of people used it for homework. As the perceived knowledge of ChatGPT increased, there was a stronger tendency toward a favorable attitude toward ChatGPT. Higher ethical consideration perception ratings increased the likelihood of considering ChatGPT as a source of trustworthy health care information (odds ratio [OR] 1.620, 95% CI 1.498-1.752), beneficial in medical issues (OR 1.495, 95% CI 1.452-1.539), and useful for medical literature (OR 1.494, 95% CI 1.426-1.564; P<.001 for all results). CONCLUSIONS Over 40% of American health care students (1144/2661, 42.99%) were unaware of ChatGPT despite its extensive use in the health field. Our data revealed positive attitudes toward ChatGPT and a desire to learn more about it. Medical educators must explore how chatbots may be included in undergraduate health care education programs.
Affiliation(s)
- Ivan Cherrez-Ojeda
- Universidad Espiritu Santo, Samborondon, Ecuador
- Respiralab Research Group, Guayaquil, Ecuador
- Karla Robles-Velasco
- Universidad Espiritu Santo, Samborondon, Ecuador
- Respiralab Research Group, Guayaquil, Ecuador
- María F Osorio
- Universidad Espiritu Santo, Samborondon, Ecuador
- Respiralab Research Group, Guayaquil, Ecuador
- F C Aguilar-Díaz
- Departamento Salud Pública, Escuela Nacional de Estudios Superiores, Universidad Nacional Autónoma de México, Guanajuato, Mexico
- Aldo Squassi
- Universidad de Buenos Aires, Facultad de Odontología, Cátedra de Odontología Preventiva y Comunitaria, Buenos Aires, Argentina
- Erita Cordero Carrasco
- Departamento de cirugía y traumatología bucal y maxilofacial, Universidad de Chile, Santiago, Chile
- Juan C Calderon
- Universidad Espiritu Santo, Samborondon, Ecuador
- Respiralab Research Group, Guayaquil, Ecuador
- Jean Bousquet
- Institute of Allergology, Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Fraunhofer Institute for Translational Medicine and Pharmacology ITMP, Allergology and Immunology, Berlin, Germany
- MASK-air, Montpellier, France
- Marco Faytong-Haro
- Respiralab Research Group, Guayaquil, Ecuador
- Universidad Estatal de Milagro, Cdla Universitaria "Dr. Rómulo Minchala Murillo", Milagro, Ecuador
- Ecuadorian Development Research Lab, Daule, Ecuador
9. Takahashi H, Shikino K, Kondo T, Komori A, Yamada Y, Saita M, Naito T. Educational Utility of Clinical Vignettes Generated in Japanese by ChatGPT-4: Mixed Methods Study. JMIR Medical Education 2024;10:e59133. [PMID: 39137031; DOI: 10.2196/59133]
Abstract
BACKGROUND Evaluating the accuracy and educational utility of artificial intelligence-generated medical cases, especially those produced by large language models such as ChatGPT-4 (developed by OpenAI), is crucial yet underexplored. OBJECTIVE This study aimed to assess the educational utility of ChatGPT-4-generated clinical vignettes and their applicability in educational settings. METHODS Using a convergent mixed methods design, a web-based survey was conducted from January 8 to 28, 2024, to evaluate 18 medical cases generated by ChatGPT-4 in Japanese. In the survey, 6 main question items were used to evaluate the quality of the generated clinical vignettes and their educational utility: information quality, information accuracy, educational usefulness, clinical match, terminology accuracy (TA), and diagnosis difficulty. Feedback was solicited from physicians specializing in general internal medicine or general medicine and experienced in medical education. Chi-square and Mann-Whitney U tests were performed to identify differences among cases, and linear regression was used to examine trends associated with physicians' experience. Thematic analysis of qualitative feedback was performed to identify areas for improvement and confirm the educational utility of the cases. RESULTS Of the 73 invited participants, 71 (97%) responded. The respondents, primarily male (64/71, 90%), spanned a broad range of practice years (from 1976 to 2017) and represented diverse hospital sizes throughout Japan. The majority deemed the information quality (mean 0.77, 95% CI 0.75-0.79) and information accuracy (mean 0.68, 95% CI 0.65-0.71) to be satisfactory, with these responses being based on binary data. The average scores assigned were 3.55 (95% CI 3.49-3.60) for educational usefulness, 3.70 (95% CI 3.65-3.75) for clinical match, 3.49 (95% CI 3.44-3.55) for TA, and 2.34 (95% CI 2.28-2.40) for diagnosis difficulty, based on a 5-point Likert scale. Statistical analysis showed significant variability in content quality and relevance across the cases (P<.001 after Bonferroni correction). Participants suggested improvements in generating physical findings, using natural language, and enhancing medical TA. The thematic analysis highlighted the need for clearer documentation, clinical information consistency, content relevance, and patient-centered case presentations. CONCLUSIONS ChatGPT-4-generated medical cases written in Japanese possess considerable potential as resources in medical education, with recognized adequacy in quality and accuracy. Nevertheless, there is a notable need for enhancements in the precision and realism of case details. This study emphasizes ChatGPT-4's value as an adjunctive educational tool in the medical field, requiring expert oversight for optimal application.
Affiliation(s)
- Hiromizu Takahashi
- Department of General Medicine, Juntendo University Faculty of Medicine, Tokyo, Japan
- Kiyoshi Shikino
- Department of Community-Oriented Medical Education, Chiba University Graduate School of Medicine, Chiba, Japan
- Takeshi Kondo
- Center for Postgraduate Clinical Training and Career Development, Nagoya University Hospital, Aichi, Japan
- Akira Komori
- Department of General Medicine, Juntendo University Faculty of Medicine, Tokyo, Japan
- Department of Emergency and Critical Care Medicine, Tsukuba Memorial Hospital, Tsukuba, Japan
- Yuji Yamada
- Brookdale Department of Geriatrics and Palliative Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Mizue Saita
- Department of General Medicine, Juntendo University Faculty of Medicine, Tokyo, Japan
- Toshio Naito
- Department of General Medicine, Juntendo University Faculty of Medicine, Tokyo, Japan
10
Sharma H, Ruikar M. Artificial intelligence at the pen's edge: Exploring the ethical quagmires in using artificial intelligence models like ChatGPT for assisted writing in biomedical research. Perspect Clin Res 2024; 15:108-115. [PMID: 39140014 PMCID: PMC11318783 DOI: 10.4103/picr.picr_196_23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/18/2023] [Revised: 08/09/2023] [Accepted: 08/11/2023] [Indexed: 08/15/2024] Open
Abstract
Chat generative pretrained transformer (ChatGPT) is a conversational language model powered by artificial intelligence (AI). It is a sophisticated language model that employs deep learning methods to generate human-like text in response to natural language inputs. This narrative review aims to shed light on ethical concerns about using AI models like ChatGPT for writing assistance in the health care and medical domains. Currently, AI models like ChatGPT are in their infancy; risks include inaccuracy of the generated content, lack of contextual understanding, dynamic knowledge gaps, limited discernment, lack of responsibility and accountability, issues of privacy, data security, transparency, and bias, and lack of nuance and originality. Other issues, such as authorship, unintentional plagiarism, falsified and fabricated content, and the threat of being red-flagged as AI-generated content, highlight the need for regulatory compliance, transparency, and disclosure. If these legitimate issues are proactively considered and addressed, the potential applications of AI models as writing assistants could be rewarding.
Affiliation(s)
- Hunny Sharma
- Department of Community and Family Medicine, All India Institute of Medical Sciences, Raipur, Chhattisgarh, India
- Manisha Ruikar
- Department of Community and Family Medicine, All India Institute of Medical Sciences, Raipur, Chhattisgarh, India
11
Lahat A, Sharif K, Zoabi N, Shneor Patt Y, Sharif Y, Fisher L, Shani U, Arow M, Levin R, Klang E. Assessing Generative Pretrained Transformers (GPT) in Clinical Decision-Making: Comparative Analysis of GPT-3.5 and GPT-4. J Med Internet Res 2024; 26:e54571. [PMID: 38935937 PMCID: PMC11240076 DOI: 10.2196/54571] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/15/2023] [Revised: 02/02/2024] [Accepted: 04/29/2024] [Indexed: 06/29/2024] Open
Abstract
BACKGROUND Artificial intelligence, particularly chatbot systems, is becoming an instrumental tool in health care, aiding clinical decision-making and patient engagement. OBJECTIVE This study aims to analyze the performance of ChatGPT-3.5 and ChatGPT-4 in addressing complex clinical and ethical dilemmas, and to illustrate their potential role in health care decision-making while comparing seniors' and residents' ratings, and specific question types. METHODS A total of 4 specialized physicians formulated 176 real-world clinical questions. A total of 8 senior physicians and residents assessed responses from GPT-3.5 and GPT-4 on a 1-5 scale across 5 categories: accuracy, relevance, clarity, utility, and comprehensiveness. Evaluations were conducted within internal medicine, emergency medicine, and ethics. Comparisons were made globally, between seniors and residents, and across classifications. RESULTS Both GPT models received high mean scores (4.4, SD 0.8 for GPT-4 and 4.1, SD 1.0 for GPT-3.5). GPT-4 outperformed GPT-3.5 across all rating dimensions, with seniors consistently rating responses higher than residents for both models. Specifically, seniors rated GPT-4 as more beneficial and complete (mean 4.6 vs 4.0 and 4.6 vs 4.1, respectively; P<.001), and GPT-3.5 similarly (mean 4.1 vs 3.7 and 3.9 vs 3.5, respectively; P<.001). Ethical queries received the highest ratings for both models, with mean scores reflecting consistency across accuracy and completeness criteria. Distinctions among question types were significant, particularly for the GPT-4 mean scores in completeness across emergency, internal, and ethical questions (4.2, SD 1.0; 4.3, SD 0.8; and 4.5, SD 0.7, respectively; P<.001), and for GPT-3.5's accuracy, beneficial, and completeness dimensions. CONCLUSIONS ChatGPT's potential to assist physicians with medical issues is promising, with prospects to enhance diagnostics, treatments, and ethics. 
While integration into clinical workflows may be valuable, it must complement, not replace, human expertise. Continued research is essential to ensure safe and effective implementation in clinical environments.
Affiliation(s)
- Adi Lahat
- Department of Gastroenterology, Chaim Sheba Medical Center, Affiliated with Tel Aviv University, Ramat Gan, Israel
- Department of Gastroenterology, Samson Assuta Ashdod Medical Center, Affiliated with Ben Gurion University of the Negev, Be'er Sheva, Israel
- Kassem Sharif
- Department of Gastroenterology, Chaim Sheba Medical Center, Affiliated with Tel Aviv University, Ramat Gan, Israel
- Department of Internal Medicine B, Sheba Medical Centre, Tel Aviv, Israel
- Narmin Zoabi
- Department of Gastroenterology, Chaim Sheba Medical Center, Affiliated with Tel Aviv University, Ramat Gan, Israel
- Yousra Sharif
- Department of Internal Medicine C, Hadassah Medical Center, Jerusalem, Israel
- Lior Fisher
- Department of Internal Medicine B, Sheba Medical Centre, Tel Aviv, Israel
- Uria Shani
- Department of Internal Medicine B, Sheba Medical Centre, Tel Aviv, Israel
- Mohamad Arow
- Department of Internal Medicine B, Sheba Medical Centre, Tel Aviv, Israel
- Roni Levin
- Department of Internal Medicine B, Sheba Medical Centre, Tel Aviv, Israel
- Eyal Klang
- Division of Data-Driven and Digital Medicine (D3M), Icahn School of Medicine at Mount Sinai, New York, NY, United States
12
Hussain T, Wang D, Li B. The influence of the COVID-19 pandemic on the adoption and impact of AI ChatGPT: Challenges, applications, and ethical considerations. Acta Psychol (Amst) 2024; 246:104264. [PMID: 38626597 DOI: 10.1016/j.actpsy.2024.104264] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/06/2023] [Revised: 04/08/2024] [Accepted: 04/09/2024] [Indexed: 04/18/2024] Open
Abstract
DESIGN/METHODOLOGY/APPROACH This article employs qualitative thematic modeling to gather insights from 30 informants. The study explores various aspects of the COVID-19 pandemic's impact on AI ChatGPT technologies. PURPOSE The purpose of this research is to examine how the COVID-19 pandemic has influenced the increased usage and adoption of AI ChatGPT. It aims to explore the pandemic's impact on AI ChatGPT and its applications in specific domains, as well as the challenges and opportunities it presents. FINDINGS The findings highlight that the pandemic has led to a surge in online activities, resulting in a heightened demand for AI ChatGPT. It has been widely used in areas such as healthcare, mental health support, remote collaboration, and personalized customer experiences. The article showcases examples of AI ChatGPT's application during the pandemic. STRENGTH OF STUDY This qualitative framework enables the study to delve deeply into the multifaceted dimensions of AI ChatGPT's role during the pandemic, capturing the diverse experiences and insights of users, practitioners, and experts. By embracing the qualitative nature of inquiry, this research offers a comprehensive understanding of the challenges, opportunities, and ethical considerations associated with the adoption and utilization of AI ChatGPT in crisis contexts. PRACTICAL IMPLICATIONS The insights from this research have practical implications for policymakers, developers, and researchers. This research emphasizes the need for responsible and ethical implementation of AI ChatGPT to fully harness its potential in addressing societal needs during and beyond the pandemic. SOCIAL IMPLICATIONS The increased reliance on AI ChatGPT during the pandemic has led to changes in user behavior, expectations, and interactions. However, it has also unveiled ethical considerations and potential risks. Addressing societal and ethical concerns, such as user impact and autonomy, privacy and security, bias and fairness, and transparency and accountability, is crucial for the responsible deployment of AI ChatGPT. ORIGINALITY/VALUE This research contributes to the understanding of the novel role of AI ChatGPT in times of crisis, particularly during the COVID-19 pandemic. It highlights the necessity of responsible and ethical implementation of AI ChatGPT and provides valuable insights for the development and application of AI technology in the future.
Affiliation(s)
- Talib Hussain
- School of Media and Communication, Shanghai Jiao Tong University, 800 Dongchuan Road, 200240 Shanghai, China; Department of Media Management, University of Religions and Denominations, Qom 37491-13357, Iran.
- Dake Wang
- School of Media and Communication, Shanghai Jiao Tong University, 800 Dongchuan Road, 200240 Shanghai, China.
- Benqian Li
- School of Media and Communication, Shanghai Jiao Tong University, 800 Dongchuan Road, 200240 Shanghai, China.
13
Denecke K, May R, Rivera Romero O. Potential of Large Language Models in Health Care: Delphi Study. J Med Internet Res 2024; 26:e52399. [PMID: 38739445 PMCID: PMC11130776 DOI: 10.2196/52399] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/02/2023] [Revised: 10/10/2023] [Accepted: 04/19/2024] [Indexed: 05/14/2024] Open
Abstract
BACKGROUND A large language model (LLM) is a machine learning model inferred from text data that captures subtle patterns of language use in context. Modern LLMs are based on neural network architectures that incorporate transformer methods. They allow the model to relate words together through attention to multiple words in a text sequence. LLMs have been shown to be highly effective for a range of tasks in natural language processing (NLP), including classification and information extraction tasks and generative applications. OBJECTIVE The aim of this adapted Delphi study was to collect researchers' opinions on how LLMs might influence health care and on the strengths, weaknesses, opportunities, and threats of LLM use in health care. METHODS We invited researchers in the fields of health informatics, nursing informatics, and medical NLP to share their opinions on LLM use in health care. We started the first round with open questions based on our strengths, weaknesses, opportunities, and threats framework. In the second and third round, the participants scored these items. RESULTS The first, second, and third rounds had 28, 23, and 21 participants, respectively. Almost all participants (26/28, 93% in round 1 and 20/21, 95% in round 3) were affiliated with academic institutions. Agreement was reached on 103 items related to use cases, benefits, risks, reliability, adoption aspects, and the future of LLMs in health care. Participants offered several use cases, including supporting clinical tasks, documentation tasks, and medical research and education, and agreed that LLM-based systems will act as health assistants for patient education. 
The agreed-upon benefits included increased efficiency in data handling and extraction, improved automation of processes, improved quality of health care services and overall health outcomes, provision of personalized care, accelerated diagnosis and treatment processes, and improved interaction between patients and health care professionals. In total, 5 risks to health care in general were identified: cybersecurity breaches, the potential for patient misinformation, ethical concerns, the likelihood of biased decision-making, and the risk associated with inaccurate communication. Overconfidence in LLM-based systems was recognized as a risk to the medical profession. The 6 agreed-upon privacy risks included the use of unregulated cloud services that compromise data security, exposure of sensitive patient data, breaches of confidentiality, fraudulent use of information, vulnerabilities in data storage and communication, and inappropriate access or use of patient data. CONCLUSIONS Future research related to LLMs should not only focus on testing their possibilities for NLP-related tasks but also consider the workflows the models could contribute to and the requirements regarding quality, integration, and regulations needed for successful implementation in practice.
Affiliation(s)
- Richard May
- Harz University of Applied Sciences, Wernigerode, Germany
- Octavio Rivera Romero
- Instituto de Ingeniería Informática (I3US), Universidad de Sevilla, Sevilla, Spain
- Department of Electronic Technology, Universidad de Sevilla, Sevilla, Spain
14
Pinto DS, Noronha SM, Saigal G, Quencer RM. Comparison of an AI-Generated Case Report With a Human-Written Case Report: Practical Considerations for AI-Assisted Medical Writing. Cureus 2024; 16:e60461. [PMID: 38883028 PMCID: PMC11179998 DOI: 10.7759/cureus.60461] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 05/15/2024] [Indexed: 06/18/2024] Open
Abstract
INTRODUCTION The utility of ChatGPT has recently caused consternation in the medical world. While it has been utilized to write manuscripts, only a few studies have evaluated the quality of manuscripts generated by AI (artificial intelligence). OBJECTIVE We evaluate the ability of ChatGPT to write a case report when provided with a framework. We also provide practical considerations for manuscript writing using AI. METHODS We compared a manuscript written by a blinded human author (10 years of medical experience) with a manuscript written by ChatGPT on a rare presentation of a common disease. We used multiple iterations of the manuscript generation request to derive the best ChatGPT output. PARTICIPANTS, OUTCOMES, AND MEASURES 22 human reviewers compared the manuscripts using parameters that characterize human writing and relevant standard manuscript assessment criteria, viz., the scholarly impact quotient (SIQ). We also compared the manuscripts using the "average perplexity score" (APS), "burstiness score" (BS), and "highest perplexity of a sentence" (GPTZero parameters used to detect AI-generated content). RESULTS The human manuscript had a significantly higher quality of presentation and more nuanced writing (p<0.05). Both manuscripts had a logical flow. 12/22 reviewers were able to identify the AI-generated manuscript (p<0.05), but 4/22 reviewers wrongly identified the human-written manuscript as AI-generated. GPTZero software erroneously identified four sentences of the human-written manuscript as AI-generated. CONCLUSION Though AI showed an ability to highlight the novelty of the case report and project a logical flow comparable to the human manuscript, it could not outperform the human writer on all parameters. The human manuscript showed a better quality of presentation and more nuanced writing. The practical considerations we provide for AI-assisted medical writing will help to better utilize AI in manuscript writing.
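GPTZero's actual metrics (APS, BS) are computed with a proprietary neural language model, so they cannot be reproduced here. Purely as an informal illustration of the underlying concepts, perplexity can be sketched with a toy word-unigram model, and "burstiness" as the spread of per-sentence perplexities; every modeling choice below is our simplification, not GPTZero's method.

```python
import math
from collections import Counter

def unigram_perplexity(sentence: str, corpus: str) -> float:
    """Perplexity of a sentence under a toy word-unigram model fit on `corpus`.

    Real detectors use large neural LMs; this unigram stand-in with add-one
    smoothing is only meant to show what 'perplexity' measures.
    """
    words = corpus.lower().split()
    counts = Counter(words)
    total, vocab = len(words), len(counts)
    tokens = sentence.lower().split()
    log_prob = 0.0
    for w in tokens:
        # add-one smoothing gives unseen words a nonzero probability
        p = (counts[w] + 1) / (total + vocab)
        log_prob += math.log(p)
    return math.exp(-log_prob / len(tokens))

def burstiness(perplexities: list) -> float:
    """Population standard deviation of per-sentence perplexities.

    Uniformly low variation across sentences is the informal intuition
    behind 'burstiness' as a weak machine-text signal.
    """
    mean = sum(perplexities) / len(perplexities)
    return math.sqrt(sum((p - mean) ** 2 for p in perplexities) / len(perplexities))
```

The intuition the study leans on: human writing tends to mix short predictable sentences with long surprising ones (high burstiness), whereas LLM output is often uniformly mid-perplexity, which is also why such detectors misfire on fluent human prose, as the four false positives above show.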
Affiliation(s)
- Gaurav Saigal
- Radiology, University of Miami Miller School of Medicine, Miami, USA
- Robert M Quencer
- Radiology, University of Miami Miller School of Medicine, Miami, USA
15
Lang S, Vitale J, Fekete TF, Haschtmann D, Reitmeir R, Ropelato M, Puhakka J, Galbusera F, Loibl M. Are large language models valid tools for patient information on lumbar disc herniation? The spine surgeons' perspective. BRAIN & SPINE 2024; 4:102804. [PMID: 38706800 PMCID: PMC11067000 DOI: 10.1016/j.bas.2024.102804] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 12/27/2023] [Revised: 02/19/2024] [Accepted: 04/04/2024] [Indexed: 05/07/2024]
Abstract
Introduction Generative AI is revolutionizing patient education in healthcare, particularly through chatbots that offer personalized, clear medical information. Reliability and accuracy are vital in AI-driven patient education. Research question How effective are Large Language Models (LLM), such as ChatGPT and Google Bard, in delivering accurate and understandable patient education on lumbar disc herniation? Material and methods Ten Frequently Asked Questions about lumbar disc herniation were selected from 133 questions and were submitted to three LLMs. Six experienced spine surgeons rated the responses on a scale from "excellent" to "unsatisfactory," and evaluated the answers for exhaustiveness, clarity, empathy, and length. Statistical analysis involved Fleiss Kappa, Chi-square, and Friedman tests. Results Out of the responses, 27.2% were excellent, 43.9% satisfactory with minimal clarification, 18.3% satisfactory with moderate clarification, and 10.6% unsatisfactory. There were no significant differences in overall ratings among the LLMs (p = 0.90); however, inter-rater reliability was not achieved, and large differences among raters were detected in the distribution of answer frequencies. Overall, ratings varied among the 10 answers (p = 0.043). The average ratings for exhaustiveness, clarity, empathy, and length were above 3.5/5. Discussion and conclusion LLMs show potential in patient education for lumbar spine surgery, with generally positive feedback from evaluators. The new EU AI Act, enforcing strict regulation on AI systems, highlights the need for rigorous oversight in medical contexts. In the current study, the variability in evaluations and occasional inaccuracies underline the need for continuous improvement. Future research should involve more advanced models to enhance patient-physician communication.
Affiliation(s)
- Siegmund Lang
- Department of Trauma Surgery, University Hospital Regensburg, Regensburg, Germany
- Jacopo Vitale
- Spine Center, Schulthess Klinik, Zurich, Switzerland
- Jani Puhakka
- Spine Center, Schulthess Klinik, Zurich, Switzerland
- Markus Loibl
- Spine Center, Schulthess Klinik, Zurich, Switzerland
16
Valentini M, Szkandera J, Smolle MA, Scheipl S, Leithner A, Andreou D. Artificial intelligence large language model ChatGPT: is it a trustworthy and reliable source of information for sarcoma patients? Front Public Health 2024; 12:1303319. [PMID: 38584922 PMCID: PMC10995284 DOI: 10.3389/fpubh.2024.1303319] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/30/2023] [Accepted: 03/06/2024] [Indexed: 04/09/2024] Open
Abstract
Introduction Since its introduction in November 2022, the artificial intelligence large language model ChatGPT has taken the world by storm. Among other applications it can be used by patients as a source of information on diseases and their treatments. However, little is known about the quality of the sarcoma-related information ChatGPT provides. We therefore aimed at analyzing how sarcoma experts evaluate the quality of ChatGPT's responses on sarcoma-related inquiries and assess the bot's answers in specific evaluation metrics. Methods The ChatGPT responses to a sample of 25 sarcoma-related questions (5 definitions, 9 general questions, and 11 treatment-related inquiries) were evaluated by 3 independent sarcoma experts. Each response was compared with authoritative resources and international guidelines and graded on 5 different metrics using a 5-point Likert scale: completeness, misleadingness, accuracy, being up-to-date, and appropriateness. This resulted in maximum 25 and minimum 5 points per answer, with higher scores indicating a higher response quality. Scores ≥21 points were rated as very good, between 16 and 20 as good, while scores ≤15 points were classified as poor (11-15) and very poor (≤10). Results The median score that ChatGPT's answers achieved was 18.3 points (IQR, i.e., Inter-Quartile Range, 12.3-20.3 points). Six answers were classified as very good, 9 as good, while 5 answers each were rated as poor and very poor. The best scores were documented in the evaluation of how appropriate the response was for patients (median, 3.7 points; IQR, 2.5-4.2 points), which were significantly higher compared to the accuracy scores (median, 3.3 points; IQR, 2.0-4.2 points; p = 0.035). ChatGPT fared considerably worse with treatment-related questions, with only 45% of its responses classified as good or very good, compared to general questions (78% of responses good/very good) and definitions (60% of responses good/very good). 
Discussion The answers ChatGPT provided on a rare disease, such as sarcoma, were found to be of very inconsistent quality, with some answers being classified as very good and others as very poor. Sarcoma physicians should be aware of the risks of misinformation that ChatGPT poses and advise their patients accordingly.
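The banded scoring rubric above (5-25 points per answer) maps scores to categories mechanically. A minimal sketch of that mapping follows; how the study assigned fractional between-band scores (e.g., expert averages such as 20.3) is not stated, so the lower-band rule here is our assumption.

```python
def classify(score: float) -> str:
    """Map a 5-25 point answer score to the quality bands in the abstract:
    >=21 very good, 16-20 good, 11-15 poor, <=10 very poor.
    Fractional scores falling between bands drop to the lower band here
    (an assumption; the study does not specify this case).
    """
    if not 5 <= score <= 25:
        raise ValueError("score must be between 5 and 25")
    if score >= 21:
        return "very good"
    if score >= 16:
        return "good"
    if score >= 11:
        return "poor"
    return "very poor"
```

Under this rule the reported median of 18.3 points lands in the "good" band, consistent with the abstract's overall characterization.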
Affiliation(s)
- Marisa Valentini
- Department of Orthopaedics and Trauma, Medical University of Graz, Graz, Austria
- Joanna Szkandera
- Division of Oncology, Department of Internal Medicine, Medical University of Graz, Graz, Austria
- Maria Anna Smolle
- Department of Orthopaedics and Trauma, Medical University of Graz, Graz, Austria
- Susanne Scheipl
- Department of Orthopaedics and Trauma, Medical University of Graz, Graz, Austria
- Andreas Leithner
- Department of Orthopaedics and Trauma, Medical University of Graz, Graz, Austria
- Dimosthenis Andreou
- Department of Orthopaedics and Trauma, Medical University of Graz, Graz, Austria
17
Mu Y, He D. The Potential Applications and Challenges of ChatGPT in the Medical Field. Int J Gen Med 2024; 17:817-826. [PMID: 38476626 PMCID: PMC10929156 DOI: 10.2147/ijgm.s456659] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/10/2024] [Accepted: 02/26/2024] [Indexed: 03/14/2024] Open
Abstract
ChatGPT, an AI-driven conversational large language model (LLM), has garnered significant scholarly attention since its inception, owing to its manifold applications in the realm of medical science. This study primarily examines the merits, limitations, anticipated developments, and practical applications of ChatGPT in clinical practice, healthcare, medical education, and medical research. It underscores the necessity for further research and development to enhance its performance and deployment. Moreover, future research avenues encompass ongoing enhancements and standardization of ChatGPT, mitigating its limitations, and exploring its integration and applicability in translational and personalized medicine. Reflecting the narrative nature of this review, a focused literature search was performed to identify relevant publications on ChatGPT's use in medicine. This process was aimed at gathering a broad spectrum of insights to provide a comprehensive overview of the current state and future prospects of ChatGPT in the medical domain. The objective is to aid healthcare professionals in understanding the groundbreaking advancements associated with the latest artificial intelligence tools, while also acknowledging the opportunities and challenges presented by ChatGPT.
Affiliation(s)
- Yonglin Mu
- Department of Urology, Children’s Hospital of Chongqing Medical University, Chongqing, People’s Republic of China
- Dawei He
- Department of Urology, Children’s Hospital of Chongqing Medical University, Chongqing, People’s Republic of China
18
Bellini V, Semeraro F, Montomoli J, Cascella M, Bignami E. Between human and AI: assessing the reliability of AI text detection tools. Curr Med Res Opin 2024; 40:353-358. [PMID: 38265047 DOI: 10.1080/03007995.2024.2310086] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 10/10/2023] [Accepted: 01/22/2024] [Indexed: 01/25/2024]
Abstract
OBJECTIVE Large language models (LLMs) such as ChatGPT-4 have raised critical questions regarding their distinguishability from human-generated content. In this research, we evaluated the effectiveness of online detection tools in identifying ChatGPT-4 vs human-written text. METHODS Two texts produced by ChatGPT-4 using differing prompts and one text created by a human author were assessed using the following online detection tools: GPTZero, ZeroGPT, Writer ACD, and Originality. RESULTS The findings revealed notable variance in the detection capabilities of the employed tools. GPTZero and ZeroGPT exhibited inconsistent assessments regarding the AI origin of the texts. Writer ACD predominantly identified texts as human-written, whereas Originality consistently recognized the AI-generated content in both samples from ChatGPT-4, highlighting Originality's enhanced sensitivity to patterns characteristic of AI-generated text. CONCLUSION The study demonstrates that while automatic detection tools may discern texts generated by ChatGPT-4, significant variability exists in their accuracy. There is an urgent need for advanced detection tools to ensure the authenticity and integrity of content, especially in scientific and academic research, and for more refined detection methodologies to prevent the misdetection of human-written content as AI-generated and vice versa.
Affiliation(s)
- Valentina Bellini
- Anesthesiology, Critical Care and Pain Medicine Division, Department of Medicine and Surgery, University of Parma, Parma, Italy
- Federico Semeraro
- Department of Anesthesia, Intensive Care and Prehospital Emergency, Maggiore Hospital Carlo Alberto Pizzardi, Bologna, Italy
- Jonathan Montomoli
- Department of Anesthesia and Intensive Care, Infermi Hospital, Romagna Local Health Authority, Rimini, Italy
- Marco Cascella
- Anesthesia and Pain Medicine, Department of Medicine, Surgery and Dentistry "Scuola Medica Salernitana", University of Salerno, Baronissi, Italy
- Elena Bignami
- Anesthesiology, Critical Care and Pain Medicine Division, Department of Medicine and Surgery, University of Parma, Parma, Italy
19
Abi-Rafeh J, Xu HH, Kazan R, Tevlin R, Furnas H. Large Language Models and Artificial Intelligence: A Primer for Plastic Surgeons on the Demonstrated and Potential Applications, Promises, and Limitations of ChatGPT. Aesthet Surg J 2024; 44:329-343. [PMID: 37562022 DOI: 10.1093/asj/sjad260] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/26/2023] [Revised: 08/02/2023] [Accepted: 08/04/2023] [Indexed: 08/12/2023] Open
Abstract
BACKGROUND The rapidly evolving field of artificial intelligence (AI) holds great potential for plastic surgeons. ChatGPT, a recently released AI large language model (LLM), promises applications across many disciplines, including healthcare. OBJECTIVES The aim of this article was to provide a primer for plastic surgeons on AI, LLMs, and ChatGPT, including an analysis of current demonstrated and proposed clinical applications. METHODS A systematic review was performed identifying medical and surgical literature on ChatGPT's proposed clinical applications. Variables assessed included applications investigated, command tasks provided, user input information, AI-emulated human skills, output validation, and reported limitations. RESULTS The analysis included 175 articles reporting on 13 plastic surgery applications and 116 additional clinical applications, categorized by field and purpose. Thirty-four applications within plastic surgery are thus proposed, with relevance to different target audiences, including attending plastic surgeons (n = 17, 50%), trainees/educators (n = 8, 24%), researchers/scholars (n = 7, 21%), and patients (n = 2, 6%). The 15 identified limitations of ChatGPT were categorized by training data, algorithm, and ethical considerations. CONCLUSIONS Widespread use of ChatGPT in plastic surgery will depend on rigorous research of proposed applications to validate performance and address limitations. This systematic review aims to guide research, development, and regulation to safely adopt AI in plastic surgery.
20
Ajagunde J, Das NK. ChatGPT Versus Medical Professionals. Health Serv Insights 2024; 17:11786329241230161. [PMID: 38322596 PMCID: PMC10845989 DOI: 10.1177/11786329241230161] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/08/2024] Open
Affiliation(s)
- Jyoti Ajagunde
- Department of Microbiology, Dr. D Y Patil Medical College, Dr. D Y Patil Vidyapeeth, Pimpri, Pune, Maharashtra, India
- Nikunja Kumar Das
- Department of Microbiology, Dr. D Y Patil Medical College, Dr. D Y Patil Vidyapeeth, Pimpri, Pune, Maharashtra, India
21
Yan S, Du D, Liu X, Dai Y, Kim MK, Zhou X, Wang L, Zhang L, Jiang X. Assessment of the Reliability and Clinical Applicability of ChatGPT's Responses to Patients' Common Queries About Rosacea. Patient Prefer Adherence 2024; 18:249-253. [PMID: 38313827 PMCID: PMC10838492 DOI: 10.2147/ppa.s444928] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 10/16/2023] [Accepted: 01/22/2024] [Indexed: 02/06/2024] Open
Abstract
Objective The artificial intelligence chatbot ChatGPT (Chat Generative Pre-trained Transformer) is capable of analyzing human input and generating human-like responses, which shows its potential application in healthcare. People with rosacea often have questions about alleviating symptoms and daily skin care, questions well suited for ChatGPT to answer. This study aims to assess the reliability and clinical applicability of ChatGPT 3.5 in responding to patients' common queries about rosacea and to evaluate the extent of ChatGPT's coverage of dermatology resources. Methods Based on a qualitative analysis of the literature on queries from rosacea patients, we extracted the 20 questions of greatest concern to patients, covering four main categories: treatment, triggers and diet, skincare, and special manifestations of rosacea. Each question was input into ChatGPT separately for three rounds of question-and-answer conversations. The generated answers were evaluated by three experienced dermatologists with postgraduate degrees and over five years of clinical experience in dermatology, who assessed their reliability and applicability for clinical practice. Results The reviewers unanimously agreed that ChatGPT achieved a high reliability of 92.22% to 97.78% in responding to patients' common queries about rosacea. Additionally, almost all answers were applicable for supporting rosacea patient education, with clinical applicability ranging from 98.61% to 100.00%. The consistency of the expert ratings was excellent (all significance levels were less than 0.05), with a consistency coefficient of 0.404 for content reliability and 0.456 for clinical practicality, indicating significant consistency in the results and a high level of agreement among the expert ratings. Conclusion ChatGPT 3.5 exhibits excellent reliability and clinical applicability in responding to patients' common queries about rosacea.
This artificial intelligence tool is applicable for supporting rosacea patient education.
Collapse
Affiliation(s)
- Sihan Yan
- Department of Dermatology, West China Hospital, Sichuan University, Chengdu, People’s Republic of China
- Laboratory of Dermatology, Clinical Institute of Inflammation and Immunology, Frontiers Science Center for Disease-Related Molecular Network, West China Hospital, Sichuan University, Chengdu, People’s Republic of China
- Dan Du
- Department of Dermatology, West China Hospital, Sichuan University, Chengdu, People’s Republic of China
- Laboratory of Dermatology, Clinical Institute of Inflammation and Immunology, Frontiers Science Center for Disease-Related Molecular Network, West China Hospital, Sichuan University, Chengdu, People’s Republic of China
- Xu Liu
- Department of Dermatology, West China Hospital, Sichuan University, Chengdu, People’s Republic of China
- Laboratory of Dermatology, Clinical Institute of Inflammation and Immunology, Frontiers Science Center for Disease-Related Molecular Network, West China Hospital, Sichuan University, Chengdu, People’s Republic of China
- Yingying Dai
- Department of Dermatology, West China Hospital, Sichuan University, Chengdu, People’s Republic of China
- Laboratory of Dermatology, Clinical Institute of Inflammation and Immunology, Frontiers Science Center for Disease-Related Molecular Network, West China Hospital, Sichuan University, Chengdu, People’s Republic of China
- Min-Kyu Kim
- Department of Dermatology, West China Hospital, Sichuan University, Chengdu, People’s Republic of China
- Laboratory of Dermatology, Clinical Institute of Inflammation and Immunology, Frontiers Science Center for Disease-Related Molecular Network, West China Hospital, Sichuan University, Chengdu, People’s Republic of China
- Xinyu Zhou
- Department of Dermatology, Nanbu County People’s Hospital, Nanbu County, Nanchong, Sichuan, People’s Republic of China
- Lian Wang
- Department of Dermatology, West China Hospital, Sichuan University, Chengdu, People’s Republic of China
- Laboratory of Dermatology, Clinical Institute of Inflammation and Immunology, Frontiers Science Center for Disease-Related Molecular Network, West China Hospital, Sichuan University, Chengdu, People’s Republic of China
- Lu Zhang
- Department of Dermatology, West China Hospital, Sichuan University, Chengdu, People’s Republic of China
- Laboratory of Dermatology, Clinical Institute of Inflammation and Immunology, Frontiers Science Center for Disease-Related Molecular Network, West China Hospital, Sichuan University, Chengdu, People’s Republic of China
- Xian Jiang
- Department of Dermatology, West China Hospital, Sichuan University, Chengdu, People’s Republic of China
- Laboratory of Dermatology, Clinical Institute of Inflammation and Immunology, Frontiers Science Center for Disease-Related Molecular Network, West China Hospital, Sichuan University, Chengdu, People’s Republic of China
22
Zaleski AL, Berkowsky R, Craig KJT, Pescatello LS. Comprehensiveness, Accuracy, and Readability of Exercise Recommendations Provided by an AI-Based Chatbot: Mixed Methods Study. JMIR MEDICAL EDUCATION 2024; 10:e51308. [PMID: 38206661 PMCID: PMC10811574 DOI: 10.2196/51308] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2023] [Revised: 10/05/2023] [Accepted: 12/11/2023] [Indexed: 01/12/2024]
Abstract
BACKGROUND Regular physical activity is critical for health and disease prevention. Yet, health care providers and patients face barriers to implementing evidence-based lifestyle recommendations. The potential to augment care with the increased availability of artificial intelligence (AI) technologies is limitless; however, the suitability of AI-generated exercise recommendations has yet to be explored. OBJECTIVE The purpose of this study was to assess the comprehensiveness, accuracy, and readability of individualized exercise recommendations generated by a novel AI chatbot. METHODS A coding scheme was developed to score AI-generated exercise recommendations across ten categories informed by gold-standard exercise recommendations: (1) health condition-specific benefits of exercise, (2) exercise preparticipation health screening, (3) frequency, (4) intensity, (5) time, (6) type, (7) volume, (8) progression, (9) special considerations, and (10) references to the primary literature. The AI chatbot was prompted to provide individualized exercise recommendations for 26 clinical populations using an open-source application programming interface. Two independent reviewers coded AI-generated content for each category and calculated comprehensiveness (%) and factual accuracy (%) on a scale of 0%-100%. Readability was assessed using the Flesch-Kincaid formula. Qualitative analysis identified and categorized themes from AI-generated output. RESULTS AI-generated exercise recommendations were 41.2% (107/260) comprehensive and 90.7% (146/161) accurate, with the majority (8/15, 53%) of inaccuracy related to the need for exercise preparticipation medical clearance. The average readability level of AI-generated exercise recommendations was at the college level (mean 13.7, SD 1.7), with an average Flesch reading ease score of 31.1 (SD 7.7).
Several recurring themes and observations of AI-generated output included concern for liability and safety, preference for aerobic exercise, and potential bias and direct discrimination against certain age-based populations and individuals with disabilities. CONCLUSIONS There were notable gaps in the comprehensiveness, accuracy, and readability of AI-generated exercise recommendations. Exercise and health care professionals should be aware of these limitations when using and endorsing AI-based technologies as a tool to support lifestyle change involving exercise.
Affiliation(s)
- Amanda L Zaleski
- Clinical Evidence Development, Aetna Medical Affairs, CVS Health Corporation, Hartford, CT, United States
- Department of Preventive Cardiology, Hartford Hospital, Hartford, CT, United States
- Rachel Berkowsky
- Department of Kinesiology, University of Connecticut, Storrs, CT, United States
- Kelly Jean Thomas Craig
- Clinical Evidence Development, Aetna Medical Affairs, CVS Health Corporation, Hartford, CT, United States
- Linda S Pescatello
- Department of Kinesiology, University of Connecticut, Storrs, CT, United States
23
Younis HA, Eisa TAE, Nasser M, Sahib TM, Noor AA, Alyasiri OM, Salisu S, Hayder IM, Younis HA. A Systematic Review and Meta-Analysis of Artificial Intelligence Tools in Medicine and Healthcare: Applications, Considerations, Limitations, Motivation and Challenges. Diagnostics (Basel) 2024; 14:109. [PMID: 38201418 PMCID: PMC10802884 DOI: 10.3390/diagnostics14010109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2023] [Revised: 12/02/2023] [Accepted: 12/04/2023] [Indexed: 01/12/2024] Open
Abstract
Artificial intelligence (AI) has emerged as a transformative force in various sectors, including medicine and healthcare. Large language models like ChatGPT showcase AI's potential by generating human-like text from prompts. ChatGPT's adaptability holds promise for reshaping medical practices, improving patient care, and enhancing interactions among healthcare professionals, patients, and data. In pandemic management, ChatGPT rapidly disseminates vital information. It serves as a virtual assistant in surgical consultations, assists dental practices, simplifies medical education, and aids disease diagnosis. A systematic literature review using the PRISMA approach explored AI's transformative potential in healthcare, highlighting ChatGPT's versatile applications, limitations, motivations, and challenges. A total of 82 papers were categorised into eight major areas: G1: treatment and medicine; G2: buildings and equipment; G3: parts of the human body and areas of disease; G4: patients; G5: citizens; G6: cellular imaging, radiology, pulse and medical images; G7: doctors and nurses; and G8: tools, devices and administration. Balancing AI's role with human judgment remains a challenge. In conclusion, ChatGPT's diverse medical applications demonstrate its potential for innovation, serving as a valuable resource for students, academics, and researchers in healthcare. Additionally, this study serves as a guide, assisting students, academics, and researchers in the field of medicine and healthcare alike.
Affiliation(s)
- Hussain A. Younis
- College of Education for Women, University of Basrah, Basrah 61004, Iraq
- Maged Nasser
- Computer & Information Sciences Department, Universiti Teknologi PETRONAS, Seri Iskandar 32610, Malaysia
- Thaeer Mueen Sahib
- Kufa Technical Institute, Al-Furat Al-Awsat Technical University, Kufa 54001, Iraq
- Ameen A. Noor
- Computer Science Department, College of Education, University of Almustansirya, Baghdad 10045, Iraq
- Sani Salisu
- Department of Information Technology, Federal University Dutse, Dutse 720101, Nigeria
- Israa M. Hayder
- Qurna Technique Institute, Southern Technical University, Basrah 61016, Iraq
- Hameed AbdulKareem Younis
- Department of Cybersecurity, College of Computer Science and Information Technology, University of Basrah, Basrah 61016, Iraq
24
Malik S, Zaheer S. ChatGPT as an aid for pathological diagnosis of cancer. Pathol Res Pract 2024; 253:154989. [PMID: 38056135 DOI: 10.1016/j.prp.2023.154989] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2023] [Revised: 11/26/2023] [Accepted: 11/27/2023] [Indexed: 12/08/2023]
Abstract
Diagnostic workup of cancer patients relies heavily on the science of pathology, using cytopathology, histopathology, and other ancillary techniques such as immunohistochemistry and molecular cytogenetics. Data processing and learning by means of artificial intelligence (AI) has become a spearhead for the advancement of medicine, with pathology and laboratory medicine being no exceptions. ChatGPT, an AI-based chatbot recently launched by OpenAI, is currently the talk of the town, and its role in cancer diagnosis is also being explored meticulously. Integrating digital slides into the pathology workflow, implementing advanced algorithms, and applying computer-aided diagnostic techniques extend the frontiers of the pathologist's view beyond the microscopic slide and enable effective integration, assimilation, and utilization of knowledge beyond human limits and boundaries. Despite its numerous advantages in the pathological diagnosis of cancer, it comes with several challenges, such as the integration of digital slides with input language parameters, problems of bias, and legal issues, which must be addressed soon so that we as pathologists diagnosing malignancies stay on the same bandwagon and don't miss the train.
Affiliation(s)
- Shaivy Malik
- Department of Pathology, Vardhman Mahavir Medical College and Safdarjung Hospital, New Delhi, India
- Sufian Zaheer
- Department of Pathology, Vardhman Mahavir Medical College and Safdarjung Hospital, New Delhi, India
25
Alotaibi SS, Rehman A, Hasnain M. Revolutionizing ocular cancer management: a narrative review on exploring the potential role of ChatGPT. Front Public Health 2023; 11:1338215. [PMID: 38192545 PMCID: PMC10773849 DOI: 10.3389/fpubh.2023.1338215] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2023] [Accepted: 12/04/2023] [Indexed: 01/10/2024] Open
Abstract
This paper pioneers the exploration of ocular cancer and its management with the help of artificial intelligence (AI) technology. Existing literature reports a significant increase in new eye cancer cases and a higher incidence rate in 2023. Extensive research was conducted using online databases such as PubMed, ACM Digital Library, ScienceDirect, and Springer. The Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA) guidelines were used to conduct this review. Of the 62 studies collected, only 20 documents met the inclusion criteria. The review identifies seven ocular cancer types. Important challenges associated with ocular cancer are highlighted, including limited awareness about eye cancer, restricted healthcare access, financial barriers, and insufficient infrastructure support. Financial barriers are among the most widely examined ocular cancer challenges in the literature. The potential role and limitations of ChatGPT are discussed, emphasizing its usefulness in providing general information to physicians while noting its inability to deliver up-to-date information. The paper concludes by presenting potential future applications of ChatGPT to advance research on ocular cancer globally.
Affiliation(s)
- Saud S. Alotaibi
- Information Systems Department, Umm Al-Qura University, Makkah, Saudi Arabia
- Amna Rehman
- Department of Computer Science, Lahore Leads University, Lahore, Pakistan
- Muhammad Hasnain
- Department of Computer Science, Lahore Leads University, Lahore, Pakistan
26
Chatterjee S, Bhattacharya M, Pal S, Lee SS, Chakraborty C. ChatGPT and large language models in orthopedics: from education and surgery to research. J Exp Orthop 2023; 10:128. [PMID: 38038796 PMCID: PMC10692045 DOI: 10.1186/s40634-023-00700-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2023] [Accepted: 11/16/2023] [Indexed: 12/02/2023] Open
Abstract
ChatGPT has quickly gained popularity since its release in November 2022. Currently, large language models (LLMs) and ChatGPT have been applied in various domains of medical science, including cardiology, nephrology, orthopedics, ophthalmology, gastroenterology, and radiology. Researchers are exploring the potential of LLMs and ChatGPT for clinicians and surgeons in every domain. This study discusses how ChatGPT can help orthopedic clinicians and surgeons perform various medical tasks. LLMs and ChatGPT can help the patient community by providing suggestions and diagnostic guidelines. In this study, the use of LLMs and ChatGPT to enhance and expand the field of orthopedics, including orthopedic education, surgery, and research, is explored. Present LLMs have several shortcomings, which are discussed herein. However, next-generation and future domain-specific LLMs are expected to be more potent and to transform patients' quality of life.
Affiliation(s)
- Srijan Chatterjee
- Institute for Skeletal Aging & Orthopaedic Surgery, Hallym University-Chuncheon Sacred Heart Hospital, Chuncheon-Si, 24252, Gangwon-Do, Republic of Korea
- Manojit Bhattacharya
- Department of Zoology, Fakir Mohan University, Vyasa Vihar, Balasore, 756020, Odisha, India
- Soumen Pal
- School of Mechanical Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India
- Sang-Soo Lee
- Institute for Skeletal Aging & Orthopaedic Surgery, Hallym University-Chuncheon Sacred Heart Hospital, Chuncheon-Si, 24252, Gangwon-Do, Republic of Korea
- Chiranjib Chakraborty
- Department of Biotechnology, School of Life Science and Biotechnology, Adamas University, Kolkata, West Bengal, 700126, India
27
Au K, Yang W. Auxiliary use of ChatGPT in surgical diagnosis and treatment. Int J Surg 2023; 109:3940-3943. [PMID: 37678271 PMCID: PMC10720849 DOI: 10.1097/js9.0000000000000686] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 05/24/2023] [Accepted: 08/09/2023] [Indexed: 09/09/2023]
Abstract
ChatGPT can be used as an auxiliary tool in surgical diagnosis and treatment in several ways. One of its most valuable features is its ability to quickly process large amounts of data and provide relatively accurate information to healthcare workers. Owing to its high accuracy and ability to process big data, ChatGPT has been widely used in the healthcare industry for tasks such as assisting medical diagnosis, predicting certain diseases, and analyzing medical cases. In surgical diagnosis and treatment, it can serve as an auxiliary tool that helps healthcare professionals process large amounts of medical data, provides real-time guidance and feedback, and increases the overall speed and quality of healthcare. Although it enjoys broad acceptance, it still faces issues involving ethics, patient privacy, data security, law, trustworthiness, and accuracy. This study aimed to explore the auxiliary use of ChatGPT in surgical diagnosis and treatment.
Affiliation(s)
- Kahei Au
- School of Medicine, Jinan University
- Wah Yang
- Department of Metabolic and Bariatric Surgery, The First Affiliated Hospital of Jinan University, Guangzhou, Guangdong Province, People’s Republic of China
28
Abu-Farha R, Fino L, Al-Ashwal FY, Zawiah M, Gharaibeh L, Harahsheh MM, Darwish Elhajji F. Evaluation of community pharmacists' perceptions and willingness to integrate ChatGPT into their pharmacy practice: A study from Jordan. J Am Pharm Assoc (2003) 2023; 63:1761-1767.e2. [PMID: 37648157 DOI: 10.1016/j.japh.2023.08.020] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 05/21/2023] [Revised: 08/10/2023] [Accepted: 08/22/2023] [Indexed: 09/01/2023]
Abstract
OBJECTIVES This study aimed to examine the extent of community pharmacists' awareness of Chat Generative Pretraining Transformer (ChatGPT), their willingness to embrace this new development in artificial intelligence (AI), and the barriers facing the incorporation of this nonconventional source of information into pharmacy practice. METHODS A cross-sectional study was conducted among community pharmacists in Jordanian cities between April 26, 2023, and May 10, 2023. Convenience and snowball sampling techniques were used to select study participants owing to resource and time constraints. The questionnaire was distributed by research assistants through popular social media platforms. Logistic regression analysis was used to assess predictors affecting pharmacists' willingness to use this service in the future. RESULTS A total of 221 community pharmacists participated in the study (the response rate was not calculated because opt-in recruitment strategies were used). Remarkably, nearly half of the pharmacists (n = 107, 48.4%) indicated a willingness to incorporate ChatGPT into their pharmacy practice. Nearly half of the pharmacists (n = 105, 47.5%) demonstrated a high perceived benefit score for ChatGPT, whereas approximately 37% of pharmacists (n = 81) expressed a high concern score about ChatGPT. More than 70% of pharmacists believed that ChatGPT lacked the ability to use human judgment and make complicated ethical judgments in its responses (n = 168). Finally, logistic regression analysis showed that pharmacists who had previous experience using ChatGPT were more willing to integrate ChatGPT into their pharmacy practice than those without such experience (odds ratio 2.312, P = 0.035). CONCLUSION Although pharmacists show a willingness to incorporate ChatGPT into their practice, especially those with previous experience, there are major concerns.
These mainly revolve around the tool's ability to make human-like judgments and ethical decisions. These findings are crucial for the future development and integration of AI tools in pharmacy practice.
29
Karakas C, Brock D, Lakhotia A. Leveraging ChatGPT in the Pediatric Neurology Clinic: Practical Considerations for Use to Improve Efficiency and Outcomes. Pediatr Neurol 2023; 148:157-163. [PMID: 37725885 DOI: 10.1016/j.pediatrneurol.2023.08.035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Revised: 08/17/2023] [Accepted: 08/25/2023] [Indexed: 09/21/2023]
Abstract
BACKGROUND Artificial intelligence (AI) is progressively influencing healthcare sectors, including pediatric neurology. This paper aims to investigate the potential and limitations of using ChatGPT, a large language model (LLM) developed by OpenAI, in an outpatient pediatric neurology clinic. The analysis focuses on the tool's capabilities in enhancing clinical efficiency, productivity, and patient education. METHOD This is an opinion-based exploration supplemented with practical examples. We assessed ChatGPT's utility in administrative and educational tasks such as drafting medical necessity letters and creating patient educational materials. RESULTS ChatGPT showed efficacy in streamlining administrative work, particularly in drafting administrative letters and formulating personalized patient education materials. However, the model has limitations in performing higher-order tasks like formulating nuanced differential diagnoses. Additionally, ethical and legal concerns, including data privacy and the potential dissemination of misinformation, warrant cautious implementation. CONCLUSIONS The integration of AI tools like ChatGPT in pediatric neurology clinics has demonstrated promising results in boosting efficiency and patient education, despite present limitations and ethical concerns. As technology advances, we anticipate future applications may extend to more complex clinical tasks like precise differential diagnoses and treatment strategy guidance. Careful, patient-centered implementation is essential for leveraging the potential benefits of AI in pediatric neurology effectively.
Affiliation(s)
- Cemal Karakas
- Division of Pediatric Neurology, Department of Neurology, University of Louisville, Louisville, Kentucky; Norton Neuroscience Institute, Louisville, Kentucky.
- Dylan Brock
- Division of Pediatric Neurology, Department of Neurology, University of Louisville, Louisville, Kentucky; Norton Neuroscience Institute, Louisville, Kentucky
- Arpita Lakhotia
- Division of Pediatric Neurology, Department of Neurology, University of Louisville, Louisville, Kentucky; Norton Neuroscience Institute, Louisville, Kentucky
30
Mese I, Taslicay CA, Sivrioglu AK. Improving radiology workflow using ChatGPT and artificial intelligence. Clin Imaging 2023; 103:109993. [PMID: 37812965 DOI: 10.1016/j.clinimag.2023.109993] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2023] [Revised: 08/19/2023] [Accepted: 09/28/2023] [Indexed: 10/11/2023]
Abstract
Artificial intelligence is a branch of computer science that aims to create intelligent machines capable of performing tasks that typically require human intelligence. One of its branches is natural language processing, which studies the interaction between computers and human language. ChatGPT is a sophisticated natural language processing tool that can understand and respond to complex questions and commands in natural language. Radiology is a vital aspect of modern medicine that uses imaging technologies to diagnose and treat medical conditions. Artificial intelligence, including ChatGPT, can be integrated into radiology workflows to improve efficiency, accuracy, and patient care. ChatGPT can streamline various radiology workflow steps, including patient registration, scheduling, patient check-in, image acquisition, interpretation, and reporting. While ChatGPT has the potential to transform radiology workflows, the technology has limitations that must be addressed, such as the potential for bias in artificial intelligence algorithms and ethical concerns. As technology continues to advance, ChatGPT is likely to become an increasingly important tool in the field of radiology, and in healthcare more broadly.
Affiliation(s)
- Ismail Mese
- Department of Radiology, Health Sciences University, Erenkoy Mental Health and Neurology Training and Research Hospital, 19 Mayıs, Sinan Ercan Cd. No: 23, Kadıköy/Istanbul 34736, Turkey.
- Ali Kemal Sivrioglu
- Department of Radiology, Liv Hospital Vadistanbul, Ayazağa Mahallesi, Kemerburgaz Caddesi, Vadistanbul Park Etabı, 7F Blok, 34396 Sarıyer/İstanbul, Turkey
31
Chakraborty C, Pal S, Bhattacharya M, Dash S, Lee SS. Overview of Chatbots with special emphasis on artificial intelligence-enabled ChatGPT in medical science. Front Artif Intell 2023; 6:1237704. [PMID: 38028668 PMCID: PMC10644239 DOI: 10.3389/frai.2023.1237704] [Citation(s) in RCA: 15] [Impact Index Per Article: 15.0] [Received: 06/09/2023] [Accepted: 10/05/2023] [Indexed: 12/01/2023] Open
Abstract
The release of ChatGPT has initiated new thinking about AI-based chatbots and their applications and has drawn huge public attention worldwide. Over the past few months, researchers and doctors have begun considering the promise and applications of AI-related large language models in medicine. This comprehensive review provides an overview of chatbots and ChatGPT and their current role in medicine. First, the general idea of chatbots, their evolution, architecture, and medical uses are discussed. Second, ChatGPT is discussed with special emphasis on its application in medicine, its architecture and training methods, medical diagnosis and treatment, and research ethical issues, and a comparison of ChatGPT with other NLP models is presented. The article also discusses the limitations and prospects of ChatGPT. In the future, these large language models and ChatGPT will hold immense promise in healthcare. However, more research is needed in this direction.
Affiliation(s)
- Chiranjib Chakraborty
- Department of Biotechnology, School of Life Science and Biotechnology, Adamas University, Kolkata, West Bengal, India
- Soumen Pal
- School of Mechanical Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India
- Snehasish Dash
- School of Mechanical Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India
- Sang-Soo Lee
- Institute for Skeletal Aging and Orthopedic Surgery, Hallym University Chuncheon Sacred Heart Hospital, Chuncheon-si, Gangwon-do, Republic of Korea
32
Irfan B, Yaqoob A. ChatGPT's Epoch in Rheumatological Diagnostics: A Critical Assessment in the Context of Sjögren's Syndrome. Cureus 2023; 15:e47754. [PMID: 38022092 PMCID: PMC10676288 DOI: 10.7759/cureus.47754] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Accepted: 10/26/2023] [Indexed: 12/01/2023] Open
Abstract
INTRODUCTION The rise of artificial intelligence in medical practice is reshaping clinical care. Large language models (LLMs) like ChatGPT have the potential to assist in rheumatology by personalizing scientific information retrieval, particularly in the context of Sjögren's Syndrome. This study aimed to evaluate the efficacy of ChatGPT in providing insights into Sjögren's Syndrome and differentiating it from other rheumatological conditions. MATERIALS AND METHODS A database of peer-reviewed articles and clinical guidelines focused on Sjögren's Syndrome was compiled. Clinically relevant questions were presented to ChatGPT, with responses assessed for accuracy, relevance, and comprehensiveness. Techniques such as blinding, random control queries, and temporal analysis ensured unbiased evaluation. ChatGPT's responses were also assessed using the 15-item DISCERN questionnaire. RESULTS ChatGPT effectively highlighted key immunopathological and histopathological characteristics of Sjögren's Syndrome, though some crucial data and citation inconsistencies were noted. For a given clinical vignette, ChatGPT correctly identified potential etiological considerations, with Sjögren's Syndrome being prominent. DISCUSSION LLMs like ChatGPT offer rapid access to vast amounts of data, benefiting both patients and providers. While they democratize information, limitations such as potential oversimplification and reference inaccuracies were observed. The balance between LLM insights and clinical judgment, as well as continuous model refinement, is crucial. CONCLUSION LLMs like ChatGPT offer significant potential in rheumatology, providing swift and broad medical insights. However, a cautious approach is vital, ensuring rigorous training and ethical application for optimal patient care and clinical practice.
Affiliation(s)
- Bilal Irfan
- Microbiology and Immunology, University of Michigan, Ann Arbor, USA
33
Turner JH. Cancer Care by Committee to be Superseded by Personal Physician-Patient Partnership Informed by Artificial Intelligence. Cancer Biother Radiopharm 2023; 38:497-505. [PMID: 37366774 DOI: 10.1089/cbr.2023.0058] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 06/28/2023] Open
Abstract
Multidisciplinary tumor boards (MTBs) have become the reference standard of cancer management, founded upon randomized controlled trial (RCT) evidence-based guidelines. The inordinate delays inherent in awaiting formal regulatory agency approvals of novel therapeutic agents, and the rigidities and nongeneralizability of this regimented approach, often deny cancer patients timely access to effective innovative treatment. Reluctance of MTBs to accept theranostic care of patients with advanced neuroendocrine tumors (NETs) and metastatic castrate-resistant prostate cancer resulted in decades of delay in the incorporation of 177Lu-octreotate and 177Lu-prostate-specific membrane antigen (PSMA) into routine clinical oncology practice. Recent developments in immunotherapy and molecular targeted precision therapy, based on N-of-One individual multifactorial genome analyses, have greatly increased the complexity of decision-making. Burgeoning specialist workload and tight time frames now threaten to overwhelm the logistically, and emotionally, demanding MTB system. It is hypothesized that the advent of advanced artificial intelligence technology and Chatbot natural language algorithms will shift the cancer care paradigm from a MTB management model toward a personal physician-patient shared-care partnership for real-world practice of precision individualized holistic oncology.
Affiliation(s)
- J Harvey Turner
- Department of Nuclear Medicine, Fiona Stanley Fremantle Hospitals Group, The University of Western Australia, Murdoch, Australia
34
Chou YH, Lin C, Lee SH, Chang Chien YW, Cheng LC. Potential Mobile Health Applications for Improving the Mental Health of the Elderly: A Systematic Review. Clin Interv Aging 2023; 18:1523-1534. [PMID: 37727447 PMCID: PMC10506600 DOI: 10.2147/cia.s410396] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/03/2023] [Accepted: 09/05/2023] [Indexed: 09/21/2023] Open
Abstract
The rapid aging of the global population presents challenges in providing mental health care resources for older adults aged 65 and above. The COVID-19 pandemic has further exacerbated the global population's psychological distress due to social isolation and distancing. Thus, there is an urgent need to update scholarly knowledge on the effectiveness of mHealth applications to improve older people's mental health. This systematic review summarizes recent literature on chatbots aimed at enhancing mental health and well-being. Sixteen papers describing six apps or prototypes were reviewed, indicating the practicality, feasibility, and acceptance of chatbots for promoting mental health in older adults. Engaging with chatbots led to improvements in well-being and stress reduction, as well as a decrement in depressive symptoms. The mobile health applications described in these studies are categorized for reference.
Affiliation(s)
- Ya-Hsin Chou
  - Department of Psychiatry, Taoyuan Chang Gung Memorial Hospital, Taoyuan County, Taiwan
- Chemin Lin
  - College of Medicine, Chang Gung University, Taoyuan County, Taiwan
  - Department of Psychiatry, Keelung Chang Gung Memorial Hospital, Keelung City, Taiwan
  - Community Medicine Research Center, Chang Gung Memorial Hospital, Keelung, Taiwan
- Shwu-Hua Lee
  - College of Medicine, Chang Gung University, Taoyuan County, Taiwan
  - Department of Psychiatry, Linkou Chang Gung Memorial Hospital, Taoyuan County, Taiwan
- Ya-Wen Chang Chien
  - Department of Photography and Virtual Reality Design, Huafan University, New Taipei, Taiwan
- Li-Chen Cheng
  - Department of Information and Finance Management, National Taipei University of Technology, Taipei, Taiwan
35
Stanbrook MB, Weinhold M, Kelsall D. [New policy on the use of artificial intelligence tools in manuscripts submitted to CMAJ]. [Article in French]. CMAJ 2023; 195:E1168-E1169. [PMID: 37669792] [PMCID: PMC10479997] [DOI: 10.1503/cmaj.230949-f]
36
Lim ZW, Pushpanathan K, Yew SME, Lai Y, Sun CH, Lam JSH, Chen DZ, Goh JHL, Tan MCJ, Sheng B, Cheng CY, Koh VTC, Tham YC. Benchmarking large language models' performances for myopia care: a comparative analysis of ChatGPT-3.5, ChatGPT-4.0, and Google Bard. EBioMedicine 2023; 95:104770. [PMID: 37625267] [PMCID: PMC10470220] [DOI: 10.1016/j.ebiom.2023.104770]
Abstract
BACKGROUND Large language models (LLMs) are garnering wide interest due to their human-like and contextually relevant responses. However, LLMs' accuracy across specific medical domains has not yet been thoroughly evaluated. Myopia is a frequent topic on which patients and parents commonly seek information online. Our study evaluated the performance of three LLMs, namely ChatGPT-3.5, ChatGPT-4.0, and Google Bard, in delivering accurate responses to common myopia-related queries. METHODS We curated thirty-one commonly asked myopia care-related questions, which were categorised into six domains: pathogenesis, risk factors, clinical presentation, diagnosis, treatment and prevention, and prognosis. Each question was posed to the LLMs, and their responses were independently graded by three consultant-level paediatric ophthalmologists on a three-point accuracy scale (poor, borderline, good). A majority consensus approach was used to determine the final rating for each response. 'Good' rated responses were further evaluated for comprehensiveness on a five-point scale. Conversely, 'poor' rated responses were further prompted for self-correction and then re-evaluated for accuracy. FINDINGS ChatGPT-4.0 demonstrated superior accuracy, with 80.6% of responses rated as 'good', compared to 61.3% in ChatGPT-3.5 and 54.8% in Google Bard (Pearson's chi-squared test, all p ≤ 0.009). All three LLM-Chatbots showed high mean comprehensiveness scores (Google Bard: 4.35; ChatGPT-4.0: 4.23; ChatGPT-3.5: 4.11, out of a maximum score of 5). All LLM-Chatbots also demonstrated substantial self-correction capabilities: 66.7% (2 in 3) of ChatGPT-4.0's, 40% (2 in 5) of ChatGPT-3.5's, and 60% (3 in 5) of Google Bard's responses improved after self-correction. The LLM-Chatbots performed consistently across domains, except for 'treatment and prevention'. However, ChatGPT-4.0 still performed superiorly in this domain, receiving 70% 'good' ratings, compared to 40% in ChatGPT-3.5 and 45% in Google Bard (Pearson's chi-squared test, all p ≤ 0.001). INTERPRETATION Our findings underscore the potential of LLMs, particularly ChatGPT-4.0, for delivering accurate and comprehensive responses to myopia-related queries. Continuous strategies and evaluations to improve LLMs' accuracy remain crucial. FUNDING Dr Yih-Chung Tham was supported by the National Medical Research Council of Singapore (NMRC/MOH/HCSAINV21nov-0001).
Affiliation(s)
- Zhi Wei Lim
  - Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Krithi Pushpanathan
  - Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Centre for Innovation and Precision Eye Health, Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore
- Samantha Min Er Yew
  - Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Centre for Innovation and Precision Eye Health, Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore
- Yien Lai
  - Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Centre for Innovation and Precision Eye Health, Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore; Department of Ophthalmology, National University Hospital, Singapore
- Chen-Hsin Sun
  - Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Centre for Innovation and Precision Eye Health, Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore; Department of Ophthalmology, National University Hospital, Singapore
- Janice Sing Harn Lam
  - Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Centre for Innovation and Precision Eye Health, Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore; Department of Ophthalmology, National University Hospital, Singapore
- David Ziyou Chen
  - Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Centre for Innovation and Precision Eye Health, Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore; Department of Ophthalmology, National University Hospital, Singapore
- Marcus Chun Jin Tan
  - Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Centre for Innovation and Precision Eye Health, Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore; Department of Ophthalmology, National University Hospital, Singapore
- Bin Sheng
  - Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China; Department of Endocrinology and Metabolism, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai Diabetes Institute, Shanghai Clinical Center for Diabetes, Shanghai, China; MoE Key Lab of Artificial Intelligence, Artificial Intelligence Institute, Shanghai Jiao Tong University, Shanghai, China
- Ching-Yu Cheng
  - Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Centre for Innovation and Precision Eye Health, Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore; Singapore Eye Research Institute, Singapore National Eye Centre, Singapore; Eye Academic Clinical Program (Eye ACP), Duke NUS Medical School, Singapore
- Victor Teck Chang Koh
  - Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Centre for Innovation and Precision Eye Health, Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore; Department of Ophthalmology, National University Hospital, Singapore
- Yih-Chung Tham
  - Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Centre for Innovation and Precision Eye Health, Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore; Singapore Eye Research Institute, Singapore National Eye Centre, Singapore; Eye Academic Clinical Program (Eye ACP), Duke NUS Medical School, Singapore
37
Chow JCL, Wong V, Sanders L, Li K. Developing an AI-Assisted Educational Chatbot for Radiotherapy Using the IBM Watson Assistant Platform. Healthcare (Basel) 2023; 11:2417. [PMID: 37685452] [PMCID: PMC10487627] [DOI: 10.3390/healthcare11172417]
Abstract
Objectives: This study aims to make knowledge about radiotherapy in healthcare accessible to the general public by developing an AI-powered chatbot. The interactive nature of the chatbot is expected to facilitate a better understanding of radiotherapy information through communication with users. Methods: Using the IBM Watson Assistant platform on IBM Cloud, the chatbot was constructed following a pre-designed flowchart that outlines the conversation flow. This approach ensured that the chatbot was developed with a clear design and allowed for effective tracking of the conversation. The chatbot is equipped to furnish users with information and quizzes on radiotherapy to assess their understanding of the subject. Results: By adopting a question-and-answer approach, the chatbot can engage in human-like communication with users seeking information about radiotherapy. As some users may feel anxious and struggle to articulate their queries, the chatbot is designed to be user-friendly and reassuring, providing a list of questions for the user to choose from. Feedback on the chatbot's content was mostly positive, despite a few limitations. The chatbot performed well and successfully conveyed knowledge as intended. Conclusions: There is a need to enhance the chatbot's conversation approach to improve user interaction. Including translation capabilities to cater to individuals with different first languages would also be advantageous. Lastly, the newly launched ChatGPT could potentially be developed into a medical chatbot to facilitate knowledge transfer.
Affiliation(s)
- James C. L. Chow
  - Radiation Medicine Program, Princess Margaret Cancer Centre, University Health Network, Toronto, ON M5G 1X6, Canada
  - Department of Radiation Oncology, University of Toronto, Toronto, ON M5T 1P5, Canada
- Valerie Wong
  - Department of Physics, Toronto Metropolitan University, Toronto, ON M5B 2K3, Canada
- Leslie Sanders
  - Department of Humanities, York University, Toronto, ON M3J 1P3, Canada
- Kay Li
  - Department of English, University of Toronto, Toronto, ON M5R 2M8, Canada
38
Nazir T, Ahmad U, Mal M, Rehman MU, Saeed R, Kalia J. Microsoft Bing vs Google Bard in Neurology: A Comparative Study of AI-Generated Patient Education Material. Preprint. [DOI: 10.1101/2023.08.25.23294641]
Abstract
BACKGROUND Patient education is an essential component of healthcare, and artificial intelligence (AI) language models such as Google Bard and Microsoft Bing have the potential to improve information transmission and enhance patient care. However, it is crucial to evaluate the quality, accuracy, and understandability of the materials generated by these models before applying them in medical practice. This study aimed to assess and compare the quality of patient education materials produced by Google Bard and Microsoft Bing in response to questions related to neurological conditions. METHODS A cross-sectional study design was used to evaluate and compare the ability of Google Bard and Microsoft Bing to generate patient education materials. The study included the top ten prevalent neurological diseases based on WHO prevalence data. Ten board-certified neurologists and four neurology residents evaluated the responses generated by the models on six quality metrics. The scores for each model were compiled and averaged across all measures, and the significance of any observed variations was assessed using an independent t-test. RESULTS Google Bard performed better than Microsoft Bing on all six quality metrics, with overall mean scores of 79% and 69%, respectively. Google Bard outperformed Microsoft Bing in all measures for eight questions, while Microsoft Bing performed marginally better in terms of objectivity and clarity for the epilepsy query. CONCLUSION This study showed that Google Bard performs better than Microsoft Bing in generating patient education materials for neurological diseases. However, healthcare professionals should take into account both AI models' advantages and disadvantages when providing support for health information requirements. Future studies can help determine the underlying causes of these variations and guide cooperative initiatives to create more user-focused AI-generated patient education materials. Finally, researchers should consider patients' perceptions of AI-generated patient education material and its impact on implementing these solutions in healthcare settings.
39
Watters C, Lemanski MK. Universal skepticism of ChatGPT: a review of early literature on chat generative pre-trained transformer. Front Big Data 2023; 6:1224976. [PMID: 37680954] [PMCID: PMC10482048] [DOI: 10.3389/fdata.2023.1224976]
Abstract
ChatGPT, a new language model developed by OpenAI, has garnered significant attention in various fields since its release. This literature review provides an overview of early ChatGPT literature across multiple disciplines, exploring its applications, limitations, and ethical considerations. The review encompasses Scopus-indexed publications from November 2022 to April 2023 and includes 156 articles related to ChatGPT. The findings reveal a predominance of negative sentiment across disciplines, though subject-specific attitudes must be considered. The review highlights the implications of ChatGPT in many fields including healthcare, raising concerns about employment opportunities and ethical considerations. While ChatGPT holds promise for improved communication, further research is needed to address its capabilities and limitations. This literature review provides insights into early research on ChatGPT, informing future investigations and practical applications of chatbot technology, as well as development and usage of generative AI.
Affiliation(s)
- Casey Watters
  - Faculty of Law, Bond University, Gold Coast, QLD, Australia
40
Erren TC, Lewis P, Shaw DM. Brave (in a) new world: an ethical perspective on chatbots for medical advice. Front Public Health 2023; 11:1254334. [PMID: 37663854] [PMCID: PMC10470018] [DOI: 10.3389/fpubh.2023.1254334]
Affiliation(s)
- Thomas C. Erren
  - University of Cologne, University Hospital of Cologne, Cologne, North Rhine-Westphalia, Germany
- Philip Lewis
  - University of Cologne, University Hospital of Cologne, Cologne, North Rhine-Westphalia, Germany
- David M. Shaw
  - Care and Public Health Research Institute, Maastricht University, Maastricht, Netherlands
  - Institute for Biomedical Ethics, University of Basel, Basel, Switzerland
41
Rawashdeh B, Kim J, AlRyalat SA, Prasad R, Cooper M. ChatGPT and Artificial Intelligence in Transplantation Research: Is It Always Correct? Cureus 2023; 15:e42150. [PMID: 37602076] [PMCID: PMC10438857] [DOI: 10.7759/cureus.42150]
Abstract
INTRODUCTION ChatGPT (OpenAI, San Francisco, California, United States) is a chatbot powered by language-based artificial intelligence (AI). It generates text based on the information provided by users. It is currently being evaluated in medical research, publishing, and healthcare. However, there has been no prior study evaluating its ability to help in kidney transplant research. This feasibility study aimed to evaluate the application and accuracy of ChatGPT in the field of kidney transplantation. METHODS On two separate dates, February 21 and March 2, 2023, ChatGPT 3.5 was questioned regarding the medical treatment of kidney transplants and related scientific facts. The responses provided by the chatbot were compiled, and a panel of two specialists reviewed the correctness of each answer. RESULTS We demonstrated that ChatGPT possessed substantial general knowledge of kidney transplantation; however, it lacked sufficient information and produced inaccurate statements on topics that require a deeper understanding. Moreover, ChatGPT failed to provide references for any of the scientific data it provided regarding kidney transplants, and when asked for references, it supplied inaccurate ones. CONCLUSION The results of this short feasibility study indicate that ChatGPT may have the ability to assist in data collection when a particular query is posed. However, caution should be exercised, and it should not be used in isolation to support research or healthcare decisions, because there are still challenges with data accuracy and missing information.
Affiliation(s)
- Badi Rawashdeh
  - Transplant Surgery, Medical College of Wisconsin, Milwaukee, USA
- Joohyun Kim
  - Transplant Surgery, Medical College of Wisconsin, Milwaukee, USA
- Raj Prasad
  - Transplant Surgery, Medical College of Wisconsin, Milwaukee, USA
- Matthew Cooper
  - Transplant Surgery, Medical College of Wisconsin, Milwaukee, USA
42
Martínez-Ezquerro JD. Response to: Impact of ChatGPT and Artificial Intelligence in the Contemporary Medical Landscape. Arch Med Res 2023; 54:102838. [PMID: 37364482] [DOI: 10.1016/j.arcmed.2023.06.003]
Affiliation(s)
- José Darío Martínez-Ezquerro
  - Epidemiological Research and Health Services Unit, Aging Area, 21st Century National Medical Center, Instituto Mexicano del Seguro Social, Mexico City, Mexico
43
Abstract
The OpenAI chatbot ChatGPT is an artificial intelligence (AI) application that uses state-of-the-art language processing AI. It can perform a vast number of tasks, from writing poetry and explaining complex quantum mechanics, to translating language and writing research articles with a human-like understanding and legitimacy. Since its initial release to the public in November 2022, ChatGPT has garnered considerable attention due to its ability to mimic the patterns of human language, and it has attracted billion-dollar investments from Microsoft and PricewaterhouseCoopers. The scope of ChatGPT and other large language models appears infinite, but there are several important limitations. This editorial provides an introduction to the basic functionality of ChatGPT and other large language models, their current applications and limitations, and the associated implications for clinical practice and research.
Affiliation(s)
- Kyle N Kunze
  - Department of Orthopaedic Surgery, Hospital for Special Surgery, New York, New York, USA
- Seong J Jang
  - Weill Cornell Medical College, New York, New York, USA
- Jonathan M Vigdorchik
  - Department of Orthopaedic Surgery, Hospital for Special Surgery, New York, New York, USA
  - Adult Reconstruction and Joint Replacement Service, Hospital for Special Surgery, New York, New York, USA
- Fares S Haddad
  - The Bone & Joint Journal, London, UK
  - University College London Hospitals, and The NIHR Biomedical Research Centre at UCLH, London, UK
  - Princess Grace Hospital, London, UK