1
Hassona Y, Alqaisi D, Al-Haddad A, Georgakopoulou EA, Malamos D, Alrashdan MS, Sawair F. How good is ChatGPT at answering patients' questions related to early detection of oral (mouth) cancer? Oral Surg Oral Med Oral Pathol Oral Radiol 2024; 138:269-278. PMID: 38714483. DOI: 10.1016/j.oooo.2024.04.010.
Abstract
OBJECTIVES To examine the quality, reliability, readability, and usefulness of ChatGPT in promoting early detection of oral cancer. STUDY DESIGN A total of 108 patient-oriented questions about early detection of oral cancer were compiled from expert panels, professional societies, and web-based tools. Questions were categorized into 4 topic domains, and ChatGPT 3.5 was asked each question independently. ChatGPT answers were evaluated for quality, readability, actionability, and usefulness; two experienced reviewers independently assessed each response. RESULTS Questions related to clinical appearance constituted 36.1% (n = 39) of the total questions. ChatGPT provided "very useful" responses to the majority of questions (75%; n = 81). The mean Global Quality Score was 4.24 ± 1.3 of 5. The mean reliability score was 23.17 ± 9.87 of 25. The mean understandability score was 76.6% ± 25.9% of 100, while the mean actionability score was 47.3% ± 18.9% of 100. The mean FKS reading ease score was 38.4% ± 29.9%, while the mean SMOG index readability score was 11.65 ± 8.4. No misleading information was identified among ChatGPT responses. CONCLUSION ChatGPT is an attractive and potentially useful resource for informing patients about early detection of oral cancer. Nevertheless, concerns remain about the readability and actionability of the information it offers.
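The readability indices reported in this abstract follow standard published formulas. As an illustrative sketch (not the study's own code), the Flesch reading ease score and the SMOG index can be computed from simple text counts; the example counts below are hypothetical:

```python
import math

def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    # Standard Flesch formula: higher scores mean easier text; scores below
    # ~60 indicate fairly difficult prose (the study's mean of 38.4 falls in
    # the "difficult" band).
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def smog_index(polysyllables: int, sentences: int) -> float:
    # SMOG grade: estimates the US school grade needed to understand a text
    # from its count of 3+-syllable words, normalized to a 30-sentence sample.
    return 1.0430 * math.sqrt(polysyllables * 30 / sentences) + 3.1291

# A hypothetical 100-word, 5-sentence passage with 150 syllables and
# 15 polysyllabic words across 30 sentences:
print(flesch_reading_ease(100, 5, 150))  # ~59.6, "fairly difficult" range
print(smog_index(15, 30))                # ~7.2, roughly 7th-grade level
```

In practice the word, sentence, and syllable counts would come from a text-analysis tool rather than being supplied by hand.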
Affiliation(s)
- Yazan Hassona
- Faculty of Dentistry, Centre for Oral Diseases Studies (CODS), Al-Ahliyya Amman University, Jordan; School of Dentistry, The University of Jordan, Jordan.
- Dua'a Alqaisi
- School of Dentistry, The University of Jordan, Jordan
- Eleni A Georgakopoulou
- Molecular Carcinogenesis Group, Department of Histology and Embryology, Medical School, National and Kapodistrian University of Athens, Greece
- Dimitris Malamos
- Oral Medicine Clinic of the National Organization for the Provision of Health, Athens, Greece
- Mohammad S Alrashdan
- Department of Oral and Craniofacial Health Sciences, College of Dental Medicine, University of Sharjah, Sharjah, United Arab Emirates
- Faleh Sawair
- School of Dentistry, The University of Jordan, Jordan
2
Ihara K, Dumkrieger G, Zhang P, Takizawa T, Schwedt TJ, Chiang CC. Application of Artificial Intelligence in the Headache Field. Curr Pain Headache Rep 2024. PMID: 38976174. DOI: 10.1007/s11916-024-01297-5.
Abstract
PURPOSE OF REVIEW Headache disorders are highly prevalent worldwide. Rapidly advancing capabilities in artificial intelligence (AI) have expanded headache-related research, with the potential to address unmet needs in the headache field. We provide an overview of AI in headache research in this article. RECENT FINDINGS We briefly introduce machine learning models and commonly used evaluation metrics. We then review studies that have utilized AI in the field to advance diagnostic accuracy and classification, predict treatment responses, gather insights from various data sources, and forecast migraine attacks. Furthermore, given the emergence and popularity of ChatGPT, a type of large language model (LLM), we also discuss how LLMs could be used to advance the field. Finally, we discuss the potential pitfalls, bias, and future directions of employing AI in headache medicine. Many recent studies in headache medicine have incorporated machine learning, generative AI, and LLMs. A comprehensive understanding of potential pitfalls and biases is crucial to using these novel techniques with minimal harm. When used appropriately, AI has the potential to revolutionize headache medicine.
Affiliation(s)
- Keiko Ihara
- Department of Neurology, Keio University School of Medicine, Shinjuku, Tokyo, Japan
- Japanese Red Cross Ashikaga Hospital, Ashikaga, Tochigi, Japan
- Pengfei Zhang
- Department of Neurology, Rutgers University, New Brunswick, NJ, USA
- Tsubasa Takizawa
- Department of Neurology, Keio University School of Medicine, Shinjuku, Tokyo, Japan
- Todd J Schwedt
- Department of Neurology, Mayo Clinic, Scottsdale, AZ, USA
3
Haltaufderheide J, Ranisch R. The ethics of ChatGPT in medicine and healthcare: a systematic review on Large Language Models (LLMs). NPJ Digit Med 2024; 7:183. PMID: 38977771. PMCID: PMC11231310. DOI: 10.1038/s41746-024-01157-x.
Abstract
With the introduction of ChatGPT, Large Language Models (LLMs) have received enormous attention in healthcare. Despite potential benefits, researchers have underscored various ethical implications. While individual instances have garnered attention, a systematic and comprehensive overview of practical applications currently researched, and of the ethical issues connected to them, is lacking. Against this background, this work maps the ethical landscape surrounding the current deployment of LLMs in medicine and healthcare through a systematic review. Electronic databases and preprint servers were queried using a comprehensive search strategy, which generated 796 records. Studies were screened and extracted following a modified rapid review approach. Methodological quality was assessed using a hybrid approach. For 53 records, a meta-aggregative synthesis was performed. Four general fields of application emerged, showcasing a dynamic exploration phase. Advantages of using LLMs are attributed to their capacity for data analysis, information provisioning, support in decision-making, mitigating information loss, and enhancing information accessibility. However, our study also identifies recurrent ethical concerns connected to fairness, bias, non-maleficence, transparency, and privacy. A distinctive concern is the tendency to produce harmful or convincing but inaccurate content. Calls for ethical guidance and human oversight are recurrent. We suggest that the ethical guidance debate should be reframed to focus on defining what constitutes acceptable human oversight across the spectrum of applications. This involves considering the diversity of settings, varying potentials for harm, and different acceptable thresholds for performance and certainty in healthcare. Additionally, critical inquiry is needed to evaluate the necessity and justification of LLMs' current experimental use.
Affiliation(s)
- Joschka Haltaufderheide
- Faculty of Health Sciences Brandenburg, University of Potsdam, Am Mühlenberg 9, Potsdam, 14476, Germany
- Robert Ranisch
- Faculty of Health Sciences Brandenburg, University of Potsdam, Am Mühlenberg 9, Potsdam, 14476, Germany
4
Hassona Y, Alqaisi DA. "My kid has autism": An interesting conversation with ChatGPT. Spec Care Dentist 2024; 44:1296-1299. PMID: 38415857. DOI: 10.1111/scd.12983.
Affiliation(s)
- Yazan Hassona
- Faculty of Dentistry, Centre for Oral Diseases Studies, Al-Ahliyya Amman University, Amman, Jordan
- School of Dentistry, The University of Jordan, Amman, Jordan
- Dua A Alqaisi
- School of Dentistry, The University of Jordan, Amman, Jordan
5
Affiliation(s)
- Gary H Lyman
- Editor-in-Chief, Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, USA
6
Checcucci E, Rodler S, Piazza P, Porpiglia F, Cacciamani GE. Transitioning from "Dr. Google" to "Dr. ChatGPT": the advent of artificial intelligence chatbots. Transl Androl Urol 2024; 13:1067-1070. PMID: 38983463. PMCID: PMC11228672. DOI: 10.21037/tau-23-629.
Affiliation(s)
- Enrico Checcucci
- Department of Surgery, Candiolo Cancer Institute (FPO-IRCCS), Candiolo, Italy
- Severin Rodler
- Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, USC Institute of Urology, Los Angeles, CA, USA
- Artificial Intelligence Center at USC Institute of Urology, Los Angeles, CA, USA
- Department of Urology, University Hospital of Munich (LMU), Munich, Germany
- Pietro Piazza
- Division of Urology, IRCCS Azienda Ospedaliero- Universitaria di Bologna, Bologna, Italy
- Francesco Porpiglia
- Department of Oncology, Division of Urology, University of Turin, San Luigi Gonzaga Hospital, Orbassano, TO, Italy
- Giovanni Enrico Cacciamani
- Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, USC Institute of Urology, Los Angeles, CA, USA
- Artificial Intelligence Center at USC Institute of Urology, Los Angeles, CA, USA
7
Lee JW, Yoo IS, Kim JH, Kim WT, Jeon HJ, Yoo HS, Shin JG, Kim GH, Hwang S, Park S, Kim YJ. Development of AI-generated medical responses using the ChatGPT for cancer patients. Comput Methods Programs Biomed 2024; 254:108302. PMID: 38996805. DOI: 10.1016/j.cmpb.2024.108302.
Abstract
BACKGROUND AND OBJECTIVE To develop a healthcare chatbot service (AI-guide bot) that conducts real-time conversations using large language models to provide accurate health information to patients. METHODS To provide accurate and specialized medical responses, we integrated several cancer practice guidelines. The size of the integrated meta-dataset was 1.17 million tokens. The integrated and classified metadata were extracted, transformed into text, segmented to specific character lengths, and vectorized using the embedding model. The AI-guide bot was implemented using Python 3.9. To enhance scalability and incorporate the integrated dataset, we combined the AI-guide bot with OpenAI and the LangChain framework. To generate user-friendly conversations, a language model was developed based on Chat-Generative Pretrained Transformer (ChatGPT), an interactive conversational chatbot powered by GPT-3.5. The AI-guide bot was implemented on ChatGPT 3.5 from Sep. 2023 to Jan. 2024. RESULTS The AI-guide bot allowed users to select their desired cancer type and language for conversational interactions, and was designed to expand its capabilities to encompass multiple major cancer types. The performance score of the AI-guide bot's responses was 90.98 ± 4.02 (obtained by summing the Likert scores). CONCLUSIONS The AI-guide bot can provide medical information quickly and accurately to patients with cancer who are concerned about their health.
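The pipeline this abstract describes (segment guideline text into fixed-length chunks, vectorize, retrieve the most relevant chunk for a question, hand it to the LLM) can be sketched minimally. The bag-of-words "embedding" below is a stand-in for the OpenAI embedding model the authors used, and all data in the example is hypothetical:

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Stand-in for a learned embedding model: a bag-of-words count vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def chunk(text: str, size: int = 40) -> list:
    # Segment source text to a fixed character length, as in the abstract.
    return [text[i:i + size] for i in range(0, len(text), size)]

def retrieve(query: str, chunks: list) -> str:
    # Return the guideline chunk most similar to the patient's question;
    # a RAG chatbot would pass this chunk to the LLM as grounding context.
    vectors = [embed(c) for c in chunks]
    q = embed(query)
    best = max(range(len(chunks)), key=lambda i: cosine(q, vectors[i]))
    return chunks[best]
```

A production pipeline would swap `embed` for a learned embedding model and store the vectors in a vector database (as LangChain does), but the retrieval logic is otherwise the same.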
Affiliation(s)
- Jae-Woo Lee
- Department of Family Medicine, Chungbuk National University Hospital, Cheongju, Republic of Korea; Department of Family Medicine, Chungbuk National University College of Medicine, Cheongju, Republic of Korea
- In-Sang Yoo
- Department of Biomedical Engineering, Chungbuk National University Hospital, Cheongju, Republic of Korea; Department of Medicine, Chungbuk National University College of Medicine, Cheongju, Republic of Korea
- Ji-Hye Kim
- Department of Biomedical Engineering, Chungbuk National University Hospital, Cheongju, Republic of Korea
- Won Tae Kim
- Department of Urology, Chungbuk National University Hospital, Cheongju, Republic of Korea; Department of Urology, Chungbuk National University College of Medicine, 1 Chungdae-ro, Seowon-gu, Cheongju, Chungcheongbuk-do 28644, Republic of Korea
- Hyun Jeong Jeon
- Department of Internal Medicine, Chungbuk National University Hospital, Cheongju, Republic of Korea; Department of Internal Medicine, College of Medicine, Chungbuk National University, Cheongju, Republic of Korea
- Hyo-Sun Yoo
- Department of Family Medicine, Chungbuk National University Hospital, Cheongju, Republic of Korea
- Jae Gwang Shin
- Department of Biomedical Engineering, Chungbuk National University Hospital, Cheongju, Republic of Korea
- Geun-Hyeong Kim
- Department of Biomedical Engineering, Chungbuk National University Hospital, Cheongju, Republic of Korea
- ShinJi Hwang
- Department of Biomedical Engineering, Chungbuk National University Hospital, Cheongju, Republic of Korea
- Seung Park
- Department of Biomedical Engineering, Chungbuk National University Hospital, Cheongju, Republic of Korea; Department of Medicine, Chungbuk National University College of Medicine, Cheongju, Republic of Korea
- Yong-June Kim
- Department of Urology, Chungbuk National University Hospital, Cheongju, Republic of Korea; Department of Urology, Chungbuk National University College of Medicine, 1 Chungdae-ro, Seowon-gu, Cheongju, Chungcheongbuk-do 28644, Republic of Korea
8
McGrath SP, Kozel BA, Gracefo S, Sutherland N, Danford CJ, Walton N. A comparative evaluation of ChatGPT 3.5 and ChatGPT 4 in responses to selected genetics questions. J Am Med Inform Assoc 2024. PMID: 38872284. DOI: 10.1093/jamia/ocae128.
Abstract
OBJECTIVES To evaluate the efficacy of ChatGPT 4 (GPT-4) in delivering genetic information about BRCA1, HFE, and MLH1, building on previous findings with ChatGPT 3.5 (GPT-3.5), and to assess the utility, limitations, and ethical implications of using ChatGPT in medical settings. MATERIALS AND METHODS A structured survey was developed to assess GPT-4's clinical value. An expert panel of genetic counselors and clinical geneticists evaluated GPT-4's responses to these questions. We also performed a comparative analysis with GPT-3.5, using descriptive statistics and Prism 9 for data analysis. RESULTS The findings indicate improved accuracy in GPT-4 over GPT-3.5 (P < .0001). However, notable errors in accuracy remained. The relevance of responses varied in GPT-4 but was generally favorable, with a mean in the "somewhat agree" range. There was no difference in performance by disease category. The 7-question subset of the Bot Usability Scale (BUS-15) showed no statistically significant difference between the groups but trended lower in the GPT-4 version. DISCUSSION AND CONCLUSION The study underscores GPT-4's potential role in genetic education, showing notable progress yet facing challenges such as outdated information and the necessity of ongoing refinement. Our results, while promising, emphasize the importance of balancing technological innovation with ethical responsibility in healthcare information delivery.
Affiliation(s)
- Scott P McGrath
- CITRIS Health, University of California Berkeley, Berkeley, CA 94720-1764, United States
- Beth A Kozel
- Laboratory of Vascular and Matrix Genetics, National Heart, Lung, and Blood Institute (NHLBI), Bethesda, MD 20892, United States
- Sara Gracefo
- Intermountain Precision Genomics, Intermountain Healthcare, St George, UT 84790-8723, United States
- Nykole Sutherland
- Intermountain Precision Genomics, Intermountain Healthcare, St George, UT 84790-8723, United States
- Nephi Walton
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892-2152, United States
9
Maggio MG, Tartarisco G, Cardile D, Bonanno M, Bruschetta R, Pignolo L, Pioggia G, Calabrò RS, Cerasa A. Exploring ChatGPT's potential in the clinical stream of neurorehabilitation. Front Artif Intell 2024; 7:1407905. PMID: 38903157. PMCID: PMC11187276. DOI: 10.3389/frai.2024.1407905.
Abstract
In several medical fields, generative AI tools such as ChatGPT have achieved optimal performance in identifying correct diagnoses solely by evaluating narrative clinical descriptions of cases. The most active fields of application include oncology and COVID-19-related symptoms, with preliminary relevant results also in psychiatric and neurological domains. This scoping review aims to introduce the arrival of ChatGPT applications in neurorehabilitation practice, where such AI-driven solutions have the potential to revolutionize patient care and assistance. First, a comprehensive overview of ChatGPT, including its design and potential applications in medicine, is provided. Second, the remarkable natural language processing skills and limitations of these models are examined, with a focus on their use in neurorehabilitation. In this context, we present two case scenarios to evaluate ChatGPT's ability to resolve higher-order clinical reasoning. Overall, we provide early evidence that generative AI can be meaningfully integrated into neurorehabilitation practice as a facilitator, aiding physicians in defining increasingly efficacious diagnostic and personalized prognostic plans.
Affiliation(s)
- Gennaro Tartarisco
- Institute for Biomedical Research and Innovation (IRIB), National Research Council of Italy (CNR), Messina, Italy
- Roberta Bruschetta
- Institute for Biomedical Research and Innovation (IRIB), National Research Council of Italy (CNR), Messina, Italy
- Giovanni Pioggia
- Institute for Biomedical Research and Innovation (IRIB), National Research Council of Italy (CNR), Messina, Italy
- Antonio Cerasa
- Institute for Biomedical Research and Innovation (IRIB), National Research Council of Italy (CNR), Messina, Italy
- S'Anna Institute, Crotone, Italy
- Pharmacotechnology Documentation and Transfer Unit, Preclinical and Translational Pharmacology, Department of Pharmacy, Health and Nutritional Sciences, University of Calabria, Rende, Italy
10
Loughran E, Kane M, Wyatt TH, Kerley A, Lowe S, Li X. Using Large Language Models to Address Health Literacy in mHealth: Case Report. Comput Inform Nurs 2024. PMID: 38832874. DOI: 10.1097/cin.0000000000001152.
Abstract
The innate complexity of medical topics often makes it challenging to produce educational content for the public. Although there are resources available to help authors appraise the complexity of their content, there are woefully few resources available to help authors reduce that complexity after it occurs. In this case study, we evaluate using ChatGPT to reduce the complex language used in health-related educational materials. ChatGPT adapted content from the SmartSHOTS mobile application, which is geared toward caregivers of children aged 0 to 24 months. SmartSHOTS helps reduce barriers and improve adherence to vaccination schedules. ChatGPT reduced complex sentence structure and rewrote content to align with a third-grade reading level. Furthermore, using ChatGPT to edit content already written removes the potential for unnoticed, artificial intelligence-produced inaccuracies. As an editorial tool, ChatGPT was effective, efficient, and free to use. This article discusses the potential of ChatGPT as an effective, time-efficient, and open-source method for editing health-related educational materials to reflect a comprehensible reading level.
11
Kamyabi A, Iyamu I, Saini M, May C, McKee G, Choi A. Advocating for population health: The role of public health practitioners in the age of artificial intelligence. Can J Public Health 2024; 115:473-476. PMID: 38625496. PMCID: PMC11151885. DOI: 10.17269/s41997-024-00881-x.
Abstract
Over the past decade, artificial intelligence (AI) has begun to transform Canadian organizations, driven by the promise of improved efficiency, better decision-making, and enhanced client experience. While AI holds great opportunities, there are also near-term impacts on the determinants of health and population health equity that are already emerging. If adoption is unregulated, there is a substantial risk that health inequities could be exacerbated through intended or unintended biases embedded in AI systems. New economic opportunities could be disproportionately leveraged by already privileged workers and owners of AI systems, reinforcing prevailing power dynamics. AI could also detrimentally affect population well-being by replacing human interactions rather than fostering social connectedness. Furthermore, AI-powered health misinformation could undermine effective public health communication. To respond to these challenges, public health must assess and report on the health equity impacts of AI, inform implementation to reduce health inequities, and facilitate intersectoral partnerships to foster development of policies and regulatory frameworks to mitigate risks. This commentary highlights AI's near-term risks for population health to inform a public health response.
Affiliation(s)
- Ihoghosa Iyamu
- British Columbia Centre for Disease Control, Vancouver, BC, Canada
- School of Population and Public Health, University of British Columbia, Vancouver, BC, Canada
- Manik Saini
- Vancouver Coastal Health, Vancouver, BC, Canada
- Curtis May
- School of Population and Public Health, University of British Columbia, Vancouver, BC, Canada
- Geoffrey McKee
- British Columbia Centre for Disease Control, Vancouver, BC, Canada
- School of Population and Public Health, University of British Columbia, Vancouver, BC, Canada
- Alex Choi
- Vancouver Coastal Health, Vancouver, BC, Canada
12
Riestra-Ayora J, Vaduva C, Esteban-Sánchez J, Garrote-Garrote M, Fernández-Navarro C, Sánchez-Rodríguez C, Martin-Sanz E. ChatGPT as an information tool in rhinology. Can we trust each other today? Eur Arch Otorhinolaryngol 2024; 281:3253-3259. PMID: 38436756. DOI: 10.1007/s00405-024-08581-5.
Abstract
PURPOSE ChatGPT (Chat-Generative Pre-trained Transformer) has proven to be a powerful information tool on various topics, including healthcare. The system draws on information obtained from the Internet, but this information is not always reliable. Few studies have analyzed the validity of its responses in rhinology. Our work aims to assess the quality and reliability of the information provided by AI regarding the main rhinological pathologies. METHODS We asked the default ChatGPT version (GPT-3.5) 65 questions about the most prevalent pathologies in rhinology, focusing on causes, risk factors, treatments, prognosis, and outcomes. We used the DISCERN questionnaire and a hexagonal radar schema to evaluate the quality of the information, and Fleiss's kappa to determine the consistency of agreement between different observers. RESULTS The overall evaluation of the DISCERN questionnaire resulted in a score of 4.05 (± 0.6). The results in the Reliability section were worse, with an average score of 3.18 (± 1.77); this score was affected by the responses to questions about the source of the information provided. The average score for the Quality section was 3.59 (± 1.18). Fleiss's kappa shows substantial agreement, with κ = 0.69 (p < 0.001). CONCLUSION The ChatGPT answers are accurate and reliable. It generates a simple and understandable description of the pathology for the patient's benefit. Our team considers that ChatGPT could be a useful tool for providing information under the prior supervision of a health professional.
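Fleiss's kappa, used here for inter-rater agreement, compares observed rater-pair agreement against the agreement expected by chance from the marginal category proportions. A minimal sketch (illustrative, not the study's analysis code):

```python
def fleiss_kappa(ratings: list) -> float:
    """ratings[i][j] = number of raters assigning item i to category j.
    Every item must be rated by the same number of raters n."""
    N = len(ratings)
    n = sum(ratings[0])  # raters per item
    # Observed agreement: average over items of the fraction of agreeing
    # rater pairs.
    P_bar = sum(
        (sum(c * c for c in row) - n) / (n * (n - 1)) for row in ratings
    ) / N
    # Chance agreement from marginal category proportions.
    totals = [sum(row[j] for row in ratings) for j in range(len(ratings[0]))]
    P_e = sum((t / (N * n)) ** 2 for t in totals)
    return (P_bar - P_e) / (1 - P_e)

# Three raters, two categories, perfect agreement on both items:
print(fleiss_kappa([[3, 0], [0, 3]]))  # 1.0
```

Values above roughly 0.6, such as the κ = 0.69 reported here, are conventionally interpreted as substantial agreement.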
Affiliation(s)
- Juan Riestra-Ayora
- Department of Medicine, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Villaviciosa de Odón, 28670, Madrid, Spain
- Department of Otolaryngology-Head and Neck Surgery, Hospital Universitario de Getafe, Carretera de Toledo, Km 12.500, Getafe, 28905, Madrid, Spain
- Cristina Vaduva
- Department of Medicine, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Villaviciosa de Odón, 28670, Madrid, Spain
- Department of Otolaryngology-Head and Neck Surgery, Hospital Universitario de Getafe, Carretera de Toledo, Km 12.500, Getafe, 28905, Madrid, Spain
- Jonathan Esteban-Sánchez
- Department of Medicine, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Villaviciosa de Odón, 28670, Madrid, Spain
- Department of Otolaryngology-Head and Neck Surgery, Hospital Universitario de Getafe, Carretera de Toledo, Km 12.500, Getafe, 28905, Madrid, Spain
- María Garrote-Garrote
- Department of Otolaryngology-Head and Neck Surgery, Hospital Universitario de Getafe, Carretera de Toledo, Km 12.500, Getafe, 28905, Madrid, Spain
- Carlos Fernández-Navarro
- Department of Otolaryngology-Head and Neck Surgery, Hospital Universitario de Getafe, Carretera de Toledo, Km 12.500, Getafe, 28905, Madrid, Spain
- Carolina Sánchez-Rodríguez
- Department of Medicine, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Villaviciosa de Odón, 28670, Madrid, Spain
- Eduardo Martin-Sanz
- Department of Medicine, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Villaviciosa de Odón, 28670, Madrid, Spain
- Department of Otolaryngology-Head and Neck Surgery, Hospital Universitario de Getafe, Carretera de Toledo, Km 12.500, Getafe, 28905, Madrid, Spain
13
Hussain T, Wang D, Li B. The influence of the COVID-19 pandemic on the adoption and impact of AI ChatGPT: Challenges, applications, and ethical considerations. Acta Psychol (Amst) 2024; 246:104264. PMID: 38626597. DOI: 10.1016/j.actpsy.2024.104264.
Abstract
DESIGN/METHODOLOGY/APPROACH This article employs qualitative thematic modeling to gather insights from 30 informants. The study explores various aspects of the COVID-19 pandemic's impact on AI ChatGPT technologies. PURPOSE The purpose of this research is to examine how the COVID-19 pandemic has influenced the increased usage and adoption of AI ChatGPT. It aims to explore the pandemic's impact on AI ChatGPT and its applications in specific domains, as well as the challenges and opportunities it presents. FINDINGS The findings highlight that the pandemic has led to a surge in online activities, resulting in a heightened demand for AI ChatGPT. It has been widely used in areas such as healthcare, mental health support, remote collaboration, and personalized customer experiences. The article showcases examples of AI ChatGPT's application during the pandemic. STRENGTH OF STUDY The qualitative framework enables the study to delve deeply into the multifaceted dimensions of AI ChatGPT's role during the pandemic, capturing the diverse experiences and insights of users, practitioners, and experts. By embracing the qualitative nature of inquiry, this research offers a comprehensive understanding of the challenges, opportunities, and ethical considerations associated with the adoption and utilization of AI ChatGPT in crisis contexts. PRACTICAL IMPLICATIONS The insights from this research have practical implications for policymakers, developers, and researchers; they emphasize the need for responsible and ethical implementation of AI ChatGPT to fully harness its potential in addressing societal needs during and beyond the pandemic. SOCIAL IMPLICATIONS The increased reliance on AI ChatGPT during the pandemic has led to changes in user behavior, expectations, and interactions, but it has also unveiled ethical considerations and potential risks. Addressing societal and ethical concerns, such as user impact and autonomy, privacy and security, bias and fairness, and transparency and accountability, is crucial for the responsible deployment of AI ChatGPT. ORIGINALITY/VALUE This research contributes to the understanding of the novel role of AI ChatGPT in times of crisis, particularly the COVID-19 pandemic. It highlights the necessity of responsible and ethical implementation of AI ChatGPT and provides valuable insights for the development and application of AI technology in the future.
Affiliation(s)
- Talib Hussain
- School of Media and Communication, Shanghai Jiao Tong University, 800 Dongchuan Road, 2002240 Shanghai, China; Department of Media Management, University of Religions and Denominations, Qom 37491-13357, Iran.
- Dake Wang
- School of Media and Communication, Shanghai Jiao Tong University, 800 Dongchuan Road, 2002240 Shanghai, China
- Benqian Li
- School of Media and Communication, Shanghai Jiao Tong University, 800 Dongchuan Road, 2002240 Shanghai, China
14
Ying H, Zhao Z, Zhao Y, Zeng S, Yu S. CoRTEx: contrastive learning for representing terms via explanations with applications on constructing biomedical knowledge graphs. J Am Med Inform Assoc 2024. PMID: 38777805. DOI: 10.1093/jamia/ocae115.
Abstract
OBJECTIVES Biomedical knowledge graphs play a pivotal role in various biomedical research domains, and term clustering is a crucial step in constructing them, aiming to identify synonymous terms. Lacking broader knowledge, previous contrastive learning models trained with Unified Medical Language System (UMLS) synonyms struggle to cluster difficult terms and do not generalize well beyond UMLS terms. In this work, we leverage the world knowledge of large language models (LLMs) and propose Contrastive Learning for Representing Terms via Explanations (CoRTEx) to enhance term representation and significantly improve term clustering. MATERIALS AND METHODS Model training involves generating explanations for a cleaned subset of UMLS terms using ChatGPT. We employ contrastive learning, considering term and explanation embeddings simultaneously, and progressively introduce hard negative samples. Additionally, a ChatGPT-assisted BIRCH algorithm is designed for efficient clustering of a new ontology. RESULTS We established a clustering test set and a hard-negative test set, on which our model consistently achieves the highest F1 score. With CoRTEx embeddings and the modified BIRCH algorithm, we grouped 35,580,932 terms from the Biomedical Informatics Ontology System (BIOS) into 22,104,559 clusters with O(N) queries to ChatGPT. Case studies highlight the model's efficacy in handling challenging samples, aided by information from explanations. CONCLUSION By aligning terms with their explanations, CoRTEx demonstrates superior accuracy over benchmark models and robustness beyond its training set, and it is suitable for clustering terms for large-scale biomedical ontologies.
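The contrastive objective in setups like this pairs each term embedding with its explanation embedding and treats the other batch members as in-batch negatives. A minimal InfoNCE-style sketch in numpy, illustrative only and not the CoRTEx implementation:

```python
import numpy as np

def contrastive_loss(terms: np.ndarray, explanations: np.ndarray,
                     tau: float = 0.1) -> float:
    # terms[i] and explanations[i] are embeddings of a matched pair; each
    # term should score its own explanation above all others in the batch.
    t = terms / np.linalg.norm(terms, axis=1, keepdims=True)
    e = explanations / np.linalg.norm(explanations, axis=1, keepdims=True)
    logits = t @ e.T / tau                       # cosine similarity / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy with the diagonal (the matched pair) as the target class.
    return float(-log_softmax.diagonal().mean())
```

Hard negatives (near-synonyms that are not true synonyms, introduced progressively in the abstract's description) would appear as extra explanation rows that the loss pushes each term away from.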
Affiliation(s)
- Huaiyuan Ying: Center for Statistical Science, Department of Industrial Engineering, Tsinghua University, Beijing, 100084, China
- Zhengyun Zhao: Center for Statistical Science, Department of Industrial Engineering, Tsinghua University, Beijing, 100084, China
- Yang Zhao: Weiyang College, Tsinghua University, Beijing, 100084, China
- Sihang Zeng: Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA 98195, United States
- Sheng Yu: Center for Statistical Science, Department of Industrial Engineering, Tsinghua University, Beijing, 100084, China
15
Kittichai V, Sompong W, Kaewthamasorn M, Sasisaowapak T, Naing KM, Tongloy T, Chuwongin S, Thanee S, Boonsang S. A novel approach for identification of zoonotic trypanosome utilizing deep metric learning and vector database-based image retrieval system. Heliyon 2024; 10:e30643. PMID: 38774068. PMCID: PMC11107104. DOI: 10.1016/j.heliyon.2024.e30643.
Abstract
Trypanosomiasis, a significant health concern in South America, South Asia, and Southeast Asia, requires active surveys to control the disease effectively. To address this, we developed a hybrid model that combines deep metric learning (DML) and image retrieval and is proficient at identifying Trypanosoma species in microscopic images of thin blood films. Using a ResNet50 backbone, the trained model demonstrated outstanding performance, achieving an accuracy exceeding 99.71% and a recall of up to 96%. Acknowledging the need for automated tools in field scenarios, we demonstrated the model's potential as an autonomous screening approach by combining prevailing convolutional neural network (CNN) applications with a vector database of images returned by the KNN algorithm; this achievement is primarily attributable to the Triplet Margin Loss function, which yielded 98% precision. The robustness of the model in five-fold cross-validation (AUC >98%) highlights the DML-based ResNet50 network as a state-of-the-art CNN model. Adopting DML significantly improves performance, leaves the model unaffected by variations in the dataset, and makes it a useful tool for fieldwork studies. DML offers several advantages over conventional classification models for managing large-scale datasets with many classes, enhancing scalability. The model can generalize to novel classes not encountered during training, which is particularly advantageous in scenarios where new classes continually emerge. It is also well suited to applications requiring precise recognition, especially discrimination between closely related classes. Furthermore, DML is more resilient to class imbalance because it focuses on learning distances or similarities, which are more tolerant of such imbalance. These contributions enhance the effectiveness and practicality of the DML model, particularly in fieldwork research.
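The two components the abstract names, a Triplet Margin Loss for training the embedding and KNN retrieval over a vector database for screening, can be sketched in a few lines. The vectors and species labels below are invented for illustration, not the paper's data.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Zero when the anchor is closer to the positive than to the
    negative by at least `margin`; positive otherwise."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

def knn_retrieve(query, database, k=3):
    """Return labels of the k stored embeddings nearest to the query,
    mimicking a vector-database image lookup."""
    ranked = sorted(database, key=lambda rec: euclidean(query, rec[0]))
    return [label for _, label in ranked[:k]]

# A query embedding is screened against stored (embedding, species) pairs.
db = [([0.0, 0.0], "T. evansi"), ([5.0, 5.0], "T. brucei"), ([0.1, 0.0], "T. evansi")]
hits = knn_retrieve([0.0, 0.1], db, k=2)
```

In a DML pipeline the loss shapes the embedding space during training, and retrieval at inference time needs only distance comparisons, which is why new classes can be handled without retraining a classifier head.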
Affiliation(s)
- Veerayuth Kittichai: Faculty of Medicine, King Mongkut's Institute of Technology Ladkrabang, Thailand
- Weerachat Sompong: Faculty of Medicine, King Mongkut's Institute of Technology Ladkrabang, Thailand
- Morakot Kaewthamasorn: Veterinary Parasitology Research Unit, Faculty of Veterinary Science, Chulalongkorn University, Bangkok, Thailand
- Thanyathep Sasisaowapak: College of Advanced Manufacturing Innovation, King Mongkut's Institute of Technology Ladkrabang, Thailand
- Kaung Myat Naing: College of Advanced Manufacturing Innovation, King Mongkut's Institute of Technology Ladkrabang, Thailand
- Teerawat Tongloy: College of Advanced Manufacturing Innovation, King Mongkut's Institute of Technology Ladkrabang, Thailand
- Santhad Chuwongin: College of Advanced Manufacturing Innovation, King Mongkut's Institute of Technology Ladkrabang, Thailand
- Suchansa Thanee: Veterinary Parasitology Research Unit, Faculty of Veterinary Science, Chulalongkorn University, Bangkok, Thailand
- Siridech Boonsang: Department of Electrical Engineering, Faculty of Engineering, King Mongkut's Institute of Technology Ladkrabang, Thailand
16
Gwon YN, Kim JH, Chung HS, Jung EJ, Chun J, Lee S, Shim SR. The Use of Generative AI for Scientific Literature Searches for Systematic Reviews: ChatGPT and Microsoft Bing AI Performance Evaluation. JMIR Med Inform 2024; 12:e51187. PMID: 38771247. PMCID: PMC11107769. DOI: 10.2196/51187.
Abstract
Background A large language model is a type of artificial intelligence (AI) model that opens up great possibilities for health care practice, research, and education, although scholars have emphasized the need to proactively address the issue of unvalidated and inaccurate information regarding its use. One of the best-known large language models is ChatGPT (OpenAI). It is believed to be of great help to medical research, as it facilitates more efficient data set analysis, code generation, and literature review, allowing researchers to focus on experimental design as well as drug discovery and development. Objective This study aims to explore the potential of ChatGPT as a real-time literature search tool for systematic reviews and clinical decision support systems, to enhance their efficiency and accuracy in health care settings. Methods The search results of a published systematic review by human experts on the treatment of Peyronie disease were selected as a benchmark, and the literature search formula of the study was applied to ChatGPT and Microsoft Bing AI as a comparison to human researchers. Peyronie disease typically presents with discomfort, curvature, or deformity of the penis in association with palpable plaques and erectile dysfunction. To evaluate the quality of individual studies derived from AI answers, we created a structured rating system based on bibliographic information related to the publications. We classified its answers into 4 grades if the title existed: A, B, C, and F. No grade was given for a fake title or no answer. Results From ChatGPT, 7 (0.5%) out of 1287 identified studies were directly relevant, whereas Bing AI resulted in 19 (40%) relevant studies out of 48, compared to the human benchmark of 24 studies. In the qualitative evaluation, ChatGPT had 7 grade A, 18 grade B, 167 grade C, and 211 grade F studies, and Bing AI had 19 grade A and 28 grade C studies. 
Conclusions This is the first study to compare AI and conventional human systematic review methods as a real-time literature collection tool for evidence-based medicine. The results suggest that the use of ChatGPT as a tool for real-time evidence generation is not yet accurate and feasible. Therefore, researchers should be cautious about using such AI. The limitations of this study using the generative pre-trained transformer model are that the search for research topics was not diverse and that it did not prevent the hallucination of generative AI. However, this study will serve as a standard for future studies by providing an index to verify the reliability and consistency of generative AI from a user's point of view. If the reliability and consistency of AI literature search services are verified, then the use of these technologies will help medical research greatly.
Affiliation(s)
- Yong Nam Gwon: Department of Urology, Soonchunhyang University College of Medicine, Soonchunhyang University Seoul Hospital, Seoul, Republic of Korea
- Jae Heon Kim: Department of Urology, Soonchunhyang University College of Medicine, Soonchunhyang University Seoul Hospital, Seoul, Republic of Korea
- Hyun Soo Chung: College of Medicine, Soonchunhyang University, Cheonan, Republic of Korea
- Eun Jee Jung: College of Medicine, Soonchunhyang University, Cheonan, Republic of Korea
- Joey Chun: Department of Urology, Soonchunhyang University College of Medicine, Soonchunhyang University Seoul Hospital, Seoul, Republic of Korea; Cranbrook Kingswood Upper School, Bloomfield Hills, MI, United States
- Serin Lee: Department of Urology, Soonchunhyang University College of Medicine, Soonchunhyang University Seoul Hospital, Seoul, Republic of Korea; Department of Biochemistry, Case Western Reserve University, Cleveland, OH, United States
- Sung Ryul Shim: Department of Biomedical Informatics, Konyang University College of Medicine, Daejeon, Republic of Korea; Konyang Medical Data Research Group-KYMERA, Konyang University Hospital, Daejeon, Republic of Korea
17
Xu Z, Fang Q, Huang Y, Xie M. The public attitude towards ChatGPT on reddit: A study based on unsupervised learning from sentiment analysis and topic modeling. PLoS One 2024; 19:e0302502. PMID: 38743773. PMCID: PMC11093324. DOI: 10.1371/journal.pone.0302502.
Abstract
ChatGPT has demonstrated impressive abilities and has affected many aspects of human society since its creation, gaining widespread attention across social spheres. This study comprehensively assesses public perception of ChatGPT on Reddit. The dataset, collected from Reddit, a social media platform, includes 23,733 posts and comments related to ChatGPT. First, to examine public attitudes, this study conducts content analysis using topic modeling with the Latent Dirichlet Allocation (LDA) algorithm to extract pertinent topics. Sentiment analysis then categorizes user posts and comments as positive, negative, or neutral using TextBlob and VADER. Topic modeling identified seven topics regarding ChatGPT, which can be grouped into three themes: user perception, technical methods, and impacts on society. Sentiment analysis shows that 61.6% of the posts and comments hold favorable opinions of ChatGPT, emphasizing its ability to engage in natural conversations with users without relying on complex natural language processing. The findings offer suggestions for ChatGPT developers to enhance its usability design and functionality; meanwhile, stakeholders, including users, should understand the advantages and disadvantages of ChatGPT in human society to promote its ethical and regulated implementation.
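The sentiment step, mapping a polarity score to a positive/negative/neutral label as tools like TextBlob and VADER do, can be sketched with a toy lexicon. The words, scores, and threshold below are hypothetical stand-ins for a real analyzer, kept only to show the thresholding convention.

```python
# Toy polarity lexicon standing in for TextBlob/VADER scores (illustrative only).
LEXICON = {"impressive": 0.8, "helpful": 0.6, "useless": -0.7, "buggy": -0.5}

def polarity(text):
    """Average lexicon score over the tokens of a post or comment."""
    scores = [LEXICON.get(w.strip(".,!").lower(), 0.0) for w in text.split()]
    return sum(scores) / len(scores) if scores else 0.0

def classify(text, threshold=0.05):
    """Map a polarity score to positive/negative/neutral, mirroring the
    common thresholding convention used with compound sentiment scores."""
    p = polarity(text)
    if p > threshold:
        return "positive"
    if p < -threshold:
        return "negative"
    return "neutral"
```

In the study's pipeline, each Reddit post would pass through such a classifier, and the share of "positive" labels yields the reported 61.6% figure.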
Affiliation(s)
- Zhaoxiang Xu: Department of Data Science, School of Computer Science and Engineering, Guangzhou Institute of Science and Technology, Guangzhou, Guangdong, China
- Qingguo Fang: Department of Management, School of Business, Macau University of Science and Technology, Macao, China
- Yanbo Huang: Data Science Research Center, Faculty of Innovation Engineering, Macau University of Science and Technology, Macao, China
- Mingjian Xie: Department of Decision Sciences, School of Business, Macau University of Science and Technology, Macao, China
18
Aguirre A, Hilsabeck R, Smith T, Xie B, He D, Wang Z, Zou N. Assessing the Quality of ChatGPT Responses to Dementia Caregivers' Questions: Qualitative Analysis. JMIR Aging 2024; 7:e53019. PMID: 38722219. PMCID: PMC11089887. DOI: 10.2196/53019.
Abstract
Background Artificial intelligence (AI) such as ChatGPT by OpenAI holds great promise to improve the quality of life of patients with dementia and their caregivers by providing high-quality responses to their questions about typical dementia behaviors. So far, however, evidence on the quality of such ChatGPT responses is limited. A few recent publications have investigated the quality of ChatGPT responses in other health conditions. Our study is the first to assess ChatGPT using real-world questions asked by dementia caregivers themselves. Objectives This pilot study examines the potential of ChatGPT-3.5 to provide high-quality information that may enhance dementia care and patient-caregiver education. Methods Our interprofessional team used a formal rating scale (scoring range: 0-5; the higher the score, the better the quality) to evaluate ChatGPT responses to real-world questions posed by dementia caregivers. We selected 60 posts by dementia caregivers from Reddit, a popular social media platform. These posts were verified by 3 interdisciplinary dementia clinicians as representing dementia caregivers' desire for information in the areas of memory loss and confusion, aggression, and driving. Word count for posts in the memory loss and confusion category ranged from 71 to 531 (mean 218; median 188), aggression posts ranged from 58 to 602 words (mean 254; median 200), and driving posts ranged from 93 to 550 words (mean 272; median 276). Results ChatGPT's response quality scores ranged from 3 to 5. Of the 60 responses, 26 (43%) received 5 points, 21 (35%) received 4 points, and 13 (22%) received 3 points, suggesting high quality. ChatGPT obtained consistently high scores in synthesizing information to provide follow-up recommendations (n=58, 96%), with the lowest scores in the area of comprehensiveness (n=38, 63%). Conclusions ChatGPT provided high-quality responses to complex questions posted by dementia caregivers, but it did have limitations.
ChatGPT was unable to anticipate future problems that a human professional might recognize and address in a clinical encounter. At other times, ChatGPT recommended a strategy that the caregiver had already explicitly tried. This pilot study indicates the potential of AI to provide high-quality information to enhance dementia care and patient-caregiver education in tandem with information provided by licensed health care professionals. Evaluating the quality of responses is necessary to ensure that caregivers can make informed decisions. ChatGPT has the potential to transform health care practice by shaping how caregivers receive health information.
Affiliation(s)
- Alyssa Aguirre: Department of Neurology, The University of Texas at Austin, Austin, TX, United States; Steve Hicks School of Social Work, The University of Texas at Austin, Austin, TX, United States
- Robin Hilsabeck: Glenn Biggs Institute for Alzheimer's & Neurodegenerative Diseases, Department of Neurology, University of Texas Health Science Center at San Antonio, San Antonio, TX, United States
- Tawny Smith: Department of Psychiatry and Behavioral Sciences, The University of Texas at Austin, Austin, TX, United States
- Bo Xie: School of Information, The University of Texas at Austin, Austin, TX, United States; School of Nursing, The University of Texas at Austin, Austin, TX, United States
- Daqing He: School of Computing and Information, University of Pittsburgh, Pittsburgh, PA, United States
- Zhendong Wang: School of Computing and Information, University of Pittsburgh, Pittsburgh, PA, United States
- Ning Zou: School of Computing and Information, University of Pittsburgh, Pittsburgh, PA, United States
19
Nguyen J, Owen SC. Emerging Voices in Drug Delivery - Breaking Barriers (Issue 1). Adv Drug Deliv Rev 2024; 208:115273. PMID: 38447932. DOI: 10.1016/j.addr.2024.115273.
Affiliation(s)
- Juliane Nguyen: Division of Pharmacoengineering & Molecular Pharmaceutics, Eshelman School of Pharmacy, UNC, Chapel Hill, NC 27599, United States; Department of Biomedical Engineering, NC State/UNC, Chapel Hill, NC 27695, United States
- Shawn C Owen: Department of Molecular Pharmaceutics; Department of Biomedical Engineering, University of Utah, Salt Lake City, UT, USA
20
Sawamura S, Bito T, Ando T, Masuda K, Kameyama S, Ishida H. Evaluation of the accuracy of ChatGPT's responses to and references for clinical questions in physical therapy. J Phys Ther Sci 2024; 36:234-239. PMID: 38694019. PMCID: PMC11060764. DOI: 10.1589/jpts.36.234.
Abstract
[Purpose] This study evaluated the accuracy of ChatGPT's responses to and references for five clinical questions in physical therapy based on the Physical Therapy Guidelines and assessed this language model's potential as a tool for supporting clinical decision-making in the rehabilitation field. [Participants and Methods] Five clinical questions from the "Stroke", "Musculoskeletal disorders", and "Internal disorders" sections of the Physical Therapy Guidelines, released by the Japanese Society of Physical Therapy, were presented to ChatGPT. ChatGPT was instructed to provide responses in Japanese accompanied by references such as PubMed IDs or digital object identifiers. The accuracy of the generated content and references was evaluated by two assessors with expertise in their respective sections by using a 4-point scale, and comments were provided for point deductions. The inter-rater agreement was evaluated using weighted kappa coefficients. [Results] ChatGPT demonstrated adequate accuracy in generating content for clinical questions in physical therapy. However, the accuracy of the references was poor, with a significant number of references being non-existent or misinterpreted. [Conclusion] ChatGPT has limitations in reference selection and reliability. While ChatGPT can offer accurate responses to clinical questions in physical therapy, it should be used with caution because it is not a completely reliable model.
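The inter-rater agreement reported above is typically computed as a weighted Cohen's kappa, which discounts disagreements by their distance on the ordinal 4-point scale. A minimal quadratic-weighted sketch follows; the rating vectors in the example are invented, not the study's data.

```python
def weighted_kappa(r1, r2, categories=4):
    """Quadratic-weighted Cohen's kappa for two raters on an ordinal
    scale with category codes 0..categories-1."""
    n = len(r1)
    # Observed joint distribution of the two raters' scores.
    obs = [[0.0] * categories for _ in range(categories)]
    for a, b in zip(r1, r2):
        obs[a][b] += 1.0 / n
    # Marginal distributions, whose product gives chance agreement.
    m1 = [sum(1 for a in r1 if a == i) / n for i in range(categories)]
    m2 = [sum(1 for b in r2 if b == i) / n for i in range(categories)]
    # Quadratic disagreement weights: 0 on the diagonal, 1 at the extremes.
    w = [[((i - j) ** 2) / ((categories - 1) ** 2) for j in range(categories)]
         for i in range(categories)]
    observed = sum(w[i][j] * obs[i][j] for i in range(categories) for j in range(categories))
    expected = sum(w[i][j] * m1[i] * m2[j] for i in range(categories) for j in range(categories))
    return 1.0 - observed / expected

# Hypothetical ratings on the study's 4-point scale.
rater_a = [0, 1, 2, 3, 1, 2]
rater_b = [0, 1, 2, 3, 2, 1]
kappa = weighted_kappa(rater_a, rater_b)
```

Identical rating vectors give kappa of 1; the two swapped scores in the example pull it below 1, with near misses penalized less than extreme disagreements.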
Affiliation(s)
- Shogo Sawamura: Department of Rehabilitation, Heisei College of Health Sciences, 180 Kurono, Gifu City, Gifu 501-1131, Japan
- Takanobu Bito: Department of Rehabilitation, Gifu University Hospital, Japan
- Takahiro Ando: Department of Rehabilitation, Gifu University Hospital, Japan
- Kento Masuda: Department of Rehabilitation, Gifu University Hospital, Japan
- Sakiko Kameyama: Department of Rehabilitation, Heisei College of Health Sciences, 180 Kurono, Gifu City, Gifu 501-1131, Japan
- Hiroyasu Ishida: Department of Rehabilitation, Heisei College of Health Sciences, 180 Kurono, Gifu City, Gifu 501-1131, Japan
21
Yaghy A, Yaghy M, Shields JA, Shields CL. Large Language Models in Ophthalmology: Potential and Pitfalls. Semin Ophthalmol 2024; 39:289-293. PMID: 38179986. DOI: 10.1080/08820538.2023.2300808.
Abstract
Large language models (LLMs) show great promise in assisting clinicians in general, and ophthalmology in particular, through knowledge synthesis, decision support, accelerating research, enhancing education, and improving patient interactions. Specifically, LLMs can rapidly summarize the latest literature to keep clinicians up-to-date. They can also analyze patient data to highlight crucial insights and recommend appropriate tests or referrals. LLMs can automate tedious research tasks like data cleaning and literature reviews. As AI tutors, LLMs can fill knowledge gaps and assess competency in trainees. As chatbots, they can provide empathetic, personalized responses to patient inquiries and improve satisfaction. The visual capabilities of LLMs like GPT-4 can assist the visually impaired by describing their environment. However, there are significant ethical, technical, and legal challenges around the use of LLMs that should be addressed regarding privacy, fairness, robustness, attribution, and regulation. Ongoing oversight and refinement of models are critical to realize benefits while minimizing risks and upholding responsible AI principles. If carefully implemented, LLMs hold immense potential to push the boundaries of care, discovery, and quality of life for ophthalmology patients.
Affiliation(s)
- Antonio Yaghy: Ocular Oncology Service, Wills Eye Hospital, Thomas Jefferson University, Philadelphia, PA, USA
- Maria Yaghy: Pediatric Emergency and Infectious Disease, Centre Hospitalier Universitaire Timone Enfants, Marseille, France
- Jerry A Shields: Ocular Oncology Service, Wills Eye Hospital, Thomas Jefferson University, Philadelphia, PA, USA
- Carol L Shields: Ocular Oncology Service, Wills Eye Hospital, Thomas Jefferson University, Philadelphia, PA, USA
22
Han Z, Battaglia F, Udaiyar A, Fooks A, Terlecky SR. An explorative assessment of ChatGPT as an aid in medical education: Use it with caution. Med Teach 2024; 46:657-664. PMID: 37862566. DOI: 10.1080/0142159x.2023.2271159.
Abstract
OBJECTIVE To explore the use of ChatGPT by educators and students in a medical school setting. METHOD This study used the public version of ChatGPT launched by OpenAI on November 30, 2022 (https://openai.com/blog/chatgpt/). We employed prompts to ask ChatGPT to 1) generate a content outline for a session on the topics of cholesterol, lipoproteins, and hyperlipidemia for medical students; 2) produce a list of learning objectives for the session; and 3) write assessment questions with and without clinical vignettes related to the identified learning objectives. We assessed ChatGPT's responses for accuracy and reliability to determine the potential of the chatbot as an aid to educators and as a "know-it-all" medical information provider for students. RESULTS ChatGPT can function as an aid to educators, but it is not yet suitable as a reliable information resource for educators and medical students. CONCLUSION ChatGPT can be a useful tool to assist medical educators in drafting course and session content outlines and creating assessment questions. At the same time, caution must be taken, as ChatGPT is prone to providing incorrect information; expert oversight is necessary to ensure the information generated is accurate and beneficial to students. It is therefore premature for medical students to use the current version of ChatGPT as a "know-it-all" information provider. In the future, medical educators should work with programming experts to explore and realize the full potential of AI in medical education.
Affiliation(s)
- Zhiyong Han: Department of Medical Sciences, Hackensack Meridian School of Medicine, Nutley, NJ, USA
- Fortunato Battaglia: Department of Medical Sciences, Hackensack Meridian School of Medicine, Nutley, NJ, USA
- Abinav Udaiyar: Department of Medical Sciences, Hackensack Meridian School of Medicine, Nutley, NJ, USA
- Allen Fooks: Department of Medical Sciences, Hackensack Meridian School of Medicine, Nutley, NJ, USA
- Stanley R Terlecky: Department of Medical Sciences, Hackensack Meridian School of Medicine, Nutley, NJ, USA
23
Owen SC, Nguyen J. Emerging Voices in Drug Delivery - Harnessing and Modulating Complex Biological Systems (Issue 2). Adv Drug Deliv Rev 2024; 208:115293. PMID: 38521245. DOI: 10.1016/j.addr.2024.115293.
Affiliation(s)
- Shawn C Owen: Department of Molecular Pharmaceutics, Department of Biomedical Engineering, University of Utah, Salt Lake City, UT, United States of America
- Juliane Nguyen: Division of Pharmacoengineering & Molecular Pharmaceutics, Eshelman School of Pharmacy, UNC, Chapel Hill, NC 27599, United States of America; Department of Biomedical Engineering, NC State/UNC, Chapel Hill, NC 27695, United States of America
24
Choudhury A, Chaudhry Z. Large Language Models and User Trust: Consequence of Self-Referential Learning Loop and the Deskilling of Health Care Professionals. J Med Internet Res 2024; 26:e56764. PMID: 38662419. PMCID: PMC11082730. DOI: 10.2196/56764.
Abstract
As the health care industry increasingly embraces large language models (LLMs), understanding the consequence of this integration becomes crucial for maximizing benefits while mitigating potential pitfalls. This paper explores the evolving relationship among clinician trust in LLMs, the transition of data sources from predominantly human-generated to artificial intelligence (AI)-generated content, and the subsequent impact on the performance of LLMs and clinician competence. One of the primary concerns identified in this paper is the LLMs' self-referential learning loops, where AI-generated content feeds into the learning algorithms, threatening the diversity of the data pool, potentially entrenching biases, and reducing the efficacy of LLMs. While theoretical at this stage, this feedback loop poses a significant challenge as the integration of LLMs in health care deepens, emphasizing the need for proactive dialogue and strategic measures to ensure the safe and effective use of LLM technology. Another key takeaway from our investigation is the role of user expertise and the necessity for a discerning approach to trusting and validating LLM outputs. The paper highlights how expert users, particularly clinicians, can leverage LLMs to enhance productivity by off-loading routine tasks while maintaining a critical oversight to identify and correct potential inaccuracies in AI-generated content. This balance of trust and skepticism is vital for ensuring that LLMs augment rather than undermine the quality of patient care. We also discuss the risks associated with the deskilling of health care professionals. Frequent reliance on LLMs for critical tasks could result in a decline in health care providers' diagnostic and thinking skills, particularly affecting the training and development of future professionals. The legal and ethical considerations surrounding the deployment of LLMs in health care are also examined. 
We discuss the medicolegal challenges, including liability in cases of erroneous diagnoses or treatment advice generated by LLMs. The paper references recent legislative efforts, such as The Algorithmic Accountability Act of 2023, as crucial steps toward establishing a framework for the ethical and responsible use of AI-based technologies in health care. In conclusion, this paper advocates for a strategic approach to integrating LLMs into health care. By emphasizing the importance of maintaining clinician expertise, fostering critical engagement with LLM outputs, and navigating the legal and ethical landscape, we can ensure that LLMs serve as valuable tools in enhancing patient care and supporting health care professionals. This approach addresses the immediate challenges posed by integrating LLMs and sets a foundation for their maintainable and responsible use in the future.
Affiliation(s)
- Avishek Choudhury: Industrial and Management Systems Engineering, West Virginia University, Morgantown, WV, United States
- Zaira Chaudhry: Industrial and Management Systems Engineering, West Virginia University, Morgantown, WV, United States
25
Raman R, Lathabai HH, Mandal S, Das P, Kaur T, Nedungadi P. ChatGPT: Literate or intelligent about UN sustainable development goals? PLoS One 2024; 19:e0297521. PMID: 38656952. PMCID: PMC11042716. DOI: 10.1371/journal.pone.0297521.
Abstract
Generative AI tools, such as ChatGPT, are progressively transforming numerous sectors and have the capacity to affect human life dramatically. This research evaluates the UN Sustainable Development Goals (SDGs) literacy of ChatGPT, which is crucial for the diverse stakeholders involved in SDG-related policies. Experimental outcomes from two widely used sustainability assessments, the UN SDG Fitness Test and the Sustainability Literacy Test (SULITEST), suggest that ChatGPT exhibits high SDG literacy, yet its comprehensive SDG intelligence needs further exploration. The Fitness Test gauges eight vital competencies at introductory, intermediate, and advanced levels, and accurate mapping of these to the test questions is essential for even a partial evaluation of SDG intelligence. To assess SDG intelligence, the questions from both tests were mapped to the 17 SDGs and eight cross-cutting SDG core competencies, but both questionnaires were found to be insufficient: SULITEST could satisfactorily map only 5 of 8 competencies, whereas the Fitness Test managed 6 of 8. Regarding coverage of the 17 SDGs, both tests fell short as well: most SDGs were underrepresented in both instruments, and certain SDGs were not represented at all. Consequently, both tools proved ineffective for assessing SDG intelligence through SDG coverage. The study recommends that future versions of ChatGPT enhance competencies such as collaboration, critical thinking, systems thinking, and others to help achieve the SDGs. It concludes that while AI models like ChatGPT hold considerable potential for sustainable development, their use must be approached carefully, considering current limitations and ethical implications.
Affiliation(s)
- Raghu Raman: Amrita School of Business, Amrita Vishwa Vidyapeetham, Amritapuri, Kerala, India
- Santanu Mandal: Amrita School of Business, Amaravati, Andhra Pradesh, India
- Payel Das: Amrita School of Business, Amaravati, Andhra Pradesh, India
- Tavleen Kaur: Fortune Institute of International Business, New Delhi, India
26
Maccaro A, Stokes K, Statham L, He L, Williams A, Pecchia L, Piaggio D. Clearing the Fog: A Scoping Literature Review on the Ethical Issues Surrounding Artificial Intelligence-Based Medical Devices. J Pers Med 2024; 14:443. PMID: 38793025. PMCID: PMC11121798. DOI: 10.3390/jpm14050443.
Abstract
The use of AI in healthcare has sparked much debate among philosophers, ethicists, regulators, and policymakers, who have raised concerns about the implications of such technologies. This scoping review captures the progression of the ethical and legal debate and the proposed ethical frameworks concerning the use of AI-based medical technologies, identifying key themes across a wide range of medical contexts. The ethical dimensions are synthesised to produce a coherent ethical framework for AI-based medical technologies, highlighting transparency, accountability, confidentiality, autonomy, trust, and fairness as the top six recurrent ethical issues. The literature also highlighted that it is essential to increase ethical awareness through interdisciplinary research, so that researchers, AI developers, and regulators have the necessary education, competence, networks, and tools to ensure proper consideration of ethical matters in the conception and design of new AI technologies and their norms. Interdisciplinarity throughout research, regulation, and implementation will help ensure AI-based medical devices are ethical, clinically effective, and safe. Achieving these goals will facilitate the successful translation of AI into healthcare systems, which currently lags behind other sectors, and ensure the timely delivery of health benefits to patients and the public.
Affiliation(s)
- Alessia Maccaro
- Applied Biomedical Signal Processing Intelligent eHealth Lab, School of Engineering, University of Warwick, Coventry CV4 7AL, UK
- Katy Stokes
- Applied Biomedical Signal Processing Intelligent eHealth Lab, School of Engineering, University of Warwick, Coventry CV4 7AL, UK
- Laura Statham
- Applied Biomedical Signal Processing Intelligent eHealth Lab, School of Engineering, University of Warwick, Coventry CV4 7AL, UK
- Warwick Medical School, University of Warwick, Coventry CV4 7AL, UK
- Lucas He
- Applied Biomedical Signal Processing Intelligent eHealth Lab, School of Engineering, University of Warwick, Coventry CV4 7AL, UK
- Faculty of Engineering, Imperial College, London SW7 1AY, UK
- Arthur Williams
- Applied Biomedical Signal Processing Intelligent eHealth Lab, School of Engineering, University of Warwick, Coventry CV4 7AL, UK
- Leandro Pecchia
- Applied Biomedical Signal Processing Intelligent eHealth Lab, School of Engineering, University of Warwick, Coventry CV4 7AL, UK
- Intelligent Technologies for Health and Well-Being: Sustainable Design, Management and Evaluation, Faculty of Engineering, Università Campus Bio-Medico Roma, Via Alvaro del Portillo, 21, 00128 Rome, Italy
- Davide Piaggio
- Applied Biomedical Signal Processing Intelligent eHealth Lab, School of Engineering, University of Warwick, Coventry CV4 7AL, UK
27
Lucas HC, Upperman JS, Robinson JR. A systematic review of large language models and their implications in medical education. Med Educ 2024. [PMID: 38639098] [DOI: 10.1111/medu.15402]
Abstract
INTRODUCTION In the past year, the use of large language models (LLMs) has generated significant interest and excitement because of their potential to revolutionise various fields, including medical education for aspiring physicians. Although medical students undergo a demanding educational process to become competent health care professionals, the emergence of LLMs presents a promising solution to challenges such as information overload, time constraints, and pressure on clinical educators. However, integrating LLMs into medical education raises critical concerns and challenges for educators, professionals, and students. This systematic review explores LLM applications in medical education, specifically their impact on medical students' learning experiences. METHODS A systematic search was performed in PubMed, Web of Science, and Embase for articles discussing the applications of LLMs in medical education, using selected keywords related to LLMs and medical education, from the time of ChatGPT's debut until February 2024. Only articles available in full text and in English were reviewed. The credibility of each study was critically appraised by two independent reviewers. RESULTS The search identified 166 studies, of which 40 were found on review to be relevant. Among the 40 relevant studies, key themes included LLM capabilities, benefits such as personalised learning, and challenges regarding content accuracy. Importantly, 42.5% of these studies specifically evaluated LLMs, including ChatGPT, in novel contexts such as medical exams and clinical/biomedical information, highlighting their potential to replicate human-level performance in medical knowledge. The remaining studies broadly discussed the prospective role of LLMs in medical education, reflecting a keen interest in their future potential despite current constraints.
CONCLUSIONS The responsible implementation of LLMs in medical education offers a promising opportunity to enhance learning experiences. However, ensuring information accuracy, emphasising skill-building and maintaining ethical safeguards are crucial. Continuous critical evaluation and interdisciplinary collaboration are essential for the appropriate integration of LLMs in medical education.
Affiliation(s)
- Jeffrey S Upperman
- Department of Pediatric Surgery, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Jamie R Robinson
- Department of Pediatric Surgery, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
28
Siepmann R, Huppertz M, Rastkhiz A, Reen M, Corban E, Schmidt C, Wilke S, Schad P, Yüksel C, Kuhl C, Truhn D, Nebelung S. The virtual reference radiologist: comprehensive AI assistance for clinical image reading and interpretation. Eur Radiol 2024. [PMID: 38627289] [DOI: 10.1007/s00330-024-10727-2]
Abstract
OBJECTIVES Large language models (LLMs) have shown potential in radiology, but their ability to aid radiologists in interpreting imaging studies remains unexplored. We investigated the effects of a state-of-the-art LLM (GPT-4) on the radiologists' diagnostic workflow. MATERIALS AND METHODS In this retrospective study, six radiologists of different experience levels read 40 selected radiographic [n = 10], CT [n = 10], MRI [n = 10], and angiographic [n = 10] studies unassisted (session one) and assisted by GPT-4 (session two). Each imaging study was presented with demographic data, the chief complaint, and associated symptoms, and diagnoses were registered using an online survey tool. The impact of artificial intelligence (AI) on diagnostic accuracy, confidence, user experience, input prompts, and generated responses was assessed, and false information was registered. Linear mixed-effects models were used to quantify the factors (fixed: experience, modality, AI assistance; random: radiologist) influencing diagnostic accuracy and confidence. RESULTS When assessing whether the correct diagnosis was among the top-3 differential diagnoses, diagnostic accuracy improved slightly from 181/240 (75.4%, unassisted) to 188/240 (78.3%, AI-assisted). Similar improvements were found when only the top differential diagnosis was considered. AI assistance was used in 77.5% of the readings. Three hundred and nine prompts were generated, primarily involving differential diagnoses (59.1%) and imaging features of specific conditions (27.5%). Diagnostic confidence was significantly higher when readings were AI-assisted (p < 0.001). Twenty-three responses (7.4%) were classified as hallucinations, while two (0.6%) were misinterpretations. CONCLUSION Integrating GPT-4 into the diagnostic process improved diagnostic accuracy slightly and diagnostic confidence significantly. Potentially harmful hallucinations and misinterpretations call for caution and highlight the need for further safeguarding measures.
CLINICAL RELEVANCE STATEMENT Using GPT-4 as a virtual assistant when reading images made six radiologists of different experience levels feel more confident and provide more accurate diagnoses; yet, GPT-4 gave factually incorrect and potentially harmful information in 7.4% of its responses.
Affiliation(s)
- Robert Siepmann
- Department of Diagnostic and Interventional Radiology, University Hospital RWTH Aachen, Aachen, Germany
- Marc Huppertz
- Department of Diagnostic and Interventional Radiology, University Hospital RWTH Aachen, Aachen, Germany
- Annika Rastkhiz
- Department of Diagnostic and Interventional Radiology, University Hospital RWTH Aachen, Aachen, Germany
- Matthias Reen
- Department of Diagnostic and Interventional Radiology, University Hospital RWTH Aachen, Aachen, Germany
- Eric Corban
- Department of Diagnostic and Interventional Radiology, University Hospital RWTH Aachen, Aachen, Germany
- Christian Schmidt
- Department of Diagnostic and Interventional Radiology, University Hospital RWTH Aachen, Aachen, Germany
- Stephan Wilke
- Department of Diagnostic and Interventional Radiology, University Hospital RWTH Aachen, Aachen, Germany
- Philipp Schad
- Department of Diagnostic and Interventional Radiology, University Hospital RWTH Aachen, Aachen, Germany
- Can Yüksel
- Department of Diagnostic and Interventional Radiology, University Hospital RWTH Aachen, Aachen, Germany
- Christiane Kuhl
- Department of Diagnostic and Interventional Radiology, University Hospital RWTH Aachen, Aachen, Germany
- Daniel Truhn
- Department of Diagnostic and Interventional Radiology, University Hospital RWTH Aachen, Aachen, Germany
- Sven Nebelung
- Department of Diagnostic and Interventional Radiology, University Hospital RWTH Aachen, Aachen, Germany
29
Javid M, Bhandari M, Parameshwari P, Reddiboina M, Prasad S. Evaluation of ChatGPT for Patient Counseling in Kidney Stone Clinic: A Prospective Study. J Endourol 2024; 38:377-383. [PMID: 38411835] [DOI: 10.1089/end.2023.0571]
Abstract
Introduction: Large language models (LLMs) have the potential to improve clinical workflow and make patient care more efficient. We prospectively evaluated the performance of the LLM ChatGPT as a patient counseling tool in the urology stone clinic and validated the generated responses against those of urologists. Methods: We collected 61 questions from 12 kidney stone patients and prompted those to ChatGPT and a panel of experienced urologists (Level 1). Subsequently, the blinded responses of the urologists and ChatGPT were presented to two expert urologists (Level 2) for comparative evaluation on preset domains: accuracy, relevance, empathy, completeness, and practicality. All responses were rated on a Likert scale of 1 to 10 for psychometric response evaluation. The mean difference in the scores given by the urologists (Level 2) was analyzed, and interrater reliability (IRR) for the level of agreement between the urologists (Level 2) was assessed with Cohen's kappa. Results: The mean differences in average scores between the responses from ChatGPT and the urologists were significant for accuracy (p < 0.001), empathy (p < 0.001), completeness (p < 0.001), and practicality (p < 0.001), but not for relevance (p = 0.051), with ChatGPT's responses being rated higher. The IRR analysis revealed significant agreement only in the empathy domain (κ = 0.163; 0.059-0.266). Conclusion: We believe the introduction of ChatGPT into the clinical workflow could further optimize the information provided to patients in a busy stone clinic. In this preliminary study, ChatGPT supplemented the answers provided by the urologists, adding value to the conversation. However, in its current state, it is not yet ready to be a direct source of authentic information for patients. We recommend its use as a source to build a comprehensive Frequently Asked Questions bank as a prelude to developing an LLM chatbot for patient counseling.
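The interrater-reliability statistic this abstract reports, Cohen's kappa, corrects observed agreement between two raters for the agreement expected by chance. A minimal unweighted sketch follows; it is an illustration of the statistic itself, not the study's actual computation (for ordinal Likert ratings, a weighted kappa is often preferred), and the function name is ours.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items where the raters match.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    labels = set(freq_a) | set(freq_b)
    expected = sum(freq_a[lab] * freq_b[lab] for lab in labels) / (n * n)
    return (observed - expected) / (1 - expected)
```

A kappa near 0 (as in most domains of the study) means agreement barely exceeds chance, which is why only the empathy domain reached significance.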
Affiliation(s)
- Mohamed Javid
- Department of Urology, Chengalpattu Medical College, Chengalpattu, Tamil Nadu, India
- Mahendra Bhandari
- Vattikuti Urology Institute, Henry Ford Hospital, Detroit, Michigan, USA
- P Parameshwari
- Department of Community Medicine, Chengalpattu Medical College, Chengalpattu, Tamil Nadu, India
- Srikala Prasad
- Department of Urology, Chengalpattu Medical College, Chengalpattu, Tamil Nadu, India
30
Shukla R, Mishra AK, Banerjee N, Verma A. The Comparison of ChatGPT 3.5, Microsoft Bing, and Google Gemini for Diagnosing Cases of Neuro-Ophthalmology. Cureus 2024; 16:e58232. [PMID: 38745784] [PMCID: PMC11092423] [DOI: 10.7759/cureus.58232]
Abstract
OBJECTIVE We aim to compare the capabilities of ChatGPT 3.5, Microsoft Bing, and Google Gemini in handling neuro-ophthalmological case scenarios. METHODS Ten randomly chosen neuro-ophthalmological cases from a publicly accessible database were used to test the accuracy and suitability of all three models; the case details were followed by the query: "What is the most probable diagnosis?" RESULTS In terms of diagnostic accuracy, all three chatbots (ChatGPT 3.5, Microsoft Bing, and Google Gemini) gave the correct diagnosis in four (40%) of 10 cases, whereas in terms of suitability, ChatGPT 3.5, Microsoft Bing, and Google Gemini gave suitable responses in six (60%), five (50%), and five (50%) of 10 case scenarios, respectively. CONCLUSION ChatGPT 3.5 performs better than the other two models in handling neuro-ophthalmological case scenarios. These results highlight the potential benefits of developing artificial intelligence (AI) models for improving medical education and ocular diagnostics.
Affiliation(s)
- Ruchi Shukla
- Department of Ophthalmology, All India Institute of Medical Sciences, Raebareli, Raebareli, IND
- Ashutosh K Mishra
- Department of Neurology, All India Institute of Medical Sciences, Raebareli, Raebareli, IND
- Nilakshi Banerjee
- Department of Ophthalmology, All India Institute of Medical Sciences, Raebareli, Raebareli, IND
- Archana Verma
- Department of Neurology, All India Institute of Medical Sciences, Raebareli, Raebareli, IND
31
Menz BD, Kuderer NM, Bacchi S, Modi ND, Chin-Yee B, Hu T, Rickard C, Haseloff M, Vitry A, McKinnon RA, Kichenadasse G, Rowland A, Sorich MJ, Hopkins AM. Current safeguards, risk mitigation, and transparency measures of large language models against the generation of health disinformation: repeated cross sectional analysis. BMJ 2024; 384:e078538. [PMID: 38508682] [PMCID: PMC10961718] [DOI: 10.1136/bmj-2023-078538]
Abstract
OBJECTIVES To evaluate the effectiveness of safeguards to prevent large language models (LLMs) from being misused to generate health disinformation, and to evaluate the transparency of artificial intelligence (AI) developers regarding their risk mitigation processes against observed vulnerabilities. DESIGN Repeated cross sectional analysis. SETTING Publicly accessible LLMs. METHODS In a repeated cross sectional analysis, four LLMs (via chatbots/assistant interfaces) were evaluated: OpenAI's GPT-4 (via ChatGPT and Microsoft's Copilot), Google's PaLM 2 and newly released Gemini Pro (via Bard), Anthropic's Claude 2 (via Poe), and Meta's Llama 2 (via HuggingChat). In September 2023, these LLMs were prompted to generate health disinformation on two topics: sunscreen as a cause of skin cancer and the alkaline diet as a cancer cure. Jailbreaking techniques (ie, attempts to bypass safeguards) were evaluated if required. For LLMs with observed safeguarding vulnerabilities, the processes for reporting outputs of concern were audited. 12 weeks after initial investigations, the disinformation generation capabilities of the LLMs were re-evaluated to assess any subsequent improvements in safeguards. MAIN OUTCOME MEASURES The main outcome measures were whether safeguards prevented the generation of health disinformation, and the transparency of risk mitigation processes against health disinformation. RESULTS Claude 2 (via Poe) declined 130 prompts submitted across the two study timepoints requesting the generation of content claiming that sunscreen causes skin cancer or that the alkaline diet is a cure for cancer, even with jailbreaking attempts. GPT-4 (via Copilot) initially refused to generate health disinformation, even with jailbreaking attempts-although this was not the case at 12 weeks. In contrast, GPT-4 (via ChatGPT), PaLM 2/Gemini Pro (via Bard), and Llama 2 (via HuggingChat) consistently generated health disinformation blogs. 
In September 2023 evaluations, these LLMs facilitated the generation of 113 unique cancer disinformation blogs, totalling more than 40 000 words, without requiring jailbreaking attempts. The refusal rate across the evaluation timepoints for these LLMs was only 5% (7 of 150), and, as prompted, the LLM-generated blogs incorporated attention-grabbing titles, authentic-looking (fake or fictional) references, and fabricated testimonials from patients and clinicians, and they targeted diverse demographic groups. Although each LLM evaluated had mechanisms to report observed outputs of concern, the developers did not respond when observations of vulnerabilities were reported. CONCLUSIONS This study found that although effective safeguards to prevent LLMs from being misused to generate health disinformation are feasible, they were inconsistently implemented. Furthermore, effective processes for reporting safeguard problems were lacking. Enhanced regulation, transparency, and routine auditing are required to help prevent LLMs from contributing to the mass generation of health disinformation.
Affiliation(s)
- Bradley D Menz
- College of Medicine and Public Health, Flinders University, Adelaide, SA, 5042, Australia
- Stephen Bacchi
- College of Medicine and Public Health, Flinders University, Adelaide, SA, 5042, Australia
- Northern Adelaide Local Health Network, Lyell McEwin Hospital, Adelaide, Australia
- Natansh D Modi
- College of Medicine and Public Health, Flinders University, Adelaide, SA, 5042, Australia
- Benjamin Chin-Yee
- Schulich School of Medicine and Dentistry, Western University, London, Canada
- Department of History and Philosophy of Science, University of Cambridge, Cambridge, UK
- Tiancheng Hu
- Language Technology Lab, University of Cambridge, Cambridge, UK
- Ceara Rickard
- Consumer Advisory Group, Clinical Cancer Epidemiology Group, College of Medicine and Public Health, Flinders University, Adelaide, Australia
- Mark Haseloff
- Consumer Advisory Group, Clinical Cancer Epidemiology Group, College of Medicine and Public Health, Flinders University, Adelaide, Australia
- Agnes Vitry
- Consumer Advisory Group, Clinical Cancer Epidemiology Group, College of Medicine and Public Health, Flinders University, Adelaide, Australia
- University of South Australia, Clinical and Health Sciences, Adelaide, Australia
- Ross A McKinnon
- College of Medicine and Public Health, Flinders University, Adelaide, SA, 5042, Australia
- Ganessan Kichenadasse
- College of Medicine and Public Health, Flinders University, Adelaide, SA, 5042, Australia
- Flinders Centre for Innovation in Cancer, Department of Medical Oncology, Flinders Medical Centre, Flinders University, Bedford Park, South Australia, Australia
- Andrew Rowland
- College of Medicine and Public Health, Flinders University, Adelaide, SA, 5042, Australia
- Michael J Sorich
- College of Medicine and Public Health, Flinders University, Adelaide, SA, 5042, Australia
- Ashley M Hopkins
- College of Medicine and Public Health, Flinders University, Adelaide, SA, 5042, Australia
32
Elbadawi M, Li H, Basit AW, Gaisford S. The role of artificial intelligence in generating original scientific research. Int J Pharm 2024; 652:123741. [PMID: 38181989] [DOI: 10.1016/j.ijpharm.2023.123741]
Abstract
Artificial intelligence (AI) is a revolutionary technology that is finding wide application across numerous sectors. Large language models (LLMs) are an emerging subset of AI technology developed to communicate using human languages. At their core, LLMs are trained on vast amounts of information extracted from the internet, including text and images. Their ability to create human-like, expert text in almost any subject means they are increasingly being used as an aid to presentation, particularly in scientific writing. However, we wondered whether LLMs could go further, generating original scientific research and preparing the results for publication. We tasked GPT-4, an LLM, with writing an original pharmaceutics manuscript on a topic that is itself novel. It was able to conceive a research hypothesis, define an experimental protocol, produce photo-realistic images of 3D printed tablets, generate believable analytical data from a range of instruments, and write a convincing publication-ready manuscript with evidence of critical interpretation. The model achieved all this in less than 1 h. Moreover, the generated data were multi-modal in nature, including thermal analyses, vibrational spectroscopy, and dissolution testing, demonstrating multi-disciplinary expertise in the LLM. One area in which the model failed, however, was referencing the literature. Since the generated experimental results appeared believable, we suggest that LLMs could play a role in scientific research, but with human input, interpretation, and data validation. We discuss the potential benefits and current bottlenecks for realising this ambition here.
Affiliation(s)
- Moe Elbadawi
- UCL School of Pharmacy, University College London, 29-39 Brunswick Square, London WC1N 1AX, UK
- Hanxiang Li
- UCL School of Pharmacy, University College London, 29-39 Brunswick Square, London WC1N 1AX, UK
- Abdul W Basit
- UCL School of Pharmacy, University College London, 29-39 Brunswick Square, London WC1N 1AX, UK
- Simon Gaisford
- UCL School of Pharmacy, University College London, 29-39 Brunswick Square, London WC1N 1AX, UK
33
Sharpnack PA. Made Better by Chat GPT: Cultivating a Culture of Innovation in Nursing Education. Nurs Educ Perspect 2024; 45:67-68. [PMID: 38373098] [DOI: 10.1097/01.nep.0000000000001242]
Affiliation(s)
- Patricia A Sharpnack
- NLN Chair Patricia A. Sharpnack, DNP, RN, CNE, NEA-BC, ANEF, FAAN, is dean and Strawbridge Professor, The Breen School of Nursing and Health Professions, Ursuline College, Pepper Pike, Ohio.
34
Gurnani B, Kaur K. Leveraging ChatGPT for ophthalmic education: A critical appraisal. Eur J Ophthalmol 2024; 34:323-327. [PMID: 37974429] [DOI: 10.1177/11206721231215862]
Abstract
In recent years, the advent of artificial intelligence (AI) has transformed many sectors, including medical education. This editorial critically appraises the integration of ChatGPT, a state-of-the-art AI language model, into ophthalmic education, focusing on its potential, limitations, and ethical considerations. The application of ChatGPT in teaching and training ophthalmologists presents an innovative method to offer real-time, customized learning experiences. Through a systematic analysis of both experimental and clinical data, this editorial examines how ChatGPT enhances engagement, understanding, and retention of complex ophthalmological concepts. The study also evaluates the efficacy of ChatGPT in simulating patient interactions and clinical scenarios, which can foster improved diagnostic and interpersonal skills. Despite the promising advantages, concerns regarding reliability, lack of personal touch, and potential biases in the AI-generated content are scrutinized. Ethical considerations concerning data privacy and potential misuse are also explored. The findings underline the need for carefully designed integration, continuous evaluation, and adherence to ethical guidelines to maximize benefits while mitigating risks. By shedding light on these multifaceted aspects, this paper contributes to the ongoing discourse on the incorporation of AI in medical education, offering valuable insights and guidance for educators, practitioners, and policymakers aiming to leverage modern technology for enhancing ophthalmic education.
Affiliation(s)
- Bharat Gurnani
- Cataract, Cornea, Trauma, External Diseases, Ocular Surface and Refractive Services, ASG Eye Hospital, Jodhpur, Rajasthan, India
- Sadguru Netra Chikitsalya, Shri Sadguru Seva Sangh Trust, Chitrakoot, Madhya Pradesh, India
- Kirandeep Kaur
- Cataract, Pediatric Ophthalmology and Strabismus, ASG Eye Hospital, Jodhpur, Rajasthan, India
- Children Eye Care Centre, Sadguru Netra Chikitsalya, Shri Sadguru Seva Sangh Trust, Chitrakoot, Madhya Pradesh, India
35
Haman M, Školník M, Lošťák M. AI dietician: Unveiling the accuracy of ChatGPT's nutritional estimations. Nutrition 2024; 119:112325. [PMID: 38194819] [DOI: 10.1016/j.nut.2023.112325]
Abstract
We investigate the accuracy and reliability of ChatGPT, an artificial intelligence model developed by OpenAI, in providing nutritional information for dietary planning and weight management. The results show a reasonable level of accuracy, with energy values having the highest level of conformity: 97% of the artificial intelligence values fall within a 40% difference from United States Department of Agriculture (USDA) data. Additionally, ChatGPT displayed consistency in its provision of nutritional data, as indicated by relatively low coefficient-of-variation values for each nutrient. The artificial intelligence model also proved efficient in generating a daily meal plan within a specified caloric limit, with all the meals falling within a 30% bound of the USDA's caloric values. These findings suggest that ChatGPT can provide reasonably accurate and consistent nutritional information. Further research is recommended to assess the model's performance across a broader range of foods and meals.
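The abstract's conformity criterion (an AI-estimated value "falls within a 40% difference" of the USDA reference) reduces to a relative-error check. The sketch below is only our reading of that criterion; the function names and the handling of a zero reference value are illustrative assumptions, not taken from the paper.

```python
def within_tolerance(estimate, reference, tolerance=0.40):
    """True if estimate deviates from reference by at most `tolerance` (relative)."""
    if reference == 0:
        # Assumption: a zero reference only "matches" a zero estimate.
        return estimate == 0
    return abs(estimate - reference) / abs(reference) <= tolerance

def conformity_rate(estimates, references, tolerance=0.40):
    """Fraction of paired values meeting the tolerance criterion (e.g., the 97% figure)."""
    pairs = list(zip(estimates, references))
    hits = sum(within_tolerance(e, r, tolerance) for e, r in pairs)
    return hits / len(pairs)
```

For example, an estimate of 130 kcal against a 100 kcal reference passes (30% deviation), while 145 kcal fails (45% deviation).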
Affiliation(s)
- Michael Haman
- Department of Humanities, Faculty of Economics and Management, Czech University of Life Sciences Prague, Prague, Czech Republic
- Milan Školník
- Department of Humanities, Faculty of Economics and Management, Czech University of Life Sciences Prague, Prague, Czech Republic
- Michal Lošťák
- Department of Humanities, Faculty of Economics and Management, Czech University of Life Sciences Prague, Prague, Czech Republic
36
Tunçer G, Güçlü KG. How Reliable is ChatGPT as a Novel Consultant in Infectious Diseases and Clinical Microbiology? Infect Dis Clin Microbiol 2024; 6:55-59. [PMID: 38633442] [PMCID: PMC11020004] [DOI: 10.36519/idcm.2024.286]
Abstract
Objective The study aimed to investigate the reliability of ChatGPT's answers to medical questions, including those sourced from patients and guideline recommendations, focusing on ChatGPT's accuracy in responding to various types of infectious disease questions. Materials and Methods The study used 200 questions sourced from social media, experts, and guidelines, related to various infectious diseases including urinary tract infection, pneumonia, HIV, various types of hepatitis, COVID-19, skin infections, and tuberculosis. The questions were edited for clarity and consistency by excluding repetitive or unclear ones. The answers were scored against guidelines from reputable sources such as the Infectious Diseases Society of America (IDSA), the Centers for Disease Control and Prevention (CDC), the European Association for the Study of the Liver (EASL), and the Joint United Nations Programme on HIV/AIDS (UNAIDS) AIDSinfo. According to the scoring system, completely correct answers were given 1 point and completely incorrect ones 4 points. To assess reproducibility, each question was posed twice on separate computers, and repeatability was determined by the consistency of the answers' scores. Results ChatGPT was posed 200 questions: 107 from social media platforms and 93 from guidelines. The questions covered a range of topics: urinary tract infections (n=18), pneumonia (n=22), HIV (n=39), hepatitis B and C (n=53), COVID-19 (n=11), skin and soft tissue infections (n=38), and tuberculosis (n=19). The lowest accuracy was 72%, for urinary tract infections. ChatGPT answered 92% of social media platform questions completely correctly (scored 1 point) versus 69% of guideline questions (p=0.001; OR=5.48, 95% CI=2.29-13.11). Conclusion Artificial intelligence is widely used in the medical field by both healthcare professionals and patients. Although ChatGPT answers questions from social media platforms quite accurately, we recommend that healthcare professionals exercise caution when using it.
Affiliation(s)
- Gülşah Tunçer
- Bilecik Training and Research Hospital, Bilecik, Türkiye
37
Birkun AA. Misinformation on resuscitation and first aid as an uncontrolled problem that demands close attention: a brief scoping review. Public Health 2024; 228:147-149. [PMID: 38354584] [DOI: 10.1016/j.puhe.2024.01.005]
Abstract
OBJECTIVES Misinformation is currently recognised by the World Health Organization as an apparent threat to public health. This study aimed to provide an outline of published evidence on misinformation related to the potentially life-saving interventions of first aid and cardiopulmonary resuscitation (CPR). STUDY DESIGN A scoping review. METHODS The review was conducted in accordance with the PRISMA Extension for Scoping Reviews. English-language publications describing original studies that evaluated the quality of publicly available information on first aid and/or CPR were included without limitations on the year of publication. RESULTS Forty-four original studies published between 1982 and 2023 were reviewed. The annual number of publications varied from 0 to 6. The studies focused on the evaluation of information concerning initial care of cardiac arrest, choking, heart attack, poisoning, burns, and other emergencies. Forty-three studies (97.7%) reported varying frequencies of misinformation, in which public sources, including websites, YouTube videos, and modern artificial intelligence-based chatbots, omitted life-saving instructions on first aid or CPR or contained incorrect information that contradicted relevant international guidelines. Eleven studies (25.0%) also revealed potentially harmful advice, which, if followed by an unsuspecting person, may cause direct injury or death of a victim. CONCLUSIONS Misinformation concerning CPR and first aid cannot be ignored and demands close attention from relevant stakeholders to mitigate its harmful impacts. More studies are urgently needed to determine optimal methods for detecting and measuring misinformation, to understand mechanisms that drive its spread, and to develop effective measures to correct and prevent misinformation.
Affiliation(s)
- A A Birkun
- Department of General Surgery, Anesthesiology, Resuscitation and Emergency Medicine, Medical Institute Named After S.I. Georgievsky of V.I. Vernadsky Crimean Federal University, Lenin Blvd, 5/7, Simferopol, 295051, Russian Federation.
38
Tao BK, Handzic A, Hua NJ, Vosoughi AR, Margolin EA, Micieli JA. Utility of ChatGPT for Automated Creation of Patient Education Handouts: An Application in Neuro-Ophthalmology. J Neuroophthalmol 2024; 44:119-124. [PMID: 38175720 DOI: 10.1097/wno.0000000000002074] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/05/2024]
Abstract
BACKGROUND Patient education in ophthalmology poses a challenge for physicians because of time and resource limitations. ChatGPT (OpenAI, San Francisco) may assist with automating production of patient handouts on common neuro-ophthalmic diseases. METHODS We queried ChatGPT-3.5 to generate 51 patient education handouts across 17 conditions. We devised the "Quality of Generated Language Outputs for Patients" (QGLOP) tool to assess handouts on the domains of accuracy/comprehensiveness, bias, currency, and tone, each scored out of 4 for a total of 16. A fellowship-trained neuro-ophthalmologist scored each passage. Handout readability was assessed using the Simple Measure of Gobbledygook (SMOG), which estimates the years of education required to understand a text. RESULTS The QGLOP scores for accuracy, bias, currency, and tone were 2.43, 3, 3.43, and 3.02, respectively. The mean QGLOP score was 11.9 [95% CI 8.98, 14.8] out of 16 points, indicating a performance of 74.4% [95% CI 56.1%, 92.5%]. The mean SMOG across responses was 10.9 [95% CI 9.36, 12.4] years of education. CONCLUSIONS The mean QGLOP score suggests that a fellowship-trained ophthalmologist may have at least a moderate level of satisfaction with the write-up quality conferred by ChatGPT. This still requires a final review and editing before dissemination. By comparison, the roughly 5% of responses at either extreme would require either very mild or extensive revision. Also, the mean SMOG score exceeded the accepted upper limit of a grade 8 reading level for health-related patient handouts. In its current iteration, ChatGPT should be used as an efficiency tool to generate an initial draft for the neuro-ophthalmologist, who may then refine the accuracy and readability for a lay readership.
Affiliation(s)
- Brendan K Tao
- Faculty of Medicine (BKT), The University of British Columbia, Vancouver, Canada ; Department of Ophthalmology & Vision Science (AH, EAM, JAM), University of Toronto, Toronto, Canada; Temerty Faculty of Medicine (NJH), University of Toronto, Toronto, Canada; Department of Ophthalmology (ARV), Max Rady College of Medicine, University of Manitoba, Winnipeg, Canada; Mount Sinai Hospital (EAM), Toronto, Canada; Division of Neurology (EAM, JAM), Department of Medicine, University of Toronto, Toronto, Canada; Toronto Western Hospital (EAM, JAM), Toronto, Canada; University Health Network (EAM, JAM), Toronto, Canada; Kensington Vision and Research Center (JAM), Toronto, Canada; and St. Michael's Hospital (JAM), Toronto, Canada
39
Liu Z, Zhang L, Wu Z, Yu X, Cao C, Dai H, Liu N, Liu J, Liu W, Li Q, Shen D, Li X, Zhu D, Liu T. Surviving ChatGPT in healthcare. FRONTIERS IN RADIOLOGY 2024; 3:1224682. [PMID: 38464946 PMCID: PMC10920216 DOI: 10.3389/fradi.2023.1224682] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 05/18/2023] [Accepted: 07/25/2023] [Indexed: 03/12/2024]
Abstract
At the dawn of Artificial General Intelligence (AGI), the emergence of large language models such as ChatGPT shows promise in revolutionizing healthcare by improving patient care, expanding medical access, and optimizing clinical processes. However, their integration into healthcare systems requires careful consideration of potential risks, such as inaccurate medical advice, patient privacy violations, the creation of falsified documents or images, overreliance on AGI in medical education, and the perpetuation of biases. It is crucial to implement proper oversight and regulation to address these risks, ensuring the safe and effective incorporation of AGI technologies into healthcare systems. By acknowledging and mitigating these challenges, AGI can be harnessed to enhance patient care, medical knowledge, and healthcare processes, ultimately benefiting society as a whole.
Affiliation(s)
- Zhengliang Liu
- School of Computing, University of Georgia, Athens, GA, United States
| | - Lu Zhang
- Department of Computer Science and Engineering, The University of Texas at Arlington, Arlington, TX, United States
| | - Zihao Wu
- School of Computing, University of Georgia, Athens, GA, United States
| | - Xiaowei Yu
- Department of Computer Science and Engineering, The University of Texas at Arlington, Arlington, TX, United States
| | - Chao Cao
- Department of Computer Science and Engineering, The University of Texas at Arlington, Arlington, TX, United States
| | - Haixing Dai
- School of Computing, University of Georgia, Athens, GA, United States
| | - Ninghao Liu
- School of Computing, University of Georgia, Athens, GA, United States
| | - Jun Liu
- Department of Radiology, Second Xiangya Hospital, Changsha, Hunan, China
| | - Wei Liu
- Department of Radiation Oncology, Mayo Clinic, Scottsdale, AZ, United States
| | - Quanzheng Li
- Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, United States
| | - Dinggang Shen
- School of Biomedical Engineering, ShanghaiTech University, Shanghai, China
- Department of Research and Development, Shanghai United Imaging Intelligence Co., Ltd., Shanghai, China
- Shanghai Clinical Research and Trial Center, Shanghai, China
| | - Xiang Li
- Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, United States
| | - Dajiang Zhu
- Department of Computer Science and Engineering, The University of Texas at Arlington, Arlington, TX, United States
| | - Tianming Liu
- School of Computing, University of Georgia, Athens, GA, United States
40
Denecke K, May R, Rivera-Romero O. Transformer Models in Healthcare: A Survey and Thematic Analysis of Potentials, Shortcomings and Risks. J Med Syst 2024; 48:23. [PMID: 38367119 PMCID: PMC10874304 DOI: 10.1007/s10916-024-02043-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/17/2023] [Accepted: 02/10/2024] [Indexed: 02/19/2024]
Abstract
Large Language Models (LLMs) such as the Generative Pre-trained Transformer (GPT) and Bidirectional Encoder Representations from Transformers (BERT), which use transformer model architectures, have significantly advanced artificial intelligence and natural language processing. Recognized for their ability to capture associative relationships between words based on shared context, these models are poised to transform healthcare by improving diagnostic accuracy, tailoring treatment plans, and predicting patient outcomes. However, there are multiple risks and potentially unintended consequences associated with their use in healthcare applications. This study, conducted with 28 participants using a qualitative approach, explores the benefits, shortcomings, and risks of using transformer models in healthcare. It analyses responses to seven open-ended questions using a simplified thematic analysis. Our research reveals seven benefits, including improved operational efficiency, optimized processes and refined clinical documentation. Despite these benefits, there are significant concerns about the introduction of bias, auditability issues and privacy risks. Challenges include the need for specialized expertise, the emergence of ethical dilemmas and the potential reduction in the human element of patient care. For the medical profession, risks include the impact on employment, changes in the patient-doctor dynamic, and the need for extensive training in both system operation and data interpretation.
Affiliation(s)
- Kerstin Denecke
- Institute Patient-centered Digital Health, Bern University of Applied Sciences, Quellgasse 21, Biel, 2502, Switzerland.
| | - Richard May
- Harz University of Applied Sciences, Friedrichstraße 57-59, 38855, Wernigerode, Germany
| | - Octavio Rivera-Romero
- Instituto de Ingeniería Informática (I3US), Universidad de Sevilla, Sevilla, Spain
- Department of Electronic Technology, Universidad de Sevilla, Avda Reina Mercedes s/n, ETSI Informática, G1.43, Sevilla, 41012, Spain
41
Raman R, Kumar Nair V, Nedungadi P, Kumar Sahu A, Kowalski R, Ramanathan S, Achuthan K. Fake news research trends, linkages to generative artificial intelligence and sustainable development goals. Heliyon 2024; 10:e24727. [PMID: 38322879 PMCID: PMC10844021 DOI: 10.1016/j.heliyon.2024.e24727] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/05/2023] [Revised: 12/14/2023] [Accepted: 01/12/2024] [Indexed: 02/08/2024] Open
Abstract
In the digital age, where information is a cornerstone for decision-making, social media's not-so-regulated environment has intensified the prevalence of fake news, with significant implications for both individuals and societies. This study employs a bibliometric analysis of a large corpus of 9678 publications spanning 2013-2022 to scrutinize the evolution of fake news research, identifying leading authors, institutions, and nations. Three thematic clusters emerge: Disinformation in social media, COVID-19-induced infodemics, and techno-scientific advancements in auto-detection. This work introduces three novel contributions: 1) a pioneering mapping of fake news research to Sustainable Development Goals (SDGs), indicating its influence on areas like health (SDG 3), peace (SDG 16), and industry (SDG 9); 2) the utilization of Prominence percentile metrics to discern critical and economically prioritized research areas, such as misinformation and object detection in deep learning; and 3) an evaluation of generative AI's role in the propagation and realism of fake news, raising pressing ethical concerns. These contributions collectively provide a comprehensive overview of the current state and future trajectories of fake news research, offering valuable insights for academia, policymakers, and industry.
Affiliation(s)
- Raghu Raman
- Amrita School of Business, Amrita Vishwa Vidyapeetham, Amritapuri, Kerala, 690525, India
| | - Vinith Kumar Nair
- Amrita School of Business, Amrita Vishwa Vidyapeetham, Amritapuri, Kerala, 690525, India
| | - Prema Nedungadi
- Amrita School of Computing, Amrita Vishwa Vidyapeetham, Amritapuri, Kerala, 690525, India
| | - Aditya Kumar Sahu
- Amrita School of Computing, Amrita Vishwa Vidyapeetham, Amaravati, Andhra Pradesh, 522503, India
| | - Robin Kowalski
- College of Behavioral, Social and Health Sciences, Clemson University, Clemson, SC, 29634, USA
| | - Sasangan Ramanathan
- Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, Tamilnadu, 641112, India
| | - Krishnashree Achuthan
- Center for Cybersecurity Systems and Networks, Amrita Vishwa Vidyapeetham, Amritapuri, Kerala, 690525, India
42
McMahon HV, McMahon BD. Automating untruths: ChatGPT, self-managed medication abortion, and the threat of misinformation in a post- Roe world. Front Digit Health 2024; 6:1287186. [PMID: 38419805 PMCID: PMC10900507 DOI: 10.3389/fdgth.2024.1287186] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/01/2023] [Accepted: 01/26/2024] [Indexed: 03/02/2024] Open
Abstract
Background ChatGPT is a generative artificial intelligence chatbot that uses natural language processing to understand and execute prompts in a human-like manner. While the chatbot has become popular as a source of information among the public, experts have expressed concerns about the number of false and misleading statements made by ChatGPT. Many people search online for information about self-managed medication abortion, which has become even more common following the overturning of Roe v. Wade. It is likely that ChatGPT is also being used as a source of this information; however, little is known about its accuracy. Objective To assess the accuracy of ChatGPT responses to common questions regarding self-managed abortion safety and the process of using abortion pills. Methods We prompted ChatGPT with 65 questions about self-managed medication abortion, which produced approximately 11,000 words of text. We qualitatively coded all data in MAXQDA and performed thematic analysis. Results ChatGPT responses correctly described clinician-managed medication abortion as both safe and effective. In contrast, self-managed medication abortion was inaccurately described as dangerous and associated with an increase in the risk of complications, which was attributed to the lack of clinician supervision. Conclusion ChatGPT repeatedly provided responses that overstated the risk of complications associated with self-managed medication abortion in ways that directly contradict the expansive body of evidence demonstrating that self-managed medication abortion is both safe and effective. The chatbot's tendency to perpetuate health misinformation and associated stigma regarding self-managed medication abortions poses a threat to public health and reproductive autonomy.
Affiliation(s)
- Hayley V. McMahon
- Department of Behavioral, Social, and Health Education Sciences, Emory University Rollins School of Public Health, Atlanta, GA, United States
- The Center for Reproductive Health Research in the Southeast, Emory University Rollins School of Public Health, Atlanta, GA, United States
43
Morita PP, Lotto M, Kaur J, Chumachenko D, Oetomo A, Espiritu KD, Hussain IZ. What is the impact of artificial intelligence-based chatbots on infodemic management? Front Public Health 2024; 12:1310437. [PMID: 38414895 PMCID: PMC10896940 DOI: 10.3389/fpubh.2024.1310437] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/09/2023] [Accepted: 01/31/2024] [Indexed: 02/29/2024] Open
Abstract
Artificial intelligence (AI) chatbots have the potential to revolutionize online health information-seeking behavior by delivering up-to-date information on a wide range of health topics. They generate personalized responses to user queries through their ability to process extensive amounts of text, analyze trends, and generate natural language responses. Chatbots can help manage infodemics by debunking online health misinformation on a large scale. Nevertheless, system accuracy remains technically challenging. Chatbots require training on diverse and representative datasets, security to protect against malicious actors, and updates to keep up-to-date with scientific progress. Therefore, although AI chatbots hold significant potential in assisting infodemic management, it is essential to approach their outputs with caution due to their current limitations.
Affiliation(s)
- Plinio P. Morita
- School of Public Health Sciences, University of Waterloo, Waterloo, ON, Canada
- Department of Systems Design Engineering, University of Waterloo, Waterloo, ON, Canada
- Research Institute for Aging, University of Waterloo, Waterloo, ON, Canada
- Centre for Digital Therapeutics, Techna Institute, University Health Network, Toronto, ON, Canada
- Institute of Health Policy, Management, and Evaluation, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
| | - Matheus Lotto
- School of Public Health Sciences, University of Waterloo, Waterloo, ON, Canada
- Department of Pediatric Dentistry, Orthodontics, and Public Health, Bauru School of Dentistry, University of São Paulo, Bauru, Brazil
| | - Jasleen Kaur
- School of Public Health Sciences, University of Waterloo, Waterloo, ON, Canada
| | - Dmytro Chumachenko
- School of Public Health Sciences, University of Waterloo, Waterloo, ON, Canada
- Department of Mathematical Modelling and Artificial Intelligence, National Aerospace University “Kharkiv Aviation Institute”, Kharkiv, Ukraine
| | - Arlene Oetomo
- School of Public Health Sciences, University of Waterloo, Waterloo, ON, Canada
44
Alipour S, Galeazzi A, Sangiorgio E, Avalle M, Bojic L, Cinelli M, Quattrociocchi W. Cross-platform social dynamics: an analysis of ChatGPT and COVID-19 vaccine conversations. Sci Rep 2024; 14:2789. [PMID: 38307909 PMCID: PMC10837143 DOI: 10.1038/s41598-024-53124-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/17/2023] [Accepted: 01/29/2024] [Indexed: 02/04/2024] Open
Abstract
The role of social media in information dissemination and agenda-setting has significantly expanded in recent years. By offering real-time interactions, online platforms have become invaluable tools for studying societal responses to significant events as they unfold. However, online reactions to external developments are influenced by various factors, including the nature of the event and the online environment. This study examines the dynamics of public discourse on digital platforms to shed light on this issue. We analyzed over 12 million posts and news articles related to two significant events: the release of ChatGPT in 2022 and the global discussions about COVID-19 vaccines in 2021. Data were collected from multiple platforms, including Twitter, Facebook, Instagram, Reddit, YouTube, and GDELT. We employed topic modeling techniques to uncover the distinct thematic emphases on each platform, which reflect their specific features and target audiences. Additionally, sentiment analysis revealed various public perceptions regarding the topics studied. Lastly, we compared the evolution of engagement across platforms, unveiling unique patterns for the same topic. Notably, discussions about COVID-19 vaccines spread more rapidly due to the immediacy of the subject, while discussions about ChatGPT, despite its technological importance, propagated more gradually.
Affiliation(s)
- Shayan Alipour
- Department of Computer Science, Sapienza University of Rome, Rome, Italy.
| | | | - Emanuele Sangiorgio
- Department of Social Sciences and Economics, Sapienza University of Rome, Rome, Italy
| | - Michele Avalle
- Department of Computer Science, Sapienza University of Rome, Rome, Italy
| | - Ljubisa Bojic
- The Institute for Artificial Intelligence Research and Development of Serbia, Beograd, Serbia
- Institute for Philosophy and Social Theory, University of Belgrade, Beograd, Serbia
| | - Matteo Cinelli
- Department of Computer Science, Sapienza University of Rome, Rome, Italy
45
Khene ZE, Bigot P, Mathieu R, Rouprêt M, Bensalah K. Development of a Personalized Chat Model Based on the European Association of Urology Oncology Guidelines: Harnessing the Power of Generative Artificial Intelligence in Clinical Practice. Eur Urol Oncol 2024; 7:160-162. [PMID: 37474402 DOI: 10.1016/j.euo.2023.06.009] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/14/2023] [Revised: 06/22/2023] [Accepted: 06/28/2023] [Indexed: 07/22/2023]
Affiliation(s)
| | - Pierre Bigot
- Department of Urology, University of Angers, Angers, France
| | - Romain Mathieu
- Department of Urology, Rennes University Hospital, Rennes, France
| | - Morgan Rouprêt
- Department of Urology, La Pitié-Salpétrière Hospital, Paris, France
| | - Karim Bensalah
- Department of Urology, Rennes University Hospital, Rennes, France.
46
Kapsali MZ, Livanis E, Tsalikidis C, Oikonomou P, Voultsos P, Tsaroucha A. Ethical Concerns About ChatGPT in Healthcare: A Useful Tool or the Tombstone of Original and Reflective Thinking? Cureus 2024; 16:e54759. [PMID: 38523987 PMCID: PMC10961144 DOI: 10.7759/cureus.54759] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 02/23/2024] [Indexed: 03/26/2024] Open
Abstract
Artificial intelligence (AI), the fast-rising field of computer science aiming to create digital systems with human-like behavior and intelligence, seems to have invaded almost every field of modern life. Launched in November 2022, ChatGPT (Chat Generative Pre-trained Transformer) is a textual AI application capable of creating human-like responses characterized by original language and high coherence. Although AI-based language models have demonstrated impressive capabilities in healthcare, ChatGPT has received controversial annotations from the scientific and academic communities. This chatbot already appears to have a massive impact as an educational tool for healthcare professionals and transformative potential for clinical practice and could lead to dramatic changes in scientific research. Nevertheless, rational concerns were raised regarding whether the pre-trained, AI-generated text would be a menace not only for original thinking and new scientific ideas but also for academic and research integrity, as it becomes more and more difficult to distinguish its AI origin due to the coherence and fluency of the produced text. This short review aims to summarize the potential applications and the consequential implications of ChatGPT in the three critical pillars of medicine: education, research, and clinical practice. In addition, this paper discusses whether the current use of this chatbot is in compliance with the ethical principles for the safe use of AI in healthcare, as determined by the World Health Organization. Finally, this review highlights the need for an updated ethical framework and the increased vigilance of healthcare stakeholders to harvest the potential benefits and limit the imminent dangers of this new innovative technology.
Affiliation(s)
- Marina Z Kapsali
- Postgraduate Program on Bioethics, Laboratory of Bioethics, Democritus University of Thrace, Alexandroupolis, GRC
| | - Efstratios Livanis
- Department of Accounting and Finance, University of Macedonia, Thessaloniki, GRC
| | - Christos Tsalikidis
- Department of General Surgery, Democritus University of Thrace, Alexandroupolis, GRC
| | - Panagoula Oikonomou
- Laboratory of Experimental Surgery, Department of General Surgery, Democritus University of Thrace, Alexandroupolis, GRC
| | - Polychronis Voultsos
- Laboratory of Forensic Medicine & Toxicology (Medical Law and Ethics), School of Medicine, Faculty of Health Sciences, Aristotle University of Thessaloniki, Thessaloniki, GRC
| | - Aleka Tsaroucha
- Department of General Surgery, Democritus University of Thrace, Alexandroupolis, GRC
47
Sezgin E. Redefining Virtual Assistants in Health Care: The Future With Large Language Models. J Med Internet Res 2024; 26:e53225. [PMID: 38241074 PMCID: PMC10837753 DOI: 10.2196/53225] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/29/2023] [Revised: 12/25/2023] [Accepted: 01/02/2024] [Indexed: 01/23/2024] Open
Abstract
This editorial explores the evolving and transformative role of large language models (LLMs) in enhancing the capabilities of virtual assistants (VAs) in the health care domain, highlighting recent research on the performance of VAs and LLMs in health care information sharing. Focusing on recent research, this editorial unveils the marked improvement in the accuracy and clinical relevance of responses from LLMs, such as GPT-4, compared to current VAs, especially in addressing complex health care inquiries, like those related to postpartum depression. The improved accuracy and clinical relevance with LLMs mark a paradigm shift in digital health tools and VAs. Furthermore, such LLM applications have the potential to dynamically adapt and be integrated into existing VA platforms, offering cost-effective, scalable, and inclusive solutions. These suggest a significant increase in the applicable range of VA applications, as well as the increased value, risk, and impact in health care, moving toward more personalized digital health ecosystems. However, alongside these advancements, it is necessary to develop and adhere to ethical guidelines, regulatory frameworks, governance principles, and privacy and safety measures. We need a robust interdisciplinary collaboration to navigate the complexities of safely and effectively integrating LLMs into health care applications, ensuring that these emerging technologies align with the diverse needs and ethical considerations of the health care domain.
Affiliation(s)
- Emre Sezgin
- The Abigail Wexner Research Institute at Nationwide Children's Hospital, Columbus, OH, United States
- The Ohio State University College of Medicine, Columbus, OH, United States
48
Davies NP, Wilson R, Winder MS, Tunster SJ, McVicar K, Thakrar S, Williams J, Reid A. ChatGPT sits the DFPH exam: large language model performance and potential to support public health learning. BMC MEDICAL EDUCATION 2024; 24:57. [PMID: 38212802 PMCID: PMC10782695 DOI: 10.1186/s12909-024-05042-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/26/2023] [Accepted: 01/06/2024] [Indexed: 01/13/2024]
Abstract
BACKGROUND Artificial intelligence-based large language models, like ChatGPT, have been rapidly assessed for both risks and potential in health-related assessment and learning. However, their applications in public health professional exams have not yet been studied. We evaluated the performance of ChatGPT in part of the Faculty of Public Health's Diplomate exam (DFPH). METHODS ChatGPT was provided with a bank of 119 publicly available DFPH question parts from past papers. Its performance was assessed by two active DFPH examiners. The degree of insight and level of understanding apparently displayed by ChatGPT was also assessed. RESULTS ChatGPT passed 3 of 4 papers, surpassing the current pass rate. It performed best on questions relating to research methods. Its answers had a high floor. Examiners identified ChatGPT answers with 73.6% accuracy and human answers with 28.6% accuracy. ChatGPT provided a mean of 3.6 unique insights per question and appeared to demonstrate a required level of learning on 71.4% of occasions. CONCLUSIONS Large language models have rapidly increasing potential as a learning tool in public health education. However, their factual fallibility and the difficulty of distinguishing their responses from those of humans pose potential threats to teaching and learning.
Affiliation(s)
- Nathan P Davies
- Nottingham Centre for Public Health and Epidemiology, University of Nottingham, Nottingham City Hospital, Hucknall Rd, Nottingham, NG5 1PB, England.
| | - Robert Wilson
- NHS England, Seaton House, City Link, London Road, Nottingham, NG2 4LA, England
| | - Madeleine S Winder
- Nottingham Centre for Public Health and Epidemiology, University of Nottingham, Nottingham City Hospital, Hucknall Rd, Nottingham, NG5 1PB, England
| | - Simon J Tunster
- Nottingham Centre for Public Health and Epidemiology, University of Nottingham, Nottingham City Hospital, Hucknall Rd, Nottingham, NG5 1PB, England
| | - Kathryn McVicar
- Nottingham Centre for Public Health and Epidemiology, University of Nottingham, Nottingham City Hospital, Hucknall Rd, Nottingham, NG5 1PB, England
| | - Shivan Thakrar
- Leicester City Council, Public Health, 115 Charles Street, Leicester, LE1 1FZ, England
| | - Joe Williams
- School of Health and Related Research (ScHARR), The University of Sheffield, 30 Regent St, Sheffield, S1 4DA, England
| | - Allan Reid
- NHS England, Seaton House, City Link, London Road, Nottingham, NG2 4LA, England
49
Toro-Hernández FD, Migeot J, Marchant N, Olivares D, Ferrante F, González-Gómez R, González Campo C, Fittipaldi S, Rojas-Costa GM, Moguilner S, Slachevsky A, Chaná Cuevas P, Ibáñez A, Chaigneau S, García AM. Neurocognitive correlates of semantic memory navigation in Parkinson's disease. NPJ Parkinsons Dis 2024; 10:15. [PMID: 38195756 PMCID: PMC10776628 DOI: 10.1038/s41531-024-00630-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/14/2023] [Accepted: 12/29/2023] [Indexed: 01/11/2024] Open
Abstract
Cognitive studies on Parkinson's disease (PD) reveal abnormal semantic processing. Most research, however, fails to indicate which conceptual properties are most affected and capture patients' neurocognitive profiles. Here, we asked persons with PD, healthy controls, and individuals with behavioral variant frontotemporal dementia (bvFTD, as a disease control group) to read concepts (e.g., 'sun') and list their features (e.g., hot). Responses were analyzed in terms of ten word properties (including concreteness, imageability, and semantic variability), used for group-level comparisons, subject-level classification, and brain-behavior correlations. PD (but not bvFTD) patients produced more concrete and imageable words than controls, both patterns being associated with overall cognitive status. PD and bvFTD patients showed reduced semantic variability, an anomaly which predicted semantic inhibition outcomes. Word-property patterns robustly classified PD (but not bvFTD) patients and correlated with disease-specific hypoconnectivity along the sensorimotor and salience networks. Fine-grained semantic assessments, then, can reveal distinct neurocognitive signatures of PD.
Affiliation(s)
- Felipe Diego Toro-Hernández
- Graduate Program in Neuroscience and Cognition, Federal University of ABC, São Paulo, Brazil
- Center for Social and Cognitive Neuroscience, School of Psychology, Universidad Adolfo Ibáñez, Santiago, Chile
- Joaquín Migeot
- Center for Social and Cognitive Neuroscience, School of Psychology, Universidad Adolfo Ibáñez, Santiago, Chile
- Latin American Brain Health Institute, Universidad Adolfo Ibáñez, Santiago, Chile
- Nicolás Marchant
- Center for Social and Cognitive Neuroscience, School of Psychology, Universidad Adolfo Ibáñez, Santiago, Chile
- Daniela Olivares
- Center for Social and Cognitive Neuroscience, School of Psychology, Universidad Adolfo Ibáñez, Santiago, Chile
- Laboratorio de Neuropsicología y Neurociencias Clínicas, Universidad de Chile, Santiago, Chile
- Franco Ferrante
- Cognitive Neuroscience Center, Universidad de San Andrés, Buenos Aires, Argentina
- National Scientific and Technical Research Council, Buenos Aires, Argentina
- Facultad de Ingeniería, Universidad de Buenos Aires, Buenos Aires, Argentina
- Raúl González-Gómez
- Center for Social and Cognitive Neuroscience, School of Psychology, Universidad Adolfo Ibáñez, Santiago, Chile
- Latin American Brain Health Institute, Universidad Adolfo Ibáñez, Santiago, Chile
- Cecilia González Campo
- Cognitive Neuroscience Center, Universidad de San Andrés, Buenos Aires, Argentina
- National Scientific and Technical Research Council, Buenos Aires, Argentina
- Sol Fittipaldi
- Latin American Brain Health Institute, Universidad Adolfo Ibáñez, Santiago, Chile
- Cognitive Neuroscience Center, Universidad de San Andrés, Buenos Aires, Argentina
- Global Brain Health Institute, University of California, San Francisco, California, USA, and Trinity College Dublin, Ireland
- Gonzalo M Rojas-Costa
- Department of Radiology, Clínica las Condes, Santiago, Chile
- Advanced Epilepsy Center, Clínica las Condes, Santiago, Chile
- Joint Unit FISABIO-CIPF, Valencia, Spain
- School of Medicine, Finis Terrae University, Santiago, Chile
- Health Innovation Center, Clínica Las Condes, Santiago, Chile
- Sebastian Moguilner
- Global Brain Health Institute, University of California, San Francisco, California, USA, and Trinity College Dublin, Ireland
- Andrea Slachevsky
- Memory and Neuropsychiatric Center (CMYN), Neurology Department, Hospital del Salvador & Faculty of Medicine, University of Chile, Santiago, Chile
- Geroscience Center for Brain Health and Metabolism (GERO), Santiago, Chile
- Neuropsychology and Clinical Neuroscience Laboratory (LANNEC), Physiopathology Program - Institute of Biomedical Sciences (ICBM), Neuroscience and East Neuroscience Departments, Faculty of Medicine, University of Chile, Santiago, Chile
- Neurology and Psychiatry Department, Clínica Alemana-Universidad Desarrollo, Santiago, Chile
- Pedro Chaná Cuevas
- Facultad de Ciencias Médicas, Universidad de Santiago de Chile, Santiago, Chile
- Agustín Ibáñez
- Latin American Brain Health Institute, Universidad Adolfo Ibáñez, Santiago, Chile
- Cognitive Neuroscience Center, Universidad de San Andrés, Buenos Aires, Argentina
- Global Brain Health Institute, University of California, San Francisco, California, USA, and Trinity College Dublin, Ireland
- Sergio Chaigneau
- Center for Social and Cognitive Neuroscience, School of Psychology, Universidad Adolfo Ibáñez, Santiago, Chile
- Center for Cognition Research, School of Psychology, Universidad Adolfo Ibáñez, Santiago, Chile
- Adolfo M García
- Latin American Brain Health Institute, Universidad Adolfo Ibáñez, Santiago, Chile.
- Cognitive Neuroscience Center, Universidad de San Andrés, Buenos Aires, Argentina.
- Global Brain Health Institute, University of California, San Francisco, California, USA, and Trinity College Dublin, Ireland.
- Departamento de Lingüística y Literatura, Facultad de Humanidades, Universidad de Santiago de Chile, Santiago, Chile.
50
Gravina AG, Pellegrino R, Cipullo M, Palladino G, Imperio G, Ventura A, Auletta S, Ciamarra P, Federico A. May ChatGPT be a tool producing medical information for common inflammatory bowel disease patients' questions? An evidence-controlled analysis. World J Gastroenterol 2024; 30:17-33. [PMID: 38293321 PMCID: PMC10823903 DOI: 10.3748/wjg.v30.i1.17] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2023] [Revised: 12/07/2023] [Accepted: 12/28/2023] [Indexed: 01/06/2024] Open
Abstract
Artificial intelligence is increasingly entering everyday healthcare. Large language model (LLM) systems such as Chat Generative Pre-trained Transformer (ChatGPT) have become potentially accessible to everyone, including patients with inflammatory bowel diseases (IBD). However, significant ethical issues and pitfalls exist in innovative LLM tools. The hype generated by such systems may lead to unwarranted patient trust in them. Therefore, it is necessary to understand whether LLMs (trendy ones, such as ChatGPT) can produce plausible medical information (MI) for patients. This review examined ChatGPT's potential to provide MI regarding questions commonly addressed by patients with IBD to their gastroenterologists. A review of ChatGPT's outputs showed that the tool has some attractive potential but also significant limitations: its information can be outdated or insufficiently detailed and, in some cases, inaccurate. Further studies and refinement of ChatGPT, possibly aligning its outputs with the leading medical evidence provided by reliable databases, are needed.
Affiliation(s)
- Antonietta Gerarda Gravina
- Division of Hepatogastroenterology, Department of Precision Medicine, University of Campania Luigi Vanvitelli, Naples 80138, Italy
- Raffaele Pellegrino
- Division of Hepatogastroenterology, Department of Precision Medicine, University of Campania Luigi Vanvitelli, Naples 80138, Italy
- Marina Cipullo
- Division of Hepatogastroenterology, Department of Precision Medicine, University of Campania Luigi Vanvitelli, Naples 80138, Italy
- Giovanna Palladino
- Division of Hepatogastroenterology, Department of Precision Medicine, University of Campania Luigi Vanvitelli, Naples 80138, Italy
- Giuseppe Imperio
- Division of Hepatogastroenterology, Department of Precision Medicine, University of Campania Luigi Vanvitelli, Naples 80138, Italy
- Andrea Ventura
- Division of Hepatogastroenterology, Department of Precision Medicine, University of Campania Luigi Vanvitelli, Naples 80138, Italy
- Salvatore Auletta
- Division of Hepatogastroenterology, Department of Precision Medicine, University of Campania Luigi Vanvitelli, Naples 80138, Italy
- Paola Ciamarra
- Division of Hepatogastroenterology, Department of Precision Medicine, University of Campania Luigi Vanvitelli, Naples 80138, Italy
- Alessandro Federico
- Division of Hepatogastroenterology, Department of Precision Medicine, University of Campania Luigi Vanvitelli, Naples 80138, Italy