1
Hassona Y, Alqaisi D, Al-Haddad A, Georgakopoulou EA, Malamos D, Alrashdan MS, Sawair F. How good is ChatGPT at answering patients' questions related to early detection of oral (mouth) cancer? Oral Surg Oral Med Oral Pathol Oral Radiol 2024; 138:269-278. PMID: 38714483. DOI: 10.1016/j.oooo.2024.04.010.
Abstract
OBJECTIVES To examine the quality, reliability, readability, and usefulness of ChatGPT in promoting early detection of oral cancer. STUDY DESIGN A total of 108 patient-oriented questions about early detection of oral cancer were compiled from expert panels, professional societies, and web-based tools. Questions were categorized into 4 topic domains, and ChatGPT 3.5 was asked each question independently. ChatGPT answers were evaluated for quality, readability, actionability, and usefulness; two experienced reviewers independently assessed each response. RESULTS Questions related to clinical appearance constituted 36.1% (n = 39) of the total questions. ChatGPT provided "very useful" responses to the majority of questions (75%; n = 81). The mean Global Quality Score was 4.24 ± 1.3 of 5. The mean reliability score was 23.17 ± 9.87 of 25. The mean understandability score was 76.6% ± 25.9% of 100, while the mean actionability score was 47.3% ± 18.9% of 100. The mean FKS reading ease score was 38.4% ± 29.9%, while the mean SMOG index readability score was 11.65 ± 8.4. No misleading information was identified among ChatGPT responses. CONCLUSION ChatGPT is an attractive and potentially useful resource for informing patients about early detection of oral cancer. Nevertheless, concerns remain about the readability and actionability of the information it offers.
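The readability indices reported in this abstract follow standard published formulas. As an illustrative sketch (not the study's own code), the Flesch reading ease score and the SMOG index can be computed from simple text counts; the example counts below are hypothetical:

```python
import math

def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    # Standard Flesch formula: higher scores mean easier text; scores below
    # ~60 indicate fairly difficult prose (the study's mean of 38.4 falls in
    # the "difficult" band).
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def smog_index(polysyllables: int, sentences: int) -> float:
    # SMOG grade: estimates the US school grade needed to understand a text
    # from its count of 3+-syllable words, normalized to a 30-sentence sample.
    return 1.0430 * math.sqrt(polysyllables * 30 / sentences) + 3.1291

# A hypothetical 100-word, 5-sentence passage with 150 syllables and
# 15 polysyllabic words across 30 sentences:
print(flesch_reading_ease(100, 5, 150))  # ~59.6, "fairly difficult" range
print(smog_index(15, 30))                # ~7.2, roughly 7th-grade level
```

In practice the word, sentence, and syllable counts would come from a text-analysis tool rather than being supplied by hand.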
Affiliation(s)
- Yazan Hassona
- Faculty of Dentistry, Centre for Oral Diseases Studies (CODS), Al-Ahliyya Amman University, Jordan; School of Dentistry, The University of Jordan, Jordan.
- Dua'a Alqaisi
- School of Dentistry, The University of Jordan, Jordan
- Eleni A Georgakopoulou
- Molecular Carcinogenesis Group, Department of Histology and Embryology, Medical School, National and Kapodistrian University of Athens, Greece
- Dimitris Malamos
- Oral Medicine Clinic of the National Organization for the Provision of Health, Athens, Greece
- Mohammad S Alrashdan
- Department of Oral and Craniofacial Health Sciences, College of Dental Medicine, University of Sharjah, Sharjah, United Arab Emirates
- Faleh Sawair
- School of Dentistry, The University of Jordan, Jordan
2
Ihara K, Dumkrieger G, Zhang P, Takizawa T, Schwedt TJ, Chiang CC. Application of Artificial Intelligence in the Headache Field. Curr Pain Headache Rep 2024. PMID: 38976174. DOI: 10.1007/s11916-024-01297-5.
Abstract
PURPOSE OF REVIEW Headache disorders are highly prevalent worldwide. Rapidly advancing capabilities in artificial intelligence (AI) have expanded headache-related research, with the potential to address unmet needs in the headache field. We provide an overview of AI in headache research in this article. RECENT FINDINGS We briefly introduce machine learning models and commonly used evaluation metrics. We then review studies that have utilized AI in the field to advance diagnostic accuracy and classification, predict treatment responses, gather insights from various data sources, and forecast migraine attacks. Furthermore, given the emergence and popularity of ChatGPT, a type of large language model (LLM), we also discuss how LLMs could be used to advance the field. Finally, we discuss the potential pitfalls, bias, and future directions of employing AI in headache medicine. Many recent studies in headache medicine have incorporated machine learning, generative AI, and LLMs. A comprehensive understanding of potential pitfalls and biases is crucial to using these novel techniques with minimal harm. When used appropriately, AI has the potential to revolutionize headache medicine.
Affiliation(s)
- Keiko Ihara
- Department of Neurology, Keio University School of Medicine, Shinjuku, Tokyo, Japan
- Japanese Red Cross Ashikaga Hospital, Ashikaga, Tochigi, Japan
- Pengfei Zhang
- Department of Neurology, Rutgers University, New Brunswick, NJ, USA
- Tsubasa Takizawa
- Department of Neurology, Keio University School of Medicine, Shinjuku, Tokyo, Japan
- Todd J Schwedt
- Department of Neurology, Mayo Clinic, Scottsdale, AZ, USA
3
Haltaufderheide J, Ranisch R. The ethics of ChatGPT in medicine and healthcare: a systematic review on Large Language Models (LLMs). NPJ Digit Med 2024; 7:183. PMID: 38977771. PMCID: PMC11231310. DOI: 10.1038/s41746-024-01157-x.
Abstract
With the introduction of ChatGPT, Large Language Models (LLMs) have received enormous attention in healthcare. Despite potential benefits, researchers have underscored various ethical implications. While individual instances have garnered attention, a systematic and comprehensive overview of practical applications currently researched, and of the ethical issues connected to them, is lacking. Against this background, this work maps the ethical landscape surrounding the current deployment of LLMs in medicine and healthcare through a systematic review. Electronic databases and preprint servers were queried using a comprehensive search strategy, which generated 796 records. Studies were screened and extracted following a modified rapid review approach. Methodological quality was assessed using a hybrid approach. For 53 records, a meta-aggregative synthesis was performed. Four general fields of application emerged, showcasing a dynamic exploration phase. Advantages of using LLMs are attributed to their capacity for data analysis, information provisioning, support in decision-making, mitigating information loss, and enhancing information accessibility. However, our study also identifies recurrent ethical concerns connected to fairness, bias, non-maleficence, transparency, and privacy. A distinctive concern is the tendency to produce harmful or convincing but inaccurate content. Calls for ethical guidance and human oversight are recurrent. We suggest that the ethical guidance debate should be reframed to focus on defining what constitutes acceptable human oversight across the spectrum of applications. This involves considering the diversity of settings, varying potentials for harm, and different acceptable thresholds for performance and certainty in healthcare. Additionally, critical inquiry is needed to evaluate the necessity and justification of LLMs' current experimental use.
Affiliation(s)
- Joschka Haltaufderheide
- Faculty of Health Sciences Brandenburg, University of Potsdam, Am Mühlenberg 9, Potsdam, 14476, Germany
- Robert Ranisch
- Faculty of Health Sciences Brandenburg, University of Potsdam, Am Mühlenberg 9, Potsdam, 14476, Germany
4
Hassona Y, Alqaisi DA. "My kid has autism": An interesting conversation with ChatGPT. Spec Care Dentist 2024; 44:1296-1299. PMID: 38415857. DOI: 10.1111/scd.12983.
Affiliation(s)
- Yazan Hassona
- Faculty of Dentistry, Centre for Oral Diseases Studies, Al-Ahliyya Amman University, Amman, Jordan
- School of Dentistry, The University of Jordan, Amman, Jordan
- Dua A Alqaisi
- School of Dentistry, The University of Jordan, Amman, Jordan
5
Affiliation(s)
- Gary H Lyman
- Editor-in-Chief, Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, USA
6
Checcucci E, Rodler S, Piazza P, Porpiglia F, Cacciamani GE. Transitioning from "Dr. Google" to "Dr. ChatGPT": the advent of artificial intelligence chatbots. Transl Androl Urol 2024; 13:1067-1070. PMID: 38983463. PMCID: PMC11228672. DOI: 10.21037/tau-23-629.
Affiliation(s)
- Enrico Checcucci
- Department of Surgery, Candiolo Cancer Institute (FPO-IRCCS), Candiolo, Italy
- Severin Rodler
- Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, USC Institute of Urology, Los Angeles, CA, USA
- Artificial Intelligence Center at USC Institute of Urology, Los Angeles, CA, USA
- Department of Urology, University Hospital of Munich (LMU), Munich, Germany
- Pietro Piazza
- Division of Urology, IRCCS Azienda Ospedaliero- Universitaria di Bologna, Bologna, Italy
- Francesco Porpiglia
- Department of Oncology, Division of Urology, University of Turin, San Luigi Gonzaga Hospital, Orbassano, TO, Italy
- Giovanni Enrico Cacciamani
- Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, USC Institute of Urology, Los Angeles, CA, USA
- Artificial Intelligence Center at USC Institute of Urology, Los Angeles, CA, USA
7
Lee JW, Yoo IS, Kim JH, Kim WT, Jeon HJ, Yoo HS, Shin JG, Kim GH, Hwang S, Park S, Kim YJ. Development of AI-generated medical responses using the ChatGPT for cancer patients. Comput Methods Programs Biomed 2024; 254:108302. PMID: 38996805. DOI: 10.1016/j.cmpb.2024.108302.
Abstract
BACKGROUND AND OBJECTIVE To develop a healthcare chatbot service (AI-guide bot) that conducts real-time conversations using large language models to provide accurate health information to patients. METHODS To provide accurate and specialized medical responses, we integrated several cancer practice guidelines. The size of the integrated meta-dataset was 1.17 million tokens. The integrated and classified metadata were extracted, transformed into text, segmented to specific character lengths, and vectorized using the embedding model. The AI-guide bot was implemented using Python 3.9. To enhance scalability and incorporate the integrated dataset, we combined the AI-guide bot with OpenAI and the LangChain framework. To generate user-friendly conversations, a language model was developed based on Chat-Generative Pretrained Transformer (ChatGPT), an interactive conversational chatbot powered by GPT-3.5. The AI-guide bot was implemented on ChatGPT 3.5 from Sep. 2023 to Jan. 2024. RESULTS The AI-guide bot allowed users to select their desired cancer type and language for conversational interactions, and was designed to expand its capabilities to encompass multiple major cancer types. The performance score of the AI-guide bot's responses was 90.98 ± 4.02 (obtained by summing the Likert scores). CONCLUSIONS The AI-guide bot can provide medical information quickly and accurately to patients with cancer who are concerned about their health.
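The pipeline this abstract describes (segment guideline text into fixed-length chunks, vectorize, retrieve the most relevant chunk for a question, hand it to the LLM) can be sketched minimally. The bag-of-words "embedding" below is a stand-in for the OpenAI embedding model the authors used, and all data in the example is hypothetical:

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Stand-in for a learned embedding model: a bag-of-words count vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def chunk(text: str, size: int = 40) -> list:
    # Segment source text to a fixed character length, as in the abstract.
    return [text[i:i + size] for i in range(0, len(text), size)]

def retrieve(query: str, chunks: list) -> str:
    # Return the guideline chunk most similar to the patient's question;
    # a RAG chatbot would pass this chunk to the LLM as grounding context.
    vectors = [embed(c) for c in chunks]
    q = embed(query)
    best = max(range(len(chunks)), key=lambda i: cosine(q, vectors[i]))
    return chunks[best]
```

A production pipeline would swap `embed` for a learned embedding model and store the vectors in a vector database (as LangChain does), but the retrieval logic is otherwise the same.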
Affiliation(s)
- Jae-Woo Lee
- Department of Family Medicine, Chungbuk National University Hospital, Cheongju, Republic of Korea; Department of Family Medicine, Chungbuk National University College of Medicine, Cheongju, Republic of Korea
- In-Sang Yoo
- Department of Biomedical Engineering, Chungbuk National University Hospital, Cheongju, Republic of Korea; Department of Medicine, Chungbuk National University College of Medicine, Cheongju, Republic of Korea
- Ji-Hye Kim
- Department of Biomedical Engineering, Chungbuk National University Hospital, Cheongju, Republic of Korea
- Won Tae Kim
- Department of Urology, Chungbuk National University Hospital, Cheongju, Republic of Korea; Department of Urology, Chungbuk National University College of Medicine, 1 Chungdae-ro, Seowon-gu, Cheongju, Chungcheongbuk-do 28644, Republic of Korea
- Hyun Jeong Jeon
- Department of Internal Medicine, Chungbuk National University Hospital, Cheongju, Republic of Korea; Department of Internal Medicine, College of Medicine, Chungbuk National University, Cheongju, Republic of Korea
- Hyo-Sun Yoo
- Department of Family Medicine, Chungbuk National University Hospital, Cheongju, Republic of Korea
- Jae Gwang Shin
- Department of Biomedical Engineering, Chungbuk National University Hospital, Cheongju, Republic of Korea
- Geun-Hyeong Kim
- Department of Biomedical Engineering, Chungbuk National University Hospital, Cheongju, Republic of Korea
- ShinJi Hwang
- Department of Biomedical Engineering, Chungbuk National University Hospital, Cheongju, Republic of Korea
- Seung Park
- Department of Biomedical Engineering, Chungbuk National University Hospital, Cheongju, Republic of Korea; Department of Medicine, Chungbuk National University College of Medicine, Cheongju, Republic of Korea
- Yong-June Kim
- Department of Urology, Chungbuk National University Hospital, Cheongju, Republic of Korea; Department of Urology, Chungbuk National University College of Medicine, 1 Chungdae-ro, Seowon-gu, Cheongju, Chungcheongbuk-do 28644, Republic of Korea
8
McGrath SP, Kozel BA, Gracefo S, Sutherland N, Danford CJ, Walton N. A comparative evaluation of ChatGPT 3.5 and ChatGPT 4 in responses to selected genetics questions. J Am Med Inform Assoc 2024. PMID: 38872284. DOI: 10.1093/jamia/ocae128.
Abstract
OBJECTIVES To evaluate the efficacy of ChatGPT 4 (GPT-4) in delivering genetic information about BRCA1, HFE, and MLH1, building on previous findings with ChatGPT 3.5 (GPT-3.5), and to assess the utility, limitations, and ethical implications of using ChatGPT in medical settings. MATERIALS AND METHODS A structured survey was developed to assess GPT-4's clinical value. An expert panel of genetic counselors and clinical geneticists evaluated GPT-4's responses to these questions. We also performed a comparative analysis with GPT-3.5, using descriptive statistics and Prism 9 for data analysis. RESULTS The findings indicate improved accuracy in GPT-4 over GPT-3.5 (P < .0001). However, notable errors in accuracy remained. The relevance of responses varied in GPT-4 but was generally favorable, with a mean in the "somewhat agree" range. There was no difference in performance by disease category. The 7-question subset of the Bot Usability Scale (BUS-15) showed no statistically significant difference between the groups but trended lower in the GPT-4 version. DISCUSSION AND CONCLUSION The study underscores GPT-4's potential role in genetic education, showing notable progress yet facing challenges such as outdated information and the necessity of ongoing refinement. Our results, while promising, emphasize the importance of balancing technological innovation with ethical responsibility in healthcare information delivery.
Affiliation(s)
- Scott P McGrath
- CITRIS Health, University of California Berkeley, Berkeley, CA 94720-1764, United States
- Beth A Kozel
- Laboratory of Vascular and Matrix Genetics, National Heart, Lung, and Blood Institute (NHLBI), Bethesda, MD 20892, United States
- Sara Gracefo
- Intermountain Precision Genomics, Intermountain Healthcare, St George, UT 84790-8723, United States
- Nykole Sutherland
- Intermountain Precision Genomics, Intermountain Healthcare, St George, UT 84790-8723, United States
- Nephi Walton
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892-2152, United States
9
Maggio MG, Tartarisco G, Cardile D, Bonanno M, Bruschetta R, Pignolo L, Pioggia G, Calabrò RS, Cerasa A. Exploring ChatGPT's potential in the clinical stream of neurorehabilitation. Front Artif Intell 2024; 7:1407905. PMID: 38903157. PMCID: PMC11187276. DOI: 10.3389/frai.2024.1407905.
Abstract
In several medical fields, generative AI tools such as ChatGPT have achieved optimal performance in identifying correct diagnoses solely by evaluating narrative clinical descriptions of cases. The most active fields of application include oncology and COVID-19-related symptoms, with preliminary relevant results also in psychiatric and neurological domains. This scoping review aims to introduce the arrival of ChatGPT applications in neurorehabilitation practice, where such AI-driven solutions have the potential to revolutionize patient care and assistance. First, a comprehensive overview of ChatGPT, including its design and potential applications in medicine, is provided. Second, the remarkable natural language processing skills and limitations of these models are examined, with a focus on their use in neurorehabilitation. In this context, we present two case scenarios to evaluate ChatGPT's ability to resolve higher-order clinical reasoning. Overall, we provide early evidence that generative AI can be meaningfully integrated into neurorehabilitation practice as a facilitator, aiding physicians in defining increasingly efficacious diagnostic and personalized prognostic plans.
Affiliation(s)
- Gennaro Tartarisco
- Institute for Biomedical Research and Innovation (IRIB), National Research Council of Italy (CNR), Messina, Italy
- Roberta Bruschetta
- Institute for Biomedical Research and Innovation (IRIB), National Research Council of Italy (CNR), Messina, Italy
- Giovanni Pioggia
- Institute for Biomedical Research and Innovation (IRIB), National Research Council of Italy (CNR), Messina, Italy
- Antonio Cerasa
- Institute for Biomedical Research and Innovation (IRIB), National Research Council of Italy (CNR), Messina, Italy
- S'Anna Institute, Crotone, Italy
- Pharmacotechnology Documentation and Transfer Unit, Preclinical and Translational Pharmacology, Department of Pharmacy, Health and Nutritional Sciences, University of Calabria, Rende, Italy
10
Loughran E, Kane M, Wyatt TH, Kerley A, Lowe S, Li X. Using Large Language Models to Address Health Literacy in mHealth: Case Report. Comput Inform Nurs 2024. PMID: 38832874. DOI: 10.1097/cin.0000000000001152.
Abstract
The innate complexity of medical topics often makes it challenging to produce educational content for the public. Although there are resources available to help authors appraise the complexity of their content, there are woefully few resources available to help authors reduce that complexity after it occurs. In this case study, we evaluate using ChatGPT to reduce the complex language used in health-related educational materials. ChatGPT adapted content from the SmartSHOTS mobile application, which is geared toward caregivers of children aged 0 to 24 months. SmartSHOTS helps reduce barriers and improve adherence to vaccination schedules. ChatGPT reduced complex sentence structure and rewrote content to align with a third-grade reading level. Furthermore, using ChatGPT to edit content already written removes the potential for unnoticed, artificial intelligence-produced inaccuracies. As an editorial tool, ChatGPT was effective, efficient, and free to use. This article discusses the potential of ChatGPT as an effective, time-efficient, and open-source method for editing health-related educational materials to reflect a comprehensible reading level.
11
Kamyabi A, Iyamu I, Saini M, May C, McKee G, Choi A. Advocating for population health: The role of public health practitioners in the age of artificial intelligence. Can J Public Health 2024; 115:473-476. PMID: 38625496. PMCID: PMC11151885. DOI: 10.17269/s41997-024-00881-x.
Abstract
Over the past decade, artificial intelligence (AI) has begun to transform Canadian organizations, driven by the promise of improved efficiency, better decision-making, and enhanced client experience. While AI holds great opportunities, there are also near-term impacts on the determinants of health and population health equity that are already emerging. If adoption is unregulated, there is a substantial risk that health inequities could be exacerbated through intended or unintended biases embedded in AI systems. New economic opportunities could be disproportionately leveraged by already privileged workers and owners of AI systems, reinforcing prevailing power dynamics. AI could also detrimentally affect population well-being by replacing human interactions rather than fostering social connectedness. Furthermore, AI-powered health misinformation could undermine effective public health communication. To respond to these challenges, public health must assess and report on the health equity impacts of AI, inform implementation to reduce health inequities, and facilitate intersectoral partnerships to foster development of policies and regulatory frameworks to mitigate risks. This commentary highlights AI's near-term risks for population health to inform a public health response.
Affiliation(s)
- Ihoghosa Iyamu
- British Columbia Centre for Disease Control, Vancouver, BC, Canada
- School of Population and Public Health, University of British Columbia, Vancouver, BC, Canada
- Manik Saini
- Vancouver Coastal Health, Vancouver, BC, Canada
- Curtis May
- School of Population and Public Health, University of British Columbia, Vancouver, BC, Canada
- Geoffrey McKee
- British Columbia Centre for Disease Control, Vancouver, BC, Canada
- School of Population and Public Health, University of British Columbia, Vancouver, BC, Canada
- Alex Choi
- Vancouver Coastal Health, Vancouver, BC, Canada
12
Riestra-Ayora J, Vaduva C, Esteban-Sánchez J, Garrote-Garrote M, Fernández-Navarro C, Sánchez-Rodríguez C, Martin-Sanz E. ChatGPT as an information tool in rhinology. Can we trust each other today? Eur Arch Otorhinolaryngol 2024; 281:3253-3259. PMID: 38436756. DOI: 10.1007/s00405-024-08581-5.
Abstract
PURPOSE ChatGPT (Chat-Generative Pre-trained Transformer) has proven to be a powerful information tool on various topics, including healthcare. The system draws on information obtained from the Internet, but this information is not always reliable. Few studies have analyzed the validity of its responses in rhinology. Our work aims to assess the quality and reliability of the information provided by AI regarding the main rhinological pathologies. METHODS We asked the default ChatGPT version (GPT-3.5) 65 questions about the most prevalent pathologies in rhinology, focusing on causes, risk factors, treatments, prognosis, and outcomes. We used the DISCERN questionnaire and a hexagonal radar schema to evaluate the quality of the information, and Fleiss's kappa to determine the consistency of agreement between different observers. RESULTS The overall evaluation of the DISCERN questionnaire resulted in a score of 4.05 (± 0.6). The results in the Reliability section were worse, with an average score of 3.18 (± 1.77); this score was affected by the responses to questions about the source of the information provided. The average score for the Quality section was 3.59 (± 1.18). Fleiss's kappa shows substantial agreement, with κ = 0.69 (p < 0.001). CONCLUSION The ChatGPT answers are accurate and reliable. It generates a simple and understandable description of the pathology for the patient's benefit. Our team considers that ChatGPT could be a useful tool for providing information under the prior supervision of a health professional.
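Fleiss's kappa, used here for inter-rater agreement, compares observed rater-pair agreement against the agreement expected by chance from the marginal category proportions. A minimal sketch (illustrative, not the study's analysis code):

```python
def fleiss_kappa(ratings: list) -> float:
    """ratings[i][j] = number of raters assigning item i to category j.
    Every item must be rated by the same number of raters n."""
    N = len(ratings)
    n = sum(ratings[0])  # raters per item
    # Observed agreement: average over items of the fraction of agreeing
    # rater pairs.
    P_bar = sum(
        (sum(c * c for c in row) - n) / (n * (n - 1)) for row in ratings
    ) / N
    # Chance agreement from marginal category proportions.
    totals = [sum(row[j] for row in ratings) for j in range(len(ratings[0]))]
    P_e = sum((t / (N * n)) ** 2 for t in totals)
    return (P_bar - P_e) / (1 - P_e)

# Three raters, two categories, perfect agreement on both items:
print(fleiss_kappa([[3, 0], [0, 3]]))  # 1.0
```

Values above roughly 0.6, such as the κ = 0.69 reported here, are conventionally interpreted as substantial agreement.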
Affiliation(s)
- Juan Riestra-Ayora
- Department of Medicine, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Villaviciosa de Odón, 28670, Madrid, Spain
- Department of Otolaryngology-Head and Neck Surgery, Hospital Universitario de Getafe, Carretera de Toledo, Km 12.500, Getafe, 28905, Madrid, Spain
- Cristina Vaduva
- Department of Medicine, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Villaviciosa de Odón, 28670, Madrid, Spain
- Department of Otolaryngology-Head and Neck Surgery, Hospital Universitario de Getafe, Carretera de Toledo, Km 12.500, Getafe, 28905, Madrid, Spain
- Jonathan Esteban-Sánchez
- Department of Medicine, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Villaviciosa de Odón, 28670, Madrid, Spain
- Department of Otolaryngology-Head and Neck Surgery, Hospital Universitario de Getafe, Carretera de Toledo, Km 12.500, Getafe, 28905, Madrid, Spain
- María Garrote-Garrote
- Department of Otolaryngology-Head and Neck Surgery, Hospital Universitario de Getafe, Carretera de Toledo, Km 12.500, Getafe, 28905, Madrid, Spain
- Carlos Fernández-Navarro
- Department of Otolaryngology-Head and Neck Surgery, Hospital Universitario de Getafe, Carretera de Toledo, Km 12.500, Getafe, 28905, Madrid, Spain
- Carolina Sánchez-Rodríguez
- Department of Medicine, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Villaviciosa de Odón, 28670, Madrid, Spain
- Eduardo Martin-Sanz
- Department of Medicine, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Villaviciosa de Odón, 28670, Madrid, Spain
- Department of Otolaryngology-Head and Neck Surgery, Hospital Universitario de Getafe, Carretera de Toledo, Km 12.500, Getafe, 28905, Madrid, Spain
13
Hussain T, Wang D, Li B. The influence of the COVID-19 pandemic on the adoption and impact of AI ChatGPT: Challenges, applications, and ethical considerations. Acta Psychol (Amst) 2024; 246:104264. PMID: 38626597. DOI: 10.1016/j.actpsy.2024.104264.
Abstract
DESIGN/METHODOLOGY/APPROACH This article employs qualitative thematic modeling to gather insights from 30 informants. The study explores various aspects of the COVID-19 pandemic's impact on AI ChatGPT technologies. PURPOSE The purpose of this research is to examine how the COVID-19 pandemic has influenced the increased usage and adoption of AI ChatGPT. It aims to explore the pandemic's impact on AI ChatGPT and its applications in specific domains, as well as the challenges and opportunities it presents. FINDINGS The findings highlight that the pandemic has led to a surge in online activities, resulting in a heightened demand for AI ChatGPT. It has been widely used in areas such as healthcare, mental health support, remote collaboration, and personalized customer experiences. The article showcases examples of AI ChatGPT's application during the pandemic. STRENGTH OF STUDY The qualitative framework enables the study to delve deeply into the multifaceted dimensions of AI ChatGPT's role during the pandemic, capturing the diverse experiences and insights of users, practitioners, and experts. By embracing the qualitative nature of inquiry, this research offers a comprehensive understanding of the challenges, opportunities, and ethical considerations associated with the adoption and utilization of AI ChatGPT in crisis contexts. PRACTICAL IMPLICATIONS The insights from this research have practical implications for policymakers, developers, and researchers; they emphasize the need for responsible and ethical implementation of AI ChatGPT to fully harness its potential in addressing societal needs during and beyond the pandemic. SOCIAL IMPLICATIONS The increased reliance on AI ChatGPT during the pandemic has led to changes in user behavior, expectations, and interactions, but it has also unveiled ethical considerations and potential risks. Addressing societal and ethical concerns, such as user impact and autonomy, privacy and security, bias and fairness, and transparency and accountability, is crucial for the responsible deployment of AI ChatGPT. ORIGINALITY/VALUE This research contributes to the understanding of the novel role of AI ChatGPT in times of crisis, particularly the COVID-19 pandemic. It highlights the necessity of responsible and ethical implementation of AI ChatGPT and provides valuable insights for the development and application of AI technology in the future.
Affiliation(s)
- Talib Hussain
- School of Media and Communication, Shanghai Jiao Tong University, 800 Dongchuan Road, 2002240 Shanghai, China; Department of Media Management, University of Religions and Denominations, Qom 37491-13357, Iran.
- Dake Wang
- School of Media and Communication, Shanghai Jiao Tong University, 800 Dongchuan Road, 2002240 Shanghai, China
- Benqian Li
- School of Media and Communication, Shanghai Jiao Tong University, 800 Dongchuan Road, 2002240 Shanghai, China
14
Ying H, Zhao Z, Zhao Y, Zeng S, Yu S. CoRTEx: contrastive learning for representing terms via explanations with applications on constructing biomedical knowledge graphs. J Am Med Inform Assoc 2024. PMID: 38777805. DOI: 10.1093/jamia/ocae115.
Abstract
OBJECTIVES Biomedical knowledge graphs play a pivotal role in various biomedical research domains, and term clustering is a crucial step in constructing them, aiming to identify synonymous terms. Lacking broader knowledge, previous contrastive learning models trained with Unified Medical Language System (UMLS) synonyms struggle to cluster difficult terms and do not generalize well beyond UMLS terms. In this work, we leverage the world knowledge of large language models (LLMs) and propose Contrastive Learning for Representing Terms via Explanations (CoRTEx) to enhance term representation and significantly improve term clustering. MATERIALS AND METHODS Model training involves generating explanations for a cleaned subset of UMLS terms using ChatGPT. We employ contrastive learning, considering term and explanation embeddings simultaneously, and progressively introduce hard negative samples. Additionally, a ChatGPT-assisted BIRCH algorithm is designed for efficient clustering of a new ontology. RESULTS We established a clustering test set and a hard-negative test set, on which our model consistently achieves the highest F1 score. With CoRTEx embeddings and the modified BIRCH algorithm, we grouped 35,580,932 terms from the Biomedical Informatics Ontology System (BIOS) into 22,104,559 clusters with O(N) queries to ChatGPT. Case studies highlight the model's efficacy in handling challenging samples, aided by information from explanations. CONCLUSION By aligning terms with their explanations, CoRTEx demonstrates superior accuracy over benchmark models and robustness beyond its training set, and it is suitable for clustering terms for large-scale biomedical ontologies.
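The contrastive objective in setups like this pairs each term embedding with its explanation embedding and treats the other batch members as in-batch negatives. A minimal InfoNCE-style sketch in numpy, illustrative only and not the CoRTEx implementation:

```python
import numpy as np

def contrastive_loss(terms: np.ndarray, explanations: np.ndarray,
                     tau: float = 0.1) -> float:
    # terms[i] and explanations[i] are embeddings of a matched pair; each
    # term should score its own explanation above all others in the batch.
    t = terms / np.linalg.norm(terms, axis=1, keepdims=True)
    e = explanations / np.linalg.norm(explanations, axis=1, keepdims=True)
    logits = t @ e.T / tau                       # cosine similarity / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy with the diagonal (the matched pair) as the target class.
    return float(-log_softmax.diagonal().mean())
```

Hard negatives (near-synonyms that are not true synonyms, introduced progressively in the abstract's description) would appear as extra explanation rows that the loss pushes each term away from.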
Affiliation(s)
- Huaiyuan Ying: Center for Statistical Science, Department of Industrial Engineering, Tsinghua University, Beijing, 100084, China
- Zhengyun Zhao: Center for Statistical Science, Department of Industrial Engineering, Tsinghua University, Beijing, 100084, China
- Yang Zhao: Weiyang College, Tsinghua University, Beijing, 100084, China
- Sihang Zeng: Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA 98195, United States
- Sheng Yu: Center for Statistical Science, Department of Industrial Engineering, Tsinghua University, Beijing, 100084, China
15
Kittichai V, Sompong W, Kaewthamasorn M, Sasisaowapak T, Naing KM, Tongloy T, Chuwongin S, Thanee S, Boonsang S. A novel approach for identification of zoonotic trypanosome utilizing deep metric learning and vector database-based image retrieval system. Heliyon 2024; 10:e30643. PMID: 38774068. PMCID: PMC11107104. DOI: 10.1016/j.heliyon.2024.e30643.
Abstract
Trypanosomiasis, a significant health concern in South America, South Asia, and Southeast Asia, requires active surveys to control the disease effectively. To address this, we developed a hybrid model that combines deep metric learning (DML) and image retrieval and is proficient at identifying Trypanosoma species in microscopic images of thin blood films. Using a ResNet50 backbone, the trained model demonstrated outstanding performance, achieving an accuracy exceeding 99.71% and a recall of up to 96%. Acknowledging the need for automated tools in field scenarios, we demonstrated the model's potential as an autonomous screening approach by combining prevailing convolutional neural network (CNN) applications with a vector database of images returned by the KNN algorithm; this achievement is primarily attributable to the Triplet Margin Loss function, which yielded 98% precision. The robustness of the model in five-fold cross-validation (AUC >98%) highlights the DML-based ResNet50 network as a state-of-the-art CNN model. Adopting DML significantly improves performance, leaves the model unaffected by variations in the dataset, and makes it a useful tool for fieldwork studies. DML offers several advantages over conventional classification models for managing large-scale datasets with many classes, enhancing scalability. The model can generalize to novel classes not encountered during training, which is particularly advantageous in scenarios where new classes continually emerge. It is also well suited to applications requiring precise recognition, especially discrimination between closely related classes. Furthermore, DML is more resilient to class imbalance because it focuses on learning distances or similarities, which are more tolerant of such imbalance. These contributions enhance the effectiveness and practicality of the DML model, particularly in fieldwork research.
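The two components the abstract names, a Triplet Margin Loss for training the embedding and KNN retrieval over a vector database for screening, can be sketched in a few lines. The vectors and species labels below are invented for illustration, not the paper's data.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Zero when the anchor is closer to the positive than to the
    negative by at least `margin`; positive otherwise."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

def knn_retrieve(query, database, k=3):
    """Return labels of the k stored embeddings nearest to the query,
    mimicking a vector-database image lookup."""
    ranked = sorted(database, key=lambda rec: euclidean(query, rec[0]))
    return [label for _, label in ranked[:k]]

# A query embedding is screened against stored (embedding, species) pairs.
db = [([0.0, 0.0], "T. evansi"), ([5.0, 5.0], "T. brucei"), ([0.1, 0.0], "T. evansi")]
hits = knn_retrieve([0.0, 0.1], db, k=2)
```

In a DML pipeline the loss shapes the embedding space during training, and retrieval at inference time needs only distance comparisons, which is why new classes can be handled without retraining a classifier head.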
Affiliation(s)
- Veerayuth Kittichai: Faculty of Medicine, King Mongkut's Institute of Technology Ladkrabang, Thailand
- Weerachat Sompong: Faculty of Medicine, King Mongkut's Institute of Technology Ladkrabang, Thailand
- Morakot Kaewthamasorn: Veterinary Parasitology Research Unit, Faculty of Veterinary Science, Chulalongkorn University, Bangkok, Thailand
- Thanyathep Sasisaowapak: College of Advanced Manufacturing Innovation, King Mongkut's Institute of Technology Ladkrabang, Thailand
- Kaung Myat Naing: College of Advanced Manufacturing Innovation, King Mongkut's Institute of Technology Ladkrabang, Thailand
- Teerawat Tongloy: College of Advanced Manufacturing Innovation, King Mongkut's Institute of Technology Ladkrabang, Thailand
- Santhad Chuwongin: College of Advanced Manufacturing Innovation, King Mongkut's Institute of Technology Ladkrabang, Thailand
- Suchansa Thanee: Veterinary Parasitology Research Unit, Faculty of Veterinary Science, Chulalongkorn University, Bangkok, Thailand
- Siridech Boonsang: Department of Electrical Engineering, Faculty of Engineering, King Mongkut's Institute of Technology Ladkrabang, Thailand
16
Gwon YN, Kim JH, Chung HS, Jung EJ, Chun J, Lee S, Shim SR. The Use of Generative AI for Scientific Literature Searches for Systematic Reviews: ChatGPT and Microsoft Bing AI Performance Evaluation. JMIR Med Inform 2024; 12:e51187. PMID: 38771247. PMCID: PMC11107769. DOI: 10.2196/51187.
Abstract
Background A large language model is a type of artificial intelligence (AI) model that opens up great possibilities for health care practice, research, and education, although scholars have emphasized the need to proactively address the issue of unvalidated and inaccurate information regarding its use. One of the best-known large language models is ChatGPT (OpenAI). It is believed to be of great help to medical research, as it facilitates more efficient data set analysis, code generation, and literature review, allowing researchers to focus on experimental design as well as drug discovery and development. Objective This study aims to explore the potential of ChatGPT as a real-time literature search tool for systematic reviews and clinical decision support systems, to enhance their efficiency and accuracy in health care settings. Methods The search results of a published systematic review by human experts on the treatment of Peyronie disease were selected as a benchmark, and the literature search formula of the study was applied to ChatGPT and Microsoft Bing AI as a comparison to human researchers. Peyronie disease typically presents with discomfort, curvature, or deformity of the penis in association with palpable plaques and erectile dysfunction. To evaluate the quality of individual studies derived from AI answers, we created a structured rating system based on bibliographic information related to the publications. We classified its answers into 4 grades if the title existed: A, B, C, and F. No grade was given for a fake title or no answer. Results From ChatGPT, 7 (0.5%) out of 1287 identified studies were directly relevant, whereas Bing AI resulted in 19 (40%) relevant studies out of 48, compared to the human benchmark of 24 studies. In the qualitative evaluation, ChatGPT had 7 grade A, 18 grade B, 167 grade C, and 211 grade F studies, and Bing AI had 19 grade A and 28 grade C studies. 
Conclusions This is the first study to compare AI and conventional human systematic review methods as a real-time literature collection tool for evidence-based medicine. The results suggest that the use of ChatGPT as a tool for real-time evidence generation is not yet accurate and feasible. Therefore, researchers should be cautious about using such AI. The limitations of this study using the generative pre-trained transformer model are that the search for research topics was not diverse and that it did not prevent the hallucination of generative AI. However, this study will serve as a standard for future studies by providing an index to verify the reliability and consistency of generative AI from a user's point of view. If the reliability and consistency of AI literature search services are verified, then the use of these technologies will help medical research greatly.
Affiliation(s)
- Yong Nam Gwon: Department of Urology, Soonchunhyang University College of Medicine, Soonchunhyang University Seoul Hospital, Seoul, Republic of Korea
- Jae Heon Kim: Department of Urology, Soonchunhyang University College of Medicine, Soonchunhyang University Seoul Hospital, Seoul, Republic of Korea
- Hyun Soo Chung: College of Medicine, Soonchunhyang University, Cheonan, Republic of Korea
- Eun Jee Jung: College of Medicine, Soonchunhyang University, Cheonan, Republic of Korea
- Joey Chun: Department of Urology, Soonchunhyang University College of Medicine, Soonchunhyang University Seoul Hospital, Seoul, Republic of Korea; Cranbrook Kingswood Upper School, Bloomfield Hills, MI, United States
- Serin Lee: Department of Urology, Soonchunhyang University College of Medicine, Soonchunhyang University Seoul Hospital, Seoul, Republic of Korea; Department of Biochemistry, Case Western Reserve University, Cleveland, OH, United States
- Sung Ryul Shim: Department of Biomedical Informatics, Konyang University College of Medicine, Daejeon, Republic of Korea; Konyang Medical Data Research Group-KYMERA, Konyang University Hospital, Daejeon, Republic of Korea
17
Xu Z, Fang Q, Huang Y, Xie M. The public attitude towards ChatGPT on reddit: A study based on unsupervised learning from sentiment analysis and topic modeling. PLoS One 2024; 19:e0302502. PMID: 38743773. PMCID: PMC11093324. DOI: 10.1371/journal.pone.0302502.
Abstract
ChatGPT has demonstrated impressive abilities and has affected many aspects of human society since its creation, gaining widespread attention across social spheres. This study comprehensively assesses public perception of ChatGPT on Reddit. The dataset, collected from Reddit, a social media platform, includes 23,733 posts and comments related to ChatGPT. First, to examine public attitudes, this study conducts content analysis using topic modeling with the Latent Dirichlet Allocation (LDA) algorithm to extract pertinent topics. Sentiment analysis then categorizes user posts and comments as positive, negative, or neutral using TextBlob and VADER. Topic modeling identified seven topics regarding ChatGPT, which can be grouped into three themes: user perception, technical methods, and impacts on society. Sentiment analysis shows that 61.6% of the posts and comments hold favorable opinions of ChatGPT, emphasizing its ability to engage in natural conversations with users without relying on complex natural language processing. The findings offer suggestions for ChatGPT developers to enhance its usability design and functionality; meanwhile, stakeholders, including users, should understand the advantages and disadvantages of ChatGPT in human society to promote its ethical and regulated implementation.
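The sentiment step, mapping a polarity score to a positive/negative/neutral label as tools like TextBlob and VADER do, can be sketched with a toy lexicon. The words, scores, and threshold below are hypothetical stand-ins for a real analyzer, kept only to show the thresholding convention.

```python
# Toy polarity lexicon standing in for TextBlob/VADER scores (illustrative only).
LEXICON = {"impressive": 0.8, "helpful": 0.6, "useless": -0.7, "buggy": -0.5}

def polarity(text):
    """Average lexicon score over the tokens of a post or comment."""
    scores = [LEXICON.get(w.strip(".,!").lower(), 0.0) for w in text.split()]
    return sum(scores) / len(scores) if scores else 0.0

def classify(text, threshold=0.05):
    """Map a polarity score to positive/negative/neutral, mirroring the
    common thresholding convention used with compound sentiment scores."""
    p = polarity(text)
    if p > threshold:
        return "positive"
    if p < -threshold:
        return "negative"
    return "neutral"
```

In the study's pipeline, each Reddit post would pass through such a classifier, and the share of "positive" labels yields the reported 61.6% figure.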
Affiliation(s)
- Zhaoxiang Xu: Department of Data Science, School of Computer Science and Engineering, Guangzhou Institute of Science and Technology, Guangzhou, Guangdong, China
- Qingguo Fang: Department of Management, School of Business, Macau University of Science and Technology, Macao, China
- Yanbo Huang: Data Science Research Center, Faculty of Innovation Engineering, Macau University of Science and Technology, Macao, China
- Mingjian Xie: Department of Decision Sciences, School of Business, Macau University of Science and Technology, Macao, China
18
Aguirre A, Hilsabeck R, Smith T, Xie B, He D, Wang Z, Zou N. Assessing the Quality of ChatGPT Responses to Dementia Caregivers' Questions: Qualitative Analysis. JMIR Aging 2024; 7:e53019. PMID: 38722219. PMCID: PMC11089887. DOI: 10.2196/53019.
Abstract
Background Artificial intelligence (AI) such as ChatGPT by OpenAI holds great promise to improve the quality of life of patients with dementia and their caregivers by providing high-quality responses to their questions about typical dementia behaviors. So far, however, evidence on the quality of such ChatGPT responses is limited. A few recent publications have investigated the quality of ChatGPT responses in other health conditions. Our study is the first to assess ChatGPT using real-world questions asked by dementia caregivers themselves. Objectives This pilot study examines the potential of ChatGPT-3.5 to provide high-quality information that may enhance dementia care and patient-caregiver education. Methods Our interprofessional team used a formal rating scale (scoring range: 0-5; the higher the score, the better the quality) to evaluate ChatGPT responses to real-world questions posed by dementia caregivers. We selected 60 posts by dementia caregivers from Reddit, a popular social media platform. These posts were verified by 3 interdisciplinary dementia clinicians as representing dementia caregivers' desire for information in the areas of memory loss and confusion, aggression, and driving. Word count for posts in the memory loss and confusion category ranged from 71 to 531 (mean 218; median 188), aggression posts ranged from 58 to 602 words (mean 254; median 200), and driving posts ranged from 93 to 550 words (mean 272; median 276). Results ChatGPT's response quality scores ranged from 3 to 5. Of the 60 responses, 26 (43%) received 5 points, 21 (35%) received 4 points, and 13 (22%) received 3 points, suggesting high quality. ChatGPT obtained consistently high scores in synthesizing information to provide follow-up recommendations (n=58, 96%), with the lowest scores in the area of comprehensiveness (n=38, 63%). Conclusions ChatGPT provided high-quality responses to complex questions posted by dementia caregivers, but it did have limitations.
ChatGPT was unable to anticipate future problems that a human professional might recognize and address in a clinical encounter. At other times, ChatGPT recommended a strategy that the caregiver had already explicitly tried. This pilot study indicates the potential of AI to provide high-quality information to enhance dementia care and patient-caregiver education in tandem with information provided by licensed health care professionals. Evaluating the quality of responses is necessary to ensure that caregivers can make informed decisions. ChatGPT has the potential to transform health care practice by shaping how caregivers receive health information.
Affiliation(s)
- Alyssa Aguirre: Department of Neurology, The University of Texas at Austin, Austin, TX, United States; Steve Hicks School of Social Work, The University of Texas at Austin, Austin, TX, United States
- Robin Hilsabeck: Glenn Biggs Institute for Alzheimer's & Neurodegenerative Diseases, Department of Neurology, University of Texas Health Science Center at San Antonio, San Antonio, TX, United States
- Tawny Smith: Department of Psychiatry and Behavioral Sciences, The University of Texas at Austin, Austin, TX, United States
- Bo Xie: School of Information, The University of Texas at Austin, Austin, TX, United States; School of Nursing, The University of Texas at Austin, Austin, TX, United States
- Daqing He: School of Computing and Information, University of Pittsburgh, Pittsburgh, PA, United States
- Zhendong Wang: School of Computing and Information, University of Pittsburgh, Pittsburgh, PA, United States
- Ning Zou: School of Computing and Information, University of Pittsburgh, Pittsburgh, PA, United States
19
Nguyen J, Owen SC. Emerging Voices in Drug Delivery - Breaking Barriers (Issue 1). Adv Drug Deliv Rev 2024; 208:115273. PMID: 38447932. DOI: 10.1016/j.addr.2024.115273.
Affiliation(s)
- Juliane Nguyen: Division of Pharmacoengineering & Molecular Pharmaceutics, Eshelman School of Pharmacy, UNC, Chapel Hill, NC 27599, United States; Department of Biomedical Engineering, NC State/UNC, Chapel Hill, NC 27695, United States
- Shawn C Owen: Department of Molecular Pharmaceutics; Department of Biomedical Engineering, University of Utah, Salt Lake City, UT, USA
20
Sawamura S, Bito T, Ando T, Masuda K, Kameyama S, Ishida H. Evaluation of the accuracy of ChatGPT's responses to and references for clinical questions in physical therapy. J Phys Ther Sci 2024; 36:234-239. PMID: 38694019. PMCID: PMC11060764. DOI: 10.1589/jpts.36.234.
Abstract
[Purpose] This study evaluated the accuracy of ChatGPT's responses to and references for five clinical questions in physical therapy based on the Physical Therapy Guidelines and assessed this language model's potential as a tool for supporting clinical decision-making in the rehabilitation field. [Participants and Methods] Five clinical questions from the "Stroke", "Musculoskeletal disorders", and "Internal disorders" sections of the Physical Therapy Guidelines, released by the Japanese Society of Physical Therapy, were presented to ChatGPT. ChatGPT was instructed to provide responses in Japanese accompanied by references such as PubMed IDs or digital object identifiers. The accuracy of the generated content and references was evaluated by two assessors with expertise in their respective sections by using a 4-point scale, and comments were provided for point deductions. The inter-rater agreement was evaluated using weighted kappa coefficients. [Results] ChatGPT demonstrated adequate accuracy in generating content for clinical questions in physical therapy. However, the accuracy of the references was poor, with a significant number of references being non-existent or misinterpreted. [Conclusion] ChatGPT has limitations in reference selection and reliability. While ChatGPT can offer accurate responses to clinical questions in physical therapy, it should be used with caution because it is not a completely reliable model.
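The inter-rater agreement reported above is typically computed as a weighted Cohen's kappa, which discounts disagreements by their distance on the ordinal 4-point scale. A minimal quadratic-weighted sketch follows; the rating vectors in the example are invented, not the study's data.

```python
def weighted_kappa(r1, r2, categories=4):
    """Quadratic-weighted Cohen's kappa for two raters on an ordinal
    scale with category codes 0..categories-1."""
    n = len(r1)
    # Observed joint distribution of the two raters' scores.
    obs = [[0.0] * categories for _ in range(categories)]
    for a, b in zip(r1, r2):
        obs[a][b] += 1.0 / n
    # Marginal distributions, whose product gives chance agreement.
    m1 = [sum(1 for a in r1 if a == i) / n for i in range(categories)]
    m2 = [sum(1 for b in r2 if b == i) / n for i in range(categories)]
    # Quadratic disagreement weights: 0 on the diagonal, 1 at the extremes.
    w = [[((i - j) ** 2) / ((categories - 1) ** 2) for j in range(categories)]
         for i in range(categories)]
    observed = sum(w[i][j] * obs[i][j] for i in range(categories) for j in range(categories))
    expected = sum(w[i][j] * m1[i] * m2[j] for i in range(categories) for j in range(categories))
    return 1.0 - observed / expected

# Hypothetical ratings on the study's 4-point scale.
rater_a = [0, 1, 2, 3, 1, 2]
rater_b = [0, 1, 2, 3, 2, 1]
kappa = weighted_kappa(rater_a, rater_b)
```

Identical rating vectors give kappa of 1; the two swapped scores in the example pull it below 1, with near misses penalized less than extreme disagreements.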
Affiliation(s)
- Shogo Sawamura: Department of Rehabilitation, Heisei College of Health Sciences, 180 Kurono, Gifu City, Gifu 501-1131, Japan
- Takanobu Bito: Department of Rehabilitation, Gifu University Hospital, Japan
- Takahiro Ando: Department of Rehabilitation, Gifu University Hospital, Japan
- Kento Masuda: Department of Rehabilitation, Gifu University Hospital, Japan
- Sakiko Kameyama: Department of Rehabilitation, Heisei College of Health Sciences, 180 Kurono, Gifu City, Gifu 501-1131, Japan
- Hiroyasu Ishida: Department of Rehabilitation, Heisei College of Health Sciences, 180 Kurono, Gifu City, Gifu 501-1131, Japan
21
Yaghy A, Yaghy M, Shields JA, Shields CL. Large Language Models in Ophthalmology: Potential and Pitfalls. Semin Ophthalmol 2024; 39:289-293. PMID: 38179986. DOI: 10.1080/08820538.2023.2300808.
Abstract
Large language models (LLMs) show great promise in assisting clinicians in general, and ophthalmology in particular, through knowledge synthesis, decision support, accelerating research, enhancing education, and improving patient interactions. Specifically, LLMs can rapidly summarize the latest literature to keep clinicians up-to-date. They can also analyze patient data to highlight crucial insights and recommend appropriate tests or referrals. LLMs can automate tedious research tasks like data cleaning and literature reviews. As AI tutors, LLMs can fill knowledge gaps and assess competency in trainees. As chatbots, they can provide empathetic, personalized responses to patient inquiries and improve satisfaction. The visual capabilities of LLMs like GPT-4 can assist the visually impaired by describing their environment. However, there are significant ethical, technical, and legal challenges around the use of LLMs that should be addressed regarding privacy, fairness, robustness, attribution, and regulation. Ongoing oversight and refinement of models are critical to realize benefits while minimizing risks and upholding responsible AI principles. If carefully implemented, LLMs hold immense potential to push the boundaries of care, discovery, and quality of life for ophthalmology patients.
Affiliation(s)
- Antonio Yaghy: Ocular Oncology Service, Wills Eye Hospital, Thomas Jefferson University, Philadelphia, PA, USA
- Maria Yaghy: Pediatric Emergency and Infectious Disease, Centre Hospitalier Universitaire Timone Enfants, Marseille, France
- Jerry A Shields: Ocular Oncology Service, Wills Eye Hospital, Thomas Jefferson University, Philadelphia, PA, USA
- Carol L Shields: Ocular Oncology Service, Wills Eye Hospital, Thomas Jefferson University, Philadelphia, PA, USA
22
Han Z, Battaglia F, Udaiyar A, Fooks A, Terlecky SR. An explorative assessment of ChatGPT as an aid in medical education: Use it with caution. Med Teach 2024; 46:657-664. PMID: 37862566. DOI: 10.1080/0142159x.2023.2271159.
Abstract
OBJECTIVE To explore the use of ChatGPT by educators and students in a medical school setting. METHOD This study used the public version of ChatGPT launched by OpenAI on November 30, 2022 (https://openai.com/blog/chatgpt/). We employed prompts to ask ChatGPT to 1) generate a content outline for a session on the topics of cholesterol, lipoproteins, and hyperlipidemia for medical students; 2) produce a list of learning objectives for the session; and 3) write assessment questions with and without clinical vignettes related to the identified learning objectives. We assessed ChatGPT's responses for accuracy and reliability to determine the potential of the chatbot as an aid to educators and as a "know-it-all" medical information provider for students. RESULTS ChatGPT can function as an aid to educators, but it is not yet suitable as a reliable information resource for educators and medical students. CONCLUSION ChatGPT can be a useful tool to assist medical educators in drafting course and session content outlines and creating assessment questions. At the same time, caution must be taken, as ChatGPT is prone to providing incorrect information; expert oversight is necessary to ensure the information generated is accurate and beneficial to students. It is therefore premature for medical students to use the current version of ChatGPT as a "know-it-all" information provider. In the future, medical educators should work with programming experts to explore and realize the full potential of AI in medical education.
Affiliation(s)
- Zhiyong Han: Department of Medical Sciences, Hackensack Meridian School of Medicine, Nutley, NJ, USA
- Fortunato Battaglia: Department of Medical Sciences, Hackensack Meridian School of Medicine, Nutley, NJ, USA
- Abinav Udaiyar: Department of Medical Sciences, Hackensack Meridian School of Medicine, Nutley, NJ, USA
- Allen Fooks: Department of Medical Sciences, Hackensack Meridian School of Medicine, Nutley, NJ, USA
- Stanley R Terlecky: Department of Medical Sciences, Hackensack Meridian School of Medicine, Nutley, NJ, USA
23
Owen SC, Nguyen J. Emerging Voices in Drug Delivery - Harnessing and Modulating Complex Biological Systems (Issue 2). Adv Drug Deliv Rev 2024; 208:115293. PMID: 38521245. DOI: 10.1016/j.addr.2024.115293.
Affiliation(s)
- Shawn C Owen: Department of Molecular Pharmaceutics, Department of Biomedical Engineering, University of Utah, Salt Lake City, UT, United States of America
- Juliane Nguyen: Division of Pharmacoengineering & Molecular Pharmaceutics, Eshelman School of Pharmacy, UNC, Chapel Hill, NC 27599, United States of America; Department of Biomedical Engineering, NC State/UNC, Chapel Hill, NC 27695, United States of America
24
Choudhury A, Chaudhry Z. Large Language Models and User Trust: Consequence of Self-Referential Learning Loop and the Deskilling of Health Care Professionals. J Med Internet Res 2024; 26:e56764. PMID: 38662419. PMCID: PMC11082730. DOI: 10.2196/56764.
Abstract
As the health care industry increasingly embraces large language models (LLMs), understanding the consequence of this integration becomes crucial for maximizing benefits while mitigating potential pitfalls. This paper explores the evolving relationship among clinician trust in LLMs, the transition of data sources from predominantly human-generated to artificial intelligence (AI)-generated content, and the subsequent impact on the performance of LLMs and clinician competence. One of the primary concerns identified in this paper is the LLMs' self-referential learning loops, where AI-generated content feeds into the learning algorithms, threatening the diversity of the data pool, potentially entrenching biases, and reducing the efficacy of LLMs. While theoretical at this stage, this feedback loop poses a significant challenge as the integration of LLMs in health care deepens, emphasizing the need for proactive dialogue and strategic measures to ensure the safe and effective use of LLM technology. Another key takeaway from our investigation is the role of user expertise and the necessity for a discerning approach to trusting and validating LLM outputs. The paper highlights how expert users, particularly clinicians, can leverage LLMs to enhance productivity by off-loading routine tasks while maintaining a critical oversight to identify and correct potential inaccuracies in AI-generated content. This balance of trust and skepticism is vital for ensuring that LLMs augment rather than undermine the quality of patient care. We also discuss the risks associated with the deskilling of health care professionals. Frequent reliance on LLMs for critical tasks could result in a decline in health care providers' diagnostic and thinking skills, particularly affecting the training and development of future professionals. The legal and ethical considerations surrounding the deployment of LLMs in health care are also examined. 
We discuss the medicolegal challenges, including liability in cases of erroneous diagnoses or treatment advice generated by LLMs. The paper references recent legislative efforts, such as The Algorithmic Accountability Act of 2023, as crucial steps toward establishing a framework for the ethical and responsible use of AI-based technologies in health care. In conclusion, this paper advocates for a strategic approach to integrating LLMs into health care. By emphasizing the importance of maintaining clinician expertise, fostering critical engagement with LLM outputs, and navigating the legal and ethical landscape, we can ensure that LLMs serve as valuable tools in enhancing patient care and supporting health care professionals. This approach addresses the immediate challenges posed by integrating LLMs and sets a foundation for their maintainable and responsible use in the future.
Affiliation(s)
- Avishek Choudhury: Industrial and Management Systems Engineering, West Virginia University, Morgantown, WV, United States
- Zaira Chaudhry: Industrial and Management Systems Engineering, West Virginia University, Morgantown, WV, United States
25
Raman R, Lathabai HH, Mandal S, Das P, Kaur T, Nedungadi P. ChatGPT: Literate or intelligent about UN sustainable development goals? PLoS One 2024; 19:e0297521. PMID: 38656952. PMCID: PMC11042716. DOI: 10.1371/journal.pone.0297521.
Abstract
Generative AI tools, such as ChatGPT, are progressively transforming numerous sectors and have the capacity to affect human life dramatically. This research evaluates the UN Sustainable Development Goals (SDGs) literacy of ChatGPT, which is crucial for the diverse stakeholders involved in SDG-related policies. Experimental outcomes from two widely used sustainability assessments, the UN SDG Fitness Test and the Sustainability Literacy Test (SULITEST), suggest that ChatGPT exhibits high SDG literacy, yet its comprehensive SDG intelligence needs further exploration. The Fitness Test gauges eight vital competencies at introductory, intermediate, and advanced levels, and accurate mapping of these to the test questions is essential for even a partial evaluation of SDG intelligence. To assess SDG intelligence, the questions from both tests were mapped to the 17 SDGs and eight cross-cutting SDG core competencies, but both questionnaires were found to be insufficient: SULITEST could satisfactorily map only 5 of 8 competencies, whereas the Fitness Test managed 6 of 8. Regarding coverage of the 17 SDGs, both tests fell short as well: most SDGs were underrepresented in both instruments, and certain SDGs were not represented at all. Consequently, both tools proved ineffective for assessing SDG intelligence through SDG coverage. The study recommends that future versions of ChatGPT enhance competencies such as collaboration, critical thinking, systems thinking, and others to help achieve the SDGs. It concludes that while AI models like ChatGPT hold considerable potential for sustainable development, their use must be approached carefully, considering current limitations and ethical implications.
Affiliation(s)
- Raghu Raman: Amrita School of Business, Amrita Vishwa Vidyapeetham, Amritapuri, Kerala, India
- Santanu Mandal: Amrita School of Business, Amaravati, Andhra Pradesh, India
- Payel Das: Amrita School of Business, Amaravati, Andhra Pradesh, India
- Tavleen Kaur: Fortune Institute of International Business, New Delhi, India
26
Maccaro A, Stokes K, Statham L, He L, Williams A, Pecchia L, Piaggio D. Clearing the Fog: A Scoping Literature Review on the Ethical Issues Surrounding Artificial Intelligence-Based Medical Devices. J Pers Med 2024; 14:443. PMID: 38793025. PMCID: PMC11121798. DOI: 10.3390/jpm14050443.
Abstract
The use of AI in healthcare has sparked much debate among philosophers, ethicists, regulators, and policymakers, who have raised concerns about the implications of such technologies. This scoping review captures the progression of the ethical and legal debate and the proposed ethical frameworks concerning the use of AI-based medical technologies, identifying key themes across a wide range of medical contexts. The ethical dimensions are synthesised to produce a coherent ethical framework for AI-based medical technologies, highlighting transparency, accountability, confidentiality, autonomy, trust, and fairness as the top six recurrent ethical issues. The literature also highlighted that it is essential to increase ethical awareness through interdisciplinary research, so that researchers, AI developers, and regulators have the necessary education, competence, networks, and tools to ensure proper consideration of ethical matters in the conception and design of new AI technologies and their norms. Interdisciplinarity throughout research, regulation, and implementation will help ensure AI-based medical devices are ethical, clinically effective, and safe. Achieving these goals will facilitate the successful translation of AI into healthcare systems, which currently lags behind other sectors, and ensure the timely delivery of health benefits to patients and the public.
Affiliation(s)
- Alessia Maccaro
- Applied Biomedical Signal Processing Intelligent eHealth Lab, School of Engineering, University of Warwick, Coventry CV4 7AL, UK
- Katy Stokes
- Applied Biomedical Signal Processing Intelligent eHealth Lab, School of Engineering, University of Warwick, Coventry CV4 7AL, UK
- Laura Statham
- Applied Biomedical Signal Processing Intelligent eHealth Lab, School of Engineering, University of Warwick, Coventry CV4 7AL, UK
- Warwick Medical School, University of Warwick, Coventry CV4 7AL, UK
- Lucas He
- Applied Biomedical Signal Processing Intelligent eHealth Lab, School of Engineering, University of Warwick, Coventry CV4 7AL, UK
- Faculty of Engineering, Imperial College, London SW7 1AY, UK
- Arthur Williams
- Applied Biomedical Signal Processing Intelligent eHealth Lab, School of Engineering, University of Warwick, Coventry CV4 7AL, UK
- Leandro Pecchia
- Applied Biomedical Signal Processing Intelligent eHealth Lab, School of Engineering, University of Warwick, Coventry CV4 7AL, UK
- Intelligent Technologies for Health and Well-Being: Sustainable Design, Management and Evaluation, Faculty of Engineering, Università Campus Bio-Medico Roma, Via Alvaro del Portillo, 21, 00128 Rome, Italy
- Davide Piaggio
- Applied Biomedical Signal Processing Intelligent eHealth Lab, School of Engineering, University of Warwick, Coventry CV4 7AL, UK
27
Lucas HC, Upperman JS, Robinson JR. A systematic review of large language models and their implications in medical education. Med Educ 2024. [PMID: 38639098] [DOI: 10.1111/medu.15402]
Abstract
INTRODUCTION In the past year, the use of large language models (LLMs) has generated significant interest and excitement because of their potential to revolutionise various fields, including medical education for aspiring physicians. Although medical students undergo a demanding educational process to become competent health care professionals, the emergence of LLMs presents a promising solution to challenges such as information overload, time constraints, and pressure on clinical educators. However, integrating LLMs into medical education raises critical concerns and challenges for educators, professionals, and students. This systematic review explores LLM applications in medical education, specifically their impact on medical students' learning experiences. METHODS A systematic search was performed in PubMed, Web of Science, and Embase for articles discussing the applications of LLMs in medical education, using selected keywords related to LLMs and medical education, from the time of ChatGPT's debut until February 2024. Only articles available in full text and in English were reviewed. The credibility of each study was critically appraised by two independent reviewers. RESULTS The search identified 166 studies, of which 40 were found on review to be relevant. Among the 40 relevant studies, key themes included LLM capabilities, benefits such as personalised learning, and challenges regarding content accuracy. Importantly, 42.5% of these studies specifically evaluated LLMs, including ChatGPT, in novel contexts such as medical exams and clinical/biomedical information, highlighting their potential to replicate human-level performance in medical knowledge. The remaining studies broadly discussed the prospective role of LLMs in medical education, reflecting a keen interest in their future potential despite current constraints.
CONCLUSIONS The responsible implementation of LLMs in medical education offers a promising opportunity to enhance learning experiences. However, ensuring information accuracy, emphasising skill-building and maintaining ethical safeguards are crucial. Continuous critical evaluation and interdisciplinary collaboration are essential for the appropriate integration of LLMs in medical education.
Affiliation(s)
- Jeffrey S Upperman
- Department of Pediatric Surgery, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Jamie R Robinson
- Department of Pediatric Surgery, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
28
Siepmann R, Huppertz M, Rastkhiz A, Reen M, Corban E, Schmidt C, Wilke S, Schad P, Yüksel C, Kuhl C, Truhn D, Nebelung S. The virtual reference radiologist: comprehensive AI assistance for clinical image reading and interpretation. Eur Radiol 2024. [PMID: 38627289] [DOI: 10.1007/s00330-024-10727-2]
Abstract
OBJECTIVES Large language models (LLMs) have shown potential in radiology, but their ability to aid radiologists in interpreting imaging studies remains unexplored. We investigated the effects of a state-of-the-art LLM (GPT-4) on the radiologists' diagnostic workflow. MATERIALS AND METHODS In this retrospective study, six radiologists of different experience levels read 40 selected radiographic [n = 10], CT [n = 10], MRI [n = 10], and angiographic [n = 10] studies unassisted (session one) and assisted by GPT-4 (session two). Each imaging study was presented with demographic data, the chief complaint, and associated symptoms, and diagnoses were registered using an online survey tool. The impact of artificial intelligence (AI) on diagnostic accuracy, confidence, user experience, input prompts, and generated responses was assessed, and false information was registered. Linear mixed-effects models were used to quantify the factors (fixed: experience, modality, AI assistance; random: radiologist) influencing diagnostic accuracy and confidence. RESULTS When assessing whether the correct diagnosis was among the top-3 differential diagnoses, diagnostic accuracy improved slightly from 181/240 (75.4%, unassisted) to 188/240 (78.3%, AI-assisted). Similar improvements were found when only the top differential diagnosis was considered. AI assistance was used in 77.5% of the readings. Three hundred and nine prompts were generated, primarily involving differential diagnoses (59.1%) and imaging features of specific conditions (27.5%). Diagnostic confidence was significantly higher when readings were AI-assisted (p < 0.001). Twenty-three responses (7.4%) were classified as hallucinations, while two (0.6%) were misinterpretations. CONCLUSION Integrating GPT-4 into the diagnostic process improved diagnostic accuracy slightly and diagnostic confidence significantly. Potentially harmful hallucinations and misinterpretations call for caution and highlight the need for further safeguarding measures.
CLINICAL RELEVANCE STATEMENT Using GPT-4 as a virtual assistant when reading images made six radiologists of different experience levels feel more confident and provide more accurate diagnoses; yet, GPT-4 gave factually incorrect and potentially harmful information in 7.4% of its responses.
Affiliation(s)
- Robert Siepmann
- Department of Diagnostic and Interventional Radiology, University Hospital RWTH Aachen, Aachen, Germany
- Marc Huppertz
- Department of Diagnostic and Interventional Radiology, University Hospital RWTH Aachen, Aachen, Germany
- Annika Rastkhiz
- Department of Diagnostic and Interventional Radiology, University Hospital RWTH Aachen, Aachen, Germany
- Matthias Reen
- Department of Diagnostic and Interventional Radiology, University Hospital RWTH Aachen, Aachen, Germany
- Eric Corban
- Department of Diagnostic and Interventional Radiology, University Hospital RWTH Aachen, Aachen, Germany
- Christian Schmidt
- Department of Diagnostic and Interventional Radiology, University Hospital RWTH Aachen, Aachen, Germany
- Stephan Wilke
- Department of Diagnostic and Interventional Radiology, University Hospital RWTH Aachen, Aachen, Germany
- Philipp Schad
- Department of Diagnostic and Interventional Radiology, University Hospital RWTH Aachen, Aachen, Germany
- Can Yüksel
- Department of Diagnostic and Interventional Radiology, University Hospital RWTH Aachen, Aachen, Germany
- Christiane Kuhl
- Department of Diagnostic and Interventional Radiology, University Hospital RWTH Aachen, Aachen, Germany
- Daniel Truhn
- Department of Diagnostic and Interventional Radiology, University Hospital RWTH Aachen, Aachen, Germany
- Sven Nebelung
- Department of Diagnostic and Interventional Radiology, University Hospital RWTH Aachen, Aachen, Germany
29
Javid M, Bhandari M, Parameshwari P, Reddiboina M, Prasad S. Evaluation of ChatGPT for Patient Counseling in Kidney Stone Clinic: A Prospective Study. J Endourol 2024; 38:377-383. [PMID: 38411835] [DOI: 10.1089/end.2023.0571]
Abstract
Introduction: Large language models (LLMs) have the potential to improve clinical workflow and make patient care more efficient. We prospectively evaluated the performance of the LLM ChatGPT as a patient counseling tool in the urology stone clinic and validated the generated responses against those of urologists. Methods: We collected 61 questions from 12 kidney stone patients and prompted those to ChatGPT and a panel of experienced urologists (Level 1). Subsequently, the blinded responses of the urologists and ChatGPT were presented to two expert urologists (Level 2) for comparative evaluation on preset domains: accuracy, relevance, empathy, completeness, and practicality. All responses were rated on a Likert scale of 1 to 10 for psychometric response evaluation. The mean difference in the scores given by the urologists (Level 2) was analyzed, and interrater reliability (IRR) for the level of agreement between the urologists (Level 2) was assessed with Cohen's kappa. Results: The mean differences in average scores between the responses from ChatGPT and the urologists were significant for accuracy (p < 0.001), empathy (p < 0.001), completeness (p < 0.001), and practicality (p < 0.001), but not for relevance (p = 0.051), with ChatGPT's responses being rated higher. The IRR analysis revealed significant agreement only in the empathy domain (κ = 0.163; 0.059-0.266). Conclusion: We believe the introduction of ChatGPT into the clinical workflow could further optimize the information provided to patients in a busy stone clinic. In this preliminary study, ChatGPT supplemented the answers provided by the urologists, adding value to the conversation. However, in its current state, it is not yet ready to be a direct source of authentic information for patients. We recommend its use as a source to build a comprehensive Frequently Asked Questions bank as a prelude to developing an LLM chatbot for patient counseling.
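The interrater-reliability statistic this abstract reports, Cohen's kappa, corrects observed agreement between two raters for the agreement expected by chance. A minimal unweighted sketch follows; it is an illustration of the statistic itself, not the study's actual computation (for ordinal Likert ratings, a weighted kappa is often preferred), and the function name is ours.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items where the raters match.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    labels = set(freq_a) | set(freq_b)
    expected = sum(freq_a[lab] * freq_b[lab] for lab in labels) / (n * n)
    return (observed - expected) / (1 - expected)
```

A kappa near 0 (as in most domains of the study) means agreement barely exceeds chance, which is why only the empathy domain reached significance.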
Affiliation(s)
- Mohamed Javid
- Department of Urology, Chengalpattu Medical College, Chengalpattu, Tamil Nadu, India
- Mahendra Bhandari
- Vattikuti Urology Institute, Henry Ford Hospital, Detroit, Michigan, USA
- P Parameshwari
- Department of Community Medicine, Chengalpattu Medical College, Chengalpattu, Tamil Nadu, India
- Srikala Prasad
- Department of Urology, Chengalpattu Medical College, Chengalpattu, Tamil Nadu, India
30
Shukla R, Mishra AK, Banerjee N, Verma A. The Comparison of ChatGPT 3.5, Microsoft Bing, and Google Gemini for Diagnosing Cases of Neuro-Ophthalmology. Cureus 2024; 16:e58232. [PMID: 38745784] [PMCID: PMC11092423] [DOI: 10.7759/cureus.58232]
Abstract
OBJECTIVE We aim to compare the capabilities of ChatGPT 3.5, Microsoft Bing, and Google Gemini in handling neuro-ophthalmological case scenarios. METHODS Ten randomly chosen neuro-ophthalmological cases from a publicly accessible database were used to test the accuracy and suitability of all three models; the case details were followed by the query: "What is the most probable diagnosis?" RESULTS In terms of diagnostic accuracy, all three chatbots (ChatGPT 3.5, Microsoft Bing, and Google Gemini) gave the correct diagnosis in four (40%) of 10 cases, whereas in terms of suitability, ChatGPT 3.5, Microsoft Bing, and Google Gemini gave suitable responses in six (60%), five (50%), and five (50%) of 10 case scenarios, respectively. CONCLUSION ChatGPT 3.5 performs better than the other two models in handling neuro-ophthalmological case scenarios. These results highlight the potential benefits of developing artificial intelligence (AI) models for improving medical education and ocular diagnostics.
Affiliation(s)
- Ruchi Shukla
- Department of Ophthalmology, All India Institute of Medical Sciences, Raebareli, Raebareli, IND
- Ashutosh K Mishra
- Department of Neurology, All India Institute of Medical Sciences, Raebareli, Raebareli, IND
- Nilakshi Banerjee
- Department of Ophthalmology, All India Institute of Medical Sciences, Raebareli, Raebareli, IND
- Archana Verma
- Department of Neurology, All India Institute of Medical Sciences, Raebareli, Raebareli, IND
31
Menz BD, Kuderer NM, Bacchi S, Modi ND, Chin-Yee B, Hu T, Rickard C, Haseloff M, Vitry A, McKinnon RA, Kichenadasse G, Rowland A, Sorich MJ, Hopkins AM. Current safeguards, risk mitigation, and transparency measures of large language models against the generation of health disinformation: repeated cross sectional analysis. BMJ 2024; 384:e078538. [PMID: 38508682] [PMCID: PMC10961718] [DOI: 10.1136/bmj-2023-078538]
Abstract
OBJECTIVES To evaluate the effectiveness of safeguards to prevent large language models (LLMs) from being misused to generate health disinformation, and to evaluate the transparency of artificial intelligence (AI) developers regarding their risk mitigation processes against observed vulnerabilities. DESIGN Repeated cross sectional analysis. SETTING Publicly accessible LLMs. METHODS In a repeated cross sectional analysis, four LLMs (via chatbots/assistant interfaces) were evaluated: OpenAI's GPT-4 (via ChatGPT and Microsoft's Copilot), Google's PaLM 2 and newly released Gemini Pro (via Bard), Anthropic's Claude 2 (via Poe), and Meta's Llama 2 (via HuggingChat). In September 2023, these LLMs were prompted to generate health disinformation on two topics: sunscreen as a cause of skin cancer and the alkaline diet as a cancer cure. Jailbreaking techniques (ie, attempts to bypass safeguards) were evaluated if required. For LLMs with observed safeguarding vulnerabilities, the processes for reporting outputs of concern were audited. 12 weeks after initial investigations, the disinformation generation capabilities of the LLMs were re-evaluated to assess any subsequent improvements in safeguards. MAIN OUTCOME MEASURES The main outcome measures were whether safeguards prevented the generation of health disinformation, and the transparency of risk mitigation processes against health disinformation. RESULTS Claude 2 (via Poe) declined 130 prompts submitted across the two study timepoints requesting the generation of content claiming that sunscreen causes skin cancer or that the alkaline diet is a cure for cancer, even with jailbreaking attempts. GPT-4 (via Copilot) initially refused to generate health disinformation, even with jailbreaking attempts-although this was not the case at 12 weeks. In contrast, GPT-4 (via ChatGPT), PaLM 2/Gemini Pro (via Bard), and Llama 2 (via HuggingChat) consistently generated health disinformation blogs. 
In September 2023 evaluations, these LLMs facilitated the generation of 113 unique cancer disinformation blogs, totalling more than 40 000 words, without requiring jailbreaking attempts. The refusal rate across the evaluation timepoints for these LLMs was only 5% (7 of 150), and, as prompted, the LLM-generated blogs incorporated attention-grabbing titles, authentic-looking (fake or fictional) references, and fabricated testimonials from patients and clinicians, and they targeted diverse demographic groups. Although each LLM evaluated had mechanisms to report observed outputs of concern, the developers did not respond when observations of vulnerabilities were reported. CONCLUSIONS This study found that although effective safeguards to prevent LLMs from being misused to generate health disinformation are feasible, they were inconsistently implemented. Furthermore, effective processes for reporting safeguard problems were lacking. Enhanced regulation, transparency, and routine auditing are required to help prevent LLMs from contributing to the mass generation of health disinformation.
Affiliation(s)
- Bradley D Menz
- College of Medicine and Public Health, Flinders University, Adelaide, SA, 5042, Australia
- Stephen Bacchi
- College of Medicine and Public Health, Flinders University, Adelaide, SA, 5042, Australia
- Northern Adelaide Local Health Network, Lyell McEwin Hospital, Adelaide, Australia
- Natansh D Modi
- College of Medicine and Public Health, Flinders University, Adelaide, SA, 5042, Australia
- Benjamin Chin-Yee
- Schulich School of Medicine and Dentistry, Western University, London, Canada
- Department of History and Philosophy of Science, University of Cambridge, Cambridge, UK
- Tiancheng Hu
- Language Technology Lab, University of Cambridge, Cambridge, UK
- Ceara Rickard
- Consumer Advisory Group, Clinical Cancer Epidemiology Group, College of Medicine and Public Health, Flinders University, Adelaide, Australia
- Mark Haseloff
- Consumer Advisory Group, Clinical Cancer Epidemiology Group, College of Medicine and Public Health, Flinders University, Adelaide, Australia
- Agnes Vitry
- Consumer Advisory Group, Clinical Cancer Epidemiology Group, College of Medicine and Public Health, Flinders University, Adelaide, Australia
- University of South Australia, Clinical and Health Sciences, Adelaide, Australia
- Ross A McKinnon
- College of Medicine and Public Health, Flinders University, Adelaide, SA, 5042, Australia
- Ganessan Kichenadasse
- College of Medicine and Public Health, Flinders University, Adelaide, SA, 5042, Australia
- Flinders Centre for Innovation in Cancer, Department of Medical Oncology, Flinders Medical Centre, Flinders University, Bedford Park, South Australia, Australia
- Andrew Rowland
- College of Medicine and Public Health, Flinders University, Adelaide, SA, 5042, Australia
- Michael J Sorich
- College of Medicine and Public Health, Flinders University, Adelaide, SA, 5042, Australia
- Ashley M Hopkins
- College of Medicine and Public Health, Flinders University, Adelaide, SA, 5042, Australia
32
Elbadawi M, Li H, Basit AW, Gaisford S. The role of artificial intelligence in generating original scientific research. Int J Pharm 2024; 652:123741. [PMID: 38181989] [DOI: 10.1016/j.ijpharm.2023.123741]
Abstract
Artificial intelligence (AI) is a revolutionary technology that is finding wide application across numerous sectors. Large language models (LLMs) are an emerging subset of AI technology developed to communicate using human languages. At their core, LLMs are trained on vast amounts of information extracted from the internet, including text and images. Their ability to create human-like, expert text in almost any subject means they are increasingly being used as an aid to presentation, particularly in scientific writing. However, we wondered whether LLMs could go further, generating original scientific research and preparing the results for publication. We tasked GPT-4, an LLM, with writing an original pharmaceutics manuscript on a topic that is itself novel. It was able to conceive a research hypothesis, define an experimental protocol, produce photo-realistic images of 3D printed tablets, generate believable analytical data from a range of instruments, and write a convincing publication-ready manuscript with evidence of critical interpretation. The model achieved all this in less than 1 h. Moreover, the generated data were multi-modal in nature, including thermal analyses, vibrational spectroscopy, and dissolution testing, demonstrating multi-disciplinary expertise in the LLM. One area in which the model failed, however, was referencing the literature. Since the generated experimental results appeared believable, we suggest that LLMs could play a role in scientific research, but with human input, interpretation, and data validation. We discuss the potential benefits and current bottlenecks for realising this ambition here.
Affiliation(s)
- Moe Elbadawi
- UCL School of Pharmacy, University College London, 29-39 Brunswick Square, London WC1N 1AX, UK
- Hanxiang Li
- UCL School of Pharmacy, University College London, 29-39 Brunswick Square, London WC1N 1AX, UK
- Abdul W Basit
- UCL School of Pharmacy, University College London, 29-39 Brunswick Square, London WC1N 1AX, UK
- Simon Gaisford
- UCL School of Pharmacy, University College London, 29-39 Brunswick Square, London WC1N 1AX, UK
33
Sharpnack PA. Made Better by Chat GPT: Cultivating a Culture of Innovation in Nursing Education. Nurs Educ Perspect 2024; 45:67-68. [PMID: 38373098] [DOI: 10.1097/01.nep.0000000000001242]
Affiliation(s)
- Patricia A Sharpnack
- NLN Chair Patricia A. Sharpnack, DNP, RN, CNE, NEA-BC, ANEF, FAAN, is dean and Strawbridge Professor, The Breen School of Nursing and Health Professions, Ursuline College, Pepper Pike, Ohio.
34
Gurnani B, Kaur K. Leveraging ChatGPT for ophthalmic education: A critical appraisal. Eur J Ophthalmol 2024; 34:323-327. [PMID: 37974429] [DOI: 10.1177/11206721231215862]
Abstract
In recent years, the advent of artificial intelligence (AI) has transformed many sectors, including medical education. This editorial critically appraises the integration of ChatGPT, a state-of-the-art AI language model, into ophthalmic education, focusing on its potential, limitations, and ethical considerations. The application of ChatGPT in teaching and training ophthalmologists presents an innovative method to offer real-time, customized learning experiences. Through a systematic analysis of both experimental and clinical data, this editorial examines how ChatGPT enhances engagement, understanding, and retention of complex ophthalmological concepts. The study also evaluates the efficacy of ChatGPT in simulating patient interactions and clinical scenarios, which can foster improved diagnostic and interpersonal skills. Despite the promising advantages, concerns regarding reliability, lack of personal touch, and potential biases in the AI-generated content are scrutinized. Ethical considerations concerning data privacy and potential misuse are also explored. The findings underline the need for carefully designed integration, continuous evaluation, and adherence to ethical guidelines to maximize benefits while mitigating risks. By shedding light on these multifaceted aspects, this paper contributes to the ongoing discourse on the incorporation of AI in medical education, offering valuable insights and guidance for educators, practitioners, and policymakers aiming to leverage modern technology for enhancing ophthalmic education.
Affiliation(s)
- Bharat Gurnani
- Cataract, Cornea, Trauma, External Diseases, Ocular Surface and Refractive Services, ASG Eye Hospital, Jodhpur, Rajasthan, India
- Sadguru Netra Chikitsalya, Shri Sadguru Seva Sangh Trust, Chitrakoot, Madhya Pradesh, India
- Kirandeep Kaur
- Cataract, Pediatric Ophthalmology and Strabismus, ASG Eye Hospital, Jodhpur, Rajasthan, India
- Children Eye Care Centre, Sadguru Netra Chikitsalya, Shri Sadguru Seva Sangh Trust, Chitrakoot, Madhya Pradesh, India
35
Haman M, Školník M, Lošťák M. AI dietician: Unveiling the accuracy of ChatGPT's nutritional estimations. Nutrition 2024; 119:112325. [PMID: 38194819] [DOI: 10.1016/j.nut.2023.112325]
Abstract
We investigate the accuracy and reliability of ChatGPT, an artificial intelligence model developed by OpenAI, in providing nutritional information for dietary planning and weight management. The results show a reasonable level of accuracy, with energy values having the highest level of conformity: 97% of the artificial intelligence values fall within a 40% difference from United States Department of Agriculture (USDA) data. Additionally, ChatGPT displayed consistency in its provision of nutritional data, as indicated by relatively low coefficient-of-variation values for each nutrient. The artificial intelligence model also proved efficient in generating a daily meal plan within a specified caloric limit, with all the meals falling within a 30% bound of the USDA's caloric values. These findings suggest that ChatGPT can provide reasonably accurate and consistent nutritional information. Further research is recommended to assess the model's performance across a broader range of foods and meals.
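The abstract's conformity criterion (an AI-estimated value "falls within a 40% difference" of the USDA reference) reduces to a relative-error check. The sketch below is only our reading of that criterion; the function names and the handling of a zero reference value are illustrative assumptions, not taken from the paper.

```python
def within_tolerance(estimate, reference, tolerance=0.40):
    """True if estimate deviates from reference by at most `tolerance` (relative)."""
    if reference == 0:
        # Assumption: a zero reference only "matches" a zero estimate.
        return estimate == 0
    return abs(estimate - reference) / abs(reference) <= tolerance

def conformity_rate(estimates, references, tolerance=0.40):
    """Fraction of paired values meeting the tolerance criterion (e.g., the 97% figure)."""
    pairs = list(zip(estimates, references))
    hits = sum(within_tolerance(e, r, tolerance) for e, r in pairs)
    return hits / len(pairs)
```

For example, an estimate of 130 kcal against a 100 kcal reference passes (30% deviation), while 145 kcal fails (45% deviation).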
Affiliation(s)
- Michael Haman
- Department of Humanities, Faculty of Economics and Management, Czech University of Life Sciences Prague, Prague, Czech Republic
- Milan Školník
- Department of Humanities, Faculty of Economics and Management, Czech University of Life Sciences Prague, Prague, Czech Republic
- Michal Lošťák
- Department of Humanities, Faculty of Economics and Management, Czech University of Life Sciences Prague, Prague, Czech Republic
36
Tunçer G, Güçlü KG. How Reliable is ChatGPT as a Novel Consultant in Infectious Diseases and Clinical Microbiology? Infect Dis Clin Microbiol 2024; 6:55-59. [PMID: 38633442] [PMCID: PMC11020004] [DOI: 10.36519/idcm.2024.286]
Abstract
Objective The study aimed to investigate the reliability of ChatGPT's answers to medical questions, including those sourced from patients and guideline recommendations, focusing on ChatGPT's accuracy in responding to various types of infectious disease questions. Materials and Methods The study used 200 questions sourced from social media, experts, and guidelines, related to various infectious diseases including urinary tract infection, pneumonia, HIV, various types of hepatitis, COVID-19, skin infections, and tuberculosis. The questions were edited for clarity and consistency by excluding repetitive or unclear ones. The answers were scored against guidelines from reputable sources such as the Infectious Diseases Society of America (IDSA), the Centers for Disease Control and Prevention (CDC), the European Association for the Study of the Liver (EASL), and the Joint United Nations Programme on HIV/AIDS (UNAIDS) AIDSinfo. According to the scoring system, completely correct answers were given 1 point and completely incorrect ones 4 points. To assess reproducibility, each question was posed twice on separate computers, and repeatability was determined by the consistency of the answers' scores. Results ChatGPT was posed 200 questions: 107 from social media platforms and 93 from guidelines. The questions covered a range of topics: urinary tract infections (n=18), pneumonia (n=22), HIV (n=39), hepatitis B and C (n=53), COVID-19 (n=11), skin and soft tissue infections (n=38), and tuberculosis (n=19). The lowest accuracy was 72%, for urinary tract infections. ChatGPT answered 92% of social media platform questions completely correctly (scored 1 point) versus 69% of guideline questions (p=0.001; OR=5.48, 95% CI=2.29-13.11). Conclusion Artificial intelligence is widely used in the medical field by both healthcare professionals and patients. Although ChatGPT answers questions from social media platforms quite accurately, we recommend that healthcare professionals exercise caution when using it.
Affiliation(s)
- Gülşah Tunçer
- Bilecik Training and Research Hospital, Bilecik, Türkiye
37
Birkun AA. Misinformation on resuscitation and first aid as an uncontrolled problem that demands close attention: a brief scoping review. Public Health 2024; 228:147-149. [PMID: 38354584] [DOI: 10.1016/j.puhe.2024.01.005]
Abstract
OBJECTIVES Misinformation is currently recognised by the World Health Organization as an apparent threat to public health. This study aimed to provide an outline of published evidence on misinformation related to the potentially life-saving interventions of first aid and cardiopulmonary resuscitation (CPR). STUDY DESIGN A scoping review. METHODS The review was conducted in accordance with the PRISMA Extension for Scoping Reviews. English-language publications describing original studies that evaluated the quality of publicly available information on first aid and/or CPR were included without limitations on the year of publication. RESULTS Forty-four original studies published between 1982 and 2023 were reviewed. The annual number of publications varied from 0 to 6. The studies focused on the evaluation of information concerning initial care of cardiac arrest, choking, heart attack, poisoning, burns, and other emergencies. Forty-three studies (97.7%) reported varying frequencies of misinformation, in which public sources, including websites, YouTube videos, and modern artificial intelligence-based chatbots, omitted life-saving instructions on first aid or CPR or contained incorrect information that contradicted relevant international guidelines. Eleven studies (25.0%) also revealed potentially harmful advice, which, if followed by an unsuspecting person, may cause direct injury or death of a victim. CONCLUSIONS Misinformation concerning CPR and first aid cannot be ignored and demands close attention from relevant stakeholders to mitigate its harmful impacts. More studies are urgently needed to determine optimal methods for detecting and measuring misinformation, to understand mechanisms that drive its spread, and to develop effective measures to correct and prevent misinformation.
Affiliation(s)
- A A Birkun
- Department of General Surgery, Anesthesiology, Resuscitation and Emergency Medicine, Medical Institute Named After S.I. Georgievsky of V.I. Vernadsky Crimean Federal University, Lenin Blvd, 5/7, Simferopol, 295051, Russian Federation.
38
Tao BK, Handzic A, Hua NJ, Vosoughi AR, Margolin EA, Micieli JA. Utility of ChatGPT for Automated Creation of Patient Education Handouts: An Application in Neuro-Ophthalmology. J Neuroophthalmol 2024; 44:119-124. [PMID: 38175720 DOI: 10.1097/wno.0000000000002074] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/05/2024]
Abstract
BACKGROUND Patient education in ophthalmology poses a challenge for physicians because of time and resource limitations. ChatGPT (OpenAI, San Francisco) may assist with automating production of patient handouts on common neuro-ophthalmic diseases. METHODS We queried ChatGPT-3.5 to generate 51 patient education handouts across 17 conditions. We devised the "Quality of Generated Language Outputs for Patients" (QGLOP) tool to assess handouts on the domains of accuracy/comprehensiveness, bias, currency, and tone, each scored out of 4 for a total of 16. A fellowship-trained neuro-ophthalmologist scored each passage. Handout readability was assessed using the Simple Measure of Gobbledygook (SMOG), which estimates the years of education required to understand a text. RESULTS The QGLOP scores for accuracy, bias, currency, and tone were 2.43, 3, 3.43, and 3.02, respectively. The mean QGLOP score was 11.9 [95% CI 8.98, 14.8] out of 16 points, indicating a performance of 74.4% [95% CI 56.1%, 92.5%]. The mean SMOG across responses was 10.9 [95% CI 9.36, 12.4] years of education. CONCLUSIONS The mean QGLOP score suggests that a fellowship-trained ophthalmologist may have at least a moderate level of satisfaction with the write-up quality conferred by ChatGPT. This still requires a final review and editing before dissemination. By comparison, the roughly 5% of responses at either extreme would require either very mild or extensive revision. Also, the mean SMOG score exceeded the accepted upper limit of a grade 8 reading level for health-related patient handouts. In its current iteration, ChatGPT should be used as an efficiency tool to generate an initial draft for the neuro-ophthalmologist, who may then refine the accuracy and readability for a lay readership.
Affiliation(s)
- Brendan K Tao
- Faculty of Medicine (BKT), The University of British Columbia, Vancouver, Canada ; Department of Ophthalmology & Vision Science (AH, EAM, JAM), University of Toronto, Toronto, Canada; Temerty Faculty of Medicine (NJH), University of Toronto, Toronto, Canada; Department of Ophthalmology (ARV), Max Rady College of Medicine, University of Manitoba, Winnipeg, Canada; Mount Sinai Hospital (EAM), Toronto, Canada; Division of Neurology (EAM, JAM), Department of Medicine, University of Toronto, Toronto, Canada; Toronto Western Hospital (EAM, JAM), Toronto, Canada; University Health Network (EAM, JAM), Toronto, Canada; Kensington Vision and Research Center (JAM), Toronto, Canada; and St. Michael's Hospital (JAM), Toronto, Canada
39
Liu Z, Zhang L, Wu Z, Yu X, Cao C, Dai H, Liu N, Liu J, Liu W, Li Q, Shen D, Li X, Zhu D, Liu T. Surviving ChatGPT in healthcare. FRONTIERS IN RADIOLOGY 2024; 3:1224682. [PMID: 38464946 PMCID: PMC10920216 DOI: 10.3389/fradi.2023.1224682] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 05/18/2023] [Accepted: 07/25/2023] [Indexed: 03/12/2024]
Abstract
At the dawn of Artificial General Intelligence (AGI), the emergence of large language models such as ChatGPT shows promise in revolutionizing healthcare by improving patient care, expanding medical access, and optimizing clinical processes. However, their integration into healthcare systems requires careful consideration of potential risks, such as inaccurate medical advice, patient privacy violations, the creation of falsified documents or images, overreliance on AGI in medical education, and the perpetuation of biases. It is crucial to implement proper oversight and regulation to address these risks, ensuring the safe and effective incorporation of AGI technologies into healthcare systems. By acknowledging and mitigating these challenges, AGI can be harnessed to enhance patient care, medical knowledge, and healthcare processes, ultimately benefiting society as a whole.
Affiliation(s)
- Zhengliang Liu
- School of Computing, University of Georgia, Athens, GA, United States
| | - Lu Zhang
- Department of Computer Science and Engineering, The University of Texas at Arlington, Arlington, TX, United States
| | - Zihao Wu
- School of Computing, University of Georgia, Athens, GA, United States
| | - Xiaowei Yu
- Department of Computer Science and Engineering, The University of Texas at Arlington, Arlington, TX, United States
| | - Chao Cao
- Department of Computer Science and Engineering, The University of Texas at Arlington, Arlington, TX, United States
| | - Haixing Dai
- School of Computing, University of Georgia, Athens, GA, United States
| | - Ninghao Liu
- School of Computing, University of Georgia, Athens, GA, United States
| | - Jun Liu
- Department of Radiology, Second Xiangya Hospital, Changsha, Hunan, China
| | - Wei Liu
- Department of Radiation Oncology, Mayo Clinic, Scottsdale, AZ, United States
| | - Quanzheng Li
- Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, United States
| | - Dinggang Shen
- School of Biomedical Engineering, ShanghaiTech University, Shanghai, China
- Department of Research and Development, Shanghai United Imaging Intelligence Co., Ltd., Shanghai, China
- Shanghai Clinical Research and Trial Center, Shanghai, China
| | - Xiang Li
- Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, United States
| | - Dajiang Zhu
- Department of Computer Science and Engineering, The University of Texas at Arlington, Arlington, TX, United States
| | - Tianming Liu
- School of Computing, University of Georgia, Athens, GA, United States
40
Denecke K, May R, Rivera-Romero O. Transformer Models in Healthcare: A Survey and Thematic Analysis of Potentials, Shortcomings and Risks. J Med Syst 2024; 48:23. [PMID: 38367119 PMCID: PMC10874304 DOI: 10.1007/s10916-024-02043-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/17/2023] [Accepted: 02/10/2024] [Indexed: 02/19/2024]
Abstract
Large Language Models (LLMs) such as the Generative Pre-trained Transformer (GPT) and Bidirectional Encoder Representations from Transformers (BERT), which use transformer model architectures, have significantly advanced artificial intelligence and natural language processing. Recognized for their ability to capture associative relationships between words based on shared context, these models are poised to transform healthcare by improving diagnostic accuracy, tailoring treatment plans, and predicting patient outcomes. However, there are multiple risks and potentially unintended consequences associated with their use in healthcare applications. This study, conducted with 28 participants using a qualitative approach, explores the benefits, shortcomings, and risks of using transformer models in healthcare. It analyses responses to seven open-ended questions using a simplified thematic analysis. Our research reveals seven benefits, including improved operational efficiency, optimized processes and refined clinical documentation. Despite these benefits, there are significant concerns about the introduction of bias, auditability issues and privacy risks. Challenges include the need for specialized expertise, the emergence of ethical dilemmas and the potential reduction in the human element of patient care. For the medical profession, risks include the impact on employment, changes in the patient-doctor dynamic, and the need for extensive training in both system operation and data interpretation.
Affiliation(s)
- Kerstin Denecke
- Institute Patient-centered Digital Health, Bern University of Applied Sciences, Quellgasse 21, Biel, 2502, Switzerland.
| | - Richard May
- Harz University of Applied Sciences, Friedrichstraße 57-59, 38855, Wernigerode, Germany
| | - Octavio Rivera-Romero
- Instituto de Ingeniería Informática (I3US), Universidad de Sevilla, Sevilla, Spain
- Department of Electronic Technology, Universidad de Sevilla, Avda Reina Mercedes s/n, ETSI Informática, G1.43, Sevilla, 41012, Spain
41
Raman R, Kumar Nair V, Nedungadi P, Kumar Sahu A, Kowalski R, Ramanathan S, Achuthan K. Fake news research trends, linkages to generative artificial intelligence and sustainable development goals. Heliyon 2024; 10:e24727. [PMID: 38322879 PMCID: PMC10844021 DOI: 10.1016/j.heliyon.2024.e24727] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/05/2023] [Revised: 12/14/2023] [Accepted: 01/12/2024] [Indexed: 02/08/2024] Open
Abstract
In the digital age, where information is a cornerstone for decision-making, social media's not-so-regulated environment has intensified the prevalence of fake news, with significant implications for both individuals and societies. This study employs a bibliometric analysis of a large corpus of 9678 publications spanning 2013-2022 to scrutinize the evolution of fake news research, identifying leading authors, institutions, and nations. Three thematic clusters emerge: Disinformation in social media, COVID-19-induced infodemics, and techno-scientific advancements in auto-detection. This work introduces three novel contributions: 1) a pioneering mapping of fake news research to Sustainable Development Goals (SDGs), indicating its influence on areas like health (SDG 3), peace (SDG 16), and industry (SDG 9); 2) the utilization of Prominence percentile metrics to discern critical and economically prioritized research areas, such as misinformation and object detection in deep learning; and 3) an evaluation of generative AI's role in the propagation and realism of fake news, raising pressing ethical concerns. These contributions collectively provide a comprehensive overview of the current state and future trajectories of fake news research, offering valuable insights for academia, policymakers, and industry.
Affiliation(s)
- Raghu Raman
- Amrita School of Business, Amrita Vishwa Vidyapeetham, Amritapuri, Kerala, 690525, India
| | - Vinith Kumar Nair
- Amrita School of Business, Amrita Vishwa Vidyapeetham, Amritapuri, Kerala, 690525, India
| | - Prema Nedungadi
- Amrita School of Computing, Amrita Vishwa Vidyapeetham, Amritapuri, Kerala, 690525, India
| | - Aditya Kumar Sahu
- Amrita School of Computing, Amrita Vishwa Vidyapeetham, Amaravati, Andhra Pradesh, 522503, India
| | - Robin Kowalski
- College of Behavioral, Social and Health Sciences, Clemson University, Clemson, SC, 29634, USA
| | - Sasangan Ramanathan
- Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, Tamilnadu, 641112, India
| | - Krishnashree Achuthan
- Center for Cybersecurity Systems and Networks, Amrita Vishwa Vidyapeetham, Amritapuri, Kerala, 690525, India
42
McMahon HV, McMahon BD. Automating untruths: ChatGPT, self-managed medication abortion, and the threat of misinformation in a post- Roe world. Front Digit Health 2024; 6:1287186. [PMID: 38419805 PMCID: PMC10900507 DOI: 10.3389/fdgth.2024.1287186] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/01/2023] [Accepted: 01/26/2024] [Indexed: 03/02/2024] Open
Abstract
Background ChatGPT is a generative artificial intelligence chatbot that uses natural language processing to understand and execute prompts in a human-like manner. While the chatbot has become popular as a source of information among the public, experts have expressed concerns about the number of false and misleading statements made by ChatGPT. Many people search online for information about self-managed medication abortion, which has become even more common following the overturning of Roe v. Wade. It is likely that ChatGPT is also being used as a source of this information; however, little is known about its accuracy. Objective To assess the accuracy of ChatGPT responses to common questions regarding self-managed abortion safety and the process of using abortion pills. Methods We prompted ChatGPT with 65 questions about self-managed medication abortion, which produced approximately 11,000 words of text. We qualitatively coded all data in MAXQDA and performed thematic analysis. Results ChatGPT responses correctly described clinician-managed medication abortion as both safe and effective. In contrast, self-managed medication abortion was inaccurately described as dangerous and associated with an increase in the risk of complications, which was attributed to the lack of clinician supervision. Conclusion ChatGPT repeatedly provided responses that overstated the risk of complications associated with self-managed medication abortion in ways that directly contradict the expansive body of evidence demonstrating that self-managed medication abortion is both safe and effective. The chatbot's tendency to perpetuate health misinformation and associated stigma regarding self-managed medication abortions poses a threat to public health and reproductive autonomy.
Affiliation(s)
- Hayley V. McMahon
- Department of Behavioral, Social, and Health Education Sciences, Emory University Rollins School of Public Health, Atlanta, GA, United States
- The Center for Reproductive Health Research in the Southeast, Emory University Rollins School of Public Health, Atlanta, GA, United States
43
Morita PP, Lotto M, Kaur J, Chumachenko D, Oetomo A, Espiritu KD, Hussain IZ. What is the impact of artificial intelligence-based chatbots on infodemic management? Front Public Health 2024; 12:1310437. [PMID: 38414895 PMCID: PMC10896940 DOI: 10.3389/fpubh.2024.1310437] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/09/2023] [Accepted: 01/31/2024] [Indexed: 02/29/2024] Open
Abstract
Artificial intelligence (AI) chatbots have the potential to revolutionize online health information-seeking behavior by delivering up-to-date information on a wide range of health topics. They generate personalized responses to user queries through their ability to process extensive amounts of text, analyze trends, and generate natural language responses. Chatbots can help manage infodemics by debunking online health misinformation on a large scale. Nevertheless, system accuracy remains technically challenging. Chatbots require training on diverse and representative datasets, security to protect against malicious actors, and updates to keep up-to-date with scientific progress. Therefore, although AI chatbots hold significant potential in assisting infodemic management, it is essential to approach their outputs with caution due to their current limitations.
Affiliation(s)
- Plinio P. Morita
- School of Public Health Sciences, University of Waterloo, Waterloo, ON, Canada
- Department of Systems Design Engineering, University of Waterloo, Waterloo, ON, Canada
- Research Institute for Aging, University of Waterloo, Waterloo, ON, Canada
- Centre for Digital Therapeutics, Techna Institute, University Health Network, Toronto, ON, Canada
- Institute of Health Policy, Management, and Evaluation, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
| | - Matheus Lotto
- School of Public Health Sciences, University of Waterloo, Waterloo, ON, Canada
- Department of Pediatric Dentistry, Orthodontics, and Public Health, Bauru School of Dentistry, University of São Paulo, Bauru, Brazil
| | - Jasleen Kaur
- School of Public Health Sciences, University of Waterloo, Waterloo, ON, Canada
| | - Dmytro Chumachenko
- School of Public Health Sciences, University of Waterloo, Waterloo, ON, Canada
- Department of Mathematical Modelling and Artificial Intelligence, National Aerospace University “Kharkiv Aviation Institute”, Kharkiv, Ukraine
| | - Arlene Oetomo
- School of Public Health Sciences, University of Waterloo, Waterloo, ON, Canada
44
Alipour S, Galeazzi A, Sangiorgio E, Avalle M, Bojic L, Cinelli M, Quattrociocchi W. Cross-platform social dynamics: an analysis of ChatGPT and COVID-19 vaccine conversations. Sci Rep 2024; 14:2789. [PMID: 38307909 PMCID: PMC10837143 DOI: 10.1038/s41598-024-53124-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/17/2023] [Accepted: 01/29/2024] [Indexed: 02/04/2024] Open
Abstract
The role of social media in information dissemination and agenda-setting has significantly expanded in recent years. By offering real-time interactions, online platforms have become invaluable tools for studying societal responses to significant events as they unfold. However, online reactions to external developments are influenced by various factors, including the nature of the event and the online environment. This study examines the dynamics of public discourse on digital platforms to shed light on this issue. We analyzed over 12 million posts and news articles related to two significant events: the release of ChatGPT in 2022 and the global discussions about COVID-19 vaccines in 2021. Data were collected from multiple platforms, including Twitter, Facebook, Instagram, Reddit, YouTube, and GDELT. We employed topic modeling techniques to uncover the distinct thematic emphases on each platform, which reflect their specific features and target audiences. Additionally, sentiment analysis revealed various public perceptions regarding the topics studied. Lastly, we compared the evolution of engagement across platforms, unveiling unique patterns for the same topic. Notably, discussions about COVID-19 vaccines spread more rapidly due to the immediacy of the subject, while discussions about ChatGPT, despite its technological importance, propagated more gradually.
Affiliation(s)
- Shayan Alipour
- Department of Computer Science, Sapienza University of Rome, Rome, Italy.
| | | | - Emanuele Sangiorgio
- Department of Social Sciences and Economics, Sapienza University of Rome, Rome, Italy
| | - Michele Avalle
- Department of Computer Science, Sapienza University of Rome, Rome, Italy
| | - Ljubisa Bojic
- The Institute for Artificial Intelligence Research and Development of Serbia, Beograd, Serbia
- Institute for Philosophy and Social Theory, University of Belgrade, Beograd, Serbia
| | - Matteo Cinelli
- Department of Computer Science, Sapienza University of Rome, Rome, Italy
45
Khene ZE, Bigot P, Mathieu R, Rouprêt M, Bensalah K. Development of a Personalized Chat Model Based on the European Association of Urology Oncology Guidelines: Harnessing the Power of Generative Artificial Intelligence in Clinical Practice. Eur Urol Oncol 2024; 7:160-162. [PMID: 37474402 DOI: 10.1016/j.euo.2023.06.009] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/14/2023] [Revised: 06/22/2023] [Accepted: 06/28/2023] [Indexed: 07/22/2023]
Affiliation(s)
| | - Pierre Bigot
- Department of Urology, University of Angers, Angers, France
| | - Romain Mathieu
- Department of Urology, Rennes University Hospital, Rennes, France
| | - Morgan Rouprêt
- Department of Urology, La Pitié-Salpétrière Hospital, Paris, France
| | - Karim Bensalah
- Department of Urology, Rennes University Hospital, Rennes, France.
46
Kapsali MZ, Livanis E, Tsalikidis C, Oikonomou P, Voultsos P, Tsaroucha A. Ethical Concerns About ChatGPT in Healthcare: A Useful Tool or the Tombstone of Original and Reflective Thinking? Cureus 2024; 16:e54759. [PMID: 38523987 PMCID: PMC10961144 DOI: 10.7759/cureus.54759] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 02/23/2024] [Indexed: 03/26/2024] Open
Abstract
Artificial intelligence (AI), the fast-rising field of computer science aiming to create digital systems with human-like behavior and intelligence, seems to have invaded almost every field of modern life. Launched in November 2022, ChatGPT (Chat Generative Pre-trained Transformer) is a textual AI application capable of creating human-like responses characterized by original language and high coherence. Although AI-based language models have demonstrated impressive capabilities in healthcare, ChatGPT has received controversial annotations from the scientific and academic communities. This chatbot already appears to have a massive impact as an educational tool for healthcare professionals and transformative potential for clinical practice and could lead to dramatic changes in scientific research. Nevertheless, rational concerns were raised regarding whether the pre-trained, AI-generated text would be a menace not only for original thinking and new scientific ideas but also for academic and research integrity, as it becomes more and more difficult to distinguish its AI origin due to the coherence and fluency of the produced text. This short review aims to summarize the potential applications and the consequential implications of ChatGPT in the three critical pillars of medicine: education, research, and clinical practice. In addition, this paper discusses whether the current use of this chatbot is in compliance with the ethical principles for the safe use of AI in healthcare, as determined by the World Health Organization. Finally, this review highlights the need for an updated ethical framework and the increased vigilance of healthcare stakeholders to harvest the potential benefits and limit the imminent dangers of this new innovative technology.
Affiliation(s)
- Marina Z Kapsali
- Postgraduate Program on Bioethics, Laboratory of Bioethics, Democritus University of Thrace, Alexandroupolis, GRC
| | - Efstratios Livanis
- Department of Accounting and Finance, University of Macedonia, Thessaloniki, GRC
| | - Christos Tsalikidis
- Department of General Surgery, Democritus University of Thrace, Alexandroupolis, GRC
| | - Panagoula Oikonomou
- Laboratory of Experimental Surgery, Department of General Surgery, Democritus University of Thrace, Alexandroupolis, GRC
| | - Polychronis Voultsos
- Laboratory of Forensic Medicine & Toxicology (Medical Law and Ethics), School of Medicine, Faculty of Health Sciences, Aristotle University of Thessaloniki, Thessaloniki, GRC
| | - Aleka Tsaroucha
- Department of General Surgery, Democritus University of Thrace, Alexandroupolis, GRC
47
Sezgin E. Redefining Virtual Assistants in Health Care: The Future With Large Language Models. J Med Internet Res 2024; 26:e53225. [PMID: 38241074 PMCID: PMC10837753 DOI: 10.2196/53225] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/29/2023] [Revised: 12/25/2023] [Accepted: 01/02/2024] [Indexed: 01/23/2024] Open
Abstract
This editorial explores the evolving and transformative role of large language models (LLMs) in enhancing the capabilities of virtual assistants (VAs) in the health care domain, highlighting recent research on the performance of VAs and LLMs in health care information sharing. Focusing on recent research, this editorial unveils the marked improvement in the accuracy and clinical relevance of responses from LLMs, such as GPT-4, compared to current VAs, especially in addressing complex health care inquiries, like those related to postpartum depression. The improved accuracy and clinical relevance with LLMs mark a paradigm shift in digital health tools and VAs. Furthermore, such LLM applications have the potential to dynamically adapt and be integrated into existing VA platforms, offering cost-effective, scalable, and inclusive solutions. These suggest a significant increase in the applicable range of VA applications, as well as the increased value, risk, and impact in health care, moving toward more personalized digital health ecosystems. However, alongside these advancements, it is necessary to develop and adhere to ethical guidelines, regulatory frameworks, governance principles, and privacy and safety measures. We need a robust interdisciplinary collaboration to navigate the complexities of safely and effectively integrating LLMs into health care applications, ensuring that these emerging technologies align with the diverse needs and ethical considerations of the health care domain.
Affiliation(s)
- Emre Sezgin
- The Abigail Wexner Research Institute at Nationwide Children's Hospital, Columbus, OH, United States
- The Ohio State University College of Medicine, Columbus, OH, United States
48
Davies NP, Wilson R, Winder MS, Tunster SJ, McVicar K, Thakrar S, Williams J, Reid A. ChatGPT sits the DFPH exam: large language model performance and potential to support public health learning. BMC MEDICAL EDUCATION 2024; 24:57. [PMID: 38212802 PMCID: PMC10782695 DOI: 10.1186/s12909-024-05042-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/26/2023] [Accepted: 01/06/2024] [Indexed: 01/13/2024]
Abstract
BACKGROUND Artificial intelligence-based large language models, like ChatGPT, have been rapidly assessed for both risks and potential in health-related assessment and learning. However, their applications in public health professional exams have not yet been studied. We evaluated the performance of ChatGPT in part of the Faculty of Public Health's Diplomate exam (DFPH). METHODS ChatGPT was provided with a bank of 119 publicly available DFPH question parts from past papers. Its performance was assessed by two active DFPH examiners. The degree of insight and level of understanding apparently displayed by ChatGPT was also assessed. RESULTS ChatGPT passed 3 of 4 papers, surpassing the current pass rate. It performed best on questions relating to research methods. Its answers had a high floor. Examiners identified ChatGPT answers with 73.6% accuracy and human answers with 28.6% accuracy. ChatGPT provided a mean of 3.6 unique insights per question and appeared to demonstrate a required level of learning on 71.4% of occasions. CONCLUSIONS Large language models have rapidly increasing potential as a learning tool in public health education. However, their factual fallibility and the difficulty of distinguishing their responses from those of humans pose potential threats to teaching and learning.
Affiliation(s)
- Nathan P Davies
- Nottingham Centre for Public Health and Epidemiology, University of Nottingham, Nottingham City Hospital, Hucknall Rd, Nottingham, NG5 1PB, England.
| | - Robert Wilson
- NHS England, Seaton House, City Link, London Road, Nottingham, NG2 4LA, England
| | - Madeleine S Winder
- Nottingham Centre for Public Health and Epidemiology, University of Nottingham, Nottingham City Hospital, Hucknall Rd, Nottingham, NG5 1PB, England
| | - Simon J Tunster
- Nottingham Centre for Public Health and Epidemiology, University of Nottingham, Nottingham City Hospital, Hucknall Rd, Nottingham, NG5 1PB, England
| | - Kathryn McVicar
- Nottingham Centre for Public Health and Epidemiology, University of Nottingham, Nottingham City Hospital, Hucknall Rd, Nottingham, NG5 1PB, England
| | - Shivan Thakrar
- Leicester City Council, Public Health, 115 Charles Street, Leicester, LE1 1FZ, England
| | - Joe Williams
- School of Health and Related Research (ScHARR), The University of Sheffield, 30 Regent St, Sheffield, S1 4DA, England
| | - Allan Reid
- NHS England, Seaton House, City Link, London Road, Nottingham, NG2 4LA, England
49
Toro-Hernández FD, Migeot J, Marchant N, Olivares D, Ferrante F, González-Gómez R, González Campo C, Fittipaldi S, Rojas-Costa GM, Moguilner S, Slachevsky A, Chaná Cuevas P, Ibáñez A, Chaigneau S, García AM. Neurocognitive correlates of semantic memory navigation in Parkinson's disease. NPJ Parkinsons Dis 2024; 10:15. [PMID: 38195756 PMCID: PMC10776628 DOI: 10.1038/s41531-024-00630-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/14/2023] [Accepted: 12/29/2023] [Indexed: 01/11/2024] Open
Abstract
Cognitive studies on Parkinson's disease (PD) reveal abnormal semantic processing. Most research, however, fails to indicate which conceptual properties are most affected and capture patients' neurocognitive profiles. Here, we asked persons with PD, healthy controls, and individuals with behavioral variant frontotemporal dementia (bvFTD, as a disease control group) to read concepts (e.g., 'sun') and list their features (e.g., hot). Responses were analyzed in terms of ten word properties (including concreteness, imageability, and semantic variability), used for group-level comparisons, subject-level classification, and brain-behavior correlations. PD (but not bvFTD) patients produced more concrete and imageable words than controls, both patterns being associated with overall cognitive status. PD and bvFTD patients showed reduced semantic variability, an anomaly which predicted semantic inhibition outcomes. Word-property patterns robustly classified PD (but not bvFTD) patients and correlated with disease-specific hypoconnectivity along the sensorimotor and salience networks. Fine-grained semantic assessments, then, can reveal distinct neurocognitive signatures of PD.
Affiliation(s)
- Felipe Diego Toro-Hernández
- Graduate Program in Neuroscience and Cognition, Federal University of ABC, São Paulo, Brazil
- Center for Social and Cognitive Neuroscience, School of Psychology, Universidad Adolfo Ibáñez, Santiago, Chile
- Joaquín Migeot
- Center for Social and Cognitive Neuroscience, School of Psychology, Universidad Adolfo Ibáñez, Santiago, Chile
- Latin American Brain Health Institute, Universidad Adolfo Ibáñez, Santiago, Chile
- Nicolás Marchant
- Center for Social and Cognitive Neuroscience, School of Psychology, Universidad Adolfo Ibáñez, Santiago, Chile
- Daniela Olivares
- Center for Social and Cognitive Neuroscience, School of Psychology, Universidad Adolfo Ibáñez, Santiago, Chile
- Laboratorio de Neuropsicología y Neurociencias Clínicas, Universidad de Chile, Santiago, Chile
- Franco Ferrante
- Cognitive Neuroscience Center, Universidad de San Andrés, Buenos Aires, Argentina
- National Scientific and Technical Research Council, Buenos Aires, Argentina
- Facultad de Ingeniería, Universidad de Buenos Aires, Buenos Aires, Argentina
- Raúl González-Gómez
- Center for Social and Cognitive Neuroscience, School of Psychology, Universidad Adolfo Ibáñez, Santiago, Chile
- Latin American Brain Health Institute, Universidad Adolfo Ibáñez, Santiago, Chile
- Cecilia González Campo
- Cognitive Neuroscience Center, Universidad de San Andrés, Buenos Aires, Argentina
- National Scientific and Technical Research Council, Buenos Aires, Argentina
- Sol Fittipaldi
- Latin American Brain Health Institute, Universidad Adolfo Ibáñez, Santiago, Chile
- Cognitive Neuroscience Center, Universidad de San Andrés, Buenos Aires, Argentina
- Global Brain Health Institute, University of California, San Francisco, California, USA, and Trinity College Dublin, Ireland
- Gonzalo M Rojas-Costa
- Department of Radiology, Clínica las Condes, Santiago, Chile
- Advanced Epilepsy Center, Clínica las Condes, Santiago, Chile
- Joint Unit FISABIO-CIPF, Valencia, Spain
- School of Medicine, Finis Terrae University, Santiago, Chile
- Health Innovation Center, Clínica Las Condes, Santiago, Chile
- Sebastian Moguilner
- Global Brain Health Institute, University of California, San Francisco, California, USA, and Trinity College Dublin, Ireland
- Andrea Slachevsky
- Memory and Neuropsychiatric Center (CMYN), Neurology Department, Hospital del Salvador & Faculty of Medicine, University of Chile, Santiago, Chile
- Geroscience Center for Brain Health and Metabolism (GERO), Santiago, Chile
- Neuropsychology and Clinical Neuroscience Laboratory (LANNEC), Physiopathology Program - Institute of Biomedical Sciences (ICBM), Neuroscience and East Neuroscience Departments, Faculty of Medicine, University of Chile, Santiago, Chile
- Neurology and Psychiatry Department, Clínica Alemana-Universidad Desarrollo, Santiago, Chile
- Pedro Chaná Cuevas
- Facultad de Ciencias Médicas, Universidad de Santiago de Chile, Santiago, Chile
- Agustín Ibáñez
- Latin American Brain Health Institute, Universidad Adolfo Ibáñez, Santiago, Chile
- Cognitive Neuroscience Center, Universidad de San Andrés, Buenos Aires, Argentina
- Global Brain Health Institute, University of California, San Francisco, California, USA, and Trinity College Dublin, Ireland
- Sergio Chaigneau
- Center for Social and Cognitive Neuroscience, School of Psychology, Universidad Adolfo Ibáñez, Santiago, Chile
- Center for Cognition Research, School of Psychology, Universidad Adolfo Ibáñez, Santiago, Chile
- Adolfo M García
- Latin American Brain Health Institute, Universidad Adolfo Ibáñez, Santiago, Chile.
- Cognitive Neuroscience Center, Universidad de San Andrés, Buenos Aires, Argentina.
- Global Brain Health Institute, University of California, San Francisco, California, USA, and Trinity College Dublin, Ireland.
- Departamento de Lingüística y Literatura, Facultad de Humanidades, Universidad de Santiago de Chile, Santiago, Chile.
50
Gravina AG, Pellegrino R, Cipullo M, Palladino G, Imperio G, Ventura A, Auletta S, Ciamarra P, Federico A. May ChatGPT be a tool producing medical information for common inflammatory bowel disease patients' questions? An evidence-controlled analysis. World J Gastroenterol 2024; 30:17-33. [PMID: 38293321 PMCID: PMC10823903 DOI: 10.3748/wjg.v30.i1.17] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2023] [Revised: 12/07/2023] [Accepted: 12/28/2023] [Indexed: 01/06/2024] Open
Abstract
Artificial intelligence is increasingly entering everyday healthcare. Large language model (LLM) systems such as Chat Generative Pre-trained Transformer (ChatGPT) have become potentially accessible to everyone, including patients with inflammatory bowel diseases (IBD). However, significant ethical issues and pitfalls exist in innovative LLM tools. The hype generated by such systems may lead to unwarranted patient trust in them. Therefore, it is necessary to understand whether LLMs (trendy ones, such as ChatGPT) can produce plausible medical information (MI) for patients. This review examined ChatGPT's potential to provide MI regarding questions commonly addressed by patients with IBD to their gastroenterologists. A review of ChatGPT's outputs showed that the tool has some attractive potential but also significant limitations: its information can be outdated or insufficiently detailed and, in some cases, inaccurate. Further studies and refinement of ChatGPT, possibly aligning its outputs with the leading medical evidence provided by reliable databases, are needed.
Affiliation(s)
- Antonietta Gerarda Gravina
- Division of Hepatogastroenterology, Department of Precision Medicine, University of Campania Luigi Vanvitelli, Naples 80138, Italy
- Raffaele Pellegrino
- Division of Hepatogastroenterology, Department of Precision Medicine, University of Campania Luigi Vanvitelli, Naples 80138, Italy
- Marina Cipullo
- Division of Hepatogastroenterology, Department of Precision Medicine, University of Campania Luigi Vanvitelli, Naples 80138, Italy
- Giovanna Palladino
- Division of Hepatogastroenterology, Department of Precision Medicine, University of Campania Luigi Vanvitelli, Naples 80138, Italy
- Giuseppe Imperio
- Division of Hepatogastroenterology, Department of Precision Medicine, University of Campania Luigi Vanvitelli, Naples 80138, Italy
- Andrea Ventura
- Division of Hepatogastroenterology, Department of Precision Medicine, University of Campania Luigi Vanvitelli, Naples 80138, Italy
- Salvatore Auletta
- Division of Hepatogastroenterology, Department of Precision Medicine, University of Campania Luigi Vanvitelli, Naples 80138, Italy
- Paola Ciamarra
- Division of Hepatogastroenterology, Department of Precision Medicine, University of Campania Luigi Vanvitelli, Naples 80138, Italy
- Alessandro Federico
- Division of Hepatogastroenterology, Department of Precision Medicine, University of Campania Luigi Vanvitelli, Naples 80138, Italy