Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Elyoseph Z, Hadar-Shoval D, Asraf K, Lvovsky M. ChatGPT outperforms humans in emotional awareness evaluations. Front Psychol 2023;14:1199058. [PMID: 37303897 PMCID: PMC10254409 DOI: 10.3389/fpsyg.2023.1199058] [Citation(s) in RCA: 30] [Impact Index Per Article: 30.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/02/2023] [Accepted: 05/11/2023] [Indexed: 06/13/2023] Open

For:	Elyoseph Z, Hadar-Shoval D, Asraf K, Lvovsky M. ChatGPT outperforms humans in emotional awareness evaluations. Front Psychol 2023;14:1199058. [PMID: 37303897 PMCID: PMC10254409 DOI: 10.3389/fpsyg.2023.1199058] [Citation(s) in RCA: 30] [Impact Index Per Article: 30.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/02/2023] [Accepted: 05/11/2023] [Indexed: 06/13/2023] Open

Number

Cited by Other Article(s)

Hunter RB, Thammasitboon S, Rahman SS, Fainberg N, Renuart A, Kumar S, Jain PN, Rissmiller B, Sur M, Mehta S. Using ChatGPT to Provide Patient-Specific Answers to Parental Questions in the PICU. Pediatrics 2024:e2024066615. [PMID: 39370900 DOI: 10.1542/peds.2024-066615] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 03/13/2024] [Revised: 08/12/2024] [Accepted: 08/14/2024] [Indexed: 10/08/2024] Open

Park C, Kim J. Exploring Affective Representations in Emotional Narratives: An Exploratory Study Comparing ChatGPT and Human Responses. CYBERPSYCHOLOGY, BEHAVIOR AND SOCIAL NETWORKING 2024;27:736-741. [PMID: 39229675 DOI: 10.1089/cyber.2024.0100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 09/05/2024]

Hadar-Shoval D, Asraf K, Shinan-Altman S, Elyoseph Z, Levkovich I. Embedded values-like shape ethical reasoning of large language models on primary care ethical dilemmas. Heliyon 2024;10:e38056. [PMID: 39381244 PMCID: PMC11458949 DOI: 10.1016/j.heliyon.2024.e38056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/12/2024] [Accepted: 09/17/2024] [Indexed: 10/10/2024] Open

Abstract

Objective

This article uses the framework of Schwartz's values theory to examine whether the embedded values-like profile within large language models (LLMs) impact ethical decision-making dilemmas faced by primary care. It specifically aims to evaluate whether each LLM exhibits a distinct values-like profile, assess its alignment with general population values, and determine whether latent values influence clinical recommendations.

Methods

The Portrait Values Questionnaire-Revised (PVQ-RR) was submitted to each LLM (Claude, Bard, GPT-3.5, and GPT-4) 20 times to ensure reliable and valid responses. Their responses were compared to a benchmark derived from a diverse international sample consisting of over 53,000 culturally diverse respondents who completed the PVQ-RR. Four vignettes depicting prototypical professional quandaries involving conflicts between competing values were presented to the LLMs. The option selected by each LLM and the strength of its recommendation were evaluated to determine if underlying values-like impact output.

Results

Each LLM demonstrated a unique values-like profile. Universalism and self-direction were prioritized, while power and tradition were assigned less importance than population benchmarks, suggesting potential Western-centric biases. Four clinical vignettes involving value conflicts were presented to the LLMs. Preliminary indications suggested that embedded values-like influence recommendations. Significant variances in confidence strength regarding chosen recommendations materialized between models, proposing that further vetting is required before the LLMs can be relied on as judgment aids. However, the overall selection of preferences aligned with intrinsic value hierarchies.

Conclusion

The distinct intrinsic values-like embedded within LLMs shape ethical decision-making, which carries implications for their integration in primary care settings serving diverse populations. For context-appropriate, equitable delivery of AI-assisted healthcare globally it is essential that LLMs are tailored to align with cultural outlooks.

Collapse

Tam TYC, Sivarajkumar S, Kapoor S, Stolyar AV, Polanska K, McCarthy KR, Osterhoudt H, Wu X, Visweswaran S, Fu S, Mathur P, Cacciamani GE, Sun C, Peng Y, Wang Y. A framework for human evaluation of large language models in healthcare derived from literature review. NPJ Digit Med 2024;7:258. [PMID: 39333376 PMCID: PMC11437138 DOI: 10.1038/s41746-024-01258-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/04/2024] [Accepted: 09/11/2024] [Indexed: 09/29/2024] Open

Affiliation(s)

Thomas Yu Chow Tam Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA, USA
Sonish Sivarajkumar Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA
Sumit Kapoor Department of Critical Care Medicine, University of Pittsburgh Medical Center, Pittsburgh, PA, USA
Alisa V Stolyar Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA, USA
Katelyn Polanska Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA, USA
Karleigh R McCarthy Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA, USA
Hunter Osterhoudt Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA, USA
Xizhi Wu Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA, USA
Shyam Visweswaran Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA Clinical and Translational Science Institute, University of Pittsburgh, Pittsburgh, PA, USA
Sunyang Fu Department of Clinical and Health Informatics, Center for Translational AI Excellence and Applications in Medicine, University of Texas Health Science Center at Houston, Houston, TX, USA
Piyush Mathur Department of Anesthesiology, Cleveland Clinic, Cleveland, OH, USA BrainX AI ReSearch, BrainX LLC, Cleveland, OH, USA
Giovanni E Cacciamani Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
Cong Sun Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
Yifan Peng Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
Yanshan Wang Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA, USA. Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA. Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA. Clinical and Translational Science Institute, University of Pittsburgh, Pittsburgh, PA, USA. Hillman Cancer Center, University of Pittsburgh Medical Center, Pittsburgh, PA, USA.

Collapse

Tavory T. Regulating AI in Mental Health: Ethics of Care Perspective. JMIR Ment Health 2024;11:e58493. [PMID: 39298759 PMCID: PMC11450345 DOI: 10.2196/58493] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 03/17/2024] [Revised: 06/29/2024] [Accepted: 07/20/2024] [Indexed: 09/22/2024] Open

Fatahi S, Vassileva J, Roy CK. Comparing emotions in ChatGPT answers and human answers to the coding questions on Stack Overflow. Front Artif Intell 2024;7:1393903. [PMID: 39351510 PMCID: PMC11439875 DOI: 10.3389/frai.2024.1393903] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/29/2024] [Accepted: 08/22/2024] [Indexed: 10/04/2024] Open

Abstract

Introduction

Recent advances in generative Artificial Intelligence (AI) and Natural Language Processing (NLP) have led to the development of Large Language Models (LLMs) and AI-powered chatbots like ChatGPT, which have numerous practical applications. Notably, these models assist programmers with coding queries, debugging, solution suggestions, and providing guidance on software development tasks. Despite known issues with the accuracy of ChatGPT's responses, its comprehensive and articulate language continues to attract frequent use. This indicates potential for ChatGPT to support educators and serve as a virtual tutor for students.

Methods

To explore this potential, we conducted a comprehensive analysis comparing the emotional content in responses from ChatGPT and human answers to 2000 questions sourced from Stack Overflow (SO). The emotional aspects of the answers were examined to understand how the emotional tone of AI responses compares to that of human responses.

Results

Our analysis revealed that ChatGPT's answers are generally more positive compared to human responses. In contrast, human answers often exhibit emotions such as anger and disgust. Significant differences were observed in emotional expressions between ChatGPT and human responses, particularly in the emotions of anger, disgust, and joy. Human responses displayed a broader emotional spectrum compared to ChatGPT, suggesting greater emotional variability among humans.

Discussion

The findings highlight a distinct emotional divergence between ChatGPT and human responses, with ChatGPT exhibiting a more uniformly positive tone and humans displaying a wider range of emotions. This variance underscores the need for further research into the role of emotional content in AI and human interactions, particularly in educational contexts where emotional nuances can impact learning and communication.

Collapse

Bala B. Chatbots Are Not Clinicians: Addressing Misconceptions About Large Language Model Use in Psychiatric Care. ACADEMIC PSYCHIATRY : THE JOURNAL OF THE AMERICAN ASSOCIATION OF DIRECTORS OF PSYCHIATRIC RESIDENCY TRAINING AND THE ASSOCIATION FOR ACADEMIC PSYCHIATRY 2024:10.1007/s40596-024-02042-1. [PMID: 39237804 DOI: 10.1007/s40596-024-02042-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/16/2024] [Accepted: 08/29/2024] [Indexed: 09/07/2024]

Girton MR, Greene DN, Messerlian G, Keren DF, Yu M. ChatGPT vs Medical Professional: Analyzing Responses to Laboratory Medicine Questions on Social Media. Clin Chem 2024;70:1122-1139. [PMID: 39013110 DOI: 10.1093/clinchem/hvae093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/29/2024] [Accepted: 05/30/2024] [Indexed: 07/18/2024]

Suffoletto B. Deceptively Simple yet Profoundly Impactful: Text Messaging Interventions to Support Health. J Med Internet Res 2024;26:e58726. [PMID: 39190427 PMCID: PMC11387917 DOI: 10.2196/58726] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/22/2024] [Revised: 05/30/2024] [Accepted: 07/15/2024] [Indexed: 08/28/2024] Open

Xian X, Chang A, Xiang YT, Liu MT. Debate and Dilemmas Regarding Generative AI in Mental Health Care: Scoping Review. Interact J Med Res 2024;13:e53672. [PMID: 39133916 PMCID: PMC11347908 DOI: 10.2196/53672] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/15/2023] [Revised: 04/02/2024] [Accepted: 04/26/2024] [Indexed: 08/30/2024] Open

Abstract

BACKGROUND

Mental disorders have ranked among the top 10 prevalent causes of burden on a global scale. Generative artificial intelligence (GAI) has emerged as a promising and innovative technological advancement that has significant potential in the field of mental health care. Nevertheless, there is a scarcity of research dedicated to examining and understanding the application landscape of GAI within this domain.

OBJECTIVE

This review aims to inform the current state of GAI knowledge and identify its key uses in the mental health domain by consolidating relevant literature.

METHODS

Records were searched within 8 reputable sources including Web of Science, PubMed, IEEE Xplore, medRxiv, bioRxiv, Google Scholar, CNKI and Wanfang databases between 2013 and 2023. Our focus was on original, empirical research with either English or Chinese publications that use GAI technologies to benefit mental health. For an exhaustive search, we also checked the studies cited by relevant literature. Two reviewers were responsible for the data selection process, and all the extracted data were synthesized and summarized for brief and in-depth analyses depending on the GAI approaches used (traditional retrieval and rule-based techniques vs advanced GAI techniques).

RESULTS

In this review of 144 articles, 44 (30.6%) met the inclusion criteria for detailed analysis. Six key uses of advanced GAI emerged: mental disorder detection, counseling support, therapeutic application, clinical training, clinical decision-making support, and goal-driven optimization. Advanced GAI systems have been mainly focused on therapeutic applications (n=19, 43%) and counseling support (n=13, 30%), with clinical training being the least common. Most studies (n=28, 64%) focused broadly on mental health, while specific conditions such as anxiety (n=1, 2%), bipolar disorder (n=2, 5%), eating disorders (n=1, 2%), posttraumatic stress disorder (n=2, 5%), and schizophrenia (n=1, 2%) received limited attention. Despite prevalent use, the efficacy of ChatGPT in the detection of mental disorders remains insufficient. In addition, 100 articles on traditional GAI approaches were found, indicating diverse areas where advanced GAI could enhance mental health care.

CONCLUSIONS

This study provides a comprehensive overview of the use of GAI in mental health care, which serves as a valuable guide for future research, practical applications, and policy development in this domain. While GAI demonstrates promise in augmenting mental health care services, its inherent limitations emphasize its role as a supplementary tool rather than a replacement for trained mental health providers. A conscientious and ethical integration of GAI techniques is necessary, ensuring a balanced approach that maximizes benefits while mitigating potential challenges in mental health care practices.

Collapse

Cosic K, Kopilas V, Jovanovic T. War, emotions, mental health, and artificial intelligence. Front Psychol 2024;15:1394045. [PMID: 39156807 PMCID: PMC11327060 DOI: 10.3389/fpsyg.2024.1394045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/29/2024] [Accepted: 07/24/2024] [Indexed: 08/20/2024] Open

Abstract

During the war time dysregulation of negative emotions such as fear, anger, hatred, frustration, sadness, humiliation, and hopelessness can overrule normal societal values, culture, and endanger global peace and security, and mental health in affected societies. Therefore, it is understandable that the range and power of negative emotions may play important roles in consideration of human behavior in any armed conflict. The estimation and assessment of dominant negative emotions during war time are crucial but are challenged by the complexity of emotions' neuro-psycho-physiology. Currently available natural language processing (NLP) tools have comprehensive computational methods to analyze and understand the emotional content of related textual data in war-inflicted societies. Innovative AI-driven technologies incorporating machine learning, neuro-linguistic programming, cloud infrastructure, and novel digital therapeutic tools and applications present an immense potential to enhance mental health care worldwide. This advancement could make mental health services more cost-effective and readily accessible. Due to the inadequate number of psychiatrists and limited psychiatric resources in coping with mental health consequences of war and traumas, new digital therapeutic wearable devices supported by AI tools and means might be promising approach in psychiatry of future. Transformation of negative dominant emotional maps might be undertaken by the simultaneous combination of online cognitive behavioral therapy (CBT) on individual level, as well as usage of emotionally based strategic communications (EBSC) on a public level. The proposed positive emotional transformation by means of CBT and EBSC may provide important leverage in efforts to protect mental health of civil population in war-inflicted societies. AI-based tools that can be applied in design of EBSC stimuli, like Open AI Chat GPT or Google Gemini may have great potential to significantly enhance emotionally based strategic communications by more comprehensive understanding of semantic and linguistic analysis of available text datasets of war-traumatized society. Human in the loop enhanced by Chat GPT and Gemini can aid in design and development of emotionally annotated messages that resonate among targeted population, amplifying the impact of strategic communications in shaping human dominant emotional maps into a more positive by CBT and EBCS.

Collapse

Kim J, Leonte KG, Chen ML, Torous JB, Linos E, Pinto A, Rodriguez CI. Large language models outperform mental and medical health care professionals in identifying obsessive-compulsive disorder. NPJ Digit Med 2024;7:193. [PMID: 39030292 PMCID: PMC11271579 DOI: 10.1038/s41746-024-01181-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/15/2024] [Accepted: 06/28/2024] [Indexed: 07/21/2024] Open

Maltby J, Rayes T, Nage A, Sharif S, Omar M, Nichani S. Synthesizing perspectives: Crafting an Interdisciplinary view of social media's impact on young people's mental health. PLoS One 2024;19:e0307164. [PMID: 39008509 PMCID: PMC11249244 DOI: 10.1371/journal.pone.0307164] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/26/2024] [Accepted: 07/01/2024] [Indexed: 07/17/2024] Open

Banerjee S, Dunn P, Conard S, Ali A. Mental Health Applications of Generative AI and Large Language Modeling in the United States. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2024;21:910. [PMID: 39063487 PMCID: PMC11276907 DOI: 10.3390/ijerph21070910] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/26/2024] [Revised: 07/09/2024] [Accepted: 07/10/2024] [Indexed: 07/28/2024]

Omar M, Soffer S, Charney AW, Landi I, Nadkarni GN, Klang E. Applications of large language models in psychiatry: a systematic review. Front Psychiatry 2024;15:1422807. [PMID: 38979501 PMCID: PMC11228775 DOI: 10.3389/fpsyt.2024.1422807] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 04/24/2024] [Accepted: 06/05/2024] [Indexed: 07/10/2024] Open

Liu J. ChatGPT: perspectives from human-computer interaction and psychology. Front Artif Intell 2024;7:1418869. [PMID: 38957452 PMCID: PMC11217544 DOI: 10.3389/frai.2024.1418869] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/17/2024] [Accepted: 06/04/2024] [Indexed: 07/04/2024] Open

Suwała S, Szulc P, Guzowski C, Kamińska B, Dorobiała J, Wojciechowska K, Berska M, Kubicka O, Kosturkiewicz O, Kosztulska B, Rajewska A, Junik R. ChatGPT-3.5 passes Poland's medical final examination-Is it possible for ChatGPT to become a doctor in Poland? SAGE Open Med 2024;12:20503121241257777. [PMID: 38895543 PMCID: PMC11185017 DOI: 10.1177/20503121241257777] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/21/2024] [Accepted: 05/08/2024] [Indexed: 06/21/2024] Open

Abstract

Objectives

ChatGPT is an advanced chatbot based on Large Language Model that has the ability to answer questions. Undoubtedly, ChatGPT is capable of transforming communication, education, and customer support; however, can it play the role of a doctor? In Poland, prior to obtaining a medical diploma, candidates must successfully pass the Medical Final Examination.

Methods

The purpose of this research was to determine how well ChatGPT performed on the Polish Medical Final Examination, which passing is required to become a doctor in Poland (an exam is considered passed if at least 56% of the tasks are answered correctly). A total of 2138 categorized Medical Final Examination questions (from 11 examination sessions held between 2013-2015 and 2021-2023) were presented to ChatGPT-3.5 from 19 to 26 May 2023. For further analysis, the questions were divided into quintiles based on difficulty and duration, as well as question types (simple A-type or complex K-type). The answers provided by ChatGPT were compared to the official answer key, reviewed for any changes resulting from the advancement of medical knowledge.

Results

ChatGPT correctly answered 53.4%-64.9% of questions. In 8 out of 11 exam sessions, ChatGPT achieved the scores required to successfully pass the examination (60%). The correlation between the efficacy of artificial intelligence and the level of complexity, difficulty, and length of a question was found to be negative. AI outperformed humans in one category: psychiatry (77.18% vs. 70.25%, p = 0.081).

Conclusions

The performance of artificial intelligence is deemed satisfactory; however, it is observed to be markedly inferior to that of human graduates in the majority of instances. Despite its potential utility in many medical areas, ChatGPT is constrained by its inherent limitations that prevent it from entirely supplanting human expertise and knowledge.

Collapse

Affiliation(s)

Szymon Suwała Department of Endocrinology and Diabetology, Nicolaus Copernicus University, Collegium Medicum, Bydgoszcz, Poland
Paulina Szulc Evidence-Based Medicine Students Scientific Club of Department of Endocrinology and Diabetology, Nicolaus Copernicus University, Collegium Medicum, Bydgoszcz, Poland
Cezary Guzowski Evidence-Based Medicine Students Scientific Club of Department of Endocrinology and Diabetology, Nicolaus Copernicus University, Collegium Medicum, Bydgoszcz, Poland
Barbara Kamińska Evidence-Based Medicine Students Scientific Club of Department of Endocrinology and Diabetology, Nicolaus Copernicus University, Collegium Medicum, Bydgoszcz, Poland
Jakub Dorobiała Evidence-Based Medicine Students Scientific Club of Department of Endocrinology and Diabetology, Nicolaus Copernicus University, Collegium Medicum, Bydgoszcz, Poland
Karolina Wojciechowska Evidence-Based Medicine Students Scientific Club of Department of Endocrinology and Diabetology, Nicolaus Copernicus University, Collegium Medicum, Bydgoszcz, Poland
Maria Berska Evidence-Based Medicine Students Scientific Club of Department of Endocrinology and Diabetology, Nicolaus Copernicus University, Collegium Medicum, Bydgoszcz, Poland
Olga Kubicka Evidence-Based Medicine Students Scientific Club of Department of Endocrinology and Diabetology, Nicolaus Copernicus University, Collegium Medicum, Bydgoszcz, Poland
Oliwia Kosturkiewicz Evidence-Based Medicine Students Scientific Club of Department of Endocrinology and Diabetology, Nicolaus Copernicus University, Collegium Medicum, Bydgoszcz, Poland
Bernadetta Kosztulska Evidence-Based Medicine Students Scientific Club of Department of Endocrinology and Diabetology, Nicolaus Copernicus University, Collegium Medicum, Bydgoszcz, Poland
Alicja Rajewska Evidence-Based Medicine Students Scientific Club of Department of Endocrinology and Diabetology, Nicolaus Copernicus University, Collegium Medicum, Bydgoszcz, Poland
Roman Junik Department of Endocrinology and Diabetology, Nicolaus Copernicus University, Collegium Medicum, Bydgoszcz, Poland

Collapse

Burke-Garcia A, Soskin Hicks R. Scaling the Idea of Opinion Leadership to Address Health Misinformation: The Case for "Health Communication AI". JOURNAL OF HEALTH COMMUNICATION 2024;29:396-399. [PMID: 38832662 DOI: 10.1080/10810730.2024.2357575] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2024]

Tan S, Xin X, Wu D. ChatGPT in medicine: prospects and challenges: a review article. Int J Surg 2024;110:3701-3706. [PMID: 38502861 PMCID: PMC11175750 DOI: 10.1097/js9.0000000000001312] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/23/2024] [Accepted: 02/26/2024] [Indexed: 03/21/2024]

Shinan-Altman S, Elyoseph Z, Levkovich I. The impact of history of depression and access to weapons on suicide risk assessment: a comparison of ChatGPT-3.5 and ChatGPT-4. PeerJ 2024;12:e17468. [PMID: 38827287 PMCID: PMC11143969 DOI: 10.7717/peerj.17468] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/07/2023] [Accepted: 05/05/2024] [Indexed: 06/04/2024] Open

Irfan B, Kuoppamäki S, Skantze G. Recommendations for designing conversational companion robots with older adults through foundation models. Front Robot AI 2024;11:1363713. [PMID: 38860032 PMCID: PMC11163135 DOI: 10.3389/frobt.2024.1363713] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/31/2023] [Accepted: 05/07/2024] [Indexed: 06/12/2024] Open

Haber Y, Levkovich I, Hadar-Shoval D, Elyoseph Z. The Artificial Third: A Broad View of the Effects of Introducing Generative Artificial Intelligence on Psychotherapy. JMIR Ment Health 2024;11:e54781. [PMID: 38787297 PMCID: PMC11137430 DOI: 10.2196/54781] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 11/22/2023] [Revised: 03/24/2024] [Accepted: 04/18/2024] [Indexed: 05/25/2024] Open

Yang P, Jiang J. In Reference to Evaluation of Oropharyngeal Cancer Information from Revolutionary Artificial Intelligence Chatbot. Laryngoscope 2024;134:E18. [PMID: 38299720 DOI: 10.1002/lary.31315] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/02/2023] [Accepted: 12/06/2023] [Indexed: 02/02/2024]

Kokot NC, Davis RJ. In Response to Evaluation of Oropharyngeal Cancer Information from Revolutionary Artificial Intelligence Chatbot. Laryngoscope 2024;134:E19. [PMID: 38299696 DOI: 10.1002/lary.31313] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/30/2023] [Accepted: 01/05/2024] [Indexed: 02/02/2024]

Wang S, Mo C, Chen Y, Dai X, Wang H, Shen X. Exploring the Performance of ChatGPT-4 in the Taiwan Audiologist Qualification Examination: Preliminary Observational Study Highlighting the Potential of AI Chatbots in Hearing Care. JMIR MEDICAL EDUCATION 2024;10:e55595. [PMID: 38693697 PMCID: PMC11067446 DOI: 10.2196/55595] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/18/2023] [Revised: 03/09/2024] [Accepted: 03/22/2024] [Indexed: 05/03/2024]

Abstract

Background

Artificial intelligence (AI) chatbots, such as ChatGPT-4, have shown immense potential for application across various aspects of medicine, including medical education, clinical practice, and research.

Objective

This study aimed to evaluate the performance of ChatGPT-4 in the 2023 Taiwan Audiologist Qualification Examination, thereby preliminarily exploring the potential utility of AI chatbots in the fields of audiology and hearing care services.

Methods

ChatGPT-4 was tasked to provide answers and reasoning for the 2023 Taiwan Audiologist Qualification Examination. The examination encompassed six subjects: (1) basic auditory science, (2) behavioral audiology, (3) electrophysiological audiology, (4) principles and practice of hearing devices, (5) health and rehabilitation of the auditory and balance systems, and (6) auditory and speech communication disorders (including professional ethics). Each subject included 50 multiple-choice questions, with the exception of behavioral audiology, which had 49 questions, amounting to a total of 299 questions.

Results

The correct answer rates across the 6 subjects were as follows: 88% for basic auditory science, 63% for behavioral audiology, 58% for electrophysiological audiology, 72% for principles and practice of hearing devices, 80% for health and rehabilitation of the auditory and balance systems, and 86% for auditory and speech communication disorders (including professional ethics). The overall accuracy rate for the 299 questions was 75%, which surpasses the examination's passing criteria of an average 60% accuracy rate across all subjects. A comprehensive review of ChatGPT-4's responses indicated that incorrect answers were predominantly due to information errors.

Conclusions

ChatGPT-4 demonstrated a robust performance in the Taiwan Audiologist Qualification Examination, showcasing effective logical reasoning skills. Our results suggest that with enhanced information accuracy, ChatGPT-4's performance could be further improved. This study indicates significant potential for the application of AI chatbots in audiology and hearing care services.

Collapse

Embaye J, de Wit M, Snoek F. A Self-Guided Web-Based App (MyDiaMate) for Enhancing Mental Health in Adults With Type 1 Diabetes: Insights From a Real-World Study in the Netherlands. JMIR Diabetes 2024;9:e52923. [PMID: 38568733 PMCID: PMC11024740 DOI: 10.2196/52923] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/19/2023] [Revised: 01/23/2024] [Accepted: 02/08/2024] [Indexed: 04/05/2024] Open

Abstract

BACKGROUND

MyDiaMate is a web-based intervention specifically designed for adults with type 1 diabetes (T1D) that aims to help them improve and maintain their mental health. Prior pilot-testing of MyDiaMate verified its acceptability, feasibility, and usability.

OBJECTIVE

This study aimed to investigate the real-world uptake and usage of MyDiaMate in the Netherlands.

METHODS

Between March 2021 and December 2022, MyDiaMate was made freely available to Dutch adults with T1D. Usage (participation and completion rates of the modules) was tracked using log data. Users could volunteer to participate in the user profile study, which required filling out a set of baseline questionnaires. The usage of study participants was examined separately for participants scoring above and below the cutoffs of the "Problem Areas in Diabetes" (PAID-11) questionnaire (diabetes distress), the "World Health Organization Well-being Index" (WHO-5) questionnaire (emotional well-being), and the fatigue severity subscale of the "Checklist Individual Strength" (CIS) questionnaire (fatigue). Two months after creating an account, study participants received an evaluation questionnaire to provide us with feedback.

RESULTS

In total, 1008 adults created a MyDiaMate account, of whom 343 (34%) participated in the user profile study. The mean age was 43 (SD 14.9; 18-76) years. Most participants were female (n=217, 63.3%) and higher educated (n=198, 57.6%). The majority had been living with T1D for over 5 years (n=241, 73.5%). Of the study participants, 59.1% (n=199) of them reported low emotional well-being (WHO-5 score≤50), 70.9% (n=239) of them reported elevated diabetes distress (PAID-11 score≥18), and 52.4% (n=178) of them reported severe fatigue (CIS score≥35). Participation rates varied between 9.5% (n=19) for social environment to 100% (n=726) for diabetes in balance, which opened by default. Completion rates ranged from 4.3% (n=1) for energy, an extensive cognitive behavioral therapy module, to 68.6% (n=24) for the shorter module on hypos. There were no differences in terms of participation and completion rates of the modules between study participants with a more severe profile, that is, lower emotional well-being, greater diabetes distress, or more fatigue symptoms, and those with a less severe profile. Further, no technical problems were reported, and various suggestions were made by study participants to improve the application, suggesting a need for more personalization.

CONCLUSIONS

Data from this naturalistic study demonstrated the potential of MyDiaMate as a self-help tool for adults with T1D, supplementary to ongoing diabetes care, to improve healthy coping with diabetes and mental health. Future research is needed to explore engagement strategies and test the efficacy of MyDiaMate in a randomized controlled trial.

Collapse

Elyoseph Z, Levkovich I. Comparing the Perspectives of Generative AI, Mental Health Experts, and the General Public on Schizophrenia Recovery: Case Vignette Study. JMIR Ment Health 2024;11:e53043. [PMID: 38533615 PMCID: PMC11004608 DOI: 10.2196/53043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 09/24/2023] [Revised: 01/24/2024] [Accepted: 02/11/2024] [Indexed: 03/28/2024] Open

Abstract

Background

The current paradigm in mental health care focuses on clinical recovery and symptom remission. This model's efficacy is influenced by therapist trust in patient recovery potential and the depth of the therapeutic relationship. Schizophrenia is a chronic illness with severe symptoms where the possibility of recovery is a matter of debate. As artificial intelligence (AI) becomes integrated into the health care field, it is important to examine its ability to assess recovery potential in major psychiatric disorders such as schizophrenia.

Objective

This study aimed to evaluate the ability of large language models (LLMs) in comparison to mental health professionals to assess the prognosis of schizophrenia with and without professional treatment and the long-term positive and negative outcomes.

Methods

Vignettes were inputted into LLMs interfaces and assessed 10 times by 4 AI platforms: ChatGPT-3.5, ChatGPT-4, Google Bard, and Claude. A total of 80 evaluations were collected and benchmarked against existing norms to analyze what mental health professionals (general practitioners, psychiatrists, clinical psychologists, and mental health nurses) and the general public think about schizophrenia prognosis with and without professional treatment and the positive and negative long-term outcomes of schizophrenia interventions.

Results

For the prognosis of schizophrenia with professional treatment, ChatGPT-3.5 was notably pessimistic, whereas ChatGPT-4, Claude, and Bard aligned with professional views but differed from the general public. All LLMs believed untreated schizophrenia would remain static or worsen without professional treatment. For long-term outcomes, ChatGPT-4 and Claude predicted more negative outcomes than Bard and ChatGPT-3.5. For positive outcomes, ChatGPT-3.5 and Claude were more pessimistic than Bard and ChatGPT-4.

Conclusions

The finding that 3 out of the 4 LLMs aligned closely with the predictions of mental health professionals when considering the "with treatment" condition is a demonstration of the potential of this technology in providing professional clinical prognosis. The pessimistic assessment of ChatGPT-3.5 is a disturbing finding since it may reduce the motivation of patients to start or persist with treatment for schizophrenia. Overall, although LLMs hold promise in augmenting health care, their application necessitates rigorous validation and a harmonious blend with human expertise.

Collapse

Nedbal C, Naik N, Castellani D, Gauhar V, Geraghty R, Somani BK. ChatGPT in urology practice: revolutionizing efficiency and patient care with generative artificial intelligence. Curr Opin Urol 2024;34:98-104. [PMID: 37962176 DOI: 10.1097/mou.0000000000001151] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2023]

Elyoseph Z, Refoua E, Asraf K, Lvovsky M, Shimoni Y, Hadar-Shoval D. Capacity of Generative AI to Interpret Human Emotions From Visual and Textual Data: Pilot Evaluation Study. JMIR Ment Health 2024;11:e54369. [PMID: 38319707 PMCID: PMC10879976 DOI: 10.2196/54369] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 11/07/2023] [Revised: 12/09/2023] [Accepted: 12/25/2023] [Indexed: 02/07/2024] Open

Abstract

BACKGROUND

Mentalization, which is integral to human cognitive processes, pertains to the interpretation of one's own and others' mental states, including emotions, beliefs, and intentions. With the advent of artificial intelligence (AI) and the prominence of large language models in mental health applications, questions persist about their aptitude in emotional comprehension. The prior iteration of the large language model from OpenAI, ChatGPT-3.5, demonstrated an advanced capacity to interpret emotions from textual data, surpassing human benchmarks. Given the introduction of ChatGPT-4, with its enhanced visual processing capabilities, and considering Google Bard's existing visual functionalities, a rigorous assessment of their proficiency in visual mentalizing is warranted.

OBJECTIVE

The aim of the research was to critically evaluate the capabilities of ChatGPT-4 and Google Bard with regard to their competence in discerning visual mentalizing indicators as contrasted with their textual-based mentalizing abilities.

METHODS

The Reading the Mind in the Eyes Test developed by Baron-Cohen and colleagues was used to assess the models' proficiency in interpreting visual emotional indicators. Simultaneously, the Levels of Emotional Awareness Scale was used to evaluate the large language models' aptitude in textual mentalizing. Collating data from both tests provided a holistic view of the mentalizing capabilities of ChatGPT-4 and Bard.

RESULTS

ChatGPT-4, displaying a pronounced ability in emotion recognition, secured scores of 26 and 27 in 2 distinct evaluations, significantly deviating from a random response paradigm (P<.001). These scores align with established benchmarks from the broader human demographic. Notably, ChatGPT-4 exhibited consistent responses, with no discernible biases pertaining to the sex of the model or the nature of the emotion. In contrast, Google Bard's performance aligned with random response patterns, securing scores of 10 and 12 and rendering further detailed analysis redundant. In the domain of textual analysis, both ChatGPT and Bard surpassed established benchmarks from the general population, with their performances being remarkably congruent.

CONCLUSIONS

ChatGPT-4 proved its efficacy in the domain of visual mentalizing, aligning closely with human performance standards. Although both models displayed commendable acumen in textual emotion interpretation, Bard's capabilities in visual emotion interpretation necessitate further scrutiny and potential refinement. This study stresses the criticality of ethical AI development for emotional recognition, highlighting the need for inclusive data, collaboration with patients and mental health experts, and stringent governmental oversight to ensure transparency and protect patient privacy.

Collapse

Sufyan NS, Fadhel FH, Alkhathami SS, Mukhadi JYA. Artificial intelligence and social intelligence: preliminary comparison study between AI models and psychologists. Front Psychol 2024;15:1353022. [PMID: 38379623 PMCID: PMC10878391 DOI: 10.3389/fpsyg.2024.1353022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/09/2023] [Accepted: 01/22/2024] [Indexed: 02/22/2024] Open

Elyoseph Z, Levkovich I, Shinan-Altman S. Assessing prognosis in depression: comparing perspectives of AI models, mental health professionals and the general public. Fam Med Community Health 2024;12:e002583. [PMID: 38199604 PMCID: PMC10806564 DOI: 10.1136/fmch-2023-002583] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/12/2024] Open

Abstract

BACKGROUND

Artificial intelligence (AI) has rapidly permeated various sectors, including healthcare, highlighting its potential to facilitate mental health assessments. This study explores the underexplored domain of AI's role in evaluating prognosis and long-term outcomes in depressive disorders, offering insights into how AI large language models (LLMs) compare with human perspectives.

METHODS

Using case vignettes, we conducted a comparative analysis involving different LLMs (ChatGPT-3.5, ChatGPT-4, Claude and Bard), mental health professionals (general practitioners, psychiatrists, clinical psychologists and mental health nurses), and the general public that reported previously. We evaluate the LLMs ability to generate prognosis, anticipated outcomes with and without professional intervention, and envisioned long-term positive and negative consequences for individuals with depression.

RESULTS

In most of the examined cases, the four LLMs consistently identified depression as the primary diagnosis and recommended a combined treatment of psychotherapy and antidepressant medication. ChatGPT-3.5 exhibited a significantly pessimistic prognosis distinct from other LLMs, professionals and the public. ChatGPT-4, Claude and Bard aligned closely with mental health professionals and the general public perspectives, all of whom anticipated no improvement or worsening without professional help. Regarding long-term outcomes, ChatGPT 3.5, Claude and Bard consistently projected significantly fewer negative long-term consequences of treatment than ChatGPT-4.

CONCLUSIONS

This study underscores the potential of AI to complement the expertise of mental health professionals and promote a collaborative paradigm in mental healthcare. The observation that three of the four LLMs closely mirrored the anticipations of mental health experts in scenarios involving treatment underscores the technology's prospective value in offering professional clinical forecasts. The pessimistic outlook presented by ChatGPT 3.5 is concerning, as it could potentially diminish patients' drive to initiate or continue depression therapy. In summary, although LLMs show potential in enhancing healthcare services, their utilisation requires thorough verification and a seamless integration with human judgement and skills.

Collapse

Elyoseph Z, Hadar Shoval D, Levkovich I. Beyond Personhood: Ethical Paradigms in the Generative Artificial Intelligence Era. THE AMERICAN JOURNAL OF BIOETHICS : AJOB 2024;24:57-59. [PMID: 38236857 DOI: 10.1080/15265161.2023.2278546] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/23/2024]

Fei X, Tang Y, Zhang J, Zhou Z, Yamamoto I, Zhang Y. Evaluating cognitive performance: Traditional methods vs. ChatGPT. Digit Health 2024;10:20552076241264639. [PMID: 39156049 PMCID: PMC11329975 DOI: 10.1177/20552076241264639] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/11/2023] [Accepted: 06/10/2024] [Indexed: 08/20/2024] Open

Abstract

Background

NLP models like ChatGPT promise to revolutionize text-based content delivery, particularly in medicine. Yet, doubts remain about ChatGPT's ability to reliably support evaluations of cognitive performance, warranting further investigation into its accuracy and comprehensiveness in this area.

Method

A cohort of 60 cognitively normal individuals and 30 stroke survivors underwent a comprehensive evaluation, covering memory, numerical processing, verbal fluency, and abstract thinking. Healthcare professionals and NLP models GPT-3.5 and GPT-4 conducted evaluations following established standards. Scores were compared, and efforts were made to refine scoring protocols and interaction methods to enhance ChatGPT's potential in these evaluations.

Result

Within the cohort of healthy participants, the utilization of GPT-3.5 revealed significant disparities in memory evaluation compared to both physician-led assessments and those conducted utilizing GPT-4 (P < 0.001). Furthermore, within the domain of memory evaluation, GPT-3.5 exhibited discrepancies in 8 out of 21 specific measures when compared to assessments conducted by physicians (P < 0.05). Additionally, GPT-3.5 demonstrated statistically significant deviations from physician assessments in speech evaluation (P = 0.009). Among participants with a history of stroke, GPT-3.5 exhibited differences solely in verbal assessment compared to physician-led evaluations (P = 0.002). Notably, through the implementation of optimized scoring methodologies and refinement of interaction protocols, partial mitigation of these disparities was achieved.

Conclusion

ChatGPT can produce evaluation outcomes comparable to traditional methods. Despite differences from physician evaluations, refinement of scoring algorithms and interaction protocols has improved alignment. ChatGPT performs well even in populations with specific conditions like stroke, suggesting its versatility. GPT-4 yields results closer to physician ratings, indicating potential for further enhancement. These findings highlight ChatGPT's importance as a supplementary tool, offering new avenues for information gathering in medical fields and guiding its ongoing development and application.

Collapse

Watari T, Takagi S, Sakaguchi K, Nishizaki Y, Shimizu T, Yamamoto Y, Tokuda Y. Performance Comparison of ChatGPT-4 and Japanese Medical Residents in the General Medicine In-Training Examination: Comparison Study. JMIR MEDICAL EDUCATION 2023;9:e52202. [PMID: 38055323 PMCID: PMC10733815 DOI: 10.2196/52202] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/25/2023] [Revised: 10/22/2023] [Accepted: 11/03/2023] [Indexed: 12/07/2023]

Abstract

BACKGROUND

The reliability of GPT-4, a state-of-the-art expansive language model specializing in clinical reasoning and medical knowledge, remains largely unverified across non-English languages.

OBJECTIVE

This study aims to compare fundamental clinical competencies between Japanese residents and GPT-4 by using the General Medicine In-Training Examination (GM-ITE).

METHODS

We used the GPT-4 model provided by OpenAI and the GM-ITE examination questions for the years 2020, 2021, and 2022 to conduct a comparative analysis. This analysis focused on evaluating the performance of individuals who were concluding their second year of residency in comparison to that of GPT-4. Given the current abilities of GPT-4, our study included only single-choice exam questions, excluding those involving audio, video, or image data. The assessment included 4 categories: general theory (professionalism and medical interviewing), symptomatology and clinical reasoning, physical examinations and clinical procedures, and specific diseases. Additionally, we categorized the questions into 7 specialty fields and 3 levels of difficulty, which were determined based on residents' correct response rates.

RESULTS

Upon examination of 137 GM-ITE questions in Japanese, GPT-4 scores were significantly higher than the mean scores of residents (residents: 55.8%, GPT-4: 70.1%; P<.001). In terms of specific disciplines, GPT-4 scored 23.5 points higher in the "specific diseases," 30.9 points higher in "obstetrics and gynecology," and 26.1 points higher in "internal medicine." In contrast, GPT-4 scores in "medical interviewing and professionalism," "general practice," and "psychiatry" were lower than those of the residents, although this discrepancy was not statistically significant. Upon analyzing scores based on question difficulty, GPT-4 scores were 17.2 points lower for easy problems (P=.007) but were 25.4 and 24.4 points higher for normal and difficult problems, respectively (P<.001). In year-on-year comparisons, GPT-4 scores were 21.7 and 21.5 points higher in the 2020 (P=.01) and 2022 (P=.003) examinations, respectively, but only 3.5 points higher in the 2021 examinations (no significant difference).

CONCLUSIONS

In the Japanese language, GPT-4 also outperformed the average medical residents in the GM-ITE test, originally designed for them. Specifically, GPT-4 demonstrated a tendency to score higher on difficult questions with low resident correct response rates and those demanding a more comprehensive understanding of diseases. However, GPT-4 scored comparatively lower on questions that residents could readily answer, such as those testing attitudes toward patients and professionalism, as well as those necessitating an understanding of context and communication. These findings highlight the strengths and limitations of artificial intelligence applications in medical education and practice.

Collapse

Levkovich I, Elyoseph Z. Suicide Risk Assessments Through the Eyes of ChatGPT-3.5 Versus ChatGPT-4: Vignette Study. JMIR Ment Health 2023;10:e51232. [PMID: 37728984 PMCID: PMC10551796 DOI: 10.2196/51232] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 07/25/2023] [Revised: 08/22/2023] [Accepted: 08/24/2023] [Indexed: 09/22/2023] Open

Abstract

BACKGROUND

ChatGPT, a linguistic artificial intelligence (AI) model engineered by OpenAI, offers prospective contributions to mental health professionals. Although having significant theoretical implications, ChatGPT's practical capabilities, particularly regarding suicide prevention, have not yet been substantiated.

OBJECTIVE

The study's aim was to evaluate ChatGPT's ability to assess suicide risk, taking into consideration 2 discernable factors-perceived burdensomeness and thwarted belongingness-over a 2-month period. In addition, we evaluated whether ChatGPT-4 more accurately evaluated suicide risk than did ChatGPT-3.5.

METHODS

ChatGPT was tasked with assessing a vignette that depicted a hypothetical patient exhibiting differing degrees of perceived burdensomeness and thwarted belongingness. The assessments generated by ChatGPT were subsequently contrasted with standard evaluations rendered by mental health professionals. Using both ChatGPT-3.5 and ChatGPT-4 (May 24, 2023), we executed 3 evaluative procedures in June and July 2023. Our intent was to scrutinize ChatGPT-4's proficiency in assessing various facets of suicide risk in relation to the evaluative abilities of both mental health professionals and an earlier version of ChatGPT-3.5 (March 14 version).

RESULTS

During the period of June and July 2023, we found that the likelihood of suicide attempts as evaluated by ChatGPT-4 was similar to the norms of mental health professionals (n=379) under all conditions (average Z score of 0.01). Nonetheless, a pronounced discrepancy was observed regarding the assessments performed by ChatGPT-3.5 (May version), which markedly underestimated the potential for suicide attempts, in comparison to the assessments carried out by the mental health professionals (average Z score of -0.83). The empirical evidence suggests that ChatGPT-4's evaluation of the incidence of suicidal ideation and psychache was higher than that of the mental health professionals (average Z score of 0.47 and 1.00, respectively). Conversely, the level of resilience as assessed by both ChatGPT-4 and ChatGPT-3.5 (both versions) was observed to be lower in comparison to the assessments offered by mental health professionals (average Z score of -0.89 and -0.90, respectively).

CONCLUSIONS

The findings suggest that ChatGPT-4 estimates the likelihood of suicide attempts in a manner akin to evaluations provided by professionals. In terms of recognizing suicidal ideation, ChatGPT-4 appears to be more precise. However, regarding psychache, there was an observed overestimation by ChatGPT-4, indicating a need for further research. These results have implications regarding ChatGPT-4's potential to support gatekeepers, patients, and even mental health professionals' decision-making. Despite the clinical potential, intensive follow-up studies are necessary to establish the use of ChatGPT-4's capabilities in clinical practice. The finding that ChatGPT-3.5 frequently underestimates suicide risk, especially in severe cases, is particularly troubling. It indicates that ChatGPT may downplay one's actual suicide risk level.

Collapse

Talyshinskii A, Naik N, Hameed BMZ, Zhanbyrbekuly U, Khairli G, Guliev B, Juilebø-Jones P, Tzelves L, Somani BK. Expanding horizons and navigating challenges for enhanced clinical workflows: ChatGPT in urology. Front Surg 2023;10:1257191. [PMID: 37744723 PMCID: PMC10512827 DOI: 10.3389/fsurg.2023.1257191] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/12/2023] [Accepted: 08/28/2023] [Indexed: 09/26/2023] Open

Levkovich I, Elyoseph Z. Identifying depression and its determinants upon initiating treatment: ChatGPT versus primary care physicians. Fam Med Community Health 2023;11:e002391. [PMID: 37844967 PMCID: PMC10582915 DOI: 10.1136/fmch-2023-002391] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/18/2023] Open

Abstract

OBJECTIVE

To compare evaluations of depressive episodes and suggested treatment protocols generated by Chat Generative Pretrained Transformer (ChatGPT)-3 and ChatGPT-4 with the recommendations of primary care physicians.

METHODS

Vignettes were input to the ChatGPT interface. These vignettes focused primarily on hypothetical patients with symptoms of depression during initial consultations. The creators of these vignettes meticulously designed eight distinct versions in which they systematically varied patient attributes (sex, socioeconomic status (blue collar worker or white collar worker) and depression severity (mild or severe)). Each variant was subsequently introduced into ChatGPT-3.5 and ChatGPT-4. Each vignette was repeated 10 times to ensure consistency and reliability of the ChatGPT responses.

RESULTS

For mild depression, ChatGPT-3.5 and ChatGPT-4 recommended psychotherapy in 95.0% and 97.5% of cases, respectively. Primary care physicians, however, recommended psychotherapy in only 4.3% of cases. For severe cases, ChatGPT favoured an approach that combined psychotherapy, while primary care physicians recommended a combined approach. The pharmacological recommendations of ChatGPT-3.5 and ChatGPT-4 showed a preference for exclusive use of antidepressants (74% and 68%, respectively), in contrast with primary care physicians, who typically recommended a mix of antidepressants and anxiolytics/hypnotics (67.4%). Unlike primary care physicians, ChatGPT showed no gender or socioeconomic biases in its recommendations.

CONCLUSION

ChatGPT-3.5 and ChatGPT-4 aligned well with accepted guidelines for managing mild and severe depression, without showing the gender or socioeconomic biases observed among primary care physicians. Despite the suggested potential benefit of using atificial intelligence (AI) chatbots like ChatGPT to enhance clinical decision making, further research is needed to refine AI recommendations for severe cases and to consider potential risks and ethical issues.

Collapse

Hadar-Shoval D, Elyoseph Z, Lvovsky M. The plasticity of ChatGPT's mentalizing abilities: personalization for personality structures. Front Psychiatry 2023;14:1234397. [PMID: 37720897 PMCID: PMC10503434 DOI: 10.3389/fpsyt.2023.1234397] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 06/04/2023] [Accepted: 08/22/2023] [Indexed: 09/19/2023] Open

Elyoseph Z, Levkovich I. Beyond human expertise: the promise and limitations of ChatGPT in suicide risk assessment. Front Psychiatry 2023;14:1213141. [PMID: 37593450 PMCID: PMC10427505 DOI: 10.3389/fpsyt.2023.1213141] [Citation(s) in RCA: 18] [Impact Index Per Article: 18.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 04/28/2023] [Accepted: 07/19/2023] [Indexed: 08/19/2023] Open