1
Hunter RB, Thammasitboon S, Rahman SS, Fainberg N, Renuart A, Kumar S, Jain PN, Rissmiller B, Sur M, Mehta S. Using ChatGPT to Provide Patient-Specific Answers to Parental Questions in the PICU. Pediatrics 2024:e2024066615. PMID: 39370900; DOI: 10.1542/peds.2024-066615.
Abstract
OBJECTIVES To determine if ChatGPT can incorporate patient-specific information to provide high-quality answers to parental questions in the PICU. We hypothesized that ChatGPT would generate high-quality, patient-specific responses. METHODS In this cross-sectional study, we generated assessments and plans for 3 PICU patients with respiratory failure, septic shock, and status epilepticus and paired them with 8 typical parental questions. We prompted ChatGPT with instructions, an assessment and plan, and 1 question. Six PICU physicians evaluated the responses for accuracy (1-6), completeness (yes/no), empathy (1-6), and understandability (Patient Education Materials Assessment Tool [PEMAT], 0% to 100%; Flesch-Kincaid grade level). We compared answer quality among scenarios and question types using the Kruskal-Wallis and Fisher's exact tests. We used percent agreement, Cohen's kappa, and Gwet's agreement coefficient to estimate inter-rater reliability. RESULTS All answers incorporated patient details, utilizing them for reasoning in 59% of sentences. Responses had high accuracy (median 5.0, IQR 4.0-6.0), empathy (median 5.0, IQR 5.0-6.0), completeness (97% of all questions), and understandability (PEMAT median 100%, IQR 87.5-100; Flesch-Kincaid grade level 8.7). Only 4 of 144 reviewer scores were <4/6 in accuracy, and no response was deemed likely to cause harm. There was no difference in accuracy, completeness, empathy, or understandability among scenarios or question types. We found fair, substantial, and almost perfect agreement among reviewers for accuracy, empathy, and understandability, respectively. CONCLUSIONS ChatGPT used patient-specific information to provide high-quality answers to parental questions in PICU clinical scenarios.
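For readers who want to see how the abstract's three agreement statistics relate, the snippet below is a minimal sketch, not the study's code, assuming Python with NumPy and scikit-learn; the two reviewers' 1-6 accuracy ratings are hypothetical, and Gwet's first-order coefficient (AC1) is implemented directly since it is absent from common statistics libraries.

```python
# Minimal sketch of the three agreement measures named in the abstract:
# percent agreement, Cohen's kappa, and Gwet's AC1 (two raters).
import numpy as np
from sklearn.metrics import cohen_kappa_score

def gwet_ac1(r1, r2):
    """Gwet's first-order agreement coefficient for two raters."""
    r1, r2 = np.asarray(r1), np.asarray(r2)
    cats = np.union1d(r1, r2)
    n, q = len(r1), len(cats)
    pa = np.mean(r1 == r2)  # observed agreement
    # Mean proportion of ratings falling into each category
    pi = np.array([(np.sum(r1 == c) + np.sum(r2 == c)) / (2 * n) for c in cats])
    pe = np.sum(pi * (1 - pi)) / (q - 1)  # chance agreement
    return (pa - pe) / (1 - pe)

# Hypothetical 1-6 accuracy scores from two of the six physician reviewers
rater_a = [5, 6, 4, 5, 5, 6, 3, 5]
rater_b = [5, 6, 5, 5, 4, 6, 4, 5]

print("percent agreement:", np.mean(np.array(rater_a) == np.array(rater_b)))
print("Cohen's kappa:", cohen_kappa_score(rater_a, rater_b))
print("Gwet's AC1:", gwet_ac1(rater_a, rater_b))
```

AC1 is often preferred over kappa when ratings cluster in a few categories (as high accuracy scores did here), since kappa can be paradoxically low under high agreement.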
Affiliation(s)
- R Brandon Hunter
- Texas Children's Hospital, Houston, Texas
- Baylor College of Medicine, Houston, Texas
- Satid Thammasitboon
- Texas Children's Hospital, Houston, Texas
- Baylor College of Medicine, Houston, Texas
- Andrew Renuart
- Boston Children's Hospital, Boston, Massachusetts
- Harvard Medical School, Boston, Massachusetts
- Shelley Kumar
- Texas Children's Hospital, Houston, Texas
- Baylor College of Medicine, Houston, Texas
- Parag N Jain
- Texas Children's Hospital, Houston, Texas
- Baylor College of Medicine, Houston, Texas
- Brian Rissmiller
- Texas Children's Hospital, Houston, Texas
- Baylor College of Medicine, Houston, Texas
- Moushumi Sur
- Texas Children's Hospital, Houston, Texas
- Baylor College of Medicine, Houston, Texas
- Sanjiv Mehta
- The Children's Hospital of Philadelphia, Philadelphia, Pennsylvania
- University of Pennsylvania, Philadelphia, Pennsylvania
2
Park C, Kim J. Exploring Affective Representations in Emotional Narratives: An Exploratory Study Comparing ChatGPT and Human Responses. Cyberpsychol Behav Soc Netw 2024; 27:736-741. PMID: 39229675; DOI: 10.1089/cyber.2024.0100.
Abstract
While artificial intelligence (AI) has made significant advancements, its seeming lack of emotional ability has hindered effective communication with humans. This study explores how ChatGPT (ChatGPT-3.5, March 23, 2023 version) represents affective responses to emotional narratives and compares these responses with human responses. Thirty-four participants read affect-eliciting short stories and rated their emotional responses, and 10 recorded ChatGPT sessions generated responses to the same stories. Classification analyses revealed the successful identification of the affective categories of the stories, valence, and arousal within and across sessions for ChatGPT. Classification accuracies predicting the affective categories of the stories, valence, and arousal of humans from the affective ratings of ChatGPT, and vice versa, were not significant, indicating differences in the way the affective states were represented. These findings suggest that ChatGPT can distinguish emotional states and generate affective responses consistently, but that affective states are represented differently by ChatGPT and humans. Understanding these mechanisms is crucial for improving emotional interactions with AI.
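The classification analysis reported here reduces to a standard cross-validated classifier over affective ratings. The sketch below is an illustrative reconstruction under assumed data shapes, not the authors' pipeline; the rating matrix and labels are synthetic stand-ins.

```python
# Illustrative sketch: predict each story's affect category from affective
# ratings (e.g., valence and arousal) via 5-fold cross-validation, then
# compare the resulting accuracy against the chance level.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 2))      # synthetic valence/arousal ratings
y = rng.integers(0, 3, size=60)   # synthetic affect category per story

acc = cross_val_score(SVC(kernel="linear"), X, y, cv=5).mean()
print(f"cross-validated accuracy: {acc:.2f} (chance level is about 0.33)")
```

Cross-decoding (training on ChatGPT ratings and testing on human ratings, and vice versa) follows the same pattern with separate train and test sets; accuracy at chance there is what the abstract interprets as differing representations.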
Affiliation(s)
- Chaery Park
- Department of Psychology, Jeonbuk National University, Jeonju, Republic of Korea
- Jongwan Kim
- Department of Psychology, Jeonbuk National University, Jeonju, Republic of Korea
3
Hadar-Shoval D, Asraf K, Shinan-Altman S, Elyoseph Z, Levkovich I. Embedded values-like shape ethical reasoning of large language models on primary care ethical dilemmas. Heliyon 2024; 10:e38056. PMID: 39381244; PMCID: PMC11458949; DOI: 10.1016/j.heliyon.2024.e38056.
Abstract
Objective This article uses the framework of Schwartz's values theory to examine whether the values-like profiles embedded within large language models (LLMs) impact ethical decision-making in dilemmas faced in primary care. It specifically aims to evaluate whether each LLM exhibits a distinct values-like profile, assess its alignment with general population values, and determine whether latent values influence clinical recommendations. Methods The Portrait Values Questionnaire-Revised (PVQ-RR) was submitted to each LLM (Claude, Bard, GPT-3.5, and GPT-4) 20 times to ensure reliable and valid responses. Their responses were compared to a benchmark derived from an international sample of over 53,000 culturally diverse respondents who completed the PVQ-RR. Four vignettes depicting prototypical professional quandaries involving conflicts between competing values were presented to the LLMs. The option selected by each LLM and the strength of its recommendation were evaluated to determine whether the underlying values-like profile impacts output. Results Each LLM demonstrated a unique values-like profile. Universalism and self-direction were prioritized, while power and tradition were assigned less importance than in population benchmarks, suggesting potential Western-centric biases. Preliminary indications suggested that the embedded values-like profiles influence recommendations. Significant variances in confidence strength regarding chosen recommendations materialized between models, suggesting that further vetting is required before LLMs can be relied on as judgment aids. However, the overall selection of preferences aligned with intrinsic value hierarchies. Conclusion The distinct intrinsic values-like profiles embedded within LLMs shape ethical decision-making, which carries implications for their integration in primary care settings serving diverse populations. For context-appropriate, equitable delivery of AI-assisted healthcare globally, it is essential that LLMs be tailored to align with cultural outlooks.
Affiliation(s)
- Dorit Hadar-Shoval
- The Center for Psychobiological Research, Department of Psychology and Educational Counseling, Max Stern Yezreel Valley College, Israel
- Kfir Asraf
- The Center for Psychobiological Research, Department of Psychology and Educational Counseling, Max Stern Yezreel Valley College, Israel
- Shiri Shinan-Altman
- The Louis and Gabi Weisfeld School of Social Work, Bar-Ilan University, Ramat Gan, Israel
- Zohar Elyoseph
- The Center for Psychobiological Research, Department of Psychology and Educational Counseling, Max Stern Yezreel Valley College, Israel
- Department of Brain Sciences, Faculty of Medicine, Imperial College London, England
- Department of Counseling and Human Development, Department of Education, University of Haifa, Israel
4
Tam TYC, Sivarajkumar S, Kapoor S, Stolyar AV, Polanska K, McCarthy KR, Osterhoudt H, Wu X, Visweswaran S, Fu S, Mathur P, Cacciamani GE, Sun C, Peng Y, Wang Y. A framework for human evaluation of large language models in healthcare derived from literature review. NPJ Digit Med 2024; 7:258. PMID: 39333376; PMCID: PMC11437138; DOI: 10.1038/s41746-024-01258-7.
Abstract
With generative artificial intelligence (GenAI), particularly large language models (LLMs), continuing to make inroads in healthcare, assessing LLMs with human evaluations is essential to assuring safety and effectiveness. This study reviews existing literature on human evaluation methodologies for LLMs in healthcare across various medical specialties and addresses factors such as evaluation dimensions, sample types and sizes, selection, and recruitment of evaluators, frameworks and metrics, evaluation process, and statistical analysis type. Our literature review of 142 studies shows gaps in reliability, generalizability, and applicability of current human evaluation practices. To overcome such significant obstacles to healthcare LLM developments and deployments, we propose QUEST, a comprehensive and practical framework for human evaluation of LLMs covering three phases of workflow: Planning, Implementation and Adjudication, and Scoring and Review. QUEST is designed with five proposed evaluation principles: Quality of Information, Understanding and Reasoning, Expression Style and Persona, Safety and Harm, and Trust and Confidence.
Affiliation(s)
- Thomas Yu Chow Tam
- Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA, USA
- Sumit Kapoor
- Department of Critical Care Medicine, University of Pittsburgh Medical Center, Pittsburgh, PA, USA
- Alisa V Stolyar
- Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA, USA
- Katelyn Polanska
- Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA, USA
- Karleigh R McCarthy
- Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA, USA
- Hunter Osterhoudt
- Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA, USA
- Xizhi Wu
- Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA, USA
- Shyam Visweswaran
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
- Clinical and Translational Science Institute, University of Pittsburgh, Pittsburgh, PA, USA
- Sunyang Fu
- Department of Clinical and Health Informatics, Center for Translational AI Excellence and Applications in Medicine, University of Texas Health Science Center at Houston, Houston, TX, USA
- Piyush Mathur
- Department of Anesthesiology, Cleveland Clinic, Cleveland, OH, USA
- BrainX AI ReSearch, BrainX LLC, Cleveland, OH, USA
- Giovanni E Cacciamani
- Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Cong Sun
- Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
- Yifan Peng
- Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
- Yanshan Wang
- Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA, USA
- Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
- Clinical and Translational Science Institute, University of Pittsburgh, Pittsburgh, PA, USA
- Hillman Cancer Center, University of Pittsburgh Medical Center, Pittsburgh, PA, USA
5
Tavory T. Regulating AI in Mental Health: Ethics of Care Perspective. JMIR Ment Health 2024; 11:e58493. PMID: 39298759; PMCID: PMC11450345; DOI: 10.2196/58493.
Abstract
This article contends that the responsible artificial intelligence (AI) approach, the dominant ethics approach ruling most regulatory and ethical guidance, falls short because it overlooks the impact of AI on human relationships. Focusing only on responsible AI principles reinforces a narrow concept of the accountability and responsibility of companies developing AI. This article proposes that applying the ethics of care approach to AI regulation can offer a more comprehensive regulatory and ethical framework that addresses AI's impact on human relationships. This dual approach is essential for the effective regulation of AI in the domain of mental health care. The article delves into the emergence of the new "therapeutic" area facilitated by AI-based bots, which operate without a therapist. The article highlights the difficulties involved, mainly the absence of a defined duty of care toward users, and shows how implementing the ethics of care can establish clear responsibilities for developers. It also sheds light on the potential for emotional manipulation and the risks involved. In conclusion, the article proposes a series of considerations grounded in the ethics of care for the developmental process of AI-powered therapeutic tools.
Affiliation(s)
- Tamar Tavory
- Faculty of Law, Bar Ilan University, Ramat Gan, Israel
- The Samueli Initiative for Responsible AI in Medicine, Tel Aviv University, Tel Aviv, Israel
6
Fatahi S, Vassileva J, Roy CK. Comparing emotions in ChatGPT answers and human answers to the coding questions on Stack Overflow. Front Artif Intell 2024; 7:1393903. PMID: 39351510; PMCID: PMC11439875; DOI: 10.3389/frai.2024.1393903.
Abstract
Introduction Recent advances in generative Artificial Intelligence (AI) and Natural Language Processing (NLP) have led to the development of Large Language Models (LLMs) and AI-powered chatbots like ChatGPT, which have numerous practical applications. Notably, these models assist programmers with coding queries, debugging, solution suggestions, and providing guidance on software development tasks. Despite known issues with the accuracy of ChatGPT's responses, its comprehensive and articulate language continues to attract frequent use. This indicates potential for ChatGPT to support educators and serve as a virtual tutor for students. Methods To explore this potential, we conducted a comprehensive analysis comparing the emotional content in responses from ChatGPT and human answers to 2000 questions sourced from Stack Overflow (SO). The emotional aspects of the answers were examined to understand how the emotional tone of AI responses compares to that of human responses. Results Our analysis revealed that ChatGPT's answers are generally more positive compared to human responses. In contrast, human answers often exhibit emotions such as anger and disgust. Significant differences were observed in emotional expressions between ChatGPT and human responses, particularly in the emotions of anger, disgust, and joy. Human responses displayed a broader emotional spectrum compared to ChatGPT, suggesting greater emotional variability among humans. Discussion The findings highlight a distinct emotional divergence between ChatGPT and human responses, with ChatGPT exhibiting a more uniformly positive tone and humans displaying a wider range of emotions. This variance underscores the need for further research into the role of emotional content in AI and human interactions, particularly in educational contexts where emotional nuances can impact learning and communication.
Affiliation(s)
- Somayeh Fatahi
- Department of Computer Science, University of Saskatchewan, Saskatoon, SK, Canada
7
Bala B. Chatbots Are Not Clinicians: Addressing Misconceptions About Large Language Model Use in Psychiatric Care. Acad Psychiatry 2024. PMID: 39237804; DOI: 10.1007/s40596-024-02042-1.
Affiliation(s)
- Bazif Bala
- Warren Alpert Medical School of Brown University, Providence, RI, USA
8
Girton MR, Greene DN, Messerlian G, Keren DF, Yu M. ChatGPT vs Medical Professional: Analyzing Responses to Laboratory Medicine Questions on Social Media. Clin Chem 2024; 70:1122-1139. PMID: 39013110; DOI: 10.1093/clinchem/hvae093.
Abstract
BACKGROUND The integration of ChatGPT, a large language model (LLM) developed by OpenAI, into healthcare has sparked significant interest due to its potential to enhance patient care and medical education. With the increasing trend of patients accessing laboratory results online, there is a pressing need to evaluate the effectiveness of ChatGPT in providing accurate laboratory medicine information. Our study evaluates ChatGPT's effectiveness in addressing patient questions in this area, comparing its performance with that of medical professionals on social media. METHODS This study sourced patient questions and medical professional responses from Reddit and Quora, comparing them with responses generated by ChatGPT versions 3.5 and 4.0. Experienced laboratory medicine professionals evaluated the responses for quality and preference. Evaluation results were further analyzed using R software. RESULTS The study analyzed 49 questions, with evaluators reviewing responses from both medical professionals and ChatGPT. ChatGPT's responses were preferred by 75.9% of evaluators and generally received higher ratings for quality. They were noted for their comprehensive and accurate information, whereas responses from medical professionals were valued for their conciseness. The interrater agreement was fair, indicating some subjectivity but a consistent preference for ChatGPT's detailed responses. CONCLUSIONS ChatGPT demonstrates potential as an effective tool for addressing queries in laboratory medicine, often surpassing medical professionals in response quality. These results support the need for further research to confirm ChatGPT's utility and explore its integration into healthcare settings.
Affiliation(s)
- Mark R Girton
- Department of Pathology, University of Michigan, Ann Arbor, MI, United States
- Dina N Greene
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, United States
- Geralyn Messerlian
- Department of Pathology, Women and Infants Hospital, Brown University, Providence, RI, United States
- David F Keren
- Department of Pathology, University of Michigan, Ann Arbor, MI, United States
- Min Yu
- Department of Pathology, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, United States
9
Suffoletto B. Deceptively Simple yet Profoundly Impactful: Text Messaging Interventions to Support Health. J Med Internet Res 2024; 26:e58726. PMID: 39190427; PMCID: PMC11387917; DOI: 10.2196/58726.
Abstract
This paper examines the use of text message (SMS) interventions for health-related behavioral support. It first outlines the historical progress in SMS intervention research publications and the variety of funding from US government agencies. A narrative review follows, highlighting the effectiveness of SMS interventions in key health areas, such as physical activity, diet and weight loss, mental health, and substance use, based on published meta-analyses. It then outlines the advantages of text messaging compared to other digital modalities, including the real-time capability to collect information and deliver microdoses of intervention support. Crucial design elements are proposed to optimize effectiveness and longitudinal engagement across communication strategies, psychological foundations, and behavior change tactics. We then discuss advanced functionalities, such as the potential for generative artificial intelligence to improve user interaction. Finally, major challenges to implementation are highlighted, including the absence of a dedicated commercial platform, privacy and security concerns with SMS technology, difficulties integrating SMS interventions with medical informatics systems, and concerns about user engagement. Proposed solutions aim to facilitate the broader application and effectiveness of SMS interventions. Our hope is that these insights can assist researchers and practitioners in using SMS interventions to improve health outcomes and reduce disparities.
Affiliation(s)
- Brian Suffoletto
- Department of Emergency Medicine, Stanford University, Palo Alto, CA, United States
10
Xian X, Chang A, Xiang YT, Liu MT. Debate and Dilemmas Regarding Generative AI in Mental Health Care: Scoping Review. Interact J Med Res 2024; 13:e53672. PMID: 39133916; PMCID: PMC11347908; DOI: 10.2196/53672.
Abstract
BACKGROUND Mental disorders have ranked among the top 10 most prevalent causes of burden on a global scale. Generative artificial intelligence (GAI) has emerged as a promising and innovative technological advancement with significant potential in the field of mental health care. Nevertheless, there is a scarcity of research dedicated to examining and understanding the application landscape of GAI within this domain. OBJECTIVE This review aims to map the current state of GAI knowledge and identify its key uses in the mental health domain by consolidating relevant literature. METHODS Records were searched within 8 reputable sources, including the Web of Science, PubMed, IEEE Xplore, medRxiv, bioRxiv, Google Scholar, CNKI, and Wanfang databases, between 2013 and 2023. Our focus was on original, empirical research, published in either English or Chinese, that used GAI technologies to benefit mental health. For an exhaustive search, we also checked the studies cited by relevant literature. Two reviewers were responsible for the data selection process, and all the extracted data were synthesized and summarized for brief and in-depth analyses depending on the GAI approaches used (traditional retrieval and rule-based techniques vs advanced GAI techniques). RESULTS In this review of 144 articles, 44 (30.6%) met the inclusion criteria for detailed analysis. Six key uses of advanced GAI emerged: mental disorder detection, counseling support, therapeutic application, clinical training, clinical decision-making support, and goal-driven optimization. Advanced GAI systems have been mainly focused on therapeutic applications (n=19, 43%) and counseling support (n=13, 30%), with clinical training being the least common. Most studies (n=28, 64%) focused broadly on mental health, while specific conditions such as anxiety (n=1, 2%), bipolar disorder (n=2, 5%), eating disorders (n=1, 2%), posttraumatic stress disorder (n=2, 5%), and schizophrenia (n=1, 2%) received limited attention. Despite prevalent use, the efficacy of ChatGPT in the detection of mental disorders remains insufficient. In addition, 100 articles on traditional GAI approaches were found, indicating diverse areas where advanced GAI could enhance mental health care. CONCLUSIONS This study provides a comprehensive overview of the use of GAI in mental health care, which serves as a valuable guide for future research, practical applications, and policy development in this domain. While GAI demonstrates promise in augmenting mental health care services, its inherent limitations emphasize its role as a supplementary tool rather than a replacement for trained mental health providers. A conscientious and ethical integration of GAI techniques is necessary, ensuring a balanced approach that maximizes benefits while mitigating potential challenges in mental health care practices.
Affiliation(s)
- Xuechang Xian
- Department of Communication, Faculty of Social Sciences, University of Macau, Macau SAR, China
- Department of Publicity, Zhaoqing University, Zhaoqing City, China
- Angela Chang
- Department of Communication, Faculty of Social Sciences, University of Macau, Macau SAR, China
- Institute of Communication and Health, Lugano University, Lugano, Switzerland
- Yu-Tao Xiang
- Department of Public Health and Medicinal Administration, Faculty of Health Sciences, University of Macau, Macau SAR, China
11
Cosic K, Kopilas V, Jovanovic T. War, emotions, mental health, and artificial intelligence. Front Psychol 2024; 15:1394045. PMID: 39156807; PMCID: PMC11327060; DOI: 10.3389/fpsyg.2024.1394045.
Abstract
During wartime, dysregulation of negative emotions such as fear, anger, hatred, frustration, sadness, humiliation, and hopelessness can overrule normal societal values and culture and endanger global peace and security, as well as mental health in affected societies. It is therefore understandable that the range and power of negative emotions may play important roles in the consideration of human behavior in any armed conflict. The estimation and assessment of dominant negative emotions during wartime are crucial but are challenged by the complexity of the neuro-psycho-physiology of emotions. Currently available natural language processing (NLP) tools offer comprehensive computational methods to analyze and understand the emotional content of related textual data in war-inflicted societies. Innovative AI-driven technologies incorporating machine learning, neuro-linguistic programming, cloud infrastructure, and novel digital therapeutic tools and applications present immense potential to enhance mental health care worldwide. This advancement could make mental health services more cost-effective and readily accessible. Given the inadequate number of psychiatrists and the limited psychiatric resources for coping with the mental health consequences of war and trauma, new digital therapeutic wearable devices supported by AI tools might be a promising approach for the psychiatry of the future. Transformation of dominant negative emotional maps might be undertaken through the simultaneous combination of online cognitive behavioral therapy (CBT) at the individual level and emotionally based strategic communications (EBSC) at the public level. The proposed positive emotional transformation by means of CBT and EBSC may provide important leverage in efforts to protect the mental health of the civilian population in war-inflicted societies. AI-based tools that can be applied in the design of EBSC stimuli, such as OpenAI ChatGPT or Google Gemini, may have great potential to significantly enhance emotionally based strategic communications through more comprehensive semantic and linguistic analysis of available text datasets from war-traumatized societies. A human in the loop, enhanced by ChatGPT and Gemini, can aid in the design and development of emotionally annotated messages that resonate with the targeted population, amplifying the impact of strategic communications in reshaping dominant human emotional maps into more positive ones through CBT and EBSC.
Affiliation(s)
- Kresimir Cosic
- Faculty of Electrical Engineering and Computing, University of Zagreb, Zagreb, Croatia
- Vanja Kopilas
- University of Zagreb Faculty of Croatian Studies, Zagreb, Croatia
- Tanja Jovanovic
- Department of Psychiatry and Behavioral Neurosciences, Wayne State University School of Medicine, Detroit, MI, United States
12
Kim J, Leonte KG, Chen ML, Torous JB, Linos E, Pinto A, Rodriguez CI. Large language models outperform mental and medical health care professionals in identifying obsessive-compulsive disorder. NPJ Digit Med 2024; 7:193. PMID: 39030292; PMCID: PMC11271579; DOI: 10.1038/s41746-024-01181-x.
Abstract
Despite the promising capacity of large language model (LLM)-powered chatbots to diagnose diseases, they have not been tested for obsessive-compulsive disorder (OCD). We assessed the diagnostic accuracy of LLMs in OCD using vignettes and found that LLMs outperformed medical and mental health professionals. This highlights the potential benefit of LLMs in assisting in the timely and accurate diagnosis of OCD, which usually entails a long delay in diagnosis and treatment.
Affiliation(s)
- Jiyeong Kim
- Stanford Center for Digital Health, Department of Medicine, Stanford University, Palo Alto, CA, USA
- Michael L Chen
- Stanford Center for Digital Health, Department of Medicine, Stanford University, Palo Alto, CA, USA
- John B Torous
- Division of Digital Psychiatry, Department of Psychiatry, Beth Israel Deaconess Medical Center, Boston, MA, USA
- Eleni Linos
- Stanford Center for Digital Health, Department of Medicine, Stanford University, Palo Alto, CA, USA
- Anthony Pinto
- Department of Psychiatry, Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Hempstead, NY, USA
- Northwell, New Hyde Park, NY, USA
- Carolyn I Rodriguez
- Department of Psychiatry and Behavioral Sciences, School of Medicine, Stanford University, Palo Alto, CA, USA
- Veterans Affairs Palo Alto Health Care System, Palo Alto, CA, USA
13
Maltby J, Rayes T, Nage A, Sharif S, Omar M, Nichani S. Synthesizing perspectives: Crafting an interdisciplinary view of social media's impact on young people's mental health. PLoS One 2024; 19:e0307164. PMID: 39008509; PMCID: PMC11249244; DOI: 10.1371/journal.pone.0307164.
Abstract
This study explores the intricate relationship between social media usage and the mental health of young individuals by leveraging the insights of 492 UK school headteachers. It adopts a novel multidisciplinary approach, integrating perspectives from psychology, sociology, education studies, political science, philosophy, media studies, linguistics, social work, anthropology, and health sciences. The application of thematic analysis, powered by ChatGPT-4, identifies a predominantly negative perspective on the impact of social media on young people, focusing on key themes across various disciplines, including mental health, identity formation, social interaction and comparison, bullying, digital literacy, and governance policies. These findings culminated in the development of the five-factor Comprehensive Digital Influence Model, which proposes five key themes (Self-Identity and Perception Formation; Social Interaction Skills and Peer Communication; Mental and Emotional Well-Being; Digital Literacy, Critical Thinking, and Information Perception; and Governance, Policy, and Cultural Influence in Digital Spaces) to frame the impacts of social media on young people's mental health across primary and secondary educational stages. This study not only advances academic discourse across multiple disciplines but also provides practical insights for educators, policymakers, and mental health professionals seeking to navigate the challenges and opportunities presented by social media in the digital era.
Affiliation(s)
- John Maltby
- School of Psychology and Vision Sciences, University of Leicester, Leicester, Leicestershire, United Kingdom
- Thooba Rayes
- School of Medicine, University of Leicester, Leicester, Leicestershire, United Kingdom
- Antara Nage
- School of Psychology and Vision Sciences, University of Leicester, Leicester, Leicestershire, United Kingdom
- Sulaimaan Sharif
- School of Medicine, University of Leicester, Leicester, Leicestershire, United Kingdom
- Maryama Omar
- School of Medicine, University of Leicester, Leicester, Leicestershire, United Kingdom
- Sanjiv Nichani
- Leicester Children's Hospital, University Hospitals of Leicester NHS Trust, Leicester, Leicestershire, United Kingdom
14
Banerjee S, Dunn P, Conard S, Ali A. Mental Health Applications of Generative AI and Large Language Modeling in the United States. Int J Environ Res Public Health 2024; 21:910. PMID: 39063487; PMCID: PMC11276907; DOI: 10.3390/ijerph21070910.
Abstract
(1) Background: Artificial intelligence (AI) has flourished in recent years. More specifically, generative AI has had broad applications in many disciplines. While mental illness is on the rise, AI has proven valuable in aiding the diagnosis and treatment of mental disorders. However, there is little to no research about precisely how much interest there is in AI technology. (2) Methods: We performed a Google Trends search for "AI and mental health" and compared relative search volume (RSV) indices of "AI", "AI and Depression", and "AI and anxiety". This time series study employed Box-Jenkins time series modeling to forecast long-term interest through the end of 2024. (3) Results: Within the United States, AI interest steadily increased throughout 2023, with some anomalies due to media reporting. Through predictive models, we found that this trend is predicted to increase by 114% through the end of the year 2024, with public interest in AI applications being on the rise. (4) Conclusions: We found that awareness of AI increased drastically throughout 2023, especially in mental health. This demonstrates increasing public awareness of mental health and AI, making advocacy and education about AI technology of paramount importance.
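The Box-Jenkins forecasting step described in the Methods can be sketched with a standard ARIMA fit. The snippet below is a hedged illustration using statsmodels on a synthetic relative-search-volume series; the (1, 1, 1) order is an assumption for the sketch, since the paper's identified model is not given here.

```python
# Sketch of a Box-Jenkins workflow: fit an ARIMA model to a weekly
# relative-search-volume (RSV) series and forecast roughly six months ahead.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
weeks = pd.date_range("2022-01-02", periods=104, freq="W")
trend = 20 + 0.5 * np.arange(104)                    # upward interest trend
rsv = pd.Series(np.clip(trend + rng.normal(0, 4, 104), 0, 100), index=weeks)

# In practice the (p, d, q) order is identified from ACF/PACF diagnostics.
fit = ARIMA(rsv, order=(1, 1, 1)).fit()
print(fit.forecast(steps=26).tail())                 # ~6-month forecast
```

Forecasting a trending RSV series this way extrapolates the differenced trend, which is how a percentage growth figure like the 114% reported here would be read off the forecast horizon.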
Affiliation(s)
- Sri Banerjee
- School of Health Sciences and Public Policy, Walden University, Minneapolis, MN 55401, USA
- Pat Dunn
- Center for Health Technology & Innovation, American Heart Association, Dallas, TX 75231, USA
- Asif Ali
- McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX 77030, USA
15
Omar M, Soffer S, Charney AW, Landi I, Nadkarni GN, Klang E. Applications of large language models in psychiatry: a systematic review. Front Psychiatry 2024; 15:1422807. PMID: 38979501; PMCID: PMC11228775; DOI: 10.3389/fpsyt.2024.1422807.
Abstract
Background With their unmatched ability to interpret and engage with human language and context, large language models (LLMs) hint at the potential to bridge AI and human cognitive processes. This review explores the current application of LLMs, such as ChatGPT, in the field of psychiatry. Methods We followed PRISMA guidelines and searched through PubMed, Embase, Web of Science, and Scopus, up until March 2024. Results From 771 retrieved articles, we included 16 that directly examine LLMs' use in psychiatry. LLMs, particularly ChatGPT and GPT-4, showed diverse applications in clinical reasoning, social media, and education within psychiatry. They can assist in diagnosing mental health issues, managing depression, evaluating suicide risk, and supporting education in the field. However, our review also points out their limitations, such as difficulties with complex cases and potential underestimation of suicide risks. Conclusion Early research in psychiatry reveals LLMs' versatile applications, from diagnostic support to educational roles. Given the rapid pace of advancement, future investigations are poised to explore the extent to which these models might redefine traditional roles in mental health care.
Affiliation(s)
- Mahmud Omar
- Faculty of Medicine, Tel-Aviv University, Tel-Aviv, Israel
- Shelly Soffer
- Internal Medicine B, Assuta Medical Center, Ashdod, Israel
- Ben-Gurion University of the Negev, Be'er Sheva, Israel
- Isotta Landi
- Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Girish N Nadkarni
- Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Eyal Klang
- Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, NY, United States
16
Liu J. ChatGPT: perspectives from human-computer interaction and psychology. Front Artif Intell 2024; 7:1418869. PMID: 38957452; PMCID: PMC11217544; DOI: 10.3389/frai.2024.1418869.
Abstract
The release of GPT-4 has garnered widespread attention across various fields, signaling the impending widespread adoption and application of Large Language Models (LLMs). However, previous research has predominantly focused on the technical principles of ChatGPT and its social impact, overlooking its effects on human-computer interaction and user psychology. This paper explores the multifaceted impacts of ChatGPT on human-computer interaction, psychology, and society through a literature review. The author investigates ChatGPT's technical foundation, including its Transformer architecture and RLHF (Reinforcement Learning from Human Feedback) process, enabling it to generate human-like responses. In terms of human-computer interaction, the author studies the significant improvements GPT models bring to conversational interfaces. The analysis extends to psychological impacts, weighing the potential of ChatGPT to mimic human empathy and support learning against the risks of reduced interpersonal connections. In the commercial and social domains, the paper discusses the applications of ChatGPT in customer service and social services, highlighting the improvements in efficiency and challenges such as privacy issues. Finally, the author offers predictions and recommendations for ChatGPT's future development directions and its impact on social relationships.
Affiliation(s)
- Jiaxi Liu
- Wee Kim Wee School of Communication and Information, Nanyang Technological University, Singapore, Singapore
17
Suwała S, Szulc P, Guzowski C, Kamińska B, Dorobiała J, Wojciechowska K, Berska M, Kubicka O, Kosturkiewicz O, Kosztulska B, Rajewska A, Junik R. ChatGPT-3.5 passes Poland's medical final examination: Is it possible for ChatGPT to become a doctor in Poland? SAGE Open Med 2024; 12:20503121241257777. PMID: 38895543; PMCID: PMC11185017; DOI: 10.1177/20503121241257777.
Abstract
Objectives ChatGPT is an advanced chatbot based on a large language model that has the ability to answer questions. ChatGPT is undoubtedly capable of transforming communication, education, and customer support; however, can it play the role of a doctor? In Poland, prior to obtaining a medical diploma, candidates must successfully pass the Medical Final Examination. Methods The purpose of this research was to determine how well ChatGPT performed on the Polish Medical Final Examination, the passing of which is required to become a doctor in Poland (an exam is considered passed if at least 56% of the tasks are answered correctly). A total of 2138 categorized Medical Final Examination questions (from 11 examination sessions held between 2013-2015 and 2021-2023) were presented to ChatGPT-3.5 from 19 to 26 May 2023. For further analysis, the questions were divided into quintiles based on difficulty and duration, as well as question types (simple A-type or complex K-type). The answers provided by ChatGPT were compared to the official answer key, reviewed for any changes resulting from the advancement of medical knowledge. Results ChatGPT correctly answered 53.4%-64.9% of questions. In 8 out of 11 exam sessions, ChatGPT achieved the scores required to successfully pass the examination (60%). The correlation between the efficacy of artificial intelligence and the complexity, difficulty, and length of a question was negative. AI outperformed humans in one category: psychiatry (77.18% vs. 70.25%, p = 0.081). Conclusions The performance of artificial intelligence is deemed satisfactory; however, it is markedly inferior to that of human graduates in the majority of instances. Despite its potential utility in many medical areas, ChatGPT is constrained by inherent limitations that prevent it from entirely supplanting human expertise and knowledge.
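The session-level scoring and quintile analysis described here reduce to simple grouped accuracy computations. Below is a hypothetical sketch in Python with pandas; the data frame is synthetic, and the 56% pass threshold is taken from the abstract's parenthetical.

```python
# Hypothetical sketch: per-session accuracy against the 56% pass threshold,
# plus accuracy by difficulty quintile (both analyses named in the abstract).
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
df = pd.DataFrame({
    "session": rng.integers(1, 12, 2138),    # 11 examination sessions
    "difficulty": rng.uniform(0, 1, 2138),   # e.g., human error rate per item
    "correct": rng.random(2138) < 0.6,       # whether ChatGPT answered correctly
})

per_session = df.groupby("session")["correct"].mean()
print((per_session >= 0.56).sum(), "of", per_session.size, "sessions passed")

df["quintile"] = pd.qcut(df["difficulty"], 5, labels=[1, 2, 3, 4, 5])
print(df.groupby("quintile", observed=True)["correct"].mean())
```

With real data, a monotonically decreasing accuracy across quintiles would reproduce the negative difficulty-performance correlation the authors report.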
Affiliation(s)
- Szymon Suwała
- Department of Endocrinology and Diabetology, Nicolaus Copernicus University, Collegium Medicum, Bydgoszcz, Poland
- Paulina Szulc
- Evidence-Based Medicine Students Scientific Club of Department of Endocrinology and Diabetology, Nicolaus Copernicus University, Collegium Medicum, Bydgoszcz, Poland
- Cezary Guzowski
- Evidence-Based Medicine Students Scientific Club of Department of Endocrinology and Diabetology, Nicolaus Copernicus University, Collegium Medicum, Bydgoszcz, Poland
- Barbara Kamińska
- Evidence-Based Medicine Students Scientific Club of Department of Endocrinology and Diabetology, Nicolaus Copernicus University, Collegium Medicum, Bydgoszcz, Poland
- Jakub Dorobiała
- Evidence-Based Medicine Students Scientific Club of Department of Endocrinology and Diabetology, Nicolaus Copernicus University, Collegium Medicum, Bydgoszcz, Poland
- Karolina Wojciechowska
- Evidence-Based Medicine Students Scientific Club of Department of Endocrinology and Diabetology, Nicolaus Copernicus University, Collegium Medicum, Bydgoszcz, Poland
- Maria Berska
- Evidence-Based Medicine Students Scientific Club of Department of Endocrinology and Diabetology, Nicolaus Copernicus University, Collegium Medicum, Bydgoszcz, Poland
- Olga Kubicka
- Evidence-Based Medicine Students Scientific Club of Department of Endocrinology and Diabetology, Nicolaus Copernicus University, Collegium Medicum, Bydgoszcz, Poland
- Oliwia Kosturkiewicz
- Evidence-Based Medicine Students Scientific Club of Department of Endocrinology and Diabetology, Nicolaus Copernicus University, Collegium Medicum, Bydgoszcz, Poland
- Bernadetta Kosztulska
- Evidence-Based Medicine Students Scientific Club of Department of Endocrinology and Diabetology, Nicolaus Copernicus University, Collegium Medicum, Bydgoszcz, Poland
- Alicja Rajewska
- Evidence-Based Medicine Students Scientific Club of Department of Endocrinology and Diabetology, Nicolaus Copernicus University, Collegium Medicum, Bydgoszcz, Poland
- Roman Junik
- Department of Endocrinology and Diabetology, Nicolaus Copernicus University, Collegium Medicum, Bydgoszcz, Poland
18
Burke-Garcia A, Soskin Hicks R. Scaling the Idea of Opinion Leadership to Address Health Misinformation: The Case for "Health Communication AI". J Health Commun 2024; 29:396-399. PMID: 38832662; DOI: 10.1080/10810730.2024.2357575.
Abstract
There is strong evidence of the impact of opinion leaders in health promotion programs. Early work by Burke-Garcia suggests that social media influencers are the opinion leaders of the digital age: they come from the communities they influence, have built trust with them, and may be useful in combating misinformation by disseminating credible and timely health information and prompting consideration of health behaviors. AI has contributed to the spread of misinformation, but it can also be a vital part of the solution, informing and educating in real time and at scale. Personalized, empathetic messaging is crucial, though, and research supports that individuals are drawn to empathetic AI responses and prefer them to human responses in some digital environments. This mimics what we know about influencers and how they approach communicating with their followers. Blending what we know about social media influencers as opinion leaders with the power and scale of AI can enable us to address the spread of misinformation. This paper reviews the knowledge base and proposes the development of what we term "Health Communication AI", perhaps the newest form of opinion leader, to fight health misinformation.
Affiliation(s)
- A Burke-Garcia
- Public Health Department, NORC at the University of Chicago, Bethesda, Maryland, USA
- R Soskin Hicks
- Public Health Department, NORC at the University of Chicago, Bethesda, Maryland, USA
19
Tan S, Xin X, Wu D. ChatGPT in medicine: prospects and challenges: a review article. Int J Surg 2024; 110:3701-3706. PMID: 38502861; PMCID: PMC11175750; DOI: 10.1097/js9.0000000000001312.
Abstract
It has been a year since the launch of Chat Generative Pre-trained Transformer (ChatGPT), a generative artificial intelligence (AI) program. The introduction of this cross-generational product initially stunned people with its incredible potential and then aroused increasing concern. In the field of medicine, researchers have extensively explored its possible applications and achieved numerous satisfactory results. However, opportunities and issues always come together. Problems have also been exposed during the application of ChatGPT, requiring cautious handling, thorough consideration, and further guidelines for safe use. Here, the authors summarize the potential applications of ChatGPT in the medical field, including revolutionizing healthcare consultation, assisting patient management and treatment, transforming medical education, and facilitating clinical research. Meanwhile, the authors also enumerate researchers' concerns arising alongside its broad and satisfactory applications. As it is irreversible that AI will gradually permeate every aspect of modern life, the authors hope that this review can not only promote people's understanding of its potential applications but also remind them to be more cautious about this "Pandora's box" in the medical field. It is necessary to establish normative guidelines for its safe use in the medical field as soon as possible.
Affiliation(s)
- Di Wu
- Plastic Surgery Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Shijingshan, Beijing, China
20
Shinan-Altman S, Elyoseph Z, Levkovich I. The impact of history of depression and access to weapons on suicide risk assessment: a comparison of ChatGPT-3.5 and ChatGPT-4. PeerJ 2024; 12:e17468. PMID: 38827287; PMCID: PMC11143969; DOI: 10.7717/peerj.17468.
Abstract
The aim of this study was to evaluate the effectiveness of ChatGPT-3.5 and ChatGPT-4 in incorporating critical risk factors, namely history of depression and access to weapons, into suicide risk assessments. Both models assessed suicide risk using scenarios that featured individuals with and without a history of depression and access to weapons. The models estimated the likelihood of suicidal thoughts, suicide attempts, serious suicide attempts, and suicide-related mortality on a Likert scale. A three-way multivariate ANOVA with Bonferroni post hoc tests was conducted to examine the impact of the aforementioned independent factors (history of depression and access to weapons) on these outcome variables. Both models identified history of depression as a significant suicide risk factor. ChatGPT-4 demonstrated a more nuanced understanding of the relationship between depression, access to weapons, and suicide risk. In contrast, ChatGPT-3.5 displayed limited insight into this complex relationship. ChatGPT-4 consistently assigned higher severity ratings to suicide-related variables than did ChatGPT-3.5. The study highlights the potential of these two models, particularly ChatGPT-4, to enhance suicide risk assessment by considering complex risk factors.
Affiliation(s)
- Zohar Elyoseph
- Department of Brain Sciences, Faculty of Medicine, Imperial College London, London, England, United Kingdom
- The Center for Psychobiological Research, Department of Psychology and Educational Counseling, Max Stern Yezreel Valley College, Emek Yezreel, Israel
- Inbar Levkovich
- Faculty of Graduate Studies, Oranim Academic College of Education, Kiryat Tiv'on, Israel
21
Irfan B, Kuoppamäki S, Skantze G. Recommendations for designing conversational companion robots with older adults through foundation models. Front Robot AI 2024; 11:1363713. PMID: 38860032; PMCID: PMC11163135; DOI: 10.3389/frobt.2024.1363713.
Abstract
Companion robots aim to mitigate loneliness and social isolation among older adults by providing social and emotional support in their everyday lives. However, older adults' expectations of conversational companionship might substantially differ from what current technologies can achieve, as well as from those of other age groups such as young adults. Thus, it is crucial to involve older adults in the development of conversational companion robots to ensure that these devices align with their unique expectations and experiences. The recent advancement in foundation models, such as large language models, has taken a significant stride toward fulfilling those expectations, in contrast to the prior literature that relied on humans controlling robots (i.e., Wizard of Oz) or limited rule-based architectures that are not feasible to apply in the daily lives of older adults. Consequently, we conducted a participatory design (co-design) study with 28 older adults, demonstrating a companion robot using a large language model (LLM), and design scenarios that represent situations from everyday life. The thematic analysis of the discussions around these scenarios shows that older adults expect a conversational companion robot to engage in conversation actively in isolation and passively in social settings, remember previous conversations and personalize, protect privacy and provide control over learned data, give information and daily reminders, foster social skills and connections, and express empathy and emotions. Based on these findings, this article provides actionable recommendations for designing conversational companion robots for older adults with foundation models, such as LLMs and vision-language models, which can also be applied to conversational robots in other domains.
Affiliation(s)
- Bahar Irfan
- Division of Speech, Music and Hearing, KTH Royal Institute of Technology, Stockholm, Sweden
- Sanna Kuoppamäki
- Division of Health Informatics and Logistics, KTH Royal Institute of Technology, Stockholm, Sweden
- Gabriel Skantze
- Division of Speech, Music and Hearing, KTH Royal Institute of Technology, Stockholm, Sweden
22
Haber Y, Levkovich I, Hadar-Shoval D, Elyoseph Z. The Artificial Third: A Broad View of the Effects of Introducing Generative Artificial Intelligence on Psychotherapy. JMIR Ment Health 2024; 11:e54781. PMID: 38787297; PMCID: PMC11137430; DOI: 10.2196/54781.
Abstract
This paper explores a significant shift in the field of mental health in general and psychotherapy in particular following generative artificial intelligence's new capabilities in processing and generating humanlike language. Following Freud, this lingo-technological development is conceptualized as the "fourth narcissistic blow" that science inflicts on humanity. We argue that this narcissistic blow has a potentially dramatic influence on perceptions of human society, interrelationships, and the self. We should, accordingly, expect dramatic changes in perceptions of the therapeutic act following the emergence of what we term the artificial third in the field of psychotherapy. The introduction of an artificial third marks a critical juncture, prompting us to ask the following important core questions that address two basic elements of critical thinking, namely, transparency and autonomy: (1) What is this new artificial presence in therapy relationships? (2) How does it reshape our perception of ourselves and our interpersonal dynamics? and (3) What remains of the irreplaceable human elements at the core of therapy? Given the ethical implications that arise from these questions, this paper proposes that the artificial third can be a valuable asset when applied with insight and ethical consideration, enhancing but not replacing the human touch in therapy.
Collapse
Affiliation(s)
- Yuval Haber
- The PhD Program of Hermeneutics and Cultural Studies, Interdisciplinary Studies Unit, Bar-Ilan University, Ramat Gan, Israel
| | | | - Dorit Hadar-Shoval
- Department of Psychology and Educational Counseling, The Max Stern Yezreel Valley College, Emek Yezreel, Israel
| | - Zohar Elyoseph
- Department of Brain Sciences, Faculty of Medicine, Imperial College London, London, United Kingdom
- The Center for Psychobiological Research, Department of Psychology and Educational Counseling, The Max Stern Yezreel Valley College, Emek Yezreel, Israel
| |
Collapse
|
23
|
Yang P, Jiang J. In Reference to Evaluation of Oropharyngeal Cancer Information from Revolutionary Artificial Intelligence Chatbot. Laryngoscope 2024; 134:E18. [PMID: 38299720 DOI: 10.1002/lary.31315] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/02/2023] [Accepted: 12/06/2023] [Indexed: 02/02/2024]
Affiliation(s)
- Pingping Yang
- Department of Laboratory Medicine, People's Hospital of Qiannan Prefecture, Guizhou, China
| | - Jiuliang Jiang
- School of Clinical Medicine, Guizhou Medical University, Guizhou, China
| |
Collapse
|
24
|
Kokot NC, Davis RJ. In Response to Evaluation of Oropharyngeal Cancer Information from Revolutionary Artificial Intelligence Chatbot. Laryngoscope 2024; 134:E19. [PMID: 38299696 DOI: 10.1002/lary.31313] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/30/2023] [Accepted: 01/05/2024] [Indexed: 02/02/2024]
Affiliation(s)
- Niels C Kokot
- Caruso Department of Otolaryngology-Head & Neck Surgery, Keck School of Medicine of the University of Southern California, Los Angeles, California, U.S.A
| | - Ryan J Davis
- Keck School of Medicine of the University of Southern California, Los Angeles, California, U.S.A
| |
Collapse
|
25
|
Wang S, Mo C, Chen Y, Dai X, Wang H, Shen X. Exploring the Performance of ChatGPT-4 in the Taiwan Audiologist Qualification Examination: Preliminary Observational Study Highlighting the Potential of AI Chatbots in Hearing Care. JMIR MEDICAL EDUCATION 2024; 10:e55595. [PMID: 38693697 PMCID: PMC11067446 DOI: 10.2196/55595] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/18/2023] [Revised: 03/09/2024] [Accepted: 03/22/2024] [Indexed: 05/03/2024]
Abstract
Background Artificial intelligence (AI) chatbots, such as ChatGPT-4, have shown immense potential for application across various aspects of medicine, including medical education, clinical practice, and research. Objective This study aimed to evaluate the performance of ChatGPT-4 in the 2023 Taiwan Audiologist Qualification Examination, thereby preliminarily exploring the potential utility of AI chatbots in the fields of audiology and hearing care services. Methods ChatGPT-4 was tasked with providing answers and reasoning for the 2023 Taiwan Audiologist Qualification Examination. The examination encompassed six subjects: (1) basic auditory science, (2) behavioral audiology, (3) electrophysiological audiology, (4) principles and practice of hearing devices, (5) health and rehabilitation of the auditory and balance systems, and (6) auditory and speech communication disorders (including professional ethics). Each subject included 50 multiple-choice questions, with the exception of behavioral audiology, which had 49 questions, amounting to a total of 299 questions. Results The correct answer rates across the 6 subjects were as follows: 88% for basic auditory science, 63% for behavioral audiology, 58% for electrophysiological audiology, 72% for principles and practice of hearing devices, 80% for health and rehabilitation of the auditory and balance systems, and 86% for auditory and speech communication disorders (including professional ethics). The overall accuracy rate for the 299 questions was 75%, which surpasses the examination's passing criterion of an average 60% accuracy rate across all subjects. A comprehensive review of ChatGPT-4's responses indicated that incorrect answers were predominantly due to information errors. Conclusions ChatGPT-4 demonstrated a robust performance in the Taiwan Audiologist Qualification Examination, showcasing effective logical reasoning skills. Our results suggest that with enhanced information accuracy, ChatGPT-4's performance could be further improved. This study indicates significant potential for the application of AI chatbots in audiology and hearing care services.
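As a quick plausibility check, the reported 75% overall accuracy can be approximately re-derived from the per-subject rates and question counts given above. This is an editorial sketch in Python, not the authors' analysis; small discrepancies reflect rounding of the per-subject percentages.

```python
# Approximate re-derivation of the overall accuracy from the per-subject
# rates reported in the abstract (rates are rounded, so the result is ~75%).
subjects = {
    "basic auditory science": (0.88, 50),
    "behavioral audiology": (0.63, 49),
    "electrophysiological audiology": (0.58, 50),
    "principles and practice of hearing devices": (0.72, 50),
    "health and rehabilitation of the auditory and balance systems": (0.80, 50),
    "auditory and speech communication disorders": (0.86, 50),
}

total_questions = sum(n for _, n in subjects.values())    # 299
correct = sum(rate * n for rate, n in subjects.values())  # ~222.9
overall = correct / total_questions

print(f"overall accuracy ~ {overall:.1%}")                # ~74.5%, reported as 75%
print(f"meets the 60% passing criterion: {overall >= 0.60}")
```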
Collapse
Affiliation(s)
- Shangqiguo Wang
- Human Communication, Learning, and Development Unit, Faculty of Education, The University of Hong Kong, Hong Kong, China (Hong Kong)
| | - Changgeng Mo
- Department of Otorhinolaryngology, Head and Neck Surgery, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong, China (Hong Kong)
| | - Yuan Chen
- Department of Special Education and Counselling, The Education University of Hong Kong, Hong Kong, China (Hong Kong)
| | - Xiaolu Dai
- Department of Social Work, Hong Kong Baptist University, Hong Kong, China (Hong Kong)
| | - Huiyi Wang
- Department of Medical Services, Children’s Hospital, Zhejiang University School of Medicine, Hangzhou, China
| | - Xiaoli Shen
- Department of Health and Early Childhood Care, Ningbo College of Health School, Ningbo, China
| |
Collapse
|
26
|
Embaye J, de Wit M, Snoek F. A Self-Guided Web-Based App (MyDiaMate) for Enhancing Mental Health in Adults With Type 1 Diabetes: Insights From a Real-World Study in the Netherlands. JMIR Diabetes 2024; 9:e52923. [PMID: 38568733 PMCID: PMC11024740 DOI: 10.2196/52923] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/19/2023] [Revised: 01/23/2024] [Accepted: 02/08/2024] [Indexed: 04/05/2024] Open
Abstract
BACKGROUND MyDiaMate is a web-based intervention specifically designed for adults with type 1 diabetes (T1D) that aims to help them improve and maintain their mental health. Prior pilot-testing of MyDiaMate verified its acceptability, feasibility, and usability. OBJECTIVE This study aimed to investigate the real-world uptake and usage of MyDiaMate in the Netherlands. METHODS Between March 2021 and December 2022, MyDiaMate was made freely available to Dutch adults with T1D. Usage (participation and completion rates of the modules) was tracked using log data. Users could volunteer to participate in the user profile study, which required filling out a set of baseline questionnaires. The usage of study participants was examined separately for participants scoring above and below the cutoffs of the "Problem Areas in Diabetes" (PAID-11) questionnaire (diabetes distress), the "World Health Organization Well-being Index" (WHO-5) questionnaire (emotional well-being), and the fatigue severity subscale of the "Checklist Individual Strength" (CIS) questionnaire (fatigue). Two months after creating an account, study participants received an evaluation questionnaire to provide us with feedback. RESULTS In total, 1008 adults created a MyDiaMate account, of whom 343 (34%) participated in the user profile study. The mean age was 43 (SD 14.9; range 18-76) years. Most participants were female (n=217, 63.3%) and had higher education (n=198, 57.6%). The majority had been living with T1D for over 5 years (n=241, 73.5%). Of the study participants, 59.1% (n=199) reported low emotional well-being (WHO-5 score ≤50), 70.9% (n=239) reported elevated diabetes distress (PAID-11 score ≥18), and 52.4% (n=178) reported severe fatigue (CIS score ≥35). Participation rates ranged from 9.5% (n=19) for the social environment module to 100% (n=726) for the diabetes in balance module, which opened by default. Completion rates ranged from 4.3% (n=1) for energy, an extensive cognitive behavioral therapy module, to 68.6% (n=24) for the shorter module on hypos. There were no differences in the participation and completion rates of the modules between study participants with a more severe profile, that is, lower emotional well-being, greater diabetes distress, or more fatigue symptoms, and those with a less severe profile. Further, no technical problems were reported, and study participants made various suggestions to improve the application, suggesting a need for more personalization. CONCLUSIONS Data from this naturalistic study demonstrated the potential of MyDiaMate as a self-help tool for adults with T1D, supplementary to ongoing diabetes care, to improve healthy coping with diabetes and mental health. Future research is needed to explore engagement strategies and test the efficacy of MyDiaMate in a randomized controlled trial.
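The severity profiling described above applies fixed questionnaire cutoffs. A minimal Python sketch of that classification follows; the function name and example scores are hypothetical, but the cutoffs are those stated in the abstract.

```python
# Minimal sketch of the severity profiling described in the abstract,
# using the stated questionnaire cutoffs. Example scores are hypothetical.
def classify(who5: int, paid11: int, cis_fatigue: int) -> dict:
    """Flag low well-being, elevated distress, and severe fatigue."""
    return {
        "low_emotional_wellbeing": who5 <= 50,       # WHO-5 score <= 50
        "elevated_diabetes_distress": paid11 >= 18,  # PAID-11 score >= 18
        "severe_fatigue": cis_fatigue >= 35,         # CIS fatigue subscale >= 35
    }

print(classify(who5=42, paid11=21, cis_fatigue=30))
# {'low_emotional_wellbeing': True, 'elevated_diabetes_distress': True,
#  'severe_fatigue': False}
```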
Collapse
Affiliation(s)
- Jiska Embaye
- Department of Medical Psychology, Amsterdam Public Health, Amsterdam UMC, Vrije Universiteit Amsterdam, Amsterdam, Netherlands
| | - Maartje de Wit
- Department of Medical Psychology, Amsterdam Public Health, Amsterdam UMC, Vrije Universiteit Amsterdam, Amsterdam, Netherlands
| | - Frank Snoek
- Department of Medical Psychology, Amsterdam Public Health, Amsterdam UMC, Vrije Universiteit Amsterdam, Amsterdam, Netherlands
| |
Collapse
|
27
|
Elyoseph Z, Levkovich I. Comparing the Perspectives of Generative AI, Mental Health Experts, and the General Public on Schizophrenia Recovery: Case Vignette Study. JMIR Ment Health 2024; 11:e53043. [PMID: 38533615 PMCID: PMC11004608 DOI: 10.2196/53043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 09/24/2023] [Revised: 01/24/2024] [Accepted: 02/11/2024] [Indexed: 03/28/2024] Open
Abstract
Background The current paradigm in mental health care focuses on clinical recovery and symptom remission. This model's efficacy is influenced by therapist trust in patient recovery potential and the depth of the therapeutic relationship. Schizophrenia is a chronic illness with severe symptoms where the possibility of recovery is a matter of debate. As artificial intelligence (AI) becomes integrated into the health care field, it is important to examine its ability to assess recovery potential in major psychiatric disorders such as schizophrenia. Objective This study aimed to evaluate the ability of large language models (LLMs), in comparison to mental health professionals, to assess the prognosis of schizophrenia with and without professional treatment and the long-term positive and negative outcomes. Methods Vignettes were input into the LLM interfaces and assessed 10 times by 4 AI platforms: ChatGPT-3.5, ChatGPT-4, Google Bard, and Claude. A total of 80 evaluations were collected and benchmarked against existing norms to analyze what mental health professionals (general practitioners, psychiatrists, clinical psychologists, and mental health nurses) and the general public think about schizophrenia prognosis with and without professional treatment and the positive and negative long-term outcomes of schizophrenia interventions. Results For the prognosis of schizophrenia with professional treatment, ChatGPT-3.5 was notably pessimistic, whereas ChatGPT-4, Claude, and Bard aligned with professional views but differed from the general public. All LLMs predicted that schizophrenia would remain static or worsen without professional treatment. For long-term outcomes, ChatGPT-4 and Claude predicted more negative outcomes than Bard and ChatGPT-3.5. For positive outcomes, ChatGPT-3.5 and Claude were more pessimistic than Bard and ChatGPT-4. Conclusions The finding that 3 out of the 4 LLMs aligned closely with the predictions of mental health professionals under the "with treatment" condition demonstrates the potential of this technology to provide professional clinical prognoses. The pessimistic assessment of ChatGPT-3.5 is a disturbing finding, since it may reduce the motivation of patients to start or persist with treatment for schizophrenia. Overall, although LLMs hold promise in augmenting health care, their application necessitates rigorous validation and a harmonious blend with human expertise.
Collapse
Affiliation(s)
- Zohar Elyoseph
- Department of Brain Sciences, Faculty of Medicine, Imperial College London, London, United Kingdom
- The Center for Psychobiological Research, Department of Psychology and Educational Counseling, Max Stern Yezreel Valley College, Emek Yezreel, Israel
| | - Inbar Levkovich
- Faculty of Graduate Studies, Oranim Academic College, Kiryat Tiv'on, Israel
| |
Collapse
|
28
|
Nedbal C, Naik N, Castellani D, Gauhar V, Geraghty R, Somani BK. ChatGPT in urology practice: revolutionizing efficiency and patient care with generative artificial intelligence. Curr Opin Urol 2024; 34:98-104. [PMID: 37962176 DOI: 10.1097/mou.0000000000001151] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2023]
Abstract
PURPOSE OF REVIEW ChatGPT has emerged as a potentially useful tool for healthcare. Its role in urology is in its infancy, with much potential for research, clinical practice and patient assistance. With this narrative review, we aim to draw a picture of what is known about ChatGPT's integration into urology, alongside its future promises and challenges. RECENT FINDINGS The use of ChatGPT can ease administrative work, helping urologists with note-taking and clinical documentation such as discharge summaries and clinical notes. It can improve patient engagement by increasing awareness and facilitating communication, as has been investigated especially for uro-oncological diseases. Its ability to understand human emotions makes ChatGPT an empathic and thoughtful interactive tool or source for urological patients and their relatives. Currently, its role in clinical diagnosis and treatment decisions is uncertain, as concerns have been raised about misinterpretation, hallucination and out-of-date information. Moreover, a mandatory regulatory process for ChatGPT in urology has yet to be established. SUMMARY ChatGPT has the potential to contribute to precision medicine and tailored practice through its quick, structured responses. However, this will depend on how well information can be obtained by asking pertinent questions and seeking appropriate responses. The key lies in validating the responses, regulating the information shared and avoiding misuse, so as to protect data and patient privacy. Its successful integration into mainstream urology requires educational bodies to provide guidelines or best practice recommendations.
Collapse
Affiliation(s)
- Carlotta Nedbal
- Department of Urology, University Hospitals Southampton, NHS Trust, Southampton, UK
- Urology Unit, Azienda Ospedaliero-Universitaria delle Marche, Polytechnic University of Marche, Ancona, Italy
| | - Nitesh Naik
- Department of Mechanical and Industrial Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka, India
| | - Daniele Castellani
- Urology Unit, Azienda Ospedaliero-Universitaria delle Marche, Polytechnic University of Marche, Ancona, Italy
| | - Vineet Gauhar
- Department of Urology, Ng Teng Fong General Hospital, NUHS, Singapore
| | - Robert Geraghty
- Department of Urology, Freeman Hospital, Newcastle-upon-Tyne, UK
| | - Bhaskar Kumar Somani
- Department of Urology, University Hospitals Southampton, NHS Trust, Southampton, UK
| |
Collapse
|
29
|
Elyoseph Z, Refoua E, Asraf K, Lvovsky M, Shimoni Y, Hadar-Shoval D. Capacity of Generative AI to Interpret Human Emotions From Visual and Textual Data: Pilot Evaluation Study. JMIR Ment Health 2024; 11:e54369. [PMID: 38319707 PMCID: PMC10879976 DOI: 10.2196/54369] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 11/07/2023] [Revised: 12/09/2023] [Accepted: 12/25/2023] [Indexed: 02/07/2024] Open
Abstract
BACKGROUND Mentalization, which is integral to human cognitive processes, pertains to the interpretation of one's own and others' mental states, including emotions, beliefs, and intentions. With the advent of artificial intelligence (AI) and the prominence of large language models in mental health applications, questions persist about their aptitude in emotional comprehension. The prior iteration of the large language model from OpenAI, ChatGPT-3.5, demonstrated an advanced capacity to interpret emotions from textual data, surpassing human benchmarks. Given the introduction of ChatGPT-4, with its enhanced visual processing capabilities, and considering Google Bard's existing visual functionalities, a rigorous assessment of their proficiency in visual mentalizing is warranted. OBJECTIVE The aim of the research was to critically evaluate the capabilities of ChatGPT-4 and Google Bard with regard to their competence in discerning visual mentalizing indicators as contrasted with their textual-based mentalizing abilities. METHODS The Reading the Mind in the Eyes Test developed by Baron-Cohen and colleagues was used to assess the models' proficiency in interpreting visual emotional indicators. Simultaneously, the Levels of Emotional Awareness Scale was used to evaluate the large language models' aptitude in textual mentalizing. Collating data from both tests provided a holistic view of the mentalizing capabilities of ChatGPT-4 and Bard. RESULTS ChatGPT-4, displaying a pronounced ability in emotion recognition, secured scores of 26 and 27 in 2 distinct evaluations, significantly deviating from a random response paradigm (P<.001). These scores align with established benchmarks from the broader human demographic. Notably, ChatGPT-4 exhibited consistent responses, with no discernible biases pertaining to the sex of the model or the nature of the emotion. In contrast, Google Bard's performance aligned with random response patterns, securing scores of 10 and 12 and rendering further detailed analysis redundant. In the domain of textual analysis, both ChatGPT and Bard surpassed established benchmarks from the general population, with their performances being remarkably congruent. CONCLUSIONS ChatGPT-4 proved its efficacy in the domain of visual mentalizing, aligning closely with human performance standards. Although both models displayed commendable acumen in textual emotion interpretation, Bard's capabilities in visual emotion interpretation necessitate further scrutiny and potential refinement. This study stresses the criticality of ethical AI development for emotional recognition, highlighting the need for inclusive data, collaboration with patients and mental health experts, and stringent governmental oversight to ensure transparency and protect patient privacy.
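To make the "significantly above chance" claim concrete: assuming the standard Reading the Mind in the Eyes Test format of 36 items with 4 response options per item (an assumption, since the abstract does not state the item count), a one-sided binomial test on a score of 26 reproduces that order of significance. A hedged Python sketch:

```python
from scipy.stats import binomtest

# Assumption: standard 36-item RMET, 4 options per item, so chance p = 0.25.
# The abstract reports scores of 26 and 27 with P < .001 vs random responding.
result = binomtest(k=26, n=36, p=0.25, alternative="greater")
print(f"P(score >= 26 under chance) = {result.pvalue:.1e}")  # far below .001
```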
Collapse
Affiliation(s)
- Zohar Elyoseph
- Department of Educational Psychology, The Center for Psychobiological Research, The Max Stern Yezreel Valley College, Emek Yezreel, Israel
- Imperial College London, London, United Kingdom
| | - Elad Refoua
- Department of Psychology, Bar-Ilan University, Ramat Gan, Israel
| | - Kfir Asraf
- Department of Psychology, The Max Stern Yezreel Valley College, Emek Yezreel, Israel
| | - Maya Lvovsky
- Department of Psychology, The Max Stern Yezreel Valley College, Emek Yezreel, Israel
| | - Yoav Shimoni
- Boston Children's Hospital, Boston, MA, United States
| | - Dorit Hadar-Shoval
- Department of Psychology, The Max Stern Yezreel Valley College, Emek Yezreel, Israel
| |
Collapse
|
30
|
Sufyan NS, Fadhel FH, Alkhathami SS, Mukhadi JYA. Artificial intelligence and social intelligence: preliminary comparison study between AI models and psychologists. Front Psychol 2024; 15:1353022. [PMID: 38379623 PMCID: PMC10878391 DOI: 10.3389/fpsyg.2024.1353022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/09/2023] [Accepted: 01/22/2024] [Indexed: 02/22/2024] Open
Abstract
Background Social intelligence (SI), the ability to understand people's feelings, emotions, and needs during the counseling process, is of great importance to the success of counseling and psychotherapy, whether for the psychologist or for the artificial intelligence systems that assist the psychologist. This study therefore aimed to measure the SI of artificial intelligence, represented by the large language models ChatGPT-4, Google Bard, and Bing, compared with that of psychologists. Methods A stratified random sample of 180 counseling psychology students at the bachelor's and doctoral levels at King Khalid University was selected, while the large language models comprised ChatGPT-4, Google Bard, and Bing. Both the psychologists and the AI models responded to the social intelligence scale. Results There were significant differences in SI between the psychologists and both ChatGPT-4 and Bing. ChatGPT-4 outperformed 100% of the psychologists, and Bing outperformed 50% of the PhD holders and 90% of the bachelor's holders. The differences in SI between Google Bard and the bachelor's students were not significant, whereas the differences with the PhD holders were significant: 90% of the PhD holders outperformed Google Bard. Conclusion We explored the possibility of applying human measures to AI entities, especially language models, and the results indicate that AI's understanding of emotions and social behavior related to social intelligence is developing very rapidly. AI will help the psychotherapist a great deal in new ways. Psychotherapists need to be aware of possible areas of further AI development, given its benefits in counseling and psychotherapy. Studies using humanistic and non-humanistic criteria with large language models are needed.
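The percentile-style comparisons above (for example, Bing outperforming 90% of bachelor's holders) amount to computing the share of human scores that an AI model's score strictly exceeds. A Python sketch follows; all scores in it are hypothetical placeholders, not study data.

```python
from scipy.stats import percentileofscore

# Share of human raters' SI scores that an AI model's score strictly exceeds.
phd_scores = [72, 75, 78, 80, 81, 83, 85, 86, 88, 90]  # hypothetical PhD holders
bing_score = 87                                        # hypothetical AI score

pct = percentileofscore(phd_scores, bing_score, kind="strict")
print(f"Bing's score exceeds {pct:.0f}% of the PhD holders' scores")  # 80%
```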
Collapse
Affiliation(s)
- Nabil Saleh Sufyan
- Psychology Department, College of Education, King Khalid University, Abha, Saudi Arabia
| | - Fahmi H. Fadhel
- Psychology Program, Social Science Department, College of Arts and Sciences, Qatar University, Doha, Qatar
| | | | - Jubran Y. A. Mukhadi
- Psychology Department, College of Education, King Khalid University, Abha, Saudi Arabia
| |
Collapse
|
31
|
Elyoseph Z, Levkovich I, Shinan-Altman S. Assessing prognosis in depression: comparing perspectives of AI models, mental health professionals and the general public. Fam Med Community Health 2024; 12:e002583. [PMID: 38199604 PMCID: PMC10806564 DOI: 10.1136/fmch-2023-002583] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/12/2024] Open
Abstract
BACKGROUND Artificial intelligence (AI) has rapidly permeated various sectors, including healthcare, highlighting its potential to facilitate mental health assessments. This study explores the underexplored domain of AI's role in evaluating prognosis and long-term outcomes in depressive disorders, offering insights into how AI large language models (LLMs) compare with human perspectives. METHODS Using case vignettes, we conducted a comparative analysis involving different LLMs (ChatGPT-3.5, ChatGPT-4, Claude and Bard), mental health professionals (general practitioners, psychiatrists, clinical psychologists and mental health nurses), and the general public, as reported previously. We evaluated the LLMs' ability to generate a prognosis, anticipated outcomes with and without professional intervention, and envisioned long-term positive and negative consequences for individuals with depression. RESULTS In most of the examined cases, the four LLMs consistently identified depression as the primary diagnosis and recommended a combined treatment of psychotherapy and antidepressant medication. ChatGPT-3.5 exhibited a significantly more pessimistic prognosis than the other LLMs, the professionals and the public. ChatGPT-4, Claude and Bard aligned closely with the perspectives of mental health professionals and the general public, all of whom anticipated no improvement or worsening without professional help. Regarding long-term outcomes, ChatGPT-3.5, Claude and Bard consistently projected significantly fewer negative long-term consequences of treatment than ChatGPT-4. CONCLUSIONS This study underscores the potential of AI to complement the expertise of mental health professionals and promote a collaborative paradigm in mental healthcare. The observation that three of the four LLMs closely mirrored the anticipations of mental health experts in scenarios involving treatment underscores the technology's prospective value in offering professional clinical forecasts. The pessimistic outlook presented by ChatGPT-3.5 is concerning, as it could potentially diminish patients' drive to initiate or continue depression therapy. In summary, although LLMs show potential in enhancing healthcare services, their utilisation requires thorough verification and seamless integration with human judgement and skills.
Collapse
Affiliation(s)
- Zohar Elyoseph
- Department of Psychology and Educational Counseling, The Center for Psychobiological Research, Max Stern Yezreel Valley College, Yezreel Valley, Israel
- Department of Brain Sciences, Imperial College London, London, UK
| | - Inbar Levkovich
- Faculty of Graduate Studies, Oranim Academic College, Tivon, Israel
| | - Shiri Shinan-Altman
- The Louis and Gabi Weisfeld School of Social Work, Bar-Ilan University, Ramat Gan, Tel Aviv, Israel
| |
Collapse
|
32
|
Elyoseph Z, Hadar Shoval D, Levkovich I. Beyond Personhood: Ethical Paradigms in the Generative Artificial Intelligence Era. THE AMERICAN JOURNAL OF BIOETHICS : AJOB 2024; 24:57-59. [PMID: 38236857 DOI: 10.1080/15265161.2023.2278546] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/23/2024]
Affiliation(s)
- Zohar Elyoseph
- The Max Stern Yezreel Valley College
- Imperial College, London
| | | | | |
Collapse
|
33
|
Fei X, Tang Y, Zhang J, Zhou Z, Yamamoto I, Zhang Y. Evaluating cognitive performance: Traditional methods vs. ChatGPT. Digit Health 2024; 10:20552076241264639. [PMID: 39156049 PMCID: PMC11329975 DOI: 10.1177/20552076241264639] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/11/2023] [Accepted: 06/10/2024] [Indexed: 08/20/2024] Open
Abstract
Background NLP models like ChatGPT promise to revolutionize text-based content delivery, particularly in medicine. Yet, doubts remain about ChatGPT's ability to reliably support evaluations of cognitive performance, warranting further investigation into its accuracy and comprehensiveness in this area. Method A cohort of 60 cognitively normal individuals and 30 stroke survivors underwent a comprehensive evaluation, covering memory, numerical processing, verbal fluency, and abstract thinking. Healthcare professionals and NLP models GPT-3.5 and GPT-4 conducted evaluations following established standards. Scores were compared, and efforts were made to refine scoring protocols and interaction methods to enhance ChatGPT's potential in these evaluations. Result Within the cohort of healthy participants, the utilization of GPT-3.5 revealed significant disparities in memory evaluation compared to both physician-led assessments and those conducted utilizing GPT-4 (P < 0.001). Furthermore, within the domain of memory evaluation, GPT-3.5 exhibited discrepancies in 8 out of 21 specific measures when compared to assessments conducted by physicians (P < 0.05). Additionally, GPT-3.5 demonstrated statistically significant deviations from physician assessments in speech evaluation (P = 0.009). Among participants with a history of stroke, GPT-3.5 exhibited differences solely in verbal assessment compared to physician-led evaluations (P = 0.002). Notably, through the implementation of optimized scoring methodologies and refinement of interaction protocols, partial mitigation of these disparities was achieved. Conclusion ChatGPT can produce evaluation outcomes comparable to traditional methods. Despite differences from physician evaluations, refinement of scoring algorithms and interaction protocols has improved alignment. ChatGPT performs well even in populations with specific conditions like stroke, suggesting its versatility. GPT-4 yields results closer to physician ratings, indicating potential for further enhancement. These findings highlight ChatGPT's importance as a supplementary tool, offering new avenues for information gathering in medical fields and guiding its ongoing development and application.
Collapse
Affiliation(s)
- Xiao Fei
- Department of Rehabilitation Medicine, The First People's Hospital of Changzhou, Changzhou, China
| | - Ying Tang
- Department of Rehabilitation Medicine, The First People's Hospital of Changzhou, Changzhou, China
| | - Jianan Zhang
- Department of Rehabilitation Medicine, The First People's Hospital of Changzhou, Changzhou, China
| | - Zhongkai Zhou
- College of Information Science and Engineering, Hohai University, Changzhou, China
| | - Ikuo Yamamoto
- Graduate School of Engineering, Nagasaki University, Nagasaki, Japan
| | - Yi Zhang
- Department of Rehabilitation Medicine, The First People's Hospital of Changzhou, Changzhou, China
| |
Collapse
|
34
|
Watari T, Takagi S, Sakaguchi K, Nishizaki Y, Shimizu T, Yamamoto Y, Tokuda Y. Performance Comparison of ChatGPT-4 and Japanese Medical Residents in the General Medicine In-Training Examination: Comparison Study. JMIR MEDICAL EDUCATION 2023; 9:e52202. [PMID: 38055323 PMCID: PMC10733815 DOI: 10.2196/52202] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/25/2023] [Revised: 10/22/2023] [Accepted: 11/03/2023] [Indexed: 12/07/2023]
Abstract
BACKGROUND The reliability of GPT-4, a state-of-the-art expansive language model specializing in clinical reasoning and medical knowledge, remains largely unverified across non-English languages. OBJECTIVE This study aims to compare fundamental clinical competencies between Japanese residents and GPT-4 by using the General Medicine In-Training Examination (GM-ITE). METHODS We used the GPT-4 model provided by OpenAI and the GM-ITE examination questions for the years 2020, 2021, and 2022 to conduct a comparative analysis. This analysis focused on evaluating the performance of individuals who were concluding their second year of residency in comparison to that of GPT-4. Given the current abilities of GPT-4, our study included only single-choice exam questions, excluding those involving audio, video, or image data. The assessment included 4 categories: general theory (professionalism and medical interviewing), symptomatology and clinical reasoning, physical examinations and clinical procedures, and specific diseases. Additionally, we categorized the questions into 7 specialty fields and 3 levels of difficulty, which were determined based on residents' correct response rates. RESULTS Upon examination of 137 GM-ITE questions in Japanese, GPT-4 scores were significantly higher than the mean scores of residents (residents: 55.8%, GPT-4: 70.1%; P<.001). In terms of specific disciplines, GPT-4 scored 23.5 points higher in the "specific diseases," 30.9 points higher in "obstetrics and gynecology," and 26.1 points higher in "internal medicine." In contrast, GPT-4 scores in "medical interviewing and professionalism," "general practice," and "psychiatry" were lower than those of the residents, although this discrepancy was not statistically significant. Upon analyzing scores based on question difficulty, GPT-4 scores were 17.2 points lower for easy problems (P=.007) but were 25.4 and 24.4 points higher for normal and difficult problems, respectively (P<.001). In year-on-year comparisons, GPT-4 scores were 21.7 and 21.5 points higher in the 2020 (P=.01) and 2022 (P=.003) examinations, respectively, but only 3.5 points higher in the 2021 examinations (no significant difference). CONCLUSIONS In the Japanese language, GPT-4 also outperformed the average medical residents in the GM-ITE test, originally designed for them. Specifically, GPT-4 demonstrated a tendency to score higher on difficult questions with low resident correct response rates and those demanding a more comprehensive understanding of diseases. However, GPT-4 scored comparatively lower on questions that residents could readily answer, such as those testing attitudes toward patients and professionalism, as well as those necessitating an understanding of context and communication. These findings highlight the strengths and limitations of artificial intelligence applications in medical education and practice.
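One plausible way to formalize the headline comparison (GPT-4's 70.1% versus the residents' mean of 55.8% over the 137 analyzed questions) is a one-sided binomial test. This is an editorial sketch under that assumption, not necessarily the authors' statistical procedure.

```python
from scipy.stats import binomtest

# Sketch: test GPT-4's 70.1% on the 137 analyzed questions against the
# residents' mean correct rate of 55.8% taken as the null proportion.
n_questions = 137
gpt4_correct = round(0.701 * n_questions)  # ~96 correct answers

result = binomtest(gpt4_correct, n_questions, p=0.558, alternative="greater")
print(f"GPT-4: {gpt4_correct}/{n_questions} correct, one-sided P = {result.pvalue:.4f}")
```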
Collapse
Affiliation(s)
- Takashi Watari
- General Medicine Center, Shimane University Hospital, Izumo, Japan
- Department of Medicine, University of Michigan Medical School, Ann Arbor, MI, United States
- Medicine Service, VA Ann Arbor Healthcare System, Ann Arbor, MI, United States
| | - Soshi Takagi
- Faculty of Medicine, Shimane University, Izumo, Japan
| | - Kota Sakaguchi
- General Medicine Center, Shimane University Hospital, Izumo, Japan
| | - Yuji Nishizaki
- Division of Medical Education, Juntendo University School of Medicine, Tokyo, Japan
| | - Taro Shimizu
- Department of Diagnostic and Generalist Medicine, Dokkyo Medical University Hospital, Tochigi, Japan
| | - Yu Yamamoto
- Division of General Medicine, Center for Community Medicine, Jichi Medical University, Tochigi, Japan
| | - Yasuharu Tokuda
- Muribushi Okinawa Project for Teaching Hospitals, Okinawa, Japan
| |
Collapse
|
35
|
Levkovich I, Elyoseph Z. Suicide Risk Assessments Through the Eyes of ChatGPT-3.5 Versus ChatGPT-4: Vignette Study. JMIR Ment Health 2023; 10:e51232. [PMID: 37728984 PMCID: PMC10551796 DOI: 10.2196/51232] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 07/25/2023] [Revised: 08/22/2023] [Accepted: 08/24/2023] [Indexed: 09/22/2023] Open
Abstract
BACKGROUND ChatGPT, a linguistic artificial intelligence (AI) model engineered by OpenAI, offers prospective contributions to mental health professionals. Although it has significant theoretical implications, ChatGPT's practical capabilities, particularly regarding suicide prevention, have not yet been substantiated. OBJECTIVE The study's aim was to evaluate ChatGPT's ability to assess suicide risk, taking into consideration 2 discernible factors (perceived burdensomeness and thwarted belongingness) over a 2-month period. In addition, we evaluated whether ChatGPT-4 assessed suicide risk more accurately than ChatGPT-3.5. METHODS ChatGPT was tasked with assessing a vignette that depicted a hypothetical patient exhibiting differing degrees of perceived burdensomeness and thwarted belongingness. The assessments generated by ChatGPT were subsequently contrasted with standard evaluations rendered by mental health professionals. Using both ChatGPT-3.5 and ChatGPT-4 (May 24, 2023), we executed 3 evaluative procedures in June and July 2023. Our intent was to scrutinize ChatGPT-4's proficiency in assessing various facets of suicide risk in relation to the evaluative abilities of both mental health professionals and an earlier version of ChatGPT-3.5 (March 14 version). RESULTS During the period of June and July 2023, we found that the likelihood of suicide attempts as evaluated by ChatGPT-4 was similar to the norms of mental health professionals (n=379) under all conditions (average Z score of 0.01). Nonetheless, a pronounced discrepancy was observed regarding the assessments performed by ChatGPT-3.5 (May version), which markedly underestimated the potential for suicide attempts, in comparison to the assessments carried out by the mental health professionals (average Z score of -0.83). The empirical evidence suggests that ChatGPT-4's evaluation of the incidence of suicidal ideation and psychache was higher than that of the mental health professionals (average Z score of 0.47 and 1.00, respectively). Conversely, the level of resilience as assessed by both ChatGPT-4 and ChatGPT-3.5 (both versions) was observed to be lower in comparison to the assessments offered by mental health professionals (average Z score of -0.89 and -0.90, respectively). CONCLUSIONS The findings suggest that ChatGPT-4 estimates the likelihood of suicide attempts in a manner akin to evaluations provided by professionals. In terms of recognizing suicidal ideation, ChatGPT-4 appears to be more precise. However, regarding psychache, there was an observed overestimation by ChatGPT-4, indicating a need for further research. These results have implications regarding ChatGPT-4's potential to support gatekeepers, patients, and even mental health professionals' decision-making. Despite the clinical potential, intensive follow-up studies are necessary to establish the use of ChatGPT-4's capabilities in clinical practice. The finding that ChatGPT-3.5 frequently underestimates suicide risk, especially in severe cases, is particularly troubling. It indicates that ChatGPT may downplay one's actual suicide risk level.
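The "average Z score" figures above standardize a model's rating against the professionals' norms, z = (rating - norm mean) / norm SD, averaged across conditions. A minimal Python sketch follows; all numeric values in it are hypothetical placeholders, not study data.

```python
# Sketch of the standardization behind the "average Z score" figures:
# z = (rating - norm_mean) / norm_sd, averaged over conditions.
def z_score(rating: float, norm_mean: float, norm_sd: float) -> float:
    return (rating - norm_mean) / norm_sd

# (model rating, professionals' norm mean, professionals' norm SD) per
# condition; hypothetical placeholder values.
conditions = [(6.2, 6.1, 1.4), (4.8, 5.5, 1.2), (7.0, 6.6, 1.5)]
zs = [z_score(r, m, s) for r, m, s in conditions]
print(f"average Z = {sum(zs) / len(zs):+.2f}")
```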
Collapse
Affiliation(s)
- Inbar Levkovich
- Oranim Academic College, Faculty of Graduate Studies, Kiryat Tivon, Israel
| | - Zohar Elyoseph
- Department of Psychology and Educational Counseling, The Center for Psychobiological Research, Max Stern Yezreel Valley College, Emek Yezreel, Israel
- Department of Brain Sciences, Faculty of Medicine, Imperial College London, London, United Kingdom
| |
Collapse
|
36
|
Talyshinskii A, Naik N, Hameed BMZ, Zhanbyrbekuly U, Khairli G, Guliev B, Juilebø-Jones P, Tzelves L, Somani BK. Expanding horizons and navigating challenges for enhanced clinical workflows: ChatGPT in urology. Front Surg 2023; 10:1257191. [PMID: 37744723 PMCID: PMC10512827 DOI: 10.3389/fsurg.2023.1257191] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/12/2023] [Accepted: 08/28/2023] [Indexed: 09/26/2023] Open
Abstract
Purpose of review ChatGPT has emerged as a potential tool for facilitating doctors' workflows. However, few studies have examined its application in a urological context. Thus, our objective was to analyze the pros and cons of ChatGPT use and how urologists can exploit it. Recent findings ChatGPT can facilitate clinical documentation and note-taking, patient communication and support, medical education, and research. In urology, ChatGPT has shown potential as a virtual healthcare aide for benign prostatic hyperplasia, an educational and prevention tool for prostate cancer, an educational support for urological residents, and an assistant in writing urological papers and academic work. However, several concerns about its use have been raised, such as its lack of web crawling, the risk of accidental plagiarism, and patient data privacy. Summary The existing limitations underscore the need for further improvement of ChatGPT, such as ensuring the privacy of patient data, expanding the learning dataset to include medical databases, and developing guidance on its appropriate use. Urologists can also help by conducting studies to determine the effectiveness of ChatGPT in urology across clinical scenarios and nosologies other than those previously listed.
Collapse
Affiliation(s)
- Ali Talyshinskii
- Department of Urology, Astana Medical University, Astana, Kazakhstan
| | - Nithesh Naik
- Department of Mechanical and Industrial Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India
| | | | | | - Gafur Khairli
- Department of Urology, Astana Medical University, Astana, Kazakhstan
| | - Bakhman Guliev
- Department of Urology, Mariinsky Hospital, St Petersburg, Russia
| | | | - Lazaros Tzelves
- Department of Urology, National and Kapodistrian University of Athens, Sismanogleion Hospital, Athens, Marousi, Greece
| | - Bhaskar Kumar Somani
- Department of Urology, University Hospital Southampton NHS Trust, Southampton, United Kingdom
| |
Collapse
|
37
|
Levkovich I, Elyoseph Z. Identifying depression and its determinants upon initiating treatment: ChatGPT versus primary care physicians. Fam Med Community Health 2023; 11:e002391. [PMID: 37844967 PMCID: PMC10582915 DOI: 10.1136/fmch-2023-002391] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/18/2023] Open
Abstract
OBJECTIVE To compare evaluations of depressive episodes and suggested treatment protocols generated by Chat Generative Pretrained Transformer (ChatGPT)-3.5 and ChatGPT-4 with the recommendations of primary care physicians. METHODS Vignettes were input to the ChatGPT interface. These vignettes focused primarily on hypothetical patients with symptoms of depression during initial consultations. The creators of these vignettes designed eight distinct versions in which they systematically varied patient attributes (sex, socioeconomic status (blue collar worker or white collar worker) and depression severity (mild or severe)). Each variant was subsequently introduced into ChatGPT-3.5 and ChatGPT-4, and each vignette was repeated 10 times to ensure the consistency and reliability of the ChatGPT responses. RESULTS For mild depression, ChatGPT-3.5 and ChatGPT-4 recommended psychotherapy in 95.0% and 97.5% of cases, respectively. Primary care physicians, however, recommended psychotherapy in only 4.3% of cases. For severe cases, ChatGPT favoured psychotherapy combined with pharmacological treatment, in line with the combined approach recommended by primary care physicians. The pharmacological recommendations of ChatGPT-3.5 and ChatGPT-4 showed a preference for exclusive use of antidepressants (74% and 68%, respectively), in contrast with primary care physicians, who typically recommended a mix of antidepressants and anxiolytics/hypnotics (67.4%). Unlike primary care physicians, ChatGPT showed no gender or socioeconomic biases in its recommendations. CONCLUSION ChatGPT-3.5 and ChatGPT-4 aligned well with accepted guidelines for managing mild and severe depression, without showing the gender or socioeconomic biases observed among primary care physicians. Despite the suggested potential benefit of using artificial intelligence (AI) chatbots like ChatGPT to enhance clinical decision making, further research is needed to refine AI recommendations for severe cases and to consider potential risks and ethical issues.
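The 2 x 2 x 2 vignette design described in the methods (sex by socioeconomic status by severity, eight versions, each run 10 times per model) can be generated programmatically. A Python sketch follows; the vignette wording itself is a hypothetical placeholder.

```python
from itertools import product

# Sketch of the 2 x 2 x 2 vignette design: sex, socioeconomic status, and
# depression severity are varied to yield 8 versions, each submitted 10
# times per ChatGPT version. Vignette wording is hypothetical.
sexes = ["female", "male"]
ses_levels = ["blue collar worker", "white collar worker"]
severities = ["mild", "severe"]

vignettes = [
    f"A {sex} {ses} presents with symptoms of {severity} depression..."
    for sex, ses, severity in product(sexes, ses_levels, severities)
]
assert len(vignettes) == 8

runs = [(v, rep) for v in vignettes for rep in range(10)]
print(len(runs))  # 80 prompts per ChatGPT version
```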
Collapse
Affiliation(s)
| | - Zohar Elyoseph
- Department of Psychology and Educational Counseling, Max Stern Academic College Of Emek Yezreel, Emek Yezreel, Israel
- Department of Brain Sciences, Imperial College London, London, UK
| |
Collapse
|
38
|
Hadar-Shoval D, Elyoseph Z, Lvovsky M. The plasticity of ChatGPT's mentalizing abilities: personalization for personality structures. Front Psychiatry 2023; 14:1234397. [PMID: 37720897 PMCID: PMC10503434 DOI: 10.3389/fpsyt.2023.1234397] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 06/04/2023] [Accepted: 08/22/2023] [Indexed: 09/19/2023] Open
Abstract
This study evaluated the potential of ChatGPT, a large language model, to generate mentalizing-like abilities that are tailored to a specific personality structure and/or psychopathology. Mentalization is the ability to understand and interpret one's own and others' mental states, including thoughts, feelings, and intentions. Borderline Personality Disorder (BPD) and Schizoid Personality Disorder (SPD) are characterized by distinct patterns of emotional regulation: individuals with BPD tend to experience intense and unstable emotions, while individuals with SPD tend to experience flattened or detached emotions. Using ChatGPT's free version (23.3) and the Levels of Emotional Awareness Scale (LEAS), we assessed the extent to which its emotional awareness (EA)-like responses were customized to the distinct personality structures characterized by BPD and SPD. ChatGPT accurately described the emotional reactions of individuals with BPD as more intense, complex, and rich than those of individuals with SPD. This finding suggests that ChatGPT can generate mentalizing-like responses consistent with a range of psychopathologies, in line with clinical and theoretical knowledge. However, the study also raises concerns that stigmas or biases related to mental diagnoses may affect the validity and usefulness of chatbot-based clinical interventions. We emphasize the need for the responsible development and deployment of chatbot-based interventions in mental health that considers diverse theoretical frameworks.
Collapse
Affiliation(s)
- Dorit Hadar-Shoval
- Department of Psychology and Educational Counseling, The Center for Psychobiological Research, Max Stern Yezreel Valley College, Emek Yezreel, Israel
| | - Zohar Elyoseph
- Department of Psychology and Educational Counseling, The Center for Psychobiological Research, Max Stern Yezreel Valley College, Emek Yezreel, Israel
- Department of Brain Sciences, Faculty of Medicine, Imperial College London, London, United Kingdom
- Educational Psychology Department, Center for Psychobiological Research, Max Stern Yezreel Valley College, Emek Yezreel, Israel
| | - Maya Lvovsky
- Educational Psychology Department, Center for Psychobiological Research, Max Stern Yezreel Valley College, Emek Yezreel, Israel
| |
Collapse
|
39
|
Elyoseph Z, Levkovich I. Beyond human expertise: the promise and limitations of ChatGPT in suicide risk assessment. Front Psychiatry 2023; 14:1213141. [PMID: 37593450 PMCID: PMC10427505 DOI: 10.3389/fpsyt.2023.1213141] [Citation(s) in RCA: 18] [Impact Index Per Article: 18.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 04/28/2023] [Accepted: 07/19/2023] [Indexed: 08/19/2023] Open
Abstract
ChatGPT, an artificial intelligence language model developed by OpenAI, holds the potential for contributing to the field of mental health. Nevertheless, although ChatGPT theoretically shows promise, its clinical abilities in suicide prevention, a significant mental health concern, have yet to be demonstrated. To address this knowledge gap, this study aims to compare ChatGPT's assessments of mental health indicators to those of mental health professionals in a hypothetical case study that focuses on suicide risk assessment. Specifically, ChatGPT was asked to evaluate a text vignette describing a hypothetical patient with varying levels of perceived burdensomeness and thwarted belongingness. The ChatGPT assessments were compared to the norms of mental health professionals. The results indicated that ChatGPT rated the risk of suicide attempts lower than did the mental health professionals in all conditions. Furthermore, ChatGPT rated mental resilience lower than the norms in most conditions. These results imply that gatekeepers, patients or even mental health professionals who rely on ChatGPT for evaluating suicidal risk or as a complementary tool to improve decision-making may receive an inaccurate assessment that underestimates the actual suicide risk.
Collapse
Affiliation(s)
- Zohar Elyoseph
- Department of Psychology and Educational Counseling, The Center for Psychobiological Research, Max Stern Yezreel Valley College, Emek Yezreel, Israel
- Department of Brain Sciences, Faculty of Medicine, Imperial College London, London, United Kingdom
| | - Inbar Levkovich
- Faculty of Graduate Studies, Oranim Academic College, Kiryat Tiv'on, Israel
| |
Collapse
|