1. Levin C, Naimi E, Saban M. Evaluating GenAI systems to combat mental health issues in healthcare workers: An integrative literature review. Int J Med Inform 2024; 191:105566. [PMID: 39079316] [DOI: 10.1016/j.ijmedinf.2024.105566]
Abstract
BACKGROUND: Mental health issues among healthcare workers remain a serious problem globally. Recent surveys continue to report high levels of depression, anxiety, burnout and other conditions amongst various occupational groups. Novel approaches are needed to support clinician well-being.
OBJECTIVE: This integrative literature review aims to explore the current state of research examining the use of generative artificial intelligence (GenAI) and machine learning (ML) systems to predict mental health issues and identify associated risk factors amongst healthcare professionals.
METHODS: A literature search was conducted in Medline and then adapted as necessary for Scopus, Web of Science, Google Scholar, PubMed and CINAHL with Full Text. Eleven studies met the inclusion criteria for the review.
RESULTS: Nine studies employed various machine learning techniques to predict different mental health outcomes among healthcare workers. Models showed good predictive performance, with AUCs ranging from 0.82 to 0.904 for outcomes such as depression, anxiety and safety perceptions. Key risk factors identified included fatigue, stress, burnout, workload, sleep issues and lack of support. Two studies explored the potential of sensor-based technologies and GenAI analysis of physiological data. None of the included studies focused on the use of GenAI systems specifically for providing mental health support to healthcare workers.
CONCLUSION: Preliminary research demonstrates that AI/ML models can effectively predict mental health issues. However, more work is needed to evaluate the real-world integration and impact of these tools, including GenAI systems, in identifying clinician distress and supporting well-being over time. Further research should explore how GenAI may be developed and applied to provide mental health support for healthcare workers.
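The review's headline metric is the area under the ROC curve (AUC). As a purely illustrative aid, not taken from any of the included studies, the sketch below shows how such an AUC is typically computed for a questionnaire-based risk model; the features, labels, and classifier choice are hypothetical placeholders.

```python
# Illustrative sketch only: synthetic survey-style features (e.g., fatigue, stress,
# workload, sleep problems, perceived support) and a synthetic outcome label,
# evaluated with the AUC metric reported in the review.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))                      # 5 hypothetical risk-factor scores
y = (X[:, 0] + X[:, 1] - X[:, 4] + rng.normal(size=500) > 0).astype(int)  # synthetic outcome

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = LogisticRegression().fit(X_tr, y_tr)
print(f"AUC = {roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]):.3f}")
```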
Affiliation(s)
- C Levin: Faculty of School of Life and Health Sciences, Nursing Department, The Jerusalem College of Technology-Lev Academic Center, Jerusalem, Israel; The Department of Vascular Surgery, The Chaim Sheba Medical Center, Tel Hashomer, Ramat Gan, Israel
- E Naimi: Department of Nursing, School of Health Professions, Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- M Saban: Department of Nursing, School of Health Professions, Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
2. Guo Z, Lai A, Thygesen JH, Farrington J, Keen T, Li K. Large Language Models for Mental Health Applications: Systematic Review. JMIR Ment Health 2024; 11:e57400. [PMID: 39423368] [PMCID: PMC11530718] [DOI: 10.2196/57400]
Abstract
BACKGROUND: Large language models (LLMs) are advanced artificial neural networks trained on extensive datasets to accurately understand and generate natural language. While they have received much attention and demonstrated potential in digital health, their application in mental health, particularly in clinical settings, has generated considerable debate.
OBJECTIVE: This systematic review aims to critically assess the use of LLMs in mental health, specifically focusing on their applicability and efficacy in early screening, digital interventions, and clinical settings. By systematically collating and assessing the evidence from current studies, our work analyzes models, methodologies, data sources, and outcomes, thereby highlighting the potential of LLMs in mental health, the challenges they present, and the prospects for their clinical use.
METHODS: Adhering to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines, this review searched 5 open-access databases: MEDLINE (accessed via PubMed), IEEE Xplore, Scopus, JMIR, and ACM Digital Library. Keywords used were (mental health OR mental illness OR mental disorder OR psychiatry) AND (large language models). This study included articles published between January 1, 2017, and April 30, 2024, and excluded articles published in languages other than English.
RESULTS: In total, 40 articles were evaluated, including 15 (38%) on mental health conditions and suicidal ideation detection through text analysis, 7 (18%) on the use of LLMs as mental health conversational agents, and 18 (45%) on other applications and evaluations of LLMs in mental health. LLMs show good effectiveness in detecting mental health issues and providing accessible, destigmatized eHealth services. However, assessments also indicate that the current risks associated with clinical use might surpass their benefits. These risks include inconsistencies in generated text; the production of hallucinations; and the absence of a comprehensive, benchmarked ethical framework.
CONCLUSIONS: This systematic review examines the clinical applications of LLMs in mental health, highlighting their potential and inherent risks. The study identifies several issues: the lack of multilingual datasets annotated by experts, concerns regarding the accuracy and reliability of generated content, challenges in interpretability due to the "black box" nature of LLMs, and ongoing ethical dilemmas. These ethical concerns include the absence of a clear, benchmarked ethical framework; data privacy issues; and the potential for overreliance on LLMs by both physicians and patients, which could compromise traditional medical practices. As a result, LLMs should not be considered substitutes for professional mental health services. However, the rapid development of LLMs underscores their potential as valuable clinical aids, emphasizing the need for continued research and development in this area.
TRIAL REGISTRATION: PROSPERO CRD42024508617; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=508617
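Because the search strategy above is fully specified by its keyword string and date window, its PubMed/MEDLINE arm can be reproduced programmatically. The sketch below is an illustration under assumptions (it is not the authors' pipeline) using Biopython's Entrez wrapper; the contact email is a placeholder required by NCBI.

```python
# Hypothetical reproduction of the MEDLINE (PubMed) portion of the review's search.
from Bio import Entrez

Entrez.email = "you@example.org"  # placeholder; NCBI requires a contact address
query = ("(mental health OR mental illness OR mental disorder OR psychiatry) "
         "AND (large language models)")
handle = Entrez.esearch(db="pubmed", term=query, datetype="pdat",
                        mindate="2017/01/01", maxdate="2024/04/30", retmax=500)
record = Entrez.read(handle)
handle.close()
print(f"{record['Count']} matching records; first PMIDs: {record['IdList'][:5]}")
```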
Affiliation(s)
- Zhijun Guo: Institute of Health Informatics, University College London, London, United Kingdom
- Alvina Lai: Institute of Health Informatics, University College London, London, United Kingdom
- Johan H Thygesen: Institute of Health Informatics, University College London, London, United Kingdom
- Joseph Farrington: Institute of Health Informatics, University College London, London, United Kingdom
- Thomas Keen: Institute of Health Informatics, University College London, London, United Kingdom; Great Ormond Street Institute of Child Health, University College London, London, United Kingdom
- Kezhi Li: Institute of Health Informatics, University College London, London, United Kingdom
3. Elyoseph Z, Gur T, Haber Y, Simon T, Angert T, Navon Y, Tal A, Asman O. An Ethical Perspective on the Democratization of Mental Health With Generative AI. JMIR Ment Health 2024; 11:e58011. [PMID: 39417792] [PMCID: PMC11500620] [DOI: 10.2196/58011]
Abstract
Knowledge has become more open and accessible to a large audience with the "democratization of information" facilitated by technology. This paper provides a sociohistorical perspective for the theme issue "Responsible Design, Integration, and Use of Generative AI in Mental Health." It evaluates ethical considerations in using generative artificial intelligence (GenAI) for the democratization of mental health knowledge and practice. It explores the historical context of democratizing information, transitioning from restricted access to widespread availability due to the internet, open-source movements, and most recently, GenAI technologies such as large language models. The paper highlights why GenAI technologies represent a new phase in the democratization movement, offering unparalleled access to highly advanced technology as well as information. In the realm of mental health, this requires delicate and nuanced ethical deliberation. Including GenAI in mental health may allow, among other things, improved accessibility to mental health care, personalized responses, and conceptual flexibility, and could facilitate a flattening of traditional hierarchies between health care providers and patients. At the same time, it also entails significant risks and challenges that must be carefully addressed. To navigate these complexities, the paper proposes a strategic questionnaire for assessing artificial intelligence-based mental health applications. This tool evaluates both the benefits and the risks, emphasizing the need for a balanced and ethical approach to GenAI integration in mental health. The paper calls for a cautious yet positive approach to GenAI in mental health, advocating for the active engagement of mental health professionals in guiding GenAI development. It emphasizes the importance of ensuring that GenAI advancements are not only technologically sound but also ethically grounded and patient-centered.
Affiliation(s)
- Zohar Elyoseph: Department of Brain Sciences, Faculty of Medicine, Imperial College London, London, United Kingdom; Faculty of Education, University of Haifa, Haifa, Israel
- Tamar Gur: The Adelson School of Entrepreneurship, Reichman University, Herzliya, Israel
- Yuval Haber: The PhD Program of Hermeneutics & Cultural Studies, Bar-Ilan University, Ramat Gan, Israel
- Tomer Simon: Microsoft Israel R&D Center, Tel Aviv, Israel
- Tal Angert: Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel
- Yuval Navon: Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel
- Amir Tal: Samueli Initiative for Responsible AI in Medicine, Faculty of Medical and Health Sciences, Tel Aviv University, Tel Aviv, Israel
- Oren Asman: Samueli Initiative for Responsible AI in Medicine, Faculty of Medical and Health Sciences, Tel Aviv University, Tel Aviv, Israel; Department of Nursing, Faculty of Medical and Health Sciences, Tel Aviv University, Tel Aviv, Israel
4. Hadar-Shoval D, Asraf K, Shinan-Altman S, Elyoseph Z, Levkovich I. Embedded values-like shape ethical reasoning of large language models on primary care ethical dilemmas. Heliyon 2024; 10:e38056. [PMID: 39381244] [PMCID: PMC11458949] [DOI: 10.1016/j.heliyon.2024.e38056]
Abstract
Objective: This article uses the framework of Schwartz's values theory to examine whether the embedded values-like profile within large language models (LLMs) impacts ethical decision-making in dilemmas faced in primary care. It specifically aims to evaluate whether each LLM exhibits a distinct values-like profile, assess its alignment with general population values, and determine whether latent values influence clinical recommendations.
Methods: The Portrait Values Questionnaire-Revised (PVQ-RR) was submitted to each LLM (Claude, Bard, GPT-3.5, and GPT-4) 20 times to ensure reliable and valid responses. Their responses were compared to a benchmark derived from an international sample of over 53,000 culturally diverse respondents who completed the PVQ-RR. Four vignettes depicting prototypical professional quandaries involving conflicts between competing values were presented to the LLMs. The option selected by each LLM and the strength of its recommendation were evaluated to determine whether the underlying values-like profile impacts output.
Results: Each LLM demonstrated a unique values-like profile. Universalism and self-direction were prioritized, while power and tradition were assigned less importance than population benchmarks, suggesting potential Western-centric biases. Preliminary indications suggested that embedded values-like profiles influence recommendations. Significant variance in confidence strength regarding chosen recommendations emerged between models, suggesting that further vetting is required before LLMs can be relied on as judgment aids. However, the overall selection of preferences aligned with intrinsic value hierarchies.
Conclusion: The distinct intrinsic values-like profiles embedded within LLMs shape ethical decision-making, which carries implications for their integration in primary care settings serving diverse populations. For context-appropriate, equitable delivery of AI-assisted healthcare globally, it is essential that LLMs are tailored to align with cultural outlooks.
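The repeated-administration design described above (each questionnaire submitted to every model 20 times, then averaged) is straightforward to express in code. The sketch below is a hypothetical outline, not the authors' implementation: ask_llm is a stand-in for a real chat-completion call, and the items shown are paraphrased placeholders rather than the actual PVQ-RR wording.

```python
# Hypothetical outline of repeated questionnaire administration to an LLM.
from statistics import mean

N_RUNS = 20
ITEMS = {  # placeholder items keyed by the Schwartz value they probe
    "universalism": "It is important to this person that everyone is treated fairly.",
    "power": "It is important to this person to be the one who tells others what to do.",
}

def ask_llm(prompt: str) -> int:
    """Placeholder: send `prompt` to a model and parse a 1-6 Likert rating."""
    raise NotImplementedError

def value_profile(items: dict[str, str], n_runs: int = N_RUNS) -> dict[str, float]:
    """Average each item's rating over repeated runs to stabilize the estimate."""
    return {
        value: mean(ask_llm(f"Rate 1-6 how much this person is like you: {item}")
                    for _ in range(n_runs))
        for value, item in items.items()
    }
```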
Affiliation(s)
- Dorit Hadar-Shoval: The Center for Psychobiological Research, Department of Psychology and Educational Counseling, Max Stern Yezreel Valley College, Israel
- Kfir Asraf: The Center for Psychobiological Research, Department of Psychology and Educational Counseling, Max Stern Yezreel Valley College, Israel
- Shiri Shinan-Altman: The Louis and Gabi Weisfeld School of Social Work, Bar-Ilan University, Ramat Gan, Israel
- Zohar Elyoseph: The Center for Psychobiological Research, Department of Psychology and Educational Counseling, Max Stern Yezreel Valley College, Israel; Department of Brain Sciences, Faculty of Medicine, Imperial College London, England; Department of Counseling and Human Development, Department of Education, University of Haifa, Israel
5. Hurley ME, Lang BH, Kostick-Quenet KM, Smith JN, Blumenthal-Barby J. Patient Consent and The Right to Notice and Explanation of AI Systems Used in Health Care. Am J Bioeth 2024:1-13. [PMID: 39288291] [DOI: 10.1080/15265161.2024.2399828]
Abstract
Given the need for enforceable guardrails for artificial intelligence (AI) that protect the public and allow for innovation, the U.S. Government recently issued a Blueprint for an AI Bill of Rights, which outlines five principles of safe AI design, use, and implementation. One in particular, the right to notice and explanation, requires accurately informing the public, in ways that are easy to understand, about the use of AI that impacts them. Yet, in the healthcare setting, it is unclear what goal the right to notice and explanation serves and what moral importance patient-level disclosure carries. We propose three normative functions of this right: (1) to notify patients about their care, (2) to educate patients and promote trust, and (3) to meet standards for informed consent. Additional clarity is needed to guide practices that respect the right to notice and explanation of AI in healthcare while providing meaningful benefits to patients.
6. Wang Z, Yang W, Li Z, Rong Z, Wang X, Han J, Ma L. A 25-Year Retrospective of the Use of AI for Diagnosing Acute Stroke: Systematic Review. J Med Internet Res 2024; 26:e59711. [PMID: 39255472] [PMCID: PMC11422733] [DOI: 10.2196/59711]
Abstract
BACKGROUND: Stroke is a leading cause of death and disability worldwide. Rapid and accurate diagnosis is crucial for minimizing brain damage and optimizing treatment plans.
OBJECTIVE: This review aims to summarize the methods of artificial intelligence (AI)-assisted stroke diagnosis over the past 25 years, providing an overview of performance metrics and algorithm development trends. It also delves into existing issues and future prospects, intending to offer a comprehensive reference for clinical practice.
METHODS: A total of 50 representative articles published between 1999 and 2024 on using AI technology for stroke prevention and diagnosis were systematically selected and analyzed in detail.
RESULTS: AI-assisted stroke diagnosis has made significant advances in stroke lesion segmentation and classification, stroke risk prediction, and stroke prognosis. Before 2012, research mainly focused on segmentation using traditional thresholding and heuristic techniques. From 2012 to 2016, the focus shifted to machine learning (ML)-based approaches. After 2016, the emphasis moved to deep learning (DL), which brought significant improvements in accuracy. In stroke lesion segmentation and classification as well as stroke risk prediction, DL has shown superiority over ML. In stroke prognosis, both DL and ML have shown good performance.
CONCLUSIONS: Over the past 25 years, AI technology has shown promising performance in stroke diagnosis.
Affiliation(s)
- Ze Rong: Nantong University, Nantong, China
- Lei Ma: Nantong University, Nantong, China
7. Banerjee S, Dunn P, Conard S, Ali A. Mental Health Applications of Generative AI and Large Language Modeling in the United States. Int J Environ Res Public Health 2024; 21:910. [PMID: 39063487] [PMCID: PMC11276907] [DOI: 10.3390/ijerph21070910]
Abstract
(1) Background: Artificial intelligence (AI) has flourished in recent years. More specifically, generative AI has had broad applications in many disciplines. While mental illness is on the rise, AI has proven valuable in aiding the diagnosis and treatment of mental disorders. However, there is little to no research about precisely how much public interest there is in AI technology. (2) Methods: We performed a Google Trends search for "AI and mental health" and compared relative search volume (RSV) indices for "AI", "AI and depression", and "AI and anxiety". This time series study employed Box-Jenkins time series modeling to forecast long-term interest through the end of 2024. (3) Results: Within the United States, AI interest steadily increased throughout 2023, with some anomalies due to media reporting. Through predictive models, we found that this trend is predicted to increase by 114% through the end of 2024, with public interest in AI applications being on the rise. (4) Conclusions: We found that awareness of AI, especially in relation to mental health, increased drastically throughout 2023. This demonstrates increasing public awareness of mental health and AI, making advocacy and education about AI technology of paramount importance.
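Box-Jenkins modeling of a Google Trends series can be illustrated with a short ARIMA example. The sketch below rests on assumptions: the CSV export, column names, and (p, d, q) order are placeholders, not the study's actual data or fitted model.

```python
# Hypothetical Box-Jenkins (ARIMA) forecast of a weekly Google Trends RSV series.
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rsv = pd.read_csv("ai_mental_health_trends.csv",      # placeholder Google Trends export
                  parse_dates=["week"], index_col="week")["rsv"]

fit = ARIMA(rsv, order=(1, 1, 1)).fit()                # illustrative (p, d, q) order
forecast = fit.forecast(steps=52)                      # project roughly one year ahead
print(forecast.tail())
```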
Affiliation(s)
- Sri Banerjee: School of Health Sciences and Public Policy, Walden University, Minneapolis, MN 55401, USA
- Pat Dunn: Center for Health Technology & Innovation, American Heart Association, Dallas, TX 75231, USA
- Asif Ali: McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX 77030, USA
8. Omar M, Soffer S, Charney AW, Landi I, Nadkarni GN, Klang E. Applications of large language models in psychiatry: a systematic review. Front Psychiatry 2024; 15:1422807. [PMID: 38979501] [PMCID: PMC11228775] [DOI: 10.3389/fpsyt.2024.1422807]
Abstract
Background: With their unmatched ability to interpret and engage with human language and context, large language models (LLMs) hint at the potential to bridge AI and human cognitive processes. This review explores the current application of LLMs, such as ChatGPT, in the field of psychiatry.
Methods: We followed PRISMA guidelines and searched PubMed, Embase, Web of Science, and Scopus up to March 2024.
Results: From 771 retrieved articles, we included 16 that directly examine LLMs' use in psychiatry. LLMs, particularly ChatGPT and GPT-4, showed diverse applications in clinical reasoning, social media, and education within psychiatry. They can assist in diagnosing mental health issues, managing depression, evaluating suicide risk, and supporting education in the field. However, our review also points out their limitations, such as difficulties with complex cases and potential underestimation of suicide risks.
Conclusion: Early research in psychiatry reveals LLMs' versatile applications, from diagnostic support to educational roles. Given the rapid pace of advancement, future investigations are poised to explore the extent to which these models might redefine traditional roles in mental health care.
Affiliation(s)
- Mahmud Omar: Faculty of Medicine, Tel-Aviv University, Tel-Aviv, Israel
- Shelly Soffer: Internal Medicine B, Assuta Medical Center, Ashdod, Israel; Ben-Gurion University of the Negev, Be'er Sheva, Israel
- Isotta Landi: Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Girish N Nadkarni: Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Eyal Klang: Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, NY, United States
9. Azamfirei R. ChatGPT and Neuroprognostication: A Snow Globe, Not a Crystal Ball. Crit Care Med 2024; 52:992-994. [PMID: 38752820] [DOI: 10.1097/ccm.0000000000006265]
Affiliation(s)
- Razvan Azamfirei: Department of Anesthesiology and Critical Care Medicine, Johns Hopkins University School of Medicine, Baltimore, MD
10. Shinan-Altman S, Elyoseph Z, Levkovich I. The impact of history of depression and access to weapons on suicide risk assessment: a comparison of ChatGPT-3.5 and ChatGPT-4. PeerJ 2024; 12:e17468. [PMID: 38827287] [PMCID: PMC11143969] [DOI: 10.7717/peerj.17468]
Abstract
The aim of this study was to evaluate the effectiveness of ChatGPT-3.5 and ChatGPT-4 in incorporating critical risk factors, namely history of depression and access to weapons, into suicide risk assessments. Both models assessed suicide risk using scenarios that featured individuals with and without a history of depression and access to weapons. The models estimated the likelihood of suicidal thoughts, suicide attempts, serious suicide attempts, and suicide-related mortality on a Likert scale. A multivariate three-way ANOVA with Bonferroni post hoc tests was conducted to examine the impact of the aforementioned independent factors (history of depression and access to weapons) on these outcome variables. Both models identified history of depression as a significant suicide risk factor. ChatGPT-4 demonstrated a more nuanced understanding of the relationship between depression, access to weapons, and suicide risk. In contrast, ChatGPT-3.5 displayed limited insight into this complex relationship. ChatGPT-4 consistently assigned higher severity ratings to suicide-related variables than did ChatGPT-3.5. The study highlights the potential of these two models, particularly ChatGPT-4, to enhance suicide risk assessment by considering complex risk factors.
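The factorial analysis described above can be sketched in a few lines. The example below is purely illustrative and uses synthetic ratings: the data frame, effect sizes, and the treatment of model version as a third factor are assumptions for demonstration, not the study's data or code.

```python
# Illustrative three-way factorial ANOVA on synthetic suicide-risk ratings.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(1)
n = 400
df = pd.DataFrame({
    "depression": rng.choice(["yes", "no"], n),   # history of depression
    "weapons": rng.choice(["yes", "no"], n),      # access to weapons
    "model": rng.choice(["gpt35", "gpt4"], n),    # which chatbot produced the rating
})
df["risk"] = (rng.normal(4.0, 1.0, n)
              + 1.5 * (df["depression"] == "yes")
              + 1.0 * (df["weapons"] == "yes"))

fit = smf.ols("risk ~ C(depression) * C(weapons) * C(model)", data=df).fit()
print(anova_lm(fit, typ=2))  # Bonferroni-adjusted post hoc contrasts would follow
```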
Affiliation(s)
- Zohar Elyoseph: Department of Brain Sciences, Faculty of Medicine, Imperial College London, London, United Kingdom; The Center for Psychobiological Research, Department of Psychology and Educational Counseling, Max Stern Yezreel Valley College, Emek Yezreel, Israel
- Inbar Levkovich: Faculty of Graduate Studies, Oranim Academic College of Education, Kiryat Tiv’on, Israel
11. Haber Y, Levkovich I, Hadar-Shoval D, Elyoseph Z. The Artificial Third: A Broad View of the Effects of Introducing Generative Artificial Intelligence on Psychotherapy. JMIR Ment Health 2024; 11:e54781. [PMID: 38787297] [PMCID: PMC11137430] [DOI: 10.2196/54781]
Abstract
This paper explores a significant shift in the field of mental health in general and psychotherapy in particular following generative artificial intelligence's new capabilities in processing and generating humanlike language. Following Freud, this lingo-technological development is conceptualized as the "fourth narcissistic blow" that science inflicts on humanity. We argue that this narcissistic blow has a potentially dramatic influence on perceptions of human society, interrelationships, and the self. We should, accordingly, expect dramatic changes in perceptions of the therapeutic act following the emergence of what we term the artificial third in the field of psychotherapy. The introduction of an artificial third marks a critical juncture, prompting us to ask the following important core questions that address two basic elements of critical thinking, namely, transparency and autonomy: (1) What is this new artificial presence in therapy relationships? (2) How does it reshape our perception of ourselves and our interpersonal dynamics? and (3) What remains of the irreplaceable human elements at the core of therapy? Given the ethical implications that arise from these questions, this paper proposes that the artificial third can be a valuable asset when applied with insight and ethical consideration, enhancing but not replacing the human touch in therapy.
Affiliation(s)
- Yuval Haber: The PhD Program of Hermeneutics and Cultural Studies, Interdisciplinary Studies Unit, Bar-Ilan University, Ramat Gan, Israel
- Dorit Hadar-Shoval: Department of Psychology and Educational Counseling, The Max Stern Yezreel Valley College, Emek Yezreel, Israel
- Zohar Elyoseph: Department of Brain Sciences, Faculty of Medicine, Imperial College London, London, United Kingdom; The Center for Psychobiological Research, Department of Psychology and Educational Counseling, The Max Stern Yezreel Valley College, Emek Yezreel, Israel
12. Altara R, Basson CJ, Biondi-Zoccai G, Booz GW. Exploring the Promise and Challenges of Artificial Intelligence in Biomedical Research and Clinical Practice. J Cardiovasc Pharmacol 2024; 83:403-409. [PMID: 38323891] [DOI: 10.1097/fjc.0000000000001546]
Abstract
Artificial intelligence (AI) is poised to revolutionize how science, and biomedical research in particular, is done. With AI, problem-solving and complex tasks using massive data sets can be performed at a much higher rate and dimensionality level compared with humans. With the ability to handle huge data sets and self-learn, AI is already being exploited in drug design, drug repurposing, toxicology, and material identification. AI could also be used in both basic and clinical research in study design, defining outcomes, analyzing data, interpreting findings, and even identifying the most appropriate areas of investigation and funding sources. State-of-the-art AI-based large language models, such as ChatGPT and Perplexity, are positioned to change forever how science is communicated and how scientists interact with one another and their profession, including postpublication appraisal and critique. Like all revolutions, upheaval will follow and not all outcomes can be predicted, necessitating guardrails at the onset, especially to minimize the untoward impact of the many drawbacks of large language models, which include lack of confidentiality, risk of hallucinations, and propagation of mainstream albeit potentially mistaken opinions and perspectives. In this review, we highlight areas of biomedical research that are already being reshaped by AI and how AI is likely to affect it further in the near future. We discuss the potential benefits of AI in biomedical research and address possible risks, some surrounding the creative process, that warrant further reflection.
Affiliation(s)
- Raffaele Altara: Department of Anatomy and Embryology, Faculty of Health, Medicine and Life Sciences, Maastricht University, Maastricht, the Netherlands; Department of Pathology, School of Medicine, University of Mississippi Medical Center, Jackson, MS
- Cameron J Basson: School of Medicine, Faculty of Health, Medicine and Life Sciences, Maastricht University, Maastricht, the Netherlands
- Giuseppe Biondi-Zoccai: Department of Medical Surgical Sciences and Biotechnologies, Sapienza University of Rome, Latina, Italy; Mediterranea Cardiocentro, Napoli, Italy
- George W Booz: Department of Pharmacology and Toxicology, School of Medicine, University of Mississippi Medical Center, Jackson, MS
13. Hadar-Shoval D, Asraf K, Mizrachi Y, Haber Y, Elyoseph Z. Assessing the Alignment of Large Language Models With Human Values for Mental Health Integration: Cross-Sectional Study Using Schwartz's Theory of Basic Values. JMIR Ment Health 2024; 11:e55988. [PMID: 38593424] [DOI: 10.2196/55988]
Abstract
BACKGROUND: Large language models (LLMs) hold potential for mental health applications. However, their opaque alignment processes may embed biases that shape problematic perspectives. Evaluating the values embedded within LLMs that guide their decision-making has ethical importance. Schwartz's theory of basic values (STBV) provides a framework for quantifying cultural value orientations and has shown utility for examining values in mental health contexts, including cultural, diagnostic, and therapist-client dynamics.
OBJECTIVE: This study aimed to (1) evaluate whether the STBV can measure value-like constructs within leading LLMs and (2) determine whether LLMs exhibit distinct value-like patterns from humans and each other.
METHODS: In total, 4 LLMs (Bard, Claude 2, Generative Pretrained Transformer [GPT]-3.5, GPT-4) were anthropomorphized and instructed to complete the Portrait Values Questionnaire-Revised (PVQ-RR) to assess value-like constructs. Their responses over 10 trials were analyzed for reliability and validity. To benchmark the LLMs' value profiles, their results were compared to published data from a diverse sample of 53,472 individuals across 49 nations who had completed the PVQ-RR. This allowed us to assess whether the LLMs diverged from established human value patterns across cultural groups. Value profiles were also compared between models via statistical tests.
RESULTS: The PVQ-RR showed good reliability and validity for quantifying value-like infrastructure within the LLMs. However, substantial divergence emerged between the LLMs' value profiles and population data. The models lacked consensus and exhibited distinct motivational biases, reflecting opaque alignment processes. For example, all models prioritized universalism and self-direction, while de-emphasizing achievement, power, and security relative to humans. Discriminant analysis successfully differentiated the 4 LLMs' distinct value profiles. Further examination found that the biased value profiles strongly predicted the LLMs' responses when presented with mental health dilemmas requiring a choice between opposing values. This provided further validation that the models embed distinct motivational value-like constructs that shape their decision-making.
CONCLUSIONS: This study leveraged the STBV to map the motivational value-like infrastructure underpinning leading LLMs. Although the study demonstrated that the STBV can effectively characterize value-like infrastructure within LLMs, substantial divergence from human values raises ethical concerns about aligning these models with mental health applications. The biases toward certain cultural value sets pose risks if integrated without proper safeguards. For example, prioritizing universalism could promote unconditional acceptance even when clinically unwise. Furthermore, the differences between the LLMs underscore the need to standardize alignment processes to capture true cultural diversity. Thus, any responsible integration of LLMs into mental health care must account for their embedded biases and motivation mismatches to ensure equitable delivery across diverse populations. Achieving this will require transparency and refinement of alignment techniques to instill comprehensive human values.
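The discriminant-analysis step mentioned in the results (telling the four models apart from their value profiles) can be illustrated with scikit-learn. The sketch below uses randomly generated placeholder profiles (10 trials per model, 10 value scores per trial) rather than the study's data, and the accuracy it prints is meaningless beyond demonstrating the workflow.

```python
# Illustrative linear discriminant analysis on synthetic per-trial value profiles.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
models = ["Bard", "Claude 2", "GPT-3.5", "GPT-4"]
# 10 trials per model, each scored on the 10 basic Schwartz values (synthetic numbers).
X = np.vstack([rng.normal(loc=i, scale=0.5, size=(10, 10)) for i in range(len(models))])
y = np.repeat(models, 10)

acc = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=5).mean()
print(f"Cross-validated accuracy of separating the models: {acc:.2f}")
```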
Affiliation(s)
- Dorit Hadar-Shoval: The Psychology Department, Max Stern Yezreel Valley College, Tel Adashim, Israel
- Kfir Asraf: The Psychology Department, Max Stern Yezreel Valley College, Tel Adashim, Israel
- Yonathan Mizrachi: The Jane Goodall Institute, Max Stern Yezreel Valley College, Tel Adashim, Israel; The Laboratory for AI, Machine Learning, Business & Data Analytics, Tel-Aviv University, Tel Aviv, Israel
- Yuval Haber: The PhD Program of Hermeneutics and Cultural Studies, Interdisciplinary Studies Unit, Bar-Ilan University, Ramat Gan, Israel
- Zohar Elyoseph: The Psychology Department, Center for Psychobiological Research, Max Stern Yezreel Valley College, Tel Adashim, Israel; Department of Brain Sciences, Faculty of Medicine, Imperial College London, London, United Kingdom
14. Elyoseph Z, Refoua E, Asraf K, Lvovsky M, Shimoni Y, Hadar-Shoval D. Capacity of Generative AI to Interpret Human Emotions From Visual and Textual Data: Pilot Evaluation Study. JMIR Ment Health 2024; 11:e54369. [PMID: 38319707] [PMCID: PMC10879976] [DOI: 10.2196/54369]
Abstract
BACKGROUND: Mentalization, which is integral to human cognitive processes, pertains to the interpretation of one's own and others' mental states, including emotions, beliefs, and intentions. With the advent of artificial intelligence (AI) and the prominence of large language models in mental health applications, questions persist about their aptitude in emotional comprehension. The prior iteration of the large language model from OpenAI, ChatGPT-3.5, demonstrated an advanced capacity to interpret emotions from textual data, surpassing human benchmarks. Given the introduction of ChatGPT-4, with its enhanced visual processing capabilities, and considering Google Bard's existing visual functionalities, a rigorous assessment of their proficiency in visual mentalizing is warranted.
OBJECTIVE: The aim of the research was to critically evaluate the capabilities of ChatGPT-4 and Google Bard with regard to their competence in discerning visual mentalizing indicators as contrasted with their text-based mentalizing abilities.
METHODS: The Reading the Mind in the Eyes Test developed by Baron-Cohen and colleagues was used to assess the models' proficiency in interpreting visual emotional indicators. Simultaneously, the Levels of Emotional Awareness Scale was used to evaluate the large language models' aptitude in textual mentalizing. Collating data from both tests provided a holistic view of the mentalizing capabilities of ChatGPT-4 and Bard.
RESULTS: ChatGPT-4, displaying a pronounced ability in emotion recognition, secured scores of 26 and 27 in 2 distinct evaluations, significantly deviating from a random response paradigm (P<.001). These scores align with established benchmarks from the broader human demographic. Notably, ChatGPT-4 exhibited consistent responses, with no discernible biases pertaining to the sex of the depicted figure or the nature of the emotion. In contrast, Google Bard's performance aligned with random response patterns, securing scores of 10 and 12 and rendering further detailed analysis redundant. In the domain of textual analysis, both ChatGPT and Bard surpassed established benchmarks from the general population, with their performances being remarkably congruent.
CONCLUSIONS: ChatGPT-4 proved its efficacy in the domain of visual mentalizing, aligning closely with human performance standards. Although both models displayed commendable acumen in textual emotion interpretation, Bard's capabilities in visual emotion interpretation necessitate further scrutiny and potential refinement. This study stresses the criticality of ethical AI development for emotional recognition, highlighting the need for inclusive data, collaboration with patients and mental health experts, and stringent governmental oversight to ensure transparency and protect patient privacy.
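For the statistical claim above (a score of 26 significantly exceeding chance), a worked check is easy to express, assuming the standard 36-item, four-option version of the Reading the Mind in the Eyes Test so that chance-level accuracy is 25%. This is an illustrative calculation, not the authors' reported analysis.

```python
# Illustrative exact binomial test: is 26/36 correct better than the 25% chance rate?
from scipy.stats import binomtest

result = binomtest(k=26, n=36, p=0.25, alternative="greater")
print(f"P value for 26/36 vs. chance guessing: {result.pvalue:.2e}")
```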
Affiliation(s)
- Zohar Elyoseph: Department of Educational Psychology, The Center for Psychobiological Research, The Max Stern Yezreel Valley College, Emek Yezreel, Israel; Imperial College London, London, United Kingdom
- Elad Refoua: Department of Psychology, Bar-Ilan University, Ramat Gan, Israel
- Kfir Asraf: Department of Psychology, The Max Stern Yezreel Valley College, Emek Yezreel, Israel
- Maya Lvovsky: Department of Psychology, The Max Stern Yezreel Valley College, Emek Yezreel, Israel
- Yoav Shimoni: Boston Children's Hospital, Boston, MA, United States
- Dorit Hadar-Shoval: Department of Psychology, The Max Stern Yezreel Valley College, Emek Yezreel, Israel