1. Kurniawan MH, Handiyani H, Nuraini T, Hariyati RTS, Sutrisno S. A systematic review of artificial intelligence-powered (AI-powered) chatbot intervention for managing chronic illness. Ann Med 2024;56:2302980. PMID: 38466897; PMCID: PMC10930147; DOI: 10.1080/07853890.2024.2302980.
Abstract
BACKGROUND Utilizing artificial intelligence (AI) in chatbots, especially for chronic diseases, has become increasingly prevalent. These AI-powered chatbots serve as crucial tools for enhancing patient communication, addressing the rising prevalence of chronic conditions, and meeting the growing demand for supportive healthcare applications. However, there is a notable gap in comprehensive reviews evaluating the impact of AI-powered chatbot interventions in healthcare within the academic literature. This study aimed to assess user satisfaction, intervention efficacy, and the specific characteristics and AI architectures of chatbot systems designed for chronic diseases. METHOD A thorough exploration of the existing literature was undertaken by employing diverse databases such as PubMed MEDLINE, CINAHL, EMBASE, PsycINFO, ACM Digital Library and Scopus. The studies incorporated in this analysis encompassed primary research that employed chatbots or other forms of AI architecture in the context of preventing, treating or rehabilitating chronic diseases. Risk of bias was assessed using the Risk of Bias 2.0 (RoB 2) tool. RESULTS Seven hundred and eighty-four results were obtained, and subsequently, eight studies were found to align with the inclusion criteria. The intervention methods encompassed health education (n = 3), behaviour change theory (n = 1), stress and coping (n = 1), cognitive behavioural therapy (n = 2) and self-care behaviour (n = 1). The research provided valuable insights into the effectiveness and user-friendliness of AI-powered chatbots in handling various chronic conditions. Overall, users showed favourable acceptance of these chatbots for self-managing chronic illnesses. CONCLUSIONS The reviewed studies suggest promising acceptance of AI-powered chatbots for self-managing chronic conditions. However, limited evidence on their efficacy, owing to insufficient technical documentation, calls for future studies to provide detailed descriptions and prioritize patient safety. These chatbots employ natural language processing and multimodal interaction. Subsequent research should focus on evidence-based evaluations, facilitating comparisons across diverse chronic health conditions.
Affiliation(s)
- Moh Heri Kurniawan: Doctoral Student, Faculty of Nursing, Universitas Indonesia, Depok, Indonesia; Department of Nursing, Faculty of Health, Universitas Aisyah Pringsewu, Kabupaten Pringsewu, Indonesia
- Hanny Handiyani: Department of Nursing, Faculty of Nursing, Universitas Indonesia, Depok, Indonesia
- Tuti Nuraini: Department of Nursing, Faculty of Nursing, Universitas Indonesia, Depok, Indonesia
- Sutrisno Sutrisno: Department of Nursing, Faculty of Health, Universitas Aisyah Pringsewu, Kabupaten Pringsewu, Indonesia
2. Chew HSJ, Chew NW, Loong SSE, Lim SL, Tam WSW, Chin YH, Chao AM, Dimitriadis GK, Gao Y, So JBY, Shabbir A, Ngiam KY. Effectiveness of an Artificial Intelligence-Assisted App for Improving Eating Behaviors: Mixed Methods Evaluation. J Med Internet Res 2024;26:e46036. PMID: 38713909; DOI: 10.2196/46036.
Abstract
BACKGROUND A plethora of weight management apps are available, but many individuals, especially those living with overweight and obesity, still struggle to achieve adequate weight loss. An emerging area in weight management is the support for one's self-regulation over momentary eating impulses. OBJECTIVE This study aims to examine the feasibility and effectiveness of a novel artificial intelligence-assisted weight management app in improving eating behaviors in a Southeast Asian cohort. METHODS A single-group pretest-posttest study was conducted. Participants completed the 1-week run-in period of a 12-week app-based weight management program called the Eating Trigger-Response Inhibition Program (eTRIP). This self-monitoring system was built upon 3 main components, namely, (1) chatbot-based check-ins on eating lapse triggers, (2) food-based computer vision image recognition (system built based on local food items), and (3) automated time-based nudges and meal stopwatch. At every mealtime, participants were prompted to take a picture of their food items, which were identified by a computer vision image recognition technology, thereby triggering a set of chatbot-initiated questions on eating triggers such as who the users were eating with. Paired 2-sided t tests were used to compare the differences in the psychobehavioral constructs before and after the 7-day program, including overeating habits, snacking habits, consideration of future consequences, self-regulation of eating behaviors, anxiety, depression, and physical activity. Qualitative feedback was analyzed by content analysis according to 4 steps, namely, decontextualization, recontextualization, categorization, and compilation. RESULTS The mean age, self-reported BMI, and waist circumference of the participants were 31.25 (SD 9.98) years, 28.86 (SD 7.02) kg/m2, and 92.60 (SD 18.24) cm, respectively. There were significant improvements in all 7 psychobehavioral constructs except for anxiety. After adjusting for multiple comparisons, statistically significant improvements were found for overeating habits (mean -0.32, SD 1.16; P<.001), snacking habits (mean -0.22, SD 1.12; P<.002), self-regulation of eating behavior (mean 0.08, SD 0.49; P=.007), depression (mean -0.12, SD 0.74; P=.007), and physical activity (mean 1288.60, SD 3055.20 metabolic equivalent task-min/day; P<.001). Forty-one participants reported skipping at least 1 meal (ie, breakfast, lunch, or dinner), accounting for 578 (67.1%) of the 862 meals skipped. Of the 230 participants, 80 (34.8%) provided textual feedback that indicated satisfactory user experience with eTRIP. Four themes emerged, namely, (1) becoming more mindful of self-monitoring, (2) personalized reminders with prompts and chatbot, (3) food logging with image recognition, and (4) engaging with a simple, easy, and appealing user interface. The attrition rate was 8.4% (21/251). CONCLUSIONS eTRIP is a feasible and effective weight management program to be tested in a larger population for its effectiveness and sustainability as a personalized weight management program for people with overweight and obesity. TRIAL REGISTRATION ClinicalTrials.gov NCT04833803; https://classic.clinicaltrials.gov/ct2/show/NCT04833803.
Affiliation(s)
- Han Shi Jocelyn Chew: Alice Lee Centre for Nursing Studies, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Nicholas WS Chew: Department of Cardiology, National University Hospital, Singapore, Singapore
- Shaun Seh Ern Loong: Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Su Lin Lim: Department of Dietetics, National University Hospital, Singapore, Singapore
- Wai San Wilson Tam: Alice Lee Centre for Nursing Studies, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Yip Han Chin: Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Ariana M Chao: School of Nursing, Johns Hopkins University, Baltimore, MD, United States
- Georgios K Dimitriadis: Department of Endocrinology ASO/EASO COM, King's College Hospital NHS Foundation Trust, London, United Kingdom
- Yujia Gao: Division of Hepatobiliary & Pancreatic Surgery, Department of Surgery, National University Hospital, Singapore, Singapore
- Jimmy Bok Yan So: Division of General Surgery (Upper Gastrointestinal Surgery), Department of Surgery, National University Hospital, Singapore, Singapore
- Asim Shabbir: Division of General Surgery (Upper Gastrointestinal Surgery), Department of Surgery, National University Hospital, Singapore, Singapore
- Kee Yuan Ngiam: Division of Thyroid & Endocrine Surgery, Department of Surgery, National University Hospital, Singapore, Singapore
3. Bragazzi NL, Garbarino S. Assessing the Accuracy of Generative Conversational Artificial Intelligence in Debunking Sleep Health Myths: Mixed Methods Comparative Study With Expert Analysis. JMIR Form Res 2024;8:e55762. PMID: 38501898; PMCID: PMC11061787; DOI: 10.2196/55762.
Abstract
BACKGROUND Adequate sleep is essential for maintaining individual and public health, positively affecting cognition and well-being, and reducing chronic disease risks. It plays a significant role in driving the economy, public safety, and managing health care costs. Digital tools, including websites, sleep trackers, and apps, are key in promoting sleep health education. Conversational artificial intelligence (AI) such as ChatGPT (OpenAI, Microsoft Corp) offers accessible, personalized advice on sleep health but raises concerns about potential misinformation. This underscores the importance of ensuring that AI-driven sleep health information is accurate, given its significant impact on individual and public health, and the spread of sleep-related myths. OBJECTIVE This study aims to examine ChatGPT's capability to debunk sleep-related myths. METHODS A mixed methods design was leveraged. ChatGPT categorized 20 sleep-related myths identified by 10 sleep experts and rated them in terms of falseness and public health significance on a 5-point Likert scale. Sensitivity, positive predictive value, and interrater agreement were also calculated. A qualitative comparative analysis was also conducted. RESULTS ChatGPT labeled a significant portion (n=17, 85%) of the statements as "false" (n=9, 45%) or "generally false" (n=8, 40%), with varying accuracy across different domains. For instance, it correctly identified most myths about "sleep timing," "sleep duration," and "behaviors during sleep," while it had varying degrees of success with other categories such as "pre-sleep behaviors" and "brain function and sleep." ChatGPT's assessment of the degree of falseness and public health significance, on the 5-point Likert scale, revealed an average score of 3.45 (SD 0.87) and 3.15 (SD 0.99), respectively, indicating a good level of accuracy in identifying the falseness of statements and a good understanding of their impact on public health. The AI-based tool showed a sensitivity of 85% and a positive predictive value of 100%. Overall, this indicates that when ChatGPT labels a statement as false, it is highly reliable, but it may miss identifying some false statements. When comparing with expert ratings, high intraclass correlation coefficients (ICCs) between ChatGPT's appraisals and expert opinions were found, suggesting that the AI's ratings were generally aligned with expert views on the falseness (ICC=.83, P<.001) and public health significance (ICC=.79, P=.001) of sleep-related myths. Qualitatively, both ChatGPT and sleep experts refuted sleep-related misconceptions. However, ChatGPT adopted a more accessible style and provided a more generalized view, focusing on broad concepts, while experts sometimes used technical jargon, providing evidence-based explanations. CONCLUSIONS ChatGPT-4 can accurately address sleep-related queries and debunk sleep-related myths, with a performance comparable to that of sleep experts. Given its limitations, the AI cannot completely replace expert opinion, especially in nuanced and complex fields such as sleep health, but it can be a valuable complement in the dissemination of updated information and the promotion of healthy behaviors.
Affiliation(s)
- Nicola Luigi Bragazzi: Human Nutrition Unit, Department of Food and Drugs, University of Parma, Parma, Italy; Department of Neuroscience, Rehabilitation, Ophthalmology, Genetics and Maternal/Child Sciences, University of Genoa, Genoa, Italy; Laboratory for Industrial and Applied Mathematics, Department of Mathematics and Statistics, York University, Toronto, ON, Canada
- Sergio Garbarino: Department of Neuroscience, Rehabilitation, Ophthalmology, Genetics and Maternal/Child Sciences, University of Genoa, Genoa, Italy; Post-Graduate School of Occupational Health, Università Cattolica del Sacro Cuore, Rome, Italy
4. Kosyluk K, Baeder T, Greene KY, Tran JT, Bolton C, Loecher N, DiEva D, Galea JT. Mental Distress, Label Avoidance, and Use of a Mental Health Chatbot: Results From a US Survey. JMIR Form Res 2024;8:e45959. PMID: 38607665; PMCID: PMC11053397; DOI: 10.2196/45959.
Abstract
BACKGROUND For almost two decades, researchers and clinicians have argued that certain aspects of mental health treatment can be removed from clinicians' responsibilities and allocated to technology, preserving valuable clinician time and alleviating the burden on the behavioral health care system. The service delivery tasks that could arguably be allocated to technology without negatively impacting patient outcomes include screening, triage, and referral. OBJECTIVE We pilot-tested a chatbot for mental health screening and referral to understand the relationship between potential users' demographics and chatbot use; the completion rate of mental health screening when delivered by a chatbot; and the acceptability of a prototype chatbot designed for mental health screening and referral. This chatbot not only screened participants for psychological distress but also referred them to appropriate resources that matched their level of distress and preferences. The goal of this study was to determine whether a mental health screening and referral chatbot would be feasible and acceptable to users. METHODS We conducted an internet-based survey among a sample of US-based adults. Our survey collected demographic data along with a battery of measures assessing behavioral health and symptoms, stigma (label avoidance and perceived stigma), attitudes toward treatment-seeking, readiness for change, and technology readiness and acceptance. Participants were then invited to engage with our chatbot. Those who engaged with the chatbot completed a mental health screening, received a distress score based on this screening, were referred to resources appropriate for their current level of distress, and were asked to rate the acceptability of the chatbot. RESULTS We found that mental health screening using a chatbot was feasible, with 168 (75.7%) of our 222 participants completing mental health screening within the chatbot sessions. Various demographic characteristics were associated with a willingness to use the chatbot. The participants who used the chatbot found it to be acceptable. Logistic regression produced a significant model with perceived usefulness and symptoms as significant positive predictors of chatbot use for the overall sample, and label avoidance as the only significant predictor of chatbot use for those currently experiencing distress. CONCLUSIONS Label avoidance, the desire to avoid mental health services so as to avoid the stigmatized label of mental illness, is a significant negative predictor of care seeking. Therefore, our finding regarding label avoidance and chatbot use has significant public health implications in terms of facilitating access to mental health resources. Those who are high on label avoidance are not likely to seek care in a community mental health clinic, yet they are likely willing to engage with a mental health chatbot, participate in mental health screening, and receive mental health resources within the chatbot session. Chatbot technology may prove to be a way to engage those in care who have previously avoided treatment due to stigma.
Affiliation(s)
- Kristin Kosyluk: Department of Mental Health Law & Policy, University of South Florida, Tampa, FL, United States
- Tanner Baeder: School of Social Work, University of South Florida, Tampa, FL, United States
- Karah Yeona Greene: School of Social Work, University of South Florida, Tampa, FL, United States
- Jennifer T Tran: Department of Mental Health Law & Policy, University of South Florida, Tampa, FL, United States
- Cassidy Bolton: Department of Mental Health Law & Policy, University of South Florida, Tampa, FL, United States
- Nele Loecher: Department of Mental Health Law & Policy, University of South Florida, Tampa, FL, United States
- Daniel DiEva: School of Social Work, University of South Florida, Tampa, FL, United States
- Jerome T Galea: School of Social Work, University of South Florida, Tampa, FL, United States
5. Huq SM, Maskeliūnas R, Damaševičius R. Dialogue agents for artificial intelligence-based conversational systems for cognitively disabled: a systematic review. Disabil Rehabil Assist Technol 2024;19:1059-1078. PMID: 36413423; DOI: 10.1080/17483107.2022.2146768.
Abstract
PURPOSE We present a systematic literature review of dialogue agents for Artificial Intelligence (AI) and agent-based conversational systems dealing with cognitive disability of aged and impaired people, including dementia and Parkinson's disease. We analyze current applications, gaps, and challenges in the existing research body, and provide guidelines and recommendations for their future development and use. MATERIALS AND METHODS We performed this study by applying the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) criteria. We performed a systematic search using relevant databases (ACM Digital Library, Google Scholar, IEEE Xplore, PubMed, and Scopus). RESULTS This study identified 468 articles on the use of conversational agents in healthcare. We finally selected 124 articles based on their objectives and content as directly related to our main topic. CONCLUSION We identified the main challenges in the field and analyzed the typical examples of the application of conversational agents in the healthcare domain, the desired characteristics of conversational agents, and chatbot support for aged people and people with cognitive disabilities. Our results contribute to a discussion on conversational health agents and emphasize current knowledge gaps and challenges for future research. IMPLICATIONS FOR REHABILITATION A systematic literature review of dialogue agents for artificial intelligence and agent-based conversational systems dealing with cognitive disability of aged and impaired people. Main challenges and desired characteristics of the conversational agents, and chatbot support for aged people and people with cognitive disability. Current knowledge gaps and challenges for remote healthcare and rehabilitation. Guidelines and recommendations for future development and use of conversational systems.
Affiliation(s)
- Syed Mahmudul Huq: Faculty of Informatics, Kaunas University of Technology, Kaunas, Lithuania
- Rytis Maskeliūnas: Faculty of Informatics, Kaunas University of Technology, Kaunas, Lithuania
6. Karkosz S, Szymański R, Sanna K, Michałowski J. Effectiveness of a Web-based and Mobile Therapy Chatbot on Anxiety and Depressive Symptoms in Subclinical Young Adults: Randomized Controlled Trial. JMIR Form Res 2024;8:e47960. PMID: 38506892; PMCID: PMC10993129; DOI: 10.2196/47960.
Abstract
BACKGROUND There has been an increased need to provide specialized help for people with depressive and anxiety symptoms, particularly teenagers and young adults. There is evidence from a 2-week intervention that chatbots (eg, Woebot) are effective in reducing depression and anxiety, an effect that was not detected in a control group provided with self-help materials. Although chatbots are a promising solution, there is limited scientific evidence for the efficacy of agent-guided cognitive behavioral therapy (CBT) outside the English language, especially for highly inflected languages. OBJECTIVE This study aimed to measure the efficacy of Fido, a therapy chatbot that uses the Polish language. It targets depressive and anxiety symptoms using CBT techniques. We hypothesized that participants using Fido would show a greater reduction in anxiety and depressive symptoms than the control group. METHODS We conducted a 2-arm, open-label, randomized controlled trial with 81 participants with subclinical depression or anxiety who were recruited via social media. Participants were divided into experimental (interacted with a fully automated Fido chatbot) and control (received a self-help book) groups. Both intervention methods addressed topics such as general psychoeducation and cognitive distortion identification and modification via Socratic questioning. The chatbot also featured suicidal ideation identification and redirection to suicide hotlines. We used self-assessment scales to measure primary outcomes, including the levels of depression, anxiety, worry tendencies, satisfaction with life, and loneliness at baseline, after the 2-week intervention, and at the 1-month follow-up. We also controlled for secondary outcomes, including engagement and frequency of use. RESULTS There were no differences in anxiety and depressive symptoms between the groups at enrollment and baseline. After the intervention, depressive and anxiety symptoms were reduced in both groups (chatbot: n=36; control: n=38), and this reduction remained stable at the 1-month follow-up. Loneliness was not significantly different between the groups after the intervention, but an exploratory analysis showed a decline in loneliness among participants who used Fido more frequently. Both groups used their intervention technique with similar frequency; however, the control group spent more time (mean 117.57, SD 72.40 minutes) on the intervention than the Fido group (mean 79.44, SD 42.96 minutes). CONCLUSIONS We did not replicate the findings from previous (eg, Woebot) studies, as both arms yielded therapeutic effects. However, such results are in line with other research on internet interventions. Nevertheless, Fido provided sufficient help to reduce anxiety and depressive symptoms and decreased perceived loneliness among high-frequency users, which is some of the first evidence of chatbot efficacy with agents that use a highly inflected language. Further research is needed to determine the long-term, real-world effectiveness of Fido and its efficacy in a clinical sample. TRIAL REGISTRATION ClinicalTrials.gov NCT05762939; https://clinicaltrials.gov/study/NCT05762939; Open Science Foundation Registry 2cqt3; https://osf.io/2cqt3.
Affiliation(s)
- Stanisław Karkosz: Laboratory of Affective Neuroscience in Poznan, SWPS University, Warsaw, Poland
- Robert Szymański: Laboratory of Affective Neuroscience in Poznan, SWPS University, Warsaw, Poland
- Katarzyna Sanna: Center for Research on Personality Development in Poznan, SWPS University, Warsaw, Poland
7. Nakao T, Miki S, Nakamura Y, Kikuchi T, Nomura Y, Hanaoka S, Yoshikawa T, Abe O. Capability of GPT-4V(ision) in the Japanese National Medical Licensing Examination: Evaluation Study. JMIR Med Educ 2024;10:e54393. PMID: 38470459; DOI: 10.2196/54393.
Abstract
BACKGROUND Previous research applying large language models (LLMs) to medicine focused on text-based information. Recently, multimodal variants of LLMs have acquired the capability of recognizing images. OBJECTIVE We aimed to evaluate the image recognition capability of generative pretrained transformer (GPT)-4V, a recent multimodal LLM developed by OpenAI, in the medical field by testing how visual information affects its performance in answering questions from the 117th Japanese National Medical Licensing Examination. METHODS We focused on 108 questions that had 1 or more images as part of the question and presented GPT-4V with the same questions under two conditions: (1) with both the question text and associated images and (2) with the question text only. We then compared the difference in accuracy between the 2 conditions using the exact McNemar test. RESULTS Among the 108 questions with images, GPT-4V's accuracy was 68% (73/108) when presented with images and 72% (78/108) when presented without images (P=.36). For the 2 question categories, clinical and general, the accuracies with and without images were 71% (70/98) versus 78% (76/98; P=.21) and 30% (3/10) versus 20% (2/10; P≥.99), respectively. CONCLUSIONS The additional information from the images did not significantly improve the performance of GPT-4V on the Japanese National Medical Licensing Examination.
Affiliation(s)
- Takahiro Nakao: Department of Computational Diagnostic Radiology and Preventive Medicine, The University of Tokyo Hospital, Bunkyo-ku, Tokyo, Japan
- Soichiro Miki: Department of Computational Diagnostic Radiology and Preventive Medicine, The University of Tokyo Hospital, Bunkyo-ku, Tokyo, Japan
- Yuta Nakamura: Department of Computational Diagnostic Radiology and Preventive Medicine, The University of Tokyo Hospital, Bunkyo-ku, Tokyo, Japan
- Tomohiro Kikuchi: Department of Computational Diagnostic Radiology and Preventive Medicine, The University of Tokyo Hospital, Bunkyo-ku, Tokyo, Japan; Department of Radiology, School of Medicine, Jichi Medical University, Shimotsuke, Tochigi, Japan
- Yukihiro Nomura: Department of Computational Diagnostic Radiology and Preventive Medicine, The University of Tokyo Hospital, Bunkyo-ku, Tokyo, Japan; Center for Frontier Medical Engineering, Chiba University, Inage-ku, Chiba, Japan
- Shouhei Hanaoka: Department of Radiology, The University of Tokyo Hospital, Bunkyo-ku, Tokyo, Japan
- Takeharu Yoshikawa: Department of Computational Diagnostic Radiology and Preventive Medicine, The University of Tokyo Hospital, Bunkyo-ku, Tokyo, Japan
- Osamu Abe: Department of Radiology, The University of Tokyo Hospital, Bunkyo-ku, Tokyo, Japan
8. Reynolds K, Tejasvi T. Potential Use of ChatGPT in Responding to Patient Questions and Creating Patient Resources. JMIR Dermatol 2024;7:e48451. PMID: 38446541; PMCID: PMC10955382; DOI: 10.2196/48451.
Abstract
ChatGPT (OpenAI) is a free, artificial intelligence-based natural language processing model that generates complex responses to user-generated prompts. The advent of this tool comes at a time when physician burnout is at an all-time high, attributed at least in part to time spent in the electronic medical record outside of the patient encounter (documenting the encounter, responding to patient messages, etc). Although ChatGPT is not specifically designed to provide medical information, it can generate preliminary responses to patients' questions about their medical conditions and can rapidly create educational patient resources, although these inevitably require rigorous editing and fact-checking by the health care provider to ensure accuracy. In this way, this assistive technology has the potential not only to enhance a physician's efficiency and work-life balance but also to enrich the patient-physician relationship and ultimately improve patient outcomes.
Affiliation(s)
- Kelly Reynolds: Department of Dermatology, University of Michigan, Ann Arbor, MI, United States
- Trilokraj Tejasvi: Department of Dermatology, University of Michigan, Ann Arbor, MI, United States
9. Schenker Y, Abdullah S, Arnold R, Schmitz KH. Conversational Agents in Palliative Care: Potential Benefits, Risks, and Next Steps. J Palliat Med 2024;27:296-300. PMID: 38215235; DOI: 10.1089/jpm.2023.0534.
Abstract
Conversational agents (sometimes called chatbots) are technology-based systems that use artificial intelligence to simulate human-to-human conversations. Research on conversational agents in health care is nascent but growing, with recent reviews highlighting the need for more robust evaluations in diverse settings and populations. In this article, we consider how conversational agents might function in palliative care-not by replacing clinicians, but by interacting with patients around select uncomplicated needs while facilitating more targeted and appropriate referrals to specialty palliative care services. We describe potential roles for conversational agents aligned with the core domains of quality palliative care and identify risks that must be considered and addressed in the development and use of these systems for people with serious illness. With careful consideration of risks and benefits, conversational agents represent promising tools that should be explored as one component of a multipronged approach for improving patient and family outcomes in serious illness.
Affiliation(s)
- Yael Schenker: Section of Palliative Care and Medical Ethics, Division of General Internal Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, USA; Palliative Research Center (PaRC), University of Pittsburgh, Pittsburgh, Pennsylvania, USA; UPMC Hillman Cancer Center, Pittsburgh, Pennsylvania, USA
- Saeed Abdullah: College of Information Sciences and Technology, Penn State University, University Park, Pennsylvania, USA
- Robert Arnold: Section of Palliative Care and Medical Ethics, Division of General Internal Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, USA; Palliative Research Center (PaRC), University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Kathryn H Schmitz: UPMC Hillman Cancer Center, Pittsburgh, Pennsylvania, USA; Division of Hematology and Oncology, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
|
10
|
Sezgin E. Redefining Virtual Assistants in Health Care: The Future With Large Language Models. J Med Internet Res 2024; 26:e53225. [PMID: 38241074] [PMCID: PMC10837753] [DOI: 10.2196/53225]
Abstract
This editorial explores the evolving and transformative role of large language models (LLMs) in enhancing the capabilities of virtual assistants (VAs) in the health care domain, highlighting recent research on the performance of VAs and LLMs in health care information sharing. Recent findings show a marked improvement in the accuracy and clinical relevance of responses from LLMs, such as GPT-4, compared with current VAs, especially for complex health care inquiries like those related to postpartum depression. The improved accuracy and clinical relevance with LLMs mark a paradigm shift in digital health tools and VAs. Furthermore, such LLM applications have the potential to dynamically adapt and be integrated into existing VA platforms, offering cost-effective, scalable, and inclusive solutions. These developments suggest a significant expansion of the applicable range of VA applications, as well as increased value, risk, and impact in health care, moving toward more personalized digital health ecosystems. However, alongside these advancements, it is necessary to develop and adhere to ethical guidelines, regulatory frameworks, governance principles, and privacy and safety measures. Robust interdisciplinary collaboration is needed to navigate the complexities of safely and effectively integrating LLMs into health care applications, ensuring that these emerging technologies align with the diverse needs and ethical considerations of the health care domain.
Affiliation(s)
- Emre Sezgin
- The Abigail Wexner Research Institute at Nationwide Children's Hospital, Columbus, OH, United States
- The Ohio State University College of Medicine, Columbus, OH, United States
11
Holderried F, Stegemann-Philipps C, Herschbach L, Moldt JA, Nevins A, Griewatz J, Holderried M, Herrmann-Werner A, Festl-Wietek T, Mahling M. A Generative Pretrained Transformer (GPT)-Powered Chatbot as a Simulated Patient to Practice History Taking: Prospective, Mixed Methods Study. JMIR Med Educ 2024; 10:e53961. [PMID: 38227363] [PMCID: PMC10828948] [DOI: 10.2196/53961]
Abstract
BACKGROUND Communication is a core competency of medical professionals and of utmost importance for patient safety. Although medical curricula emphasize communication training, traditional formats, such as real or simulated patient interactions, can present psychological stress and are limited in repetition. The recent emergence of large language models (LLMs), such as generative pretrained transformer (GPT), offers an opportunity to overcome these restrictions. OBJECTIVE The aim of this study was to explore the feasibility of a GPT-driven chatbot to practice history taking, one of the core competencies of communication. METHODS We developed an interactive chatbot interface using GPT-3.5 and a specific prompt including a chatbot-optimized illness script and a behavioral component. Following a mixed methods approach, we invited medical students to voluntarily practice history taking. To determine whether GPT provides suitable answers as a simulated patient, the conversations were recorded and analyzed using quantitative and qualitative approaches. We analyzed the extent to which the questions and answers aligned with the provided script, as well as the medical plausibility of the answers. Finally, the students filled out the Chatbot Usability Questionnaire (CUQ). RESULTS A total of 28 students practiced with our chatbot (mean age 23.4, SD 2.9 years). We recorded a total of 826 question-answer pairs (QAPs), with a median of 27.5 QAPs per conversation and 94.7% (n=782) pertaining to history taking. When questions were explicitly covered by the script (n=502, 60.3%), the GPT-provided answers were mostly based on explicit script information (n=471, 94.4%). For questions not covered by the script (n=195, 23.4%), the GPT answers drew on fictitious information in 56.4% (n=110) of cases. Regarding plausibility, 842 (97.9%) of 860 QAPs were rated as plausible.
The 14 (2.1%) implausible answers stemmed from socially desirable responses, departures from the assigned patient role, ignored script information, illogical reasoning, and calculation errors. Despite these results, the CUQ revealed an overall positive user experience (77/100 points). CONCLUSIONS Our data showed that LLMs, such as GPT, can provide a simulated patient experience, a good user experience, and a majority of plausible answers. Our analysis revealed that GPT-provided answers use either explicit script information or are based on available information, which can be understood as abductive reasoning. Although rare, the GPT-based chatbot provided implausible information in some instances, with the major tendency being socially desirable instead of medically plausible information.
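The setup described above, a GPT-3.5 chatbot steered by a prompt combining an illness script with a behavioral component, can be sketched roughly as follows. The script fields, the prompt wording, and the `build_messages` helper are hypothetical illustrations, not the study's actual materials.

```python
# Hypothetical sketch of a simulated-patient prompt built from an illness
# script. Field names and wording are illustrative, not the study's prompt.

ILLNESS_SCRIPT = {
    "role": "58-year-old patient",
    "chief_complaint": "chest pain for two hours",
    "history": "hypertension, smoker for 30 years",
    "behavior": "anxious; answers briefly unless asked open questions",
}

def build_messages(script: dict, student_question: str) -> list:
    """Assemble a chat-completion message list: a system prompt that fixes
    the patient role plus the medical student's history-taking question."""
    system_prompt = (
        f"You are a simulated patient: {script['role']} presenting with "
        f"{script['chief_complaint']}. Relevant history: {script['history']}. "
        f"Behavior: {script['behavior']}. Stay in character; if asked about "
        "details not in this script, improvise plausibly but consistently."
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": student_question},
    ]

messages = build_messages(ILLNESS_SCRIPT, "When did the pain start?")
# This message list would then be sent to a chat-completion endpoint;
# each student follow-up question extends the list before the next call.
```

Keeping the illness script in the system message is what lets the study distinguish answers grounded in "explicit script information" from improvised (fictitious) details.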
Affiliation(s)
- Friederike Holderried
- Tübingen Institute for Medical Education, Eberhard Karls University, Tübingen, Germany
- Lea Herschbach
- Tübingen Institute for Medical Education, Eberhard Karls University, Tübingen, Germany
- Julia-Astrid Moldt
- Tübingen Institute for Medical Education, Eberhard Karls University, Tübingen, Germany
- Andrew Nevins
- Division of Infectious Diseases, Stanford University School of Medicine, Stanford, CA, United States
- Jan Griewatz
- Tübingen Institute for Medical Education, Eberhard Karls University, Tübingen, Germany
- Martin Holderried
- Department of Medical Development, Process and Quality Management, University Hospital Tübingen, Tübingen, Germany
- Anne Herrmann-Werner
- Tübingen Institute for Medical Education, Eberhard Karls University, Tübingen, Germany
- Teresa Festl-Wietek
- Tübingen Institute for Medical Education, Eberhard Karls University, Tübingen, Germany
- Moritz Mahling
- Tübingen Institute for Medical Education, Eberhard Karls University, Tübingen, Germany
- Department of Diabetology, Endocrinology, Nephrology, Section of Nephrology and Hypertension, University Hospital Tübingen, Tübingen, Germany
12
Tan TC, Roslan NEB, Li JW, Zou X, Chen X, Santosa A. Patient Acceptability of Symptom Screening and Patient Education Using a Chatbot for Autoimmune Inflammatory Diseases: Survey Study. JMIR Form Res 2023; 7:e49239. [PMID: 37219234] [PMCID: PMC11019963] [DOI: 10.2196/49239]
Abstract
BACKGROUND Chatbots have the potential to enhance health care interaction, satisfaction, and service delivery. However, data regarding their acceptance across diverse patient populations are limited. In-depth studies on the reception of chatbots by patients with chronic autoimmune inflammatory diseases are lacking, although such studies are vital for facilitating the effective integration of chatbots in rheumatology care. OBJECTIVE We aim to assess patient perceptions and acceptance of a chatbot designed for autoimmune inflammatory rheumatic diseases (AIIRDs). METHODS We administered a comprehensive survey in an outpatient setting at a top-tier rheumatology referral center. The target cohort included patients who interacted with a chatbot explicitly tailored to facilitate diagnosis and obtain information on AIIRDs. Following the RE-AIM (Reach, Effectiveness, Adoption, Implementation and Maintenance) framework, the survey was designed to gauge the effectiveness, user acceptability, and implementation of the chatbot. RESULTS Between June and October 2022, we received survey responses from 200 patients, with an equal number of 100 initial consultations and 100 follow-up (FU) visits. The mean scores on a 5-point acceptability scale ranged from 4.01 (SD 0.63) to 4.41 (SD 0.54), indicating consistently high ratings across the different aspects of chatbot performance. Multivariate regression analysis indicated that having a FU visit was significantly associated with a greater willingness to reuse the chatbot for symptom determination (P=.01). Further, patients' comfort with chatbot diagnosis increased significantly after meeting physicians (P<.001). We observed no significant differences in chatbot acceptance according to sex, education level, or diagnosis category. CONCLUSIONS This study underscores that chatbots tailored to AIIRDs have a favorable reception. 
The inclination of FU patients to engage with the chatbot signifies the possible influence of past clinical encounters and physician affirmation on its use. Although further exploration is required to refine their integration, the prevalent positive perceptions suggest that chatbots have the potential to strengthen the bridge between patients and health care providers, thus enhancing the delivery of rheumatology care to various cohorts.
Affiliation(s)
- Tze Chin Tan
- Department of Rheumatology and Immunology, Singapore General Hospital, Singapore, Singapore
- Medicine Academic Clinical Programme, SingHealth-Duke-NUS, Singapore, Singapore
- Nur Emillia Binte Roslan
- Medicine Academic Clinical Programme, SingHealth-Duke-NUS, Singapore, Singapore
- Department of General Medicine, Sengkang General Hospital, Singapore, Singapore
- James Weiquan Li
- Medicine Academic Clinical Programme, SingHealth-Duke-NUS, Singapore, Singapore
- Department of Gastroenterology and Hepatology, Changi General Hospital, Singapore, Singapore
- Xinying Zou
- Internal Medicine Clinic, Changi General Hospital, Singapore, Singapore
- Xiangmei Chen
- Internal Medicine Clinic, Changi General Hospital, Singapore, Singapore
- Anindita Santosa
- Medicine Academic Clinical Programme, SingHealth-Duke-NUS, Singapore, Singapore
- Division of Rheumatology and Immunology, Department of Medicine, Changi General Hospital, Singapore, Singapore
13
Tangadulrat P, Sono S, Tangtrakulwanich B. Using ChatGPT for Clinical Practice and Medical Education: Cross-Sectional Survey of Medical Students' and Physicians' Perceptions. JMIR Med Educ 2023; 9:e50658. [PMID: 38133908] [PMCID: PMC10770783] [DOI: 10.2196/50658]
Abstract
BACKGROUND ChatGPT is a well-known large language model-based chatbot with many potential applications in the medical field. However, some physicians are still unfamiliar with ChatGPT and are concerned about its benefits and risks. OBJECTIVE We aim to evaluate the perception of physicians and medical students toward using ChatGPT in the medical field. METHODS A web-based questionnaire was sent to medical students, interns, residents, and attending staff with questions regarding their perception toward using ChatGPT in clinical practice and medical education. Participants were also asked to rate their perception of ChatGPT's generated response about knee osteoarthritis. RESULTS Participants included 124 medical students, 46 interns, 37 residents, and 32 attending staff. After reading ChatGPT's response, 132 of the 239 (55.2%) participants had a positive rating about using ChatGPT for clinical practice. The proportion of positive answers was significantly lower among graduated physicians (48/115, 42%) than among medical students (84/124, 68%; P<.001). Participants cited the lack of patient-specific treatment plans, potentially outdated evidence, and language barriers as ChatGPT's pitfalls. Regarding using ChatGPT for medical education, the proportion of positive responses was also significantly lower among graduated physicians (71/115, 62%) than among medical students (103/124, 83.1%; P<.001). Participants were concerned that ChatGPT's responses could be too superficial, might lack scientific evidence, and might need expert verification. CONCLUSIONS Medical students generally had a positive perception of using ChatGPT for guiding treatment and medical education, whereas graduated physicians were more cautious. Nonetheless, both groups positively perceived using ChatGPT for creating patient educational materials.
Affiliation(s)
- Pasin Tangadulrat
- Department of Orthopedics, Faculty of Medicine, Prince of Songkla University, Hatyai, Thailand
- Supinya Sono
- Division of Family and Preventive Medicine, Faculty of Medicine, Prince of Songkla University, Hatyai, Thailand
14
Xue J, Zhang B, Zhao Y, Zhang Q, Zheng C, Jiang J, Li H, Liu N, Li Z, Fu W, Peng Y, Logan J, Zhang J, Xiang X. Evaluation of the Current State of Chatbots for Digital Health: Scoping Review. J Med Internet Res 2023; 25:e47217. [PMID: 38113097] [PMCID: PMC10762606] [DOI: 10.2196/47217]
Abstract
BACKGROUND Chatbots have become ubiquitous in our daily lives, enabling natural language conversations with users through various modes of communication. Chatbots have the potential to play a significant role in promoting health and well-being. As the number of studies and available products related to chatbots continues to rise, there is a critical need to assess product features to enhance the design of chatbots that effectively promote health and behavioral change. OBJECTIVE This scoping review aims to provide a comprehensive assessment of the current state of health-related chatbots, including the chatbots' characteristics and features, user backgrounds, communication models, relational building capacity, personalization, interaction, responses to suicidal thoughts, and users' in-app experiences during chatbot use. Through this analysis, we seek to identify gaps in the current research, guide future directions, and enhance the design of health-focused chatbots. METHODS Following the scoping review methodology by Arksey and O'Malley and guided by the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) checklist, this study used a two-pronged approach to identify relevant chatbots: (1) searching the iOS and Android App Stores and (2) reviewing scientific literature through a search strategy designed by a librarian. Overall, 36 chatbots were selected based on predefined criteria from both sources. These chatbots were systematically evaluated using a comprehensive framework developed for this study, including chatbot characteristics, user backgrounds, building relational capacity, personalization, interaction models, responses to critical situations, and user experiences. Ten coauthors were responsible for downloading and testing the chatbots, coding their features, and evaluating their performance in simulated conversations. The testing of all chatbot apps was limited to their free-to-use features. 
RESULTS This review provides an overview of the diversity of health-related chatbots, encompassing categories such as mental health support, physical activity promotion, and behavior change interventions. Chatbots use text, animations, speech, images, and emojis for communication. The findings highlight variations in conversational capabilities, including empathy, humor, and personalization. Notably, concerns regarding safety, particularly in addressing suicidal thoughts, were evident. Approximately 44% (16/36) of the chatbots effectively addressed suicidal thoughts. User experiences and behavioral outcomes demonstrated the potential of chatbots in health interventions, but evidence remains limited. CONCLUSIONS This scoping review underscores the significance of chatbots in health-related applications and offers insights into their features, functionalities, and user experiences. This study contributes to advancing the understanding of chatbots' role in digital health interventions, thus paving the way for more effective and user-centric health promotion strategies. This study informs future research directions, emphasizing the need for rigorous randomized control trials, standardized evaluation metrics, and user-centered design to unlock the full potential of chatbots in enhancing health and well-being. Future research should focus on addressing limitations, exploring real-world user experiences, and implementing robust data security and privacy measures.
Affiliation(s)
- Jia Xue
- Factor Inwentash Faculty of Social Work, University of Toronto, Toronto, ON, Canada
- Faculty of Information, University of Toronto, Toronto, ON, Canada
- Artificial Intelligence for Justice Lab, University of Toronto, Toronto, ON, Canada
- Bolun Zhang
- Faculty of Information, University of Toronto, Toronto, ON, Canada
- Artificial Intelligence for Justice Lab, University of Toronto, Toronto, ON, Canada
- Yaxi Zhao
- Faculty of Information, University of Toronto, Toronto, ON, Canada
- Artificial Intelligence for Justice Lab, University of Toronto, Toronto, ON, Canada
- Qiaoru Zhang
- Artificial Intelligence for Justice Lab, University of Toronto, Toronto, ON, Canada
- Faculty of Arts and Science, University of Toronto, Toronto, ON, Canada
- Chengda Zheng
- Artificial Intelligence for Justice Lab, University of Toronto, Toronto, ON, Canada
- Jielin Jiang
- Artificial Intelligence for Justice Lab, University of Toronto, Toronto, ON, Canada
- Hanjia Li
- Artificial Intelligence for Justice Lab, University of Toronto, Toronto, ON, Canada
- Nian Liu
- Artificial Intelligence for Justice Lab, University of Toronto, Toronto, ON, Canada
- Ziqian Li
- Artificial Intelligence for Justice Lab, University of Toronto, Toronto, ON, Canada
- Weiying Fu
- Artificial Intelligence for Justice Lab, University of Toronto, Toronto, ON, Canada
- Yingdong Peng
- Artificial Intelligence for Justice Lab, University of Toronto, Toronto, ON, Canada
- Judith Logan
- John P Robarts Library, University of Toronto, Toronto, ON, Canada
- Jingwen Zhang
- Department of Communication, University of California Davis, Davis, CA, United States
- Xiaoling Xiang
- School of Social Work, University of Michigan, Ann Arbor, MI, United States
15
Watari T, Takagi S, Sakaguchi K, Nishizaki Y, Shimizu T, Yamamoto Y, Tokuda Y. Performance Comparison of ChatGPT-4 and Japanese Medical Residents in the General Medicine In-Training Examination: Comparison Study. JMIR Med Educ 2023; 9:e52202. [PMID: 38055323] [PMCID: PMC10733815] [DOI: 10.2196/52202]
Abstract
BACKGROUND The reliability of GPT-4, a state-of-the-art large language model specializing in clinical reasoning and medical knowledge, remains largely unverified across non-English languages. OBJECTIVE This study aims to compare fundamental clinical competencies between Japanese residents and GPT-4 by using the General Medicine In-Training Examination (GM-ITE). METHODS We used the GPT-4 model provided by OpenAI and the GM-ITE examination questions for the years 2020, 2021, and 2022 to conduct a comparative analysis. This analysis focused on evaluating the performance of individuals who were concluding their second year of residency in comparison to that of GPT-4. Given the current abilities of GPT-4, our study included only single-choice exam questions, excluding those involving audio, video, or image data. The assessment included 4 categories: general theory (professionalism and medical interviewing), symptomatology and clinical reasoning, physical examinations and clinical procedures, and specific diseases. Additionally, we categorized the questions into 7 specialty fields and 3 levels of difficulty, which were determined based on residents' correct response rates. RESULTS Upon examination of 137 GM-ITE questions in Japanese, GPT-4 scores were significantly higher than the mean scores of residents (residents: 55.8%, GPT-4: 70.1%; P<.001). In terms of specific disciplines, GPT-4 scored 23.5 points higher in "specific diseases," 30.9 points higher in "obstetrics and gynecology," and 26.1 points higher in "internal medicine." In contrast, GPT-4 scores in "medical interviewing and professionalism," "general practice," and "psychiatry" were lower than those of the residents, although this discrepancy was not statistically significant. Upon analyzing scores based on question difficulty, GPT-4 scores were 17.2 points lower for easy problems (P=.007) but were 25.4 and 24.4 points higher for normal and difficult problems, respectively (P<.001).
In year-on-year comparisons, GPT-4 scores were 21.7 and 21.5 points higher in the 2020 (P=.01) and 2022 (P=.003) examinations, respectively, but only 3.5 points higher in the 2021 examination (no significant difference). CONCLUSIONS Even in Japanese, GPT-4 outperformed the average medical resident on the GM-ITE, a test originally designed for residents. Specifically, GPT-4 tended to score higher on difficult questions with low resident correct response rates and on those demanding a more comprehensive understanding of diseases. However, GPT-4 scored comparatively lower on questions that residents could readily answer, such as those testing attitudes toward patients and professionalism, as well as those necessitating an understanding of context and communication. These findings highlight the strengths and limitations of artificial intelligence applications in medical education and practice.
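The headline comparison (GPT-4 at 70.1% vs a resident mean of 55.8% on 137 single-choice items) reduces to simple proportion arithmetic. The sketch below is illustrative only: the raw correct-answer counts are hypothetical, chosen solely so that the resulting percentages match those reported in the abstract.

```python
# Illustrative score arithmetic; only the 137-question total comes from the
# abstract. The correct-answer counts below are hypothetical, picked so the
# percentages reproduce the reported 70.1% (GPT-4) and 55.8% (residents).

def percent_correct(correct: float, total: int) -> float:
    """Percentage of correctly answered questions, rounded to one decimal."""
    return round(100 * correct / total, 1)

TOTAL_QUESTIONS = 137        # single-choice GM-ITE items analyzed
gpt4_correct = 96            # hypothetical count
resident_mean_correct = 76.5 # hypothetical mean count across residents

gpt4_score = percent_correct(gpt4_correct, TOTAL_QUESTIONS)
resident_score = percent_correct(resident_mean_correct, TOTAL_QUESTIONS)
difference = round(gpt4_score - resident_score, 1)  # points, not ratio
```

Reporting the gap in percentage points (here 14.3) rather than as a relative ratio matches how the abstract presents its per-category differences.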
Affiliation(s)
- Takashi Watari
- General Medicine Center, Shimane University Hospital, Izumo, Japan
- Department of Medicine, University of Michigan Medical School, Ann Arbor, MI, United States
- Medicine Service, VA Ann Arbor Healthcare System, Ann Arbor, MI, United States
- Soshi Takagi
- Faculty of Medicine, Shimane University, Izumo, Japan
- Kota Sakaguchi
- General Medicine Center, Shimane University Hospital, Izumo, Japan
- Yuji Nishizaki
- Division of Medical Education, Juntendo University School of Medicine, Tokyo, Japan
- Taro Shimizu
- Department of Diagnostic and Generalist Medicine, Dokkyo Medical University Hospital, Tochigi, Japan
- Yu Yamamoto
- Division of General Medicine, Center for Community Medicine, Jichi Medical University, Tochigi, Japan
- Yasuharu Tokuda
- Muribushi Okinawa Project for Teaching Hospitals, Okinawa, Japan
16
Thirunavukarasu AJ. How Can the Clinical Aptitude of AI Assistants Be Assayed? J Med Internet Res 2023; 25:e51603. [PMID: 38051572] [PMCID: PMC10731545] [DOI: 10.2196/51603]
Abstract
Large language models (LLMs) are exhibiting remarkable performance in clinical contexts, with exemplar results ranging from expert-level attainment in medical examination questions to superior accuracy and relevance when responding to patient queries compared to real doctors replying to queries on social media. The deployment of LLMs in conventional health care settings is yet to be reported, and there remains an open question as to what evidence should be required before such deployment is warranted. Early validation studies use unvalidated surrogate variables to represent clinical aptitude, and it may be necessary to conduct prospective randomized controlled trials to justify the use of an LLM for clinical advice or assistance, as potential pitfalls and pain points cannot be exhaustively predicted. This viewpoint states that as LLMs continue to revolutionize the field, there is an opportunity to improve the rigor of artificial intelligence (AI) research to reward innovation, conferring real benefits to real patients.
Affiliation(s)
- Arun James Thirunavukarasu
- Oxford University Clinical Academic Graduate School, University of Oxford, Oxford, United Kingdom
- School of Clinical Medicine, University of Cambridge, Cambridge, United Kingdom
17
Poivet R, Lopez Malet M, Pelachaud C, Auvray M. The influence of conversational agents' role and communication style on user experience. Front Psychol 2023; 14:1266186. [PMID: 38106384] [PMCID: PMC10722890] [DOI: 10.3389/fpsyg.2023.1266186]
Abstract
Conversational Agents (CAs) are characterized by their roles within a narrative and the communication style they adopt during conversations. Within computer games, users' evaluation of the narrative is influenced by their estimation of CAs' intelligence and believability. However, the impact of CAs' roles and communication styles on users' experience remains unclear. This research investigates such influence of CAs' roles and communication styles through a crime-solving textual game. Four different CAs were developed, and each was assigned a role of either witness or suspect and a communication style that could be either aggressive or cooperative. Communication styles were simulated through a Wizard of Oz method. Users' task was to interact, through real-time written exchanges, with the four CAs and then to identify the culprit, assess the certainty of their judgments, and rank the CAs based on their conversational preferences. In addition, users' experience was evaluated using perceptual measures (perceived intelligence and believability scales) and behavioral measures (including analysis of users' input length, input delay, and conversation length). The results revealed that users' evaluation of CAs' intelligence and believability was primarily influenced by CAs' roles. On the other hand, users' conversational behaviors were mainly influenced by CAs' communication styles. CAs' communication styles also significantly determined users' choice of the culprit and conversational preferences.
Affiliation(s)
- Remi Poivet
- Ubisoft Paris Studio, Paris, France
- Institut des Systèmes Intelligents et de Robotiques (ISIR), Sorbonne Université, Paris, France
- Catherine Pelachaud
- Institut des Systèmes Intelligents et de Robotiques (ISIR), Sorbonne Université, Paris, France
- Malika Auvray
- Institut des Systèmes Intelligents et de Robotiques (ISIR), Sorbonne Université, Paris, France
18
Dunn AG, Shih I, Ayre J, Spallek H. What generative AI means for trust in health communications. J Commun Healthc 2023; 16:385-388. [PMID: 37921509] [DOI: 10.1080/17538068.2023.2277489]
Abstract
Large language models are fundamental technologies used in interfaces like ChatGPT and are poised to change the way people access and make sense of health information. The speed of uptake and investment suggests that these will be transformative technologies, but it is not yet clear what the implications might be for health communications. In this viewpoint, we draw on research about the adoption of new information technologies to focus on the ways that generative artificial intelligence (AI) tools like large language models might change how health information is produced, what health information people see, how marketing and misinformation might be mixed with evidence, and what people trust. We conclude that transparency and explainability in this space must be carefully considered to avoid unanticipated consequences.
Affiliation(s)
- Adam G Dunn
- Faculty of Medicine and Health, Biomedical Informatics and Digital Health, School of Medical Sciences, The University of Sydney, Sydney, Australia
- Ivy Shih
- Media Office, The University of Sydney, Sydney, Australia
- Julie Ayre
- Faculty of Medicine and Health, Sydney Health Literacy Lab, Sydney School of Public Health, The University of Sydney, Sydney, Australia
- Heiko Spallek
- Faculty of Medicine and Health, Sydney Dental School, The University of Sydney, Sydney, Australia
19
Lakdawala N, Channa L, Gronbeck C, Lakdawala N, Weston G, Sloan B, Feng H. Assessing the Accuracy and Comprehensiveness of ChatGPT in Offering Clinical Guidance for Atopic Dermatitis and Acne Vulgaris. JMIR Dermatol 2023; 6:e50409. [PMID: 37962920] [PMCID: PMC10685272] [DOI: 10.2196/50409]
Affiliation(s)
- Nehal Lakdawala: University of Connecticut School of Medicine, Farmington, CT, United States
- Christian Gronbeck: Department of Dermatology, University of Connecticut Health Center, Farmington, CT, United States
- Nikita Lakdawala: The Ronald O. Perelman Department of Dermatology, New York University, New York, NY, United States
- Gillian Weston: Department of Dermatology, University of Connecticut Health Center, Farmington, CT, United States
- Brett Sloan: Department of Dermatology, University of Connecticut Health Center, Farmington, CT, United States
- Hao Feng: Department of Dermatology, University of Connecticut Health Center, Farmington, CT, United States
20
Brown A, Kumar AT, Melamed O, Ahmed I, Wang YH, Deza A, Morcos M, Zhu L, Maslej M, Minian N, Sujaya V, Wolff J, Doggett O, Iantorno M, Ratto M, Selby P, Rose J. A Motivational Interviewing Chatbot With Generative Reflections for Increasing Readiness to Quit Smoking: Iterative Development Study. JMIR Ment Health 2023; 10:e49132. [PMID: 37847539] [PMCID: PMC10618902] [DOI: 10.2196/49132]
Abstract
BACKGROUND The motivational interviewing (MI) approach has been shown to help move ambivalent smokers toward the decision to quit smoking. There have been several attempts to broaden access to MI through text-based chatbots. These typically use scripted responses to client statements, but such nonspecific responses have been shown to reduce effectiveness. Recent advances in natural language processing provide a new way to create responses that are specific to a client's statements, using a generative language model. OBJECTIVE This study aimed to design, evolve, and measure the effectiveness of a chatbot system that can guide ambivalent people who smoke toward the decision to quit smoking with MI-style generative reflections. METHODS Over time, 4 different MI chatbot versions were evolved, and each version was tested with a separate group of ambivalent smokers. A total of 349 smokers were recruited through a web-based recruitment platform. The first chatbot version only asked questions without reflections on the answers. The second version asked the questions and provided reflections with an initial version of the reflection generator. The third version used an improved reflection generator, and the fourth version added extended interaction on some of the questions. Participants' readiness to quit was measured before the conversation and 1 week later using an 11-point scale that measured 3 attributes related to smoking cessation: readiness, confidence, and importance. The number of quit attempts made in the week before the conversation and the week after was surveyed; in addition, participants rated the perceived empathy of the chatbot. The main body of the conversation consists of 5 scripted questions, responses from participants, and (for 3 of the 4 versions) generated reflections. A pretrained transformer-based neural network was fine-tuned on examples of high-quality reflections to generate MI reflections. 
RESULTS The increase in average confidence using the nongenerative version was 1.0 (SD 2.0; P=.001), whereas for the 3 generative versions, the increases ranged from 1.2 to 1.3 (SD 2.0-2.3; P<.001). The extended conversation with improved generative reflections was the only version associated with a significant increase in average importance (0.7, SD 2.0; P<.001) and readiness (0.4, SD 1.7; P=.01). The enhanced reflection and extended conversations exhibited significantly better perceived empathy than the nongenerative conversation (P=.02 and P=.004, respectively). The number of quit attempts did not significantly change between the week before the conversation and the week after across all 4 conversations. CONCLUSIONS The results suggest that generative reflections increase the impact of a conversation on readiness to quit smoking 1 week later, although a significant portion of the impact seen so far can be achieved by only asking questions without the reflections. These results support further evolution of the chatbot conversation and can serve as a basis for comparison against more advanced versions.
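The conversation structure described in this abstract (5 scripted questions, a participant response after each, and, in the generative versions, an MI-style reflection on that response) can be sketched as follows. This is an illustrative outline only, not the authors' implementation: the question wording and the `generate_reflection` stub are hypothetical stand-ins for the study's fine-tuned transformer-based reflection generator.

```python
# Sketch of the chatbot loop: scripted questions, client responses, and
# (optionally) a generated reflection after each response. Question texts
# and the reflection stub are hypothetical placeholders.

SCRIPTED_QUESTIONS = [
    "What do you like about smoking?",
    "What do you dislike about smoking?",
    "What might be better for you if you quit?",
    "What would make quitting difficult?",
    "Where does that leave you now?",
]

def generate_reflection(client_statement: str) -> str:
    """Stand-in for the fine-tuned generative model: restates the
    client's statement as a simple MI-style reflection."""
    return f"It sounds like {client_statement.strip().rstrip('.').lower()}."

def run_conversation(get_response, reflective: bool = True) -> list[str]:
    """Drive one conversation; get_response supplies participant input.
    With reflective=False this mirrors the first, question-only version."""
    transcript = []
    for question in SCRIPTED_QUESTIONS:
        transcript.append(f"BOT: {question}")
        answer = get_response(question)
        transcript.append(f"CLIENT: {answer}")
        if reflective:  # non-generative version skips reflections
            transcript.append(f"BOT: {generate_reflection(answer)}")
    return transcript

transcript = run_conversation(lambda q: "Smoking helps me relax")
```

In the study, the reflection step is where the four versions diverge (none, initial generator, improved generator, extended interaction); the loop itself stays fixed.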
Affiliation(s)
- Andrew Brown: The Edward S Rogers Sr Department of Electrical & Computer Engineering, University of Toronto, Toronto, ON, Canada
- Ash Tanuj Kumar: The Edward S Rogers Sr Department of Electrical & Computer Engineering, University of Toronto, Toronto, ON, Canada
- Osnat Melamed: INTREPID Lab, Centre for Addiction and Mental Health, Toronto, ON, Canada; Department of Family and Community Medicine, University of Toronto, Toronto, ON, Canada
- Imtihan Ahmed: The Edward S Rogers Sr Department of Electrical & Computer Engineering, University of Toronto, Toronto, ON, Canada
- Yu Hao Wang: The Edward S Rogers Sr Department of Electrical & Computer Engineering, University of Toronto, Toronto, ON, Canada
- Arnaud Deza: The Edward S Rogers Sr Department of Electrical & Computer Engineering, University of Toronto, Toronto, ON, Canada
- Marc Morcos: The Edward S Rogers Sr Department of Electrical & Computer Engineering, University of Toronto, Toronto, ON, Canada
- Leon Zhu: The Edward S Rogers Sr Department of Electrical & Computer Engineering, University of Toronto, Toronto, ON, Canada
- Marta Maslej: Krembil Centre for Neuroinformatics, Centre for Addiction and Mental Health, Toronto, ON, Canada
- Nadia Minian: INTREPID Lab, Centre for Addiction and Mental Health, Toronto, ON, Canada; Department of Family and Community Medicine, University of Toronto, Toronto, ON, Canada; Campbell Family Mental Health Research Institute, Centre for Addiction and Mental Health, Toronto, ON, Canada; Department of Pharmacology and Toxicology, University of Toronto, Toronto, ON, Canada; Institute of Medical Sciences, University of Toronto, Toronto, ON, Canada
- Vidya Sujaya: The Edward S Rogers Sr Department of Electrical & Computer Engineering, University of Toronto, Toronto, ON, Canada
- Jodi Wolff: INTREPID Lab, Centre for Addiction and Mental Health, Toronto, ON, Canada
- Olivia Doggett: Faculty of Information, University of Toronto, Toronto, ON, Canada
- Mathew Iantorno: Faculty of Information, University of Toronto, Toronto, ON, Canada
- Matt Ratto: Faculty of Information, University of Toronto, Toronto, ON, Canada
- Peter Selby: INTREPID Lab, Centre for Addiction and Mental Health, Toronto, ON, Canada; Department of Family and Community Medicine, University of Toronto, Toronto, ON, Canada; Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
- Jonathan Rose: The Edward S Rogers Sr Department of Electrical & Computer Engineering, University of Toronto, Toronto, ON, Canada; INTREPID Lab, Centre for Addiction and Mental Health, Toronto, ON, Canada
21
Moons P. For better or for worse: When chatbots influence human emotions and behaviours. Eur J Cardiovasc Nurs 2023:zvad098. [PMID: 37791604] [DOI: 10.1093/eurjcn/zvad098]
Affiliation(s)
- Philip Moons: KU Leuven Department of Public Health and Primary Care, KU Leuven - University of Leuven, Kapucijnenvoer 35 PB7001, 3000 Leuven, Belgium; Institute of Health and Care Sciences, University of Gothenburg, Arvid Wallgrens backe 1, 413 46 Gothenburg, Sweden; Department of Paediatrics and Child Health, University of Cape Town, Klipfontein Rd, Rondebosch, 7700 Cape Town, South Africa
22
Meskó B. Prompt Engineering as an Important Emerging Skill for Medical Professionals: Tutorial. J Med Internet Res 2023; 25:e50638. [PMID: 37792434] [PMCID: PMC10585440] [DOI: 10.2196/50638]
Abstract
Prompt engineering is a relatively new field of research that refers to the practice of designing, refining, and implementing prompts or instructions that guide the output of large language models (LLMs) to help in various tasks. With the emergence of LLMs, the most popular being ChatGPT, which attracted over 100 million users in only 2 months, artificial intelligence (AI), especially generative AI, has become accessible to the masses. This is an unprecedented paradigm shift, not only because the use of AI is becoming more widespread but also because of the possible implications of LLMs in health care. As more patients and medical professionals use AI-based tools, with LLMs the most popular representatives of that group, addressing the challenge of improving this skill seems inevitable. This paper summarizes the current state of research about prompt engineering and, at the same time, aims to provide practical recommendations for the wide range of health care professionals to improve their interactions with LLMs.
23
Wrightson-Hester AR, Anderson G, Dunstan J, McEvoy PM, Sutton CJ, Myers B, Egan S, Tai S, Johnston-Hollitt M, Chen W, Gedeon T, Mansell W. An Artificial Therapist (Manage Your Life Online) to Support the Mental Health of Youth: Co-Design and Case Series. JMIR Hum Factors 2023; 10:e46849. [PMID: 37477969] [PMCID: PMC10403793] [DOI: 10.2196/46849]
Abstract
BACKGROUND The prevalence of child and adolescent mental health issues is increasing faster than the number of services available, leading to a shortfall. Mental health chatbots are a highly scalable method to address this gap. Manage Your Life Online (MYLO) is an artificially intelligent chatbot that emulates the method of levels therapy. Method of levels is a therapy that uses curious questioning to support the sustained awareness and exploration of current problems. OBJECTIVE This study aimed to assess the feasibility and acceptability of a co-designed interface for MYLO in young people aged 16 to 24 years with mental health problems. METHODS An iterative co-design phase occurred over 4 months, in which feedback was elicited from a group of young people (n=7) with lived experiences of mental health issues. This resulted in the development of a progressive web application version of MYLO that could be used on mobile phones. We conducted a case series to assess the feasibility and acceptability of MYLO in 13 young people over 2 weeks. During this time, the participants tested MYLO and completed surveys including clinical outcomes and acceptability measures. We then conducted focus groups and interviews and used thematic analysis to obtain feedback on MYLO and identify recommendations for further improvements. RESULTS Most participants were positive about their experience of using MYLO and would recommend MYLO to others. The participants enjoyed the simplicity of the interface, found it easy to use, and rated it as acceptable using the System Usability Scale. Inspection of the use data found evidence that MYLO can learn and adapt its questioning in response to user input. We found a large effect size for the decrease in participants' problem-related distress and a medium effect size for the increase in their self-reported tendency to resolve goal conflicts (the proposed mechanism of change) in the testing phase. 
Some participants also experienced a reliable change in their clinical outcome measures over the 2 weeks. CONCLUSIONS We established the feasibility and acceptability of MYLO. The initial outcomes suggest that MYLO has the potential to support the mental health of young people and help them resolve their own problems. We aim to establish whether the use of MYLO leads to a meaningful reduction in participants' symptoms of depression and anxiety and whether these are maintained over time by conducting a randomized controlled evaluation trial.
Affiliation(s)
- Aimee-Rose Wrightson-Hester: Curtin enAble Institute, Faculty of Health Sciences, Curtin University, Perth, Australia; Discipline of Psychology, School of Population Health, Curtin University, Perth, Australia; School of Arts and Humanities, Edith Cowan University, Perth, Australia
- Joel Dunstan: Curtin Institute for Data Science, Curtin University, Perth, Australia
- Peter M McEvoy: Curtin enAble Institute, Faculty of Health Sciences, Curtin University, Perth, Australia; Discipline of Psychology, School of Population Health, Curtin University, Perth, Australia; Centre for Clinical Interventions, North Metropolitan Health Service, Nedlands, Australia
- Christopher J Sutton: Centre for Biostatistics, School of Health Sciences, The University of Manchester, Manchester, United Kingdom
- Bronwyn Myers: Curtin enAble Institute, Faculty of Health Sciences, Curtin University, Perth, Australia; Alcohol, Tobacco and Other Drug Research Unit, South African Medical Research Council, Parow, South Africa; Division of Addiction Psychiatry, Department of Psychiatry and Mental Health, University of Cape Town, Cape Town, South Africa
- Sarah Egan: Curtin enAble Institute, Faculty of Health Sciences, Curtin University, Perth, Australia; Discipline of Psychology, School of Population Health, Curtin University, Perth, Australia
- Sara Tai: Department of Clinical Psychology, School of Health Sciences, The University of Manchester, Manchester, United Kingdom
- Wai Chen: Curtin enAble Institute, Faculty of Health Sciences, Curtin University, Perth, Australia; Mental Health Service, Fiona Stanley Hospital, Perth, Australia; Curtin Medical School, Curtin University, Perth, Australia; Centre of Excellence in Medical Biotechnology, Faculty of Medical Science, Naresuan University, Phitsanulok, Thailand
- Tom Gedeon: Optus-Curtin Centre of Excellence in AI, School of Electronic Engineering, Computing and Mathematical Sciences, Curtin University, Perth, Australia
- Warren Mansell: Curtin enAble Institute, Faculty of Health Sciences, Curtin University, Perth, Australia; Discipline of Psychology, School of Population Health, Curtin University, Perth, Australia; Department of Clinical Psychology, School of Health Sciences, The University of Manchester, Manchester, United Kingdom
24
Mancone S, Diotaiuti P, Valente G, Corrado S, Bellizzi F, Vilarino GT, Andrade A. The Use of Voice Assistant for Psychological Assessment Elicits Empathy and Engagement While Maintaining Good Psychometric Properties. Behav Sci (Basel) 2023; 13:550. [PMID: 37503997] [PMCID: PMC10376154] [DOI: 10.3390/bs13070550]
Abstract
This study aimed to use the Alexa voice assistant to administer psychometric tests, assessing the efficiency and validity of this form of measurement. A total of 300 participants were administered the Interpersonal Reactivity Index (IRI). After a week, the administration was repeated, but the participants were randomly divided into three groups of 100 participants each. In the first group, the test was administered in a paper version; in the second, the questionnaire was read to the participants in person, and the operator recorded the answers the participants gave; in the third, the questionnaire was administered directly by the Alexa voice device, after specific reprogramming. The third group was also administered, as a post-session survey, the Engagement and Perceptions of the Bot Scale (EPVS), a short version of the Communication Styles Inventory (CSI), the Marlowe-Crowne Social Desirability Scale (MCSDS), and an additional six items measuring degrees of concentration, ease, and perceived pressure at the beginning and at the end of the administration. The results confirmed that the IRI maintained measurement invariance across the three conditions. Administration through the voice assistant showed an empathic activation effect significantly superior to the paper-and-pencil and operator-in-presence conditions. The results indicated engagement and a positive evaluation of the interactive experience, with reported perceptions of closeness, warmth, competence, and human-likeness associated with higher values of empathetic activation and lower values of personal discomfort.
Affiliation(s)
- Stefania Mancone: Department of Human Sciences, Society and Health, University of Cassino and Southern Lazio, 03043 Cassino, Italy
- Pierluigi Diotaiuti: Department of Human Sciences, Society and Health, University of Cassino and Southern Lazio, 03043 Cassino, Italy
- Giuseppe Valente: Department of Human Sciences, Society and Health, University of Cassino and Southern Lazio, 03043 Cassino, Italy
- Stefano Corrado: Department of Human Sciences, Society and Health, University of Cassino and Southern Lazio, 03043 Cassino, Italy
- Fernando Bellizzi: Department of Human Sciences, Society and Health, University of Cassino and Southern Lazio, 03043 Cassino, Italy
- Guilherme Torres Vilarino: Health and Sports Science Center, Department of Physical Education, Santa Catarina State University, Florianópolis 88035-901, Brazil
- Alexandro Andrade: Health and Sports Science Center, Department of Physical Education, Santa Catarina State University, Florianópolis 88035-901, Brazil
25
Walker HL, Ghani S, Kuemmerli C, Nebiker CA, Müller BP, Raptis DA, Staubli SM. Reliability of Medical Information Provided by ChatGPT: Assessment Against Clinical Guidelines and Patient Information Quality Instrument. J Med Internet Res 2023; 25:e47479. [PMID: 37389908] [PMCID: PMC10365578] [DOI: 10.2196/47479]
Abstract
BACKGROUND ChatGPT-4 is the latest release of a novel artificial intelligence (AI) chatbot able to answer freely formulated and complex questions. In the near future, ChatGPT could become the new standard for health care professionals and patients to access medical information. However, little is known about the quality of medical information provided by the AI. OBJECTIVE We aimed to assess the reliability of medical information provided by ChatGPT. METHODS Medical information provided by ChatGPT-4 on the 5 hepato-pancreatico-biliary (HPB) conditions with the highest global disease burden was measured with the Ensuring Quality Information for Patients (EQIP) tool. The EQIP tool is used to measure the quality of internet-available information and consists of 36 items that are divided into 3 subsections. In addition, 5 guideline recommendations per analyzed condition were rephrased as questions and input to ChatGPT, and agreement between the guidelines and the AI answer was measured by 2 authors independently. All queries were repeated 3 times to measure the internal consistency of ChatGPT. RESULTS Five conditions were identified (gallstone disease, pancreatitis, liver cirrhosis, pancreatic cancer, and hepatocellular carcinoma). The median EQIP score across all conditions was 16 (IQR 14.5-18) for the total of 36 items. Divided by subsection, median scores for content, identification, and structure data were 10 (IQR 9.5-12.5), 1 (IQR 1-1), and 4 (IQR 4-5), respectively. Agreement between guideline recommendations and answers provided by ChatGPT was 60% (15/25). Interrater agreement as measured by the Fleiss κ was 0.78 (P<.001), indicating substantial agreement. Internal consistency of the answers provided by ChatGPT was 100%. CONCLUSIONS ChatGPT provides medical information of comparable quality to available static internet information. 
Although currently of limited quality, large language models could become the future standard for patients and health care professionals to gather medical information.
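The interrater agreement statistic reported in this abstract (Fleiss κ) can be computed from a subjects-by-categories count matrix with a short function. The sketch below is a generic implementation of the standard formula, not the study's code, and the example matrices are illustrative; reproducing the reported κ of 0.78 would require the study's actual rating data.

```python
def fleiss_kappa(counts: list[list[int]]) -> float:
    """Fleiss' kappa for a subjects x categories matrix, where
    counts[i][j] is how many raters assigned subject i to category j.
    Assumes the same number of raters for every subject."""
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    n_total = n_subjects * n_raters

    # Overall proportion of assignments falling in each category.
    p_j = [sum(row[j] for row in counts) / n_total
           for j in range(len(counts[0]))]

    # Per-subject agreement: fraction of rater pairs that agree.
    p_i = [(sum(c * c for c in row) - n_raters)
           / (n_raters * (n_raters - 1)) for row in counts]

    p_bar = sum(p_i) / n_subjects   # mean observed agreement
    p_e = sum(p * p for p in p_j)   # agreement expected by chance
    return (p_bar - p_e) / (1 - p_e)

# Illustrative: 3 raters, 4 subjects, 2 categories, perfect agreement.
kappa = fleiss_kappa([[3, 0], [0, 3], [3, 0], [0, 3]])
```

Values near 1 indicate near-perfect agreement; the 0.61 to 0.80 band is conventionally read as "substantial agreement", which matches the interpretation given in the abstract.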
Affiliation(s)
- Shahi Ghani: Royal Free London NHS Foundation Trust, London, United Kingdom
- Christoph Kuemmerli: Clarunis - University Center for Gastrointestinal and Liver Diseases, Basel, Switzerland
- Beat Peter Müller: Clarunis - University Center for Gastrointestinal and Liver Diseases, Basel, Switzerland
- Dimitri Aristotle Raptis: Organ Transplant Center of Excellence, King Faisal Specialist Hospital & Research Centre, Riyadh, Saudi Arabia
- Sebastian Manuel Staubli: Royal Free London NHS Foundation Trust, London, United Kingdom; Clarunis - University Center for Gastrointestinal and Liver Diseases, Basel, Switzerland
26
Spagnolli A, Cenzato G, Gamberini L. Modeling the Conversation with Digital Health Assistants in Adherence Apps: Some Considerations on the Similarities and Differences with Familiar Medical Encounters. Int J Environ Res Public Health 2023; 20:6182. [PMID: 37372768] [DOI: 10.3390/ijerph20126182]
Abstract
Digital health assistants (DHAs) are conversational agents incorporated into health systems' interfaces, exploiting an intuitive interaction format appreciated by the users. At the same time, however, their conversational format can evoke interactional practices typical of health encounters with human doctors that might misguide the users. Awareness of the similarities and differences between novel mediated encounters and more familiar ones helps designers avoid unintended expectations and leverage suitable ones. Focusing on adherence apps, we analytically discuss the structure of DHA-patient encounters against the literature on physician-patient encounters and the specific affordances of DHAs. We synthesize our discussion into a design checklist and add some considerations about DHA with unconstrained natural language interfaces.
Affiliation(s)
- Anna Spagnolli: Department of General Psychology, University of Padua, 35131 Padua, Italy; Human Inspired Technologies Research Centre, University of Padua, 35131 Padua, Italy
- Giulia Cenzato: Department of General Psychology, University of Padua, 35131 Padua, Italy; Human Inspired Technologies Research Centre, University of Padua, 35131 Padua, Italy
- Luciano Gamberini: Department of General Psychology, University of Padua, 35131 Padua, Italy; Human Inspired Technologies Research Centre, University of Padua, 35131 Padua, Italy
27
van der Schyff EL, Ridout B, Amon KL, Forsyth R, Campbell AJ. Providing Self-Led Mental Health Support Through an Artificial Intelligence-Powered Chat Bot (Leora) to Meet the Demand of Mental Health Care. J Med Internet Res 2023; 25:e46448. [PMID: 37335608] [DOI: 10.2196/46448]
Abstract
Digital mental health services are becoming increasingly valuable for addressing the global public health burden of mental ill-health. There is significant demand for scalable and effective web-based mental health services. Artificial intelligence (AI) has the potential to improve mental health through the deployment of chatbots. These chatbots can provide round-the-clock support and triage individuals who are reluctant to access traditional health care due to stigma. The aim of this viewpoint paper is to consider the feasibility of AI-powered platforms to support mental well-being. The Leora model is considered a model with the potential to provide mental health support. Leora is a conversational agent that uses AI to engage in conversations with users about their mental health and provide support for minimal-to-mild symptoms of anxiety and depression. The tool is designed to be accessible, personalized, and discreet, offering strategies for promoting well-being and acting as a web-based self-care coach. Across all AI-powered mental health services, there are several challenges in the ethical development and deployment of AI in mental health treatment, including trust and transparency, bias and health inequity, and the potential for negative consequences. To ensure the effective and ethical use of AI in mental health care, researchers must carefully consider these challenges and engage with key stakeholders to provide high-quality mental health support. Validation of the Leora platform through rigorous user testing will be the next step in ensuring the model is effective.
Affiliation(s)
- Emma L van der Schyff: Cyberpsychology Research Group, Biomedical Informatics and Digital Health Theme, School of Medical Sciences, The University of Sydney, Sydney, Australia
- Brad Ridout: Cyberpsychology Research Group, Biomedical Informatics and Digital Health Theme, School of Medical Sciences, The University of Sydney, Sydney, Australia
- Krestina L Amon: Cyberpsychology Research Group, Biomedical Informatics and Digital Health Theme, School of Medical Sciences, The University of Sydney, Sydney, Australia
- Rowena Forsyth: Cyberpsychology Research Group, Biomedical Informatics and Digital Health Theme, School of Medical Sciences, The University of Sydney, Sydney, Australia
- Andrew J Campbell: Cyberpsychology Research Group, Biomedical Informatics and Digital Health Theme, School of Medical Sciences, The University of Sydney, Sydney, Australia
28
Grodniewicz JP, Hohol M. Waiting for a digital therapist: three challenges on the path to psychotherapy delivered by artificial intelligence. Front Psychiatry 2023; 14:1190084. [PMID: 37324824] [PMCID: PMC10267322] [DOI: 10.3389/fpsyt.2023.1190084]
Abstract
Growing demand for broadly accessible mental health care, together with the rapid development of new technologies, triggers discussions about the feasibility of psychotherapeutic interventions based on interactions with Conversational Artificial Intelligence (CAI). Many authors argue that while currently available CAI can be a useful supplement to human-delivered psychotherapy, it is not yet capable of delivering fully fledged psychotherapy on its own. The goal of this paper is to investigate the most important obstacles on the way to developing CAI systems capable of delivering psychotherapy in the future. To this end, we formulate and discuss three challenges central to this quest. First, we might not be able to develop effective AI-based psychotherapy unless we deepen our understanding of what makes human-delivered psychotherapy effective. Second, assuming that it requires building a therapeutic relationship, it is not clear whether psychotherapy can be delivered by non-human agents. Third, conducting psychotherapy might be a problem too complicated for narrow AI, i.e., AI proficient in dealing with only relatively simple and well-delineated tasks. If this is the case, we should not expect CAI to be capable of delivering fully fledged psychotherapy until so-called "general" or "human-like" AI is developed. While we believe that all these challenges can ultimately be overcome, we think that being mindful of them is crucial to ensure well-balanced and steady progress on our path to AI-based psychotherapy.
29
Su T, Calvo RA, Jouaiti M, Daniels S, Kirby P, Dijk DJ, Della Monica C, Vaidyanathan R. Assessing a Sleep Interviewing Chatbot to Improve Subjective and Objective Sleep: Protocol for an Observational Feasibility Study. JMIR Res Protoc 2023; 12:e45752. [PMID: 37166964] [DOI: 10.2196/45752]
Abstract
BACKGROUND Sleep disorders are common among the aging population and people with neurodegenerative diseases. Sleep disorders have a strong bidirectional relationship with neurodegenerative diseases, where they accelerate and worsen one another. Although one-to-one individual cognitive behavioral interventions (conducted in-person or on the internet) have shown promise for significant improvements in sleep efficiency among adults, many may experience difficulties accessing interventions with sleep specialists, psychiatrists, or psychologists. Therefore, delivering sleep intervention through an automated chatbot platform may be an effective strategy to increase the accessibility and reach of sleep disorder intervention among the aging population and people with neurodegenerative diseases. OBJECTIVE This work aims to (1) determine the feasibility and usability of an automated chatbot (named MotivSleep) that conducts sleep interviews to encourage the aging population to report behaviors that may affect their sleep, followed by providing personalized recommendations for better sleep based on participants' self-reported behaviors; (2) assess the self-reported sleep assessment changes before, during, and after using our automated sleep disturbance intervention chatbot; (3) assess the changes in objective sleep assessment recorded by a sleep tracking device before, during, and after using the automated chatbot MotivSleep. METHODS We will recruit 30 older adult participants from West London for this pilot study. Each participant will have a sleep analyzer installed under their mattress. This contactless sleep monitoring device passively records movements, heart rate, and breathing rate while participants are in bed. 
In addition, each participant will use our proposed chatbot MotivSleep, accessible on WhatsApp, to describe their sleep and behaviors related to their sleep and receive personalized recommendations for better sleep tailored to their specific reasons for disrupted sleep. We will analyze questionnaire responses before and after the study to assess their perception of our proposed chatbot; questionnaire responses before, during, and after the study to assess their subjective sleep quality changes; and sleep parameters recorded by the sleep analyzer throughout the study to assess their objective sleep quality changes. RESULTS Recruitment will begin in May 2023 through UK Dementia Research Institute Care Research and Technology Centre organized community outreach. Data collection will run from May 2023 until December 2023. We hypothesize that participants will perceive our proposed chatbot as intelligent and trustworthy; we also hypothesize that our proposed chatbot can help improve participants' subjective and objective sleep assessment throughout the study. CONCLUSIONS The MotivSleep automated chatbot has the potential to provide additional care to older adults who wish to improve their sleep in more accessible and less costly ways than conventional face-to-face therapy. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID) PRR1-10.2196/45752.
Affiliation(s)
- Ting Su: Department of Mechanical Engineering, Imperial College London, London, United Kingdom; Dyson School of Design Engineering, Imperial College London, London, United Kingdom; Care Research & Technology Centre, UK Dementia Research Institute, London, United Kingdom
- Rafael A Calvo: Dyson School of Design Engineering, Imperial College London, London, United Kingdom
- Melanie Jouaiti: Department of Mechanical Engineering, Imperial College London, London, United Kingdom; Care Research & Technology Centre, UK Dementia Research Institute, London, United Kingdom
- Sarah Daniels: Care Research & Technology Centre, UK Dementia Research Institute, London, United Kingdom; Department of Brain Sciences, Imperial College London, London, United Kingdom
- Pippa Kirby: Care Research & Technology Centre, UK Dementia Research Institute, London, United Kingdom; Department of Brain Sciences, Imperial College London, London, United Kingdom; Department of Therapies, Imperial College Healthcare NHS Trust, London, United Kingdom
- Derk-Jan Dijk: Care Research & Technology Centre, UK Dementia Research Institute, London, United Kingdom; Surrey Sleep Research Centre, School of Biosciences, University of Surrey, Guildford, United Kingdom
- Ciro Della Monica: Care Research & Technology Centre, UK Dementia Research Institute, London, United Kingdom; Surrey Sleep Research Centre, School of Biosciences, University of Surrey, Guildford, United Kingdom
- Ravi Vaidyanathan: Department of Mechanical Engineering, Imperial College London, London, United Kingdom; Care Research & Technology Centre, UK Dementia Research Institute, London, United Kingdom
30
Sabour S, Zhang W, Xiao X, Zhang Y, Zheng Y, Wen J, Zhao J, Huang M. A chatbot for mental health support: exploring the impact of Emohaa on reducing mental distress in China. Front Digit Health 2023; 5:1133987. PMID: 37214342; PMCID: PMC10193040; DOI: 10.3389/fdgth.2023.1133987.
Abstract
Introduction The growing demand for mental health support has highlighted the importance of conversational agents as human supporters worldwide, including in China. These agents could increase the availability and reduce the relative costs of mental health support. The support provided can be divided into two main types: cognitive and emotional. Existing work on this topic mainly focuses on constructing agents that adopt Cognitive Behavioral Therapy (CBT) principles; such agents operate on predefined templates and exercises to provide cognitive support. However, research on emotional support using such agents is limited. In addition, most existing agents operate in English, highlighting the importance of conducting such studies in China. To this end, we introduce Emohaa, a conversational agent that provides cognitive support through CBT-Bot exercises and guided conversations, and emotional support through ES-Bot, which enables users to vent their emotional problems. In this study, we analyze the effectiveness of Emohaa in reducing symptoms of mental distress. Methods and Results Following a randomized controlled trial (RCT) design, the study randomly assigned participants to three groups: Emohaa (CBT-Bot), Emohaa (Full), and control. In both intention-to-treat (N=247) and per-protocol (N=134) analyses, the results demonstrated that, compared with the control group, participants who used either version of Emohaa experienced significantly greater improvements in symptoms of mental distress, including depression (F[2,244]=6.26, p=0.002), negative affect (F[2,244]=6.09, p=0.003), and insomnia (F[2,244]=3.69, p=0.026). Discussion Based on these results and participants' satisfaction with the platform, we conclude that Emohaa is a practical and effective tool for reducing mental distress.
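For readers unfamiliar with the statistic, the F[2,244] values above come from F tests comparing the three trial arms. A minimal pure-Python sketch of a one-way ANOVA F statistic, using invented placeholder change scores rather than the study's data:

```python
# Minimal one-way ANOVA sketch (illustrative only; data below are invented).
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of numeric samples."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group sum of squares: distance of each group mean from the grand mean.
    ss_between = sum(len(g) * ((sum(g) / len(g)) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their group mean.
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Three hypothetical arms (control, CBT-Bot, Full) of pre-to-post change scores.
control = [0.1, -0.2, 0.3, 0.0, 0.2]
cbt_bot = [-1.1, -0.8, -1.4, -0.9, -1.2]
full = [-1.5, -1.0, -1.8, -1.3, -1.6]
f_stat, df_b, df_w = one_way_anova([control, cbt_bot, full])
```

With 3 groups and 247 participants, the degrees of freedom are 2 and 244, matching the F[2,244] notation in the abstract.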
Affiliation(s)
- Sahand Sabour: The CoAI Group, DCST, Institute for Artificial Intelligence, State Key Lab of Intelligent Technology and Systems, Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing, China
- Wen Zhang: Department of Psychology, Beijing Normal University, Beijing, China
- Xiyao Xiao: Department of Research and Development, Beijing Lingxin Intelligent Technology Co., Ltd, Beijing, China
- Yuwei Zhang: Department of Research and Development, Beijing Lingxin Intelligent Technology Co., Ltd, Beijing, China
- Yinhe Zheng: Department of Research and Development, Beijing Lingxin Intelligent Technology Co., Ltd, Beijing, China
- Jiaxin Wen: The CoAI Group, DCST, Institute for Artificial Intelligence, State Key Lab of Intelligent Technology and Systems, Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing, China
- Jialu Zhao: Center for Counseling and Psychological Development Guidance Center, Tsinghua University, Beijing, China
- Minlie Huang: The CoAI Group, DCST, Institute for Artificial Intelligence, State Key Lab of Intelligent Technology and Systems, Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing, China; Department of Research and Development, Beijing Lingxin Intelligent Technology Co., Ltd, Beijing, China
31
Tanaka H, Saga T, Iwauchi K, Honda M, Morimoto T, Matsuda Y, Uratani M, Okazaki K, Nakamura S. The Validation of Automated Social Skills Training in Members of the General Population Over 4 Weeks: Comparative Study. JMIR Form Res 2023; 7:e44857. PMID: 37103996; PMCID: PMC10176127; DOI: 10.2196/44857.
Abstract
BACKGROUND Social skills training by human trainers is a well-established method of teaching appropriate social and communication skills and strengthening social self-efficacy. Specifically, human social skills training is a fundamental approach to teaching and learning the rules of social interaction. However, it is cost-ineffective and offers low accessibility, since the number of professional trainers is limited. A conversational agent is a system that can communicate with a human being in natural language. We proposed to overcome the limitations of current social skills training with conversational agents. Our system is capable of speech recognition, response selection, and speech synthesis and can also generate nonverbal behaviors. We developed a system that incorporates automated social skills training that fully adheres to the training model of Bellack et al through a conversational agent. OBJECTIVE This study aimed to validate the training effect of a conversational agent-based social skills training system in members of the general population over a 4-week training session. We compared 2 groups (with and without training), hypothesizing that the trained group's social skills would improve. Furthermore, this study sought to clarify the effect size for future larger-scale evaluations, including much larger groups with different social pathological phenomena. METHODS For the experiment, 26 healthy Japanese participants were separated into 2 groups, where we hypothesized that group 1 (system trained) would improve more than group 2 (nontrained). System training was delivered as a 4-week intervention in which participants visited the examination room every week. Each training session included social skills training with a conversational agent for 3 basic skills. We evaluated the training effect using questionnaires in pre- and posttraining evaluations.
In addition to the questionnaires, we conducted a performance test that required the social cognition and expression of participants in new role-play scenarios. Third-party trainers provided blind ratings by watching recorded role-play videos. A nonparametric Wilcoxon rank sum test was performed for each variable. Improvement between pre- and posttraining evaluations was used to compare the 2 groups, and we compared the statistical significance of the questionnaires and ratings between the 2 groups. RESULTS Of the 26 recruited participants, 18 completed this experiment: 9 in group 1 and 9 in group 2. Those in group 1 achieved significant improvement in generalized self-efficacy (P=.02; effect size r=0.53). We also found a significant decrease in state anxiety presence (P=.04; r=0.49), measured by the State-Trait Anxiety Inventory (STAI). In ratings by third-party trainers, speech clarity was significantly strengthened in group 1 (P=.03; r=0.30). CONCLUSIONS Our findings reveal the usefulness of automated social skills training after a 4-week training period. This study confirms a large effect size between groups on generalized self-efficacy, state anxiety presence, and speech clarity.
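The group comparison above uses a Wilcoxon rank sum test with an effect size of the form r = |z| / sqrt(N). A minimal sketch using the normal approximation, with invented improvement scores (not the study's ratings) and the tie correction to the variance omitted for brevity:

```python
import math

# Illustrative Wilcoxon rank sum sketch (data are invented, not the study's).
def rank_sum_z(x, y):
    """Normal-approximation z statistic for the Wilcoxon rank sum test."""
    pooled = sorted(x + y)
    # Assign average ranks to tied values (ranks are 1-based).
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    w = sum(ranks[v] for v in x)  # rank sum of the first sample
    n1, n2 = len(x), len(y)
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # tie correction omitted
    return (w - mu) / sigma

improve_trained = [4, 5, 3, 6, 5, 4, 7, 5, 6]  # group 1 improvements (hypothetical)
improve_control = [1, 2, 0, 1, 3, 2, 1, 0, 2]  # group 2 improvements (hypothetical)
z = rank_sum_z(improve_trained, improve_control)
r = abs(z) / math.sqrt(len(improve_trained) + len(improve_control))
```

An r of about 0.5, as reported for generalized self-efficacy, is conventionally interpreted as a large effect.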
Affiliation(s)
- Hiroki Tanaka: Nara Institute of Science and Technology, Ikoma, Japan
- Takeshi Saga: Nara Institute of Science and Technology, Ikoma, Japan
- Kota Iwauchi: Nara Institute of Science and Technology, Ikoma, Japan
- Masato Honda: Department of Psychiatry, Nara Medical University, Kashihara, Japan
- Tsubasa Morimoto: Department of Psychiatry, Nara Medical University, Kashihara, Japan
- Kosuke Okazaki: Department of Psychiatry, Nara Medical University, Kashihara, Japan
32
Ahmed A, Hassan A, Aziz S, Abd-Alrazaq AA, Ali N, Alzubaidi M, Al-Thani D, Elhusein B, Siddig MA, Ahmed M, Househ M. Chatbot features for anxiety and depression: A scoping review. Health Informatics J 2023; 29:14604582221146719. PMID: 36693014; DOI: 10.1177/14604582221146719.
Abstract
Chatbots can provide valuable support to patients in assessing and guiding the management of various health problems, particularly when human resources are scarce. Chatbots can be affordable and efficient on-demand virtual assistants for mental health conditions, including anxiety and depression. We reviewed the features of chatbots available for anxiety or depression. Six bibliographic databases were searched, including backward and forward reference list checking. The initial search returned 1302 citations. After filtering, 42 studies remained, forming the final dataset for this scoping review. Most of the studies were conference proceedings (62%, 26/42), followed by journal articles (26%, 11/42), reports (7%, 3/42), and book chapters (5%, 2/42). More than half of the reviewed chatbots had functionality targeting both anxiety and depression (60%, 25/42), whereas 38% (16/42) targeted only depression, 38% (16/42) targeted anxiety, and the remainder addressed other mental health issues along with anxiety and depression. Avatars or fictional characters were rarely used in these studies (26%, 11/42) despite their increasing popularity. Mental health chatbots could help patients with anxiety and depression and provide valuable support to mental healthcare workers, particularly when resources are scarce, with real-time personal virtual assistance filling this gap. Their role in mental health care is expected to increase.
Affiliation(s)
- Arfan Ahmed: AI Center for Precision Health, Weill Cornell Medicine-Qatar, Doha, Qatar
- Asmaa Hassan: College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar
- Sarah Aziz: AI Center for Precision Health, Weill Cornell Medicine-Qatar, Doha, Qatar
- Alaa A Abd-Alrazaq: AI Center for Precision Health, Weill Cornell Medicine-Qatar, Doha, Qatar
- Nashva Ali: College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar
- Mahmood Alzubaidi: College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar
- Dena Al-Thani: College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar
- Bushra Elhusein: Mental Health Services, Hamad Medical Corporation, Doha, Qatar
- Maram Ahmed: Mental Health Services, Hamad Medical Corporation, Doha, Qatar
- Mowafa Househ: College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar
33
Leung T, Musiello F, Keter AK, Barnabas R, van Heerden A. The Feasibility and Acceptability of an mHealth Conversational Agent Designed to Support HIV Self-testing in South Africa: Cross-sectional Study. J Med Internet Res 2022; 24:e39816. PMID: 36508248; PMCID: PMC9793294; DOI: 10.2196/39816.
Abstract
BACKGROUND HIV testing rates in sub-Saharan Africa remain below the targeted threshold, and primary care facilities struggle to provide adequate services. Innovative approaches that leverage digital technologies could improve HIV testing and access to treatment. OBJECTIVE This study aimed to examine the feasibility and acceptability of Nolwazi_bot, an isiZulu-speaking conversational agent designed to support HIV self-testing (HIVST) in KwaZulu-Natal, South Africa. METHODS Nolwazi_bot was designed with 4 different personalities that users could choose when selecting a counselor for their HIVST session. We recruited a convenience sample of 120 consenting adults and invited them to undertake an HIV self-test facilitated by Nolwazi_bot. After testing, participants completed an interviewer-led posttest structured survey to assess their experience with the chatbot-supported HIVST. RESULTS Participants (N=120) ranged in age from 18 to 47 years, and half of them were men (61/120, 50.8%). Of the 120 participants, 111 (92.5%) had tested with a human counselor more than once, and 45 (37.5%) chose to be counseled by the female Nolwazi_bot personality aged between 18 and 25 years. Approximately one-fifth (21/120, 17.5%) of the participants who underwent a chatbot-guided HIV self-test tested positive. Most participants (95/120, 79.2%) indicated that their HIV testing experience with the chatbot was much better than with a human counselor. Many participants (93/120, 77.5%) reported that they felt as if they were talking to a real person, stating that the response tone and word choice of Nolwazi_bot reminded them of how they speak in daily conversations. CONCLUSIONS The study provides insights into the potential of digital technology interventions to support HIVST in low-income and middle-income countries. Although the full benefits of mobile health remain to be seen, technological interventions such as conversational agents or chatbots offer an excellent opportunity to improve HIVST by addressing the barriers associated with clinic-based HIV testing.
Affiliation(s)
- Franco Musiello: Centre for Community Based Research, Human Sciences Research Council, Pietermaritzburg, South Africa
- Alfred Kipyegon Keter: Centre for Community Based Research, Human Sciences Research Council, Pietermaritzburg, South Africa; Department of Clinical Sciences, Institute of Tropical Medicine Antwerp, Antwerp, Belgium; Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium
- Ruanne Barnabas: Department of Medicine, Harvard Medical School, Boston, MA, United States; Division of Infectious Diseases, Massachusetts General Hospital, Boston, MA, United States
- Alastair van Heerden: Centre for Community Based Research, Human Sciences Research Council, Pietermaritzburg, South Africa; South African Medical Research Council/WITS Developmental Pathways for Health Research Unit, Department of Paediatrics, School of Clinical Medicine, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
34
Wilson L, Marasoiu M. The Development and Use of Chatbots in Public Health: Scoping Review. JMIR Hum Factors 2022; 9:e35882. PMID: 36197708; PMCID: PMC9536768; DOI: 10.2196/35882.
Abstract
Background Chatbots are computer programs that present a conversation-like interface through which people can access information and services. The COVID-19 pandemic has driven a substantial increase in the use of chatbots to support and complement traditional health care systems. However, despite the uptake in their use, evidence to support the development and deployment of chatbots in public health remains limited. Recent reviews have focused on the use of chatbots during the COVID-19 pandemic and the use of conversational agents in health care more generally. This paper complements this research and addresses a gap in the literature by assessing the breadth and scope of research evidence for the use of chatbots across the domain of public health. Objective This scoping review had 3 main objectives: (1) to identify the application domains in public health in which there is the most evidence for the development and use of chatbots; (2) to identify the types of chatbots that are being deployed in these domains; and (3) to ascertain the methods and methodologies by which chatbots are being evaluated in public health applications. This paper explored the implications for future research on the development and deployment of chatbots in public health in light of the analysis of the evidence for their use. Methods Following the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) guidelines for scoping reviews, relevant studies were identified through searches conducted in the MEDLINE, PubMed, Scopus, Cochrane Central Register of Controlled Trials, IEEE Xplore, ACM Digital Library, and Open Grey databases from mid-June to August 2021. Studies were included if they used or evaluated chatbots for the purpose of prevention or intervention and for which the evidence showed a demonstrable health impact. Results Of the 1506 studies identified, 32 were included in the review. 
The results show a substantial increase in interest in chatbots in the past few years, beginning shortly before the pandemic. Half (16/32, 50%) of the research evaluated chatbots applied to mental health or COVID-19. The studies suggest promise in the application of chatbots, especially for easily automated and repetitive tasks, but overall, the evidence for the efficacy of chatbots for prevention and intervention across all domains is currently limited. Conclusions More research is needed to fully understand the effectiveness of using chatbots in public health. Concerns about the clinical, legal, and ethical aspects of using chatbots for health care are well founded given the speed with which they have been adopted in practice. Future research on their use should address these concerns through the development of expertise and best practices specific to public health, including a greater focus on user experience.
Affiliation(s)
- Lee Wilson: Centre for Policy Futures, University of Queensland, St Lucia, Queensland, Australia
- Mariana Marasoiu: Department of Computer Science and Technology, University of Cambridge, Cambridge, United Kingdom
35
Rathnayaka P, Mills N, Burnett D, De Silva D, Alahakoon D, Gray R. A Mental Health Chatbot with Cognitive Skills for Personalised Behavioural Activation and Remote Health Monitoring. Sensors (Basel) 2022; 22:3653. PMID: 35632061; PMCID: PMC9148050; DOI: 10.3390/s22103653.
Abstract
Mental health issues are at the forefront of healthcare challenges facing contemporary human society. These issues are most prevalent among working-age people, impacting negatively on the individual, their family, workplace, community, and the economy. Conventional mental healthcare services, although highly effective, cannot be scaled up to address the increasing demand from affected individuals, as evidenced in the first two years of the COVID-19 pandemic. Conversational agents, or chatbots, are a recent technological innovation that has been successfully adapted for mental healthcare as scalable, cross-platform smartphone applications providing first-level support for such individuals. Despite this, mental health chatbots in the extant literature and practice are limited in the therapy provided and the level of personalisation. For instance, most chatbots extend Cognitive Behavioural Therapy (CBT) into predefined conversational pathways that are generic and ineffective in recurrent use. In this paper, we postulate that Behavioural Activation (BA) therapy and Artificial Intelligence (AI) are more effectively materialised in a chatbot setting to provide recurrent emotional support, personalised assistance, and remote mental health monitoring. We present the design and development of our BA-based AI chatbot, followed by its participatory evaluation in a pilot study setting that confirmed its effectiveness in providing support for individuals with mental health issues.
36
Ollier J, Nißen M, von Wangenheim F. The Terms of "You(s)": How the Term of Address Used by Conversational Agents Influences User Evaluations in French and German Linguaculture. Front Public Health 2022; 9:691595. PMID: 35071147; PMCID: PMC8767023; DOI: 10.3389/fpubh.2021.691595.
Abstract
Background: Conversational agents (CAs) are a novel approach to delivering digital health interventions. In human interactions, terms of address often change depending on the context or relationship between interlocutors. In many languages, this encompasses T/V distinction—formal and informal forms of the second-person pronoun “You”—that conveys different levels of familiarity. Yet, few research articles have examined whether CAs' use of T/V distinction across language contexts affects users' evaluations of digital health applications. Methods: In an online experiment (N = 284), we manipulated a public health CA prototype to use either informal or formal T/V distinction forms in French (“tu” vs. “vous”) and German (“du” vs. “Sie”) language settings. A MANCOVA and post-hoc tests were performed to examine the effects of the independent variables (i.e., T/V distinction and Language) and the moderating role of users' demographic profile (i.e., Age and Gender) on eleven user evaluation variables. These were related to four themes: (i) Sociability, (ii) CA-User Collaboration, (iii) Service Evaluation, and (iv) Behavioral Intentions. Results: Results showed a four-way interaction between T/V Distinction, Language, Age, and Gender, influencing user evaluations across all outcome themes. For French speakers, when the informal “T form” (“Tu”) was used, higher user evaluation scores were generated for younger women and older men (e.g., the CA felt more humanlike or individuals were more likely to recommend the CA), whereas when the formal “V form” (“Vous”) was used, higher user evaluation scores were generated for younger men and older women. For German speakers, when the informal T form (“Du”) was used, younger users' evaluations were comparable regardless of Gender, however, as individuals' Age increased, the use of “Du” resulted in lower user evaluation scores, with this effect more pronounced in men. 
When the formal V form (“Sie”) was used, user evaluation scores were relatively stable regardless of Gender, increasing only slightly with Age. Conclusions: The results highlight how user evaluations of CAs vary with the T/V distinction used and the language setting, and show that even within a culturally homogeneous language group, evaluations vary with user demographics, underscoring the importance of personalizing CA language.
Affiliation(s)
- Joseph Ollier: Chair of Technology Marketing, Department of Management, Economics and Technology (D-MTEC), ETH Zürich, Zurich, Switzerland; Centre for Digital Health Interventions (CDHI), Department of Management, Economics and Technology (D-MTEC), ETH Zürich, Zurich, Switzerland
- Marcia Nißen: Chair of Technology Marketing, Department of Management, Economics and Technology (D-MTEC), ETH Zürich, Zurich, Switzerland; Centre for Digital Health Interventions (CDHI), Department of Management, Economics and Technology (D-MTEC), ETH Zürich, Zurich, Switzerland
- Florian von Wangenheim: Chair of Technology Marketing, Department of Management, Economics and Technology (D-MTEC), ETH Zürich, Zurich, Switzerland; Centre for Digital Health Interventions (CDHI), Department of Management, Economics and Technology (D-MTEC), ETH Zürich, Zurich, Switzerland
37
Mitchell EG, Elhadad N, Mamykina L. Examining AI Methods for Micro-Coaching Dialogs. Proc SIGCHI Conf Hum Factor Comput Syst 2022; 2022:440. PMID: 36454205; PMCID: PMC9707294; DOI: 10.1145/3491102.3501886.
Abstract
Conversational interaction, for example through chatbots, is well suited to enabling automated health coaching tools that support self-management and prevention of chronic diseases. However, chatbots in health are predominantly scripted or rule-based, which can result in a stagnant and repetitive user experience, in contrast with more dynamic, data-driven chatbots in other domains. Consequently, little is known about the tradeoffs of pursuing data-driven approaches for health chatbots. We examined multiple artificial intelligence (AI) approaches to enable micro-coaching dialogs in nutrition (brief coaching conversations related to specific meals that support achievement of nutrition goals) and compared reinforcement learning (RL), rule-based, and scripted approaches for dialog management. While the data-driven RL chatbot produced shorter, more efficient dialogs, surprisingly, the simplest, scripted chatbot was rated as higher quality, despite not fulfilling its task as consistently. These results highlight tensions between scripted and more complex, data-driven approaches for chatbots in health.
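The scripted-versus-rule-based distinction the abstract draws can be made concrete with a small, hypothetical sketch (the prompt text and rules below are invented for illustration; they are not the paper's implementation):

```python
# Hypothetical contrast between scripted and rule-based dialog management.
# A scripted policy ignores dialog state; a rule-based policy branches on it.

SCRIPT = [
    "How was your meal?",
    "Did it meet your carb goal?",
    "Any plans to adjust dinner?",
]

def scripted_next(turn_index):
    """Scripted policy: fixed prompt order, regardless of what the user said."""
    return SCRIPT[turn_index] if turn_index < len(SCRIPT) else None

RULES = [
    # (condition on dialog state, prompt to emit) - first match wins.
    (lambda state: state.get("carbs_over_goal"),
     "What made it hard to stay under your carb goal?"),
    (lambda state: not state.get("meal_logged"),
     "Could you tell me what you ate?"),
    (lambda state: True,
     "Nice work! Anything you'd change next time?"),
]

def rule_based_next(state):
    """Rule-based policy: the first rule whose condition matches the state fires."""
    for condition, prompt in RULES:
        if condition(state):
            return prompt
    return None
```

For example, `rule_based_next({"meal_logged": False})` asks for the missing meal log, while the scripted policy would ask the same fixed question at that turn no matter what; an RL policy would instead learn which prompt to pick from reward signals.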
Affiliation(s)
- Elliot G Mitchell: Columbia University, Department of Biomedical Informatics, New York, New York; Geisinger, Steele Institute for Health Innovation, Danville, Pennsylvania
- Noémie Elhadad: Columbia University, Department of Biomedical Informatics, New York, New York
- Lena Mamykina: Columbia University, Department of Biomedical Informatics, New York, New York
38
Ahmed A, Aziz S, Khalifa M, Shah U, Hassan A, Abd-Alrazaq A, Househ M. Thematic Analysis on User Reviews for Depression and Anxiety Chatbot Apps: Machine Learning Approach. JMIR Form Res 2022; 6:e27654. PMID: 35275069; PMCID: PMC8956988; DOI: 10.2196/27654.
Abstract
BACKGROUND Anxiety and depression are among the most prevalent mental health disorders worldwide. Chatbot apps can play an important role in relieving anxiety and depression, and users' reviews of these apps are an important source of data for exploring user opinions and satisfaction. OBJECTIVE This study aims to explore users' opinions, satisfaction, and attitudes toward anxiety and depression chatbot apps by conducting a thematic analysis of user reviews of 11 such apps collected from the Google Play Store and Apple App Store. In addition, we propose a workflow to provide a methodological approach for future analysis of app review comments. METHODS We analyzed 205,581 user review comments from chatbots designed for users with anxiety and depression symptoms. Using scraper tools, namely the Google Play Scraper and App Store Scraper Python libraries, we extracted the review text and metadata. The reviews were divided into positive and negative meta-themes based on the user's rating per review. We analyzed the reviews using word frequencies of bigrams and word pairs. A topic modeling technique, latent Dirichlet allocation (LDA), was applied to identify topics in the reviews, which were then analyzed to detect themes and subthemes. RESULTS Thematic analysis was conducted on 5 topics for each sentiment set, with reviews categorized as positive or negative. For positive reviews, the main themes were confidence and affirmation building, adequate analysis and consultation, caring as a friend, and ease of use. For negative reviews, the analysis revealed the following themes: usability issues, update issues, privacy, and noncreative conversations. CONCLUSIONS Using a machine learning approach, we were able to analyze more than 200,000 comments and categorize them into themes, allowing us to observe users' expectations effectively despite some negative factors. A methodological workflow is provided for the future analysis of review comments.
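The bigram word-frequency step described above can be sketched with the standard library alone (the review strings below are invented examples, not the study's dataset):

```python
import re
from collections import Counter

# Illustrative bigram frequency analysis over app review text (invented data).
def bigram_counts(reviews):
    """Count adjacent word pairs across a list of review strings."""
    counts = Counter()
    for review in reviews:
        words = re.findall(r"[a-z']+", review.lower())  # crude tokenizer
        counts.update(zip(words, words[1:]))            # adjacent word pairs
    return counts

# Hypothetical positive and negative meta-theme review sets.
positive = ["really easy to use", "easy to use and caring", "feels like a caring friend"]
negative = ["app keeps crashing", "update broke the app", "the app keeps logging me out"]
top_positive = bigram_counts(positive).most_common(2)
```

Frequent bigrams in each sentiment set (here, pairs such as ("easy", "to")) serve as candidate cues for the themes later refined by LDA topic modeling.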
Affiliation(s)
- Arfan Ahmed: Division of Information and Computing Technology, College of Science and Engineering, Hamad Bin Khalifa University, Qatar Foundation, Doha, Qatar; AI Center for Precision Health, Weill Cornell Medicine-Qatar, Doha, Qatar
- Sarah Aziz: Division of Information and Computing Technology, College of Science and Engineering, Hamad Bin Khalifa University, Qatar Foundation, Doha, Qatar; AI Center for Precision Health, Weill Cornell Medicine-Qatar, Doha, Qatar
- Mohamed Khalifa: Centre for Health Informatics, Australian Institute of Health Innovation, Faculty of Medicine, Health and Human Sciences, Macquarie University, Sydney, Australia
- Uzair Shah: Division of Information and Computing Technology, College of Science and Engineering, Hamad Bin Khalifa University, Qatar Foundation, Doha, Qatar
- Asma Hassan: Division of Information and Computing Technology, College of Science and Engineering, Hamad Bin Khalifa University, Qatar Foundation, Doha, Qatar
- Alaa Abd-Alrazaq: Division of Information and Computing Technology, College of Science and Engineering, Hamad Bin Khalifa University, Qatar Foundation, Doha, Qatar; AI Center for Precision Health, Weill Cornell Medicine-Qatar, Doha, Qatar
- Mowafa Househ: Division of Information and Computing Technology, College of Science and Engineering, Hamad Bin Khalifa University, Qatar Foundation, Doha, Qatar
Collapse
|
39
|
Bérubé C, Kovacs ZF, Fleisch E, Kowatsch T. Reliability of Commercial Voice Assistants' Responses to Health-Related Questions in Noncommunicable Disease Management: Factorial Experiment Assessing Response Rate and Source of Information. J Med Internet Res 2021; 23:e32161. [PMID: 34932003 PMCID: PMC8726026 DOI: 10.2196/32161]
Abstract
BACKGROUND Noncommunicable diseases (NCDs) constitute a burden on public health. These are best controlled through self-management practices, such as self-information. Fostering patients' access to health-related information through efficient and accessible channels, such as commercial voice assistants (VAs), may support the patients' ability to make health-related decisions and manage their chronic conditions. OBJECTIVE This study aims to evaluate the reliability of the most common VAs (ie, Amazon Alexa, Apple Siri, and Google Assistant) in responding to questions about the management of the main NCDs. METHODS We generated health-related questions based on frequently asked questions from the websites of health organizations, governments, medical nonprofits, and other recognized health-related sources, covering conditions associated with Alzheimer's disease (AD), lung cancer (LCA), chronic obstructive pulmonary disease, diabetes mellitus (DM), cardiovascular disease, chronic kidney disease (CKD), and cerebrovascular accident (CVA). We then validated them with practicing medical specialists, selecting the 10 most frequent ones. Given the low average frequency of the AD-related questions, we excluded such questions. This resulted in a pool of 60 questions. We submitted the selected questions to VAs in a 3×3×6 fractional factorial design experiment with 3 developers (ie, Amazon, Apple, and Google), 3 modalities (ie, voice only, voice and display, display only), and 6 diseases. We assessed the rate of error-free voice responses and classified the web sources based on previous research (ie, expert, commercial, crowdsourced, or not stated). RESULTS Google showed the highest total response rate, followed by Amazon and Apple. Moreover, although Amazon and Apple showed a comparable response rate in the voice-and-display and voice-only modalities, Google showed a slightly higher response rate in voice only. The same pattern was observed for the rate of expert sources.
When considering the response and expert source rates across diseases, we observed that although Google remained comparable, with a slight advantage for LCA and CKD, both Amazon and Apple showed the highest response rate for LCA. However, both Google and Apple most often cited expert sources for CVA, whereas Amazon did so for DM. CONCLUSIONS Google showed the highest response rate and the highest rate of expert sources, leading to the conclusion that Google Assistant would be the most reliable tool for responding to questions about NCD management. However, the rate of expert sources differed across diseases. We urge health organizations to collaborate with Google, Amazon, and Apple to allow their VAs to consistently provide reliable answers to health-related questions on NCD management across the different diseases.
Affiliation(s)
- Caterina Bérubé
- Centre for Digital Health Interventions, Department of Management, Technology, and Economics, ETH Zurich, Zurich, Switzerland
- Zsolt Ferenc Kovacs
- Centre for Digital Health Interventions, Department of Management, Technology, and Economics, ETH Zurich, Zurich, Switzerland
- Elgar Fleisch
- Centre for Digital Health Interventions, Department of Management, Technology, and Economics, ETH Zurich, Zurich, Switzerland; Future Health Technologies Programme, Campus for Research Excellence and Technological Enterprise (CREATE), Singapore-ETH Centre, Singapore, Singapore; Centre for Digital Health Interventions, Institute of Technology Management, University of St. Gallen, St. Gallen, Switzerland
- Tobias Kowatsch
- Centre for Digital Health Interventions, Department of Management, Technology, and Economics, ETH Zurich, Zurich, Switzerland; Future Health Technologies Programme, Campus for Research Excellence and Technological Enterprise (CREATE), Singapore-ETH Centre, Singapore, Singapore; Centre for Digital Health Interventions, Institute of Technology Management, University of St. Gallen, St. Gallen, Switzerland

40
Allouch M, Azaria A, Azoulay R. Conversational Agents: Goals, Technologies, Vision and Challenges. Sensors (Basel) 2021; 21:8448. [PMID: 34960538 PMCID: PMC8704682 DOI: 10.3390/s21248448]
Abstract
In recent years, conversational agents (CAs) have become ubiquitous in our daily routines. The technology has now matured enough to advance the use of CAs in various domains, including commercial, healthcare, educational, political, industrial, and personal domains. In this study, the main areas in which CAs are successful are described, along with the main technologies that enable their creation: capable of conducting ongoing communication with humans, CAs build on natural-language processing, deep learning, and technologies that integrate emotional aspects. The technologies used for the evaluation of CAs and publicly available datasets are outlined. In addition, several areas for future research are identified to address moral and security issues, given the current state of CA-related technological developments. The uniqueness of our review is that an overview of the concepts and building blocks of CAs is provided, and CAs are categorized according to their abilities and main application domains. In addition, the primary tools and datasets that may be useful for the development and evaluation of CAs of different categories are described. Finally, some thoughts and directions for future research are provided, and domains that may benefit from conversational agents are introduced.
Affiliation(s)
- Merav Allouch
- Computer Science Department, Ariel University, Ariel 40700, Israel
- Amos Azaria
- Computer Science Department, Ariel University, Ariel 40700, Israel
- Rina Azoulay
- Department of Computer Science, Jerusalem College of Technology, Jerusalem 9116001, Israel

41
Giunti G, Isomursu M, Gabarron E, Solad Y. Designing Depression Screening Chatbots. Stud Health Technol Inform 2021; 284:259-263. [PMID: 34920522 DOI: 10.3233/shti210719]
Abstract
Advances in voice recognition, natural language processing, and artificial intelligence have led to the increasing availability and use of conversational agents (chatbots) in different settings. Chatbots are systems that mimic human dialogue interaction through text or voice. This paper describes a series of design considerations for integrating chatbot interfaces with health services. The present paper is part of ongoing work that explores the overall implementation of chatbots in the healthcare context. The findings were created using a research-through-design process, combining (1) a literature survey of the existing body of knowledge on designing chatbots, (2) an analysis of the state of practice in using chatbots as service interfaces, and (3) a generative process of designing a chatbot interface for depression screening. In this paper, we describe considerations that would be useful for the design of a chatbot in a healthcare context.
Affiliation(s)
- G Giunti
- University of Oulu, Oulu, Finland; TU Delft, Delft, Netherlands
- E Gabarron
- Norwegian Centre for E-health Research, Tromso, Norway
- Y Solad
- Yale New Haven Health, New Haven, Connecticut, USA

42
Dhinagaran DA, Sathish T, Soong A, Theng YL, Best J, Tudor Car L. Conversational Agent for Healthy Lifestyle Behavior Change: Web-Based Feasibility Study. JMIR Form Res 2021; 5:e27956. [PMID: 34870611 PMCID: PMC8686401 DOI: 10.2196/27956]
Abstract
BACKGROUND The rising incidence of chronic diseases is a growing concern, especially in Singapore, which is one of the high-income countries with the highest prevalence of diabetes. Interventions that promote healthy lifestyle behavior changes have been proven to be effective in reducing the progression of prediabetes to diabetes, but their in-person delivery may not be feasible on a large scale. Novel technologies such as conversational agents are a potential alternative for delivering behavioral interventions that promote healthy lifestyle behavior changes to the public. OBJECTIVE The aim of this study is to assess the feasibility and acceptability of using a conversational agent promoting healthy lifestyle behavior changes in the general population in Singapore. METHODS We performed a web-based, single-arm feasibility study. The participants were recruited through Facebook over 4 weeks. The Facebook Messenger conversational agent was used to deliver the intervention. The conversations focused on diet, exercise, sleep, and stress and aimed to promote healthy lifestyle behavior changes and improve the participants' knowledge of diabetes. Messages were sent to the participants four times a week (once for each of the 4 topics of focus) for 4 weeks. We assessed the feasibility of recruitment, defined as at least 75% (150/200) of our target sample of 200 participants in 4 weeks, as well as retention, defined as 33% (66/200) of the recruited sample completing the study. We also assessed the participants' satisfaction with, and usability of, the conversational agent. In addition, we performed baseline and follow-up assessments of quality of life, diabetes knowledge and risk perception, diet, exercise, sleep, and stress. RESULTS We recruited 37.5% (75/200) of the target sample size in 1 month. Of the 75 eligible participants, 60 (80%) provided digital informed consent and completed baseline assessments. 
Of these 60 participants, 56 (93%) followed the study through to completion. Retention was high at 93% (56/60), as was engagement: 50% (30/60) of the participants communicated with the conversational agent at each interaction. Acceptability, usability, and satisfaction were generally high. Preliminary efficacy of the intervention showed no definitive improvements in health-related behavior. CONCLUSIONS The delivery of a conversational agent for healthy lifestyle behavior change through Facebook Messenger was feasible and acceptable. We were unable to recruit our planned sample solely using the free options on Facebook. However, participant retention and conversational agent engagement rates were high. Our findings provide important insights to inform the design of a future randomized controlled trial.
Affiliation(s)
- Thirunavukkarasu Sathish
- Population Health Research Institute, McMaster University, Hamilton, ON, Canada; Centre for Population Health Sciences, Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore
- AiJia Soong
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore
- Yin-Leng Theng
- Centre for Healthy and Sustainable Cities, Nanyang Technological University, Singapore, Singapore
- James Best
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore
- Lorainne Tudor Car
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore; Department of Primary Care and Public Health, School of Public Health, Imperial College London, London, United Kingdom

43
Dhinagaran DA, Sathish T, Kowatsch T, Griva K, Best JD, Tudor Car L. Public Perceptions of Diabetes, Healthy Living, and Conversational Agents in Singapore: Needs Assessment. JMIR Form Res 2021; 5:e30435. [PMID: 34762053 PMCID: PMC8663498 DOI: 10.2196/30435]
Abstract
Background The incidence of chronic diseases such as type 2 diabetes is increasing in countries worldwide, including Singapore. Health professional–delivered healthy lifestyle interventions have been shown to prevent type 2 diabetes. However, ongoing personalized guidance from health professionals is not feasible or affordable at the population level. Novel digital interventions delivered using mobile technology, such as conversational agents, are a potential alternative for the delivery of healthy lifestyle change behavioral interventions to the public. Objective We explored perceptions and experiences of Singaporeans on healthy living, diabetes, and mobile health (mHealth) interventions (apps and conversational agents). This study was conducted to help inform the design and development of a conversational agent focusing on healthy lifestyle changes. Methods This qualitative study was conducted in August and September 2019. A total of 20 participants were recruited from relevant healthy living Facebook pages and groups. Semistructured interviews were conducted in person or over the telephone using an interview guide. Interviews were transcribed and analyzed in parallel by 2 researchers using Burnard’s method, a structured approach for thematic content analysis. Results The collected data were organized into 4 main themes: use of conversational agents, ubiquity of smartphone apps, understanding of diabetes, and barriers and facilitators to a healthy living in Singapore. Most participants used health-related mobile apps as well as conversational agents unrelated to health care. They provided diverse suggestions for future conversational agent-delivered interventions. Participants also highlighted several knowledge gaps in relation to diabetes and healthy living. Regarding barriers to healthy living, participants mentioned frequent dining out, high stress levels, lack of work-life balance, and lack of free time to engage in physical activity. 
In contrast, discipline, preplanning, and sticking to a routine were important enablers of a healthy lifestyle. Conclusions Participants in this study commonly used mHealth interventions and provided important insights into their knowledge gaps and needs in relation to changes in healthy lifestyle behaviors. Future digital interventions such as conversational agents focusing on healthy lifestyle and diabetes prevention should aim to address the barriers highlighted in our study and motivate individuals to adopt healthy lifestyle behaviors.
Affiliation(s)
- Thirunavukkarasu Sathish
- Population Health Research Institute, McMaster University, Hamilton, ON, Canada; Centre for Population Health Sciences, Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore
- Tobias Kowatsch
- Future Health Technologies Programme, Campus for Research Excellence and Technological Enterprise, Singapore-ETH Centre, Singapore, Singapore; Centre for Digital Health Interventions, Department of Management, Technology, and Economics, ETH Zurich, Zurich, Switzerland; Centre for Digital Health Interventions, Institute of Technology Management, University of St Gallen, St Gallen, Switzerland
- Konstadina Griva
- Centre for Population Health Sciences, Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore
- James Donovan Best
- Lee Kong Chian School of Medicine, Nanyang Technological University Singapore, Singapore, Singapore
- Lorainne Tudor Car
- Lee Kong Chian School of Medicine, Nanyang Technological University Singapore, Singapore, Singapore; Department of Primary Care and Public Health, School of Public Health, Imperial College London, London, United Kingdom

44
Figueroa CA, Luo TC, Jacobo A, Munoz A, Manuel M, Chan D, Canny J, Aguilera A. Conversational Physical Activity Coaches for Spanish and English Speaking Women: A User Design Study. Front Digit Health 2021; 3:747153. [PMID: 34713207 PMCID: PMC8531260 DOI: 10.3389/fdgth.2021.747153]
Abstract
Introduction: Digital technologies, including text messaging and mobile phone apps, can be leveraged to increase people's physical activity and manage health. Chatbots, powered by artificial intelligence, can automatically interact with individuals through natural conversation. They may be more engaging than one-way messaging interventions. To our knowledge, physical activity chatbots have not been developed with low-income participants, nor in Spanish, the second most widely spoken language in the US. We recommend best practices for physical activity chatbots in English and Spanish for low-income women. Methods: We designed a prototype text message-based physical activity conversational agent based on various psychotherapeutic techniques. We recruited participants through SNAP-Ed (Supplemental Nutrition Assistance Program Education) in California (Alameda County) and Tennessee (Shelby County). We conducted qualitative interviews with participants during testing of our prototype chatbot, ran a Wizard of Oz study, and facilitated a co-design workshop in Spanish with a subset of our participants. Results: We included 10 Spanish- and 8 English-speaking women between 27 and 41 years old. The majority were Hispanic/Latina (n = 14); 2 were White, and 2 were Black/African American. More than half were monolingual Spanish speakers, and the majority were born outside the US (>50% in Mexico). Most participants were unfamiliar with chatbots and were initially skeptical. After testing our prototype, most users felt positively about health chatbots. They desired a personalized chatbot that addresses their concerns about privacy and stressed the need for a comprehensive system that also aids with nutrition, health information, and stress, and involves family members. Differences between English and monolingual Spanish speakers were found mostly in exercise app use, digital literacy, and the wish for family inclusion.
Conclusion: Low-income Spanish- and English-speaking women are interested in using chatbots to improve their physical activity and other health-related aspects. Researchers developing health chatbots for this population should focus on issues of digital literacy, app familiarity, linguistic and cultural issues, privacy concerns, and personalization. Designing and testing this intervention for and with this group using co-creation techniques and involving community partners will increase the probability that it will ultimately be effective.
Affiliation(s)
- Caroline A. Figueroa
- School of Social Welfare, University of California, Berkeley, Berkeley, CA, United States
- Tiffany C. Luo
- School of Social Welfare, University of California, Berkeley, Berkeley, CA, United States
- Andrea Jacobo
- School of Public Health, University of California, Berkeley, Berkeley, CA, United States
- Alan Munoz
- School of Social Welfare, University of California, Berkeley, Berkeley, CA, United States
- Minx Manuel
- School of Public Health, University of California, Berkeley, Berkeley, CA, United States
- David Chan
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, CA, United States
- John Canny
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, CA, United States
- Adrian Aguilera
- School of Social Welfare, University of California, Berkeley, Berkeley, CA, United States; Department of Psychiatry and Behavioral Sciences, Zuckerberg San Francisco General Hospital, University of California, San Francisco, San Francisco, CA, United States

45
Siedlikowski S, Noël LP, Moynihan SA, Robin M. Chloe for COVID-19: Evolution of an Intelligent Conversational Agent to Address Infodemic Management Needs During the COVID-19 Pandemic. J Med Internet Res 2021; 23:e27283. [PMID: 34375299 PMCID: PMC8457340 DOI: 10.2196/27283]
Abstract
There is an unprecedented demand for infodemic management due to the rapidly evolving information surrounding the COVID-19 pandemic. This viewpoint paper details the evolution of a Canadian digital information tool, Chloe for COVID-19, based on incremental leveraging of artificial intelligence techniques. By providing an accessible summary of Chloe's development, we show how proactive cooperation between the health, technology, and corporate sectors can lead to a rapidly scalable, safe, and secure virtual chatbot that assists public health efforts in keeping Canadians informed. We then highlight Chloe's strengths, the challenges we faced during the development process, and future directions for the role of chatbots in infodemic management. The information presented here may guide future collaborative efforts in health technology to enhance the public's access to accurate and timely health information.
Affiliation(s)
- Marc Robin
- Dialogue Health Technologies Inc, Montreal, QC, Canada

46
Mauriello ML, Tantivasadakarn N, Mora-Mendoza MA, Lincoln ET, Hon G, Nowruzi P, Simon D, Hansen L, Goenawan NH, Kim J, Gowda N, Jurafsky D, Paredes PE. A Suite of Mobile Conversational Agents for Daily Stress Management (Popbots): Mixed Methods Exploratory Study. JMIR Form Res 2021; 5:e25294. [PMID: 34519655 PMCID: PMC8479600 DOI: 10.2196/25294]
Abstract
BACKGROUND Approximately 60%-80% of primary care visits have a psychological stress component, but only 3% of patients receive stress management advice during these visits. Given recent advances in natural language processing, there is renewed interest in mental health chatbots. Conversational agents that can understand a user's problems and deliver advice that mitigates the effects of daily stress could be an effective public health tool. However, such systems are complex to build and costly to develop. OBJECTIVE To address these challenges, our aim is to develop and evaluate a fully automated mobile suite of shallow chatbots, which we call Popbots, that may serve as a new species of chatbots and further complement human assistance in an ecosystem of stress management support. METHODS After conducting an exploratory Wizard of Oz study (N=14) to evaluate the feasibility of a suite of multiple chatbots, we conducted a web-based study (N=47) to evaluate the implementation of our prototype. Each participant was randomly assigned to a different chatbot designed on the basis of a proven cognitive or behavioral intervention method. To measure the effectiveness of the chatbots, the participants' stress levels were determined using self-reported psychometric evaluations (eg, web-based daily surveys and the Patient Health Questionnaire-4). The participants in these studies were recruited through email and enrolled on the web, and some of them participated in follow-up interviews that were conducted in person or on the web (as necessary). RESULTS Of the 47 participants, 31 (66%) completed the main study. The findings suggest that the users viewed the conversations with our chatbots as helpful or at least neutral and came away with increasingly positive sentiment toward the use of chatbots for proactive stress management.
Moreover, the users who used the system more often (ie, at least the median number of conversations) noted a decrease in depression symptoms compared with those who used the system less often, based on a Wilcoxon signed-rank test (W=91.50; Z=-2.54; P=.01; r=0.47). The follow-up interviews with a subset of the participants indicated that half of the common daily stressors could be discussed with chatbots, potentially reducing the burden on human coping resources. CONCLUSIONS Our work suggests that suites of shallow chatbots may offer benefits for both users and designers. As a result, this study's contributions include the design and evaluation of a novel suite of shallow chatbots for daily stress management, a summary of benefits and challenges associated with the random delivery of multiple conversational interventions, and design guidelines and directions for future research into similar systems, including authoring chatbot systems and artificial intelligence-enabled recommendation algorithms.
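The paired pre/post comparison reported above rests on the Wilcoxon signed-rank statistic W. A pure-Python sketch of how W is computed follows; the depression scores are invented for illustration, and the normal approximation that yields the reported Z, P, and r is omitted (scipy.stats.wilcoxon provides it):

```python
def wilcoxon_w(before, after):
    """Wilcoxon signed-rank statistic W = min(W+, W-) for paired samples.

    Zero differences are dropped, and tied absolute differences receive
    their average rank, as in the standard formulation.
    """
    diffs = sorted((a - b for b, a in zip(before, after) if a != b), key=abs)
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(diffs):
        j = i
        while j < len(diffs) and abs(diffs[j]) == abs(diffs[i]):
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = avg_rank
        i = j
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return min(w_plus, w_minus)

# Invented pre/post depression scores for 5 frequent users.
pre = [12, 9, 15, 11, 8]
post = [10, 9, 11, 12, 5]
w = wilcoxon_w(pre, post)
```

With larger samples, W is converted via the normal approximation into the Z score, P value, and effect size r of the kind reported in the abstract.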
Affiliation(s)
- Matthew Louis Mauriello
- Department of Computer and Information Sciences, University of Delaware, Newark, DE, United States
- Nantanick Tantivasadakarn
- Symbolic Systems Program, School of Humanities and Sciences, Stanford University, Stanford, CA, United States
- Grace Hon
- Stanford School of Medicine, Stanford University, Stanford, CA, United States
- Parsa Nowruzi
- Stanford School of Medicine, Stanford University, Stanford, CA, United States
- Dorien Simon
- Computer Science Department, College of Engineering, Stanford University, Stanford, CA, United States
- Luke Hansen
- Symbolic Systems Program, School of Humanities and Sciences, Stanford University, Stanford, CA, United States
- Nathaniel H Goenawan
- Computer Science Department, College of Engineering, Stanford University, Stanford, CA, United States
- Joshua Kim
- Computer Science Department, College of Engineering, Stanford University, Stanford, CA, United States
- Nikhil Gowda
- Alliance Innovation Lab, Silicon Valley, CA, United States
- Dan Jurafsky
- Computer Science Department, College of Engineering, Stanford University, Stanford, CA, United States; Department of Linguistics, School of Humanities and Sciences, Stanford University, Stanford, CA, United States

47
Klos MC, Escoredo M, Joerin A, Lemos VN, Rauws M, Bunge EL. Artificial Intelligence-Based Chatbot for Anxiety and Depression in University Students: Pilot Randomized Controlled Trial. JMIR Form Res 2021; 5:e20678. [PMID: 34092548 PMCID: PMC8391753 DOI: 10.2196/20678]
Abstract
BACKGROUND Artificial intelligence-based chatbots are emerging as instruments of psychological intervention; however, no relevant studies have been reported in Latin America. OBJECTIVE The objective of the present study was to evaluate the viability, acceptability, and potential impact of using Tess, a chatbot, for examining symptoms of depression and anxiety in university students. METHODS This was a pilot randomized controlled trial. The experimental condition used Tess for 8 weeks, and the control condition was assigned a psychoeducation book on depression. Comparisons were conducted using Mann-Whitney U and Wilcoxon tests for depressive symptoms, and independent and paired sample t tests to analyze anxiety symptoms. RESULTS The initial sample consisted of 181 Argentinian college students (158/181, 87.2% female) aged 18 to 33 years. Data at week 8 were provided by 39 of the 99 (39%) participants in the experimental condition and 34 of the 82 (41%) in the control group. On average, 472 (SD 249.52) messages were exchanged, 116 (SD 73.87) of which were sent by users in response to Tess. A higher number of messages exchanged with Tess was associated with positive feedback (F(2,36)=4.37; P=.02). No significant differences between the experimental and control groups were found from baseline to week 8 for depressive and anxiety symptoms. However, significant intragroup differences demonstrated that the experimental group showed a significant decrease in anxiety symptoms; no such differences were observed for the control group. Further, no significant intragroup differences were found for depressive symptoms. CONCLUSIONS The students spent a considerable amount of time exchanging messages with Tess, and positive feedback was associated with a higher number of messages exchanged. The initial results show promising evidence for the usability and acceptability of Tess in the Argentinian population. Research on chatbots is still in its initial stages, and further research is needed.
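The between-group comparison of depressive symptoms above relies on the Mann-Whitney U test. A minimal sketch of the U statistic on invented week-8 scores (the tie-corrected P value that scipy.stats.mannwhitneyu computes is omitted):

```python
def mann_whitney_u(x, y):
    """Mann-Whitney U statistic for sample x against sample y.

    U counts, over all (x, y) pairs, how often an x value exceeds a
    y value, with exact ties contributing one half.
    """
    return sum(
        1.0 if xi > yj else 0.5 if xi == yj else 0.0
        for xi in x
        for yj in y
    )

# Invented week-8 depression scores for the two arms.
tess_arm = [4, 6, 7, 9]    # chatbot condition
book_arm = [5, 8, 10, 11]  # psychoeducation-book control
u = mann_whitney_u(tess_arm, book_arm)

# The two one-sided statistics always sum to len(x) * len(y).
u_reverse = mann_whitney_u(book_arm, tess_arm)
assert u + u_reverse == len(tess_arm) * len(book_arm)
```

The smaller of u and u_reverse is then compared against critical values (or normalized) to judge whether the two groups' score distributions differ.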
Collapse
Affiliation(s)
- Maria Carolina Klos
- Interdisciplinary Center for Research in Health and Behavioral Sciences (CIICSAC), Universidad Adventista del Plata (UAP)., National Scientific and Technical Research Council (CONICET)., Libertador San Martín, Entre Ríos, Argentina
| | | | | | - Viviana Noemí Lemos
- Interdisciplinary Center for Research in Health and Behavioral Sciences (CIICSAC), Universidad Adventista del Plata (UAP)., National Scientific and Technical Research Council (CONICET)., Libertador San Martín, Entre Ríos, Argentina
| | | | - Eduardo L Bunge
- Department of Psychology, Palo Alto University, Palo Alto, CA, United States
48
Vilaza GN, McCashin D. Is the Automation of Digital Mental Health Ethical? Applying an Ethical Framework to Chatbots for Cognitive Behaviour Therapy. Front Digit Health 2021; 3:689736. [PMID: 34713163 PMCID: PMC8521996 DOI: 10.3389/fdgth.2021.689736] [Received: 04/01/2021] [Accepted: 07/16/2021] [Indexed: 11/13/2022] Open
Abstract
The COVID-19 pandemic has intensified the need for mental health support across the whole spectrum of the population. Where global demand outweighs the supply of mental health services, established interventions such as cognitive behavioural therapy (CBT) have been adapted from traditional face-to-face interaction to technology-assisted formats. One such notable development is the emergence of Artificially Intelligent (AI) conversational agents for psychotherapy. Pre-pandemic, these adaptations had demonstrated some positive results; but they also generated debate due to a number of ethical and societal challenges. This article commences with a critical overview of both positive and negative aspects concerning the role of AI-CBT in its present form. Thereafter, an ethical framework is applied with reference to the themes of (1) beneficence, (2) non-maleficence, (3) autonomy, (4) justice, and (5) explicability. These themes are then discussed in terms of practical recommendations for future developments. Although automated versions of therapeutic support may be of appeal during times of global crises, ethical thinking should be at the core of AI-CBT design, in addition to guiding research, policy, and real-world implementation as the world considers post-COVID-19 society.
49
Abstract
OBJECTIVES To describe the use and promise of conversational agents in digital health, including health promotion and prevention, and how they can be combined with other new technologies to provide healthcare at home. METHOD A narrative review of recent advances in the technologies underpinning conversational agents and their use and potential for healthcare and improving health outcomes. RESULTS By responding to written and spoken language, conversational agents present a versatile, natural user interface and have the potential to make their services and applications more widely accessible. Historically, conversational interfaces for health applications have focused mainly on mental health, but with an increase in affordable devices and the modernization of health services, conversational agents are becoming more widely deployed across the health system. We present our work on context-aware voice assistants capable of proactively engaging users and delivering health information and services. The proactive voice agents we deploy allow us to conduct experience sampling in people's homes and to collect information about the contexts in which users interact with them. CONCLUSION In this article, we describe the state of the art of these and other enabling technologies for speech and conversation and discuss ongoing research efforts to develop conversational agents that "live" with patients and customize their service offerings around their needs. These agents can function as "digital companions" who send reminders about medications and appointments, proactively check in to gather self-assessments, and follow up with patients on their treatment plans. Together with the unobtrusive and continuous collection of other health data, conversational agents can provide novel and deeply personalized access to digital health care, and they will continue to become an increasingly important part of the ecosystem for future healthcare delivery.
Affiliation(s)
- Tilman Dingler
- NHMRC CRE in Digital Technology to Transform Chronic Disease Outcomes, School of Computing and Information Systems, University of Melbourne, Parkville, Australia
- Dominika Kwasnicka
- NHMRC CRE in Digital Technology to Transform Chronic Disease Outcomes, Melbourne School of Population and Global Health, University of Melbourne, Melbourne, Australia
- Jing Wei
- NHMRC CRE in Digital Technology to Transform Chronic Disease Outcomes, School of Computing and Information Systems, University of Melbourne, Parkville, Australia
- Enying Gong
- NHMRC CRE in Digital Technology to Transform Chronic Disease Outcomes, Melbourne School of Population and Global Health, University of Melbourne, Melbourne, Australia
- Brian Oldenburg
- NHMRC CRE in Digital Technology to Transform Chronic Disease Outcomes, Melbourne School of Population and Global Health, University of Melbourne, Melbourne, Australia
50
Gross C, Schachner T, Hasl A, Kohlbrenner D, Clarenbach CF, Wangenheim FV, Kowatsch T. Personalization of Conversational Agent-Patient Interaction Styles for Chronic Disease Management: Two Consecutive Cross-sectional Questionnaire Studies. J Med Internet Res 2021; 23:e26643. [PMID: 33913814 PMCID: PMC8190651 DOI: 10.2196/26643] [Received: 12/19/2020] [Revised: 02/12/2021] [Accepted: 04/14/2021] [Indexed: 01/18/2023] Open
Abstract
BACKGROUND Conversational agents (CAs) for chronic disease management are receiving increasing attention in academia and industry. However, long-term adherence to CAs is still a challenge and needs to be explored. Personalization of CAs has the potential to improve long-term adherence and, with it, user satisfaction, task efficiency, perceived benefits, and intended behavior change. Research on personalized CAs has already addressed different aspects, such as personalized recommendations and anthropomorphic cues. However, detailed information on interaction styles between patients and CAs in the role of medical health care professionals is scant. Such interaction styles play essential roles in patient satisfaction, treatment adherence, and outcomes, as has been shown for physician-patient interactions. Currently, it is not clear (1) whether chronically ill patients prefer a CA with a paternalistic, informative, interpretive, or deliberative interaction style, and (2) which factors influence these preferences. OBJECTIVE We aimed to investigate the preferences of chronically ill patients for CA-delivered interaction styles. METHODS We conducted two studies. The first study used a paper-based approach and explored the preferences of chronic obstructive pulmonary disease (COPD) patients for paternalistic, informative, interpretive, and deliberative CA-delivered interaction styles. Based on these results, a second study assessed the effects of the paternalistic and deliberative interaction styles on the relationship quality between the CA and patients via hierarchical multiple linear regression analyses in an online experiment with COPD patients. Patients' sociodemographic and disease-specific characteristics served as moderator variables. RESULTS Study 1, with 117 COPD patients, revealed a preference for the deliberative (50/117) and informative (34/117) interaction styles across demographic characteristics. All patients who preferred the paternalistic style over the other interaction styles had more severe COPD (three patients, Global Initiative for Chronic Obstructive Lung Disease class 3 or 4). In Study 2, with 123 newly recruited COPD patients, younger participants and participants with a less recent COPD diagnosis scored higher on interaction-related outcomes when interacting with a CA that delivered the deliberative interaction style (interaction between age and CA type: relationship quality: b=−0.77, 95% CI −1.37 to −0.18; intention to continue interaction: b=−0.49, 95% CI −0.97 to −0.01; working alliance attachment bond: b=−0.65, 95% CI −1.26 to −0.04; working alliance goal agreement: b=−0.59, 95% CI −1.18 to −0.01; interaction between recency of COPD diagnosis and CA type: working alliance goal agreement: b=0.57, 95% CI 0.01 to 1.13). CONCLUSIONS Our results indicate that age and a patient's personal disease experience inform which CA interaction style the patient should be paired with to achieve increased interaction-related outcomes. These results allow the design of personalized health care CAs with the goal of increasing long-term adherence to health-promoting behavior.
Affiliation(s)
- Christoph Gross
- Department of Management, Technology, and Economics, ETH Zurich, Zurich, Switzerland; Centre for Digital Health Interventions, Department of Management, Technology, and Economics, ETH Zurich, Zurich, Switzerland
- Theresa Schachner
- Department of Management, Technology, and Economics, ETH Zurich, Zurich, Switzerland; Centre for Digital Health Interventions, Department of Management, Technology, and Economics, ETH Zurich, Zurich, Switzerland
- Andrea Hasl
- Department of Educational Sciences, University of Potsdam, Potsdam, Germany; International Max Planck Research School on the Life Course, Berlin, Germany
- Dario Kohlbrenner
- Department of Pulmonology, University Hospital Zurich, Zurich, Switzerland; Faculty of Medicine, University of Zurich, Zurich, Switzerland
- Florian V Wangenheim
- Centre for Digital Health Interventions, Department of Management, Technology, and Economics, ETH Zurich, Zurich, Switzerland
- Tobias Kowatsch
- Centre for Digital Health Interventions, Department of Management, Technology, and Economics, ETH Zurich, Zurich, Switzerland; Centre for Digital Health Interventions, Institute of Technology Management, University of St. Gallen, St. Gallen, Switzerland; Saw Swee Hock School of Public Health, National University of Singapore, Singapore, Singapore