1
Ding H, Simmich J, Vaezipour A, Andrews N, Russell T. Evaluation framework for conversational agents with artificial intelligence in health interventions: a systematic scoping review. J Am Med Inform Assoc 2024; 31:746-761. [PMID: 38070173 PMCID: PMC10873847 DOI: 10.1093/jamia/ocad222] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/23/2023] [Revised: 11/04/2023] [Accepted: 11/24/2023] [Indexed: 02/18/2024]
Abstract
Objectives: Conversational agents (CAs) with emerging artificial intelligence present new opportunities to assist in health interventions but are difficult to evaluate, which deters their real-world application. We aimed to synthesize existing evidence and knowledge and outline an evaluation framework for CA interventions. Materials and Methods: We conducted a systematic scoping review to investigate the designs and outcome measures used in studies that evaluated CAs for health interventions. We then nested the results into an overarching digital health framework proposed by the World Health Organization (WHO). Results: The review included 81 studies evaluating CAs in experimental trials (n = 59), observational trials (n = 15), and other research designs (n = 7). Most studies (n = 72, 89%) were published in the past 5 years. The proposed CA-evaluation framework includes 4 evaluation stages: (1) feasibility/usability, (2) efficacy, (3) effectiveness, and (4) implementation, aligning with WHO's stepwise evaluation strategy. Across these stages, this article presents the essential evidence of different study designs (n = 8), sample sizes, and main evaluation categories (n = 7) with subcategories (n = 40). The main evaluation categories included (1) functionality, (2) safety and information quality, (3) user experience, (4) clinical and health outcomes, (5) costs and cost benefits, (6) usage, adherence, and uptake, and (7) user characteristics for implementation research. Furthermore, the framework highlighted the essential evaluation areas (potential primary outcomes) and gaps across the evaluation stages. Discussion and Conclusion: This review presents a new framework with practical design details to support the evaluation of CA interventions in healthcare research. Protocol Registration: Registered with the Open Science Framework (https://osf.io/9hq2v) on March 22, 2021.
Affiliation(s)
- Hang Ding
  - RECOVER Injury Research Centre, Faculty of Health and Behavioural Sciences, The University of Queensland, Brisbane, QLD, Australia
  - STARS Education and Research Alliance, Surgical Treatment and Rehabilitation Service (STARS), The University of Queensland and Metro North Health, Brisbane, QLD, Australia
- Joshua Simmich
  - RECOVER Injury Research Centre, Faculty of Health and Behavioural Sciences, The University of Queensland, Brisbane, QLD, Australia
  - STARS Education and Research Alliance, Surgical Treatment and Rehabilitation Service (STARS), The University of Queensland and Metro North Health, Brisbane, QLD, Australia
- Atiyeh Vaezipour
  - RECOVER Injury Research Centre, Faculty of Health and Behavioural Sciences, The University of Queensland, Brisbane, QLD, Australia
  - STARS Education and Research Alliance, Surgical Treatment and Rehabilitation Service (STARS), The University of Queensland and Metro North Health, Brisbane, QLD, Australia
- Nicole Andrews
  - RECOVER Injury Research Centre, Faculty of Health and Behavioural Sciences, The University of Queensland, Brisbane, QLD, Australia
  - STARS Education and Research Alliance, Surgical Treatment and Rehabilitation Service (STARS), The University of Queensland and Metro North Health, Brisbane, QLD, Australia
  - The Tess Cramond Pain and Research Centre, Metro North Hospital and Health Service, Brisbane, QLD, Australia
  - The Occupational Therapy Department, The Royal Brisbane and Women’s Hospital, Metro North Hospital and Health Service, Brisbane, QLD, Australia
- Trevor Russell
  - RECOVER Injury Research Centre, Faculty of Health and Behavioural Sciences, The University of Queensland, Brisbane, QLD, Australia
  - STARS Education and Research Alliance, Surgical Treatment and Rehabilitation Service (STARS), The University of Queensland and Metro North Health, Brisbane, QLD, Australia
2
Mancone S, Diotaiuti P, Valente G, Corrado S, Bellizzi F, Vilarino GT, Andrade A. The Use of Voice Assistant for Psychological Assessment Elicits Empathy and Engagement While Maintaining Good Psychometric Properties. Behav Sci (Basel) 2023; 13:550. [PMID: 37503997 PMCID: PMC10376154 DOI: 10.3390/bs13070550] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2023] [Revised: 06/20/2023] [Accepted: 06/28/2023] [Indexed: 07/29/2023]
Abstract
This study aimed to use the Alexa voice assistant as an administrator of psychometric tests, assessing the efficiency and validity of this form of measurement. A total of 300 participants were administered the Interpersonal Reactivity Index (IRI). After a week, the administration was repeated, with the participants randomly divided into three groups of 100 participants each. In the first group, the test was administered by means of a paper version; in the second, the questionnaire was read to the participants in person, and the operator simultaneously recorded the answers given by the participants; in the third, the questionnaire was directly administered by the Alexa voice device, after specific reprogramming. The third group was also administered, as a post-session survey, the Engagement and Perceptions of the Bot Scale (EPVS), a short version of the Communication Styles Inventory (CSI), the Marlowe-Crowne Social Desirability Scale (MCSDS), and an additional six items to measure degrees of concentration, ease, and perceived pressure at the beginning and at the end of the administration. The results confirmed that the IRI retained measurement invariance across the three conditions. Administration through the voice assistant showed an empathic activation effect significantly greater than in the paper-and-pencil and in-person operator conditions. The results indicated engagement and a positive evaluation of the interactive experience, with reported perceptions of closeness, warmth, competence, and human-likeness associated with higher values of empathic activation and lower values of personal discomfort.
Affiliation(s)
- Stefania Mancone
  - Department of Human Sciences, Society and Health, University of Cassino and Southern Lazio, 03043 Cassino, Italy
- Pierluigi Diotaiuti
  - Department of Human Sciences, Society and Health, University of Cassino and Southern Lazio, 03043 Cassino, Italy
- Giuseppe Valente
  - Department of Human Sciences, Society and Health, University of Cassino and Southern Lazio, 03043 Cassino, Italy
- Stefano Corrado
  - Department of Human Sciences, Society and Health, University of Cassino and Southern Lazio, 03043 Cassino, Italy
- Fernando Bellizzi
  - Department of Human Sciences, Society and Health, University of Cassino and Southern Lazio, 03043 Cassino, Italy
- Guilherme Torres Vilarino
  - Health and Sports Science Center, Department of Physical Education, Santa Catarina State University, Florianópolis 88035-901, Brazil
- Alexandro Andrade
  - Health and Sports Science Center, Department of Physical Education, Santa Catarina State University, Florianópolis 88035-901, Brazil
3
Czerwiński SK, Atroszko PA. A solution for factorial validity testing of three-item scales: An example of tau-equivalent strict measurement invariance of three-item loneliness scale. CURRENT PSYCHOLOGY 2023; 42:1652-1664. [PMID: 33716473 PMCID: PMC7936930 DOI: 10.1007/s12144-021-01554-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Accepted: 02/26/2021] [Indexed: 01/15/2023]
Abstract
Ultra-short scales are increasingly popular in surveys. Congeneric model fit of a three-item scale cannot be tested with Confirmatory Factor Analysis (CFA) without additional assumptions because the number of degrees of freedom is equal to zero. A more rigorous tau-equivalent model, assuming equality of factor loadings can be tested instead. The objective of this study was to demonstrate this approach with an example of the psychometric study of the Polish version of the Three-Item Loneliness Scale (TILS), and to discuss the arising problems and possible solutions. There seems to be a high need for such analysis because currently, some properties of CFA make it an approach still predominant over Item Response Theory (IRT) models in the quality of life research. A sample of 3510 students completed TILS together with the questionnaires measuring a variety of indicators of well-being. The results provided evidence for a good fit of a tau-equivalent model. Furthermore, multi-group CFAs provided support for strict measurement invariance of this model. To the Authors' knowledge, it is the first practical application of a tau-equivalent model to testing the factorial validity of an ultra-short scale and probably the first empirical case of tau-equivalent measurement invariance in psychological literature in general. TILS showed good criterion validity and satisfactory reliability. Unidimensionality of three-item scales can be examined with a tau-equivalent model that has some favorable psychometric properties. However, it might be exceedingly restrictive in certain practical cases. When developing a new short scale, it is recommended to maintain at least four items.
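The zero-degrees-of-freedom point in this abstract can be checked by counting moments and parameters. The sketch below is illustrative and not taken from the article. It assumes a one-factor model fitted to the covariance matrix only, with the factor variance fixed to 1 for identification; the function name `one_factor_df` is hypothetical.

```python
def one_factor_df(n_items: int, tau_equivalent: bool = False) -> int:
    """Degrees of freedom of a one-factor CFA (covariance structure only).

    Assumes the factor variance is fixed to 1, so the free parameters are
    the factor loadings plus one residual variance per item.
    """
    # Unique variances and covariances among the observed items.
    observed_moments = n_items * (n_items + 1) // 2
    # A tau-equivalent model constrains all loadings to a single value.
    free_loadings = 1 if tau_equivalent else n_items
    free_residual_variances = n_items
    return observed_moments - (free_loadings + free_residual_variances)


print(one_factor_df(3))                       # congeneric, three items
print(one_factor_df(3, tau_equivalent=True))  # tau-equivalent, three items
print(one_factor_df(4))                       # congeneric, four items
```

Under these assumptions the three-item congeneric model is just-identified (6 moments, 6 parameters), while equating the loadings frees 2 degrees of freedom. A four-item congeneric model also has 2 degrees of freedom, which is consistent with the recommendation to keep at least four items.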
4
Alsubheen SA, Oliveira A, Habash R, Goldstein R, Brooks D. Systematic review of psychometric properties and cross-cultural adaptation of the University of California and Los Angeles loneliness scale in adults. CURRENT PSYCHOLOGY 2021; 42:1-15. [PMID: 34785877 PMCID: PMC8586628 DOI: 10.1007/s12144-021-02494-w] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Accepted: 11/08/2021] [Indexed: 12/30/2022]
Abstract
This systematic review assessed the psychometric properties and the cross-cultural adaptation of the University of California and Los Angeles Loneliness scale (UCLA-LS) in adults. A systematic search of four electronic databases (PubMed, EMBASE, Scopus, and PsycINFO) was conducted from inception until March 2021. We followed the Consensus-Based Standards for the Selection of Health Measurement Instruments (COSMIN) guidelines for data extraction and evidence synthesis. Eighty-one studies assessed the validity and reliability of the UCLA-LS, translated into many languages, and applied across several countries/societies. Three versions of the 20-item and nine short versions of the UCLA-LS with 3 to 20 questions were identified. High-quality evidence supported the internal structure of the UCLAs: 4, 6, 7 and 10, while low- to moderate-quality evidence supported the construct validity of the UCLAs: 3, 4, 6, 8, 16 and 20. Moderate-quality evidence supported the test-retest reliability of version 3 UCLA-20 with excellent interclass coefficient values of 0.76-0.93. The UCLAs: 4, 6, 7 and 10 had the most robust internal structure and may therefore be the most useful for informing clinicians and social psychologists engaged in assisting those with loneliness. Supplementary Information: The online version contains supplementary material available at 10.1007/s12144-021-02494-w.
Affiliation(s)
- Sanaa A. Alsubheen
  - School of Rehabilitation Science, McMaster University, 1400 Main Street West, IAHS Building Room 430, Hamilton, ON L8S 1C7 Canada
- Ana Oliveira
  - School of Rehabilitation Science, McMaster University, 1400 Main Street West, IAHS Building Room 430, Hamilton, ON L8S 1C7 Canada
  - Department of Respiratory Medicine, West Park Healthcare Centre, Toronto, ON Canada
  - Lab3R – Respiratory Research and Rehabilitation Laboratory, School of Health Sciences, University of Aveiro (ESSUA), Aveiro, Portugal
  - Institute for Biomedicine (iBiMED), University of Aveiro, Aveiro, Portugal
- Razanne Habash
  - Department of Respiratory Medicine, West Park Healthcare Centre, Toronto, ON Canada
- Roger Goldstein
  - Department of Respiratory Medicine, West Park Healthcare Centre, Toronto, ON Canada
  - Faculty of Medicine, University of Toronto, Toronto, ON Canada
- Dina Brooks
  - School of Rehabilitation Science, McMaster University, 1400 Main Street West, IAHS Building Room 430, Hamilton, ON L8S 1C7 Canada
  - Department of Respiratory Medicine, West Park Healthcare Centre, Toronto, ON Canada
  - Department of Physical Therapy and Rehabilitation Science, University of Toronto, Toronto, ON Canada
5
Dosovitsky G, Kim E, Bunge EL. Psychometric Properties of a Chatbot Version of the PHQ-9 With Adults and Older Adults. Front Digit Health 2021; 3:645805. [PMID: 34713116 PMCID: PMC8522018 DOI: 10.3389/fdgth.2021.645805] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/24/2020] [Accepted: 04/06/2021] [Indexed: 12/20/2022]
Abstract
Background: The Patient Health Questionnaire-9 (PHQ-9) is a brief, validated measure of depression. A chatbot version of the PHQ-9 would allow depressive symptoms to be assessed remotely, at large scale and low cost. Objective: The current study aims to assess the feasibility of administering the PHQ-9 via chatbot in a sample of adults and older adults, report its psychometric properties, and identify the relationship between demographic variables and PHQ-9 total scores. Methods: A sample of 3,902 adults and older adults in the US and Canada was recruited through Facebook from August 2019 to February 2020 to complete the PHQ-9 using a chatbot. Results: A total of 3,895 (99.82%) completed the PHQ-9 successfully. The internal consistency of the PHQ-9 was 0.896 (p < 0.05). A one-factor structure was found to have good model fit [χ² (27, N = 1,948) = 365.396, p < 0.001; RMSEA = 0.080 (90% CI: 0.073, 0.088); CFI and TLI were 0.925 and 0.900, respectively, and SRMR was 0.039]. All of the demographic characteristics in this study were found to significantly predict PHQ-9 total score; however, their effects were negligible to weak. Conclusions: A large sample of adults and older adults, including those over 75, were open to completing assessments via chatbot. The psychometric properties of the chatbot version of the PHQ-9 provide initial support for the use of this assessment method.
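The RMSEA reported in this abstract can be reproduced from the chi-square statistic, its degrees of freedom, and the sample size. This is a minimal sketch, not the authors' code. It assumes the common single-group formula sqrt(max(chi-square - df, 0) / (df * (N - 1))); some software divides by N rather than N - 1. The function name `rmsea` is hypothetical.

```python
import math


def rmsea(chi_square: float, df: int, n: int) -> float:
    """Root mean square error of approximation for a single-group model.

    Assumes the (N - 1) convention in the denominator; the estimate is
    truncated at zero when chi-square is smaller than its degrees of freedom.
    """
    if df <= 0 or n <= 1:
        raise ValueError("df must be positive and n must exceed 1")
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


# Values taken from the abstract: chi-square = 365.396, df = 27, N = 1,948.
value = rmsea(365.396, 27, 1948)
print(f"{value:.3f}")  # 0.080, matching the reported RMSEA
```

With a sample this large, the choice between N and N - 1 does not change the value at three decimal places.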
Affiliation(s)
- Gilly Dosovitsky
  - Psychology Department, Palo Alto University, Palo Alto, CA, United States
- Erick Kim
  - Psychology Department, Palo Alto University, Palo Alto, CA, United States
- Eduardo L Bunge
  - Psychology Department, Palo Alto University, Palo Alto, CA, United States