1
Cheung JC, Ho SS. Explainable AI and trust: How news media shapes public support for AI-powered autonomous passenger drones. Public Understanding of Science 2025; 34:344-362. [PMID: 39651735] [DOI: 10.1177/09636625241291192]
Abstract
This study examines the relationships between attention to AI in news media, perceived AI explainability, trust in AI, and public support for autonomous passenger drones. Using structural equation modelling (N = 1,002), we found significant associations between perceived AI explainability and all trust dimensions (i.e., performance, purpose, process). We also found that the public acquired the perception of AI explainability through attention to AI in the news media. However, when it came to support for autonomous passenger drones, only the performance dimension of trust was relevant. Our findings underscore the importance of ensuring explainability for the public and highlight the pivotal role of news media in shaping public perceptions of emerging AI technologies. Theoretical and practical implications are discussed.
Affiliation(s)
- Justin C Cheung
- Wee Kim Wee School of Communication and Information, Nanyang Technological University, Singapore
- Campus for Research Excellence and Technological Enterprise, Singapore
- Shirley S Ho
- Wee Kim Wee School of Communication and Information, Nanyang Technological University, Singapore
- Campus for Research Excellence and Technological Enterprise, Singapore
2
Lammert JM, Roberts AC, McRae K, Batterink LJ, Butler BE. Early Identification of Language Disorders Using Natural Language Processing and Machine Learning: Challenges and Emerging Approaches. Journal of Speech, Language, and Hearing Research 2025; 68:705-718. [PMID: 39787490] [DOI: 10.1044/2024_jslhr-24-00515]
Abstract
PURPOSE: Recent advances in artificial intelligence provide opportunities to capture and represent complex features of human language in a more automated manner, offering potential means of improving the efficiency of language assessment. This review article presents computerized approaches for the analysis of narrative language and identification of language disorders in children.
METHOD: We first describe the current barriers to clinicians' use of language sample analysis, narrative language sampling approaches, and the data processing stages that precede analysis. We then present recent studies demonstrating the automated extraction of linguistic features and identification of developmental language disorder using natural language processing and machine learning. We explain how these tools operate and emphasize how the decisions made in their construction impact their performance in important ways, especially in the analysis of child language samples. We conclude with a discussion of major challenges in the field with respect to bias, access, and generalizability across settings and applications.
CONCLUSION: Given the progress that has occurred over the last decade, computer-automated approaches offer a promising opportunity to improve the efficiency and accessibility of language sample analysis and expedite the diagnosis and treatment of language disorders in children.
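Two of the simplest linguistic features such tools extract from a language sample can be computed in a few lines. The sketch below is purely illustrative (the utterances are invented, and mean length of utterance is counted in words rather than morphemes for brevity):

```python
# Toy illustration (not from the article): extracting two classic linguistic
# features from a narrative language sample - mean length of utterance (MLU,
# in words here rather than morphemes) and type-token ratio (TTR).
sample = [
    "the dog ran away",
    "he was sad",
    "then the boy found the dog",
]

tokens = [w.lower() for utterance in sample for w in utterance.split()]
mlu = len(tokens) / len(sample)       # average words per utterance
ttr = len(set(tokens)) / len(tokens)  # lexical diversity

print(f"MLU = {mlu:.2f}, TTR = {ttr:.2f}")
```

Production systems compute dozens of such features from transcribed samples before feeding them to a classifier; this snippet only shows the general shape of the feature-extraction step.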
Affiliation(s)
- Jessica M Lammert
- Graduate Program in Psychology, University of Western Ontario, London, Canada
- Angela C Roberts
- School of Communication Sciences and Disorders, University of Western Ontario, London, Canada
- Department of Computer Science, University of Western Ontario, London, Canada
- Ken McRae
- Department of Psychology, University of Western Ontario, London, Canada
- Centre for Brain and Mind, University of Western Ontario, London, Canada
- Laura J Batterink
- Department of Psychology, University of Western Ontario, London, Canada
- Centre for Brain and Mind, University of Western Ontario, London, Canada
- Blake E Butler
- Department of Psychology, University of Western Ontario, London, Canada
- Centre for Brain and Mind, University of Western Ontario, London, Canada
- National Centre for Audiology, University of Western Ontario, London, Canada
3
Aissaoui Ferhi L, Ben Amar M, Choubani F, Bouallegue R. Enhancing diagnostic accuracy in symptom-based health checkers: a comprehensive machine learning approach with clinical vignettes and benchmarking. Front Artif Intell 2024; 7:1397388. [PMID: 39421435] [PMCID: PMC11483353] [DOI: 10.3389/frai.2024.1397388]
Abstract
Introduction: The development of machine learning models for symptom-based health checkers is a rapidly evolving area with significant implications for healthcare. Accurate and efficient diagnostic tools can enhance patient outcomes and optimize healthcare resources. This study focuses on evaluating and optimizing machine learning models using a dataset of 10 diseases and 9,572 samples.
Methods: The dataset was divided into training and testing sets to facilitate model training and evaluation. The following models were selected and optimized: Decision Tree, Random Forest, Naive Bayes, Logistic Regression, and K-Nearest Neighbors. Evaluation metrics included accuracy, F1 scores, and 10-fold cross-validation. ROC-AUC and precision-recall curves were also used to assess model performance, particularly in scenarios with imbalanced datasets. Clinical vignettes were employed to gauge the real-world applicability of the models.
Results: ROC-AUC curves revealed that model performance improved with increasing complexity. Precision-recall curves were particularly useful in evaluating model sensitivity in imbalanced dataset scenarios. Clinical vignettes demonstrated the robustness of the models in providing accurate diagnoses.
Discussion: The study underscores the importance of comprehensive model evaluation techniques. Clinical vignette testing and analysis of ROC-AUC and precision-recall curves are crucial in ensuring the reliability and sensitivity of symptom-based health checkers. These techniques provide a more nuanced understanding of model performance and highlight areas for further improvement.
Conclusion: This study highlights the significance of employing diverse evaluation metrics and methods to ensure the robustness and accuracy of machine learning models in symptom-based health checkers. The integration of clinical vignettes and the analysis of ROC-AUC and precision-recall curves are essential steps in developing reliable and sensitive diagnostic tools.
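The evaluation pipeline this abstract describes can be sketched roughly as follows. This is not the authors' code: it uses scikit-learn with a small synthetic dataset standing in for the study's 9,572-sample, 10-disease dataset, and the model settings and feature counts are assumptions:

```python
# Illustrative sketch: evaluating the five named classifiers with the metrics
# from the abstract (accuracy, macro F1, one-vs-rest ROC-AUC, 10-fold CV).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the symptom/disease dataset (10 classes).
X, y = make_classification(n_samples=2000, n_features=30, n_informative=12,
                           n_classes=10, n_clusters_per_class=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "K-Nearest Neighbors": KNeighborsClassifier(),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "macro_f1": f1_score(y_test, pred, average="macro"),
        # One-vs-rest ROC-AUC across the 10 classes.
        "roc_auc": roc_auc_score(y_test, model.predict_proba(X_test),
                                 multi_class="ovr"),
        # 10-fold cross-validated accuracy on the training split.
        "cv_accuracy": cross_val_score(model, X_train, y_train, cv=10).mean(),
    }

for name, m in results.items():
    print(f"{name}: acc={m['accuracy']:.2f} f1={m['macro_f1']:.2f} "
          f"auc={m['roc_auc']:.2f} cv={m['cv_accuracy']:.2f}")
```

Per-class precision-recall curves for the imbalanced-data analysis could be added in the same loop via `sklearn.metrics.precision_recall_curve`.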
Affiliation(s)
- Leila Aissaoui Ferhi
- Virtual University of Tunis, Tunis, Tunisia
- Innov’Com Laboratory at SUPCOM, University of Carthage, Carthage, Tunisia
- Manel Ben Amar
- Virtual University of Tunis, Tunis, Tunisia
- Innov’Com Laboratory at SUPCOM, University of Carthage, Carthage, Tunisia
- Faculty of Dental Medicine of Monastir, University of Monastir, Monastir, Tunisia
- Fethi Choubani
- Innov’Com Laboratory at SUPCOM, University of Carthage, Carthage, Tunisia
- Ridha Bouallegue
- Innov’Com Laboratory at SUPCOM, University of Carthage, Carthage, Tunisia
4
Subramanian HV, Canfield C, Shank DB. Designing explainable AI to improve human-AI team performance: A medical stakeholder-driven scoping review. Artif Intell Med 2024; 149:102780. [PMID: 38462282] [DOI: 10.1016/j.artmed.2024.102780]
Abstract
The rise of complex AI systems in healthcare and other sectors has led to a growing area of research called Explainable AI (XAI), designed to increase transparency. In this area, quantitative and qualitative studies focus on improving user trust and task performance by providing system- and prediction-level XAI features. We analyze stakeholder engagement events (interviews and workshops) on the use of AI for kidney transplantation. From these, we identify themes that we use to frame a scoping literature review on current XAI features. The stakeholder engagement process lasted over nine months, covering three stakeholder groups' workflows, determining where AI could intervene, and assessing a mock XAI decision support system. Based on the stakeholder engagement, we identify four major themes relevant to designing XAI systems: (1) use of AI predictions, (2) information included in AI predictions, (3) personalization of AI predictions for individual differences, and (4) customizing AI predictions for specific cases. Using these themes, our scoping literature review finds that providing AI predictions before, during, or after decision-making could be beneficial depending on the complexity of the stakeholder's task. Additionally, expert stakeholders like surgeons prefer minimal to no XAI features, AI prediction, and uncertainty estimates for easy use cases. However, almost all stakeholders prefer to have optional XAI features to review when needed, especially in hard-to-predict cases. The literature also suggests that providing both system- and prediction-level information is necessary to build the user's mental model of the system appropriately. Although XAI features improve users' trust in the system, human-AI team performance is not always enhanced. Overall, stakeholders prefer to have agency over the XAI interface to control the level of information based on their needs and task complexity. We conclude with suggestions for future research, especially on customizing XAI features based on preferences and tasks.
Affiliation(s)
- Harishankar V Subramanian
- Engineering Management & Systems Engineering, Missouri University of Science and Technology, 600 W 14th Street, Rolla, MO 65409, United States of America
- Casey Canfield
- Engineering Management & Systems Engineering, Missouri University of Science and Technology, 600 W 14th Street, Rolla, MO 65409, United States of America
- Daniel B Shank
- Psychological Science, Missouri University of Science and Technology, 500 W 14th Street, Rolla, MO 65409, United States of America
5
Wetzel AJ, Koch R, Koch N, Klemmt M, Müller R, Preiser C, Rieger M, Rösel I, Ranisch R, Ehni HJ, Joos S. 'Better see a doctor?' Status quo of symptom checker apps in Germany: A cross-sectional survey with a mixed-methods design (CHECK.APP). Digit Health 2024; 10:20552076241231555. [PMID: 38434790] [PMCID: PMC10908232] [DOI: 10.1177/20552076241231555]
Abstract
Background: Symptom checker apps (SCAs) offer symptom classification and low-threshold self-triage for laypeople. They are already in use despite their poor accuracy and concerns that they may negatively affect primary care. This study assesses the extent to which SCAs are used by medical laypeople in Germany and which software is most popular. We examined associations between satisfaction with the general practitioner (GP) and SCA use, as well as between the number of GP visits and SCA use. Furthermore, we assessed the reasons for intentional non-use.
Methods: We conducted a survey comprising standardised and open-ended questions. Quantitative data were weighted, and open-ended responses were examined using thematic analysis.
Results: This study included 850 participants. The SCA usage rate was 8%, and approximately 50% of SCA non-users were uninterested in trying SCAs. The most commonly used SCAs were NetDoktor and Ada. Surprisingly, SCAs were most frequently used in the age group of 51-55 years. No significant associations were found between SCA usage and satisfaction with the GP, or between the number of GP visits and SCA usage. Thematic analysis revealed skepticism regarding the results and recommendations of SCAs, and discrepancies between users' requirements and the features of the apps.
Conclusion: SCAs are still widely unknown in the German population and have been sparsely used so far. Many participants were not interested in trying SCAs, and we found no positive or negative associations between SCA use and primary care.
Affiliation(s)
- Anna-Jasmin Wetzel
- Institute of General Practice and Interprofessional Care, University Hospital Tübingen, Tübingen, Germany
- Roland Koch
- Institute of General Practice and Interprofessional Care, University Hospital Tübingen, Tübingen, Germany
- Nadine Koch
- Institute of Software Engineering, University of Stuttgart, Stuttgart, Germany
- Malte Klemmt
- Institute of Applied Social Science, University of Applied Science Würzburg-Schweinfurt, Würzburg, Germany
- Regina Müller
- Institute of Philosophy, University of Bremen, Bremen, Germany
- Christine Preiser
- Institute of Occupational and Social Medicine and Health Services Research, University Hospital Tübingen, Tübingen, Germany
- Monika Rieger
- Institute of Occupational and Social Medicine and Health Services Research, University Hospital Tübingen, Tübingen, Germany
- Inka Rösel
- Institute of Clinical Epidemiology and Applied Biometry, University Hospital Tübingen, Tübingen, Germany
- Robert Ranisch
- Faculty of Health Sciences, University of Potsdam, Potsdam, Germany
- Hans-Jörg Ehni
- Institute of Ethics and History of Medicine, University Hospital Tübingen, Tübingen, Germany
- Stefanie Joos
- Institute of General Practice and Interprofessional Care, University Hospital Tübingen, Tübingen, Germany
6
Love CS. "Just the Facts Ma'am": Moral and Ethical Considerations for Artificial Intelligence in Medicine and its Potential to Impact Patient Autonomy and Hope. Linacre Quarterly 2023; 90:375-394. [PMID: 37974568] [PMCID: PMC10638968] [DOI: 10.1177/00243639231162431]
Abstract
Applying machine-based learning and synthetic cognition, commonly referred to as artificial intelligence (AI), to medicine intimates prescient knowledge. The ability of these algorithms to potentially unlock secrets held within vast data sets makes them invaluable to healthcare. Complex computer algorithms are routinely used to enhance diagnoses in fields like oncology, cardiology, and neurology. These algorithms have found utility in making healthcare decisions that are often complicated by seemingly endless relationships between exogenous and endogenous variables. They have also found utility in the allocation of limited healthcare resources and the management of end-of-life issues. With the increase in computing power and the ability to test a virtually unlimited number of relationships, scientists and engineers have the unprecedented ability to increase the prognostic confidence that comes from complex data analysis. While these systems present exciting opportunities for the democratization and precision of healthcare, their use raises important moral and ethical considerations around Christian concepts of autonomy and hope. The purpose of this essay is to explore some of the practical limitations associated with AI in medicine and discuss some of the potential theological implications that machine-generated diagnoses may present. Specifically, this article examines how these systems may disrupt the patient and healthcare provider relationship emblematic of Christ's healing mission. Finally, this article seeks to offer insights that might help in the development of a more robust ethical framework for the application of these systems in the future.
7
Stepin I, Alonso-Moral JM, Catala A, Pereira-Fariña M. An empirical study on how humans appreciate automated counterfactual explanations which embrace imprecise information. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.10.098]
8
Shen Y, Xu W, Liang A, Wang X, Lu X, Lu Z, Gao C. Online health management continuance and the moderating effect of service type and age difference: A meta-analysis. Health Informatics J 2022; 28:14604582221119950. [PMID: 35976977] [DOI: 10.1177/14604582221119950]
Abstract
Numerous empirical studies have explored the factors underlying continued use of online health management, but their results have not been consistent. We therefore conducted a meta-analysis to identify influential factors and potential moderators. A systematic literature search was performed in nine databases (PubMed, Web of Science, the Cochrane Library, Ovid of JBI, CINAHL, Embase, CNKI, VIP, and CBM) for studies published up to December 2020 in English or Chinese. Meta-analysis of combined effect sizes, heterogeneity, moderator analysis, publication bias assessment, and inter-rater reliability was conducted. In total, 41 studies and 12 pairwise relationships were identified. Confirmation, perceived usefulness, satisfaction, information quality, service quality, perceived ease of use, and trust were all critical predictors. Service type and age difference each showed moderating effects: perceived usefulness was more influential in medical services than in health and fitness services, and trust was more influential among young adults. The results confirmed the validity and robustness of the Expectation Confirmation Model, the Information Systems Success Model, and trust theory in online health management continuance. Moderators include but are not limited to age difference and service type. Future research should examine older adults in the healthcare context and apply other analytical methods, such as qualitative comparative analysis.
Affiliation(s)
- Yucong Shen
- School of Nursing, Wenzhou Medical University, China
- Wenxian Xu
- School of Nursing, Wenzhou Medical University, China
- Andong Liang
- School of Nursing, Wenzhou Medical University, China
- Xinlu Wang
- School of Nursing, Wenzhou Medical University, China
- Xueqin Lu
- Department of Endocrinology, The First Affiliated Hospital of Wenzhou Medical University, China
- Zhongqiu Lu
- Department of Emergency, The First Affiliated Hospital of Wenzhou Medical University, China
- Chenchen Gao
- School of Nursing, Wenzhou Medical University, China
9
Artificial agents’ explainability to support trust: considerations on timing and context. AI & Society 2022. [DOI: 10.1007/s00146-022-01462-7]
Abstract
Strategies for improving the explainability of artificial agents are a key approach to support the understandability of artificial agents’ decision-making processes and their trustworthiness. However, since explanations are not inclined to standardization, finding solutions that fit the algorithmic-based decision-making processes of artificial agents poses a compelling challenge. This paper addresses the concept of trust in relation to complementary aspects that play a role in interpersonal and human–agent relationships, such as users’ confidence and their perception of artificial agents’ reliability. Particularly, this paper focuses on non-expert users’ perspectives, since users with little technical knowledge are likely to benefit the most from “post-hoc”, everyday explanations. Drawing upon the explainable AI and social sciences literature, this paper investigates how artificial agents’ explainability and trust are interrelated at different stages of an interaction. Specifically, the possibility of implementing explainability as a trust building, trust maintenance, and trust restoration strategy is investigated. To this end, the paper identifies and discusses the intrinsic limits and fundamental features of explanations, such as structural qualities and communication strategies. Accordingly, this paper contributes to the debate by providing recommendations on how to maximize the effectiveness of explanations for supporting non-expert users’ understanding and trust.
10
Schmieding ML, Kopka M, Schmidt K, Schulz-Niethammer S, Balzer F, Feufel MA. Triage Accuracy of Symptom Checker Apps: 5-Year Follow-up Evaluation. J Med Internet Res 2022; 24:e31810. [PMID: 35536633] [PMCID: PMC9131144] [DOI: 10.2196/31810]
Abstract
BACKGROUND: Symptom checkers are digital tools assisting laypersons in self-assessing the urgency and potential causes of their medical complaints. They are widely used but face concerns from both patients and health care professionals, especially regarding their accuracy. A 2015 landmark study substantiated these concerns using case vignettes to demonstrate that symptom checkers commonly err in their triage assessment.
OBJECTIVE: This study aims to revisit the landmark index study to investigate whether and how symptom checkers' capabilities have evolved since 2015 and how they currently compare with laypersons' stand-alone triage appraisal.
METHODS: In early 2020, we searched for smartphone and web-based applications providing triage advice. We evaluated these apps on the same 45 case vignettes as the index study. Using descriptive statistics, we compared our findings with those of the index study and with publicly available data on laypersons' triage capability.
RESULTS: We retrieved 22 symptom checkers providing triage advice. The median triage accuracy in 2020 (55.8%, IQR 15.1%) was close to that in 2015 (59.1%, IQR 15.5%). The apps in 2020 were less risk averse (odds 1.11:1, the ratio of overtriage errors to undertriage errors) than those in 2015 (odds 2.82:1), missing >40% of emergencies. Few apps outperformed laypersons in either deciding whether emergency care was required or whether self-care was sufficient. No apps outperformed the laypersons on both decisions.
CONCLUSIONS: Triage performance of symptom checkers has, on average, not improved over the course of 5 years. It decreased in 2 use cases (advice on when emergency care is required and when no health care is needed for the moment). However, triage capability varies widely within the sample of symptom checkers. Whether it is beneficial to seek advice from symptom checkers depends on the app chosen and on the specific question to be answered. Future research should develop resources (eg, case vignette repositories) to audit the capabilities of symptom checkers continuously and independently and provide guidance on when and to whom they should be recommended.
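The two headline numbers in this abstract, triage accuracy over case vignettes and the overtriage-to-undertriage odds, reduce to a simple calculation. The sketch below uses invented toy labels, not the study's data:

```python
# Toy illustration of vignette-based triage metrics. Urgency is ordinal:
# 0 = self-care, 1 = non-emergency care, 2 = emergency care.
gold =   [2, 2, 1, 1, 1, 0, 0, 0, 2, 1]  # correct vignette solutions (invented)
advice = [2, 1, 1, 2, 1, 0, 1, 0, 2, 0]  # one symptom checker's advice (invented)

correct = sum(a == g for a, g in zip(advice, gold))
accuracy = correct / len(gold)

overtriage = sum(a > g for a, g in zip(advice, gold))   # advice too cautious
undertriage = sum(a < g for a, g in zip(advice, gold))  # advice too lax

print(f"triage accuracy: {accuracy:.1%}")
print(f"overtriage:undertriage odds = {overtriage}:{undertriage}")
```

An odds ratio above 1 (as in both the 2015 and 2020 samples) means the apps err toward caution more often than toward missing urgent cases.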
Affiliation(s)
- Malte L Schmieding
- Institute of Medical Informatics, Charité - Universitätsmedizin Berlin, Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Marvin Kopka
- Institute of Medical Informatics, Charité - Universitätsmedizin Berlin, Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Cognitive Psychology and Ergonomics, Department of Psychology and Ergonomics, Technische Universität Berlin, Berlin, Germany
- Konrad Schmidt
- Institute of General Practice and Family Medicine, Jena University Hospital, Jena, Germany
- Institute of General Practice and Family Medicine, Charité - Universitätsmedizin Berlin, Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Sven Schulz-Niethammer
- Division of Ergonomics, Department of Psychology and Ergonomics, Technische Universität Berlin, Berlin, Germany
- Felix Balzer
- Institute of Medical Informatics, Charité - Universitätsmedizin Berlin, Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Markus A Feufel
- Division of Ergonomics, Department of Psychology and Ergonomics, Technische Universität Berlin, Berlin, Germany
11
Kopka M, Schmieding ML, Rieger T, Roesler E, Balzer F, Feufel MA. Determinants of Laypersons' Trust in Medical Decision Aids: Randomized Controlled Trial. JMIR Hum Factors 2022; 9:e35219. [PMID: 35503248] [PMCID: PMC9115664] [DOI: 10.2196/35219]
Abstract
BACKGROUND: Symptom checker apps are patient-facing decision support systems aimed at providing advice to laypersons on whether, where, and how to seek health care (disposition advice). Such advice can improve laypersons' self-assessment and ultimately improve medical outcomes. Past research has mainly focused on the accuracy of symptom checker apps' suggestions. To support decision-making, such apps need to provide not only accurate but also trustworthy advice. To date, only few studies have addressed the extent to which laypersons trust symptom checker app advice or the factors that moderate their trust. Studies on general decision support systems have shown that framing automated systems (anthropomorphic or emphasizing expertise), for example, by using icons symbolizing artificial intelligence (AI), affects users' trust.
OBJECTIVE: This study aims to identify the factors influencing laypersons' trust in the advice provided by symptom checker apps. Primarily, we investigated whether designs using anthropomorphic framing or framing the app as an AI increase users' trust compared with no such framing.
METHODS: Through a web-based survey, we recruited 494 US residents with no professional medical training. The participants first had to appraise the urgency of a fictitious patient description (case vignette). Subsequently, a decision aid (mock symptom checker app) provided disposition advice contradicting the participants' appraisal, and they then had to reappraise the vignette. Participants were randomized into 3 groups: 2 experimental groups using visual framing (anthropomorphic, 160/494, 32.4%, vs AI, 161/494, 32.6%) and a neutral group without such framing (173/494, 35%).
RESULTS: Most participants (384/494, 77.7%) followed the decision aid's advice, regardless of its urgency level. Neither anthropomorphic framing (odds ratio 1.120, 95% CI 0.664-1.897) nor framing as AI (odds ratio 0.942, 95% CI 0.565-1.570) increased behavioral or subjective trust (P=.99) compared with the no-frame condition. Even participants who were extremely certain in their own decisions (ie, 100% certain) commonly changed them in favor of the symptom checker's advice (19/34, 56%). Propensity to trust and eHealth literacy were associated with increased subjective trust in the symptom checker (propensity to trust b=0.25; eHealth literacy b=0.2), whereas sociodemographic variables showed no such link with either subjective or behavioral trust.
CONCLUSIONS: Contrary to our expectation, neither the anthropomorphic framing nor the emphasis on AI increased trust in symptom checker advice compared with a neutral control condition. However, independent of the interface, most participants trusted the mock app's advice, even when they were very certain of their own assessment. Thus, the question arises as to whether laypersons use such symptom checkers as substitutes rather than as aids in their own decision-making. With trust in symptom checkers already high at baseline, the benefit of symptom checkers depends on interface designs that enable users to adequately calibrate their trust levels during usage.
TRIAL REGISTRATION: Deutsches Register Klinischer Studien DRKS00028561; https://tinyurl.com/rv4utcfb (retrospectively registered).
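For readers unfamiliar with the statistic, an odds ratio like those reported in this abstract can be derived from a 2x2 table of "followed the advice" by condition. The counts below are invented for illustration (the study itself estimated its ORs within a regression model):

```python
# Illustrative only: odds ratio for following the advice, framed vs neutral
# group. Counts are invented, not taken from the study.
framed_followed, framed_total = 126, 160
neutral_followed, neutral_total = 134, 173

odds_framed = framed_followed / (framed_total - framed_followed)
odds_neutral = neutral_followed / (neutral_total - neutral_followed)
odds_ratio = odds_framed / odds_neutral  # ~1 means framing made no difference

print(f"odds ratio = {odds_ratio:.3f}")
```

An odds ratio near 1 with a confidence interval spanning 1, as in both framing conditions above, indicates no detectable framing effect.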
Affiliation(s)
- Marvin Kopka
- Institute of Medical Informatics, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Cognitive Psychology and Ergonomics, Department of Psychology and Ergonomics (IPA), Technische Universität Berlin, Berlin, Germany
- Malte L Schmieding
- Institute of Medical Informatics, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Tobias Rieger
- Work, Engineering and Organizational Psychology, Department of Psychology and Ergonomics (IPA), Technische Universität Berlin, Berlin, Germany
- Eileen Roesler
- Work, Engineering and Organizational Psychology, Department of Psychology and Ergonomics (IPA), Technische Universität Berlin, Berlin, Germany
- Felix Balzer
- Institute of Medical Informatics, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Markus A Feufel
- Division of Ergonomics, Department of Psychology and Ergonomics (IPA), Technische Universität Berlin, Berlin, Germany
12
Funnell EL, Spadaro B, Benacek J, Martin-Key NA, Metcalfe T, Olmert T, Barton-Owen G, Bahn S. Learnings from user feedback of a novel digital mental health assessment. Front Psychiatry 2022; 13:1018095. [PMID: 36339864] [PMCID: PMC9630572] [DOI: 10.3389/fpsyt.2022.1018095]
Abstract
Digital mental health interventions (DMHIs) have the potential to address barriers to face-to-face mental healthcare. In particular, digital mental health assessments offer the opportunity to increase access, reduce strain on services, and improve identification. Despite the potential of DMHIs, there remains a high drop-out rate. Therefore, investigating user feedback may elucidate how best to design and deliver an engaging digital mental health assessment. The current study aimed to understand the perspectives of 1,304 users on (1) a newly developed digital mental health assessment, to determine which features users consider to be positive or negative, and (2) the Composite International Diagnostic Interview (CIDI) employed in a previous large-scale pilot study. A thematic analysis method was employed to identify themes in feedback to three question prompts related to: (1) the questions included in the digital assessment, (2) the homepage design and reminders, and (3) the assessment results report. The largest proportion of the positive and negative feedback regarding the questions included in the assessment (n = 706) focused on the quality of the assessment (n = 183, 25.92% and n = 284, 40.23%, respectively). Feedback for the homepage and reminders (n = 671) was overwhelmingly positive, with the two largest themes being positive usability (i.e., ease of use; n = 500, 74.52%) and functionality (i.e., reminders; n = 278, 41.43%). The most frequently identified negative theme in results report feedback (n = 794) related to the report content (n = 309, 38.92%), with users stating it lacked in-depth information. Nevertheless, the most frequent positive theme in the results report feedback related to wellbeing outcomes (n = 145, 18.26%), with users stating that the results report, albeit brief, encouraged them to seek professional support. Interestingly, despite some negative feedback, most users reported that completing the digital mental health assessment had been worthwhile (n = 1,017, 77.99%). Based on these findings, we offer recommendations to address potential barriers to user engagement with digital mental health assessments. In summary, we recommend undertaking extensive co-design activities during the development of digital assessment tools, flexibility in answering modalities within the digital assessment, customizable additional features such as reminders, transparency of diagnostic decision-making, and an actionable results report with personalized mental health resources.
Collapse
Affiliation(s)
- Erin Lucy Funnell
- Cambridge Centre for Neuropsychiatric Research, Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge, United Kingdom.,Psyomics Ltd., Cambridge, United Kingdom
| | - Benedetta Spadaro
- Cambridge Centre for Neuropsychiatric Research, Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge, United Kingdom
| | - Jiri Benacek
- Cambridge Centre for Neuropsychiatric Research, Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge, United Kingdom
| | - Nayra A Martin-Key
- Cambridge Centre for Neuropsychiatric Research, Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge, United Kingdom
| | - Tim Metcalfe
- Independent Researcher, Cambridge, United Kingdom
| | - Tony Olmert
- Cambridge Centre for Neuropsychiatric Research, Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge, United Kingdom
| | | | | |
Collapse
|
13
|
Woodcock C, Mittelstadt B, Busbridge D, Blank G. The Impact of Explanations on Layperson Trust in Artificial Intelligence-Driven Symptom Checker Apps: Experimental Study. J Med Internet Res 2021; 23:e29386. [PMID: 34730544 PMCID: PMC8600426 DOI: 10.2196/29386] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/06/2021] [Revised: 07/11/2021] [Accepted: 07/27/2021] [Indexed: 01/22/2023] Open
Abstract
BACKGROUND Artificial intelligence (AI)-driven symptom checkers are available to millions of users globally and are advocated as a tool to deliver health care more efficiently. To achieve the promoted benefits of a symptom checker, laypeople must trust and subsequently follow its instructions. In AI, explanations are seen as a tool to communicate the rationale behind black-box decisions to encourage trust and adoption. However, the effectiveness of the types of explanations used in AI-driven symptom checkers has not yet been studied. Explanations can take many forms, including why-explanations and how-explanations. Social theories suggest that why-explanations are better at communicating knowledge and cultivating trust among laypeople. OBJECTIVE The aim of this study is to ascertain whether explanations provided by a symptom checker affect explanatory trust among laypeople and whether this trust is impacted by their existing knowledge of disease. METHODS A cross-sectional survey of 750 healthy participants was conducted. The participants were shown a video of a chatbot simulation that resulted in the diagnosis of either a migraine or temporal arteritis, chosen for their differing levels of epidemiological prevalence. These diagnoses were accompanied by one of four types of explanations. Each explanation type was selected either because of its current use in symptom checkers or because it was informed by theories of contrastive explanation. Exploratory factor analysis of participants' responses, followed by comparison-of-means tests, was used to evaluate group differences in trust. RESULTS Depending on the treatment group, two or three variables were generated, reflecting the prior knowledge and subsequent mental model that the participants held. When varying explanation type by disease, migraine was found to be nonsignificant (P=.65) and temporal arteritis marginally significant (P=.09). Varying disease by explanation type resulted in statistical significance for input influence (P=.001), social proof (P=.049), and no explanation (P=.006), with counterfactual explanation approaching significance (P=.053). The results suggest that trust in explanations is significantly affected by the disease being explained. When laypeople have existing knowledge of a disease, explanations have little impact on trust. Where the need for information is greater, different explanation types engender significantly different levels of trust. These results indicate that to be successful, symptom checkers need to tailor explanations to each user's specific question and discount the diseases that they may also be aware of. CONCLUSIONS System builders developing explanations for symptom-checking apps should consider the recipient's knowledge of a disease and tailor explanations to each user's specific need. Effort should be placed on generating explanations that are personalized to each user of a symptom checker to fully discount the diseases that they may be aware of and to close their information gap.
Collapse
Affiliation(s)
- Claire Woodcock
- Oxford Internet Institute, University of Oxford, Oxford, United Kingdom
| | - Brent Mittelstadt
- Oxford Internet Institute, University of Oxford, Oxford, United Kingdom
| | | | - Grant Blank
- Oxford Internet Institute, University of Oxford, Oxford, United Kingdom
| |
Collapse
|