1. Desolda G, Dimauro G, Esposito A, Lanzilotti R, Matera M, Zancanaro M. A Human-AI interaction paradigm and its application to rhinocytology. Artif Intell Med 2024; 155:102933. [PMID: 39094227] [DOI: 10.1016/j.artmed.2024.102933]
Abstract
This article explores Human-Centered Artificial Intelligence (HCAI) in medical cytology, focusing on enhancing the interaction with AI. It presents a Human-AI interaction paradigm that emphasizes explainability and user control of AI systems: an iterative negotiation process based on three interaction strategies that (i) elaborate the system's outcomes through iterative steps (Iterative Exploration), (ii) explain the AI system's behavior or decisions (Clarification), and (iii) allow non-expert users to trigger simple retraining of the AI model (Reconfiguration). This interaction paradigm guided the redesign of an existing AI-based tool for microscopic analysis of the nasal mucosa, and the resulting tool was tested with rhinocytologists. The article discusses the results of the evaluation and outlines lessons learned that are relevant for AI in medicine.
Affiliation(s)
- Giuseppe Desolda: Department of Computer Science, University of Bari Aldo Moro, Via E. Orabona 4, Bari, 70125, Italy.
- Giovanni Dimauro: Department of Computer Science, University of Bari Aldo Moro, Via E. Orabona 4, Bari, 70125, Italy.
- Andrea Esposito: Department of Computer Science, University of Bari Aldo Moro, Via E. Orabona 4, Bari, 70125, Italy.
- Rosa Lanzilotti: Department of Computer Science, University of Bari Aldo Moro, Via E. Orabona 4, Bari, 70125, Italy.
- Maristella Matera: Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Via Ponzio 34/5, Milan, 20133, Italy.
- Massimo Zancanaro: Department of Psychology and Cognitive Science, University of Trento, Corso Bettini 31, Rovereto, 38068, Italy; Fondazione Bruno Kessler, Povo, Trento, 38123, Italy.
2. Hogg HDJ, Martindale APL, Liu X, Denniston AK. Clinical Evaluation of Artificial Intelligence-Enabled Interventions. Invest Ophthalmol Vis Sci 2024; 65:10. [PMID: 39106058] [PMCID: PMC11309043] [DOI: 10.1167/iovs.65.10.10]
Abstract
Artificial intelligence (AI) health technologies are increasingly available for use in real-world care. This emerging opportunity is accompanied by a need for decision makers and practitioners across healthcare systems to evaluate the safety and effectiveness of these interventions against the needs of their own setting. To meet this need, high-quality evidence regarding AI-enabled interventions must be made available, and decision makers in varying roles and settings must be empowered to evaluate that evidence within the context in which they work. This article summarizes good practices across four stages of evidence generation for AI health technologies: study design, study conduct, study reporting, and study appraisal.
Affiliation(s)
- H. D. Jeffry Hogg: University Hospitals Birmingham NHS Foundation Trust, Birmingham, United Kingdom; Institute of Applied Health Research, College of Medical and Dental Sciences, University of Birmingham, Birmingham, United Kingdom; NIHR-Supported Incubator in AI & Digital Healthcare, Birmingham, United Kingdom
- Xiaoxuan Liu: University Hospitals Birmingham NHS Foundation Trust, Birmingham, United Kingdom; Institute of Applied Health Research, College of Medical and Dental Sciences, University of Birmingham, Birmingham, United Kingdom; NIHR-Supported Incubator in AI & Digital Healthcare, Birmingham, United Kingdom; National Institute for Health and Care Research (NIHR) Birmingham Biomedical Research Centre, United Kingdom
- Alastair K. Denniston: University Hospitals Birmingham NHS Foundation Trust, Birmingham, United Kingdom; Institute of Applied Health Research, College of Medical and Dental Sciences, University of Birmingham, Birmingham, United Kingdom; NIHR-Supported Incubator in AI & Digital Healthcare, Birmingham, United Kingdom; National Institute for Health and Care Research (NIHR) Birmingham Biomedical Research Centre, United Kingdom
3. Introzzi L, Zonca J, Cabitza F, Cherubini P, Reverberi C. Enhancing human-AI collaboration: The case of colonoscopy. Dig Liver Dis 2024; 56:1131-1139. [PMID: 37940501] [DOI: 10.1016/j.dld.2023.10.018]
Abstract
Diagnostic errors impact patient health and healthcare costs. Artificial Intelligence (AI) shows promise in mitigating this burden by supporting Medical Doctors in decision-making. However, the mere display of excellent or even superhuman performance by AI in specific tasks does not guarantee a positive impact on medical practice. Effective AI assistance should target the primary causes of human errors and foster effective collaborative decision-making with human experts, who remain the ultimate decision-makers. In this narrative review, we apply these principles to the specific scenario of AI assistance during colonoscopy. By unraveling the neurocognitive foundations of the colonoscopy procedure, we identify multiple bottlenecks in perception, attention, and decision-making that contribute to diagnostic errors, shedding light on potential interventions to mitigate them. Furthermore, we explore how existing AI devices fare in clinical practice and whether they achieve optimal integration with the human decision-maker. We argue that to foster optimal Human-AI collaboration, future research should expand our knowledge of factors influencing AI's impact, establish evidence-based cognitive models, and develop training programs based on them. These efforts will enhance human-AI collaboration, ultimately improving diagnostic accuracy and patient outcomes. The principles illuminated in this review hold more general value, extending their relevance to a wide array of medical procedures and beyond.
Affiliation(s)
- Luca Introzzi: Department of Psychology, Università Milano - Bicocca, Milano, Italy
- Joshua Zonca: Department of Psychology, Università Milano - Bicocca, Milano, Italy; Milan Center for Neuroscience, Università Milano - Bicocca, Milano, Italy
- Federico Cabitza: Department of Informatics, Systems and Communication, Università Milano - Bicocca, Milano, Italy; IRCCS Istituto Ortopedico Galeazzi, Milano, Italy
- Paolo Cherubini: Department of Brain and Behavioral Sciences, Università Statale di Pavia, Pavia, Italy
- Carlo Reverberi: Department of Psychology, Università Milano - Bicocca, Milano, Italy; Milan Center for Neuroscience, Università Milano - Bicocca, Milano, Italy.
4. Cabitza F, Natali C, Famiglini L, Campagner A, Caccavella V, Gallazzi E. Never tell me the odds: Investigating pro-hoc explanations in medical decision making. Artif Intell Med 2024; 150:102819. [PMID: 38553159] [DOI: 10.1016/j.artmed.2024.102819]
Abstract
This paper examines a kind of explainable AI centered around what we term pro-hoc explanations: a form of support that offers alternative explanations (one for each possible outcome) instead of a single post-hoc explanation attached to specific advice. Specifically, our support mechanism uses explanation by examples, featuring analogous cases for each category in a binary setting. Pro-hoc explanations are an instance of what we call frictional AI, a general class of decision support aimed at a useful compromise between increasing decision effectiveness and mitigating cognitive risks such as over-reliance, automation bias and deskilling. To illustrate an instance of frictional AI, we conducted an empirical user study investigating its impact on the radiological detection of vertebral fractures in x-rays. Our study engaged 16 orthopedists in a 'human-first, second-opinion' interaction protocol: clinicians first made initial assessments of the x-rays without AI assistance and then provided their final diagnosis after considering the pro-hoc explanations. Our findings indicate that physicians, particularly those with less experience, perceived pro-hoc XAI support as significantly beneficial, even though it did not notably enhance their diagnostic accuracy. However, their increased confidence in final diagnoses suggests a positive overall impact. Given the promisingly high effect size observed, our results advocate for further research into pro-hoc explanations specifically, and into the broader concept of frictional AI.
Affiliation(s)
- Federico Cabitza: Università degli Studi di Milano-Bicocca, Milan, Italy; IRCCS Istituto Ortopedico Galeazzi, Milan, Italy.
- Chiara Natali: Università degli Studi di Milano-Bicocca, Milan, Italy
- Enrico Gallazzi: Istituto Ortopedico Gaetano Pini - ASST Pini-CTO, Milan, Italy
5. Campion JR, O'Connor DB, Lahiff C. Human-artificial intelligence interaction in gastrointestinal endoscopy. World J Gastrointest Endosc 2024; 16:126-135. [PMID: 38577646] [PMCID: PMC10989254] [DOI: 10.4253/wjge.v16.i3.126]
Abstract
The number and variety of applications of artificial intelligence (AI) in gastrointestinal (GI) endoscopy are growing rapidly. New technologies based on machine learning (ML) and convolutional neural networks (CNNs) are at various stages of development and deployment to assist patients and endoscopists in preparing for endoscopic procedures, in detection, diagnosis and classification of pathology during endoscopy, and in confirmation of key performance indicators. Platforms based on ML and CNNs require regulatory approval as medical devices. Interactions between humans and the technologies we use are complex and are influenced by design, behavioural and psychological elements. Because AI differs substantially from prior technologies, important differences may be expected in how we interact with advice from AI technologies. Human-AI interaction (HAII) may be optimised by developing AI algorithms to minimise false positives and designing platform interfaces to maximise usability. Human factors influencing HAII may include automation bias, alarm fatigue, algorithm aversion, learning effect and deskilling. Each of these areas merits further study in the specific setting of AI applications in GI endoscopy, and professional societies should engage to ensure that sufficient emphasis is placed on human-centred design in the development of new AI technologies.
Affiliation(s)
- John R Campion: Department of Gastroenterology, Mater Misericordiae University Hospital, Dublin D07 AX57, Ireland; School of Medicine, University College Dublin, Dublin D04 C7X2, Ireland
- Donal B O'Connor: Department of Surgery, Trinity College Dublin, Dublin D02 R590, Ireland
- Conor Lahiff: Department of Gastroenterology, Mater Misericordiae University Hospital, Dublin D07 AX57, Ireland; School of Medicine, University College Dublin, Dublin D04 C7X2, Ireland
6. Famiglini L, Campagner A, Barandas M, La Maida GA, Gallazzi E, Cabitza F. Evidence-based XAI: An empirical approach to design more effective and explainable decision support systems. Comput Biol Med 2024; 170:108042. [PMID: 38308866] [DOI: 10.1016/j.compbiomed.2024.108042]
Abstract
This paper presents a user study evaluating the impact of Class Activation Maps (CAMs) as an eXplainable AI (XAI) method in a radiological diagnostic task: the detection of thoracolumbar (TL) fractures from vertebral X-rays. In particular, we focus on two oft-neglected features of CAMs, granularity and coloring: which features (lower-level vs higher-level) the maps should highlight, and which coloring scheme they should adopt, to best support the decision-making process, both in terms of diagnostic accuracy (effectiveness) and of user-centered dimensions such as perceived confidence and utility (satisfaction), depending on case complexity, AI accuracy, and user expertise. Our findings show that lower-level feature CAMs, which highlight more focused anatomical landmarks, are associated with higher diagnostic accuracy than higher-level feature CAMs, particularly among experienced physicians. Moreover, despite the intuitive appeal of semantic CAMs, traditionally colored CAMs consistently yielded higher diagnostic accuracy across all groups. Our results challenge some prevalent assumptions in the XAI field and emphasize the importance of an evidence-based and human-centered approach to designing and evaluating AI- and XAI-assisted diagnostic tools. To this aim, the paper also proposes a hierarchy-of-evidence framework to help designers and practitioners choose the XAI solutions that optimize performance and satisfaction on the basis of the strongest evidence available, or to focus on the gaps in the literature that need to be filled to move from opinionated, eminence-based research to research grounded in empirical evidence and end-user work and preferences.
Affiliation(s)
- Lorenzo Famiglini: Department of Computer Science, Systems and Communication, University of Milano-Bicocca, Milan, Italy.
- Marilia Barandas: Associação Fraunhofer Portugal Research, Rua Alfredo Allen 455/461, Porto, Portugal
- Enrico Gallazzi: Istituto Ortopedico Gaetano Pini - ASST Pini-CTO, Milan, Italy
- Federico Cabitza: Department of Computer Science, Systems and Communication, University of Milano-Bicocca, Milan, Italy; IRCCS Istituto Ortopedico Galeazzi, Milan, Italy
7. He X, Zheng X, Ding H. Existing Barriers Faced by and Future Design Recommendations for Direct-to-Consumer Health Care Artificial Intelligence Apps: Scoping Review. J Med Internet Res 2023; 25:e50342. [PMID: 38109173] [PMCID: PMC10758939] [DOI: 10.2196/50342]
Abstract
BACKGROUND Direct-to-consumer (DTC) health care artificial intelligence (AI) apps hold the potential to bridge spatial and temporal disparities in health care resources, but they also come with individual and societal risks due to AI errors. Furthermore, the manner in which consumers interact directly with health care AI is reshaping traditional physician-patient relationships. However, the academic community lacks a systematic comprehension of the research on such apps. OBJECTIVE This paper systematically delineates and analyzes the characteristics of the included studies, identifies existing barriers to and design recommendations for DTC health care AI apps mentioned in the literature, and provides a reference for future design and development. METHODS This scoping review followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews guidelines and was conducted according to Arksey and O'Malley's 5-stage framework. Peer-reviewed papers on DTC health care AI apps published until March 27, 2023, in Web of Science, Scopus, the ACM Digital Library, IEEE Xplore, PubMed, and Google Scholar were included. The papers were analyzed using Braun and Clarke's reflective thematic analysis approach. RESULTS Of the 2898 papers retrieved, 32 (1.1%) covering this emerging field were included. The included papers were recently published (2018-2023), and most (23/32, 72%) were from developed countries. The medical field was mostly general practice (8/32, 25%). In terms of users and functionalities, some apps were designed solely for single-consumer groups (24/32, 75%), offering disease diagnosis (14/32, 44%), health self-management (8/32, 25%), and health care information inquiry (4/32, 13%). Other apps connected to physicians (5/32, 16%), family members (1/32, 3%), nursing staff (1/32, 3%), and health care departments (2/32, 6%), generally to alert these groups to abnormal conditions of consumer users. In addition, 8 barriers to and 6 design recommendations for DTC health care AI apps were identified. Subtler obstacles in consumer-facing health care AI systems, and the corresponding design recommendations, were further discussed: enhancing human-centered explainability, establishing calibrated trust and addressing overtrust, demonstrating empathy in AI, improving the specialization of consumer-grade products, and expanding the diversity of the test population. CONCLUSIONS The booming market for DTC health care AI apps presents both risks and opportunities, which highlights the need to explore their current status. This paper systematically summarized the characteristics of the included studies, identified existing barriers faced by such apps, and made design recommendations for future versions. To the best of our knowledge, this is the first study to systematically summarize and categorize academic research on these apps. Future work on the design and development of such systems can refer to these results, which is crucial to improving the health care services provided by DTC health care AI apps.
Affiliation(s)
- Xin He: School of Mechanical Science and Engineering, Huazhong University of Science and Technology, Wuhan, China
- Xi Zheng: School of Mechanical Science and Engineering, Huazhong University of Science and Technology, Wuhan, China
- Huiyuan Ding: School of Mechanical Science and Engineering, Huazhong University of Science and Technology, Wuhan, China
8. Cadamuro J, Cabitza F, Debeljak Z, De Bruyne S, Frans G, Perez SM, Ozdemir H, Tolios A, Carobene A, Padoan A. Potentials and pitfalls of ChatGPT and natural-language artificial intelligence models for the understanding of laboratory medicine test results. An assessment by the European Federation of Clinical Chemistry and Laboratory Medicine (EFLM) Working Group on Artificial Intelligence (WG-AI). Clin Chem Lab Med 2023; 61:1158-1166. [PMID: 37083166] [DOI: 10.1515/cclm-2023-0355]
Abstract
OBJECTIVES ChatGPT, a tool based on natural language processing (NLP), is on everyone's mind, and several potential applications in healthcare have already been proposed. However, since the ability of this tool to interpret laboratory test results has not yet been tested, the EFLM Working Group on Artificial Intelligence (WG-AI) set itself the task of closing this gap with a systematic approach. METHODS WG-AI members generated 10 simulated laboratory reports of common parameters, which were then passed to ChatGPT for interpretation according to reference intervals (RI) and units, using an optimized prompt. The results were subsequently evaluated independently by all WG-AI members with respect to relevance, correctness, helpfulness and safety. RESULTS ChatGPT recognized all laboratory tests, could detect whether they deviated from the RI, and gave a test-by-test as well as an overall interpretation. The interpretations were rather superficial, not always correct, and only in some cases judged coherent. The magnitude of the deviation from the RI seldom played a role in the interpretation of laboratory tests, and the artificial intelligence (AI) did not make any meaningful suggestions regarding follow-up diagnostics or further procedures. CONCLUSIONS ChatGPT in its current form, not being specifically trained on medical data or laboratory data in particular, may at best be considered a tool capable of interpreting a laboratory report on a test-by-test basis, but not of interpreting the overall diagnostic picture. Future generations of similar AIs trained on medical ground-truth data may well revolutionize current processes in healthcare, although such an implementation is not ready yet.
Affiliation(s)
- Janne Cadamuro: Department of Laboratory Medicine, Paracelsus Medical University Salzburg, Salzburg, Austria
- Federico Cabitza: DISCo, Università degli Studi di Milano-Bicocca, Milano, Italy; IRCCS Istituto Ortopedico Galeazzi, Milan, Italy
- Zeljko Debeljak: Faculty of Medicine, Josip Juraj Strossmayer University of Osijek, Osijek, Croatia; Clinical Institute of Laboratory Diagnostics, University Hospital Center Osijek, Osijek, Croatia
- Sander De Bruyne: Department of Laboratory Medicine, Ghent University Hospital, Ghent, Belgium
- Glynis Frans: Department of Laboratory Medicine, University Hospitals Leuven, KU Leuven, Leuven, Belgium
- Salomon Martin Perez: Unidad de Bioquímica Clínica, Hospital Universitario Virgen Macarena, Sevilla, Spain
- Habib Ozdemir: Department of Medical Biochemistry, Faculty of Medicine, Manisa Celal Bayar University, Manisa, Türkiye
- Alexander Tolios: Department of Transfusion Medicine and Cell Therapy, Medical University of Vienna, Vienna, Austria
- Anna Carobene: IRCCS San Raffaele Scientific Institute, Milan, Italy
- Andrea Padoan: Department of Medicine (DIMED), University of Padova, Padova, Italy