1
Xu Z, Evans L, Song J, Chae S, Davoudi A, Bowles KH, McDonald MV, Topaz M. Exploring home healthcare clinicians' needs for using clinical decision support systems for early risk warning. J Am Med Inform Assoc 2024; 31:2641-2650. [PMID: 39302103] [PMCID: PMC11491664] [DOI: 10.1093/jamia/ocae247]
Abstract
OBJECTIVES: To explore home healthcare (HHC) clinicians' needs for clinical decision support system (CDSS) information delivery for early risk warning within HHC workflows.
METHODS: Guided by the CDS "Five-Rights" framework, we conducted semi-structured interviews with multidisciplinary HHC clinicians from April 2023 to August 2023. We used deductive and inductive content analysis to investigate informants' responses regarding CDSS information delivery.
RESULTS: Interviews with thirteen HHC clinicians yielded 16 codes mapping to the CDS "Five-Rights" framework (right information, right person, right format, right channel, right time) and 11 codes for unintended consequences and training needs. Clinicians favored risk levels displayed in color-coded horizontal bars, concrete risk indicators in bullet points, and actionable instructions in the existing EHR system. They preferred non-intrusive risk alerts requiring mandatory confirmation. Clinicians anticipated risk information updates aligned with patients' condition severity and their visit pace. Additionally, they requested training to understand the CDSS's underlying logic, and raised concerns about information accuracy and data privacy.
DISCUSSION: While recognizing the CDSS's value in enhancing early risk warning, clinicians highlighted concerns about increased workload, alert fatigue, and CDSS misuse. The top risk factors identified by machine learning algorithms, especially text features, can be ambiguous owing to a lack of context. Future research should ensure that CDSS outputs align with clinical evidence and are explainable.
CONCLUSION: This study identified HHC clinicians' expectations, preferences, adaptations, and unintended uses of CDSS for early risk warning. Our findings endorse operationalizing the CDS "Five-Rights" framework to optimize CDSS information delivery and integration into HHC workflows.
Affiliation(s)
- Zidu Xu
- School of Nursing, Columbia University, New York, NY 10032, United States
- Lauren Evans
- Center for Home Care Policy & Research, VNS Health, New York, NY 10017, United States
- Jiyoun Song
- School of Nursing, University of Pennsylvania, Philadelphia, PA 19104, United States
- Sena Chae
- College of Nursing, The University of Iowa, Iowa City, IA 52242, United States
- Anahita Davoudi
- Center for Home Care Policy & Research, VNS Health, New York, NY 10017, United States
- Kathryn H Bowles
- Center for Home Care Policy & Research, VNS Health, New York, NY 10017, United States
- School of Nursing, University of Pennsylvania, Philadelphia, PA 19104, United States
- Margaret V McDonald
- Center for Home Care Policy & Research, VNS Health, New York, NY 10017, United States
- Maxim Topaz
- School of Nursing, Columbia University, New York, NY 10032, United States
- Center for Home Care Policy & Research, VNS Health, New York, NY 10017, United States
2
Fritz BA, Pugazenthi S, Budelier TP, Tellor Pennington BR, King CR, Avidan MS, Abraham J. User-Centered Design of a Machine Learning Dashboard for Prediction of Postoperative Complications. Anesth Analg 2024; 138:804-813. [PMID: 37339083] [PMCID: PMC10730770] [DOI: 10.1213/ane.0000000000006577]
Abstract
BACKGROUND: Machine learning models can help anesthesiology clinicians assess patients and make clinical and operational decisions, but well-designed human-computer interfaces are necessary for machine learning model predictions to result in clinician actions that help patients. Therefore, the goal of this study was to apply a user-centered design framework to create a user interface for displaying machine learning model predictions of postoperative complications to anesthesiology clinicians.
METHODS: Twenty-five anesthesiology clinicians (attending anesthesiologists, resident physicians, and certified registered nurse anesthetists) participated in a 3-phase study that included (phase 1) semistructured focus group interviews and a card sorting activity to characterize user workflows and needs; (phase 2) simulated patient evaluation incorporating a low-fidelity static prototype display interface followed by a semistructured interview; and (phase 3) simulated patient evaluation with concurrent think-aloud incorporating a high-fidelity prototype display interface in the electronic health record. In each phase, data analysis included open coding of session transcripts and thematic analysis.
RESULTS: During the needs assessment phase (phase 1), participants voiced that (a) identifying preventable risk related to modifiable risk factors is more important than nonpreventable risk, (b) comprehensive patient evaluation follows a systematic approach that relies heavily on the electronic health record, and (c) an easy-to-use display interface should have a simple layout that uses color and graphs to minimize time and energy spent reading it. When performing simulations using the low-fidelity prototype (phase 2), participants reported that (a) the machine learning predictions helped them to evaluate patient risk, (b) additional information about how to act on the risk estimate would be useful, and (c) correctable problems related to textual content existed. When performing simulations using the high-fidelity prototype (phase 3), usability problems predominantly related to the presentation of information and functionality. Despite the usability problems, participants rated the system highly on the System Usability Scale (mean score, 82.5; standard deviation, 10.5).
CONCLUSIONS: Incorporating user needs and preferences into the design of a machine learning dashboard results in a display interface that clinicians rate as highly usable. Because the system demonstrates usability, evaluation of the effects of implementation on both process and clinical outcomes is warranted.
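For context on the usability figure above: the System Usability Scale (SUS) is a standard 10-item questionnaire scored to a 0-100 range. A minimal sketch of the standard SUS scoring rule, independent of this study's data (the responses below are hypothetical):

```python
def sus_score(responses):
    """Standard System Usability Scale scoring: ten 1-5 Likert responses -> 0-100."""
    assert len(responses) == 10, "SUS has exactly 10 items"
    total = 0
    for i, r in enumerate(responses, start=1):
        # Odd-numbered items are positively worded (contribute r - 1);
        # even-numbered items are negatively worded (contribute 5 - r).
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5

# Hypothetical single respondent whose answers yield a score equal to the
# study's reported mean of 82.5.
print(sus_score([5, 2, 4, 1, 4, 2, 5, 2, 4, 2]))  # -> 82.5
```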
Affiliation(s)
- Joanna Abraham
- Department of Anesthesiology and Institute for Informatics, Washington University School of Medicine, St. Louis, Missouri
3
Evans RP, Bryant LD, Russell G, Absolom K. Trust and acceptability of data-driven clinical recommendations in everyday practice: A scoping review. Int J Med Inform 2024; 183:105342. [PMID: 38266426] [DOI: 10.1016/j.ijmedinf.2024.105342]
Abstract
BACKGROUND: Increasing attention is being given to the analysis of large health datasets to derive new clinical decision support systems (CDSS). However, few data-driven CDSS are being adopted into clinical practice. Trust in these tools is believed to be fundamental for acceptance and uptake, but to date little attention has been given to defining or evaluating trust in clinical settings.
OBJECTIVES: A scoping review was conducted to explore how and where the acceptability and trustworthiness of data-driven CDSS have been assessed from the health professional's perspective.
METHODS: Medline, Embase, PsycInfo, Web of Science, Scopus, ACM Digital, IEEE Xplore, and Google Scholar were searched in March 2022 using terms expanded from: "data-driven" AND "clinical decision support" AND "acceptability". Included studies focused on healthcare practitioner-facing data-driven CDSS relating directly to clinical care, and included trust or a proxy as an outcome or in the discussion. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) was followed in reporting this review.
RESULTS: In total, 3291 papers were screened, with 85 primary research studies eligible for inclusion. Studies covered a diverse range of clinical specialisms and intended contexts, but hypothetical systems (24) outnumbered those in clinical use (18). Twenty-five studies measured trust, via a wide variety of quantitative, qualitative, and mixed methods. A further 24 discussed themes of trust without explicitly evaluating it, and from these, transparency, explainability, and supporting evidence were identified as factors influencing healthcare practitioner trust in data-driven CDSS.
CONCLUSION: There is a growing body of research on data-driven CDSS, but few studies have explored stakeholder perceptions in depth, with limited focused research on trustworthiness. Further research on healthcare practitioner acceptance, including requirements for transparency and explainability, should inform clinical implementation.
Affiliation(s)
- Ruth P Evans
- University of Leeds, Woodhouse Lane, Leeds LS2 9JT, UK
- Gregor Russell
- Bradford District Care Trust, New Mill, Victoria Rd, Bradford BD18 3LD, UK
- Kate Absolom
- University of Leeds, Woodhouse Lane, Leeds LS2 9JT, UK
4
Subramanian HV, Canfield C, Shank DB. Designing explainable AI to improve human-AI team performance: A medical stakeholder-driven scoping review. Artif Intell Med 2024; 149:102780. [PMID: 38462282] [DOI: 10.1016/j.artmed.2024.102780]
Abstract
The rise of complex AI systems in healthcare and other sectors has led to a growing area of research called explainable AI (XAI), designed to increase transparency. In this area, quantitative and qualitative studies focus on improving user trust and task performance by providing system- and prediction-level XAI features. We analyze stakeholder engagement events (interviews and workshops) on the use of AI for kidney transplantation, and from these we identify themes that we use to frame a scoping literature review on current XAI features. The stakeholder engagement process lasted over nine months, covering three stakeholder groups' workflows, determining where AI could intervene, and assessing a mock XAI decision support system. Based on the stakeholder engagement, we identify four major themes relevant to designing XAI systems: (1) use of AI predictions, (2) information included in AI predictions, (3) personalization of AI predictions for individual differences, and (4) customizing AI predictions for specific cases. Using these themes, our scoping literature review finds that providing AI predictions before, during, or after decision-making could be beneficial depending on the complexity of the stakeholder's task. Additionally, expert stakeholders such as surgeons prefer minimal to no XAI features beyond the AI prediction and uncertainty estimates for easy use cases. However, almost all stakeholders prefer to have optional XAI features to review when needed, especially in hard-to-predict cases. The literature also suggests that providing both system- and prediction-level information is necessary to build the user's mental model of the system appropriately. Although XAI features improve users' trust in the system, human-AI team performance is not always enhanced. Overall, stakeholders prefer to have agency over the XAI interface to control the level of information based on their needs and task complexity. We conclude with suggestions for future research, especially on customizing XAI features based on preferences and tasks.
Affiliation(s)
- Harishankar V Subramanian
- Engineering Management & Systems Engineering, Missouri University of Science and Technology, 600 W 14th Street, Rolla, MO 65409, United States of America
- Casey Canfield
- Engineering Management & Systems Engineering, Missouri University of Science and Technology, 600 W 14th Street, Rolla, MO 65409, United States of America
- Daniel B Shank
- Psychological Science, Missouri University of Science and Technology, 500 W 14th Street, Rolla, MO 65409, United States of America
5
Giddings R, Joseph A, Callender T, Janes SM, van der Schaar M, Sheringham J, Navani N. Factors influencing clinician and patient interaction with machine learning-based risk prediction models: a systematic review. Lancet Digit Health 2024; 6:e131-e144. [PMID: 38278615] [DOI: 10.1016/s2589-7500(23)00241-8]
Abstract
Machine learning (ML)-based risk prediction models hold the potential to support the health-care setting in several ways; however, use of such models is scarce. We aimed to review health-care professional (HCP) and patient perceptions of ML risk prediction models in the published literature, to inform future risk prediction model development. Following database and citation searches, we identified 41 articles suitable for inclusion. Article quality varied, with qualitative studies performing strongest. Overall, perceptions of ML risk prediction models were positive. HCPs and patients considered that the models have the potential to add benefit in the health-care setting. However, reservations remain; for example, concerns regarding data quality for model development and fears of unintended consequences following ML model use. We identified that public views regarding these models might be more negative than those of HCPs, and that concerns (eg, extra demands on workload) were not always borne out in practice. Conclusions are tempered by the low number of patient and public studies, the absence of participant ethnic diversity, and variation in article quality. We identified gaps in knowledge (particularly views from under-represented groups) and in optimum methods for model explanation and alerts, which require future research.
Affiliation(s)
- Rebecca Giddings
- Lungs for Living Research Centre, UCL Respiratory, University College London, London, UK
- Anabel Joseph
- Lungs for Living Research Centre, UCL Respiratory, University College London, London, UK
- Thomas Callender
- Lungs for Living Research Centre, UCL Respiratory, University College London, London, UK
- Sam M Janes
- Lungs for Living Research Centre, UCL Respiratory, University College London, London, UK
- Mihaela van der Schaar
- Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK; The Alan Turing Institute, London, UK
- Jessica Sheringham
- Department of Applied Health Research, University College London, London, UK
- Neal Navani
- Lungs for Living Research Centre, UCL Respiratory, University College London, London, UK
6
Fischer A, Rietveld A, Teunissen P, Hoogendoorn M, Bakker P. What is the future of artificial intelligence in obstetrics? A qualitative study among healthcare professionals. BMJ Open 2023; 13:e076017. [PMID: 37879682] [PMCID: PMC10603416] [DOI: 10.1136/bmjopen-2023-076017]
Abstract
OBJECTIVE: This work explores the perceptions of obstetrical clinicians about artificial intelligence (AI) in order to bridge the gap in uptake of AI between research and medical practice. Identifying potential areas where AI can contribute to clinical practice enables AI research to align with the needs of clinicians and, ultimately, patients.
DESIGN: Qualitative interview study.
SETTING: A national study conducted in the Netherlands between November 2022 and February 2023.
PARTICIPANTS: Dutch clinicians working in obstetrics with varying relevant work experience, gender, and age.
ANALYSIS: Thematic analysis of qualitative interview transcripts.
RESULTS: Thirteen gynaecologists were interviewed about hypothetical scenarios of an implemented AI model. Thematic analysis identified two major themes: perceived usefulness and trust. Usefulness involved AI extending human brain capacity in complex pattern recognition and information processing, reducing contextual influence, and saving time. Trust required validation, explainability, and successful personal experience. These results reveal two paradoxes. First, AI is expected to provide added value by surpassing human capabilities, yet participants also expressed a need to understand the parameters and their influence on predictions before trusting and adopting the model. Second, participants recognised the value of incorporating numerous parameters into a model, but they also believed that certain contextual factors should only be considered by humans, as it would be undesirable for AI models to use that information.
CONCLUSIONS: Obstetricians' opinions on the potential value of AI highlight the need for clinician-AI researcher collaboration. Trust can be built through conventional means like randomised controlled trials and guidelines. Holistic impact metrics, such as changes in workflow and not just clinical outcomes, should guide AI model development. Further research is needed to evaluate evolving AI systems beyond traditional validation methods.
Affiliation(s)
- Anne Fischer
- Department of Obstetrics and Gynecology, Amsterdam UMC Location VUmc, Amsterdam, The Netherlands
- Department of Computer Science, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Amsterdam Reproduction and Development Research Institute, Amsterdam, The Netherlands
- Anna Rietveld
- Department of Obstetrics and Gynecology, Amsterdam UMC Location VUmc, Amsterdam, The Netherlands
- Amsterdam Reproduction and Development Research Institute, Amsterdam, The Netherlands
- Pim Teunissen
- School of Health Professions Education, Faculty of Health Medicine and Life Sciences, Maastricht University, Maastricht, The Netherlands
- Department of Gynaecology & Obstetrics, Maastricht UMC, Maastricht, The Netherlands
- Mark Hoogendoorn
- Department of Computer Science, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Petra Bakker
- Department of Obstetrics and Gynecology, Amsterdam UMC Location VUmc, Amsterdam, The Netherlands
- Amsterdam Reproduction and Development Research Institute, Amsterdam, The Netherlands
7
Hogg HDJ, Al-Zubaidy M, Keane PA, Hughes G, Beyer FR, Maniatopoulos G. Evaluating the translation of implementation science to clinical artificial intelligence: a bibliometric study of qualitative research. Front Health Serv 2023; 3:1161822. [PMID: 37492632] [PMCID: PMC10364639] [DOI: 10.3389/frhs.2023.1161822]
Abstract
INTRODUCTION: Whilst a theoretical basis for implementation research is seen as advantageous, there is little clarity over whether and how the application of theories, models, or frameworks (TMF) impacts implementation outcomes. Clinical artificial intelligence (AI) continues to receive multi-stakeholder interest and investment, yet a significant implementation gap remains. This bibliometric study aims to measure and characterize TMF application in qualitative clinical AI research to identify opportunities to improve research practice and its impact on clinical AI implementation.
METHODS: Qualitative research of stakeholder perspectives on clinical AI published between January 2014 and October 2022 was systematically identified. Eligible studies were characterized by their publication type, clinical and geographical context, type of clinical AI studied, data collection method, participants, and application of any TMF. Each TMF applied by eligible studies, its justification, and its mode of application were characterized.
RESULTS: Of 202 eligible studies, 70 (34.7%) applied a TMF. There was an 8-fold increase in the number of publications between 2014 and 2022 but no significant increase in the proportion applying TMFs. Of the 50 TMFs applied, 40 (80%) were applied only once, with the Technology Acceptance Model applied most frequently (n = 9). Seven TMFs were novel contributions embedded within an eligible study. A minority of studies justified TMF application (n = 51, 58.6%), and it was uncommon to discuss an alternative TMF or the limitations of the one selected (n = 11, 12.6%). The most common way in which a TMF was applied in eligible studies was data analysis (n = 44, 50.6%). Implementation guidelines or tools were explicitly referenced by 2 reports (1.0%).
CONCLUSION: TMFs have not been commonly applied in qualitative research of clinical AI. When TMFs have been applied, there has been (i) little consensus on TMF selection, (ii) limited description of selection rationale, and (iii) lack of clarity over how TMFs inform research. We consider this to represent an opportunity to improve implementation science's translation to clinical AI research, and of clinical AI into practice, by promoting the rigor and frequency of TMF application. We recommend that the finite resources of the implementation science community are diverted toward increasing accessibility and engagement with theory-informed practices.
Plain-language summary: The considered application of theories, models, and frameworks (TMF) is thought to contribute to the impact of implementation science on the translation of innovations into real-world care. The frequency and nature of TMF use are yet to be described within digital health innovations, including the prominent field of clinical AI. A well-known implementation gap, coined the "AI chasm", continues to limit the impact of clinical AI on real-world care. From this bibliometric study of the frequency and quality of TMF use within qualitative clinical AI research, we found that TMFs are usually not applied, their selection is highly varied between studies, and there is not often a convincing rationale for their selection. Promoting the rigor and frequency of TMF use appears to present an opportunity to improve the translation of clinical AI into practice.
Affiliation(s)
- H. D. J. Hogg
- Faculty of Medical Sciences, Newcastle University, Newcastle Upon Tyne, United Kingdom
- The Royal Victoria Infirmary, Newcastle Upon Tyne Hospitals NHS Foundation Trust, Newcastle Upon Tyne, United Kingdom
- Moorfields Eye Hospital NHS Foundation Trust, London, United Kingdom
- M. Al-Zubaidy
- The Royal Victoria Infirmary, Newcastle Upon Tyne Hospitals NHS Foundation Trust, Newcastle Upon Tyne, United Kingdom
- P. A. Keane
- Moorfields Eye Hospital NHS Foundation Trust, London, United Kingdom
- Institute of Ophthalmology, University College London, London, United Kingdom
- G. Hughes
- Nuffield Department of Primary Care Health Sciences, Oxford University, Oxford, United Kingdom
- University of Leicester School of Business, University of Leicester, Leicester, United Kingdom
- F. R. Beyer
- Evidence Synthesis Group, Population Health Sciences Institute, Newcastle University, Newcastle Upon Tyne, United Kingdom
- G. Maniatopoulos
- Faculty of Medical Sciences, Newcastle University, Newcastle Upon Tyne, United Kingdom
- University of Leicester School of Business, University of Leicester, Leicester, United Kingdom
8
Machine Learning Modeling of Disease Treatment Default: A Comparative Analysis of Classification Models. Advances in Public Health 2023. [DOI: 10.1155/2023/4168770]
Abstract
Generally, treatment default by patients is regarded as the biggest threat to favourable disease treatment outcomes. It is seen as the reason for the resurgence of infectious diseases, including tuberculosis, in some developing countries. Sadly, its occurrence in chronic disease management is associated with high morbidity and mortality rates. Many reasons have been adduced for this phenomenon, yet exploring treatment default using biographic and behavioral metrics collected from patients and healthcare providers remains a challenge. The focus on contextual, nonbiomedical measurements using a supervised machine learning modeling technique is aimed at understanding why treatment default occurs, including identifying important contextual parameters that contribute to it. The prediction accuracy scores of four supervised machine learning algorithms, namely gradient boosting, logistic regression, random forest, and support vector machine, were 0.87, 0.90, 0.81, and 0.77, respectively. Additionally, the positive predictive value scores of the four models ranged from 98.72% to 98.87%, and the negative predictive values of gradient boosting, logistic regression, random forest, and support vector machine were 50%, 75%, 22.22%, and 50%, respectively. Logistic regression had the highest negative predictive value (75%), with the smallest error margin (25%) and the highest accuracy score (0.90); the random forest had the lowest negative predictive value (22.22%), registering the highest error margin (77.78%). Using a chi-square test of variable independence, this study suggests that age, presence of comorbidities, concern about long queuing/waiting times at treatment facilities, availability of qualified clinicians, and the patient's nutritional state (whether or not on a controlled diet) are likely to affect adherence to disease treatment and could result in an increased risk of default.
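As an illustration of the comparison described above (synthetic data; the study's patient records are not public, and all parameters below are assumptions), the four classifier families can be benchmarked on accuracy, positive predictive value, and negative predictive value as follows:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic, class-imbalanced stand-in for a treatment-default dataset.
X, y = make_classification(n_samples=1000, n_features=20, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, stratify=y, random_state=0)

models = {
    "gradient boosting": GradientBoostingClassifier(random_state=0),
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(random_state=0),
    "support vector machine": SVC(),
}

for name, model in models.items():
    y_hat = model.fit(X_tr, y_tr).predict(X_te)
    tn, fp, fn, tp = confusion_matrix(y_te, y_hat).ravel()
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")  # positive predictive value
    npv = tn / (tn + fn) if (tn + fn) else float("nan")  # negative predictive value
    print(f"{name}: acc={accuracy_score(y_te, y_hat):.2f}, PPV={ppv:.2f}, NPV={npv:.2f}")
```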
9
Abraham J, Bartek B, Meng A, Ryan King C, Xue B, Lu C, Avidan MS. Integrating machine learning predictions for perioperative risk management: Towards an empirical design of a flexible-standardized risk assessment tool. J Biomed Inform 2023; 137:104270. [PMID: 36516944] [DOI: 10.1016/j.jbi.2022.104270]
Abstract
BACKGROUND: Surgical patients are complex, vulnerable, and prone to postoperative complications that can potentially be mitigated with quality perioperative risk assessment and management. Several institutions have incorporated machine learning (ML) into their patient care to improve awareness and support clinician decision-making along the perioperative spectrum. Recent research suggests that ML risk prediction can support perioperative patient risk monitoring and management across several situations, including operating room (OR) to intensive care unit (ICU) handoffs.
OBJECTIVES: Our study objectives were threefold: (1) evaluate whether ML-generated postoperative predictions are concordant with clinician-generated risk rankings for acute kidney injury, delirium, pneumonia, deep vein thrombosis, and pulmonary embolism, and establish their associated risk factors; (2) ascertain clinician end-user suggestions to improve adoption of ML-generated risks and their integration into the perioperative workflow; and (3) develop a user-friendly visualization format for a tool to display ML-generated risks and risk factors to support postoperative care planning, for example, within the context of OR-ICU handoffs.
METHODS: Graphical user interfaces for postoperative risk prediction models were assessed for end-user usability through cognitive walkthroughs and interviews with anesthesiologists, surgeons, certified registered nurse anesthetists, registered nurses, and critical care physicians. Thematic analysis relying on an explanation design framework was used to identify feedback and suggestions for improvement.
RESULTS: Seventeen clinicians participated in the evaluation. ML estimates of complication risks aligned with clinicians' independent rankings, and the related displays were perceived as valuable for decision-making and care planning in postoperative care. During OR-ICU handoffs, the tool could speed up report preparation and remind clinicians to address patient-specific complications, thus providing more tailored care information. Suggestions for improvement centered on electronic tool delivery; methods to build trust in ML models; modifiable risks and risk mitigation strategies; and additional patient information based on individual preferences (e.g., surgical procedure).
CONCLUSIONS: ML estimates of postoperative complication risks can provide anticipatory guidance, potentially increasing the efficiency of care planning. We have offered an ML visualization framework for designing future ML-augmented tools and anticipate the development of tools that recommend specific actions to the user based on ML model output.
Affiliation(s)
- Joanna Abraham
- Institute for Informatics, School of Medicine, Washington University in St Louis, MO, United States; Department of Anesthesiology, School of Medicine, Washington University in St Louis, MO, United States
- Brian Bartek
- Institute for Informatics, School of Medicine, Washington University in St Louis, MO, United States
- Alicia Meng
- Department of Anesthesiology, School of Medicine, Washington University in St Louis, MO, United States
- Christopher Ryan King
- Department of Anesthesiology, School of Medicine, Washington University in St Louis, MO, United States
- Bing Xue
- Department of Electrical & Systems Engineering, McKelvey School of Engineering, Washington University in St Louis, MO, United States
- Chenyang Lu
- Department of Computer Science & Engineering, McKelvey School of Engineering, Washington University in St Louis, MO, United States
- Michael S Avidan
- Department of Anesthesiology, School of Medicine, Washington University in St Louis, MO, United States
10
A Systematic Review on the Use of Explainability in Deep Learning Systems for Computer Aided Diagnosis in Radiology: Limited Use of Explainable AI? Eur J Radiol 2022; 157:110592. [DOI: 10.1016/j.ejrad.2022.110592]
11
Combi C, Amico B, Bellazzi R, Holzinger A, Moore JH, Zitnik M, Holmes JH. A manifesto on explainability for artificial intelligence in medicine. Artif Intell Med 2022; 133:102423. [PMID: 36328669] [DOI: 10.1016/j.artmed.2022.102423]
Abstract
The rapid increase of interest in, and use of, artificial intelligence (AI) in computer applications has raised a parallel concern about its ability (or lack thereof) to provide understandable, or explainable, output to users. This concern is especially legitimate in biomedical contexts, where patient safety is of paramount importance. This position paper brings together seven researchers working in the field with different roles and perspectives, to explore in depth the concept of explainable AI, or XAI, offering a functional definition and conceptual framework or model that can be used when considering XAI. This is followed by a series of desiderata for attaining explainability in AI, each of which touches upon a key domain in biomedicine.
Affiliation(s)
- Jason H Moore
- Cedars-Sinai Medical Center, West Hollywood, CA, USA
- Marinka Zitnik
- Harvard Medical School and Broad Institute of MIT & Harvard, MA, USA
- John H Holmes
- University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
12
Engstrom CJ, Adelaine S, Liao F, Jacobsohn GC, Patterson BW. Operationalizing a real-time scoring model to predict fall risk among older adults in the emergency department. Front Digit Health 2022; 4:958663. [PMID: 36405416] [PMCID: PMC9671211] [DOI: 10.3389/fdgth.2022.958663]
Abstract
Predictive models are increasingly being developed and implemented to improve patient care across a variety of clinical scenarios. While a body of literature exists on the development of models using existing data, less focus has been placed on the practical operationalization of these models for deployment in real-time production environments. This case study describes challenges and barriers identified and overcome in such an operationalization for a model aimed at predicting the risk of outpatient falls after emergency department (ED) visits among older adults. Based on our experience, we provide general principles for translating an EHR-based predictive model from research and reporting environments into real-time operation.
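Purely as an illustration of what real-time operationalization of such a model can look like (this is not the authors' deployment; the endpoint, feature names, and toy model below are all hypothetical), an EHR integration might call a scoring service along these lines:

```python
import numpy as np
from fastapi import FastAPI
from pydantic import BaseModel
from sklearn.linear_model import LogisticRegression

# Stand-in for a model trained offline and loaded at service start-up (e.g., via joblib).
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 3))
y_toy = (X_toy.sum(axis=1) + rng.normal(size=200) > 0).astype(int)
model = LogisticRegression().fit(X_toy, y_toy)

app = FastAPI()

class EDVisit(BaseModel):
    age: float
    prior_falls_1y: int        # hypothetical feature names
    high_risk_med_count: int

@app.post("/fall-risk")
def score(visit: EDVisit) -> dict:
    # Assemble the feature vector in training order and return a calibrated-looking risk.
    features = [[visit.age, visit.prior_falls_1y, visit.high_risk_med_count]]
    return {"fall_risk": float(model.predict_proba(features)[0, 1])}
```

Served with, for example, `uvicorn module:app`; in a real deployment the model artifact, feature mapping, and EHR interface would all be institution-specific.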
Affiliation(s)
- Collin J. Engstrom
- Department of Emergency Medicine, UW-Madison, Madison, WI, United States
- Department of Computer Science, Winona State University, Rochester, MN, United States
- Sabrina Adelaine
- Department of Enterprise Analytics, UW Health, Madison, WI, United States
- Frank Liao
- Department of Enterprise Analytics, UW Health, Madison, WI, United States
- Brian W. Patterson
- Department of Emergency Medicine, UW-Madison, Madison, WI, United States
- Department of Biostatistics and Medical Informatics, UW-Madison, Madison, WI, United States
13
Di Martino F, Delmastro F. Explainable AI for clinical and remote health applications: a survey on tabular and time series data. Artif Intell Rev 2022; 56:5261-5315. [PMID: 36320613] [PMCID: PMC9607788] [DOI: 10.1007/s10462-022-10304-3]
Abstract
Nowadays, Artificial Intelligence (AI) has become a fundamental component of healthcare applications, both clinical and remote, but the best performing AI systems are often too complex to be self-explaining. Explainable AI (XAI) techniques are defined to unveil the reasoning behind the system's predictions and decisions, and they become even more critical when dealing with sensitive and personal health data. It is worth noting that XAI has not gathered the same attention across different research areas and data types, especially in healthcare. In particular, many clinical and remote health applications are based on tabular and time series data, respectively, and XAI is not commonly analysed on these data types, while computer vision and Natural Language Processing (NLP) are the reference applications. To provide an overview of XAI methods that are most suitable for tabular and time series data in the healthcare domain, this paper reviews the literature of the last 5 years, illustrating the type of explanations generated and the efforts made to evaluate their relevance and quality. Specifically, we identify clinical validation, consistency assessment, objective and standardised quality evaluation, and human-centered quality assessment as key features to ensure effective explanations for end users. Finally, we highlight the main research challenges in the field as well as the limitations of existing XAI methods.
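To make the kind of explanation such surveys review concrete, here is a minimal sketch of one widely used post-hoc XAI method for tabular data, SHAP (chosen here as an example; the survey's full taxonomy of methods is not reproduced):

```python
import shap
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

# Train a simple model on a public tabular clinical dataset.
data = load_breast_cancer()
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(data.data, data.target)

# TreeExplainer yields per-sample, per-feature contributions to each prediction,
# one common form of the "generated explanations" whose quality such reviews evaluate.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(data.data[:25])  # contributions per sample and feature
```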
14
Schwartz JM, George M, Rossetti SC, Dykes PC, Minshall SR, Lucas E, Cato KD. Factors Influencing Clinician Trust in Predictive Clinical Decision Support Systems for In-Hospital Deterioration: Qualitative Descriptive Study. JMIR Hum Factors 2022; 9:e33960. [PMID: 35550304] [PMCID: PMC9136656] [DOI: 10.2196/33960]
Abstract
BACKGROUND: Clinician trust in machine learning-based clinical decision support systems (CDSSs) for predicting in-hospital deterioration (a type of predictive CDSS) is essential for adoption. Evidence shows that clinician trust in predictive CDSSs is influenced by perceived understandability and perceived accuracy.
OBJECTIVE: The aim of this study was to explore the phenomenon of clinician trust in predictive CDSSs for in-hospital deterioration by confirming and characterizing factors known to influence trust (understandability and accuracy), uncovering and describing other influencing factors, and comparing nurses' and prescribing providers' trust in predictive CDSSs.
METHODS: We followed a qualitative descriptive methodology, conducting directed deductive and inductive content analysis of interview data. Directed deductive analyses were guided by the human-computer trust conceptual framework. Semistructured interviews were conducted with nurses and prescribing providers (physicians, physician assistants, or nurse practitioners) working with a predictive CDSS at 2 hospitals in Mass General Brigham.
RESULTS: A total of 17 clinicians were interviewed. Concepts from the human-computer trust conceptual framework, namely perceived understandability and perceived technical competence (ie, perceived accuracy), were found to influence clinician trust in predictive CDSSs for in-hospital deterioration. The concordance between clinicians' impressions of patients' clinical status and system predictions influenced clinicians' perceptions of system accuracy. Understandability was influenced by system explanations, both global and local, as well as training. In total, 3 additional themes emerged from the inductive analysis. The first, perceived actionability, captured the variation in clinicians' desires for predictive CDSSs to recommend a discrete action. The second, evidence, described the importance of both macro- (scientific) and micro- (anecdotal) evidence for fostering trust. The final theme, equitability, described fairness in system predictions. The findings were largely similar between nurses and prescribing providers.
CONCLUSIONS: Although there is a perceived trade-off between machine learning-based CDSS accuracy and understandability, our findings confirm that both are important for fostering clinician trust in predictive CDSSs for in-hospital deterioration. We found that reliance on the predictive CDSS in the clinical workflow may influence clinicians' requirements for trust. Future research should explore the impact of reliance, the optimal explanation design for enhancing understandability, and the role of perceived actionability in driving trust.
Affiliation(s)
- Jessica M Schwartz
- Department of Biomedical Informatics, Columbia University, New York, NY, United States
- School of Nursing, Columbia University, New York, NY, United States
- Maureen George
- School of Nursing, Columbia University, New York, NY, United States
- Sarah Collins Rossetti
- Department of Biomedical Informatics, Columbia University, New York, NY, United States
- School of Nursing, Columbia University, New York, NY, United States
- Patricia C Dykes
- Brigham and Women's Hospital, Boston, MA, United States
- Harvard Medical School, Boston, MA, United States
- Simon R Minshall
- School of Health Information Science, University of Victoria, Victoria, BC, Canada
- Eugene Lucas
- Department of Biomedical Informatics, Columbia University, New York, NY, United States
- Weill Cornell Medicine, New York, NY, United States
- Kenrick D Cato
- School of Nursing, Columbia University, New York, NY, United States
- Department of Emergency Medicine, Columbia University, New York, NY, United States
15
van de Sande D, van Genderen ME, Verhoef C, Huiskens J, Gommers D, van Unen E, Schasfoort RA, Schepers J, van Bommel J, Grünhagen DJ. Optimizing discharge after major surgery using an artificial intelligence-based decision support tool (DESIRE): An external validation study. Surgery 2022; 172:663-669. [PMID: 35525621] [DOI: 10.1016/j.surg.2022.03.031]
Abstract
BACKGROUND: In the DESIRE study (Discharge aftEr Surgery usIng aRtificial intElligence), we previously developed and validated a machine learning concept in 1,677 gastrointestinal and oncology surgery patients that can predict safe hospital discharge after the second postoperative day. Despite strong model performance (area under the receiver operating characteristics curve of 0.88) in an academic surgical population, it remains unknown whether these findings can be translated to other hospitals and surgical populations. We therefore aimed to determine the generalizability of the previously developed machine learning concept.
METHODS: We externally validated the machine learning concept in gastrointestinal and oncology surgery patients admitted to 3 nonacademic hospitals in The Netherlands between January 2017 and June 2021 who remained admitted 2 days after surgery. The primary outcome was the ability to predict hospital interventions after the second postoperative day, defined as unplanned reoperations, radiological interventions, and/or intravenous antibiotics administration. Four forest models were locally trained and evaluated with respect to area under the receiver operating characteristics curve (AUC), sensitivity, specificity, positive predictive value, and negative predictive value.
RESULTS: All models were trained on 1,693 episodes, of which 731 (29.9%) required a hospital intervention, and demonstrated strong performance (the AUC varied by only 4%). The best model achieved an AUC of 0.83 (95% confidence interval [0.81-0.85]), sensitivity of 77.9% (0.67-0.87), specificity of 79.2% (0.72-0.85), positive predictive value of 61.6% (0.54-0.69), and negative predictive value of 89.3% (0.85-0.93).
CONCLUSION: This study showed that a previously developed machine learning concept can predict safe discharge in different surgical populations and hospital settings (academic versus nonacademic) by training a model on local patient data. Given its high accuracy, integration of the machine learning concept into the clinical workflow could expedite surgical discharge and aid hospitals in addressing capacity challenges by reducing avoidable bed-days.
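For readers unfamiliar with the reported metrics, a minimal sketch of how AUC, sensitivity, specificity, PPV, and NPV are computed at a fixed decision threshold (synthetic labels and scores; not the study's data):

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 500)  # 1 = hospital intervention after postoperative day 2
y_score = np.clip(0.3 * y_true + rng.normal(0.35, 0.2, 500), 0, 1)  # model risk estimates

y_pred = (y_score >= 0.5).astype(int)  # fixed decision threshold
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

print(f"AUC = {roc_auc_score(y_true, y_score):.2f}")
print(f"sensitivity = {tp / (tp + fn):.2f}, specificity = {tn / (tn + fp):.2f}")
print(f"PPV = {tp / (tp + fp):.2f}, NPV = {tn / (tn + fn):.2f}")
```

In practice the confidence intervals reported above would be obtained by bootstrapping these statistics over resampled episodes.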
Affiliation(s)
- Davy van de Sande
- Department of Adult Intensive Care, Erasmus University Medical Center, Rotterdam, The Netherlands. https://twitter.com/davy_sande
- Michel E van Genderen
- Department of Adult Intensive Care, Erasmus University Medical Center, Rotterdam, The Netherlands
- Cornelis Verhoef
- Department of Surgical Oncology, Erasmus MC Cancer Institute University Medical Center, Rotterdam, The Netherlands
- Diederik Gommers
- Department of Adult Intensive Care, Erasmus University Medical Center, Rotterdam, The Netherlands
- Judith Schepers
- Department of Business Intelligence, Treant Care Group, Emmen, The Netherlands
- Jasper van Bommel
- Department of Adult Intensive Care, Erasmus University Medical Center, Rotterdam, The Netherlands
- Dirk J Grünhagen
- Department of Surgical Oncology, Erasmus MC Cancer Institute University Medical Center, Rotterdam, The Netherlands
16
Malins S, Figueredo G, Jilani T, Long Y, Andrews J, Rawsthorne M, Manolescu C, Clos J, Higton F, Waldram D, Hunt D, Perez Vallejos E, Moghaddam N. Developing an Automated Assessment of In-Session Patient Activation for Psychological Therapy: A Co-Development Approach. JMIR Med Inform 2022; 10:e38168. [DOI: 10.2196/38168]
17
Helman S, Terry MA, Pellathy T, Williams A, Dubrawski A, Clermont G, Pinsky MR, Al-Zaiti S, Hravnak M. Engaging clinicians early during the development of a graphical user display of an intelligent alerting system at the bedside. Int J Med Inform 2022; 159:104643. [PMID: 34973608] [PMCID: PMC9040820] [DOI: 10.1016/j.ijmedinf.2021.104643]
Abstract
BACKGROUND: Artificial intelligence (AI) is increasingly used to support bedside clinical decisions, but information must be presented in usable ways within the workflow. Graphical user interfaces (GUIs) are front-facing presentations for communicating AI outputs, but clinicians are not routinely invited to participate in their design, hindering the potential of AI solutions.
PURPOSE: To inform early user-engaged design of a GUI prototype aimed at predicting future cardiorespiratory insufficiency (CRI) by exploring clinician methods for identifying at-risk patients, previous experience with implementing new technologies into clinical workflow, and user perspectives on GUI screen changes.
METHODS: We conducted a qualitative focus group study to elicit iterative design feedback from clinical end-users on an early GUI prototype display. Five online focus group sessions were held, each moderated by an expert focus group methodologist. Iterative design changes were made sequentially, and the updated GUI display was presented to the next group of participants.
RESULTS: Twenty-three clinicians were recruited (14 nurses, 4 nurse practitioners, 5 physicians; median participant age ~35 years; 60% female; median clinical experience 8 years). Five themes emerged from thematic content analysis: trend evolution; context (risk evolution relative to vital signs and interventions); evaluation, interpretation, and explanation (subtheme: continuity of evaluation); clinician intuition; and clinical operations. Based on these themes, GUI display changes were made, for example, color and scale adjustments, integration of clinical information, and threshold personalization.
CONCLUSIONS: Early user-engaged design was useful in adjusting the GUI presentation of AI output. Next steps involve clinical testing and further design modification of the AI output to optimally facilitate clinician surveillance and decisions. Clinicians should be involved early and often in clinical decision support design to optimize the efficacy of AI tools.
Affiliation(s)
- Stephanie Helman
- The Department of Acute and Tertiary Care Nursing, University of Pittsburgh, Pittsburgh, PA, United States
- Martha Ann Terry
- The Department of Behavioral and Community Health Sciences, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, United States
- Tiffany Pellathy
- The Veterans Administration Center for Health Equity Research and Promotion, Pittsburgh, PA, United States
- Andrew Williams
- The Auton Lab, School of Computer Science at Carnegie Mellon University, Pittsburgh, PA, United States
- Artur Dubrawski
- The Auton Lab, School of Computer Science at Carnegie Mellon University, Pittsburgh, PA, United States
- Gilles Clermont
- The Department of Critical Care Medicine, University of Pittsburgh, Pittsburgh, PA, United States
- Michael R Pinsky
- The Department of Critical Care Medicine, University of Pittsburgh, Pittsburgh, PA, United States
- Salah Al-Zaiti
- The Department of Acute and Tertiary Care Nursing, University of Pittsburgh, Pittsburgh, PA, United States; The Department of Emergency Medicine, University of Pittsburgh, Pittsburgh, PA, United States; The Division of Cardiology, University of Pittsburgh, Pittsburgh, PA, United States
- Marilyn Hravnak
- The Department of Acute and Tertiary Care Nursing, University of Pittsburgh, Pittsburgh, PA, United States
18
Kocbek S, Kocbek P, Gosak L, Fijačko N, Štiglic G. Extracting New Temporal Features to Improve the Interpretability of Undiagnosed Type 2 Diabetes Mellitus Prediction Models. J Pers Med 2022; 12:368. [PMID: 35330368] [PMCID: PMC8950921] [DOI: 10.3390/jpm12030368]
Abstract
Type 2 diabetes mellitus (T2DM) often results in high morbidity and mortality. In addition, T2DM presents a substantial financial burden for individuals and their families, health systems, and societies. According to studies and reports, the incidence and prevalence of T2DM are increasing rapidly worldwide. Several models have been built to predict T2DM onset in the future or detect undiagnosed T2DM in patients. In addition to the performance of such models, their interpretability is crucial for health experts, especially in personalized clinical prediction models. Data collected over 42 months from health check-up examinations and prescribed-drugs data repositories of four primary healthcare providers were used in this study. We propose a framework consisting of LogicRegression-based feature extraction and Least Absolute Shrinkage and Selection Operator (LASSO)-based prediction modeling for undiagnosed T2DM prediction. Performance of the models was measured using area under the ROC curve (AUC) with corresponding confidence intervals. Results show that using LogicRegression-based feature extraction resulted in simpler models, which are easier for healthcare experts to interpret, especially in cases with many binary features. Models developed using the proposed framework achieved an AUC of 0.818 (95% confidence interval (CI): 0.812-0.823), comparable to more complex models (i.e., models with a larger number of features) in which all features were included in prediction model development (AUC 0.816; 95% CI: 0.810-0.822). However, the difference in the number of features used was significant. This study proposes a framework for building interpretable models in healthcare that can contribute to greater trust in prediction models from healthcare experts.
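A minimal sketch of the prediction-modeling half of the proposed framework, an L1-penalised (LASSO-style) logistic regression evaluated by AUC; the data here are synthetic and the LogicRegression feature-extraction step is omitted:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for health check-up features (a few informative, many not).
X, y = make_classification(n_samples=2000, n_features=40, n_informative=8, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

# L1 penalty drives uninformative coefficients to exactly zero, yielding sparser,
# easier-to-interpret models, the interpretability gain the abstract emphasises.
lasso_lr = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
lasso_lr.fit(X_tr, y_tr)

n_kept = int(np.sum(lasso_lr.coef_ != 0))
auc = roc_auc_score(y_te, lasso_lr.predict_proba(X_te)[:, 1])
print(f"features retained: {n_kept}/40, AUC = {auc:.3f}")
```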
Affiliation(s)
- Simon Kocbek
- Institute of Informatics, Faculty of Electrical Engineering and Computer Science, University of Maribor, 2000 Maribor, Slovenia
- Primož Kocbek
- Faculty of Health Sciences, University of Maribor, 2000 Maribor, Slovenia
- Lucija Gosak
- Faculty of Health Sciences, University of Maribor, 2000 Maribor, Slovenia
- Nino Fijačko
- Faculty of Health Sciences, University of Maribor, 2000 Maribor, Slovenia
- Gregor Štiglic
- Institute of Informatics, Faculty of Electrical Engineering and Computer Science, University of Maribor, 2000 Maribor, Slovenia
- Faculty of Health Sciences, University of Maribor, 2000 Maribor, Slovenia
- Usher Institute, University of Edinburgh, Edinburgh EH8 9YL, UK
19
van de Sande D, van Genderen ME, Smit JM, Huiskens J, Visser JJ, Veen RER, van Unen E, Hilgers O, Gommers D, van Bommel J. Developing, implementing and governing artificial intelligence in medicine: a step-by-step approach to prevent an artificial intelligence winter. BMJ Health Care Inform 2022; 29:e100495. [PMID: 35185012] [PMCID: PMC8860016] [DOI: 10.1136/bmjhci-2021-100495]
Abstract
OBJECTIVE: Although the role of artificial intelligence (AI) in medicine is increasingly studied, most patients do not benefit because the majority of AI models remain in the testing and prototyping environment. The development and implementation trajectory of clinical AI models is complex, and a structured overview is missing. We therefore propose a step-by-step overview to enhance clinicians' understanding and to promote the quality of medical AI research.
METHODS: We summarised key elements (such as current guidelines, challenges, regulatory documents, and good practices) that are needed to develop and safely implement AI in medicine.
CONCLUSION: This overview complements other frameworks in that it is accessible to stakeholders without prior AI knowledge and provides a step-by-step approach incorporating all the key elements and current guidelines essential for implementation, thereby helping to move AI from bytes to bedside.
Affiliation(s)
- Davy van de Sande
- Department of Adult Intensive Care, Erasmus Medical Center, Rotterdam, The Netherlands
- Michel E Van Genderen
- Department of Adult Intensive Care, Erasmus Medical Center, Rotterdam, The Netherlands
- Jim M Smit
- Department of Adult Intensive Care, Erasmus Medical Center, Rotterdam, The Netherlands
- Pattern Recognition and Bioinformatics group, EEMCS, Delft University of Technology, Delft, The Netherlands
- Jacob J Visser
- Department of Radiology and Nuclear Medicine, Erasmus Medical Center, Rotterdam, The Netherlands
- Department of Information Technology, Chief Medical Information Officer, Erasmus Medical Center, Rotterdam, The Netherlands
- Robert E R Veen
- Department of Information Technology, theme Research Suite, Erasmus Medical Center, Rotterdam, The Netherlands
- Oliver Hilgers BA
- Active Medical Devices/Medical Device Software, CE Plus GmbH, Badenweiler, Germany
- Diederik Gommers
- Department of Adult Intensive Care, Erasmus Medical Center, Rotterdam, The Netherlands
- Jasper van Bommel
- Department of Adult Intensive Care, Erasmus Medical Center, Rotterdam, The Netherlands
20
van Elten HJ, Sülz S, van Raaij EM, Wehrens R. Big Data Health Care Innovations: Performance Dashboarding as a Process of Collective Sensemaking. J Med Internet Res 2022; 24:e30201. [PMID: 35191847] [PMCID: PMC8905474] [DOI: 10.2196/30201]
Abstract
Big data is poised to revolutionize health care, and performance dashboards can be an important tool to manage big data innovations. Dashboards show the progress being made and provide critical management information about effectiveness and efficiency. However, performance dashboards are more than just a clear and straightforward representation of performance in the health care context. Instead, the development and maintenance of informative dashboards can be more productively viewed as an interactive and iterative process involving all stakeholders. We refer to this process as dashboarding and reflect on our learnings within a large European Union–funded project. Within this project, multiple big data applications in health care are being developed, piloted, and scaled up. In this paper, we discuss the ways in which we cope with the inherent sensitivities and tensions surrounding dashboarding in such a dynamic environment.
Affiliation(s)
- Hilco J van Elten, Erasmus School of Health Policy & Management, Erasmus University, Rotterdam, Netherlands
- Sandra Sülz, Erasmus School of Health Policy & Management, Erasmus University, Rotterdam, Netherlands
- Erik M van Raaij, Erasmus School of Health Policy & Management, Erasmus University, Rotterdam, Netherlands; Rotterdam School of Management, Erasmus University, Rotterdam, Netherlands
- Rik Wehrens, Erasmus School of Health Policy & Management, Erasmus University, Rotterdam, Netherlands
21
Jaber D, Hajj H, Maalouf F, El-Hajj W. Medically-oriented design for explainable AI for stress prediction from physiological measurements. BMC Med Inform Decis Mak 2022; 22:38. [PMID: 35148762 PMCID: PMC8840288 DOI: 10.1186/s12911-022-01772-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/02/2021] [Accepted: 12/08/2021] [Indexed: 11/29/2022] Open
Abstract
Background In the last decade, much attention has been given to developing artificial intelligence (AI) solutions for mental health using machine learning. To build trust in AI applications, it is crucial for AI systems to provide practitioners and patients with the reasons behind the AI decisions. This is referred to as Explainable AI. While there has been significant progress in developing stress prediction models, little work has been done to develop explainable AI for mental health. Methods In this work, we address this gap by designing an explanatory AI report for stress prediction from wearable sensors. Because medical practitioners and patients are likely to be familiar with blood test reports, we modeled the look and feel of the explanatory AI report on that of a standard blood test report. The report includes the stress prediction and the physiological signals related to stressful episodes. In addition to the new design for explaining AI in mental health, the work includes the following contributions: methods to automatically generate different components of the report, an approach for evaluating and validating the accuracy of the explanations, and a collection of ground-truth relationships between physiological measurements and stress prediction. Results Test results showed that the explanations were consistent with ground truth. The reference intervals for stress versus non-stress were quite distinct, with little variation. In addition to the quantitative evaluations, a qualitative survey conducted with three expert psychiatrists confirmed the usefulness of the explanation report in understanding the different aspects of the AI system. Conclusion In this work, we have provided a new design for explainable AI used in stress prediction based on physiological measurements. Based on the report, users and medical practitioners can determine which biological features have the most impact on the prediction of stress, in addition to any health-related abnormalities. The effectiveness of the explainable AI report was evaluated using quantitative and qualitative assessments. The stress prediction accuracy was shown to be comparable to the state of the art, and the contributions of each physiological signal to the stress prediction were shown to correlate with ground truth. In addition to these quantitative evaluations, the qualitative survey with psychiatrists confirmed their confidence in the explanation report and its effectiveness for the stress predictions made by the AI system. Future work includes the addition of more explanatory features related to other emotional states of the patient, such as sadness, relaxation, anxiety, or happiness.
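The report design described above pairs each physiological signal with a reference interval derived from non-stress data, mimicking the normal ranges of a blood test. The following is a minimal Python sketch of that idea only; the feature names, synthetic data, and generic classifier are illustrative assumptions, not the authors' implementation.

# Hypothetical sketch of a blood-test-style explanation report for a
# stress classifier. Feature names, data, and model are invented for
# illustration and do not reproduce the paper's system.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
features = ["heart_rate", "skin_conductance", "skin_temperature"]
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # synthetic stress labels

model = RandomForestClassifier(random_state=0).fit(X, y)

# Reference intervals taken from the non-stress class, mimicking the
# "normal range" column of a standard blood test report.
ref = {f: (X[y == 0, i].min(), X[y == 0, i].max())
       for i, f in enumerate(features)}

sample = X[0]
prob = model.predict_proba(sample.reshape(1, -1))[0, 1]
print(f"Predicted stress probability: {prob:.2f}")
for i, f in enumerate(features):
    lo, hi = ref[f]
    flag = "ok" if lo <= sample[i] <= hi else "OUT OF RANGE"
    print(f"{f:>18}: {sample[i]:6.2f}   ref [{lo:.2f}, {hi:.2f}]   {flag}")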
Affiliation(s)
- Dalia Jaber, Electrical and Computer Engineering Department, American University of Beirut, Beirut, Lebanon
- Hazem Hajj, Pathfinding, Automation Technology and Analytics, Intel Corporation, Hillsboro, Oregon, USA
- Fadi Maalouf, Department of Psychiatry, American University of Beirut, Beirut, Lebanon
- Wassim El-Hajj, Computer Science Department, American University of Beirut, Beirut, Lebanon
22
Hwang J, Lee T, Lee H, Byun S. A Clinical Decision Support System for Sleep Staging Tasks With Explanations From Artificial Intelligence: User-Centered Design and Evaluation Study. J Med Internet Res 2022; 24:e28659. [PMID: 35044311 PMCID: PMC8811694 DOI: 10.2196/28659] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/09/2021] [Revised: 06/30/2021] [Accepted: 12/01/2021] [Indexed: 12/11/2022] Open
Abstract
Background Despite the unprecedented performance of deep learning algorithms in clinical domains, full reviews of algorithmic predictions by human experts remain mandatory. Under these circumstances, artificial intelligence (AI) models are primarily designed as clinical decision support systems (CDSSs). However, from the perspective of clinical practitioners, the lack of clinical interpretability and user-centered interfaces hinders the adoption of these AI systems in practice. Objective This study aims to develop an AI-based CDSS for assisting polysomnographic technicians in reviewing AI-predicted sleep staging results. It proposes and evaluates a CDSS that provides clinically sound explanations for AI predictions in a user-centered manner. Methods Our study is based on a user-centered design framework for developing explanations in a CDSS that identifies why explanations are needed, what information they should contain, and how they can be provided. We conducted user interviews, user observation sessions, and an iterative design process to identify three key aspects for designing explanations in the CDSS. After constructing the CDSS, the tool was evaluated to investigate how its explanations helped technicians. We measured the accuracy of sleep staging and interrater reliability with macro-F1 and Cohen κ scores to assess quantitative improvements after our tool was adopted, and we assessed qualitative improvements through participant interviews that established how participants perceived and used the tool. Results The user study revealed that technicians desire explanations that are relevant to key electroencephalogram (EEG) patterns for sleep staging when assessing the correctness of AI predictions; specifically, they wanted explanations that could be used to evaluate whether the AI models properly locate and use these patterns during prediction. On this basis, information closely related to sleep EEG patterns was formulated for the AI models. In the iterative design phase, we developed a different visualization strategy for each pattern based on how technicians interpreted EEG recordings with these patterns during their workflows. Our evaluation study of 9 polysomnographic technicians investigated the helpfulness of the tool quantitatively and qualitatively. Technicians with <5 years of work experience improved their quantitative sleep staging performance significantly, from 56.75 to 60.59 (P=.05). Qualitatively, participants reported that the information provided effectively supported them and that they could develop notable adoption strategies for the tool. Conclusions Our findings indicate that formulating clinical explanations for automated predictions using the information in the AI model, through a user-centered design process, is an effective strategy for developing a CDSS for sleep staging.
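For reference, the two agreement metrics named above (macro-F1 and Cohen κ) can be computed directly with scikit-learn. A self-contained illustration follows; the sleep-stage labels are invented for demonstration and are unrelated to the study's data.

# Hypothetical example: scoring a technician's sleep staging against a
# reference standard with macro-F1 and Cohen's kappa. Labels are invented.
from sklearn.metrics import cohen_kappa_score, f1_score

stages = ["W", "N1", "N2", "N3", "REM"]
reference  = ["W", "N2", "N2", "N2", "N3", "REM", "N1", "W"]
technician = ["W", "N1", "N2", "N2", "N3", "REM", "N2", "W"]

# Macro-F1 averages the per-stage F1 scores, weighting rare and common
# stages equally; Cohen's kappa measures agreement beyond chance.
macro_f1 = f1_score(reference, technician, labels=stages, average="macro")
kappa = cohen_kappa_score(reference, technician)
print(f"macro-F1: {macro_f1:.3f}   Cohen kappa: {kappa:.3f}")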
Affiliation(s)
- Seonjeong Byun, Department of Neuropsychiatry, Uijeongbu St Mary's Hospital, College of Medicine, The Catholic University of Korea, Uijeongbu-si, Republic of Korea
23
Hah H, Goldin D. Moving toward AI-assisted decision-making: Observation on clinicians' management of multimedia patient information in synchronous and asynchronous telehealth contexts. Health Informatics J 2022; 28:14604582221077049. [PMID: 35225704 DOI: 10.1177/14604582221077049] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
BACKGROUND Artificial intelligence (AI) is intended to support clinicians' patient diagnosis decisions by processing and identifying insights from multimedia patient information. OBJECTIVE We explored clinicians' current decision-making patterns when using multimedia patient information (MPI) provided by AI algorithms and identified areas where AI can support clinicians in diagnostic decision-making. DESIGN We recruited 87 advanced practice nursing (APN) students who had experience making diagnostic decisions using AI algorithms under various care contexts, including telehealth and other healthcare modalities. The participants described their diagnostic decision-making experiences using video-, image-, and audio-based MPI. RESULTS Clinicians processed multimedia patient information differentially, such that their focus, selection, and utilization of MPI influenced their diagnoses and satisfaction levels. CONCLUSIONS AND IMPLICATIONS To streamline collaboration between AI and clinicians across healthcare contexts, AI should accommodate clinicians' patterns of MPI processing under various care environments and provide them with interpretable analytic results. Furthermore, clinicians must be trained on the interface and contents of AI technology and its analytic assistance.
Affiliation(s)
- Hyeyoung Hah, Department of Information Systems and Business Analytics, Florida International University, FL, USA
- Deana Goldin, Nicole Wertheim College of Nursing & Health Sciences, Florida International University, FL, USA
24
Matthiesen S, Diederichsen SZ, Hansen MKH, Villumsen C, Lassen MCH, Jacobsen PK, Risum N, Winkel BG, Philbert BT, Svendsen JH, Andersen TO. Clinician Preimplementation Perspectives of a Decision-Support Tool for the Prediction of Cardiac Arrhythmia Based on Machine Learning: Near-Live Feasibility and Qualitative Study. JMIR Hum Factors 2021; 8:e26964. [PMID: 34842528 PMCID: PMC8665383 DOI: 10.2196/26964] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/05/2021] [Revised: 03/23/2021] [Accepted: 10/11/2021] [Indexed: 11/30/2022] Open
Abstract
BACKGROUND Artificial intelligence (AI), such as machine learning (ML), shows great promise for improving clinical decision-making in cardiac diseases by outperforming statistics-based models. However, few AI-based tools have been implemented in cardiology clinics because of the sociotechnical challenges of transitioning from algorithm development to real-world implementation. OBJECTIVE This study explored how an ML-based tool for predicting ventricular tachycardia and ventricular fibrillation (VT/VF) could support clinical decision-making in the remote monitoring of patients with an implantable cardioverter defibrillator (ICD). METHODS Seven experienced electrophysiologists participated in a near-live feasibility and qualitative study, which included walkthroughs of 5 blinded retrospective patient cases, use of the prediction tool, and questionnaires and interview questions. All sessions were video recorded, and sessions evaluating the prediction tool were transcribed verbatim. Data were analyzed through an inductive qualitative approach based on grounded theory. RESULTS The prediction tool was found to have potential for supporting decision-making in ICD remote monitoring by providing reassurance, increasing confidence, acting as a second opinion, reducing information search time, and enabling delegation of decisions to nurses and technicians. However, the prediction tool did not lead to changes in clinical action and was found less useful in cases where the quality of data was poor or where VT/VF predictions were irrelevant for evaluating the patient. CONCLUSIONS When transitioning from AI development to testing its feasibility for clinical implementation, we need to consider the following: expectations must be aligned with the intended use of AI; trust in the prediction tool is likely to emerge from real-world use; and AI accuracy is relational, depending on the available information and local workflows. Addressing the sociotechnical gap between the development and implementation of ML-based clinical decision-support tools in cardiac care is essential for successful adoption. We suggest including clinical end users, clinical contexts, and workflows throughout the iterative approach to design, development, and implementation.
Affiliation(s)
- Stina Matthiesen, Department of Computer Science, Faculty of Science, University of Copenhagen, Copenhagen, Denmark; Vital Beats, Copenhagen, Denmark
- Søren Zöga Diederichsen, Vital Beats, Copenhagen, Denmark; Department of Cardiology, Rigshospitalet, Copenhagen University Hospital, Copenhagen, Denmark
- Peter Karl Jacobsen, Department of Cardiology, Rigshospitalet, Copenhagen University Hospital, Copenhagen, Denmark
- Niels Risum, Department of Cardiology, Rigshospitalet, Copenhagen University Hospital, Copenhagen, Denmark
- Bo Gregers Winkel, Department of Cardiology, Rigshospitalet, Copenhagen University Hospital, Copenhagen, Denmark
- Berit T Philbert, Department of Cardiology, Rigshospitalet, Copenhagen University Hospital, Copenhagen, Denmark
- Jesper Hastrup Svendsen, Department of Cardiology, Rigshospitalet, Copenhagen University Hospital, Copenhagen, Denmark; Department of Clinical Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Tariq Osman Andersen, Department of Computer Science, Faculty of Science, University of Copenhagen, Copenhagen, Denmark; Vital Beats, Copenhagen, Denmark
25
Benda NC, Novak LL, Reale C, Ancker JS. Trust in AI: why we should be designing for APPROPRIATE reliance. J Am Med Inform Assoc 2021; 29:207-212. [PMID: 34725693 DOI: 10.1093/jamia/ocab238] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/30/2021] [Revised: 09/29/2021] [Accepted: 10/13/2021] [Indexed: 11/13/2022] Open
Abstract
Use of artificial intelligence in healthcare, such as machine learning-based predictive algorithms, holds promise for advancing outcomes, but few systems are used in routine clinical practice. Trust has been cited as an important challenge to meaningful use of artificial intelligence in clinical practice. Because artificial intelligence systems often automate cognitively challenging tasks, the previous literature on trust in automation may hold important lessons for artificial intelligence applications in healthcare. In this perspective, we argue that informatics should take lessons from the literature on trust in automation: the goal should be to foster appropriate trust in artificial intelligence based on the purpose of the tool, its process for making recommendations, and its performance in the given context. We adapt a conceptual model to support this argument and present recommendations for future work.
Affiliation(s)
- Natalie C Benda, Department of Population Health Sciences, Weill Cornell Medicine, New York, New York, USA
- Laurie L Novak, Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Carrie Reale, Center for Research and Innovation in Systems Safety, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Jessica S Ancker, Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
26
Roski J, Maier EJ, Vigilante K, Kane EA, Matheny ME. Enhancing trust in AI through industry self-governance. J Am Med Inform Assoc 2021; 28:1582-1590. [PMID: 33895824 PMCID: PMC8661431 DOI: 10.1093/jamia/ocab065] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/23/2020] [Revised: 03/17/2020] [Accepted: 03/26/2021] [Indexed: 12/14/2022] Open
Abstract
Artificial intelligence (AI) is critical to harnessing value from exponentially growing health and healthcare data. Expectations are high for AI solutions to effectively address current health challenges. However, prior periods of enthusiasm for AI have been followed by periods of disillusionment and reduced investment and progress, known as "AI Winters." We are now at risk of another AI Winter in health/healthcare due to increasing publicity around AI solutions that do not deliver the touted breakthroughs, thereby decreasing users' trust in AI. In this article, we first highlight recently published literature on AI risks and mitigation strategies that would be relevant for groups considering designing, implementing, and promoting self-governance. We then describe a process by which a diverse group of stakeholders could develop and define standards for promoting trust, as well as AI risk-mitigating practices, through greater industry self-governance. We also describe how adherence to such standards could be verified, specifically through certification/accreditation. Self-governance could be encouraged by governments to complement existing regulatory schema or legislative efforts to mitigate AI risks. Greater adoption of industry self-governance could fill a critical gap to construct a more comprehensive approach to the governance of AI solutions than US legislation/regulations currently encompass. In this more comprehensive approach, AI developers, AI users, and government/legislators all have critical roles to play in advancing practices that maintain trust in AI and prevent another AI Winter.
Affiliation(s)
- Michael E Matheny, Departments of Biomedical Informatics, Biostatistics, and Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA; Geriatric Research Education and Clinical Care Center, Tennessee Valley Healthcare System VA, Nashville, Tennessee, USA