1
Weber S, Wyszynski M, Godefroid M, Plattfaut R, Niehaves B. How do medical professionals make sense (or not) of AI? A social-media-based computational grounded theory study and an online survey. Comput Struct Biotechnol J 2024; 24:146-159. [PMID: 38434249] [PMCID: PMC10904922] [DOI: 10.1016/j.csbj.2024.02.009]
Abstract
To investigate the opinions and attitudes of medical professionals towards adopting AI-enabled healthcare technologies in their daily business, we used a mixed-methods approach. Study 1 employed a qualitative computational grounded theory approach analyzing 181 Reddit threads in the subreddit r/medicine. Utilizing an unsupervised machine learning clustering method, we identified three key themes: (1) consequences of AI, (2) the physician-AI relationship, and (3) a proposed way forward. In particular, Reddit posts related to the first two themes indicated that medical professionals' fear of being replaced by AI and skepticism toward AI played a major role in the discussions. Moreover, the results suggest that this fear is driven by low or moderate knowledge about AI. Posts related to the third theme focused on factual discussions about how AI and medicine have to be designed to become broadly adopted in health care. Study 2 quantitatively examined the relationship between the fear of AI, knowledge about AI, and medical professionals' intention to use AI-enabled technologies in more detail. Results based on a sample of 223 medical professionals who participated in the online survey revealed that the intention to use AI technologies increases with increasing knowledge about AI and that this effect is moderated by the fear of being replaced by AI.
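The moderation effect reported in Study 2 (the link between AI knowledge and use intention varying with fear of being replaced) corresponds statistically to an interaction term in a regression model. The sketch below fits such a model to simulated data; the coefficients, noise level, and variable names are illustrative assumptions, not the study's actual model or data.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 223                                    # sample size matching the survey

# Simulated, standardized predictors (illustrative only).
knowledge = rng.normal(size=n)             # knowledge about AI
fear = rng.normal(size=n)                  # fear of being replaced by AI

# Assumed ground truth: knowledge helps, fear dampens that effect.
intention = (0.5 * knowledge - 0.2 * fear
             - 0.3 * knowledge * fear
             + rng.normal(scale=0.5, size=n))

# Design matrix with an interaction (moderation) term.
X = np.column_stack([np.ones(n), knowledge, fear, knowledge * fear])
beta, *_ = np.linalg.lstsq(X, intention, rcond=None)

for name, b in zip(["intercept", "knowledge", "fear", "knowledge x fear"], beta):
    print(f"{name:>16}: {b:+.2f}")
```

A significant negative interaction coefficient is what "moderated by fear" means operationally: the slope of intention on knowledge shrinks as fear grows.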
Affiliation(s)
- Sebastian Weber
- University of Bremen, Digital Public, Bibliothekstr. 1, 28359 Bremen, Germany
- Marc Wyszynski
- University of Bremen, Digital Public, Bibliothekstr. 1, 28359 Bremen, Germany
- Marie Godefroid
- University of Siegen, Information Systems, Kohlbettstr. 15, 57072 Siegen, Germany
- Ralf Plattfaut
- University of Duisburg-Essen, Information Systems and Transformation Management, Universitätsstr. 9, 45141 Essen, Germany
- Bjoern Niehaves
- University of Bremen, Digital Public, Bibliothekstr. 1, 28359 Bremen, Germany
2
Kenny R, Fischhoff B, Davis A, Canfield C. Improving Social Bot Detection Through Aid and Training. Hum Factors 2024; 66:2323-2344. [PMID: 37963198] [PMCID: PMC11382440] [DOI: 10.1177/00187208231210145]
Abstract
OBJECTIVE We test the effects of three aids on individuals' ability to detect social bots among Twitter personas: a bot indicator score, a training video, and a warning. BACKGROUND Detecting social bots can prevent online deception. We use a simulated social media task to evaluate three aids. METHOD Lay participants judged whether each of 60 Twitter personas was a human or social bot in a simulated online environment; the probability of each persona being a bot was estimated from the agreement between three machine learning algorithms. Experiment 1 compared a control group and two intervention groups, one provided with a bot indicator score for each tweet and the other with a warning about social bots. Experiment 2 compared a control group and two intervention groups, one receiving the bot indicator scores and the other a training video focused on heuristics for identifying social bots. RESULTS The bot indicator score intervention improved predictive performance and reduced overconfidence in both experiments. The training video was also effective, although somewhat less so. The warning had no effect. Participants rarely reported willingness to share content for a persona that they labeled as a bot, even when they agreed with it. CONCLUSIONS Informative interventions improved social bot detection; warning alone did not. APPLICATION We offer an experimental testbed and methodology that can be used to evaluate and refine interventions designed to reduce vulnerability to social bots. We show the value of two interventions that could be applied in many settings.
Affiliation(s)
- Ryan Kenny
- United States Army, Fayetteville, NC, USA
- Alex Davis
- Carnegie Mellon University, Pittsburgh, PA, USA
- Casey Canfield
- Missouri University of Science and Technology, Rolla, MO, USA
3
Wahid KA, Kaffey ZY, Farris DP, Humbert-Vidan L, Moreno AC, Rasmussen M, Ren J, Naser MA, Netherton TJ, Korreman S, Balakrishnan G, Fuller CD, Fuentes D, Dohopolski MJ. Artificial intelligence uncertainty quantification in radiotherapy applications - A scoping review. Radiother Oncol 2024; 201:110542. [PMID: 39299574] [DOI: 10.1016/j.radonc.2024.110542]
Abstract
BACKGROUND/PURPOSE The use of artificial intelligence (AI) in radiotherapy (RT) is expanding rapidly. However, there exists a notable lack of clinician trust in AI models, underscoring the need for effective uncertainty quantification (UQ) methods. The purpose of this study was to scope existing literature related to UQ in RT, identify areas of improvement, and determine future directions. METHODS We followed the PRISMA-ScR scoping review reporting guidelines. We utilized the population (human cancer patients), concept (utilization of AI UQ), context (radiotherapy applications) framework to structure our search and screening process. We conducted a systematic search spanning seven databases, supplemented by manual curation, up to January 2024. Our search yielded a total of 8980 articles for initial review. Manuscript screening and data extraction were performed in Covidence. Data extraction categories included general study characteristics, RT characteristics, AI characteristics, and UQ characteristics. RESULTS We identified 56 articles published from 2015 to 2024. Ten domains of RT applications were represented; most studies evaluated auto-contouring (50%), followed by image synthesis (13%) and multiple applications simultaneously (11%). Twelve disease sites were represented, with head and neck cancer being the most common disease site independent of application space (32%). Imaging data were used in 91% of studies, while only 13% incorporated RT dose information. Most studies focused on failure detection as the main application of UQ (60%), with Monte Carlo dropout being the most commonly implemented UQ method (32%), followed by ensembling (16%). Of the included studies, 55% did not share code or datasets. CONCLUSION Our review revealed a lack of diversity in UQ for RT applications beyond auto-contouring. Moreover, we identified a clear need to study additional UQ methods, such as conformal prediction. Our results may incentivize the development of guidelines for the reporting and implementation of UQ in RT.
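Monte Carlo dropout, the most commonly implemented UQ method among the reviewed studies, keeps dropout active at inference time and treats the spread of repeated stochastic forward passes as a measure of predictive uncertainty. A minimal numpy sketch with a toy two-layer network (all weights, sizes, and names are illustrative, not taken from any reviewed study):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer network with fixed (pretend "trained") weights.
W1 = rng.normal(size=(4, 16))
W2 = rng.normal(size=(16, 1))

def forward(x, dropout_p=0.5):
    """One stochastic forward pass with dropout kept ON at inference."""
    h = np.maximum(x @ W1, 0.0)               # ReLU hidden layer
    mask = rng.random(h.shape) > dropout_p    # Bernoulli dropout mask
    h = h * mask / (1.0 - dropout_p)          # inverted-dropout scaling
    return 1.0 / (1.0 + np.exp(-(h @ W2)))    # sigmoid output

def mc_dropout_predict(x, n_samples=100):
    """Monte Carlo dropout: repeat stochastic passes, summarize the spread."""
    preds = np.stack([forward(x) for _ in range(n_samples)])
    return preds.mean(axis=0), preds.std(axis=0)  # prediction, uncertainty

x = rng.normal(size=(1, 4))
mean, std = mc_dropout_predict(x)
print(mean.shape, std.shape)  # (1, 1) (1, 1)
```

In practice a large standard deviation flags inputs (e.g. contours or synthetic images) whose prediction should be reviewed by a human, which is the failure-detection use dominant in the review.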
Affiliation(s)
- Kareem A Wahid
- Department of Imaging Physics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA; Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Zaphanlene Y Kaffey
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- David P Farris
- Research Medical Library, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Laia Humbert-Vidan
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Amy C Moreno
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Jintao Ren
- Department of Oncology, Aarhus University Hospital, Denmark
- Mohamed A Naser
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Tucker J Netherton
- Department of Radiation Physics, University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Stine Korreman
- Department of Oncology, Aarhus University Hospital, Denmark
- Clifton D Fuller
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- David Fuentes
- Department of Imaging Physics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Michael J Dohopolski
- Department of Radiation Oncology, The University of Texas Southwestern Medical Center, Dallas, TX, USA
4
Mortlock R, Lucas C. Generative artificial intelligence (Gen-AI) in pharmacy education: Utilization and implications for academic integrity: A scoping review. Explor Res Clin Soc Pharm 2024; 15:100481. [PMID: 39184524] [PMCID: PMC11341932] [DOI: 10.1016/j.rcsop.2024.100481]
Abstract
Introduction Generative artificial intelligence (Gen-AI), exemplified by the widely adopted ChatGPT, has garnered significant attention in recent years. Its application spans various health education domains, including pharmacy, where its potential benefits and drawbacks have become increasingly apparent. Despite the growing adoption of Gen-AI such as ChatGPT in pharmacy education, there remains a critical need to assess and mitigate associated risks. This review explores the literature and potential strategies for mitigating risks associated with the integration of Gen-AI in pharmacy education. Aim To conduct a scoping review to identify the implications of Gen-AI in pharmacy education and its use and emerging evidence, with a particular focus on strategies that mitigate potential risks to academic integrity. Methods A scoping review strategy was employed in accordance with the PRISMA-ScR guidelines. Databases searched included PubMed, ERIC (Education Resources Information Center), Scopus and ProQuest from August 2023 to 20 February 2024, including all relevant records from 1 January 2000 to 20 February 2024 relating specifically to LLM use within pharmacy education. A grey literature search was also conducted due to the emerging nature of this topic. Policies, procedures, and documents from institutions such as universities and colleges, including standards, guidelines, and policy documents, were hand searched and reviewed in their most updated form. These documents were not published in the scientific literature or indexed in academic search engines. Results Articles (n = 12) were derived from the scientific databases and records (n = 9) from the grey literature. Potential uses and benefits of Gen-AI within pharmacy education were identified in all included published articles; however, there was a paucity of published articles that considered the potential risks to academic integrity. Grey literature records held the largest proportion of risk mitigation strategies, largely focusing on increased academic and student education and training on the ethical use of Gen-AI, as well as considerations for redesigning current assessments likely to be at risk from Gen-AI use. Conclusion Drawing upon existing literature, this review highlights the importance of evidence-based approaches to address the challenges posed by Gen-AI such as ChatGPT in pharmacy education settings. Additionally, whilst mitigation strategies are suggested, primarily drawn from the grey literature, there is a paucity of traditionally published scientific literature outlining strategies for the practical and ethical implementation of Gen-AI within pharmacy education. Further research related to the responsible and ethical use of Gen-AI in pharmacy curricula, and studies related to strategies adopted to mitigate risks to academic integrity, would be beneficial.
Affiliation(s)
- R. Mortlock
- Graduate School of Health, Faculty of Health, University of Technology, Sydney, Australia
- C. Lucas
- Graduate School of Health, Faculty of Health, University of Technology, Sydney, Australia
- School of Population Health, Faculty of Medicine and Health, University of NSW, Sydney, Australia
- Connected Intelligence Centre (CIC), University of Technology Sydney, Australia
5
Topff L, Steltenpool S, Ranschaert ER, Ramanauskas N, Menezes R, Visser JJ, Beets-Tan RGH, Hartkamp NS. Artificial intelligence-assisted double reading of chest radiographs to detect clinically relevant missed findings: a two-centre evaluation. Eur Radiol 2024; 34:5876-5885. [PMID: 38466390] [PMCID: PMC11364654] [DOI: 10.1007/s00330-024-10676-w]
Abstract
OBJECTIVES To evaluate an artificial intelligence (AI)-assisted double reading system for detecting clinically relevant missed findings on routinely reported chest radiographs. METHODS A retrospective study was performed in two institutions, a secondary care hospital and tertiary referral oncology centre. Commercially available AI software performed a comparative analysis of chest radiographs and radiologists' authorised reports using a deep learning and natural language processing algorithm, respectively. The AI-detected discrepant findings between images and reports were assessed for clinical relevance by an external radiologist, as part of the commercial service provided by the AI vendor. The selected missed findings were subsequently returned to the institution's radiologist for final review. RESULTS In total, 25,104 chest radiographs of 21,039 patients (mean age 61.1 years ± 16.2 [SD]; 10,436 men) were included. The AI software detected discrepancies between imaging and reports in 21.1% (5289 of 25,104). After review by the external radiologist, 0.9% (47 of 5289) of cases were deemed to contain clinically relevant missed findings. The institution's radiologists confirmed 35 of 47 missed findings (74.5%) as clinically relevant (0.1% of all cases). Missed findings consisted of lung nodules (71.4%, 25 of 35), pneumothoraces (17.1%, 6 of 35) and consolidations (11.4%, 4 of 35). CONCLUSION The AI-assisted double reading system was able to identify missed findings on chest radiographs after report authorisation. The approach required an external radiologist to review the AI-detected discrepancies. The number of clinically relevant missed findings by radiologists was very low. CLINICAL RELEVANCE STATEMENT The AI-assisted double reader workflow was shown to detect diagnostic errors and could be applied as a quality assurance tool. Although clinically relevant missed findings were rare, there is potential impact given the common use of chest radiography. 
KEY POINTS
• A commercially available double reading system supported by artificial intelligence was evaluated to detect reporting errors in chest radiographs (n = 25,104) from two institutions.
• Clinically relevant missed findings were found in 0.1% of chest radiographs and consisted of unreported lung nodules, pneumothoraces and consolidations.
• Applying AI software as a secondary reader after report authorisation can assist in reducing diagnostic errors without interrupting the radiologist's reading workflow. However, the number of AI-detected discrepancies was considerable and required review by a radiologist to assess their relevance.
Affiliation(s)
- Laurens Topff
- Department of Radiology, Netherlands Cancer Institute, Amsterdam, The Netherlands
- GROW School for Oncology and Reproduction, Maastricht University, Maastricht, The Netherlands
- Sanne Steltenpool
- Department of Radiology and Nuclear Medicine, Erasmus MC, University Medical Center Rotterdam, Rotterdam, The Netherlands
- Department of Radiology, Elisabeth-TweeSteden Hospital, Tilburg, The Netherlands
- Erik R Ranschaert
- Department of Radiology, St. Nikolaus Hospital, Eupen, Belgium
- Ghent University, Ghent, Belgium
- Naglis Ramanauskas
- Oxipit UAB, Vilnius, Lithuania
- Department of Radiology, Nuclear Medicine and Medical Physics, Institute of Biomedical Sciences, Faculty of Medicine, Vilnius University, Vilnius, Lithuania
- Renee Menezes
- Biostatistics Centre, Department of Psychosocial Research and Epidemiology, Netherlands Cancer Institute, Amsterdam, The Netherlands
- Jacob J Visser
- Department of Radiology and Nuclear Medicine, Erasmus MC, University Medical Center Rotterdam, Rotterdam, The Netherlands
- Regina G H Beets-Tan
- Department of Radiology, Netherlands Cancer Institute, Amsterdam, The Netherlands
- GROW School for Oncology and Reproduction, Maastricht University, Maastricht, The Netherlands
- Nolan S Hartkamp
- Department of Radiology, Elisabeth-TweeSteden Hospital, Tilburg, The Netherlands
6
Desolda G, Dimauro G, Esposito A, Lanzilotti R, Matera M, Zancanaro M. A Human-AI interaction paradigm and its application to rhinocytology. Artif Intell Med 2024; 155:102933. [PMID: 39094227] [DOI: 10.1016/j.artmed.2024.102933]
Abstract
This article explores Human-Centered Artificial Intelligence (HCAI) in medical cytology, with a focus on enhancing the interaction with AI. It presents a Human-AI interaction paradigm that emphasizes explainability and user control of AI systems: an iterative negotiation process based on three interaction strategies that aim to (i) elaborate the system outcomes through iterative steps (Iterative Exploration), (ii) explain the AI system's behavior or decisions (Clarification), and (iii) allow non-expert users to trigger simple retraining of the AI model (Reconfiguration). This interaction paradigm guides the redesign of an existing AI-based tool for microscopic analysis of the nasal mucosa, and the resulting tool is tested with rhinocytologists. The article discusses the results of the conducted evaluation and outlines lessons learned that are relevant for AI in medicine.
Affiliation(s)
- Giuseppe Desolda
- Department of Computer Science, University of Bari Aldo Moro, Via E. Orabona 4, Bari, 70125, Italy
- Giovanni Dimauro
- Department of Computer Science, University of Bari Aldo Moro, Via E. Orabona 4, Bari, 70125, Italy
- Andrea Esposito
- Department of Computer Science, University of Bari Aldo Moro, Via E. Orabona 4, Bari, 70125, Italy
- Rosa Lanzilotti
- Department of Computer Science, University of Bari Aldo Moro, Via E. Orabona 4, Bari, 70125, Italy
- Maristella Matera
- Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Via Ponzio 34/5, Milan, 20133, Italy
- Massimo Zancanaro
- Department of Psychology and Cognitive Science, University of Trento, Corso Bettini 31, Rovereto, 38068, Italy; Fondazione Bruno Kessler, Povo, Trento, 38123, Italy
7
Moosavi A, Huang S, Vahabi M, Motamedivafa B, Tian N, Mahmood R, Liu P, Sun CL. Prospective Human Validation of Artificial Intelligence Interventions in Cardiology: A Scoping Review. JACC Adv 2024; 3:101202. [PMID: 39372457] [PMCID: PMC11450923] [DOI: 10.1016/j.jacadv.2024.101202]
Abstract
Background Despite the potential of artificial intelligence (AI) in enhancing cardiovascular care, its integration into clinical practice is limited by a lack of evidence on its effectiveness with respect to human experts or gold standard practices in real-world settings. Objectives The purpose of this study was to identify AI interventions in cardiology that have been prospectively validated against human expert benchmarks or gold standard practices, to assess their effectiveness, and to identify future research areas. Methods We systematically reviewed Scopus and MEDLINE to identify peer-reviewed publications that involved prospective human validation of AI-based interventions in cardiology from January 2015 to December 2023. Results Of 2,351 initial records, 64 studies were included. Among these studies, 59 (92.2%) were published after 2020. A total of 11 (17.2%) randomized controlled trials were published. AI interventions in 44 articles (68.75%) reported definite clinical or operational improvements over human experts. These interventions were mostly used in imaging (n = 14, 21.9%), ejection fraction (n = 10, 15.6%), arrhythmia (n = 9, 14.1%), and coronary artery disease (n = 12, 18.8%) application areas. Convolutional neural networks were the most common predictive model (n = 44, 69%), and images were the most used data type (n = 38, 54.3%). Only 22 (34.4%) studies made their models or data accessible. Conclusions This review identifies the potential of AI in cardiology, with models often performing equally well as human counterparts for specific and clearly scoped tasks suitable for such models. Nonetheless, the limited number of randomized controlled trials emphasizes the need for continued validation, especially in real-world settings that closely examine joint human-AI decision-making.
Affiliation(s)
- Amirhossein Moosavi
- Telfer School of Management, University of Ottawa, Ottawa, Ontario, Canada
- University of Ottawa Heart Institute, University of Ottawa, Ottawa, Ontario, Canada
- Steven Huang
- University of Ottawa Heart Institute, University of Ottawa, Ottawa, Ontario, Canada
- Maryam Vahabi
- Telfer School of Management, University of Ottawa, Ottawa, Ontario, Canada
- University of Ottawa Heart Institute, University of Ottawa, Ottawa, Ontario, Canada
- Bahar Motamedivafa
- Telfer School of Management, University of Ottawa, Ottawa, Ontario, Canada
- University of Ottawa Heart Institute, University of Ottawa, Ottawa, Ontario, Canada
- Nelly Tian
- Marshall School of Business, University of Southern California, Los Angeles, California, USA
- Rafid Mahmood
- Telfer School of Management, University of Ottawa, Ottawa, Ontario, Canada
- Peter Liu
- University of Ottawa Heart Institute, University of Ottawa, Ottawa, Ontario, Canada
- Christopher L.F. Sun
- Telfer School of Management, University of Ottawa, Ottawa, Ontario, Canada
- University of Ottawa Heart Institute, University of Ottawa, Ottawa, Ontario, Canada
8
McCradden MD, Stedman I. Explaining decisions without explainability? Artificial intelligence and medicolegal accountability. Future Healthc J 2024; 11:100171. [PMID: 39371527] [PMCID: PMC11452834] [DOI: 10.1016/j.fhj.2024.100171]
Abstract
[Graphical abstract only; no text abstract available]
Affiliation(s)
- Melissa D. McCradden
- Australian Institute for Machine Learning, University of Adelaide, Australia
- Women's and Children's Hospital, Adelaide, Australia
- SickKids Research Institute, Toronto, Canada
- Ian Stedman
- School of Public Policy and Administration at York University, Toronto, Ontario, Canada
9
|
Dingel J, Kleine AK, Cecil J, Sigl AL, Lermer E, Gaube S. Predictors of Health Care Practitioners' Intention to Use AI-Enabled Clinical Decision Support Systems: Meta-Analysis Based on the Unified Theory of Acceptance and Use of Technology. J Med Internet Res 2024; 26:e57224. [PMID: 39102675 PMCID: PMC11333871 DOI: 10.2196/57224] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/15/2024] [Revised: 05/03/2024] [Accepted: 05/13/2024] [Indexed: 08/07/2024] Open
Abstract
BACKGROUND Artificial intelligence-enabled clinical decision support systems (AI-CDSSs) offer potential for improving health care outcomes, but their adoption among health care practitioners remains limited. OBJECTIVE This meta-analysis identified predictors influencing health care practitioners' intention to use AI-CDSSs based on the Unified Theory of Acceptance and Use of Technology (UTAUT). Additional predictors were examined based on existing empirical evidence. METHODS The literature search using electronic databases, forward searches, conference programs, and personal correspondence yielded 7731 results, of which 17 (0.22%) studies met the inclusion criteria. Random-effects meta-analysis, relative weight analyses, and meta-analytic moderation and mediation analyses were used to examine the relationships between relevant predictor variables and the intention to use AI-CDSSs. RESULTS The meta-analysis results supported the application of the UTAUT to the context of the intention to use AI-CDSSs. The results showed that performance expectancy (r=0.66), effort expectancy (r=0.55), social influence (r=0.66), and facilitating conditions (r=0.66) were positively associated with the intention to use AI-CDSSs, in line with the predictions of the UTAUT. The meta-analysis further identified positive attitude (r=0.63), trust (r=0.73), anxiety (r=-0.41), perceived risk (r=-0.21), and innovativeness (r=0.54) as additional relevant predictors. Trust emerged as the most influential predictor overall. The results of the moderation analyses show that the relationship between social influence and use intention becomes weaker with increasing age. In addition, the relationship between effort expectancy and use intention was stronger for diagnostic AI-CDSSs than for devices that combined diagnostic and treatment recommendations. Finally, the relationship between facilitating conditions and use intention was mediated through performance and effort expectancy. 
CONCLUSIONS This meta-analysis contributes to the understanding of the predictors of intention to use AI-CDSSs based on an extended UTAUT model. More research is needed to substantiate the identified relationships and explain the observed variations in effect sizes by identifying relevant moderating factors. The research findings bear important implications for the design and implementation of training programs for health care practitioners to ease the adoption of AI-CDSSs into their practice.
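Pooled correlations like those reported above are typically obtained by Fisher z-transforming each study's correlation and combining the transformed values under a random-effects model. The sketch below implements DerSimonian-Laird pooling from first principles; the input correlations and sample sizes are illustrative, not the meta-analysis's data.

```python
import math

def pool_correlations(rs, ns):
    """Random-effects (DerSimonian-Laird) pooling of correlations
    via Fisher's z transform."""
    zs = [math.atanh(r) for r in rs]          # Fisher z transform
    vs = [1.0 / (n - 3) for n in ns]          # sampling variance of z
    w = [1.0 / v for v in vs]                 # fixed-effect weights
    z_fe = sum(wi * zi for wi, zi in zip(w, zs)) / sum(w)
    # DerSimonian-Laird between-study variance tau^2
    q = sum(wi * (zi - z_fe) ** 2 for wi, zi in zip(w, zs))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(rs) - 1)) / c)
    # Random-effects weights, pooled z, back-transformed to r
    w_re = [1.0 / (v + tau2) for v in vs]
    z_re = sum(wi * zi for wi, zi in zip(w_re, zs)) / sum(w_re)
    return math.tanh(z_re)

# Illustrative per-study correlations and sample sizes (not the paper's data).
print(round(pool_correlations([0.60, 0.70, 0.55], [120, 200, 150]), 2))
```

The z transform is used because the sampling distribution of a raw correlation is skewed; tau² inflates each study's variance to account for true between-study heterogeneity before weighting.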
Affiliation(s)
- Julius Dingel
- Human-AI-Interaction Group, Center for Leadership and People Management, Ludwig Maximilian University of Munich, Munich, Germany
- Anne-Kathrin Kleine
- Human-AI-Interaction Group, Center for Leadership and People Management, Ludwig Maximilian University of Munich, Munich, Germany
- Julia Cecil
- Human-AI-Interaction Group, Center for Leadership and People Management, Ludwig Maximilian University of Munich, Munich, Germany
- Anna Leonie Sigl
- Department of Liberal Arts and Sciences, Technical University of Applied Sciences Augsburg, Augsburg, Germany
- Eva Lermer
- Human-AI-Interaction Group, Center for Leadership and People Management, Ludwig Maximilian University of Munich, Munich, Germany
- Department of Liberal Arts and Sciences, Technical University of Applied Sciences Augsburg, Augsburg, Germany
- Susanne Gaube
- Human Factors in Healthcare, Global Business School for Health, University College London, London, United Kingdom
10
Rainey C, Bond R, McConnell J, Hughes C, Kumar D, McFadden S. Reporting radiographers' interaction with Artificial Intelligence: how do different forms of AI feedback impact trust and decision switching? PLOS Digit Health 2024; 3:e0000560. [PMID: 39110687] [PMCID: PMC11305567] [DOI: 10.1371/journal.pdig.0000560]
Abstract
Artificial Intelligence (AI) has been increasingly integrated into healthcare settings, including the radiology department, to aid radiographic image interpretation, including reporting by radiographers. Trust has been cited as a barrier to effective clinical implementation of AI, and calibrating trust appropriately will be important to ensure the ethical use of these systems for the benefit of the patient, clinician and health services. Means of explainable AI, such as heatmaps, have been proposed to increase AI transparency and trust by elucidating which parts of an image the AI 'focussed on' when making its decision. The aim of this novel study was to quantify the impact of different forms of AI feedback on expert clinicians' trust. Whilst this study was conducted in the UK, it has potential international application and impact for AI interface design, either globally or in countries with similar cultural and/or economic status to the UK. A convolutional neural network was built for this study: trained, validated and tested on a publicly available dataset of MUsculoskeletal RAdiographs (MURA), with binary diagnoses and Gradient Class Activation Maps (GradCAM) as outputs. Reporting radiographers (n = 12) were recruited from all four regions of the UK. Qualtrics was used to present each participant with a total of 18 complete examinations from the MURA test dataset (each examination contained more than one radiographic image). Participants were presented with the images first, the images with heatmaps next, and finally an AI binary diagnosis, in sequential order. Perception of trust in the AI system was obtained following the presentation of each heatmap and binary feedback, and participants were asked to indicate whether they would change their mind (decision switch) in response to the AI feedback. Participants disagreed with the AI heatmaps for the abnormal examinations 45.8% of the time and agreed with the binary feedback on 86.7% of examinations (26/30 presentations). Only two participants indicated that they would decision switch in response to all AI feedback (GradCAM and binary) (0.7%, n = 2) across all datasets. 22.2% (n = 32) of participants agreed with the localisation of pathology on the heatmap. The level of agreement with the GradCAM heatmaps and the binary diagnosis was correlated with trust (GradCAM: -.515 and -.584, significant large negative correlations at the .01 level; binary diagnosis: -.309 and -.369, significant medium negative correlations at the .01 level). This study shows that, for the participants in this study, the extent of agreement with both the AI binary diagnosis and the heatmap is correlated with trust in AI, where greater agreement with the form of AI feedback is associated with greater trust, in particular for the heatmap form of feedback. Forms of explainable AI should be developed with cognisance of the need for precision and accuracy in localisation to promote appropriate trust in clinical end users.
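GradCAM, the heatmap method this study used as explainable-AI feedback, weights each feature map of the last convolutional layer by the global-average-pooled gradient of the class score with respect to that map, sums the weighted maps, and applies a ReLU. For a linear head over globally average-pooled feature maps those gradients reduce to the head weights, which permits a compact numpy sketch (toy activations and weights, purely illustrative, not the study's network):

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy "last conv layer" activations: K feature maps of size H x W.
K, H, W = 8, 6, 6
A = rng.random((K, H, W))

# Linear head over global-average-pooled maps:
# score = sum_k w_head[k] * mean(A[k]).
w_head = rng.normal(size=K)

# GradCAM channel weights: the gradient of the score w.r.t. each map,
# global-average-pooled. For this head, d(score)/dA[k] = w_head[k] / (H * W)
# everywhere, so the pooled gradient is simply w_head[k] / (H * W).
alpha = w_head / (H * W)

# Weighted sum of feature maps, then ReLU: keep only positive evidence.
heatmap = np.maximum(np.tensordot(alpha, A, axes=1), 0.0)

# Normalize to [0, 1] for display as an overlay on the radiograph.
heatmap /= heatmap.max() + 1e-8
print(heatmap.shape)  # (6, 6)
```

In a real CNN the channel weights come from backpropagated gradients rather than a closed form; deep-learning frameworks obtain them with automatic differentiation, and the low-resolution map is upsampled onto the input image.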
Affiliation(s)
- Clare Rainey
- Ulster University, School of Health Sciences, York St, Belfast, Northern Ireland
- Raymond Bond
- Ulster University, School of Computing, York St, Belfast, Northern Ireland
- Ciara Hughes
- Ulster University, School of Health Sciences, York St, Belfast, Northern Ireland
- Devinder Kumar
- School of Medicine, Stanford University, California, United States of America
- Sonyia McFadden
- Ulster University, School of Health Sciences, York St, Belfast, Northern Ireland

11
Brady AP, Allen B, Chong J, Kotter E, Kottler N, Mongan J, Oakden-Rayner L, Pinto Dos Santos D, Tang A, Wald C, Slavotinek J. Developing, Purchasing, Implementing and Monitoring AI Tools in Radiology: Practical Considerations. A Multi-Society Statement From the ACR, CAR, ESR, RANZCR & RSNA. J Am Coll Radiol 2024; 21:1292-1310. [PMID: 38276923 DOI: 10.1016/j.jacr.2023.12.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/27/2024]
Abstract
Artificial intelligence (AI) carries the potential for unprecedented disruption in radiology, with possible positive and negative consequences. The integration of AI in radiology holds the potential to revolutionize healthcare practices by advancing the diagnosis, quantification, and management of multiple medical conditions. Nevertheless, the ever-growing availability of AI tools in radiology highlights an increasing need to critically evaluate claims for their utility and to differentiate safe product offerings from potentially harmful or fundamentally unhelpful ones. This multi-society paper, presenting the views of radiology societies in the USA, Canada, Europe, Australia, and New Zealand, defines the potential practical problems and ethical issues surrounding the incorporation of AI into radiological practice. In addition to delineating the main points of concern that developers, regulators, and purchasers of AI tools should consider prior to their introduction into clinical practice, this statement also suggests methods to monitor their stability and safety in clinical use, and their suitability for possible autonomous function. This statement is intended to serve as a useful summary of the practical issues which should be considered by all parties involved in the development of radiology AI resources and their implementation as clinical tools.
Affiliation(s)
- Bibb Allen
- Department of Radiology, Grandview Medical Center, Birmingham, Alabama; American College of Radiology Data Science Institute, Reston, Virginia
- Jaron Chong
- Department of Medical Imaging, Schulich School of Medicine and Dentistry, Western University, London, ON, Canada
- Elmar Kotter
- Department of Diagnostic and Interventional Radiology, Medical Center, Faculty of Medicine, University of Freiburg, Freiburg, Germany
- Nina Kottler
- Radiology Partners, El Segundo, California; Stanford Center for Artificial Intelligence in Medicine & Imaging, Palo Alto, California
- John Mongan
- Department of Radiology and Biomedical Imaging, University of California, San Francisco, California
- Lauren Oakden-Rayner
- Australian Institute for Machine Learning, University of Adelaide, Adelaide, Australia
- Daniel Pinto Dos Santos
- Department of Radiology, University Hospital of Cologne, Cologne, Germany; Department of Radiology, University Hospital of Frankfurt, Frankfurt, Germany
- An Tang
- Department of Radiology, Radiation Oncology, and Nuclear Medicine, Université de Montréal, Montréal, Québec, Canada
- Christoph Wald
- Department of Radiology, Lahey Hospital & Medical Center, Burlington, Massachusetts; Tufts University Medical School, Boston, Massachusetts; Commission on Informatics, and Member, Board of Chancellors, American College of Radiology, Virginia
- John Slavotinek
- South Australia Medical Imaging, Flinders Medical Centre Adelaide, Adelaide, Australia; College of Medicine and Public Health, Flinders University, Adelaide, Australia

12
Montomoli J, Bitondo MM, Cascella M, Rezoagli E, Romeo L, Bellini V, Semeraro F, Gamberini E, Frontoni E, Agnoletti V, Altini M, Benanti P, Bignami EG. Algor-ethics: charting the ethical path for AI in critical care. J Clin Monit Comput 2024; 38:931-939. [PMID: 38573370 PMCID: PMC11297831 DOI: 10.1007/s10877-024-01157-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/18/2023] [Accepted: 03/22/2024] [Indexed: 04/05/2024]
Abstract
The integration of Clinical Decision Support Systems (CDSS) based on artificial intelligence (AI) into healthcare is a groundbreaking evolution with enormous potential, but its development and ethical implementation present unique challenges, particularly in critical care, where physicians often deal with life-threatening conditions requiring rapid action and with patients who are unable to participate in the decision-making process. Moreover, the development of AI-based CDSS is complex and should address different sources of bias, including data acquisition, health disparities, domain shifts during clinical use, and cognitive biases in decision-making. In this scenario, algor-ethics is essential; it emphasizes the integration of 'Human-in-the-Loop' and 'Algorithmic Stewardship' principles and the benefits of advanced data engineering. The establishment of Clinical AI Departments (CAID) is necessary to lead AI innovation in healthcare, ensuring ethical integrity and human-centered development in this rapidly evolving field.
Affiliation(s)
- Jonathan Montomoli
- Department of Anesthesia and Intensive Care, Infermi Hospital, Romagna Local Health Authority, Viale Settembrini 2, Rimini, 47923, Italy
- Health Services Research, Evaluation and Policy Unit, Romagna Local Health Authority, Viale Settembrini 2, Rimini, 47923, Italy
- Maria Maddalena Bitondo
- Department of Anesthesia and Intensive Care, Infermi Hospital, Romagna Local Health Authority, Viale Settembrini 2, Rimini, 47923, Italy
- Marco Cascella
- Unit of Anesthesia and Pain Medicine, Department of Medicine, Surgery and Dentistry "Scuola Medica Salernitana," University of Salerno, Baronissi, Salerno, Italy
- Emanuele Rezoagli
- School of Medicine and Surgery, University of Milano-Bicocca, Via Cadore, 48, Monza, 20900, Italy
- Dipartimento di Emergenza e Urgenza, Terapia intensiva e Semintensiva adulti e pediatrica, Fondazione IRCCS San Gerardo dei Tintori, Via Pergolesi, 33, Monza, 20900, Italy
- Luca Romeo
- Department of Economics and Law, University of Macerata, Macerata, 62100, Italy
- Valentina Bellini
- Anesthesiology, Critical Care and Pain Medicine Division, Department of Medicine and Surgery, University of Parma, Via Gramsci 14, Parma, 43125, Italy
- Federico Semeraro
- Department of Anesthesia, Intensive Care and Prehospital Emergency, Ospedale Maggiore Carlo Alberto Pizzardi, Largo Bartolo Nigrisoli, 2, Bologna, 40133, Italy
- Emiliano Gamberini
- Department of Anesthesia and Intensive Care, Infermi Hospital, Romagna Local Health Authority, Viale Settembrini 2, Rimini, 47923, Italy
- Emanuele Frontoni
- Department of Political Sciences, Communication and International Relations, University of Macerata, Macerata, 62100, Italy
- Vanni Agnoletti
- Department of Surgery and Trauma, Anesthesia and Intensive Care Unit, Maurizio Bufalini Hospital, Romagna Local Health Authority, Viale Giovanni Ghirotti, 286, Cesena, 47521, Italy
- Mattia Altini
- Hospital Care Sector, Emilia-Romagna Region, Via Aldo Moro, 21, Bologna, 40127, Italy
- Paolo Benanti
- Pontifical Gregorian University, Piazza della Pilotta 4, Roma, 00187, Italy
- Elena Giovanna Bignami
- Anesthesiology, Critical Care and Pain Medicine Division, Department of Medicine and Surgery, University of Parma, Via Gramsci 14, Parma, 43125, Italy

13
Reis M, Reis F, Kunde W. Influence of believed AI involvement on the perception of digital medical advice. Nat Med 2024:10.1038/s41591-024-03180-7. [PMID: 39054373 DOI: 10.1038/s41591-024-03180-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/05/2024] [Accepted: 07/04/2024] [Indexed: 07/27/2024]
Abstract
Large language models offer novel opportunities to seek digital medical advice. While previous research has primarily addressed the performance of such artificial intelligence (AI)-based tools, public perception of these advancements has received little attention. In two preregistered studies (n = 2,280), we presented participants with scenarios of patients obtaining medical advice. All participants received identical information, but we manipulated the putative source of this advice ('AI', 'human physician', 'human + AI'). 'AI'- and 'human + AI'-labeled advice was evaluated as significantly less reliable and less empathetic than 'human'-labeled advice. Moreover, participants indicated lower willingness to follow the advice when AI was believed to be involved in advice generation. Our findings point toward an anti-AI bias in the reception of digital medical advice, even when AI is supposedly supervised by physicians. Given the tremendous potential of AI for medicine, elucidating ways to counteract this bias should be an important objective of future research.
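The core analysis in a labeling study of this kind is a between-condition comparison of ratings for identical advice. The sketch below simulates that design with invented 7-point reliability ratings (the means, spread, and group sizes are assumptions, not the study's data) and summarises the labeling effect with Cohen's d.

```python
import numpy as np

# Simulated ratings for identical advice under two putative-source labels.
rng = np.random.default_rng(1)
human_labeled = rng.normal(5.2, 1.0, 200)  # hypothetical 'human' ratings
ai_labeled = rng.normal(4.8, 1.0, 200)     # hypothetical 'AI' ratings

def cohens_d(x, y):
    """Standardised mean difference with a pooled standard deviation."""
    nx, ny = len(x), len(y)
    pooled_sd = np.sqrt(((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1))
                        / (nx + ny - 2))
    return (x.mean() - y.mean()) / pooled_sd

d = cohens_d(human_labeled, ai_labeled)  # positive: human label rated higher
```

A positive d here reflects the same direction of effect the abstract reports: the human-labeled condition is rated more favourably than the AI-labeled one.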
Affiliation(s)
- Moritz Reis
- Institute of Psychology, Julius-Maximilians-Universität Würzburg, Würzburg, Germany
- Judge Business School, University of Cambridge, Cambridge, UK
- Florian Reis
- Medical Affairs, Pfizer Pharma GmbH, Berlin, Germany
- Wilfried Kunde
- Institute of Psychology, Julius-Maximilians-Universität Würzburg, Würzburg, Germany

14
Kostick-Quenet K, Lang BH, Smith J, Hurley M, Blumenthal-Barby J. Trust criteria for artificial intelligence in health: normative and epistemic considerations. J Med Ethics 2024; 50:544-551. [PMID: 37979976 PMCID: PMC11101592 DOI: 10.1136/jme-2023-109338] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/16/2023] [Accepted: 11/02/2023] [Indexed: 11/20/2023]
Abstract
Rapid advancements in artificial intelligence and machine learning (AI/ML) in healthcare raise pressing questions about how much users should trust AI/ML systems, particularly for high-stakes clinical decision-making. Ensuring that user trust is properly calibrated to a tool's computational capacities and limitations has both practical and ethical implications, given that overtrust or undertrust can drive over-reliance or under-reliance on algorithmic tools, with significant implications for patient safety and health outcomes. It is, thus, important to better understand how variability in trust criteria across stakeholders, settings, tools and use cases may influence approaches to using AI/ML tools in real settings. As part of a 5-year, multi-institutional Agency for Healthcare Research and Quality-funded study, we identify trust criteria for a survival prediction algorithm intended to support clinical decision-making for left ventricular assist device therapy, using semistructured interviews (n=40) with patients and physicians, analysed via thematic analysis. Findings suggest that physicians and patients share similar empirical considerations for trust, which were primarily epistemic in nature, focused on the accuracy and validity of AI/ML estimates. Trust evaluations considered the nature, integrity and relevance of training data rather than the computational nature of the algorithms themselves, suggesting a need to distinguish 'source' from 'functional' explainability. To a lesser extent, trust criteria were also relational (endorsement from others) and sometimes based on personal beliefs and experience. We discuss implications for promoting appropriate and responsible trust calibration for clinical decision-making using AI/ML.
Affiliation(s)
- Kristin Kostick-Quenet
- Center for Medical Ethics and Health Policy, Baylor College of Medicine, Houston, Texas, USA
- Benjamin H Lang
- Center for Medical Ethics and Health Policy, Baylor College of Medicine, Houston, Texas, USA
- Department of Philosophy, University of Oxford, Oxford, Oxfordshire, UK
- Jared Smith
- Center for Medical Ethics and Health Policy, Baylor College of Medicine, Houston, Texas, USA
- Meghan Hurley
- Center for Medical Ethics and Health Policy, Baylor College of Medicine, Houston, Texas, USA

15
Chang JY, Makary MS. Evolving and Novel Applications of Artificial Intelligence in Thoracic Imaging. Diagnostics (Basel) 2024; 14:1456. [PMID: 39001346 PMCID: PMC11240935 DOI: 10.3390/diagnostics14131456] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/30/2024] [Revised: 07/01/2024] [Accepted: 07/06/2024] [Indexed: 07/16/2024] Open
Abstract
The advent of artificial intelligence (AI) is revolutionizing medicine, particularly radiology. With the development of newer models, AI applications are demonstrating improved performance and versatile utility in the clinical setting. Thoracic imaging is an area of profound interest, given the prevalence of chest imaging and the significant health implications of thoracic diseases. This review aims to highlight the promising applications of AI within thoracic imaging. It examines the role of AI, including its contributions to improving diagnostic evaluation and interpretation, enhancing workflow, and aiding in invasive procedures. Next, it further highlights the current challenges and limitations faced by AI, such as the necessity of 'big data', ethical and legal considerations, and bias in representation. Lastly, it explores the potential directions for the application of AI in thoracic radiology.
Affiliation(s)
- Jin Y Chang
- Department of Radiology, The Ohio State University College of Medicine, Columbus, OH 43210, USA
- Mina S Makary
- Department of Radiology, The Ohio State University College of Medicine, Columbus, OH 43210, USA
- Division of Vascular and Interventional Radiology, Department of Radiology, The Ohio State University Wexner Medical Center, Columbus, OH 43210, USA

16
Day TG, Matthew J, Budd SF, Venturini L, Wright R, Farruggia A, Vigneswaran TV, Zidere V, Hajnal JV, Razavi R, Simpson JM, Kainz B. Interaction between clinicians and artificial intelligence to detect fetal atrioventricular septal defects on ultrasound: how can we optimize collaborative performance? Ultrasound Obstet Gynecol 2024; 64:28-35. [PMID: 38197584 DOI: 10.1002/uog.27577] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/26/2023] [Revised: 12/19/2023] [Accepted: 12/30/2023] [Indexed: 01/11/2024]
Abstract
OBJECTIVES Artificial intelligence (AI) has shown promise in improving the performance of fetal ultrasound screening in detecting congenital heart disease (CHD). The effect of giving AI advice to human operators has not been studied in this context. Giving additional information about AI model workings, such as confidence scores for AI predictions, may be a way of further improving performance. Our aims were to investigate whether AI advice improved overall diagnostic accuracy (using a single CHD lesion as an exemplar), and to determine what, if any, additional information given to clinicians optimized the overall performance of the clinician-AI team. METHODS An AI model was trained to classify a single fetal CHD lesion (atrioventricular septal defect (AVSD)), using a retrospective cohort of 121 130 cardiac four-chamber images extracted from 173 ultrasound scan videos (98 with normal hearts, 75 with AVSD); a ResNet50 model architecture was used. Temperature scaling of model prediction probability was performed on a validation set, and gradient-weighted class activation maps (grad-CAMs) produced. Ten clinicians (two consultant fetal cardiologists, three trainees in pediatric cardiology and five fetal cardiac sonographers) were recruited from a center of fetal cardiology to participate. Each participant was shown 2000 fetal four-chamber images in a random order (1000 normal and 1000 AVSD). The dataset comprised 500 images, each shown in four conditions: (1) image alone without AI output; (2) image with binary AI classification; (3) image with AI model confidence; and (4) image with grad-CAM image overlays. The clinicians were asked to classify each image as normal or AVSD. RESULTS A total of 20 000 image classifications were recorded from 10 clinicians. 
The AI model alone achieved an accuracy of 0.798 (95% CI, 0.760-0.832), a sensitivity of 0.868 (95% CI, 0.834-0.902) and a specificity of 0.728 (95% CI, 0.702-0.754), and the clinicians without AI achieved an accuracy of 0.844 (95% CI, 0.834-0.854), a sensitivity of 0.827 (95% CI, 0.795-0.858) and a specificity of 0.861 (95% CI, 0.828-0.895). Showing a binary (normal or AVSD) AI model output resulted in significant improvement in accuracy to 0.865 (P < 0.001). This effect was seen in both experienced and less-experienced participants. Giving incorrect AI advice resulted in a significant deterioration in overall accuracy, from 0.761 to 0.693 (P < 0.001), which was driven by an increase in both Type-I and Type-II errors by the clinicians. This effect was worsened by showing model confidence (accuracy, 0.649; P < 0.001) or grad-CAM (accuracy, 0.644; P < 0.001). CONCLUSIONS AI has the potential to improve performance when used in collaboration with clinicians, even if the model performance does not reach expert level. Giving additional information about model workings such as model confidence and class activation map image overlays did not improve overall performance, and actually worsened performance for images for which the AI model was incorrect. © 2024 The Authors. Ultrasound in Obstetrics & Gynecology published by John Wiley & Sons Ltd on behalf of International Society of Ultrasound in Obstetrics and Gynecology.
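Temperature scaling, the calibration step the authors applied before showing model confidence, divides the logits by a scalar T before the softmax. The sketch below illustrates the effect on a single hypothetical logit vector; the logits and T = 2.0 are assumptions for illustration (T is normally fitted on a validation set by minimising negative log-likelihood).

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = np.array([2.0, -1.0])   # hypothetical [AVSD, normal] scores
p_raw = softmax(logits)          # uncalibrated confidence
p_cal = softmax(logits / 2.0)    # T > 1 softens overconfident outputs

# The predicted class is unchanged; only the reported confidence is tempered.
assert np.argmax(p_raw) == np.argmax(p_cal)
```

Because dividing by T > 1 shrinks all logit gaps, the calibrated probability of the top class is always pulled toward uniform, which is the intended correction for an overconfident network.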
Affiliation(s)
- T G Day
- School of Biomedical Engineering and Imaging Sciences, Faculty of Life Sciences and Medicine, King's College London, London, UK
- Department of Congenital Heart Disease, Evelina London Children's Healthcare, Guy's and St Thomas' NHS Foundation Trust, London, UK
- J Matthew
- School of Biomedical Engineering and Imaging Sciences, Faculty of Life Sciences and Medicine, King's College London, London, UK
- S F Budd
- School of Biomedical Engineering and Imaging Sciences, Faculty of Life Sciences and Medicine, King's College London, London, UK
- L Venturini
- School of Biomedical Engineering and Imaging Sciences, Faculty of Life Sciences and Medicine, King's College London, London, UK
- R Wright
- School of Biomedical Engineering and Imaging Sciences, Faculty of Life Sciences and Medicine, King's College London, London, UK
- A Farruggia
- School of Biomedical Engineering and Imaging Sciences, Faculty of Life Sciences and Medicine, King's College London, London, UK
- T V Vigneswaran
- School of Biomedical Engineering and Imaging Sciences, Faculty of Life Sciences and Medicine, King's College London, London, UK
- Department of Congenital Heart Disease, Evelina London Children's Healthcare, Guy's and St Thomas' NHS Foundation Trust, London, UK
- V Zidere
- School of Biomedical Engineering and Imaging Sciences, Faculty of Life Sciences and Medicine, King's College London, London, UK
- Department of Congenital Heart Disease, Evelina London Children's Healthcare, Guy's and St Thomas' NHS Foundation Trust, London, UK
- Harris Birthright Research Centre, King's College London NHS Foundation Trust, London, UK
- J V Hajnal
- School of Biomedical Engineering and Imaging Sciences, Faculty of Life Sciences and Medicine, King's College London, London, UK
- R Razavi
- School of Biomedical Engineering and Imaging Sciences, Faculty of Life Sciences and Medicine, King's College London, London, UK
- Department of Congenital Heart Disease, Evelina London Children's Healthcare, Guy's and St Thomas' NHS Foundation Trust, London, UK
- J M Simpson
- School of Biomedical Engineering and Imaging Sciences, Faculty of Life Sciences and Medicine, King's College London, London, UK
- Department of Congenital Heart Disease, Evelina London Children's Healthcare, Guy's and St Thomas' NHS Foundation Trust, London, UK
- B Kainz
- School of Biomedical Engineering and Imaging Sciences, Faculty of Life Sciences and Medicine, King's College London, London, UK
- Department of Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
- Department of Computing, Faculty of Engineering, Imperial College London, London, UK

17
Chen H, Ma X, Rives H, Serpedin A, Yao P, Rameau A. Trust in Machine Learning Driven Clinical Decision Support Tools Among Otolaryngologists. Laryngoscope 2024; 134:2799-2804. [PMID: 38230948 DOI: 10.1002/lary.31260] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/16/2023] [Revised: 11/29/2023] [Accepted: 12/20/2023] [Indexed: 01/18/2024]
Abstract
BACKGROUND Machine learning-driven clinical decision support tools (ML-CDST) are on the verge of being integrated into clinical settings, including in Otolaryngology-Head & Neck Surgery. In this study, we investigated whether such tools may influence otolaryngologists' diagnostic judgement. METHODS Otolaryngologists were recruited virtually across the United States for this experiment on human-AI interaction. Participants were shown 12 different video-stroboscopic exams from patients with previously diagnosed laryngopharyngeal reflux or vocal fold paresis and asked to determine the presence of disease. They were then exposed to a random diagnosis purportedly resulting from an ML-CDST and given the opportunity to revise their diagnosis. The ML-CDST output was presented with no explanation, a general explanation, or a specific explanation of its logic. The impact of the ML-CDST on diagnostic judgement was assessed with McNemar's test. RESULTS Forty-five participants were recruited. When participants reported less confidence (268 observations), they were significantly (p = 0.001) more likely to change their diagnostic judgement after exposure to ML-CDST output than when they reported more confidence (238 observations). Participants were also more likely to change their diagnostic judgement when presented with a specific explanation of the CDST logic (p = 0.048). CONCLUSIONS Our study suggests that otolaryngologists are susceptible to accepting ML-CDST diagnostic recommendations, especially when less confident. Otolaryngologists' trust in ML-CDST output is increased when it is accompanied by a specific explanation of its logic. LEVEL OF EVIDENCE 2 Laryngoscope, 134:2799-2804, 2024.
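McNemar's test, used above to compare paired pre- and post-exposure judgements, depends only on the discordant pairs (cases where the two judgements differ in opposite directions). A minimal exact version can be sketched as follows; the counts b and c are invented for illustration, not the study's data.

```python
from math import comb

def mcnemar_exact(b, c):
    """Two-sided exact p-value from discordant pairs b (switched one way)
    and c (switched the other way)."""
    n = b + c
    k = min(b, c)
    # Under H0, each discordant pair flips either way with probability 0.5,
    # so the smaller count follows a Binomial(n, 0.5) tail.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

p_value = mcnemar_exact(b=15, c=4)  # hypothetical discordant-pair counts
```

The exact binomial form is preferred over the chi-square approximation when discordant counts are small, as is common in studies with a few dozen participants.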
Affiliation(s)
- Hannah Chen
- Sean Parker Institute for the Voice, Department of Otolaryngology-Head and Neck Surgery, Weill Cornell Medicine, New York, New York, USA
- Xiaoyue Ma
- Division of Biostatistics, Department of Population Health Sciences, Weill Cornell Medical College, New York, New York, USA
- Hal Rives
- Sean Parker Institute for the Voice, Department of Otolaryngology-Head and Neck Surgery, Weill Cornell Medicine, New York, New York, USA
- Aisha Serpedin
- Sean Parker Institute for the Voice, Department of Otolaryngology-Head and Neck Surgery, Weill Cornell Medicine, New York, New York, USA
- Peter Yao
- Sean Parker Institute for the Voice, Department of Otolaryngology-Head and Neck Surgery, Weill Cornell Medicine, New York, New York, USA
- Anaïs Rameau
- Sean Parker Institute for the Voice, Department of Otolaryngology-Head and Neck Surgery, Weill Cornell Medicine, New York, New York, USA

18
Kotter E, Pinto Dos Santos D. [Ethics and artificial intelligence]. Radiologie (Heidelb) 2024; 64:498-502. [PMID: 38499692 DOI: 10.1007/s00117-024-01286-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Accepted: 02/26/2024] [Indexed: 03/20/2024]
Abstract
The introduction of artificial intelligence (AI) into radiology promises to enhance efficiency and improve diagnostic accuracy, yet it also raises manifold ethical questions. These include data protection issues, the future role of radiologists, liability when using AI systems, and the avoidance of bias. To prevent data bias, the datasets need to be compiled carefully and to be representative of the target population. Accordingly, the upcoming European Union AI act sets particularly high requirements for the datasets used in training medical AI systems. Cognitive bias occurs when radiologists place too much trust in the results provided by AI systems (overreliance). So far, diagnostic AI systems are used almost exclusively as "second look" systems. If diagnostic AI systems are to be used in the future as "first look" systems or even as autonomous AI systems in order to enhance efficiency in radiology, the question of liability needs to be addressed, comparable to liability for autonomous driving. Such use of AI would also significantly change the role of radiologists.
Affiliation(s)
- Elmar Kotter
- Klinik für Diagnostische und Interventionelle Radiologie, Universitätsklinikum Freiburg, Hugstetterstr. 55, 79106, Freiburg, Germany
- Daniel Pinto Dos Santos
- Institut für Diagnostische und Interventionelle Radiologie, Uniklinik Köln, Kerpener Str. 62, 50937, Köln, Germany
- Institut für Diagnostische und Interventionelle Radiologie, Universitätsklinik Frankfurt, Theodor-Stern-Kai 7, 60596, Frankfurt am Main, Germany

19
Hasani AM, Singh S, Zahergivar A, Ryan B, Nethala D, Bravomontenegro G, Mendhiratta N, Ball M, Farhadi F, Malayeri A. Evaluating the performance of Generative Pre-trained Transformer-4 (GPT-4) in standardizing radiology reports. Eur Radiol 2024; 34:3566-3574. [PMID: 37938381 DOI: 10.1007/s00330-023-10384-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/11/2023] [Revised: 09/01/2023] [Accepted: 09/08/2023] [Indexed: 11/09/2023]
Abstract
OBJECTIVE Radiology reporting is an essential component of clinical diagnosis and decision-making. With the advent of advanced artificial intelligence (AI) models like GPT-4 (Generative Pre-trained Transformer 4), there is growing interest in evaluating their potential for optimizing or generating radiology reports. This study aimed to compare the quality and content of radiologist-generated and GPT-4 AI-generated radiology reports. METHODS A comparative study design was employed in the study, where a total of 100 anonymized radiology reports were randomly selected and analyzed. Each report was processed by GPT-4, resulting in the generation of a corresponding AI-generated report. Quantitative and qualitative analysis techniques were utilized to assess similarities and differences between the two sets of reports. RESULTS The AI-generated reports showed comparable quality to radiologist-generated reports in most categories. Significant differences were observed in clarity (p = 0.027), ease of understanding (p = 0.023), and structure (p = 0.050), favoring the AI-generated reports. AI-generated reports were more concise, with 34.53 fewer words and 174.22 fewer characters on average, but had greater variability in sentence length. Content similarity was high, with an average Cosine Similarity of 0.85, Sequence Matcher Similarity of 0.52, BLEU Score of 0.5008, and BERTScore F1 of 0.8775. CONCLUSION The results of this proof-of-concept study suggest that GPT-4 can be a reliable tool for generating standardized radiology reports, offering potential benefits such as improved efficiency, better communication, and simplified data extraction and analysis. However, limitations and ethical implications must be addressed to ensure the safe and effective implementation of this technology in clinical practice. 
CLINICAL RELEVANCE STATEMENT The findings of this study suggest that GPT-4 (Generative Pre-trained Transformer 4), an advanced AI model, has the potential to significantly contribute to the standardization and optimization of radiology reporting, offering improved efficiency and communication in clinical practice.
KEY POINTS
• Large language model-generated radiology reports exhibited high content similarity and moderate structural resemblance to radiologist-generated reports.
• Performance metrics highlighted the strong matching of word selection and order, as well as high semantic similarity between AI- and radiologist-generated reports.
• Large language models demonstrated potential for generating standardized radiology reports, improving efficiency and communication in clinical settings.
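Two of the similarity metrics reported above can be sketched with the standard library alone: cosine similarity over bag-of-words counts and difflib's SequenceMatcher ratio. The two one-line report snippets are invented examples; BLEU and BERTScore require dedicated libraries (e.g. nltk, bert-score) and are omitted here.

```python
from collections import Counter
from difflib import SequenceMatcher
from math import sqrt

def cosine_sim(a, b):
    """Cosine similarity between bag-of-words token-count vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = (sqrt(sum(v * v for v in va.values()))
            * sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0

reference = "No acute cardiopulmonary abnormality is identified"
generated = "No acute cardiopulmonary abnormality identified"

cos = cosine_sim(reference, generated)                     # token overlap
seq = SequenceMatcher(None, reference, generated).ratio()  # character overlap
```

Cosine similarity ignores word order entirely, whereas the SequenceMatcher ratio rewards long shared character runs, which is why the paper's two scores (0.85 vs 0.52) can diverge so widely on the same report pairs.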
Affiliation(s)
- Amir M Hasani
- Laboratory of Translation Research, National Heart, Lung, and Blood Institute, NIH, Bethesda, MD, USA
- Shiva Singh
- Radiology & Imaging Sciences Department, Clinical Center, NIH, Bethesda, MD, USA
- Aryan Zahergivar
- Radiology & Imaging Sciences Department, Clinical Center, NIH, Bethesda, MD, USA
- Beth Ryan
- Urology Oncology Branch, National Cancer Institute, NIH, Bethesda, MD, USA
- Daniel Nethala
- Urology Oncology Branch, National Cancer Institute, NIH, Bethesda, MD, USA
- Neil Mendhiratta
- Urology Oncology Branch, National Cancer Institute, NIH, Bethesda, MD, USA
- Mark Ball
- Urology Oncology Branch, National Cancer Institute, NIH, Bethesda, MD, USA
- Faraz Farhadi
- Radiology & Imaging Sciences Department, Clinical Center, NIH, Bethesda, MD, USA
- Ashkan Malayeri
- Radiology & Imaging Sciences Department, Clinical Center, NIH, Bethesda, MD, USA

20
Yuan W, Du Z, Han S. Semi-supervised skin cancer diagnosis based on self-feedback threshold focal learning. Discov Oncol 2024; 15:180. [PMID: 38776027 PMCID: PMC11111630 DOI: 10.1007/s12672-024-01043-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 01/22/2024] [Accepted: 05/17/2024] [Indexed: 05/25/2024] Open
Abstract
The worldwide prevalence of skin cancer necessitates accurate diagnosis to alleviate public health burdens. Although the application of artificial intelligence in image analysis and pattern recognition has improved the accuracy and efficiency of early skin cancer diagnosis, existing supervised learning methods are limited by their reliance on a large amount of labeled data. To overcome the limitations of data labeling and enhance the performance of diagnostic models, this study proposes a semi-supervised skin cancer diagnostic model based on Self-feedback Threshold Focal Learning (STFL), capable of utilizing partially labeled data and a large pool of unlabeled medical images for training models in unseen scenarios. The proposed model dynamically adjusts the selection threshold for unlabeled samples during training, effectively filtering reliable unlabeled samples, and uses focal learning to mitigate the impact of class imbalance in further training. The study is experimentally validated on the HAM10000 dataset, which includes images of various types of skin lesions, with experiments conducted across different scales of labeled samples. With just 500 annotated samples, the model demonstrates robust performance (0.77 accuracy, 0.6408 Kappa, 0.77 recall, 0.7426 precision, and 0.7462 F1-score), showcasing its efficiency with limited labeled data. Further, comprehensive testing validates the semi-supervised model's significant advancements in diagnostic accuracy and efficiency, underscoring the value of integrating unlabeled data. This model offers a new perspective on medical image processing and contributes robust scientific support for the early diagnosis and treatment of skin cancer.
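The focal-learning component mentioned above down-weights easy, well-classified examples so training concentrates on hard (often minority-class) samples. A minimal sketch of the standard focal loss follows; gamma = 2.0 and the probabilities are illustrative choices, not the paper's settings.

```python
import numpy as np

def focal_loss(p_true, gamma=2.0):
    """Focal loss given the probability the model assigns to the true class:
    -(1 - p)^gamma * log(p). At gamma = 0 this reduces to cross-entropy."""
    p = np.clip(float(p_true), 1e-7, 1.0)  # avoid log(0)
    return -((1.0 - p) ** gamma) * np.log(p)

easy_loss = focal_loss(0.95)  # confident, correct -> heavily down-weighted
hard_loss = focal_loss(0.30)  # uncertain -> dominates the gradient signal
```

The modulating factor (1 - p)^gamma is what makes the easy example contribute orders of magnitude less loss than the hard one, counteracting class imbalance without resampling.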
Affiliation(s)
- Weicheng Yuan
- College of Basic Medicine, Hebei Medical University, Zhongshan East, Shijiazhuang, 050017, Hebei, China
- Zeyu Du
- School of Health Science, University of Manchester, Sackville Street, Manchester, 610101, England, UK
- Shuo Han
- Department of Anatomy, Hebei Medical University, Zhongshan East, Shijiazhuang, 050017, Hebei, China.
21
Rosen S, Saban M. Evaluating the reliability of ChatGPT as a tool for imaging test referral: a comparative study with a clinical decision support system. Eur Radiol 2024; 34:2826-2837. [PMID: 37828297 DOI: 10.1007/s00330-023-10230-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/15/2023] [Revised: 07/28/2023] [Accepted: 08/01/2023] [Indexed: 10/14/2023]
Abstract
OBJECTIVES As the technology continues to evolve, we can expect artificial intelligence (AI) to be used in increasingly sophisticated ways to make diagnoses and decisions, such as suggesting the most appropriate imaging referrals. We aim to explore whether Chat Generative Pretrained Transformer (ChatGPT) can provide accurate imaging referrals for clinical use that are at least as good as the ESR iGuide. METHODS A comparative study was conducted in a tertiary hospital. Data were collected from 97 consecutive cases admitted to the emergency department with abdominal complaints. We compared the imaging test referral recommendations suggested by the ESR iGuide and by ChatGPT and analyzed cases of disagreement. In addition, we selected cases where ChatGPT recommended a chest abdominal pelvis (CAP) CT (n = 66) and asked four specialists to grade the appropriateness of the referral. RESULTS ChatGPT recommendations were consistent with the recommendations provided by the ESR iGuide. No statistical differences were found in the appropriateness of referrals by age or gender. In a sub-analysis of CAP cases, high agreement between ChatGPT and the specialists was found. Cases of disagreement (12.4%) were further analyzed and presented themes of vague recommendations such as "it would be advisable" and "this would help to rule out." CONCLUSIONS ChatGPT's ability to guide the selection of appropriate tests may be comparable to some degree with the ESR iGuide. Issues such as the clinical, ethical, and regulatory implications still need to be addressed prior to clinical implementation. Further studies are needed to confirm these findings. CLINICAL RELEVANCE STATEMENT The article explores the potential of using advanced language models, such as ChatGPT, in healthcare as a CDS for selecting appropriate imaging tests. Using ChatGPT can improve the efficiency of the decision-making process. KEY POINTS: • ChatGPT recommendations were highly consistent with the recommendations provided by the ESR iGuide. • ChatGPT's ability to guide the selection of appropriate tests may be comparable to some degree with the ESR iGuide's.
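Agreement between two referral sources of the kind compared here is typically summarized as raw percent agreement plus a chance-corrected statistic such as Cohen's kappa. The sketch below is illustrative only; the eight referral labels are invented, not the study's data.

```python
def percent_agreement(a, b):
    """Fraction of cases where both raters gave the same recommendation."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Chance-corrected agreement between two raters on categorical labels."""
    n = len(a)
    po = percent_agreement(a, b)                  # observed agreement
    pe = sum((a.count(c) / n) * (b.count(c) / n)  # agreement expected by chance
             for c in set(a) | set(b))
    return (po - pe) / (1 - pe)

# Hypothetical referrals for 8 cases (CT / ultrasound / X-ray).
chatgpt = ["CT", "CT", "US", "CT", "XR", "CT", "US", "CT"]
iguide  = ["CT", "CT", "US", "CT", "XR", "US", "US", "CT"]
agreement = percent_agreement(chatgpt, iguide)  # 7/8 = 0.875
kappa = cohens_kappa(chatgpt, iguide)
```

Kappa is the more conservative figure because frequent recommendations (here, CT) make chance agreement likely.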
Affiliation(s)
- Shani Rosen
- Department of Health Technology and Policy Evaluation, Gertner Institute for Epidemiology and Health Policy, Institute of Epidemiology & Health Policy Research, Sheba Medical Center, Tel HaShomer, Ramat-Gan, Israel
- Nursing Department, School of Health Sciences, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Mor Saban
- Nursing Department, School of Health Sciences, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel.
22
Brady AP, Allen B, Chong J, Kotter E, Kottler N, Mongan J, Oakden-Rayner L, Dos Santos DP, Tang A, Wald C, Slavotinek J. Developing, Purchasing, Implementing and Monitoring AI Tools in Radiology: Practical Considerations. A Multi-Society Statement From the ACR, CAR, ESR, RANZCR & RSNA. Can Assoc Radiol J 2024; 75:226-244. [PMID: 38251882 DOI: 10.1177/08465371231222229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/23/2024] Open
Abstract
Artificial Intelligence (AI) carries the potential for unprecedented disruption in radiology, with possible positive and negative consequences. The integration of AI in radiology holds the potential to revolutionize healthcare practices by advancing diagnosis, quantification, and management of multiple medical conditions. Nevertheless, the ever‑growing availability of AI tools in radiology highlights an increasing need to critically evaluate claims for its utility and to differentiate safe product offerings from potentially harmful, or fundamentally unhelpful ones. This multi‑society paper, presenting the views of Radiology Societies in the USA, Canada, Europe, Australia, and New Zealand, defines the potential practical problems and ethical issues surrounding the incorporation of AI into radiological practice. In addition to delineating the main points of concern that developers, regulators, and purchasers of AI tools should consider prior to their introduction into clinical practice, this statement also suggests methods to monitor their stability and safety in clinical use, and their suitability for possible autonomous function. This statement is intended to serve as a useful summary of the practical issues which should be considered by all parties involved in the development of radiology AI resources, and their implementation as clinical tools.
Affiliation(s)
- Bibb Allen
- Department of Radiology, Grandview Medical Center, Birmingham, AL, USA
- Data Science Institute, American College of Radiology, Reston, VA, USA
- Jaron Chong
- Department of Medical Imaging, Schulich School of Medicine and Dentistry, Western University, London, ON, Canada
- Elmar Kotter
- Department of Diagnostic and Interventional Radiology, Medical Center, Faculty of Medicine, University of Freiburg, Freiburg, Germany
- Nina Kottler
- Radiology Partners, El Segundo, CA, USA
- Stanford Center for Artificial Intelligence in Medicine & Imaging, Palo Alto, CA, USA
- John Mongan
- Department of Radiology and Biomedical Imaging, University of California, San Francisco, CA, USA
- Lauren Oakden-Rayner
- Australian Institute for Machine Learning, University of Adelaide, Adelaide, SA, Australia
- Daniel Pinto Dos Santos
- Department of Radiology, University Hospital of Cologne, Cologne, Germany
- Department of Radiology, University Hospital of Frankfurt, Frankfurt, Germany
- An Tang
- Department of Radiology, Radiation Oncology, and Nuclear Medicine, Université de Montréal, Montréal, QC, Canada
- Christoph Wald
- Department of Radiology, Lahey Hospital & Medical Center, Burlington, MA, USA
- Tufts University Medical School, Boston, MA, USA
- American College of Radiology, Reston, VA, USA
- John Slavotinek
- South Australia Medical Imaging, Flinders Medical Centre Adelaide, SA, Australia
- College of Medicine and Public Health, Flinders University, Adelaide, SA, Australia
23
Cecil J, Lermer E, Hudecek MFC, Sauer J, Gaube S. Explainability does not mitigate the negative impact of incorrect AI advice in a personnel selection task. Sci Rep 2024; 14:9736. [PMID: 38679619 PMCID: PMC11056364 DOI: 10.1038/s41598-024-60220-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/24/2024] [Accepted: 04/19/2024] [Indexed: 05/01/2024] Open
Abstract
Despite the rise of decision support systems enabled by artificial intelligence (AI) in personnel selection, their impact on decision-making processes is largely unknown. Consequently, we conducted five experiments (N = 1403 students and Human Resource Management (HRM) employees) investigating how people interact with AI-generated advice in a personnel selection task. In all pre-registered experiments, we presented correct and incorrect advice. In Experiments 1a and 1b, we manipulated the source of the advice (human vs. AI). In Experiments 2a, 2b, and 2c, we further manipulated the type of explainability of AI advice (2a and 2b: heatmaps and 2c: charts). We hypothesized that accurate and explainable advice improves decision-making. The independent variables were regressed on task performance, perceived advice quality and confidence ratings. The results consistently showed that incorrect advice negatively impacted performance, as people failed to dismiss it (i.e., overreliance). Additionally, we found that the effects of source and explainability of advice on the dependent variables were limited. The lack of reduction in participants' overreliance on inaccurate advice when the systems' predictions were made more explainable highlights the complexity of human-AI interaction and the need for regulation and quality standards in HRM.
Affiliation(s)
- Julia Cecil
- Department of Psychology, LMU Center for Leadership and People Management, LMU Munich, Munich, Germany.
- Eva Lermer
- Department of Psychology, LMU Center for Leadership and People Management, LMU Munich, Munich, Germany
- Department of Business Psychology, Technical University of Applied Sciences Augsburg, Augsburg, Germany
- Matthias F C Hudecek
- Department of Experimental Psychology, University of Regensburg, Regensburg, Germany
- Jan Sauer
- Department of Business Administration, University of Applied Sciences Amberg-Weiden, Weiden, Germany
- Susanne Gaube
- Department of Psychology, LMU Center for Leadership and People Management, LMU Munich, Munich, Germany
- UCL Global Business School for Health, University College London, London, UK
24
Jiang T, Chen C, Zhou Y, Cai S, Yan Y, Sui L, Lai M, Song M, Zhu X, Pan Q, Wang H, Chen X, Wang K, Xiong J, Chen L, Xu D. Deep learning-assisted diagnosis of benign and malignant parotid tumors based on ultrasound: a retrospective study. BMC Cancer 2024; 24:510. [PMID: 38654281 PMCID: PMC11036551 DOI: 10.1186/s12885-024-12277-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/31/2024] [Accepted: 04/16/2024] [Indexed: 04/25/2024] Open
Abstract
BACKGROUND To develop a deep learning (DL) model utilizing ultrasound images, and to evaluate its efficacy in distinguishing between benign and malignant parotid tumors (PTs), as well as its practicality in assisting clinicians with accurate diagnosis. METHODS A total of 2211 ultrasound images of 980 pathologically confirmed PTs (training set: n = 721; validation set: n = 82; internal-test set: n = 89; external-test set: n = 88) from 907 patients were retrospectively included in this study. Five DL networks of varying depths were constructed; the optimal model was selected and its diagnostic performance evaluated using the area under the curve (AUC) of the receiver-operating characteristic (ROC). Furthermore, radiologists of different seniority were compared with and without the optimal model as a diagnostic aid. Additionally, the diagnostic confusion matrix of the optimal model was calculated, and the characteristics of misjudged cases were analyzed and summarized. RESULTS ResNet18 demonstrated superior diagnostic performance, with an AUC of 0.947, accuracy of 88.5%, sensitivity of 78.2%, and specificity of 92.7% in the internal-test set, and an AUC of 0.925, accuracy of 89.8%, sensitivity of 83.3%, and specificity of 90.6% in the external-test set. The PTs were subjectively assessed twice by six radiologists, with and without the assistance of the model. With the assistance of the model, both junior and senior radiologists demonstrated enhanced diagnostic performance: in the internal-test set, AUC values increased by 0.062 and 0.082 for junior radiologists and by 0.066 and 0.106 for senior radiologists, respectively.
CONCLUSIONS The DL model based on ultrasound images demonstrates exceptional capability in distinguishing between benign and malignant PTs, assisting radiologists of varying expertise levels to achieve higher diagnostic performance and serving as a noninvasive imaging adjunct for clinical diagnosis.
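The reader-study comparison above rests on computing each radiologist's AUC with and without model assistance. A minimal sketch using the rank-statistic form of the AUC; the suspicion scores and labels below are invented for illustration, not the study's data.

```python
def roc_auc(scores, labels):
    """AUC via the Mann-Whitney U statistic: the probability that a random
    malignant case is scored higher than a random benign case."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical suspicion scores for 4 tumors (1 = malignant, 0 = benign).
labels     = [1, 1, 0, 0]
unassisted = [0.6, 0.4, 0.5, 0.3]  # one malignant case ranked below a benign one
assisted   = [0.8, 0.7, 0.5, 0.2]  # DL assistance separates the classes fully
gain = roc_auc(assisted, labels) - roc_auc(unassisted, labels)  # 1.00 - 0.75
```

In the study, a per-reader gain of this kind (e.g. +0.062 to +0.106) is what quantifies the value of the model as an aid.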
Affiliation(s)
- Tian Jiang
- Department of Diagnostic Ultrasound Imaging & Interventional Therapy, Zhejiang Cancer Hospital, Hangzhou Institute of Medicine (HIM), Chinese Academy of Sciences, 310022, Hangzhou, Zhejiang, China
- Postgraduate training base Alliance of Wenzhou Medical University (Zhejiang Cancer Hospital), 310022, Hangzhou, Zhejiang, China
- Zhejiang Provincial Research Center for Cancer Intelligent Diagnosis and Molecular Technology, 310022, Hangzhou, Zhejiang, China
- Chen Chen
- Department of Diagnostic Ultrasound Imaging & Interventional Therapy, Zhejiang Cancer Hospital, Hangzhou Institute of Medicine (HIM), Chinese Academy of Sciences, 310022, Hangzhou, Zhejiang, China
- Wenling Big Data and Artificial Intelligence Institute in Medicine, 317502, Taizhou, Zhejiang, China
- Taizhou Key Laboratory of Minimally Invasive Interventional Therapy & Artificial Intelligence, Taizhou Campus of Zhejiang Cancer Hospital (Taizhou Cancer Hospital), 317502, Taizhou, Zhejiang, China
- Yahan Zhou
- Wenling Big Data and Artificial Intelligence Institute in Medicine, 317502, Taizhou, Zhejiang, China
- Taizhou Key Laboratory of Minimally Invasive Interventional Therapy & Artificial Intelligence, Taizhou Campus of Zhejiang Cancer Hospital (Taizhou Cancer Hospital), 317502, Taizhou, Zhejiang, China
- Shenzhou Cai
- Wenling Big Data and Artificial Intelligence Institute in Medicine, 317502, Taizhou, Zhejiang, China
- Taizhou Key Laboratory of Minimally Invasive Interventional Therapy & Artificial Intelligence, Taizhou Campus of Zhejiang Cancer Hospital (Taizhou Cancer Hospital), 317502, Taizhou, Zhejiang, China
- Yuqi Yan
- Department of Diagnostic Ultrasound Imaging & Interventional Therapy, Zhejiang Cancer Hospital, Hangzhou Institute of Medicine (HIM), Chinese Academy of Sciences, 310022, Hangzhou, Zhejiang, China
- Postgraduate training base Alliance of Wenzhou Medical University (Zhejiang Cancer Hospital), 310022, Hangzhou, Zhejiang, China
- Wenling Big Data and Artificial Intelligence Institute in Medicine, 317502, Taizhou, Zhejiang, China
- Taizhou Key Laboratory of Minimally Invasive Interventional Therapy & Artificial Intelligence, Taizhou Campus of Zhejiang Cancer Hospital (Taizhou Cancer Hospital), 317502, Taizhou, Zhejiang, China
- Lin Sui
- Department of Diagnostic Ultrasound Imaging & Interventional Therapy, Zhejiang Cancer Hospital, Hangzhou Institute of Medicine (HIM), Chinese Academy of Sciences, 310022, Hangzhou, Zhejiang, China
- Postgraduate training base Alliance of Wenzhou Medical University (Zhejiang Cancer Hospital), 310022, Hangzhou, Zhejiang, China
- Wenling Big Data and Artificial Intelligence Institute in Medicine, 317502, Taizhou, Zhejiang, China
- Taizhou Key Laboratory of Minimally Invasive Interventional Therapy & Artificial Intelligence, Taizhou Campus of Zhejiang Cancer Hospital (Taizhou Cancer Hospital), 317502, Taizhou, Zhejiang, China
- Min Lai
- Department of Diagnostic Ultrasound Imaging & Interventional Therapy, Zhejiang Cancer Hospital, Hangzhou Institute of Medicine (HIM), Chinese Academy of Sciences, 310022, Hangzhou, Zhejiang, China
- Zhejiang Provincial Research Center for Cancer Intelligent Diagnosis and Molecular Technology, 310022, Hangzhou, Zhejiang, China
- Second Clinical College, Zhejiang University of Traditional Chinese Medicine, 310022, Hangzhou, Zhejiang, China
- Mei Song
- Department of Diagnostic Ultrasound Imaging & Interventional Therapy, Zhejiang Cancer Hospital, Hangzhou Institute of Medicine (HIM), Chinese Academy of Sciences, 310022, Hangzhou, Zhejiang, China
- Zhejiang Provincial Research Center for Cancer Intelligent Diagnosis and Molecular Technology, 310022, Hangzhou, Zhejiang, China
- Xi Zhu
- Department of Diagnostic Ultrasound Imaging & Interventional Therapy, Zhejiang Cancer Hospital, Hangzhou Institute of Medicine (HIM), Chinese Academy of Sciences, 310022, Hangzhou, Zhejiang, China
- Wenling Big Data and Artificial Intelligence Institute in Medicine, 317502, Taizhou, Zhejiang, China
- Taizhou Key Laboratory of Minimally Invasive Interventional Therapy & Artificial Intelligence, Taizhou Campus of Zhejiang Cancer Hospital (Taizhou Cancer Hospital), 317502, Taizhou, Zhejiang, China
- Qianmeng Pan
- Taizhou Key Laboratory of Minimally Invasive Interventional Therapy & Artificial Intelligence, Taizhou Campus of Zhejiang Cancer Hospital (Taizhou Cancer Hospital), 317502, Taizhou, Zhejiang, China
- Hui Wang
- Taizhou Key Laboratory of Minimally Invasive Interventional Therapy & Artificial Intelligence, Taizhou Campus of Zhejiang Cancer Hospital (Taizhou Cancer Hospital), 317502, Taizhou, Zhejiang, China
- Xiayi Chen
- Wenling Big Data and Artificial Intelligence Institute in Medicine, 317502, Taizhou, Zhejiang, China
- Taizhou Key Laboratory of Minimally Invasive Interventional Therapy & Artificial Intelligence, Taizhou Campus of Zhejiang Cancer Hospital (Taizhou Cancer Hospital), 317502, Taizhou, Zhejiang, China
- Kai Wang
- Dongyang Hospital Affiliated to Wenzhou Medical University, 322100, Jinhua, Zhejiang, China
- Jing Xiong
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, 518000, Shenzhen, Guangdong, China
- Liyu Chen
- Department of Diagnostic Ultrasound Imaging & Interventional Therapy, Zhejiang Cancer Hospital, Hangzhou Institute of Medicine (HIM), Chinese Academy of Sciences, 310022, Hangzhou, Zhejiang, China.
- Zhejiang Provincial Research Center for Cancer Intelligent Diagnosis and Molecular Technology, 310022, Hangzhou, Zhejiang, China.
- Dong Xu
- Department of Diagnostic Ultrasound Imaging & Interventional Therapy, Zhejiang Cancer Hospital, Hangzhou Institute of Medicine (HIM), Chinese Academy of Sciences, 310022, Hangzhou, Zhejiang, China.
- Postgraduate training base Alliance of Wenzhou Medical University (Zhejiang Cancer Hospital), 310022, Hangzhou, Zhejiang, China.
- Zhejiang Provincial Research Center for Cancer Intelligent Diagnosis and Molecular Technology, 310022, Hangzhou, Zhejiang, China.
- Wenling Big Data and Artificial Intelligence Institute in Medicine, 317502, Taizhou, Zhejiang, China.
- Taizhou Key Laboratory of Minimally Invasive Interventional Therapy & Artificial Intelligence, Taizhou Campus of Zhejiang Cancer Hospital (Taizhou Cancer Hospital), 317502, Taizhou, Zhejiang, China.
25
Vaidya A, Chen RJ, Williamson DFK, Song AH, Jaume G, Yang Y, Hartvigsen T, Dyer EC, Lu MY, Lipkova J, Shaban M, Chen TY, Mahmood F. Demographic bias in misdiagnosis by computational pathology models. Nat Med 2024; 30:1174-1190. [PMID: 38641744 DOI: 10.1038/s41591-024-02885-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/03/2023] [Accepted: 02/23/2024] [Indexed: 04/21/2024]
Abstract
Despite increasing numbers of regulatory approvals, deep learning-based computational pathology systems often overlook the impact of demographic factors on performance, potentially leading to biases. This concern is all the more important as computational pathology has leveraged large public datasets that underrepresent certain demographic groups. Using publicly available data from The Cancer Genome Atlas and the EBRAINS brain tumor atlas, as well as internal patient data, we show that whole-slide image classification models display marked performance disparities across different demographic groups when used to subtype breast and lung carcinomas and to predict IDH1 mutations in gliomas. For example, when using common modeling approaches, we observed performance gaps (in area under the receiver operating characteristic curve) between white and Black patients of 3.0% for breast cancer subtyping, 10.9% for lung cancer subtyping and 16.0% for IDH1 mutation prediction in gliomas. We found that richer feature representations obtained from self-supervised vision foundation models reduce performance variations between groups. These representations provide improvements upon weaker models even when those weaker models are combined with state-of-the-art bias mitigation strategies and modeling choices. Nevertheless, self-supervised vision foundation models do not fully eliminate these discrepancies, highlighting the continuing need for bias mitigation efforts in computational pathology. Finally, we demonstrate that our results extend to other demographic factors beyond patient race. Given these findings, we encourage regulatory and policy agencies to integrate demographic-stratified evaluation into their assessment guidelines.
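The performance gaps reported here come from stratifying a fixed evaluation set by demographic group and comparing per-group AUCs. A hedged sketch of such an audit is below; the rank-statistic `roc_auc` helper and the toy scores, labels, and group tags are illustrative assumptions, not the study's models or data.

```python
def roc_auc(scores, labels):
    """Rank-statistic (Mann-Whitney) AUC for binary labels."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def stratified_auc_gap(scores, labels, groups):
    """Per-group AUC plus the max-minus-min performance gap across groups."""
    per_group = {}
    for g in sorted(set(groups)):
        idx = [i for i, gg in enumerate(groups) if gg == g]
        per_group[g] = roc_auc([scores[i] for i in idx],
                               [labels[i] for i in idx])
    return per_group, max(per_group.values()) - min(per_group.values())

# Invented cohort: one model evaluated on two demographic groups.
scores = [0.9, 0.8, 0.2, 0.1, 0.7, 0.3, 0.6, 0.2]
labels = [1, 1, 0, 0, 1, 1, 0, 0]
groups = ["A"] * 4 + ["B"] * 4
per_group, gap = stratified_auc_gap(scores, labels, groups)  # gap = 0.25
```

Reporting `per_group` alongside the pooled metric is exactly the demographic-stratified evaluation the authors urge regulators to require.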
Affiliation(s)
- Anurag Vaidya
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA
- Health Sciences and Technology, Harvard-MIT, Cambridge, MA, USA
- Richard J Chen
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Drew F K Williamson
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Department of Pathology and Laboratory Medicine, Emory University School of Medicine, Atlanta, GA, USA
- Andrew H Song
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA
- Guillaume Jaume
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA
- Yuzhe Yang
- Electrical Engineering and Computer Science, MIT, Cambridge, MA, USA
- Thomas Hartvigsen
- School of Data Science, University of Virginia, Charlottesville, VA, USA
- Emma C Dyer
- T.H. Chan School of Public Health, Harvard University, Cambridge, MA, USA
- Ming Y Lu
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA
- Electrical Engineering and Computer Science, MIT, Cambridge, MA, USA
- Jana Lipkova
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA
- Muhammad Shaban
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA
- Tiffany Y Chen
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA
- Faisal Mahmood
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.
- Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA.
- Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA.
- Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA.
- Harvard Data Science Initiative, Harvard University, Cambridge, MA, USA.
26
Balagopalan A, Baldini I, Celi LA, Gichoya J, McCoy LG, Naumann T, Shalit U, van der Schaar M, Wagstaff KL. Machine learning for healthcare that matters: Reorienting from technical novelty to equitable impact. PLOS DIGITAL HEALTH 2024; 3:e0000474. [PMID: 38620047 PMCID: PMC11018283 DOI: 10.1371/journal.pdig.0000474] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/06/2023] [Accepted: 02/18/2024] [Indexed: 04/17/2024]
Abstract
Despite significant technical advances in machine learning (ML) over the past several years, the tangible impact of this technology in healthcare has been limited. This is due not only to the particular complexities of healthcare, but also to structural issues in the machine learning for healthcare (MLHC) community, which broadly rewards technical novelty over tangible, equitable impact. We structure our work as a healthcare-focused echo of the 2012 paper "Machine Learning that Matters", which highlighted such structural issues in the ML community at large and offered a series of clearly defined "Impact Challenges" to which the field should orient itself. Drawing on the expertise of a diverse and international group of authors, we engage in a narrative review and examine issues in the research background environment, training processes, evaluation metrics, and deployment protocols which act to limit the real-world applicability of MLHC. Broadly, we seek to distinguish between machine learning ON healthcare data and machine learning FOR healthcare: the former sees healthcare as merely a source of interesting technical challenges, while the latter regards ML as a tool in service of meeting tangible clinical needs. We offer specific recommendations for a series of stakeholders in the field, from ML researchers and clinicians, to the institutions in which they work, and the governments which regulate their data access.
Affiliation(s)
- Aparna Balagopalan
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology; Cambridge, Massachusetts, United States of America
- Ioana Baldini
- IBM Research; Yorktown Heights, New York, United States of America
- Leo Anthony Celi
- Laboratory for Computational Physiology, Massachusetts Institute of Technology; Cambridge, Massachusetts, United States of America
- Division of Pulmonary, Critical Care and Sleep Medicine, Beth Israel Deaconess Medical Center; Boston, Massachusetts, United States of America
- Department of Biostatistics, Harvard T.H. Chan School of Public Health; Boston, Massachusetts, United States of America
- Judy Gichoya
- Department of Radiology and Imaging Sciences, School of Medicine, Emory University; Atlanta, Georgia, United States of America
- Liam G. McCoy
- Division of Neurology, Department of Medicine, University of Alberta; Edmonton, Alberta, Canada
- Tristan Naumann
- Microsoft Research; Redmond, Washington, United States of America
- Uri Shalit
- The Faculty of Data and Decision Sciences, Technion; Haifa, Israel
- Mihaela van der Schaar
- Department of Applied Mathematics and Theoretical Physics, University of Cambridge; Cambridge, United Kingdom
- The Alan Turing Institute; London, United Kingdom
27
Ciet P, Eade C, Ho ML, Laborie LB, Mahomed N, Naidoo J, Pace E, Segal B, Toso S, Tschauner S, Vamyanmane DK, Wagner MW, Shelmerdine SC. The unintended consequences of artificial intelligence in paediatric radiology. Pediatr Radiol 2024; 54:585-593. [PMID: 37665368 DOI: 10.1007/s00247-023-05746-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 07/05/2023] [Revised: 08/07/2023] [Accepted: 08/08/2023] [Indexed: 09/05/2023]
Abstract
Over the past decade, there has been a dramatic rise in interest in the application of artificial intelligence (AI) in radiology. Originally only 'narrow' AI tasks were possible; however, with the increasing availability of data, combined with easy access to powerful computer processing, we are becoming able to generate complex and nuanced prediction models and elaborate solutions for healthcare. Nevertheless, these AI models are not without their failings, and sometimes the intended use of these solutions may not lead to predictable impacts for patients, society or those working within the healthcare profession. In this article, we provide an overview of the latest opinions regarding AI ethics, bias, limitations, challenges and considerations that we should all contemplate in this exciting and expanding field, with special attention to how these apply to the unique aspects of a paediatric population. By embracing AI technology and fostering a multidisciplinary approach, it is hoped that we can harness the power AI brings whilst minimising harm and ensuring a beneficial impact on radiology practice.
Affiliation(s)
- Pierluigi Ciet
- Department of Radiology and Nuclear Medicine, Erasmus MC - Sophia's Children's Hospital, Rotterdam, The Netherlands
- Department of Medical Sciences, University of Cagliari, Cagliari, Italy
- Mai-Lan Ho
- University of Missouri, Columbia, MO, USA
- Lene Bjerke Laborie
- Department of Radiology, Section for Paediatrics, Haukeland University Hospital, Bergen, Norway
- Department of Clinical Medicine, University of Bergen, Bergen, Norway
- Nasreen Mahomed
- Department of Radiology, University of Witwatersrand, Johannesburg, South Africa
- Jaishree Naidoo
- Paediatric Diagnostic Imaging, Dr J Naidoo Inc., Johannesburg, South Africa
- Envisionit Deep AI Ltd, Coveham House, Downside Bridge Road, Cobham, UK
- Erika Pace
- Department of Diagnostic Radiology, The Royal Marsden NHS Foundation Trust, London, UK
- Bradley Segal
- Department of Radiology, University of Witwatersrand, Johannesburg, South Africa
- Seema Toso
- Pediatric Radiology, Children's Hospital, University Hospitals of Geneva, Geneva, Switzerland
- Sebastian Tschauner
- Division of Paediatric Radiology, Department of Radiology, Medical University of Graz, Graz, Austria
- Dhananjaya K Vamyanmane
- Department of Pediatric Radiology, Indira Gandhi Institute of Child Health, Bangalore, India
- Matthias W Wagner
- Department of Diagnostic Imaging, Division of Neuroradiology, The Hospital for Sick Children, Toronto, Canada
- Department of Medical Imaging, University of Toronto, Toronto, ON, Canada
- Department of Neuroradiology, University Hospital Augsburg, Augsburg, Germany
- Susan C Shelmerdine
- Department of Clinical Radiology, Great Ormond Street Hospital for Children NHS Foundation Trust, Great Ormond Street, London, WC1H 3JH, UK.
- Great Ormond Street Hospital for Children, UCL Great Ormond Street Institute of Child Health, London, UK.
- NIHR Great Ormond Street Hospital Biomedical Research Centre, 30 Guilford Street, Bloomsbury, London, UK.
- Department of Clinical Radiology, St George's Hospital, London, UK.
Collapse
|
28
|
Simmons C, DeGrasse J, Polakovic S, Aibinder W, Throckmorton T, Noerdlinger M, Papandrea R, Trenhaile S, Schoch B, Gobbato B, Routman H, Parsons M, Roche CP. Initial clinical experience with a predictive clinical decision support tool for anatomic and reverse total shoulder arthroplasty. EUROPEAN JOURNAL OF ORTHOPAEDIC SURGERY & TRAUMATOLOGY : ORTHOPEDIE TRAUMATOLOGIE 2024; 34:1307-1318. [PMID: 38095688 DOI: 10.1007/s00590-023-03796-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/09/2023] [Accepted: 11/19/2023] [Indexed: 04/02/2024]
Abstract
PURPOSE Clinical decision support tools (CDSTs) are software that generate patient-specific assessments that can be used to better inform healthcare provider decision making. Machine learning (ML)-based CDSTs have recently been developed for anatomic (aTSA) and reverse (rTSA) total shoulder arthroplasty to facilitate more data-driven, evidence-based decision making. Using this shoulder CDST as an example, this external validation study provides an overview of how ML-based algorithms are developed and discusses the limitations of these tools. METHODS An external validation for a novel CDST was conducted on 243 patients (120F/123M) who received a personalized prediction prior to surgery and had short-term clinical follow-up from 3 months to 2 years after primary aTSA (n = 43) or rTSA (n = 200). The outcome score and active range of motion predictions were compared to each patient's actual result at each timepoint, with the accuracy quantified by the mean absolute error (MAE). RESULTS The results of this external validation demonstrate the CDST accuracy to be similar (within 10%) or better than the MAEs from the published internal validation. A few predictive models were observed to have substantially lower MAEs than the internal validation, specifically, Constant (31.6% better), active abduction (22.5% better), global shoulder function (20.0% better), active external rotation (19.0% better), and active forward elevation (16.2% better), which is encouraging; however, the sample size was small. CONCLUSION A greater understanding of the limitations of ML-based CDSTs will facilitate more responsible use and build trust and confidence, potentially leading to greater adoption. As CDSTs evolve, we anticipate greater shared decision making between the patient and surgeon with the aim of achieving even better outcomes and greater levels of patient satisfaction.
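The headline accuracy metric in this validation, mean absolute error (MAE), is simple to reproduce. The sketch below is illustrative only; the values are hypothetical, not the study's data:

```python
import statistics

def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and observed outcomes."""
    return statistics.mean(abs(p - a) for p, a in zip(predicted, actual))

# Hypothetical outcome-score predictions vs. observed values at one follow-up timepoint.
predicted = [72.0, 65.5, 80.0, 58.0]
actual = [70.0, 68.0, 77.5, 61.0]

mae = mean_absolute_error(predicted, actual)
print(mae)  # -> 2.5 score points
```

A lower MAE at each timepoint indicates that the CDST's predictions track patients' actual recovery more closely.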
Collapse
Affiliation(s)
- Chelsey Simmons
- University of Florida, PO Box 116250, Gainesville, FL, 32605, USA
- Exactech, 2320 NW 66th Court, Gainesville, FL, 32653, USA
| | | | | | - William Aibinder
- University of Michigan, 1500 E. Medical Center Drive, Ann Arbor, MI, 48109, USA
| | | | - Mayo Noerdlinger
- Atlantic Orthopaedics and Sports Medicine, 1900 Lafayette Road, Portsmouth, NH, USA
| | | | | | - Bradley Schoch
- Mayo Clinic, Florida, 4500 San Pablo Rd., Jacksonville, FL, 32224, USA
| | - Bruno Gobbato
- R. José Emmendoerfer, 1449, Nova Brasília, Jaraguá do Sul, SC, 89252-278, Brazil
| | - Howard Routman
- Atlantis Orthopedics, 900 Village Square Crossing, #170, Palm Beach Gardens, FL, 33410, USA
| | - Moby Parsons
- 333 Borthwick Ave Suite #301, Portsmouth, NH, 03801, USA
| | | |
Collapse
|
29
|
Anderson JW, Visweswaran S. Algorithmic Individual Fairness and Healthcare: A Scoping Review. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.03.25.24304853. [PMID: 38585746 PMCID: PMC10996729 DOI: 10.1101/2024.03.25.24304853] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 04/09/2024]
Abstract
Objective Statistical and artificial intelligence algorithms are increasingly being developed for use in healthcare. These algorithms may reflect biases that magnify disparities in clinical care, and there is a growing need for understanding how algorithmic biases can be mitigated in pursuit of algorithmic fairness. Individual fairness in algorithms constrains algorithms to the notion that "similar individuals should be treated similarly." We conducted a scoping review on algorithmic individual fairness to understand the current state of research in the metrics and methods developed to achieve individual fairness and its applications in healthcare. Methods We searched three databases, PubMed, ACM Digital Library, and IEEE Xplore, for algorithmic individual fairness metrics, algorithmic bias mitigation, and healthcare applications. Our search was restricted to articles published between January 2013 and September 2023. We identified 1,886 articles through database searches and one additional article manually; from these, we included 30 articles in the review. Data from the selected articles were extracted, and the findings were synthesized. Results Based on the 30 articles in the review, we identified several themes, including philosophical underpinnings of fairness, individual fairness metrics, mitigation methods for achieving individual fairness, implications of achieving individual fairness on group fairness and vice versa, fairness metrics that combined individual fairness and group fairness, software for measuring and optimizing individual fairness, and applications of individual fairness in healthcare. Conclusion While there has been significant work on algorithmic individual fairness in recent years, the definition, use, and study of individual fairness remain in their infancy, especially in healthcare. Future research is needed to apply and evaluate individual fairness in healthcare comprehensively.
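The individual-fairness constraint quoted here is usually formalized (following Dwork et al.) as a Lipschitz condition: a model's outputs may differ by no more than a constant times the distance between the individuals. A minimal sketch, using invented feature vectors and risk scores rather than anything from the reviewed studies, flags pairs that violate this condition:

```python
import itertools
import math

def individual_fairness_violations(individuals, scores, lipschitz=1.0):
    """Return index pairs whose score gap exceeds lipschitz * distance,
    i.e. 'similar individuals' who were not 'treated similarly'."""
    violations = []
    for (i, xi), (j, xj) in itertools.combinations(enumerate(individuals), 2):
        if abs(scores[i] - scores[j]) > lipschitz * math.dist(xi, xj):
            violations.append((i, j))
    return violations

# Hypothetical patient feature vectors and model risk scores.
people = [(0.20, 0.50), (0.21, 0.50), (0.90, 0.10)]
scores = [0.30, 0.80, 0.95]
print(individual_fairness_violations(people, scores))  # -> [(0, 1)]
```

The first two individuals are nearly identical yet receive very different scores, so the pair (0, 1) is flagged; choosing the distance metric and the Lipschitz constant is precisely where much of the reviewed literature's difficulty lies.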
Collapse
Affiliation(s)
| | - Shyam Visweswaran
- Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA
| |
Collapse
|
30
|
Wei ML, Tada M, So A, Torres R. Artificial intelligence and skin cancer. Front Med (Lausanne) 2024; 11:1331895. [PMID: 38566925 PMCID: PMC10985205 DOI: 10.3389/fmed.2024.1331895] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/01/2023] [Accepted: 02/26/2024] [Indexed: 04/04/2024] Open
Abstract
Artificial intelligence is poised to rapidly reshape many fields, including skin cancer screening and diagnosis, both as a disruptive and an assistive technology. Together with the collection and availability of large medical data sets, artificial intelligence will become a powerful tool that physicians can leverage in their diagnoses and treatment plans for patients. This comprehensive review focuses on current progress toward AI applications for patients, primary care providers, dermatologists, and dermatopathologists; explores the diverse applications of image and molecular processing for skin cancer; and highlights AI's potential for patient self-screening and for improving diagnostic accuracy among non-dermatologists. We additionally delve into the challenges and barriers to clinical implementation, paths forward, and areas of active research.
Collapse
Affiliation(s)
- Maria L. Wei
- Department of Dermatology, University of California, San Francisco, San Francisco, CA, United States
- Dermatology Service, San Francisco VA Health Care System, San Francisco, CA, United States
| | - Mikio Tada
- Institute for Neurodegenerative Diseases, University of California, San Francisco, San Francisco, CA, United States
| | - Alexandra So
- School of Medicine, University of California, San Francisco, San Francisco, CA, United States
| | - Rodrigo Torres
- Dermatology Service, San Francisco VA Health Care System, San Francisco, CA, United States
| |
Collapse
|
31
|
Campion JR, O'Connor DB, Lahiff C. Human-artificial intelligence interaction in gastrointestinal endoscopy. World J Gastrointest Endosc 2024; 16:126-135. [PMID: 38577646 PMCID: PMC10989254 DOI: 10.4253/wjge.v16.i3.126] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 12/31/2023] [Revised: 01/18/2024] [Accepted: 02/23/2024] [Indexed: 03/14/2024] Open
Abstract
The number and variety of applications of artificial intelligence (AI) in gastrointestinal (GI) endoscopy is growing rapidly. New technologies based on machine learning (ML) and convolutional neural networks (CNNs) are at various stages of development and deployment to assist patients and endoscopists in preparing for endoscopic procedures, in detection, diagnosis and classification of pathology during endoscopy and in confirmation of key performance indicators. Platforms based on ML and CNNs require regulatory approval as medical devices. Interactions between humans and the technologies we use are complex and are influenced by design, behavioural and psychological elements. Due to the substantial differences between AI and prior technologies, important differences may be expected in how we interact with advice from AI technologies. Human–AI interaction (HAII) may be optimised by developing AI algorithms to minimise false positives and designing platform interfaces to maximise usability. Human factors influencing HAII may include automation bias, alarm fatigue, algorithm aversion, learning effect and deskilling. Each of these areas merits further study in the specific setting of AI applications in GI endoscopy and professional societies should engage to ensure that sufficient emphasis is placed on human-centred design in development of new AI technologies.
Collapse
Affiliation(s)
- John R Campion
- Department of Gastroenterology, Mater Misericordiae University Hospital, Dublin D07 AX57, Ireland
- School of Medicine, University College Dublin, Dublin D04 C7X2, Ireland
| | - Donal B O'Connor
- Department of Surgery, Trinity College Dublin, Dublin D02 R590, Ireland
| | - Conor Lahiff
- Department of Gastroenterology, Mater Misericordiae University Hospital, Dublin D07 AX57, Ireland
- School of Medicine, University College Dublin, Dublin D04 C7X2, Ireland
| |
Collapse
|
32
|
Brady AP, Allen B, Chong J, Kotter E, Kottler N, Mongan J, Oakden-Rayner L, Pinto Dos Santos D, Tang A, Wald C, Slavotinek J. Developing, purchasing, implementing and monitoring AI tools in radiology: Practical considerations. A multi-society statement from the ACR, CAR, ESR, RANZCR & RSNA. J Med Imaging Radiat Oncol 2024; 68:7-26. [PMID: 38259140 DOI: 10.1111/1754-9485.13612] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/23/2023] [Accepted: 11/23/2023] [Indexed: 01/24/2024]
Abstract
Artificial Intelligence (AI) carries the potential for unprecedented disruption in radiology, with possible positive and negative consequences. The integration of AI in radiology holds the potential to revolutionize healthcare practices by advancing diagnosis, quantification, and management of multiple medical conditions. Nevertheless, the ever-growing availability of AI tools in radiology highlights an increasing need to critically evaluate claims for its utility and to differentiate safe product offerings from potentially harmful, or fundamentally unhelpful ones. This multi-society paper, presenting the views of Radiology Societies in the USA, Canada, Europe, Australia, and New Zealand, defines the potential practical problems and ethical issues surrounding the incorporation of AI into radiological practice. In addition to delineating the main points of concern that developers, regulators, and purchasers of AI tools should consider prior to their introduction into clinical practice, this statement also suggests methods to monitor their stability and safety in clinical use, and their suitability for possible autonomous function. This statement is intended to serve as a useful summary of the practical issues which should be considered by all parties involved in the development of radiology AI resources, and their implementation as clinical tools.
Collapse
Affiliation(s)
| | - Bibb Allen
- Department of Radiology, Grandview Medical Center, Birmingham, Alabama, USA
- American College of Radiology Data Science Institute, Reston, Virginia, USA
| | - Jaron Chong
- Department of Medical Imaging, Schulich School of Medicine and Dentistry, Western University, London, Ontario, Canada
| | - Elmar Kotter
- Department of Diagnostic and Interventional Radiology, Medical Center, Faculty of Medicine, University of Freiburg, Freiburg, Germany
| | - Nina Kottler
- Radiology Partners, El Segundo, California, USA
- Stanford Center for Artificial Intelligence in Medicine & Imaging, Palo Alto, California, USA
| | - John Mongan
- Department of Radiology and Biomedical Imaging, University of California, San Francisco, San Francisco, California, USA
| | - Lauren Oakden-Rayner
- Australian Institute for Machine Learning, University of Adelaide, Adelaide, South Australia, Australia
| | - Daniel Pinto Dos Santos
- Department of Radiology, University Hospital of Cologne, Cologne, Germany
- Department of Radiology, University Hospital of Frankfurt, Frankfurt, Germany
| | - An Tang
- Department of Radiology, Radiation Oncology, and Nuclear Medicine, Université de Montréal, Montreal, Quebec, Canada
| | - Christoph Wald
- Department of Radiology, Lahey Hospital & Medical Center, Burlington, Massachusetts, USA
- Tufts University Medical School, Boston, Massachusetts, USA
- Commission On Informatics, and Member, Board of Chancellors, American College of Radiology, Reston, Virginia, USA
| | - John Slavotinek
- South Australia Medical Imaging, Flinders Medical Centre Adelaide, Adelaide, South Australia, Australia
- College of Medicine and Public Health, Flinders University, Adelaide, South Australia, Australia
| |
Collapse
|
33
|
Groh M, Badri O, Daneshjou R, Koochek A, Harris C, Soenksen LR, Doraiswamy PM, Picard R. Deep learning-aided decision support for diagnosis of skin disease across skin tones. Nat Med 2024; 30:573-583. [PMID: 38317019 PMCID: PMC10878981 DOI: 10.1038/s41591-023-02728-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/02/2023] [Accepted: 11/16/2023] [Indexed: 02/07/2024]
Abstract
Although advances in deep learning systems for image-based medical diagnosis demonstrate their potential to augment clinical decision-making, the effectiveness of physician-machine partnerships remains an open question, in part because physicians and algorithms are both susceptible to systematic errors, especially for diagnosis of underrepresented populations. Here we present results from a large-scale digital experiment involving board-certified dermatologists (n = 389) and primary-care physicians (n = 459) from 39 countries to evaluate the accuracy of diagnoses submitted by physicians in a store-and-forward teledermatology simulation. In this experiment, physicians were presented with 364 images spanning 46 skin diseases and asked to submit up to four differential diagnoses. Specialists and generalists achieved diagnostic accuracies of 38% and 19%, respectively, but both specialists and generalists were four percentage points less accurate for the diagnosis of images of dark skin as compared to light skin. Fair deep learning system decision support improved the diagnostic accuracy of both specialists and generalists by more than 33%, but exacerbated the gap in the diagnostic accuracy of generalists across skin tones. These results demonstrate that well-designed physician-machine partnerships can enhance the diagnostic accuracy of physicians, illustrating that success in improving overall diagnostic accuracy does not necessarily address bias.
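The study's headline numbers are top-k diagnostic accuracies stratified by skin tone (each physician submitted up to four differential diagnoses). A hedged sketch of that computation, using invented toy cases rather than the study's data:

```python
def top_k_accuracy(cases, k=4):
    """Fraction of cases whose true diagnosis appears in the top-k differential."""
    hits = sum(1 for truth, differential in cases if truth in differential[:k])
    return hits / len(cases)

# Hypothetical (true diagnosis, submitted differential) pairs per skin-tone group.
light_skin = [("psoriasis", ["psoriasis", "eczema"]),
              ("melanoma", ["nevus", "melanoma"])]
dark_skin = [("psoriasis", ["eczema", "lichen planus"]),
             ("melanoma", ["melanoma"])]

gap = top_k_accuracy(light_skin) - top_k_accuracy(dark_skin)
print(gap)  # -> 0.5, the accuracy gap across skin-tone groups
```

Tracking this gap, not just overall accuracy, is what reveals whether decision support narrows or widens disparities across skin tones.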
Collapse
Affiliation(s)
- Matthew Groh
- Northwestern University Kellogg School of Management, Evanston, IL, USA.
- MIT Media Lab, Cambridge, MA, USA.
| | - Omar Badri
- Northeast Dermatology Associates, Beverly, MA, USA
| | - Roxana Daneshjou
- Stanford Department of Biomedical Data Science, Stanford, CA, USA
- Stanford Department of Dermatology, Redwood City, CA, USA
| | | | | | - Luis R Soenksen
- Wyss Institute for Bioinspired Engineering at Harvard, Boston, MA, USA
| | - P Murali Doraiswamy
- MIT Media Lab, Cambridge, MA, USA
- Duke University School of Medicine, Durham, NC, USA
| | | |
Collapse
|
34
|
Brady AP, Allen B, Chong J, Kotter E, Kottler N, Mongan J, Oakden-Rayner L, Dos Santos DP, Tang A, Wald C, Slavotinek J. Developing, purchasing, implementing and monitoring AI tools in radiology: practical considerations. A multi-society statement from the ACR, CAR, ESR, RANZCR & RSNA. Insights Imaging 2024; 15:16. [PMID: 38246898 PMCID: PMC10800328 DOI: 10.1186/s13244-023-01541-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/23/2024] Open
Abstract
Artificial Intelligence (AI) carries the potential for unprecedented disruption in radiology, with possible positive and negative consequences. The integration of AI in radiology holds the potential to revolutionize healthcare practices by advancing diagnosis, quantification, and management of multiple medical conditions. Nevertheless, the ever-growing availability of AI tools in radiology highlights an increasing need to critically evaluate claims for its utility and to differentiate safe product offerings from potentially harmful, or fundamentally unhelpful ones. This multi-society paper, presenting the views of Radiology Societies in the USA, Canada, Europe, Australia, and New Zealand, defines the potential practical problems and ethical issues surrounding the incorporation of AI into radiological practice. In addition to delineating the main points of concern that developers, regulators, and purchasers of AI tools should consider prior to their introduction into clinical practice, this statement also suggests methods to monitor their stability and safety in clinical use, and their suitability for possible autonomous function. This statement is intended to serve as a useful summary of the practical issues which should be considered by all parties involved in the development of radiology AI resources, and their implementation as clinical tools.
Key points:
• The incorporation of artificial intelligence (AI) in radiological practice demands increased monitoring of its utility and safety.
• Cooperation between developers, clinicians, and regulators will allow all involved to address ethical issues and monitor AI performance.
• AI can fulfil its promise to advance patient well-being if all steps from development to integration in healthcare are rigorously evaluated.
Collapse
Affiliation(s)
| | - Bibb Allen
- Department of Radiology, Grandview Medical Center, Birmingham, AL, USA
- American College of Radiology Data Science Institute, Reston, VA, USA
| | - Jaron Chong
- Department of Medical Imaging, Schulich School of Medicine and Dentistry, Western University, London, ON, Canada
| | - Elmar Kotter
- Department of Diagnostic and Interventional Radiology, Medical Center, Faculty of Medicine, University of Freiburg, Freiburg, Germany
| | - Nina Kottler
- Radiology Partners, El Segundo, CA, USA
- Stanford Center for Artificial Intelligence in Medicine & Imaging, Palo Alto, CA, USA
| | - John Mongan
- Department of Radiology and Biomedical Imaging, University of California, San Francisco, USA
| | - Lauren Oakden-Rayner
- Australian Institute for Machine Learning, University of Adelaide, Adelaide, Australia
| | - Daniel Pinto Dos Santos
- Department of Radiology, University Hospital of Cologne, Cologne, Germany
- Department of Radiology, University Hospital of Frankfurt, Frankfurt, Germany
| | - An Tang
- Department of Radiology, Radiation Oncology, and Nuclear Medicine, Université de Montréal, Montréal, Québec, Canada
| | - Christoph Wald
- Department of Radiology, Lahey Hospital & Medical Center, Burlington, MA, USA
- Tufts University Medical School, Boston, MA, USA
- Commission On Informatics, and Member, Board of Chancellors, American College of Radiology, Virginia, USA
| | - John Slavotinek
- South Australia Medical Imaging, Flinders Medical Centre Adelaide, Adelaide, Australia
- College of Medicine and Public Health, Flinders University, Adelaide, Australia
| |
Collapse
|
35
|
Nguyen T. ChatGPT in Medical Education: A Precursor for Automation Bias? JMIR MEDICAL EDUCATION 2024; 10:e50174. [PMID: 38231545 PMCID: PMC10831594 DOI: 10.2196/50174] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/21/2023] [Accepted: 12/11/2023] [Indexed: 01/18/2024]
Abstract
Artificial intelligence (AI) in health care has the promise of providing accurate and efficient results. However, AI can also be a black box, where the logic behind its results is nonrational. There are concerns if these questionable results are used in patient care. As physicians have the duty to provide care based on their clinical judgment in addition to their patients' values and preferences, it is crucial that physicians validate the results from AI. Yet, there are some physicians who exhibit a phenomenon known as automation bias, where there is an assumption from the user that AI is always right. This is a dangerous mindset, as users exhibiting automation bias will not validate the results, given their trust in AI systems. Several factors impact a user's susceptibility to automation bias, such as inexperience or being born in the digital age. In this editorial, I argue that these factors and a lack of AI education in the medical school curriculum cause automation bias. I also explore the harms of automation bias and why prospective physicians need to be vigilant when using AI. Furthermore, it is important to consider what attitudes are being taught to students when introducing ChatGPT, which could be some students' first time using AI, prior to their use of AI in the clinical setting. Therefore, in attempts to avoid the problem of automation bias in the long-term, in addition to incorporating AI education into the curriculum, as is necessary, the use of ChatGPT in medical education should be limited to certain tasks. Otherwise, having no constraints on what ChatGPT should be used for could lead to automation bias.
Collapse
Affiliation(s)
- Tina Nguyen
- The University of Texas Medical Branch, Galveston, TX, United States
| |
Collapse
|
36
|
Dot G, Gajny L, Ducret M. [The challenges of artificial intelligence in odontology]. Med Sci (Paris) 2024; 40:79-84. [PMID: 38299907 DOI: 10.1051/medsci/2023199] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/02/2024] Open
Abstract
Artificial intelligence has numerous potential applications in dentistry, as these algorithms aim to improve the efficiency and safety of several clinical situations. While the first commercial solutions are being proposed, most of these algorithms have not been sufficiently validated for clinical use. This article describes the challenges surrounding the development of these new tools, to help clinicians to keep a critical eye on this technology.
Collapse
Affiliation(s)
- Gauthier Dot
- UFR odontologie, université Paris Cité, Paris, France - AP-HP, hôpital Pitié-Salpêtrière, service de médecine bucco-dentaire, Paris, France - Institut de biomécanique humaine Georges Charpak, école nationale supérieure d'Arts et Métiers, Paris, France
| | - Laurent Gajny
- Institut de biomécanique humaine Georges Charpak, école nationale supérieure d'Arts et Métiers, Paris, France
| | - Maxime Ducret
- Faculté d'odontologie, université Claude Bernard Lyon 1, hospices civils de Lyon, Lyon, France
| |
Collapse
|
37
|
Brady AP, Allen B, Chong J, Kotter E, Kottler N, Mongan J, Oakden-Rayner L, dos Santos DP, Tang A, Wald C, Slavotinek J. Developing, Purchasing, Implementing and Monitoring AI Tools in Radiology: Practical Considerations. A Multi-Society Statement from the ACR, CAR, ESR, RANZCR and RSNA. Radiol Artif Intell 2024; 6:e230513. [PMID: 38251899 PMCID: PMC10831521 DOI: 10.1148/ryai.230513] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/23/2024]
Abstract
Artificial Intelligence (AI) carries the potential for unprecedented disruption in radiology, with possible positive and negative consequences. The integration of AI in radiology holds the potential to revolutionize healthcare practices by advancing diagnosis, quantification, and management of multiple medical conditions. Nevertheless, the ever-growing availability of AI tools in radiology highlights an increasing need to critically evaluate claims for its utility and to differentiate safe product offerings from potentially harmful, or fundamentally unhelpful ones. This multi-society paper, presenting the views of Radiology Societies in the USA, Canada, Europe, Australia, and New Zealand, defines the potential practical problems and ethical issues surrounding the incorporation of AI into radiological practice. In addition to delineating the main points of concern that developers, regulators, and purchasers of AI tools should consider prior to their introduction into clinical practice, this statement also suggests methods to monitor their stability and safety in clinical use, and their suitability for possible autonomous function. This statement is intended to serve as a useful summary of the practical issues which should be considered by all parties involved in the development of radiology AI resources, and their implementation as clinical tools. This article is simultaneously published in Insights into Imaging (DOI 10.1186/s13244-023-01541-3), Journal of Medical Imaging and Radiation Oncology (DOI 10.1111/1754-9485.13612), Canadian Association of Radiologists Journal (DOI 10.1177/08465371231222229), Journal of the American College of Radiology (DOI 10.1016/j.jacr.2023.12.005), and Radiology: Artificial Intelligence (DOI 10.1148/ryai.230513). Keywords: Artificial Intelligence, Radiology, Automation, Machine Learning. Published under a CC BY 4.0 license. ©The Author(s) 2024. Editor's Note: The RSNA Board of Directors has endorsed this article. It has not undergone review or editing by this journal.
Collapse
Affiliation(s)
| | - Bibb Allen
- Department of Radiology, Grandview Medical Center, Birmingham, AL, USA
- American College of Radiology Data Science Institute, Reston, VA, USA
| | - Jaron Chong
- Department of Medical Imaging, Schulich School of Medicine and Dentistry, Western University, London, ON, Canada
| | - Elmar Kotter
- Department of Diagnostic and Interventional Radiology, Medical Center, Faculty of Medicine, University of Freiburg, Freiburg, Germany
| | - Nina Kottler
- Radiology Partners, El Segundo, CA, USA
- Stanford Center for Artificial Intelligence in Medicine & Imaging, Palo Alto, CA, USA
| | - John Mongan
- Department of Radiology and Biomedical Imaging, University of California, San Francisco, USA
| | - Lauren Oakden-Rayner
- Australian Institute for Machine Learning, University of Adelaide, Adelaide, Australia
| | - Daniel Pinto dos Santos
- Department of Radiology, University Hospital of Cologne, Cologne, Germany
- Department of Radiology, University Hospital of Frankfurt, Frankfurt, Germany
| | - An Tang
- Department of Radiology, Radiation Oncology, and Nuclear Medicine, Université de Montréal, Montréal, Québec, Canada
| | - Christoph Wald
- Department of Radiology, Lahey Hospital & Medical Center, Burlington, MA, USA
- Tufts University Medical School, Boston, MA, USA
- Commission On Informatics, and Member, Board of Chancellors, American College of Radiology, Virginia, USA
| | - John Slavotinek
- South Australia Medical Imaging, Flinders Medical Centre Adelaide, Adelaide, Australia
- College of Medicine and Public Health, Flinders University, Adelaide, Australia
| |
Collapse
|
38
|
Teneggi J, Yi PH, Sulam J. Examination-Level Supervision for Deep Learning-based Intracranial Hemorrhage Detection on Head CT Scans. Radiol Artif Intell 2024; 6:e230159. [PMID: 38294324 PMCID: PMC10831525 DOI: 10.1148/ryai.230159] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/08/2023] [Revised: 11/02/2023] [Accepted: 12/05/2023] [Indexed: 02/01/2024]
Abstract
Purpose To compare the effectiveness of weak supervision (ie, with examination-level labels only) and strong supervision (ie, with image-level labels) in training deep learning models for detection of intracranial hemorrhage (ICH) on head CT scans. Materials and Methods In this retrospective study, an attention-based convolutional neural network was trained with either local (ie, image level) or global (ie, examination level) binary labels on the Radiological Society of North America (RSNA) 2019 Brain CT Hemorrhage Challenge dataset of 21 736 examinations (8876 [40.8%] ICH) and 752 422 images (107 784 [14.3%] ICH). The CQ500 (436 examinations; 212 [48.6%] ICH) and CT-ICH (75 examinations; 36 [48.0%] ICH) datasets were employed for external testing. Performance in detecting ICH was compared between weak (examination-level labels) and strong (image-level labels) learners as a function of the number of labels available during training. Results On examination-level binary classification, strong and weak learners did not have different area under the receiver operating characteristic curve values on the internal validation split (0.96 vs 0.96; P = .64) and the CQ500 dataset (0.90 vs 0.92; P = .15). Weak learners outperformed strong ones on the CT-ICH dataset (0.95 vs 0.92; P = .03). Weak learners had better section-level ICH detection performance when more than 10 000 labels were available for training (average f1 = 0.73 vs 0.65; P < .001). Weakly supervised models trained on the entire RSNA dataset required 35 times fewer labels than equivalent strong learners. Conclusion Strongly supervised models did not achieve better performance than weakly supervised ones, which could reduce radiologist labor requirements for prospective dataset curation. Keywords: CT, Head/Neck, Brain/Brain Stem, Hemorrhage Supplemental material is available for this article. © RSNA, 2023 See also commentary by Wahid and Fuentes in this issue.
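The weak-supervision setup can be illustrated with the simplest multiple-instance pooling: when only an examination-level label exists, per-image predictions must be aggregated into one examination-level score before any loss is computed. The sketch below uses max-pooling and invented probabilities; the paper itself uses an attention-based network, which effectively learns a weighted pooling rather than a hard max:

```python
def examination_prediction(image_probs):
    """Pool per-image ICH probabilities to one examination-level score:
    an exam is suspicious if any single CT section looks suspicious."""
    return max(image_probs)

# Hypothetical per-section ICH probabilities for two head CT examinations.
exam_a = [0.02, 0.05, 0.91, 0.10]  # one suspicious section
exam_b = [0.01, 0.03, 0.04]        # all sections look clean

print(examination_prediction(exam_a))  # -> 0.91, exam flagged
print(examination_prediction(exam_b))  # -> 0.04, exam not flagged
```

Because only the pooled score is supervised, annotators never need to mark individual sections, which is the source of the 35-fold label saving the authors quantify.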
Collapse
Affiliation(s)
- Jacopo Teneggi
- From the Department of Computer Science (J.T.), Department of Biomedical Engineering (J.S.), and Mathematical Institute for Data Science (MINDS) (J.S., J.T.), Johns Hopkins University, 3400 N Charles St, Clark Hall, Suite 320, Baltimore, MD 21218; and University of Maryland Medical Intelligent Imaging Center (UM2ii), Department of Diagnostic Radiology and Nuclear Medicine, University of Maryland School of Medicine, Baltimore, Md (P.H.Y.)
| | - Paul H. Yi
- From the Department of Computer Science (J.T.), Department of Biomedical Engineering (J.S.), and Mathematical Institute for Data Science (MINDS) (J.S., J.T.), Johns Hopkins University, 3400 N Charles St, Clark Hall, Suite 320, Baltimore, MD 21218; and University of Maryland Medical Intelligent Imaging Center (UM2ii), Department of Diagnostic Radiology and Nuclear Medicine, University of Maryland School of Medicine, Baltimore, Md (P.H.Y.)
| | - Jeremias Sulam
- From the Department of Computer Science (J.T.), Department of Biomedical Engineering (J.S.), and Mathematical Institute for Data Science (MINDS) (J.S., J.T.), Johns Hopkins University, 3400 N Charles St, Clark Hall, Suite 320, Baltimore, MD 21218; and University of Maryland Medical Intelligent Imaging Center (UM2ii), Department of Diagnostic Radiology and Nuclear Medicine, University of Maryland School of Medicine, Baltimore, Md (P.H.Y.)
| |
39
Jabbour S, Fouhey D, Shepard S, Valley TS, Kazerooni EA, Banovic N, Wiens J, Sjoding MW. Measuring the Impact of AI in the Diagnosis of Hospitalized Patients: A Randomized Clinical Vignette Survey Study. JAMA 2023; 330:2275-2284. [PMID: 38112814 PMCID: PMC10731487 DOI: 10.1001/jama.2023.22295]
Abstract
Importance Artificial intelligence (AI) could support clinicians when diagnosing hospitalized patients; however, systematic bias in AI models could worsen clinician diagnostic accuracy. Recent regulatory guidance has called for AI models to include explanations to mitigate errors made by models, but the effectiveness of this strategy has not been established. Objectives To evaluate the impact of systematically biased AI on clinician diagnostic accuracy and to determine if image-based AI model explanations can mitigate model errors. Design, Setting, and Participants Randomized clinical vignette survey study administered between April 2022 and January 2023 across 13 US states involving hospitalist physicians, nurse practitioners, and physician assistants. Interventions Clinicians were shown 9 clinical vignettes of patients hospitalized with acute respiratory failure, including their presenting symptoms, physical examination, laboratory results, and chest radiographs. Clinicians were then asked to determine the likelihood of pneumonia, heart failure, or chronic obstructive pulmonary disease as the underlying cause(s) of each patient's acute respiratory failure. To establish baseline diagnostic accuracy, clinicians were shown 2 vignettes without AI model input. Clinicians were then randomized to see 6 vignettes with AI model input with or without AI model explanations. Among these 6 vignettes, 3 vignettes included standard-model predictions, and 3 vignettes included systematically biased model predictions. Main Outcomes and Measures Clinician diagnostic accuracy for pneumonia, heart failure, and chronic obstructive pulmonary disease. Results Median participant age was 34 years (IQR, 31-39) and 241 (57.7%) were female. Four hundred fifty-seven clinicians were randomized and completed at least 1 vignette, with 231 randomized to AI model predictions without explanations, and 226 randomized to AI model predictions with explanations. 
Clinicians' baseline diagnostic accuracy was 73.0% (95% CI, 68.3% to 77.8%) for the 3 diagnoses. When shown a standard AI model without explanations, clinician accuracy increased over baseline by 2.9 percentage points (95% CI, 0.5 to 5.2) and by 4.4 percentage points (95% CI, 2.0 to 6.9) when clinicians were also shown AI model explanations. Systematically biased AI model predictions decreased clinician accuracy by 11.3 percentage points (95% CI, 7.2 to 15.5) compared with baseline, and providing biased AI model predictions with explanations decreased clinician accuracy by 9.1 percentage points (95% CI, 4.9 to 13.2) compared with baseline, representing a nonsignificant improvement of 2.3 percentage points (95% CI, -2.7 to 7.2) compared with the systematically biased AI model. Conclusions and Relevance Although standard AI models improved diagnostic accuracy, systematically biased AI models reduced diagnostic accuracy, and commonly used image-based AI model explanations did not mitigate this harmful effect. Trial Registration ClinicalTrials.gov Identifier: NCT06098950.
Affiliation(s)
- Sarah Jabbour
- Computer Science and Engineering, University of Michigan, Ann Arbor
- David Fouhey
- Computer Science and Engineering, University of Michigan, Ann Arbor
- Now with Computer Science, Courant Institute, New York University, New York
- Now with Electrical and Computer Engineering, Tandon School of Engineering, New York University, New York
- Thomas S. Valley
- Pulmonary and Critical Care Medicine, Department of Internal Medicine, University of Michigan Medical School, Ann Arbor
- Ella A. Kazerooni
- Department of Radiology, University of Michigan Medical School, Ann Arbor
- Nikola Banovic
- Computer Science and Engineering, University of Michigan, Ann Arbor
- Jenna Wiens
- Computer Science and Engineering, University of Michigan, Ann Arbor
- Michael W. Sjoding
- Pulmonary and Critical Care Medicine, Department of Internal Medicine, University of Michigan Medical School, Ann Arbor
40
Smith CM, Weathers AL, Lewis SL. An overview of clinical machine learning applications in neurology. J Neurol Sci 2023; 455:122799. [PMID: 37979413 DOI: 10.1016/j.jns.2023.122799]
Abstract
Machine learning techniques for clinical applications are evolving, and the potential impact this will have on clinical neurology is important to recognize. By providing a broad overview on this growing paradigm of clinical tools, this article aims to help healthcare professionals in neurology prepare to navigate both the opportunities and challenges brought on through continued advancements in machine learning. This narrative review first elaborates on how machine learning models are organized and implemented. Machine learning tools are then classified by clinical application, with examples of uses within neurology described in more detail. Finally, this article addresses limitations and considerations regarding clinical machine learning applications in neurology.
Affiliation(s)
- Colin M Smith
- Lehigh Valley Fleming Neuroscience Institute, 1250 S Cedar Crest Blvd., Allentown, PA 18103, USA
- Allison L Weathers
- Cleveland Clinic Information Technology Division, 9500 Euclid Ave., Cleveland, OH 44195, USA
- Steven L Lewis
- Lehigh Valley Fleming Neuroscience Institute, 1250 S Cedar Crest Blvd., Allentown, PA 18103, USA
41
Funer F, Liedtke W, Tinnemeyer S, Klausen AD, Schneider D, Zacharias HU, Langanke M, Salloch S. Responsibility and decision-making authority in using clinical decision support systems: an empirical-ethical exploration of German prospective professionals' preferences and concerns. J Med Ethics 2023; 50:6-11. [PMID: 37217277 PMCID: PMC10803986 DOI: 10.1136/jme-2022-108814]
Abstract
Machine learning-driven clinical decision support systems (ML-CDSSs) seem impressively promising for future routine and emergency care. However, reflection on their clinical implementation reveals a wide array of ethical challenges. The preferences, concerns and expectations of professional stakeholders remain largely unexplored. Empirical research, however, may help to clarify the conceptual debate and its aspects in terms of their relevance for clinical practice. This study explores, from an ethical point of view, future healthcare professionals' attitudes to potential changes in responsibility and decision-making authority when using ML-CDSSs. Twenty-seven semistructured interviews were conducted with German medical students and nursing trainees. The data were analysed using qualitative content analysis according to Kuckartz. Interviewees' reflections are presented under three themes that the interviewees describe as closely related: (self-)attribution of responsibility, decision-making authority and the need for (professional) experience. The results illustrate the conceptual interconnectedness of professional responsibility and the structural and epistemic preconditions that must be met for clinicians to fulfil their responsibility in a meaningful manner. The study also sheds light on the four relata of responsibility understood as a relational concept. The article closes with concrete suggestions for the ethically sound clinical implementation of ML-CDSSs.
Affiliation(s)
- Florian Funer
- Institute of Ethics, History and Philosophy of Medicine, Hannover Medical School, Hannover, Germany
- Institute of Ethics and History of Medicine, Eberhard Karls University Tübingen, Tübingen, Germany
- Wenke Liedtke
- Department of Social Work, Protestant University of Applied Sciences RWL, Bochum, Germany
- Sara Tinnemeyer
- Institute of Ethics, History and Philosophy of Medicine, Hannover Medical School, Hannover, Germany
- Diana Schneider
- Competence Center Emerging Technologies, Fraunhofer Institute for Systems and Innovation Research ISI, Karlsruhe, Germany
- Helena U Zacharias
- Peter L. Reichertz Institute for Medical Informatics of TU Braunschweig and Hannover Medical School, Hannover Medical School, Hannover, Germany
- Martin Langanke
- Department of Social Work, Protestant University of Applied Sciences RWL, Bochum, Germany
- Sabine Salloch
- Institute of Ethics, History and Philosophy of Medicine, Hannover Medical School, Hannover, Germany
42
Banerji CRS, Chakraborti T, Harbron C, MacArthur BD. Clinical AI tools must convey predictive uncertainty for each individual patient. Nat Med 2023; 29:2996-2998. [PMID: 37821686 DOI: 10.1038/s41591-023-02562-7]
Affiliation(s)
- Christopher R S Banerji
- The Alan Turing Institute, London, UK.
- University College London Hospitals, NHS Foundation Trust, London, UK.
- UCL Cancer Institute, Faculty of Medical Sciences, University College London, London, UK.
- Tapabrata Chakraborti
- The Alan Turing Institute, London, UK
- UCL Cancer Institute, Faculty of Medical Sciences, University College London, London, UK
- Ben D MacArthur
- The Alan Turing Institute, London, UK
- Faculty of Medicine, University of Southampton, Southampton, UK
- Mathematical Sciences, University of Southampton, Southampton, UK
43
Nagendran M, Festor P, Komorowski M, Gordon AC, Faisal AA. Quantifying the impact of AI recommendations with explanations on prescription decision making. NPJ Digit Med 2023; 6:206. [PMID: 37935953 PMCID: PMC10630476 DOI: 10.1038/s41746-023-00955-z]
Abstract
The influence of AI recommendations on physician behaviour remains poorly characterised. We assess how clinicians' decisions may be influenced by additional information more broadly, and how this influence can be modified by the source of the information (human peers or AI) and by the presence or absence of an AI explanation (XAI, here using simple feature importance). We used a modified between-subjects design in which intensive care doctors (N = 86) were presented, on a computer, with a patient case in each of 16 trials and prompted to prescribe continuous values for two drugs. We used a multi-factorial experimental design with four arms, where each clinician experienced all four arms on different subsets of our 24 patients. The four arms were (i) baseline (control), (ii) a peer-clinician scenario showing what doses had been prescribed by other doctors, (iii) an AI suggestion and (iv) an XAI suggestion. We found that additional information (peer, AI or XAI) had a strong influence on prescriptions (significantly so for AI, not for peers), but simple XAI did not have a greater influence than AI alone. Neither attitudes toward AI nor clinical experience correlated with the AI-supported decisions, nor was there a correlation between how useful doctors self-reported finding the XAI and whether the XAI actually influenced their prescriptions. Our findings suggest that the marginal impact of simple XAI was low in this setting, and we also cast doubt on the utility of self-reports as a valid metric for assessing XAI in clinical experts.
Affiliation(s)
- Myura Nagendran
- UKRI Centre for Doctoral Training in AI for Healthcare, Imperial College London, London, UK
- Division of Anaesthetics, Pain Medicine, and Intensive Care, Imperial College London, London, UK
- Brain and Behaviour Lab, Imperial College London, London, UK
- Paul Festor
- UKRI Centre for Doctoral Training in AI for Healthcare, Imperial College London, London, UK
- Brain and Behaviour Lab, Imperial College London, London, UK
- Department of Computing, Imperial College London, London, UK
- Matthieu Komorowski
- Division of Anaesthetics, Pain Medicine, and Intensive Care, Imperial College London, London, UK
- Anthony C Gordon
- Division of Anaesthetics, Pain Medicine, and Intensive Care, Imperial College London, London, UK
- Aldo A Faisal
- UKRI Centre for Doctoral Training in AI for Healthcare, Imperial College London, London, UK
- Brain and Behaviour Lab, Imperial College London, London, UK
- Department of Computing, Imperial College London, London, UK
- Institute of Artificial & Human Intelligence, University of Bayreuth, Bayreuth, Germany
44
Li MD, Little BP. Appropriate Reliance on Artificial Intelligence in Radiology Education. J Am Coll Radiol 2023; 20:1126-1130. [PMID: 37392983 DOI: 10.1016/j.jacr.2023.04.019]
Abstract
Users of artificial intelligence (AI) can become overreliant on AI, negatively affecting the performance of human-AI teams. For a future in which radiologists use interpretive AI tools routinely in clinical practice, radiology education will need to evolve to provide radiologists with the skills to use AI appropriately and wisely. In this work, we examine how overreliance on AI may develop in radiology trainees and explore how this problem can be mitigated, including through the use of AI-augmented education. Radiology trainees will still need to develop the perceptual skills and mastery of knowledge fundamental to radiology to use AI safely. We propose a framework for radiology trainees to use AI tools with appropriate reliance, drawing on lessons from human-AI interactions research.
Affiliation(s)
- Matthew D Li
- Department of Radiology and Diagnostic Imaging, Faculty of Medicine & Dentistry, University of Alberta, Edmonton, Alberta, Canada.
- Brent P Little
- Mayo Clinic College of Medicine and Science, Department of Radiology, Division of Cardiothoracic Imaging, Mayo Clinic Florida, Florida; Committee Member, ACR Appropriateness Criteria Thoracic Imaging
45
Ghassemi M. Presentation matters for AI-generated clinical advice. Nat Hum Behav 2023; 7:1833-1835. [PMID: 37985904 DOI: 10.1038/s41562-023-01721-7]
Affiliation(s)
- Marzyeh Ghassemi
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Institute for Medical Engineering & Science, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Vector Institute, Toronto, Ontario, Canada.
46
Schlicker N, Langer M, Hirsch MC. [How trustworthy is artificial intelligence? A model for the conflict between objectivity and subjectivity]. Inn Med (Heidelb) 2023; 64:1051-1057. [PMID: 37737496 DOI: 10.1007/s00108-023-01602-1]
Abstract
For the integration of artificial intelligence (AI) systems into medical processes, it is crucial to address both the trustworthiness of these systems and the trust that physicians and patients place in them. Too much trust can result in physicians uncritically relying on this technology, while too little trust may result in physicians not taking advantage of the full potential of AI-based technology in making decisions. To strike a balance between these extremes, it is essential to correctly assess the trustworthiness of a system; only then is it possible to decide whether or not the system can be trusted. This article describes these relationships for the medical context. We show why trustworthiness and trust are important in the use of AI-based systems and how individuals can arrive at an accurate assessment of the trustworthiness of AI-based systems.
Affiliation(s)
- Nadine Schlicker
- Institut für Künstliche Intelligenz in der Medizin, Philipps-Universität Marburg, Baldingerstr., 35043, Marburg, Germany
- Markus Langer
- Fachbereich Psychologie, Arbeitseinheit Digitalisierung in psychologischen Handlungsfeldern, Philipps-Universität Marburg, Marburg, Germany
- Martin C Hirsch
- Institut für Künstliche Intelligenz in der Medizin, Philipps-Universität Marburg, Baldingerstr., 35043, Marburg, Germany
47
Vijayakumar S, Lee VV, Leong QY, Hong SJ, Blasiak A, Ho D. Physicians' Perspectives on AI in Clinical Decision Support Systems: Interview Study of the CURATE.AI Personalized Dose Optimization Platform. JMIR Hum Factors 2023; 10:e48476. [PMID: 37902825 PMCID: PMC10644191 DOI: 10.2196/48476]
Abstract
BACKGROUND Physicians play a key role in integrating new clinical technology into care practices through user feedback and growth propositions to the technology's developers. As stakeholders throughout the technology iteration process, physicians can provide nuanced insights into the workings of the technologies being explored. Understanding their perceptions can therefore be critical for clinical validation, implementation, and downstream adoption. Given the increasing prevalence of clinical decision support systems (CDSSs), there remains a need for an in-depth understanding of physicians' perceptions of and expectations for their downstream implementation. This paper explores physicians' perceptions of integrating CURATE.AI, a novel artificial intelligence (AI)-based, clinical-stage personalized dosing CDSS, into clinical practice. OBJECTIVE This study aims to understand physicians' perspectives on integrating CURATE.AI into clinical work and to gather insights on considerations for the implementation of AI-based CDSS tools. METHODS A total of 12 participants completed semistructured interviews examining their knowledge, experience, attitudes, perceived risks, and anticipated future course of the personalized combination therapy dosing platform CURATE.AI. Interviews were audio recorded, transcribed verbatim, and coded manually. The data were thematically analyzed. RESULTS Overall, 3 broad themes and 9 subthemes were identified through thematic analysis. The themes covered considerations that physicians perceived as significant across various stages of new technology development, including trial, clinical implementation, and mass adoption. CONCLUSIONS The study laid out the various ways physicians interpreted an AI-based personalized dosing CDSS, CURATE.AI, for their clinical practice. The research showed that physicians' expectations during the different stages of technology exploration can be nuanced and layered, with implementation expectations that are relevant for technology developers and researchers.
Affiliation(s)
- Smrithi Vijayakumar
- The N.1 Institute for Health, National University of Singapore, Singapore, Singapore
- V Vien Lee
- The N.1 Institute for Health, National University of Singapore, Singapore, Singapore
- Qiao Ying Leong
- The N.1 Institute for Health, National University of Singapore, Singapore, Singapore
- Soo Jung Hong
- Department of Communications and New Media, National University of Singapore, Singapore, Singapore
- Agata Blasiak
- The N.1 Institute for Health, National University of Singapore, Singapore, Singapore
- Department of Biomedical Engineering, National University of Singapore, Singapore, Singapore
- The Institute for Digital Medicine (WisDM), Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Department of Pharmacology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Dean Ho
- The N.1 Institute for Health, National University of Singapore, Singapore, Singapore
- Department of Biomedical Engineering, National University of Singapore, Singapore, Singapore
- The Institute for Digital Medicine (WisDM), Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Department of Pharmacology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
48
Joo H, Mathis MR, Tam M, James C, Han P, Mangrulkar RS, Friedman CP, Vydiswaran VGV. Applying AI and Guidelines to Assist Medical Students in Recognizing Patients With Heart Failure: Protocol for a Randomized Trial. JMIR Res Protoc 2023; 12:e49842. [PMID: 37874618 PMCID: PMC10630872 DOI: 10.2196/49842]
Abstract
BACKGROUND The integration of artificial intelligence (AI) into health care is transforming both clinical practice and medical education. AI-based systems aim to improve the efficacy of clinical tasks, enhancing diagnostic accuracy and tailoring treatment delivery. As AI becomes increasingly prevalent in health care, it is critical for health care providers to use these systems responsibly to mitigate bias, ensure effective outcomes, and provide safe clinical practice. In this study, the clinical task is the identification of heart failure (HF) prior to surgery, with the intention of enhancing clinical decision-making skills. HF is a common and severe disease, but detection remains challenging due to its subtle manifestation, often concurrent with other medical conditions, and the absence of a simple and effective diagnostic test. While advanced HF algorithms have been developed, the use of these AI-based systems to enhance clinical decision-making in medical education remains understudied. OBJECTIVE This research protocol demonstrates our study design, the systematic procedure for selecting surgical cases from electronic health records, and the interventions. The primary objective of this study is to measure the effectiveness of interventions aimed at improving HF recognition before surgery; the second objective is to evaluate the impact of inaccurate AI recommendations; and the third objective is to explore the relationship between the inclination to accept AI recommendations and their accuracy. METHODS Our study used a 3 × 2 factorial design (intervention type × order of pre-post sets) for this randomized trial with medical students. The student participants are asked to complete a 30-minute e-learning module that includes key information about the intervention and a 5-question quiz, and a 60-minute review of 20 surgical cases to determine the presence of HF.
To mitigate selection bias in the pre- and posttests, we adopted a feature-based systematic sampling procedure. From a pool of 703 expert-reviewed surgical cases, 20 were selected based on features such as case complexity, model performance, and positive and negative labels. This study comprises three interventions: (1) a direct AI-based recommendation with a predicted HF score, (2) an indirect AI-based recommendation gauged through the area under the curve metric, and (3) an HF guideline-based intervention. RESULTS As of July 2023, 62 of the enrolled medical students have completed their participation in this study, including a short quiz and the review of 20 surgical cases. Subject enrollment commenced in August 2022 and will end in December 2023, with the goal of recruiting 75 medical students in years 3 and 4 with clinical experience. CONCLUSIONS We demonstrated a study protocol for a randomized trial measuring the effectiveness of interventions using AI and HF guidelines among medical students to enhance HF recognition in preoperative care with electronic health record data. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID) DERR1-10.2196/49842.
Affiliation(s)
- Hyeon Joo
- Department of Learning Health Sciences, University of Michigan, Ann Arbor, MI, United States
- Michael R Mathis
- Department of Anesthesiology, University of Michigan, Ann Arbor, MI, United States
- Marty Tam
- Department of Internal Medicine, Cardiology, University of Michigan, Ann Arbor, MI, United States
- Cornelius James
- Department of Learning Health Sciences, University of Michigan, Ann Arbor, MI, United States
- Department of Pediatrics, University of Michigan, Ann Arbor, MI, United States
- Department of Internal Medicine, University of Michigan, Ann Arbor, MI, United States
- Peijin Han
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, United States
- Rajesh S Mangrulkar
- Department of Learning Health Sciences, University of Michigan, Ann Arbor, MI, United States
- Department of Internal Medicine, University of Michigan, Ann Arbor, MI, United States
- Charles P Friedman
- Department of Learning Health Sciences, University of Michigan, Ann Arbor, MI, United States
- School of Information, University of Michigan, Ann Arbor, MI, United States
- V G Vinod Vydiswaran
- Department of Learning Health Sciences, University of Michigan, Ann Arbor, MI, United States
- School of Information, University of Michigan, Ann Arbor, MI, United States
49
Vicente L, Matute H. Humans inherit artificial intelligence biases. Sci Rep 2023; 13:15737. [PMID: 37789032 PMCID: PMC10547752 DOI: 10.1038/s41598-023-42384-8]
Abstract
Artificial intelligence recommendations are sometimes erroneous and biased. In our research, we hypothesized that people who perform a (simulated) medical diagnostic task assisted by a biased AI system will reproduce the model's bias in their own decisions, even when they move to a context without AI support. In three experiments, participants completed a medical-themed classification task with or without the help of a biased AI system. The biased recommendations by the AI influenced participants' decisions. Moreover, when those participants, assisted by the AI, moved on to perform the task without assistance, they made the same errors as the AI had made during the previous phase. Thus, participants' responses mimicked AI bias even when the AI was no longer making suggestions. These results provide evidence of human inheritance of AI bias.
Affiliation(s)
- Lucía Vicente
- Department of Psychology, Deusto University, Avenida Universidades 24, 48007, Bilbao, Spain
- Helena Matute
- Department of Psychology, Deusto University, Avenida Universidades 24, 48007, Bilbao, Spain
50
Carboni C, Wehrens R, van der Veen R, de Bont A. Eye for an AI: More-than-seeing, fauxtomation, and the enactment of uncertain data in digital pathology. Soc Stud Sci 2023; 53:712-737. [PMID: 37154611 PMCID: PMC10543128 DOI: 10.1177/03063127231167589]
Abstract
Artificial intelligence (AI) tools are being developed to assist with increasingly complex diagnostic tasks in medicine. This produces epistemic disruption in diagnostic processes, even in the absence of AI itself, through the datafication and digitalization encouraged by the promissory discourses around AI. In this study of the digitization of an academic pathology department, we mobilize Barad's agential realist framework to examine these epistemic disruptions. Narratives and expectations around AI-assisted diagnostics, which are inextricable from material changes, enact specific types of organizational change, and produce epistemic objects that facilitate the emergence of some epistemic practices and subjects but hinder others. Agential realism allows us to simultaneously study epistemic, ethical, and ontological changes enacted through digitization efforts, while keeping a close eye on the attendant organizational changes. Based on ethnographic analysis of pathologists' changing work processes, we identify three different types of uncertainty produced by digitization: sensorial, intra-active, and fauxtomated uncertainty. Sensorial and intra-active uncertainty stem from the ontological otherness of digital objects, materialized in their affordances, and result in digital slides' partial illegibility. Fauxtomated uncertainty stems from the quasi-automated making of digital slides, which complicates the question of responsibility for epistemic objects and related knowledge by marginalizing the human.
Affiliation(s)
- Chiara Carboni
- Erasmus University Rotterdam, Rotterdam, The Netherlands
- Rik Wehrens
- Erasmus University Rotterdam, Rotterdam, The Netherlands