1
Kale AU, Hogg HDJ, Pearson R, Glocker B, Golder S, Coombe A, Waring J, Liu X, Moore DJ, Denniston AK. Detecting Algorithmic Errors and Patient Harms for AI-Enabled Medical Devices in Randomized Controlled Trials: Protocol for a Systematic Review. JMIR Res Protoc 2024; 13:e51614. [PMID: 38941147 DOI: 10.2196/51614]
Abstract
BACKGROUND Artificial intelligence (AI) medical devices have the potential to transform existing clinical workflows and ultimately improve patient outcomes. AI medical devices have shown potential for a range of clinical tasks such as diagnostics, prognostics, and therapeutic decision-making (eg, drug dosing). There is, however, an urgent need to ensure that these technologies remain safe for all populations. Recent literature demonstrates the need for rigorous performance error analysis to identify issues such as algorithmic encoding of spurious correlations (eg, with protected characteristics) or specific failure modes that may lead to patient harm. Guidelines for reporting on studies that evaluate AI medical devices require the mention of performance error analysis; however, there is still a lack of understanding around how performance errors should be analyzed in clinical studies and what harms authors should aim to detect and report. OBJECTIVE This systematic review will assess the frequency and severity of AI errors and adverse events (AEs) in randomized controlled trials (RCTs) investigating AI medical devices as interventions in clinical settings. The review will also explore how performance errors are analyzed, including whether the analysis includes the investigation of subgroup-level outcomes. METHODS This systematic review will identify and select RCTs assessing AI medical devices. Search strategies will be deployed in MEDLINE (Ovid), Embase (Ovid), Cochrane CENTRAL, and clinical trial registries to identify relevant papers. RCTs identified in bibliographic databases will be cross-referenced with clinical trial registries. The primary outcomes of interest are the frequency and severity of AI errors, patient harms, and reported AEs. Quality assessment of RCTs will be based on version 2 of the Cochrane risk-of-bias tool (RoB 2).
Data analysis will include a comparison of error rates and patient harms between study arms, and a meta-analysis of the rates of patient harm in control versus intervention arms will be conducted if appropriate. RESULTS The project was registered on PROSPERO in February 2023. Preliminary searches have been completed and the search strategy has been designed in consultation with an information specialist and a methodologist. Title and abstract screening started in September 2023. Full-text screening is ongoing, and data collection and analysis began in April 2024. CONCLUSIONS Evaluations of AI medical devices have shown promising results; however, reporting of studies has been variable. Detection, analysis, and reporting of performance errors and patient harms are vital to robustly assess the safety of AI medical devices in RCTs. Scoping searches have illustrated that the reporting of harms is variable, often with no mention of AEs. The findings of this systematic review will identify the frequency and severity of AI performance errors and patient harms and generate insights into how errors should be analyzed to account for both overall and subgroup performance. TRIAL REGISTRATION PROSPERO CRD42023387747; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=387747. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID) PRR1-10.2196/51614.
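The planned comparison of harm rates between control and intervention arms can be illustrated with a standard fixed-effect (inverse-variance) pooling of risk ratios. This is a generic sketch of the technique, not the review's prespecified analysis, and the trial counts below are hypothetical:

```python
import math

def pooled_risk_ratio(trials):
    """Fixed-effect (inverse-variance) pooled risk ratio.

    trials: list of (events_intervention, n_intervention,
                     events_control, n_control) tuples.
    Returns (pooled RR, 95% CI lower bound, 95% CI upper bound).
    """
    weights, log_rrs = [], []
    for a, n1, c, n2 in trials:
        log_rr = math.log((a / n1) / (c / n2))
        # Standard error of log RR for a 2x2 table (no zero cells).
        se = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n2)
        weights.append(1 / se ** 2)
        log_rrs.append(log_rr)
    pooled = sum(w * lr for w, lr in zip(weights, log_rrs)) / sum(weights)
    se_pooled = math.sqrt(1 / sum(weights))
    return (math.exp(pooled),
            math.exp(pooled - 1.96 * se_pooled),
            math.exp(pooled + 1.96 * se_pooled))

# Hypothetical harm counts from three trials:
# (harms_AI_arm, n_AI_arm, harms_control_arm, n_control_arm)
trials = [(8, 200, 12, 200), (5, 150, 9, 150), (11, 300, 10, 300)]
rr, lo, hi = pooled_risk_ratio(trials)
print(f"pooled RR {rr:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

A pooled RR below 1 with a confidence interval excluding 1 would suggest fewer harms in the intervention arms; in practice heterogeneity checks and random-effects models would also be considered.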
Affiliation(s)
- Aditya U Kale
- Institute of Inflammation and Ageing, University of Birmingham, Birmingham, United Kingdom
- University Hospitals Birmingham NHS Foundation Trust, Birmingham, United Kingdom
- NIHR Birmingham Biomedical Research Centre, Birmingham, United Kingdom
- NIHR Incubator for AI and Digital Health Research, Birmingham, United Kingdom
- Henry David Jeffry Hogg
- Population Health Science Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, United Kingdom
- Russell Pearson
- Medicines and Healthcare Products Regulatory Agency, London, United Kingdom
- Ben Glocker
- Kheiron Medical Technologies, London, United Kingdom
- Department of Computing, Imperial College London, London, United Kingdom
- Su Golder
- Department of Health Sciences, University of York, York, United Kingdom
- April Coombe
- Institute of Applied Health Research, University of Birmingham, Birmingham, United Kingdom
- Justin Waring
- Health Services Management Centre, University of Birmingham, Birmingham, United Kingdom
- Xiaoxuan Liu
- Institute of Inflammation and Ageing, University of Birmingham, Birmingham, United Kingdom
- University Hospitals Birmingham NHS Foundation Trust, Birmingham, United Kingdom
- NIHR Birmingham Biomedical Research Centre, Birmingham, United Kingdom
- NIHR Incubator for AI and Digital Health Research, Birmingham, United Kingdom
- David J Moore
- Institute of Applied Health Research, University of Birmingham, Birmingham, United Kingdom
- Alastair K Denniston
- Institute of Inflammation and Ageing, University of Birmingham, Birmingham, United Kingdom
- University Hospitals Birmingham NHS Foundation Trust, Birmingham, United Kingdom
- NIHR Birmingham Biomedical Research Centre, Birmingham, United Kingdom
- NIHR Incubator for AI and Digital Health Research, Birmingham, United Kingdom
2
Faust L, Wilson P, Asai S, Fu S, Liu H, Ruan X, Storlie C. Considerations for Quality Control Monitoring of Machine Learning Models in Clinical Practice. JMIR Med Inform 2024; 12:e50437. [PMID: 38941140 DOI: 10.2196/50437]
Abstract
Integrating machine learning (ML) models into clinical practice presents the challenge of maintaining their efficacy over time. While existing literature offers valuable strategies for detecting declining model performance, there is a need to document the broader challenges and solutions associated with the real-world development and integration of model monitoring solutions. This work details the development and use of a platform for monitoring the performance of a production-level ML model operating at Mayo Clinic. In this paper, we aimed to provide a series of considerations and guidelines necessary for integrating such a platform into a team's technical infrastructure and workflow. We have documented our experiences with this integration process, discussed the broader challenges encountered with real-world implementation and maintenance, and included the source code for the platform. Our monitoring platform was built as an R Shiny application, developed and implemented over the course of 6 months. The platform has been used and maintained for 2 years and was still in use as of July 2023. The considerations necessary for the implementation of the monitoring platform center around 4 pillars: feasibility (what resources can be used for platform development?); design (through what statistics or models will the model be monitored, and how will these results be efficiently displayed to the end user?); implementation (how will this platform be built, and where will it exist within the IT ecosystem?); and policy (based on monitoring feedback, when and what actions will be taken to fix problems, and how will these problems be communicated to clinical staff?). While much of the literature surrounding ML performance monitoring emphasizes methodological approaches for capturing changes in performance, a battery of other challenges and considerations must be addressed for successful real-world implementation.
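The "policy" pillar — mapping monitoring feedback to concrete actions — can be sketched minimally. This is not the authors' R Shiny platform; the thresholds, window construction, and action names are illustrative assumptions:

```python
def monitoring_action(baseline_error, recent_errors, warn=0.02, alarm=0.05):
    """Map a monitored statistic to a policy action.

    baseline_error: error rate observed at validation time.
    recent_errors: list of 0/1 error indicators from the latest window.
    warn/alarm: illustrative drift tolerances, not taken from the paper.
    """
    recent_rate = sum(recent_errors) / len(recent_errors)
    drift = recent_rate - baseline_error
    if drift >= alarm:
        return "alarm"   # eg, pause the model and notify clinical staff
    if drift >= warn:
        return "warn"    # eg, schedule a recalibration review
    return "ok"

# 13 errors in a 100-prediction window vs a 10% baseline → 3-point drift.
print(monitoring_action(0.10, [1] * 13 + [0] * 87))  # prints "warn"
```

The point of encoding the policy explicitly is that the response to degradation is decided before deployment, rather than improvised when a dashboard turns red.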
Affiliation(s)
- Louis Faust
- Robert D and Patricia E Kern Center for the Science of Health Care Delivery, Mayo Clinic, Rochester, MN, United States
- Patrick Wilson
- Robert D and Patricia E Kern Center for the Science of Health Care Delivery, Mayo Clinic, Rochester, MN, United States
- Shusaku Asai
- Robert D and Patricia E Kern Center for the Science of Health Care Delivery, Mayo Clinic, Rochester, MN, United States
- Sunyang Fu
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, United States
- Hongfang Liu
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, United States
- Xiaoyang Ruan
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, United States
- Curt Storlie
- Robert D and Patricia E Kern Center for the Science of Health Care Delivery, Mayo Clinic, Rochester, MN, United States
3
Dong T, Sinha S, Zhai B, Fudulu D, Chan J, Narayan P, Judge A, Caputo M, Dimagli A, Benedetto U, Angelini GD. Performance Drift in Machine Learning Models for Cardiac Surgery Risk Prediction: Retrospective Analysis. JMIRx Med 2024; 5:e45973. [PMID: 38889069 PMCID: PMC11217160 DOI: 10.2196/45973]
Abstract
Background The Society of Thoracic Surgeons and European System for Cardiac Operative Risk Evaluation (EuroSCORE) II risk scores are the most commonly used risk prediction models for in-hospital mortality after adult cardiac surgery. However, they are prone to miscalibration over time and poor generalization across data sets; thus, their use remains controversial. Despite increased interest, a gap in understanding the effect of data set drift on the performance of machine learning (ML) over time remains a barrier to its wider use in clinical practice. Data set drift occurs when an ML system underperforms because of a mismatch between the data it was developed from and the data on which it is deployed. Objective In this study, we analyzed the extent of performance drift using models built on a large UK cardiac surgery database. The objectives were to (1) rank and assess the extent of performance drift in cardiac surgery risk ML models over time and (2) investigate any potential influence of data set drift and variable importance drift on performance drift. Methods We conducted a retrospective analysis of prospectively, routinely gathered data on adult patients undergoing cardiac surgery in the United Kingdom between 2012 and 2019. We temporally split the data 70:30 into a training and validation set and a holdout set. Five novel ML mortality prediction models were developed and assessed, along with EuroSCORE II, for relationships between and within variable importance drift, performance drift, and actual data set drift. Performance was assessed using a consensus metric. Results A total of 227,087 adults underwent cardiac surgery during the study period, with a mortality rate of 2.76% (n=6258). There was strong evidence of a decrease in overall performance across all models (P<.0001). 
Extreme gradient boosting (clinical effectiveness metric [CEM] 0.728, 95% CI 0.728-0.729) and random forest (CEM 0.727, 95% CI 0.727-0.728) were the overall best-performing models, both temporally and nontemporally. EuroSCORE II performed the worst across all comparisons. Sharp changes in variable importance and data set drift from October to December 2017, from June to July 2018, and from December 2018 to February 2019 mirrored the effects of performance decrease across models. Conclusions All models show a decrease in at least 3 of the 5 individual metrics. CEM and variable importance drift detection demonstrate the limitation of logistic regression methods used for cardiac surgery risk prediction and the effects of data set drift. Future work will be required to determine the interplay between ML models and whether ensemble models could improve on their respective performance advantages.
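Temporal performance drift of the kind measured in this study can be sketched by scoring a model period by period and flagging windows whose discrimination falls below the earliest period. The AUC here is the standard rank-based (Mann-Whitney) estimate, not the study's consensus clinical effectiveness metric, and the scores, labels, and tolerance are hypothetical:

```python
def auc(scores, labels):
    """AUC via the rank-sum (Mann-Whitney) formula, stdlib only."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # Count score comparisons won by positives; ties count half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def performance_drift(periods, tolerance=0.05):
    """Flag period indices whose AUC falls more than `tolerance`
    below the first (baseline) period."""
    baseline = auc(*periods[0])
    return [i for i, (scores, labels) in enumerate(periods)
            if baseline - auc(scores, labels) > tolerance]

period_0 = ([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])  # validation-era window
period_1 = ([0.6, 0.4, 0.5, 0.3], [1, 1, 0, 0])  # later deployment window
print(performance_drift([period_0, period_1]))  # [1]
```

On real data each period would hold thousands of cases, and a single threshold on one metric would be complemented by calibration and subgroup checks, as the study's multi-metric consensus approach implies.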
Affiliation(s)
- Tim Dong
- Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol, United Kingdom
- Shubhra Sinha
- Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol, United Kingdom
- Ben Zhai
- School of Computing Science, Northumbria University, Newcastle upon Tyne, United Kingdom
- Daniel Fudulu
- Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol, United Kingdom
- Jeremy Chan
- Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol, United Kingdom
- Pradeep Narayan
- Department of Cardiac Surgery, Rabindranath Tagore International Institute of Cardiac Sciences, West Bengal, India
- Andy Judge
- Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol, United Kingdom
- Massimo Caputo
- Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol, United Kingdom
- Arnaldo Dimagli
- Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol, United Kingdom
- Umberto Benedetto
- Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol, United Kingdom
- Gianni D Angelini
- Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol, United Kingdom
4
Chung A, Opoku-Pare GA, Tibble H. Cause of death coding in asthma. BMC Med Res Methodol 2024; 24:129. [PMID: 38840045 PMCID: PMC11151540 DOI: 10.1186/s12874-024-02238-x]
Abstract
BACKGROUND While clinical coding is intended to be an objective and standardized practice, in reality it falls short of this ideal. The clinical and bureaucratic practices that lie between the event of death and a case being entered into a research dataset are important context for analysing and interpreting these data. Variation in practice can influence the accuracy of the final coded record at two different stages: the completion of the death certificate, and the International Classification of Diseases (Version 10; ICD-10) coding of that certificate. METHODS This study investigated 91,022 deaths recorded in the Scottish Asthma Learning Healthcare System dataset between 2000 and 2017. Asthma-related deaths were identified by the presence of ICD-10 code J45 or J46 in any position. These codes were categorized as relating either specifically to asthma attacks (status asthmaticus; J46) or generally to an asthma diagnosis (J45). RESULTS We found that one in every 200 deaths in this dataset was coded as asthma related. Less than 1% of asthma-related mortality records used both the J45 and J46 ICD-10 codes as causes. Infection (predominantly pneumonia) was more commonly reported as a contributing cause of death when J45 was the primary coded cause, compared to J46, which specifically denotes asthma attacks. CONCLUSION Further inspection of patient history can be essential to validate deaths recorded as caused by asthma, and to identify potentially mis-recorded non-asthma deaths, particularly in those with complex comorbidities.
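The study's case definition — any J45 or J46 code in any position on the record — translates directly into a prefix match over a record's cause-of-death codes. The records below are hypothetical illustrations, not study data:

```python
def asthma_related(cause_codes):
    """True if any ICD-10 cause-of-death code, in any position, denotes
    asthma (J45, general diagnosis) or status asthmaticus (J46)."""
    # Prefix match handles both dotted (J45.9) and undotted (J459) forms.
    return any(code.upper().startswith(("J45", "J46")) for code in cause_codes)

records = [
    {"id": 1, "codes": ["I21.9", "J45.9"]},  # asthma as a contributing cause
    {"id": 2, "codes": ["J46"]},             # status asthmaticus as sole cause
    {"id": 3, "codes": ["C34.1"]},           # unrelated cause
]
asthma_deaths = [r["id"] for r in records if asthma_related(r["codes"])]
print(asthma_deaths)  # [1, 2]
```

Matching in any position, rather than only the underlying cause, is what lets the analysis distinguish asthma as a primary versus contributing cause.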
Affiliation(s)
- Holly Tibble
- Usher Institute, University of Edinburgh, Edinburgh, Scotland
- Asthma UK Centre for Applied Research, Edinburgh, Scotland
5
Petrella RJ. The AI Future of Emergency Medicine. Ann Emerg Med 2024:S0196-0644(24)00043-X. [PMID: 38795081 DOI: 10.1016/j.annemergmed.2024.01.031]
Abstract
In the coming years, artificial intelligence (AI) and machine learning will likely give rise to profound changes in the field of emergency medicine, and medicine more broadly. This article discusses these anticipated changes in terms of 3 overlapping yet distinct stages of AI development. It reviews some fundamental concepts in AI and explores their relation to clinical practice, with a focus on emergency medicine. In addition, it describes some of the applications of AI in disease diagnosis, prognosis, and treatment, as well as some of the practical issues that they raise, the barriers to their implementation, and some of the legal and regulatory challenges they create.
Affiliation(s)
- Robert J Petrella
- Emergency Departments, CharterCARE Health Partners, Providence and North Providence, RI; Emergency Department, Boston VA Medical Center, Boston, MA; Emergency Departments, Steward Health Care System, Boston and Methuen, MA; Harvard Medical School, Boston, MA; Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA; Department of Medicine, Brigham and Women's Hospital, Boston, MA
6
Cherblanc J, Gaboury S, Maître J, Côté I, Cadell S, Bergeron-Leclerc C. Predicting levels of prolonged grief disorder symptoms during the COVID-19 pandemic: An integrated approach of classical data exploration, predictive machine learning, and explainable AI. J Affect Disord 2024; 351:746-754. [PMID: 38290589 DOI: 10.1016/j.jad.2024.01.236]
Abstract
BACKGROUND Prior studies on Prolonged Grief Disorder (PGD) primarily employed classical approaches to link bereaved individuals' characteristics with PGD symptom levels. This study utilized machine learning to identify key factors influencing PGD symptoms during the COVID-19 pandemic. METHODS We analyzed data from 479 participants in an online survey, employing classical data exploration, predictive machine learning, and SHapley Additive exPlanations (SHAP) to determine, from 19 variables, the key factors influencing PGD symptoms measured with the Traumatic Grief Inventory - Self Report (TGI-SR), comparing five predictive models. RESULTS The classical approach identified eight variables associated with possible PGD (TGI-SR score ≥ 59): unexpected cause of death, living alone, seeking professional support, taking anxiety and/or depression medications, using more grief services (telephone or online supports), using more confrontation-oriented coping strategies, and higher levels of depression and anxiety. Among the machine learning techniques, the CatBoost algorithm provided the best predictive model of the TGI-SR score (r2 = 0.6479). The three variables most strongly influencing the level of PGD symptoms were anxiety and the levels of avoidance and confrontation coping strategies used. CONCLUSIONS This pioneering approach within the field of grief research enabled us to leverage the extensive dataset collected during the pandemic, facilitating a deeper comprehension of the predominant factors influencing the grieving process for individuals who experienced loss during this period. LIMITATIONS This study acknowledges self-selection bias and limited sample diversity, and suggests that further research is needed to fully understand the predictors of PGD symptoms.
7
Ghanta SN, Gautam N, Mehta JL, Al'Aref SJ. Machine Learning for Predicting Intubations in Heart Failure Patients: the Challenge of the Right Approach. Cardiovasc Drugs Ther 2024; 38:211-214. [PMID: 36593325 PMCID: PMC9807425 DOI: 10.1007/s10557-022-07423-y]
Affiliation(s)
- Sai Nikhila Ghanta
- Department of Internal Medicine, University of Arkansas for Medical Sciences, Little Rock, AR, USA
- Nitesh Gautam
- Department of Internal Medicine, University of Arkansas for Medical Sciences, Little Rock, AR, USA
- Jawahar L. Mehta
- Department of Medicine, Division of Cardiology, University of Arkansas for Medical Sciences, 4301 W. Markham St, Little Rock, AR, USA
- Subhi J. Al'Aref
- Department of Medicine, Division of Cardiology, University of Arkansas for Medical Sciences, 4301 W. Markham St, Little Rock, AR, USA
8
Kore A, Abbasi Bavil E, Subasri V, Abdalla M, Fine B, Dolatabadi E, Abdalla M. Empirical data drift detection experiments on real-world medical imaging data. Nat Commun 2024; 15:1887. [PMID: 38424096 PMCID: PMC10904813 DOI: 10.1038/s41467-024-46142-w]
Abstract
While it is common to monitor deployed clinical artificial intelligence (AI) models for performance degradation, it is less common for the input data to be monitored for data drift, that is, systemic changes to input distributions. However, when real-time evaluation is not practical (eg, due to labeling costs) or when gold labels are automatically generated, we argue that tracking data drift becomes a vital addition for AI deployments. In this work, we perform empirical experiments on real-world medical imaging data to evaluate the ability of three data drift detection methods to detect data drift caused (a) naturally (the emergence of COVID-19 in X-rays) and (b) synthetically. We find that monitoring performance alone is not a good proxy for detecting data drift and that drift detection depends heavily on sample size and patient features. Our work discusses the need for and utility of data drift detection in various scenarios and highlights gaps in knowledge for the practical application of existing methods.
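One common building block for input-drift detection — not necessarily one of the three methods this paper evaluates — is the two-sample Kolmogorov-Smirnov test on a scalar feature extracted from each image (eg, mean intensity). A stdlib-only sketch, with synthetic data standing in for real feature values:

```python
import bisect
import math

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum gap
    between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    def ecdf(xs, v):
        return bisect.bisect_right(xs, v) / len(xs)
    return max(abs(ecdf(a, v) - ecdf(b, v)) for v in set(a) | set(b))

def drifted(reference, current, alpha=0.05):
    """Flag drift when the KS statistic exceeds the large-sample
    critical value c(alpha) * sqrt((n + m) / (n * m))."""
    n, m = len(reference), len(current)
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))
    return ks_statistic(reference, current) > c_alpha * math.sqrt((n + m) / (n * m))

reference = [i / 100 for i in range(100)]      # stand-in for a deployment-time feature
shifted = [0.5 + i / 200 for i in range(100)]  # the same feature after a shift
print(drifted(reference, list(reference)), drifted(reference, shifted))  # False True
```

As the abstract notes, sensitivity of such tests depends strongly on sample size; with small windows the critical value is large and modest shifts go undetected.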
Affiliation(s)
- Ali Kore
- Vector Institute, Toronto, Canada
- Vallijah Subasri
- Peter Munk Cardiac Center, University Health Network, Toronto, ON, Canada
- Moustafa Abdalla
- Department of Surgery, Harvard Medical School, Massachusetts General Hospital, Boston, USA
- Benjamin Fine
- Institute for Better Health, Trillium Health Partners, Mississauga, Canada
- Department of Medical Imaging, University of Toronto, Toronto, Canada
- Elham Dolatabadi
- Vector Institute, Toronto, Canada
- School of Health Policy and Management, Faculty of Health, York University, Toronto, Canada
- Mohamed Abdalla
- Institute for Better Health, Trillium Health Partners, Mississauga, Canada
9
Hashimoto DA, Sambasastry SK, Singh V, Kurada S, Altieri M, Yoshida T, Madani A, Jogan M. A foundation for evaluating the surgical artificial intelligence literature. Eur J Surg Oncol 2024:108014. [PMID: 38360498 DOI: 10.1016/j.ejso.2024.108014]
Abstract
With increasing growth in applications of artificial intelligence (AI) in surgery, it has become essential for surgeons to gain a foundation of knowledge to critically appraise the scientific literature, commercial claims regarding products, and regulatory and legal frameworks that govern the development and use of AI. This guide offers surgeons a framework with which to evaluate manuscripts that incorporate the use of AI. It provides a glossary of common terms, an overview of prerequisite knowledge to maximize understanding of methodology, and recommendations on how to carefully consider each element of a manuscript to assess the quality of the data on which an algorithm was trained, the appropriateness of the methodological approach, the potential for reproducibility of the experiment, and the applicability to surgical practice, including considerations on generalizability and scalability.
Affiliation(s)
- Daniel A Hashimoto
- Penn Computer Assisted Surgery and Outcomes Laboratory, Department of Surgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, USA; Global Surgical AI Collaborative, Toronto, ON, Canada
- Sai Koushik Sambasastry
- Penn Computer Assisted Surgery and Outcomes Laboratory, Department of Surgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, USA
- Vivek Singh
- Penn Computer Assisted Surgery and Outcomes Laboratory, Department of Surgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Sruthi Kurada
- Penn Computer Assisted Surgery and Outcomes Laboratory, Department of Surgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, USA
- Maria Altieri
- Penn Computer Assisted Surgery and Outcomes Laboratory, Department of Surgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Global Surgical AI Collaborative, Toronto, ON, Canada
- Takuto Yoshida
- Surgical AI Research Academy, Department of Surgery, University Health Network, Toronto, ON, Canada
- Amin Madani
- Global Surgical AI Collaborative, Toronto, ON, Canada; Surgical AI Research Academy, Department of Surgery, University Health Network, Toronto, ON, Canada
- Matjaz Jogan
- Penn Computer Assisted Surgery and Outcomes Laboratory, Department of Surgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
10
Conners KM, Avery CL, Syed FF. Advancing Cardiovascular Risk Assessment with Artificial Intelligence: Opportunities and Implications in North Carolina. N C Med J 2024; 85:10.18043/001c.91424. [PMID: 38938760 PMCID: PMC11208038 DOI: 10.18043/001c.91424]
Abstract
Cardiovascular disease mortality is increasing in North Carolina with persistent inequality by race, income, and location. Artificial intelligence (AI) can repurpose the widely available electrocardiogram (ECG) for enhanced assessment of cardiac dysfunction. By identifying accelerated cardiac aging from the ECG, AI offers novel insights into risk assessment and prevention.
Affiliation(s)
- Katherine M Conners
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Christy L Avery
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Faisal F Syed
- Division of Cardiology, University of North Carolina School of Medicine, Chapel Hill, North Carolina
11
Hekman DJ, Barton HJ, Maru AP, Wills G, Cochran AL, Fritsch C, Wiegmann DA, Liao F, Patterson BW. Dashboarding to Monitor Machine-Learning-Based Clinical Decision Support Interventions. Appl Clin Inform 2024; 15:164-169. [PMID: 38029792 PMCID: PMC10901643 DOI: 10.1055/a-2219-5175]
Abstract
BACKGROUND Existing monitoring of machine-learning-based clinical decision support (ML-CDS) is focused predominantly on the ML outputs and accuracy thereof. Improving patient care requires not only accurate algorithms but also systems of care that enable the output of these algorithms to drive specific actions by care teams, necessitating expanding their monitoring. OBJECTIVES In this case report, we describe the creation of a dashboard that allows the intervention development team and operational stakeholders to govern and identify potential issues that may require corrective action by bridging the monitoring gap between model outputs and patient outcomes. METHODS We used an iterative development process to build a dashboard to monitor the performance of our intervention in the broader context of the care system. RESULTS Our investigation of best practices elsewhere, iterative design, and expert consultation led us to anchor our dashboard on alluvial charts and control charts. Both the development process and the dashboard itself illuminated areas to improve the broader intervention. CONCLUSION We propose that monitoring ML-CDS algorithms with regular dashboards that allow both a context-level view of the system and a drilled down view of specific components is a critical part of implementing these algorithms to ensure that these tools function appropriately within the broader care system.
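The control charts this dashboard is anchored on can be illustrated with Shewhart 3-sigma limits for a monitored proportion. The monitored statistic here (weekly fraction of alerts acknowledged by the care team) and all numbers are hypothetical, chosen only to show the mechanics:

```python
import math

def p_chart_limits(p_bar, n):
    """3-sigma Shewhart limits for a proportion monitored in
    fixed-size samples of n observations."""
    sigma = math.sqrt(p_bar * (1 - p_bar) / n)
    return max(0.0, p_bar - 3 * sigma), min(1.0, p_bar + 3 * sigma)

def out_of_control(weekly_props, p_bar, n):
    """Indices of weeks whose proportion falls outside the limits."""
    lcl, ucl = p_chart_limits(p_bar, n)
    return [i for i, p in enumerate(weekly_props) if not lcl <= p <= ucl]

# Hypothetical: historical acknowledgement rate 0.80, ~100 alerts/week.
weeks = [0.82, 0.79, 0.81, 0.62, 0.80]
print(out_of_control(weeks, p_bar=0.80, n=100))  # [3]
```

A point outside the limits (week index 3 above) signals special-cause variation in the care process around the model, not necessarily in the model's own accuracy, which is exactly the gap the dashboard is meant to cover.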
Affiliation(s)
- Daniel J. Hekman
- Berbee-Walsh Department of Emergency Medicine, University of Wisconsin-Madison, School of Medicine and Public Health, Madison, Wisconsin, United States
- Hanna J. Barton
- Berbee-Walsh Department of Emergency Medicine, University of Wisconsin-Madison, School of Medicine and Public Health, Madison, Wisconsin, United States
- Apoorva P. Maru
- Berbee-Walsh Department of Emergency Medicine, University of Wisconsin-Madison, School of Medicine and Public Health, Madison, Wisconsin, United States
- Graham Wills
- Department of Applied Data Science, UWHealth Hospitals and Clinics, Madison, Wisconsin, United States
- Amy L. Cochran
- Department of Population Health, University of Wisconsin-Madison, School of Medicine and Public Health, Madison, Wisconsin, United States
- Corey Fritsch
- Department of Applied Data Science, UWHealth Hospitals and Clinics, Madison, Wisconsin, United States
- Douglas A. Wiegmann
- Department of Industrial and Systems Engineering, University of Wisconsin-Madison, Madison, Wisconsin, United States
- Frank Liao
- Department of Applied Data Science, UWHealth Hospitals and Clinics, Madison, Wisconsin, United States
- Brian W. Patterson
- Berbee-Walsh Department of Emergency Medicine, University of Wisconsin-Madison, School of Medicine and Public Health, Madison, Wisconsin, United States
- Department of Population Health, University of Wisconsin-Madison, School of Medicine and Public Health, Madison, Wisconsin, United States
- Department of Industrial and Systems Engineering, University of Wisconsin-Madison, Madison, Wisconsin, United States
12
van Velzen M, de Graaf-Waar HI, Ubert T, van der Willigen RF, Muilwijk L, Schmitt MA, Scheper MC, van Meeteren NLU. 21st century (clinical) decision support in nursing and allied healthcare. Developing a learning health system: a reasoned design of a theoretical framework. BMC Med Inform Decis Mak 2023; 23:279. [PMID: 38053104 PMCID: PMC10699040 DOI: 10.1186/s12911-023-02372-4]
Abstract
In this paper, we present a framework for developing a Learning Health System (LHS) that provides the foundation for a computerized clinical decision support system for allied healthcare and nursing professionals. LHSs are well suited to transforming healthcare systems through a mission-oriented approach and are being adopted by an increasing number of countries. Our theoretical framework provides a blueprint for organizing such a transformation with the help of evidence-based, state-of-the-art methodologies and techniques, with the aim of optimizing personalized health and healthcare. Learning via health information technologies in an LHS enables users to learn both individually and collectively, independently of their location. These developments demand healthcare innovations that go beyond a disease-focused orientation, since clinical decision-making in allied healthcare and nursing is based mainly on aspects of individuals' functioning, wellbeing, and (dis)abilities. Developing LHSs depends heavily on intertwined social and technological innovation, research, and development. A crucial factor may be the transformation of the Internet of Things into the Internet of FAIR data & services. However, up to 80% of Electronic Health Record (EHR) data is unstructured, including free-text narratives, and is stored in various inaccessible data warehouses; enabling the use of these data as a driver for learning is therefore challenged by limited interoperability and reusability. To address technical needs, key enabling technologies can convert relevant health data into machine-actionable data and support the development of algorithms for computerized decision support. To enable these data conversions, existing classification and terminology systems serve as definition providers for natural language processing through (un)supervised learning.
To facilitate clinical reasoning and personalized healthcare using LHSs, the development of personomics and functionomics is useful in allied healthcare and nursing. These omics will be developed via text and data mining, focusing on the relationships between social, psychological, cultural, behavioral, and economic determinants and human functioning. Furthermore, multiparty collaboration is crucial to developing LHSs, and human-machine interaction studies are required to develop a functional design and prototype. During development, validation, and maintenance of an LHS, continuous attention to challenges such as data drift and ethical, technical, and practical implementation difficulties is required.
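The data-conversion step the authors describe, mapping free-text narratives onto an existing terminology system, can be illustrated with a deliberately simple sketch. The phrases and codes below are invented for illustration; a real system would use a standard terminology such as SNOMED CT or ICF and a trained NLP model rather than plain string matching.

```python
import re

# Hypothetical mini-terminology: maps free-text phrases from nursing notes
# to illustrative concept codes (these codes are invented, not real ones).
TERMINOLOGY = {
    "shortness of breath": "FN-101",
    "difficulty walking": "FN-204",
    "low mood": "FN-330",
}

def extract_concepts(note: str) -> list[str]:
    """Return the terminology codes whose phrases occur in the note."""
    text = re.sub(r"\s+", " ", note.lower())
    return [code for phrase, code in TERMINOLOGY.items() if phrase in text]

codes = extract_concepts("Patient reports difficulty walking and low mood.")
# codes now holds the machine-actionable concepts found in the narrative.
```

Once narratives are reduced to codes like these, they become structured, reusable inputs for the decision-support algorithms discussed above.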
Affiliation(s)
- Mark van Velzen
  - Data Supported Healthcare: Data-Science unit, Research Center Innovations in care, Rotterdam University of Applied Sciences, Rotterdam, the Netherlands
  - Department of Anesthesiology, Erasmus Medical Center, Rotterdam, the Netherlands
- Helen I de Graaf-Waar
  - Data Supported Healthcare: Data-Science unit, Research Center Innovations in care, Rotterdam University of Applied Sciences, Rotterdam, the Netherlands
  - Department of Anesthesiology, Erasmus Medical Center, Rotterdam, the Netherlands
- Tanja Ubert
  - Institute for Communication, media and information Technology, Rotterdam University of Applied Sciences, Rotterdam, the Netherlands
- Robert F van der Willigen
  - Institute for Communication, media and information Technology, Rotterdam University of Applied Sciences, Rotterdam, the Netherlands
- Lotte Muilwijk
  - Data Supported Healthcare: Data-Science unit, Research Center Innovations in care, Rotterdam University of Applied Sciences, Rotterdam, the Netherlands
  - Institute for Communication, media and information Technology, Rotterdam University of Applied Sciences, Rotterdam, the Netherlands
- Maarten A Schmitt
  - Data Supported Healthcare: Data-Science unit, Research Center Innovations in care, Rotterdam University of Applied Sciences, Rotterdam, the Netherlands
- Mark C Scheper
  - Data Supported Healthcare: Data-Science unit, Research Center Innovations in care, Rotterdam University of Applied Sciences, Rotterdam, the Netherlands
  - Department of Anesthesiology, Erasmus Medical Center, Rotterdam, the Netherlands
  - Allied Health professions, faculty of medicine and science, Macquarrie University, Sydney, Australia
- Nico L U van Meeteren
  - Department of Anesthesiology, Erasmus Medical Center, Rotterdam, the Netherlands
  - Top Sector Life Sciences and Health (Health~Holland), The Hague, the Netherlands

13
Sahiner B, Chen W, Samala RK, Petrick N. Data drift in medical machine learning: implications and potential remedies. Br J Radiol 2023; 96:20220878. [PMID: 36971405] [PMCID: PMC10546450] [DOI: 10.1259/bjr.20220878]
Abstract
Data drift refers to differences between the data used to train a machine learning (ML) model and the data the model encounters in real-world operation. Medical ML systems can be exposed to various forms of data drift, including differences between the data sampled for training and that seen in clinical operation, differences in medical practice or context of use between training and clinical deployment, and time-related changes in patient populations, disease patterns, and data acquisition, to name a few. In this article, we first review the terminology used in the ML literature related to data drift, define distinct types of drift, and discuss potential causes in detail within the context of medical applications, with an emphasis on medical imaging. We then review the recent literature on the effects of data drift on medical ML systems, which overwhelmingly shows that data drift can be a major cause of performance deterioration. We then discuss methods for monitoring data drift and mitigating its effects, with an emphasis on pre- and post-deployment techniques, including potential methods for drift detection and issues around model retraining once drift is detected. Based on our review, we find that data drift is a major concern in medical ML deployment and that more research is needed so that ML models can identify drift early, incorporate effective mitigation strategies, and resist performance decay.
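A minimal example of the kind of post-deployment drift monitoring this review discusses is the Population Stability Index (PSI), which compares a feature's training-time distribution against a recent production window. The sketch below uses synthetic data; the 0.2 alert threshold is a common rule of thumb, not a value taken from this article.

```python
import numpy as np

rng = np.random.default_rng(0)

# Reference window: a feature's distribution at model-training time.
train_feature = rng.normal(loc=50.0, scale=10.0, size=5000)
# Two production windows: one stable, one shifted upward by one SD.
stable_window = rng.normal(loc=50.0, scale=10.0, size=1000)
drifted_window = rng.normal(loc=60.0, scale=10.0, size=1000)

def population_stability_index(expected, actual, bins=10):
    """PSI between a reference sample and a monitoring window.
    Rule of thumb: < 0.1 stable, > 0.2 major distribution shift."""
    # Decile edges of the reference distribution (interior edges only).
    edges = np.quantile(expected, np.linspace(0.0, 1.0, bins + 1))[1:-1]
    e = np.bincount(np.digitize(expected, edges), minlength=bins) / len(expected)
    a = np.bincount(np.digitize(actual, edges), minlength=bins) / len(actual)
    e, a = np.clip(e, 1e-6, None), np.clip(a, 1e-6, None)
    return float(np.sum((a - e) * np.log(a / e)))

psi_stable = population_stability_index(train_feature, stable_window)
psi_drifted = population_stability_index(train_feature, drifted_window)
```

Running such a check per feature on a schedule is one lightweight way to trigger the retraining workflows discussed in the article.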
Affiliation(s)
- Berkman Sahiner
  - Center for Devices and Radiological Health, U.S. Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, MD 20993-0002
- Weijie Chen
  - Center for Devices and Radiological Health, U.S. Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, MD 20993-0002
- Ravi K. Samala
  - Center for Devices and Radiological Health, U.S. Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, MD 20993-0002
- Nicholas Petrick
  - Center for Devices and Radiological Health, U.S. Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, MD 20993-0002

14
Mallio CA, Radbruch A, Deike-Hofmann K, van der Molen AJ, Dekkers IA, Zaharchuk G, Parizel PM, Beomonte Zobel B, Quattrocchi CC. Artificial Intelligence to Reduce or Eliminate the Need for Gadolinium-Based Contrast Agents in Brain and Cardiac MRI: A Literature Review. Invest Radiol 2023; 58:746-753. [PMID: 37126454] [DOI: 10.1097/rli.0000000000000983]
Abstract
Brain and cardiac MRI are fundamental noninvasive imaging tools that provide important clinical information and can be performed with or without gadolinium-based contrast agents (GBCAs), depending on the clinical indication. Whether it is feasible to extract the same information as standard gadolinium-enhanced MRI while injecting less GBCA, or none at all, is currently a topic of debate. Artificial intelligence (AI) is a major source of innovation in medical imaging and has been explored as a method to synthesize virtual contrast MR images, potentially yielding similar diagnostic performance without the need to administer GBCAs. If successful, this would bring significant benefits, including reductions in cost, acquisition time, and environmental impact compared with conventional contrast-enhanced MRI examinations. Given its promise, we believe additional research is needed to strengthen the evidence and make these AI solutions feasible, reliable, and robust enough to be integrated into the clinical workflow. Here, we review recent AI studies aimed at reducing or replacing gadolinium in brain and cardiac imaging while maintaining diagnostic image quality.
Affiliation(s)
- Alexander Radbruch
  - Clinic for Diagnostic and Interventional Neuroradiology, University Clinic Bonn, and German Center for Neurodegenerative Diseases, DZNE, Bonn, Germany
- Katerina Deike-Hofmann
  - Clinic for Diagnostic and Interventional Neuroradiology, University Clinic Bonn, and German Center for Neurodegenerative Diseases, DZNE, Bonn, Germany
- Aart J van der Molen
  - Department of Radiology, Leiden University Medical Center, Leiden, the Netherlands
- Ilona A Dekkers
  - Department of Radiology, Leiden University Medical Center, Leiden, the Netherlands
- Greg Zaharchuk
  - Department of Radiology, Stanford University, Stanford, CA

15
McFadden BR, Reynolds M, Inglis TJJ. Developing machine learning systems worthy of trust for infection science: a requirement for future implementation into clinical practice. Front Digit Health 2023; 5:1260602. [PMID: 37829595] [PMCID: PMC10565494] [DOI: 10.3389/fdgth.2023.1260602]
Abstract
Infection science is a discipline of healthcare that includes clinical microbiology, public health microbiology, mechanisms of microbial disease, and antimicrobial countermeasures. Its importance became especially apparent during the SARS-CoV-2 (COVID-19) pandemic, which highlighted the critical operational domains within infection science, including the hospital, clinical laboratory, and public health environments, for preventing, managing, and treating infectious diseases. As the global community moves beyond the pandemic, however, infection science remains important, with emerging infectious diseases, bloodstream infections, sepsis, and antimicrobial resistance making increasingly significant contributions to the global burden of disease. Machine learning (ML) is frequently applied in healthcare and medical domains, and there is growing interest in applying ML techniques to problems in infection science. This has the potential to improve patient outcomes, optimize workflows in the clinical laboratory, and support the management of public health. Despite promising results, however, the implementation of ML into clinical practice and workflows remains limited. Migrating ML models from the research environment to the real world requires the development of trustworthy ML systems that meet the requirements of users, stakeholders, and regulatory agencies. This paper provides readers with a brief introduction to infection science, outlines the principles of trustworthy ML systems, gives examples of the application of these principles in infection science, and proposes future directions for moving toward the development of trustworthy ML systems in infection science.
Affiliation(s)
- Benjamin R. McFadden
  - School of Physics, Mathematics and Computing, University of Western Australia, Perth, WA, Australia
- Mark Reynolds
  - School of Physics, Mathematics and Computing, University of Western Australia, Perth, WA, Australia
- Timothy J. J. Inglis
  - Western Australian Country Health Service, Perth, WA, Australia
  - School of Medicine, University of Western Australia, Perth, WA, Australia
  - Department of Microbiology, Pathwest Laboratory Medicine, Perth, WA, Australia

16
Xu Y, Sun X, Liu Y, Huang Y, Liang M, Sun R, Yin G, Song C, Ding Q, Du B, Bi X. Prediction of subjective cognitive decline after corpus callosum infarction by an interpretable machine learning-derived early warning strategy. Front Neurol 2023; 14:1123607. [PMID: 37416313] [PMCID: PMC10321713] [DOI: 10.3389/fneur.2023.1123607]
Abstract
Background and purpose Corpus callosum (CC) infarction is an extremely rare subtype of ischemic stroke; however, its symptoms of cognitive impairment often fail to attract patients' attention early, which seriously worsens long-term prognosis, with consequences including high mortality, personality changes, mood disorders, psychotic reactions, and financial burden. This study sought to develop and validate machine learning (ML) models for early prediction of the risk of subjective cognitive decline (SCD) after CC infarction. Methods This prospective study enrolled 213 (3.7%) patients with CC infarction from a nine-year cohort of 8,555 patients with acute ischemic stroke. Telephone follow-up surveys were carried out one year after disease onset for patients with a definite diagnosis of CC infarction, and SCD was identified with the Behavioral Risk Factor Surveillance System (BRFSS) questionnaire. Based on significant features selected by the least absolute shrinkage and selection operator (LASSO), seven ML models were established: Extreme Gradient Boosting (XGBoost), Logistic Regression (LR), Light Gradient Boosting Machine (LightGBM), Adaptive Boosting (AdaBoost), Gaussian Naïve Bayes (GNB), Complement Naïve Bayes (CNB), and Support Vector Machine (SVM); their predictive performance was compared across several metrics. SHapley Additive exPlanations (SHAP) was also used to examine the internal behavior of the highest-performing classifier. Results The LR model performed better than the other six ML models in predicting SCD after CC infarction, with an area under the receiver operating characteristic curve (AUC) of 77.1% in the validation set.
Using LASSO and SHAP analysis, we found that CC infarction subregion, female sex, 3-month modified Rankin Scale (mRS) score, age, homocysteine, location of angiostenosis, neutrophil-to-lymphocyte ratio, pure CC infarction, and number of angiostenoses were, in order of importance, the nine most significant predictors for the output of the LR model. CC infarction subregion, female sex, 3-month mRS score, and pure CC infarction were independently associated with the cognitive outcome. Conclusion Our study is the first to demonstrate that an LR model with nine common variables performs best in predicting the risk of post-stroke SCD due to CC infarction. In particular, the combination of the LR model and the SHAP explainer could support personalized risk prediction and serve as a decision-making tool for early intervention, given the poor long-term outcome of this condition.
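The modeling pipeline described here, LASSO-based feature selection followed by a logistic regression whose outputs are explained with SHAP, can be sketched on synthetic data. The SHAP values below use the closed-form expression for linear models rather than the shap library; this is an illustration of the workflow, not a reproduction of the study.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

# Synthetic stand-in for the clinical dataset: 6 candidate predictors,
# only the first two actually drive the binary outcome.
n = 400
X = rng.normal(size=(n, 6))
logits = 1.5 * X[:, 0] - 1.0 * X[:, 1]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logits))).astype(int)

# Step 1: LASSO-style (L1-penalized) selection of significant features.
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(X, y)
selected = np.flatnonzero(lasso.coef_[0])

# Step 2: fit a plain logistic regression on the selected features.
lr = LogisticRegression().fit(X[:, selected], y)

# Step 3: SHAP values. For a linear model they have a closed form on the
# log-odds scale: phi_j = coef_j * (x_j - mean(x_j)).
phi = lr.coef_[0] * (X[:, selected] - X[:, selected].mean(axis=0))
importance = np.abs(phi).mean(axis=0)  # global ranking, as in a SHAP summary plot
```

Ranking features by mean absolute SHAP value is what produces the kind of importance ordering the abstract reports for its nine predictors.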
Affiliation(s)
- Bingying Du
- Xiaoying Bi

17
Rahmani K, Thapa R, Tsou P, Casie Chetty S, Barnes G, Lam C, Foon Tso C. Assessing the effects of data drift on the performance of machine learning models used in clinical sepsis prediction. Int J Med Inform 2023; 173:104930. [PMID: 36893656] [DOI: 10.1016/j.ijmedinf.2022.104930]
Abstract
BACKGROUND Data drift can negatively impact the performance of machine learning algorithms (MLAs) that were trained on historical data. As such, MLAs should be continuously monitored and tuned to overcome the systematic changes that occur in the distribution of data. In this paper, we study the extent of data drift and provide insights about its characteristics for sepsis onset prediction. This study will help elucidate the nature of data drift for prediction of sepsis and similar diseases. This may aid with the development of more effective patient monitoring systems that can stratify risk for dynamic disease states in hospitals. METHODS We devise a series of simulations that measure the effects of data drift in patients with sepsis, using electronic health records (EHR). We simulate multiple scenarios in which data drift may occur, namely the change in the distribution of the predictor variables (covariate shift), the change in the statistical relationship between the predictors and the target (concept shift), and the occurrence of a major healthcare event (major event) such as the COVID-19 pandemic. We measure the impact of data drift on model performances, identify the circumstances that necessitate model retraining, and compare the effects of different retraining methodologies and model architecture on the outcomes. We present the results for two different MLAs, eXtreme Gradient Boosting (XGB) and Recurrent Neural Network (RNN). RESULTS Our results show that the properly retrained XGB models outperform the baseline models in all simulation scenarios, hence signifying the existence of data drift. In the major event scenario, the area under the receiver operating characteristic curve (AUROC) at the end of the simulation period is 0.811 for the baseline XGB model and 0.868 for the retrained XGB model. In the covariate shift scenario, the AUROC at the end of the simulation period for the baseline and retrained XGB models is 0.853 and 0.874 respectively. 
In the concept shift scenario under the mixed labeling method, the retrained XGB models performed worse than the baseline model for most simulation steps; under the full relabeling method, however, the AUROC at the end of the simulation period for the baseline and retrained XGB models was 0.852 and 0.877, respectively. The results for the RNN models were mixed, suggesting that retraining based on a fixed network architecture may be inadequate for an RNN. We also present the results in terms of other performance metrics, such as the ratio of observed to expected probabilities (calibration) and the normalized rate of positive predictive values (PPV) by prevalence, referred to as lift, at a sensitivity of 0.8. CONCLUSION Our simulations indicate that retraining every couple of months, or after accruing several thousand patients, is likely to be adequate for monitoring machine learning models that predict sepsis. This suggests that a machine learning system for sepsis prediction will probably need less infrastructure for performance monitoring and retraining than applications in which data drift is more frequent and continuous. Our results also show that in the event of a concept shift, a full overhaul of the sepsis prediction model may be necessary, because a concept shift indicates a discrete change in the definition of sepsis labels, and mixing old and new labels for the sake of incremental training may not produce the desired results.
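The retraining simulations described here can be sketched in miniature: train a model, let the covariate distribution drift over successive periods, and compare a frozen model against one retrained on a sliding window. GradientBoostingClassifier stands in for XGBoost, and all data, shift sizes, and window lengths below are illustrative assumptions, not values from the study.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(7)

def make_batch(n, shift):
    """Synthetic cohort. `shift` moves the feature distribution only
    (covariate shift); the label rule P(y|x) stays fixed."""
    X = rng.normal(loc=shift, scale=1.0, size=(n, 5))
    p = 1.0 / (1.0 + np.exp(-(X[:, 0] - 0.5 * X[:, 1])))
    y = (rng.random(n) < p).astype(int)
    return X, y

# Baseline model, frozen at deployment time.
X0, y0 = make_batch(2000, shift=0.0)
frozen = GradientBoostingClassifier(random_state=0).fit(X0, y0)

# In-distribution performance on a held-out batch from the same period.
X_test0, y_test0 = make_batch(1000, shift=0.0)
auc_in_dist = roc_auc_score(y_test0, frozen.predict_proba(X_test0)[:, 1])

# Simulated drift: each period the population shifts further; the
# retrained model always uses the most recent 2000 patients.
hist_X, hist_y = X0, y0
for shift in (1.0, 2.0, 3.0):
    Xt, yt = make_batch(1000, shift)
    retrained = GradientBoostingClassifier(random_state=0).fit(
        hist_X[-2000:], hist_y[-2000:]
    )
    auc_frozen = roc_auc_score(yt, frozen.predict_proba(Xt)[:, 1])
    auc_retrained = roc_auc_score(yt, retrained.predict_proba(Xt)[:, 1])
    hist_X = np.vstack([hist_X, Xt])
    hist_y = np.concatenate([hist_y, yt])
```

Comparing `auc_frozen` and `auc_retrained` across periods is the basic shape of the experiments the abstract reports for its covariate shift and major event scenarios.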
Affiliation(s)
- Keyvan Rahmani
  - Dascena, Inc., 12333 Sowden Rd Ste B PMB 65148, Houston, TX 77080-2059, USA
- Rahul Thapa
  - Dascena, Inc., 12333 Sowden Rd Ste B PMB 65148, Houston, TX 77080-2059, USA
- Peiling Tsou
  - Dascena, Inc., 12333 Sowden Rd Ste B PMB 65148, Houston, TX 77080-2059, USA
- Satish Casie Chetty
  - Dascena, Inc., 12333 Sowden Rd Ste B PMB 65148, Houston, TX 77080-2059, USA
- Gina Barnes
  - Dascena, Inc., 12333 Sowden Rd Ste B PMB 65148, Houston, TX 77080-2059, USA
- Carson Lam
  - Dascena, Inc., 12333 Sowden Rd Ste B PMB 65148, Houston, TX 77080-2059, USA
- Chak Foon Tso
  - Dascena, Inc., 12333 Sowden Rd Ste B PMB 65148, Houston, TX 77080-2059, USA

18
Im JE, Park S, Kim YJ, Yoon SA, Lee JH. Predicting the need for intubation within 3 h in the neonatal intensive care unit using a multimodal deep neural network. Sci Rep 2023; 13:6213. [PMID: 37069174] [PMCID: PMC10106895] [DOI: 10.1038/s41598-023-33353-2]
Abstract
Respiratory distress is a common chief complaint in neonates admitted to the neonatal intensive care unit. Despite the increasing use of non-invasive ventilation in neonates with respiratory difficulty, some require advanced airway support. Delayed intubation is associated with increased morbidity, particularly in urgent, unplanned cases. Early and accurate prediction of the need for intubation may provide more time for preparation and increase safety margins by avoiding late intubation in high-risk infants. This study aimed to predict, using a multimodal deep neural network, the need for intubation within 3 h in neonates initially managed with non-invasive ventilation for respiratory distress during the first 48 h of life. We developed a multimodal deep neural network that simultaneously analyzes four time-series signals collected at 1-h intervals and 19 variables, including demographic, physiological, and laboratory parameters. Evaluated on a dataset of 128 neonates with respiratory distress who underwent non-invasive ventilation, our model achieved an area under the curve of 0.917, a sensitivity of 85.2%, and a specificity of 89.2%. These findings demonstrate the promise of the multimodal model for predicting the need for neonatal intubation within 3 h.
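The general shape of such a multimodal network, one branch for hourly time-series signals and one for static variables, fused into a single prediction, can be sketched as a toy forward pass in NumPy. All layer sizes and weights below are illustrative assumptions; the authors' actual architecture is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy batch: 8 infants, 4 time-series channels over 6 hourly steps,
# plus 19 static variables (demographics, labs). Dimensions are invented.
ts = rng.normal(size=(8, 6, 4))
static = rng.normal(size=(8, 19))

def relu(x):
    return np.maximum(x, 0.0)

# Time-series branch: per-step linear layer, then mean-pool over time.
W_ts, b_ts = rng.normal(size=(4, 16)) * 0.1, np.zeros(16)
h_ts = relu(ts @ W_ts + b_ts).mean(axis=1)             # (8, 16)

# Static branch: a single dense layer over the tabular features.
W_st, b_st = rng.normal(size=(19, 16)) * 0.1, np.zeros(16)
h_st = relu(static @ W_st + b_st)                      # (8, 16)

# Fusion head: concatenate both embeddings, output intubation probability.
W_out, b_out = rng.normal(size=(32, 1)) * 0.1, np.zeros(1)
fused = np.concatenate([h_ts, h_st], axis=1)           # (8, 32)
prob = 1.0 / (1.0 + np.exp(-(fused @ W_out + b_out)))  # (8, 1)
```

A real implementation would use a deep learning framework, a recurrent or convolutional time-series branch, and trained weights; the point of the sketch is the two-branch fusion pattern.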
Affiliation(s)
- Jueng-Eun Im
  - Biomedical Engineering, Chungbuk National University Hospital, Cheongju, Republic of Korea
- Seung Park
  - Biomedical Engineering, Chungbuk National University Hospital, Cheongju, Republic of Korea
- Yoo-Jin Kim
  - Department of Pediatrics, Chungbuk National University Hospital, Chungbuk National University College of Medicine, Chungdae-ro 1, Seowon-gu, Cheongju, 28644, Republic of Korea
- Shin Ae Yoon
  - Department of Pediatrics, Chungbuk National University Hospital, Chungbuk National University College of Medicine, Chungdae-ro 1, Seowon-gu, Cheongju, 28644, Republic of Korea
- Ji Hyuk Lee
  - Department of Pediatrics, Chungbuk National University Hospital, Chungbuk National University College of Medicine, Chungdae-ro 1, Seowon-gu, Cheongju, 28644, Republic of Korea

19
Andonov DI, Ulm B, Graessner M, Podtschaske A, Blobner M, Jungwirth B, Kagerbauer SM. Impact of the Covid-19 pandemic on the performance of machine learning algorithms for predicting perioperative mortality. BMC Med Inform Decis Mak 2023; 23:67. [PMID: 37046259] [PMCID: PMC10092913] [DOI: 10.1186/s12911-023-02151-1]
Abstract
BACKGROUND Machine-learning models are susceptible to external influences, which can result in performance deterioration. The aim of our study was to elucidate the impact of a sudden shift in covariates, like the one caused by the Covid-19 pandemic, on model performance. METHODS After ethical approval and registration at ClinicalTrials.gov (NCT04092933, initial release 17/09/2019), we developed different models for the prediction of perioperative mortality based on preoperative data: one for the pre-pandemic period until March 2020, one including data before the pandemic and from the first wave until May 2020, and one covering the complete period before and during the pandemic until October 2021. We applied XGBoost as well as a deep learning neural network (DL). Performance metrics of each model during the different pandemic phases were determined, and the XGBoost models were analysed for changes in feature importance. RESULTS XGBoost and DL provided similar performance on the pre-pandemic data with respect to the area under the receiver operating characteristic curve (AUROC, 0.951 vs. 0.942) and the area under the precision-recall curve (AUPR, 0.144 vs. 0.187). Validation in patient cohorts from the different pandemic waves showed high fluctuations in both AUROC and AUPR for DL, whereas the XGBoost models appeared more stable. Changes in variable frequencies with the onset of the pandemic were visible in age, ASA score, and a higher proportion of emergency operations, among others. Age consistently showed the highest information gain. Models based on pre-pandemic data performed worse during the first pandemic wave (AUROC 0.914 for XGBoost and DL), whereas models augmented with data from the first wave lost performance after the first wave (AUROC 0.907 for XGBoost and 0.747 for DL). The deterioration was also visible in AUPR, which worsened by over 50% for both XGBoost and DL in the first phase after retraining. CONCLUSIONS A sudden shift in data impacts model performance.
Retraining a model with updated data may degrade predictive accuracy if the changes are only transient. Premature retraining should therefore be avoided, and close model surveillance is necessary.
Affiliation(s)
- D I Andonov
  - Department of Anaesthesiology and Intensive Care Medicine, School of Medicine, Technical University of Munich, Munich, Germany
- B Ulm
  - Department of Anaesthesiology and Intensive Care Medicine, School of Medicine, Technical University of Munich, Munich, Germany
  - Department of Anaesthesiology and Intensive Care Medicine, School of Medicine, University Hospital Ulm, University of Ulm, Albert-Einstein-Allee 23, Ulm, 89081, Germany
- M Graessner
  - Department of Anaesthesiology and Intensive Care Medicine, School of Medicine, University Hospital Ulm, University of Ulm, Albert-Einstein-Allee 23, Ulm, 89081, Germany
- A Podtschaske
  - Department of Anaesthesiology and Intensive Care Medicine, School of Medicine, Technical University of Munich, Munich, Germany
- M Blobner
  - Department of Anaesthesiology and Intensive Care Medicine, School of Medicine, Technical University of Munich, Munich, Germany
  - Department of Anaesthesiology and Intensive Care Medicine, School of Medicine, University Hospital Ulm, University of Ulm, Albert-Einstein-Allee 23, Ulm, 89081, Germany
- B Jungwirth
  - Department of Anaesthesiology and Intensive Care Medicine, School of Medicine, University Hospital Ulm, University of Ulm, Albert-Einstein-Allee 23, Ulm, 89081, Germany
- S M Kagerbauer
  - Department of Anaesthesiology and Intensive Care Medicine, School of Medicine, Technical University of Munich, Munich, Germany
  - Department of Anaesthesiology and Intensive Care Medicine, School of Medicine, University Hospital Ulm, University of Ulm, Albert-Einstein-Allee 23, Ulm, 89081, Germany

20
Albahra S, Gorbett T, Robertson S, D'Aleo G, Kumar SVS, Ockunzzi S, Lallo D, Hu B, Rashidi HH. Artificial intelligence and machine learning overview in pathology & laboratory medicine: A general review of data preprocessing and basic supervised concepts. Semin Diagn Pathol 2023; 40:71-87. [PMID: 36870825] [DOI: 10.1053/j.semdp.2023.02.002]
Abstract
Machine learning (ML) is becoming an integral part of several domains in medicine. Yet most pathologists and laboratory professionals remain unfamiliar with such tools and are unprepared for their inevitable integration. To bridge this knowledge gap, we present an overview of key elements of this emerging data science discipline. First, we cover general, well-established ML concepts, such as data types, data preprocessing methods, and ML study design. We then describe common supervised and unsupervised learning algorithms together with their associated terminology, collected in a comprehensive glossary of the terms discussed in this review. Overall, this review offers a broad overview of the key concepts and algorithms in machine learning, with a focus on pathology and laboratory medicine. The objective is to provide an up-to-date, useful reference for those new to this field and for those who require a refresher.
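As an example of the data preprocessing methods such a review covers, the sketch below builds a small pipeline that imputes and standardizes numeric laboratory values and one-hot encodes a categorical column. The column names and values are invented for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy laboratory dataset with one missing value; column names are invented.
df = pd.DataFrame({
    "hemoglobin": [13.5, np.nan, 11.2, 14.8],
    "wbc_count": [6.1, 9.4, 12.0, 5.3],
    "specimen_type": ["blood", "blood", "marrow", "blood"],
})

numeric = ["hemoglobin", "wbc_count"]
categorical = ["specimen_type"]

preprocess = ColumnTransformer([
    # Numeric: impute missing values with the median, then standardize.
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric),
    # Categorical: one-hot encode, tolerating unseen categories at predict time.
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

X = preprocess.fit_transform(df)
# X has 2 standardized numeric columns plus 2 one-hot columns (blood, marrow).
```

Fitting such a transformer on the training split only, then applying it to the test split, is the standard way to avoid the data leakage pitfalls this kind of review warns about.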
Affiliation(s)
- Samer Albahra
  - Pathology and Laboratory Medicine Institute (PLMI), Cleveland Clinic, Cleveland, OH, United States; PLMI's Center for Artificial Intelligence & Data Science, Cleveland Clinic, Cleveland, OH, United States
- Tom Gorbett
  - Pathology and Laboratory Medicine Institute (PLMI), Cleveland Clinic, Cleveland, OH, United States; PLMI's Center for Artificial Intelligence & Data Science, Cleveland Clinic, Cleveland, OH, United States
- Scott Robertson
  - Pathology and Laboratory Medicine Institute (PLMI), Cleveland Clinic, Cleveland, OH, United States; PLMI's Center for Artificial Intelligence & Data Science, Cleveland Clinic, Cleveland, OH, United States
- Giana D'Aleo
  - Pathology and Laboratory Medicine Institute (PLMI), Cleveland Clinic, Cleveland, OH, United States; PLMI's Center for Artificial Intelligence & Data Science, Cleveland Clinic, Cleveland, OH, United States
- Sushasree Vasudevan Suseel Kumar
  - Pathology and Laboratory Medicine Institute (PLMI), Cleveland Clinic, Cleveland, OH, United States; PLMI's Center for Artificial Intelligence & Data Science, Cleveland Clinic, Cleveland, OH, United States
- Samuel Ockunzzi
  - Pathology and Laboratory Medicine Institute (PLMI), Cleveland Clinic, Cleveland, OH, United States; PLMI's Center for Artificial Intelligence & Data Science, Cleveland Clinic, Cleveland, OH, United States
- Daniel Lallo
  - Pathology and Laboratory Medicine Institute (PLMI), Cleveland Clinic, Cleveland, OH, United States; PLMI's Center for Artificial Intelligence & Data Science, Cleveland Clinic, Cleveland, OH, United States
- Bo Hu
  - Department of Quantitative Health Sciences, Cleveland Clinic, Cleveland, OH, United States; PLMI's Center for Artificial Intelligence & Data Science, Cleveland Clinic, Cleveland, OH, United States
- Hooman H Rashidi
  - Pathology and Laboratory Medicine Institute (PLMI), Cleveland Clinic, Cleveland, OH, United States; PLMI's Center for Artificial Intelligence & Data Science, Cleveland Clinic, Cleveland, OH, United States

21
Lu SC, Swisher CL, Chung C, Jaffray D, Sidey-Gibbons C. On the importance of interpretable machine learning predictions to inform clinical decision making in oncology. Front Oncol 2023; 13:1129380. [PMID: 36925929] [PMCID: PMC10013157] [DOI: 10.3389/fonc.2023.1129380]
Abstract
Machine learning-based tools are capable of guiding individualized clinical management and decision-making by providing predictions of a patient's future health state. Through their ability to model complex nonlinear relationships, ML algorithms can often outperform traditional statistical prediction approaches, but the use of nonlinear functions can mean that ML techniques may also be less interpretable than traditional statistical methodologies. While there are benefits of intrinsic interpretability, many model-agnostic approaches now exist and can provide insight into the way in which ML systems make decisions. In this paper, we describe how different algorithms can be interpreted and introduce some techniques for interpreting complex nonlinear algorithms.
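One widely used model-agnostic approach of the kind described here is permutation importance: shuffle one feature at a time on held-out data and measure how much the model's score drops. The sketch below applies it to a random forest on synthetic data; this is one example technique, not the full set the paper reviews.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data: 3 informative features out of 6. With shuffle=False,
# the informative features occupy the first three columns.
X, y = make_classification(n_samples=600, n_features=6, n_informative=3,
                           n_redundant=0, shuffle=False, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Any fitted model works here; the explanation method never looks inside it.
model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

# Permute each feature 20 times on held-out data and record the score drop.
result = permutation_importance(model, X_te, y_te, n_repeats=20, random_state=0)
ranking = np.argsort(result.importances_mean)[::-1]  # most important first
```

Because the method only queries predictions, the same code can interpret a nonlinear ensemble or a neural network without modification, which is the appeal of model-agnostic approaches for clinical ML.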
Affiliation(s)
- Sheng-Chieh Lu
- Section of Patient-Centered Analytics, Division of Internal Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX, United States
- Christine L Swisher
- The Ronin Project, San Mateo, CA, United States; The Lawrence J. Ellison Institute for Transformative Medicine, Los Angeles, CA, United States
- Caroline Chung
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, United States; Institute for Data Science in Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, United States
- David Jaffray
- Institute for Data Science in Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, United States; Department of Imaging Physics, The University of Texas MD Anderson Cancer Center, Houston, TX, United States; Department of Radiation Physics, The University of Texas MD Anderson Cancer Center, Houston, TX, United States
- Chris Sidey-Gibbons
- Section of Patient-Centered Analytics, Division of Internal Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX, United States
22
The use of machine learning and artificial intelligence within pediatric critical care. Pediatr Res 2023; 93:405-412. [PMID: 36376506 PMCID: PMC9660024 DOI: 10.1038/s41390-022-02380-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0]
Abstract
The field of pediatric critical care has been hampered in the era of precision medicine by our inability to accurately define and subclassify disease phenotypes. This has been caused by heterogeneity across age groups that further challenges the ability to perform randomized controlled trials in pediatrics. One approach to overcome these inherent challenges includes the use of machine learning algorithms that can assist in generating more meaningful interpretations from clinical data. This review summarizes machine learning and artificial intelligence techniques that are currently in use for clinical data modeling with relevance to pediatric critical care. Focus has been placed on the differences between techniques and the role of each in the clinical arena. The various forms of clinical decision support that utilize machine learning are also described. We review the applications and limitations of machine learning techniques to empower clinicians to make informed decisions at the bedside. IMPACT: Critical care units generate large amounts of under-utilized data that can be processed through artificial intelligence. This review summarizes the machine learning and artificial intelligence techniques currently being used to process clinical data. The review highlights the applications and limitations of these techniques within a clinical context to aid providers in making more informed decisions at the bedside.
23
Loh HW, Ooi CP, Seoni S, Barua PD, Molinari F, Acharya UR. Application of explainable artificial intelligence for healthcare: A systematic review of the last decade (2011-2022). Comput Methods Programs Biomed 2022; 226:107161. [PMID: 36228495 DOI: 10.1016/j.cmpb.2022.107161] [Citation(s) in RCA: 84] [Impact Index Per Article: 42.0]
Abstract
BACKGROUND AND OBJECTIVES Artificial intelligence (AI) has branched out to various applications in healthcare, such as health services management, predictive medicine, clinical decision-making, and patient data and diagnostics. Although AI models have achieved human-like performance, their use is still limited because they are seen as a black box. This lack of trust remains the main reason for their low use in practice, especially in healthcare. Hence, explainable artificial intelligence (XAI) has been introduced as a technique that can provide confidence in the model's prediction by explaining how the prediction is derived, thereby encouraging the use of AI systems in healthcare. The primary goal of this review is to provide areas of healthcare that require more attention from the XAI research community. METHODS Multiple journal databases were thoroughly searched using the 2020 PRISMA guidelines. Studies not published in Q1 journals, which are considered highly credible, were excluded. RESULTS In this review, we surveyed 99 Q1 articles covering the following XAI techniques: SHAP, LIME, GradCAM, LRP, Fuzzy classifier, EBM, CBR, rule-based systems, and others. CONCLUSION We discovered that detecting abnormalities in 1D biosignals and identifying key text in clinical notes are areas that require more attention from the XAI research community. We hope this review will encourage the development of a holistic cloud system for a smart city.
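The core idea behind LIME, one of the techniques this review surveys, can be sketched from first principles (a toy illustration under assumed synthetic data, not the authors' or the LIME library's implementation): sample perturbations around one instance, query the black-box model, weight the samples by proximity, and fit a local linear surrogate whose coefficients act as the explanation.

```python
# LIME-style local surrogate: explain one prediction of a black-box
# classifier with a proximity-weighted linear model.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import Ridge

X, y = make_classification(n_samples=400, n_features=4, random_state=1)
black_box = GradientBoostingClassifier(random_state=1).fit(X, y)

def local_linear_explanation(x, n_samples=1000, kernel_width=1.0, seed=0):
    rng = np.random.default_rng(seed)
    # Perturb the instance with Gaussian noise scaled to the training data.
    Z = x + rng.normal(scale=X.std(axis=0), size=(n_samples, x.shape[0]))
    p = black_box.predict_proba(Z)[:, 1]           # black-box outputs
    d = np.linalg.norm((Z - x) / X.std(axis=0), axis=1)
    w = np.exp(-(d ** 2) / kernel_width ** 2)      # proximity kernel
    # The surrogate's coefficients are the local feature effects.
    surrogate = Ridge(alpha=1.0).fit(Z, p, sample_weight=w)
    return surrogate.coef_

coefs = local_linear_explanation(X[0])
print("local feature effects:", np.round(coefs, 3))
```

Production LIME adds sparsity and discretization, but the weighted-surrogate mechanism is the same.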
Affiliation(s)
- Hui Wen Loh
- School of Science and Technology, Singapore University of Social Sciences, Singapore
- Chui Ping Ooi
- School of Science and Technology, Singapore University of Social Sciences, Singapore
- Silvia Seoni
- Department of Electronics and Telecommunications, Biolab, Politecnico di Torino, Torino 10129, Italy
- Prabal Datta Barua
- Faculty of Engineering and Information Technology, University of Technology Sydney, Australia; School of Business (Information Systems), Faculty of Business, Education, Law & Arts, University of Southern Queensland, Australia
- Filippo Molinari
- Department of Electronics and Telecommunications, Biolab, Politecnico di Torino, Torino 10129, Italy
- U Rajendra Acharya
- School of Science and Technology, Singapore University of Social Sciences, Singapore; School of Business (Information Systems), Faculty of Business, Education, Law & Arts, University of Southern Queensland, Australia; School of Engineering, Ngee Ann Polytechnic, Singapore; Department of Bioinformatics and Medical Engineering, Asia University, Taiwan; Research Organization for Advanced Science and Technology (IROAST), Kumamoto University, Kumamoto, Japan.
24
Di Martino F, Delmastro F. Explainable AI for clinical and remote health applications: a survey on tabular and time series data. Artif Intell Rev 2022; 56:5261-5315. [PMID: 36320613 PMCID: PMC9607788 DOI: 10.1007/s10462-022-10304-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0]
Abstract
Nowadays, Artificial Intelligence (AI) has become a fundamental component of healthcare applications, both clinical and remote, but the best-performing AI systems are often too complex to be self-explaining. Explainable AI (XAI) techniques are designed to unveil the reasoning behind a system's predictions and decisions, and they become even more critical when dealing with sensitive and personal health data. XAI has not gathered the same attention across different research areas and data types, especially in healthcare. In particular, many clinical and remote health applications are based on tabular and time series data, respectively, yet XAI is not commonly analysed on these data types, while computer vision and Natural Language Processing (NLP) remain the reference applications. To provide an overview of XAI methods most suitable for tabular and time series data in the healthcare domain, this paper reviews the literature of the last 5 years, illustrating the types of explanations generated and the efforts made to evaluate their relevance and quality. Specifically, we identify clinical validation, consistency assessment, objective and standardised quality evaluation, and human-centered quality assessment as key features for ensuring effective explanations for end users. Finally, we highlight the main research challenges in the field as well as the limitations of existing XAI methods.
25
Rahmani K, Thapa R, Tsou P, Chetty SC, Barnes G, Lam C, Tso CF. Assessing the effects of data drift on the performance of machine learning models used in clinical sepsis prediction. medRxiv 2022:2022.06.06.22276062. [PMID: 35702157 PMCID: PMC9196120 DOI: 10.1101/2022.06.06.22276062] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0]
Abstract
Background Data drift can negatively impact the performance of machine learning algorithms (MLAs) that were trained on historical data. As such, MLAs should be continuously monitored and tuned to overcome the systematic changes that occur in the distribution of data. In this paper, we study the extent of data drift and provide insights about its characteristics for sepsis onset prediction. This study will help elucidate the nature of data drift for prediction of sepsis and similar diseases. This may aid with the development of more effective patient monitoring systems that can stratify risk for dynamic disease states in hospitals. Methods We devise a series of simulations that measure the effects of data drift in patients with sepsis. We simulate multiple scenarios in which data drift may occur, namely the change in the distribution of the predictor variables (covariate shift), the change in the statistical relationship between the predictors and the target (concept shift), and the occurrence of a major healthcare event (major event) such as the COVID-19 pandemic. We measure the impact of data drift on model performances, identify the circumstances that necessitate model retraining, and compare the effects of different retraining methodologies and model architecture on the outcomes. We present the results for two different MLAs, eXtreme Gradient Boosting (XGB) and Recurrent Neural Network (RNN). Results Our results show that the properly retrained XGB models outperform the baseline models in all simulation scenarios, hence signifying the existence of data drift. In the major event scenario, the area under the receiver operating characteristic curve (AUROC) at the end of the simulation period is 0.811 for the baseline XGB model and 0.868 for the retrained XGB model. In the covariate shift scenario, the AUROC at the end of the simulation period for the baseline and retrained XGB models is 0.853 and 0.874 respectively. 
In the concept shift scenario and under the mixed labeling method, the retrained XGB models perform worse than the baseline model for most simulation steps. However, under the full relabeling method, the AUROC at the end of the simulation period for the baseline and retrained XGB models is 0.852 and 0.877 respectively. The results for the RNN models were mixed, suggesting that retraining based on a fixed network architecture may be inadequate for an RNN. We also present the results in the form of other performance metrics such as the ratio of observed to expected probabilities (calibration) and the normalized rate of positive predictive values (PPV) by prevalence, referred to as lift, at a sensitivity of 0.8. Conclusion Our simulations reveal that retraining periods of a couple of months or using several thousand patients are likely to be adequate to monitor machine learning models that predict sepsis. This indicates that a machine learning system for sepsis prediction will probably need less infrastructure for performance monitoring and retraining compared to other applications in which data drift is more frequent and continuous. Our results also show that in the event of a concept shift, a full overhaul of the sepsis prediction model may be necessary because it indicates a discrete change in the definition of sepsis labels, and mixing the labels for the sake of incremental training may not produce the desired results.
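The concept-shift scenario described above can be reproduced in miniature (synthetic data, not the study's sepsis cohort or models): train a model under one predictor-outcome relationship, then score it as that relationship drifts toward a different one and watch AUROC fall toward chance.

```python
# Concept-shift simulation: the trained model's AUROC degrades as the
# true predictor-outcome relationship drifts away from the one it learned.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)

def make_cohort(n, beta):
    """Synthetic patients whose outcome follows a logistic model with weights beta."""
    X = rng.normal(size=(n, 5))
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    y = (rng.random(n) < p).astype(int)
    return X, y

beta_old = np.array([1.5, -1.0, 0.8, 0.0, 0.0])   # relationship at training time
beta_new = np.array([0.0, 0.0, 0.0, 1.5, -1.0])   # relationship after full drift

X_tr, y_tr = make_cohort(5000, beta_old)
model = LogisticRegression().fit(X_tr, y_tr)

aurocs = []
for drift in [0.0, 0.5, 1.0]:
    # Concept shift: predictive weight rotates onto previously uninformative features.
    beta = (1.0 - drift) * beta_old + drift * beta_new
    X_te, y_te = make_cohort(2000, beta)
    aurocs.append(roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))
    print(f"drift={drift:.1f}  AUROC={aurocs[-1]:.3f}")
```

Because a concept shift changes P(y | X) itself, no amount of covariate monitoring catches it; performance must be tracked against fresh labels, which is why the authors find that full relabeling is needed for retraining to help.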
Affiliation(s)
- Keyvan Rahmani
- Dascena, Inc., 12333 Sowden Rd Ste B PMB 65148, Houston, Texas 77080-2059
- Rahul Thapa
- Dascena, Inc., 12333 Sowden Rd Ste B PMB 65148, Houston, Texas 77080-2059
- Peiling Tsou
- Dascena, Inc., 12333 Sowden Rd Ste B PMB 65148, Houston, Texas 77080-2059
- Gina Barnes
- Dascena, Inc., 12333 Sowden Rd Ste B PMB 65148, Houston, Texas 77080-2059
- Carson Lam
- Dascena, Inc., 12333 Sowden Rd Ste B PMB 65148, Houston, Texas 77080-2059
- Chak Foon Tso
- Dascena, Inc., 12333 Sowden Rd Ste B PMB 65148, Houston, Texas 77080-2059
26
Clinical artificial intelligence quality improvement: towards continual monitoring and updating of AI algorithms in healthcare. NPJ Digit Med 2022; 5:66. [PMID: 35641814 PMCID: PMC9156743 DOI: 10.1038/s41746-022-00611-y] [Citation(s) in RCA: 69] [Impact Index Per Article: 34.5]
Abstract
Machine learning (ML) and artificial intelligence (AI) algorithms have the potential to derive insights from clinical data and improve patient outcomes. However, these highly complex systems are sensitive to changes in the environment and liable to performance decay. Even after their successful integration into clinical practice, ML/AI algorithms should be continuously monitored and updated to ensure their long-term safety and effectiveness. To bring AI into maturity in clinical care, we advocate for the creation of hospital units responsible for quality assurance and improvement of these algorithms, which we refer to as “AI-QI” units. We discuss how tools that have long been used in hospital quality assurance and quality improvement can be adapted to monitor static ML algorithms. On the other hand, procedures for continual model updating are still nascent. We highlight key considerations when choosing between existing methods and opportunities for methodological innovation.
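One classic quality-assurance tool that can be adapted to monitor a static ML algorithm, in the spirit of the AI-QI units this paper advocates, is a one-sided CUSUM control chart over the model's error rate. The sketch below uses simulated weekly error rates; the target, allowance, and alarm threshold are illustrative assumptions, not values from the paper.

```python
# One-sided CUSUM chart: accumulate excess error above an in-control
# target and raise an alarm when the cumulative excess crosses a threshold.
import numpy as np

rng = np.random.default_rng(7)

target_error = 0.10   # acceptable in-control error rate
slack = 0.02          # CUSUM allowance (k): ignore small fluctuations
threshold = 0.08      # alarm threshold (h)

# Simulated weekly error rates: in control for 30 weeks, then the model
# silently degrades and the error rate rises.
weekly_error = np.concatenate([
    rng.normal(0.10, 0.01, 30),
    rng.normal(0.16, 0.01, 10),
])

cusum, alarms = 0.0, []
for week, e in enumerate(weekly_error):
    # Accumulate only error in excess of target + slack; floor at zero.
    cusum = max(0.0, cusum + (e - target_error - slack))
    if cusum > threshold:
        alarms.append(week)

print("first alarm at week:", alarms[0] if alarms else None)
```

The chart stays quiet through ordinary noise but flags the sustained shift within a few weeks, which is the kind of ongoing surveillance the paper argues deployed algorithms need.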
27
AI and Clinical Decision Making: The Limitations and Risks of Computational Reductionism in Bowel Cancer Screening. Appl Sci (Basel) 2022. [DOI: 10.3390/app12073341] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]
Abstract
Advances in artificial intelligence in healthcare are frequently promoted as ‘solutions’ to improve the accuracy, safety, and quality of clinical decisions, treatments, and care. Despite some diagnostic success, however, AI systems rely on forms of reductive reasoning and computational determinism that embed problematic assumptions about clinical decision-making and clinical practice. Clinician autonomy, experience, and judgement are reduced to inputs and outputs framed as binary or multi-class classification problems benchmarked against a clinician’s capacity to identify or predict disease states. This paper examines this reductive reasoning in AI systems for colorectal cancer (CRC) to highlight their limitations and risks: (1) in AI systems themselves due to inherent biases in (a) retrospective training datasets and (b) embedded assumptions in underlying AI architectures and algorithms; (2) in the problematic and limited evaluations being conducted on AI systems prior to system integration in clinical practice; and (3) in marginalising socio-technical factors in the context-dependent interactions between clinicians, their patients, and the broader health system. The paper argues that to optimise benefits from AI systems and to avoid negative unintended consequences for clinical decision-making and patient care, there is a need for more nuanced and balanced approaches to AI system deployment and evaluation in CRC.
28
Is Artificial Intelligence (AI) a Pipe Dream? Why Legal Issues Present Significant Hurdles to AI Autonomy. AJR Am J Roentgenol 2022; 219:152-156. [PMID: 35138133 DOI: 10.2214/ajr.21.27224] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5]
Abstract
Proponents of artificial intelligence ("AI") technology have suggested that in the near future, AI software may replace human radiologists. While AI's assimilation into the specialty has occurred more slowly than predicted, developments in machine learning, deep learning, and neural networks suggest that technological hurdles and costs will eventually be overcome. However, beyond these technological hurdles, formidable legal hurdles threaten AI's impact on the specialty. Legal liability for errors committed by AI will influence AI's ultimate role within radiology and whether AI remains a simple decision support tool or develops into an autonomous member of the healthcare team. Additional areas of uncertainty include the potential application of products liability law to AI, and the approach taken by the U.S. FDA in potentially classifying autonomous AI as a medical device. The current ambiguity of the legal treatment of AI will profoundly impact autonomous AI development given that vendors, radiologists, and hospitals will be unable to reliably assess their liability from implementing such tools. Advocates of AI in radiology and health care in general should lobby for legislative action to better clarify the liability risks of AI in a way that does not deter technological development.