1
Chan VTT, Ran AR, Wagner SK, Hui HYH, Hu X, Ko H, Fekrat S, Wang Y, Lee CS, Young AL, Tham CC, Tham YC, Keane PA, Milea D, Chen C, Wong TY, Mok VCT, Cheung CY. Value proposition of retinal imaging in Alzheimer's disease screening: A review of eight evolving trends. Prog Retin Eye Res 2024; 103:101290. [PMID: 39173942] [DOI: 10.1016/j.preteyeres.2024.101290]
Abstract
Alzheimer's disease (AD) is the leading cause of dementia worldwide. Current diagnostic modalities of AD generally focus on detecting the presence of amyloid β and tau protein in the brain (for example, positron emission tomography [PET] and cerebrospinal fluid testing), but these are limited by their high cost, invasiveness, and reliance on specialist expertise. Retinal imaging exhibits potential in AD screening and risk stratification, as the retina provides a platform for the optical visualization of the central nervous system in vivo, with vascular and neuronal changes that mirror brain pathology. Given the paradigm shift brought by advances in artificial intelligence and the emergence of disease-modifying therapies, this article summarizes and reviews the current literature to highlight eight trends in an evolving landscape regarding the role and potential value of retinal imaging in AD screening.
Affiliation(s)
- Victor T T Chan
- Department of Ophthalmology and Visual Sciences, The Chinese University of Hong Kong, Hong Kong SAR, China; Department of Ophthalmology and Visual Sciences, Prince of Wales Hospital, Hong Kong SAR, China
- An Ran Ran
- Department of Ophthalmology and Visual Sciences, The Chinese University of Hong Kong, Hong Kong SAR, China
- Siegfried K Wagner
- NIHR Biomedical Research Center at Moorfields Eye Hospital NHS Foundation Trust and UCL Institute of Ophthalmology, London, UK
- Herbert Y H Hui
- Department of Ophthalmology and Visual Sciences, The Chinese University of Hong Kong, Hong Kong SAR, China
- Xiaoyan Hu
- Department of Ophthalmology and Visual Sciences, The Chinese University of Hong Kong, Hong Kong SAR, China
- Ho Ko
- Division of Neurology, Department of Medicine and Therapeutics, The Chinese University of Hong Kong, Hong Kong SAR, China; Gerald Choa Neuroscience Institute, Margaret K.L. Cheung Research Centre for Management of Parkinsonism, Therese Pei Fong Chow Research Centre for Prevention of Dementia, Lui Che Woo Institute of Innovative Medicine, Li Ka Shing Institute of Health Science, Lau Tat-chuen Research Centre of Brain Degenerative Diseases in Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong SAR, China
- Sharon Fekrat
- Departments of Ophthalmology and Neurology, Duke University School of Medicine, Durham, NC, USA
- Yaxing Wang
- Beijing Institute of Ophthalmology, Beijing Ophthalmology and Visual Science Key Lab, Beijing Tongren Hospital, Capital University of Medical Science, Beijing, China
- Cecilia S Lee
- Department of Ophthalmology, University of Washington, Seattle, WA, USA
- Alvin L Young
- Department of Ophthalmology and Visual Sciences, The Chinese University of Hong Kong, Hong Kong SAR, China; Department of Ophthalmology and Visual Sciences, Prince of Wales Hospital, Hong Kong SAR, China
- Clement C Tham
- Department of Ophthalmology and Visual Sciences, The Chinese University of Hong Kong, Hong Kong SAR, China
- Yih Chung Tham
- Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Pearse A Keane
- NIHR Biomedical Research Center at Moorfields Eye Hospital NHS Foundation Trust and UCL Institute of Ophthalmology, London, UK
- Dan Milea
- Singapore National Eye Centre, Singapore
- Christopher Chen
- Memory Aging & Cognition Centre, National University Health System, Singapore; Department of Pharmacology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Tien Yin Wong
- Tsinghua Medicine, Tsinghua University, Beijing, China
- Vincent C T Mok
- Division of Neurology, Department of Medicine and Therapeutics, The Chinese University of Hong Kong, Hong Kong SAR, China; Gerald Choa Neuroscience Institute, Margaret K.L. Cheung Research Centre for Management of Parkinsonism, Therese Pei Fong Chow Research Centre for Prevention of Dementia, Lui Che Woo Institute of Innovative Medicine, Li Ka Shing Institute of Health Science, Lau Tat-chuen Research Centre of Brain Degenerative Diseases in Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong SAR, China
- Carol Y Cheung
- Department of Ophthalmology and Visual Sciences, The Chinese University of Hong Kong, Hong Kong SAR, China
2
Matthews GA, McGenity C, Bansal D, Treanor D. Public evidence on AI products for digital pathology. NPJ Digit Med 2024; 7:300. [PMID: 39455883] [PMCID: PMC11511888] [DOI: 10.1038/s41746-024-01294-3]
Abstract
Novel products applying artificial intelligence (AI)-based methods to digital pathology images are touted to have many uses and benefits. However, publicly available information for products can be variable, with few sources of independent evidence. This review aimed to identify public evidence for AI-based products for digital pathology. Key features of products on the European Economic Area/Great Britain (EEA/GB) markets were examined, including their regulatory approval, intended use, and published validation studies. There were 26 AI-based products that met the inclusion criteria and, of these, 24 had received regulatory approval via the self-certification route as general in vitro diagnostic (IVD) medical devices. Only 10 of the products (38%) had peer-reviewed internal validation studies and 11 products (42%) had peer-reviewed external validation studies. To support transparency, an online register was developed from the identified public evidence (https://osf.io/gb84r/), which we anticipate will provide an accessible resource on novel devices and support decision-making.
Affiliation(s)
- Clare McGenity
- Leeds Teaching Hospitals NHS Trust, Leeds, UK
- University of Leeds, Leeds, UK
- Darren Treanor
- Leeds Teaching Hospitals NHS Trust, Leeds, UK
- University of Leeds, Leeds, UK
- Department of Clinical Pathology & Department of Clinical and Experimental Medicine, Linköping University, Linköping, Sweden
- Centre for Medical Image Science and Visualization (CMIV), Linköping University, Linköping, Sweden
3
Katal S, York B, Gholamrezanezhad A. AI in radiology: From promise to practice - A guide to effective integration. Eur J Radiol 2024; 181:111798. [PMID: 39471551] [DOI: 10.1016/j.ejrad.2024.111798]
Abstract
While Artificial Intelligence (AI) has the potential to transform the field of diagnostic radiology, important obstacles still inhibit its integration into clinical environments. Foremost among them is the inability to integrate clinical information and prior and concurrent imaging examinations, which can lead to diagnostic errors that could irreversibly alter patient care. For AI to succeed in modern clinical practice, model training and algorithm development need to account for relevant background information that may influence the presentation of the patient in question. While AI is often remarkably accurate in distinguishing binary outcomes (hemorrhage vs. no hemorrhage; fracture vs. no fracture), the narrow scope of current training datasets prevents AI from examining the entire clinical context of the image in question. In this article, we provide an overview of the ways in which failure to account for clinical data and prior imaging can adversely affect AI interpretation of imaging studies. We then showcase how emerging techniques such as multimodal fusion and combined neural networks can take advantage of both clinical and imaging data, as well as how development strategies like domain adaptation can ensure greater generalizability of AI algorithms across diverse and dynamic clinical environments.
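To make the multimodal-fusion idea above concrete, the sketch below combines an image embedding with tabular clinical features by simple concatenation before a shared classification head ("late fusion"). The architecture, dimensions, and names are illustrative assumptions, not details from the article.

```python
# Hedged sketch of late multimodal fusion: an image embedding is
# concatenated with encoded clinical features before classification.
# Sizes and structure are illustrative only.
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    def __init__(self, img_dim=512, clin_dim=16, hidden=128, n_classes=2):
        super().__init__()
        # Small encoder for tabular clinical variables (age, labs, history)
        self.clin_encoder = nn.Sequential(nn.Linear(clin_dim, 32), nn.ReLU())
        # Classification head over the fused representation
        self.head = nn.Sequential(
            nn.Linear(img_dim + 32, hidden), nn.ReLU(),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, img_feats, clin_feats):
        fused = torch.cat([img_feats, self.clin_encoder(clin_feats)], dim=-1)
        return self.head(fused)

model = LateFusionClassifier()
logits = model(torch.randn(4, 512), torch.randn(4, 16))  # batch of 4
print(logits.shape)  # torch.Size([4, 2])
```

Earlier fusion or cross-attention between modalities are common alternatives; the choice depends on how strongly the clinical context should condition the image interpretation.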
Affiliation(s)
- Sanaz Katal
- Department of Medical Imaging, St. Vincent's Hospital Melbourne, 41 Victoria Parade, Fitzroy, VIC 3065, Australia
- Benjamin York
- Department of Radiology, Los Angeles General Medical Center, 1200 N State Street, Los Angeles, CA 90033, USA
- Ali Gholamrezanezhad
- Department of Radiology, Los Angeles General Medical Center, 1200 N State Street, Los Angeles, CA 90033, USA
4
Pfohl SR, Cole-Lewis H, Sayres R, Neal D, Asiedu M, Dieng A, Tomasev N, Rashid QM, Azizi S, Rostamzadeh N, McCoy LG, Celi LA, Liu Y, Schaekermann M, Walton A, Parrish A, Nagpal C, Singh P, Dewitt A, Mansfield P, Prakash S, Heller K, Karthikesalingam A, Semturs C, Barral J, Corrado G, Matias Y, Smith-Loud J, Horn I, Singhal K. A toolbox for surfacing health equity harms and biases in large language models. Nat Med 2024. [PMID: 39313595] [DOI: 10.1038/s41591-024-03258-2]
Abstract
Large language models (LLMs) hold promise to serve complex health information needs but also have the potential to introduce harm and exacerbate health disparities. Reliably evaluating equity-related model failures is a critical step toward developing systems that promote health equity. We present resources and methodologies for surfacing biases with potential to precipitate equity-related harms in long-form, LLM-generated answers to medical questions and conduct a large-scale empirical case study with the Med-PaLM 2 LLM. Our contributions include a multifactorial framework for human assessment of LLM-generated answers for biases and EquityMedQA, a collection of seven datasets enriched for adversarial queries. Both our human assessment framework and our dataset design process are grounded in an iterative participatory approach and review of Med-PaLM 2 answers. Through our empirical study, we find that our approach surfaces biases that may be missed by narrower evaluation approaches. Our experience underscores the importance of using diverse assessment methodologies and involving raters of varying backgrounds and expertise. While our approach is not sufficient to holistically assess whether the deployment of an artificial intelligence (AI) system promotes equitable health outcomes, we hope that it can be leveraged and built upon toward a shared goal of LLMs that promote accessible and equitable healthcare.
Affiliation(s)
- Awa Dieng
- Google DeepMind, Mountain View, CA, USA
- Liam G McCoy
- University of Alberta, Edmonton, Alberta, Canada
- Leo Anthony Celi
- Laboratory for Computational Physiology, Massachusetts Institute of Technology, Cambridge, MA, USA
- Division of Pulmonary, Critical Care and Sleep Medicine, Beth Israel Deaconess Medical Center, Boston, MA, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Yun Liu
- Google Research, Mountain View, CA, USA
- Ivor Horn
- Google Research, Mountain View, CA, USA
5
Youssef A, Nichol AA, Martinez-Martin N, Larson DB, Abramoff M, Wolf RM, Char D. Ethical Considerations in the Design and Conduct of Clinical Trials of Artificial Intelligence. JAMA Netw Open 2024; 7:e2432482. [PMID: 39240560] [PMCID: PMC11380101] [DOI: 10.1001/jamanetworkopen.2024.32482]
Abstract
Importance: Safe integration of artificial intelligence (AI) into clinical settings often requires randomized clinical trials (RCTs) to compare AI efficacy with conventional care. Diabetic retinopathy (DR) screening is at the forefront of clinical AI applications, marked by the first US Food and Drug Administration (FDA) De Novo authorization for an autonomous AI for such use. Objective: To determine the generalizability of the 7 ethical research principles for clinical trials endorsed by the National Institutes of Health (NIH), and identify ethical concerns unique to clinical trials of AI. Design, Setting, and Participants: This qualitative study included semistructured interviews conducted with 11 investigators engaged in the design and implementation of clinical trials of AI for DR screening from November 11, 2022, to February 20, 2023. The study was a collaboration with the ACCESS (AI for Children's Diabetic Eye Exams) trial, the first clinical trial of autonomous AI in pediatrics. Participant recruitment initially utilized purposeful sampling, and later expanded with snowball sampling. Study methodology for analysis combined a deductive approach to explore investigators' perspectives of the 7 ethical principles for clinical research endorsed by the NIH and an inductive approach to uncover the broader ethical considerations of implementing clinical trials of AI within care delivery. Results: A total of 11 participants (mean [SD] age, 47.5 [12.0] years; 7 male [64%], 4 female [36%]; 3 Asian [27%], 8 White [73%]) were included, with diverse expertise in ethics, ophthalmology, translational medicine, biostatistics, and AI development. Key themes revealed several ethical challenges unique to clinical trials of AI. These themes included difficulties in measuring social value, establishing scientific validity, ensuring fair participant selection, evaluating risk-benefit ratios across various patient subgroups, and addressing the complexities inherent in the data use terms of informed consent. Conclusions and Relevance: This qualitative study identified practical ethical challenges that investigators need to consider and negotiate when conducting AI clinical trials, exemplified by the DR screening use case. These considerations call for further guidance on where to focus empirical and normative ethical efforts to best support the conduct of clinical trials of AI and minimize unintended harm to trial participants.
Affiliation(s)
- Alaa Youssef
- Department of Radiology, Stanford University School of Medicine, Stanford, California
- Ariadne A Nichol
- Center for Biomedical Ethics, Stanford University School of Medicine, Stanford, California
- Nicole Martinez-Martin
- Center for Biomedical Ethics, Stanford University School of Medicine, Stanford, California
- Department of Psychiatry, Stanford University School of Medicine, Stanford, California
- David B Larson
- Department of Radiology, Stanford University School of Medicine, Stanford, California
- Michael Abramoff
- Department of Ophthalmology and Visual Sciences, University of Iowa Hospital and Clinics, Iowa City
- Electrical and Computer Engineering, University of Iowa, Iowa City
- Risa M Wolf
- Division of Endocrinology, Department of Pediatrics, The Johns Hopkins School of Medicine, Baltimore, Maryland
- Danton Char
- Center for Biomedical Ethics, Stanford University School of Medicine, Stanford, California
- Department of Anesthesiology, Division of Pediatric Cardiac Anesthesia, Stanford, California
6
Kapoor DU, Saini PK, Sharma N, Singh A, Prajapati BG, Elossaily GM, Rashid S. AI illuminates paths in oral cancer: transformative insights, diagnostic precision, and personalized strategies. EXCLI J 2024; 23:1091-1116. [PMID: 39391057] [PMCID: PMC11464865] [DOI: 10.17179/excli2024-7253]
Abstract
Oral cancer has one of the lowest survival rates worldwide despite recent therapeutic advancements, making it a tenacious challenge in healthcare. Artificial intelligence exhibits noteworthy potential for enhancing diagnostic and treatment procedures. This review covers traditional imaging techniques for oral cancer, and discusses the role of artificial intelligence in the prognosis of oral cancer, including predictive modeling, identification of prognostic factors, and risk stratification. The review also encompasses the use of artificial intelligence in oral cancer diagnosis and treatment, such as automated image analysis, computer-aided detection and diagnosis, and the integration of machine learning algorithms. Customized treatment approaches for oral cancer through artificial intelligence-based personalized medicine are also discussed. See also the graphical abstract (Fig. 1).
Affiliation(s)
- Devesh U. Kapoor
- Dr. Dayaram Patel Pharmacy College, Bardoli-394601, Gujarat, India
- Pushpendra Kumar Saini
- Department of Pharmaceutics, Sri Balaji College of Pharmacy, Jaipur, Rajasthan-302013, India
- Narendra Sharma
- Department of Pharmaceutics, Sri Balaji College of Pharmacy, Jaipur, Rajasthan-302013, India
- Ankul Singh
- Faculty of Pharmacy, Department of Pharmacology, Dr MGR Educational and Research Institute, Velapanchavadi, Chennai-77, Tamil Nadu, India
- Bhupendra G. Prajapati
- Shree S. K. Patel College of Pharmaceutical Education and Research, Ganpat University, Kherva-384012, Gujarat, India
- Faculty of Pharmacy, Silpakorn University, Nakhon Pathom 73000, Thailand
- Gehan M. Elossaily
- Department of Basic Medical Sciences, College of Medicine, AlMaarefa University, P.O. Box 71666, Riyadh, 11597, Saudi Arabia
- Summya Rashid
- Department of Pharmacology & Toxicology, College of Pharmacy, Prince Sattam Bin Abdulaziz University, P.O. Box 173, Al-Kharj 11942, Saudi Arabia
7
Linfeng W, Pengyu M. Deep learning is necessary for safety regulation in predicting malnutrition in gastric cancer patients. Clin Nutr 2024; 43:2195. [PMID: 39153431] [DOI: 10.1016/j.clnu.2024.07.043]
Affiliation(s)
- Wang Linfeng
- Southwest Medical University, Luzhou, Sichuan, China
- Miao Pengyu
- Southwest Medical University, Luzhou, Sichuan, China
8
Hatherley J. Are clinicians ethically obligated to disclose their use of medical machine learning systems to patients? J Med Ethics 2024. [PMID: 39117396] [DOI: 10.1136/jme-2024-109905]
Abstract
It is commonly accepted that clinicians are ethically obligated to disclose their use of medical machine learning systems to patients, and that failure to do so would amount to a moral fault for which clinicians ought to be held accountable. Call this 'the disclosure thesis.' Four main arguments have been, or could be, given to support the disclosure thesis in the ethics literature: the risk-based argument, the rights-based argument, the materiality argument and the autonomy argument. In this article, I argue that each of these four arguments is unconvincing, and therefore, that the disclosure thesis ought to be rejected. I suggest that mandating disclosure may even risk harming patients by providing stakeholders with a way to avoid accountability for harm that results from improper applications or uses of these systems.
Affiliation(s)
- Joshua Hatherley
- Department of Philosophy and History of Ideas, Aarhus University, Aarhus, Denmark
9
Shelmerdine SC. Rethinking our relationship with AI: for better or worse, richer or poorer? Eur Radiol 2024. [PMID: 39095603] [DOI: 10.1007/s00330-024-11007-9]
Affiliation(s)
- Susan C Shelmerdine
- Department of Clinical Radiology, Great Ormond Street Hospital for Children, London, UK
- UCL Great Ormond Street Institute of Child Health, Great Ormond Street Hospital for Children, London, UK
- NIHR Great Ormond Street Hospital Biomedical Research Centre, Bloomsbury, London, UK
10
Bandyopadhyay A, Oks M, Sun H, Prasad B, Rusk S, Jefferson F, Malkani RG, Haghayegh S, Sachdeva R, Hwang D, Agustsson J, Mignot E, Summers M, Fabbri D, Deak M, Anastasi M, Sampson A, Van Hout S, Seixas A. Strengths, weaknesses, opportunities, and threats of using AI-enabled technology in sleep medicine: a commentary. J Clin Sleep Med 2024; 20:1183-1191. [PMID: 38533757] [PMCID: PMC11217619] [DOI: 10.5664/jcsm.11132]
Abstract
Over the past few years, artificial intelligence (AI) has emerged as a powerful tool used to efficiently automate several tasks across multiple domains. Sleep medicine is perfectly positioned to leverage this tool due to the wealth of physiological signals obtained through sleep studies or sleep tracking devices and abundance of accessible clinical data through electronic medical records. However, caution must be applied when utilizing AI, due to intrinsic challenges associated with novel technology. The Artificial Intelligence in Sleep Medicine Committee of the American Academy of Sleep Medicine reviews advancements in AI within the sleep medicine field. In this article, the Artificial Intelligence in Sleep Medicine committee members provide a commentary on the scope of AI technology in sleep medicine. The commentary identifies 3 pivotal areas in sleep medicine that can benefit from AI technologies: clinical care, lifestyle management, and population health management. This article provides a detailed analysis of the strengths, weaknesses, opportunities, and threats associated with using AI-enabled technologies in each pivotal area. Finally, the article broadly reviews barriers and challenges associated with using AI-enabled technologies and offers possible solutions. CITATION Bandyopadhyay A, Oks M, Sun H, et al. Strengths, weaknesses, opportunities, and threats of using AI-enabled technology in sleep medicine: a commentary. J Clin Sleep Med. 2024;20(7):1183-1191.
Affiliation(s)
- Anuja Bandyopadhyay
- Department of Pediatrics, Indiana University School of Medicine, Indianapolis, Indiana
- Margarita Oks
- Department of Medicine, Northwell Health System, New York, New York
- Haoqi Sun
- Department of Neurology, Beth Israel Deaconess Medical Center, Boston, Massachusetts
- Bharati Prasad
- Department of Medicine, University of Illinois, Chicago, Illinois
- Sam Rusk
- EnsoData Research, EnsoData, Madison, Wisconsin
- Felicia Jefferson
- Department of Biochemistry and Molecular Biology, University of Nevada, Reno, Nevada
- Roneil Gopal Malkani
- Department of Neurology, Northwestern University Feinberg School of Medicine, Chicago, Illinois
- Neurology Service, Jesse Brown Veterans Affairs Medical Center, Chicago, Illinois
- Shahab Haghayegh
- Brigham and Women’s Hospital and Harvard Medical School, Boston, Massachusetts
- Ramesh Sachdeva
- Children’s Hospital of Michigan and Central Michigan University College of Medicine, Detroit, Michigan
- Dennis Hwang
- Kaiser Permanente Southern California, Los Angeles, California
- Emmanuel Mignot
- Stanford University, School of Medicine, Stanford, California
- Michael Summers
- Division of Pulmonary, Critical Care, and Sleep Medicine, University of Nebraska Medical Center, Omaha, Nebraska
- Azizi Seixas
- Department of Informatics and Health Data Science, University of Miami Miller School of Medicine, Miami, Florida
11
Wilkinson LS, Dunbar JK, Lip G. Clinical Integration of Artificial Intelligence for Breast Imaging. Radiol Clin North Am 2024; 62:703-716. [PMID: 38777544] [DOI: 10.1016/j.rcl.2023.12.006]
Abstract
This article describes an approach to planning and implementing artificial intelligence products in a breast screening service. It highlights the importance of an in-depth understanding of the end-to-end workflow and effective project planning by a multidisciplinary team. It discusses the need for monitoring to ensure that performance is stable and meets expectations, as well as focusing on the potential for inadvertently generating inequality. New cross-discipline roles and expertise will be needed to enhance service delivery.
Affiliation(s)
- Louise S Wilkinson
- Oxford Breast Imaging Centre, Churchill Hospital, Old Road, Headington, Oxford OX3 7LE, UK
- J Kevin Dunbar
- Regional Head of Screening Quality Assurance Service (SQAS) - South, NHS England, England, UK
- Gerald Lip
- North East Scotland Breast Screening Service, Aberdeen Royal Infirmary, Foresterhill Road, Aberdeen AB25 2XF, UK
12
Feng X, Xu K, Luo MJ, Chen H, Yang Y, He Q, Song C, Li R, Wu Y, Wang H, Tham YC, Ting DSW, Lin H, Wong TY, Lam DSC. Latest developments of generative artificial intelligence and applications in ophthalmology. Asia Pac J Ophthalmol (Phila) 2024; 13:100090. [PMID: 39128549] [DOI: 10.1016/j.apjo.2024.100090]
Abstract
The emergence of generative artificial intelligence (AI) has revolutionized various fields. In ophthalmology, generative AI has the potential to enhance efficiency, accuracy, personalization and innovation in clinical practice and medical research, through processing data, streamlining medical documentation, facilitating patient-doctor communication, aiding in clinical decision-making, and simulating clinical trials. This review focuses on the development and integration of generative AI models into clinical workflows and the scientific research of ophthalmology. It outlines the need for a standard framework for comprehensive assessments, robust evidence, and exploration of the potential of multimodal capabilities and intelligent agents. Additionally, the review addresses the risks of AI model development and application in the clinical service and research of ophthalmology, including data privacy, data bias, adaptation friction, overdependence, and job replacement, on the basis of which we summarize a risk management framework to mitigate these concerns. This review highlights the transformative potential of generative AI in enhancing patient care and improving operational efficiency in the clinical service and research of ophthalmology. It also advocates for a balanced approach to its adoption.
Affiliation(s)
- Xiaoru Feng
- School of Biomedical Engineering, Tsinghua Medicine, Tsinghua University, Beijing, China; Institute for Hospital Management, Tsinghua Medicine, Tsinghua University, Beijing, China
- Kezheng Xu
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou, China
- Ming-Jie Luo
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou, China
- Haichao Chen
- School of Clinical Medicine, Beijing Tsinghua Changgung Hospital, Tsinghua Medicine, Tsinghua University, Beijing, China
- Yangfan Yang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou, China
- Qi He
- Research Centre of Big Data and Artificial Research for Medicine, The First Affiliated Hospital, Sun Yat-Sen University, Guangzhou, China
- Chenxin Song
- Research Centre of Big Data and Artificial Research for Medicine, The First Affiliated Hospital, Sun Yat-Sen University, Guangzhou, China
- Ruiyao Li
- Research Centre of Big Data and Artificial Research for Medicine, The First Affiliated Hospital, Sun Yat-Sen University, Guangzhou, China
- You Wu
- Institute for Hospital Management, Tsinghua Medicine, Tsinghua University, Beijing, China; School of Basic Medical Sciences, Tsinghua Medicine, Tsinghua University, Beijing, China; Department of Health Policy and Management, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, USA
- Haibo Wang
- Research Centre of Big Data and Artificial Research for Medicine, The First Affiliated Hospital, Sun Yat-Sen University, Guangzhou, China
- Yih Chung Tham
- Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Centre for Innovation and Precision Eye Health, Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Singapore Eye Research Institute, Singapore National Eye Centre, Singapore; Ophthalmology and Visual Science Academic Clinical Program, Duke-NUS Medical School, Singapore
- Daniel Shu Wei Ting
- Singapore Eye Research Institute, Singapore National Eye Centre, Singapore; Ophthalmology and Visual Science Academic Clinical Program, Duke-NUS Medical School, Singapore; Byers Eye Institute, Stanford University, Palo Alto, CA, USA
- Haotian Lin
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou, China; Center for Precision Medicine and Department of Genetics and Biomedical Informatics, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China; Hainan Eye Hospital and Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Haikou, China
- Tien Yin Wong
- School of Clinical Medicine, Beijing Tsinghua Changgung Hospital, Tsinghua Medicine, Tsinghua University, Beijing, China; Singapore Eye Research Institute, Singapore National Eye Centre, Singapore; Tsinghua Medicine, Tsinghua University, Beijing, China
- Dennis Shun-Chiu Lam
- The International Eye Research Institute, The Chinese University of Hong Kong (Shenzhen), Shenzhen, China; The C-MER International Eye Care Group, Hong Kong, Hong Kong, China
13
Kale AU, Hogg HDJ, Pearson R, Glocker B, Golder S, Coombe A, Waring J, Liu X, Moore DJ, Denniston AK. Detecting Algorithmic Errors and Patient Harms for AI-Enabled Medical Devices in Randomized Controlled Trials: Protocol for a Systematic Review. JMIR Res Protoc 2024; 13:e51614. [PMID: 38941147] [PMCID: PMC11245650] [DOI: 10.2196/51614]
Abstract
BACKGROUND: Artificial intelligence (AI) medical devices have the potential to transform existing clinical workflows and ultimately improve patient outcomes. AI medical devices have shown potential for a range of clinical tasks such as diagnostics, prognostics, and therapeutic decision-making such as drug dosing. There is, however, an urgent need to ensure that these technologies remain safe for all populations. Recent literature demonstrates the need for rigorous performance error analysis to identify issues such as algorithmic encoding of spurious correlations (eg, protected characteristics) or specific failure modes that may lead to patient harm. Guidelines for reporting on studies that evaluate AI medical devices require the mention of performance error analysis; however, there is still a lack of understanding around how performance errors should be analyzed in clinical studies, and what harms authors should aim to detect and report. OBJECTIVE: This systematic review will assess the frequency and severity of AI errors and adverse events (AEs) in randomized controlled trials (RCTs) investigating AI medical devices as interventions in clinical settings. The review will also explore how performance errors are analyzed, including whether the analysis includes the investigation of subgroup-level outcomes. METHODS: This systematic review will identify and select RCTs assessing AI medical devices. Search strategies will be deployed in MEDLINE (Ovid), Embase (Ovid), Cochrane CENTRAL, and clinical trial registries to identify relevant papers. RCTs identified in bibliographic databases will be cross-referenced with clinical trial registries. The primary outcomes of interest are the frequency and severity of AI errors, patient harms, and reported AEs. Quality assessment of RCTs will be based on version 2 of the Cochrane risk-of-bias tool (RoB 2). Data analysis will include a comparison of error rates and patient harms between study arms, and a meta-analysis of the rates of patient harm in control versus intervention arms will be conducted if appropriate. RESULTS: The project was registered on PROSPERO in February 2023. Preliminary searches have been completed and the search strategy has been designed in consultation with an information specialist and methodologist. Title and abstract screening started in September 2023. Full-text screening is ongoing and data collection and analysis began in April 2024. CONCLUSIONS: Evaluations of AI medical devices have shown promising results; however, reporting of studies has been variable. Detection, analysis, and reporting of performance errors and patient harms is vital to robustly assess the safety of AI medical devices in RCTs. Scoping searches have illustrated that the reporting of harms is variable, often with no mention of AEs. The findings of this systematic review will identify the frequency and severity of AI performance errors and patient harms and generate insights into how errors should be analyzed to account for both overall and subgroup performance. TRIAL REGISTRATION: PROSPERO CRD42023387747; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=387747. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID): PRR1-10.2196/51614.
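The protocol's planned comparison of harm rates between control and intervention arms could, if appropriate, be pooled across trials. The toy sketch below computes a Mantel-Haenszel pooled risk ratio from invented 2x2 counts; it is a generic illustration of such an analysis, not the method specified in the protocol.

```python
# Toy sketch: pooling per-trial harm counts (intervention vs control)
# into a Mantel-Haenszel risk ratio. All counts are invented.
import numpy as np

# Each row: harms_intervention, n_intervention, harms_control, n_control
trials = np.array([
    [4, 120, 9, 118],
    [2, 60, 3, 61],
    [7, 200, 12, 199],
], dtype=float)

a, n1, c, n2 = trials.T      # per-arm events and totals
N = n1 + n2                  # total participants per trial
# Mantel-Haenszel pooled risk ratio across strata (trials)
rr_mh = np.sum(a * n2 / N) / np.sum(c * n1 / N)
print(f"Pooled risk ratio, intervention vs control: {rr_mh:.2f}")
```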
Affiliation(s)
- Aditya U Kale
- Institute of Inflammation and Ageing, University of Birmingham, Birmingham, United Kingdom
- University Hospitals Birmingham NHS Foundation Trust, Birmingham, United Kingdom
- NIHR Birmingham Biomedical Research Centre, Birmingham, United Kingdom
- NIHR Incubator for AI and Digital Health Research, Birmingham, United Kingdom
- Henry David Jeffry Hogg
- Population Health Science Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, United Kingdom
- Russell Pearson
- Medicines and Healthcare Products Regulatory Agency, London, United Kingdom
- Ben Glocker
- Kheiron Medical Technologies, London, United Kingdom
- Department of Computing, Imperial College London, London, United Kingdom
- Su Golder
- Department of Health Sciences, University of York, York, United Kingdom
- April Coombe
- Institute of Applied Health Research, University of Birmingham, Birmingham, United Kingdom
- Justin Waring
- Health Services Management Centre, University of Birmingham, Birmingham, United Kingdom
- Xiaoxuan Liu
- Institute of Inflammation and Ageing, University of Birmingham, Birmingham, United Kingdom
- University Hospitals Birmingham NHS Foundation Trust, Birmingham, United Kingdom
- NIHR Birmingham Biomedical Research Centre, Birmingham, United Kingdom
- NIHR Incubator for AI and Digital Health Research, Birmingham, United Kingdom
- David J Moore
- Institute of Applied Health Research, University of Birmingham, Birmingham, United Kingdom
- Alastair K Denniston
- Institute of Inflammation and Ageing, University of Birmingham, Birmingham, United Kingdom
- University Hospitals Birmingham NHS Foundation Trust, Birmingham, United Kingdom
- NIHR Birmingham Biomedical Research Centre, Birmingham, United Kingdom
- NIHR Incubator for AI and Digital Health Research, Birmingham, United Kingdom
14
Wang F, Beecy A. Implementing AI models in clinical workflows: a roadmap. BMJ Evid Based Med 2024. [PMID: 38914450] [DOI: 10.1136/bmjebm-2023-112727]
Affiliation(s)
- Fei Wang
- Weill Cornell Medical College, New York, New York, USA
- Ashley Beecy
- Weill Cornell Medical College, New York, New York, USA
- NewYork-Presbyterian Hospital, New York, New York, USA
15
Ong JCL, Chang SYH, William W, Butte AJ, Shah NH, Chew LST, Liu N, Doshi-Velez F, Lu W, Savulescu J, Ting DSW. Ethical and regulatory challenges of large language models in medicine. Lancet Digit Health 2024; 6:e428-e432. [PMID: 38658283] [DOI: 10.1016/s2589-7500(24)00061-x]
Abstract
With the rapid growth of interest in and use of large language models (LLMs) across various industries, we are facing some crucial and profound ethical concerns, especially in the medical field. The unique technical architecture and purported emergent abilities of LLMs differentiate them substantially from other artificial intelligence (AI) models and natural language processing techniques used, necessitating a nuanced understanding of LLM ethics. In this Viewpoint, we highlight ethical concerns stemming from the perspectives of users, developers, and regulators, notably focusing on data privacy and rights of use, data provenance, intellectual property contamination, and broad applications and plasticity of LLMs. A comprehensive framework and mitigating strategies will be imperative for the responsible integration of LLMs into medical practice, ensuring alignment with ethical principles and safeguarding against potential societal risks.
Affiliation(s)
- Jasmine Chiat Ling Ong
- Division of Pharmacy, Singapore General Hospital, Singapore; Duke-NUS Medical School, National University of Singapore, Singapore
- Shelley Yin-Hsi Chang
- Department of Ophthalmology, Chang Gung Memorial Hospital, Linkou Medical Center, Taoyuan, Taiwan; College of Medicine, Chang Gung University, Taoyuan, Taiwan
- Wasswa William
- Department of Biomedical Sciences and Engineering, Mbarara University of Science and Technology, Mbarara, Uganda
- Atul J Butte
- Bakar Computational Health Sciences Institute, and Department of Pediatrics, University of California, San Francisco, San Francisco, CA, USA; Center for Data-Driven Insights and Innovation, University of California Health, Oakland, CA, USA
- Nigam H Shah
- Stanford Health Care, Palo Alto, CA, USA; Department of Medicine, and Clinical Excellence Research Center, School of Medicine, Stanford University, Stanford, CA, USA
- Lita Sui Tjien Chew
- Department of Pharmacy, National University of Singapore, Singapore; Singapore Health Services, Pharmacy and Therapeutics Council Office, Singapore; Department of Pharmacy, National Cancer Centre Singapore, Singapore
- Nan Liu
- Duke-NUS Medical School, National University of Singapore, Singapore
- Finale Doshi-Velez
- Harvard Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, USA
- Wei Lu
- StatNLP Research Group, Singapore University of Technology and Design, Singapore
- Julian Savulescu
- Murdoch Children's Research Institute, Melbourne, VIC, Australia; Centre for Biomedical Ethics, Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Oxford Uehiro Centre for Practical Ethics, Faculty of Philosophy, University of Oxford, Oxford, UK
- Daniel Shu Wei Ting
- Duke-NUS Medical School, National University of Singapore, Singapore; Artificial Intelligence and Digital Innovation, Singapore Eye Research Institute, Singapore National Eye Center, Singapore Health Service, Singapore; Byers Eye Institute, Stanford University, Palo Alto, CA, USA
16
Scott IA, van der Vegt A, Lane P, McPhail S, Magrabi F. Achieving large-scale clinician adoption of AI-enabled decision support. BMJ Health Care Inform 2024; 31:e100971. [PMID: 38816209] [PMCID: PMC11141172] [DOI: 10.1136/bmjhci-2023-100971]
Abstract
Computerised decision support (CDS) tools enabled by artificial intelligence (AI) seek to enhance accuracy and efficiency of clinician decision-making at the point of care. Statistical models developed using machine learning (ML) underpin most current tools. However, despite thousands of models and hundreds of regulator-approved tools internationally, large-scale uptake into routine clinical practice has proved elusive. While underdeveloped system readiness and investment in AI/ML within Australia and perhaps other countries are impediments, clinician ambivalence towards adopting these tools at scale could be a major inhibitor. We propose a set of principles and several strategic enablers for obtaining broad clinician acceptance of AI/ML-enabled CDS tools.
Affiliation(s)
- Ian A Scott
- Internal Medicine and Clinical Epidemiology, Princess Alexandra Hospital, Brisbane, Queensland, Australia
- Centre for Health Services Research, The University of Queensland Faculty of Medicine and Biomedical Sciences, Brisbane, Queensland, Australia
- Anton van der Vegt
- Digital Health Centre, The University of Queensland Faculty of Medicine and Biomedical Sciences, Herston, Queensland, Australia
- Paul Lane
- Safety, Quality and Innovation, The Prince Charles Hospital, Brisbane, Queensland, Australia
- Steven McPhail
- Australian Centre for Health Services Innovation, Queensland University of Technology Faculty of Health, Brisbane, Queensland, Australia
- Farah Magrabi
- Macquarie University, Sydney, New South Wales, Australia
17
Sulaieva O, Dudin O, Koshyk O, Panko M, Kobyliak N. Digital pathology implementation in cancer diagnostics: towards informed decision-making. Front Digit Health 2024; 6:1358305. [PMID: 38873358] [PMCID: PMC11169727] [DOI: 10.3389/fdgth.2024.1358305]
Abstract
Digital pathology (DP) has become a part of the cancer healthcare system, creating additional value for cancer patients. DP implementation in clinical practice provides plenty of benefits but also harbors hidden ethical challenges affecting physician-patient relationships. This paper addresses the ethical obligation to transform the physician-patient relationship for informed and responsible decision-making when using artificial intelligence (AI)-based tools for cancer diagnostics. DP application makes it possible to improve the performance of the human-AI team, shifting focus from AI challenges towards the benefits of augmented human intelligence (AHI). AHI enhances analytical sensitivity and empowers pathologists to deliver accurate diagnoses and assess predictive biomarkers for further personalized treatment of cancer patients. At the same time, patients have a right to know about the use of AI tools, their accuracy, strengths and limitations, the measures taken to protect privacy, and the applicable legal protections. This defines the duty of physicians to provide the relevant information about AHI-based solutions to patients and the community, building transparency, understanding and trust, respecting patients' autonomy, and empowering informed decision-making in oncology.
Affiliation(s)
- Oksana Sulaieva
- Medical Laboratory CSD, Kyiv, Ukraine
- Endocrinology Department, Bogomolets National Medical University, Kyiv, Ukraine
- Nazarii Kobyliak
- Medical Laboratory CSD, Kyiv, Ukraine
- Endocrinology Department, Bogomolets National Medical University, Kyiv, Ukraine
18
Leroy G, Andrews JG, KeAlohi-Preece M, Jaswani A, Song H, Galindo MK, Rice SA. Transparent deep learning to identify autism spectrum disorders (ASD) in EHR using clinical notes. J Am Med Inform Assoc 2024; 31:1313-1321. [PMID: 38626184] [PMCID: PMC11105145] [DOI: 10.1093/jamia/ocae080]
Abstract
OBJECTIVE: Machine learning (ML) is increasingly employed to diagnose medical conditions, with algorithms trained to assign a single label using a black-box approach. We created an ML approach using deep learning that generates outcomes that are transparent and in line with clinical, diagnostic rules. We demonstrate our approach for autism spectrum disorders (ASD), a neurodevelopmental condition with increasing prevalence. METHODS: We use unstructured data from the Centers for Disease Control and Prevention (CDC) surveillance records labeled by a CDC-trained clinician with ASD A1-3 and B1-4 criterion labels per sentence and with ASD case labels per record using Diagnostic and Statistical Manual of Mental Disorders (DSM-5) rules. One rule-based and three deep ML algorithms and six ensembles were compared and evaluated using a test set with 6773 sentences (N = 35 cases) set aside in advance. Criterion and case labeling were evaluated for each ML algorithm and ensemble. Case labeling outcomes were also compared with seven traditional tests. RESULTS: Performance for criterion labeling was highest for the hybrid BiLSTM ML model. The best case labeling was achieved by an ensemble of two BiLSTM ML models using a majority vote. It achieved 100% precision (or PPV), 83% recall (or sensitivity), 100% specificity, 91% accuracy, and 0.91 F-measure. A comparison with existing diagnostic tests shows that our best ensemble was more accurate overall. CONCLUSIONS: Transparent ML is achievable even with small datasets. By focusing on intermediate steps, deep ML can provide transparent decisions. By leveraging data redundancies, ML errors at the intermediate level have a low impact on final outcomes.
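The winning configuration above (a majority vote over two BiLSTM models, evaluated by precision, recall, specificity, accuracy, and F-measure) can be pictured with a short sketch. The arrays and the tie-breaking rule below are assumptions for illustration, not the authors' code.

```python
# Illustrative majority-vote ensembling over binary per-record case
# labels, plus the metrics reported above. Data are stand-ins.
import numpy as np
from sklearn.metrics import (accuracy_score, f1_score,
                             precision_score, recall_score)

def majority_vote(predictions: np.ndarray) -> np.ndarray:
    """predictions: (n_models, n_records) binary labels; ties -> positive."""
    return (predictions.mean(axis=0) >= 0.5).astype(int)

y_true = np.array([1, 1, 0, 0, 1, 0])
preds = np.array([[1, 1, 0, 0, 0, 0],   # model A
                  [1, 0, 0, 1, 1, 0]])  # model B
y_hat = majority_vote(preds)

# Specificity is not a built-in sklearn scorer, so compute it directly.
tn = np.sum((y_true == 0) & (y_hat == 0))
specificity = tn / np.sum(y_true == 0)
print(precision_score(y_true, y_hat), recall_score(y_true, y_hat),
      specificity, accuracy_score(y_true, y_hat), f1_score(y_true, y_hat))
```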
Affiliation(s)
- Gondy Leroy
- Department of Management Information Systems, The University of Arizona, Tucson, AZ 85621, United States
- Jennifer G Andrews
- Department of Pediatrics, The University of Arizona, Tucson, AZ 85621, United States
- Ajay Jaswani
- Department of Management Information Systems, The University of Arizona, Tucson, AZ 85621, United States
- Hyunju Song
- Department of Computer Science, The University of Arizona, Tucson, AZ 85621, United States
- Maureen Kelly Galindo
- Department of Pediatrics, The University of Arizona, Tucson, AZ 85621, United States
- Sydney A Rice
- Department of Pediatrics, The University of Arizona, Tucson, AZ 85621, United States
19
Koch LM, Baumgartner CF, Berens P. Distribution shift detection for the postmarket surveillance of medical AI algorithms: a retrospective simulation study. NPJ Digit Med 2024; 7:120. [PMID: 38724581] [PMCID: PMC11082139] [DOI: 10.1038/s41746-024-01085-w]
Abstract
Distribution shifts remain a problem for the safe application of regulated medical AI systems, and may impact their real-world performance if undetected. Postmarket shifts can occur, for example, if algorithms developed on data from various acquisition settings and a heterogeneous population are predominantly applied in hospitals with lower quality data acquisition or other centre-specific acquisition factors, or where some ethnicities are over-represented. Therefore, distribution shift detection could be important for monitoring AI-based medical products during postmarket surveillance. We implemented and evaluated three deep-learning based shift detection techniques (classifier-based, deep kernel, and multiple univariate Kolmogorov-Smirnov tests) on simulated shifts in a dataset of 130,486 retinal images. We trained a deep learning classifier for diabetic retinopathy grading. We then simulated population shifts by changing the prevalence of patients' sex, ethnicity, and co-morbidities, and acquisition shifts by changes in image quality. We observed classification subgroup performance disparities with respect to image quality, patient sex, ethnicity and co-morbidity presence. The sensitivity at detecting referable diabetic retinopathy ranged from 0.50 to 0.79 for different ethnicities. This motivates the need for detecting shifts after deployment. Classifier-based tests performed best overall, with perfect detection rates for quality and co-morbidity subgroup shifts at a sample size of 1000. It was the only method to detect shifts in patient sex, but required large sample sizes (>30,000). All methods identified easier-to-detect out-of-distribution shifts with small (≤300) sample sizes. We conclude that effective tools exist for detecting clinically relevant distribution shifts. In particular, classifier-based tests can be easily implemented as components in the post-market surveillance strategy of medical device manufacturers.
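Of the three detectors evaluated, the multiple univariate Kolmogorov-Smirnov approach is the easiest to sketch: run a two-sample KS test per feature between a reference set and deployment data, with a multiple-testing correction. The feature arrays, dimensions, and threshold below are illustrative assumptions, not the study's implementation.

```python
# Minimal sketch of shift detection via multiple univariate two-sample
# Kolmogorov-Smirnov tests with a Bonferroni correction.
import numpy as np
from scipy.stats import ks_2samp

def ks_shift_detected(reference, deployment, alpha=0.05):
    """reference, deployment: (n_samples, n_features) arrays of model
    embeddings or image features. Flags a shift if any feature's KS test
    rejects at the Bonferroni-corrected level."""
    n_features = reference.shape[1]
    p_values = [ks_2samp(reference[:, j], deployment[:, j]).pvalue
                for j in range(n_features)]
    return min(p_values) < alpha / n_features

rng = np.random.default_rng(0)
ref = rng.normal(size=(1000, 32))
dep = rng.normal(size=(1000, 32))
dep[:, 0] += 0.5          # simulated shift in one feature
print(ks_shift_detected(ref, dep))  # True
```

Classifier-based tests, which the study found most sensitive, instead train a model to distinguish reference from deployment samples and test whether its accuracy exceeds chance.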
Affiliation(s)
- Lisa M Koch
- Hertie Institute for AI in Brain Health, University of Tübingen, Tübingen, Germany
- Christian F Baumgartner
- Cluster of Excellence Machine Learning: New Perspectives for Science, University of Tübingen, Tübingen, Germany
- Faculty of Health Sciences and Medicine, University of Lucerne, Lucerne, Switzerland
- Philipp Berens
- Hertie Institute for AI in Brain Health, University of Tübingen, Tübingen, Germany
- Cluster of Excellence Machine Learning: New Perspectives for Science, University of Tübingen, Tübingen, Germany
- Tübingen AI Center, Tübingen, Germany
20
Khan SD, Hoodbhoy Z, Raja MHR, Kim JY, Hogg HDJ, Manji AAA, Gulamali F, Hasan A, Shaikh A, Tajuddin S, Khan NS, Patel MR, Balu S, Samad Z, Sendak MP. Frameworks for procurement, integration, monitoring, and evaluation of artificial intelligence tools in clinical settings: A systematic review. PLOS Digit Health 2024; 3:e0000514. [PMID: 38809946] [PMCID: PMC11135672] [DOI: 10.1371/journal.pdig.0000514]
Abstract
Research on the applications of artificial intelligence (AI) tools in medicine has increased exponentially over the last few years, but its implementation in clinical practice has not seen a commensurate increase, with a lack of consensus on implementing and maintaining such tools. This systematic review aims to summarize frameworks focusing on procuring, implementing, monitoring, and evaluating AI tools in clinical practice. A comprehensive literature search, following PRISMA guidelines, was performed on MEDLINE, Wiley Cochrane, Scopus, and EBSCO databases to identify and include articles recommending practices, frameworks or guidelines for AI procurement, integration, monitoring, and evaluation. From the included articles, data regarding study aim, use of a framework, rationale of the framework, and details regarding AI implementation involving procurement, integration, monitoring, and evaluation were extracted. The extracted details were then mapped onto the Donabedian Plan, Do, Study, Act cycle domains. The search yielded 17,537 unique articles, out of which 47 were evaluated for inclusion based on their full texts and 25 articles were included in the review. Common themes extracted included transparency, feasibility of operation within existing workflows, integrating into existing workflows, validation of the tool using predefined performance indicators, and improving the algorithm and/or adjusting the tool to improve performance. Among the four domains (Plan, Do, Study, Act), the most common domain was Plan (84%, n = 21), followed by Study (60%, n = 15), Do (52%, n = 13), and Act (24%, n = 6). Among 172 authors, only 1 (0.6%) was from a low-income country (LIC) and 2 (1.2%) were from lower-middle-income countries (LMICs). Healthcare professionals cite the implementation of AI tools within clinical settings as challenging owing to low levels of evidence focusing on integration in the Do and Act domains. The current healthcare AI landscape calls for increased data sharing and knowledge translation to facilitate common goals and reap maximum clinical benefit.
Affiliation(s)
- Sarim Dawar Khan
- CITRIC Health Data Science Centre, Department of Medicine, Aga Khan University, Karachi, Pakistan
- Zahra Hoodbhoy
- CITRIC Health Data Science Centre, Department of Medicine, Aga Khan University, Karachi, Pakistan
- Department of Paediatrics and Child Health, Aga Khan University, Karachi, Pakistan
- Jee Young Kim
- Duke Institute for Health Innovation, Duke University School of Medicine, Durham, North Carolina, United States
- Henry David Jeffry Hogg
- Population Health Science Institute, Newcastle University, Newcastle upon Tyne, United Kingdom
- Newcastle upon Tyne Hospitals NHS Foundation Trust, Newcastle upon Tyne, United Kingdom
- Moorfields Eye Hospital NHS Foundation Trust, London, United Kingdom
- Afshan Anwar Ali Manji
- CITRIC Health Data Science Centre, Department of Medicine, Aga Khan University, Karachi, Pakistan
- Freya Gulamali
- Duke Institute for Health Innovation, Duke University School of Medicine, Durham, North Carolina, United States
- Alifia Hasan
- Duke Institute for Health Innovation, Duke University School of Medicine, Durham, North Carolina, United States
- Asim Shaikh
- CITRIC Health Data Science Centre, Department of Medicine, Aga Khan University, Karachi, Pakistan
- Salma Tajuddin
- CITRIC Health Data Science Centre, Department of Medicine, Aga Khan University, Karachi, Pakistan
- Nida Saddaf Khan
- CITRIC Health Data Science Centre, Department of Medicine, Aga Khan University, Karachi, Pakistan
- Manesh R. Patel
- Duke Clinical Research Institute, Duke University School of Medicine, Durham, North Carolina, United States
- Division of Cardiology, Duke University School of Medicine, Durham, North Carolina, United States
- Suresh Balu
- Duke Institute for Health Innovation, Duke University School of Medicine, Durham, North Carolina, United States
- Zainab Samad
- CITRIC Health Data Science Centre, Department of Medicine, Aga Khan University, Karachi, Pakistan
- Department of Medicine, Aga Khan University, Karachi, Pakistan
- Mark P. Sendak
- Duke Institute for Health Innovation, Duke University School of Medicine, Durham, North Carolina, United States
21
|
Ross J, Hammouche S, Chen Y, Rockall AG. Beyond regulatory compliance: evaluating radiology artificial intelligence applications in deployment. Clin Radiol 2024; 79:338-345. [PMID: 38360516] [DOI: 10.1016/j.crad.2024.01.026]
Abstract
The implementation of artificial intelligence (AI) applications in routine practice, following regulatory approval, is currently limited by practical concerns around reliability, accountability, trust, safety, and governance, in addition to factors such as cost-effectiveness and institutional information technology support. When a technology is new and relatively untested in a field, professional confidence is lacking and there is a sense of the need to go above the baseline level of validation and compliance. In this article, we propose an approach that goes beyond standard regulatory compliance for AI apps that are approved for marketing, including independent benchmarking in the lab as well as clinical audit in practice, with the aims of increasing trust and preventing harm.
Affiliation(s)
- J Ross: Department of Cancer and Surgery, Imperial College London, UK
- S Hammouche: Department of Cancer and Surgery, Imperial College London, UK
- Y Chen: School of Medicine, University of Nottingham, UK
- A G Rockall: Department of Cancer and Surgery, Imperial College London, UK
22
Kwong JCC, Wu J, Malik S, Khondker A, Gupta N, Bodnariuc N, Narayana K, Malik M, van der Kwast TH, Johnson AEW, Zlotta AR, Kulkarni GS. Predicting non-muscle invasive bladder cancer outcomes using artificial intelligence: a systematic review using APPRAISE-AI. NPJ Digit Med 2024; 7:98. [PMID: 38637674] [PMCID: PMC11026453] [DOI: 10.1038/s41746-024-01088-7]
Abstract
Accurate prediction of recurrence and progression in non-muscle invasive bladder cancer (NMIBC) is essential to inform management and eligibility for clinical trials. Despite substantial interest in developing artificial intelligence (AI) applications in NMIBC, their clinical readiness remains unclear. This systematic review aimed to critically appraise AI studies predicting NMIBC outcomes and to identify common methodological and reporting pitfalls. MEDLINE, EMBASE, Web of Science, and Scopus were searched from inception to February 5th, 2024 for AI studies predicting NMIBC recurrence or progression. APPRAISE-AI was used to assess the methodological and reporting quality of these studies. The performance of AI and non-AI approaches reported within these studies was compared. A total of 15 studies (five on recurrence, four on progression, and six on both) were included. All studies were retrospective, with a median follow-up of 71 months (IQR 32-93) and median cohort size of 125 (IQR 93-309). Most studies were low quality, with only one classified as high quality. While AI models generally outperformed non-AI approaches with respect to accuracy, c-index, sensitivity, and specificity, this margin of benefit varied with study quality (median absolute performance difference of 10 for low, 22 for moderate, and 4 for high quality studies). Common pitfalls included dataset limitations, heterogeneous outcome definitions, methodological flaws, suboptimal model evaluation, and reproducibility issues. Recommendations to address these challenges are proposed. These findings emphasise the need for collaborative efforts between the urological and AI communities, paired with rigorous methodologies, to develop higher quality models, enabling AI to reach its potential in enhancing NMIBC care.
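A minimal sketch of the quality-stratified comparison described above, assuming per-study records of one shared metric for the AI model and the best non-AI comparator; all values are invented:

```python
import statistics
from collections import defaultdict

# Invented per-study records: (APPRAISE-AI quality tier, AI metric,
# best non-AI comparator metric), e.g. c-indices scaled to 0-100.
studies = [
    ("low", 78, 66), ("low", 81, 73), ("low", 74, 64),
    ("moderate", 85, 63), ("moderate", 79, 57),
    ("high", 76, 72),
]

diffs = defaultdict(list)
for quality, ai, non_ai in studies:
    diffs[quality].append(abs(ai - non_ai))

# Median absolute performance difference per quality tier, as reported above.
for quality in ("low", "moderate", "high"):
    print(quality, statistics.median(diffs[quality]))
```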
Affiliation(s)
- Jethro C C Kwong: Division of Urology, Department of Surgery, University of Toronto, Toronto, ON, Canada; Temerty Centre for AI Research and Education in Medicine, University of Toronto, Toronto, ON, Canada
- Jeremy Wu: Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada
- Shamir Malik: Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada
- Adree Khondker: Division of Urology, Department of Surgery, University of Toronto, Toronto, ON, Canada
- Naveen Gupta: Georgetown University School of Medicine, Georgetown University, Washington, DC, USA; Harvard T.H. Chan School of Public Health, Harvard University, Boston, MA, USA
- Nicole Bodnariuc: Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada
- Mikail Malik: Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada
- Theodorus H van der Kwast: Laboratory Medicine Program, University Health Network, Princess Margaret Cancer Centre, University of Toronto, Toronto, ON, Canada
- Alistair E W Johnson: Temerty Centre for AI Research and Education in Medicine, University of Toronto, Toronto, ON, Canada; Division of Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
- Alexandre R Zlotta: Division of Urology, Department of Surgery, University of Toronto, Toronto, ON, Canada; Division of Urology, Department of Surgery, Mount Sinai Hospital, Sinai Health System, Toronto, ON, Canada; Division of Urology, Department of Surgery, Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
- Girish S Kulkarni: Division of Urology, Department of Surgery, University of Toronto, Toronto, ON, Canada; Temerty Centre for AI Research and Education in Medicine, University of Toronto, Toronto, ON, Canada; Division of Urology, Department of Surgery, Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
23
Spielvogel CP, Haberl D, Mascherbauer K, Ning J, Kluge K, Traub-Weidinger T, Davies RH, Pierce I, Patel K, Nakuz T, Göllner A, Amereller D, Starace M, Monaci A, Weber M, Li X, Haug AR, Calabretta R, Ma X, Zhao M, Mascherbauer J, Kammerlander A, Hengstenberg C, Menezes LJ, Sciagra R, Treibel TA, Hacker M, Nitsche C. Diagnosis and prognosis of abnormal cardiac scintigraphy uptake suggestive of cardiac amyloidosis using artificial intelligence: a retrospective, international, multicentre, cross-tracer development and validation study. Lancet Digit Health 2024; 6:e251-e260. [PMID: 38519153] [DOI: 10.1016/s2589-7500(23)00265-0]
Abstract
BACKGROUND The diagnosis of cardiac amyloidosis can be established non-invasively by scintigraphy using bone-avid tracers, but visual assessment is subjective and can lead to misdiagnosis. We aimed to develop and validate an artificial intelligence (AI) system for standardised and reliable screening of cardiac amyloidosis-suggestive uptake and assess its prognostic value, using a multinational database of 99mTc-scintigraphy data across multiple tracers and scanners. METHODS In this retrospective, international, multicentre, cross-tracer development and validation study, 16 241 patients with 19 401 scans were included from nine centres: one hospital in Austria (consecutive recruitment Jan 4, 2010, to Aug 19, 2020), five hospital sites in London, UK (consecutive recruitment Oct 1, 2014, to Sept 29, 2022), two centres in China (selected scans from Jan 1, 2021, to Oct 31, 2022), and one centre in Italy (selected scans from Jan 1, 2011, to May 23, 2023). The dataset included all patients referred to whole-body 99mTc-scintigraphy with an anterior view and all 99mTc-labelled tracers currently used to identify cardiac amyloidosis-suggestive uptake. Exclusion criteria were image acquisition at less than 2 h (99mTc-3,3-diphosphono-1,2-propanodicarboxylic acid, 99mTc-hydroxymethylene diphosphonate, and 99mTc-methylene diphosphonate) or less than 1 h (99mTc-pyrophosphate) after tracer injection and if patients' imaging and clinical data could not be linked. Ground truth annotation was derived from centralised core-lab consensus reading of at least three independent experts (CN, TT-W, and JN). An AI system for detection of cardiac amyloidosis-associated high-grade cardiac tracer uptake was developed using data from one centre (Austria) and independently validated in the remaining centres. A multicase, multireader study and a medical algorithmic audit were conducted to assess clinician performance compared with AI and to evaluate and correct failure modes. The system's prognostic value in predicting mortality was tested in the consecutively recruited cohorts using Cox proportional hazards models for each cohort individually and for the combined cohorts. FINDINGS The prevalence of cases positive for cardiac amyloidosis-suggestive uptake was 142 (2%) of 9176 patients in the Austrian, 125 (2%) of 6763 patients in the UK, 63 (62%) of 102 patients in the Chinese, and 103 (52%) of 200 patients in the Italian cohorts. In the Austrian cohort, cross-validation performance showed an area under the curve (AUC) of 1·000 (95% CI 1·000-1·000). Independent validation yielded AUCs of 0·997 (0·993-0·999) for the UK, 0·925 (0·871-0·971) for the Chinese, and 1·000 (0·999-1·000) for the Italian cohorts. In the multicase, multireader study, five physicians disagreed in 22 (11%) of 200 cases (Fleiss' kappa 0·89), with a mean AUC of 0·946 (95% CI 0·924-0·967), which was inferior to AI (AUC 0·997 [0·991-1·000], p=0·0040). The medical algorithmic audit demonstrated the system's robustness across demographic factors, tracers, scanners, and centres. The AI's predictions were independently prognostic for overall mortality (adjusted hazard ratio 1·44 [95% CI 1·19-1·74], p<0·0001). INTERPRETATION AI-based screening of cardiac amyloidosis-suggestive uptake in patients undergoing scintigraphy was reliable, eliminated inter-rater variability, and provided prognostic value, with potential implications for identification, referral, and management pathways. FUNDING Pfizer.
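A minimal sketch of the prognostic analysis described above, using the lifelines library and simulated survival data in place of the study cohorts; the covariates, prevalence, and effect sizes are illustrative assumptions only:

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n = 500
ai_positive = rng.binomial(1, 0.05, n)          # ~5% screen positive, roughly as in the consecutive cohorts
age = rng.normal(70, 10, n)
# Simulated follow-up times with a higher hazard (shorter times) for
# AI-positive and older patients; parameters are invented.
baseline = rng.exponential(60, n)
time = baseline * np.exp(-0.4 * ai_positive - 0.02 * (age - 70))
event = rng.binomial(1, 0.6, n)                 # observed deaths vs. censoring

df = pd.DataFrame({"time": time, "event": event,
                   "ai_positive": ai_positive, "age": age})
cph = CoxPHFitter()
cph.fit(df, duration_col="time", event_col="event")
# exp(coef) for "ai_positive" plays the role of the adjusted hazard ratio.
print(cph.summary[["exp(coef)", "p"]])
```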
Affiliation(s)
- Clemens P Spielvogel: Department of Biomedical Imaging and Image-Guided Therapy, Division of Nuclear Medicine, Medical University of Vienna, Vienna, Austria
- David Haberl: Department of Biomedical Imaging and Image-Guided Therapy, Division of Nuclear Medicine, Medical University of Vienna, Vienna, Austria
- Katharina Mascherbauer: Department of Medicine II, Division of Cardiology, Medical University of Vienna, Vienna, Austria
- Jing Ning: Christian Doppler Laboratory for Applied Metabolomics, Medical University of Vienna, Vienna, Austria
- Kilian Kluge: Department of Biomedical Imaging and Image-Guided Therapy, Division of Nuclear Medicine, Medical University of Vienna, Vienna, Austria
- Tatjana Traub-Weidinger: Department of Biomedical Imaging and Image-Guided Therapy, Division of Nuclear Medicine, Medical University of Vienna, Vienna, Austria
- Rhodri H Davies: Institute of Cardiovascular Science, University College London, London, UK; Bart's Heart Centre, St Bartholomew's Hospital, West Smithfield, London, UK
- Iain Pierce: Institute of Cardiovascular Science, University College London, London, UK; Bart's Heart Centre, St Bartholomew's Hospital, West Smithfield, London, UK
- Kush Patel: Bart's Heart Centre, St Bartholomew's Hospital, West Smithfield, London, UK
- Thomas Nakuz: Department of Biomedical Imaging and Image-Guided Therapy, Division of Nuclear Medicine, Medical University of Vienna, Vienna, Austria
- Adelina Göllner: Department of Biomedical Imaging and Image-Guided Therapy, Division of Nuclear Medicine, Medical University of Vienna, Vienna, Austria
- Dominik Amereller: Department of Biomedical Imaging and Image-Guided Therapy, Division of Nuclear Medicine, Medical University of Vienna, Vienna, Austria
- Maria Starace: Department of Experimental and Clinical Biomedical Sciences, Nuclear Medicine Unit, University of Florence, Florence, Italy
- Alice Monaci: Department of Experimental and Clinical Biomedical Sciences, Nuclear Medicine Unit, University of Florence, Florence, Italy
- Michael Weber: Department of Biomedical Imaging and Image-Guided Therapy, Division of Nuclear Medicine, Medical University of Vienna, Vienna, Austria
- Xiang Li: Department of Biomedical Imaging and Image-Guided Therapy, Division of Nuclear Medicine, Medical University of Vienna, Vienna, Austria
- Alexander R Haug: Department of Biomedical Imaging and Image-Guided Therapy, Division of Nuclear Medicine, Medical University of Vienna, Vienna, Austria; Christian Doppler Laboratory for Applied Metabolomics, Medical University of Vienna, Vienna, Austria
- Raffaella Calabretta: Department of Biomedical Imaging and Image-Guided Therapy, Division of Nuclear Medicine, Medical University of Vienna, Vienna, Austria
- Xiaowei Ma: Department of Nuclear Medicine, Second Xiangya Hospital, Central South University, Changsha, China
- Min Zhao: Department of Nuclear Medicine, Third Xiangya Hospital, Central South University, Changsha, China
- Julia Mascherbauer: Department of Medicine II, Division of Cardiology, Medical University of Vienna, Vienna, Austria; Karl Landsteiner University of Health Sciences, Department of Internal Medicine 3, University Hospital St Pölten, Krems, Austria
- Andreas Kammerlander: Department of Medicine II, Division of Cardiology, Medical University of Vienna, Vienna, Austria
- Christian Hengstenberg: Department of Medicine II, Division of Cardiology, Medical University of Vienna, Vienna, Austria
- Leon J Menezes: Bart's Heart Centre, St Bartholomew's Hospital, West Smithfield, London, UK
- Roberto Sciagra: Department of Experimental and Clinical Biomedical Sciences, Nuclear Medicine Unit, University of Florence, Florence, Italy
- Thomas A Treibel: Institute of Cardiovascular Science, University College London, London, UK; Bart's Heart Centre, St Bartholomew's Hospital, West Smithfield, London, UK
- Marcus Hacker: Department of Biomedical Imaging and Image-Guided Therapy, Division of Nuclear Medicine, Medical University of Vienna, Vienna, Austria
- Christian Nitsche: Institute of Cardiovascular Science, University College London, London, UK; Department of Medicine II, Division of Cardiology, Medical University of Vienna, Vienna, Austria; Bart's Heart Centre, St Bartholomew's Hospital, West Smithfield, London, UK
24
Schaekermann M, Spitz T, Pyles M, Cole-Lewis H, Wulczyn E, Pfohl SR, Martin D, Jaroensri R, Keeling G, Liu Y, Farquhar S, Xue Q, Lester J, Hughes C, Strachan P, Tan F, Bui P, Mermel CH, Peng LH, Matias Y, Corrado GS, Webster DR, Virmani S, Semturs C, Liu Y, Horn I, Cameron Chen PH. Health equity assessment of machine learning performance (HEAL): a framework and dermatology AI model case study. EClinicalMedicine 2024; 70:102479. [PMID: 38685924] [PMCID: PMC11056401] [DOI: 10.1016/j.eclinm.2024.102479]
Abstract
Background Artificial intelligence (AI) has repeatedly been shown to encode historical inequities in healthcare. We aimed to develop a framework to quantitatively assess the performance equity of health AI technologies and to illustrate its utility via a case study. Methods Here, we propose a methodology to assess whether health AI technologies prioritise performance for patient populations experiencing worse outcomes, one that is complementary to existing fairness metrics. We developed the Health Equity Assessment of machine Learning performance (HEAL) framework, designed to quantitatively assess the performance equity of health AI technologies via a four-step interdisciplinary process to understand and quantify domain-specific criteria, and the resulting HEAL metric. As an illustrative case study (analysis conducted between October 2022 and January 2023), we applied the HEAL framework to a dermatology AI model. A set of 5420 teledermatology cases (store-and-forward cases from patients aged 20 years or older, submitted from primary care providers in the USA and skin cancer clinics in Australia), enriched for diversity in age, sex and race/ethnicity, was used to retrospectively evaluate the AI model's HEAL metric, defined as the likelihood that the AI model performs better for subpopulations with worse average health outcomes as compared to others. The likelihood that AI performance was anticorrelated to pre-existing health outcomes was estimated using bootstrap methods as the probability that the negated Spearman's rank correlation coefficient (i.e., "R") was greater than zero. Positive values of R suggest that subpopulations with poorer health outcomes have better AI model performance. Thus, the HEAL metric, defined as p(R > 0), measures how likely the AI technology is to prioritise performance for subpopulations with worse average health outcomes as compared to others (presented as a percentage below). Health outcomes were quantified as disability-adjusted life years (DALYs) when grouping by sex and age, and years of life lost (YLLs) when grouping by race/ethnicity. AI performance was measured as top-3 agreement with the reference diagnosis from a panel of 3 dermatologists per case. Findings Across all dermatologic conditions, the HEAL metric was 80.5% for prioritising AI performance of racial/ethnic subpopulations based on YLLs, and 92.1% and 0.0%, respectively, for prioritising AI performance of sex and age subpopulations based on DALYs. Certain dermatologic conditions were significantly associated with greater AI model performance compared to a reference category of less common conditions. For skin cancer conditions, the HEAL metric was 73.8% for prioritising AI performance of age subpopulations based on DALYs. Interpretation Analysis using the proposed HEAL framework showed that the dermatology AI model prioritised performance for race/ethnicity, sex (all conditions) and age (cancer conditions) subpopulations with respect to pre-existing health disparities. More work is needed to investigate ways of promoting equitable AI performance across age for non-cancer conditions and to better understand how AI models can contribute towards improving equity in health outcomes. Funding Google LLC.
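A minimal sketch of the HEAL bootstrap as defined above, with invented subgroup health burdens and per-case AI correctness; the sign convention follows the stated semantics (R > 0 when worse-off subgroups receive better AI performance):

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(42)

# Invented inputs: per-subgroup pre-existing health burden (e.g. YLLs) and
# per-case AI correctness (top-3 agreement) grouped by subgroup.
burden = {"A": 1200.0, "B": 950.0, "C": 400.0, "D": 700.0}
correct = {
    "A": rng.binomial(1, 0.80, 300),
    "B": rng.binomial(1, 0.75, 280),
    "C": rng.binomial(1, 0.72, 350),
    "D": rng.binomial(1, 0.78, 310),
}

def prioritisation_R(perf_by_group):
    groups = sorted(perf_by_group)
    rho, _ = spearmanr([burden[g] for g in groups],
                       [perf_by_group[g] for g in groups])
    # Positive when higher-burden (worse-off) subgroups see better AI
    # performance -- the quantity the paper calls R.
    return rho

# Bootstrap cases within each subgroup; HEAL metric = p(R > 0).
R = []
for _ in range(2000):
    perf = {g: rng.choice(c, size=c.size, replace=True).mean()
            for g, c in correct.items()}
    R.append(prioritisation_R(perf))
print(f"HEAL metric: {np.mean(np.array(R) > 0):.1%}")
```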
Affiliation(s)
- Malcolm Pyles: Advanced Clinical, Deerfield, IL, USA; Department of Dermatology, Cleveland Clinic, Cleveland, OH, USA
- Yuan Liu: Google Health, Mountain View, CA, USA
- Jenna Lester: Advanced Clinical, Deerfield, IL, USA; Department of Dermatology, University of California, San Francisco, CA, USA
- Peggy Bui: Google Health, Mountain View, CA, USA
- Yun Liu: Google Health, Mountain View, CA, USA
- Ivor Horn: Google Health, Mountain View, CA, USA
25
Wang R, Kuo PC, Chen LC, Seastedt KP, Gichoya JW, Celi LA. Drop the shortcuts: image augmentation improves fairness and decreases AI detection of race and other demographics from medical images. EBioMedicine 2024; 102:105047. [PMID: 38471396] [PMCID: PMC10945176] [DOI: 10.1016/j.ebiom.2024.105047]
Abstract
BACKGROUND It has been shown that AI models can learn race from medical images, leading to algorithmic bias. Our aim in this study was to enhance the fairness of medical image models by eliminating bias related to race, age, and sex. We hypothesise that models may be learning demographics via shortcut learning, and we combat this using image augmentation. METHODS This study included 44,953 patients who identified as Asian, Black, or White (mean age, 60.68 years ±18.21; 23,499 women) for a total of 194,359 chest X-rays (CXRs) from the MIMIC-CXR database. The CheXpert dataset, comprising 45,095 patients (mean age, 63.10 years ±18.14; 20,437 women) for a total of 134,300 CXRs, was used for external validation. We also collected 1195 3D brain magnetic resonance imaging (MRI) scans from the ADNI database, acquired from 273 participants (mean age, 76.97 years ±14.22; 142 women). DL models were trained on either non-augmented or augmented images and assessed using disparity metrics. The features learned by the models were analysed using task-transfer experiments and model visualisation techniques. FINDINGS In the detection of radiological findings, training a model using augmented CXR images was shown to reduce disparities in error rate among racial groups (-5.45%), age groups (-13.94%), and sexes (-22.22%). For AD detection, the model trained with augmented MRI images showed 53.11% and 31.01% reductions in disparities in error rate among age and sex groups, respectively. Image augmentation led to a reduction in the model's ability to identify demographic attributes and resulted in the model trained for clinical purposes incorporating fewer demographic features. INTERPRETATION The model trained using the augmented images was less likely to be influenced by demographic information in detecting image labels. These results demonstrate that the proposed augmentation scheme could enhance the fairness of interpretations by DL models when dealing with data from patients of different demographic backgrounds. FUNDING National Science and Technology Council (Taiwan), National Institutes of Health.
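A minimal sketch of the disparity-in-error-rate metric used to compare the two training regimes above; the predictions and group labels are toy values, and the paper's augmentation scheme itself is not reproduced here:

```python
import numpy as np

def error_rate_disparity(y_true, y_pred, groups):
    """Max-minus-min error rate across demographic groups (lower is fairer)."""
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    rates = [np.mean(y_pred[groups == g] != y_true[groups == g])
             for g in np.unique(groups)]
    return max(rates) - min(rates)

# Toy comparison of a model trained without vs. with augmentation.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
groups = np.array(["Asian", "Black", "White", "Black",
                   "White", "Asian", "White", "Black"])
plain = np.array([1, 1, 1, 0, 0, 1, 1, 0])      # hypothetical predictions
augmented = np.array([1, 0, 1, 0, 0, 0, 1, 1])  # hypothetical predictions
print(error_rate_disparity(y_true, plain, groups),
      error_rate_disparity(y_true, augmented, groups))
```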
Affiliation(s)
- Ryan Wang: Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan
- Po-Chih Kuo: Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan
- Li-Ching Chen: Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan
- Kenneth Patrick Seastedt: Department of Surgery, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA; Department of Thoracic Surgery, Roswell Park Comprehensive Cancer Center, Buffalo, NY, USA
- Leo Anthony Celi: Laboratory for Computational Physiology, Massachusetts Institute of Technology, Cambridge, MA, USA; Division of Pulmonary Critical Care and Sleep Medicine, Beth Israel Deaconess Medical Center, Boston, MA, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
26
Umer F, Adnan N. Generative artificial intelligence: synthetic datasets in dentistry. BDJ Open 2024; 10:13. [PMID: 38429258] [PMCID: PMC10907705] [DOI: 10.1038/s41405-024-00198-4]
Abstract
INTRODUCTION Artificial Intelligence (AI) algorithms, particularly Deep Learning (DL) models, are known to be data intensive. This has increased the demand for digital data in all domains of healthcare, including dentistry. The main hindrance to the progress of AI is access to the diverse datasets needed to train DL models to a performance comparable to that of subject experts. However, administration of these traditionally acquired datasets is challenging due to privacy regulations and the extensive manual annotation required from subject experts. Biases such as ethical and socioeconomic ones, as well as class imbalances, are also incorporated during the curation of these datasets, limiting their overall generalizability. These challenges prevent their accrual at a larger scale for training DL models. METHODS Generative AI techniques can be useful in the production of Synthetic Datasets (SDs) that can overcome the issues affecting traditionally acquired datasets. Variational autoencoders, generative adversarial networks, and diffusion models have been used to generate SDs. The following text reviews these generative AI techniques and their operation, and discusses the opportunities offered by SDs along with their challenges and potential solutions, to improve the understanding of healthcare professionals working in AI research. CONCLUSION Synthetic data customized to the needs of researchers can be produced to train robust AI models. These models, having been trained on such a diverse dataset, will be applicable for dissemination across countries. However, the limitations associated with SDs need to be better understood, and attempts should be made to overcome those concerns prior to their widespread use.
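As one concrete instance of the generative families reviewed above, a minimal generative adversarial network sketch in PyTorch that learns to produce a toy two-dimensional "synthetic dataset"; the data distribution, network sizes, and training schedule are illustrative assumptions, not the paper's setup:

```python
import torch
from torch import nn

# Toy stand-in for real tabular measurements: 2-D samples from a fixed Gaussian.
def real_batch(n):
    return torch.randn(n, 2) * torch.tensor([0.5, 2.0]) + torch.tensor([3.0, -1.0])

G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))  # noise -> sample
D = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))  # sample -> logit
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(2000):
    # Discriminator update: distinguish real from generated samples.
    real, fake = real_batch(64), G(torch.randn(64, 8)).detach()
    loss_d = bce(D(real), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # Generator update: produce samples the discriminator labels as real.
    fake = G(torch.randn(64, 8))
    loss_g = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

synthetic = G(torch.randn(1000, 8)).detach()   # a "synthetic dataset"
print(synthetic.mean(0), synthetic.std(0))     # should approach the real statistics
```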
Affiliation(s)
- Fahad Umer: Operative Dentistry and Endodontics, Department of Surgery, Aga Khan University Hospital, Karachi, Pakistan
- Niha Adnan: Operative Dentistry and Endodontics, Department of Surgery, Aga Khan University Hospital, Karachi, Pakistan
27
Fehr J, Citro B, Malpani R, Lippert C, Madai VI. A trustworthy AI reality-check: the lack of transparency of artificial intelligence products in healthcare. Front Digit Health 2024; 6:1267290. [PMID: 38455991] [PMCID: PMC10919164] [DOI: 10.3389/fdgth.2024.1267290]
Abstract
Trustworthy medical AI requires transparency about the development and testing of underlying algorithms to identify biases and communicate potential risks of harm. Abundant guidance exists on how to achieve transparency for medical AI products, but it is unclear whether publicly available information adequately informs about their risks. To assess this, we retrieved public documentation on the 14 available CE-certified AI-based radiology products of the IIb risk category in the EU from vendor websites, scientific publications, and the European EUDAMED database. Using a self-designed survey, we reported on their development, validation, ethical considerations, and deployment caveats, according to trustworthy AI guidelines. We scored each question with 0, 0.5, or 1 to rate whether the required information was "unavailable," "partially available," or "fully available." The transparency of each product was calculated relative to all 55 questions. Transparency scores ranged from 6.4% to 60.9%, with a median of 29.1%. Major transparency gaps included missing documentation on training data, ethical considerations, and limitations for deployment. Ethical aspects like consent, safety monitoring, and GDPR compliance were rarely documented. Furthermore, deployment caveats for different demographics and medical settings were scarce. In conclusion, public documentation of authorized medical AI products in Europe lacks sufficient public transparency to inform about safety and risks. We call on lawmakers and regulators to establish legally mandated requirements for public and substantive transparency to fulfill the promise of trustworthy AI for health.
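A minimal sketch of the scoring rule described above (55 questions scored 0, 0.5, or 1, reported relative to the maximum possible score); the answer profile is invented, chosen so the output reproduces the reported 29.1% median:

```python
# Scoring rule: each of 55 survey questions is rated on availability of the
# required information; transparency is the share of the maximum score.
SCORES = {"unavailable": 0.0, "partially available": 0.5, "fully available": 1.0}

def transparency(answers):
    assert len(answers) == 55, "the survey has 55 questions"
    return 100 * sum(SCORES[a] for a in answers) / len(answers)

example = (["fully available"] * 10 + ["partially available"] * 12
           + ["unavailable"] * 33)
print(f"{transparency(example):.1f}%")   # 29.1%
```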
Affiliation(s)
- Jana Fehr: Digital Health & Machine Learning, Hasso Plattner Institute, Potsdam, Germany; Digital Engineering Faculty, University of Potsdam, Potsdam, Germany; QUEST Center for Responsible Research, Berlin Institute of Health (BIH), Charité Universitätsmedizin Berlin, Berlin, Germany
- Brian Citro: Independent Researcher, Chicago, IL, United States
- Christoph Lippert: Digital Health & Machine Learning, Hasso Plattner Institute, Potsdam, Germany; Digital Engineering Faculty, University of Potsdam, Potsdam, Germany; Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Vince I. Madai: QUEST Center for Responsible Research, Berlin Institute of Health (BIH), Charité Universitätsmedizin Berlin, Berlin, Germany; Faculty of Computing, Engineering and the Built Environment, School of Computing and Digital Technology, Birmingham City University, Birmingham, United Kingdom
28
Groh M, Badri O, Daneshjou R, Koochek A, Harris C, Soenksen LR, Doraiswamy PM, Picard R. Deep learning-aided decision support for diagnosis of skin disease across skin tones. Nat Med 2024; 30:573-583. [PMID: 38317019] [PMCID: PMC10878981] [DOI: 10.1038/s41591-023-02728-3]
Abstract
Although advances in deep learning systems for image-based medical diagnosis demonstrate their potential to augment clinical decision-making, the effectiveness of physician-machine partnerships remains an open question, in part because physicians and algorithms are both susceptible to systematic errors, especially for diagnosis of underrepresented populations. Here we present results from a large-scale digital experiment involving board-certified dermatologists (n = 389) and primary-care physicians (n = 459) from 39 countries to evaluate the accuracy of diagnoses submitted by physicians in a store-and-forward teledermatology simulation. In this experiment, physicians were presented with 364 images spanning 46 skin diseases and asked to submit up to four differential diagnoses. Specialists and generalists achieved diagnostic accuracies of 38% and 19%, respectively, but both specialists and generalists were four percentage points less accurate for the diagnosis of images of dark skin as compared to light skin. Fair deep learning system decision support improved the diagnostic accuracy of both specialists and generalists by more than 33%, but exacerbated the gap in the diagnostic accuracy of generalists across skin tones. These results demonstrate that well-designed physician-machine partnerships can enhance the diagnostic accuracy of physicians, illustrating that success in improving overall diagnostic accuracy does not necessarily address bias.
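A minimal sketch of the subgroup accuracy comparison behind the reported gap, using one plausible definition of diagnostic accuracy (the reference diagnosis appearing among the submitted differentials); the cases are invented and the study's exact scoring may differ:

```python
import numpy as np

def top_k_accuracy(differentials, truth, k=4):
    """Share of cases whose reference diagnosis is among the first k differentials."""
    return float(np.mean([t in d[:k] for d, t in zip(differentials, truth)]))

# Invented cases: (submitted differentials, reference diagnosis, image skin tone).
cases = [
    (["eczema", "psoriasis"], "psoriasis", "dark"),
    (["melanoma"], "melanoma", "light"),
    (["acne", "rosacea", "lupus"], "lupus", "dark"),
    (["tinea", "eczema"], "psoriasis", "light"),
]
for tone in ("light", "dark"):
    subset = [(d, t) for d, t, s in cases if s == tone]
    acc = top_k_accuracy([d for d, _ in subset], [t for _, t in subset])
    print(tone, f"{acc:.0%}")   # the difference between these is the accuracy gap
```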
Affiliation(s)
- Matthew Groh: Northwestern University Kellogg School of Management, Evanston, IL, USA; MIT Media Lab, Cambridge, MA, USA
- Omar Badri: Northeast Dermatology Associates, Beverly, MA, USA
- Roxana Daneshjou: Stanford Department of Biomedical Data Science, Stanford, CA, USA; Stanford Department of Dermatology, Redwood City, CA, USA
- Luis R Soenksen: Wyss Institute for Bioinspired Engineering at Harvard, Boston, MA, USA
- P Murali Doraiswamy: MIT Media Lab, Cambridge, MA, USA; Duke University School of Medicine, Durham, NC, USA
29
Evans H, Snead D. Why do errors arise in artificial intelligence diagnostic tools in histopathology and how can we minimize them? Histopathology 2024; 84:279-287. [PMID: 37921030] [DOI: 10.1111/his.15071]
Abstract
Artificial intelligence (AI)-based diagnostic tools can offer numerous benefits to the field of histopathology, including improved diagnostic accuracy, efficiency and productivity. As a result, such tools are likely to have an increasing role in routine practice. However, all AI tools are prone to errors, and these AI-associated errors have been identified as a major risk in the introduction of AI into healthcare. The errors made by AI tools are different, in terms of both cause and nature, to the errors made by human pathologists. As highlighted by the National Institute for Health and Care Excellence, it is imperative that practising pathologists understand the potential limitations of AI tools, including the errors made. Pathologists are in a unique position to be gatekeepers of AI tool use, maximizing patient benefit while minimizing harm. Furthermore, their pathological knowledge is essential to understanding when, and why, errors have occurred and so to developing safer future algorithms. This paper summarises the literature on errors made by AI diagnostic tools in histopathology. These include erroneous errors, data concerns (data bias, hidden stratification, data imbalances, distributional shift, and lack of generalisability), reinforcement of outdated practices, unsafe failure mode, automation bias, and insensitivity to impact. Methods to reduce errors in both tool design and clinical use are discussed, and the practical roles for pathologists in error minimisation are highlighted. This aims to inform and empower pathologists to move safely through this seismic change in practice and help ensure that novel AI tools are adopted safely.
Affiliation(s)
- Harriet Evans: Histopathology Department, University Hospitals Coventry and Warwickshire NHS Trust, Coventry, UK; Warwick Medical School, University of Warwick, Coventry, UK
- David Snead: Histopathology Department, University Hospitals Coventry and Warwickshire NHS Trust, Coventry, UK; Warwick Medical School, University of Warwick, Coventry, UK
30
Kerasidou CX, Malone M, Daly A, Tava F. Machine learning models, trusted research environments and UK health data: ensuring a safe and beneficial future for AI development in healthcare. J Med Ethics 2023; 49:838-843. [PMID: 36997310] [DOI: 10.1136/jme-2022-108696]
Abstract
Digitalisation of health and the use of health data in artificial intelligence and machine learning (ML), including for applications that will in turn be used in healthcare, are major themes permeating current UK and other countries' healthcare systems and policies. Obtaining rich and representative data is key for robust ML development, and UK health data sets are particularly attractive sources for this. However, ensuring that such research and development is in the public interest, produces public benefit and preserves privacy are key challenges. Trusted research environments (TREs) are positioned as a way of balancing the diverging interests in healthcare data research with privacy and public benefit. Using TRE data to train ML models presents various challenges to the balance previously struck between these societal interests, which have hitherto not been discussed in the literature. These challenges include the possibility of personal data being disclosed in ML models, the dynamic nature of ML models and how public benefit may be (re)conceived in this context. For ML research to be facilitated using UK health data, TREs and others involved in the UK health data policy ecosystem need to be aware of these issues and work to address them in order to continue to ensure a 'safe' health and care data environment that truly serves the public.
Affiliation(s)
- Maeve Malone: Dundee Law School, School of Humanities Social Sciences and Law, University of Dundee, Dundee, UK
- Angela Daly: Leverhulme Research Centre for Forensic Science, School of Science and Engineering, University of Dundee, Dundee, UK
31
Wang Y, Li N, Chen L, Wu M, Meng S, Dai Z, Zhang Y, Clarke M. Guidelines, Consensus Statements, and Standards for the Use of Artificial Intelligence in Medicine: Systematic Review. J Med Internet Res 2023; 25:e46089. [PMID: 37991819] [PMCID: PMC10701655] [DOI: 10.2196/46089]
Abstract
BACKGROUND The application of artificial intelligence (AI) in the delivery of health care is a promising area, and guidelines, consensus statements, and standards on AI regarding various topics have been developed. OBJECTIVE We performed this study to assess the quality of guidelines, consensus statements, and standards in the field of AI for medicine and to provide a foundation for recommendations about the future development of AI guidelines. METHODS We searched 7 electronic databases from database establishment to April 6, 2022, and screened articles involving AI guidelines, consensus statements, and standards for eligibility. The AGREE II (Appraisal of Guidelines for Research & Evaluation II) and RIGHT (Reporting Items for Practice Guidelines in Healthcare) tools were used to assess the methodological and reporting quality of the included articles. RESULTS This systematic review included 19 guideline articles, 14 consensus statement articles, and 3 standard articles published between 2019 and 2022. Their content involved disease screening, diagnosis, and treatment; AI intervention trial reporting; AI imaging development and collaboration; AI data application; and AI ethics governance and applications. Our quality assessment revealed that the average overall AGREE II score was 4.0 (range 2.2-5.5; 7-point Likert scale) and the mean overall reporting rate of the RIGHT tool was 49.4% (range 25.7%-77.1%). CONCLUSIONS The results indicated important differences in the quality of different AI guidelines, consensus statements, and standards. We made recommendations for improving their methodological and reporting quality. TRIAL REGISTRATION PROSPERO International Prospective Register of Systematic Reviews (CRD42022321360); https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=321360.
Affiliation(s)
- Ying Wang: Department of Medical Administration, West China Hospital, Sichuan University, Chengdu, China
- Nian Li: Department of Medical Administration, West China Hospital, Sichuan University, Chengdu, China
- Lingmin Chen: Department of Anesthesiology, National Clinical Research Center for Geriatrics, West China Hospital, Sichuan University, Chengdu, China
- Miaomiao Wu: Department of General Practice, National Clinical Research Center for Geriatrics, International Medical Center, West China Hospital, Sichuan University, Chengdu, China
- Sha Meng: Department of Operation Management, West China Hospital, Sichuan University, Chengdu, China
- Zelei Dai: Department of Radiation Oncology, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu, China
- Yonggang Zhang: Department of Periodical Press, National Clinical Research Center for Geriatrics, Chinese Evidence-based Medicine Center, Nursing Key Laboratory of Sichuan Province, West China Hospital, Sichuan University, Chengdu, China
- Mike Clarke: Northern Ireland Methodology Hub, Queen's University Belfast, Belfast, United Kingdom
32
Nwosu OI, Crowson MG, Rameau A. Artificial Intelligence Governance and Otolaryngology-Head and Neck Surgery. Laryngoscope 2023; 133:2868-2870. [PMID: 37658749] [PMCID: PMC10592089] [DOI: 10.1002/lary.31013]
Abstract
This rapid communication highlights components of artificial intelligence governance in healthcare and suggests adopting key governance approaches in otolaryngology – head and neck surgery.
Affiliation(s)
- Obinna I. Nwosu: Department of Otolaryngology-Head & Neck Surgery, Massachusetts Eye & Ear, Boston, Massachusetts, USA; Department of Otolaryngology-Head & Neck Surgery, Harvard Medical School, Boston, Massachusetts, USA
- Matthew G. Crowson: Department of Otolaryngology-Head & Neck Surgery, Massachusetts Eye & Ear, Boston, Massachusetts, USA; Department of Otolaryngology-Head & Neck Surgery, Harvard Medical School, Boston, Massachusetts, USA; Deloitte Consulting, Boston, Massachusetts, USA
- Anaïs Rameau: Department of Otolaryngology-Head and Neck Surgery, Sean Parker Institute for the Voice, Weill Cornell Medical College, New York, New York, USA
33
Jin Y, Kattan MW. Methodologic Issues Specific to Prediction Model Development and Evaluation. Chest 2023; 164:1281-1289. [PMID: 37414333] [DOI: 10.1016/j.chest.2023.06.038]
Abstract
Developing and evaluating statistical prediction models is challenging, and many pitfalls can arise. This article identifies what the authors believe are some common methodologic concerns that may be encountered. We describe each problem and make suggestions regarding how to address them. The hope is that this article will result in higher-quality publications of statistical prediction models.
Affiliation(s)
- Yuxuan Jin: Department of Quantitative Health Sciences, Cleveland Clinic, Cleveland, OH
- Michael W Kattan: Department of Quantitative Health Sciences, Cleveland Clinic, Cleveland, OH
34
Glocker B, Jones C, Roschewitz M, Winzeck S. Risk of Bias in Chest Radiography Deep Learning Foundation Models. Radiol Artif Intell 2023; 5:e230060. [PMID: 38074789] [PMCID: PMC10698597] [DOI: 10.1148/ryai.230060]
Abstract
PURPOSE To analyze a recently published chest radiography foundation model for the presence of biases that could lead to subgroup performance disparities across biologic sex and race. MATERIALS AND METHODS This Health Insurance Portability and Accountability Act-compliant retrospective study used 127 118 chest radiographs from 42 884 patients (mean age, 63 years ± 17 [SD]; 23 623 male, 19 261 female) from the CheXpert dataset that were collected between October 2002 and July 2017. To determine the presence of bias in features generated by a chest radiography foundation model and baseline deep learning model, dimensionality reduction methods together with two-sample Kolmogorov-Smirnov tests were used to detect distribution shifts across sex and race. A comprehensive disease detection performance analysis was then performed to associate any biases in the features to specific disparities in classification performance across patient subgroups. RESULTS Ten of 12 pairwise comparisons across biologic sex and race showed statistically significant differences in the studied foundation model, compared with four significant tests in the baseline model. Significant differences were found between male and female (P < .001) and Asian and Black (P < .001) patients in the feature projections that primarily capture disease. Compared with average model performance across all subgroups, classification performance on the "no finding" label decreased between 6.8% and 7.8% for female patients, and performance in detecting "pleural effusion" decreased between 10.7% and 11.6% for Black patients. CONCLUSION The studied chest radiography foundation model demonstrated racial and sex-related bias, which led to disparate performance across patient subgroups; thus, this model may be unsafe for clinical applications. Keywords: Conventional Radiography, Computer Application-Detection/Diagnosis, Chest Radiography, Bias, Foundation Models. Supplemental material is available for this article. Published under a CC BY 4.0 license. See also commentary by Czum and Parr in this issue.
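A minimal sketch of the bias test described above: project model features onto a few components and apply two-sample Kolmogorov-Smirnov tests per component across subgroups. The embeddings here are simulated with a deliberate subgroup offset standing in for encoded demographic information:

```python
import numpy as np
from scipy.stats import ks_2samp
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Simulated stand-ins for model features: 1000 images x 512-dim embeddings,
# with a small additive shift for one subgroup to mimic encoded sex information.
sex = rng.binomial(1, 0.5, 1000)
features = rng.normal(size=(1000, 512)) + 0.3 * sex[:, None]

proj = PCA(n_components=4).fit_transform(features)
for c in range(proj.shape[1]):
    stat, p = ks_2samp(proj[sex == 0, c], proj[sex == 1, c])
    print(f"component {c}: KS={stat:.3f}, p={p:.2e}")  # small p => distribution shift
```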
Affiliation(s)
- Ben Glocker: Department of Computing, Imperial College London, South Kensington Campus, London SW7 2AZ, United Kingdom
- Charles Jones: Department of Computing, Imperial College London, South Kensington Campus, London SW7 2AZ, United Kingdom
- Mélanie Roschewitz: Department of Computing, Imperial College London, South Kensington Campus, London SW7 2AZ, United Kingdom
- Stefan Winzeck: Department of Computing, Imperial College London, South Kensington Campus, London SW7 2AZ, United Kingdom
35
Roschewitz M, Khara G, Yearsley J, Sharma N, James JJ, Ambrózay É, Heroux A, Kecskemethy P, Rijken T, Glocker B. Automatic correction of performance drift under acquisition shift in medical image classification. Nat Commun 2023; 14:6608. [PMID: 37857643] [PMCID: PMC10587231] [DOI: 10.1038/s41467-023-42396-y]
Abstract
Image-based prediction models for disease detection are sensitive to changes in data acquisition such as the replacement of scanner hardware or updates to the image processing software. The resulting differences in image characteristics may lead to drifts in clinically relevant performance metrics which could cause harm in clinical decision making, even for models that generalise in terms of area under the receiver-operating characteristic curve. We propose Unsupervised Prediction Alignment, a generic automatic recalibration method that requires no ground truth annotations and only limited amounts of unlabelled example images from the shifted data distribution. We illustrate the effectiveness of the proposed method to detect and correct performance drift in mammography-based breast cancer screening and on publicly available histopathology data. We show that the proposed method can preserve the expected performance in terms of sensitivity/specificity under various realistic scenarios of image acquisition shift, thus offering an important safeguard for clinical deployment.
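The paper specifies Unsupervised Prediction Alignment in full; as a generic sketch of the same idea, one can recalibrate shifted prediction scores to a reference distribution by quantile matching, using only unlabelled examples from the shifted domain (all data below are simulated):

```python
import numpy as np

def align_scores(shifted_scores, reference_scores):
    """Map each shifted score to the reference score at the same quantile rank."""
    shifted_scores = np.asarray(shifted_scores)
    ranks = np.searchsorted(np.sort(shifted_scores), shifted_scores, side="right")
    quantiles = np.clip(ranks / len(shifted_scores), 0, 1)
    return np.quantile(reference_scores, quantiles)

rng = np.random.default_rng(0)
reference = rng.beta(2, 5, 5000)         # scores on the original acquisition setup
shifted = rng.beta(4, 4, 300) * 0.8      # drifted scores after a hardware/software change
aligned = align_scores(shifted, reference)
# A decision threshold tuned on `reference` now recovers its intended operating
# point (here, flagging the top ~10% of cases) despite the acquisition shift.
print((aligned > np.quantile(reference, 0.9)).mean())
```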
Affiliation(s)
- Mélanie Roschewitz: Kheiron Medical Technologies, London, UK; Imperial College London, Department of Computing, London, UK
- Nisha Sharma: Leeds Teaching Hospital NHS Trust, Department of Radiology, Leeds, UK
- Jonathan J James: Nottingham University Hospitals NHS Trust, Nottingham City Hospital, Nottingham Breast Institute, Nottingham, UK
- Ben Glocker: Kheiron Medical Technologies, London, UK; Imperial College London, Department of Computing, London, UK
36
Farris AB, Alexander MP, Balis UGJ, Barisoni L, Boor P, Bülow RD, Cornell LD, Demetris AJ, Farkash E, Hermsen M, Hogan J, Kain R, Kers J, Kong J, Levenson RM, Loupy A, Naesens M, Sarder P, Tomaszewski JE, van der Laak J, van Midden D, Yagi Y, Solez K. Banff Digital Pathology Working Group: Image Bank, Artificial Intelligence Algorithm, and Challenge Trial Developments. Transpl Int 2023; 36:11783. [PMID: 37908675] [PMCID: PMC10614670] [DOI: 10.3389/ti.2023.11783]
Abstract
The Banff Digital Pathology Working Group (DPWG) was established with the goal to establish a digital pathology repository; develop, validate, and share models for image analysis; and foster collaborations using regular videoconferencing. During the calls, a variety of artificial intelligence (AI)-based support systems for transplantation pathology were presented. Potential collaborations in a competition/trial on AI applied to kidney transplant specimens, including the DIAGGRAFT challenge (staining of biopsies at multiple institutions, pathologists' visual assessment, and development and validation of new and pre-existing Banff scoring algorithms), were also discussed. To determine the next steps, a survey was conducted, primarily focusing on the feasibility of establishing a digital pathology repository and identifying potential hosts. Sixteen of the 35 respondents (46%) had access to a server hosting a digital pathology repository, with 2 respondents that could serve as a potential host at no cost to the DPWG. The 16 digital pathology repositories collected specimens from various organs, with the largest constituent being kidney (n = 12,870 specimens). A DPWG pilot digital pathology repository was established, and there are plans for a competition/trial with the DIAGGRAFT project. Utilizing existing resources and previously established models, the Banff DPWG is establishing new resources for the Banff community.
Affiliation(s)
- Alton B. Farris: Department of Pathology and Laboratory Medicine, Emory University, Atlanta, GA, United States
- Mariam P. Alexander: Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN, United States
- Ulysses G. J. Balis: Department of Pathology, University of Michigan, Ann Arbor, MI, United States
- Laura Barisoni: Department of Pathology and Medicine, Duke University, Durham, NC, United States
- Peter Boor: Institute of Pathology, Rheinisch-Westfälische Technische Hochschule (RWTH) Aachen University Clinic, Aachen, Germany; Department of Nephrology and Immunology, RWTH Aachen University Clinic, Aachen, Germany
- Roman D. Bülow: Institute of Pathology, Rheinisch-Westfälische Technische Hochschule (RWTH) Aachen University Clinic, Aachen, Germany
- Lynn D. Cornell: Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN, United States
- Anthony J. Demetris: Department of Pathology, University of Pittsburgh, Pittsburgh, PA, United States
- Evan Farkash: Department of Pathology, University of Michigan, Ann Arbor, MI, United States
- Meyke Hermsen: Department of Pathology, Radboud University Medical Center, Nijmegen, Netherlands
- Julien Hogan: Department of Pathology and Laboratory Medicine, Emory University, Atlanta, GA, United States; Nephrology Service, Robert Debré Hospital, University of Paris, Paris, France
- Renate Kain: Department of Pathology, Medical University of Vienna, Vienna, Austria
- Jesper Kers: Department of Pathology, Amsterdam University Medical Centers, Amsterdam, Netherlands; Department of Pathology, Leiden University Medical Center, Leiden, Netherlands
- Jun Kong: Georgia State University, Atlanta, GA, United States; Emory University, Atlanta, GA, United States
- Richard M. Levenson: Department of Pathology, University of California Davis Health System, Sacramento, CA, United States
- Alexandre Loupy: Institut National de la Santé et de la Recherche Médicale, UMR 970, Paris Translational Research Centre for Organ Transplantation, and Kidney Transplant Department, Hôpital Necker, Assistance Publique-Hôpitaux de Paris, University of Paris, Paris, France
- Maarten Naesens: Department of Microbiology, Immunology and Transplantation, KU Leuven, Leuven, Belgium
- Pinaki Sarder: Division of Nephrology, Hypertension, and Renal Transplantation, Department of Medicine, Intelligent Critical Care Center, College of Medicine, University of Florida at Gainesville, Gainesville, FL, United States
- John E. Tomaszewski: Department of Pathology, The State University of New York at Buffalo, Buffalo, NY, United States
- Jeroen van der Laak: Department of Pathology, Radboud University Medical Center, Nijmegen, Netherlands; Center for Medical Image Science and Visualization, Linköping University, Linköping, Sweden
- Dominique van Midden: Department of Pathology, Radboud University Medical Center, Nijmegen, Netherlands
- Yukako Yagi: Memorial Sloan Kettering Cancer Center, New York, NY, United States
- Kim Solez: Department of Pathology, University of Alberta, Edmonton, AB, Canada
37
Charpignon ML, Byers J, Cabral S, Celi LA, Fernandes C, Gallifant J, Lough ME, Mlombwa D, Moukheiber L, Ong BA, Panitchote A, William W, Wong AKI, Nazer L. Critical Bias in Critical Care Devices. Crit Care Clin 2023; 39:795-813. [PMID: 37704341] [DOI: 10.1016/j.ccc.2023.02.005]
Abstract
Critical care data contain information about the most physiologically fragile patients in the hospital, who require a significant level of monitoring. However, medical devices used for patient monitoring suffer from measurement biases that have been largely underreported. This article explores sources of bias in commonly used clinical devices, including pulse oximeters, thermometers, and sphygmomanometers. Further, it provides a framework for mitigating these biases and key principles to achieve more equitable health care delivery.
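A minimal sketch of one bias check in this spirit, estimating occult hypoxemia (low arterial SaO2 missed by the pulse-oximeter SpO2 reading) per demographic group; the readings, the bias magnitude, and the 88%/92% cutoffs (one commonly used definition) are illustrative assumptions:

```python
import numpy as np

def occult_hypoxemia_rate(spo2, sao2, group, g):
    """Rate of SaO2 < 88% despite SpO2 >= 92% within one demographic group."""
    mask = group == g
    return np.mean((sao2[mask] < 88) & (spo2[mask] >= 92))

rng = np.random.default_rng(1)
n = 2000
group = rng.choice(["Black", "White"], n)
sao2 = rng.normal(92, 4, n)                    # arterial blood-gas ground truth
bias = np.where(group == "Black", 1.5, 0.0)    # hypothetical device overestimation
spo2 = sao2 + bias + rng.normal(0, 1.5, n)     # simulated pulse-oximeter reading
for g in ("Black", "White"):
    print(g, f"{occult_hypoxemia_rate(spo2, sao2, group, g):.1%}")
```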
Collapse
Affiliation(s)
- Marie-Laure Charpignon
- Institute for Data, Systems, and Society (IDSS), E18-407A, 50 Ames Street, Cambridge, MA 02142, USA.
| | - Joseph Byers
- Respiratory Therapy, Beth Israel Deaconess Medical Center, 330 Brookline Avenue, Boston, MA 02215, USA
| | - Stephanie Cabral
- Department of Medicine, Beth Israel Deaconess Medical Center, 330 Brookline Avenue, Boston, MA 02215, USA
| | - Leo Anthony Celi
- Laboratory for Computational Physiology, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA; Division of Pulmonary, Critical Care and Sleep Medicine, Beth Israel Deaconess Medical Center, Boston, MA, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Chrystinne Fernandes
- Laboratory for Computational Physiology, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
| | - Jack Gallifant
- Imperial College London NHS Trust, St Thomas' Hospital, Westminster Bridge Road, London SE1 7EH, UK
| | - Mary E Lough
- Stanford Health Care, Stanford University, 300 Pasteur Drive, Stanford, CA 94305, USA
| | - Donald Mlombwa
- Zomba Central Hospital, 8th Avenue, Zomba, Malawi; Kamuzu College of Health Sciences, Blantyre, Malawi; St. Luke's College of Health Sciences, Chilema-Zomba, Malawi
| | - Lama Moukheiber
- Institute for Medical Engineering and Science, Massachusetts Institute of Technology, 77 Massachusetts Avenue, E25-330, Cambridge, MA 02139, USA
| | - Bradley Ashley Ong
- College of Medicine, University of the Philippines Manila, Calderon hall, UP College of Medicine, 547 Pedro Gil Street, Ermita Manila, Philippines
| | - Anupol Panitchote
- Faculty of Medicine, Khon Kaen University, 123 Mittraparp Highway, Muang District, Khon Kaen 40002, Thailand
| | - Wasswa William
- Mbarara University of Science and Technology, P.O. Box 1410, Mbarara, Uganda
| | - An-Kwok Ian Wong
- Duke University Medical Center, 2424 Erwin Road, Suite 1102, Hock Plaza Box 2721, Durham, NC 27710, USA
| | - Lama Nazer
- King Hussein Cancer Center, Queen Rania Street 202, Amman, Jordan
38
Baumgartner R, Arora P, Bath C, Burljaev D, Ciereszko K, Custers B, Ding J, Ernst W, Fosch-Villaronga E, Galanos V, Gremsl T, Hendl T, Kropp C, Lenk C, Martin P, Mbelu S, Morais Dos Santos Bruss S, Napiwodzka K, Nowak E, Roxanne T, Samerski S, Schneeberger D, Tampe-Mai K, Vlantoni K, Wiggert K, Williams R. Fair and equitable AI in biomedical research and healthcare: Social science perspectives. Artif Intell Med 2023; 144:102658. [PMID: 37783540 DOI: 10.1016/j.artmed.2023.102658] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/27/2022] [Revised: 06/30/2023] [Accepted: 09/01/2023] [Indexed: 10/04/2023]
Abstract
Artificial intelligence (AI) offers opportunities but also challenges for biomedical research and healthcare. This position paper shares the results of the international conference "Fair medicine and AI" (online, 3-5 March 2021). Scholars from science and technology studies (STS), gender studies, and ethics of science and technology formulated opportunities, challenges, and research and development desiderata for AI in healthcare. AI systems and solutions, which are being rapidly developed and applied, may have undesirable and unintended consequences, including the risk of perpetuating health inequalities for marginalized groups. The socially robust development and implementation of AI in healthcare therefore require urgent investigation. There is a particular dearth of studies on human-AI interaction and on how it may best be configured to dependably deliver safe, effective and equitable healthcare. To address these challenges, we need to establish diverse and interdisciplinary teams equipped to develop and apply medical AI in a fair, accountable and transparent manner. We argue for the importance of including social science perspectives in the development of intersectionally beneficent and equitable AI for biomedical research and healthcare, in part by strengthening AI health evaluation.
Affiliation(s)
- Renate Baumgartner
- Center of Gender- and Diversity Research, University of Tübingen, Wilhelmstrasse 56, 72074 Tübingen, Germany; Athena Institute, Vrije Universiteit Amsterdam, De Boelelaan 1085, 1081 HV Amsterdam, The Netherlands.
| | - Payal Arora
- Erasmus School of Philosophy, Erasmus University Rotterdam, Burgemeester Oudlaan 50, 3062 PA Rotterdam, The Netherlands
| | - Corinna Bath
- Gender, Technology and Mobility, Institute for Flight Guidance, TU Braunschweig, Hermann-Blenk-Str. 27, 38108 Braunschweig, Germany
| | - Darja Burljaev
- Center of Gender- and Diversity Research, University of Tübingen, Wilhelmstrasse 56, 72074 Tübingen, Germany
| | - Kinga Ciereszko
- Department of Philosophy, Adam Mickiewicz University in Poznan, Szamarzewski Street 89C, 60-569 Poznan, Poland
| | - Bart Custers
- eLaw - Center for Law and Digital Technologies, Leiden University, Steenschuur 25, 2311 ES Leiden, Netherlands
| | - Jin Ding
- iHuman and Department of Sociological Studies, University of Sheffield, ICOSS, 219 Portobello, Sheffield S1 4DP, United Kingdom
| | - Waltraud Ernst
- Institute for Women's and Gender Studies, Johannes Kepler University Linz, Altenberger Strasse 69, 4040 Linz, Austria
| | - Eduard Fosch-Villaronga
- eLaw - Center for Law and Digital Technologies, Leiden University, Steenschuur 25, 2311 ES Leiden, Netherlands
| | - Vassilis Galanos
- Science, Technology and Innovation Studies, School of Social and Political Science, University of Edinburgh, Old Surgeons' Hall, High School Yards, Edinburgh EH1 1LZ, United Kingdom
| | - Thomas Gremsl
- Institute of Ethics and Social Teaching, Faculty of Catholic Theology, University of Graz, Heinrichstraße 78b/2, 8010 Graz, Austria
| | - Tereza Hendl
- Professorship for Ethics of Medicine, University of Augsburg, Stenglinstraße 2, 86156 Augsburg, Germany; Institute of Ethics, History and Theory of Medicine, Ludwig-Maximilians-University in Munich, Lessingstr. 2, 80336 Munich, Germany
| | - Cordula Kropp
- Center for Interdisciplinary Risk and Innovation Studies (ZIRIUS), University of Stuttgart, Seidenstraße 36, 70174 Stuttgart, Germany
| | - Christian Lenk
- Institute of the History, Philosophy and Ethics of Medicine, Ulm University, Parkstraße 11, 89073 Ulm, Germany
| | - Paul Martin
- iHuman and Department of Sociological Studies, University of Sheffield, ICOSS, 219 Portobello, Sheffield S1 4DP, United Kingdom
| | - Somto Mbelu
- Erasmus School of Philosophy, Erasmus University Rotterdam, 10A Ademola Close off Remi Fani Kayode Street, GRA Ikeja, Lagos, Nigeria
| | - Karolina Napiwodzka
- Department of Philosophy, Adam Mickiewicz University in Poznan, Szamarzewski Street 89C, 60-569 Poznan, Poland
| | - Ewa Nowak
- Department of Philosophy, Adam Mickiewicz University in Poznan, Szamarzewski Street 89C, 60-569 Poznan, Poland
| | - Tiara Roxanne
- Data & Society Institute, 228 Park Ave S PMB 83075, New York, NY 10003-1502, United States of America
| | - Silja Samerski
- Fachbereich Soziale Arbeit und Gesundheit, Hochschule Emden/Leer, Constantiaplatz 4, 26723 Emden, Germany
| | - David Schneeberger
- Institute for Medical Informatics, Statistics and Documentation, Medical University of Graz, Auenbruggerplatz 2, 8036 Graz, Austria
| | - Karolin Tampe-Mai
- Center for Interdisciplinary Risk and Innovation Studies (ZIRIUS), University of Stuttgart, Seidenstraße 36, 70174 Stuttgart, Germany
| | - Katerina Vlantoni
- Department of History and Philosophy of Science, School of Science, National and Kapodistrian University of Athens, Panepistimioupoli, Ilisia, Athens 15771, Greece
| | - Kevin Wiggert
- Institute of Sociology, Department Sociology of Technology and Innovation, Technical University of Berlin, Fraunhoferstraße 33-36, 10623 Berlin, Germany
| | - Robin Williams
- Science, Technology and Innovation Studies, School of Social and Political Science, University of Edinburgh, Old Surgeons' Hall, High School Yards, Edinburgh EH1 1LZ, United Kingdom
39
Herington J, McCradden MD, Creel K, Boellaard R, Jones EC, Jha AK, Rahmim A, Scott PJH, Sunderland JJ, Wahl RL, Zuehlsdorff S, Saboury B. Ethical Considerations for Artificial Intelligence in Medical Imaging: Deployment and Governance. J Nucl Med 2023; 64:1509-1515. [PMID: 37620051 DOI: 10.2967/jnumed.123.266110] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/28/2022] [Revised: 07/11/2023] [Indexed: 08/26/2023] Open
Abstract
The deployment of artificial intelligence (AI) has the potential to make nuclear medicine and medical imaging faster, cheaper, and both more effective and more accessible. This is possible, however, only if clinicians and patients feel that these AI medical devices (AIMDs) are trustworthy. Highlighting the need to ensure health justice by fairly distributing benefits and burdens while respecting individual patients' rights, the AI Task Force of the Society of Nuclear Medicine and Molecular Imaging has identified 4 major ethical risks that arise during the deployment of AIMD: autonomy of patients and clinicians, transparency of clinical performance and limitations, fairness toward marginalized populations, and accountability of physicians and developers. We provide preliminary recommendations for governing these ethical risks to realize the promise of AIMD for patients and populations.
Affiliation(s)
- Jonathan Herington
- Department of Health Humanities and Bioethics and Department of Philosophy, University of Rochester, Rochester, New York
| | - Melissa D McCradden
- Department of Bioethics, Hospital for Sick Children, and Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
| | - Kathleen Creel
- Department of Philosophy and Religion and Khoury College of Computer Sciences, Northeastern University, Boston, Massachusetts
| | - Ronald Boellaard
- Department of Radiology and Nuclear Medicine, Cancer Centre Amsterdam, Amsterdam University Medical Centres, Amsterdam, The Netherlands
| | - Elizabeth C Jones
- Department of Radiology and Imaging Sciences, Clinical Center, National Institutes of Health, Bethesda, Maryland
| | - Abhinav K Jha
- Department of Biomedical Engineering and Mallinckrodt Institute of Radiology, Washington University in St. Louis, St. Louis, Missouri
| | - Arman Rahmim
- Departments of Radiology and Physics, University of British Columbia, Vancouver, British Columbia, Canada
| | - Peter J H Scott
- Department of Radiology, University of Michigan Medical School, Ann Arbor, Michigan
| | - John J Sunderland
- Departments of Radiology and Physics, University of Iowa, Iowa City, Iowa
| | - Richard L Wahl
- Mallinckrodt Institute of Radiology, Washington University in St. Louis, St. Louis, Missouri
| | - Babak Saboury
- Department of Radiology and Imaging Sciences, Clinical Center, National Institutes of Health, Bethesda, Maryland
40
Zhang A, Wu Z, Wu E, Wu M, Snyder MP, Zou J, Wu JC. Leveraging physiology and artificial intelligence to deliver advancements in health care. Physiol Rev 2023; 103:2423-2450. [PMID: 37104717 PMCID: PMC10390055 DOI: 10.1152/physrev.00033.2022] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/17/2022] [Revised: 03/06/2023] [Accepted: 04/25/2023] [Indexed: 04/29/2023] Open
Abstract
Artificial intelligence in health care has experienced remarkable innovation and progress in the last decade. Significant advancements can be attributed to the utilization of artificial intelligence to transform physiology data to advance health care. In this review, we explore how past work has shaped the field and defined future challenges and directions. In particular, we focus on three areas of development. First, we give an overview of artificial intelligence, with special attention to the most relevant artificial intelligence models. We then detail how physiology data have been harnessed by artificial intelligence to advance the main areas of health care: automating existing health care tasks, increasing access to care, and augmenting health care capabilities. Finally, we discuss emerging concerns surrounding the use of individual physiology data and detail an increasingly important consideration for the field, namely the challenges of deploying artificial intelligence models to achieve meaningful clinical impact.
Affiliation(s)
- Angela Zhang
- Stanford Cardiovascular Institute, School of Medicine, Stanford University, Stanford, California, United States
- Department of Genetics, School of Medicine, Stanford University, Stanford, California, United States
- Greenstone Biosciences, Palo Alto, California, United States
| | - Zhenqin Wu
- Department of Chemistry, Stanford University, Stanford, California, United States
| | - Eric Wu
- Department of Electrical Engineering, Stanford University, Stanford, California, United States
| | - Matthew Wu
- Greenstone Biosciences, Palo Alto, California, United States
| | - Michael P Snyder
- Department of Genetics, School of Medicine, Stanford University, Stanford, California, United States
| | - James Zou
- Department of Biomedical Informatics, School of Medicine, Stanford University, Stanford, California, United States
- Department of Computer Science, Stanford University, Stanford, California, United States
| | - Joseph C Wu
- Stanford Cardiovascular Institute, School of Medicine, Stanford University, Stanford, California, United States
- Greenstone Biosciences, Palo Alto, California, United States
- Division of Cardiovascular Medicine, Department of Medicine, Stanford University, Stanford, California, United States
- Department of Radiology, School of Medicine, Stanford University, Stanford, California, United States
41
Stegmann JU, Littlebury R, Trengove M, Goetz L, Bate A, Branson KM. Trustworthy AI for safe medicines. Nat Rev Drug Discov 2023; 22:855-856. [PMID: 37550364 DOI: 10.1038/s41573-023-00769-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 08/09/2023]
Affiliation(s)
- Markus Trengove
- Artificial Intelligence and Machine Learning, GSK, London, UK
| | - Lea Goetz
- Artificial Intelligence and Machine Learning, GSK, London, UK
| | - Kim M Branson
- Artificial Intelligence and Machine Learning, GSK, San Francisco, USA
42
Wang SM, Hogg HDJ, Sangvai D, Patel MR, Weissler EH, Kellogg KC, Ratliff W, Balu S, Sendak M. Development and Integration of Machine Learning Algorithm to Identify Peripheral Arterial Disease: Multistakeholder Qualitative Study. JMIR Form Res 2023; 7:e43963. [PMID: 37733427 PMCID: PMC10557008 DOI: 10.2196/43963] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/02/2022] [Revised: 01/20/2023] [Accepted: 04/30/2023] [Indexed: 09/22/2023] Open
Abstract
BACKGROUND Machine learning (ML)-driven clinical decision support (CDS) continues to draw wide interest and investment as a means of improving care quality and value, despite mixed real-world implementation outcomes. OBJECTIVE This study aimed to explore the factors that influence the integration of a peripheral arterial disease (PAD) identification algorithm to implement timely guideline-based care. METHODS A total of 12 semistructured interviews were conducted with individuals from 3 stakeholder groups during the first 4 weeks of integration of an ML-driven CDS. The stakeholder groups included technical, administrative, and clinical members of the team interacting with the ML-driven CDS. The ML-driven CDS identified patients with a high probability of having PAD, and these patients were then reviewed by an interdisciplinary team that developed a recommended action plan and sent recommendations to the patient's primary care provider. Pseudonymized transcripts were coded, and thematic analysis was conducted by a multidisciplinary research team. RESULTS Three themes were identified: positive factors translating in silico performance to real-world efficacy, organizational factors and data structure factors affecting clinical impact, and potential challenges to advancing equity. Our study found that, given adequate efficacy in retrospective validation, the factors that led to successful translation of in silico algorithm performance to real-world impact were largely nontechnical: strong clinical leadership, trustworthy workflows, early consideration of end-user needs, and ensuring that the CDS addresses an actionable problem. Factors that hindered integration included failure to incorporate on-the-ground context, a lack of feedback loops, and data silos that limited the ML-driven CDS. The success criteria for each stakeholder group were also characterized to better understand how teams work together to integrate ML-driven CDS and how needs vary across stakeholder groups. CONCLUSIONS Longitudinal and multidisciplinary stakeholder engagement in the development and integration of ML-driven CDS underpins its effective translation into real-world care. Although previous studies have focused on the technical elements of ML-driven CDS, our study demonstrates the importance of including administrative and operational leaders as well as an early consideration of clinicians' needs. This more holistic perspective across stakeholder groups also permits more effective detection of context-driven health care inequities, which can be uncovered or exacerbated by ML-driven CDS integration through structural and organizational challenges. Many of the solutions to these inequities lie outside the scope of ML and require coordinated, systematic mitigation to help reduce disparities in the care of patients with PAD.
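The deployment pattern described here, where the model flags high-probability charts for interdisciplinary review rather than acting autonomously, can be sketched as a simple triage step. The threshold, field names, and records below are assumptions for illustration, not details from the study.

```python
# Hypothetical triage step for an ML-driven CDS: route patients whose
# predicted PAD probability exceeds a review threshold to a human team.
REVIEW_THRESHOLD = 0.8  # assumed cutoff; the study does not publish one

patients = [
    {"mrn": "0001", "pad_probability": 0.93},
    {"mrn": "0002", "pad_probability": 0.41},
    {"mrn": "0003", "pad_probability": 0.86},
]

review_queue = [p for p in patients if p["pad_probability"] >= REVIEW_THRESHOLD]
for p in sorted(review_queue, key=lambda r: -r["pad_probability"]):
    # In the study workflow, the interdisciplinary team drafted an action
    # plan and sent recommendations to the patient's primary care provider.
    print(f"flag MRN {p['mrn']} (p={p['pad_probability']:.2f}) for team review")
```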
Affiliation(s)
- Sabrina M Wang
- Duke University School of Medicine, Durham, NC, United States
| | - H D Jeffry Hogg
- Population Health Science Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, United Kingdom
- Newcastle Eye Centre, Royal Victoria Infirmary, Newcastle upon Tyne, United Kingdom
| | - Devdutta Sangvai
- Population Health Management, Duke Health, Durham, NC, United States
| | - Manesh R Patel
- Department of Cardiology, Duke University, Durham, NC, United States
| | - E Hope Weissler
- Department of Vascular Surgery, Duke University, Durham, NC, United States
| | - William Ratliff
- Duke Institute for Health Innovation, Durham, NC, United States
| | - Suresh Balu
- Duke Institute for Health Innovation, Durham, NC, United States
| | - Mark Sendak
- Duke Institute for Health Innovation, Durham, NC, United States
43
Kwong JCC, Khondker A, Lajkosz K, McDermott MBA, Frigola XB, McCradden MD, Mamdani M, Kulkarni GS, Johnson AEW. APPRAISE-AI Tool for Quantitative Evaluation of AI Studies for Clinical Decision Support. JAMA Netw Open 2023; 6:e2335377. [PMID: 37747733 PMCID: PMC10520738 DOI: 10.1001/jamanetworkopen.2023.35377] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 06/09/2023] [Accepted: 08/14/2023] [Indexed: 09/26/2023] Open
Abstract
Importance Artificial intelligence (AI) has gained considerable attention in health care, yet concerns have been raised around appropriate methods and fairness. Current AI reporting guidelines do not provide a means of quantifying overall quality of AI research, limiting their ability to compare models addressing the same clinical question. Objective To develop a tool (APPRAISE-AI) to evaluate the methodological and reporting quality of AI prediction models for clinical decision support. Design, Setting, and Participants This quality improvement study evaluated AI studies in the model development, silent, and clinical trial phases using the APPRAISE-AI tool, a quantitative method for evaluating quality of AI studies across 6 domains: clinical relevance, data quality, methodological conduct, robustness of results, reporting quality, and reproducibility. These domains included 24 items with a maximum overall score of 100 points. Points were assigned to each item, with higher points indicating stronger methodological or reporting quality. The tool was applied to a systematic review on machine learning to estimate sepsis that included articles published until September 13, 2019. Data analysis was performed from September to December 2022. Main Outcomes and Measures The primary outcomes were interrater and intrarater reliability and the correlation between APPRAISE-AI scores and expert scores, 3-year citation rate, number of Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) low risk-of-bias domains, and overall adherence to the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) statement. Results A total of 28 studies were included. Overall APPRAISE-AI scores ranged from 33 (low quality) to 67 (high quality). Most studies were moderate quality. The 5 lowest scoring items included source of data, sample size calculation, bias assessment, error analysis, and transparency. Overall APPRAISE-AI scores were associated with expert scores (Spearman ρ, 0.82; 95% CI, 0.64-0.91; P < .001), 3-year citation rate (Spearman ρ, 0.69; 95% CI, 0.43-0.85; P < .001), number of QUADAS-2 low risk-of-bias domains (Spearman ρ, 0.56; 95% CI, 0.24-0.77; P = .002), and adherence to the TRIPOD statement (Spearman ρ, 0.87; 95% CI, 0.73-0.94; P < .001). Intraclass correlation coefficient ranges for interrater and intrarater reliability were 0.74 to 1.00 for individual items, 0.81 to 0.99 for individual domains, and 0.91 to 0.98 for overall scores. Conclusions and Relevance In this quality improvement study, APPRAISE-AI demonstrated strong interrater and intrarater reliability and correlated well with several study quality measures. This tool may provide a quantitative approach for investigators, reviewers, editors, and funding organizations to compare the research quality across AI studies for clinical decision support.
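One core validation step reported here, correlating the tool's overall scores with expert scores, is easy to reproduce in outline. The sketch below uses made-up paired scores and an assumed low/moderate/high banding; only the statistical procedure (Spearman rank correlation) mirrors the study.

```python
# Spearman correlation between tool-derived quality scores and expert
# ratings; the paired scores below are placeholders, not study data.
from scipy.stats import spearmanr

appraise_scores = [33, 41, 48, 52, 55, 58, 60, 63, 67]         # 0-100 scale
expert_scores = [2.0, 3.1, 3.0, 3.8, 4.0, 4.1, 4.5, 4.4, 4.9]  # expert ratings

rho, p_value = spearmanr(appraise_scores, expert_scores)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3g})")

def quality_band(score: float) -> str:
    """Assumed banding of overall scores into the study's quality labels."""
    if score < 40:
        return "low"
    if score < 60:
        return "moderate"
    return "high"

print([quality_band(s) for s in appraise_scores])
```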
Affiliation(s)
- Jethro C. C. Kwong
- Division of Urology, Department of Surgery, University of Toronto, Toronto, Ontario, Canada
- Temerty Centre for AI Research and Education in Medicine, University of Toronto, Toronto, Ontario, Canada
| | - Adree Khondker
- Division of Urology, Department of Surgery, University of Toronto, Toronto, Ontario, Canada
| | - Katherine Lajkosz
- Division of Urology, Department of Surgery, University of Toronto, Toronto, Ontario, Canada
- Department of Biostatistics, University Health Network, University of Toronto, Toronto, Ontario, Canada
| | - Xavier Borrat Frigola
- Laboratory for Computational Physiology, Harvard–Massachusetts Institute of Technology Division of Health Sciences and Technology, Cambridge
- Anesthesiology and Critical Care Department, Hospital Clinic de Barcelona, Barcelona, Spain
| | - Melissa D. McCradden
- Department of Bioethics, The Hospital for Sick Children, Toronto, Ontario, Canada
- Genetics & Genome Biology Research Program, Peter Gilgan Centre for Research and Learning, Toronto, Ontario, Canada
- Division of Clinical and Public Health, Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
| | - Muhammad Mamdani
- Temerty Centre for AI Research and Education in Medicine, University of Toronto, Toronto, Ontario, Canada
- Data Science and Advanced Analytics, Unity Health Toronto, Toronto, Ontario, Canada
| | - Girish S. Kulkarni
- Division of Urology, Department of Surgery, University of Toronto, Toronto, Ontario, Canada
- Princess Margaret Cancer Centre, University Health Network, University of Toronto, Toronto, Ontario, Canada
| | - Alistair E. W. Johnson
- Temerty Centre for AI Research and Education in Medicine, University of Toronto, Toronto, Ontario, Canada
- Division of Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
- Child Health Evaluative Sciences, The Hospital for Sick Children, University of Toronto, Toronto, Ontario, Canada
44
Balch JA, Loftus TJ. Actionable artificial intelligence: Overcoming barriers to adoption of prediction tools. Surgery 2023; 174:730-732. [PMID: 37198040 DOI: 10.1016/j.surg.2023.03.019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/15/2023] [Accepted: 03/30/2023] [Indexed: 05/19/2023]
Abstract
Clinical prediction models based on artificial intelligence algorithms can potentially improve patient care, reduce errors, and add value to the health care system. However, their adoption is hindered by legitimate economic, practical, professional, and intellectual concerns. This article explores these barriers and highlights well-studied instruments that can be used to overcome them. Adopting actionable predictive models will require the purposeful incorporation of patient, clinical, technical, and administrative perspectives. Model developers must articulate a priori clinical needs, ensure explainability and low error frequency and severity, and promote safety and fairness. Models themselves require ongoing validation and monitoring to address variations in health care settings and must comply with an evolving regulatory environment. Through these principles, surgeons and health care providers can leverage artificial intelligence to optimize patient care.
Affiliation(s)
- Jeremy A Balch
- Department of Surgery, University of Florida Health, Gainesville, FL; Intelligent Critical Care Center (IC3), University of Florida, Gainesville, FL. https://twitter.com/balchja
| | - Tyler J Loftus
- Department of Surgery, University of Florida Health, Gainesville, FL; Intelligent Critical Care Center (IC3), University of Florida, Gainesville, FL.
45
Wang C, Liu S, Yang H, Guo J, Wu Y, Liu J. Ethical Considerations of Using ChatGPT in Health Care. J Med Internet Res 2023; 25:e48009. [PMID: 37566454 PMCID: PMC10457697 DOI: 10.2196/48009] [Citation(s) in RCA: 78] [Impact Index Per Article: 78.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/07/2023] [Revised: 07/05/2023] [Accepted: 07/25/2023] [Indexed: 08/12/2023] Open
Abstract
ChatGPT has promising applications in health care, but potential ethical issues need to be addressed proactively to prevent harm. ChatGPT presents potential ethical challenges from legal, humanistic, algorithmic, and informational perspectives. Legal ethics concerns arise from the unclear allocation of responsibility when patient harm occurs and from potential breaches of patient privacy due to data collection. Clear rules and legal boundaries are needed to properly allocate liability and protect users. Humanistic ethics concerns arise from the potential disruption of the physician-patient relationship, humanistic care, and issues of integrity. Overreliance on artificial intelligence (AI) can undermine compassion and erode trust. Transparency and disclosure of AI-generated content are critical to maintaining integrity. Algorithmic ethics concerns include algorithmic bias, responsibility, transparency and explainability, as well as validation and evaluation. Informational ethics concerns include data bias, validity, and effectiveness. Biased training data can lead to biased output, and overreliance on ChatGPT can reduce patient adherence and encourage self-diagnosis. Ensuring the accuracy, reliability, and validity of ChatGPT-generated content requires rigorous validation and ongoing updates based on clinical practice. To navigate the evolving ethical landscape, AI in health care must adhere to the strictest ethical standards. Through comprehensive ethical guidelines, health care professionals can ensure the responsible use of ChatGPT, promote accurate and reliable information exchange, protect patient privacy, and empower patients to make informed decisions about their health care.
Affiliation(s)
- Changyu Wang
- Department of Medical Informatics, West China Medical School, Sichuan University, Chengdu, China
- West China College of Stomatology, Sichuan University, Chengdu, China
| | - Siru Liu
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States
| | - Hao Yang
- Information Center, West China Hospital, Sichuan University, Chengdu, China
| | - Jiulin Guo
- Information Center, West China Hospital, Sichuan University, Chengdu, China
| | - Yuxuan Wu
- Department of Medical Informatics, West China Medical School, Sichuan University, Chengdu, China
| | - Jialin Liu
- Department of Medical Informatics, West China Medical School, Sichuan University, Chengdu, China
- Information Center, West China Hospital, Sichuan University, Chengdu, China
- Department of Otolaryngology-Head and Neck Surgery, West China Hospital, Sichuan University, Chengdu, China
46
Singhal K, Azizi S, Tu T, Mahdavi SS, Wei J, Chung HW, Scales N, Tanwani A, Cole-Lewis H, Pfohl S, Payne P, Seneviratne M, Gamble P, Kelly C, Babiker A, Schärli N, Chowdhery A, Mansfield P, Demner-Fushman D, Agüera Y Arcas B, Webster D, Corrado GS, Matias Y, Chou K, Gottweis J, Tomasev N, Liu Y, Rajkomar A, Barral J, Semturs C, Karthikesalingam A, Natarajan V. Large language models encode clinical knowledge. Nature 2023; 620:172-180. [PMID: 37438534 PMCID: PMC10396962 DOI: 10.1038/s41586-023-06291-2] [Citation(s) in RCA: 435] [Impact Index Per Article: 435.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/25/2023] [Accepted: 06/05/2023] [Indexed: 07/14/2023]
Abstract
Large language models (LLMs) have demonstrated impressive capabilities, but the bar for clinical applications is high. Attempts to assess the clinical knowledge of models typically rely on automated evaluations based on limited benchmarks. Here, to address these limitations, we present MultiMedQA, a benchmark combining six existing medical question answering datasets spanning professional medicine, research and consumer queries and a new dataset of medical questions searched online, HealthSearchQA. We propose a human evaluation framework for model answers along multiple axes including factuality, comprehension, reasoning, possible harm and bias. In addition, we evaluate the Pathways Language Model (PaLM, a 540-billion parameter LLM) and its instruction-tuned variant, Flan-PaLM, on MultiMedQA. Using a combination of prompting strategies, Flan-PaLM achieves state-of-the-art accuracy on every MultiMedQA multiple-choice dataset (MedQA, MedMCQA, PubMedQA and Measuring Massive Multitask Language Understanding (MMLU) clinical topics), including 67.6% accuracy on MedQA (US Medical Licensing Exam-style questions), surpassing the prior state of the art by more than 17%. However, human evaluation reveals key gaps. To resolve this, we introduce instruction prompt tuning, a parameter-efficient approach for aligning LLMs to new domains using a few exemplars. The resulting model, Med-PaLM, performs encouragingly, but remains inferior to clinicians. We show that comprehension, knowledge recall and reasoning improve with model scale and instruction prompt tuning, suggesting the potential utility of LLMs in medicine. Our human evaluations reveal limitations of today's models, reinforcing the importance of both evaluation frameworks and method development in creating safe, helpful LLMs for clinical applications.
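The multiple-choice portion of this evaluation reduces to exact-match scoring of predicted option letters against an answer key, as in the minimal sketch below. The questions and predictions are invented; only the scoring logic reflects how figures such as the 67.6% MedQA accuracy are computed.

```python
# Exact-match accuracy for a multiple-choice medical QA benchmark:
# compare a model's predicted option letters against the answer key.
answer_key = {"q1": "C", "q2": "A", "q3": "D", "q4": "B"}   # toy key
predictions = {"q1": "C", "q2": "A", "q3": "B", "q4": "B"}  # toy model output

def multiple_choice_accuracy(key: dict, preds: dict) -> float:
    """Fraction of questions in the key answered with the correct option."""
    correct = sum(preds.get(q) == answer for q, answer in key.items())
    return correct / len(key)

print(f"accuracy = {multiple_choice_accuracy(answer_key, predictions):.1%}")
# Benchmarks such as MedQA apply this scoring over thousands of questions.
```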
Affiliation(s)
| | - Tao Tu
- Google Research, Mountain View, CA, USA
| | - Jason Wei
- Google Research, Mountain View, CA, USA
| | - Yun Liu
- Google Research, Mountain View, CA, USA
47
van Leeuwen K, Becks M, Grob D, de Lange F, Rutten J, Schalekamp S, Rutten M, van Ginneken B, de Rooij M, Meijer F. AI-support for the detection of intracranial large vessel occlusions: One-year prospective evaluation. Heliyon 2023; 9:e19065. [PMID: 37636476 PMCID: PMC10458691 DOI: 10.1016/j.heliyon.2023.e19065] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/11/2023] [Revised: 08/07/2023] [Accepted: 08/09/2023] [Indexed: 08/29/2023] Open
Abstract
Purpose Few studies have evaluated the real-world performance of radiological AI-tools in clinical practice. Over one year, we prospectively evaluated the use of AI software to support the detection of intracranial large vessel occlusions (LVO) on CT angiography (CTA). Method Quantitative measures (user log-in attempts, AI standalone performance) and qualitative data (user surveys) were reviewed by a key-user group at three timepoints. A total of 491 CTA studies of 460 patients were included for analysis. Results The overall accuracy of the AI-tool for LVO detection and localization was 87.6%, sensitivity 69.1% and specificity 91.2%. Out of 81 LVOs, 31 of 34 (91%) M1 occlusions were detected correctly, 19 of 38 (50%) M2 occlusions, and 6 of 9 (67%) ICA occlusions. The product was considered user-friendly. The diagnostic confidence of the users for LVO detection remained the same over the year. The last measured net promoter score was -56%. The use of the AI-tool fluctuated over the year with a declining trend. Conclusions Our pragmatic approach to evaluating the AI-tool in clinical practice helped us to monitor usage, estimate the added value perceived by users, and make an informed decision about whether to continue using the AI-tool.
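The reported figures can be cross-checked from the published counts. In the sketch below, the per-location counts come straight from the abstract, while the confusion-matrix cells are a reconstruction chosen to reproduce the reported accuracy, sensitivity, and specificity; the abstract does not publish the matrix itself, so treat those cells as a consistent illustration rather than source data.

```python
# Per-location LVO detection rates, using the counts in the abstract.
detected = {"M1": (31, 34), "M2": (19, 38), "ICA": (6, 9)}
for site, (hits, total) in detected.items():
    print(f"{site}: {hits}/{total} detected = {hits / total:.0%}")

def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP)."""
    return tp / (tp + fn), tn / (tn + fp)

# Reconstructed cells: 81 LVOs and 410 LVO-negative studies (491 total),
# chosen so that the reported 87.6% accuracy, 69.1% sensitivity, and
# 91.2% specificity all fall out.
tp, fn, tn, fp = 56, 25, 374, 36
sens, spec = sensitivity_specificity(tp, fn, tn, fp)
accuracy = (tp + tn) / (tp + fn + tn + fp)
print(f"accuracy = {accuracy:.1%}, sensitivity = {sens:.1%}, specificity = {spec:.1%}")
```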
Affiliation(s)
- K.G. van Leeuwen
- Department of Medical Imaging, Radboud University Medical Center, Nijmegen, the Netherlands
| | - M.J. Becks
- Department of Medical Imaging, Radboud University Medical Center, Nijmegen, the Netherlands
| | - D. Grob
- Department of Medical Imaging, Radboud University Medical Center, Nijmegen, the Netherlands
| | - F. de Lange
- Department of Medical Imaging, Radboud University Medical Center, Nijmegen, the Netherlands
| | - J.H.E. Rutten
- Department of Medical Imaging, Radboud University Medical Center, Nijmegen, the Netherlands
| | - S. Schalekamp
- Department of Medical Imaging, Radboud University Medical Center, Nijmegen, the Netherlands
| | - M.J.C.M. Rutten
- Department of Medical Imaging, Radboud University Medical Center, Nijmegen, the Netherlands
- Department of Radiology, Jeroen Bosch Hospital, ‘s-Hertogenbosch, the Netherlands
| | - B. van Ginneken
- Department of Medical Imaging, Radboud University Medical Center, Nijmegen, the Netherlands
| | - M. de Rooij
- Department of Medical Imaging, Radboud University Medical Center, Nijmegen, the Netherlands
| | - F.J.A. Meijer
- Department of Medical Imaging, Radboud University Medical Center, Nijmegen, the Netherlands
48
Stutchfield BM, Attia A, Rowe IA, Harrison EM, Gordon-Walker T. UK liver transplantation allocation algorithm: transplant benefit score - Authors' reply. Lancet 2023; 402:371-372. [PMID: 37516542 DOI: 10.1016/s0140-6736(23)01307-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 05/29/2023] [Accepted: 06/21/2023] [Indexed: 07/31/2023]
Affiliation(s)
- Ben M Stutchfield
- Department of Clinical and Surgical Sciences, University of Edinburgh, Edinburgh EH14 4SA, UK; Edinburgh Transplant Centre, Royal Infirmary of Edinburgh, Edinburgh, UK.
| | - Antony Attia
- School of Medicine, University of Edinburgh, Edinburgh EH14 4SA, UK
| | - Ian A Rowe
- Leeds Institute for Medical Research, University of Leeds, Leeds, UK
| | - Ewen M Harrison
- Department of Clinical and Surgical Sciences, University of Edinburgh, Edinburgh EH14 4SA, UK; Centre for Medical Informatics, Usher Institute, University of Edinburgh, Edinburgh, UK
| | - Tim Gordon-Walker
- Edinburgh Transplant Centre, Royal Infirmary of Edinburgh, Edinburgh, UK
49
Banda JM, Shah NH, Periyakoil VS. Characterizing subgroup performance of probabilistic phenotype algorithms within older adults: a case study for dementia, mild cognitive impairment, and Alzheimer's and Parkinson's diseases. JAMIA Open 2023; 6:ooad043. [PMID: 37397506 PMCID: PMC10307941 DOI: 10.1093/jamiaopen/ooad043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/30/2023] [Revised: 06/06/2023] [Accepted: 06/22/2023] [Indexed: 07/04/2023] Open
Abstract
Objective Biases within probabilistic electronic phenotyping algorithms are largely unexplored. In this work, we characterize differences in subgroup performance of phenotyping algorithms for Alzheimer's disease and related dementias (ADRD) in older adults. Materials and methods We created an experimental framework to characterize the performance of probabilistic phenotyping algorithms under different racial distributions, allowing us to identify which algorithms may have differential performance, by how much, and under what conditions. We relied on rule-based phenotype definitions as a reference to evaluate probabilistic phenotype algorithms created using the Automated PHenotype Routine for Observational Definition, Identification, Training and Evaluation (APHRODITE) framework. Results We demonstrate that some algorithms have performance variations of anywhere from 3% to 30% across populations, even when race is not an input variable. We show that while performance differences in subgroups are not present for all phenotypes, they affect some phenotypes and groups disproportionately. Discussion Our analysis establishes the need for a robust evaluation framework for subgroup differences. The underlying patient populations for the algorithms showing subgroup performance differences have great variance between model features when compared with the phenotypes with little to no differences. Conclusion We have created a framework to identify systematic differences in the performance of probabilistic phenotyping algorithms, specifically in the context of ADRD as a use case. Differences in subgroup performance of probabilistic phenotyping algorithms are neither widespread nor consistent. This highlights the great need for careful ongoing monitoring to evaluate, measure, and mitigate such differences.
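The evaluation described here boils down to stratifying agreement with the rule-based reference by subgroup. A minimal sketch, with synthetic records and assumed field meanings:

```python
# Compare a probabilistic phenotype's labels against rule-based reference
# labels, stratified by subgroup; records are synthetic placeholders.
from collections import defaultdict

# (subgroup, rule_based_label, probabilistic_label)
patients = [
    ("group1", 1, 1), ("group1", 1, 1), ("group1", 1, 0), ("group1", 0, 0),
    ("group2", 1, 1), ("group2", 1, 0), ("group2", 1, 0), ("group2", 0, 0),
]

counts = defaultdict(lambda: {"tp": 0, "fn": 0})
for group, reference, predicted in patients:
    if reference == 1:  # reference-positive cases drive sensitivity
        counts[group]["tp" if predicted == 1 else "fn"] += 1

for group, c in sorted(counts.items()):
    sensitivity = c["tp"] / (c["tp"] + c["fn"])
    print(f"{group}: sensitivity vs rule-based reference = {sensitivity:.0%}")
# Gaps between groups of the magnitude reported above (3%-30%) would show
# up here even though no demographic variable is a model input.
```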
Affiliation(s)
- Juan M Banda
- Department of Computer Science, College of Arts and Sciences, Georgia State University, 25 Park Place, Suite 752, Atlanta, GA 30303, USA
| | - Nigam H Shah
- Stanford Center for Biomedical Informatics Research, Stanford University School of Medicine, Stanford, California, USA
| | - Vyjeyanthi S Periyakoil
- Stanford Department of Medicine, Palo Alto, California, USA
- VA Palo Alto Health Care System, Palo Alto, California, USA
50
Kwong JCC, Khondker A, Meng E, Taylor N, Kuk C, Perlis N, Kulkarni GS, Hamilton RJ, Fleshner NE, Finelli A, van der Kwast TH, Ali A, Jamal M, Papanikolaou F, Short T, Srigley JR, Colinet V, Peltier A, Diamand R, Lefebvre Y, Mandoorah Q, Sanchez-Salas R, Macek P, Cathelineau X, Eklund M, Johnson AEW, Feifer A, Zlotta AR. Development, multi-institutional external validation, and algorithmic audit of an artificial intelligence-based Side-specific Extra-Prostatic Extension Risk Assessment tool (SEPERA) for patients undergoing radical prostatectomy: a retrospective cohort study. Lancet Digit Health 2023; 5:e435-e445. [PMID: 37211455 DOI: 10.1016/s2589-7500(23)00067-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/01/2022] [Revised: 02/11/2023] [Accepted: 03/22/2023] [Indexed: 05/23/2023]
Abstract
BACKGROUND Accurate prediction of side-specific extraprostatic extension (ssEPE) is essential for performing nerve-sparing surgery to mitigate treatment-related side-effects such as impotence and incontinence in patients with localised prostate cancer. Artificial intelligence (AI) might provide robust and personalised ssEPE predictions to better inform nerve-sparing strategy during radical prostatectomy. We aimed to develop, externally validate, and perform an algorithmic audit of an AI-based Side-specific Extra-Prostatic Extension Risk Assessment tool (SEPERA). METHODS Each prostatic lobe was treated as an individual case such that each patient contributed two cases to the overall cohort. SEPERA was trained on 1022 cases from a community hospital network (Trillium Health Partners; Mississauga, ON, Canada) between 2010 and 2020. Subsequently, SEPERA was externally validated on 3914 cases across three academic centres: Princess Margaret Cancer Centre (Toronto, ON, Canada) from 2008 to 2020; L'Institut Mutualiste Montsouris (Paris, France) from 2010 to 2020; and Jules Bordet Institute (Brussels, Belgium) from 2015 to 2020. Model performance was characterised by area under the receiver operating characteristic curve (AUROC), area under the precision recall curve (AUPRC), calibration, and net benefit. SEPERA was compared against contemporary nomograms (ie, Sayyid nomogram, Soeterik nomogram [non-MRI and MRI]), as well as a separate logistic regression model using the same variables included in SEPERA. An algorithmic audit was performed to assess model bias and identify common patient characteristics among predictive errors. FINDINGS Overall, 2468 patients comprising 4936 cases (ie, prostatic lobes) were included in this study. SEPERA was well calibrated and had the best performance across all validation cohorts (pooled AUROC of 0·77 [95% CI 0·75-0·78] and pooled AUPRC of 0·61 [0·58-0·63]). In patients with pathological ssEPE despite benign ipsilateral biopsies, SEPERA correctly predicted ssEPE in 72 (68%) of 106 cases compared with the other models (47 [44%] in the logistic regression model, none in the Sayyid model, 13 [12%] in the Soeterik non-MRI model, and five [5%] in the Soeterik MRI model). SEPERA had higher net benefit than the other models to predict ssEPE, enabling more patients to safely undergo nerve-sparing. In the algorithmic audit, no evidence of model bias was observed, with no significant difference in AUROC when stratified by race, biopsy year, age, biopsy type (systematic only vs systematic and MRI-targeted biopsy), biopsy location (academic vs community), and D'Amico risk group. According to the audit, the most common errors were false positives, particularly for older patients with high-risk disease. No aggressive tumours (ie, grade >2 or high-risk disease) were found among false negatives. INTERPRETATION We demonstrated the accuracy, safety, and generalisability of using SEPERA to personalise nerve-sparing approaches during radical prostatectomy. FUNDING None.
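The algorithmic-audit step, checking whether discrimination holds up within patient strata, can be outlined in a few lines. The sketch below uses randomly generated labels and scores and an assumed age stratification; it shows the mechanics of a stratified AUROC check, not SEPERA's data.

```python
# Stratified AUROC audit: compute discrimination separately per stratum
# and compare against the overall figure. All data here are simulated.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 400
strata = rng.choice(["<65 years", ">=65 years"], size=n)      # assumed audit axis
y_true = rng.integers(0, 2, size=n)                           # ssEPE present (1) or not (0)
y_score = np.clip(0.25 * y_true + 0.8 * rng.random(n), 0, 1)  # mock model risk output

print(f"overall AUROC = {roc_auc_score(y_true, y_score):.2f}")
for stratum in np.unique(strata):
    mask = strata == stratum
    print(f"{stratum}: AUROC = {roc_auc_score(y_true[mask], y_score[mask]):.2f} "
          f"(n = {mask.sum()})")
# The published audit found no significant AUROC differences across race,
# age, biopsy year, biopsy type, practice setting, or D'Amico risk group.
```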
Affiliation(s)
- Jethro C C Kwong
- Division of Urology, Department of Surgery, University of Toronto, Toronto, ON, Canada; Division of Urology, Department of Surgery, Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada; Temerty Centre for AI Research and Education in Medicine, University of Toronto, Toronto, ON, Canada
| | - Adree Khondker
- Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada
| | - Eric Meng
- Faculty of Medicine, Queen's University, Kingston, ON, Canada
| | - Nicholas Taylor
- Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada
| | - Cynthia Kuk
- Division of Urology, Department of Surgery, Mount Sinai Hospital, Sinai Health System, Toronto, ON, Canada
| | - Nathan Perlis
- Division of Urology, Department of Surgery, University of Toronto, Toronto, ON, Canada; Division of Urology, Department of Surgery, Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
| | - Girish S Kulkarni
- Division of Urology, Department of Surgery, University of Toronto, Toronto, ON, Canada; Division of Urology, Department of Surgery, Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada; Temerty Centre for AI Research and Education in Medicine, University of Toronto, Toronto, ON, Canada
| | - Robert J Hamilton
- Division of Urology, Department of Surgery, University of Toronto, Toronto, ON, Canada; Division of Urology, Department of Surgery, Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
| | - Neil E Fleshner
- Division of Urology, Department of Surgery, University of Toronto, Toronto, ON, Canada; Division of Urology, Department of Surgery, Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
| | - Antonio Finelli
- Division of Urology, Department of Surgery, University of Toronto, Toronto, ON, Canada; Division of Urology, Department of Surgery, Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
| | - Theodorus H van der Kwast
- Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, ON, Canada; Laboratory Medicine Program, Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
| | - Amna Ali
- Institute for Better Health, Trillium Health Partners, Mississauga, ON, Canada
| | - Munir Jamal
- Division of Urology, Department of Surgery, University of Toronto, Toronto, ON, Canada
| | - Frank Papanikolaou
- Division of Urology, Department of Surgery, University of Toronto, Toronto, ON, Canada
| | - Thomas Short
- Division of Urology, Department of Surgery, University of Toronto, Toronto, ON, Canada
| | - John R Srigley
- Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, ON, Canada
| | - Valentin Colinet
- Division of Urology, Department of Surgery, Jules Bordet Institute, Brussels, Belgium
| | - Alexandre Peltier
- Division of Urology, Department of Surgery, Jules Bordet Institute, Brussels, Belgium
| | - Romain Diamand
- Division of Urology, Department of Surgery, Jules Bordet Institute, Brussels, Belgium
| | - Yolene Lefebvre
- Department of Medical Imagery, Jules Bordet Institute, Brussels, Belgium
| | - Qusay Mandoorah
- Division of Urology, Department of Surgery, L'Institut Mutualiste Montsouris, Paris, France
| | - Rafael Sanchez-Salas
- Division of Urology, Department of Surgery, L'Institut Mutualiste Montsouris, Paris, France
| | - Petr Macek
- Division of Urology, Department of Surgery, L'Institut Mutualiste Montsouris, Paris, France
| | - Xavier Cathelineau
- Division of Urology, Department of Surgery, L'Institut Mutualiste Montsouris, Paris, France
| | - Martin Eklund
- Department of Medical Epidemiology and Biostatistics, Karolinska Institute, Stockholm, Sweden
| | - Alistair E W Johnson
- Temerty Centre for AI Research and Education in Medicine, University of Toronto, Toronto, ON, Canada; Division of Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada; Vector Institute, Toronto, ON, Canada
| | - Andrew Feifer
- Division of Urology, Department of Surgery, University of Toronto, Toronto, ON, Canada; Institute for Better Health, Trillium Health Partners, Mississauga, ON, Canada
| | - Alexandre R Zlotta
- Division of Urology, Department of Surgery, University of Toronto, Toronto, ON, Canada; Division of Urology, Department of Surgery, Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada; Division of Urology, Department of Surgery, Mount Sinai Hospital, Sinai Health System, Toronto, ON, Canada.