1
Tao BKL, Hua N, Milkovich J, Micieli JA. ChatGPT-3.5 and Bing Chat in ophthalmology: an updated evaluation of performance, readability, and informative sources. Eye (Lond) 2024; 38:1897-1902. [PMID: 38509182 PMCID: PMC11226422 DOI: 10.1038/s41433-024-03037-w] [Received: 08/26/2023] [Revised: 03/04/2024] [Accepted: 03/14/2024] [Indexed: 03/22/2024]
Abstract
BACKGROUND/OBJECTIVES: Experimental investigation. The integration of Bing Chat (Microsoft) with ChatGPT-4 (OpenAI) has conferred the capability of accessing online data past 2021. We investigate its performance against ChatGPT-3.5 on a multiple-choice question ophthalmology exam.
SUBJECTS/METHODS: In August 2023, ChatGPT-3.5 and Bing Chat were evaluated against 913 questions derived from the American Academy of Ophthalmology's Basic and Clinical Science Course (BCSC) collection. For each response, the sub-topic, performance, Simple Measure of Gobbledygook (SMOG) readability score (measuring years of required education to understand a given passage), and cited resources were collected. The primary outcomes were the comparative scores between models and, qualitatively, the resources referenced by Bing Chat. Secondary outcomes included performance stratified by response readability, question type (explicit or situational), and BCSC sub-topic.
RESULTS: Across 913 questions, ChatGPT-3.5 scored 59.69% [95% CI 56.45, 62.94] while Bing Chat scored 73.60% [95% CI 70.69, 76.52]. Both models performed significantly better on explicit than on clinical reasoning questions. Both models performed better on general medicine questions than on ophthalmology subsections. Bing Chat referenced 927 online entities and provided at least one citation for 836 of the 913 questions. The use of more reliable (peer-reviewed) sources was associated with a higher likelihood of a correct response. The most-cited resources were eyewiki.aao.org, aao.org, wikipedia.org, and ncbi.nlm.nih.gov. Bing Chat showed significantly better readability than ChatGPT-3.5, averaging a reading level of grade 11.4 [95% CI 7.14, 15.7] versus 12.4 [95% CI 8.77, 16.1], respectively (p-value < 0.0001, ρ = 0.25).
CONCLUSIONS: The online access, improved readability, and citation feature of Bing Chat confer additional utility for ophthalmology learners. We recommend critical appraisal of cited sources during response interpretation.
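The SMOG readability score used in this study can be illustrated with a minimal sketch. The abstract does not say which tool the authors used, so this is only an illustration: it applies the commonly cited SMOG grade formula, and the vowel-group syllable counter (`count_syllables`) is a crude assumption that will miscount some English words.

```python
import math
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): one syllable per group of consecutive vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def smog_grade(text: str) -> float:
    """SMOG grade: 1.0430 * sqrt(polysyllables * 30 / sentences) + 3.1291."""
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        raise ValueError("text must contain at least one sentence")
    words = re.findall(r"[A-Za-z]+", text)
    # A polysyllable is a word with three or more syllables.
    polysyllables = sum(1 for w in words if count_syllables(w) >= 3)
    return 1.0430 * math.sqrt(polysyllables * 30 / len(sentences)) + 3.1291


if __name__ == "__main__":
    sample = "Glaucoma is a progressive optic neuropathy. It damages the nerve."
    print(f"Estimated SMOG grade: {smog_grade(sample):.1f}")
```

The result approximates the years of education needed to understand the passage, which is how the grade 11.4 versus 12.4 comparison above should be read. The formula was designed for samples of about 30 sentences, so scores for very short passages are rough.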
Affiliation(s)
- Brendan Ka-Lok Tao
  - Faculty of Medicine, The University of British Columbia, 317-2194 Health Sciences Mall, Vancouver, BC, V6T 1Z3, Canada
- Nicholas Hua
  - Temerty Faculty of Medicine, University of Toronto, 1 King's College Circle, Toronto, ON, M5S 1A8, Canada
- John Milkovich
  - Temerty Faculty of Medicine, University of Toronto, 1 King's College Circle, Toronto, ON, M5S 1A8, Canada
- Jonathan Andrew Micieli
  - Temerty Faculty of Medicine, University of Toronto, 1 King's College Circle, Toronto, ON, M5S 1A8, Canada
  - Department of Ophthalmology and Vision Sciences, University of Toronto, 340 College Street, Toronto, ON, M5T 3A9, Canada
  - Division of Neurology, Department of Medicine, University of Toronto, 6 Queen's Park Crescent West, Toronto, ON, M5S 3H2, Canada
  - Kensington Vision and Research Center, 340 College Street, Toronto, ON, M5T 3A9, Canada
  - St. Michael's Hospital, 36 Queen Street East, Toronto, ON, M5B 1W8, Canada
  - Toronto Western Hospital, 399 Bathurst Street, Toronto, ON, M5T 2S8, Canada
  - University Health Network, 190 Elizabeth Street, Toronto, ON, M5G 2C4, Canada
2
Gui H, Omiye JA, Chang CT, Daneshjou R. The Promises and Perils of Foundation Models in Dermatology. J Invest Dermatol 2024; 144:1440-1448. [PMID: 38441507 DOI: 10.1016/j.jid.2023.12.019] [Received: 10/30/2023] [Revised: 12/19/2023] [Accepted: 12/20/2023] [Indexed: 06/24/2024]
Abstract
Foundation models (FM), which are large-scale artificial intelligence (AI) models that can complete a range of tasks, represent a paradigm shift in AI. These versatile models encompass large language models, vision-language models, and multimodal models. Although these models are often trained for broad tasks, they have been applied either out of the box or after additional fine tuning to tasks in medicine, including dermatology. From addressing administrative tasks to answering dermatology questions, these models are poised to have an impact on dermatology care delivery. As FMs become more ubiquitous in health care, it is important for clinicians and dermatologists to have a basic understanding of how these models are developed, what they are capable of, and what pitfalls exist. In this paper, we present a comprehensive yet accessible overview of the current state of FMs and summarize their current applications in dermatology, highlight their limitations, and discuss future developments in the field.
Affiliation(s)
- Haiwen Gui
  - Department of Dermatology, Stanford University, Stanford, California, USA
- Jesutofunmi A Omiye
  - Department of Dermatology, Stanford University, Stanford, California, USA; Department of Biomedical Data Science, Stanford University, Stanford, California, USA
- Crystal T Chang
  - Department of Dermatology, Stanford University, Stanford, California, USA; Clinical Excellence Research Center, School of Medicine, Stanford University, Palo Alto, California, USA
- Roxana Daneshjou
  - Department of Dermatology, Stanford University, Stanford, California, USA; Department of Biomedical Data Science, Stanford University, Stanford, California, USA
3
Kale AU, Hogg HDJ, Pearson R, Glocker B, Golder S, Coombe A, Waring J, Liu X, Moore DJ, Denniston AK. Detecting Algorithmic Errors and Patient Harms for AI-Enabled Medical Devices in Randomized Controlled Trials: Protocol for a Systematic Review. JMIR Res Protoc 2024; 13:e51614. [PMID: 38941147 PMCID: PMC11245650 DOI: 10.2196/51614] [Received: 08/23/2023] [Revised: 03/11/2024] [Accepted: 04/18/2024] [Indexed: 06/29/2024]
Abstract
BACKGROUND: Artificial intelligence (AI) medical devices have the potential to transform existing clinical workflows and ultimately improve patient outcomes. AI medical devices have shown potential for a range of clinical tasks such as diagnostics, prognostics, and therapeutic decision-making (eg, drug dosing). There is, however, an urgent need to ensure that these technologies remain safe for all populations. Recent literature demonstrates the need for rigorous performance error analysis to identify issues such as algorithmic encoding of spurious correlations (eg, protected characteristics) or specific failure modes that may lead to patient harm. Guidelines for reporting on studies that evaluate AI medical devices require the mention of performance error analysis; however, there is still a lack of understanding around how performance errors should be analyzed in clinical studies, and what harms authors should aim to detect and report.
OBJECTIVE: This systematic review will assess the frequency and severity of AI errors and adverse events (AEs) in randomized controlled trials (RCTs) investigating AI medical devices as interventions in clinical settings. The review will also explore how performance errors are analyzed, including whether the analysis includes the investigation of subgroup-level outcomes.
METHODS: This systematic review will identify and select RCTs assessing AI medical devices. Search strategies will be deployed in MEDLINE (Ovid), Embase (Ovid), Cochrane CENTRAL, and clinical trial registries to identify relevant papers. RCTs identified in bibliographic databases will be cross-referenced with clinical trial registries. The primary outcomes of interest are the frequency and severity of AI errors, patient harms, and reported AEs. Quality assessment of RCTs will be based on version 2 of the Cochrane risk-of-bias tool (RoB2). Data analysis will include a comparison of error rates and patient harms between study arms, and a meta-analysis of the rates of patient harm in control versus intervention arms will be conducted if appropriate.
RESULTS: The project was registered on PROSPERO in February 2023. Preliminary searches have been completed and the search strategy has been designed in consultation with an information specialist and methodologist. Title and abstract screening started in September 2023. Full-text screening is ongoing, and data collection and analysis began in April 2024.
CONCLUSIONS: Evaluations of AI medical devices have shown promising results; however, reporting of studies has been variable. Detection, analysis, and reporting of performance errors and patient harms are vital to robustly assess the safety of AI medical devices in RCTs. Scoping searches have illustrated that the reporting of harms is variable, often with no mention of AEs. The findings of this systematic review will identify the frequency and severity of AI performance errors and patient harms and generate insights into how errors should be analyzed to account for both overall and subgroup performance.
TRIAL REGISTRATION: PROSPERO CRD42023387747; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=387747.
INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID): PRR1-10.2196/51614.
Affiliation(s)
- Aditya U Kale
  - Institute of Inflammation and Ageing, University of Birmingham, Birmingham, United Kingdom
  - University Hospitals Birmingham NHS Foundation Trust, Birmingham, United Kingdom
  - NIHR Birmingham Biomedical Research Centre, Birmingham, United Kingdom
  - NIHR Incubator for AI and Digital Health Research, Birmingham, United Kingdom
- Henry David Jeffry Hogg
  - Population Health Science Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, United Kingdom
- Russell Pearson
  - Medicines and Healthcare Products Regulatory Agency, London, United Kingdom
- Ben Glocker
  - Kheiron Medical Technologies, London, United Kingdom
  - Department of Computing, Imperial College London, London, United Kingdom
- Su Golder
  - Department of Health Sciences, University of York, York, United Kingdom
- April Coombe
  - Institute of Applied Health Research, University of Birmingham, Birmingham, United Kingdom
- Justin Waring
  - Health Services Management Centre, University of Birmingham, Birmingham, United Kingdom
- Xiaoxuan Liu
  - Institute of Inflammation and Ageing, University of Birmingham, Birmingham, United Kingdom
  - University Hospitals Birmingham NHS Foundation Trust, Birmingham, United Kingdom
  - NIHR Birmingham Biomedical Research Centre, Birmingham, United Kingdom
  - NIHR Incubator for AI and Digital Health Research, Birmingham, United Kingdom
- David J Moore
  - Institute of Applied Health Research, University of Birmingham, Birmingham, United Kingdom
- Alastair K Denniston
  - Institute of Inflammation and Ageing, University of Birmingham, Birmingham, United Kingdom
  - University Hospitals Birmingham NHS Foundation Trust, Birmingham, United Kingdom
  - NIHR Birmingham Biomedical Research Centre, Birmingham, United Kingdom
  - NIHR Incubator for AI and Digital Health Research, Birmingham, United Kingdom
4
Tan YY, Kang HG, Lee CJ, Kim SS, Park S, Thakur S, Da Soh Z, Cho Y, Peng Q, Tham YC, Rim TH, Cheng CY. Prognostic potentials of AI in ophthalmology: systemic disease forecasting via retinal imaging. Eye Vis (Lond) 2024; 11:17. [PMID: 38711111 PMCID: PMC11071258 DOI: 10.1186/s40662-024-00384-3] [Received: 08/28/2023] [Accepted: 04/17/2024] [Indexed: 05/08/2024]
Abstract
BACKGROUND: Artificial intelligence (AI) that utilizes deep learning (DL) has potential for systemic disease prediction using retinal imaging. The retina's unique features enable non-invasive visualization of the central nervous system and microvascular circulation, aiding early detection and personalized treatment plans. This review explores the value of retinal assessment, AI-based retinal biomarkers, and the importance of longitudinal prediction models in personalized care.
MAIN TEXT: This narrative review extensively surveys the literature for relevant studies in PubMed and Google Scholar, investigating the application of AI-based retinal biomarkers in predicting systemic diseases using retinal fundus photography. The study settings, sample sizes, utilized AI models, and corresponding results were extracted and analysed. This review highlights the substantial potential of AI-based retinal biomarkers in predicting neurodegenerative, cardiovascular, and chronic kidney diseases. Notably, DL algorithms have demonstrated effectiveness in identifying retinal image features associated with cognitive decline, dementia, Parkinson's disease, and cardiovascular risk factors. Furthermore, longitudinal prediction models leveraging retinal images have shown potential in continuous disease risk assessment and early detection. AI-based retinal biomarkers are non-invasive, accurate, and efficient for disease forecasting and personalized care.
CONCLUSION: AI-based retinal imaging holds promise in transforming primary care and systemic disease management. Together, the retina's unique features and the power of AI enable early detection and risk stratification, and help revolutionize disease management plans. However, to fully realize the potential of AI in this domain, further research and validation in real-world settings are essential.
Affiliation(s)
- Hyun Goo Kang
  - Division of Retina, Severance Eye Hospital, Yonsei University College of Medicine, Seoul, South Korea
- Chan Joo Lee
  - Division of Cardiology, Severance Cardiovascular Hospital, Yonsei University College of Medicine, Seoul, South Korea
- Sung Soo Kim
  - Division of Retina, Severance Eye Hospital, Yonsei University College of Medicine, Seoul, South Korea
- Sungha Park
  - Division of Cardiology, Severance Cardiovascular Hospital, Yonsei University College of Medicine, Seoul, South Korea
- Sahil Thakur
  - Singapore Eye Research Institute, Singapore National Eye Centre, Singapore, Singapore
- Zhi Da Soh
  - Singapore Eye Research Institute, Singapore National Eye Centre, Singapore, Singapore
  - Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Yunnie Cho
  - Mediwhale Inc, Seoul, Republic of Korea
  - Department of Education and Human Resource Development, Seoul National University Hospital, Seoul, South Korea
- Qingsheng Peng
  - Singapore Eye Research Institute, Singapore National Eye Centre, Singapore, Singapore
- Yih-Chung Tham
  - Singapore Eye Research Institute, Singapore National Eye Centre, Singapore, Singapore
  - Mediwhale Inc, Seoul, Republic of Korea
- Tyler Hyungtaek Rim
  - Singapore Eye Research Institute, Singapore National Eye Centre, Singapore, Singapore
  - Mediwhale Inc, Seoul, Republic of Korea
- Ching-Yu Cheng
  - Singapore Eye Research Institute, Singapore National Eye Centre, Singapore, Singapore
  - Centre for Innovation and Precision Eye Health and Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
5
Khan SD, Hoodbhoy Z, Raja MHR, Kim JY, Hogg HDJ, Manji AAA, Gulamali F, Hasan A, Shaikh A, Tajuddin S, Khan NS, Patel MR, Balu S, Samad Z, Sendak MP. Frameworks for procurement, integration, monitoring, and evaluation of artificial intelligence tools in clinical settings: A systematic review. PLOS Digit Health 2024; 3:e0000514. [PMID: 38809946 PMCID: PMC11135672 DOI: 10.1371/journal.pdig.0000514] [Received: 09/04/2023] [Accepted: 04/18/2024] [Indexed: 05/31/2024]
Abstract
Research on the applications of artificial intelligence (AI) tools in medicine has increased exponentially over the last few years, but their implementation in clinical practice has not seen a commensurate increase, and there is a lack of consensus on implementing and maintaining such tools. This systematic review aims to summarize frameworks focusing on procuring, implementing, monitoring, and evaluating AI tools in clinical practice. A comprehensive literature search, following PRISMA guidelines, was performed on MEDLINE, Wiley Cochrane, Scopus, and EBSCO databases to identify and include articles recommending practices, frameworks, or guidelines for AI procurement, integration, monitoring, and evaluation. From the included articles, data regarding study aim, use of a framework, rationale of the framework, and details regarding AI implementation involving procurement, integration, monitoring, and evaluation were extracted. The extracted details were then mapped onto the Donabedian Plan, Do, Study, Act cycle domains. The search yielded 17,537 unique articles, out of which 47 were evaluated for inclusion based on their full texts and 25 articles were included in the review. Common themes extracted included transparency, feasibility of operation within existing workflows, integrating into existing workflows, validation of the tool using predefined performance indicators, and improving the algorithm and/or adjusting the tool to improve performance. Among the four domains (Plan, Do, Study, Act), the most common domain was Plan (84%, n = 21), followed by Study (60%, n = 15), Do (52%, n = 13), and Act (24%, n = 6). Among 172 authors, only 1 (0.6%) was from a low-income country (LIC) and 2 (1.2%) were from lower-middle-income countries (LMICs). Healthcare professionals cite the implementation of AI tools within clinical settings as challenging owing to low levels of evidence focusing on integration in the Do and Act domains. The current healthcare AI landscape calls for increased data sharing and knowledge translation to facilitate common goals and reap maximum clinical benefit.
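The domain and author percentages reported in this abstract can be checked directly from the stated counts. The sketch below is only an arithmetic check; the variable names are illustrative and not taken from the paper.

```python
# Counts as stated in the abstract: 25 included articles, 172 authors.
INCLUDED_ARTICLES = 25
TOTAL_AUTHORS = 172

# Number of included articles mapped to each Plan-Do-Study-Act domain.
domain_counts = {"Plan": 21, "Study": 15, "Do": 13, "Act": 6}

# Share of included articles per domain, rounded to whole percent.
domain_pct = {
    name: round(100 * count / INCLUDED_ARTICLES)
    for name, count in domain_counts.items()
}

# Share of authors by country income group, rounded to one decimal place.
lic_pct = round(100 * 1 / TOTAL_AUTHORS, 1)
lmic_pct = round(100 * 2 / TOTAL_AUTHORS, 1)

if __name__ == "__main__":
    for name, pct in domain_pct.items():
        print(f"{name}: {pct}% (n = {domain_counts[name]})")
    print(f"LIC authors: {lic_pct}%, LMIC authors: {lmic_pct}%")
```

The domain percentages sum to more than 100% because a single article can address several domains.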
Affiliation(s)
- Sarim Dawar Khan
  - CITRIC Health Data Science Centre, Department of Medicine, Aga Khan University, Karachi, Pakistan
- Zahra Hoodbhoy
  - CITRIC Health Data Science Centre, Department of Medicine, Aga Khan University, Karachi, Pakistan
  - Department of Paediatrics and Child Health, Aga Khan University, Karachi, Pakistan
- Jee Young Kim
  - Duke Institute for Health Innovation, Duke University School of Medicine, Durham, North Carolina, United States
- Henry David Jeffry Hogg
  - Population Health Science Institute, Newcastle University, Newcastle upon Tyne, United Kingdom
  - Newcastle upon Tyne Hospitals NHS Foundation Trust, Newcastle upon Tyne, United Kingdom
  - Moorfields Eye Hospital NHS Foundation Trust, London, United Kingdom
- Afshan Anwar Ali Manji
  - CITRIC Health Data Science Centre, Department of Medicine, Aga Khan University, Karachi, Pakistan
- Freya Gulamali
  - Duke Institute for Health Innovation, Duke University School of Medicine, Durham, North Carolina, United States
- Alifia Hasan
  - Duke Institute for Health Innovation, Duke University School of Medicine, Durham, North Carolina, United States
- Asim Shaikh
  - CITRIC Health Data Science Centre, Department of Medicine, Aga Khan University, Karachi, Pakistan
- Salma Tajuddin
  - CITRIC Health Data Science Centre, Department of Medicine, Aga Khan University, Karachi, Pakistan
- Nida Saddaf Khan
  - CITRIC Health Data Science Centre, Department of Medicine, Aga Khan University, Karachi, Pakistan
- Manesh R. Patel
  - Duke Clinical Research Institute, Duke University School of Medicine, Durham, North Carolina, United States
  - Division of Cardiology, Duke University School of Medicine, Durham, North Carolina, United States
- Suresh Balu
  - Duke Institute for Health Innovation, Duke University School of Medicine, Durham, North Carolina, United States
- Zainab Samad
  - CITRIC Health Data Science Centre, Department of Medicine, Aga Khan University, Karachi, Pakistan
  - Department of Medicine, Aga Khan University, Karachi, Pakistan
- Mark P. Sendak
  - Duke Institute for Health Innovation, Duke University School of Medicine, Durham, North Carolina, United States
6
Han R, Acosta JN, Shakeri Z, Ioannidis JPA, Topol EJ, Rajpurkar P. Randomised controlled trials evaluating artificial intelligence in clinical practice: a scoping review. Lancet Digit Health 2024; 6:e367-e373. [PMID: 38670745 PMCID: PMC11068159 DOI: 10.1016/s2589-7500(24)00047-5] [Received: 05/08/2023] [Revised: 03/01/2024] [Accepted: 03/04/2024] [Indexed: 04/28/2024]
Abstract
This scoping review of randomised controlled trials on artificial intelligence (AI) in clinical practice reveals an expanding interest in AI across clinical specialties and locations. The USA and China are leading in the number of trials, with a focus on deep learning systems for medical imaging, particularly in gastroenterology and radiology. A majority of trials (70 [81%] of 86) report positive primary endpoints, primarily related to diagnostic yield or performance; however, the predominance of single-centre trials, little demographic reporting, and varying reports of operational efficiency raise concerns about the generalisability and practicality of these results. Despite the promising outcomes, considering the likelihood of publication bias and the need for more comprehensive research including multicentre trials, diverse outcome measures, and improved reporting standards is crucial. Future AI trials should prioritise patient-relevant outcomes to fully understand AI's true effects and limitations in health care.
Affiliation(s)
- Ryan Han
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA; Department of Computer Science, Stanford University, Stanford, CA, USA; University of California Los Angeles-Caltech Medical Scientist Training Program, Los Angeles, CA, USA
- Julián N Acosta
  - Department of Neurology, Yale School of Medicine, New Haven, CT, USA; Rad AI, San Francisco, CA, USA
- Zahra Shakeri
  - Institute of Health Policy, Management and Evaluation, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
- John P A Ioannidis
  - Stanford Prevention Research Center, Department of Medicine, Stanford University, Stanford, CA, USA; Meta-Research Innovation Center at Stanford, Stanford University, Stanford, CA, USA
- Eric J Topol
  - Scripps Research Translational Institute, Scripps Research, La Jolla, CA, USA
- Pranav Rajpurkar
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
7
Adeoye J, Su YX. Leveraging artificial intelligence for perioperative cancer risk assessment of oral potentially malignant disorders. Int J Surg 2024; 110:1677-1686. [PMID: 38051932 PMCID: PMC10942172 DOI: 10.1097/js9.0000000000000979] [Received: 08/12/2023] [Accepted: 11/21/2023] [Indexed: 12/07/2023]
Abstract
Oral potentially malignant disorders (OPMDs) are mucosal conditions with an inherent disposition to develop oral squamous cell carcinoma. Surgical management is the preferred strategy to prevent malignant transformation in OPMDs, and surgical approaches to treatment include conventional scalpel excision, laser surgery, cryotherapy, and photodynamic therapy. However, since not all patients with OPMDs will develop oral squamous cell carcinoma in their lifetime, there is a need to stratify patients according to their risk of malignant transformation so that surgical intervention can be streamlined for patients with the highest risks. Artificial intelligence (AI) has the potential to integrate disparate factors influencing malignant transformation, giving more robust, precise, and personalized cancer risk stratification of OPMD patients than current methods when determining the need for surgical resection, excision, or re-excision. Therefore, this article overviews existing AI models and tools, presents a clinical implementation pathway, and discusses necessary refinements to aid the clinical application of AI-based platforms for cancer risk stratification of OPMDs in surgical practice.
Affiliation(s)
- Yu-Xiong Su
  - Division of Oral and Maxillofacial Surgery, Faculty of Dentistry, University of Hong Kong, Hong Kong SAR, People’s Republic of China
8
Martindale APL, Llewellyn CD, de Visser RO, Ng B, Ngai V, Kale AU, di Ruffano LF, Golub RM, Collins GS, Moher D, McCradden MD, Oakden-Rayner L, Rivera SC, Calvert M, Kelly CJ, Lee CS, Yau C, Chan AW, Keane PA, Beam AL, Denniston AK, Liu X. Concordance of randomised controlled trials for artificial intelligence interventions with the CONSORT-AI reporting guidelines. Nat Commun 2024; 15:1619. [PMID: 38388497 PMCID: PMC10883966 DOI: 10.1038/s41467-024-45355-3] [Received: 07/27/2023] [Accepted: 01/22/2024] [Indexed: 02/24/2024]
Abstract
The Consolidated Standards of Reporting Trials extension for Artificial Intelligence interventions (CONSORT-AI) was published in September 2020. Since its publication, several randomised controlled trials (RCTs) of AI interventions have been published, but their completeness and transparency of reporting are unknown. This systematic review assesses the completeness of reporting of AI RCTs following publication of CONSORT-AI and provides a comprehensive summary of RCTs published in recent years. A total of 65 RCTs were identified, mostly conducted in China (37%) and the USA (18%). Median concordance with CONSORT-AI reporting was 90% (IQR 77-94%), although only 10 RCTs explicitly reported its use. Several items were consistently under-reported, including algorithm version, accessibility of the AI intervention or code, and references to a study protocol. Only 3 of 52 included journals explicitly endorsed or mandated CONSORT-AI. Despite a generally high concordance amongst recent AI RCTs, some AI-specific considerations remain systematically poorly reported. Further encouragement of CONSORT-AI adoption by journals and funders may enable more complete adoption of the full CONSORT-AI guidelines.
Affiliation(s)
- Carrie D Llewellyn
  - Department of Primary Care and Public Health, Brighton and Sussex Medical School, Brighton, UK
- Richard O de Visser
  - Department of Primary Care and Public Health, Brighton and Sussex Medical School, Brighton, UK
- Benjamin Ng
  - Birmingham and Midland Eye Centre, Sandwell and West Birmingham NHS Trust, Birmingham, UK
  - Christ Church, University of Oxford, Oxford, UK
- Victoria Ngai
  - University College London Medical School, London, UK
- Aditya U Kale
  - Institute of Inflammation and Ageing, University of Birmingham, Birmingham, UK
  - University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
  - National Institute for Health and Care Research (NIHR) Birmingham Biomedical Research Centre, University of Birmingham, Birmingham, UK
- Robert M Golub
  - Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
- Gary S Collins
  - Centre for Statistics in Medicine/UK EQUATOR Centre, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, UK
- David Moher
  - Centre for Journalology, Clinical Epidemiology Program, Ottawa Hospital Research Institute, Ottawa, Canada
- Melissa D McCradden
  - Department of Bioethics, The Hospital for Sick Children, Toronto, Canada
  - Genetics & Genome Biology Research Program, Peter Gilgan Centre for Research & Learning, Toronto, Canada
  - Division of Clinical and Public Health, Dalla Lana School of Public Health, Toronto, Canada
- Lauren Oakden-Rayner
  - Australian Institute for Machine Learning, University of Adelaide, Adelaide, Australia
- Samantha Cruz Rivera
  - Birmingham Health Partners Centre for Regulatory Science and Innovation, University of Birmingham, Birmingham, UK
  - Centre for Patient Reported Outcomes Research (CPROR), Institute of Applied Health Research, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
- Melanie Calvert
  - National Institute for Health and Care Research (NIHR) Birmingham Biomedical Research Centre, University of Birmingham, Birmingham, UK
  - Birmingham Health Partners Centre for Regulatory Science and Innovation, University of Birmingham, Birmingham, UK
  - Centre for Patient Reported Outcomes Research (CPROR), Institute of Applied Health Research, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
  - NIHR Applied Research Collaboration (ARC) West Midlands, University of Birmingham, Birmingham, UK
  - NIHR Blood and Transplant Research Unit (BTRU) in Precision Transplant and Cellular Therapeutics, University of Birmingham, Birmingham, UK
- Christopher Yau
  - Nuffield Department of Women's and Reproductive Health, University of Oxford, Oxford, UK
  - Health Data Research UK, London, UK
- An-Wen Chan
  - Department of Medicine, Women's College Hospital, University of Toronto, Toronto, Canada
- Pearse A Keane
  - NIHR Biomedical Research Centre at Moorfields, Moorfields Eye Hospital NHS Foundation Trust and UCL Institute of Ophthalmology, London, UK
- Andrew L Beam
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Alastair K Denniston
  - Institute of Inflammation and Ageing, University of Birmingham, Birmingham, UK
  - University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
  - National Institute for Health and Care Research (NIHR) Birmingham Biomedical Research Centre, University of Birmingham, Birmingham, UK
  - Birmingham Health Partners Centre for Regulatory Science and Innovation, University of Birmingham, Birmingham, UK
  - NIHR Biomedical Research Centre at Moorfields, Moorfields Eye Hospital NHS Foundation Trust and UCL Institute of Ophthalmology, London, UK
- Xiaoxuan Liu
  - Institute of Inflammation and Ageing, University of Birmingham, Birmingham, UK
  - University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
  - Birmingham Health Partners Centre for Regulatory Science and Innovation, University of Birmingham, Birmingham, UK
9
Boverhof BJ, Redekop WK, Bos D, Starmans MPA, Birch J, Rockall A, Visser JJ. Radiology AI Deployment and Assessment Rubric (RADAR) to bring value-based AI into radiological practice. Insights Imaging 2024; 15:34. [PMID: 38315288 PMCID: PMC10844175 DOI: 10.1186/s13244-023-01599-z] [Received: 08/31/2023] [Accepted: 11/14/2023] [Indexed: 02/07/2024]
Abstract
OBJECTIVE To provide a comprehensive framework for value assessment of artificial intelligence (AI) in radiology. METHODS This paper presents the RADAR framework, which has been adapted from Fryback and Thornbury's imaging efficacy framework to facilitate the valuation of radiology AI from conception to local implementation. Local efficacy has been newly introduced to underscore the importance of appraising an AI technology within its local environment. Furthermore, the RADAR framework is illustrated through a myriad of study designs that help assess value. RESULTS RADAR presents a seven-level hierarchy, providing radiologists, researchers, and policymakers with a structured approach to the comprehensive assessment of value in radiology AI. RADAR is designed to be dynamic and meet the different valuation needs throughout the AI's lifecycle. Initial phases like technical and diagnostic efficacy (RADAR-1 and RADAR-2) are assessed pre-clinical deployment via in silico clinical trials and cross-sectional studies. Subsequent stages, spanning from diagnostic thinking to patient outcome efficacy (RADAR-3 to RADAR-5), require clinical integration and are explored via randomized controlled trials and cohort studies. Cost-effectiveness efficacy (RADAR-6) takes a societal perspective on financial feasibility, addressed via health-economic evaluations. The final level, RADAR-7, determines how prior valuations translate locally, evaluated through budget impact analysis, multi-criteria decision analyses, and prospective monitoring. CONCLUSION The RADAR framework offers a comprehensive framework for valuing radiology AI. Its layered, hierarchical structure, combined with a focus on local relevance, aligns RADAR seamlessly with the principles of value-based radiology. CRITICAL RELEVANCE STATEMENT The RADAR framework advances artificial intelligence in radiology by delineating a much-needed framework for comprehensive valuation. 
KEYPOINTS
• Radiology artificial intelligence lacks a comprehensive approach to value assessment.
• The RADAR framework provides a dynamic, hierarchical method for thorough valuation of radiology AI.
• RADAR advances clinical radiology by bridging the artificial intelligence implementation gap.
Affiliation(s)
- Bart-Jan Boverhof
  - Erasmus School of Health Policy and Management, Erasmus University Rotterdam, Rotterdam, The Netherlands
- W Ken Redekop
  - Erasmus School of Health Policy and Management, Erasmus University Rotterdam, Rotterdam, The Netherlands
- Daniel Bos
  - Department of Epidemiology, Erasmus University Medical Centre, Rotterdam, The Netherlands
  - Department of Radiology & Nuclear Medicine, Erasmus University Medical Centre, Rotterdam, The Netherlands
- Martijn P A Starmans
  - Department of Radiology & Nuclear Medicine, Erasmus University Medical Centre, Rotterdam, The Netherlands
- Andrea Rockall
  - Department of Surgery & Cancer, Imperial College London, London, UK
- Jacob J Visser
  - Department of Radiology & Nuclear Medicine, Erasmus University Medical Centre, Rotterdam, The Netherlands
|
10
|
Affiliation(s)
- Chin-Chi Kuo
  - China Medical University Hospital, Taichung, Taiwan
|
11
|
Sung JJY, Savulescu J, Ngiam KY, An B, Ang TL, Yeoh KG, Cham TJ, Tsao S, Chua TS. Artificial intelligence for gastroenterology: Singapore artificial intelligence for Gastroenterology Working Group Position Statement. J Gastroenterol Hepatol 2023; 38:1669-1676. [PMID: 37277693 DOI: 10.1111/jgh.16241] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2023] [Revised: 05/10/2023] [Accepted: 05/11/2023] [Indexed: 06/07/2023]
Abstract
BACKGROUND Successful implementation of artificial intelligence in gastroenterology and hepatology practice requires more than technology. There are ethical, legal, and social issues that need to be settled. AIM A group consisting of AI developers (engineer), AI users (gastroenterologist, hepatologist, and surgeon) and AI regulators (ethicist and administrator) formed a Working Group to draft these Position Statements with the objectives of arousing public and professional interest and dialogue, promoting ethical considerations when implementing AI technology, suggesting to policy makers and health authorities relevant factors to take into account when approving and regulating the use of AI tools, and engaging the profession in preparing for change in clinical practice. STATEMENTS This series of Position Statements points out the salient issues for maintaining trust between care providers and care receivers, and for legitimizing the use of a non-human tool in healthcare delivery. It is based on fundamental principles such as respect, autonomy, privacy, responsibility, and justice. Enforcing the use of AI without considering these factors risks damaging the doctor-patient relationship.
Affiliation(s)
- Joseph J Y Sung
  - Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore
- Julian Savulescu
  - Centre for Biomedical Ethics, National University of Singapore, Singapore
- K Y Ngiam
  - Department of Surgery, National University Hospital, Singapore
- Bo An
  - School of Computer Science and Engineering, Nanyang Technological University, Singapore
- Tiing Leong Ang
  - Singapore Health Service, Changi General Hospital, Singapore
- K G Yeoh
  - Department of Medicine, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
  - Department of Gastroenterology and Hepatology, National University Hospital, National University Health System, Singapore
- Tat-Jen Cham
  - School of Computer Science and Engineering, Nanyang Technological University, Singapore
- Stephen Tsao
  - National Healthcare Group, Tan Tock Seng Hospital Singapore, Singapore
  - Gastroenterological Society of Singapore, Singapore
- T S Chua
  - Gastroenterology Chapter, Academy of Medicine, Singapore
|
12
|
Singareddy S, Sn VP, Jaramillo AP, Yasir M, Iyer N, Hussein S, Nath TS. Artificial Intelligence and Its Role in the Management of Chronic Medical Conditions: A Systematic Review. Cureus 2023; 15:e46066. [PMID: 37900468 PMCID: PMC10607642 DOI: 10.7759/cureus.46066] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2023] [Accepted: 09/27/2023] [Indexed: 10/31/2023] Open
Abstract
Due to the increased burden of chronic medical conditions in recent years, artificial intelligence (AI) has been suggested in the medical field to optimize health care. Physicians could implement these automated problem-solving tools for their benefit, reducing their workload, assisting in diagnostics, and supporting clinical decision-making. These tools are being considered for future medical assistance in real life. A literature review was performed to assess the impact of AI on the patient population with chronic medical conditions, using standardized guidelines. A MeSH strategy was created, and the databases were searched for appropriate studies using specific inclusion and exclusion criteria. The online search yielded 93 results from various databases, of which 10 moderate to high-quality studies were selected for inclusion in our systematic review after removing duplicates and screening titles and articles. Of the 10 studies, nine recommended using AI after considering potential limitations such as privacy protection, medicolegal implications, and psychosocial aspects. Due to its non-fatigable nature, AI was found to be of immense help in image recognition. It was also found to be valuable in various disciplines related to administration, physician burden, and patient adherence. The newer technologies of chatbots and eHealth applications are of great help when used safely and effectively after proper patient education. After a careful review conducted by our team members, it is safe to conclude that implementing AI in daily clinical practice could potentiate the cognitive ability of physicians and decrease their workload through various automated technologies such as image recognition, speech recognition, and voice recognition, owing to its unmatched speed and non-fatigable nature compared with clinicians.
Despite its vast benefits to the medical field, a few limitations could hinder its effective implementation in real-life practice; extensive research and strict regulation are required to support its role as a physician's aid. However, AI should be used only as a medical support system to improve primary outcomes, for example by reducing waiting time, healthcare costs, and workload. AI is not meant to replace physicians.
Affiliation(s)
- Sanjana Singareddy
  - Internal Medicine, California Institute of Behavioral Neurosciences & Psychology, Fairfield, USA
- Vijay Prabhu Sn
  - Internal Medicine, California Institute of Behavioral Neurosciences & Psychology, Fairfield, USA
- Arturo P Jaramillo
  - General Practice, California Institute of Behavioral Neurosciences & Psychology, Fairfield, USA
- Mohamed Yasir
  - Research, California Institute of Behavioral Neurosciences & Psychology, Fairfield, USA
- Nandhini Iyer
  - Internal Medicine, California Institute of Behavioral Neurosciences & Psychology, Fairfield, USA
- Sally Hussein
  - Internal Medicine, California Institute of Behavioral Neurosciences & Psychology, Fairfield, USA
- Tuheen Sankar Nath
  - Surgical Oncology, California Institute of Behavioral Neurosciences & Psychology, Fairfield, USA
|
13
|
Nong P. Demonstrating Trustworthiness to Patients in Data-Driven Health Care. Hastings Cent Rep 2023; 53 Suppl 2:S69-S75. [PMID: 37963050 DOI: 10.1002/hast.1526] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2023]
Abstract
Patient data is used to drive an ecosystem of advanced digital tools in health care, like predictive models or artificial intelligence-based decision support. Patients themselves, however, receive little information about these technologies or how they affect their care. This raises important questions about patient trust and continued engagement in a health care system that extracts their data but does not treat them as key stakeholders. This essay explores these tensions and provides steps forward for health systems as they design advanced health information-technology (IT) policies and practices. It centers patients, their concerns, and the ways they perceive trustworthiness to reframe advanced health IT in service of patient interests.
|
14
|
Zdziechowski A, Gluba-Sagr A, Rysz J, Woldańska-Okońska M. Why Does Rehabilitation Not (Always) Work in Osteoarthritis? Does Rehabilitation Need Molecular Biology? Int J Mol Sci 2023; 24:ijms24098109. [PMID: 37175818 PMCID: PMC10179350 DOI: 10.3390/ijms24098109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2023] [Revised: 04/14/2023] [Accepted: 04/20/2023] [Indexed: 05/15/2023] Open
Abstract
Osteoarthritis (OA) is a common disease worldwide. OA causes functional impairment, leads to disability, and poses a serious socioeconomic burden. Rehabilitation offers a function-oriented method to reduce disability using diverse interventions (kinesiotherapy, physical therapy, occupational therapy, education, and pharmacotherapy). As a widespread disease among elderly patients, OA is often treated by rehabilitation specialists and physiotherapists; however, the results of rehabilitation are sometimes unsatisfactory. Understanding the molecular mechanisms activated by rehabilitation may enable the development of more effective rehabilitation procedures. Molecular biology methods may prove crucial in rehabilitation, as the majority of rehabilitation procedures cannot be evaluated in the double-blinded placebo-controlled trials commonly used in pharmacotherapy. This article attempts to present and assess the role of molecular biology in the development of modern rehabilitation. The role of clinicians in adequate molecular biology experimental design is also described.
Affiliation(s)
- Adam Zdziechowski
  - Department of Internal Diseases, Rehabilitation and Physical Medicine, Medical University, 90-700 Łódź, Poland
- Anna Gluba-Sagr
  - Department of Nephrology, Hypertension and Family Medicine, Medical University of Lodz, 90-549 Łódź, Poland
- Jacek Rysz
  - Department of Nephrology, Hypertension and Family Medicine, Medical University of Lodz, 90-549 Łódź, Poland
- Marta Woldańska-Okońska
  - Department of Internal Diseases, Rehabilitation and Physical Medicine, Medical University, 90-700 Łódź, Poland
|
15
|
Sendak M, Vidal D, Trujillo S, Singh K, Liu X, Balu S. Editorial: Surfacing best practices for AI software development and integration in healthcare. Front Digit Health 2023; 5:1150875. [PMID: 36895323 PMCID: PMC9989472 DOI: 10.3389/fdgth.2023.1150875] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2023] [Accepted: 02/06/2023] [Indexed: 02/25/2023] Open
Affiliation(s)
- Mark Sendak
  - Duke Institute for Health Innovation, Durham, NC, United States
- Karandeep Singh
  - Division of Nephrology, Department of Internal Medicine, University of Michigan, Ann Arbor, MI, United States
- Xiaoxuan Liu
  - Academic Unit of Ophthalmology, Institute of Inflammation and Ageing, University of Birmingham, Birmingham, United Kingdom
- Suresh Balu
  - Duke Institute for Health Innovation, Durham, NC, United States
|