1
Mäenpää SM, Korja M. Diagnostic test accuracy of externally validated convolutional neural network (CNN) artificial intelligence (AI) models for emergency head CT scans - A systematic review. Int J Med Inform 2024; 189:105523. [PMID: 38901270 DOI: 10.1016/j.ijmedinf.2024.105523] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2024] [Revised: 05/29/2024] [Accepted: 06/10/2024] [Indexed: 06/22/2024]
Abstract
BACKGROUND The surge in emergency head CT imaging and artificial intelligence (AI) advancements, especially deep learning (DL) and convolutional neural networks (CNN), has accelerated the development of computer-aided diagnosis (CADx) for emergency imaging. External validation assesses model generalizability, providing preliminary evidence of clinical potential. OBJECTIVES This study systematically reviews externally validated CNN-CADx models for emergency head CT scans, critically appraises diagnostic test accuracy (DTA), and assesses adherence to reporting guidelines. METHODS Studies comparing CNN-CADx model performance to a reference standard were eligible. The review was registered in PROSPERO (CRD42023411641) and conducted on Medline, Embase, EBM-Reviews and Web of Science following the PRISMA-DTA guideline. DTA reporting was systematically extracted and appraised using standardised checklists (STARD, CHARMS, CLAIM, TRIPOD, PROBAST, QUADAS-2). RESULTS Six of 5636 identified studies were eligible. The common target condition was intracranial haemorrhage (ICH), and the intended workflow roles were auxiliary to experts. Due to methodological and clinical between-study variation, meta-analysis was inappropriate. The scan-level sensitivity exceeded 90 % in 5/6 studies, while specificities ranged from 58.0 % to 97.7 %. The SROC 95 % predictive region was markedly broader than the confidence region, ranging above 50 % sensitivity and 20 % specificity. All studies had unclear or high risk of bias and concern for applicability (QUADAS-2, PROBAST), and reporting adherence was below 50 % in 20 of 32 TRIPOD items. CONCLUSION Only about 0.1 % (6 of 5636) of identified studies met the eligibility criteria. The evidence on the DTA of CNN-CADx models for emergency head CT scans remains limited in the scope of this review, as the reviewed studies were scarce, inapt for meta-analysis and undermined by inadequate methodological conduct and reporting. 
Properly conducted, external validation remains preliminary for evaluating the clinical potential of AI-CADx models, but prospective and pragmatic clinical validation in comparative trials remains most crucial. In conclusion, future AI-CADx research processes should be methodologically standardized and reported in a clinically meaningful way to avoid research waste.
Affiliation(s)
- Saana M Mäenpää
  - Department of Neurosurgery, University of Helsinki and Helsinki University Hospital, Helsinki, Finland.
- Miikka Korja
  - Department of Neurosurgery, University of Helsinki and Helsinki University Hospital, Helsinki, Finland.
2
Seringa J, Abreu J, Magalhaes T. Machine learning methods, applications and economic analysis to predict heart failure hospitalisation risk: a scoping review protocol. BMJ Open 2024; 14:e083188. [PMID: 38580361 PMCID: PMC11002361 DOI: 10.1136/bmjopen-2023-083188] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2023] [Accepted: 03/22/2024] [Indexed: 04/07/2024]
Abstract
INTRODUCTION Machine learning (ML) has emerged as a powerful tool for uncovering patterns and generating new information. In cardiology, it has shown promising results in outcome prediction and risk assessment for patients with heart failure (HF), a chronic condition affecting over 64 million individuals globally. This scoping review aims to synthesise the evidence on ML methods, applications and economic analysis to predict the HF hospitalisation risk. METHODS AND ANALYSIS This scoping review will use the approach described by Arksey and O'Malley. This protocol will use the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) Protocol, and the PRISMA extension for scoping reviews will be used to present the results. PubMed, Scopus and Web of Science are the databases that will be searched. Two reviewers will independently screen the full-text studies for inclusion and extract the data. All studies focusing on ML models that predict the risk of hospitalisation in adult patients with HF will be included. ETHICS AND DISSEMINATION Ethical approval is not required for this review. The dissemination strategy includes peer-reviewed publications, conference presentations and dissemination to relevant stakeholders.
Affiliation(s)
- Joana Seringa
  - NOVA National School of Public Health, NOVA University Lisbon, Lisbon, Portugal
  - NOVA National School of Public Health, Public Health Research Centre, Comprehensive Health Research Center, CHRC, NOVA University Lisbon, Lisbon, Portugal
- João Abreu
  - NOVA National School of Public Health, NOVA University Lisbon, Lisbon, Portugal
- Teresa Magalhaes
  - NOVA National School of Public Health, NOVA University Lisbon, Lisbon, Portugal
  - NOVA National School of Public Health, Public Health Research Centre, Comprehensive Health Research Center, CHRC, NOVA University Lisbon, Lisbon, Portugal
3
Jeffery AD, Fabbri D, Reeves RM, Matheny ME. Use of noisy labels as weak learners to identify incompletely ascertainable outcomes: A Feasibility study with opioid-induced respiratory depression. Heliyon 2024; 10:e26434. [PMID: 38444495 PMCID: PMC10912240 DOI: 10.1016/j.heliyon.2024.e26434] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Revised: 02/09/2024] [Accepted: 02/13/2024] [Indexed: 03/07/2024]
Abstract
Objective Assigning outcome labels to large observational data sets in a timely and accurate manner, particularly when outcomes are rare or not directly ascertainable, remains a significant challenge within biomedical informatics. We examined whether noisy labels generated from subject matter experts' heuristics using heterogeneous data types within a data programming paradigm could provide outcome labels to a large, observational data set. We chose the clinical condition of opioid-induced respiratory depression for our use case because it is rare, has no administrative codes to easily identify the condition, and typically requires at least some unstructured text to ascertain its presence. Materials and methods Using de-identified electronic health records of 52,861 post-operative encounters, we applied a data programming paradigm (implemented in the Snorkel software) for the development of a machine learning classifier for opioid-induced respiratory depression. Our approach included subject matter experts creating 14 labeling functions that served as noisy labels for developing a probabilistic Generative model. We used probabilistic labels from the Generative model as outcome labels for training a Discriminative model on the source data. We evaluated performance of the Discriminative model with a hold-out test set of 599 independently reviewed patient records. Results The final Discriminative classification model achieved an accuracy of 0.977, an F1 score of 0.417, a sensitivity of 1.0, and an AUC of 0.988 in the hold-out test set with a prevalence of 0.83% (5/599). Discussion All of the confirmed cases were identified by the classifier. For rare outcomes, this finding is encouraging because it reduces the number of manual reviews needed by excluding visits/patients with low probabilities. 
Conclusion Application of a data programming paradigm with expert-informed labeling functions might have utility for phenotyping clinical phenomena that are not easily ascertainable from highly structured data.
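The reported metrics can look contradictory at first glance (accuracy 0.977 and sensitivity 1.0, but F1 only 0.417). The sketch below works backwards from the figures quoted in the abstract to a confusion matrix that is consistent with them. The false-positive and true-negative counts are derived here by arithmetic and are not reported in the abstract, so they should be read as an inference rather than the authors' numbers.

```python
# Figures quoted in the abstract above.
n_total = 599          # hold-out test records
n_cases = 5            # confirmed cases (prevalence 5/599)
sensitivity = 1.0
f1_reported = 0.417

# Sensitivity of 1.0 means every confirmed case was flagged.
tp = round(sensitivity * n_cases)
fn = n_cases - tp

# F1 = 2*TP / (2*TP + FP + FN); solve for FP (derived, not reported).
fp = round(2 * tp / f1_reported - 2 * tp - fn)
tn = n_total - tp - fn - fp

precision = tp / (tp + fp)
specificity = tn / (tn + fp)
accuracy = (tp + tn) / n_total
f1 = 2 * tp / (2 * tp + fp + fn)

print(f"TP={tp} FN={fn} FP={fp} TN={tn}")
print(f"precision={precision:.3f} specificity={specificity:.3f}")
print(f"accuracy={accuracy:.3f} F1={f1:.3f}")
```

Under this reconstruction, the low F1 score comes from precision: with only five true cases, a modest number of false positives is enough to pull precision well below 0.5 while accuracy stays high.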
Affiliation(s)
- Alvin D. Jeffery
  - Vanderbilt University School of Nursing, Nashville, TN, USA
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
  - Tennessee Valley Healthcare System, U.S. Department of Veterans Affairs, Nashville, TN, USA
- Daniel Fabbri
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Ruth M. Reeves
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
  - Tennessee Valley Healthcare System, U.S. Department of Veterans Affairs, Nashville, TN, USA
- Michael E. Matheny
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
  - Tennessee Valley Healthcare System, U.S. Department of Veterans Affairs, Nashville, TN, USA
4
Jeffery AD, Fabbri D, Reeves RM, Matheny ME. Use of Noisy Labels as Weak Learners to Identify Incompletely Ascertainable Outcomes: A Feasibility Study with Opioid-Induced Respiratory Depression. medRxiv: The Preprint Server for Health Sciences 2024:2024.01.29.24301963. [PMID: 38352435 PMCID: PMC10863026 DOI: 10.1101/2024.01.29.24301963] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/19/2024]
Abstract
Objective Assigning outcome labels to large observational data sets in a timely and accurate manner, particularly when outcomes are rare or not directly ascertainable, remains a significant challenge within biomedical informatics. We examined whether noisy labels generated from subject matter experts' heuristics using heterogeneous data types within a data programming paradigm could provide outcome labels to a large, observational data set. We chose the clinical condition of opioid-induced respiratory depression for our use case because it is rare, has no administrative codes to easily identify the condition, and typically requires at least some unstructured text to ascertain its presence. Materials and Methods Using de-identified electronic health records of 52,861 post-operative encounters, we applied a data programming paradigm (implemented in the Snorkel software) for the development of a machine learning classifier for opioid-induced respiratory depression. Our approach included subject matter experts creating 14 labeling functions that served as noisy labels for developing a probabilistic Generative model. We used probabilistic labels from the Generative model as outcome labels for training a Discriminative model on the source data. We evaluated performance of the Discriminative model with a hold-out test set of 599 independently reviewed patient records. Results The final Discriminative classification model achieved an accuracy of 0.977, an F1 score of 0.417, a sensitivity of 1.0, and an AUC of 0.988 in the hold-out test set with a prevalence of 0.83% (5/599). Discussion All of the confirmed cases were identified by the classifier. For rare outcomes, this finding is encouraging because it reduces the number of manual reviews needed by excluding visits/patients with low probabilities. 
Conclusion Application of a data programming paradigm with expert-informed labeling functions might have utility for phenotyping clinical phenomena that are not easily ascertainable from highly structured data.
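To make the idea of labeling functions as noisy labels concrete, here is a minimal plain-Python sketch. It is not the Snorkel API and not the authors' method: the three functions, the encounter field names (`naloxone_given`, `min_resp_rate`, `note`) and the threshold are all hypothetical, and the vote fraction is a deliberately simple stand-in for the probabilistic generative model described in the abstract, which weighs labeling functions by their estimated accuracies.

```python
ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1

# Hypothetical labeling functions; each encodes one expert heuristic
# and may abstain when it has no opinion about an encounter.
def lf_naloxone_given(enc):
    return POSITIVE if enc.get("naloxone_given") else ABSTAIN

def lf_low_resp_rate(enc):
    rate = enc.get("min_resp_rate")
    if rate is None:
        return ABSTAIN
    # Illustrative threshold only, not a clinical criterion.
    return POSITIVE if rate < 8 else NEGATIVE

def lf_note_mentions(enc):
    note = enc.get("note", "").lower()
    return POSITIVE if "respiratory depression" in note else ABSTAIN

LABELING_FUNCTIONS = [lf_naloxone_given, lf_low_resp_rate, lf_note_mentions]

def noisy_label_probability(enc):
    """Fraction of non-abstaining votes that are POSITIVE, or None if all abstain."""
    votes = [lf(enc) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return None
    return sum(v == POSITIVE for v in votes) / len(votes)

encounters = [
    {"naloxone_given": True, "min_resp_rate": 6,
     "note": "Opioid-induced respiratory depression suspected."},
    {"naloxone_given": False, "min_resp_rate": 14, "note": "Uneventful recovery."},
    {"naloxone_given": True, "min_resp_rate": 12},
    {},
]
probabilities = [noisy_label_probability(e) for e in encounters]
print(probabilities)
```

The resulting soft labels could then serve as training targets for a downstream classifier, which corresponds to the role the abstract assigns to the Discriminative model. Note that a keyword heuristic like `lf_note_mentions` would also fire on negated text such as "no respiratory depression", which is exactly the kind of noise the approach is meant to tolerate.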
Affiliation(s)
- Alvin D Jeffery
  - School of Nursing, Vanderbilt University, Department of Biomedical Informatics, Vanderbilt University Medical Center, Tennessee Valley Healthcare System, U.S. Department of Veterans Affairs, Nashville, TN, USA
- Daniel Fabbri
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Ruth M Reeves
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Tennessee Valley Healthcare System, U.S. Department of Veterans Affairs, Nashville, TN, USA
- Michael E Matheny
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Tennessee Valley Healthcare System, U.S. Department of Veterans Affairs, Nashville, TN, USA
5
Luo AL, Ravi A, Arvisais-Anhalt S, Muniyappa AN, Liu X, Wang S. Development and Internal Validation of an Interpretable Machine Learning Model to Predict Readmissions in a United States Healthcare System. Informatics 2023. [DOI: 10.3390/informatics10020033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/29/2023]
Abstract
(1) Background: One in four hospital readmissions is potentially preventable. Machine learning (ML) models have been developed to predict hospital readmissions and risk-stratify patients, but thus far they have been limited in clinical applicability, timeliness, and generalizability. (2) Methods: Using deidentified clinical data from the University of California, San Francisco (UCSF) between January 2016 and November 2021, we developed and compared four supervised ML models (logistic regression, random forest, gradient boosting, and XGBoost) to predict 30-day readmissions for adults admitted to a UCSF hospital. (3) Results: Of 147,358 inpatient encounters, 20,747 (13.9%) patients were readmitted within 30 days of discharge. The final model selected was XGBoost, which had an area under the receiver operating characteristic curve of 0.783 and an area under the precision-recall curve of 0.434. The most important features by Shapley Additive Explanations were days since last admission, discharge department, and inpatient length of stay. (4) Conclusions: We developed and internally validated a supervised ML model to predict 30-day readmissions in a US-based healthcare system. This model has several advantages including state-of-the-art performance metrics, the use of clinical data, the use of features available within 24 h of discharge, and generalizability to multiple disease states.
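An area under the precision-recall curve (AUPRC) is easier to interpret next to its chance-level baseline, which for a classifier with no discriminative ability is approximately the prevalence of the positive class. The short calculation below uses only the counts and metrics quoted in the abstract. The "lift" figure is computed here for illustration and is not reported by the authors.

```python
# Counts and metrics quoted in the abstract above.
n_encounters = 147_358
n_readmitted = 20_747
auprc_model = 0.434

# Chance-level AUPRC is roughly the positive-class prevalence.
prevalence = n_readmitted / n_encounters
auprc_lift = auprc_model / prevalence

print(f"prevalence (AUPRC baseline) = {prevalence:.3f}")
print(f"AUPRC lift over baseline    = {auprc_lift:.2f}x")
```

The prevalence computed from the two quoted counts comes out slightly above the 13.9% stated in the abstract. The abstract does not explain the difference, which may reflect a different denominator (for example, patients rather than encounters). Either way, the model's AUPRC is roughly three times the chance-level baseline.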
6
Abstract
The deployment of machine learning for tasks relevant to complementing standard of care and advancing tools for precision health has gained much attention in the clinical community, thus meriting further investigations into its broader use. In an introduction to predictive modelling using machine learning, we conducted a review of the recent literature that explains standard taxonomies, terminology and central concepts to a broad clinical readership. Articles aimed at readers with little or no prior experience of commonly used methods or typical workflows were summarised and key references are highlighted. Continual interdisciplinary developments in data science, biostatistics and epidemiology also motivated us to further discuss emerging topics in predictive and data-driven (hypothesis-less) analytics with machine learning. Through two methodological deep dives using examples from precision psychiatry and outcome prediction after lymphoma, we highlight how the use of, for example, natural language processing can outperform established clinical risk scores and aid dynamic prediction and adaptive care strategies. Such realistic and detailed examples allow for critical analysis of the importance of new technological advances in artificial intelligence for clinical decision-making. New clinical decision support systems can assist in prevention and care by leveraging precision medicine.
Affiliation(s)
- Sandra Eloranta
  - Division of Clinical Epidemiology, Department of Medicine Solna, Karolinska Institutet, Stockholm, Sweden
- Magnus Boman
  - Division of Software and Computer Systems, School of Electrical Engineering and Computer Science, KTH, Stockholm, Sweden
  - Department of Learning, Informatics, Management, and Ethics, Karolinska Institutet, Stockholm, Sweden
7
Fitzsimmons L, Dewan M, Dexheimer JW. Diversity in Machine Learning: A Systematic Review of Text-Based Diagnostic Applications. Appl Clin Inform 2022; 13:569-582. [PMID: 35613914 DOI: 10.1055/s-0042-1749119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]
Abstract
OBJECTIVE As the storage of clinical data has transitioned into electronic formats, medical informatics has become increasingly relevant in providing diagnostic aid. The purpose of this review is to evaluate machine learning models that use text data for diagnosis and to assess the diversity of the included study populations. METHODS We conducted a systematic literature review on three public databases. Two authors reviewed every abstract for inclusion. Articles were included if they used or developed machine learning algorithms to aid in diagnosis. Articles focusing on imaging informatics were excluded. RESULTS From 2,260 identified papers, we included 78. Of the machine learning models used, neural networks were relied upon most frequently (44.9%). Studies had a median population of 661.5 patients, and diseases and disorders of 10 different body systems were studied. Of the 35.9% (N = 28) of papers that included race data, 57.1% (N = 16) of study populations were majority White, 14.3% were majority Asian, and 7.1% were majority Black. In 75% (N = 21) of papers, White was the largest racial group represented. Of the papers included, 43.6% (N = 34) included the sex ratio of the patient population. DISCUSSION With the power to build robust algorithms supported by massive quantities of clinical data, machine learning is shaping the future of diagnostics. Limitations of the underlying data create potential biases, especially if patient demographics are unknown or not included in the training. CONCLUSION As the movement toward clinical reliance on machine learning accelerates, both recording demographic information and using diverse training sets should be emphasized. Extrapolating algorithms to demographics beyond the original study population leaves large gaps for potential biases.
Affiliation(s)
- Lane Fitzsimmons
  - College of Agriculture and Life Science, Cornell University, Ithaca, New York, United States
- Maya Dewan
  - Division of Critical Care Medicine, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, United States
  - Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, United States
- Judith W Dexheimer
  - Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, United States
  - Division of Emergency Medicine; Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, United States