1
Åkesson J, Töger J, Heiberg E. Random effects during training: Implications for deep learning-based medical image segmentation. Comput Biol Med 2024; 180:108944. [PMID: 39096609 DOI: 10.1016/j.compbiomed.2024.108944] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2023] [Revised: 07/12/2024] [Accepted: 07/24/2024] [Indexed: 08/05/2024]
Abstract
BACKGROUND A single learning algorithm can produce deep learning-based image segmentation models that vary in performance purely due to random effects during training. This study assessed the effect of these random performance fluctuations on the reliability of standard methods of comparing segmentation models. METHODS The influence of random effects during training was assessed by running a single learning algorithm (nnU-Net) with 50 different random seeds for three multiclass 3D medical image segmentation problems, including brain tumour, hippocampus, and cardiac segmentation. Recent literature was sampled to find the most common methods for estimating and comparing the performance of deep learning segmentation models. Based on this, segmentation performance was assessed using both hold-out validation and 5-fold cross-validation and the statistical significance of performance differences was measured using the Paired t-test and the Wilcoxon signed rank test on Dice scores. RESULTS For the different segmentation problems, the seed producing the highest mean Dice score statistically significantly outperformed between 0 % and 76 % of the remaining seeds when estimating performance using hold-out validation, and between 10 % and 38 % when estimating performance using 5-fold cross-validation. CONCLUSION Random effects during training can cause high rates of statistically-significant performance differences between segmentation models from the same learning algorithm. Whilst statistical testing is widely used in contemporary literature, our results indicate that a statistically-significant difference in segmentation performance is a weak and unreliable indicator of a true performance difference between two learning algorithms.
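The comparison described in this abstract rests on two ingredients: a per-case Dice score and a paired test on those scores. The following is a minimal sketch of both, not the authors' code. The function names and the example scores are hypothetical, only the paired t statistic is computed (a p-value or a Wilcoxon signed rank test would need a statistics library), and treating two empty masks as perfect agreement is an assumed convention.

```python
from math import sqrt
from statistics import mean, stdev


def dice(pred, truth):
    """Dice coefficient of two binary masks given as equal-length 0/1 sequences."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same length")
    overlap = sum(p * t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    # Assumed convention: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * overlap / total


def paired_t_statistic(scores_a, scores_b):
    """Return (t statistic, degrees of freedom) for paired per-case scores."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equally long lists with at least two cases")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    spread = stdev(diffs)
    if spread == 0:
        raise ValueError("all paired differences are identical")
    return mean(diffs) / (spread / sqrt(len(diffs))), len(diffs) - 1


# Hypothetical per-case Dice scores from two training runs that differ only in seed.
seed_a = [0.90, 0.92, 0.88, 0.91]
seed_b = [0.89, 0.90, 0.87, 0.89]
t_stat, dof = paired_t_statistic(seed_a, seed_b)
```

Because the differences between the two hypothetical seeds are small but consistent, the t statistic is large even though the mean gap is only 0.015 Dice, which is the kind of effect the abstract warns about.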
Affiliation(s)
- Julius Åkesson
  - Clinical Physiology, Department of Clinical Sciences Lund, Lund University, Lund, Sweden; Department of Biomedical Engineering, Faculty of Engineering, Lund University, Lund, Sweden
- Johannes Töger
  - Clinical Physiology, Department of Clinical Sciences Lund, Lund University, Lund, Sweden; Department of Biomedical Engineering, Faculty of Engineering, Lund University, Lund, Sweden
- Einar Heiberg
  - Clinical Physiology, Department of Clinical Sciences Lund, Lund University, Lund, Sweden; Department of Biomedical Engineering, Faculty of Engineering, Lund University, Lund, Sweden; Wallenberg Centre for Molecular Medicine, Lund University, Lund, Sweden
2
Ross J, Hammouche S, Chen Y, Rockall AG. Beyond regulatory compliance: evaluating radiology artificial intelligence applications in deployment. Clin Radiol 2024; 79:338-345. [PMID: 38360516 DOI: 10.1016/j.crad.2024.01.026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2023] [Revised: 01/24/2024] [Accepted: 01/29/2024] [Indexed: 02/17/2024]
Abstract
The implementation of artificial intelligence (AI) applications in routine practice, following regulatory approval, is currently limited by practical concerns around reliability, accountability, trust, safety, and governance, in addition to factors such as cost-effectiveness and institutional information technology support. When a technology is new and relatively untested in a field, professional confidence is lacking and there is a sense of the need to go above the baseline level of validation and compliance. In this article, we propose an approach that goes beyond standard regulatory compliance for AI apps that are approved for marketing, including independent benchmarking in the lab as well as clinical audit in practice, with the aims of increasing trust and preventing harm.
Affiliation(s)
- J Ross
  - Department of Cancer and Surgery, Imperial College London, UK
- S Hammouche
  - Department of Cancer and Surgery, Imperial College London, UK
- Y Chen
  - School of Medicine, University of Nottingham, UK
- A G Rockall
  - Department of Cancer and Surgery, Imperial College London, UK
3
Abdulaal L, Maiter A, Salehi M, Sharkey M, Alnasser T, Garg P, Rajaram S, Hill C, Johns C, Rothman AMK, Dwivedi K, Kiely DG, Alabed S, Swift AJ. A systematic review of artificial intelligence tools for chronic pulmonary embolism on CT pulmonary angiography. Frontiers in Radiology 2024; 4:1335349. [PMID: 38654762 PMCID: PMC11035730 DOI: 10.3389/fradi.2024.1335349] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2023] [Accepted: 03/26/2024] [Indexed: 04/26/2024]
Abstract
Background Chronic pulmonary embolism (PE) may result in chronic thromboembolic pulmonary hypertension (CTEPH). Automated CT pulmonary angiography (CTPA) interpretation using artificial intelligence (AI) tools has the potential for improving diagnostic accuracy, reducing delays to diagnosis and yielding novel information of clinical value in CTEPH. This systematic review aimed to identify and appraise existing studies presenting AI tools for CTPA in the context of chronic PE and CTEPH. Methods MEDLINE and EMBASE databases were searched on 11 September 2023. Journal publications presenting AI tools for CTPA in patients with chronic PE or CTEPH were eligible for inclusion. Information about model design, training and testing was extracted. Study quality was assessed using compliance with the Checklist for Artificial Intelligence in Medical Imaging (CLAIM). Results Five studies were eligible for inclusion, all of which presented deep learning AI models to evaluate PE. One study evaluated lung parenchymal changes in chronic PE and two studies used an AI model to classify PE, with none directly assessing the pulmonary arteries. A separate study developed a CNN tool to distinguish chronic PE using 2D maximum intensity projection reconstructions, and another assessed a novel automated approach to quantifying hypoperfusion to aid severity assessment in CTEPH. Descriptions of model design and training were reliable, but descriptions of the datasets used in training and testing were inconsistent. Conclusion In contrast to AI tools for evaluation of acute PE, there has been limited investigation of AI-based approaches to characterising chronic PE and CTEPH on CTPA. Existing studies are limited by inconsistent reporting of the data used to train and test their models.
This systematic review highlights an area of potential expansion for the field of AI in medical image interpretation. There is limited knowledge of artificial intelligence tools for chronic pulmonary embolism on CT. This systematic review provides an assessment of research that examined deep learning algorithms for detecting CTEPH on CTPA images; the number of studies assessing the utility of deep learning on CTPA in CTEPH was unclear and should be highlighted.
Affiliation(s)
- Lojain Abdulaal
  - Department of Infection, Immunity and Cardiovascular Disease, The University of Sheffield, Sheffield, United Kingdom
  - Faculty of Applied Medical Science, King Abdulaziz University, Jeddah, Saudi Arabia
- Ahmed Maiter
  - Department of Infection, Immunity and Cardiovascular Disease, The University of Sheffield, Sheffield, United Kingdom
  - Respiratory Physiology Department, Sheffield Pulmonary Vascular Disease Unit, Sheffield, United Kingdom
  - Department of Clinical Radiology, Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, United Kingdom
- Mahan Salehi
  - Department of Infection, Immunity and Cardiovascular Disease, The University of Sheffield, Sheffield, United Kingdom
- Michael Sharkey
  - Department of Infection, Immunity and Cardiovascular Disease, The University of Sheffield, Sheffield, United Kingdom
- Turki Alnasser
  - Department of Infection, Immunity and Cardiovascular Disease, The University of Sheffield, Sheffield, United Kingdom
- Pankaj Garg
  - Faculty of Medicine and Health Sciences, Norwich Medical School, University of East Anglia, Norwich, United Kingdom
- Smitha Rajaram
  - Department of Infection, Immunity and Cardiovascular Disease, The University of Sheffield, Sheffield, United Kingdom
  - Respiratory Physiology Department, Sheffield Pulmonary Vascular Disease Unit, Sheffield, United Kingdom
  - Department of Clinical Radiology, Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, United Kingdom
- Catherine Hill
  - Respiratory Physiology Department, Sheffield Pulmonary Vascular Disease Unit, Sheffield, United Kingdom
  - Department of Clinical Radiology, Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, United Kingdom
- Christopher Johns
  - Department of Infection, Immunity and Cardiovascular Disease, The University of Sheffield, Sheffield, United Kingdom
  - Respiratory Physiology Department, Sheffield Pulmonary Vascular Disease Unit, Sheffield, United Kingdom
  - Department of Clinical Radiology, Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, United Kingdom
- Alex Matthew Knox Rothman
  - Department of Infection, Immunity and Cardiovascular Disease, The University of Sheffield, Sheffield, United Kingdom
- Krit Dwivedi
  - Department of Infection, Immunity and Cardiovascular Disease, The University of Sheffield, Sheffield, United Kingdom
  - Respiratory Physiology Department, Sheffield Pulmonary Vascular Disease Unit, Sheffield, United Kingdom
- David G. Kiely
  - Department of Infection, Immunity and Cardiovascular Disease, The University of Sheffield, Sheffield, United Kingdom
  - Faculty of Engineering, INSIGNEO Institute, Institute for in Silico Medicine, The University of Sheffield, Sheffield, United Kingdom
  - Sheffield Biomedical Research Centre, National Institute for Health Research, Sheffield, United Kingdom
- Samer Alabed
  - Department of Infection, Immunity and Cardiovascular Disease, The University of Sheffield, Sheffield, United Kingdom
  - Department of Clinical Radiology, Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, United Kingdom
  - Faculty of Engineering, INSIGNEO Institute, Institute for in Silico Medicine, The University of Sheffield, Sheffield, United Kingdom
- Andrew James Swift
  - Department of Infection, Immunity and Cardiovascular Disease, The University of Sheffield, Sheffield, United Kingdom
  - Department of Clinical Radiology, Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, United Kingdom
  - Faculty of Engineering, INSIGNEO Institute, Institute for in Silico Medicine, The University of Sheffield, Sheffield, United Kingdom
  - Sheffield Biomedical Research Centre, National Institute for Health Research, Sheffield, United Kingdom
4
Schilling M, Unterberg-Buchwald C, Lotz J, Uecker M. Assessment of deep learning segmentation for real-time free-breathing cardiac magnetic resonance imaging at rest and under exercise stress. Sci Rep 2024; 14:3754. [PMID: 38355969 PMCID: PMC10866998 DOI: 10.1038/s41598-024-54164-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Accepted: 02/09/2024] [Indexed: 02/16/2024] Open
Abstract
In recent years, a variety of deep learning networks for cardiac MRI (CMR) segmentation have been developed and analyzed. However, nearly all of them are focused on cine CMR under breath-hold. In this work, the accuracy of deep learning methods is assessed for volumetric analysis (via segmentation) of the left ventricle in real-time free-breathing CMR at rest and under exercise stress. Data from healthy volunteers (n = 15) for cine and real-time free-breathing CMR at rest and under exercise stress were analyzed retrospectively. Exercise stress was performed using an ergometer in the supine position. Segmentations of two deep learning methods, a commercially available technique (comDL) and an openly available network (nnU-Net), were compared to a reference model created via the manual correction of segmentations obtained with comDL. Segmentations of left ventricular endocardium (LV), left ventricular myocardium (MYO), and right ventricle (RV) are compared for both end-systolic and end-diastolic phases and analyzed with Dice's coefficient. The volumetric analysis includes the cardiac function parameters LV end-diastolic volume (EDV), LV end-systolic volume (ESV), and LV ejection fraction (EF), evaluated with respect to both absolute and relative differences. For cine CMR, nnU-Net and comDL achieve Dice's coefficients above 0.95 for LV and 0.9 for MYO and RV. For real-time CMR, the accuracy of nnU-Net exceeds that of comDL overall. For real-time CMR at rest, nnU-Net achieves Dice's coefficients of 0.94 for LV, 0.89 for MYO, and 0.90 for RV, and the mean absolute differences between nnU-Net and the reference are 2.9 mL for EDV, 3.5 mL for ESV, and 2.6% for EF. For real-time CMR under exercise stress, nnU-Net achieves Dice's coefficients of 0.92 for LV, 0.85 for MYO, and 0.83 for RV, and the mean absolute differences between nnU-Net and the reference are 11.4 mL for EDV, 2.9 mL for ESV, and 3.6% for EF.
Deep learning methods designed or trained for cine CMR segmentation can perform well on real-time CMR. For real-time free-breathing CMR at rest, the performance of deep learning methods is comparable to inter-observer variability in cine CMR and is usable for fully automatic segmentation. For real-time CMR under exercise stress, the performance of nnU-Net could promise a higher degree of automation in the future.
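The three volumetric parameters in this abstract are linked by the standard definition EF = (EDV − ESV) / EDV × 100. The sketch below only illustrates that arithmetic. The function name and the volumes are hypothetical, and applying the reported 11.4 mL mean absolute EDV difference as a one-directional shift is a simplifying assumption, since a mean absolute difference does not say in which direction individual cases deviate.

```python
def ejection_fraction(edv_ml, esv_ml):
    """Left ventricular ejection fraction in percent from volumes in mL."""
    if edv_ml <= 0:
        raise ValueError("end-diastolic volume must be positive")
    return 100.0 * (edv_ml - esv_ml) / edv_ml


# Hypothetical reference volumes, not taken from the study.
reference_ef = ejection_fraction(150.0, 60.0)

# Assumption for illustration only: the whole 11.4 mL EDV difference
# reported under exercise stress is applied as an overestimate of EDV.
shifted_ef = ejection_fraction(150.0 + 11.4, 60.0)
ef_change = shifted_ef - reference_ef
```

Under these assumed volumes, an EDV error of that size moves EF by a few percentage points, which is of the same order as the EF differences quoted in the abstract.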
Affiliation(s)
- Martin Schilling
  - Institute for Diagnostic and Interventional Radiology, Universitätsmedizin Göttingen, Göttingen, Germany
- Christina Unterberg-Buchwald
  - Institute for Diagnostic and Interventional Radiology, Universitätsmedizin Göttingen, Göttingen, Germany
  - German Centre for Cardiovascular Research (DZHK), Partner Site Göttingen, Göttingen, Germany
  - Clinic of Cardiology and Pneumology, Universitätsmedizin Göttingen, Göttingen, Germany
- Joachim Lotz
  - Institute for Diagnostic and Interventional Radiology, Universitätsmedizin Göttingen, Göttingen, Germany
- Martin Uecker
  - Institute for Diagnostic and Interventional Radiology, Universitätsmedizin Göttingen, Göttingen, Germany
  - German Centre for Cardiovascular Research (DZHK), Partner Site Göttingen, Göttingen, Germany
  - Institute of Biomedical Imaging, Graz University of Technology, Graz, Austria
5
Alnasser TN, Abdulaal L, Maiter A, Sharkey M, Dwivedi K, Salehi M, Garg P, Swift AJ, Alabed S. Advancements in cardiac structures segmentation: a comprehensive systematic review of deep learning in CT imaging. Front Cardiovasc Med 2024; 11:1323461. [PMID: 38317865 PMCID: PMC10839106 DOI: 10.3389/fcvm.2024.1323461] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2023] [Accepted: 01/08/2024] [Indexed: 02/07/2024] Open
Abstract
Background Segmentation of cardiac structures is an important step in evaluation of the heart on imaging. There has been growing interest in how artificial intelligence (AI) methods, particularly deep learning (DL), can be used to automate this process. Existing AI approaches to cardiac segmentation have mostly focused on cardiac MRI. This systematic review aimed to appraise the performance and quality of supervised DL tools for the segmentation of cardiac structures on CT. Methods Embase and Medline databases were searched to identify related studies from January 1, 2013 to December 4, 2023. Original research studies published in peer-reviewed journals after January 1, 2013 were eligible for inclusion if they presented supervised DL-based tools for the segmentation of cardiac structures and non-coronary great vessels on CT. The data extracted from eligible studies included information about cardiac structure(s) being segmented, study location, DL architectures and reported performance metrics such as the Dice similarity coefficient (DSC). The quality of the included studies was assessed using the Checklist for Artificial Intelligence in Medical Imaging (CLAIM). Results Eighteen studies published after 2020 were included. The median DSC values achieved for the most commonly segmented structures were: left atrium (0.88, IQR 0.83-0.91), left ventricle (0.91, IQR 0.89-0.94), left ventricle myocardium (0.83, IQR 0.82-0.92), right atrium (0.88, IQR 0.83-0.90), right ventricle (0.91, IQR 0.85-0.92), and pulmonary artery (0.92, IQR 0.87-0.93). Compliance of studies with CLAIM was variable. In particular, only 58% of studies showed compliance with dataset description criteria and most of the studies did not test or validate their models on external data (81%). Conclusion Supervised DL has been applied to the segmentation of various cardiac structures on CT. Most showed similar performance as measured by DSC values.
Existing studies have been limited by the size and nature of the training datasets, inconsistent descriptions of ground truth annotations and lack of testing in external data or clinical settings. Systematic Review Registration [www.crd.york.ac.uk/prospero/], PROSPERO [CRD42023431113].
Affiliation(s)
- Turki Nasser Alnasser
  - Department of Infection, Immunity and Cardiovascular Disease, The University of Sheffield, Sheffield, United Kingdom
  - College of Applied Medical Science, King Saud bin Abdulaziz University for Health Science, Riyadh, Saudi Arabia
- Lojain Abdulaal
  - Department of Infection, Immunity and Cardiovascular Disease, The University of Sheffield, Sheffield, United Kingdom
- Ahmed Maiter
  - Department of Infection, Immunity and Cardiovascular Disease, The University of Sheffield, Sheffield, United Kingdom
  - Department of Clinical Radiology, Sheffield Teaching Hospitals, Sheffield, United Kingdom
- Michael Sharkey
  - Department of Infection, Immunity and Cardiovascular Disease, The University of Sheffield, Sheffield, United Kingdom
- Krit Dwivedi
  - Department of Infection, Immunity and Cardiovascular Disease, The University of Sheffield, Sheffield, United Kingdom
  - Department of Clinical Radiology, Sheffield Teaching Hospitals, Sheffield, United Kingdom
- Mahan Salehi
  - Department of Infection, Immunity and Cardiovascular Disease, The University of Sheffield, Sheffield, United Kingdom
- Pankaj Garg
  - Norwich Medical School, Faculty of Medicine and Health Sciences, University of East Anglia, Norwich, United Kingdom
- Andrew James Swift
  - Department of Infection, Immunity and Cardiovascular Disease, The University of Sheffield, Sheffield, United Kingdom
  - Insigneo Institute, Faculty of Engineering, The University of Sheffield, Sheffield, United Kingdom
- Samer Alabed
  - Department of Infection, Immunity and Cardiovascular Disease, The University of Sheffield, Sheffield, United Kingdom
  - Department of Clinical Radiology, Sheffield Teaching Hospitals, Sheffield, United Kingdom
  - Insigneo Institute, Faculty of Engineering, The University of Sheffield, Sheffield, United Kingdom
6
Kim DY, Oh HW, Suh CH. Reporting Quality of Research Studies on AI Applications in Medical Images According to the CLAIM Guidelines in a Radiology Journal With a Strong Prominence in Asia. Korean J Radiol 2023; 24:1179-1189. [PMID: 38016678 PMCID: PMC10701000 DOI: 10.3348/kjr.2023.1027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2023] [Revised: 10/25/2023] [Accepted: 10/26/2023] [Indexed: 11/30/2023] Open
Abstract
OBJECTIVE We aimed to evaluate the reporting quality of research articles that applied deep learning to medical imaging. Using the Checklist for Artificial Intelligence in Medical Imaging (CLAIM) guidelines and a journal with prominence in Asia as a sample, we intended to provide an insight into reporting quality in the Asian region and establish a journal-specific audit. MATERIALS AND METHODS A total of 38 articles published in the Korean Journal of Radiology between June 2018 and January 2023 were analyzed. The analysis included calculating the percentage of studies that adhered to each CLAIM item and identifying items that were met by ≤ 50% of the studies. The article review was initially conducted independently by two reviewers, and the consensus results were used for the final analysis. We also compared adherence rates to CLAIM before and after December 2020. RESULTS Of the 42 items in the CLAIM guidelines, 12 items (29%) were satisfied by ≤ 50% of the included articles. None of the studies reported handling missing data (item #13). Only one study each reported the use of de-identification methods (#12), intended sample size (#19), robustness or sensitivity analysis (#30), and full study protocol (#41). Of the studies, 35% reported the selection of data subsets (#10), 40% reported registration information (#40), and 50% measured inter- and intrarater variability (#18). No significant changes were observed in the rates of adherence to these 12 items before and after December 2020. CONCLUSION The reporting quality of artificial intelligence studies according to CLAIM guidelines, in our study sample, showed room for improvement. We recommend that authors and reviewers have a solid understanding of the relevant reporting guidelines and ensure that the essential elements are adequately reported when writing and reviewing manuscripts for publication.
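The audit described in this abstract reduces to a per-item adherence percentage and a 50% cut-off. A minimal sketch of that calculation follows. The function names, the article labels and the toy review marks are all hypothetical. Only the item numbers and the 12-of-42 figure come from the abstract.

```python
def adherence_by_item(reviews):
    """Percentage of articles meeting each checklist item.

    `reviews` maps an article label to a dict of {item_number: met?}.
    Items missing from an article's dict are counted as not met.
    """
    items = sorted({item for marks in reviews.values() for item in marks})
    n_articles = len(reviews)
    return {
        item: 100.0 * sum(bool(marks.get(item, False)) for marks in reviews.values()) / n_articles
        for item in items
    }


def poorly_reported(rates, threshold=50.0):
    """Items met by at most `threshold` percent of articles."""
    return [item for item, rate in rates.items() if rate <= threshold]


# Toy consensus marks for four imaginary articles and three CLAIM items.
toy_reviews = {
    "article_A": {12: False, 13: False, 18: True},
    "article_B": {12: True, 13: False, 18: False},
    "article_C": {12: False, 13: False, 18: True},
    "article_D": {12: False, 13: False, 18: True},
}
rates = adherence_by_item(toy_reviews)
low_items = poorly_reported(rates)

# The headline figure in the abstract: 12 of 42 items, rounded to a whole percent.
share_of_low_items = round(100 * 12 / 42)
```

In the toy data, items 12 and 13 fall at or below the cut-off while item 18 does not. The last line reproduces the 29% quoted in the abstract.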
Affiliation(s)
- Dong Yeong Kim
  - Department of Radiology and Research Institute of Radiology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Chong Hyun Suh
  - Department of Radiology and Research Institute of Radiology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
7
Maiter A, Hocking K, Matthews S, Taylor J, Sharkey M, Metherall P, Alabed S, Dwivedi K, Shahin Y, Anderson E, Holt S, Rowbotham C, Kamil MA, Hoggard N, Balasubramanian SP, Swift A, Johns CS. Evaluating the performance of artificial intelligence software for lung nodule detection on chest radiographs in a retrospective real-world UK population. BMJ Open 2023; 13:e077348. [PMID: 37940155 PMCID: PMC10632826 DOI: 10.1136/bmjopen-2023-077348] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/07/2023] [Accepted: 10/16/2023] [Indexed: 11/10/2023] Open
Abstract
OBJECTIVES Early identification of lung cancer on chest radiographs improves patient outcomes. Artificial intelligence (AI) tools may increase diagnostic accuracy and streamline this pathway. This study evaluated the performance of commercially available AI-based software trained to identify cancerous lung nodules on chest radiographs. DESIGN This retrospective study included primary care chest radiographs acquired in a UK centre. The software evaluated each radiograph independently and outputs were compared with two reference standards: (1) the radiologist report and (2) the diagnosis of cancer by multidisciplinary team decision. Failure analysis was performed by interrogating the software marker locations on radiographs. PARTICIPANTS 5722 consecutive chest radiographs were included from 5592 patients (median age 59 years, 53.8% women, 1.6% prevalence of cancer). RESULTS Compared with radiologist reports for nodule detection, the software demonstrated sensitivity 54.5% (95% CI 44.2% to 64.4%), specificity 83.2% (82.2% to 84.1%), positive predictive value (PPV) 5.5% (4.6% to 6.6%) and negative predictive value (NPV) 99.0% (98.8% to 99.2%). Compared with cancer diagnosis, the software demonstrated sensitivity 60.9% (50.1% to 70.9%), specificity 83.3% (82.3% to 84.2%), PPV 5.6% (4.8% to 6.6%) and NPV 99.2% (99.0% to 99.4%). Normal or variant anatomy was misidentified as an abnormality in 69.9% of the 943 false positive cases. CONCLUSIONS The software demonstrated considerable underperformance in this real-world patient cohort. Failure analysis suggested a lack of generalisability in the training and testing datasets as a potential factor. The low PPV carries the risk of over-investigation and limits the translation of the software to clinical practice. Our findings highlight the importance of training and testing software in representative datasets, with broader implications for the implementation of AI tools in imaging.
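The low PPV reported in this abstract follows largely from the low prevalence of cancer in the cohort. The sketch below applies Bayes' rule to the sensitivity, specificity and prevalence quoted above for the cancer-diagnosis reference standard. The function name is hypothetical, and using the 1.6% prevalence as the per-radiograph prior is an approximation, so the output is a consistency check and not a re-analysis of the study data.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) implied by test accuracy and disease prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Figures quoted in the abstract for the cancer-diagnosis reference standard.
ppv, npv = predictive_values(sensitivity=0.609, specificity=0.833, prevalence=0.016)
```

With these inputs the implied values come out close to the reported PPV of 5.6% and NPV of 99.2%. At a prevalence this low, false positives from the roughly 17% of non-cancer cases far outnumber true positives.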
Affiliation(s)
- Ahmed Maiter
  - School of Medicine and Population Health, The University of Sheffield, Sheffield, UK
  - Radiology, Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, UK
- Katherine Hocking
  - Radiology, Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, UK
- Suzanne Matthews
  - Radiology, Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, UK
  - Medical Imaging and Medical Physics, Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, UK
- Jonathan Taylor
  - Medical Imaging and Medical Physics, Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, UK
- Michael Sharkey
  - Medical Imaging and Medical Physics, Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, UK
- Peter Metherall
  - Medical Imaging and Medical Physics, Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, UK
- Samer Alabed
  - School of Medicine and Population Health, The University of Sheffield, Sheffield, UK
  - Radiology, Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, UK
- Krit Dwivedi
  - School of Medicine and Population Health, The University of Sheffield, Sheffield, UK
  - Radiology, Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, UK
- Yousef Shahin
  - School of Medicine and Population Health, The University of Sheffield, Sheffield, UK
  - Radiology, Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, UK
- Elizabeth Anderson
  - Radiology, Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, UK
- Sarah Holt
  - Radiology, Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, UK
- Mohamed A Kamil
  - Radiology, Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, UK
- Nigel Hoggard
  - School of Medicine and Population Health, The University of Sheffield, Sheffield, UK
  - Radiology, Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, UK
  - NIHR Sheffield Biomedical Research Centre, Sheffield, UK
- Saba P Balasubramanian
  - Medical Imaging and Medical Physics, Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, UK
  - Surgical directorate, Sheffield Teaching Hospitals Foundation NHS Trust, Sheffield, UK
- Andrew Swift
  - School of Medicine and Population Health, The University of Sheffield, Sheffield, UK
  - Radiology, Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, UK
  - NIHR Sheffield Biomedical Research Centre, Sheffield, UK
8
Solís-Lemus JA, Baptiste T, Barrows R, Sillett C, Gharaviri A, Raffaele G, Razeghi O, Strocchi M, Sim I, Kotadia I, Bodagh N, O'Hare D, O'Neill M, Williams SE, Roney C, Niederer S. Evaluation of an open-source pipeline to create patient-specific left atrial models: A reproducibility study. Comput Biol Med 2023; 162:107009. [PMID: 37301099 PMCID: PMC10790305 DOI: 10.1016/j.compbiomed.2023.107009] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 01/16/2023] [Revised: 04/11/2023] [Accepted: 05/03/2023] [Indexed: 06/12/2023]
Abstract
This work presents an open-source software pipeline to create patient-specific left atrial models with fibre orientations and a fibrosis map, suitable for electrophysiology simulations, and quantifies the intra- and inter-observer reproducibility of the model creation. The semi-automatic pipeline takes as input a contrast enhanced magnetic resonance angiogram, and a late gadolinium enhanced (LGE) contrast magnetic resonance (CMR). Five operators were allocated 20 cases each from a set of 50 CMR datasets to create a total of 100 models to evaluate inter- and intra-operator variability. Each output model consisted of: (1) a labelled surface mesh open at the pulmonary veins and mitral valve, (2) fibre orientations mapped from a diffusion tensor MRI (DTMRI) human atlas, (3) fibrosis map extracted from the LGE-CMR scan, and (4) simulation of local activation time (LAT) and phase singularity (PS) mapping. Reproducibility in our pipeline was evaluated by comparing agreement in shape of the output meshes, fibrosis distribution in the left atrial body, and fibre orientations. Reproducibility in simulations outputs was evaluated in the LAT maps by comparing the total activation times, and the mean conduction velocity (CV). PS maps were compared with the structural similarity index measure (SSIM). The users processed in total 60 cases for inter- and 40 cases for intra-operator variability. Our workflow allows a single model to be created in 16.72 ± 12.25 min. Similarity was measured with shape, percentage of fibres oriented in the same direction, and intra-class correlation coefficient (ICC) for the fibrosis calculation. Shape differed noticeably only with users' selection of the mitral valve and the length of the pulmonary veins from the ostia to the distal end; fibrosis agreement was high, with ICC of 0.909 (inter) and 0.999 (intra); fibre orientation agreement was high with 60.63% (inter) and 71.77% (intra).
The LAT showed good agreement, where the median ± IQR of the absolute difference of the total activation times was 2.02 ± 2.45 ms for inter, and 1.37 ± 2.45 ms for intra. Also, the average ± sd of the mean CV difference was -0.00404 ± 0.0155 m/s for inter, and 0.0021 ± 0.0115 m/s for intra. Finally, the PS maps showed a moderately good agreement in SSIM for inter and intra, where the mean ± sd SSIM for inter and intra were 0.648 ± 0.21 and 0.608 ± 0.15, respectively. Although we found notable differences in the models, as a consequence of user input, our tests show that the uncertainty caused by both inter and intra-operator variability is comparable with uncertainty due to estimated fibres, and image resolution accuracy of segmentation tools.
Affiliation(s)
- José Alonso Solís-Lemus
  - School of Biomedical Engineering & Imaging Sciences, King's College London, St Thomas Hospital, London, SE1 7EH, UK
- Tiffany Baptiste
  - School of Biomedical Engineering & Imaging Sciences, King's College London, St Thomas Hospital, London, SE1 7EH, UK
- Rosie Barrows
  - School of Biomedical Engineering & Imaging Sciences, King's College London, St Thomas Hospital, London, SE1 7EH, UK
- Charles Sillett
  - School of Biomedical Engineering & Imaging Sciences, King's College London, St Thomas Hospital, London, SE1 7EH, UK
- Ali Gharaviri
  - School of Biomedical Engineering & Imaging Sciences, King's College London, St Thomas Hospital, London, SE1 7EH, UK; Centre for Cardiovascular Science, University of Edinburgh, Old College, South Bridge, Edinburgh, EH8 9YL, Scotland, UK
- Giulia Raffaele
  - School of Biomedical Engineering & Imaging Sciences, King's College London, St Thomas Hospital, London, SE1 7EH, UK; School of Medical Education, King's College London, St Thomas Hospital, London, SE1 7EH, UK
- Orod Razeghi
  - School of Biomedical Engineering & Imaging Sciences, King's College London, St Thomas Hospital, London, SE1 7EH, UK; Department of Haematology, NHS Blood and Transplant Centre, University of Cambridge, Cambridge, UK
- Marina Strocchi
  - School of Biomedical Engineering & Imaging Sciences, King's College London, St Thomas Hospital, London, SE1 7EH, UK
- Iain Sim
  - School of Biomedical Engineering & Imaging Sciences, King's College London, St Thomas Hospital, London, SE1 7EH, UK
- Irum Kotadia
  - School of Biomedical Engineering & Imaging Sciences, King's College London, St Thomas Hospital, London, SE1 7EH, UK
- Neil Bodagh
  - School of Biomedical Engineering & Imaging Sciences, King's College London, St Thomas Hospital, London, SE1 7EH, UK
- Daniel O'Hare
  - School of Biomedical Engineering & Imaging Sciences, King's College London, St Thomas Hospital, London, SE1 7EH, UK
- Mark O'Neill
  - School of Biomedical Engineering & Imaging Sciences, King's College London, St Thomas Hospital, London, SE1 7EH, UK
- Steven E Williams
  - School of Biomedical Engineering & Imaging Sciences, King's College London, St Thomas Hospital, London, SE1 7EH, UK; Centre for Cardiovascular Science, University of Edinburgh, Old College, South Bridge, Edinburgh, EH8 9YL, Scotland, UK
- Caroline Roney
  - School of Biomedical Engineering & Imaging Sciences, King's College London, St Thomas Hospital, London, SE1 7EH, UK; Queen Mary University of London, Mile End Rd, Bethnal Green, London, E1 4NS, UK
- Steven Niederer
  - School of Biomedical Engineering & Imaging Sciences, King's College London, St Thomas Hospital, London, SE1 7EH, UK; Alan Turing Institute, British Library, 96 Euston Rd, London, NW1 2DB, UK
|
9
|
Kocak B, Bulut E, Bayrak ON, Okumus AA, Altun O, Borekci Arvas Z, Kavukoglu I. NEgatiVE results in Radiomics research (NEVER): A meta-research study of publication bias in leading radiology journals. Eur J Radiol 2023; 163:110830. [PMID: 37119709 DOI: 10.1016/j.ejrad.2023.110830] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 01/07/2023] [Revised: 04/03/2023] [Accepted: 04/05/2023] [Indexed: 05/01/2023]
Abstract
PURPOSE The purpose of this study was to conduct a meta-research of radiomics-related articles for the publication of negative results, with a focus on the leading clinical radiology journals due to their purportedly high editorial standards. METHODS A literature search was performed in PubMed to identify original research studies on radiomics (last search date: August 16th, 2022). The search was restricted to studies published in Q1 clinical radiology journals indexed by Scopus and Web of Science. Following an a priori power analysis based on our null hypothesis, a random sampling of the published literature was conducted. Besides the six baseline study characteristics, a total of three items about publication bias were evaluated. Agreement between raters was analyzed. Disagreements were resolved through consensus. Statistical synthesis of the qualitative evaluations was presented. RESULTS Following a priori power analysis, we included a random sample of 149 publications in this study. Most of the publications were retrospective (95%; 142/149), based on private data (91%; 136/149), centered on a single institution (75%; 111/149), and lacked external validation (81%; 121/149). Slightly fewer than half (44%; 66/149) made no comparison to non-radiomic approaches. Overall, only one study (1%; 1/149) reported negative results for radiomics, yielding a statistically significant binomial test (p < 0.0001). CONCLUSION The top clinical radiology journals almost never publish negative results, having a strong bias toward publishing positive results. Almost half of the publications did not even compare their approach with a non-radiomic method.
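The binomial test reported in the abstract above can be illustrated with a minimal sketch. The counts (1 of 149) come from the abstract; the null proportion `null_rate` is a hypothetical value chosen only for illustration, because the abstract does not state the authors' null hypothesis, and the helper name `binomial_lower_tail` is likewise an assumption. The sketch computes an exact one-sided lower-tail probability and is not a reproduction of the authors' analysis.

```python
from math import comb


def binomial_lower_tail(k: int, n: int, p0: float) -> float:
    """Exact P(X <= k) for X ~ Binomial(n, p0)."""
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k + 1))


# Counts reported in the abstract: 1 negative-result study out of 149 sampled.
negative, total = 1, 149

# Hypothetical null proportion of negative-result papers (NOT from the abstract).
null_rate = 0.10

# One-sided p-value: probability of seeing this few negative results or fewer
# if the true rate were null_rate.
p_value = binomial_lower_tail(negative, total, null_rate)
print(f"observed rate = {negative / total:.2%}, one-sided p = {p_value:.2e}")
```

Under this assumed null rate the tail probability falls below the 0.0001 threshold quoted in the abstract; a different null rate would give a different p-value.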
Affiliation(s)
- Burak Kocak
  - Department of Radiology, University of Health Sciences, Basaksehir Cam and Sakura City Hospital, Istanbul, Turkey
- Elif Bulut
  - Department of Radiology, University of Health Sciences, Basaksehir Cam and Sakura City Hospital, Istanbul, Turkey
- Osman Nuri Bayrak
  - Department of Radiology, University of Health Sciences, Basaksehir Cam and Sakura City Hospital, Istanbul, Turkey
- Ahmet Arda Okumus
  - Department of Radiology, University of Health Sciences, Basaksehir Cam and Sakura City Hospital, Istanbul, Turkey
- Omer Altun
  - Department of Radiology, University of Health Sciences, Basaksehir Cam and Sakura City Hospital, Istanbul, Turkey
- Zeynep Borekci Arvas
  - Department of Radiology, University of Health Sciences, Basaksehir Cam and Sakura City Hospital, Istanbul, Turkey
- Irem Kavukoglu
  - Department of Radiology, University of Health Sciences, Basaksehir Cam and Sakura City Hospital, Istanbul, Turkey
|
10
|
Hu J, Wang Y, Guo D, Qu Z, Sui C, He G, Wang S, Chen X, Wang C, Liu X. Diagnostic performance of magnetic resonance imaging-based machine learning in Alzheimer's disease detection: a meta-analysis. Neuroradiology 2023; 65:513-527. [PMID: 36477499 DOI: 10.1007/s00234-022-03098-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2022] [Accepted: 11/28/2022] [Indexed: 12/12/2022]
Abstract
PURPOSE Advanced machine learning (ML) algorithms can assist rapid medical image recognition and realize automatic, efficient, noninvasive, and convenient diagnosis. We aim to further evaluate the diagnostic performance of ML to distinguish patients with probable Alzheimer's disease (AD) from normal older adults based on structural magnetic resonance imaging (MRI). METHODS The Medline, Embase, and Cochrane Library databases were searched for relevant literature published up until July 2021. We used the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool and Checklist for Artificial Intelligence in Medical Imaging (CLAIM) to evaluate all included studies' quality and potential bias. Random-effects models were used to calculate pooled sensitivity and specificity, and the Deeks' test was used to assess publication bias. RESULTS We included 24 models based on different brain features extracted by ML algorithms in 19 papers. The pooled sensitivity, specificity, positive likelihood ratio, negative likelihood ratio, diagnostic odds ratio, and area under the summary receiver operating characteristic curve for ML in detecting AD were 0.85 (95%CI 0.81-0.89), 0.88 (95%CI 0.84-0.91), 7.15 (95%CI 5.40-9.47), 0.17 (95%CI 0.12-0.22), 43.34 (95%CI 26.89-69.84), and 0.93 (95%CI 0.91-0.95). CONCLUSION ML using structural MRI data performed well in diagnosing probable AD patients and normal elderly. However, more high-quality, large-scale prospective studies are needed to further enhance the reliability and generalizability of ML for clinical applications before it can be introduced into clinical practice.
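The summary measures listed in the abstract above are linked by standard definitions: the positive likelihood ratio is sensitivity / (1 - specificity), the negative likelihood ratio is (1 - sensitivity) / specificity, and the diagnostic odds ratio is their quotient. The sketch below applies these definitions to the pooled sensitivity and specificity as point values. The function name `diagnostic_ratios` is an assumption, and the results are not expected to match the abstract's pooled ratios exactly, since those were pooled across studies with random-effects models rather than derived from the two pooled point estimates.

```python
def diagnostic_ratios(sensitivity: float, specificity: float) -> tuple[float, float, float]:
    """Return (LR+, LR-, DOR) from a single sensitivity/specificity pair."""
    lr_pos = sensitivity / (1.0 - specificity)   # positive likelihood ratio
    lr_neg = (1.0 - sensitivity) / specificity   # negative likelihood ratio
    dor = lr_pos / lr_neg                        # diagnostic odds ratio
    return lr_pos, lr_neg, dor


# Pooled point estimates quoted in the abstract.
lr_pos, lr_neg, dor = diagnostic_ratios(sensitivity=0.85, specificity=0.88)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}, DOR = {dor:.2f}")
```

The values obtained this way are close to, but not identical with, the reported 7.15, 0.17, and 43.34, which is consistent with each measure having been pooled separately.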
Affiliation(s)
- Jiayi Hu
  - Department of Epidemiology and Statistics, School of Public Health, Jilin University, Changchun, 130021, Jilin, China
- Yashan Wang
  - Department of Epidemiology and Statistics, School of Public Health, Jilin University, Changchun, 130021, Jilin, China
- Dingjie Guo
  - Department of Epidemiology and Statistics, School of Public Health, Jilin University, Changchun, 130021, Jilin, China
- Zihan Qu
  - Department of Epidemiology and Statistics, School of Public Health, Jilin University, Changchun, 130021, Jilin, China
- Chuanying Sui
  - Department of Epidemiology and Statistics, School of Public Health, Jilin University, Changchun, 130021, Jilin, China
- Guangliang He
  - Department of Epidemiology and Statistics, School of Public Health, Jilin University, Changchun, 130021, Jilin, China
- Song Wang
  - Department of Epidemiology and Statistics, School of Public Health, Jilin University, Changchun, 130021, Jilin, China
- Xiaofei Chen
  - Department of Epidemiology and Statistics, School of Public Health, Jilin University, Changchun, 130021, Jilin, China
- Chunpeng Wang
  - School of Mathematics and Statistics, Northeast Normal University, Changchun, Jilin, China
- Xin Liu
  - Department of Epidemiology and Statistics, School of Public Health, Jilin University, Changchun, 130021, Jilin, China
|
11
|
Maiter A, Salehi M, Swift AJ, Alabed S. How should studies using AI be reported? lessons from a systematic review in cardiac MRI. Front Radiol 2023; 3:1112841. [PMID: 37492379 PMCID: PMC10364997 DOI: 10.3389/fradi.2023.1112841] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 11/30/2022] [Accepted: 01/11/2023] [Indexed: 07/27/2023]
Abstract
Recent years have seen a dramatic increase in studies presenting artificial intelligence (AI) tools for cardiac imaging. Amongst these are AI tools that undertake segmentation of structures on cardiac MRI (CMR), an essential step in obtaining clinically relevant functional information. The quality of reporting of these studies carries significant implications for advancement of the field and the translation of AI tools to clinical practice. We recently undertook a systematic review to evaluate the quality of reporting of studies presenting automated approaches to segmentation in cardiac MRI (Alabed et al. 2022 Quality of reporting in AI cardiac MRI segmentation studies-a systematic review and recommendations for future studies. Frontiers in Cardiovascular Medicine 9:956811). 209 studies were assessed for compliance with the Checklist for AI in Medical Imaging (CLAIM), a framework for reporting. We found variable-and sometimes poor-quality of reporting and identified significant and frequently missing information in publications. Compliance with CLAIM was high for descriptions of models (100%, IQR 80%-100%), but lower than expected for descriptions of study design (71%, IQR 63-86%), datasets used in training and testing (63%, IQR 50%-67%) and model performance (60%, IQR 50%-70%). Here, we present a summary of our key findings, aimed at general readers who may not be experts in AI, and use them as a framework to discuss the factors determining quality of reporting, making recommendations for improving the reporting of research in this field. We aim to assist researchers in presenting their work and readers in their appraisal of evidence. Finally, we emphasise the need for close scrutiny of studies presenting AI tools, even in the face of the excitement surrounding AI in cardiac imaging.
Affiliation(s)
- Ahmed Maiter
  - Department of Infection, Immunity & Cardiovascular Disease, University of Sheffield, Sheffield, United Kingdom
  - Department of Radiology, Sheffield Teaching Hospitals, Sheffield, United Kingdom
- Mahan Salehi
  - Department of Infection, Immunity & Cardiovascular Disease, University of Sheffield, Sheffield, United Kingdom
- Andrew J. Swift
  - Department of Infection, Immunity & Cardiovascular Disease, University of Sheffield, Sheffield, United Kingdom
  - Department of Radiology, Sheffield Teaching Hospitals, Sheffield, United Kingdom
- Samer Alabed
  - Department of Infection, Immunity & Cardiovascular Disease, University of Sheffield, Sheffield, United Kingdom
  - Department of Radiology, Sheffield Teaching Hospitals, Sheffield, United Kingdom
|