1
Leinonen T, Wong D, Vasankari A, Wahab A, Nadarajah R, Kaisti M, Airola A. Empirical investigation of multi-source cross-validation in clinical ECG classification. Comput Biol Med 2024; 183:109271. [PMID: 39427424 DOI: 10.1016/j.compbiomed.2024.109271] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2024] [Revised: 10/08/2024] [Accepted: 10/09/2024] [Indexed: 10/22/2024]
Abstract
Traditionally, machine learning-based clinical prediction models have been trained and evaluated on patient data from a single source, such as a hospital. Cross-validation methods can be used to estimate the accuracy of such models on new patients originating from the same source, by repeated random splitting of the data. However, such estimates tend to be highly overoptimistic when compared to the accuracy obtained from deploying models to sources not represented in the dataset, such as a new hospital. The increasing availability of multi-source medical datasets provides new opportunities for obtaining more comprehensive and realistic evaluations of expected accuracy through source-level cross-validation designs. In this study, we present a systematic empirical evaluation of standard K-fold cross-validation and leave-source-out cross-validation methods in a multi-source setting. We consider the task of electrocardiogram-based cardiovascular disease classification, combining and harmonizing the openly available PhysioNet/CinC Challenge 2021 and the Shandong Provincial Hospital datasets for our study. Our results show that K-fold cross-validation, both on single-source and multi-source data, systematically overestimates prediction performance when the end goal is to generalize to new sources. Leave-source-out cross-validation provides more reliable performance estimates, having close to zero bias though larger variability. The evaluation highlights the dangers of obtaining misleading cross-validation results on medical data and demonstrates how these issues can be mitigated when having access to multi-source data.
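The difference between the two designs compared in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names `k_fold_splits` and `leave_source_out_splits`, the toy source labels and the fold count are all hypothetical and chosen only for illustration. Standard K-fold shuffles records regardless of origin, so every test fold usually shares sources with the training data, whereas leave-source-out holds out all records of one source at a time.

```python
import random
from collections import defaultdict


def k_fold_splits(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs from a shuffled K-fold partition."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for f in folds[:i] + folds[i + 1:] for j in f)
        yield train, test


def leave_source_out_splits(sources):
    """Yield (held_out_source, train_idx, test_idx), one split per source."""
    by_source = defaultdict(list)
    for idx, src in enumerate(sources):
        by_source[src].append(idx)
    for held_out in sorted(by_source):
        test = by_source[held_out]
        train = [i for i, s in enumerate(sources) if s != held_out]
        yield held_out, train, test


# Hypothetical toy data: one source label (e.g. a hospital) per record.
sources = ["A"] * 4 + ["B"] * 3 + ["C"] * 5

for train, test in k_fold_splits(len(sources), k=3):
    shared = {sources[i] for i in train} & {sources[i] for i in test}
    print("K-fold test fold shares sources with training data:", sorted(shared))

for held_out, train, test in leave_source_out_splits(sources):
    print("held-out source", held_out, "never appears in training:",
          held_out not in {sources[i] for i in train})
```

In practice a model would be fitted on each `train` index set and scored on the matching `test` set. scikit-learn offers `KFold` and `LeaveOneGroupOut` splitters for the same purpose.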
Affiliation(s)
- David Wong
- Leeds Institute of Health Sciences, University of Leeds, UK
- Ali Wahab
- Institute of Cardiovascular and Metabolic Medicine, University of Leeds, UK
- Ramesh Nadarajah
- Institute of Cardiovascular and Metabolic Medicine, University of Leeds, UK
- Matti Kaisti
- Department of Computing, University of Turku, Finland
- Antti Airola
- Department of Computing, University of Turku, Finland.
2
Spagnolo F, Depeursinge A, Schädelin S, Akbulut A, Müller H, Barakovic M, Melie-Garcia L, Bach Cuadra M, Granziera C. How far MS lesion detection and segmentation are integrated into the clinical workflow? A systematic review. Neuroimage Clin 2023; 39:103491. [PMID: 37659189 PMCID: PMC10480555 DOI: 10.1016/j.nicl.2023.103491] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2023] [Accepted: 08/04/2023] [Indexed: 09/04/2023]
Abstract
INTRODUCTION: Over the past few years, the deep learning community has developed and validated a plethora of tools for lesion detection and segmentation in Multiple Sclerosis (MS). However, there is an important gap between validating models technically and clinically. To this end, a six-step framework necessary for the development, validation, and integration of quantitative tools in the clinic was recently proposed under the name of the Quantitative Neuroradiology Initiative (QNI). AIMS: Investigate to what extent automatic tools in MS fulfill the QNI framework necessary to integrate automated detection and segmentation into the clinical neuroradiology workflow. METHODS: Adopting the systematic Cochrane literature review methodology, we screened and summarised published scientific articles that perform automatic MS lesion detection and segmentation. We categorised the retrieved studies based on their degree of fulfillment of QNI's six steps, which include a tool's technical assessment, clinical validation, and integration. RESULTS: We found 156 studies; 146/156 (94%) fulfilled the first QNI step, 155/156 (99%) the second, 8/156 (5%) the third, 3/156 (2%) the fourth, 5/156 (3%) the fifth and only one the sixth. CONCLUSIONS: To date, little has been done to evaluate the clinical performance and the integration in the clinical workflow of available methods for MS lesion detection/segmentation. In addition, the socio-economic effects and the impact on patients' management of such tools remain almost unexplored.
Affiliation(s)
- Federico Spagnolo
- Translational Imaging in Neurology (ThINK) Basel, Department of Biomedical Engineering, Faculty of Medicine, University Hospital Basel and University of Basel, Basel, Switzerland; Department of Neurology, University Hospital Basel, Basel, Switzerland; Research Center for Clinical Neuroimmunology and Neuroscience Basel (RC2NB), University Hospital Basel and University of Basel, Basel, Switzerland; MedGIFT, Institute of Informatics, School of Management, HES-SO Valais-Wallis University of Applied Sciences and Arts Western Switzerland, Sierre, Switzerland
- Adrien Depeursinge
- MedGIFT, Institute of Informatics, School of Management, HES-SO Valais-Wallis University of Applied Sciences and Arts Western Switzerland, Sierre, Switzerland; Nuclear Medicine and Molecular Imaging Department, Lausanne University Hospital (CHUV) and University of Lausanne, Lausanne, Switzerland
- Sabine Schädelin
- Translational Imaging in Neurology (ThINK) Basel, Department of Biomedical Engineering, Faculty of Medicine, University Hospital Basel and University of Basel, Basel, Switzerland; Clinical Trial Unit, Department of Clinical Research, University Hospital Basel, University of Basel, Basel, Switzerland
- Aysenur Akbulut
- Translational Imaging in Neurology (ThINK) Basel, Department of Biomedical Engineering, Faculty of Medicine, University Hospital Basel and University of Basel, Basel, Switzerland; Ankara University School of Medicine, Ankara, Turkey
- Henning Müller
- MedGIFT, Institute of Informatics, School of Management, HES-SO Valais-Wallis University of Applied Sciences and Arts Western Switzerland, Sierre, Switzerland; The Sense Research and Innovation Center, Lausanne and Sion, Switzerland
- Muhamed Barakovic
- Translational Imaging in Neurology (ThINK) Basel, Department of Biomedical Engineering, Faculty of Medicine, University Hospital Basel and University of Basel, Basel, Switzerland; Department of Neurology, University Hospital Basel, Basel, Switzerland; Research Center for Clinical Neuroimmunology and Neuroscience Basel (RC2NB), University Hospital Basel and University of Basel, Basel, Switzerland
- Lester Melie-Garcia
- Translational Imaging in Neurology (ThINK) Basel, Department of Biomedical Engineering, Faculty of Medicine, University Hospital Basel and University of Basel, Basel, Switzerland; Department of Neurology, University Hospital Basel, Basel, Switzerland; Research Center for Clinical Neuroimmunology and Neuroscience Basel (RC2NB), University Hospital Basel and University of Basel, Basel, Switzerland
- Meritxell Bach Cuadra
- CIBM Center for Biomedical Imaging, Lausanne, Switzerland; Radiology Department, Lausanne University Hospital (CHUV) and University of Lausanne, Lausanne, Switzerland
- Cristina Granziera
- Translational Imaging in Neurology (ThINK) Basel, Department of Biomedical Engineering, Faculty of Medicine, University Hospital Basel and University of Basel, Basel, Switzerland; Department of Neurology, University Hospital Basel, Basel, Switzerland; Research Center for Clinical Neuroimmunology and Neuroscience Basel (RC2NB), University Hospital Basel and University of Basel, Basel, Switzerland.
4
Gryska E, Schneiderman J, Björkman-Burtscher I, Heckemann RA. Automatic brain lesion segmentation on standard magnetic resonance images: a scoping review. BMJ Open 2021; 11:e042660. [PMID: 33514580 PMCID: PMC7849889 DOI: 10.1136/bmjopen-2020-042660] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 07/11/2020] [Revised: 01/09/2021] [Accepted: 01/12/2021] [Indexed: 12/11/2022]
Abstract
OBJECTIVES: Medical image analysis practices face challenges that can potentially be addressed with algorithm-based segmentation tools. In this study, we map the field of automatic MR brain lesion segmentation to understand the clinical applicability of prevalent methods and study designs, as well as challenges and limitations in the field. DESIGN: Scoping review. SETTING: Three databases (PubMed, IEEE Xplore and Scopus) were searched with tailored queries. Studies were included based on predefined criteria. Emerging themes during consecutive title, abstract, methods and whole-text screening were identified. The full-text analysis focused on materials, preprocessing, performance evaluation and comparison. RESULTS: Out of 2990 unique articles identified through the search, 441 articles met the eligibility criteria, with an estimated growth rate of 10% per year. We present a general overview and trends in the field with regard to publication sources, segmentation principles used and types of lesions. Algorithms are predominantly evaluated by measuring the agreement of segmentation results with a trusted reference. Few articles describe measures of clinical validity. CONCLUSIONS: The observed reporting practices leave room for improvement with a view to studying replication, method comparison and clinical applicability. To promote this improvement, we propose a list of recommendations for future studies in the field.
Affiliation(s)
- Emilia Gryska
- Medical Radiation Sciences, Goteborgs universitet Institutionen for kliniska vetenskaper, Goteborg, Sweden
- Justin Schneiderman
- Sektionen för klinisk neurovetenskap, Goteborgs Universitet Institutionen for Neurovetenskap och fysiologi, Goteborg, Sweden
- Rolf A Heckemann
- Medical Radiation Sciences, Goteborgs universitet Institutionen for kliniska vetenskaper, Goteborg, Sweden
5
Automatic segmentation of white matter hyperintensities from brain magnetic resonance images in the era of deep learning and big data - A systematic review. Comput Med Imaging Graph 2021; 88:101867. [PMID: 33508567 DOI: 10.1016/j.compmedimag.2021.101867] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 10/09/2020] [Revised: 12/23/2020] [Accepted: 12/31/2020] [Indexed: 11/20/2022]
Abstract
BACKGROUND: White matter hyperintensities (WMH), of presumed vascular origin, are visible and quantifiable neuroradiological markers of brain parenchymal change. These changes may range from damage secondary to inflammation and other neurological conditions, through to healthy ageing. Fully automatic WMH quantification methods are promising, but traditional semi-automatic methods still seem to be preferred in clinical research. We systematically reviewed the literature for fully automatic methods developed in the last five years, to assess what are considered state-of-the-art techniques, as well as trends in the analysis of WMH of presumed vascular origin. METHOD: We registered the systematic review protocol with the International Prospective Register of Systematic Reviews (PROSPERO), registration number CRD42019132200. We searched Medline, ScienceDirect, IEEE Xplore, and Web of Science for fully automatic methods developed from 2015 to July 2020. We assessed risk of bias and applicability of the studies using QUADAS-2. RESULTS: The search yielded 2327 papers after removing 104 duplicates. After screening titles, abstracts and full text, 37 were selected for detailed analysis. Of these, 16 proposed a supervised segmentation method, 10 proposed an unsupervised segmentation method, and 11 proposed a deep learning segmentation method. Average DSC values ranged from 0.538 to 0.91, with the highest value obtained by an unsupervised segmentation method. Only four studies validated their method in longitudinal samples, and eight performed an additional validation using clinical parameters. Only 8/37 studies made their methods available in public repositories. CONCLUSIONS: We found no evidence that favours deep learning methods over the more established k-NN, linear regression and unsupervised methods in this task. Data and code availability, bias in study design and ground truth generation influence the wider validation and applicability of these methods in clinical research.
6
Kuijf HJ, Biesbroek JM, De Bresser J, Heinen R, Andermatt S, Bento M, Berseth M, Belyaev M, Cardoso MJ, Casamitjana A, Collins DL, Dadar M, Georgiou A, Ghafoorian M, Jin D, Khademi A, Knight J, Li H, Llado X, Luna M, Mahmood Q, McKinley R, Mehrtash A, Ourselin S, Park BY, Park H, Park SH, Pezold S, Puybareau E, Rittner L, Sudre CH, Valverde S, Vilaplana V, Wiest R, Xu Y, Xu Z, Zeng G, Zhang J, Zheng G, Chen C, van der Flier W, Barkhof F, Viergever MA, Biessels GJ. Standardized Assessment of Automatic Segmentation of White Matter Hyperintensities and Results of the WMH Segmentation Challenge. IEEE Trans Med Imaging 2019; 38:2556-2568. [PMID: 30908194 PMCID: PMC7590957 DOI: 10.1109/tmi.2019.2905770] [Citation(s) in RCA: 110] [Impact Index Per Article: 18.3] [Indexed: 05/23/2023]
Abstract
Quantification of cerebral white matter hyperintensities (WMH) of presumed vascular origin is of key importance in many neurological research studies. Currently, measurements are often still obtained from manual segmentations on brain MR images, which is a laborious procedure. Automatic WMH segmentation methods exist, but a standardized comparison of the performance of such methods is lacking. We organized a scientific challenge, in which developers could evaluate their methods on a standardized multi-center/-scanner image dataset, giving an objective comparison: the WMH Segmentation Challenge. Sixty T1 + FLAIR images from three MR scanners were released with the manual WMH segmentations for training. A test set of 110 images from five MR scanners was used for evaluation. The segmentation methods had to be containerized and submitted to the challenge organizers. Five evaluation metrics were used to rank the methods: 1) Dice similarity coefficient; 2) modified Hausdorff distance (95th percentile); 3) absolute log-transformed volume difference; 4) sensitivity for detecting individual lesions; and 5) F1-score for individual lesions. In addition, the methods were ranked on their inter-scanner robustness. Twenty participants submitted their methods for evaluation. This paper provides a detailed analysis of the results. In brief, there is a cluster of four methods that rank significantly better than the other methods, with one clear winner. The inter-scanner robustness ranking shows that not all the methods generalize to unseen scanners. The challenge remains open for future submissions and provides a public platform for method evaluation.
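As an illustration of the first ranking metric listed in this abstract, the Dice similarity coefficient between a predicted mask P and a reference mask R is 2|P ∩ R| / (|P| + |R|). The sketch below is a minimal version on flattened binary masks, not the challenge's evaluation code. The function name `dice_coefficient`, the toy masks and the convention of returning 1.0 when both masks are empty are assumptions made here for illustration.

```python
def dice_coefficient(pred, ref):
    """Dice similarity coefficient between two flattened binary masks."""
    if len(pred) != len(ref):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labelled as lesion in both masks.
    intersection = sum(1 for p, r in zip(pred, ref) if p and r)
    total = sum(1 for p in pred if p) + sum(1 for r in ref if r)
    if total == 0:
        # Assumption: two empty masks are treated as perfect agreement.
        return 1.0
    return 2.0 * intersection / total


# Hypothetical toy masks (1 = lesion voxel, 0 = background).
reference = [0, 1, 1, 1, 0, 0, 1, 0]
predicted = [0, 1, 1, 0, 0, 1, 1, 0]

print(dice_coefficient(predicted, reference))  # 3 shared voxels, 4 + 4 labelled: 0.75
```

Real segmentations are 3D volumes. Flattening them to one dimension does not change the value, because Dice depends only on voxel-wise overlap.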