Domazetoski V, Gligoric G, Marinkovic M, Shvilkin A, Krsic J, Kocarev L, Ivanovic MD. The influence of atrial flutter in automated detection of atrial arrhythmias - are we ready to go into clinical practice?
COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2022; 221:106901. [PMID: 35636359 DOI: 10.1016/j.cmpb.2022.106901]
[Received: 11/21/2021] [Revised: 05/13/2022] [Accepted: 05/19/2022] [Indexed: 06/15/2023]
Abstract
OBJECTIVE
To investigate the impact of atrial flutter (Afl) on the atrial arrhythmia classification task. We additionally advocate the use of a subject-based split in future studies in the field, in order to avoid within-subject correlation, which may lead to over-optimistic inferences. Finally, we assess the effectiveness of the classifiers outside the initially studied circumstances by performing an inter-dataset evaluation on data from different sources.
METHODS
ECG signals from two private and three public databases (two MIT-BIH and the Chapman ecgdb) were preprocessed and divided into 10 s segments, which were then subjected to feature extraction. Each resulting dataset was divided into a training and a test set in two ways: by a random split and by a patient split. Classification was performed with the XGBoost classifier and with two benchmark classification models, using both data splits. The trained models were then used to make predictions on the test data of the remaining datasets.
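The difference between the two splitting schemes can be illustrated with a minimal sketch. It is not the authors' code: the function names, the `patient_id` field, the 20% test fraction, and the toy data (10 patients with 50 segments each) are all assumptions chosen for illustration, and only the Python standard library is used.

```python
import random

def random_split(segments, test_fraction=0.2, seed=0):
    """Shuffle individual segments, so one patient can land in both sets."""
    rng = random.Random(seed)
    shuffled = segments[:]
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def patient_split(segments, test_fraction=0.2, seed=0):
    """Assign whole patients to train or test, so no patient is in both."""
    rng = random.Random(seed)
    patients = sorted({s["patient_id"] for s in segments})
    rng.shuffle(patients)
    n_test = max(1, int(round(len(patients) * test_fraction)))
    test_patients = set(patients[:n_test])
    train = [s for s in segments if s["patient_id"] not in test_patients]
    test = [s for s in segments if s["patient_id"] in test_patients]
    return train, test

def shared_patients(train, test):
    """Return the patient IDs that occur in both the training and test sets."""
    return {s["patient_id"] for s in train} & {s["patient_id"] for s in test}

# Hypothetical toy data: 10 patients, 50 segments of 10 s each per patient.
segments = [{"patient_id": p, "segment_idx": i}
            for p in range(10) for i in range(50)]

rand_train, rand_test = random_split(segments)
pat_train, pat_test = patient_split(segments)

print("patients shared, random split: ", len(shared_patients(rand_train, rand_test)))
print("patients shared, patient split:", len(shared_patients(pat_train, pat_test)))
```

With many segments per patient, a random split almost always places segments from the same patient on both sides, which is the within-subject correlation the abstract warns about. The patient split removes that overlap by construction.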
RESULTS
The XGBoost model yielded the best performance across all datasets compared to the remaining benchmark models. However, model performance varied across datasets, with accuracy ranging from 70.6% to 89.4%, sensitivity ranging from 61.4% to 76.8%, and specificity ranging from 87.3% to 95.5%. When comparing the results of the patient and the random split, no significant difference was seen in the two private datasets and the Chapman dataset, where the number of samples per patient is low. In the MIT-BIH dataset, however, where the average number of samples per patient is approximately 1300, a noticeable disparity was identified. The accuracy, sensitivity, and specificity in this dataset fell from 93.6%, 86.4%, and 95.9%, respectively, with the random split to 88%, 61.4%, and 89.8% with the patient split, with the largest drop being in Afl sensitivity, from 71% to 5.4%. The inter-dataset scores were also significantly lower than their intra-dataset counterparts across all datasets.
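For readers less familiar with the reported metrics, the sketch below shows how the sensitivity and specificity of a single class can be computed in a one-vs-rest fashion. The abstract does not state how the authors aggregated scores across classes, so this is an illustration under assumptions only: the function name, the class labels other than Afl, and the toy predictions are hypothetical and are not data from the study.

```python
def one_vs_rest_metrics(y_true, y_pred, positive):
    """Sensitivity and specificity of one class against all other classes."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity

# Hypothetical labels: most Afl segments are misclassified as AF.
y_true = ["Afl", "Afl", "Afl", "Afl", "AF", "AF", "Other", "Other"]
y_pred = ["Afl", "AF", "AF", "AF", "AF", "AF", "Other", "Afl"]

afl_sensitivity, afl_specificity = one_vs_rest_metrics(y_true, y_pred, "Afl")
print(f"Afl sensitivity: {afl_sensitivity:.2f}")
print(f"Afl specificity: {afl_specificity:.2f}")
```

A class can show low sensitivity while overall accuracy stays high when that class is rare, which is why a per-class figure such as Afl sensitivity is informative alongside the aggregate scores.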
CONCLUSIONS
CAD systems have great potential to assist physicians in the reliable, precise, and efficient detection of arrhythmias. However, although compelling research has been done in the field, yielding models with excellent performance on their own datasets, we show that these results may be over-optimistic. In our study, we give insight into the difficulty of detecting Afl on several datasets and show the need for a higher representation of Afl in public datasets. Furthermore, we show the necessity of a more structured evaluation of model performance, through the use of a patient-based split and an inter-dataset testing scheme, to avoid the problem of within-subject correlation, which may lead to misleadingly high scores. Finally, we stress the need for the creation and use of datasets with a higher number of patients and a more balanced representation of classes if we are to progress in this mission.