1
Somaskandhan P, Leppänen T, Terrill PI, Sigurdardottir S, Arnardottir ES, Ólafsdóttir KA, Serwatko M, Sigurðardóttir SÞ, Clausen M, Töyräs J, Korkalainen H. Deep learning-based algorithm accurately classifies sleep stages in preadolescent children with sleep-disordered breathing symptoms and age-matched controls. Front Neurol 2023; 14:1162998. [PMID: 37122306 PMCID: PMC10140398 DOI: 10.3389/fneur.2023.1162998] [Received: 02/10/2023] [Accepted: 03/23/2023] [Indexed: 05/02/2023]
Abstract
Introduction: Visual sleep scoring has several shortcomings, including inter-scorer inconsistency, which may adversely affect diagnostic decision-making. Although automatic sleep staging in adults has been extensively studied, it is uncertain whether such sophisticated algorithms generalize well to different pediatric age groups due to distinctive EEG characteristics. The preadolescent age group (10-13-year-olds) is relatively understudied, and thus, we aimed to develop an automatic deep learning-based sleep stage classifier specifically targeting this cohort.
Methods: A dataset (n = 115) containing polysomnographic recordings of Icelandic preadolescent children with sleep-disordered breathing (SDB) symptoms, and age- and sex-matched controls was utilized. We developed a combined convolutional and long short-term memory neural network architecture relying on electroencephalography (F4-M1), electrooculography (E1-M2), and chin electromyography signals. Performance relative to human scoring was further evaluated by analyzing intra- and inter-rater agreements in a subset (n = 10) of data with repeat scoring from two manual scorers.
Results: The deep learning-based model achieved an overall cross-validated accuracy of 84.1% (Cohen's kappa κ = 0.78). There was no meaningful performance difference between SDB-symptomatic (n = 53) and control subgroups (n = 52) [83.9% (κ = 0.78) vs. 84.2% (κ = 0.78)]. The inter-rater reliability between manual scorers was 84.6% (κ = 0.78), and the automatic method reached similar agreements with scorers, 83.4% (κ = 0.76) and 82.7% (κ = 0.75).
Conclusion: The developed algorithm achieved high classification accuracy and substantial agreements with two manual scorers; the performance metrics compared favorably with typical inter-rater reliability between manual scorers and performance reported in previous studies. These results suggest that our algorithm may facilitate less labor-intensive and reliable automatic sleep scoring in preadolescent children.
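The abstract reports agreement as a percentage accuracy together with Cohen's kappa. As a minimal sketch of how those two figures relate, the snippet below computes both from two epoch-by-epoch hypnograms. The function name `accuracy_and_kappa` and the short example sequences are illustrative assumptions, not the authors' code or data.

```python
from collections import Counter


def accuracy_and_kappa(reference, predicted):
    """Return (accuracy, Cohen's kappa) for two equal-length label sequences.

    Hypothetical helper for illustration; not taken from the cited paper.
    """
    if len(reference) != len(predicted) or not reference:
        raise ValueError("sequences must be non-empty and of equal length")

    n = len(reference)
    # Observed agreement: fraction of epochs with identical labels.
    observed = sum(r == p for r, p in zip(reference, predicted)) / n

    # Chance agreement: for each stage, the product of the two scorers'
    # marginal probabilities, summed over all stages.
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    expected = sum(ref_counts[s] * pred_counts[s] for s in ref_counts) / (n * n)

    if expected == 1.0:
        # Both sequences contain one identical label; kappa is undefined.
        return observed, float("nan")
    return observed, (observed - expected) / (1.0 - expected)


# Made-up example: eight 30-second epochs scored manually and automatically.
manual = ["W", "N1", "N2", "N2", "N3", "REM", "N2", "W"]
auto = ["W", "N2", "N2", "N2", "N3", "REM", "N1", "W"]
acc, kappa = accuracy_and_kappa(manual, auto)
print(f"accuracy={acc:.2f}, kappa={kappa:.2f}")
```

Kappa is lower than raw accuracy because it discounts the agreement two scorers would reach by chance given how often each assigns every stage.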
Affiliation(s)
- Pranavan Somaskandhan
  - School of Information Technology and Electrical Engineering, The University of Queensland, Brisbane, QLD, Australia
  - *Correspondence: Pranavan Somaskandhan,
- Timo Leppänen
  - School of Information Technology and Electrical Engineering, The University of Queensland, Brisbane, QLD, Australia
  - Department of Technical Physics, University of Eastern Finland, Kuopio, Finland
  - Diagnostic Imaging Center, Kuopio University Hospital, Kuopio, Finland
- Philip I. Terrill
  - School of Information Technology and Electrical Engineering, The University of Queensland, Brisbane, QLD, Australia
- Sigridur Sigurdardottir
  - Reykjavik University Sleep Institute, School of Technology, Reykjavik University, Reykjavik, Iceland
- Erna Sif Arnardottir
  - Reykjavik University Sleep Institute, School of Technology, Reykjavik University, Reykjavik, Iceland
  - Internal Medicine Services, Landspitali–The National University Hospital of Iceland, Reykjavik, Iceland
- Kristín A. Ólafsdóttir
  - Reykjavik University Sleep Institute, School of Technology, Reykjavik University, Reykjavik, Iceland
- Marta Serwatko
  - Department of Clinical Engineering, Landspitali University Hospital, Reykjavik, Iceland
- Sigurveig Þ. Sigurðardóttir
  - Department of Immunology, Landspitali University Hospital, Reykjavik, Iceland
  - Faculty of Medicine, University of Iceland, Reykjavik, Iceland
- Michael Clausen
  - Department of Allergy, Landspitali University Hospital, Reykjavik, Iceland
  - Children's Hospital Reykjavik, Reykjavik, Iceland
- Juha Töyräs
  - School of Information Technology and Electrical Engineering, The University of Queensland, Brisbane, QLD, Australia
  - Department of Technical Physics, University of Eastern Finland, Kuopio, Finland
  - Science Service Center, Kuopio University Hospital, Kuopio, Finland
- Henri Korkalainen
  - Department of Technical Physics, University of Eastern Finland, Kuopio, Finland
  - Diagnostic Imaging Center, Kuopio University Hospital, Kuopio, Finland
2
Borsky M, Serwatko M, Arnardottir ES, Mallett J. Towards Sleep Study Automation: Detection Evaluation of Respiratory-Related Events. IEEE J Biomed Health Inform 2022; 26:3418-3426. [PMID: 35294367 DOI: 10.1109/jbhi.2022.3159727] [Indexed: 11/10/2022]
Abstract
The diagnosis of sleep disordered breathing depends on the detection of several respiratory-related events: apneas, hypopneas, snores, or respiratory event-related arousals from sleep studies. While a number of automatic detection methods have been proposed, reproducibility of these methods has been an issue, in part due to the absence of a generally accepted protocol for evaluating their results. With sleep measurements, detection is usually treated as a classification problem, and the accompanying issue of localization is not treated as similarly critical. To address these problems, we present a detection evaluation protocol that is able to qualitatively assess the match between two annotations of respiratory-related events. This protocol relies on measuring the relative temporal overlap between two annotations in order to find an alignment that maximizes their F1-score at the sequence level. This protocol can be used in applications which require a precise estimate of the number of events, total event duration, and a joint estimate of event number and duration. We assess its application using a data set that contains over 10,000 manually annotated snore events from 9 subjects, and show that when using the American Academy of Sleep Medicine Manual standard, two sleep technologists can achieve an F1-score of 0.88 when identifying the presence of snore events. In addition, we drafted rules for marking snore boundaries and showed that one sleep technologist can achieve an F1-score of 0.94 on the same task. Finally, we compared our protocol against the protocol that is used to evaluate sleep spindle detection and highlighted the differences.
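To illustrate the general idea of scoring two event annotations by temporal overlap, the sketch below pairs events one-to-one by intersection-over-union and derives an F1-score from the matched pairs. This is a simplified greedy stand-in, not the protocol from the paper: the function names, the `min_overlap` threshold of 0.5, and the example intervals are all assumptions, and greedy pairing is not guaranteed to find the alignment that maximizes F1.

```python
def iou(a, b):
    """Intersection-over-union of two (start, end) intervals in seconds."""
    intersection = min(a[1], b[1]) - max(a[0], b[0])
    if intersection <= 0:
        return 0.0
    # For overlapping intervals the union spans earliest start to latest end.
    union = max(a[1], b[1]) - min(a[0], b[0])
    return intersection / union


def match_events(reference, hypothesis, min_overlap=0.5):
    """Greedily pair events by descending overlap; return (tp, fp, fn, f1).

    Hypothetical helper: each event is matched at most once, and pairs with
    overlap below `min_overlap` (an assumed threshold) are never matched.
    """
    candidates = sorted(
        (
            (iou(r, h), i, j)
            for i, r in enumerate(reference)
            for j, h in enumerate(hypothesis)
        ),
        reverse=True,
    )
    used_ref, used_hyp = set(), set()
    tp = 0
    for score, i, j in candidates:
        if score < min_overlap or score == 0.0:
            break  # remaining candidates overlap even less
        if i in used_ref or j in used_hyp:
            continue
        used_ref.add(i)
        used_hyp.add(j)
        tp += 1

    fp = len(hypothesis) - tp  # hypothesis events left unmatched
    fn = len(reference) - tp   # reference events left unmatched
    total = 2 * tp + fp + fn
    f1 = 2 * tp / total if total else 0.0
    return tp, fp, fn, f1


# Made-up snore annotations from two scorers, as (start, end) in seconds.
scorer_a = [(0.0, 1.0), (2.0, 3.0), (5.0, 6.0)]
scorer_b = [(0.1, 1.1), (2.6, 3.4), (8.0, 9.0)]
print(match_events(scorer_a, scorer_b))
```

Raising or lowering `min_overlap` shifts the evaluation between a strict localization requirement and a looser presence-only comparison.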
3
Montazeri K, Jonsson SA, Agustsson JS, Serwatko M, Gislason T, Arnardottir ES. The design of RIP belts impacts the reliability and quality of the measured respiratory signals. Sleep Breath 2021; 25:1535-1541. [PMID: 33411184 PMCID: PMC8376735 DOI: 10.1007/s11325-020-02268-x] [Received: 08/27/2020] [Revised: 11/24/2020] [Accepted: 11/28/2020] [Indexed: 12/02/2022]
Abstract
Purpose: Evaluate the effect of respiratory inductance plethysmography (RIP) belt design on the reliability and quality of respiratory signals. A comparison of cannula flow to disposable cut-to-fit, semi-disposable folding and disposable RIP belts was performed in clinical home sleep apnea testing (HSAT) studies.
Methods: This was a retrospective study using clinical HSAT studies. The signal reliability of cannula, thorax, and abdomen RIP belts was determined by automatically identifying periods during which the signals did not represent respiratory airflow and breathing movements. Results were verified by manual scoring. RIP flow quality was determined by examining the correlation between the RIP flow and cannula flow when both signals were considered reliable.
Results: Of 767 clinical HSAT studies, mean signal reliability of the cut-to-fit, semi-disposable, and disposable thorax RIP belts was 83.0 ± 26.2%, 76.1 ± 24.4%, and 98.5 ± 9.3%, respectively. The signal reliability of the cannula was 92.5 ± 16.1%, 87.0 ± 23.3%, and 85.5 ± 24.5%, respectively. The automatic assessment of signal reliability for the RIP belts and cannula flow had a sensitivity of 50% and a specificity of 99% compared with manual assessment. The mean correlation of cannula flow to RIP flow from the cut-to-fit, semi-disposable, and disposable RIP belts was 0.79 ± 0.24, 0.52 ± 0.20, and 0.86 ± 0.18, respectively.
Conclusion: The design of RIP belts affects the reliability and quality of respiratory signals. The disposable RIP belts that had integrated contacts and did not fold on top of themselves performed the best. The cut-to-fit RIP belts were most likely to be unreliable, and the semi-disposable folding belts produced the lowest-quality RIP flow signals compared to the cannula flow signal.
Supplementary Information: The online version contains supplementary material available at 10.1007/s11325-020-02268-x.
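The methods describe correlating RIP flow with cannula flow only where both signals were considered reliable. A minimal sketch of that kind of masked comparison follows, using a plain Pearson correlation over the samples flagged reliable in both channels. The function name, the per-sample boolean flags, and the toy values are assumptions for illustration; the abstract does not state which correlation measure or windowing the authors used.

```python
import math


def masked_pearson(x, y, x_reliable, y_reliable):
    """Pearson correlation over samples where both signals are flagged reliable.

    Hypothetical helper; the reliability flags are assumed to come from a
    separate signal-quality step that is not shown here.
    """
    pairs = [
        (a, b)
        for a, b, ok_a, ok_b in zip(x, y, x_reliable, y_reliable)
        if ok_a and ok_b
    ]
    if len(pairs) < 2:
        raise ValueError("need at least two jointly reliable samples")

    mean_x = sum(a for a, _ in pairs) / len(pairs)
    mean_y = sum(b for _, b in pairs) / len(pairs)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant signal")
    return sxy / math.sqrt(sxx * syy)


# Toy signals: the fourth RIP sample is an artifact and is flagged unreliable.
rip_flow = [0.0, 1.0, 2.0, 9.0, 4.0]
cannula_flow = [0.0, 2.0, 4.0, -5.0, 8.0]
rip_ok = [True, True, True, False, True]
cannula_ok = [True, True, True, True, True]
print(masked_pearson(rip_flow, cannula_flow, rip_ok, cannula_ok))
```

Excluding unreliable samples keeps the quality estimate from being dominated by periods in which either sensor had failed.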
Affiliation(s)
- Marta Serwatko
  - Department of Engineering, Reykjavik University, Reykjavik, Iceland
- Thorarinn Gislason
  - Sleep Department, Landspitali-The National University Hospital of Iceland, Reykjavik, Iceland
  - Faculty of Medicine, University of Iceland, Reykjavik, Iceland
- Erna S Arnardottir
  - Department of Engineering, Reykjavik University, Reykjavik, Iceland
  - Internal Medicine Services, Landspitali-The National University Hospital of Iceland, Reykjavik, Iceland
  - Department of Computer Science, Reykjavik University, Reykjavik, Iceland