1
Luo J, Zhao Y, Liu H, Zhang Y, Shi Z, Li R, Hei X, Ren X. SST: a snore shifted-window transformer method for potential obstructive sleep apnea patient diagnosis. Physiol Meas 2024; 45:035003. [PMID: 38316023] [DOI: 10.1088/1361-6579/ad262b]
Abstract
Objective. Obstructive sleep apnea (OSA) is a common and potentially dangerous disease with serious health consequences. The objective of this study was to develop a noncontact method for diagnosing potential OSA patients from sleep audio signals, providing a more convenient diagnostic approach than traditional polysomnography (PSG) testing. Approach. The study employed a shifted-window transformer model to detect snoring events in whole-night sleep audio. First, a snoring detection model was trained on large-scale audio datasets. Statistical metrics of the deep features extracted from the detected snore audio were then used to train a random forest classifier for OSA diagnosis. Main results. On a self-collected dataset of 305 potential OSA patients, the proposed snore shifted-window transformer method (SST) achieved an accuracy of 85.9%, a sensitivity of 85.3%, and a precision of 85.6% in OSA patient classification, surpassing the state-of-the-art method by 9.7%, 10.7%, and 7.9%, respectively. Significance. The experimental results demonstrate that SST significantly improves noncontact, audio-based OSA diagnosis. The findings suggest a promising self-diagnosis method for potential OSA patients, potentially reducing the need for invasive and inconvenient diagnostic procedures.
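As a rough illustration of the second stage of this pipeline (statistical summaries of per-event deep features feeding a random forest), here is a minimal sketch in Python with scikit-learn. The feature dimensions, statistics, and data below are stand-ins, not the authors' implementation:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

def summarize_embeddings(event_embeddings):
    """Collapse per-snore-event deep features (n_events x d) into one
    fixed-length vector of statistical metrics per recording."""
    return np.concatenate([
        event_embeddings.mean(axis=0),
        event_embeddings.std(axis=0),
        event_embeddings.max(axis=0),
        event_embeddings.min(axis=0),
    ])

# Stand-in data: 40 recordings, each with a variable number of
# 16-dimensional "deep features"; real features would come from the
# snore-detection transformer, which is not reproduced here.
feats, labels = [], []
for i in range(40):
    label = i % 2                      # 1 = suspected OSA (toy labels)
    emb = rng.normal(loc=label, size=(rng.integers(5, 50), 16))
    feats.append(summarize_embeddings(emb))
    labels.append(label)
X, y = np.stack(feats), np.array(labels)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)  # 5-fold accuracy estimates
print(X.shape)   # (40, 64): 4 statistics x 16 feature dims
```

The key design point is that a variable number of snore events per night is reduced to one fixed-length vector, so a standard classifier can operate per recording.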
Affiliation(s)
- Jing Luo
- Shaanxi Key Laboratory for Network Computing and Security Technology, School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, Shaanxi, 710048, People's Republic of China
- Human-Machine Integration Intelligent Robot Shaanxi University Engineering Research Center, Xi'an University of Technology, Xi'an, Shaanxi, 710048, People's Republic of China
- Yinuo Zhao
- Shaanxi Key Laboratory for Network Computing and Security Technology, School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, Shaanxi, 710048, People's Republic of China
- Human-Machine Integration Intelligent Robot Shaanxi University Engineering Research Center, Xi'an University of Technology, Xi'an, Shaanxi, 710048, People's Republic of China
- Haiqin Liu
- Department of Otolaryngology Head and Neck Surgery & Center of Sleep Medicine, The Second Affiliated Hospital of Xi'an Jiaotong University, Xi'an, Shaanxi, 710049, People's Republic of China
- Yitong Zhang
- Department of Otolaryngology Head and Neck Surgery & Center of Sleep Medicine, The Second Affiliated Hospital of Xi'an Jiaotong University, Xi'an, Shaanxi, 710049, People's Republic of China
- Zhenghao Shi
- Shaanxi Key Laboratory for Network Computing and Security Technology, School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, Shaanxi, 710048, People's Republic of China
- Human-Machine Integration Intelligent Robot Shaanxi University Engineering Research Center, Xi'an University of Technology, Xi'an, Shaanxi, 710048, People's Republic of China
- Rui Li
- School of Mechanical and Instrumental Engineering, Xi'an University of Technology, Xi'an, Shaanxi, 710048, People's Republic of China
- Xinhong Hei
- Shaanxi Key Laboratory for Network Computing and Security Technology, School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, Shaanxi, 710048, People's Republic of China
- Human-Machine Integration Intelligent Robot Shaanxi University Engineering Research Center, Xi'an University of Technology, Xi'an, Shaanxi, 710048, People's Republic of China
- Xiaorong Ren
- Department of Otolaryngology Head and Neck Surgery & Center of Sleep Medicine, The Second Affiliated Hospital of Xi'an Jiaotong University, Xi'an, Shaanxi, 710049, People's Republic of China
2
Sfayyih AH, Sulaiman N, Sabry AH. A review on lung disease recognition by acoustic signal analysis with deep learning networks. J Big Data 2023; 10:101. [PMID: 37333945] [PMCID: PMC10259357] [DOI: 10.1186/s40537-023-00762-z]
Abstract
Technologies such as deep learning and machine learning have recently made computer-assisted interpretation viable in many areas of health assessment. Using acoustic analysis and medical imaging, they also improve predictive accuracy for prompt, early disease detection. Such technological support helps medical professionals manage more patients despite the shortage of skilled human resources. Alongside serious illnesses such as lung cancer, the prevalence of respiratory diseases and breathing difficulties is gradually rising and endangering society. Because early prediction and immediate treatment are crucial for respiratory disorders, chest X-rays and respiratory sound audio together are proving very helpful. Whereas several review studies have addressed lung disease classification and detection with deep learning, only two reviews based on signal analysis for lung disease diagnosis have been conducted, in 2011 and 2018. This work provides a review of lung disease recognition by acoustic signal analysis with deep learning networks. We anticipate that physicians and researchers working with sound-signal-based machine learning will find this material beneficial.
Affiliation(s)
- Alyaa Hamel Sfayyih
- Department of Electrical and Electronic Engineering, Faculty of Engineering, Universiti Putra Malaysia, 43400 Serdang, Malaysia
- Nasri Sulaiman
- Department of Electrical and Electronic Engineering, Faculty of Engineering, Universiti Putra Malaysia, 43400 Serdang, Malaysia
- Ahmad H. Sabry
- Department of Computer Engineering, Al-Nahrain University, Al Jadriyah Bridge, 64074 Baghdad, Iraq
3
Zhao Z, Wang K. Unaligned Multimodal Sequences for Depression Assessment From Speech. Annu Int Conf IEEE Eng Med Biol Soc 2022; 2022:3409-3413. [PMID: 36085884] [DOI: 10.1109/embc48229.2022.9871556]
Abstract
A growing area of mental health research concerns how an individual's degree of depression might be assessed automatically from multimodal objective markers. When combined with machine learning, however, this research is challenging because the multimodal sequences are unaligned and annotated training data are scarce. In this paper, a novel cross-modal framework for automatic depression severity assessment is proposed. Low-level descriptors (LLDs) are extracted from multiple cues (such as text, audio, and video), after which multimodal fusion via a cross-modal attention mechanism is used to learn more accurate feature representations. For the features extracted from each modality, the cross-modal attention mechanism continuously updates the input sequence of the target modality until the Patient Health Questionnaire (PHQ-8) score can finally be obtained. Moreover, a Self-Attention Generative Adversarial Network (SAGAN) is employed to increase the amount of training data available for depression severity analysis. Experimental results on the depression sub-challenge datasets of the Audio/Visual Emotion Challenge (AVEC 2017 and AVEC 2019) demonstrate the effectiveness of the proposed method.
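The cross-modal attention step described above, where a target-modality sequence is updated from an unaligned source-modality sequence of different length, can be sketched in plain NumPy. The projection sizes and random weights below are illustrative only, not the paper's trained model:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(target, source, d_k=8, rng=None):
    """Scaled dot-product attention from a target-modality sequence
    (queries) onto a source-modality sequence (keys/values).
    target: (T_t, d), source: (T_s, d); T_t and T_s may differ,
    so no frame-level alignment between modalities is required."""
    rng = rng if rng is not None else np.random.default_rng(0)
    d = target.shape[1]
    Wq = rng.normal(size=(d, d_k))   # illustrative random projections
    Wk = rng.normal(size=(d, d_k))
    Wv = rng.normal(size=(d, d_k))
    Q, K, V = target @ Wq, source @ Wk, source @ Wv
    attn = softmax(Q @ K.T / np.sqrt(d_k))   # (T_t, T_s) weights
    return attn @ V                          # target updated with source context

audio = np.random.default_rng(1).normal(size=(50, 16))  # 50 audio frames
text = np.random.default_rng(2).normal(size=(12, 16))   # 12 text tokens
out = cross_modal_attention(text, audio)
print(out.shape)   # (12, 8)
```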
4
Borsky M, Serwatko M, Arnardottir ES, Mallett J. Towards Sleep Study Automation: Detection Evaluation of Respiratory-Related Events. IEEE J Biomed Health Inform 2022; 26:3418-3426. [PMID: 35294367] [DOI: 10.1109/jbhi.2022.3159727]
Abstract
The diagnosis of sleep-disordered breathing depends on detecting several respiratory-related events in sleep studies: apneas, hypopneas, snores, and respiratory event-related arousals. While a number of automatic detection methods have been proposed, their reproducibility has been an issue, in part because there is no generally accepted protocol for evaluating their results. In sleep measurement this is usually treated as a classification problem, and the accompanying issue of localization is not treated as similarly critical. To address these problems we present a detection evaluation protocol that can qualitatively assess the match between two annotations of respiratory-related events. The protocol measures the relative temporal overlap between two annotations in order to find an alignment that maximizes their F1-score at the sequence level. It can be used in applications that require a precise estimate of the number of events, the total event duration, or a joint estimate of both. We assess its application on a dataset containing over 10,000 manually annotated snore events from 9 subjects, and show that, under the American Academy of Sleep Medicine scoring manual, two sleep technologists can achieve an F1-score of 0.88 when identifying the presence of snore events. In addition, we drafted rules for marking snore boundaries and showed that a single sleep technologist can achieve an F1-score of 0.94 on the same task. Finally, we compared our protocol against the protocol used to evaluate sleep spindle detection and highlighted the differences.
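A simplified version of such an overlap-based event-matching evaluation can be sketched as follows. This greedy matcher with a relative-overlap threshold illustrates the idea only; it is not the paper's exact alignment-maximizing algorithm:

```python
def overlap(a, b):
    """Temporal intersection of two (start, end) events, in seconds."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

def event_f1(reference, hypothesis, min_rel_overlap=0.5):
    """Sequence-level F1 between two annotations of (start, end) events.
    A hypothesis event matches an unused reference event when their
    overlap covers at least `min_rel_overlap` of the shorter event."""
    used = set()
    tp = 0
    for h in hypothesis:
        for i, r in enumerate(reference):
            if i in used:
                continue
            shorter = min(h[1] - h[0], r[1] - r[0])
            if shorter > 0 and overlap(h, r) / shorter >= min_rel_overlap:
                used.add(i)
                tp += 1
                break
    precision = tp / len(hypothesis) if hypothesis else 0.0
    recall = tp / len(reference) if reference else 0.0
    return 2 * precision * recall / (precision + recall) if tp else 0.0

ref = [(0.0, 1.0), (2.0, 3.0), (5.0, 6.5)]   # e.g. technologist A's snores
hyp = [(0.1, 0.9), (2.4, 3.3), (8.0, 9.0)]   # e.g. technologist B's snores
print(round(event_f1(ref, hyp), 3))          # 2 of 3 events match: 0.667
```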
5
Ljubić H, Martinović G, Volarić T. Augmenting data with generative adversarial networks: An overview. Intell Data Anal 2022. [DOI: 10.3233/ida-215735]
Abstract
The performance of neural networks depends greatly on the quality, size, and balance of the training dataset. In real environments datasets are rarely balanced, and training deep models on such data is one of the main challenges of deep learning. To reduce this problem, methods and techniques have been borrowed from traditional machine learning. More recently, generative adversarial networks (GANs) were developed: a relatively new type of generative model, based on game theory, consisting of two neural networks, a generator and a discriminator. The generator's task is to create, from input noise, a sample that follows the training data distribution, while the discriminator should detect such samples as fake. This process runs for a finite number of iterations until the generator successfully fools the discriminator, at which point the sample becomes part of the new (augmented) dataset. Even though the original GAN creates unlabeled samples, variants that soon appeared removed that limitation. Generating artificial data with these networks appears to be a meaningful solution to the imbalance problem, since artificial samples created by a GAN turn out to be difficult to distinguish from real ones. In this manner, new samples of the minority class can be created and the dataset imbalance ratio lowered.
Affiliation(s)
- Hrvoje Ljubić
- Faculty of Science and Education, University of Mostar, Mostar, Bosnia and Herzegovina
- Goran Martinović
- Faculty of Electrical Engineering, Computer Science and Information Technology, Josip Juraj Strossmayer University of Osijek, Osijek, Croatia
- Tomislav Volarić
- Faculty of Science and Education, University of Mostar, Mostar, Bosnia and Herzegovina
6
Yi L, Mak MW. Improving Speech Emotion Recognition With Adversarial Data Augmentation Network. IEEE Trans Neural Netw Learn Syst 2022; 33:172-184. [PMID: 33035171] [DOI: 10.1109/tnnls.2020.3027600]
Abstract
When training data are scarce, it is challenging to train a deep neural network without overfitting. To overcome this challenge, this article proposes a new data augmentation network, the adversarial data augmentation network (ADAN), based on generative adversarial networks (GANs). The ADAN consists of a GAN, an autoencoder, and an auxiliary classifier. These networks are trained adversarially to synthesize class-dependent feature vectors in both the latent space and the original feature space, which can be added to the real training data for training classifiers. Instead of the conventional cross-entropy loss, the Wasserstein divergence is used for adversarial training in an attempt to produce high-quality synthetic samples. The proposed networks were applied to speech emotion recognition with EmoDB and IEMOCAP as the evaluation datasets. It was found that forcing the synthetic and real latent vectors to share a common representation largely alleviates the gradient vanishing problem. Results also show that the augmented data generated by the proposed networks are rich in emotion information. Thus, the resulting emotion classifiers are competitive with state-of-the-art speech emotion recognition systems.
7
Qian K, Schmitt M, Zheng H, Koike T, Han J, Liu J, Ji W, Duan J, Song M, Yang Z, Ren Z, Liu S, Zhang Z, Yamamoto Y, Schuller BW. Computer Audition for Fighting the SARS-CoV-2 Corona Crisis-Introducing the Multitask Speech Corpus for COVID-19. IEEE Internet Things J 2021; 8:16035-16046. [PMID: 35782182] [PMCID: PMC8768988] [DOI: 10.1109/jiot.2021.3067605]
Abstract
Computer audition (CA) has developed rapidly over the past decades by leveraging advanced signal processing and machine learning techniques. In particular, owing to its noninvasive and ubiquitous character, CA-based applications in healthcare have attracted increasing attention in recent years. During the tough time of the global crisis caused by the coronavirus disease 2019 (COVID-19), scientists and engineers in data science have collaborated on novel approaches to the prevention, diagnosis, treatment, tracking, and management of this global pandemic. On the one hand, we have witnessed the power of 5G, the Internet of Things, big data, computer vision, and artificial intelligence in applications such as epidemiological modeling, drug and vaccine discovery and design, fast CT screening, and quarantine management. On the other hand, relevant studies exploring the capacity of CA are extremely scarce and underestimated. To this end, we propose a novel multitask speech corpus for COVID-19 research. We collected in-the-wild speech data from 51 confirmed COVID-19 patients in Wuhan, China. We define three main tasks on this corpus: three-category classification tasks for evaluating the physical and/or mental status of patients, namely sleep quality, fatigue, and anxiety. Benchmarks are given using both classic machine learning methods and state-of-the-art deep learning techniques. We believe this study and corpus can facilitate not only the ongoing research on using data science to fight COVID-19, but also the monitoring of contagious diseases in general.
Affiliation(s)
- Kun Qian
- Educational Physiology Laboratory, Graduate School of Education, The University of Tokyo, Tokyo 113-0033, Japan
- Maximilian Schmitt
- Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg, 86159 Augsburg, Germany
- Huaiyuan Zheng
- Department of Hand Surgery, Wuhan Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430074, China
- Tomoya Koike
- Educational Physiology Laboratory, Graduate School of Education, The University of Tokyo, Tokyo 113-0033, Japan
- Jing Han
- Mobile Systems Group, University of Cambridge, Cambridge CB2 1TN, U.K.
- Juan Liu
- Department of Plastic Surgery, Central Hospital of Wuhan, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430074, China
- Wei Ji
- Department of Plastic Surgery, Wuhan Third Hospital and Tongren Hospital of Wuhan University, Wuhan 430072, China
- Junjun Duan
- Department of Plastic Surgery, Central Hospital of Wuhan, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430074, China
- Meishu Song
- Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg, 86159 Augsburg, Germany
- Zijiang Yang
- Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg, 86159 Augsburg, Germany
- Zhao Ren
- Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg, 86159 Augsburg, Germany
- Shuo Liu
- Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg, 86159 Augsburg, Germany
- Zixing Zhang
- GLAM—the Group on Language, Audio, and Music, Imperial College London, London SW7 2BU, U.K.
- Yoshiharu Yamamoto
- Educational Physiology Laboratory, Graduate School of Education, The University of Tokyo, Tokyo 113-0033, Japan
- Björn W. Schuller
- Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg, 86159 Augsburg, Germany
- GLAM—the Group on Language, Audio, and Music, Imperial College London, London SW7 2BU, U.K.
8
Dogan S, Akbal E, Tuncer T, Acharya UR. Application of substitution box of present cipher for automated detection of snoring sounds. Artif Intell Med 2021; 117:102085. [PMID: 34127246] [DOI: 10.1016/j.artmed.2021.102085]
Abstract
BACKGROUND AND PURPOSE: Snoring is a sleep disorder, and snoring sounds have been used to diagnose many sleep-related diseases. However, snoring sound classification is done manually, which is time-consuming and prone to human error. An automated snoring sound classification model is proposed to overcome these problems. MATERIAL AND METHOD: This work proposes an automated snoring sound classification (SSC) method built on three new components: maximum absolute pooling (MAP), the nonlinear Present pattern, and a two-layered feature selector combining neighborhood component analysis and iterative neighborhood component analysis (NCAINCA). The MAP decomposition model is applied to snoring sounds to extract both low- and high-level features. The developed Present pattern (Present-Pat) uses a substitution box (SBox) and a statistical feature generator; by deploying these generators, both textural and statistical features are produced. NCAINCA chooses the most informative features, which are fed to a k-nearest neighbor (kNN) classifier with leave-one-out cross-validation (LOOCV). The Present-Pat-based SSC system is developed using the Munich-Passau Snore Sound Corpus (MPSSC), which comprises four categories. RESULTS: Our model reached an accuracy and unweighted average recall (UAR) of 97.10% and 97.60%, respectively, using LOOCV. Moreover, a nocturnal sound dataset is used to show the generality of the presented model, on which it attained an accuracy of 98.14%. CONCLUSIONS: Our classification model is ready to be tested with more data and can be used by sleep specialists to diagnose sleep disorders based on snoring sounds.
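The classification stage described here, kNN with leave-one-out cross-validation, can be sketched with scikit-learn. The feature matrix below is synthetic stand-in data, not the paper's MAP/Present-Pat features:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(0)
# Stand-in feature matrix: 60 snore excerpts x 20 selected features,
# drawn from two well-separated clusters to stand in for two classes.
X = np.concatenate([rng.normal(0, 1, size=(30, 20)),
                    rng.normal(2, 1, size=(30, 20))])
y = np.array([0] * 30 + [1] * 30)

# LOOCV: each excerpt is held out once and predicted from the rest,
# which suits small corpora like MPSSC.
knn = KNeighborsClassifier(n_neighbors=1)
acc = cross_val_score(knn, X, y, cv=LeaveOneOut()).mean()
print(X.shape)   # (60, 20)
```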
Affiliation(s)
- Sengul Dogan
- Department of Digital Forensics Engineering, College of Technology, Firat University, Elazig, Turkey
- Erhan Akbal
- Department of Digital Forensics Engineering, College of Technology, Firat University, Elazig, Turkey
- Turker Tuncer
- Department of Digital Forensics Engineering, College of Technology, Firat University, Elazig, Turkey
- U Rajendra Acharya
- Ngee Ann Polytechnic, Department of Electronics and Computer Engineering, 599489, Singapore; Department of Biomedical Engineering, School of Science and Technology, SUSS University, Singapore; Department of Biomedical Informatics and Medical Engineering, Asia University, Taichung, Taiwan
9
Qian K, Janott C, Schmitt M, Zhang Z, Heiser C, Hemmert W, Yamamoto Y, Schuller BW. Can Machine Learning Assist Locating the Excitation of Snore Sound? A Review. IEEE J Biomed Health Inform 2021; 25:1233-1246. [PMID: 32750978] [DOI: 10.1109/jbhi.2020.3012666]
Abstract
Over the past three decades, snoring (which affects more than 30% of adults in the UK) has been increasingly studied by the transdisciplinary research community spanning medicine and engineering. Early work demonstrated that the snore sound can carry important information about the status of the upper airway, which facilitates the development of non-invasive, acoustics-based approaches for diagnosing and screening obstructive sleep apnoea and other sleep disorders. Nonetheless, clinical practice increasingly demands methods that localise the excitation of the snore sound rather than only detect sleep disorders. To further the relevant studies and attract more attention, we provide a comprehensive review of state-of-the-art machine learning techniques for automatically classifying snore sounds. First, we introduce the background and definition of the problem. Second, we describe the current work in detail and explain potential applications. Finally, we discuss the limitations and challenges of the snore sound classification task. Overall, our review provides comprehensive guidance for researchers wishing to contribute to this area.
10
Tabatabaei SAH, Fischer P, Schneider H, Koehler U, Gross V, Sohrabi K. Methods for Adventitious Respiratory Sound Analyzing Applications Based on Smartphones: A Survey. IEEE Rev Biomed Eng 2021; 14:98-115. [PMID: 32746364] [DOI: 10.1109/rbme.2020.3002970]
Abstract
Detection and classification of adventitious lung sounds plays an important role in diagnosing, monitoring, and caring for patients with lung diseases. Such systems can be delivered on different platforms, such as medical devices, standalone software, or smartphone applications. The ubiquity of smartphones and the widespread use of their applications make them an attractive platform for hosting detection and classification systems for adventitious lung sounds. In this paper, smartphone-based systems for automatic detection and classification of adventitious lung sounds are surveyed. Such adventitious sounds include cough, wheeze, crackle, and snore; relevant sounds related to other abnormal respiratory activity are considered as well. The methods are described briefly and their analysis algorithms explained, covering detection and/or classification of the sound events. A summary of the main surveyed methods, together with the classification parameters and the features used, is given for comparison. Existing challenges, open issues, and future trends are discussed as well.
11
Tuncer T, Akbal E, Dogan S. An automated snoring sound classification method based on local dual octal pattern and iterative hybrid feature selector. Biomed Signal Process Control 2021; 63:102173. [PMID: 32922509] [PMCID: PMC7476581] [DOI: 10.1016/j.bspc.2020.102173]
Abstract
In this research, a novel snoring sound classification (SSC) method is presented, built around a new feature generation function that yields a high classification rate. The proposed feature extractor is named the Local Dual Octal Pattern (LDOP). An LDOP-based SSC method is presented to address the low success rates reported on the Munich-Passau Snore Sound Corpus (MPSSC) dataset. The fundamental phases of the proposed method are multilevel discrete wavelet transform (DWT) decomposition with LDOP-based feature generation, informative feature selection with ReliefF and iterative neighborhood component analysis (RFINCA), and classification with k-nearest neighbors (kNN). A seven-level DWT and LDOP are used together to generate low-, medium-, and high-level features, and this feature generation network extracts 4096 features in total. RFINCA selects the 95 most discriminative and informative of these features. In the classification phase, kNN with leave-one-out cross-validation (LOOCV) is used. A classification accuracy of 95.53% and an unweighted average recall (UAR) of 94.65% were achieved. The proposed LDOP-based SSC method surpasses the best of the other state-of-the-art machine learning and deep learning methods by 22%. These results clearly demonstrate the success of the proposed method.
Affiliation(s)
- Turker Tuncer
- Department of Digital Forensics Engineering, Technology Faculty, Firat University, Elazig, Turkey
- Erhan Akbal
- Department of Digital Forensics Engineering, Technology Faculty, Firat University, Elazig, Turkey
- Sengul Dogan
- Department of Digital Forensics Engineering, Technology Faculty, Firat University, Elazig, Turkey
12
Sebastian A, Cistulli PA, Cohen G, de Chazal P. Automated identification of the predominant site of upper airway collapse in obstructive sleep apnoea patients using snore signal. Physiol Meas 2020; 41:095005. [DOI: 10.1088/1361-6579/abaa33]
13
Sun J, Hu X, Chen C, Peng S, Ma Y. Amplitude spectrum trend-based feature for excitation location classification from snore sounds. Physiol Meas 2020; 41:085006. [PMID: 32721937] [DOI: 10.1088/1361-6579/abaa34]
Abstract
OBJECTIVE: Successful surgical treatment of obstructive sleep apnea (OSA) depends on precisely locating the vibrating tissue. Snoring is the main symptom of OSA and can be used to detect the active tissue location. However, existing approaches are limited by their inability to capture the characteristics of snoring produced in the upper airway. This paper proposes a new approach to better distinguish snoring sounds generated at four different excitation locations. APPROACH: First, we propose a robust null space pursuit algorithm for extracting the trend from the amplitude spectrum of snoring. Second, a new feature derived from this extracted amplitude spectrum trend, which outperforms the Mel-frequency cepstral coefficient (MFCC) feature, is designed. The newly proposed feature, the trend-based MFCC (TCC), is then reduced in dimensionality using principal component analysis, and a support vector machine is employed for classification. MAIN RESULTS: Using the TCC, the proposed approach achieves an unweighted average recall of 87.5% on the classification of four excitation locations on the public Munich-Passau Snore Sound Corpus. SIGNIFICANCE: The TCC is a promising feature for capturing the characteristics of snoring. The proposed method can effectively classify snores and assist in accurate OSA diagnosis.
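The core idea, cepstral-style coefficients computed from a smoothed amplitude-spectrum trend, can be sketched in NumPy. The moving-average trend below is a crude stand-in for the paper's null space pursuit algorithm, and all parameters (sampling rate, window, coefficient count) are illustrative:

```python
import numpy as np

def dct2(x):
    """Unnormalized type-II DCT, the transform behind cepstral coefficients."""
    n = len(x)
    k = np.arange(n)[:, None]
    return (x * np.cos(np.pi * k * (2 * np.arange(n) + 1) / (2 * n))).sum(axis=1)

def trend_cepstral_coeffs(signal, n_coeffs=12, win=31):
    """Cepstral-style coefficients from a smoothed amplitude-spectrum
    trend rather than a mel filterbank; a rough stand-in for the
    trend-based feature idea, not the authors' TCC implementation."""
    spectrum = np.abs(np.fft.rfft(signal * np.hanning(len(signal))))
    trend = np.convolve(spectrum, np.ones(win) / win, mode="same")  # moving-average trend
    return dct2(np.log(trend + 1e-10))[:n_coeffs]

# Toy "snore-like" signal: a low-frequency tone plus noise.
t = np.arange(2048) / 8000.0
snore_like = np.sin(2 * np.pi * 120 * t) + \
    0.3 * np.random.default_rng(0).normal(size=t.size)
feat = trend_cepstral_coeffs(snore_like)
print(feat.shape)   # (12,)
```

In the paper these per-snore features are further reduced with PCA and classified with a support vector machine.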
Affiliation(s)
- Jingpeng Sun
- Institute of Automation, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Beijing 100190, People's Republic of China
14
Dong F, Qian K, Ren Z, Baird A, Li X, Dai Z, Dong B, Metze F, Yamamoto Y, Schuller B. Machine Listening for Heart Status Monitoring: Introducing and Benchmarking HSS - the Heart Sounds Shenzhen Corpus. IEEE J Biomed Health Inform 2019; 24:2082-2092. [PMID: 31765322] [DOI: 10.1109/jbhi.2019.2955281]
Abstract
Auscultation of the heart is a widely studied technique that requires precise hearing from practitioners to distinguish subtle differences in heart-beat rhythm. The technique is popular for its non-invasive nature and can aid early diagnosis of a range of cardiac conditions. Machine listening approaches can support this process by monitoring continuously, capturing both mild and chronic heart conditions. Despite this potential, relevant databases and benchmark studies are scarce. In this paper, we introduce our publicly accessible database, the Heart Sounds Shenzhen Corpus (HSS), which was first released during the INTERSPEECH 2018 ComParE Heart Sound sub-challenge. Additionally, we provide a survey of machine learning work in heart sound recognition, as well as a benchmark for HSS using standard acoustic features and machine learning models. At best, our support vector machine with log-mel features achieves 49.7% unweighted average recall on a three-category task (normal, mild, moderate/severe).
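Unweighted average recall (UAR), the benchmark metric reported here and throughout this list, is the mean of per-class recalls; in scikit-learn it corresponds to macro-averaged recall. A toy check on hypothetical labels:

```python
from sklearn.metrics import recall_score

# Hypothetical predictions on the three HSS categories. UAR weights
# each class equally, so a skewed class distribution cannot inflate it.
y_true = ["normal", "normal", "mild", "mild", "severe", "severe"]
y_pred = ["normal", "mild",   "mild", "mild", "severe", "normal"]

uar = recall_score(y_true, y_pred, average="macro")
print(round(uar, 3))   # per-class recalls 0.5, 1.0, 0.5 -> 0.667
```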