1
Baronetto A, Graf L, Fischer S, Neurath MF, Amft O. Multiscale Bowel Sound Event Spotting in Highly Imbalanced Wearable Monitoring Data: Algorithm Development and Validation Study. JMIR AI 2024;3:e51118. PMID: 38985504; PMCID: PMC11269970; DOI: 10.2196/51118.
Abstract
BACKGROUND Abdominal auscultation (i.e., listening to bowel sounds (BSs)) can be used to analyze digestion. Automated retrieval of BSs would help assess gastrointestinal disorders noninvasively. OBJECTIVE This study aims to develop a multiscale spotting model to detect BSs in continuous audio data from a wearable monitoring system. METHODS We designed a spotting model based on the Efficient-U-Net (EffUNet) architecture to analyze 10-second audio segments at a time and spot BSs with a temporal resolution of 25 ms. Evaluation data were collected across different digestive phases from 18 healthy participants and 9 patients with inflammatory bowel disease (IBD). Audio data were recorded in a daytime setting with a smart T-shirt with embedded digital microphones. The data set was annotated by independent raters with substantial agreement (Cohen κ between 0.70 and 0.75), resulting in 136 hours of labeled data. In total, 11,482 BSs were analyzed, with BS durations ranging between 18 ms and 6.3 seconds. The share of BSs in the data set (BS ratio) was 0.0089. We analyzed performance depending on noise level, BS duration, and BS event rate, and also report spotting timing errors. RESULTS Leave-one-participant-out cross-validation of BS event spotting yielded a median F1-score of 0.73 for both healthy volunteers and patients with IBD. EffUNet detected BSs under different noise conditions with 0.73 recall and 0.72 precision. In particular, for a signal-to-noise ratio over 4 dB, more than 83% of BSs were recognized, with a precision of 0.77 or more. EffUNet recall dropped below 0.60 for BS durations of 1.5 seconds or less. At a BS ratio greater than 0.05, the precision of our model was over 0.83. For both healthy participants and patients with IBD, insertion and deletion timing errors were the largest, totaling 15.54 minutes of insertion errors and 13.08 minutes of deletion errors over the whole audio data set. On our data set, EffUNet outperformed existing BS spotting models that provide similar temporal resolution. CONCLUSIONS The EffUNet spotter is robust against background noise and can retrieve BSs of varying duration. EffUNet outperforms previous BS detection approaches on unmodified audio data containing highly sparse BS events.
Affiliation(s)
- Annalisa Baronetto
- Hahn-Schickard, Freiburg, Germany
- Intelligent Embedded Systems Lab, University of Freiburg, Freiburg, Germany
- Luisa Graf
- Chair of Digital Health, Friedrich-Alexander Universität Erlangen-Nürnberg, Erlangen, Germany
- Sarah Fischer
- Medical Clinic 1, University Hospital Erlangen, Friedrich-Alexander Universität Erlangen-Nürnberg, Erlangen, Germany
- Deutsches Zentrum Immuntherapie, Erlangen, Germany
- Markus F Neurath
- Medical Clinic 1, University Hospital Erlangen, Friedrich-Alexander Universität Erlangen-Nürnberg, Erlangen, Germany
- Deutsches Zentrum Immuntherapie, Erlangen, Germany
- Oliver Amft
- Hahn-Schickard, Freiburg, Germany
- Intelligent Embedded Systems Lab, University of Freiburg, Freiburg, Germany
2
Ijaz N, Banoori F, Koo I. Reshaping Bioacoustics Event Detection: Leveraging Few-Shot Learning (FSL) with Transductive Inference and Data Augmentation. Bioengineering (Basel) 2024;11:685. PMID: 39061767; PMCID: PMC11274013; DOI: 10.3390/bioengineering11070685.
Abstract
Bioacoustic event detection is a demanding endeavor involving recognizing and classifying the sounds animals make in their natural habitats. Traditional supervised learning requires a large amount of labeled data, which are hard to come by in bioacoustics. This paper presents a few-shot learning (FSL) method incorporating transductive inference and data augmentation to address the issues of too few labeled events and small volumes of recordings. Here, transductive inference iteratively alters class prototypes and feature extractors to capture essential patterns, whereas data augmentation applies SpecAugment to Mel spectrogram features to augment the training data. The proposed approach is evaluated using the Detecting and Classifying Acoustic Scenes and Events (DCASE) 2022 and 2021 datasets. Extensive experimental results demonstrate that the components of the proposed method achieve significant F-score improvements of 27% and 10% for the DCASE-2022 and DCASE-2021 datasets, respectively, compared to recent advanced approaches. Moreover, our method is helpful in FSL tasks because it effectively adapts to sounds from various animal species, recordings, and durations.
Affiliation(s)
- Nouman Ijaz
- Department of Electrical, Electronics and Computer Engineering, University of Ulsan, Ulsan 44610, Republic of Korea;
- Farhad Banoori
- School of Electronics and Information Engineering, South China University of Technology, Guangzhou 510641, China;
- Faculty of Computer Sciences, Department of Computer Science, ILMA University, Karachi City 74900, Pakistan
- Insoo Koo
- Department of Electrical, Electronics and Computer Engineering, University of Ulsan, Ulsan 44610, Republic of Korea;
3
Boucher AJ, Weladji RB, Holand Ø, Kumpula J. Modelling reindeer rut activity using on-animal acoustic recorders and machine learning. Ecol Evol 2024;14:e11479. PMID: 38932958; PMCID: PMC11199844; DOI: 10.1002/ece3.11479.
Abstract
For decades, researchers have employed sound to study the biology of wildlife, with the aim of better understanding their ecology and behaviour. By utilizing on-animal recorders to capture audio from freely moving animals, scientists can decipher vocalizations and glean insights into behaviour and ecosystem dynamics through advanced signal processing. However, the laborious task of sorting through extensive audio recordings has been a major bottleneck. To expedite this process, researchers have turned to machine learning techniques, specifically neural networks, to streamline the analysis of data. Nevertheless, much of the existing research has focused predominantly on stationary recording devices, overlooking the potential benefits of employing on-animal recorders in conjunction with machine learning. To showcase the synergy of on-animal recorders and machine learning, we conducted a study at the Kutuharju research station in Kaamanen, Finland, where the vocalizations of rutting reindeer were recorded during their mating season. By attaching recorders to seven male reindeer during the rutting periods of 2019 and 2020, we trained convolutional neural networks to distinguish reindeer grunts with a 95% accuracy rate. This high level of accuracy allowed us to examine the reindeer's grunting behaviour, revealing patterns indicating that older, heavier males vocalized more than their younger, lighter counterparts. The success of this study underscores the potential of on-animal acoustic recorders coupled with machine learning techniques as powerful tools for wildlife research, hinting at their broader applications with further advancement and optimization.
Affiliation(s)
- Øystein Holand
- Department of Animal and Aquacultural Sciences, Norwegian University of Life Sciences, Ås, Norway
- Jouko Kumpula
- Natural Resources Institute of Finland (Luke), Reindeer Research Station, Helsinki, Finland
4
Gul S, Khan MS, Ur-Rehman A. DEW: A wavelet approach of rare sound event detection. PLoS One 2024;19:e0300444. PMID: 38547253; PMCID: PMC10977878; DOI: 10.1371/journal.pone.0300444.
Abstract
This paper presents a novel sound event detection (SED) system for rare events occurring in an open environment. Wavelet multiresolution analysis (MRA) is used to decompose the 30-second input audio clip into five levels. Wavelet denoising is then applied on the third and fifth levels of MRA to filter out the background. Significant transitions, which may represent the onset of a rare event, are then estimated in these two levels by combining a peak-finding algorithm with the K-medoids clustering algorithm. Small portions of one-second duration, called 'chunks', are cropped from the input audio signal at the estimated locations of the significant transitions. Features are extracted from these chunks by a wavelet scattering network (WSN) and passed to a support vector machine (SVM) classifier. The proposed SED framework produces an error rate comparable to SED systems based on convolutional neural network (CNN) architectures. The proposed algorithm is also computationally efficient and lightweight compared to deep learning models, as it has no learnable parameters. It requires only a single epoch of training, which is 5, 10, 200, and 600 times fewer than the compared models based on CNNs and deep neural networks (DNNs), CNN with a long short-term memory (LSTM) network, convolutional recurrent neural network (CRNN), and CNN, respectively. The proposed model neither requires concatenation with previous frames for anomaly detection nor the creation of additional training data, as needed by other comparative deep learning models. It needs to check almost 360 times fewer chunks for the presence of rare events than the other baseline systems used for comparison in this paper. All these characteristics make the proposed system suitable for real-time applications on resource-limited devices.
Affiliation(s)
- Sania Gul
- Department of Electrical Engineering, University of Engineering and Technology, Peshawar, Pakistan
- Intelligent Information Processing Lab, National Center of Artificial Intelligence, University of Engineering and Technology, Peshawar, Pakistan
- Muhammad Salman Khan
- Department of Electrical Engineering, College of Engineering, Qatar University, Doha, Qatar
- Ata Ur-Rehman
- Department of Electrical Engineering (MCS), NUST, Islamabad, Pakistan
5
You J, Wu W, Lee J. Open set classification of sound event. Sci Rep 2024;14:1282. PMID: 38218958; PMCID: PMC10787752; DOI: 10.1038/s41598-023-50639-7.
Abstract
Sound is one of the primary forms of sensory information that we use to perceive our surroundings. A sound event is typically a short audio clip arising from an action, such as a rhythm pattern, a music genre, or a few seconds of speech. Sound event classification addresses the question of what kind of audio clip a given audio sequence contains. It is commonly solved with the pipeline: audio preprocessing → perceptual feature extraction → classification. In this paper, we improve the traditional sound event classification algorithm to identify unknown sound events using deep learning. A compact cluster structure in the feature space for known classes helps recognize unknown classes by leaving large room to locate unknown samples in the embedded feature space. Based on this concept, we applied center loss and supervised contrastive loss to optimize the model. The center loss tries to minimize the intra-class distance by pulling embedded features toward the cluster center, while the contrastive loss disperses inter-class features from one another. In addition, we explored the performance of self-supervised learning in detecting unknown sound events. The experimental results demonstrate that our proposed open-set sound event classification algorithm and self-supervised learning approach achieve sustained performance improvements on various datasets.
Affiliation(s)
- Jie You
- School of Information Engineering, East China Jiaotong University, Nanchang, 330013, China
- Artificial Intelligence Lab, Department of Computer Science and Engineering, Jeonbuk National University, Jeonju, 54896, South Korea
- Wenqin Wu
- Artificial Intelligence Lab, Department of Computer Science and Engineering, Jeonbuk National University, Jeonju, 54896, South Korea
- Joonwhoan Lee
- Artificial Intelligence Lab, Department of Computer Science and Engineering, Jeonbuk National University, Jeonju, 54896, South Korea.
6
Bonet-Solà D, Vidaña-Vila E, Alsina-Pagès RM. Analysis and Acoustic Event Classification of Environmental Data Collected in a Citizen Science Project. Int J Environ Res Public Health 2023;20:3683. PMID: 36834378; PMCID: PMC9966892; DOI: 10.3390/ijerph20043683.
Abstract
Citizen science can serve as a tool to obtain information about changes in the soundscape. One of the challenges of citizen science projects is processing the data gathered by citizens to obtain conclusions. As part of the project Sons al Balcó, the authors aim to study the soundscape in Catalonia during and after the COVID-19 lockdown and to design a tool that automatically detects sound events as a first step toward assessing the quality of the soundscape. This paper details and compares the acoustic samples of the two collecting campaigns of the Sons al Balcó project. While the 2020 campaign obtained 365 videos, the 2021 campaign obtained 237. A convolutional neural network is then trained to automatically detect and classify acoustic events even if they occur simultaneously. The event-based macro F1-score tops 50% for both campaigns for the most prevalent noise sources. However, results suggest that not all categories are equally well detected: the prevalence of an event in the dataset and its foreground-to-background ratio play a decisive role.
Affiliation(s)
- Ester Vidaña-Vila
- Human Environment Research (HER), La Salle—Universitat Ramon Llull, Sant Joan de La Salle, 42, 08022 Barcelona, Spain
- Rosa Ma Alsina-Pagès
- Human Environment Research (HER), La Salle—Universitat Ramon Llull, Sant Joan de La Salle, 42, 08022 Barcelona, Spain
7
Fränti P, Mariescu-Istodor R. Soft Precision and Recall. Pattern Recognit Lett 2023. DOI: 10.1016/j.patrec.2023.02.005.
8
Jin Y, Wang M, Luo L, Zhao D, Liu Z. Polyphonic Sound Event Detection Using Temporal-Frequency Attention and Feature Space Attention. Sensors (Basel) 2022;22:6818. PMID: 36146166; PMCID: PMC9503981; DOI: 10.3390/s22186818.
Abstract
The complexity of polyphonic sounds imposes numerous challenges on their classification. Especially in real life, polyphonic sound events show discontinuity and unstable time-frequency variations. Traditional single acoustic features cannot characterize the key feature information of polyphonic sound events, and this deficiency results in poor model classification performance. In this paper, we propose a convolutional recurrent neural network model based on a temporal-frequency (TF) attention mechanism and a feature space (FS) attention mechanism (TFFS-CRNN). The TFFS-CRNN model aggregates log-Mel spectrogram and MFCC features as inputs and contains the TF-attention module, the convolutional recurrent neural network (CRNN) module, the FS-attention module, and the bidirectional gated recurrent unit (BGRU) module. In polyphonic sound event detection (SED), the TF-attention module can capture critical temporal-frequency features more capably, while the FS-attention module assigns different dynamically learnable weights to different dimensions of features. The TFFS-CRNN model thus improves the characterization of key feature information in polyphonic SED. By using the two attention modules, the model can focus on semantically relevant time frames, key frequency bands, and important feature spaces. Finally, the BGRU module learns contextual information. Experiments were conducted on the DCASE 2016 Task 3 and DCASE 2017 Task 3 datasets. Experimental results show that the F1-score of the TFFS-CRNN model improved by 12.4% and 25.2% compared with the winning systems of the DCASE challenges, while the error rate (ER) was reduced by 0.41 and 0.37, respectively. The proposed TFFS-CRNN model thus achieves better classification performance and lower ER in polyphonic SED.
Affiliation(s)
- Ye Jin
- Ministry of Education Key Laboratory of Cognitive Radio and Information Processing, Guilin 541006, China
- School of Information and Communication, Guilin University of Electronic Technology, Guilin 541006, China
- Mei Wang
- Ministry of Education Key Laboratory of Cognitive Radio and Information Processing, Guilin 541006, China
- School of Information Science & Engineering, Guilin University of Technology, Guilin 541006, China
- Liyan Luo
- Ministry of Education Key Laboratory of Cognitive Radio and Information Processing, Guilin 541006, China
- School of Information and Communication, Guilin University of Electronic Technology, Guilin 541006, China
- Dinghao Zhao
- Ministry of Education Key Laboratory of Cognitive Radio and Information Processing, Guilin 541006, China
- School of Information and Communication, Guilin University of Electronic Technology, Guilin 541006, China
- Zhanqi Liu
- Ministry of Education Key Laboratory of Cognitive Radio and Information Processing, Guilin 541006, China
- School of Information and Communication, Guilin University of Electronic Technology, Guilin 541006, China
9
Wu J, Jiang Z, Chen Q, Wen S, Men A, Wang H. Toward a perceptive pretraining framework for Audio-Visual Video Parsing. Inf Sci (N Y) 2022. DOI: 10.1016/j.ins.2022.07.144.
10
Park S, Han DK, Elhilali M. Cross-Referencing Self-Training Network for Sound Event Detection in Audio Mixtures. IEEE Trans Multimedia 2022;25:4573-4585. PMID: 37928617; PMCID: PMC10621403; DOI: 10.1109/tmm.2022.3178591.
Abstract
Sound event detection is an important facet of audio tagging that aims to identify sounds of interest and define both the sound category and the time boundaries of each sound event in a continuous recording. With advances in deep neural networks, there has been tremendous improvement in the performance of sound event detection systems, although at the expense of costly data collection and labeling efforts. In fact, current state-of-the-art methods employ supervised training that leverages large amounts of data samples and corresponding labels to facilitate identification of the sound category and time stamps of events. As an alternative, the current study proposes a semi-supervised method for generating pseudo-labels from unsupervised data using a student-teacher scheme that balances self-training and cross-training. Additionally, this paper explores post-processing that extracts sound intervals from network predictions for further improvement in sound event detection performance. The proposed approach is evaluated on the sound event detection task of the DCASE 2020 challenge. The results of these methods on both the "validation" and "public evaluation" sets of the DESED database show significant improvement compared to state-of-the-art systems in semi-supervised learning.
Affiliation(s)
- Sangwook Park
- Department of Electronic Engineering, Gangneung-Wonju National University, Gangneung, 25457 South Korea
- David K Han
- Department of Electrical and Computer Engineering, Drexel University, Philadelphia, PA, 19104 USA
- Mounya Elhilali
- Department of Electrical and Computer Engineering and jointly with the Department of Psychology and Brain Sciences, Johns Hopkins University, Baltimore, MD, 21218 USA
11
Wang Y, Ye J, Borchers DL. Automated call detection for acoustic surveys with structured calls of varying length. Methods Ecol Evol 2022. DOI: 10.1111/2041-210x.13873.
Affiliation(s)
- Yuheng Wang
- Centre for Research into Ecological and Environmental Modelling, School of Mathematics and Statistics, University of St Andrews, The Observatory, St Andrews, Fife, Scotland
- Juan Ye
- School of Computer Science, University of St Andrews, North Haugh, St Andrews, Fife, Scotland
- David L. Borchers
- Centre for Research into Ecological and Environmental Modelling, School of Mathematics and Statistics, University of St Andrews, The Observatory, St Andrews, Fife, Scotland
- Centre for Statistics in Ecology, the Environment, and Conservation, University of Cape Town, Cape Town, South Africa
12
You Only Hear Once: A YOLO-like Algorithm for Audio Segmentation and Sound Event Detection. Appl Sci (Basel) 2022. DOI: 10.3390/app12073293.
Abstract
Audio segmentation and sound event detection are crucial topics in machine listening that aim to detect acoustic classes and their respective boundaries. They are useful for audio-content analysis, speech recognition, audio indexing, and music information retrieval. In recent years, most research articles have adopted segmentation-by-classification, a technique that divides audio into small frames and performs classification on these frames individually. In this paper, we present a novel approach called You Only Hear Once (YOHO), which is inspired by the YOLO algorithm popularly adopted in computer vision. We convert the detection of acoustic boundaries into a regression problem instead of frame-based classification. This is done by having separate output neurons to detect the presence of an audio class and predict its start and end points. The relative improvement in F-measure of YOHO, compared to the state-of-the-art convolutional recurrent neural network, ranged from 1% to 6% across multiple datasets for audio segmentation and sound event detection. As the output of YOHO is more end-to-end and has fewer neurons to predict, inference is at least 6 times faster than segmentation-by-classification. In addition, as this approach predicts acoustic boundaries directly, post-processing and smoothing are about 7 times faster.
13
Stowell D. Computational bioacoustics with deep learning: a review and roadmap. PeerJ 2022;10:e13152. PMID: 35341043; PMCID: PMC8944344; DOI: 10.7717/peerj.13152.
Abstract
Animal vocalisations and natural soundscapes are fascinating objects of study, and contain valuable evidence about animal behaviours, populations and ecosystems. They are studied in bioacoustics and ecoacoustics, with signal processing and analysis an important component. Computational bioacoustics has accelerated in recent decades due to the growth of affordable digital sound recording devices, and to huge progress in informatics such as big data, signal processing and machine learning. Methods are inherited from the wider field of deep learning, including speech and image processing. However, the tasks, demands and data characteristics are often different from those addressed in speech or music analysis. There remain unsolved problems, and tasks for which evidence is surely present in many acoustic signals, but not yet realised. In this paper I perform a review of the state of the art in deep learning for computational bioacoustics, aiming to clarify key concepts and identify and analyse knowledge gaps. Based on this, I offer a subjective but principled roadmap for computational bioacoustics with deep learning: topics that the community should aim to address, in order to make the most of future developments in AI and informatics, and to use audio data in answering zoological and ecological questions.
Affiliation(s)
- Dan Stowell
- Department of Cognitive Science and Artificial Intelligence, Tilburg University, Tilburg, The Netherlands
- Naturalis Biodiversity Center, Leiden, The Netherlands
14
Multi-Scale Features for Transformer Model to Improve the Performance of Sound Event Detection. Appl Sci (Basel) 2022. DOI: 10.3390/app12052626.
Abstract
To alleviate the problem of performance degradation due to the varied sound durations of competing classes in sound event detection, we propose a method that utilizes multi-scale features for sound event detection. We employed a feature-pyramid component in a deep neural network architecture based on the Transformer encoder that is used to efficiently model the time correlation of sound signals because of its superiority over conventional recurrent neural networks, as demonstrated in recent studies. We used layers of convolutional neural networks to produce two-dimensional acoustic features that are input into the Transformer encoders. The outputs of the Transformer encoders at different levels of the network are combined to obtain the multi-scale features to feed the fully connected feed-forward neural network, which acts as the final classification layer. The proposed method is motivated by the idea that multi-scale features make the network more robust against the dynamic duration of the sound signals depending on their classes. We also applied the proposed method to a mean-teacher model, based on the Transformer encoder, to demonstrate its effectiveness on a large set of unlabeled data. We conducted experiments using the DCASE 2019 Task 4 dataset to evaluate the performance of the proposed method. The experimental results show that the proposed architecture outperforms the baseline network without multi-scale features.
15
Hu Y, Sun X, He L, Huang H. A generalized network based on multi-scale densely connection and residual attention for sound source localization and detection. J Acoust Soc Am 2022;151:1754. PMID: 35364955; DOI: 10.1121/10.0009671.
Abstract
Sound source localization and detection (SSLD) is the joint task of identifying the presence of individual sound events and locating the sound sources in space. Due to the diversity of sound events and the variability of sound source locations, SSLD is a tough task. In this paper, we propose an SSLD method based on a multi-scale densely connection (MDC) mechanism and a residual attention (RA) mechanism. We design an MDC block to integrate information from a very local to an exponentially enlarged receptive field within the block. We also explore three kinds of RA blocks that facilitate the flow of information among different layers by continuously adding feature maps from previous layers to the next layer. To recalibrate the feature maps after the convolutional operation, we design a dual-path attention (DPA) unit that is largely embodied in the MDC and RA blocks. We first verified the effectiveness of the MDC block, RA block, and DPA unit, respectively, and then compared our proposed method with four other methods on the development dataset; finally, comparisons with SELDnet and SELD-TCN on another five datasets show the generalization of our proposed method.
Affiliation(s)
- Ying Hu
- Department of Information Science and Engineering, Xinjiang University, Urumqi 830000, China
- Xinghao Sun
- Department of Information Science and Engineering, Xinjiang University, Urumqi 830000, China
- Liang He
- Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
- Hao Huang
- Department of Information Science and Engineering, Xinjiang University, Urumqi 830000, China
16
Abstract
Polyphonic sound event detection (SED) is the task of detecting the time stamps and the class of each sound event that occurred during a recording. Real-life sound events overlap in recordings, and their durations vary dramatically, making them even harder to recognize. In this paper, we propose Convolutional Recurrent Neural Networks (CRNNs) to extract hidden-state feature representations; a self-attention mechanism using a symmetric score function is then introduced to memorize long-range dependencies of the features that the CRNNs extract. Furthermore, we propose to use memory-controlled self-attention to explicitly compute the relations between time steps in the audio representation embedding, together with a strategy for adaptive memory-controlled self-attention. Moreover, we applied semi-supervised learning, namely the mean teacher-student method, to exploit unlabeled audio data. The proposed methods all performed well on the Detection and Classification of Acoustic Scenes and Events (DCASE) 2017 Sound Event Detection in Real Life Audio (Task 3) test and the DCASE 2021 Sound Event Detection and Separation in Domestic Environments (Task 4) test. In DCASE 2017 Task 3, our model surpassed the challenge's winning system's F1-score by 6.8%. We show that the proposed adaptive memory-controlled model reaches the same performance level as a fixed-attention-width model. Experimental results indicate that the proposed attention mechanism is able to improve sound event detection. In DCASE 2021 Task 4, we investigated various pooling strategies in two scenarios. In addition, we found that in weakly labeled semi-supervised sound event detection, building an attention layer on top of the CRNN is needless repetition. This conclusion could be applied to other multi-instance learning problems.
17
Pandeya YR, Bhattarai B, Afzaal U, Kim JB, Lee J. A monophonic cow sound annotation tool using a semi-automatic method on audio/video data. Livest Sci 2022. [DOI: 10.1016/j.livsci.2021.104811]
18
Hildebrand JA, Frasier KE, Helble TA, Roch MA. Performance metrics for marine mammal signal detection and classification. J Acoust Soc Am 2022; 151:414. [PMID: 35105012] [DOI: 10.1121/10.0009270]
Abstract
Automatic algorithms for the detection and classification of sound are essential to the analysis of acoustic datasets with long duration. Metrics are needed to assess the performance characteristics of these algorithms. Four metrics for performance evaluation are discussed here: receiver-operating-characteristic (ROC) curves, detection-error-trade-off (DET) curves, precision-recall (PR) curves, and cost curves. These metrics were applied to the generalized power law detector for blue whale D calls [Helble, Ierley, D'Spain, Roch, and Hildebrand (2012). J. Acoust. Soc. Am. 131(4), 2682-2699] and the click-clustering neural-net algorithm for Cuvier's beaked whale echolocation click detection [Frasier, Roch, Soldevilla, Wiggins, Garrison, and Hildebrand (2017). PLoS Comp. Biol. 13(12), e1005823] using data prepared for the 2015 Detection, Classification, Localization and Density Estimation Workshop. Detection class imbalance, particularly the situation of rare occurrence, is common for long-term passive acoustic monitoring datasets and is a factor in the performance of ROC and DET curves with regard to the impact of false positive detections. PR curves overcome this shortcoming when calculated for individual detections and do not rely on the reporting of true negatives. Cost curves provide additional insight on the effective operating range for the detector based on the a priori probability of occurrence. Use of more than a single metric is helpful in understanding the performance of a detection algorithm.
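The advantage of precision-recall metrics under rare-event imbalance can be seen in a small worked example (hypothetical counts, not figures from the paper): because true negatives dominate long-term monitoring data, the false-positive rate looks negligible while precision exposes the false-alarm problem.

```python
def detector_metrics(tp, fp, fn, tn):
    """Precision, recall, and false-positive rate from confusion counts.
    Precision and recall need no true-negative count, which is ill-defined
    for continuous passive acoustic data."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    fpr = fp / (fp + tn)
    return precision, recall, fpr

# hypothetical rare-event scenario: 100 true calls in one million frames
p, r, fpr = detector_metrics(tp=80, fp=120, fn=20, tn=999_780)
# fpr is ~0.0001 (looks excellent), yet precision is only 0.4:
# most detections are false alarms.
```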
Affiliation(s)
- John A Hildebrand
- Scripps Institution of Oceanography, University of California San Diego, La Jolla, California 92093, USA
- Kaitlin E Frasier
- Scripps Institution of Oceanography, University of California San Diego, La Jolla, California 92093, USA
- Tyler A Helble
- Naval Information Warfare Center Pacific, San Diego, California 92152, USA
- Marie A Roch
- Department of Computer Science, San Diego State University, San Diego, California 92182, USA
19
Sound Event Detection by Pseudo-Labeling in Weakly Labeled Dataset. Sensors 2021; 21:8375. [PMID: 34960475] [PMCID: PMC8705589] [DOI: 10.3390/s21248375]
Abstract
Weakly labeled sound event detection (WSED) is an important task, as it can facilitate data collection efforts before constructing a strongly labeled sound event dataset. Recent high-performing deep learning-based WSED methods exploit a segmentation mask for detecting the target feature map. However, accurate detection performance has been limited in real streaming audio for the following reasons. First, the convolutional neural networks (CNNs) employed in the segmentation mask extraction process do not appropriately highlight feature importance, as the feature is extracted without pooling operations; concurrently, a small kernel size keeps the receptive field small, making it difficult to learn various patterns. Second, as feature maps are obtained in an end-to-end fashion, the WSED model is vulnerable to unknown content in the wild. These limitations can lead to undesired feature maps, such as noise, in unseen environments. This paper addresses these issues by constructing a more efficient model that employs a gated linear unit (GLU) and dilated convolution to counter the de-emphasis of feature importance and the lack of receptive field. In addition, this paper proposes pseudo-label-based learning for classifying target content and unknown content by adding a 'noise label' and 'noise loss', so that unknown content can be separated as much as possible through the noise label. The experiment is performed by mixing DCASE 2018 task 1 acoustic scene data and task 2 sound event data. The experimental results show that the proposed SED model achieves the best F1 performance, with 59.7% at 0 dB SNR, 64.5% at 10 dB SNR, and 65.9% at 20 dB SNR. These results represent improvements of 17.7%, 16.9%, and 16.5%, respectively, over the baseline.
20
Lepak Ł, Radzikowski K, Nowak R, Piczak KJ. Generalisation Gap of Keyword Spotters in a Cross-Speaker Low-Resource Scenario. Sensors 2021; 21:8313. [PMID: 34960407] [PMCID: PMC8704929] [DOI: 10.3390/s21248313]
Abstract
Models for keyword spotting in continuous recordings can significantly improve the experience of navigating vast libraries of audio recordings. In this paper, we describe the development of such a keyword spotting system detecting regions of interest in Polish call centre conversations. Unfortunately, in spite of recent advancements in automatic speech recognition systems, human-level transcription accuracy reported on English benchmarks does not reflect the performance achievable in low-resource languages, such as Polish. Therefore, in this work, we shift our focus from complete speech-to-text conversion to acoustic similarity matching in the hope of reducing the demand for data annotation. As our primary approach, we evaluate Siamese and prototypical neural networks trained on several datasets of English and Polish recordings. While we obtain usable results in English, our models' performance remains unsatisfactory when applied to Polish speech, both after mono- and cross-lingual training. This performance gap shows that generalisation with limited training resources is a significant obstacle for actual deployments in low-resource languages. As a potential countermeasure, we implement a detector using audio embeddings generated with a generic pre-trained model provided by Google. It has a much more favourable profile when applied in a cross-lingual setup to detect Polish audio patterns. Nevertheless, despite these promising results, its performance on out-of-distribution data is still far from stellar. It would indicate that, in spite of the richness of internal representations created by more generic models, such speech embeddings are not entirely malleable to cross-language transfer.
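At inference time, acoustic similarity matching of the kind described reduces to comparing embeddings of recording windows against a query keyword embedding. A minimal sketch, illustrative only: the three-dimensional vectors and the 0.9 threshold are made-up stand-ins for real model outputs.

```python
import numpy as np

def cosine_matches(query, windows, threshold):
    """Flag recording windows whose embedding is cosine-similar to the
    query keyword embedding. query: (d,), windows: (n, d)."""
    q = query / np.linalg.norm(query)
    W = windows / np.linalg.norm(windows, axis=1, keepdims=True)
    sims = W @ q                    # cosine similarity per window
    return sims, sims >= threshold

query = np.array([1.0, 0.0, 1.0])
windows = np.array([[2.0, 0.0, 2.0],   # same direction as the query
                    [0.0, 1.0, 0.0],   # orthogonal to the query
                    [1.0, 0.1, 0.9]])  # close to the query
sims, hits = cosine_matches(query, windows, threshold=0.9)
```

Siamese and prototypical networks differ in how the embedding space is trained, not in this matching step.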
Affiliation(s)
- Łukasz Lepak
- Institute of Computer Science, Faculty of Electronics and Information Technology, Warsaw University of Technology, 00-665 Warsaw, Poland
- Kacper Radzikowski
- Institute of Computer Science, Faculty of Electronics and Information Technology, Warsaw University of Technology, 00-665 Warsaw, Poland
- Graduate School of Information, Production and Systems, Waseda University, Tokyo 808-0135, Japan
- Robert Nowak
- Institute of Computer Science, Faculty of Electronics and Information Technology, Warsaw University of Technology, 00-665 Warsaw, Poland
- Karol J. Piczak
- Institute of Computer Science and Computational Mathematics, Jagiellonian University, 30-348 Krakow, Poland
21
Tao R, Zhang S, Wang Y, Mi X, Ma J, Shen C, Zheng G. MCG-Net: End-to-end Fine-grained Delineation and Diagnostic Classification of Cardiac Events from Magnetocardiographs. IEEE J Biomed Health Inform 2021; 26:1057-1067. [PMID: 34780340] [DOI: 10.1109/jbhi.2021.3128169]
Abstract
In this paper, we propose an end-to-end deep learning architecture, referred to as MCG-Net, integrating a convolutional neural network (CNN) with a transformer-based global context block for fine-grained delineation and diagnostic classification of four cardiac events, namely Q-, R-, S-, and T-waves, from magnetocardiogram (MCG) data. MCG-Net takes advantage of a multi-resolution CNN backbone as well as state-of-the-art (SOTA) transformer encoders that facilitate global temporal feature aggregation. Besides the novel network architecture, we introduce a multi-task learning scheme to achieve simultaneous delineation and classification. Specifically, the problem of MCG delineation is formulated as multi-class heatmap regression. Meanwhile, a binary diagnostic classification label as well as a duration are jointly estimated for each cardiac event using features that are temporally aligned by the event heatmaps. The framework is evaluated on a clinical MCG dataset containing data collected from 270 subjects with cardiac anomalies and 108 control subjects. We designed and conducted a two-fold cross-validation study to validate the proposed method and to compare its performance with SOTA methods. Experimental results demonstrated that our method outperformed its counterparts on both event delineation and diagnostic classification tasks, achieving an average ECG-F1 of 0.987 and an average Event-F1 of 0.975 for MCG delineation, and an average accuracy of 0.870, an average sensitivity of 0.732, an average specificity of 0.914, and an average AUC of 0.903 for diagnostic classification. Comprehensive ablation experiments were additionally performed to investigate the effectiveness of the different network components.
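The heatmap-regression formulation used for delineation can be illustrated by the standard construction of a 1-D Gaussian target around an annotated event index. This is a generic sketch of the technique, not the paper's implementation; the length, centre, and sigma values are arbitrary.

```python
import numpy as np

def event_heatmap(length, center, sigma):
    """1-D Gaussian heatmap target for event delineation: regressing such
    a heatmap (one per event class) replaces hard per-sample
    classification, and the predicted peak gives the event location."""
    t = np.arange(length)
    return np.exp(-0.5 * ((t - center) / sigma) ** 2)

# target for an event annotated at sample 120 of a 200-sample window
hm = event_heatmap(length=200, center=120, sigma=5.0)
```

At inference, `argmax` over the predicted heatmap recovers the event time stamp.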
22
Vidaña-Vila E, Navarro J, Stowell D, Alsina-Pagès RM. Multilabel Acoustic Event Classification Using Real-World Urban Data and Physical Redundancy of Sensors. Sensors 2021; 21:7470. [PMID: 34833545] [PMCID: PMC8621353] [DOI: 10.3390/s21227470]
Abstract
Many people living in urban environments nowadays are overexposed to noise, which results in adverse effects on their health. Thus, urban sound monitoring has emerged as a powerful tool that might enable public administrations to automatically identify and quantify noise pollution. Therefore, identifying multiple and simultaneous acoustic sources in these environments in a reliable and cost-effective way has emerged as a hot research topic. The purpose of this paper is to propose a two-stage classifier able to identify, in real time, a set of up to 21 urban acoustic events that may occur simultaneously (i.e., multilabel), taking advantage of physical redundancy in acoustic sensors from a wireless acoustic sensors network. The first stage of the proposed system consists of a multilabel deep neural network that makes a classification for each 4-s window. The second stage intelligently aggregates the classification results from the first stage of four neighboring nodes to determine the final classification result. Conducted experiments with real-world data and up to three different computing devices show that the system is able to provide classification results in less than 1 s and that it has good performance when classifying the most common events from the dataset. The results of this research may help civic organisations to obtain actionable noise monitoring information from automatic systems.
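The second-stage aggregation across neighbouring nodes is an intelligent (learned) component in the paper. As a rough illustration of the underlying idea only, a fixed per-class voting rule over four nodes might look like the following; the threshold, vote count, and class names are invented for the sketch.

```python
def aggregate_nodes(node_probs, threshold=0.5, min_votes=2):
    """Combine per-class probabilities from neighbouring sensor nodes.
    A class is kept in the final multilabel output when at least
    min_votes nodes report it above threshold. Purely illustrative;
    the paper's aggregation stage is learned, not a fixed vote."""
    n_classes = len(node_probs[0])
    result = []
    for c in range(n_classes):
        votes = sum(1 for probs in node_probs if probs[c] >= threshold)
        result.append(votes >= min_votes)
    return result

# four neighbouring nodes, three classes (e.g. traffic, siren, birds)
nodes = [[0.9, 0.2, 0.6],
         [0.8, 0.1, 0.4],
         [0.7, 0.6, 0.3],
         [0.9, 0.3, 0.2]]
labels = aggregate_nodes(nodes)
```

Physical redundancy makes such cross-node agreement meaningful: a loud city-wide event should register on several nodes, while a local artefact will not.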
Affiliation(s)
- Ester Vidaña-Vila
- GTM, Grup de Recerca en Tecnologies Mèdia, La Salle Ramon Llull University, 08022 Barcelona, Spain
- Joan Navarro
- GRITS, Grup de Recerca en Internet Technologies and Storage, La Salle Ramon Llull University, 08022 Barcelona, Spain
- Dan Stowell
- Department of Cognitive Sciences & Artificial Intelligence, Tilburg University, 5037 AB Tilburg, The Netherlands
- Rosa Ma Alsina-Pagès
- GTM, Grup de Recerca en Tecnologies Mèdia, La Salle Ramon Llull University, 08022 Barcelona, Spain
23
Compensating class imbalance for acoustic chimpanzee detection with convolutional recurrent neural networks. Ecol Inform 2021. [DOI: 10.1016/j.ecoinf.2021.101423]
24
Xue J, Zheng T, Han J. Exploring attention mechanisms based on summary information for end-to-end automatic speech recognition. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.09.017]
25
Attention-Based Joint Training of Noise Suppression and Sound Event Detection for Noise-Robust Classification. Sensors 2021; 21:6718. [PMID: 34695930] [PMCID: PMC8540800] [DOI: 10.3390/s21206718]
Abstract
Sound event detection (SED) recognizes the sound event corresponding to an incoming signal and estimates its temporal boundary. Although SED has recently been developed and used in various fields, achieving noise-robust SED in a real environment is typically challenging owing to performance degradation caused by ambient noise. In this paper, we propose combining a pretrained time-domain speech-separation-based noise suppression network (NS) and a pretrained classification network to improve SED performance in real noisy environments. We use a temporal convolutional network (TCN) equipped with group communication and a context codec (GC3) for the noise suppression model and a convolutional recurrent neural network for the SED model. The former significantly reduces model complexity while maintaining the same TCN module and performance as a fully convolutional time-domain audio separation network (Conv-TasNet). We also freeze the weights of some layers in the joint fine-tuning process and add an attention module to the SED model to further improve performance and prevent overfitting. We evaluate our proposed method using both simulated and real recorded datasets. The experimental results show that our method improves classification performance in a noisy environment under various signal-to-noise-ratio conditions.
26
Investigating the Effects of Training Set Synthesis for Audio Segmentation of Radio Broadcast. Electronics 2021. [DOI: 10.3390/electronics10070827]
Abstract
Music and speech detection provides valuable information regarding the nature of content in broadcast audio. It helps detect acoustic regions that contain speech, voice over music, only music, or silence. In recent years, there have been developments in machine learning algorithms to accomplish this task. However, broadcast audio is generally well-mixed and copyrighted, which makes it challenging to share across research groups. In this study, we address the challenges encountered in automatically synthesising data that resembles a radio broadcast. Firstly, we compare state-of-the-art neural network architectures such as CNN, GRU, LSTM, TCN, and CRNN. Secondly, we investigate how audio ducking of background music impacts the precision and recall of the machine learning algorithm. Thirdly, we examine how the quantity of synthetic training data impacts the results. Finally, we evaluate the effectiveness of synthesised, real-world, and combined approaches for training models, to understand whether the synthetic data presents any additional value. Among the network architectures, the CRNN performed best. Results also show that the minimum level of audio ducking preferred by the machine learning algorithm was similar to that preferred by human listeners. After testing our model on in-house and public datasets, we observe that our proposed synthesis technique outperforms real-world data in some cases and serves as a promising alternative.
27
Bricher D, Müller A. Supervised Detection of Connector Lock Events with Optical Microphone Data. Int J Neural Syst 2021; 31:2150017. [PMID: 33752578] [DOI: 10.1142/s0129065721500179]
Abstract
In the manufacturing industry, one of the main targets is to increase automation and ultimately to avoid failures under all circumstances. The plugging and locking of connectors is a class of tasks that is still hard to automate with sufficiently high process stability. Due to the variation of plugging positions and external disturbances, e.g. occlusion by cables, the quality assessment of plugging processes has emerged as a challenging task for image-based systems. For this reason, the proposed approach analyzes the inherent acoustic properties of connector locking in combination with different neural network architectures in order to correctly identify connector locking signals and to distinguish them from other machining events occurring in assembly plants. For this specific task, highly sensitive optical microphones were applied for data acquisition. The proposed experiments were carried out under laboratory conditions as well as in the more complex situation of a real manufacturing environment. In this context, multimodal neural network architectures achieved the highest classification performance, with accuracy levels close to 90%.
Affiliation(s)
- David Bricher
- Institute of Robotics, Johannes Kepler University, Altenberger Straße 69, 4040 Linz, Austria
- Andreas Müller
- Institute of Robotics, Johannes Kepler University, Altenberger Straße 69, 4040 Linz, Austria
28
Wang Y, Zhao G, Xiong K, Shi G, Zhang Y. Multi-Scale and Single-Scale Fully Convolutional Networks for Sound Event Detection. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2020.09.038]
29
Schuller B, Baird A, Gebhard A, Amiriparian S, Keren G, Schmitt M, Cummins N. New Avenues in Audio Intelligence: Towards Holistic Real-life Audio Understanding. Trends Hear 2021; 25:23312165211046135. [PMID: 34751066] [PMCID: PMC8581779] [DOI: 10.1177/23312165211046135]
Abstract
Computer audition (i.e., intelligent audio) has made great strides in recent years; however, it is still far from achieving holistic hearing abilities that more appropriately mimic human-like understanding. Within an audio scene, a human listener is quickly able to interpret layers of sound at a single time point, with each layer varying in characteristics such as location, state, and trait. Current integrated machine listening approaches, on the other hand, mainly recognise only single events. In this context, this contribution aims to provide key insights and approaches that can be applied in computer audition to achieve the goal of a more holistic intelligent understanding system, as well as identifying challenges in reaching this goal. We first summarise the state of the art in traditional signal-processing-based audio pre-processing and feature representation, as well as automated learning such as by deep neural networks. This concerns, in particular, audio interpretation, decomposition, understanding, and ontologisation. We then present an agent-based approach for integrating these concepts into a holistic audio understanding system. Based on this, concluding avenues are given towards reaching the ambitious goal of 'holistic human-parity' machine listening abilities.
Affiliation(s)
- Björn Schuller
- University of Augsburg, Augsburg, Germany
- GLAM, Group on Language, Audio & Music, Imperial College, London, UK
- audEERING GmbH, Germany
- Gil Keren
- University of Augsburg, Augsburg, Germany
- Nicholas Cummins
- University of Augsburg, Augsburg, Germany
- Department of Biostatistics and Health Informatics, IoPPN, King's College London, UK
30
MeLa: A Programming Language for a New Multidisciplinary Oceanographic Float. Sensors 2020; 20:6081. [PMID: 33114608] [PMCID: PMC7672633] [DOI: 10.3390/s20216081]
Abstract
At 2000 m depth in the oceans, one can hear biological, seismological, meteorological, and anthropogenic activity. Acoustic monitoring of the oceans at a global scale and over long periods of time could bring important information to various sciences. The Argo project monitors the physical properties of the oceans with autonomous floats, some of which are also equipped with a hydrophone. These have a limited transmission bandwidth, requiring acoustic data to be processed on board. However, developing signal processing algorithms for these instruments requires expertise in embedded software. To reduce the need for such expertise, we have developed a programming language called MeLa. The language hides several aspects of embedded software behind specialized programming concepts. It uses models to compute energy consumption, processor usage, and data transmission costs early during the development of applications; this helps to choose a data processing strategy with minimal impact on performance. Simulations on a computer allow the performance of the algorithms to be verified before deployment on the instrument. We have implemented a seismic P-wave detection and a blue whale D-call detection algorithm with the MeLa language to show its capabilities. These are first efforts toward multidisciplinary monitoring of the oceans, which can extend beyond acoustic applications.
31
Hsiao CH, Lin TW, Lin CW, Hsu FS, Lin FYS, Chen CW, Chung CM. Breathing Sound Segmentation and Detection Using Transfer Learning Techniques on an Attention-Based Encoder-Decoder Architecture. Annu Int Conf IEEE Eng Med Biol Soc 2020; 2020:754-759. [PMID: 33018096] [DOI: 10.1109/embc44109.2020.9176226]
Abstract
This paper focuses on the use of an attention-based encoder-decoder model for the task of breathing sound segmentation and detection. The study aims to accurately segment the inspiration and expiration of patients with pulmonary diseases using the proposed model. Spectrograms of the lung sound signals and labels for every time segment were used to train the model. The model first encodes the spectrogram and then detects inspiratory or expiratory sounds by applying an attention-based decoder to the encoded image. Physicians would be able to make a more precise diagnosis based on the more interpretable outputs, with the assistance of the attention mechanism. The respiratory sounds used for training and testing were recorded from 22 participants using digital stethoscopes or anti-noising microphone sets. Experimental results showed a high accuracy of 92.006% when applying 0.5-second time segments and ResNet101 as the encoder. Consistent performance of the proposed method can be observed in ten-fold cross-validation experiments.
32
Sudo Y, Itoyama K, Nishida K, Nakadai K. Sound event aware environmental sound segmentation with Mask U-Net. Adv Robot 2020. [DOI: 10.1080/01691864.2020.1829040]
Affiliation(s)
- Y. Sudo
- Department of Systems and Control Engineering, School of Engineering, Tokyo Institute of Technology, Tokyo, Japan
- K. Itoyama
- Department of Systems and Control Engineering, School of Engineering, Tokyo Institute of Technology, Tokyo, Japan
- K. Nishida
- Department of Systems and Control Engineering, School of Engineering, Tokyo Institute of Technology, Tokyo, Japan
- K. Nakadai
- Department of Systems and Control Engineering, School of Engineering, Tokyo Institute of Technology, Tokyo, Japan
- Honda Research Institute Japan Co. Ltd., Saitama, Japan
33
Sound Event Detection Using Derivative Features in Deep Neural Networks. Appl Sci (Basel) 2020. [DOI: 10.3390/app10144911]
Abstract
We propose using derivative features for sound event detection based on deep neural networks. As input to the networks, we used the log-mel filterbank and its first and second derivative features for each frame of the audio signal. Two deep neural networks were used to evaluate the effectiveness of these derivative features. Specifically, a convolutional recurrent neural network (CRNN) was constructed by combining a convolutional neural network and a recurrent neural network (RNN), followed by a feed-forward neural network (FNN) acting as a classification layer. In addition, a mean-teacher model based on an attention CRNN was used. Both models had an average pooling layer at the output so that weakly labeled and unlabeled audio data could be used during model training. Under various training conditions, depending on the neural network architecture and training set, the use of derivative features resulted in a consistent performance improvement. Experiments on audio data from the Detection and Classification of Acoustic Scenes and Events 2018 and 2019 challenges indicated that a maximum relative improvement of 16.9% was obtained in terms of the F-score.
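Computing such derivative features is straightforward. The sketch below uses simple finite differences in place of the regression-based deltas common in audio toolkits, which is an assumption, since the entry does not state the exact delta formula; the spectrogram here is random stand-in data.

```python
import numpy as np

def derivative_features(logmel):
    """Stack a log-mel spectrogram with its first and second temporal
    derivatives, tripling the per-frame feature dimension.
    logmel: (frames, mel_bins)."""
    d1 = np.gradient(logmel, axis=0)   # first derivative over time
    d2 = np.gradient(d1, axis=0)       # second derivative over time
    return np.concatenate([logmel, d1, d2], axis=1)

rng = np.random.default_rng(1)
feats = derivative_features(rng.standard_normal((100, 40)))  # (100, 120)
```

The derivatives capture local spectral dynamics (onsets, decays) that a static frame-wise representation misses.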
34
Noh K, Chang JH. Joint Optimization of Deep Neural Network-Based Dereverberation and Beamforming for Sound Event Detection in Multi-Channel Environments. Sensors 2020; 20:1883. [PMID: 32231161] [PMCID: PMC7180550] [DOI: 10.3390/s20071883]
Abstract
In this paper, we propose joint optimization of deep neural network (DNN)-supported dereverberation and beamforming for convolutional recurrent neural network (CRNN)-based sound event detection (SED) in multi-channel environments. First, short-time Fourier transform (STFT) coefficients are calculated from multi-channel audio signals under noisy and reverberant conditions and enhanced by DNN-supported weighted prediction error (WPE) dereverberation with estimated masks. Next, the STFT coefficients of the dereverberated multi-channel audio signals are passed to a DNN-supported minimum variance distortionless response (MVDR) beamformer, in which beamforming is carried out with source and noise masks estimated by the DNN. The resulting single-channel enhanced STFT coefficients are then passed to the CRNN-based SED system, and the three modules are jointly trained with a single loss function designed for SED. Furthermore, to ease the difficulty of training a deep learning model for SED caused by the imbalance in the amount of data for each class, the focal loss is used as the loss function. Experimental results show that joint training of DNN-supported dereverberation and beamforming with the SED model under the supervision of the focal loss significantly improves performance in noisy and reverberant environments.
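The focal loss mentioned for countering class imbalance has a standard binary form (Lin et al.'s formulation); a minimal NumPy version is sketched below. The gamma and alpha values and the toy predictions are generic choices, not the paper's exact configuration.

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss: -w * (1 - pt)^gamma * log(pt), averaged.
    Down-weights easy, confidently classified examples so training is
    not dominated by the abundant background class.
    p: predicted probabilities, y: binary targets."""
    p = np.clip(p, 1e-7, 1 - 1e-7)
    pt = np.where(y == 1, p, 1 - p)        # probability of the true class
    w = np.where(y == 1, alpha, 1 - alpha)
    return np.mean(-w * (1 - pt) ** gamma * np.log(pt))

p = np.array([0.9, 0.1, 0.8])
y = np.array([1, 0, 1])
loss = focal_loss(p, y, gamma=0.0, alpha=0.5)  # gamma=0: scaled cross-entropy
```

With `gamma=0` the expression reduces to weighted cross-entropy; increasing gamma shrinks the contribution of well-classified frames.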
35
Alías F, Orga F, Alsina-Pagès RM, Socoró JC. Aggregate Impact of Anomalous Noise Events on the WASN-Based Computation of Road Traffic Noise Levels in Urban and Suburban Environments. Sensors 2020; 20:609. [PMID: 31979126] [PMCID: PMC7037915] [DOI: 10.3390/s20030609]
Abstract
Environmental noise can be defined as the accumulation of noise pollution caused by sounds generated by outdoor human activities, Road Traffic Noise (RTN) being the main source in urban and suburban areas. To address the negative effects of environmental noise on public health, the European Environmental Noise Directive requires EU member states to tailor noise maps and define the corresponding action plans every five years for major agglomerations and key infrastructures. Noise maps have hitherto been created from expert-based measurements, after cleaning the recorded acoustic data of undesired acoustic events, or Anomalous Noise Events (ANEs). In recent years, Wireless Acoustic Sensor Networks (WASNs) have become an alternative. However, most proposals focus on measuring global noise levels without taking into account the presence of ANEs. The LIFE DYNAMAP project has developed a WASN-based dynamic noise mapping system to analyze the acoustic impact of road infrastructures in real time based solely on RTN levels. After studying the bias caused by individual ANEs on the computation of the A-weighted equivalent noise levels through an expert-based dataset obtained before installing the sensor networks, this work evaluates the aggregate impact of the ANEs on the RTN measurements in a real-operation environment. To that end, 304 h and 20 min of labeled acoustic data collected through the two WASNs deployed in both pilot areas have been analyzed, computing the individual and aggregate impacts of ANEs for each sensor location and impact range (low, medium, and high) for a 5-min integration time. The study shows the regular occurrence of ANEs when monitoring RTN levels in both acoustic environments, which are especially common in the urban area. Moreover, the results reveal that the aggregate contribution of low- and medium-impact ANEs can become as critical as the presence of high-impact individual ANEs, thus highlighting the importance of their automatic removal to obtain reliable WASN-based RTN maps in real-operation environments.
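The bias studied in this entry comes from the fact that the equivalent continuous level is an energy average of per-frame levels, so a single loud anomalous event can noticeably raise a 5-min integration. A minimal sketch with hypothetical frame levels (not the DYNAMAP pipeline):

```python
import math

def leq(levels_db):
    """Equivalent continuous level: energy average of per-frame levels (dB)."""
    energies = [10 ** (l / 10) for l in levels_db]
    return 10 * math.log10(sum(energies) / len(energies))

# Hypothetical 5-min integration (300 one-second frames): steady road traffic
# at 65 dBA plus one short anomalous event (e.g. a siren) at 85 dBA.
traffic = [65.0] * 299
with_ane = traffic + [85.0]

biased = leq(with_ane)  # includes the anomalous frame
clean = leq(traffic)    # after removing the ANE
```

With these made-up numbers the single 85-dBA frame raises the 5-min level by more than 1 dB, which illustrates why ANE removal matters for reliable RTN maps.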
|
36
|
Messner E, Fediuk M, Swatek P, Scheidl S, Smolle-Jüttner FM, Olschewski H, Pernkopf F. Crackle and Breathing Phase Detection in Lung Sounds with Deep Bidirectional Gated Recurrent Neural Networks. Annu Int Conf IEEE Eng Med Biol Soc 2018; 2018:356-359. [PMID: 30440410] [DOI: 10.1109/embc.2018.8512237]
Abstract
In this paper, we present a method for event detection in single-channel lung sound recordings, including the detection of crackles and breathing phase events (inspiration/expiration). To this end, we propose an event detection approach with spectral features and bidirectional gated recurrent neural networks (BiGRNNs). In our experiments, we use multichannel lung sound recordings from lung-healthy subjects and patients diagnosed with idiopathic pulmonary fibrosis, collected within a clinical trial. We achieve an event-based F-score of F1 ≈ 86% for breathing phase events and F1 ≈ 72% for crackles. The proposed method is robust to contamination of the lung sound recordings with noise, bowel sounds, and heart sounds.
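The event-based F-score used here counts one-to-one matches between detected and annotated events rather than per-frame agreement. A simplified sketch with a hypothetical onset-collar matching rule (the authors' exact matching protocol may differ):

```python
def event_f1(ref, det, collar=0.05):
    """Greedy one-to-one matching of detected event onsets (seconds) to
    reference onsets within +/- collar; returns (precision, recall, F1)."""
    ref = sorted(ref)
    matched = set()
    tp = 0
    for onset in sorted(det):
        for i, r in enumerate(ref):
            if i not in matched and abs(onset - r) <= collar:
                matched.add(i)  # each reference event may be matched once
                tp += 1
                break
    precision = tp / len(det) if det else 0.0
    recall = tp / len(ref) if ref else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Three annotated crackle onsets, three detections; one detection is spurious.
p, r, f = event_f1([0.10, 0.50, 0.90], [0.12, 0.54, 1.40])
```

With two of three detections matched, precision, recall, and F1 all come out to 2/3 in this toy case.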
|
37
|
Adaptive Noise Reduction for Sound Event Detection Using Subband-Weighted NMF. Sensors 2019; 19:s19143206. [PMID: 31330840] [PMCID: PMC6679307] [DOI: 10.3390/s19143206]
Abstract
Sound event detection in real-world environments suffers from the interference of non-stationary and time-varying noise. This paper presents an adaptive noise reduction method for sound event detection based on non-negative matrix factorization (NMF). First, a scheme for noise dictionary learning from the input noisy signal is employed using robust NMF, which supports adaptation to noise variations. The estimated noise dictionary is used to develop a supervised source separation framework in combination with a pre-trained event dictionary. Second, to improve the separation quality, we extend the basic NMF model to a weighted form, with the aim of varying the relative importance of the different components when separating a target sound event from noise. With properly designed weights, the separation process is forced to rely more on the dominant event components, whereas the noise is greatly suppressed. The proposed method is evaluated on a dataset of the rare sound event detection task of the DCASE 2017 challenge, and achieves results comparable to the top-ranking system based on convolutional recurrent neural networks (CRNNs). The proposed weighted NMF method shows an excellent noise reduction ability and improves the F-score by 5% compared with the unweighted approach.
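The supervised separation step described above can be sketched with multiplicative updates for the activations of a fixed concatenated dictionary. This toy example uses made-up 3-bin spectra and uniform weights rather than the paper's robustly learned noise dictionary and subband weights:

```python
import numpy as np

rng = np.random.default_rng(0)

def weighted_nmf_activations(V, B, Phi, n_iter=500, eps=1e-9):
    """Estimate nonnegative activations H for a fixed dictionary B so that
    B @ H approximates V under elementwise weights Phi (weighted Frobenius cost).
    Multiplicative update: H <- H * (B.T @ (Phi*V)) / (B.T @ (Phi*(B@H)))."""
    H = rng.random((B.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (B.T @ (Phi * V)) / (B.T @ (Phi * (B @ H)) + eps)
    return H

# Hypothetical dictionaries: two event atoms and one broadband noise atom.
B_event = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
B_noise = np.array([[0.2], [0.2], [1.0]])
B = np.hstack([B_event, B_noise])
V = B @ rng.random((3, 5))   # synthetic nonnegative spectrogram
Phi = np.ones_like(V)        # uniform weights; the paper uses subband weights

H = weighted_nmf_activations(V, B, Phi)
V_event = B_event @ H[:2]    # separated event component
```

Setting entries of `Phi` larger on event-dominant subbands is what steers the fit toward those components, which is the idea behind the weighted extension.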
|
38
|
Messner E, Zöhrer M, Pernkopf F. Heart Sound Segmentation-An Event Detection Approach Using Deep Recurrent Neural Networks. IEEE Trans Biomed Eng 2018; 65:1964-1974. [PMID: 29993398] [DOI: 10.1109/tbme.2018.2843258]
Abstract
OBJECTIVE In this paper, we accurately detect the state-sequence first heart sound (S1)-systole-second heart sound (S2)-diastole, i.e., the positions of S1 and S2, in heart sound recordings. We propose an event detection approach without explicitly incorporating a priori information of the state duration. This renders it also applicable to recordings with cardiac arrhythmia and extendable to the detection of extra heart sounds (third and fourth heart sound), heart murmurs, as well as other acoustic events. METHODS We use data from the 2016 PhysioNet/CinC Challenge, containing heart sound recordings and annotations of the heart sound states. From the recordings, we extract spectral and envelope features and investigate the performance of different deep recurrent neural network (DRNN) architectures to detect the state sequence. We use virtual adversarial training, dropout, and data augmentation for regularization. RESULTS We compare our results with the state-of-the-art method and achieve an average score for the four events of the state sequence of F1 ≈ 96% on an independent test set. CONCLUSION Our approach shows state-of-the-art performance carefully evaluated on the 2016 PhysioNet/CinC Challenge dataset. SIGNIFICANCE In this work, we introduce a new methodology for the segmentation of heart sounds, suggesting an event detection approach with DRNNs using spectral or envelope features.
|
39
|
Kananub S, Jawjaroensri W, VanLeeuwen J, Stryhn H, Arunvipas P. Exploring factors associated with bulk tank milk urea nitrogen in Central Thailand. Vet World 2018; 11:642-648. [PMID: 29915503] [PMCID: PMC5993769] [DOI: 10.14202/vetworld.2018.642-648]
Abstract
Aim The study aimed to determine seasonal fluctuations and non-nutritional factors associated with bulk tank milk urea nitrogen (BTMUN). Materials and Methods A total of 58,364 BTM testing records were collected from 2364 farms in Central Thailand during September 2014-August 2015. Using square root BTMUN as the outcome, other milk components, farm effect, and sampling time were analyzed by univariable repeated-measures linear regression, and significant variables were included in a multivariable repeated-measures linear regression. Results The average BTMUN (standard deviation) was 4.71 (±1.16) mmol/L. In the final model, BTM fat and protein percentages were associated with BTMUN as quadratic and cubic polynomials, respectively. BTM lactose percentage and the natural logarithm of somatic cell counts were negatively and linearly associated with BTMUN. At the farm level, the BTM lactose association was also negatively linear; herd BTMUN decreased as the herd lactose average increased, and BTM lactose slopes differed considerably among farms. Sampling time was the strongest predictor of BTMUN over time, with lows and highs occurring in August and October, respectively. The variation in test-level BTMUN was reduced by 18.6% compared with the null model, and 6% of the variance could be explained at the farm level. Conclusion The results clarify seasonal variation in BTMUN and the relationships between other BTM constituents and BTMUN, which may be useful for understanding how to better manage lactating dairy cattle to keep BTM constituents within normal ranges.
Affiliation(s)
- Suppada Kananub, Department of Large Animals and Wildlife Clinical Sciences, Faculty of Veterinary Medicine, Kasetsart University, Bangkok, Thailand
- Wassana Jawjaroensri, Laboratory Unit, Kasetsart University Veterinary Teaching Hospital, Nong Pho, Ratchaburi Province, Thailand
- John VanLeeuwen, Department of Health Management, Atlantic Veterinary College, University of Prince Edward Island, Charlottetown, Canada
- Henrik Stryhn, Department of Health Management, Atlantic Veterinary College, University of Prince Edward Island, Charlottetown, Canada
- Pipat Arunvipas, Department of Large Animals and Wildlife Clinical Sciences, Faculty of Veterinary Medicine, Kasetsart University, Bangkok, Thailand
|
40
|
Alsina-Pagès RM, Alías F, Socoró JC, Orga F. Detection of Anomalous Noise Events on Low-Capacity Acoustic Nodes for Dynamic Road Traffic Noise Mapping within an Hybrid WASN. Sensors 2018; 18:E1272. [PMID: 29677147] [PMCID: PMC5948866] [DOI: 10.3390/s18041272]
Abstract
One of the main aspects affecting the quality of life of people living in urban and suburban areas is the continuous exposure to high road traffic noise (RTN) levels. Nowadays, thanks to Wireless Acoustic Sensor Networks (WASN), noise in Smart Cities has started to be automatically mapped. To obtain a reliable picture of the RTN, those anomalous noise events (ANE) unrelated to road traffic (sirens, horns, people, etc.) should be removed from the noise map computation by means of an Anomalous Noise Event Detector (ANED). In hybrid WASNs with a master-slave architecture, the ANED should be implemented in both high-capacity (Hi-Cap) and low-capacity (Lo-Cap) sensors, following the same principle to obtain consistent results. This work presents an ANED version that runs in real time on μController-based Lo-Cap sensors of a hybrid WASN, discriminating RTN from ANE through their Mel-based spectral energy differences. The experiments, considering 9 h and 8 min of real-life acoustic data from both urban and suburban environments, show the feasibility of the proposal both in terms of computational load and classification accuracy. Specifically, the ANED Lo-Cap requires around 1/6 of the computational load of the ANED Hi-Cap, while classification accuracies are slightly lower (by around 10%). However, preliminary analyses show that these results could be improved by around 4% in the future by means of optimal frequency selection.
Affiliation(s)
- Rosa Ma Alsina-Pagès, GTM-Grup de recerca en Tecnologies Mèdia, La Salle-Universitat Ramon Llull, Quatre Camins, 30, 08022 Barcelona, Spain
- Francesc Alías, GTM-Grup de recerca en Tecnologies Mèdia, La Salle-Universitat Ramon Llull, Quatre Camins, 30, 08022 Barcelona, Spain
- Joan Claudi Socoró, GTM-Grup de recerca en Tecnologies Mèdia, La Salle-Universitat Ramon Llull, Quatre Camins, 30, 08022 Barcelona, Spain
- Ferran Orga, GTM-Grup de recerca en Tecnologies Mèdia, La Salle-Universitat Ramon Llull, Quatre Camins, 30, 08022 Barcelona, Spain
|
41
|
A Supervised Event-Based Non-Intrusive Load Monitoring for Non-Linear Appliances. Sustainability 2018. [DOI: 10.3390/su10041001]
|
42
|
Wearable Vibration Based Computer Interaction and Communication System for Deaf. Applied Sciences (Basel) 2017. [DOI: 10.3390/app7121296]
|
43
|
Socoró JC, Alías F, Alsina-Pagès RM. An Anomalous Noise Events Detector for Dynamic Road Traffic Noise Mapping in Real-Life Urban and Suburban Environments. Sensors 2017; 17:E2323. [PMID: 29023397] [PMCID: PMC5677313] [DOI: 10.3390/s17102323]
Abstract
One of the main aspects affecting the quality of life of people living in urban and suburban areas is their continued exposure to high Road Traffic Noise (RTN) levels. Until now, noise measurements in cities have been performed by professionals, recording data in certain locations to build a noise map afterwards. However, the deployment of Wireless Acoustic Sensor Networks (WASN) has enabled automatic noise mapping in smart cities. In order to obtain a reliable picture of the RTN levels affecting citizens, Anomalous Noise Events (ANE) unrelated to road traffic should be removed from the noise map computation. To this end, this paper introduces an Anomalous Noise Event Detector (ANED) designed to differentiate between RTN and ANE in real time within a predefined interval, running on the distributed low-cost acoustic sensors of a WASN. The proposed ANED follows a two-class audio event detection and classification approach, instead of multi-class or one-class classification schemes, taking advantage of the collection of representative acoustic data in real-life environments. The experiments conducted within the DYNAMAP project, implemented on ARM-based acoustic sensors, show the feasibility of the proposal both in terms of computational cost and classification performance using standard Mel cepstral coefficients and Gaussian Mixture Models (GMM). The two-class GMM core classifier improves the F1 measure of the baseline one-class universal GMM classifier by a relative 18.7% and 31.8% for suburban and urban environments, respectively, within the 1-s integration interval. Nevertheless, according to the results, the classification performance of the current ANED implementation still has room for improvement.
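The two-class scheme described above scores each frame's cepstral features under an RTN model and an ANE model and picks the class with the higher likelihood. A stripped-down sketch using single full-covariance Gaussians (a 1-component stand-in for the GMMs) on synthetic 2-D features rather than real Mel cepstral coefficients:

```python
import numpy as np

rng = np.random.default_rng(1)

class GaussianClass:
    """Single full-covariance Gaussian per class -- a simplified stand-in
    for the per-class GMMs used in the ANED."""
    def fit(self, X):
        self.mu = X.mean(axis=0)
        self.cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])
        self.inv = np.linalg.inv(self.cov)
        self.logdet = np.linalg.slogdet(self.cov)[1]
        return self

    def loglik(self, X):
        # Per-sample Gaussian log-likelihood via the Mahalanobis distance.
        d = X - self.mu
        maha = np.einsum('ij,jk,ik->i', d, self.inv, d)
        k = X.shape[1]
        return -0.5 * (maha + self.logdet + k * np.log(2 * np.pi))

# Synthetic "road traffic" vs "anomalous event" feature clusters.
rtn = rng.normal([0.0, 0.0], 0.5, size=(500, 2))
ane = rng.normal([3.0, 3.0], 0.5, size=(500, 2))

m_rtn = GaussianClass().fit(rtn)
m_ane = GaussianClass().fit(ane)

# Classify held-out frames by comparing class log-likelihoods.
test = np.vstack([rng.normal([0, 0], 0.5, (100, 2)),
                  rng.normal([3, 3], 0.5, (100, 2))])
labels = np.array([0] * 100 + [1] * 100)
pred = (m_ane.loglik(test) > m_rtn.loglik(test)).astype(int)
accuracy = (pred == labels).mean()
```

In the real system the decision is aggregated over an integration interval (1 s in the paper) rather than taken per frame, and each class uses a multi-component mixture.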
Affiliation(s)
- Joan Claudi Socoró, GTM-Grup de recerca en Tecnologies Mèdia, La Salle, Universitat Ramon Llull, Quatre Camins, 30, 08022 Barcelona, Spain
- Francesc Alías, GTM-Grup de recerca en Tecnologies Mèdia, La Salle, Universitat Ramon Llull, Quatre Camins, 30, 08022 Barcelona, Spain
- Rosa Ma Alsina-Pagès, GTM-Grup de recerca en Tecnologies Mèdia, La Salle, Universitat Ramon Llull, Quatre Camins, 30, 08022 Barcelona, Spain
|
44
|
Alsina-Pagès RM, Navarro J, Alías F, Hervás M. homeSound: Real-Time Audio Event Detection Based on High Performance Computing for Behaviour and Surveillance Remote Monitoring. Sensors 2017; 17:s17040854. [PMID: 28406459] [PMCID: PMC5424731] [DOI: 10.3390/s17040854]
Abstract
The consistent growth in human life expectancy in recent years has driven governments and private organizations to increase their efforts in caring for the eldest segment of the population. These institutions have built hospitals and retirement homes that have rapidly become overfilled, making their associated maintenance and operating costs prohibitive. The latest advances in technology and communications envisage new ways to monitor people with special needs at their own home, increasing their quality of life in a cost-affordable way. The purpose of this paper is to present an Ambient Assisted Living (AAL) platform able to analyze, identify, and detect specific acoustic events happening in daily life environments, which enables medical staff to remotely track the status of every patient in real time. Additionally, this tele-care proposal is validated through a proof-of-concept experiment that takes advantage of the NVIDIA Graphics Processing Unit on a Jetson TK1 board to locally detect acoustic events. Conducted experiments demonstrate the feasibility of this approach, reaching an overall accuracy of 82% when identifying a set of 14 indoor environment events related to domestic surveillance and patient behaviour monitoring. The obtained results encourage practitioners to keep working in this direction and enable health care providers to remotely track the status of their patients in real time with non-invasive methods.
Affiliation(s)
- Rosa Ma Alsina-Pagès, GTM-Grup de recerca en Tecnologies Mèdia, La Salle-Universitat Ramon Llull, C/Quatre Camins, 30, 08022 Barcelona, Catalonia, Spain
- Joan Navarro, GRITS-Grup de Recerca en Internet Technologies & Storage, La Salle-Universitat Ramon Llull, C/Quatre Camins, 30, 08022 Barcelona, Catalonia, Spain
- Francesc Alías, GTM-Grup de recerca en Tecnologies Mèdia, La Salle-Universitat Ramon Llull, C/Quatre Camins, 30, 08022 Barcelona, Catalonia, Spain
- Marcos Hervás, GTM-Grup de recerca en Tecnologies Mèdia, La Salle-Universitat Ramon Llull, C/Quatre Camins, 30, 08022 Barcelona, Catalonia, Spain
|
45
|
Description of Anomalous Noise Events for Reliable Dynamic Traffic Noise Mapping in Real-Life Urban and Suburban Soundscapes. Applied Sciences (Basel) 2017. [DOI: 10.3390/app7020146]
|
46
|
Design of a Mobile Low-Cost Sensor Network Using Urban Buses for Real-Time Ubiquitous Noise Monitoring. Sensors 2016; 17:s17010057. [PMID: 28036065] [PMCID: PMC5298630] [DOI: 10.3390/s17010057]
Abstract
One of the main priorities of smart cities is improving the quality of life of their inhabitants. Traffic noise is one of the pollutant sources with a negative impact on citizens' quality of life, and it is gaining attention among authorities. The European Commission has promoted the Environmental Noise Directive 2002/49/EC (END) to inform citizens and to prevent the harmful effects of noise exposure. The measurement of acoustic levels using noise maps is a strategic issue in the END action plan. Noise maps are typically calculated by computing the average noise during one year and are updated every five years. Hence, the implementation of dynamic noise mapping systems could enable short-term action plans, besides helping to better understand the evolution of noise levels over time. Recently, some projects have started monitoring noise levels in urban areas by means of acoustic sensor networks deployed in strategic locations across the city, while others have taken advantage of collaborative citizen-sensing mobile applications. In this paper, we describe the design of a low-cost acoustic sensor network installed on public buses to measure traffic noise in the city in real time. Moreover, the challenges that a ubiquitous bus-based acoustic measurement system entails are enumerated and discussed. Specifically, the analysis takes into account the feature extraction of the audio signal, the identification and separation of road traffic noise from other urban noise, the hardware platform to measure and process the acoustic signal, the connectivity between the several nodes of the acoustic sensor network to store the data and, finally, the noise map generation process. The implementation and evaluation of the proposal in a real-life scenario is left for future work.
|