1
Peng L, Yang J, Yan L, Chen Z, Xiao J, Zhou L, Zhou J. BSN-ESC: A Big-Small Network-Based Environmental Sound Classification Method for AIoT Applications. Sensors (Basel) 2023; 23:6767. PMID: 37571550; PMCID: PMC10422364; DOI: 10.3390/s23156767. Received 05/19/2023; revised 06/25/2023; accepted 06/28/2023.
Abstract
In recent years, environmental sound classification (ESC) has prevailed in many artificial intelligence Internet of Things (AIoT) applications, as environmental sound contains a wealth of information that can be used to detect particular events. However, existing ESC methods have high computational complexity and are not suitable for deployment on AIoT devices with constrained computing resources. It is therefore of great importance to propose a model with both high classification accuracy and low computational complexity. In this work, a new ESC method named BSN-ESC is proposed. It comprises a big-small network-based ESC model that assesses the classification difficulty level and adaptively activates a big or small network for classification, together with a pre-classification processing technique with logmel spectrogram refining, which prevents distortion of the frequency-domain characteristics of a sound clip at the joint of two adjacent sound clips. With the proposed methods, the computational complexity is significantly reduced while the classification accuracy remains high. The proposed BSN-ESC model is implemented on both CPU and FPGA to evaluate its performance on PC and embedded systems with ESC-50, the most commonly used dataset. The model achieves the lowest computational complexity, with a floating-point operation (FLOP) count of only 0.123G, a reduction of up to 2309 times compared with state-of-the-art methods, while delivering a high classification accuracy of 89.25%. This work enables the application of ESC to AIoT devices with constrained computational resources.
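The abstract does not spell out BSN-ESC's gating criterion, so the following is only a minimal confidence-gated sketch of the big-small idea: run the cheap network first and fall back to the expensive one when its prediction looks uncertain. The networks, threshold, and dimensions here are hypothetical stand-ins, not the paper's architecture.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def classify_adaptive(x, small_net, big_net, conf_threshold=0.8):
    """Run the cheap 'small' network first; fall back to the 'big'
    network only when the small network's confidence is low."""
    p_small = softmax(small_net(x))
    if p_small.max() >= conf_threshold:
        return int(p_small.argmax()), "small"
    p_big = softmax(big_net(x))
    return int(p_big.argmax()), "big"

# Toy stand-in "networks": linear scorers over a 4-dim feature vector.
rng = np.random.default_rng(0)
W_small = rng.normal(size=(3, 4))
W_big = rng.normal(size=(3, 4))
small_net = lambda x: W_small @ x
big_net = lambda x: 5.0 * (W_big @ x)  # sharper logits -> more confident

x = rng.normal(size=4)
label, which = classify_adaptive(x, small_net, big_net)
```

Since easy clips dominate most workloads, the average FLOP count approaches that of the small network alone, which is the effect the paper exploits.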
Affiliation(s)
- Jun Zhou
- Department of Internet of Things Engineering, School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China; (L.P.); (J.Y.); (L.Y.); (Z.C.); (J.X.); (L.Z.)
2
Bonet-Solà D, Vidaña-Vila E, Alsina-Pagès RM. Analysis and Acoustic Event Classification of Environmental Data Collected in a Citizen Science Project. Int J Environ Res Public Health 2023; 20:3683. PMID: 36834378; PMCID: PMC9966892; DOI: 10.3390/ijerph20043683. Received 01/22/2023; revised 02/10/2023; accepted 02/15/2023.
Abstract
Citizen science can serve as a tool to obtain information about changes in the soundscape. One of the challenges of citizen science projects is processing the data gathered by citizens in order to draw conclusions. As part of the Sons al Balcó project, the authors aim to study the soundscape in Catalonia during and after the COVID-19 lockdown and to design a tool that automatically detects sound events as a first step toward assessing the quality of the soundscape. This paper details and compares the acoustic samples of the two collecting campaigns of the Sons al Balcó project: the 2020 campaign obtained 365 videos, while the 2021 campaign obtained 237. A convolutional neural network is then trained to automatically detect and classify acoustic events even if they occur simultaneously. The event-based macro F1-score tops 50% for both campaigns for the most prevalent noise sources. However, the results suggest that not all categories are equally well detected: the prevalence of an event in the dataset and its foreground-to-background ratio play a decisive role.
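The event-based macro F1-score reported above averages the per-class F1 over all classes, so rare classes weigh as much as common ones. A simplified sketch, with set-based event matching and hypothetical labels standing in for the paper's actual matching procedure:

```python
def macro_f1(events_true, events_pred, classes):
    """Event-based macro F1: average per-class F1 over all classes.
    events_* are sets of (class, event_id) detections - a simplified
    stand-in for the paper's event-matching procedure."""
    f1s = []
    for c in classes:
        t = {e for cl, e in events_true if cl == c}
        p = {e for cl, e in events_pred if cl == c}
        tp = len(t & p)
        prec = tp / len(p) if p else 0.0
        rec = tp / len(t) if t else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

truth = {("dog", 1), ("dog", 2), ("car", 3)}
pred = {("dog", 1), ("car", 3), ("car", 4)}
score = macro_f1(truth, pred, ["dog", "car"])
```

This unweighted average is why a class's prevalence in the dataset matters: a class with few events can drag the macro score down even when the frequent classes are detected well.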
Affiliation(s)
- Ester Vidaña-Vila
- Human Environment Research (HER), La Salle—Universitat Ramon Llull, Sant Joan de La Salle, 42, 08022 Barcelona, Spain
- Rosa Ma Alsina-Pagès
- Human Environment Research (HER), La Salle—Universitat Ramon Llull, Sant Joan de La Salle, 42, 08022 Barcelona, Spain
3
Kamalipour M, Agahi H, Khishe M, Mahmoodzadeh A. Passive ship detection and classification using hybrid cepstrums and deep compound autoencoders. Neural Comput Appl 2022. DOI: 10.1007/s00521-022-08075-7.
4
Mutanu L, Gohil J, Gupta K, Wagio P, Kotonya G. A Review of Automated Bioacoustics and General Acoustics Classification Research. Sensors (Basel) 2022; 22:8361. PMID: 36366061; PMCID: PMC9658612; DOI: 10.3390/s22218361. Received 10/01/2022; revised 10/19/2022; accepted 10/21/2022.
Abstract
Automated bioacoustics classification has received increasing attention from the research community in recent years due to its cross-disciplinary nature and its diverse applications. Applications of bioacoustics classification range from smart acoustic sensor networks that investigate the effects of acoustic vocalizations on species to context-aware edge devices that anticipate changes in their environment and adapt their sensing and processing accordingly. The research described here is an in-depth survey of the current state of bioacoustics classification and monitoring. The survey examines bioacoustics classification alongside general acoustics to provide a representative picture of the research landscape, reviewing 124 studies spanning eight years of research. It identifies the key application areas in bioacoustics research and the techniques used in audio transformation and feature extraction, examines the classification algorithms used in bioacoustics systems, and closes with current challenges, possible opportunities, and future directions in bioacoustics.
Affiliation(s)
- Leah Mutanu
- Department of Computing, United States International University Africa, Nairobi P.O. Box 14634-0800, Kenya
- Jeet Gohil
- Department of Computing, United States International University Africa, Nairobi P.O. Box 14634-0800, Kenya
- Khushi Gupta
- Department of Computer Science, Sam Houston State University, Huntsville, TX 77341, USA
- Perpetua Wagio
- Department of Computing, United States International University Africa, Nairobi P.O. Box 14634-0800, Kenya
- Gerald Kotonya
- School of Computing and Communications, Lancaster University, Lancaster LA1 4WA, UK
5
Tougui I, Jilbab A, Mhamdi JE. Machine Learning Smart System for Parkinson Disease Classification Using the Voice as a Biomarker. Healthc Inform Res 2022; 28:210-221. PMID: 35982595; PMCID: PMC9388925; DOI: 10.4258/hir.2022.28.3.210. Received 01/19/2022; accepted 06/28/2022. Open access.
Abstract
OBJECTIVES This study presents PD Predict, a machine learning system for Parkinson disease classification using voice as a biomarker. METHODS We first created an original set of recordings from the mPower study and then extracted several audio features, such as mel-frequency cepstral coefficient (MFCC) components and other classical speech features, using a windowing procedure. The generated dataset was then divided into training and holdout sets. The training set was used to train two machine learning pipelines, whose performance was estimated using a nested subject-wise cross-validation approach; the holdout set was used to assess the generalizability of the pipelines to unseen data. The final pipelines were implemented in PD Predict and accessed through a prediction endpoint developed using the Django REST Framework. PD Predict is a two-component system: a desktop application that records audio, extracts audio features, and makes predictions; and a server-side web application that implements the machine learning pipelines and processes incoming requests with the extracted audio features to make predictions. Our system is deployed and accessible via the following link: https://pdpredict.herokuapp.com/. RESULTS Both machine learning pipelines showed moderate performance, between 65% and 75%, under the nested subject-wise cross-validation approach. Furthermore, they generalized well to unseen data and did not overfit the training set. CONCLUSIONS The architecture of PD Predict is clear, and the performance of the implemented machine learning pipelines is promising and confirms the usability of smartphone microphones for capturing digital biomarkers of disease.
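The point of a subject-wise split is that no speaker contributes recordings to both the training and the evaluation side, which would otherwise inflate scores through speaker identity leakage. A minimal sketch of that constraint, with hypothetical subject IDs (the paper's full protocol nests this inside cross-validation):

```python
def subject_wise_split(sample_subjects, holdout_subjects):
    """Split sample indices so that no subject appears in both sets -
    a simplified sketch of subject-wise splitting (the paper nests
    this idea inside a cross-validation loop)."""
    train, hold = [], []
    for i, s in enumerate(sample_subjects):
        (hold if s in holdout_subjects else train).append(i)
    return train, hold

# One subject ID per recorded clip (hypothetical data).
subjects = ["a", "a", "b", "c", "c", "b"]
train_idx, hold_idx = subject_wise_split(subjects, {"b"})
```

In practice the same behavior is available off the shelf, e.g. scikit-learn's `GroupKFold` with the subject ID as the group label.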
Affiliation(s)
- Ilias Tougui
- E2SN, ENSIAS, Mohammed V University in Rabat, Rabat, Morocco
- Jamal El Mhamdi
- E2SN, ENSIAS, Mohammed V University in Rabat, Rabat, Morocco
6
Abstract
The occurrence of wildfires often results in significant fatalities. As wildfires are notorious for their high speed of spread, the ability to identify a wildfire at its early stage is essential for quickly gaining control of the fire, reducing property loss, and preventing loss of life. This work presents a machine-learning data pipeline for wildfire detection that can be deployed on embedded systems in remote locations. The proposed pipeline consists of three main steps: audio preprocessing, feature engineering, and classification. Experiments show that the pipeline detects wildfire effectively with high precision and can distinguish wildfire sound from the forest's background soundscape. When deployed on a Raspberry Pi 4, the pipeline takes 66 milliseconds to process a 1 s sound clip. To the author's knowledge, this is the first edge-computing implementation of an audio-based wildfire detection system.
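The three-step shape of the pipeline (preprocessing, feature engineering, classification) can be sketched as below. The specific features and the threshold rule are illustrative assumptions only; the abstract does not disclose the paper's actual features or classifier.

```python
import numpy as np

def preprocess(clip, peak=1.0):
    """Peak-normalise the 1 s clip (a stand-in for the paper's
    unspecified preprocessing step)."""
    m = np.abs(clip).max()
    return clip if m == 0 else clip * (peak / m)

def extract_features(clip, sr=16000):
    """Two toy features: RMS energy and spectral centroid."""
    rms = float(np.sqrt(np.mean(clip ** 2)))
    spec = np.abs(np.fft.rfft(clip))
    freqs = np.fft.rfftfreq(len(clip), d=1.0 / sr)
    centroid = float((freqs * spec).sum() / spec.sum())
    return np.array([rms, centroid])

def classify(features, centroid_thresh=2000.0):
    """Hypothetical rule: broadband crackle pushes the centroid up."""
    return "fire" if features[1] > centroid_thresh else "background"

sr = 16000
t = np.arange(sr) / sr
low_hum = np.sin(2 * np.pi * 100 * t)                # tonal background
crackle = np.random.default_rng(1).normal(size=sr)   # noise-like crackle

label_hum = classify(extract_features(preprocess(low_hum), sr))
label_crackle = classify(extract_features(preprocess(crackle), sr))
```

Keeping each stage a plain function of a fixed-length clip is what makes the 66 ms-per-second real-time budget on a Raspberry Pi plausible: there is no buffering beyond the current window.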
7
Yao Y, Dai Y, Luo W. Early Fault Diagnosis Method for Batch Process Based on Local Time Window Standardization and Trend Analysis. Sensors (Basel) 2021; 21:8075. PMID: 34884082; PMCID: PMC8662448; DOI: 10.3390/s21238075. Received 11/07/2021; revised 11/23/2021; accepted 11/30/2021.
Abstract
The products of a batch process have high economic value, but batch processes involve complex chemicals and equipment, and the variability of their operation leads to a high failure rate. Early fault diagnosis of batch processes is therefore of great significance. Usually, the useful information in batch-process sensor data is obscured by noise, and the multistage variation of the data results in poor diagnostic performance. This paper constructs a standardization method that amplifies fault information, together with a batch fault diagnosis method based on trend analysis. First, an adaptive standardization based on a time window was created; second, a data trend within the window was extracted using quadratic fitting; third, a new trend recognition method based on Euclidean distance was developed. The method was verified on penicillin fermentation. Two test datasets were constructed: one based on an existing batch and one based on an unknown batch. The average diagnostic rates were 100% and 87.5%, respectively, and the mean diagnosis time was the same for both: 0.2083 h. Compared with traditional fault diagnosis methods, this algorithm has better fault diagnosis and feature extraction ability.
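The three steps above can be sketched as follows: standardize each sample against its local time window, fit a quadratic to the windowed data, and compare trend-coefficient vectors by Euclidean distance. Window sizes, signals, and thresholds here are illustrative assumptions, not the paper's tuned values.

```python
import numpy as np

def window_standardize(x, win):
    """Standardise each sample with the mean/std of its trailing
    window - a sketch of adaptive time-window standardization."""
    out = np.empty_like(x, dtype=float)
    for i in range(len(x)):
        w = x[max(0, i - win + 1): i + 1]
        s = w.std()
        out[i] = (x[i] - w.mean()) / s if s > 0 else 0.0
    return out

def quadratic_trend(window):
    """Fit y = a t^2 + b t + c over the window; (a, b, c) is the trend."""
    t = np.arange(len(window))
    return np.polyfit(t, window, 2)

def trend_distance(tr1, tr2):
    """Euclidean distance between trend-coefficient vectors."""
    return float(np.linalg.norm(np.asarray(tr1) - np.asarray(tr2)))

t = np.arange(20, dtype=float)
rising = 0.05 * t ** 2       # accelerating, fault-like trend
flat = np.zeros(20)          # normal, steady trend
d = trend_distance(quadratic_trend(rising), quadratic_trend(flat))
```

Comparing fitted coefficients rather than raw samples is what gives the method its noise tolerance: the quadratic fit smooths the window before any distance is taken.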
Affiliation(s)
- Yuman Yao
- College of Chemistry and Chemical Engineering, Southwest Petroleum University, Chengdu 610500, China; (Y.Y.); (W.L.)
- Yiyang Dai
- School of Chemical Engineering, Sichuan University, Chengdu 610065, China (corresponding author)
- Wenjia Luo
- College of Chemistry and Chemical Engineering, Southwest Petroleum University, Chengdu 610500, China; (Y.Y.); (W.L.)
8
MosAIc: A Classical Machine Learning Multi-Classifier Based Approach against Deep Learning Classifiers for Embedded Sound Classification. Appl Sci (Basel) 2021. DOI: 10.3390/app11188394.
Abstract
Environmental sound recognition has become a relevant application for smart cities. Such an application, however, demands trained machine learning classifiers to categorize a limited set of audio categories. Although classical machine learning solutions have been proposed in the past, most of the latest solutions for automated and accurate sound classification are based on deep learning. Deep learning models tend to be large, which is problematic given that sound classifiers often have to be embedded in resource-constrained devices. In this paper, a classical machine learning classifier called MosAIc and a lighter convolutional neural network model for environmental sound recognition are proposed to compete directly in accuracy with the latest deep learning solutions. Both approaches are evaluated on an embedded system to identify the key parameters when placing such applications on constrained devices. The experimental results show that classical machine learning classifiers can be combined to achieve results similar to those of deep learning models, and even outperform them in accuracy; the cost, however, is a longer classification time.
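Combining several classical classifiers can be as simple as a majority vote over their predictions. The sketch below is a generic combination scheme with toy stand-in classifiers; MosAIc's actual multi-classifier design may differ.

```python
from collections import Counter

def multi_classifier_vote(classifiers, x):
    """Combine several classical classifiers by majority vote - a
    generic sketch of multi-classifier combination."""
    votes = [clf(x) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]

# Toy stand-in classifiers over a single feature value.
clf_a = lambda x: "siren" if x > 0.5 else "traffic"
clf_b = lambda x: "siren" if x > 0.3 else "traffic"
clf_c = lambda x: "siren" if x > 0.9 else "traffic"

label = multi_classifier_vote([clf_a, clf_b, clf_c], 0.6)
```

The abstract's observed trade-off follows directly from this structure: accuracy improves because individual errors are outvoted, but every classifier in the pool must run, so classification time grows with the ensemble size.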
9
Pita A, Rodriguez FJ, Navarro JM. Cluster Analysis of Urban Acoustic Environments on Barcelona Sensor Network Data. Int J Environ Res Public Health 2021; 18:8271. PMID: 34444020; PMCID: PMC8392880; DOI: 10.3390/ijerph18168271. Received 06/03/2021; revised 07/21/2021; accepted 07/27/2021.
Abstract
As cities grow in size and number of inhabitants, continuous monitoring of the environmental impact of sound sources becomes essential for the assessment of urban acoustic environments. This requires management systems fed with large amounts of data captured by acoustic sensors, mostly remote nodes belonging to a wireless acoustic sensor network. These systems help city managers conduct data-driven analysis and propose action plans in different areas of the city, for instance, to reduce citizens' exposure to noise. In this paper, unsupervised learning techniques are applied to discover different behavior patterns, in both time and space, of the sound pressure levels captured by acoustic sensors, and to cluster them, allowing the identification of various urban acoustic environments. In this approach, the categorization of urban acoustic environments is based on a clustering algorithm using yearly acoustic indexes, such as Lday, Levening, Lnight, and the standard deviation of Lden. Data collected over three years by a network of acoustic sensors deployed in the city of Barcelona, Spain, are used to train several clustering methods. A comparison between methods concludes that the k-means algorithm performs best for these data. After an analysis of several solutions, an optimal clustering of four groups of nodes is chosen. Geographical analysis of the clusters shows insights into the relation between nodes and areas of the city, detecting clusters that are mostly close to urban roads, residential areas, and leisure areas. Moreover, temporal analysis of the clusters gives information about their stability. Using a one-year sliding window, changes in the cluster membership of nodes that reflect tendencies of the acoustic environments are discovered; in contrast, using one-month windowing, changes due to seasonality and special events, such as the COVID-19 lockdown, are recognized.
Finally, the sensor clusters obtained by the algorithm are compared with the areas defined in the strategic noise map previously created by the Barcelona city council. The developed k-means model identified most of the locations found in that map and also discovered a new area.
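The clustering step described above can be sketched as plain k-means over each node's yearly index vector [Lday, Levening, Lnight, std(Lden)]. The data, the deterministic initialization, and k=2 below are toy choices for reproducibility; the paper compares several methods and selects four clusters.

```python
import numpy as np

def kmeans(X, k, iters=20):
    """Minimal k-means over yearly acoustic-index vectors; a
    deterministic spread-out initialisation keeps this toy example
    reproducible (real runs would use k-means++ with restarts)."""
    centers = X[np.linspace(0, len(X) - 1, k).astype(int)].copy()
    for _ in range(iters):
        # Distance of every node vector to every center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

# Two well-separated toy "sensor node" groups (dB-like index values).
quiet = np.array([[50.0, 48.0, 40.0, 2.0]] * 5)
loud = np.array([[70.0, 68.0, 62.0, 5.0]] * 5)
X = np.vstack([quiet, loud])
labels, centers = kmeans(X, 2)
```

Re-running the same fit over a sliding window of yearly (or monthly) vectors and watching which nodes change label is exactly the temporal-stability analysis the abstract describes.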
10
An Ensemble One Dimensional Convolutional Neural Network with Bayesian Optimization for Environmental Sound Classification. Appl Sci (Basel) 2021. DOI: 10.3390/app11104660.
Abstract
With the growth of deep learning in various classification problems, many researchers have used deep learning methods in environmental sound classification tasks. This paper introduces an end-to-end method for environmental sound classification based on a one-dimensional convolutional neural network with Bayesian optimization and ensemble learning, which learns a feature representation directly from the audio signal. Several convolutional layers were used to capture the signal and learn various filters relevant to the classification problem. The proposed method can deal with audio signals of any length, as a sliding window divides the signal into overlapped frames. Bayesian optimization accomplished hyperparameter selection and model evaluation with cross-validation. Multiple models with different settings were developed based on Bayesian optimization to ensure network convergence in both convex and non-convex optimization. The UrbanSound8K dataset was used to evaluate the performance of the proposed end-to-end model. The experimental results achieved a classification accuracy of 94.46%, which is 5% higher than existing end-to-end approaches, with fewer trainable parameters. Several measurement indices, namely sensitivity, specificity, accuracy, precision, recall, F-measure, area under the ROC curve, and area under the precision-recall curve, were used to measure the model's performance. The proposed approach outperformed state-of-the-art end-to-end approaches that use hand-crafted features as input in the selected measurement indices and in time complexity.
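The sliding window that lets the model accept any signal length can be sketched in a few lines. Frame length and hop below are arbitrary illustrative values; dropping the final partial frame is one policy, and padding is an alternative the abstract does not rule out.

```python
def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into overlapped frames of equal length, as
    the paper's sliding window does; the trailing partial frame is
    dropped in this sketch."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

# 10 samples, frames of 4 with hop 2 -> 50% overlap.
frames = frame_signal(list(range(10)), frame_len=4, hop=2)
```

Each fixed-length frame can then be fed to the 1-D CNN independently, and the per-frame predictions aggregated (e.g. averaged) into one clip-level label.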
11
Benchmarking Audio Signal Representation Techniques for Classification with Convolutional Neural Networks. Sensors (Basel) 2021; 21:3434. PMID: 34069189; PMCID: PMC8156023; DOI: 10.3390/s21103434. Received 03/30/2021; revised 05/06/2021; accepted 05/11/2021.
Abstract
Audio signal classification finds various applications in detecting and monitoring health conditions in healthcare. Convolutional neural networks (CNN) have produced state-of-the-art results in image classification and are being increasingly used in other tasks, including signal classification. However, audio signal classification using CNN presents various challenges. In image classification tasks, raw images of equal dimensions can be used as direct input to a CNN; raw time-domain signals, on the other hand, can be of varying dimensions. In addition, the temporal signal often has to be transformed to the frequency domain to reveal unique spectral characteristics, therefore requiring signal transformation. In this work, we review and benchmark various audio signal representation techniques for classification using CNN, including approaches that deal with signals of different lengths and combine multiple representations to improve the classification accuracy. This work thus surfaces important empirical evidence that may guide future works deploying CNN for audio signal classification purposes.
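The time-to-frequency transformation discussed above usually starts from a magnitude spectrogram, which turns a 1-D signal into a 2-D "image" a CNN can consume; log-mel and MFCC representations are refinements of this basic transform. A minimal sketch with illustrative FFT and hop sizes:

```python
import numpy as np

def spectrogram(signal, n_fft=256, hop=128):
    """Magnitude spectrogram via a windowed short-time FFT -
    the basic 2-D time-frequency representation fed to a CNN."""
    frames = [signal[i:i + n_fft] * np.hanning(n_fft)
              for i in range(0, len(signal) - n_fft + 1, hop)]
    return np.abs(np.fft.rfft(np.array(frames), axis=1)).T  # (freq, time)

# 1 kHz tone sampled at 8 kHz: energy concentrates in one frequency bin.
sig = np.sin(2 * np.pi * 1000 * np.arange(4096) / 8000)
S = spectrogram(sig)
```

Because the time axis of S scales with the signal length while the frequency axis is fixed by `n_fft`, handling variable-length inputs (cropping, padding, or pooling over time) remains a design choice, which is one of the comparisons the paper benchmarks.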