1
Wang L, Kwan MP, Zhou S, Liu D. Assessing the affective quality of soundscape for individuals: Using third-party assessment combined with an artificial intelligence (TPA-AI) model. Sci Total Environ 2024; 953:176083. [PMID: 39260516] [DOI: 10.1016/j.scitotenv.2024.176083]
Abstract
When investigating the relationship between the acoustic environment and human wellbeing, there is a potential problem of data-source self-correlation. To address this problem, we proposed a third-party assessment combined with an artificial intelligence (TPA-AI) model, which uses acoustic spectrograms to assess the soundscape's affective quality. First, we collected data on public perceptions of urban sounds by inviting 100 volunteers to label the affective quality of 7051 10-s audio clips on a polar scale from annoying to pleasant. Second, we converted the labeled clips to acoustic spectrograms and used deep learning to train the TPA-AI model, achieving 92.88% predictive accuracy for binary classification. Third, geographic ecological momentary assessment (GEMA) was used to log momentary audio from 180 participants in their daily life context, and we employed the trained TPA-AI model to predict the affective quality of these momentary recordings. Lastly, we compared the explanatory power of three methods (sound level meters, sound questionnaires, and the TPA-AI model) when estimating the relationship between momentary stress level and the acoustic environment. Our results indicate that the TPA-AI model's explanatory power outperformed the sound level meter, while using a sound questionnaire might overestimate the effect of the acoustic environment on momentary stress and underestimate other confounders.
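The study's first processing step, converting labeled audio clips to spectrograms, can be sketched in plain numpy. This is a minimal illustration, not the paper's implementation; the 16 kHz sample rate, 400-sample Hann window, and 160-sample hop are illustrative assumptions.

```python
import numpy as np

def log_spectrogram(signal, win=400, hop=160):
    """Short-time Fourier magnitude on Hann-windowed frames, in dB.

    A minimal numpy sketch of audio-to-spectrogram conversion;
    window and hop sizes are illustrative, not the paper's settings.
    """
    window = np.hanning(win)
    n_frames = 1 + (len(signal) - win) // hop
    frames = np.stack([signal[i * hop:i * hop + win] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))   # (frames, freq bins)
    return 20.0 * np.log10(mag + 1e-10)         # epsilon avoids log(0)

# a 10-s stand-in clip at 16 kHz, like the labeled audio clips
clip = np.random.default_rng(0).standard_normal(10 * 16000)
spec = log_spectrogram(clip)
print(spec.shape)  # (998, 201)
```

Each row of the resulting matrix is one analysis frame; such a 2-D time-frequency image is what a convolutional classifier would consume.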
Affiliation(s)
- Linsen Wang: Institute of Space and Earth Information Science, Fok Ying Tung Remote Sensing Science Building, The Chinese University of Hong Kong, Hong Kong.
- Mei-Po Kwan: Institute of Space and Earth Information Science, Fok Ying Tung Remote Sensing Science Building, The Chinese University of Hong Kong, Hong Kong; Department of Geography and Resource Management, Wong Foo Yuan Building, The Chinese University of Hong Kong, Hong Kong.
- Suhong Zhou: School of Geography and Planning, Sun Yat-sen University, Guangzhou, Guangdong, China.
- Dong Liu: Institute of Space and Earth Information Science, Fok Ying Tung Remote Sensing Science Building, The Chinese University of Hong Kong, Hong Kong.
2
Terranova F, Betti L, Ferrario V, Friard O, Ludynia K, Petersen GS, Mathevon N, Reby D, Favaro L. Windy events detection in big bioacoustics datasets using a pre-trained Convolutional Neural Network. Sci Total Environ 2024; 949:174868. [PMID: 39034006] [DOI: 10.1016/j.scitotenv.2024.174868]
Abstract
Passive Acoustic Monitoring (PAM), which involves using autonomous recording units to study wildlife behaviour and distribution, often requires handling big acoustic datasets collected over extended periods. While these data offer invaluable insights about wildlife, their analysis can be challenging when dealing with geophonic sources. A major issue in detecting target sounds is wind-induced noise, which can lead to false positives (energy peaks from wind gusts misclassified as biological sounds) or false negatives (wind noise masking biological sounds). Acoustic data dominated by wind noise make the analysis of vocal activity unreliable, compromising the detection of target sounds and, subsequently, the interpretation of the results. Our work introduces a straightforward approach for detecting recordings affected by windy events using a pre-trained convolutional neural network, facilitating the identification of wind-compromised data. We consider this dataset pre-processing crucial for ensuring the reliable use of PAM data. We implemented it by leveraging YAMNet, a deep learning model for sound classification tasks. We evaluated the ability of YAMNet as-is to detect wind-induced noise and tested its performance in a transfer learning scenario using our annotated data from the Stony Point penguin colony in South Africa. While YAMNet as-is achieved a precision of 0.71 and a recall of 0.66, both metrics improved strongly after training on our annotated dataset, reaching a precision of 0.91 and a recall of 0.92, a relative increase of >28%. Our study demonstrates the promising application of YAMNet in the bioacoustics and ecoacoustics fields, addressing the need for wind-noise-free acoustic data.
We released an open-access code that, combined with the efficiency and peak performance of YAMNet, can be used on standard laptops for a broad user base.
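Why wind-compromised clips are screenable at all can be illustrated with a crude energy-ratio heuristic: wind noise concentrates energy at low frequencies, while most biological sounds sit higher. The sketch below is NOT the paper's YAMNet-based classifier; the 200 Hz cutoff and 0.6 threshold are illustrative assumptions.

```python
import numpy as np

def wind_flag(signal, sr=16000, cutoff_hz=200.0, threshold=0.6):
    """Flag a clip as wind-dominated when the fraction of spectral
    energy below `cutoff_hz` exceeds `threshold`.

    An energy-ratio heuristic only -- a stand-in for the CNN-based
    detector described in the abstract.
    """
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    return bool(power[freqs < cutoff_hz].sum() / power.sum() > threshold)

t = np.arange(16000) / 16000.0
windy = np.sin(2 * np.pi * 50 * t)    # low-frequency rumble
birdy = np.sin(2 * np.pi * 3000 * t)  # mid-frequency tone
print(wind_flag(windy), wind_flag(birdy))  # True False
```

A learned model such as YAMNet replaces this fixed threshold with features robust to overlap between wind and vocalizations.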
Affiliation(s)
- Francesca Terranova: Department of Life Sciences and Systems Biology, University of Turin, Turin, Italy.
- Lorenzo Betti: Department of Network and Data Science, Central European University, Vienna, Austria.
- Valeria Ferrario: Department of Life Sciences and Systems Biology, University of Turin, Turin, Italy; Chester Zoo, Caughall Road, Chester, UK.
- Olivier Friard: Department of Life Sciences and Systems Biology, University of Turin, Turin, Italy.
- Katrin Ludynia: Southern African Foundation for the Conservation of Coastal Birds (SANCCOB), Cape Town, South Africa; Department of Biodiversity and Conservation Biology, University of the Western Cape, Robert Sobukwe Road, Bellville, South Africa.
- Gavin Sean Petersen: Southern African Foundation for the Conservation of Coastal Birds (SANCCOB), Cape Town, South Africa.
- Nicolas Mathevon: ENES Bioacoustics Research Lab, CRNL, University of Saint-Etienne, CNRS, Inserm, Saint-Etienne, France; Institut universitaire de France, Ministry of Higher Education, Research and Innovation, 1 rue Descartes, CEDEX 05, Paris, France; Ecole Pratique des Hautes Etudes, CHArt lab, PSL University, Paris, France.
- David Reby: ENES Bioacoustics Research Lab, CRNL, University of Saint-Etienne, CNRS, Inserm, Saint-Etienne, France; Institut universitaire de France, Ministry of Higher Education, Research and Innovation, 1 rue Descartes, CEDEX 05, Paris, France.
- Livio Favaro: Department of Life Sciences and Systems Biology, University of Turin, Turin, Italy; Stazione Zoologica Anton Dohrn, Naples, Italy.
3
Baowaly MK, Sarkar BC, Walid MAA, Ahamad MM, Singh BC, Alvarado ES, Ashraf I, Samad MA. Deep transfer learning-based bird species classification using mel spectrogram images. PLoS One 2024; 19:e0305708. [PMID: 39133732] [PMCID: PMC11318847] [DOI: 10.1371/journal.pone.0305708]
Abstract
The classification of bird species is of significant importance in ornithology, as it plays an important role in assessing and monitoring environmental dynamics, including habitat modifications, migratory behaviors, pollution levels, and disease occurrences. Traditional methods of bird classification, such as visual identification, are time-intensive and require a high level of expertise. Audio-based bird species classification is a promising approach to automating bird species identification. This study aims to establish an audio-based classification system for 264 Eastern African bird species employing modified deep transfer learning. In particular, the pre-trained EfficientNet architecture was utilized for the investigation. The study fine-tunes the model to learn the pertinent patterns from mel spectrogram images specific to this bird species classification task. The fine-tuned EfficientNet model was combined with recurrent neural networks (RNNs), namely the gated recurrent unit (GRU) and long short-term memory (LSTM); the RNNs capture the temporal dependencies in audio signals, thereby enhancing classification accuracy. The dataset utilized in this work contains nearly 17,000 bird sound recordings across a diverse range of species. The experiment was conducted with several combinations of EfficientNet and RNNs; EfficientNet-B7 with GRU surpassed the other experimental models with an accuracy of 84.03% and a macro-average precision of 0.8342.
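The mel spectrogram images central to this pipeline are produced by projecting an FFT power spectrum through a bank of triangular mel-spaced filters. A minimal numpy sketch of that filterbank follows; the HTK mel formula and the parameter values (40 bands, 512-point FFT, 22.05 kHz) are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)  # HTK mel scale

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels=40, n_fft=512, sr=22050):
    """Triangular mel filters mapping FFT bins to mel bands -- the
    transform behind 'mel spectrogram images'."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        for k in range(l, c):                      # rising slope
            fb[i, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):                      # falling slope
            fb[i, k] = (r - k) / max(r - c, 1)
    return fb

fb = mel_filterbank()
print(fb.shape)  # (40, 257)
```

Multiplying a power spectrogram by `fb.T` and taking the log yields the image fed to EfficientNet; in practice a library routine (e.g. librosa) would be used instead.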
Affiliation(s)
- Mrinal Kanti Baowaly: Department of Computer Science and Engineering, Bangabandhu Sheikh Mujibur Rahman Science and Technology University, Gopalganj, Bangladesh.
- Bisnu Chandra Sarkar: Department of Computer Science and Engineering, Bangabandhu Sheikh Mujibur Rahman Science and Technology University, Gopalganj, Bangladesh.
- Md. Abul Ala Walid: Department of Computer Science and Engineering, Khulna University of Engineering and Technology, Khulna, Bangladesh; Department of Data Science, Bangabandhu Sheikh Mujibur Rahman Digital University, Gazipur, Bangladesh.
- Md. Martuza Ahamad: Department of Computer Science and Engineering, Bangabandhu Sheikh Mujibur Rahman Science and Technology University, Gopalganj, Bangladesh.
- Bikash Chandra Singh: School of Cybersecurity, Old Dominion University, Norfolk, VA, United States of America.
- Eduardo Silva Alvarado: Universidad Europea del Atlántico, Santander, Spain; Universidad Internacional Iberoamericana, Campeche, México; Universidad de La Romana, La Romana, República Dominicana.
- Imran Ashraf: Department of Information and Communication Engineering, Yeungnam University, Gyeongsan-si, Gyeongsangbuk-do, South Korea.
- Md. Abdus Samad: Department of Information and Communication Engineering, Yeungnam University, Gyeongsan-si, Gyeongsangbuk-do, South Korea.
4
Madhusudhana S, Klinck H, Symes LB. Extensive data engineering to the rescue: building a multi-species katydid detector from unbalanced, atypical training datasets. Philos Trans R Soc Lond B Biol Sci 2024; 379:20230444. [PMID: 38705172] [PMCID: PMC11070257] [DOI: 10.1098/rstb.2023.0444]
Abstract
Passive acoustic monitoring (PAM) is a powerful tool for studying ecosystems. However, its effective application in tropical environments, particularly for insects, poses distinct challenges. Neotropical katydids produce complex species-specific calls, spanning mere milliseconds to seconds and spread across broad audible and ultrasonic frequencies. However, subtle differences in inter-pulse intervals or central frequencies are often the only discriminatory traits. These extremities, coupled with low source levels and susceptibility to masking by ambient noise, challenge species identification in PAM recordings. This study aimed to develop a deep learning-based solution to automate the recognition of 31 katydid species of interest in a biodiverse Panamanian forest with over 80 katydid species. Besides the innate challenges, our efforts were also encumbered by a limited and imbalanced initial training dataset comprising domain-mismatched recordings. To overcome these, we applied rigorous data engineering, improving input variance through controlled playback re-recordings and by employing physics-based data augmentation techniques, and tuning signal-processing, model and training parameters to produce a custom well-fit solution. Methods developed here are incorporated into Koogu, an open-source Python-based toolbox for developing deep learning-based bioacoustic analysis solutions. The parametric implementations offer a valuable resource, enhancing the capabilities of PAM for studying insects in tropical ecosystems. This article is part of the theme issue 'Towards a toolkit for global insect biodiversity monitoring'.
Affiliation(s)
- Shyam Madhusudhana: Centre for Marine Science and Technology, Curtin University, Perth, Western Australia 6845, Australia; K. Lisa Yang Center for Conservation Bioacoustics, Cornell Lab of Ornithology, Cornell University, Ithaca, NY 14853-0001, USA.
- Holger Klinck: K. Lisa Yang Center for Conservation Bioacoustics, Cornell Lab of Ornithology, Cornell University, Ithaca, NY 14853-0001, USA.
- Laurel B. Symes: K. Lisa Yang Center for Conservation Bioacoustics, Cornell Lab of Ornithology, Cornell University, Ithaca, NY 14853-0001, USA; Smithsonian Tropical Research Institute, Balboa, Ancón, Panama City 0843-03092, Republic of Panama.
5
Castro-Ospina AE, Solarte-Sanchez MA, Vega-Escobar LS, Isaza C, Martínez-Vargas JD. Graph-Based Audio Classification Using Pre-Trained Models and Graph Neural Networks. Sensors (Basel) 2024; 24:2106. [PMID: 38610318] [PMCID: PMC11014159] [DOI: 10.3390/s24072106]
Abstract
Sound classification plays a crucial role in enhancing the interpretation, analysis, and use of acoustic data, leading to a wide range of practical applications, of which environmental sound analysis is one of the most important. In this paper, we explore the representation of audio data as graphs in the context of sound classification. We propose a methodology that leverages pre-trained audio models to extract deep features from audio files, which are then employed as node information to build graphs. Subsequently, we train various graph neural networks (GNNs), specifically graph convolutional networks (GCNs), GraphSAGE, and graph attention networks (GATs), to solve multi-class audio classification problems. Our findings underscore the effectiveness of employing graphs to represent audio data. Moreover, they highlight the competitive performance of GNNs in sound classification endeavors, with the GAT model emerging as the top performer, achieving a mean accuracy of 83% in classifying environmental sounds and 91% in identifying the land cover of a site based on its audio recording. In conclusion, this study provides novel insights into the potential of graph representation learning techniques for analyzing audio data.
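The step of turning per-clip deep features into graph node information can be sketched as a k-nearest-neighbour graph over embedding vectors. This is a minimal illustration under assumed choices (cosine similarity, k = 2, random vectors standing in for the pre-trained model's features); the paper's exact graph construction may differ.

```python
import numpy as np

def knn_graph(features, k=2):
    """Symmetric k-NN adjacency from per-clip feature vectors.

    Cosine similarity ranks neighbours; edges are made undirected,
    as is common before feeding a graph to a GCN/GraphSAGE/GAT.
    """
    x = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = x @ x.T
    np.fill_diagonal(sim, -np.inf)          # forbid self-loops
    adj = np.zeros_like(sim)
    idx = np.argsort(-sim, axis=1)[:, :k]   # top-k neighbours per node
    rows = np.repeat(np.arange(len(x)), k)
    adj[rows, idx.ravel()] = 1.0
    return np.maximum(adj, adj.T)           # symmetrize

emb = np.random.default_rng(2).standard_normal((6, 128))  # 6 clips, 128-d
A = knn_graph(emb, k=2)
print(A.shape)  # (6, 6)
```

The adjacency matrix `A` plus the original feature matrix is exactly the (graph, node features) pair a GNN layer consumes.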
Affiliation(s)
- Andrés Eduardo Castro-Ospina: Grupo de Investigación Máquinas Inteligentes y Reconocimiento de Patrones, Instituto Tecnológico Metropolitano, Medellín 050013, Colombia.
- Miguel Angel Solarte-Sanchez: Grupo de Investigación Máquinas Inteligentes y Reconocimiento de Patrones, Instituto Tecnológico Metropolitano, Medellín 050013, Colombia.
- Laura Stella Vega-Escobar: Grupo de Investigación Máquinas Inteligentes y Reconocimiento de Patrones, Instituto Tecnológico Metropolitano, Medellín 050013, Colombia.
- Claudia Isaza: SISTEMIC, Electronic Engineering Department, Universidad de Antioquia-UdeA, Medellín 050010, Colombia.
6
Paranayapa T, Ranasinghe P, Ranmal D, Meedeniya D, Perera C. A Comparative Study of Preprocessing and Model Compression Techniques in Deep Learning for Forest Sound Classification. Sensors (Basel) 2024; 24:1149. [PMID: 38400306] [PMCID: PMC10891629] [DOI: 10.3390/s24041149]
Abstract
Deep-learning models play a significant role in modern software solutions, with the capability to handle complex tasks, improve accuracy, automate processes, and adapt to diverse domains, contributing to advancements in various industries. This paper provides a comparative study of deep-learning techniques that can also be deployed on resource-constrained edge devices. As a novel contribution, we analyze the performance of seven convolutional neural network models in the context of data augmentation, feature extraction, and model compression using acoustic data. The results show that the best performers achieve an optimal trade-off between model accuracy and size when compressed with weight and filter pruning followed by 8-bit quantization. Using the forest sound dataset in the study workflow, MobileNet-v3-small and ACDNet achieved accuracies of 87.95% and 85.64% while maintaining compact sizes of 243 KB and 484 KB, respectively. Hence, this study concludes that CNNs can be optimized and compressed for deployment on resource-constrained edge devices to classify forest environment sounds.
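The compression recipe the abstract reports, pruning followed by 8-bit quantization, can be sketched on a single weight matrix in numpy. This is a simplified illustration (magnitude pruning at an assumed 50% sparsity, one symmetric scale per tensor); real deployments would use a framework's pruning and quantization tooling.

```python
import numpy as np

def prune_and_quantize(w, sparsity=0.5):
    """Magnitude pruning + symmetric 8-bit quantization sketch."""
    # prune: zero out the smallest-magnitude fraction of weights
    thresh = np.quantile(np.abs(w), sparsity)
    pruned = np.where(np.abs(w) < thresh, 0.0, w)
    # quantize: map remaining floats to int8 with a single scale
    scale = max(float(np.abs(pruned).max()) / 127.0, 1e-12)
    q = np.clip(np.round(pruned / scale), -127, 127).astype(np.int8)
    return q, scale  # int8 weights + one dequantization scale

w = np.random.default_rng(3).standard_normal((64, 64)).astype(np.float32)
q, scale = prune_and_quantize(w)
print(q.dtype, float((q == 0).mean()))
```

The stored model shrinks roughly 4x from the int8 dtype alone, and the zeroed weights compress further or skip multiplications on sparse-aware runtimes, which is the accuracy/size trade-off the study measures.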
Affiliation(s)
- Thivindu Paranayapa: Department of Computer Science & Engineering, University of Moratuwa, Moratuwa 10400, Sri Lanka.
- Piumini Ranasinghe: Department of Computer Science & Engineering, University of Moratuwa, Moratuwa 10400, Sri Lanka.
- Dakshina Ranmal: Department of Computer Science & Engineering, University of Moratuwa, Moratuwa 10400, Sri Lanka.
- Dulani Meedeniya: Department of Computer Science & Engineering, University of Moratuwa, Moratuwa 10400, Sri Lanka.
- Charith Perera: School of Computer Science and Informatics, Cardiff University, Cardiff CF24 3AA, UK.
7
Hyun SH. Sound-Event Detection of Water-Usage Activities Using Transfer Learning. Sensors (Basel) 2023; 24:22. [PMID: 38202884] [PMCID: PMC10780862] [DOI: 10.3390/s24010022]
Abstract
In this paper, a sound event detection method is proposed for estimating three types of bathroom activities (showering, flushing, and faucet usage) from the sounds of water usage in the bathroom. The proposed approach has a two-stage structure. First, the general sound classification network YAMNet determines whether a general water sound is present; if the input contains water sounds, W-YAMNet, a modified version of YAMNet, is triggered to identify the specific activity. W-YAMNet is designed to accommodate the acoustic characteristics of each bathroom. In training W-YAMNet, transfer learning is applied to exploit the advantages of YAMNet while addressing its limitations. Various parameters, including the length of the audio clip, were analyzed experimentally to identify trends and suitable values. The proposed method is implemented on a Raspberry-Pi-based edge computer to ensure privacy protection. Applying this methodology to 10-min segments of continuous audio data yielded promising results. However, the accuracy could be enhanced further, and using the data obtained through this approach to assess the health and safety of elderly individuals living alone remains a topic for future investigation.
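The two-stage structure described above, a cheap general gate followed by a fine-grained specialist, can be sketched generically. The function and variable names below are generic stand-ins, not the paper's API; the "clips" are toy dicts rather than audio.

```python
from typing import Callable, Optional

def two_stage_detect(clip,
                     gate: Callable[[dict], bool],
                     specialist: Callable[[dict], str]) -> Optional[str]:
    """Cascade sketch: the gate (cf. YAMNet's 'is there water sound?')
    runs first; only positive clips reach the fine-grained classifier
    (cf. W-YAMNet)."""
    if not gate(clip):
        return None           # cheap rejection: no water sound
    return specialist(clip)   # fine-grained activity label

# toy stand-ins: each 'clip' carries a precomputed flag and label
gate = lambda c: c["has_water"]
spec = lambda c: c["activity"]
print(two_stage_detect({"has_water": True, "activity": "shower"}, gate, spec))
print(two_stage_detect({"has_water": False, "activity": "-"}, gate, spec))
```

The cascade saves compute on an edge device because the expensive per-bathroom model only runs on the minority of clips the gate passes.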
Affiliation(s)
- Seung Ho Hyun: School of Electrical Engineering, University of Ulsan, Ulsan 44610, Republic of Korea.
8
Steffensen TL, Bartnes B, Fuglstad ML, Auflem M, Steinert M. Playing the pipes: acoustic sensing and machine learning for performance feedback during endotracheal intubation simulation. Front Robot AI 2023; 10:1218174. [PMID: 37965634] [PMCID: PMC10642916] [DOI: 10.3389/frobt.2023.1218174]
Abstract
Objective: In emergency medicine, airway management is a core skill that includes endotracheal intubation (ETI), a common technique that can result in ineffective ventilation and laryngotracheal injury if executed incorrectly. We present a method for automatically generating performance feedback during ETI simulator training, potentially augmenting training outcomes on robotic simulators. Method: Electret microphones recorded ultrasonic echoes pulsed through the complex geometry of a simulated airway during ETI performed on a full-size patient simulator. As the endotracheal tube is inserted deeper and the cuff is inflated, the resulting changes in geometry are reflected in the recorded signal. We trained machine learning models to classify 240 intubations distributed equally between six conditions: three insertion depths and two cuff inflation states. The best performing models were cross validated in a leave-one-subject-out scheme. Results: Best performance was achieved by transfer learning with a convolutional neural network pre-trained for sound classification, reaching global accuracy above 98% on 1-second-long audio test samples. A support vector machine trained on different features achieved a median accuracy of 85% on the full label set and 97% on a reduced label set of tube depth only. Significance: This proof-of-concept study demonstrates a method of measuring qualitative performance criteria during simulated ETI in a relatively simple way that does not damage ecological validity of the simulated anatomy. As traditional sonar is hampered by geometrical complexity compounded by the introduced equipment in ETI, the accuracy of machine learning methods in this confined design space enables application in other invasive procedures. By enabling better interaction between the human user and the robotic simulator, this approach could improve training experiences and outcomes in medical simulation for ETI as well as many other invasive clinical procedures.
Affiliation(s)
- Torjus L. Steffensen: Department of Circulation and Medical Imaging, Norwegian University of Science and Technology, Trondheim, Norway.
- Barge Bartnes: Department of Mechanical Engineering, Norwegian University of Science and Technology, Trondheim, Norway.
- Maja L. Fuglstad: Department of Mechanical Engineering, Norwegian University of Science and Technology, Trondheim, Norway.
- Marius Auflem: Department of Mechanical Engineering, Norwegian University of Science and Technology, Trondheim, Norway.
- Martin Steinert: Department of Mechanical Engineering, Norwegian University of Science and Technology, Trondheim, Norway.
9
Müller J, Mitesser O, Schaefer HM, Seibold S, Busse A, Kriegel P, Rabl D, Gelis R, Arteaga A, Freile J, Leite GA, de Melo TN, LeBien J, Campos-Cerqueira M, Blüthgen N, Tremlett CJ, Böttger D, Feldhaar H, Grella N, Falconí-López A, Donoso DA, Moriniere J, Buřivalová Z. Soundscapes and deep learning enable tracking biodiversity recovery in tropical forests. Nat Commun 2023; 14:6191. [PMID: 37848442] [PMCID: PMC10582010] [DOI: 10.1038/s41467-023-41693-w]
Abstract
Tropical forest recovery is fundamental to addressing the intertwined climate and biodiversity loss crises. While regenerating trees sequester carbon relatively quickly, the pace of biodiversity recovery remains contentious. Here, we use bioacoustics and metabarcoding to measure forest recovery post-agriculture in a global biodiversity hotspot in Ecuador. We show that the community composition, and not species richness, of vocalizing vertebrates identified by experts reflects the restoration gradient. Two automated measures - an acoustic index model and a bird community composition derived from an independently developed Convolutional Neural Network - correlated well with restoration (adj-R² = 0.62 and 0.69, respectively). Importantly, both measures reflected composition of non-vocalizing nocturnal insects identified via metabarcoding. We show that such automated monitoring tools, based on new technologies, can effectively monitor the success of forest recovery, using robust and reproducible data.
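One widely used member of the acoustic-index family this study draws on is the Acoustic Complexity Index (ACI), which rewards spectrograms whose energy fluctuates frame to frame (typical of vocal activity) over steady background hum. The sketch below is an illustration of that index class, not the study's exact model.

```python
import numpy as np

def acoustic_complexity_index(spectrogram):
    """ACI over a magnitude spectrogram (freq_bins x frames):
    per frequency bin, sum |I[t+1] - I[t]| normalized by total
    intensity in that bin, then sum over bins."""
    d = np.abs(np.diff(spectrogram, axis=1)).sum(axis=1)
    tot = spectrogram.sum(axis=1) + 1e-12   # guard empty bins
    return float((d / tot).sum())

rng = np.random.default_rng(4)
flat = np.ones((64, 100))     # constant hum: no complexity
busy = rng.random((64, 100))  # fluctuating energy, like vocal activity
print(acoustic_complexity_index(flat) < acoustic_complexity_index(busy))  # True
```

Indices like this give a single scalar per recording, which is what makes fitting a restoration-gradient regression (the adj-R² values above) straightforward.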
Affiliation(s)
- Jörg Müller: Field Station Fabrikschleichach, Department of Animal Ecology and Tropical Biology, Biocenter, University of Würzburg, Glashüttenstr. 5, 96181, Rauhenebrach, Germany; Bavarian Forest National Park, Freyungerstr. 2, 94481, Grafenau, Germany.
- Oliver Mitesser: Field Station Fabrikschleichach, Department of Animal Ecology and Tropical Biology, Biocenter, University of Würzburg, Glashüttenstr. 5, 96181, Rauhenebrach, Germany.
- H Martin Schaefer: Fundación Jocotoco, Valladolid N24-414 y Luis Cordero, Quito, Ecuador.
- Sebastian Seibold: Technical University of Munich, School of Life Sciences, Ecosystem Dynamics and Forest Management Research Group, Hans-Carl-von-Carlowitz-Platz 2, 85354, Freising, Germany; Berchtesgaden National Park, Doktorberg 6, Berchtesgaden, 83471, Germany.
- Annika Busse: Saxon-Switzerland National Park, An der Elbe 4, 01814, Bad Schandau, Germany.
- Peter Kriegel: Field Station Fabrikschleichach, Department of Animal Ecology and Tropical Biology, Biocenter, University of Würzburg, Glashüttenstr. 5, 96181, Rauhenebrach, Germany.
- Dominik Rabl: Field Station Fabrikschleichach, Department of Animal Ecology and Tropical Biology, Biocenter, University of Würzburg, Glashüttenstr. 5, 96181, Rauhenebrach, Germany.
- Rudy Gelis: Yanayacu Research Center, Cosanga, Ecuador.
- Juan Freile: Pasaje El Moro E4-216 y Norberto Salazar, EC 170902, Tumbaco, DMQ, Ecuador.
- Gabriel Augusto Leite: Rainforest Connection, Science Department, 440 Cobia Drive, Suite 1902, Katy, TX, 77494, USA.
- Jack LeBien: Rainforest Connection, Science Department, 440 Cobia Drive, Suite 1902, Katy, TX, 77494, USA.
- Nico Blüthgen: Ecological Networks Lab, Department of Biology, Technische Universität Darmstadt, Schnittspahnstr. 3, 64287, Darmstadt, Germany.
- Constance J Tremlett: Ecological Networks Lab, Department of Biology, Technische Universität Darmstadt, Schnittspahnstr. 3, 64287, Darmstadt, Germany.
- Dennis Böttger: Phyletisches Museum, Institute for Zoology and Evolutionary Research, Friedrich-Schiller-University Jena, Jena, Germany.
- Heike Feldhaar: Animal Population Ecology, Bayreuth Center for Ecology and Environmental Research (BayCEER), University of Bayreuth, 95440, Bayreuth, Germany.
- Nina Grella: Animal Population Ecology, Bayreuth Center for Ecology and Environmental Research (BayCEER), University of Bayreuth, 95440, Bayreuth, Germany.
- Ana Falconí-López: Field Station Fabrikschleichach, Department of Animal Ecology and Tropical Biology, Biocenter, University of Würzburg, Glashüttenstr. 5, 96181, Rauhenebrach, Germany; Grupo de Investigación en Biodiversidad, Medio Ambiente y Salud-BIOMAS-Universidad de las Américas, Quito, Ecuador.
- David A Donoso: Grupo de Investigación en Biodiversidad, Medio Ambiente y Salud-BIOMAS-Universidad de las Américas, Quito, Ecuador; Departamento de Biología, Facultad de Ciencias, Escuela Politécnica Nacional, Av. Ladrón de Guevara E11-253, CP 17-01-2759, Quito, Ecuador.
- Jerome Moriniere: AIM - Advanced Identification Methods GmbH, Niemeyerstr. 1, 04179, Leipzig, Germany.
- Zuzana Buřivalová: University of Wisconsin-Madison, Department of Forest and Wildlife Ecology and The Nelson Institute for Environmental Studies, 1630 Linden Drive, Madison, WI, 53706, USA.
10
Lamrini M, Chkouri MY, Touhafi A. Evaluating the Performance of Pre-Trained Convolutional Neural Network for Audio Classification on Embedded Systems for Anomaly Detection in Smart Cities. Sensors (Basel) 2023; 23:6227. [PMID: 37448075] [DOI: 10.3390/s23136227]
Abstract
Environmental Sound Recognition (ESR) plays a crucial role in smart cities by accurately categorizing audio using well-trained Machine Learning (ML) classifiers. This application is particularly valuable for cities that analyze environmental sounds to gain insights. However, deploying deep learning (DL) models on resource-constrained embedded devices, such as the Raspberry Pi (RPi) or Tensor Processing Units (TPUs), poses challenges. In this work, we evaluate an existing pre-trained model for deployment on RPi and TPU platforms in addition to a laptop. We explore the impact of the retraining parameters and compare sound classification performance across three datasets: ESC-10, BDLib, and Urban Sound. Our results demonstrate the effectiveness of the pre-trained model for transfer learning in embedded systems. On the laptop, accuracy reached 96.6% for ESC-10, 100% for BDLib, and 99% for Urban Sound. On the RPi, accuracy was 96.4% for ESC-10, 100% for BDLib, and 95.3% for Urban Sound, while on the RPi with a Coral TPU, it was 95.7% for ESC-10, 100% for BDLib, and 95.4% for Urban Sound. Utilizing pre-trained models reduces computational requirements, enabling faster inference. Leveraging pre-trained models in embedded systems accelerates the development, deployment, and performance of various real-time applications.
Affiliation(s)
- Mimoun Lamrini: Department of Engineering Sciences and Technology (INDI), Vrije Universiteit Brussel (VUB), 1050 Brussels, Belgium; SIGL Laboratory, National School of Applied Sciences of Tetuan, Abdelmalek Essaadi University, Tetuan 93000, Morocco.
- Mohamed Yassin Chkouri: SIGL Laboratory, National School of Applied Sciences of Tetuan, Abdelmalek Essaadi University, Tetuan 93000, Morocco.
- Abdellah Touhafi: Department of Engineering Sciences and Technology (INDI), Vrije Universiteit Brussel (VUB), 1050 Brussels, Belgium; Department of Electronics and Informatics (ETRO), Vrije Universiteit Brussel (VUB), 1050 Brussels, Belgium.
11
Choi Y, Lee H. Interpretation of lung disease classification with light attention connected module. Biomed Signal Process Control 2023; 84:104695. [PMID: 36879856] [PMCID: PMC9978539] [DOI: 10.1016/j.bspc.2023.104695]
Abstract
Lung diseases lead to complications from obstructive diseases, and the COVID-19 pandemic has increased lung disease-related deaths. Medical practitioners use stethoscopes to diagnose lung disease; however, because experience and the interpretation of respiratory sounds differ between practitioners, an artificial intelligence model capable of objective judgment is required. Therefore, in this study, we propose a lung disease classification model that uses an attention module and deep learning. Features were extracted from respiratory sounds using log-mel spectrograms and MFCCs. Normal sounds and five types of adventitious sounds were effectively classified by improving VGGish and adding a light attention connected module to which the efficient channel attention module (ECA-Net) was applied. The performance of the model was evaluated in terms of accuracy, precision, sensitivity, specificity, F1-score, and balanced accuracy, which were 92.56%, 92.81%, 92.22%, 98.50%, 92.29%, and 95.4%, respectively, confirming the high performance attributable to the attention effect. The causes of the lung disease classifications were analyzed using gradient-weighted class activation mapping (Grad-CAM), model performance was compared using open lung sounds recorded with a Littmann 3200 stethoscope, and experts' opinions were included. Our results will contribute to the early diagnosis and interpretation of diseases in patients with lung disease by utilizing these algorithms in smart medical stethoscopes.
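The log-mel + MFCC front end named in this abstract differs only by one step: MFCCs are the type-II DCT of a log-mel spectrogram along the mel axis. A minimal sketch from the DCT-II definition follows (no scipy needed); the sizes (40 mel bands, 13 coefficients) are illustrative assumptions, and random values stand in for a real log-mel matrix.

```python
import numpy as np

def mfcc_from_log_mel(log_mel, n_coeff=13):
    """DCT-II along the mel axis turns a log-mel spectrogram
    (frames x mel_bands) into MFCCs (frames x n_coeff)."""
    n = log_mel.shape[1]
    k = np.arange(n_coeff)[:, None]   # coefficient index
    m = np.arange(n)[None, :]         # mel band index
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    return log_mel @ basis.T

log_mel = np.random.default_rng(5).standard_normal((100, 40))
coeffs = mfcc_from_log_mel(log_mel)
print(coeffs.shape)  # (100, 13)
```

Keeping only the first few coefficients retains the smooth spectral envelope (the part most informative about adventitious sounds) while discarding fine harmonic detail.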
Affiliation(s)
- Youngjin Choi
- School of Industrial Management Engineering, Korea University, 145 Anam-ro, Seongbuk-gu, Seoul 02841, Republic of Korea
- Hongchul Lee
- School of Industrial Management Engineering, Korea University, 145 Anam-ro, Seongbuk-gu, Seoul 02841, Republic of Korea
12
A Performance Study of CNN Architectures for the Autonomous Detection of COVID-19 Symptoms Using Cough and Breathing. COMPUTERS 2023. [DOI: 10.3390/computers12020044] [Indexed: 02/22/2023]
Abstract
Deep learning (DL) methods have the potential to be used for detecting COVID-19 symptoms. However, the rationale for which DL method to use and which symptoms to detect has not yet been explored. In this paper, we present the first performance study which compares various convolutional neural network (CNN) architectures for the autonomous preliminary detection of COVID-19 from cough and/or breathing symptoms. We compare and analyze residual networks (ResNets), Visual Geometry Group networks (VGGs), AlexNet, densely connected networks (DenseNets), squeeze neural networks (SqueezeNets), and the COVID-19 identification ResNet (CIdeR) architecture to investigate their classification performance. We uniquely train and validate both unimodal and multimodal CNN architectures using the EPFL and Cambridge datasets. Performance comparison across all modes and datasets showed that VGG19 and DenseNet-201 achieved the highest unimodal and multimodal classification performance. VGG19 and DenseNet-201 had high F1 scores (0.94 and 0.92) for unimodal cough classification on the Cambridge dataset, compared to the next highest F1 score of 0.79 for ResNet, with comparable F1 scores to ResNet on the larger EPFL cough dataset. They also had consistently high accuracy, recall, and precision. For multimodal detection, VGG19 and DenseNet-201 had the highest F1 scores (0.91) compared to the other CNN structures (≤0.90), with VGG19 also having the highest accuracy and recall. Our investigation provides the foundation needed to select the appropriate deep CNN method for non-contact early COVID-19 detection.
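The architectures in this study are ranked primarily by F1 score, alongside accuracy, recall, and precision. A minimal sketch of how these binary-classification metrics are derived from confusion-matrix counts (illustrative code, not from the paper):

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (0/1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

F1 is preferred over raw accuracy here because COVID-19 audio datasets are typically class-imbalanced, where accuracy alone can be misleading.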
13
Campana MG, Delmastro F, Pagani E. Transfer learning for the efficient detection of COVID-19 from smartphone audio data. PERVASIVE AND MOBILE COMPUTING 2023; 89:101754. [PMID: 36741300] [PMCID: PMC9884612] [DOI: 10.1016/j.pmcj.2023.101754] [Received: 11/25/2022] [Revised: 01/18/2023] [Accepted: 01/25/2023] [Indexed: 06/12/2023]
Abstract
Disease detection from smartphone data represents an open research challenge in mobile health (m-health) systems. COVID-19 and its respiratory symptoms are an important case study in this area, and their early detection is a potentially valuable instrument to counteract the pandemic. The efficacy of this solution mainly depends on the performance of the AI algorithms applied to the collected data and on their possible implementation directly on the users' mobile devices. Considering these issues, and the limited amount of available data, in this paper we present an experimental evaluation of 3 different deep learning models, also compared with hand-crafted features, and of the two main approaches to transfer learning in the considered scenario: feature extraction and fine-tuning. Specifically, we considered VGGish, YAMNet, and L3-Net (including 12 different configurations) evaluated through user-independent experiments on 4 different datasets (13,447 samples in total). Results clearly show the advantages of L3-Net in all the experimental settings, as it outperforms the other solutions by 12.3% in terms of Precision-Recall AUC when used as a feature extractor, and by 10% when the model is fine-tuned. Moreover, we note that fine-tuning only the fully-connected layers of the pre-trained models generally leads to worse performance, with an average drop of 6.6% with respect to feature extraction. Finally, we evaluate the memory footprints of the different models for their possible deployment on commercial mobile devices.
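The "feature extraction" transfer-learning mode compared in this abstract keeps the pre-trained audio embedding frozen and trains only a small classifier head on its outputs. Below is a toy plain-Python sketch of that mode, with a fixed linear projection standing in for VGGish/YAMNet/L3-Net; all names, shapes, and hyperparameters here are illustrative assumptions, not the authors' code.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def extract_features(audio, frozen_weights):
    """Frozen pre-trained embedding (a stand-in for the real audio
    networks): a fixed linear projection of the raw samples."""
    return [sum(w * s for w, s in zip(row, audio)) for row in frozen_weights]

def train_head(feats, labels, lr=0.5, epochs=200):
    """Train only a logistic-regression head on frozen embeddings,
    i.e. the 'feature extraction' transfer-learning mode."""
    w = [0.0] * len(feats[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(feats, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            g = p - y  # gradient of the log-loss w.r.t. the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b
```

Fine-tuning would instead also update the embedding weights; the paper's finding is that updating only the fully-connected layers sits between the two and generally underperforms pure feature extraction.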
Affiliation(s)
- Mattia Giovanni Campana
- Institute for Informatics and Telematics of the National Research Council of Italy (IIT-CNR), Pisa, Italy
- Franca Delmastro
- Institute for Informatics and Telematics of the National Research Council of Italy (IIT-CNR), Pisa, Italy
- Elena Pagani
- Computer Science Department, University of Milano, Milan, Italy
14
Ahmed A, Serrestou Y, Raoof K, Diouris JF. Empirical Mode Decomposition-Based Feature Extraction for Environmental Sound Classification. SENSORS (BASEL, SWITZERLAND) 2022; 22:7717. [PMID: 36298067] [PMCID: PMC9612378] [DOI: 10.3390/s22207717] [Received: 08/30/2022] [Revised: 10/04/2022] [Accepted: 10/06/2022] [Indexed: 06/16/2023]
Abstract
In environmental sound classification, log-Mel band energies (MBEs) are considered the most successful and most commonly used features for classification. The underlying algorithm, the fast Fourier transform (FFT), is valid only under certain restrictions. In this study, we address these limitations of the Fourier transform and propose a new method to extract log-Mel band energies using amplitude modulation and frequency modulation. We present a comparative study between the traditionally used log-Mel band energy features extracted by the Fourier transform and log-Mel band energy features extracted by our new approach. This approach extracts log-Mel band energies from estimates of the instantaneous frequency (IF) and instantaneous amplitude (IA), which are used to construct a spectrogram. The IA and IF are estimated by combining empirical mode decomposition (EMD) with the Teager-Kaiser energy operator (TKEO) and the discrete energy separation algorithm. A Mel filter bank is then applied to the estimated spectrogram to generate EMD-TKEO-based MBEs, or simply EMD-MBEs. In addition, we employ the EMD method to remove signal trends from the original signal and generate another type of MBE, called S-MBEs, using the FFT and a Mel filter bank. Four different datasets were utilised, and convolutional neural networks (CNNs) were trained using features extracted from Fourier-transform-based MBEs (FFT-MBEs), EMD-MBEs, and S-MBEs. In addition, CNNs were trained with an aggregation of all three feature-extraction techniques and with a combination of FFT-MBEs and EMD-MBEs. Individually, FFT-MBEs achieved higher accuracy than EMD-MBEs and S-MBEs. In general, the system trained with the combination of all three features performed slightly better than the systems trained with the three features separately.
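The TKEO and energy-separation steps named in this abstract are compact enough to sketch directly: the Teager-Kaiser energy operator is psi[n] = x[n]^2 - x[n-1]*x[n+1], and a DESA-1a-style estimator recovers instantaneous frequency and amplitude from psi applied to the signal and to its backward difference. For a pure tone A*cos(W*n), psi(x) equals A^2*sin(W)^2 exactly. A plain-Python sketch (an illustration of the classical operators, not the authors' implementation):

```python
import math

def tkeo(x):
    """Teager-Kaiser energy operator: psi[n] = x[n]^2 - x[n-1]*x[n+1]."""
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

def desa_ia_if(x):
    """DESA-1a-style sketch: per-sample estimates of instantaneous
    frequency W (rad/sample) and amplitude |a| from psi(x) and
    psi of the backward difference y[n] = x[n] - x[n-1]."""
    y = [x[n] - x[n - 1] for n in range(1, len(x))]
    px = tkeo(x)
    py = tkeo(y)
    est = []
    for n in range(1, min(len(px), len(py))):
        c = 1.0 - py[n] / (2.0 * px[n])      # ~ cos(W)
        w = math.acos(max(-1.0, min(1.0, c)))
        a = math.sqrt(px[n]) / max(math.sin(w), 1e-12)
        est.append((w, a))
    return est
```

For a stationary sinusoid both estimates are constant; real signals are first split into near-monocomponent modes by EMD precisely so that this per-sample separation is meaningful.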
Affiliation(s)
- Ammar Ahmed
- Laboratoire d’Acoustique de l’Université du Mans (LAUM), UMR 6613, Institut d’Acoustique-Graduate School (IA-GS), CNRS, Le Mans Université, 72085 Le Mans, France
- Youssef Serrestou
- Laboratoire d’Acoustique de l’Université du Mans (LAUM), UMR 6613, Institut d’Acoustique-Graduate School (IA-GS), CNRS, Le Mans Université, 72085 Le Mans, France
- Kosai Raoof
- Laboratoire d’Acoustique de l’Université du Mans (LAUM), UMR 6613, Institut d’Acoustique-Graduate School (IA-GS), CNRS, Le Mans Université, 72085 Le Mans, France