1. Kershenbaum A, Akçay Ç, Babu-Saheer L, Barnhill A, Best P, Cauzinille J, Clink D, Dassow A, Dufourq E, Growcott J, Markham A, Marti-Domken B, Marxer R, Muir J, Reynolds S, Root-Gutteridge H, Sadhukhan S, Schindler L, Smith BR, Stowell D, Wascher CAF, Dunn JC. Automatic detection for bioacoustic research: a practical guide from and for biologists and computer scientists. Biol Rev Camb Philos Soc 2024. [PMID: 39417330 DOI: 10.1111/brv.13155] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2023] [Revised: 09/30/2024] [Accepted: 10/04/2024] [Indexed: 10/19/2024]
Abstract
Recent years have seen a dramatic rise in the use of passive acoustic monitoring (PAM) for biological and ecological applications, and a corresponding increase in the volume of data generated. However, data sets are often becoming so sizable that analysing them manually is increasingly burdensome and unrealistic. Fortunately, we have also seen a corresponding rise in computing power and the capability of machine learning algorithms, which offer the possibility of performing some of the analysis required for PAM automatically. Nonetheless, the field of automatic detection of acoustic events is still in its infancy in biology and ecology. In this review, we examine the trends in bioacoustic PAM applications, and their implications for the burgeoning amount of data that needs to be analysed. We explore the different methods of machine learning and other tools for scanning, analysing, and extracting acoustic events automatically from large volumes of recordings. We then provide a step-by-step practical guide for using automatic detection in bioacoustics. One of the biggest challenges for the greater use of automatic detection in bioacoustics is that there is often a gulf in expertise between the biological sciences and the field of machine learning and computer science. Therefore, this review first presents an overview of the requirements for automatic detection in bioacoustics, intended to familiarise those from a computer science background with the needs of the bioacoustics community, followed by an introduction to the key elements of machine learning and artificial intelligence that a biologist needs to understand to incorporate automatic detection into their research. We then provide a practical guide to building an automatic detection pipeline for bioacoustic data, and conclude with a discussion of possible future directions in this field.
Affiliation(s)
- Arik Kershenbaum
  - Girton College and Department of Zoology, University of Cambridge, Huntingdon Road, Cambridge, CB3 0JG, UK
- Çağlar Akçay
  - Behavioural Ecology Research Group, School of Life Sciences, Anglia Ruskin University, East Road, Cambridge, CB1 1PT, UK
- Lakshmi Babu-Saheer
  - Computing Informatics and Applications Research Group, School of Computing and Information Sciences, Anglia Ruskin University, East Road, Cambridge, CB1 1PT, UK
- Alex Barnhill
  - Pattern Recognition Lab, Department of Computer Science, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, 91058, Germany
- Paul Best
  - Université de Toulon, Aix Marseille Univ, CNRS, LIS, ILCB, CS 60584, Toulon, 83041 CEDEX 9, France
- Jules Cauzinille
  - Université de Toulon, Aix Marseille Univ, CNRS, LIS, ILCB, CS 60584, Toulon, 83041 CEDEX 9, France
- Dena Clink
  - K. Lisa Yang Center for Conservation Bioacoustics, Cornell Lab of Ornithology, Cornell University, 159 Sapsucker Woods Road, Ithaca, New York, 14850, USA
- Angela Dassow
  - Biology Department, Carthage College, 2001 Alford Park Dr, 68 David A Straz Jr, Kenosha, Wisconsin, 53140, USA
- Emmanuel Dufourq
  - African Institute for Mathematical Sciences, 7 Melrose Road, Muizenberg, Cape Town, 7441, South Africa
  - Stellenbosch University, Jan Celliers Road, Stellenbosch, 7600, South Africa
  - African Institute for Mathematical Sciences - Research and Innovation Centre, District Gasabo, Secteur Kacyiru, Cellule Kamatamu, Rue KG590 ST No 1, Kigali, Rwanda
- Jonathan Growcott
  - Centre of Ecology and Conservation, College of Life and Environmental Sciences, University of Exeter, Cornwall Campus, Exeter, TR10 9FE, UK
  - Wildlife Conservation Research Unit, Recanati-Kaplan Centre, Tubney House, Abingdon Road Tubney, Abingdon, OX13 5QL, UK
- Andrew Markham
  - Department of Computer Science, University of Oxford, Parks Road, Oxford, OX1 3QD, UK
- Ricard Marxer
  - Université de Toulon, Aix Marseille Univ, CNRS, LIS, ILCB, CS 60584, Toulon, 83041 CEDEX 9, France
- Jen Muir
  - Behavioural Ecology Research Group, School of Life Sciences, Anglia Ruskin University, East Road, Cambridge, CB1 1PT, UK
- Sam Reynolds
  - Behavioural Ecology Research Group, School of Life Sciences, Anglia Ruskin University, East Road, Cambridge, CB1 1PT, UK
- Holly Root-Gutteridge
  - School of Natural Sciences, University of Lincoln, Joseph Banks Laboratories, Beevor Street, Lincoln, Lincolnshire, LN5 7TS, UK
- Sougata Sadhukhan
  - Institute of Environment Education and Research, Pune Bharati Vidyapeeth Educational Campus, Satara Road, Pune, Maharashtra, 411 043, India
- Loretta Schindler
  - Department of Zoology, Faculty of Science, Charles University, Prague, 128 44, Czech Republic
- Bethany R Smith
  - Institute of Zoology, Zoological Society of London, Outer Circle, London, NW1 4RY, UK
- Dan Stowell
  - Tilburg University, Tilburg, The Netherlands
  - Naturalis Biodiversity Center, Darwinweg 2, Leiden, 2333 CR, The Netherlands
- Claudia A F Wascher
  - Behavioural Ecology Research Group, School of Life Sciences, Anglia Ruskin University, East Road, Cambridge, CB1 1PT, UK
- Jacob C Dunn
  - Behavioural Ecology Research Group, School of Life Sciences, Anglia Ruskin University, East Road, Cambridge, CB1 1PT, UK
  - Department of Archaeology, University of Cambridge, Downing Street, Cambridge, CB2 3DZ, UK
  - Department of Behavioral and Cognitive Biology, University of Vienna, University Biology Building (UBB), Djerassiplatz 1, Vienna, 1030, Austria
2. Madhusudhana S, Klinck H, Symes LB. Extensive data engineering to the rescue: building a multi-species katydid detector from unbalanced, atypical training datasets. Philos Trans R Soc Lond B Biol Sci 2024; 379:20230444. [PMID: 38705172 PMCID: PMC11070257 DOI: 10.1098/rstb.2023.0444] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2023] [Accepted: 02/21/2024] [Indexed: 05/07/2024] [Open Access]
Abstract
Passive acoustic monitoring (PAM) is a powerful tool for studying ecosystems. However, its effective application in tropical environments, particularly for insects, poses distinct challenges. Neotropical katydids produce complex species-specific calls, spanning mere milliseconds to seconds and spread across broad audible and ultrasonic frequencies. However, subtle differences in inter-pulse intervals or central frequencies are often the only discriminatory traits. These extremities, coupled with low source levels and susceptibility to masking by ambient noise, challenge species identification in PAM recordings. This study aimed to develop a deep learning-based solution to automate the recognition of 31 katydid species of interest in a biodiverse Panamanian forest with over 80 katydid species. Besides the innate challenges, our efforts were also encumbered by a limited and imbalanced initial training dataset comprising domain-mismatched recordings. To overcome these, we applied rigorous data engineering, improving input variance through controlled playback re-recordings and by employing physics-based data augmentation techniques, and tuning signal-processing, model and training parameters to produce a custom well-fit solution. Methods developed here are incorporated into Koogu, an open-source Python-based toolbox for developing deep learning-based bioacoustic analysis solutions. The parametric implementations offer a valuable resource, enhancing the capabilities of PAM for studying insects in tropical ecosystems. This article is part of the theme issue 'Towards a toolkit for global insect biodiversity monitoring'.
Affiliation(s)
- Shyam Madhusudhana
  - Centre for Marine Science and Technology, Curtin University, Perth, Western Australia 6845, Australia
  - K. Lisa Yang Center for Conservation Bioacoustics, Cornell Lab of Ornithology, Cornell University, Ithaca, NY 14853-0001, USA
- Holger Klinck
  - K. Lisa Yang Center for Conservation Bioacoustics, Cornell Lab of Ornithology, Cornell University, Ithaca, NY 14853-0001, USA
- Laurel B. Symes
  - K. Lisa Yang Center for Conservation Bioacoustics, Cornell Lab of Ornithology, Cornell University, Ithaca, NY 14853-0001, USA
  - Smithsonian Tropical Research Institute, Balboa, Ancón, Panama City 0843-03092, Republic of Panama
3. Yang W, Chang W, Song Z, Niu F, Wang X, Zhang Y. Denoising odontocete echolocation clicks using a hybrid model with convolutional neural network and long short-term memory network. The Journal of the Acoustical Society of America 2023; 154:938-947. [PMID: 37581404 DOI: 10.1121/10.0020560] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2023] [Accepted: 07/17/2023] [Indexed: 08/16/2023]
Abstract
Ocean noise negatively influences the recording of odontocete echolocation clicks. In this study, a hybrid model based on the convolutional neural network (CNN) and long short-term memory (LSTM) network, called a hybrid CNN-LSTM model, was proposed to denoise echolocation clicks. To learn the model parameters, the echolocation clicks were partially corrupted by adding ocean noise, and the model was trained to recover the original echolocation clicks. It can be difficult to collect large numbers of echolocation clicks free of ambient sea noise for training networks. Data augmentation and transfer learning were employed to address this problem. Based on Gabor functions, simulated echolocation clicks were generated to pre-train the network models, and the parameters of the networks were then fine-tuned using odontocete echolocation clicks. Finally, the performance of the proposed model was evaluated using synthetic data. The experimental results demonstrated the effectiveness of the proposed model for denoising two typical types of echolocation clicks, namely narrowband high-frequency and broadband echolocation clicks. The denoising performance of hybrid models with different numbers of convolutional and LSTM layers was evaluated. Consequently, hybrid models with one convolutional layer and multiple LSTM layers are recommended, which can be adopted for denoising both types of echolocation clicks.
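Note: the training-pair construction summarised in this abstract (a simulated Gabor-function click as the clean target, with a noise-corrupted copy as the model input) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the sampling rate, the centre frequency and the envelope width are all hypothetical values chosen for illustration, and white Gaussian noise stands in for recorded ocean noise.

```python
import numpy as np


def gabor_click(fs=500_000, duration=200e-6, f0=130_000.0, sigma=20e-6, phase=0.0):
    """Return a Gabor function: a cosine carrier under a Gaussian envelope.

    All default values are illustrative assumptions, not taken from the paper.
    """
    n = int(round(fs * duration))
    t = np.arange(n) / fs
    t0 = t[-1] / 2  # centre the envelope in the analysis window
    envelope = np.exp(-((t - t0) ** 2) / (2 * sigma ** 2))
    return envelope * np.cos(2 * np.pi * f0 * (t - t0) + phase)


def add_noise_at_snr(clean, snr_db, rng):
    """Corrupt `clean` with white Gaussian noise scaled to a target SNR in dB.

    White noise is a stand-in here; the paper describes adding ocean noise.
    """
    noise = rng.standard_normal(clean.shape)
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    # Scale the noise so that p_clean / (scale**2 * p_noise) == 10**(snr_db / 10).
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + scale * noise


rng = np.random.default_rng(0)
clean = gabor_click()                                  # target for the denoiser
noisy = add_noise_at_snr(clean, snr_db=5.0, rng=rng)   # input to the denoiser
```

Pairs of `noisy` and `clean` arrays built this way, with the click parameters and SNR varied, are the kind of synthetic data a denoising network could be pre-trained on before fine-tuning on recorded clicks.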
Affiliation(s)
- Wuyi Yang
  - Key Laboratory of Underwater Acoustic Communication and Marine Information Technology of the Ministry of Education, College of Ocean and Earth Sciences, Xiamen University, Xiamen, People's Republic of China
- Wenlei Chang
  - Key Laboratory of Underwater Acoustic Communication and Marine Information Technology of the Ministry of Education, College of Ocean and Earth Sciences, Xiamen University, Xiamen, People's Republic of China
- Zhongchang Song
  - Key Laboratory of Underwater Acoustic Communication and Marine Information Technology of the Ministry of Education, College of Ocean and Earth Sciences, Xiamen University, Xiamen, People's Republic of China
- Fuqiang Niu
  - Laboratory of Marine Biology and Ecology, Third Institute of Oceanography, Ministry of Natural Resources, Xiamen, People's Republic of China
- Xianyan Wang
  - Laboratory of Marine Biology and Ecology, Third Institute of Oceanography, Ministry of Natural Resources, Xiamen, People's Republic of China
- Yu Zhang
  - Key Laboratory of Underwater Acoustic Communication and Marine Information Technology of the Ministry of Education, College of Ocean and Earth Sciences, Xiamen University, Xiamen, People's Republic of China
5. Stowell D. Computational bioacoustics with deep learning: a review and roadmap. PeerJ 2022; 10:e13152. [PMID: 35341043 PMCID: PMC8944344 DOI: 10.7717/peerj.13152] [Citation(s) in RCA: 50] [Impact Index Per Article: 25.0] [Received: 12/22/2021] [Accepted: 03/01/2022] [Indexed: 01/20/2023] [Open Access]
Abstract
Animal vocalisations and natural soundscapes are fascinating objects of study, and contain valuable evidence about animal behaviours, populations and ecosystems. They are studied in bioacoustics and ecoacoustics, with signal processing and analysis an important component. Computational bioacoustics has accelerated in recent decades due to the growth of affordable digital sound recording devices, and to huge progress in informatics such as big data, signal processing and machine learning. Methods are inherited from the wider field of deep learning, including speech and image processing. However, the tasks, demands and data characteristics are often different from those addressed in speech or music analysis. There remain unsolved problems, and tasks for which evidence is surely present in many acoustic signals, but not yet realised. In this paper I perform a review of the state of the art in deep learning for computational bioacoustics, aiming to clarify key concepts and identify and analyse knowledge gaps. Based on this, I offer a subjective but principled roadmap for computational bioacoustics with deep learning: topics that the community should aim to address, in order to make the most of future developments in AI and informatics, and to use audio data in answering zoological and ecological questions.
Affiliation(s)
- Dan Stowell
  - Department of Cognitive Science and Artificial Intelligence, Tilburg University, Tilburg, The Netherlands
  - Naturalis Biodiversity Center, Leiden, The Netherlands