1
Mielke A, Badihi G, Graham KE, Grund C, Hashimoto C, Piel AK, Safryghin A, Slocombe KE, Stewart F, Wilke C, Zuberbühler K, Hobaiter C. Many morphs: Parsing gesture signals from the noise. Behav Res Methods 2024; 56:6520-6537. [PMID: 38438657] [PMCID: PMC11362259] [DOI: 10.3758/s13428-024-02368-6]
Abstract
Parsing signals from noise is a general problem for signallers and recipients, and for researchers studying communicative systems. Substantial efforts have been invested in comparing how other species encode information and meaning, and how signalling is structured. However, research depends on identifying and discriminating signals that represent meaningful units of analysis. Early approaches to defining signal repertoires applied top-down approaches, classifying cases into predefined signal types. Recently, more labour-intensive methods have taken a bottom-up approach, describing detailed features of each signal and clustering cases based on previously undetectable patterns of similarity in multi-dimensional feature space. Nevertheless, it remains essential to assess whether the resulting repertoires are composed of relevant units from the perspective of the species using them, and to redefine repertoires when additional data become available. In this paper, we provide a framework that takes data from the largest set of wild chimpanzee (Pan troglodytes) gestures currently available, splits gesture types at a fine scale based on modifying features of gesture expression using latent class analysis (a model-based cluster-detection algorithm for categorical variables), and then determines whether this splitting process reduces uncertainty about the goal or community of the gesture. Our method allows different features of interest to be incorporated into the splitting process, providing substantial future flexibility across, for example, species, populations, and levels of signal granularity. In doing so, we provide a powerful tool that allows researchers interested in gestural communication to establish repertoires of relevant units for subsequent analyses within and between systems of communication.
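To make the model-based clustering step concrete, here is a minimal Python sketch of latent class analysis fitted by expectation-maximization to binary gesture-modifier indicators. The data, feature count, and class count are hypothetical; this is not the paper's pipeline, only the generic technique it names.

```python
import numpy as np

def lca_em(X, n_classes, n_iter=200, seed=0):
    """EM for a latent class model (mixture of independent Bernoullis)
    on a binary case-by-feature matrix X of shape (n_cases, n_features)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    pi = np.full(n_classes, 1.0 / n_classes)           # class weights
    theta = rng.uniform(0.25, 0.75, (n_classes, d))    # P(feature = 1 | class)
    for _ in range(n_iter):
        # E-step: posterior class responsibilities for every case
        log_post = X @ np.log(theta.T) + (1 - X) @ np.log(1 - theta.T) + np.log(pi)
        log_post -= log_post.max(axis=1, keepdims=True)
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights and conditional probabilities
        nk = resp.sum(axis=0)
        pi = nk / n
        theta = np.clip(resp.T @ X / nk[:, None], 1e-6, 1 - 1e-6)
    return pi, theta, resp

# Hypothetical usage: 500 gesture cases scored on 12 binary modifiers,
# split into 3 candidate morphs; hard assignments via argmax.
X = (np.random.default_rng(1).random((500, 12)) > 0.5).astype(float)
pi, theta, resp = lca_em(X, n_classes=3)
morph = resp.argmax(axis=1)
```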
Affiliation(s)
- Alexander Mielke
- Wild Minds Lab, School of Psychology and Neuroscience, University of St Andrews, St Andrews, UK
- School of Biological and Behavioural Sciences, Queen Mary University of London, London, UK
- Gal Badihi
- Wild Minds Lab, School of Psychology and Neuroscience, University of St Andrews, St Andrews, UK
- Kirsty E Graham
- Wild Minds Lab, School of Psychology and Neuroscience, University of St Andrews, St Andrews, UK
- Charlotte Grund
- Wild Minds Lab, School of Psychology and Neuroscience, University of St Andrews, St Andrews, UK
- Chie Hashimoto
- Primate Research Institute, Kyoto University, Kyoto, Japan
- Alex K Piel
- Department of Anthropology, University College London, London, UK
- Department of Human Origins, Max Planck Institute of Evolutionary Anthropology, Leipzig, Germany
- Alexandra Safryghin
- Wild Minds Lab, School of Psychology and Neuroscience, University of St Andrews, St Andrews, UK
- Fiona Stewart
- Department of Anthropology, University College London, London, UK
- Department of Human Origins, Max Planck Institute of Evolutionary Anthropology, Leipzig, Germany
- Claudia Wilke
- Department of Psychology, University of York, York, UK
- Klaus Zuberbühler
- Institute of Biology, University of Neuchâtel, Neuchâtel, Switzerland
- Catherine Hobaiter
- Wild Minds Lab, School of Psychology and Neuroscience, University of St Andrews, St Andrews, UK
2
Schilling A, Gerum R, Boehm C, Rasheed J, Metzner C, Maier A, Reindl C, Hamer H, Krauss P. Deep learning based decoding of single local field potential events. Neuroimage 2024; 297:120696. [PMID: 38909761] [DOI: 10.1016/j.neuroimage.2024.120696]
Abstract
How is information processed in the cerebral cortex? In most cases, recorded brain activity is averaged over many (stimulus) repetitions, which erases the fine structure of the neural signal. However, the brain is obviously a single-trial processor. Thus, we here demonstrate that an unsupervised machine learning approach can be used to extract meaningful information from electrophysiological recordings on a single-trial basis. We use an autoencoder network to reduce the dimensions of single local field potential (LFP) events to create interpretable clusters of different neural activity patterns. Strikingly, certain LFP shapes correspond to latency differences between recording channels. Hence, LFP shapes can be used to determine the direction of information flow in the cerebral cortex. Furthermore, after clustering, we decoded the cluster centroids to reverse-engineer the underlying prototypical LFP event shapes. To evaluate our approach, we applied it to both extracellular neural recordings in rodents and intracranial EEG recordings in humans. Finally, we find that single-channel LFP event shapes during spontaneous activity sample from the realm of possible stimulus-evoked event shapes, a finding which so far had only been demonstrated for multi-channel population coding.
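As an illustration of the general recipe (compress single events with an autoencoder, cluster the latent codes, decode centroids into prototypical shapes), here is a minimal PyTorch sketch on synthetic data; the architecture, latent size, and cluster count are assumptions, not the authors' configuration.

```python
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

class LFPAutoencoder(nn.Module):
    """Compress fixed-length single-trial LFP events into a small latent code."""
    def __init__(self, n_samples=256, n_latent=8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_samples, 64), nn.ReLU(), nn.Linear(64, n_latent))
        self.decoder = nn.Sequential(nn.Linear(n_latent, 64), nn.ReLU(), nn.Linear(64, n_samples))
    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

events = torch.randn(1000, 256)                # hypothetical detected LFP events
model = LFPAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):                           # plain reconstruction training
    recon, _ = model(events)
    loss = nn.functional.mse_loss(recon, events)
    opt.zero_grad()
    loss.backward()
    opt.step()

with torch.no_grad():
    _, z = model(events)                       # latent code per event
    km = KMeans(n_clusters=5, n_init=10).fit(z.numpy())
    # Decoding each cluster centroid reverse-engineers a prototypical event shape.
    prototypes = model.decoder(torch.tensor(km.cluster_centers_, dtype=torch.float32))
```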
Affiliation(s)
- Achim Schilling
- Neuroscience Lab, University Hospital Erlangen, Germany
- Cognitive Computational Neuroscience Group, University Erlangen-Nürnberg, Germany
- Richard Gerum
- Cognitive Computational Neuroscience Group, University Erlangen-Nürnberg, Germany
- Department of Physics and Center for Vision Research, York University, Toronto, Canada
- Claudia Boehm
- Neuroscience Lab, University Hospital Erlangen, Germany
- Cognitive Computational Neuroscience Group, University Erlangen-Nürnberg, Germany
- Jwan Rasheed
- Neuroscience Lab, University Hospital Erlangen, Germany
- Cognitive Computational Neuroscience Group, University Erlangen-Nürnberg, Germany
- Claus Metzner
- Cognitive Computational Neuroscience Group, University Erlangen-Nürnberg, Germany
- Pattern Recognition Lab, University Erlangen-Nürnberg, Germany
- Andreas Maier
- Pattern Recognition Lab, University Erlangen-Nürnberg, Germany
- Caroline Reindl
- Epilepsy Center, Department of Neurology, University Hospital Erlangen, Germany
- Hajo Hamer
- Epilepsy Center, Department of Neurology, University Hospital Erlangen, Germany
- Patrick Krauss
- Cognitive Computational Neuroscience Group, University Erlangen-Nürnberg, Germany
- Pattern Recognition Lab, University Erlangen-Nürnberg, Germany
3
Mahbub T, Bhagwagar A, Chand P, Zualkernan I, Judas J, Dghaym D. Bat2Web: A Framework for Real-Time Classification of Bat Species Echolocation Signals Using Audio Sensor Data. Sensors (Basel) 2024; 24:2899. [PMID: 38733008] [PMCID: PMC11086295] [DOI: 10.3390/s24092899]
Abstract
Bats play a pivotal role in maintaining ecological balance, and studying their behaviors offers vital insights into environmental health and aids in conservation efforts. Determining the presence of various bat species in an environment is essential for many bat studies. Specialized audio sensors can be used to record bat echolocation calls that can then be used to identify bat species. However, the complexity of bat calls presents a significant challenge, necessitating expert analysis and extensive time for accurate interpretation. Recent advances in neural networks can help identify bat species automatically from their echolocation calls. Such neural networks can be integrated into a complete end-to-end system that leverages recent Internet of Things (IoT) technologies with long-range, low-power communication protocols to implement automated acoustic monitoring. This paper presents the design and implementation of such a system that uses a tiny neural network for interpreting sensor data derived from bat echolocation signals. A highly compact convolutional neural network (CNN) model was developed that demonstrated excellent performance in bat species identification, achieving an F1-score of 0.9578 and an accuracy rate of 97.5%. The neural network was deployed, and its performance was evaluated on various alternative edge devices, including the NVIDIA Jetson Nano and Google Coral.
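A sketch of what such a highly compact spectrogram classifier can look like in PyTorch; the layer sizes, input shape, and species count here are hypothetical, not the Bat2Web model.

```python
import torch
import torch.nn as nn

class TinyBatCNN(nn.Module):
    """A deliberately small CNN over call spectrograms, sized for edge devices."""
    def __init__(self, n_species=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),           # global pooling keeps it input-size agnostic
        )
        self.classifier = nn.Linear(32, n_species)
    def forward(self, x):                      # x: (batch, 1, freq, time)
        return self.classifier(self.features(x).flatten(1))

model = TinyBatCNN(n_species=10)
logits = model(torch.randn(4, 1, 128, 64))     # four hypothetical call spectrograms
print(sum(p.numel() for p in model.parameters()))  # only a few thousand parameters
```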
Affiliation(s)
- Taslim Mahbub
- Department of Computer Science and Engineering, American University of Sharjah, Sharjah 26666, United Arab Emirates
- Azadan Bhagwagar
- Department of Computer Science and Engineering, American University of Sharjah, Sharjah 26666, United Arab Emirates
- Priyanka Chand
- Department of Computer Science and Engineering, American University of Sharjah, Sharjah 26666, United Arab Emirates
- Imran Zualkernan
- Department of Computer Science and Engineering, American University of Sharjah, Sharjah 26666, United Arab Emirates
- Jacky Judas
- Nature & Ecosystem Restoration, Soudah Development, Riyadh 13519, Saudi Arabia
- Dana Dghaym
- Department of Computer Science and Engineering, American University of Sharjah, Sharjah 26666, United Arab Emirates
4
Noda T, Koizumi T, Yukitake N, Yamamoto D, Nakaizumi T, Tanaka K, Okuyama J, Ichikawa K, Hara T. Animal-borne soundscape logger as a system for edge classification of sound sources and data transmission for monitoring near-real-time underwater soundscape. Sci Rep 2024; 14:6394. [PMID: 38493174] [PMCID: PMC10944488] [DOI: 10.1038/s41598-024-56439-x]
Abstract
The underwater environment is filled with various sounds, with its soundscape composed of biological, geophysical, and anthropogenic sounds. Our work focused on developing a novel method to observe and classify these sounds, enriching our understanding of the underwater ecosystem. We constructed a biologging system allowing near-real-time observation of underwater soundscapes. Utilizing deep-learning-based edge processing, this system classifies the sources of sounds and, upon the tagged animal surfacing, transmits positional data, the results of sound source classification, and sensor readings such as depth and temperature. To test the system, we attached the logger to sea turtles (Chelonia mydas) and collected data through a cellular network. The data provided information on the location-specific sounds detected by the sea turtles, suggesting the possibility of inferring the distribution of specific species of organisms over time. The data showed that not only biological sounds but also geophysical and anthropogenic sounds can be classified, highlighting the potential for conducting multi-point, long-term observations to monitor the distribution patterns of various sound sources. This system, which can be considered an autonomous mobile platform for oceanographic observations, including soundscapes, has significant potential to enhance our understanding of acoustic diversity.
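The classify-at-depth, transmit-at-surface logic can be sketched as a simple buffer. All names and callbacks below are hypothetical stand-ins for the logger's firmware, not the authors' implementation.

```python
import json
import time
from collections import deque

class SoundscapeLogger:
    """Buffer on-board classifications while submerged; flush on surfacing."""
    def __init__(self, classifier, transmit):
        self.classifier = classifier           # e.g., a quantized CNN on the tag
        self.transmit = transmit               # e.g., a cellular-uplink callback
        self.buffer = deque()

    def on_audio_window(self, clip, depth_m, temp_c, position):
        label, score = self.classifier(clip)   # edge classification of the sound source
        self.buffer.append({
            "t": time.time(), "label": label, "score": score,
            "depth_m": depth_m, "temp_c": temp_c, "position": position,
        })

    def on_surface(self):
        # The animal has surfaced: send everything accumulated at depth.
        while self.buffer:
            self.transmit(json.dumps(self.buffer.popleft()))
```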
Affiliation(s)
- Kotaro Tanaka
- Japan Fisheries Science and Technology Association, Tokyo, Japan
- Ocean Policy Research Institute of the Sasakawa Peace Foundation, Tokyo, Japan
- Junichi Okuyama
- Fisheries Technology Institute, Japan Fisheries Research and Education Agency, Okinawa, Japan
- Kotaro Ichikawa
- Field Science Education and Research Center, Kyoto University, Kyoto, Japan
- Takeshi Hara
- Japan Fisheries Science and Technology Association, Tokyo, Japan
5
Kather V, Seipel F, Berges B, Davis G, Gibson C, Harvey M, Henry LA, Stevenson A, Risch D. Development of a machine learning detector for North Atlantic humpback whale song. J Acoust Soc Am 2024; 155:2050-2064. [PMID: 38477612] [DOI: 10.1121/10.0025275]
Abstract
The study of humpback whale song using passive acoustic monitoring devices requires bioacousticians to manually review hours of audio recordings to annotate the signals. To vastly reduce the time of manual annotation through automation, a machine learning model was developed. Convolutional neural networks have made major advances in the previous decade, leading to a wide range of applications, including the detection of frequency-modulated vocalizations by cetaceans. A large dataset of over 60 000 audio segments of 4 s length was collected from the North Atlantic and used to fine-tune an existing model for humpback whale song detection in the North Pacific (see Allen, Harvey, Harrell, Jansen, Merkens, Wall, Cattiau, and Oleson (2021). Front. Mar. Sci. 8, 607321). Furthermore, different data augmentation techniques (time-shift, noise augmentation, and masking) were used to artificially increase the variability within the training set. Retraining and augmentation yield F-score values of 0.88 on a context-window basis and 0.89 on an hourly basis, with false-positive rates of 0.05 and 0.01, respectively. If necessary, usage and retraining of the existing model are made convenient by a framework (AcoDet, acoustic detector) built during this project. Combining the tools provided by this framework could save researchers hours of manual annotation time and, thus, accelerate their research.
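The three augmentations are standard operations on spectrograms; a NumPy sketch follows, with the window size and parameter choices as assumptions rather than the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def time_shift(spec, max_shift=16):
    """Roll the spectrogram along its time axis by a random offset."""
    return np.roll(spec, rng.integers(-max_shift, max_shift + 1), axis=1)

def add_noise(spec, snr_db=10.0):
    """Mix in Gaussian noise scaled to a target signal-to-noise ratio (dB)."""
    noise = rng.normal(size=spec.shape)
    scale = np.sqrt(spec.var() / (10 ** (snr_db / 10) * noise.var()))
    return spec + scale * noise

def freq_mask(spec, max_width=8):
    """Zero out a random block of frequency bins (SpecAugment-style masking)."""
    out = spec.copy()
    f0 = rng.integers(0, spec.shape[0] - max_width)
    out[f0:f0 + rng.integers(1, max_width + 1), :] = 0.0
    return out

spec = np.abs(rng.normal(size=(64, 128)))      # hypothetical 4-s spectrogram window
augmented = freq_mask(add_noise(time_shift(spec)))
```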
Affiliation(s)
- Vincent Kather
- Audio Communication and Technology, Technical University Berlin, Einsteinufer 17c, 10587 Berlin, Germany
- Fabian Seipel
- Audio Communication and Technology, Technical University Berlin, Einsteinufer 17c, 10587 Berlin, Germany
- Benoit Berges
- Wageningen Marine Research, Wageningen University and Research, IJmuiden, Noord Holland, 1976 CP, Netherlands
- Genevieve Davis
- National Oceanic and Atmospheric Administration (NOAA) Northeast Fisheries Science Center, 166 Water Street, Woods Hole, Massachusetts 02543, USA
- Catherine Gibson
- School of Biological Sciences, Queens University Belfast, Belfast, BT9 5DL, Northern Ireland
- Matt Harvey
- Google Inc., Mountain View, California 94043, USA
- Lea-Anne Henry
- School of GeoSciences, University of Edinburgh, James Hutton Road, EH9 3FE, Edinburgh, Scotland
- Denise Risch
- Scottish Association for Marine Science, University of Highlands and Islands, Oban, PA37 1QJ, Scotland
6
Fernandez-Betelu O, Iorio-Merlo V, Graham IM, Cheney BJ, Prentice SM, Cheng RX, Thompson PM. Variation in foraging activity influences area-restricted search behaviour by bottlenose dolphins. R Soc Open Sci 2023; 10:221613. [PMID: 37325592] [PMCID: PMC10265022] [DOI: 10.1098/rsos.221613]
Abstract
Area-restricted search (ARS) behaviour is commonly used to characterize spatio-temporal variation in foraging activity of predators, but evidence of the drivers underlying this behaviour in marine systems is sparse. Advances in underwater sound recording techniques and automated processing of acoustic data now provide opportunities to investigate these questions where species use different vocalizations when encountering prey. Here, we used passive acoustics to investigate drivers of ARS behaviour in a population of dolphins and determined whether residency in key foraging areas increased following encounters with prey. Analyses were based on two independent proxies of foraging: echolocation buzzes (widely used as foraging proxies) and bray calls (vocalizations linked to salmon predation attempts). Echolocation buzzes were extracted from echolocation data loggers and bray calls from broadband recordings by a convolutional neural network. We found a strong positive relationship between the duration of encounters and the frequency of both foraging proxies, supporting the theory that bottlenose dolphins engage in ARS behaviour in response to higher prey encounter rates. This study provides empirical evidence for one driver of ARS behaviour and demonstrates the potential for applying passive acoustic monitoring in combination with deep learning-based techniques to investigate the behaviour of vocal animals.
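One plausible way to test such a relationship (not the authors' model) is a Poisson regression of per-encounter foraging-proxy counts on encounter duration; a sketch with simulated data, using statsmodels.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
# Hypothetical per-encounter data: duration (min) and counted foraging proxies
# (echolocation buzzes), with the buzz *rate* rising in longer encounters.
duration = rng.uniform(1, 60, 200)
buzzes = rng.poisson(duration * (0.2 + 0.01 * duration))

# Poisson GLM of buzz counts with log(duration) as an offset: the duration
# coefficient then tests whether the buzz rate increases with encounter length.
X = sm.add_constant(duration)
fit = sm.GLM(buzzes, X, family=sm.families.Poisson(), offset=np.log(duration)).fit()
print(fit.params, fit.pvalues)
```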
Affiliation(s)
- Oihane Fernandez-Betelu
- Lighthouse Field Station, School of Biological Sciences, University of Aberdeen, Cromarty IV11 8YL, UK
- Virginia Iorio-Merlo
- Lighthouse Field Station, School of Biological Sciences, University of Aberdeen, Cromarty IV11 8YL, UK
- Isla M. Graham
- Lighthouse Field Station, School of Biological Sciences, University of Aberdeen, Cromarty IV11 8YL, UK
- Barbara J. Cheney
- Lighthouse Field Station, School of Biological Sciences, University of Aberdeen, Cromarty IV11 8YL, UK
- Simone M. Prentice
- Lighthouse Field Station, School of Biological Sciences, University of Aberdeen, Cromarty IV11 8YL, UK
- Rachael Xi Cheng
- Leibniz Institute for Zoo and Wildlife Research (IZW), Berlin 10315, Germany
- Paul M. Thompson
- Lighthouse Field Station, School of Biological Sciences, University of Aberdeen, Cromarty IV11 8YL, UK
7
Bergler C, Smeele SQ, Tyndel SA, Barnhill A, Ortiz ST, Kalan AK, Cheng RX, Brinkløv S, Osiecka AN, Tougaard J, Jakobsen F, Wahlberg M, Nöth E, Maier A, Klump BC. ANIMAL-SPOT enables animal-independent signal detection and classification using deep learning. Sci Rep 2022; 12:21966. [PMID: 36535999] [PMCID: PMC9763499] [DOI: 10.1038/s41598-022-26429-y]
Abstract
Bioacoustic research spans a wide range of biological questions and applications, relying on identification of target species or smaller acoustic units, such as distinct call types. However, manually identifying the signal of interest is time-intensive, error-prone, and becomes unfeasible with large data volumes. Therefore, machine-driven algorithms are increasingly applied to various bioacoustic signal identification challenges. Nevertheless, biologists still have major difficulties trying to transfer existing animal- and/or scenario-related machine learning approaches to their specific animal datasets and scientific questions. This study presents an animal-independent, open-source deep learning framework, along with a detailed user guide. Three signal identification tasks, commonly encountered in bioacoustics research, were investigated: (1) target signal vs. background noise detection, (2) species classification, and (3) call type categorization. ANIMAL-SPOT successfully segmented human-annotated target signals in data volumes representing 10 distinct animal species and 1 additional genus, resulting in a mean test accuracy of 97.9%, together with an average area under the ROC curve (AUC) of 95.9%, when predicting on unseen recordings. Moreover, an average segmentation accuracy and F1-score of 95.4% was achieved on the publicly available BirdVox-Full-Night data corpus. In addition, multi-class species and call type classification resulted in 96.6% and 92.7% accuracy on unseen test data, as well as 95.2% and 88.4% regarding previous animal-specific machine-based detection excerpts. Furthermore, an Unweighted Average Recall (UAR) of 89.3% outperformed the multi-species classification baseline system of the ComParE 2021 Primate Sub-Challenge. Besides animal independence, ANIMAL-SPOT does not rely on expert knowledge or special computing resources, thereby making deep-learning-based bioacoustic signal identification accessible to a broad audience.
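The headline metrics here are standard and easy to reproduce; for instance, Unweighted Average Recall (UAR), the ComParE measure, is simply macro-averaged recall. A small sketch with hypothetical labels:

```python
from sklearn.metrics import recall_score, accuracy_score, f1_score

# Hypothetical labels for a multi-class call-type task.
y_true = ["grunt", "scream", "grunt", "hoot", "scream", "hoot", "grunt"]
y_pred = ["grunt", "scream", "hoot", "hoot", "scream", "grunt", "grunt"]

# UAR is recall averaged over classes, unweighted by class frequency.
uar = recall_score(y_true, y_pred, average="macro")
acc = accuracy_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred, average="macro")
print(f"UAR={uar:.3f} accuracy={acc:.3f} macro-F1={f1:.3f}")
```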
Affiliation(s)
- Christian Bergler
- Pattern Recognition Lab, Department of Computer Science, Friedrich-Alexander-Universität Erlangen-Nürnberg, 91058 Erlangen, Germany
- Simeon Q. Smeele
- Cognitive and Cultural Ecology Lab, Max Planck Institute of Animal Behavior, 78315 Radolfzell, Germany
- Department of Human Behavior, Ecology and Culture, Max Planck Institute for Evolutionary Anthropology, 04103 Leipzig, Germany
- Biology Department, University of Konstanz, 78464 Constance, Germany
- Stephen A. Tyndel
- Cognitive and Cultural Ecology Lab, Max Planck Institute of Animal Behavior, 78315 Radolfzell, Germany
- Department of Natural Resources and Environmental Sciences, University of Illinois Urbana-Champaign, Champaign, IL, United States
- Alexander Barnhill
- Pattern Recognition Lab, Department of Computer Science, Friedrich-Alexander-Universität Erlangen-Nürnberg, 91058 Erlangen, Germany
- Sara T. Ortiz
- Max Planck Institute for Biological Intelligence (in Foundation), Eberhard-Gwinner-Strasse, Seewiesen, 82319 Starnberg, Germany
- Ammie K. Kalan
- Department of Anthropology, University of Victoria, Victoria, BC V8P 5C2, Canada
- Rachael Xi Cheng
- Leibniz Institute for Zoo and Wildlife Research, Alfred-Kowalke-Straße 17, 10315 Berlin, Germany
- Signe Brinkløv
- Department of Bioscience, Wildlife Ecology, Aarhus University, 8410 Rønde, Denmark
- Anna N. Osiecka
- Department of Vertebrate Ecology and Zoology, Faculty of Biology, University of Gdańsk, 80-308 Gdańsk, Poland
- Jakob Tougaard
- Department of Bioscience, Marine Mammal Research, Aarhus University, 4000 Roskilde, Denmark
- Freja Jakobsen
- Department of Biology, University of Southern Denmark, 5230 Odense, Denmark
- Magnus Wahlberg
- Department of Biology, University of Southern Denmark, 5230 Odense, Denmark
- Elmar Nöth
- Pattern Recognition Lab, Department of Computer Science, Friedrich-Alexander-Universität Erlangen-Nürnberg, 91058 Erlangen, Germany
- Andreas Maier
- Pattern Recognition Lab, Department of Computer Science, Friedrich-Alexander-Universität Erlangen-Nürnberg, 91058 Erlangen, Germany
- Barbara C. Klump
- Cognitive and Cultural Ecology Lab, Max Planck Institute of Animal Behavior, 78315 Radolfzell, Germany
8
Mutanu L, Gohil J, Gupta K, Wagio P, Kotonya G. A Review of Automated Bioacoustics and General Acoustics Classification Research. Sensors (Basel) 2022; 22:8361. [PMID: 36366061] [PMCID: PMC9658612] [DOI: 10.3390/s22218361]
Abstract
Automated bioacoustics classification has received increasing attention from the research community in recent years due to its cross-disciplinary nature and its diverse applications. Applications in bioacoustics classification range from smart acoustic sensor networks that investigate the effects of acoustic vocalizations on species to context-aware edge devices that anticipate changes in their environment and adapt their sensing and processing accordingly. The research described here is an in-depth survey of the current state of bioacoustics classification and monitoring. The survey examines bioacoustics classification alongside general acoustics to provide a representative picture of the research landscape. It reviewed 124 studies spanning eight years of research, identifying the key application areas in bioacoustics research and the techniques used in audio transformation and feature extraction. It also examines the classification algorithms used in bioacoustics systems. Lastly, it examines current challenges, possible opportunities, and future directions in bioacoustics.
Affiliation(s)
- Leah Mutanu
- Department of Computing, United States International University Africa, Nairobi P.O. Box 14634-0800, Kenya
- Jeet Gohil
- Department of Computing, United States International University Africa, Nairobi P.O. Box 14634-0800, Kenya
- Khushi Gupta
- Department of Computer Science, Sam Houston State University, Huntsville, TX 77341, USA
- Perpetua Wagio
- Department of Computing, United States International University Africa, Nairobi P.O. Box 14634-0800, Kenya
- Gerald Kotonya
- School of Computing and Communications, Lancaster University, Lancaster LA1 4WA, UK
10
Borowiec ML, Dikow RB, Frandsen PB, McKeeken A, Valentini G, White AE. Deep learning as a tool for ecology and evolution. Methods Ecol Evol 2022. [DOI: 10.1111/2041-210x.13901]
Affiliation(s)
- Marek L. Borowiec
- Entomology, Plant Pathology and Nematology, University of Idaho, Moscow, ID, USA
- Institute for Bioinformatics and Evolutionary Studies (IBEST), University of Idaho, Moscow, ID, USA
- Rebecca B. Dikow
- Data Science Lab, Office of the Chief Information Officer, Smithsonian Institution, Washington, DC, USA
- Paul B. Frandsen
- Data Science Lab, Office of the Chief Information Officer, Smithsonian Institution, Washington, DC, USA
- Department of Plant and Wildlife Sciences, Brigham Young University, Provo, UT, USA
- Alexander McKeeken
- Entomology, Plant Pathology and Nematology, University of Idaho, Moscow, ID, USA
- Alexander E. White
- Data Science Lab, Office of the Chief Information Officer, Smithsonian Institution, Washington, DC, USA
- Department of Botany, National Museum of Natural History, Smithsonian Institution, Washington, DC, USA
11
Wang Y, Ye J, Borchers DL. Automated call detection for acoustic surveys with structured calls of varying length. Methods Ecol Evol 2022. [DOI: 10.1111/2041-210x.13873]
Affiliation(s)
- Yuheng Wang
- Centre for Research into Ecological and Environmental Modelling, School of Mathematics and Statistics, University of St Andrews, The Observatory, St Andrews, Fife, Scotland
- Juan Ye
- School of Computer Science, University of St Andrews, North Haugh, St Andrews, Fife, Scotland
- David L. Borchers
- Centre for Research into Ecological and Environmental Modelling, School of Mathematics and Statistics, University of St Andrews, The Observatory, St Andrews, Fife, Scotland
- Centre for Statistics in Ecology, the Environment, and Conservation, University of Cape Town, Cape Town, South Africa
12
Stowell D. Computational bioacoustics with deep learning: a review and roadmap. PeerJ 2022; 10:e13152. [PMID: 35341043] [PMCID: PMC8944344] [DOI: 10.7717/peerj.13152]
Abstract
Animal vocalisations and natural soundscapes are fascinating objects of study, and contain valuable evidence about animal behaviours, populations and ecosystems. They are studied in bioacoustics and ecoacoustics, with signal processing and analysis an important component. Computational bioacoustics has accelerated in recent decades due to the growth of affordable digital sound recording devices, and to huge progress in informatics such as big data, signal processing and machine learning. Methods are inherited from the wider field of deep learning, including speech and image processing. However, the tasks, demands and data characteristics are often different from those addressed in speech or music analysis. There remain unsolved problems, and tasks for which evidence is surely present in many acoustic signals, but not yet realised. In this paper I perform a review of the state of the art in deep learning for computational bioacoustics, aiming to clarify key concepts and identify and analyse knowledge gaps. Based on this, I offer a subjective but principled roadmap for computational bioacoustics with deep learning: topics that the community should aim to address, in order to make the most of future developments in AI and informatics, and to use audio data in answering zoological and ecological questions.
Affiliation(s)
- Dan Stowell
- Department of Cognitive Science and Artificial Intelligence, Tilburg University, Tilburg, The Netherlands
- Naturalis Biodiversity Center, Leiden, The Netherlands
13
Marck A, Vortman Y, Kolodny O, Lavner Y. Identification, Analysis and Characterization of Base Units of Bird Vocal Communication: The White Spectacled Bulbul (Pycnonotus xanthopygos) as a Case Study. Front Behav Neurosci 2022; 15:812939. [PMID: 35237136] [PMCID: PMC8884146] [DOI: 10.3389/fnbeh.2021.812939]
Abstract
Animal vocal communication is a broad and multi-disciplinary field of research. Studying various aspects of communication can provide key elements for understanding animal behavior, evolution, and cognition. Given the large amount of acoustic data accumulated from automated recorders, for which manual annotation and analysis is impractical, there is a growing need to develop algorithms and automatic methods for analyzing and identifying animal sounds. In this study, we developed an automatic detection and analysis system based on audio signal processing algorithms and deep learning that is capable of processing and analyzing large volumes of data without human bias. We selected the White Spectacled Bulbul (Pycnonotus xanthopygos) as our bird model because it has a complex vocal communication system with a large repertoire which is used by both sexes, year-round. It is a common, widespread passerine in Israel, which is relatively easy to locate and record in a broad range of habitats. As in many passerines, the Bulbul's vocal communication consists of two primary hierarchies of utterances: syllables and words. To extract the characteristics of each of these units, the fundamental frequency contour was modeled using a low-degree Legendre polynomial, enabling it to capture the different patterns of variation from different vocalizations, so that each pattern could be effectively expressed using very few coefficients. In addition, a mel-spectrogram was computed for each unit, and several features were extracted both in the time domain (e.g., zero-crossing rate and energy) and frequency domain (e.g., spectral centroid and spectral flatness). We applied both linear and non-linear dimensionality reduction algorithms to feature vectors and validated the findings that were obtained manually, namely by listening and examining the spectrograms visually. Using these algorithms, we show that the Bulbul has a complex vocabulary of more than 30 words, that there are multiple syllables that are combined in different words, and that a particular syllable can appear in several words. Using our system, researchers will be able to analyze hundreds of hours of audio recordings, to obtain objective evaluation of repertoires, and to identify different vocal units and distinguish between them, thus gaining a broad perspective on bird vocal communication.
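Both ingredients, the low-degree Legendre fit of the fundamental-frequency contour and the handpicked time/frequency features, map onto standard NumPy/librosa calls. A sketch with a hypothetical contour and a synthetic stand-in waveform (a real analysis would load each unit's clip):

```python
import numpy as np
import librosa

rng = np.random.default_rng(0)

# Hypothetical f0 contour of one syllable (Hz) over 100 frames.
t = np.linspace(-1, 1, 100)                    # Legendre fits live on [-1, 1]
f0 = 2000 + 300 * t - 150 * t**2 + rng.normal(0, 10, 100)

deg = 4                                        # low-degree fit: few coefficients per unit
coeffs = np.polynomial.legendre.legfit(t, f0, deg)
contour_model = np.polynomial.legendre.legval(t, coeffs)

# Stand-in waveform for one vocal unit: a 2 kHz tone plus a little noise.
sr = 22050
y = np.sin(2 * np.pi * 2000 * np.arange(sr) / sr) + 0.01 * rng.normal(size=sr)

# Time- and frequency-domain features matching the paper's examples.
features = {
    "zcr": librosa.feature.zero_crossing_rate(y).mean(),
    "energy": float(np.mean(y**2)),
    "centroid": librosa.feature.spectral_centroid(y=y, sr=sr).mean(),
    "flatness": librosa.feature.spectral_flatness(y=y).mean(),
}
```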
Affiliation(s)
- Aya Marck
- The Department of Ecology, Evolution and Behavior, The Hebrew University of Jerusalem, Jerusalem, Israel
- Yoni Vortman
- Department of Animal Sciences, Hula Research Center, Tel-Hai College, Tel-Hai, Israel
- Oren Kolodny
- The Department of Ecology, Evolution and Behavior, The Hebrew University of Jerusalem, Jerusalem, Israel
- Yizhar Lavner
- Department of Computer Science, Tel-Hai College, Tel-Hai, Israel
14
Parsons MJG, Lin TH, Mooney TA, Erbe C, Juanes F, Lammers M, Li S, Linke S, Looby A, Nedelec SL, Van Opzeeland I, Radford C, Rice AN, Sayigh L, Stanley J, Urban E, Di Iorio L. Sounding the Call for a Global Library of Underwater Biological Sounds. Front Ecol Evol 2022. [DOI: 10.3389/fevo.2022.810156]
Abstract
Aquatic environments encompass the world's most extensive habitats, rich with sounds produced by a diversity of animals. Passive acoustic monitoring (PAM) is an increasingly accessible remote sensing technology that uses hydrophones to listen to the underwater world and represents an unprecedented, non-invasive method to monitor underwater environments. This information can assist in the delineation of biologically important areas via detection of sound-producing species or characterization of ecosystem type and condition, inferred from the acoustic properties of the local soundscape. At a time when worldwide biodiversity is in significant decline and underwater soundscapes are being altered as a result of anthropogenic impacts, there is a need to document, quantify, and understand biotic sound sources, potentially before they disappear. A significant step toward these goals is the development of a web-based, open-access platform that provides: (1) a reference library of known and unknown biological sound sources (by integrating and expanding existing libraries around the world); (2) a data repository portal for annotated and unannotated audio recordings of single sources and of soundscapes; (3) a training platform for artificial intelligence algorithms for signal detection and classification; and (4) a citizen science-based application for public users. Although these needs are often met individually at regional and taxa-specific scales, many such resources are not sustained and, collectively, an enduring global database with an integrated platform has not been realized. We discuss the benefits such a program can provide, previous calls for global data-sharing and reference libraries, and the challenges that need to be overcome to bring together bio- and ecoacousticians, bioinformaticians, propagation experts, web engineers, and signal processing specialists (e.g., artificial intelligence) with the necessary support and funding to build a sustainable and scalable platform that could address the needs of all contributors and stakeholders into the future.
15
Romero-Mujalli D, Bergmann T, Zimmermann A, Scheumann M. Utilizing DeepSqueak for automatic detection and classification of mammalian vocalizations: a case study on primate vocalizations. Sci Rep 2021; 11:24463. [PMID: 34961788] [PMCID: PMC8712519] [DOI: 10.1038/s41598-021-03941-1]
Abstract
Bioacoustic analyses of animal vocalizations are predominantly accomplished through manual scanning, a highly subjective and time-consuming process. Thus, validated automated analyses are needed that are usable for a variety of animal species and easy to handle by non-programming specialists. This study tested and validated whether DeepSqueak, user-friendly software developed for rodent ultrasonic vocalizations, can be generalized to automate the detection/segmentation, clustering, and classification of high-frequency/ultrasonic vocalizations of a primate species. Our validation procedure showed that the trained detectors for vocalizations of the gray mouse lemur (Microcebus murinus) can deal with different call types, individual variation, and different recording quality. Implementing additional filters drastically reduced noise signals (4225 events) and call fragments (637 events), resulting in 91% correct detections (Ntotal = 3040). Additionally, the detectors could be used to detect the vocalizations of an evolutionarily closely related species, the Goodman's mouse lemur (M. lehilahytsara). An integrated supervised classifier classified 93% of the 2683 calls correctly to the respective call type, and the unsupervised clustering model grouped the calls into clusters matching the published human-made categories. This study shows that DeepSqueak can be successfully utilized to detect, cluster, and classify high-frequency/ultrasonic vocalizations of taxa other than rodents, and suggests a validation procedure usable to evaluate further bioacoustics software.
Affiliation(s)
- Daniel Romero-Mujalli
- Institute of Zoology, University of Veterinary Medicine Hannover, Bünteweg 17, 30559 Hannover, Germany
- Tjard Bergmann
- Institute of Zoology, University of Veterinary Medicine Hannover, Bünteweg 17, 30559 Hannover, Germany
- Marina Scheumann
- Institute of Zoology, University of Veterinary Medicine Hannover, Bünteweg 17, 30559 Hannover, Germany
16
Compensating class imbalance for acoustic chimpanzee detection with convolutional recurrent neural networks. Ecol Inform 2021. [DOI: 10.1016/j.ecoinf.2021.101423]
17
Comparing recurrent convolutional neural networks for large scale bird species classification. Sci Rep 2021; 11:17085. [PMID: 34429468] [PMCID: PMC8385065] [DOI: 10.1038/s41598-021-96446-w]
Abstract
We present a deep learning approach towards the large-scale prediction and analysis of bird acoustics from 100 different bird species. We use spectrograms constructed on bird audio recordings from the Cornell Bird Challenge (CBC) 2020 dataset, which includes recordings of multiple and potentially overlapping bird vocalizations with background noise. Our experiments show that a hybrid modeling approach that involves a Convolutional Neural Network (CNN) for learning the representation for a slice of the spectrogram, and a Recurrent Neural Network (RNN) for the temporal component to combine across time-points, leads to the most accurate model on this dataset. We show results on a spectrum of models ranging from stand-alone CNNs to hybrid models of various types obtained by combining CNNs with other CNNs or RNNs of the following types: Long Short-Term Memory (LSTM) networks, Gated Recurrent Units (GRU), and Legendre Memory Units (LMU). The best performing model achieves an average accuracy of 67% over the 100 different bird species, with the highest accuracy of 90% for the Red Crossbill. We further analyze the learned representations visually and find them to be intuitive, where related bird species are clustered close together. We present a novel way to empirically interpret the representations learned by the LMU-based hybrid model, which shows how memory channel patterns change over time with the changes seen in the spectrograms.
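The hybrid CNN+RNN structure can be sketched compactly in PyTorch: a shared CNN encodes each spectrogram slice and a GRU combines the slice embeddings across time. Dimensions and layer choices below are assumptions, not the benchmarked models.

```python
import torch
import torch.nn as nn

class CNNGRUClassifier(nn.Module):
    """CNN encodes each spectrogram slice; a GRU combines slices across time."""
    def __init__(self, n_mels=64, n_classes=100, hidden=128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv1d(n_mels, 64, 3, padding=1), nn.ReLU(),
            nn.Conv1d(64, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.rnn = nn.GRU(64, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, slices):                 # slices: (batch, n_slices, n_mels, frames)
        b, s, m, f = slices.shape
        feats = self.cnn(slices.reshape(b * s, m, f)).squeeze(-1)  # one vector per slice
        _, h = self.rnn(feats.reshape(b, s, -1))                   # combine across time
        return self.head(h[-1])

model = CNNGRUClassifier()
logits = model(torch.randn(2, 10, 64, 32))     # two recordings, ten slices each
```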
18
Reinwald M, Moseley B, Szenicer A, Nissen-Meyer T, Oduor S, Vollrath F, Markham A, Mortimer B. Seismic localization of elephant rumbles as a monitoring approach. J R Soc Interface 2021; 18:20210264. [PMID: 34255988] [PMCID: PMC8277467] [DOI: 10.1098/rsif.2021.0264]
Abstract
African elephants (Loxodonta africana) are sentient and intelligent animals that use a variety of vocalizations to greet, warn or communicate with each other. Their low-frequency rumbles propagate through the air as well as through the ground, and the physical properties of both media cause differences in frequency filtering and propagation distances of the respective wave. However, it is not well understood how each mode contributes to the animals' abilities to detect these rumbles and extract behavioural or spatial information. In this study, we recorded seismic and co-generated acoustic rumbles in Kenya and compared their potential use to localize the vocalizing animal using the same multilateration algorithms. For our experimental set-up, seismic localization has higher accuracy than acoustic, and bimodal localization does not improve results. We conclude that seismic rumbles can be used to remotely monitor and even decipher elephant social interactions, presenting us with a tool for far-reaching, non-intrusive and surprisingly informative wildlife monitoring.
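Multilateration reduces to nonlinear least squares on arrival times. A sketch with a hypothetical geophone array and wave speed, solving jointly for source position and emission time:

```python
import numpy as np
from scipy.optimize import least_squares

# Hypothetical geophone array (x, y in metres) and seismic wave speed (m/s).
sensors = np.array([[0, 0], [500, 0], [0, 500], [500, 500], [250, 600]], float)
c = 300.0

def residuals(params, sensors, t_obs):
    """Arrival-time residuals for a candidate source (x, y) and emission time t0."""
    x, y, t0 = params
    dist = np.hypot(sensors[:, 0] - x, sensors[:, 1] - y)
    return t0 + dist / c - t_obs

# Simulate a rumble at (320, 240) emitted at t0 = 0, with timing noise.
true_src = np.array([320.0, 240.0])
t_obs = np.hypot(*(sensors - true_src).T) / c
t_obs += np.random.default_rng(0).normal(0, 1e-3, len(sensors))

fit = least_squares(residuals, x0=[250, 250, 0], args=(sensors, t_obs))
print(fit.x[:2])                               # estimated source location
```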
Affiliation(s)
- Ben Moseley
- Department of Computer Science, University of Oxford, Oxford, UK
- Fritz Vollrath
- Department of Zoology, University of Oxford, Oxford, UK
- Save the Elephants, Marula Manor, Karen, Nairobi, Kenya
- Andrew Markham
- Department of Computer Science, University of Oxford, Oxford, UK
- Beth Mortimer
- Department of Zoology, University of Oxford, Oxford, UK
19
Rasmussen JH, Širović A. Automatic detection and classification of baleen whale social calls using convolutional neural networks. J Acoust Soc Am 2021; 149:3635. [PMID: 34241118] [DOI: 10.1121/10.0005047]
Abstract
Passive acoustic monitoring has proven to be an indispensable tool for many aspects of baleen whale research. Manually detecting whale calls in these large datasets demands extensive labor. Automated whale call detectors offer a more efficient approach and have been developed for many species and call types. However, calls with a large level of variability, such as the fin whale (Balaenoptera physalus) 40 Hz call and the blue whale (B. musculus) D call, have been challenging to detect automatically, and hence no practical automated detector existed for these two call types. Using a modular approach consisting of a faster region-based convolutional neural network followed by a convolutional neural network, we have created automated detectors for 40 Hz calls and D calls. Both detectors were tested on recordings with high and low densities of calls and, when selecting for detections with high classification scores, they were shown to have precision ranging from 54% to 57% with recall ranging from 72% to 78% for 40 Hz calls, and precision ranging from 62% to 64% with recall ranging from 70% to 73% for D calls. As these two call types are produced by both sexes, using them in long-term studies would remove sex bias in estimates of temporal presence and movement patterns.
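The modular two-stage idea (region proposals over the spectrogram image, then a CNN that keeps only high-scoring call regions) can be sketched as below; the off-the-shelf detector, untrained verifier network, and thresholds are placeholders, not the authors' trained models.

```python
import torch
import torchvision

# Stage 1: region proposals from a detection network over the spectrogram image.
detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()

# Stage 2: a small CNN that accepts or rejects each proposed call region.
verifier = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, 3, padding=1), torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool2d(1), torch.nn.Flatten(),
    torch.nn.Linear(16, 2),                    # call vs. no-call
).eval()

spectrogram = torch.rand(3, 256, 512)          # hypothetical spectrogram image
with torch.no_grad():
    proposal = detector([spectrogram])[0]      # dict with 'boxes' and 'scores'
    keep = []
    for box, score in zip(proposal["boxes"], proposal["scores"]):
        x0, y0, x1, y1 = box.int().tolist()
        crop = spectrogram[:, y0:y1, x0:x1]
        if crop.numel() == 0:
            continue
        crop = torch.nn.functional.interpolate(crop[None], size=(64, 64))
        if score > 0.5 and verifier(crop).softmax(-1)[0, 1] > 0.5:
            keep.append(box)                   # high-confidence detections only
```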
Affiliation(s)
- Jeppe Have Rasmussen
- Department of Marine Biology, Texas A&M University at Galveston, Galveston, Texas 77554, USA
- Ana Širović
- Department of Marine Biology, Texas A&M University at Galveston, Galveston, Texas 77554, USA
20
Lu T, Han B, Yu F. Detection and classification of marine mammal sounds using AlexNet with transfer learning. Ecol Inform 2021. [DOI: 10.1016/j.ecoinf.2021.101277]
21
Zhong M, Torterotot M, Branch TA, Stafford KM, Royer JY, Dodhia R, Lavista Ferres J. Detecting, classifying, and counting blue whale calls with Siamese neural networks. J Acoust Soc Am 2021; 149:3086. [PMID: 34241138] [DOI: 10.1121/10.0004828]
Abstract
The goal of this project is to use acoustic signatures to detect, classify, and count the calls of four acoustic populations of blue whales so that, ultimately, the conservation status of each population can be better assessed. We used manual annotations from 350 h of audio recordings from underwater hydrophones in the Indian Ocean to build a deep learning model to detect, classify, and count the calls from four acoustic song types. We used Siamese neural networks (SNN), a class of neural network architectures that assess the similarity of two inputs by comparing their feature vectors, and found that they outperformed the more widely used convolutional neural networks (CNN). Specifically, the SNN achieved a 2% accuracy improvement over the CNN in population classification and a 1.7%-6.4% accuracy improvement in call-count estimation for each blue whale population. In addition, even though we treat the call-count estimation problem as a classification task and encode the number of calls in each spectrogram as a categorical variable, the SNN surprisingly learned the ordinal relationship among the counts. SNN are robust and are shown here to be an effective way to automatically mine large acoustic datasets for blue whale calls.
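A minimal PyTorch sketch of the Siamese idea: twin encoders with shared weights and a contrastive loss over spectrogram pairs. Architecture and margin are assumptions, not the paper's network.

```python
import torch
import torch.nn as nn

class Siamese(nn.Module):
    """Twin encoders with shared weights; similarity = distance in feature space."""
    def __init__(self, emb=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, emb),
        )
    def forward(self, a, b):
        return self.encoder(a), self.encoder(b)

def contrastive_loss(za, zb, same, margin=1.0):
    """Pull same-class pairs together; push different-class pairs apart."""
    d = torch.nn.functional.pairwise_distance(za, zb)
    return (same * d.pow(2) + (1 - same) * (margin - d).clamp(min=0).pow(2)).mean()

model = Siamese()
a, b = torch.randn(8, 1, 64, 64), torch.randn(8, 1, 64, 64)  # spectrogram pairs
same = torch.randint(0, 2, (8,)).float()       # 1 if both from the same song type
loss = contrastive_loss(*model(a, b), same)
loss.backward()
```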
Affiliation(s)
- Ming Zhong
- AI for Good Research Lab, Microsoft, Redmond, Washington 98052, USA
- Maelle Torterotot
- Laboratory Geosciences Ocean, University of Brest and CNRS, Brest, France
- Trevor A Branch
- School of Aquatic and Fishery Sciences, University of Washington, Seattle, Washington 98105, USA
- Kathleen M Stafford
- Applied Physics Laboratory, University of Washington, Seattle, Washington 98105, USA
- Jean-Yves Royer
- Laboratory Geosciences Ocean, University of Brest and CNRS, Brest, France
- Rahul Dodhia
- AI for Good Research Lab, Microsoft, Redmond, Washington 98052, USA
22
Ozanich E, Thode A, Gerstoft P, Freeman LA, Freeman S. Deep embedded clustering of coral reef bioacoustics. J Acoust Soc Am 2021; 149:2587. [PMID: 33940892] [DOI: 10.1121/10.0004221]
Abstract
Deep clustering was applied to unlabeled, automatically detected signals in a coral reef soundscape to distinguish fish pulse calls from segments of whale song. Deep embedded clustering (DEC) learned latent features and formed classification clusters using fixed-length power spectrograms of the signals. Handpicked spectral and temporal features were also extracted and clustered with Gaussian mixture models (GMM) and conventional clustering. DEC, GMM, and conventional clustering were tested on simulated datasets of fish pulse calls (fish) and whale song units (whale) with randomized bandwidth, duration, and SNR. Both GMM and DEC achieved high accuracy and identified clusters with fish, whale, and overlapping fish and whale signals. Conventional clustering methods had low accuracy in scenarios with unequal-sized clusters or overlapping signals. Fish and whale signals recorded near Hawaii in February-March 2020 were clustered with DEC, GMM, and conventional clustering. DEC features demonstrated the highest accuracy of 77.5% on a small, manually labeled dataset for classifying signals into fish and whale clusters.
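The GMM baseline on handpicked features is a few lines of scikit-learn; the features and their distributions below are simulated for illustration and are not the study's measurements.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical handpicked features per detection: bandwidth (Hz), duration (s),
# peak frequency (Hz) -- short broadband fish pulses vs. longer tonal whale units.
fish = np.column_stack([rng.normal(4000, 500, 300),
                        rng.normal(0.05, 0.01, 300),
                        rng.normal(600, 100, 300)])
whale = np.column_stack([rng.normal(400, 100, 300),
                         rng.normal(1.2, 0.3, 300),
                         rng.normal(300, 50, 300)])
X = StandardScaler().fit_transform(np.vstack([fish, whale]))

gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0).fit(X)
labels = gmm.predict(X)                        # cluster ids for fish vs. whale signals
print(np.bincount(labels))
```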
Affiliation(s)
- Emma Ozanich
- Scripps Institution of Oceanography, University of California San Diego, La Jolla, California 92037, USA
- Aaron Thode
- Scripps Institution of Oceanography, University of California San Diego, La Jolla, California 92037, USA
- Peter Gerstoft
- Scripps Institution of Oceanography, University of California San Diego, La Jolla, California 92037, USA
- Lauren A Freeman
- Naval Undersea Warfare Center Newport, Newport, Rhode Island 02841, USA
- Simon Freeman
- Naval Undersea Warfare Center Newport, Newport, Rhode Island 02841, USA
23
Nguyen Hong Duc P, Torterotot M, Samaran F, White PR, Gérard O, Adam O, Cazau D. Assessing inter-annotator agreement from collaborative annotation campaign in marine bioacoustics. Ecol Inform 2021. [DOI: 10.1016/j.ecoinf.2020.101185]
24
Marzahl C, Aubreville M, Bertram CA, Maier J, Bergler C, Kröger C, Voigt J, Breininger K, Klopfleisch R, Maier A. EXACT: a collaboration toolset for algorithm-aided annotation of images with annotation version control. Sci Rep 2021; 11:4343. [PMID: 33623058] [PMCID: PMC7902667] [DOI: 10.1038/s41598-021-83827-4]
Abstract
In many research areas, scientific progress is accelerated by multidisciplinary access to image data and their interdisciplinary annotation. However, keeping track of these annotations to ensure a high-quality multi-purpose data set is a challenging and labour-intensive task. We developed the open-source online platform EXACT (EXpert Algorithm Collaboration Tool) that enables the collaborative interdisciplinary analysis of images from different domains online and offline. EXACT supports multi-gigapixel medical whole slide images as well as image series with thousands of images. The software utilises a flexible plugin system that can be adapted to diverse applications such as counting mitotic figures with a screening mode, finding false annotations on a novel validation view, or using the latest deep learning image analysis technologies. This is combined with a version control system which makes it possible to keep track of changes in the data sets and, for example, to link the results of deep learning experiments to specific data set versions. EXACT is freely available and has already been successfully applied to a broad range of annotation tasks, including highly diverse applications like deep learning supported cytology scoring, interdisciplinary multi-centre whole slide image tumour annotation, and highly specialised whale sound spectroscopy clustering.
Affiliation(s)
- Christian Marzahl
- Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Research and Development, EUROIMMUN Medizinische Labordiagnostika AG, Lübeck, Germany
- Marc Aubreville
- Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Faculty of Computer Science, Technische Hochschule Ingolstadt, Ingolstadt, Germany
- Christof A Bertram
- Institute of Veterinary Pathology, Freie Universität Berlin, Berlin, Germany
- Jennifer Maier
- Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Christian Bergler
- Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Christine Kröger
- Research and Development, EUROIMMUN Medizinische Labordiagnostika AG, Lübeck, Germany
- Jörn Voigt
- Research and Development, EUROIMMUN Medizinische Labordiagnostika AG, Lübeck, Germany
- Katharina Breininger
- Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Robert Klopfleisch
- Institute of Veterinary Pathology, Freie Universität Berlin, Berlin, Germany
- Andreas Maier
- Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
26
Kirsebom OS, Frazao F, Simard Y, Roy N, Matwin S, Giard S. Performance of a deep neural network at detecting North Atlantic right whale upcalls. J Acoust Soc Am 2020; 147:2636. [PMID: 32359246] [DOI: 10.1121/10.0001132]
Abstract
Passive acoustics provides a powerful tool for monitoring the endangered North Atlantic right whale (Eubalaena glacialis), but robust detection algorithms are needed to handle diverse and variable acoustic conditions and differences in recording techniques and equipment. This paper investigates the potential of deep neural networks (DNNs) for addressing this need. ResNet, an architecture commonly used for image recognition, was trained to recognize the time-frequency representation of the characteristic North Atlantic right whale upcall. The network was trained on several thousand examples recorded at various locations in the Gulf of St. Lawrence in 2018 and 2019, using different equipment and deployment techniques. Used as a detection algorithm on fifty 30-min recordings from the years 2015-2017 containing over one thousand upcalls, the network achieved recalls up to 80% while maintaining a precision of 90%. Importantly, the performance of the network improved as more variance was introduced into the training dataset, whereas the opposite trend was observed using a conventional linear discriminant analysis approach. This study demonstrates that DNNs can be trained to identify North Atlantic right whale upcalls under diverse and variable conditions with a performance that compares favorably to that of existing algorithms.
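Adapting an image-recognition ResNet to one-channel time-frequency inputs with a binary head takes two line changes in torchvision; ResNet-18 is used here as a stand-in, since the paper's exact ResNet variant is not assumed.

```python
import torch
import torchvision

# ResNet-18 stand-in; no pretrained weights assumed here.
model = torchvision.models.resnet18(weights=None)
# Spectrograms have one channel, and the task is binary (upcall vs. no upcall).
model.conv1 = torch.nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
model.fc = torch.nn.Linear(model.fc.in_features, 2)

spec = torch.randn(16, 1, 128, 128)            # batch of time-frequency windows
logits = model(spec)
probs = logits.softmax(dim=1)[:, 1]            # P(upcall) per window
detections = probs > 0.5
```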
Affiliation(s)
- Oliver S Kirsebom
- Institute for Big Data Analytics, Dalhousie University, Halifax, Nova Scotia B3H 4R2, Canada
- Fabio Frazao
- Institute for Big Data Analytics, Dalhousie University, Halifax, Nova Scotia B3H 4R2, Canada
- Yvan Simard
- Fisheries and Oceans Canada Chair in Underwater Acoustics Applied to Ecosystem and Marine Mammals, Marine Sciences Institute, University of Québec at Rimouski, Rimouski, Québec, Canada
- Nathalie Roy
- Maurice Lamontagne Institute, Fisheries and Oceans Canada, Mont-Joli, Québec, Canada
- Stan Matwin
- Institute for Big Data Analytics, Dalhousie University, Halifax, Nova Scotia B3H 4R2, Canada
- Samuel Giard
- Maurice Lamontagne Institute, Fisheries and Oceans Canada, Mont-Joli, Québec, Canada
27
Zhong M, Castellote M, Dodhia R, Lavista Ferres J, Keogh M, Brewer A. Beluga whale acoustic signal classification using deep learning neural network models. J Acoust Soc Am 2020; 147:1834. [PMID: 32237822] [DOI: 10.1121/10.0000921]
Abstract
Over a decade after the Cook Inlet beluga (Delphinapterus leucas) was listed as endangered in 2008, the population has shown no sign of recovery. Lack of ecological knowledge limits the understanding of, and ability to manage, potential threats impeding recovery of this declining population. National Oceanic and Atmospheric Administration Fisheries, in partnership with the Alaska Department of Fish and Game, initiated a passive acoustics monitoring program in 2017 to investigate beluga seasonal occurrence by deploying a series of passive acoustic moorings. Data have been processed with semi-automated tonal detectors followed by time-intensive manual validation. To reduce this labor-intensive and time-consuming process, and to increase the accuracy of classification results, the authors constructed an ensembled deep learning convolutional neural network model to classify beluga detections as true or false. Using a 0.5 threshold, the final model achieves 96.57% precision and 92.26% recall on the test dataset. This methodology proves to be successful at classifying beluga signals, and the framework can be easily generalized to other acoustic classification problems.
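A common way to combine such an ensemble (assumed here; the paper's exact combination rule is not reproduced) is to average member probabilities and apply the 0.5 decision threshold:

```python
import numpy as np

def ensemble_classify(prob_list, threshold=0.5):
    """Average member probabilities and apply the decision threshold."""
    mean_prob = np.mean(prob_list, axis=0)
    return mean_prob >= threshold, mean_prob

# Hypothetical per-detection beluga probabilities from three trained CNNs.
p1 = np.array([0.91, 0.40, 0.75, 0.10])
p2 = np.array([0.88, 0.55, 0.60, 0.05])
p3 = np.array([0.95, 0.35, 0.80, 0.20])
is_true, p = ensemble_classify([p1, p2, p3])   # -> [True, False, True, False]
```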
Affiliation(s)
- Ming Zhong
- AI for Good Research Lab, Microsoft, Redmond, Washington 98052, USA
- Manuel Castellote
- Alaska Fisheries Science Center-National Oceanic and Atmospheric Administration (NOAA) Fisheries and Joint Institute for the Study of the Atmosphere and Ocean (JISAO), University of Washington, Seattle, Washington 98195, USA
- Rahul Dodhia
- AI for Good Research Lab, Microsoft, Redmond, Washington 98052, USA
- Mandy Keogh
- Alaska Department of Fish and Game, Juneau, Alaska 99802, USA
- Arial Brewer
- Alaska Fisheries Science Center-National Oceanic and Atmospheric Administration (NOAA) Fisheries and Joint Institute for the Study of the Atmosphere and Ocean (JISAO), University of Washington, Seattle, Washington 98195, USA
28
Bergler C, Schröter H, Cheng RX, Barth V, Weber M, Nöth E, Hofer H, Maier A. ORCA-SPOT: An Automatic Killer Whale Sound Detection Toolkit Using Deep Learning. Sci Rep 2019; 9:10997. [PMID: 31358873] [PMCID: PMC6662697] [DOI: 10.1038/s41598-019-47335-w]
Abstract
Large bioacoustic archives of wild animals are an important source to identify reappearing communication patterns, which can then be related to recurring behavioral patterns to advance the current understanding of intra-specific communication of non-human animals. A main challenge remains that most large-scale bioacoustic archives contain only a small percentage of animal vocalizations and a large amount of environmental noise, which makes it extremely difficult to manually retrieve sufficient vocalizations for further analysis; this is particularly important for species with advanced social systems and complex vocalizations. In this study, deep neural networks were trained on 11,509 killer whale (Orcinus orca) signals and 34,848 noise segments. The resulting toolkit, ORCA-SPOT, was tested on a large-scale bioacoustic repository, the Orchive, comprising roughly 19,000 hours of killer whale underwater recordings. An automated segmentation of the entire Orchive recordings (about 2.2 years of audio) took approximately 8 days. It achieved a time-based precision, or positive predictive value (PPV), of 93.2% and an area under the curve (AUC) of 0.9523. This approach enables an automated annotation procedure for large bioacoustics databases to extract killer whale sounds, which are essential for subsequent identification of significant communication patterns. The code will be publicly available in October 2019 to support the application of deep learning to bioacoustic research. ORCA-SPOT can be adapted to other animal species.
Affiliation(s)
- Christian Bergler
- Friedrich-Alexander-University Erlangen-Nuremberg, Department of Computer Science, Pattern Recognition Lab, Martensstr. 3, 91058 Erlangen, Germany
- Hendrik Schröter
- Friedrich-Alexander-University Erlangen-Nuremberg, Department of Computer Science, Pattern Recognition Lab, Martensstr. 3, 91058 Erlangen, Germany
- Rachael Xi Cheng
- Department of Ecological Dynamics, Leibniz Institute for Zoo and Wildlife Research (IZW) in the Forschungsverbund Berlin e.V., Alfred-Kowalke-Straße 17, 10315 Berlin, Germany
- Volker Barth
- Anthro-Media, Nansenstr. 19, 12047 Berlin, Germany
- Elmar Nöth
- Friedrich-Alexander-University Erlangen-Nuremberg, Department of Computer Science, Pattern Recognition Lab, Martensstr. 3, 91058 Erlangen, Germany
- Heribert Hofer
- Department of Ecological Dynamics, Leibniz Institute for Zoo and Wildlife Research (IZW) in the Forschungsverbund Berlin e.V., Alfred-Kowalke-Straße 17, 10315 Berlin, Germany
- Department of Biology, Chemistry, Pharmacy, Freie Universität Berlin, Takustrasse 3, 14195 Berlin, Germany
- Department of Veterinary Medicine, Freie Universität Berlin, Oertzenweg 19b, 14195 Berlin, Germany
- Andreas Maier
- Friedrich-Alexander-University Erlangen-Nuremberg, Department of Computer Science, Pattern Recognition Lab, Martensstr. 3, 91058 Erlangen, Germany