1
Kershenbaum A, Akçay Ç, Babu‐Saheer L, Barnhill A, Best P, Cauzinille J, Clink D, Dassow A, Dufourq E, Growcott J, Markham A, Marti‐Domken B, Marxer R, Muir J, Reynolds S, Root‐Gutteridge H, Sadhukhan S, Schindler L, Smith BR, Stowell D, Wascher CA, Dunn JC. Automatic detection for bioacoustic research: a practical guide from and for biologists and computer scientists. Biol Rev Camb Philos Soc 2025; 100:620-646. PMID: 39417330; PMCID: PMC11885706; DOI: 10.1111/brv.13155.
Abstract
Recent years have seen a dramatic rise in the use of passive acoustic monitoring (PAM) for biological and ecological applications, and a corresponding increase in the volume of data generated. However, data sets are often becoming so sizable that analysing them manually is increasingly burdensome and unrealistic. Fortunately, we have also seen a corresponding rise in computing power and the capability of machine learning algorithms, which offer the possibility of performing some of the analysis required for PAM automatically. Nonetheless, the field of automatic detection of acoustic events is still in its infancy in biology and ecology. In this review, we examine the trends in bioacoustic PAM applications, and their implications for the burgeoning amount of data that needs to be analysed. We explore the different methods of machine learning and other tools for scanning, analysing, and extracting acoustic events automatically from large volumes of recordings. We then provide a step-by-step practical guide for using automatic detection in bioacoustics. One of the biggest challenges for the greater use of automatic detection in bioacoustics is that there is often a gulf in expertise between the biological sciences and the field of machine learning and computer science. Therefore, this review first presents an overview of the requirements for automatic detection in bioacoustics, intended to familiarise those from a computer science background with the needs of the bioacoustics community, followed by an introduction to the key elements of machine learning and artificial intelligence that a biologist needs to understand to incorporate automatic detection into their research. We then provide a practical guide to building an automatic detection pipeline for bioacoustic data, and conclude with a discussion of possible future directions in this field.
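To make the kind of detection pipeline discussed above concrete, here is a minimal, illustrative energy-threshold detector in Python. It is not the authors' method; the frequency band, threshold factor, and synthetic test signal are arbitrary choices for the sketch.

```python
# Minimal energy-threshold detector sketch (illustrative only, not the
# pipeline from the review). Assumes a mono recording `audio` at `sr` Hz
# and a species-specific frequency band of interest.
import numpy as np
from scipy.signal import spectrogram

def detect_events(audio, sr, band=(1000.0, 8000.0), thresh_factor=4.0):
    """Return (start_s, end_s) tuples where in-band energy exceeds a
    median-based threshold; a crude stand-in for a real detector."""
    f, t, sxx = spectrogram(audio, fs=sr, nperseg=1024, noverlap=512)
    in_band = (f >= band[0]) & (f <= band[1])
    energy = sxx[in_band].sum(axis=0)                # in-band energy per frame
    threshold = thresh_factor * np.median(energy)    # adaptive threshold
    active = energy > threshold
    events, start = [], None
    for i, flag in enumerate(active):
        if flag and start is None:
            start = t[i]
        elif not flag and start is not None:
            events.append((start, t[i]))
            start = None
    if start is not None:
        events.append((start, t[-1]))
    return events

# Example with synthetic data: 10 s of noise containing a 3 kHz tone burst.
sr = 22050
noise = 0.01 * np.random.randn(10 * sr)
tt = np.arange(sr) / sr
noise[2 * sr:3 * sr] += 0.5 * np.sin(2 * np.pi * 3000 * tt)
print(detect_events(noise, sr))
```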
Affiliation(s)
- Arik Kershenbaum: Girton College and Department of Zoology, University of Cambridge, Huntingdon Road, Cambridge CB3 0JG, UK
- Çağlar Akçay: Behavioural Ecology Research Group, School of Life Sciences, Anglia Ruskin University, East Road, Cambridge CB1 1PT, UK
- Lakshmi Babu‐Saheer: Computing Informatics and Applications Research Group, School of Computing and Information Sciences, Anglia Ruskin University, East Road, Cambridge CB1 1PT, UK
- Alex Barnhill: Pattern Recognition Lab, Department of Computer Science, Friedrich‐Alexander‐Universität Erlangen‐Nürnberg, 91058 Erlangen, Germany
- Paul Best: Université de Toulon, Aix Marseille Univ, CNRS, LIS, ILCB, CS 60584, 83041 Toulon CEDEX 9, France
- Jules Cauzinille: Université de Toulon, Aix Marseille Univ, CNRS, LIS, ILCB, CS 60584, 83041 Toulon CEDEX 9, France
- Dena Clink: K. Lisa Yang Center for Conservation Bioacoustics, Cornell Lab of Ornithology, Cornell University, 159 Sapsucker Woods Road, Ithaca, New York 14850, USA
- Angela Dassow: Biology Department, Carthage College, 2001 Alford Park Dr, 68 David A Straz Jr, Kenosha, Wisconsin 53140, USA
- Emmanuel Dufourq: African Institute for Mathematical Sciences, 7 Melrose Road, Muizenberg, Cape Town 7441, South Africa; Stellenbosch University, Jan Celliers Road, Stellenbosch 7600, South Africa; African Institute for Mathematical Sciences - Research and Innovation Centre, District Gasabo, Secteur Kacyiru, Cellule Kamatamu, Rue KG590 ST No 1, Kigali, Rwanda
- Jonathan Growcott: Centre of Ecology and Conservation, College of Life and Environmental Sciences, University of Exeter, Cornwall Campus, Exeter TR10 9FE, UK; Wildlife Conservation Research Unit, Recanati‐Kaplan Centre, Tubney House, Abingdon Road, Tubney, Abingdon OX13 5QL, UK
- Andrew Markham: Department of Computer Science, University of Oxford, Parks Road, Oxford OX1 3QD, UK
- Ricard Marxer: Université de Toulon, Aix Marseille Univ, CNRS, LIS, ILCB, CS 60584, 83041 Toulon CEDEX 9, France
- Jen Muir: Behavioural Ecology Research Group, School of Life Sciences, Anglia Ruskin University, East Road, Cambridge CB1 1PT, UK
- Sam Reynolds: Behavioural Ecology Research Group, School of Life Sciences, Anglia Ruskin University, East Road, Cambridge CB1 1PT, UK
- Holly Root‐Gutteridge: School of Natural Sciences, University of Lincoln, Joseph Banks Laboratories, Beevor Street, Lincoln, Lincolnshire LN5 7TS, UK
- Sougata Sadhukhan: Institute of Environment Education and Research, Bharati Vidyapeeth Educational Campus, Satara Road, Pune, Maharashtra 411 043, India
- Loretta Schindler: Department of Zoology, Faculty of Science, Charles University, Prague 128 44, Czech Republic
- Bethany R. Smith: Institute of Zoology, Zoological Society of London, Outer Circle, London NW1 4RY, UK
- Dan Stowell: Tilburg University, Tilburg, The Netherlands; Naturalis Biodiversity Center, Darwinweg 2, 2333 CR Leiden, The Netherlands
- Claudia A.F. Wascher: Behavioural Ecology Research Group, School of Life Sciences, Anglia Ruskin University, East Road, Cambridge CB1 1PT, UK
- Jacob C. Dunn: Behavioural Ecology Research Group, School of Life Sciences, Anglia Ruskin University, East Road, Cambridge CB1 1PT, UK; Department of Archaeology, University of Cambridge, Downing Street, Cambridge CB2 3DZ, UK; Department of Behavioral and Cognitive Biology, University of Vienna, University Biology Building (UBB), Djerassiplatz 1, 1030 Vienna, Austria
2
Fortkord L, Veit L. Social context affects sequence modification learning in birdsong. Front Psychol 2025; 16:1488762. PMID: 39973966; PMCID: PMC11835814; DOI: 10.3389/fpsyg.2025.1488762.
Abstract
Social interactions are crucial for imitative vocal learning such as human speech learning or song learning in songbirds. Recently, introducing specific learned modifications into adult song by experimenter-controlled reinforcement learning has emerged as a key protocol to study aspects of vocal learning in songbirds. This form of adult plasticity does not require conspecifics as a model for imitation or to provide social feedback on song performance. We therefore hypothesized that social interactions are irrelevant to, or even inhibit, song modification learning. We tested whether social context affects song sequence learning in adult male Bengalese finches (Lonchura striata domestica). We targeted specific syllable sequences in adult birds' songs with negative auditory feedback, which led the birds to reduce the targeted syllable sequence in favor of alternate sequences. Changes were apparent in catch trials without feedback, indicating a learning process. Each experiment was repeated within subjects with three different social contexts (male-male, MM; male-female, MF; and male alone, MA) in randomized order. We found robust learning in all three social contexts, with a nonsignificant trend toward facilitated learning with social company (MF, MM) compared to the single-housed (MA) condition. This effect could not be explained by the order of social contexts, nor by different singing rates across contexts. Our results demonstrate that social context can influence the degree of learning in adult birds even in experimenter-controlled reinforcement learning tasks, and therefore suggest that social interactions might facilitate song plasticity beyond their known role for imitation and social feedback.
Affiliation(s)
- Lena Veit: Neurobiology of Vocal Communication, Institute for Neurobiology, University of Tübingen, Tübingen, Germany
3
Koch TMI, Marks ES, Roberts TF. AVN: A Deep Learning Approach for the Analysis of Birdsong. bioRxiv [Preprint] 2024:2024.05.10.593561. PMID: 39229184; PMCID: PMC11370480; DOI: 10.1101/2024.05.10.593561.
Abstract
Deep learning tools for behavior analysis have enabled important new insights and discoveries in neuroscience. Yet, they often compromise interpretability and generalizability for performance, making it difficult to quantitatively compare phenotypes across datasets and research groups. We developed a novel deep learning-based behavior analysis pipeline, Avian Vocalization Network (AVN), for the learned vocalizations of the most extensively studied vocal learning model species, the zebra finch. AVN annotates songs with high accuracy across multiple animal colonies without the need for any additional training data and generates a comprehensive set of interpretable features to describe the syntax, timing, and acoustic properties of song. We use this feature set to compare song phenotypes across multiple research groups and experiments, and to predict a bird's stage in song development. Additionally, we have developed a novel method to measure song imitation that requires no additional training data for new comparisons or recording environments, and outperforms existing similarity scoring methods in its sensitivity and agreement with expert human judgements of song similarity. These tools are available through the open-source AVN python package and graphical application, which makes them accessible to researchers without any prior coding experience. Altogether, this behavior analysis toolkit stands to facilitate and accelerate the study of vocal behavior by enabling a standardized mapping of phenotypes and learning outcomes, thus helping scientists better link behavior to the underlying neural processes.
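As a rough illustration of what "interpretable features" of syntax and timing can look like in practice, the sketch below computes a few such features from a hypothetical list of syllable annotations. The feature names and input format are assumptions, not the AVN package API.

```python
# Illustrative sketch of interpretable song features (timing and syntax);
# the function name, input format, and feature choices are hypothetical.
import math
from collections import Counter

def song_features(annotations):
    """annotations: list of (onset_s, offset_s, label) for one song bout."""
    durations = [off - on for on, off, _ in annotations]
    gaps = [annotations[i + 1][0] - annotations[i][1]
            for i in range(len(annotations) - 1)]
    counts = Counter(label for _, _, label in annotations)
    total = sum(counts.values())
    # Shannon entropy of syllable-type usage (bits): one simple syntax feature.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return {
        "mean_syllable_dur_s": sum(durations) / len(durations),
        "mean_gap_s": sum(gaps) / len(gaps) if gaps else float("nan"),
        "syllable_type_entropy_bits": entropy,
    }

bout = [(0.00, 0.08, "a"), (0.12, 0.22, "b"), (0.26, 0.33, "b"), (0.40, 0.48, "c")]
print(song_features(bout))
```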
Affiliation(s)
- Therese M I Koch: Department of Neuroscience, UT Southwestern Medical Center, Dallas, TX, USA
- Ethan S Marks: Department of Neuroscience, UT Southwestern Medical Center, Dallas, TX, USA
- Todd F Roberts: Department of Neuroscience, UT Southwestern Medical Center, Dallas, TX, USA
4
Smeele SQ, Tyndel SA, Klump BC, Alarcón‐Nieto G, Aplin LM. callsync: An R package for alignment and analysis of multi-microphone animal recordings. Ecol Evol 2024; 14:e11384. PMID: 38799392; PMCID: PMC11116754; DOI: 10.1002/ece3.11384.
Abstract
To better understand how vocalisations are used during interactions of multiple individuals, studies are increasingly deploying on-board devices with a microphone on each animal. The resulting recordings are extremely challenging to analyse, since microphone clocks drift non-linearly and record the vocalisations of non-focal individuals as well as noise. Here we address this issue with callsync, an R package designed to align recordings, detect and assign vocalisations to the caller, trace the fundamental frequency, filter out noise and perform basic analysis on the resulting clips. We present a case study where the pipeline is used on a dataset of six captive cockatiels (Nymphicus hollandicus) wearing backpack microphones. Recordings initially had a drift of ~2 min, but were aligned to within ~2 s with our package. Using callsync, we detected and assigned 2101 calls across three multi-hour recording sessions. Two had loud beep markers in the background designed to help the manual alignment process. One contained no obvious markers, in order to demonstrate that markers were not necessary to obtain optimal alignment. We then used a function that traces the fundamental frequency and applied spectrographic cross correlation to show a possible analytical pipeline where vocal similarity is visually assessed. The callsync package can be used to go from raw recordings to a clean dataset of features. The package is designed to be modular and allows users to replace functions as they wish. We also discuss the challenges that might be faced in each step and how the available literature can provide alternatives for each step.
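The core alignment idea, estimating the lag between two microphones from the cross-correlation of their amplitude envelopes, can be sketched in a few lines. The Python code below is only an analogue of the coarse-alignment step (callsync itself is an R package and also handles non-linear drift), with synthetic envelopes standing in for real recordings.

```python
# Coarse alignment sketch: estimate the lag between two microphones from the
# cross-correlation of their amplitude envelopes. Illustrative analogue only,
# not the callsync R package or its API.
import numpy as np
from scipy.signal import correlate

def estimate_lag_seconds(env_a, env_b, env_rate):
    """env_a, env_b: amplitude envelopes sampled at env_rate (Hz)."""
    a = (env_a - env_a.mean()) / env_a.std()
    b = (env_b - env_b.mean()) / env_b.std()
    xcorr = correlate(a, b, mode="full")
    lag_samples = xcorr.argmax() - (len(b) - 1)
    return lag_samples / env_rate

# Synthetic check: envelope B leads A by 2.0 s at a 100 Hz envelope rate.
rate = 100
base = np.abs(np.random.randn(60 * rate))
delayed = np.roll(base, 2 * rate)
print(round(estimate_lag_seconds(delayed, base, rate), 2))  # ~2.0
```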
Affiliation(s)
- Simeon Q. Smeele: Cognitive & Cultural Ecology Research Group, Max Planck Institute of Animal Behavior, Radolfzell, Germany; Department of Human Behavior, Ecology and Culture, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany; Department of Biology, University of Konstanz, Constance, Germany; Department of Ecoscience, Aarhus University, Aarhus, Denmark
- Stephen A. Tyndel: Cognitive & Cultural Ecology Research Group, Max Planck Institute of Animal Behavior, Radolfzell, Germany; Department of Biology, University of Konstanz, Constance, Germany
- Barbara C. Klump: Cognitive & Cultural Ecology Research Group, Max Planck Institute of Animal Behavior, Radolfzell, Germany; Department of Behavioral and Cognitive Biology, University of Vienna, Vienna, Austria
- Gustavo Alarcón‐Nieto: Cognitive & Cultural Ecology Research Group, Max Planck Institute of Animal Behavior, Radolfzell, Germany; Department of Biology, University of Konstanz, Constance, Germany; Department of Migration, Max Planck Institute of Animal Behavior, Radolfzell, Germany
- Lucy M. Aplin: Cognitive & Cultural Ecology Research Group, Max Planck Institute of Animal Behavior, Radolfzell, Germany; Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland; Division of Ecology and Evolution, Research School of Biology, The Australian National University, Canberra, Australian Capital Territory, Australia
5
Kawaji T, Fujibayashi M, Abe K. Goal-directed and flexible modulation of syllable sequence within birdsong. Nat Commun 2024; 15:3419. PMID: 38658545; PMCID: PMC11043396; DOI: 10.1038/s41467-024-47824-1.
Abstract
Songs constitute a complex system of vocal signals for inter-individual communication in songbirds. Here, we elucidate the flexibility that songbirds exhibit in the organizing and sequencing of syllables within their songs. Utilizing a newly devised song decoder for quasi-real-time annotation, we execute an operant conditioning paradigm, with rewards contingent upon specific syllable syntax. Our analysis reveals that birds possess the capacity to modify the contents of their songs, adjusting the repetition length of particular syllables and employing specific motifs. Notably, birds altered their syllable sequence in a goal-directed manner to obtain rewards. We demonstrate that such modulation occurs within a distinct song segment, with adjustments made within 10 minutes after cue presentation. Additionally, we identify the involvement of the parietal-basal ganglia pathway in orchestrating these flexible modulations of syllable sequences. Our findings unveil an unappreciated aspect of songbird communication, drawing parallels with human speech.
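A toy version of a syntax-contingent reward rule of the kind such a quasi-real-time decoder enables is sketched below; the target motif, threshold, and function name are invented for illustration and are not the authors' protocol.

```python
# Toy syntax-contingent reward rule: the decoded syllable string is checked
# against a target motif. The pattern and threshold are hypothetical.
import re

def reward(decoded_syllables: str, target: str = "abb", min_count: int = 2) -> bool:
    """Return True if the target motif occurs at least min_count times."""
    return len(re.findall(target, decoded_syllables)) >= min_count

print(reward("xabbcabbx"))   # True: 'abb' occurs twice
print(reward("xabcababx"))   # False: 'abb' never occurs
```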
Affiliation(s)
- Takuto Kawaji: Lab of Brain Development, Graduate School of Life Sciences, Tohoku University, Katahira 2-1-1, Sendai, Miyagi, 980-8577, Japan
- Mizuki Fujibayashi: Lab of Brain Development, Graduate School of Life Sciences, Tohoku University, Katahira 2-1-1, Sendai, Miyagi, 980-8577, Japan
- Kentaro Abe: Lab of Brain Development, Graduate School of Life Sciences, Tohoku University, Katahira 2-1-1, Sendai, Miyagi, 980-8577, Japan; Division for the Establishment of Frontier Sciences of the Organization for Advanced Studies, Tohoku University, Sendai, Miyagi, 980-8577, Japan
6
Santana GM, Dietrich MO. SqueakOut: Autoencoder-based segmentation of mouse ultrasonic vocalizations. bioRxiv [Preprint] 2024:2024.04.19.590368. PMID: 38712291; PMCID: PMC11071348; DOI: 10.1101/2024.04.19.590368.
Abstract
Mice emit ultrasonic vocalizations (USVs) that are important for social communication. Despite great advancements in tools to detect USVs from audio files in recent years, highly accurate segmentation of USVs from spectrograms (i.e., removing noise) remains a significant challenge. Here, we present a new dataset of 12,954 annotated spectrograms explicitly labeled for mouse USV segmentation. Leveraging this dataset, we developed SqueakOut, a lightweight (4.6M parameters) fully convolutional autoencoder that achieves high accuracy in supervised segmentation of USVs from spectrograms, with a Dice score of 90.22. SqueakOut combines a MobileNetV2 backbone with skip connections and transposed convolutions to precisely segment USVs. Using stochastic data augmentation techniques and a hybrid loss function, SqueakOut learns robust segmentation across varying recording conditions. We evaluate SqueakOut's performance, demonstrating substantial improvements over existing methods like VocalMat (63.82 Dice score). The accurate USV segmentations enabled by SqueakOut will facilitate novel methods for vocalization classification and more accurate analysis of mouse communication. To promote further research, we publicly release the annotated dataset of 12,954 spectrograms for USV segmentation and the SqueakOut implementation.
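The Dice score used to report segmentation quality has a standard definition on binary masks, sketched below; the 90.22 in the abstract corresponds to 0.9022 on this 0-1 scale.

```python
# Standard Dice coefficient on binary masks, as used to score segmentation.
import numpy as np

def dice_score(pred: np.ndarray, target: np.ndarray, eps: float = 1e-8) -> float:
    """Dice coefficient between two binary masks of equal shape."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

# Two overlapping square masks give a Dice score of 0.81.
a = np.zeros((64, 64), dtype=int); a[10:30, 10:30] = 1
b = np.zeros((64, 64), dtype=int); b[12:32, 12:32] = 1
print(round(dice_score(a, b), 3))
```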
Affiliation(s)
- Gustavo M Santana: Laboratory of Physiology of Behavior, Interdepartmental Neuroscience Program, Program in Physics, Engineering and Biology, Yale University, USA; Graduate Program in Biochemistry, Federal University of Rio Grande do Sul, Brazil
- Marcelo O Dietrich: Laboratory of Physiology of Behavior, Department of Comparative Medicine, Department of Neuroscience, Yale University, USA
7
Koparkar A, Warren TL, Charlesworth JD, Shin S, Brainard MS, Veit L. Lesions in a songbird vocal circuit increase variability in song syntax. eLife 2024; 13:RP93272. PMID: 38635312; PMCID: PMC11026095; DOI: 10.7554/elife.93272.
Abstract
Complex skills like speech and dance are composed of ordered sequences of simpler elements, but the neuronal basis for the syntactic ordering of actions is poorly understood. Birdsong is a learned vocal behavior composed of syntactically ordered syllables, controlled in part by the songbird premotor nucleus HVC (proper name). Here, we test whether one of HVC's recurrent inputs, mMAN (medial magnocellular nucleus of the anterior nidopallium), contributes to sequencing in adult male Bengalese finches (Lonchura striata domestica). Bengalese finch song includes several patterns: (1) chunks, comprising stereotyped syllable sequences; (2) branch points, where a given syllable can be followed probabilistically by multiple syllables; and (3) repeat phrases, where individual syllables are repeated variable numbers of times. We found that following bilateral lesions of mMAN, acoustic structure of syllables remained largely intact, but sequencing became more variable, as evidenced by 'breaks' in previously stereotyped chunks, increased uncertainty at branch points, and increased variability in repeat numbers. Our results show that mMAN contributes to the variable sequencing of vocal elements in Bengalese finch song and demonstrate the influence of recurrent projections to HVC. Furthermore, they highlight the utility of species with complex syntax in investigating neuronal control of ordered sequences.
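Branch-point uncertainty of the kind described here is commonly quantified as the entropy of the distribution of syllables that follow a given syllable. The sketch below shows one such calculation on invented toy sequences; it is a generic measure, not necessarily the exact metric used in the paper.

```python
# Transition entropy at a branch point: entropy (bits) of which syllable
# follows a given syllable. Toy sequences below are invented for illustration.
import math
from collections import Counter

def transition_entropy(sequences, syllable):
    """Shannon entropy (bits) of the syllable that follows `syllable`."""
    followers = Counter()
    for seq in sequences:
        for cur, nxt in zip(seq, seq[1:]):
            if cur == syllable:
                followers[nxt] += 1
    total = sum(followers.values())
    return -sum((n / total) * math.log2(n / total) for n in followers.values())

pre_lesion = ["abcabcabc", "abcabc"]
post_lesion = ["abcabdabc", "abdabc"]
print(transition_entropy(pre_lesion, "b"))   # 0.0: 'b' is always followed by 'c'
print(transition_entropy(post_lesion, "b"))  # ~0.97: transitions after 'b' became variable
```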
Affiliation(s)
- Avani Koparkar: Neurobiology of Vocal Communication, Institute for Neurobiology, University of Tübingen, Tübingen, Germany
- Timothy L Warren: Howard Hughes Medical Institute and Center for Integrative Neuroscience, University of California San Francisco, San Francisco, United States; Departments of Horticulture and Integrative Biology, Oregon State University, Corvallis, United States
- Jonathan D Charlesworth: Howard Hughes Medical Institute and Center for Integrative Neuroscience, University of California San Francisco, San Francisco, United States
- Sooyoon Shin: Howard Hughes Medical Institute and Center for Integrative Neuroscience, University of California San Francisco, San Francisco, United States
- Michael S Brainard: Howard Hughes Medical Institute and Center for Integrative Neuroscience, University of California San Francisco, San Francisco, United States
- Lena Veit: Neurobiology of Vocal Communication, Institute for Neurobiology, University of Tübingen, Tübingen, Germany
8
Yurimoto T, Kumita W, Sato K, Kikuchi R, Oka G, Shibuki Y, Hashimoto R, Kamioka M, Hayasegawa Y, Yamazaki E, Kurotaki Y, Goda N, Kitakami J, Fujita T, Inoue T, Sasaki E. Development of a 3D tracking system for multiple marmosets under free-moving conditions. Commun Biol 2024; 7:216. PMID: 38383741; PMCID: PMC10881507; DOI: 10.1038/s42003-024-05864-9.
Abstract
Assessment of social interactions and behavioral changes in nonhuman primates is useful for understanding brain function changes during life events and pathogenesis of neurological diseases. The common marmoset (Callithrix jacchus), which lives in a nuclear family like humans, is a useful model, but longitudinal automated behavioral observation of multiple animals has not been achieved. Here, we developed a Full Monitoring and Animal Identification (FulMAI) system for longitudinal detection of three-dimensional (3D) trajectories of each individual in multiple marmosets under free-moving conditions by combining video tracking, Light Detection and Ranging, and deep learning. Using this system, identification of each animal was more than 97% accurate. Location preferences and inter-individual distance could be calculated, and deep learning could detect grooming behavior. The FulMAI system allows us to analyze the natural behavior of individuals in a family over their lifetime and understand how behavior changes due to life events together with other data.
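As an example of the kind of downstream measure the abstract mentions, the sketch below computes mean inter-individual distance from two 3D trajectories; the array layout and synthetic data are assumptions, not the FulMAI output format.

```python
# Mean inter-individual distance from two (time, 3) trajectory arrays.
# Array shapes and the synthetic random-walk data are assumptions.
import numpy as np

def mean_pairwise_distance(traj_a: np.ndarray, traj_b: np.ndarray) -> float:
    """Mean Euclidean distance between two animals over matched frames,
    ignoring frames where either trajectory has missing (NaN) coordinates."""
    d = np.linalg.norm(traj_a - traj_b, axis=1)
    return float(np.nanmean(d))

frames = 1000
animal1 = np.cumsum(np.random.randn(frames, 3) * 0.01, axis=0)
animal2 = animal1 + np.array([0.5, 0.0, 0.0]) + np.random.randn(frames, 3) * 0.05
print(round(mean_pairwise_distance(animal1, animal2), 2))  # roughly 0.5 apart
```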
Affiliation(s)
- Terumi Yurimoto: Department of Marmoset Biology and Medicine, Central Institute for Experimental Medicine and Life Science, Kawasaki, 210-0821, Japan
- Wakako Kumita: Department of Marmoset Biology and Medicine, Central Institute for Experimental Medicine and Life Science, Kawasaki, 210-0821, Japan
- Kenya Sato: Department of Marmoset Biology and Medicine, Central Institute for Experimental Medicine and Life Science, Kawasaki, 210-0821, Japan
- Rika Kikuchi: Department of Marmoset Biology and Medicine, Central Institute for Experimental Medicine and Life Science, Kawasaki, 210-0821, Japan
- Gohei Oka: Department of Marmoset Biology and Medicine, Central Institute for Experimental Medicine and Life Science, Kawasaki, 210-0821, Japan
- Yusuke Shibuki: Department of Marmoset Biology and Medicine, Central Institute for Experimental Medicine and Life Science, Kawasaki, 210-0821, Japan
- Rino Hashimoto: Department of Marmoset Biology and Medicine, Central Institute for Experimental Medicine and Life Science, Kawasaki, 210-0821, Japan
- Michiko Kamioka: Department of Marmoset Biology and Medicine, Central Institute for Experimental Medicine and Life Science, Kawasaki, 210-0821, Japan
- Yumi Hayasegawa: Department of Marmoset Biology and Medicine, Central Institute for Experimental Medicine and Life Science, Kawasaki, 210-0821, Japan
- Eiko Yamazaki: Department of Marmoset Biology and Medicine, Central Institute for Experimental Medicine and Life Science, Kawasaki, 210-0821, Japan
- Yoko Kurotaki: Center of Basic Technology in Marmoset, Central Institute for Experimental Medicine and Life Science, Kawasaki, 210-0821, Japan
- Norio Goda: Public Digital Transformation Department, Hitachi, Ltd., Shinagawa, 140-8512, Japan
- Junichi Kitakami: Vision AI Solution Design Department, Hitachi Solutions Technology, Ltd., Tachikawa, 190-0014, Japan
- Tatsuya Fujita: Engineering Department, Eastern Japan division, Totec Amenity Limited, Shinjuku, 163-0417, Japan
- Takashi Inoue: Department of Marmoset Biology and Medicine, Central Institute for Experimental Medicine and Life Science, Kawasaki, 210-0821, Japan
- Erika Sasaki: Department of Marmoset Biology and Medicine, Central Institute for Experimental Medicine and Life Science, Kawasaki, 210-0821, Japan
9
Best P, Paris S, Glotin H, Marxer R. Deep audio embeddings for vocalisation clustering. PLoS One 2023; 18:e0283396. PMID: 37428759; PMCID: PMC10332598; DOI: 10.1371/journal.pone.0283396.
Abstract
The study of non-human animals' communication systems generally relies on the transcription of vocal sequences using a finite set of discrete units. This set is referred to as a vocal repertoire, which is specific to a species or a sub-group of a species. When conducted by human experts, the formal description of vocal repertoires can be laborious and/or biased. This motivates computerised assistance for this procedure, for which machine learning algorithms represent a good opportunity. Unsupervised clustering algorithms are suited for grouping close points together, provided a relevant representation. This paper therefore studies a new method for encoding vocalisations, allowing for automatic clustering to alleviate vocal repertoire characterisation. Borrowing from deep representation learning, we use a convolutional auto-encoder network to learn an abstract representation of vocalisations. We report on the quality of the learnt representation, as well as that of state-of-the-art methods, by quantifying their agreement with expert labelled vocalisation types from 8 datasets of other studies across 6 species (birds and marine mammals). With this benchmark, we demonstrate that using auto-encoders improves the relevance of vocalisation representation, which serves repertoire characterisation using a very limited number of settings. We also publish a Python package for the bioacoustic community to train their own vocalisation auto-encoders or use a pretrained encoder to browse vocal repertoires and ease unit-wise annotation.
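The evaluation logic described here, clustering fixed-size embeddings and scoring their agreement with expert labels, can be sketched as follows; random vectors stand in for the auto-encoder embeddings, and k-means plus normalized mutual information are generic choices rather than the paper's exact configuration.

```python
# Cluster stand-in embeddings and quantify agreement with expert labels.
# k-means and NMI are generic choices for this sketch, not the paper's setup.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score

rng = np.random.default_rng(0)
expert_labels = rng.integers(0, 3, size=300)          # 3 vocalisation types
embeddings = rng.normal(size=(300, 16))               # stand-in embeddings
embeddings += expert_labels[:, None] * 2.0            # make the types separable

predicted = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(embeddings)
print(normalized_mutual_info_score(expert_labels, predicted))  # close to 1.0
```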
Affiliation(s)
- Paul Best: Université de Toulon, Aix Marseille Univ, CNRS, LIS, Toulon, France
- Sébastien Paris: Université de Toulon, Aix Marseille Univ, CNRS, LIS, Toulon, France
- Hervé Glotin: Université de Toulon, Aix Marseille Univ, CNRS, LIS, Toulon, France
- Ricard Marxer: Université de Toulon, Aix Marseille Univ, CNRS, LIS, Toulon, France
10
Jourjine N, Woolfolk ML, Sanguinetti-Scheck JI, Sabatini JE, McFadden S, Lindholm AK, Hoekstra HE. Two pup vocalization types are genetically and functionally separable in deer mice. Curr Biol 2023; 33:1237-1248.e4. PMID: 36893759; DOI: 10.1016/j.cub.2023.02.045.
Abstract
Vocalization is a widespread social behavior in vertebrates that can affect fitness in the wild. Although many vocal behaviors are highly conserved, heritable features of specific vocalization types can vary both within and between species, raising the questions of why and how some vocal behaviors evolve. Here, using new computational tools to automatically detect and cluster vocalizations into distinct acoustic categories, we compare pup isolation calls across neonatal development in eight taxa of deer mice (genus Peromyscus) and compare them with laboratory mice (C57BL6/J strain) and free-living, wild house mice (Mus musculus domesticus). Whereas both Peromyscus and Mus pups produce ultrasonic vocalizations (USVs), Peromyscus pups also produce a second call type with acoustic features, temporal rhythms, and developmental trajectories that are distinct from those of USVs. In deer mice, these lower frequency "cries" are predominantly emitted in postnatal days one through nine, whereas USVs are primarily made after day 9. Using playback assays, we show that cries result in a more rapid approach by Peromyscus mothers than USVs, suggesting a role for cries in eliciting parental care early in neonatal development. Using a genetic cross between two sister species of deer mice exhibiting large, innate differences in the acoustic structure of cries and USVs, we find that variation in vocalization rate, duration, and pitch displays different degrees of genetic dominance and that cry and USV features can be uncoupled in second-generation hybrids. Taken together, this work shows that vocal behavior can evolve quickly between closely related rodent species in which vocalization types, likely serving distinct functions in communication, are controlled by distinct genetic loci.
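As a rough illustration of separating two call types by simple acoustic features (low-frequency cries versus USVs), the sketch below fits a two-component Gaussian mixture to synthetic peak-frequency and duration values; it is a generic stand-in, not the authors' detection and clustering tools.

```python
# Generic two-cluster separation of call types on synthetic acoustic features;
# the feature values below are invented and only loosely inspired by the paper.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
# Synthetic calls: [peak frequency (kHz), duration (ms)]
cries = np.column_stack([rng.normal(20, 3, 200), rng.normal(120, 20, 200)])
usvs = np.column_stack([rng.normal(70, 8, 200), rng.normal(40, 10, 200)])
features = np.vstack([cries, usvs])

labels = GaussianMixture(n_components=2, random_state=0).fit_predict(features)
print(np.bincount(labels))  # roughly 200 calls assigned to each cluster
```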
Affiliation(s)
- Nicholas Jourjine: Department of Molecular & Cellular Biology, Department of Organismic & Evolutionary Biology, Center for Brain Science, Museum of Comparative Zoology, Harvard University and the Howard Hughes Medical Institute, 16 Divinity Avenue, Cambridge, MA 02138, USA
- Maya L Woolfolk: Department of Molecular & Cellular Biology, Department of Organismic & Evolutionary Biology, Center for Brain Science, Museum of Comparative Zoology, Harvard University and the Howard Hughes Medical Institute, 16 Divinity Avenue, Cambridge, MA 02138, USA
- Juan I Sanguinetti-Scheck: Department of Molecular & Cellular Biology, Department of Organismic & Evolutionary Biology, Center for Brain Science, Museum of Comparative Zoology, Harvard University and the Howard Hughes Medical Institute, 16 Divinity Avenue, Cambridge, MA 02138, USA
- John E Sabatini: Department of Molecular & Cellular Biology, Department of Organismic & Evolutionary Biology, Center for Brain Science, Museum of Comparative Zoology, Harvard University and the Howard Hughes Medical Institute, 16 Divinity Avenue, Cambridge, MA 02138, USA
- Sade McFadden: Department of Molecular & Cellular Biology, Department of Organismic & Evolutionary Biology, Center for Brain Science, Museum of Comparative Zoology, Harvard University and the Howard Hughes Medical Institute, 16 Divinity Avenue, Cambridge, MA 02138, USA
- Anna K Lindholm: Department of Evolutionary Biology & Environmental Studies, University of Zürich, Winterthurerstrasse 190, 8057 Zürich, Switzerland
- Hopi E Hoekstra: Department of Molecular & Cellular Biology, Department of Organismic & Evolutionary Biology, Center for Brain Science, Museum of Comparative Zoology, Harvard University and the Howard Hughes Medical Institute, 16 Divinity Avenue, Cambridge, MA 02138, USA
11
Lorenz C, Hao X, Tomka T, Rüttimann L, Hahnloser RH. Interactive extraction of diverse vocal units from a planar embedding without the need for prior sound segmentation. Front Bioinform 2023; 2:966066. PMID: 36710910; PMCID: PMC9880044; DOI: 10.3389/fbinf.2022.966066.
Abstract
Annotating and proofreading data sets of complex natural behaviors such as vocalizations are tedious tasks because instances of a given behavior need to be correctly segmented from background noise and must be classified with minimal false positive error rate. Low-dimensional embeddings have proven very useful for this task because they can provide a visual overview of a data set in which distinct behaviors appear in different clusters. However, low-dimensional embeddings introduce errors because they fail to preserve distances, and because embeddings represent only objects of fixed dimensionality, which conflicts with vocalizations that have variable dimensions stemming from their variable durations. To mitigate these issues, we introduce a semi-supervised, analytical method for simultaneous segmentation and clustering of vocalizations. We define a given vocalization type by specifying pairs of high-density regions in the embedding plane of sound spectrograms, one region associated with vocalization onsets and the other with offsets. We demonstrate our two-neighborhood (2N) extraction method on the task of clustering adult zebra finch vocalizations embedded with UMAP. We show that 2N extraction allows the identification of short and long vocal renditions from continuous data streams without initially committing to a particular segmentation of the data. Also, 2N extraction achieves a much lower false positive error rate than comparable approaches based on a single defining region. Along with our method, we present a graphical user interface (GUI) for visualizing and annotating data.
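The first ingredient of the 2N approach, embedding fixed-size spectrogram windows in a plane and selecting points that fall in a chosen region, can be sketched as below; random data stand in for spectrogram windows, the rectangular region is hypothetical, and the full method's paired onset/offset regions and GUI are not reproduced.

```python
# Embed stand-in spectrogram windows with UMAP and select points inside a
# rectangular region of the plane. Only the embedding-plus-region-selection
# step is shown; the 2N onset/offset pairing itself is not implemented here.
import numpy as np
import umap  # pip install umap-learn

rng = np.random.default_rng(0)
windows = rng.normal(size=(500, 128))            # stand-in spectrogram windows
embedding = umap.UMAP(n_components=2, random_state=0).fit_transform(windows)

def in_region(points, x_range, y_range):
    """Boolean mask of embedded points falling inside a rectangle."""
    return ((points[:, 0] >= x_range[0]) & (points[:, 0] <= x_range[1]) &
            (points[:, 1] >= y_range[0]) & (points[:, 1] <= y_range[1]))

onset_mask = in_region(embedding, (-5, 0), (-5, 0))   # hypothetical region
print(int(onset_mask.sum()), "windows fall inside the selected region")
```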
Affiliation(s)
- Corinna Lorenz: Institute of Neuroinformatics and Neuroscience Center Zurich, University of Zurich and ETH Zurich, Zurich, Switzerland; Université Paris-Saclay, CNRS, Institut des Neurosciences Paris-Saclay, Saclay, France
- Xinyu Hao: Institute of Neuroinformatics and Neuroscience Center Zurich, University of Zurich and ETH Zurich, Zurich, Switzerland; School of Electrical and Information Engineering, Tianjin University, Tianjin, China
- Tomas Tomka: Institute of Neuroinformatics and Neuroscience Center Zurich, University of Zurich and ETH Zurich, Zurich, Switzerland
- Linus Rüttimann: Institute of Neuroinformatics and Neuroscience Center Zurich, University of Zurich and ETH Zurich, Zurich, Switzerland
- Richard H.R. Hahnloser: Institute of Neuroinformatics and Neuroscience Center Zurich, University of Zurich and ETH Zurich, Zurich, Switzerland
12
Michaud F, Sueur J, Le Cesne M, Haupert S. Unsupervised classification to improve the quality of a bird song recording dataset. Ecol Inform 2022. DOI: 10.1016/j.ecoinf.2022.101952.
13
Cohen Y, Engel TA, Langdon C, Lindsay GW, Ott T, Peters MAK, Shine JM, Breton-Provencher V, Ramaswamy S. Recent Advances at the Interface of Neuroscience and Artificial Neural Networks. J Neurosci 2022; 42:8514-8523. PMID: 36351830; PMCID: PMC9665920; DOI: 10.1523/jneurosci.1503-22.2022.
Abstract
Biological neural networks adapt and learn in diverse behavioral contexts. Artificial neural networks (ANNs) have exploited biological properties to solve complex problems. However, despite their effectiveness for specific tasks, ANNs are yet to realize the flexibility and adaptability of biological cognition. This review highlights recent advances in computational and experimental research to advance our understanding of biological and artificial intelligence. In particular, we discuss critical mechanisms from the cellular, systems, and cognitive neuroscience fields that have contributed to refining the architecture and training algorithms of ANNs. Additionally, we discuss how recent work used ANNs to understand complex neuronal correlates of cognition and to process high throughput behavioral data.
Affiliation(s)
- Yarden Cohen: Department of Brain Sciences, Weizmann Institute of Science, Rehovot, 76100, Israel
- Tatiana A Engel: Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, NY 11724
- Grace W Lindsay: Department of Psychology, Center for Data Science, New York University, New York, NY 10003
- Torben Ott: Bernstein Center for Computational Neuroscience Berlin, Institute of Biology, Humboldt University of Berlin, 10117, Berlin, Germany
- Megan A K Peters: Department of Cognitive Sciences, University of California-Irvine, Irvine, CA 92697
- James M Shine: Brain and Mind Centre, University of Sydney, Sydney, NSW 2006, Australia
- Srikanth Ramaswamy: Biosciences Institute, Newcastle University, Newcastle upon Tyne, NE2 4HH, United Kingdom
14
McGregor JN, Grassler AL, Jaffe PI, Jacob AL, Brainard MS, Sober SJ. Shared mechanisms of auditory and non-auditory vocal learning in the songbird brain. eLife 2022; 11:75691. PMID: 36107757; PMCID: PMC9522248; DOI: 10.7554/elife.75691.
Abstract
Songbirds and humans share the ability to adaptively modify their vocalizations based on sensory feedback. Prior studies have focused primarily on the role that auditory feedback plays in shaping vocal output throughout life. In contrast, it is unclear how non-auditory information drives vocal plasticity. Here, we first used a reinforcement learning paradigm to establish that somatosensory feedback (cutaneous electrical stimulation) can drive vocal learning in adult songbirds. We then assessed the role of a songbird basal ganglia thalamocortical pathway critical to auditory vocal learning in this novel form of vocal plasticity. We found that both this circuit and its dopaminergic inputs are necessary for non-auditory vocal learning, demonstrating that this pathway is critical for guiding adaptive vocal changes based on both auditory and somatosensory signals. The ability of this circuit to use both auditory and somatosensory information to guide vocal learning may reflect a general principle for the neural systems that support vocal plasticity across species.
Affiliation(s)
- James N McGregor: Neuroscience Graduate Program, Graduate Division of Biological and Biomedical Sciences, Laney Graduate School, Emory University, Atlanta, United States
- Paul I Jaffe: Center for Integrative Neuroscience, University of California, San Francisco, San Francisco, United States
- Michael S Brainard: Center for Integrative Neuroscience, University of California, San Francisco, San Francisco, United States; Howard Hughes Medical Institute, University of California, San Francisco, San Francisco, United States
- Samuel J Sober: Department of Biology, Emory University, Atlanta, United States
15
Rookognise: Acoustic detection and identification of individual rooks in field recordings using multi-task neural networks. Ecol Inform 2022. DOI: 10.1016/j.ecoinf.2022.101818.