1
Fortkord L, Veit L. Social context affects sequence modification learning in birdsong. Front Psychol 2025;16:1488762. PMID: 39973966; PMCID: PMC11835814; DOI: 10.3389/fpsyg.2025.1488762.
Abstract
Social interactions are crucial for imitative vocal learning such as human speech learning or song learning in songbirds. Recently, introducing specific learned modifications into adult song by experimenter-controlled reinforcement learning has emerged as a key protocol to study aspects of vocal learning in songbirds. This form of adult plasticity does not require conspecifics as a model for imitation or to provide social feedback on song performance. We therefore hypothesized that social interactions are irrelevant to, or even inhibit, song modification learning. We tested whether social context affects song sequence learning in adult male Bengalese finches (Lonchura striata domestica). We targeted specific syllable sequences in adult birds' songs with negative auditory feedback, which led the birds to reduce the targeted syllable sequence in favor of alternate sequences. Changes were apparent in catch trials without feedback, indicating a learning process. Each experiment was repeated within subjects with three different social contexts (male-male, MM; male-female, MF; and male alone, MA) in randomized order. We found robust learning in all three social contexts, with a nonsignificant trend toward facilitated learning with social company (MF, MM) compared to the single-housed (MA) condition. This effect could not be explained by the order of social contexts, nor by different singing rates across contexts. Our results demonstrate that social context can influence the degree of learning in adult birds even in experimenter-controlled reinforcement learning tasks, and therefore suggest that social interactions might facilitate song plasticity beyond their known roles in imitation and social feedback.
Affiliation(s)
- Lena Veit
- Neurobiology of Vocal Communication, Institute for Neurobiology, University of Tübingen, Tübingen, Germany
2
Maoudj I, Kuwano A, Panheleux C, Kubota Y, Kawamata T, Muragaki Y, Masamune K, Seizeur R, Dardenne G, Tamura M. Classification of speech arrests and speech impairments during awake craniotomy: a multi-databases analysis. Int J Comput Assist Radiol Surg 2025;20:217-224. PMID: 39652158; DOI: 10.1007/s11548-024-03301-0.
Abstract
PURPOSE: Awake craniotomy presents a unique opportunity to map and preserve critical brain functions, particularly speech, during tumor resection. The ability to accurately assess linguistic functions in real time not only enhances surgical precision but also contributes significantly to improving postoperative outcomes. Today, however, this evaluation is subjective, as it relies only on a clinician's observations. This paper explores the use of a deep learning-based model for the objective assessment of speech arrests and speech impairments during awake craniotomy. METHODS: We extracted 1883 3-second audio clips containing the patient's response following direct electrical stimulation from 23 awake craniotomies recorded in two operating rooms of the Tokyo Women's Medical University Hospital (Japan) and two awake craniotomies recorded at the University Hospital of Brest (France). A Wav2Vec2-based model was trained and used to detect speech arrests and speech impairments. Experiments were performed with different dataset settings and preprocessing techniques, and the performance of the model was evaluated using the F1-score. RESULTS: The F1-score was 84.12% when the model was trained and tested on Japanese data only. In a cross-language setting, the F1-score was 74.68% when the model was trained on Japanese data and tested on French data. CONCLUSIONS: The results are encouraging, even in a cross-language setting, but further evaluation is required. The integration of preprocessing techniques, in particular noise reduction, improved the results significantly.
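As a rough illustration of the approach described above, the sketch below sets up a Wav2Vec2-based classifier over short audio clips with the Hugging Face transformers library. The checkpoint name, two-way label set, and preprocessing are illustrative assumptions, not the authors' configuration, and the classification head would need fine-tuning on labeled clips before real use.

```python
import numpy as np
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2ForSequenceClassification

# Illustrative assumptions: a generic pretrained checkpoint and a two-way
# label set (intact speech vs. arrest/impairment); the paper's setup may differ.
CHECKPOINT = "facebook/wav2vec2-base"
extractor = Wav2Vec2FeatureExtractor.from_pretrained(CHECKPOINT)
model = Wav2Vec2ForSequenceClassification.from_pretrained(CHECKPOINT, num_labels=2)
model.eval()

def classify_clip(waveform: np.ndarray, sr: int = 16000) -> int:
    """Classify one 3-second mono clip; returns 0 (intact) or 1 (impaired)."""
    inputs = extractor(waveform, sampling_rate=sr, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return int(logits.argmax(dim=-1).item())

clip = np.random.randn(3 * 16000).astype(np.float32)  # dummy patient response
print(classify_clip(clip))
```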
Affiliation(s)
- Ilias Maoudj
- Laboratoire de Traitement de l'Information Médicale, UMR INSERM 1101, Brest, France.
- Atsushi Kuwano
- Department of Neurosurgery, Tokyo Women's Medical University, Tokyo, Japan
- Faculty of Advanced Techno-Surgery, Institute of Advanced Biomedical Engineering and Science, Tokyo Women's Medical University, Tokyo, Japan
- Céline Panheleux
- Laboratoire de Traitement de l'Information Médicale, UMR INSERM 1101, Brest, France
- Department of Neurosurgery, University Hospital Center of La Cavale Blanche, Brest, France
- Yuichi Kubota
- Department of Neurosurgery, Tokyo Women's Medical University Adachi Medical Center, Tokyo, Japan
- Takakazu Kawamata
- Institut National de la Santé et de la Recherche Médicale (INSERM), Brest, France
- Yoshihiro Muragaki
- Department of Neurosurgery, Tokyo Women's Medical University, Tokyo, Japan
- Faculty of Advanced Techno-Surgery, Institute of Advanced Biomedical Engineering and Science, Tokyo Women's Medical University, Tokyo, Japan
- Center for Advanced Medical Engineering Research and Development, Kobe University, Hyogo, Japan
- Ken Masamune
- Faculty of Advanced Techno-Surgery, Institute of Advanced Biomedical Engineering and Science, Tokyo Women's Medical University, Tokyo, Japan
- Romuald Seizeur
- Laboratoire de Traitement de l'Information Médicale, UMR INSERM 1101, Brest, France
- Department of Neurosurgery, University Hospital Center of La Cavale Blanche, Brest, France
- Guillaume Dardenne
- Laboratoire de Traitement de l'Information Médicale, UMR INSERM 1101, Brest, France
- Institut National de la Santé et de la Recherche Médicale (INSERM), Brest, France
- Manabu Tamura
- Department of Neurosurgery, Tokyo Women's Medical University, Tokyo, Japan
- Faculty of Advanced Techno-Surgery, Institute of Advanced Biomedical Engineering and Science, Tokyo Women's Medical University, Tokyo, Japan
3
Lohmann F, Allenspach S, Atz K, Schiebroek CCG, Hiss JA, Schneider G. Protein Binding Site Representation in Latent Space. Mol Inform 2025;44:e202400205. PMID: 39692081; PMCID: PMC11733832; DOI: 10.1002/minf.202400205.
Abstract
Interpretability and reliability of deep learning models are important for computer-based drug discovery. Aiming to understand feature perception by such a model, we investigate a graph neural network for affinity prediction of protein-ligand complexes. We assess a latent representation of ligand binding sites and investigate underlying geometric structure in this latent space and its relation to protein function. We introduce an automated computational pipeline for dimensionality reduction, clustering, hypothesis testing, and visualization of latent space. The results indicate that the learned protein latent space is inherently structured and not randomly distributed. Several of the identified protein binding site clusters in latent space correspond to functional protein families. Ligand size was found to be a determinant of cluster geometry. The computational pipeline proved applicable to latent space analysis and interpretation and can be adapted to work for different datasets and deep learning models.
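The abstract outlines a pipeline of dimensionality reduction, clustering, and hypothesis testing over a learned latent space. A minimal sketch of that general pattern, using stand-in random data and standard scikit-learn/SciPy components rather than the authors' actual pipeline, might look like this:

```python
import numpy as np
from scipy.stats import kruskal
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 128))        # stand-in binding-site embeddings
ligand_size = rng.normal(30, 5, size=500)   # stand-in per-site property

# 1) Dimensionality reduction (the paper's pipeline could equally use UMAP).
coords = PCA(n_components=2, random_state=0).fit_transform(latent)
# 2) Clustering in the reduced space.
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(coords)
# 3) Hypothesis test: does the per-site property differ across clusters?
groups = [ligand_size[labels == k] for k in np.unique(labels)]
stat, p = kruskal(*groups)
print(f"Kruskal-Wallis H = {stat:.2f}, p = {p:.3g}")
```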
Affiliation(s)
- Frederieke Lohmann
- Department of Chemistry and Applied Biosciences, ETH Zurich, Vladimir-Prelog-Weg 4, 8093 Zürich, Switzerland
- Stephan Allenspach
- Department of Chemistry and Applied Biosciences, ETH Zurich, Vladimir-Prelog-Weg 4, 8093 Zürich, Switzerland
- Kenneth Atz
- Department of Chemistry and Applied Biosciences, ETH Zurich, Vladimir-Prelog-Weg 4, 8093 Zürich, Switzerland
- Carl C. G. Schiebroek
- Department of Chemistry and Applied Biosciences, ETH Zurich, Vladimir-Prelog-Weg 4, 8093 Zürich, Switzerland
- Jan A. Hiss
- Department of Chemistry and Applied Biosciences, ETH Zurich, Vladimir-Prelog-Weg 4, 8093 Zürich, Switzerland
- Department of Biosystems Science and Engineering, ETH Zurich, Klingelbergstrasse 48, 4056 Basel, Switzerland
- Gisbert Schneider
- Department of Chemistry and Applied Biosciences, ETH Zurich, Vladimir-Prelog-Weg 4, 8093 Zürich, Switzerland
- Department of Biosystems Science and Engineering, ETH Zurich, Klingelbergstrasse 48, 4056 Basel, Switzerland
4
Hu Z, Zhang Z, Li H, Yang LZ. Cross-device and test-retest reliability of speech acoustic measurements derived from consumer-grade mobile recording devices. Behav Res Methods 2024;57:35. PMID: 39738817; DOI: 10.3758/s13428-024-02584-0.
Abstract
In recent years, there has been growing interest in remote speech assessment through automated speech acoustic analysis. While the reliability of widely used features has been validated in professional recording settings, it remains unclear how the heterogeneity of consumer-grade recording devices, commonly used in nonclinical settings, impacts the reliability of these measurements. To address this issue, we systematically investigated the cross-device and test-retest reliability of classical speech acoustic measurements in a sample of healthy Chinese adults using consumer-grade equipment across three popular speech tasks: sustained phonation (SP), diadochokinesis (DDK), and picture description (PicD). A total of 51 participants completed two recording sessions spaced at least 24 hours apart. Speech outputs were recorded simultaneously using four devices: a voice recorder, laptop, tablet, and smartphone. Our results demonstrated good reliability for fundamental frequency and cepstral peak prominence in the SP task across testing sessions and devices. Other features from the SP and PicD tasks exhibited acceptable test-retest reliability, except for the period perturbation quotient from the tablet and formant frequency from the smartphone. However, measures from the DDK task showed a significant decrease in reliability on consumer-grade recording devices compared to professional devices. These findings indicate that the lower recording quality of consumer-grade equipment may compromise the reproducibility of syllable rate estimation, which is critical for DDK analysis. This study underscores the need for standardization of remote speech monitoring methodologies to ensure that remote home assessment provides accurate and reliable results for early screening.
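Test-retest and cross-device reliability of this kind is commonly summarized with intraclass correlation coefficients (ICCs). The sketch below computes a two-way ICC across devices with the pingouin package on dummy data; the ICC form, feature, and data layout are assumptions for illustration, not necessarily the paper's exact analysis.

```python
import numpy as np
import pandas as pd
import pingouin as pg

rng = np.random.default_rng(1)
n_subj = 30
true_f0 = rng.normal(120, 20, n_subj)  # dummy per-subject fundamental frequency
rows = [{"subject": s, "device": dev, "f0": true_f0[s] + rng.normal(0, 3)}
        for s in range(n_subj)
        for dev in ["recorder", "laptop", "tablet", "smartphone"]]
df = pd.DataFrame(rows)

# Cross-device reliability as a two-way ICC; ICC2 is one common choice.
icc = pg.intraclass_corr(data=df, targets="subject", raters="device", ratings="f0")
print(icc.loc[icc["Type"] == "ICC2", ["Type", "ICC", "CI95%"]])
```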
Affiliation(s)
- Zian Hu
- Institutes of Physical Science and Information Technology, Anhui University, Hefei, China
- Anhui Province Key Laboratory of Medical Physics and Technology, Institute of Health and Medical Technology, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, China
- Zhenglin Zhang
- Anhui Province Key Laboratory of Medical Physics and Technology, Institute of Health and Medical Technology, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, China
- University of Science and Technology of China, Hefei, China
- Hai Li
- Anhui Province Key Laboratory of Medical Physics and Technology, Institute of Health and Medical Technology, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, China.
- University of Science and Technology of China, Hefei, China.
- Hefei Cancer Hospital, Chinese Academy of Sciences, Hefei, China.
- Li-Zhuang Yang
- Anhui Province Key Laboratory of Medical Physics and Technology, Institute of Health and Medical Technology, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, China.
- University of Science and Technology of China, Hefei, China.
- Hefei Cancer Hospital, Chinese Academy of Sciences, Hefei, China.
5
Peterson RE, Choudhri A, Mitelut C, Tanelus A, Capo-Battaglia A, Williams AH, Schneider DM, Sanes DH. Unsupervised discovery of family specific vocal usage in the Mongolian gerbil. eLife 2024;12:RP89892. PMID: 39680425; DOI: 10.7554/eLife.89892.
Abstract
In nature, animal vocalizations can provide crucial information about identity, including kinship and hierarchy. However, lab-based vocal behavior is typically studied during brief interactions between animals with no prior social relationship, and under environmental conditions with limited ethological relevance. Here, we address this gap by establishing long-term acoustic recordings from Mongolian gerbil families, a core social group that uses an array of sonic and ultrasonic vocalizations. Three separate gerbil families were transferred to an enlarged environment and continuous 20-day audio recordings were obtained. Using a variational autoencoder (VAE) to quantify 583,237 vocalizations, we show that gerbils exhibit a more elaborate vocal repertoire than has been previously reported and that vocal repertoire usage differs significantly by family. By performing Gaussian mixture model clustering on the VAE latent space, we show that families preferentially use characteristic sets of vocal clusters and that these usage preferences remain stable over weeks. Furthermore, gerbils displayed family-specific transitions between vocal clusters. Since gerbils live naturally as extended families in complex underground burrows that are adjacent to other families, these results suggest the presence of a vocal dialect which could be exploited by animals to represent kinship. These findings position the Mongolian gerbil as a compelling animal model to study the neural basis of vocal communication and demonstrate the potential for using unsupervised machine learning with uninterrupted acoustic recordings to gain insights into naturalistic animal behavior.
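A minimal sketch of the clustering step, fitting a Gaussian mixture model to stand-in latent vectors and comparing cluster usage across families with scikit-learn; the dimensions, cluster count, and data are placeholders, not the study's values:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
latents = rng.normal(size=(10000, 32))     # stand-in VAE latent vectors
families = rng.integers(0, 3, size=10000)  # family ID per vocalization

gmm = GaussianMixture(n_components=20, covariance_type="full", random_state=0)
clusters = gmm.fit_predict(latents)

# Family-specific usage: distribution over vocal clusters for each family.
for fam in range(3):
    usage = np.bincount(clusters[families == fam], minlength=20)
    print(f"family {fam} cluster usage:", np.round(usage / usage.sum(), 3))
```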
Affiliation(s)
- Ralph E Peterson
- Center for Neural Science, New York University, New York, United States
- Center for Computational Neuroscience, Flatiron Institute, New York, United States
- Aman Choudhri
- Columbia University, New York, New York, United States
- Catalin Mitelut
- Center for Neural Science, New York University, New York, United States
- Aramis Tanelus
- Center for Neural Science, New York University, New York, United States
- Center for Computational Neuroscience, Flatiron Institute, New York, United States
- Alex H Williams
- Center for Neural Science, New York University, New York, United States
- Center for Computational Neuroscience, Flatiron Institute, New York, United States
- David M Schneider
- Center for Neural Science, New York University, New York, United States
- Dan H Sanes
- Center for Neural Science, New York University, New York, United States
- Department of Psychology, New York University, New York, United States
- Neuroscience Institute, New York University School of Medicine, New York, United States
- Department of Biology, New York University, New York, United States
6
Karpowicz BM, Ye J, Fan C, Tostado-Marcos P, Rizzoglio F, Washington C, Scodeler T, de Lucena D, Nason-Tomaszewski SR, Mender MJ, Ma X, Arneodo EM, Hochberg LR, Chestek CA, Henderson JM, Gentner TQ, Gilja V, Miller LE, Rouse AG, Gaunt RA, Collinger JL, Pandarinath C. Few-shot Algorithms for Consistent Neural Decoding (FALCON) Benchmark. bioRxiv [Preprint] 2024:2024.09.15.613126. PMID: 39345641; PMCID: PMC11429771; DOI: 10.1101/2024.09.15.613126.
Abstract
Intracortical brain-computer interfaces (iBCIs) can restore movement and communication abilities to individuals with paralysis by decoding their intended behavior from neural activity recorded with an implanted device. While this activity yields high-performance decoding over short timescales, neural data are often nonstationary, which can lead to decoder failure if not accounted for. To maintain performance, users must frequently recalibrate decoders, which requires the arduous collection of new neural and behavioral data. Aiming to reduce this burden, several approaches have been developed that either limit recalibration data requirements (few-shot approaches) or eliminate explicit recalibration entirely (zero-shot approaches). However, progress is limited by a lack of standardized datasets and comparison metrics, causing methods to be compared in an ad hoc manner. Here we introduce the FALCON benchmark suite (Few-shot Algorithms for COnsistent Neural decoding) to standardize evaluation of iBCI robustness. FALCON curates five datasets of neural and behavioral data that span movement and communication tasks to focus on behaviors of interest to modern-day iBCIs. Each dataset includes calibration data, optional few-shot recalibration data, and private evaluation data. We implement a flexible evaluation platform which only requires user-submitted code to return behavioral predictions on unseen data. We also seed the benchmark by applying baseline methods spanning several classes of possible approaches. FALCON aims to provide rigorous selection criteria for robust iBCI decoders, easing their translation to real-world devices.
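To make the zero-shot/few-shot distinction concrete, here is a toy baseline in the spirit of the benchmark's setup, not FALCON's actual API or datasets: a ridge decoder is fit on calibration data and optionally updated with a small recalibration set before evaluation on drifted held-out data.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n_units = 96
W = rng.normal(size=(n_units, 2))  # fixed "true" mapping: units -> 2-D velocity

def make_session(n=2000, drift=0.0):
    """Dummy session; `drift` stands in for nonstationarity across days."""
    X = rng.normal(size=(n, n_units))
    y = (1.0 + drift) * X @ W + rng.normal(scale=0.5, size=(n, 2))
    return X, y

X_cal, y_cal = make_session()                  # calibration data
X_few, y_few = make_session(n=200, drift=0.3)  # few-shot recalibration data
X_eval, y_eval = make_session(drift=0.3)       # held-out evaluation data

zero_shot = Ridge(alpha=1.0).fit(X_cal, y_cal)
few_shot = Ridge(alpha=1.0).fit(np.vstack([X_cal, X_few]),
                                np.vstack([y_cal, y_few]))
for name, model in [("zero-shot", zero_shot), ("few-shot", few_shot)]:
    print(name, "R^2 on evaluation:", round(model.score(X_eval, y_eval), 3))
```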
7
Li R, Huang G, Wang X, Lawler K, Goldberg LR, Roccati E, St George RJ, Aiyede M, King AE, Bindoff AD, Vickers JC, Bai Q, Alty J. Smartphone automated motor and speech analysis for early detection of Alzheimer's disease and Parkinson's disease: Validation of TapTalk across 20 different devices. Alzheimers Dement (Amst) 2024;16:e70025. PMID: 39445342; PMCID: PMC11496774; DOI: 10.1002/dad2.70025.
Abstract
INTRODUCTION: Smartphones are proving useful in assessing movement and speech function in Alzheimer's disease and other neurodegenerative conditions. Valid outcomes across different smartphones are needed before population-level tests are deployed. This study introduces the TapTalk protocol, a novel app designed to capture hand and speech function, and validates it in smartphones against gold-standard measures. METHODS: Twenty different smartphones collected video data from motor tests and audio data from speech tests. Features were extracted using Google Mediapipe (movement) and Python audio analysis packages (speech). Electromagnetic sensors (60 Hz) and a microphone acquired simultaneous movement and voice data, respectively. RESULTS: TapTalk video and audio outcomes were comparable to gold-standard data: 90.3% of video and 98.3% of audio data recorded tapping/speech frequencies within ±1 Hz of the gold-standard measures. DISCUSSION: Validation of TapTalk across a range of devices is an important step in the development of smartphone-based telemedicine and was achieved in this study.
Highlights:
- TapTalk evaluates hand motor and speech functions across a wide range of smartphones.
- Data showed 90.3% motor and 98.3% speech accuracy within ±1 Hz of gold standards.
- Validation advances smartphone-based telemedicine for neurodegenerative diseases.
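The ±1 Hz criterion concerns how closely app-derived tapping/speech rates track gold-standard measures. One simple, generic way to estimate such a repetition rate from audio, not TapTalk's implementation, is to take the dominant peak in the spectrum of the amplitude envelope:

```python
import numpy as np

def repetition_rate(audio, sr):
    """Estimate the dominant repetition frequency (Hz) from the spectrum of
    the amplitude envelope; the search band is an assumed tapping/syllable range."""
    env = np.abs(audio) - np.abs(audio).mean()
    spectrum = np.abs(np.fft.rfft(env))
    freqs = np.fft.rfftfreq(env.size, d=1.0 / sr)
    band = (freqs >= 0.5) & (freqs <= 10.0)
    return float(freqs[band][np.argmax(spectrum[band])])

# Synthetic check: a 3 Hz click train sampled at 16 kHz.
sr, rate = 16000, 3.0
t = np.arange(5 * sr) / sr
clicks = (np.sin(2 * np.pi * rate * t) > 0.99).astype(float)
print(repetition_rate(clicks, sr))  # ~3.0, i.e., within the ±1 Hz criterion
```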
Affiliation(s)
- Renjie Li
- Wicking Dementia Research and Education Centre, University of Tasmania, Hobart, Tasmania, Australia
- School of ICT, University of Tasmania, Hobart, Tasmania, Australia
- Guan Huang
- Wicking Dementia Research and Education Centre, University of Tasmania, Hobart, Tasmania, Australia
- Xinyi Wang
- Wicking Dementia Research and Education Centre, University of Tasmania, Hobart, Tasmania, Australia
- Katherine Lawler
- Wicking Dementia Research and Education Centre, University of Tasmania, Hobart, Tasmania, Australia
- School of Allied Health, Human Services and Sport, La Trobe University, Melbourne, Victoria, Australia
- Lynette R. Goldberg
- Wicking Dementia Research and Education Centre, University of Tasmania, Hobart, Tasmania, Australia
- Eddy Roccati
- Wicking Dementia Research and Education Centre, University of Tasmania, Hobart, Tasmania, Australia
- Mimieveshiofuo Aiyede
- Wicking Dementia Research and Education Centre, University of Tasmania, Hobart, Tasmania, Australia
- Anna E. King
- Wicking Dementia Research and Education Centre, University of Tasmania, Hobart, Tasmania, Australia
- Aidan D. Bindoff
- Wicking Dementia Research and Education Centre, University of Tasmania, Hobart, Tasmania, Australia
- James C. Vickers
- Wicking Dementia Research and Education Centre, University of Tasmania, Hobart, Tasmania, Australia
- Quan Bai
- School of ICT, University of Tasmania, Hobart, Tasmania, Australia
- Jane Alty
- Wicking Dementia Research and Education Centre, University of Tasmania, Hobart, Tasmania, Australia
- School of Medicine, University of Tasmania, Hobart, Tasmania, Australia
- Neurology Department, Royal Hobart Hospital, Hobart, Tasmania, Australia
8
Mielke A, Badihi G, Graham KE, Grund C, Hashimoto C, Piel AK, Safryghin A, Slocombe KE, Stewart F, Wilke C, Zuberbühler K, Hobaiter C. Many morphs: Parsing gesture signals from the noise. Behav Res Methods 2024;56:6520-6537. PMID: 38438657; PMCID: PMC11362259; DOI: 10.3758/s13428-024-02368-6.
Abstract
Parsing signals from noise is a general problem for signallers and recipients, and for researchers studying communicative systems. Substantial efforts have been invested in comparing how other species encode information and meaning, and how signalling is structured. However, research depends on identifying and discriminating signals that represent meaningful units of analysis. Early approaches to defining signal repertoires applied top-down approaches, classifying cases into predefined signal types. Recently, more labour-intensive methods have taken a bottom-up approach, describing detailed features of each signal and clustering cases based on patterns of similarity in multi-dimensional feature-space that were previously undetectable. Nevertheless, it remains essential to assess whether the resulting repertoires are composed of relevant units from the perspective of the species using them, and to redefine repertoires when additional data become available. In this paper we provide a framework that takes data from the largest set of wild chimpanzee (Pan troglodytes) gestures currently available, splits gesture types at a fine scale based on modifying features of gesture expression using latent class analysis (a model-based cluster detection algorithm for categorical variables), and then determines whether this splitting process reduces uncertainty about the goal or community of the gesture. Our method allows different features of interest to be incorporated into the splitting process, providing substantial future flexibility across, for example, species, populations, and levels of signal granularity. In doing so, we provide a powerful tool that allows researchers interested in gestural communication to establish repertoires of relevant units for subsequent analyses within and between systems of communication.
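Latent class analysis fits a mixture of independent categorical distributions and assigns cases posterior class memberships. As a self-contained sketch of the idea (not the authors' software pipeline), a small EM implementation over integer-coded modifier features could look like this:

```python
import numpy as np

def lca_em(X, n_classes, n_iter=200, seed=0):
    """Latent class analysis via EM for integer-coded categorical data.
    X: (n_cases, n_features). Returns class weights, per-class category
    probabilities, and posterior class memberships."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    n_cats = X.max(axis=0) + 1
    pi = np.full(n_classes, 1.0 / n_classes)
    theta = [rng.dirichlet(np.ones(c), size=n_classes) for c in n_cats]
    for _ in range(n_iter):
        # E-step: posterior probability of each class for each case.
        log_r = np.tile(np.log(pi), (n, 1))
        for j in range(d):
            log_r += np.log(theta[j][:, X[:, j]].T + 1e-12)
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update class weights and category probabilities.
        pi = r.mean(axis=0)
        for j in range(d):
            for c in range(n_cats[j]):
                theta[j][:, c] = r[X[:, j] == c].sum(axis=0)
            theta[j] /= theta[j].sum(axis=1, keepdims=True)
    return pi, theta, r

# Demo: two latent "morphs" differing in two categorical modifier features.
rng = np.random.default_rng(1)
z = rng.integers(0, 2, size=400)
X = np.stack([rng.binomial(1, np.where(z == 0, 0.9, 0.1)),
              rng.binomial(2, np.where(z == 0, 0.8, 0.2))], axis=1)
pi, theta, post = lca_em(X, n_classes=2)
print("recovered class weights:", np.round(pi, 2))
```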
Affiliation(s)
- Alexander Mielke
- Wild Minds Lab, School of Psychology and Neuroscience, University of St Andrews, St Andrews, UK.
- School of Biological and Behavioural Sciences, Queen Mary University of London, London, UK.
- Gal Badihi
- Wild Minds Lab, School of Psychology and Neuroscience, University of St Andrews, St Andrews, UK
- Kirsty E Graham
- Wild Minds Lab, School of Psychology and Neuroscience, University of St Andrews, St Andrews, UK
- Charlotte Grund
- Wild Minds Lab, School of Psychology and Neuroscience, University of St Andrews, St Andrews, UK
- Chie Hashimoto
- Primate Research Institute, Kyoto University, Kyoto, Japan
- Alex K Piel
- Department of Anthropology, University College London, London, UK
- Department of Human Origins, Max Planck Institute of Evolutionary Anthropology, Leipzig, Germany
- Alexandra Safryghin
- Wild Minds Lab, School of Psychology and Neuroscience, University of St Andrews, St Andrews, UK
- Fiona Stewart
- Department of Anthropology, University College London, London, UK
- Department of Human Origins, Max Planck Institute of Evolutionary Anthropology, Leipzig, Germany
- Claudia Wilke
- Department of Psychology, University of York, York, UK
- Klaus Zuberbühler
- Institute of Biology, University of Neuchâtel, Neuchâtel, Switzerland
- Catherine Hobaiter
- Wild Minds Lab, School of Psychology and Neuroscience, University of St Andrews, St Andrews, UK
| |
Collapse
|
9
|
Cauzinille J, Favre B, Marxer R, Rey A. Applying machine learning to primate bioacoustics: Review and perspectives. Am J Primatol 2024;86:e23666. PMID: 39120066; DOI: 10.1002/ajp.23666.
Abstract
This paper provides a comprehensive review of the use of computational bioacoustics as well as signal and speech processing techniques in the analysis of primate vocal communication. We explore the potential of machine learning and deep learning methods, from simple supervised algorithms to more recent self-supervised models, for processing and analyzing the large data sets produced by the emergence of passive acoustic monitoring approaches. In addition, we discuss the importance of automated primate vocalization analysis in tackling essential questions on animal communication and highlight the role of comparative linguistics in bioacoustic research. We also examine the challenges associated with data collection and annotation and provide insights into potential solutions. Overall, this review runs through a set of common and innovative perspectives and applications of machine learning for primate vocal communication analysis and outlines opportunities for future research in this rapidly developing field.
Affiliation(s)
- Jules Cauzinille
- LIS, CNRS, Aix-Marseille University, Marseille, France
- CRPN, CNRS, Aix-Marseille University, Marseille, France
- ILCB, Aix-Marseille University, Marseille, France
- Benoit Favre
- LIS, CNRS, Aix-Marseille University, Marseille, France
- ILCB, Aix-Marseille University, Marseille, France
- Ricard Marxer
- ILCB, Aix-Marseille University, Marseille, France
- LIS, CNRS, Université de Toulon, Toulon, France
- Arnaud Rey
- CRPN, CNRS, Aix-Marseille University, Marseille, France
- ILCB, Aix-Marseille University, Marseille, France
| |
Collapse
|
10
|
Milligan BG, Rohde AT. Why More Biologists Must Embrace Quantitative Modeling. Integr Comp Biol 2024;64:975-986. PMID: 38740442; DOI: 10.1093/icb/icae038.
Abstract
Biology as a field has transformed since the time of its foundation from an organized enterprise cataloging the diversity of the natural world to a quantitatively rigorous science seeking to answer complex questions about the functions of organisms and their interactions with each other and their environments. As the mathematical rigor of biological analyses has improved, quantitative models have been developed to describe multi-mechanistic systems and to test complex hypotheses. However, applications of quantitative models have been uneven across fields, and many biologists lack the foundational training necessary to apply them in their research or to interpret their results to inform biological problem-solving efforts. This gap in scientific training has created a false dichotomy of "biologists" and "modelers" that only exacerbates the barriers to working biologists seeking additional training in quantitative modeling. Here, we make the argument that all biologists are modelers and are capable of using sophisticated quantitative modeling in their work. We highlight four benefits of conducting biological research within the framework of quantitative models, identify the potential producers and consumers of information produced by such models, and make recommendations for strategies to overcome barriers to their widespread implementation. Improved understanding of quantitative modeling could guide the producers of biological information to better apply biological measurements through analyses that evaluate mechanisms, and allow consumers of biological information to better judge the quality and applications of the information they receive. As our explanations of biological phenomena increase in complexity, so too must we embrace modeling as a foundational skill.
Affiliation(s)
- Brook G Milligan
- Department of Biology, New Mexico State University, Las Cruces, NM 88001, USA
- Ashley T Rohde
- Department of Biology, New Mexico State University, Las Cruces, NM 88001, USA
11
Norman-Haignere SV, Keshishian MK, Devinsky O, Doyle W, McKhann GM, Schevon CA, Flinker A, Mesgarani N. Temporal integration in human auditory cortex is predominantly yoked to absolute time, not structure duration. bioRxiv [Preprint] 2024:2024.09.23.614358. PMID: 39386565; PMCID: PMC11463558; DOI: 10.1101/2024.09.23.614358.
Abstract
Sound structures such as phonemes and words have highly variable durations. Thus, there is a fundamental difference between integrating across absolute time (e.g., 100 ms) vs. sound structure (e.g., phonemes). Auditory and cognitive models have traditionally cast neural integration in terms of time and structure, respectively, but the extent to which cortical computations reflect time or structure remains unknown. To answer this question, we rescaled the duration of all speech structures using time stretching/compression and measured integration windows in the human auditory cortex using a new experimental/computational method applied to spatiotemporally precise intracranial recordings. We observed significantly longer integration windows for stretched speech, but this lengthening was very small (~5%) relative to the change in structure durations, even in non-primary regions strongly implicated in speech-specific processing. These findings demonstrate that time-yoked computations dominate throughout the human auditory cortex, placing important constraints on neurocomputational models of structure processing.
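The stimulus manipulation rests on time stretching/compression, which rescales durations while preserving spectral content. A minimal sketch with librosa's phase-vocoder-based time_stretch, a generic tool rather than necessarily the authors' exact method:

```python
import numpy as np
import librosa

sr = 16000
y = np.sin(2 * np.pi * 220 * np.arange(2 * sr) / sr).astype(np.float32)

# rate > 1 compresses (shorter structure durations); rate < 1 stretches
# (longer durations); pitch and spectral content are preserved in both cases.
compressed = librosa.effects.time_stretch(y, rate=2.0)
stretched = librosa.effects.time_stretch(y, rate=0.5)
print(len(y) / sr, len(compressed) / sr, len(stretched) / sr)  # ~2 s, ~1 s, ~4 s
```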
Affiliation(s)
- Sam V Norman-Haignere
- University of Rochester Medical Center, Department of Biostatistics and Computational Biology
- University of Rochester Medical Center, Department of Neuroscience
- University of Rochester, Department of Brain and Cognitive Sciences
- University of Rochester, Department of Biomedical Engineering
- Zuckerman Institute for Mind Brain and Behavior, Columbia University
- Menoua K. Keshishian
- Zuckerman Institute for Mind Brain and Behavior, Columbia University
- Department of Electrical Engineering, Columbia University
- Orrin Devinsky
- Department of Neurology, NYU Langone Medical Center
- Comprehensive Epilepsy Center, NYU Langone Medical Center
- Werner Doyle
- Comprehensive Epilepsy Center, NYU Langone Medical Center
- Department of Neurosurgery, NYU Langone Medical Center
- Guy M. McKhann
- Department of Neurological Surgery, Columbia University Irving Medical Center
- Adeen Flinker
- Department of Neurology, NYU Langone Medical Center
- Comprehensive Epilepsy Center, NYU Langone Medical Center
- Department of Biomedical Engineering, NYU Tandon School of Engineering
- Nima Mesgarani
- Zuckerman Institute for Mind Brain and Behavior, Columbia University
- Department of Electrical Engineering, Columbia University
12
Peterson RE, Choudhri A, Mitelut C, Tanelus A, Capo-Battaglia A, Williams AH, Schneider DM, Sanes DH. Unsupervised discovery of family specific vocal usage in the Mongolian gerbil. bioRxiv [Preprint] 2024:2023.03.11.532197. PMID: 39282260; PMCID: PMC11398318; DOI: 10.1101/2023.03.11.532197.
Abstract
In nature, animal vocalizations can provide crucial information about identity, including kinship and hierarchy. However, lab-based vocal behavior is typically studied during brief interactions between animals with no prior social relationship, and under environmental conditions with limited ethological relevance. Here, we address this gap by establishing long-term acoustic recordings from Mongolian gerbil families, a core social group that uses an array of sonic and ultrasonic vocalizations. Three separate gerbil families were transferred to an enlarged environment and continuous 20-day audio recordings were obtained. Using a variational autoencoder (VAE) to quantify 583,237 vocalizations, we show that gerbils exhibit a more elaborate vocal repertoire than has been previously reported and that vocal repertoire usage differs significantly by family. By performing Gaussian mixture model clustering on the VAE latent space, we show that families preferentially use characteristic sets of vocal clusters and that these usage preferences remain stable over weeks. Furthermore, gerbils displayed family-specific transitions between vocal clusters. Since gerbils live naturally as extended families in complex underground burrows that are adjacent to other families, these results suggest the presence of a vocal dialect which could be exploited by animals to represent kinship. These findings position the Mongolian gerbil as a compelling animal model to study the neural basis of vocal communication and demonstrate the potential for using unsupervised machine learning with uninterrupted acoustic recordings to gain insights into naturalistic animal behavior.
Affiliation(s)
- Ralph E. Peterson
- Center for Neural Science, New York University, New York, NY
- Center for Computational Neuroscience, Flatiron Institute, New York, NY
- Catalin Mitelut
- Center for Neural Science, New York University, New York, NY
- Aramis Tanelus
- Center for Neural Science, New York University, New York, NY
- Center for Computational Neuroscience, Flatiron Institute, New York, NY
- Alex H. Williams
- Center for Neural Science, New York University, New York, NY
- Center for Computational Neuroscience, Flatiron Institute, New York, NY
- Dan H. Sanes
- Center for Neural Science, New York University, New York, NY
- Department of Psychology, New York University, New York, NY
- Department of Biology, New York University, New York, NY
- Neuroscience Institute, New York University School of Medicine, New York, NY
13
Torok Z, Luebbert L, Feldman J, Duffy A, Nevue AA, Wongso S, Mello CV, Fairhall A, Pachter L, Gonzalez WG, Lois C. Resilience of a Learned Motor Behavior After Chronic Disruption of Inhibitory Circuits. bioRxiv [Preprint] 2024:2023.05.17.541057. PMID: 37292888; PMCID: PMC10245685; DOI: 10.1101/2023.05.17.541057.
Abstract
Maintaining motor behaviors throughout life is crucial for an individual's survival and reproductive success. The neuronal mechanisms that preserve behavior are poorly understood. To address this question, we focused on the zebra finch, a bird that produces a highly stereotypical song after learning it as a juvenile. Using cell-specific viral vectors, we chronically silenced inhibitory neurons in the pre-motor song nucleus called the high vocal center (HVC), which caused drastic song degradation. However, after producing severely degraded vocalizations for around 2 months, the song rapidly improved, and animals could sing songs that highly resembled the original. In adult birds, single-cell RNA sequencing of HVC revealed that silencing interneurons elevated markers for microglia and increased expression of the Major Histocompatibility Complex I (MHC I), mirroring changes observed in juveniles during song learning. Interestingly, adults could restore their songs despite lesions of the lateral magnocellular nucleus of the anterior neostriatum (LMAN), a brain nucleus crucial for juvenile song learning. This suggests that while molecular mechanisms may overlap, adults utilize different neuronal mechanisms for song recovery. Chronic and acute electrophysiological recordings within HVC and its downstream target, the robust nucleus of the archistriatum (RA), revealed that neuronal activity in the circuit was permanently altered, with higher spontaneous firing in RA and lower firing in HVC compared to controls, even after the song had fully recovered. Together, our findings show that a complex learned behavior can recover despite extended periods of perturbed behavior and permanently altered neuronal dynamics. These results show that loss of inhibitory tone can be compensated for by recovery mechanisms partly local to the perturbed nucleus that do not require circuits necessary for learning.
Affiliation(s)
- Zsofia Torok
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
- Laura Luebbert
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
- Jordan Feldman
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
- Shelyn Wongso
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
- Lior Pachter
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
- Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, USA
- Walter G. Gonzalez
- Department of Physiology, University of San Francisco, San Francisco, CA, USA
- Carlos Lois
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
14
Koch TMI, Marks ES, Roberts TF. AVN: A Deep Learning Approach for the Analysis of Birdsong. bioRxiv [Preprint] 2024:2024.05.10.593561. PMID: 39229184; PMCID: PMC11370480; DOI: 10.1101/2024.05.10.593561.
Abstract
Deep learning tools for behavior analysis have enabled important new insights and discoveries in neuroscience. Yet, they often compromise interpretability and generalizability for performance, making it difficult to quantitatively compare phenotypes across datasets and research groups. We developed a novel deep learning-based behavior analysis pipeline, Avian Vocalization Network (AVN), for the learned vocalizations of the most extensively studied vocal learning model species - the zebra finch. AVN annotates songs with high accuracy across multiple animal colonies without the need for any additional training data and generates a comprehensive set of interpretable features to describe the syntax, timing, and acoustic properties of song. We use this feature set to compare song phenotypes across multiple research groups and experiments, and to predict a bird's stage in song development. Additionally, we have developed a novel method to measure song imitation that requires no additional training data for new comparisons or recording environments, and outperforms existing similarity scoring methods in its sensitivity and agreement with expert human judgements of song similarity. These tools are available through the open-source AVN python package and graphical application, which makes them accessible to researchers without any prior coding experience. Altogether, this behavior analysis toolkit stands to facilitate and accelerate the study of vocal behavior by enabling a standardized mapping of phenotypes and learning outcomes, thus helping scientists better link behavior to the underlying neural processes.
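AVN's documented API is not reproduced here; as a generic illustration of the kind of interpretable timing features such a pipeline reports, the sketch below segments a synthetic song into syllables by amplitude and measures their durations with librosa. The function and thresholds are illustrative assumptions, not AVN's code.

```python
import numpy as np
import librosa

def syllable_durations(y, sr, top_db=30.0):
    """Segment a song into syllables by amplitude (librosa.effects.split)
    and return their durations in seconds."""
    intervals = librosa.effects.split(y, top_db=top_db)
    return (intervals[:, 1] - intervals[:, 0]) / sr

# Dummy "song": three 100-ms tone bursts separated by 50-ms silences.
sr = 22050
burst = np.sin(2 * np.pi * 3000 * np.arange(int(0.1 * sr)) / sr)
gap = np.zeros(int(0.05 * sr))
song = np.concatenate([burst, gap, burst, gap, burst]).astype(np.float32)
durs = syllable_durations(song, sr)
print(np.round(durs, 3), "mean:", round(float(durs.mean()), 3), "s")
```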
Affiliation(s)
- Therese M I Koch
- Department of Neuroscience, UT Southwestern Medical Center, Dallas TX, USA
- Ethan S Marks
- Department of Neuroscience, UT Southwestern Medical Center, Dallas TX, USA
- Todd F Roberts
- Department of Neuroscience, UT Southwestern Medical Center, Dallas TX, USA
15
Chung TL, Liu YH, Wu PY, Huang JC, Tsai YC, Wang YC, Pan SP, Hsu YL, Chen SC. Prediction of Arteriovenous Access Dysfunction by Mel Spectrogram-based Deep Learning Model. Int J Med Sci 2024;21:2252-2260. PMID: 39310268; PMCID: PMC11413895; DOI: 10.7150/ijms.98421.
Abstract
Background: The early detection of arteriovenous (AV) access dysfunction is crucial for maintaining the patency of vascular access. This study aimed to use deep learning to predict AV access malfunction necessitating further vascular management. Methods: This prospective cohort study enrolled prevalent hemodialysis (HD) patients with an AV fistula or AV graft from a single HD center. Their AV access bruit sounds were recorded weekly using an electronic stethoscope from three different sites (arterial needle site, venous needle site, and the midpoint between the arterial and venous needle sites) before HD sessions. The audio signals were converted to Mel spectrograms using Fourier transformation and utilized to develop deep learning models. Three deep learning models, (1) Convolutional Neural Network (CNN), (2) Convolutional Recurrent Neural Network (CRNN), and (3) Vision Transformers-Gate Recurrent Unit (ViT-GRU), were trained and compared to predict the likelihood of dysfunctional AV access. Results: A total of 437 audio recordings were obtained from 84 patients. The CNN model outperformed the other models in the test set, with an F1 score of 0.7037 and area under the receiver operating characteristic curve (AUROC) of 0.7112. The ViT-GRU model had high performance in out-of-fold predictions, with an F1 score of 0.7131 and AUROC of 0.7745, but low generalization ability in the test set, with an F1 score of 0.5225 and AUROC of 0.5977. Conclusions: The CNN model based on Mel spectrograms could predict malfunctioning AV access requiring vascular intervention within 10 days. This approach could serve as a useful screening tool for high-risk AV access.
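A minimal sketch of the Mel-spectrogram-plus-CNN pattern in PyTorch/torchaudio; the sampling rate, clip length, and architecture are placeholders rather than the paper's actual model:

```python
import torch
import torch.nn as nn
import torchaudio

mel = torchaudio.transforms.MelSpectrogram(sample_rate=8000, n_mels=64)

class BruitCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 2),
        )

    def forward(self, wav):              # wav: (batch, samples)
        spec = mel(wav).unsqueeze(1)     # -> (batch, 1, n_mels, frames)
        return self.net(spec.log1p())    # log-Mel input, two class logits out

model = BruitCNN()
dummy = torch.randn(4, 5 * 8000)         # four dummy 5-second recordings
print(model(dummy).shape)                 # torch.Size([4, 2])
```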
Affiliation(s)
- Tung-Ling Chung
- Graduate Institute of Medicine, College of Medicine, Kaohsiung Medical University, Kaohsiung, Taiwan
- Division of Nephrology, Department of Internal Medicine, Kaohsiung Veterans General Hospital, Kaohsiung, Taiwan
- Yi-Hsueh Liu
- Graduate Institute of Clinical Medicine, College of Medicine, Kaohsiung Medical University, Kaohsiung, Taiwan
- Department of Internal Medicine, Kaohsiung Municipal Siaogang Hospital, Kaohsiung Medical University Hospital, Kaohsiung Medical University, Kaohsiung, Taiwan
- Division of Cardiology, Department of Internal Medicine, Kaohsiung Medical University Hospital, Kaohsiung Medical University, Kaohsiung, Taiwan
- Pei-Yu Wu
- Department of Internal Medicine, Kaohsiung Municipal Siaogang Hospital, Kaohsiung Medical University Hospital, Kaohsiung Medical University, Kaohsiung, Taiwan
- Division of Nephrology, Department of Internal Medicine, Kaohsiung Medical University Hospital, Kaohsiung Medical University, Kaohsiung, Taiwan
- Faculty of Medicine, College of Medicine, Kaohsiung Medical University, Kaohsiung, Taiwan
- Jiun-Chi Huang
- Department of Internal Medicine, Kaohsiung Municipal Siaogang Hospital, Kaohsiung Medical University Hospital, Kaohsiung Medical University, Kaohsiung, Taiwan
- Division of Nephrology, Department of Internal Medicine, Kaohsiung Medical University Hospital, Kaohsiung Medical University, Kaohsiung, Taiwan
- Faculty of Medicine, College of Medicine, Kaohsiung Medical University, Kaohsiung, Taiwan
- Yi-Chun Tsai
- Division of Nephrology, Department of Internal Medicine, Kaohsiung Medical University Hospital, Kaohsiung Medical University, Kaohsiung, Taiwan
- Division of General Medicine, Kaohsiung Medical University Hospital, Kaohsiung Medical University, Kaohsiung, Taiwan
- Faculty of Medicine, College of Medicine, Kaohsiung Medical University, Kaohsiung, Taiwan
- Yu-Chen Wang
- Muen Biomedical and Optoelectronics Technologies Inc., Taipei, Taiwan
- Shan-Pin Pan
- Muen Biomedical and Optoelectronics Technologies Inc., Taipei, Taiwan
- Ya-Ling Hsu
- Graduate Institute of Medicine, College of Medicine, Kaohsiung Medical University, Kaohsiung, Taiwan
- Szu-Chia Chen
- Department of Internal Medicine, Kaohsiung Municipal Siaogang Hospital, Kaohsiung Medical University Hospital, Kaohsiung Medical University, Kaohsiung, Taiwan
- Division of Nephrology, Department of Internal Medicine, Kaohsiung Medical University Hospital, Kaohsiung Medical University, Kaohsiung, Taiwan
- Faculty of Medicine, College of Medicine, Kaohsiung Medical University, Kaohsiung, Taiwan
16
Spiller M, Esmaeili N, Sühn T, Boese A, Turial S, Gumbs AA, Croner R, Friebe M, Illanes A. Enhancing Veress Needle Entry with Proximal Vibroacoustic Sensing for Automatic Identification of Peritoneum Puncture. Diagnostics (Basel) 2024;14:1698. PMID: 39125574; PMCID: PMC11311580; DOI: 10.3390/diagnostics14151698.
Abstract
Laparoscopic access, a critical yet challenging step in surgical procedures, often leads to complications. Existing systems, such as improved Veress needles and optical trocars, offer limited safety benefits but come with elevated costs. In this study, a prototype of a novel technology for guiding needle interventions based on vibroacoustic signals is evaluated in porcine cadavers. The prototype consistently detected successful abdominal cavity entry in 100% of cases during 193 insertions across eight porcine cadavers. The high signal quality allowed for the precise identification of all Veress needle insertion phases, including peritoneum puncture. The findings suggest that this vibroacoustic-based guidance technology could enhance surgeons' situational awareness and provide valuable support during laparoscopic access. Unlike existing solutions, this technology does not require sensing elements in the instrument's tip and remains compatible with medical instruments from various manufacturers.
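The abstract does not detail the prototype's signal processing; as a generic illustration of vibroacoustic event detection, the sketch below flags outlier bursts in a band-passed signal envelope. The frequency band, threshold, and synthetic transient are assumptions, not the study's method.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def detect_events(signal, fs, z_thresh=6.0):
    """Return times (s) where the band-passed vibroacoustic envelope is an
    outlier, as candidate puncture events. Parameters are illustrative."""
    b, a = butter(4, [100, 2000], btype="band", fs=fs)
    env = np.abs(hilbert(filtfilt(b, a, signal)))
    z = (env - env.mean()) / env.std()
    return np.flatnonzero(z > z_thresh) / fs

fs = 8000
rng = np.random.default_rng(0)
sig = 0.01 * rng.standard_normal(2 * fs)
start = int(1.2 * fs)
sig[start:start + 80] += 0.5 * rng.standard_normal(80)  # synthetic transient
print(detect_events(sig, fs)[:5])  # detections cluster near t = 1.2 s
```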
Affiliation(s)
- Moritz Spiller
- SURAG Medical GmbH, 04229 Leipzig, Germany
- Nazila Esmaeili
- SURAG Medical GmbH, 04229 Leipzig, Germany
- Chair for Computer Aided Medical Procedures and Augmented Reality, Technical University of Munich, 85748 Munich, Germany
- Thomas Sühn
- SURAG Medical GmbH, 04229 Leipzig, Germany
- Department of Orthopaedic Surgery, Otto-von-Guericke University Magdeburg, 39106 Magdeburg, Germany
- Axel Boese
- INKA - Innovation Laboratory for Image Guided Therapy, Otto-von-Guericke University Magdeburg, 39106 Magdeburg, Germany
- Salmai Turial
- Department of Pediatric Surgery and Pediatric Traumatology, University Clinic for General, Visceral, Vascular and Transplant Surgery, University Hospital Magdeburg, 39120 Magdeburg, Germany
- Andrew A. Gumbs
- University Clinic for General, Visceral, Vascular and Transplant Surgery, University Hospital Magdeburg, 39120 Magdeburg, Germany
- Advanced & Minimally Invasive Surgery Excellence Center, American Hospital Tbilisi, 0102 Tbilisi, Georgia
- Roland Croner
- University Clinic for General, Visceral, Vascular and Transplant Surgery, University Hospital Magdeburg, 39120 Magdeburg, Germany
- Michael Friebe
- INKA - Innovation Laboratory for Image Guided Therapy, Otto-von-Guericke University Magdeburg, 39106 Magdeburg, Germany
- Faculty of Computer Science, AGH University of Science and Technology, 30-059 Krakow, Poland
- Center for Innovation, Business Development & Entrepreneurship, FOM University of Applied Sciences, 45141 Essen, Germany
- Alfredo Illanes
- SURAG Medical GmbH, 04229 Leipzig, Germany
17
Bousquet CAH, Sueur C, King AJ, O'Bryan LR. Individual and ecological heterogeneity promote complex communication in social vertebrate group decisions. Philos Trans R Soc Lond B Biol Sci 2024;379:20230204. PMID: 38768211; PMCID: PMC11391315; DOI: 10.1098/rstb.2023.0204.
Abstract
To receive the benefits of social living, individuals must make effective group decisions that enable them to achieve behavioural coordination and maintain cohesion. However, heterogeneity in the physical and social environments surrounding group decision-making contexts can increase the level of difficulty social organisms face in making decisions. Groups that live in variable physical environments (high ecological heterogeneity) can experience barriers to information transfer and increased levels of ecological uncertainty. In addition, in groups with large phenotypic variation (high individual heterogeneity), individuals can have substantial conflicts of interest regarding the timing and nature of activities, making it difficult for them to coordinate their behaviours or reach a consensus. In such cases, active communication can increase individuals' abilities to achieve coordination, such as by facilitating the transfer and aggregation of information about the environment or individual behavioural preferences. Here, we review the role of communication in vertebrate group decision-making and its relationship to heterogeneity in the ecological and social environment surrounding group decision-making contexts. We propose that complex communication has evolved to facilitate decision-making in specific socio-ecological contexts, and we provide a framework for studying this topic and testing related hypotheses as part of future research in this area. This article is part of the theme issue 'The power of sound: unravelling how acoustic communication shapes group dynamics'.
Affiliation(s)
- Christophe A. H. Bousquet
- Department of Psychology, University of Konstanz, Konstanz 78457, Germany
- Centre for the Advanced Study of Collective Behaviour, University of Konstanz, Konstanz 78457, Germany
- Cédric Sueur
- Institut pluridisciplinaire Hubert Curien, Strasbourg 67000, France
- Institut Universitaire de France, Paris 75005, France
- Andrew J. King
- Biosciences, Faculty of Science and Engineering, Swansea SA2 8PP, UK
- Lisa R. O'Bryan
- Department of Psychological Sciences, Rice University, Houston, TX 77005, USA
18
Wang B, Torok Z, Duffy A, Bell DG, Wongso S, Velho TAF, Fairhall AL, Lois C. Unsupervised restoration of a complex learned behavior after large-scale neuronal perturbation. Nat Neurosci 2024;27:1176-1186. PMID: 38684893; DOI: 10.1038/s41593-024-01630-6.
Abstract
Reliable execution of precise behaviors requires that brain circuits are resilient to variations in neuronal dynamics. Genetic perturbation of the majority of excitatory neurons in HVC, a brain region involved in song production, in adult songbirds with stereotypical songs triggered severe degradation of the song. The song fully recovered within 2 weeks, and substantial improvement occurred even when animals were prevented from singing during the recovery period, indicating that offline mechanisms enable recovery in an unsupervised manner. Song restoration was accompanied by increased excitatory synaptic input to neighboring, unmanipulated neurons in the same brain region. A model inspired by the behavioral and electrophysiological findings suggests that unsupervised single-cell and population-level homeostatic plasticity rules can support the functional restoration after large-scale disruption of networks that implement sequential dynamics. These observations suggest the existence of cellular and systems-level restorative mechanisms that ensure behavioral resilience.
Affiliation(s)
- Bo Wang
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA.
- Zsofia Torok
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
- Alison Duffy
- Department of Physiology and Biophysics, University of Washington, Seattle, WA, USA
- Computational Neuroscience Center, University of Washington, Seattle, WA, USA
- David G Bell
- Computational Neuroscience Center, University of Washington, Seattle, WA, USA
- Department of Physics, University of Washington, Seattle, WA, USA
- Shelyn Wongso
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
- Tarciso A F Velho
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
- Adrienne L Fairhall
- Department of Physiology and Biophysics, University of Washington, Seattle, WA, USA
- Computational Neuroscience Center, University of Washington, Seattle, WA, USA
- Department of Physics, University of Washington, Seattle, WA, USA
- Carlos Lois
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA.
19
Liao DA, Brecht KF, Veit L, Nieder A. Crows "count" the number of self-generated vocalizations. Science 2024;384:874-877. PMID: 38781375; DOI: 10.1126/science.adl0984.
Abstract
Producing a specific number of vocalizations with purpose requires a sophisticated combination of numerical abilities and vocal control. Whether this capacity exists in animals other than humans is yet unknown. We show that crows can flexibly produce variable numbers of one to four vocalizations in response to arbitrary cues associated with numerical values. The acoustic features of the first vocalization of a sequence were predictive of the total number of vocalizations, indicating a planning process. Moreover, the acoustic features of vocal units predicted their order in the sequence and could be used to read out counting errors during vocal production.
Collapse
Affiliation(s)
- Diana A Liao
- Animal Physiology, Institute of Neurobiology, University of Tübingen, Auf der Morgenstelle 28, 72076 Tübingen, Germany
| | - Katharina F Brecht
- Animal Physiology, Institute of Neurobiology, University of Tübingen, Auf der Morgenstelle 28, 72076 Tübingen, Germany
| | - Lena Veit
- Neurobiology of Vocal Communication, Institute of Neurobiology, University of Tübingen Auf der Morgenstelle 28, 72076 Tübingen, Germany
| | - Andreas Nieder
- Animal Physiology, Institute of Neurobiology, University of Tübingen, Auf der Morgenstelle 28, 72076 Tübingen, Germany
| |
Collapse
|
20
|
Erb WM, Ross W, Kazanecki H, Mitra Setia T, Madhusudhana S, Clink DJ. Vocal complexity in the long calls of Bornean orangutans. PeerJ 2024; 12:e17320. [PMID: 38766489 PMCID: PMC11100477 DOI: 10.7717/peerj.17320] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/11/2023] [Accepted: 04/09/2024] [Indexed: 05/22/2024] Open
Abstract
Vocal complexity is central to many evolutionary hypotheses about animal communication. Yet, quantifying and comparing complexity remains a challenge, particularly when vocal types are highly graded. Male Bornean orangutans (Pongo pygmaeus wurmbii) produce complex and variable "long call" vocalizations comprising multiple sound types that vary within and among individuals. Previous studies described six distinct call (or pulse) types within these complex vocalizations, but none quantified their discreteness or the ability of human observers to reliably classify them. We studied the long calls of 13 individuals to: (1) evaluate and quantify the reliability of audio-visual classification by three well-trained observers, (2) distinguish among call types using supervised classification and unsupervised clustering, and (3) compare the performance of different feature sets. Using 46 acoustic features, we applied machine learning (i.e., support vector machines, affinity propagation, and fuzzy c-means) to identify call types and assess their discreteness. We additionally used Uniform Manifold Approximation and Projection (UMAP) to visualize the separation of pulses using both extracted features and spectrogram representations. Supervised approaches showed low inter-observer reliability and poor classification accuracy, indicating that pulse types were not discrete. We propose an updated pulse classification approach that is highly reproducible across observers and exhibits strong classification accuracy using support vector machines. Although the low number of call types suggests long calls are fairly simple, the continuous gradation of sounds seems to greatly boost the complexity of this system. This work responds to calls for more quantitative research to define call types and quantify gradedness in animal vocal systems and highlights the need for a more comprehensive framework for studying vocal complexity vis-à-vis graded repertoires.
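A minimal sketch of this kind of pipeline, supervised SVM classification plus UMAP visualization, is shown below, with random placeholder features standing in for the 46 acoustic measurements; it illustrates the workflow, not the paper's exact settings.

```python
# Sketch: SVM classification of pulse types plus a 2-D UMAP embedding.
import numpy as np
import umap                                   # pip install umap-learn
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 46))                # placeholder acoustic features
y = rng.integers(0, 6, 300)                   # placeholder pulse-type labels

# Supervised classification with cross-validation.
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
print("CV accuracy:", cross_val_score(svm, X, y, cv=5).mean())

# Low-dimensional embedding for visual inspection of pulse-type separation.
embedding = umap.UMAP(n_neighbors=15, min_dist=0.1).fit_transform(X)
```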
Collapse
Affiliation(s)
- Wendy M. Erb
- K. Lisa Yang Center for Conservation Bioacoustics, Cornell Lab of Ornithology, Cornell University, Ithaca, NY, United States of America
- Department of Anthropology, Rutgers, The State University of New Jersey, New Brunswick, United States of America
| | - Whitney Ross
- K. Lisa Yang Center for Conservation Bioacoustics, Cornell Lab of Ornithology, Cornell University, Ithaca, NY, United States of America
| | - Haley Kazanecki
- K. Lisa Yang Center for Conservation Bioacoustics, Cornell Lab of Ornithology, Cornell University, Ithaca, NY, United States of America
| | - Tatang Mitra Setia
- Primate Research Center, Universitas Nasional Jakarta, Jakarta, Indonesia
- Department of Biology, Faculty of Biology and Agriculture, Universitas Nasional Jakarta, Jakarta, Indonesia
| | - Shyam Madhusudhana
- K. Lisa Yang Center for Conservation Bioacoustics, Cornell Lab of Ornithology, Cornell University, Ithaca, NY, United States of America
- Centre for Marine Science and Technology, Curtin University, Perth, Australia
| | - Dena J. Clink
- K. Lisa Yang Center for Conservation Bioacoustics, Cornell Lab of Ornithology, Cornell University, Ithaca, NY, United States of America
| |
Collapse
|
21
|
Hasani Azhdari SM, Mahmoodzadeh A, Khishe M, Agahi H. Enhanced PRIM recognition using PRI sound and deep learning techniques. PLoS One 2024; 19:e0298373. [PMID: 38691542 PMCID: PMC11062556 DOI: 10.1371/journal.pone.0298373] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/05/2023] [Accepted: 01/24/2024] [Indexed: 05/03/2024] Open
Abstract
Pulse repetition interval modulation (PRIM) is integral to radar identification in modern electronic support measure (ESM) and electronic intelligence (ELINT) systems. Various distortions, including missing pulses, spurious pulses, unintended jitters, and noise from radar antenna scans, often hinder the accurate recognition of PRIM. This research introduces a novel three-stage approach for PRIM recognition, emphasizing the innovative use of PRI sound. A transfer learning-aided deep convolutional neural network (DCNN) is initially used for feature extraction. This is followed by an extreme learning machine (ELM) for real-time PRIM classification. Finally, a gray wolf optimizer (GWO) refines the network's robustness. To evaluate the proposed method, we develop a real experimental dataset consisting of sounds of six common PRI patterns. We utilized eight pre-trained DCNN architectures for evaluation, with VGG16 and ResNet50V2 notably achieving recognition accuracies of 97.53% and 96.92%. Integrating ELM and GWO further optimized the accuracy rates to 98.80% and 97.58%, respectively. This research advances radar identification by offering an enhanced method for PRIM recognition, emphasizing the potential of PRI sound to address real-world distortions in ESM and ELINT systems.
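The first two stages can be sketched roughly as below: a pre-trained DCNN used as a fixed feature extractor, followed by a minimal extreme learning machine (random hidden layer plus a ridge-regression readout). The images, labels, and hidden-layer size are placeholders, and the GWO tuning stage is omitted.

```python
# Sketch: transfer-learning feature extraction + a minimal ELM readout.
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(3)
images = rng.uniform(0, 255, (32, 224, 224, 3)).astype("float32")  # spectrogram-like images
labels = rng.integers(0, 6, 32)                                    # 6 PRI pattern classes

# Stage 1: pre-trained DCNN as a fixed feature extractor.
backbone = tf.keras.applications.VGG16(weights="imagenet", include_top=False, pooling="avg")
feats = backbone.predict(tf.keras.applications.vgg16.preprocess_input(images))

# Stage 2: minimal ELM: fixed random hidden layer + ridge readout.
H = np.tanh(feats @ rng.normal(0, 0.1, (feats.shape[1], 256)))
Y = np.eye(6)[labels]                                   # one-hot targets
beta = np.linalg.solve(H.T @ H + 1e-2 * np.eye(256), H.T @ Y)
pred = (H @ beta).argmax(1)
print("train accuracy:", (pred == labels).mean())
```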
Collapse
Affiliation(s)
| | - Azar Mahmoodzadeh
- Department of Electrical Engineering, Shiraz Branch, Islamic Azad University, Shiraz, Iran
| | - Mohammad Khishe
- Department of Electrical Engineering, Imam Khomeini Marine Science University, Nowshahr, Iran
| | - Hamed Agahi
- Department of Electrical Engineering, Shiraz Branch, Islamic Azad University, Shiraz, Iran
| |
Collapse
|
22
|
Vattis K, Oubre B, Luddy AC, Ouillon JS, Eklund NM, Stephen CD, Schmahmann JD, Nunes AS, Gupta AS. Sensitive Quantification of Cerebellar Speech Abnormalities Using Deep Learning Models. IEEE ACCESS : PRACTICAL INNOVATIONS, OPEN SOLUTIONS 2024; 12:62328-62340. [PMID: 39606584 PMCID: PMC11601984 DOI: 10.1109/access.2024.3393243] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 11/29/2024]
Abstract
Objective, sensitive, and meaningful disease assessments are critical to support clinical trials and clinical care. Speech changes are one of the earliest and most evident manifestations of cerebellar ataxias. This work aims to develop models that can accurately identify and quantify clinical signs of ataxic speech. We use convolutional neural networks to capture the motor speech phenotype of cerebellar ataxia based on time and frequency partial derivatives of log-mel spectrogram representations of speech. We train classification models to distinguish patients with ataxia from healthy controls as well as regression models to estimate disease severity. Classification models were able to accurately distinguish healthy controls from individuals with ataxia, including ataxia participants who clinicians rated as having no detectable clinical deficits in speech. Regression models produced accurate estimates of disease severity, were able to measure subclinical signs of ataxia, and captured disease progression over time. Convolutional networks trained on time and frequency partial derivatives of the speech signal can detect sub-clinical speech changes in ataxias and sensitively measure disease change over time. Learned speech analysis models have the potential to aid early detection of disease signs in ataxias and provide sensitive, low-burden assessment tools in support of clinical trials and neurological care.
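A rough sketch of the input representation described, time and frequency partial derivatives of a log-mel spectrogram stacked as CNN channels, might look like this; a synthetic chirp stands in for a speech recording.

```python
# Sketch: building the (d/dt, d/df) log-mel input channels.
import librosa
import numpy as np

sr = 22050
y = librosa.chirp(fmin=100, fmax=4000, sr=sr, duration=2.0)   # stand-in for speech
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
logmel = librosa.power_to_db(mel)

d_time = np.diff(logmel, axis=1, prepend=logmel[:, :1])   # partial derivative along time
d_freq = np.diff(logmel, axis=0, prepend=logmel[:1, :])   # partial derivative along frequency
x = np.stack([d_time, d_freq])                            # (2, n_mels, frames) CNN input
print(x.shape)
```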
Collapse
Affiliation(s)
- Kyriakos Vattis
- Department of Neurology, Massachusetts General Hospital, Boston, MA 02114, USA
- Harvard Medical School, Boston, MA 02115, USA
| | - Brandon Oubre
- Department of Neurology, Massachusetts General Hospital, Boston, MA 02114, USA
- Harvard Medical School, Boston, MA 02115, USA
| | - Anna C Luddy
- Department of Neurology, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Jessey S Ouillon
- Department of Neurology, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Nicole M Eklund
- Department of Neurology, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Christopher D Stephen
- Department of Neurology, Massachusetts General Hospital, Boston, MA 02114, USA
- Harvard Medical School, Boston, MA 02115, USA
- Department of Neurology, Ataxia Center, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Jeremy D Schmahmann
- Department of Neurology, Massachusetts General Hospital, Boston, MA 02114, USA
- Harvard Medical School, Boston, MA 02115, USA
- Department of Neurology, Ataxia Center, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Adonay S Nunes
- Department of Neurology, Massachusetts General Hospital, Boston, MA 02114, USA
- Harvard Medical School, Boston, MA 02115, USA
| | - Anoopum S Gupta
- Department of Neurology, Massachusetts General Hospital, Boston, MA 02114, USA
- Harvard Medical School, Boston, MA 02115, USA
- Department of Neurology, Ataxia Center, Massachusetts General Hospital, Boston, MA 02114, USA
| |
Collapse
|
23
|
Santana GM, Dietrich MO. SqueakOut: Autoencoder-based segmentation of mouse ultrasonic vocalizations. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.04.19.590368. [PMID: 38712291 PMCID: PMC11071348 DOI: 10.1101/2024.04.19.590368] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/08/2024]
Abstract
Mice emit ultrasonic vocalizations (USVs) that are important for social communication. Despite great advancements in tools to detect USVs from audio files in recent years, highly accurate segmentation of USVs from spectrograms (i.e., removing noise) remains a significant challenge. Here, we present a new dataset of 12,954 annotated spectrograms explicitly labeled for mouse USV segmentation. Leveraging this dataset, we developed SqueakOut, a lightweight (4.6M parameters) fully convolutional autoencoder that achieves high accuracy in supervised segmentation of USVs from spectrograms, with a Dice score of 90.22. SqueakOut combines a MobileNetV2 backbone with skip connections and transposed convolutions to precisely segment USVs. Using stochastic data augmentation techniques and a hybrid loss function, SqueakOut learns robust segmentation across varying recording conditions. We evaluate SqueakOut's performance, demonstrating substantial improvements over existing methods like VocalMat (63.82 Dice score). The accurate USV segmentations enabled by SqueakOut will facilitate novel methods for vocalization classification and more accurate analysis of mouse communication. To promote further research, we publicly release the annotated dataset of 12,954 spectrograms for USV segmentation and the SqueakOut implementation.
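For reference, here is a minimal sketch of the Dice score used to evaluate segmentation, together with a soft Dice loss of the kind often folded into hybrid segmentation losses (the exact hybrid used by SqueakOut is not reproduced here).

```python
# Sketch: Dice score for binary masks and a differentiable soft Dice loss.
import torch

def dice_score(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    """pred, target: binary masks of shape (B, H, W)."""
    inter = (pred * target).sum(dim=(1, 2))
    denom = pred.sum(dim=(1, 2)) + target.sum(dim=(1, 2))
    return ((2 * inter + eps) / (denom + eps)).mean()

def soft_dice_loss(logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    probs = torch.sigmoid(logits)             # soft predictions in [0, 1]
    return 1.0 - dice_score(probs, target)

pred = (torch.rand(4, 128, 128) > 0.5).float()
target = (torch.rand(4, 128, 128) > 0.5).float()
print(dice_score(pred, target))
```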
Collapse
Affiliation(s)
- Gustavo M Santana
- Laboratory of Physiology of Behavior, Interdepartmental Neuroscience Program, Program in Physics, Engineering and Biology, Yale University, USA
- Graduate Program in Biochemistry, Federal University of Rio Grande do Sul, BRA
| | - Marcelo O Dietrich
- Laboratory of Physiology of Behavior, Department of Comparative Medicine, Department of Neuroscience, Yale University, USA
| |
Collapse
|
24
|
Koparkar A, Warren TL, Charlesworth JD, Shin S, Brainard MS, Veit L. Lesions in a songbird vocal circuit increase variability in song syntax. eLife 2024; 13:RP93272. [PMID: 38635312 PMCID: PMC11026095 DOI: 10.7554/elife.93272] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 04/19/2024] Open
Abstract
Complex skills like speech and dance are composed of ordered sequences of simpler elements, but the neuronal basis for the syntactic ordering of actions is poorly understood. Birdsong is a learned vocal behavior composed of syntactically ordered syllables, controlled in part by the songbird premotor nucleus HVC (proper name). Here, we test whether one of HVC's recurrent inputs, mMAN (medial magnocellular nucleus of the anterior nidopallium), contributes to sequencing in adult male Bengalese finches (Lonchura striata domestica). Bengalese finch song includes several patterns: (1) chunks, comprising stereotyped syllable sequences; (2) branch points, where a given syllable can be followed probabilistically by multiple syllables; and (3) repeat phrases, where individual syllables are repeated variable numbers of times. We found that following bilateral lesions of mMAN, acoustic structure of syllables remained largely intact, but sequencing became more variable, as evidenced by 'breaks' in previously stereotyped chunks, increased uncertainty at branch points, and increased variability in repeat numbers. Our results show that mMAN contributes to the variable sequencing of vocal elements in Bengalese finch song and demonstrate the influence of recurrent projections to HVC. Furthermore, they highlight the utility of species with complex syntax in investigating neuronal control of ordered sequences.
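One simple way to quantify increased uncertainty at a branch point is the entropy of its transition distribution; the sketch below uses toy songs, not the study's data.

```python
# Sketch: entropy (bits) of which syllable follows a branch-point syllable.
from collections import Counter
import math

def branch_entropy(sequences, syllable):
    """Higher entropy = more variable sequencing after `syllable`."""
    followers = Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            if a == syllable:
                followers[b] += 1
    total = sum(followers.values())
    return -sum((c / total) * math.log2(c / total) for c in followers.values())

songs = ["abcabd", "abcabc", "abdabd"]     # toy songs with a branch point after "b"
print(branch_entropy(songs, "b"))          # 1.0 bit: "c" and "d" equally likely
```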
Collapse
Affiliation(s)
- Avani Koparkar
- Neurobiology of Vocal Communication, Institute for Neurobiology, University of Tübingen, Tübingen, Germany
| | - Timothy L Warren
- Howard Hughes Medical Institute and Center for Integrative Neuroscience, University of California San Francisco, San Francisco, United States
- Departments of Horticulture and Integrative Biology, Oregon State University, Corvallis, United States
| | - Jonathan D Charlesworth
- Howard Hughes Medical Institute and Center for Integrative Neuroscience, University of California San Francisco, San Francisco, United States
| | - Sooyoon Shin
- Howard Hughes Medical Institute and Center for Integrative Neuroscience, University of California San Francisco, San Francisco, United States
| | - Michael S Brainard
- Howard Hughes Medical Institute and Center for Integrative Neuroscience, University of California San Francisco, San Francisco, United States
| | - Lena Veit
- Neurobiology of Vocal Communication, Institute for Neurobiology, University of Tübingen, Tübingen, Germany
| |
Collapse
|
25
|
Youngblood M. Language-like efficiency and structure in house finch song. Proc Biol Sci 2024; 291:20240250. [PMID: 38565151 PMCID: PMC10987240 DOI: 10.1098/rspb.2024.0250] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/29/2023] [Accepted: 03/06/2024] [Indexed: 04/04/2024] Open
Abstract
Communication needs to be complex enough to be functional while minimizing learning and production costs. Recent work suggests that the vocalizations and gestures of some songbirds, cetaceans and great apes may conform to linguistic laws that reflect this trade-off between efficiency and complexity. In studies of non-human communication, though, clustering signals into types cannot be done a priori, and decisions about the appropriate grain of analysis may affect statistical signals in the data. The aim of this study was to assess the evidence for language-like efficiency and structure in house finch (Haemorhous mexicanus) song across three levels of granularity in syllable clustering. The results show strong evidence for Zipf's rank-frequency law, Zipf's law of abbreviation and Menzerath's law. Additional analyses show that house finch songs have small-world structure, thought to reflect systematic structure in syntax, and the mutual information decay of sequences is consistent with a combination of Markovian and hierarchical processes. These statistical patterns are robust across three levels of granularity in syllable clustering, pointing to a limited form of scale invariance. In sum, it appears that house finch song has been shaped by pressure for efficiency, possibly to offset the costs of female preferences for complexity.
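Two of these laws are straightforward to test computationally; the sketch below uses simulated syllable frequencies and durations, so the numbers are illustrative only.

```python
# Sketch: Zipf's law of abbreviation and the rank-frequency law on toy data.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(4)
freqs = rng.zipf(a=2.0, size=200)                        # toy syllable-type frequencies
durs = 0.3 / np.sqrt(freqs) + rng.normal(0, 0.01, 200)   # shorter when more frequent

# Law of abbreviation: expect a negative frequency-duration correlation.
rho, p = spearmanr(freqs, durs)
print(f"law of abbreviation: rho={rho:.2f}, p={p:.1e}")

# Rank-frequency law: frequency ~ rank^(-alpha), a line in log-log space.
ranks = np.arange(1, len(freqs) + 1)
sorted_f = np.sort(freqs)[::-1]
alpha = -np.polyfit(np.log(ranks), np.log(sorted_f), 1)[0]
print(f"rank-frequency exponent alpha: {alpha:.2f}")
```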
Collapse
Affiliation(s)
- Mason Youngblood
- Minds and Traditions Research Group, Max Planck Institute for Geoanthropology, Jena, Thüringen, Germany
- Institute for Advanced Computational Science, Stony Brook University, Stony Brook, NY, USA
| |
Collapse
|
26
|
Alam D, Zia F, Roberts TF. The hidden fitness of the male zebra finch courtship song. Nature 2024; 628:117-121. [PMID: 38509376 PMCID: PMC11410162 DOI: 10.1038/s41586-024-07207-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/25/2023] [Accepted: 02/19/2024] [Indexed: 03/22/2024]
Abstract
Vocal learning in songbirds is thought to have evolved through sexual selection, with female preference driving males to develop large and varied song repertoires [1-3]. However, many songbird species learn only a single song in their lifetime [4]. How sexual selection drives the evolution of single-song repertoires is not known. Here, by applying dimensionality-reduction techniques to the singing behaviour of zebra finches (Taeniopygia guttata), we show that syllable spread in low-dimensional feature space explains how single songs function as honest indicators of fitness. We find that this Gestalt measure of behaviour captures the spectrotemporal distinctiveness of song syllables in zebra finches; that females strongly prefer songs that occupy more latent space; and that matching path lengths in low-dimensional space is difficult for young males. Our findings clarify how simple vocal repertoires may have evolved in songbirds and indicate divergent strategies for how sexual selection can shape vocal learning.
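The notion of syllable spread in low-dimensional feature space can be illustrated with a simple proxy: embed syllable features in 2-D and measure the area or mean pairwise distance they occupy. The sketch below uses PCA and random placeholder features, not the paper's embedding or data.

```python
# Sketch: how much low-dimensional space a song's syllables occupy.
import numpy as np
from scipy.spatial import ConvexHull
from sklearn.decomposition import PCA

rng = np.random.default_rng(5)
syllables = rng.normal(size=(40, 20))        # placeholder spectral features per syllable
emb = PCA(n_components=2).fit_transform(syllables)

hull_area = ConvexHull(emb).volume           # in 2-D, "volume" is the hull area
mean_dist = np.mean([np.linalg.norm(a - b) for a in emb for b in emb])
print(hull_area, mean_dist)
```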
Collapse
Affiliation(s)
- Danyal Alam
- Department of Neuroscience, UT Southwestern Medical Center, Dallas, TX, USA
| | - Fayha Zia
- Department of Neuroscience, UT Southwestern Medical Center, Dallas, TX, USA
| | - Todd F Roberts
- Department of Neuroscience, UT Southwestern Medical Center, Dallas, TX, USA.
- O'Donnell Brain Institute, UT Southwestern Medical Center, Dallas, TX, USA.
| |
Collapse
|
27
|
Cominelli S, Bellin N, Brown CD, Rossi V, Lawson J. Acoustic features as a tool to visualize and explore marine soundscapes: Applications illustrated using marine mammal passive acoustic monitoring datasets. Ecol Evol 2024; 14:e10951. [PMID: 38384822 PMCID: PMC10880131 DOI: 10.1002/ece3.10951] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/24/2022] [Revised: 12/16/2023] [Accepted: 01/02/2024] [Indexed: 02/23/2024] Open
Abstract
Passive Acoustic Monitoring (PAM) is emerging as a solution for monitoring species and environmental change over large spatial and temporal scales. However, drawing rigorous conclusions based on acoustic recordings is challenging, as there is no consensus over which approaches are best suited for characterizing marine acoustic environments. Here, we describe the application of multiple machine-learning techniques to the analysis of two PAM datasets. We combine pre-trained acoustic classification models (VGGish, NOAA and Google Humpback Whale Detector), dimensionality reduction (UMAP), and balanced random forest (RF) algorithms to demonstrate how machine-learned acoustic features capture different aspects of the marine acoustic environment. The UMAP dimensions derived from VGGish acoustic features exhibited good performance in separating marine mammal vocalizations according to species and locations. RF models trained on the acoustic features performed well for labeled sounds in the 8 kHz range; however, low- and high-frequency sounds could not be classified using this approach. The workflow presented here shows how acoustic feature extraction, visualization, and analysis allow establishing a link between ecologically relevant information and PAM recordings at multiple scales, ranging from large-scale changes in the environment (e.g., changes in wind speed) to the identification of marine mammal species.
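A condensed sketch of this analysis chain is below, with random vectors standing in for the pretrained VGGish embeddings and a class-weighted random forest approximating the balanced-RF step.

```python
# Sketch: embedding features -> UMAP projection -> balanced random forest.
import numpy as np
import umap                                   # pip install umap-learn
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(6)
emb = rng.normal(size=(500, 128))             # stand-in for VGGish embeddings
labels = rng.integers(0, 4, 500)              # e.g. species / sound classes

proj = umap.UMAP(n_components=2).fit_transform(emb)   # for visualization
rf = RandomForestClassifier(n_estimators=200, class_weight="balanced")
print(cross_val_score(rf, emb, labels, cv=5).mean())
```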
Collapse
Affiliation(s)
- Simone Cominelli
- Northern EDGE Lab, Department of Geography, Memorial University of Newfoundland and Labrador, St. John's, Newfoundland and Labrador, Canada
| | - Nicolò Bellin
- Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, Parma, Italy
| | - Carissa D. Brown
- Northern EDGE Lab, Department of Geography, Memorial University of Newfoundland and Labrador, St. John's, Newfoundland and Labrador, Canada
| | - Valeria Rossi
- Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, Parma, Italy
| | - Jack Lawson
- Marine Mammal Section, Department of Fisheries and Oceans, St. John's, Newfoundland and Labrador, Canada
| |
Collapse
|
28
|
Choi N, Miller P, Hebets EA. Vibroscape analysis reveals acoustic niche overlap and plastic alteration of vibratory courtship signals in ground-dwelling wolf spiders. Commun Biol 2024; 7:23. [PMID: 38182735 PMCID: PMC10770364 DOI: 10.1038/s42003-023-05700-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/15/2023] [Accepted: 12/12/2023] [Indexed: 01/07/2024] Open
Abstract
To expand the scope of soundscape ecology to encompass substrate-borne vibrations (i.e. vibroscapes), we analyzed the vibroscape of a deciduous forest floor using contact microphone arrays followed by automated processing of large audio datasets. We then focused on vibratory signaling of ground-dwelling Schizocosa wolf spiders to test for (i) acoustic niche partitioning and (ii) plastic behavioral responses that might reduce the risk of signal interference from substrate-borne noise and conspecific/heterospecific signaling. Two closely related species - S. stridulans and S. uetzi - showed high acoustic niche overlap across space, time, and dominant frequency. Both species showed plastic behavioral responses: S. uetzi males shortened their courtship where substrate-borne noise was more abundant, S. stridulans males lengthened their vibratory courtship signals where conspecific signals were more abundant, and S. stridulans males reduced the complexity of their vibratory signals where S. uetzi signals were more abundant.
Collapse
Affiliation(s)
- Noori Choi
- School of Biological Sciences, University of Nebraska-Lincoln, Lincoln, NE, USA.
- Max Planck Institute of Animal Behavior, Konstanz, Germany.
| | - Pat Miller
- University of Mississippi field station associate, Abbeville, MS, USA
| | - Eileen A Hebets
- School of Biological Sciences, University of Nebraska-Lincoln, Lincoln, NE, USA
| |
Collapse
|
29
|
Martin K, Cornero FM, Clayton NS, Adam O, Obin N, Dufour V. Vocal complexity in a socially complex corvid: gradation, diversity and lack of common call repertoire in male rooks. ROYAL SOCIETY OPEN SCIENCE 2024; 11:231713. [PMID: 38204786 PMCID: PMC10776222 DOI: 10.1098/rsos.231713] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 11/08/2023] [Accepted: 12/08/2023] [Indexed: 01/12/2024]
Abstract
Vocal communication is widespread in animals, with vocal repertoires of varying complexity. The social complexity hypothesis predicts that species may need high vocal complexity to deal with complex social organization (e.g. have a variety of different interindividual relations). We quantified the vocal complexity of two geographically distant captive colonies of rooks, a corvid species with complex social organization and cognitive performance, but understudied vocal abilities. We quantified the diversity and gradation of their repertoire, as well as the inter-individual similarity at the vocal unit level. We found that males produced call units with lower diversity and gradation than females, while song units did not differ between sexes. Surprisingly, while females produced highly similar call repertoires, even between colonies, each individual male produced almost completely different call repertoires from any other individual. These findings raise questions about how male rooks communicate with their social partners. We suggest that each male may actively seek to remain vocally distinct, which could be an asset in their frequently changing social environment. We conclude that inter-individual similarity, an understudied aspect of vocal repertoires, should also be considered as a measure of vocal complexity.
Collapse
Affiliation(s)
- Killian Martin
- PRC, UMR 7247, Ethologie Cognitive et Sociale, CNRS-IFCE-INRAE-Université de Tours, Strasbourg, France
| | | | | | - Olivier Adam
- Institut Jean Le Rond d'Alembert, UMR 7190, CNRS-Sorbonne Université, 75005 Paris, France
- Institut des Neurosciences Paris-Saclay, UMR 9197, CNRS-Université Paris Sud, Orsay, France
| | - Nicolas Obin
- STMS Lab, IRCAM, CNRS-Sorbonne Université, Paris, France
| | - Valérie Dufour
- PRC, UMR 7247, Ethologie Cognitive et Sociale, CNRS-IFCE-INRAE-Université de Tours, Strasbourg, France
| |
Collapse
|
30
|
Uehara K, Yasuhara M, Koguchi J, Oku T, Shiotani S, Morise M, Furuya S. Brain network flexibility as a predictor of skilled musical performance. Cereb Cortex 2023; 33:10492-10503. [PMID: 37566918 DOI: 10.1093/cercor/bhad298] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/29/2023] [Revised: 07/25/2023] [Accepted: 07/26/2023] [Indexed: 08/13/2023] Open
Abstract
Interactions between the body and the environment are dynamically modulated by upcoming sensory information and motor execution. To adapt to these behavioral state-shifts, brain activity must also be flexible, maintaining a large repertoire of brain networks that can be switched flexibly. Recently, flexible internal brain communication, i.e. brain network flexibility, has come to be recognized as playing a vital role in integrating various sensorimotor information. Brain network flexibility is therefore one of the key factors that define sensorimotor skill. However, little is known about how flexible communications within the brain characterize the interindividual variation of sensorimotor skill and trial-by-trial variability within individuals. To address this, we recruited skilled musical performers and used a novel approach that combined multichannel-scalp electroencephalography, behavioral measurements of musical performance, and mathematical approaches to extract brain network flexibility. We found that brain network flexibility immediately before initiating the musical performance predicted interindividual differences in the precision of tone timbre when required for feedback control, but not for feedforward control. Furthermore, brain network flexibility in broad cortical regions predicted skilled musical performance. Our results provide novel evidence that brain network flexibility plays an important role in building skilled sensorimotor performance.
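Network flexibility is commonly computed from time-resolved community assignments as the fraction of consecutive windows in which a node switches community; a minimal sketch follows (the community detection step itself is omitted).

```python
# Sketch: per-node flexibility from community labels across time windows.
import numpy as np

def flexibility(assignments: np.ndarray) -> np.ndarray:
    """assignments: (n_windows, n_nodes) community labels per window.
    Returns each node's fraction of consecutive windows with a switch."""
    switches = assignments[1:] != assignments[:-1]
    return switches.mean(axis=0)

labels = np.array([[0, 1, 1],
                   [0, 2, 1],
                   [1, 2, 1]])               # 3 windows, 3 nodes
print(flexibility(labels))                   # [0.5, 0.5, 0.0]
```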
Collapse
Affiliation(s)
- Kazumasa Uehara
- Neural Information Dynamics Laboratory, Department of Computer Science and Engineering, Toyohashi University of Technology, Toyohashi, Japan
- Sony Computer Science Laboratories Inc, Tokyo 1410022, Japan
| | - Masaki Yasuhara
- Sony Computer Science Laboratories Inc, Tokyo 1410022, Japan
- Neural Engineering Laboratory, Department of Science of Technology Innovation, Nagaoka University of Technology, Nagaoka, Japan
| | - Junya Koguchi
- Sony Computer Science Laboratories Inc, Tokyo 1410022, Japan
- Graduate School of Advanced Mathematical Sciences, Meiji University, Tokyo, Japan
| | | | | | - Masanori Morise
- Sony Computer Science Laboratories Inc, Tokyo 1410022, Japan
- School of Interdisciplinary Mathematical Sciences, Meiji University, Tokyo, Japan
| | - Shinichi Furuya
- Sony Computer Science Laboratories Inc, Tokyo 1410022, Japan
- NeuroPiano Institute, Kyoto 6008086, Japan
| |
Collapse
|
31
|
Lockhart-Bouron M, Anikin A, Pisanski K, Corvin S, Cornec C, Papet L, Levréro F, Fauchon C, Patural H, Reby D, Mathevon N. Infant cries convey both stable and dynamic information about age and identity. COMMUNICATIONS PSYCHOLOGY 2023; 1:26. [PMID: 39242685 PMCID: PMC11332224 DOI: 10.1038/s44271-023-00022-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/23/2022] [Accepted: 08/31/2023] [Indexed: 09/09/2024]
Abstract
What information is encoded in the cries of human babies? While it is widely recognized that cries can encode distress levels, whether cries reliably encode the cause of crying remains disputed. Here, we collected 39,201 cries from 24 babies recorded in their homes longitudinally, from 15 days to 3.5 months of age, a database we share publicly for reuse. Based on the parental action that stopped the crying, which matched the parental evaluation of cry cause in 75% of cases, each cry was classified as caused by discomfort, hunger, or isolation. Our analyses show that baby cries provide reliable information about age and identity. Baby voices become more tonal and less shrill with age, while individual acoustic signatures drift throughout the first months of life. In contrast, neither machine learning algorithms nor trained adult listeners can reliably recognize the causes of crying.
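Tracking the fundamental frequency of a recording is the kind of acoustic analysis underlying statements about cries becoming more tonal and less shrill; a minimal sketch with librosa's pYIN tracker, where a synthetic chirp stands in for a cry and the frequency bounds are assumed values.

```python
# Sketch: fundamental-frequency (F0) tracking with pYIN.
import librosa
import numpy as np

sr = 22050
y = librosa.chirp(fmin=300, fmax=600, sr=sr, duration=2.0)  # stand-in for a cry
f0, voiced_flag, voiced_prob = librosa.pyin(y, fmin=150, fmax=900, sr=sr)
print("median F0 (Hz):", np.nanmedian(f0))                  # NaNs mark unvoiced frames
```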
Collapse
Affiliation(s)
- Marguerite Lockhart-Bouron
- Neonatal and Pediatric Intensive Care Unit, SAINBIOSE laboratory, Inserm, University Hospital of Saint-Etienne, University of Saint-Etienne, Saint-Etienne, France
| | - Andrey Anikin
- ENES Bioacoustics Research Laboratory, CRNL, CNRS, Inserm, University of Saint-Etienne, Saint-Etienne, France
- Division of Cognitive Science, Lund University, Lund, Sweden
| | - Katarzyna Pisanski
- ENES Bioacoustics Research Laboratory, CRNL, CNRS, Inserm, University of Saint-Etienne, Saint-Etienne, France
- Laboratoire Dynamique du Langage DDL, CNRS, University of Lyon 2, Lyon, France
| | - Siloé Corvin
- ENES Bioacoustics Research Laboratory, CRNL, CNRS, Inserm, University of Saint-Etienne, Saint-Etienne, France
- Central Integration of Pain-Neuropain Laboratory, CRNL, CNRS, Inserm, UCB Lyon 1, University of Saint-Etienne, Saint-Etienne, France
| | - Clément Cornec
- ENES Bioacoustics Research Laboratory, CRNL, CNRS, Inserm, University of Saint-Etienne, Saint-Etienne, France
| | - Léo Papet
- ENES Bioacoustics Research Laboratory, CRNL, CNRS, Inserm, University of Saint-Etienne, Saint-Etienne, France
| | - Florence Levréro
- ENES Bioacoustics Research Laboratory, CRNL, CNRS, Inserm, University of Saint-Etienne, Saint-Etienne, France
- Institut universitaire de France, Paris, France
| | - Camille Fauchon
- Central Integration of Pain-Neuropain Laboratory, CRNL, CNRS, Inserm, UCB Lyon 1, University of Saint-Etienne, Saint-Etienne, France
| | - Hugues Patural
- Neonatal and Pediatric Intensive Care Unit, SAINBIOSE laboratory, Inserm, University Hospital of Saint-Etienne, University of Saint-Etienne, Saint-Etienne, France
| | - David Reby
- ENES Bioacoustics Research Laboratory, CRNL, CNRS, Inserm, University of Saint-Etienne, Saint-Etienne, France
- Institut universitaire de France, Paris, France
| | - Nicolas Mathevon
- ENES Bioacoustics Research Laboratory, CRNL, CNRS, Inserm, University of Saint-Etienne, Saint-Etienne, France.
- Institut universitaire de France, Paris, France.
- Ecole Pratique des Hautes Etudes, CHArt Lab, PSL Research University, Paris, France.
| |
Collapse
|
32
|
Fleishman E, Cholewiak D, Gillespie D, Helble T, Klinck H, Nosal EM, Roch MA. Ecological inferences about marine mammals from passive acoustic data. Biol Rev Camb Philos Soc 2023; 98:1633-1647. [PMID: 37142263 DOI: 10.1111/brv.12969] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/06/2022] [Revised: 04/20/2023] [Accepted: 04/24/2023] [Indexed: 05/06/2023]
Abstract
Monitoring on the basis of sound recordings, or passive acoustic monitoring, can complement or serve as an alternative to real-time visual or aural monitoring of marine mammals and other animals by human observers. Passive acoustic data can support the estimation of common, individual-level ecological metrics, such as presence, detection-weighted occupancy, abundance and density, population viability and structure, and behaviour. Passive acoustic data also can support estimation of some community-level metrics, such as species richness and composition. The feasibility of estimation and certainty of estimates is highly context dependent, and understanding the factors that affect the reliability of measurements is useful for those considering whether to use passive acoustic data. Here, we review basic concepts and methods of passive acoustic sampling in marine systems that often are applicable to marine mammal research and conservation. Our ultimate aim is to facilitate collaboration among ecologists, bioacousticians, and data analysts. Ecological applications of passive acoustics require one to make decisions about sampling design, which in turn requires consideration of sound propagation, sampling of signals, and data storage. One also must make decisions about signal detection and classification and evaluation of the performance of algorithms for these tasks. Investment in the research and development of systems that automate detection and classification, including machine learning, is increasing. Passive acoustic monitoring is more reliable for detection of species presence than for estimation of other species-level metrics. Use of passive acoustic monitoring to distinguish among individual animals remains difficult. However, information about detection probability, vocalisation or cue rate, and relations between vocalisations and the number and behaviour of animals increases the feasibility of estimating abundance or density. Most sensor deployments are fixed in space or are sporadic, making temporal turnover in species composition more tractable to estimate than spatial turnover. Collaborations between acousticians and ecologists are most likely to be successful and rewarding when all partners critically examine and share a fundamental understanding of the target variables, sampling process, and analytical methods.
Collapse
Affiliation(s)
- Erica Fleishman
- College of Earth, Ocean, and Atmospheric Sciences, Oregon State University, Corvallis, OR, 97331, USA
| | - Danielle Cholewiak
- Northeast Fisheries Science Center, National Marine Fisheries Service, National Oceanic and Atmospheric Administration, Woods Hole, MA, 02543, USA
| | - Douglas Gillespie
- Sea Mammal Research Unit, Scottish Oceans Institute, University of St Andrews, St Andrews, KY16 9XL, UK
| | - Tyler Helble
- Naval Information Warfare Center Pacific, San Diego, CA, 92152, USA
| | - Holger Klinck
- K. Lisa Yang Center for Conservation Bioacoustics, Cornell Lab of Ornithology, Cornell University, Ithaca, NY, 14850, USA
| | - Eva-Marie Nosal
- Department of Ocean and Resources Engineering, University of Hawai'i at Manoa, Honolulu, HI, 96822, USA
| | - Marie A Roch
- Department of Computer Science, San Diego State University, San Diego, CA, 92182, USA
| |
Collapse
|
33
|
Zhang S, Gao Y, Cai J, Yang H, Zhao Q, Pan F. A Novel Bird Sound Recognition Method Based on Multifeature Fusion and a Transformer Encoder. SENSORS (BASEL, SWITZERLAND) 2023; 23:8099. [PMID: 37836929 PMCID: PMC10575132 DOI: 10.3390/s23198099] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/28/2023] [Revised: 09/16/2023] [Accepted: 09/18/2023] [Indexed: 10/15/2023]
Abstract
Birds play a vital role in the study of ecosystems and biodiversity. Accurate bird identification helps monitor biodiversity, understand the functions of ecosystems, and develop effective conservation strategies. However, previous bird sound recognition methods often relied on single features and overlooked the spatial information associated with these features, leading to low accuracy. Recognizing this gap, the present study proposed a bird sound recognition method that employs multiple convolutional neural networks and a transformer encoder to provide a reliable solution for identifying and classifying birds based on their unique sounds. We manually extracted various acoustic features as model inputs, and feature fusion was applied to obtain the final set of feature vectors. Feature fusion combines the deep features extracted by various networks, resulting in a more comprehensive feature set, thereby improving recognition accuracy. The multiple integrated acoustic features, such as mel frequency cepstral coefficients (MFCC), chroma features (Chroma), and Tonnetz features, were encoded by a transformer encoder. The transformer encoder effectively extracted the positional relationships between bird sound features, resulting in enhanced recognition accuracy. The experimental results demonstrated the exceptional performance of our method with an accuracy of 97.99%, a recall of 96.14%, an F1 score of 96.88% and a precision of 97.97% on the Birdsdata dataset. Furthermore, our method achieved an accuracy of 93.18%, a recall of 92.43%, an F1 score of 93.14% and a precision of 93.25% on the Cornell Bird Challenge 2020 (CBC) dataset.
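A rough sketch of the fused-feature plus transformer-encoder idea is given below; the feature set matches the abstract (MFCC, Chroma, Tonnetz), but the model dimensions, pooling, and class count are illustrative assumptions, not the paper's configuration.

```python
# Sketch: fuse MFCC + Chroma + Tonnetz frames, encode with a transformer.
import librosa
import numpy as np
import torch
import torch.nn as nn

sr = 22050
y = librosa.chirp(fmin=500, fmax=2000, sr=sr, duration=3.0)   # stand-in for a bird call
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
chroma = librosa.feature.chroma_stft(y=y, sr=sr)
tonnetz = librosa.feature.tonnetz(y=y, sr=sr)

T = min(mfcc.shape[1], chroma.shape[1], tonnetz.shape[1])     # align frame counts
fused = np.concatenate([mfcc[:, :T], chroma[:, :T], tonnetz[:, :T]])  # (38, T)

x = torch.tensor(fused.T, dtype=torch.float32).unsqueeze(0)   # (1, T, 38)
proj = nn.Linear(38, 64)
layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)
logits = nn.Linear(64, 10)(encoder(proj(x)).mean(dim=1))      # pooled -> 10 classes
print(logits.shape)
```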
Collapse
Affiliation(s)
- Shaokai Zhang
- College of Electronics and Information Engineering, Sichuan University, Chengdu 610041, China; (S.Z.); (Y.G.); (J.C.)
| | - Yuan Gao
- College of Electronics and Information Engineering, Sichuan University, Chengdu 610041, China; (S.Z.); (Y.G.); (J.C.)
| | - Jianmin Cai
- College of Electronics and Information Engineering, Sichuan University, Chengdu 610041, China; (S.Z.); (Y.G.); (J.C.)
| | - Hangxiao Yang
- College of Computer Science, Sichuan University, Chengdu 610041, China; (H.Y.); (Q.Z.)
| | - Qijun Zhao
- College of Computer Science, Sichuan University, Chengdu 610041, China; (H.Y.); (Q.Z.)
| | - Fan Pan
- College of Electronics and Information Engineering, Sichuan University, Chengdu 610041, China; (S.Z.); (Y.G.); (J.C.)
| |
Collapse
|
34
|
Chen X, Wang R, Khalilian-Gourtani A, Yu L, Dugan P, Friedman D, Doyle W, Devinsky O, Wang Y, Flinker A. A Neural Speech Decoding Framework Leveraging Deep Learning and Speech Synthesis. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.09.16.558028. [PMID: 37745380 PMCID: PMC10516019 DOI: 10.1101/2023.09.16.558028] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 09/26/2023]
Abstract
Decoding human speech from neural signals is essential for brain-computer interface (BCI) technologies restoring speech function in populations with neurological deficits. However, it remains a highly challenging task, compounded by the scarcity of neural signals with corresponding speech, the complexity and high dimensionality of the data, and the limited availability of public source code. Here, we present a novel deep learning-based neural speech decoding framework that includes an ECoG Decoder that translates electrocorticographic (ECoG) signals from the cortex into interpretable speech parameters and a novel differentiable Speech Synthesizer that maps speech parameters to spectrograms. We develop a companion audio-to-audio auto-encoder consisting of a Speech Encoder and the same Speech Synthesizer to generate reference speech parameters to facilitate the ECoG Decoder training. This framework generates natural-sounding speech and is highly reproducible across a cohort of 48 participants. Among three neural network architectures for the ECoG Decoder, the 3D ResNet model has the best decoding performance (PCC=0.804) in predicting the original speech spectrogram, closely followed by the SWIN model (PCC=0.796). Our experimental results show that our models can decode speech with high correlation even when limited to only causal operations, which is necessary for adoption by real-time neural prostheses. We successfully decode speech in participants with either left or right hemisphere coverage, which could lead to speech prostheses in patients with speech deficits resulting from left hemisphere damage. Further, we use an occlusion analysis to identify cortical regions contributing to speech decoding across our models. Finally, we provide open-source code for our two-stage training pipeline along with associated preprocessing and visualization tools to enable reproducible research and drive research across the speech science and prostheses communities.
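The reported decoding metric, Pearson correlation (PCC) between predicted and reference spectrograms, can be sketched in a few lines; the spectrograms below are random placeholders.

```python
# Sketch: PCC between a decoded and a reference spectrogram.
import numpy as np

def spectrogram_pcc(pred: np.ndarray, ref: np.ndarray) -> float:
    """Flatten both spectrograms and compute Pearson's r."""
    p, r = pred.ravel(), ref.ravel()
    p = p - p.mean()
    r = r - r.mean()
    return float((p @ r) / (np.linalg.norm(p) * np.linalg.norm(r) + 1e-12))

ref = np.random.rand(80, 200)                 # placeholder ground-truth spectrogram
pred = ref + 0.3 * np.random.rand(80, 200)    # placeholder decoded spectrogram
print(spectrogram_pcc(pred, ref))
```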
Collapse
|
35
|
Lu S, Ang GW, Steadman M, Kozlov AS. Composite receptive fields in the mouse auditory cortex. J Physiol 2023; 601:4091-4104. [PMID: 37578817 PMCID: PMC10952747 DOI: 10.1113/jp285003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/11/2023] [Accepted: 07/12/2023] [Indexed: 08/15/2023] Open
Abstract
A central question in sensory neuroscience is how neurons represent complex natural stimuli. This process involves multiple steps of feature extraction to obtain a condensed, categorical representation useful for classification and behaviour. It has previously been shown that central auditory neurons in the starling have composite receptive fields composed of multiple features. Whether this property is an idiosyncratic characteristic of songbirds, a group of highly specialized vocal learners, or a generic property of sensory processing is unknown. To address this question, we have recorded responses from auditory cortical neurons in mice, and characterized their receptive fields using mouse ultrasonic vocalizations (USVs) as a natural and ethologically relevant stimulus and pitch-shifted starling songs as a natural but ethologically irrelevant control stimulus. We have found that these neurons display composite receptive fields with multiple excitatory and inhibitory subunits. Moreover, this was the case for both conspecific and heterospecific vocalizations. We then trained the sparse filtering algorithm on both classes of natural stimuli to obtain statistically optimal features, and compared the natural and artificial features using UMAP, a dimensionality-reduction algorithm previously used to analyse mouse USVs and birdsongs. We have found that the receptive-field features obtained with both types of the natural stimuli clustered together, as did the sparse-filtering features. However, the natural and artificial receptive-field features clustered mostly separately. Based on these results, our general conclusion is that composite receptive fields are not a unique characteristic of specialized vocal learners but are likely a generic property of central auditory systems. KEY POINTS: Auditory cortical neurons in the mouse have composite receptive fields with several excitatory and inhibitory features. Receptive-field features capture temporal and spectral modulations of natural stimuli. Ethological relevance of the stimulus affects the estimation of receptive-field dimensionality.
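The sparse filtering algorithm mentioned here has a compact objective: soft-absolute feature activations are normalized per feature and then per example, and their summed L1 norm is minimized. A minimal sketch on random placeholder data, assuming gradient descent on the weight matrix:

```python
# Sketch: the sparse filtering objective optimized with Adam.
import torch

def sparse_filtering_loss(X: torch.Tensor, W: torch.Tensor) -> torch.Tensor:
    F = torch.sqrt((X @ W) ** 2 + 1e-8)      # soft absolute activations
    F = F / F.norm(dim=0, keepdim=True)      # normalize each feature (column)
    F = F / F.norm(dim=1, keepdim=True)      # normalize each example (row)
    return F.abs().sum()                     # sparsity penalty to minimize

X = torch.randn(256, 100)                    # examples x input dims (placeholder)
W = torch.randn(100, 64, requires_grad=True) # 64 learned features
opt = torch.optim.Adam([W], lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    loss = sparse_filtering_loss(X, W)
    loss.backward()
    opt.step()
print(loss.item())
```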
Collapse
Affiliation(s)
- Sihao Lu
- Department of Bioengineering, Imperial College London, London, UK
| | - Grace W.Y. Ang
- Department of Bioengineering, Imperial College London, London, UK
| | - Mark Steadman
- Department of Bioengineering, Imperial College London, London, UK
| | | |
Collapse
|
36
|
Best P, Paris S, Glotin H, Marxer R. Deep audio embeddings for vocalisation clustering. PLoS One 2023; 18:e0283396. [PMID: 37428759 PMCID: PMC10332598 DOI: 10.1371/journal.pone.0283396] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/07/2023] [Accepted: 06/25/2023] [Indexed: 07/12/2023] Open
Abstract
The study of non-human animals' communication systems generally relies on the transcription of vocal sequences using a finite set of discrete units. This set is referred to as a vocal repertoire, which is specific to a species or a sub-group of a species. When conducted by human experts, the formal description of vocal repertoires can be laborious and/or biased. This motivates computerised assistance for this procedure, for which machine learning algorithms represent a good opportunity. Unsupervised clustering algorithms are suited for grouping close points together, provided a relevant representation. This paper therefore studies a new method for encoding vocalisations, allowing for automatic clustering to alleviate vocal repertoire characterisation. Borrowing from deep representation learning, we use a convolutional auto-encoder network to learn an abstract representation of vocalisations. We report on the quality of the learnt representation, as well as that of state-of-the-art methods, by quantifying their agreement with expert-labelled vocalisation types from 8 datasets of other studies across 6 species (birds and marine mammals). With this benchmark, we demonstrate that using auto-encoders improves the relevance of vocalisation representation, which serves repertoire characterisation using a very limited number of settings. We also publish a Python package for the bioacoustic community to train their own vocalisation auto-encoders or use a pretrained encoder to browse vocal repertoires and ease unit-wise annotation.
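The benchmark idea, quantifying agreement between unsupervised clusters of learnt embeddings and expert labels, can be sketched as below, with random placeholders for the auto-encoder latent codes and the expert annotations.

```python
# Sketch: cluster latent codes, then score agreement with expert labels.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score

rng = np.random.default_rng(7)
latent = rng.normal(size=(300, 16))          # stand-in for auto-encoder codes
expert = rng.integers(0, 5, 300)             # stand-in for expert labels

pred = KMeans(n_clusters=5, n_init=10).fit_predict(latent)
print("NMI with expert labels:", normalized_mutual_info_score(expert, pred))
```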
Collapse
Affiliation(s)
- Paul Best
- Université de Toulon, Aix Marseille Univ, CNRS, LIS, Toulon, France
| | - Sébastien Paris
- Université de Toulon, Aix Marseille Univ, CNRS, LIS, Toulon, France
| | - Hervé Glotin
- Université de Toulon, Aix Marseille Univ, CNRS, LIS, Toulon, France
| | - Ricard Marxer
- Université de Toulon, Aix Marseille Univ, CNRS, LIS, Toulon, France
| |
Collapse
|
37
|
Colquitt BM, Li K, Green F, Veline R, Brainard MS. Neural circuit-wide analysis of changes to gene expression during deafening-induced birdsong destabilization. eLife 2023; 12:e85970. [PMID: 37284822 PMCID: PMC10259477 DOI: 10.7554/elife.85970] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/05/2023] [Accepted: 04/17/2023] [Indexed: 06/08/2023] Open
Abstract
Sensory feedback is required for the stable execution of learned motor skills, and its loss can severely disrupt motor performance. The neural mechanisms that mediate sensorimotor stability have been extensively studied at systems and physiological levels, yet relatively little is known about how disruptions to sensory input alter the molecular properties of associated motor systems. Songbird courtship song, a model for skilled behavior, is a learned and highly structured vocalization that is destabilized following deafening. Here, we sought to determine how the loss of auditory feedback modifies gene expression and its coordination across the birdsong sensorimotor circuit. To facilitate this system-wide analysis of transcriptional responses, we developed a gene expression profiling approach that enables the construction of hundreds of spatially-defined RNA-sequencing libraries. Using this method, we found that deafening preferentially alters gene expression across birdsong neural circuitry relative to surrounding areas, particularly in premotor and striatal regions. Genes with altered expression are associated with synaptic transmission, neuronal spines, and neuromodulation and show a bias toward expression in glutamatergic neurons and Pvalb/Sst-class GABAergic interneurons. We also found that connected song regions exhibit correlations in gene expression that were reduced in deafened birds relative to hearing birds, suggesting that song destabilization alters the inter-region coordination of transcriptional states. Finally, lesioning LMAN, a forebrain afferent of RA required for deafening-induced song plasticity, had the largest effect on groups of genes that were also most affected by deafening. Combined, this integrated transcriptomics analysis demonstrates that the loss of peripheral sensory input drives a distributed gene expression response throughout associated sensorimotor neural circuitry and identifies specific candidate molecular and cellular mechanisms that support the stability and plasticity of learned motor skills.
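The inter-region coordination measure described, correlating gene-expression profiles between connected regions and comparing hearing versus deafened birds, reduces to a correlation computation; the sketch below uses simulated expression vectors purely for illustration.

```python
# Sketch: inter-region coordination as expression-profile correlation.
import numpy as np

rng = np.random.default_rng(9)
genes = 500
shared = rng.normal(size=genes)                              # coordinated component
hvc_hearing = shared + 0.3 * rng.normal(size=genes)
ra_hearing = shared + 0.3 * rng.normal(size=genes)           # strongly coordinated
ra_deaf = 0.4 * shared + 0.9 * rng.normal(size=genes)        # weakened coordination

print("hearing:", np.corrcoef(hvc_hearing, ra_hearing)[0, 1])
print("deafened:", np.corrcoef(hvc_hearing, ra_deaf)[0, 1])
```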
Collapse
Affiliation(s)
- Bradley M Colquitt
- Howard Hughes Medical Institute, Chevy Chase, United States
- Department of Physiology, University of California, San Francisco, San Francisco, United States
| | - Kelly Li
- Howard Hughes Medical Institute, Chevy Chase, United States
- Department of Physiology, University of California, San Francisco, San Francisco, United States
| | - Foad Green
- Howard Hughes Medical Institute, Chevy Chase, United States
- Department of Physiology, University of California, San Francisco, San Francisco, United States
| | - Robert Veline
- Howard Hughes Medical Institute, Chevy Chase, United States
- Department of Physiology, University of California, San Francisco, San Francisco, United States
| | - Michael S Brainard
- Howard Hughes Medical Institute, Chevy Chase, United States
- Department of Physiology, University of California, San Francisco, San Francisco, United States
| |
Collapse
|
38
|
Brudner S, Pearson J, Mooney R. Generative models of birdsong learning link circadian fluctuations in song variability to changes in performance. PLoS Comput Biol 2023; 19:e1011051. [PMID: 37126511 PMCID: PMC10150982 DOI: 10.1371/journal.pcbi.1011051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/11/2022] [Accepted: 03/27/2023] [Indexed: 05/02/2023] Open
Abstract
Learning skilled behaviors requires intensive practice over days, months, or years. Behavioral hallmarks of practice include exploratory variation and long-term improvements, both of which can be impacted by circadian processes. During weeks of vocal practice, the juvenile male zebra finch transforms highly variable and simple song into a stable and precise copy of an adult tutor's complex song. Song variability and performance in juvenile finches also exhibit circadian structure that could influence this long-term learning process. In fact, one influential study reported juvenile song regresses towards immature performance overnight, while another suggested a more complex pattern of overnight change. However, neither of these studies thoroughly examined how circadian patterns of variability may structure the production of more or less mature songs. Here we relate the circadian dynamics of song maturation to circadian patterns of song variation, leveraging a combination of data-driven approaches. In particular, we analyze juvenile singing in a learned feature space that supports both data-driven measures of song maturity and generative developmental models of song production. These models reveal that circadian fluctuations in variability lead to especially regressive morning variants even without overall overnight regression, and highlight the utility of data-driven generative models for untangling these contributions.
Collapse
Affiliation(s)
- Samuel Brudner
- Department of Neurobiology, Duke University School of Medicine, Durham, North Carolina, United States of America
| | - John Pearson
- Department of Neurobiology, Duke University School of Medicine, Durham, North Carolina, United States of America
- Department of Biostatistics & Bioinformatics, Duke University, Durham, North Carolina, United States of America
| | - Richard Mooney
- Department of Neurobiology, Duke University School of Medicine, Durham, North Carolina, United States of America
| |
Collapse
|
39
|
Arnaud V, Pellegrino F, Keenan S, St-Gelais X, Mathevon N, Levréro F, Coupé C. Improving the workflow to crack Small, Unbalanced, Noisy, but Genuine (SUNG) datasets in bioacoustics: The case of bonobo calls. PLoS Comput Biol 2023; 19:e1010325. [PMID: 37053268 PMCID: PMC10129004 DOI: 10.1371/journal.pcbi.1010325] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/24/2022] [Revised: 04/25/2023] [Accepted: 03/01/2023] [Indexed: 04/15/2023] Open
Abstract
Despite the accumulation of data and studies, deciphering animal vocal communication remains challenging. In most cases, researchers must deal with the sparse recordings composing Small, Unbalanced, Noisy, but Genuine (SUNG) datasets. SUNG datasets are characterized by a limited number of recordings, most often noisy, and unbalanced in number between the individuals or categories of vocalizations. SUNG datasets therefore offer a valuable but inevitably distorted vision of communication systems. Adopting the best practices in their analysis is essential to effectively extract the available information and draw reliable conclusions. Here we show that the most recent advances in machine learning applied to a SUNG dataset succeed in unraveling the complex vocal repertoire of the bonobo, and we propose a workflow that can be effective with other animal species. We implement acoustic parameterization in three feature spaces and run a Supervised Uniform Manifold Approximation and Projection (S-UMAP) to evaluate how call types and individual signatures cluster in the bonobo acoustic space. We then implement three classification algorithms (Support Vector Machine, xgboost, neural networks) and their combination to explore the structure and variability of bonobo calls, as well as the robustness of the individual signature they encode. We underscore how classification performance is affected by the feature set and identify the most informative features. In addition, we highlight the need to address data leakage in the evaluation of classification performance to avoid misleading interpretations. Our results lead to identifying several practical approaches that are generalizable to any other animal communication system. To improve the reliability and replicability of vocal communication studies with SUNG datasets, we thus recommend: i) comparing several acoustic parameterizations; ii) visualizing the dataset with supervised UMAP to examine the species acoustic space; iii) adopting Support Vector Machines as the baseline classification approach; iv) explicitly evaluating data leakage and possibly implementing a mitigation strategy.
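Recommendation (iv) is typically implemented with group-aware cross-validation, keeping all calls from one individual in the same fold so individual signatures cannot leak across the train/test split; a minimal sketch with placeholder data follows.

```python
# Sketch: leak-free evaluation with GroupKFold and an SVM baseline.
import numpy as np
from sklearn.model_selection import GroupKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(8)
X = rng.normal(size=(600, 24))               # placeholder acoustic features
y = rng.integers(0, 8, 600)                  # placeholder call-type labels
individuals = rng.integers(0, 12, 600)       # which animal produced each call

svm = make_pipeline(StandardScaler(), SVC())
cv = GroupKFold(n_splits=5)                  # folds never split an individual
scores = cross_val_score(svm, X, y, cv=cv, groups=individuals)
print("leak-free CV accuracy:", scores.mean())
```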
Collapse
Affiliation(s)
- Vincent Arnaud
  - Département des arts, des lettres et du langage, Université du Québec à Chicoutimi, Chicoutimi, Canada
  - Laboratoire Dynamique Du Langage, UMR 5596, Université de Lyon, CNRS, Lyon, France
- François Pellegrino
  - Laboratoire Dynamique Du Langage, UMR 5596, Université de Lyon, CNRS, Lyon, France
- Sumir Keenan
  - ENES Bioacoustics Research Laboratory, University of Saint Étienne, CRNL, CNRS UMR 5292, Inserm UMR_S 1028, Saint-Étienne, France
- Xavier St-Gelais
  - Département des arts, des lettres et du langage, Université du Québec à Chicoutimi, Chicoutimi, Canada
- Nicolas Mathevon
  - ENES Bioacoustics Research Laboratory, University of Saint Étienne, CRNL, CNRS UMR 5292, Inserm UMR_S 1028, Saint-Étienne, France
- Florence Levréro
  - ENES Bioacoustics Research Laboratory, University of Saint Étienne, CRNL, CNRS UMR 5292, Inserm UMR_S 1028, Saint-Étienne, France
- Christophe Coupé
  - Laboratoire Dynamique Du Langage, UMR 5596, Université de Lyon, CNRS, Lyon, France
  - Department of Linguistics, The University of Hong Kong, Hong Kong, China
40
Vattis K, Luddy AC, Ouillon JS, Eklund NM, Stephen CD, Schmahmann JD, Nunes AS, Gupta AS. Sensitive quantification of cerebellar speech abnormalities using deep learning models. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.04.03.23288094. [PMID: 37066308 PMCID: PMC10104181 DOI: 10.1101/2023.04.03.23288094] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 04/18/2023]
Abstract
Objective: Objective, sensitive, and meaningful disease assessments are critical to support clinical trials and clinical care. Speech changes are one of the earliest and most evident manifestations of cerebellar ataxias. The purpose of this work is to develop models that can accurately identify and quantify these abnormalities. Methods: We use deep learning models, such as ResNet-18, that take the time and frequency partial derivatives of the log-mel spectrogram representations of speech as input, to learn representations that capture the motor speech phenotype of cerebellar ataxia. We train classification models to separate patients with ataxia from healthy controls, as well as regression models to estimate disease severity. Results: Our model was able to accurately distinguish healthy controls from individuals with ataxia, including ataxia participants with no detectable clinical deficits in speech. Furthermore, the regression models produced accurate estimates of disease severity, were able to measure subclinical signs of ataxia, and captured disease progression over time in individuals with ataxia. Conclusion: Deep learning models, trained on time and frequency partial derivatives of the speech signal, can detect subclinical speech changes in ataxias and sensitively measure disease change over time. Significance: Such models have the potential to assist with early detection of ataxia and to provide sensitive, low-burden assessment tools in support of clinical trials and neurological care.
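The input representation described in this abstract (time and frequency partial derivatives of a log-mel spectrogram, fed to a ResNet-18) can be prototyped as follows. This is a hedged sketch rather than the authors' code: the placeholder audio, two-channel layout, and two-class head are assumptions.

```python
import numpy as np
import librosa
import torch
import torchvision.models as models

sr = 16000
signal = np.random.randn(sr * 3).astype(np.float32)   # 3 s of placeholder audio

mel = librosa.feature.melspectrogram(y=signal, sr=sr, n_mels=128)
log_mel = librosa.power_to_db(mel)
d_freq, d_time = np.gradient(log_mel)    # partial derivatives along both axes

# Stack the two derivative maps as channels of one input "image".
x = torch.tensor(np.stack([d_time, d_freq]), dtype=torch.float32).unsqueeze(0)

net = models.resnet18(weights=None, num_classes=2)    # ataxia vs. control
net.conv1 = torch.nn.Conv2d(2, 64, kernel_size=7, stride=2, padding=3, bias=False)
logits = net(x)
print(logits.shape)   # torch.Size([1, 2])
```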
41
Jourjine N, Woolfolk ML, Sanguinetti-Scheck JI, Sabatini JE, McFadden S, Lindholm AK, Hoekstra HE. Two pup vocalization types are genetically and functionally separable in deer mice. Curr Biol 2023; 33:1237-1248.e4. [PMID: 36893759 DOI: 10.1016/j.cub.2023.02.045] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/26/2022] [Revised: 02/11/2023] [Accepted: 02/14/2023] [Indexed: 03/10/2023]
Abstract
Vocalization is a widespread social behavior in vertebrates that can affect fitness in the wild. Although many vocal behaviors are highly conserved, heritable features of specific vocalization types can vary both within and between species, raising the questions of why and how some vocal behaviors evolve. Here, using new computational tools to automatically detect and cluster vocalizations into distinct acoustic categories, we compare pup isolation calls across neonatal development in eight taxa of deer mice (genus Peromyscus) and compare them with laboratory mice (C57BL/6J strain) and free-living, wild house mice (Mus musculus domesticus). Whereas both Peromyscus and Mus pups produce ultrasonic vocalizations (USVs), Peromyscus pups also produce a second call type with acoustic features, temporal rhythms, and developmental trajectories that are distinct from those of USVs. In deer mice, these lower frequency "cries" are predominantly emitted in postnatal days 1 through 9, whereas USVs are primarily made after day 9. Using playback assays, we show that cries result in a more rapid approach by Peromyscus mothers than USVs, suggesting a role for cries in eliciting parental care early in neonatal development. Using a genetic cross between two sister species of deer mice exhibiting large, innate differences in the acoustic structure of cries and USVs, we find that variation in vocalization rate, duration, and pitch displays different degrees of genetic dominance and that cry and USV features can be uncoupled in second-generation hybrids. Taken together, this work shows that vocal behavior can evolve quickly between closely related rodent species in which vocalization types, likely serving distinct functions in communication, are controlled by distinct genetic loci.
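As a toy illustration of the detect-and-cluster step, the sketch below separates synthetic low-frequency "cries" from high-frequency USVs with k-means on two simple acoustic features. The feature values are fabricated for illustration; the paper's actual tools are more elaborate.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
# Peak frequency (kHz) and duration (ms) for 100 synthetic calls.
cries = np.column_stack([rng.normal(15, 3, 50), rng.normal(120, 30, 50)])
usvs = np.column_stack([rng.normal(70, 8, 50), rng.normal(40, 10, 50)])
calls = np.vstack([cries, usvs])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(calls)
for k in range(2):
    f, d = calls[labels == k].mean(axis=0)
    print(f"cluster {k}: mean peak {f:.0f} kHz, mean duration {d:.0f} ms")
```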
Affiliation(s)
- Nicholas Jourjine
  - Department of Molecular & Cellular Biology, Department of Organismic & Evolutionary Biology, Center for Brain Science, Museum of Comparative Zoology, Harvard University and the Howard Hughes Medical Institute, 16 Divinity Avenue, Cambridge, MA 02138, USA
- Maya L Woolfolk
  - Department of Molecular & Cellular Biology, Department of Organismic & Evolutionary Biology, Center for Brain Science, Museum of Comparative Zoology, Harvard University and the Howard Hughes Medical Institute, 16 Divinity Avenue, Cambridge, MA 02138, USA
- Juan I Sanguinetti-Scheck
  - Department of Molecular & Cellular Biology, Department of Organismic & Evolutionary Biology, Center for Brain Science, Museum of Comparative Zoology, Harvard University and the Howard Hughes Medical Institute, 16 Divinity Avenue, Cambridge, MA 02138, USA
- John E Sabatini
  - Department of Molecular & Cellular Biology, Department of Organismic & Evolutionary Biology, Center for Brain Science, Museum of Comparative Zoology, Harvard University and the Howard Hughes Medical Institute, 16 Divinity Avenue, Cambridge, MA 02138, USA
- Sade McFadden
  - Department of Molecular & Cellular Biology, Department of Organismic & Evolutionary Biology, Center for Brain Science, Museum of Comparative Zoology, Harvard University and the Howard Hughes Medical Institute, 16 Divinity Avenue, Cambridge, MA 02138, USA
- Anna K Lindholm
  - Department of Evolutionary Biology & Environmental Studies, University of Zürich, Winterthurerstrasse 190, 8057 Zürich, Switzerland
- Hopi E Hoekstra
  - Department of Molecular & Cellular Biology, Department of Organismic & Evolutionary Biology, Center for Brain Science, Museum of Comparative Zoology, Harvard University and the Howard Hughes Medical Institute, 16 Divinity Avenue, Cambridge, MA 02138, USA
42
Zimmermann J, Beguet F, Guthruf D, Langbehn B, Rupp D. Finding the semantic similarity in single-particle diffraction images using self-supervised contrastive projection learning. NPJ COMPUTATIONAL MATERIALS 2023; 9:24. [PMID: 38666059 PMCID: PMC11041688 DOI: 10.1038/s41524-023-00966-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 09/23/2022] [Accepted: 01/10/2023] [Indexed: 04/28/2024]
Abstract
Single-shot coherent diffraction imaging of isolated nanosized particles has seen remarkable success in recent years, yielding in-situ measurements with ultra-high spatial and temporal resolution. The progress of high-repetition-rate sources for intense X-ray pulses has further enabled recording datasets containing millions of diffraction images, which are needed for structure determination of specimens with greater structural variety and for dynamic experiments. The size of the datasets, however, represents a monumental problem for their analysis. Here, we present an automated approach for finding semantic similarities in coherent diffraction images without relying on human expert labeling. By introducing the concept of projection learning, we extend self-supervised contrastive learning to the context of coherent diffraction imaging and achieve a dimensionality reduction producing semantically meaningful embeddings that align with physical intuition. The method yields substantial improvements compared to previous approaches, paving the way toward real-time and large-scale analysis of coherent diffraction experiments at X-ray free-electron lasers.
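Projection learning itself is specific to this paper, but it extends standard self-supervised contrastive learning; the sketch below shows the usual NT-Xent (SimCLR-style) objective such methods build on, not the authors' extension. Batch size, embedding width, and temperature are arbitrary assumptions.

```python
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.5):
    """NT-Xent loss over two batches of embeddings of matched augmented views."""
    n = z1.shape[0]
    z = F.normalize(torch.cat([z1, z2]), dim=1)   # (2n, d), unit-norm rows
    sim = z @ z.t() / temperature                 # scaled cosine similarities
    sim.masked_fill_(torch.eye(2 * n, dtype=torch.bool), float("-inf"))
    # The positive for row i is its paired view: i+n in the first half, i-n after.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

z1 = torch.randn(8, 64)   # embeddings of 8 diffraction images, view 1
z2 = torch.randn(8, 64)   # embeddings of the same 8 images, view 2
print(nt_xent(z1, z2))
```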
Affiliation(s)
- Daniela Rupp
  - ETH Zürich, Zürich, Switzerland
  - Max-Born-Institut, Berlin, Germany
43
Clink DJ, Kier I, Ahmad AH, Klinck H. A workflow for the automated detection and classification of female gibbon calls from long-term acoustic recordings. Front Ecol Evol 2023. [DOI: 10.3389/fevo.2023.1071640] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/11/2023] Open
Abstract
Passive acoustic monitoring (PAM) allows for the study of vocal animals on temporal and spatial scales difficult to achieve using only human observers. Recent improvements in recording technology, data storage, and battery capacity have led to increased use of PAM. One of the main obstacles in implementing wide-scale PAM programs is the lack of open-source programs that efficiently process terabytes of sound recordings and do not require large amounts of training data. Here we describe a workflow for detecting, classifying, and visualizing female Northern grey gibbon calls in Sabah, Malaysia. Our approach detects sound events using band-limited energy summation and performs binary classification of these events (gibbon female or not) using machine learning algorithms (support vector machine and random forest). We then applied an unsupervised approach (affinity propagation clustering) to test whether we could further differentiate between true and false positives, or estimate the number of gibbon females in our dataset. We used this workflow to address three questions: (1) does this automated approach provide reliable estimates of temporal patterns of gibbon calling activity; (2) can unsupervised approaches be applied as a post-processing step to improve the performance of the system; and (3) can unsupervised approaches be used to estimate how many female individuals (or clusters) there are in our study area? We found that performance plateaued with >160 clips of training data for each of our two classes. Using optimized settings, our automated approach achieved a satisfactory performance (F1 score ~ 80%). The unsupervised approach did not effectively differentiate between true and false positives or return clusters that appear to correspond to the number of females in our study area. Our results indicate that more work needs to be done before unsupervised approaches can be reliably used to estimate the number of individual animals occupying an area from PAM data. Future work applying these methods across sites and different gibbon species, together with comparisons to deep learning approaches, will be crucial for future gibbon conservation initiatives across Southeast Asia.
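Band-limited energy summation, the detection stage named in this abstract, reduces to summing spectrogram energy inside the call band and thresholding. In the sketch below, the band edges, threshold rule, and placeholder audio are assumptions rather than the study's settings.

```python
import numpy as np
from scipy.signal import spectrogram

sr = 16000
audio = np.random.randn(sr * 10)          # 10 s of placeholder recording

f, t, Sxx = spectrogram(audio, fs=sr, nperseg=1024, noverlap=512)
band = (f >= 500) & (f <= 2000)           # assumed female-gibbon call band (Hz)
energy = Sxx[band].sum(axis=0)            # band-limited energy per frame

threshold = energy.mean() + 3 * energy.std()
detections = t[energy > threshold]        # frame times flagged as sound events
print(f"{detections.size} candidate frames above threshold")
```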
44
Berthet M, Coye C, Dezecache G, Kuhn J. Animal linguistics: a primer. Biol Rev Camb Philos Soc 2023; 98:81-98. [PMID: 36189714 PMCID: PMC10091714 DOI: 10.1111/brv.12897] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/05/2022] [Revised: 08/10/2022] [Accepted: 08/12/2022] [Indexed: 01/12/2023]
Abstract
The evolution of language has been investigated by several research communities, including biologists and linguists, striving to highlight similar linguistic capacities across species. To date, however, no consensus exists on the linguistic capacities of non-human species. Major controversies remain on the use of linguistic terminology, analysis methods and behavioural data collection. The field of 'animal linguistics' has emerged to overcome these difficulties and attempt to reach uniform methods and terminology. This primer is a tutorial review of 'animal linguistics'. First, it describes the linguistic concepts of semantics, pragmatics and syntax, and proposes minimal criteria to be fulfilled to claim that a given species displays a particular linguistic capacity. Second, it reviews relevant methods successfully applied to the study of communication in animals and proposes a list of useful references to detect and overcome major pitfalls commonly observed in the collection of animal behaviour data. This primer represents a step towards mutual understanding and fruitful collaborations between linguists and biologists.
Affiliation(s)
- Mélissa Berthet
  - Institut Jean Nicod, Département d'études cognitives, ENS, EHESS, CNRS, PSL University, 75005 Paris, France
  - Center for the Interdisciplinary Study of Language Evolution, University of Zürich, Affolternstrasse 56, 8050 Zurich, Switzerland
  - Department of Comparative Language Science, University of Zürich, Affolternstrasse 56, 8050 Zurich, Switzerland
- Camille Coye
  - Institut Jean Nicod, Département d'études cognitives, ENS, EHESS, CNRS, PSL University, 75005 Paris, France
  - Center for Ecology and Conservation, Bioscience Department, University of Exeter, Penryn Campus, Penryn, TR10 9FE, UK
- Jeremy Kuhn
  - Institut Jean Nicod, Département d'études cognitives, ENS, EHESS, CNRS, PSL University, 75005 Paris, France
45
Walsh SL, Engesser S, Townsend SW, Ridley AR. Multi-level combinatoriality in magpie non-song vocalizations. J R Soc Interface 2023; 20:20220679. [PMID: 36722171 PMCID: PMC9890321 DOI: 10.1098/rsif.2022.0679] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/02/2023] Open
Abstract
Comparative studies conducted over the past few decades have provided important insights into the capacity for animals to combine vocal segments at one of two levels: within or between calls. There remains, however, a distinct gap in knowledge as to whether animal combinatoriality can extend beyond one level. Investigating this requires a comprehensive analysis of the combinatorial features characterizing a species' vocal system. Here, we used nonlinear dimensionality reduction and sequential transition analysis to quantitatively describe the non-song combinatorial repertoire of the Western Australian magpie (Gymnorhina tibicen dorsalis). We found that (i) magpies recombine four distinct acoustic segments to create a larger number of calls, and (ii) the resultant calls are further combined into larger call combinations. Our work demonstrates two levels in the combining of magpie vocal units. These results are incongruous with the notion that a capacity for multi-level combinatoriality is unique to human language, wherein the combining of meaningless sounds and meaningful words interactively occurs across different combinatorial levels. Our study thus provides novel insights into the combinatorial capacities of a non-human species, adding to the growing evidence of analogues of language-specific traits present in the animal kingdom.
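A sequential transition analysis of the kind named here can be prototyped as a first-order transition matrix over segment labels. The sketch below uses a fabricated label sequence, not magpie data.

```python
import numpy as np

segments = ["A", "B", "C", "D"]
sequence = ["A", "B", "B", "C", "A", "B", "D", "C", "A", "B", "C", "D"]

idx = {s: i for i, s in enumerate(segments)}
counts = np.zeros((4, 4))
for prev, nxt in zip(sequence, sequence[1:]):
    counts[idx[prev], idx[nxt]] += 1

# Row-normalize to get P(next segment | current segment); combinations that
# occur above chance show up as unexpectedly large entries.
probs = counts / counts.sum(axis=1, keepdims=True)
print(np.round(probs, 2))
```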
Affiliation(s)
- Sarah L. Walsh
  - Centre for Evolutionary Biology, School of Biological Sciences, University of Western Australia, Crawley, WA 6009, Australia
- Sabrina Engesser
  - Department of Biology, University of Copenhagen, 1165 København, Denmark
- Simon W. Townsend
  - Department of Comparative Language Science, University of Zurich, Zurich 8006, Switzerland
  - Center for the Interdisciplinary Study of Language Evolution (ISLE), University of Zurich, Zurich 8006, Switzerland
  - Department of Psychology, University of Warwick, Coventry CV4 7AL, UK
- Amanda R. Ridley
  - Centre for Evolutionary Biology, School of Biological Sciences, University of Western Australia, Crawley, WA 6009, Australia
46
Lorenz C, Hao X, Tomka T, Rüttimann L, Hahnloser RH. Interactive extraction of diverse vocal units from a planar embedding without the need for prior sound segmentation. FRONTIERS IN BIOINFORMATICS 2023; 2:966066. [PMID: 36710910 PMCID: PMC9880044 DOI: 10.3389/fbinf.2022.966066] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/20/2022] [Accepted: 11/14/2022] [Indexed: 01/15/2023] Open
Abstract
Annotating and proofreading data sets of complex natural behaviors such as vocalizations are tedious tasks because instances of a given behavior need to be correctly segmented from background noise and must be classified with minimal false positive error rate. Low-dimensional embeddings have proven very useful for this task because they can provide a visual overview of a data set in which distinct behaviors appear in different clusters. However, low-dimensional embeddings introduce errors because they fail to preserve distances, and because they can represent only objects of fixed dimensionality, which conflicts with vocalizations whose dimensions vary with their durations. To mitigate these issues, we introduce a semi-supervised, analytical method for simultaneous segmentation and clustering of vocalizations. We define a given vocalization type by specifying pairs of high-density regions in the embedding plane of sound spectrograms, one region associated with vocalization onsets and the other with offsets. We demonstrate our two-neighborhood (2N) extraction method on the task of clustering adult zebra finch vocalizations embedded with UMAP. We show that 2N extraction allows the identification of short and long vocal renditions from continuous data streams without initially committing to a particular segmentation of the data. Also, 2N extraction achieves a much lower false positive error rate than comparable approaches based on a single defining region. Along with our method, we present a graphical user interface (GUI) for visualizing and annotating data.
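The two-neighborhood idea can be caricatured in a few lines: pick one region of the embedding plane for onsets and one for offsets, then pair each onset hit with the next offset hit in time. Everything below (regions, embedding, times) is toy data, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
emb = rng.uniform(0, 10, size=(200, 2))     # 2-D embedding of spectrogram slices
times = np.sort(rng.uniform(0, 60, 200))    # time (s) of each embedded slice

def in_region(points, center, radius):
    return np.linalg.norm(points - center, axis=1) < radius

onset_hits = times[in_region(emb, center=np.array([2.0, 2.0]), radius=1.5)]
offset_hits = times[in_region(emb, center=np.array([8.0, 8.0]), radius=1.5)]

# Pair each onset with the first subsequent offset to segment one vocalization.
extracted = []
for t_on in onset_hits:
    later = offset_hits[offset_hits > t_on]
    if later.size:
        extracted.append((t_on, later[0]))
print(f"extracted {len(extracted)} candidate vocalizations")
```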
Affiliation(s)
- Corinna Lorenz
  - Institute of Neuroinformatics and Neuroscience Center Zurich, University of Zurich and ETH Zurich, Zurich, Switzerland
  - Université Paris-Saclay, CNRS, Institut des Neurosciences Paris-Saclay, Saclay, France
- Xinyu Hao
  - Institute of Neuroinformatics and Neuroscience Center Zurich, University of Zurich and ETH Zurich, Zurich, Switzerland
  - School of Electrical and Information Engineering, Tianjin University, Tianjin, China
- Tomas Tomka
  - Institute of Neuroinformatics and Neuroscience Center Zurich, University of Zurich and ETH Zurich, Zurich, Switzerland
- Linus Rüttimann
  - Institute of Neuroinformatics and Neuroscience Center Zurich, University of Zurich and ETH Zurich, Zurich, Switzerland
- Richard H.R. Hahnloser
  - Institute of Neuroinformatics and Neuroscience Center Zurich, University of Zurich and ETH Zurich, Zurich, Switzerland
47
McGinn K, Kahl S, Peery MZ, Klinck H, Wood CM. Feature embeddings from the BirdNET algorithm provide insights into avian ecology. ECOL INFORM 2023. [DOI: 10.1016/j.ecoinf.2023.101995] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/19/2023]
48
Pranic NM, Kornbrek C, Yang C, Cleland TA, Tschida KA. Rates of ultrasonic vocalizations are more strongly related than acoustic features to non-vocal behaviors in mouse pups. Front Behav Neurosci 2022; 16:1015484. [PMID: 36600992 PMCID: PMC9805956 DOI: 10.3389/fnbeh.2022.1015484] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/09/2022] [Accepted: 11/29/2022] [Indexed: 12/23/2022] Open
Abstract
Mouse pups produce ultrasonic vocalizations (USVs) in response to isolation from the nest (i.e., isolation USVs). Rates and acoustic features of isolation USVs change dramatically over the first two weeks of life, and there is also substantial variability in the rates and acoustic features of isolation USVs at a given postnatal age. The factors that contribute to within-age variability in isolation USVs remain largely unknown. Here, we explore the extent to which the non-vocal behaviors of mouse pups relate to within-age variability in the rates and acoustic features of their USVs. We recorded non-vocal behaviors of isolated C57BL/6J mouse pups at four postnatal ages (postnatal days 5, 10, 15, and 20), measured rates of isolation USV production, and applied a combination of pre-defined acoustic feature measurements and an unsupervised machine learning-based vocal analysis method to examine USV acoustic features. When we considered different categories of non-vocal behavior, our analyses revealed that mice in all postnatal age groups produce higher rates of isolation USVs during active non-vocal behaviors than when lying still. Moreover, rates of isolation USVs are correlated with the intensity (i.e., magnitude) of non-vocal body and limb movements within a given trial. In contrast, USVs produced during different categories of non-vocal behaviors and during different intensities of non-vocal movement do not differ substantially in their acoustic features. Our findings suggest that levels of behavioral arousal contribute to within-age variability in the rates, but not the acoustic features, of mouse isolation USVs.
49
Provost KL, Yang J, Carstens BC. The impacts of fine-tuning, phylogenetic distance, and sample size on big-data bioacoustics. PLoS One 2022; 17:e0278522. [PMID: 36477744 PMCID: PMC9728902 DOI: 10.1371/journal.pone.0278522] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/11/2022] [Accepted: 11/17/2022] [Indexed: 12/12/2022] Open
Abstract
Vocalizations in animals, particularly birds, are critically important behaviors that influence their reproductive fitness. While recordings of bioacoustic data have been captured and stored in collections for decades, automated extraction of data from these recordings has only recently been facilitated by artificial intelligence methods. These methods have yet to be evaluated with respect to the accuracy of different automation strategies and features. Here, we use a recently published machine learning framework to extract syllables from ten bird species, ranging in phylogenetic relatedness from 1 to 85 million years of divergence, to compare how relatedness influences accuracy. We also evaluate the utility of applying trained models to novel species. Our results indicate that model performance is best on conspecifics, with accuracy progressively decreasing as phylogenetic distance increases between taxa. However, we also find that applying models trained on multiple distantly related species can improve overall accuracy to levels near that of training and analyzing a model on the same species. When planning big-data bioacoustics studies, care must be taken in sample design to maximize sample size and minimize human labor without sacrificing accuracy.
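The transfer result suggests a standard fine-tuning recipe: freeze a pretrained feature extractor and retrain only the classification head on a small labeled sample from the novel species. The backbone (an ImageNet ResNet-18 stand-in), class count, and tensors below are assumptions for illustration, not the framework used in the paper.

```python
import torch
import torchvision.models as models

net = models.resnet18(weights="IMAGENET1K_V1")      # stand-in pretrained backbone
for p in net.parameters():
    p.requires_grad = False                         # freeze learned features
net.fc = torch.nn.Linear(net.fc.in_features, 3)     # new head: 3 syllable types

opt = torch.optim.Adam(net.fc.parameters(), lr=1e-3)
x = torch.randn(4, 3, 224, 224)                     # 4 spectrogram "images"
y = torch.randint(0, 3, (4,))                       # syllable labels

loss = torch.nn.functional.cross_entropy(net(x), y)
loss.backward()
opt.step()
print(float(loss))
```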
Affiliation(s)
- Kaiya L. Provost
  - Department of Evolution, Ecology and Organismal Biology, The Ohio State University, Columbus, Ohio, United States of America
- Jiaying Yang
  - Department of Evolution, Ecology and Organismal Biology, The Ohio State University, Columbus, Ohio, United States of America
- Bryan C. Carstens
  - Department of Evolution, Ecology and Organismal Biology, The Ohio State University, Columbus, Ohio, United States of America
50
Michaud F, Sueur J, Le Cesne M, Haupert S. Unsupervised classification to improve the quality of a bird song recording dataset. ECOL INFORM 2022. [DOI: 10.1016/j.ecoinf.2022.101952] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/14/2022]