51
Cohen Y, Engel TA, Langdon C, Lindsay GW, Ott T, Peters MAK, Shine JM, Breton-Provencher V, Ramaswamy S. Recent Advances at the Interface of Neuroscience and Artificial Neural Networks. J Neurosci 2022; 42:8514-8523. [PMID: 36351830; PMCID: PMC9665920; DOI: 10.1523/jneurosci.1503-22.2022]
Abstract
Biological neural networks adapt and learn in diverse behavioral contexts. Artificial neural networks (ANNs) have exploited biological properties to solve complex problems. However, despite their effectiveness for specific tasks, ANNs are yet to realize the flexibility and adaptability of biological cognition. This review highlights recent advances in computational and experimental research to advance our understanding of biological and artificial intelligence. In particular, we discuss critical mechanisms from the cellular, systems, and cognitive neuroscience fields that have contributed to refining the architecture and training algorithms of ANNs. Additionally, we discuss how recent work used ANNs to understand complex neuronal correlates of cognition and to process high throughput behavioral data.
Affiliation(s)
- Yarden Cohen
- Department of Brain Sciences, Weizmann Institute of Science, Rehovot, 76100, Israel
- Tatiana A Engel
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, NY 11724
- Grace W Lindsay
- Department of Psychology, Center for Data Science, New York University, New York, NY 10003
- Torben Ott
- Bernstein Center for Computational Neuroscience Berlin, Institute of Biology, Humboldt University of Berlin, 10117, Berlin, Germany
- Megan A K Peters
- Department of Cognitive Sciences, University of California-Irvine, Irvine, CA 92697
- James M Shine
- Brain and Mind Centre, University of Sydney, Sydney, NSW 2006, Australia
- Srikanth Ramaswamy
- Biosciences Institute, Newcastle University, Newcastle upon Tyne, NE2 4HH, United Kingdom
52
Morales G, Vargas V, Espejo D, Poblete V, Tomasevic JA, Otondo F, Navedo JG. Method for passive acoustic monitoring of bird communities using UMAP and a deep neural network. Ecol Inform 2022. [DOI: 10.1016/j.ecoinf.2022.101909]
53
Xing J, Sainburg T, Taylor H, Gentner TQ. Syntactic modulation of rhythm in Australian pied butcherbird song. R Soc Open Sci 2022; 9:220704. [PMID: 36177196; PMCID: PMC9515642; DOI: 10.1098/rsos.220704]
Abstract
The acoustic structure of birdsong is spectrally and temporally complex. Temporal complexity is often investigated in a syntactic framework focusing on the statistical features of symbolic song sequences. Alternatively, temporal patterns can be investigated in a rhythmic framework that focuses on the relative timing between song elements. Here, we investigate the merits of combining both frameworks by integrating syntactic and rhythmic analyses of Australian pied butcherbird (Cracticus nigrogularis) songs, which exhibit organized syntax and diverse rhythms. We show that rhythms of the pied butcherbird song bouts in our sample are categorically organized and predictable by the song's first-order sequential syntax. These song rhythms remain categorically distributed and strongly associated with the first-order sequential syntax even after controlling for variance in note length, suggesting that the silent intervals between notes induce a rhythmic structure on note sequences. We discuss the implication of syntactic-rhythmic relations as a relevant feature of song complexity with respect to signals such as human speech and music, and advocate for a broader conception of song complexity that takes into account syntax, rhythm, and their interaction with other acoustic and perceptual features.
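The two analyses combined here can be made concrete with a short sketch: first-order sequential syntax reduces to a matrix of transition probabilities between note types, and rhythm is commonly summarized by ratios of adjacent inter-onset intervals. The note labels and onset times below are invented for illustration; this is not the authors' code or data.

```python
import numpy as np
import pandas as pd

# Hypothetical annotated song bout: note labels and note onset times (seconds).
labels = ["A", "B", "A", "C", "A", "B", "A", "C"]
onsets = np.array([0.00, 0.42, 0.80, 1.25, 1.60, 2.02, 2.40, 2.85])

# First-order sequential syntax: P(next note | current note).
pairs = pd.DataFrame({"current": labels[:-1], "next": labels[1:]})
transition_matrix = pd.crosstab(pairs["current"], pairs["next"], normalize="index")
print(transition_matrix)

# Rhythm: ratios of adjacent inter-onset intervals, r = IOI_k / (IOI_k + IOI_{k+1});
# values clustered near 0.5 indicate isochronous (1:1) rhythms.
iois = np.diff(onsets)
ratios = iois[:-1] / (iois[:-1] + iois[1:])
print(np.round(ratios, 3))
```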
Affiliation(s)
- Jeffrey Xing
- Department of Psychology, University of California San Diego, La Jolla, CA, USA
- Tim Sainburg
- Department of Psychology, University of California San Diego, La Jolla, CA, USA
- Hollis Taylor
- Sydney Conservatorium of Music, University of Sydney, Sydney, New South Wales, Australia
- Timothy Q. Gentner
- Department of Psychology, University of California San Diego, La Jolla, CA, USA
- Neurobiology Section, Division of Biological Sciences, University of California San Diego, La Jolla, CA, USA
- Kavli Institute for Brain and Mind, University of California San Diego, La Jolla, CA, USA
54
Xing J, Sainburg T, Taylor H, Gentner TQ. Syntactic modulation of rhythm in Australian pied butcherbird song. R Soc Open Sci 2022; 9:220704. [PMID: 36177196; DOI: 10.6084/m9.figshare.c.6197494]
55
Sun Y, Yen S, Lin T. soundscape_IR: A source separation toolbox for exploring acoustic diversity in soundscapes. Methods Ecol Evol 2022. [DOI: 10.1111/2041-210x.13960]
Affiliation(s)
- Yi-Jen Sun
- Biodiversity Research Center, Academia Sinica, Taipei, Taiwan (R.O.C.)
- Shih-Ching Yen
- Center for General Education, National Tsing Hua University, Hsinchu, Taiwan (R.O.C.)
- Tzu-Hao Lin
- Biodiversity Research Center, Academia Sinica, Taipei, Taiwan (R.O.C.)
56
Introducing the Software CASE (Cluster and Analyze Sound Events) by Comparing Different Clustering Methods and Audio Transformation Techniques Using Animal Vocalizations. Animals (Basel) 2022; 12:2020. [PMID: 36009611; PMCID: PMC9404437; DOI: 10.3390/ani12162020]
Abstract
Simple Summary: Unsupervised clustering algorithms are widely used in ecology and conservation to classify animal vocalizations, but also offer various advantages in basic research, contributing to the understanding of acoustic communication. Nevertheless, there are still some challenges to overcome. For instance, the quality of the clustering result depends on the audio transformation technique previously used to adjust the audio data. Moreover, it is difficult to verify the reliability of the clustering result. To analyze bioacoustic data using a clustering algorithm, it is, therefore, essential to select a reasonable algorithm from the many existing algorithms and prepare the recorded vocalizations so that the resulting values characterize a vocalization as accurately as possible. Frequency-modulated vocalizations, whose frequencies change over time, pose a particular problem. In this paper, we present the software CASE, which includes various clustering methods and provides an overview of their strengths and weaknesses concerning the classification of bioacoustic data. This software uses a multidimensional feature-extraction method to achieve better clustering results, especially for frequency-modulated vocalizations.
Abstract: Unsupervised clustering algorithms are widely used in ecology and conservation to classify animal sounds, but also offer several advantages in basic bioacoustics research. Consequently, it is important to overcome the existing challenges. A common practice is extracting the acoustic features of vocalizations one-dimensionally, only extracting an average value for a given feature for the entire vocalization. With frequency-modulated vocalizations, whose acoustic features can change over time, this can lead to insufficient characterization. Whether the necessary parameters have been set correctly and the obtained clustering result reliably classifies the vocalizations subsequently often remains unclear. The presented software, CASE, is intended to overcome these challenges. Established and new unsupervised clustering methods (community detection, affinity propagation, HDBSCAN, and fuzzy clustering) are tested in combination with various classifiers (k-nearest neighbor, dynamic time-warping, and cross-correlation) using differently transformed animal vocalizations. These methods are compared with predefined clusters to determine their strengths and weaknesses. In addition, a multidimensional data transformation procedure is presented that better represents the course of multiple acoustic features. The results suggest that, especially with frequency-modulated vocalizations, clustering is more applicable with multidimensional feature extraction compared with one-dimensional feature extraction. The characterization and clustering of vocalizations in multidimensional space offer great potential for future bioacoustic studies. The software CASE includes the developed method of multidimensional feature extraction, as well as all used clustering methods. It allows quickly applying several clustering algorithms to one data set to compare their results and to verify their reliability based on their consistency. Moreover, the software CASE determines the optimal values of most of the necessary parameters automatically. To take advantage of these benefits, the software CASE is provided for free download.
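As a rough illustration of the difference between one-dimensional and multidimensional feature extraction described above (not the CASE implementation itself), each call can be described by a feature contour resampled to a fixed number of points and then clustered with a density-based method such as HDBSCAN; the synthetic calls and parameter values below are placeholders.

```python
import numpy as np
from scipy.signal import spectrogram, resample
from sklearn.cluster import HDBSCAN  # scikit-learn >= 1.3; the standalone hdbscan package is similar

rng = np.random.default_rng(0)
sr = 22050

def toy_call(f_start, f_end, dur):
    """Synthesize a frequency-modulated sweep standing in for a recorded call."""
    t = np.linspace(0, dur, int(sr * dur), endpoint=False)
    freq = np.linspace(f_start, f_end, t.size)
    return np.sin(2 * np.pi * np.cumsum(freq) / sr) + 0.05 * rng.standard_normal(t.size)

calls = [toy_call(2000, 6000, 0.2) for _ in range(20)] + \
        [toy_call(6000, 2000, 0.3) for _ in range(20)]

def peak_frequency_contour(y, n_points=32):
    """Multidimensional feature: the peak-frequency trajectory resampled to n_points."""
    f, t, S = spectrogram(y, fs=sr, nperseg=256, noverlap=192)
    contour = f[np.argmax(S, axis=0)]
    return resample(contour, n_points)

X = np.vstack([peak_frequency_contour(y) for y in calls])
labels = HDBSCAN(min_cluster_size=5).fit_predict(X)
print(labels)  # upward and downward sweeps should fall into separate clusters
```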
57
Comella I, Tasirin JS, Klinck H, Johnson LM, Clink DJ. Investigating note repertoires and acoustic tradeoffs in the duet contributions of a basal haplorrhine primate. Front Ecol Evol 2022. [DOI: 10.3389/fevo.2022.910121]
Abstract
Acoustic communication serves a crucial role in the social interactions of vocal animals. Duetting—the coordinated singing among pairs of animals—has evolved independently multiple times across diverse taxonomic groups including insects, frogs, birds, and mammals. A crucial first step for understanding how information is encoded and transferred in duets is through quantifying the acoustic repertoire, which can reveal differences and similarities on multiple levels of analysis and provides the groundwork necessary for further studies of the vocal communication patterns of the focal species. Investigating acoustic tradeoffs, such as the tradeoff between the rate of syllable repetition and note bandwidth, can also provide important insights into the evolution of duets, as these tradeoffs may represent the physical and mechanical limits on signal design. In addition, identifying which sex initiates the duet can provide insights into the function of the duets. We have three main goals in the current study: (1) provide a descriptive, fine-scale analysis of Gursky’s spectral tarsier (Tarsius spectrumgurskyae) duets; (2) use unsupervised approaches to investigate sex-specific note repertoires; and (3) test for evidence of acoustic tradeoffs in the rate of note repetition and bandwidth of tarsier duet contributions. We found that both sexes were equally likely to initiate the duets and that pairs differed substantially in the duration of their duets. Our unsupervised clustering analyses indicate that both sexes have highly graded note repertoires. We also found evidence for acoustic tradeoffs in both male and female duet contributions, but the relationship in females was much more pronounced. The prevalence of this tradeoff across diverse taxonomic groups including birds, bats, and primates indicates the constraints that limit the production of rapidly repeating broadband notes may be one of the few ‘universals’ in vocal communication. Future carefully designed playback studies that investigate the behavioral response, and therefore potential information transmitted in duets to conspecifics, will be highly informative.
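One common way to test this kind of rate-bandwidth tradeoff, sketched below with simulated values rather than the tarsier measurements, is to fit an upper-bound quantile regression of note bandwidth on note repetition rate; a clearly negative slope near the upper quantile is consistent with a performance limit on signal design.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
rate = rng.uniform(2, 20, 300)                      # notes per second (simulated)
ceiling = 8000 - 250 * rate                         # simulated performance limit (Hz)
bandwidth = rng.uniform(0.2, 1.0, 300) * ceiling    # observed bandwidths below that limit

df = pd.DataFrame({"rate": rate, "bandwidth": bandwidth})
upper = smf.quantreg("bandwidth ~ rate", df).fit(q=0.9)  # 90th-percentile regression
print(upper.params)  # the slope should be clearly negative near the upper bound
```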
58
Thomas M, Jensen FH, Averly B, Demartsev V, Manser MB, Sainburg T, Roch MA, Strandburg-Peshkin A. A practical guide for generating unsupervised, spectrogram-based latent space representations of animal vocalizations. J Anim Ecol 2022; 91:1567-1581. [PMID: 35657634; DOI: 10.1111/1365-2656.13754]
Abstract
The manual detection, analysis and classification of animal vocalizations in acoustic recordings is laborious and requires expert knowledge. Hence, there is a need for objective, generalizable methods that detect underlying patterns in these data, categorize sounds into distinct groups and quantify similarities between them. Among all computational methods that have been proposed to accomplish this, neighbourhood-based dimensionality reduction of spectrograms to produce a latent space representation of calls stands out for its conceptual simplicity and effectiveness. Using a dataset of manually annotated meerkat Suricata suricatta vocalizations, we demonstrate how this method can be used to obtain meaningful latent space representations that reflect the established taxonomy of call types. We analyse strengths and weaknesses of the proposed approach, give recommendations for its usage and show application examples, such as the classification of ambiguous calls and the detection of mislabelled calls. All analyses are accompanied by example code to help researchers realize the potential of this method for the study of animal vocalizations.
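The general recipe described above can be sketched in a few lines (a generic illustration, not the example code that accompanies the paper): compute a spectrogram per call, pad to a common shape, flatten, and project with a neighbourhood-based method such as UMAP. The synthetic calls and parameter choices are placeholders.

```python
import numpy as np
from scipy.signal import spectrogram
import umap  # from the umap-learn package

rng = np.random.default_rng(0)
sr = 8000

def toy_call(freq, dur):
    t = np.linspace(0, dur, int(sr * dur), endpoint=False)
    return np.sin(2 * np.pi * freq * t) + 0.05 * rng.standard_normal(t.size)

calls = [toy_call(rng.choice([800, 1500, 3000]), rng.uniform(0.1, 0.3)) for _ in range(60)]

specs = []
for y in calls:
    _, _, S = spectrogram(y, fs=sr, nperseg=128, noverlap=64)
    specs.append(np.log1p(S))

# Zero-pad every spectrogram to the longest call, then flatten each into one row.
max_frames = max(S.shape[1] for S in specs)
X = np.vstack([np.pad(S, ((0, 0), (0, max_frames - S.shape[1]))).ravel() for S in specs])

embedding = umap.UMAP(n_neighbors=15, min_dist=0.1).fit_transform(X)
print(embedding.shape)  # (60, 2); nearby points correspond to acoustically similar calls
```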
Affiliation(s)
- Mara Thomas
- Department for the Ecology of Animal Societies, Max Planck Institute of Animal Behavior, Constance, Germany
- Department of Biology, University of Konstanz, Constance, Germany
- Frants H Jensen
- Department of Biology, Woods Hole Oceanographic Institution, Woods Hole, MA, USA
- Department of Biology, Syracuse University, Syracuse, NY, USA
- Baptiste Averly
- Department for the Ecology of Animal Societies, Max Planck Institute of Animal Behavior, Constance, Germany
- Department of Biology, University of Konstanz, Constance, Germany
- Vlad Demartsev
- Department for the Ecology of Animal Societies, Max Planck Institute of Animal Behavior, Constance, Germany
- Department of Biology, University of Konstanz, Constance, Germany
- Marta B Manser
- Kalahari Meerkat Project, Kuruman River Reserve, Van Zylsrus, South Africa
- Department of Evolutionary Biology and Environmental Studies, University of Zürich, Zürich, Switzerland
- Tim Sainburg
- Department of Psychology, University of California San Diego, La Jolla, CA, USA
- Marie A Roch
- Department of Computer Science, San Diego State University, San Diego, CA, USA
- Ariana Strandburg-Peshkin
- Department for the Ecology of Animal Societies, Max Planck Institute of Animal Behavior, Constance, Germany
- Department of Biology, University of Konstanz, Constance, Germany
- Kalahari Meerkat Project, Kuruman River Reserve, Van Zylsrus, South Africa
- Centre for the Advanced Study of Collective Behavior, University of Konstanz, Constance, Germany
59
Karigo T. Gaining insights into the internal states of the rodent brain through vocal communications. Neurosci Res 2022; 184:1-8. [PMID: 35908736; DOI: 10.1016/j.neures.2022.07.008]
Abstract
Animals display various behaviors during social interactions. Social behaviors have been proposed to be driven by the internal states of the animals, reflecting their emotional or motivational states. However, the internal states that drive social behaviors are complex and difficult to interpret. Many animals, including mice, use vocalizations for communication in various social contexts. This review provides an overview of current understandings of mouse vocal communications, its underlying neural circuitry, and the potential to use vocal communications as a readout for the animal's internal states during social interactions.
Affiliation(s)
- Tomomi Karigo
- Division of Biology and Biological Engineering 140-18, TianQiao and Chrissy Chen Institute for Neuroscience, California Institute of Technology, Pasadena, CA 91125, USA; Present address: Kennedy Krieger Institute, Baltimore, MD 21205, USA; The Solomon H. Snyder Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
60
Harvill J, Wani Y, Alam M, Ahuja N, Hasegawa-Johnson M, Chestek D, Beiser DG. Estimation of Respiratory Rate from Breathing Audio. Annu Int Conf IEEE Eng Med Biol Soc 2022; 2022:4599-4603. [PMID: 36085895; DOI: 10.1109/embc48229.2022.9871897]
Abstract
The COVID-19 pandemic has fueled exponential growth in the adoption of remote delivery of primary, specialty, and urgent health care services. One major challenge is the lack of access to the physical exam, including accurate and inexpensive measurement of vital signs in remote settings. Here we present a novel method for machine learning-based estimation of patient respiratory rate from audio. Non-learning methods exist, but their accuracy is limited, and the machine learning work known to us is either not directly applicable or relies on non-public datasets. We are aware of only one publicly available dataset, which is small and which we use to evaluate our algorithm. To avoid overfitting, we expand its effective size by proposing a new data augmentation method. Our algorithm uses the spectrogram representation and requires labels for breathing cycles, which are used to train a recurrent neural network to recognize the cycles. Our augmentation method exploits the independence property of the most periodic frequency components of the spectrogram and permutes their order to create multiple signal representations. Our experiments show that our method almost halves the errors obtained by the existing (non-learning) methods. Clinical relevance: We achieve a mean absolute error (MAE) of 1.0 for the respiratory rate while relying only on an audio signal of a patient breathing. This signal can be collected with a smartphone, so that physicians can automatically and reliably determine respiratory rate in a remote setting.
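One plausible reading of the augmentation idea summarized above, sketched here with a toy amplitude-modulated noise signal rather than the paper's data or code, is to rank spectrogram frequency rows by how periodic they are over time and permute the most periodic rows among themselves to create additional training examples.

```python
import numpy as np
from scipy.signal import spectrogram

rng = np.random.default_rng(0)
sr = 4000
t = np.arange(0, 20, 1 / sr)
# Toy "breathing audio": noise amplitude-modulated at ~0.3 Hz (18 breaths per minute).
breathing = np.sin(2 * np.pi * 0.3 * t) * rng.standard_normal(t.size) * 0.5

f, times, S = spectrogram(breathing, fs=sr, nperseg=256, noverlap=128)

def periodicity(row):
    """Crude periodicity score: largest non-zero-lag autocorrelation relative to lag zero."""
    r = np.correlate(row - row.mean(), row - row.mean(), mode="full")[row.size - 1:]
    return 0.0 if r[0] == 0 else np.max(r[1:]) / r[0]

scores = np.array([periodicity(row) for row in S])
most_periodic = np.argsort(scores)[-10:]          # indices of the 10 most periodic rows

augmented = S.copy()
augmented[most_periodic] = S[rng.permutation(most_periodic)]  # shuffle those rows' order
print(S.shape, augmented.shape)
```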
61
Miller CT, Gire D, Hoke K, Huk AC, Kelley D, Leopold DA, Smear MC, Theunissen F, Yartsev M, Niell CM. Natural behavior is the language of the brain. Curr Biol 2022; 32:R482-R493. [PMID: 35609550; PMCID: PMC10082559; DOI: 10.1016/j.cub.2022.03.031]
Abstract
The breadth and complexity of natural behaviors inspires awe. Understanding how our perceptions, actions, and internal thoughts arise from evolved circuits in the brain has motivated neuroscientists for generations. Researchers have traditionally approached this question by focusing on stereotyped behaviors, either natural or trained, in a limited number of model species. This approach has allowed for the isolation and systematic study of specific brain operations, which has greatly advanced our understanding of the circuits involved. At the same time, the emphasis on experimental reductionism has left most aspects of the natural behaviors that have shaped the evolution of the brain largely unexplored. However, emerging technologies and analytical tools make it possible to comprehensively link natural behaviors to neural activity across a broad range of ethological contexts and timescales, heralding new modes of neuroscience focused on natural behaviors. Here we describe a three-part roadmap that aims to leverage the wealth of behaviors in their naturally occurring distributions, linking their variance with that of underlying neural processes to understand how the brain is able to successfully navigate the everyday challenges of animals' social and ecological landscapes. To achieve this aim, experimenters must harness one challenge faced by all neurobiological systems, namely variability, in order to gain new insights into the language of the brain.
Affiliation(s)
- Cory T Miller
- Cortical Systems and Behavior Laboratory, University of California San Diego, 9500 Gilman Drive, La Jolla, CA 92039, USA
- David Gire
- Department of Psychology, University of Washington, Guthrie Hall, Seattle, WA 98105, USA
- Kim Hoke
- Department of Biology, Colorado State University, 1878 Campus Delivery, Fort Collins, CO 80523, USA
- Alexander C Huk
- Center for Perceptual Systems, Departments of Neuroscience and Psychology, University of Texas at Austin, 116 Inner Campus Drive, Austin, TX 78712, USA
- Darcy Kelley
- Department of Biological Sciences, Columbia University, 1212 Amsterdam Avenue, New York, NY 10027, USA
- David A Leopold
- Section of Cognitive Neurophysiology and Imaging, National Institute of Mental Health, 49 Convent Drive, Bethesda, MD 20892, USA
- Matthew C Smear
- Department of Psychology and Institute of Neuroscience, University of Oregon, 1227 University Street, Eugene, OR 97403, USA
- Frederic Theunissen
- Department of Psychology, University of California Berkeley, 2121 Berkeley Way, Berkeley, CA 94720, USA
- Michael Yartsev
- Department of Bioengineering, University of California Berkeley, 306 Stanley Hall, Berkeley, CA 94720, USA
- Cristopher M Niell
- Department of Biology and Institute of Neuroscience, University of Oregon, 222 Huestis Hall, Eugene, OR 97403, USA
62
Grazioli J, Ghiggi G, Billault-Roux AC, Berne A. MASCDB, a database of images, descriptors and microphysical properties of individual snowflakes in free fall. Sci Data 2022; 9:186. [PMID: 35504919; PMCID: PMC9065139; DOI: 10.1038/s41597-022-01269-7]
Abstract
Snowfall information at the scale of individual particles is rare, difficult to gather, but fundamental for a better understanding of solid precipitation microphysics. In this article we present a dataset (with dedicated software) of in-situ measurements of snow particles in free fall. The dataset includes gray-scale (255 shades) images of snowflakes, co-located surface environmental measurements, a large number of geometrical and textural snowflake descriptors as well as the output of previously published retrieval algorithms. These include: hydrometeor classification, riming degree estimation, identification of melting particles, discrimination of wind-blown snow, as well as estimates of snow particle mass and volume. The measurements were collected in various locations of the Alps, Antarctica and Korea for a total of 2'555'091 snowflake images (or 851'697 image triplets). As the instrument used for data collection was a Multi-Angle Snowflake Camera (MASC), the dataset is named MASCDB. Given the large amount of snowflake images and associated descriptors, MASCDB can be exploited also by the computer vision community for the training and benchmarking of image processing systems.
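Because the descriptors are tabular, typical use of such a dataset reduces to ordinary data-frame operations; the sketch below uses a toy table with hypothetical column names ('hydrometeor_class', 'riming_degree', 'max_dimension_mm'), not the actual MASCDB schema or its dedicated software.

```python
import pandas as pd

# Toy stand-in for a per-snowflake descriptor table (columns are hypothetical placeholders).
descriptors = pd.DataFrame({
    "hydrometeor_class": ["aggregate", "graupel", "aggregate", "small_particle", "graupel"],
    "riming_degree":     [0.2, 0.9, 0.4, 0.1, 0.85],
    "max_dimension_mm":  [3.1, 1.8, 4.0, 0.6, 2.2],
})

# Typical use: filter particles by a retrieved property and summarize another.
heavily_rimed = descriptors[descriptors["riming_degree"] > 0.8]
print(heavily_rimed.groupby("hydrometeor_class")["max_dimension_mm"].mean())
```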
Affiliation(s)
- Jacopo Grazioli
- Environmental Remote Sensing Laboratory, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Gionata Ghiggi
- Environmental Remote Sensing Laboratory, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Anne-Claire Billault-Roux
- Environmental Remote Sensing Laboratory, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Alexis Berne
- Environmental Remote Sensing Laboratory, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
63
Valente D, Miaretsoa L, Anania A, Costa F, Mascaro A, Raimondi T, De Gregorio C, Torti V, Friard O, Ratsimbazafy J, Giacoma C, Gamba M. Comparative Analysis of the Vocal Repertoires of the Indri (Indri indri) and the Diademed Sifaka (Propithecus diadema). Int J Primatol 2022. [DOI: 10.1007/s10764-022-00287-x]
Abstract
Strepsirrhine vocalisations are extraordinarily diverse and cross-species comparisons are needed to explore how this variability evolved. We contributed to the investigation of primate acoustic diversity by comparing the vocal repertoire of two sympatric lemur species, Propithecus diadema and Indri indri. These diurnal species belong to the same taxonomic family and have similar activity patterns but different social structures. These features make them excellent candidates for an investigation of the phylogenetic, environmental, and social influence on primate vocal behavior. We recorded 3 P. diadema groups in 2014 and 2016. From 1,872 recordings we selected and assigned 3814 calls to 9 a priori call types, on the basis of their acoustic structure. We implemented a reproducible technique performing an acoustic feature extraction relying on frequency bins, t-SNE data reduction, and a hard-clustering analysis. We first quantified the vocal repertoire of P. diadema, finding consistent results for the 9 putatively identified call types. When comparing this repertoire with a previously published repertoire of I. indri, we found highly species-specific repertoires, with only 2% of the calls misclassified by species identity. The loud calls of the two species were very distinct, while the low-frequency calls were more similar. Our results pinpoint the role of phylogenetic history, social and environmental features on the evolution of communicative systems and contribute to a deeper understanding of the evolutionary roots of primate vocal differentiation. We conclude by arguing that standardized and reproducible techniques, like the one we employed, allow robust comparisons and should be prioritized in the future.
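The extraction-reduction-clustering pipeline named above (frequency-bin features, t-SNE, hard clustering) can be sketched generically as follows; the synthetic calls, bin count, and cluster number are placeholders rather than the authors' settings.

```python
import numpy as np
from scipy.signal import welch
from sklearn.manifold import TSNE
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
sr = 16000

def toy_call(center_freq):
    t = np.linspace(0, 0.5, int(sr * 0.5), endpoint=False)
    return np.sin(2 * np.pi * center_freq * t) + 0.1 * rng.standard_normal(t.size)

calls = [toy_call(rng.choice([500, 1200, 3000])) for _ in range(90)]

def binned_spectrum(y, n_bins=20):
    """Feature extraction: average power within coarse frequency bins."""
    f, p = welch(y, fs=sr, nperseg=1024)
    edges = np.linspace(0, f[-1], n_bins + 1)
    return np.array([p[(f >= lo) & (f < hi)].mean() for lo, hi in zip(edges[:-1], edges[1:])])

X = np.log1p(np.vstack([binned_spectrum(y) for y in calls]))

embedded = TSNE(n_components=2, perplexity=20, random_state=0).fit_transform(X)  # data reduction
hard_labels = KMeans(n_clusters=3, n_init="auto", random_state=0).fit_predict(embedded)  # hard clustering
print(np.bincount(hard_labels))
```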
64
Zhu Y, Smith A, Hauser K. Automated Heart and Lung Auscultation in Robotic Physical Examinations. IEEE Robot Autom Lett 2022. [DOI: 10.1109/lra.2022.3149576]
65
Linhart P, Mahamoud-Issa M, Stowell D, Blumstein DT. The potential for acoustic individual identification in mammals. Mamm Biol 2022. [DOI: 10.1007/s42991-021-00222-2]
66
Adret P. Developmental Plasticity in Primate Coordinated Song: Parallels and Divergences With Duetting Songbirds. Front Ecol Evol 2022. [DOI: 10.3389/fevo.2022.862196]
Abstract
Homeothermic animals (birds and mammals) are prime model systems for investigating the developmental plasticity and neural mechanisms of vocal duetting, a cooperative acoustic signal that prevails in family-living and pair-bonded species including humans. This review focuses on the nature of this trait and its nurturing during ontogeny and extending into adulthood. I begin by outlining the underpinning concepts of duet codes and pair-specific answering rules as used by birds to develop their learned coordinated song, driven by a complex interaction between self-generated and socially mediated auditory feedback. The more tractable avian model of duetting helps identify research gaps in singing primates that also use duetting as a type of intraspecific vocal interaction. Nevertheless, it has become clear that primate coordinated song—whether overlapping or antiphonal—is subject to some degree of vocal flexibility. This is reflected in the ability of lesser apes, titi monkeys, tarsiers, and lemurs to adjust the structure and timing of their calls through (1) social influence, (2) coordinated duetting both before and after mating, (3) the repair of vocal mistakes, (4) the production of heterosexual song early in life, (5) vocal accommodation in call rhythm, (6) conditioning, and (7) innovation. Furthermore, experimental work on the neural underpinnings of avian and mammalian antiphonal duets point to a hierarchical (cortico-subcortical) control mechanism that regulates, via inhibition, the temporal segregation of rapid vocal exchanges. I discuss some weaknesses in this growing field of research and highlight prospective avenues for future investigation.
67
Parsons MJG, Lin TH, Mooney TA, Erbe C, Juanes F, Lammers M, Li S, Linke S, Looby A, Nedelec SL, Van Opzeeland I, Radford C, Rice AN, Sayigh L, Stanley J, Urban E, Di Iorio L. Sounding the Call for a Global Library of Underwater Biological Sounds. Front Ecol Evol 2022. [DOI: 10.3389/fevo.2022.810156]
Abstract
Aquatic environments encompass the world’s most extensive habitats, rich with sounds produced by a diversity of animals. Passive acoustic monitoring (PAM) is an increasingly accessible remote sensing technology that uses hydrophones to listen to the underwater world and represents an unprecedented, non-invasive method to monitor underwater environments. This information can assist in the delineation of biologically important areas via detection of sound-producing species or characterization of ecosystem type and condition, inferred from the acoustic properties of the local soundscape. At a time when worldwide biodiversity is in significant decline and underwater soundscapes are being altered as a result of anthropogenic impacts, there is a need to document, quantify, and understand biotic sound sources–potentially before they disappear. A significant step toward these goals is the development of a web-based, open-access platform that provides: (1) a reference library of known and unknown biological sound sources (by integrating and expanding existing libraries around the world); (2) a data repository portal for annotated and unannotated audio recordings of single sources and of soundscapes; (3) a training platform for artificial intelligence algorithms for signal detection and classification; and (4) a citizen science-based application for public users. Although individually, these resources are often met on regional and taxa-specific scales, many are not sustained and, collectively, an enduring global database with an integrated platform has not been realized. We discuss the benefits such a program can provide, previous calls for global data-sharing and reference libraries, and the challenges that need to be overcome to bring together bio- and ecoacousticians, bioinformaticians, propagation experts, web engineers, and signal processing specialists (e.g., artificial intelligence) with the necessary support and funding to build a sustainable and scalable platform that could address the needs of all contributors and stakeholders into the future.
68
Cohen Y, Nicholson DA, Sanchioni A, Mallaber EK, Skidanova V, Gardner TJ. Automated annotation of birdsong with a neural network that segments spectrograms. eLife 2022; 11:e63853. [PMID: 35050849; PMCID: PMC8860439; DOI: 10.7554/elife.63853]
Abstract
Songbirds provide a powerful model system for studying sensory-motor learning. However, many analyses of birdsong require time-consuming, manual annotation of its elements, called syllables. Automated methods for annotation have been proposed, but these methods assume that audio can be cleanly segmented into syllables, or they require carefully tuning multiple statistical models. Here we present TweetyNet: a single neural network model that learns how to segment spectrograms of birdsong into annotated syllables. We show that TweetyNet mitigates limitations of methods that rely on segmented audio. We also show that TweetyNet performs well across multiple individuals from two species of songbirds, Bengalese finches and canaries. Lastly, we demonstrate that using TweetyNet we can accurately annotate very large datasets containing multiple days of song, and that these predicted annotations replicate key findings from behavioral studies. In addition, we provide open-source software to assist other researchers, and a large dataset of annotated canary song that can serve as a benchmark. We conclude that TweetyNet makes it possible to address a wide range of new questions about birdsong.
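TweetyNet's output is a label for every spectrogram time bin, so annotation reduces to collapsing a framewise label sequence into labelled segments. That conversion step can be sketched generically (this is not the TweetyNet/vak code itself; the frame labels below are hypothetical network output):

```python
import numpy as np

def frames_to_segments(frame_labels, frame_dur, background=0):
    """Collapse a framewise label vector into (onset_s, offset_s, label) segments."""
    segments, start, current = [], None, None
    for i, lab in enumerate(np.append(frame_labels, background)):  # sentinel closes the last run
        if start is None:
            if lab != background:
                start, current = i, lab
        elif lab != current:
            segments.append((round(start * frame_dur, 6), round(i * frame_dur, 6), int(current)))
            start, current = (i, lab) if lab != background else (None, None)
    return segments

# Hypothetical network output: 0 = background, 1 and 2 = two syllable classes.
frame_labels = np.array([0, 0, 1, 1, 1, 0, 0, 2, 2, 2, 2, 0])
print(frames_to_segments(frame_labels, frame_dur=0.01))
# [(0.02, 0.05, 1), (0.07, 0.11, 2)]
```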
Affiliation(s)
- Yarden Cohen
- Department of Brain Sciences, Weizmann Institute of Science, Rehovot, Israel
- Alexa Sanchioni
- Department of Biology, Boston University, Boston, United States
- Timothy J Gardner
- Phil and Penny Knight Campus for Accelerating Scientific Impact, University of Oregon, Eugene, United States
69
Vocal Learning and Behaviors in Birds and Human Bilinguals: Parallels, Divergences and Directions for Research. Languages 2021. [DOI: 10.3390/languages7010005]
Abstract
Comparisons between the communication systems of humans and animals are instrumental in contextualizing speech and language into an evolutionary and biological framework and for illuminating mechanisms of human communication. As a complement to previous work that compares developmental vocal learning and use among humans and songbirds, in this article we highlight phenomena associated with vocal learning subsequent to the development of primary vocalizations (i.e., the primary language (L1) in humans and the primary song (S1) in songbirds). By framing avian “second-song” (S2) learning and use within the human second-language (L2) context, we lay the groundwork for a scientifically-rich dialogue between disciplines. We begin by summarizing basic birdsong research, focusing on how songs are learned and on constraints on learning. We then consider commonalities in vocal learning across humans and birds, in particular the timing and neural mechanisms of learning, variability of input, and variability of outcomes. For S2 and L2 learning outcomes, we address the respective roles of age, entrenchment, and social interactions. We proceed to orient current and future birdsong inquiry around foundational features of human bilingualism: L1 effects on the L2, L1 attrition, and L1<–>L2 switching. Throughout, we highlight characteristics that are shared across species as well as the need for caution in interpreting birdsong research. Thus, from multiple instructive perspectives, our interdisciplinary dialogue sheds light on biological and experiential principles of L2 acquisition that are informed by birdsong research, and leverages well-studied characteristics of bilingualism in order to clarify, contextualize, and further explore S2 learning and use in songbirds.
70
Measuring context dependency in birdsong using artificial neural networks. PLoS Comput Biol 2021; 17:e1009707. [PMID: 34962915; PMCID: PMC8746767; DOI: 10.1371/journal.pcbi.1009707]
Abstract
Context dependency is a key feature of the sequential structure of human language, which requires reference between words far apart in the produced sequence. Assessing how far back the past context affects the current state provides crucial information for understanding the mechanisms behind complex sequential behaviors. Birdsongs serve as a representative model for studying context dependency in sequential signals produced by non-human animals, although previous estimates were upper-bounded by methodological limitations. Here, we estimated the context dependency in birdsongs in a more scalable way, using a modern neural-network-based language model whose accessible context length is sufficiently long. The detected context dependency extended beyond the order of traditional Markovian models of birdsong, but was consistent with previous experimental investigations. We also studied the relation between the assumed or auto-detected vocabulary size of birdsong (i.e., fine- vs. coarse-grained syllable classifications) and the context dependency. The larger the vocabulary (or the more fine-grained the classification) assumed, the shorter the context dependency detected.
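The underlying measurement can be illustrated with a far simpler predictor than the neural language model used in the study: estimate how much per-syllable cross-entropy improves as the conditioning context grows, and look for the length at which the gain plateaus. The toy song, smoothing constant, and context lengths below are placeholders.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)
# Toy song: the fourth syllable of each phrase depends on the syllable three positions back.
seq = []
for _ in range(2000):
    x = rng.choice(["A", "B"])
    seq += [x, "c", "c", "D" if x == "A" else "E"]

def cross_entropy(seq, k, alpha=0.1):
    """Per-symbol cross-entropy (bits) of an add-alpha-smoothed k-gram predictor."""
    vocab = sorted(set(seq))
    ctx_counts, pair_counts = Counter(), Counter()
    for i in range(k, len(seq)):
        ctx = tuple(seq[i - k:i])
        ctx_counts[ctx] += 1
        pair_counts[(ctx, seq[i])] += 1
    nll = 0.0
    for i in range(k, len(seq)):
        ctx = tuple(seq[i - k:i])
        p = (pair_counts[(ctx, seq[i])] + alpha) / (ctx_counts[ctx] + alpha * len(vocab))
        nll -= np.log2(p)
    return nll / (len(seq) - k)

for k in range(1, 6):
    print(k, round(cross_entropy(seq, k), 3))
# The drop in cross-entropy should level off once k reaches the true dependency length (3 here).
```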
71
Sainburg T, Gentner TQ. Toward a Computational Neuroethology of Vocal Communication: From Bioacoustics to Neurophysiology, Emerging Tools and Future Directions. Front Behav Neurosci 2021; 15:811737. [PMID: 34987365; PMCID: PMC8721140; DOI: 10.3389/fnbeh.2021.811737]
Abstract
Recently developed methods in computational neuroethology have enabled increasingly detailed and comprehensive quantification of animal movements and behavioral kinematics. Vocal communication behavior is well poised for application of similar large-scale quantification methods in the service of physiological and ethological studies. This review describes emerging techniques that can be applied to acoustic and vocal communication signals with the goal of enabling study beyond a small number of model species. We review a range of modern computational methods for bioacoustics, signal processing, and brain-behavior mapping. Along with a discussion of recent advances and techniques, we include challenges and broader goals in establishing a framework for the computational neuroethology of vocal communication.
Affiliation(s)
- Tim Sainburg
- Department of Psychology, University of California, San Diego, La Jolla, CA, United States
- Center for Academic Research & Training in Anthropogeny, University of California, San Diego, La Jolla, CA, United States
- Timothy Q. Gentner
- Department of Psychology, University of California, San Diego, La Jolla, CA, United States
- Neurosciences Graduate Program, University of California, San Diego, La Jolla, CA, United States
- Neurobiology Section, Division of Biological Sciences, University of California, San Diego, La Jolla, CA, United States
- Kavli Institute for Brain and Mind, University of California, San Diego, La Jolla, CA, United States
72
Neethirajan S. Is Seeing Still Believing? Leveraging Deepfake Technology for Livestock Farming. Front Vet Sci 2021; 8:740253. [PMID: 34888374; PMCID: PMC8649769; DOI: 10.3389/fvets.2021.740253]
Abstract
Deepfake technologies are known for the creation of forged celebrity pornography, face and voice swaps, and other fake media content. Despite the negative connotations the technology bears, the underlying machine learning algorithms have huge potential that could be applied not just to digital media but also to medicine, biology, affective science, and agriculture, to name a few. Owing to the ability to generate large datasets based on real data distributions, deepfake methods could also be used to positively impact non-human animals such as livestock. Data generated using Generative Adversarial Networks, one of the algorithm families deepfakes are based on, could be used to train models to accurately identify and monitor animal health and emotions. Through data augmentation, digital twins, and perhaps even the display of digital conspecifics (digital avatars or a metaverse) that enhance social interactions, deepfake technologies have the potential to improve animal health, emotionality, sociality, animal-human and animal-computer interactions, and thereby the productivity and sustainability of the farming industry. The interactive 3D avatars and digital twins of farm animals enabled by deepfake technology offer a timely way, within this digital transformation, to explore the subtle nuances of animal behavior and cognition and to enhance farm animal welfare. Without offering conclusive remarks, this mini review is exploratory in nature, reflecting the nascent stage of deepfake technology.
Affiliation(s)
- Suresh Neethirajan
- Farmworx, Adaptation Physiology Group, Animal Sciences Department, Wageningen University and Research, Wageningen, Netherlands
73
Singh Alvarado J, Goffinet J, Michael V, Liberti W, Hatfield J, Gardner T, Pearson J, Mooney R. Neural dynamics underlying birdsong practice and performance. Nature 2021; 599:635-639. [PMID: 34671166; PMCID: PMC9118926; DOI: 10.1038/s41586-021-04004-1]
Abstract
Musical and athletic skills are learned and maintained through intensive practice to enable precise and reliable performance for an audience. Consequently, understanding such complex behaviours requires insight into how the brain functions during both practice and performance. Male zebra finches learn to produce courtship songs that are more varied when alone and more stereotyped in the presence of females [1]. These differences are thought to reflect song practice and performance, respectively [2,3], providing a useful system in which to explore how neurons encode and regulate motor variability in these two states. Here we show that calcium signals in ensembles of spiny neurons (SNs) in the basal ganglia are highly variable relative to their cortical afferents during song practice. By contrast, SN calcium signals are strongly suppressed during female-directed performance, and optogenetically suppressing SNs during practice strongly reduces vocal variability. Unsupervised learning methods [4,5] show that specific SN activity patterns map onto distinct song practice variants. Finally, we establish that noradrenergic signalling reduces vocal variability by directly suppressing SN activity. Thus, SN ensembles encode and drive vocal exploration during practice, and the noradrenergic suppression of SN activity promotes stereotyped and precise song performance for an audience.
Affiliation(s)
- Jack Goffinet
- Department of Computer Science, Duke University, Durham, NC, USA
- Valerie Michael
- Department of Neurobiology, Duke University, Durham, NC, USA
- William Liberti
- Department of Electrical Engineering and Computer Science, University of California Berkeley, Berkeley, CA, USA
- Jordan Hatfield
- Department of Neurobiology, Duke University, Durham, NC, USA
- Timothy Gardner
- Phil and Penny Knight Campus for Accelerating Scientific Impact, University of Oregon, Eugene, OR, USA
- John Pearson
- Department of Neurobiology, Duke University, Durham, NC, USA
- Department of Biostatistics & Bioinformatics, Duke University, Durham, NC, USA
- Department of Electrical and Computer Engineering, Duke University, Durham, NC, USA
- Richard Mooney
- Department of Neurobiology, Duke University, Durham, NC, USA
74
Steinfath E, Palacios-Muñoz A, Rottschäfer JR, Yuezak D, Clemens J. Fast and accurate annotation of acoustic signals with deep neural networks. eLife 2021; 10:e68837. [PMID: 34723794; PMCID: PMC8560090; DOI: 10.7554/elife.68837]
Abstract
Acoustic signals serve communication within and across species throughout the animal kingdom. Studying the genetics, evolution, and neurobiology of acoustic communication requires annotating acoustic signals: segmenting and identifying individual acoustic elements like syllables or sound pulses. To be useful, annotations need to be accurate, robust to noise, and fast. Here we introduce DeepAudioSegmenter (DAS), a method that annotates acoustic signals across species based on a deep-learning-derived hierarchical representation of sound. We demonstrate the accuracy, robustness, and speed of DAS using acoustic signals with diverse characteristics from insects, birds, and mammals. DAS comes with a graphical user interface for annotating song, training the network, and for generating and proofreading annotations. The method can be trained to annotate signals from new species with little manual annotation and can be combined with unsupervised methods to discover novel signal types. DAS annotates song with high throughput and low latency for experimental interventions in real time. Overall, DAS is a universal, versatile, and accessible tool for annotating acoustic communication signals.
Affiliation(s)
- Elsa Steinfath
- European Neuroscience Institute - A Joint Initiative of the University Medical Center Göttingen and the Max-Planck-Society, Göttingen, Germany
- International Max Planck Research School and Göttingen Graduate School for Neurosciences, Biophysics, and Molecular Biosciences (GGNB) at the University of Göttingen, Göttingen, Germany
- Adrian Palacios-Muñoz
- European Neuroscience Institute - A Joint Initiative of the University Medical Center Göttingen and the Max-Planck-Society, Göttingen, Germany
- International Max Planck Research School and Göttingen Graduate School for Neurosciences, Biophysics, and Molecular Biosciences (GGNB) at the University of Göttingen, Göttingen, Germany
- Julian R Rottschäfer
- European Neuroscience Institute - A Joint Initiative of the University Medical Center Göttingen and the Max-Planck-Society, Göttingen, Germany
- International Max Planck Research School and Göttingen Graduate School for Neurosciences, Biophysics, and Molecular Biosciences (GGNB) at the University of Göttingen, Göttingen, Germany
- Deniz Yuezak
- European Neuroscience Institute - A Joint Initiative of the University Medical Center Göttingen and the Max-Planck-Society, Göttingen, Germany
- International Max Planck Research School and Göttingen Graduate School for Neurosciences, Biophysics, and Molecular Biosciences (GGNB) at the University of Göttingen, Göttingen, Germany
- Jan Clemens
- European Neuroscience Institute - A Joint Initiative of the University Medical Center Göttingen and the Max-Planck-Society, Göttingen, Germany
- Bernstein Center for Computational Neuroscience, Göttingen, Germany
75
Lowe MX, Mohsenzadeh Y, Lahner B, Charest I, Oliva A, Teng S. Cochlea to categories: The spatiotemporal dynamics of semantic auditory representations. Cogn Neuropsychol 2021; 38:468-489. [PMID: 35729704; PMCID: PMC10589059; DOI: 10.1080/02643294.2022.2085085]
Abstract
How does the auditory system categorize natural sounds? Here we apply multimodal neuroimaging to illustrate the progression from acoustic to semantically dominated representations. Combining magnetoencephalographic (MEG) and functional magnetic resonance imaging (fMRI) scans of observers listening to naturalistic sounds, we found superior temporal responses beginning ∼55 ms post-stimulus onset, spreading to extratemporal cortices by ∼100 ms. Early regions were distinguished less by onset/peak latency than by functional properties and overall temporal response profiles. Early acoustically-dominated representations trended systematically toward category dominance over time (after ∼200 ms) and space (beyond primary cortex). Semantic category representation was spatially specific: Vocalizations were preferentially distinguished in frontotemporal voice-selective regions and the fusiform; scenes and objects were distinguished in parahippocampal and medial place areas. Our results are consistent with real-world events coded via an extended auditory processing hierarchy, in which acoustic representations rapidly enter multiple streams specialized by category, including areas typically considered visual cortex.
Affiliation(s)
- Matthew X. Lowe
- Computer Science and Artificial Intelligence Lab (CSAIL), MIT, Cambridge, MA
- Unlimited Sciences, Colorado Springs, CO
- Yalda Mohsenzadeh
- Computer Science and Artificial Intelligence Lab (CSAIL), MIT, Cambridge, MA
- The Brain and Mind Institute, The University of Western Ontario, London, ON, Canada
- Department of Computer Science, The University of Western Ontario, London, ON, Canada
- Benjamin Lahner
- Computer Science and Artificial Intelligence Lab (CSAIL), MIT, Cambridge, MA
- Ian Charest
- Département de Psychologie, Université de Montréal, Montréal, Québec, Canada
- Center for Human Brain Health, University of Birmingham, UK
- Aude Oliva
- Computer Science and Artificial Intelligence Lab (CSAIL), MIT, Cambridge, MA
- Santani Teng
- Computer Science and Artificial Intelligence Lab (CSAIL), MIT, Cambridge, MA
- Smith-Kettlewell Eye Research Institute (SKERI), San Francisco, CA
76
Goffinet J, Brudner S, Mooney R, Pearson J. Low-dimensional learned feature spaces quantify individual and group differences in vocal repertoires. eLife 2021; 10:e67855. [PMID: 33988503; PMCID: PMC8213406; DOI: 10.7554/elife.67855]
Abstract
Increases in the scale and complexity of behavioral data pose an increasing challenge for data analysis. A common strategy involves replacing entire behaviors with small numbers of handpicked, domain-specific features, but this approach suffers from several crucial limitations. For example, handpicked features may miss important dimensions of variability, and correlations among them complicate statistical testing. Here, by contrast, we apply the variational autoencoder (VAE), an unsupervised learning method, to learn features directly from data and quantify the vocal behavior of two model species: the laboratory mouse and the zebra finch. The VAE converges on a parsimonious representation that outperforms handpicked features on a variety of common analysis tasks, enables the measurement of moment-by-moment vocal variability on the timescale of tens of milliseconds in the zebra finch, provides strong evidence that mouse ultrasonic vocalizations do not cluster as is commonly believed, and captures the similarity of tutor and pupil birdsong with qualitatively higher fidelity than previous approaches. In all, we demonstrate the utility of modern unsupervised learning approaches to the quantification of complex and high-dimensional vocal behavior.
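A minimal variational-autoencoder sketch (fully connected layers, random stand-in "spectrograms") makes the approach concrete; it is a toy in PyTorch, not the authors' implementation or architecture.

```python
import torch
from torch import nn

class VAE(nn.Module):
    def __init__(self, n_pixels=32 * 32, n_latent=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_pixels, 256), nn.ReLU())
        self.to_mu = nn.Linear(256, n_latent)
        self.to_logvar = nn.Linear(256, n_latent)
        self.decoder = nn.Sequential(nn.Linear(n_latent, 256), nn.ReLU(),
                                     nn.Linear(256, n_pixels))

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization trick
        return self.decoder(z), mu, logvar

def vae_loss(recon, x, mu, logvar):
    recon_err = nn.functional.mse_loss(recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())  # KL to a unit Gaussian prior
    return recon_err + kl

# Stand-in data: 256 flattened 32x32 syllable spectrograms.
x = torch.rand(256, 32 * 32)
model = VAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(50):  # toy training loop
    recon, mu, logvar = model(x)
    loss = vae_loss(recon, x, mu, logvar)
    opt.zero_grad()
    loss.backward()
    opt.step()

latent = model.to_mu(model.encoder(x)).detach()  # learned per-syllable features
print(latent.shape)  # torch.Size([256, 16])
```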
Affiliation(s)
- Jack Goffinet
- Department of Computer Science, Duke University, Durham, United States
- Center for Cognitive Neurobiology, Duke University, Durham, United States
- Department of Neurobiology, Duke University, Durham, United States
- Samuel Brudner
- Department of Neurobiology, Duke University, Durham, United States
- Richard Mooney
- Department of Neurobiology, Duke University, Durham, United States
- John Pearson
- Center for Cognitive Neurobiology, Duke University, Durham, United States
- Department of Neurobiology, Duke University, Durham, United States
- Department of Biostatistics & Bioinformatics, Duke University, Durham, United States
- Department of Electrical and Computer Engineering, Duke University, Durham, United States
77
Jung DH, Kim NY, Moon SH, Jhin C, Kim HJ, Yang JS, Kim HS, Lee TS, Lee JY, Park SH. Deep Learning-Based Cattle Vocal Classification Model and Real-Time Livestock Monitoring System with Noise Filtering. Animals (Basel) 2021; 11:357. [PMID: 33535390; PMCID: PMC7911430; DOI: 10.3390/ani11020357]
Abstract
The priority placed on animal welfare in the meat industry is increasing the importance of understanding livestock behavior. In this study, we developed a web-based monitoring and recording system based on artificial intelligence analysis for the classification of cattle sounds. The deep learning classification model of the system is a convolutional neural network (CNN) model that takes voice information converted to Mel-frequency cepstral coefficients (MFCCs) as input. The CNN model first achieved an accuracy of 91.38% in recognizing cattle sounds. Further, short-time Fourier transform-based noise filtering was applied to remove background noise, improving the classification model accuracy to 94.18%. Categorized cattle voices were then classified into four classes, and a total of 897 classification records were acquired for the classification model development. A final accuracy of 81.96% was obtained for the model. Our proposed web-based platform that provides information obtained from a total of 12 sound sensors provides cattle vocalization monitoring in real time, enabling farm owners to determine the status of their cattle.
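The MFCC-plus-CNN recipe can be sketched schematically as below; the architecture, input sizes, and the four class labels are placeholders, not the published model.

```python
import numpy as np
import librosa
import torch
from torch import nn

def mfcc_image(y, sr=16000, n_mfcc=20, n_frames=64):
    """Convert an audio clip to a fixed-size MFCC 'image' for the CNN."""
    m = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    m = librosa.util.fix_length(m, size=n_frames, axis=1)   # pad/trim the time axis
    return torch.tensor(m, dtype=torch.float32).unsqueeze(0)  # shape (1, n_mfcc, n_frames)

class CattleCallCNN(nn.Module):
    def __init__(self, n_classes=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(32 * 5 * 16, n_classes),
        )

    def forward(self, x):
        return self.net(x)

# Toy batch: 8 one-second clips of noise standing in for recorded vocalizations.
clips = [np.random.randn(16000).astype(np.float32) for _ in range(8)]
batch = torch.stack([mfcc_image(y) for y in clips])      # (8, 1, 20, 64)
logits = CattleCallCNN()(batch)
print(logits.shape)                                       # torch.Size([8, 4])
```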
Affiliation(s)
- Dae-Hyun Jung
- Smart Farm Research Center, Korea Institute of Science and Technology (KIST), Gangneung 25451, Korea
- Na Yeon Kim
- Department of Bio-Convergence Science, College of Biomedical and Health Science, Konkuk University, Chungju 27478, Korea
- Asia Pacific Ruminant Institute, Icheon 17385, Korea
- Sang Ho Moon
- Department of Bio-Convergence Science, College of Biomedical and Health Science, Konkuk University, Chungju 27478, Korea
- Changho Jhin
- Smart Farm Research Center, Korea Institute of Science and Technology (KIST), Gangneung 25451, Korea
- Department of Smartfarm Research, 1778 Living Tech, Sejong 30033, Korea
- Hak-Jin Kim
- Department of Biosystems and Biomaterial Engineering, College of Agriculture and Life Sciences, Seoul National University, Seoul 08826, Korea
- Jung-Seok Yang
- Smart Farm Research Center, Korea Institute of Science and Technology (KIST), Gangneung 25451, Korea
- Hyoung Seok Kim
- Smart Farm Research Center, Korea Institute of Science and Technology (KIST), Gangneung 25451, Korea
- Taek Sung Lee
- Smart Farm Research Center, Korea Institute of Science and Technology (KIST), Gangneung 25451, Korea
- Ju Young Lee
- Smart Farm Research Center, Korea Institute of Science and Technology (KIST), Gangneung 25451, Korea
- Soo Hyun Park
- Smart Farm Research Center, Korea Institute of Science and Technology (KIST), Gangneung 25451, Korea
- Correspondence: Tel.: +82-33-650-3661
78
Sainburg T, Thielk M, Gentner TQ. Finding, visualizing, and quantifying latent structure across diverse animal vocal repertoires. PLoS Comput Biol 2020; 16:e1008228. [PMID: 33057332; PMCID: PMC7591061; DOI: 10.1371/journal.pcbi.1008228]
Abstract
Animals produce vocalizations that range in complexity from a single repeated call to hundreds of unique vocal elements patterned in sequences unfolding over hours. Characterizing complex vocalizations can require considerable effort and a deep intuition about each species' vocal behavior. Even with a great deal of experience, human characterizations of animal communication can be affected by human perceptual biases. We present a set of computational methods for projecting animal vocalizations into low dimensional latent representational spaces that are directly learned from the spectrograms of vocal signals. We apply these methods to diverse datasets from over 20 species, including humans, bats, songbirds, mice, cetaceans, and nonhuman primates. Latent projections uncover complex features of data in visually intuitive and quantifiable ways, enabling high-powered comparative analyses of vocal acoustics. We introduce methods for analyzing vocalizations as both discrete sequences and as continuous latent variables. Each method can be used to disentangle complex spectro-temporal structure and observe long-timescale organization in communication.
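One way to connect the two views mentioned above, continuous latent features and discrete element sequences, is to discretize latent points into element categories and then compare how individuals use them; everything in the sketch below (the synthetic embeddings, the cluster count, the two "birds") is simulated for illustration and is not the authors' pipeline.

```python
import numpy as np
from sklearn.cluster import KMeans
from scipy.spatial.distance import jensenshannon

rng = np.random.default_rng(0)
# Synthetic 2-D latent projections for two individuals with partially overlapping repertoires.
bird1 = np.vstack([rng.normal([0, 0], 0.3, (100, 2)), rng.normal([4, 0], 0.3, (60, 2))])
bird2 = np.vstack([rng.normal([0, 0], 0.3, (40, 2)),  rng.normal([4, 4], 0.3, (120, 2))])

latent = np.vstack([bird1, bird2])
element = KMeans(n_clusters=3, n_init="auto", random_state=0).fit_predict(latent)  # discrete labels

usage1 = np.bincount(element[:len(bird1)], minlength=3) / len(bird1)
usage2 = np.bincount(element[len(bird1):], minlength=3) / len(bird2)
print(usage1, usage2)
print("repertoire divergence:", round(jensenshannon(usage1, usage2), 3))
```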
Affiliation(s)
- Tim Sainburg
- Department of Psychology, University of California, San Diego, La Jolla, CA, USA
- Center for Academic Research & Training in Anthropogeny, University of California, San Diego, La Jolla, CA, USA
- Marvin Thielk
- Neurosciences Graduate Program, University of California, San Diego, La Jolla, CA, USA
- Timothy Q. Gentner
- Department of Psychology, University of California, San Diego, La Jolla, CA, USA
- Neurosciences Graduate Program, University of California, San Diego, La Jolla, CA, USA
- Neurobiology Section, Division of Biological Sciences, University of California, San Diego, La Jolla, CA, USA
- Kavli Institute for Brain and Mind, University of California, San Diego, La Jolla, CA, USA