1. Knight E, Rhinehart T, de Zwaan DR, Weldy MJ, Cartwright M, Hawley SH, Larkin JL, Lesmeister D, Bayne E, Kitzes J. Individual identification in acoustic recordings. Trends Ecol Evol 2024:S0169-5347(24)00118-6. PMID: 38862357. DOI: 10.1016/j.tree.2024.05.007
Abstract
Recent advances in bioacoustics combined with acoustic individual identification (AIID) could open frontiers for ecological and evolutionary research because traditional methods of identifying individuals are invasive, expensive, labor-intensive, and potentially biased. Despite overwhelming evidence that most taxa have individual acoustic signatures, the application of AIID remains challenging and uncommon. Furthermore, the methods most commonly used for AIID are not compatible with many potential AIID applications. Deep learning in adjacent disciplines suggests opportunities to advance AIID, but such progress is limited by training data. We suggest that broadscale implementation of AIID is achievable, but researchers should prioritize methods that maximize the potential applications of AIID, and develop case studies with easy taxa at smaller spatiotemporal scales before progressing to more difficult scenarios.
Affiliation(s)
- Elly Knight
- Department of Biological Sciences, Alberta Biodiversity Monitoring Institute, University of Alberta, Edmonton, Alberta, T6G 2E6, Canada
- Tessa Rhinehart
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA, 15260, USA
- Devin R de Zwaan
- Department of Biology, Mount Allison University, Sackville, NB, E4L 1E4, Canada; Acadia University, Wolfville, NS, B4P 2R6, Canada
- Matthew J Weldy
- Department of Forest Ecosystems and Society, Oregon State University, Corvallis, OR, 97331-5704, USA
- Mark Cartwright
- Department of Informatics, New Jersey Institute of Technology, Newark, NJ, 07102, USA
- Scott H Hawley
- Chemistry and Physics Department, Belmont University, Nashville, TN, 37212, USA
- Jeffery L Larkin
- Department of Biology, Indiana University of Pennsylvania, Indiana, PA, 15705-1081, USA; American Bird Conservancy, The Plains, VA, 20198, USA
- Damon Lesmeister
- USDA Forest Service, Pacific Northwest Research Station, Corvallis Forestry Science Laboratory, Oregon State University, Corvallis, OR, 97330, USA
- Erin Bayne
- Department of Biological Sciences, Alberta Biodiversity Monitoring Institute, University of Alberta, Edmonton, Alberta, T6G 2E6, Canada
- Justin Kitzes
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA, 15260, USA
2. Erb WM, Ross W, Kazanecki H, Mitra Setia T, Madhusudhana S, Clink DJ. Vocal complexity in the long calls of Bornean orangutans. PeerJ 2024; 12:e17320. PMID: 38766489. PMCID: PMC11100477. DOI: 10.7717/peerj.17320
Abstract
Vocal complexity is central to many evolutionary hypotheses about animal communication. Yet, quantifying and comparing complexity remains a challenge, particularly when vocal types are highly graded. Male Bornean orangutans (Pongo pygmaeus wurmbii) produce complex and variable "long call" vocalizations comprising multiple sound types that vary within and among individuals. Previous studies described six distinct call (or pulse) types within these complex vocalizations, but none quantified their discreteness or the ability of human observers to reliably classify them. We studied the long calls of 13 individuals to: (1) evaluate and quantify the reliability of audio-visual classification by three well-trained observers, (2) distinguish among call types using supervised classification and unsupervised clustering, and (3) compare the performance of different feature sets. Using 46 acoustic features, we applied machine learning (i.e., support vector machines, affinity propagation, and fuzzy c-means) to identify call types and assess their discreteness. We additionally used Uniform Manifold Approximation and Projection (UMAP) to visualize the separation of pulses using both extracted features and spectrogram representations. Audio-visual classification showed low inter-observer reliability, and supervised approaches showed poor classification accuracy, indicating that the previously described pulse types were not discrete. We propose an updated pulse classification approach that is highly reproducible across observers and exhibits strong classification accuracy using support vector machines. Although the low number of call types suggests long calls are fairly simple, the continuous gradation of sounds seems to greatly boost the complexity of this system. This work responds to calls for more quantitative research to define call types and quantify gradedness in animal vocal systems and highlights the need for a more comprehensive framework for studying vocal complexity vis-à-vis graded repertoires.
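As an illustration of the kind of unsupervised workflow described in this abstract, the sketch below projects per-pulse acoustic feature vectors with UMAP and clusters them with affinity propagation, then compares the clusters with observer labels. It is a minimal sketch, not the authors' code; the feature matrix, observer labels, and all parameter values are hypothetical placeholders.

```python
# Illustrative sketch only: UMAP projection + affinity propagation clustering of
# per-pulse acoustic feature vectors. `features` (n_calls x 46) and
# `observer_labels` are hypothetical stand-ins, not the study's data.
import numpy as np
import umap  # provided by the umap-learn package
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import AffinityPropagation
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)
features = rng.normal(size=(300, 46))           # placeholder acoustic features per pulse
observer_labels = rng.integers(0, 6, size=300)  # placeholder observer-assigned pulse types

X = StandardScaler().fit_transform(features)

# 2-D embedding for visualizing how graded (or discrete) the pulse types are
embedding = umap.UMAP(n_neighbors=15, min_dist=0.1, random_state=0).fit_transform(X)

# Unsupervised clustering; the algorithm chooses the number of clusters itself
clusters = AffinityPropagation(damping=0.9, random_state=0).fit_predict(X)

# Low agreement with observer labels would suggest graded rather than discrete types
print("clusters found:", len(set(clusters)))
print("ARI vs. observer labels:", adjusted_rand_score(observer_labels, clusters))
```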
Affiliation(s)
- Wendy M. Erb
- K. Lisa Yang Center for Conservation Bioacoustics, Cornell Lab of Ornithology, Cornell University, Ithaca, NY, United States of America
- Department of Anthropology, Rutgers, The State University of New Jersey, New Brunswick, United States of America
- Whitney Ross
- K. Lisa Yang Center for Conservation Bioacoustics, Cornell Lab of Ornithology, Cornell University, Ithaca, NY, United States of America
- Haley Kazanecki
- K. Lisa Yang Center for Conservation Bioacoustics, Cornell Lab of Ornithology, Cornell University, Ithaca, NY, United States of America
- Tatang Mitra Setia
- Primate Research Center, Universitas Nasional Jakarta, Jakarta, Indonesia
- Department of Biology, Faculty of Biology and Agriculture, Universitas Nasional Jakarta, Jakarta, Indonesia
- Shyam Madhusudhana
- K. Lisa Yang Center for Conservation Bioacoustics, Cornell Lab of Ornithology, Cornell University, Ithaca, NY, United States of America
- Centre for Marine Science and Technology, Curtin University, Perth, Australia
- Dena J. Clink
- K. Lisa Yang Center for Conservation Bioacoustics, Cornell Lab of Ornithology, Cornell University, Ithaca, NY, United States of America
3. Batist CH, Dufourq E, Jeantet L, Razafindraibe MN, Randriamanantena F, Baden AL. An integrated passive acoustic monitoring and deep learning pipeline for black-and-white ruffed lemurs (Varecia variegata) in Ranomafana National Park, Madagascar. Am J Primatol 2024; 86:e23599. PMID: 38244194. DOI: 10.1002/ajp.23599
Abstract
The urgent need for effective wildlife monitoring solutions in the face of global biodiversity loss has resulted in the emergence of conservation technologies such as passive acoustic monitoring (PAM). While PAM has been extensively used for marine mammals, birds, and bats, its application to primates is limited. Black-and-white ruffed lemurs (Varecia variegata) are a promising species for testing PAM because of their distinctive, loud roar-shrieks. Furthermore, these lemurs are challenging to monitor via traditional methods due to their fragmented and often unpredictable distribution in Madagascar's dense eastern rainforests. Our goals in this study were to develop a machine learning pipeline for automated call detection from PAM data, compare the effectiveness of PAM versus in-person observations, and investigate diel patterns in lemur vocal behavior. We conducted this study at Mangevo, Ranomafana National Park, concurrently carrying out focal follows and deploying autonomous recorders in May-July 2019. We used transfer learning to build a convolutional neural network (optimized for recall) that automated the detection of lemur calls (57-h runtime; recall = 0.94, F1 = 0.70). We found that PAM outperformed in-person observations, saving time, money, and labor while also providing re-analyzable data. Using PAM yielded novel insights into V. variegata diel vocal patterns; we present the first published evidence of nocturnal calling. We developed a graphical user interface and open-sourced the data and code to serve as a resource for primatologists interested in implementing PAM and machine learning. By leveraging the potential of this pipeline, we can address the urgent need for effective primate population surveys to inform conservation strategies.
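A rough sketch of the transfer-learning step described above: a pretrained image CNN re-purposed as a binary roar-shriek detector on spectrogram images. This is not the published pipeline; the dataset layout, backbone choice (ResNet-18), and hyperparameters are assumptions for illustration only.

```python
# Sketch of transfer learning for binary call detection from spectrogram images.
# Not the authors' pipeline; the folder layout "spectrograms/train/{call,no_call}/"
# and all hyperparameters are hypothetical.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
train_data = datasets.ImageFolder("spectrograms/train", transform=transform)
loader = torch.utils.data.DataLoader(train_data, batch_size=32, shuffle=True)

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for p in model.parameters():          # freeze the pretrained backbone
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 2)  # new head: call vs. no call

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)

model.train()
for _ in range(5):
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()

# Recall can be favored afterwards by lowering the decision threshold applied
# to the softmax score of the "call" class.
```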
Affiliation(s)
- Carly H Batist
- Department of Anthropology, City University of New York (CUNY) Graduate Center, New York, New York, USA
- New York Consortium in Evolutionary Primatology (NYCEP), New York, New York, USA
- Rainforest Connection (RFCx), Katy, Texas, USA
- Emmanuel Dufourq
- African Institute for Mathematical Sciences, Muizenberg, South Africa
- Department of Mathematical Sciences, Stellenbosch University, Stellenbosch, South Africa
- National Institute for Theoretical & Computational Sciences, Stellenbosch, South Africa
- African Institute for Mathematical Sciences, Research and Innovation Centre, Kigali, Rwanda
- Lorène Jeantet
- African Institute for Mathematical Sciences, Muizenberg, South Africa
- Department of Mathematical Sciences, Stellenbosch University, Stellenbosch, South Africa
- National Institute for Theoretical & Computational Sciences, Stellenbosch, South Africa
- Mendrika N Razafindraibe
- Department of Animal Biology, University of Antananarivo, Antananarivo, Madagascar
- Institut International de Science Sociale, Antananarivo, Madagascar
- Andrea L Baden
- Department of Anthropology, City University of New York (CUNY) Graduate Center, New York, New York, USA
- New York Consortium in Evolutionary Primatology (NYCEP), New York, New York, USA
- Department of Anthropology, Hunter College of City University of New York (CUNY), New York, New York, USA
4. Wearn OR, Trinh-Dinh H, Ma CY, Khac Le Q, Nguyen P, Van Hoang T, Van Luong C, Van Hua T, Van Hoang Q, Fan PF, Duc Nguyen T. Vocal fingerprinting reveals a substantially smaller global population of the Critically Endangered cao vit gibbon (Nomascus nasutus) than previously thought. Sci Rep 2024; 14:416. PMID: 38172177. PMCID: PMC10764777. DOI: 10.1038/s41598-023-50838-2
Abstract
The cao vit gibbon (Nomascus nasutus) is one of the rarest primates on Earth and now survives only in a single forest patch of less than 5000 ha on the Vietnam-China border. Accurate monitoring of the last remaining population is critical to inform ongoing conservation interventions and track conservation success over time. However, traditional methods for monitoring gibbons, involving triangulation of groups from their songs, are inherently subjective and likely subject to considerable measurement error. To overcome this, we aimed to use 'vocal fingerprinting' to distinguish the different singing males in the population. During the 2021 population survey, we complemented the traditional observations made by survey teams with a concurrent passive acoustic monitoring array. Counts of gibbon group sizes were also assisted by a UAV-mounted thermal camera. After identifying eight family groups in the acoustic data and incorporating long-term data, we estimate that the population comprised 74 individuals in 11 family groups, which is 38% smaller than previously thought. We have no evidence that the population has declined; on the contrary, it appears to be growing, with new groups having formed in recent years. The difference is instead due to double-counting of groups in previous surveys employing the triangulation method. Indeed, using spatially explicit capture-recapture modelling, we uncovered substantial measurement error in the bearings and distances recorded by field teams. We also applied semi- and fully-automatic approaches to clustering the male calls into groups, finding no evidence that we had missed any males with the manual approach. Given the very small size of the population, conservation actions are now even more urgent, in particular habitat restoration to allow the population to expand. Our new population estimate serves as a more robust basis for informing management actions and tracking conservation success over time.
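The semi-automatic clustering of male songs by caller can be illustrated with a hedged sketch: hierarchical clustering of per-song acoustic feature vectors, with the tree cut at a chosen number of putative males. The feature matrix and the cut level are hypothetical; this is not the authors' implementation.

```python
# Sketch: agglomerative clustering of male song feature vectors into putative individuals.
# `song_features` is a hypothetical (n_songs x n_features) matrix, not the published analysis.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(1)
song_features = rng.normal(size=(120, 20))

# Pairwise distances between songs, then average-linkage hierarchical clustering
dists = pdist(song_features, metric="euclidean")
Z = linkage(dists, method="average")

# Cut the tree into a chosen number of putative singing males (8 is an assumed example)
putative_males = fcluster(Z, t=8, criterion="maxclust")
print("songs per putative male:", np.bincount(putative_males)[1:])
```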
Affiliation(s)
- Hoang Trinh-Dinh
- School of Bioresources and Technology, King Mongkut's University of Technology Thonburi, Bangkok, Thailand
- Chang-Yong Ma
- College of Life Sciences, Guangxi Normal University, Guilin, China
- Tru Van Hua
- Trung Khanh Ranger Station, Forest Protection Department, Ministry of Agriculture and Rural Development, Trung Khanh, Cao Bang, Vietnam
- Quan Van Hoang
- Trung Khanh Ranger Station, Forest Protection Department, Ministry of Agriculture and Rural Development, Trung Khanh, Cao Bang, Vietnam
- Peng-Fei Fan
- School of Life Sciences, Sun Yat-sen University, Guangzhou, Guangdong, China
5. Enari H, Enari HS. Bioacoustic monitoring to determine addiction levels of primates to the human sphere: A feasibility study on Japanese macaques. Am J Primatol 2023; 85:e23558. PMID: 37781937. DOI: 10.1002/ajp.23558
Abstract
Some nonhuman primate species, whose original habitats have been reclaimed by human activities, have acquired boldness toward humans, evident in the diminished frequency of escape behaviors. Eventually, such species have become regular users of human settlements and are referred to as "urban primates." Considering this, we developed a noninvasive technique based on bioacoustics to provide a transparent assessment of troop addiction levels to anthropogenic environments, determined by dependence on agricultural crops for food and on the human living sphere for daily ranging. We attempted to quantify addiction levels based on the boldness of troops when raiding settlements, which are characterized by a "landscape of fear" because of the presence of humans as predators. We hypothesized that the boldness of troops could be measured using two indices: the frequency of raiding events on settlements and the amount of time spent there. For hypothesis testing, we devised an efficient method to measure these two indices using sound cues (i.e., spontaneous calls such as contact calls) that are obtainable throughout the day from most primate species and can be used to trace troop movements. We conducted a feasibility study of this assessment procedure, targeting troops of Japanese macaques (Macaca fuscata). For this study, we collected 346 recording-weeks of data using autonomous recorders from 24 troops with different addiction levels during the nonsnowy seasons. The results demonstrated that troops that reached the threshold level, at which radical interventions including mass culling of troop members are officially permitted, could be readily identified based on the following behavioral characteristics: troop members raiding settlements two or three times per week and mean time spent in settlements per raiding event exceeding 0.4 h. Thus, bioacoustic monitoring could become a valid option for ensuring the objectivity of policy judgments in urban primate management.
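The two boldness indices described above (raiding frequency per week and mean time spent in the settlement per raiding event) reduce to simple summaries of detection timestamps. The sketch below illustrates one way to compute them from call detections localized to a settlement; the data, the 2-h gap used to separate raiding events, and the column names are assumptions, not the authors' analysis.

```python
# Sketch: computing raid frequency (events/week) and mean dwell time (h/event)
# from timestamps of calls detected inside a settlement. All data are hypothetical.
import pandas as pd

detections = pd.DataFrame({
    "troop": ["A"] * 6,
    "time": pd.to_datetime([
        "2022-06-01 09:10", "2022-06-01 09:40",   # raid 1
        "2022-06-03 14:00", "2022-06-03 14:20",   # raid 2
        "2022-06-08 07:30", "2022-06-08 08:00",   # raid 3
    ]),
})

# Detections separated by more than 2 h (an assumed gap threshold) start a new raiding event
detections = detections.sort_values("time")
gap = detections.groupby("troop")["time"].diff() > pd.Timedelta(hours=2)
detections["event"] = gap.groupby(detections["troop"]).cumsum()

events = detections.groupby(["troop", "event"])["time"].agg(["min", "max"])
events["duration_h"] = (events["max"] - events["min"]).dt.total_seconds() / 3600

n_weeks = (detections["time"].max() - detections["time"].min()).days / 7
print("raids per week:", len(events) / max(n_weeks, 1))
print("mean time in settlement per raid (h):", events["duration_h"].mean())
```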
Affiliation(s)
- Hiroto Enari
- Faculty of Agriculture, Yamagata University, Tsuruoka, Yamagata, Japan
- Haruka S Enari
- Faculty of Agriculture, Yamagata University, Tsuruoka, Yamagata, Japan
6. Phaniraj N, Wierucka K, Zürcher Y, Burkart JM. Who is calling? Optimizing source identification from marmoset vocalizations with hierarchical machine learning classifiers. J R Soc Interface 2023; 20:20230399. PMID: 37848054. PMCID: PMC10581777. DOI: 10.1098/rsif.2023.0399
Abstract
With their highly social nature and complex vocal communication system, marmosets are important models for comparative studies of vocal communication and, eventually, language evolution. However, our knowledge about marmoset vocalizations predominantly originates from playback studies or vocal interactions between dyads, and there is a need to move towards studying group-level communication dynamics. Efficient source identification from marmoset vocalizations is essential for this challenge, and machine learning algorithms (MLAs) can aid it. Here we built a pipeline capable of plentiful feature extraction, meaningful feature selection, and supervised classification of vocalizations of up to 18 marmosets. We optimized the classifier by building a hierarchical MLA that first learned to determine the sex of the source, narrowed down the possible source individuals based on their sex, and then determined the source identity. We were able to correctly identify the source individual with high precision (87.21-94.42%, depending on call type, and up to 97.79% after the removal of twins from the dataset). We also examined the robustness of identification across varying sample sizes. Our pipeline is a promising tool not only for source identification from marmoset vocalizations but also for analysing vocalizations of other species.
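The hierarchical idea, predicting the caller's sex first and then the individual identity within that sex, can be sketched with two stacked classifiers. The feature matrix, labels, and choice of random forests below are placeholders for illustration; this is not the published pipeline.

```python
# Sketch of a two-stage hierarchical classifier: sex first, then identity within sex.
# Feature matrix and labels are hypothetical stand-ins, not the authors' data or code.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)
X = rng.normal(size=(600, 30))                     # acoustic features per call
sex = rng.integers(0, 2, size=600)                 # 0 = female, 1 = male
identity = rng.integers(0, 9, size=600) + sex * 9  # 9 individuals per sex

sex_clf = RandomForestClassifier(random_state=0).fit(X, sex)
id_clf = {s: RandomForestClassifier(random_state=0).fit(X[sex == s], identity[sex == s])
          for s in (0, 1)}

def predict_identity(x):
    """Predict sex, then restrict the identity classifier to that sex's individuals."""
    x = x.reshape(1, -1)
    s = int(sex_clf.predict(x)[0])
    return int(id_clf[s].predict(x)[0])

print(predict_identity(X[0]))
```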
Affiliation(s)
- Nikhil Phaniraj
- Institute of Evolutionary Anthropology (IEA), University of Zurich, Winterthurerstrasse 190, 8057 Zürich, Switzerland
- Neuroscience Center Zurich (ZNZ), University of Zurich and ETH Zurich, Winterthurerstrasse 190, 8057 Zürich, Switzerland
- Department of Biology, Indian Institute of Science Education and Research (IISER) Pune, Dr. Homi Bhabha Road, Pune 411008, India
- Kaja Wierucka
- Institute of Evolutionary Anthropology (IEA), University of Zurich, Winterthurerstrasse 190, 8057 Zürich, Switzerland
- Behavioral Ecology & Sociobiology Unit, German Primate Center, Leibniz Institute for Primate Research, Kellnerweg 4, 37077 Göttingen, Germany
- Yvonne Zürcher
- Institute of Evolutionary Anthropology (IEA), University of Zurich, Winterthurerstrasse 190, 8057 Zürich, Switzerland
- Judith M. Burkart
- Institute of Evolutionary Anthropology (IEA), University of Zurich, Winterthurerstrasse 190, 8057 Zürich, Switzerland
- Neuroscience Center Zurich (ZNZ), University of Zurich and ETH Zurich, Winterthurerstrasse 190, 8057 Zürich, Switzerland
- Center for the Interdisciplinary Study of Language Evolution (ISLE), University of Zurich, Affolternstrasse 56, 8050 Zürich, Switzerland
7. Best P, Paris S, Glotin H, Marxer R. Deep audio embeddings for vocalisation clustering. PLoS One 2023; 18:e0283396. PMID: 37428759. DOI: 10.1371/journal.pone.0283396
Abstract
The study of non-human animals' communication systems generally relies on the transcription of vocal sequences using a finite set of discrete units. This set is referred to as a vocal repertoire, which is specific to a species or a sub-group of a species. When conducted by human experts, the formal description of vocal repertoires can be laborious and/or biased. This motivates computerised assistance for this procedure, for which machine learning algorithms represent a good opportunity. Unsupervised clustering algorithms are suited for grouping close points together, provided a relevant representation. This paper therefore studies a new method for encoding vocalisations, allowing for automatic clustering to alleviate vocal repertoire characterisation. Borrowing from deep representation learning, we use a convolutional auto-encoder network to learn an abstract representation of vocalisations. We report on the quality of the learnt representation, as well as that of state-of-the-art methods, by quantifying their agreement with expert-labelled vocalisation types from 8 datasets of other studies across 6 species (birds and marine mammals). With this benchmark, we demonstrate that using auto-encoders improves the relevance of vocalisation representations for repertoire characterisation while requiring very few settings. We also publish a Python package for the bioacoustic community to train their own vocalisation auto-encoders or use a pretrained encoder to browse vocal repertoires and ease unit-wise annotation.
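A minimal sketch of a convolutional auto-encoder of the kind described above, whose bottleneck activations serve as embeddings for downstream clustering. The architecture, input size (1 x 64 x 64 spectrogram patches), and training loop are illustrative assumptions, not the published model or package.

```python
# Sketch: convolutional auto-encoder whose bottleneck provides embeddings for
# vocalisation clustering. Architecture and input size are illustrative assumptions.
import torch
import torch.nn as nn

class ConvAutoEncoder(nn.Module):
    def __init__(self, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 32 * 16 * 16), nn.ReLU(),
            nn.Unflatten(1, (32, 16, 16)),
            nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU(),    # 16 -> 32
            nn.ConvTranspose2d(16, 1, 2, stride=2), nn.Sigmoid(),  # 32 -> 64
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

model = ConvAutoEncoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

spectrograms = torch.rand(128, 1, 64, 64)  # placeholder batch of spectrogram patches
for _ in range(10):
    recon, z = model(spectrograms)
    loss = loss_fn(recon, spectrograms)    # reconstruction objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

embeddings = model.encoder(spectrograms).detach()  # inputs for k-means, HDBSCAN, etc.
```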
Affiliation(s)
- Paul Best
- Université de Toulon, Aix Marseille Univ, CNRS, LIS, Toulon, France
- Sébastien Paris
- Université de Toulon, Aix Marseille Univ, CNRS, LIS, Toulon, France
- Hervé Glotin
- Université de Toulon, Aix Marseille Univ, CNRS, LIS, Toulon, France
- Ricard Marxer
- Université de Toulon, Aix Marseille Univ, CNRS, LIS, Toulon, France
8. Zhao S, Xie J, Ding CQ. Automatic individual recognition of wild Crested Ibis based on hybrid method of self-supervised learning and clustering. Ecol Inform 2023. DOI: 10.1016/j.ecoinf.2023.102089
9. Clink DJ, Kier I, Ahmad AH, Klinck H. A workflow for the automated detection and classification of female gibbon calls from long-term acoustic recordings. Front Ecol Evol 2023. DOI: 10.3389/fevo.2023.1071640
Abstract
Passive acoustic monitoring (PAM) allows for the study of vocal animals on temporal and spatial scales difficult to achieve using only human observers. Recent improvements in recording technology, data storage, and battery capacity have led to increased use of PAM. One of the main obstacles in implementing wide-scale PAM programs is the lack of open-source programs that efficiently process terabytes of sound recordings and do not require large amounts of training data. Here we describe a workflow for detecting, classifying, and visualizing female Northern grey gibbon calls in Sabah, Malaysia. Our approach detects sound events using band-limited energy summation and performs binary classification of these events (gibbon female or not) using machine learning algorithms (support vector machine and random forest). We then applied an unsupervised approach (affinity propagation clustering) to see whether we could further differentiate between true and false positives, or estimate the number of gibbon females in our dataset. We used this workflow to address three questions: (1) does this automated approach provide reliable estimates of temporal patterns of gibbon calling activity; (2) can unsupervised approaches be applied as a post-processing step to improve the performance of the system; and (3) can unsupervised approaches be used to estimate how many female individuals (or clusters) there are in our study area? We found that performance plateaued with >160 clips of training data for each of our two classes. Using optimized settings, our automated approach achieved satisfactory performance (F1 score ~ 80%). The unsupervised approach did not effectively differentiate between true and false positives or return clusters that appeared to correspond to the number of females in our study area. Our results indicate that more work needs to be done before unsupervised approaches can be reliably used to estimate the number of individual animals occupying an area from PAM data. Future work applying these methods across sites and different gibbon species, and comparisons to deep learning approaches, will be crucial for future gibbon conservation initiatives across Southeast Asia.
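A hedged sketch of the band-limited energy summation detection step described above. The band limits, window length, and threshold are illustrative assumptions, not the workflow's actual settings.

```python
# Sketch: band-limited energy summation detector. Band limits, window length and
# threshold below are illustrative assumptions, not the published workflow's settings.
import numpy as np
from scipy.signal import butter, sosfiltfilt

def band_energy_events(audio, sr, low_hz=500, high_hz=1800,
                       win_s=0.5, hop_s=0.25, threshold_db=-20):
    """Return (start_s, end_s) pairs where band-limited energy exceeds a threshold."""
    sos = butter(4, [low_hz, high_hz], btype="bandpass", fs=sr, output="sos")
    filtered = sosfiltfilt(sos, audio)

    win, hop = int(win_s * sr), int(hop_s * sr)
    frames = range(0, len(filtered) - win, hop)
    energy_db = np.array([10 * np.log10(np.mean(filtered[i:i + win] ** 2) + 1e-12)
                          for i in frames])

    above = energy_db > (energy_db.max() + threshold_db)  # threshold relative to peak
    events, start = [], None
    for k, flag in enumerate(above):
        if flag and start is None:
            start = k
        elif not flag and start is not None:
            events.append((start * hop_s, k * hop_s + win_s))
            start = None
    if start is not None:
        events.append((start * hop_s, len(above) * hop_s + win_s))
    return events

# Example on synthetic audio: 60 s of noise with a louder in-band tone at 20-23 s
sr = 16000
t = np.arange(60 * sr) / sr
audio = 0.01 * np.random.randn(len(t))
audio[20 * sr:23 * sr] += 0.2 * np.sin(2 * np.pi * 1000 * t[20 * sr:23 * sr])
print(band_energy_events(audio, sr))
```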
10. McGinn K, Kahl S, Peery MZ, Klinck H, Wood CM. Feature embeddings from the BirdNET algorithm provide insights into avian ecology. Ecol Inform 2023. DOI: 10.1016/j.ecoinf.2023.101995
11. Morales G, Vargas V, Espejo D, Poblete V, Tomasevic JA, Otondo F, Navedo JG. Method for passive acoustic monitoring of bird communities using UMAP and a deep neural network. Ecol Inform 2022. DOI: 10.1016/j.ecoinf.2022.101909
12. Introducing the Software CASE (Cluster and Analyze Sound Events) by Comparing Different Clustering Methods and Audio Transformation Techniques Using Animal Vocalizations. Animals (Basel) 2022; 12(16):2020. PMID: 36009611. PMCID: PMC9404437. DOI: 10.3390/ani12162020
Simple Summary
Unsupervised clustering algorithms are widely used in ecology and conservation to classify animal vocalizations, but also offer various advantages in basic research, contributing to the understanding of acoustic communication. Nevertheless, there are still some challenges to overcome. For instance, the quality of the clustering result depends on the audio transformation technique previously used to adjust the audio data. Moreover, it is difficult to verify the reliability of the clustering result. To analyze bioacoustic data using a clustering algorithm, it is therefore essential to select a reasonable algorithm from the many existing algorithms and to prepare the recorded vocalizations so that the resulting values characterize a vocalization as accurately as possible. Frequency-modulated vocalizations, whose frequencies change over time, pose a particular problem. In this paper, we present the software CASE, which includes various clustering methods and provides an overview of their strengths and weaknesses concerning the classification of bioacoustic data. This software uses a multidimensional feature-extraction method to achieve better clustering results, especially for frequency-modulated vocalizations.
Abstract
Unsupervised clustering algorithms are widely used in ecology and conservation to classify animal sounds, but also offer several advantages in basic bioacoustics research. Consequently, it is important to overcome the existing challenges. A common practice is to extract the acoustic features of vocalizations one-dimensionally, taking only an average value of a given feature for the entire vocalization. With frequency-modulated vocalizations, whose acoustic features can change over time, this can lead to insufficient characterization. Whether the necessary parameters have been set correctly, and whether the obtained clustering result reliably classifies the vocalizations, often remains unclear. The presented software, CASE, is intended to overcome these challenges. Established and new unsupervised clustering methods (community detection, affinity propagation, HDBSCAN, and fuzzy clustering) are tested in combination with various classifiers (k-nearest neighbor, dynamic time warping, and cross-correlation) using differently transformed animal vocalizations. These methods are compared with predefined clusters to determine their strengths and weaknesses. In addition, a multidimensional data transformation procedure is presented that better represents the course of multiple acoustic features. The results suggest that, especially with frequency-modulated vocalizations, clustering is more applicable with multidimensional feature extraction than with one-dimensional feature extraction. The characterization and clustering of vocalizations in multidimensional space offer great potential for future bioacoustic studies. The software CASE includes the developed method of multidimensional feature extraction, as well as all of the clustering methods used. It allows several clustering algorithms to be applied quickly to one data set so that their results can be compared and their reliability verified based on their consistency. Moreover, CASE determines the optimal values of most of the necessary parameters automatically. To take advantage of these benefits, the software CASE is provided for free download.
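To illustrate the multidimensional idea, comparing whole feature trajectories rather than per-call averages, the sketch below computes dynamic time-warping distances between frequency contours and feeds the distance matrix to hierarchical clustering. The contours and parameters are synthetic, and this is not the CASE implementation.

```python
# Sketch: clustering frequency-modulated calls by their whole frequency contours
# (multidimensional features) using DTW distances, rather than per-call averages.
# Contours are synthetic; this is not the CASE implementation.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def dtw_distance(a, b):
    """Plain dynamic time warping distance between two 1-D contours."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

rng = np.random.default_rng(3)
upsweeps = [1000 + 800 * np.linspace(0, 1, rng.integers(40, 60)) for _ in range(10)]
downsweeps = [1800 - 800 * np.linspace(0, 1, rng.integers(40, 60)) for _ in range(10)]
contours = upsweeps + downsweeps

# Pairwise DTW distance matrix, then average-linkage clustering into two groups
n = len(contours)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        dist[i, j] = dist[j, i] = dtw_distance(contours[i], contours[j])

labels = fcluster(linkage(squareform(dist), method="average"), t=2, criterion="maxclust")
print(labels)  # upsweeps and downsweeps should fall into separate clusters
```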
13. Comella I, Tasirin JS, Klinck H, Johnson LM, Clink DJ. Investigating note repertoires and acoustic tradeoffs in the duet contributions of a basal haplorrhine primate. Front Ecol Evol 2022. DOI: 10.3389/fevo.2022.910121
Abstract
Acoustic communication serves a crucial role in the social interactions of vocal animals. Duetting—the coordinated singing among pairs of animals—has evolved independently multiple times across diverse taxonomic groups including insects, frogs, birds, and mammals. A crucial first step for understanding how information is encoded and transferred in duets is through quantifying the acoustic repertoire, which can reveal differences and similarities on multiple levels of analysis and provides the groundwork necessary for further studies of the vocal communication patterns of the focal species. Investigating acoustic tradeoffs, such as the tradeoff between the rate of syllable repetition and note bandwidth, can also provide important insights into the evolution of duets, as these tradeoffs may represent the physical and mechanical limits on signal design. In addition, identifying which sex initiates the duet can provide insights into the function of the duets. We have three main goals in the current study: (1) provide a descriptive, fine-scale analysis of Gursky’s spectral tarsier (Tarsius spectrumgurskyae) duets; (2) use unsupervised approaches to investigate sex-specific note repertoires; and (3) test for evidence of acoustic tradeoffs in the rate of note repetition and bandwidth of tarsier duet contributions. We found that both sexes were equally likely to initiate the duets and that pairs differed substantially in the duration of their duets. Our unsupervised clustering analyses indicate that both sexes have highly graded note repertoires. We also found evidence for acoustic tradeoffs in both male and female duet contributions, but the relationship in females was much more pronounced. The prevalence of this tradeoff across diverse taxonomic groups including birds, bats, and primates indicates the constraints that limit the production of rapidly repeating broadband notes may be one of the few ‘universals’ in vocal communication. Future carefully designed playback studies that investigate the behavioral response, and therefore potential information transmitted in duets to conspecifics, will be highly informative.
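Acoustic performance tradeoffs of the kind tested above are often assessed by fitting an upper bound to the scatter of notes, for example with upper-quantile regression. The sketch below is an assumed-data illustration of that idea, not the study's analysis; the variable names and the 90th-percentile choice are assumptions.

```python
# Sketch: testing a note rate vs. bandwidth tradeoff by regressing bandwidth on
# note repetition rate at an upper quantile. Data and the 90th-percentile choice
# are illustrative assumptions, not the study's analysis.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
note_rate = rng.uniform(2, 12, 400)                           # notes per second
bandwidth = 4000 - 250 * note_rate + rng.normal(0, 400, 400)  # Hz, with a declining trend
notes = pd.DataFrame({"note_rate": note_rate, "bandwidth": bandwidth})

# A negative slope at the upper quantile is consistent with a production tradeoff
upper_fit = smf.quantreg("bandwidth ~ note_rate", notes).fit(q=0.9)
print(upper_fit.params)
```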
14. Long-Distance Vocal Signaling in White-Handed Gibbons (Hylobates lar). Int J Primatol 2022. DOI: 10.1007/s10764-022-00312-z
15. Trapanotto M, Nanni L, Brahnam S, Guo X. Convolutional Neural Networks for the Identification of African Lions from Individual Vocalizations. J Imaging 2022; 8(4):96. PMID: 35448223. PMCID: PMC9029749. DOI: 10.3390/jimaging8040096
Abstract
The classification of vocal individuality for passive acoustic monitoring (PAM) and census of animals is becoming an increasingly popular area of research. Nearly all studies in this field of inquiry have relied on classic audio representations and classifiers, such as Support Vector Machines (SVMs) trained on spectrograms or Mel-Frequency Cepstral Coefficients (MFCCs). In contrast, most current bioacoustic species classification exploits the power of deep learners and more cutting-edge audio representations. A significant reason for avoiding deep learning in vocal identity classification is the tiny sample size in collections of labeled individual vocalizations. As is well known, deep learners require large datasets to avoid overfitting. One way to handle small datasets with deep learning methods is to use transfer learning. In this work, we evaluate the performance of three pretrained CNNs (VGG16, ResNet50, and AlexNet) on a small, publicly available lion roar dataset containing approximately 150 samples taken from five male lions. Each of these networks is retrained on eight representations of the samples: MFCCs, spectrogram, and Mel spectrogram, along with several new ones, such as VGGish and Stockwell, and those based on the recently proposed LM spectrogram. The performance of these networks, both individually and in ensembles, is analyzed and corroborated using the Equal Error Rate and shown to surpass previous classification attempts on this dataset; the best single network achieved over 95% accuracy and the best ensembles over 98% accuracy. The contributions this study makes to the field of individual vocal classification include demonstrating that it is valuable and possible, with caution, to use transfer learning with single pretrained CNNs on the small datasets available for this problem domain. We also make a contribution to bioacoustics generally by offering a comparison of the performance of many state-of-the-art audio representations, including, for the first time, the LM spectrogram and Stockwell representations. All source code for this study is available on GitHub.
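The Equal Error Rate used above to corroborate performance is the operating point at which the false positive rate equals the false negative rate. The sketch below computes it from verification scores; the scores and labels are synthetic placeholders.

```python
# Sketch: computing the Equal Error Rate (EER) from verification scores, i.e. the
# point on the ROC curve where the false positive rate equals the false negative
# rate. Scores and labels below are synthetic placeholders.
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(5)
labels = np.concatenate([np.ones(200), np.zeros(200)])  # 1 = same lion, 0 = different
scores = np.concatenate([rng.normal(0.7, 0.15, 200),    # genuine-pair scores
                         rng.normal(0.4, 0.15, 200)])   # impostor-pair scores

fpr, tpr, _ = roc_curve(labels, scores)
fnr = 1 - tpr
eer_index = np.argmin(np.abs(fpr - fnr))
eer = (fpr[eer_index] + fnr[eer_index]) / 2
print(f"EER: {eer:.3f}")
```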
Affiliation(s)
- Martino Trapanotto
- Department of Information Engineering, University of Padua, Via Gradenigo 6, 35131 Padova, Italy
- Loris Nanni
- Department of Information Engineering, University of Padua, Via Gradenigo 6, 35131 Padova, Italy
- Sheryl Brahnam
- Information Technology and Cybersecurity, Missouri State University, 901 S. National, Springfield, MO 65897, USA
- Correspondence: Tel.: +1-417-873-9979
- Xiang Guo
- Information Technology and Cybersecurity, Missouri State University, 901 S. National, Springfield, MO 65897, USA
16. Linhart P, Mahamoud-Issa M, Stowell D, Blumstein DT. The potential for acoustic individual identification in mammals. Mamm Biol 2022. DOI: 10.1007/s42991-021-00222-2
17. Clink DJ, Zafar M, Ahmad AH, Lau AR. Limited Evidence for Individual Signatures or Site-Level Patterns of Variation in Male Northern Gray Gibbon (Hylobates funereus) Duet Codas. Int J Primatol 2021. DOI: 10.1007/s10764-021-00250-2
18. Mercado E, Perazio CE. All units are equal in humpback whale songs, but some are more equal than others. Anim Cogn 2021; 25:149-177. PMID: 34363127. DOI: 10.1007/s10071-021-01539-8
Abstract
Flexible production and perception of vocalizations is linked to an impressive array of cognitive capacities including language acquisition by humans, song learning by birds, biosonar in bats, and vocal imitation by cetaceans. Here, we characterize a portion of the repertoire of one of the most impressive vocalizers in nature: the humpback whale. Qualitative and quantitative analyses of sounds (units) produced by humpback whales revealed that singers gradually morphed streams of units along multiple acoustic dimensions within songs, maintaining the continuity of spectral content across subjectively dissimilar unit "types." Singers consistently produced some unit forms more frequently and intensely than others, suggesting that units are functionally heterogeneous. The precision with which singing humpback whales continuously adjusted the acoustic characteristics of units shows that they possess exquisite vocal control mechanisms and vocal flexibility beyond what is seen in most animals other than humans. The gradual morphing of units within songs that we observed is inconsistent with past claims that humpback whales construct songs from a fixed repertoire of discrete unit types. These findings challenge the results of past studies based on fixed-unit classification methods and argue for the development of new metrics for characterizing the graded structure of units. The specific vocal variations that singers produced suggest that humpback whale songs are unlikely to provide detailed information about a singer's reproductive fitness, but can reveal the precise locations and movements of singers from long distances and may enhance the effectiveness of units as sonar signals.
Affiliation(s)
- Eduardo Mercado
- Department of Psychology, University at Buffalo, The State University of New York, Park Hall, Buffalo, NY, 14260, USA
- Christina E Perazio
- Department of Psychology, University at Buffalo, The State University of New York, Park Hall, Buffalo, NY, 14260, USA
- School of Social and Behavioral Sciences, University of New England, Biddeford, ME, USA