1
|
Zhang S, Gao Y, Cai J, Yang H, Zhao Q, Pan F. A Novel Bird Sound Recognition Method Based on Multifeature Fusion and a Transformer Encoder. Sensors (Basel, Switzerland) 2023; 23:8099. [PMID: 37836929 PMCID: PMC10575132 DOI: 10.3390/s23198099] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Revised: 09/16/2023] [Accepted: 09/18/2023] [Indexed: 10/15/2023]
Abstract
Birds play a vital role in the study of ecosystems and biodiversity. Accurate bird identification helps monitor biodiversity, understand the functions of ecosystems, and develop effective conservation strategies. However, previous bird sound recognition methods often relied on single features and overlooked the spatial information associated with these features, leading to low accuracy. Recognizing this gap, the present study proposed a bird sound recognition method that employs multiple convolutional neural-based networks and a transformer encoder to provide a reliable solution for identifying and classifying birds based on their unique sounds. We manually extracted various acoustic features as model inputs, and feature fusion was applied to obtain the final set of feature vectors. Feature fusion combines the deep features extracted by various networks, resulting in a more comprehensive feature set, thereby improving recognition accuracy. The multiple integrated acoustic features, such as mel frequency cepstral coefficients (MFCC), chroma features (Chroma) and Tonnetz features, were encoded by a transformer encoder. The transformer encoder effectively extracted the positional relationships between bird sound features, resulting in enhanced recognition accuracy. The experimental results demonstrated the exceptional performance of our method with an accuracy of 97.99%, a recall of 96.14%, an F1 score of 96.88% and a precision of 97.97% on the Birdsdata dataset. Furthermore, our method achieved an accuracy of 93.18%, a recall of 92.43%, an F1 score of 93.14% and a precision of 93.25% on the Cornell Bird Challenge 2020 (CBC) dataset.
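The four scores quoted in the abstract above (accuracy, recall, F1 and precision) can be computed from predicted and true species labels. The sketch below is a minimal illustration, not the authors' evaluation code. It assumes macro averaging over classes, which the abstract does not specify, and the function name `macro_metrics` is hypothetical.

```python
def macro_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall and F1.

    Assumption: macro averaging (unweighted mean over classes); the
    abstract does not say which averaging scheme was used.
    """
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


if __name__ == "__main__":
    # Toy labels, not data from the paper.
    print(macro_metrics(["a", "a", "b", "b"], ["a", "b", "b", "b"]))
```

Macro averaging weights every species equally regardless of how many recordings it has. A weighted average would be the natural alternative if the class sizes differ strongly.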
Affiliation(s)
- Shaokai Zhang
  - College of Electronics and Information Engineering, Sichuan University, Chengdu 610041, China
- Yuan Gao
  - College of Electronics and Information Engineering, Sichuan University, Chengdu 610041, China
- Jianmin Cai
  - College of Electronics and Information Engineering, Sichuan University, Chengdu 610041, China
- Hangxiao Yang
  - College of Computer Science, Sichuan University, Chengdu 610041, China
- Qijun Zhao
  - College of Computer Science, Sichuan University, Chengdu 610041, China
- Fan Pan
  - College of Electronics and Information Engineering, Sichuan University, Chengdu 610041, China
|
2
|
Transound: Hyper-head attention transformer for birds sound recognition. Ecol Inform 2023. [DOI: 10.1016/j.ecoinf.2023.102001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/20/2023]
|
3
|
A review of automatic recognition technology for bird vocalizations in the deep learning era. Ecol Inform 2022. [DOI: 10.1016/j.ecoinf.2022.101927] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
|
4
|
Liu J, Zhang Y, Lv D, Lu J, Xie S, Zi J, Yin Y, Xu H. Birdsong classification based on ensemble multi-scale convolutional neural network. Sci Rep 2022; 12:8636. [PMID: 35606386 PMCID: PMC9126969 DOI: 10.1038/s41598-022-12121-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/12/2021] [Accepted: 04/22/2022] [Indexed: 11/13/2022] [Open Access]
Abstract
With the intensification of ecosystem damage, birds have become the symbolic species of the ecosystem. Ornithology combined with interdisciplinary technical research is of great significance for protecting birds and evaluating ecosystem quality. Deep learning has made great progress in birdsong recognition. However, as the number of network layers increases in a traditional CNN, semantic information gradually becomes richer while detailed information disappears. In addition, the global information carried by the entire input may be lost in convolution, pooling, or other operations, and these problems weaken classification performance. To solve these problems, based on the feature spectrogram obtained from the wavelet transform of birdsongs, this paper explored the multi-scale convolution neural network (MSCNN) and proposed an ensemble multi-scale convolution neural network (EMSCNN) classification framework. The experiments compared the MSCNN and EMSCNN models with other CNN models including LeNet, VGG16, ResNet101, MobileNetV2, EfficientNetB7, Darknet53 and SPP-net. The results showed that the MSCNN model achieved an accuracy of 89.61%, and EMSCNN achieved an accuracy of 91.49%. In the experiments on the recognition of 30 species of birds, our models effectively improved classification performance with high stability and efficiency, indicating that the models have better generalization ability and are suitable for recognizing bird species from their songs. This work provides a methodological and technical reference for bird classification research.
Affiliation(s)
- Jiang Liu
  - College of Big Data and Intelligent Engineering, Southwest Forestry University, Kunming, 650000, China
- Yan Zhang
  - College of Mathematics and Physics, Southwest Forestry University, Kunming, 650000, China
- Danjv Lv
  - College of Big Data and Intelligent Engineering, Southwest Forestry University, Kunming, 650000, China
- Jing Lu
  - College of Big Data and Intelligent Engineering, Southwest Forestry University, Kunming, 650000, China
- Shanshan Xie
  - College of Big Data and Intelligent Engineering, Southwest Forestry University, Kunming, 650000, China
- Jiali Zi
  - College of Big Data and Intelligent Engineering, Southwest Forestry University, Kunming, 650000, China
- Yue Yin
  - College of Big Data and Intelligent Engineering, Southwest Forestry University, Kunming, 650000, China
- Haifeng Xu
  - School of Information Science and Technology, Beijing Forestry University, Beijing, 100091, China
|
5
|
A novel deep transfer learning models for recognition of birds sounds in different environment. Soft Comput 2022. [DOI: 10.1007/s00500-021-06640-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 10/19/2022]
|
6
|
Hegg JC, Kennedy BP. Let's do the time warp again: non-linear time series matching as a tool for sequentially structured data in ecology. Ecosphere 2021. [DOI: 10.1002/ecs2.3742] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 01/26/2023] [Open Access]
Affiliation(s)
- Jens C. Hegg
  - Department of Fish & Wildlife Sciences, University of Idaho, Moscow, Idaho 83844, USA
- Brian P. Kennedy
  - Department of Fish & Wildlife Sciences, University of Idaho, Moscow, Idaho 83844, USA
  - Department of Biology, University of Idaho, Moscow, Idaho 83844, USA
  - Department of Geology, University of Idaho, Moscow, Idaho 83844, USA
|
7
|
Stamps MT, Go S, Mathuru AS. Computational geometric tools for quantitative comparison of locomotory behavior. Sci Rep 2019; 9:16585. [PMID: 31719560 PMCID: PMC6851375 DOI: 10.1038/s41598-019-52300-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 03/08/2019] [Accepted: 10/14/2019] [Indexed: 11/08/2022] [Open Access]
Abstract
A fundamental challenge for behavioral neuroscientists is to accurately quantify (dis)similarities in animal behavior without excluding inherent variability present between individuals. We explored two new applications of curve and shape alignment techniques to address this issue. As a proof-of-concept we applied these methods to compare normal or alarmed behavior in pairs of medaka (Oryzias latipes). The curve alignment method we call Behavioral Distortion Distance (BDD) revealed that alarmed fish display less predictable swimming over time, even if individuals incorporate the same action patterns like immobility, sudden changes in swimming trajectory, or changing their position in the water column. The Conformal Spatiotemporal Distance (CSD) technique on the other hand revealed that, in spite of the unpredictability, alarmed individuals exhibit lower variability in overall swim patterns, possibly accounting for the widely held notion of "stereotypy" in alarm responses. More generally, we propose that these new applications of established computational geometric techniques are useful in combination to represent, compare, and quantify complex behaviors consisting of common action patterns that differ in duration, sequence, or frequency.
Affiliation(s)
- Soo Go
  - Yale-NUS College, Singapore, Singapore
- Ajay S Mathuru
  - Yale-NUS College, Singapore, Singapore
  - Institute of Molecular and Cell Biology (IMCB), Singapore, Singapore
  - Department of Physiology, Yong Loo Lin School of Medicine (YLL), National University of Singapore, Singapore, Singapore
|
8
|
Zhang X, Chen A, Zhou G, Zhang Z, Huang X, Qiang X. Spectrogram-frame linear network and continuous frame sequence for bird sound classification. Ecol Inform 2019. [DOI: 10.1016/j.ecoinf.2019.101009] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 10/25/2022]
|
9
|
Goh GH, Maloney SK, Mark PJ, Blache D. Episodic Ultradian Events-Ultradian Rhythms. Biology 2019; 8:E15. [PMID: 30875767 PMCID: PMC6466064 DOI: 10.3390/biology8010015] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Received: 01/27/2019] [Revised: 02/24/2019] [Accepted: 03/09/2019] [Indexed: 11/16/2022]
Abstract
In the fast lane of chronobiology, ultradian events are short-term rhythms that have been observed since the beginning of modern biology and were quantified about a century ago. They are ubiquitous in all biological systems and found in all organisms, from unicellular organisms to mammals, and from single cells to complex biological functions in multicellular animals. Since these events are aperiodic and last for a few minutes to a few hours, they are better classified as episodic ultradian events (EUEs). Their origin is unclear. However, they could have a molecular basis and could be controlled by hormonal inputs; in vertebrates, they originate from the activity of the central nervous system. EUEs are receiving increasing attention, but their aperiodic nature requires specific sampling and analytic tools. While longer scale rhythms are adaptations to predictable changes in the environment, in theory, EUEs could contribute to adaptation by preparing organisms and biological functions for unpredictability.
Affiliation(s)
- Grace H Goh
  - School of Human Sciences, Faculty of Science, The University of Western Australia, 35 Stirling Highway, Crawley 6009, Western Australia, Australia
- Shane K Maloney
  - School of Human Sciences, Faculty of Science, The University of Western Australia, 35 Stirling Highway, Crawley 6009, Western Australia, Australia
- Peter J Mark
  - School of Human Sciences, Faculty of Science, The University of Western Australia, 35 Stirling Highway, Crawley 6009, Western Australia, Australia
- Dominique Blache
  - School of Agriculture and Environment and UWA Institute of Agriculture, Faculty of Science, The University of Western Australia, 35 Stirling Highway, Crawley 6009, Western Australia, Australia
|
10
|
Grant AD, Wilsterman K, Smarr BL, Kriegsfeld LJ. Evidence for a Coupled Oscillator Model of Endocrine Ultradian Rhythms. J Biol Rhythms 2018; 33:475-496. [PMID: 30132387 DOI: 10.1177/0748730418791423] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Indexed: 12/23/2022]
Abstract
Whereas long-period temporal structures in endocrine dynamics have been well studied, endocrine rhythms on the scale of hours are relatively unexplored. The study of these ultradian rhythms (URs) has remained nascent, in part, because a theoretical framework unifying ultradian patterns across systems has not been established. The present overview proposes a conceptual coupled oscillator network model of URs in which oscillating hormonal outputs, or nodes, are connected by edges representing the strength of node-node coupling. We propose that variable-strength coupling exists both within and across classic hormonal axes. Because coupled oscillators synchronize, such a model implies that changes across hormonal systems could be inferred by surveying accessible nodes in the network. This implication would at once simplify the study of URs and open new avenues of exploration into conditions affecting coupling. In support of this proposed framework, we review mammalian evidence for (1) URs of the gut-brain axis and the hypothalamo-pituitary-thyroid, -adrenal, and -gonadal axes, (2) UR coupling within and across these axes; and (3) the relation of these URs to body temperature. URs across these systems exhibit behavior broadly consistent with a coupled oscillator network, maintaining both consistent URs and coupling within and across axes. This model may aid the exploration of mammalian physiology at high temporal resolution and improve the understanding of endocrine system dynamics within individuals.
Affiliation(s)
- Azure D Grant
  - The Helen Wills Neuroscience Institute, University of California, Berkeley, California
- Kathryn Wilsterman
  - Department of Integrative Biology, University of California, Berkeley, California
- Benjamin L Smarr
  - Department of Psychology, University of California, Berkeley, California
- Lance J Kriegsfeld
  - The Helen Wills Neuroscience Institute, University of California, Berkeley, California
  - Department of Psychology, University of California, Berkeley, California
|
11
|
Le LN, Jones DL. Tensorial dynamic time warping with articulation index representation for efficient audio-template learning. The Journal of the Acoustical Society of America 2018; 143:1548. [PMID: 29604702 DOI: 10.1121/1.5027245] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
Audio classification techniques often depend on the availability of a large labeled training dataset for successful performance. However, in many application domains of audio classification (e.g., wildlife monitoring), obtaining labeled data is still a costly and laborious process. Motivated by this observation, a technique is proposed to efficiently learn a clean template from a few labeled, but likely corrupted (by noise and interferences), data samples. This learning can be done efficiently via tensorial dynamic time warping on the articulation index-based time-frequency representations of audio data. The learned template can then be used in audio classification following the standard template-based approach. Experimental results show that the proposed approach outperforms both (1) the recurrent neural network approach and (2) the state-of-the-art in the template-based approach on a wildlife detection application with few training samples.
Affiliation(s)
- Long N Le
  - Electrical and Computer Engineering Department, University of Illinois at Urbana-Champaign, Urbana, Illinois, 61801 USA
- Douglas L Jones
  - Electrical and Computer Engineering Department, University of Illinois at Urbana-Champaign, Urbana, Illinois, 61801 USA
|
12
|
Mueen A, Chavoshi N, Abu-El-Rub N, Hamooni H, Minnich A, MacCarthy J. Speeding up dynamic time warping distance for sparse time series data. Knowl Inf Syst 2017. [DOI: 10.1007/s10115-017-1119-0] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Indexed: 11/30/2022]
|
13
|
Qian K, Zhang Z, Baird A, Schuller B. Active learning for bird sound classification via a kernel-based extreme learning machine. The Journal of the Acoustical Society of America 2017; 142:1796. [PMID: 29092546 DOI: 10.1121/1.5004570] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Indexed: 06/07/2023]
Abstract
In recent years, research fields including ecology, bioacoustics, signal processing, and machine learning have made bird sound recognition a part of their focus. This has led to significant advancements within the field of ornithology, such as improved understanding of evolution, local biodiversity, mating rituals, and even the implications and realities associated with climate change. The volume of unlabeled bird sound data is now overwhelming, and comparatively little exploration is being made into methods for how best to handle them. In this study, two active learning (AL) methods are proposed, sparse-instance-based active learning (SI-AL) and least-confidence-score-based active learning (LCS-AL), both effectively reducing the need for expert human annotation. To both of these AL paradigms, a kernel-based extreme learning machine (KELM) is then integrated, and a comparison is made to the conventional support vector machine (SVM). Experimental results demonstrate that, when the classifier capacity is improved from an unweighted average recall of 60% to 80%, KELM can outperform SVM even when a limited proportion of human annotations are used from the pool of data in both cases of SI-AL (minimum 34.5% vs minimum 59.0%) and LCS-AL (minimum 17.3% vs minimum 28.4%).
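The least-confidence idea named in the abstract above can be illustrated with generic least-confidence sampling: from a pool of unlabeled clips, send the ones whose top predicted class probability is lowest to a human annotator. This is a minimal sketch of that general technique, not the paper's LCS-AL procedure, whose details are not given here. The function name `least_confidence_query` and the `budget` parameter are hypothetical.

```python
def least_confidence_query(probabilities, budget):
    """Pick the indices of the `budget` least confidently classified items.

    `probabilities` is a list of per-item class-probability lists, as
    produced by any probabilistic classifier (assumed, not from the paper).
    Confidence is taken to be the highest class probability of each item.
    """
    scored = [(max(probs), index) for index, probs in enumerate(probabilities)]
    scored.sort()  # lowest confidence first; ties broken by index
    return [index for _, index in scored[:budget]]


if __name__ == "__main__":
    # Toy classifier outputs for three unlabeled clips.
    pool = [[0.90, 0.10], [0.55, 0.45], [0.70, 0.30]]
    print(least_confidence_query(pool, budget=2))
```

In an active learning loop, the returned items would be labeled by an expert, added to the training set, and the classifier retrained before the next query round.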
Affiliation(s)
- Kun Qian
  - Machine Intelligence and Signal Processing Group, Chair of Human-Machine Communication, Technische Universität München, Arcisstr. 21, Munich 80333, Germany
- Zixing Zhang
  - Chair of Complex and Intelligent Systems, University of Passau, Innstr. 43, Passau 94032, Germany
- Alice Baird
  - Chair of Complex and Intelligent Systems, University of Passau, Innstr. 43, Passau 94032, Germany
- Björn Schuller
  - GLAM-Group on Language, Audio and Music, Department of Computing, Imperial College London, 180 Queens' Gate, Huxley Building, London SW7 2AZ, United Kingdom
|
14
|
|
15
|
Kaewtip K, Alwan A, O'Reilly C, Taylor CE. A robust automatic birdsong phrase classification: A template-based approach. The Journal of the Acoustical Society of America 2016; 140:3691. [PMID: 27908084 DOI: 10.1121/1.4966592] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 06/06/2023]
Abstract
Automatic phrase detection systems of bird sounds are useful in several applications as they reduce the need for manual annotations. However, birdphrase detection is challenging due to limited training data and background noise. Limited data occur because of limited recordings or the existence of rare phrases. Background noise interference occurs because of the intrinsic nature of the recording environment such as wind or other animals. This paper presents a different approach to birdsong phrase classification using template-based techniques suitable even for limited training data and noisy environments. The algorithm utilizes dynamic time-warping (DTW) and prominent (high-energy) time-frequency regions of training spectrograms to derive templates. The performance of the proposed algorithm is compared with the traditional DTW and hidden Markov models (HMMs) methods under several training and test conditions. DTW works well when the data are limited, while HMMs do better when more data are available, yet they both suffer when the background noise is severe. The proposed algorithm outperforms DTW and HMMs in most training and testing conditions, usually with a high margin when the background noise level is high. The innovation of this work is that the proposed algorithm is robust to both limited training data and background noise.
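The "traditional DTW" baseline mentioned in the abstract above rests on the classic dynamic time warping distance combined with nearest-template classification. The sketch below shows that textbook form on one-dimensional sequences. It is not the paper's algorithm, which additionally derives templates from prominent time-frequency regions of spectrograms. The names `dtw_distance` and `classify_by_template` and the toy templates are hypothetical.

```python
def dtw_distance(a, b):
    """Classic DTW distance between two numeric sequences.

    Uses absolute difference as the local cost and allows the usual
    three moves (insertion, deletion, match) with no window constraint.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def classify_by_template(query, templates):
    """Return the label of the template closest to `query` under DTW."""
    return min(templates, key=lambda label: dtw_distance(query, templates[label]))


if __name__ == "__main__":
    # Toy one-dimensional "phrases"; real inputs would be feature frames.
    templates = {"rising": [1, 2, 3], "falling": [3, 2, 1]}
    print(classify_by_template([1, 1, 2, 3, 3], templates))
```

Because DTW stretches and compresses the time axis, a slow rendition of a phrase still matches its template, which is one reason template methods can work with few training examples.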
Affiliation(s)
- Kantapon Kaewtip
  - Department of Electrical Engineering, University of California, Los Angeles, 56-125B Engineering IV Building, Box 951594, Los Angeles, California 90095, USA
- Abeer Alwan
  - Department of Electrical Engineering, University of California, Los Angeles, 56-125B Engineering IV Building, Box 951594, Los Angeles, California 90095, USA
- Colm O'Reilly
  - Sigmedia, Department of Electronic and Electrical Engineering, Trinity College, Dublin, Ireland
- Charles E Taylor
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, 621 Charles Young Drive South, Los Angeles, California 90095, USA
|
16
|
Taylor CE, Huang Y, Yao K. Distributed sensor swarms for monitoring bird behavior: an integrated system using wildlife acoustics recorders. Artificial Life and Robotics 2016. [DOI: 10.1007/s10015-016-0295-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 11/27/2022]
|
17
|
Complexity, Predictability and Time Homogeneity of Syntax in the Songs of Cassin's Vireo (Vireo cassinii). PLoS One 2016; 11:e0150822. [PMID: 27050537 PMCID: PMC4822860 DOI: 10.1371/journal.pone.0150822] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 10/12/2015] [Accepted: 02/20/2016] [Indexed: 11/19/2022] [Open Access]
Abstract
Many species of animals deliver vocalizations in sequences presumed to be governed by internal rules, though the nature and complexity of these syntactical rules have been investigated in relatively few species. Here I present an investigation into the song syntax of fourteen male Cassin's Vireos (Vireo cassinii), a species whose song sequences are highly temporally structured. I compare their song sequences to three candidate models of varying levels of complexity (zero-order, first-order and second-order Markov models) and employ novel methods to interpolate between these three models. A variety of analyses, including sequence simulations, Fisher's exact tests, and model likelihood analyses, showed that the songs of this species are too complex to be described by a zero-order or first-order Markov model. The model that best fit the data was intermediate in complexity between a first- and second-order model, though I also present evidence that some transition probabilities are conditioned on up to three preceding phrases. In addition, sequences were shown to be predictable with more than 54% accuracy overall, and predictability was positively correlated with the rate of song delivery. An assessment of the time homogeneity of syntax showed that transition probabilities between phrase types are largely stable over time, but that there was some evidence for modest changes in syntax within and between breeding seasons, a finding that I interpret to represent changes in breeding stage and social context rather than irreversible, secular shifts in syntax over time. These findings constitute a valuable addition to our understanding of bird song syntax in free-living birds, and will contribute to future attempts to understand the evolutionary importance of bird song syntax in avian communication.
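A first-order Markov model, one of the candidate models named in the abstract above, conditions each phrase only on the phrase before it. The sketch below estimates transition probabilities from phrase sequences and predicts the most likely next phrase. It is a generic illustration, not the study's analysis code. The function names and the toy phrase labels are hypothetical.

```python
from collections import Counter, defaultdict


def fit_first_order(sequences):
    """Estimate P(next phrase | previous phrase) from observed sequences."""
    counts = defaultdict(Counter)
    for sequence in sequences:
        # Count every adjacent (previous, next) pair of phrases.
        for previous, following in zip(sequence, sequence[1:]):
            counts[previous][following] += 1
    model = {}
    for previous, counter in counts.items():
        total = sum(counter.values())
        model[previous] = {phrase: n / total for phrase, n in counter.items()}
    return model


def predict_next(model, previous):
    """Return the most probable next phrase, or None if `previous` is unseen."""
    options = model.get(previous)
    if not options:
        return None
    return max(options, key=options.get)


if __name__ == "__main__":
    # Toy phrase-type sequence, not data from the study.
    model = fit_first_order([["a", "b", "a", "b", "c"]])
    print(model)
    print(predict_next(model, "a"))
```

A second-order model would use the previous two phrases as the conditioning key, for example a tuple `(phrase_before, previous)`, at the cost of needing more data per transition estimate.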
|
18
|
Arriaga JG, Sanchez H, Vallejo EE, Hedley R, Taylor CE. Identification of Cassin's Vireo (Vireo cassinii) individuals from their acoustic sequences using an ensemble of learners. Neurocomputing 2016. [DOI: 10.1016/j.neucom.2015.05.129] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/15/2022]
|
19
|
|
20
|
Arriaga JG, Cody ML, Vallejo EE, Taylor CE. Bird-DB: A database for annotated bird song sequences. Ecol Inform 2015. [DOI: 10.1016/j.ecoinf.2015.01.007] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Indexed: 11/17/2022]
|