1. Batist CH, Dufourq E, Jeantet L, Razafindraibe MN, Randriamanantena F, Baden AL. An integrated passive acoustic monitoring and deep learning pipeline for black-and-white ruffed lemurs (Varecia variegata) in Ranomafana National Park, Madagascar. Am J Primatol 2024; 86:e23599. [PMID: 38244194] [DOI: 10.1002/ajp.23599]
Abstract
The urgent need for effective wildlife monitoring solutions in the face of global biodiversity loss has resulted in the emergence of conservation technologies such as passive acoustic monitoring (PAM). While PAM has been extensively used for marine mammals, birds, and bats, its application to primates is limited. Black-and-white ruffed lemurs (Varecia variegata) are a promising species to test PAM with due to their distinctive and loud roar-shrieks. Furthermore, these lemurs are challenging to monitor via traditional methods due to their fragmented and often unpredictable distribution in Madagascar's dense eastern rainforests. Our goal in this study was to develop a machine learning pipeline for automated call detection from PAM data, compare the effectiveness of PAM versus in-person observations, and investigate diel patterns in lemur vocal behavior. We did this study at Mangevo, Ranomafana National Park by concurrently conducting focal follows and deploying autonomous recorders in May-July 2019. We used transfer learning to build a convolutional neural network (optimized for recall) that automated the detection of lemur calls (57-h runtime; recall = 0.94, F1 = 0.70). We found that PAM outperformed in-person observations, saving time, money, and labor while also providing re-analyzable data. Using PAM yielded novel insights into V. variegata diel vocal patterns; we present the first published evidence of nocturnal calling. We developed a graphic user interface and open-sourced data and code, to serve as a resource for primatologists interested in implementing PAM and machine learning. By leveraging the potential of this pipeline, we can address the urgent need for effective primate population surveys to inform conservation strategies.
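A detector "optimized for recall", as described above, usually comes down to where the decision threshold is placed on the network's output scores. The following is a minimal sketch of that step, not the authors' published code; the label and score arrays and the target recall of 0.94 are placeholders.

```python
# Pick a decision threshold for a binary call detector so that validation
# recall reaches a target level (here 0.94), accepting lower precision.
import numpy as np
from sklearn.metrics import precision_recall_curve, f1_score

def recall_first_threshold(y_true, y_score, target_recall=0.94):
    """Return the strictest threshold whose recall still meets the target."""
    _, recall, thresholds = precision_recall_curve(y_true, y_score)
    # recall has len(thresholds) + 1 entries; drop the final point to align.
    ok = recall[:-1] >= target_recall
    if not ok.any():
        return thresholds.min()          # fall back to the most permissive cut
    return thresholds[ok].max()          # strictest cut that keeps recall

# Hypothetical validation labels and CNN scores (e.g. sigmoid outputs).
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.05, 0.6, 0.55, 0.3])

thr = recall_first_threshold(y_true, y_score)
y_pred = (y_score >= thr).astype(int)
print(f"threshold={thr:.2f}  F1={f1_score(y_true, y_pred):.2f}")
```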
Affiliation(s)
- Carly H Batist
- Department of Anthropology, City University of New York (CUNY) Graduate Center, New York, New York, USA
- New York Consortium in Evolutionary Primatology (NYCEP), New York, New York, USA
- Rainforest Connection (RFCx), Katy, Texas, USA
- Emmanuel Dufourq
- African Institute for Mathematical Sciences, Muizenberg, South Africa
- Department of Mathematical Sciences, Stellenbosch University, Stellenbosch, South Africa
- National Institute for Theoretical & Computational Sciences, Stellenbosch, South Africa
- African Institute for Mathematical Sciences, Research and Innovation Centre, Kigali, Rwanda
- Lorène Jeantet
- African Institute for Mathematical Sciences, Muizenberg, South Africa
- Department of Mathematical Sciences, Stellenbosch University, Stellenbosch, South Africa
- National Institute for Theoretical & Computational Sciences, Stellenbosch, South Africa
- Mendrika N Razafindraibe
- Department of Animal Biology, University of Antananarivo, Antananarivo, Madagascar
- Institut International de Science Sociale, Antananarivo, Madagascar
- Andrea L Baden
- Department of Anthropology, City University of New York (CUNY) Graduate Center, New York, New York, USA
- New York Consortium in Evolutionary Primatology (NYCEP), New York, New York, USA
- Department of Anthropology, Hunter College of City University of New York (CUNY), New York, New York, USA
2. Kather V, Seipel F, Berges B, Davis G, Gibson C, Harvey M, Henry LA, Stevenson A, Risch D. Development of a machine learning detector for North Atlantic humpback whale song. J Acoust Soc Am 2024; 155:2050-2064. [PMID: 38477612] [DOI: 10.1121/10.0025275]
Abstract
The study of humpback whale song using passive acoustic monitoring devices requires bioacousticians to manually review hours of audio recordings to annotate the signals. To vastly reduce the time of manual annotation through automation, a machine learning model was developed. Convolutional neural networks have made major advances in the previous decade, leading to a wide range of applications, including the detection of frequency-modulated vocalizations by cetaceans. A large dataset of over 60 000 audio segments of 4 s length is collected from the North Atlantic and used to fine-tune an existing model for humpback whale song detection in the North Pacific (see Allen, Harvey, Harrell, Jansen, Merkens, Wall, Cattiau, and Oleson (2021). Front. Mar. Sci. 8, 607321). Furthermore, different data augmentation techniques (time-shift, noise augmentation, and masking) are used to artificially increase the variability within the training set. Retraining and augmentation yield F-score values of 0.88 on a context-window basis and 0.89 on an hourly basis, with false positive rates of 0.05 on a context-window basis and 0.01 on an hourly basis. If necessary, usage and retraining of the existing model are made convenient by a framework (AcoDet, acoustic detector) built during this project. Combining the tools provided by this framework could save researchers hours of manual annotation time and, thus, accelerate their research.
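The three augmentation transforms named above (time-shift, noise addition, masking) are simple to reproduce in outline. Below is a hedged sketch applied to a spectrogram array; the array shapes, shift ranges, and scaling factors are illustrative, not those used in AcoDet.

```python
# Three simple augmentations for a (freq_bins, time_frames) spectrogram:
# circular time-shift, additive Gaussian noise, and a SpecAugment-style mask.
import numpy as np

rng = np.random.default_rng(0)

def time_shift(spec, max_frames=20):
    return np.roll(spec, rng.integers(-max_frames, max_frames + 1), axis=1)

def add_noise(spec, noise_scale=0.1):
    return spec + noise_scale * spec.std() * rng.normal(size=spec.shape)

def mask_block(spec, max_width=10):
    out = spec.copy()
    t0 = rng.integers(0, spec.shape[1] - max_width)
    out[:, t0:t0 + rng.integers(1, max_width + 1)] = spec.mean()
    return out

spec = rng.random((128, 256))            # stand-in for a 4-s context window
augmented = [f(spec) for f in (time_shift, add_noise, mask_block)]
print([a.shape for a in augmented])
```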
Affiliation(s)
- Vincent Kather
- Audio Communication and Technology, Technical University Berlin, Einsteinufer 17c, 10587, Berlin, Germany
- Fabian Seipel
- Audio Communication and Technology, Technical University Berlin, Einsteinufer 17c, 10587, Berlin, Germany
- Benoit Berges
- Wageningen Marine Research, Wageningen University and Research, IJmuiden, Noord Holland, 1976 CP, Netherlands
- Genevieve Davis
- National Oceanic and Atmospheric Administration (NOAA) Northeast Fisheries Science Center, 166 Water Street, Woods Hole, Massachusetts 02543, USA
- Catherine Gibson
- School of Biological Sciences, Queens University Belfast, Belfast, BT9 5DL, Northern Ireland
- Matt Harvey
- Google Inc., Mountain View, California 94043, USA
- Lea-Anne Henry
- School of GeoSciences, University of Edinburgh, James Hutton Road, EH9 3FE, Edinburgh, Scotland
- Denise Risch
- Scottish Association for Marine Science, University of Highlands and Islands, Oban, PA37 1QJ, Scotland
3. Ansong-Ansongton YON, Adamson TD. Computing Sickle Erythrocyte Health Index on quantitative phase imaging and machine learning. Exp Hematol 2024; 131:104166. [PMID: 38246310] [DOI: 10.1016/j.exphem.2024.104166]
Abstract
Sickle cell disease (SCD) is a genetic disorder characterized by abnormal hemoglobin and deformation of red blood cells (RBCs), leading to complications and reduced life expectancy. This study developed an in vitro assessment, the Sickle Erythrocyte Health Index, using quantitative phase imaging (QPI) and machine learning to model the health of RBCs in people with SCD. The health index combines assessment of cell deformation, sickle-shaped classification, and membrane flexibility to evaluate erythrocyte health. Using QPI and image processing, the percentage of sickle-shaped cells and membrane flexibility were quantified. Statistically significant differences were observed between individuals with and without SCD, indicating the impact of underlying pathophysiology on erythrocyte health. Additionally, sodium metabisulfite led to an increase in sickle-shaped cells and a decrease in flexibility in the sickle cell blood samples. Based on these findings, two approaches were used to calculate the Sickle Erythrocyte Health Index: one using hand-crafted features and one using learned features from deep learning models. Both indices showed significant differences between non-SCD and SCD groups and sensitivity to changes induced by sodium metabisulfite. The Sickle Erythrocyte Health Index has important clinical implications for SCD management and could be used by providers when making treatment decisions. Further research is warranted to evaluate the clinical utility and applicability of the Sickle Erythrocyte Health Index in diverse patient populations.
Affiliation(s)
- Yaw Ofosu Nyansa Ansong-Ansongton
- Department of Bioengineering, KovaDx, New Haven, CT; Department of Bioengineering, University of California, Berkeley, Berkeley, CA.
- Timothy D Adamson
- Department of Bioengineering, KovaDx, New Haven, CT; Department of Bioengineering, University of California, Berkeley, Berkeley, CA
4. Brickson L, Zhang L, Vollrath F, Douglas-Hamilton I, Titus AJ. Elephants and algorithms: a review of the current and future role of AI in elephant monitoring. J R Soc Interface 2023; 20:20230367. [PMID: 37963556] [PMCID: PMC10645515] [DOI: 10.1098/rsif.2023.0367]
Abstract
Artificial intelligence (AI) and machine learning (ML) present revolutionary opportunities to enhance our understanding of animal behaviour and conservation strategies. Using elephants, a crucial species in Africa and Asia's protected areas, as our focal point, we delve into the role of AI and ML in their conservation. Given the increasing amounts of data gathered from a variety of sensors like cameras, microphones, geophones, drones and satellites, the challenge lies in managing and interpreting this vast data. New AI and ML techniques offer solutions to streamline this process, helping us extract vital information that might otherwise be overlooked. This paper focuses on the different AI-driven monitoring methods and their potential for improving elephant conservation. Collaborative efforts between AI experts and ecological researchers are essential in leveraging these innovative technologies for enhanced wildlife conservation, setting a precedent for numerous other species.
Affiliation(s)
- Fritz Vollrath
- Save the Elephants, Nairobi, Kenya
- Department of Biology, University of Oxford, Oxford, UK
- Alexander J. Titus
- Colossal Biosciences, Dallas, TX, USA
- Information Sciences Institute, University of Southern California, Los Angeles, USA
5. Erbs F, Gaona M, van der Schaar M, Zaugg S, Ramalho E, Houser D, André M. Towards automated long-term acoustic monitoring of endangered river dolphins: a case study in the Brazilian Amazon floodplains. Sci Rep 2023; 13:10801. [PMID: 37500656] [PMCID: PMC10374533] [DOI: 10.1038/s41598-023-36518-1]
Abstract
Using passive acoustic monitoring (PAM) and convolutional neural networks (CNN), we monitored the movements of the two endangered Amazon River dolphin species, the boto (Inia geoffrensis) and the tucuxi (Sotalia fluviatilis), from main rivers to floodplain habitats (várzea) in the Mamirauá Reserve (Amazonas, Brazil). We detected dolphin presence in four main areas based on the classification of their echolocation clicks. Using the same method, we automatically detected boat passages to estimate a possible interaction between boat and dolphin presence. Performance of the CNN classifier was high, with an average precision of 0.95 and 0.92 for echolocation clicks and boats, respectively. Peaks of acoustic activity were detected synchronously at the river entrance and channel, corresponding to dolphins seasonally entering the várzea. Additionally, the river dolphins were regularly detected inside the flooded forest, suggesting a wide dispersion of their populations inside this large area, traditionally understudied and particularly important for boto females and calves. Boats overlapped with dolphin presence 9% of the time. PAM and recent advances in classification methods provide new insight into the river dolphins' use of várzea habitats, which will contribute to conservation strategies for these species.
Affiliation(s)
- Florence Erbs
- Laboratori d'Aplicacions Bioacústiques, Universitat Politècnica de Catalunya - BarcelonaTech, Barcelona, Spain
- Marina Gaona
- Laboratori d'Aplicacions Bioacústiques, Universitat Politècnica de Catalunya - BarcelonaTech, Barcelona, Spain
- Instituto de Desenvolvimento Sustentável Mamirauá, Tefé, Brazil
- Mike van der Schaar
- Laboratori d'Aplicacions Bioacústiques, Universitat Politècnica de Catalunya - BarcelonaTech, Barcelona, Spain
- Serge Zaugg
- Laboratori d'Aplicacions Bioacústiques, Universitat Politècnica de Catalunya - BarcelonaTech, Barcelona, Spain
- Michel André
- Laboratori d'Aplicacions Bioacústiques, Universitat Politècnica de Catalunya - BarcelonaTech, Barcelona, Spain.
6. Hauer C, Nöth E, Barnhill A, Maier A, Guthunz J, Hofer H, Cheng RX, Barth V, Bergler C. ORCA-SPY enables killer whale sound source simulation, detection, classification and localization using an integrated deep learning-based segmentation. Sci Rep 2023; 13:11106. [PMID: 37429871] [DOI: 10.1038/s41598-023-38132-7]
Abstract
Acoustic identification of vocalizing individuals opens up new and deeper insights into animal communications, such as individual-/group-specific dialects, turn-taking events, and dialogs. However, establishing an association between an individual animal and its emitted signal is usually non-trivial, especially for animals underwater. Consequently, a collection of marine species-, array-, and position-specific ground truth localization data is extremely challenging, which strongly limits possibilities to evaluate localization methods beforehand or at all. This study presents ORCA-SPY, a fully-automated sound source simulation, classification and localization framework for passive killer whale (Orcinus orca) acoustic monitoring that is embedded into PAMGuard, a widely used bioacoustic software toolkit. ORCA-SPY enables array- and position-specific multichannel audio stream generation to simulate real-world ground truth killer whale localization data and provides a hybrid sound source identification approach integrating ANIMAL-SPOT, a state-of-the-art deep learning-based orca detection network, followed by downstream Time-Difference-Of-Arrival localization. ORCA-SPY was evaluated on simulated multichannel underwater audio streams including various killer whale vocalization events within a large-scale experimental setup benefiting from previous real-world fieldwork experience. Across all 58,320 embedded vocalizing killer whale events, subject to various hydrophone array geometries, call types, distances, and noise conditions responsible for a signal-to-noise ratio varying from [Formula: see text] dB to 3 dB, a detection rate of 94.0 % was achieved with an average localization error of 7.01[Formula: see text]. ORCA-SPY was field-tested on Lake Stechlin in Brandenburg Germany under laboratory conditions with a focus on localization. During the field test, 3889 localization events were observed with an average error of 29.19[Formula: see text] and a median error of 17.54[Formula: see text]. ORCA-SPY was deployed successfully during the DeepAL fieldwork 2022 expedition (DLFW22) in Northern British Columbia, with a mean average error of 20.01[Formula: see text] and a median error of 11.01[Formula: see text] across 503 localization events. ORCA-SPY is an open-source and publicly available software framework, which can be adapted to various recording conditions as well as animal species.
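The downstream Time-Difference-Of-Arrival step rests on estimating the delay between channel pairs, classically via cross-correlation. The sketch below shows that generic estimate only; it is not ORCA-SPY's or PAMGuard's actual code, and the sample rate, signal, and delay are made up.

```python
# Estimate the time difference of arrival (TDOA) between two hydrophone
# channels by locating the peak of their full cross-correlation.
import numpy as np

def estimate_tdoa(x, y, sample_rate):
    """Positive result means the signal reaches channel y after channel x."""
    corr = np.correlate(y, x, mode="full")
    lag = np.argmax(corr) - (len(x) - 1)   # lag in samples
    return lag / sample_rate

fs = 48_000                                # assumed sample rate (Hz)
t = np.arange(0, 0.1, 1 / fs)
call = np.sin(2 * np.pi * 4000 * t) * np.hanning(t.size)

delay_samples = 120                        # simulated inter-channel delay
ch_a = np.pad(call, (0, delay_samples))
ch_b = np.pad(call, (delay_samples, 0))    # same call, arriving later

print(f"estimated TDOA: {estimate_tdoa(ch_a, ch_b, fs)*1e3:.2f} ms "
      f"(true {delay_samples/fs*1e3:.2f} ms)")
```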
Affiliation(s)
- Christopher Hauer
- Pattern Recognition Lab, Department of Computer Science, Friedrich-Alexander-Universität Erlangen-Nürnberg, Martensstr. 3, 91058, Erlangen, Germany.
- Elmar Nöth
- Pattern Recognition Lab, Department of Computer Science, Friedrich-Alexander-Universität Erlangen-Nürnberg, Martensstr. 3, 91058, Erlangen, Germany
- Alexander Barnhill
- Pattern Recognition Lab, Department of Computer Science, Friedrich-Alexander-Universität Erlangen-Nürnberg, Martensstr. 3, 91058, Erlangen, Germany
- Andreas Maier
- Pattern Recognition Lab, Department of Computer Science, Friedrich-Alexander-Universität Erlangen-Nürnberg, Martensstr. 3, 91058, Erlangen, Germany
- Heribert Hofer
- Leibniz Institute for Zoo and Wildlife Research (IZW), Alfred-Kowalke-Straße 17, 10315, Berlin, Germany
- Department of Veterinary Medicine, Freie Universität Berlin, 14195, Berlin, Germany
- Department of Biology, Chemistry, Pharmacy, Freie Universität Berlin, 14195, Berlin, Germany
- Rachael Xi Cheng
- Leibniz Institute for Zoo and Wildlife Research (IZW), Alfred-Kowalke-Straße 17, 10315, Berlin, Germany
- Volker Barth
- Anthro-Media, Nansenstr. 19, 12047, Berlin, Germany
- Christian Bergler
- Pattern Recognition Lab, Department of Computer Science, Friedrich-Alexander-Universität Erlangen-Nürnberg, Martensstr. 3, 91058, Erlangen, Germany.
7. Sadaiappan B, Balakrishnan P, C.R. V, Vijayan NT, Subramanian M, Gauns MU. Applications of Machine Learning in Chemical and Biological Oceanography. ACS Omega 2023; 8:15831-15853. [PMID: 37179641] [PMCID: PMC10173431] [DOI: 10.1021/acsomega.2c06441]
Abstract
Machine learning (ML) refers to computer algorithms that predict a meaningful output or categorize complex systems based on a large amount of data. ML is applied in various areas including natural science, engineering, space exploration, and even gaming development. This review focuses on the use of machine learning in the field of chemical and biological oceanography. In the prediction of global fixed nitrogen levels, partial pressure of carbon dioxide, and other chemical properties, the application of ML is a promising tool. Machine learning is also utilized in the field of biological oceanography to detect planktonic forms from various images (i.e., microscopy, FlowCAM, and video recorders), spectrometers, and other signal processing techniques. Moreover, ML has successfully classified marine mammals from their acoustics and detected endangered mammalian and fish species in specific environments. Most importantly, using environmental data, ML proved to be an effective method for predicting hypoxic conditions and harmful algal bloom events, an essential capability for environmental monitoring. Furthermore, machine learning was used to construct a number of databases for various species that will be useful to other researchers, and the creation of new algorithms will help the marine research community better comprehend the chemistry and biology of the ocean.
Affiliation(s)
- Balamurugan Sadaiappan
- Department of Biology, United Arab Emirates University, Al Ain 971, UAE
- Plankton Laboratory, Biological Oceanography Division, CSIR-National Institute of Oceanography, Dona Paula, Goa 403004, India
- Preethiya Balakrishnan
- Faraday-Fleming Laboratory, London W148TL, United Kingdom
- University of London, London WC1E 7HU, United Kingdom
- Vishal C.R.
- Plankton Laboratory, Biological Oceanography Division, CSIR-National Institute of Oceanography, Dona Paula, Goa 403004, India
- Neethu T. Vijayan
- Plankton Laboratory, Biological Oceanography Division, CSIR-National Institute of Oceanography, Dona Paula, Goa 403004, India
- Mahendran Subramanian
- Faraday-Fleming Laboratory, London W148TL, United Kingdom
- Department of Computing, Imperial College, London SW7 2AZ, United Kingdom
- Mangesh U. Gauns
- Plankton Laboratory, Biological Oceanography Division, CSIR-National Institute of Oceanography, Dona Paula, Goa 403004, India
8. Goldwater M, Zitterbart DP, Wright D, Bonnel J. Machine-learning-based simultaneous detection and ranging of impulsive baleen whale vocalizations using a single hydrophone. J Acoust Soc Am 2023; 153:1094. [PMID: 36859165] [DOI: 10.1121/10.0017118]
Abstract
The low-frequency impulsive gunshot vocalizations of baleen whales exhibit dispersive propagation in shallow-water channels which is well-modeled by normal mode theory. Typically, underwater acoustic source range estimation requires multiple time-synchronized hydrophone arrays which can be difficult and expensive to achieve. However, single-hydrophone modal dispersion has been used to range baleen whale vocalizations and estimate shallow-water geoacoustic properties. Although convenient when compared to sensor arrays, these algorithms require preliminary signal detection and human labor to estimate the modal dispersion. In this paper, we apply a temporal convolutional network (TCN) to spectrograms from single-hydrophone acoustic data for simultaneous gunshot detection and ranging. The TCN learns ranging and detection jointly using gunshots simulated across multiple environments and ranges along with experimental noise. The synthetic data are informed by only the water column depth, sound speed, and density of the experimental environment, while other parameters span empirically observed bounds. The method is experimentally verified on North Pacific right whale gunshot data collected in the Bering Sea. To do so, 50 dispersive gunshots were manually ranged using the state-of-the-art time-warping inversion method. The TCN detected these gunshots among 50 noise-only examples with high precision and estimated ranges which closely matched those of the physics-based approach.
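The joint detection-and-ranging idea maps naturally onto a shared convolutional trunk with two output heads. The PyTorch sketch below illustrates that general pattern under assumed input dimensions; it is not the authors' TCN, and the layer sizes and the "range in km" head are placeholders.

```python
# A toy temporal convolutional network with a shared dilated-conv trunk and
# two heads: gunshot presence (sigmoid) and source range (regression).
import torch
import torch.nn as nn

class DetectAndRange(nn.Module):
    def __init__(self, n_freq_bins=64, channels=32):
        super().__init__()
        layers, in_ch = [], n_freq_bins
        for dilation in (1, 2, 4, 8):      # growing receptive field over time
            layers += [nn.Conv1d(in_ch, channels, kernel_size=3,
                                 padding=dilation, dilation=dilation),
                       nn.ReLU()]
            in_ch = channels
        self.trunk = nn.Sequential(*layers)
        self.detect_head = nn.Linear(channels, 1)
        self.range_head = nn.Linear(channels, 1)

    def forward(self, spec):               # spec: (batch, n_freq_bins, frames)
        h = self.trunk(spec).mean(dim=-1)  # global average pool over time
        return torch.sigmoid(self.detect_head(h)), self.range_head(h)

model = DetectAndRange()
batch = torch.randn(8, 64, 200)            # 8 fake spectrogram excerpts
p_detect, range_km = model(batch)
print(p_detect.shape, range_km.shape)      # torch.Size([8, 1]) twice
```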
Affiliation(s)
- Mark Goldwater
- Applied Ocean Physics and Engineering, Woods Hole Oceanographic Institution, Woods Hole, Massachusetts 02543, USA
- Daniel P Zitterbart
- Applied Ocean Physics and Engineering, Woods Hole Oceanographic Institution, Woods Hole, Massachusetts 02543, USA
- Dana Wright
- Duke University Marine Laboratory, Beaufort, North Carolina 28516, USA
- Julien Bonnel
- Applied Ocean Physics and Engineering, Woods Hole Oceanographic Institution, Woods Hole, Massachusetts 02543, USA
9. There You Are! Automated Detection of Indris' Songs on Features Extracted from Passive Acoustic Recordings. Animals (Basel) 2023; 13:ani13020241. [PMID: 36670780] [PMCID: PMC9855168] [DOI: 10.3390/ani13020241]
Abstract
The growing concern for the ongoing biodiversity loss drives researchers towards practical and large-scale automated systems to monitor wild animal populations. Primates, with most species threatened by extinction, face substantial risks. We focused on the vocal activity of the indri (Indri indri) recorded in Maromizaha Forest (Madagascar) from 2019 to 2021 via passive acoustics, a method increasingly used for monitoring activities in different environments. We first used indris' songs, loud distinctive vocal sequences, to detect the species' presence. We processed the raw data (66,443 10-min recordings) and extracted acoustic features based on the third-octave band system. We then analysed the features extracted from three datasets, divided according to sampling year, site, and recorder type, with a convolutional neural network that was able to generalise to recording sites and previously unsampled periods via data augmentation and transfer learning. For the three datasets, our network detected the song presence with high accuracy (>90%) and recall (>80%) values. Once the model was provided with the time and day of recording, the high performance values ensured that the classification process could accurately depict both the daily and annual patterns of the indris' singing, critical information to optimise field data collection. Overall, using this easy-to-implement species-specific detection workflow as a preprocessing method allows researchers to reduce the time dedicated to manual classification.
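The third-octave band features mentioned above can be approximated by integrating spectral energy between band edges spaced at ratios of 2^(1/3). The following is a simplified sketch under assumed recording parameters, not the study's exact preprocessing.

```python
# Approximate third-octave band levels for one audio frame by integrating
# FFT power between band edges spaced a factor of 2**(1/3) apart.
import numpy as np

def third_octave_levels(frame, fs, f_low=25.0, f_high=8000.0):
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(frame.size))) ** 2
    freqs = np.fft.rfftfreq(frame.size, d=1 / fs)
    centres, levels = [], []
    fc = f_low
    while fc <= f_high:
        lo, hi = fc / 2 ** (1 / 6), fc * 2 ** (1 / 6)    # band edges
        band = spectrum[(freqs >= lo) & (freqs < hi)]
        centres.append(fc)
        levels.append(10 * np.log10(band.sum() + 1e-12)) # band level in dB
        fc *= 2 ** (1 / 3)                               # next third-octave
    return np.array(centres), np.array(levels)

fs = 16_000                                   # assumed sample rate
frame = np.random.default_rng(1).normal(size=fs)  # 1 s of noise as a stand-in
centres, levels = third_octave_levels(frame, fs)
print(len(centres), "bands, e.g.", centres[:4])
```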
10. McGinn K, Kahl S, Peery MZ, Klinck H, Wood CM. Feature embeddings from the BirdNET algorithm provide insights into avian ecology. Ecol Inform 2023. [DOI: 10.1016/j.ecoinf.2023.101995]
11. Bergler C, Smeele SQ, Tyndel SA, Barnhill A, Ortiz ST, Kalan AK, Cheng RX, Brinkløv S, Osiecka AN, Tougaard J, Jakobsen F, Wahlberg M, Nöth E, Maier A, Klump BC. ANIMAL-SPOT enables animal-independent signal detection and classification using deep learning. Sci Rep 2022; 12:21966. [PMID: 36535999] [PMCID: PMC9763499] [DOI: 10.1038/s41598-022-26429-y]
Abstract
Bioacoustic research spans a wide range of biological questions and applications, relying on identification of target species or smaller acoustic units, such as distinct call types. However, manually identifying the signal of interest is time-intensive, error-prone, and becomes unfeasible with large data volumes. Therefore, machine-driven algorithms are increasingly applied to various bioacoustic signal identification challenges. Nevertheless, biologists still have major difficulties trying to transfer existing animal- and/or scenario-related machine learning approaches to their specific animal datasets and scientific questions. This study presents an animal-independent, open-source deep learning framework, along with a detailed user guide. Three signal identification tasks, commonly encountered in bioacoustics research, were investigated: (1) target signal vs. background noise detection, (2) species classification, and (3) call type categorization. ANIMAL-SPOT successfully segmented human-annotated target signals in data volumes representing 10 distinct animal species and 1 additional genus, resulting in a mean test accuracy of 97.9%, together with an average area under the ROC curve (AUC) of 95.9%, when predicting on unseen recordings. Moreover, an average segmentation accuracy and F1-score of 95.4% was achieved on the publicly available BirdVox-Full-Night data corpus. In addition, multi-class species and call type classification resulted in 96.6% and 92.7% accuracy on unseen test data, as well as 95.2% and 88.4% regarding previous animal-specific machine-based detection excerpts. Furthermore, an Unweighted Average Recall (UAR) of 89.3% outperformed the multi-species classification baseline system of the ComParE 2021 Primate Sub-Challenge. Besides animal independence, ANIMAL-SPOT does not rely on expert knowledge or special computing resources, thereby making deep-learning-based bioacoustic signal identification accessible to a broad audience.
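The Unweighted Average Recall reported for the ComParE comparison is simply per-class recall averaged with equal weight per class. A quick sketch with made-up labels:

```python
# Unweighted Average Recall (UAR): per-class recall averaged with equal
# weight per class, regardless of how many samples each class has.
import numpy as np
from sklearn.metrics import recall_score

y_true = np.array(["chimp", "chimp", "mandrill", "guenon", "guenon", "guenon"])
y_pred = np.array(["chimp", "mandrill", "mandrill", "guenon", "guenon", "chimp"])

uar = recall_score(y_true, y_pred, average="macro")
print(f"UAR = {uar:.3f}")   # mean of per-class recalls: (0.5 + 1.0 + 0.667) / 3
```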
Affiliation(s)
- Christian Bergler
- Pattern Recognition Lab, Department of Computer Science, Friedrich-Alexander-Universität Erlangen-Nürnberg, 91058 Erlangen, Germany
- Simeon Q. Smeele
- Cognitive and Cultural Ecology Lab, Max Planck Institute of Animal Behavior, 78315 Radolfzell, Germany
- Department of Human Behavior, Ecology and Culture, Max Planck Institute for Evolutionary Anthropology, 04103 Leipzig, Germany
- Biology Department, University of Konstanz, 78464 Constance, Germany
- Stephen A. Tyndel
- Cognitive and Cultural Ecology Lab, Max Planck Institute of Animal Behavior, 78315 Radolfzell, Germany
- Department of Natural Resources and Environmental Sciences, University of Illinois Urbana-Champaign, Champaign, IL, United States
- Alexander Barnhill
- Pattern Recognition Lab, Department of Computer Science, Friedrich-Alexander-Universität Erlangen-Nürnberg, 91058 Erlangen, Germany
- Sara T. Ortiz
- Max Planck Institute for Biological Intelligence, in Foundation, Seewiesen, Eberhard-Gwinner-Strasse, 82319 Starnberg, Germany
- Ammie K. Kalan
- Department of Anthropology, University of Victoria, Victoria, BC V8P 5C2, Canada
- Rachael Xi Cheng
- Leibniz Institute for Zoo and Wildlife Research, Alfred-Kowalke-Straße 17, 10315 Berlin, Germany
- Signe Brinkløv
- Department of Bioscience, Wildlife Ecology, Aarhus University, 8410 Rønde, Denmark
- Anna N. Osiecka
- Department of Vertebrate Ecology and Zoology, Faculty of Biology, University of Gdańsk, 80-308 Gdańsk, Poland
- Jakob Tougaard
- Department of Bioscience, Marine Mammal Research, Aarhus University, 4000 Roskilde, Denmark
- Freja Jakobsen
- Department of Biology, University of Southern Denmark, 5230 Odense, Denmark
- Magnus Wahlberg
- Department of Biology, University of Southern Denmark, 5230 Odense, Denmark
- Elmar Nöth
- Pattern Recognition Lab, Department of Computer Science, Friedrich-Alexander-Universität Erlangen-Nürnberg, 91058 Erlangen, Germany
- Andreas Maier
- Pattern Recognition Lab, Department of Computer Science, Friedrich-Alexander-Universität Erlangen-Nürnberg, 91058 Erlangen, Germany
- Barbara C. Klump
- Cognitive and Cultural Ecology Lab, Max Planck Institute of Animal Behavior, 78315 Radolfzell, Germany
12. Conant PC, Li P, Liu X, Klinck H, Fleishman E, Gillespie D, Nosal EM, Roch MA. Silbido profundo: An open source package for the use of deep learning to detect odontocete whistles. J Acoust Soc Am 2022; 152:3800. [PMID: 36586843] [DOI: 10.1121/10.0016631]
Abstract
This work presents an open-source matlab software package for exploiting recent advances in extracting tonal signals from large acoustic data sets. A whistle extraction algorithm published by Li, Liu, Palmer, Fleishman, Gillespie, Nosal, Shiu, Klinck, Cholewiak, Helble, and Roch [(2020). Proceedings of the International Joint Conference on Neural Networks, July 19-24, Glasgow, Scotland, p. 10] is incorporated into silbido, an established software package for extraction of cetacean tonal calls. The precision and recall of the new system were over 96% and nearly 80%, respectively, when applied to a whistle extraction task on a challenging two-species subset of a conference-benchmark data set. A second data set was examined to assess whether the algorithm generalized to data that were collected across different recording devices and locations. These data included 487 h of weakly labeled, towed array data collected in the Pacific Ocean on two National Oceanographic and Atmospheric Administration (NOAA) cruises. Labels for these data consisted of regions of toothed whale presence for at least 15 species that were based on visual and acoustic observations and not limited to whistles. Although the lack of per whistle-level annotations prevented measurement of precision and recall, there was strong concurrence of automatic detections and the NOAA annotations, suggesting that the algorithm generalizes well to new data.
Affiliation(s)
- Peter C Conant
- Department of Computer Science, San Diego State University, San Diego, California 92182, USA
- Pu Li
- Department of Computer Science, San Diego State University, San Diego, California 92182, USA
- Xiaobai Liu
- Department of Computer Science, San Diego State University, San Diego, California 92182, USA
- Holger Klinck
- K. Lisa Yang Center for Conservation Bioacoustics, Cornell Lab of Ornithology, Cornell University, Ithaca, New York 14850, USA
- Erica Fleishman
- College of Earth, Ocean, and Atmospheric Sciences, Oregon State University, Corvallis, Oregon 97331, USA
- Douglas Gillespie
- Sea Mammal Research Unit, Scottish Oceans Institute, University of St. Andrews, St. Andrews, KY16 9AJ, United Kingdom
- Eva-Marie Nosal
- Department of Ocean and Resources Engineering, University of Hawai'i at Mānoa, Honolulu, Hawaii 96822, USA
- Marie A Roch
- Department of Computer Science, San Diego State University, San Diego, California 92182, USA
13. Mutanu L, Gohil J, Gupta K, Wagio P, Kotonya G. A Review of Automated Bioacoustics and General Acoustics Classification Research. Sensors (Basel) 2022; 22:8361. [PMID: 36366061] [PMCID: PMC9658612] [DOI: 10.3390/s22218361]
Abstract
Automated bioacoustics classification has received increasing attention from the research community in recent years due to its cross-disciplinary nature and its diverse applications. Applications in bioacoustics classification range from smart acoustic sensor networks that investigate the effects of acoustic vocalizations on species to context-aware edge devices that anticipate changes in their environment and adapt their sensing and processing accordingly. The research described here is an in-depth survey of the current state of bioacoustics classification and monitoring. The survey examines bioacoustics classification alongside general acoustics to provide a representative picture of the research landscape. The survey reviewed 124 studies spanning eight years of research. The survey identifies the key application areas in bioacoustics research and the techniques used in audio transformation and feature extraction. The survey also examines the classification algorithms used in bioacoustics systems. Lastly, the survey examines current challenges, possible opportunities, and future directions in bioacoustics.
Affiliation(s)
- Leah Mutanu
- Department of Computing, United States International University Africa, Nairobi P.O. Box 14634-0800, Kenya
- Jeet Gohil
- Department of Computing, United States International University Africa, Nairobi P.O. Box 14634-0800, Kenya
- Khushi Gupta
- Department of Computer Science, Sam Houston State University, Huntsville, TX 77341, USA
- Perpetua Wagio
- Department of Computing, United States International University Africa, Nairobi P.O. Box 14634-0800, Kenya
- Gerald Kotonya
- School of Computing and Communications, Lancaster University, Lancaster LA1 4WA, UK
14. Andreas J, Beguš G, Bronstein MM, Diamant R, Delaney D, Gero S, Goldwasser S, Gruber DF, de Haas S, Malkin P, Pavlov N, Payne R, Petri G, Rus D, Sharma P, Tchernov D, Tønnesen P, Torralba A, Vogt D, Wood RJ. Toward understanding the communication in sperm whales. iScience 2022; 25:104393. [PMID: 35663036] [PMCID: PMC9160774] [DOI: 10.1016/j.isci.2022.104393]
Abstract
Machine learning has been advancing dramatically over the past decade. Most strides are human-based applications due to the availability of large-scale datasets; however, opportunities are ripe to apply this technology to more deeply understand non-human communication. We detail a scientific roadmap for advancing the understanding of communication of whales that can be built further upon as a template to decipher other forms of animal and non-human communication. Sperm whales, with their highly developed neuroanatomical features, cognitive abilities, social structures, and discrete click-based encoding make for an excellent model for advanced tools that can be applied to other animals in the future. We outline the key elements required for the collection and processing of massive datasets, detecting basic communication units and language-like higher-level structures, and validating models through interactive playback experiments. The technological capabilities developed by such an undertaking hold potential for cross-applications in broader communities investigating non-human communication and behavioral research.
Affiliation(s)
- Jacob Andreas
- MIT CSAIL, Cambridge, MA, USA
- Project CETI, New York, NY, USA
- Gašper Beguš
- Department of Linguistics, University of California, Berkeley, CA, USA
- Project CETI, New York, NY, USA
- Michael M. Bronstein
- Department of Computer Science, University of Oxford, Oxford, UK
- IDSIA, University of Lugano, Lugano, Switzerland
- Twitter, London, UK
- Project CETI, New York, NY, USA
- Roee Diamant
- Leon H. Charney School of Marine Sciences, University of Haifa, Haifa, Israel
- Project CETI, New York, NY, USA
- Denley Delaney
- Exploration Technology Lab, National Geographic Society, Washington DC, USA
- Project CETI, New York, NY, USA
- Shane Gero
- Dominica Sperm Whale Project, Roseau, Commonwealth of Dominica
- Department of Biology, Carleton University, Ottawa, ON, Canada
- Project CETI, New York, NY, USA
- Shafi Goldwasser
- Simons Institute for the Theory of Computing, University of California, Berkeley, CA, USA
- David F. Gruber
- Department of Natural Sciences, Baruch College and The Graduate Center, PhD Program in Biology, City University of New York, New York, NY, USA
- Project CETI, New York, NY, USA
- Sarah de Haas
- Google Research, Mountain View, CA USA
- Project CETI, New York, NY, USA
- Peter Malkin
- Google Research, Mountain View, CA USA
- Project CETI, New York, NY, USA
- Giovanni Petri
- ISI Foundation, Turin, Italy
- Project CETI, New York, NY, USA
- Daniela Rus
- MIT CSAIL, Cambridge, MA, USA
- Project CETI, New York, NY, USA
- Dan Tchernov
- Leon H. Charney School of Marine Sciences, University of Haifa, Haifa, Israel
- Project CETI, New York, NY, USA
- Pernille Tønnesen
- Marine Bioacoustics Lab, Zoophysiology, Department of Biology, Aarhus University, Aarhus, Denmark
- Project CETI, New York, NY, USA
- Daniel Vogt
- School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, USA
- Project CETI, New York, NY, USA
- Robert J. Wood
- School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, USA
- Project CETI, New York, NY, USA
15. Parameterizing animal sounds and motion with animal-attached tags to study acoustic communication. Behav Ecol Sociobiol 2022. [DOI: 10.1007/s00265-022-03154-0]
Abstract
Stemming from the traditional use of field observers to score states and events, the study of animal behaviour often relies on analyses of discrete behavioural categories. Many studies of acoustic communication record sequences of animal sounds, classify vocalizations, and then examine how call categories are used relative to behavioural states and events. However, acoustic parameters can also convey information independent of call type, offering complementary study approaches to call classifications. Animal-attached tags can continuously sample high-resolution behavioural data on sounds and movements, which enables testing how acoustic parameters of signals relate to parameters of animal motion. Here, we present this approach through case studies on wild common bottlenose dolphins (Tursiops truncatus). Using data from sound-and-movement recording tags deployed in Sarasota (FL), we parameterized dolphin vocalizations and motion to investigate how senders and receivers modified movement parameters (including vectorial dynamic body acceleration, “VeDBA”, a proxy for activity intensity) as a function of signal parameters. We show that (1) VeDBA of one female during consortships had a negative relationship with centroid frequency of male calls, matching predictions about agonistic interactions based on motivation-structural rules; (2) VeDBA of four males had a positive relationship with modulation rate of their pulsed vocalizations, confirming predictions that click-repetition rate of these calls increases with agonism intensity. Tags offer opportunities to study animal behaviour through analyses of continuously sampled quantitative parameters, which can complement traditional methods and facilitate research replication. Our case studies illustrate the value of this approach to investigate communicative roles of acoustic parameter changes.
Significance statement
Studies of animal behaviour have traditionally relied on classification of behavioural patterns and analyses of discrete behavioural categories. Today, technologies such as animal-attached tags enable novel approaches, facilitating the use of quantitative metrics to characterize behaviour. In the field of acoustic communication, researchers typically classify vocalizations and examine usage of call categories. Through case studies of bottlenose dolphin social interactions, we present here a novel tag-based complementary approach. We used high-resolution tag data to parameterize dolphin sounds and motion, and we applied continuously sampled parameters to examine how individual dolphins responded to conspecifics’ signals and moved while producing sounds. Activity intensity of senders and receivers changed with specific call parameters, matching our predictions and illustrating the value of our approach to test communicative roles of acoustic parameter changes. Parametric approaches can complement traditional methods for animal behaviour and facilitate research replication.
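VeDBA, the activity proxy used in these case studies, is conventionally computed as the vector norm of the dynamic (gravity-removed) acceleration across the three axes. The sketch below assumes a 2-s running-mean window and a 50-Hz tag; it is a generic illustration, not the authors' tag-specific processing.

```python
# Vectorial dynamic body acceleration (VeDBA): remove the static (gravity)
# component per axis with a running mean, then take the vector norm of the
# remaining dynamic acceleration.
import numpy as np

def vedba(acc_xyz, fs, window_s=2.0):
    """acc_xyz: (n_samples, 3) accelerometer data in g; returns (n_samples,)."""
    win = max(1, int(window_s * fs))
    kernel = np.ones(win) / win
    static = np.column_stack(
        [np.convolve(acc_xyz[:, i], kernel, mode="same") for i in range(3)]
    )
    dynamic = acc_xyz - static
    return np.sqrt((dynamic ** 2).sum(axis=1))

fs = 50                                        # assumed tag sampling rate (Hz)
t = np.arange(0, 10, 1 / fs)
acc = np.column_stack([0.2 * np.sin(2 * np.pi * 1.5 * t),          # surge
                       0.1 * np.sin(2 * np.pi * 3.0 * t),          # sway
                       1.0 + 0.3 * np.sin(2 * np.pi * 1.5 * t)])   # heave + gravity
print(f"mean VeDBA over the record: {vedba(acc, fs).mean():.3f} g")
```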
16. Trapanotto M, Nanni L, Brahnam S, Guo X. Convolutional Neural Networks for the Identification of African Lions from Individual Vocalizations. J Imaging 2022; 8:jimaging8040096. [PMID: 35448223] [PMCID: PMC9029749] [DOI: 10.3390/jimaging8040096]
Abstract
The classification of vocal individuality for passive acoustic monitoring (PAM) and census of animals is becoming an increasingly popular area of research. Nearly all studies in this field of inquiry have relied on classic audio representations and classifiers, such as Support Vector Machines (SVMs) trained on spectrograms or Mel-Frequency Cepstral Coefficients (MFCCs). In contrast, most current bioacoustic species classification exploits the power of deep learners and more cutting-edge audio representations. A significant reason for avoiding deep learning in vocal identity classification is the tiny sample size in the collections of labeled individual vocalizations. As is well known, deep learners require large datasets to avoid overfitting. One way to handle small datasets with deep learning methods is to use transfer learning. In this work, we evaluate the performance of three pretrained CNNs (VGG16, ResNet50, and AlexNet) on a small, publicly available lion roar dataset containing approximately 150 samples taken from five male lions. Each of these networks is retrained on eight representations of the samples: MFCCs, spectrogram, and Mel spectrogram, along with several new ones, such as VGGish and stockwell, and those based on the recently proposed LM spectrogram. The performance of these networks, both individually and in ensembles, is analyzed and corroborated using the Equal Error Rate and shown to surpass previous classification attempts on this dataset; the best single network achieved over 95% accuracy and the best ensembles over 98% accuracy. The contributions this study makes to the field of individual vocal classification include demonstrating that it is valuable and possible, with caution, to use transfer learning with single pretrained CNNs on the small datasets available for this problem domain. We also make a contribution to bioacoustics generally by offering a comparison of the performance of many state-of-the-art audio representations, including for the first time the LM spectrogram and stockwell representations. All source code for this study is available on GitHub.
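The Equal Error Rate used to corroborate these results is the operating point where the false positive rate equals the false negative rate. A small sketch with placeholder verification scores:

```python
# Equal Error Rate (EER): the point on the ROC curve where the false positive
# rate equals the false negative rate (1 - true positive rate).
import numpy as np
from sklearn.metrics import roc_curve

y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 1, 0])         # same lion vs. not
scores = np.array([0.9, 0.8, 0.4, 0.7, 0.3, 0.5, 0.2, 0.1, 0.6, 0.35])

fpr, tpr, thresholds = roc_curve(y_true, scores)
fnr = 1 - tpr
idx = np.argmin(np.abs(fpr - fnr))                         # closest crossing
eer = (fpr[idx] + fnr[idx]) / 2
print(f"EER ≈ {eer:.3f} at threshold {thresholds[idx]:.2f}")
```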
Affiliation(s)
- Martino Trapanotto
- Department of Information Engineering, University of Padua, Via Gradenigo 6, 35131 Padova, Italy; (M.T.); (L.N.)
- Loris Nanni
- Department of Information Engineering, University of Padua, Via Gradenigo 6, 35131 Padova, Italy; (M.T.); (L.N.)
- Sheryl Brahnam
- Information Technology and Cybersecurity, Missouri State University, 901 S. National, Springfield, MO 65897, USA;
- Xiang Guo
- Information Technology and Cybersecurity, Missouri State University, 901 S. National, Springfield, MO 65897, USA;
17. Parsons MJG, Lin TH, Mooney TA, Erbe C, Juanes F, Lammers M, Li S, Linke S, Looby A, Nedelec SL, Van Opzeeland I, Radford C, Rice AN, Sayigh L, Stanley J, Urban E, Di Iorio L. Sounding the Call for a Global Library of Underwater Biological Sounds. Front Ecol Evol 2022. [DOI: 10.3389/fevo.2022.810156]
Abstract
Aquatic environments encompass the world’s most extensive habitats, rich with sounds produced by a diversity of animals. Passive acoustic monitoring (PAM) is an increasingly accessible remote sensing technology that uses hydrophones to listen to the underwater world and represents an unprecedented, non-invasive method to monitor underwater environments. This information can assist in the delineation of biologically important areas via detection of sound-producing species or characterization of ecosystem type and condition, inferred from the acoustic properties of the local soundscape. At a time when worldwide biodiversity is in significant decline and underwater soundscapes are being altered as a result of anthropogenic impacts, there is a need to document, quantify, and understand biotic sound sources–potentially before they disappear. A significant step toward these goals is the development of a web-based, open-access platform that provides: (1) a reference library of known and unknown biological sound sources (by integrating and expanding existing libraries around the world); (2) a data repository portal for annotated and unannotated audio recordings of single sources and of soundscapes; (3) a training platform for artificial intelligence algorithms for signal detection and classification; and (4) a citizen science-based application for public users. Although individually, these resources are often met on regional and taxa-specific scales, many are not sustained and, collectively, an enduring global database with an integrated platform has not been realized. We discuss the benefits such a program can provide, previous calls for global data-sharing and reference libraries, and the challenges that need to be overcome to bring together bio- and ecoacousticians, bioinformaticians, propagation experts, web engineers, and signal processing specialists (e.g., artificial intelligence) with the necessary support and funding to build a sustainable and scalable platform that could address the needs of all contributors and stakeholders into the future.
18. Escobar-Amado CD, Badiey M, Pecknold S. Automatic detection and classification of bearded seal vocalizations in the northeastern Chukchi Sea using convolutional neural networks. J Acoust Soc Am 2022; 151:299. [PMID: 35105050] [DOI: 10.1121/10.0009256]
Abstract
Bearded seal vocalizations are often analyzed manually or by using automatic detections that are manually validated. In this work, an automatic detection and classification system (DCS) based on convolutional neural networks (CNNs) is proposed. Bearded seal sounds were recorded year-round by four spatially separated receivers on the Chukchi Continental Slope in Alaska in 2016-2017. The DCS is divided into two sections. First, regions of interest (ROI) containing possible bearded seal vocalizations are found by using the two-dimensional normalized cross-correlation of the measured spectrogram and a representative template of two main calls of interest. Second, CNNs are used to validate and classify the ROIs among several possible classes. The CNNs are trained on 80% of the ROIs manually labeled from one of the four spatially separated recorders. When validating on the remaining 20%, the CNNs show an accuracy above 95.5%. To assess the generalization performance of the networks, the CNNs are tested on the remaining recorders, located at different positions, with a precision above 89.2% for the main class of the two types of calls. The proposed technique reduces the laborious task of manual inspection, which is prone to inconsistent bias and possible errors in detections.
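The ROI stage described above, matching a call template against the measured spectrogram with 2-D normalized cross-correlation, can be prototyped in a few lines. The sketch below uses synthetic arrays in place of the real spectrogram and template, and the detection threshold is chosen only for the demo.

```python
# Find candidate regions of interest by sliding a call template over a
# spectrogram with 2-D normalized cross-correlation (scikit-image).
import numpy as np
from skimage.feature import match_template

rng = np.random.default_rng(3)
spectrogram = rng.random((128, 1000))          # (freq bins, time frames)
template = rng.random((32, 40))                # representative call shape

# Plant two copies of the template so the demo has something to find.
for t0 in (100, 640):
    spectrogram[40:72, t0:t0 + 40] += 3 * template

ncc = match_template(spectrogram, template)    # values in [-1, 1]
peaks = np.argwhere(ncc > 0.6)                 # threshold chosen for the demo
print("candidate ROI (freq_bin, frame) offsets:", peaks[:5])
```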
Affiliation(s)
- Mohsen Badiey
- Department of Electrical Engineering, University of Delaware, Newark, Delaware 19716, USA
- Sean Pecknold
- Defence Research and Development Canada, Dartmouth, Nova Scotia B3K 5X5, Canada
19. Bermant PC. BioCPPNet: automatic bioacoustic source separation with deep neural networks. Sci Rep 2021; 11:23502. [PMID: 34873197] [PMCID: PMC8648737] [DOI: 10.1038/s41598-021-02790-2]
Abstract
We introduce the Bioacoustic Cocktail Party Problem Network (BioCPPNet), a lightweight, modular, and robust U-Net-based machine learning architecture optimized for bioacoustic source separation across diverse biological taxa. Employing learnable or handcrafted encoders, BioCPPNet operates directly on the raw acoustic mixture waveform containing overlapping vocalizations and separates the input waveform into estimates corresponding to the sources in the mixture. Predictions are compared to the reference ground truth waveforms by searching over the space of (output, target) source order permutations, and we train using an objective function motivated by perceptual audio quality. We apply BioCPPNet to several species with unique vocal behavior, including macaques, bottlenose dolphins, and Egyptian fruit bats, and we evaluate reconstruction quality of separated waveforms using the scale-invariant signal-to-distortion ratio (SI-SDR) and downstream identity classification accuracy. We consider mixtures with two or three concurrent conspecific vocalizers, and we examine separation performance in open and closed speaker scenarios. To our knowledge, this paper redefines the state-of-the-art in end-to-end single-channel bioacoustic source separation in a permutation-invariant regime across a heterogeneous set of non-human species. This study serves as a major step toward the deployment of bioacoustic source separation systems for processing substantial volumes of previously unusable data containing overlapping bioacoustic signals.
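The scale-invariant signal-to-distortion ratio used to score the separated waveforms has a compact closed form: project the estimate onto the zero-mean reference and compare the projection's energy to the residual's. A sketch with a synthetic signal:

```python
# Scale-invariant signal-to-distortion ratio (SI-SDR) between an estimated
# source waveform and its reference, in decibels.
import numpy as np

def si_sdr(estimate, reference, eps=1e-8):
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Optimal scaling of the reference toward the estimate.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    noise = estimate - target
    return 10 * np.log10((target @ target + eps) / (noise @ noise + eps))

rng = np.random.default_rng(4)
clean = rng.standard_normal(16_000)            # 1 s of a "source" at 16 kHz
noisy_estimate = clean + 0.1 * rng.standard_normal(16_000)
print(f"SI-SDR ≈ {si_sdr(noisy_estimate, clean):.1f} dB")   # roughly 20 dB
```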
20. Kitzes J, Blake R, Bombaci S, Chapman M, Duran SM, Huang T, Joseph MB, Lapp S, Marconi S, Oestreich WK, Rhinehart TA, Schweiger AK, Song Y, Surasinghe T, Yang D, Yule K. Expanding NEON biodiversity surveys with new instrumentation and machine learning approaches. Ecosphere 2021. [DOI: 10.1002/ecs2.3795]
Affiliation(s)
- Justin Kitzes
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Rachael Blake
- National Socio-Environmental Synthesis Center, Annapolis, Maryland, USA
- Sara Bombaci
- Department of Fish, Wildlife, and Conservation Biology, Colorado State University, Fort Collins, Colorado, USA
- Melissa Chapman
- Department of Environmental Science, Policy, and Management, University of California Berkeley, Berkeley, California, USA
- Sandra M. Duran
- Department of Ecology & Evolutionary Biology, The University of Arizona, Tucson, Arizona, USA
- Tao Huang
- Human-Environment Systems, Boise State University, Boise, Idaho, USA
- Maxwell B. Joseph
- Earth Lab, Cooperative Institute for Research in Environmental Sciences (CIRES), University of Colorado Boulder, Boulder, Colorado, USA
- Samuel Lapp
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Sergio Marconi
- Department of Wildlife Ecology and Conservation, University of Florida, Gainesville, Florida, USA
- Tessa A. Rhinehart
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Yiluan Song
- Environmental Studies Department, University of California Santa Cruz, California, USA
- Thilina Surasinghe
- Department of Biological Sciences, Bridgewater State University, Bridgewater, Massachusetts, USA
- Di Yang
- Wyoming Geographic Information Science Center (WyGISC), University of Wyoming, Laramie, Wyoming, USA
- Kelsey Yule
- National Ecological Observatory Network Biorepository, Arizona State University, Tempe, Arizona, USA
21. Forest Vertical Structure Mapping Using Two-Seasonal Optic Images and LiDAR DSM Acquired from UAV Platform through Random Forest, XGBoost, and Support Vector Machine Approaches. Remote Sensing 2021. [DOI: 10.3390/rs13214282]
Abstract
Research on forest structure classification is essential, as it plays an important role in assessing the vitality and diversity of vegetation. However, classifying forest structure involves in situ surveying, which requires considerable time and money, and cannot be conducted directly in some instances; in addition, the update cycle of the classification data is very slow. To overcome these drawbacks, feasibility studies on mapping the forest vertical structure from aerial images using machine learning techniques were conducted. In this study, we investigated (1) the performance improvement of the forest structure classification, using a high-resolution LiDAR-derived digital surface model (DSM) acquired from an unmanned aerial vehicle (UAV) platform, and (2) the performance comparison of results obtained from the single-seasonal and two-seasonal data, using random forest (RF), extreme gradient boosting (XGBoost), and support vector machine (SVM). For the performance comparison, the UAV optic and LiDAR data were divided into three cases: (1) autumn data only, (2) winter data only, and (3) both autumn and winter data. From the results, the best model was XGBoost, and the F1 scores achieved using this method were approximately 0.92 in the autumn and winter cases. A remarkable improvement was achieved when images from both seasons were used: the F1 score improved by 35.3%, from 0.68 to 0.92. This implies that (1) the seasonal variation in the forest vertical structure can be more important than the spatial resolution, and (2) the classification performance achieved from the two-seasonal UAV optic images and LiDAR-derived DSMs can reach 0.9 with the application of an optimal machine learning approach.
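The model comparison described above reduces to fitting the three classifiers on the same feature table and comparing F1 scores. The following is a schematic sketch on synthetic data; scikit-learn's gradient boosting stands in for XGBoost, and the feature matrix is a made-up placeholder for the per-pixel optic and LiDAR-derived features.

```python
# Compare three classifiers on the same feature matrix using macro F1,
# mirroring the RF / boosting / SVM comparison described above.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

# Synthetic stand-in for per-pixel features (optic bands + LiDAR-derived DSM).
X, y = make_classification(n_samples=2000, n_features=12, n_informative=8,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {"RandomForest": RandomForestClassifier(random_state=0),
          "GradientBoosting": GradientBoostingClassifier(random_state=0),
          "SVM": SVC(kernel="rbf")}

for name, model in models.items():
    model.fit(X_tr, y_tr)
    score = f1_score(y_te, model.predict(X_te), average="macro")
    print(f"{name:>16}: F1 = {score:.3f}")
```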
Collapse
|
22
|
Comparing recurrent convolutional neural networks for large scale bird species classification. Sci Rep 2021; 11:17085. [PMID: 34429468 PMCID: PMC8385065 DOI: 10.1038/s41598-021-96446-w] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/25/2021] [Accepted: 08/10/2021] [Indexed: 11/08/2022] Open
Abstract
We present a deep learning approach towards the large-scale prediction and analysis of bird acoustics from 100 different bird species. We use spectrograms constructed from bird audio recordings in the Cornell Bird Challenge (CBC) 2020 dataset, which includes recordings of multiple and potentially overlapping bird vocalizations with background noise. Our experiments show that a hybrid modeling approach, in which a Convolutional Neural Network (CNN) learns the representation of each spectrogram slice and a Recurrent Neural Network (RNN) combines those representations across time points, yields the most accurate model on this dataset. We show results on a spectrum of models ranging from stand-alone CNNs to hybrid models of various types obtained by combining CNNs with other CNNs or with RNNs of the following types: Long Short-Term Memory (LSTM) networks, Gated Recurrent Units (GRU), and Legendre Memory Units (LMU). The best-performing model achieves an average accuracy of 67% over the 100 bird species, with the highest per-species accuracy of 90% for the Red Crossbill. We further analyze the learned representations visually and find them to be intuitive, with related bird species clustered close together. We also present a novel way to empirically interpret the representations learned by the LMU-based hybrid model, showing how memory channel patterns change over time with the changes seen in the spectrograms.
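A minimal sketch of the CNN-plus-RNN hybrid idea, assuming PyTorch and placeholder spectrogram dimensions rather than the authors' published architecture: a small CNN encodes each spectrogram slice, and a GRU combines the slice embeddings over time before a 100-way classification head.

```python
import torch
import torch.nn as nn

class CnnRnnClassifier(nn.Module):
    def __init__(self, n_classes=100, hidden=128):
        super().__init__()
        # Small CNN encoder applied independently to each spectrogram slice
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d((4, 4)),
            nn.Flatten(),                          # 32 * 4 * 4 = 512 features per slice
        )
        self.rnn = nn.GRU(512, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):
        # x: (batch, n_slices, 1, freq_bins, time_bins)
        b, s = x.shape[:2]
        feats = self.encoder(x.flatten(0, 1)).view(b, s, -1)   # encode each slice
        _, h = self.rnn(feats)                                 # combine slices across time
        return self.head(h[-1])                                # class logits

logits = CnnRnnClassifier()(torch.randn(2, 8, 1, 128, 64))
print(logits.shape)   # torch.Size([2, 100])
```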
Collapse
|
23
|
Bravo Sanchez FJ, Hossain MR, English NB, Moore ST. Bioacoustic classification of avian calls from raw sound waveforms with an open-source deep learning architecture. Sci Rep 2021; 11:15733. [PMID: 34344970 PMCID: PMC8333097 DOI: 10.1038/s41598-021-95076-6] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/02/2020] [Accepted: 07/13/2021] [Indexed: 11/22/2022] Open
Abstract
The use of autonomous recordings of animal sounds to detect species is a popular conservation tool, constantly improving in fidelity as audio hardware and software evolve. Current classification algorithms utilise sound features extracted from the recording rather than the sound itself, with varying degrees of success. Neural networks that learn directly from raw sound waveforms have been implemented in human speech recognition, but the requirement for detailed labelled data has limited their use in bioacoustics. Here we test SincNet, an efficient neural network architecture that learns from the raw waveform using sinc-based filters. Results using an off-the-shelf implementation of SincNet on a publicly available bird sound dataset (NIPS4Bplus) show that the network converged rapidly, reaching accuracies of over 65% with limited data. Its performance after hyperparameter tuning is comparable with that of traditional methods, but it is more efficient. Learning directly from the raw waveform allows the algorithm to automatically select those elements of the sound that are best suited to the task, bypassing the onerous step of choosing feature extraction techniques and reducing possible biases. We use publicly released code and datasets to encourage others to replicate our results and to apply SincNet to their own datasets, and we review possible enhancements in the hope that algorithms that learn from the raw waveform will become useful bioacoustic tools.
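The central idea of learning band-pass filters directly from the raw waveform can be sketched as follows: a simplified sinc-convolution layer in PyTorch, with the filter count, kernel size, sample rate, and initialisation chosen for illustration rather than taken from the published SincNet configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SincConv1d(nn.Module):
    """Minimal sinc-based 1D convolution: each filter is a learnable band-pass
    parameterised only by its low cut-off and bandwidth (illustrative sketch)."""
    def __init__(self, out_channels=32, kernel_size=251, sample_rate=16000):
        super().__init__()
        # Learnable cut-offs in Hz, initialised to evenly spaced bands
        self.low_hz = nn.Parameter(torch.linspace(30, sample_rate / 2 - 200, out_channels))
        self.band_hz = nn.Parameter(torch.full((out_channels,), 100.0))
        self.register_buffer("t", torch.arange(-(kernel_size // 2), kernel_size // 2 + 1) / sample_rate)
        self.register_buffer("window", torch.hamming_window(kernel_size))

    def forward(self, x):                                   # x: (batch, 1, samples)
        low = torch.abs(self.low_hz)
        high = low + torch.abs(self.band_hz)
        # Ideal low-pass with cut-off f: h(t) = 2f * sinc(2 f t); band-pass = difference of two low-passes
        def sinc_lp(f):
            return 2 * f.unsqueeze(1) * torch.sinc(2 * f.unsqueeze(1) * self.t.unsqueeze(0))
        filters = (sinc_lp(high) - sinc_lp(low)) * self.window
        return F.conv1d(x, filters.unsqueeze(1))

out = SincConv1d()(torch.randn(4, 1, 16000))   # one second of raw audio at 16 kHz
print(out.shape)                                # torch.Size([4, 32, 15750])
```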
Collapse
Affiliation(s)
- Francisco J Bravo Sanchez
- School of Engineering and Technology, Central Queensland University, North Rockhampton, QLD, Australia
| | - Md Rahat Hossain
- School of Engineering and Technology, Central Queensland University, North Rockhampton, QLD, Australia
| | - Nathan B English
- School of Health, Medical and Applied Sciences, Flora, Fauna and Freshwater Research, Central Queensland University, Townsville, QLD, Australia
| | - Steven T Moore
- School of Engineering and Technology, Central Queensland University, North Rockhampton, QLD, Australia
| |
Collapse
|
24
|
Roch MA, Lindeneau S, Aurora GS, Frasier KE, Hildebrand JA, Glotin H, Baumann-Pickering S. Using context to train time-domain echolocation click detectors. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2021; 149:3301. [PMID: 34241092 DOI: 10.1121/10.0004992] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/02/2021] [Accepted: 04/26/2021] [Indexed: 06/13/2023]
Abstract
This work demonstrates the effectiveness of human-in-the-loop processes for constructing large training sets for machine learning tasks. A corpus of over 57 000 toothed whale echolocation clicks was developed using a permissive energy-based echolocation detector followed by a machine-assisted quality control process that exploits contextual cues. Subsets of these data were used to train feed-forward neural networks, which detected over 850 000 echolocation clicks that were validated using the same quality control process. The network architecture performs well in a variety of contexts and was evaluated against a withheld data set collected nearly five years after the development data, at a location over 600 km distant. The system was capable of finding echolocation bouts that were missed by human analysts, and the errors made by the classifier consist primarily of anthropogenic sources that were not included as counter-training examples. In the absence of such events, typical false positive rates are under ten events per hour even at low thresholds.
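The first stage of such a pipeline, a permissive energy-based click detector, might look roughly like the sketch below; the band limits, window length, and threshold factor are illustrative placeholders, not the authors' settings.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def energy_click_detector(wave, fs, band=(20_000, 90_000), win_ms=0.5, k=4.0):
    """Permissive energy-based detector: flag samples whose band-limited energy,
    summed over a short window, exceeds k times the median energy (a stand-in
    for the detector stage described in the abstract; parameters are illustrative)."""
    sos = butter(4, band, btype="bandpass", fs=fs, output="sos")
    filtered = sosfiltfilt(sos, wave)
    win = max(1, int(fs * win_ms / 1000))
    energy = np.convolve(filtered ** 2, np.ones(win), mode="same")
    threshold = k * np.median(energy)
    hits = np.flatnonzero(energy > threshold)
    return hits / fs                               # candidate click times in seconds

fs = 192_000                                       # illustrative high sample rate
noise = np.random.randn(fs) * 0.01                 # one second of synthetic background noise
noise[96_000:96_010] += 1.0                        # synthetic broadband click
print(energy_click_detector(noise, fs)[:5])
```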
Collapse
Affiliation(s)
- Marie A Roch
- Department of Computer Science, San Diego State University, 5500 Campanile Drive, San Diego, California 92182-7720, USA
| | - Scott Lindeneau
- Department of Computer Science, San Diego State University, 5500 Campanile Drive, San Diego, California 92182-7720, USA
| | - Gurisht Singh Aurora
- Department of Computer Science, San Diego State University, 5500 Campanile Drive, San Diego, California 92182-7720, USA
| | - Kaitlin E Frasier
- Scripps Institution of Oceanography, University of California, San Diego, 9500 Gilman Drive #0205, La Jolla, California 92093, USA
| | - John A Hildebrand
- Scripps Institution of Oceanography, University of California, San Diego, 9500 Gilman Drive #0205, La Jolla, California 92093, USA
| | - Hervé Glotin
- Université de Toulon, BP 20132, 83957 La Garde Cedex, France
| | - Simone Baumann-Pickering
- Scripps Institution of Oceanography, University of California, San Diego, 9500 Gilman Drive #0205, La Jolla, California 92093, USA
| |
Collapse
|
25
|
Rasmussen JH, Širović A. Automatic detection and classification of baleen whale social calls using convolutional neural networks. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2021; 149:3635. [PMID: 34241118 DOI: 10.1121/10.0005047] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/25/2020] [Accepted: 04/30/2021] [Indexed: 06/13/2023]
Abstract
Passive acoustic monitoring has proven to be an indispensable tool for many aspects of baleen whale research. However, manual detection of whale calls in these large data sets demands extensive labor. Automated whale call detectors offer a more efficient approach and have been developed for many species and call types. However, highly variable calls such as the fin whale (Balaenoptera physalus) 40 Hz call and the blue whale (B. musculus) D call have been challenging to detect automatically, and hence no practical automated detector has existed for these two call types. Using a modular approach consisting of a faster region-based convolutional neural network followed by a convolutional neural network, we have created automated detectors for 40 Hz calls and D calls. Both detectors were tested on recordings with high and low call densities and, when selecting detections with high classification scores, were shown to have precision ranging from 54% to 57% with recall ranging from 72% to 78% for 40 Hz calls, and precision ranging from 62% to 64% with recall ranging from 70% to 73% for D calls. As these two call types are produced by both sexes, using them in long-term studies would remove sex bias in estimates of temporal presence and movement patterns.
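A schematic of the modular two-stage idea, assuming torchvision's generic Faster R-CNN implementation with untrained placeholder weights (the published detectors were trained on annotated 40 Hz call and D call spectrograms): stage one proposes boxes on a spectrogram image, and stage two re-classifies each proposed box with a small CNN.

```python
import torch
import torch.nn.functional as F
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# Stage 1: region proposals on spectrogram "images" (weights untrained here; a real
# detector would be fine-tuned on boxed call annotations).
detector = fasterrcnn_resnet50_fpn(num_classes=2).eval()

# Stage 2: a small CNN that re-classifies each proposed box as call vs. noise.
classifier = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, 3, padding=1), torch.nn.ReLU(), torch.nn.AdaptiveAvgPool2d(8),
    torch.nn.Flatten(), torch.nn.Linear(8 * 8 * 8, 2),
).eval()

spectrogram = torch.rand(3, 256, 512)               # (channels, freq bins, time bins) in [0, 1]
with torch.no_grad():
    proposals = detector([spectrogram])[0]           # dict with 'boxes', 'scores', 'labels'
    print(len(proposals["boxes"]), "proposals")
    for box, score in zip(proposals["boxes"], proposals["scores"]):
        x0, y0, x1, y1 = box.int().tolist()
        if score < 0.05 or x1 <= x0 or y1 <= y0:     # skip weak or degenerate proposals
            continue
        crop = spectrogram[:, y0:y1, x0:x1].unsqueeze(0)
        crop = F.interpolate(crop, size=(64, 64))    # resize crop for the second-stage CNN
        is_call = classifier(crop).softmax(1)[0, 1] > 0.5
```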
Collapse
Affiliation(s)
- Jeppe Have Rasmussen
- Department of Marine Biology, Texas A&M University at Galveston, Galveston, Texas 77554, USA
| | - Ana Širović
- Department of Marine Biology, Texas A&M University at Galveston, Galveston, Texas 77554, USA
| |
Collapse
|
26
|
Senevirathna JDM, Asakawa S. Multi-Omics Approaches and Radiation on Lipid Metabolism in Toothed Whales. Life (Basel) 2021; 11:364. [PMID: 33923876 PMCID: PMC8074237 DOI: 10.3390/life11040364] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/13/2021] [Revised: 04/09/2021] [Accepted: 04/17/2021] [Indexed: 11/25/2022] Open
Abstract
Lipid synthesis pathways of toothed whales have evolved since their transition from the terrestrial to the marine environment. The synthesis and function of these endogenous lipids, and the factors affecting them, are still poorly understood. In this review, we focused on different omics approaches and techniques to investigate lipid metabolism and the impacts of radiation on lipids in toothed whales. The selected literature was screened, and the capacities, possibilities, and future approaches for identifying unusual lipid synthesis pathways through omics were evaluated. Omics approaches were categorized into four major disciplines: lipidomics, transcriptomics, genomics, and proteomics. Genomics and transcriptomics can together identify genes related to unique lipid synthesis. Because lipids interact with proteins in the animal body, lipidomics and proteomics can be correlated by creating lipid-binding proteome maps to elucidate metabolic pathways. In lipidomics studies, recent mass spectrometric methods can resolve lipid profiles; however, determining the structures of lipids remains challenging. As an environmental stressor, acoustic radiation has a significant effect on lipid profiles. Radiation studies across different omics approaches revealed the need for multi-omics applications. This review concludes that combining several omics areas may elucidate lipid metabolism and the possible effects of radiation on lipids in toothed whales.
Collapse
Affiliation(s)
- Jayan D. M. Senevirathna
- Laboratory of Aquatic Molecular Biology and Biotechnology, Department of Aquatic Bioscience, Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo 113-8657, Japan;
- Department of Animal Science, Faculty of Animal Science and Export Agriculture, Uva Wellassa University, Badulla 90000, Sri Lanka
| | - Shuichi Asakawa
- Laboratory of Aquatic Molecular Biology and Biotechnology, Department of Aquatic Bioscience, Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo 113-8657, Japan;
| |
Collapse
|
27
|
Ozanich E, Thode A, Gerstoft P, Freeman LA, Freeman S. Deep embedded clustering of coral reef bioacoustics. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2021; 149:2587. [PMID: 33940892 DOI: 10.1121/10.0004221] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/02/2020] [Accepted: 03/19/2021] [Indexed: 06/12/2023]
Abstract
Deep clustering was applied to unlabeled, automatically detected signals in a coral reef soundscape to distinguish fish pulse calls from segments of whale song. Deep embedded clustering (DEC) learned latent features and formed classification clusters using fixed-length power spectrograms of the signals. Handpicked spectral and temporal features were also extracted and clustered with Gaussian mixture models (GMM) and conventional clustering. DEC, GMM, and conventional clustering were tested on simulated datasets of fish pulse calls (fish) and whale song units (whale) with randomized bandwidth, duration, and SNR. Both GMM and DEC achieved high accuracy and identified clusters with fish, whale, and overlapping fish and whale signals. Conventional clustering methods had low accuracy in scenarios with unequal-sized clusters or overlapping signals. Fish and whale signals recorded near Hawaii in February-March 2020 were clustered with DEC, GMM, and conventional clustering. DEC features demonstrated the highest accuracy of 77.5% on a small, manually labeled dataset for classifying signals into fish and whale clusters.
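A simplified stand-in for the DEC workflow, assuming PyTorch and scikit-learn with synthetic data: an autoencoder is pre-trained on flattened fixed-length spectrograms and its latent codes are clustered; full DEC would additionally refine the encoder and cluster centroids jointly with a KL-divergence objective over soft assignments.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

# Fixed-length power spectrograms flattened to vectors (synthetic stand-ins here).
X = torch.rand(500, 1024)

# Autoencoder whose bottleneck provides the learned latent features.
encoder = nn.Sequential(nn.Linear(1024, 256), nn.ReLU(), nn.Linear(256, 10))
decoder = nn.Sequential(nn.Linear(10, 256), nn.ReLU(), nn.Linear(256, 1024))
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

for _ in range(200):                                  # reconstruction pre-training
    opt.zero_grad()
    loss = nn.functional.mse_loss(decoder(encoder(X)), X)
    loss.backward()
    opt.step()

# Cluster the latent codes into two groups (e.g., fish vs. whale); full DEC would
# continue by refining these assignments with the clustering objective.
latent = encoder(X).detach().numpy()
labels = KMeans(n_clusters=2, n_init=10).fit_predict(latent)
print(np.bincount(labels))
```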
Collapse
Affiliation(s)
- Emma Ozanich
- Scripps Institution of Oceanography, University of California San Diego, La Jolla, California 92037, USA
| | - Aaron Thode
- Scripps Institution of Oceanography, University of California San Diego, La Jolla, California 92037, USA
| | - Peter Gerstoft
- Scripps Institution of Oceanography, University of California San Diego, La Jolla, California 92037, USA
| | - Lauren A Freeman
- Naval Undersea Warfare Center Newport, Newport, Rhode Island 02841, USA
| | - Simon Freeman
- Naval Undersea Warfare Center Newport, Newport, Rhode Island 02841, USA
| |
Collapse
|
28
|
Colonna JG, Carvalho JRH, Rosso OA. Estimating ecoacoustic activity in the Amazon rainforest through Information Theory quantifiers. PLoS One 2020; 15:e0229425. [PMID: 32716981 PMCID: PMC7384625 DOI: 10.1371/journal.pone.0229425] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/01/2020] [Accepted: 06/26/2020] [Indexed: 11/19/2022] Open
Abstract
Automatic monitoring of biodiversity by acoustic sensors has become an indispensable tool for assessing environmental stress at an early stage. Because of the difficulty of recognizing the Amazon's high acoustic diversity and the large volumes of raw audio recorded by the sensors, labeling and manually inspecting these data is not feasible. We therefore propose an ecoacoustic index that quantifies the complexity of an audio segment and correlates this measure with the biodiversity of the soundscape. The approach uses unsupervised methods to avoid the problem of labeling each species individually. The proposed index, named the Ecoacoustic Global Complexity Index (EGCI), makes use of entropy, divergence, and statistical complexity. A distinguishing feature of this index is that it maps each audio segment, regardless of length, to a single point in a 2D plane, which helps in understanding the ecoacoustic dynamics of the rainforest. The main results show regularity in the ecoacoustic richness of a floodplain at different temporal granularities, whether between hours of the day or between consecutive days of the monitoring program. We observed that this regularity characterizes the soundscape of the Mamirauá environmental protection area in the Amazon well, differentiating between species richness and environmental phenomena.
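The quantifiers behind such an index can be sketched as follows, here using Bandt-Pompe ordinal patterns to build the probability distribution and an MPR-style entropy-complexity normalization; the embedding dimension, the ordinal-pattern choice, and the synthetic segment are assumptions for illustration and need not match the EGCI definition.

```python
import numpy as np
from itertools import permutations
from math import log

def ordinal_distribution(x, d=4):
    """Bandt-Pompe ordinal-pattern probabilities of a 1-D signal (one common way
    to build the probability distribution behind entropy/complexity quantifiers)."""
    patterns = {p: 0 for p in permutations(range(d))}
    for i in range(len(x) - d + 1):
        patterns[tuple(np.argsort(x[i:i + d]))] += 1
    p = np.array(list(patterns.values()), dtype=float)
    return p / p.sum()

def entropy(p):
    nz = p[p > 0]
    return -np.sum(nz * np.log(nz))

def complexity_plane_point(x, d=4):
    """Return (normalized entropy H, statistical complexity C) for one audio segment."""
    p = ordinal_distribution(x, d)
    n = len(p)
    H = entropy(p) / log(n)                                        # normalized Shannon entropy
    uniform = np.full(n, 1.0 / n)
    m = 0.5 * (p + uniform)
    jsd = entropy(m) - 0.5 * entropy(p) - 0.5 * entropy(uniform)   # Jensen-Shannon divergence
    # Normalization by the maximum attainable divergence (MPR-style constant)
    jsd_max = -0.5 * ((n + 1) / n * log(n + 1) - 2 * log(2 * n) + log(n))
    return H, (jsd / jsd_max) * H                                  # statistical complexity C

segment = np.random.randn(48_000)        # one second of audio at 48 kHz (synthetic)
print(complexity_plane_point(segment))
```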
Collapse
Affiliation(s)
- Juan G. Colonna
- Instituto de Computação (IComp), Universidade Federal do Amazonas (UFAM), Manaus, Amazonas, Brasil
| | - José R. H. Carvalho
- Instituto de Computação (IComp), Universidade Federal do Amazonas (UFAM), Manaus, Amazonas, Brasil
| | - Osvaldo A. Rosso
- Instituto de Física, Universidade Federal de Alagoas (UFAL), Maceió, Alagoas, Brasil
| |
Collapse
|
29
|
Bioacoustic Classification of Antillean Manatee Vocalization Spectrograms Using Deep Convolutional Neural Networks. APPLIED SCIENCES-BASEL 2020. [DOI: 10.3390/app10093286] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
We evaluated the potential of convolutional neural networks for classifying spectrograms of Antillean manatee (Trichechus manatus manatus) vocalizations. Spectrograms using binary, linear, and logarithmic amplitude formats were considered. Two deep convolutional neural network (DCNN) architectures were tested: linear (fixed filter size) and pyramidal (incremental filter size). Six experiments were devised to test the accuracy obtained for each combination of spectrogram representation and architecture. Results show that binary spectrograms, with both the linear and pyramidal architectures with dropout, provide classification rates of 94–99% on the training set and 92–98% on the testing set. The pyramidal network has shorter training and inference times. Results from the convolutional neural networks (CNNs) are substantially better than those of a signal-processing fast Fourier transform (FFT)-based harmonic search approach in terms of accuracy and F1 score. Taken together, these results demonstrate the validity of using spectrograms and DCNNs for manatee vocalization classification. These results can be used to improve future software and hardware implementations for estimating the manatee population in Panama.
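The distinction between the two architectures can be sketched as follows in PyTorch; the layer counts, filter numbers, dropout rate, and input size are placeholders rather than the configurations evaluated in the paper.

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(),
                         nn.MaxPool2d(2), nn.Dropout(0.25))

# "Linear" variant: the same number of filters in every convolutional layer.
linear_dcnn = nn.Sequential(conv_block(1, 32), conv_block(32, 32), conv_block(32, 32),
                            nn.Flatten(), nn.LazyLinear(2))

# "Pyramidal" variant: the filter count grows with depth.
pyramidal_dcnn = nn.Sequential(conv_block(1, 16), conv_block(16, 32), conv_block(32, 64),
                               nn.Flatten(), nn.LazyLinear(2))

batch = torch.rand(4, 1, 128, 128)          # binary spectrograms of candidate vocalizations
print(linear_dcnn(batch).shape, pyramidal_dcnn(batch).shape)   # both: torch.Size([4, 2])
```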
Collapse
|
30
|
Kirsebom OS, Frazao F, Simard Y, Roy N, Matwin S, Giard S. Performance of a deep neural network at detecting North Atlantic right whale upcalls. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2020; 147:2636. [PMID: 32359246 DOI: 10.1121/10.0001132] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/17/2019] [Accepted: 04/05/2020] [Indexed: 06/11/2023]
Abstract
Passive acoustics provides a powerful tool for monitoring the endangered North Atlantic right whale (Eubalaena glacialis), but robust detection algorithms are needed to handle diverse and variable acoustic conditions and differences in recording techniques and equipment. This paper investigates the potential of deep neural networks (DNNs) for addressing this need. ResNet, an architecture commonly used for image recognition, was trained to recognize the time-frequency representation of the characteristic North Atlantic right whale upcall. The network was trained on several thousand examples recorded at various locations in the Gulf of St. Lawrence in 2018 and 2019, using different equipment and deployment techniques. Used as a detection algorithm on fifty 30-min recordings from the years 2015-2017 containing over one thousand upcalls, the network achieved recalls up to 80% while maintaining a precision of 90%. Importantly, the performance of the network improved as more variance was introduced into the training dataset, whereas the opposite trend was observed using a conventional linear discriminant analysis approach. This study demonstrates that DNNs can be trained to identify North Atlantic right whale upcalls under diverse and variable conditions with a performance that compares favorably to that of existing algorithms.
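A minimal sketch of adapting an off-the-shelf ResNet to single-channel spectrograms with a binary upcall/background head, assuming PyTorch and torchvision; the ResNet-18 depth and input size are placeholders rather than the paper's exact configuration.

```python
import torch
import torchvision

# ResNet adapted to single-channel spectrograms with a two-class (upcall / background) head.
model = torchvision.models.resnet18()
model.conv1 = torch.nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
model.fc = torch.nn.Linear(model.fc.in_features, 2)

spectrograms = torch.rand(8, 1, 224, 224)       # time-frequency windows cut from recordings
print(model(spectrograms).shape)                 # torch.Size([8, 2])
```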
Collapse
Affiliation(s)
- Oliver S Kirsebom
- Institute for Big Data Analytics, Dalhousie University, Halifax, Nova Scotia, B3H 4R2, Canada
| | - Fabio Frazao
- Institute for Big Data Analytics, Dalhousie University, Halifax, Nova Scotia, B3H 4R2, Canada
| | - Yvan Simard
- Fisheries and Oceans Canada Chair in Underwater Acoustics Applied to Ecosystem and Marine Mammals, Marine Sciences Institute, University of Québec at Rimouski, Rimouski, Québec, Canada
| | - Nathalie Roy
- Maurice Lamontagne Institute, Fisheries and Oceans Canada, Mont-Joli, Québec, Canada
| | - Stan Matwin
- Institute for Big Data Analytics, Dalhousie University, Halifax, Nova Scotia, B3H 4R2, Canada
| | - Samuel Giard
- Maurice Lamontagne Institute, Fisheries and Oceans Canada, Mont-Joli, Québec, Canada
| |
Collapse
|
31
|
Zhong M, Castellote M, Dodhia R, Lavista Ferres J, Keogh M, Brewer A. Beluga whale acoustic signal classification using deep learning neural network models. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2020; 147:1834. [PMID: 32237822 DOI: 10.1121/10.0000921] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/29/2019] [Accepted: 02/27/2020] [Indexed: 05/23/2023]
Abstract
Over a decade after the Cook Inlet beluga (Delphinapterus leucas) was listed as endangered in 2008, the population has shown no sign of recovery. Lack of ecological knowledge limits the understanding of, and the ability to manage, potential threats impeding recovery of this declining population. National Oceanic and Atmospheric Administration Fisheries, in partnership with the Alaska Department of Fish and Game, initiated a passive acoustic monitoring program in 2017 to investigate beluga seasonal occurrence by deploying a series of passive acoustic moorings. Data have been processed with semi-automated tonal detectors followed by time-intensive manual validation. To reduce this labor-intensive and time-consuming process, and to increase the accuracy of the classification results, the authors constructed an ensemble deep learning convolutional neural network model to classify beluga detections as true or false. Using a 0.5 threshold, the final model achieves 96.57% precision and 92.26% recall on the testing dataset. This methodology proves successful at classifying beluga signals, and the framework can be easily generalized to other acoustic classification problems.
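A schematic of ensemble averaging with a 0.5 decision threshold and precision/recall scoring, assuming PyTorch and scikit-learn with synthetic detections; the member architecture and data shapes are placeholders, not the authors' model.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.metrics import precision_score, recall_score

def small_cnn():
    # Tiny placeholder member network producing a true-detection probability per input
    return nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(4),
                         nn.Flatten(), nn.Linear(8 * 16, 1), nn.Sigmoid())

ensemble = [small_cnn().eval() for _ in range(3)]       # members would be trained separately

detections = torch.rand(100, 1, 64, 64)                 # spectrograms of tonal-detector hits
labels = np.random.randint(0, 2, 100)                   # manual true/false validation (synthetic)

with torch.no_grad():
    probs = torch.stack([m(detections).squeeze(1) for m in ensemble]).mean(0).numpy()
predicted = (probs >= 0.5).astype(int)                   # the 0.5 threshold from the abstract
print(precision_score(labels, predicted, zero_division=0),
      recall_score(labels, predicted, zero_division=0))
```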
Collapse
Affiliation(s)
- Ming Zhong
- AI for Good Research Lab, Microsoft, Redmond, Washington 98052, USA
| | - Manuel Castellote
- Alaska Fisheries Science Center-National Oceanic and Atmospheric Administration (NOAA) Fisheries and Joint Institute for the Study of the Atmosphere and Ocean (JISAO), University of Washington, Seattle, Washington 98195, USA
| | - Rahul Dodhia
- AI for Good Research Lab, Microsoft, Redmond, Washington 98052, USA
| | | | - Mandy Keogh
- Alaska Department of Fish and Game, Juneau, Alaska 99802, USA
| | - Arial Brewer
- Alaska Fisheries Science Center-National Oceanic and Atmospheric Administration (NOAA) Fisheries and Joint Institute for the Study of the Atmosphere and Ocean (JISAO), University of Washington, Seattle, Washington 98195, USA
| |
Collapse
|
32
|
Bianco MJ, Gerstoft P, Traer J, Ozanich E, Roch MA, Gannot S, Deledalle CA. Machine learning in acoustics: Theory and applications. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2019; 146:3590. [PMID: 31795641 DOI: 10.1121/1.5133944] [Citation(s) in RCA: 140] [Impact Index Per Article: 28.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/09/2019] [Accepted: 10/14/2019] [Indexed: 06/10/2023]
Abstract
Acoustic data provide scientific and engineering insights in fields ranging from biology and communications to ocean and Earth science. We survey the recent advances and transformative potential of machine learning (ML), including deep learning, in the field of acoustics. ML is a broad family of techniques, which are often based in statistics, for automatically detecting and utilizing patterns in data. Relative to conventional acoustics and signal processing, ML is data-driven. Given sufficient training data, ML can discover complex relationships between features and desired labels or actions, or between features themselves. With large volumes of training data, ML can discover models describing complex acoustic phenomena such as human speech and reverberation. ML in acoustics is rapidly developing with compelling results and significant future promise. We first introduce ML, then highlight ML developments in four acoustics research areas: source localization in speech processing, source localization in ocean acoustics, bioacoustics, and environmental sounds in everyday scenes.
Collapse
Affiliation(s)
- Michael J Bianco
- Scripps Institution of Oceanography, University of California San Diego, La Jolla, California 92093, USA
| | - Peter Gerstoft
- Scripps Institution of Oceanography, University of California San Diego, La Jolla, California 92093, USA
| | - James Traer
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
| | - Emma Ozanich
- Scripps Institution of Oceanography, University of California San Diego, La Jolla, California 92093, USA
| | - Marie A Roch
- Department of Computer Science, San Diego State University, San Diego, California 92182, USA
| | - Sharon Gannot
- Faculty of Engineering, Bar-Ilan University, Ramat-Gan 5290002, Israel
| | - Charles-Alban Deledalle
- Department of Electrical and Computer Engineering, University of California San Diego, La Jolla, California 92093, USA
| |
Collapse
|