1
Arnaud V, Pellegrino F, Keenan S, St-Gelais X, Mathevon N, Levréro F, Coupé C. Improving the workflow to crack Small, Unbalanced, Noisy, but Genuine (SUNG) datasets in bioacoustics: The case of bonobo calls. PLoS Comput Biol 2023; 19:e1010325. [PMID: 37053268 PMCID: PMC10129004 DOI: 10.1371/journal.pcbi.1010325] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2022] [Revised: 04/25/2023] [Accepted: 03/01/2023] [Indexed: 04/15/2023] [Open access]
Abstract
Despite the accumulation of data and studies, deciphering animal vocal communication remains challenging. In most cases, researchers must deal with the sparse recordings composing Small, Unbalanced, Noisy, but Genuine (SUNG) datasets. SUNG datasets are characterized by a limited number of recordings, most often noisy, and unbalanced in number between the individuals or categories of vocalizations. SUNG datasets therefore offer a valuable but inevitably distorted vision of communication systems. Adopting the best practices in their analysis is essential to effectively extract the available information and draw reliable conclusions. Here we show that the most recent advances in machine learning applied to a SUNG dataset succeed in unraveling the complex vocal repertoire of the bonobo, and we propose a workflow that can be effective with other animal species. We implement acoustic parameterization in three feature spaces and run a Supervised Uniform Manifold Approximation and Projection (S-UMAP) to evaluate how call types and individual signatures cluster in the bonobo acoustic space. We then implement three classification algorithms (Support Vector Machine, xgboost, neural networks) and their combination to explore the structure and variability of bonobo calls, as well as the robustness of the individual signature they encode. We underscore how classification performance is affected by the feature set and identify the most informative features. In addition, we highlight the need to address data leakage in the evaluation of classification performance to avoid misleading interpretations. Our results lead to identifying several practical approaches that are generalizable to any other animal communication system. 
To improve the reliability and replicability of vocal communication studies with SUNG datasets, we thus recommend: i) comparing several acoustic parameterizations; ii) visualizing the dataset with supervised UMAP to examine the species acoustic space; iii) adopting Support Vector Machines as the baseline classification approach; iv) explicitly evaluating data leakage and possibly implementing a mitigation strategy.
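One common way to limit data leakage is a grouped split, in which all samples that share a source stay on the same side of the train/test divide. The sketch below illustrates that general idea only; it is not the authors' mitigation strategy. The grouping key (a hypothetical recording-sequence ID), the helper names `grouped_split` and `leaked_groups`, and the toy data are all assumptions made for illustration.

```python
import random
from collections import defaultdict
from operator import itemgetter


def grouped_split(samples, group_of, test_fraction=0.25, seed=0):
    """Split samples so that no group appears in both train and test.

    `group_of` maps a sample to its group key (hypothetical here: the
    recording sequence a call was extracted from).
    """
    groups = defaultdict(list)
    for s in samples:
        groups[group_of(s)].append(s)

    keys = sorted(groups)                 # deterministic order before shuffling
    random.Random(seed).shuffle(keys)

    n_test = max(1, round(len(keys) * test_fraction))
    test_keys = set(keys[:n_test])

    train = [s for k in keys if k not in test_keys for s in groups[k]]
    test = [s for k in keys if k in test_keys for s in groups[k]]
    return train, test


def leaked_groups(train, test, group_of):
    """Return the group keys present in both train and test (ideally none)."""
    return {group_of(s) for s in train} & {group_of(s) for s in test}


# Toy data: (call_id, individual, sequence_id); values are made up.
calls = [(i, f"ind{i % 3}", f"seq{i // 4}") for i in range(20)]
seq = itemgetter(2)

train, test = grouped_split(calls, seq)
print(len(train), len(test), leaked_groups(train, test, seq))  # → 16 4 set()
```

Splitting by group rather than call by call means the test score reflects generalization to unseen recordings; which key to group on (sequence, session, or individual) depends on the generalization claim being evaluated.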
Affiliation(s)
- Vincent Arnaud
- Département des arts, des lettres et du langage, Université du Québec à Chicoutimi, Chicoutimi, Canada
- Laboratoire Dynamique Du Langage, UMR 5596, Université de Lyon, CNRS, Lyon, France
- François Pellegrino
- Laboratoire Dynamique Du Langage, UMR 5596, Université de Lyon, CNRS, Lyon, France
- Sumir Keenan
- ENES Bioacoustics Research Laboratory, University of Saint Étienne, CRNL, CNRS UMR 5292, Inserm UMR_S 1028, Saint-Étienne, France
- Xavier St-Gelais
- Département des arts, des lettres et du langage, Université du Québec à Chicoutimi, Chicoutimi, Canada
- Nicolas Mathevon
- ENES Bioacoustics Research Laboratory, University of Saint Étienne, CRNL, CNRS UMR 5292, Inserm UMR_S 1028, Saint-Étienne, France
- Florence Levréro
- ENES Bioacoustics Research Laboratory, University of Saint Étienne, CRNL, CNRS UMR 5292, Inserm UMR_S 1028, Saint-Étienne, France
- Christophe Coupé
- Laboratoire Dynamique Du Langage, UMR 5596, Université de Lyon, CNRS, Lyon, France
- Department of Linguistics, The University of Hong Kong, Hong Kong, China
2
Ait Benali B, Mihi S, Ait Mlouk A, El Bazi I, Laachfoubi N. Arabic named entity recognition in social media based on BiLSTM-CRF using an attention mechanism. Journal of Intelligent & Fuzzy Systems 2022. [DOI: 10.3233/jifs-211944] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Named Entity Recognition (NER) is a vitally important task of Natural Language Processing (NLP), which aims at finding named entities in natural language text and classifying them into predefined categories such as persons (PER), places (LOC), organizations (ORG), and so on. In the Arabic context, current deep-learning NER approaches mainly take either word embeddings or character-level embeddings as input. However, a single-granularity representation suffers from out-of-vocabulary (OOV) words, word-embedding errors, and relatively simple semantic content. This paper presents a multi-headed self-attention mechanism implemented in a BiLSTM-CRF neural network to recognize Arabic named entities on social media using two embeddings. Unlike other state-of-the-art approaches, this approach combines character and word embeddings at the embedding layer, and the attention mechanism computes similarity over the entire sequence of characters and captures local context information. The proposed approach recognized NEs in Dialectal Arabic more accurately, reaching an F1 score of 74.15% on Darwish's dataset (a publicly available Arabic NER benchmark for social media). To our knowledge, these results outperform the current state-of-the-art models for Arabic Named Entity Recognition on social media.
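The attention step mentioned above can be illustrated with single-head scaled dot-product self-attention over a sequence of token vectors. This is a generic sketch, not the paper's model: the learned query/key/value projections and the multiple heads are omitted, and combining the two embeddings by concatenation, as well as the toy vectors themselves, are assumptions for illustration.

```python
import math


def softmax(xs):
    m = max(xs)                           # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(seq):
    """Single-head scaled dot-product self-attention with Q = K = V = seq.

    Each output row is a weighted average of all input rows, with weights
    softmax(q . k / sqrt(d)) computed over the whole sequence.
    """
    d = len(seq[0])
    scale = 1.0 / math.sqrt(d)
    out, weights = [], []
    for q in seq:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in seq]
        w = softmax(scores)
        weights.append(w)
        out.append([sum(wi * v[j] for wi, v in zip(w, seq)) for j in range(d)])
    return out, weights


# Hypothetical per-token vectors: a word embedding concatenated with a
# character-level embedding (2 + 2 dimensions, values made up).
word_emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
char_emb = [[0.5, 0.5], [0.0, 0.2], [0.3, 0.1]]
tokens = [w + c for w, c in zip(word_emb, char_emb)]

out, weights = self_attention(tokens)
for row in weights:
    print([round(x, 3) for x in row])
```

In a multi-head layer, several such attention computations run in parallel on different learned projections of the input, and their outputs are concatenated before being passed on (here, presumably to the BiLSTM-CRF layers).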
Affiliation(s)
- B. Ait Benali
- Faculty of Sciences and Techniques, IR2M Laboratory, Hassan First University of Settat, Settat, Morocco
- S. Mihi
- Faculty of Sciences and Techniques, IR2M Laboratory, Hassan First University of Settat, Settat, Morocco
- A. Ait Mlouk
- Department of Information Technology, Division of Scientific Computing, Uppsala University, Sweden
- I. El Bazi
- National School of Business and Management, Sultan Moulay Slimane University, Beni Mellal, Morocco
- N. Laachfoubi
- Faculty of Sciences and Techniques, IR2M Laboratory, Hassan First University of Settat, Settat, Morocco
3
Abstract
This article presents a rule-based grapheme-to-phoneme conversion method and algorithm for Polish. The fundamental grapheme-to-phoneme conversion rules were developed by Maria Steffen-Batóg and presented in her monographs on the automatic grapheme-to-phoneme conversion of Polish texts. The author used these previously developed rules and independently developed the conversion algorithm. The algorithm has been implemented as a software application called TransFon, which converts any text in Polish orthography into the corresponding strings of phonemes in phonemic transcription. Using TransFon, a phonemic Polish language corpus was created from an orthographic corpus. The phonemic corpus allows statistical analysis of the Polish language, as well as the development of phoneme- and word-based language models for automatic speech recognition using statistical methods, and it opens up further research opportunities for improving automatic speech recognition in Polish. Developing statistical methods for speech recognition and language modelling requires access to large language corpora, including phonemic corpora; the method presented here enables the creation of such corpora.
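The abstract does not give TransFon's algorithm. As a rough illustration of what a rule-based converter does, the sketch below applies a small hypothetical table of Polish digraph and diacritic rules with greedy longest-match lookup, using SAMPA-style symbols. It is not Steffen-Batóg's rule set: context-dependent rules such as voicing assimilation, final devoicing, and softening before "i" (as in "ci", "si") are not modelled, so words that need them will be transcribed incorrectly.

```python
# Hypothetical, heavily simplified grapheme -> phoneme table (SAMPA-style
# symbols). Real rule sets are context-sensitive; none of that is modelled.
RULES = {
    "dż": "dZ", "dź": "dz'", "dz": "dz",
    "sz": "S", "cz": "tS", "rz": "Z", "ch": "x", "h": "x",
    "ż": "Z", "ó": "u", "ł": "w", "w": "v", "c": "ts",
    "ś": "s'", "ć": "ts'", "ń": "n'", "ź": "z'",
    "ą": "o~", "ę": "e~", "y": "I",
}
MAX_LEN = max(len(g) for g in RULES)


def to_phonemes(word):
    """Greedy longest-match conversion; letters without a rule map to themselves."""
    word = word.lower()
    out, i = [], 0
    while i < len(word):
        # Try the longest grapheme first, falling back to a single letter.
        for n in range(min(MAX_LEN, len(word) - i), 0, -1):
            chunk = word[i:i + n]
            if chunk in RULES or n == 1:
                out.append(RULES.get(chunk, chunk))
                i += n
                break
    return out


for w in ("szkoła", "rzeczywistość"):
    print(w, " ".join(to_phonemes(w)))
```

Longest-match ordering is what lets digraphs such as "sz" or "cz" take precedence over their single letters; a full rule-based system would add left and right context conditions to each rule.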
4
Abstract
Conversational agents are reshaping our communication environment and have the potential to inform and persuade in new and effective ways. In this paper, we present the underlying technologies and the theoretical background behind a health-care platform dedicated to supporting medical staff and individuals with movement disabilities and to providing advanced monitoring functionalities in hospital and home surroundings. The framework implements an intelligent combination of two research areas: (1) sensor- and camera-based monitoring to collect, analyse, and interpret people's behaviour and (2) natural machine–human interaction through an apprehensive virtual assistant benefiting ailing patients. In addition, the framework serves as an important assistant to caregivers and clinical experts, allowing them to obtain information about the patients in an intuitive manner. The proposed approach capitalises on the latest breakthroughs in computer vision, sensor management, speech recognition, natural language processing, knowledge representation, dialogue management, semantic reasoning, and speech synthesis, combining medical expertise and patient history.
6
Frank SL, Yang J. Lexical representation explains cortical entrainment during speech comprehension. PLoS One 2018; 13:e0197304. [PMID: 29771964 PMCID: PMC5957381 DOI: 10.1371/journal.pone.0197304] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 07/11/2017] [Accepted: 05/01/2018] [Indexed: 11/19/2022] [Open access]
Abstract
Results from a recent neuroimaging study on spoken sentence comprehension have been interpreted as evidence for cortical entrainment to hierarchical syntactic structure. We present a simple computational model that predicts the power spectra from this study, even though the model's linguistic knowledge is restricted to the lexical level, and word-level representations are not combined into higher-level units (phrases or sentences). Hence, the cortical entrainment results can also be explained from the lexical properties of the stimuli, without recourse to hierarchical syntax.
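The argument that word-level properties alone can produce spectral peaks at phrase and sentence rates can be illustrated with a toy signal. This is not the authors' model: it assumes one word every 250 ms (4 Hz), four-word sentences with a fixed adjective-noun-verb-noun pattern (two-word phrases), and a hypothetical scalar value per word class. Because each word class recurs at fixed positions, the spectrum of this purely lexical signal shows power at 1 Hz (sentence rate) and 2 Hz (phrase rate) well above the other frequency bins, even though no words are ever combined.

```python
import cmath
import random

WORD_RATE_HZ = 4.0                  # assumed: one word every 250 ms
N_SENTENCES = 60
# Hypothetical word-level value per word class (e.g. some lexical property).
CLASS_VALUE = {"adj": 0.3, "noun": 1.0, "verb": 0.6}
PATTERN = ["adj", "noun", "verb", "noun"]   # assumed sentence template


def lexical_signal(seed=1, noise_sd=0.1):
    """One sample per word: class value plus word-specific noise.

    Nothing here combines words into phrases or sentences.
    """
    rng = random.Random(seed)
    return [CLASS_VALUE[c] + rng.gauss(0.0, noise_sd)
            for _ in range(N_SENTENCES) for c in PATTERN]


def power_spectrum(x, fs):
    """Naive DFT of the mean-removed signal: (frequency in Hz, power) for bins 1..N/2."""
    n = len(x)
    mean = sum(x) / n
    spec = []
    for k in range(1, n // 2 + 1):
        s = sum((v - mean) * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, v in enumerate(x))
        spec.append((k * fs / n, abs(s) ** 2 / n))
    return spec


spec = power_spectrum(lexical_signal(), WORD_RATE_HZ)
peaks = {f: p for f, p in spec if f in (1.0, 2.0)}
others = sorted(p for f, p in spec if f not in (1.0, 2.0))
median_other = others[len(others) // 2]
print(peaks, median_other)
```

With one sample per word, 2 Hz is the highest resolvable frequency; reproducing a word-rate peak would require modelling each word's time course rather than a single value per word.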
Affiliation(s)
- Stefan L. Frank
- Centre for Language Studies, Radboud University, Nijmegen, The Netherlands
- Jinbiao Yang
- Institute of Brain and Cognitive Science, NYU Shanghai, Shanghai, China
8
Moore RK, Marxer R, Thill S. Vocal Interactivity in-and-between Humans, Animals, and Robots. Front Robot AI 2016. [DOI: 10.3389/frobt.2016.00061] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 01/01/2023] [Open access]