1
Duville MM, Alonso-Valerdi LM, Ibarra-Zarate DI. Improved emotion differentiation under reduced acoustic variability of speech in autism. BMC Med 2024; 22:121. [PMID: 38486293 PMCID: PMC10941423 DOI: 10.1186/s12916-024-03341-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2023] [Accepted: 03/05/2024] [Indexed: 03/17/2024]
Abstract
BACKGROUND: Socio-emotional impairments are among the diagnostic criteria for autism spectrum disorder (ASD), but current evidence supports both altered and intact recognition of emotional prosodies. Here, a Bayesian framework of perception is considered, suggesting that the oversampling of sensory evidence would impair perception within highly variable environments. However, reliable hierarchical structures for spectral and temporal cues would foster emotion discrimination by autistics.
METHODS: Event-related spectral perturbations (ERSP) extracted from electroencephalographic (EEG) data indexed the perception of anger, disgust, fear, happiness, neutral, and sadness prosodies while listening to speech uttered by (a) human or (b) synthesized voices characterized by reduced volatility and variability of acoustic environments. The assessment of mechanisms for perception was extended to the visual domain by analyzing the behavioral accuracy within a non-social task in which dynamics of precision weighting between bottom-up evidence and top-down inferences were emphasized. Eighty children (mean age 9.7 years; standard deviation 1.8) volunteered, including 40 autistics. The symptomatology was assessed at the time of the study via the Autism Diagnostic Observation Schedule, Second Edition, and parents' responses on the Autism Spectrum Rating Scales. A mixed within-between analysis of variance was conducted to assess the effects of group (autism versus typical development), voice, emotions, and interaction between factors. A Bayesian analysis was implemented to quantify the evidence in favor of the null hypothesis in case of non-significance. Post hoc comparisons were corrected for multiple testing.
RESULTS: Autistic children presented impaired emotion differentiation while listening to speech uttered by human voices, which improved when the acoustic volatility and variability of voices were reduced. Neural patterns diverged between neurotypicals and autistics, emphasizing different mechanisms for perception. Accordingly, behavioral measurements on the visual task were consistent with the over-precision ascribed to the environmental variability (sensory processing) that weakened performance. Unlike autistic children, neurotypicals could differentiate emotions induced by all voices.
CONCLUSIONS: This study outlines behavioral and neurophysiological mechanisms that underpin responses to sensory variability. Neurobiological insights into the processing of emotional prosodies emphasized the potential of acoustically modified emotional prosodies to improve emotion differentiation by autistics.
TRIAL REGISTRATION: BioMed Central ISRCTN Registry, ISRCTN18117434. Registered on September 20, 2020.
Affiliation(s)
- Mathilde Marie Duville
- Escuela de Ingeniería y Ciencias, Tecnologico de Monterrey, Ave. Eugenio Garza Sada 2501 Sur, Col: Tecnológico, Monterrey, N.L, 64700, México
- Luz María Alonso-Valerdi
- Escuela de Ingeniería y Ciencias, Tecnologico de Monterrey, Ave. Eugenio Garza Sada 2501 Sur, Col: Tecnológico, Monterrey, N.L, 64700, México
- David I Ibarra-Zarate
- Escuela de Ingeniería y Ciencias, Tecnologico de Monterrey, Ave. Eugenio Garza Sada 2501 Sur, Col: Tecnológico, Monterrey, N.L, 64700, México
2
Akinpelu S, Viriri S. Speech emotion classification using attention based network and regularized feature selection. Sci Rep 2023; 13:11990. [PMID: 37491423 PMCID: PMC10368662 DOI: 10.1038/s41598-023-38868-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/13/2023] [Accepted: 07/16/2023] [Indexed: 07/27/2023]
Abstract
Speech emotion classification (SEC) has gained considerable prominence within the research community in recent times. Its vital role in Human-Computer Interaction (HCI) and affective computing cannot be overemphasized. Many primitive algorithmic solutions and deep neural network (DNN) models have been proposed for efficient recognition of emotion from speech; however, the suitability of these methods for accurately classifying emotion from speech with a multi-lingual background, along with other factors that impede efficient classification of emotion, still demands critical consideration. This study proposed an attention-based network with a pre-trained convolutional neural network and regularized neighbourhood component analysis (RNCA) feature selection techniques for improved classification of speech emotion. The attention model has proven to be successful in many sequence-based and time-series tasks. An extensive experiment was carried out using three major classifiers (SVM, MLP and Random Forest) on the publicly available TESS (Toronto Emotional Speech Set) dataset. The proposed model (Attention-based DCNN+RNCA+RF) achieved 97.8% classification accuracy, a 3.27% improvement in performance that outperforms state-of-the-art SEC approaches. Our model evaluation revealed the consistency of the attention mechanism and feature selection with human behavioural patterns in classifying emotion from auditory speech.
Affiliation(s)
- Samson Akinpelu
- School of Mathematics, Statistics and Computer Science, University of KwaZulu-Natal, Durban, 4000, South Africa
- Serestina Viriri
- School of Mathematics, Statistics and Computer Science, University of KwaZulu-Natal, Durban, 4000, South Africa
3
Duville MM, Corona-González CE, Romo De León R, Rodríguez Vera A, Flores-Jimenez MS, Ibarra-Zarate DI, Alonso-Valerdi LM. Perception of task-irrelevant affective prosody by typically developed and diagnosed children with Autism Spectrum Disorder under attentional loads: electroencephalographic and behavioural data. Data Brief 2023; 48:109057. [PMID: 37006385 PMCID: PMC10060595 DOI: 10.1016/j.dib.2023.109057] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2022] [Revised: 02/13/2023] [Accepted: 03/06/2023] [Indexed: 03/16/2023]
Abstract
The relevance of affective information triggers cognitive prioritisation, dictated by both the attentional load of the relevant task and socio-emotional abilities. This dataset provides electroencephalographic (EEG) signals related to implicit emotional speech perception under low, intermediate, and high attentional demands. Demographic and behavioural data are also provided. Specific social-emotional reciprocity and verbal communication characterise Autism Spectrum Disorder (ASD) and may influence the processing of affective prosodies. Therefore, 62 children and their parents or legal guardians participated in data collection, including 31 children with high autistic traits (mean age 9.6 years, standard deviation 1.5) who had previously received a diagnosis of ASD from a medical specialist, and 31 typically developed children (mean age 10.2 years, standard deviation 1.2). Assessments of the scope of autistic behaviours using the Autism Spectrum Rating Scales (ASRS, parent report) are provided for every child. During the experiment, children listened to task-irrelevant affective prosodies (anger, disgust, fear, happiness, neutral and sadness) while performing three visual tasks: neutral image viewing (low attentional load), one-target 4-disc Multiple Object Tracking (MOT; intermediate), and one-target 8-disc MOT (high). The EEG data recorded during all three tasks and the tracking capacity (behavioural data) from MOT conditions are included in the dataset. In particular, the tracking capacity was computed as a standardised index of attentional abilities during MOT, corrected for guessing. Beforehand, children completed the Edinburgh Handedness Inventory, and their resting-state EEG activity was recorded for 2 minutes with eyes open. Those data are also provided. The present dataset can be used to investigate the electrophysiological correlates of implicit emotion and speech perception and their interaction with attentional load and autistic traits. In addition, resting-state EEG data may be used to characterise inter-individual heterogeneity at rest and, in turn, associate it with attentional capacities during MOT and with autistic behavioural patterns. Finally, tracking capacity may be useful to explore dynamic and selective attentional mechanisms under emotional constraints.
Affiliation(s)
- Mathilde Marie Duville
- Tecnologico de Monterrey, Escuela de Ingeniería y Ciencias, Ave. Eugenio Garza Sada 2501, Monterrey, N.L., México, 64849
- César E. Corona-González
- Tecnologico de Monterrey, Escuela de Ingeniería y Ciencias, Ave. Eugenio Garza Sada 2501, Monterrey, N.L., México, 64849
- Rebeca Romo De León
- Tecnologico de Monterrey, Escuela de Ingeniería y Ciencias, Ave. Eugenio Garza Sada 2501, Monterrey, N.L., México, 64849
- Andrea Rodríguez Vera
- Tecnologico de Monterrey, Escuela de Ingeniería y Ciencias, Ave. Eugenio Garza Sada 2501, Monterrey, N.L., México, 64849
- Mariana S. Flores-Jimenez
- Tecnológico de Monterrey, Escuela de Ingeniería y Ciencias, Campus Guadalajara, Av. Gral. Ramón Corona No 2514, Colonia Nuevo México, Zapopan, Jalisco 45121, México
- David I. Ibarra-Zarate
- Tecnologico de Monterrey, Escuela de Ingeniería y Ciencias, Ave. Eugenio Garza Sada 2501, Monterrey, N.L., México, 64849
- Luz María Alonso-Valerdi
- Tecnologico de Monterrey, Escuela de Ingeniería y Ciencias, Ave. Eugenio Garza Sada 2501, Monterrey, N.L., México, 64849
4
Duville MM, Ibarra-Zarate DI, Alonso-Valerdi LM. Autistic traits shape neuronal oscillations during emotion perception under attentional load modulation. Sci Rep 2023; 13:8178. [PMID: 37210415 DOI: 10.1038/s41598-023-35013-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/31/2023] [Accepted: 05/11/2023] [Indexed: 05/22/2023]
Abstract
Emotional content is particularly salient, but situational factors such as cognitive load may disturb the attentional prioritization towards affective stimuli and interfere with their processing. In this study, 31 autistic and 31 typically developed children volunteered to assess their perception of affective prosodies via event-related spectral perturbations of neuronal oscillations recorded by electroencephalography under attentional load modulations induced by Multiple Object Tracking or neutral images. Although intermediate load optimized emotion processing by typically developed children, load and emotion did not interplay in children with autism. Results also outlined impaired emotional integration emphasized in theta, alpha and beta oscillations at early and late stages, and lower attentional ability indexed by the tracking capacity. Furthermore, both tracking capacity and neuronal patterns of emotion perception during task were predicted by daily-life autistic behaviors. These findings highlight that intermediate load may encourage emotion processing in typically developed children. However, autism aligns with impaired affective processing and selective attention, both insensitive to load modulations. Results were discussed within a Bayesian perspective that suggests atypical updating in precision between sensations and hidden states, towards poor contextual evaluations. For the first time, implicit emotion perception assessed by neuronal markers was integrated with environmental demands to characterize autism.
Affiliation(s)
- Mathilde Marie Duville
- Tecnologico de Monterrey, Escuela de Ingeniería y Ciencias, Ave. Eugenio Garza Sada 2501, 64849, Monterrey, NL, México
- David I Ibarra-Zarate
- Tecnologico de Monterrey, Escuela de Ingeniería y Ciencias, Ave. Eugenio Garza Sada 2501, 64849, Monterrey, NL, México
- Luz María Alonso-Valerdi
- Tecnologico de Monterrey, Escuela de Ingeniería y Ciencias, Ave. Eugenio Garza Sada 2501, 64849, Monterrey, NL, México
5
Duville MM, Alonso-Valerdi LM, Ibarra-Zarate DI. Neuronal and behavioral affective perceptions of human and naturalness-reduced emotional prosodies. Front Comput Neurosci 2022; 16:1022787. [PMID: 36465969 PMCID: PMC9716567 DOI: 10.3389/fncom.2022.1022787] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/18/2022] [Accepted: 10/24/2022] [Indexed: 12/27/2024]
Abstract
Artificial voices are nowadays embedded into our daily lives, with the latest neural voices approaching human voice consistency (naturalness). Nevertheless, behavioral and neuronal correlates of the perception of less naturalistic emotional prosodies are still poorly understood. In this study, we explored the acoustic tendencies that define naturalness from human to synthesized voices. Then, we created naturalness-reduced emotional utterances by acoustic editing of human voices. Finally, we used Event-Related Potentials (ERP) to assess the time dynamics of emotional integration when listening to both human and synthesized voices in a healthy adult sample. Additionally, listeners rated their perceptions of valence, arousal, discrete emotions, naturalness, and intelligibility. Synthesized voices were characterized by less lexical stress (i.e., reduced difference between stressed and unstressed syllables within words) with regard to duration and median pitch modulations. In addition, spectral content was attenuated toward lower F2 and F3 frequencies and lower intensities for harmonics 1 and 4. Both psychometric and neuronal correlates were sensitive to naturalness reduction: (1) naturalness and intelligibility ratings dropped when emotional utterances were synthesized; (2) discrete emotion recognition was impaired as naturalness declined, consistent with P200 and Late Positive Potentials (LPP) being less sensitive to emotional differentiation at lower naturalness; and (3) relative P200 and LPP amplitudes between prosodies were modulated by synthetization. Nevertheless, (4) valence and arousal perceptions were preserved at lower naturalness; (5) valence (arousal) ratings correlated negatively (positively) with Higuchi's fractal dimension extracted from neuronal data under all naturalness perturbations; and (6) Inter-Trial Phase Coherence (ITPC) and standard deviation measurements revealed high inter-individual heterogeneity for emotion perception that is still preserved as naturalness reduces. Notably, partial between-participant synchrony (low ITPC), along with high amplitude dispersion on ERPs at both early and late stages, emphasized miscellaneous emotional responses among subjects. In this study, we highlighted for the first time both the behavioral and neuronal bases of emotional perception under acoustic naturalness alterations. Partial dependencies between ecological relevance and emotion understanding outlined the modulation but not the annihilation of emotional integration by synthetization.
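As a rough illustration of the phase-coherence measure named in this abstract (not the authors' pipeline), phase coherence at a single time-frequency point is commonly computed as the length of the average of unit phase vectors. The function name `itpc` and the sample phase values below are assumptions made for this sketch; the study's own EEG data and processing steps are not reproduced here.

```python
import cmath
import math


def itpc(phases):
    """Phase coherence for one time-frequency point.

    phases: iterable of phase angles in radians, one per trial
            (or one per participant, for between-participant synchrony).
    Returns a value in [0, 1]: 1 means every phase is identical,
    values near 0 mean the phases are spread around the circle.
    """
    phases = list(phases)
    if not phases:
        raise ValueError("need at least one phase angle")
    # Average the unit-length phase vectors, then take the vector length.
    mean_vector = sum(cmath.exp(1j * p) for p in phases) / len(phases)
    return abs(mean_vector)


# Hypothetical inputs: perfectly aligned phases versus opposed phases.
aligned = itpc([0.3] * 8)
opposed = itpc([0.0, math.pi, 0.0, math.pi])
print(f"aligned: {aligned:.3f}, opposed: {opposed:.3f}")
```

A low value, as in the opposed case, corresponds to the partial synchrony the abstract describes: the responses do not share a common phase at that point.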
6
Automatic Speech Emotion Recognition of Younger School Age Children. MATHEMATICS 2022. [DOI: 10.3390/math10142373] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Abstract
This paper introduces an extended description of a database that contains emotional speech in the Russian language from children of younger school age (8–12 years old) and describes the results of validating the database with classical machine learning algorithms, such as Support Vector Machine (SVM) and Multi-Layer Perceptron (MLP). The validation is performed using standard procedures and scenarios similar to those used for other well-known databases of children's acted emotional speech. Performance evaluation of automatic multiclass recognition on four emotion classes (Neutral (Calm), Joy, Sadness, Anger) shows that both SVM and MLP outperform the results of perceptual tests. Moreover, the results of automatic recognition on the test dataset that was used in the perceptual test are even better. These results show that emotions in the database can be reliably recognized both by experts and automatically using classical machine learning algorithms such as SVM and MLP, which can serve as baselines for comparing emotion recognition systems based on more sophisticated modern machine learning methods and deep neural networks. The results also confirm that this database can be a valuable resource for researchers studying affective reactions in speech communication during child-computer interactions in the Russian language and can be used to develop applications in areas such as edutainment and health care.