1. Sinvani RT, Fogel-Grinvald H, Sapir S. Self-Rated Confidence in Vocal Emotion Recognition Ability: The Role of Gender. Journal of Speech, Language, and Hearing Research. 2024;67:1413-1423. PMID: 38625128. DOI: 10.1044/2024_jslhr-23-00373.
Abstract
PURPOSE We studied the role of gender in metacognition of voice emotion recognition ability (ERA), as reflected by self-rated confidence (SRC). To this end, we took two approaches: first, examining the role of gender in voice ERA and SRC independently, and second, testing for gender effects on the association between ERA and SRC. METHOD We asked 100 participants (50 men, 50 women) to interpret the emotional meaning of a set of vocal expressions portrayed by 30 actors (16 men, 14 women). Targets were 180 repetitive lexical sentences articulated in congruent emotional voices (anger, sadness, surprise, happiness, fear) and neutral expressions. On each trial, participants assigned a retrospective SRC rating to their emotion recognition performance. RESULTS A binomial generalized linear mixed model (GLMM) estimating ERA accuracy revealed a significant gender effect, with women encoders (speakers) yielding higher accuracy levels than men. There was no significant effect of the decoder's (listener's) gender. A second GLMM estimating SRC found significant effects of encoder and decoder gender, with women outperforming men. Gamma correlations were significantly greater than zero for both women and men decoders. CONCLUSIONS Despite the differing patterns of gender effects in the two measures (ERA and SRC), our results suggest that both men and women decoders were accurate in their metacognition regarding voice emotion recognition. Further research is needed to study how individuals of both genders use metacognitive knowledge in emotion recognition and whether and how such knowledge contributes to effective social communication.
Affiliations
- Shimon Sapir: Department of Communication Sciences and Disorders, Faculty of Social Welfare and Health Sciences, University of Haifa, Israel

2. Larrouy-Maestri P, Poeppel D, Pell MD. The Sound of Emotional Prosody: Nearly 3 Decades of Research and Future Directions. Perspectives on Psychological Science. 2024:17456916231217722. PMID: 38232303. DOI: 10.1177/17456916231217722.
Abstract
Emotional voices attract considerable attention. A search on any browser using "emotional prosody" as a key phrase leads to more than a million entries. Such interest is evident in the scientific literature as well; readers are reminded in the introductory paragraphs of countless articles of the great importance of prosody and that listeners easily infer the emotional state of speakers through acoustic information. However, despite decades of research on this topic and important achievements, the mapping between acoustics and emotional states is still unclear. In this article, we chart the rich literature on emotional prosody for both newcomers to the field and researchers seeking updates. We also summarize problems revealed by a sample of the literature of the last decades and propose concrete research directions for addressing them, ultimately to satisfy the need for more mechanistic knowledge of emotional prosody.
Affiliations
- Pauline Larrouy-Maestri: Max Planck Institute for Empirical Aesthetics, Frankfurt, Germany; School of Communication Sciences and Disorders, McGill University; Max Planck-NYU Center for Language, Music, and Emotion, New York, New York
- David Poeppel: Max Planck-NYU Center for Language, Music, and Emotion, New York, New York; Department of Psychology and Center for Neural Science, New York University; Ernst Strüngmann Institute for Neuroscience, Frankfurt, Germany
- Marc D Pell: School of Communication Sciences and Disorders, McGill University; Centre for Research on Brain, Language, and Music, Montreal, Quebec, Canada

3. Becker D, Bernecker K. The Role of Hedonic Goal Pursuit in Self-Control and Self-Regulation: Is Pleasure the Problem or Part of the Solution? Affective Science. 2023;4:470-474. PMID: 37744979. PMCID: PMC10514018. DOI: 10.1007/s42761-023-00193-2.
Abstract
This paper examines the role of hedonic goal pursuit in self-control and self-regulation. We argue that not all pursuit of immediate pleasure is problematic and that successful hedonic goal pursuit can be beneficial for long-term goal pursuit and for achieving positive self-regulatory outcomes, such as health and well-being. The following two key questions for future research are discussed: How can people's positive affective experiences during hedonic goal pursuit be enhanced, and how exactly do those affective experiences contribute to self-regulatory outcomes? We also call for an intercultural perspective linking hedonic goal pursuit to self-regulatory outcomes at different levels. We suggest that understanding the cognitive, motivational, and affective mechanisms at play can help individuals reap the benefits of successful hedonic goal pursuit. Considering those potential benefits, hedonic goal pursuit should be studied more systematically. To achieve this, we argue for a stronger integration of affective science and self-control research.
Affiliations
- Daniela Becker: Behavioural Science Institute, Radboud University, Thomas van Aquinostraat 4, 6525 GD Nijmegen, The Netherlands
- Katharina Bernecker: Department of Psychology, University of Zurich, Zurich, Switzerland; URPP Dynamics of Healthy Aging, University of Zurich, Zurich, Switzerland

4. Zolnoori M, Zolnour A, Topaz M. ADscreen: A speech processing-based screening system for automatic identification of patients with Alzheimer's disease and related dementia. Artif Intell Med. 2023;143:102624. PMID: 37673583. PMCID: PMC10483114. DOI: 10.1016/j.artmed.2023.102624.
Abstract
Alzheimer's disease and related dementias (ADRD) present a looming public health crisis, affecting roughly 5 million people and 11% of older adults in the United States. Despite nationwide efforts for timely diagnosis of patients with ADRD, more than 50% of them are not diagnosed and are unaware of their disease. To address this challenge, we developed ADscreen, an innovative speech-processing-based ADRD screening algorithm for the proactive identification of patients with ADRD. ADscreen consists of five major components: (i) noise reduction to remove background noise from the audio-recorded patient speech, (ii) modeling the patient's ability in phonetic motor planning using acoustic parameters of the patient's voice, (iii) modeling the patient's ability at the semantic and syntactic levels of language organization using linguistic parameters of the patient's speech, (iv) extracting vocal and semantic psycholinguistic cues from the patient's speech, and (v) building and evaluating the screening algorithm. To identify important speech parameters (features) associated with ADRD, we used Joint Mutual Information Maximization (JMIM), an effective feature selection method for high-dimensional, small-sample-size datasets. The relationship between speech parameters and the outcome variable (presence/absence of ADRD) was modeled using three different machine learning (ML) architectures capable of joining informative acoustic and linguistic parameters with contextual word embedding vectors obtained from DistilBERT (a distilled variant of Bidirectional Encoder Representations from Transformers). We evaluated the performance of ADscreen on audio-recorded patient speech (verbal descriptions) from the Cookie-Theft picture description task, which is publicly available in the dementia databank. The joint fusion of acoustic and linguistic parameters with contextual word embedding vectors from DistilBERT achieved an F1-score of 84.64 (standard deviation [SD] = ±3.58) and an AUC-ROC of 92.53 (SD = ±3.34) on the training dataset, and an F1-score of 89.55 and an AUC-ROC of 93.89 on the test dataset. In summary, ADscreen has strong potential to be integrated into clinical workflows to address the need for an ADRD screening tool, so that patients with cognitive impairment can receive appropriate and timely care.
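As an editorial illustration of the "joint fusion" idea described in this abstract (not the authors' ADscreen pipeline, and omitting JMIM feature selection), the sketch below concatenates hand-crafted acoustic/linguistic features with DistilBERT transcript embeddings before fitting a simple classifier; all transcripts, feature matrices, and labels are invented placeholders.

```python
# Hypothetical sketch of joint fusion: hand-crafted features + DistilBERT
# transcript embeddings -> simple classifier. Not the authors' code; the
# feature matrices and labels below are random placeholders.
import numpy as np
import torch
from transformers import DistilBertModel, DistilBertTokenizer
from sklearn.linear_model import LogisticRegression

tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")
bert = DistilBertModel.from_pretrained("distilbert-base-uncased")

def embed(transcript: str) -> np.ndarray:
    """Mean-pooled DistilBERT embedding of one picture-description transcript."""
    inputs = tokenizer(transcript, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = bert(**inputs).last_hidden_state   # (1, n_tokens, 768)
    return hidden.mean(dim=1).squeeze(0).numpy()

transcripts = ["the boy is taking a cookie from the jar",
               "the mother is washing dishes and the sink overflows"]
acoustic = np.random.rand(len(transcripts), 12)      # placeholder acoustic features
linguistic = np.random.rand(len(transcripts), 8)     # placeholder linguistic features
labels = np.array([1, 0])                            # toy labels: 1 = ADRD, 0 = control

X = np.hstack([acoustic, linguistic,
               np.vstack([embed(t) for t in transcripts])])
clf = LogisticRegression(max_iter=1000).fit(X, labels)
print(clf.predict(X))
```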
Affiliations
- Maryam Zolnoori, Maxim Topaz: Columbia University Medical Center, New York, NY, United States of America; School of Nursing, Columbia University, New York, NY, United States of America
- Ali Zolnour: School of Electrical and Computer Engineering, University of Tehran, Tehran, Iran

5. Baglione H, Coulombe V, Martel-Sauvageau V, Monetta L. The impacts of aging on the comprehension of affective prosody: A systematic review. Applied Neuropsychology: Adult. 2023:1-16. PMID: 37603689. DOI: 10.1080/23279095.2023.2245940.
Abstract
Recent clinical reports have suggested a possible decline with aging in the ability to understand emotions in speech (affective prosody comprehension). The present study aims to further examine the differences in affective prosody comprehension performance between older and younger adults. Following a recent cognitive model that divides affective prosody comprehension into perceptual and lexico-semantic components, a cognitive approach targeting these components was adopted. The influence of emotion valence and category on older adults' performance was also investigated. A systematic review of the literature was carried out using six databases. Twenty-one articles, presenting 25 experiments, were included. All experiments analyzed the affective prosody comprehension performance of older versus younger adults. The results confirmed that older adults' performance in identifying emotions in speech was reduced compared to that of younger adults. The results also indicated that affective prosody comprehension abilities could be modulated by emotion category but not by emotional valence. Various theories account for this difference in performance, namely auditory perception, brain aging, and socioemotional selectivity theory, which suggests that older people tend to neglect negative emotions. However, explanations of the deficits underlying this decline in affective prosody comprehension remain limited.
Affiliations
- Héloïse Baglione, Valérie Coulombe, Vincent Martel-Sauvageau, Laura Monetta: Département de réadaptation, Université Laval, Québec City, Quebec, Canada; Centre interdisciplinaire de recherche en réadaptation et intégration sociale (CIRRIS), Québec City, Quebec, Canada

6. Illner V, Tykalova T, Skrabal D, Klempir J, Rusz J. Automated Vowel Articulation Analysis in Connected Speech Among Progressive Neurological Diseases, Dysarthria Types, and Dysarthria Severities. Journal of Speech, Language, and Hearing Research. 2023:1-22. PMID: 37499137. DOI: 10.1044/2023_jslhr-22-00526.
Abstract
PURPOSE Although articulatory impairment represents a distinct speech characteristic in most neurological diseases affecting movement, methods allowing automated assessment of articulation deficits from connected speech are scarce. This study aimed to design a fully automated method for analyzing dysarthria-related vowel articulation impairment and to estimate its sensitivity across a broad range of neurological diseases and various types and severities of dysarthria. METHOD Unconstrained monologue and reading passages were acquired from 459 speakers, including 306 healthy controls and 153 neurological patients. The algorithm utilized a formant tracker in combination with a phoneme recognizer and subsequent signal processing analysis. RESULTS Articulatory undershoot of vowels was present in a broad spectrum of progressive neurodegenerative diseases, including Parkinson's disease, progressive supranuclear palsy, multiple system atrophy, Huntington's disease, essential tremor, cerebellar ataxia, multiple sclerosis, and amyotrophic lateral sclerosis, as well as in the related dysarthria subtypes including hypokinetic, hyperkinetic, ataxic, spastic, flaccid, and their mixed variants. Formant ratios showed a higher sensitivity to vowel deficits than vowel space area. First formants of corner vowels were significantly lower for multiple system atrophy than for cerebellar ataxia. Second formants of vowels /a/ and /i/ were lower in ataxic compared to spastic dysarthria. Discriminant analysis showed a classification score of up to 41.0% for disease type, 39.3% for dysarthria type, and 49.2% for dysarthria severity. Algorithm accuracy reached an F-score of 0.77. CONCLUSIONS Distinctive vowel articulation alterations reflect the underlying pathophysiology in neurological diseases. Objective acoustic analysis of vowel articulation has the potential to provide a universal method to screen for motor speech disorders. SUPPLEMENTAL MATERIAL https://doi.org/10.23641/asha.23681529.
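To make the vowel-articulation metrics mentioned above concrete, here is a minimal generic sketch (not the authors' algorithm) computing a triangular vowel space area and a simple F2 ratio from mean corner-vowel formants; the formant values used are invented for illustration.

```python
# Generic vowel-articulation metrics from corner-vowel formants (Hz).
# Illustrative only; the formant values below are made up.
def vowel_space_area(f1_a, f2_a, f1_i, f2_i, f1_u, f2_u):
    """Triangular vowel space area (Hz^2) for /a/, /i/, /u/ via the shoelace formula."""
    return 0.5 * abs(f1_i * (f2_a - f2_u) + f1_a * (f2_u - f2_i) + f1_u * (f2_i - f2_a))

def f2_ratio(f2_i, f2_u):
    """F2i/F2u ratio, a common index of articulatory undershoot."""
    return f2_i / f2_u

print(vowel_space_area(750, 1300, 300, 2300, 350, 800))  # ~312500 Hz^2
print(f2_ratio(2300, 800))                               # ~2.9
```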
Affiliations
- Vojtech Illner, Tereza Tykalova: Department of Circuit Theory, Faculty of Electrical Engineering, Czech Technical University in Prague, Czech Republic
- Dominik Skrabal, Jiri Klempir: Department of Neurology and Centre of Clinical Neuroscience, First Faculty of Medicine, Charles University and General University Hospital, Prague, Czech Republic
- Jan Rusz: Department of Circuit Theory, Faculty of Electrical Engineering, Czech Technical University in Prague, Czech Republic; Department of Neurology and Centre of Clinical Neuroscience, First Faculty of Medicine, Charles University and General University Hospital, Prague, Czech Republic; Department of Neurology and ARTORG Center, Inselspital, Bern University Hospital, University of Bern, Switzerland

7. Opladen V, Tanck JA, Baur J, Hartmann AS, Svaldi J, Vocks S. Body exposure and vocal analysis: validation of fundamental frequency as a correlate of emotional arousal and valence. Front Psychiatry. 2023;14:1087548. PMID: 37293400. PMCID: PMC10244733. DOI: 10.3389/fpsyt.2023.1087548.
Abstract
Introduction Vocal analysis of fundamental frequency (f0) represents a suitable index for assessing emotional activation. However, although f0 has often been used as an indicator of emotional arousal and different affective states, its psychometric properties are unclear. Specifically, there is uncertainty regarding the validity of the f0 mean and the f0 variability measures (f0 dispersion, f0 range, and f0 SD), and whether higher or lower f0 indices are associated with higher arousal in stressful situations. The present study therefore aimed to validate f0 as a marker of vocally encoded emotional arousal, valence, and body-related distress during body exposure as a psychological stressor. Methods N = 73 female participants first underwent a 3-min, non-activating neutral reference condition, followed by a 7-min activating body exposure condition. Participants completed questionnaires on affect (i.e., arousal, valence, body-related distress), and their voice data and heart rate (HR) were recorded continuously. Vocal analyses were performed using Praat, a program for extracting paralinguistic measures from spoken audio. Results The results revealed no associations between f0 and state body dissatisfaction or general affect. The f0 mean correlated positively with self-reported arousal and negatively with valence, but was not correlated with mean or maximum HR. No correlations with any measure were found for the f0 variability measures. Discussion Given the promising findings regarding the f0 mean for arousal and valence and the inconclusive findings regarding f0 as a marker of general affect and body-related distress, it may be assumed that the f0 mean represents a valid global marker of emotional arousal and valence rather than of concrete body-related distress. In view of the present findings regarding the validity of f0, it may be suggested that the f0 mean, but not the f0 variability measures, can be used to assess emotional arousal and valence in addition to self-report measures, and that it is less intrusive than conventional psychophysiological measures.
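For orientation, the f0 summary measures discussed above can be extracted from a recording with Praat through its Python interface, parselmouth; the sketch below is a generic illustration rather than the authors' script, and the file name and pitch-range settings are assumptions.

```python
# Minimal sketch of extracting f0 summary measures with Praat via parselmouth.
# "recording.wav" and the pitch floor/ceiling are placeholder assumptions.
import numpy as np
import parselmouth

snd = parselmouth.Sound("recording.wav")
pitch = snd.to_pitch(time_step=0.01, pitch_floor=75.0, pitch_ceiling=500.0)

f0 = pitch.selected_array["frequency"]
f0 = f0[f0 > 0]  # drop unvoiced frames (reported as 0 Hz)

f0_mean = np.mean(f0)
f0_sd = np.std(f0)
f0_range = np.max(f0) - np.min(f0)
print(f"f0 mean={f0_mean:.1f} Hz, SD={f0_sd:.1f} Hz, range={f0_range:.1f} Hz")
```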
Affiliations
- Vanessa Opladen, Julia A. Tanck, Silja Vocks: Department of Clinical Psychology and Psychotherapy, Osnabrück University, Osnabrück, Germany
- Julia Baur, Jennifer Svaldi: Department of Clinical Psychology and Psychotherapy, University of Tübingen, Tübingen, Germany
- Andrea S. Hartmann: Department of Psychology, Experimental Clinical Psychology, University of Konstanz, Konstanz, Germany

8. Ekberg M, Stavrinos G, Andin J, Stenfelt S, Dahlström Ö. Acoustic Features Distinguishing Emotions in Swedish Speech. J Voice. 2023:S0892-1997(23)00103-0. PMID: 37045739. DOI: 10.1016/j.jvoice.2023.03.010.
Abstract
Few studies have examined which acoustic features of speech can be used to distinguish between different emotions, and how combinations of acoustic parameters contribute to the identification of emotions. The aim of the present study was to investigate which acoustic parameters in Swedish speech are most important for differentiation between, and identification of, the emotions anger, fear, happiness, sadness, and surprise in Swedish sentences. One-way ANOVAs were used to compare acoustic parameters between the emotions, and both simple and multiple logistic regression models were used to examine the contribution of different acoustic parameters to differentiation between emotions. Results showed differences between emotions for several acoustic parameters in Swedish speech: surprise was the most distinct emotion, with significant differences compared to the other emotions across a range of acoustic parameters, while anger and happiness did not differ from each other on any parameter. The logistic regression models showed that fear was the best-predicted emotion, while happiness was the most difficult to predict. Frequency- and spectral-balance-related parameters were best at predicting fear. Amplitude- and temporal-related parameters were most important for surprise, while a combination of frequency-, amplitude-, and spectral-balance-related parameters was important for sadness. Assuming that there are similarities between acoustic models and how listeners infer emotions in speech, the results suggest that individuals with hearing loss, who have reduced frequency-detection abilities, may have more difficulty than normal-hearing individuals in identifying fear in Swedish speech. Since happiness and fear relied primarily on amplitude- and spectral-balance-related parameters, their detection is probably facilitated more by hearing aid use.
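As a generic illustration of the kind of modeling described above (not the authors' analysis), the following sketch fits a cross-validated multinomial logistic regression predicting emotion category from acoustic parameters; the file path and column names are placeholders.

```python
# Illustrative sketch: multinomial logistic regression predicting emotion
# category from acoustic parameters. File path and column names are placeholders.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("acoustic_features.csv")  # placeholder path
features = ["f0_mean", "f0_sd", "intensity_mean", "hnr", "spectral_slope"]  # placeholders
X, y = df[features], df["emotion"]         # e.g., anger/fear/happiness/sadness/surprise

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(model, X, y, cv=5)
print("Mean cross-validated accuracy:", scores.mean())
```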
Affiliations
- M Ekberg, G Stavrinos, J Andin, Ö Dahlström: Department of Behavioural Sciences and Learning, Linköping University, Linköping, Östergötland, Sweden
- S Stenfelt: Department of Biomedical and Clinical Sciences, Linköping University, Linköping, Östergötland, Sweden

9. Lyakso E, Ruban N, Frolova O, Mekala MA. The children's emotional speech recognition by adults: Cross-cultural study on Russian and Tamil language. PLoS One. 2023;18:e0272837. PMID: 36791129. PMCID: PMC9931107. DOI: 10.1371/journal.pone.0272837.
Abstract
The current study investigated cross-cultural recognition of four basic emotions ("joy," "neutral (calm state)," "sad," and "anger") in the spontaneous and acting speech of Indian and Russian children aged 8-12 years across the Russian and Tamil languages. The research tasks were to examine the ability of Russian and Indian experts to recognize the state of Russian and Indian children from their speech, to determine the acoustic features of correctly recognized speech samples, and to specify the influence of the expert's language on cross-cultural recognition of children's emotional states. The study includes a perceptual auditory study with listeners and an instrumental spectrographic analysis of child speech. Russian and Indian experts differed in accuracy and agreement when recognizing the emotional states of Indian and Russian children from their speech, with more accurate recognition of the emotional states of children speaking the experts' native language and of acting speech versus spontaneous speech. Both groups of experts recognized the state of anger in acting speech with high agreement. The groups of experts differed in their identification of joy, sadness, and neutral states, with the level of agreement depending on the test material. Speech signals with emphasized differences in acoustic patterns were more accurately classified by experts as belonging to emotions of different activation. The data showed that, despite the universality of basic emotions, the cultural environment affects their expression and perception on the one hand, while on the other hand there are universal non-linguistic acoustic features of the voice that allow emotions to be identified from speech.
Affiliations
- Elena Lyakso, Olga Frolova: The Child Speech Research Group, St. Petersburg State University, St. Petersburg, Russia
- Nersisson Ruban: School of Electrical Engineering, Vellore Institute of Technology, Vellore, India
- Mary A. Mekala: School of Information Technology and Engineering, Vellore Institute of Technology, Vellore, India

10. Vos S, Collignon O, Boets B. The Sound of Emotion: Pinpointing Emotional Voice Processing Via Frequency Tagging EEG. Brain Sci. 2023;13:162. PMID: 36831705. PMCID: PMC9954097. DOI: 10.3390/brainsci13020162.
Abstract
Successfully engaging in social communication requires efficient processing of subtle socio-communicative cues. Voices convey a wealth of social information, such as gender, identity, and the emotional state of the speaker. We tested whether our brain can systematically and automatically differentiate and track a periodic stream of emotional utterances among a series of neutral vocal utterances. We recorded frequency-tagged EEG responses of 20 neurotypical male adults while presenting streams of neutral utterances at a 4 Hz base rate, interleaved with emotional utterances every third stimulus, hence at a 1.333 Hz oddball frequency. Four emotions (happy, sad, angry, and fear) were presented as different conditions in different streams. To control the impact of low-level acoustic cues, we maximized variability among the stimuli and included a control condition with scrambled utterances. This scrambling preserves low-level acoustic characteristics but ensures that the emotional character is no longer recognizable. Results revealed significant oddball EEG responses for all conditions, indicating that every emotion category can be discriminated from the neutral stimuli, and every emotional oddball response was significantly higher than the response for the scrambled utterances. These findings demonstrate that emotion discrimination is fast, automatic, and is not merely driven by low-level perceptual features. Eventually, here, we present a new database for vocal emotion research with short emotional utterances (EVID) together with an innovative frequency-tagging EEG paradigm for implicit vocal emotion discrimination.
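For readers unfamiliar with frequency tagging, a typical analysis step is to inspect the EEG amplitude spectrum at the base (4 Hz) and oddball (1.333 Hz) frequencies relative to neighbouring bins; the sketch below illustrates this on simulated data and is not the authors' pipeline (sampling rate, duration, and noise level are arbitrary assumptions).

```python
# Rough sketch of a frequency-tagging analysis on simulated data: compare the
# FFT amplitude at the oddball frequency (1.333 Hz) with neighbouring bins.
import numpy as np

fs = 512                        # sampling rate (Hz), assumed
t = np.arange(0, 60, 1 / fs)    # 60 s of simulated data
oddball_f, base_f = 4 / 3, 4.0
eeg = (np.sin(2 * np.pi * base_f * t)            # base response
       + 0.3 * np.sin(2 * np.pi * oddball_f * t) # smaller oddball response
       + np.random.randn(t.size))                # noise

spectrum = np.abs(np.fft.rfft(eeg)) / t.size
freqs = np.fft.rfftfreq(t.size, d=1 / fs)

def snr_at(f, n_neighbours=10):
    """Amplitude at f divided by the mean amplitude of surrounding bins."""
    idx = np.argmin(np.abs(freqs - f))
    neighbours = np.r_[spectrum[idx - n_neighbours:idx - 1],
                       spectrum[idx + 2:idx + n_neighbours + 1]]
    return spectrum[idx] / neighbours.mean()

print("Oddball SNR:", snr_at(oddball_f), "Base SNR:", snr_at(base_f))
```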
Affiliations
- Silke Vos, Bart Boets: Center for Developmental Psychiatry, Department of Neurosciences, KU Leuven, 3000 Leuven, Belgium; Leuven Autism Research (LAuRes), KU Leuven, 3000 Leuven, Belgium; Leuven Brain Institute (LBI), KU Leuven, 3000 Leuven, Belgium
- Olivier Collignon: Institute of Research in Psychology & Institute of Neuroscience, Université Catholique de Louvain, 1348 Louvain-La-Neuve, Belgium; School of Health Sciences, HES-SO Valais-Wallis, The Sense Innovation and Research Center, 1007 Lausanne and 1950 Sion, Switzerland

11. Bailey G, Halamová J, Vráblová V. Clients' Facial Expressions of Self-Compassion, Self-Criticism, and Self-Protection in Emotion-Focused Therapy Videos. International Journal of Environmental Research and Public Health. 2023;20:1129. PMID: 36673885. PMCID: PMC9859613. DOI: 10.3390/ijerph20021129.
Abstract
Clients' facial expressions allow psychotherapists to gather more information about clients' emotional processing. This study examines the facial Action Units (AUs) associated with self-compassion, self-criticism, and self-protection within real Emotion-Focused Therapy (EFT) sessions. For this purpose, we used the facial analysis software iMotions. Twelve video sessions were selected for the analysis based on specific criteria. For self-compassion, the following AUs were significant: AU4 (brow furrow), AU15 (lip corner depressor), and AU12_smile (lip corner puller). For self-criticism, iMotions identified AU2 (outer brow raise), AU1 (inner brow raise), AU7 (lid tighten), AU12_smirk (unilateral lip corner puller), and AU43 (eye closure). Self-protection was captured by the combined occurrence of AU1, AU4, and AU12_smirk. Moreover, the findings support the significance of discerning self-compassion and self-protection as two different concepts.
Affiliations
- Júlia Halamová: Institute of Applied Psychology, Faculty of Social and Economic Sciences, Comenius University in Bratislava, Mlynské luhy 4, 821 05 Bratislava, Slovakia

12. Dietz T, Tavenrath S, Schiewer V, Öztürk-Arenz H, Durakovic V, Labouvie H, Jäger RS, Kusch M. Cologne questionnaire on speechlessness: Development and validation. Current Psychology. 2022;42:1-12. PMID: 36531200. PMCID: PMC9741759. DOI: 10.1007/s12144-022-04102-x.
Abstract
Speechlessness is a psychological concept that describes non-speaking or silence in different situations; it occurs in particular during emotional stress. The Cologne Questionnaire on Speechlessness (German: Kölner Fragebogen zur Sprachlosigkeit, KFS) is an instrument for measuring speechlessness as a function of emotional perception and processing in situations of emotional stress or existing emotional dysregulation. The questionnaire was developed in theoretical proximity to the constructs of alexithymia and expressive suppression. Item selection was performed on an initial sample of N = 307 individuals from the general population. An exploratory model for classifying the phenomenon was derived from four samples in clinical and non-clinical settings. Validation of the factorial structure was performed using an overarching dataset (N = 1293) consisting of all samples. The results of a confirmatory factor analysis (CFA) indicated the best model fit (χ2(146) = 953.856, p < .001; Tucker-Lewis Index = .891; Comparative Fit Index = .916; Root Mean Square Error of Approximation = .065, p < .001; N = 1293) for a four-factor structure of the questionnaire. The overall acceptable validity and reliability support the application of the KFS to individuals from the general population as well as to clinical subgroups. In addition, the questionnaire can also be used in the context of research on the regulation of emotions. Supplementary Information: The online version contains supplementary material available at 10.1007/s12144-022-04102-x.
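As an editorial sanity check (not part of the original analysis), the reported RMSEA of .065 is reproduced, up to rounding, by applying the standard formula to the reported chi-square, degrees of freedom, and sample size:

```latex
\mathrm{RMSEA} = \sqrt{\frac{\max(\chi^{2} - df,\ 0)}{df\,(N - 1)}}
              = \sqrt{\frac{953.856 - 146}{146 \times (1293 - 1)}}
              \approx 0.065
```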
Affiliations
- Thilo Dietz, Sally Tavenrath, Vera Schiewer, Hülya Öztürk-Arenz, Vanessa Durakovic, Hildegard Labouvie, Michael Kusch: Department of Internal Medicine I, Faculty of Medicine, Cologne University Hospital, University of Cologne, Kerpener Straße 62, 50937 Cologne, Germany
- Reinhold S. Jäger: Centre for Educational Research, University Koblenz-Landau, Campus Landau, Bürgerstraße 23, 76829 Landau in der Pfalz, Germany

13. Shi J, Gu Y, Vigliocco G. Prosodic modulations in child-directed language and their impact on word learning. Dev Sci. 2022:e13357. PMID: 36464779. DOI: 10.1111/desc.13357.
Abstract
Child-directed language can support language learning, but how? We addressed two questions: (1) how caregivers prosodically modulate their speech as a function of word familiarity (known or unknown to the child) and accessibility of the referent (visually present or absent from the immediate environment); and (2) whether such modulations affect children's learning of unknown words and their vocabulary development. We used data from 38 English-speaking caregivers (from the ECOLANG corpus) talking about toys (both known and unknown to their children aged 3-4 years) both when the toys were present and when they were absent. We analyzed prosodic dimensions (i.e., speaking rate, pitch, and intensity) of caregivers' productions of 6529 toy labels. We found that unknown labels were spoken with a significantly slower speaking rate and wider pitch and intensity ranges than known labels, especially on first mention, suggesting that caregivers adjust their prosody based on children's lexical knowledge. Moreover, caregivers used a slower speaking rate and a larger intensity range to mark the first mentions of toys that were physically absent. After the first mentions, they talked about the referents more loudly and with higher mean pitch when the toys were present than when they were absent. Crucially, caregivers' mean pitch for unknown words and the degree of mean pitch modulation for unknown words relative to known words (pitch ratio) predicted children's immediate word learning and their vocabulary size one year later. In conclusion, caregivers modify their prosody when the learning situation is more demanding for children, and these helpful modulations assist children in word learning. RESEARCH HIGHLIGHTS: In naturalistic interactions, caregivers use a slower speaking rate and wider pitch and intensity ranges when introducing new labels to 3- to 4-year-old children, especially on first mention. Compared to when toys are present, caregivers speak more slowly and with a larger intensity range to mark the first mentions of toys that are physically absent. Mean pitch marking word familiarity predicts children's immediate word learning and future vocabulary size.
Affiliations
- Jinyu Shi: Department of Experimental Psychology, University College London, London, UK
- Yan Gu: Department of Experimental Psychology, University College London, London, UK; Department of Psychology, University of Essex, Colchester, UK
- Gabriella Vigliocco: Department of Experimental Psychology, University College London, London, UK

14. Ukaegbe OC, Holt BE, Keator LM, Brownell H, Blake ML, Lundgren K. Aprosodia Following Focal Brain Damage: What's Right and What's Left? American Journal of Speech-Language Pathology. 2022;31:2313-2328. PMID: 35868292. DOI: 10.1044/2022_ajslp-21-00302.
Abstract
PURPOSE Hemispheric specialization for the comprehension and expression of linguistic and emotional prosody is typically attributed to the right hemisphere. This study used techniques adapted from meta-analysis to critically examine the strength of existing evidence for hemispheric lateralization of prosody following brain damage. METHOD Twenty-one databases were searched for articles published from 1970 to 2020 addressing differences in prosody performance between groups defined by right hemisphere damage and left hemisphere damage. Hedges's g effect sizes were calculated for all possible prosody comparisons. Primary analyses summarize effects for four types: linguistic production, linguistic comprehension, emotion comprehension, and emotion production. Within each primary analysis, Hedges's g values were averaged across comparisons (usually from a single article) based on the same sample of individuals. Secondary analyses explore more specific classifications of comparisons. RESULTS Out of the 113 articles investigating comprehension and production of emotional and linguistic prosody, 62 were deemed appropriate for data extraction, but only 21 met inclusion criteria, passed quality reviews, and provided sufficient information for analysis. Evidence from this review illustrates the heterogeneity of research methods and results from studies that have investigated aprosodia. This review provides inconsistent support for selective contribution of the two cerebral hemispheres to prosody comprehension and production; however, the strongest finding suggests that right hemisphere lesions disrupt emotional prosody comprehension more than left hemisphere lesions. CONCLUSION This review highlights the impoverished nature of the existing literature; offers suggestions for future research; and highlights relevant clinical implications for the prognostication, evaluation, and treatment of aprosodia. SUPPLEMENTAL MATERIAL https://doi.org/10.23641/asha.20334987.
Affiliations
- Onyinyechi C Ukaegbe, Brooke E Holt, Kristine Lundgren: Department of Communication Sciences and Disorders, The University of North Carolina Greensboro
- Lynsey M Keator: Department of Communication Sciences and Disorders, University of South Carolina, Columbia
- Hiram Brownell: Department of Psychology and Neuroscience, Boston College, MA

15. Machine Learning Algorithms for Detection and Classifications of Emotions in Contact Center Applications. Sensors. 2022;22:5311. PMID: 35890994. PMCID: PMC9321989. DOI: 10.3390/s22145311.
Abstract
Over the past few years, virtual assistant solutions used in Contact Center systems have been gaining popularity. One of the main tasks of a virtual assistant is to recognize the intentions of the customer. It is important to note that quite often the actual intention expressed in a conversation is also directly influenced by the emotions that accompany that conversation. Unfortunately, the scientific literature has not identified which specific types of emotions are relevant to the activities performed in Contact Center applications. Therefore, the main objective of this work was to develop an Emotion Classification for Machine Detection of Affect-Tinged Conversational Contents dedicated directly to the Contact Center industry. The study considered Contact Center voice and text channels, taking into account the following families of emotions: anger, fear, happiness, and sadness versus affective neutrality of the statements. The obtained results confirmed the usefulness of the proposed classification: for the voice channel, the highest efficiency was obtained using a Convolutional Neural Network (accuracy, 67.5%; precision, 80.3; F1-score, 74.5%), while for the text channel, the Support Vector Machine algorithm proved to be the most efficient (accuracy, 65.9%; precision, 58.5; F1-score, 61.7%).
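As a generic illustration of the text-channel approach mentioned above (an SVM classifier), and not the paper's implementation, the following sketch trains a TF-IDF plus linear SVM emotion classifier on a few invented utterances.

```python
# Illustrative sketch only: TF-IDF + linear SVM emotion classifier for the
# text channel. The utterances and labels below are toy placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = ["I am really unhappy with this service",
         "Thank you, that solved my problem perfectly",
         "I am scared my account has been hacked",
         "Nothing ever works, this is infuriating"]
labels = ["sadness", "happiness", "fear", "anger"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(texts, labels)
print(clf.predict(["why does this keep failing, I am furious"]))
```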
16. Zheng Q, Lam V. Influence of Multiple Music Styles and Composition Styles on College Students' Mental Health. Occup Ther Int. 2022;2022:6167197. PMID: 35936831. PMCID: PMC9296337. DOI: 10.1155/2022/6167197.
Abstract
The purpose of this study is to reduce students' psychological pressure and improve their quality of study and life. Here, 400 language-impaired students from the public elective psychology course at Northwestern University and the undergraduate psychology class at Xi'an Foreign Studies University in the 2018-2019 academic year were randomly selected as participants for this music psychology experiment. The students were divided into different experimental groups, and the Questionnaire Survey (QS) method was used to analyze the students' psychological reactions to Baroque, classical, and romantic music styles. The study then further examines the students' emotional responses and audiovisual synaesthesia, as well as their recognition and choice of music style. The results show significant differences in the intensity of the subjects' emotional responses to different styles of music creation. The music's expression is consistent with the actual feelings of the subjects. The tonality and color density of audiovisual synaesthesia vary with the style of music creation. Different music creation styles generate different associations in students' minds and thus produce different psychological reactions. The QS results indicate that soft and soothing music relieves students' learning pressure the most, while music with a strong sense of rhythm and vitality has no significant effect. Therefore, different music creation styles affect students' learning pressure differently. This work discusses the influence of different music creation styles on the mental health of contemporary college students and provides a reference for using music therapy to relieve students' learning pressure in the future.
Affiliations
- Ququ Zheng: School of Music, Shanghai University, Shanghai City 200444, China
- Vincent Lam: Amazon Music, 525 Market St FL19, San Francisco, CA 94105, USA

17. Langer M, König CJ, Siegel R, Fredenhagen T, Schunck AG, Hähne V, Baur T. Vocal-Stress Diary: A Longitudinal Investigation of the Association of Everyday Work Stressors and Human Voice Features. Psychol Sci. 2022;33:1027-1039. PMID: 35640140. DOI: 10.1177/09567976211068110.
Abstract
The human voice conveys plenty of information about the speaker. A prevalent assumption is that stress-related changes in the human body affect speech production, thus affecting voice features. This suggests that voice data may be an easy-to-capture measure of everyday stress levels and can thus serve as a warning signal of stress-related health consequences. However, previous research is limited (i.e., has induced stress only through artificial tasks or has investigated only short-term or extreme stressors), leaving it open whether everyday work stressors are associated with voice features. Thus, our participants (111 adult working individuals) took part in a 1-week diary study (Sunday until Sunday), in which they provided voice messages and self-report data on daily work stressors. Results showed that work stressors were associated with voice features such as increased speech rate and voice intensity. We discuss theoretical, practical, and ethical implications regarding the voice as an indicator of psychological states.
Affiliations
- Markus Langer, Rudolf Siegel, Viviane Hähne: Industrial and Organizational Psychology, Saarland University
- Tobias Baur: Human-Centered Artificial Intelligence, Augsburg University

18. Steiner F, Fernandez N, Dietziker J, Stämpfli SP, Seifritz E, Rey A, Frühholz FS. Affective speech modulates a cortico-limbic network in real time. Prog Neurobiol. 2022;214:102278. DOI: 10.1016/j.pneurobio.2022.102278.
19. Kikutani M, Ikemoto M. Detecting emotion in speech expressing incongruent emotional cues through voice and content: investigation on dominant modality and language. Cogn Emot. 2022;36:492-511. PMID: 34978263. DOI: 10.1080/02699931.2021.2021144.
Abstract
This research investigated how we detect emotion in speech when the emotional cues in the sound of the voice do not match the semantic content. It examined the dominance of voice versus semantics in the perception of emotion from incongruent speech, and the influence of language on the interaction between the two modalities. Japanese participants heard a voice expressing anger, happiness, or sadness while saying "I'm angry," "I'm pleased," or "I'm sad," in their native language, in their second language (English), or in unfamiliar languages (Khmer and Swedish). They reported how much they agreed that the speaker was expressing each of the three emotions. Two experiments were conducted with different numbers of voice stimuli, and both found consistent results. Strong reliance on the voice was found for speech in participants' second and unfamiliar languages, but this dominance was weakened for speech in their native language. Among the three emotions, the voice was most important for the perception of sadness. This research concludes that the impact of the emotional cues expressed by the voice and by the semantic content varies depending on the expressed emotion and the language.
Affiliations
- Mariko Kikutani: Institute of Liberal Arts and Science, Kanazawa University, Ishikawa, Japan

20. Beyond the Language Module: Musicality as a Stepping Stone Towards Language Acquisition. Evolutionary Psychology. 2022. DOI: 10.1007/978-3-030-76000-7_12.
21. Superior Communication of Positive Emotions Through Nonverbal Vocalisations Compared to Speech Prosody. Journal of Nonverbal Behavior. 2021;45:419-454. PMID: 34744232. PMCID: PMC8553689. DOI: 10.1007/s10919-021-00375-1.
Abstract
The human voice communicates emotion through two different types of vocalizations: nonverbal vocalizations (brief non-linguistic sounds like laughs) and speech prosody (tone of voice). Research examining recognizability of emotions from the voice has mostly focused on either nonverbal vocalizations or speech prosody, and included few categories of positive emotions. In two preregistered experiments, we compare human listeners’ (total n = 400) recognition performance for 22 positive emotions from nonverbal vocalizations (n = 880) to that from speech prosody (n = 880). The results show that listeners were more accurate in recognizing most positive emotions from nonverbal vocalizations compared to prosodic expressions. Furthermore, acoustic classification experiments with machine learning models demonstrated that positive emotions are expressed with more distinctive acoustic patterns for nonverbal vocalizations as compared to speech prosody. Overall, the results suggest that vocal expressions of positive emotions are communicated more successfully when expressed as nonverbal vocalizations compared to speech prosody.
22. Richards SE, Hughes ME, Woodward TS, Rossell SL, Carruthers SP. External speech processing and auditory verbal hallucinations: A systematic review of functional neuroimaging studies. Neurosci Biobehav Rev. 2021;131:663-687. PMID: 34517037. DOI: 10.1016/j.neubiorev.2021.09.006.
Abstract
It has been documented that individuals who hear auditory verbal hallucinations (AVH) exhibit diminished capabilities in processing external speech. While functional neuroimaging studies have attempted to characterise the cortical regions and networks underlying these deficits in a bid to understand AVH, considerable methodological heterogeneity has prevented a consensus from being reached. The current systematic review investigated the neurobiological underpinnings of external speech processing deficits in voice-hearers across 38 studies published between January 1990 and June 2020. AVH-specific deviations in the activity and lateralisation of the temporal auditory regions were apparent when processing speech sounds, words, and sentences. During active or affective listening tasks, functional connectivity changes arose within the language, limbic, and default mode networks. However, poor study quality and a lack of replicable results plague the field. A detailed list of recommendations is provided to improve the quality of future research on this topic.
Affiliations
- Sophie E Richards, Matthew E Hughes, Sean P Carruthers: Centre for Mental Health, Faculty of Health, Arts & Design, Swinburne University of Technology, VIC, 3122, Australia
- Todd S Woodward: Department of Psychiatry, University of British Columbia, Vancouver, BC, Canada; BC Mental Health and Addictions Research Institute, Vancouver, BC, Canada
- Susan L Rossell: Centre for Mental Health, Faculty of Health, Arts & Design, Swinburne University of Technology, VIC, 3122, Australia; Department of Psychiatry, St Vincent's Hospital, Melbourne, VIC, Australia

23. Desmet PMA, Sauter DA, Shiota MN. Apples and oranges: three criteria for positive emotion typologies. Curr Opin Behav Sci. 2021. DOI: 10.1016/j.cobeha.2021.03.012.
24. Huang KL, Duan SF, Lyu X. Affective Voice Interaction and Artificial Intelligence: A Research Study on the Acoustic Features of Gender and the Emotional States of the PAD Model. Front Psychol. 2021;12:664925. PMID: 34017295. PMCID: PMC8129507. DOI: 10.3389/fpsyg.2021.664925.
Abstract
New types of artificial intelligence products are gradually shifting to voice interaction modes, as the demand for intelligent products expands from communication to recognizing users' emotions and providing instantaneous feedback. At present, affective acoustic models are constructed through deep learning and abstracted into a mathematical model, enabling computers to learn from data and equipping them with prediction abilities. Although this method can result in accurate predictions, it has a limitation in that it lacks explanatory capability; there is an urgent need for empirical study of the connection between acoustic features and psychology as the theoretical basis for adjusting model parameters. Accordingly, this study explores the differences in seven major acoustic features and their physical characteristics during voice interaction with respect to the recognition and expression of gender and the emotional states of the pleasure-arousal-dominance (PAD) model. In this study, 31 females and 31 males aged between 21 and 60 were recruited using stratified random sampling for the audio recording of different emotions. Subsequently, parameter values of the acoustic features were extracted using the Praat voice software. Finally, the parameter values were analyzed using a two-way mixed-design ANOVA in SPSS. Results show that the seven major acoustic features differ across gender and across the emotional states of the PAD model; moreover, their difference values and rankings also vary. These conclusions lay a theoretical foundation for AI emotional voice interaction and address deep learning's current difficulties in emotion recognition and in parameter optimization of emotional synthesis models, which stem from the lack of explanatory power.
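To make the analysis design concrete, a two-way mixed-design ANOVA of this kind could be run outside SPSS as sketched below, for example with the pingouin package; this is an illustrative sketch under stated assumptions, not the authors' analysis, and the file path and column names are invented.

```python
# Illustrative sketch: two-way mixed-design ANOVA on one acoustic feature,
# with gender as the between-subjects factor and PAD emotional state as the
# within-subjects factor. File path and column names are placeholders.
import pandas as pd
import pingouin as pg

df = pd.read_csv("acoustic_parameters.csv")  # placeholder: one row per speaker x state
aov = pg.mixed_anova(data=df,
                     dv="f0_mean",           # acoustic feature under test
                     within="pad_state",     # emotional state of the PAD model
                     subject="speaker_id",
                     between="gender")
print(aov.round(3))
```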
Affiliations
- Kuo-Liang Huang, Sheng-Feng Duan: Department of Industrial Design, Design Academy, Sichuan Fine Arts Institute, Chongqing, China
- Xi Lyu: Department of Digital Media Art, Design Academy, Sichuan Fine Arts Institute, Chongqing, China

25.
Abstract
Two experiments examined the impact of voice pitch on gender stereotyping. Participants listened to a text read by a female (Study 1, N = 171) or male (Study 2, N = 151) speaker, whose voice pitch was manipulated to be high or low. They rated the speaker on positive and negative facets of masculinity and femininity, competence, and likability. They also indicated their own gendered self-concept. High pitch was associated with the ascription of more feminine traits and greater likability. The high-pitch female speaker was rated as less competent, and the high-pitch male speaker was perceived as less masculine. Text content and participants' gendered self-concept did not moderate the pitch effect. The findings underline the importance of voice pitch for impression formation.
Affiliations
- Barbara Krahé, Meike Herzberg: Department of Psychology, University of Potsdam, Germany

26. Cortes DS, Tornberg C, Bänziger T, Elfenbein HA, Fischer H, Laukka P. Effects of aging on emotion recognition from dynamic multimodal expressions and vocalizations. Sci Rep. 2021;11:2647. PMID: 33514829. PMCID: PMC7846600. DOI: 10.1038/s41598-021-82135-1.
Abstract
Age-related differences in emotion recognition have predominantly been investigated using static pictures of facial expressions, and positive emotions beyond happiness have rarely been included. The current study instead used dynamic facial and vocal stimuli, and included a wider than usual range of positive emotions. In Task 1, younger and older adults were tested for their abilities to recognize 12 emotions from brief video recordings presented in visual, auditory, and multimodal blocks. Task 2 assessed recognition of 18 emotions conveyed by non-linguistic vocalizations (e.g., laughter, sobs, and sighs). Results from both tasks showed that younger adults had significantly higher overall recognition rates than older adults. In Task 1, significant group differences (younger > older) were only observed for the auditory block (across all emotions), and for expressions of anger, irritation, and relief (across all presentation blocks). In Task 2, significant group differences were observed for 6 out of 9 positive, and 8 out of 9 negative emotions. Overall, results indicate that recognition of both positive and negative emotions show age-related differences. This suggests that the age-related positivity effect in emotion recognition may become less evident when dynamic emotional stimuli are used and happiness is not the only positive emotion under study.
Affiliations
- Diana S Cortes, Håkan Fischer, Petri Laukka: Department of Psychology, Stockholm University, Stockholm, Sweden
- Tanja Bänziger: Department of Psychology, Mid Sweden University, Östersund, Sweden

27. Arias P, Rachman L, Liuni M, Aucouturier JJ. Beyond Correlation: Acoustic Transformation Methods for the Experimental Study of Emotional Voice and Speech. Emotion Review. 2020. DOI: 10.1177/1754073920934544.
Abstract
While acoustic analysis methods have become a commodity in voice emotion research, experiments that attempt not only to describe but to computationally manipulate expressive cues in emotional voice and speech have remained relatively rare. We give here a nontechnical overview of voice-transformation techniques from the audio signal-processing community that we believe are ripe for adoption in this context. We provide sound examples of what they can achieve, examples of experimental questions for which they can be used, and links to open-source implementations. We point at a number of methodological properties of these algorithms, such as being specific, parametric, exhaustive, and real-time, and describe the new possibilities that these open for the experimental study of the emotional voice.
Affiliations
- Pablo Arias, Laura Rachman, Marco Liuni: STMS UMR9912, IRCAM/CNRS/Sorbonne Université, France

28. Gradual positive and negative affect induction: The effect of verbalizing affective content. PLoS One. 2020;15:e0233592. PMID: 32469910. PMCID: PMC7259663. DOI: 10.1371/journal.pone.0233592.
Abstract
In this paper, we study the effect of verbalizing affective pictures on affective state and language production. Individuals described (Study I: Spoken Descriptions of Pictures) or passively viewed (Study II: Passively Viewing Pictures) 40 pictures from the International Affective Picture System (IAPS) that gradually increased from neutral to either positive or negative content. We expected that both methods would result in successful affect induction, and that the effect would be stronger for verbally describing pictures than for passively viewing them. Results indicate that speakers indeed felt more negative after describing negative pictures, but that describing positive (compared to neutral) pictures did not result in a more positive state. Contrary to our hypothesis, no differences were found between describing and passively viewing the pictures. Furthermore, we analysed the verbal picture descriptions produced by participants on various dimensions. Results indicate that positive and negative pictures were indeed described with increasingly more affective language in the expected directions. In addition to informing our understanding of the relationship between (spoken) language production and affect, these results also potentially pave the way for a new method of affect induction that uses free expression.