1. Jiang Z, Long Y, Zhang X, Liu Y, Bai X. CNEV: A corpus of Chinese nonverbal emotional vocalizations with a database of emotion category, valence, arousal, and gender. Behav Res Methods 2025; 57:62. [PMID: 39838181] [DOI: 10.3758/s13428-024-02595-x]
Abstract
Nonverbal emotional vocalizations play a crucial role in conveying emotions during human interactions. Validated corpora of these vocalizations have facilitated emotion-related research and found wide-ranging applications. However, existing corpora have lacked representation from diverse cultural backgrounds, which may limit the generalizability of the resulting theories. The present paper introduces the Chinese Nonverbal Emotional Vocalization (CNEV) corpus, the first nonverbal emotional vocalization corpus recorded and validated entirely by Mandarin speakers from China. The CNEV corpus contains 2415 vocalizations across five emotion categories: happiness, sadness, fear, anger, and neutrality. It also includes a database containing subjective evaluation data on emotion category, valence, arousal, and speaker gender, as well as the acoustic features of the vocalizations. Key conclusions drawn from statistical analyses of perceptual evaluations and acoustic analysis include the following: (1) the CNEV corpus exhibits adequate reliability and high validity; (2) perceptual evaluations reveal a tendency for individuals to associate anger with male voices and fear with female voices; (3) acoustic analysis indicates that males are more effective at expressing anger, while females excel in expressing fear; and (4) the observed perceptual patterns align with the acoustic analysis results, suggesting that the perceptual differences may stem not only from the subjective factors of perceivers but also from objective expressive differences in the vocalizations themselves. For academic research purposes, the CNEV corpus and database are freely available for download at https://osf.io/6gy4v/.
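As a rough, hedged illustration of the kind of acoustic feature extraction such a corpus supports (not the authors' actual pipeline), the Python sketch below computes a frame-wise pitch track and RMS intensity for a single vocalization using librosa; the file name, feature choices, and summary statistics are illustrative assumptions only.

```python
# Sketch: extract basic acoustic features (F0, RMS intensity) from one
# CNEV-style vocalization. File name and feature set are illustrative;
# this is not the authors' analysis pipeline.
import librosa
import numpy as np

def basic_features(wav_path: str) -> dict:
    y, sr = librosa.load(wav_path, sr=None)          # keep native sample rate
    f0, voiced, _ = librosa.pyin(                    # frame-wise pitch estimate
        y, fmin=librosa.note_to_hz("C2"),
        fmax=librosa.note_to_hz("C7"), sr=sr)
    rms = librosa.feature.rms(y=y)[0]                # frame-wise intensity
    has_voiced = bool(np.any(voiced))
    return {
        "mean_f0_hz": float(np.nanmean(f0[voiced])) if has_voiced else float("nan"),
        "f0_range_hz": float(np.nanmax(f0) - np.nanmin(f0)) if has_voiced else float("nan"),
        "mean_rms": float(rms.mean()),
        "duration_s": len(y) / sr,
    }

# Example (hypothetical file name from the OSF archive):
# print(basic_features("cnev_anger_male_001.wav"))
```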
Affiliation(s)
- Zhongqing Jiang
- College of Psychology, Liaoning Normal University, No. 850 Huanghe Road, Dalian, 116029, Liaoning, China.
- Yanling Long
- College of Psychology, Liaoning Normal University, No. 850 Huanghe Road, Dalian, 116029, Liaoning, China
- Xi'e Zhang
- Xianyang Senior High School of Shaanxi Province, Xianyang, China
- Yangtao Liu
- College of Psychology, Liaoning Normal University, No. 850 Huanghe Road, Dalian, 116029, Liaoning, China
- Xue Bai
- College of Psychology, Liaoning Normal University, No. 850 Huanghe Road, Dalian, 116029, Liaoning, China
2. Kim HN, Taylor S. Differences of people with visual disabilities in the perceived intensity of emotion inferred from speech of sighted people in online communication settings. Disabil Rehabil Assist Technol 2024; 19:633-640. [PMID: 35997772] [DOI: 10.1080/17483107.2022.2114555]
Abstract
PURPOSE: As humans convey information about emotions through speech signals, emotion recognition from auditory information is often used to assess a person's affective state. There are numerous ways of applying knowledge of emotional vocal expression to system designs that adequately accommodate users' needs. Yet little is known about how people with visual disabilities infer emotions from speech stimuli, especially via online platforms (e.g., Zoom). This study examined how strongly or weakly they perceive emotions (i.e., perceived intensity), as well as the degree to which their sociodemographic backgrounds affect the intensity levels they perceive when exposed to a set of emotional speech stimuli via Zoom.
MATERIALS AND METHODS: A convenience sample of 30 individuals with visual disabilities participated in Zoom interviews. Participants were presented with a set of emotional speech stimuli and reported the intensity of the perceived emotions on a rating scale from 1 (weak) to 8 (strong).
RESULTS: When exposed to the emotional speech stimuli (calm, happy, fearful, sad, and neutral), participants reported that neutral was the dominant emotion they perceived with the greatest intensity. Individual differences in the perceived intensity of emotions were also observed, associated with sociodemographic backgrounds such as health, vision, job, and age.
CONCLUSIONS: The results of this study are anticipated to contribute to fundamental knowledge helpful to many stakeholders, such as voice technology engineers, user experience designers, health professionals, and social workers providing support to people with visual disabilities.
IMPLICATIONS FOR REHABILITATION: Technologies equipped with alternative user interfaces (e.g., Siri, Alexa, and Google Voice Assistant) that meet the needs of people with visual disabilities can promote independent living and quality of life. Such technologies can also incorporate systems that recognize emotions from users' voices, so that users can obtain services customized to their emotional needs or that adequately address their emotional challenges (e.g., early detection of onset, provision of advice, and so on). The results of this study can also benefit health professionals (e.g., social workers) who work closely with clients who have visual disabilities (e.g., in virtual telehealth sessions), as they could gain insight into recognizing and understanding clients' emotional struggles from their voices, contributing to enhanced emotional intelligence. They can thus provide better services to their clients, building strong bonds and trust between health professionals and clients with visual disabilities even when they meet virtually (e.g., via Zoom).
Affiliation(s)
- Hyung Nam Kim
- North Carolina A&T State University, Greensboro, NC, USA
- Shaniah Taylor
- North Carolina A&T State University, Greensboro, NC, USA
3. Szameitat DP, Szameitat AJ. Recognition of emotions in German laughter across cultures. Sci Rep 2024; 14:3052. [PMID: 38321192] [PMCID: PMC10847427] [DOI: 10.1038/s41598-024-53646-4]
Abstract
Laughter conveys a wide range of information relevant to social interaction. In previous research we showed that laughter can convey information about the sender's emotional state; however, other research did not find such an effect. This paper aims to replicate our previous study using participant samples from diverse cultural backgrounds. A total of 161 participants from Poland, the UK, India, Hong Kong, and other countries classified 121 spontaneously emitted German laughter sounds according to laughter type, i.e., joyful, schadenfreude, and tickling laughter. Results showed that all participant groups classified the laughter sounds above chance level and that there was a slight ingroup advantage for Western listeners. This suggests that classification of laughter according to the sender's emotional state is possible across cultures, and that there may be a small advantage for classifying laughter of close cultural proximity.
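To make the "above chance level" criterion concrete, here is a minimal, hedged sketch (not the paper's statistical analysis) of testing one listener's three-way classification accuracy against the 1/3 chance rate with a binomial test; the trial and success counts are invented for illustration.

```python
# Sketch: test whether classification of three laughter types (joyful,
# schadenfreude, tickling) exceeds the 1/3 chance level. Counts are
# hypothetical; this is not the paper's analysis.
from scipy.stats import binomtest

n_trials = 121          # e.g., one listener judging 121 laughter sounds
n_correct = 55          # hypothetical number of correct classifications
chance = 1 / 3          # three response options

result = binomtest(n_correct, n_trials, chance, alternative="greater")
print(f"accuracy = {n_correct / n_trials:.2f}, p = {result.pvalue:.4f}")
```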
Affiliation(s)
- Diana P Szameitat
- Centre for Cognitive and Clinical Neuroscience, Division of Psychology, Department of Life Sciences, College of Health, Medicine and Life Sciences, Brunel University London, Kingston Lane, Uxbridge, UB8 3PH, UK.
- André J Szameitat
- Centre for Cognitive and Clinical Neuroscience, Division of Psychology, Department of Life Sciences, College of Health, Medicine and Life Sciences, Brunel University London, Kingston Lane, Uxbridge, UB8 3PH, UK.
4. Gómez-Emilsson A, Percy C. The heavy-tailed valence hypothesis: the human capacity for vast variation in pleasure/pain and how to test it. Front Psychol 2023; 14:1127221. [PMID: 38034319] [PMCID: PMC10687198] [DOI: 10.3389/fpsyg.2023.1127221]
Abstract
Introduction: Wellbeing policy analysis is often criticized for requiring a cardinal interpretation of measurement scales, such as ranking happiness on an integer scale from 0 to 10. The commonly used scales also implicitly constrain the human capacity for experience, typically implying that our most intense experiences can be at most ten times more intense than our mildest. This paper presents the alternative "heavy-tailed valence" (HTV) hypothesis: the notion that the accessible human capacity for emotional experiences of pleasure and pain spans at least two orders of magnitude.
Methods: We specify five testable predictions of the HTV hypothesis. A pilot survey of adults aged 21-64 (n = 97) then tested two predictions, asking respondents to comment on the most painful and most pleasurable experiences they could recall, alongside the second most painful and pleasurable experiences.
Results: The results provide tentative support for the hypothesis. For instance, over half of respondents said their most intense experiences were at least twice as intense as the second most intense, implying a wide capacity overall. Simulations further demonstrate that the survey responses are more consistent with underlying heavy-tailed distributions of experience than with a "constrained valence" psychology.
Discussion: A synthesis of these results with prior findings suggests a "kinked" scale, such that a wide range of felt experience is compressed in reports at the high end of intensity scales, even if reports at lower intensities behave more cardinally. We discuss three stylized facts that support HTV and six that count against it, lessons for a future survey, practical guidelines for existing analyses, and implications for current policy. We argue for a dramatic increase in societal ambition: even in countries with high average incomes, the HTV hypothesis suggests we remain much further below our wellbeing potential than a surface reading of the data might suggest.
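As a hedged illustration of the contrast the authors draw between heavy-tailed and "constrained valence" accounts, the sketch below simulates how often a person's most intense experience would be at least twice as intense as their second most intense under each account; the lognormal and uniform distributions and all parameters are assumptions, not the paper's simulation settings.

```python
# Sketch: compare how often the most intense experience is at least twice
# as intense as the second most intense, under a heavy-tailed vs. a
# bounded ("constrained valence") model. Distributions and parameters are
# illustrative assumptions, not the paper's.
import numpy as np

rng = np.random.default_rng(0)
n_people, n_experiences = 10_000, 200    # assumed sample sizes

def share_ratio_ge_2(samples: np.ndarray) -> float:
    top2 = np.sort(samples, axis=1)[:, -2:]          # two largest per person
    return float(np.mean(top2[:, 1] >= 2 * top2[:, 0]))

heavy = rng.lognormal(mean=0.0, sigma=1.5, size=(n_people, n_experiences))
bounded = rng.uniform(0.0, 10.0, size=(n_people, n_experiences))

print("heavy-tailed:", share_ratio_ge_2(heavy))      # substantial fraction
print("bounded 0-10:", share_ratio_ge_2(bounded))    # close to zero
```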
5. Johnson KT, Narain J, Quatieri T, Maes P, Picard RW. ReCANVo: A database of real-world communicative and affective nonverbal vocalizations. Sci Data 2023; 10:523. [PMID: 37543663] [PMCID: PMC10404278] [DOI: 10.1038/s41597-023-02405-7]
Abstract
Nonverbal vocalizations, such as sighs, grunts, and yells, are informative expressions within typical verbal speech. Likewise, individuals who produce 0-10 spoken words or word approximations ("minimally speaking" individuals) convey rich affective and communicative information through nonverbal vocalizations even without verbal speech. Yet, despite their rich content, little to no data exists on the vocal expressions of this population. Here, we present ReCANVo: Real-World Communicative and Affective Nonverbal Vocalizations - a novel dataset of non-speech vocalizations labeled by function from minimally speaking individuals. The ReCANVo database contains over 7000 vocalizations spanning communicative and affective functions from eight minimally speaking individuals, along with communication profiles for each participant. Vocalizations were recorded in real-world settings and labeled in real-time by a close family member who knew the communicator well and had access to contextual information while labeling. ReCANVo is a novel database of nonverbal vocalizations from minimally speaking individuals, the largest available dataset of nonverbal vocalizations, and one of the only affective speech datasets collected amidst daily life across contexts.
Affiliation(s)
- Kristina T Johnson
- Massachusetts Institute of Technology, MIT Media Lab, Cambridge, MA, USA.
- Jaya Narain
- Massachusetts Institute of Technology, MIT Media Lab, Cambridge, MA, USA.
- Thomas Quatieri
- Massachusetts Institute of Technology, Lincoln Laboratory, Lexington, MA, USA
- Pattie Maes
- Massachusetts Institute of Technology, MIT Media Lab, Cambridge, MA, USA
- Rosalind W Picard
- Massachusetts Institute of Technology, MIT Media Lab, Cambridge, MA, USA
6. Donhauser PW, Klein D. Audio-Tokens: A toolbox for rating, sorting and comparing audio samples in the browser. Behav Res Methods 2023; 55:508-515. [PMID: 35297013] [PMCID: PMC10027774] [DOI: 10.3758/s13428-022-01803-w]
Abstract
Here we describe a JavaScript toolbox for performing online rating studies with auditory material. The main feature of the toolbox is that audio samples are associated with visual tokens on the screen that control audio playback and can be manipulated depending on the type of rating. This allows the collection of single- and multidimensional feature ratings, as well as categorical and similarity ratings. The toolbox (github.com/pwdonh/audio_tokens) can be used via a plugin for the widely used jsPsych library, as well as with plain JavaScript for custom applications. We expect the toolbox to be useful in psychological research on speech and music perception, as well as for the curation and annotation of datasets in machine learning.
Affiliation(s)
- Peter W Donhauser
- Cognitive Neuroscience Unit, Montreal Neurological Institute, McGill University, Montreal, QC, H3A 2B4, Canada.
- Ernst Strüngmann Institute for Neuroscience in Cooperation with Max Planck Society, 60528, Frankfurt am Main, Germany.
- Denise Klein
- Cognitive Neuroscience Unit, Montreal Neurological Institute, McGill University, Montreal, QC, H3A 2B4, Canada.
- Centre for Research on Brain, Language and Music, McGill University, Montreal, QC, H3G 2A8, Canada.
7. Grollero D, Petrolini V, Viola M, Morese R, Lettieri G, Cecchetti L. The structure underlying core affect and perceived affective qualities of human vocal bursts. Cogn Emot 2022; 37:1-17. [PMID: 36300588] [DOI: 10.1080/02699931.2022.2139661]
Abstract
Vocal bursts are non-linguistic, affectively laden sounds with a crucial function in human communication, yet their affective structure is still debated. Studies have shown that ratings of valence and arousal follow a V-shaped relationship in several kinds of stimuli: high arousal ratings are more likely to accompany very negative or very positive valence. Across two studies, we asked participants to listen to 1,008 vocal bursts and judge both how they felt when listening to the sound (i.e., the core affect condition) and how the speaker felt when producing it (i.e., the perception of affective quality condition). We show that a V-shaped fit outperforms a linear model in explaining the valence-arousal relationship across conditions and studies, even after equating the number of exemplars across emotion categories. Also, although subjective experience can be significantly predicted from affective quality ratings, core affect scores are significantly lower in arousal, less extreme in valence, more variable between individuals, and less reproducible between studies. Nonetheless, the proportion of stimuli rated with opposite valence in the two conditions ranges from 11% (study 1) to 17% (study 2). Lastly, we demonstrate that ambiguity in valence (i.e., high between-participant variability) explains violations of the V-shape and relates to higher arousal.
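A minimal sketch of the kind of V-shaped versus linear comparison described above, run on synthetic data; the parameterisation of the V (arousal proportional to |valence - v0|), the noise model, and the R² comparison are assumptions for illustration, not the authors' exact specification.

```python
# Sketch: compare a linear vs. a V-shaped fit of arousal on valence.
# Synthetic data; the V-shape is parameterised as arousal ~ |valence - v0|,
# an illustrative choice rather than the paper's exact model.
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(1)
valence = rng.uniform(-1, 1, 500)
arousal = 0.8 * np.abs(valence) + 0.2 + rng.normal(0, 0.1, 500)  # V-shaped ground truth

def linear(v, a, b):
    return a * v + b

def v_shape(v, a, b, v0):
    return a * np.abs(v - v0) + b

def r2(y, yhat):
    ss_res = np.sum((y - yhat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1 - ss_res / ss_tot

p_lin, _ = curve_fit(linear, valence, arousal)
p_v, _ = curve_fit(v_shape, valence, arousal, p0=[1.0, 0.0, 0.0])

print("R2 linear :", round(r2(arousal, linear(valence, *p_lin)), 3))
print("R2 V-shape:", round(r2(arousal, v_shape(valence, *p_v)), 3))
```

With data generated this way, the V-shaped model should recover a much higher R² than the linear model, mirroring the model-comparison logic described in the abstract.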
Affiliation(s)
- Demetrio Grollero
- Social and Affective Neuroscience (SANe) Group, MoMiLab, IMT School for Advanced Studies Lucca, Lucca, Italy
- Valentina Petrolini
- Lindy Lab - Language in Neurodiversity, Department of Linguistics and Basque Studies, University of the Basque Country (UPV/EHU), Vitoria-Gasteiz, Spain
- Marco Viola
- Department of Philosophy and Education, University of Turin, Turin, Italy
- Rosalba Morese
- Faculty of Communication, Culture and Society, Università della Svizzera Italiana, Lugano, Switzerland
- Faculty of Biomedical Sciences, Università della Svizzera Italiana, Lugano, Switzerland
- Giada Lettieri
- Social and Affective Neuroscience (SANe) Group, MoMiLab, IMT School for Advanced Studies Lucca, Lucca, Italy
- Crossmodal Perception and Plasticity Laboratory, IPSY, University of Louvain, Louvain-la-Neuve, Belgium
- Luca Cecchetti
- Social and Affective Neuroscience (SANe) Group, MoMiLab, IMT School for Advanced Studies Lucca, Lucca, Italy