1
Arias-Sarah P, Bedoya D, Daube C, Aucouturier JJ, Hall L, Johansson P. Aligning the smiles of dating dyads causally increases attraction. Proc Natl Acad Sci U S A 2024; 121:e2400369121. PMID: 39467124. DOI: 10.1073/pnas.2400369121.
Abstract
Social interaction research lacks an experimental paradigm that enables causal inferences in free social interactions. For instance, the expressive signals that causally modulate the emergence of romantic attraction during interactions remain unknown. To disentangle causality in the wealth of covarying factors that govern social interactions, we developed an open-source video-conference platform that enables researchers to covertly manipulate the social signals produced by participants during interactions. Using this platform, we performed a speed-dating experiment in which we aligned or misaligned the facial smiles of participants in real time with face-transformation algorithms. Even though participants remained totally unaware that their faces were being manipulated, aligning their smiles causally enhanced the romantic attraction they felt toward each other, compared with unaligned scenarios. The manipulations also influenced how participants synchronized and vocally reacted to each other. This paradigm thus allows causal manipulation of the emergence of romantic attraction in free social interactions and, more broadly, opens the possibility of performing causal inferences during free social interactions.
Affiliation(s)
- Pablo Arias-Sarah
- Lund University Cognitive Science, Lund University, Lund 221 00, Sweden
- Sciences et Technologies de la Musique et du Son Lab, UMR 9912 (Institut de Recherche et Coordination Acoustique Musique/CNRS/Sorbonne Université), Paris 75004, France
- School of Neuroscience and Psychology, Glasgow University, Glasgow G12 8QQ, United Kingdom
- Daniel Bedoya
- Sciences et Technologies de la Musique et du Son Lab, UMR 9912 (Institut de Recherche et Coordination Acoustique Musique/CNRS/Sorbonne Université), Paris 75004, France
- Christoph Daube
- School of Neuroscience and Psychology, Glasgow University, Glasgow G12 8QQ, United Kingdom
- Jean-Julien Aucouturier
- Franche-Comté Électronique Mécanique Thermique et Optique - Sciences et Technologies Institute (CNRS/Université de Bourgogne Franche Comté), Besançon 25000, France
- Lars Hall
- Lund University Cognitive Science, Lund University, Lund 221 00, Sweden
- Petter Johansson
- Lund University Cognitive Science, Lund University, Lund 221 00, Sweden
2
Kachlicka M, Tierney A. Voice actors show enhanced neural tracking of pitch, prosody perception, and music perception. Cortex 2024; 178:213-222. PMID: 39024939. DOI: 10.1016/j.cortex.2024.06.016.
Abstract
Experiences with sound that make strong demands on the precision of perception, such as musical training and experience speaking a tone language, can enhance auditory neural encoding. Are high demands on the precision of perception necessary for training to drive auditory neural plasticity? Voice actors are an ideal subject population for answering this question. Voice acting requires exaggerating prosodic cues to convey emotion, character, and linguistic structure, drawing upon attention to sound, memory for sound features, and accurate sound production, but not fine perceptual precision. Here we assessed neural encoding of pitch using the frequency-following response (FFR), as well as prosody, music, and sound perception, in voice actors and a matched group of non-actors. We find that the consistency of neural sound encoding, prosody perception, and musical phrase perception are all enhanced in voice actors, suggesting that a range of neural and behavioural auditory processing enhancements can result from training which lacks fine perceptual precision. However, fine discrimination was not enhanced in voice actors but was linked to degree of musical experience, suggesting that low-level auditory processing can only be enhanced by demanding perceptual training. These findings suggest that training which taxes attention, memory, and production but is not perceptually taxing may be a way to boost neural encoding of sound and auditory pattern detection in individuals with poor auditory skills.
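The "consistency of neural sound encoding" discussed above is often quantified by correlating frequency-following response (FFR) averages computed from independent subsets of trials. The sketch below is only a minimal illustration of that general split-half idea using simulated data; the epoch length, trial counts, and splitting procedure are assumptions for illustration, not the authors' pipeline.

```python
# Minimal sketch: split-half consistency of a simulated frequency-following response (FFR).
# Assumes a trials x samples array of single-trial epochs; all values here are synthetic.
import numpy as np

rng = np.random.default_rng(0)
fs = 16000                      # sampling rate (Hz), assumed
t = np.arange(0, 0.2, 1 / fs)   # 200-ms epoch
signal = 0.5 * np.sin(2 * np.pi * 110 * t)           # idealized 110-Hz periodicity in the response
trials = signal + rng.normal(0, 2.0, (300, t.size))  # 300 noisy single trials

def split_half_consistency(epochs, n_splits=100, rng=rng):
    """Average correlation between FFR averages of two random halves of trials."""
    n = epochs.shape[0]
    rs = []
    for _ in range(n_splits):
        order = rng.permutation(n)
        a = epochs[order[: n // 2]].mean(axis=0)
        b = epochs[order[n // 2 :]].mean(axis=0)
        rs.append(np.corrcoef(a, b)[0, 1])
    return float(np.mean(rs))

print(f"split-half FFR consistency: {split_half_consistency(trials):.2f}")
```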
Affiliation(s)
- Magdalena Kachlicka
- School of Psychological Sciences, Birkbeck, University of London, London, UK
- Adam Tierney
- School of Psychological Sciences, Birkbeck, University of London, London, UK.
3
Watkins CD. Mate assessment based on physical characteristics: a review and reflection. Biol Rev Camb Philos Soc 2024. PMID: 39175167. DOI: 10.1111/brv.13131.
Abstract
Mate choice, and sex differences in romantic behaviours, represented one of the first major applications of evolutionary biology to human behaviour. This paper reviews Darwinian approaches to heterosexual mate assessment based on physical characteristics, placing the literature in its historical context (1871-1979), before turning (predominantly) to psychological research on attractiveness judgements based on physical characteristics. Attractiveness is consistently inferred across multiple modalities, with biological theories explaining why we differentiate certain individuals, on average, from others. Simultaneously, it is a judgement that varies systematically in light of our own traits, environment, and experiences. Over 30 years of research has generated robust effects alongside reasons for humility about our limited understanding of the precise physiological mechanisms involved in mate assessment. This review concludes with three questions to focus attention in further research, and proposes that our romantic preferences still provide a critical window into the evolution of human sexuality.
Affiliation(s)
- Christopher D Watkins
- Division of Psychology and Forensic Sciences, School of Applied Sciences, Abertay University, Kydd Building, Bell Street, Dundee, DD1 1HG, UK
4
Adl Zarrabi A, Jeulin M, Bardet P, Commère P, Naccache L, Aucouturier JJ, Ponsot E, Villain M. A simple psychophysical procedure separates representational and noise components in impairments of speech prosody perception after right-hemisphere stroke. Sci Rep 2024; 14:15194. PMID: 38956187. PMCID: PMC11219855. DOI: 10.1038/s41598-024-64295-y.
Abstract
After a right hemisphere stroke, more than half of the patients are impaired in their capacity to produce or comprehend speech prosody. Yet, despite its social-cognitive consequences for patients, aprosodia following stroke has received scant attention. In this report, we introduce a novel, simple psychophysical procedure which, by combining systematic digital manipulations of speech stimuli and reverse-correlation analysis, allows estimating the internal sensory representations that subtend how individual patients perceive speech prosody, and the level of internal noise that governs behavioral variability in how patients apply these representations. Tested on a sample of N = 22 right-hemisphere stroke survivors and N = 21 age-matched controls, the representation + noise model provides a promising alternative to the clinical gold standard for evaluating aprosodia (MEC): both parameters strongly associate with receptive, and not expressive, aprosodia measured by MEC within the patient group; they have better sensitivity than MEC for separating high-functioning patients from controls; and have good specificity with respect to non-prosody-related impairments of auditory attention and processing. Taken together, individual differences in either internal representation, internal noise, or both, paint a potent portrait of the variety of sensory/cognitive mechanisms that can explain impairments of prosody processing after stroke.
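To make the logic of a "representation + noise" analysis concrete, the sketch below simulates a listener who judges random pitch-perturbation profiles against an internal template with some internal noise, then recovers (i) the template by reverse correlation and (ii) an internal-noise index from response agreement across a repeated "double pass" of identical stimuli. The simulated observer, stimulus dimensions, and noise level are illustrative assumptions, not the published procedure.

```python
# Sketch of a representation + noise analysis for prosody reverse correlation (simulated observer).
import numpy as np

rng = np.random.default_rng(1)
n_seg, n_trials = 6, 2000
template = np.array([-1.0, -0.5, 0.0, 0.5, 1.0, 1.5])   # assumed internal template (pitch weights)
internal_noise_sd = 1.0                                  # late decision noise, relative to signal spread

stimuli = rng.normal(0, 1, (n_trials, n_seg))            # random pitch perturbation profiles

def respond(stims):
    drive = stims @ template
    return (drive + rng.normal(0, internal_noise_sd * drive.std(), len(drive))) > 0

resp1 = respond(stimuli)             # first pass
resp2 = respond(stimuli)             # second pass over the same stimuli (double pass)

# (i) Reverse-correlation kernel: mean "yes" stimulus minus mean "no" stimulus.
kernel = stimuli[resp1].mean(axis=0) - stimuli[~resp1].mean(axis=0)
kernel /= np.linalg.norm(kernel)

# (ii) Internal-noise index: agreement between the two passes
# (lower agreement at equal bias implies higher internal noise).
agreement = np.mean(resp1 == resp2)

print("normalized kernel:", np.round(kernel, 2))
print(f"double-pass agreement: {agreement:.2f}")
```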
Affiliation(s)
- Aynaz Adl Zarrabi
- Université de Franche-Comté, SUPMICROTECH, CNRS, Institut FEMTO-ST, 25000, Besançon, France
- Mélissa Jeulin
- Department of Physical Medicine & Rehabilitation, APHP/Hôpital Pitié-Salpêtrière, 75013, Paris, France
- Pauline Bardet
- Department of Physical Medicine & Rehabilitation, APHP/Hôpital Pitié-Salpêtrière, 75013, Paris, France
- Pauline Commère
- Department of Physical Medicine & Rehabilitation, APHP/Hôpital Pitié-Salpêtrière, 75013, Paris, France
- Lionel Naccache
- Department of Physical Medicine & Rehabilitation, APHP/Hôpital Pitié-Salpêtrière, 75013, Paris, France
- Paris Brain Institute (ICM), Inserm, CNRS, PICNIC-Lab, 75013, Paris, France
- Emmanuel Ponsot
- Science & Technology of Music and Sound, IRCAM/CNRS/Sorbonne Université, 75004, Paris, France
- Marie Villain
- Department of Physical Medicine & Rehabilitation, APHP/Hôpital Pitié-Salpêtrière, 75013, Paris, France.
- Paris Brain Institute (ICM), Inserm, CNRS, PICNIC-Lab, 75013, Paris, France.
5
Nestor PG, Woodhull AA. Exploring cultural contributions to the neuropsychology of social cognition: the advanced clinical solutions. J Clin Exp Neuropsychol 2024; 46:303-315. PMID: 38717033. DOI: 10.1080/13803395.2024.2348212.
Abstract
INTRODUCTION Culture and social cognition are deeply intertwined, yet how this rich intersectionality is expressed neuropsychologically remains an important question. METHOD In a convenience sample of 128 young adults (mean age = 24.9 years) recruited from a majority-minority urban university, we examined performance-based neuropsychological measures of social cognition, the Advanced Clinical Solutions-Social Perception (ACS-SP), in relation to both cultural orientation, as assessed by the Individualism-Collectivism Scale (ICS), and spoken English language, as assessed by the oral word pronunciation measure of the Wide Range Achievement Test-4 (WRAT4). RESULTS Results indicated higher WRAT4 scores correlated with better performance across all ACS-SP measures of social cognition. Controlling for these associations in spoken English, partial correlations linked lower scores across both prosody interpretation and affect naming ACS-SP tasks with a propensity to view social relationships vertically, irrespective of individualistic or collectivistic orientations. Hierarchical regression results showed that cultural orientation and English-language familiarity each specifically and uniquely contributed to ACS-SP performance for matching prosody with facial expressions. CONCLUSIONS These findings underscore the importance of incorporating and prioritizing both language and cultural factors in neuropsychological studies of social cognition. They may be viewed as offering strong support for expanding the boundaries of the construct of social cognition beyond its current theoretical framework, one that privileges Western, educated, industrialized, rich and democratic (WEIRD) values, customs, and epistemologies.
Affiliation(s)
- Paul G Nestor
- Department of Psychology, University of Massachusetts Boston, Boston, MA, USA
- Laboratory of Neuroscience, Harvard Medical School, Brockton, MA, USA
- Ashley-Ann Woodhull
- Department of Psychology, University of Massachusetts Boston, Boston, MA, USA
6
Roop BW, Parrell B, Lammert AC. A compressive sensing approach for inferring cognitive representations with reverse correlation. Behav Res Methods 2024; 56:3606-3618. PMID: 38049576. PMCID: PMC11133035. DOI: 10.3758/s13428-023-02281-4.
Abstract
Uncovering cognitive representations is an elusive goal that is increasingly pursued using the reverse correlation method, wherein human subjects make judgments about ambiguous stimuli. Employing reverse correlation often entails collecting thousands of stimulus-response pairs, which severely limits the breadth of studies that are feasible using the method. Current techniques to improve efficiency bias the outcome. Here we show that this methodological barrier can be diminished using compressive sensing, an advanced signal processing technique designed to improve sampling efficiency. Simulations are performed to demonstrate that compressive sensing can improve the accuracy of reconstructed cognitive representations and dramatically reduce the required number of stimulus-response pairs. Additionally, compressive sensing is used on human subject data from a previous reverse correlation study, demonstrating a dramatic improvement in reconstruction quality. This work concludes by outlining the potential of compressive sensing to improve representation reconstruction throughout the fields of psychology, neuroscience, and beyond.
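A minimal sketch of the compressive-sensing idea follows, under the assumption that the internal representation is sparse in some basis (here a DCT basis) and can therefore be recovered from relatively few stimulus-response pairs with an l1-penalized regression. The simulated observer, basis choice, and regularization strength are illustrative assumptions, not the authors' implementation.

```python
# Sketch: sparse (compressive-sensing-style) recovery of a representation from few trials.
import numpy as np
from scipy.fft import idct
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
n_dim, n_trials = 128, 200            # far fewer trials than classic reverse correlation would need

# Ground-truth representation: sparse in the DCT domain (3 active components).
coeffs = np.zeros(n_dim)
coeffs[[2, 7, 15]] = [1.0, -0.8, 0.5]
representation = idct(coeffs, norm="ortho")

stimuli = rng.normal(0, 1, (n_trials, n_dim))
responses = np.sign(stimuli @ representation)          # simulated binary judgments

# Solve for DCT coefficients with an l1 penalty: responses ~ (stimuli in DCT basis) @ coeffs.
basis = idct(np.eye(n_dim), axis=0, norm="ortho")      # columns = DCT atoms
design = stimuli @ basis
fit = Lasso(alpha=0.05).fit(design, responses)
estimate = basis @ fit.coef_

corr = np.corrcoef(representation, estimate)[0, 1]
print(f"correlation with true representation: {corr:.2f}")
```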
Affiliation(s)
- Benjamin W Roop
- Program of Neuroscience, Worcester Polytechnic Institute, Worcester, MA, USA
- Benjamin Parrell
- Department of Communication Sciences and Disorders, University of Wisconsin-Madison, Madison, WI, USA
- Adam C Lammert
- Program of Neuroscience, Worcester Polytechnic Institute, Worcester, MA, USA.
- Biomedical Engineering Department, Worcester Polytechnic Institute, 100 Institute Rd, Worcester, MA, 01609, USA.
7
Larrouy-Maestri P, Poeppel D, Pell MD. The Sound of Emotional Prosody: Nearly 3 Decades of Research and Future Directions. Perspect Psychol Sci 2024:17456916231217722. PMID: 38232303. DOI: 10.1177/17456916231217722.
Abstract
Emotional voices attract considerable attention. A search on any browser using "emotional prosody" as a key phrase leads to more than a million entries. Such interest is evident in the scientific literature as well; readers are reminded in the introductory paragraphs of countless articles of the great importance of prosody and that listeners easily infer the emotional state of speakers through acoustic information. However, despite decades of research on this topic and important achievements, the mapping between acoustics and emotional states is still unclear. In this article, we chart the rich literature on emotional prosody for both newcomers to the field and researchers seeking updates. We also summarize problems revealed by a sample of the literature of the last decades and propose concrete research directions for addressing them, ultimately to satisfy the need for more mechanistic knowledge of emotional prosody.
Affiliation(s)
- Pauline Larrouy-Maestri
- Max Planck Institute for Empirical Aesthetics, Frankfurt, Germany
- School of Communication Sciences and Disorders, McGill University
- Max Planck-NYU Center for Language, Music, and Emotion, New York, New York
- David Poeppel
- Max Planck-NYU Center for Language, Music, and Emotion, New York, New York
- Department of Psychology and Center for Neural Science, New York University
- Ernst Strüngmann Institute for Neuroscience, Frankfurt, Germany
- Marc D Pell
- School of Communication Sciences and Disorders, McGill University
- Centre for Research on Brain, Language, and Music, Montreal, Quebec, Canada
8
Chen C, Messinger DS, Chen C, Yan H, Duan Y, Ince RAA, Garrod OGB, Schyns PG, Jack RE. Cultural facial expressions dynamically convey emotion category and intensity information. Curr Biol 2024; 34:213-223.e5. PMID: 38141619. PMCID: PMC10831323. DOI: 10.1016/j.cub.2023.12.001.
Abstract
Communicating emotional intensity plays a vital ecological role because it provides valuable information about the nature and likelihood of the sender's behavior.1,2,3 For example, attack often follows signals of intense aggression if receivers fail to retreat.4,5 Humans regularly use facial expressions to communicate such information.6,7,8,9,10,11 Yet how this complex signaling task is achieved remains unknown. We addressed this question using a perception-based, data-driven method to mathematically model the specific facial movements that receivers use to classify the six basic emotions-"happy," "surprise," "fear," "disgust," "anger," and "sad"-and judge their intensity in two distinct cultures (East Asian, Western European; total n = 120). In both cultures, receivers expected facial expressions to dynamically represent emotion category and intensity information over time, using a multi-component compositional signaling structure. Specifically, emotion intensifiers peaked earlier or later than emotion classifiers and represented intensity using amplitude variations. Emotion intensifiers are also more similar across emotions than classifiers are, suggesting a latent broad-plus-specific signaling structure. Cross-cultural analysis further revealed similarities and differences in expectations that could impact cross-cultural communication. Specifically, East Asian and Western European receivers have similar expectations about which facial movements represent high intensity for threat-related emotions, such as "anger," "disgust," and "fear," but differ on those that represent low threat emotions, such as happiness and sadness. Together, our results provide new insights into the intricate processes by which facial expressions can achieve complex dynamic signaling tasks by revealing the rich information embedded in facial expressions.
Affiliation(s)
- Chaona Chen
- School of Psychology and Neuroscience, University of Glasgow, 62 Hillhead Street, Glasgow G12 8QB, Scotland, UK.
- Daniel S Messinger
- Departments of Psychology, Pediatrics, and Electrical & Computer Engineering, University of Miami, 5665 Ponce De Leon Blvd, Coral Gables, FL 33146, USA
- Cheng Chen
- Foreign Language Department, Teaching Centre for General Courses, Chengdu Medical College, 601 Tianhui Street, Chengdu 610083, China
- Hongmei Yan
- The MOE Key Lab for Neuroinformation, University of Electronic Science and Technology of China, North Jianshe Road, Chengdu 611731, China
- Yaocong Duan
- School of Psychology and Neuroscience, University of Glasgow, 62 Hillhead Street, Glasgow G12 8QB, Scotland, UK
- Robin A A Ince
- School of Psychology and Neuroscience, University of Glasgow, 62 Hillhead Street, Glasgow G12 8QB, Scotland, UK
- Oliver G B Garrod
- School of Psychology and Neuroscience, University of Glasgow, 62 Hillhead Street, Glasgow G12 8QB, Scotland, UK
- Philippe G Schyns
- School of Psychology and Neuroscience, University of Glasgow, 62 Hillhead Street, Glasgow G12 8QB, Scotland, UK
- Rachael E Jack
- School of Psychology and Neuroscience, University of Glasgow, 62 Hillhead Street, Glasgow G12 8QB, Scotland, UK
9
Lévesque-Lacasse A, Desjardins MC, Fiset D, Charbonneau C, Cormier S, Blais C. The Relationship Between the Ability to Infer Another's Pain and the Expectations Regarding the Appearance of Pain Facial Expressions: Investigation of the Role of Visual Perception. J Pain 2024; 25:250-264. PMID: 37604362. DOI: 10.1016/j.jpain.2023.08.007.
Abstract
Although pain is a commonly experienced and observed affective state, it is frequently misinterpreted, which leads to inadequate caregiving. Studies show that the ability to estimate pain in others (estimation bias) and to detect its subtle variations (sensitivity) could emerge from independent mechanisms. While estimation bias is modulated by variables such as empathy level, pain catastrophizing tendency, and overexposure to pain, sensitivity remains unaffected. The present study verifies whether these two types of inaccuracies are partly explained by perceptual factors. Using reverse correlation, we measured their association with participants' mental representation of pain, or more simply put, with their expectations of what the face of a person in pain should look like. Experiment 1 shows that both parameters are associated with variations in expectations of this expression. More specifically, the estimation bias is linked with expectations characterized by salient changes in the middle face region, whereas sensitivity is associated with salient changes in the eyebrow region. Experiment 2 reveals that bias and sensitivity yield differences in emotional representations. Expectations of individuals with a lower underestimation tendency are qualitatively rated as expressing more pain and sadness, and those of individuals with a higher level of sensitivity as expressing more pain, anger, and disgust. Together, these results provide evidence for a perceptual contribution to pain inferencing that is independent of other psychosocial variables and for its link to observers' expectations. PERSPECTIVE: This article reinforces the contribution of perceptual mechanisms in pain assessment. Moreover, strategies aimed to improve the reliability of individuals' expectations regarding the appearance of facial expressions of pain could potentially be developed, and contribute to decreasing inaccuracies found in pain assessment and the confusion between pain and other affective states.
Affiliation(s)
- Alexandra Lévesque-Lacasse
- Département de Psychoéducation et de Psychologie, Université du Québec en Outaouais, Gatineau, Québec, Canada
- Marie-Claude Desjardins
- Département de Psychoéducation et de Psychologie, Université du Québec en Outaouais, Gatineau, Québec, Canada
- Daniel Fiset
- Département de Psychoéducation et de Psychologie, Université du Québec en Outaouais, Gatineau, Québec, Canada
- Carine Charbonneau
- Département de Psychoéducation et de Psychologie, Université du Québec en Outaouais, Gatineau, Québec, Canada
- Stéphanie Cormier
- Département de Psychoéducation et de Psychologie, Université du Québec en Outaouais, Gatineau, Québec, Canada
- Caroline Blais
- Département de Psychoéducation et de Psychologie, Université du Québec en Outaouais, Gatineau, Québec, Canada
10
Coulombe V, Joyal M, Martel-Sauvageau V, Monetta L. Affective prosody disorders in adults with neurological conditions: A scoping review. Int J Lang Commun Disord 2023; 58:1939-1954. PMID: 37212522. DOI: 10.1111/1460-6984.12909.
Abstract
BACKGROUND Individuals with affective-prosodic deficits have difficulty understanding or expressing emotions and attitudes through prosody. Affective prosody disorders can occur in multiple neurological conditions, but the limited knowledge about the clinical groups prone to deficits complicates their identification in clinical settings. Additionally, the nature of the disturbance underlying affective prosody disorder observed in different neurological conditions remains poorly understood. AIMS To bridge these knowledge gaps and provide relevant information to speech-language pathologists for the management of affective prosody disorders, this study provides an overview of research findings on affective-prosodic deficits in adults with neurological conditions by answering two questions: (1) Which clinical groups present with acquired affective prosodic impairments following brain damage? (2) Which aspects of affective prosody comprehension and production are negatively affected in these neurological conditions? METHODS & PROCEDURES We conducted a scoping review following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews guidelines. A literature search was undertaken in five electronic databases (MEDLINE, PsycINFO, EMBASE, CINAHL, and Linguistics and Language Behavior Abstracts) to identify primary studies reporting affective prosody disorders in adults with neurological impairments. We extracted data on clinical groups and characterised their deficits based on the assessment task used. OUTCOMES & RESULTS The review of 98 studies identified affective-prosodic deficits in 17 neurological conditions. The task paradigms typically used in affective prosody research (discrimination, recognition, cross-modal integration, production on request, imitation and spontaneous production) do not target the processes underlying affective prosody comprehension and production. Therefore, based on the current state of knowledge, it is not possible to establish the level of processing at which impairment occurs in clinical groups. Nevertheless, deficits in the comprehension of affective prosody are observed in 14 clinical groups (mainly recognition deficits) and deficits in the production of affective prosody (either on request or spontaneously) in 10 clinical groups. Neurological conditions and types of deficits that have not been investigated in many studies are highlighted. CONCLUSIONS & IMPLICATIONS The aim of this scoping review was to provide an overview of acquired affective prosody disorders and to identify gaps in knowledge that warrant further investigation. Deficits in the comprehension or production of affective prosody are common to numerous clinical groups with various neurological conditions. However, the underlying cause of affective prosody disorders across them is still unknown. Future studies should implement standardised assessment methods with specific tasks based on a cognitive model to identify the underlying deficits of affective prosody disorders.
WHAT THIS PAPER ADDS What is already known on the subject: Affective prosody is used to share emotions and attitudes through speech and plays a fundamental role in communication and social interactions. Affective prosody disorders can occur in various neurological conditions, but the limited knowledge about the clinical groups prone to affective-prosodic deficits and about the characteristics of different phenotypes of affective prosody disorders complicates their identification in clinical settings. Distinct abilities underlying the comprehension and production of affective prosody can be selectively impaired by brain damage, but the nature of the disturbance underlying affective prosody disorders in different neurological conditions remains unclear. What this study adds: Affective-prosodic deficits are reported in 17 neurological conditions, despite being recognised as a core feature of the clinical profile in only a few of them. The assessment tasks typically used in affective prosody research do not provide accurate information about the specific neurocognitive processes impaired in the comprehension or production of affective prosody. Future studies should implement assessment methods based on a cognitive approach to identify underlying deficits. The assessment of cognitive/executive dysfunctions, motor speech impairment and aphasia might be important for distinguishing primary affective prosodic dysfunctions from those secondarily impacting affective prosody. What are the potential clinical implications of this study? Raising awareness about the possible presence of affective-prosodic disorders in numerous clinical groups will facilitate their recognition by speech-language pathologists and, consequently, their management in clinical settings. A comprehensive assessment covering multiple affective-prosodic skills could highlight specific aspects of affective prosody that warrant clinical intervention.
Affiliation(s)
- Valérie Coulombe
- Faculty of Medicine, Université Laval, Québec, Canada
- Center for Interdisciplinary Research in Rehabilitation and Social Integration (CIRRIS), Québec, Canada
- Vincent Martel-Sauvageau
- Faculty of Medicine, Université Laval, Québec, Canada
- Center for Interdisciplinary Research in Rehabilitation and Social Integration (CIRRIS), Québec, Canada
- Laura Monetta
- Faculty of Medicine, Université Laval, Québec, Canada
- Center for Interdisciplinary Research in Rehabilitation and Social Integration (CIRRIS), Québec, Canada
11
Lin C, Bulls LS, Tepfer LJ, Vyas AD, Thornton MA. Advancing Naturalistic Affective Science with Deep Learning. Affect Sci 2023; 4:550-562. PMID: 37744976. PMCID: PMC10514024. DOI: 10.1007/s42761-023-00215-z.
Abstract
People express their own emotions and perceive others' emotions via a variety of channels, including facial movements, body gestures, vocal prosody, and language. Studying these channels of affective behavior offers insight into both the experience and perception of emotion. Prior research has predominantly focused on studying individual channels of affective behavior in isolation using tightly controlled, non-naturalistic experiments. This approach limits our understanding of emotion in more naturalistic contexts where different channels of information tend to interact. Traditional methods struggle to address this limitation: manually annotating behavior is time-consuming, making it infeasible to do at large scale; manually selecting and manipulating stimuli based on hypotheses may neglect unanticipated features, potentially generating biased conclusions; and common linear modeling approaches cannot fully capture the complex, nonlinear, and interactive nature of real-life affective processes. In this methodology review, we describe how deep learning can be applied to address these challenges to advance a more naturalistic affective science. First, we describe current practices in affective research and explain why existing methods face challenges in revealing a more naturalistic understanding of emotion. Second, we introduce deep learning approaches and explain how they can be applied to tackle three main challenges: quantifying naturalistic behaviors, selecting and manipulating naturalistic stimuli, and modeling naturalistic affective processes. Finally, we describe the limitations of these deep learning methods, and how these limitations might be avoided or mitigated. By detailing the promise and the peril of deep learning, this review aims to pave the way for a more naturalistic affective science.
Affiliation(s)
- Chujun Lin
- Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH USA
- Landry S. Bulls
- Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH USA
- Lindsey J. Tepfer
- Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH USA
- Amisha D. Vyas
- Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH USA
- Mark A. Thornton
- Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH USA
12
Compton A, Roop BW, Parrell B, Lammert AC. Stimulus whitening improves the efficiency of reverse correlation. Behav Res Methods 2023; 55:3120-3128. PMID: 36038814. PMCID: PMC10556169. DOI: 10.3758/s13428-022-01946-w.
Abstract
Human perception depends upon internal representations of the environment that help to organize the raw information available from the senses by acting as reference patterns. Internal representations are widely characterized using reverse correlation, a method capable of producing unconstrained estimates of the representation itself, all on the basis of simple responses to random stimuli. Despite its advantages, reverse correlation is often infeasible to apply because of its inefficiency-a very large number of stimulus-response trials are required in order to obtain an accurate estimate. Here, we show that an important source of this inefficiency is small, yet nontrivial, correlations that occur by chance between randomly generated stimuli. We demonstrate in simulation that whitening stimuli to remove such correlations before eliciting responses provides greater than 85% improvement in efficiency for a given estimation quality, as well as a two- to fivefold increase in quality for a given sample size. Moreover, unlike conventional approaches, whitening improves the efficiency of reverse correlation without introducing bias into the estimate, or requiring prior knowledge of the target internal representation. Improving the efficiency of reverse correlation with whitening may enable a broader scope of investigations into the individual variability and potential universality of perceptual mechanisms.
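One simple way to remove chance correlations from a random stimulus set is ZCA-style whitening of the sample covariance before responses are collected, as sketched below with a simulated observer; the exact whitening scheme, dimensions, and observer model used in the paper may differ from these assumptions.

```python
# Sketch: whitening a random stimulus set before reverse correlation (simulated observer).
import numpy as np

rng = np.random.default_rng(3)
n_trials, n_dim = 300, 64
template = rng.normal(0, 1, n_dim)           # assumed internal representation

raw = rng.normal(0, 1, (n_trials, n_dim))    # raw random stimuli contain chance correlations

# Remove chance correlations: ZCA-style whitening of the sample covariance.
centered = raw - raw.mean(axis=0)
cov = centered.T @ centered / n_trials
evals, evecs = np.linalg.eigh(cov)
zca = evecs @ np.diag(1.0 / np.sqrt(evals)) @ evecs.T
white = centered @ zca

def classification_image(stimuli, weights):
    """Mean difference between stimuli that did and did not elicit a 'yes' response."""
    responses = np.sign(stimuli @ weights)   # simulated binary responses, no internal noise
    return stimuli[responses > 0].mean(axis=0) - stimuli[responses < 0].mean(axis=0)

for name, stims in [("raw", raw), ("whitened", white)]:
    ci = classification_image(stims, template)
    r = np.corrcoef(ci, template)[0, 1]
    print(f"{name:9s} stimuli -> estimate/template correlation: {r:.2f}")
```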
Affiliation(s)
- Alexis Compton
- Biomedical Engineering Department, Worcester Polytechnic Institute, 100 Institute Rd, Worcester, MA, 01609, USA
- Benjamin W Roop
- Program of Neuroscience, Worcester Polytechnic Institute, Worcester, MA, USA
- Benjamin Parrell
- Department of Communication Sciences and Disorders, University of Wisconsin-Madison, Madison, WI, USA
- Adam C Lammert
- Biomedical Engineering Department, Worcester Polytechnic Institute, 100 Institute Rd, Worcester, MA, 01609, USA.
- Program of Neuroscience, Worcester Polytechnic Institute, Worcester, MA, USA.
13
Osses A, Spinelli E, Meunier F, Gaudrain E, Varnet L. Prosodic cues to word boundaries in a segmentation task assessed using reverse correlation. JASA Express Lett 2023; 3:095205. PMID: 37756550. DOI: 10.1121/10.0021022.
Abstract
When listening to speech sounds, listeners are able to exploit acoustic features that mark the boundaries between successive words, the so-called segmentation cues. These cues are typically investigated by directly manipulating features that are hypothetically related to segmentation. The current study uses a different approach based on reverse correlation, where the stimulus manipulations are based on minimal assumptions. The method was evaluated using pairs of phonemically identical sentences in French, whose prosody was changed by introducing random f0 trajectories and segment durations. Our results support a prominent perceptual role of the f0 rise and vowel duration at the beginning of content words.
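To give a concrete sense of the stimulus side of such an experiment, the sketch below draws smooth random f0 perturbation trajectories from breakpoints (segment durations could be jittered analogously) and estimates a first-order kernel from simulated two-alternative responses. The breakpoint spacing, perturbation size, and simulated listener are assumptions for illustration, not the study's parameters.

```python
# Sketch: random f0 trajectories for a reverse-correlation segmentation task (simulated listener).
import numpy as np

rng = np.random.default_rng(4)
n_trials, n_points, n_breaks = 1500, 100, 6   # 100 time samples per utterance, 6 f0 breakpoints

def random_f0_trajectory():
    """Random f0 perturbation (in semitones) interpolated between equally spaced breakpoints."""
    breaks = rng.normal(0, 2, n_breaks)
    return np.interp(np.linspace(0, n_breaks - 1, n_points), np.arange(n_breaks), breaks)

stimuli = np.array([random_f0_trajectory() for _ in range(n_trials)])

# Simulated listener: reports "word boundary here" when f0 rises around sample 40 (assumed cue).
cue = np.zeros(n_points)
cue[35:45] = 1.0
responses = (stimuli @ cue + rng.normal(0, 5, n_trials)) > 0

# First-order kernel: mean trajectory of "boundary" trials minus mean of "no boundary" trials.
kernel = stimuli[responses].mean(axis=0) - stimuli[~responses].mean(axis=0)
print("kernel peak at sample:", int(np.argmax(kernel)))  # should fall near the assumed cue (~40)
```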
Affiliation(s)
- Alejandro Osses
- Laboratoire des Systèmes Perceptifs, Département d'Études Cognitives, École Normale Supérieure, PSL University, CNRS, Paris, France
- Elsa Spinelli
- Laboratoire de Psychologie et NeuroCognition, Université Grenoble Alpes, Grenoble, France
- Etienne Gaudrain
- Lyon Neuroscience Research Center, CNRS, Inserm, Université Lyon 1, Lyon, France
- Léo Varnet
- Laboratoire des Systèmes Perceptifs, Département d'Études Cognitives, École Normale Supérieure, PSL University, CNRS, Paris, France
14
Yan S, Soladié C, Aucouturier JJ, Seguier R. Combining GAN with reverse correlation to construct personalized facial expressions. PLoS One 2023; 18:e0290612. PMID: 37624781. PMCID: PMC10456187. DOI: 10.1371/journal.pone.0290612.
Abstract
Recent deep-learning techniques have made it possible to manipulate facial expressions in digital photographs or videos, however, these techniques still lack fine and personalized ways to control their creation. Moreover, current technologies are highly dependent on large labeled databases, which limits the range and complexity of expressions that can be modeled. Thus, these technologies cannot deal with non-basic emotions. In this paper, we propose a novel interdisciplinary approach combining the Generative Adversarial Network (GAN) with a technique inspired by cognitive sciences, psychophysical reverse correlation. Reverse correlation is a data-driven method able to extract an observer's 'mental representation' of what a given facial expression should look like. Our approach can generate 1) personalized facial expression prototypes, 2) of basic emotions, and non-basic emotions that are not available in existing databases, and 3) without the need for expertise. Personalized prototypes obtained with reverse correlation can then be applied to manipulate facial expressions. In addition, our system challenges the universality of facial expression prototypes by proposing the concepts of dominant and complementary action units to describe facial expression prototypes. The evaluations we conducted on a limited number of emotions validate the effectiveness of our proposed method. The code is available at https://github.com/yansen0508/Mental-Deep-Reverse-Engineering.
Affiliation(s)
- Sen Yan
- CentraleSupelec, IETR, Rennes, France
15
Landsiedel J, Koldewyn K. Auditory dyadic interactions through the "eye" of the social brain: How visual is the posterior STS interaction region? Imaging Neuroscience 2023; 1:1-20. PMID: 37719835. PMCID: PMC10503480. DOI: 10.1162/imag_a_00003.
Abstract
Human interactions contain potent social cues that meet not only the eye but also the ear. Although research has identified a region in the posterior superior temporal sulcus as being particularly sensitive to visually presented social interactions (SI-pSTS), its response to auditory interactions has not been tested. Here, we used fMRI to explore brain response to auditory interactions, with a focus on temporal regions known to be important in auditory processing and social interaction perception. In Experiment 1, monolingual participants listened to two-speaker conversations (intact or sentence-scrambled) and one-speaker narrations in both a known and an unknown language. Speaker number and conversational coherence were explored in separately localised regions-of-interest (ROI). In Experiment 2, bilingual participants were scanned to explore the role of language comprehension. Combining univariate and multivariate analyses, we found initial evidence for a heteromodal response to social interactions in SI-pSTS. Specifically, right SI-pSTS preferred auditory interactions over control stimuli and represented information about both speaker number and interactive coherence. Bilateral temporal voice areas (TVA) showed a similar, but less specific, profile. Exploratory analyses identified another auditory-interaction sensitive area in anterior STS. Indeed, direct comparison suggests modality specific tuning, with SI-pSTS preferring visual information while aSTS prefers auditory information. Altogether, these results suggest that right SI-pSTS is a heteromodal region that represents information about social interactions in both visual and auditory domains. Future work is needed to clarify the roles of TVA and aSTS in auditory interaction perception and further probe right SI-pSTS interaction-selectivity using non-semantic prosodic cues.
Affiliation(s)
- Julia Landsiedel
- Department of Psychology, School of Human and Behavioural Sciences, Bangor University, Bangor, United Kingdom
- Kami Koldewyn
- Department of Psychology, School of Human and Behavioural Sciences, Bangor University, Bangor, United Kingdom
16
Hickok G, Venezia J, Teghipco A. Beyond Broca: neural architecture and evolution of a dual motor speech coordination system. Brain 2023; 146:1775-1790. PMID: 36746488. PMCID: PMC10411947. DOI: 10.1093/brain/awac454.
Abstract
Classical neural architecture models of speech production propose a single system centred on Broca's area coordinating all the vocal articulators from lips to larynx. Modern evidence has challenged both the idea that Broca's area is involved in motor speech coordination and that there is only one coordination network. Drawing on a wide range of evidence, here we propose a dual speech coordination model in which laryngeal control of pitch-related aspects of prosody and song are coordinated by a hierarchically organized dorsolateral system while supralaryngeal articulation at the phonetic/syllabic level is coordinated by a more ventral system posterior to Broca's area. We argue further that these two speech production subsystems have distinguishable evolutionary histories and discuss the implications for models of language evolution.
Affiliation(s)
- Gregory Hickok
- Department of Cognitive Sciences, University of California, Irvine, CA 92697, USA
- Department of Language Science, University of California, Irvine, CA 92697, USA
- Jonathan Venezia
- Auditory Research Laboratory, VA Loma Linda Healthcare System, Loma Linda, CA 92357, USA
- Department of Otolaryngology—Head and Neck Surgery, Loma Linda University School of Medicine, Loma Linda, CA 92350, USA
- Alex Teghipco
- Department of Psychology, University of South Carolina, Columbia, SC 29208, USA
17
Wang L, Ong JH, Ponsot E, Hou Q, Jiang C, Liu F. Mental representations of speech and musical pitch contours reveal a diversity of profiles in autism spectrum disorder. Autism 2023; 27:629-646. PMID: 35848413. PMCID: PMC10074762. DOI: 10.1177/13623613221111207.
Abstract
LAY ABSTRACT As a key auditory attribute of sounds, pitch is ubiquitous in our everyday listening experience involving language, music and environmental sounds. Given its critical role in auditory processing related to communication, numerous studies have investigated pitch processing in autism spectrum disorder. However, the findings have been mixed, reporting either enhanced, typical or impaired performance among autistic individuals. By investigating top-down comparisons of internal mental representations of pitch contours in speech and music, this study shows for the first time that, while autistic individuals exhibit diverse profiles of pitch processing compared to non-autistic individuals, their mental representations of pitch contours are typical across domains. These findings suggest that pitch-processing mechanisms are shared across domains in autism spectrum disorder and provide theoretical implications for using music to improve speech for those autistic individuals who have language problems.
Affiliation(s)
- Li Wang
- University of Reading, UK
- The Chinese University of Hong Kong, Hong Kong
- Qingqi Hou
- Nanjing Normal University of Special Education, China
18
Nakai T, Rachman L, Arias Sarah P, Okanoya K, Aucouturier JJ. Algorithmic voice transformations reveal the phonological basis of language-familiarity effects in cross-cultural emotion judgments. PLoS One 2023; 18:e0285028. PMID: 37134091. PMCID: PMC10156011. DOI: 10.1371/journal.pone.0285028.
Abstract
People have a well-described advantage in identifying individuals and emotions in their own culture, a phenomenon also known as the other-race and language-familiarity effect. However, it is unclear whether native-language advantages arise from genuinely enhanced capacities to extract relevant cues in familiar speech or, more simply, from cultural differences in emotional expressions. Here, to rule out production differences, we use algorithmic voice transformations to create French and Japanese stimulus pairs that differed by exactly the same acoustical characteristics. In two cross-cultural experiments, participants performed better in their native language when categorizing vocal emotional cues and detecting non-emotional pitch changes. This advantage persisted over three types of stimulus degradation (jabberwocky, shuffled and reversed sentences), which disturbed semantics, syntax, and supra-segmental patterns, respectively. These results provide evidence that production differences are not the sole drivers of the language-familiarity effect in cross-cultural emotion perception. Listeners' unfamiliarity with the phonology of another language, rather than with its syntax or semantics, impairs the detection of pitch prosodic cues and, in turn, the recognition of expressive prosody.
Affiliation(s)
- Tomoya Nakai
- Lyon Neuroscience Research Center (CRNL), (INSERM/CNRS/University of Lyon), Bron, France
- Center for Information and Neural Networks, National Institute of Information and Communications Technology, Suita, Japan
- Laura Rachman
- Department of Otorhinolaryngology, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Pablo Arias Sarah
- Lund University Cognitive Science, Lund University, Lund, Sweden
- Sciences et Technologies de la Musique et du Son (IRCAM/CNRS/Sorbonne Université), Paris, France
- School of Psychology & Neuroscience, University of Glasgow, Glasgow, United Kingdom
- Kazuo Okanoya
- The University of Tokyo, Graduate School of Arts and Sciences, Tokyo, Japan
- Advanced Comprehensive Research Organization, Teikyo University, Tokyo, Japan
- Jean-Julien Aucouturier
- Sciences et Technologies de la Musique et du Son (IRCAM/CNRS/Sorbonne Université), Paris, France
- FEMTO-ST Institute (CNRS/Université de Bourgogne Franche Comté), Besançon, France
| |
Collapse
|
19
|
Ji Y, Hu Y, Jiang X. Segmental and suprasegmental encoding of speaker confidence in Wuxi dialect vowels. Front Psychol 2022; 13:1028106. [PMID: 36578688 PMCID: PMC9791101 DOI: 10.3389/fpsyg.2022.1028106] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/25/2022] [Accepted: 11/15/2022] [Indexed: 12/14/2022] Open
Abstract
Introduction Wuxi dialect is a variation of Wu dialect spoken in eastern China and is characterized by a rich tonal system. Compared with standard Mandarin speakers, those of Wuxi dialect as their mother tongue can be more efficient in varying vocal cues to encode communicative meanings in speech communication. While literature has demonstrated that speakers encode high vs. low confidence in global prosodic cues at the sentence level, it is unknown how speakers' intended confidence is encoded at a more local, phonetic level. This study aimed to explore the effects of speakers' intended confidence on both prosodic and formant features of vowels in two lexical tones (the flat tone and the contour tone) of Wuxi dialect. Methods Words of a single vowel were spoken in confident, unconfident, or neutral tone of voice by native Wuxi dialect speakers using a standard elicitation procedure. Linear-mixed effects modeling and parametric bootstrapping testing were performed. Results The results showed that (1) the speakers raised both F1 and F2 in the confident level (compared with the neutral-intending expression). Additionally, F1 can distinguish between the confident and unconfident expressions; (2) Compared with the neutral-intending expression, the speakers raised mean f0, had a greater variation of f0 and prolonged pronunciation time in the unconfident level while they raised mean intensity, had a greater variation of intensity and prolonged pronunciation time in the confident level. (3) The speakers modulated mean f0 and mean intensity to a larger extent on the flat tone than the contour tone to differentiate between levels of confidence in the voice, while they modulated f0 and intensity range more only on the contour tone. Discussion These findings shed new light on the mechanisms of segmental and suprasegmental encoding of speaker confidence and lack of confidence at the vowel level, highlighting the interplay of lexical tone and vocal expression in speech communication.
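For readers unfamiliar with the statistical approach named above, a linear mixed-effects model of an acoustic cue with a by-speaker random intercept can be fit along the following lines, using statsmodels and fabricated data; the actual variables, random-effects structure, and parametric bootstrap procedure in the study are more elaborate than this sketch.

```python
# Sketch: mixed-effects model of mean f0 as a function of intended confidence (fabricated data).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
levels = ["neutral", "confident", "unconfident"]
rows = []
for speaker in range(12):
    speaker_offset = rng.normal(0, 10)                 # by-speaker random intercept (Hz)
    for level in levels:
        shift = {"neutral": 0, "confident": 5, "unconfident": 15}[level]  # assumed effects
        for _ in range(10):                            # 10 vowel tokens per condition
            rows.append({"speaker": f"s{speaker}",
                         "confidence": level,
                         "f0_mean": 200 + speaker_offset + shift + rng.normal(0, 8)})
df = pd.DataFrame(rows)

model = smf.mixedlm("f0_mean ~ C(confidence, Treatment('neutral'))", df, groups=df["speaker"])
result = model.fit()
print(result.summary())
```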
20
Si C, Zhang C, Lau P, Yang Y, Li B. Modelling representations in speech normalization of prosodic cues. Sci Rep 2022; 12:14635. PMID: 36030274. PMCID: PMC9420126. DOI: 10.1038/s41598-022-18838-w.
Abstract
The lack of invariance problem in speech perception refers to a fundamental problem of how listeners deal with differences of speech sounds produced by various speakers. The current study is the first to test the contributions of mentally stored distributional information in normalization of prosodic cues. This study starts out by modelling distributions of acoustic cues from a speech corpus. We proceeded to conduct three experiments using both naturally produced lexical tones with estimated distributions and manipulated lexical tones with f0 values generated from simulated distributions. State of the art statistical techniques have been used to examine the effects of distribution parameters in normalization and identification curves with respect to each parameter. Based on the significant effects of distribution parameters, we proposed a probabilistic parametric representation (PPR), integrating knowledge from previously established distributions of speakers with their indexical information. PPR is still accessed during speech perception even when contextual information is present. We also discussed the procedure of normalization of speech signals produced by unfamiliar talker with and without contexts and the access of long-term stored representations.
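The normalization problem being modelled here, interpreting an incoming f0 value relative to a stored distribution of a speaker's f0, can be illustrated with a toy Lobanov-style z-score against stored distribution parameters. The probabilistic parametric representation proposed in the paper involves richer distributional information and statistical modelling than this sketch; the speakers and numbers below are made up.

```python
# Sketch: normalizing an incoming f0 value against a stored speaker distribution (toy example).
import numpy as np

# Stored distribution parameters for two hypothetical familiar speakers: mean and SD of f0 in Hz.
stored = {"speaker_A": (120.0, 15.0),    # low-pitched speaker
          "speaker_B": (210.0, 25.0)}    # high-pitched speaker

def normalize_f0(f0_hz, speaker):
    """Lobanov-style z-score of an f0 sample relative to the speaker's stored distribution."""
    mean, sd = stored[speaker]
    return (f0_hz - mean) / sd

# The same absolute f0 value is "high" for one speaker and "low" for the other,
# which is the lack-of-invariance problem that stored speaker distributions help resolve.
for spk in stored:
    print(spk, f"f0 = 165 Hz -> z = {normalize_f0(165.0, spk):+.2f}")
```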
Affiliation(s)
- Chen Si
- Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Kowloon, Hong Kong SAR, China.
- Hong Kong Polytechnic University-Peking University Research Centre on Chinese Linguistics, Kowloon, Hong Kong SAR, China.
- Research Centre for Language, Cognition, and Neuroscience, University of Hong Kong, Pok Fu Lam, Hong Kong SAR, China.
- Caicai Zhang
- Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Kowloon, Hong Kong SAR, China
- Hong Kong Polytechnic University-Peking University Research Centre on Chinese Linguistics, Kowloon, Hong Kong SAR, China
- Research Centre for Language, Cognition, and Neuroscience, University of Hong Kong, Pok Fu Lam, Hong Kong SAR, China
- Puiyin Lau
- Department of Statistics and Actuarial Science, University of Hong Kong, Pok Fu Lam, Hong Kong SAR, China
- Yike Yang
- Department of Chinese Language and Literature, Hong Kong Shue Yan University, North Point, Hong Kong SAR, China
- Bei Li
- Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Kowloon, Hong Kong SAR, China
| |
Collapse
|
21
|
Zhang J, Tao S. Vocal Characteristics Influence Women's Perceptions of Infidelity and Relationship Investment in China. EVOLUTIONARY PSYCHOLOGY 2022; 20:14747049221108883. [PMID: 35898188 PMCID: PMC10303567 DOI: 10.1177/14747049221108883] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/02/2021] [Revised: 05/29/2022] [Accepted: 06/07/2022] [Indexed: 11/15/2022] Open
Abstract
Vocal characteristics are important cues to form social impressions. Previous studies indicated that men with masculine voices are perceived as engaging in higher rates of infidelity and being less committed to their relationship. In the current study, we examined how women in China perceive information regarding infidelity and relationship investment conveyed by the voices (voice pitch and vocal tract length) of males, and whether different vocal characteristics play a similar role in driving these impressions. In addition, we examined whether these perceptions are consistent in Chinese and English language contexts. The results indicated that women perceived men with more masculine voices (lower voice pitch and longer vocal tract length) as showing a lower likelihood of infidelity and higher relationship investment; further, women who preferred more masculine voices in long-term relationships, but not in short-term relationships, were more likely to perceive men with masculine voices as less likely to engage in infidelity and more likely to invest in their relationship. Moreover, the participants formed very similar impressions irrespective of whether the voices spoke native (Chinese) or foreign (English) languages. These results provide new evidence for the role of the voice in women's choices in selecting long-term partners.
Affiliation(s)
- Jing Zhang
- School of Psychology, Sichuan Normal University, Chengdu, China
- Shuli Tao
- School of Psychology, Sichuan Normal University, Chengdu, China
23
Pruvost-Robieux E, André-Obadia N, Marchi A, Sharshar T, Liuni M, Gavaret M, Aucouturier JJ. It’s not what you say, it’s how you say it: a retrospective study of the impact of prosody on own-name P300 in comatose patients. Clin Neurophysiol 2022; 135:154-161. DOI: 10.1016/j.clinph.2021.12.015.
24
Anikin A, Pisanski K, Reby D. Static and dynamic formant scaling conveys body size and aggression. R Soc Open Sci 2022; 9:211496. PMID: 35242348. PMCID: PMC8753157. DOI: 10.1098/rsos.211496.
Abstract
When producing intimidating aggressive vocalizations, humans and other animals often extend their vocal tracts to lower their voice resonance frequencies (formants) and thus sound big. Is acoustic size exaggeration more effective when the vocal tract is extended before, or during, the vocalization, and how do listeners interpret within-call changes in apparent vocal tract length? We compared perceptual effects of static and dynamic formant scaling in aggressive human speech and nonverbal vocalizations. Acoustic manipulations corresponded to elongating or shortening the vocal tract either around (Experiment 1) or from (Experiment 2) its resting position. Gradual formant scaling that preserved average frequencies conveyed the impression of smaller size and greater aggression, regardless of the direction of change. Vocal tract shortening from the original length conveyed smaller size and less aggression, whereas vocal tract elongation conveyed larger size and more aggression, and these effects were stronger for static than for dynamic scaling. Listeners familiarized with the speaker's natural voice were less often 'fooled' by formant manipulations when judging speaker size, but paid more attention to formants when judging aggressive intent. Thus, within-call vocal tract scaling conveys emotion, but a better way to sound large and intimidating is to keep the vocal tract consistently extended.
Collapse
Affiliation(s)
- Andrey Anikin
- Division of Cognitive Science, Lund University, Lund, Sweden
- ENES Sensory Neuro-Ethology lab, CRNL, Jean Monnet University of Saint Étienne, UMR 5293, 42023, St-Étienne, France
| | - Katarzyna Pisanski
- ENES Sensory Neuro-Ethology lab, CRNL, Jean Monnet University of Saint Étienne, UMR 5293, 42023, St-Étienne, France
| | - David Reby
- ENES Sensory Neuro-Ethology lab, CRNL, Jean Monnet University of Saint Étienne, UMR 5293, 42023, St-Étienne, France
| |
Collapse
|
25
|
Pinheiro AP, Anikin A, Conde T, Sarzedas J, Chen S, Scott SK, Lima CF. Emotional authenticity modulates affective and social trait inferences from voices. Philos Trans R Soc Lond B Biol Sci 2021; 376:20200402. [PMID: 34719249 PMCID: PMC8558771 DOI: 10.1098/rstb.2020.0402] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 05/12/2021] [Indexed: 01/31/2023] Open
Abstract
The human voice is a primary tool for verbal and nonverbal communication. Studies on laughter emphasize a distinction between spontaneous laughter, which reflects a genuinely felt emotion, and volitional laughter, associated with more intentional communicative acts. Listeners can reliably differentiate the two. It remains unclear, however, if they can detect authenticity in other vocalizations, and whether authenticity determines the affective and social impressions that we form about others. Here, 137 participants listened to laughs and cries that could be spontaneous or volitional and rated them on authenticity, valence, arousal, trustworthiness and dominance. Bayesian mixed models indicated that listeners detect authenticity similarly well in laughter and crying. Speakers were also perceived to be more trustworthy, and in a higher arousal state, when their laughs and cries were spontaneous. Moreover, spontaneous laughs were evaluated as more positive than volitional ones, and we found that the same acoustic features predicted perceived authenticity and trustworthiness in laughter: high pitch, spectral variability and less voicing. For crying, associations between acoustic features and ratings were less reliable. These findings indicate that emotional authenticity shapes affective and social trait inferences from voices, and that the ability to detect authenticity in vocalizations is not limited to laughter. This article is part of the theme issue 'Voice modulation: from origin and mechanism to social impact (Part I)'.
Collapse
Affiliation(s)
- Ana P. Pinheiro
- CICPSI, Faculdade de Psicologia, Universidade de Lisboa, Alameda da Universidade, 1649-013 Lisboa, Portugal
| | - Andrey Anikin
- Equipe de Neuro-Ethologie Sensorielle (ENES)/Centre de Recherche en Neurosciences de Lyon (CRNL), University of Lyon/Saint-Etienne, CNRS UMR5292, INSERM UMR_S 1028, 42023 Saint-Etienne, France
- Division of Cognitive Science, Lund University, 221 00 Lund, Sweden
| | - Tatiana Conde
- CICPSI, Faculdade de Psicologia, Universidade de Lisboa, Alameda da Universidade, 1649-013 Lisboa, Portugal
| | - João Sarzedas
- CICPSI, Faculdade de Psicologia, Universidade de Lisboa, Alameda da Universidade, 1649-013 Lisboa, Portugal
| | - Sinead Chen
- National Taiwan University, Taipei City, 10617 Taiwan
| | - Sophie K. Scott
- Institute of Cognitive Neuroscience, University College London, London WC1N 3AZ, UK
| | - César F. Lima
- Institute of Cognitive Neuroscience, University College London, London WC1N 3AZ, UK
- Instituto Universitário de Lisboa (ISCTE-IUL), Avenida das Forças Armadas, 1649-026 Lisboa, Portugal
| |
Collapse
|
26
|
Kurumada C, Roettger TB. Thinking probabilistically in the study of intonational speech prosody. WILEY INTERDISCIPLINARY REVIEWS. COGNITIVE SCIENCE 2021; 13:e1579. [PMID: 34599647 DOI: 10.1002/wcs.1579] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/31/2021] [Revised: 08/09/2021] [Accepted: 08/26/2021] [Indexed: 11/07/2022]
Abstract
Speech prosody, the melodic and rhythmic properties of a language, plays a critical role in our everyday communication. Researchers have identified unique patterns of prosody that segment words and phrases, highlight focal elements in a sentence, and convey holistic meanings and speech acts that interact with the information shared in context. The mapping between the sound and meaning represented in prosody is suggested to be probabilistic-the same physical instance of sounds can support multiple meanings across talkers and contexts while the same meaning can be encoded in physically distinct sound patterns (e.g., pitch movements). The current overview presents an analysis framework for probing the nature of this probabilistic relationship. Illustrated by examples from the literature and a dataset of German focus marking, we discuss the production variability within and across talkers and consider challenges that this variability imposes on the comprehension system. A better understanding of these challenges, we argue, will illuminate how the human perceptual, cognitive, and computational mechanisms may navigate the variability to arrive at a coherent understanding of speech prosody. The current paper is intended to be an introduction for those who are interested in thinking probabilistically about the sound-meaning mapping in prosody. Open questions for future research are discussed with proposals for examining prosodic production and comprehension within a comprehensive, mathematically-motivated framework of probabilistic inference under uncertainty. This article is categorized under: Linguistics > Language in Mind and Brain Psychology > Language.
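As a hedged illustration of the probabilistic framing sketched in the abstract above (not a formula taken from the review itself), the sound-meaning mapping can be written as a listener's inference over intended meanings m given an observed prosodic form a:
P(m \mid a) = \frac{P(a \mid m)\, P(m)}{\sum_{m'} P(a \mid m')\, P(m')}
Here P(a | m) captures production variability (many physically distinct pitch patterns can encode the same meaning across talkers and contexts) and P(m) the contextual prior, so comprehension is inference under uncertainty rather than lookup of a one-to-one code.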
Collapse
Affiliation(s)
- Chigusa Kurumada
- Department of Brain and Cognitive Sciences, University of Rochester, Rochester, New York, USA
| | - Timo B Roettger
- Department of Linguistics & Scandinavian Studies, Universitetet i Oslo, Oslo, Norway
| |
Collapse
|
27
|
Pisanski K, Groyecka-Bernard A, Sorokowski P. Human voice pitch measures are robust across a variety of speech recordings: methodological and theoretical implications. Biol Lett 2021; 17:20210356. [PMID: 34582736 DOI: 10.1098/rsbl.2021.0356] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
Abstract
Fundamental frequency (fo), perceived as voice pitch, is the most sexually dimorphic, perceptually salient and intensively studied voice parameter in human nonverbal communication. Thousands of studies have linked human fo to biological and social speaker traits and life outcomes, from reproductive to economic. Critically, researchers have used myriad speech stimuli to measure fo and infer its functional relevance, from individual vowels to longer bouts of spontaneous speech. Here, we acoustically analysed fo in nearly 1000 affectively neutral speech utterances (vowels, words, counting, greetings, read paragraphs and free spontaneous speech) produced by the same 154 men and women, aged 18-67, with two aims: first, to test the methodological validity of comparing fo measures from diverse speech stimuli, and second, to test the prediction that the vast inter-individual differences in habitual fo found between same-sex adults are preserved across speech types. Indeed, despite differences in linguistic content, duration, scripted or spontaneous production and within-individual variability, we show that 42-81% of inter-individual differences in fo can be explained between any two speech types. Beyond methodological implications, together with recent evidence that inter-individual differences in fo are remarkably stable across the lifespan and generalize to emotional speech and nonverbal vocalizations, our results further substantiate voice pitch as a robust and reliable biomarker in human communication.
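A minimal sketch of the kind of analysis described above, assuming Python with the parselmouth (Praat) and numpy libraries and hypothetical file names (neither tool nor files are named in the study): it extracts mean fo per speaker from two speech types and asks how much inter-individual variance one speech type explains in the other.

import numpy as np
import parselmouth  # Python interface to Praat; an assumed tool, not the study's stated pipeline

def mean_f0(wav_path, floor=75, ceiling=500):
    """Mean fundamental frequency (Hz) over voiced frames of one recording."""
    pitch = parselmouth.Sound(wav_path).to_pitch(pitch_floor=floor, pitch_ceiling=ceiling)
    f0 = pitch.selected_array['frequency']
    return f0[f0 > 0].mean()  # unvoiced frames are coded as 0 Hz and dropped

# Hypothetical layout: one vowel and one free-speech recording per speaker.
speakers = ['s01', 's02', 's03']
vowel_f0 = np.array([mean_f0(f'{s}_vowel.wav') for s in speakers])
speech_f0 = np.array([mean_f0(f'{s}_speech.wav') for s in speakers])

# Share of inter-individual differences in fo explained between the two speech types.
r = np.corrcoef(vowel_f0, speech_f0)[0, 1]
print(f'r = {r:.2f}, r^2 = {r**2:.2f}')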
Collapse
Affiliation(s)
- Katarzyna Pisanski
- University of Wroclaw, Wroclaw, Poland
- CNRS/Centre National de la Recherche Scientifique, Laboratoire Dynamique du Langage, Université Lyon 2, Lyon, France
- Equipe de Neuro-Ethologie Sensorielle, Centre de Recherche en Neurosciences de Lyon, Jean Monnet University of Saint-Etienne, France
| | - Agata Groyecka-Bernard
- University of Wroclaw, Wroclaw, Poland
- Johannes Gutenberg-Universität Mainz, Mainz, Germany
| | | |
Collapse
|
28
|
Knight S, Lavan N, Torre I, McGettigan C. The influence of perceived vocal traits on trusting behaviours in an economic game. Q J Exp Psychol (Hove) 2021; 74:1747-1754. [PMID: 33783278 PMCID: PMC8392757 DOI: 10.1177/17470218211010144] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]
Abstract
When presented with voices, we make rapid, automatic judgements of social traits such as trustworthiness—and such judgements are highly consistent across listeners. However, it remains unclear whether voice-based first impressions actually influence behaviour towards a voice’s owner, and—if they do—whether and how they interact over time with the voice owner’s observed actions to further influence the listener’s behaviour. This study used an investment game paradigm to investigate (1) whether voices judged to differ in relevant social traits accrued different levels of investment and/or (2) whether first impressions of the voices interacted with the behaviour of their apparent owners to influence investments over time. Results show that participants were responding to their partner’s behaviour. Crucially, however, there were no effects of voice. These findings suggest that, at least under some conditions, social traits perceived from the voice alone may not influence trusting behaviours in the context of a virtual interaction.
Collapse
Affiliation(s)
- Sarah Knight
- Department of Psychology, University of York, York, UK
- Speech, Hearing and Phonetic Sciences, University College London, London, UK
- Department of Psychology, Royal Holloway, University of London, London, UK
| | - Nadine Lavan
- Speech, Hearing and Phonetic Sciences, University College London, London, UK
- Department of Psychology, Royal Holloway, University of London, London, UK
| | - Ilaria Torre
- Division of Robotics, Perception and Learning, KTH Royal Institute of Technology, Stockholm, Sweden
| | - Carolyn McGettigan
- Speech, Hearing and Phonetic Sciences, University College London, London, UK.,Department of Psychology, Royal Holloway, University of London, London, UK
| |
Collapse
|
29
|
Goupil L, Ponsot E, Richardson D, Reyes G, Aucouturier JJ. Listeners' perceptions of the certainty and honesty of a speaker are associated with a common prosodic signature. Nat Commun 2021; 12:861. [PMID: 33558510 PMCID: PMC7870677 DOI: 10.1038/s41467-020-20649-4] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/05/2019] [Accepted: 11/20/2020] [Indexed: 02/07/2023] Open
Abstract
The success of human cooperation crucially depends on mechanisms enabling individuals to detect unreliability in their conspecifics. Yet, how such epistemic vigilance is achieved from naturalistic sensory inputs remains unclear. Here we show that listeners' perceptions of the certainty and honesty of other speakers from their speech are based on a common prosodic signature. Using a data-driven method, we separately decode the prosodic features driving listeners' perceptions of a speaker's certainty and honesty across pitch, duration and loudness. We find that these two kinds of judgments rely on a common prosodic signature that is perceived independently from individuals' conceptual knowledge and native language. Finally, we show that listeners extract this prosodic signature automatically, and that this impacts the way they memorize spoken words. These findings shed light on a unique auditory adaptation that enables human listeners to quickly detect and react to unreliability during linguistic interactions.
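A hedged sketch of the data-driven (reverse-correlation) logic described above, written in Python with numpy; the variable names, contour dimensionality, and simulated responses are illustrative assumptions, not the study's actual pipeline.

import numpy as np

rng = np.random.default_rng(0)
n_trials, n_points = 2000, 6          # trials per listener, samples per random pitch contour

# Random pitch-shift contours (in cents) imposed on the same base utterance.
contours = rng.normal(0.0, 70.0, size=(n_trials, n_points))

# Binary listener responses, e.g. 1 = "the speaker sounds certain/honest", 0 = otherwise.
# (Simulated here from a hypothetical underlying kernel; in the experiment they come from participants.)
true_kernel = np.array([0.0, 0.2, 0.1, -0.1, -0.3, -0.5])
responses = (contours @ true_kernel + rng.normal(0, 20, n_trials)) > 0

# First-order reverse-correlation kernel: mean contour for "yes" minus mean contour for "no".
kernel = contours[responses].mean(axis=0) - contours[~responses].mean(axis=0)
print(np.round(kernel, 1))            # estimated prosodic signature driving the judgement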
Collapse
Affiliation(s)
- Louise Goupil
- STMS UMR 9912 (CNRS/IRCAM/SU), Paris, France.
- University of East London, London, UK.
| | - Emmanuel Ponsot
- Laboratoire des Systèmes Perceptifs, Département d'Études Cognitives, École Normale Supérieure, PSL University, CNRS, Paris, France
- Hearing Technology - WAVES, Department of Information Technology, Ghent University, Ghent, Belgium
| | | | | | - Jean-Julien Aucouturier
- STMS UMR 9912 (CNRS/IRCAM/SU), Paris, France
- FEMTO-ST (UMR 6174, CNRS/UBFC/ENSMM/UTBM), Besançon, France
| |
Collapse
|
30
|
Zhang J, Zheng L, Zhang S, Xu W, Zheng Y. Vocal characteristics predict infidelity intention and relationship commitment in men but not in women. PERSONALITY AND INDIVIDUAL DIFFERENCES 2021. [DOI: 10.1016/j.paid.2020.110389] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 12/18/2022]
|
31
|
Learning metrics on spectrotemporal modulations reveals the perception of musical instrument timbre. Nat Hum Behav 2020; 5:369-377. [PMID: 33257878 DOI: 10.1038/s41562-020-00987-5] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/21/2019] [Accepted: 09/18/2020] [Indexed: 11/08/2022]
Abstract
Humans excel at using sounds to make judgements about their immediate environment. In particular, timbre is an auditory attribute that conveys crucial information about the identity of a sound source, especially for music. While timbre has been primarily considered to occupy a multidimensional space, unravelling the acoustic correlates of timbre remains a challenge. Here we re-analyse 17 datasets from published studies between 1977 and 2016 and observe that original results are only partially replicable. We use a data-driven computational account to reveal the acoustic correlates of timbre. Human dissimilarity ratings are simulated with metrics learned on acoustic spectrotemporal modulation models inspired by cortical processing. We observe that timbre has both generic and experiment-specific acoustic correlates. These findings provide a broad overview of former studies on musical timbre and identify its relevant acoustic substrates according to biologically inspired models.
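One way to picture the "metrics learned on acoustic spectrotemporal modulation models" idea, sketched here under the simplifying assumption of a weighted Euclidean metric fit by least squares; the published model is more elaborate, and all data below are placeholders.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n_sounds, n_feat = 12, 20
X = rng.normal(size=(n_sounds, n_feat))           # modulation-domain representation per sound (placeholder)
D = rng.uniform(1, 9, size=(n_sounds, n_sounds))  # human dissimilarity ratings (placeholder)
D = (D + D.T) / 2

pairs = [(i, j) for i in range(n_sounds) for j in range(i + 1, n_sounds)]

def loss(w):
    w = np.abs(w)                                 # non-negative feature weights
    err = 0.0
    for i, j in pairs:
        pred = np.sqrt(np.sum(w * (X[i] - X[j]) ** 2))  # weighted distance = predicted dissimilarity
        err += (pred - D[i, j]) ** 2
    return err

w_hat = np.abs(minimize(loss, x0=np.ones(n_feat), method='L-BFGS-B').x)
print(w_hat.round(2))                             # learned weights act as the perceptual metric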
Collapse
|
32
|
Voice Pitch – A Valid Indicator of One’s Unfaithfulness in Committed Relationships? ADAPTIVE HUMAN BEHAVIOR AND PHYSIOLOGY 2020. [DOI: 10.1007/s40750-020-00154-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]
Abstract
Objectives
When judging a male speakers’ likelihood to act sexually unfaithful in a committed relationship, listeners rely on the speakers’ voice pitch such that lower voice pitch is perceived as indicating being more unfaithful. In line with this finding, a recent study (Schild et al. Behavioral Ecology, 2020) provided first evidence that voice pitch might indeed be a valid cue to sexual infidelity in men. In this study, male speakers with lower voice pitch, as indicated by lower mean fundamental frequency (mean F0), were actually more likely to report having been sexually unfaithful in the past. Although these results fit the literature on vocal perceptions in contexts of sexual selection, the study was, as stated by the authors, underpowered. Further, the study solely focused on male speakers, which leaves it open whether these findings are also transferable to female speakers.
Methods
We reanalyzed three datasets (Asendorpf et al. European Journal of Personality, 25, 16–30, 2011; Penke and Asendorpf Journal of Personality and Social Psychology, 95, 1113–1135, 2008; Stern et al. 2020) that include voice recordings and infidelity data from a total of 865 individuals (63.36% female) in order to test the replicability of and further extend past research.
Results
A significant negative link between mean F0 and self-reported infidelity was found in only one out of two datasets for men and only one out of three datasets for women. Two meta-analyses (accounting for the sample sizes and including data of Schild et al. 2020), however, suggest that lower mean F0 might be a valid indicator of higher probability of self-reported infidelity in both men and women.
Conclusions
In line with prior research, higher masculinity, as indicated by lower mean F0, seems to be linked to self-reported infidelity in both men and women. However, given methodological shortcomings, future studies should set out to further delve into these findings.
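The sample-size-weighted pooling mentioned in the Results can be sketched as a standard fixed-effect meta-analysis of correlations via Fisher's z; the r and n values below are placeholders, not the datasets' actual statistics.

import numpy as np

# Placeholder per-dataset correlations between mean F0 and self-reported infidelity.
r = np.array([-0.20, -0.05, -0.12])
n = np.array([150, 300, 415])

z = np.arctanh(r)                  # Fisher z-transform of each correlation
w = n - 3                          # inverse-variance weights, since var(z) = 1/(n - 3)
z_pooled = np.sum(w * z) / np.sum(w)
se = 1 / np.sqrt(np.sum(w))

r_pooled = np.tanh(z_pooled)
ci = np.tanh([z_pooled - 1.96 * se, z_pooled + 1.96 * se])
print(f'pooled r = {r_pooled:.3f}, 95% CI = [{ci[0]:.3f}, {ci[1]:.3f}]')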
Collapse
|
33
|
Guldner S, Nees F, McGettigan C. Vocomotor and Social Brain Networks Work Together to Express Social Traits in Voices. Cereb Cortex 2020; 30:6004-6020. [PMID: 32577719 DOI: 10.1093/cercor/bhaa175] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/28/2020] [Revised: 05/08/2020] [Accepted: 05/31/2020] [Indexed: 11/14/2022] Open
Abstract
Voice modulation is important when navigating social interactions-tone of voice in a business negotiation is very different from that used to comfort an upset child. While voluntary vocal behavior relies on a cortical vocomotor network, social voice modulation may require additional social cognitive processing. Using functional magnetic resonance imaging, we investigated the neural basis for social vocal control and whether it involves an interplay of vocal control and social processing networks. Twenty-four healthy adult participants modulated their voice to express social traits along the dimensions of the social trait space (affiliation and competence) or to express body size (control for vocal flexibility). Naïve listener ratings showed that vocal modulations were effective in evoking social trait ratings along the two primary dimensions of the social trait space. Whereas basic vocal modulation engaged the vocomotor network, social voice modulation specifically engaged social processing regions including the medial prefrontal cortex, superior temporal sulcus, and precuneus. Moreover, these regions showed task-relevant modulations in functional connectivity to the left inferior frontal gyrus, a core vocomotor control network area. These findings highlight the impact of the integration of vocal motor control and social information processing for socially meaningful voice modulation.
Collapse
Affiliation(s)
- Stella Guldner
- Department of Cognitive and Clinical Neuroscience, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, Mannheim 68159, Germany
- Graduate School of Economic and Social Sciences, University of Mannheim, Mannheim 68159, Germany
- Department of Speech, Hearing and Phonetic Sciences, University College London, London, UK
| | - Frauke Nees
- Department of Cognitive and Clinical Neuroscience, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, Mannheim 68159, Germany
- Institute of Medical Psychology and Medical Sociology, University Medical Center Schleswig Holstein, Kiel University, Kiel 24105, Germany
| | - Carolyn McGettigan
- Department of Speech, Hearing and Phonetic Sciences, University College London, London, UK
- Department of Psychology, Royal Holloway, University of London, Egham TW20 0EX, UK
| |
Collapse
|
34
|
Arias P, Rachman L, Liuni M, Aucouturier JJ. Beyond Correlation: Acoustic Transformation Methods for the Experimental Study of Emotional Voice and Speech. EMOTION REVIEW 2020. [DOI: 10.1177/1754073920934544] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]
Abstract
While acoustic analysis methods have become a commodity in voice emotion research, experiments that attempt not only to describe but to computationally manipulate expressive cues in emotional voice and speech have remained relatively rare. We give here a nontechnical overview of voice-transformation techniques from the audio signal-processing community that we believe are ripe for adoption in this context. We provide sound examples of what they can achieve, examples of experimental questions for which they can be used, and links to open-source implementations. We point at a number of methodological properties of these algorithms, such as being specific, parametric, exhaustive, and real-time, and describe the new possibilities that these open for the experimental study of the emotional voice.
Collapse
Affiliation(s)
- Pablo Arias
- STMS UMR9912, IRCAM/CNRS/Sorbonne Université, France
| | - Laura Rachman
- STMS UMR9912, IRCAM/CNRS/Sorbonne Université, France
| | - Marco Liuni
- STMS UMR9912, IRCAM/CNRS/Sorbonne Université, France
| | | |
Collapse
|
35
|
Schirmer A, Chiu MH, Lo C, Feng YJ, Penney TB. Angry, old, male - and trustworthy? How expressive and person voice characteristics shape listener trust. PLoS One 2020; 15:e0232431. [PMID: 32365066 PMCID: PMC7197804 DOI: 10.1371/journal.pone.0232431] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/20/2020] [Accepted: 04/14/2020] [Indexed: 11/23/2022] Open
Abstract
This study examined how trustworthiness impressions depend on vocal expressive and person characteristics and how their dependence may be explained by acoustical profiles. Sentences spoken in a range of emotional and conversational expressions by 20 speakers differing in age and sex were presented to 80 age and sex matched listeners who rated speaker trustworthiness. Positive speaker valence but not arousal consistently predicted greater perceived trustworthiness. Additionally, voices from younger as compared with older and female as compared with male speakers were judged more trustworthy. Acoustic analysis highlighted several parameters as relevant for being perceived as trustworthy (i.e., accelerated tempo, low harmonic-to-noise ratio, more shimmer, low fundamental frequency, more jitter, large intensity range) and showed that effects partially overlapped with those for perceived speaker affect, age, but not sex. Specifically, a fast speech rate and a lower harmonic-to-noise ratio differentiated trustworthy from untrustworthy, positive from negative, and younger from older voices. Male and female voices differed in other ways. Together, these results show that a speaker’s expressive as well as person characteristics shape trustworthiness impressions and that their effect likely results from a combination of low-level perceptual and higher-order conceptual processes.
Collapse
Affiliation(s)
- Annett Schirmer
- Department of Psychology, The Chinese University of Hong Kong, Shatin, Hong Kong
- The Brain and Mind Institute, The Chinese University of Hong Kong, Shatin, Hong Kong
| | - Man Hey Chiu
- Department of Psychology, The Chinese University of Hong Kong, Shatin, Hong Kong
| | - Clive Lo
- Department of Psychology, The Chinese University of Hong Kong, Shatin, Hong Kong
| | - Yen-Ju Feng
- Department of Psychology, National Taiwan University, Taipei, Taiwan
| | - Trevor B. Penney
- Department of Psychology, The Chinese University of Hong Kong, Shatin, Hong Kong
- The Brain and Mind Institute, The Chinese University of Hong Kong, Shatin, Hong Kong
| |
Collapse
|
36
|
Schyns PG, Zhan J, Jack RE, Ince RAA. Revealing the information contents of memory within the stimulus information representation framework. Philos Trans R Soc Lond B Biol Sci 2020; 375:20190705. [PMID: 32248774 PMCID: PMC7209912 DOI: 10.1098/rstb.2019.0705] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open
Abstract
The information contents of memory are the cornerstone of the most influential models in cognition. To illustrate, consider that in predictive coding, a prediction implies that specific information is propagated down from memory through the visual hierarchy. Likewise, recognizing the input implies that sequentially accrued sensory evidence is successfully matched with memorized information (categorical knowledge). Although the existing models of prediction, memory, sensory representation and categorical decision are all implicitly cast within an information processing framework, it remains a challenge to precisely specify what this information is, and therefore where, when and how the architecture of the brain dynamically processes it to produce behaviour. Here, we review a framework that addresses these challenges for the studies of perception and categorization–stimulus information representation (SIR). We illustrate how SIR can reverse engineer the information contents of memory from behavioural and brain measures in the context of specific cognitive tasks that involve memory. We discuss two specific lessons from this approach that generally apply to memory studies: the importance of task, to constrain what the brain does, and of stimulus variations, to identify the specific information contents that are memorized, predicted, recalled and replayed. This article is part of the Theo Murphy meeting issue ‘Memory reactivation: replaying events past, present and future’.
Collapse
Affiliation(s)
- Philippe G Schyns
- Institute of Neuroscience and Psychology, University of Glasgow, Scotland G12 8QB, UK
- School of Psychology, University of Glasgow, Scotland G12 8QB, UK
| | - Jiayu Zhan
- Institute of Neuroscience and Psychology, University of Glasgow, Scotland G12 8QB, UK
| | - Rachael E Jack
- School of Psychology, University of Glasgow, Scotland G12 8QB, UK
| | - Robin A A Ince
- Institute of Neuroscience and Psychology, University of Glasgow, Scotland G12 8QB, UK
| |
Collapse
|
37
|
Abstract
The processing of emotional nonlinguistic information in speech is defined as emotional prosody. This auditory nonlinguistic information is essential in the decoding of social interactions and in our capacity to adapt and react adequately by taking into account contextual information. An integrated model is proposed at the functional and brain levels, encompassing 5 main systems that involve cortical and subcortical neural networks relevant for the processing of emotional prosody in its major dimensions, including perception and sound organization; related action tendencies; and associated values that integrate complex social contexts and ambiguous situations.
Collapse
Affiliation(s)
- Didier Grandjean
- Department of Psychology and Educational Sciences and Swiss Center for Affective Sciences, University of Geneva, Switzerland
| |
Collapse
|
38
|
Tognetti A, Durand V, Barkat-Defradas M, Hopfensitz A. Does he sound cooperative? Acoustic correlates of cooperativeness. Br J Psychol 2019; 111:823-839. [PMID: 31820449 DOI: 10.1111/bjop.12437] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/04/2019] [Revised: 10/28/2019] [Indexed: 11/29/2022]
Abstract
The sound of the voice has several acoustic features that influence the perception of how cooperative the speaker is. It remains unknown, however, whether these acoustic features are associated with actual cooperative behaviour. This issue is crucial to disentangle whether inferences of traits from voices are based on stereotypes, or facilitate the detection of cooperative partners. The latter is likely due to the pleiotropic effect that testosterone has on both cooperative behaviours and acoustic features. In the present study, we quantified the cooperativeness of native French-speaking men in a one-shot public good game. We also measured mean fundamental frequency, pitch variations, roughness, and breathiness from spontaneous speech recordings of the same men and collected saliva samples to measure their testosterone levels. Our results showed that men with lower-pitched voices and greater pitch variations were more cooperative. However, testosterone did not influence cooperative behaviours or acoustic features. Our finding provides the first evidence of the acoustic correlates of cooperative behaviour. When considered in combination with the literature on the detection of cooperativeness from faces, the results imply that assessment of cooperative behaviour would be improved by simultaneous consideration of visual and auditory cues.
Collapse
Affiliation(s)
- Arnaud Tognetti
- Department of Clinical Neuroscience, Karolinska Institutet, Stockholm, Sweden
- Institute for Advanced Study in Toulouse, France
| | | | | | - Astrid Hopfensitz
- Toulouse School of Economics, Université Toulouse 1 Capitole, France
| |
Collapse
|
39
|
Schild C, Stern J, Zettler I. Linking men's voice pitch to actual and perceived trustworthiness across domains. Behav Ecol 2019. [DOI: 10.1093/beheco/arz173] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
Previous research suggests that judgments about a male speaker's trustworthiness vary due to the speaker's voice pitch (mean F0) and differ across domains. However, mixed results in terms of the direction and extent of such effects have been reported. Moreover, no study so far has investigated whether men's mean F0 is, indeed, a valid cue to their self-reported and behavioral trustworthiness, and whether trustworthiness judgments are accurate. We tested the relation between mean F0 and actual general, economic, and mating-related trustworthiness in 181 men, as well as trustworthiness judgments of 95 perceivers across all three domains. Analyses show that men's mean F0 is not related to Honesty–Humility (as a trait indicator of general trustworthiness), trustworthy intentions, or trust game behavior, suggesting no relation of mean F0 to general or economic trustworthiness. In contrast, results suggest that mean F0 might be related to mating-related trustworthiness (as indicated by self-reported relationship infidelity). However, lower mean F0 was judged as more trustworthy in economic but less trustworthy in mating-related domains and rather weakly related to judgments of general trustworthiness. Trustworthiness judgments were not accurate for general or economic trustworthiness, but exploratory analyses suggest that women might be able to accurately judge men's relationship infidelity based on their voice pitch. Next to these analyses, we report exploratory analyses involving and controlling for additional voice parameters.
Collapse
Affiliation(s)
- Christoph Schild
- Department of Psychology, University of Copenhagen, Øster Farimagsgade, Copenhagen, Denmark
| | - Julia Stern
- Department of Psychology and Leibniz Science Campus Primate Cognition, University of Goettingen, Gosslerstrasse, Goettingen, Germany
| | - Ingo Zettler
- Department of Psychology, University of Copenhagen, Øster Farimagsgade, Copenhagen, Denmark
| |
Collapse
|
40
|
Hoshi H, Kwon N, Akita K, Auracher J. Semantic Associations Dominate Over Perceptual Associations in Vowel-Size Iconicity. Iperception 2019; 10:2041669519861981. [PMID: 31321019 PMCID: PMC6628535 DOI: 10.1177/2041669519861981] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/10/2018] [Accepted: 05/26/2019] [Indexed: 11/21/2022] Open
Abstract
We tested the influence of perceptual features on semantic associations between the acoustic characteristics of vowels and the notion of size. To this end, we designed an experiment in which we manipulated size on two dissociable levels: the physical size of the pictures presented during the experiment (perceptual level) and the implied size of the objects depicted in the pictures (semantic level). Participants performed an Implicit Association Test in which the pictures of small objects were larger than those of large objects - that is, the actual size ratio on the semantic level was inverted on the perceptual level. Our results suggest that participants matched visual and acoustic stimuli in accordance with the content of the pictures (i.e., the inferred size of the depicted object), whereas directly perceivable features (i.e., the physical size of the picture) had only a marginal influence on participants' performance. Moreover, as the experiment has been conducted at two different sites (Japan and Germany), the results also suggest that the participants' cultural background or mother tongue had only a negligible influence on the effect. Our results, therefore, support the assumption that associations across sensory modalities can be motivated by the semantic interpretation of presemantic stimuli.
Collapse
Affiliation(s)
- Hideyuki Hoshi
- Department of Language and Literature, Max Planck Institute for Empirical Aesthetics, Frankfurt, Germany
| | - Nahyun Kwon
- Department of English Linguistics, Graduate School of Humanities, Nagoya University, Japan
| | - Kimi Akita
- Department of English Linguistics, Graduate School of Humanities, Nagoya University, Japan
| | - Jan Auracher
- Department of Language and Literature, Max Planck Institute for Empirical Aesthetics, Frankfurt, Germany
| |
Collapse
|
41
|
Burred JJ, Ponsot E, Goupil L, Liuni M, Aucouturier JJ. CLEESE: An open-source audio-transformation toolbox for data-driven experiments in speech and music cognition. PLoS One 2019; 14:e0205943. [PMID: 30947281 PMCID: PMC6448843 DOI: 10.1371/journal.pone.0205943] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/30/2018] [Accepted: 02/15/2019] [Indexed: 11/29/2022] Open
Abstract
Over the past few years, the field of visual social cognition and face processing has been dramatically impacted by a series of data-driven studies employing computer-graphics tools to synthesize arbitrary meaningful facial expressions. In the auditory modality, reverse correlation is traditionally used to characterize sensory processing at the level of spectral or spectro-temporal stimulus properties, but not higher-level cognitive processing of, e.g., words, sentences or music, for lack of tools able to manipulate the stimulus dimensions that are relevant for these processes. Here, we present an open-source audio-transformation toolbox, called CLEESE, able to systematically randomize the prosody/melody of existing speech and music recordings. CLEESE works by cutting recordings in small successive time segments (e.g. every successive 100 milliseconds in a spoken utterance), and applying a random parametric transformation of each segment’s pitch, duration or amplitude, using a new Python-language implementation of the phase-vocoder digital audio technique. We present here two applications of the tool to generate stimuli for studying intonation processing of interrogative vs declarative speech, and rhythm processing of sung melodies.
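To make the segment-wise randomization concrete, here is a minimal numpy sketch of the idea described above: the recording's timeline is cut into ~100 ms windows and each breakpoint receives a random pitch shift. It illustrates the breakpoint-function concept only; CLEESE itself applies such functions through a phase vocoder, and the function and parameter names below are illustrative, not the toolbox's API.

import numpy as np

def random_pitch_breakpoints(duration_s, seg_ms=100, sd_cents=200, rng=None):
    """One random pitch-shift contour: (time in s, shift in cents) at each segment boundary."""
    rng = rng or np.random.default_rng()
    times = np.arange(0.0, duration_s + 1e-9, seg_ms / 1000.0)
    shifts = rng.normal(0.0, sd_cents, size=times.shape)  # Gaussian shift drawn per breakpoint
    return np.column_stack([times, shifts])

bpf = random_pitch_breakpoints(duration_s=1.2)
print(bpf.round(1))
# Resynthesis step (not shown): interpolate between breakpoints and apply the resulting
# time-varying pitch shift to the audio with a phase vocoder, as the toolbox does.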
Collapse
Affiliation(s)
| | - Emmanuel Ponsot
- Science and Technology of Music and Sound (UMR9912, IRCAM/CNRS/Sorbonne Université), Paris, France
- Laboratoire des Systèmes Perceptifs (CNRS UMR 8248) and Département d’études cognitives, École Normale Supérieure, PSL Research University, Paris, France
| | - Louise Goupil
- Science and Technology of Music and Sound (UMR9912, IRCAM/CNRS/Sorbonne Université), Paris, France
| | - Marco Liuni
- Science and Technology of Music and Sound (UMR9912, IRCAM/CNRS/Sorbonne Université), Paris, France
| | - Jean-Julien Aucouturier
- Science and Technology of Music and Sound (UMR9912, IRCAM/CNRS/Sorbonne Université), Paris, France
| |
Collapse
|
42
|
Zhan J, Ince RAA, van Rijsbergen N, Schyns PG. Dynamic Construction of Reduced Representations in the Brain for Perceptual Decision Behavior. Curr Biol 2019; 29:319-326.e4. [PMID: 30639108 PMCID: PMC6345582 DOI: 10.1016/j.cub.2018.11.049] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2018] [Revised: 10/23/2018] [Accepted: 11/20/2018] [Indexed: 01/03/2023]
Abstract
Over the past decade, extensive studies of the brain regions that support face, object, and scene recognition suggest that these regions have a hierarchically organized architecture that spans the occipital and temporal lobes [1-14], where visual categorizations unfold over the first 250 ms of processing [15-19]. This same architecture is flexibly involved in multiple tasks that require task-specific representations-e.g. categorizing the same object as "a car" or "a Porsche." While we partly understand where and when these categorizations happen in the occipito-ventral pathway, the next challenge is to unravel how these categorizations happen. That is, how does high-dimensional input collapse in the occipito-ventral pathway to become low dimensional representations that guide behavior? To address this, we investigated what information the brain processes in a visual perception task and visualized the dynamic representation of this information in brain activity. To do so, we developed stimulus information representation (SIR), an information theoretic framework, to tease apart stimulus information that supports behavior from that which does not. We then tracked the dynamic representations of both in magneto-encephalographic (MEG) activity. Using SIR, we demonstrate that a rapid (∼170 ms) reduction of behaviorally irrelevant information occurs in the occipital cortex and that representations of the information that supports distinct behaviors are constructed in the right fusiform gyrus (rFG). Our results thus highlight how SIR can be used to investigate the component processes of the brain by considering interactions between three variables (stimulus information, brain activity, behavior), rather than just two, as is the current norm.
Collapse
Affiliation(s)
- Jiayu Zhan
- Institute of Neuroscience and Psychology, University of Glasgow, Scotland G12 8QB, United Kingdom
| | - Robin A A Ince
- Institute of Neuroscience and Psychology, University of Glasgow, Scotland G12 8QB, United Kingdom
| | - Nicola van Rijsbergen
- Institute of Neuroscience and Psychology, University of Glasgow, Scotland G12 8QB, United Kingdom
| | - Philippe G Schyns
- Institute of Neuroscience and Psychology, University of Glasgow, Scotland G12 8QB, United Kingdom; School of Psychology, University of Glasgow, 62 Hillhead Street, Glasgow, Scotland G12 8QB, United Kingdom.
| |
Collapse
|
43
|
Angry, old, male - and trustworthy? How expressive and person voice characteristics shape listener trust. PLoS One 2019; 14:e0210555. [PMID: 30650135 PMCID: PMC6334957 DOI: 10.1371/journal.pone.0210555] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/07/2018] [Accepted: 12/27/2018] [Indexed: 11/19/2022] Open
Abstract
This study examined how trustworthiness impressions depend on vocal expressive and person characteristics and how their dependence may be explained by acoustical profiles. Sentences spoken in a range of emotional and conversational expressions by 20 speakers differing in age and sex were presented to 80 age and sex matched listeners who rated speaker trustworthiness. Positive speaker valence but not arousal consistently predicted greater perceived trustworthiness. Additionally, voices from younger as compared with older and female as compared with male speakers were judged more trustworthy. Acoustic analysis highlighted several parameters as relevant for differentiating trustworthiness ratings and showed that effects largely overlapped with those for speaker valence and age, but not sex. Specifically, a fast speech rate, a low harmonic-to-noise ratio, and a low fundamental frequency mean and standard deviation differentiated trustworthy from untrustworthy, positive from negative, and younger from older voices. Male and female voices differed in other ways. Together, these results show that a speaker’s expressive as well as person characteristics shape trustworthiness impressions and that their effect likely results from a combination of low-level perceptual and higher-order conceptual processes.
Collapse
|
44
|
Mahrholz G, Belin P, McAleer P. Judgements of a speaker's personality are correlated across differing content and stimulus type. PLoS One 2018; 13:e0204991. [PMID: 30286148 PMCID: PMC6171871 DOI: 10.1371/journal.pone.0204991] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/17/2017] [Accepted: 09/18/2018] [Indexed: 11/19/2022] Open
Abstract
It has previously been shown that first impressions of a speaker's personality, whether accurate or not, can be judged from short utterances of vowels and greetings, as well as from prolonged sentences and readings of complex paragraphs. From these studies, it is established that listeners' judgements are highly consistent with one another, suggesting that different people judge personality traits in a similar fashion, with three key personality traits being related to measures of valence (associated with trustworthiness), dominance, and attractiveness. Yet, particularly in voice perception, limited research has established the reliability of such personality judgements across stimulus types of varying lengths. Here we investigate whether first impressions of trustworthiness, dominance, and attractiveness of novel speakers are related when a judgement is made on hearing both one word and one sentence from the same speaker. Secondly, we test whether what is said, thus adjusting content, influences the stability of personality ratings. 60 Scottish voices (30 females) were recorded reading two texts: one of ambiguous content and one with socially-relevant content. One word (~500 ms) and one sentence (~3000 ms) were extracted from each recording for each speaker. 181 participants (138 females) rated either male or female voices across both content conditions (ambiguous, socially-relevant) and both stimulus types (word, sentence) for one of the three personality traits (trustworthiness, dominance, attractiveness). Pearson correlations showed personality ratings between words and sentences were strongly correlated, with no significant influence of content. In short, when establishing an impression of a novel speaker, judgments of three key personality traits are highly related whether you hear one word or one sentence, irrespective of what they are saying. This finding is consistent with initial personality judgments serving as elucidators of approach or avoidance behaviour, without modulation by time or content. All data and sounds are available on OSF (osf.io/s3cxy).
Collapse
Affiliation(s)
- Gaby Mahrholz
- School of Psychology, University of Glasgow, Glasgow, United Kingdom
| | - Pascal Belin
- Institut des Neurosciences de la Timone, UMR 7289, CNRS and Université Aix-Marseille, Marseille, France
| | - Phil McAleer
- School of Psychology, University of Glasgow, Glasgow, United Kingdom
| |
Collapse
|
45
|
Reply to Knight et al.: The complexity of inferences from speech prosody should be addressed using data-driven approaches. Proc Natl Acad Sci U S A 2018; 115:E6104-E6105. [PMID: 29899153 DOI: 10.1073/pnas.1806857115] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open
|
46
|
The social code of speech prosody must be specific and generalizable. Proc Natl Acad Sci U S A 2018; 115:E6103. [PMID: 29899154 DOI: 10.1073/pnas.1806345115] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open
|